Author Manuscript Published OnlineFirst on October 1, 2019; DOI: 10.1158/0008-5472.CAN-19-1019 Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited.

Histone-related genes are hypermethylated in lung cancer and hypermethylated HIST1H4F could serve as a pan-cancer biomarker

Shi-Hua Dong1,#, Wei Li1,#, Lin Wang2,#, Jie Hu3,#, Yuanlin Song3,#, Baolong Zhang1, Xiaoguang Ren1, Shimeng Ji3, Jin Li1, Peng Xu1, Ying Liang1, Gang Chen4, Jia-Tao Lou2†, Wenqiang Yu1†

1Shanghai Public Health Clinical Center and Department of General Surgery, Huashan Hospital, Cancer Metastasis Institute and Laboratory of RNA Epigenetics, Institutes of Biomedical Sciences, Shanghai Medical College, Fudan University, Shanghai, 201508, China.

2Department of Laboratory Medicine, Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai 200030, China.

3Department of Pulmonary Medicine, Zhongshan Hospital, Fudan University, 180 Fenglin Road, Shanghai 200032, China.

4Department of Pathology, Zhongshan Hospital, Fudan University, 180 Fenglin Road, Shanghai 200032, China.

#These authors contributed equally to this work.

Running title

HIST1H4F region as a Universal-Cancer-Only Methylation marker.

Keywords

Lung cancer, DNA methylation signature, Histone genes, HIST1H4F, Universal-Cancer-Only Methylation, Pan-cancer biomarker

Downloaded from cancerres.aacrjournals.org on September 27, 2021. © 2019 American Association for Cancer Research.

Financial Support

This work was supported by the National Key R&D Program of China (Grant No. 2018YFC1005004), the Science and Technology Innovation Action Plan of Shanghai (Grant No. 17411950900), the National Natural Science Foundation of China (Grant Nos. 31671308, 31872814, and 81272295), Major Special Projects of Basic Research of Shanghai Science and Technology Commission (Grant No. 18JC1411101), the Shanghai Science and Technology Committee (Grant No. 12ZR1402200), the Ministry of Education of the People’s Republic of China (Grant No. 2009CB825600), and the Innovation Group Project of Shanghai Municipal Health Commission (Grant No. 2019CXJQ03).

Corresponding author

†Wenqiang Yu, PhD&MD, Institute of Biomedical Sciences, Fudan University, 130 Dong’an Road, West 13# Building, Room 419, Shanghai 200032, P.R.China. Tel.: +86-21-54237978, Fax: +86-21-54237339, E-mail: [email protected]

†Jiatao Lou, MD, Department of Laboratory Medicine, Shanghai Chest Hospital, 241 West Huaihai Road, Shanghai, 200030, China. Tel.: +86-21-2220000-1503, Fax: +86-21-62808279, E-mail: [email protected]

Conflict of interest

The authors declare a potential conflict of interest in the form of a patent application.


Abstract

Lung cancer is the leading cause of cancer-related deaths worldwide. Cytological examination is the current "gold standard" for lung cancer diagnosis; however, it has low sensitivity. Here, we identified a typical methylation signature of histone genes in lung cancer by whole-genome DNA methylation analysis, which was validated in a TCGA lung cancer cohort (n=907) and further confirmed in 265 bronchoalveolar lavage fluid (BALF) samples with specificity and sensitivity of 96.7% and 87.0%, respectively. More importantly, HIST1H4F was universally hypermethylated in all seventeen tumor types from TCGA datasets (n=7344), which was further validated in nine different types of cancer (n=243). These results demonstrate that HIST1H4F can function as a Universal-Cancer-Only Methylation (UCOM) marker, which may aid in understanding general tumorigenesis and improve screening for early cancer diagnosis.

Significance

Findings identify a new biomarker for cancer detection and show that hypermethylation of histone-related genes seems to persist across cancers.


Introduction

Lung cancer is one of the most common malignant tumors and the leading cause of cancer-related deaths worldwide (1,2). Early detection and surgery offer the best chance for survival, with a five-year survival rate as high as 80% (3). However, most lung cancer patients are diagnosed at an inoperable advanced stage with metastasis and must undergo chemotherapy, radiotherapy, immunotherapy, or targeted therapy. The five-year survival rate of patients at the advanced stage is below 10% (4,5). Over the past decade, low-dose computed tomography (LDCT) has been the most commonly used screening method for lung cancer and has been shown to improve early detection and reduce mortality (6). However, because of its low specificity, LDCT is far from satisfactory as a screening tool for clinical application, as are other currently used cancer biomarkers such as carcinoembryonic antigen (CEA), neuron-specific enolase (NSE), and CYFRA 21-1. Therefore, effective biomarkers for early detection, diagnosis, prognosis, and monitoring of lung cancer are urgently needed (7).

Epigenetic and genetic abnormalities are hallmarks of lung cancer (8-10). Abnormal DNA methylation is the most common epigenetic variation in lung cancer. Compared with DNA mutations, aberrant DNA methylation occurs much earlier and is more stable, making it better suited to the early diagnosis of tumors, and aberrant DNA methylation patterns can be used to predict liver cancer metastasis to the lung (11). Although many DNA methylation biomarkers have been reported, they are still under exploration and are rarely used in clinical applications. The sensitivity and specificity of current methylation markers are insufficient, with a high risk of false positives and false negatives (12,13). Therefore, applying methylation markers in the clinic remains challenging, and new biomarkers for the early detection of cancer are urgently needed (14).

Histones are major essential components of chromatin and are conserved in eukaryotic cells (15). There are five major types of histones: H1, H2A, H2B, H3, and H4. Histones H2A, H2B, H3 and H4 are known as the core histones, while histone H1 is known as the linker histone (16). Histones are divided into canonical replication-dependent histones, which are expressed during the S-phase of the cell cycle, and replication-independent histone variants, which are expressed during each phase of the cell cycle. Genes encoding canonical histones are intron-less and lack a polyA tail at the 3’ end, having instead a stem-loop structure; canonical histone genes also tend to be clustered in the genome. Genes encoding histone variants are usually not clustered and have introns and polyA tails (17,18). In the human genome, histone genes mainly form histone cluster 1 (Chr6p21) and histone cluster 2 (Chr1q21) (19). Other histone genes are distributed randomly in the human genome. Although histone modifications have been extensively studied in chromatin regulation, epigenetic variation in the family of histone genes themselves is rarely considered. It has been shown that histone gene cluster 1 is occupied by abnormal higher-order chromatin organization in breast cancer (20). However, DNA methylation alterations at histone gene loci have not yet been systematically investigated, especially in cancer development.

Here, through genome-wide DNA methylation analysis with an unusual strategy, we found that many histone gene loci are abnormally hypermethylated in lung cancer, which prompted further investigation. We demonstrate that methylation of histone genes can be used as a biomarker for early detection in BALF samples. Furthermore, histone gene loci are not only abnormally hypermethylated in lung cancer but also specifically methylated in various tumors. In particular, the HIST1H4F gene is abnormally hypermethylated in seventeen types of cancer for which we could obtain informative clinical samples, and it may act as a Universal-Cancer-Only Methylation marker. We speculate that the methylation of HIST1H4F will be of great significance for early diagnosis, especially during cancer screening in clinical applications.

Materials and methods

WGBS data analysis

Whole-genome bisulfite sequencing (WGBS) data sets were downloaded from the ENCODE database (https://www.encodeproject.org/) and the SRA database (https://www.ncbi.nlm.nih.gov/sra); the accession numbers are summarized in Supplementary Table 1. DNA methylation levels were calculated using BSMAP software (21) as described previously (11), with the hg19 human genome assembly and UCSC reference gene annotations. Specifically, for each CpG site, the reads supporting methylation and the reads supporting unmethylation were counted, and the methylation value was calculated as the ratio of the number of reads supporting methylation to the total number of reads supporting either methylation or unmethylation. Only CpG sites covered by more than five reads and detected in all seven WGBS data sets were used for subsequent analysis.
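The per-site calculation and filtering described above can be sketched as follows. This is a minimal illustration, not the BSMAP pipeline itself: the input layout (a dict mapping a site to a pair of read counts) and the function names are assumptions made for the example.

```python
def methylation_value(meth_reads, unmeth_reads):
    """Fraction of reads supporting methylation at one CpG site."""
    total = meth_reads + unmeth_reads
    if total == 0:
        raise ValueError("site has no coverage")
    return meth_reads / total


def covered_sites(sample, min_reads=6):
    """Keep sites covered by more than five reads (at least six)."""
    return {
        site: methylation_value(m, u)
        for site, (m, u) in sample.items()
        if m + u >= min_reads
    }


def shared_sites(samples, min_reads=6):
    """Keep only sites that pass the coverage filter in every sample."""
    filtered = [covered_sites(s, min_reads) for s in samples]
    common = set.intersection(*(set(f) for f in filtered))
    return {site: [f[site] for f in filtered] for site in sorted(common)}


# Toy input (hypothetical counts): site -> (methylated reads, unmethylated reads)
samples = [
    {("chr6", 100): (9, 1), ("chr6", 120): (2, 2)},
    {("chr6", 100): (3, 9), ("chr6", 120): (6, 6)},
]
result = shared_sites(samples)
print(result)
```

In the toy input, the second site is dropped because it has only four reads in the first sample, so only the first site survives with one methylation value per sample.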

DMS, DMR and DMG definition

The methylation levels of four Normal lung Cell samples (NC) and three lung Cancer Cell samples (CC) were calculated. For each CpG site, we calculated the methylation value difference for all twelve CC-NC pairs (CCi - NCj, where i = 1, 2, or 3 and j = 1, 2, 3, or 4). CpG sites with all twelve (CCi - NCj) ≥ 50% were defined as Cancer Cell-Differentially Methylated Sites (CC-DMS). Similarly, CpG sites with all twelve (NCj - CCi) ≥ 50% were defined as Normal Cell-Differentially Methylated Sites (NC-DMS). In addition, CpG sites with all twelve |CCi - NCj| ≤ 20% were defined as NO-Differentially Methylated Sites (NO-DMS). A Differentially Methylated Region (DMR) was defined as at least three adjacent DMS within a 100 bp genomic window. Genes overlapping with any DMR were defined as Differentially Methylated Genes (DMG).
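The site and region definitions above can be expressed as a short sketch. The thresholds come from the text; the function names, the percent scale, and the greedy grouping of sites into regions are assumptions made for illustration. In particular, the sketch groups DMS of one class by position only and does not check whether non-DMS CpG sites lie between them, which is one possible reading of "adjacent".

```python
from itertools import product


def classify_site(cc, nc, dms_cut=50.0, no_cut=20.0):
    """Classify one CpG site from methylation percentages in cancer (cc)
    and normal (nc) samples, using all pairwise CC-NC differences."""
    diffs = [c - n for c, n in product(cc, nc)]
    if all(d >= dms_cut for d in diffs):
        return "CC-DMS"
    if all(-d >= dms_cut for d in diffs):
        return "NC-DMS"
    if all(abs(d) <= no_cut for d in diffs):
        return "NO-DMS"
    return None  # site fits none of the three definitions


def find_dmrs(positions, min_sites=3, window=100):
    """Group DMS positions (one class, one chromosome) into regions that
    hold at least min_sites sites within a window of `window` bp."""
    positions = sorted(positions)
    dmrs = []
    i = 0
    while i < len(positions):
        j = i
        # Extend the run while the next site stays inside the window.
        while j + 1 < len(positions) and positions[j + 1] - positions[i] < window:
            j += 1
        if j - i + 1 >= min_sites:
            dmrs.append((positions[i], positions[j]))
            i = j + 1
        else:
            i += 1
    return dmrs


print(classify_site([90, 85, 95], [10, 20, 5, 15]))
print(find_dmrs([10, 40, 90, 500, 520, 900]))
```

With three cancer samples and four normal samples, `product` yields exactly the twelve pairs used in the definitions.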

TCGA DNA methylation data analysis

The Illumina 450K methylation array level 3 data from The Cancer Genome Atlas (TCGA) database were downloaded from the UCSC Xena browser (https://xenabrowser.net/). For each histone gene, only probes within the gene body region (listed in Supplementary Table 2) were selected to calculate an average methylation value. Probes with “NA” values were excluded. The absolute methylation values were calculated from the β-values of the 450K methylation array (methylation value = (β-value + 0.5) × 100%). For each gene, the final methylation value was calculated as the average of all selected CpG sites. The TCGA samples used and the methylation levels of HIST1H4F are listed in Supplementary Table 3.
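A minimal sketch of the gene-level calculation is given below. The +0.5 term is taken directly from the formula in the text; the sketch assumes the downloaded values are stored with a matching offset, and the function names and the use of `None` or NaN to represent "NA" are illustrative choices.

```python
import math


def xena_to_percent(value):
    """Apply the conversion stated in the text: (value + 0.5) * 100%."""
    return (value + 0.5) * 100.0


def gene_methylation(probe_values):
    """Average methylation (%) over the gene-body probes of one gene,
    skipping probes whose value is NA (None or NaN)."""
    usable = [v for v in probe_values if v is not None and not math.isnan(v)]
    if not usable:
        return None  # no informative probe for this gene in this sample
    return sum(xena_to_percent(v) for v in usable) / len(usable)


# Hypothetical probe values for one gene in one sample
level = gene_methylation([0.25, None, -0.25, float("nan")])
print(level)
```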

Clinical samples

We collected 243 primary tissue samples and 265 BALF samples from Shanghai Chest Hospital and Zhongshan Hospital of Fudan University. Primary tissue samples included 25 lung cancer and 25 paired para-cancer control samples, 12 colorectal cancer and 12 paired para-cancer control samples, 10 esophageal cancer and 12 paired para-cancer control samples, 20 liver cancer and 23 para-cancer control samples, 9 pancreatic cancer and 9 paired para-cancer control samples, 10 cervical cancer and 10 control samples, 10 gastric cancer and 10 para-cancer control samples, 14 breast cancer and 14 paired para-cancer control samples, and 10 head and neck cancer and 10 paired para-cancer control samples. Clinical characteristics of these samples are summarized in Supplementary Table 4. BALF samples comprised a benign lung disease (BLD) control group and a lung cancer group. The BLD control group contained 59 samples, including pneumonia, emphysema, and tuberculosis. The lung cancer group included 92 lung squamous cell carcinoma (LUSC) samples, 70 lung adenocarcinoma (LUAD) samples, and 44 small cell lung carcinoma (SCLC) samples. BALF samples were randomly assigned to a training set and a validation set. All patients provided written informed consent before their samples were collected. Institutional Review Board approval for research on human subjects was obtained from the Hospital.

DNA extraction, bisulfite-PCR treatment, and pyrosequencing

Genomic DNA from cultured cell lines and primary tissue samples was extracted with phenol-chloroform. Genomic DNA from BALF samples was extracted with the Qiagen DNA Extraction Kit (Qiagen, cat# 51404). Next, 20-200 ng of genomic DNA was taken for bisulfite treatment (ZYMO Research, cat# D5006), and the recovered bisulfite-treated DNA was used as the template for subsequent PCR. We detected eleven CpG sites for the HIST1H4F gene (chr6:26,240,743-26,240,800) and eight CpG sites for the HIST1H4I gene (chr6:27,107,185-27,107,239). The genomic sequences and primers designed for the target genes are listed in Supplementary Table 5. Two rounds of semi-nested PCR were performed to produce single-band, biotin-modified PCR products. The outer forward primer and the reverse primer were used for the first round of PCR amplification. The inner forward primer and the reverse primer were used for the second round of PCR amplification. Both rounds of PCR were performed with the same program: pre-denaturation at 98℃ for 30 s; 30 cycles of 98℃ for 10 s, 58℃ for 30 s, and 72℃ for 30 s; and a final elongation at 72℃ for 3 min. The pyrosequencing assay was performed on a PyroMark Q96 ID instrument (QIAGEN). For each target gene, the final methylation value was the average across all CpG sites detected by pyrosequencing.

Cell Culture

The human lung cancer cell line A549, human lung fibroblast cell line MRC5, and human hepatocarcinoma cell line HepG2 were kindly provided by the Stem Cell Bank, Chinese Academy of Sciences. All cell lines were authenticated with the PowerPlex 16 System (Promega) and tested negative for mycoplasma by qPCR. A549, MRC5, and HepG2 cells were cultured in DMEM medium supplemented with 10% v/v FBS and 1% v/v antibiotics at 37°C in a humidified atmosphere of 5% CO2. For passaging, cells were washed once with PBS and dissociated using 1 ml of 0.25% trypsin, then neutralized with 1 ml of DMEM medium and plated equally into two 10 cm dishes.

Results

The pipeline of genome-wide WGBS data analysis and validation of the identified differentially methylated regions by the TCGA cohort

To screen genome-wide for DNA methylation biomarkers for the early diagnosis of lung cancer, we collected three WGBS data sets of lung cancer cells and another four WGBS data sets of cell samples derived from normal lung tissues as controls (Supplementary Table 1). To effectively screen for lung cancer biomarkers from these WGBS data sets, we developed a new data analysis strategy (Fig. 1A).

1) First, we performed a genome-wide methylation analysis for each WGBS sample and obtained all CpG sites covered by more than five reads. By this process, we obtained at least 30 million CpG sites per sample, covering at least 55.7% of the whole genome (Supplementary Table 1). To robustly analyze the difference between normal and cancer samples at single-nucleotide resolution, only CpG sites detected in all seven samples were selected for further analysis. In total, 19,461,312 CpG sites were selected, covering 34.5% of all possible sites in the human genome. This rate is much higher than that of reduced representation bisulfite sequencing (RRBS), whose coverage was estimated to be 1-3%, and that of the Illumina 450K methylation array, which covers 485,455 CpG sites, approximately 2% of all possible sites (22,23). The average methylation levels showed that the cancer samples were hypomethylated compared with normal ones (Fig. 1B), which is consistent with previous reports that cancer is globally hypomethylated. Meanwhile, the 19,461,312 CpG sites were distributed, as expected, throughout the whole genome, including intergenic, intronic, exonic, and promoter regions (Fig. 1C). These results indicate that our approach is applicable throughout the genome with minor sequence bias (Supplementary Fig. S1A).

2) Second, based on the 19,461,312 CpG sites, by calculating the methylation differences between CCi and NCj, we found 24,257 CC-DMS, 442,233 NC-DMS, and 4,456,347 NO-DMS, which accounted for 0.12%, 2.27%, and 22.9% of all 19,461,312 CpG sites, respectively. Compared with the background distribution of all 19,461,312 CpG sites (Fig. 1C), CC-DMS were clearly enriched in the promoter and exonic regions (hypergeometric test, p-value < 1e-5) (Fig. 1D); NC-DMS were enriched in the intergenic region (hypergeometric test, p-value < 1e-5) (Fig. 1E); and NO-DMS were mostly enriched in the intronic region (hypergeometric test, p-value < 1e-5) (Fig. 1F). Additionally, 13,932 of the 24,257 CC-DMS (57.4%) were located in CpG island regions. In contrast, only 3,518 of the 442,233 NC-DMS (0.8%) were located in CpG island regions, indicating that DNA hypermethylation in tumors usually occurs in cis-regulatory elements. Among NO-DMS, however, hypomethylated CpG sites (methylation level ≤ 20%) were mainly distributed in the promoter region (Supplementary Fig. S1B), while hypermethylated CpG sites (methylation level ≥ 80%) were mainly distributed in the intronic region (Supplementary Fig. S1C). These results reveal that the cancer cells are globally hypomethylated and locally hypermethylated, and that the locally hypermethylated regions are mainly distributed in promoter and exonic regions.
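The enrichment tests cited above are hypergeometric tests. As a hedged illustration of what such a test computes, the sketch below evaluates the upper-tail probability exactly with the standard library; the parameter names and the toy numbers are assumptions, and the manuscript does not state which software was used.

```python
from fractions import Fraction
from math import comb


def hypergeom_upper_tail(k, M, n, N):
    """P(X >= k) when N sites are drawn without replacement from M sites,
    of which n lie in the region of interest (e.g. promoters)."""
    total = comb(M, N)
    upper = sum(
        comb(n, x) * comb(M - n, N - x)
        for x in range(k, min(n, N) + 1)
    )
    return Fraction(upper, total)


# Toy numbers (hypothetical): 20 background sites, 5 in promoters,
# 5 differentially methylated sites, 4 of which fall in promoters.
p = hypergeom_upper_tail(k=4, M=20, n=5, N=5)
print(p, float(p))
```

Exact integer arithmetic is only practical for small inputs. At genome scale a numerical routine would be used instead, for example `scipy.stats.hypergeom.sf(k - 1, M, n, N)`, which returns the same upper-tail probability.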

3) Third, similar to the genetic linkage effect, DNA methylation within a small genomic region also tends to be consistent (24). Based on this principle, the methylation behavior of adjacent CpG sites taken together as a region is much more reliable than that of single CpG sites. For example, DMR and methylation haplotypes have been widely used for DNA methylation analysis. Therefore, we further defined a DMR as more than three DMS within a 100 bp genomic region. Among the 24,257 CC-DMS, we identified 2,408 CC-DMR. From the 442,233 NC-DMS, we found 36,393 NC-DMR. Meanwhile, based on the 4,456,347 NO-DMS, we found 435,249 NO-DMR. We further analyzed the genes in which these DMR are embedded. There were 958 CC-DMR-related genes and 1,925 NC-DMR-related genes, which we called CC-DMG (Cancer Cell-Differentially Methylated Genes, Supplementary Table 6) and NC-DMG (Normal Cell-Differentially Methylated Genes, Supplementary Table 7). We calculated the methylation levels of CC-DMG and NC-DMG in WGBS and TCGA data (Fig. 1G-1H, Supplementary Tables 6 and 7). KEGG pathway analysis showed that CC-DMG were mainly enriched in tumor-associated signaling pathways, such as the Hippo signaling pathway and transcriptional misregulation in cancer. NC-DMG were enriched in olfactory transduction, with less connection to tumor-related signaling pathways. NO-DMG were mainly enriched in pathways related to basic cellular functions (Supplementary Fig. S1D). Interestingly, both CC-DMG and NC-DMG were enriched in the neuroactive ligand-receptor interaction signaling pathway. In particular, some adrenaline signaling-related genes, such as ADRA1A, ADRA2A, ADRA2C, and ADRBK1, appeared in the CC-DMG list, whereas some cholinergic signaling-related genes, such as CHRM2, CHRM3, and CHRM5, were found in the NC-DMG list. The variation in DNA methylation in nerve-related genes suggests that neuroregulation plays an important role in the genesis and development of lung cancer. This is supported by evidence from several groups showing that cancer development in a variety of tissues is controlled by an assortment of nerve-mediated signals, including neurotransmitters and other molecules (25-27), and it indicates that epigenetic regulation of neuron-related genes will be of great interest in cancer development. As expected, many well-known lung cancer methylation biomarkers reported in the literature are on our CC-DMG list, for example, SHOX2, POU4F2, BCAT1, HOXA9, and PTGDR (28-32). These results further support our analysis strategy.

4) Fourth, to further confirm the accuracy of the WGBS analysis, we downloaded the Illumina 450K methylation array data of the TCGA lung cancer cohort. The Illumina 450K methylation array contains 485,455 CpG probes. The TCGA lung cancer cohort contained a total of 907 samples, including 75 para-cancer normal control samples and 832 lung cancer samples (Supplementary Table 8). To verify our WGBS analysis, we selected the CpG sites shared between the 450K probes and the CpGs in DMS/DMR/DMG (Fig. 1A, Supplementary Fig. S1E). Of the 485,455 450K probes, 845 and 1,662 CpG sites overlapped with CC-DMS and NC-DMS, respectively. In the TCGA datasets, CC-DMS and NC-DMS identified from WGBS were clearly hypermethylated and hypomethylated, respectively, in cancer samples compared with normal samples. There were 624 and 840 CpG sites shared between the 450K probes and CC-DMR and NC-DMR, respectively. Similarly, the CC-DMR and NC-DMR obtained from WGBS were also verified by the TCGA data sets (Supplementary Fig. S1F-I). As for DMG, 401 and 377 CpG sites were shared between the 450K probes and CC-DMG and NC-DMG, respectively, and their DNA methylation status was supported by the TCGA data sets (Fig. 1H). Taken together, our results can be fully verified by the lung cancer 450K methylation array data from the TCGA, which further supports the validity of our analysis approach.

Abnormally hypermethylated signature of histone genes in lung cancer

In addition to some already acknowledged biomarkers, such as SHOX2 and POU4F2, we found many previously unreported genes on our CC-DMG list. More interestingly, some histone genes appeared on the CC-DMG list, such as HIST1H3C, HIST1H4F, and HIST1H4I, which called for further investigation.

As essential and conserved housekeeping genes, histones are stably expressed in almost all eukaryotic cells. Due to the important function of histones, each histone is encoded by multiple histone genes (19). In total, 85 histone genes have been found in the human genome, including 68 canonical histone genes and 17 histone variant genes. Canonical histone genes include six H1 genes, seventeen H2a genes, eighteen H2b genes, thirteen H3 genes, and fourteen H4 genes. Variant histone genes include four H1 variants, seven H2a variants, two H2b variants, and four H3 variants. Histone modifications have been widely investigated in the epigenetic field (33,34). However, DNA methylation of the histone gene family has not been well described in the literature. We summarize the 85 histone genes in Supplementary Table 9.

We then analyzed DNA methylation of the whole histone gene family in the WGBS data (Fig. 2A). Four histone genes (HIST2H2AA4, HIST2H3C, HIST2H4A, H2BFS) were not detected in all WGBS data sets and were therefore excluded from the subsequent analysis. According to their DNA methylation signatures in normal and cancer samples, the histone genes can be divided into seven groups. Group 1 histone genes are poorly methylated in both normal and cancer cells, whereas group 2 histone genes are highly methylated in both normal and cancer samples. Group 3 histone genes are randomly methylated in normal and cancer samples. Group 4, comprising 14 histone genes (HIST1H4I, HIST1H2BM, HIST1H3C, HIST1H4F, HIST1H2BB, HIST1H2BE, HIST1H1A, HIST1H2BI, HIST1H3G, HIST1H2AD, HIST1H2BE, HIST1H3J, HIST1H2BH, HIST1H4D), was hypermethylated in all lung cancer samples (Fig. 2B, Supplementary Fig. S2A). To confirm this finding, we reanalyzed the methylation of these hypermethylated histone genes on the Illumina 450K methylation arrays of the TCGA lung cancer cohort (n=907). The results showed that nine of the fourteen genes (HIST1H4I, HIST1H4F, HIST1H3C, HIST1H2BE, HIST1H2BM, HIST1H3J, HIST1H2BB, HIST1H1A, HIST1H2BI) were significantly hypermethylated in both LUAD and LUSC (Fig. 2C). In addition, we found that DNA methylation of histone genes can be used to classify the three main types of lung cancer. Four histone genes (group 5: HIST1H2AG, HIST3H2A, HIST3H2BB, HIST1H3F) were specifically hypermethylated in LUAD (Fig. 2D), four histone genes (group 6: HIST1H4A, HIST1H3A, HIST1H2AL, HIST1H3I) were methylated only in LUSC samples (Fig. 2E), and another six histone genes (group 7: HIST1H2BL, HIST2H3D, HIST1H2AJ, H2AFJ, HIST1H2AI, HIST1H1D) were highly methylated in SCLC (Fig. 2F). More importantly, these cancer type-specific hypermethylated genes could be verified in the TCGA datasets (Supplementary Fig. S2B). These results suggest that methylation of histone gene loci may be used to distinguish lung cancer subtypes.

We further performed receiver operating characteristic (ROC) analysis on the fourteen hypermethylated histone genes using the TCGA datasets. The results show that HIST1H4F and HIST1H4I have much higher specificity and sensitivity than the other genes: the specificity and sensitivity of HIST1H4F were 97.3% and 82.7%, respectively, and those of HIST1H4I were 96.0% and 87.5%, respectively (Supplementary Table 10). Moreover, both genes perform well in stage I lung cancer, and ROC analysis reveals similar AUCs across different stages, which indicates that methylation of HIST1H4F and HIST1H4I can act as a biomarker for early lung cancer diagnosis (Supplementary Fig. S2C-D, Supplementary Table 10). Furthermore, ROC analysis showed that the maximum methylation of HIST1H4F and HIST1H4I (Max-IF) performed better than either gene alone, with an area under the curve (AUC) of 0.95, a specificity of 96.0%, and a sensitivity of 92.9% (Supplementary Fig. S2E, Supplementary Table 10).
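The Max-IF score and the ROC statistics reported above can be illustrated with a small sketch. The methylation values and the 10% cutoff below are invented for the example and are not taken from the study; the AUC is computed through its rank interpretation (the probability that a cancer sample scores higher than a control, with ties counted as one half).

```python
def auc(pos, neg):
    """AUC as the probability that a positive sample outscores a negative one."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(pos, neg, cutoff):
    """Sensitivity and specificity when scores >= cutoff are called cancer."""
    sens = sum(p >= cutoff for p in pos) / len(pos)
    spec = sum(n < cutoff for n in neg) / len(neg)
    return sens, spec


def max_if(h4f, h4i):
    """Per-sample maximum of HIST1H4F and HIST1H4I methylation (Max-IF)."""
    return [max(a, b) for a, b in zip(h4f, h4i)]


# Hypothetical methylation percentages, one entry per sample
cancer_h4f, cancer_h4i = [40.0, 5.0, 60.0, 30.0], [10.0, 35.0, 55.0, 3.0]
control_h4f, control_h4i = [2.0, 4.0, 6.0], [3.0, 1.0, 8.0]

cancer = max_if(cancer_h4f, cancer_h4i)
control = max_if(control_h4f, control_h4i)

auc_single = auc(cancer_h4f, control_h4f)
auc_max = auc(cancer, control)
print(auc_single, auc_max, sens_spec(cancer, control, cutoff=10.0))
```

In this toy data, one cancer sample is low for HIST1H4F but high for HIST1H4I, so taking the maximum recovers it. This is the intuition behind combining the two genes.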

To further confirm our results, we used lung cancer primary tissue samples for verification. We collected 25 lung cancer tissue samples and paired para-cancer tissue samples as controls (Supplementary Table 11). Methylation of HIST1H4F and HIST1H4I was detected by bisulfite PCR-pyrosequencing. The results showed that HIST1H4F and HIST1H4I were significantly hypermethylated in lung cancer, and ROC analysis showed very high sensitivity and specificity for each gene (Supplementary Fig. S2F). Max-IF was significantly hypermethylated in lung cancer samples, with an AUC of 0.98, a sensitivity of 96%, and a specificity of 88% (Fig. 2G-H).

Methylation pattern of histone genes for lung cancer diagnosis using bronchoalveolar lavage fluid samples

BALF is of great significance in the early diagnosis of lung cancer (35,36). We therefore tested whether lung cancer could be diagnosed by detecting histone gene methylation in BALF samples. We collected 265 BALF samples, consisting of 59 BLD control samples and 206 lung cancer samples. The BLD control group included samples from patients with pneumonia, emphysema, tuberculosis, and other conditions. The lung cancer group included 92 LUSC, 70 LUAD, and 44 SCLC samples. After obtaining the BALF samples, we randomly divided them into a training set (n=133) and a validation set (n=132) (Table 1).
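A random division of this kind can be sketched as follows. This is a minimal example of a simple (unstratified) random split with a fixed seed; the sample identifiers, the seed, and the function name are hypothetical, and whether the actual split was stratified by diagnosis is not stated here.

```python
import random
from typing import List, Sequence, Tuple


def split_samples(sample_ids: Sequence[str], n_train: int,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle a copy of the IDs reproducibly and cut it into
    a training set of size n_train and a validation set with the rest."""
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# 265 hypothetical BALF sample identifiers.
sample_ids = [f"BALF_{k:03d}" for k in range(1, 266)]
train, valid = split_samples(sample_ids, n_train=133)
print(len(train), len(valid))
```

Fixing the seed makes the partition reproducible, so that the cutoff derived from the training set is always evaluated on the same held-out samples.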

A bisulfite-PCR pyrosequencing assay was used to detect HIST1H4F and HIST1H4I methylation. To assess the reproducibility of pyrosequencing, three technical replicates were performed on a total of 30 BALF samples, including 10 low-methylated (0% ≤ methylation ≤ 5%), 10 middle-methylated (5% < methylation < 20%), and 10 high-methylated (20% ≤ methylation ≤ 100%) samples. The assay performed well in the low, middle, and high methylation groups, with methylation variation within 5% (Supplementary Fig. S3A-C). In both the training set and the validation set, HIST1H4F and HIST1H4I were significantly hypermethylated in the different types of lung cancer (Supplementary Fig. S4A-B). Max-IF was also significantly higher in LUAD, LUSC, SCLC, and all lung cancer samples combined (Fig. 3A). To assess the diagnostic potential of HIST1H4I, HIST1H4F, and Max-IF, we first performed ROC analysis in the training set, where the area under the ROC curve (AUC) was calculated and a cutoff value was determined accordingly; sensitivity and specificity were then calculated based on this cutoff. To estimate the diagnostic accuracy robustly, we then performed an independent evaluation in the validation set, where sensitivity and specificity were recalculated using the cutoff fixed in the training set (Fig. 3B-3C, Supplementary Table 10). For LUSC and SCLC, Max-IF achieved AUCs of 0.94 and 0.97, respectively (Fig. 3B). For LUSC, with a methylation cutoff of 6.05%, the specificity and sensitivity of Max-IF were 96.7% and 86.4% in the training set and 96.5% and 85.4% in the validation set. For SCLC, with a methylation cutoff of 7.75%, the specificity and sensitivity of Max-IF were 96.7% and 95.5% in the training set and 96.5% and 95.7% in the validation set (Fig. 3C, Supplementary Table 10).
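The train-then-validate procedure described above can be sketched in Python as follows. The rule used to choose the cutoff is not specified in this section, so the example assumes Youden's J statistic (sensitivity + specificity - 1) purely for illustration; the data values and function names are likewise hypothetical.

```python
from typing import Sequence, Tuple


def sens_spec(cancer: Sequence[float], control: Sequence[float],
              cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when a value > cutoff is called positive."""
    sens = sum(v > cutoff for v in cancer) / len(cancer)
    spec = sum(v <= cutoff for v in control) / len(control)
    return sens, spec


def youden_cutoff(cancer: Sequence[float], control: Sequence[float]) -> float:
    """Return the observed value that maximizes Youden's J when used as cutoff.
    Assumed selection rule; on ties the lowest such cutoff is kept."""
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(cancer) | set(control)):
        sens, spec = sens_spec(cancer, control, cutoff)
        j = sens + spec - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff


# Hypothetical Max-IF percentages, for illustration only.
train_cancer = [12.0, 30.5, 8.2, 4.1, 45.0]
train_control = [2.0, 3.5, 5.0, 6.0, 1.2]
valid_cancer = [9.0, 5.5, 22.0, 7.1]
valid_control = [1.0, 4.4, 6.5, 2.2]

# Step 1: fix the cutoff on the training set only.
cutoff = youden_cutoff(train_cancer, train_control)

# Step 2: apply the frozen cutoff to the validation set.
valid_sens, valid_spec = sens_spec(valid_cancer, valid_control, cutoff)
print(cutoff, valid_sens, valid_spec)
```

Because the cutoff is frozen before the validation samples are examined, the validation sensitivity and specificity are not inflated by tuning on the same data.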

Compared with LUSC and SCLC, which tend to be centrally located, LUAD is usually found in the periphery of the lungs (37). LUSC and SCLC BALF samples are therefore more likely to contain cancer cells than LUAD BALF samples (38), so the sensitivity for LUAD was expected to be lower than that for LUSC and SCLC. As expected, in LUAD, the specificity and sensitivity of Max-IF were 96.7% and 60.5% in the training set (cutoff=6.3%, AUC=0.84) and 96.5% and 65.6% in the validation set. To improve the detection sensitivity in LUAD, we combined Max-IF with serum carcinoembryonic antigen (CEA). The sensitivity of CEA alone for lung cancer diagnosis is very low (39). In our study, the sensitivities of CEA (cutoff=5 ng/ml) in the training set and validation set were 27.3% and 30.7%, respectively. However, the sensitivity of CEA is much higher in LUAD than in LUSC or SCLC. In the training set, the sensitivities of CEA for LUAD, LUSC, and SCLC were 47.1%, 16.2%, and 14.3%, respectively. In the validation set, they were 50%, 22.2%, and 26.1%, respectively. We therefore combined Max-IF with serum CEA for LUAD diagnosis, calling a sample positive if either marker was positive. With this rule, the sensitivity increased from 60.5% to 77.8% in the training set and from 65.6% to 81.5% in the validation set (Fig. 3D). For all cancer samples, the specificity and sensitivity were 96.7% and 86.0% in the training set and 96.5% and 87.0% in the validation set, indicating that histone gene methylation is an accurate biomarker for lung cancer diagnosis (Fig. 3E).
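The "positive by either marker" rule used for LUAD can be written as a simple logical OR. The sketch below uses the cutoffs quoted above (6.3% for Max-IF in LUAD and 5 ng/ml for CEA); the patient values, the function names, and the choice of a strict "greater than" comparison are assumptions for illustration.

```python
from typing import Sequence, Tuple


def combined_positive(max_if_pct: float, cea_ng_ml: float,
                      max_if_cutoff: float = 6.3,
                      cea_cutoff: float = 5.0) -> bool:
    """Call a sample positive if either marker exceeds its cutoff."""
    return max_if_pct > max_if_cutoff or cea_ng_ml > cea_cutoff


def sensitivity(calls: Sequence[bool]) -> float:
    """Fraction of true cancer samples that were called positive."""
    return sum(calls) / len(calls)


# Hypothetical LUAD patients as (Max-IF %, serum CEA ng/ml) pairs.
luad: Sequence[Tuple[float, float]] = [
    (10.0, 2.0), (3.0, 8.0), (2.0, 1.0), (7.5, 12.0), (4.0, 4.9),
]

max_if_only = sensitivity([m > 6.3 for m, _ in luad])
combined = sensitivity([combined_positive(m, c) for m, c in luad])
print(max_if_only, combined)
```

An OR rule can only keep or raise sensitivity relative to either marker alone; by the same logic it can only keep or lower specificity, so the combined specificity should be checked on the control group as well.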

402


Methylation of the HIST1H4F gene is a potential Universal-Cancer-Only Methylation marker

Having demonstrated that many histone genes are abnormally hypermethylated in lung cancer, we asked whether histone genes are also abnormally methylated in other types of cancer. In total, seventeen cancer cohorts from the TCGA were analyzed: BLCA (bladder urothelial carcinoma, n=433), BRCA (breast invasive carcinoma, n=867), CESC (cervical squamous cell carcinoma and endocervical adenocarcinoma, n=310), CHOL (cholangiocarcinoma, n=45), COAD (colon adenocarcinoma, n=335), ESCA (esophageal carcinoma, n=201), HNSC (head and neck squamous cell carcinoma, n=578), KIRC (kidney renal clear cell carcinoma, n=479), LIHC (liver hepatocellular carcinoma, n=427), LUNG (lung cancer, n=907), PAAD (pancreatic adenocarcinoma, n=194), PRAD (prostate adenocarcinoma, n=548), READ (rectum adenocarcinoma, n=106), SKCM (skin cutaneous melanoma, n=476), STAD (stomach adenocarcinoma, n=398), THCA (thyroid carcinoma, n=563), and UCEC (uterine corpus endometrial carcinoma, n=477) (Supplementary Table 12).

For each cancer type, we calculated the difference in average methylation between normal and cancer samples (Fig. 4A). Most histone genes showed no methylation difference. However, some histone genes tended to be hypermethylated across cancer types, including HIST1H4F, HIST1H3E, HIST1H2BB, HIST1H1A, HIST1H3C, and HIST1H4I, whereas H2BFM and H2BFWT tended to be hypomethylated. Importantly, HIST1H4F was hypermethylated in all tumor types except THCA (Fig. 4B). In THCA, although only a minor methylation difference was observed between normal (median=6.1%) and cancer (median=5.4%) samples, we found that HIST1H4F was hypermethylated in the different stages of cancer compared with normal samples (Supplementary Fig. S5). Therefore, we considered HIST1H4F hypermethylation a conserved feature across almost all types of cancer and named it “Universal-Cancer-Only Methylation (UCOM)”. Furthermore, we analyzed the relationship between HIST1H4F methylation and tumor stage or patient outcome in the eight tumor types with larger sample sizes in the TCGA database (Supplementary Fig. S6A-S6C and Supplementary Fig. S7A-S7G). HIST1H4F was already hypermethylated in stage I of all eight cancer types, without significant differences among stages. Moreover, ROC analysis showed that the AUCs were similar across stages (Supplementary Table 13). These results indicate that the HIST1H4F locus becomes methylated during the initiation of cancer development. In addition, survival analysis in these eight cancer types showed no significant differences in patient outcome among the low, middle, and high methylation groups (Supplementary Table 14). Taken together, our results suggest that hypermethylation of HIST1H4F can serve as a useful early diagnostic marker for multiple types of cancer.
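The per-cohort comparison that opens this analysis (average methylation in cancer samples minus average methylation in normal samples) can be sketched as below. The cohort names follow the TCGA abbreviations used above, but the methylation values and the data layout are hypothetical and serve only to show the calculation.

```python
from statistics import mean
from typing import Dict, List


def methylation_difference(tumor: List[float], normal: List[float]) -> float:
    """Mean tumor methylation minus mean normal methylation, in percentage points.
    Positive values indicate hypermethylation in tumors."""
    return mean(tumor) - mean(normal)


# Hypothetical HIST1H4F methylation percentages per cohort.
cohorts: Dict[str, Dict[str, List[float]]] = {
    "LUNG": {"tumor": [30.0, 42.0, 18.0], "normal": [4.0, 6.0, 5.0]},
    "THCA": {"tumor": [5.0, 6.0, 5.5], "normal": [6.0, 6.5, 5.5]},
}

diffs = {
    name: methylation_difference(data["tumor"], data["normal"])
    for name, data in cohorts.items()
}
print(diffs)
```

Repeating this for every histone gene and every cohort yields a gene-by-cohort matrix of differences, which is the kind of summary a heatmap such as Fig. 4A would display.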

To further confirm HIST1H4F as a Universal-Cancer-Only Methylation marker, we selected 243 clinical samples covering nine cancer types, including the 50 lung cancer samples shown previously and another 193 samples from eight other cancer types (Supplementary Table 4). Methylation of HIST1H4F in these samples was measured by bisulfite-PCR pyrosequencing. HIST1H4F was significantly hypermethylated in all nine types of cancer (Fig. 4C). ROC analysis of HIST1H4F methylation showed that the AUCs in all nine cancers were above 0.87, suggesting that HIST1H4F is an ideal Universal-Cancer-Only Methylation marker (Supplementary Table 10). If HIST1H4F is a Universal-Cancer-Only Methylation marker, its DNA methylation level should reflect the proportion of cancer cells mixed with non-cancer cells in a clinical sample. To verify this point, we mixed normal cells (lung fibroblast cell line MRC5 or normal liver cells) with cancer cells (lung cancer cell line A549 or liver cancer cell line HepG2) in proportions of 0%, 25%, 50%, 75%, and 100%. We then measured the methylation level of each mixture by bisulfite-PCR pyrosequencing. As expected, the measured methylation level closely reflected the percentage of cancer cell DNA mixed with normal DNA. These results indicate that HIST1H4F is not only a Universal-Cancer-Only Methylation marker but can also be used to estimate the cancer cell ratio in clinical samples (Supplementary Fig. S8A-S8B).
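The reasoning behind the mixing experiment can be made explicit with a two-component linear model: if a fraction f of the DNA comes from cancer cells, the expected methylation is f times the pure-cancer level plus (1 - f) times the pure-normal level, and the relation can be inverted to estimate f. The sketch below assumes this linear model and uses hypothetical pure-population methylation levels; it is not the calculation behind Supplementary Fig. S8.

```python
def expected_methylation(cancer_fraction: float, m_cancer: float,
                         m_normal: float) -> float:
    """Expected methylation (%) of a mixture under a linear two-component model."""
    return cancer_fraction * m_cancer + (1.0 - cancer_fraction) * m_normal


def estimate_cancer_fraction(m_observed: float, m_cancer: float,
                             m_normal: float) -> float:
    """Invert the linear model to estimate the cancer DNA fraction,
    clipped to the interval [0, 1]."""
    if m_cancer == m_normal:
        raise ValueError("pure cancer and normal levels must differ")
    fraction = (m_observed - m_normal) / (m_cancer - m_normal)
    return min(1.0, max(0.0, fraction))


# Hypothetical methylation levels of the pure populations.
M_CANCER = 80.0  # e.g. a fully cancer-derived sample
M_NORMAL = 4.0   # e.g. a fully normal sample

fractions = [0.0, 0.25, 0.5, 0.75, 1.0]
expected = [expected_methylation(f, M_CANCER, M_NORMAL) for f in fractions]
print(expected)
print(estimate_cancer_fraction(42.0, M_CANCER, M_NORMAL))
```

The estimate is only as good as the assumed pure-population levels and presumes that cancer and normal cells contribute DNA in proportion to their cell numbers.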

In summary, we examined nine types of cancer. Although many rarer cancer types have not yet been verified, we speculate that HIST1H4F is hypermethylated in many other cancer types as well. We therefore conclude that HIST1H4F may be a promising Universal-Cancer-Only Methylation marker for screening patients with early-stage cancer; its role in tumorigenesis awaits further study.

DNA methylation is usually correlated with gene expression, so we asked whether abnormal hypermethylation of HIST1H4F influenced its expression. We analyzed HIST1H4F expression in fifteen tumor types in the TCGA database (tumor types without normal controls were excluded). In most tumor types, HIST1H4F showed no (or very low) expression in both normal controls and tumors (Supplementary Fig. S9A). We verified this in the cultured normal lung fibroblast cell line MRC5 and the lung cancer cell line A549, in which we measured both DNA methylation and gene expression of HIST1H4F. HIST1H4F was hypermethylated in A549 cells and hypomethylated in MRC5 cells (Supplementary Fig. S9B), but was not expressed in either cell line (Supplementary Fig. S9C). These unexpected results indicate that the expression of HIST1H4F itself may not be involved in tumorigenesis. Instead, the epigenetic status of the HIST1H4F locus may affect chromatin information or structure, which in turn alters cancer-related gene expression during tumor initiation. This view is further supported by the finding that the genomic sequences of the histone H4 genes differ from one another but encode almost the same amino acid sequence (Supplementary Fig. S9D-E).

482

483 Discussion

WGBS is the most comprehensive method for detecting genome-wide DNA methylation (23). However, few reports have directly searched for methylation biomarkers in WGBS datasets. Here, we developed a new strategy to analyze WGBS data and to screen efficiently for new genome-wide methylation markers of lung cancer. These markers were further verified using TCGA data and clinical cancer samples. Through these analyses, we unexpectedly found that many histone genes were abnormally hypermethylated in lung cancer. The methylation status of HIST1H4F and HIST1H4I in BALF samples can be used as an effective approach for the early diagnosis of lung cancer, with a specificity of 96.7% and a sensitivity of 87.0%.

The TCGA program provides a wealth of data for studying tumors, especially pan-cancer characteristics, and a series of high-profile studies have been published, including pan-cancer analyses of signaling pathways (40,41), genetic alterations (42-44), molecular-based tumor reclassification (45), and DNA methylation (46). These studies have provided informative views of cancer from different perspectives. However, the pan-cancer markers reported so far combine many genes for cancer diagnosis, and few reports have described a single gene or locus that can be used to screen for all cancer types. This may be because the methylation data in the TCGA database were measured using the 450K methylation array, which covers only about 2% of all CpG sites in the genome, so that most of the genome's methylation information is missing. Therefore, combining WGBS data with TCGA data is an efficient strategy for screening DNA methylation biomarkers across the genome.

Histones are an important family of housekeeping genes expressed in almost all organisms. To ensure stable histone expression, each histone protein is encoded by many histone genes. The spatial and temporal regulation of histone gene expression differs markedly from that of other genes (17,19). In addition, histone modifications have been extensively studied. However, there has been no systematic study of abnormal methylation of the histone gene loci themselves. Alterations in the chromatin structure of the histone gene cluster 1 region have been found in breast cancer (20). Likewise, the histone gene cluster 1 genomic region has been reported to be abnormally enriched in H3K27me3 in acute myeloid leukemia (AML) (47). Interestingly, we found aberrant DNA methylation at many histone loci located in histone gene cluster 1. We further analyzed the expression of HIST1H4F in fifteen tumor types in the TCGA database, and the results showed that HIST1H4F has no (or very low) expression in normal tissues and tumors of different cancer types. We propose that these aberrant DNA methylation events may affect CTCF binding, which would in turn alter the chromatin structure of histone gene cluster 1 during cancer development (48,49), and that the epigenetic status or higher-order chromatin structure of the histone loci, rather than their expression, may be involved in tumor initiation. More interestingly, histone genes in cluster 1 are also methylated in different types of cancer, which suggests that aberrant DNA methylation in the histone gene cluster 1 region may be involved in mechanisms common to the development of multiple cancer types. It will be interesting to explore this in the future.

To extend these unexpected findings, we analyzed seventeen cancer cohorts in the TCGA database and found that many histone genes are abnormally hypermethylated not only in lung cancer but also in many other tumors. Moreover, we were surprised to find that HIST1H4F is hypermethylated in almost all cancer types and is both highly sensitive and specific as a potential Universal-Cancer-Only Methylation marker, which was further verified in a total of 243 clinical samples covering nine tumor types. Unlike most reported multigene panels for pan-cancer diagnosis (50-52), HIST1H4F is a single-gene candidate Universal-Cancer-Only Methylation marker, which was a completely unexpected finding and could be of great convenience and significance in subsequent clinical applications. Meanwhile, further exploring the underlying mechanism of HIST1H4F in cancer development may help us better understand the common features of tumorigenesis. The epigenetic status and chromatin structure of the HIST1H4F locus may be of great significance for understanding the general mechanism of cancer development, and reversing DNA methylation at specific histone loci may be a potential common strategy for future cancer treatment.

542

543 Acknowledgments

544 We thank Yan Li, Lina Peng, Huaibing Luo, ZhiCong Chu, Yao Xiao, Min Xiao, Ying

545 Guo, Lu Chen and Lan Zhang for experimental help. We thank Ruitu Lv and Feizhen Wu

546 for bioinformatic analysis help. We thank Yue Yu, Zhicong Yang, Ying Tong and

547 Zhiqiang Hu for editorial help and useful comments on the manuscript.


548 References

549

1. Hirsch FR, Scagliotti GV, Mulshine JL, Kwon R, Curran WJ, Jr., Wu YL, et al. Lung cancer: current therapies and new targeted treatments. Lancet 2017;389:299-311
2. Melosky B, Chu Q, Juergens R, Leighl N, McLeod D, Hirsh V. Pointed Progress in Second-Line Advanced Non-Small-Cell Lung Cancer: The Rapidly Evolving Field of Checkpoint Inhibition. J Clin Oncol 2016;34:1676-88
3. Sozzi G, Boeri M. Potential biomarkers for lung cancer screening. Transl Lung Cancer Res 2014;3:139-48
4. National Lung Screening Trial Research T, Aberle DR, Adams AM, Berg CD, Black WC, Clapp JD, et al. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med 2011;365:395-409
5. Kanodra NM, Silvestri GA, Tanner NT. Screening and early detection efforts in lung cancer. Cancer 2015;121:1347-56
6. Singhal S, Vachani A, Antin-Ozerkis D, Kaiser LR, Albelda SM. Prognostic implications of cell cycle, apoptosis, and angiogenesis biomarkers in non-small cell lung cancer: a review. Clin Cancer Res 2005;11:3974-86
7. Kathuria H, Gesthalter Y, Spira A, Brody JS, Steiling K. Updates and controversies in the rapidly evolving field of lung cancer screening, early detection, and chemoprevention. Cancers (Basel) 2014;6:1157-79
8. Risch A, Plass C. Lung cancer epigenetics and genetics. Int J Cancer 2008;123:1-7
9. Mundbjerg K, Chopra S, Alemozaffar M, Duymich C, Lakshminarasimhan R, Nichols PW, et al. Identifying aggressive prostate cancer foci using a DNA methylation classifier. Genome Biol 2017;18:3
10. Nguyen LV, Pellacani D, Lefort S, Kannan N, Osako T, Makarem M, et al. Barcoding reveals complex clonal dynamics of de novo transformed human mammary cells. Nature 2015;528:267-71
11. Li J, Li Y, Li W, Luo H, Xi Y, Dong S, et al. Guide Positioning Sequencing identifies aberrant DNA methylation patterns that alter cell identity and tumor-immune surveillance networks. Genome Res 2019;29:270-80
12. Dor Y, Cedar H. Principles of DNA methylation and their implications for biology and medicine. Lancet 2018;392:777-86
13. Koch A, Joosten SC, Feng Z, de Ruijter TC, Draht MX, Melotte V, et al. Analysis of DNA methylation in cancer: location revisited. Nat Rev Clin Oncol 2018;15:459-66
14. Vargas AJ, Harris CC. Biomarker development in the precision medicine era: lung cancer as a case study. Nat Rev Cancer 2016;16:525-37
15. Hu Y, Lai Y. Identification and expression analysis of rice histone genes. Plant Physiol Biochem 2015;86:55-65
16. Bhasin M, Reinherz EL, Reche PA. Recognition and classification of histones using support vector machine. J Comput Biol 2006;13:102-12
17. Isogai Y, Keles S, Prestel M, Hochheimer A, Tjian R. Transcription of histone gene cluster by differential core-promoter factors. Genes Dev 2007;21:2936-49


18. Buschbeck M, Hake SB. Variants of core histones and their roles in cell fate decisions, development and cancer. Nat Rev Mol Cell Biol 2017;18:299-314
19. Braastad CD, Hovhannisyan H, van Wijnen AJ, Stein JL, Stein GS. Functional characterization of a human histone gene cluster duplication. Gene 2004;342:35-40
20. Fritz AJ, Ghule PN, Boyd JR, Tye CE, Page NA, Hong D, et al. Intranuclear and higher-order chromatin organization of the major histone gene cluster in breast cancer. J Cell Physiol 2018;233:1278-90
21. Xi Y, Li W. BSMAP: whole genome bisulfite sequence MAPping program. BMC Bioinformatics 2009;10:232
22. Yong WS, Hsu FM, Chen PY. Profiling genome-wide DNA methylation. Epigenetics Chromatin 2016;9:26
23. Chatterjee A, Rodger EJ, Morison IM, Eccles MR, Stockwell PA. Tools and Strategies for Analysis of Genome-Wide and Gene-Specific DNA Methylation Patterns. Methods Mol Biol 2017;1537:249-77
24. Guo S, Diep D, Plongthongkum N, Fung HL, Zhang K, Zhang K. Identification of methylation haplotype blocks aids in deconvolution of heterogeneous tissue samples and tumor tissue-of-origin mapping from plasma DNA. Nat Genet 2017;49:635-42
25. Zhao CM, Hayakawa Y, Kodama Y, Muthupalani S, Westphalen CB, Andersen GT, et al. Denervation suppresses gastric tumorigenesis. Sci Transl Med 2014;6:250ra115
26. Zahalka AH, Arnal-Estape A, Maryanovich M, Nakahara F, Cruz CD, Finley LWS, et al. Adrenergic nerves activate an angio-metabolic switch in prostate cancer. Science 2017;358:321-6
27. Magnon C, Hall SJ, Lin J, Xue X, Gerber L, Freedland SJ, et al. Autonomic nerve development contributes to prostate cancer progression. Science 2013;341:1236361
28. Ilse P, Biesterfeld S, Pomjanski N, Wrobel C, Schramm M. Analysis of SHOX2 methylation as an aid to cytology in lung cancer diagnosis. Cancer Genomics Proteomics 2014;11:251-8
29. Pradhan MP, Desai A, Palakal MJ. Systems biology approach to stage-wise characterization of epigenetic genes in lung adenocarcinoma. BMC Syst Biol 2013;7:141
30. Ooki A, Maleki Z, Tsay JJ, Goparaju C, Brait M, Turaga N, et al. A Panel of Novel Detection and Prognostic Methylated DNA Markers in Primary Non-Small Cell Lung Cancer and Serum DNA. Clin Cancer Res 2017;23:7141-52
31. Diaz-Lagares A, Mendez-Gonzalez J, Hervas D, Saigi M, Pajares MJ, Garcia D, et al. A Novel Epigenetic Signature for Early Diagnosis in Lung Cancer. Clin Cancer Res 2016;22:3361-71
32. Su J, Huang YH, Cui X, Wang X, Zhang X, Lei Y, et al. Homeobox oncogene activation by pan-cancer DNA hypermethylation. Genome Biol 2018;19:108
33. Cedar H, Bergman Y. Linking DNA methylation and histone modification: patterns and paradigms. Nat Rev Genet 2009;10:295-304
34. Hammond CM, Stromme CB, Huang H, Patel DJ, Groth A. Histone chaperone networks shaping chromatin function. Nat Rev Mol Cell Biol 2017;18:141-58


35. Wang H, Zhang X, Liu X, Liu K, Li Y, Xu H. Diagnostic value of bronchoalveolar lavage fluid and serum tumor markers for lung cancer. J Cancer Res Ther 2016;12:355-8
36. Poletti V, Poletti G, Murer B, Saragoni L, Chilosi M. Bronchoalveolar lavage in malignancy. Semin Respir Crit Care Med 2007;28:534-45
37. Collins LG, Haines C, Perkel R, Enck RE. Lung cancer: diagnosis and management. Am Fam Physician 2007;75:56-63
38. Sareen R, Pandey CL. Lung malignancy: Diagnostic accuracies of bronchoalveolar lavage, bronchial brushing, and fine needle aspiration cytology. Lung India 2016;33:635-41
39. Holdenrieder S, Wehnl B, Hettwer K, Simon K, Uhlig S, Dayyani F. Carcinoembryonic antigen and cytokeratin-19 fragments for assessment of therapy response in non-small cell lung cancer: a systematic review and meta-analysis. Br J Cancer 2017;116:1037-45
40. Sanchez-Vega F, Mina M, Armenia J, Chatila WK, Luna A, La KC, et al. Oncogenic Signaling Pathways in The Cancer Genome Atlas. Cell 2018;173:321-37 e10
41. Chen H, Li C, Peng X, Zhou Z, Weinstein JN, Cancer Genome Atlas Research N, et al. A Pan-Cancer Analysis of Enhancer Expression in Nearly 9000 Patient Samples. Cell 2018;173:386-99 e12
42. Korkut A, Zaidi S, Kanchi RS, Rao S, Gough NR, Schultz A, et al. A Pan-Cancer Analysis Reveals High-Frequency Genetic Alterations in Mediators of Signaling by the TGF-beta Superfamily. Cell Syst 2018;7:422-37 e7
43. Huang KL, Mashl RJ, Wu Y, Ritter DI, Wang J, Oh C, et al. Pathogenic Germline Variants in 10,389 Adult Cancers. Cell 2018;173:355-70 e14
44. Bailey MH, Tokheim C, Porta-Pardo E, Sengupta S, Bertrand D, Weerasinghe A, et al. Comprehensive Characterization of Cancer Driver Genes and Mutations. Cell 2018;174:1034-5
45. Hoadley KA, Yau C, Hinoue T, Wolf DM, Lazar AJ, Drill E, et al. Cell-of-Origin Patterns Dominate the Molecular Classification of 10,000 Tumors from 33 Types of Cancer. Cell 2018;173:291-304 e6
46. Saghafinia S, Mina M, Riggi N, Hanahan D, Ciriello G. Pan-Cancer Landscape of Aberrant DNA Methylation across Human Tumors. Cell Rep 2018;25:1066-80 e8
47. Tiberi G, Pekowska A, Oudin C, Ivey A, Autret A, Prebet T, et al. PcG methylation of the HIST1 cluster defines an epigenetic marker of acute myeloid leukemia. Leukemia 2015;29:1202-6
48. Bonev B, Cavalli G. Organization and function of the 3D genome. Nat Rev Genet 2016;17:661-78
49. Dixon JR, Xu J, Dileep V, Zhan Y, Song F, Le VT, et al. Integrative detection and analysis of structural variation in cancer genomes. Nat Genet 2018;50:1388-98
50. Yang X, Gao L, Zhang S. Comparative pan-cancer DNA methylation analysis reveals cancer common and specific patterns. Brief Bioinform 2017;18:761-73
51. Hao X, Luo H, Krawczyk M, Wei W, Wang W, Wang J, et al. DNA methylation markers for diagnosis and prognosis of common cancers. Proc Natl Acad Sci U S A 2017;114:7414-9
52. Brena RM, Plass C, Costello JF. Mining methylation for early detection of common cancers. PLoS Med 2006;3:e479


Tables

Table 1. Clinical information of the training set and validation set (bronchoalveolar lavage fluid samples).

Training Set
Characteristics   BLD (n=30)   LUAD (n=38)  LUSC (n=44)  SCLC (n=21)  Total (n=103)
Age (years)
  Mean±SEM        55.8±2.1     62.0±1.5     64.1±1.4     56.6±1.9     61.8±0.9
  Range           34-72        44-76        31-79        44-76        31-79
Gender
  Female (%)      12(40.0)     12(31.6)     3(6.8)       3(14.3)      18(17.5)
  Male (%)        18(60.0)     26(68.4)     41(93.2)     18(85.7)     85(82.5)
Stage
  Stage I (%)     -            10(26.3)     13(30.0)     8(38.1)      31(30.1)
  Stage II (%)    -            11(28.9)     12(27.3)     3(14.3)      26(25.2)
  Stage III (%)   -            10(26.3)     13(30.0)     7(33.3)      30(29.1)
  Stage IV (%)    -            7(18.4)      6(13.6)      3(14.3)      16(15.5)

Validation Set
Characteristics   BLD (n=29)   LUAD (n=32)  LUSC (n=48)  SCLC (n=23)  Total (n=103)
Age (years)
  Mean±SEM        53.5±2.3     60.2±1.6     61.2±1.4     59.7±1.7     60.7±0.9
  Range           35-80        43-80        39-80        46-76        39-80
Gender
  Female (%)      12(41.4)     10(31.2)     4(8.3)       5(21.7)      19(18.4)
  Male (%)        17(58.6)     22(68.8)     44(91.7)     18(78.3)     84(81.6)
Stage
  Stage I (%)     -            13(40.6)     14(29.2)     9(39.1)      36(35.0)
  Stage II (%)    -            7(21.9)      16(33.3)     4(17.4)      27(26.2)
  Stage III (%)   -            5(15.6)      13(27.1)     6(26.1)      24(23.3)
  Stage IV (%)    -            7(21.9)      5(10.4)      4(17.4)      16(15.5)


Figure Legend

Fig. 1. Systematic analysis of WGBS data and validation with TCGA datasets. A, Outline of WGBS data analysis and TCGA data validation. WGBS data from four normal cell (NC) samples and three cancer cell (CC) samples were collected, and the CpG sites detected in all seven samples were selected for subsequent analysis. A DMS was defined by the methylation difference between CCi and NCj, a DMR was defined as three consecutive DMS within a 100-bp region, and a DMG was defined as a gene containing a DMR. CCi represents any of the CC samples, and NCj represents any of the NC samples. B, Average methylation level of each normal and cancer sample in the WGBS data, showing that cancer genomes are globally hypomethylated (Wilcoxon test, P=0.057). C-F, Genomic distribution of all detected CpG sites, CC-DMS, NC-DMS, and NO-DMS, respectively; the promoter region was defined as TSS ± 1 kb. G, Heatmap of CC-DMG and NC-DMG methylation from the WGBS data; each row represents one gene. H, Validation of CC-DMG and NC-DMG methylation in TCGA datasets: each blue dot represents a CC-DMG and each red dot represents an NC-DMG; the x-axis shows the average methylation of normal samples in the TCGA data, and the y-axis shows the average methylation of cancer samples in the TCGA data. NC, Normal Cell. CC, Cancer Cell. DMS, Differentially Methylated Sites. DMR, Differentially Methylated Regions. DMG, Differentially Methylated Genes. TSS, Transcriptional Start Sites.
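The DMS and DMR definitions in panel A can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline, and it rests on several assumptions: methylation levels are percentages; a CC-DMS requires every cancer sample to exceed every normal sample by the threshold (one possible reading of "any of" for CCi and NCj); the 50% threshold is the value printed in the Fig. 1A flowchart; and a DMR is read as three neighbouring DMS spanning fewer than 100 bp. The function names and the input layout are hypothetical.

```python
from typing import List, Sequence, Tuple

# One CpG site: (genomic position, normal-sample levels, cancer-sample levels).
Site = Tuple[int, Sequence[float], Sequence[float]]


def call_cc_dms(sites: Sequence[Site], threshold: float = 50.0) -> List[int]:
    """Return positions where every cancer sample exceeds every normal
    sample by at least `threshold` percentage points (assumed reading)."""
    return [pos for pos, normal, cancer in sites
            if min(cancer) - max(normal) >= threshold]


def call_dmr(dms_positions: Sequence[int],
             min_sites: int = 3,
             window: int = 100) -> List[Tuple[int, int]]:
    """Group DMS positions into regions in which `min_sites` neighbouring
    DMS span fewer than `window` bp; overlapping windows are merged."""
    pos = sorted(dms_positions)
    regions: List[Tuple[int, int]] = []
    for i in range(len(pos) - min_sites + 1):
        start, end = pos[i], pos[i + min_sites - 1]
        if end - start < window:
            if regions and start <= regions[-1][1]:
                # Extend the previous region when the windows overlap.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    # Invented toy values, for illustration only.
    toy_sites = [
        (100, [10, 5, 8, 12], [70, 80, 65]),
        (130, [10, 10, 10, 10], [55, 90, 90]),
    ]
    print(call_cc_dms(toy_sites))
    print(call_dmr([100, 130, 160, 190, 1000, 1500, 1510, 1520]))
```

NC-DMS would follow by swapping the roles of the normal and cancer samples, and a DMG would then be any gene whose coordinates contain at least one returned region.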


Fig. 2. Histone genes are hypermethylated in lung cancer. A, The histone gene family is divided into seven groups according to the DNA methylation patterns of normal cells and cancer cells in the WGBS data. B, Fourteen histone genes in group 4 were hypermethylated in lung cancer cells in the WGBS data. C, Hypermethylation of the fourteen group 4 histone genes was validated in the TCGA lung cancer cohort; nine of the fourteen are hypermethylated in both LUAD and LUSC. Box-and-whisker plots: the box represents the upper quartile, lower quartile, and median; whiskers represent min to max. NS, not significant. ***, P<0.001. ****, P<0.0001. P values were calculated using the two-tailed nonparametric Mann-Whitney test in GraphPad Prism 7.0. D, Four histone genes in group 5 were specifically hypermethylated in the LUAD sample in the WGBS data. E, Four histone genes in group 6 were specifically hypermethylated in the LUSC sample in the WGBS data. F, Six histone genes in group 7 were specifically hypermethylated in the SCLC sample in the WGBS data. G, The maximum methylation of HIST1H4F and HIST1H4I (Max-IF) is significantly higher in primary lung cancer tissues. Error bars represent the upper quartile, lower quartile, and median. The P value was calculated using the two-tailed, paired, nonparametric Wilcoxon matched-pairs signed rank test in GraphPad Prism 7.0. H, ROC analysis of Max-IF in primary lung cancer tissue: the AUC (area under the curve) is 0.98 (95% CI 0.95-1.00, P<0.0001), with a specificity of 88.0% and a sensitivity of 96.0%.
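As a rough illustration of how Max-IF and the ROC-derived metrics in panels G and H relate, the sketch below takes the larger of the two gene methylation values per sample and then computes sensitivity, specificity, and AUC. It is not the authors' analysis (which used GraphPad Prism); the function names, the rule that a sample is positive when its score is strictly above the cutoff, and the toy values are all assumptions. The 9.95% cutoff is the value printed in the Fig. 2H panel.

```python
from typing import Sequence, Tuple


def max_if(hist1h4f: float, hist1h4i: float) -> float:
    """Max-IF: the larger of the HIST1H4F and HIST1H4I methylation levels (%)."""
    return max(hist1h4f, hist1h4i)


def sensitivity_specificity(cancer: Sequence[float],
                            control: Sequence[float],
                            cutoff: float) -> Tuple[float, float]:
    """Fraction of cancer samples above the cutoff and fraction of
    control samples at or below it (assumed decision rule)."""
    true_pos = sum(score > cutoff for score in cancer)
    true_neg = sum(score <= cutoff for score in control)
    return true_pos / len(cancer), true_neg / len(control)


def auc(cancer: Sequence[float], control: Sequence[float]) -> float:
    """AUC as the probability that a cancer sample scores higher than a
    control sample, counting ties as one half (Mann-Whitney formulation)."""
    wins = sum((c > n) + 0.5 * (c == n) for c in cancer for n in control)
    return wins / (len(cancer) * len(control))


if __name__ == "__main__":
    # Invented (HIST1H4F, HIST1H4I) methylation pairs, for illustration only.
    cancer_scores = [max_if(f, i) for f, i in [(30, 12), (5, 45), (8, 9)]]
    control_scores = [max_if(f, i) for f, i in [(2, 3), (11, 4), (1, 6)]]
    print(sensitivity_specificity(cancer_scores, control_scores, cutoff=9.95))
    print(auc(cancer_scores, control_scores))
```

Taking the maximum means a sample is called positive if either gene exceeds the cutoff, which tends to raise sensitivity relative to either gene alone at some cost in specificity.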


Fig. 3. HIST1H4F and HIST1H4I were used as lung cancer biomarkers in BALF samples. A, The maximum methylation of HIST1H4F and HIST1H4I (Max-IF) is significantly higher in LUAD, LUSC, SCLC and total lung cancer in the BALF training set (left) and the validation set (right). Box-and-whisker plots: the box represents the upper quartile, lower quartile, and median; whiskers represent min to max. ****, P < 0.0001. P values for all analyses were calculated using the two-tailed nonparametric Mann-Whitney test in GraphPad Prism 7.0. B, ROC analysis of Max-IF in the training set: LUAD (AUC=0.84, 95% CI 0.74-0.93, P<0.0001), LUSC (AUC=0.94, 95% CI 0.89-1.00, P<0.0001), SCLC (AUC=0.97, 95% CI 0.92-1.00, P<0.0001) and total lung cancer (AUC=0.91, 95% CI 0.86-0.96, P<0.0001). C, Sensitivity and specificity for LUAD, LUSC, SCLC, and total lung cancer in the training set (left) and validation set (right). D, The sensitivity for LUAD of Max-IF combined with CEA is much higher than that of Max-IF or CEA alone. E, Overall sensitivity and specificity of HIST1H4I and HIST1H4F as a lung cancer diagnostic marker in the training set and validation set. BLD, benign lung disease, including pneumonia, emphysema, and tuberculosis, among others. LUAD, lung adenocarcinoma. LUSC, lung squamous cell carcinoma. SCLC, small cell lung carcinoma. Total, total lung cancer, which includes LUAD, LUSC, and SCLC.


Fig. 4. HIST1H4F as a Universal-Cancer-Only Methylation (UCOM) marker. A, Histone gene family methylation in 17 different types of cancer: for each histone gene in each cancer type, the average methylation difference between normal and cancer samples of that cancer type was calculated. The color shows the degree of the average methylation difference: a negative value means that the histone gene is hypomethylated, and a positive value means that it is hypermethylated. B, HIST1H4F is hypermethylated in different types of cancer in the TCGA data. Ten CESC, ten STAD and six SKCM para-cancer samples were collected by us from primary tissues because the TCGA database contains too few (n ≤ 3) control samples. Box-and-whisker plots: the box represents the upper quartile, lower quartile, and median; whiskers represent min to max; light-colored boxes represent para-cancer control samples, and dark-colored boxes represent cancer samples. NS, not significant. *, P < 0.1. **, P < 0.01. ***, P< 0.001, ****, P<0.0001. P values for all analyses were calculated using the two-tailed nonparametric Mann-Whitney test in GraphPad Prism 7.0. C, Validation of HIST1H4F methylation in 8 other types of cancer besides lung cancer. Error bars represent the upper quartile, lower quartile and median. P values for esophagus cancer, colorectal cancer, pancreatic cancer, and head and neck cancer were calculated using the two-tailed, paired, nonparametric Wilcoxon matched-pairs signed rank test in GraphPad Prism 7.0. P values for cervical cancer, gastric cancer, breast cancer, and liver cancer were calculated using the two-tailed nonparametric Mann-Whitney test in GraphPad Prism 7.0.
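The per-gene calculation described for panel A can be sketched as follows. This is only an illustration under stated assumptions: the difference is taken as cancer mean minus normal mean, so that positive values indicate hypermethylation in cancer as the legend describes; the function name, the dictionary layout, and the toy values are hypothetical and are not taken from the TCGA data.

```python
from statistics import mean
from typing import Dict, Sequence

# Methylation levels (%) per gene for one cancer type.
GeneLevels = Dict[str, Sequence[float]]


def mean_methylation_difference(normal: GeneLevels,
                                cancer: GeneLevels) -> Dict[str, float]:
    """For each gene present in both groups, return the mean cancer
    methylation minus the mean normal methylation."""
    return {gene: mean(cancer[gene]) - mean(normal[gene])
            for gene in normal if gene in cancer}


if __name__ == "__main__":
    # Invented toy values for a single cancer type, for illustration only.
    normal_levels = {"HIST1H4F": [10.0, 20.0], "H2AFZ": [80.0, 90.0]}
    cancer_levels = {"HIST1H4F": [50.0, 70.0], "H2AFZ": [60.0, 70.0]}
    print(mean_methylation_difference(normal_levels, cancer_levels))
```

Repeating the call once per cancer type yields one column of the heatmap per type, with one row per histone gene.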


Figure 1. [Image not reproduced; text recovered from the panels is listed below.]

A, Workflow: WGBS Data, All CpG Sites, DMS, DMR, DMG. Normal: NC-1, NC-2, NC-3, NC-4. Cancer: CC-1, CC-2, CC-3. All detected CpG sites: 19461312 (34.5% of genome).
CC-DMS, (CCi - NCj) ≥ 50%. WGBS: 24257 CC-DMS; 2408 CC-DMR (17670 CC-DMS); 958 CC-DMG (1508 CC-DMR, 11233 CC-DMS). 450K Methylation Array: 845 CC-DMS; 488 CC-DMR (624 CC-DMS); 251 CC-DMG (401 CC-DMS).
NC-DMS, (NCj - CCi) ≥ 50%. WGBS: 442233 NC-DMS; 36393 NC-DMR (165221 NC-DMS); 1925 NC-DMG (11430 NC-DMR, 52454 NC-DMS). 450K Methylation Array: 1662 NC-DMS; 736 NC-DMR (840 CC-DMS); 200 NC-DMG (377 CC-DMS).
NO-DMS, |CCi - NCj| ≤ 20%. WGBS: 4456347 NO-DMS; 435249 NO-DMR; 19575 NO-DMG. NO-DMS by methylation level: [0,20): 961219 CpG sites; [20,40): 426 CpG sites; [40,60): 429 CpG sites; [60,80): 5920 CpG sites; [80,100]: 3488353 CpG sites.

B, Average Methylation. y-axis: Methylation Level (%).

C-F, Genomic distribution (Promoter, Exon, Intron, Intergenic) of 19461312 All Detected CpG Sites, 24257 CC-DMS, 442233 NC-DMS, and 4456347 NO-DMS.

G, DMG Methylation in WGBS (CC-DMG and NC-DMG). Color scale: Methylation level (%).

H, DMG Methylation Verified in TCGA Data (CC-DMG and NC-DMG). x-axis: Normal Methylation Level (%). y-axis: Cancer Methylation Level (%).

Figure 2. [Image not reproduced; text recovered from the panels is listed below.]

A, Methylation of histone genes in lung cancer samples from WGBS data (NC-1, NC-2, NC-3, NC-4, CC-1, CC-2, CC-3), Groups 1 to 7. Color scale: Methylation (%).
Group 4: HIST1H4I, HIST1H2BM, HIST1H3C, HIST1H4F, HIST1H2BB, HIST1H2BE, HIST1H1A, HIST1H2BI, HIST1H3G, HIST1H2AD, HIST1H2BF, HIST1H3J, HIST1H2BH, HIST1H4D.
Group 5: HIST1H2AG, HIST3H2A, HIST3H2BB, HIST1H3F.
Group 6: HIST1H4A, HIST1H3A, HIST1H2AL, HIST1H3I.
Group 7: HIST1H2BL, HIST2H3D, HIST1H2AJ, H2AFJ, HIST1H2AI, HIST1H1D.

B, Lung Cancer Specific Hypermethylated Histone Genes. y-axis: Methylation level (%).

C, Methylation of histone genes in lung cancer validated by TCGA data: Normal (n=75), LUAD (n=460), LUSC (n=372). y-axis: Methylation Level (%).

D, LUAD Specific Hypermethylated Genes in WGBS. E, LUSC Specific Hypermethylated Genes in WGBS. F, SCLC Specific Hypermethylated Genes in WGBS. y-axis: Methylation level (%).

G, Max-IF in Primary Tissue: Lung Cancer (n=25), Para-Cancer Ctrl (n=25). y-axis: Methylation Level (%).

H, ROC: Max-IF in Primary Tissue. AUC = 0.98; P value < 0.0001; Cutoff = 9.95%; Sensitivity = 96.0%; Specificity = 88.0%; 95% CI (0.95-1.00). x-axis: 100% - Specificity%. y-axis: Sensitivity%.

Figure 3. [Image not reproduced; text recovered from the panels is listed below.]

A, Max-IF in Training Set; Max-IF in Validation Set (BLD, LUAD, LUSC, SCLC, Total). y-axis: Methylation Level (%).

B, ROC of Max-IF in Training Set. 95% CI and P value: LUAD 0.74-0.93, P<0.0001; LUSC 0.89-1.00, P<0.0001; SCLC 0.92-1.00, P<0.0001; Total 0.86-0.97, P<0.0001. x-axis: 100% - Specificity%. y-axis: Sensitivity%.

C, Max-IF in Training Set; Max-IF in Validation Set (LUAD, LUSC, SCLC, Total). y-axis: Sensitivity/Specificity (%).

D, LUAD: Max-IF, CEA, and Max-IF Combined CEA, in the Training Set and Validation Set. y-axis: Sensitivity (%).

E, Lung Cancer: Specificity and Sensitivity in the Training Set and Validation Set. y-axis: Sensitivity/Specificity (%).

Figure 4. [Image not reproduced; text recovered from the panels is listed below.]

A, Heatmap of histone gene family methylation differences across cancer types. Color scale: Methylation (%).

B, HIST1H4F Methylation in TCGA Database (normal and cancer samples for each TCGA cancer type). y-axis: Methylation Level (%).

C, HIST1H4F Methylation in Multi-Types of Cancer (control and cancer samples for esophagus, colorectal, pancreatic, head and neck, cervical, gastric, breast, and liver cancer). y-axis: Methylation Level (%).

Histone-related genes are hypermethylated in lung cancer and hypermethylated HIST1H4F could serve as a pan-cancer biomarker

Shi-Hua Dong, Wei Li, Lin Wang, et al.

Cancer Res Published OnlineFirst October 1, 2019.

Updated version: Access the most recent version of this article at doi:10.1158/0008-5472.CAN-19-1019

Supplementary Material: Access the most recent supplemental material at http://cancerres.aacrjournals.org/content/suppl/2019/10/01/0008-5472.CAN-19-1019.DC1
