Carcinogenesis

Genome-wide DNA methylation profile of early-onset endometrial cancer: Its correlation with genetic aberrations and comparison with late-onset endometrial cancer

Journal: Carcinogenesis
Manuscript ID: CARCIN-2018-00453.R2
Manuscript Type: Original Manuscript

Date Submitted by the Author: 28-Jan-2019

Complete List of Authors:
Makabe, Takeshi; Keio University School of Medicine Graduate School of Medicine, Pathology; Keio University School of Medicine Graduate School of Medicine, Obstetrics and Gynecology
Arai, Eri; Keio University School of Medicine, Department of Pathology; National Cancer Center Research Institute, Division of Molecular Pathology
Hirano, Takuro; Keio University School of Medicine Graduate School of Medicine, Pathology; Keio University School of Medicine Graduate School of Medicine, Obstetrics and Gynecology
Ito, Nanako; Keio University School of Medicine, Department of Pathology
Fukamachi, Yukihiro; Mitsui Knowledge Industry Co., Ltd., Bioscience
Takahashi, Yoriko; Mitsui Knowledge Industry Co., Ltd., Bioscience Department, Business Development Division
Hirasawa, Akira; Keio University School of Medicine Graduate School of Medicine, Obstetrics and Gynecology
Yamagami, Wataru; Keio University School of Medicine Graduate School of Medicine, Obstetrics and Gynecology
Susumu, Nobuyuki; International University of Health and Welfare School of Medicine, Obstetrics and Gynecology
Aoki, Daisuke; Keio Gijuku Daigaku Igakubu Daigakuin Igaku Kenkyuka, Department of Obstetrics and Gynecology
Kanai, Yae; Keio University School of Medicine, Department of Pathology; National Cancer Center Research Institute, Division of Molecular Pathology

Keywords: DNA methylation, Infinium array, endometrioid endometrial cancer, early-onset, fertility preservation

Page 1 of 37 Carcinogenesis

Genome-wide DNA methylation profile of early-onset endometrial cancer: Its correlation with genetic aberrations and comparison with late-onset endometrial cancer

Takeshi Makabe,1,2 Eri Arai,1* Takuro Hirano,1,2 Nanako Ito,1 Yukihiro Fukamachi,3 Yoriko Takahashi,3 Akira Hirasawa,2 Wataru Yamagami,2 Nobuyuki Susumu,2,4 Daisuke Aoki,2 Yae Kanai1

1Department of Pathology, Keio University School of Medicine, Tokyo 160-8582, Japan, 2Department of Obstetrics and Gynecology, Keio University School of Medicine, Tokyo 160-8582, Japan, 3Bioscience Department, Mitsui Knowledge Industry Co., Ltd., Tokyo 105-6215, Japan, 4Department of Obstetrics and Gynecology, International University of Health and Welfare School of Medicine, Chiba 286-8686, Japan

*To whom correspondence should be addressed: Department of Pathology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo 160-8582, Japan. Tel: 81-3-3353-1211; Fax: 81-3-3353-3290; E-mail: [email protected].

Running head: DNA methylation profile of endometrial cancer

Abbreviations: CNA, copy number alteration; CTNNB1-M, tissue samples of early-onset endometrioid endometrial cancer with somatic mutation of the CTNNB1 gene; CTNNB1-W, tissue samples of early-onset endometrioid endometrial cancer without somatic mutation of the CTNNB1 gene; EE, early-onset endometrioid endometrial cancer tissue; FGFR2-M, tissue samples of late-onset endometrioid endometrial cancer with somatic mutation of the FGFR2 gene; FGFR2-W, tissue samples of late-onset endometrioid endometrial cancer without somatic mutation of the FGFR2 gene; GO, gene ontology; MPA, medroxyprogesterone acetate; LE, late-onset endometrioid endometrial cancer tissue; PCA, principal component analysis; ROC, receiver operating characteristic; TSS, transcription start site.

Topic: DNA methylation, Infinium array, endometrioid endometrial cancer, early-onset, fertility preservation.

Issue Section: ORIGINAL MANUSCRIPT

248 words of abstract, 4,363 words of text, one table, 5 figures and 48 references.

Abstract

The present study was performed to clarify the significance of DNA methylation alterations during endometrial carcinogenesis. Genome-wide DNA methylation analysis and targeted sequencing of tumor-related genes were performed using the Infinium MethylationEPIC BeadChip and the Ion AmpliSeq Cancer Hotspot Panel v2, respectively, for 31 samples of normal control endometrial tissue from patients without endometrial cancer and 81 samples of endometrial cancer tissue. Principal component analysis revealed that tumor samples had a DNA methylation profile distinct from that of control samples. Gene ontology enrichment analysis revealed significant differences of DNA methylation at 1,034 CpG sites between early-onset endometrioid endometrial cancer (EE) (patients aged ≤40 years) and late-onset endometrioid endometrial cancer (LE), which were accumulated among “transcriptional factors”. Mutations of the CTNNB1 gene or DNA methylation alterations of genes participating in Wnt signaling were frequent in EEs, whereas genetic and epigenetic alterations of FGF signaling genes were observed in LEs. Unsupervised hierarchical clustering grouped EE samples into Clusters EA (n=22) and EB (n=12). Clinicopathologically less aggressive tumors tended to be accumulated in Cluster EB, and DNA methylation levels of 18 genes including HOXA9, HOXD10 and SOX11 were associated with differences in such aggressiveness between the two clusters. We identified eleven marker CpG sites that discriminated EB samples from EA samples with 100% sensitivity and specificity. These data indicate that genetically and epigenetically different pathways may participate in the development of EEs and LEs, and that DNA methylation profiling may help predict tumors that are less aggressive and amenable to fertility preservation treatment.

Summary

Genome-wide analysis using 112 samples of endometrial tissue has revealed that genetically and epigenetically different pathways may participate in early- and late-onset endometrial carcinogenesis. DNA methylation profiling may predict tumors that are less aggressive and amenable to fertility preservation treatment.

Introduction

Endometrial cancer is one of the most common gynecological malignancies in developed countries, with about 320,000 new cases diagnosed annually worldwide (1). Although it is typically a disease of post-menopausal women, approximately 2-14% of cases occur in women aged 40 years or less (2-4), and the incidence of endometrial cancer in younger patients has recently shown a marked increase (4). The standard treatment for endometrial cancer is hysterectomy and bilateral salpingo-oophorectomy with or without lymph node dissection, but for younger patients fertility-sparing treatment that avoids surgical menopause needs to be considered. Currently, conservative treatment with hormones such as medroxyprogesterone acetate (MPA) is used for early-stage early-onset endometrial cancer without myometrial invasion or extrauterine spread (5). However, we have reported previously that the recurrence rate after MPA therapy is 63.2%, and that the resulting pregnancy rate is lower than would be desired (6). In order to improve the outcome of medical treatment for patients with endometrial cancer, there is a need to clarify the molecular basis of endometrial carcinogenesis, especially in young patients.

It is well known that not only genomic but also epigenomic alterations participate in carcinogenesis in various human organs (7,8). DNA methylation alterations are one of the most important epigenomic changes, resulting in chromosomal instability and aberrant expression of tumor-related genes (9-11). We and other groups have shown that DNA methylation alterations are frequently associated with tumor aggressiveness and poorer patient outcome, and can be employed clinically as prognostic markers in cancers of various organs (12-17). Recently, single-CpG resolution genome-wide DNA methylation screening using the Infinium assay has been introduced for analysis of human tissue specimens. However, none of the previous studies of endometrial cancer based on the Infinium assay (18-22), including the study by The Cancer Genome Atlas (TCGA) (23), focused on early-onset endometrial cancer.

In the present study, to further clarify the significance of DNA methylation alterations during endometrial carcinogenesis and their correlation with genetic abnormalities, we performed genome-wide DNA methylation (methylome) analysis using the high-density EPIC BeadChip on 31 samples of normal control endometrial tissue from patients without endometrial cancer and 81 samples of endometrial cancer tissue including 35 samples of early-onset endometrial cancer.

Materials and methods

Patients and tissue samples

The 81 samples of cancerous tissue were obtained from 81 patients with primary endometrial cancer who underwent hysterectomy at Keio University Hospital between October 2005 and August 2016. Thirty-five of the patients were aged 40 years or less (early-onset endometrial cancer) and 46 were older than 40 years (late-onset endometrial cancer). None of these patients had received any preoperative treatment. Supplementary Table 1A, available at Carcinogenesis Online, gives details of the ages and clinicopathological backgrounds of these patients. Histological diagnosis and grading were based on the 2003 World Health Organization classification (24) and the Tumor-Node-Metastasis classification (25). The clinical stage of the disease was based on the 2008 revised International Federation of Gynecology and Obstetrics classification (26).

Recurrence was diagnosed by clinicians on the basis of physical examination and imaging modalities such as computed tomography and positron emission tomography.

For comparison, 31 samples of normal control endometrial tissue were obtained from materials that had been surgically resected from patients without endometrial cancer (Supplementary Table 1B, available at Carcinogenesis Online): 17, 6, 6, and 2 patients with uterine leiomyoma, ovarian cancer, cervical cancer and lobular endocervical glandular hyperplasia, respectively.

Tissue specimens were frozen immediately after surgery and preserved in the Keio Women’s Health Biobank in accordance with the “Japanese Society of Pathology Guidelines for the handling of pathological tissue samples for genomic research” (27). This study was approved by the Ethics Committee of Keio University Hospital and was performed in accordance with the Declaration of Helsinki. All patients included in this study provided written informed consent for use of their materials.

Infinium analysis

High-molecular-weight DNA was extracted from fresh frozen tissue samples using phenol-chloroform, followed by dialysis. DNA methylation status at 866,895 CpG loci was examined at single-CpG resolution using the Infinium MethylationEPIC BeadChip (Illumina, San Diego, CA) (28). Five-hundred-nanogram aliquots of DNA were subjected to bisulfite conversion using an EZ DNA Methylation-Gold Kit (Zymo Research, Irvine, CA). After hybridization, the specifically hybridized DNA was fluorescence-labeled by a single-base extension reaction and detected using an iScan reader (Illumina) in accordance with the manufacturer’s protocol. The data were then assembled using GenomeStudio methylation software (Illumina). At each CpG site, the ratio of the fluorescence signal of the methylated probe to the sum of the signals of the methylated and unmethylated probes, i.e. the so-called β-value, was measured; it ranges from 0.00 to 1.00 and reflects the methylation level of an individual CpG site.

In the present assay, the call proportions (P <0.01 for detection of signals above the background) for 801 probes on autosomes in all of the 112 tissue samples were less than 90%. Since such a low proportion may have been attributable to polymorphism at the probe CpG sites, these probes were excluded from the present assay. In addition, to avoid any gender-specific methylation bias, all 19,681 probes on chromosomes X and Y were excluded, leaving a final total of 846,413 autosomal CpG sites. Correlations between levels of DNA methylation and mRNA expression were examined using the dataset for endometrial cancer deposited in the TCGA database (https://tcga-data.nci.nih.gov/docs/publications/ucec_2013/) (23).

Targeted sequencing analysis of 50 tumor-related genes

Targeted sequencing of 50 tumor-related genes was performed in 81 endometrial cancer tissue samples. Library preparation was performed using an Ion AmpliSeq Library Kit 2.0-96LV and Ion AmpliSeq Cancer Hotspot Panel v2 in accordance with the manufacturer’s protocol (Thermo Fisher Scientific, Waltham, MA). The panel is designed to amplify 207 amplicons covering 2,849 mutations of 50 oncogenes and tumor suppressor genes deposited in the Catalogue of Somatic Mutations in Cancer (COSMIC) database (https://cancer.sanger.ac.uk) (Supplementary Table 2, available at Carcinogenesis Online). After library preparation, each amplicon library was quantified using an Ion Library Quantitation Kit on the 7500 Fast Real-Time PCR System (Thermo Fisher Scientific) employing the relative standard curve method. Next, emulsion PCR and Ion Sphere™ particle enrichment were carried out using the Ion OneTouch™ 2 system and Ion OneTouch™ ES system (Thermo Fisher Scientific). Purified Ion Sphere™ particles were loaded on an Ion 316 or 318 Chip and sequenced using the Ion PGM™ sequencer (Thermo Fisher Scientific). Data were analyzed using Torrent Suite Software v5.2.2 and Ion Reporter Software v5.2-5.4 (Thermo Fisher Scientific). Detected variants with quality scores of <20 and allele frequencies of <2.0% were eliminated in accordance with the manufacturer’s instructions. We filtered out single nucleotide polymorphisms using the databases of the 1000 Genomes project (http://www.internationalgenome.org/) and 5000 exomes project (http://evs.gs.washington.edu/EVS/). Effects of amino acid substitutions on protein function due to single-nucleotide non-synonymous mutations were estimated using the Sorting Intolerant from Tolerant (SIFT) protocol (29) (http://sift.jcvi.org). Copy number analyses were performed using the baseline of 5 normal human kidney tissue samples without kidney cancer, and calls with confidence levels of <10 were eliminated in accordance with the manufacturer’s instructions. The incidence of genetic aberrations in the present cohort was compared against the dataset for endometrial cancer deposited in the TCGA database (https://tcga-data.nci.nih.gov/docs/publications/ucec_2013/) (23).

Gene ontology (GO) enrichment analysis and pathway analysis

In order to reveal the function of proteins encoded by genes showing DNA methylation alterations and gene mutations, and to reveal the biological processes in which such proteins participate, GO enrichment analysis and pathway analysis were conducted using the GeneGo MetaCore™ software package (version 6.7) (Thomson Reuters, New York, NY).

Statistics

Infinium probes showing significant differences in DNA methylation levels between sample groups were identified using Welch’s t test with Bonferroni correction. The DNA methylation profiles of the 112 tissue samples were analyzed using principal component analysis (PCA). Differences in the incidences of genetic aberrations between sample groups were examined using Fisher’s exact test. Unsupervised hierarchical clustering based on DNA methylation levels (Euclidean distance with the Ward method, and Canberra distance with the complete linkage method) was performed. The associations between clinicopathological variables and DNA methylation alterations were evaluated using Fisher’s exact test, Mann-Whitney U test or Welch’s t test. A receiver operating characteristic (ROC) curve was generated, and the Youden index (30) for each probe was used to set the cutoff value for distinguishing between the sample groups. The R programming language and IBM SPSS Statistics 20.0 were used to analyze the data.

Results

DNA methylation alterations during endometrial carcinogenesis

First, to obtain a general overview of endometrial cancers, we identified 58,958 probes that were aberrantly methylated in 81 endometrial cancer tissue samples in comparison with the 31 normal control endometrial tissue samples (Welch’s t test, adjusted Bonferroni correction [α=1.18×10⁻⁸] and Δβ(endometrial cancer tissue−normal control endometrial tissue) value of more than 0.25 or less than −0.25), indicating that DNA methylation alterations had occurred during endometrial carcinogenesis. PCA using the DNA methylation levels of these 58,958 probes revealed that endometrial cancer tissue samples had a DNA methylation profile that differed distinctly from that of normal control endometrial tissue samples (Figure 1A). The leading 10 genes, i.e. TRIM15, HIST2H3PS2, NBPF19, L1TD1, HIST2H2BA, NKAPL, DLX2-AS1, GRM1, ADAM12 and CFAP46, showing significant differences in DNA methylation levels between endometrial cancer tissue and normal control endometrial tissue are labeled in the Manhattan plot (Figure 1B). To our knowledge, no previous English-language papers have reported correlations between these 10 genes and endometrial cancers: the present study may have revealed novel genes potentially associated with endometrial carcinogenesis, although the expression levels and functions of these genes will need to be further examined in connection with endometrial cancers.

Somatic mutations and copy number alterations (CNAs) in endometrial cancer

Five hundred and thirty-five somatic mutations were detected in 81 endometrial cancer tissue samples (6.6±2.9 mutations per sample): 230 synonymous and 305 non-synonymous. The non-synonymous mutations consisted of 242 missense mutations (79%), 44 nonsense mutations (14%), 14 frameshift deletions (5%), 4 frameshift insertions (1%), and one frameshift block substitution (0.3%).
Genes showing the highest incidences of somatic mutations included PTEN in 47 patients (58%), PIK3CA in 44 patients (54%), CTNNB1 in 25 patients (31%), and TP53 in 15 patients (19%). When compared with data for endometrial cancer deposited in the TCGA database (https://tcga-data.nci.nih.gov/docs/publications/ucec_2013/) (23), the incidences of somatic mutations of the MLH1 (P=1.08×10⁻³), SMARCB1 (P=3.01×10⁻³), AKT1 (P=0.017) and STK11 (P=3.47×10⁻³) genes were higher, and those of the FBXW7 (P=0.037), ATM (P=0.049), EZH2 (P=0.043), GNAS (P=0.043) and JAK2 (P=0.044) genes were lower, in our cohort, whereas no significant differences were found for the remaining 41 genes (Figure 2A). The incidences of CNAs of all 50 genes examined were lower than 10%, and this low incidence was again in accordance with the data deposited in the TCGA database (https://tcga-data.nci.nih.gov/docs/publications/ucec_2013/) (23) (Figure 2B).

Differences in DNA methylation profile between early- and late-onset endometrioid endometrial cancers

In order to avoid the bias caused by differences in histological subtypes, we then focused on the major subtype, i.e. endometrioid endometrial cancer. We again identified 63,033 probes that showed significant differences in DNA methylation levels between the 31 normal control endometrial tissue samples and 74 endometrioid endometrial cancer tissue samples (Welch’s t test, adjusted Bonferroni correction [α=1.18×10⁻⁸] and Δβ(endometrioid endometrial cancer tissue−normal control endometrial tissue) value of more than 0.25 or less than −0.25).

Next, we focused on differences between early-onset (patients aged ≤40 years) and late-onset (>40 years) endometrioid endometrial cancers. Clinicopathological parameters of early- and late-onset cases are summarized in Supplementary Table 3, available at Carcinogenesis Online.
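The probe-selection rule used for the cancer versus control comparisons above (a Bonferroni-adjusted significance level together with a Δβ cut-off of 0.25) can be illustrated with a minimal Python sketch. The probe counts are those given in the Infinium analysis section. The family-wise level of 0.01 is an assumption: it is not stated in the text, but dividing it by the number of retained probes reproduces the reported α of 1.18×10⁻⁸. The function name and its arguments are hypothetical and are not taken from the authors' R or SPSS workflow.

```python
# Probe counts reported in the Infinium analysis section.
TOTAL_PROBES = 866_895          # CpG loci on the MethylationEPIC BeadChip
LOW_CALL_PROBES = 801           # autosomal probes with call proportion < 90%
SEX_CHROMOSOME_PROBES = 19_681  # probes on chromosomes X and Y

# Number of autosomal CpG sites retained, i.e. the number of tests.
N_TESTS = TOTAL_PROBES - LOW_CALL_PROBES - SEX_CHROMOSOME_PROBES

# Assumption: family-wise level of 0.01 (inferred, not stated in the text).
FAMILY_WISE_LEVEL = 0.01
ALPHA = FAMILY_WISE_LEVEL / N_TESTS  # Bonferroni-adjusted per-probe level


def is_aberrantly_methylated(p_value, mean_beta_cancer, mean_beta_normal,
                             alpha=ALPHA, min_abs_delta_beta=0.25):
    """Return True when a probe passes both the P-value and delta-beta filters.

    p_value is assumed to come from Welch's t test for that probe.
    """
    delta_beta = mean_beta_cancer - mean_beta_normal
    return p_value < alpha and abs(delta_beta) > min_abs_delta_beta


print(N_TESTS)         # 846413
print(f"{ALPHA:.2e}")  # 1.18e-08
```

Under these assumptions, a probe with a mean β-value of 0.70 in cancer tissue and 0.30 in control tissue passes the Δβ filter, whereas one with values of 0.40 and 0.30 does not, whatever its P value.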
Among the 63,033 probes, 1,034 showed significant differences in DNA methylation levels between the 34 samples of early-onset endometrioid endometrial cancer tissue (EE) and 40 samples of late-onset endometrioid endometrial cancer tissue (LE) (Welch’s t test P <0.01; Δβ(LE−EE) value >0.25 or <−0.25) (Supplementary Table 4, available at Carcinogenesis Online). Since several patients (n=11) aged <50 years were included in the LE group, a relatively small number of CpG sites showed significant differences in DNA methylation levels between EE and LE samples. All 1,034 probes again showed significant differences in DNA methylation levels between LE samples and normal control endometrial tissue samples (Welch’s t test P <0.01; Δβ(LE−normal control endometrial tissue) value >0.25 or <−0.25), whereas only 102 of the 1,034 probes showed significant differences in DNA methylation levels between EE samples and normal control endometrial tissue samples (Welch’s t test P <0.01; Δβ(EE−normal control endometrial tissue) value >0.25 or <−0.25). Among these 102 probes, the DNA methylation levels of 101 probes were higher in EE samples than in normal control endometrial tissue samples and even higher in LE samples.

Three hundred and seventy-one genes, for which the 1,034 probes showing significant differences of DNA methylation levels between EE samples and LE samples were designed, were evaluated for protein function by enrichment analysis using the MetaCore software and compared with the protein function distribution of genes within the GeneGo database. “Transcriptional factors” were clearly overrepresented among these genes (P=1.67×10⁻³²), being 6.004 times more abundant than expected.
Indeed, 66 of the 371 genes were categorized as transcription factors by MetaCore software (shown by asterisks in Supplementary Table 4, available at Carcinogenesis Online). The top 20 statistically significant GO molecular functions in which the 371 genes participated are listed in Table 1: fifteen of the top 20 functions were related to DNA binding or transcriptional regulation, and are shown by asterisks, and transcription factors included in the top 20 GO molecular functions are underlined in Table 1. All 189 statistically significant GO molecular functions (P<0.05) are listed in Supplementary Table 5, available at Carcinogenesis Online.

Differences in genetic aberrations between early- and late-onset endometrioid endometrial cancers

Two hundred and one somatic mutations (90 synonymous and 111 non-synonymous) were detected in 34 samples of EE (5.9±2.2 mutations per sample), whereas 300 mutations (119 synonymous and 181 non-synonymous) were detected in 40 samples of LE (7.5±3.3 mutations per sample). The incidence of somatic mutation of the CTNNB1 gene was significantly higher in EE samples than in LE samples (P=4.26×10⁻⁴), and that of the FGFR2 gene was significantly higher in LE samples than in EE samples (P=8.77×10⁻³), whereas no significant differences were found for the remaining 48 genes (Figure 3A). The average SIFT scores for CTNNB1 mutations in EE samples and FGFR2 mutations in LE samples were both 0, indicating that these somatic mutations impair protein function (a SIFT score of <0.05 means “damaging” (29)).

Correlation between DNA methylation alterations and genetic aberrations in early- and late-onset endometrioid endometrial cancers

In order to clarify the correlation between DNA methylation alterations and genetic aberrations in EE samples, we identified 2,908 probes that showed significant differences in DNA methylation levels between 19 EE samples with somatic mutation of the CTNNB1 gene (CTNNB1-M) and 15 EE samples without it (CTNNB1-W) (Welch’s t test P <0.01; Δβ(CTNNB1-M−CTNNB1-W) value >0.15 or <−0.15) (Supplementary Table 6, available at Carcinogenesis Online). MetaCore pathway analysis revealed that 1,419 genes for which these 2,908 probes were designed were significantly accumulated in the Wnt signaling pathway (P=3.45×10⁻⁸) (Figure 3B). We identified 1,133 probes that showed significant differences in DNA methylation levels between 10 LE samples with somatic mutation of the FGFR2 gene (FGFR2-M) and 30 LE samples without it (FGFR2-W) (Welch’s t test P <0.01; Δβ(FGFR2-M−FGFR2-W) value >0.15 or <−0.15) (Supplementary Table 7, available at Carcinogenesis Online). Three of the 567 genes for which these 1,133 probes were designed were included in the FGF signaling pathway according to the MetaCore software (Supplementary Figure 1, available at Carcinogenesis Online).

Epigenetic clustering of endometrioid endometrial cancer based on DNA methylation profile

Unsupervised hierarchical clustering using 63,033 probes showing significant differences in DNA methylation levels between the 31 normal control endometrial tissue samples and 74 endometrioid endometrial cancer tissue samples subclustered the cancer patients into Cluster A (n=58) and Cluster B (n=16) (Figure 4A). The clinicopathological parameters of the patients in these clusters are summarized in Supplementary Table 8A, available at Carcinogenesis Online. Patients in Cluster A were significantly older, showed significantly more frequent lymphovascular invasion and tended to be more frequently diagnosed as histological grade 3 than those in Cluster B.

Kaplan-Meier survival curves of patients belonging to Clusters A and B were plotted for a period ranging from 263 to 4,034 days (median, 1,605 days). Although the cancer-free and overall patient survival rates tended to be lower in Cluster A than in Cluster B, such differences did not reach statistically significant levels (P=0.24 and P=0.53, respectively, log-rank test), probably due to the small number of deaths or recurrent cases (Supplementary Figure 2, available at Carcinogenesis Online). Notably, all recurrence-positive (n=4) and disease-specific death-positive (n=1) cases were included in Cluster A (Supplementary Table 8A, available at Carcinogenesis Online).

Unsupervised hierarchical clustering using 40,589 probes showing significant differences in DNA methylation levels between the 31 normal control endometrial tissue samples and 34 EE samples (Welch’s t test, adjusted Bonferroni correction [α=1.18×10⁻⁸]; Δβ(EE−normal control endometrial tissue) value >0.25 or <−0.25) subclustered 34 early-onset endometrioid endometrial cancer patients into Cluster EA (n=22) and Cluster EB (n=12) (Figure 4B). The clinicopathological parameters of the patients in these clusters are summarized in Supplementary Table 8B, available at Carcinogenesis Online. Patients in Cluster EA tended to show more frequent lymphovascular invasion and to be more frequently diagnosed as histological grade 3 than those in Cluster EB.
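As an illustration of the kind of unsupervised hierarchical clustering described above, the following Python sketch applies Ward's criterion with squared Euclidean distance to a toy matrix of β-values and stops when two clusters remain. It is a sketch under stated assumptions, not the authors' pipeline: the function names are hypothetical, the six samples and three probes are invented for illustration, and the naive search over all cluster pairs is only practical for very small inputs.

```python
from itertools import combinations


def centroid(points):
    """Mean beta-value per probe across the samples in one cluster."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    squared_distance = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared_distance


def ward_clusters(samples, k):
    """Agglomerate samples (lists of beta-values) until k clusters remain."""
    clusters = [[i] for i in range(len(samples))]
    while len(clusters) > k:
        # Pick the pair whose merge adds the least within-cluster variance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: ward_cost(
                [samples[m] for m in clusters[pair[0]]],
                [samples[m] for m in clusters[pair[1]]],
            ),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so j is still valid
        del clusters[j]
    return [sorted(cluster) for cluster in clusters]


# Hypothetical beta-values: rows are samples, columns are probes.
toy_betas = [
    [0.10, 0.15, 0.20],
    [0.12, 0.18, 0.22],
    [0.08, 0.14, 0.19],
    [0.80, 0.85, 0.90],
    [0.78, 0.88, 0.86],
    [0.82, 0.83, 0.91],
]
print(ward_clusters(toy_betas, k=2))
```

With these invented values, the three samples with low methylation and the three with high methylation fall into separate clusters. For real data with tens of thousands of probes, an established implementation in a statistical package would be the natural choice.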
The cancer-free survival rate of patients belonging to Cluster EA tended to be lower than that of patients in Cluster EB, although the difference did not reach a statistically significant level (P=0.37, log-rank test), probably due to the small number of recurrent cases (Supplementary Figure 2, available at Carcinogenesis Online). Only one case of recurrence was included in Cluster EA (Supplementary Table 8B, available at Carcinogenesis Online). Since the overall survival rate was 100% in both Clusters EA and EB, the log-rank test was not performed. Moreover, all patients belonging to Cluster EA were included in Cluster A and all patients belonging to Cluster EB were included in Cluster B without exception (Figure 4C).

As seen in Figure 4B, Probe Clusters I and III showed obvious differences in DNA methylation levels between Clusters EA and EB. Furthermore, the ratio of probes located in CpG islands, CpG island shores and CpG island shelves, which are important for transcription regulation, to all probes belonging to Probe Cluster I was significantly higher than the corresponding ratio for Probe Cluster III. We then focused on Probe Cluster I, in which 927 of the 6,415 probes were designed from the transcription start site (TSS) to 1500 bp upstream of the TSS of 470 genes. One hundred and one (Supplementary Table 9A, available at Carcinogenesis Online) of the 470 genes showed a significant (P<0.05) inverse correlation (r < −0.2) between DNA methylation levels and mRNA expression levels in the TCGA database (https://tcga-data.nci.nih.gov/docs/publications/ucec_2013/) (23). The 101 genes were also evaluated for protein function by enrichment analysis using the MetaCore software and compared with the protein function distribution of genes within the GeneGo database (Supplementary Table 9B, available at Carcinogenesis Online). Fifty-nine statistically significant GO molecular functions in which the 101 genes participated are listed in Supplementary Table 9C (available at Carcinogenesis Online).
DNA methylation levels of 18 of the 101 genes, including HOXA9, HOXD10 and SOX11, were significantly correlated with a higher incidence of lymphovascular invasion of early-onset endometrioid endometrial cancer (Supplementary Table 10, available at Carcinogenesis Online).

Identification of marker CpG sites for distinguishing Cluster EB from Cluster EA

In order to identify diagnostic markers capable of distinguishing the less aggressive Cluster EB from Cluster EA, ROC analysis was performed using the top 100 probes showing the largest differences in DNA methylation levels between EA and EB samples, and the corresponding AUC values were calculated. Among the 100 probes, 11 showed AUC values of 1 (Supplementary Table 11, available at Carcinogenesis Online). The cutoff value for each of the 11 probes was determined using the Youden index. As shown by the scattergrams in Figure 5, the use of such cutoff values was able to discriminate EB samples from EA samples with 100% sensitivity and specificity. The DNA methylation levels of these 11 candidate marker CpG sites were successfully verified using another quantification method, MassARRAY: significant correlations between DNA methylation levels demonstrated by Infinium assay and those by MassARRAY were confirmed (Supplementary Figure 3, available at Carcinogenesis Online).

Discussion

We focused on the differences between early- and late-onset cancers of the major histological subtype, i.e. endometrioid endometrial cancer. The incidence of somatic mutations of the CTNNB1 gene was significantly higher in EE samples than in LE samples, whereas somatic mutations of the FGFR2 gene were more frequent in LE samples than in EE samples. Moreover, DNA methylation alterations and genetic aberrations may cooperatively activate the Wnt signaling pathway (31) during the development of early-onset endometrioid endometrial cancer. Although an inverse association between immunohistochemically detected nuclear accumulation of β-catenin and the age of endometrial cancer patients has been reported previously (32), and mutations of the CTNNB1 gene have been shown to be accumulated in a subset of endometrioid endometrial cancers arising in young and obese patients (33), the present study has comprehensively revealed epigenetic and genetic alterations of genes participating in Wnt signaling in early-onset endometrioid endometrial cancer for the first time. Similarly, DNA methylation alterations and gene aberrations may cooperatively participate in the FGF signaling pathway in late-onset endometrioid endometrial carcinogenesis. Although activating mutation of the FGFR2 gene has been considered a therapeutic target for endometrioid endometrial cancer (34, 35), its correlation with late-onset carcinogenesis has not been reported previously. Thus, the pathways contributing to the development of early- and late-onset endometrial cancer appear to differ in part.

With regard to the genome-wide DNA methylation profile, all 1,034 probes showing significant differences in DNA methylation levels between EE and LE samples also showed significant differences between LE and normal control endometrial tissue samples. On the other hand, only 102 of the 1,034 probes showed significant differences in DNA methylation levels between EE and normal control endometrial tissue samples, indicating that DNA methylation alterations for the majority of the 1,034 probes occurred specifically in LE samples.
GO enrichment analysis indicated that the 371 genes for which the 1,034 probes were designed were clearly enriched for “transcriptional factors” and were accumulated in signaling pathways participating in transcriptional regulation. The late-onset-specific DNA methylation profile may modify the clinicopathological features of LE samples as a result of the differences in the gene expression profiles. In fact, myometrial invasion, one of the most important prognostic factors of endometrioid endometrial cancer (2), was more frequent in LE than in EE samples.

Epigenetic clustering based on DNA methylation profiles showed that endometrioid endometrial cancers belonging to Cluster A were clinicopathologically more aggressive than those belonging to Cluster B. This tendency was also evident in patients belonging to Cluster EA, and all patients in Cluster EA were included in Cluster A without exception. The 101 hallmark genes for Cluster EA may drive the development and progression of different tumor subtypes. For example, the DNA methylation levels of 18 genes, including HOXA9, HOXD10 and SOX11, were significantly correlated with more frequent lymphovascular tumor invasion. DNA hypermethylation of the HOXA9 gene is reportedly associated with a higher grade of serous ovarian cancer (36), recurrence of bladder cancer (37) and non-small cell lung cancer (38), and mortality in non-infant patients with neuroblastoma (39). Decreased expression of the HOXD10 homeobox gene has been reported in prostate (40), breast (41), thyroid (42), colorectal (43), ovarian (44) and endometrial (45) cancer. Promoter methylation of the SOX11 gene has reportedly been associated with cell growth, invasion or poor prognosis of gastric cancer (46), hematopoietic malignancies (47) and nasopharyngeal carcinoma (48). It is feasible that DNA methylation alterations of these genes participate in determining the clinicopathological characteristics of Cluster EA.
These data indicate that clinicopathological features are strongly defined on the basis of DNA methylation profiles, even in early-onset endometrioid endometrial cancer.

ROC analysis identified marker CpG sites that were able to distinguish epigenetic Cluster EB from Cluster EA: 11 CpG sites showed AUC values of 1 for such discrimination, and Cluster EB of this cohort was diagnosed using the 11 marker CpG sites with both a sensitivity and specificity of 100%. Although some of the 11 marker CpGs were located in CpG islands around the TSSs, an inverse correlation between levels of DNA methylation and mRNA expression was not confirmed in the data for endometrial cancer deposited in the TCGA database (https://tcga-data.nci.nih.gov/docs/publications/ucec_2013/) (23). Therefore, DNA methylation alterations of such genes may not result in alterations of expression, and such genes may not be potential therapeutic targets. Instead, since the present data have indicated that the clinicopathological features of endometrial cancers may be determined by their DNA methylation profiles, these 11 markers may be able to reproducibly identify less aggressive cancers, such as those belonging to Cluster EB. Although a validation study will, of course, be needed, DNA methylation diagnostics of biopsy specimens using appropriate marker CpG sites, such as these 11 CpG sites, may help to indicate the feasibility of fertility preservation therapy for patients with early-onset endometrioid endometrial cancer.
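The ROC-based marker selection described above can be illustrated with a minimal sketch. It computes the AUC for a single probe and picks the cutoff that maximizes the Youden index (J = sensitivity + specificity - 1). The β values below are hypothetical and are not data from this study; the function names and the assumption that EA samples have the higher methylation level at the probe are also illustrative only.

```python
from itertools import product


def roc_auc(pos, neg):
    """AUC as the probability that a positive sample has a higher value
    than a negative one (ties count one half)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def youden_cutoff(pos, neg):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.
    A sample is called positive when its value is >= cutoff."""
    values = sorted(set(pos) | set(neg))
    # Candidate cutoffs: midpoints between adjacent observed values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for c in candidates:
        sens = sum(p >= c for p in pos) / len(pos)
        spec = sum(n < c for n in neg) / len(neg)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (c, j)
    return best


# Hypothetical beta values for one probe (illustrative, not study data).
ea = [0.55, 0.61, 0.70, 0.48]  # assumed "EA-like" samples
eb = [0.12, 0.18, 0.22, 0.25]  # assumed "EB-like" samples

auc = roc_auc(ea, eb)
cutoff, j = youden_cutoff(ea, eb)
print(auc, cutoff, j)
```

When the two groups do not overlap, as in this toy example, the AUC is 1 and the Youden-optimal cutoff falls in the gap between the groups, which corresponds to 100% sensitivity and specificity within the cohort used to derive it.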
Since all samples belonging to Clusters EA and EB were included in Clusters A and B, respectively, it is not surprising that the 11 CpG sites showed excellent AUCs even in ROC analysis for discrimination of the more aggressive Cluster A from the less aggressive Cluster B (Supplementary Table 12, available at Carcinogenesis Online): the 11 CpG sites may also become good markers for prognostication of all age groups of patients with endometrioid endometrial cancers. Moreover, the DNA methylation levels of the 11 CpG sites were not significantly correlated with clinicopathological parameters reflecting tumor aggressiveness, i.e. histological grade and stage, or patient outcome, in cancers of the lung, stomach, breast, colon and ovary deposited in the TCGA database (data not shown), suggesting that the prognostic potential of the 11 CpG sites may be specific to endometrioid endometrial cancers.

In summary, genome-wide DNA methylation analysis using the EPIC array and targeted sequencing of tumor-related genes in 112 endometrial tissue samples have indicated that genetically and epigenetically different pathways may participate in the development of early- and late-onset endometrial cancers. Since DNA methylation profiles may determine the clinicopathological features of endometrial cancers, such profiling may predict tumors that are less aggressive and amenable to fertility preservation treatment.

Supplementary Material

Supplementary data are available at Carcinogenesis online.

Funding

This study was supported by the Project for Utilizing Glycans in the Development of Innovative Drug Discovery Technologies (17ae0101020h0002) and Funding for Research to Expedite Effective Drug Discovery by Government, Academia and Private partnership (17ak0101043s0603) from the Japan Agency for Medical Research and Development (AMED), and also KAKENHI (16H02472 and 16K08720) from the Japan Society for the Promotion of Science.

References

1. Ferlay, J. et al. (2015) Cancer incidence and mortality worldwide: Sources, methods and major patterns in GLOBOCAN 2012. Int J Cancer, 136, E359–386.
2. Yamagami, W. et al. (2015) Annual report of the Committee on Gynecologic Oncology, the Japan Society of Obstetrics and Gynecology. J Obstet Gynaecol Res, 41, 167–177.
3. Ota, T. et al. (2005) Clinicopathologic study of uterine endometrial carcinoma in young women aged 40 years and younger. Int J Gynecol Cancer, 15, 657–662.
4. Duska, L. R. et al. (2001) Endometrial cancer in women 40 years old or younger. Gynecol Oncol, 83, 388–393.
5. Laurelli, G. et al. (2011) Conservative treatment of early endometrial cancer: preliminary results of a pilot study. Gynecol Oncol, 120, 43–46.
6. Yamagami, W. et al. (2018) Is repeated high-dose medroxyprogesterone acetate (MPA) therapy permissible for patients with early stage endometrial cancer or atypical endometrial hyperplasia who desire preserving fertility? J Gynecol Oncol, 29, e21.
7. Jones, P. A. et al. (2016) Targeting the cancer epigenome for therapy. Nat Rev Genet, 17, 630–641.
8. Baylin, S. B. et al. (2016) Epigenetic Determinants of Cancer. Cold Spring Harb Perspect Biol, 8, a019505.
9. Ahuja, N. et al. (2016) Epigenetic Therapeutics: A New Weapon in the War Against Cancer. Annu Rev Med, 67, 73–89.
10. Kanai, Y. (2010) Genome-wide DNA methylation profiles in precancerous conditions and cancers. Cancer Sci, 101, 36–45.
11. Arai, E. et al. (2010) DNA methylation profiles in precancerous tissue and cancers: carcinogenetic risk estimation and prognostication based on DNA methylation status. Epigenomics, 2, 467–481.
12. Soto, J. et al. (2016) The impact of next-generation sequencing on the DNA methylation-based translational cancer research. Transl Res, 169, 1–18.e1.
13. Robles, A. I. et al. (2015) An Integrated Prognostic Classifier for Stage I Lung Adenocarcinoma Based on mRNA, microRNA, and DNA Methylation Biomarkers. J Thorac Oncol, 10, 1037–1048.
14. Yamanoi, K. et al. (2015) Epigenetic clustering of gastric carcinomas based on DNA methylation profiles at the precancerous stage: its correlation with tumor aggressiveness and patient outcome. Carcinogenesis, 36, 509–520.
15. Tian, Y. et al. (2014) Prognostication of patients with clear cell renal cell carcinomas based on quantification of DNA methylation levels of CpG island methylator phenotype marker genes. BMC Cancer, 14, 772.
16. Sato, T. et al. (2013) DNA methylation profiles at precancerous stages associated with recurrence of lung adenocarcinoma. PLoS One, 8, e59444.
17. Arai, E. et al. (2012) Single-CpG-resolution methylome analysis identifies clinicopathologically aggressive CpG island methylator phenotype clear cell renal cell carcinomas. Carcinogenesis, 33, 1487–1493.
18. Trimarchi, M. P. et al. (2017) Identification of endometrial cancer methylation features using combined methylation analysis methods. PLoS One, 12, e0173242.
19. Farkas, S. A. et al. (2017) Epigenetic changes as prognostic predictors in endometrial carcinomas. Epigenetics, 12, 19–26.

20. Zhang, B. et al. (2014) Comparative DNA methylome analysis of endometrial carcinoma reveals complex and distinct deregulation of cancer promoters and enhancers. BMC Genomics, 15, 868.
21. Jiao, Y. et al. (2014) A systems-level integrative framework for genome-wide DNA methylation and gene expression data identifies differential gene expression modules under epigenetic control. Bioinformatics, 30, 2360–2366.
22. Sánchez-Vega, F. et al. (2013) Recurrent patterns of DNA methylation in the ZNF154, CASP8, and VHL promoters across a wide spectrum of human solid epithelial tumors and cancer cell lines. Epigenetics, 8, 1355–1372.
23. Levine, D. A. (2013) The Cancer Genome Atlas Research Network. Integrated genomic characterization of endometrial carcinoma. Nature, 497, 67–73.
24. Silverberg, S. G. et al. (2003) Epithelial tumours and related lesions. In: Tavassoli FA, Devilee P, eds. World Health Organization classification of tumours: pathology and genetics—tumours of the breast and female genital organs. Lyon: IARC Press, 217–232.
25. Sobin, L. H. et al. (2009) International Union Against Cancer (UICC) TNM Classification of Malignant Tumours, 7 ed. Wiley-Liss, New York.
26. Pecorelli, S. (2009) Revised FIGO staging for carcinoma of the vulva, cervix, and endometrium. Int J Gynaecol Obstet, 105, 103–104.
27. Kanai, Y. et al. (2018) The Japanese Society of Pathology Guidelines on the handling of pathological tissue samples for genomic research: Standard operating procedures based on empirical analyses. Pathol Int, 68, 63–90.
28. Bibikova, M. et al. (2009) Genome-wide DNA methylation profiling using Infinium® assay. Epigenomics, 1, 177–200.

29. Ng, P. C. et al. (2002) Accounting for human polymorphisms predicted to affect protein function. Genome Res, 12, 436–446.
30. Fan, J. et al. (2006) Understanding receiver operating characteristic (ROC) curves. CJEM, 8, 19–20.
31. Liu, Y. et al. (2014) Clinical Significance of CTNNB1 Mutation and Wnt Pathway Activation in Endometrioid Endometrial Carcinoma. J Natl Cancer Inst, 106, dju245.
32. Peiro, G. et al. (2013) Association of mammalian target of rapamycin with aggressive type II endometrial carcinomas and poor outcome: a potential target treatment. Hum Pathol, 44, 218–225.
33. Klaus, A. et al. (2008) Wnt signaling and its impact on development and cancer. Nat Rev Cancer, 8, 387–398.
34. Pollock, P. M. et al. (2007) Frequent activating FGFR2 mutations in endometrial carcinomas parallel germline mutations associated with craniosynostosis and skeletal dysplasia syndromes. Oncogene, 26, 7158–7162.
35. Byron, S. A. et al. (2009) FGFR2 as a molecular target in endometrial cancer. Future Oncol, 5, 27–32.
36. Montavon, C. et al. (2012) Prognostic and diagnostic significance of DNA methylation patterns in high grade serous ovarian cancer. Gynecol Oncol, 124, 582–588.
37. Reinert, T. et al. (2012) Diagnosis of bladder cancer recurrence based on urinary levels of EOMES, HOXA9, POU4F2, TWIST1, VIM, and ZNF154 hypermethylation. PLoS One, 7, e46297.
38. Hwang, J. A. et al. (2015) HOXA9 inhibits migration of lung cancer cells and its hypermethylation is associated with recurrence in non-small cell lung cancer. Mol Carcinog, 54, E72–80.
39. Alaminos, M. et al. (2004) Clustering of gene hypermethylation associated with clinical risk groups in neuroblastoma. J Natl Cancer Inst, 96, 1208–1219.
40. Mo, R. J. et al. (2017) Decreased HoxD10 expression promotes a proliferative and aggressive phenotype in prostate cancer. Curr Mol Med, 17, 70–78.
41. Vardhini, N. V. et al. (2014) HOXD10 expression in human breast cancer. Tumour Biol, 35, 10855–10860.
42. Cao, Y. M. et al. (2018) Aberrant hypermethylation of the HOXD10 gene in papillary thyroid cancer with BRAFV600E mutation. Oncol Rep, 39, 338–348.
43. Wang, Y. et al. (2016) miR-10b promotes invasion by targeting HOXD10 in colorectal cancer. Oncol Lett, 12, 488–494.
44. Nakayama, I. et al. (2013) Loss of HOXD10 expression induced by upregulation of miR-10b accelerates the migration and invasion activities of ovarian cancer cells. Int J Oncol, 43, 63–71.
45. Osborne, J. et al. (1998) Expression of HOXD10 gene in normal endometrium and endometrial adenocarcinoma. J Soc Gynecol Investig, 5, 277–280.
46. Hu, X. et al. (2015) Aberrant SOX11 promoter methylation is associated with poor prognosis in gastric cancer. Cell Oncol, 38, 183–194.
47. Gustavsson, E. et al. (2010) SOX11 expression correlates to promoter methylation and regulates tumor growth in hematopoietic malignancies. Mol Cancer, 9, 187.
48. Zhang, S. et al. (2013) Promoter methylation status of the tumor suppressor gene SOX11 is associated with cell growth and invasion in nasopharyngeal carcinoma. Cancer Cell Int, 13, 109.

Figure legends

Figure 1. DNA methylation profiles of normal control endometrial tissue (n=31) and endometrial cancer tissue (n=81). (A) Principal component analysis was performed using the 58,958 probes showing significant differences in DNA methylation levels between normal control endometrial tissue and endometrial cancer tissue samples (Welch’s t test, adjusted Bonferroni correction [α=1.18×10^-8] and Δβ(endometrial cancer tissue − normal control endometrial tissue) value of more than 0.25 or less than -0.25). (B) Manhattan plot constructed using all 846,413 probes. The Bonferroni corrected P value (1.18×10^-8) is indicated by the red line. The leading 10 genes, i.e. TRIM15, HIST2H3PS2, NBPF19, L1TD1, HIST2H2BA, NKAPL, DLX2-AS1, GRM1, ADAM12 and CFAP46, showing significant differences in DNA methylation levels between normal control endometrial tissue and endometrial cancer tissue samples (Δβ(endometrial cancer tissue − normal control endometrial tissue) value of more than 0.25 or less than -0.25) are labeled. N/A, not annotated (designed for the intergenic regions).

Figure 2. The incidence of somatic mutations (A) and copy number alterations (gain and loss) (B) of the 50 examined tumor-related genes in endometrial cancer tissue in the present cohort (n=81) and data deposited in The Cancer Genome Atlas (TCGA) database (https://tcga-data.nci.nih.gov/docs/publications/ucec_2013/) (n=248) (23). Genes showing significantly higher or lower incidence (P<0.05) in the present cohort than in the TCGA database are indicated by * and †, respectively.
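The Bonferroni-corrected threshold quoted in the Figure 1 legend can be checked with a short calculation. The family-wise significance level of 0.01 used below is an assumption inferred from the reported numbers (it is not stated in this section); m is the number of probes given in the legend.

```latex
\alpha_{\text{Bonferroni}}
  = \frac{\alpha_{\text{family}}}{m}
  = \frac{0.01}{846{,}413}
  \approx 1.18 \times 10^{-8}
```

Under that assumption, the result agrees with the threshold indicated by the red line in the Manhattan plot.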

Figure 3. Differences in genetic and epigenetic states between early-onset endometrioid endometrial cancer tissue (EE) (patient age <40 yr) and late-onset endometrioid endometrial cancer tissue (LE). (A) The incidence of somatic mutations of the 50 examined tumor-related genes in 34 samples of EE and 40 samples of LE. Genes showing a significantly higher incidence (P<0.05) of somatic mutations in EE samples than in LE samples and genes showing a significantly higher incidence of somatic mutations in LE samples than in EE samples are indicated by * and †, respectively. (B) The pathway “Development WNT signaling pathway” (P=3.45×10^-8) illustrated schematically using MetaCore software. Genes showing DNA hypermethylation in 19 EE samples with somatic mutations of the CTNNB1 gene (CTNNB1-M) relative to 15 EE samples without such mutations (CTNNB1-W) (Welch’s t test P<0.01 and Δβ(CTNNB1-M − CTNNB1-W) value of more than 0.15) are indicated by red circles. Genes showing DNA hypomethylation in CTNNB1-M samples relative to CTNNB1-W samples (Welch’s t test P<0.01 and Δβ(CTNNB1-M − CTNNB1-W) value of less than -0.15) are indicated by dotted red circles. The CTNNB1 gene is indicated by a blue circle.

Figure 4. Epigenetic clustering of endometrioid endometrial cancer. (A) Unsupervised hierarchical clustering of endometrioid endometrial cancer tissue samples using DNA methylation levels on 63,033 probes showing significant differences in DNA methylation levels between 31 samples of normal control endometrial tissue and 74 samples of endometrioid endometrial cancer tissue (Welch’s t test, adjusted Bonferroni correction [α=1.18×10^-8] and Δβ(endometrioid endometrial cancer tissue − normal control endometrial tissue) value of more than 0.25 or less than -0.25). Based on DNA methylation status, 74 patients were subclustered into Cluster A (n=58) and Cluster B (n=16). Correlations between this epigenetic clustering and clinicopathological parameters are summarized in Supplementary Table 8A, available at Carcinogenesis Online. (B) Unsupervised hierarchical clustering of early-onset endometrioid endometrial cancer tissue (EE) samples using DNA methylation levels on the 40,589 probes showing significant differences in DNA methylation between 31 normal control endometrial tissue samples and 34 EE samples (Welch’s t test, adjusted Bonferroni correction [α=1.18×10^-8] and Δβ(EE − normal control endometrial tissue) value of more than 0.25 or less than -0.25). Based on DNA methylation status, the 34 EE patients were subclustered into Cluster EA (n=22) and Cluster EB (n=12). Correlations between this epigenetic clustering and clinicopathological parameters of the patients are summarized in Supplementary Table 8B, available at Carcinogenesis Online. Probe Clusters I to V are shown on the left side of the heatmap. (C) Venn diagram showing the relationship between Clusters A and B of endometrioid endometrial cancer tissue samples and Clusters EA and EB of EE samples. All samples belonging to Cluster EA are included in Cluster A, whereas all samples belonging to Cluster EB are included in Cluster B without exception.

Figure 5. Scattergrams of DNA methylation levels for all 11 probes showing an area under the curve value of 1 in receiver operating characteristic curve analysis for discrimination of EB samples (n=12) from EA samples (n=22). P values by Welch’s t test for each probe are shown in each panel. Cutoff values (CVs) are shown by a dotted line in each panel. Using each probe and its CV, EB samples were discriminated from EA samples with 100% sensitivity and specificity.
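The probe-selection criteria repeated in the legends of Figures 1 and 4 (Welch's t test combined with a Δβ threshold of ±0.25) can be sketched for a single probe as follows. This is an illustrative sketch only: the function names and β values are hypothetical, and the conversion of the t statistic to a P value (which needs Student's t distribution from a statistics library) is deliberately left out.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two samples with possibly unequal variances."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def delta_beta(tumor, normal):
    """Difference in mean beta value (tumor minus normal)."""
    return mean(tumor) - mean(normal)


def passes_effect_size(tumor, normal, threshold=0.25):
    """True when delta-beta is more than +threshold or less than -threshold."""
    return abs(delta_beta(tumor, normal)) > threshold


# Hypothetical beta values for one probe (illustrative, not study data).
tumor = [0.80, 0.85, 0.90]
normal = [0.30, 0.35, 0.40]

t, df = welch_t(tumor, normal)
print(t, df, delta_beta(tumor, normal), passes_effect_size(tumor, normal))
```

In a genome-wide setting the same test would be repeated for every probe, with the resulting P values compared against the Bonferroni-corrected threshold before the Δβ filter is applied.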

Table 1. Top 20 statistically significant GO molecular functions revealed by MetaCore software analysis using the 371 genes, for which the 1,034 probes showing differences in DNA methylation levels between samples of early-onset endometrioid endometrial cancer tissue (patient age <40 years) and late-onset endometrioid endometrial cancer tissue were designed (listed in Supplementary Table 4, available at Carcinogenesis Online).

Molecular functions | P value | Included genes showing differences in DNA methylation levels
Sequence-specific DNA binding* | 2.05 × 10^-19 | ASCL1, ASCL2, ASCL4, BARHL2, DMRTA2, DRGX, EN1, , EVX2, FEZF2, FLI1, FOXD3, FOXE1, FOXI2, GATA4, GCM2, HAND2, HIC1, IRF4, LHX1, LHX4, LHX5, MNX1, MSC, MKX1-1, MYOD1, NKX2-6, NKX6-2, STN1, ONECUT2, OTP, OTX1, PAX3, PAX5, PAX6, PAX7, PHOX2A, PITX2, POU3F3, POU4F2, PRDM14, PROX1, RAX, SALL3, SATB2, SIX6, SOX1, SOX11, SOX14, , TBXT, TBX15, TBX18, TBX5, TLX2, TP73, VSX1, ZFHX4, ZIC5, ZNF516
Transcription factor activity, sequence-specific DNA binding* | 2.39 × 10^-19 | ASCL1, ASCL2, ASCL4, BARHL2, BHLHE22, BHLHE23, BNC1, DMRTA2, DRGX, DBX1, EN1, EVX1, EVX2, FEZF2, FLI1, FOXD3, FOXE1, FOXI2, SKOR1, GATA4, GCM2, GFI1, HAND2, HIC1, IRF4, LHX1, LHX4, LHX5, MNX1, MSC, MKX1-1, MYOD1, NKX2-6, NKX6-2, OLIG2, ONECUT1, ONECUT2, OTP, OTX1, PAX3, PAX5, PAX6, PAX7, PHOX2A, PITX2, POU3F3, POU4F2, POU6F2, PRDM14, PROX1, RAX, SALL3, SATB2, SIM1, SIX6, SOX1, SOX11, SOX14, SOX2, TBXT, TBX15, TBX18, TBX5, TLX2, TP73, VSX1, ZFHX4, ZIC5, ZNF132, ZNF229, ZNF334, ZNF516, ZNF667, ZSCAN1, ZSCAN23
Nucleic acid binding transcription factor activity* | 2.48 × 10^-19 | ASCL1, ASCL2, ASCL4, BARHL2, BHLHE22, BHLHE23, BNC1, DMRTA2, DRGX, DBX1, EN1, EVX1, EVX2, FEZF2, FLI1, FOXD3, FOXE1, FOXI2, SKOR1, GATA4, GCM2, GFI1, HAND2, HIC1, IRF4, LHX1, LHX4, LHX5, MNX1, MSC, MKX1-1, MYOD1, NKX2-6, NKX6-2, OLIG2, ONECUT1, ONECUT2, OTP, OTX1, PAX3, PAX5, PAX6, PAX7, PHOX2A, PITX2, POU3F3, POU4F2, POU6F2, PRDM14, PROX1, RAX, SALL3, SATB2, SIM1, SIX6, SOX1, SOX11, SOX14, SOX2, TBXT, TBX15, TBX18, TBX5, TLX2, TP73, VSX1, ZFHX4, ZIC5, ZNF132, ZNF229, ZNF334, ZNF516, ZNF667, ZSCAN1, ZSCAN23
RNA polymerase II transcription factor activity, sequence-specific DNA binding* | 3.14 × 10^-16 | ASCL1, ASCL2, ASCL4, BARHL2, BHLHE22, BHLHE23, BNC1, DMRTA2, DRGX, DBX1, EN1, EVX1, EVX2, FEZF2, FLI1, FOXD3, FOXE1, FOXI2, SKOR1, GATA4, GCM2, GFI1, HAND2, HIC1, IRF4, LHX1, LHX4, LHX5, MNX1, MSC, MKX1-1, MYOD1, NKX2-6, NKX6-2, OLIG2, ONECUT1, ONECUT2, OTP, OTX1, PAX3, PAX5, PAX6, PAX7, PHOX2A, PITX2, POU3F3, POU4F2, POU6F2, PRDM14, PROX1, RAX, SALL3, SATB2, SIM1, SIX6, SOX1, SOX11, SOX14, SOX2, TBXT, TBX15, TBX18, TBX5, TLX2, TP73, VSX1, ZFHX4, ZIC5, ZNF132, ZNF229, ZNF334, ZNF516, ZNF667, ZSCAN1, ZSCAN23
DNA binding* | 2.05 × 10^-10 | ASCL1, ASCL2, ASCL4, BARHL2, BHLHE22, BHLHE23, BNC1, DMRTA2, DRGX, DBX1, EN1, EVX1, EVX2, FEZF2, FLI1, FOXD3, FOXE1, FOXI2, SKOR1, GATA4, GCM2, GFI1, HAND2, HIC1, IRF4, LHX1, LHX4, LHX5, MNX1, MSC, MKX1-1, MYOD1, NKX2-6, NKX6-2, OLIG2, ONECUT1, ONECUT2, OTP, OTX1, PAX3, PAX5, PAX6, PAX7, PHOX2A, PITX2, POU3F3, POU4F2, POU6F2, PRDM14, PROX1, RAX, SALL3, SATB2, SIM1, SIX6, SOX1, SOX11, SOX14, SOX2, TBXT, TBX15, TBX18, TBX5, TLX2, TP73, VSX1, ZFHX4, ZIC5, ZNF132, ZNF229, ZNF334, ZNF516, ZNF667, ZSCAN1, ZSCAN23
Transcription regulatory region sequence-specific DNA binding* | 2.15 × 10^-10 | ASCL1, ASCL2, ASCL4, BARHL2, BHLHE22, EN1, FEZF2, FLI1, FOXD3, GATA4, HAND2, IRF4, MNX1, MSC, MKX1-1, MYOD1, NKX6-2, ONECUT2, OTX1, PAX5, PAX6, PHOX2A, PITX2, POU4F2, PRDM14, PROX1, RAX, SATB2, SOX1, SOX11, SOX2, TBXT, TBX15, TBX18, TBX5, ZIC5, ZNF516
Transcription factor activity, RNA polymerase II core promoter proximal region sequence-specific binding* | 2.81 × 10^-10 | ASCL1, ASCL2, BARHL2, EN1, FLI1, GCM2, GFI1, HAND2, IRF4, MYOD1, NKX6-2, ONECUT1, ONECUT2, OTX1, PAX5, PAX6, PHOX2A, PITX2, POU4F2, PROX1, RAX, SATB2, SOX1, SOX11, SOX2, TBX15, TBX18, TP73
Neurokinin binding | 5.08 × 10^-10 | TAC1
Substance P receptor binding | 5.08 × 10^-10 | TAC1
Sequence-specific double-stranded DNA binding* | 7.65 × 10^-10 | ASCL1, ASCL2, ASCL4, BARHL2, EN1, FEZF2, FLI1, FOXD3, GATA4, HAND2, IRF4, LHX1, MSC, MKX1-1, MYOD1, NKX6-2, ONECUT2, OTX1, PAX5, PAX6, PHOX2A, PITX2, POU4F2, PRDM14, PROX1, RAX, SATB2, SOX1, SOX11, SOX2, TBXT, TBX15, TBX18, TBX5, TP73, ZIC5, ZNF516
Transcription regulatory region DNA binding* | 9.47 × 10^-10 | ASCL1, ASCL2, ASCL4, BARHL2, BHLHE22, EN1, FEZF2, FLI1, FOXD3, GATA4, GFI1, HAND2, IRF4, LHX1, MSC, MKX1-1, MYOD1, NKX6-2, ONECUT2, OTX1, PAX5, PAX6, PHOX2A, PITX2, POU4F2, PRDM14, PROX1, RAX, SATB2, SOX1, SOX11, SOX2, TBXT, TBX15, TBX18, TBX5, TP73, ZIC5, ZNF516
Regulatory region DNA binding* | 1.01 × 10^-9 | ASCL1, ASCL2, ASCL4, BARHL2, BHLHE22, EN1, FEZF2, FLI1, FOXD3, GATA4, GFI1, HAND2, IRF4, LHX1, MSC, MKX1-1, MYOD1, NKX6-2, ONECUT2, OTX1, PAX5, PAX6, PHOX2A, PITX2, POU4F2, PRDM14, PROX1, RAX, SATB2, SOX1, SOX11, SOX2, TBXT, TBX15, TBX18, TBX5, TP73, ZIC5, ZNF516
Regulatory region nucleic acid binding* | 1.13 × 10^-9 | ASCL1, ASCL2, ASCL4, BARHL2, BHLHE22, EN1, FEZF2, FLI1, FOXD3, GATA4, GFI1, HAND2, IRF4, LHX1, MSC, MKX1-1, MYOD1, NKX6-2, ONECUT2, OTX1, PAX5, PAX6, PHOX2A, PITX2, POU4F2, PRDM14, PROX1, RAX, SATB2, SOX1, SOX11, SOX2, TBXT, TBX15, TBX18, TBX5, TP73, ZIC5, ZNF516
Neuropeptide receptor activity | 3.30 × 10^-9 | GALR1, QRFPR, GPR139, NPFFR2, NPY2R, NPY5R, PROKR2, SSTR1, SSTR4, SORCS3
Gated channel activity | 6.48 × 10^-9 | CACNG7, CNGA3, GABRA1, GABRA2, GABRB1, GLRA3, GRIA4, KCNAB1, KCNH8, KCNK9, KCNQ5, KCNA6, KCNJ3, KCNA1, KCNA3, KCNA4, KCNV1, GRIN3A, CACNA1A, RYR2, KCNT2, TTYH1, CHRNA4
Double-stranded DNA binding* | 7.49 × 10^-9 | ASCL1, ASCL2, ASCL4, BARHL2, EN1, FEZF2, FLI1, FOXD3, FOXE1, GATA4, HAND2, IRF4, LHX1, MSC, MKX1-1, MYOD1, NKX6-2, ONECUT2, OTX1, PAX5, PAX6, PHOX2A, PITX2, POU4F2, PRDM14, PROX1, RAX, SATB2, SOX1, SOX11, SOX2, TBXT, TBX15, TBX18, TBX5, TP73, ZIC5, ZNF516
RNA polymerase II regulatory region sequence-specific DNA binding* | 9.49 × 10^-9 | ASCL1, ASCL2, ASCL4, BARHL2, EN1, FEZF2, FLI1, FOXD3, GATA4, HAND2, IRF4, MSC, MYOD1, NKX6-2, ONECUT2, OTX1, PAX5, PAX6, PHOX2A, PITX2, POU4F2, PRDM14, PROX1, RAX, SATB2, SOX1, SOX11, TBXT, TBX15, TBX18, TBX5, TP73, ZIC5
Voltage-gated potassium channel activity | 1.06 × 10^-8 | CNGA3, KCNAB1, KCNH8, KCNK9, KCNQ5, KCNA6, KCNJ3, KCNA1, KCNA3, KCNA4, KCNV1, KCNT2
RNA polymerase II regulatory region DNA binding* | 1.19 × 10^-8 | ASCL1, ASCL2, ASCL4, BARHL2, EN1, FEZF2, FLI1, FOXD3, GATA4, HAND2, IRF4, MSC, MYOD1, NKX6-2, ONECUT2, OTX1, PAX5, PAX6, PHOX2A, PITX2, POU4F2, PRDM14, PROX1, RAX, SATB2, SOX1, SOX11, TBXT, TBX15, TBX18, TBX5, TP73, ZIC5
Transcriptional activator activity, RNA polymerase II transcription regulatory region sequence-specific binding* | 1.24 × 10^-8 | BARHL2, FEZF2, FLI1, GATA4, GCM2, HAND2, IRF4, LHX4, MYOD1, ONECUT1, ONECUT2, OTX1, PAX5, PAX6, PHOX2A, PITX2, POU4F2, RAX, SATB2, SIX6, SOX1, SOX11, SOX2, TBX5, TLX2, TP73

*GO molecular functions involved in DNA binding or transcriptional regulation. Genes for which the protein class is a transcription factor are indicated by underlining.

A: Plot of PC1 (x-axis) against PC2 (y-axis). Legend: normal control endometrial tissue (n=31); endometrial cancer tissue (n=81).
B: Plot of −log10(P) (y-axis) against an x-axis numbered 1 to 21 in the extracted labels. Labeled genes: TRIM15, HIST2HPS2, NBPF19, L1TD1, HIST2H2BA, NKAPL, DLX2-AS1, GRM1, ADAM12, CFAP46.

Figure 2

A: Incidence of somatic mutations (%), present study versus TCGA. Asterisks (*) and daggers (†) mark individual genes. Genes on the x-axis (order as extracted): KIT, RB1, SRC, RET, ALK, APC, VHL, MPL, MET, ATM, KDR, SMO, TP53, JAK2, JAK3, FLT3, ABL1, IDH1, IDH2, AKT1, KRAS, EZH2, HRAS, BRAF, PTEN, GNAS, EGFR, NPM1, MLH1, CDH1, GNAQ, STK11, CSF1R, ERBB2, ERBB4, GNA11, FGFR2, FGFR1, FGFR3, HNF1A, FBXW7, SMAD4, PIK3CA, PTPN11, CTNNB1, CDKN2A, NOTCH1, PDGFRA, SMARCB1, NRAS|CSDE1.
B: Incidence of copy number alterations (%), shown separately for gain and loss, present study versus TCGA. Asterisks (*) mark individual genes. Genes on the x-axis: the same genes as in panel A, with NRAS listed in place of NRAS|CSDE1.

Figure 3
A: Incidence of somatic mutations (%), EE versus LE. An asterisk (*) and a dagger (†) mark individual genes.

Genes on the x-axis of Figure 3A (order as extracted): KIT, RB1, SRC, RET, ALK, APC, VHL, MPL, KDR, MET, ATM, SMO, TP53, JAK2, JAK3, FLT3, ABL1, IDH1, IDH2, AKT1, KRAS, EZH2, HRAS, BRAF, PTEN, GNAS, EGFR, NPM1, MLH1, CDH1, GNAQ, STK11, CSF1R, ERBB2, ERBB4, GNA11, FGFR2, FGFR1, FGFR3, HNF1A, FBXW7, SMAD4, PIK3CA, PTPN11, CTNNB1, CDKN2A, NOTCH1, PDGFRA, SMARCB1, NRAS|CSDE1.
B: (no labels recovered for this panel)

Figure 4
A: Axes labeled Patients and Probes. Patient clusters: Cluster B (n=16), Cluster A (n=58).
B: Axis labeled Patients. Patient clusters: Cluster EB (n=12), Cluster EA (n=22). Probe clusters: Probe Cluster I (n=6,415), Probe Cluster II (n=2,655), Probe Cluster III (n=22,133), Probe Cluster IV (n=789), Probe Cluster V (n=40,589).
C: Labels Cluster A, Cluster B, Cluster EA and Cluster EB, with the counts 22, 58, 16 and 12.

Figure 5

Each panel plots DNA methylation levels (y-axis) in Cluster EA (n=22) and Cluster EB (n=12) for one probe. P and CV values are paired with probes by panel position.

Probe | P | CV
cg02320740 | 6.34 × 10⁻⁸ | 0.41
cg06248767 | 1.44 × 10⁻⁷ | 0.60
cg13539545 | 4.77 × 10⁻⁵ | 0.61
cg13997680 | 1.60 × 10⁻⁷ | 0.51
cg23049458 | 8.45 × 10⁻⁷ | 0.57
cg24452128 | 5.39 × 10⁻⁷ | 0.51
cg02065293 | 5.15 × 10⁻⁹ | 0.47
cg05353872 | 3.51 × 10⁻⁷ | 0.43
cg12197571 | 4.12 × 10⁻¹³ | 0.64
cg14269191 | 8.77 × 10⁻¹² | 0.52
cg26980111 | 9.25 × 10⁻¹² | 0.51