Oncogene (2002) 21, 3804–3813 © 2002 Nature Publishing Group All rights reserved 0950–9232/02 $25.00 www.nature.com/onc

A comprehensive catalog of CpG islands methylated in human lung adenocarcinomas for the identification of tumor suppressor

Masahiko Shiraishi*,1, Azumi Sekiguchi1, Michael J Terry1, Adam J Oates1, Yuji Miyamoto1, Ying H Chuu1, Miyo Munakata1 and Takao Sekiya1

1DNA Methylation and Genome Function Project, National Cancer Center Research Institute, 1-1, Tsukiji 5-chome, Chuo-ku, Tokyo 104-0045, Japan

CpG island methylation is an important mechanism in gene silencing and is a key epigenetic event in cancer development. As yet, the number and identities of the genes that are inactivated in cancer cells have not been determined. In order to address this issue, we have performed a comprehensive isolation of CpG islands that are methylated in human lung adenocarcinomas. We have isolated approximately 200 CpG islands that are methylated in tumor DNA, including those of known tumor-associated genes such as the HOXA5 gene. As the library contains the CpG islands of a number of known tumor suppressor genes, it is highly likely that additional, previously unidentified tumor suppressor genes will be present. On average, 1–2% of CpG islands were methylated specifically in tumors, although this figure differed greatly between patients. This study provides an important resource in the search for genes inactivated in tumors and for the investigation of epigenetic dysregulation of gene expression by CpG island methylation.
Oncogene (2002) 21, 3804–3813. DOI: 10.1038/sj/onc/1205454

Keywords: CpG island; DNA methylation; lung adenocarcinoma; methylated DNA binding domain column chromatography; segregation of partly melted molecules

*Correspondence: M Shiraishi; E-mail: [email protected]. Received 18 October 2001; revised 15 February 2002; accepted 20 February 2002

Introduction

Accumulating evidence suggests that both genetic and epigenetic events play important roles in carcinogenesis. Genetic events include alterations in nucleotide sequence that may result in changes in the structure and therefore function of a gene product. Epigenetic changes are less well characterized but are known to produce alterations in gene regulation, chromatin structure, and other events that are not attributed to alterations of nucleotide sequence. DNA methylation is an important epigenetic event and is known to regulate expression and other biological phenomena (Bird, 2002).

CpG islands are chromosomal regions, often located at the 5' region of genes, which have a high density of nonmethylated CpG sequences (Bird, 1986). In non-CpG-island sites, most cytosine residues at CpG dinucleotides are methylated and this methylation status is stably maintained. However, in cancer cells there is an altered CpG methylation pattern which is not always stably maintained. In addition, hypomethylation at CpG sequences is a general feature of tumor DNA, although in the DNA of cancer cells some CpG dinucleotides within CpG islands of certain genes are known to be methylated. Aberrant de novo methylation events at specific CpG islands are frequently observed in tumor DNA and have been reported to play a key role in the inactivation of many tumor suppressor genes (Baylin et al., 1998).

Transcriptional repression is not the sole event mediated by methylation of CpG islands. CpG islands represent a model for active chromatin (Tazi and Bird, 1990) and methylation of CpG islands is associated with altered chromatin structure (Antequera et al., 1990). It is reported that CpG islands are initiation sites for both transcription and DNA replication (Delgado et al., 1998). The methylated CpG island of the HPRT gene on the inactive X is associated not only with transcriptional inactivation but also with late replication of the gene (Schmidt and Migeon, 1990). When cells are treated with the demethylating agent 5-aza-2'-deoxycytidine, the inactive allele becomes activated and shifts to an early replication state (Schmidt and Migeon, 1990). These and other findings strongly suggest that CpG islands are critical regions that regulate a number of cellular events.

In the study of cancer, the comprehensive isolation of genes inactivated by CpG island methylation is of great importance for the identification of novel tumor suppressor genes and also in understanding epigenetic dysregulation of genome function. Approaches previously described for the isolation of CpG islands methylated in cancer have been based on fragmentation of DNA by restriction endonucleases in which digestion is dependent on the methylation status of the recognition sites (Shiraishi et al., 2000; 2002 in press). However, a potential caveat of this type of approach is that it is not clear whether the methylation status of specific restriction sites within a CpG island represents that of the entire region of the CpG island. Furthermore, the PCR step in the isolation procedure adopted by many groups results in the loss of sequences that are resistant to amplification, as is often the case with G+C-rich sequences.

We have previously described the combined use of two techniques, methyl-CpG binding domain (MBD) column chromatography and segregation of partly melted molecules (SPM), to isolate DNA fragments derived from CpG islands methylated in cancer cells (Shiraishi et al., 1999a). Fractionation of DNA fragments by MBD column chromatography depends primarily on the number of methyl-CpG (mCpG) sites within the fragment (Cross et al., 1994; Shiraishi et al., 1999a). The preferential isolation of DNA fragments derived from CpG islands by the SPM method is based on the principle that DNA fragments derived from G+C-rich regions have a reduced rate of strand dissociation during denaturing gradient gel electrophoresis (Shiraishi et al., 1995, 1998). DNA fragments derived from CpG islands are preferentially retained in the gel after prolonged electric field exposure while others disappear following strand dissociation (Shiraishi et al., 1995, 1998). Both experimental results and calculations have shown that greater than 90% of CpG islands can be recovered by the SPM method (Shiraishi et al., 1995, 1998). In this study we have applied SPM and MBD chromatography for the comprehensive isolation of methylated CpG islands in human lung adenocarcinomas to identify potential tumor suppressor genes and also to understand epigenetic events in the cancer process.

Results

Comprehensive isolation of methylated CpG islands

Tsp509 I digests of tumor-derived DNA were subjected to three rounds of MBD column chromatography to enrich for methylated DNA. The highly methylated fragments were then cloned using a λ vector (Shiraishi et al., 1999a). Unexpectedly, 96% of clones in the methylated DNA library had short DNA inserts, most of which were not associated with CpG islands (Shiraishi et al., 1999a,b). CpG islands can be of variable length. However, on the basis of nucleotide sequence, the size of most CpG islands is predicted to be greater than 300 bp (Larsen et al., 1992). Therefore, in this study, we have focused on CpG-rich regions that are greater than 300 bp in length and eliminated all other sequences. The above procedure is summarized in Figure 1.

Preliminary analysis revealed that 20% of the clones contained DNA fragments derived from the nontranscribed spacer region of the ribosomal RNA genes (positions 18956–19358 of U13369, data not shown). A similar high representation of clones containing DNA fragments derived from the nontranscribed spacer region of the ribosomal RNA genes has also been reported when highly methylated fragments derived from normal blood DNA were enriched by MBD column chromatography and cloned (Brock et al., 1999). To exclude these clones, we performed PCR using primers specific for ribosomal RNA genes and a DNA template from bacterial colonies that had been transformed with the plasmid clones. Bacterial colonies that produced a PCR product were subsequently eliminated.

Ultimately 6000 clones were sequenced and, using an algorithm that predicts the position of candidate CpG islands, approximately 1300 clones were identified as having features of CpG islands. After allowing for redundancy of clones, 660 independent sequences were identified as candidate CpG islands methylated in human lung adenocarcinomas. In the latter stages of sequencing, many clones were found to be identical to those we had previously sequenced, suggesting that the majority of the highly represented clones in the library had been identified.

Selection of DNA fragments corresponding to regions of CpG islands

To identify which of the isolated DNA fragments are derived from CpG islands, the methylation status of the corresponding genomic sequences was determined in normal somatic tissue DNA. As discussed earlier, MBD column chromatography separates DNA fragments from nonmethylated CpG islands predominantly into a low salt fraction (L) and those from highly methylated CpG islands into a high salt fraction (H) (Shiraishi et al., 1999a). Tsp509 I digests of normal somatic tissue DNA were subjected to MBD column chromatography and DNA fragments corresponding to cloned DNAs were detected by PCR. Representative results are shown in Figure 2a. PCR-amplified genomic DNA corresponding to the SPM fragment derived from CpG island B2254 was predominantly detected in fraction L, suggesting that the genomic sequence is not methylated. Therefore, based on the nucleotide sequence of this region and its lack of methylation, this sequence can be regarded as a CpG island. All DNA sequences whose PCR products were present predominantly in the L fraction were classified as Class I sequences, and in total 80% of CpG islands were assigned to this category (data not shown). Bisulfite genomic sequencing revealed that all of the analysed Class I sequences were methylation free in the predicted CpG island region (data not shown). Genomic DNA sequences whose PCR products were detected in H, as observed in B1035 (Class II, PCR products both in L and H) and B353 (Class III, PCR products predominantly in H) (Figure 2a), were not examined for the presence of cancer-specific methylation. Bisulfite genomic sequencing of Class II and Class III sequences revealed heterogeneous methylation patterns in that some sequences were highly methylated but only at one end, whereas other sequences were


Figure 1 Scheme of the procedure for the comprehensive isolation of methylated CpG island fragments. Tsp509 I fragments were cloned in the EcoRI site of the vector

evenly methylated throughout the fragment (Shiraishi et al., 2001). We also identified some sequences that were methylated at one allele and methylation-free at the other allele (Shiraishi et al., 2001).

During our investigation we found that some DNA fragments from biallelically unmethylated CpG islands could also be fractionated in H. This occurred primarily when a large proportion of the fragment consisted of non-island sequence (Shiraishi et al., 2001).

The identification of CpG islands that are methylated in cancer

The majority of SPM fragments in our study were derived from CpG islands and corresponded to genomic sequences reported in public databases. Therefore the corresponding genomic CpG island could be identified and we were able to determine the methylation status of all Class I sequences in the DNA from patients used in the construction of the methylated DNA library. Representative results are shown in Figure 2b. PCR-amplified DNA representing CpG island B2254 was detected in the High salt fraction in DNA from the Cancerous portion (CH) of lung tissue of patients 44 and 56, but not in the High salt fraction of DNA from the Noncancerous portion of lung (NH) tissue. This result indicates that in these patients CpG island B2254 is methylated specifically in tumors. In patients 108 and 113, DNA fragments corresponding to CpG island B2254 were detected in NH as well as CH. This suggests that de novo methylation of the island occurred in noncancerous as well as cancerous regions of lung tissue; however, this CpG island was unmethylated in cancer DNA from patient 59. DNA fragments corresponding to CpG island B1847 were observed specifically in CH of patients 56, 59, 108, and 113.
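The tumor-specificity calls described above can be restated as a small decision rule. The following Python sketch is illustrative only: the function name, the boolean encoding of PCR detection in the high salt fraction, and the label strings are assumptions introduced here and are not part of the study's procedure. The example data restate the B2254 results reported in the text for patients 44, 56, 108 and 113.

```python
def classify_island(ch_detected, nh_detected):
    """Label one patient/island pair from high-salt (H) fraction PCR results.

    ch_detected: PCR product seen in H for cancerous lung DNA (CH)
    nh_detected: PCR product seen in H for noncancerous lung DNA (NH)
    The labels are hypothetical names chosen for this sketch.
    """
    if ch_detected and not nh_detected:
        return "tumor-specific methylation"
    if ch_detected and nh_detected:
        return "methylated in cancerous and noncancerous tissue"
    if nh_detected:
        return "methylated in noncancerous tissue only"
    return "unmethylated"


# CpG island B2254 as described in the text: (CH detected, NH detected).
b2254 = {
    44: (True, False),
    56: (True, False),
    108: (True, True),
    113: (True, True),
}

labels = {patient: classify_island(ch, nh) for patient, (ch, nh) in b2254.items()}
for patient, label in labels.items():
    print(patient, label)
```

A rule of this kind only summarizes presence or absence of a band; it does not capture partial or allele-specific methylation, which the study addresses by bisulfite genomic sequencing.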


Figure 2 Analysis of methylation status of genomic sequences by MBD column chromatography and PCR. (a) In normal somatic tissue, the methylation status of genomic DNA sequences corresponding to SPM fragments, CpG islands B2254, B1035 and B353 were analysed. L and H indicate low and high salt fractions, respectively. Detection of PCR products in L but not in H indicates nonmethylation. Detection of PCR products in H indicates methylation. (b) Methylation status of CpG islands B2254 and B1847 in patient DNA. Numerical characters refer to a particular patient. C and N indicate DNA from cancerous lung tissue or that from noncancerous lung tissue, respectively. The faint band for CpG island B2254 in 56NH was not reproducible
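The reading rule in the caption, together with the Class I, II and III categories used in the Results, can be sketched as a simple lookup. This is a minimal illustration under stated assumptions: the function name is hypothetical, and the two booleans stand in for "PCR product predominantly detected" in the low salt (L) and high salt (H) fractions, which in practice is a judgment made from band intensity on a gel.

```python
def methylation_class(in_low_salt, in_high_salt):
    """Assign the class used in the text from L/H fraction detection.

    Class I:   product predominantly in L (nonmethylated CpG island)
    Class II:  product in both L and H
    Class III: product predominantly in H
    Booleans are a simplification of band intensities (assumption).
    """
    if in_low_salt and not in_high_salt:
        return "Class I"
    if in_low_salt and in_high_salt:
        return "Class II"
    if in_high_salt:
        return "Class III"
    return "not detected"


# Examples named in the text for normal somatic tissue DNA (Figure 2a).
examples = {
    "B2254": methylation_class(True, False),
    "B1035": methylation_class(True, True),
    "B353": methylation_class(False, True),
}
print(examples)
```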

Bisulfite genomic sequencing was performed to confirm the methylation status of the sequences described above. The SPM fragment corresponding to CpG island B2254 is derived from a 298 bp Tsp509 I fragment (positions 17 056–17 353, AC004080), has 27 CpG sites and is located at the 5' region of the HOXA5 gene. When the methylation status of 22 of the CpG sites was analysed they were found to be nonmethylated in normal somatic tissue DNA (MaN, Figure 3). The presence of sequence variations (G/T at position 17 219, G/C at position 17 282, AC004080) in the fragment enabled the two alleles to be discriminated and we could therefore ascertain that in patients 108 and 113 both alleles are methylated in C, as well as in N, DNA samples. The SPM fragment corresponding to CpG island B1847 is derived from a Tsp509 I fragment which is 676 bp in length (positions 51 553–52 228, AL035562) and has 49 CpG sites. This CpG island is located at the 3' region of the PAX1 gene. When the methylation status of 26 of the 49 CpG sites in the fragment was analysed in patients 108, 113, and 120, the presence of a highly methylated allele was confirmed (Figure 4). However, the methylation profiles differed between patients and since there were no genetic polymorphisms in the analysed sequence, we could not determine whether methylation was an allelic or nonallelic event, or alternatively whether the presence of a non-methylated allele was due to contamination with DNA from noncancerous cells.

Methylation patterns of isolated CpG islands in tumor DNAs are shown in Table 1. Three CpG islands (B2361, B2737 and B5504) were methylated in all patients. The number of methylated CpG islands differed greatly between patients. In patient 44, 13% of the isolated CpG islands were methylated while in contrast 76% were methylated in patient 116.

Figure 3 Bisulfite genomic sequencing analysis of the DNA fragments derived from the 5' CpG island B2254 (HOXA5). Numerical characters in boldface indicate exon number. Lollipops indicate position of CpG sites. Numbers in plain-type characters indicate position of analysed CpG sites. Note not all Tsp509 I sites are indicated. Open and closed circles indicate nonmethylated and methylated CpG's, respectively. Letters (A to G) to the left of circles indicate the seven independent plasmid DNA clones that were sequenced. Horizontal bars at CpG site #3 indicate absence of CpG sequence due to sequence variation

Investigation of a potential association between methylation status and transcriptional repression

We next investigated the potential relationship between CpG island methylation and gene silencing. MBD column chromatography and RT–PCR were used to determine methylation status and gene expression, respectively. Representative results for CpG island B754 (located at the 5' region of the NID2 gene) are shown in Figure 5. In cell lines A549, RERF-LC-OK, and RERF-LC-MS, CpG island B754 was unmethylated or weakly methylated, and the associated gene was expressed. In contrast, in cell lines ABC-1 and VMRC-LCD, CpG island B754 was methylated and a transcript could not be detected. In cell line LC-2/ad, the CpG island was unmethylated, but the transcript was not detected. CpG island B5279 corresponding to the 5' region of the TCF15 gene was not methylated in cell lines RERF-LC-MS and VMRC-LCD, and the gene was expressed. In cell lines A549, RERF-LC-OK, and LC-2/ad, this CpG island was methylated, but the

transcript was only detected in RERF-LC-OK. In cell line ABC-1, only a methylated CpG island was observed and the transcript was not detected. We confirmed that the genes are reactivated by treatment with 5-aza-2'-deoxycytidine and the histone deacetylase inhibitor trichostatin A (unpublished results). In each reaction the presence of nearly equal quantities of template RNA was confirmed by RT–PCR experiments on the HPRT gene (data not shown).

Figure 4 Bisulfite genomic sequencing analysis of the DNA fragments derived from the 3' CpG island B1847 (PAX1). Symbols are identical to those used in Figure 3

Discussion

In this study we have described the comprehensive isolation and identification of CpG islands that are methylated in human lung adenocarcinoma. A major consequence of this de novo methylation of CpG islands is the possible transcriptional repression of the associated gene. Indeed, in our study we have confirmed that some genes that are inactivated and associated with CpG island methylation are reactivated by demethylation and inhibition of histone deacetylation, suggesting that methylation is a cause of silencing. Therefore we believe that of the inactivated genes with CpG island methylation, a number will be novel tumor suppressor genes. For example, CpG island B2254 is derived from the HOXA5 gene whose gene product positively regulates the expression of the TP53 tumor suppressor gene (Raman et al., 2000). It has been reported that this gene is inactivated by CpG methylation in human breast cancer (Raman et al., 2000). A tumor-suppressive role for the SFRP1 gene in human breast cancer has also been reported recently (Ugolini et al., 2001). The SFRP1 gene product is a member of the Secreted Frizzled-Related family and one of its members, SFRP2 (B2878 in Table 1), has been isolated in our study. These results would also strongly imply that the SFRP2 gene product has a tumor-suppressor role. As our investigation has led to the isolation of CpG island fragments from known, and candidate, tumor suppressor genes it is likely that a number of the other isolated fragments will correspond to previously unidentified lung tumor suppressor genes.

As yet the exact number of CpG islands in the human genome is unknown. On the basis of sensitivity to HpaII digestion, Antequera and Bird (1993) have estimated that there are 45 000 CpG islands in the haploid human genome. However, based on CpG density, a figure of 29 000 CpG islands has been suggested (Venter et al., 2001; International Human Genome Sequencing Consortium, 2001) (although the methylation status of these sequences has yet to be determined). Costello et al. (2000) analysed aberrant CpG islands in human cancer by restriction landmark genomic scanning and found the number of methylated CpG islands to be between zero and 4500 assuming 45 000 CpG islands in the human genome, depending on the patient. This provides an average of approximately 600. A similar figure has also been reported in their subsequent work (Dai et al., 2001). In our study, the number of methylated CpG islands isolated per patient was between 24 and 138. The differences between patients could be attributed to the CpG island methylator phenotype as reported in colon cancer (Toyota et al., 1999) or heterogeneity in subtypes of lung adenocarcinomas. Since we are unsure how many methylated CpG islands are contained in the original library, we cannot estimate the total number of methylated CpG islands in each patient's DNA. However, most randomly selected CpG islands were methylation free (Figure 4 and data not shown) (except for patients 113 and 116) and we believe that most of the highly represented CpG islands in the library have been identified. Therefore we can estimate the average number of methylated CpG islands in human lung adenocarcinomas to be approximately several hundred (1–2% of the total number of CpG islands). Fourteen of the CpG islands identified did not have tumor-specific methylation. Isolation of these fragments could be due to accidental contamination or overrepresentation of DNA fragments that were methylated in only a limited population of cancer cells.

In our analysis we found that some aberrant methylation was not tumor-specific and was also observed in DNA derived from noncancerous portions of lung tissue (Figure 2b). However, methylation of each CpG island in noncancerous portions of the lung did not occur in all patients. These results suggest that methylation of these CpG islands is not an intrinsic, developmentally programmed event, but a de novo event. The incremental differences in methylation between patients may be an event associated with aging (Issa et al., 1994; Ahuja et al., 1998) or a

Table 1 CpG islands methylated in human lung adenocarcinomas

Name of CpG islands    Accession no. of the isolated sequences    Identical genomic sequences and/or genes (positions of the potential CpG island)    Chromosome    Patients 44 56 59 108 113 116 120 202 209
B1    AB060342    AC006349 (11 338–13 132)    14    + + + + + +
B2    AB060343    AL162458 (143 549–145 387), KCC2    20    ++++++
B6    AB060344    ++ + +
B16    AB060345    AC091825    5e
B42    AB060346    AC004590 (8482–9064), epsin 3    17    +
B61    AB060347    AC021102    17e    ++++
B97    AB060348    AC084856    8e    +++++++
B115    AB060349    AC003112 (10 261–11 257)    19
B119    AB060350    AC007649 (139 537–141 616)    12    + + +
B151    AB060351    AC004232 (33 891–34 882)    16    + + + +
B163    AB060352    AL161781 (80 572–81 509)    9    + + + +
B178    AB060353    Z93024 (38 645–39 579), PKDREJ    22    + ++++
B205    AB060354    Semcap3 mRNA    3f    ++++++
B222    AB060355    AL356095 (136 411–136 852)    10    + + +
B229    AB060356    AC007040 (126 606–127 469)    2    + + + + +
B240    AB060357    AL033527 (10 840–12 048), FKSG25 mRNA    1    + + + +
B321    AB060358    AC011964    4e    ++++++ +
B329    AB060359    AC017068 (112 053–113 213), PITX2    4    +++++
B336    AB060360    AC020928 (34 983–35 415)    19    ND + + + + + + +
B348    AB060361    AC005481 (109 630–115 082)    7    + + + + + + +
B350    AB060362    AC011445 (28 598–29 750)    19    + + + + +
B379    AB060363    AC005263 (31 558–34 851), AMH    19    ++++ +
B395    AB060364    RAC1    ++++++
B404    AB060365    AL162831 (153 126–154 911)    14    + + + + + + +
B422    AB060366    AC022739, CDS1    4e    ++
B453    AB060367    AP001696 (301 878–302 494)    21    + + + + + + +
B469    AB060368    AL049715 (111 327–112 500), NOTCH2    1    +
B495    AB060369    AC073094    7e    ++ + + +
B505    AB060370    AC099552    5e    ++ + + +
B521    AB060371    AC009501 (100 867–102 014)    2    + + + + + + +
B601    AB060372    AC005596 (3658–7267)    19    + + + + + + +
B621    AB060373    AC009336 (112 758–114 802), HOXD8    2    + ++++++
B646    AB060374    AC031987    5e    +++
B666    AB060375    AC008750 (142 132–142 446)    19    + + + + + +
B668    AB060376    AP001694 (46 357–47 130), VE-JAM    21    + + +
B675    AB060377    AC013460 (68 787–69 149)    2    + + + +
B685    AB060378    AC013421 (11 667–12 367)    12    + + +
B722    AB060379    19e    ++++++++
B735    AB060380    AC005746 (9027–11 840), TBX2    17    +++++++
B754    AB060381    AC025872 (75 160–77 300), NID2    14    ++++++++
B762    AB060382    AC025525, AVPR1A    12e    ++++++ +
B765    AB060383    AC021078 (35 226–35 631)    5    +
B787    AB060384    AC004080 (98 909–99 243)    7    + + + + + ND + +
B803    AB060385    AL158152 (27 783–29 123)    9
B804    AB060386    AC087721, ITPKA    15e    +++++ +
B858    AB060387    AC023111    15e    +++++++
B865    AB060388    AC034241 (1268–2146)    5    + +
B947    AB060389    AC021588 (55 240–57 332), IGSF4    11
B951    AB060404    AC009336 (112 758–114 802)a, HOXD8    2    ND
B1032    AB060390    AC022283    10e    ++
B1035    AB060391    Z95332 (6719–9113), PAX6    11    ++++++++
B1039    AB060392    AC022754, 60S ribosomal protein L21    1    + + + + +
B1042    AB060393    AC026774 (43 597–45 418)    5    + + + + +
B1058    AB060394    AC079403 (56 569–57 088), GLRB    4e    +
B1072    AB060395    AC004736 (67 127–69 630), MYOD1    11    + + + + +
B1099-1    AB060396    AC023315    17e    +++++++
B1173    AB060397    AC106738    16e    + ++++
B1233    AB060398    AC005901 (114 639–115 332)    17    + + + + + + +
B1260    AB060399    AL049635 (96 016–96 357)    11    + + + + ND +
B1280    AB060400    AC006450 (22 956–24 383)    9    + +
B1310    AB060401    AC022254    15e    ++ + + + +
B1340    AB060402    AC093495 (134 640–135 921)    3    +
B1382    AB060403    AL158800 (68 087–68 973)    14    +
B1429    AB060405    MLPH
B1454    AB060406    AL357125    10e
B1501    AB060407    AC012362 (103 058–103 707), PTHR2    2    +++
B1514    AB060408    +++
B1515    AB060409    AC022488, SIPA1    11e    +

Table 1 (Continued)

Name of CpG islands    Accession no. of the isolated sequences    Identical genomic sequences and/or genes (positions of the potential CpG island)    Chromosome    Patients 44 56 59 108 113 116 120 202 209
B1527    AB060410    AC105113    18e    ++++++
B1533    AB060411    AL035467 (137 387–138 395)    6    +
B1536    AB060412    AL161658 (30 378–32 057)    20    + + + + +
B1559    AB060413    ++
B1562    AB060414    AL138691 (25 251–27 935)    13    + + +
B1634    AB060415    AC087591 (91 329–92 439)    3    + + + + + +
B1644    AB060418    AC084377 (142 914–144 542)    2    + + + + + + + +
B1666    AB060416    AC021518 (84 042–84 552)    8    + + + + +
B1727    AB060417    AC017068 (91 651–92 000)    4    + +
B1756    AB060419    AC018730 (67 825–70 469)    2    + + + + + + + +
B1765    AB060420    AL354735 (153 104–155 842)    9    + + + + + +
B1771    AB060421    AC012458    15e    ++ +
B1786    AB060422    AC007917 (160 628–161 169)    3    + + + + +
B1809    AB060423    AL008733 (98 904–102 923), PRDM16    1    + + +++
B1844    AB060424    AC023133    17e    +++++
B1847    AB060425    AL035562 (51 715–52 586), PAX1    20    ++++++ +
B1953    AB060426    AL022069 (133 090–134 987)    6
B1955    AB060427    AC012531    12    + + + + + +
B1965    AB060428    AC021188 (59 624–59 948)    2    + + + + +
B1966    AB060429    AC008761    19e    +++
B1972    AB060430    AL390294 (47 075–49 612)    10    +
B1977    AB060432    AL355497 (188 750–189 948)    6
B1987-1    AB060433    AC011599 (205 624–206 270)    3    + + + + +
B2060    AB060434    AC092669 (69 008–70 374)    2    + + +
B2074    AB060435    AC360216    10e    ND ND + + + + + +
B2166    AB060436    AL356308 (70 079–71 103)    10    + + + + +
B2205    AB060437    AC058812    4e    ++++++++
B2223    AB060438    AC106864    4e    +++++++
B2254    AB060439    AC004080 (16 365–17 453), HOXA5    7    ++ ++++
B2329    AB060440    AL449264 (104 266–106 270), NHLH21    1    ++ ++
B2361    AB060441    AL158827    9e    +++++++++
B2364    AB060442    AC004776 (39 056–39 881), PCDHAC2    5    ++ ++++ +
B2421    AB060443    AL049636 (48 876–49 707)    1    + + + +
B2424    AB060444    AC004480    4e    ++ + + +
B2477    AB060445    AP000697 (70 231–75 013), SIM2    21    ++++
B2509    AB060446    GS15 mRNA
B2515    AB060447    AC020635    3e    ++ +++
B2517    AB060448    AC032001    3e    +++ +++
B2573    AB060449    AC044787 (114 159–115 787)    15    + + + +
B2586    AB060450    AL162831 (163 957–164 879)    14    + + + + + +
B2601    AB060451    AC093887 (104 091–105 470), POU4F2    4    +
B2650    AB060452    AL096853 (10 684–12 030)    22    + + + +
B2707    AB060453    AC073957 (148 356–149 669)    7    + + +
B2737    AB060454    AC009336 (162 013–162 689)    2    + + + + + + + + +
B2748    AB060455    AF228661    8e    ++
B2755    AB060456    AL121722 (20 229–20 964)    20    + + + +
B2779    AB060457    AC010496    5e    ++ + + + +
B2813    AB060458    AC003042 (6620–7536)    17    + + + + +
B2878    AB060459    AC020703 (70 520–71 606), SFRP2    4    ++
B2934    AB060460    AC067863, CHRNA3    15e    ++
B2981    AB060461    AC011815 (128 188–129 033)    18    ND +
B2983    AB060462    AC009711, TRKC    15e    ++++
B3123    AB060463    AC011224 (121 131–122 035), GABRAG3    15    + +
B3125    AB060464    AC016931    3e    +++
B3140    AB060465    AL450998 (112 002–114 832)    1    +
B3176    AB060466    AL354950 (12 652–14 641)    10    + + + + + + + +
B3216    AB060467    AC003663 (42 773–46 203)    17    + + + +
B3242    AB060468    AC037475 (41 717–42 286)    17    + +
B3263    AB060469    AL136320 (28 530–28 860), CUGBP2    10    + ND + + + ND + +
B3266    AB060470    AC009789 (96 713–97 195), HOXB6    17    ++++++++
B3363    AB060471    AC003971 (55 630–57 501), ST8SiaIII    18    + + +
B3447    AB060472    AL135794 (81 672–85 074), LBX1    10    + + + + +
B3546    AB060473    AL121875 (113 132–113 608)    X    + ND + + + ND + + +
B3549    AB060474    AC004955 (114 502–115 592)    7
B3560    AB060475    AC012391    10e    +++
B3569    AB060476    AC027756    3e    +
B3600    AB060477    AC004764 (65 907–67 954), PITX1    5    ++++++
B3812    AB060478    AC024191    4e    +++++ +
B3890    AB060480    AL022395 (122 403–125 773), N-Oct-3    6    +

Table 1 (Continued)

Name of CpG islands    Accession no. of the isolated sequences    Identical genomic sequences and/or genes (positions of the potential CpG island)    Chromosome    Patients 44 56 59 108 113 116 120 202 209
B3893    AB060481    AL356095 (136 441–136 852)    10    +
B3944    AB060482    AL035588 (92 791–94 853), MDFI    6    + +++
B3954    AB060483    AC004080 (124 941–125 947)    7    + + + + + +
B4011    AB060484    AL359921 (163 273–164 123), ACTN2    1    +++++
B4019    AB060485    AC017104 (138 580–139 153), NMUR1    2    + +
B4037    AB060486    AC087279, NELL1    11e    +
B4156    AB060487    AL136108, mRNA for KIAA1437 protein    9e
B4168    AB060488    AL355338 (28 325–32 398), ZIC5    13    ++++++++
B4203    AB060489    ++++++++
B4276    AB060490    AL360216b    10e    ND ND + + + + + +
B4303    AB060491    ARP3b
B4703    AB060492    AC016940 (27 173–28 018)    X    +
B4752    AB060493    AC008755 (155 149–157 434) NPAS1    19    + + +
B4758    AB060494    AL136088 (99 688–101 027)    11    +
B4863    AB060495    AC020915 (2056–2816)    19    ND + +
B4870    AB060496    AC007450 (194 534–195 042)    12    ND + ND + +
B4873    AB060497    AC018866 (8340–9130)    2    + + + + + +
B4884    AB060498    AC022031    18e    +++++
B4892    AB060499    AC011005 (113 910–115 041), SMOH    7    ND ++
B4924    AB060500    AC073228, HRLP5    11e    ++
B5102    AB060501    AL590452 (137 510–137 949)    1    + + +
B5215    AB060502    AL109801 (17 582–18 111)    X    ND + + ND + +
B5218    AB060503    AC009789 (78 616–79 204), HOXB8    17    + ++++
B5240    AB060504    AL121917 (28 732–31 919), GNAS1    20    +++ +
B5268    AB060505    AC010642 (43 029–43 808)    19    + + + + +
B5279    AB060506    AL133231 (44 097–45 096), TCF15    20    +++++++
B5288    AB060507    AC073869 (99 022–99 803), GNAQ    2    +++++
B5385    AB060508    AL137061 (74 350–77 248), SOX21    13    +++
B5414    AB060509    AC079061    8e    ++++ +
B5449    AB060510    AC007461    17    + ND + + + ND + + +
B5456    AB060511    AL162411 (35 355–36 233), GLDC    9    +
B5470    AB060512    AL121875 (105 894–108 604), SOX3    X    +++
B5479    AB060513    AL121760 (71 888–72 615)    20    ND + + + + ND
B5504    AB060514    AC005826 (155 844–156 723)    7    + + + + + + + + +
B5569    AB060515    AC024606 (135 130–135 511)    10    + + + +
B5643    AB060516    AC004232 (33 891–34 882)c    16    + + + +
B5658    AB060517    AP001715 (320 152–320 872)    21    + + + + + + + +
B5703    AB060518    AF053356 (56 288–56 865), EPO    7    ++++++
B5770    AB060520    AC034139 (38 663–39 087)    4    + + +
B5845    AB060521    AC016255, mRNA H41    3e
B5888    AB060522    ++++
B8900    AB060523    AC084251, PRDM14    8e    ++++++++
B5932    AB060524    AL121926 (126 562–127 299)    11    + + +
B6051    AB060525    AC008060 (2742–6003)    7    + + + + + +
B6183    AB060526    AL138691d    13e    ++++++ +
B6191    AB060527    AL139260 (34 072–34 724)    1    + + + + + +
B6220    AB060528    AP000866 (157 944–158 678)    11    + + + +
B6319    AB060529    AL590369 (4438–5154)    9    + + +
B6372    AB060530    AC016044    15e    + ++++
B6403    AB060531    AC016813    8e    ++
B6509    AB060532    AC090790    8e    +++ ++++
B6661    AB060533    AL035694 (50 917–52 580)    6    + + +
B6700    AB060534    AC084232, SOX11    2e    ++ +

Positions of potential CpG islands are indicated in parentheses. PCR products for CpG islands B16, B115, B803, B947, B1340, B1429, B1454, B1953, B1977, B2509, B3549, B4156, B4303, and B5845 were not observed in H in tumor DNA. +, presence of PCR product in H fractions in cancer DNA. ND, not done. a Within the same Tsp509 I fragment as B621. b Within the same CpG island and Tsp509 I fragment as B2074. c Within the same CpG island and Tsp509 I fragment as B151. d Within the same CpG island as B1562. e Based on working draft sequences. Due to frequent updates, positions of potential CpG islands are not indicated. f http://www.kazusa.or.jp/huge/
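The per-patient percentages quoted in the Results (for example 13% for patient 44 and 76% for patient 116) are fractions of islands scored "+" in a patient's column of Table 1. A minimal sketch of that tally is given below. It assumes, as a choice made here and not stated in the source, that islands marked ND are excluded from the denominator. The function name and the example column are hypothetical and do not reproduce any real column of Table 1.

```python
def methylated_fraction(calls):
    """Fraction of scored CpG islands with a PCR product in the H fraction.

    calls: one entry per CpG island for a single patient, where
           "+" means product in H, "" means no product, "ND" means not done.
    Assumption of this sketch: "ND" entries are left out of the denominator.
    """
    scored = [c for c in calls if c != "ND"]  # drop islands that were not tested
    if not scored:
        return None  # nothing was scored for this patient
    return sum(c == "+" for c in scored) / len(scored)


# Hypothetical column for one patient (illustrative values only).
example_calls = ["+", "", "ND", "+", ""]
fraction = methylated_fraction(example_calls)
print(f"{fraction:.0%} of scored islands methylated")
```

If ND entries were instead counted as unmethylated, the denominator would be the full number of islands and the reported fraction would be lower.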

reflection of increased methylation activity and/or decreased demethylation activity in precursor cells. Recently it has also been suggested that different CpG islands have different susceptibility to de novo methylation (Nguyen et al., 2001).

Methylation of CpG islands was not restricted to genes expressed in lung tissue. Some CpG islands of genes such as PKDREJ (B178), which is exclusively expressed in testis (Hughes et al., 1999), and the arginine vasopressin receptor 1A (AVPR1A) gene (B762), which is not expressed in lung tissue (Thibonnier et al., 1996), were also methylated in many patients (Table 1). Apparently these examples of de novo methylation are not associated with transcriptional repression. CpG islands associated with cDNAs FLJ20699 and FLJ10140 are located near the PKDREJ locus, but in all patients the genes represented by these cDNAs were unmethylated (data not shown). These results demonstrate that methylation of the PKDREJ CpG island was not a consequence of an indiscriminate wave of regional island methylation. The elucidation of the molecular mechanism and significance of de novo methylation of the CpG island in nonexpressed genes requires further study. However, it has recently been reported that DNA methylation mediated by DNMT1 is associated with histone acetylation and altered chromatin structure (Fuks et al., 2000; Rountree et al., 2000; Robertson et al., 2000). Furthermore, in Neurospora crassa, DNA methylation has been shown to depend on histone methylation (Tamaru and Selker, 2001). Therefore methylation of CpG islands of nonexpressed genes may be a consequence of altered histone modification that regulates chromatin structure.

In summary, we have produced a catalog of CpG islands that are specifically methylated in human lung adenocarcinomas. We believe that this catalog will be an invaluable resource for the elucidation of epigenetic processes in carcinogenesis.

Figure 5 Analysis of the methylation status of CpG islands and the expression of associated genes in cultured cells. DNAs were analysed by MBD column chromatography and PCR. Total RNA was analysed by RT-PCR. RT-PCR reactions were performed in the presence (+) or absence (−) of reverse transcriptase, respectively.

Materials and methods

Methylated DNA library and SPM analysis

Construction of the library of methylated DNA derived from nine male lung adenocarcinoma patients was performed as previously reported (Shiraishi et al., 1999a). The original library, which was constructed from a mixture of tumor DNAs from nine male lung adenocarcinoma patients (ages ranging from 52 to 76) and contained 3 × 10^5 plaque forming units (pfu), was amplified to 5 × 10^11 pfu. An aliquot of the amplified library (1.5 × 10^8 pfu) plus helper phage ExAssist (1.5 × 10^9 pfu, Stratagene) was then used to coinfect XL1-Blue MRF' cells (3 × 10^8 cells). After a short period of culture, heat-treated supernatant was incubated with SOLR cells (suspension in 20 mM MgSO4 solution from 20 ml of overnight culture) and spread on sixteen 243 × 243 mm plates for conversion to phagemid clones (2 × 10^9 colony forming units). The conversion procedure was performed using a published protocol (Short and Sorge, 1992) with slight modifications. All colonies on plates were harvested by gently scraping into a small volume of medium (30 ml per plate) and subjected to overnight culture. Phagemid DNAs were prepared and purified by two rounds of equilibrium centrifugation in CsCl-ethidium bromide gradients. After AseI digestion of the phagemid DNAs, DNA fragments longer than 2 kb were isolated by preparative agarose gel electrophoresis. Twenty µg of recovered fragments were digested with Tsp509 I, MseI, and NlaIII, and the DNA was recovered using GENECLEAN KIT (BIO 101). One third of the recovered fragments were subjected to denaturing gradient gel electrophoresis as described previously (Shiraishi et al., 1995, 1998, 1999a). As the amount of DNA retained in the gel was not sufficient to be detected by ethidium bromide staining, the bottom half of the gel was excised and DNA fragments were recovered as previously described (Shiraishi et al., 1995). The bottom 10% of the gel was then discarded to avoid contamination of dissociated single-stranded DNA. Fifty per cent of the recovered fragments were ligated with a mixture of EcoRI, NdeI, SphI, EcoRI/NdeI, EcoRI/SphI, and NdeI/SphI digests of pKF3 (TaKaRa) and electroporated into the bacterial strain TH2 (Hashimoto-Gotoh et al., 1993). Plasmid DNA was prepared from each colony and subjected to nucleotide sequence analysis of a single DNA strand. Similarity searches were performed using websites (http://www.ncbi.nlm.nih.gov/blast/blast.cgi, http://www.ncbi.nlm.nih.gov/genome/seq/HsBlast.html, http://spiral.genes.nig.ac.jp/homology/blast-e.shtml). The algorithm at website (http://www.ebi.ac.uk/cpg/ and/or http://www.ebi.ac.uk/emboss/cpgplot/) was used to detect potential CpG islands. An aliquot of the amplified library is available upon request.

Analysis of the methylation status by MBD column chromatography

Ten µg of high molecular weight DNA from normal submandibular glands (MaN) and that from the cell line Lu65 (purified by equilibrium centrifugation in CsCl gradients) were digested with 30 units of Tsp509 I (New England Biolabs) following the manufacturer's recommendations. After phenol-chloroform extraction and precipitation, the recovered digests were analysed separately using the same MBD column as described previously (Shiraishi et al., 1999a). An aliquot from each fraction was subjected to PCR to detect DNA fragments containing a CpG island of the CDH1 gene as described previously (Shiraishi et al., 1999a). The CDH1 CpG island is not methylated in MaN DNA (Shiraishi et al., 1999a) and is methylated in Lu65 DNA (Yoshiura et al., 1995). MaN fractions containing the DNA fragment corresponding to the nonmethylated CDH1 CpG island and those corresponding to the fractions containing the methylated CDH1 CpG island in Lu65 DNA were classified as low salt (L) and high salt (H) fractions, respectively. The absolute salt concentration at the boundary can differ between experiments since the retention capacity of the column decreases after extensive use, and its determination requires calibration using appropriate standard DNA. DNAs from surgical specimens (Shiraishi et al., 1989) and cell lines had already been fractionated as described previously (Shiraishi et al., 1999a). Further information on PCR conditions and primer sequences is available upon request.

Bisulfite genomic sequencing experiments

Bisulfite modification of Tsp509 I digests of genomic DNA was performed following a published procedure (Frommer et al., 1992) with slight modifications. Approximately 200 ng of bisulfite-treated DNA was used for PCR experiments. Further information on PCR conditions and primer sequences is available upon request.

RT-PCR experiments

Total RNA from cultured cells was extracted with TriPure Isolation Reagent (Boehringer Mannheim). Three µg of total RNA was incubated with random hexamer oligodeoxynucleotides (Gibco-BRL). Half of the mixture was treated with M-MLV reverse transcriptase (Gibco-BRL) following the manufacturer's recommendation (RT+). The other half was used as negative control (RT−). One twentieth of each reaction mixture was used as template for PCR. Further information on PCR conditions and primer sequences is available upon request.

Acknowledgments

This work was supported in part by a Grant-in-Aid for Cancer Research from the Ministry of Health and Welfare of Japan, a Grant-in-Aid for Scientific Research on Priority Areas (C) 'Medical Genome Science' from the Ministry of Education, Science, Sports and Culture of Japan (to M Shiraishi), a Grant-in-Aid from the Ministry of Health and Welfare of Japan for the 2nd Term Comprehensive 10-Year Strategy for Cancer Control (to M Shiraishi and T Sekiya), a research Grant on Human Genome and Gene Therapy from the Ministry of Health and Welfare, Japan and a Grant from Takeda Science Foundation (to T Sekiya). MJ Terry and YH Chuu were Research Fellows supported by the MIT Japan Program. AJ Oates was a Foreign Research Fellow of the Foundation for Promotion of Cancer Research, Japan.
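
Materials and methods states that potential CpG islands were detected with the algorithm on the EBI web pages, but does not give the thresholds used. The Python sketch below is therefore only an illustration of the kind of test such tools apply, assuming the commonly quoted criteria of a length of at least 200 bp, a G+C content above 50% and an observed/expected CpG ratio above 0.6. The function names `cpg_stats` and `looks_like_cpg_island` are hypothetical, the thresholds are assumptions rather than the authors' settings, and the whole sequence is scored at once instead of with the sliding window that dedicated programs use.

```python
from typing import Tuple


def cpg_stats(seq: str) -> Tuple[int, float, float]:
    """Return (length, G+C fraction, observed/expected CpG ratio) for a DNA string."""
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # CpG dinucleotides; "CG" cannot overlap itself
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (C count * G count) / length.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return n, gc_fraction, obs_exp


def looks_like_cpg_island(seq: str,
                          min_len: int = 200,        # assumed threshold
                          min_gc: float = 0.5,       # assumed threshold
                          min_obs_exp: float = 0.6   # assumed threshold
                          ) -> bool:
    """Apply the assumed length, G+C and observed/expected CpG cut-offs."""
    n, gc_fraction, obs_exp = cpg_stats(seq)
    return n >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


if __name__ == "__main__":
    candidate = "CG" * 100   # artificial CpG-rich sequence, 200 bp
    background = "AT" * 100  # artificial sequence without any CpG
    print(looks_like_cpg_island(candidate), looks_like_cpg_island(background))
```

The observed/expected ratio normalizes the CpG count by base composition, so a G+C-rich region that is nevertheless depleted of CpG does not pass the test.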

References

Ahuja N, Li Q, Mohan AL, Baylin SB and Issa JP. (1998). Cancer Res., 58, 5489-5494.
Antequera F, Boyes J and Bird A. (1990). Cell, 62, 503-514.
Antequera F and Bird A. (1993). Proc. Natl. Acad. Sci. USA, 90, 11995-11999.
Baylin SB, Herman JG, Graff JR, Vertino PM and Issa J-P. (1998). Adv. Cancer Res., 72, 141-196.
Bird AP. (1986). Nature, 321, 209-213.
Bird A. (2002). Genes Dev., 16, 6-21.
Brock GJR, Charlton J and Bird A. (1999). Gene, 240, 269-277.
Costello JF, Frühwald MC, Smiraglia DJ, Rush LJ, Robertson GP, Gao X, Wright FA, Feramisco JD, Peltomäki P, Lang JC et al. (2000). Nat. Genet., 25, 132-138.
Cross SH, Charlton JA, Nan X and Bird AP. (1994). Nat. Genet., 6, 236-244.
Dai Z, Lakshmanan RR, Zhu W-G, Smiraglia DJ, Rush LJ, Frühwald MC, Brena RM, Li B, Wright FA, Ross P, Otterson GA and Plass C. (2001). Neoplasia, 3, 314-323.
Delgado S, Gómez M, Bird A and Antequera F. (1998). EMBO J., 17, 2426-2435.
Frommer M, McDonald LE, Millar DS, Collis CM, Watt F, Grigg GW, Molloy PL and Paul CL. (1992). Proc. Natl. Acad. Sci. USA, 89, 1827-1831.
Fuks F, Burgers WA, Brehm A, Hughes-Davies L and Kouzarides T. (2000). Nat. Genet., 24, 88-91.
Hashimoto-Gotoh T, Tsujimura A, Kuriyama K and Matsuda S. (1993). Gene, 137, 211-216.
Hughes J, Ward CJ, Aspinwall R, Butler R and Harris PC. (1999). Hum. Mol. Genet., 8, 543-549.
International Human Genome Sequencing Consortium. (2001). Nature, 409, 860-921.
Issa J-P, Ottaviano YL, Celano P, Hamilton SR, Davidson NE and Baylin SB. (1994). Nat. Genet., 7, 536-540.
Larsen F, Gundersen G, Lopez R and Prydz H. (1992). Genomics, 13, 1095-1107.
Nguyen C, Liang G, Nguyen TT, Tsao-Wei D, Groshen S, Lübbert M, Zhou J-H, Benedict WF and Jones PA. (2001). J. Natl. Cancer Inst., 93, 1465-1472.
Raman V, Martensen SA, Reisman D, Evron E, Odenwald WF, Jaffee E, Marks J and Sukumar S. (2000). Nature, 405, 974-978.
Robertson KD, Ait-Si-Ali S, Yokochi T, Wade P, Jones P and Wolffe AP. (2000). Nat. Genet., 25, 338-342.
Rountree MR, Bachman KE and Baylin SB. (2000). Nat. Genet., 25, 269-277.
Schmidt M and Migeon BR. (1990). Proc. Natl. Acad. Sci. USA, 87, 3685-3689.
Shiraishi M, Noguchi M, Shimosato Y and Sekiya T. (1989). Cancer Res., 49, 6474-6479.
Shiraishi M, Lerman LS and Sekiya T. (1995). Proc. Natl. Acad. Sci. USA, 92, 4229-4233.
Shiraishi M, Oates AJ, Xu L, Hosoda F, Ohki M, Alitalo T, Lerman LS and Sekiya T. (1998). Nucleic Acids Res., 26, 5544-5550.
Shiraishi M, Chuu YH and Sekiya T. (1999a). Proc. Natl. Acad. Sci. USA, 96, 2913-2918.
Shiraishi M, Sekiguchi A, Chuu YH and Sekiya T. (1999b). Biol. Chem., 380, 1127-1131.
Shiraishi M, Lerman LS, Oates AJ, Xu L, Chuu YH, Sekiguchi A and Sekiya T. (2000). Gene Ther. Mol. Biol., 5, 35-42.
Shiraishi M, Sekiguchi A, Oates AJ, Terry MJ, Miyamoto Y and Sekiya T. (2001). Proc. Japan Acad., 77(B), 208-211.
Shiraishi M, Oates AJ and Sekiya T. (2002). Biol. Chem., in press.
Short JM and Sorge JA. (1992). Meth. Enzymol., 216, 495-508.
Tamaru H and Selker EU. (2001). Nature, 414, 277-283.
Tazi J and Bird A. (1990). Cell, 60, 909-920.
Thibonnier M, Graves MK, Wagner MS, Auzan C, Clauser E and Willard HF. (1996). Genomics, 31, 327-334.
Toyota M, Ahuja N, Ohe-Toyota M, Herman JG, Baylin SB and Issa J-P. (1999). Proc. Natl. Acad. Sci. USA, 96, 8681-8686.
Ugolini F, Charafe-Jauffret E, Bardou V-J, Geneix J, Adélaïde J, Labat-Moleur F, Penault-Llorca F, Longy M, Jacquemier J, Birnbaum D and Pébusque M-L. (2001). Oncogene, 20, 5810-5817.
Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA et al. (2001). Science, 291, 1304-1351.
Yoshiura K, Kanai Y, Ochiai A, Shimoyama Y, Sugimura T and Hirohashi S. (1995). Proc. Natl. Acad. Sci. USA, 92, 7416-7419.
