(19) United States
(12) Patent Application Publication
Chinnaiyan et al.
(10) Pub. No.: US 2013/0022974 A1
(43) Pub. Date: Jan. 24, 2013

(54) DNA METHYLATION PROFILES IN CANCER
(75) Inventors: Arul M. Chinnaiyan, Plymouth, MI (US); Mohan Saravana Dhanasekaran, Ann Arbor, MI (US); Jung Kim, Northville, MI (US)
(73) Assignee: THE REGENTS OF THE UNIVERSITY OF MICHIGAN, Ann Arbor, MI (US)
(21) Appl. No.: 13/523,545
(22) Filed: Jun. 14, 2012

Related U.S. Application Data
(60) Provisional application No. 61/498,296, filed on Jun. 17, 2011.

Publication Classification
(51) Int. Cl. C12Q 1/68 (2006.01)
(52) U.S. Cl. 435/6.11

(57) ABSTRACT
The present invention relates to compositions and methods for cancer diagnosis, research and therapy, including but not limited to, cancer markers. In particular, the present invention relates to methylation levels of genes (e.g., in CGI islands of the promoter regions) as diagnostic markers and clinical targets for prostate cancer.

Patent Application Publication Jan. 24, 2013 Sheet 1 of 30 US 2013/0022974 A1

Figure 1 A

[Drawing sheets 1 and 2 of 30 not reproduced. The only recoverable label is "Benign Adjacent".]

Figure 3 (Sheet 3 of 30)

[Drawing not reproduced. Recoverable labels: N = 6619 unique TSS (6077 unique RefSeq genes); promoter groups: on CpG islands, on 5' shore of CpG islands, on 3' shore of CpG islands, on non-CpG islands.]

[Drawing sheets 4 and 5 of 30 not reproduced; no labels are recoverable.]

Figure 6 (Sheet 6 of 30)

[Drawing not reproduced. Recoverable labels: RASSF1 (chr3:50,342,221-50,353,371); tracks for H3K4 methylation and DNA methylation; variants 1 & 4, variant 2 and variant 3; a second gene panel with variants 1-4 and variants 5-8; samples PrEC, LNCaP, LNCaP DMSO and LNCaP 5-Aza; 5' RACE; exons.]

Figure 7 (Sheet 7 of 30)

[Drawing not reproduced. Recoverable labels: H3K4 + Methylation; CpG islands; Methylated Regions; H3K4 Binding Regions; horizontal axis from -1500 bp through the TSS to +1500 bp.]

Figure 8 A (Sheet 8 of 30)

[Drawing sheets 8 and 9 of 30 not reproduced; no labels are recoverable.]

Figure 10 (Sheet 10 of 30)

[Drawing not reproduced. Recoverable labels: (A) Raw counts associated with chromosome 21, correlation by raw counts, panels "LNCaP 400 bp Cut v1" and "PrEC 400 bp Cut v1", with a correlation value of 0.8556 legible on one panel; (B) Peaks associated with total CpG islands, correlation by peak height.]

Figure 11 (Sheet 11 of 30)

[Drawing not reproduced. Recoverable labels: (A) LNCaP vs. PrEC heatmap with gene labels including CDKN2A and FOXA2 (other gene labels illegible); MeDIP-Seq and M-NGS overlap; CGI promoter (13936); CGI non-promoter (54572).]

Figure 12 (Sheet 12 of 30)

[Drawing not reproduced. Recoverable labels: CpG islands; Methylated Regions; LNCaP; PrEC; n = 3,496 RefSeq genes; horizontal axis from -1,500 bp through the TSS to +1,500 bp.]

Figure 13 and Figure 13 (continued) (Sheets 13 to 16 of 30)

[Genome browser drawings not reproduced. Tracks shown for each locus: BS-Seq Amplicon, PrEC-MNGS, LNCaP-MNGS, LNCaP-MeDIP, RefSeq Genes, Repeating Elements by RepeatMasker, ENCODE HudsonAlpha Methyl-seq (K562). Loci shown, as far as the labels are legible:
a chr1 locus whose label is truncated (...-31814918)
TSPAN1 chr1:46418579-46418802
SPON2 chr4:1156947-1157266; 1157164-1157488
SHC1 chr1:153209107-153209344
RASSF1A chr3:50353011-50353198
PPP1R3C chr10:93382583-93382815
NTN4 chr12:94707434-94707704
MYC chr8:128815497-128815710]

Figure 14 and Figure 14 (continued) (Sheets 17 to 20 of 30)

[Genome browser drawings not reproduced. Tracks shown for each locus: BS-Seq Amplicon, PrEC-MNGS, LNCaP-MNGS, LNCaP-MeDIP, RefSeq Genes, Repeating Elements by RepeatMasker, ENCODE HudsonAlpha Methyl-seq (K562). Loci shown, as far as the labels are legible:
LAMC2 chr1:181421688-181421931
KCTD1 chr18:22381178-22381550
CDKN2A chr9:21960825-21961015
CALML3 chr10:5556581-5556830
CHMP4A chr14:28313174-28313411
a locus whose label is truncated (...-112101503)
AOX1 chr2:201158884-201159157
AMT chr3:49434492-49434742]

[Drawing sheet 21 of 30 not reproduced; no labels are recoverable.]

Figure 16 (Sheet 22 of 30)

[Bisulfite sequencing drawings not reproduced. Panels are labeled AOX1, C9orf125, NTN4, AMT, PPP1R3C and NAP1L5 with chromosomal coordinates that are only partly legible. Legend: MC-DNA (Universally Methylated Control DNA); UMC-DNA (Unmethylated Control, Fetal DNA); PrEC; LNCaP; numbered CpG positions.]

Figure 17 (Sheet 23 of 30)

[Drawing not reproduced. Recoverable labels: One-Class SAM Analysis; Genes induced by 5-Aza in LNCaP Cells; axis: Expected Score; False Discovery Rate (%): 5; N = 973 Genes; Significant Genes: 246.]

Figure 18 (Sheet 24 of 30)

[Drawing not reproduced. Recoverable labels: panel A; PrEC methylated vs. LNCaP (NGS). Other labels are illegible.]

Figure 19 (Sheets 25 and 26 of 30)

[Enrichment plots not reproduced. Recoverable labels: panels "Promoter", "Promoter - with CGIs" and "Promoter - without CGIs"; horizontal axis: Rank in Ordered Dataset; legend: enrichment profile, hits, ranking metric scores; 'na_pos' (positively correlated) and 'na_neg' (negatively correlated), with the zero cross at about rank 14,000; gene sets: PrEC Methylated Genes and LNCaP Methylated Genes.]

Figure 20 (Sheet 27 of 30)

[Drawing not reproduced. Recoverable label: Promoter Schematic.]

Figure 21 (Sheet 28 of 30)

[Drawing not reproduced. Recoverable labels: 500 bases scale bar over a chr20 region (coordinates partly illegible); BS-Sequencing rows labeled by clone number; key: Not Methylated, Methylated; sample groups: Normal, Benign, Localized PCa, Metastatic PCa, Cell Lines.]

Figure 22 A (Sheet 29 of 30)

[Drawing not reproduced. Recoverable labels: variants 1-4; variants 5-8; Adj. Normal.]

Figure 23 (Sheet 30 of 30)

[Drawing not reproduced. Recoverable labels: DNA methylation EpiTYPER analysis of WFDC2 promoter in prostate cancer; methylated and unmethylated DNA standards; normal prostate tissues; localized prostate cancer tissues; metastatic prostate cancer tissues; prostate normal cells.]

US 2013/0022974 A1 Jan. 24, 2013

DNA METHYLATION PROFILES IN CANCER

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority to U.S. Provisional Application No. 61/498,296, filed Jun. 17, 2011, which is herein incorporated by reference in its entirety.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

[0002] This invention was made with government support under CA069568, CA132874, CA111275 and DA021519 awarded by the National Institutes of Health and BC07023 awarded by the Army Medical Research and Material Command. The government has certain rights in the invention.

FIELD OF THE INVENTION

[0003] The present invention relates to compositions and methods for cancer diagnosis, research and therapy, including but not limited to, cancer markers. In particular, the present invention relates to methylation levels of genes (e.g., in CGI islands of the promoter regions) as diagnostic markers and clinical targets for prostate cancer.

BACKGROUND OF THE INVENTION

[0004] A central aim in cancer research is to identify altered genes that are causally implicated in oncogenesis. Several types of somatic mutations have been identified including base substitutions, insertions, deletions, translocations, and chromosomal gains and losses, all of which result in altered activity of an oncogene or tumor suppressor gene. First hypothesized in the early 1900s, there is now compelling evidence for a causal role for chromosomal rearrangements in cancer (Rowley, Nat Rev Cancer 1: 245 (2001)). Recurrent chromosomal aberrations were thought to be primarily characteristic of leukemias, lymphomas, and sarcomas. Epithelial tumors (carcinomas), which are much more common and contribute to a relatively large fraction of the morbidity and mortality associated with human cancer, comprise less than 1% of the known, disease-specific chromosomal rearrangements (Mitelman, Mutat Res 462: 247 (2000)). While hematological malignancies are often characterized by balanced, disease-specific chromosomal rearrangements, most solid tumors have a plethora of non-specific chromosomal aberrations. It is thought that the karyotypic complexity of solid tumors is due to secondary alterations acquired through cancer evolution or progression.

[0005] Two primary mechanisms of chromosomal rearrangements have been described. In one mechanism, promoter/enhancer elements of one gene are rearranged adjacent to a proto-oncogene, thus causing altered expression of an oncogenic protein. This type of translocation is exemplified by the apposition of immunoglobulin (IG) and T-cell receptor (TCR) genes to MYC leading to activation of this oncogene in B- and T-cell malignancies, respectively (Rabbitts, Nature 372: 143 (1994)). In the second mechanism, rearrangement results in the fusion of two genes, which produces a fusion protein that may have a new function or altered activity. The prototypic example of this translocation is the BCR-ABL gene fusion in chronic myelogenous leukemia (CML) (Rowley, Nature 243: 290 (1973); de Klein et al., Nature 300: 765 (1982)). Importantly, this finding led to the rational development of imatinib mesylate (Gleevec), which successfully targets the BCR-ABL kinase (Deininger et al., Blood 105: 2640 (2005)). Thus, diagnostic methods that specifically identify epithelial tumors are needed.

SUMMARY OF THE INVENTION

[0006] The present invention relates to compositions and methods for cancer diagnosis, research and therapy, including but not limited to, cancer markers. In particular, the present invention relates to methylation levels of genes (e.g., in CGI islands of the promoter regions) as diagnostic markers and clinical targets for prostate cancer.

[0007] Embodiments of the present invention provide compositions, kits, and methods useful in the detection and screening of prostate cancer. Experiments conducted during the course of development of embodiments of the present invention identified aberrant methylation status of certain genes in prostate cancer. Some embodiments of the present invention provide compositions and methods for detecting such aberrantly methylated genes. Identification of aberrantly methylated genes finds use in screening, diagnostic and research uses.

[0008] For example, in some embodiments, the present invention provides compositions, kits and methods of screening for the presence of prostate cancer in a subject, comprising contacting a biological sample from a subject with a reagent for detecting the methylation status of one or more genes (e.g., including but not limited to, WFDC2, MAGI2, MEIS2, NTN4, GPRC5B, C9orf125, FGFR2, AOX1, VAMP5, C14orf159, PPP1R3C, S100A16 or AMT as well as one or more of the genes listed in Table 4); and detecting the methylation status of said genes using an in vitro assay, wherein a higher degree of methylation of said genes in said sample relative to the level of methylation in normal prostate cells is indicative of prostate cancer in said subject. In some embodiments, the sample is selected from tissue, blood, plasma, serum, urine, urine supernatant, urine cell pellet, semen, prostatic secretions or prostate cells. In some embodiments, the detecting the level of methylation is carried out utilizing Methylplex-Next Generation Sequencing (M-NGS) or another suitable assay. In some embodiments, the cancer is localized prostate cancer or metastatic prostate cancer. In some embodiments, the methylation is detected in the 5' non-coding region (e.g., promoter region) of the gene. In some embodiments, expression of genes is decreased when an increased level of methylation is present. In some embodiments, analysis is conducted using a computer implemented method, and results are displayed to a user using a user interface. In some embodiments, the results of the method are used to determine a treatment course of action. In some embodiments, the treatment course of action (e.g., chemotherapy) is administered. In some embodiments, the method is performed again after treatment and is used to determine whether further treatment is needed and/or administered.

[0009] Additional embodiments are described herein.

DESCRIPTION OF THE FIGURES

[0010] FIG. 1 shows characterization of genome-wide methylation patterns in prostate cells by M-NGS. (A) Venn diagram represents a 70% overlap between the regions methylated in LNCaP and PrEC cells. (B) In LNCaP and PrEC, the majority of DNA methylation occurred in intergenic and intronic regions and the genomic distribution of methylation peaks was similar. (C) Promoter associated CpG islands displayed 7 fold difference in methylation between LNCaP and PrEC cells. (D) DNA methylation in APC, CHMP4A, CALML3, CDKN2A, KCTD1, LAMC2, RASSF1, SHC1, TINAGL1 and TSPAN1 gene promoters in LNCaP (L) cells; SPON2 in PrEC (P) cells and a negative control region in MYC were validated by bisulfite sequencing.

[0011] FIG. 2 shows DNA methylation pattern in prostate tissues. (A) Genome-wide distribution of DNA methylation in various prostate sample groups analyzed. The majority of methylation peaks are confined to intergenic and intronic regions similar to cell lines. (B) A gradual increase in percent methylation with cancer progression among promoter CGIs compared to CGIs located in other genomic regions was observed. * Pearson's Chi-squared test p-value < 2 x 10^[exponent illegible]

[0012] FIG. 3 shows promoter DNA methylation during prostate cancer progression. A total of 6619 gene promoters from 6077 unique RefSeq genes harbored DNA methylation among the various sample groups analyzed (Normal, Benign Adjacent, PCa or MET). Each row represents a unique promoter region at 100 base pair window size, covering ±1500 bp flanking the transcription start site, indicated by white dotted line. The location of CpG island in methylated gene promoters is shown in first column. Promoters are ordered by the location of methylation on CpG island, adjacent to the island (shores) or on promoters that lacked CpG islands for groups I to IV. Methylation patterns in prostate cells PrEC and LNCaP are presented alongside for comparison.

[0013] FIG. 4 shows that promoter methylation is associated with gene repression.

[0014] FIG. 5 shows WFDC2, TACSTD2 and GSTP1 methylation in prostate tissue panel. MethylProfiler qPCR was used to determine DNA methylation of the (A) WFDC2, (B) TACSTD2 and (C) GSTP1 genes. (A) 17/22 prostate cancer tissues and 6/6 transformed prostate cell lines showed methylation of WFDC2 promoter, whereas there was no detectable methylation in normal (0/3), benign adjacent tissues (0/7) or the normal PrEC cells. (B) Methylation of TACSTD2 promoter in prostate tissues and cell lines was assessed by MethylProfiler qPCR. Twenty one percent of cancer tissues (5/23) and prostate cancer cell lines VCaP, LNCaP and PC3 were methylated. (C) MethylProfiler qPCR analysis of GSTP1. 20/22 prostate cancer tissues, 1/7 benign adjacent tissues and 6/6 transformed prostate cell lines showed methylation of GSTP1 promoter, whereas there was no detectable methylation in normal tissues (0/3), or normal PrEC cells.

[0015] FIG. 6 shows regulation of alternate transcription start site utilization by DNA methylation. (A, D) Cancer specific DNA methylation enables switching of alternative transcriptional start sites (TSS) leading to transcript isoform regulation. In RASSF1 (A) and NDRG2 (D), CpG methylation occurs at the TSS of the longer variants, with H3K4me3 marks positioned on the TSS of the shorter variants. (B, E) Preferential silencing and 5-Aza-induced re-expression of CpG-methylated variants in LNCaP cells. (B, D) 5'RACE results validated RASSF1 variant-3 and NDRG2 variants 5-8 expression in LNCaP cells. (C, F) Expression values from LNCaP RNA-Seq data support the corresponding variant transcription of RASSF1 and NDRG2 genes.

[0016] FIG. 7 shows mutually exclusive patterns of promoter DNA methylation and histone H3K4me3 marks in LNCaP cells.

[0017] FIG. 8 shows differentially methylated regions between ETS-Positive and ETS-Negative samples. (A) Venn diagram displays the methylation overlap observed between ETS-Positive, ETS-Negative and benign prostate tissue samples. (B) The coverage for various repeat elements was higher in ETS-Positive compared to ETS-Negative samples indicating higher methylation in the former. The fold difference for methylation in each class of repeat element is indicated by the line plot above. (C) Percent methylation was assessed independently by pyro-sequencing assays for LINE-1 elements and GSTP1 gene promoter methylation in prostate tissue panel (Benign n=5, ETS-Positive cancers n=10 and ETS-Negative cancers n=4).

[0018] FIG. 9 shows a schematic of M-NGS library generation.

[0019] FIG. 10 shows that regression analysis of M-NGS mapped reads and HMM output shows high correlation between sequencing runs. (A) Reads that mapped to chromosome 21 in LNCaP 400 bp-1 and -5, and PrEC 400 bp-1 and -5 runs were compared using the window size of 25 bp. In LNCaP samples, a total of 33,627 reads were present at 25 bp windows with R value of 0.9508, and in PrEC, 37,406 reads with R value of 0.8556 were observed. (B) Linear regression analysis of all DNA methylation that occurred on CGIs showed high correlation (R value = 0.9398 and 0.9819, n = 5,734 and 4,966, respectively).

[0020] FIG. 11 shows a correlation between M-NGS vs Methylplex-Array and M-NGS vs MeDIP-Seq results. (A) Methylplex-array libraries made from LNCaP (Cy5) and PrEC (Cy3) cells were hybridized to Agilent human CGI microarray. Array results are displayed on the left in heatmap form. (B) Overlap in methylated CpG islands located within or outside gene promoters (1500 bps flanking the transcription start site) in LNCaP cells identified by M-NGS and MeDIP-Seq.

[0021] FIG. 12 shows promoter methylation in LNCaP and PrEC cells identified by M-NGS.

[0022] FIG. 13 shows representations of Methylplex NGS sequencing data used for nomination of methylated candidate gene promoters.

[0023] FIG. 14 shows representations of Methylplex NGS sequencing data used for nomination of methylated candidate gene promoters.

[0024] FIG. 15 shows representations of Methylplex NGS sequencing data used for nomination of methylated candidate gene promoters.

[0025] FIG. 16 shows DNA methylation in the AOX1, C9orf125, NTN4, AMT, PPP1R3C and NAP1L5 gene promoters in LNCaP (L), PrEC (P), universally methylated control DNA (MC-DNA, ZymoResearch Inc) and unmethylated control fetal DNA (UMC-DNA, Millipore Inc), validated by bisulfite sequencing.

[0026] FIG. 17 shows that significance analysis of microarray (SAM) identified re-expression of LNCaP methylated genes after 5-Aza treatment.

[0027] FIG. 18 shows that genes hypermethylated in LNCaP cells are enriched for biologically-significant concepts. (A) Molecular Concept Map (MCM) analysis of LNCaP methylated genes revealed enrichment of gene signatures repressed in prostate cancer and over-expressed in benign prostate tissues from multiple studies. (B) MCM analysis of PrEC methylated genes shows the enrichment only for histone modification concept.

0028 FIG. 19 shows gene set enrichment analysis 0039. As used herein, the term “characterizing prostate (GSEA) showing the association between gene repression tissue in a subject” refers to the identification of one or more and promoter methylation. (A) Methylated genes from properties of a prostate tissue sample (e.g., including but not LNCaP and PrEC cells were tested for their corresponding limited to, the presence of cancerous tissue, the presence or ranked gene expression in next generation transcriptomic absence of methylated promoters, the presence of pre-cancer sequencing (RNA-Seq). (B) While no significant association ous tissue that is likely to become cancerous, and the presence was observed between gene body methylation and gene of cancerous tissue that is likely to metastasize). In some expression in LNCaP (p-value <0.623), candidates with gene promoter methylation with and without the presence of CGIs embodiments, tissues are characterized by the identification in LNCaP cells are enriched with under-expressed genes of the expression of one or more cancer marker genes, includ (p-value <0.0039 and 0.0015, respectively). ing but not limited to, the cancer markers disclosed herein. 0029 FIG. 20 shows that DNA methylation is associated 0040. As used herein, the term “stage of cancer refers to with gene repression. a qualitative or quantitative assessment of the level of 0030 FIG. 21 shows methylprofiler PCR-bisulfite advancement of a cancer. Criteria used to determine the stage sequencing comparison in PCapanel. of a cancer include, but are not limited to, the size of the tumor 0031 FIG. 22 shows that cancer-specific DNA methyla and the extent of metastases (e.g., localized or distant). tion enables Switching of alternative transcriptional start sites (TSS) leading to transcript isoform regulation. Figure details 0041. 
characterization of APC gene similar to the data presented for RASSF1 and NDRG2 (FIG. 6). (A) In contrast to the two examples presented in FIG. 6, APC exhibits the reverse, with H3K4me3 on the longer variant 1 and CpG methylation at the shorter variants 2 and 3. (B) Preferential silencing and 5-Aza induced re-expression of CpG-methylated variants of APC in LNCaP cells. (C) Isoform-specific expression patterns of NDRG2 in the prostate tissue cohort (n=12).

[0032] FIG. 23 shows DNA methylation EpiTYPER analysis of WFDC2 promoter in prostate cancer.

DEFINITIONS

[0033] To facilitate an understanding of the present invention, a number of terms and phrases are defined below:

[0034] As used herein, the terms “detect”, “detecting” or “detection” may describe either the general act of discovering or discerning or the specific observation of a detectably labeled composition.

[0035] As used herein, the term “subject” refers to any organisms that are screened using the diagnostic methods described herein. Such organisms preferably include, but are not limited to, mammals (e.g., murines, simians, equines, bovines, porcines, canines, felines, and the like), and most preferably includes humans.

[0036] The term “diagnosed,” as used herein, refers to the recognition of a disease by its signs and symptoms, or genetic analysis, pathological analysis, histological analysis, and the like.

[0037] A “subject suspected of having cancer” encompasses an individual who has received an initial diagnosis (e.g., a CT scan showing a mass or increased PSA level) but for whom the stage of cancer or presence or absence of methylated genes indicative of cancer is not known. The term further includes people who once had cancer (e.g., an individual in remission). In some embodiments, “subjects” are control subjects that are suspected of having cancer or diagnosed with cancer.

[0038] As used herein, the term “characterizing cancer in a subject” refers to the identification of one or more properties of a cancer sample in a subject, including but not limited to, the presence of benign, pre-cancerous or cancerous tissue, the stage of the cancer, and the subject’s prognosis. Cancers may be characterized by the identification of the expression of one or more cancer marker genes, including but not limited to, the methylated genes or promoters disclosed herein.

As used herein, the term “nucleic acid molecule” refers to any nucleic acid containing molecule, including but not limited to, DNA or RNA. The term encompasses sequences that include any of the known base analogs of DNA and RNA including, but not limited to, 4-acetylcytosine, 8-hydroxy-N6-methyladenosine, aziridinylcytosine, pseudoisocytosine, 5-(carboxyhydroxylmethyl)uracil, 5-fluorouracil, 5-bromouracil, 5-carboxymethylaminomethyl-2-thiouracil, 5-carboxymethyl-aminomethyluracil, dihydrouracil, inosine, N6-isopentenyladenine, 1-methyladenine, 1-methylpseudouracil, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-methyladenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5'-methoxycarbonylmethyluracil, 5-methoxyuracil, 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid, oxybutoxosine, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, N-uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid, pseudouracil, queosine, 2-thiocytosine, and 2,6-diaminopurine.

[0042] The term “gene” refers to a nucleic acid (e.g., DNA) sequence that comprises coding sequences necessary for the production of a polypeptide, precursor, or RNA (e.g., rRNA, tRNA). The polypeptide can be encoded by a full length coding sequence or by any portion of the coding sequence so long as the desired activity or functional properties (e.g., enzymatic activity, ligand binding, signal transduction, immunogenicity, etc.) of the full-length or fragments are retained. The term also encompasses the coding region of a structural gene and the sequences located adjacent to the coding region on both the 5' and 3' ends for a distance of about 1 kb or more on either end such that the gene corresponds to the length of the full-length mRNA. Sequences located 5' of the coding region and present on the mRNA are referred to as 5' non-translated sequences. Sequences located 3' or downstream of the coding region and present on the mRNA are referred to as 3' non-translated sequences. The term “gene” encompasses both cDNA and genomic forms of a gene. A genomic form or clone of a gene contains the coding region interrupted with non-coding sequences termed “introns” or “intervening regions” or “intervening sequences.” Introns are segments of a gene that are transcribed into nuclear RNA (hnRNA); introns may contain regulatory elements such as enhancers. Introns are removed or “spliced out” from the

nuclear or primary transcript; introns therefore are absent in the messenger RNA (mRNA) transcript. The mRNA functions during translation to specify the sequence or order of amino acids in a nascent polypeptide.

[0043] As used herein, the term “oligonucleotide” refers to a short length of single-stranded polynucleotide chain. Oligonucleotides are typically less than 200 residues long (e.g., between 15 and 100); however, as used herein, the term is also intended to encompass longer polynucleotide chains. Oligonucleotides are often referred to by their length. For example, a 24 residue oligonucleotide is referred to as a “24-mer.” Oligonucleotides can form secondary and tertiary structures by self-hybridizing or by hybridizing to other polynucleotides. Such structures can include, but are not limited to, duplexes, hairpins, cruciforms, bends, and triplexes.

[0044] As used herein, the terms “complementary” or “complementarity” are used in reference to polynucleotides (i.e., a sequence of nucleotides) related by the base-pairing rules. For example, the sequence “5'-A-G-T-3'” is complementary to the sequence “3'-T-C-A-5'”. Complementarity may be “partial,” in which only some of the nucleic acids' bases are matched according to the base pairing rules. Or, there may be “complete” or “total” complementarity between the nucleic acids. The degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of hybridization between nucleic acid strands. This is of particular importance in amplification reactions, as well as detection methods that depend upon binding between nucleic acids.

[0045] The term “homology” refers to a degree of complementarity. There may be partial homology or complete homology (i.e., identity). A partially complementary sequence that at least partially inhibits a completely complementary nucleic acid molecule from hybridizing to a target nucleic acid is “substantially homologous.” The inhibition of hybridization of the completely complementary sequence to the target sequence may be examined using a hybridization assay (Southern or Northern blot, solution hybridization and the like) under conditions of low stringency. A substantially homologous sequence or probe will compete for and inhibit the binding (i.e., the hybridization) of a completely homologous nucleic acid molecule to a target under conditions of low stringency. This is not to say that conditions of low stringency are such that non-specific binding is permitted; low stringency conditions require that the binding of two sequences to one another be a specific (i.e., selective) interaction. The absence of non-specific binding may be tested by the use of a second target that is substantially non-complementary (e.g., less than about 30% identity); in the absence of non-specific binding the probe will not hybridize to the second non-complementary target.

[0046] As used herein, the term “hybridization” is used in reference to the pairing of complementary nucleic acids. Hybridization and the strength of hybridization (i.e., the strength of the association between the nucleic acids) is impacted by such factors as the degree of complementarity between the nucleic acids, stringency of the conditions involved, the Tm of the formed hybrid, and the G:C ratio within the nucleic acids. A single molecule that contains pairing of complementary nucleic acids within its structure is said to be “self-hybridized.”

[0047] As used herein the term “stringency” is used in reference to the conditions of temperature, ionic strength, and the presence of other compounds such as organic solvents, under which nucleic acid hybridizations are conducted. Under “low stringency conditions” a nucleic acid sequence of interest will hybridize to its exact complement, sequences with single base mismatches, closely related sequences (e.g., sequences with 90% or greater homology), and sequences having only partial homology (e.g., sequences with 50-90% homology). Under “medium stringency conditions,” a nucleic acid sequence of interest will hybridize only to its exact complement, sequences with single base mismatches, and closely related sequences (e.g., 90% or greater homology). Under “high stringency conditions,” a nucleic acid sequence of interest will hybridize only to its exact complement, and (depending on conditions such as temperature) sequences with single base mismatches. In other words, under conditions of high stringency the temperature can be raised so as to exclude hybridization to sequences with single base mismatches.

[0048] The term “isolated” when used in relation to a nucleic acid, as in “an isolated oligonucleotide” or “isolated polynucleotide” refers to a nucleic acid sequence that is identified and separated from at least one component or contaminant with which it is ordinarily associated in its natural source. Isolated nucleic acid is present in a form or setting that is different from that in which it is found in nature. In contrast, non-isolated nucleic acids are nucleic acids such as DNA and RNA found in the state they exist in nature. For example, a given DNA sequence (e.g., a gene) is found on the host cell chromosome in proximity to neighboring genes; RNA sequences, such as a specific mRNA sequence encoding a specific protein, are found in the cell as a mixture with numerous other mRNAs that encode a multitude of proteins. However, isolated nucleic acid encoding a given protein includes, by way of example, such nucleic acid in cells ordinarily expressing the given protein where the nucleic acid is in a chromosomal location different from that of natural cells, or is otherwise flanked by a different nucleic acid sequence than that found in nature. The isolated nucleic acid, oligonucleotide, or polynucleotide may be present in single-stranded or double-stranded form. When an isolated nucleic acid, oligonucleotide or polynucleotide is to be utilized to express a protein, the oligonucleotide or polynucleotide will contain at a minimum the sense or coding strand (i.e., the oligonucleotide or polynucleotide may be single-stranded), but may contain both the sense and anti-sense strands (i.e., the oligonucleotide or polynucleotide may be double-stranded).

[0049] As used herein, the term “purified” or “to purify” refers to the removal of components (e.g., contaminants) from a sample. For example, antibodies are purified by removal of contaminating non-immunoglobulin proteins; they are also purified by the removal of immunoglobulin that does not bind to the target molecule. The removal of non-immunoglobulin proteins and/or the removal of immunoglobulins that do not bind to the target molecule results in an increase in the percent of target-reactive immunoglobulins in the sample. In another example, recombinant polypeptides are expressed in bacterial host cells and the polypeptides are purified by the removal of host cell proteins; the percent of recombinant polypeptides is thereby increased in the sample.

[0050] As used herein, the term “sample” is used in its broadest sense. In one sense, it is meant to include a specimen or culture obtained from any source, as well as biological and environmental samples. Biological samples may be obtained from animals (including humans) and encompass fluids, solids, tissues, and gases. Biological samples include blood products, such as plasma, serum and the like. Such examples are not however to be construed as limiting the sample types applicable to the present invention.

DETAILED DESCRIPTION OF THE INVENTION

[0051] The present invention relates to compositions and methods for cancer diagnosis, research and therapy, including but not limited to, cancer markers. In particular, the present invention relates to methylation levels of genes (e.g., in CGI islands of the promoter regions) as diagnostic markers and clinical targets for prostate cancer.

[0052] Beginning with precursor lesions, aberrant DNA methylation marks the entire spectrum of prostate cancer progression. Experiments conducted during the course of development of embodiments of the present invention mapped the global DNA methylation patterns in prostate tissues and cell lines using Methylplex-Next Generation Sequencing (M-NGS). Hidden Markov Model based next generation sequence analysis identified 68,000 methylated regions per sample. While global CpG Island (CGI) methylation was not differential between benign adjacent and cancer samples, overall promoter CGI methylation significantly increased from ~12.6% in benign samples to 19.3% and 21.8% in localized and metastatic cancer tissues respectively (p-value < 2×10⁻¹⁶). Distinct patterns of promoter methylation were identified around transcription start sites, where methylation occurred not only on the CGIs, but also on flanking regions and CGI sparse promoters. Among the 6,691 methylated promoters in prostate tissues, 2481 differentially methylated regions (DMRs) are cancer-specific, including numerous novel DMRs. One cancer-specific DMR was identified in the WFDC2 promoter that showed frequent methylation in cancer (17/22 tissues, 6/6 cell lines), but not in the benign tissues (0/10) and normal PrEC cells. Integration of LNCaP DNA methylation and H3K4me3 data indicated an epigenetic mechanism for alternate transcription start site utilization and these modifications segregated into distinct regions when present on the same promoter. Differences in repeat element methylation, particularly LINE-1, were observed between ERG gene fusion positive and negative cancers. This comprehensive methylome map furthers an understanding of epigenetic regulation in prostate cancer progression.

[0053] Accordingly, embodiments of the present invention provide compositions, kits, and methods useful in the detection and screening of prostate cancer. Experiments conducted during the course of development of embodiments of the present invention identified aberrant methylation status of certain genes in prostate cancer. Some embodiments of the present invention provide compositions and methods for detecting such aberrantly methylated genes. Identification of aberrantly methylated genes finds use in screening, diagnostic and research uses.

I. Diagnostic and Screening Methods

[0054] As described above, embodiments of the present invention provide diagnostic and screening methods that utilize the detection of aberrant methylation of genes or promoters (e.g., including, but not limited to, those listed in Table 4 (e.g., WFDC2, MAGI2, MEIS2, NTN4, GPRC5B, C9orf125, FGFR2, AOX1, VAMP5, C14orf159, PPP1R3C, S100A16 and AMT)). Exemplary, non-limiting methods are described below. In some embodiments, methylation is increased in one or more of the described genes in patients with cancer. For example, in some embodiments, methylation of genes is increased relative to a control sample from a subject that does not have cancer (e.g., a population average of samples, a control sample, a prior sample from the same patient, etc.).

[0055] Any patient sample suspected of containing the aberrantly methylated genes or promoters may be tested according to methods of embodiments of the present invention. By way of non-limiting examples, the sample may be tissue (e.g., a prostate biopsy sample or a tissue sample obtained by prostatectomy), blood, urine, semen, prostatic secretions or a fraction thereof (e.g., plasma, serum, urine supernatant, urine cell pellet or prostate cells). A urine sample is preferably collected immediately following an attentive digital rectal examination (DRE), which causes prostate cells from the prostate gland to shed into the urinary tract.

[0056] In some embodiments, the patient sample is subjected to preliminary processing designed to isolate or enrich the sample for the aberrantly methylated genes or promoters or cells that contain the aberrantly methylated genes or promoters. A variety of techniques known to those of ordinary skill in the art may be used for this purpose, including but not limited to: centrifugation; immunocapture; cell lysis; and, nucleic acid target capture (See, e.g., EP Pat. No. 1 409 727, herein incorporated by reference in its entirety).

[0057] The methylation status of the cancer markers may be detected along with other markers in a multiplex or panel format. Markers are selected for their predictive value alone or in combination with the gene fusions. Exemplary prostate cancer markers include, but are not limited to: AMACR/P504S (U.S. Pat. No. 6,262,245); PCA3 (U.S. Pat. No. 7,008,765); PCGEM1 (U.S. Pat. No. 6,828,429); prostein/P501S, P503S, P504S, P509S, P510S, prostase/P703P, P710P (U.S. Publication No. 2003/0185830); RAS/KRAS (Bos, Cancer Res. 49:4682-89 (1989); Kranenburg, Biochimica et Biophysica Acta 1756:81-82 (2005)); and, those disclosed in U.S. Pat. Nos. 5,854,206 and 6,034,218, 7,718,369, 7,229,774, each of which is herein incorporated by reference in its entirety. Markers for other cancers, diseases, infections, and metabolic conditions are also contemplated for inclusion in a multiplex or panel format.

[0058] The methylation levels of non-amplified or amplified nucleic acids can be detected by any conventional means. For example, in some embodiments, Methylplex-Next Generation Sequencing (M-NGS) methodology is utilized (See e.g., experimental section below). In other embodiments, the methods described in U.S. Pat. Nos. 7,611,869, 7,553,627, 7,399,614, and/or 7,794,939, each of which is herein incorporated by reference in its entirety, are utilized. Additional detection methods include, but are not limited to, bisulfite modification followed by any number of detection methods (e.g., probe binding, sequencing, amplification, mass spectrometry, antibody binding, etc.), methylation-sensitive restriction enzymes, and physical separation by methylated DNA-binding proteins or antibodies against methylated DNA (See e.g., Levenson, Expert Rev Mol Diagn. 2010 May; 10(4): 481-488; herein incorporated by reference in its entirety).

[0059] In some embodiments, a computer-based analysis program is used to translate the raw data generated by the detection assay (e.g., the presence, absence, or amount of methylation of a given marker or markers) into data of predictive value for a clinician. The clinician can access the predictive data using any suitable means. Thus, in some preferred embodiments, the present invention provides the further benefit that the clinician, who is not likely to be trained in genetics or molecular biology, need not understand the raw data. The data is presented directly to the clinician in its most useful form. The clinician is then able to immediately utilize the information in order to optimize the care of the subject.

[0060] The present invention contemplates any method capable of receiving, processing, and transmitting the information to and from laboratories conducting the assays, information providers, medical personnel, and subjects. For example, in some embodiments of the present invention, a sample (e.g., a biopsy or a serum or urine sample) is obtained from a subject and submitted to a profiling service (e.g., clinical lab at a medical facility, genomic profiling business, etc.), located in any part of the world (e.g., in a country different than the country where the subject resides or where the information is ultimately used) to generate raw data. Where the sample comprises a tissue or other biological sample, the subject may visit a medical center to have the sample obtained and sent to the profiling center, or subjects may collect the sample themselves (e.g., a urine sample) and directly send it to a profiling center. Where the sample comprises previously determined biological information, the information may be directly sent to the profiling service by the subject (e.g., an information card containing the information may be scanned by a computer and the data transmitted to a computer of the profiling center using an electronic communication system). Once received by the profiling service, the sample is processed and a profile is produced (i.e., methylation data), specific for the diagnostic or prognostic information desired for the subject.

[0061] The profile data is then prepared in a format suitable for interpretation by a treating clinician. For example, rather than providing raw expression data, the prepared format may represent a diagnosis or risk assessment (e.g., presence or absence of aberrant methylation) for the subject, along with recommendations for particular treatment options. The data may be displayed to the clinician by any suitable method. For example, in some embodiments, the profiling service generates a report that can be printed for the clinician (e.g., at the point of care) or displayed to the clinician on a computer monitor.

[0062] In some embodiments, the information is first analyzed at the point of care or at a regional facility. The raw data is then sent to a central processing facility for further analysis and/or to convert the raw data to information useful for a clinician or patient. The central processing facility provides the advantage of privacy (all data is stored in a central facility with uniform security protocols), speed, and uniformity of data analysis. The central processing facility can then control the fate of the data following treatment of the subject. For example, using an electronic communication system, the central facility can provide data to the clinician, the subject, or researchers.

[0063] In some embodiments, the subject is able to directly access the data using the electronic communication system. The subject may choose further intervention or counseling based on the results. In some embodiments, the data is used for research use. For example, the data may be used to further optimize the inclusion or elimination of markers as useful indicators of a particular condition or stage of disease or as a companion diagnostic to determine a treatment course of action.

[0064] Compositions for use in the diagnostic methods described herein include, but are not limited to, probes, amplification oligonucleotides, detection reagents, controls and the like. In some embodiments, reagents are provided in the form of an array.

II. Drug Screening Applications

[0065] In some embodiments, the present invention provides drug screening assays (e.g., to screen for anticancer drugs). The screening methods of the present invention utilize genes with aberrant methylation. For example, in some embodiments, the present invention provides methods of screening for compounds that alter (e.g., decrease) the methylation of such genes (e.g., in the promoter region) or the number of cells containing aberrant methylation. The compounds or agents may interfere with pathways that are upstream or downstream of the biological activity of aberrantly methylated DNAs. In some embodiments, candidate compounds are antisense or interfering RNA agents (e.g., oligonucleotides) directed against aberrantly methylated DNAs. In other embodiments, candidate compounds are antibodies or small molecules that specifically bind to aberrantly methylated DNA regulator or expression product to inhibit its biological function.

[0066] In one screening method, candidate compounds are evaluated for their ability to alter expression of aberrantly methylated DNAs by contacting a compound with a cell expressing an aberrantly methylated DNA and then assaying for the effect of the candidate compounds on expression. In some embodiments, the effect of candidate compounds on expression of aberrantly methylated DNAs is assayed for by detecting the level of aberrantly methylated DNA expressed by the cell. Expression of aberrantly methylated DNAs can be detected by any suitable method (e.g., those described herein).

EXPERIMENTAL

[0067] The following examples are provided in order to demonstrate and further illustrate certain preferred embodiments and aspects of the present invention and are not to be construed as limiting the scope thereof.

Example 1

Methods

[0068] Reagents, Cell Lines and Prostate Tissue Samples

[0069] Human primary prostate epithelial cells were purchased from Lonza (Mapleton, Ill.), and the prostate cancer cell line LNCaP was obtained from ATCC (Manassas, Va.). The PrEC and LNCaP cells were grown in PrEGM media (Lonza, Mapleton, Ill.) and RPMI 1640 containing 10% FBS (Life Technologies, Carlsbad, Calif.), respectively. Human prostate tissue samples were obtained from the University of Michigan SPORE program (Table 3). All samples were collected with informed consent of the patients and prior institutional review board approval. CpG island microarrays were purchased from Agilent Technologies, Santa Clara, Calif. Genomic DNA was isolated from cultured cells and tissue using DNeasy Blood and Tissue kit (Qiagen, Valencia, Calif.) according to manufacturer's instructions. 5-Aza-2'-deoxycytidine (5-Aza) was purchased from Sigma-Aldrich (St. Louis, Mo.) and used at 6 µM final concentration dissolved in DMSO.

[0070] M-NGS Library Generation

[0071] Methylplex library synthesis and GC-enrichment was obtained through a commercial service at Rubicon Genomics Inc., Ann Arbor, Mich. (FIG. 9). Briefly, fifty nanograms of gDNA from tissues or cells were digested with methylation sensitive restriction enzymes 1 and 2 (MSRE1 and MSRE2, Rubicon Genomics, Ann Arbor, Mich.) in a 100 µl reaction volume at 37° C. for 12 hours followed by 60° C. incubation for 2 hours. The samples were precipitated with two volumes of ethanol in presence of sodium acetate pH 5.2 and pellet paint (VWR, Radnor, Pa.). DNA pellets were washed with 70% ethanol, air dried and suspended in 20 µl of TE buffer pH 8.0. To prepare MethylPlex libraries, ten microliters of the samples from the previous step were denatured at 95° C. for 4 minutes, cooled to 4° C., and mixed with 4 µl of library synthesis mix (Rubicon Genomics, Ann Arbor, Mich.). The tubes were incubated at 95° C. for 2 minutes, and returned to 4° C. before adding 1 µl of library synthesis enzyme (Rubicon Genomics, Ann Arbor, Mich.). The reaction was carried out in a thermocycler under the following conditions: 16° C. for 20 minutes, 24° C. for 20 minutes, 37° C. for 20 minutes, 75° C. for 10 minutes and returned to 4° C. Subsequently fifteen microliters of the MethylPlex library was amplified in a BioRad iCycler real time PCR machine after mixing with 60 µl of library amplification mix (Rubicon Genomics, Ann Arbor, Mich.), under the following cycle conditions: 95° C. for 2 minutes (1 cycle), followed by 9 to 13 cycles of 96° C. for 20 seconds, 65° C. for 2 minutes and 75° C. for 1 minute. The amplified DNA was purified using QIAquick PCR purification kit (Qiagen, Valencia, Calif.), eluted in 50 µl volume and subjected to GC enrichment following the manufacturer's protocol (Rubicon Genomics Inc., Ann Arbor, Mich.). The GC enriched DNA was purified using DNA Clean and Concentrator kit (Zymo Research, Orange, Calif.) and eluted in 35 µl of Tris-EDTA buffer, pH 8.0. One and five micrograms of the purified products from each cell line were directly incorporated into the genomic DNA sequencing sample preparation kit procedure from Illumina (San Diego, Calif.) at the end repair step, skipping the nebulization process. An adenine base was then added to the purified end repaired products using Klenow exo (3' to 5' exo minus) enzyme. The reaction product was purified, ligated to Illumina adaptors with DNA ligase and resolved on an agarose gel. For LNCaP and PrEC libraries, gel pieces were excised at 200 and 400 bp positions and the DNA was extracted using a gel extraction kit (Qiagen, Valencia, Calif.). Subsequently for all tissue samples 350-450 bp gel cut was utilized. One microliter of this eluate was used as a template in a PCR amplification reaction with Phusion DNA polymerase (Finnzymes, Inc., Woburn, Mass.) to enrich for the adapter modified DNA fragments. The PCR product was purified and analyzed by Bioanalyzer (Agilent Technologies, San Diego, Calif.) before using it for flow cell generation, where 10 nM of library was used to prepare flowcells with approximately 30,000 clusters per lane. The raw sequencing image data were analyzed by the Illumina analysis pipeline, aligned to the unmasked human reference genome (NCBI v36, hg18) using the ELAND software (Illumina) to generate sequence reads of 25-32 bps. Additional information on sequencing runs for all cells and tissue sample runs can be found in Table 3. The M-NGS data has been deposited under accession number GSE27619 in the GEO database.

[0072] Total RNA Isolation and Quantitative Real Time PCR (QPCR)

[0073] Total RNA was isolated from cells using RNeasy mini kit (Qiagen, Valencia, Calif.) according to manufacturer's instructions. A DNase I treatment step was included during the total RNA isolation procedure to remove genomic DNA from the samples. One microgram of total RNA was used in cDNA synthesis using SuperScript III reverse transcriptase (Invitrogen, Carlsbad, Calif.). Quantitative real time PCR (QPCR) was performed on prostate cell line cDNA samples using SYBR Green Mastermix (Applied Biosystems) on an Applied Biosystems 7900 Real Time PCR system as described (Tomlins et al. 2007, Nat Genet 39: 41-51). All oligonucleotide primers were synthesized by Integrated DNA Technologies and are listed in Table 2. GAPDH primer sequences were as described (Vandesompele et al. 2002, Genome Biol 3: RESEARCH0034). The amount of target transcript and GAPDH in each sample was normalized by standard ddCt methodology, and then to the reference PrEC or DMSO-treated LNCaP samples accordingly.

[0074] CpG Island Annotation

[0075] The genomic co-ordinates for human CGIs are downloaded from NCBI at the NIH. Only islands annotated as strict CpGs were used in this study.

[0076] RNA Seq Library Preparation

[0077] Poly-A RNA from LNCaP and PrEC cells (200 ng) was isolated from total RNA using SeraMag Magnetic Oligo(dT) Beads (Thermo Fisher Scientific, Waltham, Mass.). RNA was fragmented at 70° C. for 5 min in a fragmentation buffer (Ambion, Austin, Tex.), and converted to first-strand cDNA using SuperScript II (Life Technologies, Carlsbad, Calif.). Second-strand cDNA synthesis was performed with Escherichia coli DNA pol I (Life Technologies, Carlsbad, Calif.). The double-stranded cDNA library was further processed following Illumina Genomic DNA sample preparation protocol which involved end repair using T4 DNA polymerase, Klenow DNA polymerase and T4 Polynucleotide kinase followed by a single A base addition using Klenow 3' to 5' exo-minus polymerase. Illumina's adaptor oligo was ligated using T4 DNA ligase. The adaptor-ligated library was size selected by separating on a 4% agarose gel and cutting out the library smear at 200 base pairs. The library was PCR amplified by Phusion polymerase (Finnzymes, Woburn, Mass.) and purified by PCR purification kit (Qiagen, Valencia, Calif.). The library was quantified with Bio-Analyzer (Agilent Technologies, Santa Clara, Calif.) and 10 nM of each library was used to prepare flowcells with approximately 30,000 clusters per lane. The GEO accession number for LNCaP and PrEC RNA-Seq libraries is GSE25183.

[0078] Statistical Analysis

[0079] HMM Analysis of M-NGS Data

[0080] Hidden Markov Model (HMM) based next generation sequencing analysis is conducted in a two-step process that takes in raw reads and outputs refined boundaries of enriched chromosomal regions (Qin et al. 2010, BMC Bioinformatics 11:369). The first step includes the formation of hypothetical DNA fragments (HDFs) from uniquely mapped reads, where the coverage of HDFs is determined by the specified DNA fragment size and overlapped HDFs are merged to represent one consecutive genomic region. The second step is designed to refine the boundaries of enriched region using HMM with bin size of 25 bp (by default). Under null hypothesis, raw reads are assumed to land on the genome following a Poisson distribution with the background rate of r", and enriched regions are expected to have more HDFs with

statistical significance. The rate of the Poisson distributions in a given sample is assumed to be r", and the transition probabilities are estimated empirically, based on inferred enriched regions defined in the first step. The output from HMM is selected based on the posterior probability of being in the enriched regions and then further filtered using maximum read counts. The threshold for maximum read counts is determined from Bonferroni corrected p-value of 0.001 calculated using a Poisson distribution with background rate r". The output is provided in BED format as well as Wiggle format for UCSC genome browser visualization. The output file annotation field contains information such as enriched genomic position and length, max height, GC content, repeated sequencing genomic position and length, mean and standard deviation of conservative scores for enriched region, relationship with nearest genes including whether the enriched region is located within the gene or between genes, gene name, GB accession number, strand and distance to gene transcription start site.

[0081] Calculating Gene Expression from RNA-Seq Data

[0082] Gene expression levels of passing filter reads from RNA-Seq data that mapped by ELAND to exons (March 2006 assembly of UCSC KnownGene) in LNCaP and PrEC cell lines are quantified as described (Maher et al. Nature 458: 97-101 2009).

[0083] One-Class SAM Analysis

[0084] Significance analysis of microarray (SAM) (Tusher et al. Proc Natl Acad Sci USA 98: 5116-5121 2001) was performed on the gene expression dataset obtained from 5-Aza and DMSO-treated LNCaP cells by selecting genes that were methylated in LNCaP. From 1,171 methylated genes from LNCaP M-NGS (Table 1), a total of 973 genes were mapped to Agilent expression profiling data. One-class SAM analysis was done using default settings and significant genes were calculated with a false discovery rate (FDR) of 0.05.

[0085] Gene Set Enrichment Analysis

[0086] Gene Set Enrichment Analysis (GSEA) is a computational method that assesses whether a defined set of genes shows statistically significant, concordant differences between any two given conditions. The fold change between the raw counts from RNA-seq NGS data on LNCaP and PrEC (representing 24,167 unique genes) was calculated and genes were ranked by the order of expression in LNCaP. This list was uploaded as a pre-ranked gene list to GSEA v2.04 (Broad Institute, Cambridge, Mass.), and using respective gene lists of methylated targets in LNCaP and PrEC cell lines, GSEA was performed using a weighted enrichment statistic and default normalization mode. Similarly, the fold change between the average expression value from normal/benign (n=4) and cancer samples (n=9) profiled on Agilent Human GE 44K microarray was calculated and pre-ranked (representing 27,928 unique probes). This list was uploaded to GSEA, and enrichment analysis was performed using methylation target gene lists (the methylation present in promoters with CGIs and without CGIs, and in gene body) in tumor samples.

[0087] Oncomine Meta-Analysis

[0088] A complete description of meta-analysis performed in Oncomine is available (Rhodes et al. Neoplasia 9:166-180 2007). In brief, a gene list of interest is uploaded to the Oncomine database, and the built-in meta-analysis tool rank orders the gene list by the p-value, which is determined by Student's t-test for comparisons made within each available dataset (for example Cancer vs. Normal). The ranked genes were visualized with pink and green shades (top ranked ones with darker shades, pink for over-expression and green for repression) in heatmap format, with each row representing genes and each column representing the dataset. Final order of the genes is determined by averaging ranks across the datasets.

[0089] Molecular Concepts Map Analysis

[0090] A complete description of the methods used to identify biological concept signatures in Molecular Concepts Map (MCM) is available (Rhodes et al. Neoplasia 9:443-454 2007). In addition to over 15,000 biological concepts from Oncomine, which include manual curation of the literature, target gene sets from genome-scale regulatory motif analyses, and reference gene sets from several gene and protein annotation databases, a gene list from differentially methylated regions identified from an independent Differential Methylation Hybridization profiling (concept named "DMH Tissue Methylated in PCa"), as well as known methylated genes in cancers provided from Pubmeth database were uploaded. In brief, MCM analysis uses Fisher's exact test to find various significantly enriched concepts in an uploaded gene list and provides visual interaction networks.

[0091] Repeat Element Methylation Analysis

[0092] The list of repeat elements predicted by RepeatMasker (RepeatMasker Open-3.0) program is downloaded from UCSC genome browser. The Methylplex-NGS data from localized and metastatic prostate tissue samples are divided into two groups based on their ETS gene fusion status (ETS positive and ETS negative). The samples in each group were pooled together for HMM analysis and the regions identified were mapped to repeat element location.

[0093] ChIP-Sequencing

[0094] LNCaP cells ChIP-Seq data obtained using H3K4me3 antibody (Abcam) and PanH3 (Abcam) was reported previously by Yu et al., and the GEO accession number for the dataset is GSE14097 (Yu et al. Cancer Cell 17: 443-454 2010). ChIP samples were prepared for sequencing using the Genomic DNA sample prep kit (Illumina) following manufacturer's protocols. To facilitate ChIP-Seq data analysis, a Hidden Markov Model (HMM)-based enriched region identifying algorithm (described under statistical analysis) was utilized.

[0095] MeDIP-Sequencing

[0096] Six micrograms of genomic DNA isolated from LNCaP cells were sonicated to ~100-500 bp range and purified using Qiagen PCR purification kit. Using standard Illumina protocol/reagents ends were repaired, A tailed and adaptors were added to the fragmented DNA. The DNA was then heat denatured at 95° C. for 10 minutes and snap cooled on ice. The DNA was incubated with 6 µg anti-methyl cytosine antibody in IP buffer (10 mM sodium phosphate buffer containing 140 mM sodium chloride and 0.05% Triton X-100) overnight at 4° C. in a shaker. The methylated fragments were collected by incubating with 100 µl of protein A beads (Invitrogen, Carlsbad, Calif.) for 2 hour at 4° C. The beads were washed four times at 4° C. in IP buffer and resuspended in 200 µl of TE buffer containing 0.25% SDS and 5 µg proteinase K and incubated at 55° C. for two hours. The samples were purified using DNA Clean and Concentrator-5 kit (Zymo Research, Orange, Calif.) and the libraries were prepared following Illumina ChIP-Seq protocol. The library was quantified with Bio-Analyzer (Agilent Technologies, Santa Clara, Calif.) and 10 nM of each library was used to prepare flow cells with approximately 30,000 clusters per lane.
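By way of illustration only, the read-count filter described for the HMM output (a Bonferroni corrected p-value of 0.001 under a Poisson background) can be sketched in Python as follows. The function names, the example background rate and the number of tests are hypothetical. The specification does not state how many comparisons enter the correction, nor whether the tail probability is taken as P(X >= k), so both are assumptions of this sketch rather than a description of the actual implementation.

```python
import math


def poisson_upper_tail(k, rate):
    """Return P(X >= k) for X ~ Poisson(rate)."""
    if k <= 0:
        return 1.0
    term = math.exp(-rate)      # P(X = 0)
    cdf = term
    for i in range(1, k):       # accumulate P(X = 1) .. P(X = k - 1)
        term *= rate / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def min_read_threshold(rate, alpha=0.001, n_tests=1):
    """Smallest read count k with P(X >= k) <= alpha / n_tests.

    Assumption: the Bonferroni correction divides alpha by the number of
    candidate regions tested (n_tests); the source does not give this number.
    """
    cutoff = alpha / n_tests
    k = 0
    while poisson_upper_tail(k, rate) > cutoff:
        k += 1
    return k


if __name__ == "__main__":
    # Hypothetical background rate of 2 reads per window and 50,000 tests.
    print(min_read_threshold(rate=2.0, alpha=0.001, n_tests=50_000))
```

Regions whose maximum read count falls below the returned value would be discarded under this reading of the filter.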

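As a further illustration, the enrichment test that MCM analysis is described as using (Fisher's exact test on an uploaded gene list) may be sketched as a one-sided hypergeometric tail probability. This is a minimal sketch under stated assumptions: the function name and the example counts are hypothetical, and the use of a one-sided test on the overlap is an assumption, since the specification only names the test and defers the details to Rhodes et al.

```python
from math import comb


def enrichment_p_value(universe, concept_size, list_size, overlap):
    """One-sided Fisher's exact test for over-representation.

    universe     -- number of genes that could have been selected
    concept_size -- number of universe genes belonging to the concept
    list_size    -- number of genes in the uploaded list
    overlap      -- number of uploaded genes that belong to the concept
    Returns P(overlap >= observed) under the hypergeometric null.
    """
    upper = min(concept_size, list_size)
    total = comb(universe, list_size)
    tail = sum(
        comb(concept_size, i) * comb(universe - concept_size, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the call.
    print(enrichment_p_value(universe=20000, concept_size=300,
                             list_size=800, overlap=30))
```

A small p-value indicates that the overlap between the uploaded list and the concept is larger than expected by chance; any multiple-testing adjustment across concepts is outside this sketch.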
[0097] Methyl-Profiler

[0098] Methyl-Profiler™ (SABiosciences, Frederick, Md.) is a restriction enzyme digestion based novel technology for CGI methylation profiling, requiring less than 500 ng input genomic DNA (Jaspers et al. Am J Respir Cell Mol Biol 43: 368-375 2010). The samples were first digested with methylation-sensitive (Ms) and/or methylation-dependent (Md) restriction enzymes along with mock digestion according to manufacturer's instruction. PCR reactions were performed with ABI StepOne qPCR machine (Applied Biosystems, Foster City, Calif.) with RT SYBR Green/ROX qPCR Master Mix (SABiosciences, Frederick, Md.) and primers targeting the region of interest. The PCR reactions were carried out with following conditions: 10 min at 95° C., followed by 40 cycles of 97° C. for 15", 72° C. 1 min as described in manufacturer's protocol. Using delta-Ct values, the relative amounts of methylation are calculated using an automated Excel-based data analysis template provided by the manufacturer. The mock digested template was used for initial DNA input quantification, the Ms enzyme was used for hypermethylation quantification, and the Md enzyme was used for quantifying unmethylated DNA. A mixture of these 2 enzymes (Msd) was used to quantify the undigested amount of DNA. A methylation rate below 5% was considered not significant. While the calculated methylation percentage between 10 and 60 was considered intermediate, values above sixty are taken as heavy methylation.

[0099] Bisulfite Sequencing

[0100] Bisulfite conversion was carried out using EZ DNA methylation gold kit (Zymo Research, Orange, Calif.) according to manufacturer's instructions. Briefly 500 ng of genomic DNA from either LNCaP or PrEC cells in 20 µl volume was mixed with 130 µl of CT conversion reagent and was initially incubated at 98° C. for 10 minutes followed by incubation at 64° C. for 2.5 hours. M-binding buffer (600 µl) was added to the above reaction and DNA purified using a Zymo spin column. Sequential washes were performed with 100 µl M-Wash buffer, 200 µl M-Sulphonation buffer and 200 µl of M-wash buffer was carried out before eluting the DNA in 30 µl of M-elution buffer. Purified DNA (2 µl) was used as template for PCR reactions with primers (Integrated DNA Technologies, San Diego, Calif.) and synthesized according to bisulfite converted DNA sequences for the regions of interest using the Methprimer software (Li and Dahiya, Bioinformatics 18: 1427-1431 2002). The PCR product was gel purified and cloned into pCR4 TOPO TA sequencing vector (Life Technologies, Carlsbad, Calif.). Plasmid DNA isolated from 10 colonies from each sample was sequenced by conventional Sanger Sequencing (University of Michigan DNA Sequencing Core). The "BIQ Analyzer" (Bock et al. 2005) online tool was used to calculate the methylation percentage and to generate the bar graphs.

[0101] Microarray Profiling

[0102] Expression Profiling of 5-Aza Treated LNCaP Cells

[0103] For 5-Aza stimulation experiments, LNCaP cells cultured in RPMI 1640 were treated with vehicle, dimethyl sulfoxide (DMSO) or 6 µM 5-Aza for 4 or 6 days in duplicates. Total RNA was isolated with Trizol (Life Technologies, Carlsbad, Calif.) and further purified using RNAeasy Micro Kit (Qiagen, Valencia, Calif.) according to the manufacturer's instructions. Expression profiling was performed using the Agilent 44K expression array. One microgram of total RNA was converted to cRNA and then labeled according to the manufacturer's protocol (Agilent). Hybridizations were performed for 16 h at 65° C. Scanned images from Agilent microarray scanner were analyzed and extracted using Agilent Feature Extraction Software 9.1.3.1 with linear and lowess normalization performed for each array. A total of 4 hybridizations were performed including two 4 day and two 6 day 5-Aza treated samples (Cy5) against control DMSO treated samples (Cy3). The accession number for gene expression dataset in the GEO database is GSE27619.

[0104] Expression Profiling of Prostate Tissues

[0105] Prostate tissues characterized by M-NGS, normal/benign (n=4) and cancer (n=9), was profiled on Agilent Human GE 44K microarray as described for LNCaP cells above. Total RNA from pooled normal prostate tissues obtained from a commercial source (Clontech laboratories, Mountain View, Calif.) was used as the reference. This microarray dataset was used in GSEA analysis to study the association between DNA methylation and gene expression. The dataset has been deposited in the GEO database.

[0106] Methylplex Library Agilent CpG Array Hybridization

[0107] Two micrograms of the purified products from each PrEC and LNCaP MethylPlex DNA were labeled following the mammalian ChIP-on-chip protocol (Agilent Technologies, Santa Clara, Calif.) starting at the sample labeling stage which employs a random primed, Klenow-based extension protocol. The samples were hybridized to an Agilent Human CpG 244K array (Cat# G4492A, Agilent Technologies, Santa Clara, Calif.), where LNCaP sample was coupled with Cy5 and PrEC to Cy3. The slides were washed according to manufacturer's instructions. A dye-flip experiment was also performed. The scanned images were analyzed and extracted using Agilent Feature Extraction Software 9.1.3.1. Methylated regions identified by the array data was compared to M-NGS targets and their overlap is presented in FIG. 11. This dataset has been deposited in GEO under accession number GSE27619.

[0108] 5' Rapid Amplification of cDNA Ends (5' RACE)

[0109] 5' RACE was performed as previously described (Han et al. Cancer Res 68: 7629-7637 2008). First-strand cDNA was amplified with gene-specific reverse primers RASSF1, APC, and NDRG2 (Table 2) and 5' GeneRacer primers (Life Technologies, Carlsbad, Calif.) using Platinum Taq High Fidelity enzyme (Life Technologies, Carlsbad, Calif.) after the touchdown PCR protocol according to manufacturer's instructions. PCR amplification products were cloned into pCR4-TOPO TA vector (Life Technologies, Carlsbad, Calif.) and sequenced bidirectionally using vector primers as described (Tomlins et al. Nature 448: 595-599 2007).

[0110] Pyrosequencing

[0111] LINE-1 element methylation was estimated using PyroMark Q24 LINE-1 methylation assay (Qiagen, Valencia, Calif.) according to manufacturer's instructions. Briefly, bisulfite converted gDNA (described above), LINE-1 primers and components of Hotstart Master Mix (Qiagen, Valencia, Calif.) were employed in a PCR reaction to amplify LINE regions from the sample. The amplification was obtained from 45 cycles of 95° C. 20 seconds, 50° C. 20 seconds, 72° C. 20 seconds, after an initial denaturation/enzyme activation at 95° C. for 15 minutes, and final elongation of 72° C. for 5 minutes. The PCR products were captured on Streptavidin Sepharose beads (GE Healthcare, Piscataway, N.J.) denatured to produce single strands, washed and annealed to the sequencing primer and the sequence determined using the

PyroMark Q24 system (Qiagen, Valencia, Calif.). The mean methylation of three individual positions within the PCR product is considered in this assay.

Results

[0112] Characterization of DNA Methylation by M-NGS in Prostate Cells

[0113] To perform a genome-wide analysis of DNA methylation in prostate cancer, Methylplex-Next Generation Sequencing (M-NGS) methodology, which enriches methylated DNA using restriction enzymes and requires minimal input genomic DNA (i.e., 50 nanograms) was utilized. The ability of M-NGS to identify methylated genomic regions was first evaluated in a prostate cancer cell line LNCaP and normal PrEC cells. A schematic describing sequencing library generation is provided in FIG. 9. Briefly, Methylplex libraries were constructed by digesting input genomic DNA isolated from samples with a cocktail of methylation-sensitive restriction enzymes, followed by ligation of adaptors containing universal primers sequences and PCR-based amplification. A second round of enzymatic treatment depleted non-GC rich sequences, followed by an additional amplification step to ensure enrichment of highly methylated DNA fragments. The amplification adaptors were enzymatically removed prior to NGS library preparation (FIG. 9). Methylplex libraries described above were constructed through the commercial service provided by Rubicon Genomics Inc, Ann Arbor, Mich.

[0114] For initial standardization, two different concentrations (1 and 5 µg) of each Methylplex sample from LNCaP and PrEC cells was used as input DNA to obtain single-read sequencing on the Illumina Genome Analyzer II (see Methods for protocol details). For each cell type (LNCaP and PrEC) a total of 4 sequencing libraries were prepared corresponding to 200- and 400-bp size selections of 1 µg and 5 µg of Methylplex product. An average of 5 million mappable reads was obtained for each M-NGS sample. CG dinucleotides were enriched by the Methylplex procedure up to three-fold in mapped reads from M-NGS compared to previously obtained control ChIP-Sequencing data, namely pan histone ChIP-Seq (Yu et al. Cancer Cell 17:443-454 2010).

[0115] To demonstrate experimental consistency, a comparative analysis of data from 1 and 5 µg Methylplex DNA exhibited high correlation both for reads mapping to chromosome 21 and for reads mapping to all CpG islands (FIG. 10). Data from 400 bp-5 µg was most enriched for CG rich sequences and showed maximum overlap (~70%) with methylation identified by hybridizing the Methylplex product to a CpG island array (FIG. 11A). This data was selected for further analysis.

[0116] A Hidden Markov Model (HMM)-based algorithm previously used for ChIP-Seq data analysis (Qin et al. BMC Bioinformatics 11: 369 2010) was used to locate peaks from mapped reads obtained in each sequencing run. A 70% overlap in methylated genomic regions between LNCaP (56,727 regions) and PrEC cells (61,615 regions) was found (FIG. 1A). Methylation located in intergenic and intronic regions of the genomes analyzed had a similar distribution (FIG. 1B). Additionally, in LNCaP cells, MeDIP-Seq, a methodology that employs 5' methylcytosine antibody to enrich methylated regions was used to identify ~68,000 methylated regions in this cell line, which was comparable to the M-NGS results. Moreover, there was an overall 62% concordance between all the genomic regions (data not shown) and >83% in CGIs identified by M-NGS and MeDIP-Seq, thereby validating the two methodologies (FIG. 11B).

[0117] The cancer-derived LNCaP cells displayed frequent methylation among the 56 previously reported methylated promoter regions in prostate cancer tissues (36/56 in LNCaP M-NGS and 40/56 in LNCaP MeDIP-Seq) compared to PrEC cells (7/56 in PrEC M-NGS) (Table 5). However, this difference was absent when the promoters and gene body of known imprinted genes was examined (Morison et al. Trends Genet 21:457-465 2005) (24/29 in PrEC M-NGS, 23/29 in LNCaP M-NGS and 26/29 in LNCaP MeDIP-Seq) (Table 5).

[0118] Global Differences in CGI Methylation

[0119] Because hypermethylation in CpG rich promoters is a common feature of tumorigenesis (Issa Nat Rev Cancer 4: 988-993 2004), the extent of CpG island methylation between LNCaP and PrEC cells was compared. Of the 68,508 (72.74 MB) CpG islands identified using Takai Jones criteria (Takai and Jones Proc Natl Acad Sci USA 99:3740-3745 2002) in the human genome, 6,865 (7.6 MB) and 5,767 (6.1 MB) CpG islands were methylated in LNCaP and PrEC respectively. Globally, a 1.7-fold increase in uniquely methylated CpG islands between LNCaP and PrEC was observed, and this ratio increased to ~7-fold specifically in CpG islands associated within gene promoters but not among CGIs located elsewhere (FIG. 1C). In LNCaP cells, methylation in greater than 88% of CpG islands located within promoters and 83% of CGIs in non promoters detected by M-NGS was corroborated by the MeDIP-Seq data (FIG. 11B).

[0120] Aberrant promoter methylation is thought to contribute to tumorigenesis by repressing transcription of tumor suppressor genes (Jones and Baylin Cell 128: 683-692 2007). Methylation on Ref-Seq gene promoters (±1,500 bps flanking transcription start site) was analyzed and 3,496 locations that were methylated in at least one sample were identified (FIG. 12). Visualization of these methylation marks in the context of promoter CGIs revealed the presence of several distinct methylation patterns on gene promoters (FIG. 12). Broadly, the promoters fell into two groups based on the presence or absence of a CpG island within this specified region. Although 35% of promoters (n=1,232) lacked CpG islands, they exhibited methylation around the transcription start site (TSS) (FIG. 12). The remaining 65% (n=2,264) had CpG islands spanning the TSS and three distinct methylation patterns were observed in this group: (1) methylation was mostly confined (39.6%, n=1,383) to the island, and interestingly with much higher frequency (greater than 6 fold difference) in LNCaP (n=952) compared to PrEC (n=147) cells (FIG. 12). (2) methylation was positioned 5' to the CpG island (11.8%, n=412); and (3) methylation was positioned 3' to the CpG island (13.4%, n=469). In total, methylation flanking 5' or 3' of promoter CpG islands accounted for 25.2% of all methylation observed (n=881). To explore the role of these methylation patterns in prostate cancer pathogenesis, 812 out of 1171 unique gene promoters to be methylated only in LNCaP were identified (Table 1) and considered for further analysis. The remaining 359 promoters were methylated in both LNCaP and PrEC cells.

[0121] Validation of DMRs

[0122] 18 regions based on M-NGS data were selected and their methylation status was validated using a standard bisulfite sequencing technique in LNCaP and PrEC cells. This included fifteen DMRs in LNCaP (RASSF1, KCTD1, CHMP4A, APC, CDKN2A, SHC1, LAMC2, TSPAN1, CALML3, AOX1, AMT, C9orf125 and TINAGL1), one gene in PrEC cells (SPON2), one region methylated in both LNCaP and PrEC cells (NAP1L5) and a control MYC promoter region that was unmethylated in both cell types. The UCSC genome browser view of methylation in the two samples by M-NGS and methylation in LNCaP by MeDIP-Seq, along with gene schematic, primer sequences and bisulfite sequence amplicon locations are presented in FIGS. 12 to 15 and Table 2. Notably, the results for all 18 regions confirmed the data generated by M-NGS (FIG. 1D and FIG. 16).

[0123] In addition, overexpression of a significant number of LNCaP methylated genes following 5-Aza treatment of cells in a functional validation strategy using gene expression arrays was observed. A total of 973 out of 1171 methylated genes in LNCaP were present in gene expression array data. Significance Analysis of Microarray (SAM) results showed upregulation of 246 out of 973 methylated genes at 5% false discovery rate (FIG. 17), supporting epigenetic regulation of these genes.

[0124] To identify molecular concepts enriched in our DMRs, the dataset was analyzed using the Molecular Concept Map (MCM) analysis derived from the Oncomine database (Rhodes et al. Neoplasia 9:443-454 2007; Tomlins et al. Nat Genet 39: 41-51 2007). MCM analysis of 789 out of 813 genes methylated only in LNCaP that mapped to the Oncomine database (FIG. 18 and Table 1), revealed preferential enrichment with under-expressed genes signatures from localized and metastatic PCa samples (lowest p-value <1.90E-14) from several independent studies. Further, the signatures "genes previously known to be methylated in prostate cancer" (p-value <1.40E-06) (Ongenaert et al. Nucleic Acids Res 36: D842-846 2008), and "-tumor suppressor genes" (p-value ≤0.009) were significantly enriched (FIG. 18A). By contrast, PrEC cells did not share this enrichment and MCM analysis of PrEC-only methylated regions, revealed only concepts pertaining to histone modifications and were common to both PrEC and LNCaP MCM analysis (FIG. 18B). Finally, integration with RNA-Seq data revealed an association between gene repression and promoter methylation, globally by Gene Set Enrichment Analysis (GSEA) (FIG. 19) and upon specific evaluation of select genes (FIG. 20). For example, TIG1, GSTP1, CALML3, TASCTD2 and KCTD1 were methylated and repressed specifically in LNCaP, compared to SPON2 and GAGE genes that were methylated and repressed only in PrEC cells. HIC1 showed basal transcript expression and was methylated in both cell types.

[0125] Characterization of DNA Methylation in Prostate Cancer Tissues

[0126] Having established the robustness of M-NGS to identify highly methylated regions in cell line models, 17 prostate tissues (6 benign adjacent, 2 normal, 5 localized prostate cancer and 4 metastatic prostate cancer specimens) were next characterized (Table 3). A genome-wide assessment of both benign adjacent and cancer tissues showed a similar number of methylation events within intergenic and intronic regions (FIG. 2A). Of the total 68,508 CGIs present genome-wide, 18.5, 19.7 and 20.2 percent of all CGIs were methylated in benign, localized and metastatic cancer samples respectively (FIG. 2B). A significant increase in promoter-associated CGI methylation (Pearson's Chi Squared Test, p-value <2x10') paralleled prostate cancer progression (Benign 12.6%, Localized PCa 19.3% and Metastatic PCa 21.8%), whereas methylation of intragenic CGIs remained essentially unchanged (~26.5%) among the three groups (FIG. 2B).

[0127] Next, 6619 promoter methylation events (within ±1500 bps flanking the transcriptional start site) present in either normal, benign adjacent, localized or metastatic prostate cancer samples were identified (FIG. 3 and Table 4). Of 6619 total methylation events, 2737 were found in all samples and 1401 of the remaining 3,882 were absent in normal prostate samples and PrEC cells but present in benign adjacent prostates. Nearly all of the 56 previously reported prostate cancer methylated regions from pubmeth.org and a recent study (Kron et al. PLoS One 4: e4830 2009) showed increased methylation in cancer tissues (Table 5).

[0128] In order to identify DMRs with functional significance, promoter methylation events associated with transcriptional changes were examined. Promoters methylated in cancer were significantly associated with gene repression regardless of whether that promoter contained (p<0.001) or lacked (p<0.001) a CpG island by GSEA, while genes that displayed coding exon methylation tended to be overexpressed (p<0.024) (FIG. 4). Oncomine meta-analysis with 13 different prostate cancer gene expression dataset further supported methylated candidates association with gene repression. Several previously characterized methylation targets (GSTM2, GSTM1, S100A6, PYCARD and RARRES1) were present among this list, thereby validating the approach.

[0129] MethylProfiler PCR (Jaspers et al. Am J Respir Cell Mol Biol 43: 368-375 2010) was used as an independent evaluation of the methylation status of a novel target region in WFDC2 (WAP four-disulphide core domain protein 2, previously called HE4), a recently reported prostate methylation target TACSTD2 (Ibragimova et al. Cancer Prev Res (Phila) 3: 1084-1092 2010) and the well characterized GSTP1, all identified in this M-NGS study. WFDC2, which ranked 25th in Oncomine meta-analysis, was methylated in 100% (6/6) of transformed prostate cell lines and 77% (17/22) of cancer tissues but not in benign tissues or PrEC (FIG. 5A). In addition, WFDC2 methylation in select samples was independently confirmed by bi-sulfite sequencing (FIG. 21). By comparison, the TACSTD2 promoter was less frequently methylated, with 21% (5/23) of cancer tissues and 9% (1/11) of benign tissues showing hypermethylation, and prostate cell lines similarly exhibited variable levels of methylation (FIG. 5B). In contrast, the well characterized GSTP1 promoter showed frequent methylation in cancer tissues (86%) and in all transformed cell lines (100%), similar to WFDC2 (FIG. 5C).

[0130] Regulation of Transcript Variant Expression by DNA Methylation

[0131] It was also observed that a subset of genes displayed selective promoter methylation in a transcript isoform-specific manner. RASSF1, frequently inactivated by epigenetic alteration in human cancers (Dammann et al. Histol Histopathol 20: 645-663 2005), is comprised of three distinct variants. In LNCaP, DNA methylation-mediated silencing was observed of the longer transcript of RASSF1, variant-1, while the smaller isoform, variant-3, that codes for an N-terminal variant protein expressed in multiple cancer cell lines and tissues including PCa, retains high expression (FIG. 6 A, B) (Dammann et al. 2000, supra; Kuzmin et al. Cancer Res 62: 3498-3502 2002). Active transcription of variant-3 in LNCaP cells is supported by histone 3 lysine 4 trimethylation (H3K4me3) as observed in previously obtained ChIP-Seq data (Yu et al. 2010, supra), and 5' Rapid Amplification of cDNA Ends (5' RACE) showed presence of shorter transcripts but not variant-1 in LNCaP (FIG. 6A). Isoform-specific methylation of variant-1 was confirmed by preferential re-expression of this transcript upon 5-Aza treatment of LNCaP cells (FIG. 6B). Segregation of epigenetic marks into distinct genomic regions was found in promoters containing CpG islands when we superimposed the promoter methylation and H3K4me3 ChIP-seq data from LNCaP cells (Yu et al. 2010, supra). (FIG. 7). While integration of other epigenetic marks is necessary for a full analysis, these data indicate that multiple epigenetic modifications may co-occur in distinct patterns to regulate transcript expression in cancer.

[0132] Since the M-NGS methodology accurately detected DNA methylation events of RASSF1, the data for differential methylation of transcript variants compared to H3K4me3 marks was queried, and 34 genes were identified in LNCaP that exhibit isoform-specific promoter methylation (Table 6). Two genes from this list (NDRG2 and APC) were validated (FIG. 6D and FIG. 22A). In both of these candidates, the transcript variants (variants 1-4 in NDRG2 and variant-2 and -3 in APC) showing DNA methylation were confirmed to be under-expressed in LNCaP cells compared to PrEC cells by qRT-PCR and 5' RACE (FIG. 6E and FIG. 19A). Furthermore, these variants were preferentially re-expressed upon 5-Aza treatment of LNCaP cells. To determine whether patient tissues demonstrated similar isoform-specific expression patterns, NDRG2 isoforms were tested in 2 normals, 3 adjacent normals, 5 localized PCa, and 2 metastatic samples by qRT-PCR. Similar to LNCaP cells, variants 1-4 were significantly under-expressed compared to variants 5-8 in localized PCa (p-value=0.034) and adjacent benign prostate (p-value=0.012), but not in normal (non-prostate cancer) tissues (FIG. 22B). In addition, previously obtained RNA-Seq data from LNCaP cells supported the above observation for RASSF1 and NDRG2 genes (FIGS. 6C and 6F).

[0133] Methylation Differences Between ETS-Positive and ETS-Negative Tissues

[0134] Transcription factor occupancy has a protective role in limiting the spread of DNA methylation into affected CpG islands (Gebhard et al. Cancer Res 70: 1398-1407 2010). In prostate cancer, gene fusions involving ETS transcription factors (most commonly ERG and ETV1) occur in approximately 40-50% of patients and serve as the most frequent genetic aberration in this disease (Kumar-Sinha et al. Nat Rev Cancer 8: 497-511 2008). DNA methylation differences between patients harboring or lacking an ETS gene fusion thus provides insights into transcriptional program of ERG in prostate cancer. The 5 ERG fusion-positive (ETS-Positive) patients and 4 fusion-negative (ETS-Negative) in the cohort were compared and more than 40 Mb of DMRs specifically associated with ETS positive samples were observed. The majority of DMRs in ETS-Negative samples were also shared with benign samples (FIG. 8A). ETS-Positive samples also contained higher repeat element methylation compared to ETS-Negative samples (FIG. 8B). In particular, assessment of global LINE-1 methylation by an independent pyrosequencing analysis on a prostate tissue cohort (n=20), revealed a significant decrease in LINE-1 element methylation (p-value <0.0001) in ETS-Negative compared to ETS-Positive samples (FIG. 8C). These data indicate that previous studies documenting global hypomethylation of LINE-1 elements in prostate cancer may miss subtleties present in different molecular subtypes of this disease.

[0135] Methylation Analysis of WFDC2 Gene Promoter

[0136] FIG. 23 shows DNA methylation analysis of WFDC2 gene promoter using EpiTYPER analysis. A gDNA sample cohort (n=42) that comprises prostate cell lines (10), Normal (6), localized PCa (7) and metastatic PCa (19) tissues. Bisulfite conversion was performed, specific genomic regions of interest in WFDC2 gene promoter were PCR amplified and the DNA methylation status of CG residues was characterized using EpiTYPER analysis. The results from this analysis is presented in FIG. 23, where the WFDC2 promoter region showed a high frequency of methylation compared to normal counterparts.

[0137] Four different PCR products (4 vertical column groups) that monitored 26 CG position in WFDC2 promoter were characterized by EpiTYPER analysis. Samples are represented in rows and CG position monitored is represented in columns. Unmethylated state and increasing degree of methylation at each position per sample is represented as a heatmap.

TABLE 1

A4GALT ABBA-1 ABCC3 ABCG4 ABHD1 ABHD9 ACAA2 ACADL ACE ACP1 ACP5 ACPL2 ACSS1 ACSS3 ADAM32 ADAMTS7 ADAMTSL3 ADM ADORA2B ADRA1A ADRA2B ADRA2C ADRB1 AGBL2 ALDH1L1 ALOX12 ALOX15 ALS2CR11 AMOTL1 AMOTL2 ANKRD20A2 ANKRD20A3 ANKRD29 ANKRD58 ANUBL1 ANXA2 AOX1 AP2A2 AP3M2 APCDD1 AQP5 ARHGAP4 ARHGEF3 ARL10 ARL4C ARL9 ARNT2 ARNTL ARRB1 ARSG ARSI ARTN ARX ASCL4 ASTN2 ATOH1 ATP8A2 ATP9B AVPR1A B4GALNT3 BAG2 BAHCC1 BAMBI BCL2L10 BCL6B BCL9L
BCR BIRC3 BLVRB BNC2 BOLA1 BTBD14A BTG4 C10orf116 C12Orfs 6 C13orf51 C14orf169 C14orf39 C15orf26 C17orf246 C17orf64 C17orf32 C19Crf23 C19orf26 C19Crf5S C1orf101 C1orf104 C1orf113 C1 orf183 C1orf190 C1orf194 C1 orf58 C1orf59 C1QL3 C1QL4 C20orf149 C20orf35 C21orf123 C2Orf39 C2OrfA4 C2Orfss C3orf244 C3orf52 C3orf54 C3orf5S C3orf57 CSOrf3S C6orf138 C6orf145 C6orf150 C7orf241 C8orfA1 C9Crf12S C9orf129 C9Crf25 C9Crfaf C9orf64 CABLES1 CACNA1D CACNA1H CACNG4 CALHM2 CALML3 CAMK1D CARTPT CBX4 CBX8 CCDC109B CCDC11 CCDC122 CCDC134 CCDC4 CCDC67 CCDC74B CCDC87 CCDC96 CCK CCND1 CCNTL CCS CD14 CD8A CDC14B CDH3 CDO1 CDYL CEBPA CGB7 CHAD CHKA CHMP4A CHST11 CHST7 CIDEB CIRBP CLDN10


TABLE 2 - continued

Gene | Genomic Location | Accession Number | Type | Forward Primer | Reverse Primer | Applications | Amplicon Size (bp)
TINAGL1 | chr1:31814575-31814918 | N/A | LNCaP Hyper | ATTTTGTAAGTTTATGAGTTGTGGG (SEQ ID NO: 8) | TTCTATTCCTATATCCCTCTATCCC (SEQ ID NO: 39) | BS-sequencing | 344
TSPAN1 | chr1:46418579-46418802 | | LNCaP Hyper | TGTTTTATTTTTGGGTTGTTTTTTT (SEQ ID NO: 9) | CACACACCACTACTCACCTACAAAC (SEQ ID NO: 40) | BS-sequencing | 224
CALML3 | chr10:5556581-5556830 | | LNCaP Hyper | TTATTTAAGGGAAGAAAGGGTATTG (SEQ ID NO: 10) | ATTTAAACAAAAATCCAAAACCTAC (SEQ ID NO: 41) | BS-sequencing | 250
SPON2-1 | chr4:1156947-1157266 | | PrEC Hyper | TAGTTTATATGTTGGAAGTGGTTGG (SEQ ID NO: 11) | AAAACCCCTAAAAAAACTCTACACC (SEQ ID NO: 42) | BS-sequencing | 320
SPON2-2 | chr4:1157164-1157488 | | PrEC Hyper | TAGGAAGAGTTATAGAAAGGGGGTT (SEQ ID NO: 12) | CATTAACAAAATTCCAAACATCAAA (SEQ ID NO: 43) | BS-sequencing | 325
 | chr10:93382583-93382815 | | LNCaP Hyper | TTTTTTGGGGTGTTTATTTTTAGAG (SEQ ID NO: 13) | AACCCTCAATCTCTCCCAAC (SEQ ID NO: 44) | BS-sequencing | 233
AMT | chr3:49434492-49434742 | | LNCaP Hyper | AATAGGGAGGAAGGTTTGATTAGAT (SEQ ID NO: 14) | AAACCCTAATAACACTAAACCCAAC (SEQ ID NO: 45) | BS-sequencing | 251
NTN4 | chr12:94707434-94707704 | | LNCaP Hyper | GTTTAAGAAAGTTAGGATGGGAGTG (SEQ ID NO: 15) | AAAAAACCAAATAAAAACACCTAAAAC (SEQ ID NO: 46) | BS-sequencing | 271
AOX1 | chr2:201158884-201159157 | | LNCaP Hyper | GAGTTTTTGGTAAAGAGTTTAGGAA (SEQ ID NO: 16) | CAAATCTAAAAACAAAAAAAACCC (SEQ ID NO: 47) | BS-sequencing | 274
 | chr9:103289105-103289348 | | LNCaP Hyper | ATTGTTTAGGTTGGTTAGTTTATTT (SEQ ID NO: 17) | CTTTTCCCTAACTACCCATCAATTA (SEQ ID NO: 48) | BS-sequencing | 244
 | chr9:103288104-103288345 | | LNCaP Hyper | GTATGGTGAGATTATTATTGGAGT (SEQ ID NO: 18) | ACTAAATCCAAAACCCTAAACATC (SEQ ID NO: 49) | BS-sequencing | 242
NAP1L5 | chr4:89837763-89838003 | | LNCaP and PrEC Hyper | GGGTTTTTTAGTTATTTGATTAGT (SEQ ID NO: 19) | CAAAATCTCTCTAAACCAACTC (SEQ ID NO: 50) | BS-sequencing | 241
NAP1L5 | chr4:89837763-89837965 | | LNCaP and PrEC Hyper | GGGTTTTTTAGTTATTTGATTAGT (SEQ ID NO: 20) | AAAACCCTCCTAAACCTCTAC (SEQ ID NO: 51) | BS-sequencing |
MYC | chr8:128815497-128815710 | | Negative control | TTTGTTTTTGTTTTTATTTGATTTT (SEQ ID NO: 21) | ATTACTCCTACCTCCAAACCTTTAC (SEQ ID NO: 52) | BS-sequencing | 214

TABLE 2-continued
Gene | Genomic Location | Accession Number | Type | Forward Primer | Reverse Primer | Applications | Amplicon Size (bp)
APC | N/A | NM_001127511 | Variant 1 | TCAGTTCTCGGGTCCTGGAG (SEQ ID NO: 22) | TCCTTGGCTACCCTTGGAC (SEQ ID NO: 53) | qPCR |
APC | N/A | NM_001127510 | Variant 2 | GTGTCACTGGAGACAGAATGGA (SEQ ID NO: 23) | TAGATTCACATCAGCCATCTGC (SEQ ID NO: 54) | qPCR |
APC | N/A | NM_000038 | Variant 3 | AGGGTGTCACTGGAGACAGAAT (SEQ ID NO: 24) | CTACCCTTGGACCCCATTTC (SEQ ID NO: 55) | qPCR |
GAPDH | N/A | NM_002046 | N/A | CTGAACGGGAAGCTCACTGGCA (SEQ ID NO: 25) | TTACTCCTTGGAGGCCATGTGGG (SEQ ID NO: 56) | qPCR |
GSTP1 | N/A | NM_000852 | N/A | GACTTGCTGCTGATCCATGA (SEQ ID NO: 26) | AGGTTCACGTACTCAGGGGAG (SEQ ID NO: 57) | qPCR |
NDRG2 | N/A | NM_210535; NM_016250; NM_201536; NM_210537 | Variant 1; Variant 2; Variant 3; Variant 4 | CCTTGTTGTCCAACTTCTCCC (SEQ ID NO: 27) | ACAGTGGCTTCTCCTCTGTGAT (SEQ ID NO: 58) | qPCR |
NDRG2 | N/A | NM_201538; NM_201539; NM_201540; NM_201541 | Variant 5; Variant 6; Variant 7; Variant 8 | GAGTCAAAGGCAAGTGAAGGTG (SEQ ID NO: 28) | ACAGTGGCTTCTCCTCTGTGAT (SEQ ID NO: 59) | qPCR |
RASSF1 | N/A | NM_007182; NM_170714 | Variant 1; Variant 4 | GACCTCTGTGGCGACTTCAT (SEQ ID NO: 29) | GGCAGGTGAACTTGCAATC (SEQ ID NO: 60) | qPCR |
RASSF1 | N/A | NM_170712 | Variant 2 | AGGTGGCCAACATTAGAGTCC (SEQ ID NO: 30) | AACAGTCCAGGCAGACGAG (SEQ ID NO: 61) | qPCR |
RASSF1 | N/A | NM_170713 | Variant 3 | CTTCTTTCGAAATGACCTGGAG (SEQ ID NO: 31) | TCCGAGTCCGAGTCCTCTT (SEQ ID NO: 62) | qPCR |
APC | N/A | All variants | 1st round | N/A | TGCAATGGCCTGTAGTCCCCCTAGT (SEQ ID NO: 63) | 5' RACE |
APC | N/A | All variants | nested PCR | N/A | AGCCTTCGAGGTGCAGAGTGTGTGCT (SEQ ID NO: 64) | 5' RACE |
NDRG2 | N/A | All variants | 1st round | N/A | CCAGGGGCATCCACATGAACCCGCA (SEQ ID NO: 65) | 5' RACE |
NDRG2 | N/A | All variants | nested PCR | N/A | CGCTGGGCGTTTGGGTTTGGGGGT (SEQ ID NO: 66) | 5' RACE |
RASSF1 | N/A | All variants | 1st round | N/A | CAGAGCCATACCTGGCTACAC (SEQ ID NO: 67) | 5' RACE |
RASSF1 | N/A | All variants | nested PCR | N/A | GCCGCAGGGGCTGCTCATCATCCA (SEQ ID NO: 68) | 5' RACE |
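For the BS-sequencing entries of Table 2, the reported amplicon size appears to equal the inclusive span of the genomic location, that is, end minus start plus one. The following Python sketch is illustrative only and is not part of the disclosed methods. The coordinates are transcribed from the table as read here, so any digit affected by print or scan damage is an assumption, and the names ROWS, amplicon_size and mismatches are hypothetical.

```python
# Illustrative consistency check, not part of the disclosed methods.
# Rows are transcribed from the BS-sequencing entries of Table 2 as read
# here; digits affected by scan damage are an assumption.
# Fields: (gene, chromosome, start, end, reported amplicon size in bp)
ROWS = [
    ("TINAGL1", "chr1", 31814575, 31814918, 344),
    ("TSPAN1", "chr1", 46418579, 46418802, 224),
    ("SPON2-1", "chr4", 1156947, 1157266, 320),
    ("SPON2-2", "chr4", 1157164, 1157488, 325),
    ("AMT", "chr3", 49434492, 49434742, 251),
    ("AOX1", "chr2", 201158884, 201159157, 274),
    ("NAP1L5", "chr4", 89837763, 89838003, 241),
    ("MYC", "chr8", 128815497, 128815710, 214),
]


def amplicon_size(start, end):
    """Length in bp of an interval whose start and end are both included."""
    return end - start + 1


def mismatches(rows):
    """Return (gene, reported, computed) for rows where the sizes disagree."""
    found = []
    for gene, _chrom, start, end, reported in rows:
        computed = amplicon_size(start, end)
        if computed != reported:
            found.append((gene, reported, computed))
    return found


if __name__ == "__main__":
    for gene, reported, computed in mismatches(ROWS):
        print(f"{gene}: reported {reported} bp, computed {computed} bp")
```

An empty result from mismatches(ROWS) indicates that every transcribed row is internally consistent under the inclusive-interval assumption. A disagreement would point to a misread digit in either the location or the size column.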