USOO80671 68B2

(12) United States Patent (10) Patent No.: US 8,067,168 B2 Ordway et al. (45) Date of Patent: Nov. 29, 2011

(54) METHYLATION IN CANCER OTHER PUBLICATIONS DAGNOSIS GenBank Locus AF369786 Homo sapiens growth hormone Secretagogue receptor gene, complete cols, alternatively spliced (Jul. (75) Inventors: Jared Ordway, St. Louis, MO (US); 7, 2001), from www.ncbi.nlm.nih.gov. 5 printed pages.* Jeffrey A. Jeddeloh, Oakville, MO Juppner H. Bone (1995) vol. 17 No. 2 Supplement, pp. 39S-42S.* (US); Joseph Bedell, Chesterfield, MO GenBank Locus AC069523 “Homo sapiens 3 BAC RP11-1K20 (Roswell Park Cancer Institute Human BAC Library) complete (US) sequence” (Aug. 30, 2002) from www.ncbi.nlm.nih.gov, pp. 1-31.* Rein T. et al. Nucleic Acids Research (1998) vol. 26, No. 10, pp. (73) Assignee: Orion Genomics LLC, St. Louis,MO 2255-2264. (US) Gaytan F. etal. The Journal of Clinical Endocrinology & Metabolism (Mar. 2005) vol. 90, No. 3, pp. 1798-1804.* (*) Notice: Subject to any disclaimer, the term of this Cmaeron E.E. et al Blood (Oct. 1, 1999) vol. 94, No. 7, p. 2445 patent is extended or adjusted under 35 2451.* Costello, Josoph F. et al.; “Graded Methylation in the Promoter and U.S.C. 154(b) by 462 days. BOdy of the O-Methylguanine DNA Methyltransferase (MGMT) Gene Correlates with MGMT Expression in Human Glioma Cells'; (21) Appl. No.: 11/809,355 1994. The Journal of Biological Chemistry, vol. 269, No. 25, pp. 17228-17237. (22) Filed: May 30, 2007 Widschwendter, Martin et al.; "Methylation and Silencing of the Retinoic Acid Receptor-B2 Gene in Breast Cancer'; 2000, Journal of (65) Prior Publication Data the National Cancer Institute, vol. 92, No. 10, pp. 826-832. Bae, Young Kyung et al., “Hypermethylation in histologically dis US 2007/O2985O6A1 Dec. 27, 2007 tinct classes of breast cancer'; 2004, Clinical Cancer Research. An Official Journal of the American Association for Cancer Research, Related U.S. Application Data vol. 10, No. 18, pp. 5998-6005. Makiyama, K. et al.; "Aberrant expression of HOX in human (60) Provisional application No. 60/803,571, filed on May invasive breast carcinoma'; 2005. Oncology Reports, vol. 13, No. 4. 31, 2006, provisional application No. 60/848,543, pp. 673-679. filed on Sep. 28, 2006. Yan, P.S. et al.; “Differential distribution of DNA methylation within the RASSF1A CpG island in breast cancer’: 2003, Cancer Research, (51) Int. Cl. vol. 63, No. 19, pp. 6178-6186. CI2O I/68 (2006.01) CI2P 19/34 (2006.01) * cited by examiner C7H 2L/02 (2006.01) Primary Examiner — Stephen Kapushoc (52) U.S. Cl...... 435/6.1; 435/91.2:536/23.1 (74) Attorney, Agent, or Firm — Kilpatrick Townsend & (58) Field of Classification Search ...... None Stockon LLP See application file for complete search history. (57) ABSTRACT (56) References Cited The present invention provides DNA biomarker sequences U.S. PATENT DOCUMENTS that are differentially methylated in samples from normal individuals and individuals with cancer. The invention further 2004/023496.0 A1* 11/2004 Olek et al...... 435/6 provides methods of identifying differentially methylated FOREIGN PATENT DOCUMENTS DNA biomarker sequences and their use in detecting and WO WO O2/OO927 A2 1, 2002 diagnosing cancer. WO WO O2, 18632 A2 3, 2002 WO WO 2007/O16668 A2 2, 2007 14 Claims, 18 Drawing Sheets U.S. Patent Nov. 29, 2011 Sheet 1 of 18 US 8,067,168 B2

U.S. Patent Nov. 29, 2011 Sheet 3 of 18 US 8,067,168 B2

??????).

3æðxxxxxxxx

3 ??38

* *****?g U.S. Patent Nov. 29, 2011 Sheet 4 of 18 US 8,067,168 B2

¿???????

3

!xxxx

..?$$.3

33

3 *

*

U.S. Patent Nov. 29, 2011 Sheet 5 of 18 US 8,067,168 B2

U.S. Patent Nov. 29, 2011 Sheet 6 of 18 US 8,067,168 B2

U.S. Patent US 8,067,168 B2

*********3 3.***33

?au??a & 33.ag

}} { 3 $$$$ {{}}

U.S. Patent Nov. 29, 2011 Sheet 8 of 18 US 8,067,168 B2

Figure 6. Demonstration of threshold adjustment for determining sensitivity and specificity.

ha1p 39189; Locus number 2

Tumor Normal Blood n = 25 in a 25 n 24 U.S. Patent US 8,067,168 B2

O U.S. Patent Nov. 29, 2011 Sheet 10 of 18 US 8,067,168 B2

Figure 8. Correlation between qPCR based DNA methylation detection and 5 methylcytosine content measured by bisulfite sequencing.

SO ... - -o-o: ------na-a-War-wa-a- y = 0.1838x +0.0391 Rs 0.7965 ( 50% f R i.40% -1 O. 30% . -1 : 20% - 2 a 10% e 0. 0. O% O 0.5 15 2 2.5 MethylScreen Avg. dCt N-- N-- Normat Tutor

U.S. Patent Nov. 29, 2011 Sheet 12 of 18 US 8,067,168 B2

Figure 10A. Detection of tumor-specific DNA methylation in Stage I breast tumors relative to Stage II-III breast tumors.

3 - y = 0.9815x w w m IM y = X

8

s

S

O 20 40 60 80 100 Frequency (Stage I/III)

U.S. Patent Nov. 29, 2011 Sheet 14 of 18 US 8,067,168 B2

Figure 11A. Examples of DNA methylation data for four selected biomarker loci.

Biomarker Locus Number: 2

wa

c o s C c r C -i: -- i. Blood Normal Tumor Blood Normal Tumor

Biomarker Locus Number:

f se

s Blood Normal Tumor Blood Normal Tumor U.S. Patent Nov. 29, 2011 Sheet 15 of 18 US 8,067,168 B2

Figure 11B. ROC curve plots for loci shown in Figure 11A.

OO 0.2 O4 O6 O8 O 1-Specificity U.S. Patent Nov. 29, 2011 Sheet 16 of 18 US 8,067,168 B2 Figure 12. Analysis of DNA methylation by bisulfite sequencing. A. B. ao. 80% Ros 2. 60% g 60% E 40% & 40% E g is no < 20% g 20% a. 0% 0% r O 2 3 4 5 2 3 4. 5 C. Avg delta Ct D. CpG position 100% O 100% 80% 3. 80% s hur E 60% 60% SS O) 40% E 40% 3. SS 20% ------0% CC O upour full- T T - 4 1 2 3 4 5 6 7 8 9 CpG position E. F. 100% () 3. 80% has 3. 60% C E 40% SS 20% O% 0% O 2 3 4. 1 2 3 4 5 6 7 Avg delta Ct CpG position

t 80% 80% () D 60% s 60% s E t SS 40% 2 40% g SS C 20% g 20% 0% 3 || -- *-tt-, i.e., O 1 2 3 4. 1 2 3 4 5 6 7 8 9 Avg delta Ct CpG position U.S. Patent Nov. 29, 2011 Sheet 17 of 18 US 8,067,168 B2

Figure 13. Hypermethylation is associated with decreased transcription.

Tumor 2 Tumor 3

g wa s v s sa) s s s s we U.S. Patent Nov. 29, 2011 Sheet 18 of 18 US 8,067,168 B2

Figure 14, DNA methylation detection in fine needle aspirate specimens.

100% y = 0.817x + 0.0038 90% R = 0.7417 80% 70% 60% 50% 40% 40% 50%. 60% 70% 80% 90%. 100% % POSITIVE TUMOR US 8,067,168 B2 1. 2 GENE METHYLATION IN CANCER targeting these early changes, leading to more effective can DAGNOSIS cer treatment. The present invention addresses these and other problems. CROSS-REFERENCE TO RELATED PATENT APPLICATIONS BRIEF SUMMARY OF THE INVENTION The present application claims benefit of priority to U.S. The present invention provides methods for determining Provisional Patent application No. 60/803,571, filed May 31, the methylation status of an individual. In one aspect, the 2006, and U.S. Provisional Patent application No. 60/848, methods comprise: 543, filed Sep. 28, 2006, each of which are incorporated by 10 obtaining a biological sample from an individual; and reference in their entirety. determining the methylation status of at least one cytosine within a DNA region in a sample from the individual BACKGROUND OF THE INVENTION where the DNA region is a SEQID NO: selected from the group consisting of 213, 214, 215, 216, 217, 218, Human cancer cells typically contain Somatically altered 15 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, genomes, characterized by mutation, amplification, or dele 230, 231, 232, 233,234, 235, 236, 237, 238,239, 240, tion of critical genes. In addition, the DNA template from 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, human cancer cells often displays Somatic changes in DNA 252,253,254, 255, 256, 257, 258, 259, 260, 261,262, methylation. See, e.g., E. R. Fearon, etal, Cell 61:759 (1990); 263,264, and 265. P. A. Jones, et al., Cancer Res. 46:461 (1986); R. Holliday, In a further aspect, the methods comprise determining the Science 238:163 (1987); A. De Bustros, et al., Proc. Natl. presence or absence of cancer (including but not limited to Acad. Sci. USA 85:5693 (1988); P. A. Jones, et al., Adv. breast, lung, renal, liver, ovarian, head and neck, thyroid, Cancer Res. 54:1 (1990); S. B. Baylin, et al., Cancer Cells bladder, cervical, colon, endometrial, esophageal, or prostate 3:383 (1991); M. Makos, et al., Proc. Natl. AcadSci, USA cancer or melanoma) in an individual. 89:1929 (1992); N. Ohtani-Fujita, et al., Oncogene 8:1063 25 In some embodiments, the methods comprise: (1993). a) determining the methylation status of at least one DNA methylases transfer methyl groups from the universal cytosine within a DNA region in a sample from an indi methyl donor S-adenosyl methionine to specific sites on the vidual where the DNA region is at least 90%, 91%, 92%, DNA. Several biological functions have been attributed to the 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to, or methylated bases in DNA. The most established biological 30 comprises, a sequence selected from the group consist function is the protection of the DNA from digestion by ing of SEQID NOS: 213, 214, 215, 216, 217, 218, 219, cognate restriction enzymes. This restriction modification 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, phenomenon has, so far, been observed only in bacteria. 231, 232, 233,234, 235, 236, 237,238, 239, 240, 241, Mammalian cells, however, possess different methylases 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, that exclusively methylate cytosine residues on the DNA that 35 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, are 5' neighbors of guanine (CpG). This methylation has been 264, and 265; shown by several lines of evidence to play a role in gene b) comparing the methylation status of the at least one activity, cell differentiation, tumorigenesis, X- cytosine to a threshold value for the biomarker, wherein inactivation, genomic imprinting and other major biological the threshold value distinguishes between individuals processes (Razin, A., H., and Riggs, R. D. eds. in DNA Methy 40 with and without cancer, wherein the comparison of the lation Biochemistry and Biological Significance, Springer methylation status to the threshold value is predictive of Verlag, N.Y., 1984). the presence or absence of cancer in the individual. In eukaryotic cells, methylation of cytosine residues that In some embodiments, the methods comprise: are immediately 5' to a guanosine, occurs predominantly in a) determining the methylation status of at least one CG poor loci (Bird, A., Nature 321:209 (1986)). In contrast, 45 cytosine within a DNA region in a sample from an indi discrete regions of CG dinucleotides called CG islands (CGi) vidual where the DNA region is at least 90%, 91%, 92%, remain unmethylated in normal cells, except during X-chro 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to, or mosome inactivation and parental specific imprinting (Li, et comprises, a sequence selected from the group consist al., Nature 366:362 (1993)) where methylation of 5' regula ing of SEQID NOS: 213, 214, 215, 216, 217, 218, 219, tory regions can lead to transcriptional repression. For 50 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, example, de novo methylation of the Rb gene has been dem 231, 232, 233,234, 235, 236, 237,238, 239, 240, 241, onstrated in a small fraction of retinoblastomas (Sakai, et al., 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, Am. J Hum. Genet., 48:880 (1991)), and a more detailed 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, analysis of the VHL gene showed aberrant methylation in a 264, and 265; Subset of sporadic renal cell carcinomas (Herman, et al., Proc. 55 b) comparing the methylation status of the-at least one Natl. Acad. Sci. U.S.A., 91.9700 (1994)). Expression of a cytosine to a threshold value for the biomarker, wherein tumor suppressor gene can also be abolished by denovo DNA the threshold value distinguishes between individuals methylation of a normally unmethylated 5' CG island. See, with and withoutbreast cancer, wherein the comparison e.g., Issa, et al., Nature Genet. 7:536 (1994); Merlo, et al., of the methylation status to the threshold value is pre Nature Med. 1:686 (1995); Herman, et al., Cancer Res., 60 dictive of the presence or absence of breast cancer in the 56:722 (1996); Graff, et al., Cancer Res., 55:51.95 (1995); individual. Herman, et al., Cancer Res. 55:4525 (1995). In some embodiments, the methods comprise: Identification of the earliest genetic and epigenetic changes a) determining the methylation status of at least one in tumorigenesis is a major focus in molecular cancer cytosine within a DNA region in a sample from an indi research. Diagnostic approaches based on identification of 65 vidual where the DNA region is at least 90%, 91%, 92%, these changes can allow implementation of early detection 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to, or strategies, tumor staging and novel therapeutic approaches comprises, a sequence selected from the group consist US 8,067,168 B2 3 4 ing of SEQID NOS: 213, 214, 215, 216, 217, 218, 219, son of the methylation status to the threshold value is 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, predictive of the presence or absence of ovarian cancer 231, 232, 233,234, 235, 236, 237,238, 239, 240, 241, in the individual. 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, In some embodiments, the methods comprise: 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 5 a) determining the methylation status of at least one 264, and 265; cytosine within a DNA region in a sample from an indi b) comparing the methylation status of the at least one vidual where the DNA region is at least 90%, 91%, 92%, cytosine to a threshold value for the biomarker, wherein 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to, or the threshold value distinguishes between individuals comprises, a sequence selected from the group consist with and without lung cancer, wherein the comparison 10 ing of SEQID NOS: 213, 214, 215, 216, 217, 218, 219, of the methylation status to the threshold value is pre 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, dictive of the presence or absence of lung cancer in the 231, 232, 233,234, 235, 236, 237,238, 239, 240, 241, individual. 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, In some embodiments, the methods comprise: 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, a) determining the methylation status of at least one 15 264, and 265; cytosine within a DNA region in a sample from an indi b) comparing the methylation status of the at least one vidual where the DNA region is at least 90%,91%.92%, cytosine to a threshold value for the biomarker, wherein 93%, 94%, 95%, 96%,97%.98%, or 99% identical to, or the threshold value distinguishes between individuals comprises, a sequence selected from the group consist with and without head and neck cancer, wherein the ing of SEQID NOS: 213, 214, 215, 216, 217, 218, 219, 20 comparison of the methylation status to the threshold 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, value is predictive of the presence or absence of head and 231, 232, 233,234, 235, 236, 237,238, 239, 240, 241, neck cancer in the individual. 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, In some embodiments, the methods comprise: 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, a) determining the methylation status of at least one 264, and 265; 25 cytosine within a DNA region in a sample from an indi b) comparing the methylation status of the at least one vidual where the DNA region is at least 90%, 91%, 92%, cytosine to a threshold value for the biomarker, wherein 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to, or the threshold value distinguishes between individuals comprises, a sequence selected from the group consist with and without renal cancer, wherein the comparison ing of SEQID NOS: 213, 214, 215, 216, 217, 218, 219, of the methylation status to the threshold value is pre- 30 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, dictive of the presence or absence of renal cancer in the 231, 232, 233,234, 235, 236, 237,238, 239, 240, 241, individual. 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, In some embodiments, the methods comprise: 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, a) determining the methylation status of at least one 264, and 265; cytosine within a DNA region in a sample from an indi- 35 b) comparing the methylation status of the at least one vidual where the DNA region is at least 90%,91%.92%, cytosine to a threshold value for the biomarker, wherein 93%, 94%, 95%, 96%,97%.98%, or 99% identical to, or the threshold value distinguishes between individuals comprises, a sequence selected from the group consist with and without thyroid cancer, wherein the compari ing of SEQID NOS: 213, 214, 215, 216, 217, 218, 219, son of the methylation status to the threshold value is 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 40 predictive of the presence or absence of thyroid cancer in 231, 232, 233,234, 235, 236, 237,238, 239, 240, 241, the individual. 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, In some embodiments, the methods comprise: 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, a) determining the methylation status of at least one 264, and 265; cytosine within a DNA region in a sample from an indi b) comparing the methylation status of the at least one 45 vidual where the DNA region is at least 90%, 91%, 92%, cytosine to a threshold value for the biomarker, wherein 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to, or the threshold value distinguishes between individuals comprises, a sequence selected from the group consist with and without liver cancer, wherein the comparison ing of SEQID NOS: 213, 214, 215, 216, 217, 218, 219, of the methylation status to the threshold value is pre- 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, dictive of the presence or absence of liver cancer in the 50 231, 232, 233,234, 235, 236, 237,238, 239, 240, 241, individual. 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, In some embodiments, the methods comprise: 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, a) determining the methylation status of at least one 264, and 265; cytosine within a DNA region in a sample from an indi- b) comparing the methylation status of the at least one vidual where the DNA region is at least 90%,91%.92%. 55 cytosine to a threshold value for the biomarker, wherein 93%, 94%, 95%, 96%,97%.98%, or 99% identical to, or the threshold value distinguishes between individuals comprises, a sequence selected from the group consist- with and without bladder cancer, wherein the compari ing of SEQID NOS: 213, 214, 215, 216, 217, 218, 219, son of the methylation status to the threshold value is 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, predictive of the presence or absence of bladder cancer 231, 232, 233,234, 235, 236, 237,238, 239, 240, 241, 60 in the individual. 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, In some embodiments, the methods comprise: 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, a) determining the methylation status of at least one 264, and 265; cytosine within a DNA region in a sample from an indi b) comparing the methylation status of the at least one vidual where the DNA region is at least 90%, 91%, 92%, cytosine to a threshold value for the biomarker, wherein 65 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to, or the threshold value distinguishes between individuals comprises, a sequence selected from the group consist with and without ovarian cancer, wherein the compari ing of SEQID NOS: 213, 214, 215, 216, 217, 218, 219, US 8,067,168 B2 5 6 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, parison of the methylation status to the threshold value is 231, 232, 233,234, 235, 236, 237,238, 239, 240, 241, predictive of the presence or absence of esophegeal can 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, cer in the individual. 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, In some embodiments, the methods comprise: 264, and 265; 5 a) determining the methylation status of at least one b) comparing the methylation status of the at least one cytosine within a DNA region in a sample from an indi cytosine to a threshold value for the biomarker, wherein vidual where the DNA region is at least 90%, 91%, 92%, the threshold value distinguishes between individuals 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to, or with and without cervical cancer, wherein the compari comprises, a sequence selected from the group consist 10 ing of SEQID NOS: 213, 214, 215, 216, 217, 218, 219, son of the methylation status to the threshold value is 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, predictive of the presence or absence of cervical cancer 231, 232, 233,234, 235, 236, 237,238, 239, 240, 241, in the individual. 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, In some embodiments, the methods comprise: 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, a) determining the methylation status of at least one 15 264, and 265; cytosine within a DNA region in a sample from an indi b) comparing the methylation status of the at least one vidual where the DNA region is at least 90%,91%.92%, cytosine to a threshold value for the biomarker, wherein 93%, 94%, 95%, 96%,97%.98%, or 99% identical to, or the threshold value distinguishes between individuals comprises, a sequence selected from the group consist with and without colon cancer, wherein the comparison ing of SEQID NOS: 213, 214, 215, 216, 217, 218, 219, of the methylation status to the threshold value is pre 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, dictive of the presence or absence of colon cancer in the 231, 232, 233,234, 235, 236, 237,238, 239, 240, 241, individual. 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, In some embodiments, the methods comprise: 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, a) determining the methylation status of at least one 264, and 265; 25 cytosine within a DNA region in a sample from an indi b) comparing the methylation status of the at least one vidual where the DNA region is at least 90%, 91%, 92%, cytosine to a threshold value for the biomarker, wherein 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to, or the threshold value distinguishes between individuals comprises, a sequence selected from the group consist with and without colon cancer, wherein the comparison ing of SEQID NOS: 213, 214, 215, 216, 217, 218, 219, of the methylation status to the threshold value is pre 30 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, dictive of the presence or absence of colon cancer in the 231, 232, 233,234, 235, 236, 237,238, 239, 240, 241, individual. 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, In some embodiments, the methods comprise: 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, a) determining the methylation status of at least one 264, and 265; cytosine within a DNA region in a sample from an indi 35 b) comparing the methylation status of the at least one vidual where the DNA region is at least 90%,91%.92%, cytosine to a threshold value for the biomarker, wherein 93%, 94%, 95%, 96%,97%.98%, or 99% identical to, or the threshold value distinguishes between individuals comprises, a sequence selected from the group consist with and without prostate cancer, wherein the compari ing of SEQID NOS: 213, 214, 215, 216, 217, 218, 219, son of the methylation status to the threshold value is 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 40 predictive of the presence or absence of prostate cancer 231, 232, 233,234, 235, 236, 237,238, 239, 240, 241, in the individual. 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, In some embodiments, the methods comprise: 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, a) determining the methylation status of at least one 264, and 265; cytosine within a DNA region in a sample from an indi b) comparing the methylation status of the at least one 45 vidual where the DNA region is at least 90%, 91%, 92%, cytosine to a threshold value for the biomarker, wherein 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to, or the threshold value distinguishes between individuals comprises, a sequence selected from the group consist with and without endometrial cancer, wherein the com ing of SEQID NOS: 213, 214, 215, 216, 217, 218, 219, parison of the methylation status to the threshold value is 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, predictive of the presence or absence of endometrial 50 231, 232, 233,234, 235, 236, 237,238, 239, 240, 241, cancer in the individual. 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, In some embodiments, the methods comprise: 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, a) determining the methylation status of at least one 264, and 265; cytosine within a DNA region in a sample from an indi b) comparing the methylation status of the at least one vidual where the DNA region is at least 90%,91%.92%, 55 cytosine to a threshold value for the biomarker, wherein 93%, 94%, 95%, 96%,97%.98%, or 99% identical to, or the threshold value distinguishes between individuals comprises, a sequence selected from the group consist with and without melanoma, wherein the comparison of ing of SEQID NOS: 213, 214, 215, 216, 217, 218, 219, the methylation status to the threshold value is predictive 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, of the presence or absence of melanoma in the indi 231, 232, 233,234, 235, 236, 237,238, 239, 240, 241, 60 vidual. 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, With regard to the embodiments, in some embodiments, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, the determining step comprises determining the methylation 264, and 265; status of at least one cytosine-in the DNA region correspond b) comparing the methylation status of the at least one ing to a nucleotide in a biomarker in the DNA region, wherein cytosine to a threshold value for the biomarker, wherein 65 the biomarker is at least 90%, 91%, 92%, 93%, 94%, 95%, the threshold value distinguishes between individuals 96%, 97%, 98%, or 99% identical to, or comprises, a with and without esophageal cancer, wherein the com sequence selected from the group consisting of SEQID NOs: US 8,067,168 B2 7 8 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, melanoma), wherein the comparison of the methylation 172,173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, value to the threshold value is predictive of the presence 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, or absence of cancer (including but not limited to breast, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, lung, renal, liver, ovarian, head and neck, thyroid, blad 208, 209, 210, 211, and 212. der, cervical, colon, endometrial, esophageal, or prostate In some embodiments, the determining step comprises cancer or melanoma) in the individual. determining the methylation status of the DNA region corre In another aspect, the invention provides computer pro sponding to a biomarker. gram products for determining the presence or absence of In some embodiments, the sample is from any body fluid, cancer (including but not limited to breast, lung, renal, liver, including but not limited to blood serum, blood plasma, fine 10 ovarian, head and neck, thyroid, bladder, cervical, colon, needle aspirate of the breast, biopsy of the breast, ductal fluid, endometrial, esophageal, or prostate cancer or melanoma) in ductal lavage, feces, urine, sputum, saliva, semen, lavages, an individual. In some embodiments, the computer readable biopsy of the lung, bronchial lavage or bronchial brushings. products comprise: In some embodiments, the sample is from a tumor or polyp. In a computer readable medium encoded with program code, Some embodiments, the sample is a biopsy from lung, kidney, 15 the program code including: liver, ovarian, head, neck, thyroid, bladder, cervical, colon, program code for receiving a methylation value repre endometrial, esophageal, prostate or skin tissue. In some senting the methylation status of at least one cytosine embodiments, the sample is from cell scrapes, washings, or within a DNA region in a sample from the individual resected tissues. where the DNA region is is at least 90%, 91%, 92%, In some embodiments, the methylation status of at least 93%, 94%, 95%, 96%,97%.98%, or 99% identical to, one cytosine is compared to the methylation status of a con or comprises, a sequence selected from the group trol locus. In some embodiments, the control locus is an consisting of SEQID NOS: 213, 214, 215, 216, 217, endogenous control. In some embodiments, the control locus 218, 219, 220, 221, 222, 223, 224, 225, 226,227, 228, is an exogenous control. 229, 230, 231, 232,233,234,235,236,237,238,239, In some embodiments, the determining step comprises 25 240,241,242,243,244, 245,246, 247,248,249,250, determining the methylation status of at least one cytosine in 251,252,253,254, 255,256,257,258, 259,260,261, at least two of the DNA regions. 262, 263,264, and 265; and In a further aspect, the invention provides computer imple program code for comparing the methylation value to a mented methods for determining the presence or absence of threshold value, wherein the threshold value distin cancer (including but not limited to breast, lung, renal, liver, 30 guishes between individuals with and without cancer ovarian, head and neck, thyroid, bladder, cervical, colon, (including but not limited to breast, lung, renal, liver, endometrial, esophageal, prostate or melanoma) in an indi ovarian, head and neck, thyroid, bladder, cervical, vidual. In some embodiments, the methods comprise: colon, endometrial, esophageal, or prostate cancer or receiving, at a host computer, a methylation value repre melanoma), wherein the comparison of the methyla senting the methylation status of at least one cytosine 35 tion value to the threshold value is predictive of the within a DNA region in a sample from the individual presence or absence of cancer (including but not lim where the DNA region is at least 90%, 91%, 92%, 93%, ited to breast, lung, renal, liver, ovarian, head and 94%. 95%, 96%, 97%, 98%, or 99% identical to, or neck, thyroid, bladder, cervical, colon, endometrial, comprises, a sequence selected from the group consist esophageal, or prostate cancer or melanoma) in the ing of SEQID NOS: 213, 214, 215, 216, 217, 218, 219, 40 individual. 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, In a further aspect, the invention provides kits for deter 231, 232, 233,234, 235, 236, 237,238, 239, 240, 241, mining the methylation status of at least one biomarker. In 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, Some embodiments, the kits comprise: 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, a pair of polynucleotides capable of specifically amplify 264, and 265; and 45 ing at least a portion of a DNA region where the DNA comparing, in the host computer, the methylation value to region is a sequence selected from the group consisting a threshold value, wherein the threshold value distin of SEQ ID NOs: of 213, 214, 215, 216, 217, 218, 219, guishes between individuals with and without cancer 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, (including but not limited to breast, lung, renal, liver, 231, 232, 233,234, 235, 236, 237,238, 239, 240, 241, ovarian, head and neck, thyroid, bladder, cervical, colon, 50 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, endometrial, esophageal, prostate or melanoma), 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, wherein the comparison of the methylation value to the 264, and 265; and threshold value is predictive of the presence or absence a methylation-dependent or methylation sensitive restric of cancer (including but not limited to breast, lung, renal, tion enzyme and/or sodium bisulfite. liver, ovarian, head and neck, thyroid, bladder, cervical, 55 In some embodiments, the pair of polynucleotides are colon, endometrial, esophageal, or prostate cancer or capable of specifically amplifying a biomarker selected from melanoma) in the individual. the group consisting of one or more of SEQID NOs: 160, 161, In some embodiments, the receiving step comprises receiv 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, ing at least two methylation values, the two methylation Val 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, ues representing the methylation status of at least one 60 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, cytosine biomarkers from two different DNA regions; and 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, the comparing step comprises comparing the methylation 210, 211, and 212. values to one or more threshold value(s) wherein the In some embodiments, the kits comprise at least two pairs threshold value distinguishes between individuals with of polynucleotides, wherein each pair is capable of specifi and without cancer (including but not limited to breast, 65 cally amplifying at least a portion of a different DNA region lung, renal, liver, ovarian, head and neck, thyroid, blad selected from the group consisting of SEQID NOs: of 213, der, cervical, colon, endometrial, esophageal, prostate or 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, US 8,067,168 B2 9 10 226, 227, 228, 229, 230, 231, 232, 233,234, 235, 236, 237, template. However, “unmethylated DNA” or “methylated 238,239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, DNA can also refer to amplified DNA whose original tem 250, 251, 252,253, 254, 255, 256, 257, 258, 259, 260, 261, plate was methylated or methylated, respectively. 262, 263,264, and 265. A “methylation profile” refers to a set of data representing In some embodiments, the kits further comprise a detect the methylation states of one or more loci within a molecule ably labeled polynucleotide probe that specifically detects the of DNA from e.g., the genome of an individual or cells or amplified biomarker in a real time amplification reaction. tissues from an individual. The profile can indicate the methy In a further aspect, the invention provides kits for deter lation State of every base in an individual, can comprise mining the methylation status of at least one biomarker. In information regarding a Subset of the base pairs (e.g., the Some embodiments, the kits comprise: 10 methylation state of specific restriction enzyme recognition Sodium bisulfite and polynucleotides to quantify the pres sequence) in a genome, or can comprise information regard ence of the converted methylated and or the converted ing regional methylation density of each locus. unmethylated sequence of at least one cytosine from a “Methylation status' refers to the presence, absence and/or DNA region that is selected from the group consisting of quantity of methylation at a particular nucleotide, or nucle SEQID NOs: 213, 214, 215, 216, 217, 218, 219, 220, 15 otides within a portion of DNA. The methylation status of a 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, particular DNA sequence (e.g., a DNA biomarker or DNA 232, 233,234, 235, 236, 237,238,239, 240, 241, 242, region as described herein) can indicate the methylation state 243, 244, 245, 246, 247, 248, 249, 250, 251, 252,253, of every base in the sequence or can indicate the methylation 254, 255, 256, 257, 258, 259, 260, 261, 262, 263,264, state of a Subset of the base pairs (e.g., of cytosines or the and 265. methylation state of one or more specific restriction enzyme In a further aspect, the invention provides kits for deter recognition sequences) within the sequence, or can indicate mining the methylation status of at least one biomarker. In information regarding regional methylation density within Some embodiments, the kits comprise: the sequence without providing precise information of where Sodium bisulfite, primers and adapters for whole genome in the sequence the methylation occurs. The methylation sta amplification, and polynucleotides to quantify the pres 25 tus can optionally be represented or indicated by a “methyla ence of the converted methylated and or the converted tion value. A methylation value can be generated, for unmethylated sequence of at least one cytosine from a example, by quantifying the amount of intact DNA present DNA region that is selected from the group consisting of following restriction digestion with a methylation dependent SEQID NOs: 213, 214, 215, 216, 217, 218, 219, 220, restriction enzyme. In this example, ifa particular sequence in 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 30 the DNA is quantified using quantitative PCR, an amount of 232, 233,234, 235, 236, 237,238,239, 240, 241, 242, template DNA approximately equal to a mock treated control 243, 244, 245, 246, 247, 248, 249, 250, 251, 252,253, indicates the sequence is not highly methylated whereas an 254, 255, 256, 257, 258, 259, 260, 261, 262, 263,264, amount oftemplate Substantially less than occurs in the mock and 265. treated sample indicates the presence of methylated DNA at In another aspect, the methods provide kits for determining 35 the sequence. Accordingly, a value, i.e., a methylation value, the methylation status of at least one biomarker. In some for example from the above described example, represents the embodiments, the kits comprise: methylation status and can thus be used as a quantitative a methylation sensing restriction enzymes, primers and indicator of methylation status. This is of particular use when adapters for whole genome amplification, and poly it is desirable to compare the methylation status of a sequence nucleotides to quantify the number of copies of at least a 40 in a sample to a threshold value. portion of a DNA region where the DNA region is A “methylation-dependent restriction enzyme” refers to a selected from the group consisting of SEQID NOS: 213, restriction enzyme that cleaves or digests DNA at or in prox 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, imity to a methylated recognition sequence, but does not 225, 226, 227, 228, 229, 230, 231, 232, 233,234, 235, cleave DNA at or near the same sequence when the recogni 236, 237,238,239, 240, 241, 242, 243, 244, 245, 246, 45 tion sequence is not methylated. Methylation-dependent 247, 248, 249, 250, 251, 252,253, 254, 255, 256, 257, restriction enzymes include those that cut at a methylated 258, 259, 260,261,262, 263,264, and 265. recognition sequence (e.g., DpnI) and enzymes that cut at a In a further aspect, the invention provides kits for deter sequence near but not at the recognition sequence (e.g., mining the methylation status of at least one biomarker. In McrBC). For example, McrBC's recognition sequence is 5' Some embodiments, the kits comprise: 50 RmC (N40-3000) RmC 3' where “R” is a purine and “mc” is a methylation binding moiety and one or more polynucle a methylated cytosine and “N40-3000 indicates the distance otides to quantify the number of copies of at least a between the two RmChalf sites for which a restriction event portion of a DNA region where the DNA region is has been observed. McrBC generally cuts close to one half selected from the group consisting of SEQID NOS: 213, site or the other, but cleavage positions are typically distrib 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 55 uted over several base pairs, approximately 30 base pairs 225, 226, 227, 228, 229, 230, 231, 232, 233,234, 235, from the methylated base. McrBC sometimes cuts 3' of both 236, 237,238,239, 240, 241, 242, 243, 244, 245, 246, half sites, sometimes 5' of both half sites, and sometimes 247, 248, 249, 250, 251, 252,253, 254, 255, 256, 257, between the two sites. Exemplary methylation-dependent 258, 259, 260,261,262, 263,264, and 265. restriction enzymes include, e.g., McrBC (see, e.g., U.S. Pat. 60 No. 5,405,760), Mcra, MrrA, and DpnI. One of skill in the art DEFINITIONS will appreciate that any methylation-dependent restriction enzyme, including homologs and orthologs of the restriction “Methylation” refers to cytosine methylation at positions enzymes described herein, is also suitable for use in the C5 or N4 of cytosine, the N6 position of adenine or other present invention. types of nucleic acid methylation. In vitro amplified DNA is 65 A “methylation-sensitive restriction enzyme” refers to a unmethylated because in vitro DNA amplification methods restriction enzyme that cleaves DNA at or in proximity to an do not retain the methylation pattern of the amplification unmethylated recognition sequence but does not cleave at or US 8,067,168 B2 11 12 in proximity to the same sequence when the recognition and the particular in question. Such differ sequence is methylated. Exemplary methylation-sensitive ence can be the result of slight genetic variation between restriction enzymes are described in, e.g., McClelland et al. humans. Nucleic Acids Res. 22(17):3640-59 (1994) and http://re “Sensitivity” of a given biomarker refers to the percentage base.neb.com. Suitable methylation-sensitive restriction 5 of tumor samples that report a DNA methylation value above enzymes that do not cleave DNA at or near their recognition a threshold value that distinguishes between tumor and non sequence when a cytosine within the recognition sequence is tumor samples. The percentage is calculated as follows: methylated at position Cinclude, e.g., Aat II, Aci I, Acll, Age I, Alu I, Asc I, Ase I, AsiS I, Bbe I, BsaAI, BsaFI I, BsiB I, BsiW I, BSrF I, BSSH II, BSSKI, BstBI, BstNI, BstUI, Cla I, 10 (the number of tumor Eae I, Eag I, Fau I, Fse I, Hha I, HinP1 I, Hinc II, HpaII, Sensitivity= samples above the threshold) Hpy99 I, HpyCH4 IV. Kas I, Mbo I, Milu I, Map A1 I, Msp I, ensitivity ls total number of tumor samples tested) x 100 Nae I, Nar I, Not I, Pml I, Pst I, Pvu I, RSr II, Sac II, Sap I, Sau3AI, Sfl I, Sfo I, Sgr AI, Sma I, SnaB I, Tsc I, Xma I, and Zra I. Suitable methylation-sensitive restriction enzymes that 15 The equation may also be stated as follows: do not cleave DNA at or near their recognition sequence when an adenosine within the recognition sequence is methylated at (theth numberber of ttrue positiveiti samples)I x 100 position N' include, e.g., Mbo I. One of skill in the art will Sensitivity=ensitivity ---number of true positive - - samples) - +- appreciate that any methylation-sensitive restriction enzyme, (the number of false negative samples) including homologs and orthologs of the restriction enzymes described herein, is also suitable for use in the present inven tion. One of skill in the art will further appreciate that a where true positive is defined as a histology-confirmed tumor methylation-sensitive restriction enzyme that fails to cut in sample that reports a DNA methylation value above the the presence of methylation of a cytosine at or near its recog 25 threshold value (i.e. the range associated with disease), and nition sequence may be insensitive to the presence of methy false negative is defined as a histology-confirmed tumor lation of an adenosine at or near its recognition sequence. sample that reports a DNA methylation value below the Likewise, a methylation-sensitive restriction enzyme that threshold value (i.e. the range associated with no disease). fails to cut in the presence of methylation of an adenosine at The value of sensitivity, therefore, reflects the probability that or near its recognition sequence may be insensitive to the 30 a DNA methylation measurement for a given biomarker presence of methylation of a cytosine at or near its recognition obtained from a known diseased sample will be in the range of sequence. For example, Sau3AI is sensitive (i.e., fails to cut) disease-associated measurements. As defined here, the clini to the presence of a methylated cytosine at or near its recog cal relevance of the calculated sensitivity value represents an nition sequence, but is insensitive (i.e., cuts) to the presence of estimation of the probability that a given biomarker would a methylated adenosine at or near its recognition sequence. 35 detect the presence of a clinical condition when applied to a One of skill in the art will also appreciate that some methy patient with that condition. lation-sensitive restriction enzymes are blocked by methyla “Specificity' of a given biomarker refers to the percentage tion of bases on one or both strands of DNA encompassing of of non-tumor samples that report a DNA methylation value their recognition sequence, while other methylation-sensitive below a threshold value that distinguishes between tumor and restriction enzymes are blocked only by methylation on both 40 Strands, but can cut if a recognition site is hemi-methylated. non-tumor samples. The percentage is calculated as follows: A “threshold value that distinguishes between individuals with and without a particular disease refers to a value or range of values of a particular measurement that can be used (the number of non-tumor to distinguish between samples from individuals with the 45 Specificity= samples below the threshold) x 100 disease and samples without the disease. Ideally, there is a (the total number of non-tumor samples tested) threshold value or values that absolutely distinguishes between the two groups (i.e., values from the diseased group are always on one side (e.g., higher) of the threshold value and The equation may also be stated as follows: values from the healthy, non-diseased group are on the other 50 side (e.g., lower) of the threshold value). However, in many (the number of true negative samples) instances, threshold values do not absolutely distinguish Specificity= (the number of true negative samples) + x 100 between diseased and non-diseased samples (for example, when there is some overlap of values generated from diseased (the number of false positive samples) and non-diseased samples). 55 The phrase “corresponding to a nucleotide in a biomarker” where true negative is defined as a histology-confirmed non refers to a nucleotide in a DNA region that aligns with the tumor sample that reports a DNA methylation value below the same nucleotide (e.g., a cytosine) in a biomarker sequence. threshold value (i.e. the range associated with no disease), Generally, as described herein, biomarker sequences are Sub and false positive is defined as a histology-confirmed non sequences of the DNA regions. Sequence comparisons can be 60 tumor sample that reports DNA methylation value above the performed using any BLAST including BLAST 2.2 algo threshold value (i.e. the range associated with disease). The rithm with default parameters, described in Altschul et al., value of specificity, therefore, reflects the probability that a Nuc. Acids Res. 25:3389 3402 (1997) and Altschulet al., J. DNA methylation measurement for a given biomarker Mol. Biol. 215:403 410 (1990), respectively. Thus for obtained from a known non-diseased sample will be in the example, a DNA region or biomarker described herein can 65 range of non-disease associated measurements. As defined correspond to a DNA sequence in a human genome even if here, the clinical relevance of the calculated specificity value there is slight variation between the biomarker or DNA region represents an estimation of the probability that a given biom US 8,067,168 B2 13 14 arker would detect the absence of a clinical condition when 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-me applied to a patient without that condition. thylcytosine, N6-methyladenine, 7-methylguanine, 5-methy Software for performing BLAST analyses is publicly laminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, available through the National Center for Biotechnology beta-D mannosylqueosine, 5'-methoxycarboxymethyluracil, Information. This algorithm involves first identifying high 5 5-methoxyuracil, 2-methylthio-N6-isopentenyladenine, scoring sequence pairs (HSPs) by: identifying short words of uracil-5-oxyacetic acid (v), Wybutoxosine, pseudouracil, length W in the query sequence, which either match or satisfy queosine,2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, some positive-valued threshold score T when aligned with a 4-thiouracil, uracil-5- oxyacetic acidmethylester, 3-(3- word of the same length in a database sequence. T is referred amino-3-N-2-carboxypropyl) uracil, (acp3)w, 2,6- diami to as the neighborhood word score threshold (Altschulet al., 10 nopurine, and 5-propynyl pyrimidine. Other examples of supra). These initial neighborhood word hits act as seeds for modified, non-standard, or derivatized base moieties may be initiating searches to find longer HSPs containing them. The found in U.S. Pat. Nos. 6,001,611; 5,955,589; 5,844,106: word hits are extended in both directions along each sequence 5,789,562; 5,750,343; 5,728,525; and 5,679,785. for as far as the cumulative alignment score can be increased. Furthermore, a nucleic acid, polynucleotide or oligonucle Cumulative scores are calculated using, for nucleotide 15 otide can comprise one or more modified Sugar moieties sequences, the parameters M (reward score for a pair of including, but not limited to, arabinose, 2-fluoroarabinose, matching residues; always >0) and N (penalty score for mis Xylulose, and a hexose. matching residues; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. BRIEF DESCRIPTION OF THE DRAWINGS Extension of the word hits in each direction are halted when: the cumulative alignment score falls off by the quantity X FIG. 1 illustrates a general overview of the experimental from its maximum achieved value; the cumulative score goes design of differential methylation screening. A graphical rep to zero or below, due to the accumulation of one or more resentation of the transcription start site and 5' structure of one negative-scoring residue alignments; or the end of either predicted differentially methylated gene is indicated (A). The sequence is reached. The BLAST algorithm parameters W.T. 25 bar graph (B) indicates the relative local density of purine-CG and X determine the sensitivity and speed of the alignment. sequences within this region. The relative position of the The BLASTN program (for nucleotide sequences) uses as DNA microarray feature that reported differential DNA defaults a wordlength (W) of 11, an expectation (E) of 10, methylation at this locus is indicated by (C). PCR primers M=5, N=-4 and a comparison of both strands. Foramino acid were selected to amplify the region indicated by (D). The sequences, the BLASTP program uses as defaults a 30 vertical bars (E and F) represent the microarray DNA methy wordlength of 3, and expectation (E) of 10, and the BLO lation measurement representing all breast tumors (E) and all SUM62 scoring matrix (see Henikoff & Henikoff, Proc. Natl. normal breast samples (F). Acad. Sci. USA 89:10915 (1989)) alignments (B) of 50, FIG. 2 illustrates an example of McrBC-based real-time expectation (E) of 10, M-5, N=-4, and a comparison of both PCR strategy to monitor DNA methylation status. Panel A Strands. 35 shows the untreated/treated PCR replicate 1 for amplification As used herein, the terms “nucleic acid.” “polynucletide' of the GHSR locus in a breast tumor sample. The delta Ct and "oligonucleotide refer to nucleic acid regions, nucleic (Treated 1—Untreated 1) is 5.38 cycles. Panel B shows the acid segments, primers, probes, amplicons and oligomer frag untreated/treated PCR replicate 2 for amplification of the ments. The terms are not limited by length and are generic to same locus from the same tumor sample. The delta Ct linear polymers of polydeoxyribonucleotides (containing 40 (Treated 2 Untreated 2) is 5.40 cycles. Panel C shows the 2-deoxy-D-ribose), polyribonucleotides (containing D-ri untreated/treated PCR replicate 1 for amplification of the bose), and any other N-glycoside of a purine or pyrimidine GHSR locus in a normal breast sample. The delta Ct (Treated base, or modified purine or pyrimidine bases. These terms 1—Untreated 1) is 0.18 cycles. Panel D shows the untreated/ include double- and single-stranded DNA, as well as double treated PCR replicate 2 for amplification of the same locus and single-stranded RNA. 45 from the same normal sample. The delta Ct (Treated 2 Un A nucleic acid, polynucleotide or oligonucleotide can treated 2) is 0.03 cycles. Tumor samples produce a change in comprise, for example, phosphodiester linkages or modified cycle threshold (“delta Ct) of 1.0 or greater. Normal samples linkages including, but not limited to phosphotriester, phos produce a delta Ct of less than 1.0. phoramidate, siloxane, carbonate, carboxymethylester, FIGS. 3A-B illustrates verification of microarray DNA acetamidate, carbamate, thioether, bridged phosphoramidate, 50 methylation predictions. Open boxes represent loci that are bridged methylene phosphonate, phosphorothioate, meth unmethylated (average delta Ct-1.0), grey boxes represent ylphosphonate, phosphorodithioate, bridged phosphorothio loci that are methylated (average delta Ctd 1 and <2), and ate or Sulfone linkages, and combinations of such linkages. blackboxes represent loci that are densely methylated (aver A nucleic acid, polynucleotide or oligonucleotide can age delta Ctd2). comprise the five biologically occurring bases (adenine, gua 55 FIGS. 4A and 4B illustrate validation of DNA methylation nine, thymine, cytosine and uracil) and/or bases other than the differences in biomarkers from independent tumor and nor five biologically occurring bases. For example, a polynucle mal samples. Boxes areas indicated for FIG. 3. otide of the invention can contain one or more modified, FIG. 5 illustrates DNA methylation differences in biomar non-standard, or derivatized base moieties, including, but not kers from a larger panel of breast tumor, normal breast and limited to, N'-methyl-adenine, N-tert-butyl-benzyl-ad 60 normal peripheral blood samples. Boxes are as indicated for enine, imidazole, Substituted imidazoles, 5-fluorouracil, FIG. 3. 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, FIG. 6 illustrates demonstration of threshold adjustment Xanthine, 4-acetylcytosine, 5-(carboxyhydroxymethyl) for determining sensitivity and specificity. uracil, 5-carboxymethylaminomethyl-2-thiouridine, 5-car FIG. 7 illustrates bisulfite sequencing confirmation of dif boxymethylaminomethyluracil, dihydrouracil, beta-D-galac 65 ferential DNA methylation. tosylqueosine, inosine, N6-isopentenyladenine, FIG. 8 illustrates the correlation between qPCR based 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, DNA methylation measurements and DNA methylation US 8,067,168 B2 15 16 occupancies determined by bisulfite sequencing. Primers sample is indicated (-average dCt-1.0, +average dCt21.0 were designed to amplify approximately 150 bp amplicons but <2.0, and ++average dCt22.0). within the region of three loci that were analyzed by qPCR as FIG. 14 illustrates the comparison of DNA methylation described above. The loci included feature ID halp 39.189 detection infine needle aspirate (% POSITIVE FNA) samples (locus number 2), halg 00644 (locus number 3) and halp relative to unmatched primary breast tumor samples (% 104423 (locus number 12). For each amplicon, products were POSITIVE TUMOR). Each sample was scored as positive if amplified from three normal breast DNA samples that the average dCt was 21.0, as described in Example 3. Ana reported average dCt values <0.5, three normal breast DNA lyzed samples included 7 FNA samples and at least 14 pri samples that reported averagedCtvalues between 0.5 and 1.0, mary breast tumor samples. and three breast tumor DNA samples that reported average 10 dCt values greater than 1.0. Amplicons were purified and DETAILED DESCRIPTION OF THE INVENTION cloned using TA cloning kits (Invitrogen). At least 29 inde pendent clones were sequenced per amplicon, per locus. The I. Introduction graph shows the median 5-methylcytosine content for all The present invention is based, in part, on the discovery sequenced clones per amplicon plotted against the average 15 dCt value for that locus in the same DNA sample. The dashed that sequences in certain DNA regions are methylated in vertical line represents the dCt=1.0 threshold used to indicate cancer cells, but not normal cells. The inventors have found a positive qPCR measurement for DNA methylation detec that methylation within the DNA regions described herein are tion. associated with breast cancer, particularly ductal carcinoma, FIG. 9 illustrates an example of selection of a potentially as well as a number of other cancers. differentially methylated region based on an analysis of CpG In view of this discovery, the inventors have recognized density (identification of a CG island). that methods for detecting the biomarker sequences and DNA FIG. 10A illustrates the frequency of differential DNA regions comprising the biomarker sequences as well as methylation of 16 loci in stage Ibreast tumors relative to stage sequences adjacent to the biomarkers that contain a signifi II-III breast tumors. The 16 loci include those listed in Table 25 cant amount of CG subsequences, methylation of the DNA 5. regions, and/or expression of the genes regulated by the DNA FIG. 10B illustrates the DNA methylation density of three regions can be used to detect cancer cells. Detecting cancer selected loci relative to tumor stage. The averaged approxi cells allows for diagnostic tests that detect disease, assess the mate percent depletion of methylated molecules by McrBC risk of contracting disease, determining a predisposition to was calculated to determine the load of methylated molecules 30 disease, stage disease, diagnose disease, monitor disease, in each sample 1-(1/2 delta Ct (McrBC digested-Mock and/or aid in the selection of treatment for a person with treated))*100. Data are plotted (from left to right) for normal disease. breast samples, stage I tumors, stage IIA tumors, stage IIB II. Methylation Biomarkers tumors and stage III tumors. In some embodiments, the presence or absence or quantity FIG. 11A illustrates the differential DNA methylation of 35 of methylation of the chromosomal DNA within a DNA four selected loci in breast tumor, normal breast tissue and region or portion thereof (e.g., at least one cytosine) selected peripheral blood from a cancer-free woman. Each data point from SEQID Nos: 213-265 is detected. Portions of the DNA represents the averaged delta Clt value for an independent regions described herein will comprise at least one potential clinical sample. methylation site (i.e., a cytosine) and can in some embodi FIG. 11B illustrates ROC curve analyses of the four loci 40 ments generally comprise 2, 3, 4, 5, 10, or more potential depicted in FIG. 11 A. Sensitivity (percentage of tumor methylation sites. In some embodiments, the methylation samples scoring above a methylation threshold) and specific status of all cytosines within at least 20, 50, 100, 200, 500 or ity (percentage of non-tumor samples scoring below that more contiguous base pairs of the DNA region are deter same threshold) were calculated for all observed delta Ct mined. values. 45 In some embodiments, the methylation of more than one FIG. 12 illustrates the analysis of DNA methylation of four DNA region (orportion thereof) is detected. In some embodi selected loci by bisulfite sequencing. Analyzed loci included ments, the methylation status at least one cytosine in 2, 3, 4, locus number 2 (A, B), 3 (C, D), 4 (E. F) and 12 (G, H). 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, Bisulfite sequencing was performed. The average number of 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,33, 34,35, 36, 37,38, 39, molecules sequenced for each locus within each sample was 50 40, 41, 42, 43,44, 45, 46, 47, 48, 49, 50, 51, 52 or 53 of the 587. The calculated DNA methylation density (number of DNA regions is determined. methylated CpGs divided by the total number of Cpos In some embodiments of the invention, the methylation of sequenced) for each sample is plotted versus the qPCR DNA a DNA region or portion thereof is determined and then methylation measurement for the same sample (A, C, E, G). compared (e.g., normalized) to the methylation of a control In addition, the percent methylation occupancy at each ana 55 locus. Typically the control locus will have a known, rela lyzed CpG dinucleotide is shown (B, D, F. H). Analyzed tively constant, methylation status. For example, the control samples included normal breast tissues (open circles), adja sequence can be previously determined to have no. Some, or cent histology normal breast tissues (filled circles) and breast a high amount of methylation, thereby providing a relative tumors (filled squares). constant value to control for error in detection methods, etc., FIG. 13 illustrates the correlation between DNA hyperm 60 unrelated to the presence or absence of cancer. In some ethylation and gene expression. Transcription of GHSR (lo embodiments, the control locus is endogenous, i.e., is part of cus number 2), MGA (locus number 4), and NFX1 (locus the genome of the individual sampled. For example, in mam number 12) were analyzed by RT-PCR. Serial dilutions of malian cells, the testes-specific histone 2B gene (hTH2B in cDNA from normal breast tissue and four breast tumors were human) gene is known to be methylated in all somatic tissues used as template as indicated. GAPDH expression was ana 65 except testes. Alternatively, the control locus can be an exog lyzed as an internal control for each sample. The DNA methy enous locus, i.e., a DNA sequence spiked into the sample in a lation measurement (qPCR) for each locus in each tumor known quantity and having a known methylation status. US 8,067,168 B2 17 18 A DNA region comprises a nucleic acid including one or fication can be performed using primers that are gene specific. more methylation sites of interest (e.g., a cytosine, a Alternatively, adaptors can be added to the ends of the ran “microarray feature' as exemplified in FIG. 1C, or an ampli domly fragmented DNA, the DNA can be digested with a con amplified from select primers as exemplified in FIG. 1D) methylation-dependent or methylation-sensitive restriction and flanking nucleic acid sequences (i.e., “wingspan) of up enzyme, intact DNA can be amplified using primers that to 4 kilobases (kb) in either or both of the 3' or 5' direction hybridize to the adaptor sequences. In this case, a second step from the amplicon. This range corresponds to the lengths of can be performed to determine the presence, absence or quan DNA fragments obtained by randomly shearing the DNA tity of a particular gene in an amplified pool of DNA. In some before screening for differential methylation between DNA embodiments, the DNA is amplified using real-time, quanti in two or more samples (e.g., carrying out methods used to 10 tative PCR. initially identify differentially methylated sequences as In some embodiments, the methods comprise quantifying described in the Examples, below). In some embodiments, the average methylation density in a target sequence within a the wingspan of the one or more DNA regions is about 0.5 kb, population of genomic DNA. In some embodiments, the 0.75 kb, 1.0 kb, 1.5 kb, 2.0 kb, 2.5 kb, 3.0 kb, 3.5 kb or 4.0 kb. method comprises contacting genomic DNA with a methyla In some cases, the DNA region comprises more nucle 15 tion-dependent restriction enzyme or methylation-sensitive otides than simply the wingspan of the discovery method restriction enzyme under conditions that allow for at least because the relevant microarray feature or amplicon reside in Some copies of potential restriction enzyme cleavage sites in a larger region of higher CG density in the chromosome. This the locus to remain uncleaved; quantifying intact copies of the range corresponds to identified lengths of nucleic acid locus; and comparing the quantity of amplified product to a sequences having higher CG density (e.g., a “CG island') control value representing the quantity of methylation of con than flanking nucleic acid sequences (e.g., "local minimum trol DNA, thereby quantifying the average methylation den CG density) (see, for example, FIG. 8). DNA regions having sity in the locus compared to the methylation density of the extended sequences of heightened CG density include, for control DNA. example, sequences 213, 214, 215, 216, 217, 218, 219, 220, The quantity of methylation of a locus of DNA can be 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 25 determined by providing a sample of genomic DNA compris 233,234, 235, 236, 237,238,239, 240, 241, 242, 243, 244, ing the locus, cleaving the DNA with a restriction enzyme that 245, 246, 247, 248, 249, 250, 251, 252, 253,254, 255, 256, is either methylation-sensitive or methylation-dependent, and 257,258, 259,260,261,262,263,264, and 265 (see, Table 2 then quantifying the amount of intact DNA or quantifying the and section “SEQUENCE LISTING”). amount of cut DNA at the DNA locus of interest. The amount The methylation sites in a DNA region can reside in non 30 of intact or cut DNA will depend on the initial amount of coding transcriptional control sequences (e.g., promoters, genomic DNA containing the locus, the amount of methyla enhancers, etc.) or in coding sequences, including introns and tion in the locus, and the number (i.e., the fraction) of nucle exons of the designated genes listed in Tables 1 and 2 and in otides in the locus that are methylated in the genomic DNA. section “SEQUENCE LISTING.” In some embodiments, the The amount of methylation in a DNA locus can be determined methods comprise detecting the methylation status in the 35 by comparing the quantity of intact DNA or cut DNA to a promoter regions (e.g., comprising the nucleic acid sequence control value representing the quantity of intact DNA or cut that is about 1.0 kb, 1.5 kb, 2.0 kb, 2.5 kb, 3.0 kb, 3.5 kb or 4.0 DNA in a similarly-treated DNA sample. The control value kb 5’ from the transcriptional start site through to the tran can represent a known or predicted number of methylated Scriptional start site) of one or more of the genes identified in nucleotides. Alternatively, the control value can represent the Tables 1 and 2 and in the “SEQUENCE LISTING” section. 40 quantity of intact or cut DNA from the same locus in another The DNA regions of the invention also include naturally (e.g., normal, non-diseased) cell or a second locus. occurring variants, including for example, variants occurring By using at least one methylation-sensitive or methylation in different Subject populations and variants arising from dependent restriction enzyme under conditions that allow for single nucleotide polymorphisms (SNPs). Variants include at least Some copies of potential restriction enzyme cleavage nucleic acid sequences from the same DNA region (e.g., as set 45 sites in the locus to remain uncleaved and Subsequently quan forth in Tables 1 and 2 and in the “SEQUENCE LISTING” tifying the remaining intact copies and comparing the quan section) sharing at least 90%. 95%, 98%, 99% sequence tity to a control, average methylation density of a locus can be identity, i.e., having one or more deletions, additions, Substi determined. If the methylation-sensitive restriction enzyme is tutions, inverted sequences, etc., relative to the DNA regions contacted to copies of a DNA locus under conditions that described herein. 50 allow for at least some copies of potential restriction enzyme III. Methods For Determining Methylation cleavage sites in the locus to remain uncleaved, then the Any method for detecting DNA methylation can be used in remaining intact DNA will be directly proportional to the the methods of the present invention. methylation density, and thus may be compared to a control to In some embodiments, methods for detecting methylation determine the relative methylation density of the locus in the include randomly shearing or randomly fragmenting the 55 sample. Similarly, if a methylation-dependent restriction genomic DNA, cutting the DNA with a methylation-depen enzyme is contacted to copies of a DNA locus under condi dent or methylation-sensitive restriction enzyme and Subse tions that allow for at least some copies of potential restriction quently selectively identifying and/or analyzing the cut or enzyme cleavage sites in the locus to remain uncleaved, then uncut DNA. Selective identification can include, for example, the remaining intact DNA will be inversely proportional to separating cut and uncut DNA (e.g., by size) and quantifying 60 the methylation density, and thus may be compared to a a sequence of interest that was cut or, alternatively, that was control to determine the relative methylation density of the not cut. See, e.g., U.S. Pat. No. 7,186.512. Alternatively, the locus in the sample. Such assays are disclosed in, e.g., U.S. method can encompass amplifying intact DNA after restric patent application Ser. No. 10/971,986. tion enzyme digestion, thereby only amplifying DNA that Kits for the above methods can include, e.g., one or more of was not cleaved by the restriction enzyme in the area ampli 65 methylation-dependent restriction enzymes, methylation fied. See, e.g., U.S. patent application Ser. Nos. 10/971,986: sensitive restriction enzymes, amplification (e.g., PCR) 11/071,013; and 10/971,339. In some embodiments, ampli reagents, probes and/or primers. US 8,067,168 B2 19 20 Quantitative amplification methods (e.g., quantitative PCR tion buffer (for the Ms-SNuPE reaction); and detectably or quantitative linear amplification) can be used to quantify labeled nucleotides. Additionally, bisulfite conversion the amount of intact DNA within a locus flanked by amplifi reagents may include: DNA denaturation buffer; sulfonation cation primers following restriction digestion. Methods of buffer; DNA recovery regents or kit (e.g., precipitation, ultra quantitative amplification are disclosed in, e.g., U.S. Pat. Nos. filtration, affinity column); desulfonation buffer; and DNA 6,180,349; 6,033,854; and 5,972,602, as well as in, e.g., Gib recovery components. son et al., Genome Research 6:995-1001 (1996); DeGraves, In some embodiments, a methylation-specific PCR et al., Biotechniques 34(1):106-10, 112-5 (2003); Deiman B, (“MSP) reaction is used alone or in combination with other et al., Mol Biotechnol. 2002):163–79 (2002). Amplifications methods to detect DNA methylation. An MSP assay entails may be monitored in “real time.” 10 initial modification of DNA by sodium bisulfite, converting Additional methods for detecting DNA methylation can all unmethylated, but not methylated, cytosines to uracil, and involve genomic sequencing before and after treatment of the Subsequent amplification with primers specific for methy DNA with bisulfite. See, e.g., Frommer et al., Proc. Natl. lated versus unmethylated DNA. See, Herman et al., Proc. Acad. Sci. USA 89:1827-1831 (1992). When sodium bisulfite Natl. Acad. Sci. USA 93:9821-9826, (1996); U.S. Pat. No. is contacted to DNA, unmethylated cytosine is converted to 15 5,786,146. uracil, while methylated cytosine is not modified. Additional methylation detection methods include, but are In some embodiments, restriction enzyme digestion of not limited to, methylated CpG island amplification (see, PCR products amplified from bisulfite-converted DNA is Toyota et al., Cancer Res. 59:23.07-12 (1999)) and those used to detect DNA methylation. See, e.g., Sadri & Hornsby, described in, e.g., U.S. Patent Publication 2005/0069879; Nucl. Acids Res. 24:5058-5059 (1996); Xiong & Laird, Rein, et al. Nucleic Acids Res. 26 (10): 2255-64 (1998); Olek, Nucleic Acids Res. 25:2532-2534 (1997). et al. Nat Genet. 17(3): 275-6 (1997); and PCT Publication In some embodiments, a Methylight assay is used alone or No. WOOOf7OO90. in combination with other methods to detect DNA methyla It is well known that methylation of genomic DNA can tion (see, Eads et al., Cancer Res. 59:2302-2306 (1999)). affect expression (transcription and/or translation) of nearby Briefly, in the Methylight process genomic DNA is con 25 gene sequences. Therefore, in some embodiments, the meth verted in a sodium bisulfite reaction (the bisulfite process ods include the step of correlating the methylation status of at converts unmethylated cytosine residues to uracil). Amplifi least one cytosine in a DNA region with the expression of cation of a DNA sequence of interest is then performed using nearby coding sequences, as described in Tables 1 and 2 and PCR primers that hybridize to CpG dinucleotides. By using in the “SEQUENCE LISTING” section. For example, primers that hybridize only to sequences resulting from 30 expression of gene sequences within about 1.0 kb, 1.5 kb, 2.0 bisulfite conversion of unmethylated DNA, (or alternatively kb, 2.5 kb, 3.0 kb, 3.5 kb or 4.0 kb in either the 3' or 5' to methylated sequences that are not converted) amplification direction from the cytosine of interest in the DNA region can can indicate methylation status of sequences where the prim be detected. Methods for measuring transcription and/or ers hybridize. Similarly, the amplification product can be translation of a particular gene sequence are well known in detected with a probe that specifically binds to a sequence 35 the art. See, for example, Ausubel, Current Protocols in resulting from bisulfite treatment of a unmethylated (or Molecular Biology, 1987-2006, John Wiley & Sons; and methylated) DNA. If desired, both primers and probes can be Sambrook and Russell, Molecular Cloning: A Laboratory used to detect methylation status. Thus, kits for use with Manual, 3rd Edition, 2000, Cold Spring Harbor Laboratory Methylight can include sodium bisulfite as well as primers or Press. In some embodiments, the gene or expression detectably-labeled probes (including but not limited to Taq 40 of a gene in Tables 1 and 2 and in the “SEQUENCE LIST man or molecular beacon probes) that distinguish between ING” section is compared to a control, for example, the methylated and unmethylated DNA that have been treated methylation status in the DNA region and/or the expression of with bisulfite. Other kit components can include, e.g., a nearby gene sequence; and/or the same gene sequence from reagents necessary for amplification of DNA including but a sample from an individual known to be negative for cancer not limited to, PCR buffers, deoxynucleotides; and a thermo 45 or known to be positive for cancer, or to an expression level stable polymerase. that distinguishes between cancer and noncancer States. Such In some embodiments, a Ms-SNuPE (Methylation-sensi methods, like the methods of detecting methylation described tive Single Nucleotide Primer Extension) reaction is used herein, are useful in providing diagnosis, prognosis, etc., of alone or in combination with other methods to detect DNA breast cancer. methylation (see, Gonzalgo & Jones, Nucleic Acids Res. 50 In some embodiments, the methods further comprise the 25:2529-2531 (1997)). The Ms-SNuPE technique is a quan step of correlating the methylation status and expression of titative method for assessing methylation differences at spe one or more of the gene regions identified in Tables 1 and 2 cific CpG sites based on bisulfite treatment of DNA, followed and in the “SEQUENCE LISTING” section. by single-nucleotide primer extension (Gonzalgo & Jones, IV. Cancer Detection supra). Briefly, genomic DNA is reacted with sodium bisulfite 55 The present biomarkers and methods can be used in the to convert unmethylated cytosine to uracil while leaving detection, diagnosis, prognosis, classification, and treatment 5-methylcytosine unchanged. Amplification of the desired of a number of types of cancers. A cancer at any stage of target sequence is then performed using PCR primers specific progression can be detected, such as primary, metastatic, and for bisulfite-converted DNA, and the resulting product is recurrent cancers. Information regarding numerous types of isolated and used as a template for methylation analysis at the 60 cancer can be found, e.g., from the American Cancer Society CpG site(s) of interest. (available on the worldwide web at cancer.org), or from, e.g., Typical reagents (e.g., as might be found in a typical MS Harrison's Principles of internal Medicine, Kaspar, et al., SNuPE-based kit) for Ms-SNuPE analysis can include, but eds., 16th Edition, 2005, McGraw-Hill, Inc. Exemplary can are not limited to: PCR primers for specific gene (or methy cers that can be detected include, e.g., breast cancers, includ lation-altered DNA sequence or CpG island); optimized PCR 65 ing ductal carcinoma, as well as lung, renal, liver, ovarian, buffers and deoxynucleotides; gel extraction kit; positive head and neck, thyroid, bladder, cervical, colon, endometrial, control primers; Ms-SNuPE primers for a specific gene; reac esophageal, or prostate cancer or melanoma. US 8,067,168 B2 21 22 The present invention provides methods for determining as described herein, and the relative efficacy of one or another whether or not a mammal (e.g., a human) has cancer, i.e., anti-cancer agent. Such analyses can be performed, e.g., ret whether or not a biological sample taken from a mammal rospectively, i.e., by detecting methylation in one or more of contains cancerous cells, estimating the risk or likelihood of the diagnostic genes in Samples taken previously from mam a mammal developing cancer, classifying cancer types and 5 mals that have Subsequently undergone one or more types of stages, and monitoring the efficacy of anti-cancer treatment anti-cancer therapy, and correlating the known efficacy of the or selecting the appropriate anti-cancer treatment in a mam treatment with the presence, absence or levels of methylation mal with cancer. Such methods are based on the discovery of one or more of the diagnostic biomarkers. that cancer cells have a different methylation status than In making a diagnosis, prognosis, risk assessment or clas normal cells in the DNA regions described in the invention. 10 sification, in monitoring disease, or in determining the most Accordingly, by determining whether or not a cell contains beneficial course of treatment based on the presence or differentially methylated sequences in the DNA regions as absence of methylation in at least one of the diagnostic biom described herein, it is possible to determine whether or not the arkers, the quantity of methylation may be compared to a cell is cancerous. threshold value that distinguishes between one diagnosis, In numerous embodiments of the present invention, the 15 prognosis, risk assessment, classification, etc., and another. presence of methylated nucleotides in the diagnostic biomar For example, a threshold value can represent the degree of ker sequences of the invention is detected in a biological methylation foundata particular DNA region that adequately sample, thereby detecting the presence or absence of cancer distinguishes between breast cancer samples and normal ous cells in the mammal from which the biological sample breast samples with a desired level of sensitivity and speci was taken. In some embodiments, the biological sample com ficity. It is understood that a threshold value will likely vary prises a tissue sample from a tissue Suspected of containing depending on the assays used to measure methylation, but it is cancerous cells. For example, in an individual suspected of also understood that it is a relatively simple matter to deter having cancer, breast tissue, lymph tissue, lung tissue, brain mine a threshold value or range by measuring methylation of tissue, or blood can be evaluated. Alternatively, lung, renal, a DNA sequence in diseased and normal samples using the liver, ovarian, head and neck, thyroid, bladder, cervical, 25 particular desired assay and then determining a value that colon, endometrial, esophageal, prostate, or skin tissue can be distinguishes at least a majority of the cancer samples from a evaluated. The tissue or cells can be obtained by any method majority of non-cancer Samples. An example of this is shown known in the art including, e.g., by Surgery, biopsy, phle in FIG. 6 and the accompanying text in the examples. If botomy, Swab, nipple discharge, stool, etc. In other embodi methylation of two or more DNA regions is detected, two or ments, a tissue sample known to contain cancerous cells, e.g., 30 more different threshold values (one for each DNA region) from a tumor, will be analyzed for the presence or quantity of will often, but not always, be used. Comparisons between a methylation at one or more of the diagnostic biomarkers of quantity of methylation of a sequence in a sample and a the invention to determine information about the cancer, e.g., threshold value in any way known in the art. For example, a the efficacy of certain treatments, the Survival expectancy of manual comparison can be made or a computer can compare the individual, etc. In some embodiments, the methods will be 35 and analyze the values to detect disease, assess the risk of used in conjunction with additional diagnostic methods, e.g., contracting disease, determining a predisposition to disease, detection of other cancer biomarkers, etc. stage disease, diagnose disease, monitor, or aid in the selec The methods of the invention can be used to evaluate indi tion of treatment for a person with disease. viduals known or Suspected to have cancer or as a routine In some embodiments, threshold values provide at least a clinical test, i.e., in an individual not necessarily Suspected to 40 specified sensitivity and specificity for detection of a particu have cancer. lar cancer type. In some embodiments, the threshold value Further, the present methods may be used to assess the allows for at least a 50%, 60%, 70%, or 80% sensitivity and efficacy of a course of treatment. For example, the efficacy of specificity for detection of a specific cancer, e.g., breast, lung, an anti-cancer treatment can be assessed by monitoring DNA renal, liver, ovarian, head and neck, thyroid, bladder, cervical, methylation of the biomarker sequences described herein 45 colon, endometrial, esophageal, prostate cancer or mela over time in a mammal having cancer. For example, a reduc noma. More detail regarding specificity and sensitivity for tion or absence of methylation in any of the diagnostic biom various cancers can be found in, e.g., Tables 5-6, and 8-20. arkers of the invention in a biological sample taken from a In embodiments involving prognosis of cancer (including, mammal following a treatment, compared to a level in a for example, the prediction of progression of non-malignant sample taken from the mammal before, or earlier in, the 50 lesions to invasive carcinoma, prediction of metastasis, pre treatment, indicates efficacious treatment. diction of disease recurrance or prediction of a response to a The methods detecting cancer can comprise the detection particular treatment), in some embodiments, the threshold of one or more other cancer-associated polynucleotide or value is set such that there is at least 10, 20, 30, 40, 50, 60, 70, polypeptides sequences. Accordingly, detection of methyla 80% or more sensitivity and at least 70% specificity with tion of any one or more of the diagnostic biomarkers of the 55 regard to detecting cancer. invention can be used either alone, or in combination with In some embodiments, the methods comprise recording a other biomarkers, for the diagnosis or prognosis of cancer. diagnosis, prognosis, risk assessment or classification, based The methods of the present invention can be used to deter on the methylation status determined from an individual. Any mine the optimal course of treatment in a mammal with type of recordation is contemplated, including electronic cancer. For example, the presence of methylated DNA within 60 recordation, e.g., by a computer. any of the diagnostic biomarkers of the invention or an V. Kits increased quantity of methylation within any of the diagnos This invention also provides kits for the detection and/or tic biomarkers of the invention can indicate a reduced survival quantification of the diagnostic biomarkers of the invention, expectancy of a mammal with cancer, thereby indicating a or expression or methylation thereof using the methods more aggressive treatment for the mammal. In addition, a 65 described herein. correlation can be readily established between the presence, For kits for detection of methylation, the kits of the inven absence or quantity of methylation at a diagnostic biomarker, tion can comprise at least one polynucleotide that hybridizes US 8,067,168 B2 23 24 to at least one of the diagnostic biomarker sequences of the 246, 247, 248, 249, 250, 251, 252,253, 254, 255, 256, 257, invention and at least one reagent for detection of gene methy 258, 259, 260, 261, 262, 263,264, and 265. A methylation lation. Reagents for detection of methylation include, e.g., binding moiety refers to a molecule (e.g., a polypeptide) that sodium bisulfite, polynucleotides designed to hybridize to specifically binds to methyl-cytosine. Examples include sequence that is the product of a biomarker sequence of the restriction enzymes or fragments thereof that lack DNA cut invention if the biomarker sequence is not methylated (e.g., ting activity but retain the ability to bind methylated DNA, containing at least one C->U conversion), and/or a methyla antibodies that specifically bind to methylated DNA, etc.). tion-sensitive or methylation-dependent restriction enzyme. VI. Computer-Based Methods The kits can provide Solid Supports in the form of an assay The calculations for the methods described herein can apparatus that is adapted to use in the assay. The kits may 10 involve computer-based calculations and tools. For example, further comprise detectable labels, optionally linked to a a methylation value for a DNA region or portion thereof can polynucleotide, e.g., a probe, in the kit. Other materials useful be compared by a computer to a threshold value, as described in the performance of the assays can also be included in the herein. The tools are advantageously provided in the form of kits, including test tubes, transfer pipettes, and the like. The computer programs that are executable by a general purpose kits can also include written instructions for the use of one or 15 computer system (referred to herein as a “host computer) of more of these reagents in any of the assays described herein. conventional design. The host computer may be configured In some embodiments, the kits of the invention comprise with many different hardware components and can be made one or more (e.g., 1,2,3,4, or more) different polynucleotides in many dimensions and styles (e.g., desktop PC, laptop, capable of specifically amplifying at least a portion of a DNA tablet PC, handheld computer, server, workstation, main region where the DNA region is a sequence selected from the frame). Standard components, such as monitors, keyboards, group consisting of SEQ ID NOs: 213, 214, 215, 216, 217, disk drives, CD and/or DVD drives, and the like, may be 218, 219, 220, 221, 222, 223, 224,225, 226, 227, 228, 229, included. Where the host computer is attached to a network, 230, 231, 232, 233,234, 235, 236, 237, 238,239, 240, 241, the connections may be provided via any suitable transport 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252,253, media (e.g., wired, optical, and/or wireless media) and any 254, 255, 256, 257, 258, 259, 260, 261, 262, 263,264, and 25 suitable communication protocol (e.g., TCP/IP); the host 265. Optionally, one or more detectably-labeled polypeptide computer may include Suitable networking hardware (e.g., capable of hybridizing to the amplified portion can also be modem, Ethernet card, WiFi card). The host computer may included in the kit. In some embodiments, the kits comprise implement any of a variety of operating systems, including sufficient primers to amplify 2, 3, 4, 5, 6, 7, 8, 9, 10, or more UNIX, Linux, Microsoft Windows, MacOS, or any other different DNA regions or portions thereof, and optionally 30 operating system. include detectably-labeled polynucleotides capable of Computer code for implementing aspects of the present hybridizing to each amplified DNA region orportion thereof. invention may be written in a variety of languages, including The kits further can comprise a methylation-dependent or PERL, C, C++, Java, JavaScript, VBScript, AWK, or any methylation sensitive restriction enzyme and/or sodium other scripting or programming language that can be executed bisulfite. 35 on the host computer or that can be compiled to execute on the In some embodiments, the kits comprise sodium bisulfite, host computer. Code may also be written or distributed in low primers and adapters (e.g., oligonucleotides that can be level languages such as assembler languages or machinelan ligated or otherwise linked to genomic fragments) for whole guages. genome amplification, and polynucleotides (e.g., detectably The host computer system advantageously provides an labeled polynucleotoides) to quantify the presence of the 40 interface via which the user controls operation of the tools. In converted methylated and or the converted unmethylated the examples described herein, software tools are imple sequence of at least one cytosine from a DNA region that is mented as Scripts (e.g., using PERL), execution of which can selected from the group consisting of SEQID NOS:213,214, be initiated by a user from a standard command line interface 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, ofan operating system such as Linux or UNIX. Those skilled 227, 228, 229, 230, 231, 232, 233,234, 235, 236, 237, 238, 45 in the art will appreciate that commands can be adapted to the 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, operating system as appropriate. In other embodiments, a 251, 252,253,254, 255, 256, 257, 258, 259, 260, 261, 262, graphical user interface may be provided, allowing the user to 263,264, and 265. control operations using a pointing device. Thus, the present In some embodiments, the kits comprise a methylation invention is not limited to any particular user interface. sensing restriction enzymes (e.g., a methylation-dependent 50 Scripts or programs incorporating various features of the restriction enzyme and/or a methylation-sensitive restriction present invention may be encoded on various computer read enzyme), primers and adapters for whole genome amplifica able media for storage and/or transmission. Examples of suit tion, and polynucleotides to quantify the number of copies of able media include magnetic disk or tape, optical storage at least a portion of a DNA region where the DNA region is media such as compact disk (CD) or DVD (digital versatile selected from the group consisting of SEQID NOS: 213,214, 55 disk), flash memory, and carrier signals adapted for transmis 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, sion via wired, optical, and/or wireless networks conforming 227, 228, 229, 230, 231, 232, 233,234, 235, 236, 237, 238, to a variety of protocols, including the Internet. 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252,253,254, 255, 256, 257, 258, 259, 260, 261, 262, EXAMPLES 263,264, and 265. 60 In some embodiments, the kits comprise a methylation Example 1 binding moiety and one or more polynucleotides to quantify the number of copies of at least a portion of a DNA region Identification of Breast Cancer DNA Methylation where the DNA region is selected from the group consisting Biomarkers of SEQID NOs: 213,214, 215, 216,217, 218, 219, 220, 221, 65 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, Loci that are differentially methylated in breast tumors 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, relative to matched adjacent histologically normal breast tis US 8,067,168 B2 25 26 sue were identified using a DNA microarray-based technol cent normal samples) were analyzed in this way. Therefore, ogy platform (U.S. Pat. No. 7,186.512) that utilizes the the discovery experiment included a total of 80 microarray methylation-dependent restriction enzyme McrBC. In this hybridizations. discovery phase, 10 infiltrating ductal breast carcinomas (9 The microarray described in this Example consists of Stage II, I Stage III) and 10 matched adjacent histologically 85,176 features. Each 60 mer oligonucleotide feature is rep normal breast tissue samples were analyzed. Purified resented by four replicates per microarray slide, yielding a genomic DNA from each sample (60 ug) was randomly total of 21.294 unique feature probes. The features represent sheared to a range of 1 to 4 kb. The sheared DNA of each 19,595 randomly selected human transcriptional start sites sample was then split into four equal portions of 15ug each. (TSS) representing 79% of the identified human genes, 1395 Two portions were digested with McrBC under the following 10 GenBank BAC annotated CG islands (CGi), 161 features conditions: 15ug sheared genomic DNA, 1X NEB2 buffer spanning ~165 kb along the MTAPase/CDKN2A/B locus on (New England Biolabs), 0.1 mg/mL bovine serum albumin chromosome 9, 66 additional features dedicated to cancer (New England Biolabs), 2 mM GTP (Roche) and 120 units of gene promoters, and 77 features designed as copy number McrBC enzyme (New England Biolabs) in a total volume of 15 (HERV. LINEs. SINE) and other controls. Together, the TSS 600 uL at 37° C. for approximately 12 hours. These two features and CGi features scan more than 9000 UCSC anno portions represent a technical replicate of McrBC digestion tated human CG islands. (Treated 1 and Treated 2). The remaining two 15ug portions Following statistical analysis of these datasets, loci that were mock treated under identical conditions with the excep were predicted to be differentially methylated in at least 70% tion that 12 uL of sterile 50% glycerol were added instead of of tumors relative to normal tissues were identified. As McrBC enzyme. These two portions represent a technical described in the Examples below, differential DNA methyla tion of a collection of 53 loci identified by the microarray replicate of mock treatment (Untreated 1 and Untreated 2). discovery experiment described herein was verified within All reactions were treated with 5uL proteinase K (50 mg/mL) the discovery panel of 10 infiltrating ductal breast carcinomas for 1 hour at 50° C., and precipitated with EtOH under stan relative to 10 matched adjacent histologically normal breast dard conditions. Pellets were washed twice with 70% EtOH, 25 samples, as well as validated in larger panels of independent dried and resuspended in 30 ul H2O. Samples were then infiltrating ductal breast carcinomas, normal breast samples resolved on a 1% low melting point SeaPlaque GTG Agarose and normal female peripheral blood samples. Tables 1 and 2 gel (Cambridge Bio Sciences). Untreated 1 and Treated 1 and the “SEQUENCE LISTING” section list the unique portions were resolved side-by-side, as were Untreated 2 and microarray feature identifier (Feature name) for each of these Treated 2 portions. 1 kb DNA sizing ladder was resolved 30 53 loci. Locus number is an arbitrary locus identifier that will adjacent to each untreated/treated pair to guide accurate gel be used to identify the loci in the following examples. Of the slice excision. Gels were visualized with long-wave UV, and 53 features, 48 represent sequence within 1 kb of at least one gel slices including DNA within the modal size range of the annotated transcribed gene. These are referred to in the table untreated fraction (approximately 1-4 kb) were excised with by the Ensembl gene ID, as well as the official gene symbol a clean razorblade. DNA was extracted from gel slices using for each gene (Gene Name). The genomic region in which a gel extraction kits (Qiagen). given microarray feature can report DNA methylation status McrBC recognizes a pair of methylated cytosine residues is dependent upon the molecular size of the DNA fragments in the context 5'-Pu"C(Noco) Pu"C-3' (where Pu-A or G, that were labeled for the microarray hybridizations. As "C=5-methylcytosine, and N=any nucleotide), and cleaves 40 described above, DNA in the size range of 1 to 4 kb was within approximately 30 base-pairs from one of the methy purified by agarose gel extraction and used as template for lated cytosine residues. Therefore, locithat include high local cyanogen dye labeling. Therefore, the genomic region inter densities of Pu "C will be cleaved to a greater extent than loci rogated by each microarray feature is at least 1kb (i.e., 500 bp that include low local densities of Pu"C. Since Untreated and upstream and 500 bp downstream of the sequence represented Treated portions were resolved by agarose gel electrophore 45 by the microarray feature). Note that 5 features represent loci sis, and DNA within the modal size range of the Untreated in which there is no annotated transcribed gene within this 1 portions were excised and gel extracted, the Untreated por kb “wingspan’ (Locus numbers 3, 9, 20, 31, and 47). Also tions represent the entire fragmented genome of the sample note that 8 features represent loci in which more than one while the Treated portions are depleted of DNA fragments annotated transcribed gene falls within wingspan (Locus including Pu "C. Fractions were analyzed using a duplicated 50 numbers 21,26, 27, 29, 35,38,39, and 53). DNA methylation dye Swap microarray hybridization paradigm. For example, at these loci can affect the regulation of any of these neigh equal mass (200 ng) of Untreated 1 and Treated 1 fraction boring genes, and thus detection of gene expression from DNA were used as template for labeling with Cy3 and Cy5, neighboring genes is also useful for determining the presence repectively, and hybridized to a DNA microarray (described or absence of cancer for numerous types of diagnostic tests. below). Equal mass (200 ng) of the same Untreated 1 and 55 Treated 1 fraction DNA were used as template for labeling TABLE 1 with Cy5 and Cy3, respectively, and hybridized to a second Microarray Features Reporting Differential DNA Methylation And Identity DNA microarray (these two hybridizations represent a dye Of Annotated Genes Within 1kb Of Each Feature. swap of Untreated 1/Treated 1 fractions). Equal mass (200 ng) of Untreated 2 and Treated 2 fraction DNA were used as 60 Locus template for labeling with Cy3 and Cy5, respectively, and Featurename Number Ensembl Gene ID Gene Name hybridized to a third DNA microarray. Finally, equal mass ha1g O0681 1 ENSGOOOOO105997 HOXA3 ha1p 391.89 2 ENSGOOOOO121853 GHSR (200 ng) of Untreated 2 and Treated 2 fraction DNA were ha1g 0.0644 3 NA used as template for labeling with Cy5 and Cy3, respectively, halp 81674 4 ENSGOOOOO174197 MGA and hybridized to a fourth DNA microarray (the final two 65 halp 81149 5 ENSGOOOOO122971 ACADS hybridizations represent a technical replicate of the first dye halp 83841 6 ENSGOOOOO178187 ZNF454 swap). All 20 DNA samples (10 tumor samples and 10 adja US 8,067,168 B2 27 28 TABLE 1-continued TABLE 1-continued Microarray Features Reporting Differential DNA Methylation And Identity Microarray Features Reporting Differential DNA Methylation And Identity Of Annotated Genes Within 1kb Of Each Feature. Of Annotated Genes Within 1kb Of Each Feature.

Locus Locus Featurename Number Ensembl Gene ID Gene Name Featurename Number Ensembl Gene ID Gene Name ha1p. 38705 7 ENSGOOOOO163638 ADAMTS9 ha1p 12646 S3. ENSGOOOOO107833. NPM3 ha1p 40164 8 ENSGOOOOO1188SS MFSD1 ENSGOOOOO1984.08 MGEAS ha1p. 23178 9 NA ha1p 46.057 10 ENSGOOOOO132640 BTBD3 10 ha1p 40959 11 ENSGOOOOO111707 SUDS3 ha1p 104423 12 ENSGOOOOOOO8441 NFIX Example 2 ha1g O0847 13 ENSGOOOOO116106 EPHA4 ha1p 08347 14 ENSGOOOOO134802 SLC43 A3 ha1g 02416 1S ENSGOOOOO172238 ATOH1 Design of Independent DNA Methylation ha1p 87540 16 ENSGOOOOO159403 PREDICTED: Similar Verification and Validation Assays to Complement C1r 15 Subcomponent precursor PCR primers that interrogated the 53 loci predicted to be ha1p 1101.07 17 ENSGOOOOO1222S4 HS3ST2 differentially methylated between breast tumor and adjacent ha1p. 89799 18 ENSGOOOOO163739 CXCL1 ha1p 45.173 19 ENSGOOOOO120915 EPHX2 histologically normal breast tissue were designed. Due to the ha1p. 80771 20 NAA functional properties of the enzyme, DNA methylation-de ha1p 69407 21 ENSGOOOOO180667 YOD1 pendent depletion of DNA fragments by McrBC is capable of ENSGOOOOO198878 O9P1L8 HUMAN monitoring the DNA methylation status of sequences neigh ha1p O5406 22 ENSGOOOOO16SSS6 CDX2 boring the genomic sequences represented by the features on ha1p 80287 23 ENSGOOOOO109113 RAB34 ha1g 02345 24 ENSGOOOOO122592 HOXA7 the microarray described in Example 1 (wingspan). Since the ha1p 36,172 2S ENSGOOOOO185070 FLRT2 size of DNA fragments analyzed as described in Example 1 ha1p. 70459 26 ENSGOOOOO163481 RNF2S 25 was approximately 1-4 kb, we selected a 1kb region spanning ENSGOOOOO1634.82 STK36 the sequence represented by the microarray feature as an ha1p 105937 27 ENSGOOOOO161551 ZNF577 estimate of the predicted region of differential methylation. ENSGOOOOO198093 ZNF649 ha1p 89.099 28 ENSGOOOOOO62485 CS For each locus, PCR primers were selected within this ha1g 03099 29 ENSGOOOOOO66032 CTNNA2 approximately 1 kb region flanking the genomic sequence ENSGOOOOO162951 LRRTM1 30 represented on the DNA microarray (approximately 500 bp ha1p. 67625 30 ENSGOOOOO130711 PRDM12 upstream and 500 bp downstream). Selection of primer ha1g 00218 31 N.A sequences was guided by uniqueness of the primer sequence ha1p 12535 32 ENSGOOOOO149090 RAMP across the genome, as well as the distribution of purine-CG ha1p 105474 33 ENSGOOOOOO33627 ATP6V0A1 sequences within the 1 kb region. PCR primer pairs were ha1p. 74707 34 ENSGOOOOOO10278 CD9 selected to amplify an approximately 400-600 bp sequence ha1p 93325 3S ENSGOOOOO101019 C20orf24 35 ENSGOOOOO125965 GDFS within each 1 kb region. For demonstration, an example of ha1p 101161 36 ENSGOOOOO130176 CNN1 one such PCR amplicon design is shown in FIG.1. A graphi ha1p 101251 37 ENSGOOOOO14223S LMTK3 cal representation of the transcription start site and 5' struc ha1p 69214 38 ENSGOOOOO158927 C8orf58 ture of one predicted differentially methylated gene is indi ENSGOOOOO158941 KIAA1967 cated (A). The bar graph (B) indicates the relative local ENSGOOOOO183646 40 density of purine-CG sequences within this region. The rela ha1p. 88517 39 ENSGOOOOO101.412 E2F1 ENSGOOOOO125967 APBA2BP tive position of the DNA microarray feature that reported ha1p 103824 40 ENSGOOOOO167178 ISLR2 differential DNA methylation at this locus is indicated by (C). ha1p 108445 41 ENSGOOOOO175287 PHYHD1 PCR primers were selected to amplify the region indicated by ha1g 02210 42 ENSGOOOOO151615 POU4F2 (D). The vertical bars (E and F) represent the microarray DNA ha1p 103872 43 ENSGOOOOO129009 ISLR 45 methylation measurement representing all breast tumors (E) ha1p 56412 44 ENSGOOOOO17S182 C3orfa.0 and all normal breast samples (F). For example, this locus is ha1p 18292 45 ENSGOOOOO115561 WPS24 predicted to be hypermethylated in the breast tumors (positive ha1p 12075 46 ENSGOOOOO164619 BMPER value) relative to the adjacent normal breast samples (nega ha1p 22519 47 NA ha1p. 29531 48 ENSGOOOOOO60718 COL11A1 tive value). Suitable PCR cycling conditions for the 53 primer ha1p 58853 49 ENSGOOOOO113648 H2AFY 50 pairs were empirically determined, and amplification of a ha1p 35052 SO ENSGOOOOO111341 MGP specific PCR amplicon of the correct size was verified. The ha1p. 67002 S1 ENSGOOOOO15944S THEM4 sequences of the 53 microarray features, primer pairs and ha1p. 45580 S2 ENSGOOOOO168079 SCARAS amplicons are indicated in Table 2, and in the “SEQUENCE LISTING Section. TABLE 2 Sequence identification numbers for all sequences described in the application. See, section "SEQUENCE LISTING for actual sequences as listed by locus number in the table. Locus Feature Feature Left Right Annealing DNA Region Selection Number Name Seq. Primer Primer Temp. Amplicon Seq. Seq. Criteria

1 ha1g 0.0681 1 S4 107 66 C. 160 213 CG 2 ha1p 391.89 2 55 108 66 C. 161 214 CG 3 ha1g 0.0644 3 56 109 66 C. 162 215 CG 4 halp 81674 4 57 110 66 C. 163 216 CG 5 halp 81149 5 58 111 66 C. 164 217 CG 6 halp 83841 6 59 112 66 C. 16S 218 CG US 8,067,168 B2 29 30 TABLE 2-continued Sequence identification numbers for all sequences described in the application. See, section 'SEQUENCE LISTING for actual sequences as listed by locus number in the table. Locus Feature Feature Left Right Annealing DNA Region Selection Number Name Seq. Primer Primer Temp. Amplicon Seq. Seq. Criteria 7 ha1p 38705 7 60 13 66 C. 66 219 CG 8 ha1p 40164 8 61 14 66 C. 67 220 CG 9 ha1p. 231.78 9 62 15 66 C. 68 221 CG 10 ha1p 46057 10 63 16 66 C. 69 222 CG 11 ha1p 40959 11 64 17 66 C. 70 223 CG 12 ha1p 104423 12 65 18 66 C. 71 224 CG 13 ha1g O0847 13 66 19 66 C. 72 225 CG 14 ha1p 08347 14 67 2O 66 C. 73 226 CG 15 ha1g 02416 15 68 21 66 C. 74 227 CG 16 ha1p 87540 16 69 22 66 C. 75 228 CG 17 ha1p 1101.07 17 70 23 66 C. 76 229 CG 18 ha1p. 89799 18 71 24 66 C. 77 230 CG 19 ha1p 45.173 19 72 25 66 C. 78 231 CG 2O ha1p 80771 2O 73 26 66 C. 79 232 CG 21 ha1p 69407 21 74 27 66 C. 8O 233 CG 22 ha1p O5406 22 75 28 66 C. 81 234 CG 23 ha1p 80287 23 76 29 66 C. 82 235 CG 24 ha1g 02345 24 77 30 72 C. 83 236 CG 25 ha1p. 36172 25 78 31 72 C. 84 237 CG 26 ha1p. 70459 26 79 32 66 C. 85 238 CG 27 ha1p 105937 27 8O 33 66 C. 86 239 CG 28 ha1p 89.099 28 81 34 66 C. 87 240 CG 29 ha1g 03099 29 82 35 66 C. 88 241 CG 30 ha1p. 67625 30 83 36 72 C. 89 242 CG 31 ha1g 00218 31 84 37 66 C. 90 243 CG 32 ha1p 12535 32 85 38 66 C. 91 244 CG 33 ha1p 105474 33 86 39 66 C. 92 245 1 kb 34 ha1p. 74707 34 87 40 66 C. 93 246 CG 35 ha1p 93325 35 88 41 66 C. 94 247 1 kb 36 ha1p 101161 36 89 42 66 C. 95 248 1 kb 37 ha1p 101251 37 90 43 66 C. 96 249 1 kb 38 ha1p 69214 38 91 44 66 C. 97 250 1 kb 39 ha1p 88517 39 92 45 66 C. 98 251 CG 40 ha1p 103824 40 93 46 66 C. 99 252 CG 41 ha1p 108445 41 94 47 72 C. 200 253 1 kb 42 ha1g 02210 42 95 48 72 C. 2O1 254 CG 43 ha1p 103872 43 96 49 66 C. 2O2 255 CG 44 ha1p 56.412 44 97 50 72 C. 2O3 256 CG 45 ha1p 18292 45 98 51 72 C. 204 257 CG 46 ha1p 12075 46 99 52 66 C. 205 258 CG 47 ha1p 22519 47 100 53 66 C. 2O6 259 CG 48 ha1p. 29531 48 101 S4 66 C. 2O7 260 CG 49 ha1p 58853 49 102 55 66 C. 208 261 CG 50 ha1p 35052 50 103 56 66 C. 209 262 1 kb 51 ha1p 67002 51 104 57 72 C. 210 263 CG 52 ha1p. 45580 52 105 58 72 C. 211 264 CG 53 ha1p 12646 53 106 59 66 C. 212 26S CG

Example 3 except that 8 uI of sterile 50% glycerol was added instead of McrBC enzyme. Reactions were incubated at 37° C. for Verification of Microarray DNA Methylation so approximately 12 hours, followed by incubation at 60° C. for Predictions 20 minutes to inactivate McrBC. The extent of McrBC cleavage at each locus was monitored Initially, the DNA methylation state of these 53 loci was by quantitative real-time PCR (qPCR). For each assayed independently assayed in the 10 infiltrating ductal breast car locus, qPCR was performed using 20 ng of the Untreated cinoma samples and the 10 matched adjacent histologically 55 Portion DNA as template and, separately, using 20 ng of the normal samples described above (i.e., the discovery tissue Treated Portion DNA as template. Each reaction was per panel used for microarray experiments). DNA methylation formed in 10 uL total volume including 1X LightCycler 480 was assayed by a quantitative PCR approach utilizing diges SYBR Green I Master mix (Roche) and 625 nM of each tion by the MicrBC restriction enzyme to monitor DNA primer. Reactions were run in a Roche LightCycler 480 methylation status. Genomic DNA purified from each sample 60 instrument. Optimal annealing temperatures varied depend was split into two equal portions of 9.6 g. One 9.6 ugportion ing on the primer pair. Primer sequences (Left Primer; Right (Treated Portion) was digested with McrBC in a total volume Primer) and appropriate annealing temperatures (Annealing of 120 uL including 1XNEB2 buffer (New England Biolabs), Temp.) are shown in Table 2. Cycling conditions were: 95°C. 0.1 mg/mL bovine serum albumin (New England Biolabs), 2 for 5 min.: 45 cycles of 95° C. for 1 min. annealing tem mM GTP (Roche) and 80 units of McrBC enzyme (New 65 perature, see Table 2 for 30 sec. 72° C. for 1 min.., 83°C. for England Biolabs). The second 9.6 ug portion (Untreated Por 2 sec. followed by a plate read. Melting curves were calcu tion) was treated exactly the same as the Treated Portion, lated under the following conditions: 95°C. for 5 sec. 65° C. US 8,067,168 B2 31 32 for 1 min., 65° C. to 95° C. at 2.5° C./sec. ramp rate with was digested with McrBC (Treated Portion) in a total volume continuous plate reads. Each Untreated/Treated qPCR reac of 200 uL including 1XNEB2 buffer (New England Biolabs), tion pair was performed in duplicate. The difference in the 0.1 mg/mL bovine serum albumin (New England Biolabs), 2 cycle number at which amplification crossed threshold (delta mM GTP (Roche) and 32 units McrBC (New England Ct) was calculated for each Untreated/Treated qPCR reaction Biolabs). The second portion was mock treated under identi pair by subtracting the Ct of the Untreated Portion from the Ct cal conditions, except that 3.2 uL sterile 50% glycerol was of the Treated Portion. Because McrBC-mediated cleavage added instead of McrBC enzyme (Untreated Portion). between the two primers increases the Ct of the Treated Samples were incubated at 37° C. for approximately 12 Portion, increasing delta Clt values reflect increasing mea hours, followed by incubation at 60° C. to inactivate the surements of local DNA methylation densities. The average 10 McrBC enzyme. qPCR reactions and data analysis were per delta Ct between the two replicate Untreated/Treated qPCR formed as described in Example 3. reactions was calculated, as well as the standard deviation The DNA methylation state measurements are summa between the two delta Ct values. rized in FIG. 4. As described above, each locus was scored as For demonstration purposes, amplification profiles for one unmethylated (average delta Ct <1.0, open boxes), methy locus (GHSR) in a tumor sample and a normal sample are 15 lated (average delta Ct-1.0 and <2.0, grey boxes) or densely shown in FIG. 2. Panel A shows the untreated/treated PCR methylated (average delta Ct >2.0, black boxes). Measure replicate 1 for amplification of the GHSRamplicon in a breast ments with a standard deviation between pPCR replicates >1 tumor sample. The delta Ct (Treated 1—Untreated 1) is 5.38 cycle were not included in the analysis (ND). Table 3 indi cycles. Panel B shows the untreated/treated PCR replicate 2 cates the percent sensitivity and specificity for each locus. for amplification of the same amplicon from the same tumor Sensitivity reflects the frequency of scoring a known tumor sample. The delta Ct (Treated 2 Untreated 2) is 5.40 cycles. sample as positive for DNA methylation at each locus. Speci The average delta Ct of the two replicates is 5.39 cycles, ficity reflects the frequency of Scoring a known normal representing a -97% reduction of amplifiable copies in the sample as negative for DNA methylation at each locus. As treated relative to the untreated portions 100%-((1/2 delta described above, an average delta Ct-1.0 (Treated Portion— Ct)x100). The standard deviation of the delta Ct’s between 25 Untreated Portion) was used as a threshold to score a sample the two qPCR replicates is 0.01 cycles. Panel C shows the as positive for DNA methylation at each locus (representing untreated/treated PCR replicate 1 for amplification of the >~50% depletion of amplifiable molecules in the DNA GHSR amplicon in a normal breast sample. The delta Ct methylation-dependent restricted population relative to the (Treated 1—Untreated 1) is 0.18 cycles. Panel D shows the untreated population). Percent sensitivity was calculated as untreated/treated PCR replicate 2 for amplification of the 30 the number of tumor samples with an average delta Ct >1.0 same amplicon from the same normal sample. The delta Ct divided by the total number of tumor samples analyzed for (Treated 2 Untreated 2) is 0.03 cycles. The average delta Ct that locus (i.e. excluding any measurements with a standard of the two replicates is 0.11 cycles, representing a ~7% reduc deviation between qPCR replicates >1 cycle)x100. Percent tion of amplifiable copies in the treated relative to the specificity was calculated as (1-(the number of normal untreated portions. The standard deviation of the delta Ct's 35 samples with an average delta Ct >1.0 divided by the total between the two qPCR replicates is 0.11 cycles. The average number of normal samples analyzed for that locus))x100. As delta Ct of the tumor sample would be scored as a methylated shown in Table 3, the 53 loci have sensitivities >13% and locus. In contrast, the average delta Clt of the normal sample specificities >80%. Notably, 33 of the 53 loci have 100% would be scored as a relatively unmethylated locus. An aver specificity. It is important to point out that the sensitivity and age delta Ct of >1.0 cycle, representing >~50% reduction of 40 specificity of the differential DNA methylation status of any amplifiable copies in the treated relative to the untreated given locus may be increased by further optimization of the portions, was set as the threshold for scoring a sample as precise local genetic region interrogated by a DNA methyla positive for DNA methylation. Any average delta Ct measure tion-sensing assay. ment with a standard deviation >1.0 cycle in qPCR replicates was excluded as an unreliable measurement (ND in FIG. 3). 45 TABLE 3 Finally, any average delta Ct <0 was adjusted to 0. FIG.3 shows the results of the DNA methylation measure Sensitivity and specificity of differentially methylated loci ments for the 53 loci in the 10 tumor samples and 10 normal in a panel of 25 normal breast and 16 breast tumor samples. samples used in the microarray discovery experiment. Open FEATUREID LOCUS NUMBER SENSITIVITY SPECIFICITY boxes represent loci that are unmethylated (average delta Ct 50 ha1g O0681 1 86% OO% <1.0), grey boxes represent loci that are methylated (average ha1p 391.89 2 81% OO% delta Ct >1 and <2), and black boxes represent loci that are ha1g O0644 3 79% OO% densely methylated (average delta Ct >2). ha1p. 81674 4 69% OO% ha1p 81149 5 69% OO% Example 4 55 ha1p 83841 6 69% OO% ha1p. 38705 7 63% OO% ha1p 40164 8 63% OO% Validation of DNA Methylation Changes in ha1p. 23178 9 63% OO% Independent Breast Tumor and Normal Breast ha1p 46057 10 S6% OO% Samples ha1p 40959 11 S6% OO% ha1p 104423 12 SO% OO% 60 ha1g 0.0847 13 SO% OO% The differential DNA methylation status of the 53 loci was ha1p 08347 14 SO% OO% further validated by analyzing an independent panel of 16 ha1g 02416 15 44% OO% infiltrating ductal breast carcinoma samples (1 Stage I, 4 ha1p 87540 16 44% OO% ha1p 1101.07 17 38% OO% Stage II, 11 Stage III) and 25 normal breast tissue samples. ha1p. 89799 18 31% OO% The normal breast tissues included in this panel were 65 ha1p 45.173 19 31% OO% obtained from biopsies unrelated to breast cancer. Each ha1p. 80771 2O 25% OO% sample was split into two equal portions of 4 g. One portion US 8,067,168 B2 33 34 TABLE 3-continued TABLE 4-continued Sensitivity and specificity of differentially methylated loci in a panel of 25 normal breast and 16 breast tumor samples. Sensitivity and specificity of differentially methylated loci in a panel of 25 normal breast and 25 breast tumor samples and 25 normal blood FEATUREID LOCUS NUMBER SENSITIVITY SPECIFICITY 5 samples. ha1p 69407 21 25% 100% ha1p O5406 22 25% 100% ha1p 80287 23 25% 100% SPECIFICITY ha1g 02345 24 20% 100% LOCUS SENSI- VSNORMAL SPECIFICITY ha1p 36,172 25 20% 100% 10 ha1p. 70459 26 1996 100% FEATUREID NUMBER TIVITY BREAST WS BLOOD ha1p 105937 27 1996 100% ha1p 89.099 28 1996 100% ha1g 02416 15 54% 100% 100% ha1g 03099 29 14% 100% ha1p. 67625 30 13% 100% ha1p 110107 17 52% 100% 100% ha1g 00218 31 13% 100% 15 ha1p 89799 18 40% 100% 100% ha1p 12535 32 75% 96% ha1p 105474 33 75% 96% ha1p 80771 2O 36% 100% 100% ha1p. 74707 34 69% 96% ha1p 69407 21 26% 92% 96% ha1p 93325 35 69% 96% ha1p 101161 36 S6% 96% halp O5406 22 32% 100% 100% ha1p 101251 37 87% 96% ha1g 02345 24 1796 100% 100% ha1p 69214 38 81% 96% halp 36,172 25 38% 100% 100% ha1p. 88517 39 69% 96% ha1p 103824 40 1796 96% ha1p 70459 26 24% 100% 96% ha1p 108445 41 60% 95% ha1p 67625 30 35% 100% 100% ha1g 02210 42 18% 95% ha1p 103872 43 94% 92% ha1p 56412 44 79% 92% 25 ha1p 18292 45 60% 91% ha1p 12075 46 31% 90% ha1p 22519 47 88% 88% Example 6 ha1p. 29531 48 88% 88% ha1p 58853 49 50% 88% ha1p 35052 50 31% 88% 30 Demonstration of a DNA Methylation Measurement ha1p. 67002 51 64% 83% Threshold ha1p. 45580 52 40% 83% ha1p 12646 53 81% 80% In the examples above, a threshold for scoring differential 35 methylation (average delta Ctd 1.0) was established and indis criminately applied to all loci. However, the most informative Example 5 threshold is dependent upon the specific locus in question. Further Validation of Selected DNA Methylation This is demonstrated in FIG. 6. The graph shows the average Biomarkers in a Larger Panel of Breast Tumor delta Ct (Treated Portion. Untreated Portion) for the ana Samples, Normal Breast Samples, and Normal 40 lyzed region of the GHSR locus in 25 tumor samples, 25-nor Female Peripheral Blood Samples mal breast samples, and 24 normal female peripheral blood samples. Using an average delta Ct threshold of >1.0 as the A panel of 15 loci were selected for further validation in a criteria for a positive DNA methylation measurement, sensi panel of 9 additional 5 infiltrating ductal breast carcinoma samples, bringing the total number of tumor samples ana 45 tivity is 84%, specificity relative to normal tissue is 100% and lyzed to 25 (I Stage II, Stage III). In addition, 25 normal specificity relative to blood is 96% (Table 4). However, an female peripheral blood samples were analyzed. Samples optimal threshold may be set for each individual locus, and were treated and analyzed as described in Example 4. FIG. 5 this threshold is dependent upon the technology used to detect shows the results of these analyses, including the 25 normal 50 the differential DNA methylation state. For example, in the breast samples described in Example 4. As shown in Table 4. GSHR example shown in FIG. 6, a threshold of 1.3 (hatched these loci display >17% sensitivity, >92% specificity relative line in figure) would adjust the specificity relative to blood to to normal breast tissue, and >92% specificity relative to nor mal female peripheral blood. 100%. TABLE 4 55 Example 7 Sensitivity and specificity of differentially methylated loci in a panel of 25 normal breast and 25 breast tumor samples and 25 normal blood Samples. Validation of Selected DNA Methylation Biomarkers in a Panel Including Approximately 100 Breast SPECIFICITY 60 Tumor Samples And 100 Normal Breast Samples LOCUS SENSI- VSNORMAL SPECIFICITY FEATUREID NUMBER TIVITY BREAST WS BLOOD ha1p 39189 2 84% 100% 96% A panel of 16 biomarker loci was further validated in ha1g O0644 3 83% 100% 100% additional breast tumor and normal breast samples. In total, halp 81674 4 76% 100% 95% 65 approximately 100 samples were analyzed for each group. ha1p 74707 34 729, 96% 100% The total number of samples analyzed for each biomarker and ha1p 101251 37 88% 96% 92% for each sample category is reported in Table 5. US 8,067,168 B2 35 36 TABLE 5 Sensitivity and specificity of differentially methylated loci in a panel of approximately 100 breast tumor samples, 100 normal breast samples and 25 normal blood samples. No. No. Positive Total Total Locus Positive Total Normal Normal % Specificity No. Positive Normal % Specificity Feature ID Number Tumor Tumor % Sensitivity Breast Breast (Normal Breast) Normal Blood Blood (Normal Blood) ha1p 39189 2 87 O2 85% 1 O3 99% 1 24 96% ha1g O0644 3 74 O1 7396 1 O4 99% O 25 OO% ha1p. 81674 4 71 99 729% 10 93 89% 1 19 95% ha1p 101251 37 67 O1 66% 4 O2 96% 2 24 92% ha1p. 74707 34 55 O2 S4% 1 O4 99% O 25 OO% ha1g 02416 15 41 O1 41% 1 O4 99% O 25 OO% ha1p. 67625 30 36 91 40% 1 99 99% O 21 OO% ha1p 69407 21 37 OO 37% 6 O3 94% 1 23 96% ha1p 104423 12 36 98 37% O 97 100% 1 24 96% ha1p 36,172 25 34 O2 33% O O4 100% O 23 OO% ha1p. 89799 18 27 O1 27% O 99 100% O 24 OO% ha1p. 80771 2O 27 O3 26% O O4 100% O 25 OO% ha1p. 70459 26 26 O1 26% O O3 100% 1 24 96% ha1p 1101.07 17 24 O1 24% O O4 100% O 24 OO% ha1p O5406 22 23 O3 22% O O3 100% O 24 OO% ha1g 02345 24 17 O2 1796 O O3 100% O 22 OO% No. Positive Tumor: Number of tumor samples that reported avg. dCt 21.0 (methylated locus). Total Tumor: Number of tumor samples tested. % Sensitivity; (No. Positive Tumor Total Tumor) x 100 No. Positive Normal Breast: Number of normal breast samples that reported avg. dCt 2 1.0 (methylated locus). Total Normal Breast: Number of normal breast samples tested. % Specificity (Nomral Breast): (1 - (No. Positive Nomral Breast Total Normal Breast)) x 100 No. Posivite Normal Blood: Number of normal blood samples that reported avg. dCt 21.0 (methylated locus). Total Normal Blood: Number of normal blood samples tested. % Specificity (Normal Blood); (1 - (No. Positive Normal Blood. Total Normal Blood)) x 100

30 Example 8 described above. The loci included feature ID halp 39.189 (locus number 2), halg 00644 (locus number 3) and halp Bisulfite Sequencing Confirmation of Differential 104423 (locus number 12). Primer sequences lacked CpG DNA Methylation dinucleotides, and therefore amplify bisulfite converted DNA 35 independently of DNA methylation status. For each ampli An example of confirmation of differential DNA methyla con, products were amplified from three normal breast DNA tion by bisulfite sequencing is shown in FIG. 7. Primers were samples that reported average dCt values <0.5, three normal designed to amplify a 130 bp amplicon within the 412 bp breast DNA samples that reported average dCt values region of Nuclear Factor 1 X-type analyzed by qPCR (as between 0.5 and 1.0, and three breast tumor DNA samples discussed in the Examples above) from bisulfite converted 40 that reported average dCt values greater than 1.0. Amplicons genomic DNA. Primers sequences lack CpG dinucleotides, were purified and cloned using TA cloning kits (Invitrogen). and therefore amplify bisulfite converted DNA independently At least 29 independent clones were sequenced per amplicon, of DNA methylation status. Products were amplified from per locus. FIG. 8 shows the median 5-methylcytosine content one tumor sample (positive for DNA methylation) and from for all sequenced clones per amplicon plotted against the one pooled normal female peripheral blood sample. Ampli 45 average dCt value for that locus in the same DNA sample. The cons were purified and cloned using TA cloning kits (Invit dashed vertical line represents the dCt=1.0 threshold used to rogen). Eighteen (18) independent clones were sequenced for indicate a positive qPCR measurement for DNA methylation the tumor sample. Seven (7) independent clones were detection. These data verify the differential DNA methylation sequenced for the blood sample. Bisulfite treatment results in content in tumors relative to normal breast samples. Further conversion of unmethylated cytosines to uracil, but does not 50 more, the linear relationship between the qPCR measurement convert methylated cytosines. The percent methylation of and the 5-methylcytosine content determined by bisulfite each CpG dinucleotide within the region was calculated as the sequencing (R-0.7965) provides justification for the high number of sequence reads of C at each CpG divided by the throughput qPCR method for DNA methylation detection. total number of sequence reads. FIG. 7A shows the '% methy lation occupancy for each of the 18 CpG dinucleotides in the 55 Example 9 tumor sample. FIG. 7B shows the 96 methylation occupancy for each of the 18 CpG dinucleotides in the normal blood Selection of Sequence Identified As Potential Region sample. All 18 CpG dinucleotides are methylated in the tumor of Differential DNA Methylation (occupany ranging from 11% to 89%). However, only one CpG dinucleotide displayed methylation in the normal blood 60 As described in the examples above, the loci identified as sample (14%). differentially methylated were originally discovered based on To provide further confirmation of DNA methylation dif DNA methylation-dependent microarray analyses. The ferences and to justify the qPCR based strategy for high sequences of the 53 microarray features reporting this differ throughput detection of DNA methylation, three loci were ential methylation are indicated in Table 2 and in the analyzed by bisulfite genomic sequencing. Primers were 65 “SEQUENCE LISTING” section. Because the “wingspan” designed to amplify approximately 150 bp amplicons within of genomic interrogation by each feature is conservatively 1 the region of three loci that were analyzed by qPCR as kb, PCR primers that amplify an amplicon within a 1 kb US 8,067,168 B2 37 38 region Surrounding the sequence represented by each methylation events are just as likely to be present in a Stage I microarray feature were selected and used for independent tumor as they are in later stage tumors. The proportion of Verification and validation experiments. Primer sequences methylated loci in tumors at each stage was then analyzed for and amplicon sequences are indicated in Table 2 and in the three selected loci. The percent depletion by MicrBC for each “SEQUENCE LISTING” section. To optimize successful sample in which a given locus scored as methylated was PCR amplification, these amplicons were designed to be less calculated 1-(1/2 delta Ct (McrBC digested-Mock treated)) than the entire 1 kb region represented by the wingspan of the * 100 to provide a measure of the load of methylated mol microarray feature. However, it should be noted that differ ecules within the sample. The mean percent depletion at each ential methylation may be detectable anywhere within this tumor stage is shown in FIG. 10B. While there is a trend for sequence window. For each locus, the sequence representing 10 increased methylation density at these loci with increasing at least this 1 kb region flanking the sequence represented by tumor stage, methylation density of Stage I tumors is not the microarray feature was selected as the claimed potentially significantly different than Stage II-III tumors, yet is dramati differentially methylated genomic region. These sequences cally different than the average of all normal samples. There are indicated in Table 2 (DNA Region Sequences) and in the fore, differential methylation of these loci is independent of “SEQUENCE LISTING” section. Sequences claimed based 15 tumor stage in regards to both the frequency and the density of on the 1 Kb region flanking the sequence represented by the hypermethylation. microarray feature are indicated by “1” kb in Table 2 (Selec tion Criteria). Example 11 In addition, the local CpG density Surrounding each region was calculated. Receiver-Operator Curve Analysis of Biomarker Approximately 10 kb of sequence both upstream and Sensitivity and Specificity downstream of each feature was 30 extracted from the human genome. For each 20 kb region of the genome, a sliding Receiver-operator characteristic (ROC) analyses were per window of 500 bp moving in 100 bp steps was used to calcu formed for each of the 16 loci described in Table 5 to deter late the CG density. CG density was expressed as the ratio of 25 mine optimal thresholds for calculation of sensitivity and CG dinucleotides per kb. An example is shown in FIG.9 and illustrates the position of the transcription start site of the specificity of the differential DNA methylation event. GHSR gene relative to the regional CpG density of the sur Examples of the primary qPCR data for four selected loci are rounding sequence. In this example, methylation anywhere shown in FIG. 11A. These plots demonstrate the overall dis with the ~4 kb peak of CpG density associated with the crimination between tumor, normal breast tissue and normal promoter region of the gene is monitored and is useful in a 30 peripheral blood samples. The frequency at which tumor clinical diagnostic assay. Loci in which the claimed region tissues were scored as differentially methylated at these loci was determined by analysis of local CpG density are indi was not significantly associated with either age of the cancer cated by “CG” in Table 2 (Selection Criteria). As diagrammed patient or estrogen receptor status of the patient’s primary in FIG. 9, the claimed sequences were selected based on tumor. ROC curves for the corresponding four datasets are setting the local minimum of CpG density flanking the 35 shown in FIG. 11B. Optimal thresholds were identified as the sequence represented by the PCR amplicon as the upstream maximum sum of sensitivity and specificity calculated at each and downstream boundaries. observed deltaCt value. The minimum allowed threshold was set at 0.5 so that calculations could not be based on thresholds Example 10 within the variability range of the qPCR platform. Results are 40 summarized in Table 6. Sensitivity and specificity calcula Demonstration that Differential DNA Methylation is tions based on optimal thresholds are similar to those calcu Detectable in Early Stage Disease lated using a standard delta Ct threshold of 1.0. As hypoth esized, the direct global profiling of DNA methylation Although fewer Stage I tumors compared to Stage II or III identified numerous novel DNA methylation-based biomar tumors were analyzed (8 of 103 samples), the inclusion of a kers that display Substantially improved sensitivity and speci small number of Stage I tumors allowed a determination of 45 ficity relative to the vast majority of previously identified whether the differential methylation events are related to differentially methylated genes in breast cancer. In fact, a tumor stage. FIG. 10A shows a plot of the frequency of single differentially methylated biomarker, located in the pro hypermethylation of the 16 loci in the 8 Stage I tumors (i.e. moter region of GHSR, was capable of distinguishing IDC the percentage of Stage I tumors scoring as intermediately to from normal and benign breast tissue with sensitivity of 90% densely methylated) versus the Stage II and III tumors. The 50 and specificity of 96%. Other biomarkers displayed similar relationship between the two sensitivity calculations specificity, with decreasing sensitivity. Several of these biom (R=0.887; slope=0.9815) indicates that the frequency of arkers were hypermethylated at a higher frequency than the hypermethylation of these loci is similar regardless of tumor majority of previously reported hypermethylated biomarkers stage. Therefore, for the majority of loci, the differential (i.e. 12 of 16 displayed sensitivity between 53% and 90%). TABLE 6 Breast Cancer Biomarker Validation. Thresholds indicate the optimal average dCt value for distinction between tumor and non-tumor tissues.

BREAST TUMORVS. Locus BREAST TUMORVS. NORMAL BREAST NORMAL, BLOOD Feature ID Number Sensitivity Pos. of Total Specificity Neg. of Total Threshold Specificity Neg. of Total Threshold

ha1p 39189 2 90% 92 of 102 96% 99 of 103 O.64 100% 24 of 24 122 ha1g O0644 3 89% 90 of 101 92% 96 of 104 0.555 100% 25 of 25 O.695 ha1p 101251 37 779, 78 of 101 87% 89 of 102 0.755 96% 23 of 24 1.06 US 8,067,168 B2 39 40 TABLE 6-continued Breast Cancer Biomarker Validation. Thresholds indicate the optimal average dCt value for distinction between tumor and non-tumor tissues.

BREAST TUMORVS. Locus BREAST TUMORVS. NORMAL BREAST NORMAL, BLOOD Feature ID Number Sensitivity Pos. of Total Specificity Neg. of Total Threshold Specificity Neg. of Total Threshold halp 81674 4 70% 69 of 99 92% 86 of 93 1.11 95% 18 of 19 O.935 ha1p 74707 34 69% 69 of 100 82% 84 of 103 0.615 87% 20 of 23 O.63 ha1g 02416 15 65% 66 of 102 97% 101 of 104 0.705 100% 25 of 25 0.535 ha1p 70459 26 63% 64 of 101 97% 101 of 104 0.525 100% 25 of 25 0.57 ha1p 69407 21 63% 64 of 101 93% 96 of 103 O.S 71.9% 17 of 24 0.55 ha1p 110107 17 60% 61 of 101 98% 102 of 104 O.S1 96% 23 of 24 O.S1 halp 36172 25 S8% S9 of 102 100% 104 of 104 0.515 100% 23 of 23 0.515 ha1p 67625 30 S6% S1 of 91 97% 96 of 99 O.S.45 95% 2O of 21 O.S.45 ha1p 104423 12 53% 52 of 98 97% 94 of 97 O.61 96% 23 of 24 0.855 halp O5406 22 48% 49 of 103 97% 1OO of 103 O.S1 100% 24 of 24 O.S1 ha1p 89799 18 42% 42 of 101 99% 98 of 99 O.S.45 100% 24 of 24 O.71 ha1p 80771 2O 38% 39 of 103 97% 101 of 104 O.S 100% 25 of 25 0.72 ha1g 02345 24 34% 35 of 102 97% 1OO of 103 0.535 100% 22 of 22 0.535

Example 12 CpG dinucleotides of the analyzed region were greater than 50%, while methylation of the following four CpG dinucle In-Depth Analysis of DNA Methylation by Bisulfite otides fell to lower densities more consistent with the baseline Sequencing 25 levels of methylation at the other analyzed loci. Interestingly, tumor samples displayed the same general methylation den To provide an in-depth analysis of DNA methylation states sity pattern, but with significantly higher methylation density relative to the qPCR-based measurements of methylated per CpG across the entire analyzed region. Together, these DNA load between different tissue types, we selected four results confirm the hypermethylated state of these loci in loci (Locus Number 2, 3, 4 and 12) for extensive bisulfite 30 breast cancer and provide an extensive validation of the accu sequencing analysis (FIG. 12). For each locus, analyzed racy of the qPCR-based method used to screen for DNA regions overlapped those amplified in the qPCR assay. Primer methylation changes in this study. pairs were designed to flank, but not include CpG dinucle otides. For analysis of each locus, we selected tumor samples Example 13 that scored as intermediately to densely methylated and nor 35 mal breast samples that scored as sparsely methylated. In DNA Hypermethylation is Associated with addition, we selected three histology normal tumor-adjacent Decreased Transcription tissue samples. Loci were amplified from bisulfite-modified genomic DNA with primers that included patient-specific To address the association between hypermethylation and sequence tags to identify the tissue sample, and amplicons 40 transcription repression, we performed RT-PCR analyses of were pooled and sequenced. The average number of mol Locus Numbers 2, 4 and 12 (FIG. 13). Four breast infiltrating ecules analyzed for each locus in each sample was 587. To ductal carcinoma samples (>90% neoplastic cellularity) were provide a general measurement of local DNA methylation analyzed for both DNA methylation and transcription of the density at each locus, the total number of CpG sites three genes. DNA methylation was analyzed using the qPCR sequenced as C (methylated) was divided by the total of 45 based assays described above. For gene expression analyses, number of CpG sites sequenced for each individual sample. RT-PCR was performed using gene-specific primer pairs This percent methylated CpG value was then plotted against designed to flank intronic sequences so that the contribution the qPCR methylation measurement for the same tissue of contaminating genomic DNA could be excluded. Analysis sample (FIG. 12A, C, E, G). Methylation load values of GAPDH expression was performed as an internal control. obtained by bisulfite sequencing and by qPCR displayed a 50 Serial dilutions of first-strand cDNA preparations from tumor strong correlation for Locus number 2, 12 and 3(R-076, samples and a normal breast tissue sample were used as 0.87 and 0.78, respectively). While tumor samples displayed templates for PCR. As shown in FIG. 13, expression of Locus higher DNA methylation load at Locus number 4 than normal Number 2 transcript (GHSR) was undetectable in all four breast and adjacent histology normal breast samples, the non tumor samples, while expression was detected at 1:10 dilu tumor tissues displayed higher baseline DNA methylation 55 tion of the normal breast cDNA. Consistent with the high densities than at the other loci (FIG. 12E). Next, the average sensitivity of hypermethylation at the GHSR locus (90%), all occurrence of DNA methylation per CpG site in each tissue tumor samples demonstrated intermediate to dense DNA type was calculated (FIG. 12B, D, F. H). In general, tumor methylation at this locus. Likewise, all tumor samples dis samples displayed higher variability in methylation per CpG played reduced expression of Locus Number 12 (NFX1) rela site than non-tumor (i.e., normal) samples (indicated by 60 tive to normal breast tissue. Expression was undetectable in higher standard deviations for the average percent methylated three of four tumor samples, whereas expression was detected CpGs). At each locus, the DNA methylation pattern was in one tumor using undiluted cDNA as template. In normal significantly hypermethylated relative to non-tumor samples. breast tissue, expression was detected at 1:10 dilution of the Furthermore, analysis of DNA methylation per CpG site pro cDNA. Interestingly, the tumor sample in which NFX1 vided an explanation for the higher baseline DNA methyla 65 expression was detected was scored as sparsely methylated tion densities detected at the Locus Number 4 (FIG. 12F). In by the qPCR-based assay. Methylation of the analyzed region non-tumor samples, methylation densities at the first three of Locus Number 4 (MGA) was detected in all four tumors. US 8,067,168 B2 41 42 However, reduced expression of MGA relative to normal 53 claimed biomarkers were analyzed in panels of lung, renal, breast was demonstrated in two of the four tumor samples. liver, ovarian, head and neck, thyroid, bladder, cervical, Example 14 colon, endometrial, esophageal and prostate tumors. Adja cent histology normal tissues were analyzed as controls. In Detection of Tumor-Specific DNA Methylation in addition, melanoma tumors were analyzed, although no adja Fine Needle Aspirate Specimens cent normal tissues were available. The number of samples analyzed for each cancer type is provided in Table 7. DNA A common procedure to biopsy Suspect masses in the methylation was measured as described in Example 3. For breast is to perform fine needle aspiratation (FNA) of the each locus and each cancer type, the sensitivity and specific tissue. The procedure involves removal of a small amount of 10 fluid and cellular material from the Suspect mass using a fine ity for discriminating between tumor and adjacent normal gauge needle. In addition, random periareolar fine needle tissue are reported in Tables 8-20. For melanoma tumors aspiration (RPFNA) can be used to sample breast tissue in (Table 20), only sensitivity (the frequency of DNA methyla asymptomatic women to assess the risk of breast cancer tion detection (i.e. samples that report an average dCt21.0)) development. Both approaches typically involve a cytologi 15 is reported due to the unavailability of adjacent normal tis cal based diagnosis. Therefore, applying molecular tests to Sues. For each locus, the optimal threshold for discriminating specimens obtained by these approaches promises to offer between tumor and adjacent normal tissue was calculated significantly improved clinical sensitivity and specificity following ROC curve analyses as described in Example 11. relative to the current practice. To assess the ability to detect breast tumor-specific DNA methylation of the claimed differ These data demonstrate that particular biomarker loci are entially methylated loci, eight loci with varying frequency of applicable to cancer types other than breast cancer. differential DNA methylation in primary breast tissue were analyzed in a panel of 7 FNA specimens taken from women TABLE 7 with confirmed infiltrating ductal breast carcinoma. DNA methylation was measured as described in Example 3. These Number of Tumor and Adjacent Normal tissues tested for methylation included Locus Number 1, 2, 3, 4, 12, 37, 38 and 43. In FIG. 25 of the 53 biomarker loci. 14, the percent sensitivity for each locus as listed in Tables 3A and 3B (i.e. the percentage of tumors that report and average Cancer Type Tumor Adjacent Normal dCt21.0) is plotted against the percentage of unmatched FNA samples that report and average dCte 1.0. The fre Lung 10 10 quency of DNA methylation detection (i.e. samples that 30 Renal 10 10 report an average dCt21.0) is very similar regardless of Liver 9 9 whether primary tumor samples or unmatched FNA speci Ovarian 8 8 mens from confirmed breast cancer patients were analyzed Head and Neck 9 5 (R-0.7415, slope=0.817). These results suggest that the Thyroid 9 9 DNA methylation biomarkers described herein can be 35 detected in a sample type relevant to molecular diagnostics of Bladder 9 9 breast cancer. Cervical 10 9 Example 15 Colon 8 8 Endometrial 14 9 Esophageal 10 Analysis of DNA Methylation in Various Cancer 40 Types Prostate 9 Melanoma To address the applicability of the claimed DNA methyla tion biomarkers to cancer types other than breast cancer, all TABLE 8 Sensitivity and Specificity of differentially methylated loci in lung tumors relative to adiacent histological normal lung tissue. Feature ID Locus Number Threshold Sensitivity Pos. of Total Specificity Neg. of Total ha1p 105474 33 1.98 80% 8 of 10 OO% O of 10 ha1p 391.89 2 1.17 70% 7 of 10 OO% O of 10 ha1p. 231.78 9 3.01S 70% 7 of 10 OO% O of 10 ha1p. 89099 28 1.38 70% 7 of 10 OO% O of 10 ha1g O0644 3 1445 60% 6 of 10 OO% O of 10 ha1p 40164 8 1.96 60% 3 of S OO% 5 of 5 ha1p 81149 5 2.37 SO% 5 of 10 OO% O of 10 ha1p 08347 14 1.67 SO% 4 of 8 OO% 7 of 7 ha1p 12075 46 2.33 43% 3 of 7 OO% 7 of 7 ha1p 40959 11 2.435 40% 4 of 10 OO% O of 10 ha1p. 36172 25 O.835 40% 4 of 10 OO% O of 10 ha1p 56412 44 3.86 40% 4 of 10 OO% 9 of 9 ha1g 03099 29 O.63S 30% 3 of 10 OO% O of 10 ha1p 67625 30 O.96S 30% 3 of 10 OO% 9 of 9 ha1p 93325 35 2.93 30% 3 of 10 OO% O of 10 ha1p 103824 40 0.765 30% 3 of 10 OO% O of 10 ha1p 69407 21 2.77 29% 2 of 7 OO% 6 of 6 ha1p. 81674 4 2.16S 25% 2 of 8 OO% 7 of 7 ha1p. 80771 2O 1.OS 20% 2 of 10 OO% 9 of 9 US 8,067,168 B2 43 44 TABLE 8-continued Sensitivity and Specificity of differentially methylated loci in lung tumors relative to adjacent histological normal lung tissue.

Feature ID Locus Number Threshold Sensitivity Pos. of Total Specificity Neg. O O tal ha1g 00218 31 O.71 10% 1 of 10 100% ha1p 45.173 19 1.66 100% 10 of 10 90% ha1p 83841 6 1.54 90% 9 of 10 90% ha1p 46.057 10 O.9 90% 9 of 10 90% ha1p 105937 27 1.375 80% 8 of 10 90% ha1p 69214 38 3.735 78% 7 of 9 90% ha1p 18292 45 2.06 60% 6 of 10 90% ha1p 12535 32 2.05 50% 5 of 10 90% ha1p. 67002 51 2.O2S 50% 5 of 10 90% ha1p 87540 16 O.98 80% 8 of 10 89% ha1p 108445 41 1.9 40% 4 of 10 89% ha1p. 88517 39 1.38 50% 3 of 6 83% ha1p. 29531 48 1.01 90% 9 of 10 80% ha1p 58853 49 1.315 80% 8 of 10 80% ha1p 103872 43 2.96 70% 7 of 10 80% ha1p. 89799 18 O.S 63% 5 of 8 80% ha1p 104423 12 O.83 40% 4 of 10 80% ha1p 80287 23 O.82 40% 4 of 10 80% ha1p. 74707 34 O.76 78% 7 of 9 78% ha1p. 38705 7 O.62 63% 5 of 8 78% 9 ha1g O0681 1 1.96S S6% 5 of 9 78% ha1p 35052 50 1.6 70% 7 of 10 70% ha1p. 45580 52 3.1 70% 7 of 10 70% g ha1g 02345 24 O.S1 67% 6 of 9 67% ha1p O5406 22 0.615 71.9% 5 of 7 63% ha1p 22519 47 1.6 100% 10 of 10 60% ha1g 02416 15 O.S8 90% 9 of 10 60% ha1p 101161 36 O.885 80% 8 of 10 60% ha1g O0847 13 O.S3 63% 5 of 8 60% ha1g 02210 42 O.6OS 60% 6 of 10 60% ha1p. 70459 26 O.S4 100% 6 of 6 56% ha1p 12646 53 5.455 78% 7 of 9 56% ha1p 101251 37 1.34 100% 10 of 10 40% O ha1p 1101.07 17 0.55 86% 6 of 7 40%

Threshold: Average dCtvalue established by ROC curve analysis as optimal threshold for distinguishing tumor and adjacent normal tissues, Sensitivity;% of positive (i.e. methylation score above Threshold) tumors, Pos. of Total: Number of positive tumors relative to the total number of tumors analyzed. Specificity: % of negative (i.e. methylation score below Threshold) adjacent normal samples. Neg, of Total:Number of negative adjacent normal samples relative to the total number of adjacentnormal samples analyzed.

TABLE 9 Sensitivity and Specificity of differentially methylated loci in renal tumors relative to adjacent histological normal kidney tissue. Feature ID Locus Number Threshold Sensitivity Pos. of Total Specificity Neg. O ha1p. 29531 48 O.68 80% 8 of 10 ha1p. 23178 9 52 70% 7 of 10 ha1p 69407 21 255 70% 7 of 10 ha1p 22519 47 68 67% 6 of 9 ha1g O0644 3 13S 60% 6 of 10 ha1p 83841 6 .2 60% 6 of 10 ha1p. 80771 2O 0.535 60% 6 of 10 ha1p 103872 43 3.25 60% 6 of 10 ha1p 89.099 28 O.935 S6% 5 of 9 ha1p. 74707 34 .6 S6% 5 of 9 ha1g 02416 15 O.S6 SO% 5 of 10 ha1p 45.173 19 0.675 SO% 5 of 10 ha1p 105937 27 O.99 SO% 5 of 10 ha1p 93325 35 155 SO% 5 of 10 ha1p 108445 41 O2S SO% 5 of 10 ha1p 103824 40 O.82 44% 4 of 9 ha1p 56412 44 84 44% 4 of 9 ha1p. 38705 7 28 40% 4 of 10 9 ha1p. 70459 26 O.74 40% 4 of 10 ha1p 36,172 25 O.92 33% 3 of 9 ha1g 02345 24 0.565 30% 3 of 10 ha1p 80287 23 13S 22% 2 of 9 ha1g 03099 29 0.555 20% 2 of 10 ha1p 58853 49 195 11% 1 of 9 US 8,067,168 B2 45 46 TABLE 9-continued Sensitivity and Specificity of differentially methylated loci in renal tumors relative to adjacent histological normal kidney tissue. Feature ID Locus Number Threshold Sensitivity Pos. of Total Specificity Neg. of Total ha1p 08347 14 2.65 100% 9 of 9 90% 9 of 10 ha1p 46.057 10 1.325 80% 8 of 10 90% 9 of 10 ha1g O0681 1 1.905 60% 6 of 10 90% 9 of 10 ha1p 18292 45 1.09 60% 6 of 10 90% 9 of 10 ha1p 87540 16 O.92S 40% 4 of 10 90% 9 of 10 ha1p. 89799 18 O.6OS 40% 4 of 10 90% 9 of 10 ha1p O5406 22 O.S2 40% 4 of 10 90% 9 of 10 ha1g 00218 31 0.505 20% 2 of 10 90% 9 of 10 ha1p. 39189 2 1.03 80% 8 of 10 8996 8 of 9 ha1p. 67625 30 0.79 60% 6 of 10 8996 8 of 9 ha1p. 88517 39 1.98 50% 5 of 10 8996 8 of 9 ha1p 35052 50 1495 100% 10 of 10 809/o 8 of 10 ha1p 40164 8 O.82S 90% 9 of 10 809/o 8 of 10 ha1p. 67002 51 1.S6S 90% 9 of 10 809/o 8 of 10 ha1p 40959 11 O.88 80% 8 of 10 809/o 8 of 10 ha1p 12535 32 O.93 70% 7 of 10 809/o 8 of 10 ha1p 12646 53 4.23 60% 6 of 10 809/o 8 of 10 ha1p 1101.07 17 O.S3 S6% 5 of 9 809/o 8 of 10 ha1p 101161 36 O.93 40% 4 of 10 809/o 8 of 10 ha1g 02210 42 O.S2 33% 3 of 9 809/o 8 of 10 ha1g O0847 13 0.7 S6% 5 of 9 7896 7 of 9 ha1p. 81674 4 1.365 89% 8 of 9 679/o 6 of 9 ha1p 12075 46 1.66 89% 8 of 9 569 5 of 9 ha1p. 45580 52 2.1 100% 10 of 10 SO90 5 of 10 ha1p 81149 5 0.875 90% 9 of 10 SO90 5 of 10 ha1p 105474 33 1.035 70% 7 of 10 SO90 5 of 10 ha1p 101251 37 1 70% 7 of 10 SO90 5 of 10 ha1p 104423 12 O.68 80% 8 of 10 409/o 4 of 10 ha1p 69214 38 1.085 100% 10 of 10 33% 3 of 9

Threshold: Average dCtvalue established by ROC curve analysis as optimal threshold for distinguishing tumor and adjacent normal tissues, Sensitivity;% of positive (i.e. methylation score above Threshold) tumors, Pos. of Total: Number of positive tumors relative to the total number of tumors analyzed. Specificity: % of negative (i.e. methylation score below Threshold) adjacent normal samples. Neg, of Total:Number of negative adjacent normal samples relative to the total number of adjacentnormal samples analyzed.

TABLE 10 Sensitivity and Specificity of differentially methylated loci in liver tumors relative to adjacent histological normal liver tissue. Feature ID Locus Number Threshold Sensitivity Pos. of Total Specificity Neg. of Total ha1p. 89799 18 3.01 67% 6 of 9 100% 9 of 9 ha1p. 81674 4 1.66 S6% 5 of 9 100% 9 of 9 ha1p 56412 44 1.94 S6% 5 of 9 100% 9 of 9 ha1p. 67002 51 1.825 SO% 4 of 8 100% 9 of 9 ha1p 81149 5 1.39 38% 3 of 8 100% 9 of 9 ha1p 83841 6 1.30S 38% 3 of 8 100% 9 of 9 ha1p. 74707 34 1.335 38% 3 of 8 100% 8 of 8 ha1p. 80771 2O O.87 33% 3 of 9 100% 9 of 9 ha1p. 70459 26 2.785 33% 3 of 9 100% 9 of 9 ha1p 89.099 28 2.715 33% 3 of 9 100% 8 of 8 ha1p 12535 32 2.1 33% 3 of 9 100% 9 of 9 ha1p. 39189 2 4.805 25% 2 of 8 100% 9 of 9 ha1g 00218 31 0.775 22% 2 of 9 100% 9 of 9 ha1p 12646 53 3.49 1796 1 of 6 100% 7 of 7 ha1g O0847 13 O.86 78% 7 of 9 89% 8 of 9 ha1p O5406 22 1.465 78% 7 of 9 89% 8 of 9 ha1p 105474 33 1.55 78% 7 of 9 89% 8 of 9 ha1p 69407 21 3.06 67% 6 of 9 89% 8 of 9 ha1g 02416 15 O.94 SO% 4 of 8 89% 8 of 9 ha1p 40164 8 1.91 44% 4 of 9 89% 8 of 9 ha1p 101251 37 1.195 44% 4 of 9 89% 8 of 9 ha1p 1101.07 17 1955 38% 3 of 8 89% 8 of 9 ha1p 105937 27 1.585 38% 3 of 8 89% 8 of 9 ha1p 18292 45 2.51 63% 5 of 8 88% 7 of 8 ha1p 35052 50 6 89% 8 of 9 78% 7 of 9 ha1p. 67625 30 1.97 83% 5 of 6 78% 7 of 9 ha1p. 23178 9 2.655 78% 7 of 9 78% 7 of 9 ha1p 93325 35 4.85 78% 7 of 9 78% 7 of 9 ha1g O0681 1 0.615 75% 6 of 8 78% 7 of 9

US 8,067,168 B2 55 56 TABLE 14-continued Sensitivity and Specificity of differentially methylated loci in bladder tumors relative to adjacent histological normal bladder tissue. Feature ID Locus Number Threshold Sensitivity Pos. of Total Specificity Neg. of Total ha1g 02210 42 1145 60% 3 of S 7896 7 of 9 ha1p 108445 41 1.99 100% 9 of 9 759, 6 of 8 halp 81149 5 0.615 100% 8 of 8 679/o 6 of 9 ha1p 69407 21 1.185 100% 9 of 9 679/o 6 of 9 Threshold: Average dCtvalue established by ROC curve analysis as optimal threshold for distinguishing tumor and adjacent normal tissues, Sensitivity;% of positive (i.e. methylation score above Threshold) tumors, Pos. of Total: Number of positive tumors relative to the total number of tumors analyzed. Specificity: % of negative (i.e. methylation score below Threshold) adjacent normal samples. Neg, of Total:Number of negative adjacent normal samples relative to the total number of adjacentnormal samples analyzed.

TABLE 1.5 TABLE 15-continued Sensitivity and Specificity of differentially methylated loci in cervica Sensitivity and Specificity of differentially methylated loci in cervical tumors relative to adjacent histological normal cervical tissue. tumors relative to adjacent histological normal cervical tissue. Locus Thresh- Sen- Pos. of Spec- Neg. of Locus Thresh- Sen- Pos. of Spec- Neg. of Feature ID Number old sitivity Total ificity Total Feature ID Number old sitivity Total ificity Total ha1p 83841 6 O.905 100% 9 of 9 OO% 8 of 8 ha1p 35052 50 1.135 80% 8 of 10 S6% S of 9 ha1p. 23178 9 1 100% 9 of 9 OO% 9 of 9 ha1p 69407 21 0.665 90% 9 of 10 44% 4 of 9 ha1p. 74707 34 O.61 100% 9 of 9 OO% 9 of 9 25 ha1p 108445 41 O.995 100% 10 of 10 OO% 9 of 9 Threshold: Average dCt value established by ROC curve analysis as optimal threshold for distinguishing tumor and adjacent normal tissues, ha1p 40959 11 1.49 90% 9 of 10 OO% 8 of 8 Sensitivity;% of positive (i.e. methylation score above Threshold) tumors, ha1g 0.0847 13 1.01 90% 9 of 10 OO% 9 of 9 Pos. of Total: Number of positive tumors relative to the total number of tumors analyzed. ha1p 69214 38 0.515 90% 9 of 10 OO% 8 of 8 Specificity: % of negative (i.e. methylation score below Threshold) adjacent normal ha1g 0.0644 3 0.725 89% 8 of 9 OO% 9 of 9 samples, ha1p 40164 8 0.655 89% 8 of 9 OO% 9 of 9 30 Neg, of Total: Number of negative adjacent normal samples relative to the total number of adjacent normal samples analyzed. ha1p 103872 43 0.785 89% 8 of 9 OO% 9 of 9 ha1p 1101.07 17 O.775 88% 7 of 8 O09% 7 of 7 ha1p 391.89 2 O.S9 80% 8 of 10 OO% 9 of 9 ha1p 46057 10 O.89 80% 8 of 10 OO% 9 of 9 TABLE 16 ha1p 45.173 19 O.S3 80% 8 of 10 OO% 9 of 9 35 Sensitivity and Specificity of differentially methylated loci in colon ha1p 58853 49 1.215 80% 8 of 10 OO% 9 of 9 tumors relative to adjacent histological normal colon tissue. ha1p 8.8517 39 O.63S 78% 7 of 9 OO% 9 of 9 ha1p. 80771 2O O.S2 70% 7 of 10 OO% 9 of 9 Locus Sen- Pos. of Spec- Neg. of ha1p 105937 27 O.64 70% 7 of 10 OO% 9 of 9 Feature ID Number Threshold sitivity Total ificity Total ha1p 101161 36 0.57 70% 7 of 10 OO% 9 of 9 ha1p 56412 44 1855 88% 7 of 8 100% 8 of 8 ha1g 02416 15 O.S45 60% 6 of 10 OO% 8 of 8 40 ha1p 103824 40 0.515 60% 6 of 10 OO% 8 of 8 ha1g 0.0644 3 1.46 86% 6 of 7 100% 8 of 8 ha1p 38705 7 O.S9 S6% 5 of 9 OO% 7 of 7 ha1p 40959 11 2.16 71.9% 5 of 7 100% 6 of 6 ha1g 02345 24 O.S6 S6% 5 of 9 OO% 8 of 8 ha1p. 23178 9 1805 63% 5 of 8 100% 8 of 8 ha1p 104423 12 0.515 SO% 5 of 10 OO% 9 of 9 ha1p 105937 27 1.425 63% 5 of 8 100% 8 of 8 ha1p. 36172 25 0.77 SO% 5 of 10 OO% 9 of 9 ha1g 03099 29 O.63 63% 5 of 8 100% 8 of 8 ha1p. 70459 26 1.27 SO% 5 of 10 OO% 9 of 9 ha1p. 74707 34 2.08 63% 5 of 8 100% 8 of 8 45 ha1p 101161 36 O.65 63% 5 of 8 100% 8 of 8 ha1p O5406 22 0.73 44% 4 of 9 OO% 8 of 8 ha1p 103824 40 O.6OS 63% 5 of 8 100% 8 of 8 ha1p 87540 16 0.73 40% 4 of 10 OO% 8 of 8 ha1g 02210 42 O.82 63% 5 of 8 100% 8 of 8 ha1p. 89799 18 O.63 38% 3 of 8 OO% 9 of 9 ha1p. 81674 4 4.63 SO% 3 of 6 100% 8 of 8 ha1p 08347 14 1.745 33% 3 of 9 OO% 9 of 9 ha1p 83841 6 2.175 SO% 4 of 8 100% 7 of 7 ha1g 03099 29 O.S 33% 3 of 9 OO% 8 of 8 ha1p. 80771 2O 31 SO% 4 of 8 100% 8 of 8 ha1p 89099 28 O.8 30% 3 of 10 OO% 9 of 9 50 ha1p 101251 37 29 SO% 3 of 6 100% 8 of 8 ha1p. 67625 30 O.91 22% 2 of 9 OO% 8 of 8 ha1p 103872 43 2.36 SO% 4 of 8 100% 8 of 8 ha1p 80287 23 O.86S 20% 2 of 10 OO% 9 of 9 ha1p. 67002 51 57 SO% 4 of 8 100% 8 of 8 ha1g 00218 31 0.705 10% 1 of 10 OO% 8 of 8 ha1p 104423 12 O.575 43% 3 of 7 100% 8 of 8 ha1p 12535 32 0.585 100% 10 of 10 89% 8 of 9 ha1p 81149 5 2.82 38% 3 of 8 100% 8 of 8 ha1p 93325 35 0.595 90% 9 of 10 89% 8 of 9 ha1g 02345 24 O.S8 38% 3 of 8 100% 8 of 8 ha1p. 29531 48 1.OS 80% 8 of 10 89% 8 of 9 55 ha1g 00218 31 745 38% 3 of 8 100% 8 of 8 ha1p 101251 37 1.63S 70% 7 of 10 89% 8 of 9 ha1p. 67625 30 0.515 25% 2 of 8 100% 8 of 8 ha1p. 81674 4 0.975 SO% 4 of 8 89% 8 of 9 ha1p 105474 33 .99 100% 8 of 8 88% 7 of 8 ha1p. 67002 51 2.055 SO% 5 of 10 89% 8 of 9 ha1p 18292 45 S45 100% 8 of 8 88% 7 of 8 ha1g 02210 42 O.845 38% 3 of 8 89% 8 of 9 ha1g00681 1 0.705 88% 7 of 8 88% 7 of 8 ha1p 108445 41 2.175 88% 7 of 8 88% 7 of 8 ha1p. 45580 52 1.765 67% 6 of 9 88% 7 of 8 ha1p 8.8517 39 645 86% 6 of 7 88% 7 of 8 ha1p 12075 46 O.S9 78% 7 of 9 83% S of 6 60 ha1p 45.173 19 145 75% 6 of 8 88% 7 of 8 ha1p 81149 5 1.11 100% 9 of 9 78% 7 of 9 ha1p 89099 28 2.22 75% 6 of 8 88% 7 of 8 ha1p 105474 33 0.525 100% 10 of 10 78% 7 of 9 ha1p 58853 49 .85 75% 6 of 8 88% 7 of 8 ha1p 22519 47 1.2 100% 10 of 10 78% 7 of 9 ha1p 80287 23 485 63% 5 of 8 88% 7 of 8 ha1p 12646 53 2.385 100% 10 of 10 78% 7 of 9 ha1p. 29531 48 3 63% 5 of 8 88% 7 of 8 ha1p 18292 45 1.195 70% 7 of 10 75% 6 of 8 ha1p 87540 16 52 86% 6 of 7 86%. 6 of 7 ha1p 56412 44 O.S9 90% 9 of 10 67%. 6 of 9 65 ha1p 12535 32 O.93 75% 6 of 8 86%. 6 of 7 ha1g00681 1 O.S8 89% 8 of 9 67%. 6 of 9 ha1p 38705 7 O1 SO% 3 of 6 86% 6 of 7 US 8,067,168 B2 57 58 TABLE 16-continued TABLE 17-continued Sensitivity and Specificity of differentially methylated loci in colon Sensitivity and Specificity of differentially methylated tumors relative to adjacent histological normal colon tissue. loci in endometrial tumors relative to adjacent histological normal endometrial tissue. Locus Sen- Pos. of Spec- Neg. of 5 Feature ID Number Threshold sitivity Total ificity Total Locus Sen- Pos. of Spec- Neg. of Feature ID Number Threshold sitivity Total ificity Total ha1p 1101.07 17 O.74 100% 8 of 8 75% 6 of 8 ha1p. 36172 25 O.S3 100% 8 of 8 75% 6 of 8 ha1p. 45580 52 1.435 43% 6 of 14 100% 8 of 8 ha1g 0.0847 13 2.085 88% 7 of 8 75% 6 of 8 ha1g 02345 24 O6 36% 5 of 14 100% 8 of 8 ha1p 69214 38 2.085 88% 7 of 8 75% 6 of 8 10 ha1p 67625 30 O.S6 31% 4 of 13 100% 8 of 8 ha1p. 89799 18 295 75% 6 of 8 75% 6 of 8 ha1p 80287 23 O.S1 21% 3 of 14 100% 9 of 9 ha1g 02416 15 295 63% 5 of 8 75% 6 of 8 ha1g 00218 31 0.535 21% 3 of 14 100% 9 of 9 ha1p 08347 14 415 88% 7 of 8 71% S of 7 ha1p 103824 40 O645 21% 3 of 14 100% 9 of 9 ha1p 391.89 2 O.575 86% 6 of 7 71% S of 7 ha1p 81149 5 1.16S 86%. 12 of 14 89% 8 of 9 ha1p 46057 10 O6 100% 8 of 8 63%. 5 of 8 ha1p. 81674 4 0.875 82% 9 of 11. 89% 8 of 9 ha1p 93325 35 O7 100% 8 of 8 63%. 5 of 8 15 na g O0847 13 1.11 79%. 11 of 14 89% 8 of 9 ha1p 40164 8 O.815 75% 6 of 8 63%. 5 of 8 ha1p 694O7 21 1.59 79%. 11 of 14 89% 8 of 9 ha1p. 70459 26 .21 75% 6 of 8 63%. 5 of 8 ha1p 93325 35 0.795 77%. 10 of 13 89% 8 of 9 ha1p 12075 46 O.935 75% 6 of 8 63%. 5 of 8 ha1p. 70459 26 0.655 71% 10 of 14 89% 8 of 9 ha1p 22519 47 1S SO% 4 of 8 63%. 5 of 8 ha1p. 80771 2O O.S8 64% 9 of 14 89% 8 of 9 ha1p O5406 22 .64 75% 6 of 8 S7% 4 of 7 ha1p 08347 14 1.86 57% 8 of 14 89% 8 of 9 ha1p 12646 53 2.16 67% 4 of 6 50%. 4 of 8 ha1g 02416 15 O.61 SO% 7 of 14 89% 8 of 9 ha1p 694O7 21 0.565 100% 8 of 8 38% 3 of 8 20 ha p. 89799 18 O.S2 62% 8 of 13 88% 7 of 8 ha1p 35052 50 0.565 100% 8 of 8 38%. 3 of 8 ha1p 12075 46 O.915 85%. 11 of 13 78% 7 of 9 ha1p. 45580 52 .505 100% 8 of 8 38%. 3 of 8 Threshold: Average dCt value established by ROC curve analysis as optimal threshold for Threshold: Average dCt value established by ROC curve analysis as optimal threshold for distinguishing tumor and adjacent normal tissues, distinguishing tumor and adjacent normal tissues, Sensitivity;% of positive (i.e. methylation score above Threshold) tumors, Sensitivity;% of positive (i.e. methylation score above Threshold) tumors, 25 Pos, of Total: Number of positive tumors relative to the total number of tumors analyzed. Pos. of Total: Number of positive tumors relative to the total number of tumors analyzed. Specificity: % of negative (i.e. methylation score below Threshold) adjacent normal Specificity: % of negative (i.e. methylation score below Threshold) adjacent normal samples, samples, Neg, of Total: Number of negative adjacent normal samples relative to the total number of Neg, of Total: Number of negative adjacent normal samples relative to the total number of adjacent normal samples analyzed. adjacent normal samples analyzed,

30 TABLE 1.8 TABLE 17 Sensitivity and Specificity of differentially methylated Sensitivity and Specificity of differentially methylated loci in esophageal tumors relative to adjacent loci in endometrial tumors relative to adjacent histological normal esophageal tissue. histological normal endometrial tissue. 35 Locus Thresh- Sen- Pos. of Spec- Neg. of Locus Sen- Pos. of Spec- Neg. of Feature ID Number old sitivity Total ificity Total Feature ID Number Threshold sitivity Total ificity Total ha1p 1101.07 17 0.515 67% 6 of 9 100% 9 of 9 ha1p 391.89 2 0.75 93% 3 of 14 100% 9 of 9 ha1p 38705 7 2.58 43% 3 of 7 100% 9 of 9 ha1g 0.0644 3 O.91 93% 3 of 14 100% 9 of 9 ha1p 87540 16 .21 33% 3 of 9 100% 9 of 9 ha1p 83841 6 1.07 93% 3 of 14 100% 9 of 9 40 a p. 67625 30 0.665 33% 3 of 9 100% 9 of 9 ha1p 103872 43 0.75 93% 3 of 14 100% 9 of 9 ha1p 103824 40 0.72 13% 1 of 8 100% 10 of 10 ha1p 56412 44 O.685 93% 3 of 14 100% 8 of 8 ha1p O5406 22 O.71 11% 1 of 9 100% 10 of 10 ha1p 12646 53 2.175 93% 3 of 14 100% 9 of 9 ha1g 03099 29 O.6 11% 1 of 9 100% 10 of 10 ha1p 40959 11 O.945 86% 2 of 14 100%. 3 of 3 ha1p 391.89 2 16S 89% 8 of 9 90% 9 of 10 ha1p 58853 49 O.S1 86% 2 of 14 100% 8 of 8 ha1p. 29531 48 27 78% 7 of 9 90% 9 of 10 ha1p 12535 32 1.025 83% O of 12 100% 9 of 9 ha1g 02416 15 O.65 38% 3 of 8 90% 9 of 10 ha1p 18292 45 1.32 83% () of 12 100% 8 of 8 45 ha1p 80771 2O 0.675 22% 2 of 9 90% 9 of 10 ha1p 46057 10 O.935 79% 1 of 14 100% 9 of 9 ha1p 45.173 19 O.98S 100% 9 of 9 89% 8 of 9 ha1p 45.173 19 O6 79% 1 of 14 100% 9 of 9 ha1p 8.8517 39 .14 100% 8 of 8 89% 8 of 9 ha1p 105474 33 0.555 79% 1 of 14 100% 9 of 9 ha1p. 89799 18 O.S3 14% 1 of 7 89% 8 of 9 ha1p 101251 37 1.28 79% 1 of 14 100% 9 of 9 ha1p 40959 11 52 78% 7 of 9 88% 7 of 8 ha1p 22519 47 2.2 79% 1 of 14 100% 9 of 9 ha1g 02210 42 O.905 33% 3 of 9 88% 7 of 8 ha1p. 29531 48 O.82 79% 1 of 14 100% 9 of 9 50 ha1p. 23178 9 645 100% 9 of 9 80% 8 of 10 ha1p 87540 16 O.S6 770, O of 13 100% 9 of 9 ha1p 46057 10 O.92 100% 9 of 9 80% 8 of 10 ha1g 02210 42 O.66S 75% 6 of 8 100% 6 of 6 ha1p 104423 12 O.65 100% 9 of 9 80% 8 of 10 ha1g00681 1 1145 71.9% O of 14 100% 9 of 9 ha1p 08347 14 0.75 100% 9 of 9 80% 8 of 10 ha1p. 23178 9 0.795 71.9% O of 14 100% 9 of 9 ha1p 108445 41 715 100% 9 of 9 80% 8 of 10 ha1p 1101.07 17 O.91 71.9% 5 of 7 100% 9 of 9 ha1p 105937 27 0.855 89% 8 of 9 80% 8 of 10 ha1p 105937 27 0.525 71.9% 0 of 14 100% 9 of 9 ss halp. 105474 33 655 89% 8 of 9 80% 8 of 10 ha1p 101161 36 0.55 71.9% O of 14 100% 9 of 9 ha1p 101161 36 0.785 89% 8 of 9 80% 8 of 10 ha1p 69214 38 0.555 71.9% O of 14 100% 9 of 9 ha1p 22519 47 77 89% 8 of 9 80% 8 of 10 ha1p 8.8517 39 0.855 69% 9 of 13 100% 8 of 8 ha1p 81149 5 2.02 78% 7 of 9 80% 8 of 10 ha1p 108445 41 O.805 67% 8 of 12 100% 8 of 8 ha1g 0.0644 3 O.905 75% 6 of 8 80% 8 of 10 ha1p 38705 7 1.085 64% 9 of 14 100% 9 of 9 ha1p 56412 44 O.74 67% 6 of 9 80% 8 of 10 ha1p 40164 8 0.785 64% 9 of 14 100% 9 of 9 60 na p 74707 34 O3S S6% 5 of 9 80% 8 of 10 ha1p 104423 12 0.705 64% 9 of 14 100% 9 of 9 ha1p 80287 23 O.62 22% 2 of 9 80% 8 of 10 ha1p 89099 28 0.565 64% 9 of 14 100% 8 of 8 ha1p. 36172 25 O.6 22% 2 of 9 80% 8 of 10 ha1p O5406 22 O.S3 SO% 7 of 14 100% 9 of 9 ha1p. 67002 51 .92S 78% 7 of 9 78% 7 of 9 ha1p. 67002 51 2.02 46% 6 of 13 100% 8 of 8 ha1p 89099 28 O.6 S6% 5 of 9 78% 7 of 9 ha1p. 36172 25 O.S6 43% 6 of 14 100% 8 of 8 ha1p 18292 45 18 89% 8 of 9 75% 6 of 8 ha1g 03099 29 O.64 43% 6 of 14 100% 9 of 9 ha1p 58853 49 1OS 100% 9 of 9 70% 7 of 10 ha1p. 74707 34 1.885 43% 6 of 14 100% 9 of 9 65 ha1p 83841 6 12S 89% 8 of 9 70% 7 of 10 ha1p 35052 50 1.63 43% 6 of 14 100% 9 of 9 ha1p 12535 32 O.78 89% 8 of 9 70% 7 of 10

US 8,067,168 B2 61 62 TABLE 20-continued Although the invention has been described in some detail Frequency of methylation of each locus in melanoma tumors. by way of illustration and example for purposes of clarity of understanding, it will be readily apparent to one of ordinary Feature ID Locus Number Sensitivity Pos. of Total 5 skill in the art in light of the teachings of this invention that ha1p 101161 36 29% 2 of 7 ha1p 110107 17 14% 1 of 7 certain changes and modifications may be made thereto with ha1p 80771 2O 14% 1 of 7 out departing from the spirit or scope of the appended claims. ha1p 67625 30 14% 1 of 7 ha1p 74707 34 14% 1 of 7 ha1g 02416 15 O% O of 7 10 ha1g 02345 24 O% O of 7 ha1g 00218 31 O% O of 7 ha1p 103824 40 O% O of 7 All publications, databases, Genbank sequences, patents, Sensitivity;% of positive (i.e. methylation score above 1.0) tumors. and patent applications cited in this specification are herein Pos. of Total: Number of positive tumors relative to the total number of tumors analyzed. 15 Note that adjacent histology normal or normal skin samples were not available for analysis, Threshold for a positive methylation score was set at an average dCt of 1.0. incorporated by reference as if each was specifically and individually indicated to be incorporated by reference.

SEQUENCE LISTING

NUMBER OF SEO ID NOS: 265

SEO ID NO 1 LENGTH: 60 TYPE: DNA ORGANISM: Artificial Sequence FEATURE; OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1g OO 681

< 4 OOs SEQUENCE: 1

togtag tagg togCtttittg Catcgcgttgttt Cacg at C ttgatcgcac actctgacag 6 O

SEO ID NO 2 LENGTH: 60 TYPE: DNA ORGANISM: Artificial Sequence FEATURE; OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 3918.9

< 4 OOs SEQUENCE: 2

accct acagc agact ct ct a cc act accga gtgatacaaa ggactgggat totggacagg 60

SEO ID NO 3 LENGTH: 60 TYPE: DNA ORGANISM: Artificial Sequence FEATURE; OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1g OO644

< 4 OOs SEQUENCE: 3

cagttcCtcc accgcagcgg to acaccgtt gatttgatcc agaaataaga C9gatagtac 60

SEO ID NO 4 LENGTH: 60 TYPE: DNA ORGANISM: Artificial Sequence FEATURE; OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 81674

< 4 OOs SEQUENCE: 4

ctitt.cggtot cotc.cccatt titcc tagt cq tott cqgttg toggatgttgt aaactic gacc 60

SEO ID NO 5 US 8,067,168 B2 63 64 - Continued

&211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 81149

<4 OOs, SEQUENCE: 5 cc.caggaggc tatgggaatc agaat cacac ttgcacalaga galagacic ctt atgggacaag 6 O

<210s, SEQ ID NO 6 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 83841

<4 OOs, SEQUENCE: 6 aaataccacc ttcttaaata ccacccagct gtccaagatg gag act tcct ggcatcCagg 6 O

<210s, SEQ ID NO 7 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 38705

<4 OO > SEQUENCE: 7 ggaggagtaa aggctgttta caaacttgac gtacacacgc agt cct atcc ctacggit cot 6 O

<210s, SEQ ID NO 8 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 4O164

<4 OOs, SEQUENCE: 8 aaacaa.gcac cct gactggc gcatt cittac to attctago citgggtttca acttcaagga 6 O

<210s, SEQ ID NO 9 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 231.78

<4 OOs, SEQUENCE: 9 c cct tactgc tigctgctgta acagotaagt gccaattatc. tcc caag.cgt gaaagcaaat 6 O

<210s, SEQ ID NO 10 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 46.057

<4 OOs, SEQUENCE: 10 gaaacatgca Ctgcctgaca toggcaa.gct gcc cacacca agttacaggc titatggaaag 6 O

<210s, SEQ ID NO 11 &211s LENGTH: 60 &212s. TYPE: DNA US 8,067,168 B2 65 - Continued <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 4O959

<4 OOs, SEQUENCE: 11 cgtgggcctt agccaaataa cittctggtag goaaatticto ccc.ttct tca gttgaga.gct 6 O

<210s, SEQ ID NO 12 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 104423

<4 OOs, SEQUENCE: 12 gagagggaag cct gatctgt titcct cactic gcttgct cqt gigatgtcatt ttctgtc.ttic 6 O

<210s, SEQ ID NO 13 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1g OO847

<4 OOs, SEQUENCE: 13 atacgctgga acagagcacg agcatacact caac acacg cqcacacact Cagga Catct 6 O

<210s, SEQ ID NO 14 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p O8347

<4 OOs, SEQUENCE: 14 atggitttctgaaaag.ccctic agt cotggitt cittggct tcc tdattctgtg gttgggattit 6 O

<210s, SEQ ID NO 15 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1g O2416

<4 OOs, SEQUENCE: 15 caacgacaag aagctgtc.ca aatatgagac cct gcagatg gcc caaatct a catcaacgc 6 O

<210s, SEQ ID NO 16 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 87540

<4 OOs, SEQUENCE: 16 gaggcaattic agggagacag gaacaccott citct citt cac acacct catc agttccactg 6 O

<210s, SEQ ID NO 17 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: US 8,067,168 B2 67 - Continued <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 1101.07

<4 OOs, SEQUENCE: 17 tgtgttgttgtc. catttggcga gatgtc.gaga gcggggggag tdt CCttgtc. g.gtgt atctg 6 O

<210s, SEQ ID NO 18 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p. 89799

<4 OOs, SEQUENCE: 18 tctic ccagca ccttgttaat ttct ct caat citccago cac aaatcc.gaga cacaacgctic 6 O

<210s, SEQ ID NO 19 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 45.173

<4 OOs, SEQUENCE: 19 agcgtgccta gaggagtggit Caggatagtgagggatctgt gatcc titcgt t ctgaat ct a 6 O

<210s, SEQ ID NO 2 O &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 80771

<4 OOs, SEQUENCE: 2O ccggagagcc ttcaaggga agagctgaat Cttitt at Ctt ttgtaacgac tacccagtga 6 O

<210s, SEQ ID NO 21 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 694O7

<4 OOs, SEQUENCE: 21 ggcc tdtctic cittggcatgt toccittgctt ctogcttgtcc agittaatcct ttctgacata 6 O

<210s, SEQ ID NO 22 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p O5406

<4 OOs, SEQUENCE: 22 gaaggaatgg cc catctott agggct ct ct gcttgtcacc taccaggttg gtcagaaacg 6 O

<210s, SEQ ID NO 23 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 80287 US 8,067,168 B2 69 - Continued

<4 OOs, SEQUENCE: 23 gctic Cagaca cagagaaagg titt Cttaa.ca ct caggttcg Cttct Cota a ggtgttgttgcc 6 O

<210s, SEQ ID NO 24 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1g O2345

<4 OOs, SEQUENCE: 24 ttittgacctg. tatttgttg tcc.ggcagct ttcagtgtcg gttttacgag gtagagtgat 6 O

<210s, SEQ ID NO 25 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 3 6172

<4 OOs, SEQUENCE: 25 titt.cct agtic gatcc.ca.gct tct ct aggga gtgtcaggcg cacac agggit taagttagtt 6 O

<210s, SEQ ID NO 26 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 70459

<4 OOs, SEQUENCE: 26 tttic catgac agaggttt Caggat Ctctta gaaggaagaa agtagagacg gtgaagagct 6 O

<210s, SEQ ID NO 27 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 105937 <4 OOs, SEQUENCE: 27 atcc to agga gtaataac cc ctic cqattgt togcticagg to cct coct citt gaaattic ctd 6 O

<210s, SEQ ID NO 28 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 89.099

<4 OOs, SEQUENCE: 28 ggct cqgaaa goctt Catag taagggcgca gttalagacitt ggatttic ctg C cattacaca 6 O

<210s, SEQ ID NO 29 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1g O3099

<4 OOs, SEQUENCE: 29 US 8,067,168 B2 71 72 - Continued

Catgcaga aa gctgggcagc atagaaagtt cacagcc acg gaaagat caa agagatggtg 6 O

<210s, SEQ ID NO 3 O &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p. 67625

<4 OOs, SEQUENCE: 30 tittgc.cat ct c tacagattt cac catct ct citt coccitct c cc cct cott cqctitt.cctic 6 O

<210s, SEQ ID NO 31 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1g OO218

<4 OOs, SEQUENCE: 31 titccatcttt tdtgg.cgcga aaata accct ttgct cocto gttggittttgttgaggttga 6 O

<210s, SEQ ID NO 32 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 12535

<4 OOs, SEQUENCE: 32 tgct at Cttg tacctaaact gagcc cttitt ggtggaggca gtgagggitt C ttgttgttgttc 6 O

<210s, SEQ ID NO 33 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 105474

<4 OOs, SEQUENCE: 33 ttct cott cq agcatatt.cg ggaagggaag tittgaagagt gag to cctgt gagggcc.gtg 6 O

<210s, SEQ ID NO 34 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 74707

<4 OOs, SEQUENCE: 34 ggccttgagg gcaagacgag gaattitcgac ttagg to Ctt gaatctggag agctacagaa 6 O

<210s, SEQ ID NO 35 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 93325

<4 OOs, SEQUENCE: 35 tggagcacac aggcagoatt acgc.cattct tcc ttcttgg aaaaatc cct cagccittata 6 O US 8,067,168 B2 73 74 - Continued

<210s, SEQ ID NO 36 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 10116.1

<4 OOs, SEQUENCE: 36 agagagataa agagagatga cct cagagac aaa.gagactic aga.cccagcc agaggcc caa 6 O

<210s, SEQ ID NO 37 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 1.01251

<4 OO > SEQUENCE: 37 tatt Ctttgt toccatttgttct agggact gtctg.cgtgg Ctgttgacitat gagtgtcagc 6 O

<210s, SEQ ID NO 38 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 69214

<4 OOs, SEQUENCE: 38 acaacctggg aactgaataa ctittcaaag.c cagtgct cag cittctictogct c cqtact agc 6 O

<210s, SEQ ID NO 39 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p. 88517

<4 OOs, SEQUENCE: 39

Ctgtgtcagg tatttalacag ttctggggac acgggtgtta cct cotttca tdgtgct ct c 6 O

<210s, SEQ ID NO 4 O &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 103824

<4 OOs, SEQUENCE: 4 O agacitcgctg titc.ccgactg. tcgct cocta gctctgatga aaccoccaca tttctitt cag 6 O

<210s, SEQ ID NO 41 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 108445

<4 OOs, SEQUENCE: 41 gtgttcttgg cacatggtag gCatctgtct ttgttgggca gttgcatcag aagggittaag 6 O US 8,067,168 B2 75 - Continued <210s, SEQ ID NO 42 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1g 02210

<4 OOs, SEQUENCE: 42 ctgcact citc ggcttctittctgtggct tcc ct citttittct citt caccitct gttitt cagga 6 O

<210s, SEQ ID NO 43 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 103872

<4 OOs, SEQUENCE: 43 gctaaaatcc ticcitggcc cc at catttctt gggtc.ctitt C cagacagtgc tigtgtc.ttta 6 O

<210s, SEQ ID NO 44 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 56.412

<4 OOs, SEQUENCE: 44 gggatgtgtg cct coaactt Cattaagtgagggaaac att totggggot tagt cagggag 6 O

<210s, SEQ ID NO 45 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 18292

<4 OOs, SEQUENCE: 45 tittggtttgt cacctgtgag ttgcc tactg gacacaaaga tigaagctgtc. gggaaagtga 6 O

<210s, SEQ ID NO 46 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 12O75

<4 OOs, SEQUENCE: 46 gttgat Cttic alacatggcta accaaatgga catcagcag goaga cacga ggt attitt Ca 6 O

<210s, SEQ ID NO 47 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 22519

<4 OOs, SEQUENCE: 47 tgtttggatg atgggacaca t caccCtggg aactgtctica aag caca acc acatcttagg 6 O

<210s, SEQ ID NO 48 &211s LENGTH: 60 US 8,067,168 B2 77 78 - Continued

&212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 29531

<4 OOs, SEQUENCE: 48 cctic cttctic cct cagagta cagttcaact cittittaa.gag gaaagcc act gaatgaacct 6 O

<210s, SEQ ID NO 49 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 58853

<4 OOs, SEQUENCE: 49 ggggaggg to tctgcaccitt toctogacatc ttitt citt cqg gagat cotca tagaaccata 6 O

<210s, SEQ ID NO 50 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 35052

<4 OOs, SEQUENCE: 50 gaga caacgc cct cagaaat gagagaacag taccct citta t cottgctgc actitt coagc 6 O

<210s, SEQ ID NO 51 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 67002

<4 OOs, SEQUENCE: 51 atgctic tittg togcc.ca.gact gcc ct ctata aat cago act atcaa.ccctg. tccagagact 6 O

<210s, SEQ ID NO 52 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 45580

<4 OOs, SEQUENCE: 52 gccatctgca tag cctaact citgccaccct ggcttggaca attatggtga titt citctgat 6 O

<210s, SEQ ID NO 53 &211s LENGTH: 60 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 12646

<4 OOs, SEQUENCE: 53 agggcct cita gtttgagttt togc catct at taggatagaa agcacacagt ttgcc tigatt 6 O

<210s, SEQ ID NO 54 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence US 8,067,168 B2 79 80 - Continued

FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1g OO681 left PCR primer

SEQUENCE: 54 agat.cgc.cga gctgtcgtag tag 23

SEO ID NO 55 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 3.9189 left PCR primer

SEQUENCE: 55 cgttct coat coagttc.cat citc 23

SEO ID NO 56 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1g OO644 left PCR primer

SEQUENCE: 56 tat cat catt agaccc.ggga tigg 23

SEO ID NO 57 LENGTH: 24 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p. 81674 left PCR primer

SEQUENCE: 57 titcggit ct co tocc catttt cota 24

SEO ID NO 58 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 81149 left PCR primer

SEQUENCE: 58 gcaaat aggt caatgctggg aac 23

SEO ID NO 59 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 83841 left PCR primer

SEQUENCE: 59 aagttcc.caa gcacgaagtg titc 23

SEQ ID NO 60 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier US 8,067,168 B2 81 82 - Continued ha1p 38705 left PCR primer

< 4 OOs SEQUENCE: 6 O tccagg tagt catttctgaa gCC 23

SEQ ID NO 61 LENGTH 19 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 4O164 left PCR primer

SEQUENCE: 61 gccc.gc.cctg caaccalacc 19

SEQ ID NO 62 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p. 23178 left PCR primer

SEQUENCE: 62 ttct cagagt gttggit accg cag 23

SEQ ID NO 63 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 46057 left PCR primer

SEQUENCE: 63 ggttgcCagg acacaaagta agc 23

SEQ ID NO 64 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 4O959 left PCR primer

SEQUENCE: 64 t cagttgaga gct cagagca agc 23

SEO ID NO 65 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 104423 left PCR primer

SEQUENCE: 65 tgatctgttt cot cacticgc titg 23

SEQ ID NO 66 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1g OO847 left PCR primer US 8,067,168 B2 83 - Continued <4 OOs, SEQUENCE: 66 atgggtggat gatggatag atg

<210s, SEQ ID NO 67 &211s LENGTH: 24 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p O8347 left PCR primer

<4 OO > SEQUENCE: 67 ggaggactgc cccatattitt cact

<210s, SEQ ID NO 68 &211s LENGTH: 21 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1g O2416 left PCR primer

<4 OOs, SEQUENCE: 68 gcgc.ccc.ggg acgaggtgga C

<210s, SEQ ID NO 69 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 875.40 left PCR primer

<4 OOs, SEQUENCE: 69 gttaatggag ggtgagggitt to c

<210s, SEQ ID NO 70 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 1101.07 left PCR primer

<4 OO > SEQUENCE: 7 O gctgggtgaa tagt cacgga atc

<210s, SEQ ID NO 71 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p. 89799 left PCR primer

<4 OOs, SEQUENCE: 71 aacgct ctitc ctic caaagag gtc

<210s, SEQ ID NO 72 &211s LENGTH: 27 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 45173 left PCR primer

<4 OOs, SEQUENCE: 72 US 8,067,168 B2 85 86 - Continued gttcttctaa gott catt col acaagag 27

<210s, SEQ ID NO 73 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 80771 left PCR primer

<4 OO > SEQUENCE: 73 ggtgaaat at gtgcggactg atg 23

<210s, SEQ ID NO 74 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 694O7 left PCR primer

<4 OOs, SEQUENCE: 74 ct catt catg cataggtoac act 23

<210s, SEQ ID NO 75 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p O5406 left PCR primer

<4 OO > SEQUENCE: 75 citcc titcc.ca citaggctgca gag 23

<210s, SEQ ID NO 76 &211s LENGTH: 22 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 80287 left PCR primer

<4 OO > SEQUENCE: 76 aggcctggga ggtgactic at ag 22

<210s, SEQ ID NO 77 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1g O2345 left PCR primer

<4 OO > SEQUENCE: 77 gtagatgcgg aaattggcct cag 23

<210s, SEQ ID NO 78 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 3 6172 left PCR primer

<4 OO > SEQUENCE: 78 cctaga cago alacacaccca citg 23 US 8,067,168 B2 87 88 - Continued

<210s, SEQ ID NO 79 &211s LENGTH: 24 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 70459 left PCR primer

<4 OO > SEQUENCE: 79 ttcaccgtct c tactittctt cott 24

<210s, SEQ ID NO 8O &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 105937 left PCR primer

<4 OOs, SEQUENCE: 80 ctictaalacac togcct ctac cc.g 23

<210s, SEQ ID NO 81 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 89.099 left PCR primer

<4 OOs, SEQUENCE: 81 tagc ccttga cagcagttgt gac 23

<210s, SEQ ID NO 82 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1g O3099 left PCR primer

<4 OOs, SEQUENCE: 82 agagcc acgt gagctgcatt aac 23

<210s, SEQ ID NO 83 &211s LENGTH: 22 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p. 67625 left PCR primer

<4 OOs, SEQUENCE: 83

Cctggagt cc tagaga.gc.ct cq 22

<210s, SEQ ID NO 84 &211s LENGTH: 21 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1g OO218 left PCR primer

<4 OOs, SEQUENCE: 84 gcc.gc.gc.cac cccacctgag t 21

<210s, SEQ ID NO 85 US 8,067,168 B2 89 90 - Continued

&211s LENGTH: 21 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 12535 left PCR primer

<4 OOs, SEQUENCE: 85 agtgttgggit gctgggagga g 21

<210s, SEQ ID NO 86 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 105474 left PCR primer

<4 OOs, SEQUENCE: 86 citctgggctt tot ctagott coc 23

<210s, SEQ ID NO 87 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 74707 left PCR primer

<4 OO > SEQUENCE: 87 agtttgtgag ct caggagaa gC9 23

<210s, SEQ ID NO 88 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 93325 left PCR primer

<4 OOs, SEQUENCE: 88

Ccagtic cc at agtggaaatg Ctic 23

<210s, SEQ ID NO 89 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 101161 left PCR primer

<4 OOs, SEQUENCE: 89 actaccttgg gtgtcagt cc tic 23

<210s, SEQ ID NO 90 &211s LENGTH: 24 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 1.01251 left PCR primer

<4 OOs, SEQUENCE: 90 tggacacatgtctacacgc taag 24

<210s, SEQ ID NO 91 &211s LENGTH: 23 &212s. TYPE: DNA US 8,067,168 B2 91 92 - Continued ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 69214 left PCR primer

SEQUENCE: 91 aggit cactgc agaaggatgg aac 23

SEQ ID NO 92 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p. 88517 left PCR primer

SEQUENCE: 92 gggact catt caatgtcaag tic 23

SEO ID NO 93 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 103824 left PCR primer

SEQUENCE: 93 t ctittagacg ctg.cgct citt agc 23

SEQ ID NO 94 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 108445 left PCR primer

SEQUENCE: 94

Catctgtc.tt tttgggcag ttg 23

SEO ID NO 95 LENGTH: 21 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1g 02210 left PCR primer

SEQUENCE: 95 cc caagtact cqgcactgca c 21

SEO ID NO 96 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 103872 left PCR primer

SEQUENCE: 96 tgagattcaa titt citcgc.cc titc 23

SEO ID NO 97 LENGTH: 18 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: US 8,067,168 B2 93 94 - Continued OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 56.412 left PCR primer

SEQUENCE: 97 tgc.cggtctg. Ctctgctic 18

SEO ID NO 98 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 18292 left PCR primer

SEQUENCE: 98 ttct cotggg tottt coaaa cag 23

SEO ID NO 99 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 12075 left PCR primer

SEQUENCE: 99 gtttgccalaa tacagctgt ttg 23

SEQ ID NO 100 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 22519 left PCR primer

SEQUENCE: 1.OO citcacgittaa toaaccogag to c 23

SEQ ID NO 101 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 295.31 left PCR primer

SEQUENCE: 101

CtcCagdact ttgggaatga aag 23

SEQ ID NO 102 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 58853 left PCR primer

SEQUENCE: 1 O2 atgtcaggaa aggtgcagag acc 23

SEQ ID NO 103 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 35052 left PCR primer US 8,067,168 B2 95 96 - Continued

< 4 OOs SEQUENCE: 103 atct tcc.cag tagggctgaa toc 23

SEQ ID NO 104 LENGTH: 24 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p. 67002 left PCR primer

SEQUENCE: 104

Cagtgctc.cg tdt CCC caag tagt 24

SEO ID NO 105 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 45580 left PCR primer

SEQUENCE: 105 gggtgcagga tigaagactag Ctg 23

SEQ ID NO 106 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 12646 left PCR primer

SEQUENCE: 106 gcct taggaa titt coat coc aac 23

SEO ID NO 107 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1g OO681 right PCR primer

SEQUENCE: 107 aggccaagga gcagaalacta agc 23

SEQ ID NO 108 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 3.9189 right PCR primer

SEQUENCE: 108 accgagtgat acaaaggact ggg 23

SEQ ID NO 109 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1g OO644 right PCR primer

SEQUENCE: 109 US 8,067,168 B2 97 98 - Continued gggittaatgg agc act acat gcc 23

<210s, SEQ ID NO 110 &211s LENGTH: 24 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p. 81674 right PCR primer

<4 OOs, SEQUENCE: 110 acggc.cccag agcagcagca agac 24

<210s, SEQ ID NO 111 &211s LENGTH: 19 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 81149 right PCR primer

<4 OOs, SEQUENCE: 111 atgcctg.ccc taccacac 19

<210s, SEQ ID NO 112 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 83841 right PCR primer

<4 OOs, SEQUENCE: 112 ttagaaccag gtc.tctgcct togc 23

<210s, SEQ ID NO 113 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 38705 right PCR primer

<4 OOs, SEQUENCE: 113 cct tcc ttct ctdcctitt ca atg 23

<210s, SEQ ID NO 114 &211s LENGTH: 19 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 4O164 right PCR primer

<4 OOs, SEQUENCE: 114 tgcc ct cocc agcgt.ctitt 19

<210s, SEQ ID NO 115 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p. 23178 right PCR primer

<4 OOs, SEQUENCE: 115 gaataaagac atgcca aggc cag 23 US 8,067,168 B2 99 100 - Continued

<210s, SEQ ID NO 116 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 46057 right PCR primer

<4 OOs, SEQUENCE: 116

Catggaagca taalgac catg Ctg 23

<210s, SEQ ID NO 117 &211s LENGTH: 18 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 4O959 right PCR primer

<4 OOs, SEQUENCE: 117 actitcc.ggag acago.cgc 18

<210s, SEQ ID NO 118 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 104423 right PCR primer

<4 OOs, SEQUENCE: 118 gcgattgttc tacgaaagtg tdg 23

<210s, SEQ ID NO 119 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1g OO847 right PCR primer

<4 OOs, SEQUENCE: 119 gggcggtgat catttagttt Ctg 23

<210s, SEQ ID NO 120 &211s LENGTH: 24 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 08347 right PCR primer

<4 OOs, SEQUENCE: 120 ctggctgtcc cct atcgtaa cctic 24

<210s, SEQ ID NO 121 &211s LENGTH: 22 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1g O2416 right PCR primer

<4 OOs, SEQUENCE: 121

22 US 8,067,168 B2 101 102 - Continued <210s, SEQ ID NO 122 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 87540 right PCR primer

<4 OOs, SEQUENCE: 122 gttcattccc ticcaac attc ctic 23

<210s, SEQ ID NO 123 &211s LENGTH: 22 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 1101.07 right PCR primer

<4 OOs, SEQUENCE: 123

Caaggcatag tactCctc.cg gg 22

<210s, SEQ ID NO 124 &211s LENGTH: 21 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p. 89799 right PCR primer

<4 OOs, SEQUENCE: 124 gctgtgagag gag C9gaaga g 21

<210s, SEQ ID NO 125 &211s LENGTH: 21 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 45173 right PCR primer

<4 OOs, SEQUENCE: 125 gggitta aggc tiggactictgg C 21

<210s, SEQ ID NO 126 &211s LENGTH: 19 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 80771 right PCR primer

<4 OOs, SEQUENCE: 126 c cctitt catt cacctgg.cg 19

<210s, SEQ ID NO 127 &211s LENGTH: 21 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 694O7 right PCR primer

<4 OOs, SEQUENCE: 127 tggggaggag caatagagat a 21

<210s, SEQ ID NO 128 &211s LENGTH: 23 US 8,067,168 B2 103 104 - Continued

&212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p O5406 right PCR primer

<4 OOs, SEQUENCE: 128 aaacaaccac togcticcitgtc. tcc 23

<210s, SEQ ID NO 129 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 80287 right PCR primer

<4 OOs, SEQUENCE: 129 taaggtgtgt gcc taaccca toc 23

<210s, SEQ ID NO 130 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1g O2345 right PCR primer

<4 OOs, SEQUENCE: 130 at atttgagc ct cittgcc ct tcc 23

<210s, SEQ ID NO 131 &211s LENGTH: 22 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 3 6172 right PCR primer

<4 OOs, SEQUENCE: 131 gtc.ttct cot togcaatgggc tic 22

<210s, SEQ ID NO 132 &211s LENGTH: 24 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 70459 right PCR primer

<4 OOs, SEQUENCE: 132 gttgctgttc ttatcc cc at tcta 24

<210s, SEQ ID NO 133 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 105937 right PCR primer

<4 OOs, SEQUENCE: 133 cc catcCtgt agacagat.ca ggg 23

<210s, SEQ ID NO 134 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence US 8,067,168 B2 105 106 - Continued

FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 89.099 right PCR primer

SEQUENCE: 134 gactaatgcc catgacccag aag 23

SEO ID NO 135 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1g O3099 right PCR primer

SEQUENCE: 135

Catgcttaga ggtgaacgtg tdg 23

SEQ ID NO 136 LENGTH 19 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p. 67625 right PCR primer

SEQUENCE: 136 gtgcagggtg ggtgagagg 19

SEO ID NO 137 LENGTH: 22 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1g OO218 right PCR primer

SEQUENCE: 137 gcggct Cogg ggc caatgaa t 22

SEQ ID NO 138 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 12535 right PCR primer

SEQUENCE: 138 tgagggttct ttgttgttct tcg 23

SEQ ID NO 139 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 105474 right PCR primer

SEQUENCE: 139 gggtgctic ct tccatctaca atg 23

SEQ ID NO 140 LENGTH: 2O TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier US 8,067,168 B2 107 108 - Continued ha1p 74707 right PCR primer

<4 OOs, SEQUENCE: 140 ggcgttctgc tittgggagac

<210s, SEQ ID NO 141 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 93325 right PCR primer

<4 OOs, SEQUENCE: 141

CtcCatgagg taggtgaa gac 23

<210s, SEQ ID NO 142 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 101161 right PCR primer

<4 OOs, SEQUENCE: 142 gttgaagtga gcagaggaca to 23

<210s, SEQ ID NO 143 &211s LENGTH: 24 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 1.01251 right PCR primer

<4 OOs, SEQUENCE: 143 tgtctgtggg tatt citttgt togcc 24

<210s, SEQ ID NO 144 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 69214 right PCR primer

<4 OOs, SEQUENCE: 144 atgcct agcc agt caggaat cag 23

<210s, SEQ ID NO 145 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 88517 right PCR primer

<4 OOs, SEQUENCE: 145 ggtgttacct cottt catgg togc 23

<210s, SEQ ID NO 146 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 103824 right PCR primer US 8,067,168 B2 109 110 - Continued <4 OOs, SEQUENCE: 146 aaacgt.ccgc tact cittgag cac 23

<210s, SEQ ID NO 147 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 108445 right PCR primer

<4 OOs, SEQUENCE: 147 ataa cacttg cacaccacca ccc 23

<210s, SEQ ID NO 148 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1g 02210 right PCR primer

<4 OOs, SEQUENCE: 148 agagggaagc cacagaaaga agc 23

<210s, SEQ ID NO 149 &211s LENGTH: 21 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 103872 right PCR primer

<4 OOs, SEQUENCE: 149 agggatggac Caagagggala C 21

<210s, SEQ ID NO 150 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 56.412 right PCR primer

<4 OOs, SEQUENCE: 150 taactic cacc ctd tacctgg ctic 23

<210s, SEQ ID NO 151 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 18292 right PCR primer

<4 OOs, SEQUENCE: 151 ttagtgacca ggtgatggtg gtg 23

<210s, SEQ ID NO 152 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 12075 right PCR primer

<4 OOs, SEQUENCE: 152 US 8,067,168 B2 111 112 - Continued

Ctcaggaatt gcago Cttga gac 23

<210s, SEQ ID NO 153 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 22519 right PCR primer

<4 OOs, SEQUENCE: 153 atcCtgcaat gtttggatga tigg 23

<210s, SEQ ID NO 154 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 29531 right PCR primer

<4 OOs, SEQUENCE: 154 t caccct citt aatgtcagct c cc 23

<210s, SEQ ID NO 155 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 58853 right PCR primer

<4 OO > SEQUENCE: 155 caggctgttct ct cotagca atg 23

<210s, SEQ ID NO 156 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 35052 right PCR primer

<4 OOs, SEQUENCE: 156 ggcatcaaac tdaaatcc to citg 23

<210s, SEQ ID NO 157 &211s LENGTH: 24 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 67002 right PCR primer

<4 OO > SEQUENCE: 157

Cagt cc.gatt tdalaggtocc agtg 24

<210s, SEQ ID NO 158 &211s LENGTH: 23 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 45580 right PCR primer

<4 OOs, SEQUENCE: 158

Cagggcagtic actgagaatg agg 23

US 8,067,168 B2 121 122 - Continued

SEO ID NO 171 LENGTH: 413 TYPE: DNA ORGANISM: Homo sapiens FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 104423 amplicon

SEQUENCE: 171 tgatctgttt cot cacticgc titgct cqtgg atgtcattitt citgtc.ttctt ggggg.cggcc 6 O aaaaatcgac C9gtgtcggg gaccagaggc ggcc.ccgcac gcc.ccc.gcgt gtgcgt.ccac 12 O gggcgtctgt gcagacggac actgtgc.cgg ggcgagctga Caggagttca C9gctg.cgat 18O agaacatgga gatgtcatgg gcgcgacaga gcctg.cggg gataccagca gcgtgtgttgt 24 O gtggacggca acgttgttctg. tcc.gc.gtgttgttgagtgag tagggagag agaga.gagaa 3OO taggtgttgttg tagaggct co cqgtgcct ct gtctggctgc tigaggctgag atgggagcaa. 360 gtggctggcg aagctggtgg tottcaaa C cacactitt C gtagaacaat CC 413

SEO ID NO 172 LENGTH: 560 TYPE: DNA ORGANISM: Homo sapiens FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1g OO847 amplicon

SEQUENCE: 172 atgggtggat ggatggatag atgaatgggt ggatggatgg agggatggat ggacggacgg 6 O acggacagat atatggatgg atgcattatg gatggatgca tacggacg gacggatgca 12 O cggagagatg gatggatgca tatggata gatagacgga tiggacggacg gatagataga 18O tggatgc.cgg aaaggagaag aatgagagac ggatgtgaga Ctagatgcac gcaggcc aga 24 O gcaccagcat acgctggaac agagcacgag catacact.cg alacacacgcg cacacactica 3OO ggaCatctgc gcacagacat acaatcct Co gcgt.ccgctt C cacgcaggc attcgc.gcac 360 Ctacatacac gcagttgcac gcgc.gc.gcac acacacaggc acacacacgc agaCatgcac acacacgcag acatgcacac acaccacaca tdcagacatg cacacacacg cagacatgca cacacacacic acacatacac acacacacac acacagttcg cct ct coct g g gtttct cag 54 O aaactaaatg at caccgc.cc 560

SEO ID NO 173 LENGTH: 430 TYPE: DNA ORGANISM: Homo sapiens FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p O8347 amplicon

SEQUENCE: 173 ggaggactgc cccatattitt cactic ccaaa aaccoct tag atgactic cct gcct caccc.c 6 O cgcc.ccc.cag gttctgaaag agcct tcc.cg ccagacitgca ttgattalacc att cattgcc 12 O c catttittta ttaatcaaag acatatataa ttgct catcg gagcttgttga t cagogtgag 18O gcct tact aa goagctgcct tactatoctt coag.cccaga gcacgtgagc tigacgtc.ttic 24 O tittggcctgt gtggcc.gttt ccttgccalaa agct cagttt ggggaga.gct tcttgcgitat 3OO tagatgcagt ctgcagactic cca accc.cag ctacctggat CCC ctgaggg CCC aggaact 360 c cagctatt c caa.gcc cact c ct ctitttitt ttalagaggaa gaaatagagg ttacgatagg

US 8,067,168 B2 131 132 - Continued ttcaccgt.ct c tactittctt cottctaaga gatcc tdaaa cct citgtcat ggaaaagtac 6 O cacgtgttgg agatgattgg agaaggct ct tttgggaggg ttacaaggg togaagaaaa 12 O tacagtgctic aggtgttgca caaagaggga tacct tttgg gtgggatttg gtaccCC caa 18O citccagtgga aaatggat ct agaaggaatg tatttatacc agtttgtatt cotaagg tac 24 O tgactic cct c at actt citta toggagtataa gotttgaagit toggattattt gggtgcaaat 3OO tccatctoca cc.gctttcta gcc ctagaat tatggacaaa ttact taact tcc ctatgtc 360 t caattitcct tat citctaga atggggataa gaacagdaac 4 OO

SEQ ID NO 186 LENGTH: 408 TYPE: DNA ORGANISM: Homo sapiens FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 105937 amplicon

SEQUENCE: 186 ctictaalacac togcct ctac cc.gcc.gc.ccc gcgaaccc.ca cacactgcag acgcgacact 6 O cgcaagtttic ggggatggcg gcc.ggcgagg gcc at actgc gtctitt.ccgg agacacggaa 12 O tacggcacca gcc.gtc.cctt tatgatgcaa tatgtctgcg CCC aggggac gcttgctggg 18O agcago catt ttcaac ccta citgcc.gtaga gcaggcggag tocct cittitt cqc.gc.cttaa 24 O gacagg tagg ttctgacgat gaaaa.gcaat tdaaaacgac ccattt cacc ctitttitccag 3OO tccacgtgaa citgctagatc ttggctttgc aac attagcc agggg.cgct a cataaactgc 360 ttagtttct c aaaggcticaa gCCtgcc.ctg atctgtctac aggatggg 408

SEO ID NO 187 LENGTH: 591 TYPE: DNA ORGANISM: Homo sapiens FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 89.099 amplicon

SEQUENCE: 187 tagc ccttga cagcagttgt gacggggagg attggaccgc ccggattagg gctaattaag 6 O ggtggatggg tag.cgggcgg galagtgcgg gttggctaaat aggtgcCtga talacticagtt 12 O aact tcc.ctic ttggcttitt c cctittgacct taacacttitt ggggittatct c to aggcgaa 18O tgctaaagga gacgct coag gactic gacct Ctgaaggtoc ttggagccala titc.cgtaata 24 O tgat catgga aactgat cat togcctgat cottctaccgcc ttgg.cgc.gct ttcttgagga 3OO atgtctttgg ttaatggctt cacggc.catg gaggtgacat catgtggaca acaggctagg 360 atgc.ca.gagt tagctgct cq gtggagat catttgaggltda gcagaggc.ca catatt at a gatact alaca gaccc.cgata tigggagalaa agcaaaagca ggagcctgaa tat coagtgc tttgtgagct gtcagttctg ttgtaatgg Caggaaatcc aagttcttaac togc cctta 54 O

Ctatgaaggc titt CC9agcc agccacagct tctgggt cat gggcattagt c 591

SEQ ID NO 188 LENGTH: 470 TYPE: DNA ORGANISM: Homo sapiens FEATURE: OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1g O3099 amplicon

US 8,067,168 B2 147 148 - Continued aatticagdac taggitt catt cagtggctitt cotcttaaaa gagttgaact gtact ctdag ggagaaggag gaaaaaattt atgggagctg acattaa.gag ggtga 525

<210s, SEQ ID NO 208 &211s LENGTH: 507 212. TYPE : DNA <213> ORGANISM: Homo sapiens 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 58853 amplicon

<4 OOs, SEQUENCE: 208 atgtcaggaa aggtgcagag acc ct cocco aca Caccaca ttgttgttcc at cottct ct 6 O gtgc ctittgg gcaaggactic tcc gttctgt agggaccacc aggtggaatt aaagt ctaca 12 O citcc to caag agctgat citt gggg.cggCCC cacct coatg CCCCtectaca gcgtgcc att 18O ct catagaac acactaggac citttgtcc to tggagctgtt Cagtgcagca gctctgacct 24 O cat cottctg Cagaa.gc.ctic cacott ct ct c cc ct ct citc. ctic ct gcgct ttgtgtgtc.c 3OO tgttct tcca Cttctggtgac citgtctocto c cctaatctg gct cagaga.g ggg taccagc 360 tgctgctgct gctattgctt cittcttctgt taaaggttitt t tattitt titt c caatgacaa agctatgctic attctgaaaa catgaaaaat aaaaatgctic aaaaaataaa. act Cact Cta catt cattgc taggagagaa Cagcctg

<210s, SEQ ID NO 209 &211s LENGTH: 526 212. TYPE : DNA <213> ORGANISM: Homo sapiens 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 35052 amplicon

<4 OOs, SEQUENCE: 209 atct tcc.cag tagggctgaa tcc tagacca atctatocaat cc.ca.gactaa t caggcattt 6 O gcctggggat atgcatctitt ggcatttitt c Caagggttca t caggatgga gat at CC9gt 12 O gcac catgag ttctgtttco ttaat caa.ca cc.gttgtaac ttgcc catcc agttttgttga 18O

Cattaatt Ca aac ctdtgcc c tagt cct ct tittaggcagc gtat cagtgc tggaaagtgc 24 O agcaaggata agaggg tact gttct ct cat ttctgagggc gttgtctica taattalacta 3OO acttgataga Ctttittagtg agtggCaggit gagatgcaag gtact.gtgct aggtgctgttg 360 ggggatgtac agacaaacaa Cacac CtcCC taaggaggta agtaatagot act tact att cactittgctic titt cactgta atgitat cotc aag cacaggt titt Cactaca c catcaggcc

Cagaagtact agct tattitt ccacaggagg attt cagttt gatgcc 526

<210s, SEQ ID NO 210 &211s LENGTH: 451 212. TYPE : DNA <213> ORGANISM: Homo sapiens 22 Os. FEATURE: <223> OTHER INFORMATION: OGHAv1. O microarray feature identifier ha1p 67002 amplicon

<4 OOs, SEQUENCE: 210

Cagtgctc.cg tdt CCC caag tag titatic cc toccc.cggala gactgaagtic cctgggg.cgg 6 O gtaggagcgc acgctaggag tag tatgaat aaagtgttitt Cttggaggt C atgggcaagt 12 O

Ctctggacag ggttgatagt gctgattt at agagggcagt ctgggcacaa agagcattta 18O