US009234244B2

(12) United States Patent
Zeiger et al.
(10) Patent No.: US 9,234,244 B2
(45) Date of Patent: Jan. 12, 2016

(54) DIAGNOSTIC TOOL FOR DIAGNOSING BENIGN VERSUS MALIGNANT THYROID LESIONS

(75) Inventors: Martha Allen Zeiger, Bethesda, MD (US); Nijaguna B. Prasad, Columbia, MD (US); Steven K. Libutti, Potomac, MD (US); Christopher B. Umbricht, Towson, MD (US)

(73) Assignees: The United States of America, as represented by the Secretary, Department of Health and Human Services, Washington, DC (US); The Johns Hopkins University, Baltimore, MD (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 608 days.

(21) Appl. No.: 12/675,209

(22) PCT Filed: Aug. 27, 2008

(86) PCT No.: PCT/US2008/010139
§ 371 (c)(1), (2), (4) Date: Jul. 26, 2010

(87) PCT Pub. No.: WO2009/029266
PCT Pub. Date: Mar. 5, 2009

(65) Prior Publication Data
US 2010/0285979 A1 Nov. 11, 2010

Related U.S. Application Data
(60) Provisional application No. 60/966,271, filed on Aug. 27, 2007.

(51) Int. Cl.
C12Q 1/68 (2006.01)

(52) U.S. Cl.
CPC: C12Q 1/6886 (2013.01); C12Q 2600/106 (2013.01); C12Q 2600/112 (2013.01); C12Q 2600/136 (2013.01)

(58) Field of Classification Search
None
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
4,683,195 A 7/1987 Mullis et al.
4,683,202 A 7/1987 Mullis
4,965,188 A 10/1990 Mullis et al.
5,556,752 A 9/1996 Lockhart et al.
5,578,832 A 11/1996 Trulson et al.
5,593,839 A 1/1997 Hubbell et al.
5,599,695 A 2/1997 Pease et al.
5,631,734 A 5/1997 Stern et al.
2007/0037186 A1 * 2/2007 Jiang et al. ...... 435/6

FOREIGN PATENT DOCUMENTS
EP 0721016 7/1996
EP 0728520 5/2001
EP 0799897 6/2006
WO WO 95/22058 8/1995
WO WO 97/02357 1/1997
WO WO 97/27317 7/1997
WO WO 97/29212 8/1997
WO WO 2005100608 * 10/2005

OTHER PUBLICATIONS
Promega Catalog. 1997, p. 78.*
Rosen et al. Surgery. 2005. 138: 1050-1057.*
Liu et al. Clinical Immunology. 2004. 112: 225-230.*
Coleman, R. Drug Discovery Today. 2003. 8: 233-235.*
Saetre et al. Molecular Brain Research. 2004. 126: 198-206.
Ito et al. Anticancer Research. 2002. 22(4): 2385-2389.*
Hanke et al. Clinical Chemistry. 2007. 53: 2070-2077.*
Palmer et al. BMC Genomics. 2006. 7: 115.
Min et al. BMC Genomics. 2010. 11: 96.
GeneCard for HMGA2, available via url: , printed Jul. 17, 2012.*
Ito et al. Cancer Letters. 2003. 200: 161-166.
Belge et al. Genes, Chromosomes and Cancer. Oct. 2007. 47: 56-63.*
Free Dictionary definition for "measuring", available via url: <thefreedictionary.com/measuring>, printed on Mar. 18, 2014.*
Barden et al., "Classification of follicular thyroid tumors by molecular signature: results of gene profiling", Clin. Cancer Res., May 2003, 9(5), 1792-1800.
Berlingieri et al., "Inhibition of HMGI-C protein synthesis suppresses retrovirally induced neoplastic transformation of rat thyroid cells", Molec Cell Biol. 2005:15, p. 1545.
Caraway et al., "Diagnostic pitfalls in thyroid fine-needle aspiration: a review of 394 cases", Diagn. Cytopathol., 1993, 9(3), 345-350.
Cerutti et al., "Molecular profiling of matched samples identifies biomarkers of papillary thyroid carcinoma lymph node metastasis", Cancer Res., Aug. 15, 2007, 67(16), 7885-7892.
Chee et al., "Accessing genetic information with high-density DNA arrays", Science, Oct. 25, 1996, 274(5287), 610-614.
Dudoit et al., "Comparison of discrimination methods for classification of tumors using DNA microarrays", Journal of the American Statistical Association, 2002, 97, 77-87.
Eberwine, "Amplification of mRNA populations using aRNA generated from immobilized oligo(dT)-T7 primed cDNA", Biotechniques, Apr. 1996, 20(4), 584-591.
Fedor et al., "Practical methods for tissue microarray construction", Methods Mol Med., 2005, 103, 89-101.
Finley et al., "Advancing the molecular diagnosis of thyroid nodules: defining benign lesions by molecular profiling", Thyroid, Jun. 15, 2005, (6), 562-568.
Gharib et al., "Fine-needle aspiration biopsy of the thyroid. The problem of suspicious cytologic findings", Ann Intern Med., Jul. 1984, 101(1), 25-28.
Goellner et al., "Fine needle aspiration cytology of the thyroid 1980 to 1986", Acta Cytol., Sep.-Oct. 1987, 31(5), 587-590.
Goellner, "Problems and pitfalls in thyroid cytology", Monogr Pathol., 1997, (39), 75-93.
(Continued)

Primary Examiner: Carla Myers

(57) ABSTRACT
The present invention relates to the use of genes differentially expressed in benign thyroid lesions and malignant thyroid lesions for the diagnosis and staging of thyroid cancer.

6 Claims, 24 Drawing Sheets

(56) References Cited

OTHER PUBLICATIONS
Golub et al., "Molecular classification of cancer: class discovery and class prediction by gene expression monitoring", Science, Oct. 15, 1999, 286(5439), 531-537.
Gordon et al., "Using gene expression ratios to predict outcome among patients with mesothelioma", J. Natl. Cancer Inst., Apr. 16, 2003, 95(8), 598-605.
Hacia et al., "Detection of heterozygous mutations in BRCA1 using high density oligonucleotide arrays and two-colour fluorescence analysis", Nat Genet., Dec. 1996, 14(4), 441-447.
Hamberger et al., "Fine-needle aspiration biopsy of thyroid nodules. Impact on thyroid practice and cost of care", Am. J. Med., Sep. 1982, 73(3), 381-384.
Huang et al., "Gene expression in papillary thyroid carcinoma reveals highly consistent profiles", PNAS USA, Dec. 18, 2001, 98(26), 15044-15049.
Jarzab et al., "Gene expression profile of papillary thyroid cancer: sources of variability and diagnostic implications", Cancer Res., Feb. 15, 2005, 65(4), 1587-1597.
Kozal et al., "Extensive polymorphisms observed in HIV-1 clade B protease gene using high-density oligonucleotide arrays", Nat Med., Jul. 1996, 2(7), 753-759.
Lockhart et al., "Expression monitoring by hybridization to high density oligonucleotide arrays", Nat Biotechnol., Dec. 1996, 14(13), 1675-1680.
Mazzaferri et al., "Long term impact of initial surgical and medical therapy on papillary and follicular thyroid cancer", 1994, 97, 418-428.
Mazzaferri, "Management of a solitary thyroid nodule", N. Engl. J. Med., Feb. 25, 1993, 328(8), 553-559.
Mechanick et al., "Progress in the preoperative diagnosis of thyroid nodules: managing uncertainties and the ultimate role for molecular investigation", Biomed Pharmacother., Sep. 2006, 60(8), 396-404.
Miller et al., "Optimal gene expression analysis by microarrays", Cancer Cell, Nov. 2002, 2(5), 353-361.
Needleman et al., "A general method applicable to the search for similarities in the amino acid sequence of two proteins", J. Mol Biol., Mar. 1970, 48(3), 443-453.
Pearson et al., "Improved tools for biological sequence comparison", PNAS USA, Apr. 1988, 85(8), 2444-2448.
Radmacher et al., "A paradigm for class prediction using gene expression profiles", J. Comput Biol., 2002, 9(3), 505-511.
Ramaswamy et al., "Multiclass cancer diagnosis using tumor gene expression signatures", PNAS USA, Dec. 18, 2001, 98(26), 15149-15154.
Ravetto et al., "Usefulness of fine-needle aspiration in the diagnosis of thyroid carcinoma: A retrospective study in 37,895 patients", Cancer, Dec. 25, 2000, 90(6), 357-363.
Raychaudhuri et al., "Basic microarray analysis: grouping and feature reduction", Trends Biotechnol., May 2001, 19(5), 189-193.
Rosen et al., "A six-gene model for differentiating benign from malignant thyroid tumors on the basis of gene expression", Surgery, Dec. 2005, 138(6), 1050-1056.
Sauter et al., "Predictive molecular pathology", N. Engl. J. Med., Dec. 19, 2002, 347(25), 1995-1996.
Schulze et al., "Navigating gene expression using microarrays - a technology review", Nat Cell Biol., Aug. 2001, 3(8), E190-E195.
Sherman, "Thyroid carcinoma", Lancet, Feb. 8, 2003, 361(9356), 501-511.
Simon et al., "Analysis of gene expression data using BRB-ArrayTools", Cancer Inform., Feb. 4, 2007, 3, 11-17.
Simon et al., "Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification", J. Natl Cancer Inst., Jan. 1, 2003, 95(1), 14-18.
Siraj et al., "Genome-wide expression analysis of Middle Eastern papillary thyroid cancer reveals c-MET as a novel target for cancer therapy", J. Pathol., Oct. 2007, 213(2), 190-199.
Smith et al., "Comparison of biosequences", Adv. Appl. Math., 1970, 2, 482.
Staudt, "Gene expression profiling of lymphoid malignancies", Annu. Rev. Med., 2002, 53, 303-318.
Suen, "How does one separate cellular follicular lesions of the thyroid by fine-needle aspiration biopsy?", Diagn Cytopathol., Mar. 1988, 4(1), 78-81.
Tatusova et al., "Blast 2 Sequences, a new tool for comparing protein and nucleotide sequences", FEMS Microbiol Lett., May 15, 1999, 174(2), 247-250. Erratum in: FEMS Microbiol Lett., Aug. 1, 1999, 177(1), 187-188.
Vallone et al., "Neoplastic transformation of rat thyroid cells requires the junB and fra-1 gene induction which is dependent on the HMGI-C gene product", EMBO J., Sep. 1, 1997, 16(17), 5310-5321.
Van de Vijver et al., "A gene-expression signature as a predictor of survival in breast cancer", N. Engl. J. Med., Dec. 19, 2002, 347(25), 1999-2009.
Van't Veer et al., "The microarray way to tailored cancer treatment", Nat. Med., Jan. 2002, 8(1), 13-14.
Wang et al., "High-fidelity mRNA amplification for gene profiling", Nat Biotechnol., Apr. 2000, 18(4), 457-459.
West et al., "Predicting the clinical status of human breast cancer by using gene expression profiles", PNAS USA, Sep. 25, 2001, 98(20), 11462-11467.
Wright et al., "A random variance model for detection of differential gene expression in small microarray experiments", Bioinformatics, Dec. 12, 2003, 19(18), 2448-2455.
Yarden et al., "Human proto-oncogene c-kit: a new cell surface receptor tyrosine kinase for an unidentified ligand", EMBO J., Nov. 1987, 6(11), 3341-3351.

* cited by examiner

U.S. Patent Jan. 12, 2016 Sheet 1 of 24 US 9,234,244 B2

FIG. 1: PCA mapping (58.9%); x-axis: PC #1 (43.7%).

Sheet 2 of 24. FIG. 2: PCA mapping (62.4%); legend: benign, malignant; x-axis: PC #1 (51.7%).

Sheet 3 of 24. FIG. 3A: PCA mapping (75.8%); x-axis: PC #1 (51.2%); a second axis label of 12.7% is partly legible. FIG. 3B: PCA mapping (75.8%); legend includes unknown and malignant; x-axis: PC #1 (47.7%).

Sheets 4 and 5 of 24. [Drawings; no caption or labels recoverable.]

Sheet 6 of 24. [Drawing; legible labels: E-cadherin, Kit; sample groups: FA, FVPTC, PTC.]

Sheet 7 of 24. [Drawing; no caption or labels recoverable.]

Sheet 8 of 24. FIG. 8: (A) papillary thyroid carcinoma (PTC1 to PTC10) and follicular variant of papillary thyroid carcinoma (FVPTC samples), rows HMGA2 and GAPDH; (B) follicular adenoma (FA1 to FA11) and adenomatoid nodule (AN1 to AN10), GAPDH row legible.

Sheets 9 to 14 of 24. FIG. 9 and FIG. 9 (cont.): one panel per sheet, labeled SPOCK, PDE5A, CEACAM6, LRRK2, PRSS3, TAO5 (labels as extracted; some characters may be missing); y-axis: relative expression level normalized to GAPDH.

Sheets 15 to 20 of 24. FIG. 10 and FIG. 10 (cont.): one panel per sheet, labeled HMGA2, CD3, PAG1, RAG2, DPP4, AGTR (labels as extracted; some characters may be missing); y-axis: relative expression level normalized to GAPDH.

Sheet 21 of 24. [Drawing; no caption or labels recoverable.]

Sheet 22 of 24. FIG. 12A: PCA mapping (33.2%); x-axis: PC #1 (12.4%).

Sheet 23 of 24. [Drawing; PCA mapping (44.3%); figure label not recoverable.]

Sheet 24 of 24. FIG. 12C: PCA mapping (47.9%); x-axis: PC #1 (35.4%).

DIAGNOSTIC TOOL FOR DIAGNOSING BENIGN VERSUS MALIGNANT THYROID LESIONS

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is the National Stage of International Application No. PCT/US2008/0010139, filed Aug. 27, 2008, which claims the benefit of U.S. Provisional Application No. 60/966,271, filed Aug. 27, 2007, the disclosures of which are incorporated herein by reference in their entireties.

FIELD OF THE INVENTION

The present invention relates to the use of genes differentially expressed in benign thyroid lesions and malignant thyroid lesions for the diagnosis and staging of thyroid cancer.

BACKGROUND OF THE INVENTION

It is well known that cancer results from changes in gene expression patterns that are important for cellular regulatory processes such as growth, differentiation, DNA duplication, mismatch repair and apoptosis. It is also becoming more apparent that effective treatment and diagnosis of cancer is dependent upon an understanding of these important processes. Classification of human cancers into distinct groups based on their origin and histopathological appearance has historically been the foundation for diagnosis and treatment. This classification is generally based on cellular architecture, certain unique cellular characteristics and cell-specific antigens only. In contrast, gene expression assays have the potential to identify thousands of unique characteristics for each tumor type (3) (4). Elucidating a genome wide expression pattern for disease states not only could have an enormous impact on the understanding of specific cell biology, but could also provide the necessary link between molecular genetics and clinical medicine (5) (6) (7).

Thyroid carcinoma represents 1% of all malignant diseases, but 90% of all neuroendocrine malignancies. It is estimated that 5-10% of the population will develop a clinically significant thyroid nodule during their life-time (8). The best available test in the evaluation of a patient with a thyroid nodule is fine needle aspiration biopsy (FNA) (9). Of the malignant FNAs, the majority are from papillary thyroid cancers (PTC) or its follicular variant (FVPTC). These can be easily diagnosed if they have the classic cytologic features including abundant cellularity and enlarged nuclei containing intra-nuclear grooves and inclusions (10). Indeed, one third of the time these diagnoses are clear on FNA. Fine needle aspiration biopsy of thyroid nodules has greatly reduced the need for thyroid surgery and has increased the percentage of malignant tumors among excised nodules (11, 12). In addition, the diagnosis of malignant thyroid tumors, combined with effective therapy, has led to a marked decrease in morbidity due to thyroid cancer. Unfortunately, many thyroid FNAs are not definitively benign or malignant, yielding an “indeterminate” or “suspicious” diagnosis. The prevalence of indeterminate FNAs varies, but typically ranges from 10-25% of FNAs (13-15). In general, thyroid FNAs are indeterminate due to overlapping or undefined morphologic criteria for benign versus malignant lesions, or focal nuclear atypia within otherwise benign specimens. Of note, twice as many patients are referred for surgery for a suspicious lesion (10%) than for a malignant lesion (5%), an occurrence that is not widely appreciated since the majority of FNAs are benign. Therefore when the diagnosis is unclear on FNA these patients are classified as having a suspicious or indeterminate lesion only. It is well known that frozen section analysis often yields no additional information.

The question then arises: “Should the surgeon perform a thyroid lobectomy, which is appropriate for benign lesions or a total thyroidectomy, which is appropriate for malignant lesions when the diagnosis is uncertain both preoperatively and intra-operatively?” Thyroid lobectomy as the initial procedure for every patient with a suspicious FNA could result in the patient with cancer having to undergo a second operation for completion thyroidectomy. Conversely, total thyroidectomy for all patients with suspicious FNA would result in a majority of patients undergoing an unnecessary surgical procedure, requiring lifelong thyroid hormone replacement and exposure to the inherent risks of surgery (16).

Several attempts to formulate a consensus about classification and treatment of thyroid carcinoma based on standard histopathologic analysis have resulted in published guidelines for diagnosis and initial disease management (2). In the past few decades no improvement has been made in the differential diagnosis of thyroid tumors by fine needle aspiration biopsy (FNA), specifically suspicious or indeterminate thyroid lesions, suggesting that a new approach to this should be explored. Thus, there is a compelling need to develop more accurate initial diagnostic tests for evaluating a thyroid nodule.

SUMMARY OF THE INVENTION

This invention is based in part on the discovery of genes whose expression levels can be correlated to benign or malignant states in a thyroid cell. Thus, the present invention provides differentially expressed genes that can be utilized to diagnose, stage and treat thyroid cancer. These differentially expressed genes are collectively referred to herein as “Differentially Expressed Thyroid” genes (DET genes). Examples of these DET genes are provided herein and include C21orf4 (DET1), Hs.145049 (DET2), Hs.296031 (DET3), KIT (DET4), LSM7 (DET5), SYNGR2 (DET6), C11orf3 (DET7), CDH1 (DET8), FAM13A1 (DET9), IMPACT (DET10), KIAA1128 (DET11).

Examples of additional DET genes provided herein include HMGA2 (DET12), KLK7 (DET13), MRC2 (DET14), LRRK2 (DET15), PLAG1 (DET16), CYP1B1 (DET17), DPP4 (DET18), FNDC4 (DET19), PHLDA2 (DET20), CCNA1 (DET21), CDH3 (DET22), CEACAM6 (DET23), QSCN6 (DET24), COL7A1 (DET25), MGC9712 (DET26), IL1RAP (DET27), LAMB3 (DET28), PRSS3 (DET29), LRP4 (DET30), SPOCK1 (DET31), PDE5A (DET32), FLJ37078 (DET33), FBN3 (DET34), DIRAS3 (DET35), PRSS1 (DET36), CAMK2N1 (DET37), SNIP (DET38), KCNJ2 (DET39), SFN (DET40), GALNT7 (DET41), TGFA (DET42), BAIAP3 (DET43), KCNK15 (DET44). These genes are upregulated in malignant thyroid tumors.

Examples of additional DET genes provided herein include RAG2 (DET45), CLYBL (DET46), NEB (DET47), TNFRSF11B (DET48), GNAI1 (DET49), AGTR1 (DET50), HLF (DET51), SLC26A4 (DET52), MT1A (DET53), FABP4 (DET54), LRP1B (DET55), SLC4A4 (DET56), LOC646278 (DET57), MAN1C1 (DET58), KCNIP3 (DET59), DNAJB9 (DET60), UBR1 (DET61), HSD17B6 (DET62), SLC33A1 (DET63), CDH16 (DET64), TBC1D1 (DET65), SLC26A7 (DET66), C11orf74 (DET67), PLA2R1 (DET68), PTTG3 (DET69), EFEMP1 (DET70), ZMAT4 (DET71), STEAP3 (DET72), DIO1 (DET73), KIT (DET4), TPO (DET74), PTTG1 (DET75), LGI3 (DET76), TMEM38B (DET77), SLITRK4 (DET78), VBP1 (DET79), COL9A3 (DET80), IRS1 (DET81), STARD13 (DET82), LOC654085 (DET83), RPS3A (DET84), SPARCL1 (DET85). These genes are down regulated in malignant thyroid tissue.

Provided is a method of distinguishing normal thyroid tissue from malignant thyroid tumor tissue, comprising a) measuring the expression of one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET4, DET5, DET7, DET8, DET9, DET10, DET11, and DET12 in a test cell population, wherein at least one cell in the said test cell population is capable of expressing one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET4, DET5, DET7, DET8, DET9, DET10, DET11, and DET12, b) comparing the expression of one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET4, DET5, DET7, DET8, DET9, DET10, DET11, and DET12 in the test cell population to the expression of the same one or more nucleic acid sequence(s) in a reference cell population comprising at least one cell which is known to be normal; and c) identifying an increase in expression of one or more nucleic acid sequences selected from the group consisting of DET1, DET2, and DET12, an increase in expression being associated with a malignant thyroid tumor, or d) identifying a decrease in expression of one or more nucleic acid sequences selected from the group consisting of DET4, DET5, DET7, DET8, DET9, DET10, and DET11, a decrease in expression being associated with a malignant thyroid tumor.

Also provided is a gene expression approach to diagnose benign vs. malignant thyroid lesions. Identification of differentially expressed genes allows the development of models that can differentiate benign vs. malignant thyroid tumors. Results obtained from these models provide a molecular classification system for thyroid tumors and this in turn provides a more accurate diagnostic tool for the clinician managing patients with suspicious thyroid lesions.

The present invention also provides a method for classifying a thyroid lesion in a subject comprising: a) measuring the expression of one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10 and DET11 in a test cell population, wherein at least one cell in said test cell population is capable of expressing one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10 and DET11; b) comparing the expression of the nucleic acid sequence(s) to the expression of the same nucleic acid sequence(s) in a reference cell population comprising at least one cell for which a thyroid lesion classification is known; and c) identifying a difference, if present, in expression levels of one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10 and DET11, in the test cell population and reference cell population, thereby classifying the thyroid lesion in the subject.

Further provided is a method for classifying a thyroid lesion in a subject comprising: a) measuring the expression of one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET5 and DET6 in a test cell population, wherein at least one cell in said test cell population is capable of expressing one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET5 and DET6; b) comparing the expression of the nucleic acid sequence(s) to the expression of the same nucleic acid sequence(s) in a reference cell population comprising at least one cell for which a thyroid lesion classification is known; and c) identifying a difference, if present, in expression levels of one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET5 and DET6, in the test cell population and reference cell population, thereby classifying the thyroid lesion in the subject.

Further provided is a method for classifying a thyroid lesion in a subject comprising: a) measuring the expression of one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET5, DET6, DET7, DET8, DET9, DET10, DET11, DET12, DET13, DET14, DET15, DET16, DET17, DET18, DET19, DET20, DET21, DET22, DET23, DET24, DET25, DET26, DET27, DET28, DET29, DET30, DET31, DET32, DET33, DET34, DET35, DET36, DET37, DET38, DET39, DET40, DET41, DET42, DET43, DET44, DET45, DET46, DET47, DET48, DET49, DET50, DET51, DET52, DET53, DET54, DET55, DET56, DET57, DET58, DET59, DET60, DET61, DET62, DET63, DET64, DET65, DET66, DET67, DET68, DET69, DET70, DET71, DET72, DET73, DET74, DET75, DET76, DET77, DET78, DET79, DET80, DET81, DET82, DET83, DET84, and DET85 in a test cell population, wherein at least one cell in said test cell population is capable of expressing one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET5, DET6, DET7, DET8, DET9, DET10, DET11, DET12, DET13, DET14, DET15, DET16, DET17, DET18, DET19, DET20, DET21, DET22, DET23, DET24, DET25, DET26, DET27, DET28, DET29, DET30, DET31, DET32, DET33, DET34, DET35, DET36, DET37, DET38, DET39, DET40, DET41, DET42, DET43, DET44, DET45, DET46, DET47, DET48, DET49, DET50, DET51, DET52, DET53, DET54, DET55, DET56, DET57, DET58, DET59, DET60, DET61, DET62, DET63, DET64, DET65, DET66, DET67, DET68, DET69, DET70, DET71, DET72, DET73, DET74, DET75, DET76, DET77, DET78, DET79, DET80, DET81, DET82, DET83, DET84, and DET85; b) comparing the expression of the nucleic acid sequence(s) to the expression of the same nucleic acid sequence(s) in a reference cell population comprising at least one cell for which a thyroid lesion classification is known; and c) identifying a difference, if present, in expression levels of one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET5, DET6, DET7, DET8, DET9, DET10, DET11, DET12, DET13, DET14, DET15, DET16, DET17, DET18, DET19, DET20, DET21, DET22, DET23, DET24, DET25, DET26, DET27, DET28, DET29, DET30, DET31, DET32, DET33, DET34, DET35, DET36, DET37, DET38, DET39, DET40, DET41, DET42, DET43, DET44, DET45, DET46, DET47, DET48, DET49, DET50, DET51, DET52, DET53, DET54, DET55, DET56, DET57, DET58, DET59, DET60, DET61, DET62, DET63, DET64, DET65, DET66, DET67, DET68, DET69, DET70, DET71, DET72, DET73, DET74, DET75, DET76, DET77, DET78, DET79, DET80, DET81, DET82, DET83, DET84, and DET85, in the test cell population and reference cell population, thereby classifying the thyroid lesion in the subject.

The present invention also provides a method of identifying the stage of a thyroid tumor in a subject comprising: a) measuring the expression of one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET5 and DET6 in a test cell population, wherein at least one cell in said test cell population is capable of expressing one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET5 and DET6; b) comparing the expression of the nucleic acid sequence(s) to the expression of the same nucleic acid sequence(s) in a reference cell population comprising at least one cell for which a thyroid tumor stage is known; and c) identifying a difference, if present, in expression levels of one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET5 and DET6, in the test cell population and reference cell population, thereby identifying the stage of the thyroid tumor in the subject.

Further provided by the present invention is a method of identifying the stage of a thyroid tumor in a subject comprising: a) measuring the expression of one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10 and DET11 in a test cell population, wherein at least one cell in said test cell population is capable of expressing one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10 and DET11; b) comparing the expression of the nucleic acid sequence(s) to the expression of the same nucleic acid sequence(s) in a reference cell population comprising at least one cell for which a thyroid tumor stage is known; and c) identifying a difference, if present, in expression levels of one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10 and DET11, in the test cell population and reference cell population, thereby identifying the stage of the thyroid tumor in the subject.

Further provided by the present invention is a method of identifying the stage of a thyroid tumor in a subject comprising: a) measuring the expression of one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET5, DET6, DET7, DET8, DET9, DET10, DET11, DET12, DET13, DET14, DET15, DET16, DET17, DET18, DET19, DET20, DET21, DET22, DET23, DET24, DET25, DET26, DET27, DET28, DET29, DET30, DET31, DET32, DET33, DET34, DET35, DET36, DET37, DET38, DET39, DET40, DET41, DET42, DET43, DET44, DET45, DET46, DET47, DET48, DET49, DET50, DET51, DET52, DET53, DET54, DET55, DET56, DET57, DET58, DET59, DET60, DET61, DET62, DET63, DET64, DET65, DET66, DET67, DET68, DET69, DET70, DET71, DET72, DET73, DET74, DET75, DET76, DET77, DET78, DET79, DET80, DET81, DET82, DET83, DET84, and DET85 in a test cell population, wherein at least one cell in said test cell population is capable of expressing one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET5, DET6, DET7, DET8, DET9, DET10, DET11, DET12, DET13, DET14, DET15, DET16, DET17, DET18, DET19, DET20, DET21, DET22, DET23, DET24, DET25, DET26, DET27, DET28, DET29, DET30, DET31, DET32, DET33, DET34, DET35, DET36, DET37, DET38, DET39, DET40, DET41, DET42, DET43, DET44, DET45, DET46, DET47, DET48, DET49, DET50, DET51, DET52, DET53, DET54, DET55, DET56, DET57, DET58, DET59, DET60, DET61, DET62, DET63, DET64, DET65, DET66, DET67, DET68, DET69, DET70, DET71, DET72, DET73, DET74, DET75, DET76, DET77, DET78, DET79, DET80, DET81, DET82, DET83, DET84, and DET85; b) comparing the expression of the nucleic acid sequence(s) to the expression of the same nucleic acid sequence(s) in a reference cell population comprising at least one cell for which a thyroid tumor stage is known; and c) identifying a difference, if present, in expression levels of one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET5, DET6, DET7, DET8, DET9, DET10, DET11, DET12, DET13, DET14, DET15, DET16, DET17, DET18, DET19, DET20, DET21, DET22, DET23, DET24, DET25, DET26, DET27, DET28, DET29, DET30, DET31, DET32, DET33, DET34, DET35, DET36, DET37, DET38, DET39, DET40, DET41, DET42, DET43, DET44, DET45, DET46, DET47, DET48, DET49, DET50, DET51, DET52, DET53, DET54, DET55, DET56, DET57, DET58, DET59, DET60, DET61, DET62, DET63, DET64, DET65, DET66, DET67, DET68, DET69, DET70, DET71, DET72, DET73, DET74, DET75, DET76, DET77, DET78, DET79, DET80, DET81, DET82, DET83, DET84, and DET85, in the test cell population and reference cell population, thereby identifying the stage of the thyroid tumor in the subject.

Also provided by the present invention is a method of identifying an agent for treating a thyroid tumor, the method comprising: a) contacting a population of thyroid tumor cells from a subject for which a tumor stage is known, wherein at least one cell in said population is capable of expressing one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET5 and DET6, with a test agent; b) measuring the expression of one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET5 and DET6 in the population; c) comparing the expression of the nucleic acid sequence(s) to the expression of the same nucleic acid sequence(s) in a reference cell population comprising at least one cell for which a thyroid tumor stage is known; and d) identifying a difference, if present, in expression levels of one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET5 and DET6, in the test cell population and reference cell population, such that if there is a difference corresponding to an improvement, a therapeutic agent for treating a thyroid tumor has been identified.

The present invention also provides a method of identifying an agent for treating a thyroid tumor, the method comprising: a) contacting a population of thyroid tumor cells from a subject for which a tumor stage is known, wherein at least one cell in said population is capable of expressing one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10 and DET11, with a test agent; b) measuring the expression of one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10 and DET11 in the population; c) comparing the expression of the nucleic acid sequence(s) to the expression of the same nucleic acid sequence(s) in a reference cell population comprising at least one cell for which a thyroid tumor stage is known; and d) identifying a difference, if present, in expression levels of one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10 and DET11, in the test cell population and reference cell population, such that if there is a difference corresponding to an improvement, a therapeutic agent for treating a thyroid tumor has been identified.

The present invention also provides a method of identifying an agent for treating a thyroid tumor, the method comprising: a) contacting with a test agent a population of thyroid tumor cells from a subject for which a tumor stage is known, wherein at least one cell in said population is capable of expressing one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET5, DET6, DET7, DET8, DET9, DET10, DET11, DET12, DET13, DET14, DET15, DET16, DET17, DET18, DET19, DET20, DET21, DET22, DET23, DET24, DET25, DET26, DET27, DET28, DET29, DET30, DET31, DET32, DET33, DET34, DET35, DET36, DET37, DET38, DET39, DET40, DET41, DET42, DET43, DET44, DET45, DET46, DET47, DET48, DET49, DET50, DET51, DET52, DET53, DET54, DET55, DET56, DET57, DET58, DET59, DET60, DET61, DET62, DET63, DET64, DET65, DET66, DET67, DET68, DET69, DET70, DET71, DET72, DET73, DET74, DET75, DET76, DET77, DET78, DET79, DET80, DET81, DET82, DET83, DET84, and DET85; b) measuring the expression of one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET5, DET6, DET7, DET8, DET9, DET10, DET11, DET12, DET13, DET14, DET15, DET16, DET17, DET18, DET19, DET20, DET21, DET22, DET23, DET24, DET25, DET26, DET27, DET28, DET29, DET30, DET31, DET32, DET33, DET34, DET35, DET36, DET37, DET38, DET39, DET40, DET41, DET42, DET43, DET44, DET45, DET46, DET47, DET48, DET49, DET50, DET51, DET52, DET53, DET54, DET55, DET56, DET57, DET58, DET59, DET60, DET61, DET62, DET63, DET64, DET65, DET66, DET67, DET68, DET69, DET70, DET71, DET72, DET73, DET74, DET75, DET76, DET77, DET78, DET79, DET80, DET81, DET82, DET83, DET84, and DET85 in the population; c) comparing the expression of the nucleic acid sequence(s) to the expression

tor model to the gene expression data; and outputting the class of tumor as malignant or benign based on the determination.

The present invention also provides a method for classifying a thyroid lesion in a subject as malignant or benign comprises receiving gene expression data of one or more nucleic acid sequences selected from the group consisting of the differentially expressed thyroid genes DET1, DET2, DET3, DET4, DET5, DET6, DET7, DET8, DET9, DET10, DET11, DET12, DET13, DET14, DET15, DET16, DET17, DET18, DET19, DET20, DET21, DET22, DET23, DET24, DET25, DET26, DET27, DET28, DET29, DET30, DET31, DET32, DET33, DET34, DET35, DET36, DET37, DET38, DET39, DET40, DET41, DET42, DET43, DET44, DET45, DET46, DET47, DET48, DET49, DET50, DET51, DET52, DET53, DET54, DET55, DET56, DET57, DET58, DET59, DET60, DET61, DET62, DET63, DET64, DET65, DET66, DET67, DET68, DET69, DET70, DET71, DET72, DET73, DET74, DET75, DET76, DET77, DET78, DET79, DET80, DET81, DET82, DET83, DET84, and DET85 in a test cell population, wherein at least one cell in said test cell population is capable of expressing one or
more nucleic acid sequences selected of the same nucleic acid sequence(s) in a reference cell popu from the group consisting of DET1, DET2, DET3, DET4, lation comprising at least one cell for which a thyroid tumor DET5, DET6, DET7, DET8, DET9, DET 10, DET11, stage is known; and d) identifying a difference, if present, in 25 DET12, DET13, DET14, DET15, DET16, DET17, DET18, expression levels of one or more nucleic acid sequences DET19, DET20, DET21, DET22, DET23, DET24, DET25, selected from the group consisting of DET1, DET2, DET3, DET26, DET27, DET28, DET29, DET30, DET31, DET32, DET4, DET5, DET6, DET7, DET8, DET9, DET 10, DET11, DET33, DET34, DET35, DET36, DET37, DET38, DET39, DET12, DET13, DET14, DET15, DET16, DET17, DET18, DET40, DET41, DET42, DET43, DET44, DET45, DET46, DET19, DET20, DET21, DET22, DET23, DET24, DET25, 30 DET47, DET48, DET49, DET50, DET51, DET52, DET53, DET26, DET27, DET28, DET29, DET30, DET31, DET32, DET54, DET55, DET56, DET57, DET58, DET59, DET60, DET33, DET34, DET35, DET36, DET37, DET38, DET39, DET61, DET62, DET63, DET64, DET65, DET66, DET67, DET40, DET41, DET42, DET43, DET44, DET45, DET46, DET68, DET69, DET70, DET71, DET72, DET73, DET74, DET47, DET48, DET49, DET50, DET51, DET52, DET53, DET75, DET76, DET77, DET78, DET79, DET80, DET81, DET54, DET55, DET56, DET57, DET58, DET59, DET60, 35 DET82, DET83, DET84, and DET85; and determining a DET61, DET62, DET63, DET64, DET65, DET66, DET67, class of tumor, wherein the determination is made by apply DET68, DET69, DET70, DET71, DET72, DET73, DET74, ing a statistical classifier or predictor model to the gene DET75, DET76, DET77, DET78, DET79, DET80, DET81, expression data; and outputting the class of tumor as malig DET82, DET83, DET84, and DET85, in the test cell popula nant or benign based on the determination. 
tion and reference cell population, Such that if there is a 40 The present invention also provides a method for identify difference corresponding to an improvement, a therapeutic ing the stage of a thyroid tumor in a Subject comprises receiv agent for treating a thyroid tumor has been identified. ing gene expression data of one or more nucleic acid The present invention also provides a method for classify sequences selected from the group consisting of the differen ing a thyroid lesion in a Subject as malignant or benign com tially expressed thyroid genes DET1, DET2, DET3, DET4, prises receiving gene expression data of one or more nucleic 45 DET5, and DET6 in a test cell population, wherein at least acid sequences selected from the group consisting of the one cell in said test cell population is capable of expressing differentially expressed thyroid genes DET1, DET2, DET3, one or more nucleic acid sequences selected from the group DET4, DET5, and DET6 in a test cell population, wherein at consisting of DET1, DET2, DET3, DET4, DET5, and DET6: least one cell in said test cell population is capable of express and determining the stage of the thyroid tumor, wherein the ing one or more nucleic acid sequences selected from the 50 determination is made by applying a statistical classifier or group consisting of DET1, DET2, DET3, DET4, DET5, and predictor model to the gene expression data; and outputting DET6; and determining a class of tumor, wherein the deter the stage of the thyroid tumor based on the determination. mination is made by applying a statistical classifier or predic The present invention also provides a method for identify tor model to the gene expression data; and outputting the class ing the stage of a thyroid tumor in a Subject comprises receiv of tumor as malignant or benign based on the determination. 
55 ing gene expression data of one or more nucleic acid The present invention also provides a method for classify sequences selected from the group consisting of the differen ing a thyroid lesion in a Subject as malignant or benign com tially expressed thyroid genes DET1, DET2, DET3, DET4, prises receiving gene expression data of one or more nucleic DET6, DETT, DET8, DET9, DET 10, DET11, in a test cell acid sequences selected from the group consisting of the population, wherein at least one cell in said test cell popula differentially expressed thyroid genes DET1, DET2, DET3, 60 tion is capable of expressing one or more nucleic acid DET4, DET6, DET7, DET8, DET9, DET 10, DET11, in a test sequences selected from the group consisting of DET1, cell population, wherein at least one cell in said test cell DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET 10, population is capable of expressing one or more nucleic acid DET11; and determining the stage of the thyroid tumor, sequences selected from the group consisting of DET1, wherein the determination is made by applying a statistical DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10, 65 classifier or predictor model to the gene expression data; and DET11; and determining a class of tumor, wherein the deter outputting the stage of the thyroid tumor based on the deter mination is made by applying a statistical classifier or predic mination. 
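The classification and staging methods above share one computational step: a statistical classifier or predictor model is applied to expression data for a DET panel, and a label is output. The text here does not fix a particular model, so the following is only a minimal sketch under stated assumptions. It uses a nearest-centroid rule, chosen for illustration and not taken from the patent. The function names (`centroid`, `distance`, `train`, `classify`) are hypothetical, and every expression value is synthetic, invented solely to make the example run.

```python
from math import sqrt
from statistics import mean

# Six-gene panel named in the text (DET1 through DET6).
PANEL = ["DET1", "DET2", "DET3", "DET4", "DET5", "DET6"]


def centroid(profiles):
    """Average each panel gene across a list of expression profiles."""
    return {gene: mean(p[gene] for p in profiles) for gene in PANEL}


def distance(profile, center):
    """Euclidean distance between one profile and a class centroid."""
    return sqrt(sum((profile[g] - center[g]) ** 2 for g in PANEL))


def train(labelled_profiles):
    """Build one centroid per label from (label, profile) pairs."""
    by_label = {}
    for label, profile in labelled_profiles:
        by_label.setdefault(label, []).append(profile)
    return {label: centroid(group) for label, group in by_label.items()}


def classify(profile, centroids):
    """Return the label whose centroid lies closest to the profile."""
    return min(centroids, key=lambda label: distance(profile, centroids[label]))


# Synthetic log-ratio values (assumption: invented for illustration only,
# not measurements from the patent or from any real sample).
reference = [
    ("benign", dict(zip(PANEL, [1.2, 0.9, -0.8, 1.1, 0.7, -0.9]))),
    ("benign", dict(zip(PANEL, [0.8, 1.1, -1.0, 0.9, 0.9, -0.7]))),
    ("malignant", dict(zip(PANEL, [-0.9, -1.0, 0.8, -1.2, -0.6, 1.0]))),
    ("malignant", dict(zip(PANEL, [-1.1, -0.8, 1.0, -0.8, -0.8, 0.8]))),
]
model = train(reference)

unknown = dict(zip(PANEL, [0.9, 1.0, -0.6, 1.0, 0.5, -0.8]))
print(classify(unknown, model))  # prints "benign" for this synthetic profile
```

The same functions would serve the staging methods if the reference profiles were labelled with known tumor stages instead of "benign" and "malignant"; only the labels change, not the procedure.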
The present invention also provides a method for identifying the stage of a thyroid tumor in a subject, comprising: receiving gene expression data of one or more nucleic acid sequences selected from the group consisting of the differentially expressed thyroid genes DET1, DET2, DET3, DET4, DET5, DET6, DET7, DET8, DET9, DET10, DET11, DET12, DET13, DET14, DET15, DET16, DET17, DET18, DET19, DET20, DET21, DET22, DET23, DET24, DET25, DET26, DET27, DET28, DET29, DET30, DET31, DET32, DET33, DET34, DET35, DET36, DET37, DET38, DET39, DET40, DET41, DET42, DET43, DET44, DET45, DET46, DET47, DET48, DET49, DET50, DET51, DET52, DET53, DET54, DET55, DET56, DET57, DET58, DET59, DET60, DET61, DET62, DET63, DET64, DET65, DET66, DET67, DET68, DET69, DET70, DET71, DET72, DET73, DET74, DET75, DET76, DET77, DET78, DET79, DET80, DET81, DET82, DET83, DET84, and DET85 in a test cell population, wherein at least one cell in said test cell population is capable of expressing one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET5, DET6, DET7, DET8, DET9, DET10, DET11, DET12, DET13, DET14, DET15, DET16, DET17, DET18, DET19, DET20, DET21, DET22, DET23, DET24, DET25, DET26, DET27, DET28, DET29, DET30, DET31, DET32, DET33, DET34, DET35, DET36, DET37, DET38, DET39, DET40, DET41, DET42, DET43, DET44, DET45, DET46, DET47, DET48, DET49, DET50, DET51, DET52, DET53, DET54, DET55, DET56, DET57, DET58, DET59, DET60, DET61, DET62, DET63, DET64, DET65, DET66, DET67, DET68, DET69, DET70, DET71, DET72, DET73, DET74, DET75, DET76, DET77, DET78, DET79, DET80, DET81, DET82, DET83, DET84, and DET85; and determining the stage of the thyroid tumor, wherein the determination is made by applying a statistical classifier or predictor model to the gene expression data; and outputting the stage of the thyroid tumor based on the determination.

The present invention also provides a method for identifying the stage of a thyroid tumor in a subject, comprising: receiving gene expression data of one or more nucleic acid sequences selected from the group consisting of the differentially expressed thyroid genes DET1, DET2, DET3, DET4, DET5, and DET6 in a test cell population, wherein at least one cell in said test cell population is capable of expressing one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET5, and DET6; and determining the stage of the thyroid tumor, wherein the determination is made by applying a statistical classifier or predictor model to the gene expression data; and outputting the stage of the thyroid tumor based on the determination.

The present invention also provides a method for identifying the stage of a thyroid tumor in a subject, comprising: receiving gene expression data of one or more nucleic acid sequences selected from the group consisting of the differentially expressed thyroid genes DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10, DET11, in a test cell population, wherein at least one cell in said test cell population is capable of expressing one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10, DET11; and determining the stage of the thyroid tumor, wherein the determination is made by applying a statistical classifier or predictor model to the gene expression data; and outputting the stage of the thyroid tumor based on the determination.

The present invention also provides a method for identifying the stage of a thyroid tumor in a subject, comprising: receiving gene expression data of one or more nucleic acid sequences selected from the group consisting of the differentially expressed thyroid genes DET1, DET2, DET3, DET4, DET5, DET6, DET7, DET8, DET9, DET10, DET11, DET12, DET13, DET14, DET15, DET16, DET17, DET18, DET19, DET20, DET21, DET22, DET23, DET24, DET25, DET26, DET27, DET28, DET29, DET30, DET31, DET32, DET33, DET34, DET35, DET36, DET37, DET38, DET39, DET40, DET41, DET42, DET43, DET44, DET45, DET46, DET47, DET48, DET49, DET50, DET51, DET52, DET53, DET54, DET55, DET56, DET57, DET58, DET59, DET60, DET61, DET62, DET63, DET64, DET65, DET66, DET67, DET68, DET69, DET70, DET71, DET72, DET73, DET74, DET75, DET76, DET77, DET78, DET79, DET80, DET81, DET82, DET83, DET84, and DET85 in a test cell population, wherein at least one cell in said test cell population is capable of expressing one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET5, DET6, DET7, DET8, DET9, DET10, DET11, DET12, DET13, DET14, DET15, DET16, DET17, DET18, DET19, DET20, DET21, DET22, DET23, DET24, DET25, DET26, DET27, DET28, DET29, DET30, DET31, DET32, DET33, DET34, DET35, DET36, DET37, DET38, DET39, DET40, DET41, DET42, DET43, DET44, DET45, DET46, DET47, DET48, DET49, DET50, DET51, DET52, DET53, DET54, DET55, DET56, DET57, DET58, DET59, DET60, DET61, DET62, DET63, DET64, DET65, DET66, DET67, DET68, DET69, DET70, DET71, DET72, DET73, DET74, DET75, DET76, DET77, DET78, DET79, DET80, DET81, DET82, DET83, DET84, and DET85; and determining the stage of the thyroid tumor, wherein the determination is made by applying a statistical classifier or predictor model to the gene expression data; and outputting the stage of the thyroid tumor based on the determination.

Also provided by the present invention is a kit comprising one or more reagents for detecting the expression of one or more nucleic acid(s) selected from the group consisting of DET1, DET2, DET3, DET4, DET5, DET6, DET7, DET8, DET9, DET10, DET11.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 shows PCA (principal component analysis) organization in a three-dimensional space of all samples divided into four groups: hyperplastic nodule (HN), follicular adenoma (FA), follicular variant of papillary thyroid carcinoma (FVPTC) and papillary thyroid carcinoma (PTC). Each dot represents how that sample is localized in space on the basis of its gene expression profile. The distance between any pair of points is related to the similarity between the two observations in high dimensional space. The principal components are plotted along the various axes (x, y, z). The % indicates the total amount of variance captured by the PCs; the first PC is the one capturing the largest amount of variance, or information, the second PC the second largest, etc. Three PCs were plotted, thus creating a 3D plot.

FIG. 2 shows PCA organization in a three-dimensional space of all samples divided into two groups: benign (HN-FA) and malignant (FVPTC-PTC). Each dot represents how that sample is localized in space on the basis of its gene expression profile. The distance between any pair of points is related to the similarity between the two observations in high dimensional space.

FIG. 3 shows PCA organization in a three-dimensional space of all samples with (A) and without the unknowns (B) based on the gene expression values of the six most informative genes. It is clear there is a separation of the two groups and that it is possible to predict visually the diagnosis of each unknown. The pathological diagnoses of the unknowns are marked respectively with a + and a for the benign and the malignant tumor. The red + sign indicates an unknown sample for which pathological diagnosis and predicted diagnosis were discordant. Based on the present six gene diagnostic predictor model, this lesion was placed in the malignant group. Upon re-review by the pathologist, this sample was reclassified from benign to a neoplasm of uncertain malignant potential.

FIG. 4 is a graph showing gene expression profiles of ten unknown samples. On the basis of their profile the predictor model of this invention gave a correct diagnosis in 100% of the cases. The y axis represents the ratio between thyroid tumor mRNA expression level (Cy5 fluorescence intensity) and control thyroid tissue mRNA expression level (Cy3 fluorescence intensity).

FIG. 5 shows the results of RT-PCR utilizing the 6 gene predictor model. The RT-PCR data using 6 genes across 42 patient samples demonstrate separation by group.

FIG. 6 shows immunohistochemical results for expression of KIT and CDH1 in malignant and benign thyroid lesions. These results correlate with the expression data obtained via microarray and RT-PCR.

FIG. 7 shows the results of RT-PCR utilizing the 10 gene predictor model. The RT-PCR data using 10 genes demonstrate separation by group.

FIG. 8 shows RT-PCR analysis of HMGA2 and PLAG1 in thyroid tumors. The mRNA expression of both HMGA2 and PLAG1 in malignant [A: papillary thyroid carcinoma (PTC; n=10) & follicular variant of papillary thyroid carcinoma (FVPTC; n=7)] and benign [B: follicular adenoma (FA; n=11) & adenomatoid nodule (AN; n=10)] tumors was determined by RT-PCR. GAPDH expression after 22 PCR cycles and 35 PCR cycles served as a loading control for malignant and benign tumors respectively. Note: With the exception of one adenomatoid nodule (AN4), the benign tumors exhibited no detectable levels of HMGA2 or PLAG1. Only a smear was found after extending PCR cycles to 40.

FIG. 9 shows real-time RT-PCR validation of 6 genes (SPOCK1, CEACAM6, PRSS3, PDE5A, LRRK2 and TPO5) using 76 tumors from the original set of microarray samples. Relative gene expression levels normalized to GAPDH in 41 benign [follicular adenomas (FA; n=11), adenomatoid nodules (AN; n=10), lymphocytic thyroiditis nodules (LcT; n=10) & Hürthle cell adenomas (HA; n=10)] and 35 malignant [Hürthle cell carcinomas (HC; n=5), follicular carcinomas (FC; n=10), follicular variant of papillary thyroid carcinomas (FVPTC; n=10) & papillary thyroid carcinomas (PTC; n=10)] tumors were determined using gene-specific primers as described in Materials and Methods. The upper and lower limits of each box represent third and first quartiles, respectively; red lines represent medians; whiskers represent extreme measurements; *, P<0.001 by two-tailed t test between benign and malignant tumor types. Note: As expected from the microarray analysis, SPOCK1, CEACAM6, PRSS3 & LRRK2 are overexpressed, and TPO5 is underexpressed in malignant tumors compared to benign.

FIG. 10 shows real-time RT-PCR validation of 6 genes (HMGA2, PLAG1, DPP4, CDH3, RAG2 and AGTR1) using 31 new thyroid tumors. Relative expression levels normalized to GAPDH in 20 benign [follicular adenomas (FA; n=7), adenomatoid nodules (AN; n=7), lymphocytic thyroiditis nodules (LcT; n=2) & Hürthle cell adenomas (HA; n=4)] and 11 malignant [Hürthle cell carcinomas (HC; n=1), follicular carcinomas (FC; n=3), follicular variant of papillary thyroid carcinomas (FVPTC; n=3) & papillary thyroid carcinomas (PTC; n=4)] tumors were determined using gene-specific primers. The upper and lower limits of each box represent the third and first quartiles, respectively; red lines represent medians; whiskers represent extreme measurements; *, P<0.001 by two-tailed t test between benign and malignant tumor types. Note: As expected from the microarray analysis, HMGA2, PLAG1 and CDH3 are overexpressed, while both RAG2 and AGTR1 are underexpressed in malignant tumors compared to benign.

FIG. 11A shows HMGA2 expression in thyroid. Western blot analysis of HMGA2 protein expression in thyroid tumors (T1-T7) and in adjacent normal thyroid tissues (N1-N7). The anti-HMGA2 goat polyclonal antibody recognized HMGA2 expression specifically in thyroid tumors but not in adjacent normal thyroid tissue. High protein expression of HMGA2 is detected in malignant tumors (T4, T5, T6, T7) compared to benign tumors (T1, T2, T3).

FIG. 11B shows immunohistochemistry of HMGA2 in thyroid tumors. Positive HMGA2 immunosignals were detected in the nuclei of all tumor cells and specifically in 70-90% of malignant tumor cells (FVPTC: follicular variant papillary thyroid carcinomas and PTC: papillary thyroid carcinomas) compared to only 20-30% of benign tumor cells (AN: adenomatoid nodules and FA: follicular adenoma). No detectable expression was seen in the adjacent normal thyroid tissue. Magnifications: ×400.

FIG. 12 shows principal component analysis (PCA) using 94 thyroid tumor samples. A, expression values from all 15,745 genes. B, expression values from the most variable 1000 genes. C, expression values from the best 75 differentially expressed genes. Benign (triangle) and malignant (square) thyroid tumor samples are localized in three-dimensional space on the basis of their gene expression profile. The distance between any 2 points is related to the similarity between the two observations in high dimensional space.

DIFFERENTIALLY EXPRESSED THYROID GENES

One aspect of the invention relates to genes that are differentially expressed in benign and/or malignant thyroid lesions relative to benign thyroid tissue. These differentially expressed genes are collectively referred to herein as “Differentially Expressed Thyroid” genes (“DET” genes). The corresponding gene products are referred to as “DET products”, “DET polypeptides” and/or “DET proteins”. The DET genes of the present invention include C21orf4 (DET1), Hs.145049 (DET2), Hs.296031 (DET3), KIT (DET4), LSM7 (DET5), SYNGR2 (DET6), C11orf8 (DET7), CDH1 (DET8), FAM13A1 (DET9), IMPACT (DET10), KIAA1128 (DET11), HMGA2 (DET12), KLK7 (DET13), MRC2 (DET14), LRRK2 (DET15), PLAG1 (DET16), CYP1B1 (DET17), DPP4 (DET18), FNDC4 (DET19), PHLDA2 (DET20), CCNA1 (DET21), CDH3 (DET22), CEACAM6 (DET23), QSCN6 (DET24), COL7A1 (DET25), MGC9712 (DET26), IL1RAP (DET27), LAMB3 (DET28), PRSS3 (DET29), LRP4 (DET30), SPOCK1 (DET31), PDE5A (DET32), FLJ37078 (DET33), FBN3 (DET34), DIRAS3 (DET35), PRSS1 (DET36), CAMK2N1 (DET37), SNIP (DET38), KCNJ2 (DET39), SFN (DET40), GALNT7 (DET41), TGFA (DET42), BAIAP3 (DET43), KCNK15 (DET44), RAG2 (DET45), CLYBL (DET46), NEB (DET47), TNFRSF11B (DET48), GNAI1 (DET49), AGTR1 (DET50), HLF (DET51), SLC26A4 (DET52), MT1A (DET53), FABP4 (DET54), LRP1B (DET55), SLC4A4 (DET56), LOC646278 (DET57), MAN1C1 (DET58), KCNIP3 (DET59), DNAJB9 (DET60), UBR1 (DET61), HSD17B6 (DET62), SLC33A1 (DET63), CDH16 (DET64), TBC1D1 (DET65), SLC26A7 (DET66), C11orf74 (DET67), PLA2R1 (DET68), PTTG3 (DET69), EFEMP1 (DET70), ZMAT4 (DET71), STEAP3 (DET72), DIO1 (DET73), TPO (DET74), PTTG1 (DET75), LGI3 (DET76), TMEM38B (DET77), SLITRK4 (DET78), VBP1 (DET79), COL9A3 (DET80), IRS1 (DET81), STARD13 (DET82), LOC654085 (DET83), RPS3A (DET84), SPARCL1 (DET85). The following provides a brief description of DET1-DET11.

C21orf4 (DET1)

C21orf4 is a gene encoding an integral membrane protein of unknown function, located in the q region of chromosome 21. C21orf4 was found to be upregulated in benign thyroid lesions and upregulated in malignant thyroid lesions as compared to normal thyroid tissue. Upon comparing benign tissue with malignant tissue, C21orf4 was found to be upregulated in benign tissue as compared to malignant tissue. An example of a nucleic acid encoding C21orf4 is set forth herein as SEQ ID NO: 40. Nucleic acid sequences for C21orf4 can also be accessed via GenBank Accession No. AP001717, GenBank Accession No. NM_006134 and via Unigene No. Hs.433668. All of the information, including any nucleic acid and amino acid sequences provided for C21orf4 under GenBank Accession No. AP001717, GenBank Accession No. NM_006134 and Unigene No. Hs.433668 is hereby incorporated in its entirety by this reference.

Hs.145049 (DET2)

Hs.145049, formerly known as Hs.24183, is a sodium-D-glucose transporter. The Unigene cluster identified as Unigene No. Hs.24183 has been retired and has been replaced by Hs.145049. Hs.145049 was found to be upregulated in both benign and malignant thyroid lesions as compared to normal thyroid tissue. Upon comparing benign tissue with malignant tissue, Hs.145049 was found to be upregulated in benign tissue as compared to malignant tissue. A nucleic acid encoding Hs.145049 is set forth herein as SEQ ID NO: 42. Nucleic acid sequences for Hs.145049 can also be accessed via GenBank Accession No. NP_060265, via GenBank Accession No. AL832414.1 and via Unigene No. Hs.145049. All of the information, including any nucleic acid and amino acid sequences provided for Hs.145049 under GenBank Accession No. NP_060265, via GenBank Accession No. AL832414 and via Unigene No. Hs.145049 is hereby incorporated in its entirety by this reference.

Hs.296031 (DET3)

Hs.296031 is a gene of unknown function. Hs.296031 was found to be downregulated in benign and comparable to normal in malignant thyroid lesions as compared to normal thyroid tissue. Upon comparing benign tissue with malignant tissue, Hs.296031 was found to be upregulated in malignant tissue as compared to benign tissue. A nucleic acid encoding Hs.296031 is set forth herein as SEQ ID NO: 44. Nucleic acid sequences for Hs.296031 can also be accessed via GenBank Accession No. BC038512 and via Unigene No. Hs.296031. All of the information, including any nucleic acid and amino acid sequences provided for Hs.296031 under GenBank Accession No. BC038512 and Unigene No. Hs.296031 is hereby incorporated in its entirety by this reference.

c-kit proto-oncogene (KIT) (DET4)

KIT is a proto-oncogene that functions as a transmembrane receptor tyrosine kinase and is involved in cellular proliferation. See Yarden et al. “Human proto-oncogene c-kit: a new cell surface receptor tyrosine kinase for an unidentified ligand” EMBO J. 6(11): 3341-3351 (1987). The Yarden et al. reference is incorporated herein in its entirety for the purpose of describing KIT function as well as for incorporating all KIT protein sequences and nucleic acids encoding KIT provided in the Yarden et al. reference. KIT was found to be downregulated in benign thyroid lesions and downregulated in malignant thyroid lesions as compared to normal thyroid tissue. Upon comparing benign tissue with malignant tissue, KIT was found to be upregulated in benign tissue as compared to malignant tissue. Thus, KIT expression decreases during malignancy. A nucleic acid encoding KIT is set forth herein as SEQ ID NO: 45. Nucleic acid sequences for KIT can also be accessed via GenBank Accession Nos. X06182 and NM_000222 and via Unigene No. Hs.81665. All of the information, including any nucleic acid and amino acid sequences provided for KIT under GenBank Accession No. X06182, GenBank Accession No. NM_000222 and via Unigene No. Hs.81665 is hereby incorporated in its entirety by this reference.

U6 small nuclear RNA Associated Homo sapiens LSM7 Homolog (LSM7) (DET5)

LSM7 is a U6 small nuclear ribonucleoprotein that is involved in tRNA processing. LSM7 was found to be upregulated in benign thyroid lesions and downregulated in malignant thyroid lesions as compared to normal thyroid tissue. Upon comparing benign tissue with malignant tissue, LSM7 was found to be upregulated in benign tissue as compared to malignant tissue. A nucleic acid sequence encoding LSM7 is set forth herein as SEQ ID NO: 47. Nucleic acid sequences for LSM7 can also be accessed via GenBank Accession No. NM_016199 and via Unigene No. Hs.512610. All of the information, including any nucleic acid and amino acid sequences provided for LSM7 under GenBank Accession No. NM_016199 and Unigene No. Hs.512610 is hereby incorporated in its entirety by this reference.

Synaptogyrin 2 (SYNGR2) (DET6)

SYNGR2 is a synaptic vesicle protein that may play a role in regulating membrane traffic. SYNGR2 was found to be downregulated in benign thyroid lesions and comparable to normal in malignant thyroid lesions as compared to normal thyroid tissue. Upon comparing benign tissue with malignant tissue, SYNGR2 was found to be upregulated in malignant tissue as compared to benign tissue. A nucleic acid encoding SYNGR2 is set forth herein as SEQ ID NO: 49. Nucleic acid sequences for SYNGR2 can also be accessed via GenBank Accession No. NM_004710 and via Unigene No. Hs.433753. All of the information, including any nucleic acid and amino acid sequences provided for SYNGR2 under GenBank Accession No. NM_004710 and via Unigene No. Hs.433753 is hereby incorporated in its entirety by this reference.

C11orf8 (DET7)

C11orf8 is a gene involved in central nervous system development and function. C11orf8 was found to be downregulated in both benign thyroid lesions and malignant thyroid lesions as compared to normal thyroid tissue. Upon comparing benign tissue with malignant tissue, C11orf8 was found to be upregulated in benign tissue as compared to malignant tissue. A nucleic acid encoding C11orf8 is set forth herein as SEQ ID NO: 51. Nucleic acid sequences for C11orf8 can also be accessed via GenBank Accession No. NM_001584 and via Unigene No. Hs.432000. All of the information, including any nucleic acid and amino acid sequences provided for C11orf8 under GenBank Accession No. NM_001584 and Unigene No. Hs.432000 is hereby incorporated in its entirety by this reference.

Cadherin 1, type 1, E-cadherin (CDH1) (DET8)

CDH1 is a cadherin protein involved in cell adhesion, motility, growth and proliferation. CDH1 was found to be upregulated in benign thyroid lesions and downregulated in malignant thyroid lesions as compared to normal thyroid tissue. Upon comparing benign tissue with malignant tissue, CDH1 was found to be upregulated in benign tissue as compared to malignant tissue. A nucleic acid encoding CDH1 is set forth herein as SEQ ID NO: 53. Nucleic acid sequences for CDH1 can also be accessed via GenBank Accession No. NM_004360 and via Unigene No. Hs.194657. All of the information, including any nucleic acid and amino acid sequences provided for CDH1 under GenBank Accession No. NM_004360 and Unigene No. Hs.194657 is hereby incorporated in its entirety by this reference.

Homo sapiens Family with Sequence Similarity 13, Member A1 (FAM13A1) (DET9)

FAM13A1 is a gene of unknown function. FAM13A1 was found to be upregulated in benign thyroid lesions and downregulated in malignant thyroid lesions as compared to normal thyroid tissue. Upon comparing benign tissue with malignant tissue, FAM13A1 was found to be upregulated in benign tissue as compared to malignant tissue. A nucleic acid encoding FAM13A1 is set forth herein as SEQ ID NO: 55. Nucleic acid sequences for FAM13A1 can also be accessed via GenBank Accession No. NM_014883 and via Unigene No. Hs.442818. All of the information, including any nucleic acid and amino acid sequences provided for FAM13A1 under GenBank Accession No. NM_014883 and Unigene No. Hs.442818 is hereby incorporated in its entirety by this reference.

Homo sapiens Hypothetical Protein IMPACT (IMPACT) (DET10)

IMPACT is a gene of unknown function. IMPACT was found to be upregulated in benign thyroid lesions and downregulated in malignant thyroid lesions as compared to normal thyroid tissue. Upon comparing benign tissue with malignant tissue, IMPACT was found to be upregulated in benign tissue as compared to malignant tissue. A nucleic acid encoding IMPACT is set forth herein as SEQ ID NO: 57. Nucleic acid sequences for IMPACT can also be accessed via GenBank Accession No. NM_018439 and via Unigene No. Hs.284245. All of the information, including any nucleic acid and amino acid sequences provided for IMPACT under GenBank Accession No. NM_018439 and Unigene No. Hs.284245 is hereby incorporated in its entirety by this reference.

KIAA1128 Protein (KIAA1128) (DET11)

KIAA1128 is a gene of unknown function. KIAA1128 was found to be upregulated in benign thyroid lesions and downregulated in malignant thyroid lesions as compared to normal thyroid tissue. Upon comparing benign tissue with malignant tissue, KIAA1128 was found to be upregulated in benign tissue as compared to malignant tissue. A nucleic acid encoding KIAA1128 is set forth herein as SEQ ID NO: 59. Nucleic acid sequences for KIAA1128 can also be accessed via GenBank Accession No. AB032954 and via Unigene No. Hs.81897. All of the information, including any nucleic acid and amino acid sequences provided for KIAA1128 under GenBank Accession No. AB032954 and via Unigene No. Hs.81897 is hereby incorporated in its entirety by this reference.

Differential Expression

As shown in Example 1, in a 6-gene panel, C21orf4, Hs.145049, KIT and LSM7 were upregulated in benign samples as compared to malignant samples (i.e., the expression of C21orf4, Hs.145049, KIT and LSM7 decreases during malignancy). Hs.296031 and SYNGR2 were upregulated in malignant samples as compared to benign samples (i.e., expression of Hs.296031 and SYNGR2 increases during malignancy).

As described in Example 1 and FIG. 7, in a ten-gene panel, C21orf4 (DET1), Hs.145049 (DET2), KIT (DET4), C11orf8 (DET7), CDH1 (DET8), FAM13A1 (DET9), IMPACT (DET10), KIAA1128 (DET11) were upregulated in benign samples as compared to malignant samples (the expression of C21orf4, Hs.145049, KIT, FAM13A1, C11orf8, KIAA1128, IMPACT and CDH1 decreases during malignancy). Hs.296031 (DET3) and SYNGR2 (DET6) were upregulated in malignant samples as compared to benign samples (expression of Hs.296031 and SYNGR2 increases during malignancy).

Thus, provided is a method for classifying a thyroid lesion in a subject as a benign lesion comprising: a) measuring the expression of DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10 and DET11 in a test cell population, wherein at least one cell in said test cell population is capable of expressing DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10 and DET11; b) comparing the expression of the nucleic acid sequence(s) to the expression of the same nucleic acid sequences in a reference cell population comprising at least one cell from a thyroid lesion known to be a benign lesion; and c) identifying similarity of expression levels of DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10 and DET11 in the test cell population and reference cell population, thereby classifying the thyroid lesion in the subject as a benign thyroid lesion.

Thus, provided is a method for classifying a thyroid lesion in a subject as a benign lesion comprising: a) measuring the expression of DET1, DET2, DET3, DET4, DET5 and DET6 in a test cell population, wherein at least one cell in said test cell population is capable of expressing DET1, DET2, DET3, DET4, DET5 and DET6; b) comparing the expression of the nucleic acid sequence(s) to the expression of the same nucleic acid sequences in a reference cell population comprising at least one cell from a thyroid lesion known to be a benign lesion; and c) identifying similarity of expression levels of DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10 and DET11 in the test cell population and reference cell population, thereby classifying the thyroid lesion in the subject as a benign thyroid lesion.

Thus, provided is a method for classifying a thyroid lesion in a subject as a benign lesion comprising: a) measuring the expression of DET1, DET2, DET3, DET4, DET5, DET6, DET7, DET8, DET9, DET10, DET11, DET12, DET13, DET14, DET15, DET16, DET17, DET18, DET19, DET20, DET21, DET22, DET23, DET24, DET25, DET26, DET27, DET28, DET29, DET30, DET31, DET32, DET33, DET34, DET35, DET36, DET37, DET38, DET39, DET40, DET41, DET42, DET43, DET44, DET45, DET46, DET47, DET48, DET49, DET50, DET51, DET52, DET53, DET54, DET55, DET56, DET57, DET58, DET59, DET60, DET61, DET62, DET63, DET64, DET65, DET66, DET67, DET68, DET69, DET70, DET71, DET72, DET73, DET74, DET75, DET76, DET77, DET78, DET79, DET80, DET81, DET82, DET83, DET84, and DET85 in a test cell population, wherein at least one cell in said test cell population is capable of expressing DET1, DET2, DET3, DET4, DET5, DET6, DET7, DET8, DET9, DET10, DET11, DET12, DET13, DET14, DET15, DET16, DET17, DET18, DET19, DET20, DET21, DET22, DET23, DET24, DET25, DET26, DET27, DET28, DET29, DET30, DET31, DET32, DET33, DET34, DET35, DET36, DET37, DET38, DET39, DET40, DET41, DET42, DET43, DET44, DET45, DET46, DET47, DET48, DET49, DET50, DET51, DET52, DET53, DET54, DET55, DET56, DET57, DET58, DET59, DET60, DET61, DET62, DET63, DET64, DET65, DET66, DET67, DET68, DET69, DET70, DET71, DET72, DET73, DET74, DET75, DET76, DET77, DET78, DET79, DET80, DET81, DET82, DET83, DET84, and DET85; b) comparing the expression of the nucleic acid sequence(s) to the expression of the same nucleic acid sequences in a reference cell population comprising at least one cell from a thyroid lesion known to be a benign lesion; and c) identifying similarity of expression levels of DET1,
DET2 through DET85 in the test cell population and reference cell population, thereby classifying the thyroid lesion in the subject as a benign thyroid lesion.

Thus, provided is a method for classifying a thyroid lesion in a subject as a malignant lesion comprising: a) measuring the expression of DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10 and DET11 in a test cell population, wherein at least one cell in said test cell population is capable of expressing DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10 and DET11; b) comparing the expression of the nucleic acid sequence(s) to the expression of the same nucleic acid sequences in a reference cell population comprising at least one cell from a thyroid lesion known to be a malignant lesion; and c) identifying similarity of expression levels of DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10 and DET11 in the test cell population and reference cell population, thereby classifying the thyroid lesion in the subject as a malignant thyroid lesion.

Thus, provided is a method for classifying a thyroid lesion in a subject as a malignant lesion comprising: a) measuring the expression of DET1, DET2, DET3, DET4, DET5 and DET6 in a test cell population, wherein at least one cell in said test cell population is capable of expressing DET1, DET2, DET3, DET4, DET5 and DET6; b) comparing the expression of the nucleic acid sequence(s) to the expression of the same nucleic acid sequences in a reference cell population comprising at least one cell from a thyroid lesion known to be a malignant lesion; and c) identifying similarity of expression levels of DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10 and DET11 in the test cell population and reference cell population, thereby classifying the thyroid lesion in the subject as a malignant thyroid lesion.

Thus, provided is a method for classifying a thyroid lesion in a subject as a malignant lesion comprising: a) measuring the expression of DET1 through DET85 in a test cell population, wherein at least one cell in said test cell population is capable of expressing DET1 through DET85; b) comparing the expression of the nucleic acid sequence(s) to the expression of the same nucleic acid sequences in a reference cell population comprising at least one cell from a thyroid lesion known to be a malignant lesion; and c) identifying similarity of expression levels of DET1 through DET85 in the test cell population and reference cell population, thereby classifying the thyroid lesion in the subject as a malignant thyroid lesion.

The present invention provides a method for one skilled in the art, using the molecular biological and statistical methods described herein, to quantify the gene expression levels of a particular DET gene in a number of tumor samples (reference cell populations) and obtain a statistical distribution of gene expression levels for that particular DET gene that is characteristic of a collection of benign or malignant tissues. Based on this information, a test cell population that is derived from a thyroid tumor of uncertain diagnosis can have its expression level for that particular DET gene characterized, using standard statistical software, as statistically more likely to belong to either the benign or the malignant distribution of gene expression levels, thereby designating that test cell population from a particular thyroid tumor as being either benign or malignant.

As disclosed herein, the nucleic acid sequences selected from the group consisting of C21orf4 (DET1) and Hs.145049 (DET2) are upregulated in malignant thyroid lesions consisting of cell populations from papillary thyroid carcinomas and follicular variant of papillary thyroid carcinomas, when compared to benign thyroid lesions consisting of cell populations from follicular adenomas and hyperplastic nodules. Thus, provided is a method to distinguish malignant thyroid lesions consisting of cell populations from papillary thyroid carcinomas and follicular variant of papillary thyroid carcinomas, from benign thyroid lesions consisting of cell populations from follicular adenomas and hyperplastic nodules.

As disclosed herein, the nucleic acid sequences selected from the group consisting of HMGA2 (DET12), KLK7 (DET13), MRC2 (DET14), LRRK2 (DET15), PLAG1 (DET16), CYP1B1 (DET17), DPP4 (DET18), FNDC4 (DET19), PHLDA2 (DET20), CCNA1 (DET21), CDH3 (DET22), CEACAM6 (DET23), QSCN6 (DET24), COL7A1 (DET25), MGC9712 (DET26), IL1RAP (DET27), LAMB3 (DET28), PRSS3 (DET29), LRP4 (DET30), SPOCK1 (DET31), PDE5A (DET32), FLJ37078 (DET33), FBN3 (DET34), DIRAS3 (DET35), PRSS1 (DET36), CAMK2N1 (DET37), SNIP (DET38), KCNJ2 (DET39), SFN (DET40), GALNT7 (DET41), TGFA (DET42), BAIAP3 (DET43), and KCNK15 (DET44) are upregulated in malignant thyroid lesions consisting of cell populations from papillary thyroid carcinomas, follicular variant of papillary thyroid carcinomas, follicular carcinomas, and Hurthle cell carcinomas, when compared to benign thyroid lesions consisting of cell populations from adenomatoid nodules, follicular adenomas, Hurthle cell adenomas, and lymphocytic thyroid nodules. Thus, provided is a method to distinguish malignant thyroid lesions consisting of cell populations from papillary thyroid carcinomas, follicular variant of papillary thyroid carcinomas, follicular carcinomas, and Hurthle cell carcinomas, from benign thyroid lesions consisting of cell populations from adenomatoid nodules, follicular adenomas, Hurthle cell adenomas, and lymphocytic thyroid nodules.

As disclosed herein, the nucleic acid sequences selected from the group consisting of KIT (DET4), LSM7 (DET5), C11orf8 (DET7), FAM13A1 (DET9), IMPACT (DET10), KIAA1128 (DET11), and CDH1 (DET8) are downregulated in malignant thyroid lesions consisting of cell populations from papillary thyroid carcinomas and follicular variant of papillary thyroid carcinomas, when compared to benign thyroid lesions consisting of cell populations from follicular adenomas and hyperplastic nodules. Thus, provided is a method to distinguish malignant thyroid lesions consisting of cell populations from papillary thyroid carcinomas and follicular variant of papillary thyroid carcinomas, from benign thyroid lesions consisting of cell populations from follicular adenomas and hyperplastic nodules.

Also disclosed herein, the nucleic acid sequences selected from the group consisting of KIT (DET4), RAG2 (DET45), CLYBL (DET46), NEB (DET47), TNFRSF11B (DET48), GNAI1 (DET49), AGTR1 (DET50), HLF (DET51), SLC26A4 (DET52), MT1A (DET53), FABP4 (DET54), LRP1B (DET55), SLC4A4 (DET56), LOC646278 (DET57), MAN1C1 (DET58), KCNIP3 (DET59), DNAJB9 (DET60), UBR1 (DET61), HSD17B6 (DET62), SLC33A1 (DET63), CDH16 (DET64), TBC1D1 (DET65), SLC26A7 (DET66), C11orf74 (DET67), PLA2R1 (DET68), PTTG3 (DET69), EFEMP1 (DET70), ZMAT4 (DET71), STEAP3 (DET72), DIO1 (DET73), TPO (DET74), PTTG1 (DET75), LGI3 (DET76), TMEM38B (DET77), SLITRK4 (DET78), VBP1 (DET79), COL9A3 (DET80), IRS1 (DET81), STARD13 (DET82), LOC654085 (DET83), RPS3A (DET84), and SPARCL1 (DET85) are downregulated in malignant thyroid lesions consisting of cell populations from papillary thyroid carcinomas, follicular variant of papillary thyroid carcinomas, follicular carcinomas, and Hurthle cell carcinomas, when compared to benign thyroid lesions consisting of cell populations from adenomatoid nodules, follicular adenomas, Hurthle cell adenomas, and lymphocytic thyroid nodules. Thus, provided is a method to distinguish malignant thyroid lesions consisting of cell populations from papillary thyroid carcinomas, follicular variant of papillary thyroid carcinomas, follicular carcinomas, and Hurthle cell carcinomas, from benign thyroid lesions consisting of cell populations from adenomatoid nodules, follicular adenomas, Hurthle cell adenomas, and lymphocytic thyroid nodules.

The disclosed methods of the present invention, including classifying, staging, and screening for a therapeutic, include an embodiment wherein the gene expression is not measured for only one of the DET genes selected from the group consisting of HMGA2, CYP1B1, DPP4, PHLDA2, LAMB3, LRP4, TGFA, RAG2, TNFRSF11B, SLC26A4, MT1A, FABP4, MAN1C1, HSD17B6 (RODH), PLA2R1, EFEMP1, DIO1, KIT, TPO, PTTG1, COL9A3, IRS1, and SPARCL1, or for only a combination of the DET genes selected from the group consisting of HMGA2, CYP1B1, DPP4, PHLDA2, LAMB3, LRP4, TGFA, RAG2, TNFRSF11B, SLC26A4, MT1A, FABP4, MAN1C1, HSD17B6 (RODH), PLA2R1, EFEMP1, DIO1, KIT, TPO, PTTG1, COL9A3, IRS1, and SPARCL1.

Diagnostic Methods

The diagnostic (e.g., staging and classification) methods provided herein are based on the comparison of an expression profile for a specific set of DET (one or more) in a test cell population to the expression profile for the same set of DET for a test cell population of known condition (e.g., normal thyroid, malignant thyroid tumor or benign thyroid tumor).

The present invention provides a method for classifying a thyroid lesion in a subject comprising: a) measuring the expression of one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10 and DET11 in a test cell population, wherein at least one cell in said test cell population is capable of expressing one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10 and DET11; b) comparing the expression of the nucleic acid sequence(s) to the expression of the nucleic acid sequence(s) in a reference cell population comprising at least one cell for which a thyroid lesion classification is known; and c) identifying a difference, if present, in expression levels of one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10 and DET11, in the test cell population and reference cell population, thereby classifying the thyroid lesion in the subject.

The present invention also provides a method for classifying a thyroid lesion in a subject comprising: a) measuring the expression of one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET5 and DET6 in a test cell population, wherein at least one cell in said test cell population is capable of expressing one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET5 and DET6; b) comparing the expression of the nucleic acid sequence(s) to the expression of the nucleic acid sequence(s) in a reference cell population comprising at least one cell for which a thyroid lesion classification is known; and c) identifying a difference, if present, in expression levels of one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET5 and DET6, in the test cell population and reference cell population, thereby classifying the thyroid lesion in the subject.

The present invention also provides a method for classifying a thyroid lesion in a subject comprising: a) measuring the expression of one or more nucleic acid sequences selected from the group consisting of DET1 through DET85 in a test cell population, wherein at least one cell in said test cell population is capable of expressing one or more nucleic acid sequences selected from the group consisting of DET1 through DET85; b) comparing the expression of the nucleic acid sequence(s) to the expression of the nucleic acid sequence(s) in a reference cell population comprising at least one cell for which a thyroid lesion classification is known; and c) identifying a difference, if present, in expression levels of one or more nucleic acid sequences selected from the group consisting of DET1 through DET85, in the test cell population and reference cell population, thereby classifying the thyroid lesion in the subject.

In the methods of the present invention, "classifying a thyroid lesion" is equivalent to diagnosing a subject with a type of thyroid lesion. These lesions can be benign or malignant. Examples of a benign lesion include, but are not limited to, follicular adenoma, hyperplastic nodule, papillary adenoma, thyroiditis nodule, multinodular goiter, adenomatoid nodules, Hurthle cell adenomas, and lymphocytic thyroiditis nodules. Examples of malignant lesions include, but are not limited to, papillary thyroid carcinoma, follicular variant of papillary thyroid carcinoma, follicular carcinoma, Hurthle cell tumor, anaplastic thyroid cancer, medullary thyroid cancer, thyroid lymphoma, poorly differentiated thyroid cancer and thyroid angiosarcoma.

In the methods of the present invention, measuring the expression levels of one or more nucleic acid sequences selected from the group consisting of DET1 through DET85 means that the expression of any combination of these sequences can be measured. For example, the expression level of any number of sequences from one to 85 selected from the group consisting of DET1 through DET85 can be measured.

Also disclosed herein is a method of classifying a tumor as malignant or benign based on the statistical similarity of the expression levels found in the tumor cells in question of the nucleic acid sequences selected from the group consisting of C21orf4 (DET1), Hs.145049 (DET2), and HMGA2 (DET12) through KCNK15 (DET44), that are upregulated in malignant thyroid lesions consisting of cell populations from papillary thyroid carcinomas and follicular variant of papillary thyroid carcinomas, when compared to benign thyroid lesions consisting of cell populations from follicular adenomas and hyperplastic nodules.

The present invention also provides a method for classifying a thyroid lesion as malignant or benign in a subject comprising: a) measuring the expression of one or more nucleic acid sequences selected from the group consisting of C21orf4 (DET1), Hs.145049 (DET2), HMGA2 (DET12), KLK7 (DET13), MRC2 (DET14), LRRK2 (DET15), PLAG1 (DET16), CYP1B1 (DET17), DPP4 (DET18), FNDC4 (DET19), PHLDA2 (DET20), CCNA1 (DET21), CDH3 (DET22), CEACAM6 (DET23), QSCN6 (DET24), COL7A1 (DET25), MGC9712 (DET26), IL1RAP (DET27), LAMB3 (DET28), PRSS3 (DET29), LRP4 (DET30), SPOCK1 (DET31), PDE5A (DET32), FLJ37078 (DET33), FBN3 (DET34), DIRAS3 (DET35), PRSS1 (DET36), CAMK2N1 (DET37), SNIP (DET38), KCNJ2 (DET39), SFN (DET40), GALNT7 (DET41), TGFA (DET42), BAIAP3 (DET43), and KCNK15 (DET44), in a test cell population, wherein at least one cell in said test cell population is capable of expressing one or more nucleic acid sequences selected from the same group (DET1, DET2, and DET12-44); b) comparing the expression of the nucleic acid sequence(s) to the expression of the nucleic acid sequence(s) in two reference cell populations comprising cells from malignant thyroid lesions, and cells from benign thyroid lesions; and c) identifying a similarity, if present, in expression levels of one or more nucleic acid sequences selected from the same group (DET1, DET2, and DET12-44), in the test cell population and reference cell populations, thereby classifying the thyroid lesion in the subject as malignant if one or more nucleic acid sequences of that group are found to be overexpressed. For example, if any one of these DET genes, DET1, DET2, and DET12-44, is used alone, it has been shown that they were all significantly differentially overexpressed in malignant vs. benign tumor types. In the methods of the invention the malignant cell populations can be from papillary thyroid carcinomas, follicular variant of papillary thyroid carcinomas, follicular carcinomas, and Hurthle cell carcinomas. In the methods of the invention the benign cell populations can be from follicular adenomas, hyperplastic nodules, adenomatoid nodules, Hurthle cell adenomas, and lymphocytic thyroid nodules.

Also disclosed herein is the method of classifying a tumor as malignant or benign based on the statistical similarity of the expression levels found in the tumor cells in question of the nucleic acid sequences selected from the group consisting of KIT (DET4), LSM7 (DET5), C11orf8 (DET7), FAM13A1 (DET9), IMPACT (DET10), KIAA1128 (DET11), CDH1 (DET8), and RAG2 (DET45) through SPARCL1 (DET85), that are downregulated in malignant thyroid lesions consisting of cell populations from papillary thyroid carcinomas and follicular variant of papillary thyroid carcinomas, when compared to benign thyroid lesions consisting of cell populations from follicular adenomas and hyperplastic nodules.

The present invention also provides a method for classifying a thyroid lesion as malignant or benign in a subject comprising: a) measuring the expression of one or more nucleic acid sequences selected from the group consisting of KIT (DET4), LSM7 (DET5), C11orf8 (DET7), FAM13A1 (DET9), IMPACT (DET10), KIAA1128 (DET11), CDH1 (DET8), RAG2 (DET45), CLYBL (DET46), NEB (DET47), TNFRSF11B (DET48), GNAI1 (DET49), AGTR1 (DET50), HLF (DET51), SLC26A4 (DET52), MT1A (DET53), FABP4 (DET54), LRP1B (DET55), SLC4A4 (DET56), LOC646278 (DET57), MAN1C1 (DET58), KCNIP3 (DET59), DNAJB9 (DET60), UBR1 (DET61), HSD17B6 (DET62), SLC33A1 (DET63), CDH16 (DET64), TBC1D1 (DET65), SLC26A7 (DET66), C11orf74 (DET67), PLA2R1 (DET68), PTTG3 (DET69), EFEMP1 (DET70), ZMAT4 (DET71), STEAP3 (DET72), DIO1 (DET73), TPO (DET74), PTTG1 (DET75), LGI3 (DET76), TMEM38B (DET77), SLITRK4 (DET78), VBP1 (DET79), COL9A3 (DET80), IRS1 (DET81), STARD13 (DET82), LOC654085 (DET83), RPS3A (DET84), and SPARCL1 (DET85), in a test cell population, wherein at least one cell in said test cell population is capable of expressing one or more nucleic acid sequences selected from the same group (DET4, DET5, DET7-11, and DET45-85); b) comparing the expression of the nucleic acid sequence(s) to the expression of the nucleic acid sequence(s) in two reference cell populations comprising cells from malignant thyroid lesions, and cells from benign thyroid lesions; and c) identifying a similarity, if present, in expression levels of one or more nucleic acid sequences selected from the same group (DET4, DET5, DET7-11, and DET45-85), in the test cell population and reference cell populations, thereby classifying the thyroid lesion in the subject as malignant if one or more nucleic acid sequences of that group are found to be underexpressed. For example, if any one of these DET genes, DET4, DET5, DET7-11, and DET45-85, is used alone, it has been shown that they were all significantly differentially underexpressed in malignant vs. benign tumor types. In the methods of the invention the malignant cell populations can be from papillary thyroid carcinomas, follicular variant of papillary thyroid carcinomas, follicular carcinomas, and Hurthle cell carcinomas. In the methods of the invention the benign cell populations can be from follicular adenomas, hyperplastic nodules, adenomatoid nodules, Hurthle cell adenomas, and lymphocytic thyroid nodules.

DET20 through DET85 in a test cell population, wherein at least one cell in said test cell population is capable of expressing one or more nucleic acid sequences selected from the group consisting of DET1 through DET85; and determining a class of tumor, wherein the determination is made by applying a statistical classifier or predictor model to the gene expression data; and outputting the class of tumor as malignant or benign based on the determination.

As disclosed herein, the method for classifying a thyroid lesion in a subject as malignant or benign comprises receiving gene expression data of one or more nucleic acid sequences selected from the group consisting of the differentially expressed thyroid genes DET1, DET2, DET3, DET4, DET5, and DET6 in a test cell population, wherein at least one cell in said test cell population is capable of expressing one or more
nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET5, and DET6; and determining a class of tumor, wherein the determination is made by applying a statistical classifier or predictor model to the gene expression data; and outputting the class of tumor as malignant or benign based on the determination.
For example, the specific DET1-6 gene expression patterns that are shown in FIG. 4 can be used as a comparator, such that if an unknown tumor sample matches these patterns, it can then be classified as malignant or benign. Thus, provided is a method of classifying, staging or identifying a therapeutic agent comprising the step of comparing the expression pattern of sample (e.g., thyroid tumor cell or tissue) from a subject with the patterns displayed in FIG. 4, thereby identifying the tumor as benign or malignant. A similar approach can be taken using other sets of genes to classify a thyroid tumor as benign versus malignant.
As disclosed herein, the method for classifying a thyroid lesion in a subject as malignant or benign comprises receiving gene expression data of one or more nucleic acid sequences selected from the group consisting of the differentially expressed thyroid genes DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10, DET11, in a test cell population, wherein at least one cell in said test cell population is capable of expressing one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10, DET11; and determining a class of tumor, wherein the determination is made by applying a statistical classifier or predictor model to the gene expression data; and outputting the class of tumor as malignant or benign based on the determination.
As disclosed herein, the method for classifying a thyroid lesion in a subject as malignant or benign comprises receiving gene expression data of one or more nucleic acid sequences
In the methods of the present invention, the classifier, predictor model, or diagnosis-predictor model can be a compound covariate predictor, a diagonal linear discriminant analysis, nearest-neighbor classification, or support vector machines with linear kernel. For example, with the nearest neighbor classifier or predictor model, data are provided that show 73% sensitivity, 82% specificity, and 78% predictive value for the prediction of malignancy.
In the methods of the present invention, the differentially expressed thyroid genes incorporated into the classifier, predictor model, or diagnosis-predictor model can be differentially expressed in malignant vs. benign thyroid tumors with a level of statistical significance signified with a P value of less than 0.05 using standard statistical analysis. More specifically, the P value can be less than 0.0001 to limit the number of false positives. In the methods of the present invention, standard statistical analysis can be an ANOVA test with Bonferroni correction, or a random-variance t test.
In the methods of the present invention, measuring the expression levels of one or more nucleic acids sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10 and DET11, means that the expression of any combination of these sequences can be measured. For example, the expression level of one, two, three, four, five, six, seven, eight, nine or ten sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10 and DET11 can be measured. Similarly, when measuring the expression levels of one or more nucleic acid sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET5 and DET6, one of skill in the art can measure the expression level of one, two, three, four, five or six sequences selected from the group consisting of DET1, DET2, DET3, DET4, DET5 and DET6.
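By way of illustration only, nearest-neighbor classification of an expression profile, and the calculation of sensitivity and specificity for such a classifier, can be sketched as follows. This is a minimal sketch and not the classifier used to generate the reported figures: the expression values, the two-gene profiles, the Euclidean distance metric, and the function names are all assumptions made for the example.

```python
import math


def classify_nearest_neighbor(sample, references):
    """Return the class label of the reference profile closest to `sample`.

    `references` is a list of (expression_profile, label) pairs. Euclidean
    distance is an assumption; the text does not specify a metric.
    """
    _, label = min(references, key=lambda ref: math.dist(sample, ref[0]))
    return label


def sensitivity_specificity(predicted, actual, positive="malignant"):
    """Return (sensitivity, specificity) treating `positive` as the positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == positive and a == positive)
    fn = sum(1 for p, a in pairs if p != positive and a == positive)
    tn = sum(1 for p, a in pairs if p != positive and a != positive)
    fp = sum(1 for p, a in pairs if p == positive and a != positive)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical normalized expression profiles (two genes per sample).
REFERENCES = [
    ((2.1, 0.4), "malignant"),
    ((1.8, 0.6), "malignant"),
    ((0.5, 1.9), "benign"),
    ((0.7, 2.2), "benign"),
]

if __name__ == "__main__":
    # An unknown sample is assigned the class of its closest reference profile.
    print(classify_nearest_neighbor((1.9, 0.5), REFERENCES))
```

In practice the reference profiles would come from cell populations whose classification is known, and the sensitivity and specificity would be estimated on samples not used as references.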
selected from the group consisting of the differentially In the methods of the present invention, the invention expressed thyroid genes DET1, DET2, DET3, DET4, DET5, 65 includes providing a test population which includes at least DET6, DET7, DET8, DET9, DET10, DET11, DET12, once cell that is capable of expressing one or more of the DET13, DET14, DET15, DET16, DET17, DET18, DET19, sequences DET1-85. As utilized herein, “expression” refers US 9,234,244 B2 27 28 to the transcription of a DET gene to yield a DET nucleic acid, Wunsch, J. Mol. Biol. 48: 443 (1970), by the search for such as a DET mRNA. The term “expression' also refers to similarity method of Pearson and Lipman, Proc. Natl. Acad. the transcription and translation of a DET gene to yield the Sci. U.S.A. 85: 2444 (1988), by computerized implementa encoded protein, in particular a DET protein or a fragment tions of these algorithms (GAP, BESTFIT. FASTA, and thereof. Therefore, one of skill in the art can detect the expres TFASTA in the Wisconsin Genetics Software Package, sion of a DET gene by monitoring DET nucleic acid produc Genetics Computer Group, 575 Science Dr. Madison, Wis.; tion and/or expression of the DET protein. As utilized herein, the BLAST algorithm of Tatusova and Madden FEMS Micro “upregulated’ refers to an increase in expression and "down biol. Lett. 174: 247-250 (1999) available from the National regulated refers to a decrease in expression. Center for Biotechnology Information or by inspection. Simi In the methods of the present invention, the reference cell 10 larly, the present invention provides for the detection of DET population can be from normal thyroid tissue, cancerous thy proteins that are homologues of human DET proteins in other roid tissue or any other type of thyroid tissue for which a species. It would be readily apparent to one of skill in the art classification is known. 
As used herein, “a cell of a normal that the DET sequences set forth herein and in GenBank can subject' or “normal thyroid tissue” means a cell or tissue be utilized in sequence comparisons to identify DET which is histologically normal and was obtained from a Sub 15 sequences in other species. ject believed to be without malignancy and having no The sample of this invention, Such as a test cell population increased risk of developing a malignancy or was obtained or a reference cell population, can be from any organism and from tissues adjacent to tissue known to be malignant and can be, but is not limited to, peripheral blood, urine, saliva, which is determined to be histologically normal (non-malig sputum, feces, bone marrow specimens, primary tumors, nant) as determined by a pathologist. The reference cell popu embedded tissue sections, frozen tissue sections, cell prepa lation can be from any Subject, including cells of the Subject rations, cytological preparations, exfoliate samples (e.g., spu being tested obtained prior to developing the condition that tum), fine needle aspirations, lung fluid, amnion cells, fresh lead to the testing. The normal reference cell population can tissue, dry tissue, and cultured cells or tissue. The sample can be homogeneous for normal cells. be from malignant tissue or non-malignant tissue. The sample Using the sequence information provided herein and the 25 can be thyroid cells or thyroid tissue. The sample can be sequences provided by the database entries, the expression of unfixed or fixed according to standard protocols widely avail the DET sequences or fragments thereof can be detected, if able in the art and can also be embedded in a suitable medium present, and measured using techniques well known in the art. for preparation of the sample. 
For example, the sample can be For example, sequences disclosed herein can be used to con embedded in paraffin or other Suitable medium (e.g., epoxy or struct probes for detecting DET DNA and RNA sequences. 30 acrylamide) to facilitate preparation of the biological speci The amount of a DET nucleic acid, for example, DET mRNA, men for the detection methods of this invention. Furthermore, in a cell can be determined by methods standard in the art for the sample can be embedded in any commercially available detecting or quantitating a nucleic acid in a cell. Such as in situ mounting medium, either aqueous or organic. hybridization, quantitative PCR, Northern blotting, The sample can be on, Supported by, or attached to, a ELISPOT, dot blotting, etc., as well as any other method now 35 substrate which facilitates detection. A substrate of the known or later developed for detecting or quantitating the present invention can be, but is not limited to, a microscope amount of a nucleic acid in a cell. slide, a culture dish, a culture flask, a culture plate, a culture The presence or amount of a DET protein in or produced by chamber, ELISA plates, as well as any other substrate that can a cell can be determined by methods standard in the art, such be used for containing or Supporting biological samples for as Western blotting, ELISA, ELISPOT, immunoprecipita 40 analysis according to the methods of the present invention. tion, immunofluorescence (e.g., FACS), immunohistochem The substrate can be of any material suitable for the purposes istry, immunocytochemistry, etc., as well as any other method of this invention, such as, for example, glass, plastic, poly now known or later developed for detecting or quantitating styrene, mica and the like. The substrates of the present inven protein in or produced by a cell. tion can be obtained from commercial sources or prepared As used throughout, “subject” means an individual. 
Pref 45 according to standard procedures well known in the art. erably, the Subject is a mammal Such as a primate, and, more Additionally, an antibody or fragment thereof, an antigenic preferably, a human. The term “subject' includes domesti fragment of a DET protein, or DET nucleic acid of the inven cated animals, such as cats, dogs, etc., livestock (e.g., cattle, tion can be on, Supported by, or attached to a Substrate which horses, pigs, sheep,goats, etc.), and laboratory animals (e.g., facilitates detection. Such a substrate can include a chip, a mouse, monkey, rabbit, rat, guinea pig, etc.). 50 microarray or a mobile solid support. Thus, provided by the The present invention also provides for detection of vari invention are substrates including one or more of the antibod ants of the DET nucleic acids and polypeptides disclosed ies or antibody fragments, antigenic fragments of DET pro herein. In general, variants of nucleic acids and polypeptides teins, or DET nucleic acids of the invention. herein disclosed typically have at least, about 70, 71, 72,73, The nucleic acids of this invention can be detected with a 74, 75,76, 77,78, 79,80, 81, 82, 83, 84, 85,86, 87,88, 89,90, 55 probe capable of hybridizing to the nucleic acid of a cell or a 91, 92,93, 94, 95, 96, 97,98, or 99 percent sequence simi sample. This probe can be a nucleic acid comprising the larity (also referred to herein as “homology’) to the stated nucleotide sequence of a coding strand or its complementary sequence or the native sequence. Those of skill in the art Strand or the nucleotide sequence of a sense Strand or anti readily understand how to determine the homology of two sense strand, or a fragment thereof. The nucleic acid can polypeptides or nucleic acids. For example, the homology can 60 comprise the nucleic acidofa DET gene or fragments thereof. 
be calculated after aligning the two sequences so that the Thus, the probe of this invention can be either DNA or RNA homology is at its highest level. and can bind either DNA or RNA, or both, in the biological Another way of calculating homology can be performed by sample. The probe can be the coding or complementary published algorithms. Optimal alignment of sequences for strand of a complete DET gene or DET gene fragment. comparison may be conducted by the local homology algo 65 The nucleic acids of the present invention, for example, rithm of Smith and Waterman Adv. Appl. Math. 2: 482 (1981), DET1-DET85 nucleic acids and fragments thereof, can be by the homology alignment algorithm of Needleman and utilized as probes or primers to detect DET nucleic acids. US 9,234,244 B2 29 30 Therefore, the present invention provides DET polynucle 6-carboxy-X-rhodamine (ROX), 6-carboxy-2',4',7,4,7- otide probes or primers that can be at least 15, 25, 30, 35, 40, hexachlorofluorescein (HEX), 5-carboxyfluorescein 45, 50,55, 60, 65,70, 75, 80, 85,90, 95, 100,105, 110, 115, (5-FAM) or N.N.N',N'-tetramethyl-6-carboxyrhodamine 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, (TAMRA), radioactive labels, e.g., P. S. H; etc. The label 180, 185, 190, 195, 200, 250, 300,350 or at least 400 nucle 5 may be a two stage system, where the amplified DNA is otides in length. conjugated to biotin, haptens, etc. having a high affinity bind As used herein, the term “nucleic acid probe' refers to a ing partner, e.g. avidin, specific antibodies, etc., where the nucleic acid fragment that selectively hybridizes under strin binding partner is conjugated to a detectable label. The label gent conditions with a nucleic acid comprising a nucleic acid may be conjugated to one or both of the primers. Alterna set forth in a DET sequence provided herein. This hybridiza 10 tively, the pool of nucleotides used in the amplification is tion must be specific. 
labeled, so as to incorporate the label into the amplification product. The amplification reaction can also include a dual fluorescent probe, as described in the Examples, which hybridizes to and detects the amplification product thus allowing real time quantitation of the amplification product. Therefore, expression of the nucleic acid(s) of the present invention can be measured by amplifying the nucleic acid(s) and detecting the amplified nucleic acid with a fluorescent probe.
For example, DET1 can be amplified utilizing forward primer GCAATCCTCTTACCTCCGCTTT (SEQ ID NO: 7) and reverse primer GGAATCGGAGACAGAAGAGAGCTT (SEQ ID NO: 8). The nucleic acid amplified by these primers can be detected with a probe comprising the nucleic acid sequence CTGGGACCACAGATGTATCCTCCACTCC (SEQ ID NO: 9) linked to a fluorescent label. These primers are merely exemplary for the amplification of DET1 as one of skill in the art would know how to design primers, based on
The degree of complementarity between the hybridizing nucleic acid and the sequence to which it hybridizes should be at least enough to exclude hybridization with a nucleic acid encoding an unrelated protein.
Stringent conditions refers to the washing conditions used in a hybridization protocol. In general, the washing conditions should be a combination of temperature and salt concentration chosen so that the denaturation temperature is approximately 5-20° C. below the calculated Tm of the nucleic acid hybrid under study. The temperature and salt conditions are readily determined empirically in preliminary experiments in which samples of reference DNA immobilized on filters are hybridized to the probe or protein coding nucleic acid of interest and then washed under conditions of different stringencies. The Tm of such an oligonucleotide can be estimated by allowing 2° C. for each A or T nucleotide, and 4° C. for each G or C. For example, an 18 nucleotide probe of 50% G+C would, therefore, have an approximate Tm of 54° C.
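The estimate described above (2° C. for each A or T, 4° C. for each G or C) can be written out as a short calculation. The following is a minimal sketch; the function names and the example 18-nucleotide sequence are hypothetical, and the temperature range helper simply applies the 5-20° C. offset stated above.

```python
def estimate_tm_celsius(oligo):
    """Estimate the Tm of a short oligonucleotide: 2 C per A/T, 4 C per G/C."""
    seq = oligo.upper()
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    if at_count + gc_count != len(seq):
        raise ValueError("sequence must contain only A, C, G and T")
    return 2 * at_count + 4 * gc_count


def stringent_temperature_range(tm):
    """Return the (low, high) range lying 20 C to 5 C below the estimated Tm."""
    return tm - 20, tm - 5


# Hypothetical 18-nucleotide probe with 9 A/T and 9 G/C (50% G+C).
EXAMPLE_PROBE = "ATGCATGCATGCATGCAG"

if __name__ == "__main__":
    tm = estimate_tm_celsius(EXAMPLE_PROBE)
    print(tm, stringent_temperature_range(tm))
```

For the hypothetical 18-mer, nine A/T positions contribute 18° C. and nine G/C positions contribute 36° C., which reproduces the 54° C. figure given in the text. The rule is a rough estimate intended for short oligonucleotides.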
Stringent conditions are known to one of skill in the art. See, for example, Sambrook et al. (2001). An example of stringent wash conditions is 4xSSC at 65° C. Highly stringent wash conditions include, for example, 0.2xSSC at 65° C.
As mentioned above, the DET nucleic acids and fragments thereof can be utilized as primers to amplify a DET nucleic acid, such as a DET gene transcript, by standard amplification techniques. For example, expression of a DET gene transcript can be quantified by RT-PCR using RNA isolated from cells, as described in the Examples.
A variety of PCR techniques are familiar to those skilled in the art. For a review of PCR technology, see White (1997) and the publication entitled “PCR Methods and Applications” (1991, Cold Spring Harbor Laboratory Press), which is incorporated herein by reference in its entirety for amplification methods. In each of these PCR procedures, PCR primers on either side of the nucleic acid sequences to be amplified are added to a suitably prepared nucleic acid sample along with dNTPs and a thermostable polymerase such as Taq polymerase, Pfu polymerase, or Vent polymerase. The nucleic acid in the sample is denatured and the PCR primers are specifically hybridized to complementary nucleic acid sequences in the sample. The hybridized primers are extended. Thereafter, another cycle of denaturation, hybridization, and extension is initiated. The cycles are repeated multiple times to produce an amplified fragment containing the nucleic acid sequence between the primer sites. PCR has further been described in several patents including U.S. Pat. Nos. 4,683,195, 4,683,202 and 4,965,188. Each of these publications is incorporated herein by reference in its entirety for PCR methods. One of skill in the art would know how to design and synthesize primers that amplify a DET sequence or a fragment thereof.
the DET1 nucleic acid sequences provided herein, such as SEQ ID NO: 40 and the nucleic acid sequences provided by the database entries, to amplify a DET1 nucleic acid. Similarly, the probe sequences provided herein are merely exemplary for the detection of a DET1 nucleic acid, as one of skill in the art would know how to design a probe, based on the DET1 nucleic acid sequences provided herein, such as SEQ ID NO: 40 and the nucleic acid sequences provided by the database entries, to detect a DET1 nucleic acid.
DET2 can be amplified utilizing forward primer GGCTGACTGGCAAAAAGTCTTG (SEQ ID NO: 1) and reverse primer TTGGTTCCCTTAAGTTCTCAGAGTTT (SEQ ID NO: 2). The nucleic acid amplified by these primers can be detected with a probe comprising the nucleic acid sequence TGGCCCTGTCACTCCCATGATGC (SEQ ID NO: 3) linked to a fluorescent label. These primers are merely exemplary for the amplification of DET2 as one of skill in the art would know how to design primers, based on the DET2 nucleic acid sequences provided herein, such as SEQ ID NO: 42 and the nucleic acid sequences provided by the database entries, to amplify a DET2 nucleic acid. Similarly, the probe sequences provided herein are merely exemplary for the detection of a DET2 nucleic acid, as one of skill in the art would know how to design a probe, based on the DET2 nucleic acid sequences provided herein, such as SEQ ID NO: 42 and the nucleic acid sequences provided by the database entries, to detect a DET2 nucleic acid.
DET3 can be amplified utilizing forward primer TGCCAAGGAGCTTTGTTTATAGAA (SEQ ID NO: 19) and reverse primer ATGACGGCATGTACCAACCA (SEQ ID NO: 20). The nucleic acid amplified by these primers can be detected with a probe comprising the nucleic acid sequence TTGGTCCCCTCAGTTCTATGCTGTTGTGT (SEQ ID NO: 21) linked to a fluorescent label.
These primers are merely exemplary for the amplification of DET3 as one of skill in the art would know how to design primers, based on the DET3 nucleic acid sequences provided herein, such as SEQ ID NO: 44 and the nucleic acid sequences provided by the database entries, to amplify a DET3 nucleic acid. Similarly, the probe sequences provided herein are merely exemplary for the detection of a DET3 nucleic acid, as one of skill in the art would know how to design a probe, based on the DET3 nucleic acid sequences provided herein, such as SEQ ID NO: 44 and the nucleic acid sequences provided by the database entries, to detect a DET3 nucleic acid.
DET4 can be amplified utilizing forward primer GCACCTGCTGAAATGTATGACATAAT (SEQ ID NO: 22) and reverse primer TTTGCTAAGTTGGAGTAAATATGATTGG (SEQ ID NO: 23). The nucleic acid amplified by these primers can be detected with a probe comprising the nucleic acid sequence ATTGTTCAGCTAATTGAGAAGCAGATTTCAGAGAGC (SEQ ID NO: 24) linked to a fluorescent label.
A detectable label may be included in an amplification reaction. Suitable labels include fluorochromes, e.g. fluorescein isothiocyanate (FITC), rhodamine, Texas Red, phycoerythrin, allophycocyanin, 6-carboxyfluorescein (6-FAM), 2',7'-dimethoxy-4',5'-dichloro-6-carboxyfluorescein (JOE),
a fluorescent label. These primers are merely exemplary for the amplification of DET7 as one of skill in the art would know how to design primers, based on the DET7 nucleic acid sequences provided herein, such as SEQ ID NO: 51 and the nucleic acid sequences provided by the database entries, to amplify a DET7 nucleic acid. Similarly, the probe sequences provided herein are merely exemplary for the detection of a DET7 nucleic acid, as one of skill in the art would know how to design a probe, based on the DET7 nucleic acid sequences provided herein, such as SEQ ID NO: 51 and the nucleic acid sequences provided by the database entries, to detect a DET7 nucleic acid.
DET8 can be amplified utilizing forward primer TGAGTGTCCCCCGGTATCTTC (SEQ ID NO: 28) and reverse primer CAGCCGCTTTCAGATTTTCAT (SEQ ID NO: 29). The nucleic acid amplified by these primers can be detected with a probe comprising the nucleic acid sequence CCTGCCAATCCCGATGAAATTGGAAAT (SEQ ID NO: 30) linked to a fluorescent label. These primers are merely exemplary for the amplification of DET8 as one of skill in the art would know how to design primers, based on the DET8 nucleic acid sequences provided herein, such as SEQ ID NO: 53 and the nucleic acid sequences provided by the database entries, to amplify a DET8 nucleic acid. Similarly, the probe sequences provided herein are merely exemplary for the detection of a DET8 nucleic acid, as one of skill in the art would know how to design a probe, based on the DET8 nucleic acid sequences provided herein, such as SEQ ID NO: 53 and the nucleic acid sequences provided by the database entries, to detect a DET8 nucleic acid.
These primers are merely exemplary for the amplification of DET4 as one of skill in the art would know how to design primers, based on the DET4 nucleic acid sequences provided herein, such as SEQ ID NO: 45 and the nucleic acid sequences provided by the database entries, to amplify a DET4 nucleic acid. Similarly, the probe sequences provided herein are merely exemplary for the detection of a DET4 nucleic acid, as one of skill in the art would know how to design a probe, based on the DET4 nucleic acid sequences provided herein, such as SEQ ID NO: 45 and the nucleic acid sequences provided by the database entries, to detect a DET4 nucleic acid.
DET5 can be amplified utilizing forward primer GACGATCCGGGTAAAGTTCCA (SEQ ID NO: 34) and reverse primer AGGTTGAGGAGTGGGTCGAA (SEQ ID NO: 35). The nucleic acid amplified by these primers can be detected with a probe comprising the nucleic acid sequence AGGCCGCGAAGCCAGTGGAATC (SEQ ID NO: 36) linked to a fluorescent label.
These primers are merely exemplary for the amplification of DET5 as one of skill in the art would know how to design primers, based on the DET5 nucleic acid sequences provided herein, such as SEQ ID NO: 47 and the nucleic acid sequences provided by the database entries, to amplify a DET5 nucleic acid. Similarly, the probe sequences provided herein are merely exemplary for the detection of a DET5 nucleic acid, as one of skill in the art would know how to design a probe, based on the DET5 nucleic acid sequences provided herein, such as SEQ ID NO: 47 and the nucleic acid sequences provided by the database entries, to detect a DET5 nucleic acid.
DET6 can be amplified utilizing forward primer GCTGGTGCTCATGGCACTT (SEQ ID NO: 31) and reverse primer CCCTCCCCAGGCTTCCTAA (SEQ ID NO: 32). The nucleic acid amplified by these primers can be detected with a probe comprising the nucleic acid sequence AAGGGCTTTGCCTGACAACACCCA (SEQ ID NO: 33) linked to a fluorescent label.
DET9 can be amplified utilizing forward primer ATGGCAGTGCAGTCATCATCTT (SEQ ID NO: 10) and reverse primer GCATTCATACAGCTGCTTACCATCT (SEQ ID NO: 11). The nucleic acid amplified by these primers can be detected with a probe comprising the nucleic acid sequence TTTGGTCCCTGCCTAGGACCGGG (SEQ ID NO: 12) linked to a fluorescent label. These primers are merely exemplary for the amplification of DET9 as one of skill in the art would know how to design primers, based on the DET9 nucleic acid sequences provided herein, such as SEQ ID NO: 55 and the nucleic acid sequences provided by the database entries, to amplify a DET9 nucleic acid. Similarly, the probe sequences provided herein are merely exemplary for the detection of a DET9 nucleic acid, as one of skill in the art would know how to design a probe, based on the DET9 nucleic acid sequences provided herein, such as SEQ ID NO: 55 and the nucleic acid sequences provided by the database entries, to detect a DET9 nucleic acid.
These primers are merely exemplary for the amplification of DET6 as one of skill in the art would know how to design primers, based on the DET6 nucleic acid sequences provided herein, such as SEQ ID NO: 49 and the nucleic acid sequences provided by the database entries, to amplify a DET6 nucleic acid. Similarly, the probe sequences provided herein are merely exemplary for the detection of a DET6 nucleic acid, as one of skill in the art would know how to design a probe, based on the DET6 nucleic acid sequences provided herein, such as SEQ ID NO: 49 and the nucleic acid sequences provided by the database entries, to detect a DET6 nucleic acid.
DET7 can be amplified utilizing forward primer CCGGCCCAAGCTCCAT (SEQ ID NO: 13) and reverse primer TTGTGTAACCGTCGGTCATGA (SEQ ID NO: 14). The nucleic acid amplified by these primers can be detected with a probe comprising the nucleic acid sequence TGTTTGGTGGAATCCATGAAGGTTATGGC (SEQ ID NO: 15) linked to
DET10 can be amplified utilizing forward primer TGAAGAATGTCATGGTGGTAGTATCA (SEQ ID NO: 25) and reverse primer ATGACTCCTCAGGTGAATTTGTGTAG (SEQ ID NO: 26). The nucleic acid amplified by these primers can be detected with a probe comprising the nucleic acid sequence CTGGTATGGAGGGATTCTGCTAGGACCAG (SEQ ID NO: 27) linked to a fluorescent label. These primers are merely exemplary for the amplification of DET10 as one of skill in the art would know how to design primers, based on the DET10 nucleic acid sequences provided herein, such as SEQ ID NO: 57 and the nucleic acid sequences provided by the database entries, to amplify a DET10 nucleic acid. Similarly, the probe sequences provided herein are merely exemplary for the detection of a DET10 nucleic acid, as one of skill in the art would know how to design a probe, based on the DET10 nucleic acid sequences provided herein, such as SEQ ID NO: 57 and the nucleic acid sequences provided by the database entries, to detect a DET10 nucleic acid.
DET11 can be amplified utilizing forward primer GAGAGCGTGATCCCCCTACA (SEQ ID NO: 16) and reverse primer ACCAAGAGTGCACCTCAGTGTCT (SEQ ID NO: 17). The nucleic acid amplified by these primers can be detected with a probe comprising the nucleic acid sequence TCACTTCCAAATGTTCCTGTAGCATAAATGGTG (SEQ ID NO: 18) linked to a fluorescent label. These primers are merely exemplary for the amplification of DET11 as one of skill in the art would know how to design primers, based on the DET11 nucleic acid sequences provided herein, such as SEQ ID NO: 59 and the nucleic acid sequences provided by the database entries, to amplify a DET11 nucleic acid. Similarly, the probe sequences provided herein are merely exemplary for the detection of a DET11 nucleic acid, as one of skill in the art would know how to design a probe, based on the DET11 nucleic acid sequences provided herein, such as SEQ ID NO: 59 and the nucleic acid sequences provided by the database entries, to detect a DET11 nucleic acid.
The sample nucleic acid, e.g. amplified fragment, can be analyzed by one of a number of methods known in the art. The nucleic acid can be sequenced by dideoxy or other methods. Hybridization with the sequence can also be used to determine its presence, by Southern blots, dot blots, etc.
The DET nucleic acids of the invention can also be used in polynucleotide arrays. Polynucleotide arrays provide a high throughput technique that can assay a large number of polynucleotide sequences in a single sample. This technology can be used, for example, as a diagnostic tool to identify samples with differential expression of DET nucleic acids as compared to a reference sample.
To create arrays, single-stranded polynucleotide probes can be spotted onto a substrate in a two-dimensional matrix or array.
probes, can be detected once the unbound portion of the sample is washed away. Detection can be visual or with computer assistance.
The present invention also provides methods of detecting and measuring a DET protein or fragment thereof. An amino acid sequence for a C21orf4 (DET1) protein is set forth herein as SEQ ID NO: 41. An amino acid sequence for a Hs.145049 (DET2) protein is set forth herein as SEQ ID NO: 43. An amino acid sequence for a KIT (DET4) protein is set forth herein as SEQ ID NO: 46. An amino acid sequence for a LSM7 (DET5) protein is set forth herein as SEQ ID NO: 48. An amino acid sequence for a SYNGR2 (DET6) protein is set forth herein as SEQ ID NO: 50. An amino acid sequence for a C11orf8 (DET7) protein is provided herein as SEQ ID NO: 52. An amino acid sequence for a CDH1 (DET8) protein is set forth herein as SEQ ID NO: 54. An amino acid sequence for a FAM13A1 (DET9) protein is set forth herein as SEQ ID NO: 56. An amino acid sequence for IMPACT (DET10) protein is provided herein as SEQ ID NO: 58. An amino acid sequence for KIAA1128 (DET11) protein is set forth herein as SEQ ID NO: 60. Therefore, the present invention provides antibodies that bind to the DET protein sequences or fragments thereof set forth herein. The antibody utilized to detect a DET polypeptide, or fragment thereof, can be linked to a detectable label either directly or indirectly through use of a secondary and/or tertiary antibody; thus, bound antibody, fragment or molecular complex can be detected directly in an ELISA or similar assay.
The sample can be on, supported by, or attached to, a substrate which facilitates detection. A substrate of the present invention can be, but is not limited to, a microscope slide, a culture dish, a culture flask, a culture plate, a culture chamber, ELISA plates, as well as any other substrate that can be used for containing or supporting biological samples for analysis according to the methods of the present invention.
Each single-stranded polynucleotide probe can com The substrate can be of any material suitable for the purposes prise at least 6,7,8,9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, of this invention, such as, for example, glass, plastic, poly 25, or 30 or more contiguous nucleotides selected from the styrene, mica and the like. The substrates of the present inven nucleotide sequences of DET1-DET85. The substrate can be tion can be obtained from commercial sources or prepared any Substrate to which polynucleotide probes can be attached, 40 according to standard procedures well known in the art. including but not limited to glass, nitrocellulose, silicon, and Conversely, an antibody or fragment thereof, an antigenic nylon. Polynucleotide probes can be bound to the substrate by fragment of a DET protein can be on, Supported by, or either covalent bonds or by non-specific interactions, such as attached to a substrate which facilitates detection. Such a hydrophobic interactions. Techniques for constructing arrays substrate can be a mobile solid support. Thus, provided by the and methods of using these arrays are described in EP No. 0 45 invention are substrates including one or more of the antibod 799897; PCT No. WO97/29212; PCT No. WO97/27317; EP ies or antibody fragments, or antigenic fragments of a DET No. 0785 280; PCT No. WO97/02357; U.S. Pat. Nos. 5,593, polypeptide. 839; 5,578,832; EP No. 0728,520; U.S. Pat. No. 5,599,695; In the methods of the present invention, once the expres EP No. 0721 016; U.S. Pat. No. 5,556,752; PCT No. WO sion levels of one or more DET nucleic acids is measured, 95/22058; and U.S. Pat. No. 5,631,734 (each of which is 50 these expression levels are compared to the expression of the incorporated herein by reference for its teaching of prepara nucleic acid sequence(s) in a reference cell population com tion of arrays). 
Commercially available polynucleotide prising at least one cell for which a thyroid lesion classifica arrays, such as Affymetrix GeneChipTM, can also be used. Use tion is known. Once this comparison is performed, a differ of the GeneChipTM to detect gene expression is described, for ence in expression levels, if present, is identified by one of example, in Lockhart et al., Nature Biotechnology 14:1675 55 skill in the art. (1996); Chee et al., Science 274:610 (1996); Hacia et al., A difference or alteration in expression of any DET nucleic Nature Genetics 14:441, 1996; and Kozal et al., Nature Medi acid measured in the test cell population (i.e., in one or more cine 2:753, 1996. DET nucleic acids), as compared to the expression of the Tissue samples can be treated to form single-stranded poly same DET nucleic acid(s) in the reference cell population, nucleotides, for example by heating or by chemical denatur 60 indicates that the test cell population is different from the ation, as is known in the art. The single-stranded polynucle reference cell population. By “difference' or “alteration' is otides in the tissue sample can then be labeled and hybridized meant that the expression of one or more DET nucleic acid to the polynucleotide probes on the array. Detectable labels sequences is either increased or decreased as compared to the which can be used include but are not limited to radiolabels, expression levels of the reference cell population. If desired, biotinylated labels, fluorophors, and chemiluminescent 65 but not necessary, relative expression levels within the test labels. Double stranded polynucleotides, comprising the and reference cell populations can be normalized by refer labeled sample polynucleotides bound to polynucleotide ence to the expression level of a nucleic acid sequence that US 9,234,244 B2 35 36 does not vary according to thyroid cancer stage in the Subject. 
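By way of non-limiting illustration, the optional normalization and the identification of an increase or decrease in expression can be sketched as follows. The gene name "REF" (a hypothetical invariant control sequence), the two-fold cutoff, and all expression values are assumptions made only for this example and are not taken from the present disclosure.

```python
import math

def normalize(expr, reference_gene):
    """Divide each expression level by that of an invariant reference sequence."""
    ref = expr[reference_gene]
    if ref <= 0:
        raise ValueError("reference expression must be positive")
    return {g: v / ref for g, v in expr.items() if g != reference_gene}

def log2_fold_changes(test, reference, reference_gene):
    """log2(test / reference) for every gene measured in both populations."""
    t = normalize(test, reference_gene)
    r = normalize(reference, reference_gene)
    return {g: math.log2(t[g] / r[g]) for g in t if g in r}

def call_differences(lfc, threshold=1.0):
    """Label each gene; threshold=1.0 (two-fold) is an illustrative assumption."""
    calls = {}
    for gene, value in lfc.items():
        if value >= threshold:
            calls[gene] = "increased"
        elif value <= -threshold:
            calls[gene] = "decreased"
        else:
            calls[gene] = "unchanged"
    return calls

# Hypothetical measurements (arbitrary units).
test_cells = {"DET1": 400.0, "DET4": 50.0, "REF": 100.0}
reference_cells = {"DET1": 100.0, "DET4": 200.0, "REF": 100.0}

lfc = log2_fold_changes(test_cells, reference_cells, "REF")
calls = call_differences(lfc)
```

In this sketch a gene with no call of "increased" or "decreased" corresponds to the absence of a difference between the test and reference cell populations.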
thyroid lesion in a Subject, as well as the type of benign or The absence of a difference or alteration in expression of any malignant lesion in the Subject. DET nucleic acid measured in the test cell population (i.e., in Staging of Thyroid Cancer one or more DET nucleic acids), as compared to the expres Once a subject has been diagnosed with a malignantlesion sion of the same DET nucleic acid(s) in the reference cell or thyroid tumor, the stage of thyroid malignancy can also be population, indicates that the test cell population is similar to determined by the methods of the present invention. Staging the reference cell population. of a thyroid malignancy or tumor can be useful in prescribing The comparison of a set of expression levels of one or more treatment as well as in determining a prognosis for the Sub DET nucleic acids in a test cell population to the expression ject. level of the same set of one or more DET nucleic acid(s) in the 10 Therefore, also provided by the present invention is a reference cell population provides the expression profile for method of identifying the stage of a thyroid tumorina Subject that DET for the cell population. As an example, if the refer comprising: a) measuring the expression of one or more ence cell population is from normal thyroid tissue, a similar nucleic acid sequences selected from the group consisting of DET gene expression profile in the test cell population indi DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, cates that the test cell population is also normal whereas a 15 DET 10 and DET11 in a test cell population, wherein at least different profile indicates that the test cell population is not one cell in said test cell population is capable of expressing normal. 
By "similar is meant that an expression pattern does one or more nucleic acid sequences selected from the group not have to be exactly like the expression pattern but similar consisting of DET1, DET2, DET3, DET4, DET6, DET7, enough such that one of skill in the art would know that the DET8, DET9, DET10 and DET11; b) comparing the expres expression pattern is more closely associated with one type of sion of said nucleic acid sequences to the expression of the tissue than with another type of tissue. In another example, if same nucleic acid sequence(s) in a reference cell population the reference cell population is from malignant thyroid tissue, comprising at least one cell for which a thyroid tumor stage is a similar DET gene expression profile in the test cell popula known; and c) identifying a difference, if present, in expres tion indicates that the test cell population is also malignant sion levels of one or more nucleic acid sequences selected whereas a different profile indicates that the test cell popula 25 from the group consisting of DET1, DET2, DET3, DET4, tion is not malignant. Similarly, if the reference cell popula DET6, DETT, DET8, DET9, DET10 and DET11, in the test tion is from benign thyroid tissue (e.g., a benign thyroid cell population and reference cell population, thereby identi lesion), a similar DET gene expression profile in the test cell fying the stage of the thyroid tumor in the Subject. population indicates that the test cell population is also Also provided by the present invention is a method of benign whereas a different profile indicates that the test cell 30 identifying the stage of a thyroid tumor in a Subject compris population is not benign. ing: a) measuring the expression of one or more nucleic acid Upon observing a difference between the test cell popula sequences selected from the group consisting of DET 1. 
tion and a normal reference cell population, one of skill in the DET2, DET3, DET4, DET5 and DET6 in a test cell popula art can classify the test cell population as benign or malignant tion, wherein at least one cell in said test cell population is by comparing the expression pattern to known expression 35 capable of expressing one or more nucleic acid sequences patterns for benign and malignant cells. This comparison can selected from the group consisting of DET1, DET2, DET3, be done by comparing the expression pattern of the test cell DET4, DET5 and DET6; b) comparing the expression of said population to the expression pattern obtained from a plurality nucleic acid sequences to the expression of the same nucleic of reference cells used as a control while measuring expres acid sequence(s) in a reference cell population comprising at sion levels in the test cell population. One of skill in the art can 40 least one cell for which a thyroid tumor stage is known; and c) also compare the expression pattern of the test cell population identifying a difference, if present, in expression levels of one with a database of expression patterns corresponding to nor or more nucleic acid sequences selected from the group con mal, benign and malignant cells and Subcategories thereof. sisting of DET1, DET2, DET3, DET4, DET5 and DET6, in For example, upon observing a difference between the test the test cell population and reference cell population, thereby cell population and a reference cell population from normal 45 identifying the stage of the thyroid tumor in the Subject. thyroid tissue, one of skill in the art can compare the expres Also provided by the present invention is a method of sion pattern of the test cell population with a database of identifying the stage of a thyroid tumor in a Subject compris expression patterns corresponding to normal, benign and ing: a) measuring the expression of one or more nucleic acid malignant cells. 
One of skill in the art would then determine sequences selected from the group consisting of DET1, which expression pattern in the database is most similar to the 50 DET2, DET3, DET4, DET5, DET6, DET7, DET8, DET9, expression pattern obtained for the test cell population and DET10, DET11, DET12, DET13, DET14, DET15, DET16, classify the test cell population as benign or malignant, as DET17, DET18, DET19, DET20, DET21, DET22, DET23, well as classify the test cell population as a type of benign or DET24, DET25, DET26, DET27, DET28, DET29, DET30, malignant lesion. For example, if the test cell population is DET31, DET32, DET33, DET34, DET35, DET36, DET37, classified as being from a benign lesion, this population can 55 DET38, DET39, DET40, DET41, DET42, DET43, DET44, be further classified as being from a follicular adenoma, DET45, DET46, DET47, DET48, DET49, DET50, DET51, hyperplastic nodule, papillary adenoma, thyroiditis nodule, DET52, DET53, DET54, DET55, DET56, DET57, DET58, multimodal goiter or any other type of benign thyroid lesion. DET59, DET60, DET61, DET62, DET63, DET64, DET65, If the test cell population is classified as being from a malig DET66, DET67, DET68, DET69, DET70, DET71, DET72, nant lesion, this population can be further classified as being 60 DET73, DET74, DET75, DET76, DET77, DET78, DET79, from papillary thyroid carcinoma, follicular variant of papil DET80, DET81, DET82, DET83, DET84, and DET85 in a lary thyroid carcinoma, follicular carcinoma, Hurthle cell test cell population, wherein at least one cell in said test cell tumor, anaplastic thyroid cancer, medullary thyroid cancer, population is capable of expressing one or more nucleic acid thyroid lymphoma, poorly differentiated thyroid cancer and sequences selected from the group consisting of DET1, thyroid angiosarcoma or any other type of malignant thyroid 65 DET2, DET3, DET4, DET5, DET6, DET7, DET8, DET9, lesion. 
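As a non-limiting sketch of the database comparison described above, the most similar stored expression pattern can be located by a similarity measure. Pearson correlation is used here only as one possible choice; the disclosure does not prescribe a particular measure. The record layout, the labels, and the numeric profiles are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)

def most_similar(test_profile, database):
    """Return (label, correlation) of the stored profile closest to the test profile."""
    best = max(database, key=lambda rec: pearson(test_profile, rec["profile"]))
    return best["label"], pearson(test_profile, best["profile"])

# Hypothetical reference patterns; each vector lists the same genes in the same order.
database = [
    {"label": "follicular adenoma", "profile": [1.0, 0.5, 2.0, -1.0]},
    {"label": "papillary thyroid carcinoma", "profile": [-1.5, -0.5, -2.0, 1.0]},
    {"label": "normal", "profile": [0.0, 0.1, -0.1, 0.0]},
]

label, r = most_similar([0.8, 0.4, 1.7, -0.9], database)
```

The returned label plays the role of the lesion type to which the test cell population is assigned; a practitioner could equally require a minimum correlation before accepting a match.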
Therefore, utilizing the methods of the present inven DET10, DET11, DET12, DET13, DET14, DET15, DET16, tion, one of skill in the art can diagnose a benign or malignant DET17, DET18, DET19, DET20, DET21, DET22, DET23, US 9,234,244 B2 37 38 DET24, DET25, DET26, DET27, DET28, DET29, DET30, DET43, DET44, DET45, DET46, DET47, DET48, DET49, DET31, DET32, DET33, DET34, DET35, DET36, DET37, DET50, DET51, DET52, DET53, DET54, DET55, DET56, DET38, DET39, DET40, DET41, DET42, DET43, DET44, DET57, DET58, DET59, DET60, DET61, DET62, DET63, DET45, DET46, DET47, DET48, DET49, DET50, DET51, DET64, DET65, DET66, DET67, DET68, DET69, DET70, DET52, DET53, DET54, DET55, DET56, DET57, DET58, 5 DET71, DET72, DET73, DET74, DET75, DET76, DET77, DET59, DET60, DET61, DET62, DET63, DET64, DET65, DET78, DET79, DET80, DET81, DET82, DET83, DET84, DET66, DET67, DET68, DET69, DET70, DET71, DET72, and DET85 in a test cell population, wherein at least one cell DET73, DET74, DET75, DET76, DET77, DET78, DET79, in said test cell population is capable of expressing one or DET80, DET81, DET82, DET83, DET84, and DET85; b) more nucleic acid sequences selected from the group consist comparing the expression of said nucleic acid sequences to 10 ing of DET1, DET2, DET3, DET4, DET5, DET6, DET7, the expression of the same nucleic acid sequence(s) in a DET8, DET9, DET10, DET11, DET12, DET13, DET14, reference cell population comprising at least one cell for DET15, DET16, DET17, DET18, DET19, DET20, DET21, which a thyroid tumor stage is known; and c) identifying a DET22, DET23, DET24, DET25, DET26, DET27, DET28, difference, if present, in expression levels of one or more DET29, DET30, DET31, DET32, DET33, DET34, DET35, nucleic acid sequences selected from the group consisting of 15 DET36, DET37, DET38, DET39, DET40, DET41, DET42, DET1, DET2, DET3, DET4, DET5, DET6, DET7, DET8, DET43, DET44, DET45, DET46, DET47, DET48, DET49, DET9, DET10, DET11, DET12, DET13, DET14, DET15, DET50, DET51, DET52, DET53, DET54, DET55, 
DET56, DET16, DET17, DET18, DET19, DET20, DET21, DET22, DET57, DET58, DET59, DET60, DET61, DET62, DET63, DET23, DET24, DET25, DET26, DET27, DET28, DET29, DET64, DET65, DET66, DET67, DET68, DET69, DET70, DET30, DET31, DET32, DET33, DET34, DET35, DET36, DET71, DET72, DET73, DET74, DET75, DET76, DET77, DET37, DET38, DET39, DET40, DET41, DET42, DET43, DET78, DET79, DET80, DET81, DET82, DET83, DET84, DET44, DET45, DET46, DET47, DET48, DET49, DET50, and DET85; and determining the stage of the thyroid tumor, DET51, DET52, DET53, DET54, DET55, DET56, DET57, wherein the determination is made by applying a statistical DET58, DET59, DET60, DET61, DET62, DET63, DET64, classifier or predictor model to the gene expression data; and DET65, DET66, DET67, DET68, DET69, DET70, DET71, 25 outputting the stage of the thyroid tumor based on the deter DET72, DET73, DET74, DET75, DET76, DET77, DET78, mination. DET79, DET80, DET81, DET82, DET83, DET84, and In the methods of the present invention, the classifier, pre DET85, in the test cell population and reference cell popula dictor model, or diagnosis-predictor model can be a com tion, thereby identifying the stage of the thyroid tumor in the pound covariate predictor, a diagonal linear discriminant Subject. 30 analysis, nearest-neighbor classification, or Support vector As disclosed herein, the method for identifying the stage of machines with linear kernel. a thyroid tumor in a subject comprises receiving gene expres In the methods of the present invention, the differentially sion data of one or more nucleic acid sequences selected from expressed thyroid genes incorporated into the classifier, pre the group consisting of the differentially expressed thyroid dictor model, or diagnosis-predictor model can be differen genes DET1, DET2, DET3, DET4, DET5, and DET6 in a test 35 tially expressed in malignant vs. 
benign thyroid tumors with cell population, wherein at least one cell in said test cell a level of statistical significance signified with a P value of population is capable of expressing one or more nucleic acid less than 0.05 using standard statistical analysis. More spe sequences selected from the group consisting of DET1, cifically, the P value can be less than 0.0001 to limit the DET2, DET3, DET4, DET5, and DET6; and determining the number of false positives. In the methods of the present inven stage of the thyroid tumor, wherein the determination is made 40 tion, standard statistical analysis can be an ANOVA test with by applying a statistical classifier or predictor model to the Bonferroni correction, or a random-variance t test. gene expression data; and outputting the stage of the thyroid Also provided by the present invention is a method of tumor based on the determination. determining a prognosis for Subject comprising: a) measuring As disclosed herein, the method for identifying the stage of the expression of one or more nucleic acid sequences selected a thyroid tumor in a subject comprises receiving gene expres 45 from the group consisting of DET1, DET2, DET3, DET4, sion data of one or more nucleic acid sequences selected from DET6, DET7, DET8, DET9, DET10 and DET11 in a test cell the group consisting of the differentially expressed thyroid population, wherein at least one cell in said test cell popula genes DET1, DET2, DET3, DET4, DET6, DET7, DET8, tion is capable of expressing one or more nucleic acid DET9, DET 10, DET11, in a test cell population, wherein at sequences selected from the group consisting of DET1, least one cell in said test cell population is capable of express 50 DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET 10 ing one or more nucleic acid sequences selected from the and DET11; b) comparing the expression of said nucleic acid group consisting of DET1, DET2, DET3, DET4, DET6, sequences to the expression of the same nucleic acid DET7, 
DET8, DET9, DET10, DET11; and determining the sequence(s) in a reference cell population comprising at least stage of the thyroid tumor, wherein the determination is made one cell for which a thyroid tumor stage is known; and c) by applying a statistical classifier or predictor model to the 55 identifying a difference, if present, in expression levels of one gene expression data; and outputting the stage of the thyroid or more nucleic acid sequences selected from the group con tumor based on the determination. sisting of DET1, DET2, DET3, DET4, DET6, DET7, DET8, As disclosed herein, the method for identifying the stage of DET9, DET10 and DET11, in the test cell population and a thyroid tumor in a subject comprises receiving gene expres reference cell population, thereby determining the prognosis sion data of one or more nucleic acid sequences selected from 60 for the subject. the group consisting of the differentially expressed thyroid Also provided by the present invention is a method of genes DET1, DET2, DET3, DET4, DET5, DET6, DET7, determining the prognosis for a Subject comprising: a) mea DET8, DET9, DET10, DET11, DET12, DET13, DET14, Suring the expression of one or more nucleic acid sequences DET15, DET16, DET17, DET18, DET19, DET20, DET21, selected from the group consisting of DET1, DET2, DET3, DET22, DET23, DET24, DET25, DET26, DET27, DET28, 65 DET4, DET5 and DET6 in a test cell population, wherein at DET29, DET30, DET31, DET32, DET33, DET34, DET35, least one cell in said test cell population is capable of express DET36, DET37, DET38, DET39, DET40, DET41, DET42, ing one or more nucleic acid sequences selected from the US 9,234,244 B2 39 40 group consisting of DET1, DET2, DET3, DET4, DET5 and tion is capable of expressing one or more nucleic acid DET6; b) comparing the expression of said nucleic acid sequences selected from the group consisting of DET1, sequences to the expression of the same nucleic acid DET2, DET3, DET4, DET5, and DET6; and determining the 
sequence(s) in a reference cell population comprising at least prognosis for a Subject, wherein the determination is made by one cell for which a thyroid tumor stage is known; and c) applying a statistical classifier or predictor model to the gene identifying a difference, if present, in expression levels of one expression data; and outputting the prognosis for the Subject or more nucleic acid sequences selected from the group con based on the determination. sisting of DET1, DET2, DET3, DET4, DET5 and DET6, in As disclosed herein, the method for determining the prog the test cell population and reference cell population, thereby nosis for a subject comprises receiving gene expression data determining the prognosis for the Subject. 10 of one or more nucleic acid sequences selected from the group Also provided by the present invention is a method of consisting of the differentially expressed thyroid genes determining the prognosis for a Subject comprising: a) mea DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, Suring the expression of one or more nucleic acid sequences DET 10, DET11, in a test cell population, wherein at least one selected from the group consisting of DET1, DET2, DET3, cell in said test cell population is capable of expressing one or DET4, DET5, DET6, DET7, DET8, DET9, DET 10, DET11, 15 more nucleic acid sequences selected from the group consist DET12, DET13, DET14, DET15, DET16, DET17, DET18, ing of DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET19, DET20, DET21, DET22, DET23, DET24, DET25, DET9, DET 10, DET11; and determining the prognosis for a DET26, DET27, DET28, DET29, DET30, DET31, DET32, Subject, wherein the determination is made by applying a DET33, DET34, DET35, DET36, DET37, DET38, DET39, statistical classifier or predictor model to the gene expression DET40, DET41, DET42, DET43, DET44, DET45, DET46, data; and outputting the prognosis for the Subject based on the DET47, DET48, DET49, DET50, DET51, DET52, DET53, determination. 
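The step of applying a statistical classifier or predictor model to the gene expression data can be illustrated, without limitation, by a compound covariate predictor, one of the classifier types named in this disclosure. The sketch below follows the commonly described form of that predictor (each gene weighted by its two-sample t-statistic, with the decision threshold at the midpoint of the two class mean scores); it is an assumption that this form matches the one contemplated herein. The class labels and training values are hypothetical, and gene selection is assumed to have been done beforehand.

```python
import math
from statistics import mean, variance

def t_statistic(a, b):
    """Two-sample t-statistic with pooled variance (a versus b)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

def score(sample, weights):
    """Compound covariate: weighted sum of the sample's log-expression values."""
    return sum(w * x for w, x in zip(weights, sample))

def fit_ccp(class_a, class_b):
    """Return per-gene weights and the midpoint threshold from two training classes."""
    n_genes = len(class_a[0])
    weights = [
        t_statistic([s[g] for s in class_a], [s[g] for s in class_b])
        for g in range(n_genes)
    ]
    mean_a = mean(score(s, weights) for s in class_a)
    mean_b = mean(score(s, weights) for s in class_b)
    return weights, (mean_a + mean_b) / 2

def predict(sample, weights, threshold, labels=("malignant", "benign")):
    """Scores above the threshold fall on the side of the first training class."""
    return labels[0] if score(sample, weights) > threshold else labels[1]

# Hypothetical log-expression values for two preselected genes.
malignant = [[2.0, 0.5], [2.4, 0.7], [2.2, 0.3]]
benign = [[1.0, 1.5], [1.2, 1.9], [0.8, 1.7]]

weights, threshold = fit_ccp(malignant, benign)
call_1 = predict([2.1, 0.6], weights, threshold)
call_2 = predict([0.9, 1.6], weights, threshold)
```

The same structure applies when the two training classes are tumor stages or prognosis groups rather than malignant and benign lesions; only the labels change.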
DET54, DET55, DET56, DET57, DET58, DET59, DET60, As disclosed herein, the method for determining the prog DET61, DET62, DET63, DET64, DET65, DET66, DET67, nosis for a subject comprises receiving gene expression data DET68, DET69, DET70, DET71, DET72, DET73, DET74, of one or more nucleic acid sequences selected from the group DET75, DET76, DET77, DET78, DET79, DET80, DET81, 25 consisting of the differentially expressed thyroid genes DET82, DET83, DET84, and DET85 in a test cell population, DET1, DET2, DET3, DET4, DET5, DET6, DET7, DET8, wherein at least one cell in said test cell population is capable DET9, DET10, DET11, DET12, DET13, DET14, DET15, of expressing one or more nucleic acid sequences selected DET16, DET17, DET18, DET19, DET20, DET21, DET22, from the group consisting of DET1, DET2, DET3, DET4, DET23, DET24, DET25, DET26, DET27, DET28, DET29, DET5, DET6, DET7, DET8, DET9, DET 10, DET11, 30 DET30, DET31, DET32, DET33, DET34, DET35, DET36, DET12, DET13, DET14, DET15, DET16, DET17, DET18, DET37, DET38, DET39, DET40, DET41, DET42, DET43, DET19, DET20, DET21, DET22, DET23, DET24, DET25, DET44, DET45, DET46, DET47, DET48, DET49, DET50, DET26, DET27, DET28, DET29, DET30, DET31, DET32, DET51, DET52, DET53, DET54, DET55, DET56, DET57, DET33, DET34, DET35, DET36, DET37, DET38, DET39, DET58, DET59, DET60, DET61, DET62, DET63, DET64, DET40, DET41, DET42, DET43, DET44, DET45, DET46, 35 DET65, DET66, DET67, DET68, DET69, DET70, DET71, DET47, DET48, DET49, DET50, DET51, DET52, DET53, DET72, DET73, DET74, DET75, DET76, DET77, DET78, DET54, DET55, DET56, DET57, DET58, DET59, DET60, DET79, DET80, DET81, DET82, DET83, DET84, and DET61, DET62, DET63, DET64, DET65, DET66, DET67, DET85 in a test cell population, wherein at least one cell in DET68, DET69, DET70, DET71, DET72, DET73, DET74, said test cell population is capable of expressing one or more DET75, DET76, DET77, DET78, DET79, DET80, DET81, 40 nucleic acid sequences selected from the group consisting of DET82, DET83, 
DET84, and DET85; b) comparing the DET1, DET2, DET3, DET4, DET5, DET6, DET7, DET8, expression of said nucleic acid sequences to the expression of DET9, DET10, DET11, DET12, DET13, DET14, DET15, the same nucleic acid sequence(s) in a reference cell popula DET16, DET17, DET18, DET19, DET20, DET21, DET22, tion comprising at least one cell for which a thyroid tumor DET23, DET24, DET25, DET26, DET27, DET28, DET29, stage is known; and c) identifying a difference, if present, in 45 DET30, DET31, DET32, DET33, DET34, DET35, DET36, expression levels of one or more nucleic acid sequences DET37, DET38, DET39, DET40, DET41, DET42, DET43, selected from the group consisting of DET1, DET2, DET3, DET44, DET45, DET46, DET47, DET48, DET49, DET50, DET4, DET5, DET6, DET7, DET8, DET9, DET 10, DET11, DET51, DET52, DET53, DET54, DET55, DET56, DET57, DET12, DET13, DET14, DET15, DET16, DET17, DET18, DET58, DET59, DET60, DET61, DET62, DET63, DET64, DET19, DET20, DET21, DET22, DET23, DET24, DET25, 50 DET65, DET66, DET67, DET68, DET69, DET70, DET71, DET26, DET27, DET28, DET29, DET30, DET31, DET32, DET72, DET73, DET74, DET75, DET76, DET77, DET78, DET33, DET34, DET35, DET36, DET37, DET38, DET39, DET79, DET80, DET81, DET82, DET83, DET84, and DET40, DET41, DET42, DET43, DET44, DET45, DET46, DET85; and determining the prognosis for a subject, wherein DET47, DET48, DET49, DET50, DET51, DET52, DET53, the determination is made by applying a statistical classifier DET54, DET55, DET56, DET57, DET58, DET59, DET60, 55 or predictor model to the gene expression data; and outputting DET61, DET62, DET63, DET64, DET65, DET66, DET67, the prognosis for the Subject based on the determination. 
DET68, DET69, DET70, DET71, DET72, DET73, DET74, In the methods of the present invention, the classifier, pre DET75, DET76, DET77, DET78, DET79, DET80, DET81, dictor model, or diagnosis-predictor model can be a com DET82, DET83, DET84, and DET85, in the test cell popula pound covariate predictor, a diagonal linear discriminant tion and reference cell population, thereby determining the 60 analysis, nearest-neighbor classification, or Support vector prognosis for the Subject. machines with linear kernel. As disclosed herein, the method for determining the prog In the methods of the present invention, the differentially nosis for a subject comprises receiving gene expression data expressed thyroid genes incorporated into the classifier, pre of one or more nucleic acid sequences selected from the group dictor model, or diagnosis-predictor model can be differen consisting of the differentially expressed thyroid genes 65 tially expressed in malignant vs. benign thyroid tumors with DET1, DET2, DET3, DET4, DET5, and DET6 in a test cell a level of statistical significance signified with a P value of population, wherein at least one cell in said test cell popula less than 0.05 using standard statistical analysis. More spe US 9,234,244 B2 41 42 cifically, the P value can be less than 0.0001 to limit the nation thereof. Examples of chemotherapeutic agents include number of false positives. In the methods of the present inven cisplatin, 5-fluorouracil and S-1. Immunotherapeutics meth tion, standard statistical analysis can be an ANOVA test with ods include administration of interleukin-2 and interferon-C. Bonferroni correction, or a random-variance t test. 
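For illustration only, the two significance criteria mentioned above (a fixed P value cutoff such as 0.0001, and a Bonferroni correction) can be applied to a table of per-gene P values as sketched below. The P values shown are invented for the example, and the family-wise level of 0.05 used in the Bonferroni step is an assumption.

```python
def select_fixed(p_values, cutoff=0.0001):
    """Keep genes whose P value is below a fixed cutoff."""
    return sorted(g for g, p in p_values.items() if p < cutoff)

def select_bonferroni(p_values, alpha=0.05):
    """Keep genes whose P value is below alpha divided by the number of tests."""
    cutoff = alpha / len(p_values)
    return sorted(g for g, p in p_values.items() if p < cutoff)

# Hypothetical per-gene P values from a malignant-versus-benign comparison.
p_values = {"DET1": 1e-6, "DET2": 0.0004, "DET3": 0.03, "DET4": 2e-5}

strict = select_fixed(p_values)          # cutoff 0.0001
corrected = select_bonferroni(p_values)  # cutoff 0.05 / 4 = 0.0125
```

With only four tests the Bonferroni cutoff is looser than 0.0001; with thousands of genes on an array it becomes the stricter of the two, which is the situation in which limiting false positives matters most.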
In determining the prognosis for a subject, once the expres In staging a thyroid tumor, once the expression levels of 5 sion levels of one or more DET nucleic acids is measured, one or more DET nucleic acids is measured, these expression these expression levels are compared to the expression of the levels are compared to the expression of the same nucleic acid same nucleic acid sequence(s) in a reference cell population sequence(s) in a reference cell population comprising at least comprising at least one cell for which a prognosis is known. one cell for which a stage of thyroid tumor is known. Once Once this comparison is performed, a difference in expres this comparison is performed, a difference in expression lev- 10 sion levels, if present, is identified by one of skill in the art. els, if present, is identified by one of skill in the art. Thus, the present method can comprise a step of correlating a A difference or alteration in expression of one or more DET expression pattern with the prognosis of a subject hav DET nucleic acids in the test cell population, as compared to ing a thyroid tumor. the reference cell population, indicates that the test cell popu One skilled in the art can measure DET nucleic acid levels lation is at a different stage than the stage of the reference cell 15 and/or DET polypeptide levels in order to determine a prog population. By “difference' or “alteration' is meant that the nosis for a subject. One of skill in the art can measure DET expression of one or more DET nucleic acid sequences is nucleic acid levels and/or DET polypeptide levels in numer either increased or decreased as compared to the expression ous Subjects with varying prognoses in order to establish levels of the reference cell population. If desired, but not reference expression patterns that correspond to prognoses necessary, relative expression levels within the test and ref- 20 for Subjects. 
As utilized herein, prognosis” means a predic erence cell populations can be normalized by reference to the tion of probable development and/or outcome of a disease. expression level of a nucleic acid sequence that does not vary These reference expression patterns or a database of reference according to thyroid cancer stage in the Subject. The absence expression patterns can then be used to compare an expres of a difference or alteration in expression of one or more DET sion pattern from a test sample and determine what the prog nucleic acids in the test cell population, as compared to the 25 nosis for a subject is. These expression patterns can also be expression of the same one or more DET nucleic acid(s) in the used to compare an expression pattern from a test sample reference cell population, indicates that the test cell popula from a subject and determine whether or not a Subject can tion is at the same stage as that of the reference cell popula recover from the disease. Upon correlation of a DET expres tion. As an example, if the reference cell population is from an sion pattern with a particular prognosis, the skilled practitio early stage thyroid tumor, a similar DET gene expression 30 ner can then determine if a therapy suited for the treatment of profile in the test cell population indicates that the test cell cancer is applicable. population is also from an early stage thyroid tumor whereas The present invention provides a computer system com a different profile indicates that the test cell population is not prisinga) a database including records comprising a plurality from an early stage thyroid tumor. 
By “similar is meant that of reference DET gene expression profiles or patterns for an expression pattern (expression profile) does not have to be 35 benign, malignant and normal tissue samples and associated exactly like another expression pattern but similar enough diagnosis and therapy data; and b) a user interface capable of such that one of skill in the art would know that the expression receiving a selection of one or more test gene expression pattern is more closely associated with one stage than with profiles for use in determining matches between the test another stage. expression profiles and the reference DET gene expression In order to establish a database of stages of thyroid cancer, 40 profiles and displaying the records associated with matching one skilled in the art can measure DET nucleic acid levels expression profiles. The database can also include DET gene and/or DET polypeptide levels in numerous subjects in order expression profiles for Subclasses of benign tissue samples to establish expression patterns that correspond to clinically Such as follicular adenoma, hyperplastic nodule, papillary defined stages Such as, for example, 1) normal. 2) at risk of adenoma, thyroiditis nodule and multinodular goiter. 
The developing thyroid cancer, 3) pre-cancerous or 4) cancerous 45 database can also include DET gene expression profiles for as well as other Substages defined within each of these stages, Subclasses of malignant tissue samples Such as papillary thy e.g., stage I papillary, stage II papillary, stage III papillary, roid carcinoma, follicular variant of papillary thyroid carci stage IV papillary, stage I follicular, stage II follicular, stage noma, follicular carcinoma, Hurthle cell tumor, anaplastic III follicular, stage IV follicular, stage I medullary, stage II thyroid cancer, medullary thyroid cancer, thyroid lymphoma, medullary, stage III medullary, or stage IV medullary thyroid 50 poorly differentiated thyroid cancer and thyroid angiosar cancer. These stages are not intended to be limiting as one of coma. The database can also include DET gene expression skill in the art may define other stages depending on the type profiles for stages of thyroid cancer as well as DET gene of sample, type of cancer, age of the Subject and other factors. expression profiles that correspond to prognoses for Subjects. This database can then be used to compare an expression It will be appreciated by those skilled in the art that the pattern from a test sample and make clinical decisions. Upon 55 DET gene expression profiles provided herein as well as the correlation of a DET expression pattern with a particular DET expression profiles identified from samples and subjects stage of thyroid cancer, the skilled practitioner can administer can be stored, recorded, and manipulated on any medium a therapy suited for the treatment of cancer. The present which can be read and accessed by a computer. As used invention also allows the skilled artisan to correlate a DET herein, the words “recorded and “stored refer to a process expression pattern with a type of thyroid lesion and correlate 60 for storing information on a computer medium. 
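As one non-limiting example of recording expression profiles on a computer readable medium, the profiles can be written to and read back from a file. JSON is chosen here purely for illustration; the function names, the record fields, and the values are hypothetical.

```python
import json
import os
import tempfile

def save_profiles(path, profiles):
    """Record a list of expression-profile records on the storage medium."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(profiles, handle, indent=2)

def load_profiles(path):
    """Read the recorded expression-profile records back into memory."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)

# Hypothetical records pairing a classification with per-gene expression values.
profiles = [
    {"label": "follicular adenoma", "expression": {"DET1": 1.0, "DET4": 2.0}},
    {"label": "papillary thyroid carcinoma", "expression": {"DET1": -1.5, "DET4": -2.0}},
]

path = os.path.join(tempfile.mkdtemp(), "det_profiles.json")
save_profiles(path, profiles)
restored = load_profiles(path)
```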
A skilled the expression pattern with a particular stage of thyroid can artisan can readily adopt any of the presently known methods cer. Thus, the present methods can comprise a step of corre for recording information on a computer readable medium to lating a DET expression pattern with the status of a thyroid generate a list of DET gene expression profiles comprising tumoras, for example, benign or malignant or a certain stage one or more of the DET expression profiles of the invention. of malignancy. The Subjects of this invention undergoing 65 Another aspect of the present invention is a computer read anti-cancer therapy can include Subjects undergoing Surgery, able medium having recorded thereon at least 2, 5, 10, 15, 20, chemotherapy, radiotherapy, immunotherapy or any combi 25, 30, 50, 100, 200, 250, 300, 400, 500, 1000, 2000, 3000, US 9,234,244 B2 43 44 4000 or 5000 expression profiles of the invention or expres profile (s) in order to determine whether the test expression sion profiles identified from subjects. profile(s) differs from or is the same as a reference expression Computer readable media include magnetically readable profile. media, optically readable media, electronically readable This invention also provides for a computer program that media and magnetic/optical media. For example, the com correlates DET gene expression profiles with a type of cancer puter readable media may be a hard disc, a floppy disc, a and/or a stage of cancer and/or a prognosis. The computer magnetic tape, CD-ROM, DVD, RAM, or ROM as well as program can optionally include treatment options or drug other types of other media known to those skilled in the art. indications for subjects with DET gene expression profiles Embodiments of the present invention include systems, associated with a type of cancer and/or stage of cancer. 
10 Screening Methods particularly computer systems which contain the DET gene Further provided by the present invention is a method of expression information described herein. As used herein, “a identifying an agent for treating a thyroid tumor, the method computer system” refers to the hardware components, soft comprising: a) contacting a population of thyroid tumor cells ware components, and data storage components used to store from a Subject for which a tumor stage is known, wherein at and/or analyze the DET gene expression profiles of the 15 least one cell in said population is capable of expressing one present invention or other DET gene expression profiles. The or more nucleic acid sequences selected from the group con computer system preferably includes the computer readable sisting of DET1, DET2, DET3, DET4, DET6, DET7, DET8, media described above, and a processor for accessing and DET9, DET 10 and DET11, with a test agent; b) measuring manipulating the DET gene expression data. the expression of one or more nucleic acid sequences selected Preferably, the computer is a general purpose system that from the group consisting of DET1, DET2, DET3, DET4, comprises a central processing unit (CPU), one or more data DET6, DETT, DET8, DET9, DET 10 and DET11 in the cell storage components for storing data, and one or more data population; c) comparing the expression of the nucleic acid retrieving devices for retrieving the data stored on the data sequence(s) to the expression of the same nucleic acid storage components. A skilled artisan can readily appreciate sequence(s) in a reference cell population comprising at least that any one of the currently available computer systems are 25 one cell for which a thyroid tumor stage is known; and d) suitable. 
identifying a difference, if present, in expression levels of one In one particular embodiment, the computer system or more nucleic acid sequences selected from the group con includes a processor connected to a bus which is connected to sisting of DET1, DET2, DET3, DET4, DET6, DET7, DET8, a main memory, preferably implemented as RAM, and one or DET9, DET10 and DET11, in the test cell population and more data storage devices, such as a hard drive and/or other 30 reference cell population, such that if there is a difference computer readable media having data recorded thereon. In corresponding to an improvement, a therapeutic agent for some embodiments, the computer system further includes treating thyroid tumor has been identified. one or more data retrieving devices for reading the data stored Further provided by the present invention is a method of on the data storage components. The data retrieving device identifying an agent for treating a thyroid tumor, the method may represent, for example, a floppy disk drive, a compact 35 comprising: a) contacting a population of thyroid tumor cells disk drive, a magnetic tape drive, a hard disk drive, a CD from a Subject for which a tumor stage is known, wherein at ROM drive, a DVD drive, etc. In some embodiments, the data least one cell in said test population is capable of expressing storage component is a removable computer readable one or more nucleic acid sequences selected from the group medium such as a floppy disk, a compact disk, a magnetic consisting of DET1, DET2, DET3, DET4, DET5 and DET6, tape, etc. containing control logic and/or data recorded 40 with a test agent; b) measuring the expression of one or more thereon. 
The computer system may advantageously include nucleic acid sequences selected from the group consisting of or be programmed by appropriate Software for reading the DET1, DET2, DET3, DET4, DET5 and DET6 in the cell control logic and/or the data from the data storage component population; c) comparing the expression of the nucleic acid once inserted in the data retrieving device. Software for sequence(s) to the expression of the same nucleic acid accessing and processing the expression profiles of the inven 45 sequence(s) in a reference cell population comprising at least tion (such as search tools, compare tools, modeling tools, one cell for which a thyroid tumor stage is known; and d) etc.) may reside in main memory during execution. identifying a difference, if present, in expression levels of one In some embodiments, the computer system may further or more nucleic acid sequences selected from the group con comprise a program for comparing expression profiles stored sisting of DET1, DET2, DET3, DET4, DET5 and DET6, in on a computer readable medium to another test expression 50 the cell population and reference cell population, such that if profile on a computer readable medium. An "expression pro there is a difference corresponding to an improvement, a file comparer” refers to one or more programs which are therapeutic agent for treating thyroid tumor has been identi implemented on the computer system to compare an expres fied. sion profile with other expression profiles. 
Further provided by the present invention is a method of Accordingly, one aspect of the present invention is a com 55 identifying an agent for treating a thyroid tumor, the method puter system comprising a processor, a data storage device comprising: a) contacting with a test agent a population of having stored thereon a DET gene expression profile of the thyroid tumor cells from a subject for which a tumor stage is invention, a data storage device having retrievably stored known, wherein at least one cell in said test population is thereon reference DET gene expression profiles to be com capable of expressing one or more nucleic acid sequences pared with test or sample sequences and an expression profile 60 selected from the group consisting of DET1, DET2, DET3, comparer for conducting the comparison. The expression DET4, DET5, DET6, DET7, DET8, DET9, DET 10, DET11, profile comparer may indicate a similarity between the DET12, DET13, DET14, DET15, DET16, DET17, DET18, expression profiles compared or identify a difference DET19, DET20, DET21, DET22, DET23, DET24, DET25, between the two expression profiles. 
DET26, DET27, DET28, DET29, DET30, DET31, DET32, Alternatively, the computer program may be a computer 65 DET33, DET34, DET35, DET36, DET37, DET38, DET39, program which compares a test expression profile(s) from a DET40, DET41, DET42, DET43, DET44, DET45, DET46, Subject or a plurality of Subjects to a reference expression DET47, DET48, DET49, DET50, DET51, DET52, DET53, US 9,234,244 B2 45 46 DET54, DET55, DET56, DET57, DET58, DET59, DET60, ing of DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET61, DET62, DET63, DET64, DET65, DET66, DET67, DET9, DET10, DET11 in the cell population; and determin DET68, DET69, DET70, DET71, DET72, DET73, DET74, ing the identity of the agent to treat the thyroid tumor, wherein DET75, DET76, DET77, DET78, DET79, DET80, DET81, the determination is made by applying a statistical classifier DET82, DET83, DET84, and DET85; b) measuring the or predictor model to the gene expression data; and outputting expression of one or more nucleic acid sequences selected the identity of an agent to treat the thyroid tumor based on the from the group consisting of DET1, DET2, DET3, DET4, determination. 
DET5, DET6, DET7, DET8, DET9, DET 10, DET11, As disclosed herein, the method of identifying an agent for DET12, DET13, DET14, DET15, DET16, DET17, DET18, treating a thyroid tumor comprises receiving gene expression DET19, DET20, DET21, DET22, DET23, DET24, DET25, 10 data after contacting with a test agent a population of thyroid DET26, DET27, DET28, DET29, DET30, DET31, DET32, tumor cells from a Subject for which a tumor stage is known, DET33, DET34, DET35, DET36, DET37, DET38, DET39, wherein at least one cell in said test population is capable of DET40, DET41, DET42, DET43, DET44, DET45, DET46, expressing one or more nucleic acid sequences selected from DET47, DET48, DET49, DET50, DET51, DET52, DET53, the group consisting of differentially expressed thyroid genes DET54, DET55, DET56, DET57, DET58, DET59, DET60, 15 DET1, DET2, DET3, DET4, DET5, DET6, DET7, DET8, DET61, DET62, DET63, DET64, DET65, DET66, DET67, DET9, DET10, DET11, DET12, DET13, DET14, DET15, DET68, DET69, DET70, DET71, DET72, DET73, DET74, DET16, DET17, DET18, DET19, DET20, DET21, DET22, DET75, DET76, DET77, DET78, DET79, DET80, DET81, DET23, DET24, DET25, DET26, DET27, DET28, DET29, DET82, DET83, DET84, and DET85 in the cell population: DET30, DET31, DET32, DET33, DET34, DET35, DET36, c) comparing the expression of the nucleic acid sequence(s) to DET37, DET38, DET39, DET40, DET41, DET42, DET43, the expression of the same nucleic acid sequence(s) in a DET44, DET45, DET46, DET47, DET48, DET49, DET50, reference cell population comprising at least one cell for DET51, DET52, DET53, DET54, DET55, DET56, DET57, which a thyroid tumor stage is known; and d) identifying a DET58, DET59, DET60, DET61, DET62, DET63, DET64, difference, if present, in expression levels of one or more DET65, DET66, DET67, DET68, DET69, DET70, DET71, nucleic acid sequences selected from the group consisting of 25 DET72, DET73, DET74, DET75, DET76, DET77, DET78, DET1, DET2, DET3, DET4, DET5, DET6, DET7, DET8, DET79, DET80, DET81, 
DET82, DET83, DET84, and DET9, DET10, DET11, DET12, DET13, DET14, DET15, DET85; and measuring the expression of one or more nucleic DET16, DET17, DET18, DET19, DET20, DET21, DET22, acid sequences selected from the group consisting of DET 1. DET23, DET24, DET25, DET26, DET27, DET28, DET29, DET2, DET3, DET4, DET5, DET6, DET7, DET8, DET9, DET30, DET31, DET32, DET33, DET34, DET35, DET36, 30 DET10, DET11, DET12, DET13, DET14, DET15, DET16, DET37, DET38, DET39, DET40, DET41, DET42, DET43, DET17, DET18, DET19, DET20, DET21, DET22, DET23, DET44, DET45, DET46, DET47, DET48, DET49, DET50, DET24, DET25, DET26, DET27, DET28, DET29, DET30, DET51, DET52, DET53, DET54, DET55, DET56, DET57, DET31, DET32, DET33, DET34, DET35, DET36, DET37, DET58, DET59, DET60, DET61, DET62, DET63, DET64, DET38, DET39, DET40, DET41, DET42, DET43, DET44, DET65, DET66, DET67, DET68, DET69, DET70, DET71, 35 DET45, DET46, DET47, DET48, DET49, DET50, DET51, DET72, DET73, DET74, DET75, DET76, DET77, DET78, DET52, DET53, DET54, DET55, DET56, DET57, DET58, DET79, DET80, DET81, DET82, DET83, DET84, and DET59, DET60, DET61, DET62, DET63, DET64, DET65, DET85, in the cell population and reference cell population, DET66, DET67, DET68, DET69, DET70, DET71, DET72, Such that if there is a difference corresponding to an improve DET73, DET74, DET75, DET76, DET77, DET78, DET79, ment, atherapeutic agent for treating athyroid tumor has been 40 DET80, DET81, DET82, DET83, DET84, and DET85 in the identified. 
cell population; and determining the identity of the agent to As disclosed herein, the method of identifying an agent for treat the thyroid tumor, wherein the determination is made by treating a thyroid tumor comprises receiving gene expression applying a statistical classifier or predictor model to the gene data after contacting with a test agent a population of thyroid expression data; and outputting the identity of an agent to tumor cells from a Subject for which a tumor stage is known, 45 treat the thyroid tumor based on the determination. wherein at least one cell in said test population is capable of In the methods of the present invention, the classifier, pre expressing one or more nucleic acid sequences selected from dictor model, or diagnosis-predictor model can be a com the group consisting of differentially expressed thyroid genes pound covariate predictor, a diagonal linear discriminant DET1, DET2, DET3, DET4, DET5, and DET6; and measur analysis, nearest-neighbor classification, or Support vector ing the expression of one or more nucleic acid sequences 50 machines with linear kernel. selected from the group consisting of DET1, DET2, DET3, In the methods of the present invention, the differentially DET4, DET5 and DET6 in the cell population; and determin expressed thyroid genes incorporated into the classifier, pre ing the identity of the agent to treat the thyroid tumor, wherein dictor model, or diagnosis-predictor model can be differen the determination is made by applying a statistical classifier tially expressed in malignant vs. benign thyroid tumors with or predictor model to the gene expression data; and outputting 55 a level of statistical significance signified with a P value of the identity of an agent to treat the thyroid tumor based on the less than 0.05 using standard statistical analysis. More spe determination. 
cifically, the P value can be less than 0.0001 to limit the As disclosed herein, the method of identifying an agent for number of false positives. In the methods of the present inven treating a thyroid tumor comprises receiving gene expression tion, standard statistical analysis can be an ANOVA test with data after contacting with a test agent a population of thyroid 60 Bonferroni correction, or a random-variance t test. tumor cells from a Subject for which a tumor stage is known, The test agents used in the methods described herein can be wherein at least one cell in said test population is capable of made by methods standard in the art and include, but are not expressing one or more nucleic acid sequences selected from limited to, chemicals, Small molecules, antisense molecules, the group consisting of differentially expressed thyroid genes siRNAS, drugs, antibodies, peptides and secreted proteins. DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, 65 By “improvement' is meant that the treatment leads to a DET 10, DET11; and measuring the expression of one or shift in a thyroid tumor stage to a less advanced stage. As more nucleic acid sequences selected from the group consist mentioned above, the expression pattern obtained for the test US 9,234,244 B2 47 48 cell population can be compared to expression patterns in a lesions consisting of cell populations from papillary thyroid database before and after contacting the test cell population carcinomas, follicular variant of papillary thyroid carcino with a test agent to determine the stage of the test cell popu mas, follicular carcinomas, and Hurthle cell carcinomas, lation before and after treatment. when compared to benign thyroid lesions consisting of cell The reference cell population can be from normal thyroid 5 populations from adenomatoid nodules, follicular adenomas, tissue. For example, if the cell population from the subject is Hurthle cell adenomas, and lymphocytic thyroid nodules. 
from an early stage thyroid tumor, and after treatment, the The present invention also provides a method for identify expression pattern of the cell population when compared to ing an agent for treating a thyroid tumor, the method com the reference cell population from normal thyroid tissue, is prising: a) contacting with a test agent a population of thyroid similar to that of the reference cell population, the agent is 10 tumor cells from a subject for which a tumor classification is effective in treating a thyroid tumor. By “similar is meant known, wherein at least one cell in said test population is that the expression pattern does not have to be exactly like the capable of expressing one or more nucleic acid sequences expression pattern from normal thyroid tissue but similar selected from the group consisting of C21orf4 (DET1), enough such that one of skill in the art would know that the Hs. 145049(DET2), HMGA2 (DET12), KLK7 (DET13), treatment leads to expression patterns more closely associ 15 MRC2 (DET14), LRRK2 (DET15), PLAG1 (DET16), ated with normal thyroid tissue. As an another example, if CYP1B1 (DET17), DPP4 (DET18), FNDC4 (DET19), both the cell population from the subject and the reference PHLDA2 (DET20), CCNA1 (DET21), CDH3 (DET22), cell population are from an early stage thyroid tumor, and CEACAM6 (DET23), QSCN6 (DET24), COL7A1 (DET25), after treatment, the expression pattern of the cell population is MGC9712 (DET26), IL1RAP (DET27), LAMB3 (DET28), similar to the reference cell population, the agent is not effec PRSS3 (DET29), LRP4 (DET30), SPOCK1 (DET31), tive in treating a thyroid tumor. 
By “similar is meant that the PDE5A (DET32), FLJ37078 (DET33), FBN3 (DET34), expression pattern does not have to be exactly like the expres DIRAS3 (DET35), PRSS1 (DET36), CAMK2N1 (DET37), sion pattern from the early stage thyroid tumor cell popula SNIP (DET38), KCNJ2 (DET39), SFN (DET40), GALNT7 tion but similar enough such that one of skill in the art would (DET41), TGFA (DET42), BAIAP3 (DET43), and KCNK15 know that the treatment does not lead to an expression pattern 25 (DET44); b) measuring the expression of one or more nucleic corresponding to a less advanced thyroid tumor stage. As acid sequences selected from the group consisting of C21orf4 another example, if both the cell population from the subject (DET1), Hs. 145049(DET2), HMGA2 (DET12), KLK7 and the reference cell population are from an early stage (DET13), MRC2 (DET14), LRRK2 (DET15), PLAG1 thyroid tumor, and after treatment, the expression pattern of (DET16), CYP1B1 (DET17), DPP4 (DET18), FNDC4 the cell population is different from the reference cell popu 30 (DET19), PHLDA2 (DET20), CCNA1 (DET21), CDH3 lation, and correlates with a less advanced thyroid tumor (DET22), CEACAM6 (DET23), QSCN6 (DET24), COL7A1 stage, the agent is effective in treating a thyroid tumor. These (DET25), MGC9712 (DET26), IL1RAP (DET27), LAMB3 examples are not intended to be limiting with regard to the (DET28), PRSS3 (DET29), LRP4 (DET30), SPOCK1 types of thyroid tumor populations that can be contacted with (DET31), PDE5A (DET32), FLJ37078 (DET33), FBN3 an agent, the types of agents that can be utilized, the type of 35 (DET34), DIRAS3 (DET35), PRSS1 (DET36), CAMK2N1 reference cell population that can be utilized or the effects (DET37), SNIP (DET38), KCNJ2 (DET39), SFN (DET40), observed as there are numerous variations known to one of GALNT7 (DET41), TGFA (DET42), BAIAP3 (DET43), and skill in the art for performing these methods. 
KCNK15 (DET44) in the cell population; c) comparing the Also disclosed herein, the method of identifying an agent expression of the nucleic acid sequence(s) to the expression for treating a thyroid tumor by screening tumor cells for 40 of the same nucleic acid sequence(s) in a reference cell popu agents that preferentially decrease the expression of the lation comprising at least one cell for which a thyroid tumor nucleic acid sequences found in the malignant vs. the benign classification is known; and d) identifying a difference, if tumor cells, wherein those nucleic acid sequences are present, in expression levels of one or more nucleic acid selected from the group consisting of C21orf4 (DET1) and sequences selected from the group consisting of C21orf4 Hs. 145049(DET2) that are upregulated in malignant thyroid 45 (DET1), Hs. 145049(DET2), HMGA2 (DET12), KLK7 lesions consisting of cell populations from papillary thyroid (DET13), MRC2 (DET14), LRRK2 (DET15), PLAG1 carcinomas and follicular variant of papillary thyroid carci (DET16), CYP1B1 (DET17), DPP4 (DET18), FNDC4 nomas, when compared to benign thyroid lesions consisting (DET19), PHLDA2 (DET20), CCNA1 (DET21), CDH3 of cell populations from follicular adenomas and hyperplastic (DET22), CEACAM6 (DET23), QSCN6 (DET24), COL7A1 nodules. 50 (DET25), MGC9712 (DET26), IL1RAP (DET27), LAMB3 Also disclosed herein, the method of identifying an agent (DET28), PRSS3 (DET29), LRP4 (DET30), SPOCK1 for treating a thyroid tumor by screening tumor cells for (DET31), PDE5A (DET32), FLJ37078 (DET33), FBN3 agents that preferentially decrease the expression of the (DET34), DIRAS3 (DET35), PRSS1 (DET36), CAMK2N1 nucleic acid sequences found in the malignant vs. 
the benign (DET37), SNIP (DET38), KCNJ2 (DET39), SFN (DET40), tumor cells, wherein those nucleic acid sequences are 55 GALNT7 (DET41), TGFA (DET42), BAIAP3 (DET43), and selected from the group consisting of HMGA2 (DET12), KCNK15 (DET44) in the cell population and reference cell KLK7 (DET13), MRC2 (DET14), LRRK2 (DET15), PLAG1 population, such that if there is a downregulation correspond (DET16), CYP1B1 (DET17), DPP4 (DET18), FNDC4 ing to an improvement, then a therapeutic agent for treating a (DET19), PHLDA2 (DET20), CCNA1 (DET21), CDH3 thyroid tumor has been identified. For example, if any one of (DET22), CEACAM6 (DET23), QSCN6 (DET24), COL7A1 60 these DET genes, DET 1, DET2, and DET12-44, are used (DET25), MGC9712 (DET26), IL1RAP (DET27), LAMB3 alone, it has been shown that they were all significantly dif (DET28), PRSS3 (DET29), LRP4 (DET30), SPOCK1 ferentially overexpressed in malignant vs. benign tumor (DET31), PDE5A (DET32), FLJ37078 (DET33), FBN3 types. In the methods of the invention the malignant cell (DET34), DIRAS3 (DET35), PRSS1 (DET36), CAMK2N1 populations can be from papillary thyroid carcinomas, folli (DET37), SNIP (DET38), KCNJ2 (DET39), SFN (DET40), 65 cular variant of papillary thyroid carcinomas, follicular car GALNT7 (DET41), TGFA (DET42), BAIAP3 (DET43), and cinomas, and Hurthle cell carcinomas. In the methods of the KCNK15 (DET44), that are upregulated in malignant thyroid invention the benign cell populations can be from follicular US 9,234,244 B2 49 50 adenomas, hyperplastic nodules, adenomatoid nodules, group consisting of KIT(DET4), LSM7(DET5), C11orf3 Hurthle cell adenomas, and lymphocytic thyroid nodules. 
(DET7), FAM13A1(DET9), IMPACT(DET10), KIAA1128 Also disclosed herein, the method of identifying an agent (DET11), CDH1(DET8), RAG2 (DET45), CLYBL for treating a thyroid tumor by screening tumor cells for (DET46), NEB (DET47), TNFRSF11B (DET48), GNAI1 agents that preferentially increase the expression of the 5 (DET49), AGTR1 (DET50), HLF (DET51), SLC26A4 nucleic acid sequences found in the malignant vs. the benign (DET52), MT1A (DET53), FABP4 (DET54), LRP1B tumor cells, wherein those nucleic acid sequences are (DET55), SLC4A4 (DET56), LOC646278 (DET57), selected from the group consisting of KIT(DET4), LSM7 MAN1C1 (DET58), KCNIP3 (DET59), DNAJB9 (DET60), (DET5), C11orf8(DET7), FAM13A1(DET9), IMPACT UBR1 (DET61), HSD17B6 (DET62), SLC33A1 (DET63), (DET10), KIAA1128(DET11), and CDH1(DET8), that are 10 CDH16 (DET64), TBC1D1 (DET65), SLC26A7 (DET66), downregulated in malignant thyroid lesions consisting of cell C11orf74 (DET67), PLA2R1 (DET68), PTTG3 (DET69), populations from papillary thyroid carcinomas and follicular EFEMP1 (DET70), ZMAT4 (DET71), STEAP3 (DET72), variant of papillary thyroid carcinomas, when compared to DIO1 (DET73), TPO (DET74), PTTG1 (DET75), LGI3 benign thyroid lesions consisting of cell populations from (DET76), TMEM38B (DET77), SLITRK4 (DET78), VBP1 follicular adenomas and hyperplastic nodules. 15 (DET79), COL9A3 (DET80), IRS1 (DET81), STARD13 Also disclosed herein, the method of identifying an agent (DET82), LOC654085 (DET83), RPS3A (DET84), and for treating a thyroid tumor by screening tumor cells for SPARCL1 (DET85) in the cell population; c) comparing the agents that preferentially increase the expression of the expression of the nucleic acid sequence(s) to the expression nucleic acid sequences found in the malignant vs. 
the benign of the same nucleic acid sequence(s) in a reference cell popu tumor cells, wherein those nucleic acid sequences are lation comprising at least one cell for which a thyroid tumor selected from the group consisting of KIT(DET4), RAG2 classification is known; and d) identifying a difference, if (DET45), CLYBL (DET46), NEB (DET47), TNFRSF11B present, in expression levels of one or more nucleic acid (DET48), GNAI1 (DET49), AGTR1 (DET50), HLF sequences selected from the group consisting of KIT(DET4), (DET51), SLC26A4 (DET52), MT1A (DET53), FABP4 LSM7(DET5), C11orf8(DET7), FAM13A1(DET9), (DET54), LRP1B (DET55), SLC4A4 (DET56), LOC646278 25 IMPACT(DET10), KIAA1128(DET11), CDH1(DET8), (DET57), MAN1C1 (DET58), KCNIP3 (DET59), DNAJB9 RAG2 (DET45), CLYBL (DET46), NEB (DET47), (DET60), UBR1 (DET61), HSD17B6 (DET62), SLC33A1 TNFRSF11B (DET48), GNAI1 (DET49), AGTR1 (DET50), (DET63), CDH16 (DET64), TBC1D1 (DET65), SLC26A7 HLF (DET51), SLC26A4 (DET52), MT1A (DET53), (DET66), C11orf74 (DET67), PLA2R1 (DET68), PTTG3 FABP4 (DET54), LRP1B (DET55), SLC4A4 (DET56), (DET69), EFEMP1 (DET70), ZMAT4 (DET71), STEAP3 30 LOC646278 (DET57), MAN1C1 (DET58), KCNIP3 (DET72), DIO1 (DET73), TPO (DET74), PTTG1 (DET75), (DET59), DNAJB9 (DET60), UBR1 (DET61), HSD17B6 LGI3 (DET76), TMEM38B (DET77), SLITRK4 (DET78), (DET62), SLC33A1 (DET63), CDH16 (DET64), TBC1D1 VBP1 (DET79), COL9A3 (DET80), IRS1 (DET81), (DET65), SLC26A7 (DET66), C11orf74 (DET67), PLA2R1 STARD13 (DET82), LOC654085 (DET83), RPS3A (DET68), PTTG3 (DET69), EFEMP1 (DET70), ZMAT4 (DET84), and SPARCL1 (DET85), that are downregulated in 35 (DET71), STEAP3 (DET72), DIO1 (DET73), TPO malignant thyroid lesions consisting of cell populations from (DET74), PTTG1 (DET75), LGI3 (DET76), TMEM38B papillary thyroid carcinomas and follicular variant of papil (DET77), SLITRK4 (DET78), VBP1 (DET79), COL9A3 lary thyroid carcinomas, follicular variant of papillary thyroid (DET80), IRS1 (DET81), STARD13 (DET82), LOC654085 carcinomas, follicular 
carcinomas, and Hurthle cell carcino (DET83), RPS3A (DET84), and SPARCL1 (DET85) in the mas, when compared to benign thyroid lesions consisting of 40 cell population and reference cell population, Such that if cell populations from adenomatoid nodules, follicular there is an upregulation corresponding to an improvement, adenomas, Hurthle cell adenomas, and lymphocytic thyroid then a therapeutic agent for treating a thyroid tumor has been nodules. identified. For example, if any one of these DET genes, The present invention also provides a method for identify DET4, DET5, DETT-11, and DET45-85, are used alone, it ing an agent for treating a thyroid tumor, the method com 45 has been shown that they were all significantly differentially prising: a) contacting with a test agent a population of thyroid underexpressed in malignant vs. benign tumor types. In the tumor cells from a subject for which a tumor classification is methods of the invention the malignant cell populations can known, wherein at least one cell in said test population is be from papillary thyroid carcinomas, follicular variant of capable of expressing one or more nucleic acid sequences papillary thyroid carcinomas, follicular carcinomas, and selected from the group consisting of KIT(DET4), LSM7 50 Hurthle cell carcinomas. In the methods of the invention the (DET5), C11orf8(DET7), FAM13A1(DET9), IMPACT benign cell populations can be from follicular adenomas, (DET10), KIAA1128(DET11), CDH1(DET8), RAG2 hyperplastic nodules, adenomatoid nodules, Hurthle cell (DET45), CLYBL (DET46), NEB (DET47), TNFRSF11B adenomas, and lymphocytic thyroid nodules. 
(DET48), GNAI1 (DET49), AGTR1 (DET50), HLF (DET51), SLC26A4 (DET52), MT1A (DET53), FABP4 (DET54), LRP1B (DET55), SLC4A4 (DET56), LOC646278 (DET57), MAN1C1 (DET58), KCNIP3 (DET59), DNAJB9 (DET60), UBR1 (DET61), HSD17B6 (DET62), SLC33A1 (DET63), CDH16 (DET64), TBC1D1 (DET65), SLC26A7 (DET66), C11orf74 (DET67), PLA2R1 (DET68), PTTG3 (DET69), EFEMP1 (DET70), ZMAT4 (DET71), STEAP3 (DET72), DIO1 (DET73), TPO (DET74), PTTG1 (DET75), LGI3 (DET76), TMEM38B (DET77), SLITRK4 (DET78), VBP1 (DET79), COL9A3 (DET80), IRS1 (DET81), STARD13 (DET82), LOC654085 (DET83), RPS3A (DET84), and SPARCL1 (DET85); b) measuring the expression of one or more nucleic acid sequences selected from the

Treatment Methods

Also provided by the present invention is a method of treating malignant thyroid lesions or thyroid cancer in a subject suffering from or at risk of developing thyroid cancer comprising administering to the subject an agent that modulates the expression of one or more DET sequences. By "at risk for developing" is meant that the subject's prognosis is less favorable and that the subject has an increased likelihood of developing thyroid cancer. Administration of the agent can be prophylactic or therapeutic.

By "modulation" is meant that the expression of one or more DET sequences can be increased or decreased.

For example, KIT (DET4), LSM7 (DET5), C11orf8 (DET7), FAM13A1 (DET9), IMPACT (DET10), KIAA1128 (DET11), CDH1 (DET8), RAG2 (DET45), CLYBL (DET46), NEB (DET47), TNFRSF11B (DET48), GNAI1 (DET49), AGTR1 (DET50), HLF (DET51), SLC26A4 (DET52), MT1A (DET53), FABP4 (DET54), LRP1B (DET55), SLC4A4 (DET56), LOC646278 (DET57), MAN1C1 (DET58), KCNIP3 (DET59), DNAJB9 (DET60), UBR1 (DET61), HSD17B6 (DET62), SLC33A1 (DET63), CDH16 (DET64), TBC1D1 (DET65), SLC26A7 (DET66), C11orf74 (DET67), PLA2R1 (DET68), PTTG3 (DET69), EFEMP1 (DET70), ZMAT4 (DET71), STEAP3 (DET72), DIO1 (DET73), TPO (DET74), PTTG1 (DET75), LGI3 (DET76), TMEM38B (DET77), SLITRK4 (DET78), VBP1 (DET79), COL9A3 (DET80), IRS1 (DET81), STARD13 (DET82), LOC654085 (DET83), RPS3A (DET84), SPARCL1 (DET85) were all downregulated or underexpressed in malignant thyroid lesions as compared to benign thyroid tissue. Therefore, a subject can be treated with an effective amount of an agent that increases the amount of the downregulated or underexpressed nucleic acids in the subject. Administration can be systemic or local, e.g. in the immediate vicinity of the subject's cancerous cells. This agent can be, for example, the protein product of a downregulated or underexpressed DET gene or a biologically active fragment thereof, a nucleic acid encoding a downregulated or underexpressed DET gene and having expression control sequences permitting expression in the thyroid cancer cells, or an agent which increases the endogenous level of expression of the gene.

With regard to genes that are upregulated or overexpressed in malignant as compared to benign thyroid tissue, C21orf4 (DET1), Hs.145049 (DET2), HMGA2 (DET12), KLK7 (DET13), MRC2 (DET14), LRRK2 (DET15), PLAG1 (DET16), CYP1B1 (DET17), DPP4 (DET18), FNDC4 (DET19), PHLDA2 (DET20), CCNA1 (DET21), CDH3 (DET22), CEACAM6 (DET23), QSCN6 (DET24), COL7A1 (DET25), MGC9712 (DET26), IL1RAP (DET27), LAMB3 (DET28), PRSS3 (DET29), LRP4 (DET30), SPOCK1 (DET31), PDE5A (DET32), FLJ37078 (DET33), FBN3 (DET34), DIRAS3 (DET35), PRSS1 (DET36), CAMK2N1 (DET37), SNIP (DET38), KCNJ2 (DET39), SFN (DET40), GALNT7 (DET41), TGFA (DET42), BAIAP3 (DET43), and KCNK15 (DET44) were upregulated or overexpressed in malignant thyroid lesions as compared to benign thyroid tissue. Therefore, a subject can be treated with an effective amount of an agent that decreases the amount of the upregulated or overexpressed nucleic acids in the subject. Administration can be systemic or local, e.g. in the immediate vicinity of the subject's cancerous cells. The agent can be, for example, a nucleic acid that inhibits or antagonizes the expression of the overexpressed DET gene, such as an antisense nucleic acid or an siRNA. The agent can also be an antibody that binds to a DET protein that is overexpressed.

In the treatment methods of the present invention, the subject can be treated with one or more agents which decrease the expression of overexpressed DET sequences alone or in combination with one or more agents which increase the expression of DET sequences that are downregulated or underexpressed in thyroid cancer. The subject can also be treated with one or more agents which increase the expression of DET sequences that are downregulated or underexpressed in thyroid cancer alone, or in combination with one or more agents which decrease the expression of overexpressed DET sequences.

These treatment methods can be combined with other anticancer treatments such as surgery, chemotherapy, radiotherapy, immunotherapy or any combination thereof. Immunotherapeutics methods include administration of interleukin-2 and interferon-α.

The following are lists of anti-cancer (anti-neoplastic) drugs that can be used in conjunction with the presently disclosed methods.

Antineoplastic: Acivicin; Aclarubicin; Acodazole Hydrochloride; Acronine; Adozelesin; Aldesleukin; Altretamine; Ambomycin; Ametantrone Acetate; Aminoglutethimide; Amsacrine; Anastrozole; Anthramycin; Asparaginase; Asperlin; Azacitidine; Azetepa; Azotomycin; Batimastat; Benzodepa; Bicalutamide; Bisantrene Hydrochloride; Bisnafide Dimesylate; Bizelesin; Bleomycin Sulfate; Brequinar Sodium; Bropirimine; Busulfan; Cactinomycin; Calusterone; Caracemide; Carbetimer; Carboplatin; Carmustine; Carubicin Hydrochloride; Carzelesin; Cedefingol; Chlorambucil; Cirolemycin; Cisplatin; Cladribine; Crisnatol Mesylate; Cyclophosphamide; Cytarabine; Dacarbazine; Dactinomycin; Daunorubicin Hydrochloride; Decitabine; Dexormaplatin; Dezaguanine; Dezaguanine Mesylate; Diaziquone; Docetaxel; Doxorubicin; Doxorubicin Hydrochloride; Droloxifene; Droloxifene Citrate; Dromostanolone Propionate; Duazomycin; Edatrexate; Eflornithine Hydrochloride; Elsamitrucin; Enloplatin; Enpromate; Epipropidine; Epirubicin Hydrochloride; Erbulozole; Esorubicin Hydrochloride; Estramustine; Estramustine Phosphate Sodium; Etanidazole; Ethiodized Oil I 131; Etoposide; Etoposide Phosphate; Etoprine; Fadrozole Hydrochloride; Fazarabine; Fenretinide; Floxuridine; Fludarabine Phosphate; Fluorouracil (e.g., 5-fluorouracil); Fluorocitabine; Fosquidone; Fostriecin Sodium; Gemcitabine; Gemcitabine Hydrochloride; Gold Au 198; Hydroxyurea; Idarubicin Hydrochloride; Ifosfamide; Ilmofosine; Interferon Alfa-2a; Interferon Alfa-2b; Interferon Alfa-n1; Interferon Alfa-n3; Interferon Beta-Ia; Interferon Gamma-Ib; Iproplatin; Irinotecan Hydrochloride; Lanreotide Acetate; Letrozole; Leuprolide Acetate; Liarozole Hydrochloride; Lometrexol Sodium; Lomustine; Losoxantrone Hydrochloride; Masoprocol; Maytansine; Mechlorethamine Hydrochloride; Megestrol Acetate; Melengestrol Acetate; Melphalan; Menogaril; Mercaptopurine; Methotrexate; Methotrexate Sodium; Metoprine; Meturedepa; Mitindomide; Mitocarcin; Mitocromin; Mitogillin; Mitomalcin; Mitomycin; Mitosper; Mitotane; Mitoxantrone Hydrochloride; Mycophenolic Acid; Nocodazole; Nogalamycin; Ormaplatin; Oxisuran; Paclitaxel; Pegaspargase; Peliomycin; Pentamustine; Peplomycin Sulfate; Perfosfamide; Pipobroman; Piposulfan; Piroxantrone Hydrochloride; Plicamycin; Plomestane; Porfimer Sodium; Porfiromycin; Prednimustine; Procarbazine Hydrochloride; Puromycin; Puromycin Hydrochloride; Pyrazofurin; Riboprine; Rogletimide; Safingol; Safingol Hydrochloride; Semustine; Simtrazene; Sparfosate Sodium; Sparsomycin; Spirogermanium Hydrochloride; Spiromustine; Spiroplatin; Streptonigrin; Streptozocin; Strontium Chloride Sr 89; Sulofenur; Talisomycin; Taxane; Taxoid; Tecogalan Sodium; Tegafur; Teloxantrone Hydrochloride; Temoporfin; Teniposide; Teroxirone; Testolactone; Thiamiprine; Thioguanine; Thiotepa; Tiazofurin; Tirapazamine; Topotecan Hydrochloride; Toremifene Citrate; Trestolone Acetate; Triciribine Phosphate; Trimetrexate; Trimetrexate Glucuronate; Triptorelin; Tubulozole Hydrochloride; Uracil Mustard; Uredepa; Vapreotide; Verteporfin; Vinblastine Sulfate; Vincristine Sulfate; Vindesine; Vindesine Sulfate; Vinepidine Sulfate; Vinglycinate Sulfate; Vinleurosine Sulfate; Vinorelbine Tartrate; Vinrosidine Sulfate; Vinzolidine Sulfate; Vorozole; Zeniplatin; Zinostatin; Zorubicin Hydrochloride.

Other anti-neoplastic compounds include: 20-epi-1,25 dihydroxyvitamin D3; 5-ethynyluracil; abiraterone; aclarubicin; acylfulvene; adecypenol; adozelesin; aldesleukin; ALL-TK antagonists; altretamine; ambamustine; amidox; amifostine; aminolevulinic acid; amrubicin; amsacrine; anagrelide; anastrozole; andrographolide; angiogenesis inhibitors; antagonist D; antagonist G; antarelix; anti-dorsalizing morphogenetic protein-1; antiandrogen, prostatic carcinoma; antiestrogen; antineoplaston; antisense oligonucleotides; aphidicolin glycinate; apoptosis gene modulators; apoptosis regulators; apurinic acid; ara-CDP-DL-PTBA; arginine deaminase; asulacrine; atamestane; atrimustine; axinastatin 1; axinastatin 2; axinastatin 3; azasetron; azatoxin; azatyrosine; baccatin III derivatives; balanol; batimastat; BCR/ABL antagonists; benzochlorins; benzoylstaurosporine; beta lactam derivatives; beta-alethine; betaclamycin B; betulinic acid; bFGF inhibitor; bicalutamide; bisantrene; bisaziridinylspermine; bisnafide; bistratene A; bizelesin; breflate; bropirimine; budotitane; buthionine sulfoximine; calcipotriol; calphostin C; camptothecin derivatives; canarypox IL-2; capecitabine; carboxamide-amino-triazole; carboxyamidotriazole; CaRest M3; CARN 700; cartilage derived inhibitor; carzelesin; casein kinase inhibitors (ICOS); castanospermine; cecropin B; cetrorelix; chlorins; chloroquinoxaline sulfonamide; cicaprost; cis-porphyrin; cladribine; clomifene analogues; clotrimazole; collismycin A; collismycin B; combretastatin A4; combretastatin analogue; conagenin; crambescidin 816; crisnatol; cryptophycin 8; cryptophycin A derivatives; curacin A; cyclopentanthraquinones; cycloplatam; cypemycin; cytarabine ocfosfate; cytolytic factor; cytostatin; dacliximab; decitabine; dehydrodidemnin B; deslorelin; dexifosfamide; dexrazoxane; dexverapamil; diaziquone; didemnin B; didox; diethylnorspermine; dihydro-5-azacytidine; dihydrotaxol, 9-; dioxamycin; diphenyl spiromustine; docosanol; dolasetron; doxifluridine; droloxifene; dronabinol; duocarmycin SA; ebselen; ecomustine; edelfosine; edrecolomab; eflornithine; elemene; emitefur; epirubicin; epristeride; estramustine analogue; estrogen agonists; estrogen antagonists; etanidazole; etoposide phosphate; exemestane; fadrozole; fazarabine; fenretinide; filgrastim; finasteride; flavopiridol; flezelastine; fluasterone; fludarabine; fluorodaunorunicin hydrochloride; forfenimex; formestane; fostriecin; fotemustine; gadolinium texaphyrin; gallium nitrate; galocitabine; ganirelix; gelatinase inhibitors; gemcitabine; glutathione inhibitors; hepsulfam; heregulin; hexamethylene bisacetamide; hypericin; ibandronic acid; idarubicin; idoxifene; idramantone; ilmofosine; ilomastat; imidazoacridones; imiquimod; immunostimulant peptides; insulin-like growth factor-1 receptor inhibitor; interferon agonists; interferons; interleukins; iobenguane; iododoxorubicin; ipomeanol, 4-; irinotecan; iroplact; irsogladine; isobengazole; isohomohalicondrin B; itasetron; jasplakinolide; kahalalide F; lamellarin-N triacetate; lanreotide; leinamycin; lenograstim; lentinan sulfate; leptolstatin; letrozole; leukemia inhibiting factor; leukocyte alpha interferon; leuprolide+estrogen+progesterone; leuprorelin; levamisole; liarozole; linear polyamine analogue; lipophilic disaccharide peptide; lipophilic platinum compounds; lissoclinamide 7; lobaplatin; lombricine; lometrexol; lonidamine; losoxantrone; lovastatin; loxoribine; lurtotecan; lutetium texaphyrin; lysofylline; lytic peptides; maitansine; mannostatin A; marimastat; masoprocol; maspin; matrilysin inhibitors; matrix metalloproteinase inhibitors; menogaril; merbarone; meterelin; methioninase; metoclopramide; MIF inhibitor; mifepristone; miltefosine; mirimostim; mismatched double stranded RNA; mitoguazone; mitolactol; mitomycin analogues; mitonafide; mitotoxin fibroblast growth factor-saporin; mitoxantrone; mofarotene; molgramostim; monoclonal antibody, human chorionic gonadotrophin; monophosphoryl lipid A+myobacterium cell wall sk; mopidamol; multiple drug resistance gene inhibitor; multiple tumor suppressor 1-based therapy; mustard anticancer agent; mycaperoxide B; mycobacterial cell wall extract; myriaporone; N-acetyldinaline; N-substituted benzamides; nafarelin; nagrestip; naloxone-pentazocine; napavin; naphterpin; nartograstim; nedaplatin; nemorubicin; neridronic acid; neutral endopeptidase; nilutamide; nisamycin; nitric oxide modulators; nitroxide antioxidant; nitrullyn; O6-benzylguanine; octreotide; okicenone; oligonucleotides; onapristone; ondansetron; ondansetron; oracin; oral cytokine inducer; ormaplatin; osaterone; oxaliplatin; oxaunomycin; paclitaxel analogues; paclitaxel derivatives; palauamine; palmitoylrhizoxin; pamidronic acid; panaxytriol; panomifene; parabactin; pazelliptine; pegaspargase; peldesine; pentosan polysulfate sodium; pentostatin; pentrozole; perflubron; perfosfamide; perillyl alcohol; phenazinomycin; phenylacetate; phosphatase inhibitors; picibanil; pilocarpine hydrochloride; pirarubicin; piritrexim; placetin A; placetin B; plasminogen activator inhibitor; platinum complex; platinum compounds; platinum-triamine complex; porfimer sodium; porfiromycin; propyl bis-acridone; prostaglandin J2; proteasome inhibitors; protein A-based immune modulator; protein kinase C inhibitor; protein kinase C inhibitors, microalgal; protein tyrosine phosphatase inhibitors; purine nucleoside phosphorylase inhibitors; purpurins; pyrazoloacridine; pyridoxylated hemoglobin polyoxyethylene conjugate; raf antagonists; raltitrexed; ramosetron; ras farnesyl protein transferase inhibitors; ras inhibitors; ras-GAP inhibitor; retelliptine demethylated; rhenium Re 186 etidronate; rhizoxin; ribozymes; RII retinamide; rogletimide; rohitukine; romurtide; roquinimex; rubiginone B1; ruboxyl; safingol; saintopin; SarCNU; sarcophytol A; sargramostim; Sdi 1 mimetics; semustine; senescence derived inhibitor 1; sense oligonucleotides; signal transduction inhibitors; signal transduction modulators; single chain antigen binding protein; sizofuran; sobuzoxane; sodium borocaptate; sodium phenylacetate; solverol; somatomedin binding protein; sonermin; sparfosic acid; spicamycin D; spiromustine; splenopentin; spongistatin 1; squalamine; stem cell inhibitor; stem-cell division inhibitors; stipiamide; stromelysin inhibitors; sulfinosine; superactive vasoactive intestinal peptide antagonist; suradista; suramin; swainsonine; synthetic glycosaminoglycans; tallimustine; tamoxifen methiodide; tauromustine; tazarotene; tecogalan sodium; tegafur; tellurapyrylium; telomerase inhibitors; temoporfin; temozolomide; teniposide; tetrachlorodecaoxide; tetrazomine; thaliblastine; thalidomide; thiocoraline; thrombopoietin; thrombopoietin mimetic; thymalfasin; thymopoietin receptor agonist; thymotrinan; thyroid stimulating hormone; tin ethyl etiopurpurin; tirapazamine; titanocene dichloride; topotecan; topsentin; toremifene; totipotent stem cell factor; translation inhibitors; tretinoin; triacetyluridine; triciribine; trimetrexate; triptorelin; tropisetron; turosteride; tyrosine kinase inhibitors; tyrphostins; UBC inhibitors; ubenimex; urogenital sinus-derived growth inhibitory factor; urokinase receptor antagonists; vapreotide; variolin B; vector system, erythrocyte gene therapy; velaresol; veramine; verdins; verteporfin; vinorelbine; vinxaltine; vitaxin; vorozole; zanoterone; zeniplatin; zilascorb; zinostatin stimalamer.

Identification of Differentially Expressed Thyroid Genes

The present invention also provides a method of identifying differentially expressed genes and/or expression patterns for such genes in other types of benign and malignant lesions. As set forth in the Examples, one of skill in the art can utilize gene expression profiling and supervised machine learning algorithms to construct a molecular classification scheme for other types of thyroid tumors. These include any type of benign lesion such as papillary adenoma, multinodular goiter or thyroiditis nodule, and any type of malignant lesion, such as papillary thyroid carcinoma, follicular carcinoma, Hurthle cell tumor, anaplastic thyroid cancer, medullary thyroid cancer, thyroid lymphoma, poorly differentiated thyroid cancer and thyroid angiosarcoma. Those genes and expression patterns identified via these methods can be utilized in the methods of the present invention to diagnose, stage and treat cancer.

Kits

The present invention also provides a kit comprising one or more reagents for detecting one or more nucleic acid sequences selected from the group consisting of DET1-DET85. In various embodiments the expression of one or more of the sequences represented by DET1-DET85 is measured. The kit can identify the DET nucleic acids by having homologous nucleic acid sequences, such as oligonucleotide sequences, complementary to a portion of the recited nucleic acids, or antibodies to proteins encoded by the DET nucleic acids. The kit can also include amplification primers for performing RT-PCR, such as those set forth in Table 4, and probes, such as those set forth in Table 4, that can be fluorescently labeled for detecting amplification products in, for example, a Taqman assay. The kits of the present invention can optionally include buffers, enzymes, detectable labels and other reagents for detecting the expression of DET sequences described herein.

For example, a kit comprising one or more reagents for detecting the expression of one or more nucleic acid(s) selected from the group consisting of DET1, DET2, DET3, DET4, DET5, and DET6. The kit can contain amplification primers for DET1, DET2, DET3, DET4, DET5 and DET6; and probes for detecting DET1, DET2, DET3, DET4, DET5 and DET6 amplification products. The kit can also contain an antibody or antibodies that detect one or more of the DET proteins encoded by one or more of the nucleic acid sequences consisting of DET1, DET2, DET3, DET4, DET5 and DET6.

A further example is a kit comprising one or more reagents for detecting the expression of one or more nucleic acid(s) selected from the group consisting of DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10 and DET11. The kit can contain amplification primers for DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10 and DET11; and probes for detecting DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10 and DET11 amplification products. The kit can also contain an antibody or antibodies that detect one or more of the DET proteins encoded by one or more of the nucleic acid sequences consisting of DET1, DET2, DET3, DET4, DET6, DET7, DET8, DET9, DET10 and DET11.

A further example is a kit comprising one or more reagents for detecting the expression of one or more nucleic acid(s) selected from the group consisting of DET1, DET2, DET3, DET4, DET5, DET6, DET7, DET8, DET9, DET10, DET11, DET12, DET13, DET14, DET15, DET16, DET17, DET18, DET19, DET20, DET21, DET22, DET23, DET24, DET25, DET26, DET27, DET28, DET29, DET30, DET31, DET32, DET33, DET34, DET35, DET36, DET37, DET38, DET39, DET40, DET41, DET42, DET43, DET44, DET45, DET46, DET47, DET48, DET49, DET50, DET51, DET52, DET53, DET54, DET55, DET56, DET57, DET58, DET59, DET60, DET61, DET62, DET63, DET64, DET65, DET66, DET67, DET68, DET69, DET70, DET71, DET72, DET73, DET74, DET75, DET76, DET77, DET78, DET79, DET80, DET81, DET82, DET83, DET84, and DET85. The kit can contain amplification primers for DET1, DET2, DET3, DET4, DET5, DET6, DET7, DET8, DET9, DET10, DET11, DET12, DET13, DET14, DET15, DET16, DET17, DET18, DET19, DET20, DET21, DET22, DET23, DET24, DET25, DET26, DET27, DET28, DET29, DET30, DET31, DET32, DET33, DET34, DET35, DET36, DET37, DET38, DET39, DET40, DET41, DET42, DET43, DET44, DET45, DET46, DET47, DET48, DET49, DET50, DET51, DET52, DET53, DET54, DET55, DET56, DET57, DET58, DET59, DET60, DET61, DET62, DET63, DET64, DET65, DET66, DET67, DET68, DET69, DET70, DET71, DET72, DET73, DET74, DET75, DET76, DET77, DET78, DET79, DET80, DET81, DET82, DET83, DET84, and DET85; and probes for detecting DET1, DET2, DET3, DET4, DET5, DET6, DET7, DET8, DET9, DET10, DET11, DET12, DET13, DET14, DET15, DET16, DET17, DET18, DET19, DET20, DET21, DET22, DET23, DET24, DET25, DET26, DET27, DET28, DET29, DET30, DET31, DET32, DET33, DET34, DET35, DET36, DET37, DET38, DET39, DET40, DET41, DET42, DET43, DET44, DET45, DET46, DET47, DET48, DET49, DET50, DET51, DET52, DET53, DET54, DET55, DET56, DET57, DET58, DET59, DET60, DET61, DET62, DET63, DET64, DET65, DET66, DET67, DET68, DET69, DET70, DET71, DET72, DET73, DET74, DET75, DET76, DET77, DET78, DET79, DET80, DET81, DET82, DET83, DET84, and DET85 amplification products. The kit can also contain an antibody or antibodies that detect one or more of the DET proteins encoded by one or more of the nucleic acid sequences consisting of DET1, DET2, DET3, DET4, DET5, DET6, DET7, DET8, DET9, DET10, DET11, DET12, DET13, DET14, DET15, DET16, DET17, DET18, DET19, DET20, DET21, DET22, DET23, DET24, DET25, DET26, DET27, DET28, DET29, DET30, DET31, DET32, DET33, DET34, DET35, DET36, DET37, DET38, DET39, DET40, DET41, DET42, DET43, DET44, DET45, DET46, DET47, DET48, DET49, DET50, DET51, DET52, DET53, DET54, DET55, DET56, DET57, DET58, DET59, DET60, DET61, DET62, DET63, DET64, DET65, DET66, DET67, DET68, DET69, DET70, DET71, DET72, DET73, DET74, DET75, DET76, DET77, DET78, DET79, DET80, DET81, DET82, DET83, DET84, and DET85.

The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how the antibodies, polypeptides, nucleic acids, compositions, and/or methods claimed herein are made and evaluated, and are intended to be purely exemplary of the invention and are not intended to limit the scope of what the inventors regard as their invention. Efforts have been made to ensure accuracy with respect to numbers (e.g., amounts, temperature, etc.), but some errors and deviations should be accounted for.

EXAMPLES

DNA microarrays allow quick and complete evaluation of a cell's transcriptional activity. Expression genomics is very powerful in that it can generate expression data for a large number of genes simultaneously across multiple samples. In cancer research, an intriguing application of expression arrays includes assessing the molecular components of the neoplastic process and in cancer classification (1). Classification of human cancers into distinct groups based on their molecular profile rather than their histological appearance can be more relevant to specific cancer diagnoses and cancer treatment regimes. Several attempts to formulate a consensus about classification and treatment of thyroid carcinoma based on standard histopathologic analysis have resulted in published guidelines for diagnosis and initial disease management (2). In the past few decades no improvement has been made in the differential diagnosis of thyroid tumors by fine needle aspiration biopsy (FNA), specifically suspicious or indeterminate thyroid lesions, suggesting that a new approach to this should be explored.

There is a compelling need to develop more accurate initial diagnostic tests for evaluating a thyroid nodule. Recent studies suggest that gene expression data from cDNA microarray analysis holds promise for improving tumor classification and for predicting response to therapy among cancer patients (17) (18) (19). No clear consensus exists regarding which computational tool is optimal for the analysis of large gene expression profiling datasets, especially when they are used to predict outcome (20).

This invention describes the use of gene expression profiling and supervised machine learning algorithms to construct a molecular classification scheme for thyroid tumors (22). The gene expression signatures provided herein include new tumor related genes whose encoded proteins can be useful for improving the diagnosis of thyroid tumors.

Example 1

DET1-DET11

In this study a gene expression approach was developed to diagnose benign vs. malignant thyroid lesions in 73 patients with thyroid tumors. A 10-gene model and a 6-gene model were developed to differentiate benign vs. malignant thyroid tumors. These results provide a molecular classification system for thyroid tumors and this in turn provides a more accurate diagnostic tool for the clinician managing patients with suspicious thyroid lesions.

Tissue Samples

Thyroid tissues collected under Johns Hopkins University Hospital Institutional Review Board-approved protocols were snap-frozen in liquid nitrogen and stored at -80°C until use. The specimens were chosen based on their tumor type: papillary thyroid carcinoma (PTC n=17), follicular variant of PTC (FVPTC n=15), follicular adenoma (FA n=16) and hyperplastic nodule (HN n=15). All diagnoses were made by the Surgical Pathology Department at Johns Hopkins.

Tissue Processing and Isolation of RNA

Frozen sections of 100-300 mg of tissue were collected in test tubes containing 1 ml of Trizol. Samples were transferred to FastRNA tubes containing mini beads and homogenized in a FastPrep beater (Bio 101 Savant, Carlsbad, Calif.) for 1.5 min at speed 6. The lysate was transferred to a new tube and total RNA was extracted according to the Trizol protocol (Molecular Research Center, Inc. Cincinnati, Ohio). Approximately 12 µg of total RNA was obtained from each tumor sample. The total RNA was then subjected to two rounds of amplification following the modified Eberwine method (23) (24) resulting in approximately 42 µg of messenger RNA (mRNA). The quality of the extracted RNA was tested by spectrophotometry and by evaluations on minichips (BioAnalyzer, Agilent Technologies, Palo Alto, Calif.).

Microarray Analysis

Hybridization was performed on 10k human cDNA microarrays, Hs-UniGem2, produced by the NCI/NIH (ATC, Gaithersburg, Md.). Comparisons were made for each tumor with the same control, which consisted of amplified RNA extracted from normal thyroid tissue and provided by Ambion Inc (Austin, Tex.). Fluorescent marker dyes (Cy5 and Cy3) were used to label the test and control samples, respectively. The respective dyes and samples were also switched in order to test for any labeling bias. The mixture of the two populations of RNA species was then hybridized to the same microarray and incubated for 16 hr at 42°C. cDNA microarrays were then washed and scanned using the GenePix® 4000B (Axon Instruments Inc., CA) and images were analyzed with GenePix software version 3.0. For each sample a file containing the image of the array and an Excel file containing the expression ratio values for each gene was uploaded onto the Madb Array web-site (National Center for Biotechnology Information/NIH) http://nciarray.nci.nih.gov for further analysis. To accurately compare measurements from different experiments, the data were normalized and the ratio (Signal Cy5/Signal Cy3) was calculated so that the median (Ratio) was 1.0.

Immunohistochemistry

Immunohistochemistry studies utilizing antibodies to two gene products in the predictor models have also been performed and these data correlate with the expression data. Taqman analysis was performed for CDH1 and KIT. Both KIT and CDH1 expression decreased in malignancy, which correlates with the microarray data. As shown in FIG. 6, immunohistochemical results show that both KIT and CDH1 expression decrease in malignancy, which correlates with the expression results obtained via microarray and Taqman analysis.

Statistical Analysis

Data from the 73 thyroid tumors were used to build a benign (FA and HN) vs. malignant (PTC and FVPTC) expression ratio-based model, capable of predicting the diagnosis (benign vs. malignant) of each sample. After normalization, a file containing the gene expression ratio values from all 73 samples was imported into a statistical analysis software package (Partek Inc., MO). Samples were divided in two sets: one set (63 samples) was used to train the diagnosis predictor model and a second set (10) was used as a validation set to test the model. These 10 samples were not previously used to do any other analysis. As a first step, the data from the 63 samples were subjected to Principal Component Analysis (PCA) to perform an exploratory analysis and to view the overall trend of the data. PCA is an exploratory technique that describes the structure of high dimensional data by reducing its dimensionality. It is a linear transformation that converts n original variables (gene expression ratio values) into n new variables or principal components (PC) which have three important properties: they 1) are ordered by the amount of variance explained; 2) are uncorrelated; and 3) explain all variation in the data. The new observations (each array) are represented by points in a three dimensional space. The distance between any pair of points is related to the similarity between the two observations in high dimensional space. Observations that are near each other are similar for a large number of variables and conversely, the ones that are far apart are different for a large number of variables.

An Anova test with Bonferroni correction was then used to identify genes that were statistically different between the two groups. The resulting significant genes were used to build a diagnosis-predictor model. Variable (gene) selection analysis with cross-validation was performed different times, each time testing a different number of gene combinations. For cross-validation the "leave-one-out" method was used to estimate the accuracy of the output class prediction rule: the whole dataset was divided into different parts and each part was individually removed from the data set. The remaining data set was used to train a class prediction rule; and the resulting rule was applied to predict the class of the "held out" sample.

Anova test with Bonferroni correction was used on 9100 genes to identify ones that were statistically different among the 4 groups. PCA analysis of the 63 samples (FIG. 1) using the statistically significant genes showed a clear organization of the samples based on diagnosis. The same analysis (Anova test with Bonferroni correction) was performed on the dataset organized, this time, in benign (HN-FA) and malignant (PTC-FVPTC). For this analysis, 47 genes were found to be significantly different between the benign and the malignant group (Table 1). PCA analysis also separated the data clearly into two groups (FIG. 2).

For the purpose of this invention, attention was focused on the analysis of the dataset separating benign from malignant. These 47 genes were used to build a diagnostic predictor model. Variable (gene) selection analysis with cross validation was performed with a different number of gene combinations. After cross-validation the model was 87.1% accurate in predicting benign versus malignant with an error rate of 12.9% (Table 2). This suggested that it was possible to use the data to create a diagnostic predictor model.

The most accurate results were obtained with a combination of 6 to 10 genes. This combination of genes constituted a predictor model and a validation set of 10 additional thyroid samples was used to confirm the accuracy of this model (Table 3). The pathologic diagnosis for each sample was kept blinded to researchers at the time of the analysis. When the blind was broken, it was found that 9 of the samples were diagnosed in concordance with the pathologic diagnosis by our model. One sample that was originally diagnosed as a benign tumor by standard histologic criteria was diagnosed as malignant by our model. This sample was re-reviewed by the Pathology Department at The Johns Hopkins Hospital and was subsequently found to be a neoplasm of uncertain malignant potential. The diagnosis was changed by pathology after review for clinical reasons, not because of the gene profiling. What is so extraordinary about this is that this was not discovered until the genotyping suggested that the lesion might be malignant and the pathology report was examined a second time. By that time the report had been amended and it suggested that the tumor had undetermined malignant potential. Regarding the other tumors, all were examined a second time before array analysis to be certain that the tissue was representative and consistent with the pathology report. Therefore, this model was correct in assigning the diagnosis in all 10 cases.

PCA analysis using only the six most informative genes was conducted on all the samples with and without the 10 unknown samples (FIG. 3A-B). It is clear from the PCA organization that the six genes strongly distinguish benign from malignant. In addition, these same genes can be used for diagnosis with respect to the four subcategories of thyroid lesion. Between the two predictor models 11 genes are informative.

The identification of markers that can determine a specific type of tumor, predict patient outcome or the tumor response to specific therapies is currently a major focus of cancer research. This invention provides the use of gene expression profiling to build a predictor model able to distinguish a benign thyroid tumor from a malignant one. Such a model, when applied to FNA cytology, could greatly impact the clinical management of patients with suspicious thyroid lesions. To build the predictor model four types of thyroid lesions, papillary thyroid carcinoma (PTC), follicular variant of papillary thyroid carcinoma (FVPTC), follicular adenoma (FA) and hyperplastic nodules (HN) were used. Taken together, these represent the majority of thyroid lesions that often present as "suspicious". The choice of the appropriate control for comparative array experiments is often the subject of much discussion. In this case, in order to construct a predictive diagnostic algorithm based on a training set of samples, it was necessary to have a "common reference" standard to which all individual samples are compared. In this way, differences between each, and in fact all, samples could be analyzed. Had each tumor been compared to the adjacent normal thyroid tissue from the same patient, it would only be possible to comment on gene changes within each patient. A source of RNA from normal thyroid tissue was chosen since the source was replenishable and could be used for all of our future experiments once the diagnostic predictor algorithm was validated.

The mRNA extracted from each sample was amplified. It was found that the quality of the arrays and the data derived from them is superior when mRNA has been amplified from total RNA. Of note, all samples and all reference controls were amplified in the same fashion. Analysis of the overall gene expression profiles revealed that the benign lesions (FA, HN) could be distinguished from the malignant lesions (PTC, FVPTC). Furthermore, although not statistically significant, the 4 tumor sub-types appeared to have different gene profiles. The use of a powerful statistical analysis program (Partek) helped discover a group of 11 genes that were informative enough to create a predictor model. Two combinations were created out of these 11 genes, a combination of six genes and a combination of 10 genes. PCA analysis of the six most informative genes resulted in a nearly perfect distinction between the two groups (FIG. 3A-B). In general, PCA analysis describes similarities between samples and is not a commonly employed tool for predicting diagnosis. However, in this study the distinction was so powerful that it was possible to visually make a correct diagnosis for each of the 10 unknown samples (FIG. 3A-B). The predictor model determines the kind of tumor with a specific probability value. The diagnosis of all 10 unknown samples was correctly predicted, with a more accurate prediction using the six-gene combination (Table 3, see probabilities). It is clear from the graph in FIG. 4 how the combination of gene expression values gives a distinctly different profile between the benign and malignant lesions. However, within each tumor group there are differences among the profiles of the five samples tested. This could be explained by the fact that each tumor, even if of the same type, could be at a different stage of progression.

Of the 11 genes that were informative for the diagnosis, five genes are known genes and for the other six genes no functional studies are yet available. The genes that were identified are the ones that the model has determined best group the known samples into their correct diagnosis. Those genes identified are the ones that consistently grouped the samples into the categories and subcategories described herein. This type of pattern assignment is based on the analysis of thousands of genes and the recognition by the computer software that certain patterns are associated repeatedly with certain diagnostic groups. This type of analysis derives its power (and significance) from the number of genes that are analyzed, rather than the degree of up or down regulation of any particular gene. With respect to the specific genes identified, the computer is not biased by the knowledge of previously identified genes associated with thyroid cancer. The genes it identifies are those that best differentiate the varied diagnoses of the known samples. This occurs during the "training" phase of establishing the algorithm. Once the computer is trained with data from comparisons of RNA from known diagnoses to a standard reference, unknowns can be tested and fit to the diagnostic groups predicted during the training. For the purposes of such an approach, individual genes are less important. A specific gene which is found in a univariate study to be associated with thyroid cancer may not turn out to be the best multivariate predictor of a diagnosis in an analysis such as the one presented here.

TaqMan Assay Utilizing 6 Gene Predictor Model and 10 Gene Predictor Model

Utilizing the information obtained for these differentially expressed genes, TaqMan Real Time PCR analysis for the group of 6 genes and the group of 10 genes that are diagnostic for benign versus malignant thyroid lesions was performed on total RNA extracted from thyroid tissue as well as RNA from control normal thyroids. TaqMan Real Time PCR analysis was also performed for the group of 10 genes that are diagnostic for benign versus malignant thyroid lesions.

Thyroid samples were collected under Johns Hopkins University Hospital Institutional Review Board-approved protocols. The samples were snap-frozen in liquid nitrogen and stored at -80°C until use. The specimens were chosen based on their tumor type: papillary thyroid cancer (PTC); follicular variant of papillary thyroid cancer (FVPTC); follicular adenoma (FA); and hyperplastic nodule (HN). All diagnoses were made using standard clinical criteria by the Surgical Pathology Department at Johns Hopkins University Hospital.

also provides the GenBank Accession No. corresponding to each gene and the location of the primer and probe sequences within the full-length nucleotide sequences provided under the GenBank Accession Nos. Table 4 also provides the InCytePD clone number for each gene (if available), a UniGene identification number for each gene (if available), the chromosomal location for each gene, and additional information about the primers and probes. The primer and probe sequences set forth in Table 4 are examples of the primers and probes that can be utilized to amplify and detect DET1-DET11. These examples should not be limiting, as one of skill in the art would know that other primer sequences for DET1-DET11, including primers comprising the sequences set forth in Table 4 and fragments thereof, can be utilized to amplify DET1-DET11. Similarly, other probes which specifically detect DET1-DET11 can be utilized, such as probes that comprise the probe sequences set forth in Table 4 and fragments thereof.

Primers and probes were synthesized by Sigma (sequences shown in Table 4; Sigma, The Woodlands, Tex.). Probes were labeled at the 5' end with the reporter dye FAM (emission
wavelength, 518 nm) and at the 3' end with the quencher dye Tissue Processing and Isolation of RNA TAMRA (emission wavelength, 582 nm). Standards were Frozen sections of 100-300 mg of tissue were collected in created for the six genes using gel-extracted PCR products test tubes containing 1 ml of Trizol. Samples were transferred 25 (Qiagen, Valencia, Calif.). The G3PDH standard was created to FastRNATM tubes containing mini beads and homogenized using a plasmid construct containing the relevant G3PDH in a FastPrep beater (Biol(01SavantTM, Carlsbad, Calif.) for sequence (kind gift of Dr. Tetsuya Moriuchi, Osaka Univer 1.5 min at speed 6. The lysate was transferred to a new tube sity'). For PCR, 12.5ul TaqMan Universal PCR Master Mix, and total RNA was extracted according to the Trizol protocol 0.5ul per well each of 0.5 LM forward and reverse primers, in a final volume of 40 ul Rnase-free water (Molecular 30 and 0.5 ul per well of 10 uM dual labelled fluorescent probe Research Center, Inc., Cincinnati, Ohio). The quality of the were combined and adjusted to a total volume of 20 ul with extracted RNA was tested by spectrophotometry and by Rnase-free water. Finally, 5ul cDNA per well was added to a evaluation on minichips (BioAnalyzer, Agilent Technolo total reaction volume of 25 ul. The PCR reaction was per gies, Palo Alto, Calif.). Minimal criteria for a successful total formed for 40 cycles of a two-step program: denaturation at RNA run were the presence of two ribosomal peaks and one 35 95°C. for 15 seconds, annealing and extension at 60°C. for 1 marker peak. Normal human thyroid RNA (Clontech, BD minute. The fluorescence was read at the completion of the Biosciences) served as a reference control. The total RNA 60° C. step. For each experiment, a no-template reaction was extracted from tissue samples and normal thyroid was then included as a negative control. 
Each cDNA sample was tested used as the template for one round of reverse transcription to in triplicate, and the mean values were calculated. Triplicate generate cDNA. Eight microliters of purified total RNA (con 40 values varied by no more than 10% from the mean. We used taining up to 3 ug of total RNA) was added to a mix containing the standard curve absolute quantification technique to quan 3 ug/1 ul of random hexamer primers, 4 Jul of 1x reverse tify copy number. A standard curve was generated using a transcription buffer, 2 ul of DTT, 2 ul of dNTPs, 1 ul of Rnase ten-fold dilution series of four different known concentra inhibitor, and 2 ul of SuperScript II reverse transcriptase (200 tions of the standards. The number of PCR cycles required for U/ul) in a 20 ul reaction volume (all purchased from Invitro 45 the threshold detection of the fluorescence signal (cycle gen, Carlsbad, Calif.). Reverse transcription was performed threshold or Ct) was determined for each sample. Ct values of according to the SuperScript First-Strand Synthesis System the standard samples were determined and plotted against the instructions (Invitrogen, Carlsbad, Calif.). Following the log amount of standard. Ct values of the unknown samples reverse transcription reaction, the SuperScript II enzyme was were then compared with the standard curve to determine the heat inactivated, and degradation of the original template 50 amount of target in the unknown sample. Standard curves RNA was performed using 2U/1 ul of RNAse H (Invitrogen, from each experiment were compared to insure accurate, Carlsbad, Calif.) for 20 minutes at 37°C. The final volume of precise and reproducible results. Each plate contained dupli the mixture was brought to 500 ul using Rnase free water and cate copies of serial dilutions of known standards and stored at -20°C. until use. 
G3PDH, triplicate copies of cDNA from each sample and Quantitative Real-Time PCR 55 normal thyroid cDNA for amplification of G3PDH and the For the quantitative analysis of mRNA expression, ABI gene of interest. Prism 7500 Sequence Detection System (Applied Biosys Statistical Analysis tems) was used and the data analyzed using the Applied Data from 41 of the thyroid tumors were used to build a Biosystems 7500 System SDS Software Version 1.2.2. Prim benign (FA, n=15; HN, n=10) versus malignant (PTC, n=9; ers and probes for the genes of interest and for G3PDH were 60 FVPTC, n=7) expression ratio-based model, capable of pre designed using the Primer Express software (version 2.0: dicting the diagnosis (benign versus malignant) of each Applied BioSystems). Each primer was designed to produce sample. Ten additional samples were provided as blinded an approximately 70-150 by amplicon. Primer and probe specimens, processed as described above and used as a vali sequences that can be utilized in the 6 gene predictor model dation set to test the model. These ten samples were not and the 10 gene predictor model are listed in Table 4. Table 4 65 previously used to do any other analysis. Expression values of lists the forward and reverse primer for each gene as well as all six genes in all samples and normal thyroid were standard the fluorescent probe sequence that was dual labeled. Table 4 ized to the expression of G3PDH, a common housekeeping US 9,234,244 B2 63 64 gene chosen to serve as a reference control. The ratio of the DET4, DET6, DET7, DET8, DET9, DET10 and DET11. As expression values for each gene in each sample was then shown in FIG. 7, similar to results obtained via microarray, compared to the ratio in normal thyroid, and converted to log c21orfA, Hs. 145049 (Hs. 24.183), KIT, FAM13A1, C11orf8, 2 to generate a gene expression ratio value for all 41 samples. 
KIAA1128, IMPACT and CDH1 were upregulated in benign A file containing the gene expression ratio values from all 51 samples as compared to malignant samples. In other words, samples (41 known, 10unknown) was imported into a statis the expression of c21orf4, Hs.145049, KIT, FAM13A1, tical analysis Software package (Partek, Inc., St. Charles, C11orf8, KIAA1128, IMPACT and CDH1 decreases during Mo.). malignancy. Hs.29.6031 and SYNGR2 were upregulated in As a first step, the data from the 41 samples were subjected malignant samples as compared to benign samples. In other to principal component analysis (PCA) to provide a three 10 dimensional visualization of the data. All six genes were used words, expression of Hs.29.6031 and SYNGR2 increases dur to build a diagnosis-predictor model called a class prediction ing malignancy. Therefore, it is clear that this pattern of rule. This resulting rule was applied to predict the class of the differences between malignant and benign samples can be ten samples in the validation set. The same analysis was then utilized to classify thyroid lesions utilizing the 6 gene model performed on a second set of data from 47 of the thyroid 15 and the 10 gene model. In addition to classification, the Real tumors to build a benign (FA, n=15; HN, n=11) versus malig Time PCR Taqman assay can also be used for staging thyroid nant (PTC, n=9; FVPTC, n=12) expression ratio-based cancer and in identifying agents that treat thyroid tumors. model. Ten additional unstudied samples were provided as Analysis of the 6 gene expression and the 10 gene expres blinded specimens for this second training set. sion profiles revealed that the benign lesions could be distin Principal Component Analysis (PCA) of the 41 samples guished from the malignantlesions, and that this profile could using the gene expression values for all six genes showed a be used to diagnose unknown samples against the current clear organization of the samples based on diagnosis. 
PCA 'gold standard of pathologic criteria with a high degree of was then conducted on all of the 41 samples with the 10 accuracy. Of the six genes in the six gene model, downregu unknown samples. This combination of genes constituted a lation of KIT was seen in tissue of both benign and malignant first predictor model and the validation set of 10 additional 25 thyroid lesions when compared to normal control. The mag thyroid samples was used to confirm the accuracy of the nitude of this downregulation was much greater in malignant model. The pathological diagnosis for each sample was kept thyroid tissue. Kit is a well-known protooncogene. blinded until after the analysis was completed. When the As to the other five genes in the six gene model, for three of blind was broken, it was found that 8 of the 10 unknown these no functional studies are yet available. Of the remaining samples were diagnosed by this model in concordance with 30 the pathological diagnosis determined by standard pathologic two genes, SYNGR2 has been characterized as an integral criteria. One sample that was originally diagnosed as a benign vesicle membrane protein. LSM7 likewise has been follicular adenoma by standard histological criteria was diag described in the family of Sm-like proteins, possibly involved nosed as malignant by the six gene predictor model set forth in pre-mRNA splicing. The interaction of LSM7 with the herein; one sample that was originally diagnosed as a papil 35 TACC1 complex may participate in breast cancer oncogen lary thyroid carcinoma by standard histological criteria was esis. However, the role of LSM7 in thyroid oncogenesis has diagnosed as benign by the six gene predictor model set forth not yet been explored. herein. The six gene model determined the accurate diagnosis of Further to the analysis above, the G3PDH standard was 17 out of 20 unknown samples tested. 
Accuracy was based on redesigned and processing of all tissue for total RNA extrac 40 a comparison to the 'gold standard” pathologic diagnosis as tion was standardized. Following these two modifications, determined by clinical pathologists. Therefore, this strategy Principal Component Analysis (PCA) was performed on the demonstrates the power of genomic analysis as a technique second training set of 47 samples and on ten new unknown for studying the underlying pathways responsible for the samples using the gene expression values for all six genes. pathophysiology of neuroendocrine tumors. Further evalua Again, PCA demonstrated a clear organization of the samples 45 tion and linkage of clinical data to molecular profiling allows based on diagnosis. The pathological diagnosis for these ten for a better understanding of tumor pathogenesis, or even new unknowns was also kept blinded until after the analysis. normal thyroid function and development. In addition, the use When the blind was broken, it was found that 9 of the samples of qRT-PCR can lead to incorporation of this model and/or the were diagnosed in concordance with the pathological diag 10 gene model into preoperative decision making for patients nosis by the six gene predictor model set forth herein. One 50 with thyroid nodules. sample that was diagnosed as a benign hyperplastic nodule by The present invention is a clear example of how gene standard histological criteria was diagnosed as malignant by expression profiling can provide highly useful diagnostic our model. information. It is likely that gene expression profiling will be The results of the Taqman assays correlated with the used in the future for clinical decision-making. For this pur microarray data. As shown in FIG. 5, the Taqman data utiliz 55 ing the 6 gene model (DET1, DET2, DET3, DET4, DET5, pose adequate reporting of DNA-microarray data to clini DET6) demonstrate the ability to classify a thyroid sample as cians will be necessary. 
Gene-expression profiles may be benign or malignant. Similar to results obtained via microar more reproducible and clinically applicable than well-estab ray, c21orf4, Hs. 145049, KIT and LSM-7 were upregulated lished but highly subjective techniques, such as histopathol in benign samples as compared to malignant samples. In other 60 ogy. The small number of genes for which RNA expression words, the expression ofc21orf4, Hs. 145049, KIT and LSM7 levels are diagnostically and prognostically relevant could decreases during malignancy. Hs.29.6031 and SYNGR2 were lead to a robust, affordable, commercially available testing upregulated in malignant samples as compared to benign system. To this end, the present invention provides a useful samples. In other words, expression of Hs.29.6031 and method for classifying thyroid nodules as benign or malig SYNGR2 increases during malignancy. The same analysis 65 nant and therefore helps facilitate appropriate, and eliminate was performed with the 10 gene model utilizing the primers unnecessary, operations in patients with Suspicious thyroid and probes set forth in Table 4 for DET 1, DET2, DET3, tumors. US 9,234,244 B2 65 66 Example 2 array tracker.nci.nih.gov/ inciarrays.manual.october.2006.pdf). In order to test for label DET4 and DET12-DET85 ing bias, 10 representative tumor samples were used in dye Swap experiments. Dye Swap experiments were performed Although fine-needle aspiration biopsy (FNA) is the most with Cy5-labeled control cRNA and Cy3-labeled tumor useful diagnostic tool in evaluating a thyroid nodule, preop cRNA erative diagnosis of thyroid nodules is frequently imprecise, Bioinformatics and Statistical Analysis. with up to 30% of FNA cytology samples reported as suspi After image analysis using GenePix Pro 5.0, raw data from cious or indeterminate. 
Therefore, other adjuncts, such as all 125 arrays were arranged in mAdb (http://nciarray.ncini molecular-based diagnostic approaches are needed in the pre 10 h.gov/) and then exported for further analysis with BRB operative distinction of these lesions. In an attempt to identify ArrayTools (25). diagnostic markers for the preoperative distinction of these For each array, global normalization was performed to lesions, microarray analysis was used to study the 8 different median the center of the log-ratios in order to adjust for thyroid tumor Subtypes that can present a diagnostic chal differences in labeling intensities of the Cy3 and Cy5 dyes. lenge to the clinician. 15 Genes exhibiting minimal variation across the set of arrays The present microaray-based analysis of 94thyroid tumors from different tumor subtypes were excluded and only genes identified 75 genes that are differentially expressed between exhibiting expression differences of at least 1.5 fold from the benign and malignant tumor Subtypes. Of these, 33 were median in at least 20% of the arrays were retained for analy over-expressed, and 42 under-expressed, in malignant com S1S. pared to benign thyroid tumors. Statistical analysis of these Class Comparison genes, using Nearest Neighbor classification showed a 73% Genes that were differentially expressed between malig sensitivity and 82% specificity in predicting malignancy. nant and benign thyroid tumors were identified using a ran Real-time RT-PCR validation for 12 of these genes was con dom-variance t-test (26). In order to limit the number of false firmatory. Tissue validation by Western blot and immunohis positives, genes were included only if their p value was less tochemical analyses of one of the genes, HMGA2, further 25 than 0.001. We also performed a global test of whether the validated the microarray and real-time RT-PCR data. 
These expression profiles differed between benign and malignant by 12 genes are useful in the development of a panel of markers permuting (1000 times) the labels of which array corre to differentiate benign from malignant tumors and thus serve sponded to which category. For each permutation, the p val as an important step in Solving the clinical problem associated ues were re-computed and the number of significant genes with Suspicious thyroid lesions. 30 (ps001) was noted. The proportion of permutations that Tumor Specimens resulted in at least as many genes as with the actual data was A total of 125 thyroid tumor samples were collected from the significance level of the global test. patients who underwent thyroidectomy at Johns Hopkins Class Prediction Medical Institutions (Baltimore, Md.) between 2000 and We developed models that utilized gene expression profiles 2005. All samples were collected with Institutional Review 35 to predict class of tumors (benign vs. malignant). The models Board approval. Following Surgical excision, samples were were based on several classification methods: Compound Snap frozen in liquid nitrogen and stored at -80° C. until use. Covariate Predictor (27), Diagonal Linear Discriminant The specimens included 70 benign tumors (20 adenomatoid Analysis (28), Nearest Neighbor Classification (28), and Sup nodules, 20 follicular adenomas, 17 Hirthle cell adenomas, port Vector Machines with linear kernel (29). Genes that were 13 lymphocytic thyroiditis nodules) and 55 malignant tumors 40 differentially expressed (ps0.001) were then incorporated (19 papillary thyroid carcinomas, 16 follicular variant of pap into these models (26). We estimated the prediction error for illary thyroid carcinomas, 14 follicular carcinomas, and each model using leave-one-out cross-validation (LOOCV). 6Hirthle cell carcinomas). Each sample was obtained form For each LOOCV set, the entire model was recreated, includ the center of the tumor. 
ing the gene selection process. We also evaluated whether the RNA Isolation 45 cross-validated error rate for any given model was signifi Fresh frozen sections were reviewed by a pathologist to cantly less than what one would expect from random predic Verify the presence of tumor prior to tissue processing and tion. Class labels were randomly permuted and the entire RNA extraction. Total RNA was isolated from 50-75 mg of LOOCV process was repeated 1,000 times. The significance each tumor using TRIZol reagent (Invitrogen) and purified level was the proportion of the random permutations that gave with the RNeasy Kit (Qiagen). The quantity and the integrity 50 a cross-validated error rate no greater than the rate obtained of extracted RNA was determined by ND-1000 Spectrometer with the real data. (Nanodrop Technologies) and Bioanalyzer Nano Labchips Real Time RT-PCR. (Agilent Technologies), respectively. RNA that included 56 To validate the genes found to be significantly differen pooled normal thyroid specimens was used as control (Clon tially expressed, real time RT-PCR was performed on a subset tech). 55 of76 tumors that were available from the original array analy cRNA Synthesis, Labeling and Microarray Hybridization sis as well as on a new set of 31 tumors. cDNA was synthe One microgram of total RNA from each sample was sub sized in a 50 ul reverse transcription reaction mixture that jected to single round amplification using Aminoallyl Mes contained 3 ug total RNA from each tumor. After optimiza sage AmpTM II aRNA Amplification Kit (Ambion Inc). After tion for each primer pair, real-time PCR assays were per amplification, 5ug of aminoallyl RNA (aaRNA) was labeled 60 formed on iQTM5 real-time PCR detection system (Bio-Rad using a Cy-dye coupling method according to the manufac Laboratories, Inc.) according to the manufacturer's recom turer's instructions. Both Cy5-labeled tumor cRNA and Cy3 mendations. 
Briefly, 1 ul of cDNA was used in a 25ul reaction labeled control cFNA were hybridized to a 34K-human-oli mixture that contained an optimal concentration (150-250 gonucleotide array produced by the National Cancer Institute nM) of primers and SYBR-Green Supermix. The thermal (NCI) microarray facility (http://array tracker.nci.nih.gov/). 65 profile for PCR consisted of Taq-polymerase activation at 95° Microarray hybridization, washing and scanning (GenePix C. for 3 minutes, followed by 40 cycles of PCR at 95°C. for 4000B) were performed as described in NCI protocol (http:// 20 seconds (denaturation), 55° C. for 30 seconds (annealing), US 9,234,244 B2 67 68 and 72°C. for 60 seconds (extension). An average Ct (thresh analysis. The specimens included: 50 benign tumors (13 old cycle) from duplicate assays was used for further calcu adenomatoid nodules, 13 follicular adenomas, 13Hirthle cell lation, and GAPDH-normalized gene expression was deter adenomas and 11 lymphocytic thyroiditis nodules) and 44 mined using the relative quantification method as formulated malignant tumors (13 papillary thyroid carcinomas, 13 folli below. Results were expressed as the median of 3-4 indepen cular variant of papillary thyroid carcinomas, 13 follicular dent measurements. carcinomas, and 5uHirthle cell carcinomas). Several of these Relative expression levels normalized to GAPDH=2 tumors were used more than once for the analysis, resulting in (Gene of interest Ct-GAPDH COX100 128 arrays (Table 5). Western Blot Analysis After the expression data from replicate samples were Total cellular proteins were extracted from thyroid tumors 10 averaged, 15.745 genes met criteria for inclusion in the analy and their matching normal thyroid tissues. Tissues (20-25 sis by BRB ArrayTools. By using a random-variance t-test, mg) were ground and lysed in 250 ul ice-cold M-PER lysis the class comparison (benign vs. 
malignant) analysis identi buffer (Pierce) supplemented with a protease inhibitor cock fied 75 genes that were significantly (ps0.001) differentially tail for 60 min at 4° C. Supernatants were collected after expressed between malignant and benign tumor types. Of centrifugation at 11,600xg at 4°C. and protein concentration 15 these 75 differentially expressed genes, 33 were over-ex was measured. Protein samples, loaded at 40 ug per lane were separated by 10% SDS-PAGE gels as described elsewhere. pressed (Table 6) and 42 were under-expressed (Table 7) in After transfer to a polyvinylidene diflouride membrane, both malignant thyroid tumors compared to benign. Principal transfer efficiency and protein loads were checked by Pon component analysis of the 94 samples using these 75 genes ceau S solution (Sigma). Specific proteins were probed with showed a clear organization of the samples based on diagno anti-HMGA2 antibody (sc-23684 Santa Cruz Biotechnology, sis (FIG. 12). Inc). We further developed additional models utilizing gene Tissue Array (TMA) expression data to predict and cross-validate the samples. In A total of 87 formalin-fixed, paraffin-embedded thyroid addition to this, we evaluated whether the estimated error-rate specimens from 87 different individuals were selected from (cross-validated) for each model was significantly less than the Surgical pathology archives of the Johns Hopkins Hospi 25 one would expect from random prediction. Statistical analy tal, including classic papillary thyroid carcinoma (n=20), fol sis using 1-Nearest Neighbor classification provided the best licular variant of papillary thyroid carcinoma (n=9), follicular results and showed a 73% sensitivity, 82% specificity and carcinoma (n=14), lymphocytic thyroiditis nodules (n=11). 78% positive predictive value for the prediction of malig follicular adenoma (n=14) and normal thyroid adjacent to nancy (Table 8). tumor (n=19). 
These cases were different than those used for 30 RT-PCR Analysis the gene expression analysis. Each case was reviewed by a In order to validate the authenticity of the microarray data, pathologist (DPC) to confirm the diagnosis and select appro RT-PCR analysis of two genes high mobility group AT-hook priate areas for inclusion in the tissue array. For follicular 2 (HMGA2) and pleomorphic adenoma gene 1 (PLAG 1) variant of papillary thyroid carcinoma, cores from areas within the tumor displaying florid nuclear features of papil was first performed using 11 follicular adenomas, 10 lary thyroid carcinoma and follicular architecture were cho 35 adenomatoid nodules, 10 papillary thyroid carcinomas and 7 sen for the TMA. Tissue cores (0.6 mm diameter) from follicular variant of papillary thyroid carcinomas (FIG. 8). selected areas were obtained using a manual Tissue Puncher/ These representative tumor samples were also used in the Arrayer (Beecher Instruments, Silver Spring, Md.) and a microarray analysis. As shown in FIG. 8A, the expression high-density tissue array was generated as previously levels of both HMGA2 and PLAG1 were found to be very described (30). In addition to thyroid tumors, each TMA 40 high in most of the malignant tumors (papillary and follicular block had nine cylinders from non-thyroid control tissues. variant of papillary thyroid carcinoma). In contrast, all benign Five-micron sections were cut, and one H&E-stained slide tumors (follicular adenomas and adenomatoid nodules) was examined to Verify the presence of diagnostic cells. exhibited no detectable levels of either HMGA2 or PLAG1 Immunohistochemistry even after extending the PCR cycles to 40, with the exception H & E staining and immunohistochemistry were done on 45 of one of the benign tumors (adenomatoid nodule: AN4) that 4-5 um serial sections of formalin-fixed paraffin-embedded showed appreciable levels of HMGA2 expression (FIG. 8B). tissue. 
Briefly, sections were deparaffinized in xylene and rehydrated through a series of alcohol gradients. Antigen retrieval was achieved by heating in citrate buffer at pH 6.0. Endogenous peroxidase activity was quenched in 3% hydrogen peroxide, and nonspecific binding of secondary antibody was blocked by incubation with normal horse serum. Individual sections were incubated with anti-HMGA2 goat polyclonal antibodies overnight at 4° C. Conditions omitting primary antibody were used as negative controls. A streptavidin-biotin peroxidase detection system was used in accordance with the manufacturer's instructions and then developed using 3,3'-diaminobenzidine (Vector Laboratories, Inc). Sections were counterstained with hematoxylin and eosin. Formalin-fixed paraffin-embedded cellblock sections of the lung cancer cell line H1299 (ATCC) were used as positive controls.

Results

Microarray and Statistical Analysis

Ninety-four unique thyroid samples representing the 8 different thyroid tumor subtypes were used for microarray

Real-time RT-PCR analysis of 6 genes, sparc/osteonectin, cwcv and kazal-like domain proteoglycan (SPOCK1), carcinoembryonic antigen-related cell adhesion molecule 6 (CEACAM6), protease serine 3 (PRSS3/mesotrypsin), phosphodiesterase 5A (PDE5A), leucine-rich repeat kinase 2 (LRRK2) and thyroid peroxidase (TPO5), was also performed using RNA from 76 of the original tumor set used in the microarray analysis. The expected differential expression was confirmed in 5 out of 6 genes (FIG. 2). SPOCK1, CEACAM6, PRSS3 and LRRK2 were overexpressed in malignant compared to the benign tumor subtypes (Table 6 and FIG. 9). TPO5 was underexpressed in the majority of the malignant subtypes (Table 7 and FIG. 9). While we did not see any significant difference in PDE5A expression between benign and malignant tumors, the papillary thyroid cancers exhibited elevated levels of PDE5A compared to all other subtypes (FIG. 9).

In addition to the original set of tumor samples, a new set of 31 thyroid tumors was also used for validation by real-time RT-PCR. The new set of samples had not been used for the microarray analysis and was used to validate the following 6 genes: dipeptidyl-peptidase 4 (DPP4), cadherin 3 type 1 (CDH3), recombination activating gene 2 (RAG2), angiotensin II receptor type 1 (AGTR1), HMGA2 and PLAG1. Again, all 6 genes that we analyzed were found to be differentially expressed in benign vs. malignant tumors, as expected from the microarray analysis (FIG. 10). Very high expression levels of CDH3, HMGA2, and PLAG1 were observed in all of the malignant subtypes compared to the benign tumors. Indeed, the expression levels of HMGA2 and PLAG1 were quantified this time using a new set of thyroid tumors, and both genes were overexpressed in the majority of malignant compared to benign subtypes. Low expression levels of RAG2 and AGTR1 were documented in all malignant tumors (Table 7 and FIG. 10). With the exception of lymphocytic thyroiditis nodules, which exhibited very high expression levels of DPP4, the other three benign subtypes (follicular adenomas, adenomatoid nodules and Hürthle cell adenomas) exhibited very low expression levels compared to malignant tumors (FIG. 10).

Validation by Western Blot and Immunohistochemistry Analysis

Overexpression of HMGA2 in malignant tumors compared to benign subtypes was further confirmed by Western blot analysis and immunohistochemistry. As assessed by both Western blot analysis and immunohistochemistry, HMGA2 was expressed only in tumors but not in normal thyroid (FIG. 11). Western blot analysis revealed overall less protein expression in benign compared to malignant tumors (FIG. 11A). Based on immunohistochemistry, HMGA2 expression was classified as: (i) high expression (moderate to intense nuclear staining within >66% of tumor cells), (ii) moderate expression (moderate to intense nuclear staining within 33-66% of tumor cells), (iii) low expression (low to moderate nuclear expression in <33% of cells) and (iv) negative (no nuclear expression). As shown in Table 9, HMGA2 expression was positive in most of the malignant tumors, including papillary thyroid carcinomas (26 of 30; 87%), follicular variant of papillary thyroid carcinomas (13 of 16; 81%) and follicular carcinomas (11 of 14; 79%). In contrast, most of the benign tumors were negative for HMGA2 expression, including follicular adenomas (22 of 25; 88%), adenomatoid nodules (8 of 10; 80%), and normal thyroid (17 of 19; 89%). Low levels of HMGA2 expression were detected in 6 of 11 (55%) lymphocytic thyroiditis nodules. Representative HMGA2 immunostaining of six thyroid tumors is shown in FIG. 4B.

Throughout this application, various publications are referenced. The disclosures of these publications in their entireties are hereby incorporated by reference into this application in order to more fully describe the state of the art to which this invention pertains.

It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the scope or spirit of the invention. Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

BIBLIOGRAPHY

1. Miller, L. D., Long, P. M., Wong, L., Mukherjee, S., McShane, L. M., and Liu, E. T. Optimal gene expression analysis by microarrays. Cancer Cell, 2: 353-361, 2002.
2. Sherman, S. I. Thyroid carcinoma. Lancet, 361: 501-511, 2003.
3. Schulze, A. and Downward, J. Navigating gene expression using microarrays - a technology review. Nat Cell Biol, 3: E190-195, 2001.
4. Raychaudhuri, S., Sutphin, P. D., Chang, J. T., and Altman, R. B. Basic microarray analysis: grouping and feature reduction. Trends Biotechnol, 19: 189-193, 2001.
5. Van't Veer, L. J. and De Jong, D. The microarray way to tailored cancer treatment. Nature Medicine, 8: 13, 2002.
6. Gordon, G. J., Jensen, R. V., Hsiao, L. L., Gullans, S. R., Blumenstock, J. E., Richards, W. G., Jaklitsch, M. T., Sugarbaker, D. J., and Bueno, R. Using gene expression ratios to predict outcome among patients with mesothelioma. J Natl Cancer Inst, 95: 598-605, 2003.
7. West, M., Blanchette, C., Dressman, H., Huang, E., Ishida, S., Spang, R., Zuzan, H., Olson, J. A., Jr., Marks, J. R., and Nevins, J. R. Predicting the clinical status of human breast cancer by using gene expression profiles. Proc Natl Acad Sci USA, 98: 11462-11467, 2001.
8. Mazzaferri, E. L. Management of a solitary thyroid nodule. N. Engl. J. Med., 328: 553-559, 1993.
9. Mazzaferri E L and S M. J. Long term impact of initial surgical and medical therapy on papillary and follicular thyroid cancer. Am J Pathol, 97: 418-428, 1994.
10. Goellner, J. R. Problems and pitfalls in thyroid cytology. Monogr Pathol 75-93, 1997.
11. Hamberger, B., et al. Fine-needle aspiration biopsy of thyroid nodules. Impact on thyroid practice and cost of care. Am J Med, 73: 381-384, 1982.
12. Suen, K. C. How does one separate cellular follicular lesions of the thyroid by fine-needle aspiration biopsy? Diagn Cytopathol, 4: 78-81, 1988.
13. Goellner, J. R., et al. Fine needle aspiration cytology of the thyroid, 1980 to 1986. Acta Cytol, 31: 587-590, 1987.
14. Caraway, N. P., Sneige, N., and Samaan, N. A. Diagnostic pitfalls in thyroid fine-needle aspiration: a review of 394 cases. Diagn Cytopathol, 9: 345-350, 1993.
15. Ravetto, C., Colombo, L., and Dottorini, M. E. Usefulness of fine-needle aspiration in the diagnosis of thyroid carcinoma: a retrospective study in 37,895 patients. Cancer, 90: 357-363, 2000.
16. Gharib, H., Goellner, J. R., Zinsmeister, A. R., Grant, C. S., and Van Heerden, J. A. Fine-needle aspiration biopsy of the thyroid. The problem of suspicious cytologic findings. Ann Intern Med, 101: 25-28, 1984.
17. Staudt, L. M. Gene expression profiling of lymphoid malignancies. Annu Rev Med, 53: 303-318, 2002.
18. van de Vijver, M. J., He, Y. D., van't Veer, L. J., Dai, H., Hart, A. A., Voskuil, D. W., Schreiber, G. J., Peterse, J. L., Roberts, C., Marton, M. J., Parrish, M., Atsma, D., Witteveen, A., Glas, A., Delahaye, L., van der Velde, T., Bartelink, H., Rodenhuis, S., Rutgers, E. T., Friend, S. H., and Bernards, R. A gene-expression signature as a predictor of survival in breast cancer. N. Engl J Med, 347: 1999-2009, 2002.
19. Sauter, G. and Simon, R. Predictive molecular pathology. N Engl J Med, 347: 1995-1996, 2002.
20. Simon, R., Radmacher, M. D., Dobbin, K., and McShane, L. M. Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. J Natl Cancer Inst, 95: 14-18, 2003.
21. Barden, C. B., Shister, K. W., Zhu, B., Guiter, G., Greenblatt, D. Y., M. A. Z., and Fahey, T. J. I. Classification of follicular thyroid tumors by molecular signature: results of gene profiling. Clinical Cancer Research, 9: 1792-1800, 2003.
22. Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D., and Lander, E. S. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286: 531-537, 1999.
23. Eberwine, J. Amplification of mRNA populations using aRNA generated from immobilized oligo(dT)-T7 primed cDNA. Biotechniques, 20: 584-591, 1996.
24. Wang, E., Miller, L. D., Ohnmacht, G. A., Liu, E. T., and Marincola, F. M. High-fidelity mRNA amplification for gene profiling. Nat Biotechnol, 18: 457-459, 2000.
25. Simon R, Lam A, Li M, Ngan M, Menenzes S, Zhao Y. Analysis of Gene Expression Data Using BRB-Array Tools. Cancer Informatics 2007, 2: 11-7.
26. Wright GW, Simon RM. A random variance model for detection of differential gene expression in small microarray experiments. Bioinformatics 2003, 19: 2448-55.
27. Radmacher MD, McShane LM, Simon R. A paradigm for class prediction using gene expression profiles. J Comput Biol 2002, 9: 505-11.
28. Dudoit S, Fridlyand F, Speed TP. Comparison of discrimination methods for classification of tumors using DNA microarrays. Journal of the American Statistical Association 2002, 97: 77-87.
29. Ramaswamy S, Tamayo P, Rifkin R, et al. Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci USA 2001, 98: 15149-54.
30. Fedor HL, De Marzo AM. Practical methods for tissue microarray construction. Methods Mol Med 2005, 103: 89-101.

TABLE 1. Two-tailed ANOVA with Bonferroni correction identified 47 genes that differed significantly (p ≤ 0.05) between the malignant and the benign group. The genes are listed from the most to the least significant. In bold are all the genes that, combined together, created the best predictor model.

Gene | Bonferroni p-value | Mean (benign) | S.D. +/- | Mean (malignant) | S.D. +/-
C21orf

TABLE 3

31 benign tumors 32 malignant tumors DIAGNOSIS PREDICTORMODEL Patho 10 gene cross validation logic diagnosis predictor model of 87% benign malignant Predicted Diag CDH1 FAM13A1Hs.24,183 Hs.286031IMPACTKIAA1128 KIT SYNGR2 prob. prob. Diagnosis 55 1.02 1.21 2.03 1.12 0.99 O.O1 benign FA O.83 O45 1.67 .74 O.98 1.27 0.27 O.54 O.O2 0.98 malignant FVPTC

O.62 0.85 0.985 56 1.16 O.86 O.80 0.79 O43 0.57 malignant PTC 0.3786 2.07

0.50 ().55 O.86 .94 0.64 O.99 0.25 4.66 O.OO 1.00 malignant PTC 19 O.91 O.56 0.11 4.69 O.OO 1.00 malignant 2.17 1.01 1.24 O.82 0.95 O.93 1.59 3.69 0.05 0.95 malignant 1.82 2.92 1.38 1.OO O.OO benign

6 gene cross validation

diagnosis predictor model of 87% Predicted Pathologic

C21orfA. HS.241.83 HS.296O31 Diagnosis Diagnosis .81 2.40 benign FA 67 1.74 0.27 0.56 0.54 0.15 0.85 malignant FVPTC 0.65 1.70 1.04 0.94 0.06 benign YES 0.95 1.56 O.80 0.85 0.79 0.33 0.67 malignant PTC

.34 0.65 1.63 1.47 2.70 || 0.96 0.04 benign FA

.22 1.19 0.11 O.43 4.69 O.OO 1.00 malignant

.23 1.74 2.92 1.04

This table shows the two predictor models, of 10 and 6 genes, with their gene expression values, the predicted diagnosis, the percentage probability of the diagnosis being correct, and the pathologic diagnosis. FA = follicular adenoma, HN = hyperplastic nodules, FVPTC = follicular variant papillary thyroid carcinoma and PTC = papillary thyroid carcinoma. The square indicates the unknown sample for which there was discordance between the predicted and the pathologic diagnosis. The percentage diagnosis probability for both the 6 and 10 gene combinations strongly suggested that this was a malignant sample. The sample was re-reviewed by the pathologist, and the pathologic diagnosis was in fact changed to a neoplasm with uncertain malignant potential.
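The read-out described in this footnote can be sketched in code. The Python below is only an illustration under stated assumptions: it assumes the predicted diagnosis is whichever class has the higher probability, and it groups the pathologic abbreviations defined above as benign (FA, HN) or malignant (FVPTC, PTC). The function names are hypothetical and are not part of the disclosure.

```python
# Illustrative sketch only. The "higher probability wins" rule and all
# function names are assumptions, not the predictor model itself.
BENIGN_PATHOLOGY = {"FA", "HN"}
MALIGNANT_PATHOLOGY = {"FVPTC", "PTC"}


def predicted_diagnosis(benign_prob, malignant_prob):
    """Return the class with the higher reported probability."""
    return "malignant" if malignant_prob > benign_prob else "benign"


def pathologic_class(abbreviation):
    """Map a pathologic diagnosis abbreviation to benign or malignant."""
    if abbreviation in BENIGN_PATHOLOGY:
        return "benign"
    if abbreviation in MALIGNANT_PATHOLOGY:
        return "malignant"
    raise ValueError(f"unknown pathologic diagnosis: {abbreviation}")


def is_discordant(benign_prob, malignant_prob, abbreviation):
    """True when the predicted and the pathologic diagnosis disagree."""
    predicted = predicted_diagnosis(benign_prob, malignant_prob)
    return predicted != pathologic_class(abbreviation)
```

A sample flagged by `is_discordant` corresponds to the kind of case the footnote describes as being sent back for pathologic re-review.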

TABLE 4
Primers and probes for select DET genes. Thyroid Primer/Probes

Oligo Name | SEQ ID NO | Length | Sequence (5'-3') | Tm
Hs.24183-Forward | SEQ ID NO: 1 | 22 | ggctgactggcaaaaagtcttg |
Hs.24183-Reverse | SEQ ID NO: 2 | 26 | ttggttcccttaagttctcagagttt |
Hs.24183-Probe | SEQ ID NO: 3 | 23 | (6Fam) TggCCCTgTCACTCCCATgATgC (Tamra) |
thyroglobulin-forward | SEQ ID NO: 4 | 18 | aagggctcgcatgcaaag | 59
thyroglobulin-reverse | SEQ ID NO: 5 | 25 | cacagtagcactctgagttgaagca |
thyroglobulin-probe | SEQ ID NO: 6 | 33 | (6Fam) TTTgTCCCTgCTTgTACTAgTgAgg (Tamra) |
c21orf4-forward | SEQ ID NO: 7 | 22 | gcaatcctcttacctccgcttt |
c21orf4-reverse | SEQ ID NO: 8 | 25 | ggaatcggagacagaagagagctt |
c21orf4-Probe | SEQ ID NO: 9 | 28 | (6Fam) CTgggACCACAgATgTATCCTCCACTCC (Tamra) |
fam13a1-forward | SEQ ID NO: 10 | 22 | atggcagtgcagtcatcatctt |
fam13a1-reverse | SEQ ID NO: 11 | 25 | gcattcatacagctgcttaccatct |
fam13a1-Probe | SEQ ID NO: 12 | 23 | (6Fam) TTTggTCCCTgCCTAggACCggg (Tamra) |
c11orf8-forward | SEQ ID NO: 13 | 16 | ccggcccaagctccat |
c11orf8-reverse | SEQ ID NO: 14 | 21 | ttgttgtaaccgtcggtcatga |
c11orf8-Probe | SEQ ID NO: 15 | 29 | (6Fam) TgTTTggTggAATCCATgAAggTTATggC (Tamra) |
kiaa1128-forward | SEQ ID NO: 16 | 20 | gagagcgtgatccccctaca |
kiaa1128-reverse | SEQ ID NO: 17 | 23 | accaagagtgcacctcagtgtct |
kiaa1128-probe | SEQ ID NO: 18 | 33 | (6Fam) TCACTTCCAAATgTTCCTgTAgCATAAATggTg (Tamra) |
Hs.296031-forward | SEQ ID NO: 19 | 24 | tgccaaggagctttgtttatagaa |
Hs.296031-reverse | SEQ ID NO: 20 | 20 | atgacggcatgtaccaacca |
Hs.296031-probe | SEQ ID NO: 21 | 29 | (6Fam) TTggTCCCCTCAgTTCTATgCTgTTgTgT (Tamra) |
kit-forward | SEQ ID NO: 22 | 26 | gcacctgctgaaatgtatgacataat |
kit-reverse | SEQ ID NO: 23 | 28 | tttgctaagttggagtaaatatgattgg |
kit-probe | SEQ ID NO: 24 | 36 | (6Fam) ATTgTTCAgCTAATTgAgAAgCAgATTTCAgAgAgC (Tamra) |
impact-forward | SEQ ID NO: 25 | 26 | tgaagaatgtcatggtggtagtatca |
impact-reverse | SEQ ID NO: 26 | 26 | atgactcctcaggtgaatttgtgtag |
impact-probe | SEQ ID NO: 27 | 29 | (6Fam) CTggTATggAgggATTCTgCTAggACCAg (Tamra) |
cdh1-forward | SEQ ID NO: 28 | 21 | tgagtgtcccccggtatcttc |
cdh1-reverse | SEQ ID NO: 29 | 21 | cagccgctttcagattttcat |
cdh1-probe | SEQ ID NO: 30 | 27 | (6Fam) CCTgCCAATCCCgATgAAATTggAAAT (Tamra) |
syngr2-forward | SEQ ID NO: 31 | 19 | gctggtgctcatggcactt |
syngr2-reverse | SEQ ID NO: 32 | 19 | ccctccccaggcttcctaa |
syngr2-probe | SEQ ID NO: 33 | 24 | (6Fam) aagggctttgcctgacaacaccca (Tamra) |
lsm7-forward | SEQ ID NO: 34 | 21 | gacgatccaggtaaagttcca |
lsm7-reverse | SEQ ID NO: 35 | 20 | aggttgaggagtgggtcgaa |
lsm7-probe | SEQ ID NO: 36 | 22 | (6Fam) aggccgcgaagccagtggaatc (Tamra) |
G3PDH-Forward | SEQ ID NO: 37 | 22 | TCACCAGGGCTGCTTTTAACTC |
G3PDH-Reverse | SEQ ID NO: 38 | 26 | GGAATCATATTGGAACATGTAAACCA |
G3PDH-probe | SEQ ID NO: 39 | 27 | FAM-TTGCCATCAATGACCCCTTCATTGACC-TAMRA |
normal thyroid sample | Clontech Lot 63100284

TABLE 4 - continued
ret = retired

Oligo Name | Residues | Incyte PD Clone | Unigene

Hs.24183-Forward | 2436-2457 | 2123020 | Hs.24183
Hs.24183-Reverse | 2530-2505 | 2123020 | Hs.24183
Hs.24183-Probe | 2462-2484 | 2123020 | Hs.24183
thyroglobulin-forward | | |
thyroglobulin-reverse | 2157-2133 | |
thyroglobulin-probe | 2088-2120 | |
c21orf4-forward | 2622-2643 | 1710736 | (Hs.284142-ret) Hs.433668
c21orf4-reverse | 2743-2712 | 1710736 | (Hs.284142-ret) Hs.433668
c21orf4-Probe | 2652-2679 | 1710736 | (Hs.284142-ret) Hs.433668
fam13a1-forward | 2931-2952 | 1458366 | (Hs.177644-removed) Hs.442818
fam13a1-reverse | | 1458366 | (Hs.177644-removed) Hs.442818
fam13a1-Probe | 2992-3014 | 1458366 | (Hs.177644-removed) Hs.442818
c11orf8-forward | 849-864 | 4117578 | (Hs.46638-ret) Hs.432000
c11orf8-reverse | 916-896 | 4117578 | (Hs.46638-ret) Hs.432000
c11orf8-Probe | 866-894 | 4117578 | (Hs.46638-ret) Hs.432000
kiaa1128-forward | 5980-5999 | 1428225 | Hs.81897
kiaa1128-reverse | | 1428225 | Hs.81897
kiaa1128-probe | 6004-6036 | 1428225 | Hs.81897
Hs.296031-forward | 4271-4294 | 29557644 | Hs.296031
Hs.296031-reverse | 4353-4334 | 29557644 | Hs.296031
Hs.296031-probe | 4301-4329 | 29557644 | Hs.296031
kit-forward | 2704-2729 | 2358031/1672225 | Hs.81665
kit-reverse | 2843-2816 | 2358031/1672225 | Hs.81665
kit-probe | 2779-2814 | 2358031/1672225 | Hs.81665
impact-forward | 809-834 | 973364 | Hs.284245
impact-reverse | 943-918 | 973364 | Hs.284245
impact-probe | 837-865 | 973364 | Hs.284245
cdh1-forward | 2499-2519 | 2793857/1858050/1208946 | Hs.194657
cdh1-reverse | 2579-2559 | 2793857/1858050/1208946 | Hs.194657
cdh1-probe | 2525-2551 | 2793857/1858050/1208946 | Hs.194657
syngr2-forward | 1255-1273 | 983008 | (Hs.5097-ret) Hs.433753
syngr2-reverse | 1374-1356 | 983008 | (Hs.5097-ret) Hs.433753
syngr2-probe | 1303-1326 | 983008 | (Hs.5097-ret) Hs.433753
lsm7-forward | 72-92 | 1911913/2060560 | (Hs.70830-ret) Hs.512610
lsm7-reverse | 146-127 | 1911913/2060560 | (Hs.70830-ret) Hs.512610
lsm7-probe | 96-117 | 1911913/2060560 | (Hs.70830-ret) Hs.512610
G3PDH-Forward | 128-149 | |
G3PDH-Reverse | 228-203 | |
G3PDH-probe | 167-193 | |
normal thyroid sample | pooled 65 autopsy patients | 650-424-8222

TABLE 4 - continued

Oligo Name | GenBank/RefSeq (CM Paper) | GenBank/RefSeq (TAQman)

Hs.24183-Forward | NP_060265 | AL832414.1
Hs.24183-Reverse | NP_060265 | AL832414.1
Hs.24183-Probe | NP_060265 | AL832414.1
thyroglobulin-forward | NM_003235 | NM_003235
thyroglobulin-reverse | NM_003235 | NM_003235
thyroglobulin-probe | NM_003235 | NM_003235
c21orf4-forward | AP001717 | NM_006134.4
c21orf4-reverse | AP001717 | NM_006134.4
c21orf4-Probe | AP001717 | NM_006134.4
fam13a1-forward | (NM_014883) from AB020721 | (NM_014883) from AB020721
fam13a1-reverse | (NM_014883) from AB020721 | (NM_014883) from AB020721
fam13a1-Probe | (NM_014883) from AB020721 | (NM_014883) from AB020721
c11orf8-forward | NM_001584 | NM_001584
c11orf8-reverse | NM_001584 | NM_001584
c11orf8-Probe | NM_001584 | NM_001584
kiaa1128-forward | AB032914.1 - this is actually AB032954.1 | AB032954.1
kiaa1128-reverse | AB032914.1 - this is actually AB032954.1 | AB032954.1
kiaa1128-probe | AB032914.1 - this is actually AB032954.1 | AB032954.1
Hs.296031-forward | BC38512.1 | BC38512.1
Hs.296031-reverse | BC38512.1 | BC38512.1
Hs.296031-probe | BC38512.1 | BC38512.1
kit-forward | X06182.1 | X06182.1
kit-reverse | X06182.1 | X06182.1
kit-probe | X06182.1 | X06182.1
impact-forward | NM_018439 | NM_018439
impact-reverse | NM_018439 | NM_018439
impact-probe | NM_018439 | NM_018439
cdh1-forward | NM_004360 | NM_004360
cdh1-reverse | NM_004360 | NM_004360
cdh1-probe | NM_004360 | NM_004360
syngr2-forward | NM_004710.2 | NM_004710.2
syngr2-reverse | NM_004710.2 | NM_004710.2
syngr2-probe | NM_004710.2 | NM_004710.2
lsm7-forward | NM_016199.1 | NM_016199.1
lsm7-reverse | NM_016199.1 | NM_016199.1
lsm7-probe | NM_016199.1 | NM_016199.1
G3PDH-Forward | NM_002046 |
G3PDH-Reverse | NM_002046 |
G3PDH-probe | NM_002046 |
normal thyroid sample | |

TABLE 4 - continued

Oligo Name | Chromosome | Primer/Probe Details
Hs.24183-Forward | P1 | used later part of sequence

Hs.24183-Reverse | |
Hs.24183-Probe | |
thyroglobulin-forward | | used within Exon 9
thyroglobulin-reverse | |
thyroglobulin-probe | |
c21orf4-forward | 21q22.11 | spans Exon 7-8
c21orf4-reverse | |
c21orf4-Probe | |
fam13a1-forward | 4q22.1 | used later part of seq-exon 19
fam13a1-reverse | |
fam13a1-Probe | |
c11orf8-forward | 11p13 | spans Exon 5-6
c11orf8-reverse | |
c11orf8-Probe | |
kiaa1128-forward | 10q23.2 | used later part of sequence
kiaa1128-reverse | |
kiaa1128-probe | |
Hs.296031-forward | X | used later part of sequence
Hs.296031-reverse | |
Hs.296031-probe | |
kit-forward | 4q11-q12 | spans Exon 19-20
kit-reverse | |
kit-probe | |
impact-forward | 18q11.2-q12.1 | spans Exon 10-11
impact-reverse | |
impact-probe | |
cdh1-forward | 16q22.1 | spans Exon 15-16
cdh1-reverse | |
cdh1-probe | |
syngr2-forward | 17q25.3 | used later sequence
syngr2-reverse | |
syngr2-probe | |
lsm7-forward | 19p13.3 | used later sequence
lsm7-reverse | |
lsm7-probe | |
G3PDH-Forward | | from Takahashi paper
G3PDH-Reverse | |
G3PDH-probe | |
normal thyroid sample | |
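The residue coordinates in Table 4 can be checked against the listed oligo lengths with a few lines of code. The Python sketch below is illustrative only; it assumes the coordinates are inclusive, 1-based positions on the reference transcript, and it reads the Hs.24183 entries as forward 2436-2457, probe 2462-2484 and reverse 2530-2505 (a reading of the printed digits, which are not fully clear in the source). The function names are hypothetical.

```python
# Illustrative sketch; inclusive 1-based coordinates are an assumption.

def oligo_length(start, end):
    """Length of an oligo from inclusive residue coordinates.

    Reverse primers are listed high-to-low, hence the absolute value.
    """
    return abs(end - start) + 1


def amplicon_length(forward, reverse):
    """Span from the outer end of the forward primer to that of the reverse."""
    positions = list(forward) + list(reverse)
    return max(positions) - min(positions) + 1


def probe_between_primers(forward, probe, reverse):
    """True when the probe lies strictly between the two primers."""
    return max(forward) < min(probe) and max(probe) < min(reverse)


# Hs.24183 coordinates as read from Table 4 (assumed reading).
hs24183_forward = (2436, 2457)
hs24183_probe = (2462, 2484)
hs24183_reverse = (2530, 2505)
```

With these coordinates the forward primer, probe and reverse primer come out at 22, 23 and 26 residues, matching the Length column, and the amplicon spans 95 residues.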

TABLE 5
Schematic of microarray analysis of benign and malignant thyroid tumors.

Benign subtypes: FA, AN, LcT, HA. Malignant subtypes: HC, FC, PTC, FVPTC.

FA | AN | LcT | HA | HC | FC | PTC | FVPTC
FA 1 | AN 1 | LcT 1* | HA 1* | HC 1* | FC 1 | PTC 1* | FVPTC 1
FA 2 | AN 2 | LcT 2 | HA 2 | HC 2* | FC 2 | PTC 2 | FVPTC 2
FA 3 | AN 3 | LcT 3 | HA 3 | HC 3* | FC 3 | PTC 3 | FVPTC 3
FA 4 | AN 4 | LcT 4 | HA 4 | HC 4* | FC 4 | PTC 4 | FVPTC 4
FA 5 | AN 5 | LcT 5 | HA 5 | HC 5* | FC 5 | PTC 5 | FVPTC 5
FA 6 | AN 6 | LcT 6 | HA 6 | HC 1* | FC 6 | PTC 6 | FVPTC 6
FA 7 | AN 7 | LcT 7 | HA 7 | HC 2* | FC 7 | PTC 7 | FVPTC 7
FA 8* | AN 8** | LcT 8** | HA 8* | HC 3* | FC 8** | PTC 8* | FVPTC 8
FA 9* | AN 9* | LcT 9* | HA 9* | HC 4* | FC 9* | PTC 9* | FVPTC 9*
FA 10 | AN 10* | LcT 10* | HA 10 | HC 5* | FC 10* | PTC 10 | FVPTC 10*
FA 11* | AN 11 | LcT 11* | HA 11 | HC 1* | FC 11 | PTC 11 | FVPTC 11*
FA 8* | AN 8** | LcT 8** | HA 8* | HC 1* | FC 12 | PTC 8* | FVPTC 11*
FA 9* | AN 9* | LcT 9* | HA 9* | HC 1* | FC 9* | PTC 9* | FVPTC 9*
FA 12 | AN 10* | LcT 10* | HA 1* | HC 1* | FC 10* | PTC 1* | FVPTC 10*
FA 11* | AN 12 | LcT 11* | HA 12 | HC 1* | FC 13 | PTC 12 | FVPTC 12
FA 13 | AN 13 | LcT 1* | HA 13 | HC 1* | FC 8** | PTC 13 | FVPTC 13

Microarray analysis was performed using 50 benign tumors (13 follicular adenomas (FA), 13 adenomatoid nodules (AN), 11 lymphocytic thyroiditis (LcT) and 13 Hürthle cell adenomas (HA)) and 44 malignant tumors (5 Hürthle cell carcinomas (HC), 13 follicular carcinomas (FC), 13 papillary thyroid carcinomas (PTC) and 13 follicular variant of papillary thyroid carcinomas (FVPTC)). To minimize experimental variation, all 8 tumor subtypes in each row were arrayed simultaneously. Some tumor samples were used more than once and were considered as technical replicates during data analysis.

TABLE 6
Genes overexpressed in malignant thyroid tumors identified by microarray analysis.

Description | UG cluster | Gene symbol* | Parametric P value | Ratio† M/B
High mobility group AT-hook 2, transcript variant 1 | Hs.505924 | HMGA2 | 0.0001597 | 2.6
Kallikrein 7 (chymotryptic, stratum corneum), transcript variant 1 | Hs.151254 | KLK7 | 0.0002012 | 2.5
Mannose receptor, C type 2 | Hs.7835 | MRC2 | <1e-07 | 2.5
Leucine-rich repeat kinase 2 | Hs.187636 | LRRK2 | 3.46e-05 | 2.2
Pleiomorphic adenoma gene 1‡ | Hs.14968 | PLAG1 | 0.0002047 | 2.2
Cytochrome P450, family 1, subfamily B, polypeptide 1 | Hs.154654 | CYP1B1 | 0.0003485 | 2.0
Dipeptidyl-peptidase 4 (CD26, adenosine deaminase complexing protein 2) | Hs.368912 | DPP4 | 0.0006842 | 1.9
Fibronectin type III domain containing 4 | Hs.27836 | FNDC4 | 3.30e-05 | 1.9
Pleckstrin homology-like domain, family A, member 2 | Hs.154036 | PHLDA2 | 6.00e-07 | 1.9
Cyclin A1 | Hs.417050 | CCNA1 | 8.08e-05 | 1.8
Cadherin 3, type 1, P-cadherin (placental) | Hs.554598 | CDH3 | 1.10e-06 | 1.8
Carcinoembryonic antigen-related cell adhesion molecule 6 (nonspecific cross-reacting antigen) | Hs.466814 | CEACAM6 | 0.0001172 | 1.8
Quiescin Q6 | Hs.518374 | QSCN6 | <1e-07 | 1.7
Collagen, type VII, α1 (epidermolysis bullosa, dystrophic, dominant and recessive) | Hs.476218 | COL7A1 | 3.24e-05 | 1.7
Hypothetical protein MGC9712 | Hs.592174 | MGC9712 | 6.39e-05 | 1.7
Interleukin 1 receptor accessory protein, transcript variant 1 | Hs.478673 | IL1RAP | 9.68e-05 | 1.7
Laminin, β3, transcript variant 1 | Hs.497636 | LAMB3 | 0.0001874 | 1.7
Protease, serine, 3 (mesotrypsin)‡ | Hs.128013 | PRSS3 | 6.50e-06 | 1.7
Low density lipoprotein receptor-related protein 4 | Hs.4930 | LRP4 | 0.0001359 | 1.6
Sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 1 | Hs.124611 | SPOCK1 | 0.0001704 | 1.6
Phosphodiesterase 5A, cGMP-specific, transcript variant 3 | Hs.370661 | PDE5A | 2.07e-05 | 1.6
Hypothetical protein FLJ37078 | Hs.511025 | FLJ37078 | 0.000106 | 1.6
Fibrillin 3 | Hs.370362 | FBN3 | 0.0007772 | 1.6
DIRAS family, GTP-binding RAS-like 3 | Hs.194695 | DIRAS3 | 0.0001982 | 1.6
Protease, serine, 1 (trypsin 1) | Hs.511522 | PRSS1 | 0.0002246 | 1.6
Calcium/calmodulin-dependent protein kinase II inhibitor 1 | Hs.197922 | CAMK2N1 | 0.0005162 | 1.6
SNAP25-interacting protein | Hs.448872 | SNIP | 0.0001026 | 1.6
Potassium inwardly-rectifying channel, subfamily J, member 2 | Hs.1547 | KCNJ2 | 0.0001192 | 1.6
Stratifin | Hs.523718 | SFN | 3.23e-05 | 1.5
UDP-N-acetyl-α-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase 7 | Hs.127407 | GALNT7 | 0.0002068 | 1.5
Transforming growth factor, α | Hs.170009 | TGFA | 0.0003326 | 1.5
BAI1-associated protein 3 | Hs.458427 | BAIAP3 | 4.13e-05 | 1.5
Potassium channel, subfamily K, member 15 | Hs.528664 | KCNK15 | 0.0001188 | 1.5

*HUGO abbreviations used in Locus Link.
†The ratio between Geo mean expression values of malignant to benign thyroid tumors (P<0.001).
‡Genes validated by real-time RT-PCR.
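The M/B ratio defined in the footnote, a ratio of geometric means, can be sketched as follows. This Python example is illustrative only: the function names are hypothetical, the expression values in the usage note are invented for demonstration, and the source does not state whether the array software applied any additional normalization.

```python
import math

# Illustrative sketch of a malignant-to-benign ratio of geometric means.
# Function names and any example values are hypothetical.

def geometric_mean(values):
    """Geometric mean of strictly positive expression values."""
    if not values:
        raise ValueError("at least one value is required")
    if any(v <= 0 for v in values):
        raise ValueError("expression values must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def malignant_to_benign_ratio(malignant_values, benign_values):
    """Ratio of the geometric mean in malignant to that in benign tumors."""
    return geometric_mean(malignant_values) / geometric_mean(benign_values)
```

For instance, hypothetical malignant values of 2.0 and 8.0 against benign values of 1.0 and 4.0 give geometric means of 4.0 and 2.0, so the ratio is 2.0; a ratio above 1 corresponds to overexpression in malignant tumors and a ratio below 1 to underexpression.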

TABLE 7
Genes underexpressed in malignant thyroid tumors identified by microarray analysis.

Description | UG cluster | Gene symbol* | Parametric P value | Ratio† M/B
Recombination activating gene 2‡ | Hs.159376 | RAG2 | 1.32e-05 | 0.41
Citrate lyase β-like, transcript variant 1 | Hs.130690 | CLYBL | 1.43e-05 | 0.44
Nebulin | Hs.588655 | NEB | 0.0002811 | 0.53
Tumor necrosis factor receptor superfamily, member 11b (osteoprotegerin) | Hs.81791 | TNFRSF11B | 4.50e-06 | 0.54
Guanine nucleotide binding protein (G protein), α inhibiting activity polypeptide 1 | Hs.134587 | GNAI1 | 4.33e-05 | 0.55
Angiotensin II receptor, type 1, transcript variant 5 | Hs.477887 | AGTR1 | 4.28e-05 | 0.56
Hepatic leukemia factor | Hs.196952 | HLF | 1.40e-06 | 0.57
Solute carrier family 26, member 4 | Hs.571246 | SLC26A4 | 1.00e-07 | 0.58
Metallothionein 1A (functional) | Hs.643532 | MT1A | 0.0004668 | 0.59
Fatty acid binding protein 4, adipocyte | Hs.391561 | FABP4 | 4.38e-05 | 0.60
Low density lipoprotein-related protein 1B (deleted in tumors) | Hs.470117 | LRP1B | 0.0003571 | 0.60
Solute carrier family 4, sodium bicarbonate cotransporter, member 4 | Hs.5462 | SLC4A4 | 0.0002522 | 0.61
PREDICTED: similar to programmed cell death 6 interacting protein, transcript variant 2 | Hs.597835 | LOC646278 | 0.0001965 | 0.61
Mannosidase, α, class 1C, member 1 | Hs.197043 | MAN1C1 | 9.46e-05 | 0.61
Kv channel interacting protein 3, calsenilin, transcript variant 2 | Hs.437376 | KCNIP3 | 1.12e-05 | 0.62
DnaJ (Hsp40) homologue, subfamily B, member 9 | Hs.6790 | DNAJB9 | 5.10e-06 | 0.62
Ubiquitin protein ligase E3 component n-recognin 1 | Hs.591121 | UBR1 | 0.000166 | 0.62
Hydroxysteroid (17-β) dehydrogenase 6 | Hs.524513 | HSD17B6 | 0.0002557 | 0.62
Solute carrier family 33 (acetyl-CoA transporter), member 1 | Hs.478031 | SLC33A1 | 2.49e-05 | 0.63
Cadherin 16, KSP-cadherin | Hs.513660 | CDH16 | 0.0007068 | 0.63
TBC1 (tre-2/USP6, BUB2, cdc16) domain family, member 1 | Hs.176503 | TBC1D1 | 8.00e-07 | 0.63
Solute carrier family 26, member 7, transcript variant 1 | Hs.354013 | SLC26A7 | 2.18e-05 | 0.63
Chromosome 11 open reading frame 74 | Hs.406726 | C11orf74 | 1.40e-06 | 0.63
Phospholipase A2 receptor 1, 180 kDa | Hs.410477 | PLA2R1 | 0.0001771 | 0.64
Pituitary tumor-transforming 3 on chromosome 8 | | PTTG3 | 5.00e-07 | 0.64
EGF-containing fibulin-like extracellular matrix protein 1, transcript variant 3 | Hs.76224 | EFEMP1 | 1.17e-05 | 0.64
Zinc finger, matrin type 4 | Hs.591850 | ZMAT4 | 7.03e-05 | 0.64
STEAP family member 3 | Hs.642719 | STEAP3 | 0.0002097 | 0.64
Deiodinase, iodothyronine, type I, transcript variant 4 | Hs.251415 | DIO1 | 0.0007362 | 0.64
v-Kit Hardy-Zuckerman 4 feline sarcoma viral oncogene homologue | Hs.479754 | KIT | 8.16e-05 | 0.65
Thyroid peroxidase, transcript variant 5 | Hs.467554 | TPO | 9.70e-06 | 0.65
Pituitary tumor-transforming 1 | Hs.350966 | PTTG1 | 6.00e-07 | 0.65
Leucine-rich repeat LGI family, member 3 | Hs.33470 | LGI3 | 4.00e-05 | 0.65
Transmembrane protein 38B | Hs.411925 | TMEM38B | 0.0001833 | 0.65
SLIT and NTRK-like family, member 4 | Hs.272284 | SLITRK4 | 7.75e-05 | 0.65
Von Hippel-Lindau binding protein 1 | Hs.436803 | VBP1 | 7.04e-05 | 0.65
Collagen, type IX, α3 | Hs.126248 | COL9A3 | 0.0009987 | 0.65
Insulin receptor substrate 1 | Hs.471508 | IRS1 | 6.00e-06 | 0.66
START domain containing 13, transcript variant γ | Hs.507704 | STARD13 | 0.0001052 | 0.66
PREDICTED: similar to glycine cleavage system H protein, mitochondrial precursor, variant 1 | | LOC654085 | 9.60e-06 | 0.66
Ribosomal protein S3A | Hs.356572 | RPS3A | 0.0004627 | 0.66
SPARC-like 1 (mast9, hevin) | Hs.62886 | SPARCL1 | 7.61e-05 | 0.66

*HUGO abbreviations used in Locus Link.
†The ratio between Geo mean expression values of malignant to benign thyroid tumors (P<0.001).
‡Genes validated by real-time RT-PCR.

TABLE 8
Summary of class performance indicating sensitivity, specificity, and positive predictive values obtained from seven classification methods.

Methods | Mean percent of correct classification | Benign: Sensitivity | Benign: Specificity | Benign: PPV† | Malignant: Sensitivity | Malignant: Specificity | Malignant: PPV‡
Compound Covariate Predictor | 74 | 0.8 | 0.682 | 0.741 | 0.682 | 0.8 | 0.75
Diagonal Linear Discriminant Analysis | 76 | 0.8 | 0.705 | 0.755 | 0.705 | 0.8 | 0.756
1-Nearest Neighbor* | 78 | 0.82 | 0.727 | 0.774 | 0.727 | 0.82 | 0.78
3-Nearest Neighbor | 71 | 0.72 | 0.705 | 0.735 | 0.705 | 0.72 | 0.689
Nearest Centroid | 73 | 0.78 | 0.682 | 0.736 | 0.682 | 0.78 | 0.732
Support Vector Machines | 77 | 0.76 | 0.773 | 0.792 | 0.773 | 0.76 | 0.739
Bayesian Compound Covariate Predictor | 74 | 0.8 | 0.682 | 0.741 | 0.682 | 0.8 | 0.75

Abbreviation: PPV, positive predictive value.
*Highest percent of correct classification was determined by using the 1-Nearest Neighbor method.
†The probability that a sample predicted as Benign actually belongs to the Benign subtype.
‡The probability that a sample predicted as Malignant actually belongs to the Malignant subtype.
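The quantities in Table 8 follow from a two-class confusion matrix, and a short sketch makes the relationships explicit. The Python below is illustrative only. The counts used in the usage note (40 of 50 benign and 30 of 44 malignant samples classified correctly) are not stated in the source; they are back-calculated here from the Compound Covariate Predictor row and the 50 benign and 44 malignant tumors described for the microarray set, so they should be read as an assumption. The function name is hypothetical.

```python
# Illustrative sketch; the function name and the example counts used
# with it are assumptions, not values reported in the source.

def class_performance(benign_correct, benign_total,
                      malignant_correct, malignant_total):
    """Sensitivity, specificity and PPV for each class of a 2-class model."""
    benign_missed = benign_total - benign_correct            # called malignant
    malignant_missed = malignant_total - malignant_correct   # called benign
    benign_rate = benign_correct / benign_total
    malignant_rate = malignant_correct / malignant_total
    total = benign_total + malignant_total
    return {
        "percent_correct": 100 * (benign_correct + malignant_correct) / total,
        "benign": {
            "sensitivity": benign_rate,
            "specificity": malignant_rate,
            "ppv": benign_correct / (benign_correct + malignant_missed),
        },
        "malignant": {
            "sensitivity": malignant_rate,
            "specificity": benign_rate,
            "ppv": malignant_correct / (malignant_correct + benign_missed),
        },
    }


# Assumed counts back-calculated from the first row of Table 8.
example = class_performance(40, 50, 30, 44)
```

Under these assumed counts the benign sensitivity is 0.8, the benign specificity rounds to 0.682, the benign PPV rounds to 0.741, the malignant PPV is 0.75, and the overall percent correct rounds to 74. The sensitivity of one class equals the specificity of the other, which is why those columns mirror each other in the table.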

TABLE 9
Immunohistochemical evaluation of HMGA2 in thyroid tumors.

Tissue | Total | HMGA2 positive (High†, Moderate‡, Low§) and HMGA2 negative* counts
Tissue array samples | 87 |
Normal thyroid | 19 | 2, 17
Follicular adenoma | 14 | 1, 13
Lymphocytic thyroiditis nodule | 11 | 6, 5
Follicular carcinoma | 14 | 3, 2, 4
Follicular variant of papillary thyroid carcinoma | 9 | 4, 2, 1, 2
Papillary thyroid carcinoma | 20 | 7, 7, 4
Nonarrayed samples | 38 |
Follicular adenoma | 11 |
Adenomatoid nodule | 10 | 1, 1
Follicular variant of papillary thyroid carcinoma | 7 | 4, 1, 1, 1
Papillary thyroid carcinoma | 10 | 5, 2, 1, 2

*No significant expression.
†Expressed in >66% of cell population.
‡Expressed in 33% to 66% of cell population.
§Expressed in <33% of cell population.
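The scoring categories in the Table 9 footnotes can be expressed as a small function. The Python sketch below is illustrative only: it uses the percentage of positively stained tumor cells alone, whereas the description also takes staining intensity into account, and it assumes that exactly 33% and 66% fall in the moderate category. The function names are hypothetical.

```python
# Illustrative sketch; boundary handling and function names are assumptions.

def hmga2_category(percent_positive_cells):
    """Classify HMGA2 staining by the percentage of positive tumor cells."""
    if not 0 <= percent_positive_cells <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent_positive_cells == 0:
        return "negative"
    if percent_positive_cells < 33:
        return "low"
    if percent_positive_cells <= 66:
        return "moderate"
    return "high"


def percent_of(count, total):
    """Whole-number percentage, as used for the per-subtype summaries."""
    return round(100 * count / total)
```

Applied to the counts quoted in the text, `percent_of(26, 30)` gives 87 and `percent_of(13, 16)` gives 81, matching the reported positivity rates for papillary thyroid carcinomas and the follicular variant.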

SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 60

<210> SEQ ID NO 1
<211> LENGTH: 22
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic construct
<400> SEQUENCE: 1
ggctgactgg caaaaagtct tg    22

<210> SEQ ID NO 2
<211> LENGTH: 26
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic construct
<400> SEQUENCE: 2
ttggttccct taagttctca gagttt    26

<210> SEQ ID NO 3
<211> LENGTH: 23
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic construct
<400> SEQUENCE: 3
tggccctgtc actcccatga tgc    23

<210> SEQ ID NO 4
<211> LENGTH: 18
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic construct
<400> SEQUENCE: 4
aagggctcgc atgcaaag    18

<210> SEQ ID NO 5
<211> LENGTH: 25
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic construct
<400> SEQUENCE: 5
cacagtagca ctctgagttg aagca    25

<210> SEQ ID NO 6
<211> LENGTH: 25
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic construct
<400> SEQUENCE: 6
tttgtccctg cttgtactag tgagg    25

<210> SEQ ID NO 7
<211> LENGTH: 22
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic construct
<400> SEQUENCE: 7
gcaatcctct tacctccgct tt    22

<210> SEQ ID NO 8
<211> LENGTH: 24
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic construct
<400> SEQUENCE: 8
ggaatcggag acagaagaga gctt    24

<210> SEQ ID NO 9
<211> LENGTH: 28
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic construct
<400> SEQUENCE: 9
ctgggaccac agatgtatcc tccactcc    28

<210> SEQ ID NO 10
<211> LENGTH: 22
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic construct
<400> SEQUENCE: 10
atggcagtgc agtcatcatc tt    22

<210> SEQ ID NO 11
<211> LENGTH: 25
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic construct
<400> SEQUENCE: 11
gcattcatac agctgcttac catct    25

<210> SEQ ID NO 12
<211> LENGTH: 23
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic construct
<400> SEQUENCE: 12
tttggtccct gcctaggacc ggg    23

<210> SEQ ID NO 13
<211> LENGTH: 16
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic construct
<400> SEQUENCE: 13
ccggcccaag ctccat    16

SEQ ID NO 14 LENGTH: 21 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: Synthetic construct SEQUENCE: 14 ttgttgtaacc gtcggit catg a 21

SEO ID NO 15 LENGTH: 29 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: Synthetic construct SEQUENCE: 15 tgtttggtgg aatcCatgaa ggittatggc 29

SEQ ID NO 16 LENGTH: 2O TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: Synthetic construct SEQUENCE: 16 gaga.gc.gtga t cc ccctaca

SEO ID NO 17 LENGTH: 23 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: Synthetic construct SEQUENCE: 17 accalagagtg Caccticagtg tct 23

SEQ ID NO 18 LENGTH: 33 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: Synthetic construct SEQUENCE: 18 t cact tccaa atgttcctgt agcataaatg gtg 33

SEQ ID NO 19 LENGTH: 24 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: Synthetic construct

SEQUENCE: 19 tgccaaggag Ctttgttt at agaa 24

SEQ ID NO 2 O LENGTH: 2O TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: Synthetic construct US 9,234,244 B2 95 96 - Continued

<4 OOs, SEQUENCE: 2O atgacggcat gtaccalacca

<210s, SEQ ID NO 21 &211s LENGTH: 29 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: Synthetic construct <4 OOs, SEQUENCE: 21 ttgg tocc ct cagttctato citgttgtgt 29

<210s, SEQ ID NO 22 &211s LENGTH: 26 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: Synthetic construct <4 OOs, SEQUENCE: 22 gcacctgctgaaatgitatga cataat 26

<210s, SEQ ID NO 23 &211s LENGTH: 28 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: Synthetic construct <4 OOs, SEQUENCE: 23 tittgctaagt tagtaaat atgattgg 28

<210> SEQ ID NO 24 <211> LENGTH: 36 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct <400> SEQUENCE: 24 attgttcagc taattgagaa gcagatttca gagagc 36

<210> SEQ ID NO 25 <211> LENGTH: 26 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct <400> SEQUENCE: 25 tgaagaatgt catggtggta gtatca 26

<210> SEQ ID NO 26 <211> LENGTH: 26 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct <400> SEQUENCE: 26 atgactcctc aggtgaattt gtgtag 26

<210> SEQ ID NO 27 <211> LENGTH: 29 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct <400> SEQUENCE: 27 ctggtatgga gggattctgc taggaccag 29

<210> SEQ ID NO 28 <211> LENGTH: 21 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct <400> SEQUENCE: 28 tgagtgtccc ccggtatctt c 21

<210> SEQ ID NO 29 <211> LENGTH: 21 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct <400> SEQUENCE: 29 cagccgcttt cagattttca t 21

<210> SEQ ID NO 30 <211> LENGTH: 27 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct <400> SEQUENCE: 30 cctgccaatc ccgatgaaat tggaaat 27

<210> SEQ ID NO 31 <211> LENGTH: 19 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct <400> SEQUENCE: 31 gctggtgctc atggcactt 19

<210> SEQ ID NO 32 <211> LENGTH: 19 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct <400> SEQUENCE: 32 ccctccccag gcttcctaa 19

<210> SEQ ID NO 33 <211> LENGTH: 24 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct <400> SEQUENCE: 33 aagggctttg cctgacaaca ccca 24

<210> SEQ ID NO 34 <211> LENGTH: 21 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct <400> SEQUENCE: 34 gacgatccgg gtaaagttcc a 21

<210> SEQ ID NO 35 <211> LENGTH: 20 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct <400> SEQUENCE: 35 aggttgagga gtgggtcgaa 20

<210> SEQ ID NO 36 <211> LENGTH: 22 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct <400> SEQUENCE: 36 aggccgcgaa gccagtggaa tc 22

<210> SEQ ID NO 37 <211> LENGTH: 22 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct <400> SEQUENCE: 37 tcaccagggc tgcttttaac tc 22

<210> SEQ ID NO 38 <211> LENGTH: 26 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct <400> SEQUENCE: 38 ggaatcatat tggaacatgt aaacca 26

<210> SEQ ID NO 39 <211> LENGTH: 27 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct <400> SEQUENCE: 39 ttgccatcaa tgaccccttc attgacc 27

<210> SEQ ID NO 40 <211> LENGTH: 3084 <212> TYPE: DNA

tctgtcatgc titctagoaaa ttgtaagact aattatttgt titccacctica taacctgttg 2280
caataaat at tacttct cat acagtttaat attgttgttt gttggagaaa atgaaccata 2340
aaaattgatt togctgttcag ttitt caatta ttcaagtata cccaattaaa gatgcagtta 2400
tgtttataaa ataagaagaa at agacittgt aaaatgctta titgagggitt attgaaggtt 2460
t ccctgaaga citgactggaa atggtggctg tttittittcta tittctgactic togc catgaat 2520
tttitttittitt tttittittaaa gacaatat ct cactctgttg cctaggctgg agtgcagtgg 2580
tgcaac caca gct cactgca cct tcaaatig citggagctica ggcaatcct c ttacctic cqc 2640
titt.ccaagca gctgggacca cagatgitatic ctic cact cott cqctggccac catcc tigctg 2700
Ccca acagaa gaagct Cttic ttct Cogat titcctgaacg gtctaaggac Caggaagaaa 2760
Caggct Cotg C cagdaccga cagcaacgaa aatgttcc.ca cigagat cag gatgacttgc 2820
tgaagcticag taggctaa aaagaggaca caaagtgaa Cagaatgat C titCctacgca 2880
caacacaaac at cagttaat gttccatcca togctgcttaa agagcattcc tdtcc tagta 2940
aaatgggcaa gtc.cct ctac ccc.ccaccct cacctggitat gcttacatta atagotaaag 3000
t caatcctgt aatgaaataa agcaa.gtggit agctgtctgg tag cct c cac tactgcaaat 3060
citcaagaaaa aaaaaaaaaa aaaa 3084

<210> SEQ ID NO 41 <211> LENGTH: 158 <212> TYPE: PRT <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct: C21orf4

<400> SEQUENCE: 41

Met Ala Gly Phe Leu Asp Asn Phe Arg Trp Pro Glu Cys Glu Cys Ile
1 5 10 15

Asp Trp Ser Glu Arg Arg Asn Ala Val Ala Ser Val Val Ala Gly Ile
20 25 30

Leu Phe Phe Thr Gly Trp Trp Ile Met Ile Asp Ala Ala Val Val Tyr
35 40 45

Pro Lys Pro Glu Gln Leu Asn His Ala Phe His Thr Cys Gly Val Phe
50 55 60

Ser Thr Leu Ala Phe Phe Met Ile Asn Ala Val Ser Asn Ala Gln Val
65 70 75 80

Arg Gly Asp Ser Tyr Glu Ser Gly Cys Leu Gly Arg Thr Gly Ala Arg
85 90 95

Val Trp Leu Phe Ile Gly Phe Met Leu Met Phe Gly Ser Leu Ile Ala
100 105 110

Ser Met Trp Ile Leu Phe Gly Ala Tyr Val Thr Gln Asn Thr Asp Val
115 120 125

Tyr Pro Gly Leu Ala Val Phe Phe Gln Asn Ala Leu Ile Phe Phe Ser
130 135 140

Thr Leu Ile Tyr Lys Phe Gly Arg Thr Glu Glu Leu Trp Thr
145 150 155

<210> SEQ ID NO 42 <211> LENGTH: 2822 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct: Hs.145049

tcttitt cott ctittatatgt tottgcagtt act cittgtat togcaagattt totgactitta 2400
agctttgaga C tactgcatc ttaaaagaag alactaggctg actggcaaaa agt cttgc.ca 2460
gtggcc ctgt cacticc catg atgctttggit tttgagagtt gggaaaactic tagaactta 2520
agggalaccala act caggaat cccaaaattg gtggcattgt gcc attcgtt taggggctga 2580
acat aggacc ttctgaaac tagtgagct agatgcattt gggtttgaat ttttgtcaca 2640
tactgaaatg taagttcagcc ctaaataatc aaaac actitt attittattitt totttittitta 2700
aataggaact ttctgaagaa aaagtggtgt gtaaaacatt tdatatttaa gacaataaag 2760
tttittatcat aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaagaaaaa aaaaaaaaaa 2820
aa 2822

<210> SEQ ID NO 43 <211> LENGTH: 152 <212> TYPE: PRT <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct: Hs.145049

<400> SEQUENCE: 43

Met Val His Ser Pro Arg Ser Leu Val Ala Asn Pro Ser Gln Val Leu
1 5 10 15

Phe Phe Leu Ser Phe Leu Phe Phe Phe Phe Leu Arg Gln Ser Phe Ala
20 25 30

Leu Val Ala Gln Ala Gly Val Gln Trp Arg Asn Leu Gly Ser Leu Gln
35 40 45

Pro Pro Pro Pro Gly Phe Lys Gln Phe Ser Cys Leu Ser Leu Leu Ser
50 55 60

Ser Trp Asp Tyr Arg His Ala Pro Pro Cys Pro Ala Tyr Phe Val Phe
65 70 75 80

Leu Val Asp Met Gly Phe Pro His Val Gly Gln Thr Gly Leu Glu Leu
85 90 95

Leu Thr Ser Gly Asp Pro Pro Ala Ser Ala Ser Gln Ser Ala Gly Ile
100 105 110

Thr Gly Gly Ser His Arg Ala Gln Pro Thr Ser Ser Asn Pro Tyr Gly
115 120 125

Ile Val Phe Phe Phe Leu Pro Val Lys Thr Phe Ser Gly Met Ser Gln
130 135 140

Glu Ala Gly Asp Cys Arg Glu Thr
145 150

<210> SEQ ID NO 44 <211> LENGTH: 4597 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct: Hs.296031

<4 OOs, SEQUENCE: 44 aatggtacga ttgagagatg agtgctatgg agaaaaatcc agc.ca.gaaag ggagatgcag 6 O aagatagdac cagat cittct caatgttgtt ttcat catgg acaggtgcgt to cittagaag 12 O atgaagtgtt agtgattic ct gagtttitt ct cacct tcacg tcgattgat c tdaatttgga 18O gagt ctgttt t ctgttgtctg gctctgcact Caactttgta ggggaccctg. tcgaggit coc 24 O

Cacactgtgg Ctt Cagg tag acagagcaga tigggagcc.ca ttt cagttca ttgtcttgct 3OO US 9,234,244 B2 109 110 - Continued gaccalatggg gaactgtggit Caggtgagag gaggcagctt ttacaat cag actt cattga 360 at agtgtggg Ctgctgttt C Cttgtaacaa aaccc catala tatggcagt titc.cggatgt 42O gtc.tttittag gactitcagaa citt attattt gaatagaagt ttaaagcatc toggatgatga 48O tgctgtagct aaaacagotg cittgtcagaa gag acccitat ttaac acttic taaacttgtt 54 O t cagaggtgg aggaaaggat aatctgggaa ggcct Cocto tcaagtic cac aggttggitat 6OO cagotgttgtt catcc.cccaa aaggaaaata aaatgacaac aatattittgg to acagaatt 660 cctgagaaac citctgtttct atctt catgt ctittaagata gggacatgaa titc cc catga 72 O tctgggtgat agggittagag tecCaggac actgttactt ttgttgttgac acaggtggct 78O cct catgaca gttcct coat gcc ttagaac atgttgtctg. tctggtcatc cctdggggta 84 O gagdtgagtg acc cagcagt gggagattta acaactggag aagaagatgg gatgtgttta 9 OO attatc.ccca gagg tagggc caatttgtca ccctittaaat agacittattt gcatataaac 96.O taaag.cacct tagggcatca ttaccgaaag tdtctaag.ca aatgtctgat at agttacgt. O2O gcctgcatta aaagaaag.ca gcc cc ctitat cittgc cittaa tat cottaca gtgttittaat O8O aagttcataa togcatcctgt atgtgcattt tttggtataa alacac coaaa gotggagaat 14 O tgactt cagt tdtctic catc ctitt CCCCtt aagtgttggit ggcgctgcag gggcaacgtg 2OO cctic cc attg gaagtggtga citt cotctitt gatagaggitt togcctgtctic titgaaaatga 26 O aaagaa.gcgg agattgat ct ctggagtic cc atggit coagt ttggactatt gggaatattt 32O tittatgggat gttaaaaa.ca at attagaga C9tgagatag taaatttgttg gtaat accgg 38O atcCaggaag Cttacagtga agagtatgala Ctt aacctga aaagt attt C. 
tctgttctat 44 O aaatct ct ca gtgacatttg gattaatcaa goataattaa atgtagttag attitttgtca SOO gattgtagtt caaaataata titcatctato gagagggitaa tat attatgt agaaattitta 560 ttaa.gc actt tagttaa.gca aac actaagg agaacaaaat caacct cagg aaggittaatt 62O actaaaaaaa toacaaagta tagtagatta totaaatcat tittaattittgaataccatgg 68O cittgagctitt aatttacata gagacg tatt ttggatttgt ttitt cacatt at attittcta 74 O gtacaggatt gcaattgcat t cittgaaaag titc tact cat tittaggattic cattaagttt 8OO gcttaactitt titt catgtta taattitccaa aagcaaagaa ttacaattgt attctagota 86 O attattittaa tottt cacta actttgtgtg tattgtaaga ccatatttitt atttctatac 92 O aaatgatgat tttalagagaa gitat Caggag agagaatgta tatgaaag.ca t cqcgt.ccac 98 O gcctggctitt gcaataagtg tt catttaaa agaaaga cat ttacaaaggit aaaacataag 2O4. O agtttagact at agcgataa atc.tttitt at tittagtaatt totttaaagg gaaaagtaaa 21OO gagatcaaaa tdattittata totatttittt ttgtact cag agaattacat titt cact acc 216 O ccc.gc.ctgtc. tcagggaata gcc tittgata agaatcc cat ggagat citct ggaactictat 222 O tacagtgtgt toagatttgt tagttcatat gtaaattitca gagctagagc titcaaaacta 228O gag tattgta atctoaggaa cataagatta t coaagaagc ctdaaccttg citcttitt cat 234 O gataaatgac atccaaattt cctttgtcta ggagataagc atagat.ccct tittat catgc 24 OO ttct ctdaga ttitt cacaga acaac cct gc aatttgattt tdtttgataa ttittgcttitt 246 O tggcttitt ca gtgaggactic tatttitcc at tdgaactgac toctittgggg ataataagct 252O ttcacttaaa agaacatt co attagatagt totaact tca atgaacctaa aagtggcttic 2580 ttaatttgaa taatctggat aacttittgca aatgggit caa alacagcacaa gitat caacaa 264 O t cacg tatgt actgagtaat atttgcc.ctic cagttagcaa agt caagaaa tdtctaactic 27 OO

aagtggttgt tagttataga tgtct aggta Ctt Caggggg actitcattga gagttttgtc 4920
aatgtc.ttitt gaatatt coc aagcc catga gtc.cttgaaa at atttittta tatatacagt 4980
aactittatgt gtaaatacat aag.cggcgta agtttaaagg atgttggtgt t cc acgtgtt 5040
ttatt cotgt atgttgtc.ca attgttgaca gttctgaaga att C 5084

<210> SEQ ID NO 46 <211> LENGTH: 976 <212> TYPE: PRT <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct: KIT

<400> SEQUENCE: 46

Met Arg Gly Ala Arg Gly Ala Trp Asp Phe Leu Val Leu Leu Leu
1 5 10 15

Leu Leu Arg Val Gln Thr Gly Ser Ser Gln Pro Ser Val Ser Pro Gly
25 30

Glu Pro Ser Pro Pro Ser Ile His Pro Gly Ser Asp Leu Ile Val
35 40 45

Arg Val Gly Asp Glu Ile Arg Leu Leu Thr Asp Pro Gly Phe Val
50 55 60

Lys Trp Thr Phe Glu Ile Leu Asp Glu Thr Asn Glu Asn Gln Asn
65 70

Glu Trp Ile Thr Glu Lys Ala Glu Ala Thr Asn Thr Gly Tyr Thr
85 90 95

Thr Asn Lys His Gly Leu Ser Asn Ser Ile Val Phe Val Arg
105 110

Asp Pro Ala Lys Leu Phe Leu Val Asp Arg Ser Leu Tyr Gly Glu
115 120 125

Asp Asn Asp Thr Leu Val Arg Cys Pro Leu Thr Asp Pro Glu Val Thr
130 135 140

Asn Tyr Ser Leu Lys Gly Cys Gln Gly Pro Leu Pro Asp Leu
145 150 155 160

Arg Phe Ile Pro Asp Pro Lys Ala Gly Ile Met Ile Ser Val Lys
165 170 175

Arg Ala Tyr His Arg Leu Cys Leu His Ser Val Asp Gln Glu Gly
180 185 190

Ser Val Leu Ser Glu Lys Phe Ile Leu Val Arg Pro Ala Phe
195 205

Ala Val Pro Val Val Ser Val Ser Ala Ser Leu Leu Arg
210 215 220

Glu Gly Glu Glu Phe Thr Val Thr Thr Ile Asp Val Ser Ser
225 230 235 240

Ser Val Tyr Ser Thr Trp Lys Arg Glu Asn Ser Gln Thr Leu Gln
245 250 255

Glu Tyr Asn Ser Trp His His Gly Asp Phe Asn Tyr Glu Arg Gln
260 265 270

Ala Thr Leu Thr Ile Ser Ser Ala Arg Val Asn Asp Ser Gly Val Phe
280 285

Met Cys Tyr Ala Asn Asn Thr Phe Gly Ser Ala Asn Val Thr Thr Thr
290 295 300

Leu Glu Val Val Asp Lys Gly Phe Ile Asn Ile Phe Pro Met Ile Asn
305 310 315

Thr Thr Val Phe Val Asn Asp Gly Glu Asn Val Asp Leu Ile Val Glu
325 330 335

Glu Ala Phe Pro Pro Glu His Gln Gln Trp Ile Tyr Met Asn
340 345 350

Arg Thr Phe Thr Asp Trp Glu Asp Pro Ser Glu Asn Glu
355 360 365

Ser Asn Ile Arg Tyr Val Ser Glu Leu His Leu Thr Arg Leu Gly
370 375 380

Thr Glu Gly Gly Thr Tyr Thr Phe Leu Val Ser Asn Ser Asp Val Asn
385 390 395 400

Ala Ala Ile Ala Phe Asn Val Val Asn Thr Pro Glu Ile Leu
405 415

Thr Asp Arg Leu Val Asn Gly Met Leu Gln Val Ala Ala Gly
425 430

Phe Pro Glu Pro Thr Ile Asp Trp Phe Pro Gly Thr Glu Gln
435 440 445

Arg Cys Ser Ala Ser Val Leu Pro Val Asp Val Gln Thr Leu Asn Ser
450 455 460

Ser Gly Pro Pro Phe Gly Lys Leu Val Val Gln Ser Ser Ile Asp Ser
465 470

Ser Ala Phe His Asn Gly Thr Val Glu Cys Ala Asn Asp
485 490 495

Val Gly Thr Ser Ala Phe Asn Phe Ala Phe Gly Asn Asn
500 505 510

Glu Gln Ile His Pro His Thr Leu Phe Thr Pro Leu Leu Ile Gly
515 525

Phe Val Ile Val Ala Gly Met Met Ile Ile Val Met Ile Leu Thr
530 535 540

Tyr Leu Gln Lys Pro Met Glu Val Gln Trp Val Val
545 550 555 560

Glu Glu Ile Asn Gly Asn Asn Tyr Val Tyr Ile Asp Pro Thr Gln Leu
565 570 575

Pro Asp His Lys Trp Glu Phe Pro Arg Asn Arg Leu Ser Phe Gly
585 590

Thr Leu Gly Ala Gly Ala Phe Gly Val Val Glu Ala Thr Ala
595 605

Gly Leu Ile Ser Asp Ala Ala Met Thr Val Ala Val Met
610 615

Leu Pro Ser Ala His Leu Thr Glu Arg Glu Ala Leu Met Ser Glu
625 630 635 640

Leu Val Leu Ser Leu Gly Asn His Met Asn Ile Val Asn Leu
645 650 655

Leu Ala Cys Thr Ile Gly Gly Pro Thr Leu Val Ile Thr Glu Tyr
660 665 670

Tyr Gly Asp Leu Leu Asn Phe Leu Arg Arg Lys Arg Asp Ser
675 685

Phe Ile Ser Gln Glu Asp His Ala Glu Ala Ala Leu Tyr
690 695 700

Asn Leu Leu His Ser Lys Glu Ser Ser Ser Asp Ser Thr Asn Glu
705 710 720

Met Asp Met Lys Pro Gly Val Ser Tyr Val Val Pro Thr Lys Ala
725 730 735

Asp Arg Arg Ser Val Arg Ile Gly Ser Tyr Ile Glu Arg Asp Val
740 745 750

Thr Pro Ala Ile Met Glu Asp Asp Glu Leu Ala Leu Asp Leu Glu Asp
755 760 765

Leu Leu Ser Phe Ser Tyr Gln Val Ala Lys Gly Met Ala Phe Leu Ala
770 775 780

Ser Lys Asn Cys Ile His Arg Asp Leu Ala Ala Arg Asn Ile Leu Leu
785 790 795 800

Thr His Gly Arg Ile Thr Lys Ile Cys Asp Phe Gly Leu Ala Arg Asp
805 810 815

Ile Lys Asn Asp Ser Asn Tyr Val Val Lys Gly Asn Ala Arg Leu Pro
825 830

Val Lys Trp Met Ala Pro Glu Ser Ile Phe Asn Val Thr Phe
835 840 845

Glu Ser Asp Val Trp Ser Tyr Gly Ile Phe Leu Trp Glu Leu Phe Ser
850 855 860

Leu Gly Ser Ser Pro Tyr Pro Gly Met Pro Val Asp Ser Phe Tyr
865 870 875 880

Lys Met Ile Lys Glu Gly Phe Arg Met Leu Ser Pro Glu His Ala Pro
885 890 895

Ala Glu Met Tyr Asp Ile Met Lys Thr Cys Trp Asp Ala Asp Pro Leu
900 905 910

Lys Arg Pro Thr Phe Lys Gln Ile Val Gln Leu Ile Glu Gln Ile
915 920 925

Ser Glu Ser Thr Asn His Ile Tyr Ser Asn Leu Ala Asn Ser Pro
930 935 940

Asn Arg Gln Lys Pro Val Val Asp His Ser Val Arg Ile Asn Ser Val
945 950 955 960

Gly Ser Thr Ala Ser Ser Ser Gln Pro Leu Leu Val His Asp Asp Val
965 970 975

<210> SEQ ID NO 47 <211> LENGTH: 489 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct: LSM7

<400> SEQUENCE: 47
cgcgacaaga tggcggataa ggagaagaag aaaaaggaga gCatcttgga cittgtccaag 60
tacatcgaca agacgat.ccg ggtaaagttc Cagggaggcc tggaatcCtg 120
aagggctt.cg accoactic ct caaccttgttg Ctggacggca c cattgagta Catgcgagac 180
cctgacgacc agtacaagct cacggaggac acccggcagc tgggcct cqt ggtgttgc.cgg 240
ggcacgt.ccg tggtgctaat Ctgc.ccgcag gacgg catgg aggc.catcc c Calacc cott C 300
atcCaggagc aggacgc.cta ggg.cgcgggg ggtgcagggc aggc.ccgagc 360
agct cqgttt cc.cgcggact t cccaccgca gtaccgc.ctic Ctggaacgga agcatttctic
cittitttgtat aggttgaatt tttgttttct taataaaatt gcaaacctica 480
aaaaaaaaa 489

<210> SEQ ID NO 48 <211> LENGTH: 103 <212> TYPE: PRT <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct: LSM7

c cc catggct cc.ca.gact ct gtctgtgc.cg agtgt attat aaaatcgtgg gggagatgcc 1440
cggCCtggga tgctgtttgg agacggaata aatgttittct cattcagtict c cagt cattg 1500
gttgagccac agcctagggg ttggaggaag act CCaCtct ggg tacaccC ttaggggctg 1560
gctt tatgga acttgtagtt tgaacaaggc agtggcaatc cgc.ccc.cticc agcctgcctg 1620
gctggCCCCC titccctctgt Ctggggtc.gc att cogcaca agc ctitt cat caa.catctta 1680
aaatagtaac 1694

<210> SEQ ID NO 50 <211> LENGTH: 224 <212> TYPE: PRT <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct: SYNGR2

<400> SEQUENCE: 50

Met Glu Ser Gly Ala Tyr Gly Ala Ala Lys Ala Gly Gly Ser Phe Asp
1 5 10 15

Leu Arg Arg Phe Leu Thr Gln Pro Gln Val Val Ala Arg Ala Val Cys
25

Leu Val Phe Ala Leu Ile Val Phe Ser Cys Ile Gly Glu Gly Tyr
35 40 45

Ser Asn Ala His Glu Ser Lys Gln Met Tyr Cys Val Phe Asn Arg Asn
50 55 60

Glu Asp Ala Cys Arg Tyr Gly Ser Ala Ile Gly Val Leu Ala Phe Leu
65 70

Ala Ser Ala Phe Phe Leu Val Val Asp Ala Tyr Phe Pro Gln Ile Ser
85 90 95

Asn Ala Thr Asp Arg Lys Tyr Leu Val Ile Gly Asp Leu Leu Phe Ser
105 110

Ala Leu Trp Thr Phe Leu Trp Phe Val Gly Phe Phe Leu Thr Asn
115 120 125

Gln Trp Ala Val Thr Asn Pro Lys Asp Val Leu Val Gly Ala Asp Ser
130 135 140

Val Arg Ala Ala Ile Thr Phe Ser Phe Phe Ser Ile Phe Ser Trp Gly
145 150 155 160

Val Leu Ala Ser Leu Ala Tyr Gln Arg Tyr Lys Ala Gly Val Asp Asp
165 170 175

Phe Ile Gln Asn Tyr Val Asp Pro Thr Pro Asp Pro Asn Thr Ala Tyr
180 185 190

Ala Ser Tyr Pro Gly Ala Ser Val Asp Asn Tyr Gln Gln Pro Pro Phe
195

Thr Gln Asn Ala Glu Thr Thr Glu Gly Tyr Gln Pro Pro Pro Val Tyr
210 215 220

<210> SEQ ID NO 51 <211> LENGTH: 2272 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct:

<400> SEQUENCE: 51
aatgcacagc gg tattgatgagtagatcct tatt caga ggttggctga aacgcac cat 60
gcctgctt Co. atcttittgct Ctgtaaagtt gtgaattgct catgcct at a gggaggalagg 120
atggcacatg ggattic Cttic tica aggcaaa gttaccatala C9gtggatga gtacagctica 180


<400> SEQUENCE: 52

Met Ala His Gly Ile Pro Ser Gln Gly Lys Val Thr Ile Thr Val Asp
1 5 10 15

Glu Tyr Ser Ser Asn Pro Thr Gln Ala Phe Thr His Tyr Asn Ile Asn
20 25 30

Gln Ser Arg Phe Gln Pro Pro His Val His Met Val Asp Pro Ile Pro
35 40 45

Tyr Asp Thr Pro Lys Pro Ala Gly His Thr Arg Phe Val Cys Ile Ser
50 55 60

Asp Thr His Ser Arg Thr Asp Gly Ile Gln Met Pro Tyr Gly Asp Ile
65 70 75 80

Leu Leu His Thr Gly Asp Phe Thr Glu Leu Gly Leu Pro Ser Glu Val
85 90 95

Lys Lys Phe Asn Asp Trp Leu Gly Asn Leu Pro Tyr Glu Tyr Lys Ile
100 105 110

Val Ile Ala Gly Asn His Glu Leu Thr Phe Asp Lys Glu Phe Met Ala
115 120 125

Asp Leu Val Lys Gln Asp Tyr Tyr Arg Phe Pro Ser Val Ser Lys Leu
130 135 140

Lys Pro Glu Asp Phe Asp Asn Val Gln Ser Leu Leu Thr Asn Ser Ile
145 150 155 160

Tyr Leu Gln Asp Ser Glu Val Thr Val Lys Gly Phe Arg Ile Tyr Gly
165 170 175

Ala Pro Trp Thr Pro Trp Phe Asn Gly Trp Gly Phe Asn Leu Pro Arg
180 185 190

Gly Gln Ser Leu Leu Asp Lys Trp Asn Leu Ile Pro Glu Gly Ile Asp
195 200 205

Ile Leu Met Thr His Gly Pro Pro Leu Gly Phe Arg Asp Trp Val Pro
210 215 220

Lys Glu Leu Gln Arg Val Gly Cys Val Glu Leu Leu Asn Thr Val Gln
225 230 235 240

Arg Arg Val Arg Pro Lys Leu His Val Phe Gly Gly Ile His Glu Gly
245 250 255

Tyr Gly Ile Met Thr Asp Gly Tyr Thr Thr Tyr Ile Asn Ala Ser Thr
260 265 270

Cys Thr Val Ser Phe Gln Pro Thr Asn Pro Pro Ile Ile Phe Asp Leu
275 280 285

Pro Asn Pro Gln Gly Ser
290

<210> SEQ ID NO 53 <211> LENGTH: 4828 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct: CDH1

<400> SEQUENCE: 53
agtggcgt.cg gaactgcaaa gCacctgtga gcttgcggala gtcagttcag act coagcc C 60
gctic cagc cc ggc.ccgaccc gaccgcaccc ggcgc.ctgcc Ctc.gctcggc gtcc.ccggCC 120
agc.catgggc ccttggagcc gcagoctotc ggcgctgctg. Ctgctgctgc aggtotic ct c 180
ttggct ctgc Caggagc.cgg agc cctgcca C cctggctitt gacgc.cgaga gct acacgtt 240
cacggtgc cc cqgcgccacc tigagagagg cc.gcgtc.ctg ggcagagtga attittgaaga 300

<223> OTHER INFORMATION: Synthetic construct: CDH1

<400> SEQUENCE: 54

Met Gly Pro Trp Ser Arg Ser Leu Ser Ala Leu Leu Leu Leu Leu Gln
1 5 10 15

Val Ser Ser Trp Leu Cys Gln Glu Pro Glu Pro Cys His Pro Gly Phe
20 25 30

Asp Ala Glu Ser Tyr Thr Phe Thr Val Pro Arg Arg His Leu Glu Arg
35 40 45

Gly Arg Val Leu Gly Arg Val Asn Phe Glu Asp Cys Thr Gly Arg Gln
50 55 60

Arg Thr Ala Tyr Phe Ser Leu Asp Thr Arg Phe Lys Val Gly Thr Asp
65 70 75

Gly Val Ile Thr Val Lys Arg Pro Leu Arg Phe His Asn Pro Gln Ile
85 90 95

His Phe Leu Val Tyr Ala Trp Asp Ser Thr Tyr Arg Lys Phe Ser Thr
100 105 110

Lys Val Thr Leu Asn Thr Val Gly His His His Arg Pro Pro Pro His
115 120 125

Gln Ala Ser Val Ser Gly Ile Gln Ala Glu Leu Leu Thr Phe Pro Asn
130 135 140

Ser Ser Pro Gly Leu Arg Arg Gln Lys Arg Asp Trp Val Ile Pro Pro
145 150 155 160

Ile Ser Cys Pro Glu Asn Glu Lys Gly Pro Phe Pro Lys Asn Leu Val
165 170 175

Gln Ile Lys Ser Asn Lys Asp Lys Glu Gly Lys Val Phe Tyr Ser Ile
180 185 190

Thr Gly Gln Gly Ala Asp Thr Pro Pro Val Gly Val Phe Ile Ile Glu
195 200 205

Arg Glu Thr Gly Trp Leu Lys Val Thr Glu Pro Leu Asp Arg Glu Arg
210 215 220

Ile Ala Thr Tyr Thr Leu Phe Ser His Ala Val Ser Ser Asn Gly Asn
225 230 235 240

Ala Val Glu Asp Pro Met Glu Ile Leu Ile Thr Val Thr Asp Gln Asn
245 250 255

Asp Asn Lys Pro Glu Phe Thr Gln Glu Val Phe Lys Gly Ser Val Met
260 265 270

Glu Gly Ala Leu Pro Gly Thr Ser Val Met Glu Val Thr Ala Thr Asp
275 280 285

Ala Asp Asp Asp Val Asn Thr Tyr Asn Ala Ala Ile Ala Tyr Thr Ile
290 295 300

Leu Ser Gln Asp Pro Glu Leu Pro Asp Lys Asn Met Phe Thr Ile Asn
305 310 315

Arg Asn Thr Gly Val Ile Ser Val Val Thr Thr Gly Leu Asp Arg Glu
325 330 335

Ser Phe Pro Thr Tyr Thr Leu Val Val Gln Ala Ala Asp Leu Gln Gly
340 345 350

Glu Gly Leu Ser Thr Thr Ala Thr Ala Val Ile Thr Val Thr Asp Thr
355 360 365

Asn Asp Asn Pro Pro Ile Phe Asn Pro Thr Thr Tyr Lys Gly Gln Val
370 375 380

Pro Glu Asn Glu Ala Asn Val Val Ile Thr Thr Leu Lys Val Thr Asp
385 390 395 400

Ala Asp Ala Pro Asn Thr Pro Ala Trp Glu Ala Val Tyr Thr Ile Leu
405 415

Asn Asp Asp Gly Gly Gln Phe Val Val Thr Thr Asn Pro Val Asn Asn
425 430

Asp Gly Ile Leu Lys Thr Ala Lys Gly Leu Asp Phe Glu Ala Lys Gln
435 440 445

Gln Tyr Ile Leu His Val Ala Val Thr Asn Val Val Pro Phe Glu Val
450 455 460

Ser Leu Thr Thr Ser Thr Ala Thr Val Thr Val Asp Val Leu Asp Val
465 470 475

Asn Glu Ala Pro Ile Phe Val Pro Pro Glu Arg Val Glu Val Ser
485 490 495

Glu Asp Phe Gly Val Gly Gln Glu Ile Thr Ser Thr Ala Gln Glu
500 505

Pro Asp Thr Phe Met Glu Gln Lys Ile Thr Arg Ile Trp Arg Asp
515 525

Thr Ala Asn Trp Leu Glu Ile Asn Pro Asp Thr Gly Ala Ile Ser Thr
530 535 540

Arg Ala Glu Leu Asp Arg Glu Asp Phe Glu His Val Asn Ser Thr
545 550 555 560

Thr Ala Leu Ile Ile Ala Thr Asp Asn Gly Ser Pro Val Ala Thr
565 570 575

Gly Thr Gly Thr Leu Leu Leu Ile Leu Ser Asp Val Asn Asp Asn Ala
585 590

Pro Ile Pro Glu Pro Arg Thr Ile Phe Phe Glu Arg Asn Pro
595 605

Pro Gln Val Ile Asn Ile Ile Asp Ala Asp Leu Pro Pro Asn Thr Ser
610 615

Pro Phe Thr Ala Glu Leu Thr His Gly Ala Ser Ala Asn Trp Thr Ile
625 630 635 640

Gln Asn Asp Pro Thr Gln Glu Ser Ile Ile Leu Pro Lys Met
645 650 655

Ala Leu Glu Val Gly Asp Ile Asn Leu Leu Met Asp Asn
660 665 670

Gln Asn Lys Asp Gln Val Thr Thr Leu Glu Val Ser Val Asp
675 685

Glu Gly Ala Ala Gly Val Cys Arg Ala Gln Pro Val Glu Ala Gly
690 695 700

Leu Gln Ile Pro Ala Ile Leu Gly Ile Leu Gly Gly Ile Leu Ala Leu
705 720

Leu Ile Leu Ile Leu Leu Leu Leu Leu Phe Leu Arg Arg Arg Ala Val
725 730 735

Val Glu Pro Leu Leu Pro Pro Glu Asp Asp Thr Arg Asp Asn Val
740 745 750

Tyr Asp Glu Glu Gly Gly Gly Glu Glu Asp Gln Asp Phe Asp
760 765

Leu Ser Gln Leu His Arg Gly Leu Asp Ala Arg Pro Glu Val Thr Arg
770 775 780

Asn Asp Val Ala Pro Thr Leu Met Ser Val Pro Arg Leu Pro Arg
785 790 795

Pro Ala Asn Pro Asp Glu Ile Gly Asn Phe Ile Asp Glu Asn Leu Lys
805 810 815

Ala Ala Asp Thr Asp Pro Thr Ala Pro Pro Tyr Asp Ser Leu Leu Val

US 9,234,244 B2 143 144 - Continued ttaccagt ct gattittatcg tdaaacacca agc.caggcta gcatgct cat gigcaatctgt 414 O ttggggctgt tttgttgttgg Cactagocaa acatalaaggg gCttaagttca gcctgcatac 42OO agaggat.cgg ggagagaagg ggcctgtgtt Ctcagcct Co tagt actta C cagagttta 426 O attitttittaa aaaaaatctg. cactaaaatc cccaaactga caggtaaatg tagcc ct cag 432O agct cago'cc aaggcagaat ctaaatcaca citattitt cqa gat catgitat aaaaagaaaa 438 O aaaagaagtic atgctgttgtg gccaattata atttittittcaaag actttgt cacaaaactg 4 44 O t citat attag acattttgga gqqaccagga aatgtaagac accaaatcct c catctgttc 4500 agtgtgcctg atgtcacctic atgatttgct gttacttittt taact cotgc gccaaggaca 456 O gtgggttctg tdtccaccitt ttgctttgc gaggc.cgagc C caggcatct gct cqcctgc 462O cacggctgac cagaga aggt gct tcaggag Ctctgcctta gacgacgtgt tacagtatga 468O acacacagca gaggcaccct cqtatgttitt gaaagttgcc ttctgaaagg gcacagttitt 474. O aaggaaaaga aaaagaatgt aaaactatac taccc.gttt toagttittaa agggit cqtga 48OO gaaactggct ggtc.calatgg gatttacago alacatttitcc attgctgaag tagg tagda 486 O gct Ctcttct gtcagdtgaa tittaaggat ggggaaaaag aatgc ctitta agtttgctict 492 O taatcg tatg gaagcttgag ctatgtgttg gaagtgc cct ggttittaatc catacacaaa 498O gacgg tacat aatcct acag gtttaaatgt acataaaaat at agtttgga attctittgct 5040 c tactgttta cattgcagat togctataatt toaaggagtg agattataaa taaaatgatg 51OO cactittagga tigttitcct at ttittgaaatc tdaacatgaa toatt cacat gaccaaaaat 516 O tgttgtttittt taaaaataca tdtctagt ct gtc.ctittaat agct citctta aataagctat 522 O gatattaatc agat cattac cagttagott ttaaag.caca tttgtttaag actatgttitt 528 O tggaaaaata cqctacagaa ttttitttitta agctacaaat aaatgagatg c tactaattg 534 O ttittggaatc tdttgtttct gccaaaggta aattalactaa agatttatt c aggaatc.ccc 54 OO atttgaattt g tatgattica ataaaagaaa acaccaagta agittatataa aataaattgt 546 O gtatgagatgttgttgtttitc citttgtaatt to cactaact alactalactaa cittatatt ct 552O t catggaatg gagcc.ca.gaa gaaatgagag gaa.gc.ccttt to acactaga t cittatttga 558 O agaaatgttt gttagt cagt cagt cagtgg tttctggctic tic.cgaggga gatgtgttcC 
564 O c cagcaac catttctgcago coagaatcto aaggcactag aggcggtgtc. ttaattaatt st OO ggct tcacaa agacaaaatg Ctctggactg ggatttittco tittgctgttgt tdgaatatg 576. O tgttt attaa ttagcacatg ccaacaaaat aaatgtcaag agittatttica taagtgtaag 582O taaacttaag aattaaagag togcagacitta taattitt c 5858

<210> SEQ ID NO 56 <211> LENGTH: 1023 <212> TYPE: PRT <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic construct: FAM13A1

<400> SEQUENCE: 56

Met Gly Ala Gly Ala Leu Ala Ile Cys Gln Ser Lys Ala Ala Val Arg
1 5 15

Leu Lys Glu Asp Met Lys Lys Ile Val Ala Val Pro Leu Asn Glu Gln
25

Lys Asp Phe Thr Tyr Gln Lys Leu Phe Gly Val Ser Leu Gln Glu Leu
35 40 45

Glu Arg Gln Gly Leu Thr Glu Asn Gly Ile Pro Ala Val Val Trp Asn
50 55 60

Ile Val Glu Tyr Leu Thr Gln His Gly Leu Thr Gln Glu Gly Leu Phe
65 70

Arg Val Asn Gly Asn Val Val Val Glu Gln Leu Arg Leu Lys Phe
85 90 95

Glu Ser Gly Val Pro Val Glu Leu Gly Lys Asp Gly Asp Val Ser
105 110

Ala Ala Ser Leu Leu Leu Phe Leu Arg Glu Leu Pro Asp Ser Leu
115 120 125

Ile Thr Ser Ala Leu Gln Pro Arg Phe Ile Gln Leu Phe Gln Asp Gly
130 135 140

Arg Asn Asp Val Gln Glu Ser Ser Leu Arg Asp Leu Ile Glu Leu
145 150 155 160

Pro Asp Thr His Tyr Leu Leu Tyr Leu Gln Phe Leu Thr
165 170 175

Val Ala Lys His His Val Gln Asn Arg Met Asn Val His Asn Leu
180 185 190

Ala Thr Val Phe Gly Pro Asn Cys Phe His Val Pro Pro Gly Leu Glu
195

Gly Met Glu Gln Asp Leu Asn Ile Met Ala Ile Leu
210 215

Glu Asn Asn Thr Leu Phe Glu Val Glu Tyr Thr Glu Asn Asp His
225 230 235 240

Leu Arg Glu Asn Leu Ala Arg Leu Ile Ile Val Glu Val
245 250 255

Asn Ser Leu Ile Leu Leu Thr Arg Gly Leu Glu Arg Asp
260 265 270

Met Pro Lys Pro Pro Thr Ile Pro Ser Arg Ser Glu
275 285

Gly Ser Ile Gln Ala His Arg Val Leu Gln Pro Glu Leu Ser Asp Gly
290 295 300

Ile Pro Gln Leu Ser Leu Arg Leu Ser Arg Ala Leu Glu
305 310 315

Asp Met Asn Ser Ala Glu Gly Ala Ile Ser Ala Leu Val Pro Ser
325 330 335

Ser Gln Glu Asp Glu Arg Pro Leu Ser Pro Phe Leu Ser Ala His
340 345 350

Val Pro Gln Val Ser Asn Val Ser Ala Thr Gly Glu Leu Leu Glu Arg
355 360 365

Thr Ile Arg Ser Ala Val Glu Gln His Leu Phe Asp Val Asn Asn Ser
370 375 380

Gly Gly Gln Ser Ser Glu Asp Ser Glu Ser Gly Thr Leu Ser Ala Ser
385 390 395 400

Ser Ala Thr Ser Ala Arg Gln Arg Arg Arg Gln Ser Glu Gln Asp
405 410 415

Glu Val Arg His Gly Arg Asp Gly Leu Ile Asn Glu Asn Thr
425 430

Pro Ser Gly Phe Asn His Leu Asp Asp Ile Leu Asn Thr Gln Glu
435 440 445

Val Glu Lys Val His Asn Thr Phe Gly Cys Ala Gly Glu Arg Ser
450 455 460

Lys Pro Arg Gln Lys Ser Ser Thr Lys Leu Ser Glu Leu His Asp
465 470

Asn Gln Asp Gly Leu Val Asn Met Glu Ser Leu Asn Ser Thr Arg Ser
485 490 495

His Glu Arg Thr Gly Pro Asp Asp Phe Glu Trp Met Ser Asp Glu Arg
500 505

Gly Asn Glu Lys Asp Gly Gly His Thr Gln His Phe Glu Ser Pro
515 525

Thr Met Ile Gln Glu His Pro Ser Leu Ser Asp Thr Gln Gln
530 535 540

Arg Asn Gln Asp Ala Gly Asp Gln Glu Glu Ser Phe Val Ser Glu Val
545 550 555 560

Pro Gln Ser Asp Leu Thr Ala Leu Asp Glu Asn Trp Glu Glu
565 570 575

Pro Ile Pro Ala Phe Ser Ser Trp Gln Arg Glu Asn Ser Asp Ser Asp
585 590

Glu Ala His Leu Ser Pro Gln Ala Gly Arg Leu Ile Arg Gln Leu Leu
595 605

Asp Glu Asp Ser Asp Pro Met Leu Ser Pro Arg Phe Ala Gly
610 615

Gln Ser Arg Gln Tyr Leu Asp Asp Thr Glu Val Pro Pro Ser Pro Pro
625 630 635 640

Asn Ser His Ser Phe Met Arg Arg Arg Ser Ser Ser Leu Gly Ser Tyr
645 650 655

Asp Asp Glu Gln Glu Asp Leu Thr Pro Ala Gln Leu Thr Arg Arg Ile
660 665 670

Gln Ser Leu Ile Arg Phe Glu Asp Arg Phe Glu Glu
675 685

Glu Lys Arg Pro Ser His Ser Asp Ala Ala Asn Pro Glu
690 695 700

Val Leu Trp Thr Asn Asp Leu Ala Phe Arg Arg Gln Leu Lys
705 715

Glu Ser Leu Lys Ile Ser Glu Glu Asp Leu Thr Pro Arg Met Arg
725 730 735

Gln Arg Ser Asn Thr Leu Pro Ser Phe Gly Ser Gln Leu Glu Lys
740 745 750

Glu Asp Glu Gln Glu Leu Val Asp Ala Ile Pro Ser
760 765

Val Glu Ala Thr Leu Glu Ser Ile Gln Arg Leu Gln Glu
770 775

Ala Glu Ser Ser Arg Pro Glu Asp Ile Asp Met Thr Asp Gln
790 795 800

Ile Ala Asn Glu Lys Val Ala Leu Gln Lys Ala Leu Leu Tyr Glu
805 810 815

Ser Ile His Gly Arg Pro Val Thr Lys Asn Glu Arg Gln Val Met Lys
825 830

Pro Leu Tyr Asp Arg Arg Leu Val Lys Gln Ile Leu Ser Arg Ala
835 840 845

Asn Thr Ile Pro Ile Ile Gly Ser Pro Ser Ser Lys Arg Arg Ser Pro
850 855 860

Leu Leu Gln Pro Ile Ile Glu Gly Glu Thr Ala Ser Phe Phe Glu
865 870 875

Ile Glu Glu Glu Glu Gly Ser Glu Asp Asp Ser Asn Val Pro