US008568974B2

(12) United States Patent (10) Patent No.: US 8,568,974 B2 Willman et al. (45) Date of Patent: Oct. 29, 2013

(54) IDENTIFICATION OF NOVEL SUBGROUPS Alberts et al. (Molecular Biology of the Cell, 3rd edition, 1994, p. OF HIGH-RSK PEDIATRIC PRECURSORB 465).* Greenbaum et al. ( Biology, 2003, vol. 4, Issue 9, pp. 117.1- ACUTELYMPHOBLASTICLEUKEMIA, 117.8).* OUTCOME CORRELATIONS AND Ries LAG, Wilbert D. Krapcho M. et al. SEER Cancer Statistics DAGNOSTIC AND THERAPEUTIC Review, 1975-2005. NIH publication. Bethesda, Md.: National Can METHODS RELATED TO SAME cer Institute, Bethesda, MD; 2008:v. Smith M. Arthur D. Camitta B, et al. Uniform approach to risk (75) Inventors: Cheryl L. Willman, Albuquerque, NM classification and treatment assignment for children with acute (US); Richard Harvey, Placitas, NM lymphoblastic leukemia. J Clin Oncol. 1996; 14:18-24. (US); George S. Davidson, Pieters R. Carroll WL. Biology and treatment of acute lymphoblastic Albuquerque, NM (US); Xuefei Wang, leukemia. Pediatr Clin North Am. 2008:55:1-20, ix. Albuquerque, NM (US); Susan RAtlas, Armstrong SA, Look AT. Molecular genetics of acute lymphoblastic leukemia. J Clin Oncol. 2005:23:6306-6315. Albuquerque, NM (US); Edward J. Yeoh E.J. Ross ME, Shurtleff SA, et al. Classification, subtype dis Bedrick, Albuquerque, NM (US); Iming covery, and prediction of outcome in pediatric acute lymphoblastic L. Chen, Albuquerque, NM (US) leukemia by expressionprofiling. Cancer Cell. 2002; 1:133-143. Moos PJ, Raetz EA, Carlson MA. etal. Identification of gene expres (73) Assignees: STC.UNM, Albuquerque, NM (US); sion profiles that segregate patients with childhood leukemia. Clin Sandia Corporation, Albuquerque, NM Cancer Res. 2002:8:3118-3130. Wilson CS, Davidson GS, Martin SB, et al. Gene expressionprofiling (US) of adult acute myeloid leukemia identifies novel biologic clusters for (*) Notice: Subject to any disclaimer, the term of this risk classification and outcome prediction. Blood. 2006; 108:686 patent is extended or adjusted under 35 696. U.S.C. 154(b) by 228 days. Shuster JJ. Camitta BM. Pullen J. et al. Identification of newly diag nosed children with acute lymphocytic leukemia at high risk for relapse. Cancer Res Ther Control. 1999;9:101-107. (21) Appl. No.: 12/734,542 Borowitz M.J. Devidas M. Hunger SP. et al. Clinical significance of minimal residual disease in childhood acute lymphoblastic leukemia (22) PCT Filed: Nov. 14, 2008 and its relationship to other prognostic factors: A Children's Oncol ogy Group study. Blood, 2008; 111:5477-5485. (86). PCT No.: PCT/US2O08/O12821 Nachman JB, Sather HN, Sensel MG, et al. Augmented post-induc tion therapy for children with high-risk acute lymphoblastic leuke S371 (c)(1), mia and a slow response to initial therapy. N Engl J Med. 1998; (2), (4) Date: Oct. 12, 2010 338:1663-1671. Seibel NL, Steinherz PG, Sather HN, et al. Early postinduction inten (87) PCT Pub. No.: WO2009/064481 sification therapy improves Survival for children and adolescents with high-risk acute lymphoblastic leukemia: a report from the Children's PCT Pub. Date: May 22, 2009 Oncology Group. Blood. 2008; 111:2548-2555. (65) Prior Publication Data Borowitz, MJ, Pullen DJ, Shuster JJ, et al. Minimal residual disease detection in childhood precursor-B-cell acute lymphoblastic leuke US 2011 FOO45999 A1 Feb. 24, 2011 mia: relation to other risk factors. A Children's Oncology Group study. Leukemia. 2003; 17:1566-1572. Related U.S. Application Data Davidson GS, Martin S. Boyack KW, et al. Robust Methods for (60) Provisional application No. 61/003,048, filed on Nov. Microarray Analysis, In: Akay M. ed. Genomics and Proteomics Engineering in Medicine and Biology. Hoboken, New Jersey: IEEE 14, 2007. Press; Wiley; 2007:99-130. Tomlins SA, Rhodes DR, Perner S, et al. Recurrent fusion of (51) Int. Cl. TMPRSS2 and ETS transcription factor in prostate cancer. CI2O I/68 (2006.01) Science. 2005:310:644-648. GOIN33/53 (2006.01) Bland J.M. Altman DG. The logrank test. BMJ. 2004:326:1073. (52) U.S. Cl. Armitage P. Berry G. Statistical methods in medical research (ed 3rd). Oxford; Boston: Blackwell Scientific Publications; 1994. USPC ...... 435/6.1; 435/7.1 BewickV. Cheek L. Ball J. Statistics review 12: survival analysis. Crit (58) Field of Classification Search Care. 2004;8:389-394. None See application file for complete search history. (Continued) (56) References Cited Primary Examiner — Sean Aeder (74) Attorney, Agent, or Firm — Henry D. Coleman; R. Neil U.S. PATENT DOCUMENTS Sudol 2007/0072178 A1 3/2007 Haferlach (57) ABSTRACT 2007/0207459 A1* 9/2007 Dugas et al...... 435/6 The present invention relates to the identification of genetic markers patients with high risk B-precursor acute lympho FOREIGN PATENT DOCUMENTS blastic leukemia (B-ALL) and associated methods and their relationship to therapeutic outcome. The present invention WO 2006009915 A2 1, 2006 also relates to diagnostic, prognostic and related methods WO 2006O71088 A1 T 2006 using these genetic markers, as well as kits which provide OTHER PUBLICATIONS microchips and/or immunoreagents for performing analysis on leukemia patients. Filshie etal (Leukemia, 1998, 12(3): Abstract).* Tockman et al (Cancer Res., 1992, 52:271 1s-2718s).* 25 Claims, 7 Drawing Sheets US 8,568,974 B2 Page 2

(56) References Cited Moniaux N. Chaturvedi P. Varshney GC, et al. MUC4 mucin induces ultra-structural changes and tumorigenicity in pancreatic OTHER PUBLICATIONS cancer cells. Br J Cancer. 2007;97:345-357. Juric D. Lacayo NJ, Ramsey MC, et al. Differential gene expression Bhojwani D, Kang H. Menezes RX. etal. Gene expression signatures patterns and interaction networks in BCR-ABL-positive and -nega predictive of early response and outcome in high-risk childhood tive adult acute lymphoblastic leukemias. J Clin Oncol. acute lymphoblastic leukemia: a Children's Oncology Group study. J 2007:25:1341-1349. Clin Oncol. 2008; 26:4376-4384. Kameda H. Ishigami H. Suzuki M. Abe T. Takeuchi T. Imatinib Fine BM, Stanulla M. Schrappe M. et al. Gene expression patterns mesylate inhibits proliferation of rheumatoid synovial fibroblast-like associated with recurrent chromosomal translocations in acute cells and phosphorylation of Gab adapterproteins activated by plate lymphoblastic leukemia. Blood. 2004:103: 1043-1049. let-derived growth factor. Clin Exp Immunol. 2006; 144:335-341. Zukerberg LR, DeBernardo RL, Kirley SD, et al. Loss of cables, a van Delft FW. Bellotti T. Luo Z. et al. Prospective gene expression cyclin-dependent kinase regulatory protein, is associated with the analysis accurately subtypes acute leukemia in children and estab development of endometrial hyperplasia and endometrial cancer. lishes a commonality between hyperdiploidy and t(12;21) in acute Cancer Res. 2004:64:202-208. lymphoblastic leukaemia. Br J Haematol. 2005: 130:26-35. Zhang H. Duan HO, Kirley SD, Zukerberg LR, Wu CL. Aberrant Coustan-Smith E. Sancho J. Behm FG, et al. Prognostic importance splicing of cables gene, a CDK regulator, in human cancers. Cancer of measuring early clearance of leukemic cells by flow cytometry in Biol Ther. 2005:4:1211-1215. childhood acute lymphoblastic leukemia. Blood. 2002; 100:52-58. Dong Q, Kirley S, Rueda B, Zhao C. Zukerberg L. Oliva E. Loss of Steinherz PG, Gaynon PS, Breneman JC, et al. Cytoreduction and cables, a novel gene on 18q in ovarian cancer. Mod prognosis in acute lymphoblastic leukemia—the importance of early Pathol. 2003; 16:863-868. Kirley SD, D'Apuzzo M. Lauwers GY, Graeme-Cook F. Chung DC, marrow response: report from the Childrens Cancer Group. J. Clin Zukerberg LR. The Cables gene on chromosome 18Q regulates colon Oncol. 1996; 14:389-396. cancer progression in vivo. Cancer Biol Ther. 2005:4:861-863. Bhatia S, Sather HN, HeeremaNA. Trigg ME, Gaynon PS, Robison Ross ME, Zhou X, Song G. et al. Classification of pediatric acute LL. Racial and ethnic differences in Survival of children with acute lymphoblastic leukemia by gene expression profiling. Blood. lymphoblastic leukemia. Blood. 2002: 100:1957-1964. 2003; 102:2951-2959. Pollock BH, DeBaun MR, Camitta BM, et al. Racial differences in Mullighan CG, Miller CB, Su X. et al. ERG deletions define a novel the survival of childhood B-precursor acute lymphoblastic leukemia: Subtype of B-progenitor acute lymphoblastic leukemia. Blood. a Pediatric Oncology Group Study. JClin Oncol. 2000:18:813-823. 2007: 110:212A-213A. Dworzak MN, Forschi G. Printz D, et al. CD99 expression in T-lin Hoffmann, K. Firth MJ, Beesley AH, et al. Prediction of relapse in eage ALL: implications for flow cytometric detection of minimal paediatric pre-B acute lymphoblastic leukaemia using a three-gene residual disease. Leukemia. 2004; 18:703-708. risk index. Br J Haematol. 2008: 140:655-664. Wilkerson AE, Glasgow MA, Hiatt KM. Immunoreactivity of CD99 Gandemer V, et al., Five distinct biological processes and 14 dif in invasive malignant melanoma. J Cutan Pathol. 2006:33:663-666. ferentially expressed genes characterize TEL/AML1-positive leuke Scotlandi K. Perdichizzi S, Bernard G. et al. Targeting CD99 in mia. BMC Genomics 2007:8:385. association with doxorubicin: an effective combined treatment for Timson, G. et al., High level expression of N-acetylglucosamine-6- Ewing's sarcoma. Eur J. Cancer. 2006:42:91-96. O-sulfotransferase is characteristic of a subgroup of paediatric pre Chaturvedi P. Singh AP, Moniaux N, et al. MUC4 mucin potentiates cursor-B acute lymphoblastic leukaemia. Cancer Lett. 2006; pancreatic tumor cell proliferation, Survival, and invasive properties 242:239-244. and interferes with its interaction to extracellular matrix proteins. Mol Cancer Res. 2007:5:309-320. * cited by examiner U.S. Patent Oct. 29, 2013 Sheet 1 of 7 US 8,568,974 B2

Figure 1A U.S. Patent Oct. 29, 2013 Sheet 2 of 7 US 8,568,974 B2

OL9Infil G0,2668

?89199/67||?£897

00000|| Asuau U.S. Patent Oct. 29, 2013 Sheet 3 of 7 US 8,568,974 B2

4. U.S. Patent Oct. 29, 2013 Sheet 4 of 7 US 8,568,974 B2

U.S. Patent Oct. 29, 2013 Sheet 5 Of 7 US 8,568,974 B2

O 1 2 3 a. 5

4. e e s s e d h

Time (years) U.S. Patent Oct. 29, 2013 Sheet 6 of 7 US 8,568,974 B2

U.S. Patent Oct. 29, 2013 Sheet 7 Of 7 US 8,568,974 B2

S-adjo A.IIIqeq Oud US 8,568,974 B2 1. 2 IDENTIFICATION OF NOVEL SUBGROUPS The high-risk ALL Therapeutically Applicable Research to OF HIGH-RISK PEDIATRIC PRECURSORB Generate Effective Treatments (TARGET) pilot project is a ACUTELYMPHOBLASTICLEUKEMIA, partnership between the National Cancer Institute and the OUTCOME CORRELATIONS AND Children's Oncology Group (COG) designed to use genom DAGNOSTIC AND THERAPEUTC ics to identify and validate therapeutic targets. We analyzed METHODS RELATED TO SAME specimens from 207 of 272 (75%) of high-risk B-precursor ALL patients from the COGP990.6 clinical trial in an effort to The present application claims the benefit of priority of identify Subgroups of these high-risk patients that were char U.S. provisional application Ser. No. 61/003,048, filed Nov. acterized by unique gene expression profiles or signatures. 14, 2007, entitled “Identification of Novel Subgroups of 10 Our objectives in this study were three-fold: 1) to identify High-risk Pediatric Precursory BAcute Lymphoblastic Luke subtypes of high-risk B-ALL defined by characteristic gene mia (B-ALL) by Unsupervised Microarray Analysis Clinical signatures, 2) to determine if these subtypes are associated Correlates and Therapeutic Implications. A Children's with specific clinical features and 3) to analyze the signature Oncology Group (COG) Study', the entire contents of said genes to gain insight into the biology of the Subtypes. The application being incorporated by reference herein in its 15 results from these analyses may lead to improved diagnostics, entirety. modified definitions of risk-categories and development of new targeted therapies. RELATED APPLICATIONS AND GOVERNMENT SUPPORT BRIEF DESCRIPTION OF THE FIGURES This invention was made with government Support under a FIG. 1 shows the clustering of COG P9906 samples. In grant from the National Institutes of Health (National Cancer Panel A hierarchical clustering was used to identify groups of Institute), Grant No. 5 UO1CA1114762.03 SPECS. The U.S. samples with related gene expression. The 100 probe sets are Government has certain rights in this invention. shown in rows and the 207 samples in columns. Shades of red 25 depict expression levels higher than the median while greens FIELD OF THE INVENTION indicate lower levels of expression. Colored boxes highlight the identification of eight groups. Bars across the bottom The present invention relates to the identification of genetic denote translocation groups (bright green for t(1:19); yellow markers patients with high risk B-precursor acute lympho for 11cq23 rearrangements; dark green for similar to t(1:19), blastic leukemia (B-ALL) and associated methods and their 30 outcome (red for relapse) and race (blue for Hispanic/Latino). relationship to therapeutic outcome. The present invention In Panel B, VxInsight was used to identify seven distinct also relates to diagnostic, prognostic and related methods clusters of ALL based on gene expression profiles. The data using these genetic markers, as well as kits which provide are visualized as a 3-dimensional terrain map with 2-dimen microchips and/or immunoreagents for performing analysis sional distances reflecting gene expression profile correlates on leukemia patients. 35 and the third dimension representing cluster membership density. Overlaps with the dominant signatures identified by BACKGROUND OF THE INVENTION hierarchical clustering are illustrated by the colors as indi cated in the insert. FIG. 1C shows an example of probe set The majority of children and adolescents with B-precursor with outlier group at high end. Red line indicates signal acute lymphoblastic leukemia (ALL) have good responses to 40 intensities for all 207 samples for probe 212151 at. Vertical current therapy with 5-year survival rates of 84% in 1996 blue lines depict partitioning of samples into thirds. A least 2003, as compared to 54% in 1975-77. To optimize the squares curve fit is applied to the middle third of the samples risk/benefit ratio, patients are stratified for treatment intensity and the resulting trend line is shown in yellow. Different based upon their risk of relapse. The majority of patients sample groups are illustrated by the dashed lines at the top have prognostic factors that place them into the favorable or 45 right. As shown by the double arrowed lines, the median value standard risk treatment groups. These patients generally have from each of these groups is compared to the trend line. long relapse free survivals (RFS), although prediction of the FIG. 2 shows the hierarchical heat map that identifies out individual patients who will fail therapy still remains a sig lier clusters. In Panel A the 209 COPA probe sets are shown in nificant problem. Patients in the high risk treatment group are rows and the 207 samples in columns. In Panel B the 215 fewer in number and have not been as well studied. A detailed 50 ROSE probe sets are shown in rows. The colored boxes indi examination of this cohort of patients may provide insights cate the identification of significant clusters. The colored bars into the genes and pathways that are fundamentally associ across the bottom denote translocations, outcome and race as ated with outcome. described in FIG. 1. The similarities between the groups The white blood cell (WBC) count, age and presence of identified by the ROSE or COPA and hierarchical clustering extramedullary disease at the time of diagnosis have been the 55 are shown in FIG. 2C. FIG. 2C shows a 3-D plot of cluster primary criteria for assigning B-precursor ALL patients to membership from different clustering methods. Each of the risk groups. These groups have been further refined by the three clustering methods is shown on an axis: identification of sentinel genetic alterations (e.g., BCR/ABL HC=hierarchical clusters, RC=ROSE/COPA clusters and or TEL/AML1 fusions) and the rate of response to initial Vx=VxInsight clusters. Cluster numbers are given across treatment." The considerable diversity and varying responses 60 each axis with the exception of RC9, which represents cluster to therapy has led to an effort to further refine risk stratifica 2A. tion. Molecular techniques are being explored in order to FIG.3 shows Kaplan-Meier plots for clusters with aberrant classify patients on the basis of their leukemic cell gene outcome. RFS survival are shown for cluster 6 (Panel A) and expression signatures. Previous microarray studies have cluster 8 (Panel B) for patients identified by multiple algo not only been effective in the identification of subtypes of 65 rithms. In panel 3B, the data for all 207 samples are shown leukemia, but in Some cases they have also found these sig with the line furthermost to the right. In panel 3B, H8 is natures to be associated with outcome." represented by the central line in the graph, V8 is represented US 8,568,974 B2 3 4 by the line second from the right, R8 is represented by the line which MUC4, GPR110 and IGJ are particularly important running from the top of the graph to the bottom and is fur prognostic genes of therapeutic failure within this group. thermost to the left and C8 is represented by the line which These appear in Table 1P, hereinbelow. overlaps with R8 on the left of the graph. In certain embodiments, the amount of the prognostic gene FIG. 4 shows the validation of ROSE in CCG 1961 data set. is determined by the quantitation of a transcript encoding the In Panel A a heat map generated as described in FIG. 2B sequence of the prognostic gene; or a polypeptide encoded by identifies groups of samples with similar patterns of genes the transcript. The quantitation of the transcript can be based expression. The colored boxes indicate the clusters with simi on hybridization to the transcript. The quantitation of the larities to those shown in the primary data set. In Panel B the polypeptide can be based on antibody detection or a related RFS curve for cluster R8 in Panel A is shown in red, while the 10 method. The method optionally comprises a step of amplify RFS for samples not in that group is shown in black. ing nucleic acids from the tissue sample before the evaluating (per analysis). In a number of embodiments, the evaluating is BRIEF DESCRIPTION OF THE INVENTION of a plurality of prognostic genes, preferably at least two (2) prognostic genes, at least three (3) prognostic genes, at least Accurate risk stratification constitutes the fundamental 15 four (4) prognostic genes, at least five (5) prognostic genes, at paradigm of treatment in acute lymphoblastic leukemia least six (6) prognostic genes, at least seven (7) prognostic (ALL), allowing the intensity of therapy to be tailored to the genes, at least eight (8) prognostic genes, at least nine (9) patient’s risk of relapse. The present invention evaluates a prognostic genes, at least ten (10) prognostic genes, at least gene expression profile and identifies prognostic genes of eleven (11) prognostic genes, at least twelve (12) prognostic cancers, in particular leukemia, more particularly high risk genes, at least thirteen (13) prognostic genes, at least fourteen B-precursor acute lymphoblastic leukemia (B-ALL), includ (14) prognostic genes, at least fifteen (15) prognostic genes, ing high risk pediatric acute lymphoblastic leukemia. The at least sixteen (16) prognostic genes, at least seventeen (17) present invention provides a method of determining the exist prognostic genes, at least eighteen (18) prognostic genes, at ence of high risk B-precursor ALL in a patient and predicting least nineteen (19) prognostic genes, at least twenty (20) therapeutic outcome of that patient. The method comprises 25 prognostic genes, at least twenty-one (21) prognostic genes, the steps of first establishing the threshold value of at least two at least twenty-two (22) prognostic genes, at least twenty (2) or three (3) prognostic genes of high risk B-ALL, or four three (23) prognostic genes, including as many as twenty-four (4) prognostic genes, at least five (5) prognostic genes, at least (24) or more prognostic genes. The prognosis which is deter 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least mined from measuring the prognostic genes contributes to 12, at least 13, at least 13, at least 14, at least 15, at least 16, 30 selection of atherapeutic strategy, which may be a traditional at least 17, at least 18, at least 19, at least 20, at least 21, at least therapy for B-precursor ALL (where a favorable prognosis is 22, at least 23 and 24 or more prognostic genes which are determined from measurements), or a more aggressive described in the present specification, especially Table 1 Pand therapy based upon a traditional therapy or non-traditional 1F. Then, the amount of the prognostic gene(s) from a patient therapy (where an unfavorable prognosis is determined from inflicted with high risk B-ALL is determined. The amount of 35 measurements). the prognostic gene present in that patient is compared with The present invention is directed to methods for outcome the established threshold value (a predetermined value) of the prediction and risk classification in leukemia, especially a prognostic gene(s) which is indicative of therapeutic Success high risk classification in B precursor acute lymphoblastic or failure, whereby the prognostic outcome of the patient is leukemia (ALL), especially in children. In one embodiment, determined. The prognostic gene may be a gene which is 40 the invention provides a method for classifying leukemia in a indicative of a poor (bad) prognostic outcome (Table 1P) or a patient that includes obtaining a biological sample from a favorable (good) outcome (Table 1G). Analyzing expression patient; determining the expression level for a selected gene levels of these genes provides accurate insight (diagnostic product, more preferably a group of selected gene products to and prognostic) information into the likelihood ofatherapeu yield an observed gene expression level; and comparing the tic outcome, especially in a high risk B-ALL patient. 45 observed gene expression level for the selected gene Prognostic genes which are indicative of therapeutic Suc product(s) to control gene expression levels (preferably cess in high risk B-ALL include the following: AGAP1 (Arf including a predetermined level). The control gene expres GAP with GTPase domain, ankyrin repeat and PH domain 1, sion level can be the expression level observed for the gene referred to as CENTG2 herein); PTPRM (protein tyrosine product(s) in a control sample, or a predetermined expression phosphatase, receptor type, M); STAP1 (signal transducing 50 level for the gene product. An observed expression level adaptor family member 1); CCNJ (cyclin J). PCDH17 (pro (higher or lower) that differs from the control gene expression cadherin 17); MCAM (melanoma cell adhesion molecule); level is indicative of a disease classification. In another CAPN3 (calpain3); CABLES1 (CdkS and Ablenzyme sub aspect, the method can include determining a gene expression strate 1); GPR155 (G protein-coupled receptor 155). These profile for selected gene products in the biological sample to appear in Table 1G, hereinbelow. 55 yield an observed gene expression profile; and comparing the Prognostic genes which are indicative of therapeutic fail observed gene expression profile for the selected gene prod ure in high risk B-ALL include the following: MUC4 (mucin ucts to a control gene expression profile for the selected gene 4); GPR110 (G protein-coupled receptor 110); IGJ (immu products that correlates with a disease classification, for noglobulin J polypeptide); NRXN3 (neurexin 3); CD99 example ALL, and in particular high risk B precursor ALL; (CD99 molecule); CRLF2 (cytokine receptor-like factor 2); 60 wherein a similarity between the observed gene expression ENAM (enamel in); TP53INP1 (tumor protein p53 inducible profile and the control gene expression profile is indicative of nuclear protein 1); IFITM1 (interferon induced transmem the disease classification (e.g., high risk B-all poor or favor brane protein 1); IFITM2 (interferon induced transmembrane able prognostic). protein 2); IFITM3 (interferon induced transmembrane pro The disease classification can be, for example, a classifi tein 3); TTYH2 (tweety homolog 2): SEMA6A (semaphorin 65 cation preferably based on predicted outcome (remission VS 6A); TNFSF4 (tumor necrosis factor superfamily, member therapeutic failure); but may also include a classification 4); and SLC37A3 (solute carrier family 37, member 3), of based upon clinical characteristics of patients, a classification US 8,568,974 B2 5 6 based on karyotype; a classification based on leukemia Sub to Tables 1P and 1G. In certain preferred aspects of the inven type; or a classification based on disease etiology. Where the tion, the expression levels of all 24 gene products (Tables 1P classification is based on disease outcome, the observed gene and 1G) may be determined and compared to a predetermined product is preferably a gene product selected from at least two gene expression level, wherein a measurement above or or three of the following group of five gene products, more below a predetermined expression level is indicative of the preferably three, four or all five gene products: MUC4 (Mu likelihood of a favorable therapeutic response (continuous cin 4, cell surface associated), GRP110 (G protein-coupled complete remission or CCR) ortherapeutic failure. In the case receptor 110), IGJ (Immunoglobulin J polypeptide, linker where therapeutic failure is predicted, the use of more aggres protein for immunoglobulin alpha and mu polypeptides), sive protocols of traditional anti-cancer therapies (higher CENTG2 (Centaurin, gamma 2) and PTPRM (protein 10 doses and/or longer duration of drug administration) or tyrosine phosphatase, receptor type, M). Expression levels of experimental therapies may be advisable. at least two of the first three gene products (MUC4, GRP110, Optionally, the method further comprises determining the IGJ) which are higher than a control group evidence poor expression level for other gene products within the list of gene prognosis (poor responders to traditional anti-leukemia products otherwise disclosed herein and comparing in a simi therapy) for a therapeutic outcome using traditional therapy, 15 lar fashion the observed gene expression levels for the whereas expression levels of the last two gene products selected gene products with a control gene expression level (CENTG2, PTPRM) which are higher than a control group for those gene products, wherein an observed expression level evidence favorable (good responders to traditional anti-leu for these gene products that is different from (above or below) kemia therapy) prognosis to traditional therapy. Preferably at the control gene expression level for that gene product is least two gene products from the group are expressed, more further indicative of predicted remission (favorable progno preferably at least three, at least four and all five gene prod sis) or relapse (unfavorable prognosis). ucts. Alternatively, the invention may rely on measuring at The invention further includes a method for treating leu least two of the nine (9) gene products (including CENT2G kemia comprising administering to a leukemia patient a and PTPRM) of those listed in Table 1G (favorable therapeu therapeutic agent that modulates the amount or activity of the tic outcome), and/or at least two or more of the fifteen (15) 25 gene product(s) associated with therapeutic outcome, in par gene products of those listed in Table 1P (unfavorable thera ticular, MUC4, GPR110 (inhibited or downregulated) or peutic outcome) or any combination of the twenty-four (24) CENTG2 or PTPRM (enhanced or upregulated). Preferably, gene products which appear in Tables 1 P and 1F, below. the method modulates (enhancement/upregulation of a gene Measurement of all 24 gene products set forth in Table 1 Pand product associated with a favorable or good therapeutic out 1F, below, may also be performed to provide an accurate 30 come or inhibition/downregulation of a gene product associ assessment of therapeutic intervention. ated with a poor or unfavorable therapeutic outcome as mea The invention further provides for a method for predicting sured by comparison with a control sample or predetermined a patient falls within a particular group of high risk B-ALL value) at least two of the five gene products as set forth above, patients and predicting therapeutic outcome in that BALL three of the gene products, four of the gene products or all five leukemia patient, especially pediatric B-ALL that includes 35 of the gene products. In addition, the therapeutic method obtaining a biological sample from a patient; determining the according to the present invention also modulates at least two, expression level for selected gene products associated with three, four, five, six, seven, eight, nine, ten, eleven, twelve, outcome to yield an observed gene expression level; and thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nine comparing the observed gene expression level for the selected teen, twenty, twenty-one, twenty-two, twenty-three, twenty gene product(s) to a control gene expression level for the 40 four of a number of gene products in Tables 1 P and 1G as selected gene product. The control gene expression level for indicated or otherwise described herein, any one or more of the selected gene product can include the gene expression the gene products of Table 1 P: MUC4; GPR110; IGJ; level for the selected gene product observed in a control NRXN3: CD99; CRLF2: ENAM: TP53INP1; IFITM1; sample, or a predetermined gene expression level for the IFITM2; IFITM3; TTYH2: SEMA6A: TNFSF4; and selected gene product; wherein an observed expression level 45 SLC37A3 as being inhibited or downregulated and/or any that is different from the control gene expression level for the one or more of the gene products of Table 1 F: CENTG2: selected gene product(s) is indicative of predicted remission. PTPRM, STAP1; CCNJ; PCDH17; MCAM; CAPN3; The method preferably may determine gene expression levels CABLES1; GPR155 as being enhanced or upregulated as of at least two gene products selected from the group consist measured in comparison to a control expression or predeter ing of MUC4, GRP110, IGJ, CENT2G and PTPRM, more 50 mined value. preferably at least three, four or all five gene products. Alter Also provided by the invention is an in vitro method for natively, at least two, three, four, five, six, seven, eight, nine, screening a compound useful for treating leukemia, espe ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seven cially high risk B-ALL. The invention further provides an in teen, eighteen, nineteen, twenty, twenty-one, twenty-two, Vivo method for evaluating a compound for use in treating twenty-three or twenty-four or gene products selected from 55 leukemia, especially high risk B-ALL. The candidate com the group consisting of MUC4; GPR110; IGJ: NRXN3: pounds are evaluated for their effect on the expression level(s) CD99; CRLF2: ENAM: TP53INP1; IFITM1; IFITM2: of one or more gene products associated with outcome in IFITM3; TTYH2: SEMA6A: TNFSF4; SLC37A3; leukemia patients (for example, Table 1 P and 1G and as CENTG2: PTPRM, STAP1; CCNJ; PCDH17; MCAM: otherwise described herein), especially high risk B-ALL, CAPN3; CABLES1; and GPR155 or as otherwise described 60 preferably at least two of those gene products, at least three of herein are measured, compared to predetermined values (e.g. those gene products, at least four of those gene products, at from a control sample) and then assessed to determine the least five of those gene products, at least six of those gene likelihood of a favorable or unfavorable therapeutic outcome products, at least seven of those gene products, at least eight and then providing a therapeutic approach consistent with the of those gene products, at least nine of those gene products, at analysis of the express of the measured gene products. The 65 least ten of those gene products, at least eleven of those gene present method may include measuring expression of at least products, at least twelve of those gene products, at least two gene products up to 24 or more gene products according thirteen of those gene products, at least fourteen of those gene US 8,568,974 B2 7 8 products, at least fifteen of those gene products, at least six DETAILED DESCRIPTION OF THE INVENTION teen of those gene products, at least seventeen of those gene products, at least eighteen of those gene products, at least Gene expression profiling can provide insights into disease twenty of those gene products, at least twenty-one of those etiology and genetic progression, and can also provide tools gene products, at least twenty-two of those gene products, at 5 for more comprehensive molecular diagnosis and therapeutic least twenty-three of those gene products or twenty-four of targeting. The biologic clusters and associated gene profiles those gene products may be measured to determine a thera identified herein may be useful for refined molecular classi peutic outcome. fication of acute leukemias as well as improved risk assess The preferred five gene products are as identified for ment and classification, especially of high risk B precursor example, using probe sets (MUC4, GPR110, IGJ, CENTG2, 10 acute lymphoblastic leukemia (B-ALL), especially including PTPRM). These 5 genes and their expression above or below pediatric B-ALL. In addition, the invention has identified a predetermined expression level are more predictive of over numerous genes, including but not limited to the genes all outcome. As shown below, at least two or more of the gene MUC4 (Mucin 4, cell surface associated), GRP110 (G pro products which are presented in tables 1P or 1G may be used tein-coupled receptor 110), IGJ (ImmunoglobulinJ polypep to predict therapeutic outcome. This predictive model is 15 tide, linker protein for immunoglobulin alpha and mu tested in an independent cohort of high risk pediatric B-ALL polypeptides), CENTG2 (Centaurin, gamma 2), PTPRM cases (20) and is found to predict outcome with extremely (protein tyrosine phosphatase, receptor type, M), as well as high statistical significance (p-value <1.0). It is noted that numerous additional genes which are presented in Table 1P the expression of gene products of at least two of the five and 1G hereof, that are, alone or in combination, strongly genes listed above, as well as additional genes from the list predictive of therapeutic outcome in high risk B-ALL, and in appearing in Tables 1P and 1F and in certain preferred particular high risk pediatric B precursor ALL. The genes instances, the expression of all 24 gene products of Table 1P identified herein, and the gene products from said genes, and 1 F may be measured and compared to predetermined including proteins they encode, can be used to refine risk expression levels to provide the greater degrees of certainty of classification and diagnostics, to make outcome predictions 25 and improve prognostics, and to serve as therapeutic targets in a therapeutic outcome. infant leukemia and pediatric ALL, especially B-precursor ALL. TABLE 1P “Gene expression” as the term is used herein refers to the Poor Unfavorable Outcome production of a biological product encoded by a nucleic acid 30 sequence, such as a gene sequence. This biological product, Symbol GeneID Location referred to herein as a 'gene product may be a nucleic acid MUC4 mucin 4 4585 3q29 or a polypeptide. The nucleic acid is typically an RNA mol GPR110 G protein-coupled receptor 110 266977 6p12 ecule which is produced as a transcript from the gene IG immunoglobulin J polypeptide 3512 4q21 NRXN3 neurexin 3 9369 14q31 sequence. The RNA molecule can be any type of RNA mol CD99 CD99 molecule 4267 Xp22: Yp 11 35 ecule, whether either before (e.g., precursor RNA) or after CRLF2 cytokine receptor-like factor 2 64109 Xp22: Yp 11 (e.g., mRNA) post-transcriptional processing. cDNA pre ENAM enamelin 101 17 4q13 pared from the mRNA of a sample is also considered a gene TP53INP1 tumor protein p53 inducible nuclear 94241 8q22 product. The polypeptide gene product is a peptide or protein protein 1 IFITM1 interferon induced transmembrane 8519 11p15 that is encoded by the coding region of the gene, and is protein 1 40 produced during the process of translation of the mRNA. IFITM2 interferon induced transmembrane 10581 11p15 The term “gene expression level” refers to a measure of a protein 2 gene product(s) of the gene and typically refers to the relative IFITM3 interferon induced transmembrane 10410 11 p.15 protein 3 or absolute amount or activity of the gene product. TTYH2 tweety homolog 2 94.015 17q25 The term “gene expression profile' as used herein is SEMA6A semaphorin 6A 57.556 5q23 45 defined as the expression level of two or more genes. The term TNFSF4 tumor necrosis factor Superfamily, 7292 1925 gene includes all natural variants of the gene. Typically a gene member 4 expression profile includes expression levels for the products SLC37A3 solute carrier family 37, member 3 84255 7q34 of multiple genes in given sample, up to about 13,000, pref erably determined using an oligonucleotide microarray. 50 Unless otherwise specified, “a,” “an.” “the and “at least TABLE 1G one' are used interchangeably and mean one or more than OC. Good Favorable Outcome The term “patient' shall mean within context an animal, Symbol GeneID Location preferably a mammal, more preferably a human patient, more 55 preferably a human child who is undergoing or will undergo AGAP1 Arf6AP with GTPase domain, 116987 2a37 ankyrin repeat and PH domain 1 therapy or treatment for leukemia, especially high risk B-pre (aka CENTG2) cursor acute lymphoblastic leukemia. PTPRM protein tyrosine phosphatase, 5797 18p11 The term “high risk B precursor acute lymphocytic leuke receptor type, M STAP1 signal transducing adaptor family 26228 4q13 mia' or “high risk B-ALL refers to a disease state of a patient member 1 60 with acute lymphoblastic leukemia who meets certain high CCNJ cyclin J 54619 10pter-q26 risk disease criteria. These include: confirmation of B-pre PCDH17 procadherin 17 27253 13q21 cursor ALL in the patient by central reference laboratories MCAM melanoma cell adhesion molecule 4162 11q23 CAPN3 calpain 3 825 15q15-q21 (See Borowitz, et al., Rec Results Cancer Res 1993: 131: CABLES1 ColkS and Abl enzyme substrate 1 91768 18q11 257-267); and exhibiting a leukemic cell DNA index of GPR155 G protein-coupled receptor 155 151556 2q31 65 s0-1.16 (DNA content in leukemic cells: DNA content of normal Go/G cells) (DI) by central reference laboratory (See, Trueworthy, et al., J Clin Oncol 1992: 10: 606-613; and US 8,568,974 B2 9 10 Pullen, et al., “Immunologic phenotypes and correlation with blastic leukemia, for markers which subset into responsive treatment results’. In Murphy S. B. Gilbert J R (eds). Leuke and non-responsive groups for particular treatments is very mia Research. Advances in Cell Biology and Treatment. useful. Elsevier: Amsterdam, 1994, pp. 221-239) and at least one of Current models of leukemia classification have become the following: (1) WBC>10000-99 000/ul, aged 1-2.99 years better at distinguishing between cancers that have similar orages 6-21 years; (2)WBC>100 000/ul, aged 1-21 years; (3) histopathological features but vary in clinical course and out all patients with CNS or overt testicular disease at diagnosis: come, except in certain areas, one of them being in high risk or (4) leukemic cell chromosome translocations t(1:19) or B-precursor acute lymphoblastic leukemia (B-ALL). Identi t(9:22) confirmed by central reference laboratory. (See, Crist, fication of novel prognostic molecular markers is a priority if et al, Blood 1990; 76: 117-122; and Fletcher, et al., Blood 10 radical treatment is to be offered on a more selective basis to 1991; 77:435-439). those high risk leukemia patients with disease states which do The term “traditional therapy” relates to therapy (protocol) not respond favorably to conventional therapy. A novel strat which is typically used to treat leukemia, especially B-pre egy is described to discover/assess/measure molecular mark cursor ALL (including pediatric B-ALL) and can include ers for B-ALL leukemia, especially high risk B-ALL to deter Memorial Sloan-Kettering New York II therapy (NY II), 15 mine a treatment protocol, by assessing gene expression in UKALLR2, AL 841, AL851, ALHR88, MCP841 (India), as leukemia patients and modeling these databased on a prede well as modified BFM (Berlin-Frankfurt-Münster) therapy, termined gene product expression for numerous patients hav BMF-95 or other therapy, including ALinC 17 therapy as is ing a known clinical outcome. The invention herein is well-known in the art. In the present invention the term “more directed to defining different forms of leukemia, in particular, aggressive therapy’ or "alternative therapy' usually means a B-precursor acute lymphoblastic leukemia, especially high more aggressive version of conventional therapy typically risk B-precursor acute lymphoblastic leukemia, including used to treat leukemia, for example B-ALL, including pedi high risk pediatric B-ALL by measuring expression gene atric B-precursor ALL, using for example, conventional or products which can translate directly into therapeutic prog traditional chemotherapeutic agents at higher dosages and/or nosis. Such prognosis allows for application of a treatment for longer periods of time in order to increase the likelihood of 25 regimen having a greater statistical likelihood of cost effec a favorable therapeutic outcome. It may also refer, in context, tive treatments and minimization of negative side effects from to experimental therapies for treating leukemia, rather than the different/various treatment options. simply more aggressive versions of conventional (traditional) In preferred aspects, the present invention provides an therapy. improved method for identifying and/or classifying acute Diagnosis, Prognosis and Risk Classification 30 leukemias, especially B precursor ALL, even more especially Current parameters used for diagnosis, prognosis and risk high risk B precursor ALL and also high risk pediatric B classification in pediatric ALL are related to clinical data, precursor ALL and for providing an indication of the thera cytogenetics and response to treatment. They include age and peutic outcome of the patient based upon an assessment of white blood count, cytogenetics, the presence or absence of expression levels of particular genes. Expression levels are minimal residual disease (MRD), and a morphological 35 determined for two or more genes associated with therapeutic assessment of early response (measured as slow or rapid early outcome, risk assessment or classification, karyotpe (e.g., therapeutic response). As noted above however, these param MLL translocation) or Subtype (e.g., B-ALL, especially high eters are not always well correlated with outcome, nor are risk B-ALL). Genes that are particularly relevant for diagno they precisely predictive at diagnosis. sis, prognosis and risk classification, especially for high risk Prognosis is typically recognized as a forecast of the prob 40 B precursor ALL, including high risk pediatric B precursor able course and outcome of a disease. As such, it involves ALL, according to the invention include those described in inputs of both statistical probability, requiring numbers of the tables (especially Table 1 P and 1G) and figures herein. samples, and outcome data. In the present invention, outcome The gene expression levels for the gene(s) of interest in a data is utilized in the form of continuous complete remission biological sample from a patient diagnosed with or Suspected (CCR) of ALL or therapeutic failure (non-CCR). A patient 45 of having an acute leukemia, especially B precursor ALL are population of hundreds is included, providing statistical compared to gene expression levels observed for a control power. sample, or with a predetermined gene expression level. The ability to determine which cases of leukemia, espe Observed expression levels that are higher or lower than the cially high risk B precursor acute lymphoblastic leukemia expression levels observed for the gene(s) of interest in the (B-ALL), including high risk pediatric B-ALL will respond 50 control sample or that are higher or lower than the predeter to treatment, and to which type of treatment, would be useful mined expression levels for the gene(s) of interest (as set forth in appropriate allocation of treatment resources. It would also in Table 1 P and 1G) provide information about the acute provide guidance as to the aggressiveness of therapy in pro leukemia that facilitates diagnosis, prognosis, and/or risk ducing a favorable outcome (continuous complete remission classification and can aid in treatment decisions, especially or CCR). As indicated above, the various standard therapies 55 whether to use a more of less aggressive therapeutic regimen have significantly different risks and potential side effects, or perhaps even an experimental therapy. When the expres especially therapies which are more aggressive or even sion levels of multiple genes are assessed for a single biologi experimental in nature. Accurate prognosis would also mini cal sample, a gene expression profile is produced. mize application of treatment regimens which have low like In one aspect, the invention provides genes and gene lihood of Success and would allow a more efficient aggressive 60 expression profiles that are correlated with outcome (i.e., or even an experimental protocol to be used without wasting complete continuous remission or good/favorable prognosis effort on therapies unlikely to produce a favorable therapeutic vs. therapeutic failure or poor/unfavorable prognosis) in high outcome, preferably a continuous complete remission. Such risk B-ALL. Assessment of at least two or more of these genes also could avoid delay of the application of alternative treat according to the invention, preferably at least three, at least ments which may have higher likelihoods of Success for a 65 four, at least five, six, seven, eight, nine, ten, eleven, twelve, particular presented case. Thus, the ability to evaluate indi thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nine vidual leukemia cases, especially B-precursor acute lympho teen, twenty, twenty-one, twenty-two, twenty-three, twenty US 8,568,974 B2 11 12 four or more as set forth in Tables 1P and 1F in a given gene family 37, member 3) are measured and compared with pre profile can be integrated into revised risk classification determined values for each of the gene products measured. schemes, therapeutic targeting and clinical trial design. In one Any number of genes may be measured, with at least two embodiment, the expression levels of a particular gene (gene genes being measured in the 15 genes listed. In preferred products) are measured, and that measurement is used, either aspects, the expression of all fifteen genes is measured. alone or with other parameters, to assign the patient to a Expression levels for multiple genes can be measured. For particular risk category (e.g., high risk B-ALL good/favor example, if normalized expression levels for (MUC4 (mucin able or high risk B-ALL poor/unfavorable). The invention 4); GPR110 (G protein-coupled receptor 110); IGJ (immu identifies several genes whose expression levels, either alone noglobulin J polypeptide); NRXN3 (neurexin 3); CD99 or in combination, are associated with outcome, including but 10 (CD99 molecule); CRLF2 (cytokine receptor-like factor 2); not limited to at least two genes, preferably at least three ENAM (enamelin); TP53INP1 (tumor protein p53 inducible genes, four genes and preferably all five genes genes selected nuclear protein 1); IFITM1 (interferon induced transmem from the group consisting of MUC4, GPR110, IGJ, CENTG2 brane protein 1); IFITM2 (interferon induced transmembrane and PTPRM. protein 2); IFITM3 (interferon induced transmembrane pro The prognostic genes for purposes of the present invention 15 tein 3); TTYH2 (tweety homolog 2): SEMA6A (semaphorin are selected from the group consisting of MUC4 (mucin 4); 6A); TNFSF4 (tumor necrosis factor superfamily, member GPR110 (G protein-coupled receptor 110); IGJ (immunoglo 4); and SLC37A3 (solute carrier family 37, member 3) are bulin J polypeptide); NRXN3 (neurexin 3); CD99 (CD99 higher than a predetermined value (higher expression levels molecule); CRLF2 (cytokine receptor-like factor 2); ENAM of these genes are predictive of therapeutic failure), an unfa (enamelin); TP53INP1 (tumor protein p53 inducible nuclear Vorable outcome can be predicted with greater certainty. In protein 1); IFITM1 (interferon induced transmembrane pro the case of the genes which are listed in Table 1G, which are tein 1); IFITM2 (interferon induced transmembrane protein genes predicting a favorable therapeutic outcome, if expres 2); IFITM3 (interferon induced transmembrane protein 3): sion levels of at least two of AGAP1 (ArfGAP with GTPase TTYH2 (tweety homolog 2): SEMA6A (semaphorin 6A); domain, ankyrin repeat and PH domain 1, aka CENTG2); TNFSF4 (tumor necrosis factor superfamily, member 4); 25 PTPRM (protein tyrosine phosphatase, receptor type, M); SLC37A3 (solute carrier family 37, member 3) which are STAP1 (signal transducing adaptor family member 1); CCNJ poor outcome predictors and AGAP1 (ArfGAP with GTPase (cyclin J); PCDH17 (procadherin 17); MCAM (melanoma domain, ankyrin repeat and PH domain 1, aka CENTG2); cell adhesion molecule); CAPN3 (calpain 3); CABLES1 PTPRM (protein tyrosine phosphatase, receptor type, M); (CdkS and Ablenzyme substrate 1); and GPR155 (G protein STAP1 (signal transducing adaptor family member 1); CCNJ 30 coupled receptor 155) are higher than a predetermined value, (cyclin J); PCDH17 (procadherin 17); MCAM (melanoma a more favorable outcome may be predicted. Preferably, at cell adhesion molecule); CAPN3 (calpain 3); CABLES1 least two of MUC4, GPR110 and IGJ are measured and (CdkS and Ablenzyme substrate 1); and GPR155 (G protein alternatively, both CENTG2 and PTPRM are measured and coupled receptor 155) which are favorable/good outcome compared to predetermined values. Preferably, at least three predictors. 35 of these gene produces are measured and compared to prede Some of these genes (e.g., those genes which are set forth termined values. in Table 1G) exhibit a positive association between expres In general, the expression of at least two genes in a single sion level and outcome. For these genes, expression levels group is measured and compared to a predetermined value to above a predetermined threshold level (or higher than that provide a therapeutic outcome prediction and in addition to exhibited by a control sample) is predictive of a positive 40 those two genes, the expression of any number of additional outcome (continuous complete remission). Our data Suggests genes described in Tables 1 P and 1G can be measured and that direct measurement of the expression level of at two or used for predicting therapeutic outcome. In certain aspects of more of these genes, preferably at least including CENTG2 the invention where very high reliability is desired/required, and PTPRM, more preferably at least three of those genes, at the expression levels of all 24 genes (as per Tables 1P and 1F) least four, at least five, at least six, at least seven, at least eight 45 may be measured and compared with a predetermined value and all nine of these genes more preferably all nine of these for each of the genes measured Such that a measurement genes, can be used in refining risk classification and outcome above or below the predetermined value of expression for prediction in high risk B precursor ALL. In particular, it is each of the group of genes is indicative of a favorable thera expected Such measurements can be used to refine risk clas peutic outcome (continuous complete remission) or a thera sification in children who are otherwise classified as having 50 peutic failure. In the event of a predictive favorable therapeu high risk B-ALL, but who can respond favorable (cured) with tic outcome, conventional anti-cancer therapy may be used traditional, less intrusive therapies. and in the event of a predictive unfavorable outcome (failure), MUC4, GPR110, IGJ, in particular, are strong predictors of more aggressive therapy may be recommended and imple an unfavorable outcome for a high risk B-ALL patient and mented. therefore in preferred aspects, the expression of at least three 55 The expression levels of multiple (two or more, preferably genes, and preferably the expression of at least two of those three or more, more preferably at least five genes as described three genes among the fifteen (genes) which are set forth in hereinabove and in addition to the five, up to twenty-four Table 1 P: (MUC4 (mucin 4); GPR110 (G protein-coupled genes within the genes listed in Tables 1P and 1F in one or receptor 110); IGJ (immunoglobulin J polypeptide); NRXN3 more lists of genes associated with outcome can be measured, (neurexin 3); CD99 (CD99 molecule); CRLF2 (cytokine 60 and those measurements are used, either alone or with other receptor-like factor 2); ENAM (enamelin); TP53INP1 (tumor parameters, to assign the patient to a particular risk category protein p53 inducible nuclear protein 1); IFITM1 (interferon as it relates to a predicted therapeutic outcome. For example, induced transmembrane protein 1); IFITM2 (interferon gene expression levels of multiple genes can be measured for induced transmembrane protein 2); IFITM3 (interferon a patient (as by evaluating gene expression using an Affyme induced transmembrane protein3); TTYH2 (tweety homolog 65 trix microarray chip) and compared to a list of genes whose 2); SEMA6A (semaphorin 6A); TNFSF4 (tumor necrosis expression levels (high or low) are associated with a positive factor superfamily, member 4); and SLC37A3 (solute carrier (or negative) outcome. If the gene expression profile of the US 8,568,974 B2 13 14 patient is similar to that of the list of genes associated with expression of these fifteen genes which is higher than prede outcome, then the patient can be assigned to a low risk (favor termined values for each of these genes is predictive of a high able outcome) or high risk (unfavorable outcome) category. likelihood of a therapeutic failure using traditional B precur The correlation between gene expression profiles and class sor ALL therapies. High expression for these fifteen genes distinction can be determined using a variety of methods. would dictate an early aggressive therapy or experimental Methods of defining classes and classifying samples are therapy in order to increase the likelihood of a favorable described, for example, in Golub et al., U.S. Patent Applica therapeutic outcome. tion Publication No. 2003/0017481 published Jan. 23, 2003, Some genes in these clusters are metabolically related, and Golub et al., U.S. Patent Application Publication No. Suggesting that a metabolic pathway that is associated with 2003/0134300, published Jul. 17, 2003. The information pro 10 cancer initiation or progression. Other genes in these meta vided by the present invention, alone or in conjunction with bolic pathways, like the genes described herein but upstream other test results, aids in sample classification and diagnosis or downstream from them in the metabolic pathway, thus can of disease. also serve as therapeutic targets. Computational analysis using the gene lists and other data, In yet another aspect, the invention provides genes and Such as measures of statistical significance, as described 15 gene expression profiles which may be used to discriminate herein is readily performed on a computer. The invention high risk B-ALL from acute myeloid leukemia (AML) in should therefore be understood to encompass machine read infant leukemias by measuring the expression levels of the able media comprising any of the data, including gene lists, gene product(s) correlated with B-ALL as otherwise described herein. The invention further includes an apparatus described herein, especially B-precursor ALL. that includes a computer comprising such data and an output It should be appreciated that while the present invention is device Such as a monitor or printerfor evaluating the results of described primarily in terms of human disease, it is useful for computational analysis performed using Such data. diagnostic and prognostic applications in other mammals as In another aspect, the invention provides genes and gene well, particularly in veterinary applications such as those expression profiles that are correlated with cytogenetics. This related to the treatment of acute leukemia in cats, dogs, cows, allows discrimination among the various karyotypes, such as 25 pigs, horses and rabbits. MLL translocations or numerical imbalances such as hyper Further, the invention provides methods for computational diploidy or hypodiploidy, which are useful in risk assessment and statistical methods for identifying genes, lists of genes and outcome prediction. and gene expression profiles associated with outcome, karyo In yet another aspect, the invention provides genes and type, disease Subtype and the like as described herein. gene expression profiles that are correlated with intrinsic 30 In Sum, the present invention has identified a group of disease biology and/or etiology. In other words, gene expres genes which strongly correlate with favorable/unfavorable sion profiles that are common or shared among individual outcome in B precursor acute lymphoblastic leukemia and leukemia cases in different patients can be used to define contribute unique information to allow the reliable prediction intrinsically related groups (often referred to as clusters) of of a therapeutic outcome in high risk B precursor ALL, espe acute leukemia that cannot be appreciated or diagnosed using 35 cially high risk pediatric B precursor ALL. standard means such as morphology, immunophenotype, or Measurement of Gene Expression Levels cytogenetics. Mathematical modeling of the very sharp peak Gene expression levels are determined by measuring the in ALL incidence seen in children 2-3 years old (>80 cases per amount or activity of a desired gene product (i.e., an RNA or million) has suggested that ALL may arise from two primary a polypeptide encoded by the coding sequence of the gene) in events, the first of which occurs in utero and the second after 40 a biological sample. Any biological sample can be analyzed. birth (Linetet al., Descriptive epidemiology of the leukemias, Preferably the biological sample is a bodily tissue or fluid, in Leukemias, 5' Edition. ES Henderson et al. (eds). WB more preferably it is a bodily fluid such as blood, serum, Saunders, Philadelphia. 1990). Interestingly, the detection of plasma, urine, bone marrow, lymphatic fluid, and CNS or certain ALL-associated genetic abnormalities in cord blood spinal fluid. Preferably, Samples containing mononuclear samples taken at birth from children who are ultimately 45 bloods cells and/or bone marrow fluids and tissues are used. affected by disease supports this hypothesis (Gale et al., Proc. In embodiments of the method of the invention practiced in Natl. Acad. Sci. U.S.A., 94:13950-13954, 1997: Ford et al., cell culture (such as methods for screening compounds to Proc. Natl. Acad. Sci. U.S.A., 95:4584-4588, 1998). identify therapeutic agents), the biological sample can be The results for pediatric B precursor ALL suggest that this whole or lysed cells from the cell culture or the cell superna disease is composed of novel intrinsic biologic clusters 50 tant. defined by shared gene expression profiles, and that these Gene expression levels can be assayed qualitatively or intrinsic subsets cannot reliably be defined or predicted by quantitatively. The level of a gene product is measured or traditional labels currently used for risk classification or by estimated in a sample either directly (e.g., by determining or the presence or absence of specific cytogenetic abnormalities. estimating absolute level of the gene product) or relatively We have identified 24 genes for determining outcome in high 55 (e.g., by comparing the observed expression level to a gene risk B-ALL, and in particular high risk pediatric B precursor expression level of another samples or set of samples). Mea ALL using the methods set forth hereinbelow, for identifying Surements of gene expression levels may, but need not, candidate genes associated with classification and outcome. include a normalization process. We have identified 9 genes (Table 1G) which are positive Typically, mRNA levels (or cDNA prepared from such predictors of favorable outcome in high risk B precursor ALL 60 mRNA) are assayed to determine gene expression levels. patients, especially high risk pediatric B precursor ALL Methods to detect gene expression levels include Northern patients. Expression of two or more of these genes which is blot analysis (e.g., Harada et al., Cell 63:303-312 (1990)), S1 greater than a predetermined value or from a control is indica nuclease mapping (e.g., Fujita et al., Cell 49:357-367 (1987)), tive that traditional B-ALL therapy is appropriate for treating polymerase chain reaction (PCR), reverse transcription in the patients B precursor ALL. In addition, the present inven 65 combination with the polymerase chain reaction (RT-PCR) tion has identified fifteen (15) genes (see Table 1P) which (e.g., Example III; see also Makino et al., Technique 2:295 correlate with failed therapy. Thus, a measurement of the 301 (1990)), and reverse transcription in combination with US 8,568,974 B2 15 16 the ligase chain reaction (RT-LCR). Multiplexed methods regimen. That regimen may be that traditionally used for the that allow the measurement of expression levels for many treatment of leukemia (as discussed hereinabove) in the case genes simultaneously are preferred, particularly in embodi where the analysis of gene products from samples taken from ments involving methods based on gene expression profiles the patient predicts a favorable therapeutic outcome, or alter comprising multiple genes. In a preferred embodiment, gene natively, the chosen regimen may be a more aggressive expression is measured using an oligonucleotide microarray, approach (e.g., higher dosages of traditional therapies for such as a DNA microchip. DNA microchips contain oligo longer periods of time) or even experimental therapies in nucleotide probes affixed to a solid substrate, and are useful instances where the predictive outcome is that of failure of for Screening a large number of samples for gene expression. therapy. DNA microchips comprising DNA probes for binding poly 10 In addition, the present invention may provide new treat nucleotide gene products (mRNA) of the various genes from ment methods, agents and regimens for the treatment of leu Table 1 are additional aspects of the present invention. kemia, especially high risk B-precursor acute lymphoblastic Alternatively or in addition, polypeptide levels can be leukemia, especially high risk pediatric B-precursor ALL. assayed. Immunological techniques that involve antibody The genes identified herein that are associated with outcome binding. Such as enzyme linked immunosorbent assay 15 and/or specific disease Subtypes or karyotypes are likely to (ELISA) and radioimmunoassay (RIA), are typically have a specific role in the disease condition, and hence rep employed. Where activity assays are available, the activity of resent novel therapeutic targets. Thus, another aspect of the a polypeptide of interest can be assayed directly. invention involves treating high risk B-ALL patients, includ As discussed above, the expression levels of these markers ing high risk pediatric ALL patients by modulating the in a biological sample may be evaluated by many methods. expression of one or more genes described herein in Table 1P They may be evaluated for RNA expression levels. Hybrid or 1F to a desired expression level or below. ization methods are typically used, and may take the form of In the case of those gene products (Table 1 P and 1F) whose a PCR or related amplification method. Alternatively, a num increased or decreased expression (whether above or below a ber of qualitative or quantitative hybridization methods may predetermined value, for example obtained for a control be used, typically with some standard of comparison, e.g., 25 sample) is associated with a favorable outcome or failure, the actin message. Alternatively, measurement of protein levels treatment method of the invention will involve enhancing the may performed by many means. Typically, antibody based expression of those gene products in which a favorable thera methods are used, e.g., ELISA, radioimmunoassay, etc., peutic outcome is predicted by Such enhancement and inhib which may not require isolation of the specific marker from iting the expression of those gene products in which enhanced other proteins. Other means for evaluation of expression lev 30 expression is associated with failed therapy. els may be applied. Antibody purification may be performed, Thus, in the case of CENTG2, PTPRMorother gene prod though separation of protein from others, and evaluation of ucts of Table 1G such as STAP1; CCNJ; PCDH17: MCAM: specific bands or peaks on protein separation may provide the CAPN3; CABLES1; and GPR155, increased expression of at same results. Thus, e.g., mass spectroscopy of a protein least two, at least three, at least four, at least five and prefer sample may indicate that quantitation of a particular peak will 35 ably all of these genes will be a therapeutic goal because allow detection of the corresponding gene product. Multidi enhanced expression of these genes together is predictive of a mensional protein separations may provide for quantitation favorable therapeutic outcome and in the case of MUC4: of specific purified entities. GPR110; IGJ; NRXN3: CD99; CRLF2: ENAM: TP53INP1; The observed expression levels for the gene(s) of interest IFITM1; IFITM2; IFITM3; TTYH2: SEMA6A: TNFSF4; are evaluated to determine whether they provide diagnostic or 40 and SLC37A3, decreased expression is the goal as high prognostic information for the leukemia being analyzed. The expression of genes, especially at least MUC4 and GPR110 evaluation typically involves a comparison between observed or MUC4, GPR110 and IGJ is a predictor of therapeutic gene expression levels and either a predetermined gene failure. The same is true for the expression products of the expression level or threshold value, or a gene expression level other genes in the list which are found in Table 1—those that characterizes a control sample (“predetermined value'). 45 which exhibit a favorable therapeutic outcome for high The control sample can be a sample obtained from a normal expression will be enhanced as a therapeutic goal, whereas as (i.e., non-leukemic) patient(s) or it can be a sample obtained those which exhibit a failed therapeutic outcome for high from a patient or patients with high risk B-ALL that has been expression will be inhibited as a therapeutic goal. cured. For example, ifa cytogenic classification is desired, the Thus, in the case of the 24 genes from Table 1 P and 1F, the biological sample can be interrogated for the expression level 50 increased or decreased expression levels for a particular gene of a gene correlated with the cytogenic abnormality, then as indicated in the table becomes a therapeutic goal in the compared with the expression level of the same gene in a treatment of leukemia, especially high risk B-precursor ALL patient known to have the cytogenetic abnormality (or an (especially pediatric B-precursor ALL). Therapeutic agents average expression level for the gene that characterizes that for effecting the increased or decreased expression levels may population). 55 be identified and used as alternative therapies to traditional The present study provides specific identification of mul treatment modalities for leukemia, especially high risk B-pre tiple genes whose expression levels in biological samples will cursor ALL and either the increased or decreased expression serve as markers to evaluate leukemia cases, especially thera of each of these genes will become a therapeutic goal for the peutic outcome in high risk B-ALL cases, especially high risk treatment of cancer or the development of agents for the pediatric B-ALL cases. These markers have been selected for 60 treatment of cancer. Thus, in this aspect of the present inven statistical correlation to disease outcome data on a large num tion, especially in high risk B precursor ALL (pediatric), the ber of leukemia (high risk B-ALL) patients as described treatment method of the invention involves enhancing or herein. inhibiting at least one of the gene product of expression as Treatment of Infant Leukemia and Pediatric B-Precursor such gene expression is described in Table 1 Pand/or 1F with ALL 65 a therapeutic outcome. In preferred aspects, the therapeutic The genes identified herein that are associated with out method preferably enhances expression at least one of the come of a disease state may provide insight into a treatment genes in Table 1G (preferably CENTG2 and/or PTPRM) or US 8,568,974 B2 17 18 alternatively inhibits the expression of one of the genes in promoter. It can be, but need not be, heterologous with respect table 1P (preferably at least one of MUC4, GPR110 and/or to the cell to which it is introduced. IGJ) in order to promote a more favorable therapeutic out Another option for increasing the expression of a gene like come. In addition to these five genes, expression of at least CENTG2, PTPRM or one or more gene products as described one additional gene and preferably as many as 19 additional in Table 1G (CENTG2; PTPRM, STAP1; CCNJ; PCDH17; genes (totally 24 genes) from the list in Tables 1F and/or 1P MCAM; CAPN3; CABLES1; and/or GPR155) wherein (high expression CCR or favorable outcome is desirable, low higher expression levels are predictive for favorable outcome expression of failure is desirable) can be influenced to provide is to reduce the amount of methylation of the gene. Demethy alternative therapies and anti-cancer agents. lation agents, therefore, may be used to re-activate the expres For a number (nine) of the gene products identified herein, 10 sion of one or more of the gene products in cases where as set forth in Table 1G above, increased expression is corre methylation of the gene is responsible for reduced gene lated with positive outcomes in leukemia patients. Thus, the expression in the patient. invention includes a method for treating leukemia, Such as For other genes identified herein as being correlated with high risk B-ALL including high risk pediatric B-ALL that therapeutic failure or without outcome in high risk B-ALL, involves administering to a patient a therapeutic agent that 15 Such as high risk pediatric B-ALL, high expression of the causes an increase in the amount or activity of at least one of gene is associated with a negative outcome rather than a CENTG2, PTPRM and/or other polypeptides of interest positive outcome. In the present invention, these genes/gene where high expression has been identified herein to be posi products (see Table 1P) are selected from the group consisting tively correlated with favorable outcome (CCR, see Table of MUC4; GPR110; IGJ; NRXN3: CD99; CRLF2: ENAM: 1G). Preferably the increase in amount or activity of the TP53INP1; IFITM1; IFITM2; IFITM3; TTYH2: SEMA6A: selected gene product is at least about 10%, preferably 25%, TNFSF4; and SLC37A3 at least two genes/gene products most preferably 100% above the expression level observed in from this list (especially including MUC4 and GPR110 or the patient prior to treatment. MUC4, GPR110 and/or IGJ), preferably at least three gene, at The therapeutic agent can be a polypeptide having the least 4 from this list, at least 5 from this list, at least 6 from this biological activity of the polypeptide of interest (e.g., 25 list, at least 7 from this list, at least 8, at least 9 at least 10, at CENTG2, PTPRM or other gene product) or a biologically least 11, at least 12, at least 13, at least 14 and all 15 genes/ active subunit or analog thereof. Alternatively, the therapeutic gene products from this list. In Such instances, where the agent can be a ligand (e.g., a small non-peptide molecule, a expression levels of these genes as described are high, the peptide, a peptidomimetic compound, an antibody, or the predicted therapeutic outcome in Such patients is therapeutic like) that agonizes (i.e., increases) the activity of the polypep 30 failure for traditional therapies. In such case, more aggressive tide of interest. For example, in the case of CENTG2, PTPRM approaches to traditional therapies and/or experimental thera or other gene product, these gene products may be adminis pies may be attempted. tered to the patient to enhance the activity and treat the The eight genes described above (negative outcome) patient. accordingly represent novel therapeutic targets, and the Gene therapies can also be used to increase the amount of 35 invention provides a therapeutic method for reducing (inhib a polypeptide of interest in a host cell of a patient. Polynucle iting) the amount and/or activity of these polypeptides of otides operably encoding the polypeptide of interest can be interest in a leukemia patient. Preferably the amount or activ delivered to a patient either as “naked DNA” or as part of an ity of the selected gene product is reduced to less than about expression vector. The term vector includes, but is not limited 90%, more preferably less than about 75%, most preferably to, plasmid vectors, cosmid vectors, artificial chromosome 40 less than about 25% of the gene expression level observed in vectors, or, in some aspects of the invention, viral vectors. the patient prior to treatment. Examples of viral vectors include adenovirus, herpes simplex A cell manufactures proteins by first transcribing the DNA virus (HSV), alphavirus, simian virus 40, picornavirus, vac of a gene for that protein to produce RNA (transcription). In cinia virus, retrovirus, lentivirus, and adeno-associated virus. eukaryotes, this transcript is an unprocessed RNA called pre Preferably the vector is a plasmid. In some aspects of the 45 cursor RNA that is Subsequently processed (e.g. by the invention, a vector is capable of replication in the cell to removal of introns, splicing, and the like) into messenger which it is introduced; in other aspects the vector is not RNA (mRNA) and finally translated by ribosomes into the capable of replication. In some preferred aspects of the desired protein. This process may be interfered with or inhib present invention, the vector is unable to mediate the integra ited at any point, for example, during transcription, during tion of the vector sequences into the genomic DNA of a cell. 50 RNA processing, or during translation. Reduced expression An example of a vector that can mediate the integration of the of the gene(s) leads to a decrease or reduction in the activity vector sequences into the genomic DNA of a cell is a retro of the gene product and, in cases where high expression leads viral vector, in which the integrase mediates integration of the to a therapeutic failure, an expected therapeutic Success. retroviral vector sequences. A vector may also contain trans The therapeutic method for inhibiting the activity of a gene poson sequences that facilitate integration of the coding 55 whose high expression (table 1) is correlated with negative region into the genomic DNA of a host cell. outcome/therapeutic failure involves the administration of a Selection of a vector depends upon a variety of desired therapeutic agent to the patient to inhibit the expression of the characteristics in the resulting construct, such as a selection gene. The therapeutic agent can be a nucleic acid, Such as an marker, vector replication rate, and the like. An expression antisense RNA or DNA, or a catalytic nucleic acid such as a vector optionally includes expression control sequences 60 ribozyme, that reduces activity of the gene product of interest operably linked to the coding sequence Such that the coding by directly binding to a portion of the gene encoding the region is expressed in the cell. The invention is not limited by enzyme (for example, at the coding region, at a regulatory the use of any particular promoter, and a wide variety is element, or the like) or an RNA transcript of the gene (for known. Promoters act as regulatory signals that bind RNA example, a precursor RNA or mRNA, at the coding region or polymerase in a cell to initiate transcription of a downstream 65 at 5' or 3' untranslated regions) (see, e.g., Golub et al., U.S. (3' direction) operably linked coding sequence. The promoter Patent Application Publication No. 2003/0134300, published used in the invention can be a constitutive or an inducible Jul. 17, 2003). Alternatively, the nucleic acid therapeutic US 8,568,974 B2 19 20 agent can encode a transcript that binds to an endogenous can be surveyed computationally after identification of a lead RNA or DNA; or encode an inhibitor of the activity of the drug to achieve rational drug design of even more effective polypeptide of interest. It is sufficient that the introduction of compounds. the nucleic acid into the cell of the patient is or can be accom The invention further relates to compounds thus identified panied by a reduction in the amount and/or the activity of the according to the screening methods of the invention. Such polypeptide of interest. An RNA captamer can also be used to compounds can be used to treat high risk B-ALL especially inhibit gene expression. The therapeutic agent may also be include high risk pediatric B-ALL as appropriate, and can be protein inhibitor or antagonist, Such as Small non-peptide formulated for therapeutic use as described above. molecule Such as a drug or a prodrug, a peptide, a peptidomi Active analogs, as that term is used herein, include modi metic compound, an antibody, a protein or fusion protein, or 10 fied polypeptides. Modifications of polypeptides of the inven tion include chemical and/or enzymatic derivatizations at one the like that acts directly on the polypeptide of interest to or more constituent amino acids, including side chain modi reduce its activity. fications, backbone modifications, and N- and C-terminal The invention includes a pharmaceutical composition that modifications including acetylation, hydroxylation, methyla includes an effective amount of a therapeutic agent as 15 tion, amidation, and the attachment of carbohydrate or lipid described herein as well as a pharmaceutically acceptable moieties, cofactors, and the like. carrier. These therapeutic agents may be agents or inhibitors In certain aspects of the present invention, a therapeutic of selected genes (table 1). Therapeutic agents can be admin method may rely on an antibody to one or more gene products istered in any convenient manner including parenteral, Sub predictive of outcome, preferably to one or more gene product cutaneous, intravenous, intramuscular, intraperitoneal, intra which otherwise is predictive of a negative outcome, so that nasal, inhalation, transdermal, oral or buccal routes. The the antibody may function as an inhibitor of a gene product. dosage administered will be dependent upon the nature of the Preferably the antibody is a human or humanized antibody, agent; the age, health, and weight of the recipient; the kind of especially if it is to be used fortherapeutic purposes. A human concurrent treatment, if any; frequency of treatment; and the antibody is an antibody having the amino acid sequence of a effect desired. A therapeutic agent identified herein can be 25 human immunoglobulin and include antibodies produced by administered in combination with any other therapeutic human B cells, or isolated from human Sera, human immu agent(s) such as immunosuppressives, cytotoxic factors and/ noglobulin libraries or from animals transgenic for one or or cytokine to augment therapy, see Golub et al. Golub et al., more human immunoglobulins and that do not express endog U.S. Patent Application Publication No. 2003/0134300, pub enous immunoglobulins, as described in U.S. Pat. No. 5,939, lished Jul. 17, 2003, for examples of suitable pharmaceutical 30 598 by Kucherlapati et al., for example. Transgenic animals formulations and methods, Suitable dosages, treatment com (e.g., mice) that are capable, upon immunization, of produc binations and representative delivery vehicles. ing a full repertoire of human antibodies in the absence of The effect of a treatment regimen on an acute leukemia endogenous immunoglobulin production can be employed. patient can be assessed by evaluating, before, during and/or For example, it has been described that the homozygous after the treatment, the expression level of one or more genes 35 deletion of the antibody heavy chain joining region (J(H)) as described herein. Preferably, the expression level of gene in chimeric and germ-line mutant mice results in com gene(s) associated with outcome, such as a gene as described plete inhibition of endogenous antibody production. Transfer above (preferably, favorable outcome Table 1G, but also, of the human germ-line immunoglobulin gene array in Such negative outcome as in Table 1P), may be monitored over the germ-line mutant mice will result in the production of human course of the treatment period. Optionally gene expression 40 antibodies upon antigen challenge (see, e.g., Jakobovits et al., profiles showing the expression levels of multiple selected Proc. Natl. Acad. Sci. U.S.A.,90:2551-2555 (1993); Jakobo genes associated with outcome can be produced at different vits et al., Nature, 362:255-258 (1993); Bruggemann et al., times during the course of treatment and compared to each Year in Immuno. 7:33 (1993)). Human antibodies can also be other and/or to an expression profile correlated with outcome. produced in phage display libraries (Hoogenboom et al., J. Screening for Therapeutic Agents 45 Mol. Biol., 227:381 (1991); Marks et al., J. Mol. Biol., 222: The invention further provides methods for screening to 581 (1991)). The techniques of Cote et al. and Boerner et al. identify agents that modulate expression levels of the genes are also available for the preparation of human monoclonal identified herein that are correlated withoutcome, riskassess antibodies (Cole et al., Monoclonal Antibodies and Cancer ment or classification, cytogenetics or the like. Candidate Therapy, Alan R. Liss, p. 77 (1985); Boerner et al., J. Immu compounds can be identified by Screening chemical libraries 50 nol., 147(1):86-95 (1991)). according to methods well known to the art of drug discovery Antibodies generated in non-human species can be and development (see Golub et al., U.S. Patent Application “humanized' for administration in in order to reduce Publication No. 2003/0134300, published Jul. 17, 2003, for a their antigenicity. Humanized forms of non-human (e.g., detailed description of a wide variety of screening methods). murine) antibodies are chimeric immunoglobulins, immuno The screening method of the invention is preferably carried 55 globulin chains or fragments thereof (such as Fv, Fab, Fab', out in cell culture, for example using leukemic cell lines F(ab')2, or other antigen-binding Subsequences of antibodies) (especially B-precursor ALL cell lines) that express known which contain minimal sequence derived from non-human levels of the therapeutic target, such as CENT2G, PTPRMor immunoglobulin. Residues from a complementary determin other gene product as otherwise described herein (see Table ing region (CDR) of a human recipient antibody are replaced 1G and 1P). The cells are contacted with the candidate com 60 by hydroxylation, methylation, amidation, and the attach pound and changes in gene expression of one or more genes ment of carbohydrate or lipid moieties, cofactors, and the relative to a control culture or predetermined values based like. upon a control culture are measured. Alternatively, gene In certain aspects of the present invention, a therapeutic expression levels before and after contact with the candidate method may rely on an antibody to one or more gene products compound can be measured. Changes in gene expression 65 predictive of outcome, preferably to one or more gene product (above or below a predetermined value) indicate that the which otherwise is predictive of a negative outcome, so that compound may have therapeutic utility. Structural libraries the antibody may function as an inhibitor of a gene product. US 8,568,974 B2 21 22 Preferably the antibody is a human or humanized antibody, bination of these genes/gene products. In certain preferred especially if it is to be used fortherapeutic purposes. A human embodiments, the microchip contains DNA probes for all 24 antibody is an antibody having the amino acid sequence of a genes which are set forth in Table 1 P and 1F or any one of the human immunoglobulin and include antibodies produced by two sets of gene products in Tables 1P or 1F, preferably at human B cells, or isolated from human Sera, human immu 5 least two or more gene products described above for Table 1P noglobulin libraries or from animals transgenic for one or or 1F alone with at least one additional gene taken from the more human immunoglobulins and that do not express endog other of Table 1P or 1F. Various probes can be provided onto enous immunoglobulins, as described in U.S. Pat. No. 5,939, the microchip representing any number and any variation of 598 by Kucherlapati et al., for example. Transgenic animals gene products as otherwise described in Table 1 P or 1F. In a (e.g., mice) that are capable, upon immunization, of produc 10 preferred embodiment, the kit is an immunoreagent kit and ing a full repertoire of human antibodies in the absence of contains one or more antibodies specific for the endogenous immunoglobulin production can be employed. polypeptide(s) of interest. For example, it has been described that the homozygous Relevant portion of the below cited references are refer deletion of the antibody heavy chain joining region (J(H)) enced and incorporated herein. In addition, previously pub gene in chimeric and germ-line mutant mice results in com 15 lished WO 2004/053074 (Jun. 24, 2004) is incorporated by plete inhibition of endogenous antibody production. Transfer reference in its entirety herein. of the human germ-line immunoglobulin gene array in Such In the present invention, Sophisticated computational tools germ-line mutant mice will result in the production of human and statistical methods were used to reduce the comprehen antibodies upon antigen challenge (see, e.g., Jakobovits et al., sive molecular profiles to a more limited set of 24 genes (a Proc. Natl. Acad. Sci. U.S.A.,90:2551-2555 (1993); Jakobo gene expression “classifier) that is highly predictive of over vits et al., Nature, 362:255-258 (1993); Bruggemann et al., all outcome in high risk B-ALL, including high risk pediatric Year in Immuno. 7:33 (1993)). Human antibodies can also be B-ALL. produced in phage display libraries (Hoogenboom et al., J. As described in the following examples, the inventors Mol. Biol., 227:381 (1991); Marks et al., J. Mol. Biol., 222: examined pre-treatment specimens from 207 patients with 581 (1991)). The techniques of Cote et al. and Boerner et al. 25 high risk B-precursor acute lymphoblastic leukemia (ALL) are also available for the preparation of human monoclonal who were uniformly treated on Children's Oncology Group antibodies (Cole et al., Monoclonal Antibodies and Cancer Trial COG P9906. Gene expression profiles were correlated Therapy, Alan R. Liss, p. 77 (1985); Boerner et al., J. Immu with clinical features, treatment responses, and relapse free nol., 147(1):86-95 (1991)). survivals (RFS). The use of four different unsupervised clus Antibodies generated in non-human species can be 30 tering methods showed significant overlap in the classifica “humanized' for administration in humans in order to reduce tion of these patients. Two clusters contained all children with their antigenicity. Humanized forms of non-human (e.g., either t(1:19)(cq23:p13) translocations or MLL rearrange murine) antibodies are chimeric immunoglobulins, immuno ments. The other six clusters were novel and not associated globulin chains or fragments thereof (such as Fv, Fab, Fab', with recurrent chromosomal abnormalities or distinctive F(ab')2, or otherantigen-binding Subsequences of antibodies) 35 clinical features. One of these clusters (R6; n=21) had sig which contain minimal sequence derived from non-human nificantly better 4-year RFS of 95% as compared to the 4-year immunoglobulin. Residues from a complementary determin RFS of 61% for the entire cohort (P"0.002). A cluster of ing region (CDR) of a human recipient antibody are replaced children (R8; n=24) with dismal outcomes was found with a by residues from a CDR of a non-human species (donor 4 year RFS of only 21% (P<0.001). A significant proportion antibody) Such as mouse, rat or rabbit having the desired 40 of these children (63%; 15/24) were of Hispanic/Latino eth specificity. Optionally, Fv framework residues of the human nicity. Specific gene alterations in this unique Subset of ALL immunoglobulin are replaced by corresponding non-human provide the basis for up-front identification of these residues. See Jones et al., Nature, 321:522-525 (1986); extremely high risk individuals and allow for the possibility Riechmann et al., Nature, 332:323-327 (1988); and Presta, of targeted therapy Curr. Op. Struct. Biol. 2:593-596 (1992). Methods for 45 humanizing non-human antibodies are well known in the art. EXAMPLES See Jones et al., Nature, 321:522-525 (1986); Riechmann et al., Nature, 332:323-327 (1988); Verhoeyen et al., Science, Material and Methods 239:1534-1536 (1988); and (U.S. Pat. No. 4,816,567). Patients Laboratory Applications 50 COG P9906 enrolled 272 eligible children and adolescents The present invention further includes an exemplary with higher risk ALL between Mar. 15, 2000 to Apr. 25, 2003. microchip for use in clinical settings for detecting gene This trial targeted a subset of patients with NCI high risk expression levels of one or more genes described herein as clinical features defined by a sliding age and white blood cell being associated with outcome, risk classification, cytogenics count criteria that identified a group that experienced very or Subtype in high risk B-ALL, including high risk pediatric 55 poor outcomes (44% 4-year RFS) in prior Pediatric Oncology B-ALL. In a preferred embodiment, the microchip contains Group clinical trials. Patients were first enrolled on the COG DNA probes specific for the target gene(s). Also provided by P9000 classification study and received a 4-drug induction. the invention is a kit that includes means for measuring Patients with 5-25% blasts in the bone marrow (BM) at day 29 expression levels for the polypeptide product(s) of one or of induction received 2 additional weeks of extended induc more Such genes, including any of the genes listed in Table 1G 60 tion therapy using the same agents. Patients with less than 5% and 1F below, preferably one or more of (CENTG2): BM blasts following 4-6 weeks of induction therapy were PTPRM, STAP1; CCNJ; PCDH17; MCAM; CAPN3; eligible to participate in COGP9906 if they met the age/WBC CABLES1; and GPR155 as positive outcome predictor criteria described previously or had overt central nervous genes/gene products or one or more of MUC4; GPR110; IGJ; system (CNS3) or testicular involvement. Patients with favor NRXN3: CD99; CRLF2: ENAM: TP53INP1; IFITM1; 65 able (trisomy 4+10; TEL/AML1) or unfavorable (Philadel IFITM2; IFITM3; TTYH2: SEMA6A: TNFSF4; and phia chromosome-positive or hypodiploid with less than 44 SLC37A3, as negative outcome predictors, preferably a com ) genetic features were excluded, with the US 8,568,974 B2 23 24 exception that those with favorable genetic features and and sort them by their average rank order. This generates CNS3 status or testicular involvement were included. tables of the highest and lowest expressed probes across each Patients enrolled in COG P9906 were treated uniformly with group that are, presumably, reflective of their nature. Because an “augmented BFM regimen that included two delayed these samples have so many probe sets with very low expres intensification phases analogous to that described previ 5 Sion, this analysis was not informative at the low end of the ously.''' rank order. At the high end, however, it worked quite well to All patients had minimal residual disease (MRD) testing identify genes for which each cluster had overexpression. performed by flow cytometry in a single central reference These top 50 probe sets for all R-clusters are given in the laboratory as described previously.' Testing was performed Supplement. The creation of gene lists by VX has been at day 8 on peripheral blood (PB), and at end induction and 10 described previously' and is also detailed in the Supplement. end of interim maintenance (week 22) on BM. Cases were defined as MRD positive or negative using a threshold of Statistical Methods 0.01%. Outcome data for all patients were frozen as of Octo Statistical analysis for each group was performed by com ber 2006. The median time to event or censoring was 3.7 paring group membership to all samples not in that group. years. 15 Log rank analysis was used to evaluate RFS." Kaplan-Meier Expression data were obtained on 207 cryopreserved Survival analysis and hazard ratios were also calculated for specimens with >80% leukemic blasts, stored in the COG comparisons of group RFS.''' Higherhazardratios indicate leukemia repository (University of New Mexico) and that a group has poorer RFS relative to the remainder of the selected solely on the basis of specimen availability. The cohort while lower hazard ratios indicate the opposite. Events clinical variables and outcome of the 207 patients studied in all RFS analysis are relapses following remission. Two were highly similar to those of the entire cohort of 272 eli sided Satterthwaite t-tests and Mann-Whitney rank sum tests gible patients (Table 5S). The NCI and participating institu were used to analyze intensities and age/WBC counts, respec tions approved the treatment protocol through their respective tively; Fisher's exact test was used to evaluate the binary Institutional Review Boards (IRBs). All patients or their variables." patients/guardians provided informed consent prior to trial 25 Results enrollment. Patient Cohort Nucleic Acids and Microarrays To determine if we could identify novel clinically-relevant RNA was purified from cryopreserved samples by the Tri leukemic Subgroups, gene expression profiles were obtained Zol method and was quantified by spectrophotometry. Gen from a retrospective cohort of 207 previously untreated ALL eration of cDNA, cRNA and biotin-labeled probes was per 30 patients who were enrolled on the COG P9906 higher risk formed as previously described." Samples were analyzed ALL trial. The cohort had a slight male predominance (66%) using the U133 Plus 2.0 arrays (Affymetrix, with one-quarter of the children being of Hispanic/Latino Santa Clara, Calif.). Signal intensities and expression data ethnicity. At diagnosis, the median white blood cell count was were generated with the Affymetrix GCOS 1.4 software elevated at 62.300/uL and high numbers of blasts were iden package. A mask to remove study-specific uninformative 35 tified in the CNS in 10% (20/201 for which data were avail probe pairs was applied to all the arrays (details in Supple able) of children. Mixed lineage leukemia (MLL) or E2A/ ment). The default Affymetrix normalization (all genes: PBX1 translocations were present in 10% and 11% of cases, intensity of 500) was used. This gene expression dataset may respectively. RFS and overall survival at 4 years were 61% be accessed via the National Cancer Institute caArray site and 83% respectively. Clinical details are shown in Table 5S. (array.nci.nih.gov/caarray/) and at Gene Expression Omnibus 40 Clustering Analysis (ncbi.nlm.nih.gov/geo/). A direct link to this dataset is pro Multiple approaches were taken to identify highly-related vided for the reviewers at: ncbi.nlm.nih.gov/geo/query/ groups of patients, under the assumption that the most robust acc.cgi?token-lrqbxguwayqaapk&acc=GSE11877. clusters would be independently partitioned by more than one Unsupervised Clustering Methods algorithm. Unsupervised two-dimensional hierarchical clus Microarray expression data were available from an initial 45 tering based on the association of gene signatures identified 8 54,668 probe sets after masking removed seven probe sets clusters (H1-8) (FIG. 1A). VxInsight identified 7 clusters (Table S1). Four complementary unsupervised clustering (V1-7), as shown in FIG. 1B. The strong overlap between the methods were used: traditional hierarchical clustering, VxIn clusters identified by these methods is also shown in FIG. 1B. sight (VX)", and hierarchical clustering using outlier genes The samples grouped in H1 were predominately found in V1. identified by Cancer Outlier Profile Analysis (COPA)' and 50 There was a similar overlap of H2 and H6 with V2 and V6, Recognition of Outliers by Sampling Ends (ROSE). Descrip respectively. The samples identified as H8 in the hierarchical tions of the details of each of these methods and their appli clustering were predominately found in V8, although some of cation to the data sets are Supplied as Supplementary infor these patients were also grouped into V4. mation. Hierarchical clustering using outlier genes also identified In an effort to simplify the nomenclature for the clusters the 55 related groups within the population of ALL patients. Both numbering from the hierarchical clustering groups was COPA (FIG. 2A) and ROSE (FIG. 2B) analysis segregated applied to the other methods. Each method cluster is prefixed patients into distinctive clusters that were assigned labels by a letter indicating the method used to identify it indicting the overlap of the members with groups identified in (H=hierarchical clustering, V=VxInsight, C-COPA and the hierarchical clustering shown in FIG. 1A. The similarities R=ROSE). Clusters from each method were compared to 60 between the groups identified by the ROSE or COPA and those of the hierarchical clustering and then the group num hierarchical clustering are shown in FIG.2C. The most highly bers were assigned based upon maximum similarity. related groups across all methods were determined by the Generation of Gene Lists largest number of shared samples: Cluster 1 (14), Cluster 2 Although the genes used for hierarchical clustering were (23), Cluster 6 (15) and Cluster 8 (17). Sufficient for distinguishing the groups, they were far from 65 For each of the clustering methods we performed X2 analysis comprehensive in characterizing them. Consequently, we to determine if there appeared to be a relationship between used the group membership to reevaluate all 54,668 probes selected clinically-relevant variables and cluster assignment. US 8,568,974 B2 25 26 The best statistical correlations with known translocations methods identified a population that fared significantly worse (MLL and E2A/PBX1) were found in the ROSE clusters, than the cohort as determined by log rankanalysis (Table 1B). shown in Table 1A (a more complete relationship of the Day 29 MRD also differed between ROSE clusters clinical correlates of both ROSE and hierarchical clusters are (p<0.001: Table 1A). A Fisher's Exact Test indicated that R8 presented in the Supplement Tables 3S and 4S). Shaded cells had a higher proportion of MRD positive patients on day 29, in Table 1A highlight those specific variables that were deter as might be expected given their eventual poor outcome. mined by Fisher's Exact Tests to be highly significant Surprisingly, R6, comprising patients with a very good out between cluster groups. Both of the known chromosomal come, did not have a corresponding marked increase in MRD translocations in this cohort were assigned to specific clusters negative cases. In addition, all of the patients assigned to R2 with 100% accuracy: cluster R1 contained exclusively the 10 (t(1:19) E2A/PBX1 translocations) were MRD negative on MLL translocations while all of the 41:19) E2A/PBX1 trans day 29, despite the fact that the RFS for this group was not locations clustered together in R2. different than that seen for the entire cohort. Similarly, R5 had TABLE 1A

Association of Clinical Features with ROSE Clusters

P R1 R2 R2A R4 RS R6 R7 R8 total (CHISQ) Sex Male 1121 11.23 611 11.13 8,11 1721 56.83 17.24 137,207 0.1 Translocation MLL 21f 21 O.23 Of 11 Of 13 Of 11 Of 21 Of83 0.24 21,207 <0.001 t(1:19) Of 21 23f23 Of 11 Of 13 Of 11 Of 21 Of83 0.24 23,207 <0.001 Outcome Relapse 7.21 6.23 3.11 3.13 2,10 1f2O 3O81 18.23 70,202 <0.001 MRD (d29) Positive 9.17 Of2O 1.9 2.13 8,11 6.21 22,77 19723 67,191 <0.001 Race Hispanic 4/21 6.23 211 2.13 Of 10 3f2O 19,83 15.24 S1,205 O.OO1

25 TABLE 1B a significant increase in MRD positive patients at day 29 without a corresponding alteration in RFS. Hazard Ratios and LOgrank p-values for Clusters 6 and 8 Finally, race also varied significantly across the ROSE clusters (p<0.001). While Hispanic/Latino patients were Good Outcome Clusters Poor Outcome Clusters 30 present in all clusters except R5, the proportion of Hispanics in R8, the cluster associated with the poorest outcome, was R6 C6 H6 V6 R8 C8 H3 V8 markedly higher than that in every other cluster (p<0.001). P O.O1O O.O10 OO15 O112 <0.001 <0.001

Probe Set ID Gene Gene Title EntrezID Band 220059 at STAP1 signal transducing adaptor family 26228 4q13.2 member 1 228240 at CENTG2* Full-length cDNA clone 2d37.2 CSODMOO2YA18 of Fetal liver of Homo sapiens (human) 204066 s at CENTG2 centaurin, gamma 2 116987 2p24.3- p24.1 233225 at CENTG2* CDNA FLJ36087 fis, clone 2d37.2 TESTI2O2O283 206756 at CHST7 carbohydrate (N-acetylglucosamine 6-O) 56548 Xp11.23 sulfotransferase 7 240758 at CENTG2* 2d37.2 1554343 a at STAP1 signal transducing adaptor family 26228 4q13.2 member 1 230537 at PCDH17: 13q21.1 203921 at CHST2 carbohydrate (N-acetylglucosamine-6-O) 9435 3d24 sulfotransferase 2 230179 at LOC285812 hypothetical protein LOC285812 285812 6p23 2198.21 sat GFOD1 glucose-fructose oxidoreductase S4438 6pter domain containing 1 p22.1 1554486 a. at C6orf114 chromosome 6 open reading frame 85411 6p23 14 209593 s at TOR1B orsin family 1, member B (torsin B) 27348 9q34 203329 at PTPRM protein tyrosine phosphatase, 5797 18p11.2 receptor type, M 227289 at PCDH17 protocadherin 17 27253 13q21.1 1552398 a. at CLEC12A C-type lectin domain family 12, 160364 12p13.2 member A 242457 at Transcribed Sq21.1 205656 at PCDH17 protocadherin 17 27253 13q21.1 1555579 s at PTPRM protein tyrosine phosphatase, 5797 18p11.2 receptor type, M 1556593 s at CDNA FLJ40061 fis, clone 3d23 TESOP2OOOO83 228863 a PCDH17 protocadherin 17 27253 13q21.1 202336 s at PAM peptidylglycine alpha-amidating SO66 5q14-q21 monooxygenase 235968 a CENTG2 centaurin, gamma 2 116987 2p24.3- p24.1 225611 a 5d.123 210944 sat CAPN3 calpain 3, (p94) 825 15q15.1- q21.1 211340s at MCAM melanoma cell adhesion molecule 4.162 11q23.3 233038 a CENTG2* CDNA: FLJ22776 fis, clone 2d37.2 KAIA1582 219470 x at CCNJ cyclin J S4619 10pter q26.12 244665 a. ITGA6* Transcribed locus 2a31.1 230954 a C20orf112 chromosome 20 open reading frame 14O688 20g 11.1- 112 q11.23 211890 x at CAPN3 calpain 3, (p94) 825 15q15.1- q21.1 226342 a. SPTBN1 spectrin, beta, non-erythrocytic 1 6711 2p21 202746 a ITM2A integral membrane protein 2A 9452 Xq13.3- Xq21.2 20908.7 x at MCAM melanoma cell adhesion molecule 4.162 11q23.3 22.3130 s at MYLIP myosin regulatory light chain 291.16 6p23 interacting protein p22.3 228.098 s at MYLIP myosin regulatory light chain 291.16 6p23 interacting protein p22.3 US 8,568,974 B2 29 30 TABLE 2-continued Top 50 Rank Order Genes for R6 (asterisks denote gene assignments using UCSC Genome Browser

Probe Set ID Gene Gene Title EntrezID Band

225613 at MAST4 microtubule associated 375449 serine/threonine kinase family member 4 40016 g at MAST4 microtubule associated 375449 5q12.3 serine/threonine kinase family member 4 232227 at AF161442 HSPC324 202747 s at integral membrane protein 2A 9452

228,097 at MYLIP myosin regulatory light chain 291.16 interacting protein 229.091 s at CCNJ cyclin J S4619

204836 at GLDC glycine dehydrogenase 2731 (decarboxylating) 201656 at ITGA6 integrin, alpha 6 3655 2a31.1 215177 s at ITGA6 integrin, alpha 6 3655 2a31.1 214475 X at CAPN3 calpain 3, (p94) 825 15q15.1- q21.1 1558621 at CABLES1 CdkS and Ablenzyme substrate 1 91768 18q11.2 229597 s at WDFY4 WDFY family member 4 57705 10q11.23 231166 at GPR155 G protein-coupled receptor 155 151556 2a31.1 239956 at CDNA FLJ40061 fis, clone 3q23 TESOP2OOOO83

TABLE 3 Top 50 Rank Order Genes for R8 (asterisks denote gene assignments using UCSC Genome Browser

Probe Set ID Gene Gene Title Entrez.ID Band 236489 at GPR110: Transcribed locus 212592 at IG Immunoglobulin J polypeptide, 3512 linker protein for immunoglobulin alpha and mu polypeptides 217109 at MUC4 mucin 4, cell Surface associated 4585 240586 at ENAM Enamelin 101.17 4q13.3 2O5795 at NRXN3 neurexin 3 9369 14q31 238689 at GPR110 G protein-coupled receptor 110 266977 6p12.3 21711.0 s at MUC4 mucin 4, cell Surface associated 4585 236750 at NRXN3: Transcribed locus 14q31.1 242051 at CD99* Transcribed locus Xp22.33; Y11.31 204895 X at MUC4 mucin 4, cell Surface associated 4585 201029 s at CD99 CD99 molecule 4267 Xp22.32: Y11.3 201028 s at CD99 CD99 molecule 4267 Xp22.32: Y11.3 229114 at GAB1* CDNA clone IMAGE:48O1326 4.d31.21 206873 at CA6 carbonic anhydrase VI 765 1p36.2 201876 at PON2 paraOXonase 2 S445 7q21.3 222154 sat LOC26010 viral DNA polymerase 26010 2d33.1 transactivated protein 6 210830s at PON2 paraOXonase 2 S445 7q21.3 235988 at GPR110 G protein-coupled receptor 110 266977 6p12.3 216565 x at LOC391 O20 interferon induced transmembrane 391 O20 1p36.11 protein pseudogene 215021 sat neurexin 3 9369 225912 at tumor protein p53 inducible nuclear 94241 protein 1 226002 at GAB1* CDNA clone IMAGE:48O1326 4q31.21 214022 s at IFITM1 interferon induced transmembrane 8519 11 p15.5 protein 1 (9–27) 212203 X at IFITM3 interferon induced transmembrane 10410 11 p15.5 protein 3 (1-8U) 1563357 at SERPINB9* MBNA, cDNA DKFZp564C203 6p25.2 (from clone DKFZp564C203) 225998 at GAB1 GRB2-associated binding protein 1 2549 4q31.21 201315 x at IFITM2 interferon induced transmembrane 10581 11 p15.5 protein 2 (1-8D) US 8,568,974 B2 31 32 TABLE 3-continued Top 50 Rank Order Genes for R8 (asterisks denote gene assignments using UCSC Genome Browser

Probe Set ID Gene Gene Title EntrezID Band 201601 x at IFITM1 interferon induced transmembrane 8519 11 p15.5 protein 1 (9–27) 230643 at WNT9A wingless-type MMTV integration 7483 1q42 site family, member 9A 212974 at DENND3 DENN/MADD domain containing 3 22898 203435 s at MME membrane metallo-endopeptidase 4311 223741 s at TTYH2 tweety homolog 2 (Drosophila) 94O15 212975 at DENND3 DENN/MADD domain containing 3 22898 207426 s at TNFSF4 tumor necrosis factor (ligand) 7292 Superfamily, member 4 (tax transcriptionally activated glycoprotein 1, 34 kDa) 52731 at FL2O294 hypothetical protein FLJ20294 55626 11p11.2 215028 a SEMA6A sema domain, transmembrane 57.556 5q23.1 domain (TM), and cytoplasmic domain, (Semaphorin) 6A 229649 a NRXN3 neurexin 3 9369 1559315 s at LOC144481 hypothetical protein LOC144481 144481 205983 a DPEP1 dipeptidase 1 (renal) 1800 226840 a H2AFY H2A histone family, member Y 9555 23.0161 a CD99* Transcribed locus Xp22.33; Y11.31 223304 a SLC37A3 solute carrier family 37 (glycerol-3- 84255 7q34 phosphate transporter), member 3 218862 a ASB13 ankyrin repeat and SOLS box- 79754 10p15.1 containing 13 213939 s at RUFY3 RUN and FYVE domain containing 3 22902 4q13.3 207112 s at GAB1 GRB2-associated binding protein 1 2549 4.d31.21 227856 a C4Orf32 chromosome 4 open reading frame 32 132720 4d25 238880 a GTF3A general transcription factor IIIA 2971 13q12.3- 1569666 s at SLC37A3* Homo sapiens, clone IMAGE:558.1630, mRNA 20936.5 s at ECM1 extracellular matrix protein 1 1893 203373 at SOCS2 Suppressor of cytokine signaling 2 883S

Sequences 3' of CENTG2, CHST2 and MAST4 as well as algorithms to analyze gene expression profiles from a retro introns of CENTG2 and ITGA6 were among the high ranking 40 spective cohort of ALL patients with a clinical profile that probe sets forming the R6 signature. This pattern of expres Suggested that they were at high risk for relapse. These meth sion suggests the possibility of alternative splicing or agen ods identified overlapping groups of transcripts that defined eralized elevation in expression within certain chromosomal clusters with important cytogenetic and clinical characteris regions. Several of the genes forming the R6 signature are tics. also postulated to be involved with cell signaling and adhe 45 We used four different unsupervised clustering algorithms sion (CENTG2, CLEC12A, GPR155, MCAM, ITM2A, to analyze the gene expression data in pretreatment speci PCDH17, and PTPRM). In addition, two cyclin associated mens from a cohort of 207 children with high-risk ALL (HR genes (CCNJ and CABLES1) are preferentially associated ALL). This type of analysis, without knowledge of prior class with the Good Outcome Cluster. While the R6 genes are more definitions, allows for identification of fundamental subsets commonly expressed in lymphocytes, there is no obvious 50 of patients sharing similar gene expression signatures. The pattern of expression that is associated with a particular stage composite result is a separation of the HR-ALL cases into of differentiation or cell type. eight distinct clusters based on traditional hierarchal cluster Discussion ing methods. The additional three methods show significant Gene expression profiling studies of pediatric ALLS have overlap in cluster membership with traditional hierarchical shown marked heterogeneity. In approximately 35-40% of 55 clustering, but allow for greater discrimination of unique all cases, specialized molecular techniques and gene cloning gene signatures that relate to outcome differences. The have identified recurring genetic abnormalities that are asso strength of this type of approach is apparent when using the ciated with drug responsiveness, prediction of relapse, and more restrictive clustering algorithms (ROSE and COPA), in overall survival. These genetic abnormalities are primarily the effective identification and clustering of HR-ALL speci seen in children who have either better treatment outcomes 60 mens with translocations into two distinct clusters (clusters 1 and “low risk” disease (such as TEL/AML1 or trisomies of and 2) using an unsupervised approach. chromosomes 4, 10, and 17) or poor outcomes and “very high As had been seen in other studies,' we discovered gene risk” disease (such as BCR/ABL or hypodiploidy). Classifi signatures characteristic of specific chromosomal abnormali cation of the remaining children for determination of risk ties common in ALL. In the primary data set we found two stratified therapy relies on clinical parameters such as patient 65 clusters that contained 100% of the t(1:19) translocations and age, presenting white blood cell count, and response to induc MLL rearrangements. In the validation data we also found a tion therapy. We used a series of unsupervised clustering signature that defined subjects with a to 12:21) translocation. US 8,568,974 B2 33 34 This pattern was not seen in the original data because only The pattern of gene expression in individual cohorts will three patients with this lesion were enrolled in COG P9906. provide insights into fundamental biological pathways that Interestingly, both the ROSE and COPA analysis identified a underlie the neoplastic diseases, as well as providing a poten distinct cluster with a signature related to that seen in t(1:19) tial population of genes and pathways that can be targeted by subjects. While the pattern of gene expression was distinct novel therapies. The top-ranked members of the clusters that enough that the samples did not cluster together with the predict both good and poor outcome were dominated by t(1:19) patients, the similarities were sufficient to conclude genes involved in cell signaling and adhesion. CD99 is over that these patients share a fundamental underlying process expressed in a variety of tumors''' and has served as a that was observed even in the absence of the translocation. therapeutic target in investigational therapies. Overexpres Two of the clusters described by multiple unsupervised 10 sion of MUC4 has been associated with a poor prognosis in a algorithms had remarkable differences in RFS compared to variety of solid tumors' but has not been previously linked the cohort as a whole, even though all the patients enrolled in to outcome in leukemias. In contrast GAB1 has been shownto COG P9906 were identified as being at higher risk of relapse be predictive of favorable response in BCR/ABL-ALL" and based on clinical characteristics (age and white blood cell its expression has been correlated with responsiveness to count). Cluster 8 identified by all the statistical methods con 15 imatinib in rheumatoid arthritis. The CABLES1 gene has sisted of patients that fared far worse that the 60% RFS seen been described as a growth suppressor and is frequently in the entire population. Only 20% of the 37 patients identi deleted in solid tumors' although it has not been previ fied by hierarchical clustering were disease-free at 5 years, ously described as playing a significant role in leukemia. Its while all of the 24 patients segregated by ROSE relapsed or overexpression in the good outcome group is consistent with were censored. In contrast, Cluster 6 identified by ROSE/ the Suppressed growth and senescence that might be expected COPA and hierarchical clustering consisted of a group of in light of the excellent RFS of this group of patients. approximately 20 patients with a 95% rate of RFS. Therefore There are similarities between the genes that we describe we identified a marked heterogeneity in treatment response here and those reported in other studies. Cluster 6 (good evenina group of children who had been preselected in a high outcome group) shares some features with the “novel' cluster risk category. Whether the patients in cluster 6 are actually 25 of patients initially described byYeoh etal and later reinves children who would respond well to less aggressive therapy, tigated by Ross et al. This novel cluster from the previous or who are good responders to the intensive treatment of COG studies has also been reported to be frequently associated P9906 and would fail conventional protocols is not clear. It is with deletions of the ERG gene. We analyzed the publicly clear however that cluster 8 consists of patients that relapse at available Affymetrix U133A data from the second study by very high rates and are candidates for novel treatment 30 ROSE and identified a distinct cluster of 13 members. Com regimes. parison of the top rank order of this group to cluster 6 resulted End induction MRD has been shown to be a robust predic in a set of 50 genes from the top 200 that were identical, even tor of RFS in many studies, including COG P9906.’’’ though the U133A array has less than half the probe sets on Interestingly, although the patient numbers in Subgroups are U133 Plus 2.0 arrays used in our studies. Despite the simi relatively small, the predictive power of some gene signatures 35 larities in the composition of the clusters, the earlier studies seemed to provide more information than day 29 MRD. did not find a correlation to clinical features, contrasted with Although the overall MRD positivity of Cluster 8 was highly the favorable prognosis patients in our group of high risk predictive of eventual relapse, this was not the case for many patients. Several genes with expression patterns associated of the other clusters. In particular, the MRD status of cluster with the R8 poor outcome cluster are also among those pre 6 was not statistically different from the entire cohort even 40 viously identified as distinguishing the BCR/ABL subtype of though only 1/21 patients in this group relapsed, and the ALL from other childhood ALL subtypes. These shared patient who relapsed was day 29 MRD negative. Similarly, genes includeMUC4, GPR110, CD99, IGJ and IFITM3. This although all patients incluster 2, all of whom had the t01:19), overlap in expression pattern between these two distinctive were MRD negative at day 29, the relapse risk for these high-risk ALL Subgroups suggests a biological similarity patients was quite similar to that of the overall group of P9906 45 despite the lack of BCR/ABL translocation in the R8 group. patients. These findings are consistent with the observation of A recent report measured gene expression in a series of Borowtiz et all that the most robust risk stratification algo ALL patients and proposed a three gene predictor of relapse. rithms integrate genetic features of the leukemia and early The single gene within this set whose induction was predic treatment response as measured by end induction MRD. It is tive of relapse was IGJ, a top-ranked gene in our poor out also possible that further characterization of additional high 50 come cluster. However none of the other genes identified in risk ALL patients will result in a high enough number of the extended data set in this paper as being related to relapse patients in each cohort that more conclusive statements con overlapped with those described here. There are a number of cerning the predictive role of MRD can be made. potential reasons for this discrepancy, although differences in It has been previously reported that Hispanic children with clustering techniques might well be the basis of the differ B-precursor ALL have poorer responses to therapy.'" 55 ences. The Random Forest technique used by Hoffman et While we found that patients of Hispanic/Latino ethnicity al. did not cluster the data of Ross et al. in a manner that were found in all the clusters, they were preferentially repre predicted outcome, while the combination of techniques used sented in cluster 8, the poor outcome group. Twelve of the 15 here extracted informative groups. (80%) of the Hispanics in this cluster relapsed, compared to This gene expression profiling study highlights the diver 1 1/36 (31%) of the Hispanics not in cluster 8. Since the 60 gent mechanisms and pathways of leukemic transformation relapse rate for non-Hispanics incluster 8 was also high (6/9: that are not recognized by current methods of pediatric ALL 67%) it seems that we identified patients of all races who diagnosis, classification and risk assignment. No bias was relapsed, not just simply Hispanic patients. It is possible that induced during cluster selection in this analysis of HR-ALL, the nature of the pattern of gene expression identified in and therefore these expression clusters likely represent the studies such as that reported here will provide Some insight 65 true intrinsic biology in this cohort of patients. We are now into preferential Susceptibilities of specific ethnic groups to determining the novel underlying genetic abnormalities asso high risk ALLS. ciated with each of these clusters through correlated studies US 8,568,974 B2 35 36 of whole genome copy number change and direct gene threshold. The largest value of k for which each probe set sequencing in a National Cancer Institute—Sponsored TAR surpassed the threshold was recorded. The probe sets were GET project. The identification of new genetic abnormalities then ordered by their maximum k values. In this study a probe will allow for targeted therapy in this group of patients who set was selected for clustering if 6sks30 and the median have historically have had a poor response on their therapeu intensity of the sampled group was at least 7-fold its corre tic trials. sponding value on the trend line. This range of k values was Further Details of Analysis selected in order to find groups in the range of 12 or more Masking and Filtering of Probe Sets members (greater than 5% of the population size) and not Masking of Probe Set exceeding 60 members. Groups smaller than 5% of the popu Prior to any intensity analysis, the microarray data were 10 lation were unlikely to yield any statistically significant first masked to remove those probes found to be uninforma results while those of approximately /3 the sample size or tive in a majority of the samples. Removal of these probe pairs greater were likely to identify clinical features such as gender. improves the overall quality of the data and eliminates many The 7-fold threshold was chosen to minimize the impact of non-specific signals that are shared by a particular sample signal noise on probe set selection and also to limit the total type. This was accomplished by evaluating the signals for all 15 number of probe sets to be used for clustering. Lower thresh probes across all 207 samples and then identifying those olds result in the inclusion of many more probe sets while probe pairs for which the mismatch (MM) signals exceeded higher thresholds dramatically reduce the number. Only 215 their corresponding perfect match signals (PM) in more than probesets out of 54,615 satisfied these criteria of 7x threshold 60% of the samples. Masking removed 94.767 probe pairs and k values between 6 and 30, inclusive. and had some impact on 38.588 probe sets (71%). As shown ROSE Gene Selection in CCG 1961 in Table 1C, the net impact of masking was a significant Masking was applied to the CCG 1961 data set exactly the increase in the number of present calls coupled with a dra same way as in P9906. The same 7-fold threshold for intensity matic decrease in the number of absent. The masked data also was also used. Because of the Smaller number of patients in removed 7 probe sets entirely (none of which represented this data set a probe set was selected for clustering if 6sks20. human genes). This resulted in the number of available probe 25 rather than an upper k of 30. Due to the noise of some of these sets on the microarray being reduced from 54.675 to 54,668. microarrays, a lower limit intensity of 150 was applied (roughly twice the background across all chips). This pre TABLE 1C vented misleading signals at, or below, the level of back ground from giving a misleadingly high slope to the trend Overall impact of masking on microarray calls 30 line. This was accomplished by substituting the value of 150 Present Marginal Absent No call for any lower intensity. This process also dampened the apparent deviation of low signals from median. Raw 34.9 1.7 6.3.3 O COPA Gene Selection Masked 48.0 3.1 48.9 O (7) As with ROSE, the intensities of the remaining 54,615 35 probe sets were used for the selection of COPA genes. The Filtering of Probe Sets COPA method was applied essentially as described by Tom All four unsupervised learning methods began with the full lins et al.' First, the median expression for each probe set was complement of probe sets (54,688 after masking). VxInsight set to zero. Secondly, the median absolute deviation (MAD) (VX) used the intensity values for the probe sets called either was calculated and the intensities for each probe set were present or marginal (as determine by GCOS 1.4) and treated 40 divided by its MAD. Finally, these MAD-normalized inten those with absent calls as missing data. Traditional hierarchi sities at the 95" percentile for each probe set were sorted. In cal clustering method (HC) applied two separate filtering order to make the comparison of COPA and ROSE more methods to refine the number of starting probes. First, only direct, an equal number of probe sets were selected from the those probe sets having present calls in more than 50% of the top of sorted list of 95" percentile COPA probe sets. From samples were included (23,775). This list was then distilled 45 these 215 probe sets it was determined that 6 corresponded to further by removing those genes that are known to simply the XIST gene and would simply segregate the boys and girls. determine sex (XIST, SRY, etc.) and those probe sets that by After removal of these XIST probe sets 209 remained for t-test analysis were comparable to these sex-related genes clustering. (1,828 total). The final number of evaluable probe sets was Clustering and Grouping Methods 21,947. The expression patterns for these probesets were then 50 COPA and ROSE Clustering analyzed and ordered by their variance. The 100 probe sets The hierarchical clustering of COPA and ROSE genes with the highest variance were used for clustering. ROSE and were performed using EPCLUST (an online tool that is part of COPA simply removed the Affymetrix controls (probe sets the Expression Profiler suite at www.bioinfebc.ee). The data with AFFX prefix) and used all of the remaining 54,615 probe for each probe set were converted to values of log (intensity/ sets for analysis. 55 median) and were uploaded to EPCLUST. Hierarchical clus Gene Selection for Clustering tering was performed using linear correlation based distance ROSE Gene Selection in P9906 (Pearson, centered) and average linkage (weighted group The intensity values for each of the 54,615 probe sets were average, WPGMA). A threshold branch distance was applied individually plotted in ascending order. The plots were and all clusters containing more than 10 members (greater divided into thirds and the intensities from the middle third 60 than 5% of the samples) were retained and labeled. were used to generate trend lines by least squares analysis. Gene List Preparation with VxInsight Groups of 2*k (where k is an integer from 2 to one third of the A gene-by-gene comparison of expression levels between sample size) were sampled from each end of the intensity pairs of groups was computed using analysis of variance plots and the median intensities of these groups were com followed by a sort to put the genes into decreasing order by the pared to the trend lines. FIG. 1C illustrates how this is done. 65 resulting F-statistic. To estimate the stability of this gene list, Increasing sized groups were sampled from each end until the two bootstrap calculations are made under the appropriate median intensity of a group failed to exceed the desired null hypotheses. First, we ask about the list stability given the US 8,568,974 B2 37 38 groupings. In this case bootstraps are resampled with replace (similar to E2A-PBX1 translocations), 6 (good outcome ment from within the indicated groups and processed with patients) and 8 (poor outcome patients) exhibited the best analysis of variance, just as for the actual measurements. The overlap across all three methods. FIG. 2C highlights the collection of resulting gene orders is examined to determine membership similarity across the methods. the 95% confidence bands for the rankings of individual Table 2C gives the adjusted Rand indices showing the genes. Next we compute a p-value for the observed rankings agreement across the three methods. This illustrates that the under the null hypothesis that, Ho: there is no difference in ROSE and hierarchical clustering are the most closely gene expression between the two groups. When Ho is, indeed, related, although all three methods are significantly similar. true, the best empirical distribution would be the combination of all values without respect to their group labels. To test the 10 TABLE 2C hypothesis we create ten thousand bootstraps by sampling from the combined expression levels, ignoring the group Adlusted Rand Indices for Clustering Method Comparison labels. Each bootstrap is processed exactly the same as the original array measurements. A p-value is accumulated by Rose Clusters Hierarchical Clusters WX Clusters counting the fraction of times that we observe a bootstrap 15 ARI P ARI P ARI P where a gene’s ranking is at or above its order in the real Rose Clusters O.4024 experiment. Hierarchical O.4024

Clinical features of ROSE clusters

R1 R2 R2A R4 RS R6 R7 R8 Total P

Cases 21 23 11 3 11 21 83 24 2O7 Age e1OYrs 9 (43%) 15 (65%) 10 (91%) () (77%) 10 (91%) 18 (86%) 42 (51%) 8 (75%) 132 (64%) O.OO1 <1OYrs 12 (57%) 8 (35%) 1 (9%) 3 (23%) 1 (9%) 3 (14%) 41 (49%) 6 (25%) 75 (36%) Median 4.67 13.09 15.32 3.95 14.67 1445 10.92 4.11 13.09 <0.001 range 1.19- 2.91- 11.20- 2.22- 1185- 9.82- 1.93- 5.71- 215 15.51 16.13 17.29 7.23 17.33 17.90 16.8O 7.74 17.34 Sex Female 10 (48%) 12 (52%) 5 (45%) 2 (15%) 3 (27%). 4 (19%) 27 (33%) 7 (29%) 70 (34%) O.18 Male 11 (52%) 11 (48%) 6 (55%) 1 (85%) 8 (73%) 17 (81%) 56 (67%) 7 (71%) 137 (66%) WBC eSOK 16 (76%) 12 (52%). 4 (36%) 2 (15%) 5 (45%) 9 (43%) 46 (55%) 14 (58%) 108 (52%) O.O39 5 blasts 1 (5%) 1 (4%) 2 (18%) 1 (8%) 2 (18%) 0 (0%) 13 (16%) 1 (4%) 21 (10%) D29 Negative 8 (47%) 20 (100%) 8 (89%) 11 (85%) 3 (27%) 15 (71%) 55 (71%). 4 (17%) 124 (65%) <0.001 MRD Positive 9 (53%) 0 (0%) 1 (11%) 2 (15%) 8 (73%) 6 (29%) 22 (29%) 19 (83%) 67 (35%) Relapse- 1 year O.762 O.913 O.909 1.OOO 1.OOO 1.OOO O.976 O.915 <0.001 free 2 years 0.667 0.739 O.818 O.923 1.OOO 1.OOO O828 Survival 3 years 0.667 0.739 O.818 O846 O.900 O.947 O.766 4 years 0.667 0.739 O.727 O.762 O.788 O.947 O661 O.697 5 years 0.667 0.739 O.727 O.762 O.788 O.947 O.S29 O.479 US 8,568,974 B2

TABLE 4S

Clinical features of Hierarchical clusters

H1 H2 H3 H4 H5 H6 H7 H3 Total P

Cases 21 33 27 27 7 2O 25 37 2O7 Age e1OYrs O (48%) 24 (73%) 6 (22%) 24 (89%) 14 (82%) 7 (85%) 11 (44%) 26 (70%) 132 (64%) <0.001 <1OYrs 1 (52%) 9 (27%) 21 (78%) 3 (11%) 3 (18%) 3 (15%) 14 (98%) 11 (30%) 75 (36%) Median 9.07 13.52 3.59 14.63 4.67 4.37 6.87 3.71 3.09 <0.001 range .26- 3.35- 1.35- 9.S8- 7.44- 9.44- .84- 3.37- 2.15 5.51 17.28 14.24 17.23 7.36 7.94 7.08 7.82 7.34 Sex Female 9 (43%) 17 (52%) 11 (41%) 3 (11%) 5 (29%) 4 (20%) 9 (36%) 12 (32%) 70 (34%) O.042 Male 2 (57%) 16 (48%) 16 (59%) 24 (89%) 12 (71%) 6 (80%) 16 (64%) 25 (68%) 137 (66%) WBC eSOK 5 (71%) 16 (48%) 16 (59%) 6 (22%) 7 (41%) 9 (45%) 15 (60%) 24 (65%) 108 (52%) 0.012 5 blasts 1 (5%) 3 (9%) 8 (30%) 2 (7%) 2 (12%) 3 (12%) 2 (5%) 21 (10%) D29 MRD Negative 8 (50%) 28 (97%) 16 (70%) 19 (73%) 9 (53%) 14 (70%) 18 (72%) 12 (34%) 124 (65%) <0.001 Positive 8 (50%) 1 (3%) 7 (30%) 7 (27%) 8 (47%) 6 (30%) 7 (28%) 23 (66%) 67 (35%) Relapse- 1 years 0.762 O.909 1.OOO O.962 1.OOO 1.OOO O.960 O.945 O.OO2 free 2 years 0.667 0.758 O846 O.801 1.OOO 1.OOO O.88O 0.723 survival 3 years 0.667 0.758 O.808 O.761 O.878 O.944 O.798 0.556 4 years 0.667 0.727 O.731 O.623 O.816 O.944 O.62O O.395 5 years 0.667 0.727 O.639 O.SS4 O.816 O.944 0.517 O.211

Comparison of 207 Samples to Entire Cohort TABLE 5S-continued The 207 samples that were tested in this study were select from a total of 272 eligible patients on the basis of their 35 Comparison of the 207 Tested Samples to the 65 Not Tested availability. There were 65 patients for which the sample criteria for inclusions were not met. Typically, this reflected Variable Value 65 207 Comparison P that the blast count was too low (<80%). In some cases this was due to insufficient amount of banked sample or failure or MLL Positive 4f65 20/207 Pos v Neg O.462 th let tth C standards. I ffort 40 TEL Positive 1.65 3/205 Pos v Neg 1.OOO E. e R E. Q aras. In an eIIo f TRISOM Positive 4/61 5/206 Pos v Neg O.124 to address whether the samples we teste a representativeo E2A Positive 5/64. 23/207 Pos v Neg O.638 the full cohort we compared the clinical variables of our 207 CNS Positive 1 1/65 47/207 Pos v Neg O.387 samples to the remaining 65. All variables except age and TESTIC Positive 2/54 4/143 Pos v Neg O. 666 WBC count were evaluated by Fisher's exact test. Age and CONGEN Downs Of 40 8/162 Downs v known 0.362 WBC counts were analyzed using Mann-Whitney rank sum D29 MRD >0.01% 40/59 124/191 Pos v Neg 0.755 testing. The following table reflects the raw numbers and D8 MRD >0.01% 18/59 31/184 Pos v Neg O.O27 associated p-values from these analyses. All variables not AGE (days) Median 5056.0 4782.0 Mann-Whitney O.O26 reaching significance at p<0.05 are shaded with gray. WBC Median 4.5 62.3 Mann-Whitney 0.000 50 TABLESS

Comparison of the 207 Tested Samples to the 65 Not Tested Only four variables (age, WBC count, D8 MRD and sex)

55 reached levels of significance. The WBC count by itself is Variable Value 65 207 Comparison P indicative of why these samples were not included in the

Sex Male 5265 137,207 My F O.044 testing. The median WBC count for the 65 samples omitted Race Caucasian 33.62. 126.205 Cauc w known O.301 from this study is 4.5 K/uL. This is more than ten fold lower Hispanic 1S 62 51/205 Hisp v known 1.OOO 60 than the median for the 207 samples that we tested and is even below the median WBC count for individuals without leuke Black 7.62 13,205 Blackw known O.267 Hawaiian 162 1,205 Hawaiian v known 0.411 mia. The majority of the other variables are quite comparable Asian 3.62 7.205 Asian v known O.702 between the two groups. None of the variables identified in Am. Indian 2.62 3,205 AmInd v known O.329 65 this paper as having noteworthy correlations with specific Other 162 4,205. Other w known 1.OOO clusters (Hispanic race, age and D29 MRD status, in particu lar) were significantly different between the two groups. US 8,568,974 B2 41 42 Probesets Used For Clustering TABLE 6S 100 Probe sets used to define H-Groups Probe Set ID Gene Symbol Gene Title Chrom 1552924 a at PITPNM2 phosphatidylinositol transfer protein, membrane-associated 2 12q24.31 1554026 a. at MYO10 myosin X 5.1-p14.3 1555270 a. at WFS1 Wolfram syndrome 1 (wolframin) 6 1556037 s at HHIP hedgehog interacting protein 1557411 sat SLC25A43 solute carrier family 25, member 43 1563335 at IRGM immunity-related GTPase family, M 3.1 201105 at LGALS1 lectin, galactoside-binding, soluble, 1 (galectin 1) 22d 13.1 201212 at LGMN legumain 14q32.1 201669 s a MARCKS myristoylated alanine-rich protein kinase C Substrate 2.2 201876 at PON2 paraOXonase 2 1.3 202242 at TSPAN7 tetraspanin 7 Xp11.4 202336 s a PAM peptidylglycine alpha-amidating monooxygenase 4-q21 202976 s a RHOBTB3 Rho-related BTB domain containing 3 5 203434 s a MME membrane metallo-endopeptidase 5.1-q25.2 203948 s a MPO myeloperoxidase 17q23.1 204066 s a CENTO2 centaurin, gamma 2 4.3-p24.1 204115 at GNG11 guanine nucleotide binding protein (G protein), gamma 11 204304 s a PROM1 prominin 1 5.32 204438 at MRC1 if MRC1L1 mannose receptor, C type 1 if mannose receptor, C type 1-like 1 2.33 204439 at IFI44L interferon-induced protein 44-like .1 204848 x a HBG1 HBG2 hemoglobin, gamma A if hemoglobin, gamma G 5.5 204913 s a SOX11 SRY (sex determining region Y)-box 11 5 205289 at BMP2 bone morphogenetic protein 2 205290 is a BMP2 bone morphogenetic protein 2 O 206067 s a WT1 Wilms tumor 1 207173 x a CDH11 cadherin 11, type 2, OB-cadherin (osteoblast) 2 207978 s a NR4A3 nuclear receptor Subfamily 4, group A, member 3 s 2 209167 at GPM6B glycoprotein M6B 2 2 2 209191 at TUBB6 tubulin, beta 6 2094.80 at HLA-DQB1 major histocompatibility complex, class II, DQ beta 1 209959 at NR4A3 nuclear receptor Subfamily 4, group A, member 3 2 210512 s at VEGFA vascular endothelial growth factor A O517 s at AKAP12 Akinase (PRKA) anchor protein (gravin) 12 O993 s at SMAD1 SMAD family member 1 1597 s at HOP homeodomain-only protein 2154 a SDC2 Syndecan 2 2192 a KCTD12 potassium channel tetramerisation domain containing 12 22.3 2592 a G Immunoglobulin J polypeptide, linker protein for immunoglobulin alpha and mu polypeptides 3371 a LDB3 LIM domain binding 3 3831 a HLA-DQA1 major histocompatibility complex, class II, DQ alpha 1 3880 a LGRS leucine-rich repeat-containing G protein-coupled receptor 5 3894 a THSD7A thrombospondin, type I, domain containing 7A 4039 s a LAPTM4B lysosomal associated protein transmembrane 4 beta 4366 s a ALOX5 arachidonate 5-lipoxygenase 5028 a SEMA6A Sema domain, transmembrane domain (TM), and cytoplasmic domain, (semaphorin) 6A 2 5177 s a TGA6 integrin, alpha 6 5721 a GHG1 immunoglobulin heavy constant gamma 1 (G1m marker) s 7022 s a GHA1 if IGHA2 immunoglobulin heavy constant alpha 1 if immunoglobulin heavy constant alpha 2 (A2m marker) 8469 a GREM1 gremlin 1, cysteine knot Superfamily, homolog (Xenopus laevis) 86.25 a. NRN1 neuritin 1 8793 s a SCML.1 sex comb on midleg-like 1 (Drosophila) Xp22.2-p22.1 8880 a FOSL2 FOS-like antigen 2 3.3 8899 s a BAALC brain and acute leukemia, cytoplasmic 2.3 219666 a MS4A6A membrane-spanning 4-domains, Subfamily A, member 6A 220448 a KCNK12 potassium channel, Subfamily K, member 12 220450 a hCG 1778643 hCG1778643 222101 s a DCHS1 dachsous 1 (Drosophila) 222154 s a LOC26010 viral DNA polymerase-transactivated protein 6 223449 a SEMA6A Sema domain, transmembrane domain (TM), and cytoplasmic domain, (semaphorin) 6A 223.600 s a KIAA1683 KIAA1683 223708 a C1OTNF4 C1q and tumor necrosis factor related protein 4 225496 s a SYTL2 synaptotagmin-like 2 225548 a SHROOM3 shroom family member 3 225681 a CTHRC1 collagen triple helix repeat containing 1 225962 a ZNRF1 Zinc and ring finger 1 226244 a CLEC14A C-type lectin domain family 14, member A 226764 a LOC152485 hypothetical protein LOC152485 227361 a HS3ST3B1 heparan sulfate (glucosamine) 3-O-sulfotransferase 3B1 12-p11.2 227486 a NTSE 5'-nucleotidase, ecto (CD73) 4-q21 227530 a. AKAP12 Akinase (PRKA) anchor protein (gravin) 12 4-q25 US 8,568,974 B2 43 44 TABLE 6S-continued 100 Probe Sets used to define H-Groups Probe Set ID Gene Symbol Gene Title Chrom 227798 a SMAD1 SMAD family member 1 4q31 227923 a SHANK3 SH3 and multiple ankyrin repeat domains 3 22d 13.3 228.083 a CACNA2D4 calcium channel, Voltage-dependent, alpha 2 delta Subunit 4 12p13.33 228297 a Transcribed locus 228434 a BTNL9 butyrophilin-like 9 5q.35.3 228667 a AGPAT4 1-acylglycerol-3-phosphate O-acyltransferase 4 (lysophosphatidic 6q26 acid acyltransferase, delta) 228.737 a TOX2 TOX high mobility group box family member 2 20g 13.12 228854 a Transcribed locus 228988 a ZNF711 Zinc finger protein 711 Xq21.1-q21.2 229.072 a. CDNA clone IMAGE:5259272 2298.30 a. Transcribed locus 229902 a FLT4 fins-related tyrosine kinase 4 5q.35.3 231935 a. ARPP-21 cyclic AMP-regulated phosphoprotein, 21 kD 3p22.3 232231 a RUNX2 runt-related transcription factor 2 6p21 235099 a CMTM8 CKLF-like MARVEL transmembrane domain containing 8 3p22.3 235652 a CDNA FLJ37623 fis, clone BRCOC2014013 236203 a 236918 s at LRRC34 leucine rich repeat containing 34 3q26.2 238018 a hCG 1990.170 hypothetical protein LOC285016 2p25.3 238429 a TMEM11 transmembrane protein 71 8q24.22 238919 a Full-length cDNA clone CSODF024 YNO4 of Fetal brain of Homo sapiens (human) 240179 a 240336 a HBM hemoglobin, mu 16p13.3 241535 a. LOCA28176 hypothetical protein LOC728176 2p25.3 24-1844 x at TMEM156 transmembrane protein 156 4p14 2424.68 a 243756 a 244413 a CLECL1 C-type lectin-like 1 12p13.31 244623 a KCNQ5 potassium voltage-gated channel, KQT-like Subfamily, member 5 6q14 244665 a. Transcribed locus

TABLE 7S 215 ROSE Probe sets used to define R-Groups Probe Set ID Gene Symbol Gene Title Chrom 552398 a. at CLEC12A C-type lectin domain family 12, member A 12p13.2 55.2511 a. at CPA6 carboxypeptidase A6 8q13.2 552767 a. at HS6ST2 heparan sulfate 6-O-sulfotransferase 2 Xq26.2 553963 at RHOB ras homolog gene family, member B 2p24 55.4343 a at STAP1 signal transducing adaptor family member 1 4q13.2 554633 a at MYT1L. myelin transcription factor 1-like 2p25.3 555579 s at PTPRM protein tyrosine phosphatase, receptor type, M 18p11.2 555745 a. at LYZ lysozyme (renal amyloidosis) 12q15 556210 at CDNA FLJ38810 fis, clone LIVER2006251 557534 at LOC339862 hypothetical protein LOC339862 3p24.3 558214 s at CTNNA1 catenin (cadherin-associated protein), alpha 1, 102 kDa 5q31 558708 at NRXN1 neurexin 1 2p16.3 559394 a at Full length insert cDNA clone ZC65D06 559459 at LOC613266 hypothetical LOC613266 20p12.1 559477 s at MEIS1 Meis homeobox 1 2p14-p13 561025 at CDNA FLJ23762 fis, clone HEP18324 561765 at MRNA adjacent to 3' end of integrated HPV 16 (INT475) 563396 X at Homo sapiens, clone IMAGE:428.1761, mRNA 56.6825 at CDNA FLJ31010 fis, clone HLUNG2000174 567387 at 568603 at CADPS Ca2+-dependent secretion activator 56.9591 at F11 coagulation factor XI (plasma thromboplastin antecedent) 200799 at HSPA1A heat shock 70 kDa protein 1A 201105 at LGALS1 lectin, galactoside-binding, soluble, 1 (galectin 1) 2O1579 at FAT FAT tumor Suppressor homolog 1 (Drosophila) 201656 at ITGA6 integrin, alpha 6 201842 s at EFEMP1 EGF-contalning fibulin-like extracellular matrix protein 1 202178 at PRKCZ protein kinase C, Zeta 202207 at ARL4C ADP-ribosylation factor-like 4C 202273 at PDGFRB platelet-derived growth factor receptor, beta polypeptide 202336 s at PAM peptidylglycine alpha-amidating monooxygenase 202409 at IGF2 ... INS-IGF2 insulin-like growth factor 2 (somatomedin A) if insulin- insulin-like 11p15.5 growth factor 2 US 8,568,974 B2 45 46 TABLE 7S-continued 215 ROSE Probe Sets used to define R-Groups Probe Set ID Gene Symbol Gene Title Chrom 202411 at IFI27 interferon, alpha-inducible protein 27 4q32 202859 x at IL8 interleukin 8 4q13-q21 202917 s at S100A8 S100 calcium binding protein A8 q21 202988 s at RGS1 regulator of G-protein signaling 1 q31 203290 at HLA-DQA1 major histocompatibility complex, class II, DQ alpha 1 6p21.3 203329 at PTPRM protein tyrosine phosphatase, receptor type, M 8p11.2 203476 at TPBG trophoblast glycoprotein 6q14-q15 203535 at S100A9 S100 calcium binding protein A9 q21 203695 s a DFNAS deafness, autosomal dominant 5 7p15 203726 s a LAMA3 laminin, alpha 3 8q11.2 203757 s a CEACAM6 carcinoembryonic antigen-related cell adhesion molecule 6 (non- 9q13.2 specific cross reacting antigen) 20386.5 s a ADARB1 adenosine deaminase, RNA-specific, B1 (RED1 homolog rat) 21q22.3 203910 at ARHGAP29 Rho GTPase activating protein 29 22.1 203921 at CHST2 carbohydrate (N-acetylglucosamine-6-O) sulfotransferase 2 3q24 203948 s a MPO myeloperoxidase 7q23.1 203949 at MPO myeloperoxidase 7q23.1 204014 at DUSP4 dual specificity phosphatase 4 8p12-p11 204066 s a CENTO2 centaurin, gamma 2 2p24.3-p24.1 204069 at MEIS1 Meis homeobox 1 2p14-p13 204114 at NID2 nidogen 2 (osteonidogen) 4q21-q22 204150 at STAB1 stabilin 1 3.21.1 204304 s a PROM1 prominin 1 4p15.32 204419 x a HBG2 hemoglobin, gamma G 1p15.5 204439 at IFI44L interferon-induced protein 44-like 31.1 204704 s a ALDOB aldolase B, fructose-bisphosphate 9q21.3-q22.2 204848 x a HBG1 HBG2 hemoglobin, gamma A if hemoglobin, gamma G 1p15.5 204895 x a MUC4 mucin 4, cell Surface associated 3q29 204913 s a SOX11 SRY (sex determining region Y)-box 11 225 204914 s a SOX11 SRY (sex determining region Y)-box 11 225 204.915 s a SOX11 SRY (sex determining region Y)-box 11 225 205239 at AREG II LOC727738 amphiregulin (schwaninoma-derived growth factor) i? similar to 4q13-q21 ft/ Amphiregulin precursor (AR) (Colorectum cell-derived growth 4q13.3 factor) (CRDGF) 205253 at PBX1 pre-B-cell leukemia homeobox 1 q23 205347 s at TMSL.8 thymosin-like 8 Xq21.33-q22.3 205413 at MPPED2 metallophosphoesterase domain containing 2 1p13 205445 at PRL prolactin 6p22.2-p21.3 205489 at CRYM crystallin, mu 6p13.11-p12.3 205656 at PCDH17 protocadherin 17 3q21.1 205844 at VNN1 vanin 1 6q23-q24 205899 at CCNA1 cyclin A1 3q12.3-q13 205950s at CA1 carbonic anhydrase I 8q13-q22.1 206028 s at MERTK c-mer proto-oncogene tyrosine kinase 2a14.1 206067 s at WT1 Wilms tumor 1 1p13 206070 s at EPHA3 EPH receptor A3 3p11.2 206181 at SLAMF1 signaling lymphocytic activation molecule family member 1 q22-q23 206258 at ST8SLAS ST8 alpha-N-acetyl-neuraminide alpha-2,8-sialyltransferase 5 8q21.1 206298 at ARHGAP22 Rho GTPase activating protein 22 Oq11.22 206310 at SPINK2 serine peptidase inhibitor, Kazal type 2 (acrosin-trypsin inhibitor) 4q12 206413 s a TCL1B TCL6 T-cell leukemia/lymphoma 1B fill T-cell leukemia/lymphoma 6 4q32.1 206478 at KIAAO125 KIAAO125 4q32.33 206633 at CHRNA1 cholinergic receptor, nicotinic, alpha 1 (muscle) 2d24-q32 206952 at G6PC glucose-6-phosphatase, catalytic subunit 7q21 207173 x a CDH11 cadherin 11, type 2, OB-cadherin (osteoblast) 6q22.1 207831 x a DHPS deoxyhypusine synthase 9p13.2-p13.1 2083.03 s a CRLF2 cytokine receptor-like factor 2 Xp22.3; Yp 11.3 208567 s a KCNJ12 potassium inwardly-rectifying channel, Subfamily J, member 12 7p11.1 209101 at CTGF connective tissue growth factor 6d23.1 209291 at ID4 inhibitor of DNA binding 4., dominant negative helix-loop-helix 6p22-p21 protein 209604 S a GATA3 GATA binding protein 3 Op15 209875 s a SPP1 Secreted phosphoprotein 1 (osteopontin, bonesialoprotein I, early 4q21-q25 T-lymphocyte activation 1) 209897 s a SLIT2 slit homolog 2 (Drosophila) 4p15.2 209905 at HOXA9 homeobox A9 7p15-p14 210016 at MYT1L. myelin transcription factor 1-like 2p25.3 21O150 s a LAMAS laminin, alpha 5 20d 13.2-q13.3 210664 s a TFPI tissue factor pathway inhibitor (lipoprotein-associated coagulation 2q32 inhibitor) 210665 at TFPI tissue factor pathway inhibitor (lipoprotein-associated coagulation 2q32 inhibitor) 2108.69 s a MCAM melanoma cell adhesion molecule 11q23.3 211341 at POU4F1 POU class 4 homeobox 1 13q31.1 211506 s a IL8 interleukin 8 4q13-q21 US 8,568,974 B2 47 48 TABLE 7S-continued 215 ROSE Probe Sets used to define R-Groups Probe Set ID Gene Symbol Gene Title Chrom 211657 a CEACAM6 carcinoembryonic antigen-related cell adhesion molecule 6 (non- 19q13.2 specific cross reacting antigen) 212062 a ATP9A ATPase, Class II, type 9A 20d 13.2 212077 a CALD1 caldesmon 1 7q33 212094 a PEG10 paternally expressed 10 7q21 212148 a PBX1 pre-B-cell leukemia homeobox 1 1q23 212151 a PBX1 pre-B-cell leukemia homeobox 1 1q23 212192 a KCTD12 potassium channel tetramerisation domain containing 12 13q22.3 213005 s at ANKRD15 ankyrin repeat domain 15 9p24.3 213150 a HOXA10 homeobox A10 7p15-p14 213258 a TFPI issue factor pathway inhibitor (lipoprotein-associated coagulation 2q32 inhibitor) 213317 a CLICS chloride intracellular channel 5 6p21.1-p12.1 213362 a PTPRD protein tyrosine phosphatase, receptor type, D 9p23-p24.3 213371 a LDB3 LIM domain binding 3 10q22.3-q23.2 213479 a NPTX2 neuronal pentraxin II 7q21.3-q22.1 213515 x a HBG1 HBG2 hemoglobin, gamma A if hemoglobin, gamma G 11 p15.5 213714 a CACNB2 calcium channel, Voltage-dependent, beta 2 subunit 10p12 213844 a HOXAS homeobox A5 7p15-p14 213880 a LGRS eucine-rich repeat-containing G protein-coupled receptor 5 12q22-q23 21414.6 s a PPBP pro-platelet basic protein (chemokine (C-X-C motif) ligand 7) 4q12-q13 214537 a HIST1H1D histone cluster 1, H1d 6p21.3 214651 s a HOXA9 homeobox A9 7p15-p14 215177 s a TGA6 integrin, alpha 6 2a31.1 215379 x a GL immunoglobulin lambda locus 22q11.1-q11.2 215692 s a MPPED2 metallophosphoesterase domain containing 2 1113 217109 a MUC4 mucin 4, cell Surface associated 3q29 217281 X a L8 interleukin 8 4q13-q21 217963 s a NGFRAP1 nerve growth factor receptor (TNFRSF16) associated protein 1 Xa22.2 218086 a NPDC1 neural proliferation, differentiation and control. 1 9q34.3 218847 a GF2BP2 insulin-like growth factor 2 mRNA binding protein 2 3q27.2 219463 a C20orf103 chromosome 20 open reading frame 103 2012 21948.9 s a NXN nucleoredoxin 713.3 220059 a STAP1 signal transducing adaptor family member 1 4q13.2 220377 a FAM3OA family with sequence similarity 30, member A 4q32.33 220416 a ATP8B4 ATPase, Class I, type 8B, member 4 5q21.2 221254 s a PITPNM3 PITPNM family member 3 713 221417 x a EDG8 endothelial differentiation, sphingolipid G-protein-coupled 913.2 receptor, 8 221933 a NLGN4X neuroligin 4, X-linked Xp22.32-p22.31 222934 s a CLEC4E C-type lectin domain family 4, member E 2p13.31 223121 s a SFRP2 secreted frizzled-related protein 2 4q31.3 223216 x a FBXO16 / ZNF395 zinc finger protein 395 / F-box protein 16 8p21.1 223786 a CHST6 carbohydrate (N-acetylglucosamine 6-O) sulfotransferase 6 6q22 224215 s a DLL1 delta-like 1 (Drosophila) 6q27 225483 a VPS26B vacuolar protein sorting 26 homolog B (S. pombe) 1d25 225496 s a SYTL2 synaptotagmin-like 2 1a14 225681 a CTHRC1 collagen triple helix repeat containing 1 8q22.3 226282 a Full length insert cDNA clone ZEO3F06 226415 a. KIAA1576 KIAA1576 protein 6d23.1 226733 a PFKFB2 6-phosphofructo-2-kinasef fructose-2,6-biphosphatase 2 q3 226913 s at SOX8 SRY (sex determining region Y)-box 8 6p13.3 227099 s at LOC387763 hypothetical LOC387763 1p11.2 227289 a PCDH17 protocadherin 17 3d21.1 227439 a ANKS1B ankyrin repeat and sterile alpha motif domain containing 1B 2d23.1 227440 a ANKS1B ankyrin repeat and sterile alpha motif domain containing 1B 2d23.1 227441 s at ANKS1B ankyrin repeat and sterile alpha motif domain containing 1B 2d23.1 227949 a PHACTR3 phosphatase and actin regulator 3 2013.32 228O17 s at C20orf58 chromosome 20 open reading frame 58 2013.33 228.057 a DDIT4L DNA-damage-inducible transcript 4-like 4q23 228262 a MAP7D2 MAP7 domain containing 2 Xp22.12 228434 a BTNL9 butyrophilin-like 9 5q.35.3 228462 a RX2 iroquois homeobox2 5p15.33 228863 a PCDH17 protocadherin 17 13q21.1 22.9233 a NRG3 neuregulin 3 10q22-q23 2294.61 x at NEGR1 neuronal growth regulator 1 1p31.1 229638 a RX3 iroquois homeobox 3 16q12.2 229661 a SALL4 sal-like 4 (Drosophila) 20d 13.13-q13.2 229975 a Transcribed locus 22.9985 a. BTNL9 Butyrophilin-like 9 5q.35.3 23.01.10 a MCOLN2 mucolipin 2 1p22 23.0128 a GL(a) Immunoglobulin lambda locus 22q11.1-q11.2 23O130 a. SLIT2 Slithomolog 2 (Drosophila) 4p15.2 230472 a. RX1 iroquois homeobox 1 5p15.3 230537 a US 8,568,974 B2 49 50 TABLE 7S-continued 215 ROSE Probe Sets used to define R-Groups Probe Set ID Gene Symbol Gene Title Chrom 230687 a SLC13A3 solute carrier family 13 (sodium-dependent dicarboxylate 20g 12-q13.1 transporter), member 3 230803 s at ARHGAP24 Rho GTPase activating protein 24 4q21.23-q21.3 230817 a FAM84B Family with sequence similarity 84, member B 8q24.21 231040 a CDNA FLJ43172 fis, clone FCBBF3007242 231166 a GPR155 G protein-coupled receptor 155 2a31.1 231223 a CSMD1 CUB and Sushi multiple domains 1 8p23. 231257 a TCERG1L, transcription elongation regulator 1-like 10q26.3 231455 a. FLJ42418 FLJ42418 protein 2p25.2 231771 a GB6 gap junction protein, beta 6 13q11 q12.113q12 231899 a ZC3H12C Zinc finger CCCH-type containing 12C 11q22.3 232231 a RUNX2 runt-related transcription factor 2 6p2 232523 a MEGF10 multiple EGF-like-domains 10 5q.33 232636 a SLITRK4 SLIT and NTRK-like family, member 4 Xq27.3 232914 sat SYTL2 synaptotagmin-like 2 11 q14 233225 a. CDNA FLJ36087 fis, clone TESTI2020283 233847 x at Uncharacterized gastric protein ZA31P 234261 a MRNA, cDNA DKFZp761M10121 (from clone DKFZp761M10121)- 234945 a. FAM54A family with sequence similarity 54, member A 6q23.3 235521 a HOXA3 homeobox A3 7p15-p14 235625 a. VPS41 vacuolar protein sorting 41 homolog (S. cerevisiae) 7p14-p13 235666 a ITGA8 integrin, alpha 8 10p13 23.5911 a LOC44O995 Hypothetical gene supported by BCO34933; BCO68085 3q29 235988 a GPR110 G protein-coupled receptor 110 6p12.3 236430 a. TMED6 transmembrane emp24 protein transport domain containing 6 16q22.1 236489 a Transcribed locus 236773 a Transcribed locus 238018 a hCG 1990.170 hypothetical protein LOC285016 2p25.3 238689 a GPR110 G protein-coupled receptor 110 6p12.3 239657 x at - 240179 a 240619 a Transcribed locus 240758 a 241535 a. LOCA28176 hypothetical protein LOC728176 2p25.3 241647 x at Transcribed locus 24-1960 a CSMD1 CUB and Sushi multiple domains 1 8p23.2 242172 a. MEIS1 Meis homeobox 1 2p14-p13 242385 a. RORB RAR-related orphan receptor B 9q22 242457 a Transcribed locus 2424.68 a 243533 x at — 244665 a. Transcribed locus 38487 at STAB1 stabilin 3p21.1 46665 at SEMA4C Sema domain, immunoglobulin domain (Ig), transmembrane 2d 11.2 domain (TM) and short cytoplasmic domain, (Semaphorin) 4C

TABLE 8S 215 COPA Probe sets used to define C-Groups (6 XIST probe sets in gray font Probe Set ID Gene Symbol Gene Title Chrom 1552398 a. at CLEC12A C-type lectin domain family 12, member A 160364 1553613 s at FOXC1 orkhead box C1 2296 1553629. a. at FAM71B amily with sequence similarity 71, member B 153745 1554343 a at STAP1 signal transducing adaptor family member 1 26228 1554633 a at MYT1L, myelin transcription factor 1-like 23040 1555579 s at PTPRM protein tyrosine phosphatase, receptor type, M 5797 1555745 a. at LYZ ysozyme (renal amyloidosis) 4069 1557534 at LOC339862 hypothetical protein LOC339862 339862 15594.77 s at MEIS1 Meis homeobox 1 4211 1559696 at Full length insert cDNA clone YW24B11 1566772 at MRNA, cDNA DKFZp547L1918 (from clone DKFZp547L1918) 1568.603 at CADPS Ca2+-dependent secretion activator 86.18 200799 at HSPA1A heat shock 70 kDa protein 1A 3303 200800 s at HSPA1A, HSPA1B heat shock 70 kDa protein 1A heat shock 70 kDa protein 1B 33O3, 3304 201105 at LGALS1 ectin, galactoside-binding, soluble, 1 (galectin 1) 3956 201215 at PLS3 plastin 3 (T isoform) 5358 2O1579 at FAT FAT tumor Suppressor homolog 1 (Drosophila) 2195 201656 at ITGA6 integrin, alpha 6 3655 201842 s at EFEMP1 EGF-containing fibulin-like extracellular matrix protein 1 22O2 202018 s at LOCA2832O, LTF actotransferrin similar to lactotransferrin 4057, 72832O US 8,568,974 B2 51 52 TABLE 8S-continued 215 COPA Probe Sets used to define C-Groups (6 XIST probe Sets in gray font Probe Set ID Gene Symbol Gene Title Chrom 202178 at PRKCZ protein kinase C, Zeta 5590 202411 at IFI27 interferon, alpha-inducible protein 27 3.429 202859 x at IL8 interleukin 8 3576 202917 s at S100A8 S100 calcium binding protein A8 6279 203131 at PDGFRA platelet-derived growth factor receptor, alpha polypeptide 5156 203153 at IFIT1 interferon-induced protein with tetratricopeptide repeats 1 3434 203290 at HLA-DQA1 major histocompatibility complex, class II, DQ alpha 1 3117 203329 at PTPRM protein tyrosine phosphatase, receptor type, M 5797 203335 at PHYH phytanoyl-CoA 2-hydroxylase 5264 203476 at TPBG trophoblast glycoprotein 7162 203535 at S100A9 S100 calcium binding protein A9 628O 203695 s at DFNAS deafness, autosomal dominant 5 1687 203757 s at CEACAM6 carcinoembryonic antigen-related cell adhesion molecule 6 (non- 468O specific cross reacting antigen) 20386.5 s at ADARB1 adenosine deaminase, RNA-specific, B1 (RED1 homolog rat) 104 203921 a CHST2 carbohydrate (N-acetylglucosamine-6-O) sulfotransferase 2 9435 203948 s at MPO myeloperoxidase 4353 203949 a MPO myeloperoxidase 4353 203973 s at CEBPD CCAAT? binding protein (C/EBP), delta 1052 204014 a DUSP4 dual specificity phosphatase 4 1846 204066 s at CENTO2 centaurin, gamma 2 116987 204069 a MEIS1 Meis homeobox 1 4211 204114 a NID2 nidogen 2 (osteonidogen) 22795 204134 a PDE2A phosphodiesterase 2A, c0MP-stimulated S138 204150 a STAB1 stabilin 1 23166 204273 a EDNRB endothelin receptor type B 1910 204304 sat PROM1 prominin 1 8842 204351a. S1 OOP S100 calcium binding protein P 6286 204363 a F3 coagulation factor III (thromboplastin, tissue factor) 215.2 204419 x at HBG2 hemoglobin, gamma G 3048 204439 a FI44L interferon-induced protein 44-like 10964 204469 a PTPRZ1 protein tyrosine phosphatase, receptor-type, Z polypeptide 1 S803 204482 a CLDNS claudin 5 (transmembrane protein deleted in velocardiofacial 7122 syndrome) 204745 x at MT1G metallothionein 1G 4495 204848 x at HBG1, HBG2 hemoglobin, gamma Aff hemoglobin, gamma G 3047, 3048 204895 X at MUC4 mucin 4, cell Surface associated 4585 204913 s at SOX11 SRY (sex determining region Y)-box 11 6664 204914 sat SOX11 SRY (sex determining region Y)-box 11 6664 204.915 s at SOX11 SRY (sex determining region Y)-box 11 6664 205239 a AREG///LOC727738 amphiregulin (schwaninoma-derived growth factor), similar to 374, 727738 Amphiregulin precursor (AR) (Colorectum cell-derived growth factor) (CRDGF) 205253 a PBX1 pre-B-cell leukemia homeobox 1 5087 205347 s at TMSL.8 thymosin-like 8 11013 205445 a. PRL prolactin 5617 205489 a CRYM crystallin, mu 1428 205656 a PCDH17 protocadherin 17 27253 205844 a VNN1 vanin 1 8876 205863 a S100A12 S100 calcium binding protein A12 6283 205899 a CCNA1 cyclin A1 89.00 205950s at CA1 carbonic anhydrase I 759 206070 s at EPHA3 EPH receptor A3 2042 206258 a ST8SLAS ST8 alpha-N-acetyl-neuraminide alpha-2,8-sialyltransferase 5 29906 206310 a SPINK2 serine peptidase inhibitor, Kazal type 2 (acrosin-trypsin inhibitor) 6691 206413 s at TCL1B, TCL6 T-cell leukemia/lymphoma 1B IIT-cell leukemia/lymphoma 6 27004, 9623 206461 x at MT1H NMT1P2 metallothionein 1H //metallothionein 1 pseudogene 2 4496, 645745 206478 a KIAAO125 KIAAO125 98.34 206633 a CHRNA1 cholinergic receptor, nicotinic, alpha 1 (muscle) 1134 206836 a SLC6A3 Solute carrier family 6 (neurotransmitter transporter, dopamine), 6531 member 3 207110 a KCNJ12 potassium inwardly-rectifying channel, Subfamily J, member 12 3768 207173 x at CDH11 cadherin 11, type 2, OB-cadherin (osteoblast) 1009 208173 a FNB1 interferon, beta 1, fibroblast 3456 2083.03 s at CRLF2 cytokine receptor-like factor 2 64109 208567 s at KCNJ12 potassium inwardly-rectifying channel, Subfamily J, member 12 3768 208581 X at MT1X metallothionein 1X 45O1 208937 s at D1 inhibitor of DNA binding 1, dominant negative helix-loop-helix 3397 protein 209289 at NFIB nuclear factor IB 4781 209290 s at NFIB nuclear factor IB 4781 209291 at D4 inhibitor of DNA binding 4., dominant negative helix-loop-helix 3400 protein 209301 at CA2 carbonic anhydrase II 760 209369 at ANXA3 annexin A3 306 209728 at HLA-DRB4 major histocompatibility complex, class II, DR beta 4 3126 US 8,568,974 B2 53 54 TABLE 8S-continued 215 COPA Probe Sets used to define C-Groups (6 XIST probe Sets in gray font Probe Set ID Gene Symbol Gene Title Chrom 209757 s at MYCN v-myc myelocytomatosis viral related oncogene, neuroblastoma 4613 derived (avian) 209897 s at SLIT2 slit homolog 2 (Drosophila) 9353 209905 at HOXA9 homeobox A9 3205 210016 at MYT1L. myelin transcription factor 1-like 23040 210254 at MS4A3 membrane-spanning 4-domains, Subfamily A, member 3 932 (hematopoietic cell-specific) 210664 S at TFPI tissue factor pathway inhibitor (lipoprotein-associated coagulation 7035 inhibitor) 210665 at TFPI tissue factor pathway inhibitor (lipoprotein-associated coagulation 7035 inhibitor) 211338 at IFNA2 interferon, alpha 2 3440 211456 x at MT1P2 metallothionein 1 pseudogene 2 64,5745 211506 s at IL8 interleukin 8 3576 211560s at ALAS2 aminolevulinate, delta-, synthase 2 (sideroblastic hypochromic 212 anemia) 211597 s at HOP homeodomain-only protein 84525 211639 x at SKAP2 Src kinase associated phosphoprotein 2 8935 211657 a CEACAM6 carcinoembryonic antigen-related cell adhesion molecule 6 (non- 468O specific cross reacting antigen) 212062 a ATP9A ATPase, Class II, type 9A 10079 212094 a PEG10 paternally expressed 10 23O89 212104 S at RBM9 RNA binding motif protein 9 23S43 212148 a PBX1 pre-B-cell leukemia homeobox 1 5087 212151 a PBX1 pre-B-cell leukemia homeobox 1 5087 212185 X at MT2A metallothionein 2A 45O2 212592 a IG mmunoglobulin J polypeptide, linker protein for immunoglobulin 3512 alpha and mu polypeptides 212859 x at MT1E metallothionein 1E 4493 213005 s at ANKRD15 ankyrin repeat domain 15 23 189 213258 a TFPI issue factor pathway inhibitor (lipoprotein-associated coagulation 7035 inhibitor) 213317 a CLICS chloride intracellular channel 5 53405 213371 a LDB3 LIM domain binding 3 11155 213479 a NPTX2 neuronal pentraxin II 4.885 213515 x a HBG1, HBG2 hemoglobin, gamma Aff hemoglobin, gamma G 3047, 3048 213844 a HOXAS homeobox A5 32O2 214218 s a XIST X (inactive)-specific transcript 7503 214349 a Transcribed locus 214651 s a HOXA9 homeobox A9 3205 214774 x a TOX3 TOX high mobility group box family member 3 27324 215177 s a ITGA6 integrin, alpha 6 3655 215214 a IGL(a) mmunoglobulin lambda locus 3535 215379 x a IGL(a)///IGLJ3/// immunoglobulin lambda locus immunoglobulin lambda variable 28793, 28815, IGLV2-14, FIGLV3- 3-25, immunoglobulin lambda variable 2-14?, immunoglobulin 28831, .353S 25 ambda joining 3 215692 s a MPPED2 metallophosphoesterase domain containing 2 744 215784 at CD1E CD1e molecule 913 216336 x a MT1A, FMT1M metallothionein 1A metallothionein 1M1 metallothionein 1 4489, 4499, 645745 MT1P2 pseudogene 2 216401 x a mmunoglobulin kappa light chain (IGKV gene), cell line JVM-2, clone 1 216491 x a GHM immunoglobulin heavy constant mu 3507 216853 x a GL(a) mmunoglobulin lambda locus 3535 216984 x a GL(a) mmunoglobulin lambda locus 3535 217083 at MAPKAPKS Mitogen-activated protein kinase-activated protein kinase 5 85.50 217109 at MUC4 mucin 4, cell Surface associated 4585 21711.0 s a MUC4 mucin 4, cell Surface associated 4585 217148 x a GLV2-14 immunoglobulin lambda variable 2-14 2881S 217179 x a Anti-thyroglobulin light chain variable region 21723.5 x a mmunoglobulin (mAb56) light chain V region mRNA, partial Sequence 217258 x a VD Sovaleryl Coenzyme A dehydrogenase 3712 217963 s a NGFRAP1 nerve growth factor receptor (TNFRSF16) associated protein 1 27.018 219463 at C20orf103 chromosome 20 open reading frame 103 24141 21948.9 s a NXN nucleoredoxin 64359 220010 at KCNE1L, KCNE1-like 23630 220059 at STAP1 signal transducing adaptor family member 1 26228 220416 at ATP8B4 ATPase, Class I, type 8B, member 4 798.95 221215 s a RIPK4 receptor-interacting serine-threonine kinase 4 S4101 221254 s a PITPNM3 PITPNM family member 3 83394 221 28 x a XIST X (inactive)-specific transcript 7503 221766 s a FAM46A amily with sequence similarity 46, member A 55603 221933 at NLGN4X neuroligin 4, X-linked 57502 222288 at Transcribed locus, moderately similar to XP 517655.1 similar to KIAAO825 protein Pan troglodytes US 8,568,974 B2 55 56 TABLE 8S-continued 215 COPA Probe sets used to define C-Groups (6 XIST probe sets in gray font Probe Set ID Gene Symbol Gene Title Chrom 222934 sat CLEC4E C-type lectin domain family 4, member E 26253 223121 sat SFRP2 secreted frizzled-related protein 2 6423 223278 a GJB2 gap Junction protein, beta 2, 26 kDa 27O6 223786 a SFTPA1, SFTPA1B, . Surfactant, pulmonary-associated protein A1B, surfactant, 6435, 6436. SFTPA2, pulmonary-associated protein A2B, surfactant, pulmonary 653509, 729238 SFTPA2B associated protein A1, surfactant, pulmonary-associated protein A2 223786 a carbohydrate (N-acetylglucosamine 6-O) sulfotransferase 6 4166 224215 s at delta-like 1 (Drosophila) 28S1.4 224.588 a X (inactive)-specific transcript 7503 224.589 a X (inactive)-specific transcript 7503 224.590 a X (inactive)-specific transcript 7503 225496 s at synaptotagmin-like 2 S4843 255.660 a Sema domain, transmembrane domain (TM), and cytoplasmic 57.556 domain, (semaphorin) 6A 225681 a collagen triple helix repeat containing 1 115908 226282 a Full length insert cDNA done ZEO3FO6 226415 a. KIAA1576 protein 57687 226621 a 226676 a Zinc finger protein 521 25925 226677 a Zinc finger protein 521 25925 226757 a interferon-induced protein with tetratricopeptide repeats 2 3433 226913 s at SRY (sex determining region Y)-box 8 3O812 227099 s at hypothetical LOC387763 387763 227289 a protocadherin 17 27253 227439 a ankyrin repeat and sterile alpha motif domain containing 1B 56899 227441 s at ankyrin repeat and sterile alpha motif domain containing 1B 56899 227671 a X (inactive)-specific transcript 7503 227949 a phosphatase and actin regulator 3 116154 228O17 s at chromosome 20 open reading frame 58 128414 228.057 a DNA-damage-inducible transcript 4-like 115265 228434 a butyrophilin-like 9 153579 228462 a iroquois homeobox2 153572 228854 a Transcribed locus 228863 a protocadherin 17 27253 22.9233 a neuregulin 3 10718 2294.61 x at neuronal growth regulator 1 257.194 229638 a iroquois homeobox 3 79191 229661 a sal-like 4 (Drosophila) 57167 22.9985 a. Butyrophilin-like 9 153579 23.0128 a Immunoglobulin lambda locus 3535 230472 a. iroquois homeobox 1 79.192 230537 a 231040 a CDNA FLJ43172 fis, clone FCBBF3007242 231223 a CUB and Sushi multiple domains 1 64.478 231257 a transcription elongation regulator 1-like 256.536 231771 8 1 gap junction protein, beta 6 10804 232231 a runt-related transcription factor 2 860 232523 a multiple EGF-like-domains 10 84.466 235988 a G protein-coupled receptor 110 266977 236489 a Transcribed locus 237613 a FOXR1 forkhead box R1 2831SO 238018 a hCG 1990.170 hypothetical protein LOC285016 28SO16 238423 a SYTL3 synaptotagmin-like 3 94120 238689 a GPR110 G protein-coupled receptor 110 266977 238900 a HLA-DRB1 HLA major histocompatibility complex, class II, DR beta 1/f major 3123, .3125, 73O415 DRB3, LOCA3O415 histocompatibility complex, class II, DR beta 3/7/hypothetical protein LOC730415 240179 a 240336 a hemoglobin, mu 3O42 240758 a 24O794 a Neuronal PAS domain protein 4 266743 24-1960 a CUB and Sushi multiple domains 1 64.478 242172 a. Meis homeobox 1 4211 242457 a Transcribed locus 2424.68 a 242747 a 243533 x at 244463 a ADAM23 ADAM metallopeptidase domain 23 244665 a. Transcribed locus US 8,568,974 B2 57 58 Probesets Associated with Rose Clusters (by Average Rank Order) TABLE 9S Top 50 Rl Probe Set ID Rank Gene Gene Title EntrezID Chrom 242172 a. 96 MEIS1 Meis homeobox 1 4211 2p14-p13 1559477 s at 96 MEIS1 Meis homeobox 1 4211 2p14-p13 204069 a 94 MEIS1 Meis homeobox 1 4211 2p14-p13 219463 a 93 C20orf103 chromosome 20 open reading frame 103 24141 20p12 235479 a 93 CPEB2 cytoplasmic polyadenylation element binding protein 2 132864 4p15.33 1558111 at 93 MBNL1 muscleblind-like (Drosophila) 4154 3q25 226415 a. 90 KIAA1576 KIAA1576 protein 57687 16q23.1 227877 a 89 CSOrf39 chromosome 5 open reading frame 39 389.289 5p12 235879 a 89 MBNL1 Muscleblind-like (Drosophila) 4154 3q25 226939 a 88 CPEB2 cytoplasmic polyadenylation element binding protein 2 132864 4p15.33 213844 a 87 HOXAS homeobox A5 32O2 7p15-p14 202976 s at 86 RHOBTB3 Rho-related BTB domain containing 3 22836 5q15 202975 s at 86 RHOBTB3 Rho-related BTB domain containing 3 22836 5q15 232645 a. 85 LOC153684 hypothetical protein LOC153684 153684 5p12 225202 a 85 RHOBTB3 Rho-related BTB domain containing 3 22836 5q15 241681 a 85 - Transcribed locus 3q25.2 242414 a 84 QPRT quinolinate phosphoribosyltransferase (nicotinate- 23475 16p11.2 nucleotide pyrophosphorylase (carboxylating)) 1568589 at 84 — Clone FLB3512 mRNA sequence 10q21.3 209905 a 84 HOXA9 homeobox A9 3205 7p15-p14 238712 a 83 – Transcribed locus 3p 14.1 228365 a. 82 CPNE8 copine VII 1444O2 12q12 235291 s at 82 FLJ32255 hypothetical protein LOC643977 643977 5p12 201105 a 82 LGALS1 ectin, galactoside-binding, soluble, 1 (galectin 1) 3956 22d 13.1 204044 a 81 QPRT quinolinate phosphoribosyltransferase (nicotinate- 23475 16p11.2 nucleotide pyrophosphorylase (carboxylating)) 238498 a 81 — MRNA full length Insert cDNA clone EUROIMAGE – 6q23.3 O902O7 219988 s at 81 C1orf164 chromosome 1 open reading frame 164 55182 1p34.1 205899 a 81 CCNA1 cyclin A1 89.00 13q12.3- q13 227235 a. 81 — CDNA clone IMAGE:5302158 4.d32.1 209822 s at 80 VLDLR very low density lipoprotein receptor 7436 9p24 155.6657 at 80 — CDNA FLJ36459 (fis, clone THYMU2014762 3d25.2 215163 a 80 — 3d27.2 222409 a 80 CORO1C coronin, actin binding protein, 1C 23603 2q24.1 232298 a 79 hoG 1806964 hoG1806964 401093 3d25.1 212588 a 79 PTPRC protein tyrosine phosphatase, receptor type, C 5788 q31-q32 214651s at 79 HOXA9 homeobox A9 3205 7p15-p14 204304 sat 79 PROM1 prominin 1 8842 4p15.32 204526 s at 79 TBC1D8 TBC1 domain family, member 8 (with GRAM domain) 11138 2d 11.2 210555 s at 79 NFATC3 nuclear factor of activated T-cells, cytoplasmic, 4775 6q22.2 calcineurin-dependent 3 2098.25 s at 78 UCK2 uridine-cytidine kinase 2 7371 d23 240180 at 78 - MRNA full length insert cDNA clone EUROIMAGE 6d23.3 1090207 201875 s at 78 LOC644387 fff myelin protein zero-like 1 i? similar to myelin protein 644387 ff, 1d24.2 fif MPZL.1 zero-like 1 isoform a 9019 711.21 202890 at 78 MAP7 microtubule-associated protein 7 9053 6d23.3 201153 s at 78 MBNL1 muscleblind-like (Drosophila) 4154 3d25 226568 at 78 FAM102B family with sequence similarity 102, member B 284611 p13.3 213147 at 78 HOXA10 homeobox A10 32O6 7p15-p14 206289 at 78 HOXA4 homeobox A4 32O1 7p15-p14 243605 at 78 - Transcribed locus 4p15.33 234032 at 78 - PRO1550 9p 13.2 209101 at 78 CTGF connective tissue growth factor 1490 6q23.1 227534 at 77 C9Crf21 chromosome 9 open reading frame 21 195827 9q22.32

TABLE 1 OS Top 50 R2 Probe Set ID Rank Gene Gene Title EntrezID Chrom 212148 at 196 PBX1 pre-B-cell leukemia homeobox 1 5087 1q23 212151 at 196 PBX1 pre-B-cell leukemia homeobox 1 5087 1q23 205253 at 195 PBX1 pre-B-cell leukemia homeobox 1 5087 1q23 206028 s at 195 MERTK c-mer proto-oncogene tyrosine kinase 10461 2d 14.1 225235 at 195 TSPAN17 tetraspanin 17 26262 5q.35.3 227439 at 195 ANKS1B ankyrin repeat and sterile alpha motif domain containing 56899 12q23.1 US 8,568,974 B2 59 60 TABLE 10S-continued Top 50 R2 Probe Set ID Rank Gene Gene Title EntrezID Chrom

1B 227440 at 95 ANKS1B ankyrin repeat and sterile alpha motif domain containing 56899 12q23.1 1B 227441 s at 95 ANKS1B ankyrin repeat and sterile alpha motif domain containing 56899 12q23.1 1B 227949 at 95 PHACTR3 phosphatase and actin regulator 3 116154 20d 13.32 232289 at 95 KCNJ12 potassium inwardly-rectifying channel, Subfamily J, 3768 17p11.1 member 12 234261 at 95 - MRNA, cDNA DKFZp761M10121 (from clone 12q23.1 DKFZp761M10121) 202178 at 94 PRKCZ protein kinase C, Zeta 5590 1p36.33 p36.2 202206 at 94 ARL4C ADP-ribosylation factor-like 4C 101.23 2C37.1 202207 at 94 ARL4C ADP-ribosylation factor-like 4C 101.23 2C37.1 204114 at 94 NID2 nidogen 2 (osteonidogen) 22795 14q21-q22 211913 s at 94 MERTK c-mer proto-oncogene tyrosine kinase 10461 2a14.1 46665 at 94 SEMA4C Sema domain, immunoglobulin domain (Ig), S4910 2d 11.2 transmembrane domain (TM) and short cytoplasmic domain, (semaphorin) 4C 224022 X at 94 WNT16 wingless-type MMTV integration site family, member 1651384 7q31. 225483 at 94 WPS2.6B vacuolar protein sorting 26 homolog B (S. pombe) 112936 1q25 23.5911 at 94 LOC440995 Hypothetical gene supported by BCO34933; BCO68085 440995 3d29 238778 at 94 MPP7 membrane protein, palmitoylated 7 (MAGUK p55 143098 Op11.23 Subfamily member 7) 2O1579 at 93 FAT FAT tumor suppressor homolog 1 (Drosophila) 2195 4.d35 202208 s at 93 ARL4C ADP-ribosylation factor-like 4C 101.23 2C37.1 212789 at 93 NCAPD3 non-SMC condensin II complex, subunit D3 23310 1d25 223693 s at 93 FLJ10324 FLJ10324 protein 55698 7p22.1 2292.47 at 93 FLJ37440 hypothetical protein FLJ37440 1298.04 2d 13 206181 at 92 SLAMF1 signaling lymphocytic activation molecule family 6504 q22-q23 member 1 209558 s at 92 HIP1R, huntingtin interacting protein 1 related similar to 728O14 12q24 LOC728O14 huntingtin interacting protein 1 related 9026 2d24.31 213005 s at 92 ANKRD15 ankyrin repeat domain 15 231.89 9p24.3 38340 at 92 HIP1R, huntingtin interacting protein 1 related similar to 728O14 12q24 LOC728O14 huntingtin interacting protein 1 related 9026 2d24.31 230306 at 92 WPS2.6B vacuolar protein sorting 26 homoiog B (S. pombe) 112936 1d25 204225 at 91 HDAC4 histone deacetylase 4 97.59 2a37.3 22.977O at 91 GLT1D1 glycosyltransferase 1 domain containing 1 144423 2d24.32 243533 x at 91 — 2d23.1 206255 at 90 BLK B lymphoid tyrosine kinase 640 8p23-p22 21O150 s at 90 LAMAS aminin, alpha 5 3911 20a13.2- q13.3 225313 at 90 C20orf177 chromosome 20 open reading frame 177 63939 20a13.2- q13.33 231040 at 90 - CDNA FLJ43172 fis, clone FCBBF3007242 9q21.13 242385 at 90 RORB RAR-related orphan receptor B 6096 9q22 200790 at 89 ODC1 ornithine decarboxylase 1 4953 2p25 205159 at 89 CSF2RB colony stimulating factor 2 receptor, beta, low-affinity 1439 22d 13.1 (granulocyte-macrophage) 242957 at 89 VWCE von Willebrand factor C and EGF domains 22OOO1 11 q12.2 208567 s at 88 KCNJ12 potassium inwardly-rectifying channel, Subfamily J, 3768 17p11.1 member 12 1559394 a at 88 — Full length insert cDNA clone ZC65D06 1p31.3 215807 s at 87 PLXNB1 plexin B1 S364 3p21.31 220911 s at 8.7 KIAA1305 KIAA1305 57523 14q12 234985 at 87 LDLRAD3 low density lipoprotein receptor class A domain 143458 11 p.13 containing 3 235666 at 87 ITGA8 integrin, alpha 8 8516 10p13 202478 at 86 TRIB2 tribbles homobg 2 (Drosophila) 28.951 2p25.1- p24.3 204202 at 86 IQCE IQ motif containing E 23288 7p22.2

TABLE 11S Top 50 R2A Probe Set ID Rank Gene Gene Title EntrezID Chrom 205659 at 2O1 HDAC9 histone deacetylase 9 9734 7p21.1 217869 at 2O1 HSD17B12 hydroxysteroid (17-beta) dehydrogenase 12 51144 11p11.2 23.0128 at 199 IGL(a) Immunoglobulin lambda locus 3535 22q11.1- q11.2 US 8,568,974 B2 61 62 TABLE 11 S-continued

Top 50 R2A

Probe Set ID Rank Gene Gene Title Entrez.ID

230968 a 97 Full-length cDNA clone CSODF032YA11 of Fetal brain of Homo sapiens (human) 242616 a 97 Transcribed locus 225496 s a 95 SYTL2 synaptotagmin-like 2 S4843 202780 a 95 OXCT1 3-oxoacid CoA transferase 1 SO19 204852 s a 95 PTPN7 protein tyrosine phosphatase, non-receptor type 7 5778 225961 a 94 KLHDCS ketch domain containing 5 57542 213502 x a 94 LOC91.316 similar to bK246H3.1 (immunoglobulin lambda-like 91316 polypeptide 1, pre-B-cell specific) 215946 x a 94 CTA-246H3.1 similar to omega protein 91.353 218942 a. 93 PIP4K2C phosphatidylinsoitol-5-phosphate 4-kinase, type II, 79837 gamma 204891 s a 93 LCK ymphocyte-specific protein tyrosine kinase 3932 1552496 a. at 92 COBL cordon-bleu homolog (mouse) 23242 213050 a 92 COBL cordon-bleu homolog (mouse) 23242 232914 s a 91 SYTL2 synaptotagmin-like 2 S4843 1552760 at 91 HDAC9 histone deacetylase 9 9734 235802 a 91 PLD4 phospholipase D family, member 4 122618 23.7625 s a 91 mmunoglobulin light chain variable region complementarity determining region (CDR3) mRNA 213243 a 90 VPS13B vacuolar protein sorting 13 homolog B (yeast) 15768O 204890s a 90 LCK ymphocyte-specific protein tyrosine kinase 3932 205484 a 89 SIT1 signaling threshold regulating transmembrane adaptor 1 27240 203263 s a 89 ARHGEF9 Cdc42 guanine nucleotide exchange factor (GEF) 9 23229 242952 a 89 221584 S a 89 KCNMA1 potassium large conductance calcium-activated 3778 channel, Subfamily M, alpha member 1 216218 s a 89 PLCL2 phospholipase C-like 2 23228 201216 a 88 ERP29 endoplasmic reticulum protein 29 10961 213348 a 88 CDKN1C cyclin-dependent kinase inhibitor 1C (p57, Kip2) 1028 1557252 at 88 CDNA FLJ36213 fis, clone THYMU2000671 223059 is a 88 FAM107B amily with sequence similarity 107, member B 83641 213309 a 88 PLCL2 phospholipase C-like 2 23228 221671 x a 88 GKC immunoglobulin kappa constant if immunoglobulin 28.299 GKV1-5 kappa variable 1-5 i? immunoglobulin kappa variable 28923 GKV2-24 2-24 3514 223017 a 87 TXNDC12 hioredoxin domain containing 12 (endoplasmic S1060 reticulum) 20386.5 s at 87 ADARB1 adenosine deaminase, RNA-specific, B1 (RED1 104 homolog rat) 235721 a 87 DTX3 deltex 3 homolog (Drosophila) 1964.03 241871 a 87 CAMK4 calcium calmodulin-dependent protein kinase IV 814 221 651 X at 87 GKC immunoglobulin kappa constant if immunoglobulin 28.299 GKV1-5 kappa variable 1-5 i? immunoglobulin kappa variable 28923 GKV2-24 2-24 3514 20284.4 S at 86 RALBP1 ralAbinding protein 1 10928 1.3 214785 at 86 VPS13A vacuolar protein sorting 13 homolog A (S. cerevisiae) 23230 204129 at 86 BCL9 B-cell CLL/lymphoma 9 607 229029 at 86 .1 1553423 a at 86 SLFN13 schlafen family member 13 146857 224795 X at 86 GKC immunoglobulin kappa constant if immunoglobulin 28.299 GKV1-5 kappa variable 1-5 i? immunoglobulin kappa variable 28923 GKV2-24 2-24 3514 219517 at 85 ELL3 elongation factor RNA polymerase II-like 3 80237 15d 5.3 226325 at 85 ADSSL1 adenylosuccinate synthase like 1 122622 14d3 2.33 219737 s at 85 PCDH9 protocadherin 9 S101 13d 4.3- q21.1 214677 x at 85 GL(a) f/ immunoglobulin lambda locus if immunoglobulin 28786 if 22d 1.1- GLT3 .. ambda variable 4-3 i? immunoglobulin lambda variable 28793 q11.2 GLV2-14 3-25 |f| immunoglobulin lambda variable 2-14 fif 2881S 22d 1.2 GLV3-2S immunoglobulin lambda joining 3 28831 if GLV4-3 3535 203431 s at 185 RICS Rho GTPase-activating protein 97.43 210791 s at 185 RICS Rho GTPase-activating protein 97.43 214836 x at 185 GKC immunoglobulin kappa constant if immunoglobulin 28.299 kappa variable 1-5 3514 US 8,568,974 B2 63 64 TABLE 12S

Top 50 R4

Probe Set ID Rank Gene Gene Title Entrez.ID

229661 a 201 SALL4 sal-like 4 (Drosophila) 57167

212062 a 201 ATP9A ATPase, Class II, type 9A 10079 2096O2 s at 97 GATA3 GATA binding protein 3 2625 1554903 at 96 FRMD8 FERM domain containing 8 83786 1554905. X at 96 FRMD8 FERM domain containing 8 83786 227595 a 96 ZMYM6 Zinc finger, MYM-type 6 92.04 1559916 a. at 95 - Homo sapiens, clone IMAGE:4723617, mRNA 1556385 at 95 - CDNA FLJ39926 fis, clone SPLEN2021157 209604 S at 94 GATA3 GATA binding protein 3 2625 216129 a 94 ATP9A ATPase, Class II, type 9A 10079 219999 a 94. MAN2A2 mannosidase, alpha, class 2A, member 2 4122 218589 a 93 P2RYS purinergic receptor P2Y, G-protein coupled, 5 1O161 243121 x at 93 - 214211 a 92 FTH1 ferritin, heavy polypeptide 1 if ferritin, heavy 2495 FTHL16 polypeptide-like 16 2508 202530 a. 92 MAPK14 mitogen-activated protein kinase 14 1432 204689 a 92 HEHEX hematopoietically expressed homeobox 3O87 222620 s at 92 DNAJC1 DnaJ (Hsp40) homolog, subfamily C, member 1 64215 1564164 at 92 C1orf218 chromosome 1 open reading frame 218 54530 235142 a. 91 LOC730411 zinc finger and BTB domain containing 8 i? similar to 653121 ft/ if ZBTB8 Zinc finger and BTB domain containing 8 73O411 202499 s at 91 SLC2A3 solute carrier family 2 (facilitated glucose transporter), 6515 member 3 201379 s at 91 TPD52L2 tumor protein D52-like 2 71.65 229744 at 91 SSFA2 Sperm specific antigen 2 6744 1557948 at 91 LOC653583 pleckstrin homology-like domain, family B, member 3 ? 284345 fit PHLDB3 (if similar to pleckstrinhomology-like domain, family B, 653583 member 1 225799 at 91 C2orf59 fif open reading frame 59 i? hypothetical 112597 fif LOCS41471 LOCS41471 541471 218927 s at 90 CHST12 carbohydrate (chondroitin 4) sulfotransferase 12 555O1 202032 s at 90 MAN2A2 mannosidase, alpha, class 2A, member 2 4122 222621 at 90 DNAJC1 DnaJ (Hsp40) homolog, subfamily C, member 1 64215 205423 at 89 AP1B1 adaptor-related protein complex 1, beta 1 subunit 162 22q1222(q12.2 200677 at 89 PTTG1 IP pituitary tumor-transforming 1 interacting protein 754 21q22.3 228297 at 89 - Transcribed locus 1p21.3 210665 at 89 TFPI issue factor pathway inhibitor (lipoprotein-associated 7035 coagulation inhibitor) 210664 S at 89 TFPI issue factor pathway inhibitor (lipoprotein-associated 7035 coagulation inhibitor) 21818.9 s at 89 NANS N-acetylneuraminic acid synthase (sialic acid 541.87 synthase) 228188 a 89 - 60471 at 89 RIN3 Ras and Rab interactor 3 79890 1563473 at 88 — MRNA, cDNA DKFZp761 L0320 (from clone DKFZp761LO320) 225262 a 88 FOSL2 FOS-like antigen 2 2355 203322 a. 88 ADNP2 ADNP homeobox2 228SO 215933 s at 88 HEHEX hematopoietically expressed homeobox 3O87 227594 a 88 ZMYM6 Zinc finger, MYM-type 6 92.04 226691 a 8.8 KIAA1856 KIAA1856 protein 84629 233877 a 88 — CDNA FLJ20770 fis, clone COLO6509 1560O31 at 88 FRMD4A FERM domain containing 4A 55691 242216 a 88 — Transcribed locus 219457 s at 88 RIN3 Ras and Rab interactor 3 79890 244665 a. 87 - Transcribed locus 202498 s at 87 SLC2A3 solute carrier family 2 (facilitated glucose transporter), 6515 member 3 2294.10 a 87 - MRNA, cDNA DKFZp564G0462 (from clone 9p13.11 DKFZp564G0462) 200748 s at 87 FTH1 ferritin, heavy polypeptide 1 if ferritin, heavy 2495 1q13 fif FTHL11 i? polypeptide-like 11 fit ferritlin, heavy polypeptide-like 2503 fif 8q21.13 FTHL16 16 2508 213258 at 87 TFPI tissue factor pathway inhibitor (lipoprotein-associated 7035 coagulation inhibitor) US 8,568,974 B2 65 66 TABLE13S

Top 50 R5

Probe Set ID Rank Gene Gene Title Entrez.ID Chrom

213920 at 85 CUTL2 cut-like 2 (Drosophila) 23316 2d24.11 q24. 12 224734 at 84 HMGB1 high-mobility group box. 1 3.146 12 212751 at 84 UBE2IN ubiquitin-conjugating enzyme E2N (UBC13 homolog, 7334 yeast) 241774 at 84 — Transcribed locus 202947 s a 82 GYPC glycophorin C (Gerbich blood group) 2995 201524 x a 82 UBE2IN ubliquitin-conjugating enzyme E2N (UBC13 homolog, 7334 yeast) 218447 at 82 C16orf61 chromosome 16 open reading frame 61 56942 6d23.2 242064 at 81 SDK2 sidekick homolog 2 (chicken) S4S49 725.1 210473 s a 80 GPR12S G protein-coupled receptor 125 166647 4p15.31 200056 s a 79 C1D nuclear DNA-binding protein i? similar to nuclear DNA-10438 fif Od22.3 LOC727879 binding protein 727879 201119 s a 79 COX8A cytochrome c oxidase subunit 8A (ubiquitous) 1351 20583.9 s a 79 BZRAP1 benzodiazepine receptor (peripheral) associated protein 9256 1 225073 at 79 PPHLN1 periphilin 1 51535 203948 s a 78 MPO myeloperoxidase 4353 723.1 239274 at 78 - Transcribed locus 208657 s a 78 39700 septin 9 10801 204005 s a 78 PAWR PRKC, apoptosis, WT1, regulator 5074 226101 at 78 PRKCE protein kinase C, epsilon 5581 213222 at 77 PLCB1 phospholipase C, beta 1 (phosphoinositide-specific) 23236 233873 x a 77 PAPD1 PAP associated domain containing 1 SS149 1.23 201015 s a 77 JUP junction plakoglobin 3728 202824 S a 77 TCEB1 transcription elongation factor B (SIII), polypeptide 1 6921 .11 (15 kDa, elongin C) 218023 s a 77 FAM53C amily with sequence similarity 53, member C 51307 208195 a 77 TTN itin 7273 202123 s a 76 ABL1. V-abl Abelson murine leukemia viral oncogene homolog 25

227433 a 76 KIAA2O18 KLAA2O18 205717 217788 s a 76 GALNT2 UDP-N-acetyl-alpha-D-galactosamine:polypeptide N 2590 acetylgalactosaminyltransferase 2 (GalNAc-T2) 227846 a 76 GPR176 G protein-coupled receptor 176 11245 14 .1 212229 s a 76 FBXO21 F-box protein 21 23O14 203476 a 76 TPBG trophoblast glycoproteln 7162 200786 a 75 PSMB7 proteasome (prosome, macropain) subunit, beta type, 7 5695 .12 223598 a 7S RAD23B RAD23 homolog B (S. cerevisiae) 5887 201827 a 7S SMARCD2 SWI/SNF related, matrix associated, actin dependent 6603 regulator of chromatin, Subfamily d, member 2 201754 a 75 COX6C cytochrome c oxidase subunit Vic 1345 205401 a 75 AGPS alkylglycerone phosphate synthase 854O 223991 s at 7S GALNT2 UDP-N-acetyl-alpha-D-galactosamine:polypeptide N 2590 acetylgalactosaminyltransferase 2 (GalNAc-T2) 211031 s at 74 CLIP2 CAP-GLY domain containing linker protein 2 7461 223101 s at 74 ARPCSL actin related protein 2/3 complex, subunit 5-like 81873 225694 at 74 CRKRS Cdc2-related kinase, arginine?serine-rich 51755 222794 x at 74 PAPD1 PAP associated domain containing 1 SS149 203949 at 74 MPO myeloperoxidase 4353 217584 at 74 NPC1 Niemann-Pick disease, type C1 4864 220684 at 74 TBX21 T-box 21 30009 209232 s at 74 DCTNS dynactin 5 (p2S) 84S16 204872 at 74 TLE4 transducin-like enhancer of split 4 (E(Sp1) homolog, 7091 Drosophila) 236375 at 74 – Transcribed locus 224830 at 74 NUDT21 nudix (nucleoside diphosphate linked moiety X)-type 11051 motif 21 1553380 at 74 PARP15 poly (ADP-ribose) polymerase family, member 15 165631 224221 S at 73 VAV3 vav 3 guanine nucleotide exchange factor 10451 211678 s at 73 ZNF313 Zinc finger protein 313 55905 3.13 US 8,568,974 B2 67 68 TABLE 14S

Top 50 R6 (* denotes probe sets mapped to gene by UCSC Genome Browser

Probe Set ID Rank Gene Gene Title EntrezID Chrom 220059 at 96 STAP1 signal transducing adaptor family member 1 26228 4q13.2 228240 at 96 CENTG2* Full-length cDNA clone CSODM002YA18 of Fetal liver – 2a37.2 of Homo sapiens (human) 204066 s at 96 CENTG2 centaurin, gamma 2 116987 2p24.3- p24.1 233225 at 96 CENTG2* CDNA FLJ36087 fis, clone TESTI2020283 2a37.2 206756 at 96 CHST7 carbohydrate (N-acetylglucosamine 6-O) S6548 Xp11.23 Sulfotransferase 7 240758 at 95 CENTO2* 2a37.2 1554343 a at 95 STAP1 signal transducing adaptor family member 1 26228 4q13.2 230537 at 94 PCDH17* 13q21.1 203921 at 94 CHST2 carbohydrate (N-acetylglucosamine-6-O) 9435 3q24 Sulfotransferase 2 230179 at 93 LOC285812 hypothetical protein LOC285812 285812 6p23 2198.21 sat 92 GFOD1 glucose-fructose oxidoreductase domain containing 1 54438 6pter p22.1 1554486 a. at 92 C6orf114 chromosome 6 open reading frame 114 85411 6p23 209593 s at 92 TOR1B torsin family 1, member B (torsin B) 27348 9d34 203329 a 91 PTPRM protein tyrosine phosphatase, receptor type, M 5797 18p11.2 227289 a 91 PCDH17 protocadherin 17 27253 13q21.1 1552398 a. at 91 CLEC12A C-type lectin domain family 12, member A 160364 12p13.2 242457 a 91 — Transcribed locus Sq21.1 205656 a 90 PCDH17 protocadherin 17 27253 13q21.1 1555579 s at 90 PTPRM protein tyrosine phosphatase, receptor type, M 5797 18p11.2 1556593 s at 89 - CDNA FLJ40061 fis, clone TESOP2000083 3d23 228863 a 89 PCDH17 protocadherin 17 27253 13q21.1 202336 s at 88 PAM peptidylglycine alpha-amidating monooxygenase SO66 5q14-q21 235968 a 87 CENTG2 centaurin, gamma 2 116987 2p24.3- p24.1 225611 a 87 - 5d.123 210944 sat 87 CAPN3 calpain 3, (p94) 825 15q15.1- q21.1 211340s at 87 MCAM melanoma cell adhesion molecule 4.162 11q23.3 233038 a 87 CENTG2* CDNA: FLJ22776 fis, clone KAIA1582 2C37.2 219470 x at 87 CCNT cyclin J S4619 10pter q26.12 244665 a. 86 ITGA6* Transcribed locus 2d31.1 230954 a 86 C20orf112 chromosome 20 open reading frame 112 14O688 20d 11.1- q11.23 211890 x at 86 CAPN3 calpain 3, (p94) 825 15q15.1- q21.1 226342 a. 86 SPTBN1 spectrin, beta, non-erythrocytic 1 6711 2p21 202746 a 86 ITM2A integral membrane protein 2A 9452 Xq13.3- Xq21.2 20908.7 x at 86 MCAM melanoma cell adhesion molecule 4.162 11q23.3 22.3130 s at 86 MYLIP myosin regulatory light chain interacting protein 291.16 6p23 p22.3 228.098 s at 85 MYLIP myosin regulatory light chain interacting protein 291.16 6p23 p22.3 225613 at 84 MAST4 microtubule associated serine/threonine kinase family 375449 5q12.3 member 4 40016 g at 84 MAST4 microtubule associated serine/threonine kinase family 375449 5q12.3 member 4 232227 at 84 AF161442* HSPC324 9q34.3 202747 s at 84 ITM2A integral membrane protein 2A 9452 Xq13.3- Xq21.2 228,097 at 84 MYLIP myosin regulatory light chain interacting protein 291.16 6p23 p22.3 229.091 s at 84 CCN cyclin J S4619 10pter q26.12 204836 at 84 GLDC glycine dehydrogenase (decarboxylating) 2731 9p22 201656 at 83 ITGA6 integrin, alpha 6 3655 2a31.1 215177 s at 83 ITGA6 integrin, alpha 6 3655 2a31.1 214475 X at 83 CAPN3 calpain 3, (p94) 825 15q15.1- q21.1 1558621 at 83 CABLES1 CdkS and Abl enzyme substrate 1 91768 18q11.2 229597 s at 83 WDFY4 WDFY family member 4 57705 10q11.23 231166 at 83 GPR155 G protein-coupled receptor 155 151556 2a31.1 239956 at 82 – CDNA FLJ40061 fis, clone TESOP2000083 3q23 US 8,568,974 B2 69 70 TABLE 15S

Top 50 R8 (* denotes probe sets mapped to gene by UCSC Genome Browser

Probe Set ID Rank Gene Gene Title Entrez.ID Chrom 236489 at 90 GPR1108 Transcribed locus 6p12.3 212592 at 89 IG Immunoglobulin J polypeptide, linker protein for 3512 4q21 immunoglobulin alpha and mu polypeptides 217109 at 89 MUC4 mucin 4, cell Surface associated 4585 3d29 240586 at 88 ENAM Enamelin 101.17 4q13.3 2O5795 at 88 NRXN3 neurexin 3 9369 14q31 238689 at 86 GPR110 G protein-coupled receptor 110 266977 6p12.3 21711.0 s at 85 MUC4 mucin 4, cell Surface associated 4585 3d29 236750 at 85 NRXN3* Transcribed locus 14q31.1 242051 at 85 CD99* Transcribed locus Xp22.33; Y11.31 204895 X at 84 MUC4 mucin 4, cell Surface associated 4585 3d29 201029 s at 84 CD99 CD99 molecule 4267 Xp22.32; Y11.3 201028 s at 83 CD99 CD99 molecule 4267 Xp22.32; Y11.3 229114 at 82 GAB1* CDNA done IMAGE:48O1326 14q31.21 206873 at 82 CA6 carbonic anhydrase VI 765 1p36.2 201876 at 82 PON2 paraOXonase 2 S445 7q21.3 222154 s a 82 LOC26010 viral DNA polymerase-transactivated protein 6 26010 233.1 210830s a 81 PON2 paraOXonase 2 S445 7q21.3 235988 at 81 GPR110 G protein-coupled receptor 110 266977 6p12.3 216565 x a 81 LOC391020 interferon induced transmembrane protein pseudogene 391 O20 p36.11 215021 s a 80 NRXN3 neurexin 3 9369 4q31 225912 at 79 TP53INP1 tumor protein p53 inducible nuclear protein 1 94241 8q22 226002 at 78 GAB1* CDNA clone IMAGE:48O1326 4.d31.21 214022 s a 78 IFITM1 interferon induced transmembrane protein 1 (9–27) 8519 1p15.5 212203 x a 78 IFITM3 interferon induced transmembrane protein 3 (1-8U) 10410 1p15.5 1563357 at 78 SERPINB9* MRNA, cDNA DKFZp564C203 (from clone 6p25.2 DKF4564C203) 225998 a 77 GAB1 GRB2-associated binding protein 1 2549 4q31.21 201315 x a 77 IFITM2 interferon induced transmembrane protein 2 (1-8D) 10581 1p15.5 201601 x a 77 IFITM1 interferon Induced transmembrane protein 1 (9–27) 8519 1p15.5 230643 a 77 WNT.9A wingless-type MMTV integration site family, member 9A 7483 q42 212974 a 77 DENND3 DENN/MADD domain containing 3 22898 8q24.3 203435 s a 77 MME membrane metallo-endopeptidase 4311 3q25.1- q25.2 22374.1 s a 77 TTYH2 tweety homolog 2 (Drosophila) 94O15 7q24 212975 a 77 DENND3 DENN/MADD domain containing 3 22898 8q24.3 207426 s a 76 TNFSF4 tumor necrosis factor (ligand) Superfamily, member 4 7292 q25 (tax-transcriptionally activated glycoprotein 1, 34 kDa) 52731 at 7S FLJ2O294 hypothetical protein FLJ20294 55626 1p11.2 215028 a 7S SEMA6A Sema domain, transmembrane domain (TM), and 57.556 5q23.1 cytoplasmic domain, (Semaphorin) 6A 229649 a 75 NRXN3 neurexin 3 9369 4q31 1559315 s at 175 LOC144481 hypothetical protein LOC144481 144481 2q22 205983 a 74 DPEP1 dipeptidase 1 (renal) 1800 6q24.3 226840 a 74 H2AFY H2A histone family, member Y 9555 5q31.3- q32 23.0161 a 74 CD99* Transcribed locus Xp22.33; Yp11.31 223304 a 74 SLC37A3 solute carrier family 37 (glycerol-3-phosphate 842SS 7q34 transporter), member 3 218862 a 74 ASB13 ankyrin repeat and SOCS box-containing 13 79754 10p15.1 213939 s at 73 RUFY3 RUN and FYVE domain containing 3 22902 4q13.3 207112 s at 73 GAB1 GRB2-associated binding protein 1 2549 4q31.21 227856 a 73 CAOrf32 chromosome 4 open reading frame 32 132720 4q25 238880 a 73 GTF3A general transcription factor IIIA 2971 13q12.3- q13.1 1569666 s at 73 SLC37A3* Homo sapiens, clone IMAGE:5581630, mRNA 7q34 20936.5 s at 73 ECM1 extracellular matrix protein 1 1893 1q21 203373 a 73 SOCS2 Suppressor of cytokine signaling 2 883S 12q

Acknowledgements Genomics, Biostatistics, and Bioinformatics & Computa This work was supported by NIH DHHS Grants: NCI tional Biology, partially supported by NCI P30 CA1 18100, Strategic Partnerships to Evaluate Cancer Gene Signatures 60 were critical for this work. We would like to thank Malcolm (SPECS) Program NCIU01 CA114762 (Principal Investiga- Smith for many helpful discussions and his organizational tor: CW) and NCI U10CA98543 Supporting the Children's efforts related to this entire project. Oncology Group and Statistical Center (Principal Investiga- Authorship tor: GR), The National Childhood Cancer Foundation, and a RCH performed research, analyzed and interpreted data, Leukemia and Lymphoma Society Specialized Center of 65 performed statistical analysis and wrote the manuscript; XW Research (SCOR) Program Grant 7388-06 (PI: CW). Univer- analyzed and interpreted data and performed Statistical analy sity of New Mexico Cancer Center Shared Facilities: KUGR sis; GSD analyzed and interpreted data; KA performed US 8,568,974 B2 71 72 research and analyzed and interpreted data; KKD analyzed lymphoblastic leukemia and its relationship to other prog and interpreted data; EJB performed statistical analysis; IMC nostic factors: A Children's Oncology Group study. Blood. designed research and analyzed and interpreted data; CSW 2008. wrote the manuscript; WW wrote the manuscript; SRA ana 10. Nachman J. B. Sather HN, Sensel MG, et al. Augmented lyzed and interpreted data; SPH designed research; MD 5 post-induction therapy for children with high-risk acute designed research and performed Statistical analysis; JP per lymphoblastic leukemia and a slow response to initial formed research; AJC performed research; MJB performed therapy. N Engl J Med. 1998; 338:1663-1671. research: WPB designed research; WLC designed research; 11. Seibel N L, Steinherz P G, Sather H N, et al. Early BC designed research; GHR designed research; DB per postinduction intensification therapy improves Survival for formed research; CLW designed research and wrote the 10 children and adolescents with high-risk acute lymphoblas manuscript. tic leukemia: a report from the Children's Oncology Group. Blood. 2008: 111:2548-2555. 12. Borowitz MJ, Pullen DJ, Shuster JJ, et al. Minimal FIGURE LEGENDS residual disease detection in childhood precursor-B-cell 15 acute lymphoblastic leukemia: relation to other risk fac FIG. 2. Hierarchical heat map that identifies outlier clus tors. A Children's Oncology Group study. Leukemia. ters. In Panel A the 209 COPA probe sets are shown in rows 2003; 17:1566-1572. and the 207 samples in columns. In Panel B the 215 ROSE 13. Davidson G. S. Martin S, Boyack K. W. et al. Robust probe sets are shown in rows. The colored boxes indicate the Methods for Microarray Analysis. In: Akay M, ed. Genom identification of significant clusters. The colored bars across ics and Proteomics Engineering in Medicine and Biology. the bottom denote translocations, outcome and race as Hoboken, New Jersey: IEEE Press; Wiley; 2007:99-130. described in FIG. 1. 14. Tomlins SA, Rhodes DR, Perner S, etal. Recurrent fusion FIG. 3. Kaplan-Meier plots for clusters with aberrant out of TMPRSS2 and ETS transcription factor genes in pros come. RFS survival are shown for cluster 6 (Panel A) and tate cancer. Science. 2005; 310:644-648. cluster 8 (Panel B) for patients identified by multiple algo 25 15. Bland J. M. Altman DG. The log rank test. BMJ. 2004; rithms. The data for all 207 samples are shown with a black 328:1073. line. yellow=H8, light blue=V8, red=R8 and magenta-C8. 16. Armitage P. Berry G. Statistical methods in medical FIG. 4. Validation of ROSE in CCG 1961 data set. In Panel research (ed 3rd). Oxford; Boston: Blackwell Scientific A a heat map generated as described in FIG. 2B identifies Publications: 1994. groups of samples with similar patterns of genes expression. 30 17. Bewick V, Cheek L, Ball J. Statistics review 12: survival The colored boxes indicate the clusters with similarities to analysis. Crit Care. 2004; 8:389-394. those shown in the primary data set. In Panel B the RFS curve 18. Bhojwani D, Kang H, Menezes RX, et al. Gene expres for cluster R8 in Panel A is shown in red, while the RFS for sion signatures predictive of early response and outcome in samples not in that group is shown in black. high-risk childhood acute lymphoblastic leukemia: a Chil 35 dren's Oncology Group study. JClin Oncol. 2008; in press. REFERENCES 19. Fine BM, Stanulla M. Schrappe M. etal. Gene expression patterns associated with recurrent chromosomal transloca 1. Ries L AG, Melbert D, Krapcho M, et al. SEER Cancer tions in acute lymphoblastic leukemia. Blood. 2004; 103: Statistics Review, 1975-2005. NIH publication. Bethesda, 1043-1049. Md.: National Cancer Institute, Bethesda, Md., 2008:V. 40 20. van Delft F W. Bellotti T, Luo Z. et al. Prospective gene 2. Smith M. Arthur D. Camitta B, et al. Uniform approach to expression analysis accurately subtypes acute leukaemia in risk classification and treatment assignment for children children and establishes a commonality between hyperdip with acute lymphoblastic leukemia. J Clin Oncol. 1996; loidy and t(12:21) in acute lymphoblastic leukaemia. Br J 14:18-24. Haematol. 2005: 130:26-35. 3. Pieters R. Carroll W. L. Biology and treatment of acute 45 21. Coustan-Smith E. Sancho J. Behm FG, et al. Prognostic lymphoblastic leukemia. Pediatr Clin North Am. 2008; importance of measuring early clearance of leukemic cells 55:1-20, ix. by flow cytometry in childhood acute lymphoblastic leu 4. Armstrong S A. Look A T. Molecular genetics of acute kemia. Blood. 2002: 100:52-58. lymphoblastic leukemia. J Clin Oncol. 2005; 23:6306 22. Steinherz PG, Gaynon PS, Breneman J C, et al. Cytore 6315. 50 duction and prognosis in acute lymphoblastic leukemia— 5. Yeah E. J. Ross ME, Shurtleff SA, et al. Classification, the importance of early marrow response: report from the Subtype discovery, and prediction of outcome in pediatric Childrens Cancer Group. J. Clin Oncol. 1996; 14:389-398. acute lymphoblastic leukemia by gene expression profil 23. Bhatia S, Sather HN, Heerema NA, Trigg M. E. Gaynon ing. Cancer Cell. 2002: 1:133-143. PS, Robison L. L. Racial and ethnic differences in survival 6. Moos PJ, Raetz EA, Carlson MA, et al. Identification of 55 of children with acute lymphoblastic leukemia. Blood. gene expression profiles that segregate patients with child 2002: 100:1957-1964. hood leukemia. Clin Cancer Res. 2002; 8:3118-3130. 24. Pollock B H, DeBaun MR, Camitta B M, et al. Racial 7. Wilson CS, Davidson G. S. Martin SB, et al. Gene expres differences in the survival of childhood B-precursor acute sion profiling of adult acute myeloid leukemia identifies lymphoblastic leukemia: a Pediatric Oncology Group novel biologic clusters for risk classification and outcome 60 Study. J Clin Oncol. 2000; 18:813-823. prediction. Blood. 2006: 108:685-696. 25. Dworzak M. N. Froschl G, Printz D, et al. CD99 expres 8. Shuster JJ. Camitta B M, Pullen J, et al. Identification of sion in T-lineage ALL: implications for flow cytometric newly diagnosed children with acute lymphocytic leuke detection of minimal residual disease. Leukemia. 2004; mia at high risk for relapse. Cancer Res Ther Control. 18:703-708. 1999; 9:101-107. 65 26. Wilkerson A E, Glasgow MA, Hiatt K.M. Immunoreac 9. Borowitz M. J. Devidas M, Hunger S. P. et al. Clinical tivity of CD99 in invasive malignant melanoma. J Cutan significance of minimal residual disease in childhood acute Pathol. 2006; 33:663-666. US 8,568,974 B2 73 74 27. Scotlandi K. Perdichizzi S. Bernard G. et al. Targeting B). treating B-ALL in said patient with non-traditional CD99 in association with doxorubicin: an effective com leukemia therapy if the observed expression level is bined treatment for Ewings sarcoma. EurJ Cancer. 2006; higher than control level. 42:91-96. 2. The method according to claim 1 wherein an observed 28. Chaturvedi P. Singh A P. Moniaux N, et al. MUC4 mucin 5 expression level of at least one additional gene product potentiates pancreatic tumor cell proliferation, Survival, selected from the group consisting of CRLF2 (cytokine and invasive properties and interferes with its interaction to receptor-like factor 2) and GPR110 (G protein-coupled extracellular matrix proteins. Mol Cancer Res. 2007; receptor 110) which is greater than said control expression 5:309-32O. level is indicative of therapeutic failure with traditional leu 29. Moniaux N. Chaturvedi P. Varshney GC, et al. Human 10 kemia therapy. MUC4 mucin induces ultra-structural changes and tumori genicity in pancreatic cancer cells. Br J Cancer. 2007; 3. The method according to claim 2 wherein said one 97:345-357. additional gene product is CRLF2. 30. Juric D. Lacayo NJ, Ramsey MC, et al. Differential gene 4. The method according to claim 2 wherein said one expression patterns and interaction networks in BCR- 15 additional gene product is GPR110. ABL-positive and -negative adult acute lymphoblastic leu 5. The method according to claim 2 wherein said additional kemias. J Clin Oncol. 2007; 25:1341-1349. gene product is CRLF2 and GPR110. 31. Kameda H, Ishigami H, Suzuki M. Abe T. Takeuchi T. 6. The method according to claim 1 wherein said tradi Imatinib mesylate inhibits proliferation of rheumatoid syn tional leukemia therapy is Memorial Sloan Kettering New ovial fibroblast-like cells and phosphorylation of Gab 20 York II (NYII), UKALLr2, AL841, AL851, ALHR88, adapter proteins activated by platelet-derived growth fac MCP841, modified BMF, BMF-95 or ALinC 17. tor. Clin Exp Immunol. 2006: 144:335-341. 7. The method according to claim 1 wherein said non 32. Zukerberg L. R. DeBernardo RL, Kirley SD, et al. Loss traditional therapy is a more aggressive traditional therapy. of cables, a cyclin-dependent kinase regulatory protein, is 8. The method according to claim 1 wherein said non associated with the development of endometrial hyperplasia 25 traditional therapy is a more aggressive NYII therapy. and endometrial cancer. Cancer Res. 2004; 64:202-208. 9. The method according to claim 1 wherein said non 33. Zhang H, Duan HO, Kirley S D, Zukerberg LR, Wu CL. traditional therapy is a more aggressive UKALLr2 therapy. Aberrant splicing of cables gene, a CDK regulator, in 10. The method according to claim 1 wherein said non human cancers. Cancer Biol Ther. 2005; 4:1211-1215. traditional therapy is a more aggressive AL841 therapy. 34. Dong Q, Kirley S. Rueda B, Zhao C, Zukerberg L. Oliva 30 11. The method according to claim 1 wherein said non E. Loss of cables, a novel gene on chromosome 18q, in traditional therapy is a more aggressive AL851 therapy. ovarian cancer. Mod Pathol. 2003; 16:863-868. 12. The method according to claim 1 wherein said non 35. Kirley S D, D Apuzzo M, Lauwers GY, Graeme-Cook F. traditional therapy is a more aggressive ALHR88 therapy. Chung DC, Zukerberg L. R. The Cables gene on chromo 13. The method according to claim 1 wherein said non Some 18Q regulates colon cancer progression in vivo. Can- 35 traditional therapy is a more aggressive MCP841 therapy. cer Biol Ther. 2005; 4:861-863. 14. The method according to claim 1 wherein said non 36. Ross ME, Zhou X, Song G, et al. Classification of pedi traditional therapy is a more aggressive modified BMF atric acute lymphoblastic leukemia by gene expression therapy. profiling. Blood. 2003: 102:2951-2959. 15. The method according to claim 1 wherein said non 37. Mullighan CG, Miller CB, Su X, et al. ERG deletions 40 traditional therapy is a more aggressive BMF-95 therapy. define a novel subtype of B-progenitor acute lymphoblastic 16. The method according to claim 1 wherein said non leukemia. Blood. 2007: 110:212 A-213A. traditional therapy is a more aggressive ALinC 17 therapy. 38. Hoffmann K. Firth MJ, Beesley AH, et al. Prediction of 17. The method according to claim 1 wherein said non relapse in paediatric pre-B acute lymphoblastic leukaemia traditional therapy is an experimental leukemia therapy. using a three-gene risk index. Br J Haematol. 2008: 140: 45 18. The method according to claim 1 wherein said prede 656-664. termined value is obtained from a sample of patients with high risk B-ALL who have been cured with traditional leu The invention claimed is: kemia therapy. 1. A method for treating high risk B-precursor acute lym 19. The method according to claim 1 wherein said control phoblastic leukemia (B-ALL) in a patient in need comprising: 50 is obtained from a sample of patients who are non-leukemic. A). determining whether said patient is a candidate for 20. A method for predicting therapeutic outcome in a traditional therapy for B-ALL comprising patient with high risk B-precursor acute lymphoblastic leu i) obtaining a biological sample from said patient; kemia (B-ALL) patient comprising: ii) analyzing said sample to determine the expression (A) obtaining a biological sample from said patient; level of the gene products MUC4 (Mucin 4) and IGJ 55 (B) analyzing said sample to determine the expression (immunoglobulin J) in said sample; and level of the gene products MUC4 (Mucin 4) and IGJ iii) comparing the observed gene expression levels for (immunoglobulin J) and at least one additional gene each of said gene products to a control gene expres product selected from the group consisting of CRLF2 sion level selected from the group consisting of: (cytokine receptor-like factor 2) and GPR110 (G pro a) the gene expression level for the gene products 60 tein-coupled receptor 110) in said sample; and observed in a control sample; and C) comparing the observed gene expression levels for each b) a predetermined gene expression level for the gene of said gene products to a control gene expression level products; Selected from the group consisting of: wherein an observed expression level that is higher i) the gene expression level for the gene products than the control gene expression for both of said 65 observed in a control sample; and gene products is indicative of therapeutic failure ii) a predetermined gene expression level for the gene with traditional leukemia therapy; and products; US 8,568,974 B2 75 76 wherein an observed expression level of all of the gene products analyzed that is higher than the control gene expression level for said gene products indicates thera peutic failure with traditional leukemia therapy in said patient and said patient is treated with non-traditional 5 leukemia therapy. 21. The method according to claim 20 wherein said addi tional gene product is CRLF2 and GPR110. 22. The method according to claim 20 wherein said one additional gene product is CRLF2. 10 23. The method according to claim 20 wherein said one additional gene product is GPR110. 24. The method according to claim 20 wherein said prede termined expression level is obtained from a sample of patients with high risk B-ALL who have been cured with 15 traditional leukemia therapy. 25. The method according to claim 20 wherein said control sample is obtained from a sample of patients who are non leukemic. 2O