US 2011 0230372A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2011/0230372 A1 Willman et al. (43) Pub. Date: Sep. 22, 2011

(54) GENE EXPRESSION CLASSIFIERS FOR RELAPSE FREE SURVIVAL AND MINIMAL RESIDUAL DISEASE IMPROVE RISK CLASSIFICATION AND OUTCOME PREDICTION IN PEDIATRIC B-PRECURSOR ACUTE LYMPHOBLASTIC LEUKEMIA

(75) Inventors: Cheryl L. Willman, Albuquerque, NM (US); Richard Harvey, Placitas, NM (US); Huining Kang, Albuquerque, NM (US); Edward Bedrick, Albuquerque, NM (US); Xuefei Wang, Creve Coeur, MO (US); Susan R. Atlas, Albuquerque, NM (US); I-Ming Chen, Albuquerque, NM (US)

(73) Assignee: STC UNM

(21) Appl. No.: 12/998,474

(22) PCT Filed: Nov. 16, 2009

(86) PCT No.: PCT/US2009/006117
§ 371 (c)(1), (2), (4) Date: Jun. 6, 2011

Related U.S. Application Data
(60) Provisional application No. 61/279,281, filed on Oct. 16, 2009, provisional application No. 61/199,342, filed on Nov. 14, 2008.

Publication Classification
(51) Int. Cl.: C40B 40/06 (2006.01); C12Q 1/68 (2006.01); G01N 33/566 (2006.01); C12Q 1/44 (2006.01); C12Q 1/527 (2006.01); [illegible] (2006.01)
(52) U.S. Cl.: 506/16; 435/6.18; 435/6.17; 435/6.16; 435/6.1; 435/7.92; 435/19; 435/4; 435/15; 435/6.13; 435/7.1; 436/501

(57) ABSTRACT
The present invention relates to the identification of genetic markers in patients with leukemia, especially including acute lymphoblastic leukemia (ALL) at high risk for relapse, especially high risk B-precursor acute lymphoblastic leukemia (B-ALL) and associated methods and their relationship to therapeutic outcome. The present invention also relates to diagnostic, prognostic and related methods using these genetic markers, as well as kits which provide microchips and/or immunoreagents for performing analysis on leukemia patients.

Patent Application Publication Sep. 22, 2011 Sheet 1 of 24 US 2011/0230372 A1

FIGURE 1

[Kaplan-Meier plot of relapse-free survival. Legible labels: Low Risk (n=109); HR=3.31, P<0.0001 (log rank).]


FIGURE 2

[Kaplan-Meier plots of relapse-free survival; x-axis: years. Legible labels: Low Risk (n=72); High Risk (n=52); Flow MRD(-) Patients; Flow MRD(+) Patients (n=38); Low Risk (n=29); P=0.0004 (log rank); P=0.0054 (log rank).]


FIGURE 3

[Kaplan-Meier plots of relapse-free survival, Panels A to F; x-axis: years. Legible labels: Flow MRD(-) Patients; Flow MRD(+) Patients; Low Risk (n=61); Intermediate Risk (n=59); High Risk (n=34); P=0.022 (log rank); P=0.038 (log rank); P<0.0001 (log rank).]

FIGURE 4


FIGURE 5

[Kaplan-Meier plots of relapse-free survival, Panels A and B. Legible labels: Low Risk (n=47); Intermediate Risk (n=22); High Risk (n=15); P=0.047 (log rank).]

FIGURE 6

[Kaplan-Meier plots of relapse-free survival by risk group (Intermediate Risk, High Risk); x-axis: years. Legible panel labels: Patients with Kinase Signature, P=0.07 (log rank); Patients without Kinase Signature, P<0.0001 (log rank); Patients with JAK mutations, P=0.001 (log rank); Patients without JAK mutations, P<0.0001 (log rank); Patients with IKAROS deletion, P=0.0065 (log rank); Patients without IKAROS deletion, P<0.0001 (log rank).]

FIGURE 7

[Legend: Not studied (n = 65); Studied (n = 207); P = 0.9751]

FIGURE 8

[x-axis: Number of present calls (0 to 200)]

FIGURE 9

[x-axis: Threshold (0.0 to 2.5)]

FIGURE 10

[Box plots; the x-axis label is truncated in the source ("Number of ...")]

FIGURE 11

Randomly partition the sample into 5 parts, balanced to preserve the proportions of key sample characteristics

Use each of the 20 models to predict on the test set. Fit Cox regression to the predicted score and RFS data of the test set and calculate the LRT score.

Has procedure been repeated 20 times?

Calculate the geometric mean of the 100 LRTs corresponding to each of the 20 models. Choose the model that maximizes the mean LRT as the final model.
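The selection loop in this flowchart can be sketched as follows. This is a minimal illustration and not the applicants' implementation: the name `select_model`, the callback `fit_and_score` (standing in for "fit the candidate on the training folds, predict on the test fold, fit Cox regression and return the LRT statistic"), and the plain unbalanced random partition are all assumptions made for the example. The flowchart calls for a partition balanced on key sample characteristics, which is omitted here.

```python
import math
import random
from typing import Callable, Dict, Hashable, List, Sequence, Tuple


def partition(n_samples: int, n_folds: int, rng: random.Random) -> List[List[int]]:
    """Randomly split sample indices into n_folds near-equal parts (not balanced)."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    return [idx[k::n_folds] for k in range(n_folds)]


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean; every value must be strictly positive."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def select_model(
    candidates: Sequence[Hashable],
    n_samples: int,
    fit_and_score: Callable[[Hashable, List[int], List[int]], float],
    n_repeats: int = 20,
    n_folds: int = 5,
    seed: int = 0,
) -> Tuple[Hashable, Dict[Hashable, float]]:
    """Repeat n_folds-fold cross-validation n_repeats times and keep the
    candidate whose scores have the largest geometric mean."""
    rng = random.Random(seed)
    scores: Dict[Hashable, List[float]] = {c: [] for c in candidates}
    for _ in range(n_repeats):
        folds = partition(n_samples, n_folds, rng)
        for k, test in enumerate(folds):
            # Training set: every fold except the k-th.
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            for c in candidates:
                # Hypothetical callback; must return a positive statistic.
                scores[c].append(fit_and_score(c, train, test))
    means = {c: geometric_mean(s) for c, s in scores.items()}
    best = max(means, key=means.get)
    return best, means
```

With the defaults, each candidate accumulates 20 x 5 = 100 scores, matching the "100 LRTs" per model in the flowchart.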

FIGURE 12

Leave one sample out and use the remaining n-1 samples as the training set

Determine a best prediction model based on the training set using the 20x5-fold cross validation procedure described in Figure S5

Use the best model to make the prediction on the left-out sample

Has every sample been left out and predicted on?

Use the predicted score on each of the n samples to assess the significance of the model compared to the clinical predictors
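The outer leave-one-out loop of this flowchart can be sketched as below. The function name `nested_loocv` and the two callbacks are hypothetical: `choose_model` stands for the whole inner model-selection procedure run on the n-1 training samples, and `predict` applies the chosen model to the single left-out sample. The point of the nesting is that the left-out sample never influences the model that scores it.

```python
from typing import Callable, List, Sequence, TypeVar

Sample = TypeVar("Sample")
Model = TypeVar("Model")


def nested_loocv(
    samples: Sequence[Sample],
    choose_model: Callable[[List[Sample]], Model],
    predict: Callable[[Model, Sample], float],
) -> List[float]:
    """Return one out-of-sample prediction per sample."""
    data = list(samples)
    predictions: List[float] = []
    for i, left_out in enumerate(data):
        # Training set: all samples except the i-th.
        training = data[:i] + data[i + 1:]
        # Inner model selection sees only the training samples.
        model = choose_model(training)
        predictions.append(predict(model, left_out))
    return predictions
```

The returned list of predicted scores is what the final box of the flowchart would then compare against the clinical predictors.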

FIGURE 13

Randomly partition the sample into 10 folds, balanced to preserve proportions of the major known sample characteristics

Combine 9 folds into a training set and leave one fold as the test set

Use the training dataset to calculate the modified t-test score for each gene. Rank the genes on the absolute values of the scores. For P = 1, 2, ..., use the top P genes to build a prediction model

Use each of the models to predict on the test set, and calculate the misclassification rate.

Repeat until every fold has been a test set

100 repeats (repartitionings) completed?

Calculate the arithmetic mean of the 1000 misclassification rates corresponding to each of the P-gene models. Choose the model with the number of genes minimizing the mean misclassification rate as the final model.
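One fold of this procedure (rank genes, build a top-P model, score the test fold) can be sketched as below. Several details are assumptions made for illustration only: the flowchart does not define the "modified t-test score", so the sketch uses a two-sample t statistic with a small constant `s0` added to the denominator; the classifier is a diagonal linear discriminant (DLDA is the model type named in the description of FIG. 10) with equal class priors; and all function names are hypothetical. Each class needs at least two training samples.

```python
from statistics import mean, variance
from typing import Dict, List, Sequence


def modified_t(a: Sequence[float], b: Sequence[float], s0: float = 0.1) -> float:
    """t-like score; s0 (assumed form) damps genes with very small variance."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / (se + s0)


def rank_genes(X: List[List[float]], y: List[int], s0: float = 0.1) -> List[int]:
    """Gene indices sorted by decreasing |score|. X is samples x genes, y is 0/1."""
    scores = []
    for g in range(len(X[0])):
        pos = [row[g] for row, lab in zip(X, y) if lab == 1]
        neg = [row[g] for row, lab in zip(X, y) if lab == 0]
        scores.append(abs(modified_t(pos, neg, s0)))
    return sorted(range(len(scores)), key=lambda g: -scores[g])


def fit_dlda(X: List[List[float]], y: List[int], genes: List[int]) -> Dict:
    """Class means and pooled per-gene variances on the selected genes."""
    means = {}
    for lab in (0, 1):
        rows = [row for row, l in zip(X, y) if l == lab]
        means[lab] = [mean(r[g] for r in rows) for g in genes]
    var = []
    for j, g in enumerate(genes):
        ss = sum((row[g] - means[lab][j]) ** 2 for row, lab in zip(X, y))
        var.append(max(ss / (len(X) - 2), 1e-12))  # guard against zero variance
    return {"genes": list(genes), "means": means, "var": var}


def predict_dlda(model: Dict, x: Sequence[float]) -> int:
    """Assign the class whose mean is closest in variance-scaled distance."""
    def dist(lab: int) -> float:
        return sum((x[g] - m) ** 2 / v
                   for g, m, v in zip(model["genes"], model["means"][lab], model["var"]))
    return 0 if dist(0) <= dist(1) else 1


def fold_error(X_tr, y_tr, X_te, y_te, p: int) -> float:
    """Misclassification rate of the top-p gene model on one test fold."""
    top = rank_genes(X_tr, y_tr)[:p]
    model = fit_dlda(X_tr, y_tr, top)
    wrong = sum(predict_dlda(model, x) != lab for x, lab in zip(X_te, y_te))
    return wrong / len(y_te)
```

Averaging `fold_error` over the 100 x 10 = 1000 test folds for each P, and taking the P with the smallest mean, would give the final model size described in the last box.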

FIGURE 14

Omit one sample and use the remaining n-1 samples as the training set

Determine a best prediction model based on the training set using the 100x10-fold cross validations described in Figure S7

Use the best model to predict on the left-out sample

Has every sample been left out and predicted on? (Repeat until every sample has been left out and predicted on.)

Calculate the misclassification rate using the binary predictions on all the samples and the ROC curve accuracy using the continuous scores of all the samples.
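The two summary measures in the last box can be computed from the left-out predictions as sketched below. The function names are hypothetical, and the area under the ROC curve is computed through its rank interpretation (the fraction of positive/negative pairs in which the positive sample has the higher score, ties counted as one half), which is one standard way to obtain it and is not necessarily the routine the applicants used.

```python
from typing import Sequence


def error_rate(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of samples whose binary prediction differs from the truth."""
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve from continuous scores and 0/1 labels."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    # Count positive/negative pairs ranked correctly; ties count one half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An uninformative score gives an area of about 0.5 under this definition, which is the reference value against which the description of FIG. 4 compares the MRD classifier.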

FIGURE 15

[x-axis: Threshold (0.0 to 2.5)]

FIGURE 16

[Kaplan-Meier plots of relapse-free survival; x-axis: years. Legible labels: Low Risk (n=88); High Risk (n=75), HR = 2.81, P = 0.0001 (log rank); Flow MRD(-) Patients; Flow MRD(+) Patients; Low Risk (n=61); High Risk (n=35), HR = 2.54; P = 0.022 (log rank); P = 0.038 (log rank); Intermediate Risk (n=59); High Risk (n=34); P<0.0001 (log rank).]

FIGURE 17


FIGURE 18


FIGURE 19


FIGURE 20


FIGURE 21

[Legend: Not studied (n = 65); Studied (n = 207); P = 0.9751]

FIGURE 22

[Signal intensity plot for probe set 212151_at; x-axis: patient sample index]

FIGURE 23


FIGURE 24

GENE EXPRESSION CLASSIFIERS FOR RELAPSE FREE SURVIVAL AND MINIMAL RESIDUAL DISEASE IMPROVE RISK CLASSIFICATION AND OUTCOME PREDICTION IN PEDIATRIC B-PRECURSOR ACUTE LYMPHOBLASTIC LEUKEMIA

RELATED APPLICATIONS

[0001] This application claims the benefit of priority of U.S. provisional applications US61/199,342, filed Nov. 14, 2008, entitled “[Gene Expression] Classifiers for Minimal Residual Disease and Relapse Free Survival Improve Outcome Prediction and Risk Classification” and US61/279,281, filed Oct. 16, 2009, entitled “Gene Expression Classifiers for Relapse Free Survival and Minimal Residual Disease Improve Risk Classification and Outcome Prediction in Pediatric B-Precursor Acute Lymphoblastic Leukemia”, the entire contents of said applications being incorporated by reference in their entirety herein.

[0002] The present invention was made with support under one or more grants from the National Institutes of Health, grant nos. NIH NCI U01 CA114762, NCI U10 CA98543, NCI U10 CA98543, NCI P30 CA118100, U01 GM61393, U01 GM61374 and U24 CA114766. Consequently, the government retains rights in the present invention.

FIELD OF THE INVENTION

[0003] The present invention relates to the identification of genetic markers in patients with leukemia, especially including acute lymphoblastic leukemia (ALL) at high risk for relapse, especially high risk B-precursor acute lymphoblastic leukemia (B-ALL) and associated methods and their relationship to therapeutic outcome. The present invention also relates to diagnostic, prognostic and related methods using these genetic markers, as well as kits which provide microchips and/or immunoreagents for performing analysis on leukemia patients.

BACKGROUND OF THE INVENTION

[0004] Leukemia is the most common childhood malignancy in the United States. Approximately 3,500 cases of acute leukemia are diagnosed each year in the U.S. in children less than 20 years of age. The large majority (>70%) of these cases are acute lymphoblastic leukemias (ALL) and the remainder acute myeloid leukemias (AML). The outcome for children with ALL has improved dramatically over the past three decades, but despite significant progress in treatment, a large group of children with ALL develop recurrent disease. Conversely, another group of children who now receive dose intensification are likely “over-treated” and may well be cured using less intensive regimens resulting in fewer toxicities and long term side effects. Thus, a major challenge for the treatment of children with ALL in the next decade or so is to improve and refine ALL diagnosis and risk classification schemes in order to precisely tailor therapeutic approaches to the biology of the tumor and the genotype of the host.

[0005] Leukemia in the first 12 months of life (referred to as infant leukemia) is extremely rare in the United States, with about 150 infants diagnosed each year. There are several clinical and genetic factors that distinguish infant leukemia from acute leukemias that occur in older children. First, while acute lymphoblastic leukemia (ALL) is far more frequent (approximately five times) than acute myeloid leukemia in children from ages 1-15 years, the frequency of ALL and AML in infants less than one year of age is approximately equivalent. Secondly, in contrast to the extensive heterogeneity in cytogenetic abnormalities and chromosomal rearrangements in older children with ALL and AML, nearly 60% of acute leukemias in infants have chromosomal rearrangements involving the MLL gene (for Mixed Lineage Leukemia) on chromosome 11q23. MLL translocations characterize a subset of human acute leukemias with a decidedly unfavorable prognosis. Current estimates suggest that about 60% of infants with AML and about 80% of infants with ALL have a chromosomal rearrangement involving MLL in their leukemia cells. Whether hematopoietic cells in infants are more likely to undergo chromosomal rearrangements involving 11q23 or whether this 11q23 rearrangement reflects a unique environmental exposure or genetic susceptibility remains to be determined.

[0006] The modern classification of acute leukemias in children and adults relies principally on morphologic and cytochemical features that may be useful in distinguishing AML from ALL, changes in the expression of cell surface antigens as a precursor cell differentiates, and the presence of specific recurrent cytogenetic or chromosomal rearrangements in leukemic cells. Using monoclonal antibodies, cell surface antigens (called clusters of differentiation (CD)) can be identified in cell populations; leukemias can be accurately classified by this means (immunophenotyping). By immunophenotyping, it is possible to classify ALL into the major categories of “common” CD10+ B-cell precursor (around 50%), “pre-B” (around 25%), “T” (around 15%), “null” (around 9%) and “B” cell ALL (around 1%). All forms other than T-ALL are considered to be derived from some stage of B-precursor cell, and “null” ALL is sometimes referred to as “early B-precursor” ALL.

TABLE 1A
Recurrent Genetic Subtypes of B and T Cell ALL

Subtype | Associated Genetic Abnormalities | Frequency in Children | Risk Category
B-Precursor ALL | Hyperdiploid DNA Content; Trisomies of Chromosomes 4, 10, 17 | 25% of B Precursor Cases | Low
B-Precursor ALL | t(12;21)(p13;q22): TEL/AML1 | 28% of B Precursor Cases | Low
B-Precursor ALL | 11q23/MLL Rearrangements; particularly t(4;11)(q21;q23) | 4% of B Precursor Cases; >80% of Infant ALL | High
B-Precursor ALL | t(1;19)(q23;p13): E2A/PBX1 | 6% of B Precursor Cases | High
B-Precursor ALL | t(9;22)(q34;q11): BCR/ABL | 2% of B Precursor Cases | Very High
B-Precursor ALL | Hypodiploidy | Relatively Rare | Very High
B-ALL | t(8;14)(q24;q32): IgH/MYC | 5% of all B lineage ALL cases | High
T-ALL | Numerous translocations involving the TCRαβ (7q35) or TCRγδ (14q11) loci | 7% of ALL cases | Not Clearly Defined

[0007] Current risk classification schemes for ALL in children from 1-18 years of age use clinical and laboratory parameters such as patient age, initial white blood cell count, and the presence of specific ALL-associated cytogenetic abnormalities to stratify patients into “low,” “standard,” “high,” and “very high” risk categories. National Cancer

Institute (NCI) risk criteria are first applied to all children with ALL, dividing them into “NCI standard risk” (age 1.00-9.99 years, WBC <50,000) and “NCI high risk” (age >10 years, WBC >50,000) based on age and initial white blood cell count (WBC) at disease presentation. In addition to these general NCI risk criteria, classic cytogenetic analysis and molecular genetic detection of frequently recurring cytogenetic abnormalities have been used to stratify ALL patients more precisely into “low,” “standard,” “high,” and “very high” risk categories. Table 1A shows the 4-year event free survival (EFS) projected for each of these groups.

[0008] Children with “low risk” disease (22% of all B precursor ALL cases) are defined as having standard NCI risk criteria, the presence of low risk cytogenetic abnormalities (t(12;21)/TEL/AML1 or trisomies of chromosomes 4 and 10), and a rapid early clearance of bone marrow blasts during induction chemotherapy. Children with “standard risk” disease (50% of ALL cases) are NCI standard risk without “low risk” or unfavorable cytogenetic features, or are children with low risk cytogenetic features who have NCI high risk criteria or slow clearance of blasts during induction. Although therapeutic intensification has yielded significant improvements in outcome in the low and standard risk groups of ALL, it is likely that a significant number of these children are currently “over-treated” and could be cured with less intensive regimens resulting in fewer toxicities and long term side effects. Conversely, a significant number of children even in these good risk categories still relapse and a precise means to prospectively identify them has remained elusive. Nearly 30% of children with ALL have “high” or “very high” risk disease, defined by NCI high risk criteria and the presence of specific cytogenetic abnormalities (such as t(1;19), t(9;22) or hypodiploidy) (Table 1); again, precise measures to distinguish children more prone to relapse in this heterogeneous group have not been established.

[0009] Despite these efforts, current diagnosis and risk classification schemes remain imprecise. Children with ALL who are more prone to relapse and require more intensive approaches, and children with low risk disease who could be cured with less intensive therapies, are not adequately predicted by current classification schemes and are distributed among all currently defined risk groups. Although pre-treatment clinical and tumor genetic stratification of patients has generally improved outcomes by optimizing therapy, variability in clinical course continues to exist among individuals within a single risk group and even among those with similar prognostic features. In fact, the most significant prognostic factors in childhood ALL explain no more than 4% of the variability in prognosis, suggesting that yet undiscovered molecular mechanisms dictate clinical behavior (Donadieu et al., Br J Haematol, 102:729-739, 1998). A precise means to prospectively identify such children has remained elusive.

[0010] With the advent of modern combination chemotherapy and transplantation, significant advances have been made in the treatment of the acute leukemias, particularly in children. Yet despite these advances, a large percentage of the thousands of children and adults diagnosed with leukemia each year will ultimately die of resistant or relapsed disease. The therapeutic advances that have been achieved in the acute leukemias, particularly in pediatric acute lymphoblastic leukemia (ALL), have come in part through the development of detailed risk classification schemes based on clinical features, the presence or absence of specific cytogenetic or molecular genetic abnormalities, and measures of early therapeutic response that may be used to tailor the choice of therapy and its intensity to a patient's relapse risk. Yet current risk classification schemes do not fully reflect the tremendous molecular heterogeneity of the acute leukemias and do not precisely identify those patients who are more prone to relapse, those who might be cured with less intensive regimens resulting in fewer toxicities and long term side effects, or those who will respond to newer targeted therapeutic agents. It has thus been the inventors' hypothesis that large scale genomic and proteomic technologies that measure global patterns of gene expression in leukemic cells will yield systematic profiles that can be used to improve outcome prediction, risk classification, and therapeutic targeting in the acute leukemias. The present inventors have worked with retrospective patient cohorts from which they derived rigorously cross-validated gene expression profiles. Over the years, the inventors have built highly collaborative multidisciplinary laboratory, statistical, and computational teams; developed reproducible and sensitive methods for performing gene expression arrays; designed data warehouses for storage of large gene expression datasets fully annotated with clinical, outcome, and experimental information; and developed and applied robust statistical and computational methods and novel visualization tools for array data analysis.

[0011] The major scientific challenge in pediatric ALL is to improve risk classification schemes and outcome prediction in order to: 1) identify those children who are most likely to relapse who require intensive or novel regimens for cure; and 2) identify those children who can be cured with less intensive regimens with fewer toxicities and long term side effects.

BRIEF DESCRIPTION OF THE FIGURES

[0012] FIG. 1 shows the performance of the 42 Probe Set (38-Gene) Gene Expression Classifier for Prediction of Relapse-Free Survival (RFS). A and B. Kaplan-Meier survival estimates of RFS in the full cohort of 207 patients (Panel A) and in the low vs. high risk groups distinguished with the gene expression classifier for RFS (Panel B). HR is the hazard ratio estimated using Cox regression. C. A gene expression heatmap is shown with the rows representing the 42 probe sets (containing 38 unique genes) composing the gene expression classifier for RFS. The columns represent patient samples sorted from left to right by time to relapse or last follow up. Red: high expression relative to the mean; green: low expression relative to the mean. The column labels R or C indicate whether the patients relapsed or were censored, respectively.

[0013] FIG. 2 shows the Kaplan-Meier Estimates of Relapse-free Survival (RFS) Based on the Gene Expression Classifier for RFS and End-Induction (Day 29) Minimal Residual Disease (MRD). A. Day 29 flow cytometric measures of MRD separated patients into two groups with significantly different RFS. B and C. After dividing patients by their end-induction flow MRD status, an independent effect of the gene expression classifier for RFS is observed among both the flow MRD-negative (<0.01% blasts) (Panel B) and flow MRD-positive (>0.01% blasts) (Panel C) patients. D and E. Combining the risk scores determined from the gene expression classifier and flow MRD yields four distinct outcome groups; the two discordant groups show no significant difference in RFS (P=0.572) and are therefore collapsed into an intermediate risk group for RFS prediction (Panel E). The hazard ratios (HR) and corresponding P-values are based on the Cox regression (medium risk vs. low risk, HR=3.73, P=0.001; high risk vs. medium risk, HR=2.27, P=0.002). The

P-value reported in the lower left hand corner corresponds to the test for differences among all groups.

[0014] FIG. 3 shows the Kaplan-Meier Estimates of Relapse-free Survival (RFS) Based on the Gene Expression Classifier for RFS Modeled on High-Risk ALL Cases Lacking Known Recurring Cytogenetic 29 Abnormalities and End-Induction (Day 29) Minimal Residual Disease (MRD). A. The second gene expression classifier modeled only on those high-risk ALL cases (n=163) (Supplement Table S8) from the COG 9906 ALL cohort lacking recurring cytogenetic abnormalities resolves two distinct risk groups of patients with significantly different RFS. B. Day 29 flow MRD status separated these 163 ALL cases into two groups with significantly different RFS. C and D. After dividing patients by their end-induction flow MRD status, an independent effect of the gene expression classifier for RFS is observed among both the flow MRD-negative (<0.01% blasts) (Panel C) and flow MRD-positive (>0.01% blasts) (Panel D) patients. E and F. Combining the risk scores determined from the gene expression classifier and flow MRD yields four distinct outcome groups (Panel E); the two discordant groups show no significant difference in RFS and are therefore collapsed into an intermediate risk group for RFS prediction (Panel F). The hazard ratios (HR) and corresponding P-values are based on the Cox regression (high risk vs. intermediate risk, HR=2.26, P=0.0066; intermediate risk vs. low risk, HR=2.77, P=0.008). The P-value reported in the lower left hand corner corresponds to the test for differences among all groups.

[0015] FIG. 4 shows the Gene Expression Classifier for Prediction of End-Induction (Day 29) Flow MRD in Pretreatment Samples Combined with the Gene Expression Classifier for RFS. A. A receiver operating curve (ROC) shows the high accuracy of the 23 probe set MRD classifier (LOOCV error rate of 24.61%; sensitivity 71.64%, specificity 77.42%) in predicting MRD. The area under the ROC curve (0.80) is significantly greater than an uninformative ROC curve (0.5) (P<0.0001). B. Heatmap of 23 probe set predictor of MRD presented in rows (false discovery rate <0.0001%, SAM). The columns represent patient samples with positive or negative end-induction flow MRD while the rows are the specific predictor genes. Red: high expression relative to the mean; green: low expression relative to the mean. C. Kaplan-Meier estimates of relapse free survival (RFS) for the risk groups determined by combining the gene expression classifiers for RFS and MRD, analogous to FIG. 2E, with the gene expression predictor for MRD replacing day 29 flow MRD. The three risk groups have significantly different RFS (log rank test, P<0.0001).

[0016] FIG. 5 shows the Kaplan-Meier Estimates of Relapse-free Survival (RFS) using the Combined Gene Expression Classifiers for RFS and Minimal Residual Disease in an Independent Cohort of 84 Children with High-Risk ALL. A. The gene expression classifier for RFS separates children into low and high risk groups in an independent cohort of 84 children with high-risk ALL treated on COG Trial 1961. 14,16 B. Application of the combined gene expression classifiers for RFS and MRD shows significant separation of three risk groups: low (47/84, 56%), intermediate (22/84, 26%) and high (15/84, 18%), similar to our initial cohort (FIG. 3C).

[0017] FIG. 6 shows Kaplan-Meier Estimates of Relapse-Free Survival using the Combined Gene Expression Classifier for RFS and Flow Cytometric Measures of MRD in the Presence of Kinase Signatures, JAK Mutations, and IKAROS/IKZF1 Deletions. A and B. Application of the original 42 probe set (38 gene; Supplement Table S4) gene expression classifier for RFS combined with end-induction flow cytometric measures of MRD distinguishes two distinct risk groups in COG 9906 ALL patients with kinase signatures (Panel A) and three risk groups in those patients lacking kinase signatures (Panel B). C and D. Application of the combined classifier also resolves two distinct and statistically significant risk groups in ALL patients with JAK mutations (Panel C) and three risk groups in those patients lacking JAK mutations (Panel D). E and F. Application of the combined classifier distinguishes three risk groups with statistically significant RFS in patients with (Panel E) and without (Panel F) IKAROS/IKZF1 deletions. The hazard ratios (HR) and corresponding P-values are based on the Cox regression. The P-value reported in the lower left hand corner corresponds to the log rank test for differences among all groups.

[0018] FIG. 7 (Figure S1) shows the difference in Relapse Free Survival (RFS) between Study Cohort (n=207) and Remaining Patients Registered to COG P9906 (n=65). Comparison of relapse free survival between those studied (n=207) and remaining COG P9906 patients not included in this cohort (n=65).

[0019] FIG. 8 (Figure S2) shows the Number of Genes (Probe Sets) with the Number of Present Calls Exceeding a Specified Cutoff. Number of probe sets with number of Present calls exceeding a specified cutoff (here, n=104, corresponding to 50% of the n=207 patient samples analyzed). This yields 23,775 final probe sets for further analysis.

[0020] FIG. 9 (Figure S3) shows the Likelihood Ratio Test Statistic as a Function of SPCA Threshold.

[0021] FIG. 10 (Figure S4) shows the Box plots of Cross-validation Error Rates for DLDA Model Predicting Day 29 MRD Status.

[0022] FIG. 11 (Figure S5) shows the Cross-validation Procedure for Determining the Best Model for Predicting RFS.

[0023] FIG. 12 (Figure S6) shows the Nested Cross-validation for Objective Prediction used in Significance Evaluation of the Gene Expression Risk Prediction Model.

[0024] FIG. 13 (Figure S7) shows the Cross-validation Procedure for Determining the Best Model for Predicting Day 29 MRD Status.

[0025] FIG. 14 (Figure S8) shows the Nested Cross-validation for Objective Predictions used in Significance Evaluation of Gene Expression Risk Prediction Model for the Day 29 MRD Status.

[0026] FIG. 15 (Figure S9) shows the Likelihood Ratio Test Statistic as a Function of Gene Expression Classifier Threshold for RFS with t(1;19) Translocation and MLL Rearrangement Cases Removed.

[0027] FIG. 16 (Figure S10) shows Kaplan-Meier Estimates of Relapse-free Survival (RFS) Based on Gene Expression Classifier for RFS and Day 29 Minimal Residual Disease (MRD) Levels after Excluding t(1;19) Translocation and MLL Rearrangement Cases. These are presented in figures (A) through (F). A. The gene expression classifier separates patients into low and high risk groups with significantly different RFS. B and C. After dividing patients by their end-induction flow MRD status, an independent effect of the gene expression classifier for RFS is observed among both the flow MRD-negative (<0.01% blasts) (Panel B) and flow MRD-positive (>0.01% blasts) (Panel C) patients. D. Combining the scores from the gene expression classifier for RFS and

flow MRD yields three distinct outcome groups. The hazard ratio (HR) and corresponding p-value are based on the Cox regression. The p-value reported in the lower left hand corner corresponds to the test for differences among all groups.

[0028] FIG. 17 shows Hierarchical Clustering Identifying 8 Cluster Groups in High Risk ALL. Hierarchical clustering using 254 genes (provided in Supplement, Table S7A) was used to identify clusters of patients with shared patterns of gene expression. (Rows: 207 P9906 patients; Columns: 254 Probe Sets). Shades of red depict expression levels higher than the median while green indicates levels lower than the median. The cluster groups are numbered and prefixed by their method of probe set selection: H=High CV, C=COPA and R=ROSE. Panel A. HC method for selection of probe sets. Panel B. COPA selection of probe sets. Panel C. ROSE selection of probe sets.

[0029] FIG. 18 shows Relapse-Free Survival in Gene Expression Cluster Groups. Relapse-free survival is shown for each of the High CV clusters (A), COPA clusters (B), and ROSE clusters (C). Only the H6, C6, and R6 clusters (curves shown in blue) have a significantly better outcome compared to the entire cohort (dense line), while the H8, C8, R8 clusters (curves shown in red) have a significantly poorer RFS. Hazard ratios and p-values are shown in the bottom left of each panel.

[0030] FIG. 19 shows Hierarchical Clustering Identifying Similar Clusters in a Second High Risk ALL Cohort. Hierarchical clustering using 167 probe sets (provided in Supplement, Table S7A) was used to identify clusters of patients with shared patterns of gene expression in CCG 1961. (Rows: 99 CCG 1961 patients; Columns: 167 Probe Sets). Shades of red depict expression levels higher than the median while green indicates levels lower than the median. The cluster groups are prefixed by their method of probe set selection: H=High CV, C=COPA and R=ROSE. Panel A. HC method for selection of probe sets. Panel B. COPA selection of probe sets. Panel C. ROSE selection of probe sets.

[0031] FIG. 20 shows Relapse-Free Survival in Second High Risk ALL Cohort. Relapse-free survival is shown for each of the High CV clusters (A), COPA clusters (B), and ROSE clusters (C). Only the C10 and R10 clusters (curves shown in blue) have a significantly better outcome compared to the entire cohort (dense line), while the H8, C8, R8 clusters (curves shown in red) have a significantly poorer RFS. Hazard ratios and p-values are shown in the bottom left of each panel.

[0032] FIG. 21 (Figure S1') shows a comparison of relapse free survival between those studied (n=207) and remaining COG P9906 patients not included in this cohort (n=65).

[0033] FIG. 22 (Figure S2') shows an example of probe set with outlier group at high end. Red line indicates signal intensities for all 207 patient samples for probe set 212151_at. Vertical blue lines depict partitioning of samples into thirds. A least-squares curve fit is applied to the middle third of the samples and the resulting trend line is shown in yellow. Different sample groups are illustrated by the dashed lines at the top right. As shown by the double arrowed lines, the median value from each of these groups is compared to the trend line.

[0034] FIG. 23 (Figure S3') shows a 3-D plot of cluster membership from different clustering methods. Each of the three clustering methods is shown on an axis: HC=hierarchical clusters, RC=ROSE/COPA clusters and Vx=VxInsight clusters. Cluster numbers are given across each axis with the exception of RC9, which represents cluster 2A.

[0035] FIG. 24 shows the survival of IKZF1-positive patients in R8 compared to not-R8. IKZF1-positive patients were divided into those in cluster 8 (red line) and those in other clusters (black line). The p-value and hazard ratio for this comparison are given in the lower left panel.

BRIEF DESCRIPTION OF THE INVENTION

[0036] Accurate risk stratification constitutes the fundamental paradigm of treatment in acute lymphoblastic leukemia (ALL), allowing the intensity of therapy to be tailored to the patient's risk of relapse. The present invention evaluates a gene expression profile and identifies prognostic genes of cancers, in particular leukemia, more particularly high risk B-precursor acute lymphoblastic leukemia (B-ALL), including high risk pediatric acute lymphoblastic leukemia. The present invention provides a method of determining the existence of high risk B-precursor ALL in a patient and predicting therapeutic outcome of that patient, especially a pediatric patient. The method comprises the steps of first establishing the threshold value of at least two (2) or three (3) prognostic genes of high risk B-ALL, or four (4) prognostic genes, at least five (5) prognostic genes, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30 or up to 30 or more prognostic genes which are described in the present specification, especially Table 1 P and 1 O (see below, pages 14-17). Table 1 P genes include the following 31 genes (gene products): BMPR1B (bone morphogenic protein receptor type 1B); BTG3 (B-cell translocation gene 3, also BTG family member 3); C14orf32 (chromosome 14 open reading frame 32); C8orf38 (chromosome 8 open reading frame 38); CD2 (CD2 molecule); CDC42EP3 (CDC42 effector protein (Rho GTPase binding) 3); CHST2 (carbohydrate (N-acetylglucosamine-6-O) sulfotransferase 2); CTGF (connective tissue growth factor); DDX21 (DEAD (Asp-Glu-Ala-Asp) box polypeptide 21); DKFZP761M1511 (hypothetical protein DKFZP761M1511); ECM1 (extracellular matrix protein 1); FMNL2 (formin-like 2); GRAMD1C (GRAM domain containing 1C); IGJ (immunoglobulin J polypeptide); LDB3 (LIM domain binding 3); LOC400581 (GRB2-related adaptor protein-like); LRRC62 (leucine rich repeat containing 62); MDFIC (MyoD family inhibitor domain containing); MGC12916 (hypothetical protein MGC12916); NFKBIB (nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, beta); NR4A3 (nuclear receptor subfamily 4, group A, member 3); NT5E (5'-nucleotidase, ecto (CD73)); PON2 (paraoxonase 2); RGS1 (regulator of G-protein signalling 1); RGS2 (regulator of G-protein signalling 2, 24 kDa); SCHIP1 (schwannomin interacting protein 1); SEMA6A (sema domain, transmembrane domain (TM), and cytoplasmic domain, (semaphorin) 6A); TSPAN7 (tetraspanin 7); TTYH2 (tweety homolog 2 (Drosophila)); UBE2E3 (ubiquitin-conjugating enzyme E2E 3 (UBC4/5 homolog, yeast)) and VPREB1 (pre-B lymphocyte gene 1). Of the above genes/gene products (31) the following are high risk genes (gene products): BMPR1B; C8orf38; CDC42EP3; CTGF; DKFZP761M1511; ECM1; GRAMD1C; IGJ; LDB3; LOC400581; LRRC62; MDFIC; NT5E; PON2; SCHIP1; SEMA6A; TSPAN7; and TTYH2. Of these 31 genes, the

following are low risk genes (gene products): BTG3; 1); LOC650794 (Similar to FRAS1 related extracellular C14orf32: CD2: CHST2: DDX21: FMNL2: MGC12916: matrix protein 2 precursor (ECM3 homolog)): MUC4 (mucin NFKBIB; NR4A3; RGS1; RGS2; UBE2E3 and VPREB1. It 4, cell surface associated); NRXN3 (neurexin 3); PON2 is noted that the gene product AGAP1 (Arf GAP with GTP (paraoXonase 2); RGS2 (regulator of G-protein signalling 2. binding protein-like, ANK repeat and PH domains, also 24 kDa); RGS3 (Regulator of G-protein signalling 3): referred to as CENTG2) may also be added to this list for SCHIP1 (schwannomin interacting protein 1); SCRN3 (se analysis in order to enhance diagnosis and evaluation of the cernin3); SEMA6A (sema domain, transmembrane domain patient and/or therapeutic agent. (TM), and cytoplasmic domain, (semaphorin) 6A) and 0037 Preferred table 1 P genes to be measured include the ZBTB16 (Zinc finger and BTB domain containing 16). Of following 8 genes products: BMPR1B: CTGF: IGJ; LDB3: these 27 genes (gene products), the following are high risk: PON2; RGS2: SCHIP1 and SEMA6A. Of these genes (gene BMPR1B: BTBD11; C21orf37; CA6: CDC42EP3; CKMT2: products), BMPR1B: CTGF; IGJ; LDB3; PON2: SCHIP1 CRLF2: CTGF; DIP2A; GIMAP6; GPR110; IGFBP6; IGJ; and SEMA6A are “high risk', i.e., when overexpressed are K1F1C: LDB3; LOC391849; LOC650794: MUC4; NRXN3; predictive of an unfavorable therapeutic outcome (relapse, PON2; RGS3: SCHIP1; SCRN3: SEMA6A and ZBTB16. unsuccessful therapy) of the patient. One gene (gene product) The following gene (gene product) is low risk: RGS2. within this group, RGS2, when overexpressed, is predictive 0039. Preferred table 1Q(see below) genes to be measured of therapeutic Success (remission, favorable therapeutic out include the following 11 genes products: BMPR1B: CA6; come). 
At least 2 or 3 genes, preferably at least 4 or 5 genes, CRLF2: GPR110; IGJ; LDB3: MUC4; NRXN3; PON2: at least 6 at least 7 or 8 of these genes within this smallergroup RGS2 and SEMA6A. At least 2 or 3 genes, preferably at least are measured to provide a predictive outcome of therapy. It is 4 or 5 genes, at least 6 at least 7, at least 8, at least 9, at least noted that overexpression of a high risk gene (gene product) 10 or 11 of these genes are measured to provide a predictive will be predictive of an unfavorable outcome; whereas the outcome of therapy. A preferred list obtained from the above underexpression of a high risk gene will be (somewhat) pre list of 11 genes includes BMPR1B: CA6: CRLF2: GPR110: dictive of a favorable outcome. It is also noted that the over IGJ; LDB3; MUE4; PON2 and RGS2. Preferred gene prod expression of a low risk gene (gene product) will be predictive ucts within this list include CA6, IGJ, MUC4, GPR110, of a favorable therapeutic outcome, whereas the underexpres PON2, CRLF2 and optionally RGS2. CRLF2 is preferably sion of a low risk gene (gene product) will be predictive of an included as a gene product in the most preferred list. It is unfavorable therapeutic outcome. noted that overexpression of a high risk gene (gene product) 0038 Table 1Q genes include the following genes (gene will be predictive of an unfavorable outcome; whereas the products): BMPR1B (bone morphogenic receptor type 1B); underexpression of a high risk gene will be (somewhat) pre BTBD11 (BTB (POZ) domain containing 11); C21orf87 dictive of a favorable outcome. 
It is also noted that the over (chromosome 21 open reading frame 87); CA6 (carbonic expression of a low risk gene (gene product) will be predictive anhydrase VI); CDC42EP3 (CDC42 effector protein (Rho of a favorable therapeutic outcome (remission), whereas the GTPase binding)3); CKMT2 (creatine kinase, mitochondrial underexpression of a low risk gene (gene product) will be 2 (sarcomeric)); CRLF2 (cytokine receptor-like factor 2); predictive of an unfavorable therapeutic outcome. Also noted CTGF (connective tissue growth factor); DIP2A (DIP2 disco is the fact that the gene products AGAP-1 (Arf GAP with interacting protein 2 homolog A (Drosophila)); GIMAP6 GTP-binding protein-like, ANK repeat and PH domains, also (GTPase, IMAP family member 6); GPR110 (G protein CENTG2) and/or PCDH17 (Protocadherin-17) may also be coupled receptor 110); IGFBP6 (insulin-like growth factor used (analyzed) in the invention (in addition to Table 1P binding protein 6); IGJ (immunoglobulin J polypeptide); and/or Table 1Q gene products, including the preferred gene K1F1C (kinesin family member 1C); LDB3 (LIM domain product lists from each of these Tables) to promote the accu binding 3); LOC391849 (Homo sapiens similar to neuralized racy of diagnosis and related methods.
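By way of illustration only, the reading of over- and underexpression described above for high risk and low risk genes can be sketched in Python as follows. The gene sets are the preferred Table 1P gene products named in the text; the function names, the per-gene threshold values, and the simple tally are hypothetical and are not part of the specification, which does not state how individual gene indications are combined.

```python
# Illustrative sketch only: per-gene reading of expression relative to a
# predetermined threshold. Gene sets come from the preferred Table 1P list;
# thresholds and the tally are hypothetical assumptions.
HIGH_RISK = {"BMPR1B", "CTGF", "IGJ", "LDB3", "PON2", "SCHIP1", "SEMA6A"}
LOW_RISK = {"RGS2"}


def gene_indication(gene, observed, threshold):
    """Return 'favorable', 'unfavorable' or 'neutral' for a single gene."""
    if gene in HIGH_RISK:
        when_over = "unfavorable"   # overexpressed high risk gene
    elif gene in LOW_RISK:
        when_over = "favorable"     # overexpressed low risk gene
    else:
        raise KeyError(f"{gene} is not in the illustrative gene sets")
    if observed == threshold:
        return "neutral"
    if observed > threshold:
        return when_over
    # Underexpression points the opposite way.
    return "favorable" if when_over == "unfavorable" else "unfavorable"


def tally_indications(observed, thresholds):
    """Count per-gene indications across all measured genes."""
    counts = {"favorable": 0, "unfavorable": 0, "neutral": 0}
    for gene, value in observed.items():
        counts[gene_indication(gene, value, thresholds[gene])] += 1
    return counts


if __name__ == "__main__":
    observed = {"IGJ": 9.0, "RGS2": 4.0, "PON2": 5.0}      # hypothetical values
    thresholds = {"IGJ": 7.0, "RGS2": 6.0, "PON2": 6.0}    # hypothetical values
    print(tally_indications(observed, thresholds))
```

The tally is deliberately left uninterpreted; any rule for turning it into a single prognosis would be a further assumption.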

TABLE 1P

Rank | High => | Overlap with 54K | Probe Set ID | Gene Symbol | Gene Description
1 | High Risk | Yes | 242579_at | BMPR1B | Transcribed locus
10 | High Risk | Yes | 232539_at | | MRNA; cDNA DKFZp761H1023 (from clone DKFZp761H1023)
18 | High Risk | | 236750_at | | Transcribed locus
19 | High Risk | | 215617_at | | CDNA FLJ11754 fis, clone HEMBA1005588
25 | High Risk | | 244280_at | | Homo sapiens, clone IMAGE:5583725, mRNA
26 | High Risk | | 215479_at | | CDNA FLJ20780 fis, clone COLO4256
31 | Low Risk | | 238623_at | | CDNA FLJ37310 fis, clone BRAMY2016706
39 | Low Risk | | 244623_at | | Transcribed locus
24 | Low Risk | | 213134_x_at | BTG3 | BTG family, member 3
34 | Low Risk | | 212497_at | C14orf32 | chromosome 14 open reading frame 32
20 | High Risk | | 236766_at | C8orf38 | Chromosome 8 open reading frame 38
27 | Low Risk | | 205831_at | CD2 | CD2 molecule
6 | High Risk | Yes | 209288_s_at | CDC42EP3 | CDC42 effector protein (Rho GTPase binding) 3
41 | Low Risk | | 203921_at | CHST2 | carbohydrate (N-acetylglucosamine-6-O) sulfotransferase 2
12 | High Risk | Yes | 209101_at | CTGF | connective tissue growth factor
30 | Low Risk | | 224654_at | DDX21 | DEAD (Asp-Glu-Ala-Asp) box polypeptide 21
36 | Low Risk | | 208152_s_at | DDX21 | DEAD (Asp-Glu-Ala-Asp) box polypeptide 21

TABLE 1P-continued

Rank | High => | Overlap with 54K | Probe Set ID | Gene Symbol | Gene Description
14 | High Risk | | 225355_at | DKFZP761M1511 | hypothetical protein DKFZP761M1511
16 | High Risk | | 209365_s_at | ECM1 | extracellular matrix protein 1
33 | Low Risk | | 226184_at | FMNL2 | formin-like 2
13 | High Risk | | 219313_at | GRAMD1C | GRAM domain containing 1C
11 | High Risk | Yes | 212592_at | IGJ | Immunoglobulin J polypeptide, linker protein for immunoglobulin alpha and mu polypeptides
 | High Risk | Yes | 213371_at | LDB3 | LIM domain binding 3
42 | High Risk | | 1560524_at | LOC400581 | GRB2-related adaptor protein-like
38 | High Risk | | 1559072_a_at | LRRC62 | leucine rich repeat containing 62
28 | High Risk | | 211675_s_at | MDFIC | MyoD family inhibitor domain containing
40 | Low Risk | | 224507_s_at | MGC12916 | hypothetical protein MGC12916
15 | Low Risk | | 228388_at | NFKBIB | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, beta
 | Low Risk | | 209959_at | NR4A3 | nuclear receptor subfamily 4, group A, member 3
 | Low Risk | | 207978_s_at | NR4A3 | nuclear receptor subfamily 4, group A, member 3
 | High Risk | | 203939_at | NT5E | 5'-nucleotidase, ecto (CD73)
 | High Risk | Yes | 210830_s_at | PON2 | paraoxonase 2
 | High Risk | Yes | 201876_at | PON2 | paraoxonase 2
 | Low Risk | | 216834_at | RGS1 | regulator of G-protein signalling 1
 | Low Risk | Yes | 202388_at | RGS2 | regulator of G-protein signalling 2, 24 kDa
 | High Risk | Yes | 204030_s_at | SCHIP1 | schwannomin interacting protein 1
 | High Risk | Yes | 215028_at | SEMA6A | sema domain, transmembrane domain (TM), and cytoplasmic domain, (semaphorin) 6A
 | High Risk | Yes | 223449_at | SEMA6A | sema domain, transmembrane domain (TM), and cytoplasmic domain, (semaphorin) 6A
32 | High Risk | | 202242_at | TSPAN7 | tetraspanin 7
17 | High Risk | | 223741_s_at | TTYH2 | tweety homolog 2 (Drosophila)
37 | Low Risk | | 210024_s_at | UBE2E3 | ubiquitin-conjugating enzyme E2E3 (UBC4/5 homolog, yeast)
35 | Low Risk | | 221349_at | VPREB1 | pre-B lymphocyte gene 1

TABLE 1Q

Rank | High => | Probe Set ID | Gene Symbol | Gene Description
 | High Risk | 236489_at | | Transcribed locus
 | High Risk | 242579_at | BMPR1B | Transcribed locus
19 | High Risk | 229975_at | | Transcribed locus
34 | High Risk | 232539_at | | MRNA; cDNA DKFZp761H1023 (from clone DKFZp761H1023)
24 | High Risk | 241295_at | BTBD11 | BTB (POZ) domain containing 11
29 | High Risk | 1553069_at | C21orf87 | chromosome 21 open reading frame 87
38 | High Risk | 206873_at | CA6 | carbonic anhydrase VI
35 | High Risk | 209288_s_at | CDC42EP3 | CDC42 effector protein (Rho GTPase binding) 3
33 | High Risk | 205295_at | CKMT2 | creatine kinase, mitochondrial 2 (sarcomeric)
 | High Risk | 208303_s_at | CRLF2 | cytokine receptor-like factor 2
32 | High Risk | 209101_at | CTGF | connective tissue growth factor
18 | High Risk | 1554969_x_at | DIP2A | DIP2 disco-interacting protein 2 homolog A (Drosophila)
 | High Risk | 219777_at | GIMAP6 | GTPase, IMAP family member 6
28 | High Risk | 229367_s_at | GIMAP6 | GTPase, IMAP family member 6
 | High Risk | 235988_at | GPR110 | G protein-coupled receptor 110
23 | High Risk | 238689_at | GPR110 | G protein-coupled receptor 110
11 | High Risk | 203851_at | IGFBP6 | insulin-like growth factor binding protein 6
25 | High Risk | 212592_at | IGJ | Immunoglobulin J polypeptide, linker protein for immunoglobulin alpha and mu polypeptides
37 | High Risk | 209245_s_at | KIF1C | kinesin family member 1C
 | High Risk | 213371_at | LDB3 | LIM domain binding 3
12 | High Risk | 216887_s_at | LDB3 | LIM domain binding 3
22 | High Risk | 240457_at | LOC391849 | Similar to neuralized-like
15 | High Risk | 237191_x_at | LOC650794 | Similar to FRAS1-related extracellular matrix protein 2 precursor (ECM3 homolog)
 | | 217110_s_at | MUC4 | mucin 4, cell surface associated
 | | 217109_at | MUC4 | mucin 4, cell surface associated
13 | | 204895_x_at | MUC4 | mucin 4, cell surface associated
17 | | 205795_at | NRXN3 | neurexin 3
20 | | 215021_s_at | NRXN3 | neurexin 3
10 | | 210830_s_at | PON2 | paraoxonase 2
26 | | 201876_at | PON2 | paraoxonase 2
 | | 202388_at | RGS2 | regulator of G-protein signalling 2, 24 kDa

TABLE 1Q-continued

Rank | High => | Probe Set ID | Gene Symbol | Gene Description
14 | High Risk | 233390_at | RGS3 | Regulator of G-protein signalling 3
31 | High Risk | 204030_s_at | SCHIP1 | schwannomin interacting protein 1
36 | High Risk | 232108_at | SCRN3 | secernin 3
16 | High Risk | 225660_at | SEMA6A | sema domain, transmembrane domain (TM), and cytoplasmic domain, (semaphorin) 6A
21 | High Risk | 215028_at | SEMA6A | sema domain, transmembrane domain (TM), and cytoplasmic domain, (semaphorin) 6A
27 | High Risk | 223449_at | SEMA6A | sema domain, transmembrane domain (TM), and cytoplasmic domain, (semaphorin) 6A
30 | High Risk | 244697_at | ZBTB16 | zinc finger and BTB domain containing 16

0040. Then, the amount of the prognostic gene(s) from a 0042. The present invention is directed to methods for patient inflicted with high risk B-ALL is determined. The outcome prediction and risk classification in leukemia, espe amount of the prognostic gene present in that patient is com cially a high risk classification in B precursor acute lympho pared with the established threshold value (a predetermined blastic leukemia (ALL), especially in children. In one value) of the prognostic gene(s) which is indicative of thera embodiment, the invention provides a method for classifying peutic success (low risk) or failure (high risk), whereby the leukemia in a patient that includes obtaining a biological prognostic outcome of the patient is determined. The prog sample from a patient; determining the expression level for a nostic gene may be a gene which is indicative of a poor or selected gene product, more preferably a group of selected unfavorable (bad) prognostic outcome (high risk) or a favor gene products, to yield an observed gene expression level; able (good) outcome (low risk). Analyzing expression levels and comparing the observed gene expression level for the of these genes provides accurate insight (diagnostic and prog selected gene product(s) to control gene expression levels nostic) information into the likelihood of a therapeutic out (preferably including a predetermined level). The control come in ALL, especially in a high risk B-ALL patient, includ gene expression level can be the expression level observed for ing a pediatric patient. the gene product(s) in a control sample, or a predetermined 0041. In certain embodiments, the amount of the prognos expression level for the gene product. 
An observed expression tic gene is determined by the quantitation of a transcript level (higher or lower) that differs from the control gene encoding the sequence of the prognostic gene; or a polypep expression level is indicative of a disease classification and is tide encoded by the transcript. The quantitation of the tran predictive of a therapeutic outcome. In another aspect, the script can be based on hybridization to the transcript. The method can include determining a gene expression profile for quantitation of the polypeptide can be based on antibody selected gene products in the biological sample to yield an detection or a related method. The method optionally com observed gene expression profile; and comparing the prises a step of amplifying nucleic acids from the tissue observed gene expression profile for the selected gene prod sample before the evaluating (PCR analysis). In a number of ucts to a control gene expression profile for the selected gene embodiments, the evaluating is of a plurality of prognostic products that correlates with a disease classification, for genes, preferably at least two (2) prognostic genes, at least example ALL, and in particular high risk B precursor ALL; three (3) prognostic genes, at least four (4) prognostic genes, wherein a similarity between the observed gene expression at least five (5) prognostic genes, at least six (6) prognostic profile and the control gene expression profile is indicative of genes, at least seven (7) prognostic genes, at least eight (8) the disease classification (e.g., high risk B-all poor or favor prognostic genes, at least nine (9) prognostic genes, at least able prognostic). ten (10) prognostic genes, at least eleven (11) prognostic 0043. 
The disease classification can be, for example, a genes, at least twelve (12) prognostic genes, at least thirteen classification preferably based on predicted outcome (remis (13) prognostic genes, at least fourteen (14) prognostic genes, sion VS therapeutic failure); but may also include a classifi at least fifteen (15) prognostic genes, at least sixteen (16) cation based upon clinical characteristics of patients, a clas prognostic genes, at least seventeen (17) prognostic genes, at sification based on karyotype; a classification based on least eighteen (18) prognostic genes, at least nineteen (19) leukemia Subtype; or a classification based on disease etiol prognostic genes, at least twenty (20) prognostic genes, at ogy. Measurement of all 31 genes (gene products) set forth in least twenty-one (21) prognostic genes, at least twenty-two Table 1 Pandall 27 gene products set forth in Table 1Q, below, (22) prognostic genes, at least twenty-three (23) prognostic or a group of genes (gene products) falling within these larger genes, at least twenty-four (24), at least twenty-five (25), at lists as otherwise described herein may also be performed to least twenty-six (26), at least twenty-seven (27), at least provide an accurate assessment of therapeutic intervention. twenty-eight (28), at least twenty-nine (29), at least thirty (30) 0044) The invention further provides for a method for or thirty-one (31) prognostic genes. 
The prognosis which is predicting a patient falls within a particular group of high risk determined from measuring the prognostic genes contributes B-ALL patients and predicting therapeutic outcome in that B to selection of a therapeutic strategy, which may be a tradi ALL leukemia patient, especially pediatric B-ALL that tional therapy for ALL, including B-precursor ALL (where a includes obtaining a biological sample from a patient; deter favorable prognosis is determined from measurements), or a mining the expression level for selected gene products asso more aggressive therapy based upon a traditional therapy or a ciated with outcome (high risk or low risk) to yield an non-traditional therapy (where an unfavorable prognosis is observed gene expression level; and comparing the observed determined from measurements). gene expression level for the selected gene product(s) to a US 2011/0230372 A1 Sep. 22, 2011 control gene expression level for the selected gene product. products as set forth above, three of the gene products, four of The control gene expression level for the selected gene prod the gene products or all five of the gene products. In addition, uct can include the gene expression level for the selected gene the therapeutic method according to the present invention product observed in a control sample, or a predetermined also modulates at least two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, gene expression level for the selected gene product; wherein seventeen, eighteen, nineteen, twenty, twenty-one, twenty an observed expression level that is different from the control two, twenty-three, twenty-four, twenty-five, twenty-six, gene expression level for the selected gene product(s) is twenty-seven, twenty-eight, twenty-nine, thirty or thirty one indicative of predicted remission or alternatively, an unfavor of a number of gene products as relevant in Tables 1P and 1 O able outcome. 
The method preferably may determine gene as indicated or otherwise described herein. Preferred genes expression levels of at least two gene products otherwise (gene products) useful in this aspect of the invention from identified herein. The genes (gene product expression) other Table 1 P include BMPR1B: CTGF: IGJ; LDB3; PON2: wise described herein are measured, compared to predeter RGS2: SCHIP1 and SEMA6A, all of which are high risk mined values (e.g. from a control sample) and then assessed genes with the exception of RGS2. to determine the likelihood of a favorable or unfavorable 0047. Also provided by the invention is an in vitro method therapeutic outcome and then providing a therapeutic for screening a compound useful for treating leukemia, espe approach consistent with the analysis of the express of the cially high risk B-ALL. The invention further provides an in measured gene products. The present method may include Vivo method for evaluating a compound for use in treating measuring expression of at least two gene products up to 31 leukemia, especially high risk B-ALL. The candidate com gene products according to Tables 1 P and 1 O as otherwise pounds are evaluated for their effect on the expression level(s) described herein. 
In certain preferred aspects of the invention, of one or more gene products associated with outcome in the expression levels of all 31 gene products (Table 1P) or all leukemia patients (for example, Table 1 P and 1 O and as 27 gene products Table 1Q) may be determined and compared otherwise described herein), especially high risk B-ALL, to a predetermined gene expression level, wherein a measure preferably at least two of those gene products, at least three of ment above or below a predetermined expression level is those gene products, at least four of those gene products, at indicative of the likelihood of an unfavorable therapeutic least five of those gene products, at least six of those gene response/therapeutic failure or a favorable therapeutic products, at least seven of those gene products, at least eight response (continuous complete remission or CCR). In the of those gene products, at least nine of those gene products, at case where therapeutic failure is predicted, the use of more least ten of those gene products, at least eleven of those gene aggressive protocols of traditional anti-cancer therapies products, at least twelve of those gene products, at least (higher doses and/or longer duration of drug administration) thirteen of those gene products, at least fourteen of those gene or experimental therapies may be advisable. 
products, at least fifteen of those gene products, at least six teen of those gene products, at least seventeen of those gene 0045 Optionally, the method further comprises determin products, at least eighteen of those gene products, at least ing the expression level for other gene products within the list twenty of those gene products, at least twenty-one of those of gene products otherwise disclosed herein and comparing in gene products, at least twenty-two of those gene products, at a similar fashion the observed gene expression levels for the least twenty-three of those gene products, at least twenty selected gene products with a control gene expression level four, at least twenty-five, at least twenty-six, at least twenty for those gene products, wherein an observed expression level seven, at least twenty-eight, at least twenty-nine, at least thirty for these gene products that is different from (above or below) or thirty-one of those gene products may be measured to the control gene expression level for that gene product (high determine a therapeutic outcome. risk or low risk) is further indicative of predicted remission 0048. The preferred gene products may also include at (favorable prognosis) or relapse (unfavorable prognosis). It is least three of CA6, IGJ, MUC4, GPR110, LDB3, PON2, noted that a higher expression (when compared to a control or CRLF2 and RGS2 (preferably CRLF2 is included in the at predetermined value) of a high risk gene (gene product) is least three gene products) and in certain instances may further generally indicative of an unfavorable prognosis of therapeu include AGAP-1 (Arf GAP with GTP-binding protein-like, tic outcome; a higher expression (when compared to a control ANK repeat and PH domains, also CENTG2) and/or or predetermined value) of a low risk gene (gene product) is PCDH17 (Protocadherin-17). 
These genes/gene products and generally indicative of a favorable therapeutic outcome (re their expression above or below a predetermined expression mission, including continuous complete remission); a lower level are more predictive of overall outcome. As shown expression (when compared to a control or a predetermined below, at least two or more of the gene products which are value) of a high risk gene (gene product) is generally indica presented in tables 1P or 1G may be used to predict therapeu tive of a favorable therapeutic outcome. Genes (gene prod tic outcome. This predictive model is tested in an independent ucts) are to be assessed in toto during an analysis to provide a cohort of high risk pediatric B-ALL cases (20) and is found to predictive basis upon which to recommend therapeutic inter predict outcome with extremely high statistical significance vention in a patient. (p-value <1.0). It is noted that the expression of gene prod 0046. The invention further includes a method for treating ucts of at least two of the five genes listed above, as well as leukemia comprising administering to a leukemia patient a additional genes from the list appearing in Tables 1P and 1 O therapeutic agent that modulates the amount or activity of the and in certain preferred instances, the expression of all 24 gene product(s) associated with therapeutic outcome. Prefer gene products of Table 1 P and 1 O may be measured and ably, the method modulates (enhancement/upregulation of a compared to predetermined expression levels to provide the gene product associated with a favorable or good therapeutic greater degrees of certainty of a therapeutic outcome. 
outcome (low risk) or inhibition/downregulation of a gene product associated with a poor or unfavorable therapeutic DETAILED DESCRIPTION OF THE INVENTION outcome (high risk) as measured by comparison with a con 0049 Gene expression profiling can provide insights into trol sample or predetermined value) at least two of the gene disease etiology and genetic progression, and can also pro US 2011/0230372 A1 Sep. 22, 2011

vide tools for more comprehensive molecular diagnosis and or (4) leukemic cell chromosome translocations t(1:19) or therapeutic targeting. The biologic clusters and associated t(9:22) confirmed by central reference laboratory. (See, Crist, gene profiles identified herein may be useful for refined et al, Blood 1990; 76: 117-122; and Fletcher, et al., Blood molecular classification of acute leukemias as well as 1991; 77:435-439). improved risk assessment and classification, especially of 0056. The term “traditional therapy” relates to therapy high risk B precursor acute lymphoblastic leukemia (protocol) which is typically used to treat leukemia, espe (B-ALL), especially including pediatric B-ALL. In addition, cially B-precursor ALL (including pediatric B-ALL) and can the invention has identified numerous genes, including but include Memorial Sloan-Kettering New York II therapy (NY not limited to the genes as presented in Tables 1P and 1CR II), UKALLR2, AL 841, AL851, ALHR88, MCP841 (India), hereof, that are, alone or in combination, strongly predictive as well as modified BFM (Berlin-Frankfurt-Munster) of therapeutic outcome in high risk B-ALL, and in particular therapy, BMF-95 or other therapy, including ALinC 17 high risk pediatric B precursor ALL. The genes identified therapy as is well-known in the art. In the present invention herein, and the gene products from said genes, including the term “more aggressive therapy’ or “alternative therapy' they encode, can be used to refine risk classification usually means a more aggressive version of conventional and diagnostics, to make outcome predictions and improve therapy typically used to treat leukemia, for example B-ALL, prognostics, and to serve as therapeutic targets in infant leu including pediatric B-precursor ALL, using for example, con kemia and pediatric ALL, especially B-precursor ALL. 
ventional or traditional chemotherapeutic agents at higher dosages and/or for longer periods of time in order to increase the likelihood of a favorable therapeutic outcome. It may also refer, in context, to experimental therapies for treating leukemia, rather than simply more aggressive versions of conventional (traditional) therapy.

0050 "Gene expression" as the term is used herein refers to the production of a biological product encoded by a nucleic acid sequence, such as a gene sequence. This biological product, referred to herein as a "gene product," may be a nucleic acid or a polypeptide. The nucleic acid is typically an RNA molecule which is produced as a transcript from the gene sequence. The RNA molecule can be any type of RNA molecule, whether before (e.g., precursor RNA) or after (e.g., mRNA) post-transcriptional processing. cDNA prepared from the mRNA of a sample is also considered a gene product. The polypeptide gene product is a peptide or protein that is encoded by the coding region of the gene, and is produced during the process of translation of the mRNA.

0051 The term "gene expression level" refers to a measure of a gene product(s) of the gene and typically refers to the relative or absolute amount or activity of the gene product.

0052 The term "gene expression profile" as used herein is defined as the expression level of two or more genes. The term gene includes all natural variants of the gene. Typically a gene expression profile includes expression levels for the products of multiple genes in a given sample, up to about 13,000, preferably determined using an oligonucleotide microarray.

0053 Unless otherwise specified, "a," "an," "the," and "at least one" are used interchangeably and mean one or more than one.

0054 The term "patient" shall mean within context an animal, preferably a mammal, more preferably a human patient, more preferably a human child who is undergoing or will undergo therapy or treatment for leukemia, especially high risk B-precursor acute lymphoblastic leukemia.

0055 The term "high risk B precursor acute lymphocytic leukemia" or "high risk B-ALL" refers to a disease state of a patient with acute lymphoblastic leukemia who meets certain high risk disease criteria. These include: confirmation of B-precursor ALL in the patient by central reference laboratories (See Borowitz, et al., Rec Results Cancer Res 1993; 131: 257-267); and exhibiting a leukemic cell DNA index of ≤1.16 (DNA content in leukemic cells : DNA content of normal G0/G1 cells) (DI) by central reference laboratory (See, Trueworthy, et al., J Clin Oncol 1992; 10: 606-613; and Pullen, et al., "Immunologic phenotypes and correlation with treatment results". In Murphy SB, Gilbert JR (eds). Leukemia Research: Advances in Cell Biology and Treatment. Elsevier: Amsterdam, 1994, pp. 221-239) and at least one of the following: (1) WBC ≥10 000-99 000/µl, aged 1-2.99 years or ages 6-21 years; (2) WBC ≥100 000/µl, aged 1-21 years; (3) all patients with CNS or overt testicular disease at diagnosis;

Diagnosis, Prognosis and Risk Classification

0057 Current parameters used for diagnosis, prognosis and risk classification in pediatric ALL are related to clinical data, cytogenetics and response to treatment. They include age and white blood count, cytogenetics, the presence or absence of minimal residual disease (MRD), and a morphological assessment of early response (measured as slow or rapid early therapeutic response). As noted above, however, these parameters are not always well correlated with outcome, nor are they precisely predictive at diagnosis.

0058 Prognosis is typically recognized as a forecast of the probable course and outcome of a disease. As such, it involves inputs of both statistical probability, requiring numbers of samples, and outcome data. In the present invention, outcome data is utilized in the form of continuous complete remission (CCR) of ALL or therapeutic failure (non-CCR). A patient population of hundreds is included, providing statistical power.

0059 The ability to determine which cases of leukemia, especially high risk B precursor acute lymphoblastic leukemia (B-ALL), including high risk pediatric B-ALL, will respond to treatment, and to which type of treatment, would be useful in appropriate allocation of treatment resources. It would also provide guidance as to the aggressiveness of therapy in producing a favorable outcome (continuous complete remission or CCR). As indicated above, the various standard therapies have significantly different risks and potential side effects, especially therapies which are more aggressive or even experimental in nature. Accurate prognosis would also minimize application of treatment regimens which have low likelihood of success and would allow a more efficient aggressive or even an experimental protocol to be used without wasting effort on therapies unlikely to produce a favorable therapeutic outcome, preferably a continuous complete remission. Such also could avoid delay of the application of alternative treatments which may have higher likelihoods of success for a particular presented case. Thus, the ability to evaluate individual leukemia cases, especially B-precursor acute lymphoblastic leukemia, for markers which subset into responsive and non-responsive groups for particular treatments is very useful.
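As an illustration only, the portion of the high risk definition in paragraph 0055 that appears on this page can be sketched as a simple eligibility check. The sketch assumes the DNA index cut-off is an upper bound (DI of at most 1.16) and that the white blood cell figures are lower bounds (at least 10 000 and at least 100 000 per microliter); it covers only criteria (1) to (3), and the class and function names are hypothetical rather than anything defined in this document.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical record holding only the fields used by criteria (1)-(3)."""
    age_years: float
    wbc_per_ul: float                  # white blood cell count per microliter
    dna_index: float                   # leukemic DNA content / normal G0/G1 DNA content
    b_precursor_confirmed: bool        # confirmed by a central reference laboratory
    cns_or_testicular_disease: bool    # CNS or overt testicular disease at diagnosis


def meets_listed_high_risk_criteria(p: Patient) -> bool:
    """Return True if the patient satisfies the criteria sketched from paragraph 0055.

    Assumptions (not confirmed by the text): DI <= 1.16 is required, the
    10 000-99 000 band is treated as 10 000 <= WBC < 100 000, and age ranges
    are inclusive at their upper ends.
    """
    if not p.b_precursor_confirmed or p.dna_index > 1.16:
        return False
    # (1) intermediate WBC with age 1-2.99 years or 6-21 years
    c1 = 10_000 <= p.wbc_per_ul < 100_000 and (
        1 <= p.age_years < 3 or 6 <= p.age_years <= 21
    )
    # (2) very high WBC at any age from 1 to 21 years
    c2 = p.wbc_per_ul >= 100_000 and 1 <= p.age_years <= 21
    # (3) CNS or overt testicular disease at diagnosis
    c3 = p.cns_or_testicular_disease
    return c1 or c2 or c3
```

Any further criteria in the full definition would be additional terms in the final `or` expression.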

0060 Current models of leukemia classification have become better at distinguishing between cancers that have similar histopathological features but vary in clinical course and outcome, except in certain areas, one of them being in high risk B-precursor acute lymphoblastic leukemia (B-ALL). Identification of novel prognostic molecular markers is a priority if radical treatment is to be offered on a more selective basis to those high risk leukemia patients with disease states which do not respond favorably to conventional therapy. A novel strategy is described to discover/assess/measure molecular markers for B-ALL leukemia, especially high risk B-ALL, to determine a treatment protocol, by assessing gene expression in leukemia patients and modeling these data based on a predetermined gene product expression for numerous patients having a known clinical outcome. The invention herein is directed to defining different forms of leukemia, in particular, B-precursor acute lymphoblastic leukemia, especially high risk B-precursor acute lymphoblastic leukemia, including high risk pediatric B-ALL, by measuring expression of gene products which can translate directly into therapeutic prognosis. Such prognosis allows for application of a treatment regimen having a greater statistical likelihood of cost effective treatments and minimization of negative side effects from the different/various treatment options.

0061 In preferred aspects, the present invention provides an improved method for identifying and/or classifying acute leukemias, especially B precursor ALL, even more especially high risk B precursor ALL and also high risk pediatric B precursor ALL, and for providing an indication of the therapeutic outcome of the patient based upon an assessment of expression levels of particular genes. Expression levels are determined for two or more genes associated with therapeutic outcome, risk assessment or classification, karyotype (e.g., MLL translocation) or subtype (e.g., B-ALL, especially high risk B-ALL). Genes that are particularly relevant for diagnosis, prognosis and risk classification, especially for high risk B precursor ALL, including high risk pediatric B precursor ALL, according to the invention include those described in the tables (especially Tables 1P and 1Q) and figures herein. The gene expression levels for the gene(s) of interest in a biological sample from a patient diagnosed with or suspected of having an acute leukemia, especially B precursor ALL, are compared to gene expression levels observed for a control sample, or with a predetermined gene expression level. Observed expression levels that are higher or lower than the expression levels observed for the gene(s) of interest in the control sample or that are higher or lower than the predetermined expression levels for the gene(s) of interest (as set forth in Tables 1P and 1Q) provide information about the acute leukemia that facilitates diagnosis, prognosis, and/or risk classification and can aid in treatment decisions, especially whether to use a more or less aggressive therapeutic regimen or perhaps even an experimental therapy. When the expression levels of multiple genes are assessed for a single biological sample, a gene expression profile is produced.

0064 In one aspect, the invention provides genes and gene expression profiles that are correlated with outcome (i.e., complete continuous remission or good/favorable prognosis vs. therapeutic failure or poor/unfavorable prognosis) in high risk B-ALL. Assessment of at least two or more of these genes according to the invention, preferably at least three, at least four, at least five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty-one, twenty-two, twenty-three, twenty-four, twenty-five, twenty-six (Table 1Q shows 26 genes), twenty-seven, twenty-eight, twenty-nine, thirty or thirty-one as set forth in Table 1P, in a given gene profile can be integrated into revised risk classification schemes, therapeutic targeting and clinical trial design. In one embodiment, the expression levels of a particular gene (gene products) are measured, and that measurement is used, either alone or with other parameters, to assign the patient to a particular risk category (e.g., high risk B-ALL good/favorable or high risk

B-ALL poor/unfavorable). The invention identifies a preferred number of genes from Table 1P whose expression levels, either alone or in combination, are associated with outcome, including but not limited to at least two genes, preferably at least three genes, four genes, five genes, six genes, seven genes or eight genes selected from the group consisting of BMPR1B; CTGF; IGJ; LDB3; PON2; RGS2; SCHIP1 and SEMA6A. The invention identifies a preferred number of genes from Table 1Q whose expression levels, either alone or in combination, are associated with outcome, including but not limited to at least two genes, preferably at least three genes, four genes, five genes, six genes, seven genes, eight genes, nine genes, ten genes or eleven genes selected from the group consisting of BMPR1B; CA6; CRLF2; GPR110; IGJ; LDB3; MUC4; NRXN3; PON2; RGS2 and SEMA6A. Of this list of 11 genes the following 9 are more relevant and indicative of a predictive outcome: BMPR1B; CA6; CRLF2; GPR110; IGJ; LDB3; MUC4; PON2 and RGS2.

0065 Some of these genes exhibit a positive association between expression level and outcome (low risk). For these genes, an expression level above a predetermined threshold level (or higher than that exhibited by a control sample) is predictive of a positive outcome (continuous complete remission). In particular, it is expected such measurements can be used to refine risk classification in children who are otherwise classified as having high risk B-ALL, but who can respond favorably (cured) with traditional, less intrusive therapies.

0066 A number of genes, and in particular, CRLF2, MUC4 and LDB3 and to a lesser extent CA6, PON2 and BMPR1B, in particular, are strong predictors of an unfavorable outcome for a high risk B-ALL patient and therefore in preferred aspects, the expression of at least two genes, and preferably the expression of at least three or four of those three genes among those cited above are measured and compared with predetermined values for each of the gene products measured. This list may guide the choice of gene products to analyze to determine a therapeutic outcome or for evaluating a drug, compound or therapeutic regimen. The expression of RGS2 is a strong predictor of favorable outcome (low risk) and such can be used to further determine a predictive outcome.

0067 In general, the expression of at least two genes in a single group is measured and compared to a predetermined value to provide a therapeutic outcome prediction and in addition to those two genes, the expression of any number of additional genes described in Tables 1P and 1Q can be measured and used for predicting therapeutic outcome. In certain aspects of the invention where very high reliability is desired/required, the expression levels of all 31 or 26 genes (as per Tables 1P and 1Q) may be measured and compared with a predetermined value for each of the genes measured such that a measurement above or below the predetermined value of expression for each of the group of genes is indicative of a favorable therapeutic outcome (continuous complete remission) or a therapeutic failure. In the event of a predictive favorable therapeutic outcome, conventional anti-cancer therapy may be used and in the event of a predictive unfavorable outcome (failure), more aggressive therapy may be recommended and implemented.

0068 The expression levels of multiple (two or more, preferably three or more, more preferably at least five) genes as described hereinabove and, in addition to the five, up to twenty-four to thirty-one genes within the genes listed in Tables 1P and 1Q in one or more lists of genes associated with outcome can be measured, and those measurements are used, either alone or with other parameters, to assign the patient to a particular risk category as it relates to a predicted therapeutic outcome. For example, gene expression levels of multiple genes can be measured for a patient (as by evaluating gene expression using an Affymetrix microarray chip) and compared to a list of genes whose expression levels (high or low) are associated with a positive (or negative) outcome. If the gene expression profile of the patient is similar to that of the list of genes associated with outcome, then the patient can be assigned to a low risk (favorable outcome) or high risk (unfavorable outcome) category. The correlation between gene expression profiles and class distinction can be determined using a variety of methods. Methods of defining classes and classifying samples are described, for example, in Golub et al., U.S. Patent Application Publication No. 2003/0017481, published Jan. 23, 2003, and Golub et al., U.S. Patent Application Publication No. 2003/0134300, published Jul. 17, 2003. The information provided by the present invention, alone or in conjunction with other test results, aids in sample classification and diagnosis of disease.

0069 Computational analysis using the gene lists and other data, such as measures of statistical significance, as described herein is readily performed on a computer. The invention should therefore be understood to encompass machine readable media comprising any of the data, including gene lists, described herein. The invention further includes an apparatus that includes a computer comprising such data and an output device such as a monitor or printer for evaluating the results of computational analysis performed using such data.

0070 In another aspect, the invention provides genes and gene expression profiles that are correlated with cytogenetics. This allows discrimination among the various karyotypes, such as MLL translocations or numerical imbalances such as hyperdiploidy or hypodiploidy, which are useful in risk assessment and outcome prediction.

0071 In yet another aspect, the invention provides genes and gene expression profiles that are correlated with intrinsic disease biology and/or etiology. In other words, gene expression profiles that are common or shared among individual leukemia cases in different patients can be used to define intrinsically related groups (often referred to as clusters) of acute leukemia that cannot be appreciated or diagnosed using standard means such as morphology, immunophenotype, or cytogenetics. Mathematical modeling of the very sharp peak in ALL incidence seen in children 2-3 years old (>80 cases per million) has suggested that ALL may arise from two primary events, the first of which occurs in utero and the second after birth (Linet et al., Descriptive epidemiology of the leukemias, in Leukemias, 5th Edition. ES Henderson et al. (eds). WB Saunders, Philadelphia. 1990). Interestingly, the detection of certain ALL-associated genetic abnormalities in cord blood samples taken at birth from children who are ultimately affected by disease supports this hypothesis (Gale et al., Proc. Natl. Acad. Sci. U.S.A., 94:13950-13954, 1997; Ford et al., Proc. Natl. Acad. Sci. U.S.A., 95:4584-4588, 1998).

0072 The results for pediatric B precursor ALL suggest that this disease is composed of novel intrinsic biologic clusters defined by shared gene expression profiles, and that these intrinsic subsets cannot reliably be defined or predicted by traditional labels currently used for risk classification or by the presence or absence of specific cytogenetic abnormalities. We have identified 31 genes (Table 1P) and 26 genes (Table

1Q) for determining outcome in high risk B-ALL, and in particular high risk pediatric B precursor ALL, using the methods set forth hereinbelow for identifying candidate genes associated with classification and outcome. We have identified 8 preferred genes (Table 1P) which are predictors of outcome in high risk B precursor ALL patients, especially high risk pediatric B precursor ALL patients. We have identified 11 genes (preferably 9 genes) which are predictors of outcome in high risk B precursor ALL patients, especially high risk pediatric B precursor ALL patients. Expression of two or more of these genes which is greater than a predetermined value or from a control may be indicative that traditional B-ALL therapy is appropriate (low risk) or inappropriate (high risk) for treating the patient's B precursor ALL. Where traditional therapy is viewed as being inappropriate (high risk), a measurement of the expression of these genes which is higher than predetermined values for each of these genes is predictive of a high likelihood of a therapeutic failure using traditional B precursor ALL therapies. High expression for these (high risk) genes would dictate an early aggressive therapy or experimental therapy in order to increase the likelihood of a favorable therapeutic outcome. Low expression for these (high risk) genes and/or expression of low risk genes would favor traditional therapy and a favorable result from that therapy.

0073 Some genes in these clusters are metabolically related, suggesting a metabolic pathway that is associated with cancer initiation or progression. Other genes in these metabolic pathways, like the genes described herein but upstream or downstream from them in the metabolic pathway, thus can also serve as therapeutic targets.

0074 In yet another aspect, the invention provides genes and gene expression profiles which may be used to discriminate high risk B-ALL from acute myeloid leukemia (AML) in infant leukemias by measuring the expression levels of the gene product(s) correlated with B-ALL as otherwise described herein, especially B-precursor ALL.

0075 It should be appreciated that while the present invention is described primarily in terms of human disease, it is useful for diagnostic and prognostic applications in other mammals as well, particularly in veterinary applications such as those related to the treatment of acute leukemia in cats, dogs, cows, pigs, horses and rabbits.

0076 Further, the invention provides computational and statistical methods for identifying genes, lists of genes and gene expression profiles associated with outcome, karyotype, disease subtype and the like as described herein.

0077 In sum, the present invention has identified a group of genes which strongly correlate with favorable/unfavorable outcome in B precursor acute lymphoblastic leukemia and contribute unique information to allow the reliable prediction of a therapeutic outcome in high risk B precursor ALL, especially high risk pediatric B precursor ALL.

Measurement of Gene Expression Levels

0078 Gene expression levels are determined by measuring the amount or activity of a desired gene product (i.e., an RNA or a polypeptide encoded by the coding sequence of the gene) in a biological sample. Any biological sample can be analyzed. Preferably the biological sample is a bodily tissue or fluid, more preferably it is a bodily fluid such as blood, serum, plasma, urine, bone marrow, lymphatic fluid, and CNS or spinal fluid. Preferably, samples containing mononuclear blood cells and/or bone marrow fluids and tissues are used. In embodiments of the method of the invention practiced in cell culture (such as methods for screening compounds to identify therapeutic agents), the biological sample can be whole or lysed cells from the cell culture or the cell supernatant.

0079 Gene expression levels can be assayed qualitatively or quantitatively. The level of a gene product is measured or estimated in a sample either directly (e.g., by determining or estimating absolute level of the gene product) or relatively (e.g., by comparing the observed expression level to a gene expression level of another sample or set of samples). Measurements of gene expression levels may, but need not, include a normalization process.

0080 Typically, mRNA levels (or cDNA prepared from such mRNA) are assayed to determine gene expression levels. Methods to detect gene expression levels include Northern blot analysis (e.g., Harada et al., Cell 63:303-312 (1990)), S1 nuclease mapping (e.g., Fujita et al., Cell 49:357-367 (1987)), polymerase chain reaction (PCR), reverse transcription in combination with the polymerase chain reaction (RT-PCR) (e.g., Example III; see also Makino et al., Technique 2:295-301 (1990)), and reverse transcription in combination with the ligase chain reaction (RT-LCR). Multiplexed methods that allow the measurement of expression levels for many genes simultaneously are preferred, particularly in embodiments involving methods based on gene expression profiles comprising multiple genes. In a preferred embodiment, gene expression is measured using an oligonucleotide microarray, such as a DNA microchip. DNA microchips contain oligonucleotide probes affixed to a solid substrate, and are useful for screening a large number of samples for gene expression. DNA microchips comprising DNA probes for binding polynucleotide gene products (mRNA) of the various genes from Table 1 are additional aspects of the present invention.

0081 Alternatively or in addition, polypeptide levels can be assayed. Immunological techniques that involve antibody binding, such as enzyme linked immunosorbent assay (ELISA) and radioimmunoassay (RIA), are typically employed. Where activity assays are available, the activity of a polypeptide of interest can be assayed directly.

0082 As discussed above, the expression levels of these markers in a biological sample may be evaluated by many methods. They may be evaluated for RNA expression levels. Hybridization methods are typically used, and may take the form of a PCR or related amplification method. Alternatively, a number of qualitative or quantitative hybridization methods may be used, typically with some standard of comparison, e.g., actin message. Alternatively, measurement of protein levels may be performed by many means. Typically, antibody based methods are used, e.g., ELISA, radioimmunoassay, etc., which may not require isolation of the specific marker from other proteins. Other means for evaluation of expression levels may be applied. Antibody purification may be performed, though separation of protein from others, and evaluation of specific bands or peaks on protein separation may provide the same results. Thus, e.g., mass spectroscopy of a protein sample may indicate that quantitation of a particular peak will allow detection of the corresponding gene product. Multidimensional protein separations may provide for quantitation of specific purified entities.

0083 The observed expression levels for the gene(s) of interest are evaluated to determine whether they provide diagnostic or prognostic information for the leukemia being analyzed. The evaluation typically involves a comparison between observed gene expression levels and either a predetermined gene expression level or threshold value, or a gene expression level that characterizes a control sample ("predetermined value"). The control sample can be a sample obtained from a normal (i.e., non-leukemic) patient(s) or it can be a sample obtained from a patient or patients with high risk B-ALL that has been cured. For example, if a cytogenetic classification is desired, the biological sample can be interrogated for the expression level of a gene correlated with the cytogenetic abnormality, then compared with the expression level of the same gene in a patient known to have the cytogenetic abnormality (or an average expression level for the gene that characterizes that population).

0084 The present study provides specific identification of multiple genes whose expression levels in biological samples will serve as markers to evaluate leukemia cases, especially therapeutic outcome in high risk B-ALL cases, especially high risk pediatric B-ALL cases. These markers have been selected for statistical correlation to disease outcome data on a large number of leukemia (high risk B-ALL) patients as described herein.

Treatment of Infant Leukemia and Pediatric B-Precursor ALL

0085 The genes identified herein that are associated with outcome of a disease state may provide insight into a treatment regimen. That regimen may be that traditionally used for the treatment of leukemia (as discussed hereinabove) in the case where the analysis of gene products from samples taken from the patient predicts a favorable therapeutic outcome, or alternatively, the chosen regimen may be a more aggressive approach (e.g., higher dosages of traditional therapies for longer periods of time) or even experimental therapies in instances where the predictive outcome is that of failure of therapy.

0086 In addition, the present invention may provide new treatment methods, agents and regimens for the treatment of leukemia, especially high risk B-precursor acute lymphoblastic leukemia, especially high risk pediatric B-precursor ALL. The genes identified herein that are associated with outcome and/or specific disease subtypes or karyotypes are likely to have a specific role in the disease condition, and hence represent novel therapeutic targets. Thus, another aspect of the invention involves treating high risk B-ALL patients, including high risk pediatric ALL patients, by modulating the expression of one or more genes described herein in Table 1P or 1F to a desired expression level or below.

0087 In the case of those gene products (Tables 1P and 1Q) whose increased or decreased expression (whether above or below a predetermined value, for example obtained for a control sample) is associated with a favorable outcome or failure, the treatment method of the invention will involve enhancing the expression of one or more of those gene products in which a favorable therapeutic outcome is predicted (low risk) by such enhancement and inhibiting the expression of one or more of those gene products in which enhanced expression is associated with failed therapy (high risk).

0088 The therapeutic agent can be a polypeptide having the biological activity of the polypeptide of interest (e.g., BTG3, CD2, RGS2 or other gene product, preferably a low risk gene/gene product) or a biologically active subunit or analog thereof. Alternatively, the therapeutic agent can be a ligand (e.g., a small non-peptide molecule, a peptide, a peptidomimetic compound, an antibody, or the like) that agonizes (i.e., increases) the activity of the polypeptide of interest. For example, in the case of BTG3, CD2, RGS2 or other gene product, these gene products may be administered to the patient to enhance the activity and treat the patient.

0089 Gene therapies can also be used to increase the amount of a polypeptide of interest in a host cell of a patient. Polynucleotides operably encoding the polypeptide of interest can be delivered to a patient either as "naked DNA" or as part of an expression vector. The term vector includes, but is not limited to, plasmid vectors, cosmid vectors, artificial chromosome vectors, or, in some aspects of the invention, viral vectors. Examples of viral vectors include adenovirus, herpes simplex virus (HSV), alphavirus, simian virus 40, picornavirus, vaccinia virus, retrovirus, lentivirus, and adeno-associated virus. Preferably the vector is a plasmid. In some aspects of the invention, a vector is capable of replication in the cell to which it is introduced; in other aspects the vector is not capable of replication. In some preferred aspects of the present invention, the vector is unable to mediate the integration of the vector sequences into the genomic DNA of a cell. An example of a vector that can mediate the integration of the vector sequences into the genomic DNA of a cell is a retroviral vector, in which the integrase mediates integration of the retroviral vector sequences. A vector may also contain transposon sequences that facilitate integration of the coding region into the genomic DNA of a host cell.

0090 Selection of a vector depends upon a variety of desired characteristics in the resulting construct, such as a selection marker, vector replication rate, and the like. An expression vector optionally includes expression control sequences operably linked to the coding sequence such that the coding region is expressed in the cell. The invention is not limited by the use of any particular promoter, and a wide variety is known. Promoters act as regulatory signals that bind RNA polymerase in a cell to initiate transcription of a downstream (3' direction) operably linked coding sequence. The promoter used in the invention can be a constitutive or an inducible promoter. It can be, but need not be, heterologous with respect to the cell to which it is introduced.

0091 Another option for increasing the expression of a gene is to reduce the amount of methylation of the gene. Demethylation agents, therefore, may be used to re-activate the expression of one or more of the gene products in cases where methylation of the gene is responsible for reduced gene expression in the patient.

0092 For other genes identified herein as being correlated with therapeutic failure or without outcome in high risk B-ALL, such as high risk pediatric B-ALL, high expression of the gene is associated with a negative outcome rather than a positive outcome (high risk). In such instances, where the expression levels of these genes as described are high, the predicted therapeutic outcome in such patients is therapeutic failure for traditional therapies. In such case, more aggressive approaches to traditional therapies and/or experimental therapies may be attempted.

0093 The genes described above (high risk, negative outcome) accordingly represent novel therapeutic targets, and the invention provides a therapeutic method for reducing (inhibiting) the amount and/or activity of these polypeptides of interest in a leukemia patient. Preferably the amount or activity of the selected gene product is reduced to less than about 90%, more preferably less than about 75%, most preferably less than about 25% of the gene expression level observed in the patient prior to treatment.
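The reduction targets stated in paragraph 0093 (less than about 90%, 75% or 25% of the pre-treatment expression level) amount to a ratio against the patient's own baseline. A minimal sketch of that arithmetic follows; the function name and tier labels are hypothetical, and the word "about" in the text is treated here as an exact cut-off purely for illustration.

```python
def reduction_tier(baseline_level: float, treated_level: float) -> str:
    """Classify residual expression as a fraction of the pre-treatment level.

    Hypothetical helper: the cut-offs 0.90, 0.75 and 0.25 come from the text,
    but treating them as exact boundaries is an assumption.
    """
    if baseline_level <= 0:
        raise ValueError("baseline_level must be positive")
    residual = treated_level / baseline_level  # fraction of baseline remaining
    if residual < 0.25:
        return "most preferred (<25% of baseline)"
    if residual < 0.75:
        return "more preferred (<75% of baseline)"
    if residual < 0.90:
        return "preferred (<90% of baseline)"
    return "not reduced below 90% of baseline"


# Example: expression falls from 200 units before treatment to 40 units after.
print(reduction_tier(200.0, 40.0))
```

Both levels must be measured on the same scale (for example, the same normalized assay) for the ratio to be meaningful.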

0094. Genes (gene products) which are described as high risk from Table 1P include BMPR1B; C8orf38; CDC42EP3; CTGF; DKFZP761M1511; ECM1; GRAMD1C; IGJ; LDB3; LOC400581; LRRC62; MDFIC; NT5E; PON2; SCHIP1; SEMA6A; TSPAN7; and TTYH2. Of these, one or more of the following represent preferred therapeutic targets: BMPR1B; CTGF; IGJ; LDB3; PON2; RGS2; SCHIP1 and SEMA6A. Genes (gene products) which are described as high risk from Table 1Q include: BMPR1B; BTBD11; C21orf37; CA6; CDC42EP3; CKMT2; CRLF2; CTGF; DIP2A; GIMAP6; GPR110; IGFBP6; IGJ; KIF1C; LDB3; LOC391849; LOC650794; MUC4; NRXN3; PON2; RGS3; SCHIP1; SCRN3; SEMA6A and ZBTB16. Of these, one or more of the following represent preferred therapeutic targets: BMPR1B; CA6; CRLF2; GPR110; IGJ; LDB3; MUC4; NRXN3; PON2; and SEMA6A.

0095. A cell manufactures proteins by first transcribing the DNA of a gene for that protein to produce RNA (transcription). In eukaryotes, this transcript is an unprocessed RNA called precursor RNA that is subsequently processed (e.g. by the removal of introns, splicing, and the like) into messenger RNA (mRNA) and finally translated by ribosomes into the desired protein. This process may be interfered with or inhibited at any point, for example, during transcription, during RNA processing, or during translation. Reduced expression of the gene(s) leads to a decrease or reduction in the activity of the gene product and, in cases where high expression leads to a therapeutic failure, an expected therapeutic success.

0096. The therapeutic method for inhibiting the activity of a gene whose high expression (Tables 1P/1Q) is correlated with negative outcome/therapeutic failure involves the administration of a therapeutic agent to the patient to inhibit the expression of the gene. The therapeutic agent can be a nucleic acid, such as an antisense RNA or DNA, or a catalytic nucleic acid such as a ribozyme, that reduces activity of the gene product of interest by directly binding to a portion of the gene encoding the enzyme (for example, at the coding region, at a regulatory element, or the like) or an RNA transcript of the gene (for example, a precursor RNA or mRNA, at the coding region or at 5' or 3' untranslated regions) (see, e.g., Golub et al., U.S. Patent Application Publication No. 2003/0134300, published Jul. 17, 2003). Alternatively, the nucleic acid therapeutic agent can encode a transcript that binds to an endogenous RNA or DNA; or encode an inhibitor of the activity of the polypeptide of interest. It is sufficient that the introduction of the nucleic acid into the cell of the patient is or can be accompanied by a reduction in the amount and/or the activity of the polypeptide of interest. An RNA aptamer can also be used to inhibit gene expression. The therapeutic agent may also be a protein inhibitor or antagonist, such as a small non-peptide molecule such as a drug or a prodrug, a peptide, a peptidomimetic compound, an antibody, a protein or fusion protein, or the like that acts directly on the polypeptide of interest to reduce its activity.

0097. The invention includes a pharmaceutical composition that includes an effective amount of a therapeutic agent as described herein as well as a pharmaceutically acceptable carrier. These therapeutic agents may be agents or inhibitors of selected genes (Tables 1P/1Q). Therapeutic agents can be administered in any convenient manner including parenteral, subcutaneous, intravenous, intramuscular, intraperitoneal, intranasal, inhalation, transdermal, oral or buccal routes. The dosage administered will be dependent upon the nature of the agent; the age, health, and weight of the recipient; the kind of concurrent treatment, if any; frequency of treatment; and the effect desired. A therapeutic agent(s) identified herein can be administered in combination with any other therapeutic agent(s) such as immunosuppressives, cytotoxic factors and/or cytokines to augment therapy; see Golub et al., U.S. Patent Application Publication No. 2003/0134300, published Jul. 17, 2003, for examples of suitable pharmaceutical formulations and methods, suitable dosages, treatment combinations and representative delivery vehicles.

0098. The effect of a treatment regimen on an acute leukemia patient can be assessed by evaluating, before, during and/or after the treatment, the expression level of one or more genes as described herein. Preferably, the expression level of gene(s) associated with outcome, such as a gene as described above, may be monitored over the course of the treatment period. Optionally gene expression profiles showing the expression levels of multiple selected genes associated with outcome can be produced at different times during the course of treatment and compared to each other and/or to an expression profile correlated with outcome.

Screening for Therapeutic Agents

0099. The invention further provides methods for screening to identify agents that modulate expression levels of the genes identified herein that are correlated with outcome, risk assessment or classification, cytogenetics or the like. Candidate compounds can be identified by screening chemical libraries according to methods well known to the art of drug discovery and development (see Golub et al., U.S. Patent Application Publication No. 2003/0134300, published Jul. 17, 2003, for a detailed description of a wide variety of screening methods). The screening method of the invention is preferably carried out in cell culture, for example using leukemic cell lines (especially B-precursor ALL cell lines) that express known levels of the therapeutic target or other gene product as otherwise described herein (see Table 1G and 1P). The cells are contacted with the candidate compound and changes in gene expression of one or more genes relative to a control culture or predetermined values based upon a control culture are measured. Alternatively, gene expression levels before and after contact with the candidate compound can be measured. Changes in gene expression (above or below a predetermined value, depending upon the low risk or high risk character of the gene/gene product) indicate that the compound may have therapeutic utility. Structural libraries can be surveyed computationally after identification of a lead drug to achieve rational drug design of even more effective compounds.

0100. The invention further relates to compounds thus identified according to the screening methods of the invention. Such compounds can be used to treat high risk B-ALL, especially including high risk pediatric B-ALL as appropriate, and can be formulated for therapeutic use as described above.

0101. Active analogs, as that term is used herein, include modified polypeptides. Modifications of polypeptides of the invention include chemical and/or enzymatic derivatizations at one or more constituent amino acids, including side chain modifications, backbone modifications, and N- and C-terminal modifications including acetylation, hydroxylation, methylation, amidation, and the attachment of carbohydrate or lipid moieties, cofactors, and the like.

0102. In certain aspects of the present invention, a therapeutic method may rely on an antibody to one or more gene products predictive of outcome, preferably to one or more gene product which otherwise is predictive of a negative outcome, so that the antibody may function as an inhibitor of a gene product. Preferably the antibody is a human or humanized antibody, especially if it is to be used for therapeutic purposes. A human antibody is an antibody having the amino acid sequence of a human immunoglobulin and includes antibodies produced by human B cells, or isolated from human sera, human immunoglobulin libraries or from animals transgenic for one or more human immunoglobulins and that do not express endogenous immunoglobulins, as described in U.S. Pat. No. 5,939,598 by Kucherlapati et al., for example. Transgenic animals (e.g., mice) that are capable, upon immunization, of producing a full repertoire of human antibodies in the absence of endogenous immunoglobulin production can be employed. For example, it has been described that the homozygous deletion of the antibody heavy chain joining region (J(H)) gene in chimeric and germ-line mutant mice results in complete inhibition of endogenous antibody production. Transfer of the human germ-line immunoglobulin gene array in such germ-line mutant mice will result in the production of human antibodies upon antigen challenge (see, e.g., Jakobovits et al., Proc. Natl. Acad. Sci. U.S.A., 90:2551-2555 (1993); Jakobovits et al., Nature, 362:255-258 (1993); Bruggemann et al., Year in Immuno. 7:33 (1993)). Human antibodies can also be produced in phage display libraries (Hoogenboom et al., J. Mol. Biol., 227:381 (1991); Marks et al., J. Mol. Biol., 222:581 (1991)). The techniques of Cole et al. and Boerner et al. are also available for the preparation of human monoclonal antibodies (Cole et al., Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, p. 77 (1985); Boerner et al., J. Immunol., 147(1):86-95 (1991)).

0103. Antibodies generated in non-human species can be “humanized” for administration in humans in order to reduce their antigenicity. Humanized forms of non-human (e.g., murine) antibodies are chimeric immunoglobulins, immunoglobulin chains or fragments thereof (such as Fv, Fab, Fab', F(ab')2, or other antigen-binding subsequences of antibodies) which contain minimal sequence derived from non-human immunoglobulin. Residues from a complementary determining region (CDR) of a human recipient antibody are replaced by residues from a CDR of a non-human species (donor antibody) such as mouse, rat or rabbit having the desired specificity. Optionally, Fv framework residues of the human immunoglobulin are replaced by corresponding non-human residues. See Jones et al., Nature, 321:522-525 (1986); Riechmann et al., Nature, 332:323-327 (1988); and Presta, Curr. Op. Struct. Biol. 2:593-596 (1992). Methods for humanizing non-human antibodies are well known in the art. See Jones et al., Nature, 321:522-525 (1986); Riechmann et al., Nature, 332:323-327 (1988); Verhoeyen et al., Science, 239:1534-1536 (1988); and U.S. Pat. No. 4,816,567.

Laboratory Applications

0104. The present invention further includes an exemplary microchip for use in clinical settings for detecting gene expression levels of one or more genes described herein as being associated with outcome, risk classification, cytogenetics or subtype in high risk B-ALL, including high risk pediatric B-ALL. In a preferred embodiment, the microchip contains DNA probes specific for the target gene(s). Also provided by the invention is a kit that includes means for measuring expression levels for the polypeptide product(s) of one or more such genes, including any of the genes listed in Tables 1P and 1Q. In certain preferred embodiments, the microchip contains DNA probes for all 31 genes or 26 genes which are set forth in Tables 1P and 1Q. Various probes can be provided onto the microchip representing any number and any variation of gene products as otherwise described in Table 1P or 1Q. In a preferred embodiment, the kit is an immunoreagent kit and contains one or more antibodies specific for the polypeptide(s) of interest.

0105. Relevant portions of the below cited references are referenced and incorporated herein. In addition, previously published WO 2004/053074 (Jun. 24, 2004) is incorporated by reference in its entirety herein.

0106. In the present invention, sophisticated computational tools and statistical methods were used to reduce the comprehensive molecular profiles to a more limited set of 8 genes from Table 1P or 11 genes (preferably 9 genes) from Table 1Q (a gene expression “classifier”) that is highly predictive of overall outcome in high risk B-ALL, including high risk pediatric B-ALL.

0107. As described in the following examples, the inventors examined pre-treatment specimens from 207 patients with high risk B-precursor acute lymphoblastic leukemia (ALL) who were uniformly treated on Children's Oncology Group Trial COG P9906. Gene expression profiles were correlated with clinical features, treatment responses, and relapse free survivals (RFS). The use of four different unsupervised clustering methods showed significant overlap in the classification of these patients. Two clusters contained all children with either t(1;19)(q23;p13) translocations or MLL rearrangements. The other six clusters were novel and not associated with recurrent chromosomal abnormalities or distinctive clinical features. One of these clusters (R6; n=21) had significantly better 4-year RFS of 95% as compared to the 4-year RFS of 61% for the entire cohort (P=0.002). A cluster of children (R8; n=24) with dismal outcomes was found with a 4-year RFS of only 21% (P<0.001). A significant proportion of these children (63%; 15/24) were of Hispanic/Latino ethnicity. Specific gene alterations in this unique subset of ALL provide the basis for up-front identification of these extremely high risk individuals and allow for the possibility of targeted therapy.

Examples

0108. Through the optimization and progressive intensification of standard chemotherapeutic regimens, remarkable advances have been achieved in the treatment of pediatric acute lymphoblastic leukemia (ALL).1-3 (References-First Set) In parallel, laboratory investigations have provided remarkable insights into the biologic and genetic heterogeneity of this disease with the characterization of several recurring genetic abnormalities (hyperdiploidy, hypodiploidy, t(12;21)(ETV6-RUNX1), t(1;19)(TCF3-PBX1), t(9;22)(BCR-ABL1), and translocations involving 11q23 (MLL)) that are associated with distinct therapeutic outcomes and clinical phenotypes.2 Detailed risk classification schemes, incorporating pre-treatment clinical characteristics (such as age, sex, and presenting white blood cell (WBC) count), the presence or absence of recurring cytogenetic abnormalities, and measures of minimal residual disease (MRD) at the end of induction therapy, are now used to tailor the intensity of therapy to a child's relative relapse risk (categorized as “low,” “standard/intermediate,” “high,” or “very high”).4-6 Yet, despite refinements in risk classification and improvements in overall survival, the second most common cause of cancer

related mortality in children in the United States remains relapsed ALL.7 While relapses are more frequent in children with “very high risk” disease, associated with BCR-ABL1 or hypodiploidy, relapses occur within all currently defined risk groups.1,7 Indeed, the majority of relapses occur in children initially assigned to the “standard/intermediate” or “high” risk categories.7 Thus, a primary challenge in pediatric ALL is to prospectively identify those children with higher risk disease who do not benefit from therapeutic intensification and who require the development of new therapies for cure.

0109. In the present application, we determined if gene expression profiling could be used to improve risk classification and outcome prediction in “high-risk” pediatric ALL, a risk category largely defined by pretreatment clinical characteristics (age >10 years and presenting WBC >50,000/μL) and the absence of genetic abnormalities associated with “low” (hyperdiploidy, t(12;21)(ETV6-RUNX1)) or “very high” (hypodiploidy, t(9;22)(BCR-ABL1)) risk disease.4 Over 25% of children diagnosed with ALL are initially classified as “high-risk.” Outcomes in this form of ALL remain poor with high rates of relapse and relapse-free survivals of only 45-60%.7 Furthermore, the underlying genetic features associated with this form of ALL have not been well characterized. Thus, gene expression profiling and other comprehensive genomic technologies, such as assessment of genome copy number abnormalities or DNA sequencing, have the potential to resolve the underlying genetic heterogeneity of this form of ALL and to capture genetic differences that impact treatment response which can be exploited for improved risk classification and the identification of novel therapeutic targets.8-15

Gene Expression Classifiers for Relapse Free Survival and Minimal Residual Disease

0110. From the gene expression profiles obtained in the pre-treatment leukemic cells of 207 uniformly treated children with high-risk ALL, we used supervised learning algorithms and extensive cross-validation techniques to build a 42 probe-set (38 gene) expression classifier predictive of relapse-free survival (RFS). In multivariate analysis, the best predictive model for RFS was this gene expression classifier combined with either flow cytometric measures of minimal residual disease (MRD) determined at the end of induction therapy (day 29), or a 23 probe-set (21 gene) molecular classifier derived from pre-treatment samples that could predict levels of end-induction flow MRD at initial diagnosis. The application of these classifiers separated children with “high-risk” ALL into three distinct risk groups with significantly different survivals in the initial patient cohort used for modeling and in a second independent cohort of high-risk ALL patients used for validation. The gene expression classifier for RFS alone and combined with flow MRD also retained independent prognostic significance in the presence of other genetic abnormalities (IKAROS/IKZF1 deletions,16 JAK mutations,17 and gene expression signatures reflective of activated tyrosine kinases16,18) that we and others have recently discovered and determined to be associated with a poor outcome in pediatric ALL. Thus, gene expression classifiers significantly enhance outcome prediction and risk classification in high-risk ALL and, in particular, identify a group of children most likely to fail current therapeutic approaches and for whom novel therapies must be developed for cure.

Materials and Methods

Patient Selection

0111. Patient samples and clinical and outcome data for this study were obtained from The Children's Oncology Group (COG) Clinical Trial P9906. COG P9906 enrolled 272 eligible “high-risk” B-precursor ALL patients between Mar. 15, 2000 and Apr. 25, 2003; all patients were uniformly treated with a modified augmented BFM regimen.6,19 This trial targeted a subset of newly diagnosed “high-risk” ALL patients that had experienced a poor outcome (44% RFS at 4 years) in prior studies.5,20 Patients with central nervous system disease (CNS3) or testicular leukemia were eligible for the trial regardless of age or WBC count at diagnosis. Patients with “very high risk” features (BCR-ABL1 or hypodiploidy) were excluded while those with “low-risk” features (trisomies of chromosomes 4 or 10; t(12;21)(ETV6-RUNX1)) were excluded unless they had CNS3 or testicular leukemia. The majority of patients had minimal residual disease (MRD) assessed by flow cytometry as previously described; cases were defined as MRD-positive or MRD-negative at the end of induction therapy (day 29) using a threshold of 0.01%.6 For this study, previously cryopreserved residual pre-treatment leukemia specimens were available on a representative cohort of 207 of the 272 (76%) registered patients. With the exception of differences in presenting WBC count, these 207 patients were highly similar in all other clinical and outcome parameters to all 272 patients accrued to this trial (see Supplement Table S1). For validation of the performance of the classifiers, an independent set of 84 children with “high-risk” ALL, previously treated on COG Trial 1961, was used as a validation cohort.14 (Supplement, Section 2 provides the detailed patient characteristics of the validation cohort). Treatment protocols were approved by the National Cancer Institute (NCI) and participating institutions through their Institutional Review Boards. Informed consent for clinical trial registration, sample submission, and participation in these research studies was obtained from all patients or their guardians.

Microarray Analyses

0112. RNA was purified from 207 pre-treatment diagnostic samples with >80% blasts (131 bone marrow, 76 peripheral blood) and hybridized to HG U133A Plus2.0 oligonucleotide microarrays (Affymetrix, Santa Clara, Calif., USA) after RNA quantification, cDNA preparation, and labeling (Supplement, Section 3, below). Signals were scanned (Affymetrix GeneChip Scanner) and analyzed with Affymetrix Microarray Suite (MAS 5.0). The expression signal matrix used for outcome analyses corresponded to a filtered list of 23,775 probe sets (Supplement, Section 4). This gene expression dataset may be accessed via the National Cancer Institute caArray site (see website array.nci.nih.gov/caarray/) or at Gene Expression Omnibus (ncbi.nlm.nih.gov/geo/).

Statistical Analyses

0113. Relapse-free survival (RFS) was calculated from the date of trial enrollment to either the date of first event (relapse) or last follow-up. Patients in clinical remission, or with a second malignancy, or with a toxic death as a first event were censored at the date of last contact. As described in detail in the Supplement (Sections 4C, 5-9), a Cox score was used to

rank genes based on their association with RFS and a Cox proportional hazards model-based supervised principal components analysis (SPCA)21 was used to build the gene expression classifier for RFS from the rank-ordered gene list. Similarly, for the development of the gene expression classifier predictive of end-induction minimal residual disease (MRD), a modified t-test was used to rank genes expressed in pre-treatment cells according to their association with day 29 flow MRD, defined as “positive” or “negative” at a threshold of 0.01%.6 Diagonal linear discriminant analysis (DLDA)22,23 was then used to build a prediction model and the classifier for MRD from the top-ranked genes. The likelihood-ratio-test (LRT) score and the prediction error rate were used in the model construction and evaluation. To avoid over-fitting, extensive cross-validation was used to determine the numbers of top-ranked genes to be included.23 Nested cross-validations provided predictions for individual cases as well as overall measures of the selected model's performance.22-23 For the first multivariate analysis testing the predictive power of the gene expression classifier for RFS relative to flow cytometric measures of MRD and to other clinical and genetic variables, a multivariate Cox proportional hazards regression analysis was performed with the risk score (determined by gene expression classifier for RFS), WBC (on a log scale) and flow cytometric measures of MRD as explanatory variables. The Likelihood Ratio Test (LRT) was performed to determine whether the risk score defined by the gene expression classifier for RFS was a significant predictor of time to relapse, adjusting for WBC and MRD. To determine if the gene expression classifier for RFS and the combined classifier (with flow cytometric measures of MRD) retained prognostic importance in the presence of new ALL-associated genetic abnormalities associated with a poor outcome that we and others have recently described, we accessed our recently published data reporting IKZF1/IKAROS deletions16 and JAK mutations17 in ALL, as these studies were performed using DNA samples from the same cohort of patients with high-risk ALL (COG P9906) reported herein. The primary DNA copy number variation data reporting IKZF1 deletions16 may be accessed at the website: target.cancer.gov/data. The JAK mutation data17 may be accessed at pnas.org/content/suppl/2009/05/22/0811761106.DCSupplemental/0811761106SI.pdf (website). A multivariate Cox proportional hazards regression analysis was performed with each expression classifier and included IKZF1/IKAROS deletions, JAK mutations, and kinase gene expression signatures as additional explanatory variables. A likelihood ratio test was then performed to determine if the classifiers retained independent prognostic significance adjusting for the effects of all covariates. All statistical analyses utilized Stata Version 9 and R.

Results

Patients and Clinical Risk Factors

0114. The median age of the 207 high-risk B-precursor ALL patients registered to COG Trial P9906 was 13 years (range: 1-20 years) (Table 1). While 23 of the 207 ALL patients had a t(1;19)(TCF3-PBX1) and 21 had various translocations involving MLL, the remaining 163 high-risk cases had no other known recurring cytogenetic abnormalities (Table 1). Relapse-free survival in these 207 patients was 66.3% at 4 years (95% CI: 59-73%) (FIG. 1A). Day 29 minimal residual disease, measured using flow cytometric techniques (end-induction flow MRD), was detected in 35% (67/191) (Table 1).6 Among pre-treatment clinical variables (age, sex, and CNS involvement), the presence of recurrent cytogenetic abnormalities (TCF3-PBX1 and MLL), and measures of minimal residual disease, only end-induction flow MRD and increasing WBC count were significantly associated with decreased RFS and both retained significance in multivariate analysis (LRT based on Cox regression, P<0.001) (Table 1). A trend towards declining RFS was also observed among the 25% of children with Hispanic/Latino ethnicity (P=0.049) (Table 1).

TABLE 1
Association of Relapse Free Survival with Clinical and Genetic Features in the High-Risk ALL Cohort

                                   Association with Relapse Free Survival(2)
Characteristic                     Hazard Ratio     P-Value
Age
  ≥10 Yrs              132         1
  <10 Yrs              75          1.152            0.561
Age
  Median               13 yrs
  Range                1-20        995              0.817
Sex
  Male                 137         1
  Female               70          0.769            0.320
WBC
  Median               62.3K
  Range                1-959       003              <0.001
MRD at Day 29
  Negative             124
  Positive             67          2.805            <0.001
Race
  Hispanic or Latino   51          .644             0.049
  Others               156
MLL
  Positive             21          061              0.881
  Negative             186
TCF3-PBX1
  Positive             23          704              0.409
  Negative             184
CNS
  No blasts            160
  <5 blasts            26          078              0.826
  ≥5 blasts            21          0.670            0.392

(1) Only 191 of the 207 patients in the high-risk ALL cohort had flow MRD results at end-induction.
(2) Hazard ratio and corresponding p value are based on Cox regression.

A Gene Expression Classifier Predictive of Survival

0115. Gene expression profiles were obtained from pre-treatment leukemic samples in each of the 207 high-risk ALL patients. To develop a gene expression-based classifier predictive of relapse free survival (RFS), each of the 23,775 informative probe-sets on the gene expression microarrays was ranked based on strength of association with RFS (Cox score).21 As detailed in the Supplement (Sections 4C, 5, 8), a Cox proportional hazards model-based supervised principal component analysis (SPCA) was used to build the expression

classifier for RFS which was optimized by performing 20 iterations of 5-fold cross-validation.21 The final model incorporated the top 42 Affymetrix microarray probe sets corresponding to 38 unique genes (see Supplement Table S4 for the gene list; false discovery rate=8.45%, SAM).24 The predicted gene expression classifier-based “risk score” for relapse for a given patient was computed via nested leave-one-out cross-validation (LOOCV) over the full model building procedure (Supplement, Section 5 and 8). With a threshold of zero, the gene expression classifier-derived risk scores significantly separated the 207 high-risk ALL patients into low (4 yr RFS: 81%, 95% CI: 72-87%; n=109) versus high (4 yr RFS: 50%, 95% CI: 39-60%; n=98) risk groups (FIGS. 1B and C). Increased expression of BMPR1B, CTGF (CCN2), TTYH2, IGJ, NT5E (CD73), CDC42EP3, TSPAN7, and decreased expression of NR4A3 (NOR-1), RGS1-2, and BTG3 were observed in the “high” gene expression risk group with the poorest outcome (FIG. 1C). In a multivariate Cox-regression analysis, the likelihood ratio test (LRT) revealed that the gene expression classifier for RFS provided significant independent information for outcome prediction, even after adjusting for flow MRD and WBC count (P=0.001).

Improving Risk Classification and Outcome Prediction by Combining the Gene Expression Classifier and Flow Cytometric Measures of MRD

0116. Flow cytometric measures of minimal residual disease (flow MRD), measured at the end of induction therapy (day 29), were also capable of distinguishing two groups of patients with significantly different outcomes within the high-risk ALL cohort (FIG. 2A).6 However, the independent

(FIGS. 2C-E; Table 2). No significant survival differences (P=0.57) were observed among those with discordant predictors, either those patients with low gene expression classifier risk scores and positive end-induction flow MRD (28/191, 15% of cohort) or those with high gene expression classifier risk scores and negative end-induction flow MRD (52/191, 27% of cohort). These two groups were thus combined into an intermediate risk group (FIG. 2E). FIG. 2E provides the Kaplan-Meier survival estimates for the three risk groups defined by the combined classifier and highlights the significant differences in RFS. These three risk groups varied significantly in age and in the presence of the known recurring cytogenetic abnormalities (Table 2). While the 17 patients with MLL translocations were distributed within the low and intermediate risk groups, all 20 cases with t(1;19)(TCF3-PBX1) were in the lowest risk group, as discussed above (Table 2; FIG. 2E). Interestingly, of the 8 relapses that occurred in the lowest risk group, all 8 were ALL cases with t(1;19)(TCF3-PBX1). Children in each of the three risk groups had similar proportions of relapse within the bone marrow or isolated to the CNS (Table 2).

TABLE 2
Clinical and Genetic Features of The Three Risk Groups Determined by the Combined Application of the Gene Expression Classifier for RFS and Flow Cytometric Measures of Minimal Residual Disease(1)

                     Combined Risk Group                               P-value
Characteristics      Low      Intermediate   High     Total Cohort     (Fisher Exact)
RFS at 4 Years       87%      62%            29%      61%

flow MRD would require waiting until the end of induction TABLE 2-continued therapy, precluding earlier intervention in patients who were destined to ultimately fail therapy. To develop a gene expres Clinical and Genetic Features of The Three Risk Groups Determined by the Combined Application of sion classifier predictive of end-induction MRD in diagnostic the Gene Expression Classifier for RFS and Flow Cytometric pre-treatment specimens, 23.775 informative probe sets from Measures of Minimal Residual Disease' 191 patients (of the 207 patients who had day 29 MRD results available) were ranked on their association with MRD Combined Risk Group P-value (Supplement, Sections 6 and 9). Using a threshold of 1% for Inter- Total (Fisher the false discovery rate, SAM identified 352 probe sets sig Characteristics Low mediate High Cohort Exact) nificantly associated with positive end-induction flow MRD (Supplement, Table S6). A DLDA mode 122.23 predicting CNS MRD was built and optimized by performing 100 iterations of No blasts 57 57 32 146 O457 10-fold cross-validation. The final model incorporated the top <5 blasts 7 14 4 25 23 probe sets (21 unique genes) (Supplement, Table S5), 25 blasts 8 10 2 2O which separated the patients into two groups with signifi Relapse site cantly different outcomes (log rank test, P=0.014). FIG. 4A Isolated 3 15 5 23 O.09S shows the receiver operating characteristic (ROC) curve for CNS2 the nested LOOCV predictions of the classifier. The 23 probe Marrow 5 13 17 35 sets in the gene expression classifier predictive of end-induc Only 191 of the 207 patients in the high risk ALL cohort had flow MRD results at tion MRD (FIG. 4B) include the genes BAALC, P2RY5, end-induction; hence this table reports on 191 total patients. Flow MRD results were avail able on only 17/21 MLL and 20/23 to 1; 19)(TCF3-PBX1) patients. 
*No association was seen between patients with isolated CNS relapse and those with CNS blasts at diagnosis (χ² test, P = 0.93).

[0117] To assure that the gene expression classifier could improve outcome prediction in high-risk ALL patients lacking known recurring cytogenetic abnormalities, we built a second gene expression classifier for RFS using a subset of 163 of the original 207 COG 9906 high-risk ALL patients, excluding those cases with MLL (n=21) or E2A-PBX1 translocations (n=23), again using a Cox proportional hazards model-based supervised principal component analysis with extensive cross-validation (see Supplement, Section 10). The resulting classifier for RFS contained 32 probe sets (29 unique genes; list provided in Supplement, Table S8) and had a high degree of overlap (84%) with the genes in the initial classifier (Supplement, Table S4).

[0118] With a threshold of zero, the risk scores derived from this second classifier also significantly separated the 163 ALL cases into low (4 yr RFS: 76%, 95% CI: 64-84%; n=88) versus high (4 yr RFS: 52%, 95% CI: 40-64%; n=75) risk groups (P=0.0001) (FIG. 3A). Flow cytometric measures of end-induction MRD were also capable of distinguishing two risk groups within these 163 high-risk ALL cases (FIG. 3B), and application of the gene expression classifier further divided both the flow MRD-negative (FIG. 3C) and flow MRD-positive (FIG. 3D) patients into distinct risk groups with significantly different outcomes. Combining this second classifier for RFS with end-induction flow MRD yielded four distinct risk groups with significantly different outcomes (P<0.0001; FIG. 3E). As no significant survival differences were observed among the two groups with discordant predictors, these groups were combined into an intermediate risk group (FIG. 3F). As shown in FIG. 3F, the Kaplan-Meier survival estimates for the three risk groups defined by this second combined classifier demonstrated highly significant differences in RFS: low (83% 4 yr RFS, 95% CI: 70-90%), intermediate (60% 4 yr RFS, 95% CI: 44-72%) and high (35% 4 yr RFS, 95% CI: 19-44%) (P<0.0001). These results demonstrate that gene expression classifiers significantly refine risk classification in high-risk ALL cases lacking known cytogenetic abnormalities.

A Gene Expression Classifier Predictive of End-Induction Flow MRD

[0119] The clinical application of a combined classifier utilizing the gene expression classifier for RFS and day 29

TNFSF4, E2F8, IRF4, CDC42EP3, KLF4, and two probesets each for EPB41L2 and PARP15. When the gene expression classifier predictive of MRD was substituted for the day 29 flow MRD data and then combined with the expression classifier for RFS, three distinct risk groups were resolved that had significantly different RFS at 4 years (low: 82%; intermediate: 63%; and high risk: 45%) (FIG. 4C). While still highly statistically significant (P<0.0001), the combined classifier using the gene expression classifier for RFS and the gene expression classifier predicting end-induction MRD (FIG. 4C) was slightly less discriminatory than the one combining the gene expression classifier for RFS and flow MRD (FIG. 2E).

Validation of the Classifiers in an Independent Data Set

[0120] The inventors next determined whether the gene expression classifiers were predictive of outcome in a second independent cohort of 84 children with high-risk ALL treated on a different clinical trial (COG/CCG 1961).14,19 In contrast to the initial COG 9906 high-risk ALL cohort, a WBC count >50,000/μl (LRT, P=0.014) and male sex (LRT, P=0.018) were associated with a worse RFS (Supplement, Section 2).14,19 Flow MRD was not evaluated in the CCG 1961 trial. The initial 38-gene expression classifier for RFS (Supplement Table S4) that we developed from COG P9906 predicted a risk score among these 84 patients that was significantly associated with RFS (Cox proportional hazard regression, P=0.006), even after adjusting for sex and WBC count (multivariate Cox regression, P=0.01). The gene expression classifier risk scores split the 84 children from CCG 1961 into high (n=28) and low (n=56) risk groups (FIG. 5A). Unlike our initial cohort, a significantly greater number of children with WBC counts >50,000/μl were in the high (82%, 23/28) compared to the lower risk groups defined by the expression classifier (55%, 31/56) (Fisher exact test, P=0.017). Similar to the COG 9906 cohort, all children with t(1;19)(TCF3-PBX1) were in the lowest risk group, although this cytogenetic abnormality by itself did not predict RFS. We next tested the effect of the combined gene expression classifiers for RFS and MRD and were able to resolve three distinct risk groups with significantly different outcomes (FIG. 5B), demonstrating that these classifiers were capable of resolving distinct risk groups in an independent cohort of children with high-risk ALL.
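The grouping logic described above (a risk-score threshold of zero, with the two discordant groups merged into an intermediate group) can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the example patients are hypothetical. Assigning a score of exactly zero to the low-risk group is an assumption, since the text does not address that case. The boolean MRD flag stands in for the end-induction flow MRD call.

```python
def expression_risk_group(risk_score: float) -> str:
    """Classify by the sign of the predicted risk score.

    Positive score -> "high", negative score -> "low" (as stated in the text).
    Assumption: a score of exactly zero is treated as "low".
    """
    return "high" if risk_score > 0.0 else "low"


def combined_risk_group(risk_score: float, mrd_positive: bool) -> str:
    """Combine the expression risk group with end-induction flow MRD.

    Concordant predictors give "low" or "high"; the two discordant
    combinations are merged into a single "intermediate" group.
    """
    expression_high = expression_risk_group(risk_score) == "high"
    if expression_high and mrd_positive:
        return "high"
    if not expression_high and not mrd_positive:
        return "low"
    return "intermediate"


if __name__ == "__main__":
    # Hypothetical patients: (risk score, MRD-positive at day 29)
    for score, mrd in [(-0.8, False), (-0.8, True), (1.2, False), (1.2, True)]:
        print(score, mrd, combined_risk_group(score, mrd))
```

The merge of the discordant groups mirrors the statement that no significant survival difference was observed between them. A different cohort might justify keeping all four groups separate.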

Gene Expression Classifiers Retain Independent Prognostic Significance in the Presence of New Genetic Factors Associated with a Poor Outcome in Pediatric ALL

[0121] The inventors and others have recently identified new genetic features in pediatric ALL that are associated with a poor outcome, including IKAROS/IKZF1 deletions,16 JAK mutations,17 and gene expression signatures reflective of activated tyrosine kinase signaling pathways (termed "kinase signatures").16,18 Two of these studies16,18 first reported the discovery of ALL cases that lacked a classic BCR-ABL1 translocation but which had gene expression profiles reflective of tyrosine kinase activation. Our more recent work17 has determined that the majority of these cases have activating mutations of the JAK family of tyrosine kinases. We thus wished to determine whether the gene expression classifier for RFS, or the combined classifier, retained independent prognostic significance in the presence of these genetic abnormalities. As detailed in the METHODS section, our studies reporting IKAROS/IKZF1 deletions,16 activated kinase signatures,16 and JAK mutations17 used samples from the same COG 9906 high-risk ALL cohort; thus, we could readily perform this multivariate analysis. As shown in Table 3, below, activated kinase signatures, JAK family mutations, and IKAROS/IKZF1 deletions were each significantly associated with the highest risk group as defined by the gene expression classifier for RFS in the COG 9906 high-risk ALL cases. Not only did the gene expression classifier for RFS assign all 38 cases with a kinase signature to the highest risk group, it also assigned another 60 cases to this risk group (Table 3). Similarly, while all cases with JAK mutations were assigned to the highest risk group by the gene expression classifier for RFS, an additional 74 cases lacking these mutations were also assigned to this high risk group (Table 3, below). The gene expression classifier also refined risk classification in the presence of IKAROS/IKZF1 deletions (Table 3, below). In a multivariate Cox regression analysis, only the gene expression classifier for RFS (p=0.005) and IKAROS/IKZF1 deletions (p=0.003) retained prognostic significance (Table 4, below). A likelihood ratio test determined that the gene expression classifier for RFS retained independent prognostic significance (P=0.0143) when adjusting for all other covariates. We also examined the association between risk groups as defined by the combined gene expression classifier for RFS and end-induction flow MRD (the "combined classifier") with kinase signatures, JAK family mutations, and IKAROS/IKZF1 deletions (Table 5, FIG. 6). Again, significant associations between each of these variables and the three risk groups (low, intermediate, and high) defined by the combined classifier were seen (Table 5, below). As shown in FIG. 6, the application of the combined classifier refined risk classification and distinguished different patient groups with statistically significant different RFS in the presence or absence of a kinase signature (FIGS. 6A and B), in the presence or absence of JAK mutations (FIGS. 6C and D), and in the presence or absence of IKAROS/IKZF1 deletions (FIGS. 6E and F). In a multivariate Cox regression analysis (Table 6, below), only the combined classifier retained independent prognostic significance for outcome prediction. The likelihood ratio test revealed that the combined classifier retained independent prognostic significance after adjusting for the effects of all other genetic abnormalities (P=0.0001).

TABLE 3
Association of Kinase Gene Expression Signatures, JAK Mutations, and IKAROS/IKZF1 Deletions with the Low vs. High Risk Groups Defined by the Gene Expression Classifier for RFS

                                 Risk Group Determined by Gene
                                 Expression Classifier for RFS¹        p-value
Genetic Feature                  Low Risk      High Risk    Total        (Fisher Exact)
Kinase Signature        Yes      0             38 (39%)     38 (18%)     <.001
                        No       109           60 (61%)     169 (82%)
                        Total    109           98 (100%)    207 (100%)
JAK1/JAK2 Mutation      Yes      0             19 (20%)     19 (10%)     <.001
                        No       105           74 (80%)     179 (90%)
                        Total    105           93 (100%)    198 (100%)
IKAROS/IKZF1 Deletion   Yes      14 (13%)      41 (44%)     55 (28%)     <.001
                        No       91 (87%)      52 (56%)     143 (72%)
                        Total    105 (100%)    93 (100%)    198 (100%)

¹The gene expression classifier for RFS used in this analysis is the initial classifier developed with 42 probe sets (38 unique genes) provided in Supplement Table S4.

TABLE 4
Multivariate Cox-Regression Analysis of the Prognostic Significance of the Risk Group Determined by the Gene Expression Classifier for RFS¹ in the Presence of Genetic Factors in ALL Associated with a Poor Outcome

                                                  Hazard Ratio²
Covariates                                        Estimate    95% Confidence Interval    P-Value
Gene Expression Classifier for RFS Risk Group
  High Risk vs. Low Risk                          2.380       2.36-4.338                 0.005
IKAROS/IKZF1 Deletions
  Positive vs. Negative                           2.237       1.316-3.803                0.003
JAK Mutations
  Positive vs. Negative                           1.020       0.500-2.081                0.957
Kinase Gene Expression Signature
  Positive vs. Negative                           1.094       0.590-2.030                0.774

¹The gene expression classifier for RFS used in this analysis is the initial classifier developed with 42 probe sets (38 unique genes) provided in Supplement Table S4.
²Hazard ratios and corresponding p values are based on Cox regression.
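The Fisher exact p-values reported for the association tables can be checked with a short, dependency-free Python sketch. The function below is an illustrative two-sided implementation built on the hypergeometric distribution, not the software the inventors used. The function name and the small tolerance factor are assumptions. The example counts are the kinase signature row of Table 3 (0 and 38 cases with a signature, 109 and 60 without, in the low and high risk groups respectively).

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


if __name__ == "__main__":
    # Rows: kinase signature yes / no; columns: low risk / high risk.
    p_kinase = fisher_exact_two_sided(0, 38, 109, 60)
    print(f"kinase signature vs. risk group: p = {p_kinase:.3g}")
```

For these counts the computed p-value is far below 0.001, in line with the "<.001" entry in the table.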

TABLE 5
Association of Kinase Gene Expression Signatures, JAK Mutations, and IKAROS/IKZF1 Deletions with the Three Risk Groups Defined by the Combined Gene Expression Classifier for RFS¹ and Flow Cytometric Measures of Minimal Residual Disease

                                 Combined Risk Group                                  p-value
Genetic Feature                  Low          Intermediate   High         Total         (Fisher Exact)
Kinase Signature        Yes      0            13 (16%)       22 (58%)     35 (18%)      <0.001
                        No       72 (100%)    68 (84%)       16 (42%)     156 (82%)
                        Total    72 (100%)    81 (100%)      38 (100%)    191 (100%)
JAK1/JAK2 Mutation      Yes      0            9 (12%)        9 (24%)      18 (10%)      <0.001
                        No       69 (100%)    67 (88%)       28 (76%)     164 (90%)
                        Total    69 (100%)    76 (100%)      37 (100%)    182 (100%)
IKAROS/IKZF1 Deletion   Yes      9 (13%)      20 (26%)       25 (68%)     54 (30%)      <0.001
                        No       60 (87%)     56 (74%)       12 (32%)     128 (70%)
                        Total    69 (100%)    76 (100%)      37 (100%)    182 (100%)

¹The gene expression classifier for RFS used in this analysis is the initial classifier developed with 42 probe sets (38 unique genes) provided in Supplement Table S4.

TABLE 6
Multivariate Cox-Regression Analysis of the Prognostic Significance of the Risk Group Determined by the Combined Gene Expression Classifier for RFS¹ and Flow Cytometric Measures of MRD in the Presence of Genetic Factors in ALL Associated with a Poor Outcome

                                                  Hazard Ratio²
Covariates                                        Estimate    95% Confidence Interval    P
Risk Group Determined by Gene Expression
Classifier for RFS and Flow MRD
  Intermediate Risk vs. Low Risk                  3.366       1.569-7.222                0.002
  High Risk vs. Low Risk                          6.214       2.547-15.160               0.000
IKAROS/IKZF1 Deletions
  Positive vs. Negative                           1.684       0.923-3.072                0.089
JAK Mutations
  Positive vs. Negative                           0.987       0.469-2.076                0.973
Kinase Gene Expression Signature
  Positive vs. Negative                           0.988       0.506-1.929                0.972

¹The gene expression classifier for RFS used in this analysis is the initial classifier developed with 42 probe sets (38 unique genes) provided in Supplement Table S4.
²Hazard ratios and corresponding p values are based on Cox regression.

Discussion

[0122] While gene expression profiling studies in the acute leukemias have identified gene expression "signatures" associated with recurrent cytogenetic abnormalities8,25,26 and in vitro drug responsiveness,9-11,15 fewer studies have reported and validated gene expression classifiers predictive of survival.13,14 In this report, gene expression classifiers predictive of relapse free survival (RFS) and end-induction minimal residual disease were derived from the gene expression profiles obtained in the pre-treatment samples of 207 children with B-precursor high-risk ALL. A 42 probe-set (containing 38 unique genes) expression classifier predictive of relapse-free survival (RFS) was capable of resolving two distinct groups of patients with significantly different outcomes within the category of pediatric ALL patients traditionally defined as "high-risk." In multivariate analyses, only the gene expression-based classifier for RFS and flow cytometric measures of end-induction MRD provided independent prognostic information for outcome prediction. By combining the risk scores derived from the gene expression classifier for RFS with end-induction flow MRD, three distinct groups of patients with strikingly different treatment outcomes could be identified. Similar results were obtained when modeling only those high-risk ALL cases that lacked any known recurring cytogenetic abnormalities. Perhaps most importantly, in terms of the future potential clinical utility of gene expression-based classifiers for risk classification, we further demonstrated that both the gene expression classifier for RFS and the combination of this classifier with end-induction flow MRD retained independent prognostic significance for outcome prediction in the presence of new genetic abnormalities that we and others have recently discovered and found to be associated with a poor outcome in pediatric ALL (IKAROS/IKZF1 deletions, JAK mutations, and kinase signatures). The combined classifier further refined outcome prediction in the presence of each of these mutations or signatures, distinguishing which cases with JAK mutations, kinase signatures or IKAROS/IKZF1 deletions would have a good ("low risk"), intermediate, or poor ("high risk") outcome (Table 5, FIG. 6). Thus, while IKZF1 deletions and JAK mutations are exciting new targets for the development of novel therapeutic approaches in pediatric ALL, assessment of these genetic abnormalities alone may not be fully sufficient for risk classification or to predict overall outcome. As gene expression profiles reflect the full constellation and consequence of the multiple genetic abnormalities seen in each ALL patient and as measures of minimal residual disease are a functional biologic measure of residual or resistant leukemic cells, they may have an enhanced clinical utility for refinement of risk classification and outcome prediction.

[0123] The results reported herein, as well as those of other recent studies,16-18 reveal the striking molecular and biologic heterogeneity within children who have traditionally been classified as "high-risk" ALL. Unexpectedly, 72/207

(38%) of the "high-risk" ALL patients studied in the COG 9906 ALL cohort were found by the combined gene expression classifier for RFS and flow MRD classifier to have a significantly better survival (87% RFS at 4 years) when compared with the entire cohort (66% survival at 4 years). This group of patients, which included all 20 cases with t(1;19)(TCF3-PBX1) and an additional 52 cases whose underlying genetic abnormalities remain to be discovered, was characterized by high expression of the tumor suppressor genes and signaling proteins RGS2, NFKBIB, NR4A3, DDX21, and BTG3.27-30 Application of the combined classifier also identified 38/207 (20%) of patients in the COG 9906 cohort who had a dismal 4 year RFS of 29% (approaching 0% at 5 yrs). Highly expressed in this group of patients with the worst outcome were genes (BMPR1B, CTGF (CCN2), TTYH2, IGJ, PON2, CD73, CDC42EP3, TSPAN7, SEMA6A) involved in adaptive responses to TGFβ, stem cell function, B-cell development and differentiation, and the regulation of tumor growth.27-45 These highest risk cases lacked expression of the genes (NR4A3, BTG3, RGS1 and RGS2) whose relatively high expression characterized the ALL cases with the best outcome. Not surprisingly, given that all cases with an activated kinase signature were assigned to the highest risk group with the combined classifier, six of the genes associated with our kinase signature (BMPR1B, ECM1, PON2, SEMA6A, and TSPAN7) were contained within our gene expression classifier for RFS. The genes that characterize the risk groups defined by the combined classifier provide important clues to the multiple complex pathways and mechanisms of leukemic transformation in pediatric ALL.

[0124] The kinetics of early treatment response, best assessed by molecular or flow cytometric measures of minimal residual disease (MRD) after the first 1-3 months of therapy, are a potent predictor of outcome in leukemia. Yet, MRD data are not available at initial diagnosis, and relapses occur in some pediatric ALL patients (such as those with t(1;19)(TCF3-PBX1)) who have an excellent (negative) end-induction MRD response. Ideally, one would want to identify as early as possible those ALL patients who are most likely to fail therapy so that novel treatment interventions or alternative induction methods could be employed. Using the combined gene expression classifier for RFS and end-induction flow MRD, we identified 38 patients in the initial cohort of 207 patients who were destined to ultimately fail intensified traditional therapy for ALL. We therefore built a 23 probe-set (21 gene) gene expression classifier predictive of day 29 flow MRD in diagnostic, pre-treatment samples that could successfully replace end-induction flow MRD in our risk model. Among several interesting genes in the classifier predictive of end-induction MRD was BAALC, a novel marker of early progenitor cells that has been reported to confer a worse outcome and primary resistance in acute leukemia, including ALL and AML in adults.46-47 Given the relatively old age (mean=13 years) of the children and adolescents in our ALL cohort and the presence of genes in our gene expression classifiers for RFS and MRD that have previously been associated with a poor outcome in adult ALL (such as CTGF43-44 and BAALC46-47), we hypothesize that the gene expression classifiers that we have developed for pediatric ALL may also be useful for risk classification and outcome prediction in adults with ALL. These studies are now in progress. The results of our studies provide evidence that improved outcome prediction and risk classification can be achieved in ALL through the development of gene expression classifiers. The application of gene expression classifiers allows for the prospective identification of a significant subgroup of ALL patients with little chance for cure on contemporary chemotherapeutic regimens. Further analysis of these expression profiles, coupled with other comprehensive genomic studies, will hopefully lead to the continued identification of novel targets and more effective therapies for these children.

1 Supplement
Gene Expression Classifiers for Relapse Free Survival and Minimal Residual Disease

Patients and Clinical Risk Factors

[0125] For this study, pre-treatment cryopreserved leukemia specimens were available on a representative cohort of 207 of the 272 (76%) patients registered to COG P9906. With the exception of presenting white blood cell count (WBC), the clinical and outcome parameters of these 207 patients did not differ significantly from all 272 patients (see Table S1 and FIG. 7/S1). As shown in Table S1 and FIG. 7/S1, the differences in various characteristics between the entire group (n=272) and the present study cohort (n=207) were examined by statistical comparisons between the present study cohort and the remaining patients (n=65) not included in the present study. Each P-value in Table S1 and FIG. 7/S1 is that of the individual test, which needs to be adjusted for multiple testing. A simple Bonferroni adjustment multiplies the P-values by the total number of tests. After this adjustment, none of the characteristics are significantly different between the entire group and the cohort examined herein, except the test for WBC count when a cutoff value was considered. This trial targeted a subset (defined by age and WBC) of newly diagnosed NCI high risk ALL patients that had experienced a poor outcome (44% RFS) in prior studies. Patients with central nervous system disease (CNS3) or testicular leukemia were eligible regardless of age or white blood cell (WBC) count at diagnosis. Patients with "very high" risk features (BCR-ABL or hypodiploid) were excluded, while those with "low" risk features (trisomy 4+10; TEL-AML1) were excluded unless they had CNS3 or testicular leukemia. The majority of patients had minimal residual disease (MRD) assessed by flow cytometry as previously described; cases were defined as MRD-positive or MRD-negative at the end of induction therapy (day 29) using a threshold of 0.01%. All treatment protocols were approved by the National Cancer Institute and all participating institutions through their Institutional Review Boards. Informed consent was obtained from all patients or their parents/guardians prior to enrollment.

TABLE S1
Comparison of High Risk ALL Patients Registered to COG P9906 (n = 272) and The Subset of Patients Examined and Modeled for Gene Expression Signatures (n = 207)

                     Not Studied      Studied          Total            Unadjusted p-value
Characteristics      N      %         N      %         N      %         (Fisher's exact test)
Age - no.
  ≥10 Yrs            51     78.46     132    63.77     183    67.28     0.0335
  <10 Yrs            14     21.54     75     36.23     89     32.72
Sex - no.
  Male               52     80        137    66.18     189    69.49     0.0442
  Female             13     20        70     33.82     83     30.51

TABLE S1-continued
Comparison of High Risk ALL Patients Registered to COG P9906 (n = 272) and The Subset of Patients Examined and Modeled for Gene Expression Signatures (n = 207)

                     Not Studied      Studied          Total            Unadjusted p-value
Characteristics      N      %         N      %         N      %         (Fisher's exact test)
WBC - no.
MRD at day 29
  Negative           40     61.54     124    59.90     164    60.29     0.7550
  Positive           19     29.23     67     32.37     86     31.62
  Unknown            6      9.23      16     7.73      22     8.09
MLL
  Negative           61     93.85     186    89.86     247    90.81     0.4617
  Positive           4      6.15      21     10.15     25     9.19
E2A/PBX1
  Negative           59     90.77     184    88.89     243    89.34     0.6384
  Positive           5      7.69      23     11.11     28     10.29
  Unknown            1      1.54      0      0         1      0.37
CNS
  No blasts          54     83.08     160    77.29     214    78.68     0.1009
  <5 blasts          3      4.61      26     12.56     29     10.66
  ≥5 blasts          8      12.31     21     10.15     29     10.66
Total                65     100       207    100       272    100

All unknown data were removed before statistical tests were performed.
*After Bonferroni adjustment for multiple testing, only WBC remains significant at the significance level α = 0.05.

Validation Cohort

[0126] A subset of patients from COG 1961 "Treatment of Patients with Acute Lymphoblastic Leukemia with Unfavorable Features" was used as a validation cohort. As described in Bhojwani et al., this trial enrolled a total of 2078 patients with NCI high risk features, i.e. WBC count ≥50,000/μl or age ≥10 years old, from September 1996 to May 2002. Gene expression microarray analyses were performed on pretreatment samples from 99 children treated on this study. This subset was selected to identify gene expression profiles related to early response and long term outcome and may not be representative of the entire high-risk population. These patients and their gene expression data were studied as a validation cohort for the gene expression classifier for RFS after removal of 8 children with the t(12;21), 6 with the t(9;22) translocations, and 1 who failed induction therapy. Data on the remaining 84 patients, that best reflect our patient population, are provided in the paper. Among the 6 children with the t(9;22) translocation, the two with lowest gene expression risk scores are in clinical remission, while 2 of 4 children with high gene expression risk scores have relapsed, and a third was censored. Validation of our molecular classifier for MRD was not feasible in this cohort due to the absence of flow MRD testing in the COG 1961 protocol.

Microarray Experimental Procedures

[0127] RNA was prepared from thawed, cryopreserved samples with >80% blasts using TRIzol Reagent (Invitrogen, Carlsbad, Calif.) per the manufacturer's recommendations. Total RNA concentration was determined by spectrophotometer and quality assessed with an Agilent Bioanalyzer 2100 (Agilent Technologies). The isolated RNA was reverse transcribed into cDNA and re-transcribed into RNA. Biotinylated cRNA was fragmented and hybridized to HG U133A 2.5 μg and good quality scanned images. Experimental quality was assessed by GAPDH ≥1800, ≥20% expressed genes, GAPDH 3'/5' ratios ≤4 and linear regression r-squared values of spiked poly(A) controls >0.90.

Statistical Analysis

Microarray Data Pre-Processing

[0128] The supervised analyses were performed using the expression signal matrix corresponding to a filtered list of 23,775 probe sets, reduced from the original 54,675. The experimental CEL files were first processed in conjunction with a tailored mask using the Affymetrix GeneChip® Operating Software 1.4.0 Statistical Algorithm package to generate a 207 patient × 54,675 probe set signal data matrix and associated call matrix (Present/Absent/Marginal). The purpose of the masking was to remove those probe pairs found to be uninformative in a majority of the samples and to eliminate non-specific signals common to a particular sample type, thus improving the overall quality of the data. This was accomplished by evaluating the signals for all probes across all 207 samples and identifying those that gave mismatch (MM) signals greater than perfect match (PM) signals in more than 60% of the samples. This mask removed 94,767 probe pairs and had some impact on 38,588 probe sets (71%). As shown in Table S2, the net impact of masking was a significant increase in the number of present calls coupled with a dramatic decrease in the number of absent calls. The masked data also removed 7 probe sets entirely (none of which represented human genes). This resulted in the number of analyzable probe sets on the microarray being reduced from 54,675 to 54,668. Among the 54,668 probe sets, those with probe set ID starting with AFFX and those that did not receive present calls in at least 50% of the 207 samples were removed as described in the following section, leaving a total of 23,775 probe sets for analysis.

TABLE S2
Impact of masking on Affymetrix statistical calls (reported as percentage of total probes: 54,675, raw; 54,668, masked).

            Present    Marginal    Absent    No call
Raw         34.9       1.7         63.3      0
Masked      48.0       3.1         48.9      0 (7)
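The probe set filter described above (drop AFFX control probe sets, then keep only probe sets called Present in at least 50% of samples) can be illustrated with a small Python sketch. The data layout, function names, and example calls are hypothetical and chosen only for illustration. The actual analysis worked on the Affymetrix call matrix for 207 samples.

```python
from math import ceil


def min_present_samples(n_samples: int, min_fraction: float = 0.5) -> int:
    """Smallest number of Present calls that satisfies the filter."""
    return ceil(min_fraction * n_samples)


def filter_probe_sets(calls: dict, min_fraction: float = 0.5) -> list:
    """Return the IDs of probe sets that pass the filter.

    calls maps probe set ID -> list of per-sample detection calls,
    each one of "P" (Present), "M" (Marginal) or "A" (Absent).
    """
    kept = []
    for probe_id, sample_calls in calls.items():
        if probe_id.startswith("AFFX"):
            continue  # control probe sets are excluded outright
        n_present = sum(1 for call in sample_calls if call == "P")
        if n_present >= min_present_samples(len(sample_calls), min_fraction):
            kept.append(probe_id)
    return kept


if __name__ == "__main__":
    # Hypothetical call matrix with four samples per probe set.
    example_calls = {
        "AFFX-BioB-5_at": ["P", "P", "P", "P"],
        "202388_at": ["P", "P", "A", "M"],
        "242579_at": ["P", "A", "A", "A"],
    }
    print(filter_probe_sets(example_calls))
    print(min_present_samples(207))
```

With 207 samples, the 50% requirement works out to at least 104 Present calls, which matches the n=104 stated in the probe set filtering section.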

Probe Set Filtering

[0129] The filter required that a probe set be called Present in at least 50% of the samples (n=104) in order for it to be retained in subsequent statistical analysis. This filter was fairly stringent, and it removed over 50% of the original probe sets, but was chosen to provide a reasonable tradeoff between signal reliability and the loss of some probe sets of potential biological relevance (FIG. 8/S2).

To assess whether the more reliable but reduced list of probe sets was indeed adequate for constructing our supervised models, we did our outcome (RFS) and 29-day MRD analyses using the full set of probe sets excluding those with probe set IDs starting with "AFFX". Although there was only a very small overlap between the final sets of genes used in both models, the analyses that started from the filtered probe set list were found to be slightly superior statistically to those based on the unfiltered probe set list.

[0130] These results are consistent with similar observations made in the context of recent breast cancer studies. Two distinct expression profiling-derived gene panels for risk assessment are currently undergoing prospective evaluation by U.S. and European consortia. A meta-analysis found that notwithstanding minimal pairwise overlap between the respective sets of genes, a high concordance was observed between outcome predictions derived from the two predictors plus two others, in a large cohort of patients. In the present instance a similar biological redundancy is evidently operating with respect to the genes characterizing the newly-identified leukemic risk groups.

[0131] Based on these results, it appears that underlying patterns of gene expression corresponding to fundamental disease pathways and biological processes can manifest themselves as robust statistical associations with very different probe sets, depending on the precise analytic methodologies used to identify them. The choice of methodology depends in turn on the particular goals of a given study, for example elucidating disease etiology, predicting outcome, or performing risk stratification at diagnosis. Here we have focused on the identification of gene sets as features for classifying acute leukemia patients into distinct risk categories. While non-unique, these probe sets provide important complementary clues for developing a unified understanding of the distinctive chromosomal lesions and disrupted regulatory pathways underlying the diverse prognostic subtypes of B-precursor ALL.

Overview of Statistical Approach for Outcome Prediction

[0132] The primary indicator for outcome in this study is relapse-free survival (RFS), calculated as time from the date of trial enrollment to first event (relapse) or last follow-up. Patients in clinical remission or remission were censored at the date of last contact. RFS was estimated by the method of Kaplan and Meier and compared between groups using the logrank test. The supervised analyses for predicting outcome and MRD were performed using a cross-validation based scheme, in which an optimal gene expression model was determined through a number of iterations of cross-validations. The performance of the optimal model was evaluated through nested cross-validations of the entire model building process.

For outcome prediction, a Cox score was used to examine the statistical significance of individual probe sets on the basis of how their expression values are associated with the RFS. Prediction analysis was carried out using the Cox proportional-hazards-model-based supervised principal components analysis (SPCA) method. The number of genes used in the SPCA model was determined by maximizing the average likelihood ratio test (LRT) scores obtained in a 20×5-fold cross-validation procedure, and a final model comprising that number of highest Cox score genes was built using the entire dataset. The model predicts a continuous risk score which is designed to be positively associated with the risk of relapse. The gene expression risk classification was based on the predicted risk score. The gene expression high- (or low-) risk group was defined as having a positive (or negative) risk score. To avoid biasing the analysis results, an outer loop of leave-one-out cross-validation (LOOCV), independent from the internal loop (i.e., the 20 iterations of 5-fold cross-validation used to determine the final model), was performed to obtain cross-validated risk assignments used to assess the significance of the predictions. These cross-validated risk assignments were also used for outcome analyses and for presenting prediction statistics. The performance of the outcome predictor was evaluated by examining the association of patient outcome with predicted risk score and risk groups using a Kaplan-Meier estimator, Cox regression and the logrank test. For further technical details see Supplement, Section 8.

[0133] For prediction of MRD status at day 29, a modified t-test was used to examine the statistical significance of probe sets according to their association with positive/negative flow MRD at day 29, and a diagonal linear discriminant analysis (DLDA) model was used to make predictions. The number of genes used in the DLDA model was determined by minimizing the prediction error in a 100×10-fold cross-validation procedure, and a final model comprising that number of highest-scoring genes was computed using the entire dataset. A similar nested cross-validation procedure was performed to obtain the cross-validated predictions on MRD day 29 used to compute the misclassification error estimate. These predictions were also used for outcome analyses and for presenting prediction statistics. The performance of the MRD predictor was evaluated using the misclassification error rate and ROC accuracy. For further technical details see Supplement, Section 9.

Gene Expression Classifier for Prediction of Relapse Free Survival (RFS)

[0134] A 20×5-fold cross validation as detailed in Section 8 was performed to determine the model for predicting the risk score of relapse. Twenty candidate thresholds were considered. The number of significant probe sets determined by each threshold and geometric mean of the likelihood ratio test statistic corresponding to each threshold are listed in Table S3, below.

TABLE S3
Candidate thresholds and corresponding numbers of significant genes and geometric means of likelihood ratio test (LRT) statistic values.

                             # Significant    LRT statistic
Threshold #    Threshold     Genes            (geometric mean)
1              0.0000        23774            0.5289
2              0.1376        20262            0.7148
3              0.2752        16846            0.8135
4              0.4128        13619            0.8511

(Table S4). SAM software was also used to calculate the false TABLE S3-continued discovery rate (FDR) for each of those probe sets. I0135. The final model for predicting RFS includes 42 Candidate thresholds and corresponding numbers of significant genes probe sets (Table S4). Among the high-expressing genes in and geometric means of likelihood ratio test (LRT) statistic values. the high risk group are genes that play roles in the antioxidant # Significant LRT statistic defense system in the microvasculature (PON-2)," adaptive Threshold # Threshold Genes (geometric mean) cell signaling responses to TGF13 (CDC42EP3, CTGF)." B-cell development and differentiation (Ig), breast cancer 5 0.5505 10649 O.8174 6 O.6881 8007 O.86SO growth, invasion and migration (CD73, CTGF). 17.18 7 0.8257 5762 O.8248 colonic and/or renal cell carcinoma proliferation (TTYH2, 8 O.9633 3940 0.7768 BMPR1B).''' cell migration in acute myeloid leukemia 9 1.1009 2555 O.8843 (TSPAN7),’ and embryonic (SEMA6A) and mesenchymal 10 1.238S 1571 O.8154 11 1.3761 915 O.9366 (CD73) stem cell function. CTGF (CCN2) is also a 12 1.5137 509 1.OSS8 growth factor secreted by pre-BALL cells that is postulated to 13 1.6513 273 1.3662 play a role in disease pathophysiology. CD73 expressed on 14 1.7889 144 1.6222 regulatory T cells mediates immune suppression and plays 15 1926S 75 1.8837 16 2.0641 42 19570 a role in cellular multiresistance.’ Two genes with tumor 17 2.2017 24 17051 suppressor functions, NR4A3 and BTG3, are comparatively 18 2.3393 14 1.6378 downregulated in the high risk group, as are the signaling 19 2.4770 8 O.8933 proteins RGS1 and RGS2. RR4A3 (NOR-1) is a nuclear 2O 2.6146 4 0.5035 receptor of transcription factors involved in cellular Suscep tibility to tumorgenesis; downregulation is seen in acute myeloid leukemia. BTG3 is a regulator of apoptosis and cell The mean of the LRT statistic is also plotted in FIG.9/S3. 
We proliferation that controls cell cycle arrest following DNA see that the geometric mean of the LRT reaches the maximum damage and predicts relapse in T-ALL patients.” Decreased when the threshold is T=2.064. The “best model determined expression of RGS1 or RGS2 have a variety of consequences by this threshold is a linear combination of expression values including effects on T-cell activation and migration and of 42 probe sets that are highly associated with RFS status myeloid differentiation.31
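The threshold selection summarized in Table S3 can be sketched as follows. This is a minimal illustration, not the authors' software: the names `TABLE_S3`, `geometric_mean` and `select_threshold` are hypothetical, and the rows are simply the tabulated values. The sketch assumes that the geometric mean is taken over the LRT statistics from the cross-validation iterations and that the candidate with the largest geometric mean is kept.

```python
import math

# (threshold, number of significant probe sets, geometric mean of LRT),
# copied from Table S3.
TABLE_S3 = [
    (0.0000, 23774, 0.5289), (0.1376, 20262, 0.7148), (0.2752, 16846, 0.8135),
    (0.4128, 13619, 0.8511), (0.5505, 10649, 0.8174), (0.6881, 8007, 0.8650),
    (0.8257, 5762, 0.8248), (0.9633, 3940, 0.7768), (1.1009, 2555, 0.8843),
    (1.2385, 1571, 0.8154), (1.3761, 915, 0.9366), (1.5137, 509, 1.0558),
    (1.6513, 273, 1.3662), (1.7889, 144, 1.6222), (1.9265, 75, 1.8837),
    (2.0641, 42, 1.9570), (2.2017, 24, 1.7051), (2.3393, 14, 1.6378),
    (2.4770, 8, 0.8933), (2.6146, 4, 0.5035),
]


def geometric_mean(values):
    """Geometric mean of positive values, e.g. LRT statistics across CV runs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def select_threshold(rows):
    """Return the candidate row with the largest geometric-mean LRT statistic."""
    return max(rows, key=lambda row: row[2])


best_threshold, best_n_genes, best_lrt = select_threshold(TABLE_S3)
print(best_threshold, best_n_genes, best_lrt)
```

Applied to the tabulated values, the selection returns the candidate at threshold 2.0641, which retains 42 probe sets.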

TABLE S4
Probe sets (and associated genes) that are significantly associated with relapse free survival

Rank  High in    Cox Score  p-value   FDR     Probe Set ID   Gene Symbol  Gene Description
 1    High Risk   2.9873    0.000001  <.0001  242579_at      BMPR1B       bone morphogenetic protein receptor, type IB
 2    Low Risk   -2.9540    0.000023  <.0001  202388_at      RGS2         regulator of G-protein signaling 2, 24 kDa
 3    High Risk   2.9090    0.000012  <.0001  213371_at      LDB3         LIM domain binding 3
 4    High Risk   2.8856    0.000020  <.0001  210830_s_at    PON2         paraoxonase 2
 5    High Risk   2.6177    0.000230  <.0001  201876_at      PON2         paraoxonase 2
 6    High Risk   2.6146    0.000009  <.0001  209288_s_at    CDC42EP3     CDC42 effector protein (Rho GTPase binding) 3
 7    High Risk   2.6081    0.000570  <.0001  215028_at      SEMA6A       sema domain, transmembrane domain (TM), and cytoplasmic domain, (semaphorin) 6A
 8    High Risk   2.5685    0.000620  <.0001  223449_at      SEMA6A       sema domain, transmembrane domain (TM), and cytoplasmic domain, (semaphorin) 6A
 9    High Risk   2.5539    0.000310  <.0001  204030_s_at    SCHIP1       schwannomin interacting protein 1
10    High Risk   2.5511    0.000160  <.0001  232539_at                   MRNA; cDNA DKFZp761H1023 (from clone DKFZp761H1023)
11    High Risk   2.5450    0.001300  <.0001  212592_at      IGJ          immunoglobulin J polypeptide, linker protein for immunoglobulin alpha and mu polypeptides
12    High Risk   2.5287    0.000450  <.0001  209101_at      CTGF         connective tissue growth factor
13    High Risk   2.5223    0.000083  <.0001  219313_at      GRAMD1C      GRAM domain containing 1C
14    High Risk   2.4907    0.000110  <.0001  225355_at      LOC54492     hypothetical LOC54492
15    Low Risk   -2.4874    0.000045  <.0001  228388_at      NFKBIB       nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, beta
16    High Risk   2.4545    0.000370  <.0001  209365_s_at    ECM1         extracellular matrix protein 1
17    High Risk   2.4211    0.000083  <.0001  223741_s_at    TTYH2        tweety homolog 2 (Drosophila)
18    High Risk   2.3965    0.000062  <.0001  236750_at      NRXN3        neurexin 3
19    High Risk   2.3725    0.000160  <.0001  215617_at      LOC26010     viral DNA polymerase-transactivated protein 6
20    High Risk   2.3715    0.000039  <.0001  236766_at                   Transcribed locus
21    High Risk   2.3487    0.000280  <.0001  203939_at      NT5E         5'-nucleotidase, ecto (CD73)
22    Low Risk   -2.3253    0.001700  <.0001  216834_at      RGS1         regulator of G-protein signaling 1
23    Low Risk   -2.2848    0.002200  <.0001  209959_at      NR4A3        nuclear receptor subfamily 4, group A, member 3
24    Low Risk   -2.2784    0.000490  <.0001  213134_x_at    BTG3         BTG family, member 3
25    High Risk   2.2782    0.000850  <.0001  244280_at                   Homo sapiens, clone IMAGE:5583725, mRNA
26    High Risk   2.2729    0.000140  <.0001  215479_at                   CDNA FLJ20780 fis, clone COL04256
27    Low Risk   -2.2568    0.000053  <.0001  205831_at      CD2          CD2 molecule
28    High Risk   2.2532    0.000140  <.0001  211675_s_at    MDFIC        MyoD family inhibitor domain containing
29    Low Risk   -2.2474    0.001700  <.0001  207978_s_at    NR4A3        nuclear receptor subfamily 4, group A, member 3
30    Low Risk   -2.2401    0.000009  <.0001  224654_at      DDX21        DEAD (Asp-Glu-Ala-Asp) box polypeptide 21
31    Low Risk   -2.2316    0.000410  <.0001  238623_at                   CDNA FLJ37310 fis, clone BRAMY2016706
32    High Risk   2.2094    0.002200  <.0001  202242_at      TSPAN7       tetraspanin 7
33    Low Risk   -2.2082    0.000880  <.0001  226184_at      FMNL2        formin-like 2
34    Low Risk   -2.2010    0.000039  <.0001  212497_at      MAPK1IP1L    mitogen-activated protein kinase interacting protein 1-like
35    Low Risk   -2.1912    0.000960  8.4505  221349_at      VPREB1       pre-B lymphocyte gene 1
36    Low Risk   -2.1797    0.000005  8.4505  208152_s_at    DDX21        DEAD (Asp-Glu-Ala-Asp) box polypeptide 21
37    Low Risk   -2.1716    0.000820  8.4505  210024_s_at    UBE2E3       ubiquitin-conjugating enzyme E2E 3 (UBC4/5 homolog, yeast)
38    High Risk   2.1635    0.001500  <.0001  1559072_a_at   ELFN2        extracellular leucine-rich repeat and fibronectin type III domain containing 2
39    Low Risk   -2.1634    0.002400  8.4505  244623_at      KCNQ5        potassium voltage-gated channel, KQT-like subfamily, member 5
40    Low Risk   -2.1378    0.001500  8.4505  224507_s_at    MGC12916     hypothetical protein MGC12916
41    Low Risk   -2.1275    0.001300  8.4505  203921_at      CHST2        carbohydrate (N-acetylglucosamine-6-O) sulfotransferase 2
42    High Risk   2.1196    0.000400  1.6184  1560524_at     LOC400581    GRB2-related adaptor protein-like

Note: "High in" corresponds to "gene expression over-expressed in"; Cox Score is the modified score test statistic based on Cox regression; p-value is for the Wald test based on univariate Cox regression; FDR is the False Discovery Rate estimated using SAM.
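The Cox score reported in Table S4 can be illustrated with a short sketch. It assumes the modified score statistic has the form h_i = r_i / (s_i + s_0) given in the technical details section, with s_0 the median of all s_i; the function name `cox_scores`, the toy data, and the cutoff `tau` are hypothetical and are not taken from the study.

```python
import math
from statistics import median


def cox_scores(x, time, event):
    """Modified Cox score h_i = r_i / (s_i + s_0) for each probe set.

    x[i][j]  expression of probe set i in sample j
    time[j]  follow-up time of sample j
    event[j] 1 if relapse was observed for sample j, 0 if censored
    """
    n = len(time)
    relapse_times = sorted({time[j] for j in range(n) if event[j]})
    r_values, s_values = [], []
    for row in x:
        r_i, var_i = 0.0, 0.0
        for z in relapse_times:
            at_risk = [j for j in range(n) if time[j] >= z]            # R_k
            failed = [j for j in at_risk if event[j] and time[j] == z]
            m_k, d_k = len(at_risk), len(failed)
            mean_k = sum(row[j] for j in at_risk) / m_k                # mean over R_k
            r_i += sum(row[j] for j in failed) - d_k * mean_k
            var_i += (d_k / m_k) * sum((row[j] - mean_k) ** 2 for j in at_risk)
        r_values.append(r_i)
        s_values.append(math.sqrt(var_i))
    s0 = median(s_values)
    return [r / (s + s0) for r, s in zip(r_values, s_values)]


# Toy data (illustrative only): probe set 0 is high in early relapses,
# probe set 1 is low in early relapses.
x = [[5.0, 4.0, 3.0, 2.0, 1.0],
     [1.0, 2.0, 3.0, 4.0, 5.0]]
time = [1.0, 2.0, 3.0, 4.0, 5.0]
event = [1, 1, 1, 1, 0]

scores = cox_scores(x, time, event)
tau = 1.0  # hypothetical threshold; probe sets with |h_i| < tau are excluded
kept = [i for i, h in enumerate(scores) if abs(h) >= tau]
print(scores, kept)
```

A positive score corresponds to "High in High Risk" in the table (higher expression among samples that relapse earlier), and a negative score to "High in Low Risk".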

Gene Expression Classifier for Prediction of Day 29 Minimal Residual Disease (MRD)

[0136] An optimal DLDA model for prediction of day 29 MRD was determined through a 100x10-fold cross-validation procedure as described in Section 9. FIG. 10/S4 shows the box plots of 100 average misclassification rates of each 10-fold cross-validation corresponding to each number of significant genes used in the models. The red line is the mean of the 100 average error rates, and the lower and upper bounds of the boxes represent the 25th and 75th percentiles, respectively.

[0137] The minimal mean error rate corresponds to the model using the 23 significant probe sets listed in Table S5. With a threshold of 1% for the False Discovery Rate (FDR), the SAM software identified 352 probe sets that are significantly associated with day 29 MRD status, which are listed in Table S6. Since DLDA as implemented here and SAM use the same method to assess the significance of the probe sets, the 23 probe sets included in the MRD prediction model (Table S5) also appear at the top of the list in Table S6. The 23 probe sets include the gene CDC42EP3, which is present among the top gene classifiers for both molecular MRD and RFS. A number of other probe sets overlap between the 352 probe sets predictive of MRD and the gene expression predictors of RFS.

[0138] Genes with low expression among our high risk group include DTX-1, a regulator of Notch signaling, KLF4, a promoter of monocyte differentiation, and TNFSF4, a member of the tumor necrosis factor family. Other microarray studies of MRD have found cell-cycle progression and apoptosis-related genes to be involved in treatment resistance.37 Related genes present in our MRD classifier included P2RY5, E2F8, and IRF4, but did not include CASP8AP2, described to be particularly significant in a few recent studies. Our two probe sets for CASP8AP2 (1570001, 222201) showed relatively weak signals with no discriminating function (P>0.1). High BAALC was a strong predictor for MRD. This gene has recently been shown to be associated with worse prognosis in acute myeloid leukemia.
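The model-size selection described above (mean of the repeated cross-validation error rates for each candidate number of probe sets, with quartiles for the box plot) can be sketched as follows. The function name `summarize_cv_errors` and the error rates are hypothetical placeholders, not values from the study; only the summary logic is illustrated.

```python
from statistics import mean, quantiles


def summarize_cv_errors(errors_by_size):
    """Summarize repeated 10-fold CV error rates per candidate model size.

    errors_by_size maps a number of probe sets to a list of average
    misclassification rates, one per repetition of the 10-fold CV.
    """
    summary = {}
    for size, errors in errors_by_size.items():
        q1, _, q3 = quantiles(errors, n=4, method="inclusive")
        summary[size] = {"mean": mean(errors), "q1": q1, "q3": q3}
    # The model size with the smallest mean error rate is selected.
    best_size = min(summary, key=lambda size: summary[size]["mean"])
    return best_size, summary


# Illustrative placeholder error rates (4 repetitions instead of 100).
errors_by_size = {
    10: [0.30, 0.28, 0.32, 0.31],
    23: [0.24, 0.26, 0.25, 0.27],
    50: [0.29, 0.27, 0.28, 0.30],
}
best_size, summary = summarize_cv_errors(errors_by_size)
print(best_size, summary[best_size])
```

In the procedure described in the text, each list would hold 100 values, one per repetition of the 10-fold cross-validation.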

TABLE S5
Probe sets (and associated genes) that are included in the MRD predictor

Rank  High in  p-value     FDR (%)  Probe set ID   Gene Symbol  Gene Description
 1    Neg      0.00000005  <.0001   242747_at
 2    Neg      0.00000147  <.0001   205429_s_at    MPP6         membrane protein, palmitoylated 6 (MAGUK p55 subfamily member 6)
 3    Neg      0.00000036  <.0001   221841_s_at    KLF4         Kruppel-like factor 4 (gut)
 4    Pos      0.00000054  <.0001   209286_at      CDC42EP3     CDC42 effector protein (Rho GTPase binding) 3
 5    Neg      0.00000000  <.0001   1564310_a_at   PARP15       poly (ADP-ribose) polymerase family, member 15
 6    Neg      0.00000045  <.0001   201719_s_at    EPB41L2      erythrocyte membrane protein band 4.1-like 2
 7    Pos      0.00000219  <.0001   218899_s_at    BAALC        brain and acute leukemia, cytoplasmic
 8    Neg      0.00000101  <.0001   213358_at      KIAA0802     KIAA0802
 9    Neg      0.00000100  <.0001   1553380_at     PARP15       poly (ADP-ribose) polymerase family, member 15
10    Pos      0.00000077  <.0001   225685_at                   CDNA FLJ31353 fis, clone MESAN2000264
11    Neg      0.00000042  <.0001   227336_at      DTX1         deltex homolog 1 (Drosophila)
12    Neg      0.00000032  <.0001   201718_s_at    EPB41L2      erythrocyte membrane protein band 4.1-like 2
13    Neg      0.00000060  <.0001   201710_at      MYBL2        v-myb myeloblastosis viral oncogene homolog (avian)-like 2
14    Pos      0.00000183  <.0001   207426_s_at    TNFSF4       tumor necrosis factor (ligand) superfamily, member 4 (tax-transcriptionally activated glycoprotein 1, 34 kDa)
15    Neg      0.00000120  <.0001   219990_at      E2F8         E2F transcription factor 8
16    Pos      0.00000207  <.0001   213817_at                   CDNA FLJ13601 fis, clone PLACE1010069
17    Pos      0.00001106  <.0001   220448_at      KCNK12       potassium channel, subfamily K, member 12
18    Pos      0.00000110  <.0001   232539_at                   MRNA; cDNA DKFZp761H1023 (from clone DKFZp761H1023)
19    Neg      0.00000065  <.0001   225688_s_at    PHLDB2       pleckstrin homology-like domain, family B, member 2
20    Pos      0.00000546  <.0001   218589_at      P2RY5        purinergic receptor P2Y, G-protein coupled, 5
21    Neg      0.00000073  <.0001   204562_at      IRF4         interferon regulatory factor 4
22    Neg      0.00000016  <.0001   219032_x_at    OPN3         opsin 3
23    Pos      0.00000598  <.0001   242051_at      CD99         CD99 molecule

Note: Neg = MRD negative; Pos = MRD positive; p-value via two-sample t-test; FDR = False discovery rate as estimated by SAM.
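How probe sets such as those in Table S5 are ranked and combined into a score can be sketched as below. This is a sketch under stated assumptions rather than the implementation used in the study: it assumes a modified t-statistic h_i = (mean_pos - mean_neg) / (sigma_i + sigma_0), takes sigma_i to be the pooled within-class standard deviation, takes sigma_0 as the median of all sigma_i, and takes mu_i as the mean over all training samples. The function name `fit_dlda` and the toy data are hypothetical.

```python
import math
from statistics import mean, median


def fit_dlda(x_pos, x_neg, top_p):
    """Fit a DLDA-style score from MRD-positive and MRD-negative training data.

    x_pos[i], x_neg[i]: expression values of probe set i in the MRD-positive
    and MRD-negative training samples. Returns (score function, selected indices).
    """
    n_pos, n_neg = len(x_pos[0]), len(x_neg[0])
    sds = []
    for pos, neg in zip(x_pos, x_neg):
        m_pos, m_neg = mean(pos), mean(neg)
        ss = sum((v - m_pos) ** 2 for v in pos) + sum((v - m_neg) ** 2 for v in neg)
        sds.append(math.sqrt(ss / (n_pos + n_neg - 2)))   # assumed pooled SD
    sigma0 = median(sds)

    h, scale, mu = [], [], []
    for pos, neg, sd in zip(x_pos, x_neg, sds):
        scale.append(sd + sigma0)
        h.append((mean(pos) - mean(neg)) / (sd + sigma0))  # modified t-statistic
        mu.append(mean(pos + neg))                         # assumed overall mean

    # Keep the top_p probe sets ranked by |h_i|.
    selected = sorted(range(len(h)), key=lambda i: abs(h[i]), reverse=True)[:top_p]
    prior = math.log(n_pos / n_neg)

    def score(sample):
        """Continuous score; values above zero point toward MRD positive."""
        return sum(h[i] * (sample[i] - mu[i]) / scale[i] for i in selected) + prior

    return score, selected


# Toy data (illustrative only): probe set 0 separates the classes, probe set 1 does not.
x_pos = [[5.0, 6.0, 7.0], [1.0, 2.0, 3.0]]
x_neg = [[1.0, 2.0, 3.0], [1.5, 2.0, 2.5]]
score, selected = fit_dlda(x_pos, x_neg, top_p=1)
print(selected, score([6.5, 2.0]), score([1.5, 2.0]))
```

With equal class sizes the log-ratio term is zero, so the sign of the score is driven entirely by the selected probe sets.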

TABLE S6 Probe sets (and associated genes) that are significantly associated with distinction between negative and positive MRD at day 29. Highlighted top-23 probe sets correspond to those used in the final MRD predictor (Table S5). Rank High in p-value FDR (%) Probe set ID Gene Symbol Gene Description 1 Neg O.OOOOOOOS <.OOO 2 Neg O.OOOOO147 <.OOO 205429s at MPP6 membrane protein, palmitoylated 6 (MAGUK p55 Subfamily member 6) 3 Neg O.OOOOOO36 <.OOO 221841 sat KLF4 Kruppel-like factor 4 (gut) 4 Pos O.OOOOOOS4 <.OOO CDC42EP3 CDC42 effector protein (Rho GTPase binding) 3 5 Neg O.OOOOOOOO <.OOO PARP15 poly (ADP-ribose) polymerase family, member 15 6 Neg O.OOOOOO45 <.OOO EPB41L2 erythrocyte membrane protein band 4.1-like 2 7 Pos O.OOOOO219 <.OOO BAALC brain and acute leukemia, cytoplasmic 8 Neg O.OOOOO101 <.OOO 9 Neg O.OOOOO1 OO <.OOO PARP15 poly (ADP-ribose) polymerase family, member 15 10 PoS O.OOOOOO77 <.OOO CDNA FLJ31353 fis, clone MESAN2000264 US 2011/0230372 A1 Sep. 22, 2011 28

TABLE S6-continued Probe sets (and associated genes) that are significantly associated with distinction between negative and positive MRD at day 29. Highlighted top-23 probe sets correspond to those used in the final MRD predictor (Table S5). Rank High in p-value FDR (%) Probe Set ID Gene Symbol Gene Description 11 Neg OOOOOOO42 227336 at deltex homolog 1 (Drosophila) 12 Neg OOOOOOO32 201718 at erythrocyte membrane protein band 4.1-like 2 13 Neg OOOOOOO60 201710 at v-myb myeloblastosis viral oncogene homolog (avian)-like 2 14 POS OOOOOO183 207426 sat tumor necrosis factor (ligand) Superfamily, member 4 (tax-transcriptionally activated glycoprotein I, 34kDa) 15 Neg OOOOOO120 E2F transcription factor 8 16 POS OOOOOO2O7 CDNA FLJ13601 fis, clone PLACE1010069 17 POS OOOOO1106 KCNK12 potassium channel, Subfamily K, member 12 18 POS OOOOOO110 25 MRNA, cDNA DKFZp761H1023 (from clone DKFZp761H1023) 19 Neg OOOOOOO6S 22.5688 sat PHLDB2 pleckstrin -like domain, family B, member 2 2O POS OOOOOOS46 218589 at P2RY5 purinergic receptor P2Y, G-protein coupled, 5 21 Neg OOOOOOO73 204562 at RF4 interferon regulatory factor 4 22 Neg OOOOOOO16 OPN3 opsin 3 23 POS OOOOOO598 CD99 CD99 molecule 24 Neg OOOOOOO92 220266 s at KLF4 Kruppel-like factor 4 (gut) 25 POS OOOOO2445 201028 s at CD99 CD99 molecule 26 POS OOOOO4247 204304 S at PROM1 prominin 1 27 POS OOOOO726S 208886 at H1 histone family, member 0 28 POS OOOO12240 209101 at connective tissue growth factor 29 Neg OOOOOOOO3 236307 at Transcribed locus 30 Neg OOOOO6038 206530 at RAB30 RAB30, member RAS oncogene family 31 Neg OOOOO4247 210094 S a PARD3 par-3 partitioning defective 3 homolog (C. elegans) 32 POS OOOOOOOO3 209288 s a CDC42EP3 CDC42 effector protein (Rho GTPase binding) 3 33 Neg O.OOO15116 221526 x a PARD3 par-3 partitioning defective 3 homolog (C. 
elegans) 34 Neg OOOOO1630 210517 s a AKAP12 Akinase (PRKA) anchor protein (gravin) 12 35 POS O.OOO10226 227998 at S100A16 S100 calcium binding protein A16 36 Neg OOOOOO869 1559618 at LOC10O129447 hypothetical protein LOC100129447 37 Neg OOOOOO486 228390 at CDNA clone IMAGE:5259272 38 POS OOOOOO726 207571 x a Corf58 chromosome 1 open reading frame 38 39 POS OOOOO31S2 206674 at FLT3 ms-related tyrosine kinase 3 40 POS OOOOO6038 227923 at SHANK3 SH3 multiple ankyrin repeat domains 3 41 Neg OOOOO1223 212022 s a MKI67 antigen identified by monoclonal antibody Ki-67 42 POS OOOO14623 203372 s a SOCS2 Suppressor of cytokine signaling 2 43 POS OOOOO6938 204646 at DPYD dihydropyrimidine dehydrogenase 44 POS OOOOO1134 207610 s a EMR2 egf-like module containing, mucin-like, hormone receptor-like 2 45 POS OOOOO6858 20403.0 s a SCHIPI Schwannomin interacting protein 1 46 Neg OOOOO2761 1552924 a at PITPNM2 phosphatidylinositol transfer protein, membrane associated 2 47 POS OOOOOO765 217967 s a FAM129A family with sequence similarity 129, member A 48 Neg OOOOOO443 227173 s a BACH2 BTB and CNC homology 1, basic leucine Zipper transcription factor 2 49 POS OOOOO752O 203373 a SOCS2 Suppressor of cytokine signaling 2 50 POS O.OOO23124 222154 s a LOC26010 viral DNA polymerase-transactivated protein 6 51 POS OOOOOS697 201029 s a CD99 CD99 molecule 52 POS O.OOO12516 225524 a ANTXR2 anthrax toxin receptor 2 53 POS OOOOOO785 210785 s a Corf58 chromosome 1 open reading frame 38 54 Neg OOOOOOO2O 1556451 at MRNA, cDNA DKFZp667B1520 (from clone DKFZp667B1520) 55 POS OOOOOOO38 1557626 at CDNA FLJ398.05 fis, clone SPLEN2007.951 56 POS O.OOO11317 202242 a TSPAN7 tetraspanin 7 57 Neg OOOOOO176 228361 a E2F2 E2F transcription factor 2 58 POS OOOOO6108 222780s at BAALC brain and acute leukemia, cytoplasmic 59 POS O.OOO17824 201876 a PON2 paraOXonase 2 60 POS OOOOO1149 218847 a IGF2BP2 insulin-like growth factor 2 mRNA binding protein 2 61 POS OOOOOO598 228573 a Transcribed locus 62 Neg OOOO18824 
225288 a COL27A1 collagen, type XXVII, alpha 1 63 Neg OOOOO1336 227846 a GPR176 G protein-coupled receptor 176 64 POS OOOOO1735 213541s at ERG v-ets erythroblastosis virus E26 oncogene homolog (avian) 65 Neg OOOOO8529 225246 a STIM2 stromal interaction molecule 2 66 POS OOOOOOO82 224861 a GNAQ Guanine nucleotide binding protein (G protein), q polypeptide 67 POS OOOOO2O61 211474 s at SERPINB6 Serpin peptidase inhibitor, clade B (ovalbumin), member 6 68 Neg O.OO182593 219737s at PCDH9 protocadherin 9 US 2011/0230372 A1 Sep. 22, 2011 29

TABLE S6-continued Probe sets (and associated genes) that are significantly associated with distinction between negative and positive MRD at day 29. Highlighted top-23 probe sets correspond to those used in the final MRD predictor (Table S5). Rank High in p-value FDR (%) Probe Set ID Gene Symbol Gene Description 69 OOOOOO225 226350 at CHML choroideremia-like (Rab escort protein 2) 70 OOOOOO765 221234 s a BACH2 BTB and CNC homology 1, basic leucine Zipper transcription factor 2 71 OOOOO6108 227013 at LATS2 LATS, large tumor Suppressor, homolog 2 (Drosophila) 72 OOOOOOO33 235094 at CDNA FLJ39413 fis, clone PLACE6O15729 73 OOOOO7018 209543 s a CD34 CD34 molecule 74 OOOOO3O41 205692 s a CD38 CD38 molecule 75 OOOOO8148 210993 s a SMAD1 SMAD family member 1 76 OOOOO3115 203922 s a CYBB cytochrome b-245, beta polypeptide (chronic granulomatous disease) 77 OOOOOO240 202430 s a PLSCR1 phospholipid scramblase 1 78 OOOO10460 225293 a COL27A1 collagen, type XXVII, alpha 1 79 O.OOOS62S6 213273 a ODZ.4 odz, Odd Oziten-m homolog 4 (Drosophila) 8O O.OOO33554 216565 x a 81 OOOOOO647 240432 x a Transcribed locus 82 OOOOOO699 239946 a Transcribed locus 83 OOOOO2SO6 242565 x a C2Orfsf Chromosome 21 open reading frame 57 84 O.OOO47774 201811 x a SH3BP5 SH3-domain binding protein 5 (BTK-associated) 85 O.OOO286.36 200953 s a CCND2 cyclin D2 86 OOOOO9998 220034 a IRAK3 interleukin-1 receptor-associated kinase 3 87 OOOOOO443 209760 a KIAAO922 KIAAO922 88 OOOOOO598 222762 x a LIMD1 LIM domains containing 1 89 OOOOO4OS1 22374.1 s a TTYH2 tweety homolog 2 (Drosophila) 90 O.OOO81524 226018 a C7orf241 chromosome 7 open reading frame 41 91 O.OO119278 210473 s a GPR12S G protein-coupled receptor 125 92 O.OOO332O3 239901 a Transcribed locus 93 O.OOO63516 1559315 s at LOC144481 hypothetical protein LOC144481 94 OOOOOO234 236796 a BACH2 BTB and CNC homology 1, basic leucine Zipper transcription factor 2 95 OOOOOO213 240498 a 96 OOOOOO186 219383 a FLJ14213 protor-2 97 OOOOOO134 221249 s at FAM117A 
family with sequence similarity 117, member A 98 O.OOO20983 1565951 s at CHML choroideremia-like (Rab escort protein 2) 99 OOOOOS128 205159 a CSF2RB colony stimulating factor 2 receptor, beta, low-affinity (granulocyte-macrophage) OO OOOOOOS12 228696 a SLC45A3 solute carrier family 45, member 3 O1 O.OOO 10343 213931 a D2 FID2B inhibitor of DNA binding 2, dominant negative helix-loop-helix protein i? inhibitor of DNA binding 2B, dominant negative helix-loop-helix protein O2 O.OOO32856 2024.81 a DHRS3 dehydrogenase/reductase (SDR family) member 3 O3 O.OO113666 226796 a LOC116236 hypothetical protein LOC116236 O4 OOOOO1223 218032 a SNN Stannin 05 OOOOO752O 223380 s a LATS2 LATS, large tumor Suppressor, homolog 2 (Drosophila) O.OOO149SO 202023 a EFNA1 ephrin-A1 OOOOO1713 211275 s a GYG1 glycogenin 1 O.OOO15453 204165 a. WASF1 WAS , member 1 O.OOO16874 219938 is a PSTPIP2 proline-serine-threonine phosphatase interacting protein 2 10 O.OOO90860 212985 a. MRNA, cDNA DKFZp434E033 (from clone DKFZp434E033) 11 O.OOO17248 231124 x a LY9 lymphocyte antigen 9 12 O.OOOS1853 206001 a NPY neuropeptide Y 13 O.OOO47774 241679 a 14 O.OOO15972 24O718 a LRMP Lymphoid-restricted membrane protein 15 O.OOO2O534 214453 s a FI44 interferon-induced protein 44 16 OOOOOOO17 203907 s a QSEC1 IQ motif and Sec7 domain 1 17 OOOOO6625 1556425 a. 
at LOC2842.19 hypothetical protein LOC284219 18 O.OOO286.36 201810 s a SH3BP5 SH3-domain binding protein 5 (BTK-associated) 19 OOOOO6473 241824 at Transcribed locus 2O OOOOOO681 211675 s a MDFIC MyoD family inhibitor domain containing 21 OOOOOO858 232210 at CDNA FLJ14056 fis, clone HEMBB1000335 22 OOOO14623 204334 at KLF7 Kruppel-like factor 7 (ubiquitous) 23 OOOOO2761 227002 at FAM78A family with sequence similarity 78, member A 24 O.OOOS1326 227798 at SMAD1 SMAD family member 1 25 OOOOO3470 209723 at SERPINB9 Serpin peptidase inhibitor, clade B (ovalbumin), member 9 26 O.OOO70928 202732 at PKIG protein kinase (cAMP-dependent, catalytic) inhibitor gamma 27 POS O.OOO32171 1563335 at IRGM immunity-related GTPase family, M US 2011/0230372 A1 Sep. 22, 2011 30

TABLE S6-continued Probe sets (and associated genes) that are significantly associated with distinction between negative and positive MRD at day 29. Highlighted top-23 probe sets correspond to those used in the final MRD predictor (Table S5). Rank High in p-value FDR (%) Probe Set ID Gene Symbol Gene Description 28 POS O.OOO10226 243092, a CDNA clone IMAGE:4817413 29 POS OOOOO6779 239809 a Transcribed locus 30 Neg OOOOO1630 2028.06 a drebrin 31 Neg OOOO11445 221520s at cell division cycle associated 8 32 Neg OOOOOOS12 204947 a E2F transcription factor 1 33 POS O.OOO60391 244665 a. Transcribed locus 34 Neg O.OOO3O841 236191 a Transcribed locus 35 POS OOOO14623 218729 a atexin 36 Neg O.OOO11704 230597 a Solute carrier family 7 (cationic amino acid transporter, y+ system), member 3 37 Neg OOOOO9131 243030 a. Transcribed locus 38 POS OOOOOOO3S 209164 s at CYB561 cytochrome b-561 39 POS OOOOO3909 219871 a FLJ13197 hypothetical FLJ13197 fill hypothetical protein LOC10O132861 LOC10O132861 40 POS OOOOOOO91 239740 a ETV6 ets variant gene 6 (TEL oncogene) 41 Neg OOOOO3956 208072 s at DGKD diacylglycerol kinase, delta 130kDa 42 POS OOOOOO174 237561 x at Transcribed locus 43 Neg OOOOO618O 235699 a REM2 RAS (RAD and GEM)-like GTP binding 2 44 POS O.OOO37651 218694 a ARMCX1 armadillo repeat containing, X-linked 1 45 POS O.OOOS8585 238032 a Transcribed locus 46 Neg O.OO147143 244623 a potassium voltage-gated channel, KQT-like Subfamily, member 5 47 O.OOO93573 O.2273 221527 s at PARD3 par-3 partitioning defective 3 homolog (C. elegans) 48 O.OOO23882 O.2273 208981 a PECAM1 platelettendothelial molecule (CD31 antigen) 49 O.OOO2S197 O.2273 204249 s at LMO2 LIM domain only 2 (rhombotin-like 1) 50 O.OOO90860 O.2273 243808 a Transcribed locus 51 O.OOO43S43 O.2273 2031.39 a DAPK1 death-associated protein kinase 1 52 O.OOO2S468 O.2273 2098.13 x at TARP TCRgamma alternate reading frame protein 53 OOOOOO336 O.2273 2031.85 a. 
RASSF2 Ras association (RaIGDS/AF-6) domain family member 2 S4 O.OOO4S848 O.2273 201656 a TGA6 integrin, alpha 6 55 O.OOO36873 O.2273 208.614 S at FLNB filamin B, beta (actin binding protein 278) 56 OOOOOO368 O.2273 232685 a. CDNA: FLJ21564 fis, clone COLO6452 57 OOOOO4148 O.2273 218949 s at QRSL1 glutaminyl-tRNA synthase (glutamine-hydrolyzing)- ike 1 58 OOOOO8OSS O.2273 237591 a FLJ42957 protein 59 OOOOO1938 O.2273 231369 a Zinc finger protein 333 60 O.OOO77581 O.2273 236750 a Neurexin 3 61 O.OOO29877 O.2273 226545 a. CD109 molecule 62 O.OOO16328 O.2273 237009 a 63 O.OO141668 O.2273 229072 a. CDNA clone IMAGE:5259272 64 O.OOO38046 O.2273 1555638 a. at SAMSN1 SAM domain, SH3 domain and nuclear localization signals 1 65 OOOOO2S67 O.2273 221586 s at E2F transcription factor 5, p130-binding 66 OOOOO2SO6 O.2273 205585 a. ets variant gene 6 (TEL oncogene) 67 OOOOO7963 O.2273 221942 s at 1, Soluble, alpha 3 68 O.OOO23124 O.2273 238623 a CDNA FLJ37310 fis, clone BRAMY2016706 69 O.OOO66791 O.2273 208982 a PECAM1 platelettendothelial cell adhesion molecule (CD31 antigen) 70 OOOOO31S2 O.2273 225913 a SGK269 NKF3 kinase family member 71 OOOOO882S O.2273 220560 a C11orf21 chromosome 11 open reading frame 21 72 O.OOO13087 O.2273 238893 a LOC338.758 hypothetical protein LOC338758 73 OOOOO76O7 O.2273 205423 a AP1B1 adaptor-related protein complex 1, beta 1 subunit 74 O.OOO30516 O.2273 228461 a SH3MD4 SH3 multiple domains 4 75 O.OOO15116 O.2273 235171 a Transcribed locus 76 OOOOOO4S5 O.2273 239005 a CDNA FLJ38785 fis, clone LIVER200 1329 77 O.OO102169 O.2273 242579 a BMPR1B bone morphogenetic protein receptor, type IB 78 O.OOO13234 O.2273 227098 a DUSP18 dual specificity phosphatase 18 79 O.OOO36110 O.2273 206079 a CHML choroideremia-like (Rab escort protein 2) 8O OOOOOO708 O.2273 202252 a RAB13 RAB13, member RAS oncogene family 81 O.OO191271 O.2273 214084 x at LOC648998 similar to Neutrophil factor 1 (NCF-1) (Neutrophil NADPH oxidase factor 1) (47kDa neutrophiloxidase factor) 
(p47-phox) (NCF-47K) (47 kDa autosomal chronic granulomatous disease protein) (NOXO2) 182 Neg OOOOO1178 O.2273 22O768 s at CSNK1 G3 , gamma 3 183 POS OOOOO2SO6 O.2273 2091.63 at CYB561 cytochrome b-561 184 POS O.OO133807 O.2273 215177 s at ITGA6 integrin, alpha 6 185 POS OOOO24663 O.2273 238063 at TMEM154 transmembrane protein 154


TABLE S6-continued Probe sets (and associated genes) that are significantly associated with distinction between negative and positive MRD at day 29. Highlighted top-23 probe sets correspond to those used in the final MRD predictor (Table S5). Rank High in p-value FDR (%) Probe set ID Gene Symbol Gene Description 250 Neg OOOOO216S 0.5864. 204822 a. TTK TTK protein kinase 251 Pos O.OOO15116 0.5864. 213035 a. ANKRD28 ankyrin repeat domain 28 252 Neg O.OOO4876S O.S864 221969 a Transcribed locus 253 Neg O.OOO24929 O.S864 234140 s at STIM2 stromal interaction molecule 2 254 Neg OOOOO6625 0.5864 222680s at DTL denticleless homolog (Drosophila) 255 Neg O.OO1877S6 0.5864. 208650 s at CD24 CD24 molecule 256 POS OOOO18824 0.5864. 24.2121 a RNF12 Ring finger protein 12 2S7 Pos O.OO164760 0.5864. 204759 a RCBTB2 regulator of chromosome condensation (RCC1) and BTB (POZ) domain containing protein 2 258 Neg O.OOO2686S 0.5864. 1565693 at DTYMK Deoxythymidylate kinase (thymidylate kinase) 259 Neg OOOOO2933 0.5864. 224162 s at FBXO31 F-box protein 3 260 PoS OOOOO6702 0.5864. 235142 a. RP1-27O5.1 / Zinc finger and BTB domain containing 8 filf Zinc ZBTB8 finger and BTB domain containing 8-like 261 Pos O.OO643099 0.5864 226905 a FAM101B amily with sequence similarity 101, member B 262 Neg O.OOO31499 0.5864. 212611 a DTX4 deltex 4 homolog (Drosophila) 263 POS O.OOO66791 0.5864. 228617 a XAF1 XIAP associated factor 1 264 POS OOOOO2358 0.5864 202615 a. GNAQ Guanine nucleotide binding protein (G protein), q polypeptide 26S Pos O.OO132537 O.S864 243366 s at Transcribed locus 266 POS O.OOO41347 0.5864. 224,566 a TncRNA trophoblast-derived noncoding RNA 267 Neg OOOOO1476 0.5864 223471 a RAB3IP RAB3A interacting protein (rabin3) 268 POS O.OOO61623 O.S864 60471 at RIN3 Ras and Rab interactor 3 269 Neg O.O2S3O326 0.5864. 217968 a TSSC1 tumor Suppressing Subtransferable candidate 1 27O POS O.OOO856S1 O.S864 219806 s at C11orf75 chromosome 11 open reading frame 75 271 Pos O.OOOS9783 0.5864. 
202771 a FAM38A amily with sequence similarity 38, member A 272 Pos O.OO622O46 O.S864 1555705 a. at CMTM3 CKLF-like MARVEL transmembrane domain containing 3 273 Neg O.OOO43S43 O.S864 237104 a Transcribed locus 274 Neg O.OO171 OS1 0.5864 225O19 a CAMK2D calcium calmodulin-dependent protein kinase (CaM kinase) II delta 275 Pos O.OO167878 O.S864 203542 s a KLF9 Kruppel-like factor 9 276 Neg O.OO2O5947 0.5864 201189 s a ITPR3 inositol 1,4,5-triphosphate receptor, type 3 277 Neg O.OO382473 O.S864 231067 s a Transcribed locus 278 Pos O.OO26S825 0.5864. 228113 at RAB37 RAB37, member RAS oncogene family 279 Neg O.OOO70928 O.S864 219135s a LMF1 lipase maturation factor 1 280 PoS OOOOO9998 0.5864 37384 at PPM1F protein phosphatase 1F (PP2C domain containing) 281 POS O.OOSO3951 0.5864. 209555 s a CD36 CD36 molecule (thrombospondin receptor) 282 Neg OOOOOOO83 O.S864 225649 s a STK3S serine/threonine kinase 35 283 Pos O.OOO10819 0.5864 1555486 a. at FLJ14213 protor-2 284 Neg O.OOO1862O 0.5864. 218009 s a PRC1 protein regulator of cytokinesis 1 28S Pos O.OS823921 0.5864. 212592 at IG Immunoglobulin J polypeptide, linker protein for immunoglobulin alpha and mu polypeptides 286 POS OOOOO4247 O.S864 208109s a C15Orfs open reading frame 5 287 Neg O.OOO71640 0.5864 201792 at AEBP1 AE binding protein 1 288 POS O.OO1 O1179 0.5864. 231431 is a CDNA clone IMAGE:4798730 289 Pos O.OOOS346S 0.5864. 209287 s a CDC42EP3 CDC42 effector protein (Rho GTPase binding) 3 29O POS O.OOO 10578 O.S864 218749 s a SLC24A6 solute carrier family 24 (sodium potassium calcium exchanger), member 6 291 POS OOOOO1915 O.S864 240960 at Transcribed locus 292 PoS OOOO62248 0.5864. 227567 at AMZ2 Archaelysin family metallopeptidase 2 293 Neg O.OOO46323 0.5864. 
214875 X a APLP2 amyloid beta (A4) precursor-like protein 2 294 Neg OOOOO7963 0.5864 201397 at PHGDH phosphoglycerate dehydrogenase 295 Pos O.OOO28O34 0.5864 220558 x a TSPAN32 tetraspanin 32 296 POS O.OO155722 0.9484 229530 at CDNA clone IMAGE:5302158 297 Neg O.OOO98262 0.9484 200790 at ODC1 ornithine decarboxylase 1 298 Neg O.OO2.70658 0.9484 219396 s a NEIL1 nei endonuclease VIII-like 1 (E. coli) 299 Neg O.OO102169 0.9484 242468 at 3OO POS O.OOO80721 0.9484 229015 at LOC286.367 FP944 301 Neg O.OO396O44 0.9484 214835 is a SUCLG2 Succinate-CoA ligase, GDP-forming, beta Subunit 3O2 POS OOOOO1286 O9484 209321 s a ADCY3 adenylate cyclase 3 303 Neg O.OOO73O84 0.9484 1555372 at BCL2L11 BCL2-like 11 (apoptosis facilitator) 304 Neg OOOOO7434 0.9484 205005 s a NMT2 N-myristoyltransferase 2 305 Neg O.OOO13234 0.9484 235258 at DCP2 DCP2 decapping enzyme homolog (S. cerevisiae) 306 Pos O.OOO16508 0.9484 51146 at PIGV phosphatidylinositol glycan anchor biosynthesis, class V 307 Pos O.OO140329 O9484 220330 s a SAMSN1 SAM domain, SH3 domain and nuclear localization signals 1 3O8 POS O.OOO32171 O9484 15575O1a at Full length insert cDNA clone YB22B02 309 Pos O.OOO13087 0.9484 235922 at CDNA FLJ39413 fis, clone PLACE6O15729 31 O POS O.OOO3O841 O9484 1554250 s at TRIM73 tripartite motif-containing 73 US 2011/0230372 A1 Sep. 22, 2011 33

TABLE S6-continued Probe sets (and associated genes) that are significantly associated with distinction between negative and positive MRD at day 29. Highlighted top-23 probe sets correspond to those used in the final MRD predictor (Table S5). Rank High in p-value FDR (%) Probe set ID Gene Symbol Gene Description 311 POS O.OO1263SO 0.9484 2096.04 sat GATA3 GATA binding protein 3 312 PoS O.OOO64807 0.9484 225883 a ATG16L2 ATG16 autophagy related 16-like 2 (S. cerevisiae) 313 Pos OOOOO6548 0.9484 209627 s at OSBPL3 oxysterol binding protein-like 3 314 POS O.OO213666 0.9484 201170s at BHLHB2 basic helix-loop-helix domain containing, class B, 2 31S Pos OOOO22148 0.9484 226267 a DP2 jun dimerization protein 2 316 POS OOOOOS968 0.9484 232614 a CDNA FLJ12049 fis, clone HEMBB100 1996 317 POS O.OOO41778 0.9484 204689 a HHEX hematopoietically expressed homeobox 318 Pos O.OOO10226 0.9484 205462 s at HPCAL1 hippocalcin-like 1 319 Neg O.OOO2O534 0.9484 210279 a GPR18 G protein-coupled receptor 18 320 Neg O.OO643099 0.9484 2087.03 s at APLP2 amyloid beta (A4) precursor-like protein 2 321 POS O.OOO11574 0.9484 207986 X at CYB561 cytochrome b-561 322 Neg OOOOO1756 0.9484 218344 sat RCOR3 REST corepressor 3 323 Neg O.OOO82334 0.9484 225147 a PSCD3 pleckstrin homology, Sect and coiled-coil domains 3 324 POS O.OO102169 0.9484 202371 a TCEAL4 transcription elongation factor A (SII)-like 4 325 POS O.OO41OOS1 0.9484 2054.07 a RECK reversion-inducing-cysteine-rich protein with kazal motifs 326 POS OOOOOS631 0.9484 227502 a KIAA1147 KIAA1147 327 Pos O.OO127566 0.9484 224697 a WDR22 WD repeat domain 22 328 Pos O.OO1 OO198 0.9484 228412 a LOC643072 hypothetical LOC643072 329 Pos O.OO22.9906 0.9484 236395 a Transcribed locus 330 POS O.OOO64807 0.9484 207761 s at METTL7A methyltransferase like 7A 331 Neg O.OOO973O7 0.9484 209383 a DDIT3 DNA-damage-inducible transcript 3 332 Pos O.OO104176 0.9484 227001 a NPAL2 NIPA-like domain containing 2 333 Pos O.OOO11574 0.9484 241916 a Transcribed locus 
334 POS O.OOO60391 0.9484 201328 a ETS2 v-ets erythroblastosis virus E26 oncogene homolog 2 (avian) 335 Pos O.OOO89972 0.9484 228623 a Transcribed locus 336 Neg OOOOO1012 0.9484 226233 a B3GALNT2 beta-1,3-N-acetylgalactosaminyltransferase 2 337 Neg O.OOO42213 0.9484 204998 s at ATF5 activating transcription factor 5 338 POS O.OO21S637 0.9484 218400 a OAS3 2'-5'-oligoadenylate synthetase 3, 100kDa 339 Pos O.OOO19238 0.9484 243279 a Transcribed locus 340 POS O.OO2S1794 0.9484 2301.61 a Transcribed locus 341 Neg OOOO19449 0.9484 228049 X at Transcribed locus, strongly similar to XP 001172939.1 PREDICTED: hypothetical protein Pan troglodytes 342 Neg O.OOO23374 0.9484 226118 a CENPO centromere protein O 343 POS OOOOO3596 0.9484 20919.5 s at ADCY6 adenylate cyclase 6 344 POS OOOOOO409 0.9484 227132 a ZNF706 Zinc finger protein 706 345 Neg O.OO611754 0.9484 215772 x at SUCLG2 Succinate-CoA ligase, GDP-forming, beta Subunit 346 POS O.OOO39664 0.9484 212326 a VPS13D vacuolar protein sorting 13 homolog D (S. cerevisiae) 347 Pos O.OOO49267 0.9484 209933 s at CD3OOA CD300a molecule 348 Neg O.OOO286.36 0.9484 220719 a FLJ13769 hypothetical protein FLJ13769 349 POS OOOOO9998 0.9484 243356 a Transcribed locus 350 Neg O.OO144382 0.9484 204735 a. PDE4A phosphodiesterase 4A, cAMP-specific (phosphodiesterase E2 dunce homolog, Drosophila) 351 Neg O.OO196658 0.9484 20350S a ABCA1 ATP-binding cassette, Sub-family A (ABC1), member 1 352 Pos OOOOO3863 0.9484 1555420 a. at KLF7 Kruppel-like factor 7 (ubiquitous)

Note: Neg = MRD negative; Pos = MRD positive; p-value via two-sample t-test; FDR = False discovery rate as estimated by SAM. Probe sets (top 23) used for final model building are shaded.

Consideration of Diagnostic White Blood Cell (WBC) Count as a Predictive Variable

[0139] The WBC count at diagnosis had an independent effect on predicting RFS in our population but was deemed untenable for use in model building due to the requirement of a binary WBC cutoff value instead of a continuous variable. We believed that a cutoff value would be over-influenced by the cohort composition and patient age, particularly given that trial eligibility and enrollment may itself be based on an age-adjusted WBC count. A WBC cutoff of 50 K/μL was shown to have significance in the validation cohort but not in our cohort, yet the gene expression classifier for RFS derived in the present work proved informative despite differences in clinical parameters and therapies between the external validation group and our cohort.

Technical Details on the Construction and Evaluation of the Gene Expression Classifier for RFS

[0140] This section describes the detailed analysis techniques that were used to construct and evaluate the gene expression classifier. Throughout this section and the next, the gene expression data will be denoted by $x_{ij}$, $i = 1, 2, \ldots, p$, $j = 1, 2, \ldots, n$, where $p$ and $n$ are the numbers of genes and samples, respectively. Here a gene refers to a probe set. The prediction model was constructed in two stages: gene selection and model building.

Gene selection based on association with outcome, here RFS, is a necessary step for removing irrelevant genes and thus improving the accuracy of the final prediction model. It also reduces the dimensionality of the feature space so that a small subset of genes can be used to build a stable predictor. In this paper we based our gene selection on the Cox score calculated for each gene $i$:

$$h_i = \frac{r_i}{s_i + s_0}, \qquad i = 1, 2, \ldots, p.$$

Given a threshold $\tau > 0$, a gene will be excluded if the absolute value of its Cox score is less than $\tau$. The Cox score for gene $i$ is calculated as follows. We denote the censored RFS data for sample $j$ as $y_j = (t_j, \Delta_j)$, where $t_j$ is time and $\Delta_j = 1$ if the observation is relapse, 0 if censored. Let $D$ be the indices of the $K$ unique death times $z_1, z_2, \ldots, z_K$. Let $R_1, R_2, \ldots, R_K$ denote the sets of indices of the observations at risk at these unique relapse times, that is $R_k = \{j : t_j \ge z_k\}$. Let $m_k$ be the number of indices in $R_k$. Let $d_k$ be the number of deaths at time $z_k$, and let $x^{*}_{ik} = \sum_{t_j = z_k} x_{ij}$ and $\bar{x}_{ik} = \sum_{j \in R_k} x_{ij} / m_k$. Then

$$r_i = \sum_{k=1}^{K} \left( x^{*}_{ik} - d_k \bar{x}_{ik} \right)$$

and

$$s_i = \left[ \sum_{k=1}^{K} \frac{d_k}{m_k} \sum_{j \in R_k} \left( x_{ij} - \bar{x}_{ik} \right)^2 \right]^{1/2},$$

where $s_0$ is the median of all $s_i$.

After excluding the irrelevant genes, principal component analysis is performed on the standardized expression values of the remaining genes. Cox proportional hazard regression is then performed on the scores of the first principal component. The linear part of the fitted regression model, which is also a linear combination of the probe sets, is used as the prediction model. This model predicts a continuous score, either positive or negative, on a new sample, which is associated with the risk of relapse: the higher the score, the higher the risk. The performance of the predictions on a set of new samples can be evaluated by examining the association between the predicted score and RFS status of the samples. This was done in our analysis by performing a Cox proportional hazard regression and calculating the likelihood ratio test (LRT) statistic. A larger LRT implies better performance.

The number of genes included in the prediction model and the performance of the model both depend on the threshold $\tau$. In this study 20 candidate thresholds were considered and the one corresponding to the best model was determined through a 20x5-fold cross-validation.

Once we have obtained a prediction model we would like to assess the significance of the model compared with known clinical predictors. One approach to doing this would be to use the model to make predictions back on the samples and then compare the predicted risk scores with the clinical predictors. It is known that such an approach is biased and would overestimate the significance of the final model, because the same data were used both to develop the model and to evaluate its significance. An alternative approach that can avoid this bias is to separate the data into a training set for developing the model through the above procedure and a test set used for evaluating the performance of the model. The disadvantage of such an approach is that it does not make efficient use of the data, since the training set may be too small to develop an accurate model, and the test set may be too small to evaluate its significance. To obtain an objective and unbiased prediction on each of the samples and make best use of the data we therefore employed a nested cross-validation procedure as suggested by Simon and used by Asgharzadeh et al. This procedure, detailed in FIG. 12/S6, consists of Leave-One-Out Cross-Validation (LOOCV) with each fold including a 20x5-fold cross-validation.

Technical Details on the Construction and Evaluation of the Gene Expression Classifier for Predicting Day 29 MRD

[0141] The methodology for constructing and evaluating the gene expression predictor for MRD is essentially the same as that described in the previous section. Because the response variable is binary (either MRD positive or negative), constructing the model is significantly less computationally intensive, which allows more folds of cross-validation.

Gene selection is performed using the filter method with the modified t-test statistic calculated for each gene $i$:

$$h_i = \frac{\bar{x}_i^{(1)} - \bar{x}_i^{(2)}}{\hat{\sigma}_i + \sigma_0}.$$

Here the numerator corresponds to the difference of the sample means $\bar{x}_i^{(1)}$ and $\bar{x}_i^{(2)}$ of the two classes (MRD positive and negative), and the denominator is an estimate $\hat{\sigma}_i$ of the standard deviation plus a positive number $\sigma_0$, where $\sigma_0$ is the median of all $\hat{\sigma}_i$.

The prediction analysis is based on the diagonal linear discriminant analysis (DLDA) method. After calculating the modified t-test statistic $h_i$ for all genes, we ranked the genes in descending order by the absolute value $|h_i|$. The top $P$ genes were used to build the discriminant function:

$$f(x) = \log\frac{p_1}{p_2} + \sum_{i=1}^{P} \frac{h_i \left( x_i - \bar{\mu}_i \right)}{\hat{\sigma}_i + \sigma_0},$$

where $p_1$ and $p_2$ are the proportions of the MRD positive and negative samples, and $\bar{\mu}_i$ is the mean expression value of the $i$th gene. This model predicts a continuous score, either positive or negative, on a new sample, where a higher value is more indicative of MRD positive. The model uses zero as a binary prediction threshold and predicts MRD positive if the predicted score is positive and MRD negative otherwise. The prediction performance depends on the number $P$ of top significant genes included in the model. The value of $P$ corresponding to the best model was determined through a 100x10-fold cross-validation procedure, as illustrated schematically in FIG. 13/S7.

As with the performance evaluation for the RFS predictor, we employed a nested cross-validation procedure as suggested by Simon and used by Asgharzadeh et al. to obtain an objective and unbiased performance evaluation for the DLDA model, which also makes best use of the data. This procedure, detailed in FIG. 14/S8, consists of Leave-One-Out Cross-Validation (LOOCV), with each fold including a 100x10-fold cross-validation as illustrated in FIG. 13/S7.

[0142] Development of a Gene Expression Classifier for RFS in High-Risk ALL Excluding Cases with Known Recurring Cytogenetic Abnormalities (t(1;19) and MLL)

In this analysis we rebuilt the gene expression classifier for RFS from the beginning through the extensive nested cross-validation. Please note that we removed the probe sets using the rule of 50% present call. After removing t(1;19) translocation and MLL rearrangement cases we were left with 163 patients. A 20x5-fold cross-validation as detailed in the original manuscript was performed to determine the model for predicting the risk score of relapse. Twenty candidate thresholds were considered. The number of significant probe sets determined by each threshold and the geometric mean of the likelihood ratio test statistic corresponding to each threshold are listed in Table S7.

TABLE S7
Candidate thresholds and corresponding numbers of significant genes and geometric means of likelihood ratio test (LRT) statistic values.

Threshold #   Threshold   # Significant Genes   LRT Statistic (Geometric Mean)
1             0.00007     23773.15              0.668258
2             0.14674     20191.85              0.688759
3             0.29341     16699.37              0.779984
4             0.44007     13379.21              0.849028
5             0.58674     10351.13              0.883603
6             0.73341     7689.64               0.857314
7             0.88007     5434.52               0.842705
8             1.02674     3647.99               0.917711
9             1.17341     2313.88               0.938914
10            1.32008     1383.15               1.01001
11            1.46674     780.68                1.212886
12            1.61341     420.9                 1.474257
13            1.76008     219.08                1.932876
14            1.90674     111.1                 2.328886
15            2.05341     58.25                 2.193993
16            2.20008     31.5                  2.564132
17            2.34674     17.56                 2.443301
18            2.49341     10.13                 1.978379
19            2.64008     5.99                  1.531674
20            2.78674     3.53                  0.948933

The mean of the LRT statistic is also plotted in FIG. 15/S9. We see that the geometric mean of the LRT reaches the maximum when the threshold is [...]. The "best" model determined by this threshold is a linear combination of expression values of 32 probe sets that are highly associated with RFS status. The information about the 32 probe sets is presented in Table S8, below.
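The Cox score filter used for gene selection in the RFS classifier can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `cox_scores` and `select_genes`, the toy data, and the choice to count only uncensored samples in the sum for x*_ik are all assumptions made for the example.

```python
from math import sqrt
from statistics import median


def cox_scores(expr, times, events):
    """Return the Cox score h_i = r_i / (s_i + s_0) for every gene.

    expr   : list of p rows (one per gene), each with n expression values
    times  : n follow-up times t_j
    events : n indicators, 1 = relapse observed, 0 = censored
    """
    n = len(times)
    # Unique event (relapse) times z_1 < z_2 < ... < z_K.
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    r, s = [], []
    for row in expr:
        r_i, var_i = 0.0, 0.0
        for z in event_times:
            at_risk = [j for j in range(n) if times[j] >= z]      # R_k
            # Assumption: only uncensored samples with t_j = z_k enter x*_ik.
            failed = [j for j in at_risk if times[j] == z and events[j] == 1]
            m_k, d_k = len(at_risk), len(failed)
            x_bar = sum(row[j] for j in at_risk) / m_k
            x_star = sum(row[j] for j in failed)
            r_i += x_star - d_k * x_bar
            var_i += (d_k / m_k) * sum((row[j] - x_bar) ** 2 for j in at_risk)
        r.append(r_i)
        s.append(sqrt(var_i))
    s0 = median(s)  # s_0 is the median of all s_i
    return [r_i / (s_i + s0) for r_i, s_i in zip(r, s)]


def select_genes(scores, tau):
    """Keep the indices of genes whose absolute Cox score is at least tau."""
    return [i for i, h in enumerate(scores) if abs(h) >= tau]


# Hypothetical toy data: 3 genes, 4 samples, all relapses observed.
expr = [
    [1.0, 2.0, 3.0, 4.0],   # low in early relapsers
    [4.0, 3.0, 2.0, 1.0],   # high in early relapsers
    [2.0, 2.1, 1.9, 2.0],   # essentially flat
]
times = [1, 2, 3, 4]
events = [1, 1, 1, 1]

scores = cox_scores(expr, times, events)
kept = select_genes(scores, tau=0.5)
```

In this toy example the gene that is high in early relapsers receives a positive score and the flat gene is filtered out, which matches the convention in the text that a higher score indicates a higher risk.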

TABLE S8
Probe sets (and associated genes) that are significantly associated with RFS

Rank  Score   Probe Set ID   Gene Symbol  Gene Title
1     3.25    210830_s_at    PON2         paraoxonase 2
2     3.24    242579_at      BMPR1B       bone morphogenetic protein receptor, type IB
3     3.07    201876_at      PON2         paraoxonase 2
4     2.97    236750_at
5     2.94    212592_at      IGJ          immunoglobulin J polypeptide, linker protein for immunoglobulin alpha and mu polypeptides
6     -2.79   216834_at      RGS1         regulator of G-protein signaling 1
7     2.72    232539_at
8     2.71    209288_s_at    CDC42EP3     CDC42 effector protein (Rho GTPase binding) 3
9     -2.69   202388_at      RGS2         regulator of G-protein signaling 2, 24 kDa
10    2.68    213371_at      LDB3         LIM domain binding 3
11    2.64    215028_at      SEMA6A       sema domain, transmembrane domain (TM), and cytoplasmic domain, (semaphorin) 6A
12    2.63    215617_at      LOC26010     viral DNA polymerase-transactivated protein 6
13    2.61    209101_at      CTGF         connective tissue growth factor
14    2.59    204030_s_at    SCHIP1       schwannomin interacting protein 1
15    -2.55   209959_at      NR4A3        nuclear receptor subfamily 4, group A, member 3
16    2.53    222780_s_at    BAALC        brain and acute leukemia, cytoplasmic
17    2.53    203939_at      NT5E         5'-nucleotidase, ecto (CD73)
18    2.51    236766_at
19    2.47    202242_at      TSPAN7       tetraspanin 7
20    2.44    225355_at      LOC54492     neuralized-2
21    2.41    211675_s_at    MDFIC        MyoD family inhibitor domain containing
22    2.40    219313_at      GRAMD1C      GRAM domain containing 1C
23    -2.40   203921_at      CHST2        carbohydrate (N-acetylglucosamine-6-O) sulfotransferase 2
24    2.39    219871_at      FLJ13197     hypothetical FLJ13197
25    -2.39   207978_s_at    NR4A3        nuclear receptor subfamily 4, group A, member 3
26    -2.38   221349_at      VPREB1       pre-B lymphocyte 1
27    2.36    244280_at
28    2.34    209365_s_at    ECM1         extracellular matrix protein 1
29    2.33    239673_at
30    2.33    223449_at      SEMA6A       sema domain, transmembrane domain (TM), and cytoplasmic domain, (semaphorin) 6A
31    -2.32   202506_at      SSFA2        sperm specific antigen 2
32    -2.32   205241_at      SCO2         SCO cytochrome oxidase deficient homolog 2 (yeast)
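The MRD classifier described earlier ranks genes by a modified t-statistic and scores new samples with a DLDA discriminant function. The sketch below illustrates that idea on made-up data. It is not the authors' code: the names `fit_dlda`, `dlda_score` and `predict_mrd_positive` are hypothetical, the pooled within-class standard deviation is an assumed choice for the standard deviation estimate, and the overall training mean is assumed for the per-gene mean expression.

```python
from math import log, sqrt
from statistics import mean, median


def fit_dlda(pos, neg, top_p):
    """Fit a DLDA-style scorer from MRD-positive and MRD-negative samples.

    pos, neg : lists of samples; each sample is a list of gene values
    top_p    : number P of top-ranked genes kept in the discriminant
    """
    n_genes = len(pos[0])
    n1, n2 = len(pos), len(neg)
    everyone = pos + neg
    mu1 = [mean(s[i] for s in pos) for i in range(n_genes)]
    mu2 = [mean(s[i] for s in neg) for i in range(n_genes)]
    mu_bar = [mean(s[i] for s in everyone) for i in range(n_genes)]
    sigma = []
    for i in range(n_genes):
        # Assumption: pooled within-class standard deviation.
        ss = sum((s[i] - mu1[i]) ** 2 for s in pos)
        ss += sum((s[i] - mu2[i]) ** 2 for s in neg)
        sigma.append(sqrt(ss / (n1 + n2 - 2)))
    sigma0 = median(sigma)
    # Modified t-statistic: class mean difference over (sigma_i + sigma_0).
    h = [(mu1[i] - mu2[i]) / (sigma[i] + sigma0) for i in range(n_genes)]
    top = sorted(range(n_genes), key=lambda i: abs(h[i]), reverse=True)[:top_p]
    return {
        "top": top, "h": h, "mu_bar": mu_bar,
        "sigma": sigma, "sigma0": sigma0,
        "log_prior": log(n1 / n2),  # log(p1 / p2)
    }


def dlda_score(model, x):
    """Continuous score; higher values are more indicative of MRD positive."""
    total = model["log_prior"]
    for i in model["top"]:
        denom = model["sigma"][i] + model["sigma0"]
        total += model["h"][i] * (x[i] - model["mu_bar"][i]) / denom
    return total


def predict_mrd_positive(model, x):
    """Zero is the binary threshold: positive score means MRD positive."""
    return dlda_score(model, x) > 0


# Hypothetical toy data: gene 0 separates the classes, gene 1 does not.
pos = [[5.0, 1.0], [6.0, 1.2], [7.0, 0.8]]
neg = [[1.0, 1.0], [2.0, 1.1], [3.0, 0.9]]
model = fit_dlda(pos, neg, top_p=1)
```

With these toy inputs only the discriminating gene is retained, and a new sample such as `[6.5, 1.0]` scores above zero while `[1.5, 1.0]` scores below it. In the procedure described in the text, P would be chosen by cross-validation rather than fixed by hand.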

Through the nested cross-validation procedure as described in the manuscript, the gene expression-based risk classifier predicted a risk score on each of the 163 patients. With a threshold of zero the risk score separated the 163 patients into low (n=66) vs. high (n=97) risk groups. Table S9 shows the association between the risk groups and day 29 MRD.

TABLE S9
Two-Way Classification Table of Risk Groups and Day 29 MRD Status

MRD day 28 (binary)    Low Risk   High Risk   Total
Negative (n)           61         35          96
Negative (row %)       63.54      36.46       100.00
Positive (n)           24         34          58
Positive (row %)       41.38      58.62       100.00
Missing (n)            3          6           9
Missing (row %)        33.33      66.67       100.00
Total (n)              88         75          163
Total (row %)          53.99      46.01       100.00

Fisher Exact Test (after removing missing data): 0.006

The Kaplan-Meier estimates of relapse-free survival (RFS) for the various groups based on gene expression classifier-based risk group for RFS and end-induction flow cytometric MRD status were plotted in Figures S10(A) through (F) as follows.

Identification of Novel Cluster Groups in Pediatric Higher Risk B-Precursor Acute Lymphoblastic Leukemia by Unsupervised Gene Expression Profiling

[0143] The cure rate of pediatric B-precursor acute lymphoblastic leukemia (ALL) now exceeds 80% with contemporary treatment regimens. These therapeutic advances have come through the progressive refinement of chemotherapy and the development of risk classification schemes that target children to more intensive therapies based on their relapse risk. Current risk classification schemes incorporate pretreatment clinical characteristics (white blood cell count (WBC), age, and the presence of extramedullary disease), the presence or absence of sentinel cytogenetic lesions (such as t(12;21)(ETV6-RUNX1) and t(9;22)(BCR-ABL1), translocations involving MLL, and chromosomal trisomies or hypodiploidy), and measures of minimal residual disease (MRD) at the end of induction therapy, to classify children with ALL into "low," "standard/intermediate," "high" or "very high" risk categories. Despite improvements in treatment and in risk classification over the past three decades, up to 20% of children with ALL still relapse. The majority of relapses occur in those children who are initially classified as "standard/intermediate" or "high" risk. Thus, while overall outcomes have significantly improved, children classified with "high" or "very high" risk disease, those who have relapsed, or those of Hispanic or American Indian descent continue to have relatively poor survivals. These latter groups require the development of novel therapies for cure.

[0144] Shuster previously showed that the group of children with high-risk B-precursor ALL based on the "NCI/Rome" criteria (age ≥10 years and/or presenting WBC ≥50,000/μL) could be refined using age, sex and WBC to identify a subgroup of ~12% of B-precursor ALL patients, referred to herein as "higher risk," that had a very poor outcome with <50% expected survival. In contrast to children with favorable, "low" risk ALL (associated with the presence of t(12;21)(ETV6-RUNX1) or trisomies of chromosomes 4, 10, and 17) or those with unfavorable, "very high" risk disease (associated with t(9;22)(BCR-ABL1) or hypodiploidy), the biologic and genetic features of these higher risk ALL patients are only now becoming well characterized. To identify novel, biologically defined subgroups within higher risk ALL and to identify genes defining these subgroups that might serve as new diagnostic or therapeutic targets for this form of disease, we performed GEP analysis in a cohort of 207 uniformly treated higher risk ALL patients who were enrolled in the Children's Oncology Group (COG) P9906 clinical trial (http://www.acor.org/pedonc/diseases/ALLtrials/9906.html). Under the auspices of a National Cancer Institute TARGET Project (Therapeutically Applicable Research to Generate Effective Treatments; www.target.cancer.gov), we have also assessed genome-wide DNA copy number abnormalities in leukemic DNA in this same cohort and have performed selective gene resequencing to identify genes consistently mutated in the leukemic cells of the cohort. Herein we report the discovery of 8 gene expression-based cluster groups of patients within higher risk pediatric ALL, identified through shared patterns of gene expression. While two of these clusters were found to be associated with known recurrent cytogenetic abnormalities (either t(1;19)(TCF3-PBX1) or MLL translocations), the remaining 6 cluster groups had no detectable conserved cytogenetic aberrations, but 2 of the groups were associated with strikingly different therapeutic outcomes and clinical characteristics. The gene expression-based cluster groups were also associated with distinct patterns of genome-wide DNA copy number abnormalities and with the aberrant expression of "outlier" genes. These genes provide new targets for improved diagnosis, risk classification, and therapy for this poor risk form of ALL.

Materials and Methods

Patient Selection and Characteristics

[0145] The COG Trial P9906 enrolled 272 eligible children and adolescents with higher-risk ALL between Mar. 15, 2000 and Apr. 25, 2003. This trial targeted a subset of patients with higher risk features (older age and higher WBC) that had experienced relatively poor outcomes (<50% 4-year relapse-free survival (RFS)) in prior COG clinical trials. Patients were first enrolled on the COG P9000 classification study and received a four-drug induction regimen. Those with 5-25% blasts in the bone marrow (BM) at day 29 of therapy received 2 additional weeks of extended induction therapy using the same agents. Patients in complete remission (CR) with less than 5% BM blasts following either 4 or 6 weeks of induction were then eligible to participate in COG P9906 if they met the age and WBC criteria described previously or had overt central nervous system (CNS3) or testicular involvement at diagnosis. Patients that met the higher risk age/sex/WBC criteria but had favorable genetic features (t(12;21)(ETV6-RUNX1) or trisomy of chromosomes 4 and 10) or those with unfavorable, "very high" risk features (t(9;22)(BCR-ABL1) or hypodiploidy) were excluded. Patients enrolled in COG P9906 were uniformly treated with a modified augmented BFM regimen that included two delayed intensification phases. The majority of patients had MRD assessed by flow cytometric analysis of bone marrow samples at day 29 of induction therapy as previously described; cases were defined as MRD-positive or MRD-negative at day 29 using a threshold of 0.01%.

[0146] For this study, cryopreserved pre-treatment leukemia specimens were available on a representative cohort of 207 of the 272 (76%) patients registered to this trial. The 65 unstudied patients included a greater proportion of older boys with lower WBC counts, but otherwise were similar and showed no significant outcome differences (Supplement Table S1'; FIG. 21). Treatment protocols were approved by the National Cancer Institute (NCI) and participating institutions through their Institutional Review Boards. Informed consent for participation in these research studies was obtained from all patients or their guardians. Outcome data for all patients were frozen as of October 2006; the median time to event or censoring was 3.7 years. A validation cohort consisted of an independent study of 99 cases of NCI/Rome high risk ALL that were derived from COG Trial CCG 1961 and used the same Affymetrix microarray platform.

Gene Expression Profiling

[0147] RNA was isolated from pre-treatment, diagnostic samples in the 207 ALL cases (131 bone marrow, 76 peripheral blood) using TRIzol (Invitrogen, Carlsbad, Calif.); all samples had >80% leukemic blasts. cDNA labeling, hybridization and scanning were performed as previously described (detailed in Supplement). A mask to remove uninformative probe pairs was applied to all the arrays (detailed in Supplement, Section 3). The default MAS 5.0 normalization was used. Array experimental quality was assessed using the following parameters and all arrays met these criteria for inclusion: GAPDH ≥5,000; ≥20% expressed genes; GAPDH 3'/5' ratios ≤4; and linear regression r-squared values of spiked poly(A) controls >0.90. This gene expression dataset may be accessed via the National Cancer Institute caArray site (https://array.nci.nih.gov/caarray/) or at Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/).

Unsupervised Clustering Methods and Selection of Outlier Genes

[0148] Microarray gene expression data were available from an initial 54,504 probe sets after masking and filtering (see Supplement, Section 3). Three distinctly different methods were used to select genes for hierarchical clustering: High Coefficient of Variation (HC), Cancer Outlier Profile Analysis (COPA) and Recognition of Outliers by Sampling Ends (ROSE). In HC, the 54,504 probe sets were ordered by their coefficients of variation (CV) and the highest 254 probe sets were used for clustering. This method identifies probe sets having an overall high variance relative to mean intensity. COPA (previously described by Tomlins et al.) selects outlier probe sets on the basis of their absolute deviation from median at a fixed point (typically the 95th percentile). ROSE was developed in our laboratory as an alternative to COPA, and selects probe sets both on the basis of the size of the outlier group they identify and the magnitude of the deviation from expected intensity (see Supplement, Sections 4B and C for detailed methods of ROSE and COPA).

[0149] For all three probe selection methods, the top 254 probe sets were clustered using EPCLUST (http://www.bioinf.ebc.ee/EP/EP/EPCLUST/, v0.9.23 beta, Euclidean distance, average linkage UPGMA). A threshold branch distance was applied and the largest distinct branches above this threshold containing more than 8 patients were retained and labeled. The HC method was used as the basis of cluster nomenclature, with each new cluster being assigned a number. All clusters are prefixed by the method of their probe set selection (H=High CV, C=COPA and R=ROSE), with COPA and ROSE numbers being assigned by the similarity of their group's membership to H-clusters. The top 100 median rank order probe sets for each ROSE cluster are listed in the Supplement, Section 6.

[0150] In the validation cohort (CCG 1961) the same initial filtering criteria were applied to the raw data. Each method began with 54,504 probe sets. Applying the ROSE method, with the same cutoffs used in P9906, 167 probe sets were retained and used for clustering. COPA and HC also used the same selection criteria as in P9906, and the top 167 probe sets were used in clustering (Supplement, Table S7A).

Assessment of Genome-Wide DNA Copy Number Abnormalities (CNA)

[0151] Copy number alterations were detected as described in Mullighan et al., and the initial CNA data for this cohort are also presented there. Briefly, DNA from the diagnostic leukemic cells and from a sample obtained after remission induction therapy (germline) was extracted and genotyped using either the 250K Sty and Nsp single-nucleotide-polymorphism (SNP) arrays (Affymetrix, Santa Clara, Calif.). SNP array data preprocessing and inference of DNA copy number abnormalities (CNA) and loss-of-heterozygosity (LOH) were performed as previously described.

Statistical Analyses

[0152] Log rank analysis was used to evaluate relapse-free survival (RFS). Kaplan-Meier survival analyses and hazard ratios were also calculated for comparisons of group RFS. Kruskal-Wallis rank sum tests were used to analyze age and WBC counts; Fisher's exact test was used to evaluate the binary variables. All statistical analyses were performed using R (http://www.R-project.org, version 2.9.1, with stats and survival packages).

Results

[0153] Reflective of their classification as higher risk, the 207 children and adolescents had a median age of 13 years (range: 1-20 years), a median WBC at disease presentation of 62,300/μL, a male predominance (66%), and 35% were MRD positive at day 29 of induction therapy (Supplement, Table S2'). Nearly 25% (51/205) of these children were of Hispanic/Latino ethnicity, while 10% (21/207) had translocations involving the MLL gene on chromosome 11q23 and 11% (23/207) had t(1;19)(TCF3-PBX1) translocations (Supplement, Table S1'). The remaining cases (79%) did not have known recurring chromosomal translocations. Relapse-free survival (RFS) and overall survival (OS) in the 207 patients were 66.3±3.5% and 83% at 4 years, respectively (FIG. 21).

Unsupervised Hierarchical Clustering Defines Eight Gene Expression Cluster Groups

[0154] Based upon the assumption that the most robust clusters would be repeatedly and consistently identified by more than one clustering approach, several methods of selecting probe sets for unsupervised clustering were applied to the gene expression data. First, using the top 254 genes selected by CV (the full gene list is provided in Supplement, Table S7A), we identified 8 distinct gene expression-based cluster groups which were labeled H1 through H8 (FIG. 17A). Interestingly, while 20 of 21 cases with an MLL translocation were in cluster H1 (Table 1') and all 23 cases with a t(1;19)(TCF3-PBX1) were in cluster H2 (FIG. 17A), the remaining 6 clusters (labeled H3-H8) lacked a clear association with any previously described cytogenetic abnormality.
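The High CV (HC) probe selection step described in the methods, ordering probe sets by coefficient of variation and keeping the top ones, can be sketched as follows. This is only an illustration: the function names and the three probe identifiers are hypothetical, and the sample standard deviation is an assumed choice since the text does not specify the estimator.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = standard deviation / mean (intensities assumed positive)."""
    return stdev(values) / mean(values)


def top_cv_probe_sets(expr, n_top):
    """Order probe sets by CV, highest first, and return the top n_top IDs.

    expr : dict mapping probe set ID -> list of intensities across samples
    """
    cv = {probe: coefficient_of_variation(v) for probe, v in expr.items()}
    return sorted(cv, key=cv.get, reverse=True)[:n_top]


# Hypothetical toy data: three probe sets measured on four samples.
expr = {
    "probe_a": [10.0, 10.0, 10.0, 10.0],   # no variation
    "probe_b": [1.0, 1.0, 1.0, 50.0],      # one strong outlier sample
    "probe_c": [5.0, 6.0, 5.0, 6.0],       # mild variation
}
selected = top_cv_probe_sets(expr, n_top=2)
```

In the study the same ranking would be applied to the 54,504 filtered probe sets with `n_top=254`, and the selected probe sets would then be passed to hierarchical clustering (Euclidean distance, average linkage).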

TABLE 1'
Association of Clinical and Outcome Features with High CV Expression Cluster Groups

Feature                H1         H2         H3         H4         H5         H6         H7         H8         Total      P Value
# Cases/Cluster        20         23         8          11         9          19         95         22         207
Median Age (Yrs)       6.9        13.1       13.8       14.2       14.7       14.5       11.4       13.8       13.1       0.002
Sex (Male)             11/20      11/23      4/8        10/11      7/9        15/19      64/95      15/22      137/207    0.165
Ethnicity (Hispanic)   3/20       6/23       2/8        2/11       0/8        3/18       22/95      13/22      51/205     0.018
MLL                    20/20      0/23       0/8        0/11       0/9        0/19       1/94       0/22       21/207     <0.001
TCF3-PBX1              0/20       23/23      0/8        0/11       0/9        0/19       0/95       0/22       23/207     <0.001
D29 MRD                8/16       0/20       0/7        2/11       7/9        6/19       27/88      17/21      67/191     <0.001
Median WBC             129.4      67.2       139.0      13.3       32.6       31.4       59.9       197.5      62.3       <0.001
RFS - 1 Yr ± SE        75.0±9.7   91.3±5.9   87.5±11.7  100 NA     100 NA     100 NA     97.9±1.5   90.7±6.3   94.1±1.7
RFS - 2 Yrs ± SE       65.0±10.7  73.9±9.2   87.5±11.7  81.8±11.6  100 NA     100 NA     83.0±3.8   71.6±9.8   81.7±2.7
RFS - 3 Yrs ± SE       65.0±10.7  73.9±9.2   87.5±11.7  72.7±13.4  88.9±10.5  94.1±5.7   77.2±4.4   52.5±10.9  75.1±3.0
RFS - 4 Yrs ± SE       65.0±10.7  73.9±9.2   75.0±15.3  58.2±16.9  88.9±10.5  94.1±5.7   67.4±5.1   23.0±10.3  66.3±3.5
RFS - 5 Yrs ± SE       65.0±10.7  73.9±9.2   75.0±15.3  58.2±16.9  88.9±10.5  94.1±5.7   57.0±6.5   0 NA       61.9±3.9
Logrank p-value        0.722      0.409      0.582      0.930      0.185      0.0184     0.993      <0.001
Hazard Ratio           1.152      0.704      0.675      1.046      0.286      0.133      0.998      3.491

Abbreviations and Notations: MRD: Minimal Residual Disease; RFS: Relapse-Free Survival; MLL: the presence of MLL translocations; TCF3-PBX1: the presence of a t(1;19)/TCF3-PBX1. Median WBC reported in 10^3/μL.
All P-values are calculated for Fisher's Exact Test (all variables except age and WBC) or Kruskal-Wallis Rank Sum Test (age and WBC) using R (version 2.9.1, survival and stats packages).
Logrank p-values and hazard ratios calculated separately for each cluster using R (version 2.9.1, stats package).

[0155] Using probe sets selected by methods designed to find outliers (COPA and ROSE), nearly all of these same clusters were detected (FIGS. 17B and C; Tables 2' and 3'). The sole exception to this is cluster 4, which was not evident using the COPA probe sets. The degree of the overlap across these three methods was also quite extensive (Table 4' shows the cluster identity). HC and ROSE were the most similar (93.2% identical); however, a pair-wise comparison revealed all to have nearly 90% common members. Even in the absence of cluster 4 in COPA clusters, the consensus overlap of all three methods was 86.5%. This is particularly noteworthy since only 37% of the clustering probe sets were shared by all three methods (Supplement, Table S7B').

TABLE 2'
Association of Clinical and Outcome Features with COPA Gene Expression Cluster Groups

Feature                C1         C2         C3         C5         C6         C7         C8         Total      P-Value
# Cases/Cluster        20         23         10         11         21         102        20         207
Median Age (Yrs)       6.9        13.1       15.2       14.7       14.5       11.7       14.3       13.1       <0.001
Sex (Male)             11/20      11/23      5/10       8/11       17/21      71/102     14/20      137/207    0.196
Ethnicity (Hispanic)   3/20       6/23       2/10       0/10       3/20       25/102     12/20      51/205     0.008
MLL                    20/20      0/23       0/10       0/11       0/21       1/102      0/20       21/207     <0.001
TCF3-PBX1              0/20       23/23      0/10       0/11       0/21       0/102      0/20       23/207     <0.001
D29 MRD                9/17       0/20       1/9        8/11       6/21       26/94      17/19      67/191     <0.001
Median WBC             129.4      67.2       33.5       32.6       26.0       52.5       158.3      62.3       0.028
RFS - 1 Yr ± SE        80.0±8.9   91.3±5.9   90.0±9.5   100 NA     100 NA     97.1±1.7   89.7±6.9   94.1±1.7
RFS - 2 Yrs ± SE       70.0±10.3  73.9±9.2   80.0±12.7  100 NA     100 NA     84.1±3.7   63.3±11.0  81.7±2.7
RFS - 3 Yrs ± SE       70.0±10.3  73.9±9.2   80.0±12.7  90.0±9.5   94.7±5.1   77.0±4.2   42.2±11.3  75.1±3.0
RFS - 4 Yrs ± SE       70.0±10.3  73.9±9.2   70.0±14.5  78.7±13.4  94.7±5.1   66.4±5.0   15.1±9.3   66.3±3.5
RFS - 5 Yrs ± SE       70.0±10.3  73.9±9.2   70.0±14.5  78.7±13.4  94.7±5.1   56.1±6.4   0 NA       61.9±3.9
Logrank p-value        0.808      0.409      0.788      0.364      0.010      0.944

Abbreviations and Notations: MRD: Minimal Residual Disease; RFS: Relapse-Free Survival; MLL: the presence of MLL translocations; TCF3-PBX1: the presence of a t(1;19)/TCF3-PBX1. Median WBC reported in 10^3/μL.
All P-values are calculated for Fisher's Exact Test (all variables except age and WBC) or Kruskal-Wallis Rank Sum Test (age and WBC) using R (version 2.9.0, survival and stats packages).
Logrank p-values and hazard ratios calculated separately for each cluster using R (version 2.9.1, stats package).

TABLE 3'
Association of Clinical and Outcome Features with ROSE Gene Expression Cluster Groups

Feature                R1         R2         R3         R4         R5         R6         R7         R8         Total      P-Value
# Cases/Cluster        21         23         12         14         10         21         82         24         207
Median Age (Yrs)       4.7        13.1       15.2       14.3       14.5       14.5       7.8        14.1       13.1       <0.001
Sex (Male)             11/21      11/23      6/12       13/14      8/10       17/21      54/82      17/24      137/207    0.043
Ethnicity (Hispanic)   4/21       6/23       2/12       3/14       0/9        3/20       18/82      15/24      51/205     0.004
MLL                    21/21      0/23       0/12       0/14       0/10       0/21       0/82       0/24       21/207     <0.001
TCF3-PBX1              0/21       23/23      0/12       0/14       0/10       0/21       0/82       0/24       23/207     <0.001
D29 MRD                9/17       0/20       1/11       3/14       8/10       6/21       21/75      19/23      67/191     <0.001
Median WBC             125.8      67.2       49.6       9.2        31.5       26.0       68.8       153.8      62.3       <0.001
RFS - 1 Yr ± SE        76.2±9.3   91.3±5.9   90.9±8.7   100 NA     100 NA     100 NA     97.6±1.7   91.5±5.8   94.1±1.7
RFS - 2 Yrs ± SE       66.7±10.3  73.9±9.2   81.8±11.6  92.9±6.9   100 NA     100 NA     82.6±4.2   69.7±9.6   81.7±2.7
RFS - 3 Yrs ± SE       66.7±10.3  73.9±9.2   81.8±11.6  85.7±9.4   90.0±9.5   94.7±5.1   76.3±4.8   47.9±10.4  75.1±3.0
RFS - 4 Yrs ± SE       66.7±10.3  73.9±9.2   72.7±13.4  75.0±12.9  78.7±13.4  94.7±5.1   66.2±5.5   21.0±9.5   66.3±3.5
RFS - 5 Yrs ± SE       66.7±10.3  73.9±9.2   72.7±13.4  75.0±12.9  78.7±13.4  94.7±5.1   53.4±7.4   0 NA       61.9±3.9
Logrank p-value        0.881      0.409      0.615      0.259      0.366      0.010      0.680      <0.001
Hazard Ratio           1.060      0.704      0.744      0.520      0.528      0.117      1.110      3.878

Abbreviations and Notations: MRD: Minimal Residual Disease; RFS: Relapse-Free Survival; MLL: the presence of MLL translocations; TCF3-PBX1: the presence of a t(1;19)/TCF3-PBX1. Median WBC reported in 10^3/μL.
All P-values are calculated for Fisher's Exact Test (all variables except age and WBC) or Kruskal-Wallis Rank Sum Test (age and WBC) using R (version 2.9.1).
Logrank p-values and hazard ratios calculated separately for each cluster using R (version 2.9.1, stats package).

TABLE 4
Comparison of Membership of P9906 Clusters

                                 Cluster                     Overall
                       1    2    3    4    5    6    7    8   Identity
HC v COPA             19   23    8    0    9   19   88   19    89.4%
HC v ROSE             20   23    8   10    9   19   82   22    93.2%
COPA v ROSE           20   23   10    0   10   21   82   20    89.9%
HC v COPA v ROSE      19   23    8    0    9   19   82   19    86.5%

[0156] In addition to the significant association (p<0.001) between recurrent cytogenetic abnormalities and clusters 1 and 2, we observed significant associations between the clusters and several clinical features, including age (p<0.001-0.002), race (p=0.004-0.018), the presence of MRD at the end of induction therapy (p<0.001), and relapse free survival (RFS) (Tables 1'-3', FIG. 18). Of particular note was the significant variation in RFS among the cluster groups (FIG. 18). Two of these (clusters 6 and 8) reached statistical significance by independent logrank analysis in all three methods (cluster 6: p=0.010-0.018, HR=0.117-0.133; cluster 8: p<0.001, HR=3.491-4.382). While the overall 4-year RFS was 66.3±3.5%, cluster 6 ranged from 94.1±5.7% to 94.7±5.1%, with COPA and ROSE identifying the largest cluster (21 members) with the highest RFS. In contrast, the 4-year RFS for cluster 8 ranged from 15.1±9.3% for COPA to 23.0±10.3% for HC. Again, the ROSE cluster (R8) was the largest, with 24 members, and was intermediate in its RFS (21.0±9.5%). All 18 members of C8 were contained within the R8 cluster.

[0157] The timing of relapse also differed between the cluster groups. While all relapses in clusters 1, 2 and 6 occurred within the first three years, patients in the remaining clusters, particularly in cluster 8, continued to experience relapses in years 3-5. Cluster 8 was also distinguished by a high frequency of MRD positivity at the end of induction therapy (81.0-89.5% of cases) and a preponderance of Hispanic/Latino ethnicity (59.1-62.5%) (Tables 1'-3'). Due to the extensive overlap of cluster membership, the larger size of the clusters, and the fact that R1 and R2 identified all MLL and TCF3-PBX1 samples, ROSE was selected as the reference clustering method.

[0158] Table 5' lists the 113 probe sets that overlap between the ROSE clustering probe sets and those that were among the top 100 rank order for each cluster (Supplement, Sections 5 and 6). The majority of those associated with R1 (the cluster containing all the MLL translocated samples), including MEIS1, PROM1, RUNX2 and members of the HOX gene family, are consistent with previous reports describing the elevated expression of these genes in samples with underlying MLL translocations. We also found a number of other interesting outlier genes associated with MLL translocations, such as CTGF, which has previously been reported to be associated with a poor outcome in adult ALL; the correlation of CTGF expression and MLL translocations in that study was not reported. The outlier genes that distinguished cluster R2, containing all 23 cases with t(1;19)/TCF3-PBX1, included PBX1, which is directly involved in the underlying translocation. Surprisingly, while many of the probe sets associated with the other clusters formed very clear blocks of elevated expression (FIG. 17), they neither comprised any obvious pathways nor were located within a particular chromosomal vicinity. These blocks of probe sets with very elevated expression, however, strongly suggest that a small subset might be used to distinguish the sample clusters.

[0159] Since several of the genes exhibiting outlier expression in clusters R1 and R2 are involved in or activated by their underlying cytogenetic abnormalities, this suggests that outlier genes associated with the other ROSE clusters might also be involved in, or perturbed by, a comparable genetic abnormality. Consistent with this hypothesis is the presence of notable outlier genes defining cluster R8 (including GAB1, MUC4, PON2, GPR110, SEMA6, SERPINB9; Supplement, Tables S15 S17 and S18) whose expression has been associated with t(9;22)/BCR-ABL1 and with overall outcome in ALL. Although patients in R8 were, by definition, all BCR-ABL1 negative, the strong similarity in expression patterns suggests a shared root pathway. Two recent reports of CRLF2 translocations and deletions in pediatric ALL also implicate this as a potential candidate for perturbation within cluster 8. While the elevated expression of CRLF2 is a feature of many R8 samples, however, it is not highly expressed in all. None of the other highly expressed genes associated with the other clusters has yet been shown to be directly involved in a translocation or activated by such an event.
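The overall identity column of Table 4 can be checked with a short calculation. The sketch below assumes (this is not stated in the text) that overall identity is the number of samples assigned to the same cluster by the compared methods, summed over clusters 1-8 and divided by the 207 samples in the cohort; under that assumption the reported percentages are reproduced. The variable names are illustrative only.

```python
# Per-cluster counts of samples shared between methods (clusters 1-8), from Table 4.
shared_counts = {
    "HC v COPA":        [19, 23, 8, 0, 9, 19, 88, 19],
    "HC v ROSE":        [20, 23, 8, 10, 9, 19, 82, 22],
    "COPA v ROSE":      [20, 23, 10, 0, 10, 21, 82, 20],
    "HC v COPA v ROSE": [19, 23, 8, 0, 9, 19, 82, 19],
}

N_SAMPLES = 207  # size of the P9906 study cohort

# Assumed definition: identity = shared samples / all samples, as a percentage.
overall_identity = {
    name: round(100.0 * sum(counts) / N_SAMPLES, 1)
    for name, counts in shared_counts.items()
}

for name, pct in overall_identity.items():
    print(f"{name:18s} {pct:.1f}%")
```

The four values obtained this way agree with the Overall Identity column, which also serves as a consistency check on the per-cluster counts.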

TABLE 5
ROSE Outlier Probe Sets/Genes Present in Top Rank Order of Clusters

R1                      R2                      R3                      R4
220416_at ATP8B4        227441_s_at ANKS1B      213808_at ADAM23*       203949_at MPO
219463_at C20orf103     227440_at ANKS1B        203865_s_at ADARB1      203948_s_at MPO
205899_at CCNA1         227439_at ANKS1B        230128_at IGL@          202273_at PDGFRB
209101_at CTGF          243533_x_at ANKS1B*     231513_at KCNJ2*        203476_at TPBG
218468_s_at GREM1       234261_at ANKS1B        203726_s_at LAMA3
213150_at HOXA10        202207_at ARL4C         232914_s_at SYTL2
235521_at HOXA3         202206_at ARL4C         225496_s_at SYTL2
213844_at HOXA5         212077_at CALD1
214651_s_at HOXA9       223786_at CHST6
209905_at HOXA9         205489_at CRYM
218847_at IGF2BP2       206070_s_at EPHAJ
201105_at LGALS1        201579_at FAT1
1557534_at LOC339862    231455_at FLJ42418
202890_at MAP7          239657_x_at FOXO6
242172_at MEIS1         235666_at ITGA8?
204069_at MEIS1         235911_at KO32008
1559477_s_at MEIS1      213005_s_at KANK1
204304_s_at PROM1       208567_s_at KCNJ12
202976_s_at RHOBTB3     210150_s_at LAMA5
232231_at RUNX2         228262_at MAP7D2
226415_at VAT1L         206028_s_at MERTK
231899_at ZC3H12C       204114_at NID2
                        212151_at PBX1
                        212148_at PBX1
                        205253_at PBX1
                        227949_at PHACTR3
                        202178_at PRKCZ
                        242385_at RORB
                        231040_at RORB?
                        46665_at SEMA4C
                        206181_at SLAMF1
                        225483_at VPS26B

R5                      R6                      R7                      R8
212062_at ATP9A         242457_at               219837_s_at CYTL1       229975_at BMPR1B
228297_at CNN3*         241535_at               212192_at KCTD12        208303_s_at CRLF2
209604_s_at GATA3       204066_s_at AGAP1                               238689_at GPR110
213362_at PTPRD         240758_at AGAP1*                                235988_at GPR110
229661_at SALL4         233225_at AGAP1*                                236489_at GPR110?
213258_at TFPI          219470_x_at CCNJ                                207651_at GPR171
210665_at TFPI          203921_at CHST2                                 212592_at IG
210664_s_at TFPI        206756_at CHST7                                 213371_at LDB3
                        1552398_a_at CLEC12AB                           217110_s_at MUC4
                        231166_at GPR155                                217109_at MUC4
                        202409_at IGF2                                  204895_x_at MUC4
                        215177_s_at ITGA6
                        201656_at ITGA6
                        211340_s_at MCAM
                        210869_s_at MCAM
                        215692_s_at MPPED2
                        205413_at MPPED2
                        202336_s_at PAM
                        228863_at PCDH17
                        227289_at PCDH17
                        205656_at PCDH17
                        230537_at PCDH17
                        203335_at PHYH
                        203329_at PTPRM
                        1555579_s_at PTPRM
                        220059_at STAP1
                        1554343_a_at STAP1

Correlation of Genome-Wide DNA Copy Number Changes with ROSE Clusters

[0160] To gain insights into the genetic heterogeneity within higher risk B-precursor ALL and to identify underlying genetic lesions, particularly in the novel ROSE-defined cluster groups, we further correlated the gene expression profiles we had obtained with genome-wide DNA copy number abnormalities measured using SNP arrays, as previously described. The genome-wide copy number abnormalities in this higher-risk ALL cohort were recently reported, but herein we correlate these copy number abnormalities with the novel gene expression-based cluster groups that we have defined through ROSE outlier gene analysis (Table 6'; Supplement, Table S16'). As shown in Table 6', while certain copy number abnormalities (such as those seen in CDKN2A/B and PAX5) were found in several ROSE clusters, other abnormalities were more uniquely associated with each cluster group. As expected, 1q gain and TCF3 loss were highly associated with the R2 cluster that contains TCF3-PBX1 cases, reflecting the unbalanced t(1;19) translocations that lead to duplication of chromosome 1 telomeric to PBX1 and deletion of chromosome 19 telomeric to TCF3. ERG deletions, as previously described by Mullighan, et al., were seen almost exclusively (8 of 9) in R6. EBF1 deletions were

TABLE 6'-continued
Correlation of Genome-Wide DNA Copy Number Abnormalities and Acquired Mutations With ROSE Gene-Expression Cluster Groups

                        Rose Cluster Group
                R1   R2   R3   R5   R6   R8    R7   P-Value   Comments
ARMC2-SESN1      0    2    0    2    0    5     4   0.0291
JAK12            0    0    0    0    0    1,11  2   S

Despite these related signatures, as was shown with CCG 1961 cases, when BCR-ABL1 samples are clustered together with other high-risk samples using outlier genes, they do not necessarily segregate to cluster 8.

[0170] As part of a comprehensive approach to the genetic analysis of high-risk B-precursor ALL, we have undertaken a focused targeted gene sequencing effort of the COG P9906 cohort under the auspices of a National Cancer Institute TARGET Initiative (www.target.cancer.gov). Through this effort, we discovered mutations in two members of the JAK family of tyrosine kinases (JAK1 and JAK2) in 12/24 R8 cluster members and 7 patients that did not cluster (R7). Of these 12 JAK mutant R8 cases, 9 also had IKZF1 deletions (while 11/12 without JAK mutations had IKZF1 lesions). It is likely that other unidentified mutations are responsible for the "activated kinase" gene expression signature in the R8 cases without JAK mutations, and we are currently performing a range of complementary genomic analyses, including sequencing of the tyrosine kinome, in search of them.

TABLE S1
Comparison of HR-ALL Patients Registered to COG P9906 (n = 272) and the Subset of Patients Examined and Modeled for Gene Expression Signatures (n = 207)

                    Not Studied      Studied          Total            p-value
Characteristics     N      %         N      %         N      %         (Fisher's exact test)
Age - no.
  ≥10 Yrs           51     78.46     132    63.77     183    67.28     0.0335
  <10 Yrs           14     21.54     75     36.23     89     32.72
Sex - no.
  Male              52     80        137    66.18     189    69.49     0.0442
  Female            13     20        70     33.82     83     30.51
WBC - no.
  <50K/µL           52     80        99     47.83     151    55.51

0.01% blasts). As shown in Table S2, only MRD at the end of induction therapy and increasing WBC count were significantly associated with decreased relapse free survival (RFS). The significant effect of WBC count as a continuous variable on decreased RFS was no longer seen when the cutoff of 50 K/µL was applied (see Section 7). A trend towards declining RFS was also observed among the 25% of children with Hispanic/Latino ethnicity contained within this cohort. In multivariate analysis, both MRD and WBC count retained significance when adjusted for one another (likelihood ratio test based on Cox regression, P-value <0.001).

differences in various characteristics between the entire group (n=272) and the present study cohort (n=207) were examined by statistical comparisons between the present study cohort and the remaining patients (n=65) not included in the present study. Each P-value in Table S1 and Figure S1 is that of the individual test, which needs to be adjusted for multiple testing. A simple Bonferroni adjustment multiplies the P-values by the total number of tests (10). After this adjustment, none of the characteristics are significantly different between the entire group and the cohort examined herein, except the test for WBC count when a cutoff value was considered.
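The Bonferroni adjustment described for Table S1 can be illustrated with the two unadjusted p-values that are legible in that table (age and sex). This is a minimal sketch: the function name, the cap at 1.0 and the 0.05 significance level are conventional assumptions rather than details given in the text.

```python
def bonferroni(p_values, n_tests):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    return [min(1.0, p * n_tests) for p in p_values]

# Unadjusted Fisher's exact test p-values from Table S1.
raw_p = {"age": 0.0335, "sex": 0.0442}

N_TESTS = 10   # total number of tests stated in the text
ALPHA = 0.05   # assumed significance level

adjusted_p = dict(zip(raw_p, bonferroni(list(raw_p.values()), N_TESTS)))
still_significant = [name for name, p in adjusted_p.items() if p < ALPHA]

print(adjusted_p)
print(still_significant)
```

Both characteristics are nominally significant before adjustment and neither remains so afterwards, in line with the statement that only the WBC cutoff comparison survives the correction.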

TABLE S2'
Association of Relapse Free Survival with Clinical and Genetic Features in the High Risk ALL Cohort

                                      Association with Relapse Free Survival
Characteristic                        Hazard Ratio    p-value
Age
  ≥10 Yrs             132             1
  <10 Yrs             75              1.152           0.561
Age
  Median              13.5 yrs
  Range               1-20            995             0.817
Sex
  Male                137             1
  Female              70              0.769           0.320
WBC
  Median              62.3 K/µL
  Range               1-959           003             <0.001
MRD at Day 29
  Negative            124
  Positive            67              2.805           <0.001
Race
  Hispanic or Latino  51              .644            0.049
  Others              154
MLL
  Positive            21              061             0.881
  Negative            186
TCF3-PBX1
  Positive            23              704             0.409
  Negative            184
CNS
  No blasts           160
  <5 blasts           26              0.897           0.708
  ≥5 blasts           21

Validation Cohort

[0173] A subset of patients from COG CCG 1961 "Treatment of Patients with Acute Lymphoblastic Leukemia with Unfavorable Features" was used as a validation cohort to determine whether similar clusters were present in a different set of high-risk patients. As described in Bhojwani et al., COG CCG 1961 enrolled a total of 2078 patients with NCI high risk features, i.e. WBC count ≥50,000/µL or age ≥10 years old, from September 1996 to May 2002. Microarray data from 99 of these patients were analyzed using the methods described in this paper.

3. Data Processing

A. Microarray Preparation and Scanning

[0174] After RNA quantification, cDNA preparation, and labeling, biotinylated cRNA was fragmented and hybridized to HG U133 Plus2.0 oligonucleotide microarrays (Affymetrix, Santa Clara, Calif.) containing 54,675 probe sets. Signals were scanned (Affymetrix GeneChip Scanner) and analyzed with the Affymetrix Microarray Suite (MAS 5.0). Signal intensities and expression data were generated with the Affymetrix GCOS 1.4 software package.

B. Microarray Data Masking

[0175] Prior to any intensity analysis, the microarray data were first masked to remove those probes found to be uninformative in a majority of the samples. Removal of these probe pairs improves the overall quality of the data and eliminates many non-specific signals that are shared by a particular sample type (i.e., cross-hybridizing messages present in blood and marrow samples). Each probe pair (across all 207 samples) was evaluated and masked if the mismatch (MM) was greater than the perfect match (PM) in more than 60% of the samples. This mask removed 94,767 probe pairs (15.7% of the 604,258) and had some impact on 38,588 probe sets (71%). As shown in Table S3, the net impact of masking was a significant increase in the number of present calls coupled with a dramatic decrease in the number of absent calls. The mask removed only seven probe sets (0.01% of the 54,675), all of which represented non-human control genes.

TABLE S3
Impact of Masking on Affymetrix Statistical Calls (Reported as Percentage of Total Probes: 54,675 raw, 54,668 masked)

            Present    Marginal    Absent    No call
Raw         34.9       1.7         63.3      0
Masked      48.0       3.1         48.9      0 (7)

C. Microarray Data Filtering

[0176] Prior to any clustering, the data were filtered to remove probe sets deemed to be unrelated to disease: genes from sex-determining regions of X and Y (which simply correlate with sex), spiked control genes and globin genes (presumed to arise from contaminating normal blood cells). All filtered probe sets were selected based upon their gene symbols or chromosomal location. Table S4 lists the 89 probe sets mapped within sex-determining regions. These include the XIST gene from chromosome X and probe sets from Yp11-Yq11. All probe sets from PAR1 and PAR2 regions of both sex chromosomes are retained. Table S5 lists the 62 Affymetrix spiked control genes. Table S6 lists the twenty excluded globin probe sets with a gene symbol beginning with "HB" and the word "globin" contained within the gene title. After the filtering of these probe sets, 54,504 were available for clustering.

TABLE S4
X- and Y-Specific Transcripts Excluded from the Analysis (89)

Probe Set ID      Gene Symbol                  Cytoband
214218_s_at       XIST                         Xq13.2
221728_x_at       XIST                         Xq13.2
224588_at         XIST                         Xq13.2
224589_at         XIST                         Xq13.2
224590_at         XIST                         Xq13.2
227671_at         XIST                         Xq13.2
243712_at         XIST                         Xq13.2
201909_at         LOC100133662 /// RPS4Y1      Yp11.3
204409_s_at       EIF1AY                       Yq11.222
204410_at         EIF1AY                       Yq11.222
205000_at         DDX3Y                        Yq11

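The probe-pair masking rule of Section 3B (mask a pair when MM exceeds PM in more than 60% of samples) and the probe-set bookkeeping of Section 3C can be sketched as follows. This is an illustrative sketch only: the function name and the list-of-lists data layout are assumptions, and the toy intensities are invented for the example rather than taken from the study.

```python
def masked_probe_pairs(pm, mm, max_fraction=0.60):
    """Return one flag per probe pair: True if the pair should be masked.

    pm, mm: lists of rows, one row per probe pair, one value per sample.
    A pair is masked when MM > PM in more than `max_fraction` of samples.
    """
    flags = []
    for pm_row, mm_row in zip(pm, mm):
        n_worse = sum(1 for p, m in zip(pm_row, mm_row) if m > p)
        flags.append(n_worse / len(pm_row) > max_fraction)
    return flags

# Toy data (hypothetical): three probe pairs measured in five samples.
pm = [[10, 10, 10, 10, 10],
      [10, 10, 10, 10, 10],
      [10, 10, 10, 10, 10]]
mm = [[20, 20, 20, 20, 5],   # MM > PM in 4/5 samples (80%): masked
      [20, 20, 20, 5, 5],    # MM > PM in 3/5 samples (60%): kept, not "more than" 60%
      [1, 1, 1, 1, 1]]       # MM never exceeds PM: kept
flags = masked_probe_pairs(pm, mm)

# Probe sets left after filtering, using the counts given in the text.
remaining_probe_sets = 54_675 - (89 + 62 + 20)
print(flags, remaining_probe_sets)
```

The subtraction reproduces the 54,504 probe sets reported as available for clustering after removal of the sex-specific, spiked control and globin probe sets.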

TABLE S5'-continued
AFFX Probe Sets Excluded from the Analysis (62)

Probe Set ID
AFFX-r2-Bs-lys-3_at
AFFX-r2-Bs-phe-5_at
AFFX-r2-Bs-phe-M_at
AFFX-r2-Bs-phe-3_at
AFFX-r2-Bs-thr-3_s_at
AFFX-r2-Bs-thr-M_s_at
AFFX-r2-Bs-thr-5_s_at
AFFX-HUMISGF3A/M97935_5_at
AFFX-HUMISGF3A/M97935_MA_at
AFFX-HUMISGF3A/M97935_MB_at
AFFX-HUMISGF3A/M97935_3_at
AFFX-HUMRGE/M10098_5_at
AFFX-HUMRGE/M10098_M_at
AFFX-HUMRGE/M10098_3_at
AFFX-HUMGAPDH/M33197_5_at
AFFX-HUMGAPDH/M33197_M_at
AFFX-HUMGAPDH/M33197_3_at
AFFX-HSAC07/X00351_5_at
AFFX-HSAC07/X00351_M_at
AFFX-HSAC07/X00351_3_at
AFFX-M27830_5_at
AFFX-M27830_M_at
AFFX-M27830_3_at
AFFX-hum_alu_at

TABLE S6
Globin Probe Sets Excluded from the Analysis (20)

Probe Set ID      Gene Symbol        Cytoband
1562981_at        HBB                11p15.5
204018_x_at       HBA1 /// HBA2      16p13.3
204419_x_at       HBG1 /// HBG2      11p15.5
204848_x_at       HBG1 /// HBG2      11p15.5
205919_at         HBE                11p15.5
206647_at         HBZ                16p13.3
206834_at         HBD                11p15.5
209116_x_at       HBB                11p15.5
209458_x_at       HBA1 /// HBA2      16p13.3
211696_x_at       HBB                11p15.5
211699_x_at       HBA1 /// HBA2      16p13.3
211745_x_at       HBA1 /// HBA2      16p13.3
213515_x_at       HBG1 /// HBG2      11p15.5
214414_x_at       HBA1 /// HBA2      16p13.3
216036_at         HBBP1              11p15.5
217232_x_at       HBB                11p15.5
217414_x_at       HBA1 /// HBA2      16p13.3
217683_at         HBE                11p15.5
220807_at         HBQ                16p13.3
240336_at         HBM                16p13.3

4. Selection of Clustering Probe Sets: High CV, ROSE and COPA

A. Selection of High CV Probe Sets

[0177] Each of the remaining 54,504 filtered probe sets was ordered by its coefficient of variation (CV = standard deviation/mean). The 254 probe sets with the highest CVs were used for the HC clustering.

B. Selection of COPA Probe Sets

[0178] The COPA method was applied essentially as described by Tomlins et al. First, the median expression for each probe set was adjusted to zero. Secondly, the median absolute deviation from median (MAD) was calculated and the intensities for each probe set were divided by its MAD. Finally, these MAD-normalized intensities at the 95th percentile were sorted. To make the clustering methods more directly comparable, an equal number of probe sets (254) was selected from the top of the sorted list and was used for clustering.

C. Selection of ROSE Probe Sets

[0179] ROSE (Recognition of Outlier by Sampling Ends) was developed as an alternative method for outlier detection. In COPA, units of MAD at a fixed point (typically either the 90th or 95th percentile) rank the outliers. This fixed-point threshold confers a size bias for the clusters (higher percentile levels favor smaller groups of outlier signals). More importantly, the ranking of probe sets is by the magnitude of their deviation. Those with the greatest deviations will dominate the top of the list. The potential drawback to this is that larger groups of related samples with outlier signals may be missed if the magnitude of their variance is not extremely high.

In contrast, ROSE applies a single threshold for the magnitude of the deviation and then orders the probe sets by the size of the largest sampled group that satisfies this cutoff. Regardless of the magnitude of the difference from median, all probe sets that satisfy the threshold cutoff and are within the designated size range are considered equal. Details of the ROSE method, as it was applied in this study, follow. The intensity values for each of the 54,504 probe sets were plotted individually in ascending order. The plots were divided into thirds and the intensities from the middle third were used to generate trend lines by least squares fitting. Groups of 2*k (where k is an integer from 2 to one third of the sample size) were sampled from each end of the intensity plots and the median intensities of these groups were compared to the trend lines. The choice of a trend line as the metric, rather than simply the median, is meant to reduce the number of probe sets that simply have a high variance but do not necessarily contain distinct clusters of outlier samples.

FIG. 22 (S2) illustrates how this is accomplished. Increasing sized groups are sampled from each end until the median intensity of a group fails to exceed the desired threshold. The largest value of k at which each probe set surpasses the threshold is recorded. The probe sets are then ordered by their maximum k values. In this study a probe set was selected for clustering if k≥6 and the median intensity of the sampled group was at least 7-fold its corresponding point on the trend line. This threshold for k was selected in order to enrich for groups in the range of 10 or more members (greater than 5% of the population size). Smaller groups, although still possibly quite interesting, are much less likely to yield statistically significant results. The 7-fold threshold was chosen to minimize the impact of signal noise on probe set selection and also to limit the total number of probe sets to be used for clustering. Only 254 probe sets out of 54,504 (0.5%) satisfied these criteria of 7× threshold and k values ≥6.

D. Outlier Probe Set Selection for CCG 1961 (Validation Cohort)

[0180] Masking and filtering were applied to the CCG 1961 data set in exactly the same way as in P9906. ROSE used the same 7-fold threshold for intensity and k≥6. 167 probe sets (0.3% of the 54,504) satisfied these criteria. COPA clustering
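The high-CV and COPA rankings of Sections 4A and 4B can be sketched for a single probe set as below. This is a hedged illustration, not the software used in the study: the function names are hypothetical, the sample standard deviation is assumed for the CV (the text does not say which estimator was used), and the 95th percentile is taken by the nearest-rank rule, which is likewise an assumption.

```python
from statistics import mean, median, stdev

def coefficient_of_variation(values):
    """CV = standard deviation / mean (sample standard deviation assumed)."""
    return stdev(values) / mean(values)

def copa_score(values, percentile=95):
    """MAD-scaled intensity at the given percentile for one probe set.

    Steps as in the text: centre on the median, divide by the median
    absolute deviation (MAD), then read off the chosen percentile.
    Assumes the MAD is non-zero.
    """
    med = median(values)
    centred = [v - med for v in values]
    mad = median([abs(c) for c in centred])
    scaled = sorted(c / mad for c in centred)
    # Nearest-rank percentile using integer arithmetic (assumed convention).
    rank = max(1, (percentile * len(scaled) + 99) // 100)
    return scaled[rank - 1]

# Toy probe set (hypothetical): 19 ordinary samples and one strong outlier.
toy = list(range(1, 20)) + [100]
score = copa_score(toy)
cv = coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9])
print(score, cv)
```

In use, each of the 54,504 probe sets would receive a score in this way, the scores would be sorted in descending order, and the top 254 retained, mirroring the selection described above.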

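The ROSE procedure of Section 4C can be sketched for the high-intensity end of one probe set as follows. Several details are assumptions made only for illustration and are not specified in the text: the trend line is fitted against rank position, the "corresponding point" on the trend line is taken at the rank position of the group's median, only the upper end is sampled, and the function names are hypothetical.

```python
from statistics import median

def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def rose_max_k(intensities, fold=7.0):
    """Largest k for which the median of the top 2*k samples is at least
    `fold` times the trend-line value at the group's median rank position."""
    values = sorted(intensities)
    n = len(values)
    third = n // 3
    mid_ranks = list(range(third, n - third))          # middle third of the plot
    slope, intercept = fit_line(mid_ranks, [values[i] for i in mid_ranks])
    best_k = 0
    for k in range(2, third + 1):
        group = values[n - 2 * k:]                     # top 2*k samples
        centre = n - k - 0.5                           # rank position of the group median
        trend = slope * centre + intercept
        if trend <= 0 or median(group) < fold * trend:
            break                                      # group no longer exceeds the threshold
        best_k = k
    return best_k

# Toy probe set (hypothetical): 22 baseline samples and 8 strong outliers.
with_outliers = [100 + i for i in range(22)] + [5000] * 8
no_outliers = [100 + i for i in range(30)]
k_out = rose_max_k(with_outliers)
k_flat = rose_max_k(no_outliers)
selected = k_out >= 6                                  # selection rule from the text
print(k_out, k_flat, selected)
```

Unlike the COPA score, the recorded value depends on how many samples exceed the fold threshold rather than on how far they exceed it, which is the size-oriented ranking the text contrasts with COPA.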

TABLE S7A'-continued

Probe Sets Used in P9906 and CCG1961. The probe sets common to HC and either COPA or ROSE are shown in bold; those shared between COPA and either HC or ROSE are italicized.

HC COPA ROSE HC COPA ROSE 2501 at 223278 at 214774 x at 226677 at 236430 a. 230472 at 2859 x at 223449 at 215177 s at 226757 at 236489 at 230537 at 13005 s at 2235.02 s at 215182 X at 226818 at 236633 a 230,698 at 13150 at 223720 at 215379 X at 226913 s at 236773 a 230803 s at 3194 at 223885 at 215692 s at 227099 s at 236967 a 230817 at 13258 at 224215 s at 216623 x at 227195 at 237069 s at 231040 at 13317 at 225.369 at 217083 at 227289 at 237238 a 231166 at 13371 at 225436 at 217 109 at 227439 at 23.7717 x at 23 1223 at 3418 at 225483 at 217110 s at 227697 at 237828 a 23 1257 at 13479 at 225496 s at 217276 X at 227949 at 237978 a 23455 at 3488 at 225.660 at 217281 X at 22.8057 at 238018 at 231513 at 3791 at 225681 at 217284 X at 2282.62 at 238689 a 231771 at 13808 at 226282 at 217963 S at 228297 at 238900 a 23 1899 at 13844 at 226415 at 218086 at 228434 at 239361 a 232231 at 3993 at 226913 s at 218330 s at 228462 at 240179 at 232523 at 14349 at 227099 s at 28468 S at 228854 at 240336 at 232636 at 14651 s at 227289 at 218469 at 228863 at 240758 at 232914 S at 14774 x at 227439 at 218847 at 229638 at 240794 a 233225 at 5108 x at 227440 at 219463 at 229661 at 241527 a 234261 at 15177 s at 227441 s at 219470 x at 229985 at 241535 a. 235521 at 5214 at 227711 at 219489 S at 23.0128 at 242172 a. 
235666 at 15379 X at 227949 at 219837 S at 230255 at 242385 a 23.5911 at 15692 s at 228O17 s at 220010 at 230291 s at 242457 at 235988 at 5784 at 22.8057 at 220059 at 230537 at 242468 at 23.6430 at 6320 x a 228434 at 220377 at 230788 at 242747 a 236489 at 6336 X a 228462 at 220416 at 230791 at 243533 x at 236773 at 6401 X a 228599 a 221254 S at 231202 at 244002 a 238018 at 6491 x a 228854 at 22 1933 at 23 1223 at 244155 x at 238689 at 6560 x a 228863 at 222921 s at 23 1257 at 244665 at 239657 X at 16623 x at 228918 a 222934 S at 231771 at 244750 a 240179 at 6853 x a 229029 a 223 21 S at 232231 at 244782 a 240336 at 6874 at 229149 a 223786 at 232523 at 1552398 a at 240758 at 6984 x a 22.9233 a 224215 s at 232629 at 552767 a. at 241535 at 17109 at 2294.61 x at 224.520 s at 232636 at 553629 a. at 241960 at 1 7110 s at 229638 at 225436 at 233225 at 553963 at 24.272 at 7143 s at 229661 at 225483 at 234830 at 554343 a at 242385 at 7148 x a 229967 a 225496 s at 235249 at 554912 at 242457 at 7165 x a 229975 a. 225597 at 235371 at 555220 a. at 242468 at 7179 x a 229985 at 225681 at 235988 at 1555579 s at 243533 x at 7235 x a 230030 a. 226084 at 236489 at 555745 a. at 244665 at 7258 x a 2301.10 a 226282 at 2374.71 at 1557534 at 2447.50 at 738.8 s at 230306 a 226415 at 237613 at 557876 at 1552398 a at 7623 at 230468 s at 226676 at 23.7625 s at 559394 a at 1552511 a. at 8145 at 230472 a 226733 at 238018 at 559459 at 1552767 a at 9093 at 230537 at 226913 s at 238423 at 15594.77 s at 1553629 a at 9360 s at 230668 a 227006 at 24O104 at 559842 at 1554343 a at 9666 at 230698 a 227099 s at 2401 79 at 559865 at 1554.633 a at 219714 S at 2308.03 s at 227289 at 240336 at 560315 at 1555579 s at 220010 at 230817 a 227439 at 240758 at 560642 at 555745 a at 220416 at 231040 a 2274.40 at 241960 at 561025 at 1555756 a. at 221215 s at 23 1223 at 227441 S at 242457 at 563868 a. at 1557534 at 221766 s at 23 1257 at 227949 at 242468 at 566825 at 1559394 a at 22 1933 at 231455 a. 
22807 s at 242541 at 568603 at 1559459 at 222288 at 231706 s at 22.8057 at 243533 x at 569591 at 15594.77 s at 223278 at 231771 at 2282.62 at 244463 at 569663 at 1561025 at 223678 s at 231899 a 228297 at 244665 at 570058 at 1566.825 at 223786 at 232231 at 228434 at CCG 1961 Probe sets (167) 223939 at 232530 a. 228462 at 22421.5 s at 233225 at 228854 at 117 at 1554008 at 1554008 at 225496 s at 233847 x at 228863 at 1554008 at 1555167 s at 1555167 s at 225681 at 234261 a 22.9233 at 1554140 at 1555216 a. at 1555578 at 226034 at 234803 a 2294.61 x at 1554655 a. at 1555578 at 1555579 s at 226084 at 234849 a 229638 at 1555167 s at 1555579 s at 1559394 a at 226189 at 234985 a 229661 at 1555579 s at 1557534 at 15594.77 s at 226325 at 235284 S at 229975 at 1557534 at 1559280 a. at 1560.109 S at 226415 at 235666 a 229985 at 1559280 a. at 1559394 a at 1560225 at 226492 at 235721 a 2301 10 at 1559477 s at 1559477 s at 1560483 at 226621 at 23.5911 a 23.0128 at 1559696 at 1560.109 S at 1560581 at 226676 at 235988 at 230130 at 15599.10 at 1560225 at 1565558 at US 2011/0230372 A1 Sep. 22, 2011 49


HC COPA ROSE COPA ROSE 1560225 at 1562903 at 200800 S at 211743 s at 213371 at 21577 S at 1562903 at 1565558 at 201579 at 212148 at 2 3423 x at 216623 x at 1567912 S at 200800 S at 201842 s at 212554 at 213558 at 217 109 at 201131 s at 201579 at 202178 at 2 1294.2 s at 213566 at 217I10s at 201215 at 201842 s at 202289 S at 3032 at 214020 x at 217963 S at 201243 s at 202178 at 202581 at 13150 at 214043 at 28922 S at 201842 s at 202289 S at 202890 at 13317 at 214446 at 2 19355 at 201843 s at 202478 at 203038 at 13371 at 214651 s at 219463 at 202007 at 202581 at 203290 at 3380 x at 2497.8 s at 219489 S at 202609 at 202890 at 203373 at 3418 at 21577 S at 2198.40 S at 203131 at 203038 at 203434 s at 3436 at 215305 at 2198.55 at 203216 s at 203290 at 2034.76 at 3479 at 216623 x at 220276 at 203290 at 2034.76 at 203695 S at 13558 at 217 109 at 220377 at 203304 at 203695 S at 203835 at 3791 at 217I10 s at 220922 s at 203632 s at 203835 at 20386.5 S at 3993 at 217963 S at 222 162 S at 204014 at 20386.5 S at 204014 at 3994 s at 28922 S at 222288 at 204015 s at 204014 at 204015 s at 4433 s at 219225 at 2224.50 at 204066 s at 204069 at 204069 at 14651 s at 2 19355 at 223.075 S. 
at 204069 at 20414 at 20414 at 4769 at 219463 at 223754 at 204337 at 204304 S at 204439 at 4774 x at 219489 S at 223786 at 204895 x at 204416 X at 204895 x at 5108 x at 2198.40 S at 224022 x at 205253 at 204439 at 204913 s at 5121 X at 2198.55 at 224762 at 205382 s at 204895 x at 2049.4 S at 15305 at 220276 at 225.369 at 205413 at 204914 S at 20495 S at 5733 x at 220377 at 225782 at 205493 s at 20495 S at 204944 at 6320 x at 220528 at 225977 at 205573 s at 204944 at 20509s at 16623 x at 222 162 S at 226034 at 205627 at 20509 S at 205253 at 17109 at 222258 s at 226096 at 205857 at 205253 at 205413 at 17110 s at 222288 at 226282 at 205899 at 205382 s at 205489 at 7138 x at 222347 at 226636 at 205942 s at 205413 at 20554.4 S at 8507 at 2224.50 at 22693 S at 205951 a 205477 s at 205592 at 9093 at 223319 at 227006 at 205980 S. at 205489 at 205857 at 19225 at 223422 S at 227289 at 205987 a 20554.4 S at 205870 at 219525 at 223786 at 227372 s at 206070 s at 205627 at 205899 at 220225 at 224022 x at 227377 at 206084 a 205857 at 20593.6 s at 221731 x at 224762 at 227441 S at 206135 at 205870 at 205946 at 221870 at 225977 at 227949 at 206204 a 205899 at 20611 at 221901 at 226034 at 228018 at 206207 at 205936 S at 206181 at 222288 at 226096 at 22.8057 at 206298 a 205946 at 206207 at 222315 at 226282 at 22816 at 206371 at 20611 at 20643 S at 2224.50 at 226636 at 228262 at 206432 a 206135 at 206756 at 222885 at 22693 S at 228462 at 206741 a 206181 at 208.285 at 223235 s at 227289 at 228863 at 206756 at 206207 at 209291 at 223611 s at 227372 s at 228994 at 206785 S at 206371 at 209392 at 22361.2 s at 227377 at 2291.08 at 206851 a 20643 S at 209570 S at 223786 at 227441 S at 2292.47 at 2O7638 a 206710 s at 20.9602 S at 224022 x at 227949 at 229638 at 207768 a 206756 at 209822 S at 225575 a. 228018 at 229975 at 207802 a 206881 s at 209905 at 225842 a. 22.8057 at 230030 at 208029 s at 208.285 at 210016 at 226034 at 22816 at 230668 at 208090 s. 
at 208470 s at 210665 at 226676 a 228262 at 250680 at 208148 a 209291 at 210683 at 226677 a 228462 at 23 1040 at 208.605 s at 209392 at 211306 S at 227174 a 228863 at 23 1223 at 209289 a 209570 S at 211382 s at 227289 at 228994 at 2312.57 at 209291 at 20.9602 S at 211560 S at 227372 s at 229638 at 231316 at 209436 a 209822 S at 211743 s at 227481 a 229661 at 231455 at 209.687 a 209905 at 212148 at 227758 a 229963 at 23 1600 at 209774 X at 210016 at 2125 at 228462 at 229975 at 231859 at 209905 at 210432 s at 212592 at 228766 a 230472 at 23 1899 at 210095 s at 210683 at 21294.2 s at 228780 a 250680 at 232010 at 2101.35 s at 211306 S at 23005 S at 228863 at 23 1040 at 23223 at 210402 at 211518 s at 213050 at 22.9147 a 23 1223 at 232636 at 210546 X at 211560 S at 213317 at 229638 at 23 1257 at 232903 at 210664 S at 212094 at 213371 at 229934 a 231503 at 234985 at 210665 at 212148 at 2 3423 x at 229963 at 23 1600 at 235343 at 210683 at 2125 at 213906 at 2301.10 a 23 1899 at 235557 at 211276 at 212592 at 214020 x at 23O372 a. 232010 at 235988 at 211518 s at 23005 S at 214446 at 230495 a 23223 at 236430 at 211674 X at 213150 at 214651 s at 23 1040 at 232636 at 236489 at 211719 x at 213317 at 2497.8 s at 23 1223 at 235557 at 237207 at US 2011/0230372 A1 Sep. 22, 2011 50

TABLE S7A'-continued
Probe Sets Used in P9906 and CCG1961. The probe sets common to HC and either COPA or ROSE are shown in bold; those shared between COPA and either HC or ROSE are italicized.

HC              COPA            ROSE
231899_at       235911_at       237421_at
232523_at       235988_at       237466_s_at
233038_at       236489_at       238617_at
233463_at       237421_at       238778_at
233969_at       237466_s_at     239657_x_at
235004_at       237974_at       239964_at
235557_at       238617_at       240032_at
235700_at       239610_at       240179_at
235771_at       239657_x_at     240245_at
236301_at       239964_at       240336_at
237802_at       240032_at       240347_at
238091_at       240245_at       240466_at
238175_at       240347_at       240496_at
240758_at       240466_at       241506_at
242172_at       240496_at       241960_at
243533_x_at     242172_at       242172_at
243917_at       242747_at       242468_at
243932_at       243917_at       243917_at

TABLE S7B
Overlap of Probe Sets Used in Either P9906 or CCG1961

                        COPA            ROSE
P9906 (254 total)
  HC                    96 (37.8%)      135 (53.1%)
  COPA                                  169 (66.5%)
  HC & COPA                             94 (37.0%)
CCG 1961 (167 total)
  HC                    55 (32.9%)      46 (27.5%)
  COPA                                  130 (77.8%)
  HC & COPA                             42 (25.1%)

TABLE S7C
Common P9906 and CCG1961 Probe Sets by Method

                HC (1961)       COPA (1961)     ROSE (1961)
HC (9906)       55 (32.9%)      56 (33.5%)      59 (35.3%)
COPA (9906)     36 (21.6%)      66 (39.5%)      68 (40.7%)
ROSE (9906)     45 (26.9%)      75 (44.9%)      77 (46.1%)

5. Overlap of P9906 Clusters Defined by Each Method

[0182] Each of the three clustering methods in P9906 identified predominantly the same samples even though they shared only 37% of the probe sets (Table S7B). As shown in Table S8, the overall identity of samples across all three methods is 86.5%. The primary factor responsible for this being lower than ~90% is that HC and ROSE identified a cluster 4, while COPA did not. All 23 of the patients with TCF3-PBX1 translocations were grouped into a single cluster (cluster 2) by all three methods, as were 19 of the 21 patients with MLL translocations (cluster 1). Even though the remaining clusters lacked known underlying translocations they were also very highly conserved.

TABLE S8
Identity of Membership in P9906 Clusters

                                 Cluster
                       1    2    3    4    5    6    7    8   Overall
HC v COPA             19   23    8    0    9   19   88   19    89.4%
HC v ROSE             20   23    8   10    9   19   82   22    93.2%
COPA v ROSE           20   23   10    0   10   21   82   20    89.9%
HC v COPA v ROSE      19   23    8    0    9   19   82   19    86.5%

6. Probesets Associated with ROSE Clusters (by Median Rank Order)

The top 100 median rank order probe sets for each ROSE cluster are given. Percentile denotes the ranking of the median cluster rank order relative to the maximum possible. Bold font indicates that these probe sets were also among the 254 outliers selected for clustering. Probe sets marked with an asterisk (including several PCDH17, GAB1, GPR110, CENTG2 and CD99) indicate those for which Affymetrix does not specify a gene; however, the probe sets were mapped using the UCSC Genome Browser (http://genome.ucsc.edu/) between exons of the indicated genes. Those with a question mark were also lacking Affymetrix gene data, but were mapped within 10 kb of the indicated gene using the UCSC Genome Browser.

TABLE S9'
Top 100 Rank Order Genes Defining ROSE Cluster 1 (R1)

Probeset        Percentile   Symbol       EntrezID    Cytoband
219463_at       100          C20orf103    24141       20p12
205899_at       100          CCNA1        8900        13q12.3-q13
235479_at       100          CPEB2        132864      4p15.33
226939_at       100          CPEB2        132864      4p15.33
241706_at       100          CPNE8        144402      12q12
236921_at       100          EMB*                     5q11.1
222603_at       100          ERMP1        79956       9p24
213147_at       100          HOXA10       3206        7p15-p14
213150_at       100          HOXA10       3206        7p15-p14
235521_at       100          HOXA3        3200        7p15-p14
214651_s_at     100          HOXA9        3205        7p15-p14
209905_at       100          HOXA9        3205        7p15-p14
215163_at       100          IGF2BP2:                 3q27.2
226789_at       100          LOC647121    647121      1p11.2
202890_at       100          MAP7         9053        6q23.3
238498_at       100          MAP7?                    6q23.3
204069_at       100          MEIS1        4211        2p14-p13
242172_at       100          MEIS1        4211        2p14-p13
1559477_s_at    100          MEIS1        4211        2p14-p13
219033_at       100          PARP8        79668       5q11.1
204304_s_at     100          PROM1        8842        4p15.32
242414_at       100          QPRT         23475       16p11.2
204044_at       100          QPRT         23475       16p11.2
1568589_at      100          REEP3:                   10q21.3
231899_at       100          ZC3H12C      85463       11q22.3
220416_at       99.5         ATP8B4       79895       15q21.2
225841_at       99.5         C1orf59      113802      1p13.3
227877_at       99.5         C5orf39      389289      5p12
212063_at       99.5         CD44         960         11p13
213844_at       99.5         HOXA5        3202        7p15-p14
218847_at       99.5         IGF2BP2      10644       3q27.2
201163_s_at     99.5         IGFBP7       3490        4q12
201105_at       99.5         LGALS1       3956        22q13.1
228412_at       99.5         LOC643072    643072      2q24.2
240180_at       99.5         MAP7                     6q23.3
201153_s_at     99.5         MBNL1        4154        3q25
1558111_at      99.5         MBNL1        4154        3q25

TABLE S9'-continued
Top 100 Rank Order Genes Defining ROSE Cluster 1 (R1)

Probeset      Percentile  Symbol         EntrezID   Cytoband
1556658_a_at  99.5        MB
238558_at     99.5        MB
244008_at     99.5        PARP8,
204082_at     99.5        SO90
230480_at     99.5        PIWIL4         143689
232231_at     99.5        RUNX2          860
211769_x_at   99.5        SERINC3        10955
226415_at     99.5        VAT1L          57687
203827_at     99.5        WIPI1          55062
242023_at     99          AB             63874
202603_at     99          ADAM10
215925_s_at   99          CD72           971
228365_at     99          CPNE           144402
214297_at     99          CSPG4          1464
200046_at     99          DAD1           1603
227002_at     99          FAM78A         286336
235291_s_at   99          FL 32255       643977
238712_at     99          FOXP1:8
204417_at     99          GALC           2581
235173_at     99          hCG_1806964    401093
201162_at     99          IGFBP7         3490
232544_at     99          IGFBP73
241391_at     99
1557534_at    99          LOC339862      339862
1556657_at    99          MBNL1:
219988_s_at   99          RNF220         55182
221473_x_at   99          SERINC3        10955
206506_s_at   99          SUPT3H         8464
213836_s_at   99          WIPI1          55062
218581_at     98.5        AB             63874
214895_s_at   98.5        ADAM10         102
212174_at     98.5        AK2            204
203562_at     98.5        FEZ1           9638
235753_at     98.5        HOXA7          3204       7p15-p14
213910_at     98.5        IGFBP7         3490       4q12
1569041_at    98.5        JMJD1C:                   10q21.2
203836_s_at   98.5        MAP3K5         4217       6q22.33
203837_at     98.5        MAP3K5         4217       6q22.33
201152_s_at   98.5        MBNL1          4154       3q25
235879_at     98.5        MBNL1          4154       3q25
225202_at     98.5        RHOBTB3        22836      5q15
227719_at     98.5        SMAD9          4093       13q12-q14
225959_s_at   98.5        ZNRF1          84937      16q23.1
223382_s_at   98.5        ZNRF1          84937      16q23.1
210783_x_at   98          CLEC11A        6320       19q13.3
232645_at     98          LOC153684      153684     5p12
241681_at     98          MBNL1*                    3q25.2
202976_s_at   98          RHOBTB3        22836      5q15
227611_at     98          TARSL2         123283     15q26.3
209825_s_at   98          UCK2           7371       1q23
223383_at     98          ZNRF1          84937      16q23.1
36553_at      97.5        ASMTL          8623       Xp22.3; Yp11.3
224848_at     97.5        CDK6           1021       7q21-q22
213379_at     97.5        COQ2           27235      4q21.23
209101_at     97.5        CTGF           1490       6q23.1
218147_s_at   97.5        GLT8D1         55830      3p21.1
218468_s_at   97.5        GREM1          26585      15q13-q15
227235_at     97.5        GUCY1A3        2982       4q31.3-q33|4q31.1-q31.2
206289_at     97.5        HOXA4          3201       7p15-p14
227384_s_at   97.5        LOC727820      727820     1q21.1
203537_at     97.5        PRPSAP2        5636       17p11.2-p12
226168_at     97.5        ZFAND2B        130617     2q35
225962_at     97.5        ZNRF1          84937      16q23.1

TABLE S10'
Top 100 Rank Order Genes Defining ROSE Cluster 2 (R2)

Probeset      Percentile  Symbol         EntrezID   Cytoband
227440_at     100         ANKS1B         56899      12q23.1
227441_s_at   100         ANKS1B         56899      12q23.1
227439_at     100         ANKS1B         56899      12q23.1
234261_at     100         ANKS1B                    12q23.1
243533_x_at   100         ANKS1B                    12q23.1
202206_at     100         ARL4C          10123      2q37.1
229247_at     100         FBLN7          129804     2q13
239657_x_at   100         FOXO6          100132074  1p34.1
202106_at     100         GOLGA3         2802       12q24.33
213005_s_at   100         KANK1          23189      9p24.3
207110_at     100         KCNJ12         3768       17p11.2
232289_at     100         KCNJ12         3768       17p11.2
208567_s_at   100         KCNJ12 /// LOC100131509 /// LOC100134444  100131509 /// 100134444 /// 3768  17p11.2
213909_at     100         LRRC15         131578     3q29
206028_s_at   100         MERTK          10461      2q14.1
211913_s_at   100         MERTK          10461      2q14.1
238778_at     100         MPP7           143098     10p11.23
212789_at     100         NCAPD3         23310      11q25
212148_at     100         PBX1           5087       1q23
212151_at     100         PBX1           5087       1q23
205253_at     100         PBX1           5087       1q23
227949_at     100         PHACTR3        116154     20q13.32
231095_at     100         PITPNC1:                  17q24.2
202178_at     100         PRKCZ,         5590       1p36.33-p36.2
223693_s_at   100         RADIL          55698      7p22.1
222513_s_at   100         SORBS1         10580      10q23.3-q24.1


TABLE S12'-continued
Top 100 Rank Order Genes Defining ROSE Cluster 4 (R4)

Probeset      Rank        Symbol         EntrezID   Cytoband
226832_at     98.0%
202273_at     98.0%       PDGFRB         5159       5q31-q32
225376_at     98.0%       C20orf11       54994      20q13.33
225281_at     98.0%       C3orf17        25871      3q13.2
201096_s_at   98.0%       ARF4           378        3p21.2-p21.1
203948_s_at   97.5%       MPO            4353       17q23.1
1558017_s_at  97.5%
203949_at     97.5%       MPO            4353       17q23.1
1555392_at    97.5%       LOC100128868   100128868  7q31.2
227541_at     97.5%       WDR20          91833      14q32.31
1567458_s_at  97.5%       RAC1           5879
213920_at     97.5%       CUX2           23316
224734_at     97.5%       HMGB1          3146
206673_at     97.5%       GPR176         11245
224636_at     97.5%       ZFP91          80829
235232_at     97.5%       GMEB1          10691
208762_at     97.5%       SUMO1          7341
36612_at      97.0%       FAM168A        23201
225240_s_at   97.0%       MSI2           124540
336_at        97.0%       TBXA2R         6915
223101_s_at   97.0%       ARPC5L         81873
209049_s_at   97.0%       ZMYND8         23613
217940_s_at   97.0%       CARKD          55739      13q34
216508_x_at   97.0%       CTCFL /// HMGB1 /// HMGB1L1 /// HMGB1L10 /// LOC100132863  100130561 /// 100132863 /// 10357 /// 140690 /// 3146  3q12 /// 20q13.31 /// 3.32 /// 2.1 /// 9q33.2
201266_at     97.0%       TXNRD1         7296
212286_at     97.0%       ANKRD12        23253
200618_at     97.0%       LASP1          3927
227577_at     97.0%       EXOC8          149371
203068_at     97.0%       KLHL21         9903
217787_s_at   97.0%       GALNT2         2590
239930_at     97.0%       GALNT2         2590
227700_x_at   97.0%       ATAD3A         55210
225694_at     97.0%       CRKRS          51755
202514_at     97.0%       DLG1           1739
226115_at     97.0%       AHCTF1         25909
1562948_at    97.0%
225456_at     97.0%       MED1           5469
208821_at     97.0%       SNRPB          6628
212204_at     97.0%       TMEM87A        25963
231124_x_at   97.0%       LY9            4063
218118_s_at   97.0%       TIMM23         10431
212272_at     96.5%       LPIN1          23175
220684_at     96.5%       TBX21          30009
216836_s_at   96.5%       ERBB2          2064
232521_at     96.5%       PCSK7          9159
205839_s_at   96.5%       BZRAP1         9256
218031_s_at   96.5%       FOXN3          1112
226640_at     96.5%       DAGLB          221955
213514_s_at   96.5%       DIAPH1         1729
225494_at     96.5%       DYNLL2         140735
213222_at     96.5%       PLCB1          23236
212594_at     96.5%       PDCD4          27250
201133_s_at   96.5%       PJA2           9867
235463_s_at   96.5%       LASS6          253782
200047_s_at   96.5%       YY1            7528
201407_s_at   96.5%       PPP1CB         5500
1552931_a_at  96.5%       PDE8A          5151
242467_at     96.5%
213860_x_at   96.5%       CSNK1A1        1452
212927_at     96.5%       SMC5           23137
227237_x_at   96.5%       ATAD3B /// LOC732419  732419 /// 83858
200775_s_at   96.5%       HNRNPK         3190
210203_at     96.5%       CNOT4          4850
214352_s_at   96.5%       KRAS           3845
1555772_a_at  96.5%       CDC25A         993
212696_s_at   96.5%       RNF4           6047
235233_s_at   96.5%       GMEB1          10691


TABLE S19'-continued
Copy Number Analysis (CNA) Variations Associated with ROSE Clusters

                 1   2   3   5   6   8   no cluster  FET p-value
LOC440742*       0   0   0   0   0   3   3           0.3136
TOX              0   0   0   0   0   3   4           0.3430
FBXW7            0   0   0   0   0   2   1           0.3779
RB1              0   4   0   1   1   2   12          0.3886
FHIT             0   0   0   0   0   1   0           0.5505
MSRA             0   0   0   1   0   0   3           0.6230
ARID1B           0   1   0   1   1   2   3           0.6751
ARPP-21          0   0   0   0   0   2   5           0.6777
Histone cluster  0   0   0   0   0   2   6           0.6782
MBNL1            0   0   1   0   0   1   3           0.6815
ATP10A           0   0   0   1   0   1   3           0.6815
iAmp21           0   0   0   0   0   1   7           0.6879
NRAS             0   0   0   0   1   0   2           0.7695
ADAR             0   0   0   0   0   1   1           0.7992
COPEB-KLF6       0   0   0   0   0   1   1           0.7992
CCDC26           2   1   0   1   3   3   8           0.8732
ABL1             0   0   0   0   0   1   2           0.9109
NR3C2            0   0   0   0   0   1   4           0.9751
ARHGAP24         0   0   0   0   0   1   3           1.0000
ZMYM5            0   0   0   0   0   0   3           1.0000
SPRED1 (5')      0   0   0   0   0   0   0           1.0000
LTK              0   0   0   0   0   0   0           1.0000

The CNA variations are shown along with their membership in each ROSE cluster. FET indicates the p-value for this result as determined by Fisher's Exact Test. CNA variations are sorted in ascending order by their p-values.
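The FET p-values in Table S19' come from Fisher's Exact Test. As a rough illustration of how such a p-value is obtained, the sketch below computes the two-sided test for the simpler 2x2 case (lesion present or absent versus inside or outside a single cluster). This is an assumption made for illustration: the table reports one p-value per CNA across all cluster columns, which would need the more general r x c form of the test. The function name `fisher_exact_2x2` and the counts used are hypothetical and are not taken from the table.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be "in cluster" / "not in cluster" and columns
    "lesion present" / "lesion absent" (an assumed layout).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def weight(k):
        # Unnormalised hypergeometric weight of the table whose top-left cell is k.
        return comb(row1, k) * comb(row2, col1 - k)

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    observed = weight(a)
    # Sum the weights of all tables at least as unlikely as the observed one.
    extreme = sum(
        weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed
    )
    return extreme / comb(n, col1)


# Made-up counts: lesion in 3 of 4 cluster members and 1 of 4 non-members.
print(round(fisher_exact_2x2(3, 1, 1, 3), 4))  # 0.4857
```

Working with integer weights and dividing once at the end avoids floating-point ties when deciding which tables count as "at least as extreme".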

[0299] 31. Den Boer ML, van Slegtenhorst M, De Menezes RX, et al. A subtype of childhood acute lymphoblastic leukaemia with poor treatment outcome: a genome-wide classification study. Lancet Oncol. 2009;10(2):125-134.
[0300] 32. Harvey RC, Davidson GS, Wang X, et al. Expression profiling identifies novel genetic subgroups with distinct clinical features and outcome in high-risk pediatric precursor B acute lymphoblastic leukemia (B-ALL). A Children's Oncology Group Study. Blood. 2007;110: Abstract 1430.
[0301] 33. Russell LJ, Capasso M, Vater I, et al. IGH@ translocations involving the pseudoautosomal region 1 (PAR1) of both sex chromosomes deregulate the cytokine receptor-like factor 2 (CRLF2) gene in B cell precursor acute lymphoblastic leukemia (BCP-ALL). Blood. 2008;112: Abstract 787.
[0302] 34. Russell LJ, Capasso M, Vater I, et al. Deregulated expression of cytokine receptor gene, CRLF2, is involved in lymphoid transformation in B cell precursor acute lymphoblastic leukemia. Blood. 2009.
[0303] 35. Juric D, Lacayo NJ, Ramsey MC, et al. Differential gene expression patterns and interaction networks in BCR-ABL-positive and -negative adult acute lymphoblastic leukemias. J Clin Oncol. 2007;25(11):1341-1349.

REFERENCES
Fourth Set 4th Supplement

[0304] 1. Ross ME, Zhou XD, Song GC, et al. Classification of pediatric acute lymphoblastic leukemia by gene expression profiling. Blood. 2003;102(8):2951-2959.
[0305] 2. Mullighan CG, Su X, Zhang J, et al. Deletion of IKZF1 and prognosis in acute lymphoblastic leukemia. N Engl J Med. 2009;360(5):470-480.
[0306] 3. Borowitz MJ, Devidas M, Hunger SP, et al. Clinical significance of minimal residual disease in childhood acute lymphoblastic leukemia and its relationship to other prognostic factors: a Children's Oncology Group study. Blood. 2008;111(12):5477-5485.
[0307] 4. Bhojwani D, Kang H, Menezes RX, et al. Gene expression signatures predictive of early response and outcome in high-risk childhood acute lymphoblastic leukemia: a Children's Oncology Group Study on behalf of the Dutch Childhood Oncology Group and the German Cooperative Study Group for Childhood Acute Lymphoblastic Leukemia. J Clin Oncol. 2008;26(27):4376-4384.
[0308] 5. Tomlins SA, Rhodes DR, Perner S, et al. Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science. 2005;310(5748):644-648.

1. A method for predicting therapeutic outcome in a leukemia patient comprising:
(a) obtaining a biological sample from a patient;
(b) determining in said sample the expression level for at least two gene products selected from the group consisting of the gene products which are set forth in Tables 1P or alternatively 1Q hereof, to yield observed gene expression levels; and
(c) comparing the observed gene expression levels for the gene products to a control gene expression level selected from the group consisting of
(i) the gene expression level for the gene products observed in a control sample; and
(ii) a predetermined gene expression level for the gene products;
wherein an observed expression level that is higher or lower than the control gene expression level is indicative of predicted remission or therapeutic failure.
2. The method of claim 1 wherein said at least two gene products includes at least three gene products from Table 1P.
3. The method of claim 1 wherein said at least two gene products includes at least three gene products from Table 1Q hereof.
4. The method of claim 1 wherein said at least two gene products are selected from the group consisting of BMPR1B; CTGF; IGJ; LDB3; PON2; RGS2; SCHIP1 and SEMA6A.
5. The method of claim 1 wherein said gene product includes at least two gene products selected from the group consisting of BMPR1B; CA6; CRLF2; GPR110; IGJ; LDB3; MUC4; NRXN3; PON2; RGS2 and SEMA6A.
6. The method according to claim 1 wherein said gene products include at least three gene products.
7. The method according to claim 1 wherein said gene products include at least four gene products.
8. (canceled)
9. (canceled)
10. (canceled)
11. (canceled)
12. (canceled)
13. (canceled)
14. (canceled)
15. (canceled)
16. The method according to claim 1 wherein at least one of said gene products is CRLF2.
17. The method according to claim 1 wherein said leukemia patient has been diagnosed with acute lymphoblastic leukemia (ALL).
18. The method according to claim 1 wherein said leukemia patient has been diagnosed with B-precursor acute lymphoblastic leukemia (B-ALL).
19. The method according to claim 18 wherein said leukemia patient is a pediatric leukemia patient.
20. The method according to claim 1 wherein an observed expression level which is greater than a control expression level is indicative of an unfavorable therapeutic outcome.
21. The method according to claim 1 wherein an observed expression level which is greater than a control expression level is indicative of a favorable therapeutic outcome.
22. The method according to claim 1 wherein an observed expression level of at least one gene product selected from the group consisting of BMPR1B; C8orf38; CDC42EP3; CTGF; DKFZP761M1511; ECM1; GRAMD1C; IGJ; LDB3; LOC400581; LRRC62; MDFIC; NT5E; PON2; SCHIP1; SEMA6A; TSPAN7 and TTYH2 which is greater than a control expression level is indicative of an unfavorable therapeutic outcome.
23. The method according to claim 4 wherein an observed expression level of at least one gene product selected from the group consisting of BMPR1B; CTGF; IGJ; LDB3; PON2; SCHIP1 and SEMA6A which is greater than a control expression level is indicative of an unfavorable therapeutic outcome.
24. The method according to claim 1 wherein an observed expression level of at least one gene product selected from the group consisting of BTG3; C14orf32; CD2; CHST2; DDX21; FMNL2; MGC12916; NFKBIB; NR4A3; RGS1;

RGS2; UBE2E3 and VPREB1 which is greater than a control expression level is indicative of a favorable therapeutic outcome.
25. The method according to claim 1 wherein an observed expression level of at least one gene product selected from the group consisting of BMPR1B; BTBD11; C21orf87; CA6; CDC42EP3; CKMT2; CRLF2; CTGF; DIP2A; GIMAP6; GPR110; IGFBP6; IGJ; KIF1C; LDB3; LOC391849; LOC650794; MUC4; NRXN3; PON2; RGS3; SCHIP1; SCRN3; SEMA6A and ZBTB16 which is greater than a control expression level is indicative of an unfavorable therapeutic outcome.
26. The method according to claim 5 wherein an observed expression level of at least one gene product selected from the group consisting of BMPR1B; CA6; CRLF2; GPR110; IGJ; LDB3; MUC4; NRXN3; PON2; RGS2 and SEMA6A which is greater than a control expression level is indicative of an unfavorable therapeutic outcome.
27. The method according to claim 4 wherein an observed expression level of RGS2 which is greater than a control expression level is indicative of a favorable therapeutic outcome.
28. The method according to claim 1 wherein said gene products are selected from the group consisting of CA6, IGJ, MUC4, GPR110, LDB3, PON2, RGS2 and CRLF2.
29. The method according to claim 1 wherein said gene products further include AGAP-1 (Arf GAP with GTP-binding protein-like, ANK repeat and PH domains) and/or PCDH17 (Protocadherin-17).
30. A method for predicting therapeutic outcome in a leukemia patient comprising:
(a) obtaining a biological sample from a patient;
(b) determining in said sample the expression level of gene products for at least five of the genes of Tables 1P or alternatively, 1Q hereof to yield observed gene expression levels; and
(c) comparing the observed gene expression levels for the gene products to a control gene expression level selected from the group consisting of
(i) the gene expression level for the gene products observed in a control sample; and
(ii) a predetermined gene expression level for the gene products;
wherein an observed expression level that is higher or lower than the control gene expression level is indicative of predicted remission or an unfavorable therapeutic outcome.
31. The method according to claim 30 wherein the expression levels of BMPR1B; CA6; CRLF2; GPR110; IGJ; LDB3; MUC4; NRXN3; PON2 and SEMA6A which is above a control expression level is indicative of an unfavorable therapeutic outcome and the expression level of RGS2 which is above a control expression level is indicative of a favorable therapeutic outcome.
32. The method according to claim 30 wherein the expression levels of CA6; CRLF2; GPR110; IGJ; LDB3; MUC4 and PON2 which is above a control expression level is indicative of an unfavorable therapeutic outcome and the expression level of RGS2 which is above a control expression level is indicative of a favorable therapeutic outcome.
33. The method according to claim 30 wherein said patient is diagnosed with B-precursor acute lymphoblastic leukemia (B-ALL).
34. The method according to claim 33 wherein said patient is a pediatric patient.
35. The method according to claim 30 wherein said gene products further include AGAP-1 (Arf GAP with GTP-binding protein-like, ANK repeat and PH domains) and/or PCDH17 (Protocadherin-17).
36. A method for screening compounds useful for treating acute lymphoblastic leukemia comprising:
(a) determining the expression level for at least three gene products selected from the group consisting of the gene products of Table 1P or alternatively, Table 1Q in a cell culture to yield observed gene expression levels prior to contact with a candidate compound;
(b) contacting the cell culture with a candidate compound;
(c) determining the expression level for the gene products in the cell culture to yield observed gene expression levels after contact with the candidate compound; and
(d) comparing the observed gene expression levels before and after contact with the candidate compound wherein a change in the gene expression levels after contact with the compound is indicative of therapeutic utility for said compound.
37. The method according to claim 36 wherein said gene products are selected from the group consisting of BMPR1B; CA6; CRLF2; GPR110; IGJ; LDB3; MUC4; NRXN3; PON2; and SEMA6A and an observed expression level of BMPR1B; CA6; CRLF2; GPR110; IGJ; LDB3; MUC4; NRXN3; PON2; and/or SEMA6A which is the same as or higher than a control expression level is indicative of an unfavorable or inactive therapeutic compound.
38. The method according to claim 36 wherein said gene products are selected from the group consisting of BMPR1B; CA6; CRLF2; GPR110; IGJ; LDB3; MUC4; NRXN3; PON2; and SEMA6A and an observed expression level of BMPR1B; CA6; CRLF2; GPR110; IGJ; LDB3; MUC4; NRXN3; PON2; and/or SEMA6A which is less than a control expression level is indicative of a favorable therapeutic outcome.
39. The method of claim 36 wherein said at least three gene products includes CRLF-2.
40. The method of claim 36 comprising determining the expression level for at least five of said gene products.
41. The method according to claim 36 wherein said leukemia is B-precursor acute lymphoblastic leukemia (B-ALL).
42. The method according to claim 41 wherein said leukemia is pediatric B-ALL.
43. The method according to claim 36 wherein said gene products further include AGAP-1 (Arf GAP with GTP-binding protein-like, ANK repeat and PH domains) and/or PCDH17 (Protocadherin-17).
44. A method for screening compounds useful for treating acute lymphoblastic leukemia comprising:
(a) contacting an experimental cell culture with a candidate compound;
(b) determining the expression level for at least three gene products selected from the group consisting of the gene products of Table 1P or alternatively, Table 1Q in the cell culture to yield experimental gene expression levels; and
(c) comparing the experimental gene expression levels of step b) to the expression level of the gene products in a control cell culture, wherein a relative difference in the gene expression levels between the experimental and control cultures is indicative of therapeutic utility.
45. The method according to claim 44 wherein said gene products are selected from the group consisting of BMPR1B;

CA6; CRLF2; GPR110; IGJ; LDB3; MUC4; NRXN3; PON2; RGS2; SEMA6A and mixtures thereof.
46. The method according to claim 45 wherein the expression of all eleven gene products is measured and compared to expression of said eleven gene products in said control cell culture.
47. The method according to claim 44 wherein said gene products includes CRLF2.
48. The method according to claim 44 wherein said gene products further include AGAP-1 (Arf GAP with GTP-binding protein-like, ANK repeat and PH domains) and/or PCDH17 (Protocadherin-17).
49. (canceled)
50. (canceled)
51. (canceled)
52. (canceled)
53. (canceled)
54. (canceled)
55. A method for predicting therapeutic outcome in a leukemia patient comprising:
(a) obtaining a biological sample from a patient;
(b) determining in said sample the expression level for at least three gene products selected from the group consisting of BMPR1B; CA6; CRLF2; GPR110; IGJ; LDB3; MUC4; NRXN3; PON2; RGS2 and SEMA6A to yield observed gene expression levels; and
(c) comparing the observed gene expression levels for the gene products to a control gene expression level selected from the group consisting of
(i) the gene expression level for the gene products observed in a control sample; and
(ii) a predetermined gene expression level for the gene products;
wherein an observed expression level that is higher or lower than the control gene expression level is indicative of predicted therapeutic failure.
56. The method according to claim 55 wherein said leukemia is B-precursor acute lymphoblastic leukemia (B-ALL).
57. The method according to claim 55 wherein said leukemia is pediatric B-ALL.
58. The method according to claim 55 wherein said gene products include CRLF2.
59. The method according to claim 55 wherein said gene products further include AGAP-1 (Arf GAP with GTP-binding protein-like, ANK repeat and PH domains) and/or PCDH17 (Protocadherin-17).
60. The method according to claim 55 wherein said gene products wherein a more aggressive traditional therapy or an experimental therapy is recommended for said leukemia patient.
61. (canceled)
62. (canceled)
63. (canceled)
64. (canceled)
65. (canceled)
66. (canceled)
67. (canceled)
68. (canceled)
69. (canceled)
70. A kit comprising a microchip embedded thereon polynucleotide probes specific for at least two prognostic genes selected from the group as set forth in Table 1P or alternatively, Table 1Q.
71. The kit according to claim 70 wherein said prognostic genes are selected from the group consisting of BMPR1B; CA6; CRLF2; GPR110; IGJ; LDB3; MUC4; NRXN3; PON2; RGS2 and SEMA6A.
72. (canceled)
73. A kit comprising at least two antibodies which are each specific at least for two different polypeptides selected from the group consisting of gene products as set forth in Table 1P or alternatively, Table 1Q.
74. (canceled)
75. (canceled)
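
Illustrative note (not part of the claims): the comparison recited in claim 31 can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions. The function name `indications`, the dictionary layout, and the numeric values are hypothetical and do not come from the application; only the gene lists and the direction of each indication follow the wording of claim 31. How expression levels are measured, normalized, or thresholded is not specified here and is left out.

```python
# Illustrative sketch only; not part of the claims.
# Gene lists follow claim 31; everything else (names, values) is hypothetical.
UNFAVORABLE_IF_HIGH = frozenset({
    "BMPR1B", "CA6", "CRLF2", "GPR110", "IGJ",
    "LDB3", "MUC4", "NRXN3", "PON2", "SEMA6A",
})
FAVORABLE_IF_HIGH = frozenset({"RGS2"})


def indications(observed, control):
    """Return {gene: "unfavorable" | "favorable"} for genes measured above control.

    observed and control map gene symbols to expression levels on the same
    (assumed) scale. Genes at or below control, genes without a control
    value, and genes outside the two lists produce no entry.
    """
    result = {}
    for gene, level in observed.items():
        if gene not in control or level <= control[gene]:
            continue  # claim 31 only addresses levels above the control level
        if gene in UNFAVORABLE_IF_HIGH:
            result[gene] = "unfavorable"
        elif gene in FAVORABLE_IF_HIGH:
            result[gene] = "favorable"
    return result


# Hypothetical, unitless values chosen only to exercise the function.
observed = {"CRLF2": 8.0, "RGS2": 5.0, "MUC4": 1.0}
control = {"CRLF2": 2.0, "RGS2": 3.0, "MUC4": 4.0}
print(indications(observed, control))
# prints {'CRLF2': 'unfavorable', 'RGS2': 'favorable'}
```

The sketch reports a per-gene indication rather than a single combined prediction, because claim 31 states a direction for each gene and does not recite how several genes would be combined into one score.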