US 20060183141A1
(19) United States
(12) Patent Application Publication
Chang et al.
(10) Pub. No.: US 2006/0183141 A1
(43) Pub. Date: Aug. 17, 2006

(54) GENE EXPRESSION SIGNATURE FOR PREDICTION OF HUMAN CANCER PROGRESSION

(75) Inventors: Howard Yuan-Hao Chang, Burlingame, CA (US); Julie Sneddon, Palo Alto, CA (US); Patrick O. Brown, Stanford, CA (US)

Correspondence Address: BOZICEVIC, FIELD & FRANCIS LLP, 1900 UNIVERSITY AVENUE, SUITE 200, EAST PALO ALTO, CA 94303 (US)

(73) Assignee: The Board of Trustees of the Leland Stanford Junior University

(21) Appl. No.: 11/332,547

(22) Filed: Jan. 12, 2006

Related U.S. Application Data

(60) Provisional application No. 60/643,610, filed on Jan. 12, 2005.

Publication Classification

(51) Int. Cl.: C12Q 1/68 (2006.01); G06F 9/00 (2006.01)

(52) U.S. Cl.: 435/6; 702/20

(57) ABSTRACT

Methods are provided for classification of cancers by the expression of a set of genes referred to as the core serum response (CSR), or a subset thereof. The expression pattern of the CSR in normal tissues correlates with that seen in quiescent fibroblasts cultured in the absence of serum, while cancer tissues can be classified as having a quiescent or induced CSR signature. Patients with the induced CSR signature have a higher probability of metastasis. Classification according to CSR signature allows optimization of treatment, and determination of whether to proceed with a specific therapy, and how to optimize dose, choice of treatment, and the like.

Patent Application Publication Aug. 17, 2006 Sheet 1 of 12 US 2006/0183141 A1


[FIG. 3. Panel A: tumor subtype key (Normal-like, ErbB-2+, Basal, Luminal) and p53 status, with samples grouped as Quiescent or Activated. Panel B: Breast CA Kaplan-Meier curves for Quiescent vs. Activated tumors (X marks censored cases); horizontal axes: Survival (months) and Relapse Free Survival (months), 0 to 100.]

[FIG. 6. Kaplan-Meier panels, P = 5.6 E-10; horizontal axes: Survival (Years) and Met-Free Survival (Years), 0 to 12.]

[FIG. 8. Panel A: NIH High Risk, P=0.0003; horizontal axis: Met-Free Survival (Years), 0 to 12. Panel B: St. Gallen High Risk; horizontal axis: Met-Free Survival (Years), 0 to 12. Panel C: Decision guides (NIH, SG, Activated) for patients with >10 year followup; key: Chemo, No chemo.]

[FIG. 10. Kaplan-Meier curves for Activated, Quiescent, and indeterminate groups, p = 3.8 E-9; horizontal axes: Survival (Years) and Distant Metastasis Free Probability (Years), 0 to 12.]

[FIG. 11. Panel A: correlation of each sample to the subtype centroids (vertical axis from -0.7 to 0.7); key: Basal, Luminal B, ErbB2, Normal-like, Luminal A, unclassified.]

[FIG. 11, Panel B:]

Subtype        N     Wound Signature   70 Gene Signature
Basal          45    42                44
ErbB2          39    18                32
Luminal A      47    10                11
Luminal B      45    33                42
Normal-like    10    0                 3
Unclassified   109   23                48

[FIG. 11, Panel C: ErbB2 subtype (Activated and Poor, N=16; Not activated+poor, N=23) and Luminal B subtype (Activated and Poor, N=33; Not activated+poor, N=12); horizontal axes: Overall Survival (Years), 0 to 12.]


GENE EXPRESSION SIGNATURE FOR PREDICTION OF HUMAN CANCER PROGRESSION

[0001] This invention was made with Government support under contract NIH CA77097 awarded by the National Institutes of Health. The Government has certain rights in this invention.

[0002] In recent years, microarray analysis of gene expression patterns has provided a way to improve the diagnosis and risk stratification of many cancers. Unsupervised analysis of global gene expression patterns has identified molecularly distinct subtypes of cancer, distinguished by extensive differences in gene expression, in diseases that were considered homogeneous based on classical diagnostic methods. Such molecular subtypes are often associated with different clinical outcomes. Global gene expression patterns can also be examined for features that correlate with clinical behavior to create prognostic signatures.

[0003] Cancer, like many diseases, is not the result of a single, well-defined cause, but rather can be viewed as several diseases, each caused by different aberrations in informational pathways, which ultimately result in apparently similar pathologic phenotypes. Identification of polynucleotides that are differentially expressed in cancerous, pre-cancerous, or low metastatic potential cells relative to normal cells of the same tissue type can provide the basis for diagnostic tools, facilitates drug discovery by providing targets for candidate agents, and further serves to identify therapeutic targets for cancer therapies that are more tailored to the type of cancer to be treated.

[0004] Identification of differentially expressed gene products also furthers the understanding of the progression and nature of complex diseases such as cancer, and is key to identifying the genetic factors that are responsible for the phenotypes associated with development of, for example, the metastatic phenotype. Identification of gene products that are differentially expressed at various stages, and in various types of cancers, can both provide for early diagnostic tests, and further serve as therapeutic targets. Additionally, the product of a differentially expressed gene can be the basis for screening assays to identify chemotherapeutic agents that modulate its activity (e.g. its expression, biological activity, and the like).

[0005] By detailing the expression level of thousands of genes simultaneously in tumor cells and their surrounding stroma, gene expression profiles of tumors can provide “molecular portraits” of human cancers. The variations in gene expression patterns in human cancers are multidimensional and typically represent the contributions and interactions of numerous distinct cells and diverse physiological, regulatory, and genetic factors. Although gene expression patterns that correlate with different clinical outcomes can be identified from microarray data, the biological processes that the genes represent and thus the appropriate therapeutic interventions are generally not obvious.

[0006] Gene expression patterns provide a common language among biologic phenomena and allow an alternative approach to infer physiologic and molecular mechanisms from complex human disease states. Starting with the gene expression profile of cells manipulated in vitro to simulate a biologic process, the expression profile can then be used to interpret the gene expression data of human cancers and test specific hypotheses. However, as in other methodologies, reproducibility and scales for interpretation should be evaluated before this strategy can be generally adopted for biologic discovery and clinical use.

[0007] Early disease diagnosis is of central importance to halting disease progression, and reducing morbidity. Analysis of a patient's tumor to identify gene expression patterns provides the basis for more specific, rational cancer therapy that may result in diminished adverse side effects relative to conventional therapies. Furthermore, confirmation that a tumor poses less risk to the patient (e.g., that the tumor is benign) can avoid unnecessary therapies. In short, identification of gene expression patterns in cancerous cells can provide the basis of therapeutics, diagnostics, prognostics, therametrics, and the like.

[0008] Since the classic observations of the many histologic similarities between the tumor microenvironment and normal wound healing, it has been proposed that tumor stroma is “normal wound healing gone awry.” During normal wound healing, coagulation of extravasated blood initiates a complex cascade of signals that recruit inflammatory cells, stimulate fibroblast and epithelial cell proliferation, direct cell migration, and induce angiogenesis to restore tissue integrity. Many of these normally reparative processes may be constitutively active in the tumor milieu and critical for tumor engraftment, local invasion, and metastasis to distant organs. Indeed, keratinocytes from the wound edge transiently exhibit many similarities to their transformed counterparts in squamous cell carcinomas. Epidemiologically, chronic wound and inflammatory states are well-known risk factors for cancer development: the connection between cirrhosis and liver cancer, gastric ulcers and gastric carcinoma, and burn wounds and subsequent squamous cell carcinoma (so-called Marjolin's ulcer) are but a few examples. In the genetic blistering disorder recessive dystrophic epidermolysis bullosa, nearly 80% of the patients develop aggressive squamous cell carcinoma in their lifetime, attesting to the powerful inductive environment of wounds for cancer development.

[0009] In recent years, the roles of angiogenesis, extracellular matrix remodeling, and directed cell motility in cancer progression have been intensely studied. Nonetheless, a comprehensive molecular view of wound healing and its relationship to human cancer is still lacking. Thus, there is currently no established method to quantify the risk of cancer from wounds diagnostically or to intervene therapeutically.

[0010] Fibroblasts are ubiquitous mesenchymal cells in the stroma of all epithelial organs and play important roles in organ development, wound healing, inflammation, and fibrosis. Fibroblasts from each anatomic site of the body are differentiated in a site-specific fashion and thus may play a key role in establishing and maintaining positional identity in tissues and organs. Tumor-associated fibroblasts have previously been shown to promote the engraftment and metastasis of orthotopic tumor cells of many epithelial lineages. The genomic response of foreskin fibroblasts to serum, the soluble fraction of coagulated blood, represents a broadly coordinated and multifaceted wound-healing program that includes regulation of hemostasis, cell cycle progression, epithelial cell migration, inflammation, and angiogenesis.

[0011] The identification of a canonical gene expression signature of the fibroblast serum response might provide a molecular gauge for the presence and physiologic significance of the wound-healing process in human cancers. The present invention addresses this issue.

SUMMARY OF THE INVENTION

[0012] Methods are provided for classification of cancers, particularly carcinomas. The global transcriptional response of fibroblasts to serum integrates many processes involved in wound healing, which response is characterized herein by the expression of a set of genes referred to as the core serum response (CSR), or a subset thereof. A predominantly biphasic pattern of expression for the CSR is found in diverse cancers, including breast cancers, lung cancers, gastric cancers, prostate cancers, and hepatocellular carcinoma. The expression pattern of the CSR in normal tissues correlates with that seen in quiescent fibroblasts cultured in the absence of serum, while cancer tissues can be classified as having a quiescent or induced CSR signature. Patients with the induced CSR signature have a higher probability of metastasis. Classification according to CSR signature allows optimization of treatment, and determination of whether to proceed with a specific therapy, and how to optimize dose, choice of treatment, and the like.

[0013] In another embodiment of the invention, methods are provided for statistical analysis of expression profile data to determine whether a pattern of expression or response will be predictive of a phenotype of interest.

[0014] In some embodiments of the invention, hierarchical clustering can be used to assess the similarity between the CSR signature and a test gene expression profile, by setting an arbitrary threshold for assigning a cancer to one of two groups. Alternatively, in a preferred embodiment, the threshold for assignment is treated as a parameter, which can be used to quantify the confidence with which patients are assigned to each class. The threshold for assignment can be scaled to favor sensitivity or specificity, depending on the clinical scenario. In one such method, the CSR expression profile in a test sample is correlated to a vector representing the centroid of the differential expression of the reference CSR signature. The correlation value to the reference centroid generates a continuous score that can be scaled. In multivariate analysis, the CSR signature is an independent predictor of metastasis and death and provides a high level of prognostic information.

[0015] In an alternative embodiment, a decision tree algorithm is used to identify patients with clinically meaningful differences in outcome. At each node in the decision tree, all clinical risk factors and gene expression profiles are considered, patients with divergent outcomes are identified using the dominant risk factor, and the process is reiterated on each subgroup until the patients or risk factors are exhausted.

BRIEF DESCRIPTION OF THE DRAWINGS

[0016] The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.

[0017] FIGS. 1A-1C. Identification and Annotation of a Common Serum Response in Fibroblasts. FIG. 1A. The fibroblast common serum response. Genes with expression changes that demonstrate coordinate induction or repression by serum in fibroblasts from ten anatomic sites are shown. Each row represents a gene; each column represents a sample. The level of expression of each gene in each sample, relative to the mean level of expression of that gene across all the samples, is represented using a red-green color scale as shown in the key; gray indicates missing data. Representative genes with probable function in cell cycle progression (orange), matrix remodeling (blue), cytoskeletal rearrangement (red), and cell-cell signaling (black) are highlighted by colored text on the right. Three fetal lung fibroblast samples, cultured in low serum, which showed the most divergent expression patterns among these samples, are indicated by blue branches. FIG. 1B. Identification of cell cycle-regulated genes in the common serum response signature. The expression pattern of each of the genes in (A) during the HeLa cell cycle over 46 h after synchronization by double thymidine block is shown. Transit of cells through S and M phases during the timecourse, verified by flow cytometry, is indicated below. Approximately one-quarter of the genes demonstrate a periodic expression pattern and are therefore operationally annotated as cell cycle genes; the remainder of the genes are used in further analyses to define the CSR. FIG. 1C. Validation of annotation by temporal expression profiles. Timecourse of gene expression changes in a foreskin fibroblast culture after shifting from 0.1% to 10% FBS is shown. Global gene expression patterns were determined using cDNA microarrays containing 36,000 genes; genes whose transcript levels changed by at least 3-fold during the timecourse and those in (A) are displayed. The cell cycle genes identified in the analysis illustrated in (B) were found to have a distinct temporal expression pattern with coordinate upregulation at 12 h.

[0018] FIG. 2. Survey of Fibroblast CSR Gene Expression in Human Cancers. Expression patterns of available CSR genes in over 500 tumors and corresponding normal tissues were extracted, filtered as described in Materials and Methods, and organized by hierarchical clustering. The response of each gene in the fibroblast serum response is shown on the right bar (red shows activated; green shows repressed by serum). The strong clustering of the genes induced or repressed, respectively, in fibroblasts in response to serum exposure, based solely on their expression patterns in the tumor samples, highlights their coordinate regulation in tumors. The dendrograms at the top of each data display represent the similarities among the samples in their expression of the fibroblast CSR genes; tumors are indicated by black branches, normal tissue by green branches.

[0019] FIGS. 3A-3B. Context, Stability, and Prognostic Value of Fibroblast CSR in Breast Cancer. FIG. 3A. Expression patterns of CSR genes in a group of breast carcinomas and normal breast tissue. Genes and samples were organized by hierarchical clustering. The serum response of each gene is indicated on the right bar (red shows induced; green shows repressed by serum). Note the biphasic pattern of expression that allows each tumor sample to be classified as “activated” or “quiescent” based on the expression of the CSR genes. The previously identified tumor phenotype (color code) and p53 status (solid black box shows mutated; white box shows wild-type) are shown. Pairs of tumor samples from the same patient, obtained before and after surgery and chemotherapy,

are connected by black lines under the dendrogram. Two primary tumor-lymph node metastasis pairs from the same patient are connected by purple lines. FIG. 3B. Kaplan-Meier survival curves for the two classes of tumors. Tumors with serum-activated CSR signature had worse disease-specific survival and relapse-free survival compared to tumors with quiescent CSR signature. Similar results were obtained whether performing classification using all breast tumors in this dataset or just the 58 tumors from the same clinical trial.

[0020] FIGS. 4A-4D. Prognostic Value of Fibroblast CSR in Epithelial Tumors. Kaplan-Meier survival curves of tumors stratified into two classes using the fibroblast CSR are shown for stage I and IIA breast cancer, FIG. 4A; stage I and II lung adenocarcinoma, FIG. 4B; lung adenocarcinoma of all stages, FIG. 4C; and stage III gastric carcinoma, FIG. 4D.

[0021] FIG. 5. Histological Architecture of CSR Gene Expression in Breast Cancer. Representative ISH of LOXL2 and SDFR1 and IHC of PLOD2, PLAUR, and ESDN are shown (magnification, 200x). Panels for LOXL2, PLAUR, PLOD2, and ESDN represent cores of normal and invasive ductal breast carcinoma from different patients on the same tissue microarray. Panels for SDFR1 demonstrate staining in adjacent normal and carcinoma cells on the same tissue section. Arrows highlight spindle-shaped stromal cells that stain positive for SDFR1 and PLOD2. No signal was detected for the sense probe for ISH or for control IHC without the primary antibody.

[0022] FIGS. 6A-6C. Prognostic value of fibroblast core serum response in breast cancer. FIG. 6A. Unsupervised hierarchical clustering of 295 breast cancer samples using 442 available CSR genes. Each row represents a gene; each column represents a sample. The level of expression of each gene, in each sample, relative to the mean level of expression of that gene across all the samples, is represented using a red-green color scale as shown in the key; gray indicates missing data. The identity of each gene in the fibroblast serum response is shown on the right bar (red indicates activated; green indicates repressed by serum). The dendrogram at the top indicates the similarities among the samples in their expression of the CSR genes. Two main groups of tumors were observed: one group with similar expression to serum-activated fibroblasts, termed “Activated”, and a second group with a reciprocal expression pattern of CSR genes, termed “Quiescent”. Two small subsets of the quiescent group with more heterogeneous expression patterns are indicated by yellow bars. FIG. 6B, FIG. 6C. Kaplan-Meier survival curves for the two classes of tumors. Tumors with the activated wound response signature had worse overall survival (OS) and distant metastasis-free probability (DMFP) compared to tumors with a quiescent wound signature. 126 tumors were classified as Activated and 169 tumors as Quiescent. For Activated vs. Quiescent groups, 10 year OS are 50% vs. 84% (p=5.6x10^-10) and 10 year DMFP are 51% vs. 75% (p=8.6x10), respectively.

[0023] FIG. 7. Decreased survival of tumors with activated wound signature independent of tumor size or lymph node status. Left: In tumors <20 mm (pT1) (N=155, 48 Activated, 107 Quiescent), the 10 year overall survival (OS) for the Activated vs. Quiescent groups is 62% vs. 85%, respectively (p=0.0009). Middle: In lymph node negative patients (N=151, 48 Activated, 103 Quiescent), 10 year OS for the Activated vs. Quiescent groups is 52% vs. 80%, respectively (p<0.00001). Right: In lymph node positive patients (N=144, 64 Activated, 80 Quiescent), 10 year OS for the Activated vs. Quiescent groups is 51% vs. 90%, respectively (p=0.00002).

[0024] FIGS. 8A-8C. A scalable wound signature as a guide for chemotherapy. FIG. 8A. Supervised wound signature adds prognostic information within the group of high risk patients identified by NIH consensus criteria. According to the NIH criteria, 284 patients are high risk and advised to undergo adjuvant chemotherapy; 72 patients had tumor-positive lymph nodes. Patients were classified using the serum activated fibroblast centroid (threshold=-0.15). 10 year DMFP for the Activated (n=221) vs. Quiescent (n=61) group is 58% vs. 83%, respectively (p=0.0002). FIG. 8B. Supervised wound signature stratifies St. Gallen criteria high risk patients. According to St. Gallen criteria, 271 patients are high risk and advised to undergo adjuvant treatment; 72 patients had tumor-positive lymph nodes. Using the supervised wound signature, the 10 year DMFP for the Activated (n=217) vs. Quiescent (n=56) group is 59% vs. 83%, respectively (p=0.0005). FIG. 8C. Graphical representation of the number of patients advised to undergo adjuvant systemic treatment and their eventual outcomes based on the supervised wound signature, the NIH, or St. Gallen criteria in the 185 patients in this dataset that did not receive adjuvant chemotherapy. 40 patients had tumor-positive lymph nodes. Yellow indicates chemotherapy; blue indicates no chemotherapy. The bar at the left side shows which patients have developed distant metastasis as first event: black indicates distant metastasis; white indicates no metastasis. Thus blue in the lower bar indicates the potentially undertreated patients, and yellow in the upper bar shows the potentially overtreated patients.

[0025] FIGS. 9A-9D. Integration of diverse gene expression signatures for risk prediction. FIG. 9A. Compendium of gene expression signatures in 295 breast tumors. Correlation value to canonical centroids of classes defined by intrinsic genes (Basal, luminal A, luminal B, ErbB2, vs. normal-like), by the 70 genes (Poor prognosis vs. good), and by the wound signature (Activated vs. quiescent). Each row is a class; each column is a sample. Lower panel shows corresponding clinical outcomes; black vertical bar indicates death or metastasis as the first recurrence event. FIG. 9B. Summary of decision tree analysis. At each node, the dominant risk factor in multivariate analysis is used to segregate patients, and the process is repeated in each subgroup until patients or risk factors are exhausted. We found that the 70 gene signature was able to identify a group of patients with very good prognosis (group 0), and then the wound signature could divide the patients called “poor” by the 70 gene signature into those with moderate and significantly worse outcomes (groups 1 and 2). FIG. 9C. Distribution of 144 lymph node positive patients among the 3 groups defined in (B). Because the 70 gene signature was identified using a select subset of 60 patients with lymph node negative disease, the decision tree incorporating the 70 gene signature was performed on the independent lymph node positive subset to have an unbiased evaluation of risk prediction. Hazard ratios of metastasis risk after adjusting for all other factors listed in Table 1 are shown for the 3 subgroups stratified by the decision tree. FIG. 9D. Distant metastasis free probabilities of patients stratified by the decision tree analysis. 55, 32, and 57 patients are in group 0, 1, and 2 respectively, and 10 year DMFP for the 3 groups was 89%, 78%, and 47%, respectively (p=6.94x10).

[0026] FIG. 10. Clinical outcomes of patients with indeterminate expression of the wound response signature (yellow bar in FIG. 6) are intermediate between patients with activated and quiescent wound response signatures.

[0027] FIGS. 11A-11C. Expression of the 5 molecular subtypes in early breast cancer and improved risk stratification by addition of wound response and 70-gene signatures. FIG. 11A. Correlation of gene expression pattern in 295 breast cancer samples to the centroids of the 5 molecular subtypes. The strongest positive correlation (>0.10) determines the subtype (1). The individual patient branches are colored according to the subclass as defined by centroid correlation. Note that the basal subtype is most clearly defined, but >100 samples could not be assigned to any subtype. FIG. 11B. Tabular summary of patients in each tumor subtype with the activated wound response signature or poor prognosis 70-gene signature. Classification by the unsupervised wound response signature from FIG. 1 was applied for consistency. FIG. 11C. Improved risk stratification by integration of signatures. Patients in the ErbB2 (left) or Luminal B (right) subtypes were stratified by whether they have both the wound response and 70-gene signatures. Expression of the activated wound and poor prognosis 70-gene signatures conferred additive risk of death.

[0028] FIG. 12. Nonlinear multivariate analysis of prognostic gene expression signatures and clinical risk factors in early stage breast cancer. Shown are the additive contributions of the Wound signature (top row) and the 70 gene Good Prognosis signature (bottom row) to the log-relative-risk in Cox proportional hazard models, in the presence of all standard risk factors (Table 1). In the left column, the outcome is time to distant metastasis, while in the right it is patient survival time. The black curve in each case represents the contribution of the signature as a smooth function, using a basis of natural cubic splines with 4 interior knots. The green curves are pointwise-standard-error curves about the smooth curves. The blue lines are the result when these continuous scores are fit instead by a pair of constants, obtained by thresholding the scores at the values indicated. The thresholds were obtained from the decision tree analysis (FIG. 9B); their mapping to the linear part of the smoothed curves indicates the congruence between the two models. The piecewise-constant fit summarizes the contribution of each of these scores, while the curves give a more detailed contribution. We note that the bends on the extreme two ends of the curves are fitted with less confidence (thus much larger confidence intervals). Although some simple tests indicate evidence for these details, a larger dataset would be required to establish them convincingly.

DETAILED DESCRIPTION OF THE EMBODIMENTS

[0029] Methods are provided for classification of cancers, particularly cancers derived from epithelial type cells, e.g. carcinomas. Classification according to CSR signature allows optimization of treatment, and determination of whether to proceed with a specific therapy, and how to optimize dose, choice of treatment, and the like. Methods are provided for statistical analysis of expression profile data to determine whether a pattern of expression or response will be predictive of a phenotype of interest. Preferably the threshold for assignment is treated as a parameter, which can be used to quantify the confidence with which patients are assigned to each class. The threshold for assignment can be scaled to favor sensitivity or specificity, depending on the clinical scenario. In certain embodiments, the expression profile is determined using a microarray. In other embodiments the expression profile is determined by quantitative PCR or other quantitative methods for measuring mRNA.

[0030] The subject invention also provides a reference CSR expression profile for a response phenotype that is one of: (a) quiescent; or (b) induced; wherein said expression profile is recorded on a computer readable medium.

[0031] For quantitative PCR analysis, the subject invention provides a collection of gene specific primers, said collection comprising: gene specific primers specific for at least about 10, usually at least about 20 of the CSR genes, where in certain embodiments said collection comprises at least 50 gene specific primers, at least 100, or more. The subject invention also provides an array of probe nucleic acids immobilized on a solid support, said array comprising: a plurality of probe nucleic acid compositions, wherein each probe nucleic acid composition is specific for a CSR gene, where in certain embodiments said array further comprises at least one control nucleic acid composition.

[0032] The subject invention also provides a kit for use in determining the phenotype of a source of a nucleic acid sample, said kit comprising: at least one of: (a) an array as described above; or (b) a collection of gene specific primers as described above. The kit may further comprise a software package for data analysis of expression profiles.

[0033] The present application may make reference to information provided in Chang et al. (2004) PLoS Biology 2:206-214, including supplemental materials provided therein, which is herein specifically incorporated by reference in its entirety.

[0034] Before the subject invention is described further, it is to be understood that the invention is not limited to the particular embodiments of the invention described below, as variations of the particular embodiments may be made and still fall within the scope of the appended claims. It is also to be understood that the terminology employed is for the purpose of describing particular embodiments, and is not intended to be limiting. Instead, the scope of the present invention will be established by the appended claims. In this specification and the appended claims, the singular forms “a,” “an” and “the” include plural reference unless the context clearly dictates otherwise.

[0035] Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range, and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.
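By way of illustration only, the centroid-based assignment described above, in which a test sample's CSR expression profile is correlated to a reference centroid and the resulting continuous score is compared against an adjustable threshold, might be sketched as follows. The sketch assumes Pearson correlation as the correlation measure and assumes that scores above the threshold are called activated; neither choice is specified in the text. The function names, the four-gene centroid, and the two sample vectors are hypothetical. The default threshold of -0.15 is the value quoted in the description of FIG. 8A.

```python
import numpy as np

def csr_score(sample, centroid):
    """Pearson correlation between a sample's CSR profile and the reference centroid."""
    sample = np.asarray(sample, dtype=float)
    centroid = np.asarray(centroid, dtype=float)
    # Skip genes with a missing measurement in either vector.
    present = ~(np.isnan(sample) | np.isnan(centroid))
    return float(np.corrcoef(sample[present], centroid[present])[0, 1])

def classify(sample, centroid, threshold=-0.15):
    """Call a sample 'activated' when its score exceeds the threshold (assumed direction)."""
    return "activated" if csr_score(sample, centroid) > threshold else "quiescent"

# Hypothetical centroid: two serum-induced genes followed by two serum-repressed genes.
centroid = np.array([1.0, 1.0, -1.0, -1.0])
tumor_a = np.array([2.0, 1.5, -1.0, -2.5])     # tracks the centroid
tumor_b = np.array([-2.0, np.nan, 1.0, 2.0])   # opposes it; one value is missing

label_a = classify(tumor_a, centroid)
label_b = classify(tumor_b, centroid)
```

Under these assumptions, lowering the threshold assigns more samples to the activated class, which favors sensitivity; raising it favors specificity.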

[0036] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods, devices and materials similar or equivalent to those described herein can be used in the practice or testing of the invention, the preferred methods, devices and materials are now described.

[0037] All publications mentioned herein are incorporated herein by reference for the purpose of describing and disclosing the subject components of the invention that are described in the publications, which components might be used in connection with the presently described invention.

[0038] As summarized above, the subject invention is directed to methods of classification of cancers, as well as reagents and kits for use in practicing the subject methods. The methods may also determine an appropriate level of treatment for a particular cancer.

[0039] Methods are also provided for optimizing therapy, by first classification, and based on that information, selecting the appropriate therapy, dose, treatment modality, etc. which optimizes the differential between delivery of an anti-proliferative treatment to the undesirable target cells, while minimizing undesirable toxicity. The treatment is optimized by selection for a treatment that minimizes undesirable toxicity, while providing for effective anti-proliferative activity.

[0040] The invention finds use in the prevention, treatment, detection or research into any cancer, including prostate, pancreas, colon, brain, lung, breast, bone, and skin cancers. For example, the invention finds use in the prevention, treatment, detection of or research into gastrointestinal cancers, such as cancer of the anus, colon, esophagus, gallbladder, stomach, liver, and rectum; genitourinary cancers such as cancer of the penis, prostate and testes; gynecological cancers, such as cancer of the ovaries, cervix, endometrium, uterus, fallopian tubes, vagina, and vulva; head and neck cancers, such as hypopharyngeal, laryngeal, oropharyngeal cancers, lip, mouth and oral cancers, cancer of the salivary gland, cancer of the digestive tract and sinus cancer; metastatic cancer; sarcomas; skin cancer; urinary tract cancers including bladder, kidney and urethral cancers; endocrine system cancers, such as cancers of the thyroid, pituitary, and adrenal glands and the pancreatic islets; and pediatric cancers.

[0041] "Diagnosis" as used herein generally includes determination of a subject's susceptibility to a disease or disorder, determination as to whether a subject is presently affected by a disease or disorder, prognosis of a subject affected by a disease or disorder (e.g., identification of pre-metastatic or metastatic cancerous states, stages of cancer, or responsiveness of cancer to therapy), and use of therametrics (e.g., monitoring a subject's condition to provide information as to the effect or efficacy of therapy).

[0042] The term "biological sample" encompasses a variety of sample types obtained from an organism and can be used in a diagnostic or monitoring assay. The term encompasses blood and other liquid samples of biological origin, solid tissue samples, such as a biopsy specimen or tissue cultures or cells derived therefrom and the progeny thereof. The term encompasses samples that have been manipulated in any way after their procurement, such as by treatment with reagents, solubilization, or enrichment for certain components. The term encompasses a clinical sample, and also includes cells in cell culture, cell supernatants, cell lysates, serum, plasma, biological fluids, and tissue samples.

[0043] The terms "treatment", "treating", "treat" and the like are used herein to generally refer to obtaining a desired pharmacologic and/or physiologic effect. The effect may be prophylactic in terms of completely or partially preventing a disease or symptom thereof and/or may be therapeutic in terms of a partial or complete stabilization or cure for a disease and/or adverse effect attributable to the disease. "Treatment" as used herein covers any treatment of a disease in a mammal, particularly a human, and includes: (a) preventing the disease or symptom from occurring in a subject which may be predisposed to the disease or symptom but has not yet been diagnosed as having it; (b) inhibiting the disease symptom, i.e., arresting its development; or (c) relieving the disease symptom, i.e., causing regression of the disease or symptom.

[0044] The terms "individual," "subject," "host," and "patient" are used interchangeably herein and refer to any mammalian subject for whom diagnosis, treatment, or therapy is desired, particularly humans. Other subjects may include cattle, dogs, cats, guinea pigs, rabbits, rats, mice, horses, and the like.

[0045] A "host cell", as used herein, refers to a microorganism or a eukaryotic cell or cell line cultured as a unicellular entity which can be, or has been, used as a recipient for a recombinant vector or other transfer polynucleotides, and includes the progeny of the original cell which has been transfected. It is understood that the progeny of a single cell may not necessarily be completely identical in morphology or in genomic or total DNA complement as the original parent, due to natural, accidental, or deliberate mutation.

[0046] The terms "cancer", "neoplasm", "tumor", and "carcinoma" are used interchangeably herein to refer to cells that exhibit relatively autonomous growth, so that they exhibit an aberrant growth phenotype characterized by a significant loss of control of cell proliferation. In general, cells of interest for detection or treatment in the present application include precancerous (e.g., benign), malignant, pre-metastatic, metastatic, and non-metastatic cells. Detection of cancerous cells is of particular interest.

[0047] The term "normal" as used in the context of "normal cell" is meant to refer to a cell of an untransformed phenotype or exhibiting a morphology of a non-transformed cell of the tissue type being examined.

[0048] "Cancerous phenotype" generally refers to any of a variety of biological phenomena that are characteristic of a cancerous cell, which phenomena can vary with the type of cancer. The cancerous phenotype is generally identified by abnormalities in, for example, cell growth or proliferation (e.g., uncontrolled growth or proliferation), regulation of the cell cycle, cell mobility, cell-cell interaction, or metastasis, etc.

[0049] "Therapeutic target" generally refers to a gene or gene product that, upon modulation of its activity (e.g., by modulation of expression, biological activity, and the like), can provide for modulation of the cancerous phenotype.

[0050] As used throughout, "modulation" is meant to refer to an increase or a decrease in the indicated phenomenon (e.g., modulation of a biological activity refers to an increase in a biological activity or a decrease in a biological activity).

[0051] A "CSR signature" is a dataset that has been obtained from multiple fibroblast cells, and provides information on the change in expression of a set of genes following fibroblast exposure to serum. A useful signature may be obtained from all or a part of the gene dataset; usually the signature will comprise information from at least about 20 genes, more usually at least about 30 genes, at least about 35 genes, at least about 45 genes, at least about 50 genes, or more, up to the complete dataset. Where a subset of the dataset is used, the subset may comprise upregulated genes, downregulated genes, or a combination thereof.

[0052] Various methods for analysis of a set of data may be utilized. In one embodiment, expression data is subjected to transformation and normalization. For example, ratios are generated by (1) mean centering the expression data for each gene (by dividing the intensity measurement for each gene on a given array by the average intensity of the gene across all arrays), (2) log-transforming (base 2) the resulting ratios, and (3) median centering the expression data across arrays, then across genes.

[0053] For cDNA microarray data, genes with fluorescent hybridization signals at least 1.5-fold greater than the local background fluorescent signal in the reference channel are considered adequately measured. The genes are centered by mean value within each dataset, and average linkage clustering is carried out. The samples are segregated into two classes based on the first bifurcation in the hierarchical clustering "dendrogram". The clustering and reciprocal expression of serum-induced and serum-repressed genes in tumor expression data allows two classes to be unambiguously assigned. Samples with generally high levels of expression of the serum-induced genes and low levels of expression of the serum-repressed genes are classified as "activated" or "induced"; conversely, samples with generally high levels of expression of serum-repressed genes and low levels of expression of the serum-induced genes are classified as "quiescent".

[0054] In an alternative approach that quantifies the similarity of CSR gene expression in tumors vs. in cultured fibroblasts, the expression pattern of CSR genes in the fibroblast types is averaged to derive a single number for each gene. The Pearson correlation of the averaged fibroblast expression pattern with the cancer sample is then calculated. The Pearson correlation data allows the cancer sample to be assigned as having a positive correlation to the fibroblast serum-induced expression pattern, or as being anti-correlated with serum-induced expression. For example, using a Pearson correlation of 0.2 as the cutoff, a Cox-Mantel test confirmed that cancers with high correlation to fibroblast serum-induced expression of CSR genes demonstrate poorer disease-specific survival and relapse-free survival.

[0055] To address the level of redundancy of CSR genes in achieving tumor classification, a shrunken centroid analysis has been applied, using Prediction Analysis of Microarrays (PAM). Using a 10-fold balanced leave-one-out training and testing procedure, it has been shown that approximately 5% of the CSR dataset is sufficient to recapitulate the classification.

[0056] A scaled approach may also be taken to the data analysis. Pearson correlation of the expression values of CSR genes of tumor samples to the serum-activated fibroblast centroid results in a quantitative score reflecting the wound response signature for each sample. The higher the correlation value, the more the sample resembles serum-activated fibroblasts ("activated" wound response signature). A negative correlation value indicates the opposite behavior and higher expression of the "quiescent" wound response signature. The threshold for the two classes can be moved up or down from zero depending on the clinical goal. For example, sensitivity and specificity for predicting metastasis as the first recurrence event have been calculated for every threshold between -1 and +1 for the correlation score in 0.05 increments. A threshold correlation value of -0.15 gave 90% sensitivity for metastasis prediction in the training set, and had equivalent performance in the test set.

[0057] To provide significance ordering, the false discovery rate (FDR) may be determined. First, a set of null distributions of dissimilarity values is generated. In one embodiment, the values of observed profiles are permuted to create a sequence of distributions of correlation coefficients obtained by chance, thereby creating an appropriate set of null distributions of correlation coefficients (see Tusher et al. (2001) PNAS 98, 5116-21, herein incorporated by reference). The set of null distributions is obtained by: permuting the values of each profile for all available profiles; calculating the pairwise correlation coefficients for all profiles; calculating the probability density function of the correlation coefficients for this permutation; and repeating the procedure N times, where N is a large number, usually 300. Using the N distributions, one calculates an appropriate measure (mean, median, etc.) of the count of correlation coefficient values that exceed the value (of similarity) that is obtained from the distribution of experimentally observed similarity values at a given significance level.

[0058] The FDR is the ratio of the number of the expected falsely significant correlations (estimated from the correlations greater than this selected Pearson correlation in the set of randomized data) to the number of correlations greater than this selected Pearson correlation in the empirical data (significant correlations). This cut-off correlation value may be applied to the correlations between experimental profiles.

[0059] Using the aforementioned distribution, a level of confidence is chosen for significance. This is used to determine the lowest value of the correlation coefficient that exceeds the result that would have been obtained by chance. Using this method, one obtains thresholds for positive correlation, negative correlation or both. Using the threshold(s), the user can filter the observed values of the pairwise correlation coefficients and eliminate those that do not exceed the threshold(s). Furthermore, an estimate of the false positive rate can be obtained for a given threshold. For each of the individual "random correlation" distributions, one can find how many observations fall outside the threshold range. This procedure provides a sequence of counts. The mean and the standard deviation of the sequence provide the average number of potential false positives and its standard deviation.

[0060] The data may be subjected to non-supervised hierarchical clustering to reveal relationships among profiles.
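As an illustration of the correlation-based scoring described in paragraphs [0054] and [0056], the following is a minimal sketch in plain Python, not the implementation used to produce the reported results. The function names (`pearson`, `centroid`, `classify`, `sweep_thresholds`) and the toy expression values are hypothetical and chosen only for the example. The rule that a score above the threshold is labeled "activated" is an assumption consistent with the text.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: treat a constant profile as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def centroid(profiles):
    """Average several fibroblast profiles gene by gene into one reference."""
    return [sum(vals) / len(vals) for vals in zip(*profiles)]

def classify(score, threshold=0.0):
    """Label a sample from its correlation score; the threshold is adjustable."""
    return "activated" if score > threshold else "quiescent"

def sweep_thresholds(scores, metastasis, step=0.05):
    """Return (threshold, sensitivity, specificity) for thresholds -1..+1."""
    results = []
    n_steps = int(round(2 / step))
    for i in range(n_steps + 1):
        t = round(-1 + i * step, 10)
        tp = sum(1 for s, m in zip(scores, metastasis) if m and s > t)
        fn = sum(1 for s, m in zip(scores, metastasis) if m and s <= t)
        tn = sum(1 for s, m in zip(scores, metastasis) if not m and s <= t)
        fp = sum(1 for s, m in zip(scores, metastasis) if not m and s > t)
        sens = tp / (tp + fn) if tp + fn else float("nan")
        spec = tn / (tn + fp) if tn + fp else float("nan")
        results.append((t, sens, spec))
    return results

# Hypothetical toy data: 4 genes (2 serum-induced, 2 serum-repressed).
fibroblast_profiles = [[1.0, 0.8, -0.9, -1.1], [1.2, 1.0, -1.0, -0.8]]
reference = centroid(fibroblast_profiles)
tumors = {"T1": [0.9, 1.1, -0.7, -1.0], "T2": [-0.8, -1.0, 0.9, 1.2]}
for name, profile in tumors.items():
    score = pearson(reference, profile)
    print(name, round(score, 3), classify(score))
```

Lowering the threshold below zero, as in the -0.15 example in the text, trades specificity for sensitivity; `sweep_thresholds` makes that trade-off visible for any labeled set of scores.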

For example, hierarchical clustering may be performed, where the Pearson correlation is employed as the clustering metric. Clustering of the correlation matrix, e.g. using multidimensional scaling, enhances the visualization of functional homology similarities and dissimilarities. Multidimensional scaling (MDS) can be applied in one, two or three dimensions.

[0061] The analysis may be implemented in hardware or software, or a combination of both. In one embodiment of the invention, a machine-readable storage medium is provided, the medium comprising a data storage material encoded with machine readable data which, when using a machine programmed with instructions for using said data, is capable of displaying any of the datasets and data comparisons of this invention. Such data may be used for a variety of purposes, such as drug discovery, analysis of interactions between cellular components, and the like. Preferably, the invention is implemented in computer programs executing on programmable computers, comprising a processor, a data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Program code is applied to input data to perform the functions described above and generate output information. The output information is applied to one or more output devices, in known fashion. The computer may be, for example, a personal computer, microcomputer, or workstation of conventional design.

[0062] Each program is preferably implemented in a high level procedural or object-oriented programming language to communicate with a computer system. However, the programs can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Each such computer program is preferably stored on a storage media or device (e.g., ROM or magnetic diskette) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage media or device is read by the computer to perform the procedures described herein. The system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.

[0063] A variety of structural formats for the input and output means can be used to input and output the information in the computer-based systems of the present invention. One format for an output means ranks test datasets possessing varying degrees of similarity to a trusted profile. Such presentation provides a skilled artisan with a ranking of similarities and identifies the degree of similarity contained in the test pattern.

[0064] The CSR dataset may include expression data, for example as set forth in the attached table of sequences. Such information may include, for example:

Imputation Engine                                   Row Average Imputer
Data Type                                           Two Class, unpaired data
Data in log scale?                                  TRUE
Number of Permutations                              100
Blocked Permutation?                                FALSE
RNG Seed                                            1234567
(Delta, Fold Change)                                (0.93749.)
(Upper Cutoff, Lower Cutoff)                        (1.10713, -2.02782)

Computed Quantities
Computed Exchangeability Factor S0                  0.088187083
S0 percentile                                       0
False Significant Number (Median, 90 percentile)    (3.54839, 6.48387)
False Discovery Rate (Median, 90 percentile)        (3.28554, 6.00358)
Pi0Hat                                              0.32258

[0065]
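The parameter listing above reports a permutation count and false discovery rate estimates. As a rough illustration of the permutation procedure of paragraphs [0057] to [0059], the following sketch estimates an FDR for a chosen correlation cutoff. It is a simplified, assumption-laden example in plain Python, not the software that produced the listed values. The names `pairwise_correlations` and `permutation_fdr`, the one-sided cutoff, the use of the mean as the summary measure, and the toy profiles are all hypothetical.

```python
import itertools
import math
import random
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: treat a constant profile as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def pairwise_correlations(profiles):
    """Correlation coefficient for every unordered pair of profiles."""
    return [pearson(a, b) for a, b in itertools.combinations(profiles, 2)]

def permutation_fdr(profiles, cutoff, n_perm=300, seed=0):
    """Estimate the FDR for correlations above `cutoff` by permutation.

    Returns (observed_count, mean_null_count, sd_null_count, fdr).
    """
    rng = random.Random(seed)
    observed = sum(1 for r in pairwise_correlations(profiles) if r > cutoff)
    null_counts = []
    for _ in range(n_perm):
        # Permute the values within each profile independently.
        shuffled = [rng.sample(p, len(p)) for p in profiles]
        count = sum(1 for r in pairwise_correlations(shuffled) if r > cutoff)
        null_counts.append(count)
    expected_false = statistics.mean(null_counts)
    spread = statistics.stdev(null_counts) if n_perm > 1 else 0.0
    # Ratio of expected falsely significant to observed significant pairs.
    fdr = expected_false / observed if observed else float("nan")
    return observed, expected_false, spread, fdr

# Hypothetical toy profiles over six genes.
profiles = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [1.1, 2.2, 2.9, 4.1, 5.2, 5.8],
    [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
]
print(permutation_fdr(profiles, cutoff=0.9))
```

The default of 300 permutations follows the "usually 300" in the text; the mean and standard deviation of the null counts correspond to the average number of potential false positives and its standard deviation.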

Significant Upregulated Genes

Gene Numerator Denominator Name Gene ID Score (d) (r) (s + s0) Fold Change q-value (%) S.77152 minichromosome maintenance deficient (S. cerevisiae) 7 6.24266579 0.88SS8576 O.14186019 8.7394 O.27199782 (MCM7) S.283532 uncharacterized bone marrow BMO39 (BMO39) 6.17372,963 1.36598.291 O.22125733 2.72859 O.27199782 HS.6879 DC13 protein (DC13) 6.16636831 O.883O2988 O.1432O096 84812 O.27199782 HS.1600 Homo sapiens mRNA for KIAAO098 protein, partial cols. (CCT5) 5.64212S27 O.82474861 O.14617694 91743 O.27199782 S.179718 v-myb avian myeloblastosis vira oncogene homolog-like 2 5.42648948 0.65247642 O.12O23914 SS694 O.27199782 (MYBL2) S.9991O phosphofructokinase, platelet (PFKP) 5.308732S2 1.4605137 0.27S11533 3.20727 O.27199782 S.8OSO6 Small nuclear ribonucleoprotein polypeptide A' (SNRPA1) SO1401.357 0.84545044. O.16861.75 85129 O.27199782 S.38178 Homo sapiens cDNA: FLJ23468 fis, clone HSI11603 (FLJ23468) 4.977SO953 1.18388499 0.23784686 2.07999 O.27199782 S.119.192 H2A histone family, member Z (H2AFZ) 4.83660369 O.71736233 0.14831943 59929 O.27199782 S.78619 gamma-glutamyl hydrolase (conjugase, folylpolygammaglutamyl 4.65988643 1.253723O8 O.26904585 2.27496 O.27199782 hydrolase) (GGH) S.76084 lamin B2 (LMNB2) 4.3608.1861 O.S4756893 0.12556.563 47445 O.27199782 S.301 OOS purine-rich element binding protein B (PURB) 4.32O34591 O.S8292O36 0.13492447 S1299 O.27199782 S.1046SO hypothetical protein FLJ10292 ( FLJ10292) 4.31473432 O.68766639 O.1593,763 71836 O.27199782 S.30738 hypothetical protein FLJ10407 ( FLJ10407) 4.26103924 O.6S4O7O63 0.153SOO26 60324 O.27199782 S.293943 ESTs, Highly similar to type III adenylyl cyclase H. sapiens 4.12208931 O.S2894.843 O.12832047 46766 O.27199782 (MGC11266) S.172052 serine/threonine kinase 18 (STK 18) 4.1023O318 O.96344313 O.2348,542 82998 O.27199782 S.95734 uridine monophosphate kinase (UMPK) 4.09493539 O.63O774S4 O.15403773 61626 O.27199782 US 2006/0183141 A1 Aug. 17, 2006

-continued Significant Upregulated Genes

Gene Numerator Denominator Name Gene ID Score (d) (r) (s + s()) Fold Change q-value (%) Hs. 184693 transcription elongation factor B (SIII), polypeptide 1 (15 kD, 4.04725975 O.S628.2743 O.13906383 46422 O.27199782 elongin C) (TCEB1) Hs.109059 mitochondrial ribosomal protein L12 (MRPL12) 3.81864,555 0.5775517 O.15124517 58426 O.27199782 Hs.71465 squalene epoxidase (SQLE) 3.71603394 O.78959 138 0.21248229 65688 O.27199782 Hs.72160 AND-1 protein (AND-1) 3.71136957 0.4482O71 O.12O76596 33466 O.27199782 Hs.74619 proteasome (prosome, macropain) 26S subunit, non-ATPase, 2 3.63896765 0.455662O6 0.12S2174 42331 O.27199782 (PSMD2) Hs. 151734 nuclear transport factor 2 (placental protein 15) (PP15) 3.615901 08 0.48679812 O.134627O6 42652 O.27199782 Hs. 184641 delta-6 fatty acid desaturase (FADS2) 3.SS28O237 1.2177139 O.34274744 2.18507 O.27199782 Hs.254105 MYC promoter-binding protein 1 (MPB1) 3.480O8378 0.605.4986S O.17398.968 69499 O.27199782 HS.233952 proteasome (prosome, macropain) subunit, alpha type, 7 3.42861708 O.4.1988.721 0.12246S47 35873 O.27199782 (PSMA7) Hs. 17377 coronin, actin-binding protein, 1C (CORO1C) 3.40013908 O.S7434988 O. 1689.1953 S61.83 O.27199782 Hs.81412 lipin 1 (LPIN1) 3.3831926S O.S485.1461 O.16212929 55.251 O.27199782 Hs.335918 farnesyl diphosphate synthase (farnesyl pyrophosphate 3.37338054 0.41139393 0.121953O2 33744 O.27199782 synthetase, dimethylallyltranstransferase, geranyltranstransferase) (FDPS) HS.41270 procollagen-lysine, 2-oxoglutarate 5-dioxygenase (lysine 3.322O4115 0.945.47776 O.2846O748 91049 O.27199782 hydroxylase) 2 (PLOD2) Hs. 167246 P450 (cytochrome) oxidoreductase (POR) 3.28545393 0.405082O6 0.12329561 34948 O.27199782 Hs.24763 RAN binding protein 1 (RANBP1) 3.28.285844 0.4064.4954. 
O.1238.096S 36186 O.27199782 Hs.25292 ribonuclease HI, large subunit (RNASEHI) 3.2764898 0.49781887 0.15193665 4SO33 O.27199782 Hs.21331 hypothetical protein FLJ10036 (FLJ10036) 3.263S2442 0.36796.159 0.11274976 29306 O.27199782 Hs.118638 non-metastatic cells 1, protein (NM23A) expressed in (NME1) 3.20843494 O.7648118 0.2383,7535 2.25967 O.27199782 Hs.425427 hypothetical protein FLJ20425 (FLJ20425) 3.192296.12 0.38036SO9 O. 1191SO94 33523 O.27199782 HS-39504 ESTS 3.18081739 O.410O3O2S O.12890719 34102 O.27199782 Hs.76038 isopentenyl-diphosphate delta isomerase (IDI1) 3.17655837 O.S6842601 0.17894398 44489 O.27199782 Hs. 13413 Homo sapiens clone 24463 mRNA sequence 3.14715276 O.4584221 O.14566249 44799 O.27199782 Hs.300592 v-myb avian myeloblastosis viral oncogene homolog-like 1 3.1333SS32 0.99922149 0.31889824 87750 O.27199782 (MYBL1) HS.30928 DNA segment on 19 (unique) 1177 expressed 3.1080O899 0.37209625 0.11972174 29528 O.27199782 sequence (D19S1177E) Hs.254105 enolase 1, (alpha) (ENO1) 3.1030O284 0.762991 63 0.24588815 74.559 O.27199782 Hs.20295 CHK1 (checkpoint, (S. pombe) homolog (CHEK1) 3.06131965 0.43883572 0.14334.855 34487 O.27199782 Hs.179657 plasminogen activator, urokinase receptor (PLAUR) 2.9944.4759 O.S2176544 0.1742443 46068 O.27199782 Hs.301613 JTV1 gene (JTV1) 2.90806819 O.33659929 O. 1157467 28442 O.27199782 Hs. 132898 Homo sapiens clone 23716 mRNA sequence (FADS1) 2.90OOSS17 O.7O1582SS O.24192041 .766.17 O.27199782 Hs.90421 PRO2463 protein (PRO2463) 2.86786719 O.35798348. O.12482S68 27507 O.27199782 Hs. 144407 hypothetical protein FLJ10956 (FLJ10956) 2.8096OOO9 O.43376209 O.154385.71 31539 O.27199782 HS.374421 ESTs, Moderately similar to IP63 protein R. norvegicus 2.74678832 0.379.34929 O.13810649 34755 O.27199782 (KIAA0203) Hs. 1063 Small nuclear ribonucleoprotein polypeptide C (SNRPC) 2.73.871301 0.31259621 O. 
11413982 26394 O.27199782 Hs.274350 BAF53 (BAF53A) 2.71855649 0.4O765963 0.14995445 .338OS O.27199782 HS.180403 ESTS 2.689,10682 0.366OOO76 O.13610495 32377 O.27199782 Hs. 180403 STRIN protein (STRIN) 2.663.20957 0.35379143 0.132844O1 31840 O.27199782 Hs.239189 Homo sapiens glutaminase isoform M precursor, mRNA, 2.65063913 0.63428913 0.2392.9667 4.62885 O.27199782 complete cols Hs.274170 Opa-interacting protein 2 (OIP2) 2.64516217 O.31482978 O. 11902097 24892 O.27199782 Hs.433434 proteasome (prosome, macropain) subunit, beta type, 7 (PSMB7) 2.6052457 O.3165641 O.12151027 28888 O.27199782 Hs. 136644 CS box-containing WD protein (LOC55884) 2.58087422 O.339 12871 O.13140071 28953 O.27199782 HS. 709 deoxycytidine kinase (DCK) 2.S7369859 O.37O13597 0.14381481 297,59 O.27199782 Hs.29088 ESTs, Weakly similar to ARL3 HUMAN ADP-RIBOSYLATION 2.5261OO98 0.487.04394 O.1928O462 43699 O.27199782 FACTOR-LIKE PROTEIN 3 H. sapiens Hs.5957 Homo sapiens clone 24416 mRNA sequence 2.52O160O3 0.39965241 0.15858215 38929 O.27199782 Hs.179565 minichromosome maintenance deficient (S. cerevisiae) 3 (MCM3) 2.4894397 O.285.49077 O. 11468073 24454 O.27199782 Hs.73965 splicing factor, arginine?serine-rich 2 (SFRS2) 2.47543942 0.24756852 O.1OOOO993 16457 O.27199782 HS.388 nudix (nucleoside diphosphate linked moiety X)-type motif 1 2.4611642 O.27071923 O.10999641 21357 O.27199782 (NUDT1) Hs.79172 solute carrier family 25 (mitochondrial carrier; adenine nucleotide 2.4483.298 O4108956 O.1678.269 3O174 O.27199782 translocator), member 5 (SLC25A5) Hs.3828 mevalonate (diphospho) decarboxylase (MVD) 2.42S13279 0.23018084 O.O94.91474 172O7 O.27199782 Hs. 153179 fatty acid binding protein 5 (psoriasis-associated) (FABP5) 2.418433O2 O.S9851.464 O.24748O3S 2.17670 O.27199782 Hs.334612 Small nuclear ribonucleoprotein polypeptide E (SNRPE) 2.4095.1258 0.386OO4SS O.16O20026 3.7147 O.27199782 Hs.267288 hypothetical protein (HSPC228) 2.40575144 OS4178947 0.22S2O592 40436 O.27199782 Hs. 
81361 heterogeneous nuclear ribonucleoprotein A/B (HNRPAB) 2.38340841 O.28992.098 0.121641.33 23242 O.27199782 Hs. 15159 chemokine-like factor 3, alternatively spliced (LOC51192) 2.29576393 0.26653.362 0.116098O1 2O594 O.27199782 Hs. 170328 moesin (MSN) 2.27516047 O.36.787244 0.16169076 39934 O.27199782 Hs.75721 profilin 1 (PFN1) 2.2S143981 0.22S18119 O.1OOO1653 17696 O.27199782 Hs. 159226 hyaluronan synthase 2 (HAS2) 2.2416810S 0.43338.901 O.1933.3215 4O128 O.27199782 Hs. 115474 replication factor C (activator 1) 3 (38 kD) (RFC3) 2.22895495 0.31247982 0.14019118 17852 O.27199782 Hs. 173255 Small nuclear ribonucleoprotein polypeptide A (SNRPA) 2.211536O1 O.2O861507 O.O943304 15803 O.27199782 US 2006/0183141 A1 Aug. 17, 2006

-continued Significant Upregulated Genes

Gene Numerator Denominator Name Gene ID Score (d) (r) (s + s()) Fold Change q-value (%) Hs.236204 nuclear pore complex protein (NUP107) 2.19861709 O.3010O287 0.13690554 23597 O.27199782 Hs.333212 retinal degeneration B beta (RDGBB) 2.176OO694 O.31922978 O.1467.0439 25754 O.27199782 Hs. 115660 hypothetical protein FLJ12810 (FLJ12810) 2.17426911 O.296.8772S 0.13654117 23752 O.27199782 Hs.21293 UDP-N-acteylglucosamine pyrophosphorylase 1 (UAP1) 2.16628553 0.31945677 0.147467S3 22148 O.27199782 Hs.232400 heterogeneous nuclear ribonucleoprotein A2/B1 (HNRPA2B1) 2.16O2SO42 (0.22190382 O. 10272134 16264 O.27199782 Hs. 6441 issue inhibitor of metalloproteinase 2 (TIMP2) 2.12233563 0.32258277 0.15199423 28837 O.27199782 HS. 6679 hHDC for homolog of Drosphila headcase (LOC51696) 2.09642466 0.34282628. O.163529O2 24711 O.27199782 Hs.251754 secretory leukocyte protease inhibitor (antileukoproteinase) 2.0793.6889 O.33592311 O.161SSOS1 34579 O.27199782 (SLPI) Hs.50848 hypothetical protein FLJ20331 (FLJ20331) 2.0786OSO8 0.462.43275 O.22247263 26114 O.27199782 Hs. 15159 transmembrane proteolipid (HSPC224) 2.0671.3387 0.28O15069 O.13552615 21214 O.27199782 Hs.77910 3-hydroxy-3-methylglutaryl-Coenzyme A synthase 1 (soluble) 2.0608O3O2 O.S4351413 O.263739 32792 O.27199782 (HMGCS1) HS.99185 polymerase (DNA directed), epsilon 2 (POLE2) 2.03931226 O.32387O2S 0.15881347 22127 OSO48S927 Hs. 132898 fatty acid desaturase 1 (FADS1) 2.03113336 0.44728926 0.22021659 27983 OSO48S927 Hs.4209 mitochondrial ribosomal protein L37 (MRPL37) 2.01731461 O.298.97245 0.1482O318 2SO48 OSO48S927 Hs. 132004 cardiotrophin-like cytokine; neurotrophin-1/B-cell stimulating 97290698 0.27794123 O.14087903 2O693 OSO48S927 actor-3 (CLC) Hs.21635 tubulin, gamma 1 (TUBG1) .96841526 O.30936059 0.15716226 25460 OSO48S927 Hs.283077 ESTs, Weakly similar to I38428 T-complex protein 10A 96.732498 0.46245778 O.23SO6934 33.738 OSO48S927 H. 
sapiens (BMO32) Hs.83753 Small nuclear ribonucleoprotein polypeptides B and B1 (SNRPB) .95649714 O.24O64175 0.12299622 18166 OSO48S927 HS.37616 Human D9 splice variant B mRNA, complete cols (D9 splice 9478297 O41887173 0.21SO4536 S8924 O.692S2O78 variant A) Hs.433410 menage a trois 1 (CAK assembly factor) (MNAT1) 93937463 O.3073OS25 0.1584.5585 25912 O.692S2O78 Hs.250758 proteasome (prosome, macropain) 26S subunit, ATPase, 3 93361382 O.21467994. O.11102524 15020 O.692S2O78 (PSMC3) Hs.279918 hypothetical protein (HSPC111) 92490899 0.26037618 O. 13526675 2O32S O.692S2O78 Hs. 115823 ribonuclease P, 40 kD subunit (RPP40) 89539827 0.45966533 0.24251649 25592 O.692S2O78 Hs.234279 microtubule-associated protein, RP/EB family, member 1 8883O139 O.21898.904 O. 11597144 175O1 O.692S2O78 (MAPRE1) HS.366 6-pyruvoyltetrahydropterin synthase (PTS) 87341499 0.24O38499 0.1283138 21664 O.692S2O78 Hs.433317 eukaryotic translation initiation factor 4E binding protein 1 8678.3064 O.3398.6461 O.18195686 13648 O.692S2O78 (EIF4EBP1) Hs.34045 hypothetical protein FLJ20764 (FLJ20764) 8.6462266 O41839215 O.224384.35 23S06 O.692S2O78 Hs.55097 HSPC007 protein (MRPS28) 85812O75 0.298.12259 0.16O44307 2612S O.692S2O78 Hs.283077 centrosomal P4.1-associated protein; uncharacterized bone 8392S243 O.21972973 0.1194.6687 18278 1.283697OS marrow protein BMO32 (BMO32) Hs.75231 solute carrier family 16 (monocarboxylic acid transporters), 81687376 O.S1059.437 0.281 02908 48465 1.283697OS member 1 (SLC16A1) Hs.3745 milk fat globule-EGF factor 8 protein (MFGE8) 803981.81 0357934O3 O.1984.1333 7342O 1.283697OS Hs.9081 phenylalanyl-tRNA synthetase beta-subunit (PheHB) .7997857 O.2294.3372 O.12747836 13698 1.283697OS HS.S957 ESTS .79248.137 O.3O336788. O.16924.465 26912 1.283697OS Hs.30035 splicing factor, arginine?serine-rich (transformer 2 Drosophila 78531703 0.21351528 O. 
11959516 14083 1.283697OS homolog) 10 (SFRS10) Hs.56205 insulin induced gene 1 (INSIG1) 762O129 O.424.91935 0.2411SS64 2SO21 1.283697OS Hs. 173374 Homo sapiens unknown mRNA .74582269 O.2781.SSO6 0.159326O7 2612O 1.283697OS Hs.389371 stromal cell derived factor receptor 1 (SDFR1) 717621 O2 O.3068.1408. O.17862734 431.67 1.283697OS HS.82109 syndecan 1 (SDC1) 68.910218 O.36091597 0.21367326 316O2 2.08.629682 Hs.346868 nucleolar protein p40; homolog of yeast EBNA1-binding protein 63693648 0.2042O155 O.12474617 17604 2.08.629682 (P40) HS.1600 ESTS .6296.5745 0.32896O37 0.2O18586 46752 2.08.629682 Hs. 119597 stearoyl-CoA desaturase (delta-9-desaturase) (SCD) 62297238 0.468746SS O.2888.1979 13094 2.08.629682 Hs.355899 type I transmembrane protein Fn 14 (FN14) .62O785.19 O.24764974. O.152796.15 188O2 2.08.629682 Hs.44235 hypothetical protein from clone 24774 (LOC57213) 616S4O79 0.2384004 O.14747S6S 14323 2.08.629682 HS.26812 ESTS 6138S953 O2S89403 O.1604.4785 24737 2.08.629682 Hs. 111632 LSm3 protein (LSM3) 6O108709 O.19531168 0.12198692 14935 2.08.629682 Hs.77254 chromobox homolog 1 (Drosphia HP1 beta) (CBX1) S7781631 O.26629665 0.16877545 22826 2.08.629682 Hs.94262 p53-inducible ribonucleotide reductase small subunit 2 homolog S7068021 O.25133806 0.16001861 24555 2.08.629682 (p53R2) Hs. 117950 multifunctional polypeptide similar to SAICAR synthetase and AIR 1.55751281 0.2398798 O.154O146S 238OS 2.08.629682 carboxylase (ADE2H1) Hs.4295 proteasome (prosome, macropain) 26S subunit, non-ATPase, 12 SS435183 0.26SS8351. O.17086447 14969 2.08.629682 (PSMD12) Hs.89718 spermine synthase (SMS) S3674434 0.2O2SOS14 0.13177542 15724 4.44342297 Hs. 149155 voltage-dependent anion channel 1 (VDAC1) S1556,741. 
O.196994.55 0.12998O72 15767 4.44342297 Hs.433750 eukaryotic translation initiation factor 4 gamma, 1 (EIF4G1) 49744535 0.13959.027 O.O9321894 10967 4.44342297 Hs.91579 Homo sapiens clone 23783 mRNA sequence 494.66O48 0.15SS2436 0.1040533 11OSO 4.44342297 Hs.46967 HSPCO34 protein (LOC51668) 48595392 O.19758214 O. 13296653 16297 4.44342297 Hs.266940 t-complex-associated-testis-expressed 1-like 1 (TCTEL1) 432S7635 0.232O4568 0.16197788 21344 4.44342297 Hs.100S6 ESTS 4297.0833 0.44285313 O.3097SO68 3.67816 4.44342297


-continued Significant Upregulated Genes

Gene Numerator Denominator Name Gene ID Score (d) (r) (s + s()) Fold Change q-value (%) S.80545 mitogen-activated protein kinase 8 interacting protein 2 -2.56560737 -0.520O2O34 0.202688.98 O.67846 O.27199782 (MAPK8IP2) S.352413 chaperonin containing TCP1, subunit 8 (theta) (CCT8) -2.52837862 -0.31985871 O.126SO744 O.81426 O.27199782 S.24758 ESTS -24O266.066 -0.42164464 O. 17549071 0.79853 O.27199782 S.27973 KIAAO874 protein (K AA0874) -2.37907339 -0.413773S4. O.17392214 O.73S44 O.27199782 S.432790 Homo sapiens cDNA: FLJ23582 fis, clone LNG 13759 -2.37S74729 -0.36811418. O.15494669 O.81504 O.27199782 S.26418 ESTS -2.3553O275 -0.4593O149 0.195OO741 O.71293 OSO48S927 HS.15220 ESTs, Weakly similar o zinc finger protein 106 M. musculus -2.32669624 -0.25972128 0.11162664 O.83809 OSO48S927 (ZFP106) Homo sapiens mRNA: cDNA DKFZp564D0472 (from clone -2.32O1161 -O32416509 O.1397 1934 O.77172 OSO48S927 DKFZp564D0472) KIAA1095 protein (K AA1095) -2.279963O4 -0.47OOOO67 0.2O614399 O.696.76 O.692S2O78 ESTS -2.26647751 -0.27S3334 0.12148075 O.81381 O.692S2O78 Homo sapiens cDNA FLJ20678 fis, clone KAIA4163 -2.2359993 -0.287.13412 O.12841423 0.79712 O.692S2O78 ribosomal protein, large P2 (RPLP2) -2.1697615 -0.26644626 O. 122799.79 O.83829 1.283697OS epithelial membrane protein 2 (EMP2) -2.15845615 -0.31964271 0.14808858 O.80186 1.283697OS KIAAO090 protein (K AAO090) -2.1463.0109 -0.24363817 O. 11351.538 O.83816 1.283697OS axin 2 (conductin, axi ) (AXIN2) -2.O18493.79 -0.22433745 0.11114102 O.8SOO1 2.08.629682 Homo sapiens cDNA: FLJ23582 fis, clone LNG 13759 -2.01727159 -0.25574906 0.12677968 O.84215 2.08.629682 LIM domain binding (LDB1) -2.OO480389 -0.17SO6154 0.0873.2103 O.88236 2.08.629682 Homo sapiens cDNA FLJ20053 fis, clone COLOO809. 
-19883O118 -0.43983115 0.22120952 O.72974 2.08.629682 KIAA1036 protein (K AA1036) -196266233 -0.23934,489 O.12194909 O.83497 2.08.629682 heme-regulated initiation factor 2-alpha kinase (HRI) - 194772O66 -O3O225721 0.15518509 O.839S4 2.08.629682 Rho guanine nucleoti e exchange factor (GEF) 3 (ARHGEF3) -194714513 -0.41123378 O.21119832 O.65987 2.08.629682 8136 quiescent cell proline dipeptidase (DPP7) -194611299 -0.24O23O27 O.1234.4107 O.84708 2.08.629682 .356344 zinc finger protein 36 (KOX 18) (ZNF36) -193518401 -0.28154264. O.14548624 O85414 2.08.629682 53639 hypothetical SBBIO3 protein (SBB103) -191175493 -0.242428O3 O.1268.0916 O.86669 4.44342297 70056 Homo sapiens mRNA: cDNA DKFZp586B0220 (from clone -1.89784876 -O.SSO615O1 O.29O12586 O.22O72 4.44342297 DKFZp586BO220) interleukin 1 receptor, type I (IL1R1) -1.892.1268 -O33324684 O.17612289 0.77867 4.44342297 coagulation factor X (F10) -1.88931086 -0.24229629 O.12824586 O.81539 4.44342297 filamin C, gamma (actin-binding protein-280) (FLNC) -1.88226223 -0.259789 O.138O1955 O.7864.5 4.44342297 ESTS -1.8739.173 -0.42O67.057 0.22448726 O.28114 4.44342297 ESTS -1.85.084709 -0.46576O42 (0.2S164716 0.27509 4.44342297 Homo sapiens myosin, light polypeptide 6, alkali, Smooth muscle -1.85O28666 -0.23423984 O.12659651 O84663 4.44342297 and non-muscle (MYL6), mRNA. (MYL6) Homo sapiens cDNA FLJ12798 fis, clone NT2RP2002076, -1.849901.85 -0.222688SS O.12O37857 O.85691 4.44342297 highly similar to Homo sapiens clone 24804 mRNA sequence (MGC2722) S.75335 glycine amidinotransferase (L-arginine:glycine -1.84126621. 
-O35233323 O.1913S377 O.74OOO 4.44342297 amidinotransferase) (GATM) S.373498 organic cation transporter (LOC57100) -1.792O8398 -0.44085062 O.24599886 O.82117 4.44342297 s.179735 Homo sapiens mRNA: cDNA DKFZp434P1514 (from clone -1.78655725 -0.57761661 O.3233.1268 0.77298 4.44342297 DKFZp434P1514); partial cols (DKFZp434P1514) protein kinase (cAMP-dependent, catalytic) inhibitor gamma -1.77424.818 -0.16272O77 0.09171252 O891.40 4.44342297 (PKIG) S. 62.192 coagulation factor III (thromboplastin, tissue factor) (F3) -1.7673 1078 -0.54771014 O.3099.1162 O.S2O41 4.44342297 S.17270 DKFZP434C211 protein (DKFZP434C211) -1.76107883 -0.20459113 O. 11617375 O84740 4.44342297 S.118630 MAX-interacting protein 1 (MXI1) -1.74086O39 -0.25335606 0.1455,3497 O.84390 4.44342297 S.323583 AD021 protein (LOC51313) -1.7224.5235 -0.438.95237 0.254.841.51 O.28919 4.44342297 HS.8026 Homo sapiens cDNA: FLJ21987 fis, clone HEPO6306 -1.6939.0783 -0.2293SO42 (0.13539723 O.84S12 4.44342297 S.34359 ESTS -165617257 -0.22756407 O.1374036 O.87929 4.44342297 S.2S253 Homo sapiens cDNA: FLJ20935 fis, clone ADSEO1534 -1.65599971 -0.34298.933 0.2071.192 0.75112 4.44342297 (MAN1A1) S.111903 Fc fragment of IgG, receptor, transporter, alpha (FCGRT) -162435104 -0.24121667 O.1485OO33 O.83S48 4.44342297 s.179735 ras homolog gene family, member C (ARHC) -160421404 -0.22927649 O.14292138 O.87841 4.44342297 S.79914 umican (LUM) -1.6OO12485 -0.35795777 O.2237O615 0.75593 4.44342297 HS.366 interferon induced transmembrane protein 1 (9–27) (IFITM1) -15936.7416 -0.386.64867 O.24261463 0.85753 4.44342297 S.124.696 Homo sapiens oxidoreductase UCPA (LOC56898), mRNA. -158443416 -0.18040354. O.11385992 0.87759 4.44342297 (LOC56898) S.127337 ESTs (AXIN2) -1583S1156 -0.41159548 0.25992578 0.67751 4.44342297

[0066] Tumor classification and patient stratification. The invention provides for methods of classifying tumors, and thus grouping or "stratifying" patients, according to the CSR signature. As shown in the Examples, tumors classified as having an "induced" signature carry a higher risk of metastasis and death, and therefore may be treated more aggressively than tumors of a "quiescent" type.

[0067] The tumor of each patient in a pool of potential patients for a clinical trial can be classified as described above. Patients having similarly classified tumors can then be selected for participation in an investigative or clinical trial of a cancer therapeutic where a homogeneous population is desired. The tumor classification of a patient can also be used in assessing the efficacy of a cancer therapeutic in a heterogeneous patient population. Thus, comparison of an individual's expression profile to the population profile for a type of cancer permits the selection or design of drugs or other therapeutic regimens that are expected to be safe and efficacious for a particular patient or patient population (i.e., a group of patients having the same type of cancer).

[0068] The methods of the invention can be carried out using any suitable probe for detection of a gene product that is differentially expressed in colon cancer cells. For example, mRNA (or cDNA generated from mRNA) expressed from a CSR gene can be detected using polynucleotide probes. In another example, the CSR gene product is a polypeptide, which polypeptides can be detected using, for example, antibodies that specifically bind such polypeptides or an antigenic portion thereof.

[0069] The present invention relates to methods and compositions useful in diagnosis of cancer, design of rational therapy, and the selection of patient populations for the purposes of clinical trials. The invention is based on the discovery that tumors of a patient can be classified according to CSR expression profile. Polynucleotides that correspond to the selected CSR genes can be used in diagnostic assays to provide for diagnosis of cancer at the molecular level, and to provide for the basis for rational therapy (e.g., therapy is selected according to the expression pattern of a selected set of genes in the tumor). The gene products encoded by CSR genes can also serve as therapeutic targets, and candidate agents effective against such targets screened by, for example, analyzing the ability of candidate agents to modulate activity of differentially expressed gene products.

[0070] The term expression profile is used broadly to include a genomic expression profile, e.g., an expression profile of mRNAs, or a proteomic expression profile, e.g., an expression profile of one or more different proteins. Profiles may be generated by any convenient means for determining differential gene expression between two samples, e.g. quantitative hybridization of mRNA, labeled mRNA, amplified mRNA, cRNA, etc., quantitative PCR, ELISA for protein quantitation, and the like. A subject or patient tumor sample, e.g., cells or collections thereof, e.g., tissues, is assayed. Samples are collected by any convenient method, as known in the art. Additionally, tumor cells may be collected and tested to determine the relative effectiveness of a therapy in causing differential death between normal and diseased cells. Genes/proteins of interest are genes/proteins that are found to be predictive, including the genes/proteins provided above, where the expression profile may include expression data for 5, 10, 20, 25, 50, 100 or more of, including all of, the listed genes/proteins.

[0071] In certain embodiments, the expression profile obtained is a genomic or nucleic acid expression profile, where the amount or level of one or more nucleic acids in the sample is determined. In these embodiments, the sample that is assayed to generate the expression profile employed in the diagnostic methods is one that is a nucleic acid sample. The nucleic acid sample includes a plurality or population of distinct nucleic acids that includes the expression information of the phenotype determinative genes of interest of the cell or tissue being diagnosed. The nucleic acid may include RNA or DNA nucleic acids, e.g., mRNA, cRNA, cDNA etc., so long as the sample retains the expression information of the host cell or tissue from which it is obtained.

[0072] The sample may be prepared in a number of different ways, as is known in the art, e.g., by mRNA isolation from a cell, where the isolated mRNA is used as is, amplified, employed to prepare cDNA, cRNA, etc., as is known in the differential expression art. The sample is typically prepared from a tumor cell or tissue harvested from a subject to be diagnosed, using standard protocols, where cell types or tissues from which such nucleic acids may be generated include any tissue in which the expression pattern of the to be determined phenotype exists. Cells may be cultured prior to analysis.

[0073] The expression profile may be generated from the initial nucleic acid sample using any convenient protocol. While a variety of different manners of generating expression profiles are known, such as those employed in the field of differential gene expression analysis, one representative and convenient type of protocol for generating expression profiles is array based gene expression profile generation protocols. Such applications are hybridization assays in which a nucleic acid that displays "probe" nucleic acids for each of the genes to be assayed/profiled in the profile to be generated is employed. In these assays, a sample of target nucleic acids is first prepared from the initial nucleic acid sample being assayed, where preparation may include labeling of the target nucleic acids with a label, e.g., a member of signal producing system. Following target nucleic acid sample preparation, the sample is contacted with the array under hybridization conditions, whereby complexes are formed between target nucleic acids that are complementary to probe sequences attached to the array surface. The presence of hybridized complexes is then detected, either qualitatively or quantitatively.

[0074] Specific hybridization technology which may be practiced to generate the expression profiles employed in the subject methods includes the technology described in U.S. Pat. Nos. 5,143,854; 5,288,644; 5,324,633; 5,432,049; 5,470,710; 5,492,806; 5,503,980; 5,510,270; 5,525,464; 5,547,839; 5,580,732; 5,661,028; 5,800,992; the disclosures of which are herein incorporated by reference; as well as WO 95/21265; WO 96/31622; WO 97/10365; WO 97/27317; EP 373 203; and EP 785 280. In these methods, an array of "probe" nucleic acids that includes a probe for each of the phenotype determinative genes whose expression is being assayed is contacted with target nucleic acids as described above. Contact is carried out under hybridization conditions, e.g., stringent hybridization conditions as described above, and unbound nucleic acid is then removed. The resultant pattern of hybridized nucleic acid provides information regarding expression for each of the genes that have been probed, where the expression information is in terms of whether or not the gene is expressed and, typically, at what level, where the expression data, i.e., expression profile, may be both qualitative and quantitative.

[0075] Alternatively, non-array based methods for quantitating the levels of one or more nucleic acids in a sample may be employed, including quantitative PCR, and the like.

[0076] Where the expression profile is a protein expression profile, any convenient protein quantitation protocol may be employed, where the levels of one or more proteins in the assayed sample are determined. Representative methods include, but are not limited to: proteomic arrays, flow cytometry, standard immunoassays, etc.

[0077] Following obtainment of the expression profile from the sample being assayed, the expression profile is compared with a reference or control profile to make a diagnosis. A reference or control profile is provided, or may be obtained by empirical methods from samples of fibroblasts exposed to serum. In certain embodiments, the obtained expression profile is compared to a single reference/control profile to obtain information regarding the phenotype of the cell/tissue being assayed. In yet other embodiments, the obtained expression profile is compared to two or more different reference/control profiles to obtain more in depth information regarding the phenotype of the assayed cell/tissue. For example, the obtained expression profile may be compared to a positive and negative reference profile to obtain confirmed information regarding whether the cell/tissue has the phenotype of interest.

[0078] The difference values, i.e. the difference in expression, may be performed using any convenient methodology, where a variety of methodologies are known to those of skill in the array art, e.g., by comparing digital images of the expression profiles, by comparing databases of expression data, etc. Patents describing ways of comparing expression profiles include, but are not limited to, U.S. Pat. Nos. 6,308,170 and 6,228,575, the disclosures of which are herein incorporated by reference. Methods of comparing expression profiles are also described above.

[0079] A statistical analysis step is then performed to obtain the weighted contribution of the set of predictive genes. For example, nearest shrunken centroids analysis may be applied as described in Tibshirani et al. (2002) P.N.A.S. 99:6567-6572 to compute the centroid for each class, then compute the average squared distance between a given expression profile and each centroid, normalized by the within-class standard deviation.

[0080] The classification is probabilistically defined, where the cut-off may be empirically derived. In one embodiment of the invention, a probability of about 0.4 may be used to distinguish between quiescent and induced patients, more usually a probability of about 0.5, and may utilize a probability of about 0.6 or higher. A "high" probability may be at least about 0.75, at least about 0.7, at least about 0.6, or at least about 0.5. A "low" probability may be not more than about 0.25, not more than 0.3, or not more than 0.4. In many embodiments, the above-obtained information about the cell/tissue being assayed is employed to predict whether a host, subject or patient should be treated with a therapy of interest and to optimize the dose therein.

Databases of Expression Profiles

[0081] Also provided are databases of expression profiles of CSR genes. Such databases will typically comprise expression profiles derived from serum induced fibroblasts, typical cancer cell samples, etc. The expression profiles and databases thereof may be provided in a variety of media to facilitate their use. "Media" refers to a manufacture that contains the expression profile information of the present invention. The databases of the present invention can be recorded on computer readable media, e.g. any medium that can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media. One of skill in the art can readily appreciate how any of the presently known computer readable mediums can be used to create a manufacture comprising a recording of the present database information. "Recorded" refers to a process for storing information on computer readable medium, using any such methods as known in the art. Any convenient data storage structure may be chosen, based on the means used to access the stored information. A variety of data processor programs and formats can be used for storage, e.g. word processing text file, database format, etc.

[0082] As used herein, "a computer-based system" refers to the hardware means, software means, and data storage means used to analyze the information of the present invention. The minimum hardware of the computer-based systems of the present invention comprises a central processing unit (CPU), input means, output means, and data storage means. A skilled artisan can readily appreciate that any one of the currently available computer-based systems is suitable for use in the present invention. The data storage means may comprise any manufacture comprising a recording of the present information as described above, or a memory access means that can access such a manufacture.

[0083] A variety of structural formats for the input and output means can be used to input and output the information in the computer-based systems of the present invention. Such presentation provides a skilled artisan with a ranking of similarities and identifies the degree of similarity contained in the test expression profile.

Reagents and Kits

[0084] Also provided are reagents and kits thereof for practicing one or more of the above-described methods. The subject reagents and kits thereof may vary greatly. Reagents of interest include reagents specifically designed for use in production of the above described expression profiles of phenotype determinative genes.

[0085] One type of such reagent is an array of probe nucleic acids in which CSR genes of interest are represented. A variety of different array formats are known in the art, with a wide variety of different probe structures, substrate compositions and attachment technologies. Representative array structures of interest include those described in U.S. Pat. Nos. 5,143,854; 5,288,644; 5,324,633; 5,432,049; 5,470,710; 5,492,806; 5,503,980; 5,510,270; 5,525,464; 5,547,839; 5,580,732; 5,661,028; 5,800,992; the disclosures of which are herein incorporated by reference; as well as WO 95/21265; WO 96/31622; WO 97/10365; WO 97/27317; EP 373 203; and EP 785 280. In certain embodiments, the number of genes that are from that is represented on the array is at least 10, usually at least 25, and may be at least 50, 100, up to including all of the CSR genes, preferably utilizing the top ranked set of genes. Where the subject arrays include probes for such additional genes, in certain embodiments the number % of additional genes that are represented does not exceed about 50%, usually does not exceed about 25%.
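The centroid-based classification of paragraphs 0079 and 0080 can be illustrated with a minimal sketch. It is not the applicants' implementation: it omits the shrinkage step of Tibshirani et al., assumes equal class priors, and sums (rather than averages) the standardized squared distances before converting them to probabilities. All function names, the offset parameter `s0`, and the three-gene training data are hypothetical.

```python
import math


def class_centroids(profiles, labels):
    """Return the per-gene mean expression (centroid) of each class."""
    sums, counts = {}, {}
    for profile, label in zip(profiles, labels):
        acc = sums.setdefault(label, [0.0] * len(profile))
        for i, value in enumerate(profile):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def within_class_sd(profiles, labels, centroids, s0=0.0):
    """Pooled within-class standard deviation per gene, plus an offset s0."""
    squared = [0.0] * len(profiles[0])
    for profile, label in zip(profiles, labels):
        for i, value in enumerate(profile):
            squared[i] += (value - centroids[label][i]) ** 2
    dof = len(profiles) - len(centroids)
    return [math.sqrt(s / dof) + s0 for s in squared]


def class_probabilities(profile, centroids, sd):
    """Turn standardized squared distances into class probabilities."""
    dist = {
        label: sum(((x - c) / s) ** 2 for x, c, s in zip(profile, centroid, sd))
        for label, centroid in centroids.items()
    }
    nearest = min(dist.values())  # subtract before exp() for numerical stability
    weight = {label: math.exp(-0.5 * (d - nearest)) for label, d in dist.items()}
    total = sum(weight.values())
    return {label: w / total for label, w in weight.items()}


def classify(profile, centroids, sd, cutoff=0.5):
    """Call a tumor 'induced' when P(induced) reaches the cut-off."""
    p_induced = class_probabilities(profile, centroids, sd)["induced"]
    return ("induced" if p_induced >= cutoff else "quiescent"), p_induced


# Hypothetical training data: rows are samples, columns are three CSR genes.
training = [[2.0, 1.8, -1.0], [1.6, 2.2, -1.4], [-1.0, -0.8, 1.0], [-1.4, -1.2, 1.4]]
classes = ["induced", "induced", "quiescent", "quiescent"]

centroids = class_centroids(training, classes)
sd = within_class_sd(training, classes, centroids)
label, p_induced = classify([1.5, 1.9, -0.9], centroids, sd, cutoff=0.5)
print(label, round(p_induced, 3))
```

Changing `cutoff` to about 0.4 or about 0.6 corresponds to the alternative thresholds mentioned in paragraph 0080; a gene with zero within-class variance would need a positive `s0` to avoid division by zero.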

[0086] Another type of reagent that is specifically tailored for generating expression profiles of CSR genes is a collection of gene specific primers that is designed to selectively amplify such genes, for use in quantitative PCR and other quantitation methods. Gene specific primers and methods for using the same are described in U.S. Pat. No. 5,994,076, the disclosure of which is herein incorporated by reference. Of particular interest are collections of gene specific primers that have primers for at least 10 of the CSR genes, often a plurality of these genes, e.g., at least 25, and may be 50, 100 or more to include all of the CSR genes. The subject gene specific primer collections may include only CSR genes, or they may include primers for additional genes.

[0087] The kits of the subject invention may include the above described arrays and/or gene specific primer collections. The kits may further include a software package for statistical analysis of one or more phenotypes, and may include a reference database for calculating the probability of susceptibility. The kit may include reagents employed in the various methods, such as primers for generating target nucleic acids, dNTPs and/or rNTPs, which may be either premixed or separate, one or more uniquely labeled dNTPs and/or rNTPs, such as biotinylated or Cy3 or Cy5 tagged dNTPs, gold or silver particles with different scattering spectra, or other post synthesis labeling reagent, such as chemically active derivatives of fluorescent dyes, enzymes, such as reverse transcriptases, DNA polymerases, RNA polymerases, and the like, various buffer mediums, e.g. hybridization and washing buffers, prefabricated probe arrays, labeled probe purification reagents and components, like spin columns, etc., signal generation and detection reagents, e.g. streptavidin-alkaline phosphatase conjugate, chemifluorescent or chemiluminescent substrate, and the like.

[0088] In addition to the above components, the subject kits will further include instructions for practicing the subject methods. These instructions may be present in the subject kits in a variety of forms, one or more of which may be present in the kit. One form in which these instructions may be present is as printed information on a suitable medium or substrate, e.g., a piece or pieces of paper on which the information is printed, in the packaging of the kit, in a package insert, etc. Yet another means would be a computer readable medium, e.g., diskette, CD, etc., on which the information has been recorded. Yet another means that may be present is a website address which may be used via the internet to access the information at a removed site. Any convenient means may be present in the kits.

[0089] The above-described analytical methods may be embodied as a program of instructions executable by computer to perform the different aspects of the invention. Any of the techniques described above may be performed by means of software components loaded into a computer or other information appliance or digital device. When so enabled, the computer, appliance or device may then perform the above-described techniques to assist the analysis of sets of values associated with a plurality of genes in the manner described above, or for comparing such associated values. The software component may be loaded from a fixed media or accessed through a communication medium such as the internet or other type of computer network. The above features are embodied in one or more computer programs may be performed by one or more computers running such programs.

Diagnosis, Prognosis, Assessment of Therapy (Therametrics), and Management of Cancer

[0090] The classification methods described herein, as well as their gene products and corresponding genes and gene products, are of particular interest as genetic or biochemical markers (e.g., in blood or tissues) that will detect the earliest changes along the carcinogenesis pathway and/or to monitor the efficacy of various therapies and preventive interventions.

[0091] Staging. Staging is a process used by physicians to describe how advanced the cancerous state is in a patient. Staging assists the physician in determining a prognosis, planning treatment and evaluating the results of such treatment. Staging systems vary with the types of cancer, but generally involve the following "TNM" system: the type of tumor, indicated by T; whether the cancer has metastasized to nearby lymph nodes, indicated by N; and whether the cancer has metastasized to more distant parts of the body, indicated by M. Generally, if a cancer is only detectable in the area of the primary lesion without having spread to any lymph nodes it is called Stage I. If it has spread only to the closest lymph nodes, it is called Stage II. In Stage III, the cancer has generally spread to the lymph nodes in near proximity to the site of the primary lesion. Cancers that have spread to a distant part of the body, such as the liver, bone, brain or other site, are Stage IV, the most advanced stage.

[0092] The methods described herein can facilitate fine tuning of the staging process by identifying the aggressiveness of a cancer, e.g. the metastatic potential, as well as the presence in different areas of the body. Thus, a Stage II cancer with a classification signifying a high metastatic potential cancer can be used to change a borderline Stage II tumor to a Stage III tumor, justifying more aggressive therapy. Conversely, the presence of a polynucleotide signifying a lower metastatic potential allows more conservative staging of a tumor.

[0093] The following examples are offered by way of illustration and not by way of limitation.

EXAMPLE 1

[0094] Identification of a Stereotyped Genomic Response of Fibroblasts to Serum. We previously observed that the global transcriptional response of fibroblasts to serum integrates many processes involved in wound healing. Because fibroblasts from different anatomic sites are distinct differentiated cells with characteristic gene expression profiles, we investigated whether the genomic responses to serum varied significantly among fibroblasts cultured from different anatomic sites. Fifty fibroblast cultures derived from ten anatomic sites were cultured asynchronously in 10% fetal bovine serum (FBS) or in media containing only 0.1% FBS. Analysis of the global gene expression patterns, using human cDNA microarrays containing approximately 36,000 genes, revealed that although fibroblasts from different sites have distinctly different gene expression programs, they share a stereotyped gene expression program in response to serum (FIG. 1A). Selection for genes that were concordantly induced or repressed by most types of fibroblasts yielded 677 genes, represented by 772 cDNA probes, of which 611 are uniquely identified by UniGene.

[0095] This common genomic response to serum includes induction of genes that represent entry into and progression through the cell cycle (e.g., E2F1, FOXM1, PTTG1), induction of cell motility (e.g., CORO1C, FLNC), extracellular matrix remodeling (LOXL2, PLOD2, PLAUR), cell-cell signaling (SDFR1, ESDN, MIF), and acquisition of a myofibroblast phenotype (e.g., TAGLN, TPM2, MYL6). Analysis of the public Gene Ontology (GO) annotation of the fibroblast serum response genes confirmed a significant enrichment of genes involved in cell proliferation, blood coagulation, complement activation, secretory protein synthesis, angiogenesis, and proteolysis, reflecting the diverse roles that fibroblasts may play during wound healing.

[0096] One of the most consistent and important responses of human cells to serum is proliferation. Abnormal cell proliferation is also a consistent characteristic of cancer cells, irrespective of any possible involvement of a wound healing response. We therefore sought to eliminate the contributions of genes directly related to cell proliferation, to improve the specificity of a genomic signature of the fibroblast serum response. To identify features directly related to cell cycle progression, we examined the expression pattern of these 677 genes during the cell cycle (in HeLa cells). Despite the well-known role of serum as a mitogen, only one-quarter (165 out of 677 genes) of the fibroblast serum response genes showed periodic expression during the cell cycle (FIG. 1B). The majority of the genes whose expression levels in fibroblasts showed the most consistent response to serum exposure do not appear simply to reflect cell growth or division; these 512 serum-responsive and cell cycle-independent genes are operationally defined as the fibroblast core serum response (CSR). Comparison of the common fibroblast serum response with a detailed analysis of the temporal program of gene expression following serum exposure in foreskin fibroblasts confirmed that the cell cycle genes and the CSR have distinct temporal profiles during serum stimulation and are thus distinguishable biological processes (FIG. 1C).

[0097] Expression of Fibroblast CSR in Human Cancers. Because serum (as distinct from plasma and normal extracellular fluid) is encountered in vivo only at sites of tissue injury or remodeling and induces in fibroblasts a gene expression response suggestive of wound healing, we reasoned that expression of fibroblast CSR genes in tumors would gauge the extent to which the tumor microenvironment recapitulates normal wound healing. We examined the expression of genes comprising the fibroblast CSR in publicly available microarray data from a variety of human cancers and their corresponding normal tissues. To facilitate visualization and analysis, we organized the gene expression patterns and samples by hierarchical clustering. Remarkably, we observed a predominantly biphasic pattern of expression for the fibroblast CSR in diverse cancers, including breast cancers, lung cancers, gastric cancers, prostate cancers, and hepatocellular carcinoma. Expression levels of genes that were activated by serum in fibroblasts varied coordinately in tumors, and genes that were repressed by serum in fibroblasts were mostly expressed in a reciprocal pattern (FIG. 2).

[0098] In each of the tumor types examined, the expression pattern of the fibroblast CSR genes in normal tissues closely approximated that seen in quiescent fibroblasts cultured in the absence of serum (FIG. 2). In prostate and hepatocellular carcinomas, all of the normal tissue samples had the serum-repressed signature and almost all of the tumors had the serum-induced signature, albeit with varying amplitude. In breast, lung, and gastric carcinomas, the common fibroblast serum response signature was clearly evident in some of the tumors and apparently absent in others, suggesting that a "wound-healing phenotype" was a variable feature of these cancers. We therefore classified breast, lung, and gastric cancer samples based on the pattern of expression of the genes that comprise the fibroblast CSR.

[0099] Link between the Gene Expression Signature of Fibroblast Serum Response and Cancer Progression. To investigate the stability and consistency of the serum response signature in individual tumors and to explore its clinical implications, we examined CSR gene expression in a group of locally advanced breast cancers with extensive clinical and molecular data. As shown in FIG. 3A, the expression profiles of the CSR genes were biphasic, allowing a natural separation of these tumors into two classes. Interestingly, in 18 out of 20 paired tumor samples obtained from the same patients before and after excisional biopsy and chemotherapy, the CSR expression phenotypes were consistent between the two samples. Thus, the wound related expression program appears to be an intrinsic property of each tumor and not easily extinguished. In a set of 51 patients with clinically matched disease and equivalent treatment, primary tumors with the activated CSR signature were significantly more likely to progress to metastasis and death in a 5-y follow-up period (p=0.013 and 0.041, respectively) (FIG. 3B). Using an alternative analytic approach, classifying each sample by the Pearson correlation between tumor and fibroblast expression patterns of the fibroblast CSR genes, also reproduced the identification of two classes of samples with differing clinical outcomes. A gene expression pattern similar to the serum-activated program of fibroblasts is thus a powerful predictor of prognosis.

[0100] Other significant prognostic factors in these same patients include tumor grade, estrogen receptor status, and tumor subtype based on gene expression profile. Tumor stage, lymph-node status, and p53 status were not statistically significant predictors of survival in these patients (p=0.13, 0.79, 0.05, respectively). A "basal-like" subtype of breast cancer, characterized by molecular similarities of the tumor cells to basal epithelial cells of the normal mammary duct and associated with a particularly unfavorable prognosis, was significantly associated with a gene expression pattern resembling the fibroblast CSR: six of seven basal-like breast cancers had the "serum-activated" gene expression signature (p=0.0075, Fisher's exact test). Thus, the presence or absence of the wound-like phenotype is linked to intrinsic features of the tumor cells.

[0101] We considered the possibility that the observed phenomenon may be simply a reflection of the number of fibroblasts in tumor samples. Perhaps tumors that are infiltrative or otherwise worrisome clinically would demand a wide margin of excision that would include more fibroblasts in the resultant samples. However, classification of breast cancers using the top 1% most highly expressed fibroblast genes (which include a number of extracellular matrix genes and have been previously observed as the "stroma signature") showed no relationship between the generic fibroblast signature and clinical outcome (p=0.75). Thus, the prognostic value of the fibroblast CSR reflects the physiologic state of the tumor microenvironment and not just the number of fibroblasts in tumor stroma. Similarly, although the mitotic index is an established criterion of tumor grade, classification of these tumors based on expression of cell cycle genes only had moderate prognostic value (p=0.08). This result also demonstrates that the prognostic value of the fibroblast CSR is unlikely to be accounted for by the incomplete annotation and removal of genes representing cell growth or division.

[0102] To extend and validate these results, we tested the prognostic power of the fibroblast CSR signature in independent datasets and different kinds of human cancer (FIG. 4). Using published DNA microarray data from a study of gene expression patterns in a group of 78 early (tumor smaller than 5 cm, stage I and IIA) breast cancer patients, we could segregate the patients into two groups based on expression of the fibroblast CSR genes in the biopsy samples. Tumors with the serum-induced signature had a significantly increased risk of metastasis over 5 y (p=0.00046) (FIG. 4A). Multivariate Cox proportional hazard analysis confirmed that the CSR classification is a significant independent predictor (p=0.009); the serum-induced gene expression signature was associated with a 3.3-fold relative risk of breast cancer metastasis within 5 y of diagnosis. In the two breast cancer datasets examined, approximately 50% of the CSR genes demonstrated significant differences in expression between the activated and quiescent groups of samples, but permutation and 10-fold balanced leave-one-out analyses revealed that the correct classification can be accomplished using as few as 6% of CSR genes.

[0103] Thus, the expression pattern of the CSR genes provides a robust basis for predicting tumor behavior. Similarly, in analysis of published DNA microarray data from 62 patients with stage I and II lung adenocarcinomas, tumors with the serum-induced signature were associated with significantly higher risk of death compared to tumors with the serum-repressed signature (p=0.021) (FIG. 4B). These results suggest that presence or absence of a wound-like phenotype in these cancers, with its prognostic implication for their metastatic potential, may be determined at an early stage in their development. In a second, independent group of lung adenocarcinomas of all stages, tumors with the

eling and cell-cell interaction, using tissue microarrays containing hundreds of breast carcinoma tissues. PLAUR, also known as urokinase-type plasminogen activator receptor, is a well-characterized receptor for matrix-degrading proteases that has been implicated in tumor cell invasion. LOXL2 is a member of a family of extracellular lysyl oxidases that modify and cross-link collagen and elastin fibers. PLOD2 is a member of the lysyl hydroxylase family that plays important roles in matrix cross-linking and fibrosis. SDFR1, previously named gp55 and gp65, encodes a cell surface protein of the immunoglobulin superfamily that regulates cell adhesion and process outgrowth. ESDN is a neuropilin-like cell surface receptor that was also previously found to be upregulated in metastatic lung cancers. All five of these genes were included in the fibroblast CSR gene set by virtue of their induction by serum in fibroblasts (see FIG. 1).

[0105] Anti-PLAUR antibody is commercially available and served as a positive control. We prepared specific riboprobes for LOXL2 and SDFR1 and generated affinity purified anti-peptide antibodies to PLOD2 and ESDN to detect the predicted protein products. As shown in FIG. 5, PLAUR, LOXL2, PLOD2, and ESDN were not detectably expressed in normal breast tissue; SDFR1 was expressed at a low level in normal breast epithelial cells (n=11). In contrast, all five genes were induced in a significant fraction of invasive ductal carcinomas of the breast. As previously reported, PLAUR protein is expressed in both tumor cells and peritumoral stroma (70 out of 96, 73% positive) (FIG. 5). PLOD2 protein and SDFR1 mRNA were detected in breast carcinoma cells and in a small but consistent fraction of peritumor stroma cells (78 out of 100, 78% positive, and 55 out of 79, 70% positive, respectively). ESDN protein was detected exclusively in breast carcinoma cells (69 out of 112, 62% positive). In contrast, LOXL2 mRNA was abundant in peritumoral fibroblasts around invasive carcinomas (45 out of 106, 42% positive). LOXL2 protein has been previously reported to be expressed in normal mammary ducts and increased in invasive breast carcinoma cells. Our data suggest that LOXL2 is primarily synthesized by peritumoral fibroblasts, but may act on or in the vicinity of epithelial cells during tissue remodeling. Collectively, these results suggest that the pathophysiology represented by expression of the fibroblast CSR genes in cancers represents a multicellular program in which the tumor cells themselves, tumor-associated fibroblasts, and perhaps diverse other cells in the tumor microenvironment are active participants.
fibroblast serum-induced signature were associated with a 0106 The remarkable ability of a single physiological significantly worse prognosis (p=0.0014) (FIG. 4C). A fluid-serum to promote the growth and Survival of significant correlation between advanced stage and the diverse normal and cancer cells in culture Suggests that there serum-induced signature was also apparent in this dataset. may be a conserved, programmed response to the molecular Finally, in 42 patients with stage III gastric carcinomas, all signals that serum provides. In vivo, serum as a physiologi treated with gastrectomy alone, tumors with the activated cal signal has a very specific meaning: cells encounter CSR signature were again associated with shorter Survival serum the soluble fraction of coagulated blood—only in (p=0.02) (FIG. 4D). These results demonstrate that a the context of a local injury. In virtually any tissue, a rapid, wound-healing phenotype, reflected in the expression of a concerted multicellular response, with distinct physiological set of serum-inducible genes in fibroblasts, is strongly linked exigencies that evolve over minutes, hours, and days, is to progression of diverse human carcinomas and can provide required to preserve the integrity of the tissue and often the valuable prognostic information even at an early stage in the Survival of the organism. In response to a wound, many of natural history of a cancer. the normal differentiated characteristics of the cells in the 0104. Histological Architecture of CSR Gene Expression wounded tissue are temporarily set aside in favor of an in Tumors. Both to validate the DNA microarray results and emergency response. 
In wound repair, as in cancer, cells that to investigate the histological architecture of CSR gene ordinarily divide infrequently are induced to proliferate expression in tumors, we examined the expression patterns rapidly, extracellular matrix and connective tissues are of five CSR genes implicated in extracellular matrix remod invaded and remodeled, epithelial cells and stromal cells US 2006/0183141 A1 Aug. 17, 2006

migrate, and new blood vessels are recruited. In all these respects, a wound response, and the characteristic physiological response to serum, appears to provide a highly favorable milieu for cancer progression.

[0107] We defined a stereotyped genomic expression response of fibroblasts to serum, which reflects many features of the physiology of wound healing. When we examined the expression of these genes in human tumors, we found strong evidence that a wound-like phenotype was variably present in many common human cancers (including many that are not known to be preceded by chronic wounds) and was a remarkably powerful predictor of metastasis and death in several different carcinomas.

[0108] At least three genes induced in the fibroblast serum response, PLAUR, LOXL2, and MIF, have been previously shown to increase cancer invasiveness or angiogenesis in animal xenograft models; each of these three genes has also been shown to play an important role in wound healing. Thus, coordinate induction of a wound-healing program in carcinomas may contribute to tumor invasion and metastasis.

[0109] Several potential mechanisms might contribute to the wound-like gene expression pattern in cancers. In some cancers, ongoing local tissue injury, resulting from growth and dysfunctional behavior of the tumor cells, could continuously trigger a normal wound-healing response. The classic observation of deposited fibrin products in human tumors is consistent with this model. Inflammatory cells, presumably recruited by tissue disorder, may amplify the wound response and contribute to tumor invasion in part by expression of metalloproteinases. The wound response might also be initiated directly by signals from the tumor cells, whose ability to activate an inappropriate wound healing response (favorable to cell proliferation, invasion, and angiogenesis) might be strongly selected during cancer progression. The possibility that stromal cells might play a primary role in promoting a wound-like phenotype in some cancers is raised by studies showing that tumor-associated fibroblasts can enhance tumor engraftment and metastasis in animal models and the demonstration in some cancers of genotypic abnormalities in tumor-associated fibroblasts.

[0110] Our results illustrate the power of using gene expression data from specific cells or physiological and genetic manipulations to build an interpretive framework for the complex gene expression profiles of clinical samples. Several prognostic models based on gene expression patterns have previously been identified from systematic DNA microarray profiles of gene expression in human cancers. Some of these prognostic gene expression profiles appear to reflect the developmental lineage of the cancer cells, some appear to reflect the activity of specific molecular determinants of tumor behavior (e.g., the activity of PLA2G2A in gastric cancer), while still others represent the mechanistically agnostic results of machine-assisted learning. Although they serve to identify many of the same tumors with unfavorable prognosis, the genes that define the fibroblast CSR overlap minimally with the genes previously used to predict outcome in the same cancers. For example, the fibroblast CSR involves only 20 out of 456 genes in an "intrinsic gene list" that can serve to segregate breast cancers into prognostically distinct groups and four out of 128 genes that define the general metastasis signature reported by Ramaswamy et al. (2003). Only 11 genes are in common between the 231 gene van't Veer poor prognosis signature for breast cancer and the fibroblast CSR genes. The prognostic power of these different sets of genes illustrates the multidimensional variation in the gene expression programs in cancers and the complex interplay of many distinct genetic and physiological factors in determining the distinctive biology of each individual tumor. Our success in discovering a significant new determinant of cancer progression illustrates the richness of the data as a continuing source for future discoveries and the importance of unrestricted access to published research data.

Materials and Methods

[0111] Cells and tissue culture. Human primary fibroblasts from ten anatomic sites were cultured in 0.1% versus 10% FBS, as previously described (Chang et al. 2002 Proc Natl Acad Sci 99:12877-12882). For the serum induction time course, foreskin fibroblasts CRL 2091 (American Type Culture Collection [ATCC], Manassas, Va., United States) were serum-starved for 48 h and harvested at the indicated timepoints after switching to media with 10% FBS, essentially as described in Iyer et al. (1999) Science 283: 83-87.

[0112] Microarray procedures. Construction of human cDNA microarrays containing approximately 43,000 elements, representing approximately 36,000 different genes, and array hybridizations were as previously described (Perou et al. 2000 Nature 406: 747-752). mRNA was purified using FastTrack according to the manufacturer's instructions (Invitrogen, Carlsbad, Calif., United States). For the serum time course, RNA from all of the sampled timepoints was pooled as reference RNA to compare with RNA from individual timepoints as described in Iyer et al. (1999) supra.

[0113] Data analysis. For defining a common serum response program in fibroblasts, global gene expression patterns in 50 fibroblast cultures derived from ten anatomic sites, cultured in the presence of 10% or 0.1% FBS, were characterized by DNA microarray hybridization (Chang et al. 2002, supra). We selected for further analysis genes for which the corresponding array elements had fluorescent hybridization signals at least 1.5-fold greater than the local background fluorescence in the reference channel, and we further restricted our analyses to genes for which technically adequate data were obtained in at least 80% of experiments. These filtered genes were then analyzed by the multiclass Significance Analysis of Microarrays (SAM) algorithm (Tusher et al. 2001 Proc Natl Acad Sci USA 98: 5116-5121) to select a set of genes whose expression levels had a significant correlation with the presence of serum in the medium, with a false discovery rate (FDR) of less than 0.02%. The corresponding expression patterns were organized by hierarchical clustering (Eisen et al. 1998 Proc Natl Acad Sci 95:14863-14868). Genes that were coordinately induced or repressed in response to serum in most samples (Pearson correlation, greater than 90%) were identified. This set of 677 genes, represented by 772 cDNA probes, of which 611 are uniquely identified by UniGene, was termed the common fibroblast serum response gene set. To identify the subset of these 677 genes whose variation in expression was directly related to cell cycle progression, we compared this set of genes to a published set of genes periodically expressed during the HeLa cell cycle (Whitfield et al. 2002

Mol Biol Cell 13: 1977-2000). Because both datasets were generated using similar cDNA microarrays, we tracked genes by the IMAGE number of the cDNA clones on the microarrays. The majority of the genes in the fibroblast serum response gene set showed no evidence of periodic expression during the HeLa cell cycle. One hundred sixty-five genes, represented by 199 cDNA clones, overlapped with the cell cycle gene list; the remaining 512 genes, represented by 573 clones, of which 459 are uniquely identified in UniGene, were termed the CSR gene set.

[0114] The patterns of expression in human tumors of the 512 genes of the fibroblast CSR gene set were analyzed using data from published tumor expression profiles. We used the UniGene unique identifier to match genes represented in different microarray platforms. For cDNA microarrays, genes with fluorescent hybridization signals at least 1.5-fold greater than the local background fluorescent signal in the reference channel (Cy3) were considered adequately measured and were selected for further analyses. For Affymetrix data, signal intensity values were first transformed into ratios, using for each gene the mean values of the normalized fluorescence signals across all the samples analyzed as the denominators (Bhattacharjee et al. 2001 Proc Natl Acad Sci 98:13790-13795).

[0115] The genes for which technically adequate measurements were obtained from at least 80% of the samples in a given dataset were centered by mean value within each dataset, and average linkage clustering was carried out using the Cluster software (Eisen et al. 1998, supra). In each set of patient samples, the samples were segregated into two classes based on the first bifurcation in the hierarchical clustering dendrogram. For the datasets shown, the clustering and reciprocal expression of serum-induced and serum-repressed genes in the tumor expression data allowed two classes to be unambiguously assigned. Samples with generally high levels of expression of the serum-induced genes and low levels of expression of the serum-repressed genes were classified as "activated"; conversely, samples with generally high levels of expression of serum-repressed genes and low levels of expression of the serum-induced genes were classified as "quiescent." Survival analysis by a Cox-Mantel test was performed in the program Winstat (R. Fitch Software).

[0116] In situ hybridization and immunohistochemistry. Digoxigenin-labeled sense and antisense riboprobes for LOXL2 and SDFR1 were synthesized using T7 polymerase-directed in vitro transcription. Sense and antisense riboprobes for SDFR1 were made from nucleotides 51-478 of IMAGE clone 586731 (ATCC #745139), corresponding to the last 388 nucleotides of the 3' end of the coding sequence and 39 nucleotides of the 3' untranslated region. Sense and antisense riboprobes for LOXL2 were made from nucleotides 41-441 of IMAGE clone 882506 (ATCC #1139012), corresponding to the 3' end of the coding sequence. In situ hybridization (ISH) results were considered to have appropriate specificity when we observed a strong, consistent pattern of hybridization of the antisense probe and little or no hybridization of the corresponding sense probe.

[0117] Immunohistochemical (IHC) staining was performed using Dako (Glostrup, Denmark) Envision Plus following the manufacturer's instructions. Anti-PLAUR antibody against whole purified human uPA-receptor protein (AB8903; Chemicon, Temecula, Calif., United States) was used at 1:200 dilution. Affinity-purified polyclonal antibody to PLOD2 was produced by immunizing rabbits with peptide (SEQ ID NO: 1) EFDTVDLSAVDVHPN, coupled to keyhole limpet hemocyanin (KLH) (Applied Genomics, Inc., Sunnyvale, Calif., United States); affinity-purified antiserum was used for IHC at 1:25,000 dilution. Similarly, affinity-purified polyclonal antibody to ESDN was produced by immunizing rabbits with peptide (SEQ ID NO: 2) DHTGQENSWKPKKARLKK coupled to KLH (Applied Genomics, Inc.) and used for IHC at 1:12,500 dilution. High-density tissue microarrays containing tumor samples were constructed as described in Kononen et al. (1998) Nat Med 4: 844-847. ISH (Iacobuzio-Donahue et al. 2002 Cancer Res 62: 5351-5357) and IHC (Perou et al. 2000, supra) were as reported. ISH and IHC images and data were archived as described in Liu et al. (2002) Am J Pathol 161: 1557-1565.

[0118] The LocusLink accession numbers for the genes discussed in this paper are CORO1C (LocusLink ID 23603), E2F1 (LocusLink ID 1869), ESDN (LocusLink ID 131566), FLNC (LocusLink ID 2318), FOXM1 (LocusLink ID 2305), LOXL2 (LocusLink ID 4017), MIF (LocusLink ID 4282), MYL6 (LocusLink ID 4637), PLAUR (LocusLink ID 5329), PLOD2 (LocusLink ID 5352), PTTG1 (LocusLink ID 9232), SDFR1 (LocusLink ID 27020), TAGLN (LocusLink ID 6876), and TPM2 (LocusLink ID 7169). The accession numbers of the Gene Ontology (GO) terms that appear in Dataset S1 are angiogenesis (GO:0001525), blood coagulation (GO:0007596), complement activation (GO:0006956), immune response (GO:0006955), N-linked glycosylation (GO:0006487), protein translation (GO:0006445), and proteolysis and peptidolysis (GO:0006508).

[0119] cDNA microarray data: Molecular portrait of breast cancer: 62 sporadic breast cancers and 3 pooled normal breast tissues, including 20 pairs of tumors obtained before and after excisional biopsy and doxorubicin-based chemotherapy and 2 pairs of primary tumor and lymph node metastasis. Published by Perou et al. (2000).

[0120] Locally advanced breast cancer: 85 breast samples, consisting of 78 carcinomas, 3 fibroadenomas, and 4 normals. 40 of these tumors were previously profiled in Dataset A. A subset of 51 locally advanced primary breast cancers were all treated with excisional biopsy and doxorubicin-based chemotherapy. Clinical endpoint=relapse free survival and disease-specific survival. Published by Sorlie et al. (2001).

[0121] Lung cancer: 67 sporadic primary lung carcinomas of different histologic types and stages, including 24 primary adenocarcinomas. 6 normal lung tissues were also profiled. Clinical endpoint=overall survival. Published by Garber et al. (2001).

[0122] Gastric cancer: 104 sporadic primary gastric carcinomas with >5 year followup and 24 non-neoplastic gastric mucosa. All patients were treated with gastrectomy alone. Stage III presentation (n=42) was the most common and was analyzed for the clinical endpoint of overall survival. Published by Leung et al. (2002).

[0123] Diffuse large B cell lymphoma: 240 DLCL patients with >5 year followup. Clinical endpoint=overall survival. Published by Rosenwald et al. (2002).
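As an illustration of the sample classification steps in paragraph [0115] (the 80% data filter, mean centering, average linkage clustering, and the split at the first bifurcation), the following is a minimal sketch and not the Cluster software itself. It assumes missing measurements are encoded as `None`, uses 1 minus the Pearson correlation as the distance, and uses unweighted average linkage over pairwise distances, which may differ in detail from the Cluster implementation. The function names, the toy matrix, and the rule for naming the two classes (the class with the higher mean expression of serum-induced genes is called "activated") are hypothetical.

```python
from math import sqrt


def filter_and_center(matrix, min_present=0.8):
    """Keep genes (rows) measured in >= min_present of samples; mean-center each kept row."""
    kept = []
    for row in matrix:
        present = [v for v in row if v is not None]
        if present and len(present) >= min_present * len(row):
            mean = sum(present) / len(present)
            kept.append([None if v is None else v - mean for v in row])
    return kept


def pearson_distance(x, y):
    """1 - Pearson r, computed over positions measured in both vectors."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        return 1.0  # assumption: no usable overlap is treated as uncorrelated
    mx = sum(a for a, _ in pairs) / len(pairs)
    my = sum(b for _, b in pairs) / len(pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    denom = sqrt(sxx * syy)
    return 1.0 - (sxy / denom if denom else 0.0)


def first_bifurcation(centered):
    """Average-linkage clustering of samples (columns), stopped at two clusters.

    The last two clusters of an agglomerative run are the two branches
    below the root of the dendrogram, i.e. its first bifurcation.
    """
    n = len(centered[0])
    cols = [[row[j] for row in centered] for j in range(n)]
    dist = {(i, j): pearson_distance(cols[i], cols[j])
            for i in range(n) for j in range(i + 1, n)}

    def average_distance(a, b):
        total = sum(dist[(min(i, j), max(i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [[j] for j in range(n)]
    while len(clusters) > 2:
        p, q = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: average_distance(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]


def label_classes(centered, clusters, induced_rows):
    """Hypothetical labeling rule: higher mean of serum-induced genes => 'activated'."""
    def induced_mean(cluster):
        vals = [centered[g][s] for g in induced_rows for s in cluster
                if centered[g][s] is not None]
        return sum(vals) / len(vals)

    activated = max(clusters, key=induced_mean)
    return {s: ("activated" if s in activated else "quiescent")
            for c in clusters for s in c}


# Toy data (invented): rows are genes, columns are six samples.
example = [
    [2.0, 1.8, 2.2, -2.0, -1.9, -2.1],   # serum-induced gene
    [1.5, 1.7, 1.6, -1.4, -1.8, -1.6],   # serum-induced gene
    [-2.0, -2.2, -1.8, 2.1, 1.9, 2.0],   # serum-repressed gene
    [-1.0, -1.2, None, 1.1, 0.9, 1.2],   # serum-repressed gene, 5 of 6 measured
    [0.5, None, None, 0.1, None, 0.2],   # only 3 of 6 measured: filtered out
]
centered = filter_and_center(example)
clusters = first_bifurcation(centered)
labels = label_classes(centered, clusters, induced_rows=[0, 1])
print(clusters)
print(labels)
```

The sketch stops merging at two clusters instead of building the full tree, which is sufficient when only the top-level split is needed; a full dendrogram would be required to reproduce the clustered displays.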

[0124] Hepatocellular carcinoma: 156 HCC and non-cancerous liver tissues studied by Chen et al. (2002).

[0125] Prostate cancer: 100 prostate cancers and adjacent normal tissues profiled by Lapointe et al.

[0126] Rosetta ink jet oligonucleotide microarray data. Early breast cancer: 78 stage I sporadic primary breast carcinomas, <5 cm diameter (stage I and IIA), with >5 year clinical followup after lumpectomy. Clinical endpoint=metastasis. Data published by van't Veer et al. (2002).

[0127] Affymetrix GeneChip data. Early lung cancer: 156 lung samples, including 127 sporadic primary adenocarcinomas of the lung (62 of which were stage I and II), 12 suspected extrapulmonary metastases, and 17 normal lung samples, with >4 year clinical followup. Clinical endpoint=overall survival. Data published by Bhattacharjee et al. (2001) and stage I and II data selected by Ramaswamy et al. (2003). Medulloblastoma: 60 medulloblastomas with >5 year clinical followup. Clinical endpoint=overall survival. Published by Pomeroy et al. (2002).

[0128] Cross platform mapping and data normalization. Breast Cancer Data (van't Veer et al.): We downloaded and combined the raw microarray hybridization data for 78 stage I breast tumors from the supplemental materials accompanying van't Veer et al. We then mapped each arrayed feature on the microarrays to the corresponding genes using BatchSOURCE, where the 24,481 GenBank accessions provided by the authors were used as queries to retrieve UniGene identifiers (build #158, Jan. 15, 2003). Since not all GenBank accessions are represented within UniGene, we could not map 636 (~2.6%) of the arrayed features in this manner. 456 of the 23,845 Rosetta array elements that could be mapped corresponded to the fibroblast CSR genes present on our cDNA microarrays, and were used for subsequent analyses. Because the downloadable data were presented as 2-color ratios in log base 10 space, we simply transformed the measurements to log base 2 space to allow comparison to the spotted DNA microarray data. Consistent with the scheme employed for all 2-color hybridization arrays considered in this study, we filtered out genes with fewer than 80% data present (453 genes passed the filter). These data were then processed as detailed in section III below.

[0129] Lung Adenocarcinoma (Bhattacharjee et al.). We downloaded raw microarray data (U95A series) for 156 specimens including 127 primary lung adenocarcinomas, 12 suspected extrapulmonary metastases from the lung, and 17 normal lung samples from the supplemental website accompanying Bhattacharjee et al. Because the data provided by the authors were intensity measurements processed by a rank-invariant scaling scheme, we converted these intensities to normalized log-ratios to allow comparison of the corresponding measurements from cDNA microarrays. Specifically, following the protocol employed by Ramaswamy et al., we (1) considered all measurements regardless of Present ("P") or Absent ("A") call, (2) then applied a thresholding filter which arbitrarily sets values less than 20 to 20, and those above 16,000 to 16,000, and (3) then applied a variation filter such that we only considered those features which exhibited variation of at least 100 in intensity and which showed at least 3-fold difference in the intensity between the highest and lowest expression levels across the 156 microarrays (6,349 of 12,600 passed these criteria). Following these 3 steps, we then (1) generated ratios by mean centering the expression data for each gene (by dividing the intensity measurement for each gene on a given array by the average intensity of the gene across all 156 arrays), (2) then log-transformed (base 2) the resulting ratios, and (3) then median centered the expression data across arrays then across genes (2 iterations).

[0130] UniGene mapping/CSR cross-referencing: We next mapped the 12,454 probe sets (excluding control elements) represented on these U95A Affymetrix microarrays to the corresponding GenBank accessions of the mRNA targets, using the NetAffx resource (Liu et al., 2003) as well as "Table A" from the supplement to Ramaswamy et al. These accessions were then used in BatchSOURCE and LocusLink queries to retrieve the corresponding UniGene cluster IDs (build #158); in this manner we mapped 11,963 (~96%) probe sets to 9,311 unique UniGene clusters. Of these mapped probe sets, 246 (corresponding to 212 unique UniGene clusters) had corresponding features represented in the CSR gene list, and were used for further analyses as described below.

[0131] Medulloblastoma (Pomeroy et al.): We downloaded raw microarray data (HuGeneFL series) for 60 specimens from the supplemental website accompanying Ramaswamy et al. (their Dataset E). Because the data provided by the authors were intensity measurements processed by a linear scaling scheme (Ramaswamy et al., 2003), we converted these intensities to normalized log-ratios to allow comparison of the corresponding measurements from cDNA microarrays. Specifically, following the convention employed by Ramaswamy et al., we (1) considered all measurements regardless of Present ("P") or Absent ("A") call, and (2) then applied a thresholding filter which arbitrarily sets values less than 20 to 20, and those above 16,000 to 16,000. Following these 2 steps, we then (1) generated ratios by mean centering the expression data for each gene (by dividing the intensity measurement for each gene on a given array by the average intensity of the gene across all 60 arrays), (2) then log-transformed (base 2) the resulting ratios, and (3) then median centered the expression data across arrays then across genes (2 iterations).

[0132] UniGene mapping/CSR cross-referencing: We next mapped the 7,129 probe sets represented on these HuGeneFL Affymetrix microarrays to the corresponding GenBank accessions of the mRNA targets, using the NetAffx resource (Liu et al., 2003) as well as "Table A" from the supplement to Ramaswamy et al. We retrieved surrogate accessions for probe sets designed from TIGR consensus sequences from the Wong Lab website at Harvard University. These accessions were then used in BatchSOURCE and LocusLink queries to retrieve the corresponding UniGene cluster IDs (build #158); we supplemented these mappings with an annotation file from Jean-Marie Rouillard at the University of Michigan. In this manner we mapped 7,079 (~99%) probe sets to 5,691 unique UniGene clusters (build #158). Of these mapped probe sets, 222 (corresponding to

181 unique UniGene clusters) had corresponding features represented in the CSR gene list, and were used for further analyses as described below.

[0133] Classification of Cancers by Fibroblast CSR genes and correlated clinical outcomes. The patterns of expression in human tumors of the 512 genes of the fibroblast CSR gene set were analyzed using data from published tumor expression profiles listed above. We used IMAGE clone identifiers to follow the identity of cDNA probes of Stanford and NIH cDNA microarrays, and used the UniGene unique identifier to match genes represented in different microarray platforms. Transformation and normalization of expression data from different platforms are described above.

[0134] For cDNA microarray data, genes with fluorescent hybridization signals at least 1.5-fold greater than the local background fluorescent signal in the reference channel (Cy3) were considered adequately measured and were selected for further analyses. The genes for which technically adequate measurements were obtained from at least 80% of the samples in a given dataset were centered by mean value within each dataset, and average linkage clustering was carried out using the Cluster software. In each set of patient samples, the samples were segregated into two classes based on the first bifurcation in the hierarchical clustering dendrogram. Unless otherwise noted, the clustering and reciprocal expression of serum-induced and serum-repressed genes in the tumor expression data allowed two classes to be unambiguously assigned. Samples with generally high levels of expression of the serum-induced genes and low levels of expression of the serum-repressed genes were classified as "activated"; conversely, samples with generally high levels of expression of serum-repressed genes and low levels of expression of the serum-induced genes were classified as "quiescent". Survival analysis by Cox-Mantel test was performed in the program Winstat (R. Fitch Software).

[0135] For results shown in the paper, the expression data of CSR genes for each data set is provided in the cdt file and can be viewed using Treeview. The correlated clinical data are available in Microsoft Excel worksheets as indicated below.

[0136] Classification of tumors using fibroblast CSR genes and correlated clinical outcomes. The gene expression data of 58 samples (including 3 normal, 4 fibroadenomas, and 51 locally advanced breast cancers from the same clinical trial) were downloaded from Stanford Microarray Database. Because the data were derived from several batches of microarrays (some containing different numbers of genes), the filtering criteria were relaxed to include genes with technically adequate data in 60% of experiments in order to preserve the expression data stemming from the larger arrays. 218 cDNA probes corresponding to CSR genes (henceforth genes) were present in this dataset and passed the filtering criteria. The expression patterns of these 218 genes were used for hierarchical clustering to define 2 classes as described above. The 3 normal breasts and 4 fibroadenomas in this dataset were all identified as "quiescent", along with 32 breast tumors. 19 tumors were classified as "activated." The "activated" tumors demonstrated worse outcome in disease-specific survival and relapse free survival (p=0.041 and 0.013, respectively). Applying CSR genes to the entire set of 85 breast carcinomas yielded similar classification result and prognostic stratification.

[0137] Classification by Pearson correlation. To evaluate the validity of splitting tumor samples into two classes, we analyzed the expression pattern of CSR genes in the locally advanced breast cancers by an alternative approach that quantifies the similarity of CSR gene expression in tumors vs. in cultured fibroblasts. The expression pattern of CSR genes in the 10 fibroblast types cultured in 10% FBS was averaged to derive a single number for each gene. The Pearson correlation of the averaged fibroblast expression pattern with each of the breast cancer samples was then calculated. The Pearson correlation data demonstrated at least two groups of breast cancer samples: one group with expression patterns that have positive correlation to the fibroblast serum-induced expression pattern, and a second group with expression patterns that are anti-correlated with serum-induced expression. Plotting the Pearson correlations against uncensored survival time revealed that cancer samples with Pearson correlation greater than 0.2 had decreased survival and relapse-free survival. Using Pearson correlation of 0.2 as the cutoff, Cox-Mantel test confirmed that breast cancers with high correlation to fibroblast serum-induced expression of CSR genes indeed demonstrate poorer disease-specific survival and relapse free survival (p=0.023 and 0.04, respectively).

[0138] Lung cancer, all stages. Gene expression data of 67 lung carcinomas and 6 normal lung tissues were downloaded from Stanford Microarray Database. Genes with technically adequate measurement over 80% of experiments were selected; 338 cDNA probes corresponding to CSR genes (henceforth genes) were present in this dataset and passed the filtering criteria. The expression patterns of these 338 genes were used for hierarchical clustering to define 2 classes as described above. The 6 normal lung tissues in this dataset were all identified as "quiescent". Among 24 primary lung adenocarcinomas with adequate survival information, 10 tumors were classified as "activated" and 14 tumors were classified as "quiescent." The "activated" tumors demonstrated worse overall survival (p=0.001). There was an apparent association between the activated serum phenotype and advanced stage: 7 out of 10 "activated" tumors had distant metastases at the time of presentation while only 3 of 14 patients with "quiescent" tumors had metastases at time of presentation.

[0139] Gastric cancer. Gene expression data of 104 gastric carcinomas and 24 non-neoplastic gastric tissues were downloaded from Stanford Microarray Database. Genes with technically adequate measurement over 80% of experiments were selected; 446 cDNA probes corresponding to CSR genes (henceforth genes) were present in this dataset and passed the filtering criteria. The expression patterns of these 446 genes were used for hierarchical clustering to define 2 classes as described above. The 24 normal gastric tissues in this dataset were all identified as "quiescent". Among 42 stage III primary gastric carcinomas with adequate survival information, 18 tumors were classified as "activated" and 24 tumors were classified as "quiescent." The "activated" tumors demonstrated worse overall survival (p=0.02).

[0140] Diffuse large B cell lymphoma. Gene expression data of 240 DLCL samples were downloaded. Genes with technically adequate measurement over 80% of experiments were selected; 198 cDNA probes corresponding to CSR genes (henceforth genes) were present in this dataset and passed the filtering criteria. The expression patterns of these 198 genes were used for hierarchical clustering to define 2 classes as described above. We did not observe clear reciprocal expression of serum-induced and serum-repressed CSR genes within the samples. Thus, we took the first bifurcation of the hierarchical clustering dendrogram and classified samples as "A" or "B", recognizing that the variation observed here may not have biological meaning. 110 samples were classified as "A" and 130 samples were classified as "B". However, these two groups do not have significant difference in their overall survival (p=0.25).

[0141] Hepatocellular carcinoma. Gene expression data of 82 HCC and 74 non-neoplastic liver tissues were downloaded from Stanford Microarray Database. Genes with technically adequate measurement over 80% of experiments were selected; 249 cDNA probes corresponding to CSR genes (henceforth genes) were present in this dataset and passed the filtering criteria. The expression patterns of these 249 genes were used for hierarchical clustering to define 2 classes as described above.

[0145] Medulloblastoma. Gene expression data of 60 medulloblastoma samples were downloaded, transformed, and processed as described in section II. Genes with technically adequate measurement over 80% of experiments were selected; 222 CSR genes present in this dataset passed the filtering criteria. The expression patterns of these 222 genes were used for hierarchical clustering to define 2 classes as described above. We did not observe clear reciprocal expression of serum-induced and serum-repressed CSR genes within the samples. Thus, we took the first bifurcation of the hierarchical clustering dendrogram and classified samples as "A" or "B", recognizing that the variation observed here may not have biological meaning. 21 samples were classified as "A" and 39 samples were classified as "B". However, these two groups do not have significant difference in their overall survival (p=0.65).

[0146] To identify genes that are constitutively and highly expressed in fibroblasts, the global gene expression data of 50 fibroblast cultures was selected as follows. The median
73 out of 74 non-neoplastic liver tissues Cy5 fluorescence signal over background (representing in this dataset were identified as “quiescent”. 77 out of 82 expression of genes in fibroblasts) for each array element HCC samples were classified as “activated.” Because most was filtered for regressiond O.6 over the element, Cy3 chan tumors had the activated CSR phenotype, we did not analyze nel (representing reference RNA) signald 1.5 fold over back possible survival differences. ground, 80% informative data and variance less than 2 fold in 5 arrays over the 50 experiments. These filtering criteria 0142 Prostate cancer. Gene expression data of 59 pros identified 12959 array elements out of 44600 on the microar tate cancers and 41 non-neoplastic prostate tissue were ray. The Cy5 fluorescence signal of each gene was then downloaded from Stanford Microarray Database. Genes averaged for the 50 experiments and ranked from high to with technically adequate measurement over 80% of experi low. Genes already identified as the universal fibroblast ments were selected: 431 clNA probes corresponding to serum response were removed from this list. The top 1% this CSR genes (henceforth genes) were present in this dataset ranked gene list (122 out of 12213) was termed “top 1% and pass the filtering criteria. The expression pattern of these fibroblast genes.” 431 genes were used for hierarchical clustering to define 2 classes were as described above. 40 out of 41 non-neoplastic 0147 To determine whether the top 1% fibroblast genes prostate tissues in this dataset were identified as "quiescent'. also had prognostic power in breast cancer, IMAGE clone 58 out of 59 HCC samples were classified as “activated.” number was used to map the genes in this list to array Because most tumors had the activated CSR phenotype, we elements in breast cancer gene expression. 98 out of 122 did not analyze possible survival differences. genes were mapped. 
The extracted expression data was centered by mean, filtered for genes that were present for 0143 Early breast cancer. Gene expression data of 78 80% of experiments, and the breast cancer samples were stage I and IIA breast cancers were downloaded and pro organized by the expression pattern of these genes as cesses as described above in section II. Genes with techni described above using hierarchical clustering. The top 1% cally adequate measurement over 80% of experiments were fibroblast genes were up regulated in benign fibroadenomas, selected; 453 CSR genes were present in this dataset and which is consistent with the known biology of fibroad pass the filtering criteria. The expression pattern of these 453 enomas and confirms the selection of fibroblast-enriched genes were used for hierarchical clustering to define 2 genes. However, separation of 51 breast cancer Samples into classes were as described above. 33 tumors were classified 2 groups based on this gene list did not identify a statistically as “activated” and 45 tumors were classified as “quiescent.” significant Survival difference between these two groups The “activated” tumors demonstrated worse metastasis-free (p=0.75). survival over 10 years of followup (p=0.00046). 0.148. To compare the prognostic value of fibroblast CSR 0144. Early lung cancer—stage I and II. Gene expression to a measure of cell proliferation, we chose to classify breast data of 156 lung samples, including 62 stage I and II primary cancers based on the expression pattern of all genes desig lung adenocarcinomas and 17 normal lung samples were nated as S or G2/M phase-specific. 535 out of 726 cDNA downloaded and processes as described above in section II. clones were mapped in the breast cancer data, and 224 out Genes with technically adequate measurement over 80% of of 535 clones passed the filter criteria as above. 
The expres experiments were selected; 246 CSR genes were present in sion patterns and samples were organized by hierarchical this dataset and pass the filtering criteria. The expression clustering; the tumors overexpressing the S and G2/M phase pattern of these 246 genes were used for hierarchical clus signature demonstrated poorer outcome but with borderline tering to define 2 classes were as described above. 16 of 17 statistical significance in relapse free Survival and overall normal lung samples were classified as "quiescent.” Among survival (p=0.06 and 0.08, respectively). Thus, although the 62 stage I and II primary lung adenocarcinomas, 36 mitotic rate is one of the established criteria for tumor grade, tumors were classified as “activated' and 26 tumors were the aggregate gene expression measurement of cell prolif classified as "quiescent.” The “activated tumors demon eration is not sufficiently robust to predict outcome. This strated worse overall survival (p=0.021). result also indicates that the prognostic power of the fibro US 2006/0183141 A1 Aug. 17, 2006 22 blast core serum response genes cannot be solely accounted EXAMPLE 2 for by the incomplete removal of genes representing cell cycle progression. 0152 Based on the hypothesis that normal wound heal ing and cancer metastasis may share many common features, 0149 To confirm the interpretation that the common we identified consistent features in the transcriptional serum response of fibroblasts reflect their diverse roles in response of normal fibroblasts to serum, and used this wound healing, we asked whether the serum response genes wound response signature to reveal links between wound were enriched for biologic processes related to wound healing and cancer progression in a variety of common healing in the public Gene Ontology annotation database. epithelial tumors. 
Here we show in a consecutive series of The common fibroblast serum response were queried against 295 early breast cancer patient that tumors showing an the GO database using the program SOURCE, and enrich activated wound response signature (N=126) had decreased ment of GO-annotated biologic processes greater than distant metastasis-free probability and overall survival com expected by chance was calculated using a hypergeometric pared to those with a quiescent signature (10 year DMFP= distribution model as previously described. Specifically, we 51% vs. 75% and OS=51% vs. 84%, P value=10 and compared the number of genes with a particular GO anno 10', respectively). We establish a gene expression centroid tation in the query set ('sample succ/sample num”) versus of the wound signature that allows the signature to be that ratio calculated for all genes on the microarray ("pop applied to individual samples prospectively and quantita Succ/pop num”). For genes in the unfiltered, common tively, and enables the signature to be scaled to suit different fibroblast serum response, the predominant biologic process clinical purposes. annotations were related to cell proliferation. Once genes 0153. Moreover, we find that the wound response signa that have periodic expression during the cell cycle were ture improves risk stratification independently of known removed (FIG. 1B,C), the enriched biologic processes clinical and pathologic risk factors and previously estab include: blood coagulation (GO:0007596), angiogensis lished prognostic signatures based on unsupervised hierar (GO:0001525), complement activation (GO:0006956), chical clustering (“molecular subtypes) or Supervised pre immune response (GO:0006955), proteolysis and peptidoly dictors of metastasis (“70-gene prognosis signature'). 
These sis (GO:0006508), and secretory protein synthesis such as results demonstrate that hypothesis-driven gene expression N-linked glycosylation (GO:0006487) and protein transla signatures of biological processes can provide order and tion (GO:0006445). This result reinforces the idea that the meaning to heterologous data, and is a powerful approach to common transcriptional response of fibroblasts to serum in decipher the complex biology of human diseases. vitro recapitulates their multifaceted roles in wound healing in vivo. Materials and Methods 0150. To understand how many of the CSR genes were 0154 Tumor Gene Expression Profiles. Detailed patient driving the classification of tumors into two classes (Acti information has been described previously. RNA isolation, vated vs. Quiescent), we performed SAM analysis on the labeling of complementary RNA, competitive hybridization CSR gene expression patterns in two breast cancer datasets of each tumor cRNA with pooled reference cFNA from all examined in this study (datasets B and H above). SAM is a samples to 25,000 element oligonucleotide microarrays, and permutation-based algorithm that calculates a false discov measurement of expression ratios were previously described ery rate (FDR) analogous to traditional p-values but has (vant Veer et al. (2002) Nature 415, 530-6). added advantages. Of 217 CSR genes in the Sorlie dataset, Data Analysis 108 (50%) of the CSR genes were significantly different (FDR-0.05) between the activated vs. the quiescent 0.155) Prognostic signatures. Genes on Stanford cDNA samples. Of the 456 CSR genes in the vant Veer dataset, microarrays and Rosetta/NKI oligonucleotide microarrays 237 genes (52%) were significantly different (FDR-0.05) were mapped across different platforms using Unigene iden between the activated and quiescent samples. Thus, a sig tifiers. 
This older build of Unigene was used to allow nificant Subset of the CSR genes are providing discriminat comparison with 2 previously published cross-platform ing power to the tumor classification, highlighting the link analyses. In the unsupervised analysis, 295 tumor samples between wound healing and cancer progression. were grouped by similarity of the expression pattern of the CSR genes (for which technically adequate data were 0151. To address the level of redundancy of CSR genes obtained from at 80% of samples) by average linkage in achieving tumor classification, we applied a shrunken clustering using the Software Cluster, the gene expression centroid analysis in the program Prediction Analysis of values were centered by mean. The samples were segregated Microarrays (PAM). Using a 10-fold balanced leave-one-out into two classes based on the first bifurcation of the clus training and testing procedure, we discovered that as few as tering dendrogram; the two classes were identified as “Acti 35 CSR genes could recapitulate the classification in the vated' vs. "Quiescent' by the predominant expression of the Sorlie dataset, and as few as 38 CSR genes could recapitu serum-induced and serum repressed CSR genes. Classifica late the classification in the vant Veer dataset. In other tion of the tumors as having a good prognosis signature or words, a minimum of 6% of CSR genes may accomplish the a poor prognosis signature based on the expression of 70 diagnostic task. Because different published cancer gene genes was as described above. The 5 class “intrinsic gene’ expression datasets contain varying number of CSR genes, signature was assigned by matching the expression value of the robustness of the CSR gene classification underlies our the intrinsic genes in the NKI dataset to the nearest expres Success in using this one set of genes in Stratifying prognosis sion centroid of the 5 classes as described; samples that did in multiple types of human cancers. 
Nevertheless, we have not have correlation>0.1 to any centroid were termed unclas noted that different subsets of CSR genes are more distinct sified. 509 probes representing 431 genes out of 487 intrinsic in different types of cancers. genes were successfully identified in the NKI data set. US 2006/0183141 A1 Aug. 17, 2006 23
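The two-class split used throughout these analyses (average linkage clustering, then the first bifurcation of the dendrogram) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names `pearson` and `first_bifurcation`, the toy tumor profiles, and the use of 1 minus Pearson correlation as the distance. It is not the Cluster software, whose linkage computation may differ in detail. Merging the closest clusters until only two remain yields the same two groups as cutting an average-linkage dendrogram at its root.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # Assumption: a constant profile is treated as uncorrelated.
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def first_bifurcation(profiles):
    """Split samples into the two groups at the root of an average-linkage tree.

    profiles: dict mapping sample name -> list of mean-centered expression values.
    Distance between samples is 1 - Pearson correlation (an assumption).
    """
    names = list(profiles)
    dist = {
        (a, b): 1.0 - pearson(profiles[a], profiles[b])
        for a in names for b in names if a != b
    }

    def average_linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > 2:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Hypothetical profiles: first two genes serum-induced, last two serum-repressed.
profiles = {
    "T1": [1.0, 0.8, -0.9, -1.1],
    "T2": [0.9, 1.2, -1.0, -0.8],
    "T3": [-1.0, -0.9, 1.1, 0.7],
    "T4": [-0.8, -1.1, 0.9, 1.2],
}
classes = first_bifurcation(profiles)
print(classes)
```

In this toy input, T1 and T2 show high serum-induced and low serum-repressed expression, so the group containing them would be the one labeled “activated” under the rule stated in the text.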

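The nearest-centroid assignment with a correlation floor, as described for the intrinsic gene signature, can be sketched in the same spirit. The names `pearson_r` and `assign_subtype`, the two toy centroids, and the four-gene profiles are hypothetical; only the rule itself (pick the best-correlated centroid, and call the sample unclassified unless the correlation exceeds 0.1) comes from the text.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # Assumption: a constant profile is treated as uncorrelated.
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def assign_subtype(sample, centroids, min_corr=0.1):
    """Return (label, r) for the best-correlated centroid.

    The label is 'unclassified' when no centroid has correlation > min_corr.
    """
    best_label, best_r = max(
        ((label, pearson_r(sample, centroid)) for label, centroid in centroids.items()),
        key=lambda item: item[1],
    )
    if best_r > min_corr:
        return best_label, best_r
    return "unclassified", best_r


# Hypothetical centroids over four genes (the real signature uses hundreds).
centroids = {
    "basal": [2.0, 1.0, -1.0, -2.0],
    "luminal A": [-2.0, -1.0, 1.0, 2.0],
}
print(assign_subtype([1.8, 0.9, -0.7, -2.1], centroids))
print(assign_subtype([1.0, -1.0, -1.0, 1.0], centroids))
```

The same correlation-to-centroid score, computed against a single serum-activated fibroblast centroid, is what the text later treats as a continuous, thresholdable wound response score.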
0156 Survival analysis. Overall survival (OS) was defined by death from any cause. Distant metastasis-free probability (DMFP) was defined by a distant metastasis as a first recurrence event; data on all patients were censored on the date of the last follow-up visit, death from causes other than breast cancer, the recurrence of local or regional disease, or the development of a second primary cancer, including contra-lateral breast cancer. Kaplan-Meier survival curves were compared by the Cox-Mantel log-rank test in Winstat(R) for Excel. Multivariate analysis by the Cox proportional hazard method was performed using the software package SPSS 11.5 (SPSS, Inc.).

0157 Scaling the wound signature. The patient dataset was randomized into two halves, one for training and one for testing. The two half sets were matched for all known clinical parameters and risk factors (Table 2). The serum-activated fibroblast centroid was as described (Chang et al. (2004) PLoS Biology 2, E7). Pearson correlation of the expression values of CSR genes of tumor samples to the serum-activated fibroblast centroid results in a quantitative score reflecting the wound response signature for each sample. The higher the correlation value, the more the sample resembles serum-activated fibroblasts (“activated” wound response signature). A negative correlation value indicates the opposite behavior and higher expression of the “quiescent” wound response signature. The threshold for the two classes can be moved up or down from zero depending on the clinical goal. Sensitivity and specificity for predicting metastasis as the first recurrence event were calculated for every threshold between -1 and +1 for the correlation score in 0.05 increments. The threshold value of negative 0.15 correlation gave 90% sensitivity for metastasis prediction in the training set, and had equivalent performance in the test set.

TABLE 2
Characteristics of patients in the learning and test subsets. No significant difference was found between the two subsets.

Characteristic                           Training set   Validation set   All            P value
                                         (N = 148)      (N = 147)        (N = 295)
Overall survival (10 years)              69.6%          70.9%            70.4%          0.96
Metastasis-free probability (10 years)   66.8%          63.3%            65.2%          0.89
T1 vs. T2                                53%-47%        52%-48%          53%-47%        0.77
pN0-pN1a-pN2a3a                          51%-36%-13%    51%-36%-13%      51%-36%-13%    1
MST vs. BCT                              45%-55%        46%-54%          45%-55%        0.96
ER+ vs. ER-                              72%-28%        81%-19%          77%-23%        0.08
Grade I-II-III                           27%-30%-43%    24%-38%-38%      24%-35%-40%    0.38
Age (<40 vs. >40)                        19%-81%        24%-76%          21%-79%        0.31
CHT yes vs. no                           38%-62%        37%-63%          37%-63%        0.84
70 genes poor vs. good                   62%-38%        60%-40%          61%-39%        0.69
WS activated vs. quiescent               43%-57%        42%-58%          43%-57%        0.85

0158 Decision Tree Analysis. To construct a decision tree, we considered all clinical risk factors and gene expression profiles using the Cox proportional hazard model in SPSS, identified the dominant risk factor (most significant p value) to segregate patients, and reiterated the process on each subgroup until the patients or risk factors became exhausted. For gene expression signatures, we used the correlation value to each canonical centroid as a continuous variable to capture the possibility that different thresholds may be optimal in different subgroups. Because 61 patients with lymph node negative disease in this series were used to train the 70 gene signature, performance of the decision tree incorporating the 70 gene signature was validated on the independent subset of patients with lymph node positive disease. The threshold for the 70-gene signature was previously reported; the threshold for the wound response signature was chosen based on outcome data in the training set. Performance of the decision tree analysis was validated by equal performance in the randomized training and testing sets of patients. Support of the decision tree model by non-linear multivariate analysis is described in FIG. 12.

0159 Prognostic Value of a Wound Response Gene Expression Signature in Breast Cancer. To validate the prognostic value of the wound response signature, we examined the expression of the core serum response genes in 295 consecutive patients with early breast cancer treated at the Netherlands Cancer Institute. 442 probes representing 380 out of 459 core serum response genes were successfully identified in this data set. In order to determine whether the CSR genes showed coherent expression in this new set of patients, we grouped the expression pattern of genes and patients by similarity using hierarchical clustering. As reported above in 2 smaller groups of breast cancer patients, the CSR genes showed a coordinated and biphasic pattern of expression (FIG. 6A). Breast cancer samples showed predominant expression of either serum-induced or serum-repressed genes, allowing us to assign each sample to the “activated” or “quiescent” wound response signature. We tested for association between the wound response signature and the occurrence and timing of several key clinical outcomes. Patients with the activated wound response signature (n=126, 42.7%) had a significantly decreased distant metastasis-free probability (p=8.6x10) and overall survival (p=5.6x10') in univariate analysis (FIG. 6B, C). We noted that two small subsets of patients within the quiescent group have more heterogeneous gene expression patterns (FIG. 6A, yellow bars); these patients, who were less confidently assigned to the quiescent group, had an intermediate risk of metastasis and death from their tumors (FIG. 10).

0160 We extended the analysis by separately testing the association of the activated wound response signature and clinical outcome in subsets of breast cancer patients: those with tumors <2.0 cm (T1 tumors); and separately in lymph node negative disease, and in lymph node positive disease. In each of these subsets of breast cancer patients, patients with tumors showing an activated wound response signature had significantly worse distant metastasis-free probability and overall survival compared to those with a quiescent wound signature (FIG. 7). These results confirm that the wound response signature is a powerful prognostic indicator in breast cancer.

TABLE 3
Multivariate analysis of prognostic gene expression signatures and clinical risk factors using a linear additive Cox proportional hazard model.

                                                     Death                          Metastasis
                                                     Hazard Ratio (95% CI)  P value   Hazard Ratio (95% CI)  P value
Wound response signature*                            6.17 (1.11-34.48)      0.034     3.60 (0.71-18.17)      0.11
70-gene poor prognosis signature                     4.46 (1.71-11.63)      0.002     4.53 (2.10-9.77)       <0.0001
Molecular subtypes
  Basal                                              0.45 (0.047-4.20)      0.47      0.244 (0.042-1.40)     0.11
  Erbb2                                              0.74 (0.085-6.43)      0.78      0.532 (0.11-2.69)      0.44
  Luminal a                                          0.79 (0.085-7.38)      0.83      0.679 (0.13-3.53)      0.64
  Luminal b                                          0.59 (0.068-5.12)      0.62      0.458 (0.092-2.29)     0.33
  Indeterminate                                      0.51 (0.061-4.20)      0.52      0.438 (0.094-2.04)     0.28
Age (per decade)                                     0.75 (0.51-1.10)       0.13      0.821 (0.57-1.18)      0.27
Diameter of tumor (per cm)                           1.03 (1.00-1.05)       0.081     1.046 (1.02-1.08)      0.001
Lymph node status (per positive node)                1.10 (0.98-1.24)       0.11      1.148 (1.04-1.27)      0.007
Tumor grade
  Grade 2 vs. 1                                      1.93 (0.62-6.08)       0.25      1.262 (0.54-2.91)      0.58
  Grade 3 vs. 1                                      1.70 (0.51-5.69)       0.38      0.972 (0.39-2.42)      0.95
Vascular invasion
  1-3 vessels vs. 0 vessels                          0.72 (0.26-2.00)       0.52      0.623 (0.25-1.55)      0.30
  >3 vessels vs. 0 vessels                           1.74 (1.01-2.98)       0.040     1.539 (0.93-2.56)      0.09
Estrogen receptor status (positive vs. negative)     1.85 (0.83-4.12)       0.12      1.400 (0.65-3.03)      0.38
Mastectomy (vs. breast conserving therapy)           0.85 (0.51-1.41)       0.52      0.836 (0.52-1.36)      0.46
No adjuvant chemotherapy                             1.86 (0.99-3.50)       0.050     2.795 (1.53-5.11)      0.001
No adjuvant hormonal therapy                         1.25 (0.50-3.16)       0.63      1.713 (0.73-4.03)      0.21

*Per 1.0 increment in correlation value to the serum-activated fibroblast centroid. The correlation value to the serum-activated fibroblast centroid was modeled as a continuous variable: the hazard ratio per +1.0 correlation value is reported. CI = confidence interval. The hazard ratios per +0.1 correlation value for death and metastasis are 1.20 (95% CI = 1.01-1.42) and 1.14 (95% CI = 0.97-1.34) respectively. Each molecular subtype was compared to all other subtypes. Parameters found to be significant (p < 0.05) are shown in bold. Note that the 70-gene signature was identified based on metastasis prediction of a subset of these patients, thus its performance in this data set may be optimistic.

0161 Creation of a Scalable Prognostic Score based on the Wound Response Signature. The previous analyses depended on stratifying tumors within a pre-defined group, relative to which each tumor is evaluated. To allow practical clinical use of the wound signature, we needed to develop a method to rationally apply and scale the signature so that a newly diagnosed cancer could be scored and classified with respect to the wound response signature by itself. The classification of the new tumor should not influence the classification of previously studied tumors nor be influenced by the addition of other tumors to the data set. Classification by hierarchical clustering provided a mathematically reasonable but biologically arbitrary threshold for assigning a cancer to one of two groups; it is preferable to treat the threshold as a parameter and quantify the confidence with which patients are assigned to each class. The threshold for calling a tumor sample wound-like could then be systematically and finely scaled to favor sensitivity or specificity, depending on the clinical scenario. For example, in a screening setting, it may be preferable to favor sensitivity, whereas a clinical test to determine therapies associated with high morbidity should have high specificity.

0162 We first defined the expression pattern of CSR genes in serum-activated fibroblasts, which serves as the prototype of the “activated” profile of the wound response signature. Thus, we considered a strategy based on the correlation of the expression profile of CSR genes in each tumor sample to a vector representing the centroid of the differential expression in response to serum in cell culture studies of fibroblasts from 10 anatomic sites. The correlation value to the “serum-activated fibroblast centroid” generates a continuous score that can be scaled. To evaluate the prognostic utility of the scalable wound signature, we performed multivariate analysis of the wound signature with known clinical and pathologic risk factors for breast cancer outcomes; it showed that the wound signature is an independent predictor of metastasis and death and provides more prognostic information than any of the classical risk factors in the multivariate model (Table 1, hazard ratio of 7 and 11, respectively, P<0.01). Because the pattern of CSR genes in serum-activated fibroblasts was discovered completely independently of tumor gene expression data or clinical outcome, the prognostic power of the serum-activated fibroblast centroid in breast cancer provides strong evidence of the biologic link between wound healing and cancer progression.

TABLE 1
Multivariate analysis of risk factors for death and metastasis as the first recurrence event in early breast cancer.

                                                     Death                          Metastasis
                                                     Hazard Ratio (95% CI)  P value   Hazard Ratio (95% CI)  P value
Wound response signature*                            11.18 (2.52-49.6)      0.001     7.25 (1.75-30.0)       0.006
Age (per decade)                                     0.66 (0.45-0.95)       0.027     0.71 (0.50-1.00)       0.052
Diameter of tumor (per cm)                           1.02 (0.98-1.04)       0.270     1.03 (1.01-1.06)       0.008
Lymph node status (per positive node)                1.05 (0.94-1.16)       0.371     1.10 (1.01-1.21)       0.035
Tumor grade
  Grade 2 vs. 1                                      2.86 (0.96-8.5)        0.059     1.87 (0.86-4.07)       0.117
  Grade 3 vs. 1                                      3.14 (1.02-9.6)        0.045     1.70 (0.74-3.90)       0.212
Vascular invasion
  1-3 vessels vs. 0 vessels                          0.95 (0.35-2.52)       0.918     0.78 (0.32-1.87)       0.57
  >3 vessels vs. 0 vessels                           1.88 (1.13-3.11)       0.014     1.65 (1.02-2.68)       0.043
Estrogen receptor status (positive vs. negative)     0.49 (0.29-0.83)       0.008     0.82 (0.47-1.41)       0.468
Mastectomy (vs. breast conserving therapy)           1.23 (0.76-2.01)       0.401     1.28 (0.80-2.04)       0.311
No adjuvant therapy (vs. chemo or hormonal therapy)  1.42 (0.80-2.52)       0.291     2.24 (1.32-3.82)       0.003

*The correlation value to the serum-activated fibroblast centroid was modeled as a continuous variable: the hazard ratio per +1.0 correlation value is reported and represents the different risks at two ends of the spectrum. CI is confidence interval. The hazard ratios per +0.1 correlation value for death and metastasis are 1.27 (95% CI = 1.10-1.48) and 1.22 (95% CI = 1.06-1.40) respectively. Parameters found to be significant (p < 0.05) in the Cox proportional hazard model are shown in bold.

0163 Improving the Decision whether to Treat Early Breast Cancer Patients with Chemotherapy. Because the wound signature provides improved risk prediction compared to traditional criteria, we examined the utility of a scalable wound signature in a clinical scenario: the decision to treat with adjuvant chemotherapy in early breast cancer. Approximately 30% of women with early breast cancer have clinically occult metastatic disease, and treatment with chemotherapy in addition to surgical excision and radiotherapy improves their outcomes. Uniform treatment of early breast cancer in women younger than 50 years of age with chemotherapy increases the 10 year survival from 71% to 78% (absolute benefit of 7%) for lymph node negative disease and from 42% to 53% (absolute benefit of 11%) for lymph node positive disease, but at the cost of exposing a large number of women (89 to 93% of all breast cancer patients) who do not benefit to the morbidities of chemotherapy. The absolute benefit of chemotherapy for older patients is even smaller (3.3% for node negative and 2.7% for node positive patients). Clinical parameters, such as lymph node status, tumor size and histologic grade, can provide prognostic information, and are summarized in commonly used clinical guides for deciding whether to treat with chemotherapy such as the National Institute of Health (NIH) or St. Gallen consensus criteria. Nonetheless, risk stratification based on clinical parameters is far from perfect and as a result many women who are unlikely to benefit are treated with chemotherapy.

0164 Because the presence of the wound response signature in the primary tumor is associated with an increased risk of subsequent metastasis, we used a scalable wound signature to identify a subset of patients with a risk of subsequent metastasis of less than 10 percent. Within this low-risk population, the expected absolute benefit from chemotherapy would be very small and the decision to forego chemotherapy may be justified. Using the serum-activated fibroblast centroid, we assigned a correlation score to each tumor in the data set. We set a threshold for the correlation score that was able to identify 90% of all patients with subsequent metastasis; this threshold was validated by first learning the threshold in half of the samples and showing an equivalent performance in the remaining half of the data set.

0165 We then tested whether this supervised wound signature provided improved risk stratification compared to traditional clinical criteria. Indeed, patients who were assigned as high risk by the NIH or St. Gallen consensus criteria had heterogeneous outcomes, and within these sets of conventional “high risk” patients, the supervised wound response criterion was able to identify a subset of patients with a low risk of subsequent metastasis (FIG. 8A, B). 185 patients within the NKI dataset were not treated with adjuvant chemotherapy; the clinical outcomes of these patients allowed us to examine the appropriateness of the decision for chemotherapy provided by the clinical guidelines or wound signature. As schematized in FIG. 8C, the majority of patients who did not develop metastasis in this series were stratified as high risk by the NIH or St. Gallen criteria, and according to these criteria would have been treated with chemotherapy that would not benefit them. The wound response signature appropriately identified 90% of patients who developed metastases as the first recurrence (the end point of the supervised scaling), and at the same time would have spared 30% of women who did not develop metastasis from exposure to chemotherapy. These results illustrate the potential utility and improved risk stratification of scaling the wound response signature to fit the prognostic goals in a clinical setting.

TABLE 4
Sensitivity and specificity for predicting distant metastasis as first recurrence: comparison of gene expression profiles and clinical criteria.

                               Sensitivity   Specificity   False Negative
NIH high risk                  96.6%         3.9%          3.4%
St. Gallen high risk           93.2%         7.70%         6.8%
Wound response signature*      59.1%         64.3%         40.1%
70-gene signature**            85.2%         49.3%         14.8%
Wound response criterion+      90.9%         29.0%         9.1%

*Activated vs. Quiescent by hierarchical clustering.
**Good vs. Poor.
+Activated vs. Quiescent; cut off by correlation level -0.15 to the serum-activated fibroblast centroid.

0166 Integration of Diverse Gene Expression Signatures. How can we integrate the information from different prognostic signatures that have been identified for breast cancer to optimize risk stratification? We focused on three signatures that have been validated in independent studies and represent distinct analytic strategies. Perou et al., supra, used an unsupervised clustering strategy to identify subtypes of locally advanced breast tumors with pervasive differences in global gene expression patterns; the subtypes are thought to represent distinct biologic entities and were associated with different clinical outcomes. At least 5 subtypes were characterized (termed basal-like, ErbB2, luminal A, luminal B, and normal-like) and can be identified by the pattern of expression of a set of 500 “intrinsic genes.” In contrast, van't Veer et al., supra, selected a 70 gene signature based on the association of expression of each gene with the likelihood of metastasis within 5 years. The 70 gene signature was trained on a subset of the same set of patients used in the present work and its performance had been previously validated on

were highlighted by the wound response signature (FIG. 9A, right side). Similarly, almost all of the basal-like subgroup, so termed because they express markers characteristic of the basal epithelial cells in breast ducts, expressed the 70-gene poor prognosis signature and the activated wound response signature (FIG. 10, p<0.001, chi square test). These results confirm the notion that the basal-like tumors represent a distinct disease entity with an aggressive clinical course. However, outside of the basal-like subtype, many tumors had mixed expression patterns of several subtypes as defined by the intrinsic genes, and >100 tumors out of 295 could not be confidently assigned to any of the 5 subtypes defined by Perou and Sorlie et al. (FIG. 11). The limited ability to classify these cancers may be due to the incomplete representation of genes that define the intrinsic gene list in this dataset, or due to the fact that the genes that define this classification system were identified in locally advanced breast cancer samples and may not be optimal for classifying earlier stage cancers. In multivariate analysis combining (additively) known clinical risk factors with all 3 signatures, the 70-gene signature and wound response signature provided independent and significant prognostic information while the intrinsic genes did not (Table 2).

0168 As an alternative approach to considering information from multiple gene expression signatures for clinical risk stratification, we developed and evaluated a decision tree algorithm to identify patients with clinically meaningful differences in outcome. At each node in the decision tree, we considered all clinical risk factors and gene expression profiles, identified patients with divergent outcomes using the dominant risk factor, and reiterated the process on each subgroup until the patients or risk factors became exhausted. We discovered that in decision trees incorporating gene expression signatures, the 70-gene and wound response signatures were sufficient to capture the prognostic information in only 2 steps (FIG. 10B-D). Modeling of nonlinear interactions between the gene expression signatures and clinical risk factors independently yielded a similar conclusion (FIG. 12).

0169 For patients with early breast cancer and lymph node involvement, important clinical decisions are whether to treat with adjuvant chemotherapy and of what type. As previously reported, patients with the favorable 70-gene profile had approximately 90% metastasis-free probability (group 0).
Patients whose cancers had a poor prognosis the entire group of 295 patients. Finally, the wound response 70-gene profile, but lacked the activated wound response signature was identified in a hypothesis-driven approach that signature, have a risk profile similar to the aggregated specifically tested the relationship between genes activated average baseline (group 1); patients whose cancers had both in a wound-like experimental setting and tumor progression. the activated wound-response signature and the 70-gene Importantly, these prognostic signatures are defined by poor prognosis signature had a risk of metastatic disease expression patterns of distinct sets of genes with little approximately 6.4 fold higher than did patients in group 0 overlap-only 22 genes are shared by 2 signatures (18 of these (10 year DMFP of 89%, 78%, vs. 47%, respectively). Thus, genes were shared between wound response and the intrinsic the patients in group 0 might reasonably consider not gene list), and no gene is present in all 3 signatures. undergoing adjuvant chemotherapy, whereas the patients in 0167 We used each of the three signatures to evaluate group 2 have a risk profile more similar to patients with this series of 295 breast tumors and found that, despite their locally advanced disease and might be recommended for different derivations, the signatures gave overlapping and dose-dense or taxane-based adjuvant chemotherapy. consistent predictions of outcomes (FIG. 10A). Many pri Together, these results illustrate that adding the wound mary tumors from patients that developed Subsequent response signature to existing clinical, pathologic, and gene metastasis and died expressed both the 70-gene poor prog expression prognostic factors can significantly improve risk nosis signature and the wound response signature; notably a stratification and clinical decision making. 
Small group of tumors with poor outcome were not identified 0170 Using an independent data set, we have confirmed as having a poor prognosis by the 70-gene signature but that a wound response gene expression signature is a pow US 2006/0183141 A1 Aug. 17, 2006 27 erful predictor of clinical outcome in patients with early tumors into coherent and internally consistent groups, and stage breast cancers. Together with our previous results on where the signatures diverged, gave improved risk stratifi locally advanced breast cancer, lung cancer, and gastric cation compared to individual signatures. These results cancer, these findings reinforce the concept that a gene show that diverse analytic strategies are continuing to iden expression program related to the physiological response to tify distinct molecular features that are related to poor a wound is frequently activated in common human epithelial prognosis in these tumors. tumors, and confers increased risk of metastasis and cancer progression. By delineating the risk for metastasis based on 0172 Visualizing the connections between the different the wound response signature, these high risk breast cancer signatures reveals potential biologic explanations for differ patients may benefit from therapies that target the wound ent clinical outcomes and sets the stage for directed experi response. mentation. For example, the high level activation of the 0171 We have examined approaches to parameterize the wound signatures in the basal-like Subtype of breast cancers wound response signature so that it can be evaluated in raises the possibility that basal epithelial cells in breast ducts tumors individually to yield a quantitative score; the inter have distinct roles in wound healing and may differentially pretation of the wound signature score can then be rationally regulate the CSR genes. Finally, the ability of the wound directed to suit the clinical task. 
As a first step toward response signature, a gene expression pattern discovered in integrating diverse prognostic signatures, we examined the a cell culture model, to improve cancer risk stratification interactions and information provided by 3 independent beyond what had been accomplished using prognostic sig methods for using global gene expression patterns to classify natures derived directly from global expression patterns in breast cancers and predict their course: one that defined 5 the cancers themselves highlights the importance of diverse molecular subtypes, one that was discovered by directly and systematic studies of the human gene expression pro fitting to Survival data, and one based on an in vitro model gram in providing a framework for interpreting the complex of wound response. The different signatures classified genomic programs of human diseases.
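
The supervised scaling described above (a correlation score for each tumor against the serum-activated fibroblast centroid, a cutoff learned on half of the samples so that about 90% of patients with subsequent metastasis are captured, and a check on the remaining half) can be illustrated with the following minimal sketch. It is not the implementation used in the study. The function names, the use of Pearson correlation, and the even/odd split are assumptions made only for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors
    (assumed here as the 'correlation score' to the centroid)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def learn_cutoff(scores, metastasis, target_sensitivity=0.90):
    """Highest cutoff at which at least the target fraction of metastasis
    cases score at or above the cutoff (i.e., are called high risk).
    Assumes at least one metastasis case is present."""
    positives = sorted((s for s, m in zip(scores, metastasis) if m), reverse=True)
    needed = math.ceil(target_sensitivity * len(positives) - 1e-9)
    return positives[needed - 1]


def sensitivity_specificity(scores, metastasis, cutoff):
    """Sensitivity and specificity when score >= cutoff is called high risk."""
    tp = sum(1 for s, m in zip(scores, metastasis) if m and s >= cutoff)
    fn = sum(1 for s, m in zip(scores, metastasis) if m and s < cutoff)
    tn = sum(1 for s, m in zip(scores, metastasis) if not m and s < cutoff)
    fp = sum(1 for s, m in zip(scores, metastasis) if not m and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def split_half_check(scores, metastasis, target_sensitivity=0.90):
    """Learn the cutoff on even-indexed samples (hypothetical split),
    then evaluate it on the odd-indexed samples."""
    train = range(0, len(scores), 2)
    held_out = range(1, len(scores), 2)
    cutoff = learn_cutoff([scores[i] for i in train],
                          [metastasis[i] for i in train],
                          target_sensitivity)
    sens, spec = sensitivity_specificity([scores[i] for i in held_out],
                                         [metastasis[i] for i in held_out],
                                         cutoff)
    return cutoff, sens, spec
```

In this sketch a tumor's score would be `pearson(tumor_profile, centroid)`, and a tumor is called high risk when its score is at or above the learned cutoff. Table 4 reports the cutoff actually used as a correlation level of -0.15.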

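The two-step stratification described in paragraph 0169 (70-gene profile first, then the wound response signature) can be sketched as below. This is a hedged illustration of the grouping only. The function and parameter names are hypothetical, and the sketch does not reproduce the decision tree algorithm itself or any outcome statistics.

```python
from collections import Counter


def risk_group(seventy_gene_poor, wound_activated):
    """Two-step assignment: the 70-gene call first, then the wound response call."""
    if not seventy_gene_poor:
        return 0  # favorable 70-gene profile
    if not wound_activated:
        return 1  # poor 70-gene profile, wound response signature not activated
    return 2      # poor 70-gene profile and activated wound response signature


def group_counts(patients):
    """Count patients per group from (seventy_gene_poor, wound_activated) pairs."""
    return Counter(risk_group(poor, activated) for poor, activated in patients)
```

For example, `group_counts([(False, True), (True, False), (True, True)])` places one patient in each of groups 0, 1, and 2. The wound response call is ignored when the 70-gene profile is favorable, matching the definition of group 0 in the text.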
Sequences

Clone ID  UGCluster  Name  Symbol  CSR Redundant (Y = 1, N = 0)  Activated = 2, Quiescent = -2, cell cycle = 0
IMAGE: 809894  Hs.14779  acetyl-Coenzyme A synthetase 2 (ADP forming)  ACAS2  0  -2
IMAGE: 417404  Hs.227133  apoptotic chromatin condensation inducer in the nucleus  ACINUS  0  -2
IMAGE: 144797  Hs.8230  a disintegrin-like and metalloprotease (reprolysin type) with thrombospondin type 1 motif, 1  ADAMTS1  0  2
IMAGE: 472185  Hs.8230  a disintegrin-like and metalloprotease (reprolysin type) with thrombospondin type 1 motif, 1  ADAMTS1  1  0
IMAGE: 796323  Hs.324470  adducin 3 (gamma)  ADD3  0  -2
IMAGE: 1558492  Hs.22599  atrophin-1 interacting protein 1  AIP1  0  -2
IMAGE: 245174  Hs.172788  ALEX3 protein  ALEX3  1  -2
IMAGE: 251452  Hs.172788  ALEX3 protein  ALEX3  1  -2
IMAGE: 283233  Hs.172788  ALEX3 protein  ALEX3  1  -2
IMAGE: 785342  Hs.172788  ALEX3 protein  ALEX3  0  -2
IMAGE: 825842  Hs.262476  adenosylmethionine decarboxylase 1  AMD1  0  0
IMAGE: 1942271  Hs.72160  AND-1 protein  AND-1  0  2
IMAGE: 461699  Hs.172572  ankyrin repeat domain 10  ANKRD10  0  0
IMAGE: 2327739  Hs.279905  nucleolar protein ANKT  ANKT  0  0
IMAGE: 461933  Hs.279905  nucleolar protein ANKT  ANKT  1  0
IMAGE: 951241  Hs.279905  nucleolar protein ANKT  ANKT  1  0
IMAGE: 128711  Hs.62180  anillin, actin binding protein (scraps homolog, Drosophila)  ANLN  1  0
IMAGE: 129858  Hs.62180  anillin, actin binding protein (scraps homolog, Drosophila)  ANLN  0  0
IMAGE: 1637791  Hs.71331  acidic (leucine-rich) nuclear phosphoprotein 32 family, member E  ANP32E  0  0
IMAGE: 159608  Hs.75736  apolipoprotein D  APOD  1  -2
IMAGE: 838611  Hs.75736  apolipoprotein D  APOD  0  -2


CSR Activated = 2, Redundant Quiescent = -2, Clone.ID UGCluster Name Symbol (Y = 1, N = 0) cell cycle = 0 MAG E : 3233.71 S.177486 amyloid beta (A4) precursor APP O -2 protein (protease nexin-II, Alzheimer disease) MAG : 2316441 s.179735 ras homolog gene family, ARHC member C MAG : 29OOSO S.13531 Rho GTPase activating ARHGAP12 protein 12 MAG 293745 S.25951 Rho guanine nucleotide ARHGEF3 exchange factor (GEF) 3 MAG 1703236 S.245540 ADP-ribosylation factor-like 4 ARL4 MAG 295 710 S.26516 ASF1 anti-Silencing function ASF1B homolog B (S. cerevisiae) MAG E: 770377 S.267871 ATPase, H+ transporting, ATP6V0A1 ysosomal VO subunit a isoform 1 MAG 1585327 S.127337 axin 2 (conductin, axil) AXIN2 MAG 753400 S.2743SO BAFS3 BAFS3A MAG 1015874 S.54089 BRCA1 associated RING BARD1 domain 1 MAG 2326129 S.87246 BCL2 binding component 3 BBC3 MAG 415437 S.2798.62 BRCA2 and CDKN1A BCCIP interacting protein MAG E: 201727 B-cell CLL/lymphoma 6 (zinc BCL6 finger protein 51) MAG 826182 B-cell CLL/lymphoma 6 (zinc BCL6 finger protein 51) MAG 23O376 S. 69771 B-factor, properdin MAG 138728 S. 106826 BRAF35/HDAC2 complex (80 kDa) MAG E:: 469297 S.171825 basic helix-loop-helix domain BHLHB2 containing, class B, 2 MAG E:: 796694 S.1578 baculoviral IAP repeat BIRCS containing 5 (Survivin) MAG E : 448O36 S.283532 uncharacterized bone BMO39 marrow protein BMO39 MAG E S.283532 uncharacterized bone BMO39 marrow protein BMO39 MAG E: 14561 SS S.373498 potent brain type organic ion BOCT transporter MAG 711 698 S.34O12 breast cancer 2, early onset BRCA2 MAG 1844.857 S.97515 BRCA1 interacting protein C BRIP1 terminal helicase MAG 244767 S.1.192 barren homolog (Drosphia) BRRN1 MAG 781047 S.98.658 BUB1 budding uninhibited by BUB1 benzimidazoles 1 homolog (yeast) MAG E: 842968 S.36708 BUB1 budding uninhibited by BUB1B benzimidazoles 1 homolog beta (yeast) MAG E : 742952 S.40323 BUB3 budding uninhibited by BUB3 benzimidazoles 3 homolog

MAG E:: 726860 chromosome 11 open C11orf14 reading frame 14 MAG E 3.06446 chromosome 11 open

2.92829 S.121025 chromosome 11 open

MAG E 242840 S.44235 chromosome 13 open

MAG E 703.559 S.88523 chromosome 13 open reading frame 3 MAG E 195813 chromosome 1 open reading C1 orf.3 frame 33 MAG E 377346 S.284.609 complement component 1, S Subcomponen MAG E 85.634 S.284.609 complement component 1, S Subcomponen US 2006/0183141 A1 Aug. 17, 2006 29


CSR Activated = 2, Redundant Quiescent = -2, Clone UGCluster Name Symbol (Y = 1, N = 0) cell cycle = 0 MAG E : 1540227 S.9329 chromosome 20 open C2Oor 1 O reading frame 1 MAG : 23O8994 S.9329 chromosome 20 open C2Oor O O reading frame 1 MAG : 232837 S.9329 chromosome 20 open C2Oor reading frame 1 MAG : 80692 S.352413 chromosome 20 open C20orf108 reading frame 108 MAG : 2004O2 S. 70704 chromosome 20 open C20orf129 reading frame 129 MAG E:: 293727 S.208912 chromosome 22 open reading frame 18 MAG : 79412 S. 10235 open reading frame 4 MAG : 796623 S.88663 chromosome 6 open reading frame 139 MAG E:: 242O8 S.267288 chromosome 6 open reading frame 55 MAG 121136 chromosome 8 open reading frame 13 MAG 27516 S.13572 calcium modulating ligand CAMLG MAG 30170 S. 74.552 caspase 3, apoptosis-related CASP3 cysteine protease MAG 786084 S. 772S4 chromobox homolog 1 (HP1 CBX1 beta homolog Drosphila) MAG 814270 S.851.37 cyclin A2 CCNA2 MAG 950690 S.851.37 cyclin A2 CCNA2 MAG 856289 S. 194698 cyclin B2 CCNB2 MAG 45S128 S.1973 cyclin F CCNF MAG 823691 S.79069 cyclin G2 CCNG2 MAG 120362 S.1436O1 cyclin L2 CCNL2 MAG 884.425 S.1600 chaperonin containing TCP1, CCT5 subunit 5 (epsilon) MAG E: 1031142 S.22116 CDC14 cell division cycle 14 CDC14B homolog B (S. cerevisiae) MAG : 731127 S.22116 CDC14 cell division cycle 14 CDC14B homolog B (S. cerevisiae) MAG : 781061 S.22116 CDC14 cell division cycle 14 CDC14B homolog B (S. cerevisiae) MAG : 712.505 S.334562 cell division cycle 2, G1 to S CDC2 and G2 to M MAG E: 898.286 S.334562 cell division cycle 2, G1 to S CDC2 and G2 to M MAG 366.057 S.1634 cell division cycle 25A DC25A MAG 415102 S. 656 cell division cycle 25C MAG 204214 S. 69563 CDC6 cell division cycle 6 o o homolog (S. 
cerevisiae) MAG 731095 S.2345.45 cell division cycle associated 1 DCA1 MAG 814O72 S.34,045 cell division cycle associated 4 DCA4 MAG 7531.98 S.3338.93 cell division cycle associated 7 DCA7 MAG 23O8346 S.19.192 cyclin-dependent kinase 2 DK2 MAG 301.018 S.SO905 cyclin-dependent kinase-like 5 DKLS MAG 268652 S.17966S cyclin-dependent kinase DKN1A inhibitor 1A (p21, Cip1) MAG : 147744 S.106O70 cyclin-dependent kinase C inhibitor 1C (p57, Kip2) MAG : 700792 S.84113 cyclin-dependent kinase CDKN3 inhibitor 3 (CDK2-associated dual specificity phosphatase) MAG 2O17415 S.1594 centromere protein A, 17 kDa CENPA MAG 435.076 S. 77204 centromere protein F, CENPF 350/400ka (mitosin) MAG 431477 S.283.077 centromere protein J CENPJ MAG 429784 S.433212 CGI-121 protein CGI-121 MAG 246524 CHK1 checkpoint homolog CHEK1 (S. pombe) MAG : 71902 S.24641 cytoskeleton associated CKAP2 protein 2 US 2006/0183141 A1 Aug. 17, 2006 30


CSR Activated = 2, Redundant Quiescent = -2, Clone UGCluster Name Symbol (Y = 1, N = 0) cell cycle = 0 MAG E: 82S228 S.24641 cytoskeleton associated CKAP2 O O protein 2 MAG 812244 S.15159 chemokine-like factor CKLF O MAG 725.454 S.837.58 CDC28 protein kinase CKS2 O regulatory Subunit 2 MAG 288888 S.44563 hypothetical protein CL640 CL640 MAG 824755 S.211614 chloride channel 6 CLCN6 MAG 1915913 S.S4570 chloride intracellular channel 2 CLIC2 MAG 470279 S.31622 contactin associated protein 1 CNTNAP1 MAG 1602675 S.15591 COP9 subunit 6 (MOV34 COPS6 homolog, 34 kD) MAG E: S11647 S.17377 coronin, actin binding CORO1C protein, 1C MAG : 813490 S.17377 coronin, actin binding CORO1C protein, 1C MAG EE : 144849 S.289092 coactosin-like 1 COTL1 (Dictyostelium) MAG E : 4898.23 S.16297 COX17 homolog, COX17 cytochrome c oxidase assembly protein (yeast) MAG 85313 cell cycle progression 8 CPR8 protein MAG 768262 S.155481 cartilage associated protein CRTAP O MAG 1475574 S.173894 colony stimulating factor 1 CSF1 (macrophage) MAG E: 73527 S.173894 colony stimulating factor 1 CSF1 (macrophage) MAG : 949938 S.3O4682 cystatin C (amyloid CST3 angiopathy and cerebral hemorrhage) MAG 269997 S. 64837 cystinosis, nephropathic CTNS MAG 1571993 S.11590 cathepsin F CTSF MAG 295843 S.82568 cytochrome P450, family 27, CYP27A1 Subfamily A, polypeptide 1 MAG 624390 S. 6879 DC13 protein DC13 MAG 431.98 S. 709 eoxycytidine kinase DCK MAG 896978 S.115660 DNA cross-link repair 1B DCLRE1B (PSO2 homolog, S. 
cerevisiae) MAG 281898 S.40592S ifferential display and DDA3 activated by p53 MAG : 703633 S.40592S ifferential display and DDA3 activated by p53 MAG 245774 S.936.75 ecidual protein induced by EPP progesterone MAG 4.62961 S.83765 ihydrofolate reductase HFR MAG 2442OS S.83765 ihydrofolate reductase HFR MAG 768172 S.83765 ihydrofolate reductase HFR MAG 1995.58 S.124.696 ehydrogenase/reductase HRS6 (SDR family) member 6 MAG E: 743.182 S.S790 hypothetical protein 37E16.5 J37E16.5 MAG SO9943 S.4747 yskeratosis congenita 1, DKC1 yskerin MAG 1724716 S355920 DKFZP434B103 protein KFZP434B103 MAG : 462333 S.59461 DKFZP434C245 protein MAG 823655 S.323583 O thetical protein o 5 K FZ3434L142 MAG 1636060 S.26712O 8. C tyl idin MAG 136070 S.288771 FZP586AO522 protein MAG 70152 S.288771 FZP586AO522 protein MAG 2O62453 S.427S25 FZP727GO51 protein MAG 359.504 s. 270753 othetical protein 5 FZ761L1417 MAG E: 1540236 S.104859 othetical protein 5 K FZ762E1312 MAG S.104859 othetical protein 5 FZ762E1312 MAG : 1961.48 S.14478 othetical protein FZ762H185 US 2006/0183141 A1 Aug. 17, 2006 31


CSR Activated = 2, Redundant Quiescent = -2, Clone OGCluster Name Symbol (Y = 1, N = 0) cell cycle = 0 MAG E : 773383 S.20149 deleted in lymphocytic DLEU1 O 2 leukemia, 1 MAG : 27O136 S.43628 deleted in lymphocytic DLEU2 O 2 leukemia, 2 MAG E: 686,172 S.77695 discs, large homolog 7 DLG7 (Drosphila) MAG 755228 s. 166161 dynamin 1 DNM1 O MAG 752 770 s. 17834 downstream neighbor of DONSON SON MAG 767268 S.458134 dipeptidylpeptidase 7 DPP7 MAG 84.162O S.173381 dihydropyrimidinase-like 2 DPYSL2 MAG 24O748 S.29106 dual specificity phosphatase DUSP22 o 22 MAG 773678 S.367676 duTP pyrophosphatase MAG 768260 S.96055 E2F transcription factor 1 MAG 22918 S.346868 EBNA1 binding protein 2 MAG 306921 S.433779 eukaryotic translation elongation factor 1 epsilon 1 MAG E: 795.229 S.121073 EF-hand domain (C-terminal) O containing 1 MAG E:: 2O17769 S.433317 eukaryotic translation initiation factor 4E binding protein 1 MAG E: 25988 S.433750 eukaryotic translation E O 2 initiation factor 4 gamma, 1 MAG 272262 S.7913 hypothetical protein Ells1 ls1 MAG 109863 S.29191 epithelial membrane protein 2 MP2 MAG SO2682 S. 
102948 enigma (LIM domain protein) GMA MAG 1637756 S.254105 enolase 1, (alpha) MAG 392678 S.254105 enolase 1, (alpha) E MAG 153541 s.78436 EphB1 MAG 248.454 s.93659 protein disulfide isomerase related protein (calcium binding protein, intestinal related) MAG : 2632OO S.173374 endothelial and smooth muscle cell-derive neuropilin-like protein MAG : 2654.94 S.173374 endothelial and smooth muscle cell-derive neuropilin-like protein MAG : 78.2460 S.173374 endothelial and smooth muscle cell-derive neuropilin-like protein MAG 447208 S.47504 exonuclease 1 O MAG 770992 S.77256 enhancer of Zeste homolog 2 (Drosphila) MAG 310519 S.47913 coagulation factor X O MAG 1928791 S.62192 coagulation factor III (thromboplastin, tissue actor) MAG E : 2984.09 S.49881 fatty acid binding protein 3, FABP3 muscle and heart (mammary-derived growth inhibitor) MAG : 1758590 S.268012 fatty-acid-Coenzyme A FACL3 igase, long-chain 3 MAG : 310493 S.268012 fatty-acid-Coenzyme A FACL3 igase, long-chain 3 MAG E:: 49944 S.268012 fatty-acid-Coenzyme A FACL3 igase, long-chain 3 MAG 782503 S. 132898 atty acid desaturase 1 FADS1 MAG 128329 S.184641 fatty acid desaturase 2 FADS2 MAG 878174 S.184641 8. y acid desaturase 2 FADS2 MAG 770424 S.8047 Fanconi anemia, EANCG : complementation group G MAG : 358643 S.23111 phenylalanine-tRNA FARSL synthetase-like US 2006/0183141 A1 Aug. 17, 2006 32


CSR Activated = 2, Redundant Quiescent = -2, Clone OGCluster Name Symbol (Y = 1, N = 0) cell cycle = 0 MAG E : 68894 S.111903 Fc fragment of IgG, receptor, FCGRT O ransporter, alpha MAG : 770394 S.111903 Fc fragment of IgG, receptor, FCGRT 1 ransporter, alpha MAG E:: 80410 S.335918 airnesyl diphosphate DPS synthase (farnesyl pyrophosphate synthetase, dimethylallyltranstransferase, geranyltranstransferase) MAG E:: 951142 S.4756 flap structure-specific EN1 endonuclease 1 MAG : 842767 S.21331 hypothetical protein LJ10036 OO36 MAG : 773147 S.86211 O ical protein R O 5 6 MAG : 1664710 S. 1046SO O 8. protein LJ10292 T O 5. MAG : 824.126 S.30738 O 8. protein LJ10407 R O. 5. MAG : 292936 S.48855 O 8. protein LJ10468 T O. s MAG : 346834 S.42484 O 8. protein LJ106.18 R O s s MAG E:: 6262O6 S.334828 O 8. protein LJ10719 T O s MAG : 773605 S.8768 O 8. protein LJ10849 T Os MAG : 307328 S.34579 O 8. protein LJ10948 T O.. s MAG E:: 277808 S.29716 O l e C 8. protein LJ10980 L1098O MAG E:: 1572724 S.23363 hypothetical protein LJ10983 FLJ10983 MAG : 46.2861 S.274448 hypothetical protein LJ11029 FLJ11029 MAG : 809383 S.12151 hypothetical protein LJ11286 FLJ11286 MAG : 435619 S.374421 hypothetical protein LJ12643 FLJ12643 MAG : 188O814 S.323537 l e LJ12953 R RE. t M S

MAG : 3463O8 S.47125 O 8. O e l LJ13912 3o MAG : 290057 S.26812 R S. 8. protein MAG E:: 810603 S.26812 T s 8. protein MAG : 1697632 S.24.6875 hetical protein 20059 R O O 5 9 MAG : 124242 hetical protein T O 1 5 4 MAG : 812137 hetical protein R O 3 3 1 MAG : 590253 S.79828 hetical protein T O 3 3 3 MAG S.133260 hetical protein R O 3 5 4 MAG : 882355 S.32471 hetical protein R O 3 64 MAG E:: 549572 S.426696 hetical protein O 5 6 MAG : 85891S S.289069 y hetical protein LU21016 MAG E: 1696374 S.255416 66 LU21986 MAG 838446 S.31297 s1 s cytO chrom e b LU23462 MAG 7822.59 S.38178 l y O l e ic8. O e l LU23468 2 3 46 8 US 2006/0183141 A1 Aug. 17, 2006 33


CSR Activated = 2, Redundant Quiescent = -2, Clone UGCluster Symbol (Y = 1, N = 0) cell cycle = 0

MAG E : 814769 S.38178 l y O l e C 8. protein FLU23468 O O

MAG : 1618978 S.1656O7 protein LJ25416 O O

MAG E:: 32O865 S.124740 l protein O 5 3 2 MAG E:: 1941536 l protein LJ30574

MAG E:: 1474390 ly d protein LJ31033 1 O 3 3 MAG E:: 365177 S.38.0474 protein LJ32731

MAG : 788596 S.98.133 l protein LJ32915 2 9 5 MAG : 824913 S.998O7 hetical protein O629 MAG E:: 767172 S.8963 l y hetica protein L90754 O754 MAG : 489509 S.28264 pothetica protein L90798 L90798 MAG E: 2321104 S.58414 s amin C, gamma (actin LNC binding protein 280) MAG S64803 S.239 orkhead box M OXM1 MAG 815072 S.9081 phenylalanyl-tRNA RSB synthetase beta-subunit MAG 823659 FYVE and coiled-coil domain FYCO1 containing 1 MAG EE:: 81409 S.336429 GABA(A) receptor GABARAPL1 associated protein like 1 MAG E: 298.231 S.167017 gamma-aminobutyric acid GABBR1 (GABA) B receptor, 1 MAG 15821.49 S.294.088 GAJ protein GAJ O MAG 42558 S.75335 glycine amidinotransferase GATM (L-arginine:glycine amidinotransferase) MAG 6274O1 S.17839 TNF-induced protein GG2-1 O MAG 809588 S. 78619 gamma-glutamyl hydrolase GGEH (conjugase, folylpolygammaglutamyl hydrolase) MAG 196O12 S.2391.89 glutaminase GLS MAG 193883 S.234896 geminin, DNA replication GMNN inhibitor MAG 813586 S.234896 geminin, DNA replication GMNN inhibitor MAG E:: 1636447 S.83381 guanine nucleotide binding GNG11 protein (G protein), gamma 1 MAG : 1656488 glycosylphosphatidylinositol GPLD1 specific phospholipase D1 MAG EE : 486493 S.17270 G protein-coupled receptor GPR124 24 MAG E: 214990 gelsolin (amyloidosis, GSN Finnish type) MAG 2O19372 S.122552 G-2 and S-phase expressed 1 GTSE1 MAG 785897 S.122552 G-2 and S-phase expressed 1 GTSE1 MAG 256664 S.147097 H2A histone family, member X H2AFX MAG 2315147 S.1191.92 H2A histone family, member Z H2AFZ. MAG 249949 S.301OOS histone H2AFZ variant H2AV MAG 1679531 S.159226 hyaluronan synthase 2 HAS2 MAG 2116,188 S.9028 histone deacetylase 5 HDACS MAG S11388 S. 6679 headcase homolog HECA (Drosphila) MAG 789.091 s. 28777 histone 1, H2ac MAG 97.0591 S.427696 high-mobility group box 1 MAG 29O111 S. 77910 3-hydroxy-3-methylglutaryl Coenzyme A synthase 1 (soluble) US 2006/0183141 A1 Aug. 17, 2006 34


CSR Activated = 2, Redundant Quiescent = -2, Clone UGCluster Name Symbol (Y = 1, N = 0) cell cycle = 0 MAG E : 704519 S. 77910 3-hydroxy-3-methylglutaryl O -2 Coenzyme A synthase 1 (soluble) MAG : 73.252 S. 77910 3-hydroxy-3-methylglutaryl Coenzyme A synthase 1 (soluble) MAG : 18456.30 high-mobility group nucleosomal binding domain 2 MAG : 24.1826 high-mobility group nucleosomal binding domain 2 MAG : 128947 hyaluronan-mediated motility HMMR receptor (RHAMM) MAG : 471568 hematological and neurological expressed 1 MAG 7958O3 hematological and neurological expressed 1 MAG 489208 HN1 like MAG 855723 HN1 like MAG 327350 heterogeneous nuclear RPA2B1 ribonucleoprotein A2, B1 MAG 453790 S.15265 heterogeneous nuclear RPR ribonucleoprotein R MAG 260696 HIV-1 rev binding protein 2 B2 MAG 755581 heme-regulated initiation actor 2-alpha kinase MAG E: 825695 S.279918 hypothetical protein HSPC111 HSPC111 MAG : 796469 S.5199 HSPC150 protein similar to HSPC150 ubiquitin-conjugating enzyme MAG E:: 786690 s.150555 protein predicted by clone HSU79274 23.733 MAG : 221295 S.180919 inhibitor of DNA binding 2, dominant negative helix oop-helix protein MAG : 756405 S. 76884 inhibitor of DNA binding 3, dominant negative helix oop-helix protein MAG : 44975 S.76038 isopentenyl-diphosphate delta isomerase MAG : S88840 interferon-induced protein with tetratricopeptide repeats 1 MAG : 809946 S.315177 interferon-related developmental regulator 2 MAG E: 796996 S.3631 immunoglobulin (CD79A) binding protein 1 MAG 138265 S.82112 interleukin 1 receptor, type I MAG 146671 S.82112 interleukin 1 receptor, type I i MAG 2O18581 S.82O65 interleukin 6 signal transducer (gp130, oncostatin M receptor) MAG E: 753743 interleukin 6 signal transducer (gp130, oncostatin M receptor) MAG 840460 S.362807 interleukin 7 receptor MAG 242952 interleukin enhancer binding factor 2, 45 kDa MAG E: 814428 S.91579 U3 snoRNP protein 4 MP4 homolog MAG insulin induced gene 1 NSIG1 MAG 471,835 S. 
61790 importin 4 PO4 MAG 73784 S.227730 integrin, alpha 6 TGA6 MAG 859478 S.87149 integrin, beta 3 (platelet TGB3 glycoprotein IIIa, antigen CD61) MAG E: 276.091 S. 78877 inositol 14,5-trisphosphate TPKB 3-kinase B MAG 141815 S.91143 jagged 1 (Alagile syndrome) AG1 MAG 2O27560 S.3O1613 JTV1 gene TV1 US 2006/0183141 A1 Aug. 17, 2006 35


CSR Activated = 2, Redundant Quiescent = -2, Clone.ID UGCluster Name Symbol (Y = 1, N = 0) cell cycle = 0 IMAGE: 1474284 HS.323949 kangai 1 (Suppression of KAI1 O -2 umorigenicity 6, prostate; CD82 antigen (R2 leukocyte antigen, antigen detected by monoclonal and antibody A4)) MAG E: 298769 S.285818 similar to Caenorhabditis K O elegans protein C42C1.9 MAG 788721 54797 AAO090 protein AAOO90 MAG S1918 SS314 AAO095 gene product AAO095 MAG 342640 81892 AA0101 gene product AAO101 MAG 41S25 AA0323 protein AAO323 MAG 502067 6 95 O AA0342 gene product AAO342 MAG 813828 AAO367 protein AAO367 MAG 768940 AAO874 protein AAO874 MAG 487013 036 protein AA1036 MAG 305920 othetical protein AA1109 109 MAG SO2S86 228 protein AA1228 MAG 1581420 268 protein AA1268 MAG 754581 305 protein AA1305 MAG 1670954 363 protein AA1363 MAG 200741 363 protein AA1363 MAG 32887 363 protein AA1363 MAG 1916769 536 protein AA1536 MAG 462845 536 protein AA1536 MAG 50276 554 protein AA1554 MAG 877884 720 protein AA1720 MAG 18590SO 946 protein AA1946 MAG 769942 kinesin family member 22 F22 MAG 788256 kinesin family member 23 F23 MAG 292933 kinesin family member C1 MAG 26SO60 v-kit Hardy-Zuckerman 4 eline sarcoma viral oncogene homolog MAG 746080 S.272239 kelch-like 5 (Drosphila) KLHLS MAG 739230 S.26OO2 LIM domain binding 1 LDB1 g MAG 825295 S.213289 ow density lipoprotein LDLR receptor (familial hypercholesterolemia) MAG E:: 854701 S.85226 ipase A, lysosomal acid, LIPA cholesterol esterase (Wolman disease) MAG 1591599 S.89497 amin B1 MAG 81SSO1 S.76084 amin B2 g MAG 773.308 S.184164 hypothetical protein LOC115106 BCO14003 MAG E: 429811 S. 
60293 similar to hypothetical protein LOC115294 FLJ10883 MAG 827141 S.180591 mitotic phosphoprotein 44 LOC1294O1 MAG 280763 S.163725 adult retina protein LOC153222 MAG 757431 S.163725 adult retina protein LOC153222 MAG 815297 S.163725 adult retina protein LOC153222 MAG 1623.191 S.9948O hypothetical protein LOC157570 LOC157570 MAG 229560 S.9948O hypothetical protein LOC157570 LOC157570 MAG : 811069 S.9948O hypothetical protein LOC157570 LOC157570 MAG : 1502490 S.94.795 hypothetical protein LOC1696.11 LOC1696.11 MAG : 1966.34 S.5957 hypothetical protein LOC2O1562 LOC2O1562 MAG : 746245 S.5957 hypothetical protein LOC2O1562 LOC2O1562 MAG : 295650 S.3426SS hypothetical protein LOC2O1895 LOC2O1895 MAG : 1895O46 hypothetical protein LOC221810 LOC22.1810 US 2006/0183141 A1 Aug. 17, 2006 36


CSR Activated = 2, Redundant Quiescent = -2, Clone UGCluster Name Symbol (Y = 1, N = 0) cell cycle = 0 MAG E : 823815 S.432790 hypothetical protein LOC2S3263 O -2 LOC2S3263 MAG : 2645O2 S.20575 hypothetical protein LOC283431 O LOC28343 MAG : 130895 S.90790 hypothetical protein LOC284O18 LOC284.018 MAG : 283124 S.17567 hypothetical protein LOC284436 LOC284436 MAG : 665445 hypothetical protein LOC285362 LOC285362 MAG E: S.4094 hypothetical protein LOC339924

MAG 3O8466 S.279582 GTP-binding protein Sara LOCS1128 MAG 771142 S.985.71 complement C1r-like LOC51279 proteinase MAG 160O239 S.43318O HSPCO37 protein LOCS1659 MAG 772925 S.46967 HSPCO34 protein LOCS1668 MAG 274512 S.223SO hypothetical protein LOC56757 LOC56757 MAG 61626 S.193384 putatative 28 kDa protein LOCS6902 MAG 756554 S.24983 hypothetical protein from LOCS6926 EUROIMAGE 2021883 MAG E:: 418240 S.28893 hypothetical protein LOC901.10 LOC901.10 MAG 2316683 S.13413 hypothetical protein LOC93O81 BCO15148 MAG 882SO6 S.83354 ysyl oxidase-like 2 LOXL2 MAG 78.3698 S.81412 ipin LPIN1 g MAG 461144 S.24279 eucine-rich repeats and LRIG2 immunoglobulin-like domains 2 MAG 810SS1 S.446467 ow density lipoprotein LRP1 related protein 1 (alpha-2- macroglobulin receptor) MAG E:: 7961.76 S.111632 LSM3 homolog, U6 small LSM3 nuclear RNA associated (S. cerevisiae) MAG : 5O175 S. 76719 LSM4 homolog, U6 small LSM4 nuclear RNA associated (S. cerevisiae) MAG : 462806 S.93199 anosterol synthase (2,3- LSS oxidosqualene-lanosterol cyclase) MAG E:: 770355 S.93199 anosterol synthase (2,3- LSS oxidosqualene-lanosterol cyclase) MAG 471855 S.79914 unican LUM O MAG 366009 S.425427 hypothetical protein LYAR FLU20425 MAG 7671.63 S.425427 hypothetical protein LYAR FLU20425 MAG : 814701 S.79078 MAD2 mitotic arrest MAD2L1 deficient-like 1 (yeast) MAG : 277414 V-maf musculoaponeurotic MAF fibrosarcoma oncogene homolog (avian) MAG EE : 487793 V-maf musculoaponeurotic MAF fibrosarcoma oncogene homolog (avian) MAG : 823688 mannosidase, alpha, class MAN1A1 A member 1 MAG : 34O630 S.248 mitogen-activated protein MAP3K8 kinase kinase kinase 8 MAG : 590.774 S.178695 mitogen-activated protein MAPK13 kinase 13 MAG : 428223 S.234279 microtubule-associated MAPRE1 protein, RP/EB family, member 1 MAG 328889 S.69547 myelin basic protein MBP MAG 809557 S.17956S MCM3 minichromosome MCM3 maintenance deficient 3 (S. cerevisiae) US 2006/0183141 A1 Aug. 17, 2006 37


CSR Activated = 2, Redundant Quiescent = -2, Clone OGCluster Name Symbol (Y = 1, N = 0) cell cycle = 0 MAG E S.1544.43 MCM4 minichromosome MCM4 O O maintenance deficient 4 (S. cerevisiae) MAG S.77171 MCM5 minichromosome MCMS O O maintenance deficient 5, cell division cycle 46 (S. cerevisiae) MAG E:: 700721 S.77171 MCM5 minichromosome MCMS maintenance deficient 5, cell division cycle 46 (S. cerevisiae) MAG E:: 158.7847 S.1554-62 MCM6 minichromosome MCM6 maintenance deficient 6 (MIS5 homolog, (S. pombe) (S. cerevisiae) MAG : 2325.609 S.7715.2 MCM7 minichromosome MCM7 maintenance deficient 7 (S. cerevisiae) MAG : 796994 S.83532 membrane cofactor protein MCP (CD46, trophoblast lymphocyte cross-reactive antigen) MAG 142586 S. 102696 MCT-1 protein MCT1 O MAG 448232 S. 77955 MADS box transcription MEF2D enhancer factor 2, polypeptide D (myocyte enhancer factor 2D) MAG E: 1517595 S.184339 maternal embryonic leucine MELK Zipper kinase MAG 79655 S.11039 MEP50 protein MEP50 MAG 626841 S.316752 met proto-oncogene MET (hepatocyte growth factor receptor) MAG E:: 754509 S.316752 met proto-oncogene MET (hepatocyte growth factor receptor) MAG : 488O17 S.3745 milk fat globule-EGF factor 8 MFGE8 protein MAG E:: S64981 S.134726 hypothetical protein MGC MGC: 10200 MAG : 356835 S.271599 hypothetical protein MGC OSOO MGC1OSOO MAG : 743362 S.111099 hypothetical protein MGC O974 MGC10974 MAG E:: 1642496 S.293943 hypothetical protein MGC 1266 1266 MAG 758314 S.97031 hypothetical protein MGC MGC13047 MAG 769945 S.2563O1 MGC13170 gene MGC 3170 MAG 813675 S.37616 hypothetical protein MGC 448O MGC1448O MAG E:: 448344 S.79 hypothetical protein MGC S429 S429 MAG E: 296.155 O RIKEN cDNA MGC 6386 2610036L13 MAG 769796 S.26670 HGFL gene MGC 7330 MAG S1320 S.3O1394 hypothetical protein MGC3101 MGC31 O1 MAG E: SO2.096 S.21415 hypothetical protein MGC3982O MGC3982O MAG 271855 S. 
7041 MGC4170 protein MGC4170 MAG 754588 S.39504 hypothetical protein MGC4308 MGC4308 MAG E: 1858892 hypothetical protein MGC4825 MGC4825 MAG 742642 S.11169 Gene 33/Mig-6 MIG-6 MAG 140957 S.46743 McKusick-Kaufman MKKS syndrome MAG : 72.9957 S.46743 McKusick-Kaufman MKKS syndrome MAG : 461770 S.3491.96 myeloid/lymphoid or mixed MLLT6 lineage leukemia (trithorax US 2006/0183141 A1 Aug. 17, 2006 38


CSR Activated = 2, Redundant Quiescent = -2, Clone.ID UGCluster Name Symbol (Y = 1, N = 0) cell cycle = 0 homolog, Drosphia); translocated to, 6 MAG E: 810791 S.4334.10 menage a trois 1 (CAK MNAT1 assembly factor) MAG 292964 HS.240 M-phase phosphoprotein 1 MPHOSPH1 MAG 713236 HS.240 M-phase phosphoprotein 1 MPHOSPH1 MAG 595 637 S.12702 modulator recognition factor 2 MRF2 MAG 1636069 S.109059 mitochondrial ribosomal MRPL12 protein L12 MAG E: 843263 HS.4209 mitochondrial ribosomal MRPL37 O protein L37 MAG E : 755304 mitochondrial ribosomal MRPS16 protein S16 MAG E: 773483 S.SSO97 mitochondrial ribosomal MRPS28 protein S28 MAG 131362 S.170328 moesin MSN MAG 81332 S.170328 moesin MSN MAG 78.353 S.381097 metallothionein 1F (functiona ) MAG E: 2O19011 S. 731.33 metallothionein 3 (growth MT3 inhibitory factor (neurotrophic)) MAG E:: 203008 likely ortholog of mouse MTH2 MutT homolog 2 MAG E : 2028.294 S.17266S methylenetetrahydrofolate MTHFD1 dehydrogenase (NADP+ dependent), methenyltetrahydrofolate cyclohydrolase, formyltetrahydrofolate synthetase MAG E: 280934 HS.3828 mevalonate (diphospho) MVD decarboxylase MAG 168.0549 S.118630 MAX interacting protein 1 MXI1 MAG 2.71478 S.118630 MAX interacting protein 1 MXI1 MAG 277611 S.118630 MAX interacting protein 1 MXI1 MAG 489947 S.118630 MAX interacting protein 1 MXI1 MAG 609366 S.118630 MAX interacting protein 1 MXI1 MAG 1526789 S.3OOS92 v-myb myeloblastosis viral MYBL1 oncogene homolog (avian)- like 1 MAG E:: 815526 S.1797.18 v-myb myeloblastosis viral MYBL2 oncogene homolog (avian)- like 2 MAG S10794 S.78221 c-myc binding protein MYCBP O 2 MAG 842989 S. 77385 myosin, light polypeptide 6, MYL6 alkali, Smooth muscle and non-muscle MAG E: 1474424 S. 
69476 Similar to RIKEN cDNA l 11 OOO1AO7 MAG 14684.66 S.127797 Similar to PRO2550 l MAG 1881.517 S.2831.27 similar to Diap3 protein l MAG 25058 S.179397 hypothetical gene Supported l by AF131741 MAG 469898 S.40527 LOC345469 l MAG 1553567 S.26O395 similar to hypothetical protein l MAG 1758226 S.144814 similar to caspase 1 isoform l alpha precursor; interleukin -beta convertase; interleukin 1-B converting enzyme: IL1B-convertase IMAGE: 34.6860 HS.177781 hypothetical gene Supported l by AKO93984 IMAGE: 781.48 HS.177781 hypothetical gene Supported l by AKO93984 IMAGE: 788445 HS.237642 similar to family 4 l cytochrome P450; cytochrome P450, 4v3 IMAGE: 825356 HS.43275S similar to SNAG1 l US 2006/0183141 A1 Aug. 17, 2006 39


CSR Activated = 2, Redundant Quiescent = -2, Clone UGCluster Name Symbol (Y = 1, N = 0) cell cycle = 0

MAG E S.177781 hypothetical gene Supported l 1 -2 by AKO93984 MAG 246808 S. 6844 neuronal apoptosis inhibitor NALP2 O protein 2 MAG 502333 s. 225977 nuclear receptor coactivator 3 NCOA3 MAG 73531 S.99.08 nitrogen fixation cluster-like NIFU g MAG S.22151 neurolysin (metallopeptidase NLN M3 family) MAG E: 845363 S.118638 non-metastatic cells 1, NME1 protein (NM23A) expressed l MAG : 811097 S.23990 nucleolar protein family A, NOLA2 member 2 (H/ACA Small nucleolar RNPs) MAG : 7565O2 S.388 nudix (nucleoside NUDT1 diphosphate linked moiety X)-type motif 1 MAG 257955 S.236204 nuclear pore complex protein NUP107 MAG 827,159 S.236204 nuclear pore complex protein NUP107 MAG 413299 S.90421 nucleoporin like 1 NUPL1 MAG 1899230 S.151734 nuclear transport factor 2 NUTF2 MAG S12116 S.377830 O-acyltransferase OACT1 (membrane bound) domain containing MAG 282720 S.274.170 Opa-interacting protein 2 OIP2 O 2 MAG 191603 S.179661 beta 5-tubulin OKASW c.56 MAG 773479 S.179661 beta 5-tubulin OKASW c.56 MAG 268978 S.109694 oxysterol binding protein-like 8 OSBPL8 MAG 80484 S.424,279 p8 protein (candidate of P8 metastasis 1) MAG E: 813584 S.14125 p53 regulated PA26 nuclear PA26 protein MAG E:: 842973 S.34.3258 proliferation-associated 2G4, PA2G4 38 kDa MAG : 273S46 S.117950 phosphoribosylaminoimidazole PAICS carboxylase, phosphoribosylaminoimidazole Succinocarboxamide synthetase MAG E:: 366042 S.8068 pre-B-cell leukemia PBXIP1 ranscription factor interacting protein 1 MAG : 43229 S.78996 proliferating cell nuclear PCNA antigen MAG E: 789 182 S.78996 proliferating cell nuclear PCNA antigen MAG 2431SS S.184352 pericentrin 1 PCNT1 O MAG 81346O S.432969 proprotein convertase PCSK7 subtilisin?kexin type 7 MAG 824426 S.278426 PDGFA associated protein 1 PDAP1 O MAG 49860 S.92261 pyruvate dehydrogenase PDK2 kinase, isoenzyme 2 MAG 950682 S.9991O phosphofructokinase, platelet MAG 826 173 S.408943 profilin 1 MAG 796.263 S.197335 plasma glutamate carboxypeptidase MAG E: 1533669 1-aminocyclopropane-1- 
PHACS carboxylate synthase MAG E: 30114 S.128653 putative homeodomain PHTF2 ranscription factor 2 MAG 18393.67 S.24596 RAD51-interacting protein PIR51 MAG 364436 S.333,212 phosphatidylinositol transfer PITPNC1 protein, cytoplasmic 1 MAG E: 85.5557 protein kinase (cAMP PKIG dependent, catalytic) inhibitor gamma US 2006/0183141 A1 Aug. 17, 2006 40


Clone ID | UGCluster | Name | Symbol | Redundant (Y = 1, N = 0) | CSR (Activated = 2, Quiescent = -2, cell cycle = 0)
IMAGE:320355 | Hs.171945 | phospholipase A2 receptor 1, 180 kDa | PLA2R1 | 1 | -2
IMAGE:511303 | Hs.171945 | phospholipase A2 receptor 1, 180 kDa | PLA2R1 | 0 | -2
IMAGE:590154 | Hs.179657 | plasminogen activator, urokinase receptor | PLAUR | 0 | 2
IMAGE:810017 | Hs.179657 | plasminogen activator, urokinase receptor | PLAUR | 1 | 2
IMAGE:159455 | Hs.74573 | phospholipase D3 | PLD3 | 0 | -2
IMAGE:195040 | Hs.75576 | plasminogen | PLG | 0 | 2
IMAGE:744047 | Hs.77597 | polo-like kinase (Drosophila) | PLK | 0 | 0
IMAGE:263013 | Hs.41270 | procollagen-lysine, 2-oxoglutarate 5-dioxygenase (lysine hydroxylase) 2 | PLOD2 | 0 | 2
IMAGE:838829 | Hs.143323 | putative DNA/chromatin binding motif | PLU-1 | 1 | 0
IMAGE:2108411 | Hs.143323 | putative DNA/chromatin binding motif | PLU-1 | 0 | -2
IMAGE:755952 | Hs.278311 | plexin B1 | PLXNB1 | 0 | -2
IMAGE:341051 | Hs.44499 | pinin, desmosome associated protein | PNN | 0 | 2
IMAGE:786078 | Hs.99185 | polymerase (DNA directed), epsilon 2 (p59 subunit) | POLE2 | 0 | 2
IMAGE:511632 | Hs.110857 | polymerase (RNA) III (DNA directed) polypeptide K, 12.3 kDa | POLR3K | 0 | 2
IMAGE:82556 | Hs.167246 | P450 (cytochrome) oxidoreductase | POR | 0 | -2
IMAGE:767277 | Hs.9880 | peptidyl prolyl isomerase H (cyclophilin H) | PPIH | 0 | 2
IMAGE:365641 | Hs.82741 | primase, polypeptide 1, 49 kDa | PRIM1 | 0 | 0
IMAGE:42325 | Hs.74519 | primase, polypeptide 2A, 58 kDa | PRIM2A | 0 | 0
IMAGE:770880 | Hs.74519 | primase, polypeptide 2A, 58 kDa | PRIM2A | 1 | 0
IMAGE:204483 | Hs.222088 | PRO2000 protein | PRO2000 | 0 | 0
IMAGE:280375 | Hs.222088 | PRO2000 protein | PRO2000 | 1 | 0
IMAGE:857002 | Hs.75969 | proline rich 2 | PROL2 | 0 | -2
IMAGE:2054635 | Hs.233952 | proteasome (prosome, macropain) subunit, alpha type, 7 | PSMA7 | 0 | 2
IMAGE:1602493 | Hs.250758 | proteasome (prosome, macropain) 26S subunit, ATPase, 3 | PSMC3 | 1 | 2
IMAGE:712916 | Hs.250758 | proteasome (prosome, macropain) 26S subunit, ATPase, 3 | PSMC3 | 0 | 2
IMAGE:823598 | Hs.4295 | proteasome (prosome, macropain) 26S subunit, non-ATPase, 12 | PSMD12 | 0 | 2
IMAGE:285686 | Hs.178761 | proteasome (prosome, macropain) 26S subunit, non-ATPase, 14 | PSMD14 | 0 | 2
IMAGE:809992 | Hs.74619 | proteasome (prosome, macropain) 26S subunit, non-ATPase, 2 | PSMD2 | 0 | 2
IMAGE:744800 | Hs.19718 | protein tyrosine phosphatase, receptor type, U | PTPRU | 0 | -2
IMAGE:1160558 | Hs.415877 | 6-pyruvoyltetrahydropterin synthase | PTS | 0 | 2
IMAGE:2018976 | Hs.252587 | pituitary tumor-transforming 1 | PTTG1 | 0 | 0
IMAGE:781089 | Hs.252587 | pituitary tumor-transforming 1 | PTTG1 | 1 | 0
IMAGE:843069 | Hs.172589 | nuclear phosphoprotein similar to S. cerevisiae PWP1 | PWP1 | 0 | 0
IMAGE:40120 | Hs.173656 | KIAA0941 protein | Rab11-FIP2 | 0 | -2
IMAGE:1619759 | Hs.183800 | Ran GTPase activating protein 1 | RANGAP1 | 0 | 0


CSR Activated = 2, Redundant Quiescent = -2, Clone UGCluster Name Symbol (Y = 1, N = 0) cell cycle = 0 MAG E : 324225 S.17466 retinoic acid receptor RARRES3 O -2 responder (tazaroteine induced) 3 MAG 731136 S.11170 RNA binding motif protein 14 RBM14 MAG 611028 RNA binding motif protein, X RBMX chromosome MAG 951 080 S.31442 RecQ protein-like 4 RECQL4 MAG 1574649 S.115521 REV3-like, catalytic subunit REV3L of DNA polymerase Zeta (yeast) MAG E:: 86OOOO S. 139226 replication factor C (activator RFC2 1) 2, 40 kDa MAG : 277112 S.115474 replication factor C (activator 1) 3, 38 kDa MAG : 309.288 S.35120 replication factor C (activator RFC4 1) 4, 37 kDa MAG S12410 S.25292 ribonuclease H2, large RNASEEH2A Subunit MAG 855243 S.115823 ribonuclease P1 RNASEP1 MAG 786625 S.180403 ring finger protein 138 RNF138 MAG 19001:49 S.153639 ring finger protein 41 RNF41 MAG SO2690 ribophorin I RPN1 MAG 85.6489 S.2934 ribonucleotide reductase M1 RRM1 polypeptide MAG E: 624627 ribonucleotide reductase M2 RRM2 O O polypeptide MAG E: 7684.66 S.94262 ribonucleotide reductase M2 B (TP53 inducible) MAG 827O11 S.272822 RuvB-like 1 (E. coli) RUVBL1 O MAG 364510 S. 
74592 special AT-rich sequence SATB1 binding protein 1 (binds to nuclear matrix scaffold associating DNAs) MAG 2OOO12 S.110783 HBV pre-s2 binding protein 1 SBP1 MAG 590759 S.239926 sterol-C4-methyl oxidase-like SC4MOL g MAG 123474 S.119597 Stearoyl-CoA desaturase SCD (delta-9-desaturase) MAG E: 1616241 S.119597 Stearoyl-CoA desaturase SCD (delta-9-desaturase) MAG E: 8.10711 S.119597 Stearoyl-CoA desaturase SCD (delta-9-desaturase) MAG 1635.538 S.82109 Syndecan 1 S DC1 MAG 525926 S.82109 Syndecan 1 DC1 MAG 586731 S.389371 stromal cell derived factor FR1 o receptor 1 MAG 167205 S.334841 Selenium binding protein 1 S LENBP1 O MAG 754550 S.17763S likely ortholog of mouse MACAP3 semaF cytoplasmic domain associated protein 3 MAG : 381066 S.151242 serine (or cysteine) S E RPING1 proteinase inhibitor, clade G (C1 inhibitor), member 1, (angioedema, hereditary) MAG 788232 sestrin 2 S S 2 O MAG 47681 splicing factor, arginine serine-rich 10 (transformer 2 homolog, Drosphila) MAG : 809535 S. 73.96S splicing factor, SFRS2 arginine serine-rich 2 MAG : 1584SS1 S.7630S Surfactant, pulmonary SFTPB associated protein B MAG : 48.6175 S.75231 solute carrier family 16 SLC16A1 (monocarboxylic acid transporters), member 1 MAG : 772304 S.79172 solute carrier family 25 SLC25AS (mitochondrial carrier; adenine nucleotide translocator), member 5 US 2006/0183141 A1 Aug. 17, 2006 42


CSR Activated = 2, Redundant Quiescent = -2, Clone D OGCluster Name Symbol (Y = 1, N = 0) cell cycle = 0 MAG E : 461 O98 S.214646 solute carrier family 35, O -2 member E2 MAG : 71863 S.5944 solute carrier family 40 (iron SLC40A1 O -2 regulated transporter), member 1 MAG E:: 839882 S.324787 solute carrier family 5 SLC5A3 (inositol transporters), member 3 MAG E:: 378.813 secretory leukocyte protease SLPI inhibitor (antileukoproteinase) MAG : 682846 S.119023 SMC2 structural maintenance of 2-like 1 (yeast) MAG S.SOfS8 SMC4 structural maintenance of chromosomes 4-like 1 (yeast) MAG S.897 18 spermine synthase SMS MAG S. 194477 E3 ubiquitin ligase SMURF2 SMURF2 O MAG S.174051 S8II CIE8 SNRP70 ribonucleoprotein 70 kDa polypeptide (RNP antigen) MAG : 2322223 S.1732SS 8II CIE8 SNRPA ribonucleoprotein ide A MAG : 490772 S.80506 S8II CIE8 SNRPA1 ribonucleoprotein ide A MAG E:: 95.0482 S8II CIE8 SNRPB ribonucleoprotein ides B and B1 MAG E:: 724387 S. 1063 S8II CIE8 SNRPC ribonucleoprotein ide C MAG : 47542 S.86948 CI(8. SNRPD1 ribonucleoprotein D1 ide 16 kDa MAG : 4318O3 S.334612 CI(8. SNRPE ribonucleoprotein polypeptide E MAG 2307015 S.16244 sperm associated antigen 5 SPAGS MAG 124781 S.71465 squalene epoxidase SQL E MAG 322643 S.8.185 Sulfide quinone reductase SQRDL ike (yeast) MAG E: 85060 S.8.185 Sulfide quinone reductase SQRDL ike (yeast) MAG 856,796 S.76244 spermidine synthase SRM MAG S.28707 signal sequence receptor, SSR3 gamma (translocon associated protein gamma) MAG : 7672O6 S.28707 signal sequence receptor, SSR3 gamma (translocon associated protein gamma) MAG E:: 813499 S.25723 Sjogren's SSSCA1 syndrome scleroderma autoantigen 1 MAG E: 1499.34 S.9075 serine/threonine kinase 17a STK17A (apoptosis-inducing) MAG 2106955 S.172052 serine/threonine kinase 18 STK18 MAG 129865 S.25O822 serine/threonine kinase 6 STK6 MAG 754.018 S.154567 supervillin SVIL MAG 705064 S. 
104019 transforming, acidic coiled TACC3 coil containing protein 3 MAG 359457 S.433399 transgelin TAGLN MAG 33122 S.443.668 likely ortholog of mouse TBRG1 g transforming growth factor beta regulated gene 1 US 2006/0183141 A1 Aug. 17, 2006 43


CSR Activated = 2, Redundant Quiescent = -2, Clone UGCluster Name Symbol (Y = 1, N = 0) cell cycle = 0 MAG E : 34.7373 S.1846.93 ranscription elongation TCEB1 O 2 actor B (SIII), polypeptide 1 (15 kDa, elongin C) MAG : 1631194 S.266940 -complex-associated-testis TCTEL1 expressed 1-like 1 MAG : 266696 S.266940 -complex-associated-testis TCTEL1 expressed 1-like 1 MAG E: 726086 S.378774 issue factor pathway TFPI2 inhibitor 2 MAG S355819 homolog of yeast Tim50 TIMSOL O 2 MAG HS.S831 issue inhibitor of TIMP1 oproteinase 1 (erythroid potentiating activity, collagenase inhibitor) MAG : 1534435 Hs. 6441 issue inhibitor of TIMP2 oproteinase 2 MAG E:: 810444 S.101.382 umor necrosis factor, alpha TNFAIP2 induced protein 2 MAG : 135791 umor necrosis factor TNFRSF12A receptor Superfamily, member 12A MAG : 1759582 umor necrosis factor TNFRSF12A receptor Superfamily, member 12A MAG E: 271670 umor necrosis factor (ligand) TNFSF12 Superfamily, member 12 MAG 75644 S.169886 enascin XB TNXB MAG 809466 S.30928 ranslocase of outer TOMM40 mitochondrial membrane 40 homolog (yeast) MAG : 82S470 S.156346 opoisomerase (DNA) II TOP2A alpha 170 kDa MAG : 16291.13 S. 104741 T-LAK cell-originated protein TOPK kinase MAG EE:E:: 785368 S. 104741 T-LAK cell-originated protein TOPK kinase MAG 814528 S.75497 umor protein p53 inducible nuclear protein 1 MAG 855.749 S.83848 riosephosphate isomerase 1 TPI1 MAG 488479 S. 
77899 ropomyosin 1 (alpha) TPM1 MAG 74O62O S.3OO772 ropomyosin 2 (beta) TPM2 MAG 549146 S.3185O1 ripartite motif-containing 22 TRIM22 MAG 85.6427 HS.6566 hyroid hormone receptor TRIP13 interactor 13 MAG E: 1897944 S.114360 ransforming growth factor TSC22 beta-stimulated protein TSC OO 22 MAG 795.936 S.7SO66 ranslin TSN O MAG 612274 S.75318 ubulin, alpha 1 (testis TUBA1 specific) MAG 38816 HS.75318 ubulin, alpha 1 (testis TUBA1 specific) MAG 230742O S.458114 ubulin, beta polypeptide TUBB MAG 1636876 S.2S1653 ubulin, beta, 2 TUBB2 MAG 108377 S.21 635 ubulin, gamma 1 TUBG1 MAG 50743 S.42644 hioredoxin-like 2 TXNL2 MAG 853.368 S.294.75 hymidylate synthetase TYMS MAG 292515 S.21293 UDP-N-acteylglucosamine UAP1 pyrophosphorylase 1 MAG S.21293 UDP-N-acteylglucosamine UAP1 2 pyrophosphorylase 1 MAG EE:: 146882 ubiquitin-conjugating enzyme UBE2C E2C MAG : 769921 ubiquitin-conjugating enzyme UBE2C E2C MAG E:: 279.972 S.184325 ubiquitin-conjugating enzyme UBE21 E2, J1 (UBC6 homolog, yeast) US 2006/0183141 A1 Aug. 17, 2006 44


CSR Activated = 2, Redundant Quiescent = -2, Clone.ID UGCluster Name Symbol (Y = 1, N = 0) cell cycle = 0 1292535 HS288:549 ubiquitin UBF-fl UBF-f O O IMAGE: 1550739 HS.108106 ubiquitin-like, containing UHRF1 O O PHD and RING finger domains, 1 IMAGE: 3442.43 HS.454562 uridine monophosphate UMPK kinase IMAGE: 760344 uridine monophosphate UMPS synthetase (orotate phosphoribosyl transferase and orotidine-5'- decarboxylase) MAG 489595 S.35086 ubiquitin specific protease 1 USP1 MAG 73596 S.35086 ubiquitin specific protease 1 USP1 MAG 813261 HS. 6651 vesicle-associated WAMP4 membrane protein 4 MAG 486221 S.14915S voltage-dependent anion WDAC1 channel 1 MAG 755145 S.155191 villin 2 (eZrin) VIL2 MAG 85403 S.23.1840 WW domain binding protein 2 WBP2 g MAG 234004 S.187991 SOCS box-containing WD WSB1 protein SWiP-1 MAG 271699 HS.187991 SOCS box-containing WD WSB1 protein SWiP-1 MAG : 16054O7 HS.136644 WD repeat and SOCS box WSB2 containing protein 2 MAG E: 898.095 HS.119 Wilms tumor 1 associated WTAP protein MAG 258761 HS.23495 HBXAg transactivated protein 1 XTP1 MAG 292996 HS.349S30 tyrosine 3 YWHAH monooxygenase? tryptophan 5-monooxygenase activation protein, eta polypeptide MAG 1933,716 HS.15220 Zinc finger protein 106 ZFP106 MAG 824875 S.1522O Zinc finger protein 106 ZFP106 MAG 84S419 S.3S1605 Zinc finger protein 276 ZFP276 MAG 755373 S.33532 Zinc finger protein 151 (pHZ ZNF151 67) MAG 461613 S.250493 Zinc finger protein 219 ZNF219 MAG S62115 S.356344 Zinc finger protein 36 (KOX ZNF36 g 18) MAG 486356 S.305953 Zinc finger protein 83 (HPF1) ZNF83 MAG 1034491 Data not found MAG 1664309 Data not found MAG 1680098 Homo sapiens transcribed sequences MAG 1881224 Homo sapiens transcribed sequences MAG 1926715 Data not found MAG 195419 In multiple clusters MAG 2012S23 S.458417 Homo sapiens transcribed sequence with strong similarity to protein pir: I56326 (H. 
sapiens) 156326 fatty acid binding protein homolog - human MAG 2O7029 Data not found MAG 232586 HS.102219 Homo sapiens transcribed sequences MAG 246684 HS.48058 Homo sapiens transcribed sequences MAG 26O187 HS.443O7 Homo sapiens transcribed sequence with weak similarity to protein ref: NP 060265.1 (H. sapiens) hypothetical protein FLJ20378 Homo sapiens MAG 2.78687 In multiple clusters MAG 279616 HS.46852 Homo sapiens transcribed sequences MAG 281 039 In multiple clusters MAG 283.751 HS.44205 Sapiens, clone MGC: 32686 IMAGE: 4051739, mRNA, complete cols MAG 290.162 In multiple clusters MAG 295473 In multiple clusters MAG 30093 Data not found MAG 3O2933 In multiple clusters MAG 32134 In multiple clusters MAG 321354 In multiple clusters US 2006/0183141 A1 Aug. 17, 2006 45


CSR Activated = 2, Redundant Quiescent = -2, Clone UGCluster Name Symbol (Y = 1, N = 0) cell cycle = 0 MAG 321.905 HSSS08O Homo sapiens transcribed sequences MAG 32641 In multiple c lusters MAG 345833 In multiple c lusters MAG 4.0017 In multiple c lusters MAG 4.18279 Sapiens, clone IMAGE: 4448513, mRNA MAG 454219 Homo sapiens transcribed sequences MAG 470930 In multiple c lusters MAG 645702 HS.169514 Homo sapiens transcribed sequence with weak similarity to protein ref: NP 060265.1 (H. sapiens) hypothetical protein FLJ20378 Homo sapiens MAG 66SS08 in multiple c SCS MAG 66852 Data not found O 2 MAG 687297 Sapiens cDNA FLJ11245 fis, clone PLACE10O8629. MAG 713031 in multiple c SCS MAG 731290 HS.456464 Homo sapiens transcribed sequences MAG 74S476 HS.2O8414 Sapiens mRNA, cDNA DKFZp564D0472 (from clone DKFZp564D0472) MAG 757144 in multiple c SCS MAG 81.0156 in multiple c SCS MAG 811999 in multiple c SCS MAG 813636 S.452,394 Sapiens HSPC151 mRNA, complete cols MAG 824132 in multiple c SCS MAG 824756 Data not found MAG 824917 ille C SCS MAG 825659 in multiple c SCS MAG 841.238 S.237868 Sapiens esophageal carc inoma-related mRNA, complete sequence MAG 853968 S.116680 Homo sapiens transcribed sequences MAG 858.375 S.116808 Sapiens mRNA, cDNA g : clone DKFZp566J1846) MAG 897.68O Data not found O MAG O35796 S.33966S Sapiens, Similar to RIK EN cDNA 2700049P18 gene, clone MGC: 57827 IMAGE: 6064384, mRNA, complete cols MAG 30204 In multiple c usters O2 MAG 3.1316 S.33966S Sapiens, Similar to RIK EN cDNA 2700049P18 gene, clone MGC: 57827 IMAGE: 6064384, mRNA, complete cols MAG 39.705 S.28465 Sapiens, clone IMAGE: 5263527, mRNA O O MAG S364S1 S.126,714 Homo sapiens transcribed sequence with weak similarity to protein ref: NP 062553.1 (H. sapiens) hypothetical protein FLJ11267 Homo sapiens MAG S646O1 S.186579 Sapiens, clone IMAGE: 4081483, mRNA MAG 677546 S. 
135448 Homo sapiens transcribed sequence MAG 837.950 S.12O605 Homo sapiens transcribed sequences MAG 91.1913 HS.370736 Homo sapiens transcribed sequences MAG 96475 HS.41853S Homo sapiens transcribed sequences MAG 2O2704 HS.268919 Sapiens cDNA FLJ37623 fis, clone BRCOC2O14013. MAG 203275 In multiple c usters MAG 22O376 HS.432827 Homo sapiens transcribed sequence with g g weak similarity to protein pir: 521348 (R. norvegicus) S21348 probable pol polyprotein-related protein 4 - rat MAG 236142 Data not found MAG 241282 Hs.299797 Sapiens cDNA FLJ34225 fis, clone FCBBF302.3372. MAG 3O8633 In multiple c usters MAG 34.6257 HS.31921S Sapiens, clone IMAGE: 5270727, mRNA MAG 3.58052 Hs.348874 Sapiens full length insert cDNA clone o o ZEO4G11 MAG 366414 In multiple c usters MAG 366SS8 In multiple c usters MAG 510273 In multiple c usters MAG 610362 In multiple c usters MAG 62S616 In multiple c usters US 2006/0183141 A1 Aug. 17, 2006 46


CSR Activated = 2, Redundant Quiescent = -2, Clone D UGCluster Name Symbol (Y = 1, N = 0) cell cycle = 0 MAG 627688 Hs. 104123 Homo sapiens transcribed sequence O O MAG 7395.11 In multiple clusters O O MAG 745138 Hs.457442 Sapiens cDNA FLJ35797 fis, clone O O TESTI2005892, highly similar to TUBULIN ALPHA-3 ALPHA-7 CHAIN. MAG 77OO66 In multiple clusters MAG 809530 Data not found g g MAG 809731 Hs.375205 Sapiens, clone IMAGE: 45893.00, mRNA, partial cols MAG E: 8106OO Hs.430976 Homo sapiens transcribed sequence with strong similarity to protein pir: B42856 (H. sapiens) B42856 ubiquitin carrier protein E2 - human MAG 810899 in multiple clusters O O MAG 853066 Hs.446510 Homo sapiens transcribed sequence with weak similarity to protein ref: NP 060265.1 (H. sapiens) hypothetical protein FLJ20378 Homo sapiens MAG O8422 in multiple clusters MAG 21512 in multiple clusters g MAG 280S4 Hs.356538 Homo sapiens transcribed sequence with moderate similarity to protein pdb: 1 BGM (E. coli) O Chain O, Beta-Galactosidase (Chains I P) MAG 29883 in multiple clusters MAG 376O2 Hs. 106148 Sapiens mRNA, cDNA DKFZp434G0972 (from clone DKFZp434G0972) MAG 41854 in multiple clusters MAG 558625 Hs.25144 Sapiens cDNA FLJ31683 fis, clone NT2RI2OOS353. MAG E: 564426 Hs.446437 Homo sapiens transcribed sequence with weak similarity to protein ref: NP 060312.1 (H. sapiens) hypothetical protein FLJ20489 Homo sapiens MAG 569077 Data not found MAG 6O1926 Hs.457626 Homo sapiens transcribed sequences MAG 649341 Data not found MAG 6866OO Hs.170261 Sapiens cDNA FLJ38461 fis, clone FEBRA2O2O977. MAG 892.599 Hs.409561 Homo sapiens transcribed sequence MAG 898442 Hs.34068 Sapiens, clone IMAGE: 5296353, mRNA MAG 898826 Data not found MAG 2013496 Hs.268016 Sapiens cDNA: FLJ21243 fis, clone COLO1164. MAG 210486 In multiple clusters MAG 24O480 In multiple clusters MAG 2494.86 Data not found MAG 2S6619 In multiple clusters MAG 262313 Hs. 
108873 Homo sapiens transcribed sequences MAG 266263 Hs.26418 Sapiens, clone IMAGE: 5261213, mRNA MAG 2.78729 Hs.29088 Homo sapiens transcribed sequence with weak similarity to protein sp: P11369 (M. musculus) POL2 MOUSE Retrovirus related POL polyprotein Contains: Reverse transcriptase: Endonuclease IMAGE: 28927 Hs.388212 Homo sapiens transcribed sequence IMAGE: 2.89505 Hs.44829 Homo sapiens transcribed sequence with moderate similarity to protein ref: NP 060265.1 (H. sapiens) hypothetical protein FLJ20378 Homo sapiens IMAGE: 291.394 Hs. 108873 Homo sapiens transcribed sequences IMAGE: 34O745 Hs.25144 Sapiens cDNA FLJ31683 fis, clone NT2RI2OOS353. IMAGE: 346643 Hs.23575 Homo sapiens transcribed sequences IMAGE: 358647 Hs.26418 Sapiens, clone IMAGE: 5261213, mRNA IMAGE: 361.456 In multiple clusters IMAGE: 38009 Hs. 170056 Sapiens mRNA, cDNA DKFZp586BO220 (from clone DKFZp586BO220) US 2006/0183141 A1 Aug. 17, 2006 47


CSR Activated = 2, Redundant Quiescent = -2, Clone.ID UGCluster Name Symbol (Y = 1, N = 0) cell cycle = 0 IMAG E : 38072 HS.293782 Sapiens, clone MGC: 27375 IMAGE: 4688423, 1 -2 mRNA, complete cols IMAG E : 42935 HS.445537 Homo sapiens transcribed sequence with -2 weak similarity to protein pir: T12486 (H. sapiens) T12486 hypothetical protein DKFZp566HO33.1 - human IMAG 4318OS Data not found IMAG 487499 HS.24758 Sapiens cDNA FLJ32068 fis, clone g OCBBF1OOO114. IMAG 491415 In multiple clusters IMAG SO3839 In multiple clusters IMAG 664233 In multiple clusters IMAG 69309 HS.452719 Homo sapiens transcribed sequence with weak similarity to protein sp: P29974 (M. musculus) CNG1 MOUSE coMP-gated cation channel alpha 1 (CNG channel alpha 1) (CNG-1) (CNG1) (Cyclic nucleotide gated channel alpha 1) (Cyclic nucleotide gated channel, photoreceptor) (Cyclic-nucleotide gated cation channel 1) (Rod photoreceptor cGMP-gated channel alpha subunit) MAG E: 69378 S.279898 Sapiens cDNA: FLJ23165 fis, clone LNGO9846. MAG 74-1954 In multiple clusters MAG 742685 S.291804 Sapiens cDNA FLJ35517 fis, clone SPLEN2OOO698. MAG E: 742806 S.398090 Sapiens cDNA FLJ39131 fis, clone NTONG 2008143. MAG 767289 in multiple clusters MAG 782737 in multiple clusters MAG 785819 Sapiens cDNA: FLJ21243 fis, clone COLO1164. MAG 786573 in multiple clusters MAG 788217 S.34359 Homo sapiens transcribed sequences MAG 795427 S.356688 Sapiens cDNA FLJ37527 fis, clone BRCAN2O11946. MAG E: 8101.33 S. 10362 Sapiens cDNA: FLJ20944 fis, clone ADSEO1780. MAG 810326 in multiple clusters MAG 8104.86 S.356618 Sapiens cDNA clone IMAGE: 4822701, partial cols MAG 810859 in multiple clusters MAG 811751 S.293782 Sapiens, clone MGC: 27375 IMAGE: 4688423, mRNA, complete cols MAG 811837 in multiple clusters MAG 81417 in multiple clusters MAG 824111 Homo sapiens transcribed sequence with moderate similarity to protein sp: Q99576 (H. 
sapiens) GILZ HUMAN Glucocorticoid induced leucine Zipper protein (Delta sleep inducing peptide immunoreactor) (DSIP immunoreactive peptide) (DIP protein) (hDIP) (TSC-22-like protein) (TSC-22R) IMAGE: 8241SO HS.439.107 Sapiens, clone IMAGE: 5288451, mRNA IMAGE: 82434 In multiple clusters IMAGE: 854.122 HS.349326 Sapiens cDNA FLJ30677 fis, clone FCBBF2OOOO87. IMAGE: 8558O8 HS.443798 Homo sapiens transcribed sequences IMAGE: 866276 HS.442762 Homo sapiens transcribed sequences IMAGE: 898.133 HS.3S1108 Homo sapiens transcribed sequences IMAGE: 95.1007 HS.112862 Homo sapiens transcribed sequences US 2006/0183141 A1 Aug. 17, 2006 48

What is claimed is:

1. A method of classifying a cancer, said method comprising:
(a) obtaining an CSR expression profile from a sample from said subject; and
(b) comparing said obtained expression profile to a reference CSR expression profile to classify said cancer as activated or quiescent.

2. The method according to claim 1, wherein said CSR expression profile comprises a dataset obtained from at least about 25 CSR genes.

3. The method according to claim 2, wherein said cancer is a carcinoma.

4. The method according to claim 3, wherein said cancer is a breast carcinoma, lung adenocarcinoma or gastric carcinoma.

5. The method according to claim 1, wherein expression profile is a transcriptional profile.

6. The method according to claim 5, the method comprising:
extracting mRNA from said cancer cell;
quantitating the level of mRNA corresponding to CSR sequences;
comparing said level of mRNA to the level of said mRNA in a reference sample.

7. The method according to claim 6, wherein said comparing step comprises determination of statistical correlation.

8. The method according to claim 6, wherein said comparing step comprises a nearest shrunken centroid analysis step.

9. A kit for cancer classification, the kit comprising:
a set of primers specific for at least 25 CSR genes; and
instructions for use.

10. The kit according to claim 9, further comprising a software package for statistical analysis of expression profiles, and a reference dataset for a CSR signature.

11. A kit for determining susceptibility to undesirable toxicity, the kit comprising:
a microarray comprising probes specific for at least 25 CSR genes; and
instructions for use.

12. The kit according to claim 11, further comprising a software package for statistical analysis of expression profiles.
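By way of illustration only, the comparing step of claims 1 and 7 (determination of statistical correlation against a reference CSR profile) might be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not the applicants' implementation. The function names `pearson` and `classify_csr`, the zero correlation threshold, and the sample values are hypothetical. The reference values reuse the table's CSR coding (Activated = 2, Quiescent = -2) for six genes listed there. The nearest shrunken centroid analysis of claim 8 is a different procedure and is not implemented here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def classify_csr(sample, reference, threshold=0.0):
    """Label a sample 'activated' or 'quiescent' by correlation with a reference.

    sample and reference map gene symbol -> expression value. Only genes
    present in both are compared. The threshold of 0.0 is an assumption
    made for this sketch, not a value taken from the claims.
    """
    genes = sorted(set(sample) & set(reference))
    r = pearson([sample[g] for g in genes], [reference[g] for g in genes])
    label = "activated" if r > threshold else "quiescent"
    return label, r


# Reference profile built from the table's CSR column:
# 2 = induced in the activated state, -2 = induced in the quiescent state.
REFERENCE = {
    "PLAUR": 2, "PLOD2": 2, "PSMC3": 2,
    "PLA2R1": -2, "PLD3": -2, "POR": -2,
}

# Hypothetical log-ratio measurements for one tumor sample (illustrative only).
SAMPLE = {
    "PLAUR": 1.2, "PLOD2": 0.8, "PSMC3": 0.5,
    "PLA2R1": -0.9, "PLD3": -1.1, "POR": -0.4,
}

if __name__ == "__main__":
    label, r = classify_csr(SAMPLE, REFERENCE)
    print(label, round(r, 3))
```

In practice a profile of at least about 25 CSR genes, as recited in claim 2, would be used in place of the six genes shown, and redundant clones for the same gene would be collapsed before the comparison.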