US 20160068916A1
(19) United States
(12) Patent Application Publication: Nekarda et al.
(10) Pub. No.: US 2016/0068916 A1
(43) Pub. Date: Mar. 10, 2016

(54) TEST KITS

(71) Applicant: Pacific Edge Limited, Dunedin (NZ)

(72) Inventors: Hjalmar Nekarda, Taufkirchen (DE); Jan Friederichs, Munich (DE); Bernhard Holzmann, Munich (DE); Robert Rosenberg, Munich (DE); Anthony Edmund Reeve, Dunedin (NZ); Michael Alan Black, Dunedin (NZ); John Lindsay McCall, Auckland (NZ); Yu-Hsin Lin, Dunedin (NZ); Robert Craig Pollock, Dunedin (NZ)

(73) Assignee: Pacific Edge Limited, Dunedin (NZ)

(21) Appl. No.: 14/860,024

(22) Filed: Sep. 21, 2015

Related U.S. Application Data

(60) Division of application No. 12/214,782, filed on Jun. 20, 2008, which is a continuation of application No. PCT/NZ2006/000343, filed on Dec. 22, 2006.

(30) Foreign Application Priority Data

Dec. 23, 2005 (NZ) 544432

Publication Classification

(51) Int. Cl. C12Q 1/68 (2006.01)

(52) U.S. Cl. CPC C12Q 1/6886 (2013.01); C12Q 2600/118 (2013.01); C12Q 2600/158 (2013.01); C12Q 2600/16 (2013.01)

(57) ABSTRACT

This invention relates to prognostic signatures, and compositions and methods for determining the prognosis of cancer in a patient, particularly for colorectal cancer. Specifically, this invention relates to the use of genetic markers for the prediction of the prognosis of cancer, such as colorectal cancer, based on signatures of genetic markers. In various aspects, the invention relates to a method of predicting the likelihood of long-term survival of a cancer patient, a method of determining a treatment regime for a cancer patient, a method of preparing a treatment modality for a cancer patient, among other methods as well as kits and devices for carrying out these methods.

Patent Application Publication Mar. 10, 2016 Sheet 1 of 9 US 2016/0068916 A1

FIG. 1 (Sheet 1 of 9): Flow chart. New Zealand CRC data set: oligo-spotted arrays (samples n=149; 47 relapsed, 102 non-relapsed; gene count not legible). German CRC data set: Affymetrix arrays (samples n=55; 26 relapsed, 29 non-relapsed; gene count not legible). Class prediction is run on each data set; the German branch identifies a 19-gene signature, called the German signature. Each signature is then applied to the other data set: German CRC data set with NZ signature genes (samples n=55) and NZ CRC data set with German signature genes (samples n=149).

FIG. 2 (Sheet 2 of 9): Kaplan-Meier plots, four panels. X-axis: disease-free survival (months), 0 to 60. Curves: predicted non-recurrent versus predicted recurrent. The panel P-values are not legible in this copy.

FIG. 3 (Sheet 3 of 9): Kaplan-Meier plots, two panels. X-axis: disease-free survival (months), 0 to 60. Curves: Stage II (predicted non-recurrent), Stage II (predicted recurrent), Stage III (predicted non-recurrent), Stage III (predicted recurrent).

FIGS. 4 and 5 (Sheets 4 to 7 of 9): Rotated plot pages, one of them marked as a continuation; no axis or legend text is legible in this copy.

FIG. 6 (Sheet 8 of 9): Pairs chart titled Number of Appearances in “Top 100” List. Panel labels are not legible in this copy.

FIG. 7 (Sheet 9 of 9): Pairs chart titled Number of Appearances in “Top 100” List. Legible panel labels: Wilcoxon and a t-test label; the remaining label is not legible in this copy.

TEST KITS

RELATED APPLICATIONS

[0001] This application is a Divisional application filed under 35 U.S.C. § 120 and 37 C.F.R. 1.53(b), which claims priority to U.S. patent application Ser. No. 13/214,782, which is a Continuation under 35 U.S.C. § 111(a) of PCT/NZ2006/000343, International Filing Date 22 Dec. 2006, which claims the benefit of New Zealand Provisional Patent Application No. 544432 filed Dec. 23, 2005, each of which is incorporated by reference herein in its entirety.

FIELD OF THE INVENTION

[0002] This invention relates to test kits, methods and compositions for determining the prognosis of cancer, particularly colorectal cancer, in a patient. Specifically, this invention relates to the use of genetic markers for determining the prognosis of cancer, such as colorectal cancer, based on prognostic signatures.

BACKGROUND OF THE INVENTION

[0003] Colorectal cancer (CRC) is one of the most common cancers in the developed world, and its incidence is continuing to increase. Although the progression of colorectal cancer from benign polyp to adenoma to carcinoma is well studied (1), the molecular events influencing the transition and establishment of metastasis are less well understood. The prognosis and treatment of CRC currently depend on the clinicopathological stage of disease at the time of diagnosis, and primary surgical treatment. Unfortunately, disease stage alone does not allow accurate prediction of outcome for individual patients. If patient outcomes could be predicted more accurately, treatments could be tailored to avoid under-treating patients destined to relapse, or over-treating patients who would be helped by surgery alone.

[0004] Many attempts have been made to identify markers that predict clinical outcome in CRC. Until recently most studies focused on single proteins or gene mutations with limited success in terms of prognostic information (2). Microarray technology enables the identification of sets of genes, called classifiers or signatures, that correlate with cancer outcome. This approach has been applied to a variety of cancers, including CRC (3-5), but methodological problems and a lack of independent validation have cast doubt over the findings (6, 7). Furthermore, doubts about the ability of classifiers/signatures to predict outcome have arisen due to poor concordance between those identified by different researchers using different array platforms and methodologies (8).

[0005] There is a need for further tools to predict the prognosis of colorectal cancer. This invention provides further methods, compositions, kits, and devices based on prognostic cancer markers, specifically colorectal cancer prognostic markers, to aid in the prognosis and treatment of cancer.

SUMMARY OF THE INVENTION

[0006] In certain embodiments there is provided a set of marker genes identified to be differentially expressed in recurrent and non-recurrent colorectal tumours. This set of genes and test kits can be used to generate prognostic signatures, comprising two or more markers, capable of predicting the progression of a colorectal tumour in a patient.

[0007] The individual markers can be differentially expressed depending on whether the tumour is recurrent or not. The accuracy of prediction can be enhanced by combining the markers together into a prognostic signature, providing for much more effective individual tests than single-gene assays. Also provided for is the application of techniques, such as statistics, machine learning, artificial intelligence, and data mining to the prognostic signatures to generate prediction models. In another embodiment, expression levels of the markers of a particular prognostic signature in the tumour of a patient can then be applied to the prediction model to determine the prognosis.

[0008] In certain embodiments, the expression level of the markers can be established using microarray methods, quantitative polymerase chain reaction (qPCR), or immunoassays.

BRIEF DESCRIPTION OF THE FIGURES

[0009] This invention is described with reference to specific embodiments thereof and with reference to the figures, in which:

[0010] FIG. 1 depicts a flow chart showing the methodology for producing the prognostic signatures from 149 New Zealand (NZ) and 55 German (DE) colorectal cancer (CRC) samples. New Zealand RNA samples were hybridized to oligonucleotide spotted arrays, with a 22-gene signature produced via leave one out cross validation (LOOCV), and then independently validated by LOOCV using the 55 sample DE data set. German RNA samples were hybridized to Affymetrix arrays, with a 19-gene signature produced via LOOCV, and then independently validated by LOOCV using the NZ data set.

[0011] FIG. 2 depicts a Kaplan-Meier analysis of disease-free survival time with patients predicted as high versus low risk of tumour recurrence: FIG. 2a, using NZ 22-gene signature on 149 tumours from NZ patients; FIG. 2b, using DE 19-gene signature on 55 tumours from DE patients; FIG. 2c, NZ prognostic signature validated on 55 tumours from DE patients; FIG. 2d, DE prognostic signature validated on 149 tumours from NZ patients. P-values were calculated using the log-rank test.

[0012] FIG. 3 depicts a Kaplan-Meier analysis of disease-free survival time with patients predicted as high versus low risk of tumour recurrence: FIG. 3a, using the 22-gene NZ signature on NZ patients with Stage II and Stage III disease; FIG. 3b, using the 19-gene DE signature on NZ patients with Stage II and Stage III disease.

[0013] FIG. 4 shows the predictive value of signatures of varying lengths for prognosis of colorectal cancer. These signatures were derived from 10 replicate runs of 11-fold cross validation. Each replicate 11-fold validation run is indicated by the various dashed lines; the mean across replicates by the bold line. In each fold of the cross-validation, genes were removed if the fold-change across classes was <1.1 (for the remaining samples not removed in that particular fold). The genes were then ranked using a modified t-statistic, obtaining a different set of genes for each fold, and classifiers using the top n genes (where n=2 to 200) were constructed for each fold. The genes therefore may differ for each fold of each replicate 11-fold cross validation. FIG. 4(A): Sensitivity (proportion of recurrent tumours correctly classified), with respect to number of genes/signature. FIG. 4(B): Specificity (proportion of non-recurrent tumours correctly classified), with respect to number of genes/signature. FIG. 4(C): Classification rate (proportion of tumours correctly classified), with respect to number of genes/signature. The nomenclature applied by the statistician is as follows: I refers to Stage I or
Stage II colorectal cancer (with no progression), and IV refers to eventual progression to Stage IV metastases.

[0014] FIG. 5 shows the decreased predictive value of signatures for the prognosis of colorectal cancer, in a repeat of the experiment of FIG. 4, except with the two genes, FAS and ME2, removed from the data set. FIG. 5(A): Sensitivity (proportion of recurrent tumours correctly classified), with respect to number of genes/signature. FIG. 5(B): Specificity (proportion of non-recurrent tumours correctly classified), with respect to number of genes/signature. FIG. 5(C): Classification rate (proportion of tumours correctly classified), with respect to number of genes/signature.

[0015] FIG. 6 shows a pairs chart of “top counts” (number of times each gene appeared in the “top-n” gene lists, i.e., top 10, top 20, top 100, and top 325 as described in Example 17) using three different normalization methods produced using the R statistical computing package (10, 39), in accordance with Example 17, below. The “pairs” chart is described by Becker et al., in their treatise on the S Language (upon which R is based; see reference 39). To compare methods, use row and column as defined on the diagonal to obtain the scatter plot between those two methods, analogous to reading distances off a distance chart on a map.

[0016] FIG. 7 shows the pairs chart (39) of top counts (number of times each gene appeared in the “top-n” gene lists, i.e., top 10, top 20, top 100, and top 325 as described in Example 17) using three different filtering statistics: FIG. 7(a) two-sample Wilcoxon test (41), FIG. 7(b) t-test (modified using an ad-hoc correction factor in the denominator to abrogate the effect of low-variance genes falsely appearing as significant) and FIG. 7(c) empirical Bayes as provided by the “limma” (10, 40, 42) package of Bioconductor (12, 40).

DETAILED DESCRIPTION

Definitions

[0017] Before describing embodiments of the invention in detail, it will be useful to provide some definitions of terms used herein.

[0018] The term “marker” refers to a molecule that is associated quantitatively or qualitatively with the presence of a biological phenomenon. Examples of “markers” include a polynucleotide, such as a gene or gene fragment, RNA or RNA fragment; or a gene product, including a polypeptide such as a peptide, oligopeptide, protein, or protein fragment; or any related metabolites, by products, or any other identifying molecules, such as antibodies or antibody fragments, whether related directly or indirectly to a mechanism underlying the phenomenon. The markers of the invention include the nucleotide sequences (e.g., GenBank sequences) as disclosed herein, in particular, the full-length sequences, any coding sequences, any fragments, or any complements thereof, and any measurable marker thereof as defined above.

[0019] The terms “CCPM” or “colorectal cancer prognostic marker” or “CCPM family member” refer to a marker with altered expression that is associated with a particular prognosis, e.g., a higher or lower likelihood of recurrence of cancer, as described herein, but can exclude molecules that are known in the prior art to be associated with prognosis of colorectal cancer. It is to be understood that the term CCPM does not require that the marker be specific only for colorectal tumours. Rather, expression of CCPM can be altered in other types of tumours, including malignant tumours.

[0020] The terms “prognostic signature,” “signature,” and the like refer to a set of two or more markers, for example CCPMs, that when analysed together as a set allow for the determination of or prediction of an event, for example the prognostic outcome of colorectal cancer. The use of a signature comprising two or more markers reduces the effect of individual variation and allows for a more robust prediction. Non-limiting examples of CCPMs are set forth in Tables 1, 2, 5, and 9, while non-limiting examples of prognostic signatures are set forth in Tables 3, 4, 8A, 8B, and 9, herein. In the context of the present invention, reference to “at least one,” “at least two,” “at least five,” etc., of the markers listed in any particular set (e.g., any signature) means any one or any and all combinations of the markers listed.

[0021] The term “prediction method” is defined to cover the broader genus of methods from the fields of statistics, machine learning, artificial intelligence, and data mining, which can be used to specify a prediction model. These are discussed further in the Detailed Description section.

[0022] The term “prediction model” refers to the specific mathematical model obtained by applying a prediction method to a collection of data. In the examples detailed herein, such data sets consist of measurements of gene activity in tissue samples taken from recurrent and non-recurrent colorectal cancer patients, for which the class (recurrent or non-recurrent) of each sample is known. Such models can be used to (1) classify a sample of unknown recurrence status as being one of recurrent or non-recurrent, or (2) make a probabilistic prediction (i.e., produce either a proportion or percentage to be interpreted as a probability) which represents the likelihood that the unknown sample is recurrent, based on the measurement of mRNA expression levels or expression products, of a specified collection of genes, in the unknown sample. The exact details of how these gene-specific measurements are combined to produce classifications and probabilistic predictions are dependent on the specific mechanisms of the prediction method used to construct the model.

[0023] “Sensitivity”, “specificity” (or “selectivity”), and “classification rate”, when applied to describing the effectiveness of prediction models, mean the following:

[0024] “Sensitivity” means the proportion of truly positive samples that are also predicted (by the model) to be positive. In a test for CRC recurrence, that would be the proportion of recurrent tumours predicted by the model to be recurrent. “Specificity” or “selectivity” means the proportion of truly negative samples that are also predicted (by the model) to be negative. In a test for CRC recurrence, this equates to the proportion of non-recurrent samples that are predicted to be non-recurrent by the model. “Classification Rate” is the proportion of all samples that are correctly classified by the prediction model (be that as positive or negative).

[0025] As used herein “antibodies” and like terms refer to immunoglobulin molecules and immunologically active portions of immunoglobulin (Ig) molecules, i.e., molecules that contain an antigen binding site that specifically binds (immunoreacts with) an antigen.

[0026] These include, but are not limited to, polyclonal, monoclonal, chimeric, single chain, Fc, Fab, Fab', and Fab fragments, and a Fab expression library. Antibody molecules relate to any of the classes IgG, IgM, IgA, IgE, and IgD, which differ from one another by the nature of heavy chain present in the molecule. These include subclasses as well, such as IgG1, IgG2, and others. The light chain may be a kappa chain or a lambda chain. Reference herein to antibodies
includes a reference to all classes, subclasses, and types. Also included are chimeric antibodies, for example, monoclonal antibodies or fragments thereof that are specific to more than one source, e.g., a mouse or human sequence. Further included are camelid antibodies, shark antibodies or nanobodies.

[0027] The terms “cancer” and “cancerous” refer to or describe the physiological condition in mammals that is typically characterized by abnormal or unregulated cell growth. Cancer and cancer pathology can be associated, for example, with metastasis, interference with the normal functioning of neighbouring cells, release of cytokines or other secretory products at abnormal levels, suppression or aggravation of inflammatory or immunological response, neoplasia, premalignancy, malignancy, invasion of surrounding or distant tissues or organs, such as lymph nodes, etc. Specifically included are colorectal cancers, such as bowel (e.g., large bowel), anal, and rectal cancers.

[0028] The term “colorectal cancer” includes cancer of the colon, rectum, and/or anus, and especially, adenocarcinomas, and may also include carcinomas (e.g., squamous cloacogenic carcinomas), melanomas, lymphomas, and sarcomas. Epidermoid (nonkeratinizing squamous cell or basaloid) carcinomas are also included. The cancer may be associated with particular types of polyps or other lesions, for example, tubular adenomas, tubulovillous adenomas (e.g., villoglandular polyps), villous (e.g., papillary) adenomas (with or without adenocarcinoma), hyperplastic polyps, hamartomas, juvenile polyps, polypoid carcinomas, pseudopolyps, lipomas, or leiomyomas. The cancer may be associated with familial polyposis and related conditions such as Gardner's syndrome or Peutz-Jeghers syndrome. The cancer may be associated, for example, with chronic fistulas, irradiated anal skin, leukoplakia, lymphogranuloma venereum, Bowen's disease (intraepithelial carcinoma), condyloma acuminatum, or human papillomavirus. In other aspects, the cancer may be associated with basal cell carcinoma, extramammary Paget's disease, cloacogenic carcinoma, or malignant melanoma.

[0029] The terms “differentially expressed,” “differential expression,” and like phrases, refer to a gene marker whose expression is activated to a higher or lower level in a subject (e.g., test sample) having a condition, specifically cancer, such as colorectal cancer, relative to its expression in a control subject (e.g., reference sample). The terms also include markers whose expression is activated to a higher or lower level at different stages of the same condition; in recurrent or non-recurrent disease; or in cells with higher or lower levels of proliferation. A differentially expressed marker may be either activated or inhibited at the polynucleotide level or polypeptide level, or may be subject to alternative splicing to result in a different polypeptide product. Such differences may be evidenced by a change in mRNA levels, surface expression, secretion or other partitioning of a polypeptide, for example.

[0030] Differential expression may include a comparison of expression between two or more markers (e.g., genes or their gene products); or a comparison of the ratios of the expression between two or more markers (e.g., genes or their gene products); or a comparison of two differently processed products (e.g., transcripts or polypeptides) of the same marker, which differ between normal subjects and diseased subjects; or between various stages of the same disease; or between recurring and non-recurring disease; or between cells with higher and lower levels of proliferation; or between normal tissue and diseased tissue, specifically cancer, or colorectal cancer. Differential expression includes both quantitative, as well as qualitative, differences in the temporal or cellular expression pattern in a gene or its expression products among, for example, normal and diseased cells, or among cells which have undergone different disease events or disease stages, or cells with different levels of proliferation.

[0031] The term “expression” includes production of polynucleotides and polypeptides, in particular, the production of RNA (e.g., mRNA) from a gene or portion of a gene, and includes the production of a polypeptide encoded by an RNA or gene or portion of a gene, and the appearance of a detectable material associated with expression. For example, the formation of a complex, for example, from a polypeptide-polypeptide interaction, polypeptide-nucleotide interaction, or the like, is included within the scope of the term “expression”. Another example is the binding of a binding ligand, such as a hybridization probe or antibody, to a gene or other polynucleotide or oligonucleotide, a polypeptide or a protein fragment, and the visualization of the binding ligand. Thus, the intensity of a spot on a microarray, on a hybridization blot such as a Northern blot, or on an immunoblot such as a Western blot, or on a bead array, or by PCR analysis, is included within the term “expression” of the underlying biological molecule.

[0032] The terms “expression threshold,” and “defined expression threshold” are used interchangeably and refer to the level of a marker in question outside which the polynucleotide or polypeptide serves as a predictive marker for patient survival without cancer recurrence. The threshold will be dependent on the predictive model established and is derived experimentally from clinical studies such as those described in the Examples below. Depending on the prediction model used, the expression threshold may be set to achieve maximum sensitivity, or for maximum specificity, or for minimum error (maximum classification rate). For example, a higher threshold may be set to achieve minimum errors, but this may result in a lower sensitivity. Therefore, for any given predictive model, clinical studies will be used to set an expression threshold that generally achieves the highest sensitivity while having a minimal error rate. The determination of the expression threshold for any situation is well within the knowledge of those skilled in the art.

[0033] The term “long-term survival” is used herein to refer to survival for at least 5 years, more preferably for at least 8 years, most preferably for at least 10 years following surgery or other treatment.

[0034] The term “microarray” refers to an ordered or unordered arrangement of capture agents, preferably polynucleotides (e.g., probes) or polypeptides on a substrate. See, e.g., Microarray Analysis, M. Schena, John Wiley & Sons, 2002; Microarray Biochip Technology, M. Schena, ed., Eaton Publishing, 2000; Guide to Analysis of DNA Microarray Data, S. Knudsen, John Wiley & Sons, 2004; and Protein Microarray Technology, D. Kambhampati, ed., John Wiley & Sons, 2004.

[0035] The term “oligonucleotide” refers to a polynucleotide, typically a probe or primer, including, without limitation, single-stranded deoxyribonucleotides, single- or double-stranded ribonucleotides, RNA:DNA hybrids, and double-stranded DNAs. Oligonucleotides, such as single-stranded DNA probe oligonucleotides, are often synthesized by chemical methods, for example using automated oligonucleotide synthesizers that are commercially available, or by a variety of other methods, including in vitro expression systems, recombinant techniques, and expression in cells and organisms.

[0036] The term “polynucleotide,” when used in the singular or plural, generally refers to any polyribonucleotide or polydeoxyribonucleotide, which may be unmodified RNA or DNA or modified RNA or DNA. This includes, without limitation, single- and double-stranded DNA, DNA including single- and double-stranded regions, single- and double-stranded RNA, and RNA including single- and double-stranded regions, hybrid molecules comprising DNA and RNA that may be single-stranded or, more typically, double-stranded or include single- and double-stranded regions. Also included are triple-stranded regions comprising RNA or DNA or both RNA and DNA. Specifically included are mRNAs, cDNAs, and genomic DNAs, and any fragments thereof. The term includes DNAs and RNAs that contain one or more modified bases, such as tritiated bases, or unusual bases, such as inosine. The polynucleotides of the invention can encompass coding or non-coding sequences, or sense or antisense sequences. It will be understood that each reference to a “polynucleotide” or like term, herein, will include the full-length sequences as well as any fragments, derivatives, or variants thereof.

[0037] “Polypeptide,” as used herein, refers to an oligopeptide, peptide, or protein sequence, or fragment thereof, and to naturally occurring, recombinant, synthetic, or semi-synthetic molecules. Where “polypeptide” is recited herein to refer to an amino acid sequence of a naturally occurring protein molecule, “polypeptide” and like terms are not meant to limit the amino acid sequence to the complete, native amino acid sequence for the full-length molecule. It will be understood that each reference to a “polypeptide” or like term, herein, will include the full-length sequence, as well as any fragments, derivatives, or variants thereof.

[0038] The term “prognosis” refers to a prediction of medical outcome, for example, a poor or good outcome (e.g., likelihood of long-term survival); a negative prognosis, or poor outcome, includes a prediction of relapse, disease progression (e.g., tumour growth or metastasis, or drug resistance), or mortality; a positive prognosis, or good outcome, includes a prediction of disease remission (e.g., disease-free status), amelioration (e.g., tumour regression), or stabilization.

[0039] The term “proliferation” refers to the processes leading to increased cell size or cell number, and can include one or more of tumour or cell growth, angiogenesis, innervation, and metastasis.

[0040] The term “qPCR” or “QPCR” refers to quantitative polymerase chain reaction as described, for example, in PCR Technique: Quantitative PCR, J. W. Larrick, ed., Eaton Publishing, 1997, and A-Z of Quantitative PCR, S. Bustin, ed., IUL Press, 2004.

[0041] The term “tumour” refers to all neoplastic cell growth and proliferation, whether malignant or benign, and all pre-cancerous and cancerous cells and tissues.

[0042] “Stringency” of hybridization reactions is readily determinable by one of ordinary skill in the art, and generally is an empirical calculation dependent upon probe length, washing temperature, and salt concentration. In general, longer probes require higher temperatures for proper annealing, while shorter probes need lower temperatures. Hybridization generally depends on the ability of denatured DNA to reanneal when complementary strands are present in an environment below their melting temperature. The higher the degree of desired homology between the probe and hybridisable sequence, the higher the relative temperature which can be used. As a result, it follows that higher relative temperatures would tend to make the reaction conditions more stringent, while lower temperatures less so. Additional details and explanation of stringency of hybridization reactions are found, e.g., in Ausubel et al., Current Protocols in Molecular Biology, Wiley Interscience Publishers, (1995).

[0043] “Stringent conditions” or “high stringency conditions”, as defined herein, typically: (1) employ low ionic strength and high temperature for washing, for example 0.015 M sodium chloride/0.0015 M sodium citrate/0.1% sodium dodecyl sulfate at 50° C.; (2) employ a denaturing agent during hybridization, such as formamide, for example, 50% (v/v) formamide with 0.1% bovine serum albumin/0.1% Ficoll/0.1% polyvinylpyrrolidone/50 mM sodium phosphate buffer at pH 6.5 with 750 mM sodium chloride, 75 mM sodium citrate at 42° C.; or (3) employ 50% formamide, 5×SSC (0.75 M NaCl, 0.075 M sodium citrate), 50 mM sodium phosphate (pH 6.8), 0.1% sodium pyrophosphate, 5× Denhardt's solution, sonicated salmon sperm DNA (50 μg/ml), 0.1% SDS, and 10% dextran sulfate at 42° C., with washes at 42° C. in 0.2×SSC (sodium chloride/sodium citrate) and 50% formamide at 55° C., followed by a high stringency wash comprising 0.1×SSC containing EDTA at 55° C.

[0044] “Moderately stringent conditions” may be identified as described by Sambrook et al., Molecular Cloning: A Laboratory Manual, New York: Cold Spring Harbor Press, 1989, and include the use of washing solution and hybridization conditions (e.g., temperature, ionic strength, and % SDS) less stringent than those described above. An example of moderately stringent conditions is overnight incubation at 37° C. in a solution comprising: 20% formamide, 5×SSC (150 mM NaCl, 15 mM trisodium citrate), 50 mM sodium phosphate (pH 7.6), 5× Denhardt's solution, 10% dextran sulfate, and 20 mg/ml denatured sheared salmon sperm DNA, followed by washing the filters in 1×SSC at about 37-50° C. The skilled artisan will recognize how to adjust the temperature, ionic strength, etc. as necessary to accommodate factors such as probe length and the like.

[0045] The practice of the present invention will employ, unless otherwise indicated, conventional techniques of molecular biology (including recombinant techniques), microbiology, cell biology, and biochemistry, which are within the skill of the art. Such techniques are explained fully in the literature, such as, Molecular Cloning: A Laboratory Manual, 2nd edition, Sambrook et al., 1989; Oligonucleotide Synthesis, M J Gait, ed., 1984; Animal Cell Culture, R. I. Freshney, ed., 1987; Methods in Enzymology, Academic Press, Inc.; Handbook of Experimental Immunology, 4th edition, D. M. Weir & C C. Blackwell, eds., Blackwell Science Inc., 1987; Gene Transfer Vectors for Mammalian Cells, J. M. Miller & M. P. Calos, eds., 1987; Current Protocols in Molecular Biology, F. M. Ausubel et al., eds., 1987; and PCR: The Polymerase Chain Reaction, Mullis et al., eds., 1994.

DESCRIPTION OF EMBODIMENTS OF THE INVENTION

[0046] In colorectal cancer, discordant results have been reported for prognostic markers. The present invention discloses the use of microarrays to reach a firmer conclusion, and to determine the prognostic role of specific prognostic signatures in colorectal cancer. The microarray-based studies shown herein indicate that particular prognostic signatures in colorectal cancer are associated with a prognosis. The invention can therefore be used to identify patients at high risk of recurrence of cancer, or patients with a high likelihood of recovery.

[0047] The present invention provides for markers for the determination of disease prognosis, for example, the likelihood of recurrence of tumours, including colorectal tumours. Using the methods of the invention, it has been found that numerous markers are associated with the prognosis of colorectal cancer, and can be used to predict disease outcome. Microarray analysis of samples taken from patients with various stages of colorectal tumours has led to the surprising discovery that specific patterns of marker expression are associated with prognosis of the cancer. The present invention therefore provides for a set of genes, outlined in Table 1 and Table 2, that are differentially expressed in recurrent and non-recurrent colorectal cancers. The genes outlined in Table 1 and Table 2 provide for a set of colorectal cancer prognostic markers (CCPMs).

[0048] A decrease in certain colorectal cancer prognostic markers (CCPMs), for example, markers associated with immune responses, is indicative of a particular prognosis. This can include increased likelihood of cancer recurrence after standard treatment, especially for colorectal cancer. Conversely, an increase in other CCPMs is indicative of a particular prognosis. This can include disease progression or the increased likelihood of cancer recurrence, especially for colorectal cancer. A decrease or increase in expression can be determined, for example, by comparison of a test sample, e.g., patient's tumour sample, to a reference sample, e.g., a sample associated with a known prognosis. In particular, one or more samples from patient(s) with non-recurrent cancer could be used as a reference sample.

[0049] For example, to obtain a prognosis, expression levels in a patient's sample (e.g., tumour sample) can be compared to samples from patients with a known outcome. If the patient's sample shows increased or decreased expression of one or more CCPMs that compares to samples with good outcome (no recurrence), then a positive prognosis, or recur

signatures for determining the prognosis of cancer, and establishing a treatment regime, or treatment modality, specific for that tumour. In particular, a positive prognosis can be used by a patient to decide to pursue standard or less invasive treatment options. A negative prognosis can be used by a patient to decide to terminate treatment or to pursue highly aggressive or experimental treatments. In addition, a patient can choose treatments based on their impact on the expression of prognostic markers (e.g., CCPMs).

[0053] Levels of CCPMs can be detected in tumour tissue, tissue proximal to the tumour, lymph node samples, blood samples, serum samples, urine samples, or faecal samples, using any suitable technique, and can include, but is not limited to, oligonucleotide probes, quantitative PCR, or antibodies raised against the markers. It will be appreciated that by analyzing the presence and amounts of expression of a plurality of CCPMs in the form of prediction signatures, and constructing a prognostic signature (e.g., as set forth in Tables 3, 4, 8A, 8B, and 9), the sensitivity and accuracy of prognosis will be increased. Therefore, multiple markers according to the present invention can be used to determine the prognosis of a cancer.

[0054] The invention includes the use of archived paraffin-embedded biopsy material for assay of the markers in the set, and therefore is compatible with the most widely available type of biopsy material. It is also compatible with several different methods of tumour tissue harvest, for example, via core biopsy or fine needle aspiration. In certain aspects, RNA is isolated from a fixed, wax-embedded cancer tissue specimen of the patient. Isolation may be performed by any technique known in the art, for example from core biopsy tissue or fine needle aspirate cells.

[0055] In one aspect, the invention relates to a method of predicting a prognosis, e.g., the likelihood of long-term survival of a cancer patient without the recurrence of cancer, comprising determining the expression level of one or more prognostic markers or their expression products in a sample obtained from the patient, normalized against the expression level of other RNA transcripts or their products in the sample, or of a reference set of RNA transcripts or their expression products.
In specific aspects, the prognostic marker is one or rence is unlikely, is implicated. If the patient’s sample shows more markers listed in Tables 1, 2, or 5, or is included as one expression of one or more CCPMs that is comparable to or more of the prognostic signatures derived from the markers samples with poor outcome (recurrence), then a positive listed in Tables 1, 2, and 5, or the prognostic signatures listed prognosis, or recurrence of the tumour is likely, is implicated. in Tables 3, 4, 8A, 8B, or 9. 0050. As further examples, the expression levels of a prog 0056. In further aspects, the expression levels of the prog nostic signature comprising two or more CCPMS from a nostic markers or their expression products are determined, patient’s sample (e.g., tumour sample) can be compared to e.g., for the markers listed in Tables 1, 2, or 5, a prognostic samples of recurrent/non-recurrent cancer. If the patients signature derived from the markers listed in Tables 1, 2, and 5, sample shows increased or decreased expression of CCPMs e.g., for the prognostic signatures listed in Tables 3, 4, 8A, 8B, by comparison to samples of non-recurrent cancer, and/or or 9. In another aspect, the method comprises the determina comparable expression to samples of recurrent cancer, then a tion of the expression levels of a full set of prognosis markers negative prognosis is implicated. If the patient’s sample or their expression products, e.g., for the markers listed in shows expression of CCPMs that is comparable to samples of Tables 1, 2, or 5, or, a prognostic signature derived from the non-recurrent cancer, and/or lower or higher expression than markers listed in Tables 1, 2, and 5, e.g., for the prognostic samples of recurrent cancer, then a positive prognosis is signatures listed in Tables 3, 4, 8A, 8B, or 9. implicated. 0057. In an additional aspect, the invention relates to an 0051. 
As one approach, a prediction method can be array (e.g., microarray) comprising polynucleotides hybrid applied to a panel of markers, for example the panel of izing to two or more markers, e.g., for the markers listed in CCPMs outlined in Table 1 and Table 2, in order to generate Tables 1, 2, and 5, or a prognostic signature derived from the a predictive model. This involves the generation of a prog markers listed in Tables 1, 2, and 5, e.g., the prognostic nostic signature, comprising two or more CCPMs. signatures listed in Tables 3, 4, 8A, 8B, and 9. In particular 0052. The disclosed CCPMs in Table 1 and Table 2 there aspects, the array comprises polynucleotides hybridizing to fore provide a useful set of markers to generate prediction prognostic signature derived from the markers listed in Tables US 2016/006891.6 A1 Mar. 10, 2016

1, 2, and 5, or e.g., for the prognostic signatures listed in Tables 3, 4, 8A, 8B, or 9. In another specific aspect, the array comprises polynucleotides hybridizing to the full set of markers, e.g., for the markers listed in Tables 1, 2, or 5, or, e.g., for the prognostic signatures listed in Tables 3, 4, 8A, 8B, or 9.

[0058] For these arrays, the polynucleotides can be cDNAs, or oligonucleotides, and the solid surface on which they are displayed can be glass, for example. The polynucleotides can hybridize to one or more of the markers as disclosed herein, for example, to the full-length sequences, any coding sequences, any fragments, or any complements thereof. In particular aspects, an increase or decrease in expression levels of one or more CCPM indicates a decreased likelihood of long-term survival, e.g., due to cancer recurrence, while a lack of an increase or decrease in expression levels of one or more CCPM indicates an increased likelihood of long-term survival without cancer recurrence.

TABLE 1
Colorectal Cancer Predictive Markers (corresponding to Affymetrix GeneChip probes that show statistically significant differential expression, P < 0.05, as ascertained by BRB Array Tools)

Gene Symbol | Affymetrix Probe IDs | Refseq Access. | Gene Description | Unigene Access. | Other Genbank Access. | Expression Fold Difference (relapse/non-relapse)
ME2 | 210154_at, 210153_s_at, 209397_at | NM_002396 | malic enzyme 2, NAD(+)-dependent, mitochondrial | Hs.233119 | M55905, BC000147 | 0.74
STAT1 | AFFX-HUMISGF3A/M97935_MA_at, AFFX-HUMISGF3A/M97935_MB_at, AFFX-HUMISGF3A/M97935_3_at, 200887_s_at, AFFX-HUMISGF3A/M97935_5_at, 209969_s_at | NM_007315, NM_139266 | signal transducer and activator of transcription 1, 91 kDa | Hs.470943 | NM_007315, BC002704 | 0.58
CXCL10 | 204533_at | NM_001565 | chemokine (C-X-C motif) ligand 10 | Hs.413924 | NM_001565 | 0.29
FAS | 215719_x_at, 216252_x_at, 204780_s_at, 204781_s_at | NM_000043, NM_152871, NM_152872, NM_152873, NM_152874, NM_152875, NM_152876, NM_152877 | Fas (TNF receptor superfamily, member 6) | Hs.244139 | X83493, Z70519, AA164751, NM_000043 | 0.68
SFRS2 | 200753_x_at, 214882_s_at, 200754_x_at | NM_003016 | splicing factor, arginine/serine-rich 2 | Hs.73965 | BE866585, BG254869, NM_003016 | 0.82
GUF1 | 218884_s_at | NM_021927 | GUF1 GTPase homolog (S. cerevisiae) | Hs.546419 | NM_021927 | 0.71
CXCL9 | 203915_at | NM_002416 | chemokine (C-X-C motif) ligand 9 | Hs.77367 | NM_002416 | 0.33
TYMS | 202589_at | NM_001071 | thymidylate synthetase | Hs.369762 | NM_001071 | 0.53
SEC10L1 | 218748_s_at | NM_006544 | SEC10-like 1 (S. cerevisiae) | Hs.365863 | NM_006544 | 0.76
PLK4 | 204887_s_at | NM_014264 | polo-like kinase 4 (Drosophila) | Hs.172052 | NM_014264 | 0.64
MAP2K4 | 203265_s_at | NM_003010 | mitogen-activated protein kinase kinase 4 | Hs.514681 | AA810268 | 0.76
EIF4E | 201435_s_at, 201436_at | NM_001968 | eukaryotic translation initiation factor 4E | Hs.249718 | AW268640, AI742789 | 0.69
TLK1 | 210379_s_at | NM_012290 | tousled-like kinase 1 | Hs.470586 | AF162666 | 0.59
CXCL11 | 210163_at, 211122_s_at | NM_005409 | chemokine (C-X-C motif) ligand 11 | Hs.518814 | AF030514, AF002985 | 0.15
PSME2 | 201762_s_at | NM_002818 | proteasome (prosome, macropain) activator subunit 2 (PA28 beta) | Hs.434081, Hs.512410 | NM_002818 | 0.68
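The last column of Table 1 gives, for each marker, expression in relapsed tumours relative to non-relapsed tumours, so values below 1 indicate lower expression in relapse and values above 1 indicate higher expression. The following is a minimal sketch of how such a ratio could be computed. It assumes a ratio of geometric means of per-sample intensities; the source states only that the values were ascertained with BRB Array Tools, so the formula, the function names `fold_difference` and `direction`, and the sample intensities are all illustrative assumptions.

```python
import math


def fold_difference(relapse, non_relapse):
    """Ratio of geometric-mean expression, relapse / non-relapse.

    Assumption: a geometric-mean ratio is used for illustration only;
    the source does not specify the exact calculation.
    """
    def geo_mean(values):
        # Geometric mean via the arithmetic mean of the logs.
        return math.exp(sum(math.log(v) for v in values) / len(values))

    return geo_mean(relapse) / geo_mean(non_relapse)


def direction(fold, tol=1e-9):
    """Describe which group shows the higher expression."""
    if fold < 1 - tol:
        return "lower in relapse"
    if fold > 1 + tol:
        return "higher in relapse"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical probe intensities for one marker (not taken from the source).
    relapse = [100.0, 120.0, 90.0]
    non_relapse = [350.0, 400.0, 330.0]
    fd = fold_difference(relapse, non_relapse)
    print(f"fold difference = {fd:.2f} ({direction(fd)})")
```

Read this way, a marker such as CXCL10 with a tabulated value of 0.29 is expressed at roughly a third of the non-relapse level in relapsed tumours.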

TABLE 1-continued

Gene Symbol | Affymetrix Probe IDs | Refseq Access. | Gene Description | Unigene Access. | Other Genbank Access. | Expression Fold Difference (relapse/non-relapse)
CAP-D3 | 212789_at | NM_015261 | non-SMC condensin II complex, subunit D3 | Hs.438550 | AI796581 | 0.83
MPP5 | 219321_at | NM_022474 | membrane protein, palmitoylated 5 (MAGUK p55 subfamily member 5) | Hs.509699 | NM_022474 | 0.74
DLGAP4 | 202570_s_at | NM_014902, NM_183006 | discs, large (Drosophila) homolog-associated protein 4 | Hs.249600 | BF346592 | 1.3
WARS | 200628_s_at, 200629_at | NM_004184, NM_173701, NM_213645, NM_213646 | tryptophanyl-tRNA synthetase | Hs.497599 | M61715, NM_004184 | 0.66
ARF6 | 203312_x_at | NM_001663 | ADP-ribosylation factor 6 | Hs.525330 | NM_001663 | 0.77
PBK | 219148_at | NM_018492 | PDZ binding kinase | Hs.104741 | NM_018492 | 0.41
GMFB | 202543_s_at | NM_004124 | glia maturation factor, beta | Hs.151413 | BC005359 | 0.66
NDUFA9 | 208969_at | NM_005002 | NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 9, 39 kDa | Hs.75227 | AF050641 | 0.77
CDC40 | 203377_s_at | NM_015891 | cell division cycle 40 homolog (yeast) | Hs.428147 | NM_015891 | 0.8
WHSC1 | 209053_s_at, 209054_s_at, 209052_s_at | NM_007331, NM_014919, NM_133330, NM_133331, NM_133332, NM_133333, NM_133334, NM_133335, NM_133336 | Wolf-Hirschhorn syndrome candidate 1 | Hs.113876 | BE793789, AF083389, BF111870 | 0.75
C1QBP | 208910_s_at, 214214_s_at | NM_001212 | complement component 1, q subcomponent binding protein | Hs.555866 | L04636, AU151801 | 0.71
RBM25 | 212031_at | NM_021239 | RNA binding motif protein 25 | Hs.531106 | AV757384 | 0.83
SLC25A11 | 209003_at, 207088_s_at | NM_003562 | solute carrier family 25 (mitochondrial carrier, oxoglutarate carrier), member 11 | Hs.184877 | AF070548, NM_003562 | 0.83
TK1 | 202338_at | NM_003258 | thymidine kinase 1, soluble | Hs.515122 | NM_003258 | 0.73
ETNK1 | 222262_s_at, 219017_at | NM_018638 | ethanolamine kinase 1 | Hs.240056 | AL137750, NM_018638 | 0.66
KLHL24 | 221985_at | NM_017644 | kelch-like 24 (Drosophila) | Hs.407709 | AW006750 | 1.4
AK2 | 212175_s_at, 205996_s_at, 212174_at | NM_001625, NM_013411 | adenylate kinase 2 | Hs.470907 | AL513611, NM_013411, W02312 | 0.8
HNRPD | 221481_x_at, 209330_s_at, 200073_s_at | NM_001003810, NM_002138, NM_031369, NM_031370 | heterogeneous nuclear ribonucleoprotein D (AU-rich element RNA binding protein 1, 37 kDa) | Hs.480073 | D55672, D55674, M94630 | 0.8
GTPBP3 | 213835_x_at | NM_032620, NM_133644 | GTP binding protein 3 (mitochondrial) | Hs.334885 | AL524262 | 0.87
PSAT1 | 220892_s_at | NM_021154, NM_058179 | phosphoserine aminotransferase 1 | Hs.494261 | NM_021154 | 0.54

TABLE 1-continued

Gene Symbol | Affymetrix Probe IDs | Refseq Access. | Gene Description | Unigene Access. | Other Genbank Access. | Expression Fold Difference (relapse/non-relapse)
AP1G1 | 203350_at | NM_001030007, NM_001128 | adaptor-related protein complex 1, gamma 1 subunit | Hs.461253 | NM_001128 | 0.89
SMCHD1 | 212577_at | | structural maintenance of chromosomes flexible hinge domain containing 1 | Hs.8118 | AA868754 | 0.74
SLC4A4 | 210738_s_at, 203908_at, 211494_s_at, 210739_x_at | NM_003759 | solute carrier family 4, sodium bicarbonate cotransporter, member 4 | Hs.5462 | AF011390, NM_003759, AF157492, AF069510 | 0.7
RBMS3 | 206767_at | NM_001003792, NM_001003793, NM_014483 | RNA binding motif, single stranded interacting protein | Hs.221436 | NM_014483 | 1.2
LARP4 | 214155_s_at | NM_052879, NM_199188, NM_199190 | La ribonucleoprotein domain family, member 4 | Hs.26613 | AI743740 | 0.66
FANCA | 203805_s_at | NM_000135, NM_001018112 | Fanconi anemia, complementation group A | Hs.284153 | AW083279 | 0.78
SOS1 | 212780_at | NM_005633 | son of sevenless homolog 1 (Drosophila) | Hs.278733 | AA700167 | 0.84
IFT20 | 210312_s_at | NM_174887 | intraflagellar transport 20 homolog (Chlamydomonas) | Hs.4187 | BC002640 | 1.2
NUP210 | 212316_at, 220035_at, 213947_s_at | NM_024923 | nucleoporin 210 kDa | Hs.475525 | AA502912, NM_024923, AI867102 | 0.78
IRF8 | 204057_at | NM_002163 | interferon regulatory factor 8 | Hs.137427 | AI073984 | 0.75
SGPP1 | 221268_s_at | NM_030791 | sphingosine-1-phosphate phosphatase 1 | Hs.24678 | NM_030791 | 0.76
MAD2L1 | 203362_s_at | NM_002358 | MAD2 mitotic arrest deficient-like 1 (yeast) | Hs.509523, Hs.533185 | NM_002358 | 0.7
PAICS | 201013_s_at, 201014_s_at | NM_006452 | phosphoribosylaminoimidazole carboxylase, phosphoribosylaminoimidazole succinocarboxamide synthetase | Hs.518774 | AA902652, NM_006452 | 0.71
RPS2 | 217466_x_at | NM_002952 | ribosomal protein S2 | Hs.356366, Hs.381079, Hs.498569, Hs.506997, Hs.556270 | L48784 | 0.83
TMED5 | 202195_s_at | NM_016040 | transmembrane emp24 protein transport domain containing 5 | Hs.482873 | NM_016040 | 0.86
GTSE1 | 204317_at, 204318_s_at | NM_016426 | G-2 and S-phase expressed 1 | Hs.386189, Hs.475140 | BF305380, NM_016426 | 0.8
DCK | 203302_at | NM_000788 | deoxycytidine kinase | Hs.709 | NM_000788 | 0.77
DKFZp762E1312 | 218726_at | NM_018410 | hypothetical protein DKFZp762E1312 | Hs.532968 | NM_018410 | 0.81
BAZ1A | 217986_s_at | NM_013448, NM_182648 | bromodomain adjacent to zinc finger domain, 1A | Hs.509140 | NM_013448 | 0.8
HIP2 | 202346_at | NM_005339 | huntingtin interacting protein 2 | Hs.50308 | NM_005339 | 0.78

TABLE 1-continued

Gene Symbol | Affymetrix Probe IDs | Refseq Access. | Gene Description | Unigene Access. | Other Genbank Access. | Expression Fold Difference (relapse/non-relapse)
HNRPA3P1 | 206809_s_at | | heterogeneous nuclear ribonucleoprotein A3 pseudogene 1 | Hs.524276 | NM_005758 | 0.83
CDC42BPA | 214464_at | NM_003607, NM_014826 | CDC42 binding protein kinase alpha (DMPK-like) | Hs.35433 | NM_003607 | 1.4
P15RS | 218209_s_at | NM_018170 | hypothetical protein FLJ10656 | Hs.464912 | NM_018170 | 0.79
FLJ10534TSR1 | 218156_s_at | NM_018128 | TSR1, 20S rRNA accumulation, homolog (S. cerevisiae) | Hs.388170 | NM_018128 | 0.75
RRM1 | 201476_s_at | NM_001033 | ribonucleotide reductase M1 polypeptide | Hs.383396 | AI692974 | 0.76
USP4 | 202682_s_at | NM_003363, NM_199443 | ubiquitin specific peptidase 4 (proto-oncogene) | Hs.77500 | NM_003363 | 1.2
ZNF304 | 207753_at | NM_020657 | zinc finger protein 304 | Hs.287374 | NM_020657 | 1.3
CA2 | 209301_at | NM_000067 | carbonic anhydrase II | Hs.155097 | M36532 | 0.25
LOC92249 | 212957_s_at | | hypothetical protein LOC92249 | Hs.31532 | AU154785 | 1.1
MARCH5 | 218582_at | NM_017824 | membrane-associated ring finger (C3HC4) 5 | Hs.549165 | NM_017824 | 0.81
TRMT5 | 221952_x_at | NM_020810 | TRM5 tRNA methyltransferase 5 homolog (S. cerevisiae) | Hs.380159 | AB037814 | 0.81
PRDX3 | 201619_at | NM_006793, NM_014098 | peroxiredoxin 3 | Hs.523302 | NM_006793 | 0.73
RAP1GDS1 | 217457_s_at | NM_021159 | RAP1, GTP-GDP dissociation stimulator 1 | Hs.132858 | X63465 | 0.82
NUMB | 209073_s_at | NM_001005743, NM_001005744, NM_001005745, NM_003744 | numb homolog (Drosophila) | Hs.509909 | AF015040 | 0.82
KIF2 | 203087_s_at | NM_004520 | kinesin heavy chain member 2 | Hs.533222 | NM_004520 | 0.72
ACADSB | 205355_at | NM_001609 | acyl-Coenzyme A dehydrogenase, short/branched chain | Hs.81934 | NM_001609 | 0.87
IBRDC3 | 213038_at | NM_153341 | IBR domain containing 3 | Hs.546478 | AL031602 | 0.88
TES | 202719_s_at | NM_015641, NM_152829 | testis derived transcript (3 LIM domains) | Hs.533391 | BC001451 | 1.3
YDD19 | 37079_at | | YDD19 protein | Hs.525826 | U82319 | 0.92
GZMB | 210164_at | NM_004131 | granzyme B (granzyme 2, cytotoxic T-lymphocyte-associated serine esterase 1) | Hs.1051 | J03189 | 0.66
LAP3 | 217933_s_at | NM_015907 | leucine aminopeptidase 3 | Hs.479264 | NM_015907 | 0.67
C17orf25 | 209092_s_at | NM_016080 | chromosome 17 open reading frame 25 | Hs.279061 | AF061730 | 0.72
ZNF345 | 207236_at | NM_003419 | zinc finger protein 345 | Hs.362324 | NM_003419 | 1.1
KITLG | 207029_at, 211124_s_at | NM_000899, NM_003994 | KIT ligand | Hs.1048 | NM_000899, AF119835 | 0.75

TABLE 1-continued

Gene Symbol | Affymetrix Probe IDs | Refseq Access. | Gene Description | Unigene Access. | Other Genbank Access. | Expression Fold Difference (relapse/non-relapse)
CAMSAP1L1 | 212765_at | NM_203459 | calmodulin regulated spectrin-associated protein 1-like 1 | Hs.23585 | AB029001 | 1.3
YTHDC2 | 205835_s_at, 205836_s_at | NM_022828 | YTH domain containing 2 | Hs.231942 | AW975818, NM_022828 | 0.84
RABIF | 204477_at | NM_002871 | RAB interacting factor | Hs.90875 | U74324 | 1.2
SERBP1 | 217725_x_at | NM_001018067, NM_001018068, NM_001018069, NM_015640 | SERPINE1 mRNA binding protein 1 | Hs.369448, Hs.519284, Hs.530412 | NM_015640 | 0.81
KPNB1 | 208975_s_at | NM_002265 | karyopherin (importin) beta 1 | Hs.532793 | L38951 | 0.74
BRIP1 | 221703_at | NM_032043 | BRCA1 interacting protein C-terminal 1 | Hs.532799 | AF360549 | 0.86
IRF1 | 202531_at | NM_002198 | interferon regulatory factor 1 | Hs.436061 | NM_002198 | 0.62
TIPIN | 219258_at | NM_017858 | TIMELESS interacting protein | Hs.426696 | NM_017858 | 0.73
SPFH1 | 202444_s_at | NM_006459 | SPFH domain family, member 1 | Hs.150087 | NM_006459 | 0.76
SFPQ | 201586_s_at | NM_005066 | splicing factor proline/glutamine-rich (polypyrimidine tract binding protein associated) | Hs.355934 | NM_005066 | 0.83
MGAT2 | 211061_s_at | NM_001015883, NM_002408 | mannosyl (alpha-1,6-)-glycoprotein beta-1,2-N-acetylglucosaminyltransferase | Hs.93338 | BC006390 | 0.79
MCCC2 | 209624_s_at | NM_022132 | methylcrotonoyl-Coenzyme A carboxylase 2 (beta) | Hs.167531 | AB050049 | 0.6
DDAH2 | 215537_x_at, 214909_s_at | NM_013974 | dimethylarginine dimethylaminohydrolase 2 | Hs.247362 | AJ012008, AK026191 | 1.2
NP | 201695_s_at | NM_000270 | nucleoside phosphorylase | Hs.75514 | NM_000270 | 0.79
CHEK1 | 205393_s_at, 205394_at | NM_001274 | CHK1 checkpoint homolog (S. pombe) | Hs.24529 | NM_001274 | 0.7
MYO1B | 212365_at | NM_012223 | myosin IB | Hs.439620 | BF215996 | 0.85
ATP5A1 | 213738_s_at | NM_001001935, NM_001001937, NM_004046 | ATP synthase, H+ transporting, mitochondrial F1 complex, alpha subunit, isoform 1, cardiac muscle | Hs.298280, Hs.551998 | AI587323 | 0.82
IL2RB | 205291_at | NM_000878 | interleukin 2 receptor, beta | Hs.474787 | NM_000878 | 0.73
RPL39 | 217665_at | NM_001000 | ribosomal protein L39 (RPL39) | Hs.558387 | AA420614 | 1.3
CD59 | 212463_at | NM_000611, NM_203329, NM_203330, NM_203331 | CD59 antigen p18-20 (antigen identified by monoclonal antibodies 16.3A5, EJ16, EJ30, EL32 and G344) | Hs.278573 | BE379006 | 1.5
AMD1 | 201196_s_at | NM_001033059, NM_001634 | adenosylmethionine decarboxylase 1 | Hs.159118 | M21154 | 0.74
GGA2 | 210658_s_at | NM_015044, NM_138640 | golgi associated, gamma adaptin ear containing, ARF binding protein 2 | Hs.460336 | BC000284 | 0.82

TABLE 1-continued

Gene Symbol | Affymetrix Probe IDs | Refseq Access. | Gene Description | Unigene Access. | Other Genbank Access. | Expression Fold Difference (relapse/non-relapse)
MCM6 | 201930_at | NM_005915 | MCM6 minichromosome maintenance deficient 6 (MIS5 homolog, S. pombe) (S. cerevisiae) | Hs.444118 | NM_005915 | 0.75
SCC-112 | 213983_s_at, 212138_at | NM_015200 | SCC-112 protein | Hs.331431 | AW991219, AK021757 | 0.8
BCL7C | 219072_at | NM_004765 | B-cell CLL/lymphoma 7C | Hs.303197 | NM_004765 | 1.2
HMGN2 | 208668_x_at | NM_005517 | high-mobility group nucleosomal binding domain 2 | Hs.181163 | BC003689 | 0.9
RBBP4 | 210371_s_at, 217301_x_at | NM_005610 | retinoblastoma binding protein 4 | Hs.555890 | BC003092, X71810 | 0.8
KIAA0090 | 212396_s_at | NM_015047 | KIAA0090 | Hs.439200 | AI143233 | 0.81
SYNPO | 202796_at | NM_007286 | synaptopodin | Hs.435228 | NM_007286 | 1.2
GPR161 | 214104_at | NM_007369, NM_153832 | G protein-coupled receptor 161 | Hs.271809 | AI703188 | 1.5
TMEM113 | 215509_s_at | NM_025222 | transmembrane protein 113 | Hs.194110 | AL137654 | 0.72
SMC2L1 | 204240_s_at | NM_006444 | SMC2 structural maintenance of chromosomes 2-like 1 (yeast) | Hs.119023 | NM_006444 | 0.65
CCNA2 | 203418_at | NM_001237 | cyclin A2 | Hs.85137 | NM_001237 | 0.6
VAPB | 202549_at | NM_004738 | VAMP (vesicle-associated membrane protein)-associated protein B and C | Hs.182625 | AK025720 | 1.2
EXOSC9 | 213226_at | NM_005033 | exosome component 9 | Hs.91728 | AI346350 | 0.73
TRIM25 | 206911_at | NM_005082 | tripartite motif-containing 25 | Hs.528952, Hs.551516 | NM_005082 | 0.88
SCYL2 | 221220_s_at | NM_017988 | SCY1-like 2 (S. cerevisiae) | Hs.506481 | NM_017988 | 0.85
RYK | 214172_x_at | NM_001005861, NM_002958 | RYK receptor-like tyrosine kinase | Hs.245869 | BG032035 | 1.2
MTHFD1 | 202309_at | NM_005956 | methylenetetrahydrofolate dehydrogenase (NADP+ dependent), methenyltetrahydrofolate cyclohydrolase, formyltetrahydrofolate synthetase | Hs.435974 | NM_005956 | 0.74
RUNX1 | 211180_x_at | NM_001001890, NM_001754 | runt-related transcription factor 1 (acute myeloid leukemia 1, aml1 oncogene) | Hs.149261, Hs.278446 | D89788 | 1.1
KPNA2 | 201088_at, 211762_s_at | NM_002266 | karyopherin alpha 2 (RAG cohort 1, importin alpha 1) | Hs.159557, Hs.252712 | NM_002266, BC005978 | 0.77
PSME1 | 200814_at | NM_006263, NM_176783 | proteasome (prosome, macropain) activator subunit 1 (PA28 alpha) | Hs.75348 | NM_006263 | 0.76
TACC3 | 218308_at | NM_006342 | transforming, acidic coiled-coil containing protein 3 | Hs.104019 | NM_006342 | 0.78
FEN1 | 204768_s_at | NM_004111 | flap structure-specific endonuclease 1 | Hs.409065 | NM_004111 | 0.73

TABLE 1-continued

Gene Symbol | Affymetrix Probe IDs | Refseq Access. | Gene Description | Unigene Access. | Other Genbank Access. | Expression Fold Difference (relapse/non-relapse)
GTF3C4 | 219198_at | NM_012204 | general transcription factor IIIC, polypeptide 4, 90 kDa | Hs.549088 | NM_012204 | 0.87
GEMIN4 | 217099_s_at | NM_015721 | gem (nuclear organelle) associated protein 4 | Hs.499620 | AF258545 | 0.76
CTSS | 202902_s_at | NM_004079 | cathepsin S | Hs.181301 | NM_004079 | 0.74
MCM2 | 202107_s_at | NM_004526 | MCM2 minichromosome maintenance deficient 2, mitotin (S. cerevisiae) | Hs.477481 | NM_004526 | 0.71
GPHN | 220773_s_at | NM_001024218, NM_020806 | gephyrin | Hs.208765 | NM_020806 | 0.67
NUP50 | 218295_s_at | NM_007172, NM_153645, NM_153684 | nucleoporin 50 kDa | Hs.475103 | NM_007172 | 0.78
RANBP2L1 | 210676_x_at | NM_005054, NM_032260 | RAN binding protein 2-like 1 | Hs.469630 | U64675 | 0.83
NR5A2 | 208337_s_at | NM_003822, NM_205860 | nuclear receptor subfamily 5, group A, member 2 | Hs.33446 | NM_003822 | 0.77
PGD | 201118_at | NM_002631 | phosphogluconate dehydrogenase | Hs.464071 | NM_002631 | 0.75
FUT4 | 209892_at, 209893_s_at | NM_002033 | fucosyltransferase 4 (alpha (1,3) fucosyltransferase, myeloid-specific) | Hs.390420 | AF305083, M58596 | 0.78
RAB6A | 201048_x_at | NM_002869, NM_198896 | RAB6A, member RAS oncogene family | Hs.503222, Hs.535586 | NM_002869 | 0.81
CCNT2 | 204645_at | NM_001241, NM_058241 | cyclin T2 | Hs.292754 | NM_001241 | 0.87
TFRC | 207332_s_at | NM_003234 | transferrin receptor (p90, CD71) | Hs.529618 | NM_003234 | 0.63
BIRC5 | 202095_s_at | NM_001012270, NM_001012271, NM_001168 | baculoviral IAP repeat-containing 5 (survivin) | Hs.514527 | NM_001168 | 0.7
PGGT1B | 206288_at | NM_005023 | protein geranylgeranyltransferase type I, beta subunit | Hs.254006 | NM_005023 | 0.8
USP14 | 201672_s_at | NM_005151 | ubiquitin specific peptidase 14 (tRNA-guanine transglycosylase) | Hs.464416 | NM_005151 | 0.81
PURA | 204020_at | NM_005859 | purine-rich element binding protein A | Hs.443121 | BF739943 | 1.2
LMAN1 | 203293_s_at, 203294_s_at | NM_005570 | lectin, mannose-binding, 1 | Hs.465295 | NM_005570, U09716 | 0.82
WDR45L | 209076_s_at | NM_019613 | WDR45-like | Hs.201390 | BC000974 | 0.82
SGCD | 213543_at | NM_000337, NM_172244 | sarcoglycan, delta (35 kDa dystrophin-associated glycoprotein) | Hs.387207 | AA570453 | 1.2
LRP8 | 205282_at | NM_001018054, NM_004631, NM_017522, NM_033300 | low density lipoprotein receptor-related protein 8, apolipoprotein e receptor | Hs.444637 | NM_004631 | 0.78
ITGA4 | 205885_s_at | NM_000885 | integrin, alpha 4 (antigen CD49D, alpha 4 subunit of VLA-4 receptor) | Hs.555880 | L12002 | 0.74

TABLE 1-continued

Gene Symbol | Affymetrix Probe IDs | Refseq Access. | Gene Description | Unigene Access. | Other Genbank Access. | Expression Fold Difference (relapse/non-relapse)
BUB3 | 201458_s_at | NM_001007793, NM_004725 | BUB3 budding uninhibited by benzimidazoles 3 homolog (yeast) | Hs.418533 | NM_004725 | 0.79
KIF18A | 221258_s_at | NM_031217 | kinesin family member 18A | Hs.301052 | NM_031217 | 0.83
FKBP9 | 212169_at | NM_007270 | FK506 binding protein 9, 63 kDa | Hs.103934 | AL050187 | 1.2
ATF6 | 217550_at | NM_007348 | activating transcription factor 6 | Hs.492740 | AA576497 | 1.4
TNFRSF11A | 207037_at | NM_003839 | tumor necrosis factor receptor superfamily, member 11a, NFKB activator | Hs.204044 | NM_003839 | 0.68
KIAA0841 | 213054_at | | KIAA0841 | Hs.7426 | AA845355 | 0.9
TGFB2 | 209909_s_at | NM_003238 | transforming growth factor, beta 2 | Hs.133379 | M19154 | 1.1
ITGB5 | 201125_s_at, 201124_at, 214021_x_at | NM_002213 | integrin, beta 5 | Hs.13155 | NM_002213, AL048423, AI335208 | 1.2
RABGEF1 | 218310_at | NM_014504 | RAB guanine nucleotide exchange factor (GEF) 1 | Hs.530053 | NM_014504 | 1.2
PBX1 | 205253_at, 212148_at | NM_002585 | pre-B-cell leukemia transcription factor 1 | Hs.493096 | NM_002585, AL049381 | 1.2
ZNF148 | 203318_s_at | NM_021964 | zinc finger protein 148 (pHZ-52) | Hs.380334 | NM_021964 | 1.2
ZWINT | 204026_s_at | NM_001005413, NM_001005414, NM_007057, NM_032997 | ZW10 interactor | Hs.42650 | NM_007057 | 0.66
ZDHHC3 | 213675_at | NM_016598 | zinc finger, DHHC-type containing 3 | Hs.61430 | W61005 | 1.3
CDCA8 | 221520_s_at | NM_018101 | cell division cycle associated 8 | Hs.524571 | BC001651 | 0.76
CUTL1 | 214743_at | NM_001913, NM_181500, NM_181552 | cut-like 1, CCAAT displacement protein (Drosophila) | Hs.438974 | BE046521 | 1.3
C18orf9 | 219311_at | NM_024899 | chromosome 18 open reading frame 9 | Hs.236940 | NM_024899 | 0.73
TXNDC | 209476_at | NM_030755 | thioredoxin domain containing | Hs.125221 | AL080080 | 0.75
POLE2 | 205909_at | NM_002692 | polymerase (DNA directed), epsilon 2 (p59 subunit) | Hs.162777 | NM_002692 | 0.73
SPCS3 | 218817_at | NM_021928 | signal peptidase complex subunit 3 homolog (S. cerevisiae) | Hs.42194 | NM_021928 | 0.7
CAND1 | 208839_s_at | NM_018448 | cullin-associated and neddylation-dissociated 1 | Hs.546407 | AL136810 | 0.84
U2AF2 | 218381_s_at | NM_001012478, NM_007279 | U2 (RNU2) small nuclear RNA auxiliary factor 2 | Hs.528007 | NM_007279 | 0.83
WDHD1 | 204728_s_at | NM_001008396, NM_007086 | WD repeat and HMG-box DNA binding protein 1 | Hs.385998 | NM_007086 | 0.73
HEM1 | 209734_at | NM_005337 | hematopoietic protein 1 | Hs.182014 | BC001604 | 0.9
RABEP1 | 214552_s_at | NM_004703 | rabaptin, RAB GTPase binding effector protein 1 | Hs.551518 | AF098638 | 0.84
SYDE1 | 44702_at | NM_033025 | synapse defective 1, Rho GTPase, homolog 1 (C. elegans) | Hs.528701 | R77097 | 1.1

TABLE 1-continued

Gene Symbol | Affymetrix Probe IDs | Refseq Access. | Gene Description | Unigene Access. | Other Genbank Access. | Expression Fold Difference (relapse/non-relapse)
WFDC1 | 219478_at | NM_021197 | WAP four-disulfide core domain 1 | Hs.36688 | NM_021197 | 1.2
TBX2 | 40560_at | NM_005994 | T-box 2 | Hs.531085 | U28049 | 1.1
GART | 210005_at | NM_000819, NM_175085 | phosphoribosylglycinamide formyltransferase, phosphoribosylglycinamide synthetase, phosphoribosylaminoimidazole synthetase | Hs.473648 | D32051 | 0.84
H2AFZ | 213911_s_at, 200853_at | NM_002106 | H2A histone family, member Z | Hs.119192 | BF718636, NM_002106 | 0.8
CD7 | 214551_s_at | NM_006137 | CD7 antigen (p41) | Hs.36972 | NM_006137 | 0.8
ELOVL6 | 210868_s_at | NM_024090 | ELOVL family member 6, elongation of long chain fatty acids (FEN1/Elo2, SUR4/Elo3-like, yeast) | Hs.412939 | BC001305 | 0.81
CACNB3 | 34726_at | NM_000725 | calcium channel, voltage-dependent, beta 3 subunit | Hs.250712 | U07139 | 1.2
TAP1 | 202307_s_at | NM_000593 | transporter 1, ATP-binding cassette, sub-family B (MDR/TAP) | Hs.352018 | NM_000593 | 0.68
NUP98 | 210793_s_at | NM_005387, NM_016320, NM_139131, NM_139132 | nucleoporin 98 kDa | Hs.524750 | U41815 | 0.75
CHAF1A | 214426_x_at, 203976_s_at | NM_005483 | chromatin assembly factor 1, subunit A (p150) | Hs.79018 | BF062223, NM_005483 | 0.83
EPAS1 | 200878_at | NM_001430 | endothelial PAS domain protein 1 | Hs.468410 | AF052094 | 1.3
RNGTT | 204207_s_at | NM_003800 | RNA guanylyltransferase and 5'-phosphatase | Hs.127219 | AB012142 | 0.8
KLF7 | 204334_at | NM_003709 | Kruppel-like factor 7 (ubiquitous) | Hs.471221 | AA488672 | 1.1
C4orf16 | 219023_at | NM_018569 | chromosome 4 open reading frame 16 | Hs.435991 | NM_018569 | 0.77
YBX2 | 219704_at | NM_015982 | Y box binding protein 2 | Hs.380691 | NM_015982 | 0.75
IVD | 216958_s_at | NM_002225 | isovaleryl Coenzyme A dehydrogenase | Hs.513646 | AK022777 | 0.81
PEG3 | 209242_at | NM_006210 | paternally expressed 3 | Hs.201776 | AL042588 | 1.2
FBXL14 | 213145_at | NM_152441 | F-box and leucine-rich repeat protein 14 | Hs.367956 | BF001666 | 0.83
TMEPAI | 217875_s_at | NM_020182, NM_199169, NM_199170, NM_199171 | transmembrane, prostate androgen induced RNA | Hs.517155 | NM_020182 | 1.4
RNF138 | 218738_s_at | NM_016271, NM_198128 | ring finger protein 138 | Hs.302408, Hs.501040 | NM_016271 | 0.82
DNM1L | 203105_s_at | NM_005690, NM_012062, NM_012063 | dynamin 1-like | Hs.550499 | NM_012062 | 0.87
LHCGR | 215306_at | NM_000233 | luteinizing hormone/choriogonadotropin receptor | Hs.468490 | AL049443 | 1.3
SOCS6 | 214462_at, 206020_at | NM_004232 | suppressor of cytokine signaling 6 (SOCS6) | Hs.591068 | NM_004232, NM_016387 | 0.85

TABLE 1-continued

Gene Symbol | Affymetrix Probe IDs | Refseq Access. | Gene Description | Unigene Access. | Other Genbank Access. | Expression Fold Difference (relapse/non-relapse)
CEP350 | 213956_at | NM_014810 | centrosomal protein 350 kDa | Hs.413045 | AW299294 | 1.3
PTGER3 | 210374_x_at, 210831_s_at | NM_000957, NM_198712, NM_198713, NM_198714, NM_198715, NM_198716, NM_198717, NM_198718, NM_198719, NM_198720 | prostaglandin E receptor 3 (subtype EP3) | Hs.445000 | D38300, L27489 | 1.1
M11S1 | 200723_s_at | NM_005898, NM_203364 | membrane component, chromosome 11, surface marker 1 | Hs.471818 | NM_005898 | 0.9
RFC5 | 203210_s_at | NM_007370, NM_181578 | replication factor C (activator 1) 5, 36.5 kDa | Hs.506989 | NM_007370 | 0.79
INDO | 210029_at | NM_002164 | indoleamine-pyrrole 2,3 dioxygenase | Hs.840 | M34455 | 0.74
KIAA0286 | 212619_at | NM_015257 | NA | Hs.533787 | AW205215 | 0.77
MOBK1B | 201298_s_at | NM_018221 | MOB1, Mps One Binder kinase activator-like 1B (yeast) | Hs.196437 | BC003398 | 0.84
FLJ20273 | 218035_s_at | NM_019027 | RNA-binding protein | Hs.518727 | NM_019027 | 0.73
HADHSC | 211569_s_at | NM_005327 | L-3-hydroxyacyl-Coenzyme A dehydrogenase, short chain | Hs.438289 | AF001903 | 0.62
SSPN | 204964_s_at | NM_005086 | sarcospan (Kras oncogene-associated gene) | Hs.183428 | NM_005086 | 1.6
AP2B1 | 200615_s_at | NM_001030006, NM_001282 | adaptor-related protein complex 2, beta 1 subunit | Hs.514819 | AL567295 | 0.77
EIF4A1 | 201530_x_at, 214805_at | NM_001416 | eukaryotic translation initiation factor 4A, isoform 1 | Hs.129673 | NM_001416, U79273 | 0.79
DEPDC1 | 220295_x_at | NM_017779 | DEP domain containing 1 | Hs.445098 | NM_017779 | 0.66
AGPAT5 | 218096_at | NM_018361 | 1-acylglycerol-3-phosphate O-acyltransferase 5 (lysophosphatidic acid acyltransferase, epsilon) | Hs.490899 | NM_018361 | 0.68
HNRPDL | 201993_x_at | NM_005463, NM_031372 | heterogeneous nuclear ribonucleoprotein D-like | Hs.527105 | NM_005463 | 0.86
GBP1 | 202270_at | NM_002053 | guanylate binding protein 1, interferon-inducible, 67 kDa | Hs.62661, Hs.443527 | NM_002053 | 0.61
AMIGO2 | 222108_at | NM_181847 | adhesion molecule with Ig-like domain 2 | Hs.121520 | AC004010 | 1.6
XPO7 | 208459_s_at | NM_015024 | exportin 7 | Hs.172685 | NM_015024 | 0.78
PAWR | 204005_s_at | NM_002583 | PRKC, apoptosis, WT1, regulator | Hs.406074 | NM_002583 | 0.71
NARS | 200027_at | NM_004539 | asparaginyl-tRNA synthetase | Hs.465224 | NM_004539 | 0.84
CENPA | 204962_s_at | NM_001809 | centromere protein A, 17 kDa | Hs.1594 | NM_001809 | 0.69

TABLE 1-continued Colorectal Cancer Predictive Markers (corresponding to Affymetrix GeneChip probes that show statistically significant differential expression, P < 0.05, as ascertained by BRB Array Tools Expression Fold Difference Other (relapse? Gene Affymetrix Probe Genbank Oil Symbol IDS Refseq Access. Gene Description Unigene Access. Access. relapse) KIF15 219306 at NM 020242 kinesin family HS.307529 NM 020242 O.78 member 15 ZNF518 204291 at NM 014803 Zinc finger protein HS.14789S NM 014803 O.88 518 LPP 202821 s at NM OO5578 LIM domain HS.444362 ALO44O18 1.3 containing preferred translocation partner in lipoma BRRN1 212949 at NM O15341 barren homolog HS.308045 D38553 O.76 (Drosophila) CSOrfa. 48031 r at NM 016348, chromosome 5 open HS.519694 H93.077 1.2 NM 032385 reading frame 4 UBAP1 46270 at NM O16525 ubiquitin associated HS.268963 ALO394.47 1.1 protein 1 SH3GLB1 209090 s. at NM 016009 SH3-domain GRB2- HS136309 ALO49597 1.2 like endophilin B1 CDKN1C 213182 X at NM OOOO76 cyclin-dependent HS.106O70 R786.68 1.4 kinase inhibitor 1C (p57, Kip2) MCM10 220651 s at NM 018518, MCM10 HS.198363 NM 018518 O.74 NM 182751 minichromosome maintenance deficient 10 (S. cerevisiae) KIAAO265 209254 at NM O14997 KIAA0265 protein HS.S2O710 AI8O8625 1.2 BUB1 209642 at NM 004336 BUB1 budding HS.469649 AFO43294 O.68 uninhibited by benzimidazoles 1 homolog (yeast) LGALS3BP 200923 at NM 005567 lectin, galactoside- HS.S1453S NM 005567 O.8 binding, soluble, 3 binding protein NCAPD2 201774 S at NM O14865 non-SMC condensin HSS719 AKO22511 0.73 I complex, Subunit D2 CD86 205686 s at NM OO6889, CD86 antigen HS.171182 NM OO6889 O.88 NM 175862 (CD28 antigen ligand 2, B7-2 antigen) C16orf50 219315 s at NM O24600 chromosome 16 HS.459652 NM O24600 1.2 open reading frame 30 RBBP8 203344 S at NM 002894, retinoblastoma HS.546282 NM 002894 0.79 NM 203291, binding protein 8 NM 203292 FEM1C 213341 at NM O2O177 fem-1 homologic HS.47367 AI862658 O.82 (C. 
elegans) NUP160 214962 s at NM O15231 nucleoporin 160 kDa HS.372099 AKO26236 O.84 WAMP4 213480 at NM 003762, vesicle-associated HS. 6651 AFOS2100 1.1 NM 201994 membrane protein 4 C9orf76 218979 at NM O24945 chromosome 9 open HS.2841.37 NM O24945 O.8 reading frame 76 DHX15 201386 s at NM OO1358 DEAH (Asp-Glu- HS.S683 AF279891 O.83 Ala-His) box polypeptide 15 RIG 221127 s at regulated in glioma. HS.2921S6 NM OO6394 1.2 HBP1 209102 s at NM 012257 HMG-box HS.162032 AFO 19214 1.2 transcription factor 1 ABCE1 201873 s at, NM 002940 ATP-binding HS.12O13 NM 002940, 0.79 201872 s at cassette, Sub-family, AIOO2OO2 E (OABP), member 1 PPA2 220741 s at NM OO6903, pyrophosphatase HS.480452 NM OO6903 O.81 NM 176866, (inorganic) 2 NM 176867, NM 176869 CPD 201942 s at NM OO1304 carboxypeptidase D HS.446079 D85390 O.68 KIAAO828 215672 s at NM O15328 adenosylhomocysteinase 3 Hs.195058 AKO2S372 0.73 US 2016/006891.6 A1 Mar. 10, 2016 17

TABLE 1-continued Colorectal Cancer Predictive Markers (corresponding to Affymetrix GeneChip probes that show statistically significant differential expression, P < 0.05, as ascertained by BRB Array Tools Expression Fold Difference Other (relapse? Gene Affymetrix Probe Genbank Oil Symbol IDS Refseq Access. Gene Description Unigene Access. Access. relapse) K- 211058 x at NM OO6082 alpha tubulin HS.S24390 BCOO6379 O.85 ALPHA-1 RNMT 202684 S at NM 003799 RNA (guanine-7-) HS.8086 ABO2O966 O.9 methyltransferase MIS12 221559 s at NM 024039 MIS12 homolog HS.2671.94 BCOOO229 O.8 (yeast) AURKB 2094.64 at NM 004217 aurora kinase B HS.442658 ABO11446 O.71 FAM64A 221591 s at NM O19013 amily with HS.404323 BCOOSOO4 O.8 sequence similarity 64, member A TAP2 204770 at NM 000544, transporter 2, ATP- HSSO2 NM 000544 O.82 NM 018833 binding cassette, Sub-family B (MDR/TAP) PCDHGC3 205717 X at NM 002588, protocadherin HS-36816O NM OO2588 1.2 NM 032402, gamma Subfamily C, 3 NM 032403 AVEN 219366 at NM O2O371 apoptosis, caspase HSSSS966 NM O2O371 1.1 activation inhibitor HMGB2 208808 s at NM 002129 high-mobility group HS.434953 BCOOO903 O.76 box2 CDC2 203214 X at NM 001786, cell division cycle 2, HS.334562 NM OO1786 0.72 NM 033379 G1 to Sand G2 to M RIF1 214700 x at NM 0181.51 RAP1 interacting HS.S36537 AKOOO323 O.84 actor homolog (yeast) TCF7L2 216511 s at NM 030756 transcription factor HSSO108O AJ270770 O.8 7-like 2 (T-cell specific, HMG-box) KIF11 204444 at NM OO4523 kinesin family HS.8878 NM OO4523 O.68 member 11 TTC19 217964 at NM O17775 etratricopeptide HS.462316 NM O17775 0.67 repeat domain 19 MDSO32 221706 s at NM 018467 uncharacterized HS.161.87 BCOO600S 1.2 hematopoietic stem progenitor cells protein MDSO32 PSMA3 201532 at NM 002788, O(8SOc HS.S31089 NM OO2788 O.76 NM 152132 (prosome, macropain) subunit, alpha type, 3 PDGFA 205463 s at platelet-derived Hs.376032, NM 002607 1.3 growth factor alpha HS.S21331 polypeptide GTF2H2 221540 x at NM OO1515 general transcription Hs. 
191356, AFO78847 O.86 actor IIH, HS.398348 polypeptide 2, 44 kDa CXCL13 205242 at NM 006419 chemokine (C-X-C HS.100431 NM 006419 O.36 motif) ligand 13 (B- cell chemoattractant) FOXM1 202580 x at NM 021953, orkhead box M1 HS.239 NM O21953 0.7 NM 202002, NM 202003 YARS 212048 s at NM OO3680 tyrosyl-tRNA HS.213264 AW2454OO O.87 synthetase SE57-1 22O180 at NM O25214 coiled-coil domain Hs.120790 NM O25214 0.77 containing 68 CLCA4 220026 at NM 012128 chloride channel, HS.546343 NM 012128 O.64 calcium activated, family member 4 MCAM 211340 S at NM OO6500 melanoma cell HS.S11397 M28882 1.2 adhesion molecule PBXIP1 214177 s at NM 020524 pre-B-cell leukemia HSSOS806 AI935162 1.2 transcription factor interacting protein 1 US 2016/006891.6 A1 Mar. 10, 2016 18

TABLE 1-continued
Colorectal Cancer Predictive Markers (corresponding to Affymetrix GeneChip probes that show statistically significant differential expression, P < 0.05, as ascertained by BRB Array Tools)

Gene Symbol | Affymetrix Probe IDs | Refseq Access. | Gene Description | Unigene Access. | Other Genbank Access. | Expression Fold Difference (relapse/non-relapse)
PPM1D | 204566_at | NM_003620 | protein phosphatase 1D magnesium-dependent, delta isoform | Hs.286073 | NM_003620 | 0.88
FLJ22471 | 218175_at | NM_025140 | NA | Hs.114111 | NM_025140 | 1.2
ZBTB20 | 205383_s_at | NM_015642 | zinc finger and BTB domain containing 20 | Hs.122417 | NM_015642 | 1.4
RRM2 | 209773_s_at | NM_001034 | ribonucleotide reductase M2 polypeptide | Hs.226390 | BC001886 | 0.69

TABLE 2 Markers with expression correlating to that of the 22 genes from NZ signature. Expression Fold Difference (relapse Affymetrix Refseq Unigene Genbank Oil Gene Symbol Probe IDs Access Gene Description Access Access relapse) CCL5 1405 i at, NM OO2985 chemokine (C-C motif) Hs.514821 M21121, O.69 204655 a. ligand 5 NM 002985 SFRS10 200893 a NM 004593 splicing factor, Hs.533122 NM 004593 O.96 arginine serine-rich 10 (transformer 2 homolog, Drosophila) HLA-E 200904 a NM 005516 major histocompatibility HS.3810O8 XS6841 1 complex, class I, E K-ALPHA-1 201090 X at NM 006082 alpha tubulin Hs.524390 NM 006082 O.87 PSMAS 201274 a NM OO2790 proteasome (prosome, HS.485246 NM OO2790 O.9S macropain) subunit, alpha type, 5 TOP2A 201292 a NM OO1067 topoisomerase (DNA) II HS.156346 ALS 61,834 O.77 alpha 170 kDa EBNA1BP2 201323 a NM OO6824 EBNA1 binding protein 2 HS.346868 NM OO6824 O.98 SNRPC 201342 a. NM 003093 Small nuclear HS.1063 NM OO3093 1 ribonucleoprotein polypeptide C UBE2IL6 201649 a NM 0.04223, ubiquitin-conjugating HS.425777 NM 004223 0.75 NM 198183 enzyme E2L 6 LAPTMS 201720 S at NM OO6762 lysosomal associated HS.371021 AIS89086 O.89 multispanning membrane protein 5 CTSL 202087 s at NM OO1912, cathepsin L. Hs.418123 NM OO1912 0.97 NM 145918 GBP1 202269 X at NM 002053 guanylate binding protein Hs.62661, BCOO2666 O.69 1, interferon-inducible, HS.443S27 67kDa TNFAIP2 2025.10 S at NM OO6291 tumor necrosis factor, HS.S256O7 NM OO6291 O.91 alpha-induced protein 2 CCNB2 202705 at NM 004701 cyclin B2 Hs.194698 NM 004701 O.83 GBP2 202748 at NM 004120 guanylate binding protein Hs.386567 NM 004120 O.87 2, interferon-inducible CDC2O 202870 S at NM OO1255 CDC20 cell division Hs.524947 NM OO1255 O.78 cycle 20 homolog (S. cerevisiae) HAT1 2031.38 at NM 001033085, histone acetyltransferase 1 HS.47O611 NM 003642 O.9S NM 003.642 SPAGS 203145 at NM OO6461 sperm associated antigen 5 Hs.514033 NM OO6461 O.87 US 2016/006891.6 A1 Mar. 10, 2016 19

TABLE 2-continued Markers with expression correlating to that of the 22 genes from NZ signature. Expression Fold Difference (relapse Affymetrix Refseq Unigene Genbank Oil Gene Symbol Probe IDs Access Gene Description Access Access relapse) RFCS 203209 a NM 007370, replication factor C HSSO6989 BCOO1866 0.79 NM 181578 (activator 1) 5, 36.5 kDa MYCBP 203360 s at NM 012333 c-myc binding protein HS.370040 DSO692 1 BUB1B 2O3755 a. NM 001211 BUB1 budding HS.36708 NM 001211 O.85 uninhibited by benzimidazoles 1 homolog beta (yeast) SLA 203761 a NM OO6748 Src-like-adaptor HS.7S367 NM OO6748 0.97 WRK1 203856 a NM OO3384 vaccinia related kinase 1 HS.422662 NM 003384 0.72 PIK3CD 203879 a NM 005O26 phosphoinositide-3- HS.S.18451 U86453 O.99 kinase, catalytic, delta polypeptide HLA-DMB 203932 a NM 002118 major histocompatibility HS.1162 NM 002118 O.82 complex, class II, DM beta TRIP13 204033 a NM 004237 thyroid hormone receptor HS.436187 NM 004237 O.78 interactor 13 RARRES3 204070 a NM 004585 retinoic acid receptor HS.17466 NM 004585 O.96 responder (tazaroteine induced) 3 CKS2 20417 O S at NM 001827 CDC28 protein kinase HS.837.58 NM OO1827 O.8 regulatory Subunit 2 APOBEC3G 204205 at NM O21822 apolipoprotein B mRNA HS.474853 NM O21822 O.74 editing enzyme, catalytic polypeptide-like 3G PSMB9 204279 at NM 002800, proteasome (prosome, HS-381081 NM 002800 O.63 NM 148954 macropain) subunit, beta type, 9 (large multifunctional peptidase 2) FUSIP1 204299 at NM 054O16 FUS interacting protein HS.3S30 NM O21993 O.9 (serinefarginine-rich) 1 SELL 204563 at NM OOO655 selectin L (lymphocyte HS.82848 NM OOO655 O.88 adhesion molecule 1) DKK1 204602 at NM 012242 dickkopfhomolog 1 HS.40499 NM 012242 O.9S (Xenopus laevis) KIF23 204709 s at NM 004856, kinesin family member Hs.270845 NM 004.856 O.9 NM 138555 23 TTK 204822 at NM OO3318 TTK protein kinase Hs.169840 NM 003318 O.8 ECGF1 204858 s at NM OO1953 endothelial cell growth Hs.546251 NM OO1953 O.85 actor 1 (platelet-derived) LCP2 205269 at, NM 
005565 ymphocyte cytosolic Hs.304475 AI123251, O.91 205270 s at protein 2 (SH2 domain NM 005565 containing leukocyte protein of 76 kDa) BTN2A2 205298 s at NM O06995, butyrophilin, subfamily HS.373938 W58757 O.94 NM 181531 2, member A2 BMP5 205431 S. at NM 021073 bone morphogenetic Hs.296648 NM 021073 O.9 protein 5 GZMA 205488 at NM 006144 granzyme A (granzyme HS.90708 NM 006144 O.68 1, cytotoxic T ymphocyte-associated serine esterase 3) SMURF2 205596 s at NM 022739 SMAD specific E3 HS.S15O11 AYO1418O 1 ubiquitin protein ligase 2 CD8A 205758 at NM OO1768, CD8 antigen, alpha HS.852S8 AWOO673S O.78 NM 171827 polypeptide (p32) CD2 205831 at NM OO1767 CD2 antigen (p50), sheep Hs.523500 NM OO1767 O.87 red blood cell receptor JAK2 205842 s at NM 004972 Janus kinase 2 (a protein HS.43437.4 AFOO1362 O.86 tyrosine kinase) UBD 205890 S at NM OO6398 ubiquitin D HS.44532 NM OO6398 O41 ADH1C 206262 at NM OOO669 alcohol dehydrogenase HS.2523 NM OOO669 O.33 1C (class I), gamma polypeptide AIM2 206513 at NM 004.833 absent in melanoma 2 Hs.281898 NM 004.833 O.91 US 2016/006891.6 A1 Mar. 10, 2016 20

TABLE 2-continued Markers with expression correlating to that of the 22 genes from NZ signature. Expression Fold Difference (relapse Affymetrix Refseq Unigene Genbank Oil Gene Symbol Probe IDs Access Gene Description Access Access relapse) SI 206664 at NM 001041 Sucrase-isomaltase HS.429596 NM 001041 O.39 (alpha-glucosidase) NAT2 206797 at NM OOOO15 N-acetyltransferase 2 Hs.2 NM OOOO15 O.82 (arylamine N acetyltransferase) SP110 208012 X at NM 004509, SP110 nuclear body Hs.145150 NM 004.509 O.9S NM 0.04510, protein NM 080424 PRDX1 208680 at NM 002574, peroxiredoxin 1 HS.180909 L1918.4 1 NM 181696, NM 181697 PSMA6 208805 at NM OO2791 proteasome (prosome, HS.446260 BCOO2979 O.87 macropain) subunit, alpha type, 6 IFI16 208966 x at NM 005531 interferon, gamma- HS.38O2SO AF208043 1.2 inducible protein 16 PPIG 208995 S at NM 004792 peptidyl-prolyl isomerase HS.470544 U4O763 O.98 G (cyclophilin G) KIF2C 2094.08 at, NM OO6845 kinesin family member HS.69360 U63743, 0.75 211519 s at 2C AYO26505 APOL1 209546 s at NM 003661, apolipoprotein L, 1 HS.114309 AF323S40 O.98 NM 145343, NM 145344 CD74 209619 at NM 001025 158, CD74 antigen (invariant HS.436568 KO1144 O.76 NM 001025 159, polypeptide of major NM 004355 histocompatibility complex, class II antigen associated) HMMR 209709 s at NM 012484, hyaluronan-mediated Hs.725SO U29343 O.84 NM 012485 motility receptor (RHAMM) CDKN3 209714 S at NM 005.192 cyclin-dependent kinase HS.841.13 AF213O33 O.71 inhibitor 3 (CDK2 associated dual specificity phosphatase) BUB3 209974 s at NM 001007793, BUB3 budding HS.418533 AFO47473 O.84 NM 004725 uninhibited by benzimidazoles 3 homolog (yeast) SOCS1 210001 S. at NM 003745 Suppressor of cytokine HS.50640 ABOOSO43 O.93 signaling 1 CD32. 
210031 at NM 000734, CD37, antigen, Zeta HS.156445 JO4132 O.87 NM 198053 polypeptide (TIT3 complex) CACYBP 210691 s at NM 001007214, calcyclin binding protein HSSO8524 AF2758O3 0.97 NM O14412 HLA-DRA 210982 s at NM 019111 major histocompatibility HS.S2OO48 M60333 O.74 complex, class II, DR alpha NEK2 211080 S at NM 002497 NIMA (never in mitosis HS.153704 Z2S425 O.77 genea)-related kinase 2 NF2 211091 s at NM 000268, neurofibromin 2 HS.1878.98 AF122828 O.96 NM 016418, (bilateral acoustic NM 181825, neuroma) NM 181826, NM 181827, NM 181828, NM 181829, NM 181830, NM 181831, NM 181832, NM 181833, NM 181834, NM 181835 FYB 211795 s at NM 001465, FYN binding protein HS.370503 AF198052 O.83 NM 1993.35 (FYB-120/130) US 2016/006891.6 A1 Mar. 10, 2016 21

TABLE 2-continued Markers with expression correlating to that of the 22 genes from NZ signature. Expression Fold Difference (relapse Affymetrix Refseq Unigene Genbank Oil Gene Symbol Probe IDs Access Gene Description Access Access relapse) HLA-DPA1 211991 s at NM 033554 major histocompatibility HS-347270 M27487 0.75 complex, class II, DP alpha 1 PTPRC 212587 s at, NM 002838, protein tyrosine Hs.192039 AI809341, O.77 212588 at NM 080921, phosphatase, receptor YOOO62 NM 080922, type, C NM 080923 SP3 213168 at NM 001017371, Sp3 transcription factor HS.S31587 AU145OOS O.98 NM 003111 ITGAL 213475 S. at NM 002209 integrin, alpha L (antigen HS.1741.03 ACOO2310 O.85 CD11A (p180), lymphocyte function associated antigen 1, alpha polypeptide) RAC2 213603 s at NM 002872 ras-related C3 botulinum HS.S176O1 BE138888 O.92 toxin substrate 2 (rho family, Small GTP binding protein Rac2) DNA2L. 213647 at DNA2 DNA replication HS.532446 D42046 O.87 helicase 2-like (yeast) TRAF3IP3 213888 s at NM O25228 TRAF3 interacting HS.147434 ALO22398 O.86 protein 3 NKG7 213915 at NM 005 601 natural killer cell group 7 HS.10306 NM 005 601 0.72 Sequence SFRS7 214141 X at NM 001031684, splicing factor, HS.309090 BFO333S4 O.88 NM OO6276 arginine serine-rich 7, 35 kDa ZG16 214142 at NM 152338 zymogen granule protein HS.184507 AIf3290S O.18 16 PRF1 214617 at NM O05041 perforin 1 (pore forming HS.2200 AI445650 O.81 protein) CCNB1 214710 S at NM 031966 cyclin B1 HS.2396O BE4O7516 O.63 KIAAO907 214995 S at NM O14949 KIAAO907 HS.24656 BFSO8948 O.82 GTSE1 215942 s at NM 016426 G-2 and S-phase Hs.386.189, BF973 178 O.86 expressed 1 HS.47S140 HMGB3 216548 x at NM 005342 high-mobility group box 3 HS.19114 ALO49709 0.97 HLA-DMA 217478 s at NM 006120 major histocompatibility Hs.351279 X76775 O.8 complex, class II, DM alpha C20orfas 217851 S. 
at NM 016045 chromosome 20 open HS.3945 NM 016045 1.1 reading frame 45 MRPL42 217919 s a NM 014050, mitochondrial ribosomal HS.199579 BEf82148 0.79 NM 172177, protein L42 NM 172178 NUSAP1 218039 at, NM 016359, nucleolar and spindle Hs.511093 NM 016359, O.92 219978 s at NM 018.454 associated protein 1 NM 018454 TMEM48 218073 s at NM 018087 transmembrane protein 48 HS.476525 NM 018087 O.71 DHX40 218277 s at NM O24612 DEAH (Asp-Glu-Ala- HS.29403 NM 024612 1.1 His) box polypeptide 40 NFS1 21.8455 at NM 021100, NFS1 nitrogen fixation 1 Hs.194692 NM 021100 1 NM 181679 (S. cerevisiae) C10orf 218542 at NM 018131 open HS.14559 NM 018131 O.77 reading frame 3 NCAPG 218663 at NM 022346 non-SMC condensin I Hs.4462O1, NM 022346 0.73 complex, Subunit G HS.47927O FBXO5 218875 S. at NM 012177 F-box protein 5 Hs.520506 NM 012177 O.89 SLAMF8 219385 at NM O2O125 SLAM family member 8 Hs.438683 NM O2O125 O.94 CENPN 219555 S at NM 018455 centromere protein N Hs.283532 NM 018455 O.81 ATP13A3 219558 at ATPase type 13 A3 Hs.529609 NM O24524 0.75 ECT2 219787 s at NM 018098 epithelial cell Hs.518299 NM 018098 0.75 transforming sequence 2 Oncogene ASPM 21991.8 s at NM 018136 asp (abnormal spindle)- Hs.121028 NM 018123 O.89 like, microcephaly associated (Drosophila) ZC3HAV1 22O104 at NM 020119, Zinc finger CCCH-type, Hs.133512 NM O2O119 O.93 NM O24625 antiviral 1 US 2016/006891.6 A1 Mar. 10, 2016 22

TABLE 2-continued
Markers with expression correlating to that of the 22 genes from NZ signature.

Gene Symbol | Affymetrix Probe IDs | Refseq Access | Gene Description | Unigene Access | Genbank Access | Expression Fold Difference (relapse/non-relapse)
CLEC2D | 220132_s_at | NM_001004419, NM_001004420, NM_013269 | C-type lectin superfamily 2, member D | Hs.268326 | NM_013269 | 0.91
MS4A12 | 220834_at | NM_017716 | membrane-spanning 4-domains, subfamily A, member 12 | Hs.272789 | NM_017716 | 0.5
C1orf112 | 220840_s_at | NM_018186 | chromosome 1 open reading frame 112 | Hs.443551 | NM_018186 | 0.96
TPRT | 220865_s_at | NM_014317 | trans-prenyltransferase | Hs.555924 | NM_014317 | 0.92
APOL3 | 221087_s_at | NM_014349, NM_030644, NM_145639, NM_145640, NM_145641, NM_145642 | apolipoprotein L, 3 | Hs.474737 | NM_014349 | 0.84
C14orf156 | 221434_s_at | NM_031210 | chromosome 14 open reading frame 156 | Hs.324521 | NM_031210 | 0.9
YTHDF3 | 221749_at | NM_152758 | YTH domain family, member 3 | Hs.491861 | AU157915 | 0.95
LOC146909 | 222039_at | | hypothetical protein LOC146909 | Hs.135094 | AA292789 | 0.83
TRAFD1 | 35254_at | NM_006700 | TRAF-type zinc finger domain containing 1 | Hs.5148 | AB007447 | 0.98
ESPL1 | 38158_at | NM_012291 | extra spindle poles like 1 (S. cerevisiae) | Hs.153479 | D79987 | 0.87
BTN3A3 | 38241_at | NM_006994, NM_197974 | butyrophilin, subfamily 3, member A3 | Hs.167741 | U90548 | 0.9
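
The final column of Tables 1 and 2 is the expression fold difference, read here as the ratio of expression in relapsing tumours to expression in non-relapsing tumours. The following Python sketch is illustrative only: the function names and the mean-intensity inputs are hypothetical, and the tables do not state how the group summaries were computed, so a simple ratio of group means is assumed. The three example ratios are the values listed in Table 2 for ZG16, HLA-E and IFI16.

```python
def fold_difference(relapse_mean: float, non_relapse_mean: float) -> float:
    """Ratio of mean expression, relapse over non-relapse (assumed definition)."""
    if relapse_mean < 0 or non_relapse_mean <= 0:
        raise ValueError("means must be non-negative and the denominator positive")
    return relapse_mean / non_relapse_mean


def direction(fold: float) -> str:
    """Say which group shows the higher expression for a given fold difference."""
    if fold < 1.0:
        return "lower in relapse"
    if fold > 1.0:
        return "higher in relapse"
    return "no difference"


# Fold differences as listed in Table 2.
table2_examples = {"ZG16": 0.18, "HLA-E": 1.0, "IFI16": 1.2}
summary = {gene: direction(fold) for gene, fold in table2_examples.items()}

# Hypothetical group means, used only to show the ratio itself.
example_fold = fold_difference(relapse_mean=90.0, non_relapse_mean=500.0)
```

On this reading, a value such as 0.18 for ZG16 indicates expression in relapsing tumours at under one fifth of the level seen in non-relapsing tumours, while values near 1 indicate little difference between the groups.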

General Approaches to Prognostic Marker Detection

[0059] The following approaches are non-limiting methods that can be used to detect the proliferation markers, including CCPM family members: microarray approaches using oligonucleotide probes selective for a CCPM; real-time qPCR on tumour samples using CCPM specific primers and probes; real-time qPCR on lymph node, blood, serum, faecal, or urine samples using CCPM specific primers and probes; enzyme-linked immunological assays (ELISA); immunohistochemistry using anti-marker antibodies; and analysis of array or qPCR data using computers.

[0060] Other useful methods include northern blotting and in situ hybridization (Parker and Barnes, Methods in Molecular Biology 106: 247-283 (1999)); RNase protection assays (Hod, BioTechniques 13: 852-854 (1992)); reverse transcription polymerase chain reaction (RT-PCR; Weis et al., Trends in Genetics 8: 263-264 (1992)); serial analysis of gene expression (SAGE; Velculescu et al., Science 270: 484-487 (1995); and Velculescu et al., Cell 88: 243-51 (1997)), MassARRAY technology (Sequenom, San Diego, Calif.), and gene expression analysis by massively parallel signature sequencing (MPSS; Brenner et al., Nature Biotechnology 18: 630-634 (2000)). Alternatively, antibodies may be employed that can recognize specific complexes, including DNA duplexes, RNA duplexes, and DNA-RNA hybrid duplexes or DNA-polypeptide duplexes.

[0061] Primary data can be collected and fold change analysis can be performed, for example, by comparison of marker expression levels in tumour tissue and non-tumour tissue; by comparison of marker expression levels to levels determined in recurring tumours and non-recurring tumours; by comparison of marker expression levels to levels determined in tumours with or without metastasis; by comparison of marker expression levels to levels determined in differently staged tumours; or by comparison of marker expression levels to levels determined in cells with different levels of proliferation. A negative or positive prognosis is determined based on this analysis. Further analysis of tumour marker expression includes matching those markers exhibiting increased or decreased expression with expression profiles of known colorectal tumours to provide a prognosis.

[0062] A threshold for concluding that expression is increased will be dependent on the particular marker and also the particular predictive model that is to be applied. The threshold is generally set to achieve the highest sensitivity and selectivity with the lowest error rate, although variations may be desirable for a particular clinical situation. The desired threshold is determined by analysing a population of sufficient size taking into account the statistical variability of any predictive model and is calculated from the size of the sample used to produce the predictive model. The same applies for the determination of a threshold for concluding that expression is decreased. It can be appreciated that other thresholds, or methods for establishing a threshold, for concluding that increased or decreased expression has occurred can be selected without departing from the scope of this invention.

[0063] It is also possible that a prediction model may produce as its output a numerical value, for example a score, likelihood value or probability. In these instances, it is possible to apply thresholds to the results produced by prediction
models, and in these cases similar principles apply as those reaction. The two most commonly used reverse transcriptases used to set thresholds for expression values. are avian myeloblastosis virus reverse transcriptase (AMV 0064. Once the expression level, or output of a prediction RT) and Moloney murine leukaemia virus reverse tran model, of a predictive signature in a tumour sample has been scriptase (MMLV-RT). The reverse transcription step is typi obtained, the likelihood of the cancer recurring can then be cally primed using specific primers, random hexamers, or determined. oligo-dT primers, depending on the circumstances and the 0065. From the markers identified, prognostic signatures goal of expression profiling. For example, extracted RNA can comprising one or more CCPMs can be used to determine the be reverse-transcribed using a GeneAmp RNA PCR kit (Per prognosis of a cancer, by comparing the expression level of kin Elmer, CA, USA), following the manufacturers instruc the one or more markers to the disclosed prognostic signature. tions. The derived cDNA can then be used as a template in the By comparing the expression of one or more of the CCPMs in subsequent PCR reaction. a tumour sample with the disclosed prognostic signature, the likelihood of the cancer recurring can be determined. The 0071 Although the PCR step can use a variety of thermo comparison of expression levels of the prognostic signature to stable DNA-dependent DNA polymerases, it typically establish a prognosis can be done by applying a predictive employs the Taq DNA polymerase, which has a 5'-3' nuclease model as described previously. activity but lacks a 3'-5' proofreading endonuclease activity. 0066 Determining the likelihood of the cancer recurring Thus, TaqMan (q) PCR typically utilizes the 5' nuclease activ is of great value to the medical practitioner. 
A high likelihood ity of Taq or Tth polymerase to hydrolyze a hybridization of re-occurrence means that alonger or higher dose treatment probe bound to its target amplicon, but any enzyme with should be given, and the patient should be more closely moni equivalent 5' nuclease activity can be used. tored for signs of recurrence of the cancer. An accurate prog 0072. Two oligonucleotide primers are used to generate an nosis is also of benefit to the patient. It allows the patient, amplicon typical of a PCR reaction. A third oligonucleotide, along with their partners, family, and friends to also make or probe, is designed to detect nucleotide sequence located decisions about treatment, as well as decisions about their between the two PCR primers. The probe is non-extendible future and lifestyle changes. Therefore, the invention also by Taq DNA polymerase enzyme, and is labeled with a provides for a method establishing a treatment regime for a reporterfluorescent dye and a quencher fluorescent dye. Any particular cancer based on the prognosis established by laser-induced emission from the reporter dye is quenched by matching the expression of the markers in a tumour sample the quenching dye when the two dyes are located close with the differential expression signature. together as they are on the probe. During the amplification 0067. It will be appreciated that the marker selection, or reaction, the Taq DNA polymerase enzyme cleaves the probe construction of a prognostic signature, does not have to be in a template-dependent manner. The resultant probe frag restricted to the CCPMs disclosed in Tables 1, 2, or 5, herein, ments disassociate in Solution, and signal from the released or the prognostic signatures disclosed in Tables 3, 4, 8A, 8B, reporter dye is free from the quenching effect of the second and 9, but could involve the use of one or more CCPMs from fluorophore. 
One molecule of reporter dye is liberated for the disclosed signatures, or a new signature may be estab each new molecule synthesized, and detection of the lished using CCPMs selected from the disclosed marker lists. unduenched reporter dye provides the basis for quantitative The requirement of any signature is that it predicts the like interpretation of the data. lihood of recurrence with enough accuracy to assista medical 0073 TaqMan RT-PCR can be performed using commer practitioner to establish a treatment regime. cially available equipment, such as, for example, ABIPRISM Reverse Transcription PCR (RT-PCR) 7700 Sequence Detection System (Perkin-Elmer-Applied Biosystems, Foster City, Calif., USA), or Lightcycler (Roche 0068. Of the techniques listed above, the most sensitive Molecular Biochemicals, Mannheim, Germany). In a pre and most flexible quantitative method is RT-PCR, which can ferred embodiment, the 5' nuclease procedure is run on a be used to compare RNA levels in different sample popula real-time quantitative PCR device such as the ABI PRISM tions, in normal and tumour tissues, with or without drug 7700tam Sequence Detection System. The system consists of treatment, to characterize patterns of expression, to discrimi athermocycler, laser, charge-coupled device (CCD), camera, nate between closely related RNAs, and to analyze RNA and computer. The system amplifies samples in a 96-well Structure. format on a thermocycler. During amplification, laser-in 0069. For RT-PCR, the first step is the isolation of RNA duced fluorescent signal is collected in real-time through fibre from a target sample. The starting material is typically total optics cables for all 96 wells, and detected at the CCD. The RNA isolated from human tumours or tumour cell lines, and system includes software for running the instrument and for corresponding normal tissues or cell lines, respectively. RNA analyzing the data. 
can be isolated from a variety of Samples, such as tumour samples from breast, lung, colon (e.g., large bowel or Small 0074 5' nuclease assay data are initially expressed as Ct, bowel), colorectal, gastric, esophageal, anal, rectal, prostate, or the threshold cycle. As discussed above, fluorescence val brain, liver, kidney, pancreas, spleen, thymus, testis, ovary, ues are recorded during every cycle and represent the amount uterus, etc., tissues, from primary tumours, or tumour cell of product amplified to that point in the amplification reac lines, and from pooled samples from healthy donors. If the tion. The point when the fluorescent signal is first recorded as source of RNA is a tumour, RNA can be extracted, for statistically significant is the threshold cycle. example, from frozen or archived paraffin-embedded and 0075 To minimize errors and the effect of sample-to fixed (e.g., formalin-fixed) tissue samples. sample variation, RT-PCR is usually performed using an 0070 The first step in gene expression profiling by RT internal standard. The ideal internal standard is expressed at a PCR is the reverse transcription of the RNA template into constant level among different tissues, and is unaffected by cDNA, followed by its exponential amplification in a PCR the experimental treatment. RNAs most frequently used to US 2016/006891.6 A1 Mar. 10, 2016 24 normalize patterns of gene expression are mRNAS for the paraffin-embedded tumour tissue, using microarray technol housekeeping genes glyceraldehyde-3-phosphate-dehydro ogy. In this method, polynucleotide sequences of interest genase (GAPDH) and-actin. (including cDNAS and oligonucleotides) are plated, or Real-Time Quantitative PCR (qPCR) arrayed, on a microchip Substrate. The arrayed sequences 0076. 
A more recent variation of the RT-PCR technique is the real time quantitative PCR, which measures PCR product accumulation through a dual-labeled fluorigenic probe (i.e., TaqMan probe). Real time PCR is compatible both with quantitative competitive PCR and with quantitative comparative PCR. The former uses an internal competitor for each target sequence for normalization, while the latter uses a normalization gene contained within the sample, or a housekeeping gene for RT-PCR. Further details are provided, e.g., by Held et al., Genome Research 6: 986-994 (1996).

[0077] Expression levels can be determined using fixed, paraffin-embedded tissues as the RNA source. According to one aspect of the present invention, PCR primers and probes are designed based upon intron sequences present in the gene to be amplified. In this embodiment, the first step in the primer/probe design is the delineation of intron sequences within the genes. This can be done by publicly available software, such as the DNA BLAT software developed by Kent, W. J., Genome Res. 12 (4): 656-64 (2002), or by the BLAST software including its variations. Subsequent steps follow well established methods of PCR primer and probe design.

[0078] In order to avoid non-specific signals, it is useful to mask repetitive sequences within the introns when designing the primers and probes. This can be easily accomplished by using the Repeat Masker program available on-line through the Baylor College of Medicine, which screens DNA sequences against a library of repetitive elements and returns a query sequence in which the repetitive elements are masked. The masked sequences can then be used to design primer and probe sequences using any commercially or otherwise publicly available primer/probe design packages, such as Primer Express (Applied Biosystems); MGB assay-by-design (Applied Biosystems); Primer3 (Steve Rozen and Helen J. Skaletsky (2000) Primer3 on the WWW for general users and for biologist programmers in: Krawetz, S. Misener S (eds) Bioinformatics Methods and Protocols: Methods in Molecular Biology. Humana Press, Totowa, N.J., pp. 365-386).

[0079] The most important factors considered in PCR primer design include primer length, melting temperature (Tm), G/C content, specificity, complementary primer sequences, and 3' end sequence. In general, optimal PCR primers are 17-30 bases in length, and contain about 20-80%, such as, for example, about 50-60% G+C bases. Melting temperatures between 50 and 80° C., e.g., about 50 to 70° C., are typically preferred. For further guidelines for PCR primer and probe design see, e.g., Dieffenbach, C. W. et al., General Concepts for PCR Primer Design in: PCR Primer, A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York, 1995, pp. 133-155; Innis and Gelfand, Optimization of PCRs in: PCR Protocols, A Guide to Methods and Applications, CRC Press, London, 1994, pp. 5-11; and Plasterer, T. N. Primerselect: Primer and probe design. Methods Mol. Biol. 70:520-527 (1997), the entire disclosures of which are hereby expressly incorporated by reference.

Microarray Analysis

[0080] Differential expression can also be identified, or confirmed using the microarray technique. Thus, the expression profile of CCPMs can be measured in either fresh or

(i.e., capture probes) are then hybridized with specific polynucleotides from cells or tissues of interest (i.e., targets). Just as in the RT-PCR method, the source of RNA typically is total RNA isolated from human tumours or tumour cell lines, and corresponding normal tissues or cell lines. Thus RNA can be isolated from a variety of primary tumours or tumour cell lines. If the source of RNA is a primary tumour, RNA can be extracted, for example, from frozen or archived formalin fixed paraffin-embedded (FFPE) tissue samples and fixed (e.g., formalin-fixed) tissue samples, which are routinely prepared and preserved in everyday clinical practice.

[0081] In a specific embodiment of the microarray technique, PCR amplified inserts of cDNA clones are applied to a substrate. The substrate can include up to 1, 2, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, or 75 nucleotide sequences. In other aspects, the substrate can include at least 10,000 nucleotide sequences. The microarrayed sequences, immobilized on the microchip, are suitable for hybridization under stringent conditions. As other embodiments, the targets for the microarrays can be at least 50, 100, 200, 400, 500, 1000, or 2000 bases in length; or 50-100, 100-200, 100-500, 100-1000, 100-2000, or 500-5000 bases in length. As further embodiments, the capture probes for the microarrays can be at least 10, 15, 20, 25, 50, 75, 80, or 100 bases in length; or 10-15, 10-20, 10-25, 10-50, 10-75, 10-80, or 20-80 bases in length.

[0082] Fluorescently labeled cDNA probes may be generated through incorporation of fluorescent nucleotides by reverse transcription of RNA extracted from tissues of interest. Labeled cDNA probes applied to the chip hybridize with specificity to each spot of DNA on the array. After stringent washing to remove non-specifically bound probes, the chip is scanned by confocal laser microscopy or by another detection method, such as a CCD camera. Quantitation of hybridization of each arrayed element allows for assessment of corresponding mRNA abundance. With dual colour fluorescence, separately labeled cDNA probes generated from two sources of RNA are hybridized pairwise to the array. The relative abundance of the transcripts from the two sources corresponding to each specified gene is thus determined simultaneously. An exemplary protocol for this is described in detail in Example 4.

[0083] The miniaturized scale of the hybridization affords a convenient and rapid evaluation of the expression pattern for large numbers of genes. Such methods have been shown to have the sensitivity required to detect rare transcripts, which are expressed at a few copies per cell, and to reproducibly detect at least approximately two-fold differences in the expression levels (Schena et al., Proc. Natl. Acad. Sci. USA 93 (2): 106-149 (1996)). Microarray analysis can be performed by commercially available equipment, following manufacturer's protocols, such as by using the Affymetrix GeneChip technology, Illumina microarray technology or Incyte's microarray technology. The development of microarray methods for large-scale analysis of gene expression makes it possible to search systematically for molecular markers of cancer classification and outcome prediction in a variety of tumour types.
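The primer design ranges stated above (17-30 bases, about 20-80% G+C, melting temperature between 50 and 80° C.) can be checked mechanically. The following is a minimal sketch only, not the method of any of the design packages named in the text. The melting temperature uses the simple rule of thumb of 2° C. per A/T base and 4° C. per G/C base, which is an assumption introduced here and is only a rough guide for short oligonucleotides; the function names and the example 20-mer are hypothetical.

```python
def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    seq = primer.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> float:
    """Rule-of-thumb melting temperature in deg C (assumption, not from the text):
    2 deg C per A/T base plus 4 deg C per G/C base."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def check_primer(primer, length=(17, 30), gc=(20.0, 80.0), tm=(50.0, 80.0)):
    """Report which of the three stated ranges the primer satisfies."""
    return {
        "length_ok": length[0] <= len(primer) <= length[1],
        "gc_ok": gc[0] <= gc_percent(primer) <= gc[1],
        "tm_ok": tm[0] <= wallace_tm(primer) <= tm[1],
    }


# Hypothetical 20-mer used only for illustration.
example = "ATGCATGCATGCATGCATGC"
report = check_primer(example)
print(gc_percent(example), wallace_tm(example), report)
```

Specificity, primer complementarity and the 3' end sequence, which the text also lists, are not covered by this sketch and would normally be left to a dedicated design package.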

RNA Isolation, Purification, and Amplification

[0084] General methods for mRNA extraction are well known in the art and are disclosed in standard textbooks of molecular biology, including Ausubel et al., Current Protocols of Molecular Biology, John Wiley and Sons (1997). Methods for RNA extraction from paraffin embedded tissues are disclosed, for example, in Rupp and Locker, Lab Invest. 56: A67 (1987), and De Sandres et al., BioTechniques 18: 42044 (1995). In particular, RNA isolation can be performed using a purification kit, buffer set, and protease from commercial manufacturers, such as Qiagen, according to the manufacturer's instructions. For example, total RNA from cells in culture can be isolated using Qiagen RNeasy mini-columns. Other commercially available RNA isolation kits include MasterPure Complete DNA and RNA Purification Kit (EPICENTRE®, Madison, Wis.), and Paraffin Block RNA Isolation Kit (Ambion, Inc.). Total RNA from tissue samples can be isolated using RNA Stat-60 (Tel-Test). RNA prepared from tumour can be isolated, for example, by cesium chloride density gradient centrifugation.

[0085] The steps of a representative protocol for profiling gene expression using fixed, paraffin-embedded tissues as the RNA source, including mRNA isolation, purification, primer extension and amplification are given in various published journal articles (for example: T. E. Godfrey et al. J. Molec. Diagnostics 2: 84-91 (2000); K. Specht et al., Am. J. Pathol. 158: 419-29 (2001)). Briefly, a representative process starts with cutting about 10 μm thick sections of paraffin-embedded tumour tissue samples. The RNA is then extracted, and protein and DNA are removed. After analysis of the RNA concentration, RNA repair and/or amplification steps may be included, if necessary, and RNA is reverse transcribed using gene specific promoters followed by RT-PCR. Finally, the data are analyzed to identify the best treatment option(s) available to the patient on the basis of the characteristic gene expression pattern identified in the tumour sample examined.

Immunohistochemistry and Proteomics

[0086] Immunohistochemistry methods are also suitable for detecting the expression levels of the proliferation markers of the present invention. Thus, antibodies or antisera, preferably polyclonal antisera, and most preferably monoclonal antibodies specific for each marker, are used to detect expression. The antibodies can be detected by direct labeling of the antibodies themselves, for example, with radioactive labels, fluorescent labels, hapten labels such as biotin, or an enzyme such as horse radish peroxidase or alkaline phosphatase. Alternatively, unlabeled primary antibody is used in conjunction with a labeled secondary antibody, comprising antisera, polyclonal antisera or a monoclonal antibody specific for the primary antibody. Immunohistochemistry protocols and kits are well known in the art and are commercially available.

[0087] Proteomics can be used to analyze the polypeptides present in a sample (e.g., tissue, organism, or cell culture) at a certain point of time. In particular, proteomic techniques can be used to assess the global changes of polypeptide expression in a sample (also referred to as expression proteomics). Proteomic analysis typically includes: (1) separation of individual polypeptides in a sample by 2-D gel electrophoresis (2-D PAGE); (2) identification of the individual polypeptides recovered from the gel, e.g., by mass spectrometry or N-terminal sequencing; and (3) analysis of the data using bioinformatics. Proteomics methods are valuable supplements to other methods of gene expression profiling, and can be used, alone or in combination with other methods, to detect the products of the proliferation markers of the present invention.

[0088] Once the expression level of one or more prognostic markers in a tumour sample has been assessed, the likelihood of the cancer recurring can then be determined. The inventors have identified a number of markers that are differentially expressed in non-recurring colorectal cancers compared to recurring colorectal cancers in patient data sets. The markers are set out in Tables 1, 2, and 9, in the examples below.

[0089] Selection of Differentially Expressed Genes.

[0090] An early approach to the selection of genes deemed significant involved simply looking at the "fold change" of a given gene between the two groups of interest. While this approach homes in on genes that seem to change the most spectacularly, consideration of basic statistics leads one to realize that if the variance (or noise level) is quite high (as is often seen in microarray experiments), then seemingly large fold-changes can happen frequently by chance alone.

[0091] Microarray experiments, such as those described here, typically involve the simultaneous measurement of thousands of genes. If one is comparing the expression levels for a particular gene between two groups (for example recurrent and non-recurrent tumours), the typical tests for significance (such as the t-test) are not adequate. This is because, in an ensemble of thousands of experiments (in this context each gene constitutes an "experiment"), the probability of at least one experiment passing the usual criteria for significance by chance alone is essentially unity. In a test for significance, one typically calculates the probability of obtaining a result at least as extreme as the one observed if the "null hypothesis" is correct. In the case of comparing two groups, the null hypothesis is that there is no difference between the two groups. If a statistical test produces such a probability below some threshold (usually 0.05 or 0.01), it is stated that we can reject the null hypothesis, and accept the hypothesis that the two groups are significantly different. Clearly, in such a test, a rejection of the null hypothesis by chance alone could be expected 1 in 20 times (or 1 in 100). The use of t-tests, or other similar statistical tests for significance, fails in the context of microarrays, producing far too many false positives (or type I errors).

[0092] In this type of situation, where one is testing multiple hypotheses at the same time, one applies typical multiple comparison procedures, such as the Bonferroni Method (43). However, such tests are too conservative for most microarray experiments, resulting in too many false negative (type II) errors.

[0093] A more recent approach is to do away with attempting to apply a probability for a given test being significant, and establish a means for selecting a subset of experiments, such that the expected proportion of Type I errors (or false discovery rate; 47) is controlled for. It is this approach that has been used in this investigation, through various implementations, namely the methods provided with BRB Array Tools (48), and the limma (11, 42) package of Bioconductor (that uses the R statistical environment; 10, 39).

General Methodology for Data Mining: Generation of Prognostic Signatures

[0094] Data Mining is the term used to describe the extraction of "knowledge", in other words the "know-how", or predictive ability from (usually) large volumes of data (the dataset). This is the approach used in this study to generate prognostic signatures. In the case of this study the "know-how" is the ability to accurately predict prognosis from a given set of gene expression measurements, or "signature" (as described generally in this section and in more detail in the examples section).

[0095] The specific details of the methods used in this study are described in Examples 17-20. However, application of any of the data mining methods (both those described in the Examples, and those described here) can follow this general protocol.

[0096] Data mining (49), and the related topic machine learning (40), is a complex, repetitive mathematical task that involves the use of one or more appropriate computer software packages (see below). The use of software is advantageous on the one hand, in that one does not need to be completely familiar with the intricacies of the theory behind each technique in order to successfully use data mining techniques, provided that one adheres to the correct methodology. The disadvantage is that the application of data mining can often be viewed as a "black box": one inserts the data and receives the answer. How this is achieved is often masked from the end-user (this is the case for many of the techniques described), and this can often influence the statistical method chosen for data mining. For example, neural networks and support vector machines have a particularly complex implementation that makes it very difficult for the end user to extract the "rules" used to produce the decision. On the other hand, k-nearest neighbours and linear discriminant analysis have a very transparent process for decision making that is not hidden from the user.

[0097] There are two types of approach used in data mining: supervised and unsupervised approaches. In the supervised approach, the information that is being linked to the data is known, such as categorical data (e.g. recurrent vs. non-recurrent tumours). What is required is the ability to link the observed response (e.g. recurrence vs. non-recurrence) to the input variables. In the unsupervised approach, the classes within the dataset are not known in advance, and data mining methodology is employed to attempt to find the classes or structure within the dataset.

[0098] In the present example the supervised approach was used and is discussed in detail here, although it will be appreciated that any of the other techniques could be used.

[0099] The overall protocol involves the following steps:

[0100] Data representation. This involves transformation of the data into a form that is most likely to work successfully with the chosen data mining technique. Where the data is numerical, such as in this study where the data being investigated represents relative levels of gene expression, this is fairly simple. If the data covers a large dynamic range (i.e. many orders of magnitude) often the log of the data is taken. If the data covers many measurements of separate samples on separate days by separate investigators, particular care has to be taken to ensure systematic error is minimised. The minimisation of systematic error (i.e. errors resulting from protocol differences, machine differences, operator differences and other quantifiable factors) is the process referred to here as "normalisation".

[0101] Feature Selection. Typically the dataset contains many more data elements than would be practical to measure on a day-to-day basis, and additionally many elements that do not provide the information needed to produce a prediction model. The actual ability of a prediction model to describe a dataset is derived from some subset of the full dimensionality of the dataset. These dimensions are the most important components (or features) of the dataset. Note that in the context of microarray data, the dimensions of the dataset are the individual genes. Feature selection, in the context described here, involves finding those genes which are most "differentially expressed". In a more general sense, it involves those groups which pass some statistical test for significance, i.e. is the level of a particular variable consistently higher or lower in one or other of the groups being investigated. Sometimes the features are those variables (or dimensions) which exhibit the greatest variance. The application of feature selection is completely independent of the method used to create a prediction model, and involves a great deal of experimentation to achieve the desired results. Within this invention, the selection of significant genes, and those which correlated with the earlier successful model (the NZ classifier), entailed feature selection. In addition, methods of data reduction (such as principal component analysis) can be applied to the dataset.

[0102] Training. Once the classes (e.g. recurrence/non-recurrence) and the features of the dataset have been established, and the data is represented in a form that is acceptable as input for data mining, the reduced dataset (as described by the features) is applied to the prediction model of choice. The input for this model is usually in the form of a multi-dimensional numerical input (known as a vector), with associated output information (a class label or a response). In the training process, selected data is input into the prediction model, either sequentially (in techniques such as neural networks) or as a whole (in techniques that apply some form of regression, such as linear models, linear discriminant analysis, support vector machines). In some instances (e.g. k-nearest neighbours) the dataset (or subset of the dataset obtained after feature selection) is itself the model. As discussed, effective models can be established with minimal understanding of the detailed mathematics, through the use of various software packages where the parameters of the model have been pre-determined by expert analysts as most likely to lead to successful results.

[0103] Validation. This is a key component of the data mining protocol, and the incorrect application of this frequently leads to errors. Portions of the dataset are to be set aside, apart from feature selection and training, to test the success of the prediction model. Furthermore, if the results of validation are used to effect feature selection and training of the model, then one obtains a further validation set to test the model before it is applied to real-life situations. If this process is not strictly adhered to, the model is likely to fail in real-world situations. The methods of validation are described in more detail below.

[0104] Application. Once the model has been constructed and validated, it must be packaged in some way so that it is accessible to end users. This often involves implementation of some form of spreadsheet application, into which the model has been embedded, scripting of a statistical software package, or refactoring of the model into a hard-coded application by information technology staff.

[0105] Examples of software packages that are frequently used are:

[0106] Spreadsheet plugins, obtained from multiple vendors.

[0107] The R statistical environment.

[0108] The commercial packages MatLab, S-plus, SAS, SPSS, STATA.

[0109] Free open-source software such as Octave (a MatLab clone).

[0110] Many and varied C++ libraries, which can be used to implement prediction models in a commercial, closed-source setting.

Examples of Data Mining Methods.

[0111] The methods can be applied by first performing the steps of the data mining process (above), and then applying the appropriate known software packages. Further description of the process of data mining is given in detail in many extremely well-written texts (49).

[0112] Linear models (49, 50): The data is treated as the input of a linear regression model, of which the class labels or response variables are the output. Class labels, or other categorical data, must be transformed into numerical values (usually integer). In generalised linear models, the class labels or response variables are not themselves linearly related to the input data, but are transformed through the use of a "link function". Logistic regression is the most common form of generalized linear model.

[0113] Linear Discriminant analysis (49, 51, 52). Provided the data is linearly separable (i.e. the groups or classes of data can be separated by a hyperplane, which is an n-dimensional extension of a threshold), this technique can be applied. A combination of variables is used to separate the classes, such that the between-group variance is maximised, and the within-group variance is minimised. The byproduct of this is the formation of a classification rule. Application of this rule to samples of unknown class allows predictions or classification of class membership to be made for that sample. There are variations of linear discriminant analysis such as nearest shrunken centroids which are commonly used for microarray analysis.

[0114] Support vector machines (53): A collection of variables is used in conjunction with a collection of weights to determine a model that maximizes the separation between classes in terms of those weighted variables. Application of this model to a sample then produces a classification or prediction of class membership for that sample.

[0115] Neural networks (52): The data is treated as input into a network of nodes, which superficially resemble biological neurons, which apply the input from all the nodes to which they are connected, and transform the input into an output. Commonly, neural networks use the "multiply and sum" algorithm to transform the inputs from multiple connected input nodes into a single output. A node may not necessarily produce an output unless the inputs to that node exceed a certain threshold. Each node has as its input the output from several other nodes, with the final output node usually being linked to a categorical variable. The number of nodes, and the topology of the nodes can be varied in almost infinite ways, providing for the ability to classify extremely noisy data that may not be possible to categorize in other ways. The most common implementation of neural networks is the multi-layer perceptron.

[0116] Classification and regression trees (54): In these, variables are used to define a hierarchy of rules that can be followed in a stepwise manner to determine the class of a sample. The typical process creates a set of rules which lead to a specific class output, or a specific statement of the inability to discriminate. An example classification tree is an implementation of an algorithm such as:

    if gene A > x and gene Y > x and gene Z = z
    then
        class A
    else if gene A = q
    then
        class B

[0117] Nearest neighbour methods (51, 52). Predictions or classifications are made by comparing a sample (of unknown class) to those around it (of known class), with closeness defined by a distance function. It is possible to define many different distance functions. Commonly used distance functions are the Euclidean distance (an extension of the Pythagorean distance, as in triangulation, to n-dimensions) and various forms of correlation (including Pearson Correlation co-efficient). There are also transformation functions that convert data points that would not normally be interconnected by a meaningful distance metric into Euclidean space, so that Euclidean distance can then be applied (e.g. Mahalanobis distance). Although the distance metric can be quite complex, the basic premise of k-nearest neighbours is quite simple, essentially being a restatement of "find the k data vectors that are most similar to the unknown input, find out which class they correspond to, and vote as to which class the unknown input is".

[0118] Other methods:

[0119] Bayesian networks. A directed acyclic graph is used to represent a collection of variables in conjunction with their joint probability distribution, which is then used to determine the probability of class membership for a sample.

[0120] Independent components analysis, in which independent signals (e.g., class membership) are isolated (into components) from a collection of variables. These components can then be used to produce a classification or prediction of class membership for a sample.

[0121] Ensemble learning methods in which a collection of prediction methods are combined to produce a joint classification or prediction of class membership for a sample.

[0122] There are many variations of these methodologies that can be explored (49), and many new methodologies are constantly being defined and developed. It will be appreciated that any one of these methodologies can be applied in order to obtain an acceptable result. Particular care must be taken to avoid overfitting, by ensuring that all results are tested via a comprehensive validation scheme.
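The k-nearest neighbours premise quoted above ("find the k data vectors that are most similar to the unknown input ... and vote") is simple enough to sketch directly, together with a leave-one-out check of the kind the Validation section describes (K-fold cross-validation with K equal to the number of samples). This is a minimal sketch under stated assumptions: Euclidean distance is used, the two-gene expression vectors and class labels are invented, and the function names are hypothetical. It is not the classifier used in the study.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Vote among the k training vectors closest to the query."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(xs, ys, k=3):
    """Hold out each sample in turn, predict it from the rest, and
    return the fraction predicted correctly."""
    correct = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        if knn_predict(rest_x, rest_y, xs[i], k) == ys[i]:
            correct += 1
    return correct / len(xs)


# Invented two-gene expression vectors, for illustration only.
xs = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1], [1.9, 2.1], [2.0, 1.8], [2.2, 2.0]]
ys = ["non-recurrent"] * 3 + ["recurrent"] * 3

print(knn_predict(xs, ys, [2.1, 1.9]))
print(leave_one_out_accuracy(xs, ys))
```

In a real analysis any feature selection would have to be repeated inside each held-out round, since the text requires validation data to be kept apart from both feature selection and training.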

Validation

[0123] Application of any of the prediction methods described involves both training and cross-validation (43, 55) before the method can be applied to new datasets (such as data from a clinical trial). Training involves taking a subset of the dataset of interest (in this case gene expression measurements from colorectal tumours), such that it is stratified across the classes that are being tested for (in this case recurrent and non-recurrent tumours). This training set is used to generate a prediction model (defined above), which is tested on the remainder of the data (the testing set).

[0124] It is possible to alter the parameters of the prediction model so as to obtain better performance in the testing set; however, this can lead to the situation known as overfitting, where the prediction model works on the training dataset but not on any external dataset. In order to circumvent this, the process of validation is followed. There are two major types of validation typically applied. The first (hold-out validation) involves partitioning the dataset into three groups: testing, training, and validation. The validation set has no input into the training process whatsoever, so that any adjustment of parameters or other refinements must take place during application to the testing set (but not the validation set). The second major type is cross-validation, which can be applied in several different ways, described below.

[0125] There are two main sub-types of cross-validation: K-fold cross-validation, and leave-one-out cross-validation.

[0126] K-fold cross-validation: The dataset is divided into K subsamples, each subsample containing approximately the same proportions of the class groups as the original. In each round of validation, one of the K subsamples is set aside, and training is accomplished using the remainder of the dataset. The effectiveness of the training for that round is gauged by how correctly the left-out group is classified. This procedure is repeated K times, and the overall effectiveness ascertained by comparison of the predicted class with the known class.

[0127] Leave-one-out cross-validation: A commonly used variation of K-fold cross-validation, in which K = n, where n is the number of samples.

[0128] Combinations of CCPMs, such as those described above in Tables 1 and 2, can be used to construct predictive models for prognosis.

Prognostic Signatures

[0129] Prognostic signatures, comprising one or more of these markers, can be used to determine the outcome of a patient, through application of one or more predictive models derived from the signature. In particular, a clinician or researcher can determine the differential expression (e.g., increased or decreased expression) of the one or more markers in the signature, apply a predictive model, and thereby predict the negative prognosis, e.g., likelihood of disease relapse, of a patient, or alternatively the likelihood of a positive prognosis (continued remission).

[0130] A set of prognostic signatures has been developed. In the first instance, there are two signatures developed by cross-comparison of predictive ability between two datasets: the set of microarray experiments encompassing the German colorectal cancer samples, and the set of microarray experiments encompassing the New Zealand samples (discussed in example 6). In the second instance there has been an exhaustive statistical search for effective signatures based solely on the German dataset (discussed in example 17).

[0131] As described in Example 6 below, a prognostic signature comprising 19 genes has been established from a set of colorectal samples from Germany (Table 4). Another prognostic signature, of 22 genes, has also been established from samples of colorectal tumours from patients in New Zealand (Table 3). By obtaining a patient sample (e.g., tumour sample), and matching the expression levels of one or more markers in the sample to the differential expression profile,
the likelihood of the cancer recurring can be determined.

TABLE 3
New Zealand prognostic signature

Gene Symbol | Gene Description | Expression fold difference | UniGene Cluster | GenBank Acc. No.
WDR44 | WD repeat domain 44 | 0.81 | Hs.98510 | NM_019045
RBMS1 | rna binding motif, single stranded interacting protein 1, isoform d | 1.27 | Hs.470412 | NM_016836
SACM1L | Ras-GTPase activating protein SH3 domain-binding protein 2 | 0.84 | Hs.156509 | NM_014016
SOAT1 | sterol O-acyltransferase acyl-coenzyme a: cholesterol acyltransferase 1 | 1.21 | Hs.496383 | NM_003101
PBK | pdz-binding kinase | 0.76 | Hs.104741 | NM_018492
G3BP2 | ras-gtpase activating protein sh3 domain-binding protein 2 | 0.86 | Hs.303676 | NM_012297
ZBTB20 | Zinc finger and BTB domain containing 20 | 1.2 | Hs.477166 | NM_015642
ZNF410 | Zinc finger protein 410 | 0.84 | Hs.270869 | NM_021188
COMMD2 | COMM domain containing 2 | 1.09 | Hs.591315 | NM_016094
PSMC1 | proteasome (prosome, macropain) 26S subunit, atpase, 1 | 0.79 | Hs.356654 | NM_002802
COX10 | COX10 homolog, cytochrome c oxidase assembly protein, heme A: farnesyltransferase (yeast) | 0.9 | Hs.462278 | NM_001303
GTF3C5 | general transcription factor iiic, polypeptide 5 (63 kd) | 0.84 | Hs.495417 | NM_012087
HMMR | hyaluronan-mediated motility receptor (rhamm) | 0.78 | Hs.72550 | NM_012485
UBE2L3 | ubiquitin-conjugating enzyme e2l 3 | 0.83 | Hs.108104 | NM_003347
GNAS | gnas complex locus | 1.26 | Hs.125898 | NM_000516

TABLE 3-continued
New Zealand prognostic signature

Gene Symbol | Gene Description | Expression fold difference | UniGene Cluster | GenBank Acc. No.
PPP2R2A | protein phosphatase 2 (formerly 2a), regulatory subunit b (pr 52), alpha isoform | 0.91 | Hs.146339 | NM_002717
RNASE2 | ribonuclease, rnase a family, 2 (liver, eosinophil-derived neurotoxin) | 0.83 | Hs.728 | NM_002934
SCOC | short coiled-coil protein | 0.78 | Hs.480815 | NM_032547
PSMD9 | proteasome (prosome, macropain) 26S subunit, non-atpase, 9 | 0.89 | Hs.131151 | NM_002813
EIF3S7 | eukaryotic translation initiation factor 3, subunit 7 (zeta, 66/67kd) | 0.85 | Hs.55682 | NM_003753
ATP2B4 | ATPase, Ca++ transporting, plasma membrane 4 | 1.11 | Hs.343522 | NM_001001396, NM_001684
ABCC9 | atp-binding cassette, sub-family c, member 9, isoform sur2a-delta-14 | 0.9 | Hs.446050 | NM_020298

TABLE 4
German prognostic signature

Gene Symbol | Gene Description | Expression fold difference (relapse/non-relapse) | UniGene Cluster | GenBank Acc. No.
CXCL10 | Chemokine (C-X-C motif) ligand 10 | 0.87 | Hs.413924 | NM_001565
FAS | FAS (TNF receptor superfamily, member 6) | 0.9 | Hs.244139 | NM_000043, NM_152871, NM_152872, NM_152873, NM_152874, NM_152875, NM_152876, NM_152877
CXCL9 | chemokine (C-X-C motif) ligand 9 | 0.87 | Hs.77367 | NM_002416
TLK1 | tousled-like kinase 1 | 0.91 | Hs.470586 | NM_012290
CXCL11 | chemokine (C-X-C motif) ligand 11 | 0.75 | Hs.518814 | NM_005409
PBK | T-LAK cell-originated protein kinase | 0.86 | Hs.104741 | NM_018492
PSAT1 | phosphoserine aminotransferase 1 | 0.91 | Hs.494261 | NM_021154
MAD2L1 | MAD2 mitotic arrest deficient-like 1 (yeast) | 0.89 | Hs.533185 | NM_002358
CA2 | carbonic anhydrase II | 0.84 | Hs.155097 | NM_000067
GZMB | granzyme B (granzyme 2, cytotoxic T-lymphocyte associated serine esterase 1) | 0.9 | Hs.1051 | NM_004131
SLC4A4 | solute carrier family 4, sodium bicarbonate cotransporter, member 4 | 0.86 | Hs.5462 | NM_003759
DLG7 | discs, large homolog 7 (Drosophila) | 0.89 | Hs.77695 | NM_014750
TNFRSF11A | tumor necrosis factor receptor superfamily, member 11a, activator of NFKB | 0.9 | Hs.204044 | NM_003839
KITLG | KIT ligand | 0.91 | Hs.1048 | NM_000899
INDO | indoleamine-pyrrole 2,3 dioxygenase | 0.91 | Hs.840 | NM_002164
GBP1 | guanylate binding protein 1, interferon-inducible, 67kDa | 0.9 | Hs.62661 | NM_002053
CXCL13 | chemokine (C-X-C motif) ligand 13 (B-cell chemoattractant) | 0.86 | Hs.100431 | NM_006419
CLCA4 | chloride channel, calcium activated, family member 4 | 0.84 | Hs.546343 | NM_012128
PCP4 | Purkinje cell protein 4 | 1.14 | Hs.80296 | NM_006198

TABLE 5
Immune response genes

Gene Symbol | Gene Description | Expression fold difference (relapse/non-relapse) | UniGene Cluster | GenBank Acc. No.
CXCL9 | chemokine (C-X-C motif) ligand 9 | 0.87 | Hs.77367 | NM_002416
CXCL10 | Chemokine (C-X-C motif) ligand 10 | 0.87 | Hs.413924 | NM_001565
CXCL11 | chemokine (C-X-C motif) ligand 11 | 0.75 | Hs.518814 | AF030514
CXCL13 | chemokine (C-X-C motif) ligand 13 (B-cell chemoattractant) | 0.86 | Hs.100431 | NM_006419
PBK | T-LAK cell-originated protein kinase | 0.86 | Hs.104741 | NM_018492
INDO | indoleamine-pyrrole 2,3 dioxygenase | 0.91 | Hs.840 | M34455
GBP1 | guanylate binding protein 1, interferon-inducible, 67kDa | 0.9 | Hs.62661 | NM_002053
GZMB | granzyme B (granzyme 2, cytotoxic T-lymphocyte associated serine esterase 1) | 0.9 | Hs.1051 | J03189
KITLG | KIT ligand | 0.91 | Hs.1048 | NM_000899
TNFRSF11A | tumor necrosis factor receptor superfamily, member 11a, activator of NFKB | 0.9 | Hs.204044 | NM_003839
FAS | FAS (TNF receptor superfamily, member 6) | 0.9 | Hs.244139 | Z70519

[0132] In certain aspects, this invention provides methods for determining the prognosis of a cancer, comprising: (a) providing a sample of the cancer; (b) detecting the expression level of a CCPM family member in said sample; and (c) determining the prognosis of the cancer. In one aspect, the cancer is colorectal cancer.

[0133] In other aspects, the invention includes a step of detecting the expression level of a CCPM mRNA. In other aspects, the invention includes a step of detecting the expression level of a CCPM polypeptide. In yet a further aspect, the invention includes a step of detecting the level of a CCPM peptide. In yet another aspect, the invention includes detecting the expression level of more than one CCPM family member in said sample. In a further aspect the CCPM is a gene associated with an immune response. In a further aspect the CCPM is selected from the markers set forth in Tables 3, 4, 8A, 8B, or 9. In a still further aspect, the CCPM is included in a signature selected from the signatures set forth in Tables 3, 4, 8A, 8B, or 9.

[0134] In a further aspect the invention comprises detecting the expression level of WDR44, RBMS1, SACM1L, SOAT1, PBK, G3BP2, ZBTB20, ZNF410, COMMD2, PSMC1, COX10, GTF3C5, HMMR, UBE2L3, GNAS, PPP2R2A, RNASE2, SCOC, PSMD9, EIF3S7, ATP2B4, and ABCC9. In a further aspect the invention comprises detecting the expression level of CXCL10, FAS, CXCL9, TLK1, CXCL11, PBK, PSAT1, MAD2L1, CA2, GZMB, SLC4A4, DLG7, TNFRSF11A, KITLG, INDO, GBP1, CXCL13, CLCA4, and PCP4.

[0135] In still further aspects, the invention includes a method of determining a treatment regime for a cancer comprising: (a) providing a sample of the cancer; (b) detecting the expression level of a CCPM family member in said sample; (c) determining the prognosis of the cancer based on the expression level of a CCPM family member; and (d) determining the treatment regime according to the prognosis.

[0136] In still further aspects, the invention includes a device for detecting a CCPM, comprising: a substrate having a CCPM capture reagent thereon; and a detector associated with said substrate, said detector capable of detecting a CCPM associated with said capture reagent. Additional aspects include kits for detecting cancer, comprising: a substrate; a CCPM capture reagent; and instructions for use. Yet further aspects of the invention include a method for detecting a CCPM using qPCR, comprising: a forward primer specific for said CCPM; a reverse primer specific for said CCPM; PCR reagents; a reaction vial; and instructions for use.

[0137] Additional aspects of this invention comprise a kit for detecting the presence of a CCPM polypeptide or peptide, comprising: a substrate having a capture agent for said CCPM polypeptide or peptide; an antibody specific for said CCPM polypeptide or peptide; a reagent capable of labeling bound antibody for said CCPM polypeptide or peptide; and instructions for use.

[0138] In yet further aspects, this invention includes a method for determining the prognosis of colorectal cancer, comprising the steps of providing a tumour sample from a patient suspected of having colorectal cancer; measuring the presence of a CCPM polypeptide using an ELISA method. In specific aspects of this invention the CCPM of the invention is selected from the markers set forth in Tables 1, 2, 5, or 9. In still further aspects, the CCPM is included in a prognostic signature selected from the signatures set forth in Tables 3, 4, 8A, 8B, or 10.

EXAMPLES

[0139] The examples described herein are for purposes of illustrating embodiments of the invention. Other embodiments, methods, and types of analyses are within the scope of persons of ordinary skill in the molecular diagnostic arts and

need not be described in detail herein. Other embodiments within the scope of the art are considered to be part of this invention.

Example 1

Patients and Methods

[0140] Two cohorts of patients were included in this study, one set from New Zealand (NZ) and the second from Germany (DE). The NZ patients were part of a prospective cohort study that included all disease stages, whereas the DE samples were selected from a tumour bank. Clinical information is shown in Table 6, while FIG. 1 summarises the experimental design.

Example 2

Tumour Samples

[0141] Primary colorectal tumor samples from 149 NZ patients were obtained from patients undergoing surgery at Dunedin Hospital and Auckland Hospital between 1995-2000. Tumor samples were snap frozen in liquid nitrogen. All surgical specimens were reviewed by a single pathologist (H-SY) and were estimated to contain an average of 85% tumor cells. Among the 149 CRC patients, 12 had metastatic disease at presentation, 35 developed recurrent disease, and 102 were disease-free after a minimum of 5-year follow up.

[0142] Primary colorectal tumor samples from DE patients were obtained from patients undergoing surgery at the Surgical Department of the Technical University of Munich between 1995-2001. A group of 55 colorectal carcinoma samples was selected from banked tumours which had been obtained fresh from surgery and snap frozen in liquid nitrogen. The samples were obtained from 11 patients with stage I cancer and 44 patients with stage II cancer. Twenty nine patients were recurrence-free and 26 patients had experienced disease recurrence after a minimum of 5-year follow up.

[0143] Tumor content ranged between 70 and 100% with an average of 87%.

TABLE 6
Clinical characteristics of New Zealand and German colorectal tumours

                                          Relapse free            Relapse
New Zealand data
  Number of patients                      102                     47
  Age                                     68.5 (SD: 15.1)         69.8 (SD: 8.7)
  Gender
    male                                  48 (47%)                22 (47%)
    female                                54 (53%)                25 (53%)
  Tumor localization
    right colon                           41 (40%)                18 (38%)
    left colon                            12 (12%)                4 (9%)
    sigmoid                               31 (30%)                17 (36%)
    rectum                                18 (18%)                8 (17%)
  Tumor stage
    Stage I                               16                      0
    Stage II                              61                      13
    Stage III                             25                      22
    Stage IV                              0                       12¹
  Median follow up period/median          72 (range: 60-80)       15 (range: 0-59)
  recurrence free period (months)
German data
  Number of patients                      29                      26
  Age                                     64.3 (SD: 12.8)         61.8 (SD: 10.7)
  Gender
    male                                  17 (59%)                16 (62%)
    female                                12 (41%)                10 (38%)
  Tumor localization
    right colon                           8 (28%)                 4 (15%)
    left colon                            7 (24%)                 5 (19%)
    sigmoid                               6 (21%)                 7 (27%)
    rectum                                8 (28%)                 10 (38%)
  Tumor stage
    Stage I                               5                       6
    Stage II                              24                      20
  Median follow up period/median          83.1 (range: 64-99)     27.4 (range: 3-60)
  recurrence free period (months)

¹Persisting disease

Example 3

RNA Extraction and Target Labeling

[0144] NZ tumours: Tumours were homogenized and RNA was extracted using Tri-Reagent (Progenz, Auckland, New Zealand). The RNA was then further purified using RNeasy mini column (Qiagen, Victoria, Australia). Ten micrograms of RNA was labelled with Cy5 dUTP using the indirect amino-allyl cDNA labelling protocol.

[0145] A reference RNA from 12 different cell lines was labelled with Cy3 dUTP. The fluorescently labelled cDNA were purified using a QiaQuick PCR purification kit (Qiagen, Victoria, Australia) according to the manufacturer's protocol.

[0146] DE tumours: Tumours were homogenized and RNA was isolated using RNeasy Mini Kit (Qiagen, Hilden, Germany). cRNA preparation was performed as described previously (9), purified on RNeasy Columns (Qiagen, Hilden, Germany), and eluted in 55 µl of water. Fifteen micrograms of cRNA was fragmented for 35 minutes at 95°C and double stranded cDNA was synthesized with an oligo-dT-T7 primer (Eurogentec, Köln, Germany) and transcribed using the Promega RiboMax T7-kit (Promega, Madison, Wis.) and Biotin-NTP labelling mix (Loxo, Dossenheim, Germany).

Example 4

Microarray Experiments

[0147] NZ tumours: Hybridisation of the labelled target cDNA was performed using MWG Human 30K Array oligonucleotides printed on epoxy coated slides. Slides were blocked with 1% BSA and the hybridisation was done in pre-hybridisation buffer at 42°C for at least 12 hours followed by a high stringency wash. Slides were scanned with a

GenePix Microarray Scanner and data was analyzed using GenePix Pro 4.1 Microarray Acquisition and Analysis Software (Axon, Calif.).

[0148] DE tumours: cRNA was mixed with B2-control oligonucleotide (Affymetrix, Santa Clara, Calif.), eukaryotic hybridization controls (Affymetrix, Santa Clara, Calif.), herring sperm (Promega, Madison, Wis.), buffer and BSA to a final volume of 300 µl and hybridized to one microarray chip (Affymetrix, Santa Clara, Calif.) for 16 hours at 45°C. Washing steps and incubation with streptavidin (Roche, Mannheim, Germany), biotinylated goat-anti streptavidin antibody (Serva, Heidelberg, Germany), goat-IgG (Sigma, Taufkirchen, Germany), and streptavidin-phycoerythrin (Molecular Probes, Leiden, Netherlands) were performed in an Affymetrix Fluidics Station according to the manufacturer's protocol. The arrays were then scanned with a HP-argon-ion laser confocal microscope and the digitized image data were processed using the Affymetrix® Microarray Suite 5.0 Software.

Example 5

Data Pre-Processing

[0149] NZ data: Data pre-processing and normalization was performed in the R computing environment (10). A log transformation was applied to the foreground intensities from each channel of each array. Data from each spot was used on a per array basis to perform print-tip loess normalization via the limma package (11) from the Bioconductor suite of analysis tools (12). Scale normalization (13) was then used to standardize the distribution of log intensity ratios across arrays. Post-normalization cluster analysis revealed the presence of a gene-specific print-run effect in the data. Analysis of variance (ANOVA) normalization was used to estimate and remove print run effects from the data for each gene. Replicate array data was available for 46 of the 149 samples. Cluster analysis of the entire data set indicated that the duplicate arrays clustered well with each other, suggesting internal consistency of the array platform. Genes with low intensity, large differences between replicates (mean log difference between duplicates higher than 0.5), and unknown proteins were removed from the data set. After the initial normalization procedure, a subset of 10,318 genes was chosen for further analysis.

[0150] DE data: All Affymetrix U133A GeneChips passed quality control to eliminate scans with abnormal characteristics, that is, abnormally low or high dynamic range, high perfect match saturation, high pixel noise, grid misalignment problems, and low mean signal to noise ratio. Background correction and normalization were performed in the R computing environment (10, 40). Background corrected and normalized expression measures from probe level data (cel-files) were obtained using the robust multi-array average function (14) implemented in the Bioconductor package affy.

Example 6

Prognostic Signatures and Cross Validation

[0151] Data analysis was performed using the BRB Array Tools package (hypertext transfer protocol://linus.nci.nih.gov/BRB-ArrayTools.html). Gene selection was performed using a random variance model t-test. In the DE data, 318 genes were found to be differentially expressed when using a significance threshold of 0.001. As most of the differentially expressed genes exhibited relatively small changes in expression, a condition requiring the mean log fold change between the two classes to be higher than 1.1 was added to the gene selection process for the DE data. Gene-based prognostic signatures were produced using leave one out cross validation (LOOCV) in each of the NZ and DE data sets. To avoid the problem of over-fitting, both the gene selection and signature construction were performed during each LOOCV iteration. After LOOCV, the prediction rate was estimated by the fraction of samples correctly predicted. In order to find a gene set that could make the best prediction for unknown samples, different t-test thresholds using a random variance model were investigated in conjunction with six classification methods: compound covariate classifier (CCP), diagonal linear discriminant analysis (DLD), 3-nearest neighbours (3-NN), 1-nearest neighbours (1-NN), nearest centroid (NC), and support vector machines (SVM).

[0152] To establish the validity of the NZ and DE prognosis signatures, reciprocal validation was performed, with the NZ signature validated using the DE data set, and vice versa. To test the NZ genes, probes relating to the 22 genes from the NZ signature were identified in the DE data, and LOOCV was used to assess the performance of a signature for the DE samples, based only on these probes. Similarly, probes relating to the 19 genes in the DE signature were identified in the NZ data and LOOCV was used to assess the performance of a signature for the NZ samples. In both cases a significance threshold of 0.999 was used to ensure that all genes were used in each LOOCV iteration. Differences between the platforms (in particular, log-ratio data versus log-intensity data) meant that direct application of a prediction rule across data sets was not feasible. The consequence of this is that only the gene sets, and not the prediction rules used, can be generalized to new samples. The significance of the LOOCV prediction results was calculated by permuting the class labels of the samples and finding the proportion of times that the permuted data resulted in a higher LOOCV prediction rate than that obtained for the unpermuted data. All permutation analysis involved 2000 permutations, with small P-values indicating that prediction results were unlikely to be due to chance.

Example 7

Survival Analysis

[0153] Kaplan-Meier survival analysis for censored data was performed using the survival package within the R computing environment. Survival was defined to be "disease free survival" post surgery. For each analysis, survival curves were constructed, and the log-rank test (15) was used to assess the presence of significant differences between the curves for the two groups in question. Censoring was taken into account for both the NZ and DE datasets. For the disease free survival data, right censoring prior to five years could only occur for non-recurrent patients as a result of either death, or the last clinical follow-up occurring at less than five years. Odds ratios and confidence intervals were produced using the epitools package for R.

Example 8

Identification of Markers Co-Expressed with Chemokine Ligands

[0154] Genes in the DE data which had a Pearson correlation coefficient greater than 0.75 with at least one of the four chemokines appearing in the predictor in the non-relapse group were selected for ontology analysis. Ontology was performed using DAVID (hypertext transfer protocol://apps1.niaid.nih.gov/david/).

Example 9

Results and Analysis

[0155] To identify robust prognostic signatures to predict disease relapse for CRC, two independent sets of samples from NZ and DE were used to generate array expression data sets from separate series of primary tumours with clinical follow-up of five or more years. After normalization, each data set was analyzed using the same statistical methods to generate a prognostic signature, which was then validated on the alternate series of patients. As such, the DE prognostic signature was validated on the NZ data set and the NZ prognostic signature was validated on the DE data set.

Example 10

Exhaustive Identification of Differentially Expressed Markers

[0156] DE Data Set: The BRB Array Tools class comparison procedure was used to detect probes exhibiting statistically significant differences in average intensity between relapse and non-relapse samples. The RVM (random variance model) was again used to produce p-values for each probe in the data set. In this second round, a total of 325 probes were found to be significantly differentially expressed between the two sample classes using an arbitrary significance threshold of 0.05. Note this selection of genes did not apply any fold change threshold, and used a significance cut off of 0.05, rather than the threshold of 0.001 that was used in Example 6. The purpose of this less stringent threshold (p=0.05 instead of p=0.001) was to put forward a larger number of genes for construction of the second round of signatures (see Example 17). These probes represent 270 unique genes (Table 1 and Table 2).

[0157] Explicitly, the test for significance (random variance model) comprises the following: generating a test statistic for each gene which was identical to that of a standard two sample t-test (45) except that the estimate of the pooled variance was obtained by representing the variance structure across all genes as an F-distribution, and then using the parameters, a and b, of this distribution (obtained via maximization of the empirical likelihood function) to form the following estimate of the pooled variance,

$$ s^2 = \frac{(n-2)\, s^2_{pooled} + 2b}{(n-2) + 2a} $$

where $s^2$ is the new estimate of the pooled variance, $s^2_{pooled}$ is the standard estimate of pooled variance (45), n is the number of samples, and a and b are the parameters of the F-distribution (46). Based on the t-statistic formed, a t-distribution with n-2+2a degrees of freedom was used to obtain a p-value for each gene. To adjust for multiple hypothesis testing, the False Discovery Rate controlling procedure of Benjamini and Hochberg (7) was used to produce adjusted p-values for each gene. A gene was considered to have undergone significant differential expression if its adjusted p-value was less than 0.05.

Example 11

Identification of Correlated Markers

[0158] In order to identify additional genes that can be used as prognostic predictors, correlation analysis was carried out using the R statistical computing software package. This analysis revealed 167 probes that had a Pearson correlation coefficient (40, 44, 45) of at least 0.8. Of these probes, 51 were already present in the set of 325 significantly differentially expressed probes, while the remaining 116 were reported as non-significant (using a 0.05 threshold for the FDR, or "false-discovery rate" (47), controlling procedure and the RVM, or random variance model). These 116 probes represent 111 distinct genes (Table 2).

Example 12

Construction of Prognostic Signatures

[0159] The NZ data set was generated using oligonucleotide printed microarrays. Six different signatures were constructed, with a support vector machine (SVM) using a gene selection threshold of 0.0008 yielding the highest LOOCV prediction rate, and producing a 22-gene signature (77% prediction rate, 53% sensitivity, 88% specificity; p=0.002, Tables 7, 8A, and 8B). For Tables 8A and 8B, the gene descriptions are shown in Tables 3 and 4, respectively.

TABLE 7
Construction of prognostic signatures

Data set                                Prediction rate      Sensitivity          Specificity          P value*   Odds ratio

22 gene NZ signature tested on German data
NZ data (training; SVM)                 0.77 (0.66, 0.86)§   0.53 (0.33, 0.73)    0.88 (0.77, 0.95)    0.002      8.4 (3.5, 21.4)
NZ data minus 4 genes not found in      0.72                 0.38                 0.87                 0.011
German data (training; SVM)
German data (test; SVM)                 0.71 (0.51, 0.86)    0.62 (0.32, 0.86)    0.79 (0.52, 0.95)    0.002      5.9 (1.6, 24.5)

19 gene German signature tested on NZ data
German data                             0.84                 0.85                 0.83
German data minus 5 genes not found     0.67                 0.65                 0.66                 0.046
in NZ data (training; 3-NN)
NZ data (test; 3-NN)                    0.67 (0.55, 0.78)    0.42 (0.22, 0.64)    0.78 (0.65, 0.89)    0.045      2.6 (1.2, 6.0)

SVM: support vector machine signature; 3-NN: 3 nearest neighbour signature.
§95% confidence interval
*P values were calculated from 2,000 permutations of class labels
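The prediction rate, sensitivity, specificity and odds ratio columns of Table 7 all derive from a 2x2 table of predicted versus observed recurrence. The following is a minimal sketch of that arithmetic, not the authors' code. The counts are assumptions back-calculated from the reported NZ training rates and the cohort sizes (47 relapsed, 102 non-relapsed); the source does not state them. The odds ratio here is the plain cross-product estimate, so it need not match the published value, which was produced with the epitools package and may rest on a different estimator.

```python
def classification_summary(tp, fn, tn, fp):
    """Summarise a 2x2 table of predicted vs observed recurrence.

    tp: recurrent samples predicted to recur
    fn: recurrent samples predicted not to recur
    tn: non-recurrent samples predicted not to recur
    fp: non-recurrent samples predicted to recur
    """
    total = tp + fn + tn + fp
    return {
        "prediction_rate": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # Plain cross-product odds ratio (an assumption of this sketch).
        "odds_ratio": (tp * tn) / (fp * fn),
    }


# Hypothetical counts, chosen to be consistent with the NZ training row
# (47 relapsed, 102 non-relapsed); they are not given in the source.
summary = classification_summary(tp=25, fn=22, tn=90, fp=12)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With these assumed counts the first three quantities round to the 0.77, 0.53 and 0.88 reported for the NZ training data.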

TABLE 8A
NZ prognostic signature
New Zealand 22-gene prognostic signature

p-value     Gene Symbol   GenBank Acc. No.   Genes not found in German data at time of analysis
2.30E-05    WDR44         NM_019045          *
3.30E-05    RBMS1         NM_016836
6.80E-05    SOAT1         NM_003101
7.90E-05    PBK           NM_018492
0.00014     G3BP2         NM_012297
0.000163    ZBTB20        NM_015642
0.000214    ZNF410        NM_021188          *
0.00022     COMMD2        NM_016094          *
0.000293    PSMC1         NM_002802
0.000321    COX10         NM_001303
0.000334    GTF3C5        NM_012087
0.000367    HMMR          NM_012485
0.000405    UBE2L3        NM_003347
0.000417    GNAS          NM_000516
0.000467    PPP2R2A       NM_002717
0.000493    RNASE2        NM_002934
0.000532    SCOC          NM_032547          *
0.000578    PSMD9         NM_002813
0.000593    EIF3S7        NM_003753
0.000649    ATP2B4        NM_001001396
                          NM_001684
0.000737    ABCC9         NM_020298

TABLE 8B
DE prognostic signature
German 19-gene prognostic signature

p-value     Gene Symbol   GenBank Acc. No.   Genes not found in NZ data at time of analysis
3.00E-06    CXCL10        NM_001565
4.00E-06    FAS           NM_000043
                          NM_152871
                          NM_152872
                          NM_152873
                          NM_152874
                          NM_152875
                          NM_152876
                          NM_152877
8.00E-06    CXCL9         NM_002416          *
1.20E-05    TLK1          NM_012290
1.30E-05    CXCL11        NM_005409
2.10E-05    PBK           NM_018492
4.20E-05    PSAT1         NM_021154
7.60E-05    MAD2L1        NM_002358
9.80E-05    CA2           NM_000067
0.000128    GZMB          NM_004131          *
0.000177    SLC4A4        NM_003759
0.000215    DLG7          NM_014750          *
0.000376    TNFRSF11A     NM_003839
0.00038     KITLG         NM_000899
0.000579    INDO          NM_002164
0.000634    GBP1          NM_002053
0.000919    CXCL13        NM_006419          *
0.000942    CLCA4         NM_012128          *
0.001636    PCP4          NM_006198

[0160] The NZ signature had an odds ratio for disease recurrence in the NZ patients of 8.4 (95% CI 3.5-21.4).

[0161] The DE data set was generated using Affymetrix arrays resulting in a 19-gene (22-probe) and 3-nearest neighbour (3-NN) signature (selection threshold 0.002, log fold change > 1.1, 84% classification rate, 85% sensitivity, 83% specificity, p<0.0001, Tables 3, 4, 7). The DE signature had an odds ratio for recurrence in the DE patients of 24.1 (95% CI 5.3-144.7). Using Kaplan-Meier analysis, disease-free survival in NZ and DE patients was significantly different for those predicted to recur or not recur (NZ signature, p<0.0001, FIG. 2A; DE signature, p<0.0001, FIG. 2B).

Example 13

External Validation of the NZ and DE Prognostic Signatures

[0162] To validate the NZ signature, the 22 genes were used to construct a SVM signature in the DE data set by LOOCV. A prediction rate of 71% was achieved, which was highly significant (p=0.002; Table 7). The odds ratio for recurrence in DE patients, using the NZ signature, was 5.9 (95% CI 1.6-24.5). We surmise that the reduction in prediction rate, from 77% in NZ patients to 71% in DE patients (Table 7), was due to four genes from the NZ signature not being present in the DE data. Disease-free survival for DE patients predicted to relapse, according to the NZ signature, was significantly lower than disease-free survival for patients predicted not to relapse (p=0.0049, FIG. 2C).

[0163] The DE signature was next validated by using the 19 genes to construct a 3-NN signature in the NZ data set by LOOCV. The prediction rate of 67% was again significant (p=0.046; Table 7), confirming the validity of the DE signature. The odds ratio for recurrence in NZ patients, using the DE signature, was 2.6 (95% CI 1.2-6.0). We consider that the reduction of the prediction rate was due to five genes from the DE signature not being present in the NZ data set. This was confirmed when removal of these five genes from the DE data set resulted in a reduction of the LOOCV prediction rate from 84% to 67% (Table 7). Disease-free survival for NZ patients predicted to relapse, according to the DE signature, was significantly lower than disease-free survival for patients predicted not to relapse (p=0.029; FIG. 2D).

Example 14

Comparison of NZ and DE Prognostic Signatures with Current Staging System

[0164] Significant differences in disease-free survival between patients predicted to relapse or not relapse were also observed within the same clinico-pathological stage (FIG. 3). When patient predictions were stratified according to disease stage, the NZ signature was able to identify patients who were more likely to recur in both Stage II (p=0.0013, FIG. 3A), and Stage III subgroups (p=0.0295, FIG. 3A). This was mirrored to a lesser extent when the DE signature was applied to the NZ data set, where the difference was only observed for Stage III patients (p=0.0491, FIG. 3B). Again, the decreased predictive accuracy of the DE signature was likely due to the absence of five genes from the NZ data that decreased the LOOCV prediction rate.

Example 15

Genes in Signatures are Related to CRC Disease Progression

[0165] A number of genes in the NZ signature (Table 3) including G3BP2 (16), RBMS1 (17), HMMR (18), UBE2L3 (19), GNAS (20), RNASE2 (21) and ABCC9 (22) have all been reported to be involved in cancer progression, while RBMS1 (23), EIF3S7 (24) and GTF3C5 (25) are involved in transcription or translation. PBK is a protein kinase, which is involved in the process of mitosis (26), and the only gene common to the NZ and DE signatures. Eleven of 19 genes in the DE signature (Table 4) are involved in the immune response including 4 chemokine ligands (CXCL9, CXCL10, CXCL11, CXCL13; (27)), PBK (28), INDO (29), GBP1 (30), GZMB (31), KITLG (32), and two receptors of the tumor necrosis factor family (TNFRSF11A, FAS; (33)).

[0166] Eighty six genes were found to be moderately correlated (Pearson correlation coefficient >0.75) with at least one of the four chemokine ligands in the DE data. Ontology analysis found that 39 of these 65 genes were in the category of immune response (p<10’). This result suggests a key role for the host immune response in determining CRC recurrence.

Example 16

Discussion of NZ and DE Prognostic Signatures

[0167] It has been shown that the two different prognostic signatures can be used to improve the current prognosis of colorectal cancer.

[0168] For the DE signature, it was surprising and unexpected that the stage I/II samples could be used to predict stage III outcome. It was also surprising that many genes associated with recurrent disease are related to the immune response. The immune response has an important role in the progression of different cancers and T-lymphocyte infiltration in CRC patients is an indicator of good prognosis (36-38). All of the eleven immune response genes (Table 5) were down-regulated in recurrent patients, which would be unexpected based on known biological mechanisms.

[0169] To further confirm these results, 4 chemokine genes were chosen for further analysis. Chemokine ligands not only reflect the activity of the immune system and mediate leukocyte recruitment but also are involved in chemotaxis, cell adhesion and motility, and angiogenesis (36). To investigate the role of the immune response genes, 86 genes co-expressed with the chemokine ligands were identified. Almost half of these genes had a classification within the "immune response" category, suggesting that the primary function of these genes in the recurrence process is the modulation of the immune response. Furthermore, CD4+ and CD8+ T cell antigens (CD8A, CD3, PRF1, TRA@, TRB@) or functionally related antigens, for example, major histocompatibility molecules, interferon gamma induced proteins, and IL2RB, were found in the co-expressed gene list. The activation of tumor specific CD4+ T cells and CD8+ T cells has been shown to result in tumour rejection in a mouse colorectal cancer model (37). Collectively, these findings suggest that the lymphocytes form part of a tumor-specific host response involved in minimising the spread of cells from the primary tumour.

Example 17

Selection of Additional Prognostic Signatures

[0170] The performance of the two prognostic signatures described above was excellent in terms of cross-validation between the two data sets. Further studies were carried out, using a purely statistical approach, to develop a range of signatures, in addition to the aforementioned, that would also predict prognosis for other data sets. One of the additional goals of these studies was to ensure that the method used to normalize the microarray data (robust multi-array average) was not exerting undue influence on the choice of genes.

[0171] FIG. 4 shows the classification rates obtained from signatures of varying lengths. The classification rate is the proportion of correct relapse predictions (expressed as a percentage of total predictions), i.e., the proportion of samples correctly classified. The classification rates were determined using 11-fold cross validation. For this cross validation, a randomly selected stratified sample (i.e. same ratio of recurrent to non-recurrent tumours as the full data set) was removed as a validation set prior to selection of the genes and model construction (using the training set of the remaining 50 samples). Cross-validation was then repeated a further ten times so that all 55 samples appeared in one validation set each. This 11-fold cross-validation process was repeated as 10 replicates, and the results plotted in FIG. 4 and FIG. 5. The classification rates shown were corrected using bootstrap bias correction (43), to give the expected classification rates for the signatures to be applied to another data set. From this analysis, it was ascertained that shorter signatures produced the best classification rate. In addition, analysis of the genes that most frequently appeared in classifiers shows that the discriminatory power was mostly due to the effectiveness of two genes: FAS and ME2. This is illustrated most clearly by FIG. 5, which shows the effectiveness of the signatures once the two genes FAS and ME2 were removed from the data set. For more detail see the legend to FIG. 5.
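The 11-fold cross-validation described above can be illustrated with a short sketch of how the 55 DE samples might be split into stratified validation sets of 5. This is an illustration only: the function name `stratified_folds`, the round-robin dealing strategy and the seed are assumptions, not taken from the source, which does not say how the stratified samples were drawn.

```python
import random


def stratified_folds(labels, n_folds, seed=0):
    """Split sample indices into n_folds validation sets with similar class ratios."""
    rng = random.Random(seed)
    ordered = []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)          # randomise order within each class
        ordered.extend(members)
    # Dealing the class-grouped indices round-robin keeps every fold
    # close to the overall recurrent / non-recurrent ratio.
    return [ordered[k::n_folds] for k in range(n_folds)]


# DE cohort sizes from Table 6: 29 non-recurrent (0) and 26 recurrent (1).
labels = [0] * 29 + [1] * 26
folds = stratified_folds(labels, n_folds=11)

for validation in folds:
    held_out = set(validation)
    training = [i for i in range(len(labels)) if i not in held_out]
    # Gene selection and model construction would use `training` only,
    # so the 5 held-out samples never influence the signature.
    print(len(training), len(validation))
```

Each sample lands in exactly one validation set, and each training set holds the remaining 50 samples, matching the description in the text. Repeating the whole procedure with different seeds would give the replicate runs.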

[0172] The effect of normalization on feature selection was thoroughly investigated by generating gene lists from 1000 stratified sub-samples of the original set of tumours, each time removing 5 samples (i.e. 1/11 of the total number of samples) from the data set. (This is effectively the same as performing 11-fold cross-validation.) A tally was made of the number of times each gene appeared in the "top-n" gene lists (i.e., top 10, top 20, top 100, and top 325). This value was termed the "top count". Top counts were generated using three different normalization methods (40) (FIG. 6), and three different filtering statistics (FIG. 7). There was substantial correlation in the top count between normalization schemes and filtering statistics (41, 42) used. Thus, while normalization and feature selection methods were important, many genes appeared in the gene lists independently of the method used to pre-process the data. This indicates that the choice of normalization method had only a minimal effect on which genes were selected for use in signature construction. The top count, summed across all normalization methods and statistics, was found to be a robust measure of a gene's differential expression between recurrent and non-recurrent tumours.

[0173] Genes from the gene lists (see Table 1 and Table 2) were used to generate signatures by random sampling. The generation of samples was weighted, such that genes with higher "top count" were more likely to be selected. A range of signatures was generated, using between 2 and 55 Affymetrix probes. Signatures were selected if they exhibited >80% median classification rate, using three methods of classifiers: k-nearest neighbours, with k=1; k-nearest neighbours, with k=3; and support vector machines, with a linear kernel function, and using leave-one-out cross-validation.

[0174] On average, longer prognostic signatures were preferred over shorter signatures in terms of ability to predict prognosis for new data sets (FIG. 4 and FIG. 5). The genes FAS and ME2 were also important (discussed above). These two facts were used, along with the fact that short signatures that do not contain either FAS or ME2 perform less effectively, to select candidate signatures as shown in Table 9, below. Signatures were selected (from the pool of randomly generated signatures) if they exhibited >80% median classification rate (using three methods of classifiers: k-nearest neighbours, with k=1; k-nearest neighbours, with k=3; and support vector machines, with a linear kernel function), using leave-one-out cross-validation.

[0175] In addition, because, on average, longer signatures (>10 genes/signature) tended to perform better, we selected signatures with 20 or more genes/signature from a pool of signatures with 30 or more probes/signature. It is expected that these signatures (Table 10) will perform with a classification rate of around 70% when applied to other data sets, on the basis of the results shown in FIGS. 4 and 5. It was found that all of the signatures generated in this way contained ME2, and all but one contained FAS, which may be due to the importance of these genes in providing prediction of prognosis. It was noted that the high classification rate obtained using this approach on the in-house data set did not necessarily mean that these signatures would be expected to perform better than those set forth in Example 12 on other data sets. Rather, the purpose was to produce a range of signatures expected to apply to other data sets at least as well as the previous signatures. The markers comprising the prognostic signatures are set forth in Table 9.
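The weighted random generation of candidate signatures can be sketched as follows. The source says only that genes with a higher top count were more likely to be selected; the specific scheme used here (weighted sampling without replacement via Efraimidis-Spirakis keys), the function name `sample_signature` and all top-count values are assumptions made for illustration.

```python
import random


def sample_signature(top_counts, size, rng):
    """Draw `size` distinct genes, favouring genes with a higher top count."""
    # Each gene gets the key u ** (1 / weight) with u uniform on [0, 1);
    # keeping the largest keys gives weighted sampling without replacement.
    keyed = [
        (rng.random() ** (1.0 / count), gene)
        for gene, count in top_counts.items()
        if count > 0
    ]
    keyed.sort(reverse=True)
    return [gene for _, gene in keyed[:size]]


# Hypothetical top counts; the real values are not given in the text.
top_counts = {
    "FAS": 950, "ME2": 900, "CXCL10": 700, "PBK": 400,
    "STAT1": 350, "WARS": 300, "CA2": 120, "TLK1": 80,
}

rng = random.Random(42)
signatures = [
    sample_signature(top_counts, size=rng.randint(2, 5), rng=rng)
    for _ in range(3)
]
for signature in signatures:
    print(signature)
```

Each candidate produced this way would then be scored by leave-one-out cross-validation and kept only if it met the >80% median classification rate criterion stated in the text.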
TABLE 9
Additional Prognostic signatures (note SVM = support vector machine, 3NN = 3 nearest neighbours, 1NN = 1 nearest neighbour, Sens = sensitivity, Spec = specificity, for prediction of recurrence)

Signature Number   Sens   Spec   Sens   Spec   Sens   Spec

1                  81%    86%    73%    90%    77%    83%
   Signature genes (as gene symbols): WARS, STAT1, EIF4E, PRDX3, PSME2, GMFB, DLGAP4, TYMS, CTSS, MAD2L1, CXCL10, C1QBP, NDUFA9, SLC25A11, HNRPD, ME2, CXCL11, RBM25, CAMSAP1L1, hCAP-D3, BRRN1, ATP5A1, FAS, FLJ13220, PBK, BRIP1

2                  77%    86%    85%    79%    81%    86%
   Signature genes (as gene symbols): WARS, SFRS2, EIF4E, MTHFD2, PSME2, GMFB, DLGAP4, TYMS, LMAN1, CDC40, CXCL10, NDUFA9, SLC25A11, CA2, ME2, FT20, TLK1, CXCL11, RBM25, AK2, FAS, FLJ13220, PBK, PSAT1, STAT1

3                  85%    86%    92%    76%    85%    79%
   Signature genes (as gene symbols): WARS, SFRS2, PRDX3, GMFB, DLGAP4, TYMS, LMAN1, CDC40, CXCL10, NDUFA9, KPNB1, SLC25A11, CA2, ME2, FUT4, CXCL11, GZMB, RBM25, ATP5A1, CDC42BPA, FAS, RBBP4, HNRPD, BRIP1, STAT1

4                  81%    79%    77%    69%    77%    79%
   Signature genes (as gene symbols): WARS, PRDX3, MTHFD2, PSME2, TES, DCK, CDC40, CXCL10, PLK4, NDUFA9, SLC25A11, WHSC1, ME2, CXCL11, SLC4A4, RBM25, ATP5A1, CDC42BPA, FAS, BAZ1A, AGPAT5, FLJ13220, HNRPD, KLHL24, STAT1

5                  88%    83%    88%    83%    88%    76%
   Signature genes (as gene symbols): HNRPD, WARS, MTHFD2, GMFB, DLGAP4, TYMS, CXCL9, IRF8, GTSE1, RABIF, CXCL10, FAS, TRIM25, KITLG, C1QBP, SLC25A11, C17orf25, CA2, ME2, SLC4A4, CXCL11, RBM25, KLHL24, STAT1
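The sensitivity and specificity figures in Table 9 come from leave-one-out cross-validation of each signature. A minimal sketch of that calculation for the k-nearest-neighbour classifiers follows. The toy expression values, the Euclidean distance and the function names are assumptions made for illustration; the source does not state the distance measure, and the real inputs were per-gene median probe intensities.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training samples closest to `query`."""
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]


def loocv_sens_spec(x, y, k):
    """Leave-one-out sensitivity and specificity (1 = recurrent, 0 = non-recurrent)."""
    tp = fn = tn = fp = 0
    for i in range(len(x)):
        # Hold out sample i and classify it from all remaining samples.
        pred = knn_predict(x[:i] + x[i + 1:], y[:i] + y[i + 1:], x[i], k)
        if y[i] == 1:
            tp += pred == 1
            fn += pred == 0
        else:
            tn += pred == 0
            fp += pred == 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy two-gene expression profiles (hypothetical, well separated by class).
x = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
     (5.0, 5.1), (5.2, 4.9), (4.8, 5.0), (5.1, 5.3)]
y = [1, 1, 1, 1, 0, 0, 0, 0]

for k in (1, 3):
    sens, spec = loocv_sens_spec(x, y, k)
    print(f"k={k}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sensitivity is the fraction of recurrent samples classified as recurrent and specificity the fraction of non-recurrent samples classified as non-recurrent, as defined in the text that follows.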

array data) for the probes corresponding to genes that comprise that signature, across both recurrent and non-recurrent samples:

[0178] For k-nearest neighbours, we used leave-one-out cross-validation with k=1 and k=3 to obtain the sensitivity (proportion of positive, i.e. recurrent, samples correctly classified) and specificity (proportion of negative, i.e. non-recurrent, samples correctly classified) described in Table 9.

[0179] The dataset was used to generate leave-one-out cross-validation sensitivity and specificity data using the following support vector machine parameters: the support vector machine models were generated using a linear kernel, and all other parameters used were the default values obtained from the svm function of the e1071 package.

[0180] Note that the genes comprising the signatures were themselves obtained from the list of significantly differentially expressed probes, and from the list of genes which were found to correlate with genes from the NZ 22-gene signature. In some cases there was more than one significant (or correlated) probe per gene. In these cases, the prediction models used the median intensity data across all significant probes (i.e. those in the significant probe list, see Table 1) for that gene.
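The leave-one-out estimate of sensitivity and specificity for k-nearest neighbours, and the per-gene median across probes, can be sketched in Python as follows. The source analysis was carried out in R, so this is an illustrative re-expression and not the original code. The function names, the use of squared Euclidean distance, and the toy intensities are assumptions. The support vector machine step, which relied on the svm function of the e1071 package, is not reproduced.

```python
from collections import Counter
from statistics import median


def collapse_probes(probe_values):
    """Median intensity across all probes of one gene, computed per sample."""
    return [median(values) for values in zip(*probe_values)]


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training samples closest to `query`."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def loocv_sens_spec(x, y, k):
    """Leave-one-out sensitivity and specificity; True marks a recurrent sample.

    Assumes at least one recurrent and one non-recurrent sample are present.
    """
    tp = fn = tn = fp = 0
    for i in range(len(x)):
        pred = knn_predict(x[:i] + x[i + 1:], y[:i] + y[i + 1:], x[i], k)
        if y[i]:
            tp, fn = tp + (1 if pred else 0), fn + (0 if pred else 1)
        else:
            tn, fp = tn + (0 if pred else 1), fp + (1 if pred else 0)
    return tp / (tp + fn), tn / (tn + fp)


# Toy two-gene intensities, invented for illustration only.
toy_x = [[5.0, 1.0], [5.2, 1.1], [4.9, 0.8], [5.1, 1.2],
         [1.0, 5.0], [1.2, 5.1], [0.9, 4.8], [1.1, 5.3]]
toy_y = [True, True, True, True, False, False, False, False]
sens_1nn, spec_1nn = loocv_sens_spec(toy_x, toy_y, k=1)
sens_3nn, spec_3nn = loocv_sens_spec(toy_x, toy_y, k=3)
```

Each sample is held out in turn, classified from the remaining samples, and counted as a true or false positive or negative, which mirrors the definitions of sensitivity and specificity given above.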
[0236] Wherein in the description reference has been made to integers or components having known equivalents, such equivalents are herein incorporated as if individually set forth.

[0237] Although the invention has been described by way of example and with reference to possible embodiments thereof, it is to be appreciated that improvements and/or modifications may be made without departing from the scope thereof.

What is claimed is:

1. A prognostic signature for determining progression of CRC, comprising two or more genes selected from Tables 1 and 2.

2. The signature of claim 1, selected from any one of the signatures in any one of Tables 3, 4 or Table 9.

3. A device for determining prognosis of CRC, comprising:
a substrate having one or more locations thereon, each location having two or more oligonucleotides thereon, each oligonucleotide selected from the group of genes from Tables 1 and 2.

4. The device of claim 3, wherein said two or more oligonucleotides are a prognostic signature selected from any one of Tables 3, 4 or Table 9.

5. A method for determining the prognosis of CRC in a patient, comprising the steps of:
(i) determining the expression level of a prognostic signature comprising two or more genes from Tables 1 and 2 in a CRC tumour sample from the patient,
(ii) applying a predictive model, established by applying a predictive method to expression levels of the predictive signature in recurrent and non-recurrent tumour samples, and
(iii) establishing a prognosis.

6. The method of claim 5, wherein the signature is selected from any one of Tables 3, 4 or Table 9.

7. The method of claim 5, wherein said predictive method is selected from the group consisting of linear models, support vector machines, neural networks, classification and regression trees, ensemble learning methods, discriminant analysis, nearest neighbor method, Bayesian networks, and independent components analysis.

8. The method of any one of claims 5 to 7, wherein the step of determining the expression level of a prognostic signature is carried out by detecting the expression level of mRNA of each gene.

9. The method of any one of claims 5 to 7, wherein the step of determining the expression level of a prognostic signature is carried out by detecting the expression level of cDNA of each gene.

10. The method of claim 9, wherein the step of determining the expression level of a prognostic signature is carried out using a nucleotide complementary to at least a portion of said cDNA.

11. The method of claim 8, wherein the step of determining the expression level of a prognostic signature is carried out using qPCR method using a forward primer and a reverse primer.

12. The method of claim 8, wherein the step of determining the expression level of a prognostic signature is carried out using a device according to claim 3 or claim 4.

13. The method of any one of claims 5 to 7, wherein the step of determining the expression level of a prognostic signature is carried out by detecting the expression level of the protein of each marker.

14. The method of any one of claims 5 to 7, wherein the step of determining the expression level of a prognostic signature is carried out by detecting the expression level of the peptide of each marker.

15. The method of claim 12 or claim 13, wherein said step of detecting is carried out using an antibody directed against each marker.

16. The method of any one of claims 12 to 14, wherein said step of detecting is carried out using a sandwich-type immunoassay method.

17. The method of any one of claims 12 to 15, wherein said antibody is a monoclonal antibody.

18. The method of any one of claims 12 to 15, wherein said antibody is a polyclonal antiserum.

* * * * *