US 2011 013.6683A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2011/013.6683 A1 DaVicioni (43) Pub. Date: Jun. 9, 2011

(54) SYSTEMS AND METHODS FOR Publication Classification EXPRESSION-BASED DISCRIMINATION OF DISTINCT CLINICAL DISEASE STATES IN (51) Int. Cl. PROSTATE CANCER C40B 30/04 (2006.01) C40B 40/06 (2006.01) C4DB 60/2 (2006.01) (75) Inventor: Elai Davicioni, Vancouver (CA) (52) U.S. Cl...... 506/7:506/16:506/39 (73) Assignee: GENOMEDX BIOSCIENCES, (57) ABSTRACT INC., Vancouver, BC (CA) A system for expression-based discrimination of distinct (21) Appl. No.: 12/994,408 clinical disease states in prostate cancer is provided that is based on the identification of sets of transcripts, which are characterized in that changes in expression of each gene (22) PCT Fled: May 28, 2009 transcript within a set of gene transcripts can be correlated with recurrent or non-recurrent prostate cancer. The Prostate (86) PCT NO.: PCTACA2O09/OOO694 Cancer Prognostic system provides for sets of “prostate can cer prognostic' target sequences and further provides for S371 (c)(1), combinations of polynucleotide probes and primers derived (2), (4) Date: Feb. 17, 2011 there from. These combinations of polynucleotide probes can be provided in solution or as an array. The combination of Related U.S. Application Data probes and the arrays can be used for diagnosis. The invention (60) Provisional application No. 61/056,827, filed on May further provides further methods of classifying prostate can 28, 2008. certissue.

P. O Patent Application Publication Jun. 9, 2011 Sheet 1 of 9 US 2011/O136683 A1

P. O

Figure 1A Patent Application Publication Jun. 9, 2011 Sheet 2 of 9 US 2011/O136683 A1

SYS PSA NED

& 8:

&

XXX XXX &

SSSSSSSXX:

Figure 1B Patent Application Publication Jun. 9, 2011 Sheet 3 of 9 US 2011/O136683 A1

Recurrent Non-Recurrent

&S

Figure 1C Patent Application Publication Jun. 9, 2011 Sheet 4 of 9 US 2011/O136683 A1

526 RNA Metagene Mean it SD

- 47

510 + 28

868 - 57

Metagene Value p-0.00001

Figure 2 Patent Application Publication Jun. 9, 2011 Sheet 5 of 9 US 2011/O136683 A1

A 0. -. i.

a. a. i {G8 G8 g

NED PSA SYS

Figure 3 Patent Application Publication Jun. 9, 2011 Sheet 6 of 9 US 2011/O136683 A1

18-RNA POP Scores

OO

90 S&S 80 70 60 50 40 30 20 10

NED PSA SYS Clinical Status

Figure 4 Patent Application Publication Jun. 9, 2011 Sheet 7 of 9 US 2011/O136683 A1

1O-RNA POP Scores

OO 90 - 80

9 60 - c d of 50 40 - 30 2O I 10 --

NED PSA SYS Clinical Status

Figure 5 Patent Application Publication Jun. 9, 2011 Sheet 8 of 9 US 2011/O136683 A1

41-RNA POP Scores

OO 90 8O 70 60 50 40 3O

2O s to a O

NED PSA SYS Clinical Status

Figure 6 Patent Application Publication Jun. 9, 2011 Sheet 9 of 9 US 2011/O136683 A1

148-RNA POP Scores

OO 90 8O

70 8& 60 50 40 30 2O O

NED PSA SYS Clinical Status

Figure 7 US 2011/013.6683 A1 Jun. 9, 2011

SYSTEMS AND METHODS FOR ment. The limitations of using the PSA biomarker and the EXPRESSION-BASED DISCRIMINATION OF absence of additional biomarkers for predicting disease DISTINCT CLINICAL DISEASE STATES IN recurrence have led to the development of statistical models PROSTATE CANCER combining several clinical and pathological features includ ing PSA results. Several of these nomograms have been CROSS-REFERENCE TO RELATED shown to improve the predictive powerfor disease recurrence APPLICATIONS in individual patients over any single independent variable. 0001. This application is a continuation of Application These models (see Citation #14) are used routinely in the No. PCT/CA2009/000694 filed May 28, 2009, now pending, clinic and are currently the best available tools for prediction which claims priority benefit of U.S. Provisional Application of outcomes, although they do not provide high levels of No. 61/056,827, filed May 28, 2008, the entire contents both accuracy for groups of patients with highly similar histologi of which are incorporated herein by reference. cal/pathological features or those at intermediate risk of disease recurrence after prostatectomy. FIELD OF THE INVENTION 0007. The use of quantitative molecular analyses has the 0002 This invention relates to the field of diagnostics and potential to increase the sensitivity, specificity and/or overall in particular to systems and methods for classifying prostate accuracy of disease prognosis and provide a more objective cancer into distinct clinical disease states. basis for determination of risk stratification as compared to conventional clinical-pathological risk models (see Citation BACKGROUND #13). The PSA test demonstrates the deficiencies of relying 0003 Prostate cancer is the most common malignancy on the measurement of any single biomarker in clinically affecting U.S. men, with approximately 240,000 new cases heterogeneous and complex prostate cancer genomes. There diagnosed each year. The incidence of prostate cancer is fore, genomic-based approaches measuring combinations of increasing, in part due to increased Surveillance efforts from biomarkers or a signature of disease recurrence are currently the application of routine molecular testing Such as prostate being investigated as better Surrogates for predicting disease specific antigen (PSA). For most men, prostate cancer is a outcome (see Citations # 1-13). For prostate cancer patients slow-growing, organ-confined or localized malignancy that these efforts are aimed at reducing the number of unnecessary poses little risk of death. The most common treatments for Surgeries for patients without progressive disease and avoid prostate cancer in the U.S. are surgical procedures such as inadvertent under-treatment for higher risk patients. To date, radical prostatectomy, where the entire prostate is removed genomic profiling efforts to identify DNA-based (e.g., copy from the patient. This procedure on its own is highly curative number alterations, methylation changes), RNA-based (e.g., for most but not all men. gene or non-coding RNA expression) or -based (e.g., 0004. The vast majority of deaths from prostate cancer protein expression or modification) signatures, useful for dis occur in patients with metastasis, believed to be present ease prognosis have not however resulted in widespread clini already at the time of diagnosis in the form of clinically cal use. undetectable micro-metastases. In these patients, it is clear 0008. There are several key reasons explaining why prior that prostatectomy alone is not curative and additional thera genomic profiling methods for prostate cancer have not yet pies such as anti-androgen or radiation therapy are required to been incorporated in the clinic. These include the small control the spread of disease and extend the life of the patient. sample sizes typical of individual studies, coupled with varia 0005 Most prostatectomy patients however face uncer tions due to differences in study protocols, clinical heteroge tainty with respect to their prognosis after Surgery: whether or neity of patients and lack of external validation data, which not the initial surgery will be curative several years from the combined have made identifying a robust and reproducible initial treatment because the current methods for assessment disease signature elusive. Specifically for gene or RNA of the clinical risk Such as the various pathological (e.g., expression based prognostic models; the mitigating techno tumor stage), histological (e.g., Gleason's), clinical (e.g., logical limitations include the quality and quantity of RNA Kattan nomogram) and molecular biomarkers (e.g., PSA) are that can be isolated from routine clinical samples. Routine not reliable predictors of prognosis, specifically disease pro clinical samples of prostate cancer include needle-biopsies gression. Routine PSA testing has certainly increased Surveil and Surgical resections that have been fixed in formalin and lance and early-detection rates of prostate cancer and this has embedded in paraffin wax (FFPE). FFPE-derived RNA is resulted in an increased number of patients being treated but typically degraded and fragmented to between 100-300 bp in not significantly decreased the mortality rate. size and without poly-A tails making it of little use for tradi 0006. Despite the controversies surrounding PSA testing tional 3'-biased gene expression profiling, which requires as a screening tool, most physicians confidently rely on PSA larger microgram quantities of RNA with intact poly-A tails testing to assess pre-treatment prognosis and to monitor dis to prime clNA synthesis. ease progression after initial therapy. Successive increases in 0009 Furthermore, as <2% of the genome encodes for PSA levels above a defined threshold value or variations protein, traditional gene expression profiling in fact captures thereof (i.e. Rising-PSA), also known as biochemical recur only a small fraction of the transcriptome and variation in rence has been shown to be correlated to disease progression expression as most RNA molecules that are transcribed are after first-line therapy (e.g., prostatectomy, radiation and not translated into protein but serve other functional roles and brachytherapy). However, less than a /3 of patients with ris non-coding RNAS are the most abundant transcript species in ing-PSA will eventually be diagnosed with systemic or meta the genome. static disease and several studies have shown that after long 0010. This background information is provided for the term follow up, the majority will never show any symptoms of purpose of making known information believed by the appli disease progression aside from increases in PSA measure cant to be of possible relevance to the present invention. No US 2011/013.6683 A1 Jun. 9, 2011

admission is necessarily intended, nor should be construed, means for correlating the expression level with a classifica that any of the preceding information constitutes prior art tion of prostate cancer status; and means for outputting the against the present invention. prostate cancer status. 0017. In accordance with another aspect of the present invention, there is provided a computer-readable medium SUMMARY OF THE INVENTION comprising one or more digitally-encoded expression pattern profiles representative of the level of expression of one or 0011. An object of the present invention is to provide more transcripts of one or more selected from the group systems and methods for expression-based discrimination of of genes set forth in Table 3 and/or 6, each of said one or more distinct clinical disease states in prostate cancer. In accor expression pattern profiles being associated with a value dance with one aspect of the present invention, there is pro vided a system for expression-based assessment of risk of wherein each of said values is correlated with the presence of prostate cancer recurrence after prostatectomy, said system recurrent or non-recurrent prostate cancer. comprising one or more polynucleotides, each of said poly BRIEF DESCRIPTION OF THE DRAWINGS nucleotides capable of specifically hybridizing to a RNA transcript of a gene selected from the group of genes set forth 0018. These and other features of the invention will in Table 3 and/or 6. become more apparent in the following detailed description 0012. In accordance with another aspect of the present in which reference is made to the appended drawings. invention, there is provided a nucleic acid array for expres (0019 FIG.1. A) Principle components analysis (PCA) of Sion-based assessment of prostate cancer recurrence risk, said 2,114 RNAs identified to be differentially expressed between array comprising at least ten probes immobilized on a Solid tumors from patients with differing clinical outcome (see support, each of said probes being between about 15 and Table 2 for comparisons evaluated), PCA plot of 22 prostate about 500 nucleotides in length, each of said probes being cancer tumors shows tight clustering of samples by clinical derived from a sequence corresponding to, or complementary outcome of patients (circles, NED: diamonds, PSA; squares, to, a transcript of a gene selected from the group of genes set SYS). B) Two-way hierarchical clustering dendrogram and forth in Table 3 and/or 6, or a portion of said transcript. expression matrix of 526 target sequences (Table 4) RNAs 0013. In accordance with another aspect of the present filtered using linear regression (p<0.01) to identify RNAs that invention, there is provided a method for expression-based followed either SYS)PSA>NED or NEDDPSA>SYS trend assessment of prostate cancer recurrence, said method com in differential expression. C) Two-way hierarchical clustering prising: (a) determining the expression level of one or more dendrogram and expression matrix of 148 target sequences transcripts of one or more genes in a test sample obtained (Table 5), a subset of the most differentially expressed tran from said Subject to provide an expression pattern profile, Scripts between patients with clinically significant recurrent (i.e., SYS) and non-recurrent (i.e., PSA and NED) dis said one or more genes selected from the group of genes set ease as filtered using a t-test (p<0.001). For B) and C), sample forth in Table 3 and/or 6, and (c) comparing said expression and RNAs were optimally ordered using Pearson's correla pattern profile with a reference expression pattern profile. tion distance metric with complete-linkage cluster distances 0014. In accordance with another aspect of the present and the expression of each RNA in each sample was normal invention, there is provided a kit for characterizing the ized in the heatmap by the number of standard deviations expression of one or more nucleic acid sequences depicted in above (blacker) and below (whiter) the median expression SEQID NOs: 1-2114 comprising one or more nucleic acids value (grey) across all samples. selected from (a) a nucleic acid depicted in any of SEQ ID 0020 FIG. 2. Histograms showing distribution patient's NOs: 1-2114; (b) an RNA form of any of the nucleic acids tumor expression levels of a metagene generated from a depicted in SEQID NOs: 1-2114; (c) a peptide nucleic acid linear combination of the 526 RNAs for each clinical group. form of any of the nucleic acids depicted in SEQ ID NOs: The histograms bin Samples with similar metagene expres 1-2114; (d) a nucleic acid comprising at least 20 consecutive sion values and significantly separate three modes of patient bases of any of (a-c); (e) a nucleic acid comprising at least 25 metagene scores (ANOVA, p<0.000001) corresponding to consecutive bases having at least 90% sequence identity to the three clinical status groups evaluated. any of (a-c); or (f) a complement to any of (a-e); and option 0021 FIG. 3. Scatter plots summarizing the mean (tstan ally instructions for correlating the expression level of said dard deviation) of metagene expression values for tumor one or more nucleic acid sequences with the disease state of samples from patients in the three clinical status groups prostate cancer tissue. (NED; PSA; SYS). Metagenes were generated from a linear 0015. In accordance with another aspect of the present combinations of 6 (A), 18 (0) or 20 () RNAs and demon invention, there is provided an array of probe nucleic acids strate highly significant differential expression between clini certified for use in expression-based assessment of prostate cal groups (ANOVA, p<0.000001). cancer recurrence risk, wherein said array comprises at least 0022 FIG. 4. Box plots showing interquartile range and two different probe nucleic acids that specifically hybridize to distribution of POP scores for each clinical group using an corresponding different target nucleic acids depicted in one of 18-target sequence metagene (Table 7) to derive patient out SEQID NOs: 1-2114, an RNA form thereof, or a complement come predictor scores scaled and normalized on a data range to either thereof. of 0-100 points. T-tests were used to evaluate the statistical 0016. In accordance with another aspect of the present significance of differences in POP scores between NED and invention, there is provided a device for classifying a biologi PSA (*) as well as between PSA and SYS (*) clinical groups cal sample from a prostate cancer as recurrent or non-recur (p<7x107 and p<1x10, respectively). rent, the device comprising means for measuring the expres 0023 FIG. 5. Box plots showing interquartile range and sion level of one or more transcripts of one or more genes distribution of POP scores for each clinical group using a selected from the group of genes set forth in Table 3 and/or 6: 10-target sequence metagene (Table 9) to derive patient out US 2011/013.6683 A1 Jun. 9, 2011

come predictor scores scaled and normalized on a data range can be correlated with an increased likelihood of a long-term of 0-100 points. T-tests were used to evaluate the statistical NED prognosis or low risk of prostate cancer recurrence. significance of differences in POP scores between recurrent Increased relative expression of one or more target sequences (i.e., SYS) and non-recurrent (i.e., PSA and NED) in a SYS Group corresponding to the sequences as set forth patient groups (**, p<4x10'). in SEQID NOs: 914-2114 is indicative of or predictive of an 0024 FIG. 6. Box plots showing interquartile range and aggressive form of prostate cancer and can be correlated with distribution of POP scores for each clinical group using a an increased likelihood of a long-term SYS prognosis or high 41-target sequence metagene (Table 10) to derive patient risk of prostate cancer recurrence. Optionally, intermediate outcome predictor scores scaled and normalized on a data relative levels of one or more target sequences in a PSA range of 0-100 points. T-tests were used to evaluate the sta Group corresponding to target sequences set forth in Table 7 tistical significance of differences in POP scores between is indicative of or predictive of biochemical recurrence. Sub recurrent (i.e., SYS) and non-recurrent (i.e., PSA and sets and combinations of these target sequences or probes NED) patient groups (*, p<2x10'). complementary thereto may be used as described herein. 0025 FIG. 7. Box plots showing interquartile range and 0030. Before the present invention is described in further distribution of POP scores for each clinical group using a detail, it is to be understood that this invention is not limited 148-target sequence metagene to derive patient outcome pre to the particular methodology, compositions, articles or dictor scores scaled and normalized on a data range of 0-100 machines described, as Such methods, compositions, articles points. T-tests were used to evaluate the statistical signifi or machines can, of course, vary. It is also to be understood cance of differences in POP scores between recurrent (i.e., that the terminology used herein is for the purpose of describ SYS) and non-recurrent (i.e., PSA and NED) patient ing particular embodiments only, and is not intended to limit groups (*, p<9x10'). the scope of the present invention. DETAILED DESCRIPTION OF THE INVENTION DEFINITIONS 0026. The present invention provides a system and method 0031. Unless defined otherwise or the context clearly dic for assessing prostate cancer recurrence risk by distinguish tates otherwise, all technical and Scientific terms used herein ing clinically distinct disease states in men with prostate have the same meaning as commonly understood by one of cancer at the time of initial diagnosis or Surgery. The system ordinary skill in the art to which this invention belongs. In and methods are based on the identification of gene tran describing the present invention, the following terms will be scripts following a retrospective analysis of tumor samples employed, and are intended to be defined as indicated below. that are differentially expressed in prostate cancer in a manner 0032. The term “polynucleotide' as used herein refers to a dependent on prostate cancer aggressiveness as indicated by polymer of greater than one nucleotide in length of ribo long-term post-prostatectomy clinical outcome. These gene nucleic acid (RNA), deoxyribonucleic acid (DNA), hybrid transcripts can be considered as a library which can be used as RNA/DNA, modified RNA or DNA, or RNA or DNA mimet a resource for the identification of sets of specific target ics, including peptide nucleic acids (PNAS). The polynucle sequences (“prostate cancer prognostic sets’), which may otides may be single- or double-stranded. The term includes represent the entire library of gene transcripts or a Subset of polynucleotides composed of naturally-occurring nucleo the library and the detection of which is indicative of prostate bases, Sugars and covalent internucleoside (backbone) link cancer recurrence risk. The invention further provides for ages as well as polynucleotides having non-naturally-occur probes capable of detecting these target sequences and prim ring portions which function similarly. Such modified or ers that are capable of amplifying the target sequences. substituted polynucleotides are well-known in the art and for 0027. In accordance with one embodiment of the inven the purposes of the present invention, are referred to as “ana tion, the system and method for assessing prostate cancer logues.” recurrence risk are prognostic for a post Surgery clinical out 0033 “Complementary' or “substantially complemen come selected from no evidence of disease (NED), bio tary” refers to the ability to hybridize or between chemical relapse (two Successive increases in prostate-spe nucleotides or nucleic acids, such as, for instance, between a cific antigen levels; (PSA) and systemic prostate cancer sensor peptide nucleic acid or polynucleotide and a target systemic metastases (SYS). polynucleotide. Complementary nucleotides are, generally, A 0028. In accordance with one embodiment of the inven and T (or A and U), or C and G. Two single-stranded poly tion, the target sequences comprised by the prostate cancer nucleotides or PNAS are said to be substantially complemen prognostic set are sequences based on or derived from the tary when the bases of one strand, optimally aligned and gene transcripts from the library, or a subset thereof. Such compared and with appropriate insertions or deletions, pair sequences are occasionally referred to herein as "probe selec with at least about 80% of the bases of the other strand, tion regions” or “PSRs.” In another embodiment of the inven usually at least about 90% to 95%, and more preferably from tion, the target sequences comprised by the prostate classifi about 98 to 100%. cation set are sequences based on the gene transcripts from 0034. Alternatively, substantial complementarity exists the library, or a subset thereof, and include both coding and when a polynucleotide will hybridize under selective hybrid non-coding sequences. ization conditions to its complement. Typically, selective 0029. In one embodiment, the systems and methods pro hybridization will occur when there is at least about 65% vide for the molecular analysis of the expression levels of one complementarity over a stretch of at least 14 to 25 bases, for or more of the target sequences as set forth in SEQID NOs: example at least about 75%, or at least about 90% comple 1-2114 (Table 4). Increased relative expression of one or mentarity. See, M. Kanehisa Nucleic Acids Res. 12:203 more target sequences in a NED Group corresponding to the (1984). sequences as set forth in SEQID NOs: 1-913 is indicative of 0035 “Preferential binding” or “preferential hybridiza or predictive of a non-recurrent form of prostate cancer and tion” refers to the increased propensity of one polynucleotide US 2011/013.6683 A1 Jun. 9, 2011 to bind to its complement in a sample as compared to a understood that such a variation is always included in any noncomplementary polymer in the sample. given value provided herein, whether or not it is specifically 0036) Hybridization conditions will typically include salt referred to. concentrations of less than about 1M, more usually less than 0043. Use of the singular forms “a,” “an,” and “the about 500 mM, for example less than about 200 mM. In the include plural references unless the context clearly dictates case of hybridization between a peptide nucleic acid and a otherwise. Thus, for example, reference to “a polynucleotide' polynucleotide, the hybridization can be done in solutions includes a plurality of polynucleotides, reference to “a target” containing little or no salt. Hybridization temperatures can be includes a plurality of such targets, reference to “a normal as low as 5°C., but are typically greater than 22°C., and more ization method’ includes a plurality of such methods, and the typically greater than about 30°C., for example in excess of like. Additionally, use of specific plural references, such as about 37° C. Longer fragments may require higher hybrid “two.” “three.” etc., read on larger numbers of the same sub ization temperatures for specific hybridization as is known in ject, unless the context clearly dictates otherwise. the art. Other factors may affect the stringency of hybridiza 0044 Terms such as “connected,” “attached,” “linked' tion, including base composition and length of the comple and 'conjugated' are used interchangeably herein and mentary strands, presence of organic solvents and extent of encompass direct as well as indirect connection, attachment, base mismatching, and the combination of parameters used is linkage or conjugation unless the context clearly dictates more important than the absolute measure of any one alone. otherwise. Other hybridization conditions which may be controlled 0045. Where a range of values is recited, it is to be under include buffer type and concentration, solution pH, presence stood that each intervening integer value, and each fraction and concentration of blocking reagents to decrease back thereof, between the recited upper and lower limits of that ground binding such as repeat sequences or blocking protein range is also specifically disclosed, along with each subrange Solutions, detergent type(s) and concentrations, molecules between such values. The upper and lower limits of any range such as polymers which increase the relative concentration of can independently be included in or excluded from the range, the polynucleotides, metal ion(s) and their concentration(s), and each range where either, neither or both limits are chelator(s) and their concentrations, and other conditions included is also encompassed within the invention. Where a known in the art. value being discussed has inherent limits, for example where 0037 “Multiplexing herein refers to an assay or other a component can be present at a concentration of from 0 to analytical method in which multiple analytes can be assayed 100%, or where the pH of an aqueous solution can range from simultaneously. 1 to 14, those inherent limits are specifically disclosed. Where 0038 A “target sequence” as used herein (also occasion a value is explicitly recited, it is to be understood that values ally referred to as a "PSR' or “probe selection region”) refers which are about the same quantity or amount as the recited to a region of the genome against which one or more probes value are also within the scope of the invention, as are ranges can be designed. As used herein, a probe is any polynucle based thereon. Where a combination is disclosed, each sub otide capable of selectively hybridizing to a target sequence combination of the elements of that combination is also spe or its complement, or to an RNA version of either. A probe cifically disclosed and is within the scope of the invention. may comprise ribonucleotides, deoxyribonucleotides, pep Conversely, where different elements or groups of elements tide nucleic acids, and combinations thereof. A probe may are disclosed, combinations thereofare also disclosed. Where optionally comprise one or more labels. In some embodi any element of an invention is disclosed as having a plurality ments, a probe may be used to amplify one or both strands of of alternatives, examples of that invention in which each a target sequence or an RNA form thereof, acting as a sole alternative is excluded singly or in any combination with the primer in an amplification reaction or as a member of a set of other alternatives are also hereby disclosed; more than one primers. element of an invention can have such exclusions, and all 0039) “Having” is an openended phrase like "comprising combinations of elements having such exclusions are hereby and including.” and includes circumstances where addi disclosed. tional elements are included and circumstances where they are not. Prostate Cancer Prognostic System 0040) "Optional' or “optionally’ means that the subse 0046) The system of the present invention is based on the quently described event or circumstance may or may not identification of a library of gene and RNA transcripts that are occur, and that the description includes instances where the differentially expressed in prostate cancer in a manner depen event or circumstance occurs and instances in which it does dent on prostate cancer aggressiveness as indicated by the not post-prostatectomy clinical outcome of the patient. For 0041 As used herein NED describes a clinically distinct example, relative over expression of one or more of the gene disease state in which patients show no evidence of disease transcripts in a prostate cancer sample compared to a refer (NED) at least 5 years after surgery, PSA describes a ence sample or expression profile or signature there from may clinically distinct disease state in which patients show bio be prognostic of a clinically distinct disease outcome post chemical relapse only (two successive increases in prostate prostatectomy selected from no evidence of disease (NED), specific antigen levels but no other symptoms of disease with biochemical relapse (PSA) and prostate cancer disease sys at least 5 years follow up after surgery; PSA) and SYS temic recurrence or metastases (SYS). The reference describes a clinically distinct disease state in which patients sample can be, for example, from prostate cancer sample(s) of develop biochemical relapse and present with systemic pros one or more references subject(s) with a known post-pros tate cancer disease or metastases (SYS) within five years tatectomy clinical outcomes. The reference expression profile after the initial treatment with radical prostatectomy. or signature may optionally be normalized to one or more 0042. As used herein, the term “about” refers to approxi appropriate reference gene transcripts. Alternatively or in mately a +/-10% variation from a given value. It is to be addition to, expression of one or more of the gene transcripts US 2011/013.6683 A1 Jun. 9, 2011

in a prostate cancer Sample may be compared to an expression tor; Ataxin 1: ATM/ATR-Substrate Chk2-Interacting Zn2+- profile or signature from normal prostate tissue. finger protein; ATPase, Class I, type 8B, member 1: ATPase, 0047 Expression profiles or signatures from prostate can Na+/K+ transporting, alpha 1 polypeptide; ATP-binding cas cer samples may be normalized to one or more housekeeping sette, sub-family F (GCN20), member 1: Autism susceptibil gene transcripts such that normalized over and/or under ity candidate 2: Baculoviral IAP repeat-containing 6 (apol expression of one or more of the gene transcripts in a sample lon); Basonuclin 2: Brain-specific angiogenesis inhibitor 3: may be indicative of a clinically distinct disease state or Bromodomain containing 7: Bromodomain containing 8: prognosis. Bromodomain PHD finger transcription factor; BTB (POZ) domain containing 16; BTB (POZ) domain containing 7; Prostate Prognostic Library Calcium activated nucleotidase 1, Calcium binding protein 0048. The Prostate Prognostic Library in accordance with P22; Calcium channel, Voltage-dependent, beta 4 subunit; the present invention comprises one or more gene or RNA Calcium channel, Voltage-dependent, L type, alpha 1 C Sub transcripts whose relative and/or normalized expression is unit, Calcium channel, Voltage-dependent, L type, alpha 1D indicative of prostate cancer recurrence and which may be Subunit; Calcyclin binding protein; Calmodulin 1 (phospho prognostic for post-prostatectomy clinical outcome of a rylase kinase, delta); Calsyntenin 1; Carbonyl reductase 3: patient. Exemplary RNA transcripts that showed differential Cardiolipin synthase 1: Carnitine palmitoyltransferase 1A expression in prostate cancer samples from patients with (liver); Casein kinase 1, delta; Casein kinase 1, gamma 1: clinically distinct disease outcomes after initial treatment Casein kinase 1, gamma 3: Caspase 1, apoptosis-related cys with radical prostatectomy are shown in Table 3. In one teine peptidase (interleukin 1, beta, convertase); CD109 mol embodiment of the invention, the library comprises one or ecule; CD99 molecule-like 2: CDK5 regulatory subunit asso more of the gene transcripts of the genes listed in Table 3. ciated protein 2: CDP-diacylglycerol synthase 0049. In one embodiment, the library comprises at least (phosphatidate cytidylyltransferase) 2: Cell adhesion mol one transcript from at least one gene selected from those listed ecule 1: Cell division cycle and apoptosis regulator 1: Cen in Table 3. In one embodiment, the library comprises at least trosomal protein 70 kDa; Chloride channel 3: Chromodomain one transcript from each of at least 5 genes selected from helicase DNA binding protein 2: Chromodomain helicase those listed in Table 3. In another embodiment, the library DNA binding protein 6: Chromodomain protein, Y-like 2: comprises at least one transcript from each of at least 10 genes 1 ORF 116; Chromosome 1 ORF52; Chromo selected from those listed in Table 3. Inafurther embodiment, some 10 ORF 118; Chromosome 12 ORF30; Chromosome the library comprises at least one transcript from each of at 13 ORF23; Chromosome 16 ORF 45: Chromosome 18 ORF least 15 genes selected from those listed in Table 1. In other 1: Chromosome 18 ORF 1: Chromosome 18 ORF 1: Chro embodiments, the library comprises at least one transcript mosome 18 ORF 1: Chromosome 18 ORF 17: Chromosome from each of at least 20, at least 25, at least 30, at least 35, at 2 ORF3; Chromosome 20 ORF 133; Chromosome 21 ORF least 40, at least 45, at least 50, at least 55, at least 60 and at 25; Chromosome 21 ORF 34; Chromosome 22 ORF 13: least 65 genes selected from those listed in Table 3. In a Chromosome 3 ORF 26; Chromosome 5 ORF3; Chromo further embodiment, the library comprises at least one tran some 5 ORF 33: Chromosome 5 ORF 35; Chromosome 5 script from all of the genes listed in Table 3. In a further ORF 39; Chromosome 7 ORF 13; Chromosome 7 ORF 42: embodiment, the library comprises at all transcripts from all Chromosome 9 ORF 3: Chromosome 9 ORF 94; Chromo of the genes listed in Table 3. some Y ORF 15B; Chymase 1, mast cell; Citrate lyase beta 0050. In one embodiment, the library comprises at least like; Class II, major histocompatibility complex, transactiva one transcript from at least one gene selected from the group tor; C-Maf-inducing protein; Coatomer protein complex, consisting of NM 001004722); NM 001005522); NM subunit alpha; Cofilin 2 (muscle); Coiled-coil domain con 001013671); NM 001033517); NM 183049); NM taining 50: Coiled-coil domain containing 7: Coiled-coil 212559: 5'-3' exoribonuclease 1: A kinase (PRKA) anchor helix-coiled-coil-helix domain containing 4: Cold shock protein (yotiao) 9; AarF domain containing kinase 4: Abhy domain containing E1, RNA-binding; Collagen, type XII. drolase domain containing 3: Aconitase 1, soluble; Actinin, alpha 1: Complement component 1, r Subcomponent-like; alpha 1: ADAM metallopeptidase domain 19 (meltrin beta); Core-binding factor, runt domain, alpha subunit 2; translo Adaptor-related protein complex 1, gamma 2 subunit; cated to. 2: CREB binding protein (Rubinstein-Taybi syn Adenosine deaminase, RNA-specific, B2 (RED2 homolog drome); CTD (carboxy-terminal domain, RNA polymerase rat); Adenylate cyclase 3; ADP-ribosylation factor GTPase II, polypeptide A) small phosphatase 2: CTD (carboxy-ter activating protein 3; ADP-ribosylation factor guanine nucle minal domain, RNA polymerase II, polypeptide A) Small otide-exchange factor 2 (brefeldin A-inhibited); ADP-ribosy phosphatase-like: CUG triplet repeat, RNA binding protein2: lation factor-like 4D: Adrenergic, beta, receptor kinase 2; Cullin 3: Cut-like 2: Cyclin F: Cyclin Y: Cysteine-rich with AF4/FMR2 family, member 3: Amyloid beta (A4) precursor EGF-like domains 1: Cytochrome P450, family 4, subfamily protein-binding, family B, member 1 (Feó5); Anaphase pro F. polypeptide 11; Cytoplasmic FMR1 interacting protein 2: moting complex subunit 1: Ankyrin 3, node of Ranvier DAZ interacting protein 1-like; DCP2 decapping enzyme (ankyrin G); Ankyrin repeat domain 15: Ankyrin repeat homolog (S. cerevisiae): DEAD box polypeptide 47; DEAD domain 28; Annexin A1, Annexin A2; Anterior pharynx box polypeptide 5: DEAD box polypeptide 52: DEAD box defective 1 homolog B (C. elegans); Anthraxtoxin receptor 1; polypeptide 56: Death inducer-obliterator 1: Dedicator of Antizyme inhibitor 1; Arachidonate 12-lipoxygenase, 12R cytokinesis 2: DEP domain containing 1B: DEP domain con type; Arginine vasopressin receptor 1A: Arginine-glutamic taining 2: DEP domain containing 6; Development and dif acid dipeptide (RE) repeats; ARP3 actin-related protein 3 ferentiation enhancing factor 1; Diacylglycerollipase, alpha; homolog (yeast); Arrestin 3, retinal (X-arrestin); Arrestin Diaphanous homolog 2 (Drosophila); Dickkopf homolog 3. domain containing 1; Aryl hydrocarbon receptor interacting Dihydropyrimidine dehydrogenase; Dipeptidyl-peptidase protein-like 1; Aryl hydrocarbon receptor nuclear transloca 10; Discs, large homolog 2, chapsyn-1 10: Dishevelled, dsh US 2011/013.6683 A1 Jun. 9, 2011 homolog 2: DnaJ (Hsp40) homolog, subfamily C, member 6: tal); Leptin receptor overlapping transcript-like 1; Leucine Dpy-19-like 3: Dual specificity phosphatase 5: Ectodysplasin rich repeat containing 16; Leucine-rich repeat kinase 1: Leu A receptor; Ectonucleoside triphosphate diphosphohydrolase cine-rich repeat-containing G protein-coupled receptor 4: 7; EGFR-coamplified and overexpressed protein; ELL asso LIM domain 7: Major histocompatibility complex, class II, ciated factor 1; Emopamil binding protein (sterol isomerase); DR beta 1; Malignant fibrous histiocytoma amplified Enabled homolog; Ephrin-A5; ER lipid raft associated 1: sequence 1; Maltase-glucoamylase (alpha-glucosidase); Erythroblast membrane-associated protein (Scianna blood Mannosidase, alpha, class 2A, member 1; Mannosyl (alpha group); Erythrocyte membrane protein band 4.1 like 4A; 1.6-)-glycoprotein beta-1,6-N-acetyl-glucosaminyltrans Etoposide induced 2.4 mRNA; Eukaryotic translation initia ferase: MBD2-interacting zinc finger; Melanin-concentrat tion factor 4E family member 3: FAD1 flavin adenine dinucleotide synthetase homolog; Family with sequence ing hormone receptor 1: Methionyl-tRNA synthetase; Methyl similarity 110, member A. Family with sequence similarity CpG binding protein 2: Methyl-CpG binding domain protein 114, member A1: Family with sequence similarity 135, mem 5; Microcephaly, primary autosomal recessive 1; Microsemi ber A. Family with sequence similarity 40, member A. Family noprotein, beta-; Microtubule-associated protein 1B: Micro with sequence similarity 80, member B; F-box and leucine tubule-associated protein 2: Minichromosome maintenance rich repeat protein 11; F-box and leucine-rich repeat protein complex component 3 associated protein; Mitochondrial 7; F-box protein 2; Ferritin, heavy polypeptide 1: Fibronectin ribosomal protein S15; Mohawk homeobox; Monooxyge nase, DBH-like 1; MORN repeat containing 1; Muscle RAS type III domain containing 3A: Fibronectin type III domain oncogene homolog; Muscleblind-like; Myelin protein Zero containing 3B; Fibulin 1: FLJ25476 protein: FLJ41603 pro like 1; Myeloid/lymphoid or mixed-lineage leukemia (tritho tein; Forkhead box J3; Forkhead box J3; Forkhead box K1, rax homolog, Drosophila); translocated to. 4; Myocyte Forkhead box P1: Frizzled homolog3; Frizzled homolog 5: G enhancer factor 2B: Myosin IF: Myosin, heavy chain 3, skel protein-coupled receptor kinase interactor 2; GABAA recep etal muscle, embryonic; N-acetylgalactosaminidase, alpha-, tor, delta; GATA binding protein 2; GDNF family receptor N-acetylglucosamine-1-phosphate transferase, alpha and alpha 2: Gelsolin (amyloidosis, Finnish type); Genethonin 1: beta Subunits; Nascent polypeptide-associated complex alpha Glucose phosphate isomerase; Glucose-fructose oxidoreduc subunit; NECAP endocytosis associated 2: Necdinhomolog; tase domain containing 1: Glucosidase, beta (bile acid) 2; Neural precursor cell expressed, developmentally down Glutamate dehydrogenase 1: Glutaminase; Glutamyl ami regulated 9; Neuregulin 1: Neuron navigator 1: Nibrin; Nico nopeptidase (aminopeptidase A); Glutathione reductase tinamide N-methyltransferase: NIMA (never in mitosis gene Glycogen synthase kinase 3beta; Grainyhead-like 2: Gremlin a)-related kinase 6: NLR family, CARD domain containing 5: 1, cysteine knot Superfamily, homolog; GTPase activating NOL1/NOP2/Sun domain family, member 3: NOL1/NOP2/ protein (SH3 domain) binding protein 1; Hairy/enhancer-of Sun domain family, member 3: NOL1/NOP2/Sun domain split related with YRPW motif 2: Heparan sulfate 6-O-sul family, member 6: Nuclear receptor coactivator 2; Nuclear fotransferase 3: Hermansky-Pudlak syndrome 5: Heteroge receptor coactivator 6: Nuclear receptor Subfamily 2, group F, neous nuclear ribonucleoprotein C(C1/C2) member 2; Nuclear receptor subfamily 3, group C, member 2: Hippocalcin-like 1; Histocompatibility (minor) 13: Histone Nuclear receptor subfamily 4, group A, member 2; Nuclear cluster 1, H3d; Histone deacetylase 6: Homeobox A1; transcription factor, X-box binding-like 1: Nucleolar and Homeobox and leucine Zipper encoding; Host cell factor C1 coiled-body phosphoprotein 1: Overexpressed in colon car (VP16-accessory protein); Hyaluronan binding protein 4; cinoma-1; PAN3 polyA specific ribonuclease subunit Hyperpolarization activated cyclic nucleotide-gated potas homolog; PAP associated domain containing 1; ParaoXonase sium channel 3: Hypothetical gene supported by AK128346; 2: Paraspeckle component 1; PCTAIRE protein kinase 2: Hypothetical LOC51149; Hypothetical protein FLJ12949; Peptidase D.; Pericentrin (kendrin); Peroxisomal biogenesis Hypothetical protein FLJ20035: Hypothetical protein factor 19; PHD finger protein 8: Phosphatidic acid phos FLJ20309: Hypothetical protein FLJ38482; Hypothetical phatase type 2 domain containing 3: Phosphatidylinositol protein HSPC148: Hypothetical protein LOC130576; Hypo 4-kinase, catalytic, alpha polypeptide; Phosphatidylinositol thetical protein LOC285908: Hypothetical protein glycan anchor biosynthesis, class O. Phosphatidylinositol LOC643155: Iduronidase, alpha-L-IKAROS family Zinc fin transfer protein, beta; Phosphodiesterase 4D. cAMP-specific; ger 1 (Ikaros); IlvB (bacterial acetolactate synthase)-like: Phosphoglucomutase 5: Phosphoglycerate mutase family Inhibitor ofkappa light polypeptide gene enhancer in B-cells, member 5: Phosphoinositide-3-kinase, class 2, beta polypep kinase beta; Inositol polyphosphate-4-phosphatase, type II; tide; Phospholipase A2, group IVB (cytosolic); Phospholi Integrin, alpha 4 (antigen CD49D, alpha 4 subunit of VLA-4 pase C, beta 1 (phosphoinositide-specific); Phospholipase C, receptor); Integrin, alpha 6; Integrin, alpha 9: Integrin, alpha gamma 2 (phosphatidylinositol-specific); Phosphorylase V (vitronectin receptor, alpha polypeptide, antigen CD51); kinase, beta; Plasminogen activator, tissue; Platelet-activat Integrin, beta-like 1 (with EGF-like repeat domains); Inter ing factor acetylhydrolase, isoform Ib, alpha subunit 45 kDa; alpha (globulin) inhibitor H3; Interleukin enhancer binding Pleckstrin homology domain containing, family A (phos factor 3, 90 kDa. Intestine-specific homeobox phoinositide binding specific) member 3: Pleckstrin homol Intraflagellar transport 172 homolog; Janus kinase 1: Jumonji ogy domain containing, family A member 7: Pleckstrin domain containing 1B: Jumonji domain containing 2B: homology domain containing, family G (with RhoGef Jumonji domain containing 2C; Kalirin, RhoGEF kinase; domain) member 3: Pleckstrinhomology domain containing, Kallikrein-related peptidase 2: Karyopherin alpha 3 (impor family H (with MyTH4 domain) member 1: Poly (ADP tin alpha 4); Keratinocyte associated protein 2: KIAA0152; ribose) polymerase family, member 16; Poly (ADP-ribose) KIAA0241; KIAA0319-like; KIAA0495; KIAA0562; polymerase family, member 2; Poly(A) polymerase alpha; KIAA0564 protein; KIAA1217; KIAA1244: KIAA1244: La Poly(A)-specific ribonuclease (deadenylation nuclease); ribonucleoprotein domain family, member 1, Lamin A/C. Polymerase (DNA directed) nu; Polymerase (DNA directed), LATS, large tumor Suppressor, homolog 2: Leiomodin 3 (fe gamma 2, accessory subunit; Polymerase (RNA) II (DNA US 2011/013.6683 A1 Jun. 9, 2011

directed) polypeptide L; Polymerase (RNA) III (DNA tide repeat domain 23; Thioredoxin-like 2: THUMP domain directed) polypeptide E; Polymerase I and transcript release containing 3: TIMELESS interacting protein; TOX high factor; Potassium channel tetramerisation domain containing mobility group box family member 4: Trafficking protein, 1; Potassium channel tetramerisation domain containing 2; kinesin binding 1: Transcription factor 7-like 1 (T-cell spe Potassium channel tetramerisation domain containing 7; cific, HMG-box); Transcription factor 7-like 2 (T-cell spe Potassium channel, subfamily K, member 1: Presenilin 1: cific, HMG-box); Translocase of inner mitochondrial mem PRKR interacting protein 1: Procollagen-lysine, 2-oxoglut brane 13 homolog: Translocated promoter region (to arate 5-dioxygenase 2: ProSAPiP1 protein; Prostaglandin E activated MET oncogene); Translocation associated mem synthase 3 (cytosolic); Protease, serine, 2 (trypsin. 2); Protein brane protein 2: Transmembrane 9 superfamily member 2: kinase, Y-linked; Protein phosphatase 1, regulatory (inhibi Transmembrane emp24 protein transport domain containing tor) subunit 9A; Protein phosphatase 3 (formerly 2B), cata 8; Transmembrane emp24-like trafficking protein 10; Trans lytic subunit, alpha isoform; Protein phosphatase 4, regula membrane protein 134: Transmembrane protein 18; Trans tory subunit 1-like; Protein tyrosine phosphatase, non membrane protein 18; Transmembrane protein 29: Triadin; receptor type 18 (brain-derived); Protein tyrosine Tribbles homolog 1; Trinucleotide repeat containing 6C; Tri phosphatase, non-receptor type 3: Protein tyrosine phos partite motif-containing 36; Tripartite motif-containing 61; phatase, receptor type, D: Protein-O-mannosyltransferase 1: TRNA methyltransferase 11 homolog: TruB pseudouridine Proteolipid protein 2 (colonic epithelium-enriched); Pro (psi) synthase homolog 2; Tubby like protein 4: Tuftelin 1: tocadherin 7: Protocadherin gamma subfamily A, 1; PRP6 Tumor necrosis factor receptor superfamily, member 11b (os pre-mRNA processing factor 6 homolog: Putative home teoprotegerin); Tumor necrosis factor receptor Superfamily, odomain transcription factor 1; RAB GTPase activating pro member 25; Tumor necrosis factor, alpha-induced protein 8: tein 1-like: RAB10; RAB30; Rabaptin, RABGTPase binding Tyrosine kinase 2: Ubiquinol-cytochrome c reductase core effector protein 1: RAD51-like 1; RALBP1 associated Eps protein I; Ubiquitin specific peptidase 47; Ubiquitin specific domain containing 2: Rap guanine nucleotide exchange fac peptidase 5 (isopeptidase T); Ubiquitin specific peptidase 8; tor (GEF) 1: Rapamycin-insensitive companion of mTOR; Ubiquitin-like 7 (bone marrow stromal cell-derived); UBX Ras and Rab interactor 2: Receptor accessory protein 3: Ree domain containing 6: UDP-glucose ceramide glucosyltrans lin; Replication factor C (activator 1) 3; Replication protein ferase-like 2: UDP-N-acetyl-alpha-D-galactosamine: A3; Rho GTPase activating protein 18; Rho guanine nucle polypeptide N-acetylgalactosaminyltransferase 2 (GalNAc otide exchange factor (GEF) 10-like; Rhophilin, Rho GTPase T2); Unc-93 homolog B1; UTP6, Small subunit (SSU) binding protein 1: Ribonuclease H2, subunit B: Ribonuclease processome component, homolog; Vacuolar protein sorting 8 P 14 kDa subunit; Ring finger protein 10; Ring finger protein homolog; V-akt murine thymoma viral oncogene homolog 1; 144: Ring finger protein 44; RNA binding motif protein 16; V-ets erythroblastosis virus E26 oncogene homolog; Viral Roundabout, axon guidance receptor, homolog 1; Round DNA polymerase-transactivated protein 6: WD repeat about, axon guidance receptor, homolog 2; RUN domain domain 33: WD repeat domain 90: Wingless-type MMTV containing 2A; Scinderin; SEC23 interacting protein; Sec61 integration site family, member 2B: WW and C2 domain alpha 2 subunit; Septin 11; Serine/threonine kinase 32A; containing 1; Yip 1 domain family, member 3: YTH domain Serine/threonine kinase 32C; SGT1, suppressor of G2 allele family, member 3; Zinc finger and BTB domain containing of SKP1; SH3 and PX domains 2A; Signal peptide peptidase 10; Zinc finger and BTB domain containing 20; Zinc finger 3: Signal transducer and activator of transcription 1.91 kDa; and SCAN domain containing 18; Zinc finger homeodomain Single-stranded DNA binding protein 2: Small nuclear ribo 4; Zinc finger protein 14 homolog; Zinc finger protein 335; nucleoprotein polypeptide N; SNF8, ESCRT-II complex sub Zinc finger protein 394; Zinc finger protein 407; Zinc finger unit, homolog; Sodium channel, Voltage-gated, type III, alpha protein 608; Zinc finger protein 667; Zinc finger protein 692; Subunit; Solute carrier family 1 (neutral amino acid trans Zinc finger protein 718; Zinc finger protein 721; Zinc finger, porter), member 5; Solute carrier family 16, member 7 CCHC domain containing 9; Zinc finger, matrin type 1; Zinc (monocarboxylic acid transporter 2); Solute carrier family 2 finger, MYND domain containing 11; Zinc finger, ZZ-type (facilitated glucose transporter), member 11; Solute carrier containing 3; Zinc fingers and homeoboxes 2; and Zwilch, family 2 (facilitated glucose transporter), member 11: Solute kinetochore associated, homolog. carrier family 24 (Sodium/potassium/calcium exchanger), 0051. In one embodiment, the library comprises at least member 3: Solute carrier family 3 (activators of dibasic and one transcript from at least one gene selected from the group neutral amino acid transport), member 2; Solute carrier fam consisting of Replication factor C (activator 1) 3: Tripartite ily 30 (zinc transporter), member 6: Solute carrier family 39 motif-containing 61; Citrate lyase beta like; Ankyrin repeat (zinc transporter), member 10; Solute carrier family 43, domain 15; UDP-glucose ceramide glucosyltransferase-like member 1: Solute carrier family 9 (sodium/hydrogen 2: Hypothetical protein FLJ12949; Chromosome 22 open exchanger), member 3 regulator 2; SON DNA binding pro reading frame 13; Phosphatidylinositol glycan anchor bio tein; Sortilin-related VPS10 domain containing receptor 3: synthesis, class O: Solute carrier family 43, member 1: Sparcfosteonectin, cwcV and kazal-like domains proteogly Rabaptin, RAB GTPase binding effector protein 1; Zinc fin can (testican) 2; Spectrin repeat containing, nuclear envelope ger protein 14 homolog: Hypothetical gene Supported by 1: Sperm associated antigen 9: Splicing factor 3a, Subunit 2. AK128346: Adenylate cyclase 3: Phosphatidylinositol trans 66 kDa; Splicing factor 3b, subunit 1, 155 kDa; Staphylococ fer protein, beta; Zinc finger protein 667: Gremlin 1, cysteine cal nuclease and tudor domain containing 1; Staufen, RNA knot Superfamily, homolog: Ankyrin 3, node of Ranvier binding protein, homolog 1; Suppression of tumorigenicity 7; (ankyrin G) and Maltase-glucoamylase (alpha-glucosidase). Suppressor of variegation 4-20 homolog 1; Synapsin III: Syn 0052. In one embodiment, the library comprises at least taxin 3: Syntaxin 5; Tachykinin receptor 1; TAO kinase 3: one transcript from at least one gene selected from the group TBC1 domain family, member 16; TBC1 domain family, consisting of Replication factor C (activator 1) 3; Ankyrin member 19: Testis specific, 10; Tetraspanin 6: Tetratricopep repeat domain 15: Hypothetical protein FLJ12949; Solute US 2011/013.6683 A1 Jun. 9, 2011

carrier family 43, member 1: Thioredoxin-like 2: Polymerase subunit 1: Cofilin 2: Heparan sulfate 6-O-sulfotransferase 3: (RNA) II (DNA directed) polypeptide L: Syntaxin 5; Leucine Enabled homolog; Mannosyl (alpha-1,6-)-glycoprotein beta rich repeat containing 16. Calcium channel, Voltage-depen 1.6-N-acetyl-glucosaminyltransferase; Solute carrier family dent, beta 4 subunit; NM 001005522; G protein-coupled 24 (sodium/potassium/calcium exchanger), member 3; Inosi receptor kinase interactor 2: Ankyrin 3, node of Ranvier tol 1,4,5-triphosphate receptor, type 1; CAP-GLY domain (ankyrin G); Gremlin 1, cysteine knot Superfamily, homolog; containing linker protein; Transglutaminase 4: MOCO Sul Zinc finger protein 667: Hypothetical gene supported by phurase C-terminal domain containing 2: 4-hydroxyphe AK128346; Transmembrane 9 superfamily member 2: Potas nylpyruvate dioxygenase-like; and R3H domain containing sium channel, subfamily K, member 1: Chromodomain heli case DNA binding protein 2: Microcephaly, primary autoso 1 mal recessive 1: Chromosome 21 open reading frame 34 and 0057 The invention also contemplates that alternative Dual specificity phosphatase 5. libraries may be designed that include transcripts of one or 0053. In one embodiment, the library comprises at least more of the genes in Table 3, together with additional gene one transcript from at least one gene selected from the group transcripts that have previously been shown to be associated consisting of Replication factor C (activator 1) 3: Tripartite with prostate cancer Systemic progression. As is known in the motif-containing 61; Citrate lyase beta like; Ankyrin repeat art, the publication and sequence databases can be mined domain 15: Ankyrin 3, node of Ranvier (ankyrin G) and using a variety of search Strategies to identify appropriate Maltase-glucoamylase (alpha-glucosidase). additional candidates for inclusion in the library. For 0054. In one embodiment, the library comprises at least example, currently available scientific and medical publica one transcript from at least one gene selected from the group tion databases such as Medline, Current Contents, OMIM consisting of Replication factor C (activator 1) 3; Ankyrin (online Mendelian inheritance in man), various Biological repeat domain 15: Hypothetical protein FLJ12949; Solute and Chemical Abstracts, Journal indexes, and the like can be carrier family 43, member 1: Thioredoxin-like 2: Polymerase searched using term or key-word searches, or by author, title, (RNA) II (DNA directed) polypeptide L: Syntaxin 5; Leucine or other relevant search parameters. Many such databases are rich repeat containing 16. Calcium channel, Voltage-depen publicly available, and strategies and procedures for identi dent, beta 4 subunit; NM 001005522; G protein-coupled fying publications and their contents, for example, genes, receptor kinase interactor 2: Ankyrin 3, node of Ranvier other nucleotide sequences, descriptions, indications, expres (ankyrin G); Gremlin 1, cysteine knot Superfamily, homolog; sion pattern, etc. are well known to those skilled in the art. Zinc finger protein 667; Hypothetical gene supported by Numerous databases are available through the internet for AK128346; Transmembrane 9 superfamily member 2: Potas free or by subscription, see, for example, the National Center sium channel, subfamily K, member 1: Chromodomain heli Biotechnology Information (NCBI), Infotrieve, Thomson case DNA binding protein 2: Chromosome 9 open reading ISI, and Science Magazine (published by the AAAS) web frame 94; Chromosome 21 open reading frame 34; and Dual sites. Additional or alternative publication or citation data specificity phosphatase 5. bases are also available that provide identical or similar types 0055. In one embodiment, the library comprises at least of information, any of which can be employed in the context one transcript from at least one gene selected from the group of the invention. These databases can be searched for publi consisting of Citrate lyase beta like: Phosphodiesterase 4D, cations describing altered gene expression between recurrent cAMP-specific: Ectodysplasin A receptor; DEP domain con and non-recurrent prostate cancer. Additional potential can taining 6: Basonuclin 2: open reading frame didate genes may be identified by searching the above 3: FLJ25476 protein; Staphylococcal nuclease and tudor described databases for differentially expressed and domain containing 1; Hermansky-Pudlak syndrome 5 and by identifying the nucleotide sequence encoding the differ Chromosome 12 open reading frame 30. entially expressed proteins. A list of genes whose altered 0056. In one embodiment, the library comprises at least expression is between patients with recurrent disease and one transcript from at least one gene selected from the group non-recurrent prostate cancer is presented in Table 6. consisting of Replication factor C (activator 1) 3: Tripartite motif-containing 61: Citrate lyase beta like; Adaptor-related Prostate Cancer Prognostic Sets protein complex 1, gamma 2 subunit; Kallikrein-related pep 0.058 A Prostate Prognostic Set comprises one or more tidase 2: Phosphodiesterase 4D. cAMP-specific; Cytochrome target sequences identified within the gene transcripts in the P450, family 4, subfamily F, polypeptide 11; Ectodysplasin A prostate prognostic library, or a Subset of these gene tran receptor Scripts. The target sequences may be within the coding and/or Phospholipase C, beta1; KIAA 1244: Paraoxonase 2: Arachi non-coding regions of the gene transcripts. The set can com donate 12-lipoxygenase, 12R type; Cut-like 2: Chemokine prise one or a plurality of target sequences from each gene (C-X-C motif) ligand 12; Rho guanine nucleotide exchange transcript in the library, or subset thereof. The relative and/or factor (GEF) 5: Olfactory receptor, family 2, subfamily A, normalized level of these target sequences in a sample is member 4: Chromosome 19 open reading frame 42; Hypo indicative of the level of expression of the particular gene thetical gene supported by AK128346; Phosphoglucomutase transcript and thus of prostate cancer recurrence risk. For 5: Hyaluronan binding protein 4: NECAP endocytosis asso example, the relative and/or normalized expression level of ciated 2 one or more of the target sequences may be indicative of an Myeloid/lymphoid or mixed-lineage leukemia; translocated recurrent form of prostate cancer and therefore prognostic for to. 4; Signal transducer and activator of transcription 1: Chro prostate cancer systemic progression while the relative and/or mosome 2 open reading frame 3; FLJ25476 protein; Staphy normalized expression level of one or more other target lococcal nuclease and tudor domain containing 1; Transmem sequences may be indicative of a non-recurrent form of pros brane protein 18; Hermansky-Pudlak syndrome 5: tate cancer and therefore prognostic for a NED clinical out Chromosome 12 open reading frame 30: Splicing factor 3b, COC. US 2011/013.6683 A1 Jun. 9, 2011

0059. Accordingly, one embodiment of the present inven or 3' of the gene's transcriptional start site or representing tion provides for a library or catalog of candidate target transcripts expressed in a manner antisense to the gene, sequences derived from the transcripts (both coding and non amongst others. coding regions) of at least one gene Suitable for classifying 0064. In one embodiment, the Prostate Classification Set prostate cancer recurrence risk. In a further embodiment, the comprises target sequences derived from 382.253 base pair 3' present invention provides for a library or catalog of candi of Replication factor C (activator 1) 3, 38 kDa. 58,123 base date target sequences derived from the non-coding regions of pair 3' of Tripartite motif-containing 61; in intron #3 of Cit transcripts of at least one gene Suitable for classifying pros rate lyase beta like: in intron #2 of Ankyrin repeat domain 15; tate cancer recurrence risk. In still a further embodiment, the in exon #1 of UDP-glucose ceramide glucosyltransferase library or catalog of candidate target sequences comprises like 2; in exon of #19 of Hypothetical protein FLJ12949; in target sequences derived from the transcripts of one or more intron #4 of Chromosome 22 open reading frame 13; in exon of the genes set forth in Table 3 and/or Table 6. The library or #2 of phatidylinositol glycan anchor biosynthesis, class O; in exon #15 of Solute carrier family 43, member 1; in exon #1 of catalog in affect provides a resource list of transcripts from Rabaptin, RAB GTPase binding effector protein 1; in intron which target sequences appropriate for inclusion in a Prostate #38 of Maltase-glucoamylase (alpha-glucosidase); in intron Cancer Prognostic set can be derived. In one embodiment, an #23 of Ankyrin 3, node of Ranvier (ankyrin G); 71.333 base individual Prostate Cancer Prognostic set may comprise tar pair 3' of Gremlin 1, cysteine knot Superfamily, homolog; get sequences derived from the transcripts of one or more 1,561 base pair of 3' Zinc finger protein 667; in exon #4 of genes exhibiting a positive correlation with recurrent prostate Phosphatidylinositol transfer protein, beta; in intron #18 of cancer. In one embodiment, an individual Prostate Cancer Adenylate cyclase 3; and in exon #2 of Hypothetical gene Prognostic Set may comprise target sequences derived from supported by AK128346. the transcripts of one or more genes exhibiting a negative 0065. In one embodiment, the Prostate Classification Set correlation with recurrent prostate cancer. In one embodi comprises target sequences derived from 382.253 base pair 3' ment, an individual Prostate Cancer Prognostic Set may com of Replication factor C (activator 1)3; in intron #2 of Ankyrin prise target sequences derived from the transcripts of two or repeat domain 15; in exon #19 of Hypothetical protein more genes, wherein at least one gene has a transcript that FLJ12949; in exon #15 of Solute carrier family 43, member 1: exhibits a positive correlation with recurrent prostate cancer 313,721 base pair 3' of Thioredoxin-like 2; in exon #2 of and at least one gene has a transcript that exhibits negative Polymerase (RNA) II (DNA directed) polypeptide L, 7.6 correlation with recurrent prostate cancer. kDa; in intron #10 of Syntaxin 5; 141,389 base pair 5' of 0060. In one embodiment, the Prostate Cancer Prognostic Leucine rich repeat containing 16; in intron #2 of Calcium Set comprises target sequences derived from the transcripts of channel, voltage-dependent, beta 4 subunit; 5,474 base pair 5 at least one gene. In one embodiment, the Prostate Cancer of NM 001005522; in intron #14 of G protein-coupled Prognostic Set comprises target sequences derived from the receptor kinase interactor 2; in intron #23 of Ankyrin 3, node of Ranvier (ankyrin G); 71.333 base pair 3' of Gremlin 1, transcripts of at least 5 genes. In another embodiment, the cysteine knot superfamily, homolog: 1,561 base pair of 3' of Prostate Cancer Prognostic set comprises target sequences Zinc finger protein 667; in exon #2 of Hypothetical gene derived from the transcripts of at least 10 genes. In a further supported by AK128346; in intron #11 of Transmembrane 9 embodiment, the Prostate Cancer Prognostic set comprises superfamily member 2; in intron #1 of Potassium channel, target sequences derived from the transcripts of at least 15 subfamily K, member 1; in intron #2 of Chromodomain heli genes. In other embodiments, the Prostate Cancer Prognostic case DNA binding protein 2: 22, 184 base pair 5' of Micro set comprises target sequences derived from the transcripts of cephaly, primary autosomal recessive 1; in intron #4 of Chro at least 20, at least 25, at least 30, at least 35, at least 40, at least mosome 21 open reading frame 34; and in intron #2 of Dual 45, at least 50, at least 55, at least 60 and at least 65 genes. specificity phosphatase 5. 0061 Following the identification of candidate gene tran 0066. In one embodiment, the Prostate Classification Set Scripts, appropriate target sequences can be identified by comprises target sequences derived from 382.253 base pair 3' screening for target sequences that have been annotated to be of Replication factor C (activator 1) 3, 38 kDa. 58,123 base associated with each specific gene locus from a number of pair 3' of Tripartite motif-containing 61; in intron #3 of Cit annotation sources including GenBank, RefSeq, Ensembl. rate lyase beta like: in intron #2 of Ankyrin repeat domain 15; dbFST, GENSCAN, TWINSCAN, Exoniphy, Vega, microR in intron #38 of Maltase-glucoamylase (alpha-glucosidase); NAs registry and others (see Affymetrix Exon Array design and in intron #23 of Ankyrin 3, node of Ranvier (ankyrin G). note). 0067. In one embodiment, the Prostate Classification Set 0062. As part of the target sequence selection process, comprises target sequences derived from 382.253 base pair 3' target sequences can be further evaluated for potential cross of Replication factor C (activator 1) 3, 38 kDa; in intron #2 of hybridization against other putative transcribed sequences in Ankyrin repeat domain 15; in exon #19 of Hypothetical pro the design (but not the entire genome) to identify only those tein FLJ12949; in exon #15 of Solute carrier family 43, mem target sequences that are predicted to uniquely hybridize to a ber 1; 313,721 base pair 3' of Thioredoxin-like 2; in exon #2 single target. of Polymerase (RNA) II (DNA directed) polypeptide L, 7.6 0063 Optionally, the set of target sequences that are pre kDa; in intron #10 of Syntaxin 5; 141,389 base pair 5' of dicted to uniquely hybridize to a single target can be further Leucine rich repeat containing 16; in intron #2 of Calcium filtered using a variety of criteria including, for example, channel, voltage-dependent, beta 4 subunit; 5,474 base pair 5 sequence length, for their mean expression levels across a of NM 001005522; in intron #14 of G protein-coupled wide selection of tissues, as being representive of receptor kinase interactor 2; in intron #2 of Ankyrin 3, node of transcripts expressed either as novel alternative (i.e., non Ranvier (ankyrin G); 71.333 base pair of 3' Gremlin 1, cys consensus) exons, alternative retained introns, novel exons 5' teine knot superfamily; 1,561 base pair 3' of Zinc finger US 2011/013.6683 A1 Jun. 9, 2011 protein 667; in exon #2 of Hypothetical gene supported by with, for example, elevated expression across numerous tis AK128346; in intron #11 of Transmembrane 9 superfamily sues (non-specific) or no expression in prostate tissue can be member 2; in intron #1 of Potassium channel, subfamily K. excluded. member 1; in intron #2 of Chromodomain helicase DNA binding protein 2; in exon #8 of Chromosome 9 open reading Validation of Target Sequences frame 94; in intron #4 of Chromosome 21 open reading frame (0071. Following in silico selection of target sequences, 34; and each target sequence suitable for use in the Prostate Cancer in intron #2 of Dual specificity phosphatase 5. Prognostic Set may be validated to confirm differential rela 0068. In one embodiment, the Prostate Classification Set tive or normalized expression in recurrent prostate cancer or comprises target sequences derived from in intron #3 of Cit non-recurrent prostate cancer. Validation methods are known rate lyase beta like; 210,560 base pair 5 of Phosphodiesterase in the art and include hybridization techniques such as 4D: 189,722 base pair 5' of Ectodysplasin A receptor; 3.510 microarray analysis or Northern blotting using appropriate base pair 3' of DEP domain containing 6; in exon #6 of controls, and may include one or more additional steps. Such Basonuclin 2; in intron #1 of Chromosome 2 open reading as reverse transcription, transcription, PCR, RT-PCR and the frame 3; in intron #1 of FLJ25476 protein; in intron #10 of like. The validation of the target sequences using these meth Staphylococcal nuclease and tudor domain containing 1; in ods is well within the abilities of a worker skilled in the art. exon #22 of Hermansky-Pudlak syndrome 5; and in exon #24 Minimal Expression Signature of Chromosome 12 open reading frame 30. 0069. In one embodiment, the Prostate Classification Set 0072. In one embodiment, individual Prostate Cancer comprises target sequences derived from 382.253 base pair 3' Prognostic Sets provide for at least a determination of a of Replication factor C (activator 1) 3, 38 kDa. 58,123 base minimal expression signature, capable of distinguishing pair 3' of Tripartite motif-containing 61; in intron #3 of Cit recurrent from non-recurrent forms of prostate cancer. Means rate lyase beta like; in intron #1 of Adaptor-related protein for determining the appropriate number of target sequences complex 1 gamma 2 subunit; in intron #2 of Kallikrein necessary to obtain a minimal expression signature are known related peptidase 2:210,560 base pair 5' of Phosphodiesterase in the art and include the Nearest Shrunken Centroids (NSC) 4D: 3,508 base pair 3' of Cytochrome P450, family 4, sub method. 0073. In this method (see US 2007003 1873), a standard family F, polypeptide 11; 189,722 base pair 5' of Ectodyspla ized centroid is computed for each class. This is the average sin A receptor; in intron #2 of Phospholipase C, beta 1; in gene expression for each gene in each class divided by the intron #10 of KIAA1244: in intron #2 of Paraoxonase 2; within-class standard deviation for that gene. Nearest cen 11,235 base pair 3' of Arachidonate 12-lipoxygenase, 12R troid classification takes the gene expression profile of a new type; in exon #22 of Cut-like 2: 143,098 base pair 5' of sample, and compares it to each of these class centroids. The Chemokine (C-X-C motif) ligand 12; in intron #6 of Rho class whose centroid that it is closest to, in squared distance. guanine nucleotide exchange factor (GEF) 5: 15,057 base is the predicted class for that new sample. Nearest shrunken pair 5' of Olfactory receptor, family 2, subfamily A, member centroid classification “shrinks’ each of the class centroids 4; 6,025 base pair 3' of Chromosome 19 open reading frame toward the overall centroid for all classes by an amount called 42: inexon #2 of Hypothetical gene supported by AK128346; the threshold. This shrinkage consists of moving the centroid in exon #11 of Phosphoglucomutase 5: in exon #9 of Hyalu towards zero by threshold, setting it equal to Zero if it hits roman binding protein 4; in exon #8 of NECAP endocytosis Zero. For example if threshold was 2.0, a centroid of 3.2 associated 2; in intron #20 of Myeloid/lymphoid or mixed would be shrunk to 1.2, a centroid of -3.4 would be shrunk to lineage leukemia; 1,558 base pair 3' of Signal transducer and -1.4, and a centroid of 1.2 would be shrunk to zero. After activator of transcription; in intron #1 of Chromosome 2 open shrinking the centroids, the new sample is classified by the reading frame 3; in intron #1 of FLJ25476 protein; in intron usual nearest centroid rule, but using the shrunken class cen #10 of Staphylococcal nuclease and tudor domain containing troids. This shrinkage can make the classifier more accurate 1: 84,468 base pair 3' of Transmembrane protein 18; in exon by reducing the effect of noisy genes and provides an auto #22 of Hermansky-Pudlak syndrome 5; in exon #24 of Chro matic gene selection. In particular, if a gene is shrunk to Zero mosome 12 open reading frame 30; 95.745 base pair of 3' for all classes, then it is eliminated from the prediction rule. Splicing factor 3b, subunit 1; in exon #4 of Cofilin 2; in intron Alternatively, it may be set to Zero for all classes except one, #1 of Heparan sulfate 6-O-sulfotransferase 3: in intron #1 of and it can be learned that the high or low expression for that Enabled homolog; in intron #2 of Mannosyl (alpha-1,6-)- gene characterizes that class. The user decides on the value to glycoprotein beta-1,6-N-acetyl-glucosaminyltransferase; in use for threshold. Typically one examines a number of differ intron #8 of Solute carrier family 24, member 3; 32.382 base ent choices. To guide in this choice, PAM does K-fold cross pair 3' of Inositol 1,4,5-triphosphate receptor, type 1; in intron validation for a range of threshold values. The samples are #9 of CAP-GLY domain containing linker protein 1; in exon divided up at random into Kroughly equally sized parts. For #14 of Transglutaminase 4; in intron #4 of MOCO sulphurase each part in turn, the classifier is built on the other K-1 parts C-terminal domain containing 2: 21,555 base pair 5' of 4 then tested on the remaining part. This is done for a range of hydroxyphenylpyruvate dioxygenase-like; and in exon #26 of threshold values, and the cross-validated misclassification R3H domain containing 1. error rate is reported for each threshold value. Typically, the 0070. In one embodiment, the potential set of target user would choose the threshold value giving the minimum sequences can be filtered for their expression levels using the cross-validated misclassification error rate. multi-tissue expression data made publicly available by 0074 Alternatively, minimal expression signatures can be Affymetrix at (http://www.affymetrix.com/stipport/techni established through the use of optimization algorithms such cal/sample data/exon array data.affx) such that probes as the mean variance algorithm widely used in establishing US 2011/013.6683 A1 Jun. 9, 2011

stock portfolios. This method is described in detail in US employed. For example, a combination target sequences use patent publication number 20030194734. Essentially, the ful in these methods were found to include those encoding method calls for the establishment of a set of inputs (stocks in RNAs corresponding to SEQ ID NOs: 1-913 (found at financial applications, expression as measured by intensity increased expression in prostate cancer Samples from NED here) that will optimize the return (e.g., signal that is gener patients) and/or corresponding to SEQ ID NOs: 914-2114 ated) one receives for using it while minimizing the variabil (found at increased expression levels in prostate cancer ity of the return. In other words, the method calls for the samples from SYS patients), where intermediate levels of establishment of a set of inputs (e.g., expression as measured by intensity) that will optimize the signal while minimizing certain target sequences (Table 7) are observed in prostate variability. Many commercial Software programs are avail cancer samples from PSA patients with biochemical recur able to conduct such operations. “Wagner Associates Mean rence, where the RNA expression levels are indicative of a Variance Optimization Application.” referred to as “Wagner disease state or outcome. Subgroups of these target Software” throughout this specification, is preferred. This sequences, as well as individual target sequences, have been software uses functions from the “Wagner Associates Mean found useful in such methods. Variance Optimization Library” to determine an efficient 0079. In some embodiments, an RNA signature corre frontier and optimal portfolios in the Markowitz sense is sponding to SEQ ID NOs: 1, 4, 6, 9, 14-16, 18-21915-917, preferred. Use of this type of software requires that microar 920,922,928,929,931,935 and 936 (the 21-RNA signature) ray data be transformed so that it can be treated as an input in and/or SEQID NOs: 1-11,914-920 (the 18-RNA signature) the way stock return and risk measurements are used when the and/or SEQ ID NOs: 1-4,914,915) (the 6-RNA signature) Software is used for its intended financial analysis purposes. and/or SEQID NOs: 1, 4, 6, 9, 14-16, 18-21, 915-917, 920, 0075. The process of selecting a minimal expression sig 922, 928,929, 931, 935 and 936 (the 20-RNA signature) nature can also include the application of heuristic rules. and/or SEQID NOs 3, 36, 60, 63,926,971,978,999, 1014 Preferably, such rules are formulated based on biology and an and 1022 (the 10-RNA signature) and/or SEQID NOs 1-3, understanding of the technology used to produce clinical 32, 33, 36,46, 60, 63, 66, 69,88, 100,241,265,334,437,920, results. More preferably, they are applied to output from the 925,934, 945, 947,954,971,978,999, 1004, 1014, 1022, optimization method. For example, the mean variance 1023, 1032, 1080, 1093, 1101, 1164, 1248, 1304, 1311, 1330, method of portfolio selection can be applied to microarray 1402 and 1425 (the 41-RNA signature) are formulated into data for a number of genes differentially expressed in Subjects a linear combination of their respective expression values for with cancer. Output from the method would be an optimized each patient generating a patient outcome predictor (POP) set of genes that could include Some genes that are expressed score and indicative of the disease status of the patient after in peripheral blood as well as in diseased tissue. prostatectomy. 0076. Other heuristic rules can be applied that are not 0080 Exemplary subsets and combinations of interest necessarily related to the biology in question. For example, also include at least five, six, 10, 15, 18, 20, 23, 25, 27, 30, 35, one can apply a rule that only a prescribed percentage of the 40, 45, 50, 55, 60, 70, 80, 90, 100, 125, 150, 175, 200, 225, portfolio can be represented by a particular gene or group of 250, 275,300, 350, 400, 450, 500, 750, 1000, 1200, 1400, genes. Commercially available software such as the Wagner 1600, 1800, 2000, or all 2114 target sequences in Table 4; at Software readily accommodates these types of heuristics. least five, six, 10, 15, 18, 20, 23, 25, 27, 30, 35, 40, 45, 50,55, This can be useful, for example, when factors other than 60, 70, 80, 90, 100, 125, 150, 175, 200, 225, 250, 275,300, accuracy and precision (e.g., anticipated licensing fees) have 350, 400, 450, 500, or all 526 target sequences in Table 7: an impact on the desirability of including one or more genes. SEQID NOs: 1, 4,915, 6,916, 9,917,920, 922, 14, 15, 16, 0077. In one embodiment, the Prostate Cancer Prognostic 928, 929, 18, 19, 931, 20, 21, 935, 936, or combinations Set for obtaining a minimal expression signature comprises at thereof: SEQID NOs: 1,2,3,4,5,6,7,8,9, 10, 11,914,915, least one, two, three, four, five, six, eight, 10, 15, 20, 25 or 916, 917,918, 919,920, or combinations thereof: SEQ ID more of target sequences shown to have a positive correlation NOs: 1, 4, 6, 9, 14-16, 18-21, 915-917, 920, 922, 928,929, with non-recurrent prostate disease, for example those 931, 935, 936 or combinations thereof: SEQ ID NOS 3, 36, depicted in SEQID NOs: 1-913 or a subset thereof. In another 60, 63, 926, 971, 978, 999, 1014, 1022 or combinations embodiment, the Prostate Cancer Prognostic Set for obtain thereof: SEQID NOs 1-3, 32, 33, 36, 46, 60, 63, 66, 69, 88, ing a minimal expression signature comprises at least one, 100, 241, 265, 334, 437, 920, 925,934, 945, 947, 954,971, two, three, four, five, six, eight, 10, 15, 20, 25 or more of those 978,999, 1004, 1014, 1022, 1023, 1032, 1080, 1093, 1101, target sequences shown to have a positive correlation with 1164, 1248, 1304, 1311, 1330, 1402, 1425 or combinations recurrent prostate cancer, for example those depicted in of thereof; at least one, two, three, four, five or six of SEQ ID SEQ ID NOs: 914-2114, or a subset thereof. In yet another NOs:1, 4, 6, 9, 14, 15, 16, 18, 19, 20, and 21 and at least one, embodiment, the Prostate Cancer Prognostic Set for obtain two, three, four, five or six of SEQ ID NOs: 915, 916, 917, ing a minimal expression signature comprises at least one, 920, 922,928,929,931, 935, and 936; and at least one, two, two, three, four, five, six, eight, 10, 15, 20, 25 or more of target three, four, five or six of SEQID NOs: 1, 2, 3, 4, 5, 6, 7, 8, 9, sequences shown to have a correlation with non-recurrent or 10, and 11 at least one, two, three, four, five or six of at least recurrent prostate cancer, for example those depicted in SEQ one, two, three, four, five or six of SEQIDNOs:914,915,916, ID NOS:1-2114 or a Subset thereof. 917,918, 919, and 920 and at least one, two, three, four, five 0078. In some embodiments, the Prostate Cancer Prog or six of SEQID NOs: 1, 4, 6, 9, 14-16, 18-21,915-917,920, nostic Set comprises target sequences for detecting expres 922,928,929,931,935,936; and at leastone, two, three, four, sion products of SEQIDs: 1-2114. In some embodiments, the five or six of SEQID NOs 3,36, 60, 63,926,971,978,999, Prostate Cancer Prognostic Set comprises probes for detect 1014, 1022; and at least one, two, three, four, five or six of ing expression levels of sequences exhibiting positive and SEQID NOs 1-3, 32,33, 36,46, 60, 63, 66, 69,88, 100, 241, negative correlation with a disease status of interest are 265,334,437, 920, 925,934, 945, 947,954,971,978,999, US 2011/013.6683 A1 Jun. 9, 2011

1004, 1014, 1022, 1023, 1032, 1080, 1093, 1101, 1164, 1248, tion against gene sequence databases, such as the Human 1304, 1311, 1330, 1402, 1425. Genome Sequence, UniGene, dbEST or the non-redundant 0081 Exemplary subsets of interest include those database at NCBI. In one embodiment of the invention, the described herein, including in the examples. Exemplary com polynucleotide probe is complementary to a region of a target binations of interest include those utilizing one or more of the mRNA derived from a target sequence in the Prostate Cancer sequences listed in Tables 5, 7, 8, 9 or 10. Of particular Prognostic Set. Computer programs can also be employed to interest are those combinations utilizing at least one sequence select probe sequences that will not cross hybridize or will not exhibiting positive correlation with the trait of interest, as hybridize non-specifically. well as those combinations utilizing at least one sequence 0087. One skilled in the art will understand that the nucle exhibiting negative correlation with the trait of interest. Also otide sequence of the polynucleotide probe need not be iden of interest are those combinations utilizing at least two, at tical to its target sequence in order to specifically hybridize least three, at least four, at least five or at least six of those thereto. The polynucleotide probes of the present invention, sequences exhibiting Such a positive correlation, in combina therefore, comprise a nucleotide sequence that is at least tion with at least two, at least three, at least four, at least five, about 75% identical to a region of the target gene or mRNA. or at least six of those sequences exhibiting Such a negative In another embodiment, the nucleotide sequence of the poly correlation. Exemplary combinations include those utilizing nucleotide probe is at least about 90% identical a region of the at least one, two, three, four, five or six of the target sequences target gene or mRNA. In a further embodiment, the nucle depicted in Tables 5 and 6. otide sequence of the polynucleotide probe is at least about 0082 In some embodiments, increased relative expression 95% identical to a region of the target gene or mRNA. Meth of one or more of SEQIDs: 1-913, decreased relative expres ods of determining sequence identity are known in the art and sion of one or more of SEQID NOS:914-2114 or a combina can be determined, for example, by using the BLASTN pro tion of any thereof is indicative/predictive of the patient gram of the University of Wisconsin Computer Group (GCG) exhibiting no evidence of disease for at least seven years or software or provided on the NCBI website. The nucleotide more after Surgery. In some embodiments, increased relative sequence of the polynucleotide probes of the present inven expression of SEQIDs:914-2114, decreased relative expres tion may exhibit variability by differing (e.g. by nucleotide sion of one or more of SEQID NOs: 1-913 or a combination Substitution, including transition or transversion) at one, two, of any thereof is indicative/predictive of the patient exhibiting three, four or more nucleotides from the sequence of the target systemic prostate cancer. Increased or decreased expression gene. of target sequences represented in these sequence listings, or I0088. Other criteria known in the art may be employed in of the target sequences described in the examples, may be the design of the polynucleotide probes of the present inven utilized in the methods of the invention. tion. For example, the probes can be designed to have <50% 0083. The Prostate Cancer Prognostic Set can optionally G content and/or between about 25% and about 70% G+C include one or more target sequences specifically derived content. Strategies to optimize probe hybridization to the from the transcripts of one or more housekeeping genes and/ target nucleic acid sequence can also be included in the pro or one or more internal control target sequences and/or one or cess of probe selection. Hybridization under particular pH, more negative control target sequences. In one embodiment, salt, and temperature conditions can be optimized by taking these target sequences can, for example, be used to normalize into account melting temperatures and by using empirical expression data. Housekeeping genes from which target rules that correlate with desired hybridization behaviours. sequences for inclusion in a Prostate Cancer Prognostic Set Computer models may be used for predicting the intensity can be derived from are known in the art and include those and concentration-dependence of probe hybridization. genes in which are expressed at a constant level in normal and I0089. As is known in the art, in order to represent a unique prostate cancer tissue. sequence in the , a probe should be at least 15 0084. The target sequences described herein may be used nucleotides in length. Accordingly, the polynucleotide probes alone or in combination with each other or with other known of the present invention range in length from about 15 nucle or later identified disease markers. otides to the full length of the target sequence or target mRNA. In one embodiment of the invention, the polynucle Prostate Cancer Prognostic Probes/Primers otide probes are at least about 15 nucleotides in length. In 0085. The system of the present invention provides for another embodiment, the polynucleotide probes are at least combinations of polynucleotide probes that are capable of about 20 nucleotides in length. In a further embodiment, the detecting the target sequences of the Prostate Cancer Prog polynucleotide probes are at least about 25 nucleotides in nostic Sets. Individual polynucleotide probes comprise a length. In another embodiment, the polynucleotide probes are nucleotide sequence derived from the nucleotide sequence of between about 15 nucleotides and about 500 nucleotides in the target sequences or complementary sequences thereof. length. In other embodiments, the polynucleotide probes are The nucleotide sequence of the polynucleotide probe is between about 15 nucleotides and about 450 nucleotides, designed such that it corresponds to, or is complementary to about 15 nucleotides and about 400 nucleotides, about 15 the target sequences. The polynucleotide probe can specifi nucleotides and about 350 nucleotides, about 15 nucleotides cally hybridize under either stringent or lowered stringency and about 300 nucleotides in length. hybridization conditions to a region of the target sequences, to (0090. The polynucleotide probes of a Prostate Cancer the complement thereof, or to a nucleic acid sequence (Such Prognostic Set can comprise RNA, DNA, RNA or DNA as a cDNA) derived therefrom. mimetics, or combinations thereof, and can be single I0086. The selection of the polynucleotide probe sequences stranded or double-stranded. Thus the polynucleotide probes and determination of their uniqueness may be carried out in can be composed of naturally-occurring nucleobases, Sugars silico using techniques known in the art, for example, based and covalent internucleoside (backbone) linkages as well as onaBLASTN search of the polynucleotide sequence in ques polynucleotide probes having non-naturally-occurring por US 2011/013.6683 A1 Jun. 9, 2011

tions which function similarly. Such modified or substituted oalkyl-phosphonates, thionoalkylphosphotriesters, and polynucleotide probes may provide desirable properties Such boranophosphates having normal 3'-5' linkages. 2'-5' linked as, for example, enhanced affinity for a target gene and analogs of these, and those having inverted polarity wherein increased stability. the adjacent pairs of nucleoside units are linked 3'-5' to 5'-3' or 0091. The system of the present invention further provides 2'-5' to 5'-2'. Various salts, mixed salts and free acid forms are for primers and primer pairs capable of amplifying target also included. sequences defined by the Prostate Cancer Prognostic Set, or 0096 Exemplary modified oligonucleotide backbones fragments or Subsequences or complements thereof. The that do not include a phosphorus atom are formed by short nucleotide sequences of the Prostate Cancer Prognostic set chain alkyl or cycloalkyl internucleoside linkages, mixed het may be provided in computer-readable media for in silico eroatom and alkyl or cycloalkyl internucleoside linkages, or applications and as a basis for the design of appropriate prim one or more short chain heteroatomic or heterocyclic inter ers for amplification of one or more target sequences of the nucleoside linkages. Such backbones include morpholino Prostate Cancer Prognostic Set. linkages (formed in part from the Sugar portion of a nucleo 0092 Primers based on the nucleotide sequences of target side); siloxane backbones; Sulfide, Sulfoxide and Sulphone sequences can be designed for use in amplification of the backbones; formacetyl and thioformacetylbackbones; meth target sequences. For use in amplification reactions such as ylene formacetyl and thioformacetylbackbones; alkene con PCR, a pair of primers will be used. The exact composition of taining backbones; Sulphamate backbones; methyleneimino the primer sequences is not critical to the invention, but for and methylenehydrazino backbones; Sulphonate and Sulfona most applications the primers will hybridize to specific mide backbones; amide backbones; and others having mixed sequences of the Prostate Cancer Prognostic Set under strin N, O, S and CH component parts. gent conditions, particularly under conditions of high Strin 0097. The present invention also contemplates oligo gency, as known in the art. The pairs of primers are usually nucleotide mimetics in which both the sugar and the inter chosen so as to generate an amplification product of at least nucleoside linkage of the nucleotide units are replaced with about 50 nucleotides, more usually at least about 100 nucle novel groups. The base units are maintained for hybridization otides. Algorithms for the selection of primer sequences are with an appropriate nucleic acid target compound. An generally known, and are available in commercial Software example of such an oligonucleotide mimetic, which has been packages. These primers may be used in standard quantitative shown to have excellent hybridization properties, is a peptide or qualitative PCR-based assays to assess transcript expres nucleic acid (PNA) Nielsen et al., Science, 254:1497-1500 sion levels of RNAs defined by the Prostate Cancer Prognos (1991). In PNA compounds, the sugar-backbone of an oli tic Set. Alternatively, these primers may be used in combina gonucleotide is replaced with an amide containing backbone, tion with probes, such as molecular beacons in amplifications in particular an aminoethylglycine backbone. The nucleo using real-time PCR. bases are retained and are bound directly or indirectly to 0093. In one embodiment, the primers or primer pairs, aza-nitrogenatoms of the amide portion of the backbone. The when used in an amplification reaction, specifically amplify present invention also contemplates polynucleotide probes or at least a portion of a nucleic acid depicted in one of SEQID primers comprising “locked nucleic acids’ (LNAs), which NOs: 1-2114 (or subgroups thereof as set forth herein), an are novel conformationally restricted oligonucleotide ana RNA form thereof, or a complement to either thereof. logues containing a methylene bridge that connects the 2'-O 0094. As is known in the art, a nucleoside is a base-sugar of ribose with the 4'-C (see, Singh et al., Chem. Commun., combination and a nucleotide is a nucleoside that further 1998, 4:455-456). LNA and LNA analogues display very includes a phosphate group covalently linked to the Sugar high duplex thermal stabilities with complementary DNA and portion of the nucleoside. In forming oligonucleotides, the RNA, stability towards 3'-exonuclease degradation, and good phosphate groups covalently link adjacent nucleosides to one solubility properties. Synthesis of the LNA analogues of another to form a linear polymeric compound, with the nor adenine, cytosine, guanine, 5-methylcytosine, thymine and mal linkage or backbone of RNA and DNA being a 3' to 5' uracil, their oligomerization, and nucleic acid recognition phosphodiester linkage. Specific examples of polynucleotide properties have been described (see Koshkin et al., Tetrahe probes or primers useful in this invention include oligonucle dron, 1998, 54:3607-3630). Studies of mis-matched otides containing modified backbones or non-natural inter sequences show that LNA obey the Watson-Crick base pair nucleoside linkages. As defined in this specification, oligo ing rules with generally improved selectivity compared to the nucleotides having modified backbones include both those corresponding unmodified reference strands. that retain a phosphorus atom in the backbone and those that (0098 LNAs form duplexes with complementary DNA or lack a phosphorus atom in the backbone. For the purposes of RNA or with complementary LNA, with high thermal affini the present invention, and as sometimes referenced in the art, ties. The universality of LNA-mediated hybridization has modified oligonucleotides that do not have a phosphorus been emphasized by the formation of exceedingly stable atom in their internucleoside backbone can also be consid LNA:LNA duplexes (Koshkinet al., J. Am. Chem. Soc., 1998, ered to be oligonucleotides. 120:13252-13253). LNA:LNA hybridization was shown to 0095 Exemplary polynucleotide probes or primers having be the most thermally stable nucleic acid type duplex system, modified oligonucleotide backbones include, for example, and the RNA-mimicking character of LNA was established at those with one or more modified internucleotide linkages that the duplex level. Introduction of three LNA monomers (Tor are phosphorothioates, chiral phosphorothioates, phospho A) resulted in significantly increased melting points toward rodithioates, phosphotriesters, aminoalkylphosphotriesters, DNA complements. methyl and other alkyl phosphonates including 3'-alkylene 0099 Synthesis of 2'-amino-LNA (Singh et al., J. Org. phosphonates and chiral phosphonates, phosphinates, phos Chem., 1998, 63, 1.0035-10039) and 2'-methylamino-LNA phoramidates including 3' amino phosphoramidate and ami has been described and thermal stability of their duplexes noalkylphosphoramidates, thionophosphoramidates, thion with complementary RNA and DNA strands reported. Prepa US 2011/013.6683 A1 Jun. 9, 2011 ration of phosphorothioate-LNA and 2'-thio-LNA have also useful for increasing the binding affinity of the polynucle been described (Kumaret al., Bioorg. Med. Chem. Lett., 1998, otide probes of the invention. These include 5-substituted 8:2219-2222). pyrimidines, 6-azapyrimidines and N-2, N-6 and O-6 substi 0100 Modified polynucleotide probes or primers may tuted purines, including 2-aminopropyladenine, 5-propyny also contain one or more substituted Sugar moieties. For luracil and 5-propynylcytosine. 5-methylcytosine Substitu example, oligonucleotides may comprise Sugars with one of tions have been shown to increase nucleic acid duplex the following substituents at the 2' position: OH: F: O S , stability by 0.6-1.2° C. Sanghvi, Y. S., (1993) Antisense or N-alkyl: O , S: , or N-alkenyl: O S or N-alkynyl: Research and Applications, pp. 276-278, Crooke, S. T. and or O-alkyl-O-alkyl, wherein the alkyl, alkenyl and alkynyl Lebleu, B., ed., CRC Press, Boca Raton. may be substituted or unsubstituted C to Co alkyl or C to 0103) One skilled in the art will recognize that it is not Co alkenyl and alkynyl. Examples of Such groups are: necessary for all positions in a given polynucleotide probe or O(CH), OCH, O(CH), OCH, O(CH), NH, O(CH), primer to be uniformly modified. The present invention, CH, O(CH), ONH2, and O(CH), ON(CH), CH), therefore, contemplates the incorporation of more than one of where n and m are from 1 to about 10. Alternatively, the the aforementioned modifications into a single polynucle oligonucleotides may comprise one of the following Substitu otide probe or even at a single nucleoside within the probe or ents at the 2' position: C to Co lower alkyl, substituted lower primer. alkyl, alkaryl, aralkyl, O-alkaryl or O-aralkyl, SH, SCH, 0104 One skilled in the art will also appreciate that the OCN, Cl, Br, CN, CF, OCF, SOCH, SO, CH, ONO, nucleotide sequence of the entire length of the polynucleotide NO, N, NH, heterocycloalkyl, heterocycloalkaryl, ami probe or primer does not need to be derived from the target noalkylamino, polyalkylamino, Substituted silyl, an RNA sequence. Thus, for example, the polynucleotide probe may cleaving group, a reporter group, an intercalator, a group for comprise nucleotide sequences at the 5' and/or 3' termini that improving the pharmacokinetic properties of an oligonucle are not derived from the target sequences. Nucleotide otide, or a group for improving the pharmacodynamic prop sequences which are not derived from the nucleotide erties of an oligonucleotide, and other substituents having sequence of the target sequence may provide additional func similar properties. Specific examples include 2'-methoxy tionality to the polynucleotide probe. For example, they may ethoxy (2'-O CHCHOCH, also known as 2'-O-(2-meth provide a restriction enzyme recognition sequence or a “tag” oxyethyl) or 2-MOE) Martin et al., Helv. Chim. Acta, that facilitates detection, isolation, purification or immobili 78:486-504 (1995), 2-dimethylaminooxyethoxy (O(CH), sation onto a solid Support. Alternatively, the additional ON(CH) group, also known as 2'-DMAOE), 2-methoxy nucleotides may provide a self-complementary sequence that (2'-O-CH-), 2-aminopropoxy (2'-OCH, CH, CH NH) allows the primer/probe to adopt a hairpin configuration. and 2'-fluoro (2'-F). Such configurations are necessary for certain probes, for 0101 Similar modifications may also be made at other example, molecular beacon and Scorpion probes, which can positions on the polynucleotide probes or primers, particu be used in solution hybridization techniques. larly the 3' position of the sugar on the 3' terminal nucleotide 0105. The polynucleotide probes or primers can incorpo or in 2'-5' linked oligonucleotides and the 5' position of 5' rate moieties useful in detection, isolation, purification, or terminal nucleotide. Polynucleotide probes or primers may immobilisation, if desired. Such moieties are well-known in also have Sugar mimetics such as cyclobutyl moieties in place the art (see, for example, Ausubel et al., (1997 & updates) of the pentofuranosyl Sugar. Current Protocols in Molecular Biology, Wiley & Sons, New 0102 Polynucleotide probes or primers may also include York) and are chosen such that the ability of the probe to modifications or Substitutions to the nucleobase. As used hybridize with its target sequence is not affected. herein, “unmodified’ or “natural nucleobases include the 0106 Examples of suitable moieties are detectable labels, purine bases adenine (A) and guanine (G), and the pyrimidine Such as radioisotopes, fluorophores, chemiluminophores, bases thymine (T), cytosine (C) and uracil (U). Modified enzymes, colloidal particles, and fluorescent microparticles, nucleobases include other synthetic and natural nucleobases as well as antigens, antibodies, haptens, avidin/streptavidin, such as 5-methylcytosine (5-me-C), 5-hydroxymethyl biotin, haptens, enzyme cofactors/Substrates, enzymes, and cytosine, Xanthine, hypoxanthine, 2-aminoadenine, 6-methyl the like. and other alkyl derivatives of adenine and guanine, 2-propyl 0107 Alabel can optionally be attached to or incorporated and other alkyl derivatives of adenine and guanine, 2-thiou into a probe or primer polynucleotide to allow detection and/ racil, 2-thiothymine and 2-thiocytosine, 5-halouracil and or quantitation of a target polynucleotide representing the cytosine, 5-propynyl uracil and cytosine, 6-azo uracil, target sequence of interest. The target polynucleotide may be cytosine and thymine, 5-uracil (pseudouracil), 4-thiouracil, the expressed target sequence RNA itself, a cDNA copy 8-halo, 8-amino, 8-thiol, 8-thioalkyl, 8-hydroxyl and other thereof, or an amplification product derived therefrom, and 8-Substituted adenines and guanines, 5-halo particularly may be the positive or negative strand, so long as it can be 5-bromo, 5-trifluoromethyl and other 5-substituted uracils specifically detected in the assay being used. Similarly, an and cytosines, 7-methylguanine and 7-methyladenine, 8-aza antibody may be labeled. guanine and 8-azaadenine, 7-deazaguanine and 7-deaZaad 0108. In certain multiplex formats, labels used for detect enine and 3-deazaguanine and 3-deazaadenine. Further ing different targets may be distinguishable. The label can be nucleobases include those disclosed in U.S. Pat. No. 3,687, attached directly (e.g., via covalent linkage) or indirectly, e.g., 808: The Concise Encyclopedia Of Polymer Science And via a bridging molecule or series of molecules (e.g., a mol Engineering, (1990) pp 858-859, Kroschwitz, J. I., ed. John ecule or complex that can bind to an assay component, or via Wiley & Sons: Englisch et al., Angewandte Chemie, Int. Ed., members of a binding pair that can be incorporated into assay 30:613 (1991); and Sanghvi, Y. S., (1993) Antisense Research components, e.g. biotin-avidin or Streptavidin). Many labels and Applications, pp 289-302, Crooke, S.T. and Lebleu, B., are commercially available in activated forms which can ed., CRC Press. Certain of these nucleobases are particularly readily be used for Such conjugation (for example through US 2011/013.6683 A1 Jun. 9, 2011 amine acylation), or labels may be attached through known or polymer Such as a peptide nucleic acid. Polynucleotides of the determinable conjugation schemes, many of which are invention can be selected from the subsets of the recited known in the art. nucleic acids described herein, as well as their complements. 0109 Labels useful in the invention described herein 0.115. In some embodiments, polynucleotides of the include any substance which can be detected when bound to invention comprise at least 20 consecutive bases as depicted or incorporated into the biomolecule of interest. Any effective in SEQID NOS:1-2114, or a complement thereto. The poly detection method can be used, including optical, spectro nucleotides may comprise at least 21, 22, 23, 24, 25, 27, 30, scopic, electrical, piezoelectrical, magnetic, Raman scatter 32, 35 or more consecutive bases as depicted in SEQ ID ing, Surface plasmon resonance, colorimetric, calorimetric, NOs:1-2114, as applicable. etc. A label is typically selected from a chromophore, a lumi 0116. In some embodiments, the nucleic acid in (a) can be phore, a fluorophore, one member of a quenching system, a selected from those in Table 3, and from SEQ ID NOs: 1, 4, chromogen, a hapten, an antigen, a magnetic particle, a mate 915, 6,916, 9,917,920,922, 14, 15, 16,928,929, 18, 19,931, rial exhibiting nonlinear optics, a semiconductor nanocrystal, 20, 21,935, and 936; or from SEQID NOs: 1, 2, 3, 4, 5, 6, 7, a metal nanoparticle, an enzyme, an antibody or binding 8, 9, 10, 11,914, 915,916,917,918, 919, and 920; or from portion or equivalent thereof, an aptamer, and one member of SEQID NOs: 1, 4, 6, 9, 14-16, 18-21,915-917,920,922,928, a binding pair, and combinations thereof. Quenching schemes 929,931, 935 and 936; or from SEQ ID NOs 3, 36, 60, 63, may be used, wherein a quencher and a fluorophore as mem 926,971,978,999, 1014 and 1022; or from SEQID NOs 1-3, bers of a quenching pair may be used on a probe, such that a 32, 33, 36,46, 60, 63, 66, 69,88, 100,241,265,334,437,920, change in optical parameters occurs upon binding to the target 925,934, 945, 947,954,971,978,999, 1004, 1014, 1022, introduce or quench the signal from the fluorophore. One 1023, 1032, 1080, 1093, 1101, 1164, 1248, 1304, 1311, 1330, example of Such a system is a molecular beacon. Suitable 1402 and 1425. quencher/fluorophore systems are known in the art. The label 0117 The polynucleotides may be provided in a variety of may be bound through a variety of intermediate linkages. For formats, including as solids, in Solution, or in an array. The example, a polynucleotide may comprise a biotin-binding polynucleotides may optionally comprise one or more labels, species, and an optically detectable label may be conjugated which may be chemically and/or enzymatically incorporated to biotin and then bound to the labeled polynucleotide. Simi into the polynucleotide. larly, a polynucleotide sensor may comprise an immunologi 0118. In one embodiment, Solutions comprising poly cal species such as an antibody or fragment, and a secondary nucleotide and a solvent are also provided. In some embodi antibody containing an optically detectable label may be ments, the solvent may be water or may be predominantly added. aqueous. In some embodiments, the solution may comprise at 0110 Chromophores useful in the methods described least two, three, four, five, six, seven, eight, nine, ten, twelve, herein include any Substance which can absorb energy and fifteen, seventeen, twenty or more different polynucleotides, emit light. For multiplexed assays, a plurality of different including primers and primer pairs, of the invention. Addi signaling chromophores can be used with detectably different tional Substances may be included in the solution, alone or in emission spectra. The chromophore can be a lumophore or a combination, including one or more labels, additional Sol fluorophore. Typical fluorophores include fluorescent dyes, vents, buffers, biomolecules, polynucleotides, and one or semiconductor nanocrystals, lanthanide chelates, polynucle more enzymes useful for performing methods described otide-specific dyes and green fluorescent protein. herein, including polymerases and ligases. The Solution may 0111 Coding schemes may optionally be used, compris further comprise a primer or primer pair capable of amplify ing encoded particles and/or encoded tags associated with ing a polynucleotide of the invention present in the solution. different polynucleotides of the invention. A variety of dif 0119. In some embodiments, one or more polynucleotides ferent coding schemes are known in the art, including fluo provided herein can be provided on a substrate. The substrate rophores, including SCNCs, deposited metals, and RF tags. can comprise a wide range of material, either biological, 0112 Polynucleotides from the described target nonbiological, organic, inorganic, or a combination of any of sequences may be employed as probes for detecting target these. For example, the Substrate may be a polymerized Lang sequences expression, for ligation amplification schemes, or muir Blodgett film, functionalized glass, Si, Ge. GaAs, GaP. may be used as primers for amplification schemes of all or a SiO, SiN., modified silicon, or any one of a wide variety of portion of a target sequences. When amplified, either strand gels or polymers such as (poly) tetrafluoroethylene, (poly) produced by amplification may be provided in purified and/or vinylidenedifluoride, polystyrene, cross-linked polystyrene, isolated form. polyacrylic, polylactic acid, polyglycolic acid, poly(lactide 0113. In one embodiment, polynucleotides of the inven coglycolide), polyanhydrides, poly(methyl methacrylate), tion include a nucleic acid depicted in (a) any one of SEQID poly(ethylene-co-vinyl acetate), polysiloxanes, polymeric NOs: 1-2114, or a subgroup thereofas set forth herein; (b) an silica, latexes, dextran polymers, epoxies, polycarbonates, or RNA form of any one of the nucleic acids depicted in SEQID combinations thereof. Conducting polymers and photocon NOs: 1-2114, or a subgroup thereofas set forth herein; (c) a ductive materials can be used. peptide nucleic acid form of any of the nucleic acids depicted I0120 Substrates can be planar crystalline substrates such in SEQ ID NOs: 1-2114, or a subgroup thereofas set forth as silica based substrates (e.g. glass, quartz, or the like), or herein; (d) a nucleic acid comprising at least 20 consecutive crystalline Substrates used in, e.g., the semiconductor and bases of any of (a-c); (e) a nucleic acid comprising at least 25 microprocessor industries, such as silicon, gallium arsenide, bases having at least 90% sequenced identity to any of (a-c); indium doped GaN and the like, and includes semiconductor and (f) a complement to any of (a-e). nanocrystals. 0114 Complements may take any polymeric form capable I0121 The substrate can take the form of an array, a pho of base pairing to the species recited in (a)-(e), including todiode, an optoelectronic sensor Such as an optoelectronic nucleic acid such as RNA or DNA, or may be a neutral semiconductor chip or optoelectronic thin-film semiconduc US 2011/013.6683 A1 Jun. 9, 2011

tor, or a biochip. The location(s) of probe(s) on the substrate 90/15070). Techniques for the synthesis of these arrays using can be addressable; this can be done in highly dense formats, mechanical synthesis strategies are described in, e.g., PCT and the location(s) can be microaddressable or nanoaddress Publication No. WO 93/09668 and U.S. Pat. No. 5,384,261. able. Still further techniques include bead based techniques such as 0122 Silica aerogels can also be used as Substrates, and those described in PCT Appl. No. PCT/US93/04145 and pin can be prepared by methods known in the art. Aerogel Sub based methods such as those described in U.S. Pat. No. 5,288, strates may be used as free standing Substrates or as a Surface 514. Additional flow channel or spotting methods applicable coating for another substrate material. to attachment of sensor polynucleotides to a Substrate are 0123. The substrate can take any form and typically is a described in U.S. patent application Ser. No. 07/980,523, plate, slide, bead, pellet, disk, particle, microparticle, nano filed Nov. 20, 1992, and U.S. Pat. No. 5,384,261. particle, Strand, precipitate, optionally porous gel, sheets, 0.130. Alternatively, the polynucleotide probes of the tube, sphere, container, capillary, pad, slice, film, chip, mul present invention can be prepared by enzymatic digestion of tiwell plate or dish, optical fiber, etc. The substrate can be any the naturally occurring target gene, or mRNA or cDNA form that is rigid or semi-rigid. The Substrate may contain raised or depressed regions on which an assay component is derived therefrom, by methods known in the art. located. The surface of the substrate can be etched using known techniques to provide for desired surface features, for Prostate Cancer Prognostic Methods example trenches, V-grooves, mesa structures, or the like. 0.124 Surfaces on the substrate can be composed of the I0131 The present invention further provides methods for same material as the substrate or can be made from a different characterizing prostate cancer sample for recurrence risk. The material, and can be coupled to the Substrate by chemical or methods use the Prostate Cancer Prognostic Sets, probes and physical means. Such coupled Surfaces may be composed of primers described herein to provide expression signatures or any of a wide variety of materials, for example, polymers, profiles from a test sample derived from a Subject having or plastics, resins, polysaccharides, silica or silica-based mate Suspected of having prostate cancer. In some embodiments, rials, carbon, metals, inorganic glasses, membranes, or any of Such methods involve contacting a test sample with Prostate the above-listed substrate materials. The surface can be opti Cancer Prognostic probes (either in solution or immobilized) cally transparent and can have surface Si-OH functional under conditions that permit hybridization of the probe(s) to ities, such as those found on silica Surfaces. any target nucleic acid(s) present in the test sample and then 0.125. The substrate and/or its optional surface can be cho detecting any probe: target duplexes formed as an indication Sen to provide appropriate characteristics for the synthetic of the presence of the target nucleic acid in the sample. and/or detection methods used. The substrate and/or surface Expression patterns thus determined are then compared to can be transparent to allow the exposure of the substrate by one or more reference profiles or signatures. Optionally, the light applied from multiple directions. The substrate and/or expression pattern can be normalized. The methods use the surface may be provided with reflective “mirror structures to Prostate Cancer Prognostic Sets, probes and primers increase the recovery of light. described herein to provide expression signatures or profiles 0126 The substrate and/or its surface is generally resistant from a test sample derived from a subject to classify the to, or is treated to resist, the conditions to which it is to be prostate cancer as recurrent or non-recurrent. exposed in use, and can be optionally treated to remove any (0132. In some embodiments, such methods involve the resistant material after exposure to such conditions. specific amplification of target sequences nucleic acid(s) 0127. The substrate or a region thereofmay be encoded so present in the test sample using methods known in the art to that the identity of the sensor located in the substrate or region generate an expression profile or signature which is then being queried may be determined Any Suitable coding compared to a reference profile or signature. scheme can be used, for example optical codes, RFID tags, I0133. In some embodiments, the invention further pro magnetic codes, physical codes, fluorescent codes, and com vides for prognosing patient outcome, predicting likelihood binations of codes. of recurrence after prostatectomy and/or for designating treatment modalities. Preparation of Probes and Primers I0134. In one embodiment, the methods generate expres 0128. The polynucleotide probes or primers of the present sion profiles or signatures detailing the expression of the 2114 invention can be prepared by conventional techniques well target sequences having altered relative expression with dif known to those skilled in the art. For example, the polynucle ferent prostate cancer outcomes. In one embodiment, the otide probes can be prepared using Solid-phase synthesis methods generate expression profiles or signatures detailing using commercially available equipment. As is well-known in the expression of the Subsets of these target sequences having the art, modified oligonucleotides can also be readily pre 526 or 18 target sequences as described in the examples. pared by similar methods. The polynucleotide probes can also I0135) In some embodiments, increased relative expression be synthesized directly on a solid Support according to meth of one or more of SEQIDs: 1-913, decreased relative expres ods standard in the art. This method of synthesizing poly sion of one or more of SEQID NOS:914-2114 or a combina nucleotides is particularly useful when the polynucleotide tion of any thereof is indicative of a non-recurrent form of probes are part of a nucleic acid array. prostate cancer and may be predictive a NED clinical out 0129 Polynucleotide probes or primers can be fabricated come after Surgery. In some embodiments, increased relative on or attached to the substrate by any suitable method, for expression of SEQIDs:914-2114, decreased relative expres example the methods described in U.S. Pat. No. 5,143,854, sion of one or more of SEQID NOs: 1-913 or a combination PCT Publ. No. WO92/10092, U.S. patent application Ser. of any thereof is indicative of a recurrent form of prostate No. 07/624,120, filed Dec. 6, 1990 (now abandoned), Fodor cancer and may be predictive of a SYS clinical outcome after et al., Science, 251: 767-777 (1991), and PCT Publ. No. WO Surgery. Increased or decreased expression of target US 2011/013.6683 A1 Jun. 9, 2011

sequences represented in these sequence listings, or of the 0143. In one embodiments, the sample or portion of the target sequences described in the examples, may be utilized in sample comprising or Suspected of comprising prostate can the methods of the invention. cer tissue or cells can be any source of biological material, 0136. In one embodiment, intermediate levels of expres including cells, tissue or fluid, including bodily fluids. Non sion of one or more target sequences depicted in Table 7 limiting examples of the Source of the sample include an indicate a probability of future biochemical recurrence. aspirate, a needle biopsy, a cytology pellet, a bulk tissue 0137 In some embodiments, the methods detect combi preparation or a section thereof obtained for example by nations of expression levels of sequences exhibiting positive Surgery or autopsy, lymph fluid, blood, plasma, serum, and negative correlation with a disease status. In one embodi tumors, and organs. ment, the methods detect a minimal expression signature. 0144. The samples may be archival samples, having a 0138 Any method of detecting and/or quantitating the known and documented medical outcome, or may be samples expression of the encoded target sequences can in principle be from current patients whose ultimate medical outcome is not used in the invention. Such methods can include Northern yet known. blotting, array or microarray hybridization, by enzymatic 0145. In some embodiments, the sample may be dissected cleavage of specific structures (e.g., an Invader Rassay, Third prior to molecular analysis. The sample may be prepared via Wave Technologies, e.g. as described in U.S. Pat. Nos. 5,846, macrodissection of a bulk tumor specimen orportion thereof, 717, 6,090,543; 6,001,567; 5,985,557; and 5,994,069) and or may be treated via microdissection, for example via Laser amplification methods, e.g. RT-PCR, including in a Taq Capture Microdissection (LCM). Man(R) assay (PE Biosystems, Foster City, Calif., e.g. as 0146 The sample may initially be provided in a variety of described in U.S. Pat. Nos. 5,962,233 and 5,538,848), and states, as fresh tissue, fresh frozen tissue, fine needle aspi may be quantitative or semi-quantitative, and may vary rates, and may be fixed or unfixed. Frequently, medical labo depending on the origin, amount and condition of the avail ratories routinely prepare medical samples in a fixed state, able biological sample. Combinations of these methods may which facilitates tissue storage. A variety of fixatives can be also be used. For example, nucleic acids may be amplified, used to fix tissue to stabilize the morphology of cells, and may labeled and Subjected to microarray analysis. Single-mol be used alone or in combination with other agents. Exemplary ecule sequencing (e.g., Illumina, Helicos, PacBio, ABI fixatives include crosslinking agents, alcohols, acetone, SOLID), in situ hybridization, bead-array technologies (e.g., Bouin’s solution, Zenker solution, Hely solution, osmic acid Luminex XMAP, Illumina BeadChips), branched DNA tech Solution and Carnoy solution. nology (e.g., Panomics, Genisphere). 0147 Crosslinking fixatives can comprise any agent suit 0.139. The expressed target sequences can be directly able for forming two or more covalent bonds, for example an detected and/or quantitated, or may be copied and/or ampli aldehyde. Sources of aldehydes typically used for fixation fied to allow detection of amplified copies of the expressed include formaldehyde, paraformaldehyde, glutaraldehyde or target sequences or its complement. In some embodiments, formalin. Preferably, the crosslinking agent comprises form degraded and/or fragmented RNA can be usefully analyzed aldehyde, which may be included in its native form or in the for expression levels of target sequences, for example RNA form of paraformaldehyde or formalin. One of skill in the art having an RNA integrity number of less than 8. would appreciate that for samples in which crosslinking fixa 0140. In some embodiments, quantitative RT-PCR assays tives have been used special preparatory steps may be neces are used to measure the expression level of target sequences sary including for example heating steps and proteinase-k depicted in SEQIDs: 1-2114. In other embodiments, a Gene digestion; see methods Chip or microarray can be used to measure the expression of 0148 One or more alcohols may be used to fix tissue, one or more of the target sequences. alone or in combination with other fixatives. Exemplary alco hols used for fixation include methanol, ethanol and isopro 0141 Molecular assays measure the relative expression panol. levels of the target sequences, which can be normalized to the expression levels of one or more control sequences, for 0149 Formalin fixation is frequently used in medical example array control sequences and/or one or more house laboratories. Formalin comprises both an alcohol, typically keeping genes, for example GAPDH. Increased (or methanol, and formaldehyde, both of which can act to fix a decreased) relative expression of the target sequences as biological sample. described herein, including any of SEQID NOS:1-2114, may 0150. Whether fixed or unfixed, the biological sample may thus be used alone or in any combination with each other in optionally be embedded in an embedding medium. Exem the methods described herein. In addition, negative control plary embedding media used in histology including paraffin, probes may be included. Tissue-Tek R. V.I.PTM, Paramat, Paramat Extra, Paraplast, Paraplast X-tra, Paraplast Plus, Peel Away Paraffin Embed Diagnostic Samples ding Wax, Polyester Wax, Carbowax Polyethylene Glycol, PolyfinTM, Tissue Freezing Medium TFMTM, Cryo-GelTM, 0142 Diagnostic samples for use with the systems and in and OCT Compound (Electron Microscopy Sciences, Hat the methods of the present invention comprise nucleic acids field, Pa.). Prior to molecular analysis, the embedding mate suitable for providing RNAs expression information. In prin rial may be removed via any suitable techniques, as known in ciple, the biological sample from which the expressed RNA is the art. For example, where the sample is embedded in wax, obtained and analyzed for target sequence expression can be the embedding material may be removed by extraction with any material Suspected of comprising prostate cancer tissue organic solvent(s), for example Xylenes. Kits are commer or cells. The diagnostic sample can be a biological sample cially available for removing embedding media from tissues. used directly in a method of the invention. Alternatively, the Samples or sections thereof may be subjected to further pro diagnostic sample can be a sample prepared from a biological cessing steps as needed, for example serial hydration or dehy sample. dration steps. US 2011/013.6683 A1 Jun. 9, 2011

0151. In some embodiments, the sample is a fixed, wax transcriptase and a primer to create cDNA prior to detection, embedded biological sample. Frequently, Samples from quantitation and/or amplification; this can be done in vitro medical laboratories are provided as fixed, wax-embedded with purified mRNA or in situ, e.g., in cells or tissues affixed samples, most commonly as formalin-fixed, paraffin embed to a slide. ded (FFPE) tissues. 0158. By “amplification' is meant any process of produc 0152 Whatever the source of the biological sample, the ing at least one copy of a nucleic acid, in this case an target polynucleotide that is ultimately assayed can be pre expressed RNA, and in many cases produces multiple copies. pared synthetically (in the case of control sequences), but An amplification product can be RNA or DNA, and may typically is purified from the biological source and Subjected include a complementary strand to the expressed target to one or more preparative steps. The RNA may be purified to sequence. DNA amplification products can be produced ini remove or diminish one or more undesired components from tially through reverse translation and then optionally from the biological sample or to concentrate it. Conversely, where further amplification reactions. The amplification product the RNA is too concentrated for the particular assay, it may be may include all or a portion of a target sequence, and may diluted. optionally be labeled. A variety of amplification methods are Suitable for use, including polymerase-based methods and RNA Extraction ligation-based methods. Exemplary amplification techniques 0153. RNA can be extracted and purified from biological include the polymerase chain reaction method (PCR), the samples using any Suitable technique. A number of tech ligase chain reaction (LCR), ribozyme-based methods, self niques are known in the art, and several are commercially Sustained sequence replication (3SR), nucleic acid sequence available (e.g., FormaPureTM nucleic acid extraction kit, based amplification (NASBA), the use of Q Beta replicase, Agencourt Biosciences, Beverly Mass., High Pure FFPE reverse transcription, nick translation, and the like. RNA Micro KitTM, Roche Applied Science, Indianapolis, 0159. Asymmetric amplification reactions may be used to Ind.). RNA can be extracted from frozen tissue sections using preferentially amplify one strand representing the target TRIZol (Invitrogen, Carlsbad, Calif.) and purified using RNe sequence that is used for detection as the target polynucle asy Protect kit (Qiagen, Valencia, Calif.). RNA can be further otide. In some cases, the presence and/or amount of the ampli purified using DNAse I treatment (Ambion, Austin, Tex.) to fication product itself may be used to determine the expres eliminate any contaminating DNA. RNA concentrations can sion level of a given target sequence. In other instances, the be made using a Nanodrop ND-1000 spectrophotometer amplification product may be used to hybridize to an array or (Nanodrop Technologies, Rockland, Del.). RNA integrity can other substrate comprising sensor polynucleotides which are be evaluated by running electropherograms, and RNA integ used to detect and/or quantitate target sequence expression. rity number (RIN, a correlative measure that indicates intact 0160 The first cycle of amplification in polymerase-based ness of mRNA) can be determined using the RNA 6000 methods typically forms a primer extension product comple Pico Assay for the Bioanalyzer 2100 (Agilent Technologies, mentary to the template strand. If the template is single Santa Clara, Calif.). stranded RNA, a polymerase with reverse transcriptase activ ity is used in the first amplification to reverse transcribe the Reverse Transcription for QRT-PCR Analysis RNA to DNA, and additional amplification cycles can be performed to copy the primer extension products. The prim 0154 Reverse transcription can be performed using the ers for a PCR must, of course, be designed to hybridize to Omniscript kit (Qiagen, Valencia, Calif.), Superscript III kit regions in their corresponding template that will produce an (Invitrogen, Carlsbad, Calif.), for RT-PCR. Target-specific amplifiable segment; thus, each primer must hybridize so that priming can be performed in order to increase the sensitivity its 3' nucleotide is paired to a nucleotide in its complementary of detection of target sequences and generate target-specific template strand that is located 3' from the 3' nucleotide of the cDNA primer used to replicate that complementary template strand 0155 TaqMan(R) Gene Expression Analysis in the PCR. 0156 TaqMan(R) RT-PCR can be performed using Applied 0.161 The target polynucleotide can be amplified by con Biosystems Prism (ABI) 7900 HT instruments in a 5ul vol tacting one or more strands of the target polynucleotide with ume with target sequence-specific cDNA equivalent to 1 ng a primer and a polymerase having Suitable activity to extend total RNA. Primers and probes concentrations for TaqMan the primer and copy the target polynucleotide to produce a analysis are added to amplify fluorescent amplicons using full-length complementary polynucleotide or a smaller por PCR cycling conditions such as 95°C. for 10 minutes for one tion thereof. Any enzyme having a polymerase activity that cycle, 95°C. for 20 seconds, and 60° C. for 45 seconds for 40 can copy the target polynucleotide can be used, including cycles. A reference sample can be assayed to ensure reagent DNA polymerases, RNA polymerases, reverse transcriptases, and process stability. Negative controls (i.e., no template) enzymes having more than one type of polymerase or enzyme should be assayed to monitor any exogenous nucleic acid activity. The enzyme can be thermolabile or thermostable. contamination. Mixtures of enzymes can also be used. Exemplary enzymes Amplification and Hybridization include: DNA polymerases such as DNA Polymerase I (“Pol I”), the Klenow fragment of Pol I, T4, T7, Sequenase(R). T7. 0157 Following sample collection and nucleic acid Sequenase RVersion 2.0 T7, Tub, Taq, Tth, Pfx, Pfu, Tsp, Tfl, extraction, the nucleic acid portion of the sample comprising Tli and Pyrococcus sp GB-D DNA polymerases; RNA poly RNA that is or can be used to prepare the target polynucle merases such as E. coli, SP6, T3 and T7 RNA polymerases: otide(s) of interest can be subjected to one or more prepara and reverse transcriptases such as AMV. M-MuDV. MMLV. tive reactions. These preparative reactions can include in vitro RNAse H MMLV (SuperScript(R), SuperScript(R) II, Ther transcription (IVT), labeling, fragmentation, amplification moScript(R), HIV-1, and RAV2 reverse transcriptases. All of and other reactions. mRNA can first be treated with reverse these enzymes are commercially available. Exemplary poly US 2011/013.6683 A1 Jun. 9, 2011 merases with multiple specificities include RAV2 and Tli Genome Analysis System (Illumina, Hayward, Calif.), Ope (exo-) polymerases. Exemplary thermostable polymerases nArray Real Time qPCR (BioTrove, Woburn, Mass.) and include Tub, Taq. Tth, Pfx, Pfu, Tsp, Tfl, Tli and Pyrococcus BeadXpress System (Illumina, Hayward, Calif.). sp. GB-D DNA polymerases. 0162 Suitable reaction conditions are chosen to permit Prostate Classification Arrays amplification of the target polynucleotide, including pH, 0166 The present invention contemplates that a Prostate buffer, ionic strength, presence and concentration of one or Cancer Prognostic Set or probes derived therefrom may be more salts, presence and concentration of reactants and cofac provided in an array format. In the context of the present tors such as nucleotides and magnesium and/or other metal invention, an "array' is a spatially or logically organized ions (e.g., manganese), optional cosolvents, temperature, collection of polynucleotide probes. Any array comprising thermal cycling profile for amplification schemes comprising sensor probes specific for two or more of SEQ ID NOs: a polymerase chain reaction, and may depend in part on the 1-2114 or a product derived therefrom can be used. Desirably, polymerase being used as well as the nature of the sample. an array will be specific for 5, 10, 15, 20, 25, 30, 50, 75, 100, Cosolvents include formamide (typically at from about 2 to 150, 200,250,300,350,400, 450, 500, 600, 700, 1000, 1200, about 10%), glycerol (typically at from about 5 to about 1400, 1600, 1800, 2000 or more of SEQ ID NOs: 1-2114. 10%), and DMSO (typically at from about 0.9 to about 10%). Expression of these sequences may be detected alone or in Techniques may be used in the amplification scheme in order combination with other transcripts. In some embodiments, an to minimize the production of false positives or artifacts pro array is used which comprises a wide range of sensor probes duced during amplification. These include “touchdown” for prostate-specific expression products, along with appro PCR, hot-start techniques, use of nested primers, or designing PCR primers so that they form stem-loop structures in the priate control sequences. An array of interest is the Human event of primer-dimer formation and thus are not amplified. Exon 1.0 ST Array (HuEx 1.0 ST, Affymetrix, Inc., Santa Techniques to accelerate PCR can be used, for example cen Clara, Calif.). trifugal PCR, which allows for greater convection within the 0.167 Typically the polynucleotide probes are attached to sample, and comprising infrared heating steps for rapid heat a solid substrate and are ordered so that the location (on the ing and cooling of the sample. One or more cycles of ampli substrate) and the identity of each are known. The polynucle fication can be performed. An excess of one primer can be otide probes can be attached to one of a variety of solid used to produce an excess of one primer extension product Substrates capable of withstanding the reagents and condi during PCR; preferably, the primer extension product pro tions necessary for use of the array. Examples include, but are duced in excess is the amplification product to be detected. A not limited to, polymers, such as (poly)tetrafluoroethylene, plurality of different primers may be used to amplify different (poly)Vinylidenedifluoride, polystyrene, polycarbonate, target polynucleotides or different regions of a particular polypropylene and polystyrene; ceramic; silicon; silicon target polynucleotide within the sample. dioxide; modified silicon, (fused) silica, quartZorglass; func tionalized glass; paper, Such as filter paper, diazotized cellu 0163 An amplification reaction can be performed under lose; nitrocellulose filter, nylon membrane; and polyacryla conditions which allow an optionally labeled sensor poly mide gel pad. Substrates that are transparent to light are useful nucleotide to hybridize to the amplification product during at for arrays that will be used in an assay that involves optical least part of an amplification cycle. When the assay is per detection. formed in this manner, real-time detection of this hybridiza 0168 Examples of array formats include membrane or tion event can take place by monitoring for light emission or filter arrays (for example, nitrocellulose, nylon arrays), plate fluorescence during amplification, as known in the art. arrays (for example, multiwell, such as a 24- 96-, 256-, 384-, 0164. Where the amplification product is to be used for 864- or 1536-well, microtitre plate arrays), pin arrays, and hybridization to an array or microarray, a number of suitable bead arrays (for example, in a liquid “slurry). Arrays on commercially available amplification products are available. Substrates Such as glass or ceramic slides are often referred to These include amplification kits available from NuGEN, Inc. as chip arrays or “chips. Such arrays are well known in the (San Carlos, Calif.), including the WTOvationTM System, art. In one embodiment of the present invention, the Prostate WTOvationTM System v2, WTOvationTM Pico System, WT Cancer Prognosticarray is a chip. OvationTM FFPE Exon Module, WTOvationTM FFPE Exon Module Ribo Amp and RiboAmp' RNA Amplification Kits Data Analysis (MDS Analytical Technologies (formerly Arcturus) (Moun tain View, Calif.), Genisphere, Inc. (Hatfield, Pa.), including 0169 Array data can be managed and analyzed using tech the Ramp Up PlusTM and Sense.AmpTM RNA Amplification niques known in the art. The Genetrix suite of tools can be kits, alone or in combination. Amplified nucleic acids may be used for microarray analysis (Epicenter Software, Pasadena, Subjected to one or more purification reactions after amplifi Calif.). Probe set modeling and data pre-processing can be cation and labeling, for example using magnetic beads (e.g., derived using the Robust Multi-Array (RMA) algorithm or RNAClean magnetic beads, Agencourt BioSciences). variant GC-RMA, Probe Logarithmic Intensity Error 0.165 Multiple RNA biomarkers can be analyzed using (PLIER) algorithm or variant iterPLIER. Variance or inten real-time quantitative multiplex RT-PCR platforms and other sity filters can be applied to pre-process data using the RMA multiplexing technologies such as GenomeLab GeXP algorithm, for example by removing target sequences with a Genetic Analysis System (Beckman Coulter, Foster City, standard deviation of <10 or a mean intensity of <100 inten Calif.), SmartCyclerR 9600 or GeneXpert(R) Systems (Cep sity units of a normalized data range, respectively. heid, Sunnyvale, Calif.), ABI 7900 HT Fast RealTime PCR 0170 In some embodiments, one or more pattern recog system (Applied Biosystems, Foster City, Calif.), LightCy nition methods can be used in analyzing the expression level clerR 480 System (Roche Molecular Systems, Pleasanton, of target sequences. The pattern recognition method can com Calif.), XMAP 100 System (Luminex, Austin, Tex.) Solexa prise a linear combination of expression levels, or a nonlinear US 2011/013.6683 A1 Jun. 9, 2011 20 combination of expression levels. In some embodiments, 0177. In some embodiments, portfolios are established expression measurements for RNA transcripts or combina such that the combination of genes in the portfolio exhibit tions of RNA transcript levels are formulated into linear or improved sensitivity and specificity relative to known meth non-linear models or algorithms (i.e., an expression signa ods. In considering a group of genes for inclusion in a port ture) and converted into a likelihood score. This likelihood folio, a small standard deviation in expression measurements score indicates the probability that a biological sample is correlates with greater specificity. Other measurements of from a patient who will exhibit no evidence of disease, who variation Such as correlation coefficients can also be used in will exhibit systemic cancer, or who will exhibit biochemical this capacity. The invention also encompasses the above recurrence. The likelihood score can be used to distinguish methods where the specificity is at least about 50% or at least these disease states. The models and/or algorithms can be about 60%. The invention also encompasses the above meth provided in machine readable format, and may be used to ods where the sensitivity is at least about 90%. correlate expression levels or an expression profile with a 0.178 The gene expression profiles of each of the target disease state, and/or to designate a treatment modality for a sequences comprising the portfolio can fixed in a medium patient or class of patients. Such as a computer readable medium. This can take a number of forms. For example, a table can be established into which 0171 Thus, results of the expression level analysis can be the range of signals (e.g., intensity measurements) indicative used to correlate increased expression of RNAs correspond of disease or outcome is input. Actual patient data can then be ing to SEQ ID NOs: 1-2114, or subgroups thereof as compared to the values in the table to determine the patient described herein, with prostate cancer outcome, and to des samples diagnosis or prognosis. In a more Sophisticated ignate a treatment modality. embodiment, patterns of the expression signals (e.g., fluores 0172 Factors known in the art for diagnosing and/or Sug cent intensity) are recorded digitally or graphically. gesting, selecting, designating, recommending or otherwise 0179 The expression profiles of the samples can be com determining a course of treatment for a patient or class of pared to a control portfolio. If the sample expression patterns patients suspected of having prostate cancer can be employed are consistent with the expression pattern for a known disease in combination with measurements of the target sequence or disease outcome, the expression patterns can be used to expression. These techniques include cytology, histology, designate one or more treatment modalities. For patients with ultrasound analysis, MRI results, CT scan results, and mea test scores consistent with systemic disease outcome after surements of PSA levels. prostatectomy, additional treatment modalities such as adju 0173 Certified tests for classifying prostate disease status Vant chemotherapy (e.g., docetaxel, mitoxantrone and pred and/or designating treatment modalities are also provided. A nisone), Systemic radiation therapy (e.g., Samarium or stron certified test comprises a means for characterizing the expres tium) and/or anti-androgen therapy (e.g., Surgical castration, sion levels of one or more of the target sequences of interest, finasteride, dutasteride) can be designated. Such patients and a certification from a government regulatory agency would likely be treated immediately with anti-androgen endorsing use of the test for classifying the prostate disease therapy alone or in combination with radiation therapy in status of a biological sample. order to eliminate presumed micro-metastatic disease, which 0.174. In some embodiments, the certified test may com cannot be detected clinically but can be revealed by the target prise reagents for amplification reactions used to detect and/ sequence expression signature. Such patients can also be or quantitate expression of the target sequences to be charac more closely monitored for signs of disease progression. For terized in the test. An array of probe nucleic acids can be used, patients with test scores consistent with PSA or NED, adju with or without prior target amplification, for use in measur vant therapy would not likely be recommended by their phy ing target sequence expression. sicians in order to avoid treatment-related side effects such as 0.175. The test is submitted to an agency having authority metabolic syndrome (e.g., hypertension, diabetes and/or to certify the test for use in distinguishing prostate disease weight gain) or osteoporosis. Patients with samples consis status and/or outcome. Results of detection of expression tent with NED could be designated for watchful waiting, or levels of the target sequences used in the test and correlation for no treatment. Patients with test scores that do not correlate with disease status and/or outcome are submitted to the with systemic disease but who have successive PSA increases agency. A certification authorizing the diagnostic and/or could be designated for watchful waiting, increased monitor prognostic use of the test is obtained. ing, or lower dose or shorter duration anti-androgen therapy. 0176 Also provided are portfolios of expression levels 0180 Target sequences can be grouped so that information comprising a plurality of normalized expression levels of the obtained about the set of target sequences in the group can be target sequences described herein, including SEQID NOs: 1 used to make or assist in making a clinically relevant judg 2114. Such portfolios may be provided by performing the ment such as a diagnosis, prognosis, or treatment choice. methods described herein to obtain expression levels from an 0181 A patient report is also provided comprising a rep individual patient or from a group of patients. The expression resentation of measured expression levels of a plurality of levels can be normalized by any method known in the art; target sequences in a biological sample from the patient, exemplary normalization methods that can be used in various wherein the representation comprises expression levels of embodiments include Robust Multichip Average (RMA), target sequences corresponding to any one, two, three, four, probe logarithmic intensity error estimation (PLIER), non five, six, eight, ten, twenty, thirty, fifty or more of the target linear fit (NLFIT) quantile-based and nonlinear normaliza sequences depicted in SEQID NOs: 1-2114, or of the subsets tion, and combinations thereof. Background correction can described herein, or of a combination thereof. In some also be performed on the expression data; exemplary tech embodiments, the representation of the measured expression niques useful for background correction include mode of level(s) may take the form of a linear or nonlinear combina intensities, normalized using median polish probe modeling tion of expression levels of the target sequences of interest. and sketch-normalization. The patient report may be provided in a machine (e.g., a US 2011/013.6683 A1 Jun. 9, 2011 computer) readable format and/or in a hard (paper) copy. The amplifying the same number of target sequence-representa report can also include standard measurements of expression tive polynucleotides of interest. levels of said plurality of target sequences from one or more 0186 The reagents may independently be in liquid or solid sets of patients with known disease status and/or outcome. form. The reagents may be provided in mixtures. Control The report can be used to inform the patient and/or treating samples and/or nucleic acids may optionally be provided in physician of the expression levels of the expressed target the kit. Control samples may include tissue and/or nucleic sequences, the likely medical diagnosis and/or implications, acids obtained from or representative of prostate tumor and optionally may recommend a treatment modality for the samples from patients showing no evidence of disease, as patient. well as tissue and/or nucleic acids obtained from or represen 0182 Also provided are representations of the gene tative of prostate tumor samples from patients that develop expression profiles useful for treating, diagnosing, prognos systemic prostate cancer. ticating, and otherwise assessing disease. In some embodi 0187. The nucleic acids may be provided in an array for ments, these profile representations are reduced to a medium mat, and thus an array or microarray may be included in the that can be automatically readby a machine such as computer kit. The kit optionally may be certified by a government readable media (magnetic, optical, and the like). The articles agency for use in prognosing the disease outcome of prostate can also include instructions for assessing the gene expres cancer patients and/or for designating a treatment modality. sion profiles in Such media. For example, the articles may 0188 Instructions for using the kit to perform one or more comprise a readable storage form having computer instruc methods of the invention can be provided with the container, tions for comparing gene expression profiles of the portfolios and can be provided in any fixed medium. The instructions of genes described above. The articles may also have gene may be located inside or outside the container or housing, expression profiles digitally recorded therein so that they may and/or may be printed on the interior or exterior of any surface be compared with gene expression data from patient samples. thereof. A kit may be in multiplex form for concurrently Alternatively, the profiles can be recorded in different repre detecting and/or quantitating one or more different target sentational format. A graphical recordation is one such for polynucleotides representing the expressed target sequences. mat. Clustering algorithms can assist in the visualization of Devices Such data. 0189 Devices useful for performing methods of the inven Kits tion are also provided. The devices can comprise means for characterizing the expression level of a target sequence of the 0183 Kits for performing the desired method(s) are also invention, for example components for performing one or provided, and comprise a container or housing for holding the more methods of nucleic acid extraction, amplification, and/ components of the kit, one or more vessels containing one or or detection. Such components may include one or more of an more nucleic acid(s), and optionally one or more vessels amplification chamber (for example athermal cycler), a plate containing one or more reagents. The reagents include those reader, a spectrophotometer, capillary electrophoresis appa described in the composition of matter section above, and ratus, a chip reader, and or robotic sample handling compo those reagents useful for performing the methods described, nents. These components ultimately can obtain data that including amplification reagents, and may include one or reflects the expression level of the target sequences used in the more probes, primers or primer pairs, enzymes (including assay being employed. polymerases and ligases), intercalating dyes, labeled probes, 0190. The devices may include an excitation and/or a and labels that can be incorporated into amplification prod detection means. Any instrument that provides a wavelength uctS. that can excite a species of interest and is shorter than the 0184. In some embodiments, the kit comprises primers or emission wavelength(s) to be detected can be used for exci primer pairs specific for those Subsets and combinations of tation. Commercially available devices can provide suitable target sequences described herein. At least two, three, four or excitation wavelengths as well as Suitable detection compo five primers or pairs of primers suitable for selectively ampli nentS. fying the same number of target sequence-specific polynucle 0191 Exemplary excitation sources include a broadband otides can be provided in kit form. In some embodiments, the UV light source such as a deuterium lamp with an appropriate kit comprises from five to fifty primers or pairs of primers filter, the output of a white light source Such as a Xenon lamp Suitable for amplifying the same number of target sequence or a deuterium lamp after passing through a monochromator representative polynucleotides of interest. to extract out the desired wavelength(s), a continuous wave 0185. The primers or primer pairs of the kit, when used in (cw) gas laser, a solid state diode laser, or any of the pulsed an amplification reaction, specifically amplify at least a por lasers. Emitted light can be detected through any suitable tion of a nucleic acid depicted in one of SEQID NOs: 1-2114 device or technique; many suitable approaches are known in (or subgroups thereof as set forth herein), an RNA form the art. For example, a fluorimeter or spectrophotometer may thereof, or a complement to either thereof. The kit may be used to detect whether the test sample emits light of a include a plurality of Such primers or primer pairs which can wavelength characteristic of a label used in an assay. specifically amplify a corresponding plurality of different 0.192 The devices typically comprise a means for identi nucleic acids depicted in one of SEQ ID NOs: 1-2114 (or fying a given sample, and of linking the results obtained to subgroups thereofas set forth herein), RNA forms thereof, or that sample. Such means can include manual labels, barcodes, complements thereto. At least two, three, four or five primers and other indicators which can be linked to a sample vessel, or pairs of primers suitable for selectively amplifying the and/or may optionally be included in the sample itself, for same number of target sequence-specific polynucleotides can example where an encoded particle is added to the sample. be provided in kit form. In some embodiments, the kit com The results may be linked to the sample, for example in a prises from five to fifty primers or pairs of primers suitable for computer memory that contains a sample designation and a US 2011/013.6683 A1 Jun. 9, 2011 22 record of expression levels obtained from the sample. Link (0205 6: Mendiratta P. Febbo P G. Genomic signatures age of the results to the sample can also include a linkage to a associated with the development, progression, and out particular sample receptacle in the device, which is also come of prostate cancer. Mol DiagnTher. 2007; 11 (6):345 linked to the sample identity. 54. 0193 The devices also comprise a means for correlating (0206 7: Reddy G. K. BalkSP. Clinical utility of microar the expression levels of the target sequences being studied ray-derived genetic signatures in predicting outcomes in with a prognosis of disease outcome. Such means may com prostate cancer. Clin Genitourin Cancer. 2006 December; prise one or more of a variety of correlative techniques, 5(3):187-9. Review. including lookup tables, algorithms, multivariate models, and 0207 8: True L, Coleman I, Hawley S. Huang CY. Gifford linear or nonlinear combinations of expression models or D. Coleman R. Beer T M. Gelmann E. Datta M. Mostaghel algorithms. The expression levels may be converted to one or E. Knudsen B. Lange P. Vessella R, Lin D. Hood L. Nelson more likelihood scores, reflecting a likelihood that the patient PS. A molecular correlate to the Gleason grading system providing the sample will exhibit a particular disease out for prostate adenocarcinoma. Proc Natl Acad Sci USA. come. The models and/or algorithms can be provided in 2006 Jul. 18; 103(29): 10991-6. Epub 2006 Jul. 7. machine readable format, and can optionally further desig 0208 9: Stephenson A.J. Smith A, Kattan MW. Satagopan nate a treatment modality for a patient or class of patients. J, Reuter V. E. Scardino PT, Gerald W L. Integration of 0194 The device also comprises output means for output gene expression profiling and clinical variables to predict ting the disease status, prognosis and/or a treatment modality. prostate carcinoma recurrence after radical prostatectomy. Such output means can take any form which transmits the Cancer. 2005 Jul. 15: 104(2):290-8. results to a patient and/or a healthcare provider, and may (0209) 10: Bueno R, Loughlin KR, Powell MH, Gordon G include a monitor, a printed format, or both. The device may J. A diagnostic test for prostate cancer from gene expres use a computer system for performing one or more of the sion profiling data.J Urol. 2004 February; 171 (2 Pt 1):903 steps provided. 6. 0210) 11:YuY P. Landsittel D, Jing L, Nelson J, Ren B, Liu CITATIONS L. McDonald C, Thomas R, Dhir R, Finkelstein S, Micha lopoulos G. Becich M, Luo J. H. Gene expression alter Patents and Published Applications ations in prostate cancer predicting tumor aggression and (0195 1. US 2003/0224399 A1 patent application Methods preceding development of malignancy. JClin Oncol. 2004 for determining the prognosis for patients with a prostate Jul. 15; 22(14):2790-9. 0211) 12: Feroze-Merzoug F. Schober M. S., Chen Y Q. neoplastic condition 2003 Dec. 4 Molecular profiling in prostate cancer. Cancer Metastasis (0196. 2. US 2007/0048738A1 patent application Methods Rev. 2001; 20(3–4): 165-71. Review. and compositions for diagnosis, staging and prognosis of 0212) 13: Nakagawa T. Kollmeyer T M. Morlan B W, prostate cancer 2007 Mar. 1 Anderson S K, Bergstralh E. J. Davis BJ, Asmann Y W. (0197) 3. US 2007/0099197 A1 patent application Methods Klee G G, Ballman KV, Jenkins R B. A tissue biomarker of prognosis of prostate cancer 2007 May 3 panel predicting systemic progression after PSA recur (0198 4. US 2007/0259352 A1 patent application Prostate rence post-definitive prostate cancer therapy. PLoS ONE. cancer-related nucleic acids 2007 Nov. 8 2008; 3(5):e2318. (0199 5. US 2008/0009001 A1 patent application Method 0213) 14: Shariat S F, Karakiewicz, P I, Roehrborn C G, for Identification of Neoplastic Transformation with Par Kattan MW. An updated catalog of prostate cancer predic ticular Reference to Prostate Cancer 2008 Jan. 10 tive tools. Cancer 2008; 113(11): 3062-6. Publications EXAMPLES (0200 1: Cooper CS, Campbell C, Jhavar S. Mechanisms 0214) To gain a better understanding of the invention of Disease: biomarkers and molecular targets from described herein, the following examples are set forth. It will microarray gene expression studies in prostate cancer. Nat be understood that these examples are intended to describe Clin Pract Urol. 2007 December; 4(12):677-87. Review. illustrative embodiments of the invention and are not intended 0201 2: Reddy G. K. BalkSP. Clinical utility of microar to limit the scope of the invention in any way. Efforts have ray-derived genetic signatures in predicting outcomes in been made to ensure accuracy with respect to numbers used prostate cancer. Clin Genitourin Cancer. 2006 December; (e.g., amounts, temperature, etc.) but some experimental 5(3):187-9. Review. error and deviation should be accounted for. Unless otherwise 0202 3: Nelson PS. Predicting prostate cancer behavior indicated, parts are parts by weight, temperature is degree using transcript profiles. J Urol. 2004 November; 172(5 Pt centigrade and pressure is at or near atmospheric, and all 2):S28-32; discussion S33. Review. materials are commercially available. 0203 4: Bibikova M. Chudin E, Arsanjani A, Zhou L, Garcia E. W. Modder J, Kostelec M, Barker D, Downs T. Example 1 Fan J. B. Wang-Rodriguez, J. Expression signatures that correlated with Gleason score and relapse in prostate can Identification of Target Sequences Differentially cer. Genomics. 2007 June: 89(6):666-72. Epub 2007 Apr. Expressed in Prostate Disease States 24. 0215 Tissue Samples. Formalin-fixed paraffin embedded 0204 5: SchlommT, Erbersdobler A, Mirlacher M, Sauter (FFPE) samples of human prostate adenocarcinoma prostate G. Molecular staging of prostate cancer in the year 2007. ctomies were collected from patients at the Mayo Clinic World J Urol. 2007 March; 25(1):19-30. Epub 2007 Mar. 2. Comprehensive Cancer Center according to an institutional Review. review board-approved protocol and stored in the Department US 2011/013.6683 A1 Jun. 9, 2011 of Pathology for up to 20 years. For each patient sample four displayed an expression pattern of NED>PSA>SYS or 4 micron sections were cut from formalin-fixed paraffin SYS>PSA>NED and genes were selected with a p-value embedded blocks. Pathological review of FFPE tissue sec cut-off of p-0.01 for two-way hierarchical clustering using tions was used to guide macrodissection of tumor and Sur Pearson's correlation distance metric with complete-linkage rounding normal tissue. Patients were classified into one of cluster distances. three clinical disease states; no evidence of disease (NED, 0219 Archived FFPE blocks of tumors were selected from n=10) for those patients with no biochemical or other clinical 30 patients that had undergone a prostatectomy at the Mayo signs of disease progression (at least 10 years follow-up); Clinic Comprehensive Cancer Center between the years prostate-specific antigen biochemical recurrence (PSA, 1987-1997, providing for at least 10 years follow-up on each n=10) for those patients with two successive increases in PSA patient. Twenty-two patient samples had RNA of sufficient measurements above an established cut-point of >4 ng/mL quantity and quality for RNA amplification and Subsequent (rising PSA); and systemic disease (SYS, n=10) for those GeneChip hybridization. Three clinical categories of patients patients that had rising PSA and developed metastases or were evaluated; patients alive with no evidence of disease clinically detectable disease progression within five years (NED, n=6), patients with rising PSA or biochemical recur after initial prostatectomy. Clinical disease was confirmed rence (defined as two Successive increases in PSA measure using bone or CT scans for prostate cancer metastases. ments) (PSA, n=7) and patients with rising PSA and clinical 0216 RNA Extraction. RNA was extracted and purified evidence of systemic or recurrent disease (e.g., determined by from FFPE tissue sections using a modified protocol for the bone scan, CT) (SYS, n=9) after prostatectomy. No statis commercially available High Pure FFPE RNA Micro nucleic tically significant differences between these three clinical acid extraction kit (Roche Applied Sciences, Indianapolis, groups were apparent when considering pathological factors Ind.). RNA concentrations were calculated using a Nanodrop Such as Gleason score or tumor stage (Table 1). AS Samples ND-1000 spectrophotometer (Nanodrop Technologies, from older archived FFPE blocks are typically more degraded Rockland, Del.). and fragmented than youngerblocks, the distribution of block 0217 RNA Amplification and GeneChip Hybridization. ages was similar in the three clinical groups so as not to skew Purified RNA was subjected to whole-transcriptome ampli or bias the results due to a block age effect. Fifty nanograms fication using the WTOvation FFPE system including the of RNA extracted from FFPE sections was amplified and WTOvation Exon and FL-Ovation Biotin V2 labeling mod hybridized to whole-transcriptome microarrays, interrogat ules, with the following modifications. Fifty (50) nanograms ing >1.4 million probe target sequences measuring RNA lev of RNA extracted from FFPE sections was used to generate els for RefSeq, dbEST and predicted transcripts (collectively, amplified Ribo-SPIA product. For the WTOvation Exon RNAs). sense-target strand conversion kit 4 ug of Ribo-SPIA product 0220 Table 3 displays the number of target sequences were used. All clean-up steps were performed with RNA identified in two-way comparisons between different clinical Clean magnetic beads (Agencourt BioSciences). Between 2.5 states using the appropriate t-tests and a p-value cut-off of and 5 micrograms of WTOvation Exon product were used to p-0.001. At total of 2,114 target sequences (Table 3) were fragment and label using the FL-Ovation Biotin V2 labeling identified as differentially expressed in these comparisons module and labeled product was hybridized to Affymetrix and a principle components analysis demonstrates that these Human Exon 1.0 ST GeneChips following manufacturer's target sequences discriminate the distinct clinical states into recommendations (Affymetrix, Santa Clara, Calif.). Of the 30 three clusters (FIG. 1A). samples processed, 22 had sufficient amplified material (i.e., 0221) A linear regression filter was next employed to sta >2.5ug of WTOvation Exon product) for GeneChip hybrid tistically rank target sequences that followed a trend of either ization. increased expression with poor prognosis patients (i.e., 0218 Microarray Analysis. All data management and SYS>PSA>NED) or increased expression in good prognosis analysis was conducted using the Genetrix Suite of tools for patients (NED-PSA>SYS, alternatively decreased expres microarray analysis (Epicenter Software, Pasadena, Calif.). sion in poor prognosis patients) (Table 4). FIG. 1B depicts a Probe set modeling and data pre-processing were derived two-way hierarchical clustering dendrogram and expression using the Robust Multi-Array (RMA) algorithm. The mode of matrix of top-ranked 526 target sequences and 22 tumor intensity values was used for background correction and samples. Patients in the PSA clinical status category dis RMA-sketch was used for normalization and probe modeling played intermediate expression levels for genes expressed at used a median polish routine. A variance filter was applied to data pre-processed using the RMA algorithm, by removing increased levels in SYS (n-313) and NED (n=213), respec target sequences with a mean intensity of <10 intensity units tively (Table 4). FIG. 1C depicts a two-way hierarchical clus of a normalized data range. Target sequences typically com tering dendrogram and expression matrix of 148 target prise four individual probes that interrogate the expression of sequences and 22 tumor samples. These target sequences RNA transcripts or portions thereof. Target sequence annota were a subset of the differentially expressed transcripts (Table tions and the sequences (RNAS) that they interrogate were 3) filtered using a t-test to query recurrent (i.e., SYS) and downloaded from the Affymetrix website (www.netaffx. non-recurrent (i.e., PSA and NED) patient samples com). Supervised analysis of differentially expressed RNA (Table 5). transcripts was determined based on the fold change in the 0222. The expression levels of these genes were summa average expression (at least 2 fold change) and the associated rized for each patient into a metagene using a simple linear t-test, with a p-value cut-off of p-0.001 between different combination by taking the expression level and multiplying it prostate cancer patient disease states. Linear regression was by a weighting factor for each target sequence in the also used to screen differentially expressed transcripts that metagene signature and combining these values into a single US 2011/013.6683 A1 Jun. 9, 2011 24 variable. Weighting factors were derived from the coefficients of the linear regression fit analysis (Table 4). FIG. 2 shows a TABLE 1 histogram plot of the metagene expression values for the Clinical characteristics of different clinical status patient groups evaluated. Summarized 526 target sequences in each of the three clinical Note Chi Square tests for homogeneity reveal that the three clinical groups groups. This 526-metagene achieved maximal separation do not show significant differences in terms of patient composition based between clinical groups and low variance within each clinical on known prognostic variables such as pathological TNM stage or Gleason score. Also, the blockage of the samples was not different between group. Metagenes comprised of Smaller Subsets of 21, 18 and clinical status groups; so that sample archive age effect is mitigated 6 target sequences were also generated (FIG. 3, Tables 7 and (i.e., older samples have more degraded, fragmented nucleic acids that 8). The distinctions between clinical groups with respect to could skew or bias results if not evenly distributed in clinical status the metagene scores were preserved, although increased patient groups). within-group variance was observed when using fewer target NED PSA Systemic X. Tests for sequences (FIG. 3). (n = 6) (n = 7) (n = 9) Homegeneity 0223) Next, Patient outcome predictor (POP) scores Pathological T2NO 2 3 3 p < 0.07 were generated from the metagene values for each patient. Stage T3aNO 1 4 1 T3bNO O O 3 For the 18-target sequences metagene, this entailed scaling TXN- 3 O 2 and normalizing the metagene scores within a range of 0 to Gleason Score 7 2 7 4 p < 0.06 100, where a value of between 0-20 points indicates a patient 8 3 O 2 9 1 O 3 with NED, 40-60 points a patient with PSA recurrence and Block Age 10-14 3 4 5 p < 0.9 80-100 points a patient with SYS metastatic disease (FIG. 4). Group (years) 13-15 1 1 2 In contrast, Gleason scores for patients could not be used on 16-20 2 2 2 their own to distinguish the clinical groups (Table 1). 0224. Using the Nearest Shrunken Centroids (NSC) algo rithm with leave-1-out cross-validation, smaller subsets of TABLE 2 RNA transcripts were identified that distinguish recurrent Comparison of differential expression between patient clinical (i.e., SYS) and non-recurrent (i.e., PSA and NED) dis status groups. Two-way comparisons for differential expression ease (Tables 9 and 10). NSC algorithm identified 10- and using the following statistical criteria: a) at least 2-fold mean 41-target sequence metagenes used to derive patient outcome difference in expression between comparison groups; b) expression levels >50 intensity units of a normalized data predictor scores scaled and normalized on a data range of range (approximately the mean expression level across all 0-100 points. FIGS. 5 and 6 depict box plots showing inter transcripts in all samples) and c) significance cut-off of quartile range and distribution of POP scores for each clini ps 0.001 determined using a t-test. cal group. A 148-target sequence metagene (Table 5) was similarly used to derive “POP” scores depicted in FIG. 7. Clinical Differentially T-tests were used to evaluate the statistical significance of Status Comparison Expressed RNAs differences in POP scores between recurrent (i.e., SYS) NED wSPSA 316 PSA vs NED 442 and non-recurrent (i.e., PSA and NED) patient groups NED vs SYS 213 (indicated in the figures) and show that increasing the number PSA vs SYS 194 of target sequences in the metagene combination increases SYS vs NED 269 the significance level of the differences in POP scores. SYS wSPSA 323 0225. The data generated from such methods can be used SYS wSNED & PSA 310 to determine a prognosis for disease outcome, and/or to rec NED & PSA vs SYS 77 *Differential Expression is indicated by at least 2-fold change expression between mean of ommend or designate one or more treatment modalities for clinical status variable and comparison variable; also mean value -50 intensity units of a patients, to produce patient reports, and to prepare expression normalized data range and p < 0.001 calculated using a t-test profiles. US 2011/013.6683 A1 Jun. 9, 2011 25

US 2011/013.6683 A1 Jun. 9, 2011 26

US 2011/013.6683 A1 Jun. 9, 2011 30

US 2011/013.6683 A1 Jun. 9, 2011 34

eouenbesvžNTHSCIO 535 ONI ONI ONI ONI ONI ONI ONI ONI

OZ Tz zz

US 2011/013.6683 A1 Jun. 9, 2011 39

US 2011/013.6683 A1 Jun. 9, 2011 40

US 2011/013.6683 A1 Jun. 9, 2011 41

eouenbeSVNIH epôle6e6

SCIC) ONI ONI ONI ONI ONI ONI

ToquLKSCII

·KgJVoN US 2011/013.6683 A1 Jun. 9, 2011 42

US 2011/013.6683 A1 Jun. 9, 2011 44

US 2011/013.6683 A1 Jun. 9, 2011 45

US 2011/013.6683 A1 Jun. 9, 2011 46

US 2011/013.6683 A1 Jun. 9, 2011 49

US 2011/013.6683 A1 Jun. 9, 2011 51

US 2011/013.6683 A1 Jun. 9, 2011 52

US 2011/013.6683 A1 Jun. 9, 2011 53

US 2011/013.6683 A1 Jun. 9, 2011 56

US 2011/013.6683 A1 Jun. 9, 2011 57

US 2011/013.6683 A1 Jun. 9, 2011 58

US 2011/013.6683 A1 Jun. 9, 2011 59

US 2011/013.6683 A1 Jun. 9, 2011 60

US 2011/013.6683 A1 Jun. 9, 2011 62

ONIZZA6TOOTOOTI US 2011/013.6683 A1 Jun. 9, 2011 63

US 2011/013.6683 A1 Jun. 9, 2011 64

US 2011/013.6683 A1 Jun. 9, 2011 68

US 2011/013.6683 A1 Jun. 9, 2011 70

US 2011/013.6683 A1 Jun. 9, 2011 73

US 2011/013.6683 A1 Jun. 9, 2011 74

US 2011/013.6683 A1 Jun. 9, 2011 76

US 2011/013.6683 A1 Jun. 9, 2011 77

US 2011/013.6683 A1 Jun. 9, 2011 78

US 2011/013.6683 A1 Jun. 9, 2011 79

US 2011/013.6683 A1 Jun. 9, 2011 81

US 2011/013.6683 A1 Jun. 9, 2011 82

US 2011/013.6683 A1 Jun. 9, 2011 83

US 2011/013.6683 A1 Jun. 9, 2011 84

US 2011/013.6683 A1 Jun. 9, 2011 86

US 2011/013.6683 A1 Jun. 9, 2011 87

US 2011/013.6683 A1 Jun. 9, 2011 88

US 2011/013.6683 A1 Jun. 9, 2011 89