Interrogation of ERG gene rearrangements in prostate cancer identifies a prognostic 10-gene signature with relevant implication to patients' clinical outcome

Tarek A. Bismar*†‡, Mohammed Alshalalfa§, Lars F. Petersen*, Liang Hong Teng*, Travis Gerke¶, Ashraf Bakkar*, Amal Al-Mami*, Shuhong Liu*, Michael Dolph*, Lorelei A. Mucci¶ and Reda Alhajj§

*Department of Pathology and Laboratory Medicine, University of Calgary and Calgary Laboratory Services, †Department of Oncology, University of Calgary, ‡Southern Alberta Cancer Institute and Tom Baker Cancer Center, §Department of Computer Science, University of Calgary, Calgary, AB, Canada, and ¶Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA

L.F.P. and L.H.T. contributed equally to this work

Objectives
• ERG-gene rearrangement defines a distinct molecular subtype of PCA with potential biological and clinical implications.
• To identify a molecular signature reflective of the downstream effects of ERG-mediated transcriptional regulation with prognostic implication in patients with prostate cancer (PCA).

Material and Methods
• We used a singular value decomposition (SVD) bioinformatics approach to re-analyse data previously generated from 46 prostate tumours, and identified an ERG-like gene signature.
• The signature was validated on several patient cohorts and individual genes were correlated to ERG expression and PCA progression.

Results
• An ERG-like 10-gene signature was identified and validated in PCA cohorts of the physician health study (p115) (n = 110) in addition to three independent public datasets, and was significantly associated with disease progression, biochemical recurrence and PCA-specific mortality.
• Patients with the ERG-like signature were significantly associated with disease recurrence on univariate (hazard ratio [HR] 2.6; 95% confidence interval [CI]: 1.3–5.2; P = 0.004) and multivariate analysis (HR 2.3; 95% CI: 1.1–4.6, P = 0.016) compared with patients without this signature.
• Within the group of patients with Gleason score (GS) 6 and 7 PCA, the signature added prognostic value beyond GS and identified patients at higher risk of cancer deaths more accurately than GS alone or in combination with ERG status.
• Protein expression of the 10 genes was significantly associated with ERG and disease progression regardless of ERG status.

Conclusion
• The characterized ERG-like signature was reflective of aggressive features of ERG-mediated transcription and was prognostically robust.
• The combination of this signature with clinicopathological variables should be validated prospectively to explore its clinical utility in stratifying patients with PCA and in identifying those at higher risk of metastatic and lethal disease.

Keywords
ERG rearrangements, Gleason score, molecular pathways, ERG-like gene signature, prostate cancer, prognosis

Introduction

ETS (E26) fusions have been proposed to constitute a unique molecular subtype of prostate cancer (PCA), as evidenced by both their mutual exclusivity with other proposed PCA subtypes and the fundamentally different chromatin biology conferred by ETS-positivity [1–6]. These findings have allowed the conception of a road map associating specific molecular

© 2013 The Authors BJU International © 2013 BJU International | doi:10.1111/bju.12262 BJU Int 2013 Published by John Wiley & Sons Ltd. www.bjui.org wileyonlinelibrary.com Bismar et al.

subtypes of PCA with progression pathways, and pairing targeted therapies with particular PCA subtypes, similar to algorithms used in breast cancer and other haematological malignancies [1]. To date, however, the prognostic value of ETS gene rearrangements has remained uncertain and has generated conflicting results. The majority of published reports showing prognostic value have been documented on cohorts reflective of the natural history of PCA, while those not able to confirm any prognostic significance have been investigated in surgically treated cohorts, with few exceptions [7–20]. The vast majority of ETS gene rearrangements involve ERG, a prototypical ETS transcription factor that transcriptionally activates hundreds of genes. While not all of these rearrangements have prognostic value, several studies have characterized novel genes associated with ERG gene rearrangements and their respective signalling pathways [21–29]. Furthermore, other studies have chosen a differential expression approach between two groups of ERG-positive and -negative tumours to identify downstream deregulated genes and genes with biological significance in patient prognosis [30,31]. Taken together, these studies show that a thorough interrogation and characterization of the molecular mechanisms that result in, or result from, ERG-gene rearrangements will enable us to identify novel PCA biomarkers as well as characterize new pathways potentially involved in disease progression, allowing a better prediction of patient prognosis.

It is also possible that a subset of these target genes could contribute to the 'ERG-ness' of a tumour, that is, a measure of the strength by which ERG-mediated regulation is active in the downstream gene pathway, independent of whether the tumour is ERG-positive or -negative; therefore, we hypothesized that ERG-ness, or an ERG-associated gene expression signature, would have more prognostic value and be more clinically relevant than ERG status itself. In the present study, we used a differential expression microarray approach, combined with a bioinformatics interrogation of ERG-negative and ERG-positive tumours using the singular value decomposition (SVD) method [32], to identify novel genes associated with disease progression and with prognostic implication for cancer recurrence and disease-specific mortality, regardless of ERG status. The SVD-based approaches discover genes that have high entropy on the whole expression data, unlike other methods that test single genes without considering the overall effect on the data. We further validate the prognostic significance of this signature across several cohorts of PCA patients.

Materials and Methods

Patient Cohorts and Samples

Several cohorts were interrogated in this study. To investigate the downstream ERG transcriptional regulation signature, we used a cohort of 52 patients with castration-resistant disease (54 samples) which was interrogated for gene expression profiling and further bioinformatics clustering based on ERG status. To assess the relationship of the 10-gene signature to patients' clinical outcome, we used three publicly available cohorts, the Swedish GSE8402, Taylor and Glinsky [33–35]. We also used a case–control subgroup of patients from the Harvard University Physician Health study (n = 115) who were treated by radical prostatectomy and for whom information regarding the development of metastasis and lethal disease was available. Protein expression of the 10-gene signature relative to disease progression and ERG status was assessed using a progression cohort consisting of 61 patients, with samples representing disease progression (benign prostate tissue, localized PCA and castration-resistant disease), obtained from radical prostatectomy and TURP samples, respectively. Additional information about the cohorts can be found in the Supporting Information (supplementary Methods section). The study was approved by the institutional review board at the University of Calgary, Calgary, AB, Canada and the Harvard University School of Public Health.

Assessment of ERG-Gene Rearrangement

ERG status was confirmed using a fluorescent in situ hybridization (FISH) break-apart probe assay and immunohistochemistry (IHC) on a tissue microarray of the first cohort (above) treated by different androgen deprivation therapies as previously described [13]. To omit false-positives and false-negatives, only cases with consistent FISH and IHC results were included in the SVD approach.

Expression Profiling

We re-analysed gene expression data, which was previously published and generated using a complementary DNA-mediated annealing selection and ligation assay (DASL; Illumina, San Diego, CA, USA) to regroup tumour samples into ERG-positive and -negative cases [27].

Immunohistochemistry

We used the Ventana autostainer system (NEXES IHC model; Ventana Medical Systems Inc., Tucson, AZ, USA) to assess the protein expression by IHC, using the antibodies and conditions listed in Table S1, and according to the manufacturer's protocol outlined in full in the supplementary Methods section.

Pathological Analysis

The samples used in gene expression profiling were previously validated by at least two pathologists. Protein intensity expression was assessed semi-quantitatively by two pathologists (L.H.T. and T.A.B.) using a four-tiered system (0, negative; 1, weak; 2, moderate; and 3, strong) without prior knowledge of clinical information.


Statistical Analysis

One-way ANOVA tests were performed to determine if the scores for each marker were significantly different between benign tumours, localized PCA and castration-resistant PCA. Data for each of the 10 markers from the four-tiered IHC analyses were used for these tests.

Phi-correlations were used to test each marker's association with ERG status in PCA samples. Marker association with ERG expression was categorized into two groups, based on ERG intensity (negative and weak) vs (moderate and high) expression. Analyses were performed using SPSS v.16 (IBM Corp., Somers, NY, USA). A two-tailed P-value ≤0.05 was considered to indicate statistical significance.

Multivariate Cox regression analysis of 150 patients from Taylor data was initially performed, including two presurgical clinical and pathological variables (prebiopsy serum PSA and biopsy Gleason score [GS]) in addition to the 10-gene signature to predict biochemical recurrence.

Gene Selection and Computational Analysis

We developed a new technique based on SVD to identify genes with high entropy between the two cancer classes (ERG-gene rearrangement vs ERG no rearrangement). SVD is a linear transformation of the expression data from n genes by m samples, represented by a matrix $A_{m\times n}$, to the reduced diagonal L-eigengenes by L-eigenarrays matrix, where $L = \min(n, m)$ and $s_i$, $i = 1, \dots, L$, are the singular values. We calculated the normalized relative significance $p_k$ of the k-th eigengene for $A_{m\times n}$ as follows:

$$p_k = \frac{s_k^2}{\sum_{i=1}^{L} s_i^2}$$

and the Shannon entropy of the data represented by $A_{m\times n}$ is calculated as:

$$E(A_{m\times n}) = -\frac{1}{\log(L)} \sum_{k=1}^{L} p_k \log(p_k)$$

Varshavsky et al. [32] defined the contribution of the i-th gene, denoted $CE_i$, by considering a leave-one-out comparison as:

$$CE_i = E(A_{m\times n}) - E\big(A^{i}_{m\times (n-1)}\big)$$

where $A^{i}_{m\times (n-1)}$ is the matrix $A_{m\times n}$ with the i-th row deleted. The SVD-based approaches discover genes that show change in the expression level across samples. Genes that show a large change across samples compared with genes that show a small change are more likely to have a large SVD value ($CE_i$). Genes with a high SVD value contribute large entropy compared with the expression of the other genes. To assess the entropy of gene i, we find the entropy of the whole gene expression $E(A_{m\times n})$, including gene i, and then we find the entropy of the gene expression without gene i. Genes are sorted based on $CE_i$ value and the top genes are selected as significant. We preprocessed the data by averaging gene expression across the normal and cancer samples. Genes with high entropy across the two average samples are anticipated to have a global differential expression pattern. The advantage of this method compared with the traditional t-test and Significance Analysis of Microarrays is that it considers the effect of each gene on all other genes. Genes that have high entropy on the other genes are ranked high and identified as significant biomarkers.

Results

Computational Analysis and Characterization of Significant Differentially Expressed Genes

We used the SVD-based gene selection method [32] to discover highly differentially expressed genes ranked based on their contribution to the overall entropy of the expression data. Genes that were highly ranked were identified as potential biomarkers discriminating between ERG-rearranged (positive) and ERG-non-rearranged (negative) samples. Only samples with known and consistent ERG status (based on IHC and FISH) were used in this study. A total of 46 samples, 27 ERG-negative (ERG0) and 19 ERG-positive (ERG1), were included in the final analysis.

Based on SVD analysis, a list of 10 candidate genes was identified showing significant expression changes between ERG0 and ERG1 samples (Fig. 1A; Table S2). The 10-gene list correlated to ERG status with a 76% accuracy rate using a linear support vector machine. The relationship of the 10-gene ERG-like signature to ERG status was validated on three independent public cohorts (the Swedish GSE8402, Taylor and Glinsky cohorts) with a 65–83% accuracy rate (data not shown) [34–36].

Association of Protein Levels of Identified Genes with ERG Protein Expression

To validate the association of the genes identified by computational methods with ERG-gene rearrangement, we analysed the protein expression of those genes in comparison with ERG protein expression using a progression tissue microarray of 86 samples (320 cores) belonging to 61 patients. In each subset of samples (i.e. ERG0 or ERG1), the intensity level for each marker was categorized as absent/weak or moderate/strong. Seven of 10 markers (CHD5, Ankyrin, MEIS2, FRP-3, LEF1, PLA2G7 and WNT2) were differentially expressed relative to ERG protein at a significant level, while ING3 and ANXA4 showed similar, although nonsignificant, trends. In these samples, Syntenin expression was not significantly associated with ERG expression; however, this was confirmed using a different cohort with a larger number


Fig. 1 Association of markers with ERG. (A) Heatmap of the 10-gene signature across ERG1 and ERG0 samples in the Swedish cohort. (B) IHC plots of the protein expression levels of each of the 10 markers relative to ERG protein expression, plotted as negative/weak vs moderate/high intensity. *Refers to significant correlation, P < 0.05. Red, overexpressed; green, underexpressed; black, missing data.

[Figure 1 graphic. (A) Heatmap; rows: ANXA4, Syntenin, WNT2, PLA2G7, CHD5, ANK3, FRZB, MEIS2, LEF1, ING3; columns: samples labelled ERG 0 or ERG 1; colour scale from −3 to 3. (B) Bar charts of % of cases (0–100%) in ERG− vs ERG+ samples for CHD5, Syntenin, WNT2, FRZB, ANK3, MEIS2, ING3, PLA2G7, LEF1 and ANXA4; bars distinguish moderate/high from negative/weak expression.]

of samples (data not shown). The mean protein expression of each of the 10 markers in PCA progression and its relation to ERG is shown in Fig. 1B and Table 1. Antibody specificities were confirmed by Western blots on protein isolated from frozen tissues from patients (data not shown).

Association of Protein Expression Levels of Identified Genes with PCA Progression

To test our hypothesis that the ERG signature is reflective of an ERG-mediated transcriptional regulation regardless of ERG status, we analysed the expression levels of the 10 markers in benign, localized and castration-resistant prostate cancer tissue using the progression tissue microarray described above. The mean intensity level for each marker was then plotted relative to disease progression (Fig. 2A). Protein expression of the 10 markers was differentially expressed between several stages of PCA progression, with ANXA4 being significantly downregulated with disease progression (P < 0.001; Table 1).

ERG-Gene Signature in Relation to Prognosis and Overall Survival of Patients with PCA

As PCA is known to be heterogeneous, we hypothesized that ERG status alone might not be very predictive of patient prognosis, consistent with earlier reports, and that integrating the status of other genes reflective of the downstream effects of ERG-mediated transcription in each tumour sample might improve the prediction of overall patient prognosis. To validate our signature in other cohorts, the 10-gene signature was represented as a vector of length 10 (reference vector), where the gene has a value of 1 if it is overexpressed and 0 when it is


Table 1 Mean protein expression of the 10 markers in relation to PCA progression and their significance to ERG.

Marker                Progression   Benign          Localized PCA   CRPC            ERG association
                      P value       Mean ± SD       Mean ± SD       Mean ± SD       P value

CHD5 (cyto)           <0.001        1.81 ± 0.430    1.98 ± 0.311    1.27 ± 0.522    0.182
Ankyrin (cyto)        <0.001        0.34 ± 0.509    1.52 ± 0.517    1.61 ± 0.502    0.203
Ankyrin (mem)         <0.001        1.55 ± 0.724    2.09 ± 0.684    1.94 ± 0.725    0.010
Syntenin (cyto)       <0.001        0.59 ± 0.523    2.04 ± 0.528    2.04 ± 0.713    0.378
MEIS2 (cyto)          <0.001        2.51 ± 0.901    1.23 ± 0.661    1.50 ± 0.816    0.587
FRP-3 (cyto)          <0.001        1.07 ± 0.596    1.68 ± 0.592    2.53 ± 0.539    0.011
LEF1 (nuclear)        <0.001        1.26 ± 0.822    1.90 ± 0.640    1.00 ± 0.833    0.000
PLA2G7 (cyto)         <0.001        1.04 ± 0.630    1.79 ± 0.629    1.63 ± 0.809    0.000
WNT2 (cyto)           <0.001        0.58 ± 0.597    1.73 ± 0.610    1.96 ± 0.690    0.001
ANXA4 (cyto)          <0.001        1.95 ± 0.590    1.60 ± 0.660    0.98 ± 0.887    0.083
ING3 (nuclear/cyto)   <0.001        1.59 ± 0.496    2.19 ± 0.549    2.17 ± 0.509    0.082

CRPC, castration-resistant prostate cancer.
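The ERG association column in Table 1 rests on phi-correlations between dichotomized IHC scores. The following is a minimal sketch of that calculation, not the SPSS procedure the authors ran: the function names are hypothetical, the cut-off (scores 0–1 vs 2–3) follows the negative/weak vs moderate/strong grouping described in the text, and the P value uses the standard identity that n·phi² follows a chi-square distribution with 1 degree of freedom, which is an assumption about how significance could be assessed.

```python
import math


def dichotomize(score):
    """Map a four-tier IHC score (0-3) to 0 (negative/weak) or 1 (moderate/strong)."""
    return 1 if score >= 2 else 0


def phi_coefficient(a, b, c, d):
    """Phi for the 2x2 table [[a, b], [c, d]] (rows: ERG low/high, columns: marker low/high)."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("a row or column total is zero; phi is undefined")
    return (a * d - b * c) / denom


def phi_from_scores(erg_scores, marker_scores):
    """Build the 2x2 table from paired IHC scores and return phi."""
    counts = [[0, 0], [0, 0]]
    for erg, marker in zip(erg_scores, marker_scores):
        counts[dichotomize(erg)][dichotomize(marker)] += 1
    (a, b), (c, d) = counts
    return phi_coefficient(a, b, c, d)


def phi_p_value(phi, n):
    """Two-sided P value, assuming n * phi**2 is chi-square distributed with 1 df."""
    chi_square = n * phi * phi
    return math.erfc(math.sqrt(chi_square / 2.0))
```

The scores passed to `phi_from_scores` would be one ERG score and one marker score per sample; any values used to try the sketch are illustrative and are not data from the study.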

downregulated. To stratify samples based on the signature, the expression of each of the genes across all samples was categorized into high and low expression, and then a vector of length 10 for each sample was constructed (1 for high expression and 0 for low expression). The final step was to find the correlation between the reference vector and the samples' vectors; samples with high correlation (r > 0.5) were considered as ERG1-like and the remainder were considered ERG0-like.

We first investigated the 10-gene signature using the Physician Health Study (PHS). Samples were grouped into ERG1-like and ERG0-like samples and we assessed their association with cancer lethality. The 10-gene signature was associated with lethal disease with an odds ratio (OR) of 4.33 (95% CI: 1.82–10.3; P < 0.001) compared with ERG status alone: OR = 1.49 (95% CI: 0.68–3.26, P = 0.32). When investigating net reclassification improvement, the 10-gene signature showed significant increases in sensitivity (P = 0.02) and specificity (P = 0.008) for predicting lethal disease compared with GS alone.

We then applied the 10-gene signature to various publicly available patient datasets (Swedish, Taylor and Glinsky cohorts) [34–36] to investigate the strength of our gene signature in predicting patients' prognosis compared with ERG expression alone. In the Swedish cohort, ERG status was determined by FISH, whereas in both the Taylor and the Glinsky cohorts it was predicted based on ERG-gene expression (samples with ERG expression above the third quartile range were considered ERG-positive). Based on ERG status, we grouped samples into ERG1 (fusion-positive) and ERG0 (fusion-negative). We also used our 10-gene model to group patients into ERG1-like and ERG0-like signatures to investigate the significance of the signature in stratifying patients into different prognostic groups.

In the Swedish cohort, patients with ERG1 status (n = 46) were at higher risk of lethal disease than those with ERG0 (n = 226; Fig. 3A; hazard ratio [HR]: 1.25, 95% CI: 0.99–1.58, P = 0.005). Application of our 10-gene signature separated patients into ERG1-like (n = 22; seven were ERG1) and ERG0-like patients (n = 259), which showed a stronger association with patient prognosis than did ERG status alone (Fig. 3B; HR: 2.38, 95% CI: 1.45–3.8, P < 0.001). Univariate analysis using a Cox proportional hazard model confirmed that the 10-gene model was more significantly associated with overall survival than was ERG status alone (P = 0.003 vs P = 0.053, respectively).

In the Glinsky cohort, patients with high ERG (n = 32) were not well separated from patients with low ERG (n = 47; Fig. 3C; HR: 0.76, 95% CI: 0.38–1.47, P = 0.15), so we further grouped samples into high risk (n = 21) and low risk (n = 58) as described by Varambally et al. [37] using ERG status, but the poor separation remained (Fig. 3D; HR: 1.4, 95% CI: 0.67–2.67, P = 0.47). We then used our 10-gene signature, which was able to separate patients into high-risk (n = 18, 11 were ERG1) and low-risk (n = 61) groups based on post-surgical biochemical recurrence (Fig. 3E; HR: 1.8, 95% CI: 1.2–2.8, P = 0.15). Using univariate analysis, the 10-gene model was more associated with prognosis, albeit at borderline significance (P = 0.1), compared with ERG status alone (P = 0.4). Similarly, in the Taylor cohort, ERG expression was not effective in separating patients into clinically distinct groups (Fig. 3F; HR: 0.7, 95% CI: 0.5–1.5, P = 0.33), but the 10-gene model successfully separated patients into high-risk ERG1-like (n = 23; nine were ERG1) and low-risk ERG0-like (n = 117) groups (Fig. 3G; HR: 3.2, 95% CI: 2.2–5.8, P = 0.0026).

The 10-gene signature was also able to classify samples with very aggressive forms of PCA (high GS and highly metastatic; i.e. cluster 5 in Taylor data) from the other clusters, slightly more accurately than ERG status alone (80%; HR: 1.07, 95% CI: 0.4–1.6 vs 77%; HR: 0.87, 95% CI: 0.2–1.4).

We also used Cox regression multivariate analysis to assess the significance of the ERG-like signature in stratifying patients before implementing treatments, so we included presurgical


Fig. 2 Association of markers with prostate cancer progression. (A) Selected images of tissue microarray elements representing IHC analysis of protein levels for the 10 marker genes in benign prostate gland (BEN), localized prostate cancer (PCA) and castration-resistant PCA (CRPCA). Relative intensity levels of proteins as assessed by blinded pathology analysis of tissue microarrays (n = 320 cores) are provided to the right.

[Figure 2 graphic. Rows: CHD5, Syntenin, WNT2, FRP-3 (FRZB), ANK3 (Ankyrin G), MEIS2, ING3, PLA2G7, LEF1, ANXA4 (AnnexinA4); image columns: BEN, PCA, CRPCA; each row has a plot with y-axis 'Relative Intensity'.]


Fig. 3 Application of ERG-like 10-gene signature better predicts patients' outcome than ERG status alone. (A) Kaplan–Meier survival plot showing that ERG status can separate patients into two groups in the Swedish PCA cohort. (B) Kaplan–Meier plot showing that the 10-gene model has significant association with the survival outcome in the Swedish PCA cohort, ERG1-like patients are the cluster of patients enriched with ERG1 samples. (C) Kaplan–Meier plots of patients grouped based on ERG status. (D) Kaplan–Meier plots of patients grouped into high and low risk based on Varambally et al. [37], and (E) samples grouped based on the 10-gene model in the Glinsky PCA cohort. (F) Kaplan–Meier plots of patients grouped based on ERG expression. (G) Kaplan–Meier plots of patients, grouped based on 10-gene model in the Taylor PCA cohort. (H) Kaplan–Meier plots of patients with GS 6 and 7, grouped based on the 10-gene signature in the Swedish PCA cohort.

[Figure 3 graphic: Kaplan–Meier curves annotated with group sizes and log-rank and Cox P values. Panels A and B (Swedish Prostate Cancer Cohort): cumulative survival probability vs time to death, months. Panels C–E (Glinsky Prostate Cancer Cohort): relapse-free survival probability vs time to recurrence, months. Panels F and G (Taylor Prostate Cancer Cohort): relapse-free survival probability vs time to recurrence, months. Panel H (Gleason score 6 and 7 patients, Swedish PCA cohort): cumulative survival probability vs time to death, months. Curves compare ERG-positive vs ERG-negative patients (A, C, F), high-risk vs low-risk patients (D) and ERG-positive-like vs ERG-negative-like patients (B, E, G, H).]
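The ERG1-like and ERG0-like groups compared in Fig. 3 come from the reference-vector rule described in the Results: binarize each gene, build a 0/1 vector per sample, and call a sample ERG1-like when its correlation with the reference vector exceeds 0.5. The sketch below illustrates that rule under stated assumptions. The text does not say how 'high' and 'low' expression were separated, so the per-gene median cut-off is an assumption, as is the use of Pearson correlation; the function names are hypothetical.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant vector has no defined correlation; treated as no match (assumption).
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def binarize_genes(expression):
    """expression: list of samples, each a list of gene values in signature order.

    Returns one 0/1 vector per sample: 1 where the gene is above its median
    across all samples (assumed cut-off), otherwise 0.
    """
    n_genes = len(expression[0])
    cutoffs = [median([sample[g] for sample in expression]) for g in range(n_genes)]
    return [[1 if sample[g] > cutoffs[g] else 0 for g in range(n_genes)]
            for sample in expression]


def classify_samples(expression, reference, r_cutoff=0.5):
    """Label each sample ERG1-like if r > r_cutoff against the reference vector."""
    labels = []
    for vector in binarize_genes(expression):
        r = pearson(vector, reference)
        labels.append("ERG1-like" if r > r_cutoff else "ERG0-like")
    return labels
```

In the study the reference vector has length 10, with 1 for genes overexpressed in ERG1 samples and 0 for downregulated genes; the sketch works for any length so that it can be tried on small made-up inputs.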


Table 2 Univariate and multivariate analysis for cancer recurrence in PCA: Taylor data.

Variable        N      Univariate analysis          Multivariate analysis
                       HR (95% CI)       P          HR (95% CI)       P

Signature                                0.004                        0.016
  ERG0-like     114    1 (ref)                      1 (ref)
  ERG1-like     23     2.6 (1.3–5.2)                2.3 (1.1–4.6)
Biopsy PSA                               0.13                         0.28
  1–4           27     1 (ref)                      1 (ref)
  4–10          85     0.9 (0.3–2.3)                0.9 (0.3–2.3)
  10–40         23     1.45 (0.8–2.4)               1.26 (0.7–2.1)
Biopsy GS                                5.0e-4                       0.011
  ≤7            125    1 (ref)                      1 (ref)
  >7            12     4.0 (1.8–9)                  2.9 (1.2–6.9)
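For readers less familiar with the HR columns in Table 2, the textbook form of the Cox proportional hazards model is summarised below. This is the standard formulation, not a description of the exact software settings used in the study; the symbols $h_0$, $\beta_j$, $x_j$ and $\widehat{\mathrm{SE}}$ are introduced here for illustration, and the Wald-type interval is an assumption about how the confidence limits could be formed.

```latex
h(t \mid x) = h_0(t)\,\exp\!\Big(\sum_j \beta_j x_j\Big), \qquad
\mathrm{HR}_j = \exp(\beta_j), \qquad
95\%\ \mathrm{CI} = \exp\!\big(\hat{\beta}_j \pm 1.96\,\widehat{\mathrm{SE}}(\hat{\beta}_j)\big)
```

Read this way, the multivariate HR of 2.3 for the ERG1-like signature is the estimated multiplicative change in the hazard of recurrence relative to ERG0-like patients, holding biopsy PSA and biopsy GS fixed.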

Table 3 Hazard ratio for cancer-specific mortality in the Swedish Cohort using GS alone, GS and ERG-gene rearrangements and GS with ERG-like signature.

Group                         Number of samples                                                  HR (95% CI)      P value/Cox value

GS 7 patients
  GS 7 alone                  GS 7(3+4), n = 79; GS 7(4+3), n = 38                               2.23 (1.5–3.5)   2 × 10^-4 / 2.4 × 10^-4
  GS 7 + ERG status           GS 7(3+4) and ERG0, n = 61; GS 7(4+3) or ERG1, n = 44              1.8 (1.4–2.5)    9 × 10^-4 / 5.6 × 10^-5
  GS 7 + ERG-like signature   GS 7(3+4) and ERG0-like, n = 72; GS 7(4+3) or ERG1-like, n = 45    2.52 (1.6–3.4)   3 × 10^-5 / 6.2 × 10^-6
GS 6,7 patients
  GS 6,7                      GS 6 + GS 7(3+4), n = 162; GS 7(4+3), n = 38                       3 (2–4.5)        <10^-7 / 5.3 × 10^-8
  GS 6+7 + ERG signature      GS 6 + GS 7(3+4) and ERG0, n = 136; GS 7(4+3) or ERG1, n = 46      1.5 (1.2–2)      <10^-5 / 5 × 10^-5
  GS 6+7 + ERG-like signature GS 6 + GS 7(3+4) and ERG0-like, n = 153; GS 7(4+3) or ERG1-like, n = 45   3.2 (2.1–4.1)   <10^-10 / 7.5 × 10^-11

variables: needle biopsy GS and pre-biopsy serum PSA from the Taylor cohort (n = 150). Patients with ERG1-like signature had an overall HR of 2.6 (95% CI: 1.3–5.2; P = 0.004) and HR 2.3 (95% CI: 1.1–4.6; P = 0.016) for predicting PCA relapse on univariate and multivariate analysis, respectively (Table 2).

Next we assessed the ERG-like signature in terms of its ability to stratify subgroups of patients with GS 6 and 7 PCA, given that these Gleason patterns represent the majority of patients encountered in clinical practice, where current clinical and pathological variables are not able to differentiate aggressive from indolent disease. Focusing on patients with GS 6 and 7 from the Swedish cohort (n = 200), our 10-gene signature was still able to separate patients into high-risk (n = 13) and low-risk (n = 187, 13 of which were ERG positive) groups (Fig. 3H; HR: 3.5, 95% CI: 1.8–6.6, P < 0.001), compared with ERG status alone, which separated patients into high-risk (n = 26) and low-risk (n = 174) groups and showed a lower HR for lethal disease (HR: 2.4, 95% CI: 1.5–3.9, P < 0.001). Using a mixed clinical-molecular signature (i.e. GS and ERG-like signature) we were able to identify higher-risk patients more accurately than by using GS or ERG status alone (Fig. 3H; Table 3).

Figure 4 is a flow chart of the multiple analyses and cohorts evaluated in the present study, from initial bioinformatics analysis to development of the 10-gene signature and validation of the signature using independent and well-annotated cohorts.

Discussion

Predicting aggressive disease is one of the most important and necessary steps in cancer management. This is especially true in PCA, where overdiagnosis resulting from PSA screening has reached alarming levels. Currently, a reliable distinction between indolent and aggressive PCA is not achievable based on pathological and clinical variables alone. ERG-gene rearrangements are one of the most common gene alterations affecting PCA [1,9]. Expression profile studies suggest that ERG represents a subset of prostate tumours that share specific progression pathways with potential prognostic and therapeutic implications [9].

As PCA is among the most heterogeneous tumours, it is expected that predicting tumour progression would be more achievable and reliable based on a multi-gene model rather than individual genes. This is similar to what is being clinically implemented for patients with breast and colon cancer, with the use of an 'oncotype DX' assay test. In the current study, we identified and characterized a 10-gene signature that shows


Fig. 4 Flow chart of the multiple analysis and cohorts evaluated in this study from initial bioinformatics analysis to the development of the 10-gene signature and validation of the signature using independent and well-annotated cohorts. RFS, recurrence-free survival; OS, overall survival; KM, Kaplan–Meier; GS, Gleason score.

[Figure 4 flow chart.
Initial cohort (n = 46) → group samples based on ERG status (ERG-fusion-negative, ERG-fusion-positive) → gene selection by SVD → 10-gene signature → computational validation and biological validation.
Computational validation: assess association between the 10-gene model and lethal disease using PHS data; validation of the 10-gene model on independent datasets (Taylor data, Glinsky data, Swedish data), using KM and Cox regression to associate the model with RFS compared with ERG expression, or with OS compared with ERG status and GS; multivariate regression analysis; assess the power of the model to stratify high-risk samples in the GS 6,7 subset compared with ERG.
Biological validation: validating the protein expression of the 10-gene model on 86 samples (320 cores) in relation to disease progression and ERG expression.]
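The 'gene selection by SVD' step in the flow chart ranks genes by their leave-one-out contribution to the SVD entropy, $CE_i$, as defined in the Methods. The following is a minimal NumPy sketch of that ranking, not the authors' implementation. It assumes rows are genes and columns are samples (or class averages), recomputes $L$ from the shape of whichever matrix is being scored, and uses hypothetical function names.

```python
import numpy as np


def svd_entropy(matrix):
    """Normalized Shannon entropy of the squared singular values of `matrix`."""
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    total = np.sum(singular_values ** 2)
    n_components = min(matrix.shape)
    if total == 0 or n_components < 2:
        return 0.0  # entropy is not informative for an all-zero or single-component matrix
    p = singular_values ** 2 / total
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-np.sum(p * np.log(p)) / np.log(n_components))


def contribution_scores(matrix):
    """CE_i = E(A) - E(A with row i removed), one score per gene (row)."""
    full_entropy = svd_entropy(matrix)
    return np.array([
        full_entropy - svd_entropy(np.delete(matrix, i, axis=0))
        for i in range(matrix.shape[0])
    ])


def top_genes(matrix, gene_names, k=10):
    """Return the k gene names with the largest CE_i, highest first."""
    order = np.argsort(contribution_scores(matrix))[::-1]
    return [gene_names[i] for i in order[:k]]
```

Each call to `contribution_scores` recomputes one SVD per gene, which is simple but slow for large arrays; whether the original analysis used this exhaustive form or a faster variant is not stated in the text.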

potential for further development as a signature for aggressive and indolent PCA. This panel was identified by a combination of computational analysis and biological assays and validated on several well-annotated and large cohorts. The signature identified was confirmed to be more accurate than ERG gene expression alone, which at times was found to be unreliable (based on the Glinsky and Taylor surgical charts). The fact that the signature was reflective of ERG status in only 76% of patients is worth noting, as it signifies the ability to reflect transcriptional regulation within a tumour sample regardless of its ERG expression level. Hence, it is superior to ERG as a potential biomarker for aggressive disease (based on validation cohorts above). Our signature was also robust in multivariate analysis, which included biopsy GS and prebiopsy serum PSA, two of the most powerful and currently most used biomarkers in clinical practice. Moreover, the 10-gene signature was able to stratify patients in the intermediate grade category (GS 6 and 7) and identify two distinct classes within this group of patients. When implementing a mixed clinical and molecular model based on a combination of GS and the 10-gene signature, we were more accurate in identifying aggressive tumours that could have been misclassified if judged by GS alone. This indicates the strength of our signature compared with other models, where combining the signature with GS did not improve the predictive accuracy [30]. Our 10-gene signature, coupled with other preclinical variables, such as GS and serum PSA, could be the initial step towards improving our ability to stratify patients into different prognostic groups of aggressive and indolent disease before implementing definite therapies. It could enable us to offer selected patients

expectant therapy, based on the signature of their tumours at the time of prostate biopsy, thus avoiding over-treatments and unnecessary harmful side effects for a subgroup of patients with indolent disease.

These data collectively confirm that the expression signature of our 10-gene model is more reflective of the downstream effects of ERG-mediated transcription in PCA than the ERG gene alone. It further documents that some ERG-negative tumours are molecularly, biologically and prognostically more related to other ERG-positive tumours, based on the ERG-mediated transcriptional regulation contributing to PCA progression (the same is also true for ERG-positive tumours being classified as ERG0-like signature). Our model is further evidence of the heterogeneous and multifocal nature of PCA. The 10-gene signature model is more robust than other published ERG-related signatures, as it is able to identify more accurately patients in the higher-risk group who could have been misclassified as having a ‘favourable prognosis’ based on GS alone.

Finally, the significance of this multi-gene model is currently being validated and further refined using quantitative RT-PCR-based methods, which would allow us to better quantify the expression levels of the 10 proposed genes within a given patient’s sample using appropriate housekeeping genes. This would be similar to developing a ‘recurrence score’ per patient, based on the relative risk for disease recurrence or lethal outcome. By incorporating such tests in PCA, we hope to be able to predict more accurately whether tumours are of an indolent or aggressive nature, especially those within the GS 6 and 7 category. Those patients represent the commonest group seeking medical attention and the most difficult to predict based on current clinical and pathological variables alone.

In summary, we identified and validated a multi-gene model, reflective of the transcriptional regulation of downstream ERG in a given tumour sample, which is more robust than ERG expression or GS alone in identifying patients at higher risk of disease recurrence and lethal outcome. This signature, coupled with clinicopathological variables in prostate biopsy, could enable us to separate aggressive from indolent disease and to identify patients at highest risk for cancer progression and lethal disease. Finally, functional studies investigating individual genes within this signature could shed light on novel potential pathways associated with disease progression and with therapeutic potential.

Acknowledgements
This work was supported in part by the Prostate Cancer Foundation Young Investigator Award (to T.A.B.). This work was also supported by Prostate Cancer Canada and is proudly funded by the Movember Foundation-Grant #B2013-01. The authors thank Felix Feng for scientific input and discussion.

Conflict of Interest
T. A. B. is a co-inventor on a patent filed by The University of Calgary covering the 10-gene signature prognostic and therapeutic implementation in prostate and breast cancer and leukaemia. No other conflict of interest exists.

References
1 Tomlins SA, Rhodes DR, Perner S et al. Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science 2005; 310: 644–8
2 Tomlins SA, Mehra R, Rhodes DR et al. TMPRSS2:ETV4 gene fusions define a third molecular subtype of prostate cancer. Cancer Res 2006; 66: 3396–400
3 Helgeson BE, Tomlins SA, Shah N et al. Characterization of TMPRSS2:ETV5 and SLC45A3:ETV5 gene fusions in prostate cancer. Cancer Res 2008; 68: 73–80
4 Esgueva R, Perner S, LaFargue CJ et al. Prevalence of TMPRSS2-ERG and SLC45A3-ERG gene fusions in a large prostatectomy cohort. Mod Pathol 2010; 23: 539–46
5 Rickman DS, Pflueger D, Moss B et al. SLC45A3-ELK4 is a novel and frequent erythroblast transformation-specific fusion transcript in prostate cancer. Cancer Res 2009; 69: 2734–8
6 Palanisamy N, Ateeq B, Kalyana-Sundaram S et al. Rearrangements of the RAF kinase pathway in prostate cancer, gastric cancer and melanoma. Nat Med 2010; 16: 793–8
7 Hoogland AM, Jenster G, van Weerden WM et al. ERG immunohistochemistry is not predictive for PSA recurrence, local recurrence or overall survival after radical prostatectomy for prostate cancer. Mod Pathol 2012; 25: 471–9
8 Rubin MA, Maher CA, Chinnaiyan AM. Common gene rearrangements in prostate cancer. J Clin Oncol 2011; 29: 3659–68
9 Tomlins SA, Bjartell A, Chinnaiyan AM et al. ETS gene fusions in prostate cancer: from discovery to daily clinical practice. Eur Urol 2009; 56: 275–86
10 Minner S, Enodien M, Sirma H et al. ERG status is unrelated to PSA recurrence in radically operated prostate cancer in the absence of anti-hormonal therapy. Clin Cancer Res 2011; 17: 5878–88
11 Toubaji A, Albadine R, Meeker AK et al. Increased gene copy number of ERG on chromosome 21 but not TMPRSS2-ERG fusion predicts outcome in prostatic adenocarcinomas. Mod Pathol 2011; 24: 1511–20
12 Attard G, Clark J, Ambroisine L et al. Duplication of the fusion of TMPRSS2 to ERG sequences identifies fatal human prostate cancer. Oncogene 2008; 27: 253–63
13 Darnel AD, Lafargue CJ, Vollmer RT, Corcos J, Bismar TA. TMPRSS2-ERG fusion is frequently observed in Gleason pattern 3 prostate cancer in a Canadian cohort. Cancer Biol Ther 2009; 8: 125–30
14 FitzGerald LM, Agalliu I, Johnson K et al. Association of TMPRSS2-ERG gene fusion with clinical characteristics and outcomes: results from a population-based study of prostate cancer. BMC Cancer 2008; 8: 230
15 Gopalan A, Leversha MA, Satagopan JM et al. TMPRSS2-ERG gene fusion is not associated with outcome in patients treated by prostatectomy. Cancer Res 2009; 69: 1400–6
16 Hermans KG, Boormans JL, Gasi D et al. Overexpression of prostate-specific TMPRSS2(exon 0)-ERG fusion transcripts corresponds with favorable prognosis of prostate cancer. Clin Cancer Res 2009; 15: 6398–403
17 Nam RK, Sugar L, Wang Z et al. Expression of TMPRSS2:ERG gene fusion in prostate cancer cells is an important prognostic factor for cancer progression. Cancer Biol Ther 2007; 6: 40–5
18 Nam RK, Sugar L, Yang W et al. Expression of the TMPRSS2:ERG fusion gene predicts cancer recurrence after surgery for localised prostate cancer. Br J Cancer 2007; 97: 1690–5


19 Rubio-Briones J, Fernandez-Serra A, Calatrava A et al. Clinical implications of TMPRSS2-ERG gene fusion expression in patients with prostate cancer treated with radical prostatectomy. J Urol 2010; 183: 2054–61
20 Saramaki OR, Harjula AE, Martikainen PM, Vessella RL, Tammela TL, Visakorpi T. TMPRSS2:ERG fusion identifies a subgroup of prostate cancers with a favorable prognosis. Clin Cancer Res 2008; 14: 3395–400
21 Cai J, Kandagatla P, Singareddy R et al. Androgens induce functional CXCR4 through ERG factor expression in TMPRSS2-ERG fusion-positive prostate cancer cells. Transl Oncol 2010; 3: 195–203
22 Gupta S, Iljin K, Sara H et al. FZD4 as a mediator of ERG oncogene-induced WNT signaling and epithelial-to-mesenchymal transition in human prostate cancer cells. Cancer Res 2010; 70: 6735–45
23 Haffner MC, Aryee MJ, Toubaji A et al. Androgen-induced TOP2B-mediated double-strand breaks and prostate cancer gene rearrangements. Nat Genet 2010; 42: 668–75
24 Hawksworth D, Ravindranath L, Chen Y et al. Overexpression of C-MYC oncogene in prostate cancer predicts biochemical recurrence. Prostate Cancer Prostatic Dis 2010; 13: 311–5
25 Kunderfranco P, Mello-Grand M, Cangemi R et al. ETS transcription factors control transcription of EZH2 and epigenetic silencing of the tumor suppressor gene Nkx3.1 in prostate cancer. PLoS ONE 2010; 5: e10547
26 Leshem O, Madar S, Kogan-Sakin I et al. TMPRSS2/ERG promotes epithelial to mesenchymal transition through the ZEB1/ZEB2 axis in a prostate cancer model. PLoS ONE 2011; 6: e21650
27 Rickman DS, Chen YB, Banerjee S et al. ERG cooperates with androgen receptor in regulating trefoil factor 3 in prostate cancer disease progression. Neoplasia 2010; 12: 1031–40
28 Vainio P, Gupta S, Ketola K et al. Arachidonic acid pathway members PLA2G7, HPGD, EPHX2, and CYP4F8 identified as putative novel therapeutic targets in prostate cancer. Am J Pathol 2011; 178: 525–36
29 Wang J, Cai Y, Shao LJ et al. Activation of NF-κB by TMPRSS2/ERG fusion isoforms through Toll-like receptor-4. Cancer Res 2011; 71: 1325–33
30 Barwick BG, Abramovitz M, Kodani M et al. Prostate cancer genes associated with TMPRSS2-ERG gene fusion and prognostic of biochemical recurrence in multiple cohorts. Br J Cancer 2010; 102: 570–6
31 Ribeiro FR, Paulo P, Costa VL et al. Cysteine-rich secretory protein-3 (CRISP3) is strongly up-regulated in prostate carcinomas with the TMPRSS2-ERG fusion gene. PLoS ONE 2011; 6: e22317
32 Varshavsky R, Gottlieb A, Linial M, Horn D. Novel unsupervised feature filtering of biological data. Bioinformatics 2006; 22: e507–13
33 Sboner A, Demichelis F, Calza S et al. Molecular sampling of prostate cancer: a dilemma for predicting disease progression. BMC Med Genomics 2010; 3: 8
34 Glinsky GV, Glinskii AB, Stephenson AJ, Hoffman RM, Gerald WL. Gene expression profiling predicts clinical outcome of prostate cancer. J Clin Invest 2004; 113: 913–23
35 Taylor BS, Schultz N, Hieronymus H et al. Integrative genomic profiling of human prostate cancer. Cancer Cell 2010; 18: 11–22
36 Sboner A, Demichelis F, Calza S et al. Molecular sampling of prostate cancer: a dilemma for predicting disease progression. BMC Med Genomics 2010; 3: 8
37 Varambally S, Yu J, Laxman B et al. Integrative genomic and proteomic analysis of prostate cancer reveals signatures of metastatic progression. Cancer Cell 2005; 8: 393–406

Correspondence: Tarek A. Bismar, University of Calgary, Faculty of Medicine, Departments of Pathology and Laboratory Medicine and Oncology, Rockyview General Hospital, 7007–14th Street SW, Calgary, AB, Canada T2V 1P9.
e-mail: [email protected]

Abbreviations: PCA, prostate cancer; SVD, singular value decomposition; HR, hazard ratio; GS, Gleason score; FISH, fluorescent in situ hybridization; IHC, immunohistochemistry.

Supporting Information
Additional Supporting Information may be found in the online version of this article at the publisher’s web-site:
Supplementary Materials and Methods
Table S1 Antibody specification used for IHC.
Table S2 Fold-change of genes relative to ERG.

© 2013 The Authors BJU International © 2013 BJU International