Transcriptomic Analysis of Prostate Cancer Progression
Total Page:16
File Type:pdf, Size:1020Kb
Transcriptomic Analysis of Prostate Cancer Progression Bridget Lindsay Bickers The thesis is submitted in partial fulfilment of the requirements for the award of the degree of Doctor of Philosophy of the University of Portsmouth June 2018 i Abstract Background Prostate cancer has a highly variable clinical course, which cannot be fully explained by clinicopathological variables. New biomarkers are needed to better inform treatment decisions at diagnosis to prevent the overtreatment of indolent disease, whilst allowing more aggressive disease to be appropriately treated with interventional and adjunctive treatments. Methods This discovery study used TaqMan arrays to measure the expression of a panel of 91 putative markers of prostate cancer progression in archival biopsy samples from 67 prostate cancer patients treated with radical prostatectomy and/or radiotherapy. The gene expression data were correlated with clinical outcome and binary logistic regression analysis was used to identify predictive models of 5-year biochemical/clinical recurrence. Results Multivariate logistic regression models were focused to a final set of 28 two-gene and 9 three-gene models of recurrence which showed the highest overall performance in terms of calibration (Likelihood Ratio Test statistic (χ2)) and discriminatory accuracy (Area Under the Curve), whilst retaining parsimony (smallest number of predictor variables). The best performing models according to χ2 were AMACR.EFNA5.SDHA (χ2(3) = 22.483, p<0.0001) and AMACR.HPRT1.SDHA (χ2(3) = 22.265, p<0.0001). In addition to demonstrating a highly significant improvement in fit to the data in comparison to the null model, these models had high discriminatory accuracy (Area Under the Curve of 0.812 and 0.829 respectively). By selecting an optimal threshold for categorisation of the patients into the recurrent or non-recurrent group, AMACR.EFNA5.SDHA could correctly categorise 76.1% of patients with a sensitivity and specificity of 81% and 73.9% ii respectively. AMACR.HPRT1.SDHA could correctly categorise 85.1% of patients with a sensitivity and specificity of 81% and 87% respectively. Conclusion The study identified a selection of two- and three-gene models with potential to predict 5-year recurrence in this set of prostate cancer patients. These models may provide novel prognostic information beyond standard prognostic variables at the time of treatment selection but would need to be externally validated in another set of prostate cancer patients to confirm their generalizability as prostate cancer biomarkers. The findings contributed to knowledge regarding disease progression in prostate cancer. Several genes of interest were identified, some of which have previously been described in relation to prostate cancer progression and others that showed a novel association. A large proportion of the identified genes were key genes in metabolic pathways, underlining the importance of metabolic reprogramming in prostate cancer progression. iii Table of Contents DECLARATION XI ABBREVIATIONS XII ACKNOWLEDGEMENTS XVII INTRODUCTION 1 1.1 Prostate cancer epidemiology 1 1.2 The prostate 4 1.3 Prostate pathology 8 Benign prostatic hyperplasia 8 High-grade prostatic intraepithelial neoplasia 8 Prostate cancer 9 1.4 Prostate cancer treatments 10 1.5 Prostate cancer diagnosis and the PSA test 12 Transrectal ultrasound (TRUS)-guided prostate biopsy 13 1.6 Prostate cancer prognosis 14 Staging 14 The Gleason grading system 15 Pretreatment serum PSA 16 Risk stratification 16 Limitations of clinical prognostic tools 17 Biomarker research and molecular technologies 19 The hallmark traits of cancer 21 Prostate cancer prognostic biomarkers 36 Gene expression panels 37 This study 39 TAQMAN ARRAY STUDY 40 2.1 Introduction 40 Background 40 Previous pilot study 41 iv The current study - new research 42 Aims 43 2.2 Materials and methods 44 Introduction/practical considerations 44 Sample size and power calculations 52 Ethical permissions 53 Practical methods 53 Data analysis methods 59 2.3 Results 72 Introduction 72 Samples 72 Method development of practical methods 73 Method development of data analysis 76 Optimisation of the assay 99 Independent samples t-tests 101 Univariate logistic regression analysis 104 Multivariate logistic regression analysis 116 Comparison of single, two-gene and three-gene models 134 Distribution of genes in the top gene models 142 Comparison to the pilot study 148 2.4 Discussion 159 Gleason score analysis 159 Comparison of recurrence analysis between the BB and SL study 160 BB study - recurrence analysis 162 Calibration and discrimination - evaluating the performance of the predictive models 164 Gene prevalence plots 165 Genes included in the highest multivariate logistic regression models according to χ2 167 Discussion regarding genes of interest 179 General discussion and future studies 181 2.5 Conclusions 185 APPENDIX 1 EXAMPLES OF POTENTIAL BIOMARKERS OF PROSTATE CANCER PROGRESSION 188 v APPENDIX 2 GENES ON THE BB AND SL ARRAY 194 APPENDIX 3 BACKGROUND INFORMATION 204 APPENDIX 4 SELECTION OF REFERENCE GENE 209 APPENDIX 5 BINARY LOGISTIC REGRESSION/RECEIVER OPERATING CHARACTERISTIC (ROC) CURVES 213 APPENDIX 6 ARRAYS FOR METHOD DEVELOPMENT AND TESTING THE CONTROL CDNA 221 APPENDIX 7 PATIENT DEMOGRAPHICS 224 APPENDIX 8 ELECTRONIC APPENDIX 228 APPENDIX 9 FORM UPR16 229 APPENDIX 10 DISSEMINATION LIST 230 BIBLIOGRAPHY 231 vi Table of Figures Figure 1.1 The 20 most common cancers, UK, 2015, Cancer Research UK (2) ............................................ 1 Figure 1.2 Average number of new prostate cancer cases per year and age-specific incidence rates, males, UK, 2013-2015 (7) ............................................................................................................ 2 Figure 1.3 Estimated prostate cancer incidence worldwide (estimated age-standardized rates per 100,000)(12)................................................................................................................................ 3 Figure 1.4 The location of the prostate gland (22) ...................................................................................... 4 Figure 1.5 The zones of the prostate (24, 25) .............................................................................................. 5 Figure 1.6 High power microscopic image of normal prostate gland acini (26) .......................................... 6 Figure 1.7 Schematic representation of the normal prostate epithelium (27) ............................................ 6 Figure 1.8 The TNM staging system (85).................................................................................................... 14 Figure 1.9 Gleason grading system according to the International Society of Urological Pathology (87) ............................................................................................................................................ 15 Figure 1.10 The hallmark traits of cancer (adapted from (116)) ............................................................... 21 Figure 1.11 The phases of the cell cycle showing two of the major checkpoints (G1/S, G2/M) and the molecules which regulate them ................................................................................................ 24 Figure 1.12 Steps involved in angiogenesis (168) ...................................................................................... 27 Figure 1.13 Metabolism in the healthy prostate and at different stages of prostate cancer: a) Normal prostate b) Early prostate cancer c) Late prostate cancer (adapted from (188)) ..................... 29 Figure 1.14 Schematic representation of the multiple stages of metastasis (196) ................................... 31 Figure 2.1 Genes included on the SL array and their putative roles in cancer initiation/progression ...... 48 Figure 2.2 Genes included on the BB array and their putative roles in cancer initiation/progression ...... 49 Figure 2.3 Genes on the BB array, categorised according to their roles in cancer initiation/progression 50 Figure 2.4 Amplification plots for the housekeeping gene HMBS (linear view): a) before baseline setting (Rn vs cycle number) b) after correctly setting baseline at 3-19 cycles (ΔRn vs cycle number) .................................................................................................................................... 61 Figure 2.5 Amplification plots for HMBS (log view) showing threshold value at 0.426 ............................. 61 Figure 2.6 Amplification plots for HMBS (log view) showing high precision of replicates ........................ 62 Figure 2.7 Standard curve plot for HMBS .................................................................................................. 62 Figure 2.8 ROC curve plotted in SPSS......................................................................................................... 70 Figure 2.9 Steps to gain new ethical approval to access patient data ....................................................... 80 vii Figure 2.10 Inappropriate setting of the baseline ..................................................................................... 82 Figure 2.11 The threshold is set too high .................................................................................................. 83 Figure 2.12 VBA script in Excel spreadsheet containing list of two-gene combinations ........................... 85 Figure 2.13 Example snippet of SPSS input file for logistic regression analysis ........................................