Proteomics of Prostate Proximal Fluids to Guide Biomarker Discovery

by

Katharina Fritsch

A thesis submitted in conformity with the requirements for the degree of Master of Science

Medical Biophysics University of Toronto

© Copyright by Katharina Fritsch 2019

Proteomics Panorama of Prostate Proximal Fluids

Katharina Fritsch

Master of Science

Medical Biophysics University of Toronto

2019

Abstract

Prostate cancer is the most common non-skin cancer in men. Current diagnostic factors are inaccurate in predicting outcome, resulting in over- and undertreatment of many men. Improved prognostic factors that enable non-invasive diagnosis and follow-up are strongly needed. I hypothesize that comparative proteomic profiling of a direct expressed prostatic secretion (EPS) cohort will identify prognostic biomarkers. I developed and applied a shotgun proteomics assay to a cohort of 148 clinically stratified direct EPS samples. These analyses identified 1271 peptides that differ significantly between risk categories. Results were independently verified in a direct EPS cohort from Virginia with 115 samples (59 overlap). Of the 1271 peptides, 228 showed the same trend over risk groups in both datasets. Putative biomarkers will be validated using targeted proteomics assays in an independent cohort of post-DRE urines. In the future, these assays could assist in accurately stratifying patients into prostate cancer risk groups.


Acknowledgments

This project and my interest in research would not have been possible without the influence of my past and present supervisors, colleagues and professors.

Firstly, I would like to thank my supervisor Thomas Kislinger for the opportunity to work in his proteomics laboratory at Princess Margaret Cancer Centre, Toronto, and for his guidance and support in conducting my Master's project. I thank my committee members Stanley Liu and Paul Boutros for their important input and guidance throughout this project and for their expertise in research, especially in prostate cancer and data analysis. Additionally, I would like to thank John O. Semmes and Julius O. Nyalwidhe from the Eastern Virginia Medical School in Norfolk, Virginia, US for sending us the clinically stratified patient samples.

Secondly, I would like to thank my coworkers in the Kislinger lab and my fellow students from the Medical Biophysics department at the University of Toronto for their help and support.

Thirdly, I am grateful to my family and friends in Munich, who supported me from far away and stayed connected with me over the last two years while I was living on a different continent. I am so thankful for all the amazing people I met in Toronto and get to do life with; I would not want it any other way.

Lastly, I thank God for all that He has given and entrusted to me (including my brain). I thank Him for always helping me to grow and flourish through the highs and the lows. For showing me that deep truth that exceeds everything, for knowing me better than any human being (including myself) and loving me the way I am and constantly healing my soul. For being the light, truth, and peace when my thoughts wage war. And for filling me with joy in every season no matter what it may bring. ‘But blessed is the one who trusts in the Lord, whose confidence is in Him. They will be like a tree planted by the water that sends out its roots by the stream. It does not fear when heat comes; its leaves are always green. It has no worries in a year of drought and never fails to bear fruit.’ Jeremiah 17:7-8 NIV


Table of Contents

1. Introduction
1.1 The prostate gland
1.2 Prostate cancer
1.2.1 Prostate cancer statistics and risk factors
1.2.2 Histopathology of prostate cancer
1.2.3 Current diagnosis and risk stratification of prostate cancer
1.3 Prostate Cancer Biomarkers
1.3.1 Introduction to cancer biomarkers
1.3.2 Biomarkers in tissue proximal fluids and biological fluids
1.4 Shotgun proteomics for biomarker discovery
1.5 Hypothesis and Aims

2. Materials and Methods
2.1 Materials
2.2 Methods
2.2.1 Sample Cohort
2.2.2 Bicinchoninic Acid Assay
2.2.3 Trypsin Digestion using MStern Blotting
2.2.4 Solid Phase Extraction
2.2.5 Liquid Chromatography and Mass Spectrometry
2.2.6 Analysis of Mass Spectrometry Data
2.2.7 Ontology Pathway Analysis
2.2.8 Differential Expression Analysis

3. Results
3.1 Method Optimization
3.1.1 MStern Digestion
3.1.2 High throughput SPE in 96 well format
3.1.3 Internal standards SUC2 and iRT
3.2 Risk group placement
3.3 Data analysis
3.3.1 Data quality check
3.3.2 Comparison to previously published datasets
3.3.3 Data overview (protein numbers, GO analysis)
3.3.4 Differentially abundant peptides

4. Comparison to Independent Direct EPS Data
4.1 Data Quality for Virginia Cohort
4.2 Qualitative Comparison of Toronto and Virginia Data
4.3 Filtering for Peptides

5. Discussion

6. Outlook

7. Abbreviation and Symbols

8. References

9. Supplemental Figures


1. Introduction

1.1 The prostate gland

The prostate is a small exocrine gland in the male reproductive system. It contributes secreted proteins to the seminal fluid to keep sperm alive and sustain their motility while protecting the genetic code they carry1. The gland is located at the base of the bladder, in front of the rectum and surrounding the urethra (Figure 1), and is composed of 70% glandular and 30% fibromuscular or stromal tissue2.

Figure 1: Schematic of the male urogenital tract.

McNeal established the commonly accepted concept of distinct zones of the prostate gland that differ in their probability of developing carcinoma3 (Figure 2). The peripheral zone constitutes up to 70% of the gland and comprises the prostatic glandular tissue at the apex. It shows the highest probability to give rise to carcinoma (70-80%), post-inflammatory atrophy, and chronic prostatitis4. The central zone includes the area from the confluence of the ejaculatory ducts to the prostatic urethra and covers roughly 25% of the glandular prostate. Prostatic ducts located in the central zone increase greatly in size as part of the normal aging process4. Central zone carcinoma is very rare (5%) but gives rise to a highly aggressive form of prostate tumor5. Age-related benign prostatic hyperplasia (BPH) and, less commonly, adenocarcinoma (20-25% of prostate cancer cases6) usually develop in the transition zone, which consists of two equal portions of glandular tissue lateral to the urethra in the midgland3. Lee et al. showed that even though transition zone carcinomas presented with larger cancer volume, they were associated with favorable pathologic features and better recurrence-free survival6.

Figure 2: Schematic of distinct zones of the prostate gland.

Histology of the prostate

The prostate gland is comprised of ductal acini embedded in stroma7. The prostatic epithelium of those ductal acini consists of three different cell types organized in layers: secretory luminal cells, basal cells, and neuroendocrine cells (Figure 3).

Figure 3: Scheme of prostatic acini.


Secretory luminal cells are the predominant cell type and produce prostatic proteins, such as Prostate Specific Antigen (PSA)8, that are released into the lumen. Basal cells, which express for example Keratin 5 and Keratin 14, are located between the luminal cells and the basement membrane9. Basal cells are the main pool of prostate stem cells, whereas luminal cells are generally viewed as differentiated cells with limited stem and/or progenitor cell capacity8. Neuroendocrine cells, which mostly express Chromogranin A, are dispersed throughout the basal layer1.

1.2 Prostate cancer

1.2.1 Prostate cancer statistics and risk factors

Prostate cancer remains the most common non-cutaneous cancer among Canadian men with 21,300 new cases in 2017 and is the third leading cause of cancer-related death in Canadian men10. Of all prostate cancer cases diagnosed in Canada in 2018, 74% were diagnosed at stage 1 or 2, and the five-year net survival rate of early-stage disease is close to 100%10. However, the five-year net survival rate drops to 29% if the cancer is detected at stage 4. These more aggressive forms of the disease can lead to local invasion of the seminal vesicles, followed by metastasis, primarily to the bone1. Only 22.4% of diagnosed cases present with high-risk prostate cancer (stage 3 and 4)10. Approximately 60–70% of men dying from non-prostate cancer causes have histological evidence of prostate cancer in autopsy series, showing that prostate cancer is ubiquitous in older men11.

Risk factors

Research to this date reveals age, family history, diet, and ethnicity as the main risk factors for prostate cancer.

Age: Prostate cancer is rare in men younger than 40. The chance of having prostate cancer rises rapidly after the age of 50: about 6 in 10 cases are diagnosed in men aged 65 or older, and the average age at the time of diagnosis is about 66 years12,13. Only about 10% of all prostate cancer cases in the United States occur in men younger than 55 years. However, among men diagnosed with high-grade and high-stage cancers, those with early-onset prostate cancer are more likely to die from the disease14.

Family history is the strongest risk factor for prostate cancer. According to Bratt et al., a man with a father or brother diagnosed with prostate cancer had an approximately 3 times higher risk of being diagnosed himself (48%) as compared to the general population (13%)15. If men had both a brother and a father diagnosed with prostate cancer, the chance of getting an aggressive type of prostate cancer by age 75 was 14%, compared with about 5% among other men12. Currently, genetic evaluation guidelines for prostate cancer primarily focus on BRCA1 and BRCA2 testing, as mutations in both genes have been associated with more aggressive disease and poor clinical outcomes. For men who carry BRCA1 mutations, the risk of prostate cancer by age 65 has been reported to be increased up to 3.8-fold, and for men who carry BRCA2 mutations up to 8.6-fold16,17.

Ethnicity: African Americans are about 60% more prone to prostate cancer than Caucasians, and mortality among African Americans is approximately double that of Caucasians. It is not entirely clear whether this mortality difference is due to differences in socioeconomic status and stage at diagnosis or to the underlying biology of prostate cancer18.

A diet high in saturated fat, well-done meats, and calcium is associated with an increased risk of advanced prostate cancer19. Snowdon et al. found that the risk of death from prostate cancer was 2.5 times higher in overweight men20.

1.2.2 Histopathology of prostate cancer

Prostate adenocarcinoma is the most common type of prostate cancer21. Tumors of the prostate are very heterogeneous, consisting of different cell types, degrees of anaplasia and differentiation, growth patterns, and invasive features22. Prostatic intraepithelial neoplasia (PIN) is generally understood to be a precursor of prostate cancer and is virtually indistinguishable from invasive prostate cancer except that, in high-grade PIN, the basal cell layer is still at least partially intact23. It is characterized by increased growth and cell division of luminal cells and simultaneous degeneration of basal cells without affecting the basement membrane24. PIN, unlike prostate cancer, does not contribute to serum PSA or serum free PSA23,25,26. As high-grade PIN progresses, the likelihood of basal cell layer disruption increases. When high-grade PIN is found on needle biopsy, there is a 50% chance of detecting carcinoma on subsequent biopsies over three years27. Since high-grade PIN is considered a risk factor for developing prostate cancer, the biopsy is usually repeated even if the first biopsy is negative for carcinoma28. Well-differentiated prostate cancer shows proliferation of microacinar structures lined by prostatic luminal cells without an accompanying basal cell layer29 (Figure 4). Studies show contradictory results as to whether the tumor derives from luminal or basal cells, and the origin is still unclear1,8. Prostate cancer is very heterogeneous and multifocal, showing benign glands as well as neoplastic foci of varying severity1 that are genetically distinct (nonclonal), even when in close proximity30. Hence, accurate diagnosis and prognosis remain very challenging, as they require detection of all neoplastic foci.

Figure 4: Scheme of prostatic acini in healthy prostate, prostatic intraepithelial neoplasia, and prostate with carcinoma (left to right).

Intraductal carcinoma (IDC) and cribriform architecture represent unfavorable sub-pathologies in localized prostate cancer with very similar morphologic features. IDC is characterized by malignant epithelial cells invading the prostatic acini and ducts while the basal cell layer remains intact31. Although IDC is associated with Gleason grade patterns 4 and 5, it was decided at the 2014 conference of the International Society of Urological Pathology (ISUP) not to include it in the Gleason grading system but to record it separately from the Gleason score32. Examinations of radical prostatectomy specimens have shown that IDC is associated with increased tumor volume, high Gleason scores, and advanced stages of disease33–35. Intraductal carcinoma differs genetically from high-grade PIN, with greater loss of heterozygosity of Tumor Protein P53 (TP53) and RB Transcriptional Corepressor 1 (RB1) and a greater frequency of ETS Transcription Factor ERG (ERG) rearrangement35. In cribriform carcinoma, peripheral structures are variably irregular, the basal cells are absent, and the nuclei are often pyknotic (irreversible condensation of chromatin in the nucleus of a cell undergoing necrosis or apoptosis) toward the centre. It is often described as gland-in-gland growth or bridging of the cells across the lumen without intervening stroma. Nuclei are uniform and nucleoli are very small or absent (Figure 5)22,36. Kweldam et al. have shown that patients with cribriform and/or intraductal carcinoma (CR/IDC) have significantly worse disease-specific survival probabilities than those without, regardless of the Gleason score37. Furthermore, patients with focal CR/IDC have outcomes similar to those of men with extensive CR/IDC, indicating that the mere presence of this growth pattern changes disease outcome35,37.

Figure 5: Cribriform cancer showing single cells, partly intraductal patterns, and a partly invasive front36.


Neuroendocrine prostate cancer (NEPC) is a very uncommon (<1% of cases) but aggressive subtype of prostate cancer, with most patients dying within 1 or 2 years of diagnosis38, and accounts for up to 25% of castration-resistant prostate cancer (CRPC) cases39. CRPC is a type of prostate cancer that keeps growing even when the amount of testosterone in the body is reduced to very low levels42. NEPC frequently presents with symptoms related to locally invasive or metastatic disease at the time of diagnosis, such as bowel or bladder invasion, hydronephrosis, and metastasis to liver, lung, central nervous system, and bone40. NEPC differs histologically from prostate adenocarcinoma and is characterized by the presence of small, round neuroendocrine cells, which do not express androgen receptor (AR) or PSA, but usually express neuroendocrine markers such as chromogranin A, synaptophysin, and neuron-specific enolase41. There is currently no effective treatment for NEPC, and the rate of occurrence is predicted to rise with the use of more potent AR inhibitors39.

1.2.3 Current diagnosis and risk stratification of prostate cancer

Current diagnostic methods for prostate cancer include the quantification of Prostate Specific Antigen (PSA) in blood, followed by a Digital Rectal Examination (DRE) and magnetic resonance imaging or transrectal ultrasound (MRI/TRUS) guided needle core biopsy.

The PSA test was approved by the United States Food and Drug Administration (FDA) in 1986 to monitor the detection and progression of prostate cancer. PSA is produced by luminal cells, both in normal and in cancerous acini, and its role is to liquefy semen to increase the motility of sperm43. The threshold indicating a potential tumor is set at a serum PSA concentration of 4 ng/mL. If the serum PSA level is above the threshold, a digital rectal examination is performed, in which the clinician physically palpates the prostate gland to check for abnormalities such as a hard mass or nodule, induration, or asymmetry that might indicate a tumor. The tumor category (T category) describes the primary tumor and is assigned based on the tumor extent (Table 1).


Table 1: Tumor categories in prostate cancer44,45.

T1 Tumor cannot be felt during a DRE and is not seen during imaging tests. It may be found when surgery is performed for another reason (e.g. BPH or abnormal growth of noncancerous prostate cells).

T1a Tumor is present in 5% or less of the prostate tissue removed during surgery

T1b Tumor is present in more than 5% of the prostate tissue removed during surgery

T1c Tumor was found during a needle core biopsy (usually because the patient has an elevated PSA level)

T2 Tumor is only found in the prostate, not other parts of the body

T2a Tumor involves one half of one lobe of the prostate

T2b Tumor is more than one half of one lobe of the prostate but not both sides

T2c Tumor has grown into both lobes of the prostate

T3 Tumor has grown through the prostate and into the tissue just outside of the prostate

T3a Tumor penetrated the prostate capsule on one or both sides.

T3b Tumor invaded into the seminal vesicles.

T4 Tumor has spread into other tissues next to the prostate, such as the urethral sphincter, bladder, rectum and/or wall of the pelvis

If an elevated PSA level and/or the DRE results indicate the presence of a tumor, a transrectal (or sometimes transperineal) systematic needle core biopsy is performed to obtain tissue samples and further analyze the cell morphology. In general, the prostate gland is divided into sextants and one or two cores are taken from each sextant, resulting in 6-12 needle biopsies. To visually guide the needle biopsy, transrectal ultrasonography (TRUS) or magnetic resonance imaging (MRI) is typically used. The benefits of TRUS are its real-time nature, low cost, and simplicity46. Limitations of the use of TRUS to detect prostate cancer are its relatively poor image quality and the low intrinsic contrast between tumor and healthy tissue on ultrasound. During the procedure, the prostate is divided into six or more zones of equal volume, and a needle biopsy is obtained from each zone with one or two cores in a systematic but inherently undirected fashion46. Subsequently, the growth pattern of each tissue sample is examined microscopically by a pathologist. To describe the histological features of the adenocarcinoma, a so-called Gleason Score is assigned. The Gleason grading system was established in the 1960s by Donald Gleason and has been used ever since to describe the development and aggressiveness of a tumor and to help estimate the prognosis of men with prostate cancer. Based on the cell morphology, every architectural pattern is graded from 1 for healthy, small, uniform glands up to 5 for only occasional gland formation (Table 2)47.

Table 2: Architectural patterns and their Gleason grades. Adapted from Urologic Pathology: The Prostate, Tannenbaum et al47.

Taking the heterogeneity of prostatic tumors into account, one grade is assigned for the most common and another for the second most common cellular pattern. Both grades combined represent a patient’s Gleason Score and are used for diagnosis and prognosis in addition to the serum PSA level and the T category from the DRE.


International Society of Urological Pathology (ISUP) grade groups

The Gleason grading system still has major deficiencies: the lowest score currently assigned, 6, may be misunderstood as a cancer in the middle of the grading scale rather than as a low-risk cancer. Furthermore, 3 + 4 = 7 and 4 + 3 = 7 are often considered the same prognostic group although they present different outcomes (88% BCR-free progression as opposed to 63%)48. Based on a revision of the original Gleason score, a five-grade group system was established: grade group 1 (Gleason score ≤6), grade group 2 (Gleason score 3 + 4 = 7), grade group 3 (Gleason score 4 + 3 = 7), grade group 4 (Gleason score 8), and grade group 5 (Gleason score 9–10)48. This new grading system, beginning with grade group 1, has the potential benefit of reducing fear and may contribute to a decrease in the overtreatment of low-grade prostate cancer. By splitting Gleason score 7 into two groups, the ISUP grade groups stratify patients more accurately than the original Gleason score48. Additionally, a grading system of five grades simplifies risk group placement based on prostate cancer prognosis.
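As a minimal sketch, the mapping from Gleason patterns to grade groups described above can be written as a small Python function. The function name `isup_grade_group` and its error handling are illustrative assumptions and not part of any clinical software. A score of 7 is only accepted as 3 + 4 or 4 + 3, because the text does not define other combinations.

```python
def isup_grade_group(primary: int, secondary: int) -> int:
    """Map primary and secondary Gleason patterns (1-5) to a grade group (1-5).

    Hypothetical helper for illustration only; it follows the five grade
    groups listed in the text and nothing else.
    """
    for pattern in (primary, secondary):
        if pattern not in (1, 2, 3, 4, 5):
            raise ValueError(f"Gleason pattern must be 1-5, got {pattern}")

    score = primary + secondary
    if score <= 6:
        return 1
    if score == 7:
        # The order of the patterns matters: 3 + 4 and 4 + 3 are separated.
        if (primary, secondary) == (3, 4):
            return 2
        if (primary, secondary) == (4, 3):
            return 3
        raise ValueError("score 7 is only handled as 3 + 4 or 4 + 3")
    if score == 8:
        return 4
    return 5  # Gleason score 9 or 10


if __name__ == "__main__":
    for patterns in [(3, 3), (3, 4), (4, 3), (4, 4), (4, 5)]:
        print(patterns, "-> grade group", isup_grade_group(*patterns))
```

Keeping the two patterns as separate arguments, rather than passing only their sum, is what allows the two kinds of Gleason score 7 to be told apart.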

Prostate Cancer Risk Groups

Risk groups are usually assigned to help choose the best treatment and predict patient outcome. Based on the PSA level, the T category, and the Gleason Score, patients are placed in low, intermediate, or high-risk groups and treatment options are recommended accordingly (Table 3). Furthermore, if nodes are seen beyond the pelvis through imaging or prostate specific membrane antigen (PSMA) positron emission tomography (PET) tracers, the patient is considered metastatic49. In addition to the T category, the M category is used to report whether the cancer has spread to other parts of the body outside the pelvis. M1a indicates that there are cancer cells in lymph nodes outside the pelvis, M1b that there are cancer cells in the bone, and M1c that there are cancer cells in other parts of the body45.

Considering the slow growth of many prostate cancers, most estimates suggest that the vast majority of patients with low-risk prostate cancer do not benefit from radical therapy and may instead be harmed by the negative effects of treatment on quality of life (e.g. urinary, bowel, and sexual function). Hence, expectant management such as watchful waiting or active surveillance presents a good alternative to radical treatment for low-risk patients. Intermediate-risk patients are recommended to receive radical prostatectomy or radiotherapy. A combination of radical prostatectomy or radiotherapy and additional treatment such as hormone therapy is recommended for high-risk patients. Treatment recommendations for low-risk patients vary widely between countries and even between hospitals.

Table 3: International Society of Urological Pathology (ISUP) classification of prostate cancer44.

Risk Group | Criteria | Recommended Treatment
low | Gleason ≤6 and PSA < 10 ng/mL and T1-T2a | Active Surveillance
intermediate | Gleason 7 or PSA 10-20 ng/mL or T2b-T2c | Radical Prostatectomy or Radiotherapy
high | Gleason 8-10 or PSA > 20 ng/mL or T3-T4 | Radical Prostatectomy or Radiotherapy plus additional Treatment like Hormone Therapy
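The criteria in Table 3 can be read as a decision rule: a patient is high risk if any high-risk criterion is met, intermediate risk if any intermediate criterion is met, and low risk only if all three low-risk criteria hold. The following Python sketch illustrates this reading under stated assumptions: the function name, the string encoding of T categories, and the rejection of an unsubclassified "T2" are hypothetical choices, and PSA values of exactly 10 or 20 ng/mL are placed in the intermediate group following the "10-20 ng/mL" range in the table.

```python
# T categories grouped as in Table 3 (string encoding is an assumption).
LOW_T = frozenset({"T1", "T1a", "T1b", "T1c", "T2a"})
INTERMEDIATE_T = frozenset({"T2b", "T2c"})
HIGH_T = frozenset({"T3", "T3a", "T3b", "T4"})


def risk_group(gleason_score: int, psa_ng_ml: float, t_category: str) -> str:
    """Return 'low', 'intermediate' or 'high' following Table 3.

    Illustrative sketch only, not a clinical tool.
    """
    if not 2 <= gleason_score <= 10:
        raise ValueError("Gleason score must be between 2 and 10")
    if psa_ng_ml < 0:
        raise ValueError("PSA concentration cannot be negative")
    if t_category not in LOW_T | INTERMEDIATE_T | HIGH_T:
        raise ValueError(f"unsupported T category: {t_category!r}")

    # Any single high-risk criterion is sufficient.
    if gleason_score >= 8 or psa_ng_ml > 20 or t_category in HIGH_T:
        return "high"
    # Otherwise any single intermediate criterion is sufficient.
    if gleason_score == 7 or psa_ng_ml >= 10 or t_category in INTERMEDIATE_T:
        return "intermediate"
    # Remaining case: Gleason <= 6 and PSA < 10 ng/mL and T1-T2a.
    return "low"


if __name__ == "__main__":
    print(risk_group(6, 5.0, "T1c"))
    print(risk_group(7, 5.0, "T1c"))
    print(risk_group(6, 5.0, "T3a"))
```

Checking the high-risk criteria first makes the "or" logic of the table explicit: one unfavorable value is enough to move a patient into a higher group, whatever the other two values are.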

Watchful waiting and active surveillance

Watchful waiting describes a less intensive surveillance with the administration of androgen deprivation therapy, whereas active surveillance involves close monitoring of the cancer progression50 with frequent PSA tests (every 3-6 months) followed by a needle core biopsy every 6-12 months if needed. When evidence of more aggressive cancer is detected, patients are moved to a higher risk group and more radical treatment options are recommended accordingly. The application of watchful waiting or active surveillance largely depends on the hospital and the country. Expectant management in the United States, for example, is relatively under-utilized due to concerns that undertreatment could put an individual’s longevity and quality of life at risk51.

Radiation Therapy

Radiation therapy such as external beam radiotherapy (EBRT) and brachytherapy (BT) is used in about one-third of patients diagnosed with prostate cancer, and doses of 75.6 Gy or higher are applied when conventional fractionation is used52,53. Complications of EBRT for prostate cancer depend upon the dose and the volume of tissue irradiated and primarily affect gastrointestinal, urinary, or sexual function53. Brachytherapy places sealed radioactive sources close to or within a tumor. The sources are doubly encapsulated within sealed containers, which minimizes the risk of dispersion of radioactive material, and deliver a tumoricidal radiation dose to the tumor while sparing surrounding normal tissue. Dose rates vary from less than 2 Gy per hour to more than 12 Gy per hour54.

Radical Prostatectomy

During a radical prostatectomy, the prostate is removed along with the seminal vesicles through a retropubic, transperineal or perineal approach. The procedure can be performed endoscopically or robot-assisted. The radical surgical management of localized prostate cancer should be individualised by considering overall general health, comorbidities, life expectancy, and an informed decision made by the patient55.

Challenges of current diagnosis and prognosis

The benefit of population screening for prostate cancer that leads to early detection is controversial. Several factors favor population screening, as many patients do not experience symptoms during early stages and hence are unlikely to seek care until the disease has progressed. This can have fatal consequences, as the only potential cure for high-grade localized cancer is radical prostatectomy and no cure has yet been found for advanced prostate cancer with metastases. However, with the increased use of needle biopsies, a significant number of cancers that are unlikely to ever become life-threatening are diagnosed, resulting in overdiagnosis. Overdiagnosis leads to unnecessary treatment, exposure to the risks of side effects of treatment, and an economic burden on the health care system. Despite the high rate of overdiagnosis, some clinically significant prostate cancers are still missed56. Furthermore, there is no definitive scientific evidence that population screening will reduce overall disease-specific mortality57. It is questionable whether men benefit from early detection, as they may be harmed more than helped when the negative effects of treatment on quality of life (urinary, bowel, and sexual function) are weighed against the mostly unaggressive nature of the tumors.

Most diagnosed prostate cancers are stage 1 or 2 tumors and have a five-year net survival rate of almost 100%, which drops to 29% in stage 4 cancers (see chapter 1.2.1). Hence, it is important to accurately distinguish between aggressive and low-risk cancer to ensure the best treatment for each patient. Even though current methods for diagnosis and prognosis place patients in risk groups, the rate of over- and undertreatment remains high58. PSA identifies prostate cancer with a likelihood of about 21%59, and studies show that the low specificity of PSA for prostate cancer leads to an over-detection of 27-56%60. This indicates that even though PSA is an organ-specific marker, it is not cancer specific, resulting in over-testing and overdiagnosis. Transrectal needle core biopsy leads to complications in 50% of cases, including pain, hematuria, hematospermia, urinary retention, and infection61. Only around 1% of the gland is sampled during needle core biopsy, emphasizing the importance of TRUS or MRI guidance to target tumors. Unfortunately, even with the use of multiparametric MRI only about 45% of all lesions are detected62. Similarly, TRUS has a limited capacity to identify prostate cancer due to variability in the ultrasonic appearance of cancers and lack of specificity56. Another limitation of biopsies is that it is unclear how much of each cancerous focus was sampled (Figure 6), making it impossible to select a Gleason Score that accurately describes the cell morphology of the whole tumor and not just the part sampled by the biopsy. Furthermore, invasive procedures like biopsies carry a risk of bleeding (e.g. hematuria, hematochezia, and hematospermia), infection (e.g. prostatitis), and sepsis, especially if they are performed repeatedly in the context of active surveillance.

Figure 6: Undersampling during needle core biopsy.


In summary, current methods to diagnose and prognose prostate cancer are associated with significant uncertainty in stratifying men into appropriate risk/treatment groups. Consequently, we focus on finding a more accurate and less invasive method using biomarker discovery by mass spectrometry to assist the clinical decision process and improve both the process of diagnosis and the risk group stratification.

1.3 Prostate Cancer Biomarkers

1.3.1 Introduction to cancer biomarkers

According to the National Cancer Institute, biomarkers are biological molecules (e.g. gene mutations, proteins, or metabolites) found in blood, other body fluids, or tissues that are signs of a normal or abnormal process, or of a condition or disease63. Biomarkers are used to measure the presence, progress, or intensity of a disease and to screen otherwise healthy patients for malignancy64. In patients who have been diagnosed with cancer, biomarkers can help determine prognosis, or the likelihood of disease recurrence independent of treatment65. Tumor biomarkers include cancer-specific mutations or changes in gene expression, both of which can result in aberrant protein expression that is detectable as free proteins66. Cancer biomarkers are classified into screening, diagnostic, prognostic, predictive, and pharmacodynamic markers. Screening markers are measured serially for assessing the status of a disease or medical condition or for evidence of exposure to (or effect of) a medical product or an environmental agent67. A diagnostic biomarker is used to detect or confirm the presence of a disease or condition of interest or to identify individuals with a subtype of the disease67. Prognostic markers predict the natural outcome of a patient’s disease. They are used to distinguish between good and poor outcome tumors and support the clinical decision process of whom to treat and/or how aggressive the treatment should be. Predictive biomarkers provide information about a patient’s benefit from a specific therapy, and pharmacodynamic biomarkers indicate the effect of a certain drug on the patient68. To obtain high specificity and sensitivity, a biomarker panel rather than a single biomarker is desirable69. An ideal cancer biomarker would be produced by tumor tissue (not normal tissue) and be robustly detectable in a minimally invasive (blood) or non-invasive (urine) manner70.

Cancer is a complex disease; hence, it is unlikely that one single biomarker will detect cancer of a certain organ with high specificity and sensitivity. A combination of multiple markers and/or approaches may be more successful. Comparable to the clinical phases of testing a new cancer drug, Pepe et al. suggested five phases for screening biomarker development70. Phase 1 is the preclinical exploratory phase to identify promising directions. Here, biomarker identification studies are performed using a discovery approach with techniques such as high-throughput sequencing, gene expression arrays, and mass spectrometry to identify individual biomarkers or groups of biomarkers that differ between risk groups. Phase 2 focuses on clinical assays and validation to detect established disease. The retrospective longitudinal phase 3 includes studies on biomarkers that detect a disease early, before it becomes clinical, and a ‘screen positive’ rule is defined. Prospective screening is the main goal of phase 4, where the extent and characteristics of the disease detected by the test and the false referral rate are determined. The focus in phase 5 is on cancer control, quantifying the impact of screening on reducing the burden of disease on the population. This project falls into phase 1 of biomarker development.

1.3.2 Biomarkers in tissue proximal fluids and biological fluids

Biomarker identification focuses on the biology of the tumor and its surrounding microenvironment. The analysis of body fluids and tissues is used to detect biomarkers such as proteins or peptides that indicate the presence or absence of tumors or the level of tumor burden66. These proteins and peptides can be found in serum; however, the high complexity of serum and the low abundance of proteins of interest pose particular challenges for proteomics-based biomarker discovery in blood71. Currently, several diagnostic biomarkers for prostate cancer are available, such as PSA (in serum), the Prostate Cancer Associated Transcript 3 (PCA3) score (in post-DRE urine), the Prostate Health Index (PHI) (in serum), and the four-kallikrein panel (in serum), which aim to reduce the number of unnecessary biopsies and provide information related to the aggressiveness of the tumor72. New promising biomarkers, such as PSA glycoforms, the TMPRSS2:ERG fusion gene (TMPRSS2: Transmembrane Serine Protease 2), microRNAs, circulating tumor cells and androgen receptor variants, could change the management of early prostate cancer, but need to be compared in large prospective studies to evaluate their real value in prostate cancer detection and prognosis73. Unfortunately, there are several challenges with the current biomarkers, such as low specificity and/or low sensitivity for prostate cancer (Table 4)74. Even though PCA3 clearly outperforms PSA (nearly the same sensitivity, almost 30 percentage points higher specificity), its widespread use is not encouraged by the Canadian Urological Association based on available data, and it is not publicly funded in Canada (nor are the four-kallikrein panel and the PHI test). Reasons are a high rate of undiagnosed high-grade cancers (13%) when using a PCA3 score <20 and the high cost of the PCA3 test75. Hence, the discovery of novel biomarkers with higher sensitivity and specificity is pivotal to accurately and non-invasively diagnose and prognose prostate cancer. These novel biomarkers could potentially be used on their own or in combination with current biomarkers to improve prostate cancer prediction and decrease the rate of under- and overtreatment. To detect these novel biomarkers we interrogate prostate proximal fluids to focus on proteins that may be directly related to the disease. One such fluid is direct expressed prostatic secretion (EPS), which is collected just prior to radical prostatectomy (Figure 7).

Table 4: Sensitivity and specificity of current prostate cancer biomarkers for diagnosis and prognosis74.

Biomarker    Sensitivity [%]    Specificity [%]
PSA          65                 47
PCA3         66                 76
PHI          80                 45

Figure 7: Scheme of collection of two fluids containing expressed prostatic secretions.


Direct EPS contains proteins directly shed or secreted by the prostate. Even though it is of potential value for biomarker discovery because of its close proximity to the prostate, it is of limited clinical use, as direct EPS is collected just prior to radical prostatectomy. Another prostate proximal fluid is post digital rectal examination urine (post-DRE urine). Usually 20-50 mL of first catch urine following a DRE are collected in the clinic. Post-DRE urine contains a small fraction of EPS that is expelled during the DRE and flushed out with the urine. Post-DRE urine is easily accessible and can be obtained longitudinally (Figure 7).

1.4 Shotgun proteomics for biomarker discovery

Mass spectrometry-driven proteomics can be used to profile (detect and quantify) proteins and peptides present in a biological sample. Two fundamentally different approaches are used in proteomics, called top-down and bottom-up. In top-down proteomics, intact proteins are detected and characterized. In bottom-up proteomics, proteins are subjected to proteolytic cleavage, and the peptide products are analyzed by MS76. Typically, trypsin is used to convert proteins into peptides, cleaving on the carboxy-terminal side of arginine and lysine residues. Lys-C is more stable than trypsin and often both enzymes are used in combination77. Top-down approaches can provide a complete description of the primary structure of the protein with all of its modifications. However, the extensive fragmentation of intact protein ions that this requires has proved to be difficult, especially for large proteins. Further, the distinctly different physicochemical properties of different proteins make them challenging to handle as mixtures without incurring losses of certain components78. The peptides analyzed in bottom-up proteomics are readily solubilized and separated prior to analysis by the mass spectrometer, tasks that are considerably more difficult for the intact proteins used in top-down proteomics. Unfortunately, only a small fraction of the tryptic peptides is normally detected, and only a fraction of these yield useful fragmentation ladders. The bottom-up approach is therefore suboptimal for determining modifications and alternative splice variants78. Nevertheless, the technical difficulty of proteome-wide analysis at the intact protein level in terms of proteome coverage, sensitivity, and throughput leads to a current preference for bottom-up approaches79. The strength of proteomics is the ability to perform direct comparisons between different samples, potentially providing systems-level insight into phenotypic changes and revealing novel insights into fundamental biology.
When bottom-up proteomics is performed on a mixture of proteins it is called shotgun proteomics, named by the Yates lab as a complementary term to shotgun genomic sequencing80.
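
The cleavage rule described above can be illustrated with a minimal in-silico digestion sketch. It assumes the simplified rule stated in the text (cleavage on the carboxy-terminal side of lysine and arginine), ignores the commonly applied exception for a following proline, and does not model missed cleavages. The function name and the example sequence are hypothetical and are not part of the workflow used in this thesis.

```python
def tryptic_digest(sequence, min_length=1):
    """Split a protein sequence after every K or R (simplified trypsin rule).

    Assumption: no proline exception and no missed cleavages are modeled.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in ("K", "R"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # C-terminal peptide that does not end in K or R
        peptides.append(sequence[start:])
    return [p for p in peptides if len(p) >= min_length]


# Hypothetical sequence, for illustration only
example = "AGLSPEEVTKMKTAYIAK"
print(tryptic_digest(example))                # all fully cleaved peptides
print(tryptic_digest(example, min_length=7))  # only peptides of 7+ residues
```

The `min_length` argument mirrors the kind of minimum peptide length filter that search engines typically apply; very short peptides are rarely unique to one protein.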

Although various types of mass spectrometers exist, the following explains the basic operation of the Q Exactive™ HF Hybrid Quadrupole-Orbitrap™ Mass Spectrometer, which was used for all data collection presented in this thesis. Prior to mass spectrometric analysis, peptides are usually separated by liquid chromatography (LC). Peptide samples are loaded onto a high-performance liquid chromatography (HPLC) column with a reversed-phase stationary phase consisting of long hydrophobic alkyl chains (C18 beads). Peptides are then separated based on their hydrophobicity81. Columns are heated and kept at a constant temperature, as temperature changes affect the peak shape and elution time of peptides. In proteomics applications, a nano-flow high-performance liquid chromatography system (nHPLC) is often used, as it provides increased sensitivity. The nHPLC column is directly coupled to an electrospray ionization (ESI) source that applies a voltage (2 kV) to the mobile phase, generating a fine electrospray (i.e. small charged droplets) containing charged analyte molecules81. The ionized molecules are guided through several lenses and ion guides that remove uncharged particles and allow charged peptides to pass (see Figure 8). A quadrupole mass filter is constructed from four hyperbolic rods (Figure 9A) and uses a radio frequency (RF) voltage with a direct current (DC) offset to stabilize the trajectory of ions with a certain mass-to-charge ratio (m/z). Ions of interest can therefore pass the mass filter, whereas all other particles with a different m/z are removed81. Subsequently, peptide ions pass through additional lenses and are collected in the curved linear trap (C-Trap). The collected and focused ions are then sent into the mass analyzer, where their m/z is measured along with their abundance. In the Q Exactive™ mass spectrometer, the Orbitrap™ functions as the mass analyzer.
Briefly, ions are trapped in an electrostatic field where they circulate around a spindle-shaped electrode (Figure 9B).


Figure 8: Schematic layout of the Q Exactive™ Hybrid Quadrupole-Orbitrap™ Mass Spectrometer82

Figure 9: A) Hyperbolic Quadrupole with RF voltage with DC offset. Voltages of each pair of rods are equal in amplitude but opposite in sign82. B) Schematic of Orbitrap cell and example of stable ion trajectory82.

In addition to the MS1 scan, which shows the peaks of intact molecules with different m/z, peptides can be fragmented to generate fragmentation patterns that are displayed in the MS2 spectrum. In the Q Exactive HF, fragmentation takes place in the higher-energy collisional dissociation (HCD) collision cell. The ions collected in the C-Trap are guided into the HCD collision cell for fragmentation. Inside the collision cell, the electrical potential is continuously altered, which causes ions to gain kinetic energy and collide with neutral gas (nitrogen) molecules, yielding fragment ions83. The ion that is fragmented is called the precursor ion and the resulting fragments are called product ions. These product ions are then analyzed in the Orbitrap analyzer and reported in the MS2 scan. Hence, the MS2 spectrum displays an ensemble of fragments of one particular precursor ion broken at different amide bonds. The peptide backbone is fragmented by imparting energy and most likely breaks at the amide bonds. Fragments in which the charge is retained on the carboxy-terminal part of the peptide are called y-ions, whereas fragments in which the charge is retained on the amino-terminal part are called b-ions (Figure 10)84.

Figure 10: Pattern of fragmentation of ions at the peptide bond between two amino acids of a peptide (own illustration).
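
To make the b- and y-ion nomenclature concrete, the following sketch computes theoretical b- and y-ion m/z values for an unmodified peptide. It is a minimal illustration, assuming standard monoisotopic residue masses and no modifications (for example, carbamidomethylated cysteine is not handled); the function name is hypothetical. The example peptide is the first iRT standard listed later in this thesis.

```python
# Standard monoisotopic residue masses in Da (unmodified amino acids)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O, added to the C-terminal (y) fragment
PROTON = 1.007276   # mass of the charge carrier


def fragment_ions(peptide, charge=1):
    """Return (b_ions, y_ions) as m/z lists, one entry per backbone bond.

    Index i of both lists corresponds to cleavage after residue i + 1,
    so b_ions[i] is b(i+1) and y_ions[i] is the complementary y ion.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_neutral = sum(masses[:i])          # amino-terminal fragment
        y_neutral = sum(masses[i:]) + WATER  # carboxy-terminal fragment
        b_ions.append((b_neutral + charge * PROTON) / charge)
        y_ions.append((y_neutral + charge * PROTON) / charge)
    return b_ions, y_ions


b, y = fragment_ions("LGGNEQVTR")
print("b1 =", round(b[0], 4), " y1 =", round(y[-1], 4))
```

For a tryptic peptide ending in arginine, the singly charged y1 ion is the arginine residue plus water plus a proton, which is why this fragment recurs across many MS2 spectra.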


1.5 Hypothesis and Aims

Prostate cancer is the most common non-cutaneous cancer in Canadian men, with 21,300 new cases in 2017, and is the third leading cause of cancer-related death in men in Canada10. As many individuals present with low risk tumors, the number of patients dying from prostate cancer is comparatively small. Current methods of diagnosis and prognosis fail to accurately classify patients based on the significance of their disease and subject them to unnecessary risks due to overtreatment and invasive diagnostic procedures. Establishing non-invasive molecular features capable of distinguishing insignificant prostate tumors from significant ones would have a substantial clinical impact.

I hypothesize that comparative proteomic profiling of a richly annotated cohort of direct EPS will identify proteins associated with patient risk categories. To study this hypothesis, I developed a shotgun proteomics assay for high-throughput analyses which I applied to a cohort of 148 clinically stratified direct EPS samples. Subsequently, I analysed the data acquired from the direct EPS cohort to identify putative biomarkers that are associated with prostate cancer risk categories.


2. Materials and Methods

2.1 Materials

Biological Reagents
• iRT Kit, Biognosys, Product Number: Ki-3002-1
• SUC2 Invertase, Sigma-Aldrich (Product Number: I4504-250MG, Lot: 016K74552)
• Direct EPS samples were collected at the Eastern Virginia Medical School in Norfolk, Virginia, USA under Institutional Review Board approved protocols at Urology of Virginia and Eastern Virginia Medical School (#06–12-FB-0343) and the Research Ethics Review Board at the University Health Network (10–0159-T).

Chemical Reagents

Protein Concentration Determination:
• Pierce™ BCA Protein Assay Kit (Product Number: 23227, Lot: SA244533)

MStern Digestion:
• MultiScreenHTS IP Filter Plate, 0.45 µm Hydrophobic High Protein binding Immobilon-P Membrane, clear, sterile (REF: MSIPS4510, Lot: R6CA20386) Rev. 07/14; PR03371; Merck KGaA
• 500 mM Dithiothreitol; 1000 mM Iodoacetamide; 100 mM Ammonium bicarbonate, pH 8.0; 1 mM Calcium chloride; Acetonitrile
• Trypsin/Lys-C Mix, Mass Spec Grade, Promega (Lot: 0000294463)

Solid Phase Extraction:
• Formic acid, Acetonitrile, HPLC dH2O
• C18 Stage Tips

Liquid Chromatography Mass Spectrometry:
• Solvent A: HPLC dH2O with 0.1% formic acid
• Solvent B: Acetonitrile with 0.1% formic acid

Instruments
• Denville Scientific Inc. 300D Microcentrifuge C0265-24 (1000667)
• CLARIOstar® Microplate reader, BMG Labtech
• MultiScreenHTS® Vacuum Manifold, Millipore Sigma
• VWR® Analog Vortex Mixer 10153-838
• SAVANT™ SC250EXP SpeedVac™ Concentrator, Thermo Scientific
• NanoDrop™ Lite UV-Vis Spectrophotometer, Thermo Scientific
• EASY-nLC 1000 Liquid Chromatograph, Thermo Scientific (LC1000)
• Thermo Scientific™ Acclaim™ PepMap™ 100 C18 LC Column; nanoViper Trap Column, C18, 3 µm, 100 Å, 75 µm x 20 mm; Thermo Scientific (P/N 164946; S/N 10619441)
• EASY-Spray Column PepMap® RSLC, C18, 2 µm, 100 Å, 75 µm x 500 mm; Thermo Scientific (P/N ES803; S/N 10449098)
• Q Exactive™ HF Hybrid Quadrupole-Orbitrap™ Mass Spectrometer

Software
• Thermo Xcalibur™ version 4.0.27.19
• MaxQuant version 1.6.1.0
• R Studio 3.4.4
• AGoTool

2.2 Methods

2.2.1 Sample Cohort

148 direct expressed prostatic secretion (direct EPS) samples were collected at the Eastern Virginia Medical School in Norfolk, Virginia, USA. Direct EPS are fluids that are secreted by the prostate gland after massaging the organ. Patients were under anesthesia when the prostate gland was massaged, and about 0.2 – 1 mL of the secretion was collected prior to radical prostatectomy as described previously85. Samples were diluted with saline to 5 mL and stored on ice for a maximum of 1 h. Particulates were sedimented by low-speed centrifugation and the resulting supernatants were divided into 1 mL aliquots and stored at −80 °C. Diagnostic data were recorded for each patient, including the pre-treatment PSA level in serum, the tumor extent (T-category), and the Gleason score, as well as patient information on age, body mass index (BMI), and ethnicity. Pathological information such as the tumor extent, the Gleason score, and the time to biochemical recurrence (BCR), based on the post-treatment PSA level in serum, was also available for each patient. Biochemical recurrence is defined as a post-treatment serum PSA level above 0.2 ng/mL. The diagnostic data linked to the patients enrolled in this study are outlined in Supplemental Table 1. A brief overview is shown in Figure 11. Patients were placed into risk groups based on diagnostic features (pre-treatment PSA, Gleason Score, and tumor extent) as previously described44. As selection criteria for this cohort, a diagnostic PSA level below 20 ng/mL and a post-surgery PSA of 0.1 ng/mL were chosen to exclude highly metastatic patients.

Figure 11: Overview of 148 patients placed into risk groups. Within each risk group, pre-treatment PSA, diagnostic Gleason Score (d. GS), pathologic Gleason Score (p. GS), and biochemical recurrence (BCR) are shown.

2.2.2 Bicinchoninic Acid Assay

For each direct EPS sample, the amount of total protein was determined by colorimetric detection with bicinchoninic acid (BCA) using the Thermo Scientific™ Pierce™ BCA Protein Assay. This method combines the biuret reaction (reduction of Cu²⁺ to Cu⁺ by proteins in an alkaline medium) with the highly sensitive and selective colorimetric detection of the cuprous cation (Cu⁺) at 562 nm using a reagent containing bicinchoninic acid86. Per 96-well plate, one set of standards was prepared in duplicate with the following concentrations in mg/mL: 2.0, 1.5, 1.0, 0.75, 0.50, 0.25, 0.125, 0.025, and 0.0. For each sample a 1:5 dilution was prepared. 25 µL of standard, undiluted sample, or 1:5 diluted sample was added to separate wells. Subsequently, 200 µL of working reagent (prepared according to the ‘User Guide: Pierce BCA Protein Assay Kit’) was added to every well (Figure 12), the plate was incubated at 37°C for 30 minutes, and the absorbance was measured at 562 nm on a plate reader. The protein concentration was determined according to the Pierce user guide based on linear regression.

   1          2          3          4          5          6          7          8          9          10         11         12
A  S 2.000    S 2.000    S 0.000    S 0.000    E04 (1:5)  E04 (1:5)  E08 (1:5)  E08 (1:5)  E12 (1:5)  E12 (1:5)  E16 (1:5)  E16 (1:5)
B  S 1.500    S 1.500    E01        E01        E05        E05        E09        E09        E13        E13        E17        E17
C  S 1.000    S 1.000    E01 (1:5)  E01 (1:5)  E05 (1:5)  E05 (1:5)  E09 (1:5)  E09 (1:5)  E13 (1:5)  E13 (1:5)  E17 (1:5)  E17 (1:5)
D  S 0.750    S 0.750    E02        E02        E06        E06        E10        E10        E14        E14        E18        E18
E  S 0.500    S 0.500    E02 (1:5)  E02 (1:5)  E06 (1:5)  E06 (1:5)  E10 (1:5)  E10 (1:5)  E14 (1:5)  E14 (1:5)  E18 (1:5)  E18 (1:5)
F  S 0.250    S 0.250    E03        E03        E07        E07        E11        E11        E15        E15        E19        E19
G  S 0.125    S 0.125    E03 (1:5)  E03 (1:5)  E07 (1:5)  E07 (1:5)  E11 (1:5)  E11 (1:5)  E15 (1:5)  E15 (1:5)  E19 (1:5)  E19 (1:5)
H  S 0.025    S 0.025    E04        E04        E08        E08        E12        E12        E16        E16

Figure 12: BCA plate scheme with standards (S, concentration in mg/mL) in duplicate and each sample (E01-E19) undiluted and in 1:5 dilution in duplicate.
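
The standard-curve calculation behind the BCA readout can be sketched as follows. This is a minimal illustration, assuming a straight-line fit of absorbance against standard concentration and that a 1:5 dilution corresponds to a dilution factor of 5; the absorbance values are synthetic and the function names are hypothetical. The Pierce user guide remains the reference for the actual procedure.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration(absorbance, slope, intercept, dilution_factor=1.0):
    """Invert the standard curve and correct for sample dilution (mg/mL)."""
    return (absorbance - intercept) / slope * dilution_factor


# Standard concentrations in mg/mL as listed in the text
standards = [2.0, 1.5, 1.0, 0.75, 0.50, 0.25, 0.125, 0.025, 0.0]
# Synthetic, perfectly linear absorbances at 562 nm (illustration only)
absorbances = [0.10 + 0.50 * c for c in standards]

slope, intercept = fit_line(standards, absorbances)
# A 1:5 diluted sample reading 0.35 corresponds to 0.5 mg/mL in the well,
# i.e. 2.5 mg/mL in the undiluted sample (assumed dilution factor of 5).
print(round(concentration(0.35, slope, intercept, dilution_factor=5), 3))
```

In practice the duplicate wells would be averaged before fitting, and readings outside the range of the standards would be taken from the diluted rather than the undiluted well.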

2.2.3 Trypsin Digestion using MStern Blotting

For each sample, a volume of direct EPS equivalent to 15 µg of protein, as determined by BCA assay, was adjusted to 200 µL with 100 mM ammonium bicarbonate (ABC). Per sample, 2 pmol of Saccharomyces cerevisiae invertase (SUC2) was added as an internal standard to control for sample processing biases. Dithiothreitol (DTT) was added to a final concentration of 5 mM per sample and reactions were incubated for 30 minutes at 56°C to reduce the disulfide bonds of proteins. To prevent the reformation of reduced disulfide bonds, iodoacetamide (IAA) was added to a final concentration of 25 mM and the samples were incubated at room temperature for 30 minutes in the dark. The polyvinylidene difluoride (PVDF) membrane87 was equilibrated with 50 µL of 70% ethanol, which was then passed through the membrane using vacuum suction. Next, two washing steps were performed with 50 µL of 100 mM ABC per well. The samples (200 µL each) were applied and passed through the PVDF membrane to capture the proteins. To remove salts and contaminants, two more washing steps with 100 µL of 100 mM ABC each were performed. 50 µL of digestion buffer (100 mM ABC, 1 mM CaCl2, and 5% acetonitrile (ACN), pH 8.0) containing 1 µg of Trypsin/Lys-C mix (mass spec grade) was added to each well to enzymatically digest the bound proteins. To ensure that the proteins were in contact with the digestion buffer, the digestion solution was passed through the membrane by centrifugation and collected in a flow-through plate. Each flow-through was applied back to the corresponding well and incubated at 37°C. After two hours the samples were carefully pipetted up and down to ensure consistent contact of Trypsin/Lys-C with the proteins. After another two hours of incubation (four hours of digestion time in total), the resulting peptides were collected in a collection plate by centrifugation and remaining peptides were eluted with 50 µL of 50% ACN. The collected samples were lyophilized to complete dryness.
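
The volume calculations implied by this protocol can be sketched as below. This is an illustration only: it assumes the stock concentrations listed in the Materials section (500 mM DTT, 1000 mM IAA), takes the volume increase caused by adding the stock into account, and uses a hypothetical sample concentration. Whether the volume increase was considered in the actual experiments is not stated, and the function names are invented for this example.

```python
def sample_volume_ul(target_ug, conc_mg_per_ml):
    """Volume of sample containing target_ug of protein (1 mg/mL = 1 µg/µL)."""
    return target_ug / conc_mg_per_ml


def stock_volume_ul(final_mM, stock_mM, start_volume_ul):
    """Volume of stock to add so that the mixture reaches final_mM.

    Solves stock_mM * v = final_mM * (start_volume_ul + v) for v.
    """
    return final_mM * start_volume_ul / (stock_mM - final_mM)


# Hypothetical BCA result of 0.5 mg/mL for one direct EPS sample
eps_ul = sample_volume_ul(15, 0.5)        # volume containing 15 µg protein
abc_ul = 200 - eps_ul                     # 100 mM ABC needed to reach 200 µL
dtt_ul = stock_volume_ul(5, 500, 200)     # DTT stock for 5 mM final
iaa_ul = stock_volume_ul(25, 1000, 200 + dtt_ul)  # IAA stock for 25 mM final

print(eps_ul, abc_ul, round(dtt_ul, 2), round(iaa_ul, 2))
```

Samples whose protein concentration is too low to provide 15 µg within 200 µL would need a different treatment, which this sketch does not cover.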

2.2.4 Solid Phase Extraction

Peptides were desalted using in-house made solid phase extraction (SPE) stage tips with three plugs of 3M™ Empore™ Discs C18 membrane88 to purify the tryptic peptides and remove contaminants and salts.
- Equilibration solvent: Methanol
- Activation solvent: Acetonitrile (ACN) and 0.1% formic acid (FA)
- Washing solvent: Water (HPLC dH2O) and 0.1% FA
- Elution solvent: 80% ACN and 0.1% FA

For each stage tip, 30 µL of methanol was used to equilibrate the C18 membrane, followed by centrifugation at 4,500 rpm for 30 seconds. Subsequently, the stage tips were activated with 30 µL of activation solvent, centrifuged and washed twice with 30 µL of washing solvent. Afterwards, the sample was loaded onto the stage tip and passed through the C18 membrane by centrifugation to bind the peptides to the C18 reversed-phase material. Subsequently, the stage tip was washed with two cycles of 50 µL washing solvent. The bound peptides were eluted using 50 µL of elution solvent. This step was repeated with an additional 50 µL of elution solvent. The eluted samples were frozen on dry ice and subsequently lyophilized using a vacuum concentrator. 24 µL of MS solvent A (see ‘2.1 Materials’) was used to resuspend the samples. To each sample 2.4 µL of indexed Retention Time (iRT) stock was added and thoroughly mixed, and 11 µL per sample was loaded onto the liquid chromatography column.

2.2.5 Liquid Chromatography and Mass Spectrometry

Each direct EPS sample was analyzed separately on the Q Exactive HF by nanoflow LC using the Thermo Scientific™ EASY-nLC™ 1000 system. 11 µL of each sample was injected onto an Acclaim™ PepMap™ 100 C18 LC pre-column of 75 µm inner diameter and 20 mm in length with 100 Å pore size and 3 µm particle size. The pre-column was connected to an EASY-Spray™ C18 LC column of 75 µm inner diameter and 50 cm in length with 100 Å pore size and 2 µm particle size. The mixture of tryptic peptides was separated at a flow rate of 250 nL/min using a linear gradient (Table 5). To equilibrate the analytical column and the pre-column, 3 µL of MS solvent A was used at a maximum back pressure of 740 bar.

Table 5: MS gradient displaying content of MS solvent B (0.1% FA in ACN) at start and end of each section throughout the gradient with a total length of 135 minutes.

The Q Exactive HF instrument was operated using a top-15 data-dependent MS/MS acquisition method, in which a full scan was collected from m/z 350 - 1800 at a resolving power of 60,000 full width at half maximum (FWHM) with an automated gain control (AGC) target of 3 × 10⁶ and a maximum ion injection time (IT) of 40 ms. AGC regulates the number of ions in the mass analyzer to reduce space charge effects. The MS or MS/MS event is performed as soon as either the AGC target or the maximum IT is reached, whichever comes first89. MS/MS spectra were recorded at a resolving power of 30,000 FWHM with an AGC target of 2 × 10⁵ and a maximum IT of 55 ms. The quadrupole isolation window was set to 1.4 m/z, the Normalized Collision Energy (NCE) to 27, and the intensity threshold to 1.8 × 10⁴, and a dynamic exclusion of 40 s was used. Every sample was run once using the gradient described above. To ensure stable LC-MS/MS performance, the system was calibrated once per week and a quality control sample was run and analyzed, followed by five washes.
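
The precursor selection logic of a top-N data-dependent method with dynamic exclusion can be sketched as follows. This is a simplified illustration of the general technique and not the instrument firmware: the m/z matching tolerance, the data structures, and the function name are assumptions, while the top-15 setting, the intensity threshold, and the 40 s exclusion time are taken from the text. Charge state filtering is not modeled.

```python
def select_precursors(peaks, now_s, excluded, top_n=15,
                      intensity_threshold=1.8e4, exclusion_s=40.0,
                      mz_tolerance=0.01):
    """Pick up to top_n precursors from one full scan.

    peaks:    list of (mz, intensity) tuples from the MS1 scan
    excluded: dict mz -> time (s) of last selection, updated in place
    mz_tolerance is a hypothetical matching window in m/z units.
    """
    # Release precursors whose dynamic exclusion time has elapsed
    for mz in [m for m, t in excluded.items() if now_s - t >= exclusion_s]:
        del excluded[mz]

    candidates = [
        (mz, intensity) for mz, intensity in peaks
        if intensity >= intensity_threshold
        and not any(abs(mz - ex) <= mz_tolerance for ex in excluded)
    ]
    # Most intense precursors are fragmented first
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    selected = [mz for mz, _ in candidates[:top_n]]
    for mz in selected:
        excluded[mz] = now_s
    return selected


# Illustrative scan with three peaks; top_n=1 to make the effect visible
scan = [(500.0, 1e6), (600.0, 1e4), (700.0, 5e5)]
exclusion_list = {}
for t in (0.0, 10.0, 20.0, 45.0):
    print(t, select_precursors(scan, t, exclusion_list, top_n=1))
```

Dynamic exclusion prevents the most abundant peptides from being fragmented repeatedly while they elute, which leaves acquisition time for lower-abundance precursors.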


2.2.6 Analysis of Mass Spectrometry Data

MaxQuant version 1.6.1.0 was used to analyze the collected mass spectrometry data. Chromatograms of all 148 samples were searched at the same time with the ‘match between runs’ feature enabled. The false discovery rate (FDR) threshold on the peptide and protein level was set to 0.01. Carbamidomethylation of cysteine was specified as a fixed modification. Peptides were required to have a minimum length of seven amino acids and a maximum mass of 4600 Da. Further downstream analysis of the results was performed with in-house-developed tools based on R (version 3.4.4) scripting for statistical analysis and ggplot for data visualization. For each sample, label free quantification (LFQ) adjusted, intensity-based absolute quantification (iBAQ) intensities were calculated with a minimum of 2 peptides per protein90.

2.2.7 Pathway Analysis

Pathway enrichment analysis was performed using AGoTool91, accessed on February 13, 2019. First, the whole dataset was used as foreground with the human proteome as background; the method was set to compare_samples, all gene ontology (GO) terms were used, both under- and overrepresentation were considered, and the Benjamini-Hochberg method was selected for multiple testing correction with an FDR cutoff of 0.01. Second, the dataset was ranked by intensity and separated into pentiles. Each pentile was used as foreground and the direct EPS dataset as background, with the same settings as before.
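
The ranking and splitting step can be sketched as follows. This is a minimal illustration, assuming one summary intensity per protein and five bins of as equal size as possible, ordered from highest to lowest intensity; the function name and the example identifiers are hypothetical, and the enrichment analysis itself (performed in AGoTool) is not reproduced here.

```python
def split_into_pentiles(intensity_by_protein):
    """Rank proteins by intensity (descending) and split into five bins."""
    ranked = sorted(intensity_by_protein,
                    key=intensity_by_protein.get, reverse=True)
    n = len(ranked)
    # Integer bin boundaries; bin sizes differ by at most one protein
    bounds = [i * n // 5 for i in range(6)]
    return [ranked[bounds[i]:bounds[i + 1]] for i in range(5)]


# Hypothetical intensities for ten proteins
intensities = {"P%d" % i: float(i) for i in range(10)}
for rank, pentile in enumerate(split_into_pentiles(intensities), start=1):
    print(rank, pentile)
```

Each resulting list would then serve as the foreground, with the full direct EPS dataset as the background.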

2.2.8 Differential Expression Analysis

We aim to identify biomarkers that assist in placing patients accurately into either the low risk group, potentially suitable for active surveillance, or the intermediate/high risk group, which will receive more aggressive treatment. Hence, we are interested in proteins and peptides that are differentially expressed between patients placed into these risk groups. For statistical significance testing we split the cohort according to the patients’ diagnostic features into a low risk group (n = 76) and an intermediate/high risk group (n = 72). P values were calculated with the Mann-Whitney U test and adjusted for multiple comparisons using the Benjamini-Hochberg method; for this test, missing values were set to random values between 0.1 and 1.0, and the cutoff was set to a p value of 0.1 or less. The fold change between both groups was calculated using the median, with missing values set to NA, and a cutoff of 1.5 or higher was applied. Peptides were filtered for biological (proteotypic) and physicochemical (shorter than 22 amino acids, no missed cleavages, only doubly or triply charged peptides) parameters to meet the criteria for targeted mass spectrometry assays such as Parallel Reaction Monitoring (PRM).
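
The multiple testing adjustment and the fold change cutoff can be sketched as follows. This is a minimal illustration and not the R code used in this thesis: the p values are assumed to come from a Mann-Whitney U test computed elsewhere, intensities are assumed to be on a linear scale, and applying the 1.5 cutoff symmetrically in both directions is an assumption of this sketch. All function names are hypothetical.

```python
from statistics import median


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p value to enforce monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def median_fold_change(group_a, group_b):
    """Ratio of group medians; None entries (missing values) are ignored."""
    a = [v for v in group_a if v is not None]
    b = [v for v in group_b if v is not None]
    if not a or not b:
        return None
    return median(a) / median(b)


def passes_cutoffs(adj_p, fold_change, p_cutoff=0.1, fc_cutoff=1.5):
    """Apply the significance and fold change cutoffs (either direction)."""
    if fold_change is None or fold_change <= 0:
        return False
    return adj_p <= p_cutoff and max(fold_change, 1 / fold_change) >= fc_cutoff


# Illustrative values only
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
print(median_fold_change([2.0, 4.0, None], [1.0, 3.0]))
```

Ignoring missing values for the fold change while imputing them for the rank test, as described in the text, means the two statistics are computed on slightly different data; peptides with many missing values are therefore worth inspecting individually.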


3. Results

3.1 Method Optimization

Sample preparation methods in proteomics can vary in reproducibility and time requirement, impacting both data quality and expenses. To gain optimal results and achieve in-depth proteomic profiling, the choice of sample processing methods as well as LC-MS/MS settings is crucial. In order to achieve high-quality results, it is important to perform in-depth analyses that are reproducible over all samples of the study cohort. Especially in large-scale studies it is pivotal to keep time investment and expenses to a minimum while ensuring strong reproducibility and acquiring high-quality data.

3.1.1 MStern Digestion

Conventional sample processing methods in proteomics (i.e. in-solution-based sample processing) do not easily provide reproducibility and often do not allow for the high-throughput analysis needed in large cohort studies if no liquid handling robots are available. To overcome this issue, we applied a 96-well plate compatible, membrane-based proteomic sample processing method termed MStern Blotting87. Here, a large-pore hydrophobic polyvinylidene difluoride (PVDF) membrane binds proteins and allows fast liquid transfer through the membrane using vacuum suction, significantly reducing sample processing time while ensuring high reproducibility. Previous protocols in the Kislinger lab used molecular weight (MW) exclusion filtration, which led to tedious sample processing of up to 27 hours and required a large amount of starting material of up to 100 µg. To evaluate faster methods that require less starting material, both a C8 magnetic bead protocol and the MStern protocol were tested. Compared to both the MW filter and the C8 beads, the MStern approach allowed the shortest sample processing time of just 6.5 hours with a starting material of only 10-15 µg, equivalent to 200 µL of direct EPS (Figure 13). In comparison to the original MW-exclusion filtration method, MStern and C8 magnetic beads provided sufficient peptide yield for LC-MS and only a small decrease in the number of detected protein groups (Figure 14 left), while using 95% less starting material and 70% less time (Figure 13), resulting in a more efficient capture of the urinary proteome. The peptide yield after sample preparation was higher for MStern than for the C8 magnetic beads approach (Figure 14 right). MStern was chosen over C8 magnetic beads because of the higher throughput that can be achieved with its 96-well plate format. Using a magnet for a large batch of samples would be tedious, and the results of both methods are very comparable.

Figure 13: Comparison of sample preparation with three different methods: molecular weight (MW) filter, C8 magnetic beads, and MStern.

Figure 14: Number of protein groups (left) and peptide yield (right) achieved with MW-exclusion filtration method, MStern, and C8 magnetic beads.


The PVDF membrane that is used in the MStern approach is available as 96-well plates, allowing for parallel sample processing (Figure 15). With this experimental set-up all 148 direct EPS samples could be processed on the same day, significantly reducing batch effects during sample preparation. For further explanation of the MStern protocol, please refer to the Methods chapter.

Figure 15: MStern set-up: PVDF membrane in a 96-well plate on top of the vacuum suction unit. Some wells are filled with urine (instead of direct EPS) for display purposes.

3.1.2 High throughput SPE in 96 well format

Peptides obtained by enzymatic digestion need to be further purified to avoid long loading times or column blockage on the LC-MS/MS instrument. Concentration and purification are routinely achieved by binding peptides to reversed-phase material in microcolumns (for example C18) and then eluting them off-line for subsequent analysis by mass spectrometry1. In this approach we modified the C18-based solid phase extraction to allow multiplexed sample preparation of all 148 samples. Here, we used in-house made microcolumns termed stage tips92 (for STop And Go Extraction). These stage tips were prepared using regular 200 µL pipette tips with three C18 disks placed at the end of the pipette tip. The loading capacity per C18 disk is 2-4 µg of protein digest, and the disks allow fast loading with low backpressure (>300 µL/min for packed column using manual force)88. The stage tips were placed in a pipette tip rack fixed on top of a waste plate for the washing steps (Figure 16 left) and on top of a collection plate for the elution (Figure 16 right). This approach is significantly faster than our conventional solid phase extraction set-up, where each stage tip is placed in a reaction tube and is handled and centrifuged separately (1.5 hours for 12 samples would result in 18.5 hours for 148 samples, compared to 7 hours for 148 samples with the rack-based set-up). Placing all stage tips in a pipette tip rack allows the use of a multichannel pipette for all washing and elution steps as well as centrifugation of all 148 samples in a large-volume centrifuge at the same time.
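
The time estimate in this comparison can be reproduced with a short calculation. The sketch assumes that the conventional processing time scales linearly with the number of samples (a partial batch is counted proportionally), which is what the 18.5 hour figure implies; the function name is hypothetical.

```python
def sequential_hours(n_samples, batch_size=12, hours_per_batch=1.5):
    """Estimated time when samples are processed in consecutive batches.

    Assumption: time scales linearly, so a partial batch counts pro rata.
    """
    return n_samples / batch_size * hours_per_batch


conventional = sequential_hours(148)   # tube-based set-up
rack_based = 7.0                       # hours reported for the rack set-up
saving = 1 - rack_based / conventional

print(round(conventional, 1), "h vs", rack_based, "h")
print("time saved: %.0f%%" % (saving * 100))
```

Under this assumption the rack-based set-up takes roughly 60% less time than the tube-based one for the full cohort.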

Figure 16: Stage tip set up for solid phase extraction. Left: Stage tips placed in pipette tip rack on top of waste plate. Right: Stage tips in pipette tip rack placed on top of elution plate.

3.1.3 Internal standards SUC2 and iRT

With recent improvements in LC-MS technologies it is now feasible to analyze >100 clinical samples in a reasonably short time. However, the need to compare and integrate results across large-scale projects, laboratories, and platforms poses challenges, especially in proteomics studies using label-free quantification (LFQ). As different sample preparation methods and instrument settings have an impact on the acquired data, assessing sample processing and instrument performance is a pivotal step in proteomics. In LFQ projects this is usually done through internal standards that are added to each sample either prior to or after sample preparation to enable subsequent normalization (or quality control) of the resulting mass spectrometry data. Here, we used S. cerevisiae invertase and indexed Retention Time (iRT) peptides as two independent sets of internal standards. Invertase (SUC2) is a yeast protein that was used as an internal control to evaluate potential experimental variability throughout all steps of sample processing. The same amount of SUC2 was added to each sample prior to sample preparation. Therefore, it was bound to the PVDF membrane, enzymatically cleaved into peptides and SPE purified just like any other protein in the direct EPS samples. Potential variation introduced during sample processing would therefore show up as variability in the SUC2 intensity measured by the mass spectrometer. iRT peptides from Biognosys are 11 non-naturally occurring synthetic peptides with distinct and evenly distributed retention times throughout C18 reversed-phase LC gradients (Table 6). These iRT peptides allow accurate prediction of peptide retention for any chromatographic setup and are an efficient tool to monitor the performance of the LC system, enabling alignment of chromatographic peaks across acquisitions. The indexed Retention Time is a dimensionless value that defines the chromatographic retention of a peptide for a defined resin type (e.g. C18) relative to the iRT standard. The iRT stock was diluted 1:10 and one unit of iRT was added to each sample just before loading onto the LC column for analysis by the mass spectrometer. Using the reference sheet provided by Biognosys, the retention time of all 11 peptides was determined in a test run. For all 148 samples, the established retention times of the iRT peptides were monitored to check for chromatographic changes that would indicate a decrease in performance of the LC system.

Table 6: Peptide sequences of the 11 indexed Retention Time standards with the dimensionless iRT value that defines chromatographic retention of naturally occurring peptides.

Peptide Sequence    Dimensionless iRT
LGGNEQVTR           -24.92
GAGSSEPVTGLDAK      0
VEATFGVDESNAK       12.39
YILAGVENSK          19.79
TPVISGGPYEYR        28.71
TPVITGAPYEYR        33.38
DGLDAASYYAPVR       42.26
ADVTPADFSEWSK       54.62
GTFIIDPGGVIR        70.52
GTFIIDPAAVIR        87.23
LFLQFGAQGSPFLK      100
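
How the iRT scale can be used for retention time calibration and LC monitoring is sketched below. This is a minimal illustration under stated assumptions: a linear relationship between iRT and observed retention time, synthetic retention times, and a hypothetical drift tolerance of one minute. The function names are invented for this example and do not describe the monitoring procedure actually used in this thesis.

```python
# Dimensionless iRT values of the 11 standards (Table 6)
IRT_VALUES = {
    "LGGNEQVTR": -24.92, "GAGSSEPVTGLDAK": 0.0, "VEATFGVDESNAK": 12.39,
    "YILAGVENSK": 19.79, "TPVISGGPYEYR": 28.71, "TPVITGAPYEYR": 33.38,
    "DGLDAASYYAPVR": 42.26, "ADVTPADFSEWSK": 54.62, "GTFIIDPGGVIR": 70.52,
    "GTFIIDPAAVIR": 87.23, "LFLQFGAQGSPFLK": 100.0,
}


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate(observed_rt_min):
    """Fit retention time (min) as a linear function of iRT for one run."""
    peptides = [p for p in IRT_VALUES if p in observed_rt_min]
    x = [IRT_VALUES[p] for p in peptides]
    y = [observed_rt_min[p] for p in peptides]
    return fit_line(x, y)


def drifting_peptides(reference_rt, run_rt, tolerance_min=1.0):
    """Standards whose retention time moved more than tolerance_min."""
    return sorted(p for p in reference_rt
                  if p in run_rt
                  and abs(run_rt[p] - reference_rt[p]) > tolerance_min)


# Synthetic reference run: RT = 30 min + 0.8 min per iRT unit (made up)
reference_rt = {p: 30.0 + 0.8 * v for p, v in IRT_VALUES.items()}
slope, intercept = calibrate(reference_rt)
print("predicted RT for iRT 50:", round(slope * 50.0 + intercept, 2), "min")
```

Once the line is fitted for a run, the retention time of any peptide with a known iRT value can be predicted for that chromatographic setup, and a growing list of drifting standards signals declining LC performance.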

3.2 Risk group placement

Currently, prostate cancer patients are placed into risk groups and treatment options are recommended accordingly. In North America, most hospitals use International Society of Urological Pathology (ISUP) criteria (pre-treatment PSA, T category, and clinical Gleason score) for risk group placement. The advantage of the biological fluid used in this project is that information on both the clinical and the pathological level was obtained. This allows a wide variety of scientific questions to be asked, which are further described in the outlook chapter of this thesis. For the initial analysis, I decided to group patients into risk groups according to the ISUP criteria, based on the current standard of care in prostate cancer diagnosis. With this grouping I hope to find proteins that may be helpful in stratifying patients into current prostate cancer risk groups in a less invasive manner. Biochemical recurrence (BCR) in this direct EPS cohort supports this approach, revealing more aggressive cases (with biochemical recurrence) in the high risk group compared to the intermediate and low risk groups (Figure 17). Further parameters for patients with BCR are metastasis-free survival (MFS) and prostate cancer-specific mortality (PCSM), which predict the overall survival (OS) of men that undergo radical prostatectomy. While a rising PSA level universally antedates metastatic progression and PCSM, a PSA rise is not a surrogate for these survival endpoints93.

Up- and downgrading

One major clinical issue with prostate cancer treatment is the high incidence of under- and overdiagnosis (upgrading in 36.3% of 7643 patients94, 65.3% of 111 patients95). We compared diagnostic and pathologic information of all 148 patients and found that up- and downgrading based on the Gleason Score is a common phenomenon in this cohort (Figure 18). The cause could be of technical or biological origin. One technical cause could be that needle core biopsies only sample a small portion of the prostate gland, so tumors can easily be missed. Another can be the subjectivity of histological evaluation of biopsy tissues, leading to risk group misplacement. Inter-pathologist agreement in Gleason score differs between studies, with the literature reporting 9.9%-72% agreement96. For this cohort, agreement between diagnostic and pathologic Gleason score is 51.4% with a Cohen’s kappa of 0.226 (fair agreement). However, it is unknown whether the upgrading of patients is due to subjectivity or biology (time from diagnosis to treatment in this cohort ranges between 30 days and 3.5 years with a median of 3.7 months). When upgrading cases are excluded, agreement is 73.3% with a Cohen’s kappa of 0.583 (n = 86). A biological cause could be that some tumors are more aggressive than others, leading to disease progression between diagnosis and treatment. As selection criteria for this cohort, a diagnostic PSA level of below 20 ng/mL and a post-surgery PSA of 0.1 ng/mL were selected to exclude highly metastatic patients as well as patients. This might be a reason why the agreement between diagnostic and pathologic Gleason score is only 51.4% (Cohen’s kappa of 0.226). Another analysis of interest would focus on upgrading cases versus those that are stagnant.
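To make the two agreement measures quoted above concrete, the following Python sketch computes the raw agreement and Cohen's kappa for paired diagnostic and pathologic labels. It is an illustration only and not the code used for this thesis (the analysis was run in R); the function name `cohens_kappa` and the ten example Gleason scores are hypothetical and are not taken from the cohort.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Return (observed agreement, Cohen's kappa) for two paired label lists."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of patients with identical labels.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from the marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_chance == 1.0:
        # Both lists contain a single identical label: kappa is undefined.
        return p_observed, float("nan")
    return p_observed, (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical Gleason scores at diagnosis (biopsy) and after prostatectomy.
diagnostic = [6, 6, 6, 7, 7, 7, 7, 8, 6, 7]
pathologic = [6, 7, 6, 7, 7, 8, 7, 8, 7, 7]

agreement, kappa = cohens_kappa(diagnostic, pathologic)
print(f"agreement = {agreement:.1%}, kappa = {kappa:.3f}")
```

Kappa is lower than the raw agreement because it discounts the matches that would be expected from the label frequencies alone, which is why 51.4% agreement in this cohort corresponds to a kappa of only 0.226.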

Figure 17: Biochemical Recurrence (BCR) of patient cohort placed into risk groups based on pre-treatment PSA, diagnostic Gleason Score, and diagnostic tumor extent.


However, without knowing whether upgrading originated from technical or biological reasons it might be difficult to draw accurate conclusions. Up- and downgrading for this cohort between diagnostic and pathological level are illustrated in Figure 18.

Figure 18: Diagnostic patient information on the left side: circle fill colour shows Gleason score, stroke colour shows pre-PSA level in serum, text inside the circle shows T category. Patients are grouped into Gleason Score category on the diagnostic level. Circle with the same position on the right side shows the pathologic information of the same patient, illustrating up- and downgrading.


3.3 Data analysis

Data analysis was performed using RStudio. First, a data quality check was performed using quality control (QC) samples, iRT peptides and Saccharomyces cerevisiae invertase (SUC2). We then focused on identifying differentially expressed proteins over risk groups as well as differentially expressed peptides that could potentially be used for targeted assays in the future.

3.3.1 Data quality check

Before running the samples and after calibrating the system we ran quality controls to monitor the LC-MS/MS performance. The chromatogram of a representative run (sample E120) shows even distribution of all iRT and SUC2 peptides over the entire chromatographic gradient of 135 minutes (Figure 19). Concentrations of the internal standards were chosen to generate lower intensity peaks. Internal standards can elute at the same time as endogenous peptides, potentially resulting in signal suppression. I hence selected standard concentrations that are readily detectable but result in lower intensity signals.

Figure 19: Chromatogram of sample E120 with peaks of iRT peptides (red) and SUC2 peptides (blue). On the right side: enlarged chromatogram of internal standard peptides.

iRT peptides

As mentioned earlier, iRT peptides allow accurate prediction of peptide retention for any chromatographic setup and are an efficient tool to monitor the performance of the LC system. In data-independent acquisition (DIA) they are often used to align LC peaks across multiple runs and enable improved matching to spectral libraries. In Figure 20, I show iRT peptide elution over all 148 direct EPS samples analyzed as part of this MSc project. Overall, iRT peptide retention times were reasonably stable with a variation of 0.2% to 3.4% (Table 7).

Figure 20: Retention in all 148 samples for each iRT peptide.


Table 7: Mean and standard deviation of each iRT peptide’s retention time over all 148 samples.

iRT Peptide       Mean [min]   Standard Deviation [min]   Standard Deviation [%]
ADVTPADFSEWSK     88.39        0.31                       0.3%
DGLDAASYYAPVR     85.62        0.20                       0.2%
GAGSSEPVTGLDAK    43.28        0.80                       1.9%
GTFIIDPAAVIR      110.28       0.47                       0.4%
GTFIIDPGGVIR      99.52        0.40                       0.4%
LFLQFGAQGSPFLK    117.52       0.84                       0.7%
LGGNEQVTR         23.76        0.80                       3.4%
TPVISGGPYEYR      66.41        0.71                       1.1%
TPVITGAPYEYR      70.73        0.51                       0.7%
VEATFGVDESNAK     54.65        0.43                       0.8%
YILAGVENSK        58.15        0.96                       1.7%
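The relationship between the dimensionless iRT values (Table 6) and the mean retention times measured on the 135-minute gradient (Table 7) can be illustrated with a simple linear fit. The Python sketch below is an illustration and not part of the analysis performed for this thesis; the ordinary least-squares fit and the helper names `fit_line`, `predict_rt` and `rt_to_irt` are assumptions made for this example.

```python
# (dimensionless iRT from Table 6, mean retention time in minutes from Table 7)
IRT_STANDARDS = {
    "LGGNEQVTR": (-24.92, 23.76),
    "GAGSSEPVTGLDAK": (0.0, 43.28),
    "VEATFGVDESNAK": (12.39, 54.65),
    "YILAGVENSK": (19.79, 58.15),
    "TPVISGGPYEYR": (28.71, 66.41),
    "TPVITGAPYEYR": (33.38, 70.73),
    "DGLDAASYYAPVR": (42.26, 85.62),
    "ADVTPADFSEWSK": (54.62, 88.39),
    "GTFIIDPGGVIR": (70.52, 99.52),
    "GTFIIDPAAVIR": (87.23, 110.28),
    "LFLQFGAQGSPFLK": (100.0, 117.52),
}


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, plus R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


irt_values = [pair[0] for pair in IRT_STANDARDS.values()]
mean_rts = [pair[1] for pair in IRT_STANDARDS.values()]
slope, intercept, r_squared = fit_line(irt_values, mean_rts)


def predict_rt(irt):
    """Expected retention time (min) on this gradient for a given iRT value."""
    return slope * irt + intercept


def rt_to_irt(retention_time):
    """Convert an observed retention time (min) back to the iRT scale."""
    return (retention_time - intercept) / slope


print(f"RT = {slope:.3f} * iRT + {intercept:.2f} (R^2 = {r_squared:.3f})")
```

A run in which the standards deviate clearly from such a fitted line would point to a change in chromatographic performance, which is the kind of shift that was monitored here.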

Taking a closer look at each of the 11 iRT peptides revealed that not every single peptide was detected in every sample. Before adding iRT peptides to each sample we diluted the stock to reduce cost. As our main focus was to monitor the overall LC performance, detection of every iRT peptide is not absolutely crucial. Three peptides were detected in every single sample and are plotted in colour in Figure 21. The slight shift in retention time for samples 42-44 suggested potential variation in LC performance. After calibrating the liquid chromatography system, the iRT peptides eluted at the previous retention time, suggesting that the LC system performance was successfully recovered. The intensity detected for all 11 iRT peptides was consistent between individual samples, at 4.27 × 10^8 ± 0.56 × 10^8 (95% confidence interval; Figure 22), and between risk groups (Mann-Whitney U test with Benjamini-Hochberg adjustment of all iRT peptides combined: low – int: 0.65, low – high: 0.87, int – high: 0.65) (Figure 23).


Figure 21: Retention time of each iRT peptide over all 148 runs.


Figure 22: Intensity of all 11 iRT peptides in 148 samples.

Figure 23: Intensity of all 11 iRT peptides in 148 samples plotted per risk group.


SUC2 Invertase

As described in the methods chapter, Saccharomyces cerevisiae invertase (SUC2) is a yeast protein that shows no homology to proteins in the human database, which allows for its use as an internal standard. Here, it was used to monitor potential variation introduced during sample preparation. The SUC2 intensities detected were mostly consistent between individual samples, at 3.24 × 10^9 ± 0.52 × 10^9 (95% confidence interval; Figure 24), showing larger differences than the iRT peptides. Similarly, SUC2 intensities between risk groups show higher variation than iRT peptides. However, SUC2 changes between risk groups show p values of 0.15-0.95 and are therefore not significant (Mann-Whitney U test with Benjamini-Hochberg adjustment of SUC2 intensities: low – int: 0.95, low – high: 0.15, int – high: 0.15) (Figure 25).

Figure 24: Intensity of SUC2 invertase in 148 samples.

Figure 25: Intensity of SUC2 invertase in 148 samples plotted per risk group.

Higher variation of SUC2 between samples compared to iRT intensities is to be expected, as SUC2 was added at the beginning of the sample preparation as opposed to directly before the LC-MS/MS analysis. Hence, SUC2 variation can be due to differences in sample preparation as well as mass spectrometric analysis, whereas iRT variation is only due to the latter. Overall, thorough analysis of both internal standards confirmed the quality of this dataset across all 148 runs, with no significant changes between risk groups for either iRT peptides (BH adjusted p values of 0.65-0.87) or SUC2 (BH adjusted p values of 0.15-0.95) and a relative deviation of at most 16% at a 95% confidence interval (13.1% for iRT peptides and 16.0% for SUC2).
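The two percentages quoted for the internal standards follow directly from the reported intensities (mean ± deviation). The short Python check below reproduces this arithmetic; the function name `relative_spread` is chosen here for illustration only.

```python
def relative_spread(mean_intensity, spread):
    """Express the +/- spread as a percentage of the mean intensity."""
    return 100.0 * spread / mean_intensity


# Values as reported above for the Toronto cohort (148 samples).
irt_percent = relative_spread(4.27e8, 0.56e8)   # all 11 iRT peptides
suc2_percent = relative_spread(3.24e9, 0.52e9)  # SUC2 invertase

print(f"iRT peptides: {irt_percent:.1f}%")   # prints 13.1%
print(f"SUC2:         {suc2_percent:.1f}%")  # prints 16.0%
```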


3.3.2 Comparison to previously published datasets

Next, I compared the proteomics results obtained as part of this study to previously published direct EPS proteomes97,98. Both Drake et al. (n = 9) and Kim et al. (n = 16) used a different approach for sample preparation (mass weight filter) and data acquisition (9-step MudPIT analysis). The yield of unique protein groups detected in all three datasets is vastly different with a 5-fold increase of protein groups per sample detected in my dataset (Figure 26).

Figure 26: Number of protein groups detected in direct EPS in previous projects in the lab (Drake et al. 2010 and Kim et al. 2012) and in current project (Fritsch 2019).

Qualitative comparison between all three projects revealed that most protein groups of the previous projects were also detected in the current project (891 of 1019, Drake et al. and Kim et al. combined) (Figure 27). Only 128 protein groups found in previous projects were not detected in this project. Reasons for this can be the different approaches to sample collection and processing, different mass spectrometry instrumentation and methods, and ultimately the protein group assembly performed by MaxQuant, which may differ slightly. Astonishingly, the large increase in detected protein groups was achieved using less starting material (7 µg compared to 100 µg), faster sample preparation (6.5 hours instead of 27 hours) and a far shorter gradient (135 minutes instead of 18 hours) (Table 8). This increase is primarily due to technical improvements in recent years and a novel and quicker sample preparation approach.

Figure 27: Qualitative comparison of previous (Drake in 2010 and Kim in 2012) and current (Fritsch in 2019) direct EPS projects in the Kislinger lab based on protein groups.

Table 8: Comparison of sample processing and results of previous and current projects in the Kislinger lab


3.3.3 Data overview (protein numbers, GO analysis)

In the direct EPS dataset 2865 protein groups and 44693 peptides were detected in total. In the low risk group, an average of 1968 protein groups and 13689 peptides were detected, compared to 1864 protein groups and 12599 peptides in the intermediate risk group, and 1842 protein groups and 12208 peptides in the high-risk group (Figure 28). The large difference between total peptides (44693) and average peptides detected per sample (13689) can be due to several reasons, the most likely being biological differences between the 148 patients and peptides with missed cleavages that may not be detected in any other sample. Furthermore, a higher inter-sample variability at the peptide level than at the protein group level is to be expected and was observed in this dataset as well.

Figure 28: Protein groups and peptides detected in each sample grouped into risk groups.

The numbers of detected protein groups and peptides decrease across risk groups (from low to high). The exact reason for this trend remains unclear; however, a potential cause could be that low risk samples contain both naturally occurring and tumor-elevated proteins, whereas higher-risk samples express primarily those proteins that are enriched in tumors. An additional reason may be histologic changes in the tissue, with increased dedifferentiation in the high-risk group, which could potentially result in compromised secretory function. Of all 2865 protein groups that were detected in this dataset, 336 protein groups were found in all 148 samples (Figure 29). I separated the 2865 protein groups into pentiles based on their median abundance (pentiles are colour coded in Figure 29). As expected, high-abundance protein groups (pentile 1) were detected in a larger fraction of samples and the majority of lower-abundance protein groups (pentile 5) were found in a smaller number of samples. Protein groups detected with the highest abundance include Albumin and other blood proteins like AZGP1 (Alpha-2-Glycoprotein 1).
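The pentile assignment described above can be sketched as follows. This Python example is a minimal illustration, assuming that protein groups are ranked by median abundance and split into five equally sized bins with pentile 1 holding the most abundant groups; the function name `assign_pentiles` and the ten example entries are hypothetical.

```python
from collections import Counter


def assign_pentiles(median_abundance):
    """Map each protein group to a pentile (1 = most abundant, 5 = least)."""
    ranked = sorted(median_abundance, key=median_abundance.get, reverse=True)
    n = len(ranked)
    # Integer arithmetic splits the ranked list into five near-equal bins.
    return {name: (position * 5) // n + 1 for position, name in enumerate(ranked)}


# Hypothetical median intensities for ten protein groups.
example = {f"PG{i:02d}": 10.0 ** (10 - i) for i in range(1, 11)}
pentiles = assign_pentiles(example)
bin_sizes = Counter(pentiles.values())

print(pentiles)
print(dict(bin_sizes))
```

With 2865 protein groups this scheme places 573 protein groups in each pentile.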

Figure 29: Distribution of protein quantitation measured as median intensity by the number of samples they are detected in. Bar plot on top shows the total counts of proteins quantified in various number of samples. Colour indicates what quantitative pentile of the dataset the protein group was detected in.


AZGP1, however, is also associated with worse clinical outcomes in prostate cancer (protein levels measured by immunohistochemistry and RNA expression measured by RNA in situ hybridization on a tissue microarray of 1275 radical prostatectomy patients)100. Most proteins encoded by prostate cancer driver genes were detected in all 148 samples of this dataset (KLK3, ACPP, TGM4, TMPRSS2, CHD1, and FOLH1), with the exception of PSCA (n = 111), NEFH (n = 137), KLK2 (n = 147), and NKX3-1 (n = 97). Kallikrein Related Peptidase 3 (KLK3) is associated with prostate cancer risk, disease-specific survival and pre-diagnostic PSA levels and encodes PSA, which is specific for the prostate101. Acid phosphatase prostate (ACPP) is secreted by epithelial cells of the prostate gland and is associated with prostate cancer102. Transglutaminase 4 (TGM4) catalyzes the cross-linking of proteins and the conjugation of polyamines to specific proteins in the seminal tract and is found to be overexpressed in prostate cancer (immunostaining on tissue microarray of tumor tissue and paracarcinoma tissue of 159 radical prostatectomy patients)103. Contradictory findings were published regarding the prognostic role of Prostate Stem Cell Antigen (PSCA), namely whether its expression is associated with a favorable (immunohistochemistry of radical prostatectomy specimens from 13,660 patients of different risk groups)104 or unfavorable (immunohistochemistry of 25 normal tissues, 112 primary prostate cancers and nine prostate cancers metastatic to bone)105 outcome. Kallikrein Related Peptidase 2 (KLK2) is primarily expressed in prostatic tissue and is responsible for cleaving pro-prostate-specific antigen into its enzymatically active form. It is found to be highly expressed in prostate tumor cells and may be a prognostic marker for prostate cancer risk (immunohistochemistry of six hormone refractory prostate cancer specimens, defined by failure in complete androgen blockade therapy and 12 hormone-sensitive prostate cancer specimens, transient DNA transfection, and promoter reporter assay of four cell lines, xenograft animal models of 15 mice)106. Transmembrane Serine Protease 2 (TMPRSS2) was demonstrated to be up-regulated by androgenic hormones in prostate cancer cells and down-regulated in androgen-independent prostate cancer tissue. Numerous studies have evaluated the association of TMPRSS2:ERG and outcome of prostate cancer patients with varying results107. A loss of function of Cadherin 1 (CHD1) is thought to contribute to cancer progression by increasing proliferation, invasion, and/or metastasis and is known to be a prostate-specific tumor suppressor108. Neurofilament Heavy (NEFH) was found to be downregulated in prostate cancer (RNA extracted from radical prostatectomy specimens obtained from seven patients)109. Folate Hydrolase 1 (FOLH1), also called PSMA (Prostate Specific Membrane Antigen), is up-regulated in cancerous cells in the prostate and is used as an effective diagnostic and prognostic indicator of prostate cancer. It is used in gene therapy approaches to prostate cancer110. NK3 Homeobox 1 (NKX3-1) is a negative regulator of epithelial cell growth in prostate tissue and a marker of prostatic origin in metastatic tumors111.

All genes discussed above are associated with prostate cancer and some were found to be up- or downregulated in prostate cancer. The latter are potentially interesting for stratifying patients into prostate cancer risk groups. However, most research groups compare non-cancerous to cancerous tissue as opposed to evaluating different risk groups against one another. Unfortunately, research on the genes discussed above reveals contradictory study results regarding their expression in cancer compared to normal tissue for some genes (e.g. PSCA). Other genes like TMPRSS2 have been found to behave differently in various prostate cancer subtypes. Hence, the complexity of prostate cancer leads to challenges in understanding the cancer biology and finding more accurate ways of diagnosing and prognosing prostate cancer.

Gene Ontology

To gain a better understanding of the direct EPS dataset I performed gene ontology analysis with the human proteome as background. The proteome is the entire set of proteins that is, or can be, expressed by a genome, cell, tissue, or organism at a certain time112. Hence, terms that are significantly over- or underrepresented in my data are of special interest as they represent the uniqueness of direct EPS compared to the available ‘human proteome’. A number of Gene Ontology terms were significantly enriched in the direct EPS data (Figure 30). They were mainly implicated in immune system process and cell killing in the biological process category (BP), as well as structural molecule activity and downregulation of transcription regulator activity in the molecular function category (MF). Notably, many proteins were recognized as blood microparticles and circulating immunoglobulin complexes and as localized in the membrane-enclosed lumen and extracellular region, as indicated by their overrepresentation in Gene Ontology terms. Of the UniProt keywords, acute phase, proteasome, and signalosome are the most over-represented categories, whereas processes in DNA binding and transcription seem to be under-represented in the direct EPS dataset compared to the whole human proteome.

Figure 30: Functional analysis of over- and under-represented terms of the acquired direct EPS proteome. Annotations were sourced from UniProt keywords and slim GeneOntology. The spot size represents the magnitude of enrichment, and color denotes over- (red) and under-represented (blue) terms. The background shade denotes Benjamini-Hochberg corrected p values.

3.3.4 Differentially abundant peptides

Current methods to diagnose and prognose prostate cancer fail to accurately place patients into clinical risk groups. Therefore, we analyzed the direct EPS data to identify biomarker candidates that are potentially related to prostate cancer risk groups. As explained in chapter ‘1.2.3 Current diagnosis and risk stratification of prostate cancer’ we categorized patients into risk groups based on the ISUP criteria (pre-PSA, tumor extent, and Gleason Score). The direct EPS cohort was compared with a prostate tissue dataset113 (n = 76) to focus on peptides also detected in primary tumor tissue.

For statistical analysis we compared samples from the low risk group (n = 76) to a combined group of intermediate and high risk patients (n = 72). My goal was to differentiate between patients that most likely need treatment compared to those that are more likely placed on active surveillance. Up- and downregulation of peptides over risk groups was determined according to the abundance trend between low (n = 76), intermediate (n = 50) and high risk patients (n = 22).

I used 26328 peptides for further evaluation, which were detected in both the prostate tissue and the direct EPS dataset (Figure 31). The long-term goal of this project is to develop targeted mass spectrometry assays for verification and validation. This requires certain biological and physicochemical properties of potential peptides of interest:

- Uniqueness: only proteotypic peptides (uniquely identify one protein within the human protein sequence database) should be selected

- Peptide length of less than 22 amino acids: Peptides longer than 21 amino acids tend to have singly charged fragments with m/z’s that exceed the functional mass range of the mass spectrometer114. Also, synthetic peptides for targeted assays tend to be more expensive the longer the sequence.

- No mis-cleavages: when using trypsin to digest proteins to peptides they should be fully tryptic without mis-cleavages

- A peptide charge state that fragments better and generates the most sensitive measurements should be selected. Doubly or triply charged precursor ions are favorable due to their measurable m/z ranges.
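The four criteria listed above can be expressed as a simple filter. The Python sketch below is illustrative and not the filtering code used for this thesis. The `Peptide` record, the function names, the charge states and the `HYPO` protein identifiers are hypothetical. It also assumes that every internal K or R counts as a missed cleavage and that fully tryptic peptides end in K or R, which ignores protein C-terminal peptides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peptide:
    sequence: str
    charge: int
    protein_ids: tuple  # all protein entries the sequence maps to


def missed_cleavages(sequence):
    """Count K/R residues that are not the C-terminal residue."""
    return sum(1 for residue in sequence[:-1] if residue in "KR")


def passes_filters(peptide, max_length=21):
    """Apply the uniqueness, length, cleavage and charge criteria."""
    if not peptide.sequence:
        return False
    is_proteotypic = len(set(peptide.protein_ids)) == 1
    short_enough = len(peptide.sequence) <= max_length  # i.e. fewer than 22 residues
    fully_tryptic = (peptide.sequence[-1] in "KR"
                     and missed_cleavages(peptide.sequence) == 0)
    favourable_charge = peptide.charge in (2, 3)
    return is_proteotypic and short_enough and fully_tryptic and favourable_charge


# Sequences of the first two entries appear in this chapter; charge states
# and the remaining entries are made up for illustration.
candidates = [
    Peptide("STIGIGIEGFSELIQR", 2, ("ALOX15B",)),
    Peptide("NYTPQLSEAEVER", 2, ("MMP8",)),
    Peptide("AAKGGSEVLR", 2, ("HYPO1",)),          # one missed cleavage
    Peptide("LLEDGSFR", 2, ("HYPO1", "HYPO2")),    # shared between proteins
    Peptide("DWEMHVHFK", 4, ("LMAN2",)),           # hypothetical charge 4+
]

kept = [p.sequence for p in candidates if passes_filters(p)]
print(kept)
```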

After filtering for all biological and physicochemical properties, the pool of potential peptide candidates dropped to 15645. The data are not normally distributed, which is why the Mann-Whitney U test with a Benjamini-Hochberg adjustment for multiple comparison testing was chosen. The statistical significance threshold was set to 0.1 at this stage of the project and will be adjusted to a more conservative value at the verification and validation stages.
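The testing scheme described above can be sketched in a few lines. The Python example below is a simplified illustration and not the code used for this thesis: it uses the normal approximation of the Mann-Whitney U test with tie correction and no continuity correction, so p values can differ slightly from an exact test, and all intensities and peptide names are hypothetical. In R, `wilcox.test` and `p.adjust(method = "BH")` provide the corresponding functionality.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U of group_a, two-sided p value) using the normal approximation."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both groups need at least one value")
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    n = n_a + n_b
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u_a, 1.0  # all values identical: no evidence of a shift
    z = (u_a - n_a * n_b / 2) / math.sqrt(variance)
    return u_a, math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(p_values):
    """Return BH-adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical log-intensities: (low risk, intermediate + high risk).
peptides = {
    "PEPTIDE_A": ([9.1, 9.4, 9.0, 9.6, 9.3, 9.5], [8.1, 8.4, 8.0, 8.6, 8.3, 8.2]),
    "PEPTIDE_B": ([7.0, 7.4, 7.2, 7.6, 7.1, 7.5], [7.3, 7.05, 7.45, 7.15, 7.55, 7.25]),
    "PEPTIDE_C": ([5.0, 5.2, 5.1, 5.6, 5.3, 5.4], [5.5, 5.9, 5.7, 6.0, 5.8, 5.25]),
}
names = list(peptides)
raw_p = [mann_whitney_u(*peptides[name])[1] for name in names]
adjusted_p = benjamini_hochberg(raw_p)
significant = [name for name, p in zip(names, adjusted_p) if p <= 0.1]
print(significant)
```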

Figure 31: Flowchart of filters applied to identify differentially abundant peptides in the acquired direct EPS data.

The analysis yielded 2179 peptides showing statistical significance at a Benjamini-Hochberg adjusted p-value of ≤ 0.1. This pool of peptides was then split into groups based on their trend over all three risk groups. A consistent decreasing trend is shown by 888 peptides (e.g. STIGIGIEGFSELIQR of ALOX15B, Figure 32 A) and a consistent increasing trend by 478 peptides (e.g. NYTPQLSEAEVER of MMP8, Figure 32 B).


Figure 32: A) Peptide STIGIGIEGFSELIQR of ALOX15B showing a decreasing trend over risk groups; B) Peptide NYTPQLSEAEVER of MMP8 showing an increasing trend over risk groups; C) Peptide AAGCDFTNVVK of HRSP12 showing no clear trend over risk groups.


There are 813 peptides that are significant but do not show a clear up- or downregulated trend. These peptides show, for example, higher abundance in the low-risk group, lower abundance in the intermediate-risk group, and higher abundance again in the high-risk group, but passed testing for significance because we only compared the low risk group with the combined intermediate and high risk groups (e.g. AAGCDFTNVVK of HRSP12, Figure 32 C). Finally, only quantotypic peptides among the remaining 888 and 478 peptides were selected. Quantotypic peptides are characterized by a correlation between their abundance and the abundance of their parent protein115. The abundance of a protein is assembled from the intensity profiles of all the peptides mapped to this specific protein. Some of these peptides might show a slightly different trend in abundance than their parent protein does. For instance, DWEMHVHFK of LMAN2 is slightly upregulated in abundance over risk groups, whereas LMAN2 itself shows a downregulated trend (Figure 33).
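The split into decreasing, increasing and unclear trends can be illustrated with a small classifier. The Python sketch below assumes that the trend is judged from the median abundance per risk group, which is an assumption made for this example since the summary statistic is not specified here; the function name `classify_trend` and all intensities are hypothetical.

```python
from statistics import median


def classify_trend(low, intermediate, high):
    """Label a peptide by the order of its group medians (low -> int -> high)."""
    m_low, m_int, m_high = median(low), median(intermediate), median(high)
    if m_low > m_int > m_high:
        return "decreasing"
    if m_low < m_int < m_high:
        return "increasing"
    return "no clear trend"


# Hypothetical log-intensities per risk group.
print(classify_trend([9.2, 9.5, 9.1], [8.6, 8.4, 8.8], [8.0, 7.9, 8.2]))
print(classify_trend([6.0, 6.2, 6.1], [6.6, 6.5, 6.9], [7.3, 7.1, 7.4]))
# High in low risk, lower in intermediate, high again in high risk:
print(classify_trend([9.0, 9.2, 9.1], [8.0, 8.1, 7.9], [9.3, 9.1, 9.4]))
```

A peptide of the third kind can still be significant in the two-group comparison (low versus combined intermediate and high), which is how the 813 peptides without a clear trend arise.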

Figure 33: Abundance of LMAN2 in direct EPS (left) and abundance of peptide DWEMHVHFK of LMAN2 in direct EPS (right), showing the opposite abundance trend to the protein group it is mapped to.

This can be explained by the lower abundance (10^6) of the peptide, technical issues such as signal suppression when a high number of peptides is analyzed at the same time, and slight differences during enzymatic digestion. In total, 406 increasing and 865 decreasing quantotypic peptides were found. After applying all biological, physicochemical, and statistical filters, a total of 1271 peptides remained and were used for further analysis.


Analysis of 1271 (865 decreasing and 406 increasing) peptides

The fold change of these 1271 peptides between the two designated risk-group categories (low vs. combined intermediate and high) ranges from -6.23 to +6.86; however, most peptides show a change of around 2.5-fold (Figure 34 upper panel). Peptide HEQVYIR of ACPP shows the largest fold change (6.23) among peptides with a decreasing trend over risk groups and was detected in 137 samples.
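One common way to obtain signed fold changes such as the range reported above is to report ratios below 1 as their negative reciprocal. The Python sketch below illustrates this convention. It is an assumption made for this example and not a statement of how the values in this thesis were computed; the function name `signed_fold_change` and the input intensities are hypothetical.

```python
def signed_fold_change(mean_low, mean_int_high):
    """Fold change of (intermediate + high) over low risk, signed by direction.

    Ratios >= 1 are returned as is (upregulated); ratios < 1 are returned
    as the negative reciprocal (downregulated).
    """
    if mean_low <= 0 or mean_int_high <= 0:
        raise ValueError("mean intensities must be positive")
    ratio = mean_int_high / mean_low
    return ratio if ratio >= 1 else -1.0 / ratio


# Hypothetical mean intensities (arbitrary units).
up = signed_fold_change(mean_low=100.0, mean_int_high=686.0)
down = signed_fold_change(mean_low=623.0, mean_int_high=100.0)
print(f"{up:+.2f}, {down:+.2f}")
```

Under this convention a peptide that is 6.23 times more abundant in the low risk group is reported as -6.23.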

Figure 34: Upper panel shows the fold change (intermediate + high over low risk group: downregulated peptides on the left and upregulated peptides towards the right of the plot) of 1271 quantotypic peptides that remain after applying filters and testing for significance. The colour indicates peptides that are detected in all (blue) or only some (grey) of the 148 samples. The lower panel shows the exact number of samples in which each peptide was detected (peptide sequences are mapped to the upper panel).


The largest fold change for peptides with an increasing trend is shown by TSTADYAMFK of Fibrinogen Gamma Chain (FGG), with a fold change of 6.86; this peptide was detected in 79 samples. In fact, 6 of the ten peptides with the largest fold change are mapped to FGG, with a fold change range of 3.99-6.86, and were detected in an average of 98.5 samples (52-125 samples). Three other peptides of the top ten are mapped to Fibrinogen Beta Chain (FGB), with a fold change range of 3.98-4.37, and were detected in an average of 80 samples (71-94 samples). Fibrinogen has been shown to promote growth of lung and prostate cancer cells116. Only 60 peptides are detected in all 148 samples (34 increasing and 26 decreasing); 66.5% (845) of the 1271 peptides are detected in at least 100 samples and 84.5% (1074) are detected in at least 50% of the samples (74 samples) (Figure 34 lower panel and Figure 35).

Figure 35: Number of peptides (y axis) that are detected in a certain number of samples (x axis), showing that 60 peptides are detected in all 148 samples.

Those 60 peptides detected in all samples were selected for further analysis. Initially, peptides detected in all samples were mapped to their gene product (if multiple peptides were detected, then the average was reported) (Figure 36).

Figure 36: 60 peptides plotted by gene name. The fold change was calculated using the fold change of the peptide of interest and the average of the peptides of interest in case of multiple peptides per gene. The colour represents the peptide count per gene of the pool of 60 peptides.

Downregulated peptides: Of the peptides that are downregulated (more abundant in the low-risk and less abundant in the intermediate/high-risk groups), 62% (16 of 26) are mapped to prostate-related genes such as KLK3, PEBP1, ACPP, DBI, TMPRSS2, and AZGP1 (Figure 35). Prostatic Binding Protein (PEBP1) shows decreased protein expression in prostate cancer tissue and was found to be a potential prognostic marker in prostate cancer117,118. Acid Phosphatase Prostate (ACPP) is secreted by epithelial cells and known as a tumor suppressor102. Diazepam Binding Inhibitor (DBI) was found to become dysregulated during progression to androgen independence119. Transmembrane Serine Protease 2 (TMPRSS2) is associated with prostate cancer. Numerous studies have evaluated the association of TMPRSS2-ERG and outcome of prostate cancer patients with varying results107. Loss of Alpha-2-Glycoprotein 1 (AZGP1) expression is associated with worse clinical outcomes in a multi-institutional radical prostatectomy cohort100. Creatine Kinase B (CKB) shows the highest average fold change of 2.07 with one peptide and is directly followed by ACPP with 7 peptides and an average fold change of 1.98. CKB is a cytoplasmic enzyme involved in energy homeostasis that has been associated with disease of the prostate gland120. AZGP1 was quantified by 4 peptides with an average fold change of 1.42. DBI has an average fold change of 1.34 with 2 peptides and the average fold change of PEBP1 is 1.28, also with 2 peptides. All four genes have been associated with prostate cancer before. As an example, the abundances of ATQIPSYK and FQELESETLK, which are both mapped to ACPP, show a clear and significant decreasing trend over all three risk groups (Figure 37).

Figure 37: Abundance over risk groups of ATQIPSYK (left) and FQELESETLK (right) mapped to ACPP showing a clear and significant decreasing trend.


Upregulated Peptides: All peptides that are upregulated (less abundant in the low-risk and more abundant in the intermediate/high-risk groups) are mapped to plasma proteins or proteins of the immune system (Figure 35). However, some of these were also found to play a significant role in prostate cancer, such as serum Albumin (ALB) and IGHA1. Serum ALB level, which is commonly used to assess the nutritional status, was found to be an important prognostic factor in advanced cancer121,122. Low pre-operative levels of serum albumin could predict lymph node metastases and ultimately correlated with a biochemical recurrence of prostate cancer in radical prostatectomy patients121,122. The same albumin trend was found in previous projects in the lab as well97,98. Immunoglobulin Heavy Constant Alpha 1 (IGHA1) takes part in the binding and uptake of ligands by scavenger receptors and in vesicle-mediated transport and was shown to be involved in prostate cancer118. The gene product with the highest average fold change of 2.49 is C3 with 3 peptides that meet the filtering criteria. IGHA1 is represented by 2 peptides with an average fold change of 2.13. Transferrin (TF) has an average fold change of 2.04 with 6 peptides and the average fold change of ALB is 1.75 with 9 peptides. Besides IGHA1 there is also IGHG1 that was found to be upregulated with a fold change of 1.89, represented by one peptide. Both have previously been studied along with IGHD, IGHG3, IGLC2, and IGLJ3 as biomarkers for better prognosis in triple-negative breast cancer as they are related to B cell-specific immunoglobulin and were found to function as tumor suppressors123. Another interesting fact is that TF (6 peptides with a fold change of 2.04), HPX (1 peptide with a fold change of 1.48), and LTF (1 peptide with a fold change of 2.29) are all involved in iron metabolism. Hemopexin (HPX) binds heme and transports it to the liver for breakdown and iron recovery118. Additionally, the hemopexin domain is a component of the majority of Matrix Metallopeptidases (MMP; exceptions are MMP-7, MMP-23, and MMP-26), which were found to be associated with prostate cancer124. Lactotransferrin (LTF) regulates iron homeostasis and LTF receptors were found to be overexpressed on prostate cancer cell lines125,126. These data suggest that iron is a micronutrient that supports prostate cancer cell growth and survival. In this context, it has been shown that iron-rich media support cancer cell growth, while iron starvation limits their growth127–129. Iron overload can cause an increase in reactive oxygen species, which are harmful to cell structures. However, cancer cells have developed mechanisms (e.g. increased activity of enzymes with antioxidant properties) that protect them against oxidative damage129. As an example, the abundances of DGAGDVAFVK and HSTIFENLANK, which are both mapped to TF, show a clear and significant increasing trend over all three risk groups (Figure 38). More peptides can be found in the appendix. Another interesting observation in the gene list of the 60 peptides is the presence of SERPINC1 (one peptide with a fold change of 1.81), SERPINA1 (three peptides with an average fold change of 1.67), and SERPINA3 (one peptide with a fold change of 1.84). Serpins are serine proteinase inhibitors and several serpins have been implicated in cancer progression and metastasis130.

Figure 38: Abundance over risk groups of DGAGDVAFVK (left) and HSTIFENLANK (right) mapped to TF showing a clear and significant decreasing trend.


4. Comparison to Independent Direct EPS Data

Independent of my proteomics analyses, our collaborators at the Eastern Virginia Medical School, Norfolk, Virginia, USA analyzed a cohort of 115 direct EPS samples using the same sample preparation protocol and a similar LC-MS set-up (Thermo Orbitrap Fusion Lumos). A subset of the samples analyzed in Virginia (n = 59) overlapped between both sites. Samples run at both sites have a correlation between 0.67 and 0.82 (Supplemental Figure 1). As described above, samples were placed in low (n = 42), intermediate (n = 55), and high (n = 18) risk groups according to ISUP criteria (Figure 39). Patient information was recorded in the same way as for my cohort and is available to the same extent (Supplemental Table 2).

Figure 39: Direct EPS cohort that was run in Virginia, USA. 59 of these patients overlap with the direct EPS cohort that was run in Toronto.

4.1 Data Quality for Virginia Cohort

Internal standard iRT peptides and S. cerevisiae invertase (SUC2) were used to monitor data quality, as for my own data. The abundance of all 11 iRT peptides was very consistent between individual samples, at 8.50 × 10⁸ ± 1.18 × 10⁸ with a confidence interval of 95% (Figure 40), and between risk groups (Figure 41) (Mann-Whitney U test with Benjamini-Hochberg adjustment of all iRT peptides combined: low – int: 0.77, low – high: 0.55, int – high: 0.69).


Figure 40: Abundance of all 11 iRT peptides in direct EPS data acquired in Virginia.

Figure 41: Abundance of all 11 iRT peptides per sample and risk group in direct EPS data acquired in Virginia.

The SUC2 intensities detected were mostly consistent between individual samples, at 5.68 × 10⁹ ± 0.51 × 10⁹ with a confidence interval of 95% (Figure 42). SUC2 changes between risk groups show p values of 0.15-0.95 and are therefore not significant (Mann-Whitney U test with Benjamini-Hochberg adjustment of SUC2 invertase intensities: low – int: 0.36, low – high: 0.85, int – high: 0.41) (Figure 43).
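The pairwise group comparisons used for the internal standards can be illustrated with a minimal sketch of a two-sided Mann-Whitney U test. This is not the analysis code used in this thesis: it uses only the Python standard library, relies on the normal approximation with continuity correction and without a tie correction of the variance, and the intensity values and the function names `average_ranks`, `mann_whitney_u`, and `pairwise_tests` are hypothetical.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (U, p), where U is the smaller of the two U statistics.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u + 0.5 - mean) / sd  # continuity correction
    p = min(1.0, math.erfc(-z / math.sqrt(2)))  # equals 2 * Phi(z) for z <= 0
    return u, p


def pairwise_tests(groups):
    """Run the test for every pair of groups in a dict of name -> values."""
    return {(a, b): mann_whitney_u(groups[a], groups[b])
            for a, b in combinations(groups, 2)}


# Hypothetical internal-standard intensities per risk group (not thesis data).
groups = {
    "low": [5.6e9, 5.9e9, 5.4e9, 6.1e9, 5.7e9],
    "int": [5.8e9, 5.5e9, 6.0e9, 5.3e9, 5.9e9],
    "high": [5.7e9, 5.2e9, 6.2e9, 5.6e9],
}
for pair, (u, p) in pairwise_tests(groups).items():
    print(pair, f"U = {u:.1f}, unadjusted p = {p:.2f}")
```

The resulting unadjusted p values would then be corrected for multiple comparisons before being interpreted, as done for the values reported above.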

Figure 42: Abundance of SUC2 invertase in direct EPS data acquired in Virginia.


Figure 43: Abundance of SUC2 per sample and risk group in direct EPS data acquired in Virginia.

Overall, analysis of both internal standards showed consistent data quality across all 115 runs, with no significant changes between risk groups for either iRT peptides (BH adjusted p values of 0.55-0.77) or SUC2 (BH adjusted p values of 0.36-0.85), and a standard deviation below 14% with a 95% confidence interval (13.9% for iRT peptides and 9% for SUC2).
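The percentages quoted for the internal standards follow from dividing the reported spread by the reported mean. The short sketch below reproduces this arithmetic from the values given in section 4.1; the function name `relative_spread` is only illustrative.

```python
def relative_spread(mean, spread):
    """Return the spread as a percentage of the mean."""
    return 100.0 * spread / mean


# Values reported above for the Virginia data.
irt = relative_spread(8.50e8, 1.18e8)   # all 11 iRT peptides combined
suc2 = relative_spread(5.68e9, 0.51e9)  # SUC2 invertase

print(f"iRT peptides: {irt:.1f}%")  # prints "iRT peptides: 13.9%"
print(f"SUC2: {suc2:.1f}%")         # prints "SUC2: 9.0%"
```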


4.2 Qualitative Comparison of Toronto and Virginia Data

Comparing protein groups and peptides detected in my data (Toronto) and Virginia reveals that 2865 protein groups were detected in 148 samples in Toronto and 2100 protein groups in 115 samples in Virginia with an overlap of 1951 protein groups (Figure 44).

Figure 44: Overlap of protein groups in direct EPS processed in Toronto (n = 148) and Virginia (n = 115).

Figure 45: Overlap of peptides in direct EPS processed in Toronto (n = 148) and Virginia (n = 115).


In my data, 44694 peptides were detected in total, compared with 28539 in the Virginia data, with an overlap of 24422 peptides (Figure 45). Surprisingly, the data acquired in Virginia contained fewer proteins and peptides, even though the latest generation of Orbitrap mass spectrometer was used. The reasons for these differences are currently not known. The number of protein groups and peptides detected in the data from Virginia decreases over risk groups in a similar manner as observed for my own data (Figure 46). Average detections in the low-risk group are 1331 protein groups and 7790 peptides, decreasing to 1325 protein groups and 7480 peptides in the intermediate-risk group, and 1268 protein groups and 7330 peptides in the high-risk group.
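The overlap figures can be broken down into site-specific and shared identifications with simple set arithmetic. The sketch below uses the counts reported in this section; the function name `overlap_summary` and the derived quantities (union size, share of the Virginia identifications also seen in Toronto) are illustrative additions and not results stated elsewhere in this thesis.

```python
def overlap_summary(n_a, n_b, n_shared):
    """Split two identification lists into unique and shared parts."""
    if n_shared > min(n_a, n_b):
        raise ValueError("overlap cannot exceed the smaller list")
    return {
        "only_a": n_a - n_shared,
        "only_b": n_b - n_shared,
        "shared": n_shared,
        "union": n_a + n_b - n_shared,
        "percent_of_b_shared": 100.0 * n_shared / n_b,
    }


# a = Toronto, b = Virginia (counts taken from the text above).
protein_groups = overlap_summary(2865, 2100, 1951)
peptides = overlap_summary(44694, 28539, 24422)

for name, summary in (("protein groups", protein_groups), ("peptides", peptides)):
    print(name, summary)
```

Under these counts, 914 protein groups and 20272 peptides were detected only in Toronto, while 149 protein groups and 4117 peptides were detected only in Virginia.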

Figure 46: Protein groups (left) and peptides (right) detected per sample in the direct EPS data acquired in Virginia.


4.3 Filtering for Peptides

Figure 47: Flowchart of filters applied to detect differentially abundant peptides in the acquired direct EPS data.


The data from the Virginia cohort was filtered for peptides of interest in the same way as my own data (Figure 47). In total, 18839 peptides were detected in both the direct EPS data from Virginia and the prostate tissue data. Of these, 10219 peptides met all the required biological and physicochemical properties. When testing for significant differences in abundance between the low-risk and intermediate/high-risk groups using the Mann-Whitney U test, 1698 peptides showed a p-value ≤ 0.1. Unfortunately, none of these peptides were significant after applying the Benjamini-Hochberg adjustment, which is why the unadjusted p-values were used for this initial comparison. Of the 1698 peptides, 341 were quantotypic and showed a decreasing trend over risk groups, and 321 were quantotypic and showed an increasing trend over risk groups. Only 12 of these peptides were detected in all 115 samples (Figure 48), suggesting that the data obtained in Virginia is more variable than my own data. Of the 1271 peptides from my data that are significant and meet all biological and physicochemical criteria, 1136 are detected in the Virginia data. Of these 1271 peptides, 230 also meet all criteria and show significance in the Virginia data. Interestingly, almost all peptides have the same trend in both datasets; only two peptides increase over risk groups in the data from Virginia and decrease in my data (Figure 49).

Figure 48: Upper panel: Waterfall plot of significant peptides that meet all required criteria, showing fold change on the y axis. Colour indicates whether peptides were detected in all (blue) or some (grey) of the 115 samples. Lower panel: number of samples each peptide was detected in.
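To illustrate why peptides with an unadjusted p-value ≤ 0.1 may no longer pass the threshold after multiple-testing correction, the following is a minimal sketch of the Benjamini-Hochberg adjustment in plain Python. It is not the code used for the analysis in this thesis; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical unadjusted p-values for four peptides (not thesis data).
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
for p, q in zip(raw, adjusted):
    print(f"unadjusted p = {p:.3f}, BH adjusted p = {q:.3f}")
```

Each unadjusted p-value is scaled by the number of tests divided by its rank, so with many thousands of peptides tested, moderately small p-values can end up well above the 0.1 cut-off after adjustment.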

Figure 49: Correlation between fold change (intermediate and high / low risk group) of the data from Virginia (y axis) and my data (x axis).

All 60 peptides that were detected in all 148 direct EPS samples, show significance, and meet all criteria for targeted mass spectrometry assays were also detected in the data from Virginia.


Of these 60 peptides, 28 are significant and meet all biological and physicochemical criteria in both datasets. Interestingly, these 28 peptides show the same trend in the data from both sites (Figure 50). Six of these 60 peptides were found to be significant (without adjusting for multiple comparisons) with the same trend and are detected in all 115 samples.
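Checking whether a peptide shows the same trend at both sites amounts to comparing the direction of its fold change (intermediate and high / low risk group) in the two datasets. A minimal sketch follows, assuming fold changes are stored as ratios per peptide; the function name `split_by_trend`, the peptide labels, and the fold-change values are hypothetical placeholders.

```python
def split_by_trend(fc_site_a, fc_site_b):
    """Compare fold-change direction for peptides present at both sites.

    Both arguments map peptide -> fold change (ratio; > 1 means higher
    abundance in the intermediate/high-risk groups than in the low-risk group).
    Returns (same_trend, opposite_trend) as sorted lists of peptides.
    """
    shared = sorted(set(fc_site_a) & set(fc_site_b))
    same = [p for p in shared if (fc_site_a[p] > 1) == (fc_site_b[p] > 1)]
    opposite = [p for p in shared if p not in same]
    return same, opposite


# Hypothetical fold changes (not thesis data).
toronto = {"PEPTIDE_1": 2.0, "PEPTIDE_2": 0.5, "PEPTIDE_3": 1.5}
virginia = {"PEPTIDE_1": 1.8, "PEPTIDE_2": 1.2, "PEPTIDE_4": 0.7}

same, opposite = split_by_trend(toronto, virginia)
print("same trend:", same)          # prints "same trend: ['PEPTIDE_1']"
print("opposite trend:", opposite)  # prints "opposite trend: ['PEPTIDE_2']"
```

Peptides detected at only one site are ignored in this sketch, which mirrors the restriction of the comparison to peptides found in both datasets.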

Figure 50: Fold change (intermediate and high / low risk group) correlation of 28 peptides of the data from Virginia (y axis) and my data (x axis).


5. Discussion

In my direct EPS cohort, 1271 peptides show significance (BH adjusted p-value ≤ 0.1) between the low-risk and intermediate/high-risk groups, are quantotypic, and meet all biological and physicochemical requirements for targeted mass spectrometry assays. Of these, 60 peptides could be detected in all 148 samples. To verify my results, I used a direct EPS cohort from Virginia with 115 samples, of which 59 overlap with my cohort. Of the final 1271 peptides, 230 also meet the criteria in the data from Virginia (p-values without Benjamini-Hochberg adjustment). The 60 peptides that were found in all 148 samples of my data were also detected in the data from Virginia, and 28 of these 60 peptides meet the biological and physicochemical criteria in both datasets. Six of the 28 peptides are detected in all samples of the data from both sites (EFLFSSPHGK of TF; LVAASQAALGL, VPQVSTPTLVEVSR, and YICENQDSISSK of ALB; QDPPSVVVTSHQAPGEK of AZGP1; and TTPPVLDSDGSFFLYSK of IGHG1). The abundance profiles of three of these six peptides are shown as examples and are highly comparable between the two datasets (Figures 51-53).

Figure 51: Abundance of TTPPVLDSDGSFFLYSK of IGHG1 in the data from Virginia (left) and my data (right).


Figure 52: Abundance of QDPPSVVVTSHQAPGEK of AZGP1 in the data from Virginia (left) and my data (right).

Figure 53: Abundance of VPQVSTPTLVEVSR of ALB in the data from Virginia (left) and my data (right).


6. Outlook

Prostate cancer diagnostic/prognostic factors assigned based on pre-treatment serum prostate specific antigen (PSA), the clinical T-category (tumor extent), and grade (Gleason score) of a pre-treatment biopsy stratify patients into risk groups which assist in selecting patient treatment. Current diagnostic factors are inaccurate in predicting outcome, resulting in both over- and under-treatment of many men. Men on active surveillance are required to undergo repeated needle biopsies, subjecting them to associated risks. A pressing need in prostate cancer management is the development of improved prognostic factors that enable follow-up of men with low-risk disease in a non-invasive manner.

In the current MSc project, I completed an in-depth proteomics comparison of 148 richly annotated direct EPS samples with the goal of identifying novel prostate cancer biomarkers that can assist in stratifying patients into prostate cancer risk groups. Analysis of the acquired data resulted in 1271 peptides (60 of which were detected in all 148 samples) that are significant between the low-risk group and the combined intermediate- and high-risk groups, meet all physicochemical filter criteria, and were also detected in prostate cancer tissue data (n = 76). Another cohort of direct EPS from 115 patients was processed, and data was acquired at the Eastern Virginia Medical School in Norfolk, Virginia, USA using the same sample processing and similar instrument methods (overlap of 59 patients between my data and the data from Virginia). Comparing my data with the data from Virginia revealed that 28 of the 60 peptides meet all criteria and follow the same trend over risk groups in both datasets. Six of the 60 peptides are also detected in every sample of the data from both sites. These peptides are of potential interest for further studies to find a panel of biomarkers that might be of use to differentiate between prostate cancer risk groups. The future goal is the detection of potential biomarker signatures in post-DRE urine to stratify patients into prostate cancer risk groups in a non-invasive and highly accurate manner. Post-DRE urine contains prostatic secretions diluted in urine; its collection is minimally invasive and can also be done longitudinally. Hence, it is very well suited for non-invasive diagnosis and prognosis of prostate cancer. In future projects, the final list of peptides found to be potential biomarkers will be validated in targeted mass spectrometry assays in post-DRE urine. The resulting validated peptides might then form a biomarker panel for stratifying patients into prostate cancer risk groups that would be further tested in clinical studies.

In this study, only a small subset of the extensive patient information was used to place direct EPS samples into different risk groups. However, several further questions can be asked to address different problems in prostate cancer. One of them concerns biochemical recurrence in prostate cancer, for which the dataset could be split into patients with and without biochemical recurrence per Gleason score category (potentially with a time restriction to BCR within 18 months), and a similar statistical analysis would be performed. This would help to understand biological differences between prostate cancers that metastasise and those that do not, and that hence might need less aggressive treatment than the former.

Furthermore, identifying tumors that grow very quickly between diagnosis and treatment could be of very high value, as it would help to find signatures that indicate the speed of tumor growth. This question can be answered because we have both diagnostic and pathologic information as well as the time between diagnosis and treatment.

Additionally, men of African American descent were shown to be about 60% more prone to prostate cancer than Caucasians, and mortality among African Americans is approximately double that of whites18. The cohort could be split according to the patients' ethnicity to examine whether African Americans in this cohort are at higher risk than Caucasians. If so, the data could be analysed to detect potential peptides or protein groups that are increased or decreased in patients of African American descent per risk group.

Finally, the dataset could be analyzed using only pathologic patient information (e.g. the pathologic Gleason score) to identify peptides or protein groups that show differential abundance between tumors with lower and higher Gleason scores. The behaviour of these potential biomarkers could then be compared between pathologic and diagnostic risk groups.

In summary, there are many different scientific questions that could be addressed with my data. Additionally, the peptides found to be differentially abundant between prostate cancer risk groups can be used to build targeted mass spectrometry assays for further validation before potential application in the clinic.


7. Abbreviations and Symbols

Å    Angstrom (≙ 10⁻¹⁰ m)
ABC    Ammonium Bicarbonate
ACN    Acetonitrile
ACPP    Acid Phosphatase Prostate
AGC    Automatic Gain Control
ALB    Albumin
ALOX15B    Arachidonate 15-Lipoxygenase Type B
AZGP1    Alpha-2-Glycoprotein 1
BCA    Bicinchoninic Acid
BCR    Biochemical Recurrence
BH    Benjamini-Hochberg
BMI    Body Mass Index
BP    Biological Process
BPH    Benign Prostatic Hyperplasia
BT    Brachytherapy

CaCl2    Calcium Chloride
CC    Cellular Component
CHD1    Cadherin 1
CKB    Creatine Kinase B
cm    Centimeter
CR    Cribriform Carcinoma
C-Trap    Curved Linear Trap
Cu    Copper
DBI    Diazepam Binding Inhibitor
DC    Direct Current
DIA    Data Independent Acquisition
DNA    Deoxyribonucleic Acid
DRE    Digital Rectal Examination
DTT    Dithiothreitol
EBRT    External Beam Radiotherapy

EPS    Expressed Prostatic Secretion
ERG    ETS Transcription Factor ERG
ESI    Electrospray Ionization
FC    Fold Change
FDA    United States Food and Drug Administration
FDR    False Discovery Rate
FGB    Fibrinogen Beta Chain
FGG    Fibrinogen Gamma Chain
FOLH1    Folate Hydrolase 1
FWHM    Full Width at Half Maximum
GO    Gene Ontology
GS    Gleason Score
Gy    Gray (unit)
HCD    Higher-Energy Collisional Dissociation
HPLC    High Performance Liquid Chromatography
HPX    Hemopexin
IAA    Iodoacetamide
iBAQ    Intensity-Based Absolute Quantification
IDC    Intraductal Carcinoma
IGHA1    Immunoglobulin Heavy Constant Alpha 1
iRT    Indexed Retention Time
ISUP    International Society of Urological Pathology
IT    Injection Time
KLK2    Kallikrein Related Peptidase 2
KLK3    Kallikrein Related Peptidase 3
kV    Kilovolt
LC    Liquid Chromatography
LC/MS-MS    Liquid Chromatography Mass Spectrometry
LFQ    Label Free Quantification
LTF    Lactotransferrin
Lys    Lysine

m/z    Mass to Charge Ratio
MF    Molecular Function
MFS    Metastasis Free Survival
min    Minute
mM    Millimolar
MMP8    Matrix Metallopeptidase 8
MRI    Magnetic Resonance Imaging
MS    Mass Spectrometry
ms    Millisecond
MS/MS    Tandem Mass Spectrometry
MW    Molecular Weight
NA    Not Available/Applicable
NCE    Normalized Collision Energy
NEFH    Neurofilament Heavy
nHPLC    Nano-flow High Performance Liquid Chromatography
NKX3-1    NK3 Homeobox 1
nL    Nanoliter
nm    Nanometer
OS    Overall Survival
PCA3    Prostate Cancer Associated Transcript 3
PCSM    Prostate Cancer Specific Mortality
PEBP1    Prostatic Binding Protein
PHI    Prostate Health Index
PIN    Prostatic Intraepithelial Neoplasia
PRM    Parallel Reaction Monitoring
PSA    Prostate Specific Antigen
PSCA    Prostate Stem Cell Antigen
PSMA    Prostate Specific Membrane Antigen
PVDF    Polyvinylidene Difluoride
QC    Quality Control
RB1    RB Transcriptional Corepressor 1


RNA    Ribonucleic Acid
RPM    Revolutions Per Minute
SPE    Solid Phase Extraction
STAGE    STop And Go Extraction
SUC2    S. cerevisiae Invertase
T Category    Tumor Category
TF    Transferrin
TGM4    Transglutaminase 4
TMPRSS2    Transmembrane Serine Protease 2
TP53    Tumor Protein P53
TRUS    Transrectal Ultrasound
µg    Microgram
µL    Microliter
µm    Micrometer


8. References

1. Abate-shen C, Shen MM. Molecular genetics of prostate cancer. Genes Dev. 2000;(732):2410-2434. doi:10.1101/gad.819500.2410
2. Nargund VH, Raghavan D, Sandler HM. Urological oncology. In: Urological Oncology. 2nd ed. Springer-Verlag London; 2015:677.
3. McNeal JE. Origin and Development of Carcinoma in the Prostate. Am Cancer Soc. 1969;23(1):24-34.
4. Onay A, Ertas G, Vural M, et al. Evaluation of peripheral zone prostate cancer aggressiveness using the ratio of diffusion tensor imaging measures. Contrast Media Mol Imaging. 2017;2017(Md). doi:10.1155/2017/5678350
5. Cohen RJ, Shannon BA, Phillips M, Moorin RE, Wheeler TM, Garrett KL. Central Zone Carcinoma of the Prostate Gland: A Distinct Tumor Type With Poor Prognostic Features. J Urol. 2008;179(5):1762-1767. doi:10.1016/j.juro.2008.01.017
6. Lee JJ, Thomas IC, Nolley R, Ferrari M, Brooks JD, Leppert JT. Biologic differences between peripheral and transition zone prostate cancer. Prostate. 2015;75(2):183-190. doi:10.1002/pros.22903
7. Hricak H, Scardino P. Prostate Cancer. Cambridge: Cambridge University Press; 2008. doi:10.1017/CBO9780511551994
8. Li X, Zhang D, Zhao S, Tang DG, Kirk JS. Prostate Luminal Progenitor Cells in Development and Cancer. Trends in Cancer. 2018;4(11):769-783. doi:10.1016/j.trecan.2018.09.003
9. Bui M, Reiter R. Stem cell genes in androgen-independent prostate cancer. Cancer Metastasis Rev. 1998;17(4):391-399.
10. Canadian Cancer Statistics Advisory Committee. Canadian Cancer Statistics 2018. Canadian Cancer Society. https://prostate-cancer.canceraustralia.gov.au/statistics. Published 2018. Accessed February 11, 2019.
11. Sakr W., Crissman J., Grignon D., et al. High grade prostatic intraepithelial neoplasia (HGPIN) and prostatic adenocarcinoma between the ages of 20-69: An autopsy study of 249 cases. In Vivo (Brooklyn). 1994;8(3):439-444.
12. The American Cancer Society medical and editorial content team J. Prostate Cancer Risk Factors. American Cancer Society. https://www.cancer.org/cancer/prostate-cancer/causes-risks-prevention/risk-factors.html#written_by. Published 2016. Accessed February 11, 2019.
13. Bell KJL, Del Mar C, Wright G, Dickinson J, Glasziou P. Prevalence of incidental prostate cancer: A systematic review of autopsy studies. Int J Cancer. 2015;137(7):1749-1757. doi:10.1002/ijc.29538
14. Salinas CA, Tsodikov A, Ishak-Howard M, Cooney KA. Prostate Cancer in Young Men: An Important Clinical Entity Incidence and mortality in young men with prostate cancer. 2014;11(6):317-323. doi:10.1038/nrurol.2014.91
15. Ventimiglia E, Salonia A, Briganti A, Montorsi F. Family History and Probability of Prostate Cancer, Differentiated by Risk Category — A Nationwide Population-based Study. Eur Urol. 2017;71(1):143-144. doi:10.1016/j.eururo.2016.08.063
16. Castro E, Eeles R. The role of BRCA1 and BRCA2 in prostate cancer. Asian J Androl. 2012;14(3):409-414. doi:10.1038/aja.2011.150
17. Kote-Jarai Z, Leongamornlert D, Saunders E, et al. BRCA2 is a moderate penetrance gene contributing to young-onset prostate cancer: Implications for genetic testing in prostate cancer patients. Br J Cancer. 2011;105(8):1230-1234. doi:10.1038/bjc.2011.383
18. Gann PH. Risk Factors for Prostate Cancer. MedReviews. 2002;4:3-10.
19. G. G-MW, Zhang J. Dietary Factors and Risk of Advanced Prostate Cancer Wambui. Eur J Cancer. 2015;23(2):96-109. doi:10.1097/CEJ.0b013e3283647394.Dietary
20. Snowdon D, Phillips R, Choi W. Diet, Obesity, and Risk of Fatal Prostate Cancer. Am J Epidemiol. 1984;120(2):244-250.
21. Hammerich KH, Ayala GE, Wheeler TM. Prostate Anatomy and Surgical Pathology. In: Prostate Cancer. 1st ed. Cambridge: Cambridge University Press; 2009:1-14.
22. Mostofi F, Sesterhenn I, Davis C. Histological Typing of Prostate Tumours. 2nd ed. Springer-Verlag Berlin Heidelberg GmbH; 2002.
23. McNeal JE, Bostwick DG. Intraductal dysplasia: A premalignant lesion of the prostate. Hum Pathol. 1986;17(1):64-71.
24. Brawer MK. Prostatic Intraepithelial Neoplasia: An Overview. MedReviews. 2005;7:11-18.
25. Mager DE, Catalona WJ, Ramos CG, Haberer B, Carvahal GF. The Effect of High Grade Prostatic Intraepithelial Neoplasia on Serum Total and Percentage of Free Prostate Specific Antigen Levels. J Urol. 2003:1587. doi:10.1097/00005392-199911000-00004
26. Morote J, Encabo G, Lopez M, De Torres IM. Influence of high-grade prostatic intra-epithelial neoplasia on total and percentage free serum prostatic specific antigen. BJU Int. 1999;84(6):657-660. doi:10.1046/j.1464-410x.1999.00213.x
27. Mydlo JH, Godec CJ. High-Grade Prostatic Intraepithelial Neoplasia. San Diego: Elsevier Inc.; 2003. doi:10.1016/B978-0-12-286981-5.X5000-9
28. E. Goluboff, D. Burzon, N. Zinner, R. S. Israeli, G. Barnette, R. Boger, J. Mitchell and MSS. Men with high-grade prostatic intraepithelial neoplasia (PIN) remain at high risk for prostate cancer even with a subsequent negative biopsy: Results of a prospective study. J Clin Oncol. 2005;23(16):1024.
29. Hammerich KH, Ayala GE, Wheeler TM. Anatomy of the prostate gland and surgical pathology of prostate cancer. In: Prostate Cancer. Cambridge University Press; 2009:5.
30. Bostwick D, Pacelli A, Blute M, Roche P, Murphy G. Prostate specific membrane antigen expression in prostatic intraepithelial neoplasia and adenocarcinoma: a study of 184 cases. Cancer. 1998;82(11):2256-2261.
31. Guo CC, Epstein JI. Intraductal carcinoma of the prostate on needle biopsy: Histologic features and clinical significance. Mod Pathol. 2006;19(12):1528-1535. doi:10.1038/modpathol.3800702
32. Kweldam CF, van Leenders GJ, van der Kwast T. Grading of prostate cancer: a work in progress. Histopathology. 2019;74(1):146-160. doi:10.1111/his.13767
33. Cohen RJ, Chan WC, Edgar SG, et al. Prediction of pathological stage and clinical outcome in prostate cancer: An improved pre-operative model incorporating biopsy-determined intraductal carcinoma. Br J Urol. 1998;81(3):413-418.
34. Kimura K, Tsuzuki T, Kato M, et al. Prognostic value of intraductal carcinoma of the prostate in radical prostatectomy specimens. Prostate. 2014;74(6):680-687. doi:10.1002/pros.22786
35. Trudel D, Downes MR, Sykes J, Kron KJ, Trachtenberg J, Van Der Kwast TH. Prognostic impact of intraductal carcinoma and large cribriform carcinoma architecture after prostatectomy in a contemporary cohort. Eur J Cancer. 2014;50(9):1610-1616. doi:10.1016/j.ejca.2014.03.009
36. Iczkowski KA, Paner GP, Van Der Kwast T. The New Realization about Cribriform Prostate Cancer. Adv Anat Pathol. 2018;25(1):31-37. doi:10.1097/PAP.0000000000000168
37. Kweldam CF, Verhoef EI, van Leenders GJ, et al. Disease-specific survival of patients with invasive cribriform and intraductal prostate cancer at diagnostic biopsy. Mod Pathol. 2016;29(6):630-636. doi:10.1038/modpathol.2016.49
38. Aggarwal R, Zhang T, Small EJ, Armstrong AJ. Neuroendocrine prostate cancer: subtypes, biology, and clinical PubMed Commons. 2015;12(5):2014-2015.
39. Zhang D, Tang DG. “Splice” a way towards neuroendocrine prostate cancer. EBioMedicine. 2018;35:12-13. doi:10.1016/j.ebiom.2018.08.037
40. Marcus DM, Goodman M, Jani AB, Osunkoya AO, Rossi PJ. A comprehensive review of incidence and survival in patients with rare histological variants of prostate cancer in the United States from 1973 to 2008. Prostate Cancer Prostatic Dis. 2012;15(3):283-288. doi:10.1038/pcan.2012.4
41. Beltran H, Rickman DS, Park K, et al. Molecular characterization of neuroendocrine prostate cancer and identification of new drug targets. Cancer Discov. 2011;1(6):487-495. doi:10.1158/2159-8290.CD-11-0130
42. Castration-resistant prostate cancer. National Cancer Institute. https://www.cancer.gov/publications/dictionaries/cancer-terms/def/crpc. Accessed July 12, 2019.
43. Balk BSP, Ko Y, Bubley GJ. Biology of Prostate-Specific Antigen. J Clin Oncol. 2003;21(2):383-391. doi:10.1200/JCO.2003.02.083
44. The American Cancer Society medical and editorial content team J. Prostate Cancer Stages. The American Cancer Society. https://www.cancer.org/cancer/prostate-cancer/detection-diagnosis-staging/staging.html. Published 2017. Accessed February 11, 2019.
45. Manual ACS. Prostate Cancer - Stages and Grades. Springer Verlag.
46. Xu S, Kruecker J, Turkbey B, et al. Real-time MRI-TRUS fusion for guidance of targeted prostate biopsies. Comput Aided Surg. 2009;13(November 2007):255-264. doi:10.1080/10929080802364645.Real-time
47. Tannenbaum M. Urologic Pathology: The Prostate. Am J Surg Pathol. 1978;2(4):432.
48. Epstein JI, Zelefsky MJ, Sjoberg DD, et al. A contemporary prostate cancer grading system. Eur Urol. 2016;69(3):428-435. doi:10.1016/j.eururo.2015.06.046.A
49. Barbosa F, Queiroz M, Nunes R, Marin J, Buchpiguel C, Cerri G. Clinical perspectives of PSMA PET/MRI for prostate cancer. Clinics. 2018;73(Suppl 1):1-9. doi:10.6061/clinics/2018/e586s
50. The American Cancer Society medical and editorial content team J. Watchful Waiting or Active Surveillance for Prostate Cancer. https://www.cancer.org/cancer/prostate-cancer/treating/watchful-waiting.html#written_by. Published 2016.
51. Mir MC, Stephenson AJ. Expectant Management of Localized Prostate Cancer. In: Urological Oncology. 2nd ed. Springer Verlag; 2015:719-730.
52. Koontz BF, Lee WR. External Beam Radiation Therapy for Clinically Localized Prostate Cancer. In: Urological Oncology. 2nd ed. Springer Verlag; 2015:731-742.
53. Kollmeier MA, Zelefsky MJ. Radiation Therapy. In: Hricak H, Scardino P, eds. Prostate Cancer. 1st ed. Cambridge: Cambridge University Press; 2009:58-92.
54. DiBiase SJ, Jacobs SC. Brachytherapy for Prostate Cancer. In: Prostate Cancer: Science and Clinical Practice. ; 2003:403-408.
55. Masterson TA, Eastham JA. Surgical treatment of prostate cancer. In: Hricak H, Scardino P, eds. Prostate Cancer. 1st ed. Cambridge: Cambridge University Press; 2009:43-57.
56. Hermanns T, Kuk C, Zlotta AR. Clinical Presentation, Diagnosis and Staging. In: Urological Oncology. ; 2015:697-717.
57. Lefevre ML. Prostate cancer screening: More harm than good? Am Fam Physician. 1998;58(2):432-438.
58. Loeb S, Bjurlin M, Nicholson J, et al. Overdiagnosis and Overtreatment of Prostate Cancer. Eur Urol. 2014;65(6):1046-1055. doi:10.1016/j.eururo.2013.12.062.Overdiagnosis
59. Attard G, Parker C, Eeles RA, et al. Prostate cancer. Lancet. 2016;387:70-82. doi:10.1016/S0140-6736(14)61947-4
60. Kajikawa K, Kanao K, Kobayashi I, et al. Original Article: Clinical Investigation Optimal method for measuring tumor extent in needle biopsy specimens to identify small-volume prostate cancer. Int J Urol. 2016;23:62-68. doi:10.1111/iju.12961
61. Abdelkhalek M, Abdelshafy M, Elhelaly H, Kamal M. Hemospermia after transrectal ultrasound-guided prostatic biopsy: A prospective study. Urol Ann. 2013;5(1):30-33.
62. Johnson DC, Raman SS, Mirak SA, et al. Detection of Individual Prostate Cancer Foci via Multiparametric Magnetic Resonance Imaging. Eur Urol. 2018:1-9. doi:10.1016/j.eururo.2018.11.031
63. Definition Biomarker. National Cancer Institute. https://www.cancer.gov/publications/dictionaries/cancer-terms/def/biomarker. Accessed June 10, 2019.
64. Goossens N, Nakagawa S, Sun X, Hoshida Y. Cancer Biomarker Discovery and Validation. Transl Cancer Res. 2015;73(4):389-400. doi:10.3978/j.issn.2218-676X.2015.06.04.Cancer
65. Kucuk O. Cancer Biomarkers. Mol Aspects Med. 2015;45:1-2.
66. Tainsky MA. Methods and Protocol. In: Tumor Biomarker Discovery. Humana Press; 2009.
67. FDA-NIH Biomarker Working Group. Diagnostic Biomarker. In: BEST (Biomarkers, EndpointS, and Other Tools). ; 2016.
68. Scatena R. Advances in Cancer Biomarkers: From Biochemistry to Clinic for a Critical Revision. Springer Netherlands; 2015.
69. Tainsky MA. Preface. In: Tumor Biomarker Discovery: Methods and Protocols. Humana Press; 2009:5.
70. Pepe MS, Etzioni R, Feng Z, et al. Phases of biomarker development for early detection of cancer. J Natl Cancer Inst. 2001;93(14):1054-1061. doi:10.1093/jnci/93.14.1054
71. Jin G, Zhou X, Wang H, Wong STC. Challenges in Blood Proteomic Biomarker Discovery. In: Kowalski J, Piantadosi S, eds. Computational Biology: Issues and Applications in Oncology. New York City: Springer Verlag; 2009:272-299.
72. Filella X, Fernández-Galan E, Bonifacio RF, Foj L. Emerging biomarkers in the diagnosis of prostate cancer. Pharmgenomics Pers Med. 2018;11:83-94. doi:10.2147/PGPM.S136026
73. Alford A V, Brito JM, Yadav KK, Yadav SS, Tewari AK, Renzulli J. The Use of Biomarkers in Prostate Cancer Screening and Treatment. Rev Urol. 2017;19(4):221-234. doi:10.3909/riu0772
74. Sharma P, Zargar-Shoshtari K, Pow-Sang JM. Biomarkers for prostate cancer: present challenges and future opportunities. Futur Sci OA. 2016;2(1). doi:10.4155/fso.15.72
75. Rendon RA, Mason RJ, Marzouk K, Finelli A, Saad F. Canadian Urological Association recommendations on prostate cancer screening and early diagnosis. CUA. 2017;11(10).
76. Resing KA, Ahn NG. Proteomics strategies for protein identification. FEBS Lett. 2005;579(4 SPEC. ISS.):885-889. doi:10.1016/j.febslet.2004.12.001
77. Steen H, Mann M. The ABC’s (and XYZ’s) of peptide sequencing. Nat Rev Mol Cell Biol. 2004;5(9):699-711. doi:10.1038/nrm1468
78. Chait BT. Mass Spectrometry: Bottom-Up or Top-Down. Science (80- ). 2006;314(October):65-66. doi:10.1126/science.1133351
79. Catherman AD, Skinner OS, Kelleher NL. Top Down Proteomics: Facts and Perspectives Proteomics in a Post-Genomics World. Biochem Biophys Res Commun. 2015;445(4):683-693. doi:10.1016/j.bbrc.2014.02.041
80. Yates JR. Mass Spectrometry and the Age of the Proteome. Proteome. 1998;33(November 1997):1-19.
81. Douglas A, Skoog FJH, Crouch SR. Principles of Instrumental Analysis. 6th ed. Belmont, CA, USA: Thomson Brooks/Cole; 1998.
82. Thermo Scientific. Q-Exactive HF Operating Manual.
83. Olsen JV, Macek B, Lange O, Makarov A, Horning S, Mann M. Higher-energy C-trap dissociation for peptide modification analysis. Nat Methods. 2007;4(9):709-712.
84. Biemann K. Contributions of Mass Spectrometry to Peptide and Protein Structure. Biomed Environ Mass Spectrom. 1988;16:99-111.
85. Drake RR, White KY, Fuller TW, et al. Clinical Collection and Protein Properties of Expressed Prostatic Secretions as a Source for Biomarkers of Prostatic Disease. J Proteomics. 2009;72(6):907-917. doi:10.1016/j.jprot.2009.01.007.Clinical
86. Thermo Scientific. User Guide: Pierce BCA Protein Assay Kit. Pierce Biotechnol. 2011;0747(23225):1-7. doi:10.1016/j.ijproman.2010.02.012
87. Berger ST, Ahmed S, Muntel J, et al. MStern Blotting – High Throughput Polyvinylidene Fluoride (PVDF) Membrane-Based Proteomic Sample Preparation for 96-Well Plates. Mol Cell Proteomics. 2015:2814-2823. doi:10.1074/mcp.O115.049650
88. Rappsilber J, Mann M, Ishihama Y. Protocol for micro-purification, enrichment, pre-fractionation and storage of peptides for proteomics using StageTips. Nat Protoc. 2007;2(8):1896-1906. doi:10.1038/nprot.2007.261
89. Gallien S, Duriez E, Demeure K, Domon B. Selectivity of LC-MS/MS analysis: Implication for proteomics experiments. J Proteomics. 2012;81:148-158. doi:10.1016/j.jprot.2012.11.005
90. Busse D, Li N, Dittmar G, et al. Global quantification of mammalian gene expression control. Nature. 2011;473:337-342. doi:10.1038/nature10098
91. Zhou W, Chen T, Chong Z, et al. Avoiding abundance bias in the functional annotation of post-translationally modified proteins. Nat Publ Gr. 2015;12(11):1003-1004. doi:10.1038/nmeth.3621
92. Rappsilber J, Ishihama Y, Mann M. Stop and go extraction tips for matrix-assisted laser desorption/ionization, nanoelectrospray, and LC/MS sample pretreatment in proteomics. Anal Chem. 2003;75(3):663-670. doi:10.1021/ac026117i
93. Mottet N, Bellmunt J, Patient EB, et al. Guidelines on Prostate Cancer. European Association of Urology; 2015.
94. Epstein JI, Feng Z, Trock BJ, Pierorazio PM. Upgrading and Downgrading of Prostate Cancer from Biopsy to Radical Prostatectomy: Incidence and Predictive Factors Using the Modified Gleason Grading System and Factoring in Tertiary Grades. Eur Urol. 2012;25(3):1019-1024. doi:10.1007/s11065-015-9294-9.Functional
95. Porcaro AB, Siracusano S, De Luyk N, et al. Low-Risk Prostate Cancer and Tumor Upgrading in the Surgical Specimen: Analysis of Clinical Factors Predicting Tumor Upgrading in a Contemporary Series of Patients Who were Evaluated According to the Modified Gleason Score Grading System. Curr Urol. 2017;10(3):118-125. doi:10.1159/000447164
96. Salmo EN. An audit of inter-observer variability in Gleason grading of prostate cancer biopsies: The experience of central pathology review in the North West of England. Integr Cancer Sci Ther. 2015;2(2):104-106. doi:10.15761/icst.1000123
97. Drake RR, Elschenbroich S, Lopez-Perez O, et al. In-depth Proteomic Analyses of Direct Expressed Prostatic Secretions. J Proteome Res. 2010;9(5):2109-2116. doi:10.1371/journal.pone.0178059
98. Kim Y, Ignatchenko V, Yao CQ, et al. Identification of Differentially Expressed Proteins in Direct Expressed Prostatic Secretions of Men with Organ-confined Versus Extracapsular Prostate Cancer. Mol Cell Proteomics. 2012;11(12):1870-1884. doi:10.1074/mcp.m112.017889
99. Wyatt AW, Mo F, Wang K, et al. Heterogeneity in the inter-tumor transcriptome of high risk prostate cancer. Genome Biol. 2014;15(426):1-14.
100. Brooks JD, Wei Wei, Pollack JR, et al. Loss of expression of AZGP1 is associated with worse clinical outcomes in a multi-institutional radical prostatectomy cohort. Prostate. 2016;76(15):1409–1419. doi:10.1097/CCM.0b013e31823da96d.Hydrogen
101. Penney KL, Schumacher FR, Kraft P, et al. Association of KLK3 (PSA) genetic variants with prostate cancer risk and PSA levels. 2011;32(6):853-859. doi:10.1093/carcin/bgr050
102. Kong HY, Byun J. Emerging roles of human prostatic acid phosphatase. Biomol Ther. 2013;21(1):10-20. doi:10.4062/biomolther.2012.095
103. Cao Z, Wang Y, Liu ZY, et al. Overexpression of transglutaminase 4 and prostate cancer progression: A potential predictor of less favourable outcomes. Asian J Androl. 2013;15(6):742-746. doi:10.1038/aja.2013.79
104. Heinrich MC, Göbel C, Kluth M, et al. PSCA expression is associated with favorable tumor features and reduced PSA recurrence in operated prostate cancer. BMC Cancer. 2018;18(1):1-9. doi:10.1186/s12885-018-4547-7
105. Gu Z, Thomas G, Yamashiro J, et al. Prostate stem cell antigen (PSCA) expression increases with high gleason score, advanced stage and bone metastasis in prostate cancer. Oncogene. 2000;19(10):1288-1296. doi:10.1038/sj.onc.1203426
106. Shang Z, Niu Y, Cai Q, et al. Human kallikrein 2 (KLK2) promotes prostate cancer cell growth via function as a modulator to promote the ARA70-enhanced androgen receptor transactivation. Tumor Biol. 2014;35(3):1881-1890. doi:10.1007/s13277-013-1253-6
107. St. John J, Powell K, Katie Conley-LaComb M, Chinni SR.
TMPRSS2-ERG fusion gene expression in prostate tumor cells and its clinical and biological significance in prostate cancer progression. J Cancer Sci Ther. 2012;4(4):94-101. doi:10.4172/1948- 5956.1000119


108. Augello MA, Liu D, Deonarine LD, et al. CHD1 Loss Alters AR Binding at Lineage-Specific Enhancers and Modulates Distinct Transcriptional Programs to Drive Prostate Tumorigenesis. Cancer Cell. 2019;35(4):603-617.e8. doi:10.1016/j.ccell.2019.03.001
109. Schleicher RL, Hunter SB, Zhang M, et al. Neurofilament heavy chain-like messenger RNA and protein are present in benign prostate and down-regulated in prostatic carcinoma. Cancer Res. 1997;57(16):3532-3536.
110. Watt F, Martorana A, Brookes DE, et al. A tissue-specific enhancer of the prostate-specific membrane antigen gene, FOLH1. Genomics. 2001;73(3):243-254. doi:10.1006/geno.2000.6446
111. Gurel B, Ali TZ, Montgomery EA, et al. NKX3.1 as a marker of prostatic origin in metastatic tumors. Am J Surg Pathol. 2010;34(8):1097-1105. doi:10.1097/PAS.0b013e3181e6cbf3
112. Proteome. National Cancer Institute. https://www.cancer.gov/publications/dictionaries/cancer-terms/def/proteome. Accessed June 4, 2019.
113. Sinha A, Huang V, Livingstone J, et al. The Proteogenomic Landscape of Curable Prostate Cancer. Cancer Cell. 2019;35(3):414-427.e6. doi:10.1016/j.ccell.2019.02.005
114. Bollinger JG, Stergachis AB, Johnson RS, Jarrett D. Egertson, MacCoss MJ. Selecting Optimal Peptides for Targeted Proteomic Experiments in Human Plasma Using in vitro Synthesized Proteins as Analytical Standards. Methods Mol Biol. 2016;(1410):207-221. doi:10.1002/9781118116555.ch4
115. Worboys JD, Sinclair J, Yuan Y, Jørgensen C. Systematic evaluation of quantotypic peptides for targeted analysis of the human kinome. Nat Methods. 2014;11(10):1041-1044. doi:10.1038/nmeth.3072
116. Sahni A, Simpson-haidaris PJ, Sahni SK, Vaday GG, Francis CW. Fibrinogen synthesized by cancer cells augments the proliferative effect of fibroblast growth factor-2 (FGF-2). J Thromb Haemost. 2008;6(1):176-183. doi:10.1111/j.1538-7836.2007.02808.x
117. Fu Z, Kitagawa Y, Shen R, et al. Metastasis suppressor gene Raf kinase inhibitor protein (RKIP) is a novel prognostic marker in prostate cancer. Prostate. 2006;66(3):248-256. doi:10.1002/pros.20319


118. Davalieva K, Kiprijanovska S, Maleva Kostovska I, et al. Comparative Proteomics Analysis of Urine Reveals Down-Regulation of Acute Phase Response Signaling and LXR/RXR Activation Pathways in Prostate Cancer. Proteomes. 2017;6(1):1. doi:10.3390/proteomes6010001
119. Ettinger SL, Sobel R, Whitmore TG, et al. Dysregulation of Sterol Response Element-Binding Proteins and Downstream Effectors in Prostate Cancer during Progression to Androgen Independence. Cancer Res. 2004;64(6):2212-2221. doi:10.1158/0008-5472.CAN-2148-2
120. Genecards. Creatine Kinase B. https://www.genecards.org/cgi-bin/carddisp.pl?gene=CKB. Accessed June 6, 2019.
121. Wang Y, Chen W, Hu C, et al. Albumin and fibrinogen combined prognostic grade predicts prognosis of patients with prostate cancer. J Cancer. 2017;8(19):3992-4001. doi:10.7150/jca.21061
122. Sejima T, Iwamoto H, Masago T, et al. Low pre-operative levels of serum albumin predict lymph node metastases and ultimately correlate with a biochemical recurrence of prostate cancer in radical prostatectomy patients. Cent Eur J Urol. 2013;66(2):126-132. doi:10.5173/ceju.2013.02.art3
123. Hsu HM, Chu CM, Chang YJ, et al. Six novel immunoglobulin genes as biomarkers for better prognosis in triple-negative breast cancer by gene co-expression network analysis. Sci Rep. 2019;9(1):1-12. doi:10.1038/s41598-019-40826-w
124. Nagase H, Visse R, Murphy G. Structure and function of matrix metalloproteinases and TIMPs. Cardiovasc Res. 2006;69(3):562-573. doi:10.1016/j.cardiores.2005.12.002
125. Barresi G, Tuccari G. Lactoferrin in benign hypertrophy and carcinomas of the prostatic gland. Virchows Arch A Pathol Anat Histopathol. 1984;403(1):59-66. doi:10.1007/BF00689338
126. Shen Y, Li X, Dong D, Zhang B, Xue Y, Shang P. Transferrin receptor 1 in cancer: a new sight for cancer therapy. Am J Cancer Res. 2018;8(6):916-931.
127. Fu D, Richardson DR, Dc W. Iron chelation and regulation of the cell cycle: 2 mechanisms of posttranscriptional regulation of the universal cyclin-dependent kinase inhibitor p21 CIP1/WAF1 by iron depletion. 2011;110(2):752-761. doi:10.1182/blood-2007-03-076737
128. Khiroya H, Moore JS, Ahmad N, et al. IRP2 as a potential modulator of cell proliferation, apoptosis and prognosis in nonsmall cell lung cancer. Eur Respir J. 2017;49(4):1600711. doi:10.1183/13993003.00711-2016
129. Vela D. Iron Metabolism in Prostate Cancer; From Basic Science to New Therapeutic Strategies. Front Oncol. 2018;8(November):1-11. doi:10.3389/fonc.2018.00547
130. Claire H, Brian CJ, Monica M, et al. Update of the human and mouse SERPIN gene superfamily. Hum Genomics. 2013;7(1):1-14. doi:10.1186/1479-7364-7-22


9. Supplemental Figures

Supplemental Table 1: Patient information of direct EPS cohort processed in Toronto.


Supplemental Table 2: Patient information of direct EPS cohort processed in Virginia.


Supplemental Figure 1: Protein correlation between samples processed in Toronto and Virginia. The upper panel shows the best correlation among the 59 overlapping samples (R² = 0.82); the lower panel shows the poorest (R² = 0.67).
