This may be the author’s version of a work that was submitted/accepted for publication in the following source:

Farashi, Samaneh, Kryza, Thomas, Clements, Judith, & Batra, Jyotsna (2019) Post-GWAS in prostate cancer: from genetic association to biological contribution. Nature Reviews Cancer, 19(1), pp. 46-59.

This file was downloaded from: https://eprints.qut.edu.au/124207/

Notice: Please note that this document may not be the Version of Record (i.e. published version) of the work. Author manuscript versions (as Submitted for peer review or as Accepted for publication after peer review) can be identified by an absence of publisher branding and/or typeset appearance. If there is any doubt, please refer to the published source.

https://doi.org/10.1038/s41568-018-0087-3

Post-GWAS in prostate cancer: from genetic association to biological contribution

Samaneh Farashi1,2, Thomas Kryza1,2, Judith Clements1,2, Jyotsna Batra1,2*

1Cancer Program, School of Biomedical Sciences, Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, Queensland, 4102, Australia.

2Australian Prostate Cancer Research Centre – Queensland, Queensland University of Technology, Translational Research Institute, 37 Kent Street, Woolloongabba, Queensland, 4102, Australia.

Genome-wide association studies (GWAS) have been successful in deciphering the genetic component of predisposition to many complex human diseases, including prostate cancer. Germline variants identified by GWAS have progressively filled the significant knowledge gap concerning prostate cancer heritability. With the beginning of the post-GWAS era, more and more studies reveal that, in addition to their value as risk markers, germline variants can exert active roles in prostate oncogenesis. Consequently, current research efforts focus on exploring the biological mechanisms underlying specific susceptibility loci, known as causal variants, by applying novel and precise analytical methods to available GWAS data. Results obtained from these post-GWAS studies have highlighted the potential of exploiting prostate cancer risk-associated germline variants to identify new networks and signalling pathways involved in prostate tumorigenesis. In reviewing this new field of cancer biology, we describe the molecular basis of several important prostate cancer causal variants, with an emphasis on leveraging post-GWAS studies to gain a much deeper insight into cancer aetiology. In addition to discussing the current status of post-GWAS studies, we summarise the main molecular mechanisms of promising causal variants underlying prostate cancer risk loci and discuss the major challenges in moving from association to functional studies and their implications for clinical translation.

Introduction

Prostate cancer (PrCa) is the second most common cancer in men worldwide and is particularly prevalent amongst men aged >79 years1. As for most cancers, environmental factors increase the risk of developing PrCa2,3. In addition, the genetic component plays a significant role in PrCa aetiology, as indicated by the high reported heritability (57%) together with the elevated risk of PrCa in African American men and in patients with a family history4,5. Estimates of the heritability and familial risk of PrCa have been partly explained by large twin and familial segregation studies, respectively4. Population-based studies such as genome-wide association studies (GWAS) have explained a substantial proportion (28.4%) of the familial relative risk (FRR) of PrCa5. For more than a decade, GWAS has been the gold standard for discovering links between germline variants and complex diseases, including PrCa6,7. To date, GWAS has identified >160 common loci associated with PrCa susceptibility5,8,9. This large number suggests a multi- or poly-genic model of prostate carcinogenesis and has added to our knowledge of the polygenic risk score (PRS) in PrCa, allowing improved genetic disease prediction.
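
The polygenic risk score mentioned above can be illustrated with a minimal sketch. It assumes the widely used additive formulation, in which each risk-allele dosage (0, 1 or 2) is weighted by the natural logarithm of a per-allele odds ratio. The function name and the three odds ratios are hypothetical and chosen only for illustration; they are not estimates from any PrCa GWAS.

```python
import math


def polygenic_risk_score(dosages, odds_ratios):
    """Additive score: sum over SNPs of dosage * ln(per-allele odds ratio)."""
    if len(dosages) != len(odds_ratios):
        raise ValueError("dosages and odds_ratios must have the same length")
    return sum(d * math.log(o) for d, o in zip(dosages, odds_ratios))


# Hypothetical per-allele odds ratios for three SNPs (illustrative only).
example_odds_ratios = [1.10, 1.25, 0.90]
# Risk-allele counts (0, 1 or 2) carried by one hypothetical individual.
example_dosages = [2, 1, 0]

example_score = polygenic_risk_score(example_dosages, example_odds_ratios)
print(f"PRS (log-odds scale): {example_score:.3f}")
```

In this formulation each common variant contributes only a small amount, so the score becomes informative mainly when many loci are combined, which is consistent with the polygenic model described above.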

Ever since the first GWAS, a post-GWAS strategy has been proposed with the ultimate goal of understanding the biological consequences behind risk loci10,11. Further functional studies confirmed the active role of several risk loci in PrCa13,14, thus providing greater understanding of the assigned genes5,12-16. An important observation from the follow-up analysis of GWAS data in PrCa is that risk loci are often located within or near, and/or regulate, genes that define key events in tumourigenesis, including cell cycle or DNA repair machinery (ATM, TERT, MYC, MDM2), inflammatory response (IL8RB, TNF and LILRA3)17 and metabolism (JAZF1, HNF1B)18-20. Consequently, it is now recognised that associated genes/regions represent ‘tumour-causal loci’ and may play an active role in oncogenesis21. This fundamentally changes the canonical belief that germline variants have non-deleterious effects. Thus, follow-up studies of GWAS could delineate the functional mechanisms of these effects. To illustrate the importance of post-GWAS studies in cancer biology, in this review we summarise the molecular consequences of several well-established causal loci identified in PrCa. In particular, we emphasise the necessity of a well-designed post-GWAS approach (FIG 1,2) to further elucidate gene networks and biological pathways involved in PrCa. Consequently, identification of novel involved genes may enhance the translational potential of the GWAS data. This can lead to the discovery of promising targets for therapeutic intervention and of new biomarkers for early detection and for predicting disease aggressiveness, enabling us to predict treatment outcomes precisely and efficiently8,22.


Fine-mapping of the GWAS loci to pinpoint candidate causal variants

During the GWAS era it was already known that risk loci are complex owing to linkage disequilibrium (LD) (BOX 1)23, which further complicates identification of the causal variant. To precisely identify the causal variant(s) underlying an observed PrCa-GWAS signal, the region needs to be fine-mapped while accounting for SNPs in LD24,25. Moreover, imputation methods have been developed that use reference panels of human haplotypes to statistically infer non-genotyped variants24. Several fine-mapping efforts have refined independent causal SNPs16,23,25,26 involved in PrCa, such as those at TERT (ref. 15) and HNF1B (ref. 16). Remarkably, the contribution of germline variants to the FRR of PrCa has increased by 4.4 percentage points, from 24% to 28.4%, as a consequence of fine-mapping of associated loci5,16. Fine-mapping of particular cancer risk-associated regions extensively reported by GWAS27,28, such as the 8q24 (ref. 29), 22q13 (ref. 30) and 17q12 (ref. 26) loci, suggests that these regions harbour gene(s) that are mediators in PrCa as well as in other types of cancer owing to their pleiotropic effects8,24,29.
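
Because fine-mapping depends on how strongly neighbouring SNPs are correlated, it may help to see how LD is commonly quantified. The sketch below computes the standard r² statistic from phased haplotypes, where each haplotype is a pair of 0/1 allele codes for two SNPs. The function name and the toy haplotypes are hypothetical; real analyses would rely on dedicated tools and reference panels rather than this simplified calculation.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes coded as (a, b) with a, b in {0, 1}."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denominator


# Toy data (hypothetical): alleles always travel together vs. assort independently.
perfect_ld = [(1, 1), (1, 1), (0, 0), (0, 0)]
no_ld = [(1, 1), (1, 0), (0, 1), (0, 0)]

print(ld_r_squared(perfect_ld), ld_r_squared(no_ld))
```

When r² is close to 1, the association signal of a genotyped SNP cannot be statistically separated from that of its neighbours, which is why fine-mapping has to account for all SNPs in LD.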

Functional variants in PrCa: upgraded from simple witnesses to active players

Several criteria need to be met for a SNP to be considered a causal variant in PrCa (BOX 1). Notably, the SNP should have an impact, likely small, on molecular or cellular systems of the prostate and/or related cells/tissue(s). On the one hand, when localised in a gene coding region, a SNP could affect the properties of the encoded protein and modulate its molecular/biological functions in the prostate microenvironment (FIG 2A). On the other hand, a SNP localised in a non-coding region is likely to affect the expression of one or several genes through different molecular mechanisms (FIG 2B-D). Many post-GWAS studies have focussed on understanding the upstream regulatory effects of non-coding SNPs, known as expression quantitative trait loci (eQTLs), on the genes in the vicinity of the fine-mapped SNPs5,16. These co-localised GWAS-eQTLs have been considered as causal variants in this review (BOX 1). FIGURE 2 illustrates some of the experimental approaches that have been used in previous studies, with a particular focus on the functional role of SNPs within the Kallikrein-related peptidase 3 (KLK3) gene. Prostate-specific antigen (PSA), encoded by the KLK3 gene, has been a major subject of interest with respect to detection bias: elevated PSA levels are used to trigger diagnostic biopsies, which often lead to unnecessary treatment because the PSA test alone cannot discriminate aggressive from indolent disease31.


Functional roles have been described both for coding variants that modulate PSA function (FIG 2A) and for non-coding variants that change PSA levels (FIG 2B,D). Determining the status of these and other variants associated with changed PSA levels32 in men could shed some light on why some men have low levels of PSA but still have PrCa, and could allow the development of a more “personalised” PSA test that takes these germline variants into account. In addition to their regulatory role on protein-coding genes, non-coding SNPs can affect the expression and stability of non-coding RNAs (ncRNAs), subsequently modifying their regulatory function33 (FIG 2D). Alternatively, a causal SNP could indirectly promote cancer progression by altering the tumour microenvironment. For instance, prostate cells are highly dependent on androgens synthesised by surrounding tissues other than the prostate (e.g. adipose tissue)34. Consequently, a genetic variation modulating androgen biosynthesis could have an impact on prostate cells even though the target gene is not expressed in those cells. Additionally, modulation of key immune cells and cytokines by causal variants may contribute to PrCa progression, given that several inflammation-related SNPs influence PrCa risk35. Yet, to fully understand the contribution of causal variants to prostate tumourigenesis, we need to consider all the gene networks involved in parallel.

In-silico approaches to decode fine-mapped regions. The first level of assessment of candidate functional SNPs is performed using computational strategies (reviewed by Khurana E. et al.36) to prioritise risk loci for further functional studies37-39. Current algorithms evaluate the potential impact of variants on amino acid charge, protein 3D structure or possible changes in transcription factor binding sites (TFBSs) resulting from a corresponding amino acid change. In-silico functional evaluation of variants within non-protein-coding regions mostly utilises existing chromatin immunoprecipitation sequencing (ChIP-seq, see BOX 1 and FIG 1 legend) data to investigate the upstream impact of key TFs40, together with a topologically associating domain approach to further identify variants with a long-range regulatory impact41,42. An in-silico study of 100 PrCa-risk loci by Whitington T. et al. predicted 82 loci to lie within open chromatin40, suggesting that they can potentially modify the spatiotemporal mode of chromatin, influencing nearby/distal chromatin accessibility to the regulatory machinery within the genome43,44. Furthermore, post-GWAS studies focus on understanding the allele-specific regulatory effects of eQTLs, leveraging the higher probability of a biological role for co-localised GWAS-eQTL loci45,46. Yet, to eliminate false positives, in-silico annotated causal variants must be experimentally validated regardless of how strong the prediction is (BOX 2).
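
One of the computational checks described above, whether a single-nucleotide change is predicted to alter a TF binding motif, can be sketched as a comparison of motif scores for the two alleles. The example assumes a simple log-odds scoring scheme against a uniform background; the 4-bp motif, the sequences and the function name are all hypothetical and do not correspond to any real TF or to the specific algorithms used in the cited studies.

```python
import math

# Hypothetical position probability matrix for a 4-bp motif (illustrative only).
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]


def motif_log_odds(sequence, motif, background=0.25):
    """Sum of log2(motif probability / background probability) over positions."""
    if len(sequence) != len(motif):
        raise ValueError("sequence and motif must have the same length")
    return sum(math.log2(column[base] / background)
               for column, base in zip(motif, sequence))


reference_site = "AGCT"   # hypothetical site carrying the reference allele (C)
alternate_site = "AGTT"   # same site carrying the alternate allele (T)

# A negative difference means the alternate allele is predicted to weaken the motif match.
delta_score = motif_log_odds(alternate_site, MOTIF) - motif_log_odds(reference_site, MOTIF)
print(f"Change in motif score: {delta_score:.2f}")
```

A score difference of this kind is only a prediction, which is why the text stresses experimental validation of any in-silico annotated variant.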

Unravelling the function of identified variants through experimental studies. To assess the effect of candidate SNPs, rigorous experiments that can demonstrate the consequences of putative variants should be designed. Experimental techniques demonstrate the impact of minor allele variants compared with the corresponding major allele (FIG 2, BOX 2). The function of SNPs within coding regions is inferred from the gene in which they are located, whereas for causal variants within non-coding regions adjacent or distal gene(s) must also be considered.

Coding variants may alter protein properties. Fewer than 10% of SNPs are located within protein-coding regions of the human genome7. These coding SNPs might result in amino acid substitutions that could modify i) protein stability14, ii) structure14 or iii) biochemical properties47 (e.g. charge, solubility), thus changing the molecular function of the protein48. To date, only limited experimental data support the mechanisms underlying the biological effects of coding variants in PrCa. The most well-known PrCa risk-associated variant is the non-synonymous substitution (Gly84Glu) in the HOXB13 TF, an important regulator of the cellular response to androgens, that results from SNP rs138213197. Despite great interest, the molecular mechanism of its action remains largely unclear49,50. A recent study showed that overexpression of HOXB13 G84E reduces prostate cell growth51 but provided no clear molecular explanation for this effect. The lack of definitive findings for G84E HOXB13 leads to three suggestions for consideration. Firstly, the effect of the risk locus represented by rs138213197 might be driven by other rare variants. Demonstration of an impact of two missense variants found by sequencing (c.383 and c.720, both C>A) on both PrCa cell proliferation and exemplifies this point50,52. Secondly, the rare but highly penetrant G84E HOXB13 mutation may mask weaker PrCa-risk associations of variants with a possible impact, as has been demonstrated by fine-mapping studies of the HOXB13 region53. Finally, it is likely that interactions of the HOXB13 (G84E) protein with other functionally promotes the observed impact54. This highlights the importance of follow-up studies at the molecular level for recurrent common SNPs that tag other rare or common variants with lower penetrance for such crucial molecules in prostate development. Other examples of coding causal SNPs, rs17632542 and rs61752561, are located in the KLK3 gene encoding PSA. The non-synonymous amino acid change (Ile179Thr) resulting from SNP rs17632542, with the major and minor alleles encoding isoleucine and threonine, respectively, was predicted to affect the stability of PSA14, with decreased stability of PSA harbouring the minor allele. Apart from the above mechanism, the minor C allele of SNP rs17632542 was previously predicted to have a role in PSA mRNA processing and was suggested to disturb a splicing site of the PSA transcript14. This alternative processing results in a KLK3 transcript that retains the intron between exons 3 and 4, creating a new stop codon and producing a different isoform of PSA14. Yet another PrCa-associated SNP, rs61752561 (c.304 G>A), creates a new glycosylation site in the 34th codon of KLK3, promoting a change in the biochemical properties of the PSA protein (FIG 2A)47. Interestingly, a functional role of SNP rs2066827 in the CDKN1B (p27) gene was predicted in PrCa55 a few years before its discovery by GWAS56. The T allele of SNP rs2066827, associated with the risk of advanced PrCa57, results in a missense change (V109G) within the residues of CDKN1B that interact with its negative regulator p38jab1 (p38 protein encoded by the Jab1 gene)55. These studies suggest that subtle modulations resulting from non-synonymous SNPs may contribute to prostate tumourigenesis, drawing attention to several missense substitutions in consistently reproduced PrCa-risk loci5 that have been confirmed by targeted next-generation sequencing58 (Supplementary TABLE 1). These variants are located in genes that are involved in crucial cell cycle checkpoints, including ATM, TP53 and CDKN1B (refs 55,59,60). Further functional studies of these variants are still required to delineate any effect on cellular checkpoints20.

Non-coding variants regulate gene expression.
More than 90% of GWAS-identified PrCa variants are located in non-coding regions of the genome61 and may be functionally involved in PrCa through multiple mechanisms, depending on their location within proximal (promoters62, enhancers/super-enhancers63) or distal response elements (intergenic64 or intragenic65 regions) that modify transcriptional regulation (FIG 2B,C). Variants in the 5’ untranslated region (UTR) can have a regulatory effect at both the transcriptional and translational level66. Transcriptional dysregulation of target genes is a prominent consequence of non-coding causal variants in PrCa40,67. This effect is mainly due to i) a change in TFBS motif40,64, ii) a change in DNA methylation marks55 or iii) alteration of chromatin architecture33, or a combination of these mechanisms. The sequence alteration at a single SNP might create a TFBS for a certain TF or, conversely, disrupt a pre-existing TFBS and consequently alter the formation of a complex of TFs within the region harbouring the causal SNP. It has been shown that a considerable number of PrCa-risk loci alter binding of pivotal prostate-associated TFs such as androgen receptor (AR)40,65, FOXA1 (refs 12,68-70), HOXB13 (ref. 40) and GATA2 (ref. 64). The ultimate impact of individual SNPs may be small, but the cumulative disruption/creation of TFBSs by multiple causal SNPs can affect the expression of target genes, bringing about significant changes in tumour development. The risk alleles of SNPs rs339331 and rs10993994, within an intronic region of the RFX6 gene and the upstream promoter of the MSMB gene, respectively, lead to stronger binding of HOXB13 (ref. 69) and CREB70. These changes in TFBS motifs lead to increased enhancer activity at RFX6 and reduced promoter activity of MSMB, respectively. Therefore, the observed RFX6 upregulation or lack of tumour suppressor activity of MSMB in advanced stages71 (in patients with higher-Gleason-score disease) of PrCa72 may partly be explained by the contribution of causal variants. The effect of causal variants on AR binding site modification has been demonstrated for variants within the intergenic region between KLK3 and KLK15 (ref. 64) (FIG 2), within the enhancer of SOX9 (ref. 73) and for an intronic variant in the MLPH gene65. More than 30% of PrCa-risk variants have been reported within AR binding sites and likely alter the expression of target genes regulated by AR40, thus suggesting a common biological mechanism of causal variants via modulation of the AR axis65,74,75.

Histone modifications including methylation (H3K4me3, H3K36me3, mainly within CpG islands: 5mCpG/5hmCpG) and acetylation (H3K27ac) are epigenetic consequences that might be affected by SNPs conferring a change in chromatin interactions76,77. Elevated HNF1B expression resulting from a reduced degree of promoter methylation that is associated with two SNPs, rs11649743 and rs3760511, within the HNF1B promoter suggests an epigenetic mechanism for the G allele of both SNPs78. The PrCa-risk region at 7p15.2 demonstrates long-range interactions with the HOXA locus to up-regulate the HOXA13 and adjacent HOTTIP genes42. Interestingly, previous studies predicted differential expression of HOXA13 linked to the 7p15.2 risk region40,56. The effects of non-coding causal variants could be the sum of several regulatory mechanisms if a SNP abrogates a TFBS in response elements and at the same time modifies histone status towards a more open or closed chromatin conformation, thus leading to modulated gene expression76,79. eQTLs, as the main regulatory variants, could act through all the above-mentioned mechanisms to change the expression of the implicated genes19,27,80. Therefore, leveraging the promising functionality of eQTLs together with the reproducibility of GWAS signals might be a front-line method for prioritising likely causal variants45. In concordance with this strategy, Thibodeau S.N. et al. identified shared risk loci and significant eQTLs for 51 loci of the PrCa-GWAS that are associated with 88 genes80. Recently, a study by Dadaev T. et al. showed that 40 associated regions of 100 PrCa-GWAS loci overlapped with at least one eQTL. Adding to those eQTLs, Schumacher et al. discovered that 35 of 63 newly identified risk loci coincided with eQTLs5, highlighting the benefits of using risk loci data in tandem with allele-specific gene expression information. However, given the limitations of eQTL studies (see Challenges section), it is crucial to combine the expression data with the available ChIP-seq datasets and long-distance interactions of regions harbouring PrCa-associated risk loci42. This is supported by the observation that several eQTL studies failed to detect some of the high-confidence causal variants owing to power constraints, despite the fact that they were repeatedly mapped as high-risk regions in previous GWAS and validated as cancer-causal variants by further functional studies18,81-83. In particular, ethnicity seems to be a matter of importance for the functional effect of prostate-related eQTLs19,84. Recently, the influence of haplotypes on gene expression in complex traits has been addressed, reflecting the haplotype dependency of SNP interactions85.
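
The eQTL concept used throughout this section can be illustrated with a minimal sketch. It assumes the simplest additive model, an ordinary least-squares regression of a gene's expression level on allele dosage (0, 1 or 2), with the slope interpreted as the per-allele effect on expression. The function name and the six toy samples are hypothetical; published eQTL analyses additionally adjust for covariates and correct for multiple testing, which is omitted here.

```python
def eqtl_slope(dosages, expression):
    """Least-squares slope of expression on allele dosage (per-allele effect)."""
    n = len(dosages)
    if n != len(expression) or n < 2:
        raise ValueError("need two equally sized samples of at least 2 individuals")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("all individuals share the same genotype")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    return sxy / sxx


# Toy data (hypothetical): expression rises by one unit per copy of the allele.
toy_dosages = [0, 1, 2, 0, 1, 2]
toy_expression = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]

print(f"Per-allele effect on expression: {eqtl_slope(toy_dosages, toy_expression):.2f}")
```

Because the slope is estimated from a finite sample, small per-allele effects can go undetected, which is one way to picture the power constraints mentioned above.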

Molecular effects of causal variants on ncRNA genes. Apart from the abovementioned effects at the transcriptional level, causal variants can affect gene expression at the post-transcriptional level by modulating microRNA (miRNA) binding to the 3’UTR of target genes, or their stability, altering translation efficiency33,86 (FIG 2D). The regulatory role of miRSNPs in the 3’ UTR of KLK3 (FIG 2D), VAMP-8 and MDM4 (a negative regulator of p53) has been described previously60,86,87. The critical role of ncRNAs in PrCa is highlighted by the fact that they participate in a broad range of different mechanisms that drive tumourigenesis33,88. So far, post-GWAS studies have mostly focused on two main types of ncRNAs, miRNAs and long non-coding RNAs (lncRNAs), showing a post-transcriptional regulatory effect of causal variants when they reside within the transcribed region of ncRNAs, in addition to an impact on the expression and/or stability of ncRNAs89,90. To this end, Guo et al. conducted a comprehensive post-GWAS study by integrative analysis of transcriptomic, genomic and epigenomic data33. The top-scoring lncRNA gene, PCAT1, was predicted to be modulated by a candidate causal risk SNP (rs7463708). Experimental validation demonstrated that this regulatory SNP affects binding of multiple key TFs, including AR, FOXA1 and HOXB13, in both PrCa cells and tumour samples, affecting expression of PCAT1, which in turn regulates expression of several androgen-regulated genes33. A similar strategy could be used for characterisation of other ncRNA-related causal SNPs in PrCa.

Unmasking pathways of causality in PrCa using post-GWAS

Pathway-based analyses that examine GWAS loci (PWAS) have been utilised to identify groups of genes assigned to risk loci that share specific biological processes or cellular functions in the genetic aetiology of a disease5,16. ERK/MAPK, Wnt/β-catenin, p53 and ATM signaling pathways, as well as G2/M DNA damage and Estrogen-mediated S-phase entry checkpoint regulation pathways, were found to be enriched in GWAS loci91,92. To go further, a pathway analysis strategy can be applied in post-GWAS studies to prioritise genes/pathways for subsequent investigations. In a first attempt at pathway analysis of post-GWAS data, using five candidate causal SNPs, the damaged DNA binding and inward rectifier potassium channel activity pathways were shown to be involved in PrCa91. Adding eQTLs as causal loci, together with their new target genes, pinpointed additional pathways such as Jak-STAT signaling and cell cycle-related pathways in PrCa12,93. In a recent study, Schumacher F. et al. conducted PWAS for newly found GWAS loci including GWAS-eQTL pairs. This analysis detected the PD-1 signaling pathway as the most significant pathway, in addition to other less well-known pathways in PrCa5. Understanding the connections among genes assigned to causal SNPs may lead to the elucidation of biological pathways in PrCa pathogenesis. To achieve this, we followed certain criteria (BOX 1) to gather, to the best of our knowledge, all causal variants identified so far in PrCa (Supplementary TABLE 1), and performed pathway analysis for their assigned genes using Ingenuity Pathway Analysis (IPA)94. This analysis highlights several well-known signaling pathways involved in PrCa progression as well as enrichment in cancer-related gene networks (Supplementary TABLE 2). The most significant canonical pathway in this analysis is the Antigen Presentation Pathway (p-value: 3.18E-13), encompassing several molecules, particularly those encoded by different classes of major histocompatibility complex (HLA) genes.
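
To give a sense of where a pathway enrichment p-value can come from, the sketch below uses a right-tailed hypergeometric (Fisher's exact) test, a common choice for over-representation analysis. Whether this matches the exact calculation behind the values reported here is an assumption; the function name and all counts are hypothetical and are not taken from the Supplementary Tables.

```python
from math import comb


def enrichment_p_value(hits, gene_set_size, pathway_size, background_size):
    """P(X >= hits) when gene_set_size genes are drawn from a background
    containing pathway_size pathway members (hypergeometric upper tail)."""
    max_hits = min(gene_set_size, pathway_size)
    total = comb(background_size, gene_set_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, gene_set_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Hypothetical counts: 5 of 5 candidate genes fall in a 5-gene pathway,
# out of a background of only 10 genes.
toy_p = enrichment_p_value(hits=5, gene_set_size=5, pathway_size=5, background_size=10)
print(f"Enrichment p-value: {toy_p:.5f}")
```

The smaller the p-value, the less likely it is that the observed overlap between the candidate genes and the pathway arose by chance, given the sizes of the gene list, the pathway and the background.
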
The p-values in this analysis are based on the statistical significance of the pattern match with the published data94. Based on this analysis, the majority of the genes assigned to causal variants are cancer-related, with the highest number of molecules involved in lipid metabolism, molecular transport and small molecule biochemistry networks (Supplementary TABLE 2). In particular, the largest number of genes is involved in cellular development functions. The upstream analysis of IPA was performed to uncover multi-level causal relationships, relevant to the available experimental data, of regulators indirectly connected to this gene list. NLRC5, a member of the caspase recruitment domain-containing NLR family, was identified as the top TF (p-value: 3.86E-6). Other upstream molecules, including B2M, HDAC1, HDAC2, CCND1 and CTNNB1, regulate a higher number of genes in this analysis, suggesting that they are interesting targets for follow-up studies. Of note, histone deacetylase 1 (HDAC1) regulates the highest number of genes assigned to the causal variants, such as the TERT, KLK3, MYC and SOX9 genes (FIG 3A). The AR was identified as an upstream regulator (p-value: 0.018) modulating expression of 11 genes, including TERT, NKX3-1, MYC and MSMB (FIG 3B). The AR has also been identified as an upstream regulator in previous in-silico PrCa-PWAS92. The fact that AR can activate other prostate signalling cascades such as the MAPK, Akt and JAK-STAT3 pathways95 and is a mainstream therapeutic target underscores the central importance of further clinical investigation of genes related to causal variants.

Clinical impact of functional SNPs: Bridging genotype to phenotype

The integration of GWAS and post-GWAS outcomes in PrCa screening, risk profiling and prevention in the clinic is currently ongoing9,24,96-100, although it is important to keep their limitations in mind101. Larger numbers of common genetic variants have been helpful in improving detection, surveillance and risk stratification94-96,102,103,106,107, including, remarkably, a 15-20% reduction in unnecessary biopsies31. Adding family history104 and clinical variables105 to larger SNP sets has led to better risk evaluation8,9,22,24,100,106. Of note, the captured FRR and PRS results are needed to guide genetic counselling and are capable of modifying diagnostic approaches through the estimation of genetic risk for men with a familial history of PrCa regardless of their age107. The clinical usage of post-GWAS findings in PrCa started with a few biomarkers for risk and progression of the disease, such as MSMB protein99 levels (TABLE 1) and several SNPs found in the Kallikrein region associated with PSA levels62. Integration of those causal SNPs in the PSA test could correct the bias of free/total PSA levels observed for patients carrying variants within the region, in combination with other influencing loci, in order to limit the number of false negative diagnoses31,108.

Currently, therapeutic drugs targeting AR signalling represent the most efficient treatment for patients developing PrCa107,125. The fact that a relatively large number of the causal variants identified so far have been assigned to genes regulated by the AR or impacting on AR signalling suggests that investigation of causal variants involved in AR-associated pathways has the potential to lead to more effective therapies108,109,126,127 (TABLE 1). Furthermore, the therapeutic potential of several immune checkpoint inhibitors focusing on newly found signaling pathways to enhance immune responses provides valuable hints for improving clinical management of PrCa109. Recently, pharmacogenomic studies of certain variants in DNA repair pathways have allowed risk-reduction strategies for both prognosis and prediction of therapeutic sensitivity to PARP inhibitors and platinum-based chemotherapy110. This highlights the enormous value of germline variants in the clinic, with the potential to lead to the application of population-specific biomarkers of response to a certain therapy111 in pursuit of precision medicine. However, incorporating GWAS in the clinic may be limited by the small observed effect sizes of genetic variation on trait loci76, which result from evolutionary selective pressures on GWAS risk loci. By contrast, pharmacological manipulation based on functionally implicated loci is not subject to this limitation, enhancing our chances of efficiently improving personalised PrCa treatment111. The knowledge generated from post-GWAS studies could enable us to overcome the limitation of genetic heterogeneity that arises in patients of different genotype backgrounds, leading to a broader range of potential drug targets, together with drug repurposing, to provide therapeutic interventions based on altered genes/pathways5.

Challenges: Caution urged in post-GWAS analysis

Previous GWAS focused on PrCa susceptibility regardless of the specific status of the disease. Although meta-analysis of available GWAS data showed the specificity of some loci for aggressiveness or survival112, the vast majority of SNPs identified to date do not discriminate patients with a poorer prognosis. Investigation of such loci in future GWAS might discover many potential associations with PrCa aggressiveness113, which is a clinically more relevant outcome. Apart from this, some challenges encountered during analysis of the valuable GWAS data, which need to be taken into consideration in order to maximise the accuracy and precision of post-GWAS studies, are noted below:

- Genetic association of causal SNPs could be missed by current GWAS. The low resolution of GWAS does not allow full coverage of the SNP density present in the genome and thus of candidate causal loci114. In particular, some of the rare variants with a minor allele frequency (MAF) of 1-5% might be missed in GWAS, fine-mapping and imputation statistical procedures115. Rare variants with greater effect sizes116 might confer highly deleterious effects on key oncogenes in PrCa48; thus, it is crucial to include them in the subsequent post-GWAS analysis. To detect copy number variations (CNVs) together with small insertions/deletions117 in PrCa, many current investigations are focusing on high-throughput sequencing-based GWAS118. However, this is technically challenging owing to the low coverage of current sequencing technologies. Imputation methods also have their own limitations, which can be minimised by increasing the number of samples in reference panels. The Haplotype Reference Consortium (HRC) is currently the largest reference panel, combining other reference panels such as 1000 Genomes Project data together with sequencing data and consisting of 64,976 haplotypes119. Since the reference panels predominantly contain European ancestry data, and given the genetic heterogeneity among various populations, more diverse reference panels including sets of world-wide populations are needed to obtain more accurate imputation results for a given population, especially for rare variants.

- The target genes of regulatory SNPs are not necessarily the nearest/embedded genes. The nearest gene is not always regulated by the closest response element; thus, assigning a gene to a causal SNP is difficult and requires experimental validation. A typical example of how misleading the traditional approach can be is the regulatory variant SNP rs9930506, located within an intronic region of the FTO gene, which was originally considered to be its target gene, leading to drug design and preclinical studies in adiposity120. Further studies demonstrated that this variant is actually functionally connected to another gene (IRX3) via a long-range interaction (encompassing nearly 2 Mb)121. The regulatory effect of the vast majority of functional germline variants identified as eQTLs highlights the need for high-throughput techniques empowering the discovery of allele-specific regulatory elements and their target gene(s)74,122,123 (BOX 2). Additionally, functional studies of risk loci should include different modulators in order to yield genuine discoveries of causality. Protein (pQTL), splicing (sQTL), metabolite (metQTL) and methylation (mQTL)77 QTL studies, together with eQTL studies, may be useful for examining correlations between proteomic, epigenomic and transcriptomic data along with genomic data.

- Overlooking tumour microenvironment and tissue-specificity. A lack of tissue/tissue-group specificity in datasets can hinder the discovery of functional SNPs. Only limited resources are available for tissue-specific expression data. This is of high importance in the case of regulatory variants, owing to the strong degree of tissue-specificity in crucial interactions of the regulatory machinery124. Moreover, some of the functional effects of variants might be manifested only in specific conditions, including cell cycle phases and signals from the microenvironment, depending on relevant or related tissue types76. Aside from the original disease-relevant tissue, related tissue(s) can harbour functional variants that are involved in PrCa manifestations86. Notably, cells from the and other cells in the tumour microenvironment (fibroblasts, adipocytes) play an important role in cancer development and progression along with cancer cells. In-vitro 3D models might partly be a solution to recapitulate the in-vivo complexity of tumorigenesis125. Integration of data from all available resources for specific cancer cell type/tissue(s) might help to overcome this challenge. Understanding the transcriptome at the single-cell level (scQTLs) will provide a precise biological view of transcriptomic architecture regardless of i) cell cycle stage, ii) epigenetic differences and iii) stochastic differences, all of which markedly affect the accuracy of results from whole-tissue/cell line population studies126. Induced pluripotent stem cells (iPSCs) can also be powerful models for functional studies on variants in differentiated cell types of prostate tissue127.

- Consideration of genetic, epigenetic and environmental factors. Phenotypic heterogeneity in PrCa patients implies the involvement of additional genetic/epigenetic (oncogenome)76 and environmental factors128. For example, germline and somatic variant enrichments within the genes involved in cell growth suggest a coordinated interplay between those variants in PrCa118 and might be helpful to predict risk, as has been shown in breast cancer129,112. Recent efforts investigating SNP-SNP interactions try to expand association studies to a higher level of cross-associations of risk loci130. SNP-SNP interactions in angiogenesis genes such as EGFR, MMP16 and CSF1 have been shown to be associated with aggressive PrCa130,131. Moreover, enrichment of some variants in histone acetylation profiles suggests that germline variants could also contribute to PrCa at the epigenetic level63. Environmental factors can affect the overall impact of functional loci at various levels and make the interpretation of consequences even more complex132. Studies on non-genetic factors such as alcohol consumption133 or height134 seek evidence of gene-environment interactions, beyond genetic factors, that affect PrCa incidence. The Mendelian Randomization (MR) approach can be used to investigate the causality of other risk factors (obesity, smoking, etc.) in PrCa. Methods such as Summary-data-based Mendelian Randomization (SMR), which builds on MR using GWAS and gene expression data, are helpful to assess the causality of GWAS-eQTL hits for PrCa45. These methods emphasise the high utility of summary statistics: because they do not require the investigator to obtain individual-level data, the analyses are more cost-effective and less computationally demanding. Likely cross-phenotype associations for intermediate phenotypes of PrCa proposed by phenome-wide association studies (PheWAS)135 can also contribute by providing a landscape of associations in PrCa, enabling the identification of common genes/pathways in the biology of the disease that are valuable in the development of new therapeutic strategies136.
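To make the summary-statistics idea concrete, the following is a minimal sketch of a ratio-type Mendelian randomization estimate for a single variant, assuming only two inputs: the variant's effect on the exposure (for example, an eQTL effect on gene expression) and its effect on the outcome (for example, a GWAS log odds ratio for PrCa). The function name and the numerical values are hypothetical, the standard error uses a first-order delta-method approximation, and the sketch omits the additional heterogeneity testing that dedicated tools perform; it is not the published SMR software.

```python
import math


def smr_style_estimate(beta_zx, se_zx, beta_zy, se_zy):
    """Ratio (Wald-type) MR estimate from summary statistics.

    beta_zx, se_zx: effect of the variant on the exposure (e.g. an eQTL
                    effect on gene expression) and its standard error.
    beta_zy, se_zy: effect of the same variant on the outcome (e.g. a GWAS
                    log odds ratio) and its standard error.
    """
    if beta_zx == 0 or se_zx <= 0 or se_zy <= 0:
        raise ValueError("need a non-zero exposure effect and positive standard errors")

    # Estimated effect of the exposure on the outcome.
    beta_xy = beta_zy / beta_zx

    # First-order delta-method approximation of the standard error.
    se_xy = math.sqrt(se_zy**2 / beta_zx**2 + beta_zy**2 * se_zx**2 / beta_zx**4)

    # Test statistic built from the two z-scores; approximately chi-square (1 df).
    z_zx = beta_zx / se_zx
    z_zy = beta_zy / se_zy
    t_stat = (z_zx**2 * z_zy**2) / (z_zx**2 + z_zy**2)

    # Upper-tail probability of a chi-square with 1 df, standard library only.
    p_value = math.erfc(math.sqrt(t_stat / 2.0))
    return beta_xy, se_xy, t_stat, p_value


# Hypothetical summary statistics for one variant (illustrative values only).
beta_xy, se_xy, t_stat, p_value = smr_style_estimate(
    beta_zx=0.5, se_zx=0.05, beta_zy=0.1, se_zy=0.02
)
print(f"beta_xy={beta_xy:.3f} se={se_xy:.4f} T={t_stat:.2f} p={p_value:.2e}")
```

Only per-variant effect sizes and standard errors enter the calculation, which is why such analyses can be run on published summary tables without access to individual-level genotypes.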

Conclusion and future directions:

Although both GWAS and post-GWAS have provided the first detailed understanding of the genetic component of PrCa, a substantial portion of PrCa heritability remains unexplained. Post-GWAS is emerging as a way of translating the knowledge from disease-specific sets of variants into biologically, clinically and therapeutically meaningful factors. Small effects of SNPs can combine to exert a cumulative impact on the network of genes assigned to causal loci, perturbing biological processes and driving cellular dysfunction137. This is supported by the proposed polygenic and omnigenic models of cancer progression. The two models differ in how genes are involved in the final phenotype: the omnigenic model proposes a disproportionate degree of involvement of all genes, whereas the polygenic model proposes an even contribution from a certain number of genes. Indeed, the proportion of involvement of GWAS-assigned genes depends on the trait/disease complexity138. Post-GWAS might be a practical strategy to prove that “highly connected genes” among GWAS findings are the main contributors of variants in key biological pathways, as proposed in a recent omnigenic model139.

Investigating the interplay of genetic variants either with epigenetic and environmental factors or with other germline/somatic mutations112 should shed light on the functional mechanisms of causal variants in future studies. Germline variants may cooperate with somatic mutations as pre-existing tumorigenesis drivers129. A focus on the overlapping genes may lead to the discovery of novel key mechanisms underlying prostate oncogenesis. At a systems level, this approach could aid the interpretation of the gene networks and deregulated pathways through which those genes act. These might involve complex multivariate interactions of different relevant regulatory networks at the transcriptional, post-transcriptional and post-translational levels, as well as protein interactions involved in various inter/intracellular signalling. Additional approaches that have not been explored thoroughly, such as epigenome-wide association studies (EpiWAS/EWAS)77 and various gene expression data analyses with respect to trans-eQTL and cis-eQTL data of ncRNAs, will substantiate available data. Finally, high-throughput algorithms, screening tools for validating functional SNPs and experimental strategies are necessary in order to effectively detect the minuscule impact of functional variants, as opposed to the absolute effect of mutations.

At a glance

- Post-GWAS has empowered our ability to decode the potential biological role of risk-associated loci in prostate cancer.

- Determination of the biological role of risk loci residing in non-coding DNA regions is particularly challenging since the nearest gene is not always the target gene because of the highly dynamic nature of the genome.

- Fine-mapping of GWAS loci by the presence of expression quantitative trait loci, in order to capture the combined yet consistent supporting data of overlapping assigned genes, might be a practical analytical paradigm for prioritising genes and regulatory elements at GWAS loci for follow-up functional studies.

- Leveraging this current strategy, regardless of its limitations, has provided us with functional germline variants in prostate cancer, yielding new insights into the causal impact of these loci in the aetiology of the disease.

- Investigation of pathway enrichments of the genes assigned to causal variants can help us to overcome the current limitation in detecting the more subtle impacts of causal variants when they are studied separately.

- All the examples in this review not only highlight the need for deeper analysis of original GWAS data but also emphasise the potential of GWAS signals worth pursuing to enhance our ability to discover the effects of risk loci on the physiological/pathological state of a prostate cell, which can ultimately be used for molecular stratification of prostate cancer to guide treatment.

Glossary:

Polygenic risk score: A polygenic risk score, also called a genetic risk score, is a number based on variation in multiple genetic loci and their associated weights.

Tag/lead SNP: In GWAS, the tag SNP is the SNP most significantly associated with a trait/disease within a genomic region according to the initial design of the GWAS.

Linkage disequilibrium (LD) block: Non-random correlation of alleles in a haplotype; the degree of correlation is estimated by the r2 value, which ranges from 0 to 1; r2 = 0 shows complete linkage equilibrium while r2 > 0.9 represents highly correlated LD SNPs.

Haplotype: is a group of alleles located on a chromosome that are likely to be inherited together.

Related/correlated SNP: Dependent variants that lie in the same LD block as the GWAS tag risk locus.

Coding SNPs: Variants located within exonic regions of the genes.

Non-coding/Regulatory SNPs: Functional variants located within genic, intragenic or intergenic regions of the genome that modulate (assigned) gene expression.

Heritability: Genetic component of a trait/disease.

Segregation study: is a method of estimating the genetic inheritance of a disease using family data.

Familial risk: Inherited predisposition of a disease in an individual.

Pleiotropic effect: occurs when one gene influences two or more diseases.

Twin studies: Large-scale studies that evaluate the role of genetic and environmental influences on the development of a disease by comparison between monozygotic and dizygotic twins.

Untranslated regions (UTRs): The sequences at 3’ (3’UTR) and 5’ (5’UTR) of a gene which are not translated and often have a regulatory impact.

Super-enhancer: A group of putative enhancers in close genomic proximity with unusually high levels of TF binding sites.

Intragenic variants: Germline variations located within a gene.

Intergenic variants: Germline variations located between genes.

Copy number variations (CNVs): are a distinct class of germline polymorphisms consisting of longer sequences than small insertion/deletions.

Transcriptional regulation: regulation of gene activity at the level of the conversion of DNA to RNA.

Post-transcriptional regulation: regulation of gene activity by impacting on RNA stability/availability before being translated to protein.

Translational regulation: regulation of gene activity at the level of the conversion of RNA to protein.

Transcriptome: The set of all RNA molecules in a cell or a population of cells (a tissue).

Epigenetic: The modifications on DNA that involve molecules other than nucleotides, such as histones.

Gleason-score: is a patho-histological score from a prostate biopsy or surgical sample used to determine the prognostic risk level of men with prostate cancer.

CpG islands: DNA sequences with high repetition of the nucleotides Cytosine and Guanine.

5mCpG: 5-methylcytosine, Cytosine modification of the dinucleotide CpG in the DNA regulatory region.

5hmCpG: 5-hydroxymethylcytosine (5hmCpG) methylation, another Cytosine modification of the dinucleotide CpG in the DNA regulatory region.

H3K36me3: Epigenetic mark of trimethylation of Histone 3 lysine 36.

H3K27ac: Epigenetic mark of acetylation of Histone 3 lysine 27.

Genetic heterogeneity: Different frequencies and combinations of germline variants in populations of different ethnicities.

MAF (Minor Allele Frequency): The frequency of the less common allele of a SNP in a given population.

Mendelian Randomization (MR): is a method in epidemiology using inherited genetic variants to infer a causal relationship between an exposure and a disease outcome.

cis-eQTLs: Local eQTLs that are located on the same chromosome as their target genes.

Trans-eQTLs: Distant eQTLs that are located on a different chromosome from their target genes.
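Three of the quantities defined in this glossary (MAF, the LD r2 value and the polygenic risk score) are simple calculations, and a minimal sketch may help readers who are new to them. The sketch assumes genotypes coded as 0, 1 or 2 copies of one allele, known haplotype and allele frequencies for the r2 calculation, and a weighted-sum form of the risk score; all function names, SNP identifiers and numerical values are hypothetical and chosen only for illustration.

```python
def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as 0, 1 or 2 copies of one allele per individual."""
    allele_freq = sum(genotypes) / (2 * len(genotypes))
    # The minor allele is whichever of the two alleles is less common.
    return min(allele_freq, 1 - allele_freq)


def ld_r2(p_ab, p_a, p_b):
    """LD r2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2.
    p_a, p_b: frequencies of allele A and allele B, respectively.
    """
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r2 is undefined when either SNP is monomorphic")
    d = p_ab - p_a * p_b  # deviation from linkage equilibrium
    return d * d / denominator


def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele counts (assumed form of the score)."""
    return sum(weights[snp] * dosages[snp] for snp in weights)


# Hypothetical inputs (not taken from any study).
genotypes = [0, 0, 1, 0]                        # four individuals at one SNP
weights = {"snp_a": 0.2, "snp_b": -0.1}         # e.g. per-allele log odds ratios
dosages = {"snp_a": 2, "snp_b": 1}              # one individual's allele counts

print(minor_allele_frequency(genotypes))
print(ld_r2(p_ab=0.5, p_a=0.5, p_b=0.5))
print(polygenic_risk_score(dosages, weights))
```

In practice the weights would usually come from GWAS effect estimates, and r2 would be estimated from phased or reference-panel data; both are left as inputs here.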

BOX 1: Causal variants

Leveraging the allele-specific regulatory effects of the expression quantitative trait loci (eQTLs) within GWAS-risk loci has been the front-line method in recent post-GWAS studies in prostate cancer (PrCa)40,56,64. When a GWAS hit corresponds to an eQTL, there is a high probability that a particular likely causal variant influences the disease by affecting the shared assigned gene140-143. In fact, the majority of causal variants described in this review were discovered recently by this combined approach (Supplementary TABLE 1). The eQTL-GWAS pairs that were predominantly of European single ancestry while drastically less significant within other populations19 were excluded, in addition to eQTL-GWAS pairs with borderline p-values. Of note, the criterion used in the abovementioned studies to implicate a gene (known as the assigned gene for GWAS and the e-Gene for eQTL studies), which is mainly the closest neighbouring gene, is likely to overlook long-distance interactions and the possible engagement of multiple SNPs within a linkage disequilibrium (LD) block. Indeed, developing strategies that implement more comprehensive criteria will help to define causal variants and their mechanisms. Cross-ancestry fine-mapping continues and will help to index SNPs within complex LD patterns to map the true causal variant responsible for the association signal or causality, regardless of the differences in genetic architecture between different ethnicities20. Apart from eQTL-GWAS pairs, other causal variants in this review were selected according to the following criteria. Firstly, we included variants whose functionality has been validated by a range of experimental strategies including luciferase reporter assay, DNase I hypersensitivity assay, chromatin conformation capture (3C), electrophoretic mobility shift assay (EMSA), Hi-C, CRISPR/Cas9, proliferation and apoptosis assays, in-vivo transgenic reporter assay, and microRNA mimic assay. Secondly, we included SNPs that are likely to have some effect on transcription factor binding sites, as indicated by chromatin immunoprecipitation sequencing motif analysis. Thirdly, we included the missense mutations that have been consistently identified in PrCa-GWAS and assigned to the genes implicated in PrCa development. The causality of these variants (except for rs61752561; REF. 47) has been shown by computational annotation, and further delineation is still required. Nevertheless, our current knowledge and technology overlook the highly active nature of the genome, and it is very challenging to implicate a gene/mechanism to a single variant with complete certainty. As we develop better techniques enforcing rigorous experimental criteria, we will get closer to clearly defining causal variants and confidently declaring the gene(s) implicated in the impact of those causal variants in prostate oncogenesis.

BOX 2: Current strategies to validate candidate causal SNPs

Considering that the post-GWAS field is just starting to bear fruit, the number of experimentally validated causal variants is very small; studies have mainly focused on tagged SNPs while providing limited molecular understanding of the mechanism and bypassing the highly dynamic nature of the genomic elements. Nevertheless, functionally annotated variants (i.e. candidate variants that are likely to have a functional role according to in-silico analysis and/or existing datasets) are extremely valuable to help overcome the complexity of the newborn post-GWAS field, in combination with the rapid introduction of new technologies that facilitate the evaluation of the subtle effects of germline variants. To identify long-distance DNA interactions of regulatory elements harbouring functional variants, Chromosome Conformation Capture (3C) and its derivatives (4C, 5C, Hi-C, Capture-C, Capture-Hi-C and tethered conformation capture) have been utilised, with the data made publicly available143. Current genome editing methods120 have been developed to examine the causality of candidate variants in response elements. For example, CRISPR/Cas9 and Perturb-seq144 use perturbation or interrogation of the function of those regions harbouring causal variants to confirm gain/loss within the genome or transcriptome, respectively. Massively Parallel Reporter Assays (MPRAs)119, STARR-seq139 and Formaldehyde-Assisted Isolation of Regulatory Elements sequencing (FAIRE-seq) can provide the ability to test thousands of sequences and variants for any impact on regulatory activity142. These methods help us to confirm whether the loci are involved in modifying the chromatin structure; however, we still need to narrow down the list of candidate causal variant(s) within a particular locus to further validate assigned risk/eQTL loci–target gene relationships79. Data integration that combines different components into one assay can help to pinpoint the exact functional variant(s). Following the post-GWAS road, standard experimental strategies, such as xenografts and proliferation, invasion, migration and soft agar colony formation assays, help to evaluate the cellular impact of screened loci; the choice of strategy depends on the location of the variant and its predicted in-silico impact on the target gene(s). Experimental approaches such as protein modelling and biochemical analyses (e.g. protein stability measurement, or enzymatic activity if the gene product is an active enzyme) in related biological processes can also be performed. New technologies have evolved for the in-vitro study of changes in epigenomic signatures, such as methylation, resulting from functional variants using a particular cancer-related cell line71. Finally, it is necessary to follow up with allele replacement experiments in other model systems145 to determine the in-vivo contribution, if any, of causal variants to the PrCa phenotype.

Figure 1: Representation of the post-GWAS pipeline. More than 160 common risk loci in prostate cancer have been identified by GWAS over 11 years, from 2006-2018; these loci can be subjected to a systematic functional post-GWAS analysis as follows: 1) Computational Post-GWAS: Statistical/bioinformatic filtering is the first step facilitating post-GWAS data interpretation. Fine-mapping and imputation are employed within a multivariate region to characterise, respectively, individual SNPs in an LD block/risk region and loci that were not genotyped in the original GWAS. Data integration of established resources based on pre-existing biological information regarding the SNPs themselves or their assigned genes is recommended. The chromatin immunoprecipitation sequencing (ChIP-seq) peaks from the Encyclopedia of DNA Elements (ENCODE)146, the National Institutes of Health (NIH) Roadmap Epigenomics76 and BLUEPRINT Epigenome Consortia (see URLs) provide a comprehensive map of methylation, histone modifications, chromatin accessibility and small RNA transcripts in several cancers. Furthermore, the Genotype Tissue Expression (GTEx) project and its extended version (eGTEx)147 have established eQTL datasets that are enormously helpful in annotating the functional impact of germline variants on gene expression. 2) Experimental Post-GWAS: Functional study of top-ranked variants requires a rigorous in-vitro/in-vivo experimental approach to confirm whether a causal variant is involved in the expected phenotype (FIG 2, BOX 2). 3) Clinical Post-GWAS: Translation of GWAS data has started with the use of GWAS SNP-based polygenic risk scores as a biomarker for prostate cancer risk stratification. Post-GWAS might enable us to generate large-scale functional maps of the cancer genome to better understand the disease, improve clinical decision making, and aid in the search for new prostate cancer therapeutic strategies (TABLE 1).

Figure 2: Schematic illustration of the approaches routinely employed in experimental validation of causal SNPs, describing the known causal variants at the Kallikrein (KLK) locus in prostate cancer. The original GWAS data identified a high-risk region on chromosome 19 (19q13) harbouring members of the KLK family (KLK1-15). Computational post-GWAS utilised various algorithms and datasets (detailed in REFs. 36-39) based on the location of candidate variants to prioritise them for experimental validation of functional effects: (A) A missense variant (Asp-Asn; D-N) resulting from SNP rs61752561 A/G in KLK3 creates a new glycosylation site in PSA47. (B) A SNP, rs2739448, located in the promoter region of the KLK3 gene leads to up-regulation of the KLK3 gene. This effect was validated by a luciferase reporter assay examining the differential promoter activity between the minor and major alleles62. (C) A SNP, rs2659051, located within an intergenic region between KLK3 and KLK15 is associated with higher expression of KLK15. The ChIP-qPCR method was performed to evaluate the impact of this causal SNP on AR, FOXA1, GATA2 and HOXB13 transcription factor binding motifs as well as a change in H3K27ac enrichment within the enhancer64. (D) Reporter gene and microRNA mimic assays were conducted to examine the post-transcriptional effect of the miRSNP rs1058205 within the 3’ untranslated region (3’UTR) of the KLK3 gene.

Figure 3: Pathway enrichment analysis of causal variants in prostate cancer. Analysis of the genes assigned to prostate cancer-causal SNPs (Supplementary TABLE 1) by Ingenuity Pathway Analysis (IPA) demonstrated cellular development as the main functional mechanism of the gene list (based on both the highest number of involved genes: 58 genes, and significant p-value range: 1.14E-02-4.98E-05). (A) Histone deacetylase 1 (HDAC1) is identified as one of the top upstream transcription factors (TFs) (p-value: 1.26E-05) regulating the highest number of genes, 13 genes, including TERT, KLK3, MYC and SOX9 genes. Edges (lines and arrows between nodes) represent direct (solid lines) and indirect (dashed lines) interactions between molecules. Node shapes represent functional classes of gene products; defined in the figure. (B) IPA pathway design consisting of genes involved in cell development including the Androgen Receptor (AR) as a well-established upstream regulator demonstrates a functional role of AR in direct regulation of 11 downstream molecules (Supplementary TABLE 2) including known genes in prostate cell development. There is an activation relationship between AR and other molecules presented in this figure except for TERT and ATM. TERT decreases the expression of AR while AR decreases ATM phosphorylation in prostate cells.

Dataset URLs:

GTEX: https://www.gtexportal.org/home/

TCGA: https://cancergenome.nih.gov/

ENCODE: https://www.encodeproject.org/

BLUEPRINT Epigenome: http://www.blueprint-epigenome.eu/

International Cancer Genome Project: http://icgc.org/

Roadmap: http://www.roadmapepigenomics.org/

Haplotype Reference Consortium (HRC): http://www.haplotype-reference-consortium.org/

References

1 Bell, K. J., Del Mar, C., Wright, G., Dickinson, J. & Glasziou, P. Prevalence of incidental prostate cancer: A systematic review of autopsy studies. Int J Cancer 137, 1749-1757, doi:10.1002/ijc.29538 (2015).

2 Huncharek, M., Haddock, K. S., Reid, R. & Kupelnick, B. Smoking as a risk factor for prostate cancer: a meta-analysis of 24 prospective cohort studies. Am J Public Health 100, 693-701, doi:10.2105/AJPH.2008.150508 (2010).

3 Zhao, J., Stockwell, T., Roemer, A. & Chikritzhs, T. Is alcohol consumption a risk factor for prostate cancer? A systematic review and meta-analysis. BMC Cancer 16, 845, doi:10.1186/s12885-016-2891-z (2016).

4 Mucci, L. A. et al. Familial Risk and Heritability of Cancer Among Twins in Nordic Countries. JAMA 315, 68-76, doi:10.1001/jama.2015.17703 (2016).

5 Schumacher, F. R. et al. Association analyses of more than 140,000 men identify 63 new prostate cancer susceptibility loci. Nature Genetics, doi:10.1038/s41588-018-0142-8 (2018). The authors identified 63 new associated loci in PrCa and they also reported eQTLs for those loci utilising TCGA data.

6 Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747-753, doi:10.1038/nature08494 (2009).

7 MacArthur, J. et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res 45, D896-D901, doi:10.1093/nar/gkw1133 (2017).

8 Mikropoulos, C., Goh, C., Leongamornlert, D., Kote-Jarai, Z. & Eeles, R. Translating genetic risk factors for prostate cancer to the clinic: 2013 and beyond. Future Oncol 10, 1679-1694, doi:10.2217/fon.14.72 (2014). This article shows the progress of translational application of GWAS in PrCa.

9 Benafif, S. & Eeles, R. Genetic predisposition to prostate cancer. Br Med Bull 120, 75-89, doi:10.1093/bmb/ldw039 (2016).

10 Freedman, M. L. et al. Principles for the post-GWAS functional characterization of cancer risk loci. Nat Genet 43, 513-518, doi:10.1038/ng.840 (2011).

11 Edwards, S. L., Beesley, J., French, J. D. & Dunning, A. M. Beyond GWASs: illuminating the dark road from association to function. Am J Hum Genet 93, 779-797, doi:10.1016/j.ajhg.2013.10.012 (2013). References 10 & 11 are two of the first studies proposing the post-GWAS.

12 Jia, P., Liu, Y. & Zhao, Z. Integrative pathway analysis of genome-wide association studies and gene expression data in prostate cancer. BMC Syst Biol 6 Suppl 3, S13, doi:10.1186/1752-0509-6-S3-S13 (2012).

13 Jiang, J., Cui, W., Vongsangnak, W., Hu, G. & Shen, B. Post genome-wide association studies functional characterization of prostate cancer risk loci. BMC Genomics 14 Suppl 8, S9, doi:10.1186/1471-2164-14-S8-S9 (2013).

14 Kote-Jarai, Z. et al. Identification of a novel prostate cancer susceptibility variant in the KLK3 gene transcript. Hum Genet 129, 687-694, doi:10.1007/s00439-011-0981-1 (2011).

15 Kote-Jarai, Z. et al. Fine-mapping identifies multiple prostate cancer risk loci at 5p15, one of which associates with TERT expression. Hum Mol Genet 22, 2520-2528, doi:10.1093/hmg/ddt086 (2013).

16 Dadaev, T. et al. Fine-mapping of prostate cancer susceptibility loci in a large meta-analysis identifies candidate causal variants. Nat Commun 9, 2256, doi:10.1038/s41467-018-04109-8 (2018). This study used a fine-mapping approach to find causal variants for identified PrCa-risk loci using an integrative approach of DNA variation and gene expression data.

17 Jones, D. Z. et al. The impact of genetic variants in inflammatory-related genes on prostate cancer risk among men of African Descent: a case control study. Hered Cancer Clin Pract 11, 19, doi:10.1186/1897-4287-11-19 (2013).

18 Ahmadiyeh, N. et al. 8q24 prostate, breast, and colon cancer risk loci show tissue-specific long-range interaction with MYC. Proc Natl Acad Sci U S A 107, 9742-9746, doi:10.1073/pnas.0910668107 (2010).

19 Grisanzio, C. et al. Genetic and functional analyses implicate the NUDT11, HNF1B, and SLC22A3 genes in prostate cancer pathogenesis. Proc Natl Acad Sci U S A 109, 11252-11257, doi:10.1073/pnas.1200853109 (2012).

20 Benafif, S., Kote-Jarai, Z., Eeles, R. A. & Consortium, P. A Review of Prostate Cancer Genome-Wide Association Studies (GWAS). Cancer Epidemiol Biomarkers Prev 27, 845-857, doi:10.1158/1055-9965.EPI-16-1046 (2018).

21 Cooper, G. M. & Shendure, J. Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Nat Rev Genet 12, 628-640, doi:10.1038/nrg3046 (2011).

22 Dias, A., Kote-Jarai, Z., Mikropoulos, C. & Eeles, R. Prostate Cancer Germline Variations and Implications for Screening and Treatment. Cold Spring Harb Perspect Med, doi:10.1101/cshperspect.a030379 (2017).

23 Amin Al Olama, A. et al. Multiple novel prostate cancer susceptibility signals identified by fine-mapping of known risk loci among Europeans. Hum Mol Genet 24, 5589-5602, doi:10.1093/hmg/ddv203 (2015).

24 Sud, A., Kinnersley, B. & Houlston, R. S. Genome-wide association studies of cancer: current insights and future perspectives. Nat Rev Cancer 17, 692-704, doi:10.1038/nrc.2017.82 (2017).

25 Chung, C. C. et al. Fine mapping of a region of chromosome 11q13 reveals multiple independent loci associated with risk of prostate cancer. Hum Mol Genet 20, 2869-2878, doi:10.1093/hmg/ddr189 (2011).

26 Laitinen, V. H. et al. Fine-mapping the 2q37 and 17q11.2-q22 loci for novel genes and sequence variants associated with a genetic predisposition to prostate cancer. Int J Cancer 136, 2316-2327, doi:10.1002/ijc.29276 (2015).

27 Xu, X. et al. Variants at IRX4 as prostate cancer expression quantitative trait loci. Eur J Hum Genet 22, 558-563, doi:10.1038/ejhg.2013.195 (2014).

28 Chen, R., Ren, S. & Sun, Y. Genome-wide association studies on prostate cancer: the end or the beginning? Protein Cell 4, 677-686, doi:10.1007/s13238-013-3055-4 (2013).

29 Grisanzio, C. & Freedman, M. L. Chromosome 8q24-Associated Cancers and MYC. Genes Cancer 1, 555-559, doi:10.1177/1947601910381380 (2010).

30 Johanneson, B. et al. Fine mapping of familial prostate cancer families narrows the interval for a susceptibility locus on chromosome 22q12.3 to 1.36 Mb. Hum Genet 123, 65-75, doi:10.1007/s00439-007-0451-y (2008).

31 Helfand, B. T. et al. Personalized prostate specific antigen testing using genetic variants may reduce unnecessary prostate biopsies. J Urol 189, 1697-1701, doi:10.1016/j.juro.2012.12.023 (2013).

32 Hoffmann, T. J. et al. Genome-wide association study of prostate-specific antigen levels identifies novel loci independent of prostate cancer. Nat Commun 8, 14248, doi:10.1038/ncomms14248 (2017).

33 Guo, H. et al. Modulation of long noncoding RNAs by risk SNPs underlying genetic predispositions to prostate cancer. Nat Genet 48, 1142-1150, doi:10.1038/ng.3637 (2016). This article demonstrates an example of the successful application of the post-GWAS workflow proposed in this review.

34 Morris, E. V. & Edwards, C. M. Bone Marrow Adipose Tissue: A New Player in Cancer Metastasis to Bone. Front Endocrinol (Lausanne) 7, 90, doi:10.3389/fendo.2016.00090 (2016).

35 Shahedi, K. et al. Genetic variation in the COX-2 gene and the association with prostate cancer risk. Int J Cancer 119, 668-672, doi:10.1002/ijc.21864 (2006).

36 Khurana, E. et al. Role of non-coding sequence variants in cancer. Nat Rev Genet 17, 93-108, doi:10.1038/nrg.2015.17 (2016).

37 Schork, A. J. et al. All SNPs are not created equal: genome-wide association studies reveal a consistent pattern of enrichment among functionally annotated SNPs. PLoS Genet 9, e1003449, doi:10.1371/journal.pgen.1003449 (2013).

38 Coetzee, S. G., Rhie, S. K., Berman, B. P., Coetzee, G. A. & Noushmehr, H. FunciSNP: an R/bioconductor tool integrating functional non-coding data sets with genetic association studies to identify candidate regulatory SNPs. Nucleic Acids Res 40, e139, doi:10.1093/nar/gks542 (2012).

39 Patnala, R., Clements, J. & Batra, J. Candidate gene association studies: a comprehensive guide to useful in silico tools. BMC Genet 14, 39, doi:10.1186/1471-2156-14-39 (2013).

40 Whitington, T. et al. Gene regulatory mechanisms underpinning prostate cancer susceptibility. Nat Genet 48, 387-397, doi:10.1038/ng.3523 (2016). The authors discuss the regulatory potential of non-coding risk loci by application of the ChIP-sequencing concept to explore upstream regulators.

41 Corradin, O. & Scacheri, P. C. Enhancer variants: evaluating functions in common disease. Genome Med 6, 85, doi:10.1186/s13073-014-0085-3 (2014).

42 Luo, Z., Rhie, S. K., Lay, F. D. & Farnham, P. J. A Prostate Cancer Risk Element Functions as a Repressive Loop that Regulates HOXA13. Cell Rep 21, 1411-1417, doi:10.1016/j.celrep.2017.10.048 (2017). This paper provides an example of how a regulatory SNP leads to gene expression variation at a distant gene. It also highlights the necessity for fine-mapping of identified associated regions in order to discover a promising causal variant.

43 McVicker, G. et al. Identification of genetic variants that affect histone modifications in human cells. Science 342, 747-749, doi:10.1126/science.1242429 (2013).

44 Bell, J. T. et al. DNA methylation patterns associate with genetic and gene expression variation in HapMap cell lines. Genome Biol 12, R10, doi:10.1186/gb-2011-12-1-r10 (2011).

45 Zhu, Z. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat Genet 48, 481-487, doi:10.1038/ng.3538 (2016).

46 Gorlov, I. P., Gallick, G. E., Gorlova, O. Y., Amos, C. & Logothetis, C. J. GWAS meets microarray: are the results of genome-wide association studies and gene-expression profiling consistent? Prostate cancer as an example. PLoS One 4, e6511, doi:10.1371/journal.pone.0006511 (2009).

47 Song, E. et al. Characterization of the Glycosylation Site of Human PSA Prompted by Missense Mutation using LC-MS/MS. J Proteome Res 14, 2872-2883, doi:10.1021/acs.jproteome.5b00362 (2015).

48 MacArthur, D. G. et al. A systematic survey of loss-of-function variants in human protein-coding genes. Science 335, 823-828, doi:10.1126/science.1215040 (2012).

49 Ewing, C. M. et al. Germline mutations in HOXB13 and prostate-cancer risk. N Engl J Med 366, 141-149, doi:10.1056/NEJMoa1110000 (2012).

50 Cardoso, M., Maia, S., Paulo, P. & Teixeira, M. R. Oncogenic mechanisms of HOXB13 missense mutations in prostate carcinogenesis. Oncoscience 3, 288-296, doi:10.18632/oncoscience.322 (2016). This article is not post-GWAS, although, it demonstrates the use of functional studies for two coding causal variants in HOXB13.

51 Sipeky, C. et al. Synergistic interaction of HOXB13 and CIP2A predispose to aggressive prostate cancer. Clin Cancer Res, doi:10.1158/1078-0432.CCR-18-0444 (2018).

52 Maia, S. et al. Identification of Two Novel HOXB13 Germline Mutations in Portuguese Prostate Cancer Patients. PLoS One 10, e0132728, doi:10.1371/journal.pone.0132728 (2015).

53 Saunders, E. J. et al. Fine-mapping the HOXB region detects common variants tagging a rare coding allele: evidence for synthetic association in prostate cancer. PLoS Genet 10, e1004129, doi:10.1371/journal.pgen.1004129 (2014).

54 Sipeky, C. et al. Synergistic Interaction of HOXB13 and CIP2A Predisposes to Aggressive Prostate Cancer. Clin Cancer Res, doi:10.1158/1078-0432.CCR-18-0444 (2018).

55 Chang, B. L. et al. A polymorphism in the CDKN1B gene is associated with increased risk of hereditary prostate cancer. Cancer Res 64, 1997-1999 (2004).

56 Hazelett, D. J. et al. Comprehensive functional annotation of 77 prostate cancer risk loci. PLoS Genet 10, e1004102, doi:10.1371/journal.pgen.1004102 (2014).


57 Kibel, A. S. et al. CDKN1A and CDKN1B polymorphisms and risk of advanced prostate carcinoma. Cancer Res 63, 2033-2036 (2003).
58 Paulo, P. et al. Targeted next generation sequencing identifies functionally deleterious germline mutations in novel genes in early-onset/familial prostate cancer. PLoS Genet 14, e1007355, doi:10.1371/journal.pgen.1007355 (2018).
59 Meyer, A. et al. ATM missense variant P1054R predisposes to prostate cancer. Radiother Oncol 83, 283-288, doi:10.1016/j.radonc.2007.04.029 (2007).
60 Stegeman, S. et al. A genetic variant of MDM4 influences regulation by multiple microRNAs in prostate cancer. Endocr Relat Cancer 22, 265-276, doi:10.1530/ERC-15-0013 (2015).
61 Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190-1195, doi:10.1126/science.1222794 (2012).
62 Cramer, S. D. et al. Association between genetic polymorphisms in the prostate-specific antigen gene promoter and serum prostate-specific antigen levels. J Natl Cancer Inst 95, 1044-1053 (2003).
63 Zuber, V. et al. Bromodomain protein 4 discriminates tissue-specific super-enhancers containing disease-specific susceptibility loci in prostate and breast cancer. BMC Genomics 18, 270, doi:10.1186/s12864-017-3620-y (2017).
64 Jin, H. J., Jung, S., DebRoy, A. R. & Davuluri, R. V. Identification and validation of regulatory SNPs that modulate transcription factor chromatin binding and gene expression in prostate cancer. Oncotarget 7, 54616-54626, doi:10.18632/oncotarget.10520 (2016).
65 Bu, H. et al. Putative Prostate Cancer Risk SNP in an Androgen Receptor-Binding Site of the Melanophilin Gene Illustrates Enrichment of Risk SNPs in Androgen Receptor Target Sites. Hum Mutat 37, 52-64, doi:10.1002/humu.22909 (2016).
66 Akamatsu, S. et al. A functional variant in NKX3.1 associated with prostate cancer susceptibility down-regulates NKX3.1 expression. Hum Mol Genet 19, 4265-4272, doi:10.1093/hmg/ddq350 (2010).
67 Lu, Y. et al. Functional annotation of risk loci identified through genome-wide association studies for prostate cancer. Prostate 71, 955-963, doi:10.1002/pros.21311 (2011).
68 Hazelett, D. J., Coetzee, S. G. & Coetzee, G. A. A rare variant, which destroys a FoxA1 site at 8q24, is associated with prostate cancer risk. Cell Cycle 12, 379-380, doi:10.4161/cc.23201 (2013).
69 Huang, Q. et al. A prostate cancer susceptibility allele at 6q22 increases RFX6 expression by modulating HOXB13 chromatin binding. Nat Genet 46, 126-135, doi:10.1038/ng.2862 (2014).
70 Lou, H. et al. Fine mapping and functional analysis of a common variant in MSMB on chromosome 10q11.2 associated with prostate cancer susceptibility. Proc Natl Acad Sci U S A 106, 7933-7938, doi:10.1073/pnas.0902104106 (2009).
71 Sjoblom, L. et al. Microseminoprotein-Beta Expression in Different Stages of Prostate Cancer. PLoS One 11, e0150241, doi:10.1371/journal.pone.0150241 (2016).
72 Kote-Jarai, Z. et al. Mutation analysis of the MSMB gene in familial prostate cancer. Br J Cancer 102, 414-418, doi:10.1038/sj.bjc.6605485 (2010).
73 Zhang, X., Cowper-Sal·lari, R., Bailey, S. D., Moore, J. H. & Lupien, M. Integrative functional genomics identifies an enhancer looping to the SOX9 gene disrupted by the 17q24.3 prostate cancer risk locus. Genome Res 22, 1437-1446, doi:10.1101/gr.135665.111 (2012).
74 Spisak, S. et al. CAUSEL: an epigenome- and genome-editing pipeline for establishing function of noncoding GWAS variants. Nat Med 21, 1357-1363, doi:10.1038/nm.3975 (2015). This article is one of the clearest demonstrations of a pipeline for post-GWAS of noncoding variants in PrCa.
75 Noushmehr, H., Coetzee, S. G., Rhie, S. K., Yan, C. & Coetzee, G. A. The Functionality of Prostate Cancer Predisposition Risk Regions Is Revealed by AR Enhancers. (ed. Wang, Z.) (Springer, New York, NY, 2013).


76 Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317-330, doi:10.1038/nature14248 (2015).
77 Do, C. et al. Genetic–epigenetic interactions in cis: a major focus in the post-GWAS era. Genome Biol 18, 120, doi:10.1186/s13059-017-1250-y (2017).
78 Ross-Adams, H. et al. HNF1B variants associate with promoter methylation and regulate gene networks activated in prostate and ovarian cancer. Oncotarget 7, 74734-74746, doi:10.18632/oncotarget.12543 (2016). This article is one of the first studies to show experimentally an epigenetic effect of SNPs in PrCa.
79 Mumbach, M. R. et al. Enhancer connectome in primary human cells identifies target genes of disease-associated DNA elements. Nat Genet 49, 1602-1612, doi:10.1038/ng.3963 (2017).
80 Thibodeau, S. N. et al. Identification of candidate genes for prostate cancer-risk SNPs utilizing a normal prostate tissue eQTL data set. Nat Commun 6, 8653, doi:10.1038/ncomms9653 (2015). This study provides an example of identification of eQTL-GWAS pairs in PrCa.
81 Sur, I. K. et al. Mice lacking a Myc enhancer that includes human SNP rs6983267 are resistant to intestinal tumors. Science 338, 1360-1363, doi:10.1126/science.1228606 (2012).
82 French, J. D. et al. Functional variants at the 11q13 risk locus for breast cancer regulate cyclin D1 expression through long-range enhancers. Am J Hum Genet 92, 489-503, doi:10.1016/j.ajhg.2013.01.002 (2013).
83 Sotelo, J. et al. Long-range enhancers on 8q24 regulate c-Myc. Proc Natl Acad Sci U S A 107, 3001-3005, doi:10.1073/pnas.0906067107 (2010).
84 Han, Y. et al. Integration of multiethnic fine-mapping and genomic annotation to prioritize candidate functional SNPs at prostate cancer susceptibility regions. Hum Mol Genet 24, 5603-5618, doi:10.1093/hmg/ddv269 (2015). This study provides the conceptual basis of post-GWAS in order to prioritise potential driver genes of PrCa.
85 Ying, D., Li, M. J., Sham, P. C. & Li, M. A powerful approach reveals numerous expression quantitative trait haplotypes in multiple tissues. Bioinformatics, bty318, doi:10.1093/bioinformatics/bty318 (2018).
86 Stegeman, S. et al. A Large-Scale Analysis of Genetic Variants within Putative miRNA Binding Sites in Prostate Cancer. Cancer Discov 5, 368-379, doi:10.1158/2159-8290.CD-14-1057 (2015).
87 Bao, B. Y. et al. Polymorphisms inside microRNAs and microRNA target sites predict clinical outcomes in prostate cancer patients receiving androgen-deprivation therapy. Clin Cancer Res 17, 928-936, doi:10.1158/1078-0432.CCR-10-2648 (2011).
88 Anastasiadou, E., Jacob, L. S. & Slack, F. J. Non-coding RNA networks in cancer. Nat Rev Cancer 18, 5-18, doi:10.1038/nrc.2017.99 (2018).
89 Duan, J. et al. A rare functional noncoding variant at the GWAS-implicated MIR137/MIR2682 locus might confer risk to schizophrenia and bipolar disorder. Am J Hum Genet 95, 744-753, doi:10.1016/j.ajhg.2014.11.001 (2014).
90 Duan, R., Pak, C. & Jin, P. Single nucleotide polymorphism associated with mature miR-125a alters the processing of pri-miRNA. Hum Mol Genet 16, 1124-1131, doi:10.1093/hmg/ddm062 (2007).
91 Kim, Y. S., Kim, Y., Choi, J. W., Oh, H. E. & Lee, J. H. Genetic variants and risk of prostate cancer using pathway analysis of a genome-wide association study. Neoplasma 63, 629-634, doi:10.4149/neo_2016_418 (2016).
92 Loo, L. W., Fong, A. Y., Cheng, I. & Le Marchand, L. In silico functional pathway annotation of 86 established prostate cancer risk variants. PLoS One 10, e0117873, doi:10.1371/journal.pone.0117873 (2015).


93 Gorlova, O. Y., Demidenko, E. I., Amos, C. I. & Gorlov, I. P. Downstream targets of GWAS-detected genes for breast, lung, and prostate and colon cancer converge to G1/S transition pathway. Hum Mol Genet 26, 1465-1471, doi:10.1093/hmg/ddx050 (2017).
94 Kramer, A., Green, J., Pollard, J., Jr. & Tugendreich, S. Causal analysis approaches in Ingenuity Pathway Analysis. Bioinformatics 30, 523-530, doi:10.1093/bioinformatics/btt703 (2014).
95 Ghosh, P. M. et al. Signal transduction pathways in androgen-dependent and -independent prostate cancer cell proliferation. Endocr Relat Cancer 12, 119-134, doi:10.1677/erc.1.00835 (2005).
96 Goh, C. L. et al. Clinical implications of family history of prostate cancer and genetic risk single nucleotide polymorphism (SNP) profiles in an active surveillance cohort. BJU Int 112, 666-673, doi:10.1111/j.1464-410X.2012.11648.x (2013).
97 Lilja, H., Ulmert, D. & Vickers, A. J. Prostate-specific antigen and prostate cancer: prediction, detection and monitoring. Nat Rev Cancer 8, 268-278, doi:10.1038/nrc2351 (2008).
98 Amin Al Olama, A. et al. A meta-analysis of genome-wide association studies to identify prostate cancer susceptibility loci associated with aggressive and non-aggressive disease. Hum Mol Genet 22, 408-415, doi:10.1093/hmg/dds425 (2013).
99 Whitaker, H. C., Warren, A. Y., Eeles, R., Kote-Jarai, Z. & Neal, D. E. The potential value of microseminoprotein-beta as a prostate cancer biomarker and therapeutic target. Prostate 70, 333-340, doi:10.1002/pros.21059 (2010).
100 Aly, M. et al. Polygenic risk score improves prostate cancer risk prediction: results from the Stockholm-1 cohort study. Eur Urol 60, 21-28, doi:10.1016/j.eururo.2011.01.017 (2011).
101 Wray, N. R. et al. Pitfalls of predicting complex traits from SNPs. Nat Rev Genet 14, 507-515, doi:10.1038/nrg3457 (2013).
102 Kader, A. K. et al. Potential impact of adding genetic markers to clinical parameters in predicting prostate biopsy outcomes in men following an initial negative biopsy: findings from the REDUCE trial. Eur Urol 62, 953-961, doi:10.1016/j.eururo.2012.05.006 (2012).
103 Shibahara, T. et al. A G/A polymorphism in the androgen response element 1 of prostate-specific antigen gene correlates with the response to androgen deprivation therapy in Japanese population. Anticancer Res 26, 3365-3371 (2006).
104 Macinnis, R. J. et al. A risk prediction algorithm based on family history and common genetic variants: application to prostate cancer with potential clinical impact. Genet Epidemiol 35, 549-556, doi:10.1002/gepi.20605 (2011).
105 Gronberg, H. et al. Prostate cancer screening in men aged 50-69 years (STHLM3): a prospective population-based diagnostic study. Lancet Oncol 16, 1667-1676, doi:10.1016/S1470-2045(15)00361-7 (2015).
106 Barnett, G. C. et al. A genome wide association study (GWAS) providing evidence of an association between common genetic variants and late radiotherapy toxicity. Radiother Oncol 111, 178-185, doi:10.1016/j.radonc.2014.02.012 (2014).
107 Walsh, P. C. The Search for the Missing Heritability of Prostate Cancer. Eur Urol, doi:10.1016/j.eururo.2017.04.003 (2017).
108 Helfand, B. T., Catalona, W. J. & Xu, J. A genetic-based approach to personalized prostate cancer screening and treatment. Curr Opin Urol 25, 53-58, doi:10.1097/MOU.0000000000000130 (2015).
109 McDermott, D. F. & Atkins, M. B. PD-1 as a potential target in cancer therapy. Cancer Med 2, 662-673, doi:10.1002/cam4.106 (2013).
110 Caffo, O., Veccia, A., Kinspergher, S., Rizzo, M. & Maines, F. Aberrations of DNA Repair Pathways in Prostate Cancer: Future Implications for Clinical Practice? Front Cell Dev Biol 6, 71, doi:10.3389/fcell.2018.00071 (2018).
111 Ritchie, M. D. The success of pharmacogenomics in moving genetic association studies from bench to bedside: study design and implementation of precision medicine in the post-GWAS era. Hum Genet 131, 1615-1626, doi:10.1007/s00439-012-1221-z (2012).


112 Geeleher, P. & Huang, R. S. Exploring the Link between the Germline and Somatic Genome in Cancer. Cancer Discov 7, 354-355, doi:10.1158/2159-8290.CD-17-0192 (2017).
113 Seibert, T. M. et al. Polygenic hazard score to guide screening for aggressive prostate cancer: development and validation in large scale cohorts. BMJ 360, j5757, doi:10.1136/bmj.j5757 (2018).
114 Sham, P. C. & Purcell, S. M. Statistical power and significance testing in large-scale genetic studies. Nat Rev Genet 15, 335-346, doi:10.1038/nrg3706 (2014).
115 McCarthy, M. I. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9, 356-369, doi:10.1038/nrg2344 (2008). This review presents an overview of key considerations and challenges in GWAS that need to be kept in mind before proceeding with post-GWAS.
116 Mancuso, N. et al. The contribution of rare variation to prostate cancer heritability. Nat Genet 48, 30-35, doi:10.1038/ng.3446 (2016).
117 Cheng, Z. et al. PExFInS: An Integrative Post-GWAS Explorer for Functional Indels and SNPs. Sci Rep 5, 17302, doi:10.1038/srep17302 (2015).
118 Wedge, D. C. et al. Sequencing of prostate cancers identifies new cancer genes, routes of progression and drug targets. Nat Genet 50, 682-692, doi:10.1038/s41588-018-0086-z (2018).
119 McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet 48, 1279-1283, doi:10.1038/ng.3643 (2016).
120 FTO genotype and weight loss: systematic review and meta-analysis of 9563 individual participant data from eight randomised controlled trials. BMJ 356, j263, doi:10.1136/bmj.j263 (2017).
121 Smemo, S. et al. Obesity-associated variants within FTO form long-range functional connections with IRX3. Nature 507, 371-375, doi:10.1038/nature13138 (2014).
122 Tewhey, R. et al. Direct Identification of Hundreds of Expression-Modulating Variants using a Multiplexed Reporter Assay. Cell 165, 1519-1529, doi:10.1016/j.cell.2016.04.027 (2016).
123 Smith, A. J. P., Deloukas, P. & Munroe, P. B. Emerging applications of genome-editing technology to examine functionality of GWAS-associated variants for complex traits. Physiol Genomics, doi:10.1152/physiolgenomics.00028.2018 (2018).
124 Nica, A. C. et al. The architecture of gene regulatory variation across multiple human tissues: the MuTHER study. PLoS Genet 7, e1002003, doi:10.1371/journal.pgen.1002003 (2011).
125 Nyga, A., Cheema, U. & Loizidou, M. 3D tumour models: novel in vitro approaches to cancer studies. J Cell Commun Signal 5, 239-248, doi:10.1007/s12079-011-0132-4 (2011).
126 Wills, Q. F. et al. Single-cell gene expression analysis reveals genetic associations masked in whole-tissue experiments. Nat Biotechnol 31, 748-752, doi:10.1038/nbt.2642 (2013).
127 Kilpinen, H. et al. Common genetic variation drives molecular heterogeneity in human iPSCs. Nature 546, 370-375, doi:10.1038/nature22403 (2017).
128 Gomez-Acebo, I. et al. Risk Model for Prostate Cancer Using Environmental and Genetic Factors in the Spanish Multi-Case-Control (MCC) Study. Sci Rep 7, 8994, doi:10.1038/s41598-017-09386-9 (2017).
129 Agarwal, D., Nowak, C., Zhang, N. R., Pusztai, L. & Hatzis, C. Functional germline variants as potential co-oncogenes. NPJ Breast Cancer 3, 46, doi:10.1038/s41523-017-0051-5 (2017). This article is an interesting perspective that demonstrates an active role of germline variants in breast cancer and describes them as potential co-oncogenes.
130 Lin, H. Y. et al. SNP interaction pattern identifier (SIPI): an intensive search for SNP-SNP interaction patterns. Bioinformatics 33, 822-833, doi:10.1093/bioinformatics/btw762 (2017).
131 Vaidyanathan, V. et al. SNP-SNP interactions as risk factors for aggressive prostate cancer. F1000Res 6, 621, doi:10.12688/f1000research.11027.1 (2017).


132 Thompson, D. J. et al. CYP19A1 fine-mapping and Mendelian randomization: estradiol is causal for endometrial cancer. Endocr Relat Cancer 23, 77-91, doi:10.1530/ERC-15-0386 (2016).
133 Brunner, C. et al. Alcohol consumption and prostate cancer incidence and progression: A Mendelian randomisation study. Int J Cancer 140, 75-85, doi:10.1002/ijc.30436 (2017).
134 Lophatananon, A. et al. Height, selected genetic markers and prostate cancer risk: results from the PRACTICAL consortium. Br J Cancer 117, 734-743, doi:10.1038/bjc.2017.231 (2017).
135 Denny, J. C. et al. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat Biotechnol 31, 1102-1110, doi:10.1038/nbt.2749 (2013).
136 Verma, A. et al. PheWAS and Beyond: The Landscape of Associations with Medical Diagnoses and Clinical Measures across 38,662 Individuals from Geisinger. Am J Hum Genet 102, 592-608, doi:10.1016/j.ajhg.2018.02.017 (2018). This article is proposing that highly connected genes act additively to create risk for complex diseases in an omnigenic model.
137 Saunders, E. J. et al. Gene and pathway level analyses of germline DNA-repair gene variants and prostate cancer susceptibility using the iCOGS-genotyping array. Br J Cancer 118, e9, doi:10.1038/bjc.2017.468 (2018).
138 Wray, N. R., Wijmenga, C., Sullivan, P. F., Yang, J. & Visscher, P. M. Common Disease Is More Complex Than Implied by the Core Gene Omnigenic Model. Cell 173, 1573-1580, doi:10.1016/j.cell.2018.05.051 (2018).
139 Boyle, E. A., Li, Y. I. & Pritchard, J. K. An Expanded View of Complex Traits: From Polygenic to Omnigenic. Cell 169, 1177-1186, doi:10.1016/j.cell.2017.05.038 (2017).
140 Huan, T. et al. Genome-wide identification of microRNA expression quantitative trait loci. Nat Commun 6, 6601, doi:10.1038/ncomms7601 (2015).
141 Cheung, V. G. et al. Mapping determinants of human gene expression by regional and genome-wide association. Nature 437, 1365-1369, doi:10.1038/nature04244 (2005).
142 Hubner, N. et al. Integrated transcriptional profiling and linkage analysis for identification of genes underlying disease. Nat Genet 37, 243-253, doi:10.1038/ng1522 (2005).
143 Stranger, B. E. et al. Population genomics of human gene expression. Nat Genet 39, 1217-1224, doi:10.1038/ng2142 (2007).
144 Dixit, A. et al. Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens. Cell 167, 1853-1866 e1817, doi:10.1016/j.cell.2016.11.038 (2016).
145 Wasserman, N. F., Aneas, I. & Nobrega, M. A. An 8q24 gene desert variant associated with prostate cancer risk confers differential in vivo activity to a MYC enhancer. Genome Res 20, 1191-1197, doi:10.1101/gr.105361.110 (2010).
146 ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74, doi:10.1038/nature11247 (2012).
147 eGTEx Project. Enhancing GTEx by bridging the gaps between genotype, gene expression, and disease. Nat Genet 49, 1664-1670, doi:10.1038/ng.3969 (2017).
148 Gudmundsson, J. et al. Genetic correction of PSA values using sequence variants associated with PSA levels. Sci Transl Med 2, 62ra92, doi:10.1126/scitranslmed.3001513 (2010).
149 Niedworok, C. et al. Serum Chromogranin A as a Complementary Marker for the Prediction of Prostate Cancer-Specific Survival. Pathol Oncol Res 23, 643-650, doi:10.1007/s12253-016-0171-5 (2017).
150 Beke, L., Nuytten, M., Van Eynde, A., Beullens, M. & Bollen, M. The gene encoding the prostatic tumor suppressor PSP94 is a target for repression by the Polycomb group protein EZH2. Oncogene 26, 4590-4595, doi:10.1038/sj.onc.1210248 (2007).

Competing interests:


The authors declare no competing interests.

Acknowledgements:

The authors are grateful to Queensland University of Technology Postgraduate Award (QUTPRA), the National Health and Medical Research Council (NHMRC) CDF and PRF Fellowships, and Cancer Australia PdCCRS funding.

TABLE 1: Clinical implications of post-GWAS data in prostate cancer. Characterising the biological function of risk-associated SNPs provides an opportunity to identify mechanisms driving prostate cancer. It elucidates how genome sequence variation affects disease predisposition via downstream cellular regulatory mechanisms and helps to discover relevant biomarkers and drug targets. The risk loci, including some of the identified causal SNPs (Supplementary TABLE 1), have been used as biomarkers105,148. For example, SNP rs11568818, which has potential prognostic value, is situated within the MMP7 gene, which encodes a matrix metalloproteinase that is pivotal for tumour metastasis149. Knowledge of genetic variant association and causality can be combined with currently used biomarkers to make treatment programs more efficient and more personalised148. For instance, EZH2 inhibitors (*) could be useful for the treatment of metastatic prostate cancer via modulation of the MSMB gene (*), a target gene of EZH2 (refs 99,150), and AR antagonists (*) could be applied in patients carrying functional variants that change AR or AR binding sites, directly or indirectly. Drug repurposing and drug combinations that target potential therapeutic genes are new avenues for deeper investigation of post-GWAS applications in prostate cancer.


Gene | GWAS/causal SNP | Effect of causal SNP | Protein/gene biomarker | Protein/gene therapeutic target149
--- | --- | --- | --- | ---
MMP7 | rs11568818 | Transcription dysregulation as eQTLs | Overexpression of MMP7 is a potential biomarker for PrCa aggressiveness and risk of metastatic disease | MMP7 inhibitors
MSMB | rs10993994 | dysregulation of MSMB as a target gene for AR | The expression of the encoded MSMB protein is found to be decreased in PrCa | -
KLK3 | rs2735839/ | Modification of encoded PSA level | High PSA level is associated with higher probability of disease | PSA immunotherapy, Optimisation of PSA-level screening by incorporating sets of SNPs within PSA gene
NAALADL2 | 3q26¥/rs10936845 | Modifying expression as eQTLs | NAALADL2 molecule measurement is associated with higher Gleason score, a pathological measure of disease aggressiveness | Drugs targeting cell movement and adhesion
ATM | rs1800057 | Annotated as benign coding variant in several in-silico studies | - | ATM inhibitors; Drugs for DNA repair defects (e.g. Olaparib)
TERT | rs13190087, rs2242652, rs2853676, rs2736107/rs2242652 | Modifying expression as eQTLs | Hyper methylation of TERT promoter is a diagnostic and prognostic biomarker in PrCa | Telomerase vaccine such as GX301 that is reported to be safe and highly immunogenic in patients with prostate cancer
Myc | 8q24¥/rs11986220, rs183373024, rs6983267 | Modifying Myc enhancer activity | - | MYC (BET inhibitors)
AR | rs5919432 | * | AR level, splice variants, liquid biopsy | Androgen pathway targeted therapy* (AR Antagonists such as Enzalutamide, EZH2*)
CDKN1B | 12q13¥/rs2066827 | Annotated as benign coding variant in several in-silico studies | | MAPK Pathway inhibitors; PI3K Inhibitors; Cell cycle check point inhibitors
TMPRSS2 | rs1041449/rs9979125 | Modifying expression as eQTLs | TMPRSS2-ERG transcript serves as PCa biomarker detected classically with RT-PCR in urine and biopsy | TMPRSS2-ETS fusions (ERG Pathway inhibitor)

Supplementary Table 1: Prostate cancer-causal variants within coding/non-coding regions reported by post-GWAS.

Supplementary Table 2: List of implicated networks of affected/dysregulated genes by causal variants and pathway analysis using the Ingenuity Pathway Analysis (IPA) algorithm. The 10 top canonical pathways in IPA analysis (based on p-value) including HLA genes have been listed. We excluded HLA genes to identify HLA independent key pathways related to the rest of the assigned genes. The upstream regulators of the genes involved have been listed.
