Pathway Analysis of Renal Cell Carcinoma Genome-Wide Association Studies Identifies Novel Associations Mark P
Total Page:16
File Type:pdf, Size:1020Kb
Published OnlineFirst July 30, 2020; DOI: 10.1158/1055-9965.EPI-20-0472 CANCER EPIDEMIOLOGY, BIOMARKERS & PREVENTION | RESEARCH ARTICLE Pathway Analysis of Renal Cell Carcinoma Genome-Wide Association Studies Identifies Novel Associations Mark P. Purdue1, Lei Song1, Ghislaine Scelo 2, Richard S. Houlston3, Xifeng Wu4, Lori C. Sakoda5, Khanh Thai5, Rebecca E. Graff6, Nathaniel Rothman1, Paul Brennan7, Stephen J. Chanock1, and Kai Yu1 ABSTRACT ◥ Background: Much of the heritable risk of renal cell carcinoma correction threshold) and replication (P < 0.05) sets, 10 of which (RCC) associated with common genetic variation is unexplained. include components of the PI3K/AKT pathway. In tests across New analytic approaches have been developed to increase the 2,035 genes in these pathways, associations (Bonferroni corrected À discovery of risk variants in genome-wide association studies P < 2.46  10 5 in discovery and replication sets combined) were (GWAS), including multi-locus testing through pathway analysis. observed for CASP9, TIPIN, and CDKN2C. The strongest SNP P ¼  À5 P ¼ Methods: We conducted a pathway analysis using GWAS sum- signal was for rs12124078 ( Discovery 2.6 10 ; Replication  À4 P ¼  À8 CASP9 mary data from six previous scans (10,784 cases and 20,406 con- 1.5 10 ; Combined 6.9 10 ), a expression quan- trols) and evaluated 3,678 pathways and gene sets drawn from the titative trait locus. Molecular Signatures Database. To replicate findings, we analyzed Conclusions: Our pathway analysis implicates genetic variation GWAS summary data from the UK Biobank (903 cases and 451,361 within the PI3K/AKT pathway as a source of RCC heritability and controls) and the Genetic Epidemiology Research on Adult Health identifies several promising novel susceptibility genes, including and Aging cohort (317 cases and 50,511 controls). CASP9, which warrant further investigation. Results: We identified 14 pathways/gene sets associated with Impact: Our findings illustrate the value of pathway analysis as a À RCC in both the discovery (P < 1.36  10 5, the Bonferroni complementary approach to analyzing GWAS data. Introduction by biological pathways, have the potential to empower the identifica- tion of new associations not captured by testing individual genetic Kidney cancer is one of the 10 most common cancers in the United variants (6, 7). States, with around 74,000 new cases and 14,000 related deaths in To gain further insight into the heritability of RCC, we evaluated 2019 (1). Renal cell carcinoma (RCC) is the most common type of 3,678 canonical pathways and gene sets using summary-level data kidney cancer, accounting for more than 90% of kidney cancer from a GWAS meta-analysis. To validate our findings, we analyzed diagnoses. RCC has a heritable basis, with relatives of patients having GWAS summary data from the UK Biobank and Kaiser Permanente a 2-fold increased risk (2, 3). While a number of rare familial RCC Genetic Epidemiology Research on Adult Health and Aging (GERA) syndromes caused by inheritance of high-impact mutations have been cohorts. identified, even collectively, they only account for less than 5% of RCC (4). Evidence for the role of common low-impact genetic variants influencing RCC risk has been established in genome-wide association Materials and Methods fi studies (GWAS), which have so far identi ed 13 susceptibility loci (5). Study populations Because GWAS risk variants typically have a small effect size, they The RCC GWAS meta-analysis, described previously (5), combined are difficult to identify in individual SNP-based GWAS after account- summary results from six independent GWASs totaling 10,784 RCC ing for multiple testing, even with large sample numbers. Pathway- cases and 20,406 controls of European ancestry. Briefly, genotypes had based analyses, involving joint testing of SNPs within gene sets defined been assayed across the scans using a combination of Illumina SNP Arrays (Illumina Inc). After performing imputation on all scans using 1Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rock- 1,094 subjects from the 1000 Genomes Project (phase I release 3) as the ville, Maryland. 2Department of Medical Sciences, University of Turin, Turin, Italy. reference panel, 7,437,091 SNPs were included in the meta-analysis. To 3Division of Genetics and Epidemiology, The Institute of Cancer Research, facilitate the identification of novel genetic signals, we excluded from Sutton, London, United Kingdom. 4Department of Big Data in Health Science, our pathway analysis 36,616 SNPs within 500 kb of genetic variants 5 Zhejiang University School of Public Health, Hangzhou, Zhejiang, China. Divi- previously reported to be associated with RCC at genome-wide À sion of Research, Kaiser Permanente Northern California, Oakland, California. significance (P < 5  10 8). All tests of statistical significance used 6Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, California. 7International Agency for Research on in this analysis were two-sided. Cancer, Lyon, France. To replicate study findings, we made use of summary-level asso- ciation statistics for 30,798,054 SNPs from a GWAS of RCC conducted Note: Supplementary data for this article are available at Cancer Epidemiology, Biomarkers & Prevention Online (http://cebp.aacrjournals.org/). among UK Biobank participants (903 cases and 451,361 controls) downloaded from GeneATLAS (http://geneatlas.roslin.ed.ac.uk/; Corresponding Author: Mark P. Purdue, NCI, 9609 Medical Center Drive, MSC fi 9776, Bethesda, MD 20892. Phone: 240-276-7265; Fax: 240-276-7835; E-mail: ref. 8). The SNP beta coef cients and SEs in GeneATLAS were [email protected] computed using mixed linear models; we transformed these summary statistics to ORs using LMOR (9). SEs were calculated from the Cancer Epidemiol Biomarkers Prev 2020;XX:XX–XX reported P value and estimated OR. doi: 10.1158/1055-9965.EPI-20-0472 For replication of SNP-level analyses, we used summary results Ó2020 American Association for Cancer Research. from UK Biobank and, for selected SNPs (n ¼ 10), a GWAS of kidney AACRJournals.org | OF1 Downloaded from cebp.aacrjournals.org on October 3, 2021. © 2020 American Association for Cancer Research. Published OnlineFirst July 30, 2020; DOI: 10.1158/1055-9965.EPI-20-0472 Purdue et al. cancer conducted among persons of European ancestry in the GERA analysis of promising pathway-level results in replication datasets, we cohort (317 cases and 50,511 controls). Details of the GERA GWAS considered an alpha of 0.05 as being significant. A schematic sum- have been described previously (10). marizing our pathway analysis approach is provided in Supplementary Fig. S1. Pathway analysis We downloaded definitions for 3,762 human-derived pathways and Gene-level and SNP-level analyses within selected pathways gene sets (C2 gene set collection) from the Broad Institute Molecular We evaluated gene-level and SNP-level test results among all Signatures Database (MSigDB) v6.1 (http://software.broadinstitute. constituent genes of pathways found to be associated with RCC in org/gsea/msigdb/collections.jsp) for the pathway-level analysis. Geno- the discovery and replication sets, and combined association statistics mic definitions for genes were downloaded from human genes NCBI36 through meta-analysis using fixed effects models. and reference genome GRCh37.p13 using the Ensemble BioMart tool. We conducted gene- and pathway-level meta-analyses using the Functional annotation of SNP associations summary statistics-based adaptive rank truncated product (sARTP) We queried public databases to explore the possible biologic effects method (https://www.rdocumentation.org/packages/ARTP2/versions/ of selected SNPs. We searched RegulomeDB (http://www.regulomedb. 0.9.45/topics/sARTP). sARTP combines SNP associations across org) and HaploReg (https://pubs.broadinstitute.org/mammals/hap variants 20 kb upstream and downstream of a given gene with loreg/haploreg.php) to assess the likelihood that SNPs map to regu- adjustment for the size of genes and pathways through a resampling latory elements, and the NHGRI-EBI GWAS Catalog (https://www. procedure to evaluate the global testing P value, with proper adjust- ebi.ac.uk/gwas/) to search for previously reported GWAS associations ment of multiple comparisons (11). A Web-based sARTP application with other traits. We assessed potential SNP associations with gene tool is available allowing users to submit pathway analysis jobs online expression in The Cancer Genome Atlas (TCGA) renal cancer tumor and receive results computed using NCI computing resources cases (KIRC; n = 517) using the PancanQTL database (http://bioinfo. (https://analysistools.nci.nih.gov/pathway/). Significance of gene- life.hust.edu.cn/PancanQTL/; ref. 12). We also explored expression and pathway-level associations was estimated from the null distri- quantitative trait locus (eQTL) kidney cortex expression data (n ¼ 73) bution generated from 10 million resampling steps. A panel of 503 using GTEx (https://gtexportal.org/home/; ref. 13) European subjects (population codes: CEU, TSI, FIN, GBR, and IBS) in the 1000 Genomes Project (phase III, v5) was used in sARTP to estimate the linkage disequilibrium between SNPs. To mitigate Results the impact of population stratification, we applied genomic control We identified 14 pathways and gene sets significantly associated inflation factors to rescale the SEs of the log ORs for SNPs in each with RCC in the discovery and replication sets, ranging in size from 12 GWAS (lambda values 1.009–1.058) and in the meta-analysis to 732 genes and 402 to 31,986 SNPs (Table 1). Notably, 10 of the 14 (lambda ¼ 1.037). pathways/gene sets include components of the PI3K/AKT signaling We successfully analyzed 3,678 of the 3,762 pathways and gene sets pathway, with AKT1 and CASP9 among the most significant gene-level downloaded from MSigDB; tests of 84 pathways failed because of a lack signals for all 10 pathways. The four other gene sets were “Benporath of SNP coverage.