<<

ARTICLE

DOI: 10.1038/s41467-018-05074-y OPEN Identification of susceptibility pathways for the role of 15q25.1 in modifying risk

Xuemei Ji et al.#

Genome-wide association studies (GWAS) identified the chromosome 15q25.1 as a leading susceptibility region for lung cancer. However, the pathogenic pathways, through which sus- 1234567890():,; ceptibility SNPs within chromosome 15q25.1 affects lung cancer risk, have not been explored. We analyzed three cohorts with GWAS data consisting 42,901 individuals and lung expression quantitative trait loci (eQTL) data on 409 individuals to identify and validate the underlying pathways and to investigate the combined effect of from the identified susceptibility pathways. The KEGG neuroactive interaction pathway, two Reactome pathways, and 22 Ontology terms were identified and replicated to be significantly associated with lung cancer risk, with P values less than 0.05 and FDR less than 0.1. Functional annotation of eQTL analysis results showed that the neuroactive ligand receptor interaction pathway and gated channel activity were involved in lung cancer risk. These pathways provide important insights for the etiology of lung cancer.

Correspondence and requests for materials should be addressed to C.I.A. (email: [email protected]). #A full list of authors and their affliations appears at the end of the paper.

NATURE COMMUNICATIONS | (2018) 9:3221 | DOI: 10.1038/s41467-018-05074-y | www.nature.com/naturecommunications 1 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-05074-y

ung cancer, accounting for 13% of all cancer cases and 23% Results Lof all cancer-related deaths worldwide, is a leading cause of The study design is presented in Fig. 1. Demographic char- cancer death in the US and around the world, and represents acteristics and sample sizes of the two discovery cohorts and the a major public health problem1. Several genome-wide association replication cohort for GWAS pathway analyses are summarized studies (GWAS) have been published and identified the chro- in Table 1. Demographic characteristics of the lung eQTL study mosome 15q25.1 locus as a susceptibility region for lung cancer2– are summarized in Supplementary Table 1. 4, smoking behavior5,6, and addiction4 in Caucasians2, African-Americans7, and Asians8. Epigenetic analyses provided Selection of index SNPs and candidate SNPs in discovery.To evidence that epigenetic silencing of nAChR-encoding genes determine the most important susceptibility loci for lung cancer clustered at the 15q25.1 locus may contribute to lung cancer risk9. and to identify multiple association signals within observed loci, In addition, expression quantitative trait loci (eQTL) studies we performed association analyses using the 1st and 2nd dis- showed an influence of alleles in this region on the expression of covery cohort, separately, and conducted a meta-analysis of the several genes at chromosome 15q25.1, providing a mechanism by two cohorts in the discovery phase. We identified the most sig- which these variations might affect lung cancer risk10. Our pre- nificant susceptibility loci for lung cancer on vious GWA studies found that variants in chromosome 15q25.1, 15q25.1 in both discovery cohorts, and confirmed the finding in including single- polymorphisms (SNPs) and haplo- the meta-analysis. Eight signals within chromosome 15q25.1 were types, are involved in the etiology of overall lung cancer sus- defined as lung cancer risk-associated SNPs based on P values of ceptibility and by histology and smoking status2,11. However, − association with lung cancer of less than 5 × 10 8 in the 1st lung cancer, being a disease of complex origin, is usually con- discovery cohort and in the meta-analysis (Table 2). After Bon- sidered to result from complex effects of smoking along with ferroni correction, the eight signals maintained a significant multiple genetic variants affecting a number of pathways or impact on lung cancer risk in the 1st discovery cohort and in the biological process. Common SNPs are not individually known to meta-analysis. We defined the eight significant SNPs, which were add greatly to individual risk, unless more complex gene–gene rs1051730 in CHRNA3; rs1996371, rs6495314, rs11638372, interactions play a crucial role in the of rs4887077, and rs6495309 in CHRNB4; and rs8034191 and pathogenesis of complex disorders12, such as lung cancer. The rs2036534 in HYKK, as the index SNPs for lung cancer risk, and pathogenic pathways, through which lung cancer susceptibility used these eight SNPs to further select the candidate SNPs, which SNPs within chromosome 15q25.1 affect disease etiology and interacted with the eight index SNPs. development of lung cancer, have not been studied comprehen- To evaluate potential functional connections between genes sively, limiting mechanistic understanding. mapping throughout the genome and those on chromosome The objective of this study was to explore the underlying 15q25.1, to further elucidate the role of the chromosome 15q25.1- pathways that are involved in the molecular mechanisms by related pathway in lung cancer risk, we investigated SNP–SNP which variants at the chromosome 15q25.1 locus modify lung interactions between the eight index SNPs within chromosome cancer risk and increase lung cancer occurrence and develop- 15q25.1 and the whole genome in both discovery cohorts, and ment. We first performed a GWAS analysis with a cohort of conducted a meta-analysis of SNP–SNP interaction with both 1923 lung cancer cases and 1977 healthy controls of Italian cohorts. A total of 5883 SNP pairs between the eight index SNPs origin combined with a cohort of 2995 lung cases and 3578 and candidate SNPs in the whole genome exhibited P value controls of European ancestry, and then conducted a meta- of less than 0.05 in the 1st discovery cohort and in the meta-analysis analysis to identify the index SNPs within the chromosome results and showed epistasis P value of less than 0.10 in the 2nd 15q25.1 locus that were significantly associated with lung discovery cohort (Supplementary Data 1). In total, 3409 candidate cancer risk. We then investigated the SNP–SNP interaction SNPswithinthewholegenomewereidentified and validated to between the index SNPs within the chromosome 15q25.1 locus interact with the eight index SNPs (Supplementary Data 2). and the entire genome to identify the SNPs that interact with the 15q25.1 index SNPs, and are therefore involved in lung cancer etiology through interaction. Furthermore, using the Susceptibility pathways and GO terms in discovery. In order to index SNPs and their related SNPs in the whole genome, we identify chromosome 15q25.1-associated pathogenic pathways explored the pathogenic pathways that may be relevant to lung and biological processes that may be relevant to lung cancer cancer etiology, and replicated the findings with an indepen- etiology, we then conducted enrichment analyses using i- dent cohort of 18,439 lung cancer cases and 14,026 healthy GSEA4GWAS13 in discovery phase with the meta-analysis controls. We also studied genome-wide data in results of 2530 SNPs, which were pruned from eight index human lung tissues and conducted an eQTL analysis to inves- SNPs and 3409 candidate SNPs for linkage disequilibrium (LD) tigate whether the functional annotation of the eQTL results to reduce the possibility of biased results. We applied mapping can validate the susceptibility pathways from our GWAS ana- rules of SNPs to genes by incorporating a region 20 kb upstream lyses. Finally, we explored whether genes from our suscept- and downstream of each gene (Supplementary Data 2 and 3). In ibility pathways might jointly affect the process by which the total, one Kyoto Encyclopedia of Genes and Genomes (KEGG) chromosome 15q25.1 locus influences lung cancer risk. Our pathway, three Reactome pathways, and 22 (GO) findings suggest that common genetic variations within chro- terms were significantly associated with lung cancer risk with mosome 15q25.1 are likely to affect lung cancer etiology by improved gene set enrichment analysis (i-GSEA)13 P values less influencing the expression/structure and thereby the function of than 0.05 and FDR less than 0.25 for each pathway (Table 3). The genes that comprise the neuroactive ligand receptor interaction KEGG pathway was the neuroactive ligand receptor interaction pathway or gated channel activity and related terms. Such new pathway (i-GSEA P = 0.001 and FDR = 0.006). The 22 GO terms biologic insights from pathway analysis will provide a better included -specific channel activity (i-GSEA P < 0.001 understanding of the etiology and development of lung cancer, and FDR = 0.005), activity (i-GSEA P < 0.001 and potentially shortening the interval between increasing biologic FDR = 0.005), gated channel activity (i-GSEA P = 0.002 and knowledge and translation to patient care. FDR = 0.006), and several similar terms.

2 NATURE COMMUNICATIONS | (2018) 9:3221 | DOI: 10.1038/s41467-018-05074-y | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-05074-y ARTICLE

Discovery 1st cohort : 1977 cases and 1923 controls with 561, 466 SNPs 1 Primary GWAS analysis Discovery 2nd cohort : 2995 cases and 3578 controls with 310, 276 SNPs Meta–analysis of the two discovery cohorts with 310, 276 SNPs

Chromosome 15q25. 1 SNPs Association P value < 5 × 10–8 in the discovery 1st cohort 2 Index SNP selection Association P value < 5 × 10–8 in the meta–analysis of discovery cohorts

Discovery 1st cohort : the tag SNPs × SNPs in whole genome 3 Epistasis screening Discovery 2nd cohort : the tag SNPs × SNPs in whole genome Meta–analysis of the discovery cohorts

SNPs in whole genome Epistasis P value < 0.05 in the discovery 1st cohort 4 Candidate SNP selection Epistasis P value < 0.1 in the discovery 2nd cohort Epistasis P value < 0.05 in the meta–analysis

Using association P value of the LD pruned SNPs from the index 5 GWAS pathway analysis SNPs and candidate SNPs in meta–analysis of the two discovery cohorts

Replication cohort : 18,439 cases and 14,026 controls 6 Independent replication Association analysis of the index SNPs and candidate SNPs Pathway analysis

Laval cohort with 409 patients 7 eQTL study validation Genome-wide gene expression levels in the lung tissue and eQTL study Using DAVID

Combined genetic risk in Calculating accumulative genetic risk of genes from our 8 susceptibility pathways suceptibility pathways on lung cancer risk

Stratified GWAS pathway Stratified analyses by smoking status 9 analysis

Fig. 1 Schematic overview of the study design. (1) In the discovery phase, a total of 310,276 SNPs were the same in both the 1st and 2nd discovery cohorts and were applied for association analyses and meta-analyses. (2) SNPs within the 15q25.1 locus, which were associated with lung cancer risk with logistic regression P values of less than 5 × 10−8 in the 1st discovery cohort and in the meta-analysis of the discovery cohorts, were selected as index SNPs. (3) The epistasis test between SNPs in the whole genome and the index SNPs within chromosome 15q25.1 locus were conducted for both discovery cohorts and a meta-analysis was performed to combine the epistasis results. (4) The SNPs, which interacted with the index SNPs with an epistasis P value of less than 0.05 in the 1st discovery cohort and in the meta-analysis of both discovery cohorts, and less than 0.10 in the 2nd discovery cohort, were selected as the candidate SNPs. (5) The index SNPs and the candidate SNPs with the logistic regression P values in the meta-analysis of discovery cohorts were applied for GWAS pathway analysis. (6) In the replication phase, the index SNPs and the candidate SNPs with the logistic regression P values in an independent cohort were applied for GWAS pathway analysis to validate the susceptibility pathway enriched in step 5. (7) The most significant genes in the whole genome regulated by SNPs in chromosome 15q25.1, which were selected with the eQTL study, were employed for pathway analysis. (8) The individual and combined effects of genes in the pathways on lung cancer risk were calculated. (9) A similar process to select index SNPs and candidate SNPs and to carry out GWAS pathway analyses in the subgroups of smokers and non-smokers were conducted

Susceptibility pathways and GO terms in replication. We also 10−22 (Table 2). Enrichment analysis in the replication cohort examined whether our findings of chromosome 15q25.1-related confirmed that the KEGG pathway, two Reactome pathways, and pathways could be validated as involved in lung cancer patho- 22 GO terms were all significantly associated with lung cancer genesis and conducted an independent GWAS with a population- risk with P values of less than 0.05 and FDR of less than 0.25 for based case-control study among 18,439 lung cancer cases and each pathway, which was in agreement with the findings from the 14,026 healthy controls in the replication phase using i- meta-analysis of the discovery phase (Table 3). GSEA4GWAS. Of the eight index SNPs and the 3409 candidate SNPs in the discovery phase, 3411 SNPs were found in the replication cohort and, after pruned, 2525 SNPs were applied for Verification of GWAS pathway analysis. Considering the pos- enrichment (Supplementary Data 2). The replication cohort sibility that the much more significant lung cancer associated P analysis confirmed that the eight index SNPs within chromosome values in the index SNPs than in the candidate SNPs might lead 15q25.1 were significantly associated with lung cancer risk with a to false positive enrichment if the observed pathways were due logistic regression P value of each index SNP of less than 1 × only to the significance of the index SNPs, we performed gene set

NATURE COMMUNICATIONS | (2018) 9:3221 | DOI: 10.1038/s41467-018-05074-y | www.nature.com/naturecommunications 3 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-05074-y

Table 1 Participant characteristics of lung cancer cases and controls in GWAS cohorts

Variants 1st Discovery cohort (n = 3900) 2nd Discovery cohort (n = 6573) Replication Cohort (n = 32,465) Control Case P-value Control Case P-value Control Case P-value (n = 1977) (n = 1923) (n = 3578) (n = 2995) (n = 14,026) (n = 18,439) No. % No. % No. % No. % No. % No. % Age (years) 0–64 502 25.4 420 21.8 0.009 2304 64.39 1825 60.9 0.004 8449 60.2 9513 51.6 <0.0001 ≥65 1475 74.6 1503 78.2 1274 35.61 1170 39.1 5577 39.8 8926 48.4 Gender Male 1514 76.6 1520 79.0 0.06 2417 67.55 2093 69.9 0.04 8639 61.6 11,495 62.3 0.37 Female 463 23.4 403 21.0 1161 32.45 902 30.1 5384 38.4 6941 37.6 Omitted 3 0.02 3 0.02 Smoking status Never 633 32.0 138 7.1 <0.0001 867 24.23 137 4.6 <0.0001 4415 31.5 1800 9.8 <0.0001 Ever 1339 67.7 1774 92.3 2702 75.52 2854 95.3 9930 66.5 16,341 88.6 Omitted 5 0.3 11 0.6 9 0.25 4 0.1 281 2.0 298 1.6 Histology Squamous 488 25.4 307 10.3 4490 24.3 Adenocarcinoma 788 40.9 620 20.7 6819 37.0 Other 613 31.9 226 7.5 5487 29.8 Omitted 34 1.8 1842 61.5 1643 8.9 enrichment analysis with the index SNPs alone and the candidate its related genes identified by eQTL studies would indicate shared SNPs alone, separately, to clarify the contribution of the index pathways with the susceptibility pathways and GO terms from SNPs alone and the candidate SNPs alone to pathway analysis. our GWA study. Because rs16969968 was a functional SNP that We found that the enrichment analysis with the index SNPs alone changes through CHRNA516, and since in discovery and replication, respectively, cannot result in any rs16969968 had an estimated R-square LD value of 0.98 with pathways and GO terms with threshold of FDR < 0.25, but the rs1051730, which was the most significant SNP associated with analysis with the candidate SNPs alone showed several pathways lung cancer risk in discovery and replication cohorts, we used and GO terms with threshold of FDR < 0.25 in discovery and rs16969968 as a surrogate for CHRNA3–CHRNA5 and to inves- replication, respectively (Supplementary Table 2). To further tigate the influence of rs16969968 on whole-genome gene elucidate the independent effect of the candidate SNPs on this expression level. In addition, because rs649530917,18 in CHRNB4 pathway enrichment, we conducted an analysis using the and rs80341912,17 in HYKK had been reported to exhibit the observed logistic regression P values of the candidate SNPs and strongest association with lung cancer risk in CHRNB4 and setting the P values of the index SNPs to 0.01 to reduce the HYKK, separately, we also explored the effect of rs6495309 and impact that these SNPs might have had on the analysis. We rs8034191 on whole-genome gene expression level (Supplemen- observed that 14 significant GO terms, all of which are from the tary Data 4). 22 susceptibility GO terms confirmed by the discovery and We evaluated whether the epistatistic pathways we previously replication phase of GWAS analyses, were associated with lung identified were significantly related to expression levels. The cancer risk with i-GSEA P values of less than 0.05 and FDR of less KEGG neuroactive ligand receptor interaction pathway from our than 0.25 for each pathway in both the discovery and replication GWAS analyses was validated to influence lung cancer risk data. Therefore, our sensitivity analyses deny the possibility that through an impact on expression levels. The GWAS pathways of the observed pathways are due only to effects from only the index Reactome showed no significant associations with lung cancer SNPs. risk. Of 22 susceptibility GO terms from our GWAS analyses, In order to demonstrate that the observed pathways were gated channel activity was significantly associated with lung independent of the tool-chain, we performed gene set enrichment cancer risk (Fisher’s exact test, P = 0.029). Four transporter analysis using an alternative analytical strategy, namely GSA- activity GO terms, including ion channel activity, cation channel SNP214,15. We found that the KEGG pathway, all of the 22 GO activity, substrate-specific channel activity, and cation transmem- terms and one Reactome pathways, which were observed from brane transporter activity, had borderline significant associations our previous GWAS analyses with i-GSEA4GWAS, showed with lung cancer (Fisher’s exact test, P = 0.071, 0.073, 0.080, and significant association with lung cancer risk with P values of less 0.098, respectively) (Table 4 and Fig. 2). Another 17 GO terms than 0.05 for each pathway in both discovery and replication exhibited no significant relationship. We also evaluated whether (Supplementary Table 3). Only one Reactome pathway, neuronal the functional eQTL analysis identified the same lung cancer- system pathway, from our previous GWAS analyses was unable to related pathways after removing the HYKK and CHRNB4, which be confirmed. A few additional pathways, such as receptor were eQTL related pathways of genes underlying rs16969968, complex term, were identified. In addition, this method provides rs6495309, and rs8034191. more precise P values. We observed that the KEGG neuroactive ligand receptor interaction pathway still exhibited an involvement in lung cancer risk through its effect on expression levels. The gated channel Functional validation by lung eQTL analysis. We next mea- activity term showed association with lung cancer risk with sured genome-wide gene expression levels in lung tissues of 409 borderline significance (Fisher’s exact test, P = 0.088) (Supple- lung cancer patients and mapped eQTLs to determine which mentary Table 4). genes can be transcriptionally regulated by SNPs in chromosome To aid interpretation of the relationship of the sharing 15q25.1, and asked whether genes on chromosome 15q25.1 and pathways in both GWAS analysis and eQTL studies, we

4 NATURE COMMUNICATIONS | (2018) 9:3221 | DOI: 10.1038/s41467-018-05074-y | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-05074-y ARTICLE

Table 2 Index SNPs in the chromosome 15q25.1 locus which were associated with lung cancer with P < 5.00E-8 in the 1st discovery cohort and in meta-analysis of the discovery cohorts

SNP Gene Predicted function A1 A2 1st discovery cohort 2nd discovery cohort Meta-analysis of replication cohort discovery cohorts P-value BONFa P-value BONFa P-value BONFa P-value BONFa rs1051730 CHRNA3 coding T C 2.28E-14 7.77E-11 3.03E-13 1.04E-09 1.64E-25 5.09E-20 3.11E-49 1.06E-45 rs1996371 CHRNB4 intronic G A 9.08E-12 3.10E-08 1.15E-05 3.93E-02 2.05E-14 6.36E-09 2.83E-24 9.65E-21 rs6495314 CHRNB4 intronic C A 1.47E-11 5.01E-08 7.29E-06 2.49E-02 1.47E-14 4.56E-09 8.54E-24 2.91E-20 rs8034191 HYKK intronic C T 3.05E-11 1.04E-07 8.98E-14 3.07E-10 2.40E-23 7.45E-18 2.12E-46 7.23E-43 rs11638372 CHRNB4 intronic T C 3.14E-10 1.07E-06 2.95E-05 1.01E-01 8.11E-13 2.52E-07 5.28E-24 1.80E-20 rs2036534 HYKK 3downstream C T 3.81E-10 1.30E-06 4.29E-06 1.47E-02 7.81E-14 2.42E-08 4.85E-32 1.65E-28 rs4887077 CHRNB4 intronic T C 4.16E-10 1.42E-06 2.39E-05 8.17E-02 7.72E-13 2.40E-07 2.23E-23 7.61E-20 rs6495309 CHRNB4 3downstream T C 3.57E-08 1.22E-04 4.29E-06 1.47E-02 2.18E-12 6.76E-07 9.34E-29 3.19E-25 aP-value was adjusted for multiple comparisons using Bonferroni correction. calculated the overlapping genes in the KEGG neuroactive ligand The association of lung cancer risk and genotypes of each SNP receptor interaction pathway and the GO terms and clarified the and the number of risk alleles is shown in Tables 5 and 6. In each parent and child terms of the GO terms (Fig. 3). In addition, we cohort, the observed genotype frequencies among the controls investigated the gene expression level in normal lung tissue from were all consistent with Hardy–Weinberg equilibrium. Among Genecards database and found that all the genes in the the selected genes and their reference SNPs, CHRNA3 rs1051730 neuroactive ligand receptor interaction pathway and the gated was the most significantly associated with increased lung cancer channel activity term, as well as the four transporter activity risk. With respect to CHRNA3 rs1051730, compared with the CC terms whose functional annotation for eQTL study had border- homozygote, the CT heterozygote was associated with an elevated line significant association with lung cancer, are normally risk of lung cancer with ORs being 1.32 (adjusted 95% CI, expressed in normal lung tissue and may play roles in cell 1.21–1.44) in meta-analysis of the discovery cohorts and 1.27 growth, differentiation, or function of normal lung cell. (adjusted 95% CI, 1.21–1.33) in replication, while the TT homozygote was associated with increased lung cancer risk with – Combined effect of genes on lung cancer risk. In order to ORs being 1.89 (adjusted 95% CI, 1.67 2.14) in meta-analysis of the discovery cohorts and 1.63 (adjusted 95% CI, 1.52–1.75) in explore whether genes from our susceptibility KEGG pathways fi and GO terms could jointly affect lung cancer risk, we calculated replication. We also found and validated a signi cant dose–response relationship between the number of CHRNA3 the individual and combined effects of multiple functionally- = related genes from our susceptibility pathways and GO terms on rs1051730 T alleles and lung cancer risk (adjusted trend test P 2.68 × 10−24 for discovery (Table 5) and adjusted trend test P = lung cancer risk. Supplementary Data 5 shows the results from −44 our study of all selected genes, which were identified by our 1.82 × 10 for replication (Table 6)). The risk allele of CHRNB4 fi rs6495309 also significantly increased lung cancer risk in GWAS enrichment analysis as signi cant genes constituting the fi – susceptibility pathways/GO terms, and the reference SNP, as well discovery and replication. A signi cant dose response relation- as its P value associated with lung cancer risk. Since the combined ship was demonstrated between the number of risk alleles of effect of weaker SNPs/genes might have minor influence and lead CHRNB4 rs6495309 and the risk of lung cancer. Each of the other to difficulties in exploring the systems view, only those genes SNPs, including KCNJ4 rs138396 and SCN2B rs7944321, whose reference SNPs were associated with lung cancer risk with appeared to have a slightly elevated risk of lung cancer in border-line significance (association test P < 0.1) in the meta- discovery and replication. analysis of the discovery cohorts and in the replication cohort For the neuroactive ligand receptor interaction pathway, based were selected to assess the individual and joint effects on lung on the number of risk alleles of the combined CHRNA3 cancer risk. With the threshold of P value of the reference SNP rs1051730 and CHRNB4 rs6495309 genotypes, we grouped the less than 0.1, the same genes/SNPs were selected in gated channel individuals into four genotype groups, as follows: zero or one risk activity term and the 4 transporter activity terms whose func- alleles of either gene; only two risk alleles; three risk alleles; and tional annotation for eQTL study were borderline significantly four risk alleles (Tables 5 and 6). We observed that the combined associated with lung cancer. Therefore, we explored the accu- genotypes in those carrying four risk alleles, compared with those carrying zero or one risk allele, had a >2-fold increased risk in mulated risk in the neuroactive ligand receptor interaction = – pathway and the gated channel activity term. discovery (adjusted OR 2.07; 95% CI, 1.80 2.38), and exhibited a 1.74-fold elevated risk in replication (adjusted OR = 1.74; 95% In total, for the neuroactive ligand receptor interaction – pathway, CHRNA3 rs1051730 and CHRNB4 rs6495309 reached CI, 1.61 1.89) for lung cancer risk. The difference in CHRNA3 the criterion and were included for further analysis of the rs1051730 and CHRNB4 rs6495309 combination was associated independent association and combined effects of SNPs on lung with lung cancer risk in a dose-dependent fashion in discovery (adjusted trend test P = 1.55 × 10−24) and replication (adjusted cancer risk. With respect to the gated channel activity term, = −50 CHRNA3 rs1051730, CHRNB4 rs6495309, KCNJ4 rs138396, and trend test P 4.80 × 10 ). SCN2B rs7944321 reached the criterion and were included for Based on the number of risk alleles of the combined genotypes further analysis. Because the frequency of CHRNA3 rs1051730 T, in the gated channel activity term, we grouped the individuals into four genotype groups, as follows: zero or one risk allele of CHRNB4 rs6495309 C, KCNJ4 rs138396 A, and SCN2B rs7944321 fi A alleles among the cases were slightly higher than among either gene; only two or three risk alleles; four or ve risk alleles; controls in the discovery cohorts and in the replication cohort, we and six to eight risk alleles (Tables 5 and 6). Compared with assumed these alleles may be putative risk alleles in further individuals with zero or one risk allele, we observed that the combined analyses. combined genotypes in those carrying six to eight risk alleles had

NATURE COMMUNICATIONS | (2018) 9:3221 | DOI: 10.1038/s41467-018-05074-y | www.nature.com/naturecommunications 5 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-05074-y

Table 3 Pathways and GO terms in discovery and replication with a threshold of FDR < 0.25 in both phase

Source Pathway/gene set name Meta-analysis of Replication cohort discovery cohorts P-value FDR P-value FDR KEGG neuroactive ligand receptor interaction 0.001 0.006 0.013 0.042 Reactome neuronal system 0.001 0.015 0.014 0.082 transmission across chemical synapses 0.003 0.023 0.003 0.028 Gene Oncology substrate-specific channel activity <0.001 0.005 0.002 0.004 ion channel activity <0.001 0.005 0.002 0.004 substrate-specific transporter activity 0.001 0.006 0.010 0.013 cation channel activity 0.002 0.006 0.002 0.008 ion transmembrane transporter activity 0.002 0.006 0.004 0.009 metal ion transmembrane transporter activity 0.001 0.006 0.002 0.003 transmembrane transporter activity <0.001 0.006 0.007 0.012 gated channel activity 0.002 0.006 0.001 0.016 substrate-specific transmembrane transporter activity <0.001 0.006 0.006 0.012 cation transmembrane transporter activity 0.001 0.006 0.003 0.006 transmembrane receptor activity 0.001 0.007 <0.001 0.006 receptor activity 0.017 0.021 0.002 0.007 macromolecular complex 0.001 0.008 0.006 0.037 complex 0.001 0.012 0.006 0.080 intrinsic to membrane 0.003 0.022 0.002 0.025 intrinsic to plasma membrane 0.004 0.024 0.002 0.030 integral to membrane 0.003 0.027 0.002 0.027 membrane part 0.005 0.028 0.013 0.050 membrane 0.006 0.032 0.024 0.071 plasma membrane part 0.009 0.032 0.005 0.030 integral to plasma membrane 0.003 0.035 0.002 0.051 plasma membrane 0.015 0.044 0.034 0.085

enrichment analyses with 2522 SNPs (Supplementary Data 7). Among those SNPs that are significant at P < 0.05, pathway Table 4 Functional annotation of eQTL study results for analysis found the same one KEGG pathway and eight GO terms our susceptibility GWAS GO terms with a threshold of were identified and validated as significantly associated with lung P value < 0.1 cancer19 risk in the meta-analysis of discovery cohorts and in the replication cohort with statistically significant P values (Supple- GO term P-value mentary Table 6). In addition, we found that the KEGG neu- gated channel activity 0.029 roactive ligand receptor interaction pathway exhibited a ion channel activity 0.071 significant association with smoking-related lung cancer risk in cation channel activity 0.073 the meta-analysis results of the discovery phase (i-GSEA P = substrate-specific channel activity 0.08 0.004 and FDR = 0.017) and in the replication cohort (i-GSEA P cation transmembrane transporter activity 0.098 < 0.001 and FDR = 0.003). We did not explore chromosome 15q25.1-related for lung cancer in never smokers because the association between SNPs within chromosomes 15q25.1 and lung a >2-fold increased risk in discovery (adjusted OR = 2.17; 95% cancer did not reach genome-wide significance in the discovery CI, 1.74–2.70) and a 1.7-fold elevated risk in replication (adjusted cohorts. OR = 1.72; 95% CI, 1.52–1.95) for lung cancer risk. The fi difference between the four genotype groups had a signi cant Discussion association with lung cancer risk in a dose-dependent fashion in The chromosome 15q25.1 locus was first identified as the leading discovery (adjusted trend test P = 2.46 × 10−18) and in replication = −37 susceptibility locus for lung cancer in Caucasians in 2008 by our (adjusted trend test P 2.09 × 10 ). group2 and by Hung et al.3, and was then replicated in a Chinese population18, in African-Americans7,17, and by an international Stratified gene enrichment analyses by smoking status. When lung cancer consortium20, as well as in smokers21. However, to we performed the stratified analyses according to smoking status, our knowledge, no study to date has investigated how this locus we found that chromosome 15q25.1 was the most significant affects lung cancer etiology, nor documented the susceptibility susceptibility locus for lung cancer risk among smokers in the 1st pathways by which chromosome 15q25.1 modifies lung cancer and 2nd discovery cohort, and a meta-analysis of discovery risk and is involved in lung cancer pathogenesis. The results cohorts also supported this finding. Eight SNPs within chromo- presented here confirm the central role of chromosome 15q25.1 some 15q25.1 were identified and validated as associated with in lung cancer pathogenesis and provide confirmation of the smoking-related lung cancer and were defined as the index SNPs pathways that affect lung cancer pathogenesis. We identified the for further selection of the candidate SNPs (Supplementary neuroactive ligand receptor interaction pathway is involved as a Table 5). In total, 3401 candidate SNPs (Supplementary Data 6) mechanism by which the chromosome 15q25.1 locus influences in the whole genome were identified and verified to interact with lung cancer risk, in large discovery cohorts and in the replication eight index SNPs in the 1st and 2nd discovery cohort and in the cohort, and confirmed the involvement using functional anno- meta-analysis of discovery. After pruning for LD, we conducted tation of an eQTL study with lung tissue from lung cancer

6 NATURE COMMUNICATIONS | (2018) 9:3221 | DOI: 10.1038/s41467-018-05074-y | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-05074-y ARTICLE

acb KCND3 EDNRB KCND3 KCNB2 KCNB2 SCNN1B CHRNB4 KCNQ3 CFTR ASIC4 PARD3 TRPV1 CACNG2 CHRND KCNQ3 ADRA1A CHRNA3 ASIC2 GRM5 CHNRD CACNB2 CACNG2 KCNJ6 FSHR GRM7 KCNA6 CHRNB4 KCNA6 SCN7A ASIC2 SCN5A CACNA1A HRH1 GRIN2A KCNJ6 GRID2 CHRND CHRNB4 GRIK1 GABRG3 SCN7A SCN2A SCN2A CACNB2 GRM8 CACNA1A PRL CHRNA3 KCNJ4 CHRNA3 RYR3 RYR2 KCNJ4 PLG GABBR2 SCN2B SCN5A LPAR3 NTSR1 GABBR1 RYR2 SCNN1B SCN2B GRM6 CFTR Networks Co-expression CACNA2D1 CACNA2D1 RYR3 Shared protein domains Co-localization CHRND d e f Physical interactions TRPV1 KCNB2 Pathway CHRND CACNA2D1 SCNN1B Predicted KCND3 SLC12A2 CFTR KCNA6 KCNQ3 CHRNB4

KCNJ4 KCNB2 SCN7A KCNA6 CNRNB4 RYR3 KCNJ4 SCNN1B RYR2 ASIC4 KCNA6 CACNA2D1 KCNB2 SCN2B KCNJ4 TRPV1 CHRNB4 CHRNA3 KCNJ6 KCNQ3 KCNJ6 CHRND KCNJ6 SCNN1B CACNA2D1 SCN5A SCN2A RYR3 SCN2A CACNG2 CHRNA3 ASIC2 ITPR1 CACNA1A CACNG2 SCN5A KCNQ3 ASIC2 CACNA1A ASIC4 RYR2 SCN5A ASIC2 RYR2 SCN7A RYR3 KCND3 SCN7A SCN2B SCN2A CACNA1A CHRNA3 TRPV1 KCND3 CACNG2 ATP6V1B2 CACNB2

SCN2B CACNB2 CACNB2 SLC8A1 SLC17A5

Fig. 2 Gene network of the susceptibility pathways/GO terms. Each node is a gene, which was the selected gene in our GWAS pathway analysis and is shown in Supplementary Data 4. The connecting lines are drawn if the two genes have a relationship such as co-expression, shared protein domain, physical interactions, co-localization, or pathway. The thickness of the lines represents the degree of similarity between two genes. Gene network is produced using GeneMANIA. a The network of KEGG neuroactive ligand receptor interaction pathway consists of 23 genes and only TRPV1 in the associated gene list identified by our GWAS enrichment was shown no direct relationship with other associated genes. b The network of gated channel activity GO term consists all of 22 genes identified by our GWAS enrichment. c The network of ion channel activity GO term consists all of 24 genes identified by our GWAS enrichment. d The network of cation channel activity GO term consists all of 22 genes identified by our GWAS enrichment. e The network of substrate-specific channel activity GO term consists all of 24 genes identified by our GWAS enrichment. f The network of cation transmembrane transporter activity GO term consists 29 and only SLC4A4 in the associated gene list genes identified by our GWAS enrichment was shown no direct relationship with other associated genes patients. Gated channel activity term was verified to be sig- information processing and signaling molecules and interac- nificantly associated with the mechanism of chromosome 15q25.1 tion27. This pathway was found to be associated with certain in conferring lung cancer risk in GWAS pathway analysis of neuropsychiatric disorders and congenital diseases28,29. A recent discovery and replication phase and in the functional annotation study of 23 lung squamous cell carcinoma and paired normal of an eQTL study. In addition, risk alleles in SNPs in the genes in lung tissue evaluated gene expression associated with microRNA- our susceptibility pathways can be combines to confer the lung 375 and found that the neuroactive ligand receptor interaction cancer risk. pathway was one of the possible pathways associated with lung Pathway analyses, being a complementary approach to single- squamous cell carcinoma30. Another study investigated the dif- point analyses, can determine whether a set of genes from a ferentially expressed genes in 48 lung adenocarcinomas and 47 biological pathway jointly affects the risk of a disease trait and controls and revealed this pathway was one possible mechanism uncover insights into disease etiology, and therefore such analyses of lung adenocarcinoma31. are beneficial to better understand the bridge between genotypes These two reports revealed the dysregulation of the neuroactive and . GWAS pathway analyses together with gene ligand receptor interaction pathway in lung cancer and supported expression studies identified new pathways involved in the our findings that this pathway plays a role in lung cancer etiology. etiology of cardiovascular disease22, immune-related The neuroactive ligand receptor interaction pathway is also disorders23,24, and body fat distribution25. The first wave of implicated in nicotine dependence, which also contributes to GWA studies to explore lung cancer susceptibility regions iden- increasing lung cancer risk. Most of the selected genes of this tified several candidate genes and causal variants for lung cancer pathway from the current GWAS pathway analyses, including, risk. Going forward, investigation of the pathogenic pathways will CHRNA5–CHRNA3–CHRNB432, GABBR133, GABBR233, GRM734, be essential to provide a better understanding of the process of GRM835, GRIN2A35,andCHRND36 are significantly associated with lung cancer etiology and will contribute to further control of lung nicotine dependence and smoking behavior, as well as known cancer. smoking-related diseases such as lung cancer. For example, GRM7 The neuroactive ligand receptor interaction pathway mainly in chromosome 3p26.1 and GRM8 in chromosome 7q31.33 are consists a group of neuroreceptor genes, such as important in the biological processes and development of nicotine receptor26 and proto-, and is involved in environmental dependence, and some of these risks may be shared across diverse

NATURE COMMUNICATIONS | (2018) 9:3221 | DOI: 10.1038/s41467-018-05074-y | www.nature.com/naturecommunications 7 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-05074-y

a Metal ion transmembrane transporter activity

Ion transmembrane transporter activity

Plasma membrane Kegg neuroactive ligand receptor interaction Gated channel activity

Transmembrane transporter activity Membrane part Membrane

Transmembrane receptor activity Ion channel activity

Plasma membrane part Intrinsic to membrane Cation channel activity

Substrate specific channel activity Receptor activity

Intrinsic to plasma membrane Intergral to membrane Substrate specific transmembrane transporter activity

Cation transmembrane transporter activity Integral to plasma membrane Substrate specific transporter activity

b

GO terms

Molecular function Cellular component

Receptor activity Transporter activity Membrane

Transmembrane Transmembrane Substrate-specific receptor activity transporter activity transporter activity Plasma membrane Membrane part

Substrate-specific Intrinsic component of transmembrane Plasma membrane part membrane transporter activity

Channel activity Substrate sprcific Ion transmembrane Integral component of Intrinsic component of channel activity transporter activity membrane plasma membrane

Gated Cation transmembrane Integral component of Ion channel activity channel activity transporter activity plasma membrane

p < 0.05 in eQTL study Metal ion Cation channel activity transmembrane p > 0.05 and p < 0.10 in eQTL study transporter activity p > 0.10 in eQTL study

Not associated with our data

Fig. 3 Association among susceptibility pathways and GO terms. a Each node is a subcomponent. Connecting lines are drawn if the overlapping coefficient between the two nodes is greater than 0.8. This picture was drawn by Cytoscape with EnrichmentMap plugin, using the standard gene set file. b The relationship between the susceptibility GO terms. The relationship between the GO terms was based on the knowledge from the European Bioinformatics Institute. All gene sets which were enriched in our GWAS pathway analyses were colored. Of the GO terms, 10 terms belonged to a transporter activity term. Both substrate-specific transporter activity term and transmembrane transporter activity term are part of the transporter activity term and have a child term of substrate-specific transmembrane transporter activity. Both terms of substrate-specific channel activity and ion transmembrane transporter activity are part of the substrate-specific transmembrane transporter activity term, and have a child term of ion channel activity which had a child term of cation channel activity. In addition, cation transmembrane transporter activity was part of ion transmembrane transporter activity, and had a child term of metal ion transmembrane transporter activity. Gated channel activity term was a child term of transmembrane transporter activity and shared several child terms with ion channel activity term

8 NATURE COMMUNICATIONS | (2018) 9:3221 | DOI: 10.1038/s41467-018-05074-y | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-05074-y ARTICLE

Table 5 Individual and combined effects of SNPs from our susceptibility pathways on lung cancer risk in the meta-analysis of Discovery Cohorts

Univariate analysis Multivariate analysis* OR L95 U95 PP_trend OR L95 U95 PP_trend CHRNA3 rs1051730 0 1 1.29E-25 2.68E-24 1 1.31 1.21 1.43 3.46E-10 1.32 1.21 1.44 1.23E-09 2 1.86 1.65 2.09 8.80E-25 1.89 1.67 2.14 7.44E-24 CHRNB4 rs6495309 0 1 2.35E-12 4.56E-11 1 1.22 1 1.5 0.05 1.21 0.98 1.49 0.08 2 1.58 1.3 1.93 5.21E-06 1.56 1.27 1.91 2.67E-05 KCNJ4 rs138396 0 1 0.06 0.04 1 1 0.91 1.09 0.94 1.01 0.92 1.1 0.89 2 1.13 1.01 1.26 0.03 1.15 1.02 1.29 0.02 SCN2B rs7944321 0 1 0.07 0.14 1 1.07 0.98 1.16 0.13 1.06 0.97 1.15 0.19 2 1.12 0.94 1.34 0.22 1.09 0.91 1.32 0.35 neuroactive ligand receptor interaction pathway (CHRNA3 rs1051730 and CHRNB4 rs6495309) 0-1 1 2.80E-26 1 1.55E-24 2 1.32 1.18 1.47 8.45E-07 1.32 1.18 1.48 1.82E-06 3 1.48 1.32 1.65 4.94E-12 1.47 1.31 1.65 6.14E-11 4 2.04 1.79 2.33 2.22E-26 2.07 1.8 2.38 4.99E-25 gated channel activity term (CHRNA3 rs1051730, CHRNB4 rs6495309, KCNJ4 rs138396 and SCN2B rs7944321) 0-1 1 4.39E-19 2.46E-18 2-3 1.29 1.08 1.53 5.50E-03 1.3 1.08 1.56 5.50E-03 4-5 1.6 1.34 1.9 1.84E-07 1.63 1.36 1.96 1.65E-07 6-8 2.15 1.74 2.65 9.31E-13 2.17 1.74 2.7 4.01E-12

*Adjusted by age sex smoke status in the Logistic Models. population34,35. Second, the receptors in this malignant phenotypes43. Finally, the majority of genes which pathway (including CHRNA5, CHRNA3, CHRNB4,andCHRND) were chosen as the significant or selected genes in the current participate in the biological process by which smoking induces GWAS pathway analyses, such as KCNJ444, CACNB245, and nicotine dependence. Thus, the association between chromosome SLC14A46, have been reported to be involved in cancer etiology 15q25.1, this pathway, lung cancer likely reflects, at least partially, an and development. In addition, our finding that genes from our indirect effect of these genes on lung cancer risk through their effects susceptibility transporter activity GO terms jointly affected the on smoking behavior. Aside from nicotine dependence and lung chromosome 15q25.1-related lung cancer risk also supported the cancer risk, this pathway also influences other neurotransmitter- hypothesis that these pathways were implicated in the mechan- mediated disorders, such as dependence37,Parkinson’s isms of lung cancer. Therefore, we speculated that the accumu- disease38, therapy39, and spectrum dis- lated effects of multiple functionally-related genes from our orders40. Thus this pathway may have many complex effects on susceptibility pathways caused lung cancer occurrence, even lung cancer risk, either directly by influencing lung tissues or lung though a single gene in any pathway may have only a moderate or cancers as suggested by expression studies, indirectly through weak effect on lung cancer risk. However, more biological smoking behavior or even through effects on other mechanism research involving these pathways needs to be carried neurotransmitter-related diseases. out in future. Another important finding in our study is that a few trans- Our GWAS pathway analyses also suggest that receptor activity porter activity GO terms, such as gated channel activity, were GO terms and membrane terms, might play roles in the implicated in the mechanisms of chromosome 15q25.1-modified mechanism via which the chromosome 15q25.1 locus is involved lung cancer risk. Although we first reported the association in the pathogenesis of lung cancer, though the involvement between the transporter activity GO terms and lung cancer risk, exhibits nonsignificant association in the functional annotation of this finding could be supported by previous studies. First, eQTL studies. This finding was supported by the fact that most numerous studies have shown that a few transporter activity GO selected genes in the two pathways from current GWAS pathway terms are involved in the processes driving the malignancy, such analyses, including CHRNA3, CHRNB4, TGFBR247, RTPRG48, as calcium channels41,42, which belong to the gated channel group FGFR149, OPCML50,51, and ROR152,53, were involved in influ- or ion channel group. Second, CHRNA5–CHRNA3–CHRNB4 encing lung cancer risk. It is thus likely that combined effects and within chromosome 15q25.1 has been documented to modify interaction of the genes in the two susceptibility pathways trig- some pathways of gated channel activity and ion channel activity, gered lung cancer pathogenesis. Although we do not know at this and therefore to play a crucial role in leading to and maintaining stage whether the biological pathways identified in our study have

NATURE COMMUNICATIONS | (2018) 9:3221 | DOI: 10.1038/s41467-018-05074-y | www.nature.com/naturecommunications 9 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-05074-y

Table 6 Individual and combined effects of SNPs from our susceptibility pathways on lung cancer risk in Replication Cohort

Univariate analysis Multivariate analysis* OR L95 U95 P P_trend OR L95 U95 P P_trend CHRNA3 rs1051730 0 1 1.70E-51 1 1.82E-44 1 1.28 1.22 1.34 3.43E-23 1.27 1.21 1.33 2.62E-20 2 1.65 1.54 1.77 2.47E-46 1.63 1.52 1.75 1.22E-40 CHRNB4 rs6495309 0 1 1.40E-29 1 1.55E-24 1 1.14 1.02 1.28 0.02 1.12 1 1.26 0.05 2 1.46 1.31 1.63 1.32E-11 1.42 1.27 1.6 2.13E-09 KCNJ4 rs138396 0 1 2.00E-04 1 5.00E-04 1 1.06 1.01 1.11 0.03 1.05 1 1.11 0.08 2 1.13 1.06 1.21 2.00E-04 1.13 1.06 1.21 4.00E-04 SCN2B rs7944321 0 1 8.90E-03 1 0.01 1 1.06 1.01 1.11 0.026 1.05 1 1.11 0.04 2 1.1 0.99 1.22 0.083 1.1 0.99 1.23 0.08 neuroactive ligand receptor interaction pathway (CHRNA3 rs1051730 and CHRNB4 rs6495309) 0-1 1 1.11E-58 1 4.80E-50 2 1.21 1.14 1.28 1.79E-09 1.21 1.14 1.29 2.96E-09 3 1.44 1.36 1.54 1.44E-30 1.42 1.33 1.52 2.43E-26 4 1.77 1.64 1.91 2.65E-49 1.74 1.61 1.89 3.32E-43 gated channel activity term (CHRNA3 rs1051730, CHRNB4 rs6495309, KCNJ4 rs138396 and SCN2B rs7944321) 0-1 1 3.36E-44 1 2.09E-37 2-3 1.18 1.07 1.3 8.00E-04 1.15 1.04 1.28 5.40E-03 4-5 1.51 1.37 1.67 4.89E-17 1.47 1.33 1.63 7.77E-14 6-8 1.79 1.59 2.02 2.84E-22 1.72 1.52 1.95 8.27E-18

*Adjusted by age sex smoke status in the Logistic Model a direct functional role in affecting lung cancer etiology, the complex disorder12. Therefore, we applied a pathway-based susceptibility pathways represent attractive candidates. approach to identify the sets of pathways that were significantly Despite these intriguing findings in this well-characterized associated with cancer risk. To correct for multiple testing pathway study, our investigation still had some limitations. associated with pathway analysis we followed a false discovery First, we performed the epistasis test between SNPs in the rate approach. Identification and verification of the suscept- univariate model; this may have led to the omission of the ibility pathways in both GWAS analysis of discovery and effects of other cofactors, such as age, gender, and smoking replication and the eQTL study confirmed the reliability of our status, on the interaction between SNPs. However, not including study. Nevertheless, our results should be confirmed in the cofactors typically reduces powerratherthanfalsepositive future with genome-wide expression data and protein–protein findings, so that a model ignoring cofactors seems a reasonable interaction data, and more biological mechanism research first step to analysis. In addition, we only retained the SNP involving these pathways needs to be carried out. Finally, we pairs, which exhibited statistically significant interactions in the realized that, in the GWAS pathway analyses, all subjects used 1st discovery cohort and in the meta-analysis results and in both discovery and replication phase are of European showed at least a borderline significantinteractioninthe2nd ancestry, and that the subjects in the lung eQTL study are discovery cohort, for further analysis, which ensured the relia- French Canadians, which suggest that our findings can be bility of this study. Second, only GWAS data without genome- applied to the population of European ancestry. wide expression data were applied to identify the SNPs and their Many genetic variants certainly contribute to the large unex- related genes that interact with the chromosome 15q25.1 locus. plained portion of lung cancer pathogenesis, and it is expected However, the susceptibility pathways and GO terms from our that more mechanisms contributing to increased lung cancer risk GWA study can share pathways with the functional annotation will be identified in the future. The data presented here suggest of genes in chromosome 15q25.1 and its related genes identified that common genetic variations within chromosome 15q25.1 are by eQTL studies, which supports the interpretation of some of likely to affect lung cancer etiology by influencing the expression/ our findings. A concern in this study is the large number of tests structure and thereby the function of genes that comprise the that were performed to identify epistatically acting SNPs. neuroactive ligand receptor interaction pathway or gated channel However, the purpose of conducting SNP–SNP interaction test activity and related terms. To the best of our knowledge, this is in the current study is to select a group of candidate SNPs which the first study to explore the pathogenic pathways related to the are the most associated with the index SNPs in chromosome mechanisms through which the chromosome 15q25.1 locus 15q25.1. On the other hand, majority of SNPs/genes in the modifies lung cancer risk. These pathways provide important pathways have weak and minor influence on the pathogenesis of leads to a better understanding of the etiology and development

10 NATURE COMMUNICATIONS | (2018) 9:3221 | DOI: 10.1038/s41467-018-05074-y | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-05074-y ARTICLE of lung cancer, potentially shorten the interval between biologic association analysis using the basic meta-analysis function in PLINK v1.9, which knowledge and improved patient care, and are beneficial to the conducted a fixed-effects analysis using inverse variance weighting to combine the design of future functional studies to increase understanding of studies. In the discovery phase, a total of 310,276 SNPs were the same in both the 1st and 2nd discovery cohorts, passed quality control steps and were retained for these mechanisms. association analyses and meta-analyses.

Methods Index SNP selection. SNPs within the 15q25.1 locus spanning 203 kb, which were Study subjects. The study design is presented in Fig. 1. In the discovery phase, two associated with lung cancer risk with P values of less than 5 × 10−8 in the 1st dis- discovery cohorts were used, to perform SNP selection and GWAS pathway ana- covery cohort and in the meta-analysis of the 1st and 2nd discovery cohort, were lysis. The Environment And Genetics in Lung cancer Etiology (EAGLE) study54, selected as index SNPs. A Bonferroni correction was applied to adjust the association which was composed of 1923 lung cancer cases and 1977 healthy controls, was used of the index SNP for multiple comparisons with using PLINK version 1.9 and R as the 1st discovery cohort. The EAGLE study participants were recruited in package of adjust P-values for multiple comparisons. Because 310,276 SNPs in the between 2002 and 2005 for a population-based case-control study, which included discovery phase were performed for association analyses and meta-analyses and 3411 incident primary lung cancer cases of any histologic type and healthy population- SNPs in the replication phase were conducted association analyses, adjustments for based controls, matched by gender, residence, and 5-year age-group. All subjects in 310,276 tests in discovery and 3411 tests in replication were used. After selection, the EAGLE study are of Italian nationality and born in Italy. eight SNPs met the criterion for index SNP selection and were used for further We used the M.D. Anderson Cancer Center (MDACC) study2 and the selection of the candidate SNPs in the whole genome which had potential functional International Agency for Research on Cancer (IARC) study55 as the second connections with the index SNPs in the chromosome 15q25.1 locus. discovery cohort, in total comprising 2995 lung cancer cases and 3578 healthy controls. The MDACC study participants were recruited at the University of Texas Epistasis test and candidate SNP selection MD Anderson Cancer Center between 1997 and 2007 and included 1154 primary . The epistasis test between SNPs in lung cancer cases of adenocarcinoma and squamous cell carcinoma and 1136 the whole genome and the chromosome 15q25.1 locus was performed separately healthy controls that were matched to cases by smoking behavior, ethnicity, and 5- for the 1st and 2nd discovery cohorts using the application PLINK version 1.9. A year age-group. The IARC study was a multicenter study from six countries of total of 2,482,200 SNP × SNP pairs were calculated in both cohorts. We then central Europe, which recruited newly-diagnosed lung cancer cases of any carried out a meta-analysis to combine the epistasis results in the 1st and 2nd discovery cohorts with the application of the basic meta-analysis function in histologic type and healthy individuals without diagnosed cancers or any family fi history of cancers, matched to cases by sex, age, and center or region within PLINK v1.9 that conducted xed-effects analysis using inverse variance weighting. European countries. The current case-control comparison included 1841 cases and The SNPs, which interacted with the index SNPs within the 15q25.1 locus with 2442 controls from IARC available data. All subjects used in the current study are an epistasis P value of less than 0.05 in the 1st discovery cohort and in the meta- of European ancestry. analysis of both discovery cohorts, and less than 0.10 in the 2nd discovery cohort, The Oncoarray consortium, which analyzed samples of 18,439 European- were selected as the candidate SNPs for further pathway analyses. After selection, 3409 candidate SNPs met the criterion for candidate SNP selection and were descent lung cancer cases and 14,026 European-descent healthy controls, was used fi for replication. The Oncoarray consortium is a network created to increase identi ed to have potential connections with the eight index SNPs in the understanding of the genetic architecture of common cancers and included GWAS chromosome 15q25.1 locus. data of a total of 57,776 samples, obtained from 29 studies across North America and Europe, as well as Asia56. The participants who lacked imputed data, disease Pathway analysis with GWAS data. We included curated pathways from the status, were close relatives (second-degree relatives or closer) or had low-quality Canonical pathways, Reactome, BioCarta, KEGG databases59, and GO60. The DNA, or were non-European, were excluded from the current study. Therefore, a Reactome database is based on reactions between diverse molecular species rather total of 18,439 cases and 14,026 healthy controls were included in the current case- than limiting the pathways to protein–protein interactions. The KEGG database control study. represents experimentally-validated pathways of metabolic processes and gene sets Noncancerous lung tissue from Laval University was obtained from 420 of human diseases. GO is a major framework for the model of biology that defines patients undergoing surgical resection for lung cancer. Through quality controls, classes used to describe gene function, and relationships between these concepts. 409 samples were used for whole-genome gene expression profiling in the lung and Gene set enrichment analysis was performed by i-GSEA4GWAS. SNPs were eQTL analysis. All patients in the Laval cohort were from a French Canadian retained for analysis that were within 20 kb upstream or downstream of a gene. We population and underwent lung cancer surgery between April 2004 and December used gene set databases of canonical pathways, GO biological process, GO 2008. Samples were stored at the Institut universitaire de cardiologie et de molecular function, and GO cellular component, separately, and applied the pneumologie de Quebec (IUCPQ) site of the Respiratory Health Network Tissue standard input gene set file of KEGG, BioCarta and, Reactome, which were Bank of the Fonds de la recherche en sante du Quebec (www.tissuebank.ca)57. Lung downloaded from the Molecular Signatures Database (MSigDB) in GSEA (http:// tissue samples were obtained in accordance with Institutional Review Board software.broadinstitute.org/gsea/msigdb/collections.jsp), we selected gene sets guidelines. Genotype data and a detailed reports were available for all whose number of genes were between 21 and 200, and without limiting gene sets by patients. keyword (e.g. immune) and without masking the MHC region. In order to reduce Human participant approval was obtained from the Institutional Review Board the possibility of biased results due to LD patterns from SNP arrays13, we pruned of each participating Hospital and University and by the National Cancer Institute, the set of SNPs, including the index SNPs and the candidate SNPs, for LD and only Bethesda, MD, USA. Written informed consent was obtained from each inputted SNPs not in LD (r2 < 0.2) to enrich pathways. After pruning, we participant. performed gene set enrichment analysis with associated P values of the SNPs that using the option of “−logarithm transformation”, as required by the software, in Genotyping. A total of 561,466 SNPs in EAGLE samples were genotyped using the meta-analysis of discovery cohorts and the replication cohort, respectively. Illumina HumanHap550v3_B BeadChips (Illumina, San Diego, CA, USA) at the i-GSEA4GWAS performs gene set enrichment to identify pathways that show a higher proportion of statistically significant genes than randomly expected and, Center for Inherited Disease Research, part of the Gene Environment Association fi 61,62 Studies Initiative (GENEVA) funded through the National with some modi cations, is based on the GSEA algorithm . i-GSEA4GWAS implements SNP label permutation to analyze SNP P values and to correct gene Research Institute. Genotyping of 317,498 SNPs in MDACC samples was carried fi out using Illumina 300K HumanHap v1.155. A further 317,139 SNPs in IARC and gene set variation and multiplies a signi cance proportion ratio factor to the 55 enrichment score (ES) to yield the significant proportion-based enrichment score samples were genotyped using either Illumina 317k or 370Duo arrays . A novel fi technology developed by Illumina to facilitate efficient genotyping was used to (SPES). SPES multiplies by the proportion of signi cant SNPs in the pathway so 56 that i-GSEA4GWAS identifies pathways/gene sets including a high proportion of genotype a total of 494,763 SNPs in Oncoarray samples . rs16969968 in the lung fi eQTL study was genotyped using the Illumina Human1M-Duo BeadChip10. signi cant genes. It is, therefore, more appropriate for study of the combined effects of possibly modest SNPs/genes and gives i-GSEA improved sensitivity for complex diseases13. Pathways/gene sets with FDR < 0.25 were regarded as possibly Imputation. To effectively replicate the findings in the discovery phase, we associated with traits; FDR < 0.05 were regarded as high confidence or with imputed additional SNPs in Oncoarray samples to allow us to integrate the data statistical significance. with the common SNPs studied in the discovery cohorts. Imputation was per- Gene set enrichment analysis was also performed by GSA-SNP2, which is a formed with the software package Impute 2 v2.3.258 and 1000 Genomes Project successor of GSA-SNP14,15, using same SNPs and P value and to retain SNPs with Phase 3 (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data). Following impu- 20 kb upstream or downstream of a gene. We used gene set databases which were tation, 20,734,083 SNPs in the whole genome were available for further analysis. downloaded from the Molecular Signatures Database (MSigDB) in GSEA and selected gene sets whose number of genes was between 21 and 200. Association analysis and meta-analysis. Case-control association tests for gen- otyped data were conducted using 1-degree-of-freedom Cochran–Mantel–Haenszel Genome-wide gene expression levels and eQTL study. All lung samples were tests with the application of PLINK version 1.9. SNPTEST v2.5.2 was used in the reviewed by an experienced pathologist for clinical diagnosis and staging. Each analysis of case-control association for each SNP in imputed data. Meta-analysis of lung tissue sample was snap-frozen in liquid nitrogen and stored at −80 °C until the 1st and 2nd discovery cohorts was performed on the results of case-control further processing. The SV96 Total RNA Isolation System (Promega) was used to

NATURE COMMUNICATIONS | (2018) 9:3221 | DOI: 10.1038/s41467-018-05074-y | www.nature.com/naturecommunications 11 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-05074-y extract RNA. Expression profiling was carried out with an Affymetrix custom array 2. Amos, C. I. et al. Genome-wide association scan of tag SNPs identifies a (GEO platform GPL10379). The robust multichip average method63 as imple- susceptibility locus for lung cancer at 15q25.1. Nat. Genet. 40, 616–622 (2008). mented in the Affymetrix Power Tools software was used to examine expression 3. Hung, R. J. et al. A susceptibility locus for lung cancer maps to nicotinic values. Standard quality control parameters64 were applied to check the quality of subunit genes on 15q25. Nature 452, 633–637 (2008). the arrays. Through quality controls, 409 patients were available for eQTL analyses 4. Thorgeirsson, T. E. et al. A variant associated with nicotine dependence, lung with both genotypes and gene expression levels. R statistical software was used to cancer and peripheral arterial disease. Nature 452, 638–642 (2008). perform tests for robust multichip average expression. Association tests between 5. Liu, J. Z. et al. Meta-analysis and imputation refines the association of 15q25 the expression traits, which were adjusted for age, sex, and smoking status, and the with smoking quantity. Nat. Genet. 42, 436–440 (2010). fi most signi cant SNPs in each genes within chromosome 15q25.1 associated with 6. Tobacco and Genetics Consortium. Genome-wide meta-analyses identify 16 17,18 lung cancer risk in previous reports, including rs16969968 , rs6495309 , and multiple loci associated with smoking behavior. Nat. Genet. 42, 441–447 2,17 rs8034191 , were estimated with the application of quantitative association tests (2010). implemented in PLINK. A P value of less than 0.05 was considered to be 7. Walsh, K. M. et al. Fine-mapping of the 5p15.33, 6p22.1-.31, and 15q25.1 fi signi cant. regions identifies functional and histology-specific lung cancer susceptibility loci in African-Americans. Cancer Epidemiol. Biomarkers Prev. 22, 251–260 Functional validation of pathways with eQTL results. The genes in the whole (2013). genome with a linear regression P value of less than 0.0005 in the eQTL study were 8. Chen, L. S. et al. Smoking and genetic risk variation across populations of selected as candidate genes for further pathway/gene set analysis. Because the European, Asian, and African American ancestry—a meta-analysis of “Functional annotation table” of DAVID may query associated terms for all genes chromosome 15q25. Genet. Epidemiol. 36, 340–351 (2012). and “Functional annotation clustering” of DAVID can cluster functionally similar 9. Paliwal, A. et al. Aberrant DNA methylation links cancer susceptibility locus genes into groups65, we employed the “Functional annotation table” of DAVID 15q25.1 to apoptotic regulation and lung cancer. Cancer Res. 70, 2779–2788 (version 6.8) to perform functional annotation of biological pathways in Reactome, (2010). BioCarta, and KEGG and used “Functional annotation clustering” of DAVID 10. Nguyen, J. D. et al. Susceptibility loci for lung cancer are associated with (version 6.8) to cluster functionally similar genes into groups of GO terms, with mRNA levels of nearby genes in the lung. Carcinogenesis 35, 2653–2659 Species and background being set up as Homo sapiens. We used the candidate (2014). genes in the whole genome with and without the genes in the chromosome 15q25.1 11. Ji, X. et al. The role of haplotype in 15q25.1 locus in lung cancer risk: results of ‘ ’ locus to perform the analyses, by choosing Homo sapiens selection to limit scanning . Carcinogenesis 36, 1275–1283 (2015). annotations by species. 12. Fellay, J. et al. Common genetic variation and the control of HIV-1 in humans. PLoS Genet. 5, e1000791 (2009). Relationship of the susceptibility pathways and GO terms. We clarified the 13. Zhang, K., Cui, S., Chang, S., Zhang, L. & Wang, J. i-GSEA4GWAS: a web relationship of sharing pathways identified by GWAS analysis of both discovery server for identification of pathways/gene sets associated with traits by and replication phase and validated by eQTL studies with information from the applying an improved gene set enrichment analysis to genome-wide European Bioinformatics Institute (http://www.ebi.ac.uk/QuickGO/) and applied association study. Nucleic Acids Res. 38, W90–W95 (2010). Cytoscape (version 3.4.0) with EnrichmentMap plugin (version 2.1). The standard 14. Kwon, J. S., Kim, J., Nam, D. & Kim, S. Performance comparison of two gene gene sets for EnrichmentMap plugin were downloaded from MSigDB Collections set analysis methods for genome-wide association study results: GSA-SNP vs (http://software.broadinstitute.org/gsea/msigdb/collections.jsp). For the selected i-GSEA4GWAS. Genomics Inform. 10, 5 (2012). genes in our susceptibility pathways from current GWAS pathway analyses, we 15. Nam, D., Kim, J., Kim, SY. & Kim, S. GSA-SNP: a general approach for gene achieved the gene expression level in normal lung tissue from Genecards database set analysis of polymorphisms. Nucleic Acids Res. 38, 6 (2010). (http://www.genecards.org/). 16. George, A. A. et al. Function of human α3β4α5 nicotinic acetylcholine receptors is reduced by the α5(D398N) variant. J. Biol. Chem. 287, Accumulating risk of lung cancer. We calculated the individual and combined 25151–25162 (2012). effect of genes in the pathways/gene set on lung cancer risk using the SNPs that 17. Amos, C. I. et al. Nicotinic acetylcholine receptor region on chromosome were identified by i-GSEA4GWAS as reference SNPs for the selected genes in each 15q25 and lung cancer risk among African Americans: a case-control study. J. pathway. The genes whose reference SNPs were associated with lung cancer risk Natl. Cancer Inst. 102, 1199–1205 (2010). with borderline significance (P < 0.1) in the meta-analysis of discovery cohorts and 18. Wu, C. et al. Genetic variants on chromosome 15q25 associated with lung in the replication cohort, were selected to assess the individual effect and joint cancer risk in Chinese populations. Cancer Res. 69, 5065–5072 (2009). effects on lung cancer risk. Genotype frequencies between the cases and controls 19. Koster, R. et al. Pathway-based analysis of GWAs data identifies association of were evaluated using a chi-square test. Univariate and multivariate logistic sex determination genes with susceptibility to testicular germ cell tumors. regression models were used to calculate odds ratios (ORs) and 95% confidence Hum. Mol. Genet. 23, 6061–6068 (2014). intervals (CIs) of each genotype to estimate its effect on lung cancer risk with or 20. Truong, T. et al. Replication of lung cancer susceptibility loci at chromosomes without adjustment for age, sex and smoking status (never and ever). Statistical 15q25, 5p15, and 6p21: a pooled analysis from the International Lung Cancer analyses were performed with Statistical Analysis System (SAS) software (version Consortium. J. Natl. Cancer Inst. 102, 959–971 (2010). 9.1; SAS Institute, Cary, NC, USA) and P value < 0.05 was considered significant. 21. Jaworowska, E. et al. Smoking related cancers and loci at chromosomes 15q25, 5p15, 6p22.1 and 6p21.33 in the Polish population. PLoS ONE 6, e25057 Stratified analyses. We determined whether there were different pathways among (2011). the overall group, and in the group when stratified by smoking status (never and 22. Makinen, V. P. et al. Integrative genomics reveals novel molecular pathways ever). In the subgroups of smokers and non-smokers, we used a similar process to and gene networks for coronary artery disease. PLoS Genet. 10, e1004502 select index SNPs and candidate SNPs, and to carry out pathway analyses. Among (2014). smokers, eight index SNPs and the 3401 candidate SNPs were selected for further 23. Parkes, M., Cortes, A., van Heel, D. A. & Brown, M. A. Genetic insights into gene set enrichment analysis in the 1st and 2nd discovery cohort, the meta-analysis common pathways and complex relationships among immune-mediated of discovery cohorts and the replication cohort. Among non-smokers, no SNPs in diseases. Nat. Rev. Genet. 14, 661–673 (2013). chromosome 15q25.1 reached the criteria for index SNP selection, and therefore no 24. Bunyavanich, S. et al. Integrated genome-wide association, coexpression subsequent steps for pathway analyses were conducted. network, and expression single nucleotide polymorphism analysis identifies novel pathway in allergic rhinitis. BMC Med. Genomics 7, 48 (2014). Data availability. The data that support the findings of this study are available. 25. Shungin, D. et al. New genetic loci link adipose and insulin biology to body fat – The access numbers are “phs000336.v1.p1.c1” for EAGLE study, “phs000753.v1. distribution. Nature 518, 187 196 (2015). – p1” for MDACC study, and “phs001273” for Oncoarray study in dbGAP. The 26. Adkins, D. E. et al. SNP-based analysis of neuroactive ligand receptor IARC study was made available at http://www.ceph.fr/cancer3. interaction pathways implicates PGE2 as a novel mediator of antipsychotic treatment response: data from the CATIE study. Schizophr. Res. 135, 200–201 (2012). Received: 1 November 2016 Accepted: 1 May 2018 27. Ren, C. et al. CRISPR/Cas9-mediated efficient targeted mutagenesis in Chardonnay (Vitis vinifera L.). Sci. Rep. 6, 32289 (2016). 28. Mundt, E. & Bates, M. D. Genetics of Hirschsprung disease and anorectal malformations. Semin. Pediatr. Surg. 19, 107–117 (2010). 29. Puri, P. & Shinkai, T. Pathogenesis of Hirschsprung’s disease and its variants: recent progress. Semin. Pediatr. Surg. 13,18–24 (2004). References 30. Chen, W. J. et al. Implication of downregulation and prospective pathway 1. Siegel, R., Ward, E., Brawley, O. & Jemal, A. Cancer statistics, 2011: the impact signaling of microRNA-375 in lung squamous cell carcinoma. Pathol. Res. of eliminating socioeconomic and racial disparities on premature cancer Pract. 213, 364–372 (2017). deaths. CA Cancer J. Clin. 61, 212–236 (2011).

12 NATURE COMMUNICATIONS | (2018) 9:3221 | DOI: 10.1038/s41467-018-05074-y | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-05074-y ARTICLE

31. Wu, X., Zang, W., Cui, S. & Wang, M. Bioinformatics analysis of two 57. Bosse, Y. et al. Molecular signature of smoking in human lung tissues. Cancer microarray gene-expression data sets to select lung adenocarcinoma marker Res. 72, 3753–3763 (2012). genes. Eur. Rev. Med. Pharmacol. Sci. 16, 1582–1587 (2012). 58. Howie, B. N., Donnelly, P. & Marchini, J. A flexible and accurate genotype 32. Lassi, G. et al. The CHRNA5-A3-B4 gene cluster and smoking: from discovery imputation method for the next generation of genome-wide association to therapeutics. Trends Neurosci. 39, 851–861 (2016). studies. PLoS Genet. 5, e1000529 (2009). 33. Li, M. D. et al. Association and interaction analyses of GABBR1 and GABBR2 59. Ogata, H. et al. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic with nicotine dependence in European- and African-American populations. Acids Res. 27,29–34 (1999). PLoS One 4, e7055 (2009). 60. Gene Ontology Consortium. The Gene Ontology (GO) project in 2006. 34. Begum, F. et al. Hemizygous deletion on chromosome 3p26.1 is associated Nucleic Acids Res. 34, D322–D326 (2006). with heavy smoking among African American subjects in the COPD gene 61. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based study. PLoS ONE 11, e0164134 (2016). approach for interpreting genome-wide expression profiles. Proc. Natl Acad. 35. Vink, J. M. et al. Genome-wide association study of smoking initiation and Sci. U.S.A. 102, 15545–15550 (2005). current smoking. Am. J. Hum. Genet. 84, 367–379 (2009). 62. Wang, K., Li, M. & Bucan, M. Pathway-based approaches for analysis of 36. Saccone, N. L. et al. Multiple distinct risk loci for nicotine dependence genomewide association studies. Am. J. Hum. Genet. 81, 1278–1283 (2007). identified by dense coverage of the complete family of nicotinic receptor 63. Irizarry, R. A. et al. Exploration, normalization, and summaries of high density subunit (CHRN) genes. Am. J. Med. Genet. B, Neuropsychiatr. Genet. 150B, oligonucleotide array probe level data. Biostatistics 4, 249–264 (2003). 453–466 (2009). 64. Heber, S. & Sick, B. Quality assessment of Affymetrix GeneChip data. OMICS 37. Biernacka, J. M. et al. Genome-wide gene-set analysis for identification of 10, 358–368 (2006). pathways associated with alcohol dependence. Int. J. Neuropsychopharmacol. 65. da Huang, W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative 16, 271–278 (2013). analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 38. Kong, Y. et al. High throughput sequencing identifies microRNAs mediating 4,44–57 (2009). alpha-synuclein by targeting neuroactive-ligand receptor interaction pathway in early stage of Drosophila Parkinson’s disease model. PLoS ONE 10, e0137432 (2015). Acknowledgments 39. Putnam, D. K., Sun, J. & Zhao, Z. Exploring schizophrenia drug–gene This work was supported by National Institutes of Health (NIH) for the research of lung interactions through molecular network and pathway modeling. AMIA Annu. cancer (grants P30CA023108, P20GM103534, and R01LM012012); Transdisciplinary Symp. Proc. 2011, 1127–1133 (2011). Research in Cancer of the Lung (TRICL) (grant U19CA148127); UICC American Cancer 40. Wen, Y., Alshikho, M. J. & Herbert, M. R. Pathway network analyses for Society Beginning Investigators Fellowship funded by the Union for International Cancer autism reveal multisystem involvement, major overlaps with other diseases Control (UICC) (to X.J.). CAPUA study was supported by FIS-FEDER/Spain grant and convergence upon MAPK and . PLoS One 11, e0153329 numbers FIS-01/310, FIS-PI03-0365, and FIS-07-BI060604, FICYT/Asturias grant (2016). numbers FICYT PB02-67 and FICYT IB09-133, and the University Institute of Oncology 41. Lee, J. M., Davis, F. M., Roberts-Thomson, S. J. & Monteith, G. R. Ion (IUOPA) of the University of Oviedo and the Ciber de Epidemiologia y Salud Pública. channels and transporters in cancer. 4. Remodeling of Ca(2+) signaling in CIBERESP, SPAIN.CARET study was supported by NIH awards UM1 CA167462, UO1- tumorigenesis: role of Ca(2+) transport. Am. J. Physiol. Cell Physiol. 301, CA6367307, CA111703, and R01 CA151989. The Liverpool Lung project is supported by C969–C976 (2011). the Roy Castle Lung Cancer Foundation. The Harvard Lung Cancer Study was supported 42. Prevarskaya, N., Skryma, R. & Shuba, Y. Targeting Ca(2)(+) transport in by NIH grants CA092824, CA090578, and CA074386. The Multiethnic Cohort Study was cancer: close reality or long perspective? Expert Opin. Ther. Targets 17, partially supported by NIH grants CA164973, CA033619, CA63464, and CA148127. The 225–241 (2013). MSH-PMH study was supported by The Canadian Cancer Society Research Institute 43. Deliot, N. & Constantin, B. Plasma membrane calcium channels in cancer: (020214), Ontario Institute of Cancer and Cancer Care Ontario Chair Award to R.H. and alterations and consequences for cell proliferation and migration. Biochim. G.L. and the Alan Brown Chair and Lusi Wong Programs at the Princess Margaret Biophys. Acta 1848, 2512–2522 (2015). Hospital Foundation. R.T is supported by a Canada Research Chair in Pharmacoge- 44. Kim, Y. S., Kim, Y., Choi, J. W., Oh, H. E. & Lee, J. H. Genetic variants and nomics, CIHR grant (FDN-154294), and CAMH. NJLCS was funded by the State Key risk of prostate cancer using pathway analysis of a genome-wide association Program of National Natural Science of (81230067), the National Key Basic study. Neoplasma 63, 629–634 (2016). Research Program Grant (2011CB503805), the Major Program of the National Natural 45. Tomoshige, K. et al. Germline mutations causing familial lung cancer. J. Hum. Science Foundation of China (81390543). The Norway study was supported by Nor- Genet. 60, 597–603 (2015). wegian Cancer Society, Norwegian Research Council. The Shanghai Cohort Study was 46. Frullanti, E. et al. Association of lung adenocarcinoma clinical stage with gene supported by NIH R01 CA144034 and UM1 CA182876. The Singapore Chinese Health expression pattern in noninvolved lung tissue. Int. J. Cancer 131, E643–E648 Study (SCHS) was supported by NIH R01 CA144034 and UM1 CA182876. The TLC (2012). study has been supported in part by the James & Esther King Biomedical Research 47. Lee, J., Katzenmaier, E. M., Kopitz, J. & Gebert, J. Reconstitution of TGFBR2 Program (09KN-15), NIH grant P50 CA119997 and P30CA76292. The Vanderbilt Lung in HCT116 colorectal cancer cells causes increased LFNG expression and Cancer Study—BioVU dataset used for the analyses described was obtained from Van- enhanced N-acetyl-d-glucosamine incorporation into Notch1. Cell Signal. 28, derbilt University Medical Center’s BioVU, which is supported by institutional funding, 1105–1113 (2016). the 1S10RR025141-01 instrumentation award, and by the Vanderbilt CTSA grant 48. Cheung, A. K. et al. PTPRG suppresses tumor growth and invasion via UL1TR000445 from NCATS/NIH, K07CA172294 and U01HG004798. The Copenhagen inhibition of Akt signaling in nasopharyngeal carcinoma. Oncotarget 6, General Population Study was supported by the Chief Johan Boserup and Lise 13434–13447 (2015). Boserup Fund, the Danish Medical Research Council and Herlev Hospital. The NELCS 49. Englinger, B. et al. Acquired nintedanib resistance in FGFR1-driven small cell study was supported by NIH grant P20RR018787. The MDACC study was supported in lung cancer: role of -A receptor-activated ABCB1 expression. part by grants from the NIH (P50 CA070907, R01 CA176568), Cancer Prevention & Oncotarget. 7, 50161–50179 (2016). Research Institute of Texas (RP130502), and The University of Texas MD Anderson 50. Cui, Y. et al. OPCML is a broad tumor suppressor for multiple carcinomas Cancer Center institutional support for the Center for Translational and Public Health and lymphomas with frequently epigenetic inactivation. PLoS One 3, e2990 Genomics. The study in Lodz center was partially funded by Nofer Institute of Occu- (2008). pational Medicine, under task NIOM 10.13: Predictors of mortality from non-small cell 51. Anglim, P. P. et al. Identification of a panel of sensitive and specific DNA lung cancer—field study. Kentucky Lung Cancer Research Initiative was supported by the methylation markers for squamous cell lung cancer. Mol. Cancer 7, 62 (2008). Department of Defense [Congressionally Directed Medical Research Program, U.S. Army 52. Gentile, A., Lazzari, L., Benvenuti, S., Trusolino, L. & Comoglio, P. M. Ror1 is Medical Research and Materiel Command Program] under award number 10153006 a pseudokinase that is crucial for Met-driven tumorigenesis. Cancer Res. 71, (W81XWH-11-1-0781). Views and opinions of, and endorsements by the author(s) do 3132–3141 (2011). not reflect those of the US Army or the Department of Defense. It is also supported by 53. Yamaguchi, T. et al. NKX2-1/TITF1/TTF-1-induced ROR1 is required to NIH grant UL1TR000117 and P30 CA177558 using Shared Resource Facilities: Cancer sustain EGFR survival signaling in lung adenocarcinoma. Cancer Cell. 21, Research Informatics, Biospecimen and Tissue Procurement, and Biostatistics and 348–361 (2012). Bioinformatics. The Resource for the Study of Lung Cancer Epidemiology in North Trent 54. Landi, M. T. et al. Environment And Genetics in Lung cancer Etiology (ReSoLuCENT) study was funded by the Sheffield Hospitals Charity, Sheffield Experi- (EAGLE) study: an integrative population-based case-control study of lung mental Cancer Medicine Centre, and Weston Park Hospital Cancer Charity. F.T. was cancer. BMC Public Health 8, 203 (2008). supported by a clinical PhD fellowship funded by the Yorkshire Cancer Research/Cancer 55. Wang, Y. et al. Rare variants of large effect in BRCA2 and CHEK2 affect risk Research UK Sheffield Cancer Centre. The authors at Laval would like to thank the staff of lung cancer. Nat. Genet. 46, 736–741 (2014). at the Respiratory Health Network Tissue Bank of the FRQS for their valuable assistance 56. McKay, J. et al. Large-scale association analysis identifies new lung cancer with the lung eQTL dataset at Laval University. The lung eQTL study at Laval University susceptibility loci and heterogeneity in genetic susceptibility across histological was supported by the Fondation de l’Institut universitaire de cardiologie et de pneu- subtypes. Nat. Genet. 49, 1126–1132 (2017). mologie de Québec, the Respiratory Health Network of the FRQS, the Canadian Insti- tutes of Health Research (MOP-123369). Y.B. holds a Canada Research Chair in

NATURE COMMUNICATIONS | (2018) 9:3221 | DOI: 10.1038/s41467-018-05074-y | www.nature.com/naturecommunications 13 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-05074-y

Genomics of Heart and Lung Diseases. The research undertaken by M.T., L.W., and M.A. Reprints and permission information is available online at http://npg.nature.com/ was partly funded by the National Institute for Health Research (NIHR). The views reprintsandpermissions/ expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, or the Department of Health. M.D.T. holds a Medical Research Council Senior Clinical Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in Fellowship (G0902313). published maps and institutional affiliations.

Author contributions C.A. and M.de B. designed the study. C.A. edited the article. X.J. analyzed the data. X.J., C.A., and Y.B. wrote the article. Y.B. performed eQTL research. M.L. contributed to Eagle Open Access This article is licensed under a Creative Commons Data. C.A., X.J., Y.B., M.L., J.G., X.X., D.Q., P.J., M.L., Y.L., I.G., M.de B., Y.H., O.G., R. Attribution 4.0 International License, which permits use, sharing, H., X.W., J.M., X.Z., R.C., D.C., N.C., M.J., G.L., S.B., L.M., D.A., H.B., M.A., W.B., A.T., adaptation, distribution and reproduction in any medium or format, as long as you give G.R., C.C., M.T., J.F., L.K., P.L., A.H., S.L., M.S., A.A., H.S., Y.H., J.Y., P.B., A.P., Y.Y., N. appropriate credit to the original author(s) and the source, provide a link to the Creative D., L.S., R.Z., Y.B., N.L., J.J., A.M., W.S., C.H., L.W., A.F., G.F., E.H., J.K., J.D., Z.H., M.D., Commons license, and indicate if changes were made. The images or other third party ’ M.M., H.B., J.M., O.M., D.M., K.O., A.T., R.T., J.D., G.G., A.C., F.T., P.W., I.B., J.M., T. material in this article are included in the article s Creative Commons license, unless M., A.R., A.R., K.G., M.J., F.S., M.T., S.A., E.H., C.B., I.H., V.J., M.K., J.L., A.M., S.O., T. indicated otherwise in a credit line to the material. If material is not included in the O., G.S., B.S., D.Z., P.B., V.S., S.Z., E.D., L.B., W.K., Y.G., R.H., J.M., V.S., D.N., M.O., W. article’s Creative Commons license and your intended use is not permitted by statutory T., B.Z., L.S., M.A., M.T., L.W., F.G., J.B., A.K., D.Z., R.T., W.W., S.C., and P.B. conducted regulation or exceeds the permitted use, you will need to obtain permission directly from data preparation, discussed the results, and approved the final version of the manuscript. the copyright holder. To view a copy of this license, visit http://creativecommons.org/ licenses/by/4.0/. Additional information Supplementary Information accompanies this paper at https://doi.org/10.1038/s41467- 018-05074-y. © The Author(s) 2018 Competing interests: The authors declare no competing interests.

Xuemei Ji1, Yohan Bossé 2,3, Maria Teresa Landi4, Jiang Gui1, Xiangjun Xiao1, David Qian1, Philippe Joubert3, Maxime Lamontagne3, Yafang Li1, Ivan Gorlov1, Mariella de Biasi5,6, Younghun Han1, Olga Gorlova1, Rayjean J. Hung7, Xifeng Wu 8, James McKay9, Xuchen Zong7, Robert Carreras-Torres9, David C. Christiani10,11, Neil Caporaso4, Mattias Johansson9, Geoffrey Liu7, Stig E. Bojesen 12,13,14, Loic Le Marchand15, Demetrios Albanes4, Heike Bickeböller16, Melinda C. Aldrich17, William S. Bush17,18, Adonina Tardon19,20, Gad Rennert 21,22, Chu Chen23, M. Dawn Teare24, John K. Field25, Lambertus A. Kiemeney26, Philip Lazarus27, Aage Haugen28, Stephen Lam29, Matthew B. Schabath 30, Angeline S. Andrew31, Hongbing Shen32, Yun-Chul Hong33, Jian-Min Yuan34, Pier A. Bertazzi 35,36, Angela C. Pesatori 35,36, Yuanqing Ye8, Nancy Diao10,LiSu10, Ruyang Zhang10,32, Yonathan Brhane7, Natasha Leighl37, Jakob S. Johansen38, Anders Mellemgaard38, Walid Saliba21,22, Christopher Haiman39, Lynne Wilkens15, Ana Fernandez-Somoano 19,20, Guillermo Fernandez-Tardon19,20, Erik H.F.M. van der Heijden 26, Jin Hee Kim 40, Juncheng Dai32, Zhibin Hu32, Michael P.A. Davies25, Michael W. Marcus25, Hans Brunnström41, Jonas Manjer42, Olle Melander42, David C. Muller 43, Kim Overvad42, Antonia Trichopoulou44, Rosario Tumino45, Jennifer Doherty31,46, Gary E. Goodman46,47, Angela Cox 48, Fiona Taylor48, Penella Woll48, Irene Brüske49, Judith Manz49, Thomas Muley50,51, Angela Risch 52, Albert Rosenberger16, Kjell Grankvist 53, Mikael Johansson54, Frances Shepherd55, Ming-Sound Tsao55, Susanne M. Arnold 56, Eric B. Haura57, Ciprian Bolca58, Ivana Holcatova59, Vladimir Janout59, Milica Kontic60, Jolanta Lissowska61, Anush Mukeria62, Simona Ognjanovic63, Tadeusz M. Orlowski64, Ghislaine Scelo9, Beata Swiatkowska65, David Zaridze62, Per Bakke66, Vidar Skaug28, Shanbeh Zienolddiny28, Eric J. Duell67, Lesley M. Butler34, Woon-Puay Koh68,69, Yu-Tang Gao70, Richard Houlston 71, John McLaughlin72, Victoria Stevens73, David C. Nickle74,Ma’en Obeidat 75, Wim Timens 76, Bin Zhu4, Lei Song4, María Soler Artigas77,78, Martin D. Tobin 77,78, Louise V. Wain77,78,

14 NATURE COMMUNICATIONS | (2018) 9:3221 | DOI: 10.1038/s41467-018-05074-y | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-05074-y ARTICLE

Fangyi Gu4, Jinyoung Byun1, Ahsan Kamal1, Dakai Zhu1, Rachel F. Tyndale79,80,81, Wei-Qi Wei82, Stephen Chanock4, Paul Brennan 9 & Christopher I. Amos 1,83

1Biomedical Data Science, Geisel School of Medicine at Dartmouth, Hanover, 03750 NH, USA. 2Department of Molecular Medicine, Laval University, Québec G1V 4G5, Canada. 3Institut universitaire de cardiologie et de pneumologie de Québec, Québec G1V 4G5, Canada. 4Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda 20892 MD, USA. 5Annenberg School of Communication, University of Pennsylvania, , 19104 PA, USA. 6Perelman School of Medicine, University of Pennsylvania, Philadelphia, 19104 PA, USA. 7Lunenfeld-Tanenbaum Research Institute, Sinai Health System and University of Toronto, Toronto M5T 3L9, Canada. 8Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, 77030 TX, USA. 9International Agency for Research on Cancer, World Health Organization, Lyon 69372 CEDEX 08, . 10Department of Environmental Health, Harvard School of Public Health, Boston, 02115 MA, USA. 11Department of Medicine, Massachusetts General Hospital, Boston, 02115 MA, USA. 12Department of Clinical , Herlev and Gentofte Hospital, Copenhagen University Hospital, Copenhagen, Herlev 2730, Denmark. 13Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, 2200 København N, Denmark. 14Copenhagen General Population Study, Herlev and Gentofte Hospital, Ringvej 75, Copenhagen, Herlev 2730, Denmark. 15Epidemiology Program, University of Hawaii Cancer Center, Honolulu, 96813 HI, USA. 16Department of Genetic Epidemiology, University Medical Center, Georg-August-University Göttingen, Göttingen 37073, Germany. 17Department of Thoracic Surgery, Division of Epidemiology, Vanderbilt University Medical Center, Nashville, 37203 TN, USA. 18Department of Epidemiology and Biostatistics, School of Medicine, Case Western Reserve University, Cleveland, 44106 OH, USA. 19Faculty of Medicine, University of Oviedo, Oviedo 33006, Spain. 20Centro de Investigación Biomédica en Red de Epidemiología y Salud Pública, Campus del Cristo s/n, Oviedo 33006, Spain. 21Clalit National Cancer Control Center, Carmel Medical Center, Haifa 34361, Israel. 22Faculty of Medicine, Technion, Haifa 34361, Israel. 23Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, 98109 WA, USA. 24School of Health and Related Research, University of Sheffield, Sheffield S1 4DA, UK. 25Roy Castle Lung Cancer Research Programme, Institute of Translational Medicine, University of Liverpool, Liverpool L69 3BX, UK. 26Radboud University Medical Center, Radboud Institute for Health Sciences, Nijmegen, 6525 EZ, The Netherlands. 27Department of Pharmaceutical Sciences, College of Pharmacy, Washington State University, Spokane 99210-1495 WA, USA. 28National Institute of Occupational Health, 0033Gydas vei 8, 0033, Oslo,, Norway. 29British Columbia Cancer Agency, 675 West 10th Avenue, Vancouver V5Z1L3, Canada. 30Department of Cancer Epidemiology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, 33612 FL, USA. 31Department of Epidemiology, Geisel School of Medicine, 1 Medical Center Drive, Hanover, 03755 NH, USA. 32Department of Epidemiology and Biostatistics, Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, School of Public Health, Nanjing Medical University, 101 Longmian Ave, Nanjing 211166, PR China. 33Department of Preventive Medicine, Seoul National University College of Medicine, 1 Gwanak-ro, Gwanak-gu, Seoul 151 742, Republic of Korea. 34University of Pittsburgh Cancer Institute, Pittsburgh, 15232 PA, USA. 35Department of Preventive Medicine, IRCCS Foundation Ca’Granda Ospedale Maggiore Policlinico, Milan 20133, Italy. 36Department of Clinical Sciences and Community Health, University of Milan, Milan 20133, Italy. 37University Health Network—The Princess Margaret Cancer Centre, 600 University Avenue, Toronto M5G 2C4, Canada. 38Department of Oncology, Herlev and Gentofte Hospital, Copenhagen University Hospital, Copenhagen 2730, Denmark. 39Department of Preventive Medicine, Keck School of Medicine, University of Southern California Norris Comprehensive Cancer Center, Los Angeles, 90033 CA, USA. 40Department of Integrative Bioscience & Biotechnology, Sejong University, Gwangjin-gu, Seoul 05029, Republic of Korea. 41Department of Pathology, Lund University, Lund 222 41, Sweden. 42Faculty of Medicine, Lund University, Lund 22100, Sweden. 43School of Public Health, St Mary’s Campus, Imperial College London, London W2 1PG, UK. 44Hellenic Health Foundation, Athens GR-115 27, Greece. 45Cancer Registry and Histopathology Department, “Civic-M.P. Arezzo” Hospital, ASP, Ragusa 97100, Italy. 46Fred Hutchinson Cancer Research Center, Seattle, 98109-1024 WA, USA. 47Swedish Medical Group, Arnold Pavilion, Suite 200, Seattle, 98104 WA, USA. 48Department of Oncology and Metabolism, University of Sheffield, Sheffield S10 2RX, UK. 49Research Unit of Molecular Epidemiology, Institute of Epidemiology II, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg D-85764, Germany. 50Thoraxklinik at University Hospital Heidelberg, Heidelberg 69126, Germany. 51Translational Lung Research Center Heidelberg (TLRC-H), Heidelberg 69120, Germany. 52Cancer Cluster Salzburg, University of Salzburg, Salzburg 5020, Austria. 53Department of Medical Biosciences, Umeå University, Umeå 901 85, Sweden. 54Department of Radiation Sciences, Umeå University, Umeå 901 85, Sweden. 55Princess Margaret Cancer Centre, Toronto M5G2M9, Canada. 56Markey Cancer Center, University of Kentucky, First Floor, 800 Rose Street, Lexington, 40508 KY, USA. 57Department of Thoracic Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, 33612 KY, USA. 58Institute of Pneumology “Marius Nasta”, RO-050159, . 591st Faculty of Medicine, Charles University, Kateřinská 32, Prague 121 08 Praha 2, Czech Republic. 60Clinical Center of Serbia, Clinic for , School of Medicine, University of Belgrade, Belgrade 11000, Serbia. 61Department of Cancer Epidemiology and Prevention, M. Sklodowska-Curie Institute—Oncology Center, Warsaw 02-781, . 62Department of Epidemiology and Prevention, Russian N.N. Blokhin Cancer Research Centre, Moscow 115478, Russian Federation. 63International Organization for Cancer Prevention and Research, Belgrade 11070, Serbia. 64Department of Surgery, National and Lung Diseases Research Institute, Warsaw PL-01-138, Poland. 65Department of Environmental Epidemiology, Nofer Institute of Occupational Medicine, Lodz 91- 348, Poland. 66Department of Clinical Science, University of Bergen, Bergen 5021, Norway. 67Unit of Nutrition and Cancer, Catalan Institute of Oncology (ICO-IDIBELL), Barcelona 08908, Spain. 68Duke–NUS Medical School, Singapore 119077, Singapore. 69Saw Swee Hock School of Public Health, National University of Singapore, Singapore 117549, Singapore. 70Department of Epidemiology, Shanghai Cancer Institute, Shanghai 2200, China. 71The Institute of Cancer Research, London, SW7 3RP England, UK. 72Public Health Ontario, Windsor, Ontario N8W 5K5, Canada. 73American Cancer Society, Inc., Atlanta, 30303 GA, USA. 74Department of Genetics and , Merck Research Laboratories, Boston, 02115-5727 MA, USA. 75Centre for Heart Lung Innovation, St Paul’s Hospital, The University of British Columbia, Vancouver V6Z 1Y6 BC, Canada. 76Department of Pathology and Medical Biology, GRIAC, University of Groningen, University Medical Center Groningen, Groningen NL - 9713 GZ, The Netherlands. 77Genetic Epidemiology Group, Department of Health Sciences, University of Leicester, Leicester LE1 7RH, UK. 78Leicester Respiratory Biomedical Research Unit, National Institute for Health Research (NIHR), Glenfield Hospital, Leicester LE3 9QP, UK. 79Department of and , University of Toronto, Toronto M5S 1A8 ON, Canada. 80Department of Psychiatry, University of Toronto, Toronto M5T 1R8 ON, Canada. 81Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, M6J 1H4 ON, Canada. 82Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN 37235, USA. 83The Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, 77030 TX, USA

NATURE COMMUNICATIONS | (2018) 9:3221 | DOI: 10.1038/s41467-018-05074-y | www.nature.com/naturecommunications 15