Exome Array Analysis of Chronic Obstructive Pulmonary Disease


Citation Hobbs, Brian. 2015. Exome Array Analysis of Chronic Obstructive Pulmonary Disease. Master's thesis, Harvard Medical School.

Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:22837731

Terms of Use: This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

Abstract

Background:

Chronic obstructive pulmonary disease (COPD) susceptibility is in part related to genetic variants. Most genetic studies have focused on common variation, but rare coding variants are known to affect COPD susceptibility. We hypothesized that an exome array analysis would identify single non-synonymous variants and gene-based aggregates of non-synonymous variants associated with COPD.

Methods:

We used the Illumina HumanExome array to genotype individuals in six COPD cohorts: Caucasian subjects from the family-based Boston Early-Onset COPD Study (BEOCOPD) and International COPD Genetics Network (ICGN), and the case-control COPDGene (non-Hispanic whites and African-Americans) and Transcontinental COPD Genetics Study (Poland and Korea).

Cases were defined as having COPD of GOLD grade 2 or higher. Controls had normal lung function; the vast majority were current or former smokers. We tested single non-synonymous, stop, and splice variants with a minor allele frequency (MAF) > 0.5% under an additive model using logistic regression, and combined results in a fixed-effects meta-analysis. Gene-based testing was performed on non-synonymous, stop, and splice variants with MAF < 5% using SKAT-O, with meta-analysis in the MetaSKAT package in R. We performed meta-analyses for all subjects and separately by ethnicity. All analyses were adjusted for age, sex, pack-years of smoking, and ancestry-related principal components. The exome-wide significance thresholds were 2.3x10^-6 for single variant testing and 4.1x10^-6 for gene-based testing.

Results:

Across the six cohorts, we included 5971 controls and 6054 cases in our analysis. We identified an exome-wide significant non-synonymous variant, rs16969968 (p=1.4x10^-13) in CHRNA5, at a locus previously described in association with COPD susceptibility and nicotine addiction. No additional variants or genes met exome-wide significance. Additional top association results included variants in MMP3, AGER, and SERPINA1. A non-synonymous variant in IL27 (p=5.6x10^-6) fell just short of exome-wide significance. In gene-based testing, the top gene was CYB5RL (p=3.9x10^-5). We also identified several non-synonymous SNPs at previously described GWAS loci for COPD or lung function, including GPR126, RIN3, MECOM, and TNS1.

Conclusions:

We have performed an exome array analysis for COPD in multiple populations. Although no novel variants or genes were identified at exome-wide significance, our analysis confirms associations at previously discovered loci and identifies coding variants for potential future study. Additionally, we identified a variant in IL27 just short of the significance threshold as a potential candidate for COPD pathogenesis.

Introduction

Chronic obstructive pulmonary disease (COPD) is a highly morbid condition, estimated to affect approximately 12.7 million persons in the United States. It led to 142,943 deaths in 2011, making COPD the third leading cause of death behind heart disease and all cancers.(1, 2)

COPD is a complex disease whose development depends on both environmental and genetic risk factors. The genetic contributions to COPD were first illustrated in family-based and linkage studies and have been further elucidated by several genome-wide association studies (GWASs) implicating COPD risk loci at IREB2, CHRNA3/5, HHIP, FAM13A, RIN3, MMP12, and TGFB2, as well as variants in chromosomal region 19q13.(3-7) Although multiple COPD susceptibility loci have been identified through GWASs, their effect sizes are modest (odds ratios typically less than 1.5 when comparing cases and controls). In 2013, Zhou et al. estimated that the proportion of phenotypic variability in COPD attributable to genetic variation (heritability) was 38%, and that the COPD risk loci known at that time (IREB2, CHRNA3/5, HHIP, FAM13A, and 19q13) explained only 5-10% of the observed phenotypic variability.(8)

Therefore, a large portion of the heritability of COPD remains unexplained. Undiscovered uncommon (minor allele frequency (MAF) 1 to 5%) and rare (MAF less than 1%) genetic variants, which are not captured by GWAS, are one of several possible sources of this “missing heritability”.(9)

In addition to increasing the proportion of explained heritability in COPD, analyzing uncommon and rare coding variation may reveal novel pathobiology underlying the development of COPD. Rare variants are important in COPD susceptibility, as illustrated by alpha-1 antitrypsin deficiency, a genetic disorder in which rare variants in the serine protease inhibitor gene SERPINA1 greatly increase COPD susceptibility.(10) In cardiovascular genomics research, Cohen et al. demonstrated that uncommon and rare genetic variation in PCSK9 led to large reductions in plasma low-density lipoprotein (LDL) levels along with a reduced risk of coronary heart disease,(11, 12) findings that subsequently led to novel therapies. These studies and others illustrate the potential of studying rare coding variation in other complex diseases, such as COPD.

Traditional GWAS genotyping arrays assay a large proportion of genetic variants outside the coding genome, and thus many GWAS associations in complex disease have yet to be functionally characterized. Restricting analysis of uncommon and rare variation to the coding regions of the genome (the exome) allows for more direct biological and functional interpretation of association study results. Exome genotyping arrays were developed specifically to query uncommon and rare genetic variation in the coding genome. These exome arrays contain approximately 250,000 non-synonymous (i.e., protein-altering) SNP probes.(13)

For complex disease phenotypes such as insulin secretion, type 2 diabetes risk reduction, blood lipid levels, and fasting glucose levels, exome array technology has already been used to extend knowledge of these phenotypes.(14-17) We applied an exome genotyping array to identify coding variant associations with COPD susceptibility.

Methods

Cohorts and COPD case status definition:

The Boston Early-Onset COPD (BEOCOPD) study (ClinicalTrials.gov: NCT01177618) is an extended pedigree study ascertained through probands under 53 years of age with severe COPD (defined as forced expiratory volume in one second (FEV1) < 40% predicted) and without severe alpha-1 antitrypsin deficiency.(18) The International COPD Genetics Network (ICGN) study recruited probands with relatively early-onset COPD (FEV1 < 60% predicted and ratio of FEV1 to forced vital capacity (FEV1/FVC) < 90% predicted, between ages 45 and 65) and then enrolled siblings and parents through the proband.(19, 20) We limited analysis of the BEOCOPD and ICGN studies to Caucasians. The Transcontinental COPD Genetics Study (TCGS) included two case-control studies, based in Poland and in Korea. Both studies recruited individuals between 40 and 80 years of age with at least 10 pack-years of cigarette smoking; cases had severe COPD (FEV1 < 50% predicted) and controls had normal spirometry. Subjects with other lung diseases were excluded; more complete inclusion and exclusion criteria have been described previously.(21) TCGS-Poland enrolled non-Hispanic white individuals, and TCGS-Korea enrolled Korean individuals. The Genetic Epidemiology of COPD (COPDGene) study (ClinicalTrials.gov: NCT00608764) enrolled approximately 10,200 self-reported non-Hispanic whites and African Americans between the ages of 45 and 80 years with a minimum 10 pack-year smoking history. Full details of the COPDGene study design, including inclusion and exclusion criteria, have been described previously and are available online at www.COPDGene.org.(22) All COPD cohorts excluded persons with known alpha-1 antitrypsin deficiency. IRB approval was obtained for all analysis cohorts. For all analyses, persons were classified as unaffected (controls) if they had FEV1/FVC ≥ 0.7 and FEV1 ≥ 80% predicted, and as affected (cases) if they had FEV1/FVC < 0.7 and FEV1 < 80% predicted.
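The spirometric definitions above amount to a simple rule applied to each subject. The following is a minimal Python sketch of that rule, for illustration only. The function name and arguments are hypothetical. Subjects who meet neither definition are returned as None, which mirrors their later removal from analysis.

```python
from typing import Optional


def copd_status(fev1_fvc: float, fev1_pct_pred: float) -> Optional[str]:
    """Classify one subject using the case/control definitions in the Methods.

    fev1_fvc: FEV1/FVC ratio (e.g. 0.65); fev1_pct_pred: FEV1 % predicted.
    Returns "control", "case", or None when the subject meets neither
    definition (e.g. a normal ratio with FEV1 < 80% predicted).
    """
    if fev1_fvc >= 0.7 and fev1_pct_pred >= 80:
        return "control"
    if fev1_fvc < 0.7 and fev1_pct_pred < 80:
        return "case"
    return None  # neither unaffected nor affected: excluded from analysis


if __name__ == "__main__":
    print(copd_status(0.78, 95.0))  # control
    print(copd_status(0.55, 40.0))  # case
    print(copd_status(0.65, 85.0))  # None: obstruction with FEV1 >= 80%
```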

Genotyping:

Using the Illumina HumanExome v1.2 array, 4900 Caucasian individuals (BEOCOPD = 1198, TCGS-Poland = 659, and ICGN = 3043) and 458 individuals from the TCGS-Korea cohort were genotyped in a single batch at Illumina, following quality control guidelines as outlined by the CHARGE Consortium.(23) The COPDGene non-Hispanic white (NHW) and African American (AA) cohorts were genotyped in two separate batches. First, 2470 COPDGene NHW individuals, chosen because they either had severe COPD or were resistant smoking controls, were genotyped on the Illumina HumanExome v1.1 array at the University of Washington. Second, the remainder of the COPDGene study population, comprising 7967 NHW and AA individuals (including intentional duplicates and genotyping controls), was genotyped on the Illumina HumanExome v1.2 array at the Center for Inherited Disease Research (CIDR) at Johns Hopkins University.

Subject level quality control:

Kinship errors were systematically evaluated using the KING relationship inference software package.(24) Errors in reported relationships in the BEOCOPD extended pedigree study and the ICGN sibling study were manually reviewed by using the kinship2 package (25) in R (version 3.1.1, http://www.R-project.org/) (26) to draw all pedigrees containing errors. For BEOCOPD, subjects with suspected non-paternity were retained in the analysis as long as there was corroborating evidence of non-paternity across all first-degree relationships. In the BEOCOPD study, errors in third-degree or more distant relationships were ignored as long as first- and second-degree relationships were as expected. For the ICGN sibship study, subjects with suspected non-paternity were retained in the study as half-siblings. For all other kinship errors in the BEOCOPD and ICGN cohorts, subjects were removed from analysis.
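Relationship checks of this kind compare an estimated kinship coefficient against the reported relationship. The sketch below is an illustration and is not the KING program itself. It implements the KING-robust kinship estimator for one pair of individuals and sorts the estimate into relationship degrees. It assumes genotypes are coded 0/1/2, with None for a missing call. Both function names are hypothetical.

```python
from typing import Optional, Sequence

Genotype = Optional[int]  # 0, 1 or 2 copies of the counted allele; None = missing


def king_robust_kinship(g_i: Sequence[Genotype], g_j: Sequence[Genotype]) -> float:
    """KING-robust kinship estimate for one pair of individuals.

    phi = (N_het/het - 2 * N_opposite_homozygotes) / (N_het_i + N_het_j),
    counted over SNPs called in both individuals.
    """
    both_het = opposite_hom = het_i = het_j = 0
    for a, b in zip(g_i, g_j):
        if a is None or b is None:
            continue  # only SNPs genotyped in both people contribute
        het_i += a == 1
        het_j += b == 1
        both_het += a == 1 and b == 1
        opposite_hom += {a, b} == {0, 2}  # AA in one person, aa in the other
    if het_i + het_j == 0:
        raise ValueError("no heterozygous calls; kinship is undefined")
    return (both_het - 2 * opposite_hom) / (het_i + het_j)


def inferred_degree(phi: float) -> str:
    """Map a kinship coefficient to a relationship degree.

    Boundaries 2**-1.5, 2**-2.5, 2**-3.5 and 2**-4.5 lie halfway (on a log2
    scale) between the expected kinship of adjacent relationship degrees.
    """
    if phi > 2 ** -1.5:
        return "duplicate/MZ twin"
    if phi > 2 ** -2.5:
        return "first-degree"
    if phi > 2 ** -3.5:
        return "second-degree"
    if phi > 2 ** -4.5:
        return "third-degree"
    return "unrelated"
```

For example, a pair reported as parent and offspring whose estimate falls in the "unrelated" bin would be flagged for the manual pedigree review described above.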

For all analysis cohorts, subject genotyping rate, X-chromosome heterozygosity (sex check), and excess homozygosity (inbreeding) were assessed in PLINK v1.07 and v1.9.(27-30) For the BEOCOPD extended pedigree study, Mendel errors were additionally calculated in PLINK. For the two COPDGene cohorts, population outliers were evaluated by principal component analysis (PCA) in the smartpca software package.(31, 32) Population outliers in the BEOCOPD, ICGN, TCGS-Korea, and TCGS-Poland populations were assessed using TRACE,(33-35) a method of evaluating population substructure that is robust to genetic relatedness between samples, with the HapMap populations (36) as reference.
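The excess-homozygosity check and the sex check both rest on a method-of-moments inbreeding coefficient, F = (O - E)/(N - E). Here O and E are the observed and expected numbers of homozygous genotypes over N informative SNPs. The sketch below computes F in pure Python under two simplifying assumptions:

- Expected homozygosity uses 1 - 2pq, without a small-sample correction.
- The sex-check cut-offs of 0.2 and 0.8 are taken to match PLINK's defaults for `--check-sex`.

The function names are hypothetical.

```python
from typing import Optional, Sequence


def inbreeding_f(genotypes: Sequence[Optional[int]], freqs: Sequence[float]) -> float:
    """Method-of-moments F = (observed hom - expected hom) / (N - expected hom).

    genotypes: 0/1/2 calls for one individual (None = missing).
    freqs: frequency of the counted allele at each SNP.
    """
    obs_hom = 0
    exp_hom = 0.0
    n = 0
    for g, p in zip(genotypes, freqs):
        if g is None or p <= 0.0 or p >= 1.0:
            continue  # skip missing calls and monomorphic SNPs
        n += 1
        obs_hom += g != 1
        exp_hom += 1.0 - 2.0 * p * (1.0 - p)  # HWE expected homozygosity
    if n == 0:
        raise ValueError("no informative SNPs")
    return (obs_hom - exp_hom) / (n - exp_hom)


def sex_check(f_x: float, low: float = 0.2, high: float = 0.8) -> str:
    """Infer sex from F computed on X-chromosome SNPs (males are hemizygous)."""
    if f_x > high:
        return "male"
    if f_x < low:
        return "female"
    return "ambiguous"
```

On X-chromosome SNPs, males should show F near 1 and females F near 0. On the autosomes, a markedly positive F flags excess homozygosity.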

Marker level quality control:

To ensure common marker names and common reference alleles for all variants across our cohorts, we used the chromosome and position of each marker to map variants to the dbSNP 141 (37) variant call format (VCF) file, retrieving rs numbers as well as the appropriate reference and alternate alleles for each biallelic substitution variant. For variants not documented in dbSNP, we used the Exome Aggregation Consortium (ExAC) database release 0.2 (38) to identify reference alleles. For variants found in neither dbSNP 141 nor ExAC, names were assigned using chromosome and position, and reference alleles were set according to the hg19 human reference genome. For all cohorts, PLINK was used to remove markers with call rate < 95%, markers poorly concordant across intentional duplicate subjects or across duplicate markers, and markers out of Hardy-Weinberg equilibrium (p < 1x10^-8) in unaffected individuals in any single cohort. For the COPDGene cohorts, where two exome array platforms were used, minor allele frequency differences between the two platforms were assessed using Fisher’s exact test, and markers with p < 1x10^-4 for allele frequency difference were removed from the analysis.
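Two exact tests drive the marker filters above:

- a Hardy-Weinberg equilibrium test in unaffected individuals;
- Fisher's exact test comparing allele counts between the HumanExome v1.1 and v1.2 platforms.

The pure-Python sketch below shows what each test computes. It is not PLINK's implementation, and the function names are hypothetical. The HWE test follows the exact-test definition of Wigginton et al. (2005). It sums the probabilities of every heterozygote count that is no more likely than the observed count, given the observed allele counts.

```python
import math


def _log_factorial(n: int) -> float:
    return math.lgamma(n + 1)


def hwe_exact_p(n_hom1: int, n_het: int, n_hom2: int) -> float:
    """Two-sided exact test of Hardy-Weinberg equilibrium for one SNP."""
    n = n_hom1 + n_het + n_hom2
    n_a = 2 * n_hom2 + n_het        # copies of allele 2
    n_b = 2 * n - n_a               # copies of allele 1

    def log_prob(k: int) -> float:
        # Probability of k heterozygotes given n individuals and n_a copies
        hom_a = (n_a - k) // 2
        hom_b = n - k - hom_a
        return (_log_factorial(n) - _log_factorial(hom_a) - _log_factorial(k)
                - _log_factorial(hom_b) + k * math.log(2)
                + _log_factorial(n_a) + _log_factorial(n_b) - _log_factorial(2 * n))

    # Heterozygote counts share the parity of n_a and cannot exceed either allele count
    probs = [math.exp(log_prob(k)) for k in range(n_a % 2, min(n_a, n_b) + 1, 2)]
    p_obs = math.exp(log_prob(n_het))
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = math.comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell
        return math.comb(row1, x) * math.comb(row2, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    return min(1.0, sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7)))
```

Under the thresholds stated above, a marker would be removed if `hwe_exact_p` on control genotype counts fell below 1e-8. In COPDGene, a marker would also be removed if `fisher_exact_2x2` on each platform's minor and major allele counts fell below 1e-4.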

Marker annotation:

We used GENCODE (39) version 14, as implemented in EPACTS v3.2.6,(40) to annotate all markers. All exonic markers were additionally classified as synonymous, non-synonymous, stop, or splice.

Single variant testing:

All single variant testing was performed on non-synonymous, stop, and splice site exome array variants with cohort-based minor allele frequency (MAF) > 0.5%. For the BEOCOPD and ICGN family-based studies, only one randomly chosen individual per pedigree was included in the MAF calculation. In all cohorts, analyses were adjusted for age, sex, pack-years of smoking, and ancestry-based principal components. We combined the BEOCOPD and ICGN family-based cohorts, including a covariate for study, and performed a case-control analysis using generalized estimating equations (GEE) as implemented in the geepack.lgst.batch function in GWAF version 2.2.(26, 41) Single variant COPD association analyses for the TCGS-Korea, TCGS-Poland, COPDGene AA, and COPDGene NHW cohorts were performed under an additive genetic model using logistic regression with covariates in PLINK v1.9.(29, 30) For the COPDGene NHW cohort, exome array platform (HumanExome v1.1 vs. v1.2) was included as an additional covariate. Following all cohort-level COPD association analyses, results were combined in a fixed-effects meta-analysis using METAL (version 2011-03-25).(42, 43)
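METAL supports both sample-size-weighted and inverse-variance (standard error) weighted schemes, and the text does not state which was used. Assuming the inverse-variance scheme, the fixed-effects combination reduces to the sketch below. The sketch also builds the per-cohort direction string of the kind shown in Tables 3 and 5, using "?" where a variant was not tested. The function name is hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def ivw_meta(betas: Sequence[Optional[float]],
             ses: Sequence[Optional[float]]) -> Tuple[float, float, float, str]:
    """Fixed-effects, inverse-variance weighted meta-analysis of one variant.

    betas/ses hold each cohort's log-odds ratio and standard error in a fixed
    cohort order; None marks a cohort where the variant was not tested.
    Returns (pooled beta, pooled SE, two-sided p value, direction string).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    direction: List[str] = []
    for beta, se in zip(betas, ses):
        if beta is None or se is None:
            direction.append("?")
            continue
        weight = 1.0 / (se * se)          # inverse of the sampling variance
        weighted_sum += weight * beta
        total_weight += weight
        direction.append("+" if beta > 0 else "-" if beta < 0 else "0")
    if total_weight == 0.0:
        raise ValueError("no cohort contributed an estimate")
    pooled_beta = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return pooled_beta, pooled_se, p, "".join(direction)
```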

Gene-based testing:

In each analysis cohort (BEOCOPD and ICGN, TCGS-Korea, TCGS-Poland, COPDGene AA, and COPDGene NHW), non-synonymous, splice, and stop variants were collapsed into gene sets in EPACTS according to GENCODE version 14 gene annotations. Gene sets within each cohort were limited to these variants with a cohort-specific MAF < 5% and a cohort-specific minimum minor allele count (MAC) of 5. We used SKAT-O,(44) the optimal combination of the burden test and the sequence kernel association test (SKAT),(45) for all gene-based COPD association testing in a meta-analysis framework with MetaSKAT version 0.40.(46) The BEOCOPD and ICGN family-based cohorts were combined for analysis, and a logistic mixed model was fit with the GMMAT version 0.5 package (47, 48) to calculate the null model parameters and residuals needed as input to MetaSKAT. For all other analysis cohorts, the null model parameters and residuals for MetaSKAT were calculated using the SKAT_Null_Model function in SKAT version 1.0.1.(45, 49) Summary statistic files from all analysis cohorts were combined in a meta-analysis using MetaSKAT, assuming a homogeneous effect of SNPs across study cohorts.
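SKAT-O itself is computed by MetaSKAT and is beyond a short example, but the collapsing step that defines its input can be sketched. The Python below groups qualifying variants by gene and keeps those with cohort-specific MAF < 5%. The text does not make clear whether the minimum MAC of 5 applies per variant or per gene set. We read the footnote to Table 2 as a gene-set-level rule, and this sketch assumes that reading. The `Variant` type and the annotation labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple

# Hypothetical annotation labels for the variant classes collapsed into gene sets
QUALIFYING = {"nonsynonymous", "stop", "splice"}


class Variant(NamedTuple):
    variant_id: str
    gene: str
    annotation: str
    maf: float  # cohort-specific minor allele frequency
    mac: int    # cohort-specific minor allele count


def build_gene_sets(variants: Iterable[Variant],
                    max_maf: float = 0.05,
                    min_mac: int = 5) -> Dict[str, List[str]]:
    """Collapse qualifying rare variants into per-gene sets for one cohort."""
    by_gene: Dict[str, List[Variant]] = defaultdict(list)
    for v in variants:
        if v.annotation in QUALIFYING and 0.0 < v.maf < max_maf:
            by_gene[v.gene].append(v)
    # Assumed rule: the gene set's summed minor allele count must reach min_mac
    return {
        gene: [v.variant_id for v in members]
        for gene, members in by_gene.items()
        if sum(v.mac for v in members) >= min_mac
    }
```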

For both single-variant and gene-based meta-analyses, we performed an analysis across all cohorts and a secondary analysis restricted to non-Hispanic whites. Quantile-quantile (QQ) plots were generated for all individual analyses and meta-analyses and, where applicable, examined by allele frequency using EPACTS to ensure there was no inflation in any group of variants. We adjusted for multiple testing using a Bonferroni correction, counting only variants or genes present in at least 3 of the 5 meta-analysis cohorts.
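QQ plots are often summarized numerically by the genomic inflation factor λ. This is the median observed 1-df chi-square statistic divided by its expected median under the null, which is about 0.455. The thesis reports QQ plots rather than λ, so the sketch below is an illustrative addition and not part of the described pipeline. Examining inflation by allele frequency would amount to calling it on each frequency bin separately. The function name is hypothetical.

```python
import statistics
from statistics import NormalDist
from typing import Iterable

_STD_NORMAL = NormalDist()
# Median of a chi-square(1) distribution: (Phi^-1(0.75))^2, about 0.4549
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(pvalues: Iterable[float]) -> float:
    """Genomic inflation factor (lambda) from two-sided p values."""
    # Convert each p value back to a 1-df chi-square statistic: z = Phi^-1(p/2)
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in pvalues if 0.0 < p <= 1.0]
    if not chi2:
        raise ValueError("no valid p values")
    return statistics.median(chi2) / CHI2_1DF_MEDIAN
```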

Candidate gene evaluation:

We evaluated the association statistics for a list of 119 genes identified from COPD and lung function GWAS loci, as well as genes implicated in Mendelian diseases leading to syndromes that include COPD (subsequently referred to as ‘candidate genes’).(7, 50, 51) The full list of evaluated candidate genes can be found in supplemental Table S1. For candidate gene investigations, we considered p < 0.05 in the single variant and gene-based meta-analyses to indicate nominal statistical significance for COPD association.

Results

Participant details:

After subject quality control measures, removal of subjects who were neither unaffected with COPD (controls) nor affected with COPD (cases), and removal of subjects with missing covariate data, a total of 5971 controls and 6054 cases were available for analysis. Baseline characteristics are shown in Table 1.

Table 1. Cohort compositions and demographics, separated by affection status.

                    BEOCOPD            ICGN               TCGS-Korea         TCGS-Poland        COPDGene AA        COPDGene NHW
                    (N pedigrees=199)  (N pedigrees=1097)
Controls
N                   550                695                219                307                1754               2446
Male %              41.6               48.2               96.8               67.4               58.3               49.2
Age                 39.1 (26.3,51.3)   54.9 (48.1,60.7)   53 (46,59)         58.3 (54.3,62.9)   51.8 (48.2,55.7)   59.4 (52.2,65.9)
Pack-years smoking  1.55 (0,15.7)      25.1 (15.6,38.8)   25.5 (16.5,34)     32.2 (22.9,41.1)   32.7 (21.1,43.8)   35 (23,47)
FEV1 % predicted    94.5 (86.9,103)    97.5 (88.6,108)    93.6 (88.2,100)    102 (92.5,111)     96.6 (88.9,106)    95.5 (88.1,104)
Cases
N                   366                1736               149                304                796                2703
Male %              39.9               58.9               99.3               70.1               55.5               55.6
Age                 50.8 (46.4,59)     59.4 (54.8,63.6)   69 (65,74)         62.2 (57.8,67.7)   58.2 (52.4,64.8)   65.1 (58.9,71)
Pack-years smoking  37.5 (25.4,54)     45 (32.8,64.8)     40 (31,52)         40.3 (30,52.8)     37.8 (25.2,51.6)   49.8 (38,70.5)
FEV1 % predicted    29.2 (18.5,51.8)   39.2 (26.8,52.7)   33.2 (27.4,40.1)   28.7 (22.3,35.4)   54 (39.3,67.1)     50.6 (35.2,65.8)

Cohort member characteristics are reported separately for cases and controls in each cohort. For age, pack-years of smoking, and FEV1 % predicted, medians and interquartile ranges are reported.

Marker filtering for analysis and calculation of statistical significance level:

Table 2 illustrates the number of variants and gene sets considered in both single variant testing and gene-based testing with SKAT-O.

Table 2. Number of variants and gene sets analyzed in single variant and gene-based COPD affection status association analysis.

                                                  BEOCOPD+ICGN  TCGS-Korea    TCGS-Poland   COPDGene AA   COPDGene NHW
Variants in single variant testing*               41,191        35,097        41,125        58,577        39,449
Non-synonymous variants in single variant testing 25,017        19,773        25,013        41,785        24,078
Gene sets in SKAT-O testing^                      11,621        6,221         9,399         14,559        14,677
Rare variants in SKAT-O testing#                  106,223       72,835        94,941        189,254       193,375

BEOCOPD+ICGN: BEOCOPD and ICGN combined. *Single variant testing included variants with minor allele frequency greater than 0.5%. ^Gene sets in SKAT-O testing had a minimum minor allele count of 5 and contained only variants with cohort-level minor allele frequency less than 5%. #The total number of non-synonymous variants meeting gene set requirements for SKAT-O testing.

The number of non-synonymous exonic variants present in at least 3 of the 5 analysis cohorts in the meta-analysis was 21980, giving an exome-wide single variant significance threshold of 2.3x10^-6 (0.05/21980). For the gene-based SKAT-O meta-analysis, 12133 genes give a significance threshold of 4.1x10^-6 (0.05/12133).
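The thresholds above follow from a Bonferroni correction at α = 0.05 over the tests that pass the 3-of-5-cohorts presence rule. Below is a minimal sketch of that calculation. It assumes a hypothetical input format: a mapping from each variant or gene to the set of cohorts in which it was tested.

```python
from typing import Hashable, Mapping, Set


def bonferroni_threshold(presence: Mapping[Hashable, Set[str]],
                         min_cohorts: int = 3,
                         alpha: float = 0.05) -> float:
    """Bonferroni threshold over tests present in >= min_cohorts cohorts.

    presence maps each variant or gene ID to the set of cohorts in which
    it was tested (a hypothetical input format).
    """
    n_tests = sum(1 for cohorts in presence.values() if len(cohorts) >= min_cohorts)
    if n_tests == 0:
        raise ValueError("no tests meet the cohort-presence requirement")
    return alpha / n_tests


if __name__ == "__main__":
    # Counts reported in the text: 21980 variants and 12133 gene sets
    print(f"{0.05 / 21980:.2g}")  # 2.3e-06
    print(f"{0.05 / 12133:.2g}")  # 4.1e-06
```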

Single variant meta-analysis for association with COPD affection status:

In our COPD affection status single variant meta-analysis of 61176 non-synonymous variants across all five cohorts, we identified a single exome-wide significant variant, rs16969968 in CHRNA5 (meta-analysis p=1.4x10^-13). The CHRNA5 locus is known to be associated with COPD susceptibility and nicotine addiction.(3, 5, 7) No novel variants reached exome-wide significance in our meta-analysis; however, rs181206 in IL27 fell just short of the exome-wide significance threshold (meta-analysis p=5.6x10^-6). Top results from the meta-analysis across all five cohorts are shown in Table 3. Of the top 10 results, the majority were common variants (MAF > 5%); however, the variants in FAM208B, AGER, and CRAMP1L were uncommon, with MAF between 1% and 5%. No variants with MAF < 1% were among the top single variant associations with COPD in our meta-analysis. The 12th most significant non-synonymous variant association (p=1.8x10^-4) was rs28929474, the SERPINA1 Z allele, which is the most common cause of the Mendelian syndrome alpha-1 antitrypsin (AAT) deficiency.(10)

Although persons with known AAT deficiency were excluded from all study cohorts, three individuals in the TCGS-Poland cohort were found to be Z allele homozygotes. Removing these three subjects weakened but did not eliminate the association for this variant (p changed from 0.009 to 0.03).

Table 3. Top 10 results from meta-analysis across all cohorts of non-synonymous variant associations with COPD affection status.

Chr  Position   SNP         Gene      Effect allele freq  Beta    SE      Direction*^  p value
15   78882925   rs16969968  CHRNA5    0.36                0.26    0.035   +++++        1.4E-13
16   28513403   rs181206    IL27      0.69                -0.16   0.036   -----        5.6E-06
6    109767931  rs59056467  MICAL1    0.32                0.14    0.033   +-+++        1.9E-05
6    109885475  rs10499052  AKD1      0.27                0.15    0.036   +++++        2.1E-05
11   102713620  rs679620    MMP3      0.48                0.13    0.030   +++++        3.0E-05
10   5803368    rs41290259  FAM208B   0.98                0.62    0.15    +??++        3.7E-05
6    32151443   rs2070600   AGER      0.043               -0.33   0.080   --++-        3.7E-05
16   1718110    rs61746451  CRAMP1L   0.011               -0.57   0.15    -?+--        8.8E-05
14   33293122   rs1051695   AKAP6     0.43                0.13    0.032   +++++        9.0E-05
11   126162843  rs8177374   TIRAP     0.15                0.18    0.045   +--++        9.2E-05

Effect allele freq is the overall effect allele frequency. Only variants observed in at least 3 of the 5 analysis cohorts are reported. *The direction of effect is reported for each analysis cohort; the symbols from left to right represent 1) BEOCOPD and ICGN, 2) TCGS-Korea, 3) TCGS-Poland, 4) COPDGene AA, 5) COPDGene NHW. ^The symbol for each cohort is "+" for variant association with increased COPD susceptibility, "-" for association with decreased COPD susceptibility, and "?" if the variant is not present in the cohort.

To assess non-synonymous variant associations with COPD in a more racially homogeneous set of individuals, we performed a meta-analysis limited to the non-Hispanic white cohorts (BEOCOPD and ICGN combined, TCGS - Poland, and COPDGene NHW). The results were overall similar, and no individual variants reached exome-wide significance (Table S2).

Gene-based meta-analysis for gene association with COPD affection status:

In gene-based association analysis, 16006 gene sets were evaluated, of which 12133 were present in at least three of the five cohorts. Our top association, in the gene CYB5RL, did not reach exome-wide significance (p=3.9x10^-5). The 10 most significant results from the SKAT-O meta-analysis are shown in Table 4.

Table 4. Top 10 results from meta-analysis across all cohorts of SKAT-O gene-based analysis of association with COPD affection status.

Gene set       Meta-analysis p  BEOCOPD+ICGN p   TCGS-Korea p     TCGS-Poland p    COPDGene AA p    COPDGene NHW p
CYB5RL         3.9E-05          9.9E-03          0.38             0.42             0.088            7.1E-03
PAICS          7.6E-05          0.12             ---              ---              0.050            2.6E-04
GON4L          1.8E-04          0.81             0.86             0.40             5.1E-04          0.10
RP11-446E24.4  2.2E-04          8.6E-03          ---              0.33             1                0.013
ALOX12         3.2E-04          9.5E-03          ---              0.19             0.012            0.24
AGER           3.3E-04          0.58             0.35             0.48             0.36             2.8E-05
MFNG           4.7E-04          0.068            ---              0.13             0.012            0.10
ZNF560         5.7E-04          0.019            ---              0.32             0.014            0.44
ALDH16A1       6.2E-04          0.054            0.30             0.34             0.89             1.2E-03
ZNF434         7.5E-04          0.072            0.11             0.56             7.9E-03          0.67

Top results from the all-cohorts meta-analysis, with individual cohort p values. Only gene sets present in at least 3 of the 5 cohort-level analyses are reported; "---" indicates that the gene set was not present in that cohort. BEOCOPD+ICGN: BEOCOPD and ICGN combined.

We also performed SKAT-O analyses in the three non-Hispanic white cohorts (BEOCOPD and ICGN combined, TCGS-Poland, and COPDGene NHW) to examine gene-based associations with COPD in a more racially homogeneous set of individuals. Results were similar, and no genes reached our pre-determined threshold of exome-wide significance (Table S3).

Candidate gene evaluation:

In the meta-analysis of single non-synonymous variants, 15 variants in 12 candidate genes had an association p value < 0.05 (see Table 5). The top four variants in candidate genes were rs16969968 in CHRNA5, rs679620 in MMP3, rs2070600 in AGER, and rs28929474 (the alpha-1 antitrypsin Z allele) in SERPINA1. Two variants in RIN3 and two variants in GPR126 had p values below the threshold for nominal significance in the meta-analysis (see Table 5).

Table 5. Candidate gene search in all cohorts meta-analysis of single non-synonymous variant associations with COPD affection status.

Chr  Position   SNP         Gene      Effect allele freq  Beta    SE      Direction*^  p value
15   78882925   rs16969968  CHRNA5    0.36                0.26    0.035   +++++        1.4E-13
11   102713620  rs679620    MMP3      0.48                0.13    0.030   +++++        3.0E-05
6    32151443   rs2070600   AGER      0.043               -0.33   0.080   --++-        3.7E-05
14   94844947   rs28929474  SERPINA1  0.021               0.45    0.12    +?+?+        1.8E-04
6    142691549  rs11155242  GPR126    0.80                0.13    0.038   +++++        7.7E-04
14   93118038   rs3829947   RIN3      0.47                0.094   0.030   +-+++        2.0E-03
6    142688969  rs17280293  GPR126    0.97                0.29    0.10    +++?+        4.0E-03
14   93119232   rs12434929  RIN3      0.11                0.24    0.092   ??+++        8.1E-03
15   78880752   rs2229961   CHRNA5    0.02                0.32    0.13    +?+?+        0.014
15   78825917   rs3885951   HYKK#     0.88                -0.13   0.05    -?---        0.015
2    189864023  rs41263773  COL3A1    0.01                0.46    0.190   +?-?+        0.017
3    169098992  rs7622799   MECOM     0.12                -0.100  0.047   -+---        0.029
2    218674697  rs918949    TNS1      0.630               -0.07   0.032   -+-+-        0.031
6    28264717   rs3800326   PGBD1     0.036               -0.19   0.092   -----        0.041
6    32634339   rs36222416  HLA-DQB1  0.96                0.56    0.28    +++??        0.043

Effect allele freq is the overall effect allele frequency. Only variants observed in at least 3 of the 5 analysis cohorts are reported. *The direction of effect is reported for each analysis cohort; the symbols from left to right represent 1) BEOCOPD and ICGN, 2) TCGS-Korea, 3) TCGS-Poland, 4) COPDGene AA, 5) COPDGene NHW. ^The symbol for each cohort is "+" for variant association with increased COPD susceptibility, "-" for association with decreased COPD susceptibility, and "?" if the variant is not present in the analysis cohort. #Previously known as AGPHD1.

Since most of our candidate genes were selected from loci significant in prior COPD and lung function GWASs, we examined linkage disequilibrium (LD) between our variants and the previously reported GWAS SNPs (Table 6). Non-synonymous SNPs in CHRNA5 (rs16969968), AGER, GPR126 (rs11155242), and TNS1 are all in strong LD with the candidate gene GWAS variants, with both the squared correlation coefficient (r2) and the normalized coefficient of LD (D’) greater than 0.8. In AGER, the non-synonymous variant is the same SNP reported as significant in lung function GWAS. An additional set of non-synonymous variants in CHRNA5 (rs2229961), MMP3, RIN3, GPR126 (rs17280293), HYKK, MECOM, PGBD1, and HLA-DQB1 had D’ greater than 0.7 but r2 less than or equal to 0.2.
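The contrast between r2 and D’ in Table 6 follows from their definitions. Consider two SNPs with allele frequencies p_A and p_B and haplotype frequency p_AB. Then D = p_AB - p_A p_B. D’ scales |D| by its maximum possible value given the allele frequencies, while r2 = D^2 / (p_A(1 - p_A) p_B(1 - p_B)). The sketch below uses hypothetical frequencies in its example. It shows how a low-frequency variant carried only on one haplotype of a common SNP can have D’ = 1 yet r2 near 0.05. The function name is hypothetical, and the sketch does not describe how LD was estimated in this study.

```python
from typing import Tuple


def ld_stats(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Return (r2, D') for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B
    at SNP 2; p_a, p_b: marginal frequencies of A and B.
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return r2, d_prime


if __name__ == "__main__":
    # Hypothetical: common allele A (freq 0.5); rare allele B (freq 0.05)
    # occurs only on A-carrying haplotypes
    r2, d_prime = ld_stats(p_ab=0.05, p_a=0.5, p_b=0.05)
    print(round(r2, 3), round(d_prime, 3))  # 0.053 1.0
```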

Table 6. Linkage disequilibrium (LD) structure between candidate non-synonymous variants and significant GWAS loci.

Chr  COPD SNP    Gene      GWAS SNP    Gene      GWAS phenotype                       r2       D'
15   rs16969968  CHRNA5    rs8034191   HYKK*     COPD                                 0.89     0.95
11   rs679620    MMP3      rs626750    MMP12     COPD                                 0.18     0.90
6    rs2070600   AGER      rs2070600   AGER      FEV1/FVC                             1        1
6    rs11155242  GPR126    rs3817928   GPR126    FEV1/FVC                             0.96     0.99
6    rs11155242  GPR126    rs1329705^  GPR126    FEV1 < 65%                           0.97     0.99
14   rs3829947   RIN3      rs754388    RIN3      Severe COPD                          0.16     0.96
6    rs17280293  GPR126    rs3817928   GPR126    FEV1/FVC                             0.11     0.96
6    rs17280293  GPR126    rs1329705^  GPR126    FEV1 < 65%                           0.12     1
14   rs12434929  RIN3      rs754388    RIN3      COPD                                 0.0012   1
15   rs2229961   CHRNA5    rs8034191   HYKK*     COPD                                 0.029    0.95
15   rs3885951   HYKK*     rs8034191   HYKK*     COPD                                 0.20     0.98
3    rs7622799   MECOM     rs1344555   MECOM     FEV1                                 0.016    0.73
2    rs918949    TNS1      rs2571445   TNS1      FEV1                                 0.93     0.97
6    rs3800326   PGBD1     rs6903823   ZKSCAN3   FEV1                                 0.071    0.95
6    rs36222416  HLA-DQB1  rs7764819   TRNAI25   FEV1/FVC, ever smoking interaction   0.00042  1

COPD SNP is the COPD analysis SNP; GWAS SNP is the previously reported GWAS candidate SNP. r2 is the squared correlation coefficient between the COPD analysis SNP and the previous GWAS SNP. D' is the normalized coefficient of linkage disequilibrium between the COPD analysis SNP and the previous GWAS SNP. *In our study, with GENCODE v14 annotations, this gene was named AGPHD1; its current name is HYKK. ^This variant was not genome-wide significant in GWAS (p=3x10^-6).(52)

The candidate gene evaluation of the gene-based SKAT-O meta-analysis across all five analysis cohorts revealed seven candidate genes with meta-analysis p values below the threshold for nominal statistical significance. These genes included AGER, SERPINA1, and GPR126 (see Table 7); MMP15 and TGFB2 narrowly missed nominal significance (0.05 < p < 0.06). We repeated the SKAT-O meta-analysis after removing the AGER GWAS variant (rs2070600) and the SERPINA1 Z allele (rs28929474) to assess whether the association signal in AGER and SERPINA1 persisted. With these top variants removed, the gene-based association p values no longer met the threshold for nominal significance (AGER p=0.13; SERPINA1 p=0.30).

Table 7. Candidate gene search in all cohorts meta-analysis of gene-based SKAT-O association testing of COPD affection status.

Candidate gene Meta-analysis p  BEOCOPD+ICGN p   TCGS-Korea p     TCGS-Poland p    COPDGene AA p    COPDGene NHW p
AGER           3.3E-04          0.58             0.35             0.48             0.36             2.8E-05
SERPINA1       2.0E-03          0.32             0.81             2.9E-03          0.18             0.053
GPR126         3.7E-03          0.38             0.079            0.84             0.83             6.2E-03
LYPLAL1        6.2E-03          0.77             0.086            ---              ---              8.8E-03
ATP13A2        7.0E-03          0.15             ---              0.77             0.030            0.27
STAT6          0.038            0.33             0.49             ---              0.013            0.73
NUDT5          0.039            0.25             ---              ---              0.019            0.36

Top results from the all-cohorts meta-analysis, with individual cohort p values. Only gene sets present in at least 3 of the 5 cohort-level analyses are reported; "---" indicates that the gene set was not present in that cohort. BEOCOPD+ICGN: BEOCOPD and ICGN combined.

Discussion

The genetic risk factors for COPD remain largely unknown. Coding variants are known to predispose to COPD, but most studies to date have focused on common, mostly non-coding variation. Using the common and low-frequency coding variation available on the exome array, we performed single variant and gene-based association analyses of COPD affection status in six study populations comprising 5971 unaffected and 6054 affected persons. No novel variants or genes reached exome-wide statistical significance in either our single variant or gene-based analyses, suggesting that uncommon and rare coding variants (MAF 0.1-5%) are unlikely to explain a significant portion of the “missing heritability” in COPD.

Our top single non-synonymous variant was rs16969968 in CHRNA5, a gene known to be associated with COPD susceptibility. This SNP is in strong LD (r2 0.89, D’ 0.95) with rs8034191, one of the first COPD associations identified through GWAS.(3) Another top association was rs679620 in MMP3, at the same locus recently associated with severe COPD in a GWAS meta-analysis.(7) rs2070600 in AGER is also known to be associated with COPD and emphysema, having reached genome-wide significance in GWASs of FEV1/FVC ratio (53, 54) as well as of the interaction between FEV1/FVC ratio and pack-years of smoking.(55)

Aside from the known COPD associations in CHRNA5, MMP3, and AGER, the non-synonymous variant rs181206 in the interleukin 27 gene (IL27) is an interesting finding in our study, falling just short of exome-wide significance (p=5.6x10^-6). IL27 encodes a subunit of the heterodimeric cytokine IL-27, which interacts with other interleukins to drive rapid expansion of memory CD4(+) T cells.(56) IL27 has been studied previously in COPD. Huang et al. reported an association of two IL27 polymorphisms, c.-964A/G (rs153109) and g.2905T/G (rs17855750), with COPD susceptibility in a Chinese population.(57) Cao et al. reported higher levels of IL27 in the sputum and plasma of COPD patients compared to healthy controls, with a negative correlation between IL27 levels and FEV1 in COPD patients.(58) Serum IL27 levels have recently been associated with COPD exacerbation, and Angata et al. suggested that serum IL27 could serve as a biomarker for COPD exacerbation.(59)

Our gene-based SKAT-O meta-analysis did not identify any gene associated with COPD affection status at exome-wide significance. As with single-variant testing, the gene-based analysis recovered the known association of AGER with COPD, which was likely driven by a single known variant. Our top gene-based association was with NADH-cytochrome b5 reductase-like (CYB5RL, p = 3.9x10-5). Based on sequence similarity, CYB5RL is presumed to function like other NADH-cytochrome b5 reductases, which are involved in fatty acid desaturation and elongation, cholesterol biosynthesis, drug metabolism, and methemoglobin reduction in erythrocytes.(60, 61) Although CYB5RL has not previously been studied in COPD, it is expressed in lung tissue (62, 63) and is present in bronchial respiratory epithelial cells, pulmonary macrophages, and pneumocytes (64, 65). Studies with larger sample sizes may allow confirmation of the association of CYB5RL with COPD affection status.

Analysis of non-synonymous variants near GWAS loci may help identify the responsible gene at a known locus. One example from our analysis is RIN3, where we identified two non-synonymous variants associated with COPD affection status. Although the r² between these variants and the top reported SNP at this locus is low (< 0.2), the D′ is high, suggesting that RIN3 may be the causal gene and that these non-synonymous variants may account for at least part of the association signal at this locus. Our COPD association with rs679620 in MMP3 similarly illustrates how the coding genome can help clarify GWAS findings: the COPD GWAS variant rs626750 is intergenic, lying between MMP3 and MMP12.
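The RIN3 argument rests on the difference between r² and D′. A minimal Python sketch using the standard two-locus definitions (D = p_AB − p_A·p_B; Lewontin's D′ = D / D_max; r² = D² / (p_A(1 − p_A)p_B(1 − p_B))) shows how a rare coding allele that occurs only on haplotypes carrying a common GWAS allele gives D′ = 1 but a small r². The haplotype and allele frequencies are hypothetical illustrations, not estimates for RIN3 or any other locus in this study.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A (locus 1) and allele B (locus 2)
    p_a, p_b: population frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    # Lewontin's normalization: divide by the largest |D| these allele frequencies allow
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0  # signed; abs(D') is what is usually reported
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# Hypothetical: a rare coding allele (2%) found only on haplotypes that also
# carry a common GWAS allele (40%)
d, d_prime, r2 = ld_stats(p_ab=0.02, p_a=0.02, p_b=0.40)
print(f"D' = {d_prime:.2f}, r2 = {r2:.3f}")  # prints D' = 1.00, r2 = 0.031
```

Because r² is bounded by the mismatch in allele frequencies, a rare variant can be in complete LD (D′ = 1) with a common GWAS SNP while still tagging it poorly (r² ≈ 0.03), which is the pattern described for RIN3.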

Future studies, including analyses of larger samples with both exome and genome-wide data, as well as functional assays, may help clarify the RIN3 and MMP3 findings.

Also of interest is our ability to detect some evidence of association for non-synonymous variants within genes implicated by lung function GWAS. GPR126 is one such lung function GWAS candidate gene, in which two separate non-synonymous variants (rs11155242 and rs17280293) were associated with COPD at p < 0.05. A GWAS of FEV1/FVC ratio identified a variant in GPR126 at genome-wide significance (54), and a GWAS of airflow obstruction (FEV1 % predicted < 65) found a variant in GPR126 just below genome-wide significance (52). Our top GPR126 non-synonymous variant is in LD with both the FEV1/FVC and the airflow obstruction GWAS variants (r² > 0.9, D′ = 0.99). GPR126 is a G protein-coupled receptor (60) with relatively high expression in lung (62), and model organism research shows a role in organ development (66), making GPR126 an intriguing candidate for COPD susceptibility.

Despite analyzing over 12,000 subjects, we did not discover any novel associations at exome-wide significance. One possible explanation is limited sample size: approximately 6000 cases and 6000 controls may be insufficient to uncover novel associations, particularly for uncommon or rare coding variants associated with COPD. Another limitation of our study is the racial and ethnic heterogeneity of the COPD cohorts we analyzed. This heterogeneity was most notable in the gene-based analysis, where only 4519 of 16006 gene sets were common to all cohorts (compared with 9023 gene sets common to the non-Hispanic white cohorts). We analyzed all cohorts together to maximize power, but assessed the impact of racial heterogeneity by also performing single-variant and gene-based meta-analyses restricted to the non-Hispanic white cohorts. These results did not differ meaningfully from those of the all-cohort meta-analysis, suggesting that racial heterogeneity did not substantially impair our ability to discover genetic associations. However, the absence of a difference may also reflect the reduced sample size of the non-Hispanic white-only meta-analysis. A more homogeneous set of cohorts with a similar total sample size might reveal additional single-variant and gene-based associations. The use of an exome genotyping array, rather than exome or whole-genome sequencing, is a further limitation, because we cannot detect very rare or private variants, which by design are not represented on the exome array. Our testing methods are an additional limitation, as gene-based tests can lose power when many variants with no effect are aggregated across a gene. Another limitation in detecting significant associations is the heterogeneity of COPD itself.(67-69) Even among individuals with similar degrees of airflow obstruction, clinical presentation can vary markedly, including the severity of emphysema and the frequency of exacerbations. Refining and studying phenotypes related to COPD pathogenesis may increase power to detect novel genetic associations in COPD.
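The power-dilution point can be made concrete with the score statistics that burden tests and SKAT share. The NumPy sketch below follows the published forms (45, 46): per-variant scores S_j = Σ_i G_ij (y_i − μ̂), a burden statistic (Σ_j w_j S_j)², the SKAT statistic Σ_j (w_j S_j)², and the SKAT-O family (1 − ρ)·Q_SKAT + ρ·Q_burden. It is not MetaSKAT and not the pipeline used in this study: it assumes an intercept-only logistic null model (no adjustment for age, sex, pack-years, or principal components), uses flat weights rather than SKAT's default Beta(1, 25) MAF weights, omits p-value computation, and runs on a hypothetical toy dataset. The function names are illustrative only.

```python
import numpy as np

def score_components(G, y):
    """Per-variant scores S and their null covariance V for an
    intercept-only logistic null model (hypothetical helper)."""
    G = np.asarray(G, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = y.mean()                      # fitted case probability under H0
    S = G.T @ (y - mu)                 # S_j = sum_i G_ij (y_i - mu)
    Gc = G - G.mean(axis=0)            # project out the intercept
    V = mu * (1.0 - mu) * (Gc.T @ Gc)  # Var(S) under H0
    return S, V

def burden_z(S, V, w):
    """Standardized weighted burden statistic: sum_j w_j S_j divided by its null SD."""
    return float(w @ S / np.sqrt(w @ V @ w))

def skat_q(S, w):
    """SKAT statistic: sum_j (w_j S_j)^2."""
    return float(np.sum((w * S) ** 2))

def q_rho(S, w, rho):
    """SKAT-O family: (1 - rho) * Q_SKAT + rho * Q_burden."""
    return (1.0 - rho) * skat_q(S, w) + rho * float(w @ S) ** 2

# Hypothetical toy data: subjects 0-9 are cases, 10-19 are controls
y = np.array([1] * 10 + [0] * 10)
G = np.zeros((20, 4))
G[0:4, 0] = 1                  # "causal" variant: 4 case carriers, no control carriers
for k in range(3):             # 3 null variants: one case and one control carrier each
    G[4 + k, 1 + k] = 1
    G[10 + k, 1 + k] = 1

for label, cols in [("causal only", [0]), ("causal + 3 null", [0, 1, 2, 3])]:
    S, V = score_components(G[:, cols], y)
    w = np.ones(len(cols))     # flat weights for readability
    print(f"{label}: burden z = {burden_z(S, V, w):.2f}")
```

Adding the three null variants leaves the burden score unchanged at 2.0 but inflates its null variance, so the standardized burden statistic falls from about 2.24 to about 1.79. SKAT-O hedges between the two extremes by searching over ρ, but it too pays a price when most aggregated variants carry no signal.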

Despite these limitations, we are encouraged that both our single-variant and gene-based analyses identified associations with genes known to be important to COPD susceptibility and lung function. Our study also illustrates the potential utility of studying associations in the coding genome to help clarify GWAS loci. In addition to rediscovering known associations, we identified several novel associations just below exome-wide significance, including IL27, which has been studied not only in relation to COPD susceptibility but also as a possible biomarker of COPD exacerbations. Together, these data suggest that further studies of the coding genome may improve our understanding of the genetic risk factors for COPD.

References

1. American Lung Association. Chronic Obstructive Pulmonary Disease (COPD) Fact Sheet. Accessed: November 24, 2014. Available from: http://www.lung.org/lung-disease/copd/resources/facts-figures/COPD-Fact-Sheet.html.
2. Centers for Disease Control and Prevention. FastStats - Leading Causes of Death. Accessed: November 24, 2014. Available from: http://www.cdc.gov/nchs/fastats/lcod.htm.
3. Pillai SG, Ge D, Zhu G, Kong X, Shianna KV, Need AC, Feng S, Hersh CP, Bakke P, Gulsvik A, Ruppert A, Lodrup Carlsen KC, Roses A, Anderson W, Rennard SI, Lomas DA, Silverman EK, Goldstein DB, Investigators I. A genome-wide association study in chronic obstructive pulmonary disease (COPD): identification of two major susceptibility loci. PLoS genetics 2009; 5: e1000421.
4. Wilk JB, Chen TH, Gottlieb DJ, Walter RE, Nagle MW, Brandler BJ, Myers RH, Borecki IB, Silverman EK, Weiss ST, O'Connor GT. A genome-wide association study of pulmonary function measures in the Framingham Heart Study. PLoS genetics 2009; 5: e1000429.
5. Cho MH, Boutaoui N, Klanderman BJ, Sylvia JS, Ziniti JP, Hersh CP, DeMeo DL, Hunninghake GM, Litonjua AA, Sparrow D, Lange C, Won S, Murphy JR, Beaty TH, Regan EA, Make BJ, Hokanson JE, Crapo JD, Kong X, Anderson WH, Tal-Singer R, Lomas DA, Bakke P, Gulsvik A, Pillai SG, Silverman EK. Variants in FAM13A are associated with chronic obstructive pulmonary disease. Nature genetics 2010; 42: 200-202.
6. Cho MH, Castaldi PJ, Wan ES, Siedlinski M, Hersh CP, Demeo DL, Himes BE, Sylvia JS, Klanderman BJ, Ziniti JP, Lange C, Litonjua AA, Sparrow D, Regan EA, Make BJ, Hokanson JE, Murray T, Hetmanski JB, Pillai SG, Kong X, Anderson WH, Tal-Singer R, Lomas DA, Coxson HO, Edwards LD, MacNee W, Vestbo J, Yates JC, Agusti A, Calverley PM, Celli B, Crim C, Rennard S, Wouters E, Bakke P, Gulsvik A, Crapo JD, Beaty TH, Silverman EK, Investigators I, Investigators E, Investigators CO. A genome-wide association study of COPD identifies a susceptibility locus on chromosome 19q13. Human molecular genetics 2012; 21: 947-957.
7. Cho MH, McDonald ML, Zhou X, Mattheisen M, Castaldi PJ, Hersh CP, Demeo DL, Sylvia JS, Ziniti J, Laird NM, Lange C, Litonjua AA, Sparrow D, Casaburi R, Barr RG, Regan EA, Make BJ, Hokanson JE, Lutz S, Dudenkov TM, Farzadegan H, Hetmanski JB, Tal-Singer R, Lomas DA, Bakke P, Gulsvik A, Crapo JD, Silverman EK, Beaty TH, Nett Genetics IE, Investigators CO. Risk loci for chronic obstructive pulmonary disease: a genome-wide association study and meta-analysis. The lancet Respiratory medicine 2014; 2: 214-225.
8. Zhou JJ, Cho MH, Castaldi PJ, Hersh CP, Silverman EK, Laird NM. Heritability of chronic obstructive pulmonary disease and related phenotypes in smokers. Am J Respir Crit Care Med 2013; 188: 941-947.
9. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, McCarthy MI, Ramos EM, Cardon LR, Chakravarti A, Cho JH, Guttmacher AE, Kong A, Kruglyak L, Mardis E, Rotimi CN, Slatkin M, Valle D, Whittemore AS, Boehnke M, Clark AG, Eichler EE, Gibson G, Haines JL, Mackay TF, McCarroll SA, Visscher PM. Finding the missing heritability of complex diseases. Nature 2009; 461: 747-753.
10. Silverman EK, Sandhaus RA. Clinical practice. Alpha1-antitrypsin deficiency. N Engl J Med 2009; 360: 2749-2757.
11. Cohen J, Pertsemlidis A, Kotowski IK, Graham R, Garcia CK, Hobbs HH. Low LDL cholesterol in individuals of African descent resulting from frequent nonsense mutations in PCSK9. Nature genetics 2005; 37: 161-165.
12. Cohen JC, Boerwinkle E, Mosley TH, Jr., Hobbs HH. Sequence variations in PCSK9, low LDL, and protection against coronary heart disease. N Engl J Med 2006; 354: 1264-1272.
13. Abecasis Lab at the University of Michigan Center for Statistical Genetics. Exome Chip Design. Last modified August 6, 2013. Accessed April 28, 2015. Available from: http://genome.sph.umich.edu/wiki/Exome_Chip_Design.
14. Huyghe JR, Jackson AU, Fogarty MP, Buchkovich ML, Stancakova A, Stringham HM, Sim X, Yang L, Fuchsberger C, Cederberg H, Chines PS, Teslovich TM, Romm JM, Ling H, McMullen I, Ingersoll R, Pugh EW, Doheny KF, Neale BM, Daly MJ, Kuusisto J, Scott LJ, Kang HM, Collins FS, Abecasis GR, Watanabe RM, Boehnke M, Laakso M, Mohlke KL. Exome array analysis identifies new loci and low-frequency variants influencing insulin processing and secretion. Nature genetics 2013; 45: 197-201.
15. Flannick J, Thorleifsson G, Beer NL, Jacobs SB, Grarup N, Burtt NP, Mahajan A, Fuchsberger C, Atzmon G, Benediktsson R, Blangero J, Bowden DW, Brandslund I, Brosnan J, Burslem F, Chambers J, Cho YS, Christensen C, Douglas DA, Duggirala R, Dymek Z, Farjoun Y, Fennell T, Fontanillas P, Forsen T, Gabriel S, Glaser B, Gudbjartsson DF, Hanis C, Hansen T, Hreidarsson AB, Hveem K, Ingelsson E, Isomaa B, Johansson S, Jorgensen T, Jorgensen ME, Kathiresan S, Kong A, Kooner J, Kravic J, Laakso M, Lee JY, Lind L, Lindgren CM, Linneberg A, Masson G, Meitinger T, Mohlke KL, Molven A, Morris AP, Potluri S, Rauramaa R, Ribel-Madsen R, Richard AM, Rolph T, Salomaa V, Segre AV, Skarstrand H, Steinthorsdottir V, Stringham HM, Sulem P, Tai ES, Teo YY, Teslovich T, Thorsteinsdottir U, Trimmer JK, Tuomi T, Tuomilehto J, Vaziri-Sani F, Voight BF, Wilson JG, Boehnke M, McCarthy MI, Njolstad PR, Pedersen O, Groop L, Cox DR, Stefansson K, Altshuler D. Loss-of-function mutations in SLC30A8 protect against type 2 diabetes. Nature genetics 2014.
16. Peloso GM, Auer PL, Bis JC, Voorman A, Morrison AC, Stitziel NO, Brody JA, Khetarpal SA, Crosby JR, Fornage M, Isaacs A, Jakobsdottir J, Feitosa MF, Davies G, Huffman JE, Manichaikul A, Davis B, Lohman K, Joon AY, Smith AV, Grove ML, Zanoni P, Redon V, Demissie S, Lawson K, Peters U, Carlson C, Jackson RD, Ryckman KK, Mackey RH, Robinson JG, Siscovick DS, Schreiner PJ, Mychaleckyj JC, Pankow JS, Hofman A, Uitterlinden AG, Harris TB, Taylor KD, Stafford JM, Reynolds LM, Marioni RE, Dehghan A, Franco OH, Patel AP, Lu Y, Hindy G, Gottesman O, Bottinger EP, Melander O, Orho-Melander M, Loos RJ, Duga S, Merlini PA, Farrall M, Goel A, Asselta R, Girelli D, Martinelli N, Shah SH, Kraus WE, Li M, Rader DJ, Reilly MP, McPherson R, Watkins H, Ardissino D, Project NGES, Zhang Q, Wang J, Tsai MY, Taylor HA, Correa A, Griswold ME, Lange LA, Starr JM, Rudan I, Eiriksdottir G, Launer LJ, Ordovas JM, Levy D, Chen YD, Reiner AP, Hayward C, Polasek O, Deary IJ, Borecki IB, Liu Y, Gudnason V, Wilson JG, van Duijn CM, Kooperberg C, Rich SS, Psaty BM, Rotter JI, O'Donnell CJ, Rice K, Boerwinkle E, Kathiresan S, Cupples LA. Association of low-frequency and rare coding-sequence variants with blood lipids and coronary heart disease in 56,000 whites and blacks. Am J Hum Genet 2014; 94: 223-232.
17. Wessel J, Chu AY, Willems SM, Wang S, Yaghootkar H, Brody JA, Dauriz M, Hivert MF, Raghavan S, Lipovich L, Hidalgo B, Fox K, Huffman JE, An P, Lu Y, Rasmussen-Torvik LJ, Grarup N, Ehm MG, Li L, Baldridge AS, Stancakova A, Abrol R, Besse C, Boland A, Bork-Jensen J, Fornage M, Freitag DF, Garcia ME, Guo X, Hara K, Isaacs A, Jakobsdottir J, Lange LA, Layton JC, Li M, Hua Zhao J, Meidtner K, Morrison AC, Nalls MA, Peters MJ, Sabater-Lleal M, Schurmann C, Silveira A, Smith AV, Southam L, Stoiber MH, Strawbridge RJ, Taylor KD, Varga TV, Allin KH, Amin N, Aponte JL, Aung T, Barbieri C, Bihlmeyer NA, Boehnke M, Bombieri C, Bowden DW, Burns SM, Chen Y, Chen YD, Cheng CY, Correa A, Czajkowski J, Dehghan A, Ehret GB, Eiriksdottir G, Escher SA, Farmaki AE, Franberg M, Gambaro G, Giulianini F, Goddard WA, Goel A, Gottesman O, Grove ML, Gustafsson S, Hai Y, Hallmans G, Heo J, Hoffmann P, Ikram MK, Jensen RA, Jorgensen ME, Jorgensen T, Karaleftheri M, Khor CC, Kirkpatrick A, Kraja AT, Kuusisto J, Lange EM, Lee IT, Lee WJ, Leong A, Liao J, Liu C, Liu Y, Lindgren CM, Linneberg A, Malerba G, Mamakou V, Marouli E, Maruthur NM, Matchan A, McKean-Cowdin R, McLeod O, Metcalf GA, Mohlke KL, Muzny DM, Ntalla I, Palmer ND, Pasko D, Peter A, Rayner NW, Renstrom F, Rice K, Sala CF, Sennblad B, Serafetinidis I, Smith JA, Soranzo N, Speliotes EK, Stahl EA, Stirrups K, Tentolouris N, Thanopoulou A, Torres M, Traglia M, Tsafantakis E, Javad S, Yanek LR, Zengini E, Becker DM, Bis JC, Brown JB, Adrienne Cupples L, Hansen T, Ingelsson E, Karter AJ, Lorenzo C, Mathias RA, Norris JM, Peloso GM, Sheu WH, Toniolo D, Vaidya D, Varma R, Wagenknecht LE, Boeing H, Bottinger EP, Dedoussis G, Deloukas P, Ferrannini E, Franco OH, Franks PW, Gibbs RA, Gudnason V, Hamsten A, Harris TB, Hattersley AT, Hayward C, Hofman A, Jansson JH, Langenberg C, Launer LJ, Levy D, Oostra BA, O'Donnell CJ, O'Rahilly S, Padmanabhan S, Pankow JS, Polasek O, Province MA, Rich SS, Ridker PM, Rudan I, Schulze MB, Smith BH, Uitterlinden AG, Walker M, Watkins H, Wong TY, Zeggini E, Consortium EP-I, Laakso M, Borecki IB, Chasman DI, Pedersen O, Psaty BM, Shyong Tai E, van Duijn CM, Wareham NJ, Waterworth DM, Boerwinkle E, Linda Kao WH, Florez JC, Loos RJ, Wilson JG, Frayling TM, Siscovick DS, Dupuis J, Rotter JI, Meigs JB, Scott RA, Goodarzi MO, Consortium EP-I. Low-frequency and rare exome chip variants associate with fasting glucose and type 2 diabetes susceptibility. Nature communications 2015; 6: 5897.
18. Silverman EK, Weiss ST, Drazen JM, Chapman HA, Carey V, Campbell EJ, Denish P, Silverman RA, Celedon JC, Reilly JJ, Ginns LC, Speizer FE. Gender-related differences in severe, early-onset chronic obstructive pulmonary disease. Am J Respir Crit Care Med 2000; 162: 2152-2158.
19. Zhu G, Warren L, Aponte J, Gulsvik A, Bakke P, Anderson WH, Lomas DA, Silverman EK, Pillai SG, International CGNI. The SERPINE2 gene is associated with chronic obstructive pulmonary disease in two large populations. Am J Respir Crit Care Med 2007; 176: 167-173.
20. Patel BD, Coxson HO, Pillai SG, Agusti AG, Calverley PM, Donner CF, Make BJ, Muller NL, Rennard SI, Vestbo J, Wouters EF, Hiorns MP, Nakano Y, Camp PG, Nasute Fauerbach PV, Screaton NJ, Campbell EJ, Anderson WH, Pare PD, Levy RD, Lake SL, Silverman EK, Lomas DA, International CGN. Airway wall thickening and emphysema show independent familial aggregation in chronic obstructive pulmonary disease. Am J Respir Crit Care Med 2008; 178: 500-505.
21. Zhou X, Baron RM, Hardin M, Cho MH, Zielinski J, Hawrylkiewicz I, Sliwinski P, Hersh CP, Mancini JD, Lu K, Thibault D, Donahue AL, Klanderman BJ, Rosner B, Raby BA, Lu Q, Geldart AM, Layne MD, Perrella MA, Weiss ST, Choi AM, Silverman EK. Identification of a chronic obstructive pulmonary disease genetic determinant that regulates HHIP. Human molecular genetics 2012; 21: 1325-1335.
22. Regan EA, Hokanson JE, Murphy JR, Make B, Lynch DA, Beaty TH, Curran-Everett D, Silverman EK, Crapo JD. Genetic epidemiology of COPD (COPDGene) study design. Copd 2010; 7: 32-43.
23. Grove ML, Yu B, Cochran BJ, Haritunians T, Bis JC, Taylor KD, Hansen M, Borecki IB, Cupples LA, Fornage M, Gudnason V, Harris TB, Kathiresan S, Kraaij R, Launer LJ, Levy D, Liu Y, Mosley T, Peloso GM, Psaty BM, Rich SS, Rivadeneira F, Siscovick DS, Smith AV, Uitterlinden A, van Duijn CM, Wilson JG, O'Donnell CJ, Rotter JI, Boerwinkle E. Best practices and joint calling of the HumanExome BeadChip: the CHARGE Consortium. PLoS One 2013; 8: e68095.
24. Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM. Robust relationship inference in genome-wide association studies. Bioinformatics 2010; 26: 2867-2873.
25. Therneau T, Atkinson E, Sinnwell J, Schaid D, McDonnell S. kinship2: Pedigree functions. R package version 1.6.0. URL: http://CRAN.R-project.org/package=kinship2. 2014.
26. R Core Team (2014). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. URL: http://www.R-project.org/.
27. Purcell S. PLINK [version 1.07]. URL: http://pngu.mgh.harvard.edu/purcell/plink/.
28. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007; 81: 559-575.
29. Purcell S, Chang C. PLINK [version 1.9]. URL: https://www.cog-genomics.org/plink2. 2015.
30. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 2015; 4: 7.
31. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nature genetics 2006; 38: 904-909.
32. Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS genetics 2006; 2: e190.
33. Wang C. TRACE: fasT and Robust Ancestry Coordinate Estimation version 1.0. URL: http://www.sph.umich.edu/csg/chaolong/LASER/. 2014.
34. Wang et al. Improved ancestry estimation for both genotyping and sequencing data using projection Procrustes analysis and genotype imputation. (Manuscript in preparation).
35. Wang C, Zhan X, Bragg-Gresham J, Kang HM, Stambolian D, Chew EY, Branham KE, Heckenlively J, Study F, Fulton R, Wilson RK, Mardis ER, Lin X, Swaroop A, Zollner S, Abecasis GR. Ancestry estimation and control of population stratification for sequence-based association studies. Nature genetics 2014; 46: 409-415.
36. International HapMap C. The International HapMap Project. Nature 2003; 426: 789-796.
37. Database of Single Nucleotide Polymorphisms (dbSNP). Bethesda, MD: National Center for Biotechnology Information, National Library of Medicine. (dbSNP Build ID: 141). URL: http://www.ncbi.nlm.nih.gov/SNP/.
38. Exome Aggregation Consortium (ExAC) Release 0.2, Cambridge, MA. 2014 [cited 2014 December 3]. Available from: http://exac.broadinstitute.org.
39. Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S, Barnes I, Bignell A, Boychenko V, Hunt T, Kay M, Mukherjee G, Rajan J, Despacio-Reyes G, Saunders G, Steward C, Harte R, Lin M, Howald C, Tanzer A, Derrien T, Chrast J, Walters N, Balasubramanian S, Pei B, Tress M, Rodriguez JM, Ezkurdia I, van Baren J, Brent M, Haussler D, Kellis M, Valencia A, Reymond A, Gerstein M, Guigo R, Hubbard TJ. GENCODE: the reference annotation for The ENCODE Project. Genome research 2012; 22: 1760-1774.
40. Kang HM. Efficient and Parallelizable Association Container Toolbox (EPACTS) v3.2.6. URL: http://genome.sph.umich.edu/wiki/EPACTS. 2014.
41. Chen M-H, Yang Q. GWAF: Genome-Wide Association/Interaction Analysis and Rare Variant Analysis with Family Data. R package version 2.2. URL: http://CRAN.R-project.org/package=GWAF. 2015.
42. Abecasis G, Li Y, Willer C. METAL MetaAnalysis Helper. Version release 2011-03-25. URL: http://genome.sph.umich.edu/wiki/METAL_Program. 2011.
43. Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 2010; 26: 2190-2191.
44. Lee S, Miropolsky L, Wu M. SKAT: SNP-set (Sequence) Kernel Association Test. R package version 1.0.1. URL: http://CRAN.R-project.org/package=SKAT. 2014.
45. Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X. Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet 2011; 89: 82-93.
46. Lee S, Emond MJ, Bamshad MJ, Barnes KC, Rieder MJ, Nickerson DA, Team NGESP-ELP, Christiani DC, Wurfel MM, Lin X. Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am J Hum Genet 2012; 91: 224-237.
47. Chen H, Conomos MP. GMMAT: Generalized linear Mixed Model Association Tests. R package version 0.5. (Not yet publicly available). 2014.
48. Chen H, Wang C, Conomos M, Stilp A, Li Z, Sofer T, Szpiro A, Chen W, Brehm J, Celedón J, Redline S, Papanicolaou G, Thornton T, Laurie C, Rice K, Lin X. Control for Population Structure and Relatedness for Binary Traits in Genetic Association Studies Using Logistic Mixed Models. (Manuscript submitted for publication).
49. Lee S. MetaSKAT: Meta analysis for SNP-set (Sequence) Kernel Association Test. R package version 0.40. URL: http://CRAN.R-project.org/package=MetaSKAT. 2014.
50. Soler Artigas M, Loth DW, Wain LV, Gharib SA, Obeidat M, Tang W, Zhai G, Zhao JH, Smith AV, Huffman JE, Albrecht E, Jackson CM, Evans DM, Cadby G, Fornage M, Manichaikul A, Lopez LM, Johnson T, Aldrich MC, Aspelund T, Barroso I, Campbell H, Cassano PA, Couper DJ, Eiriksdottir G, Franceschini N, Garcia M, Gieger C, Gislason GK, Grkovic I, Hammond CJ, Hancock DB, Harris TB, Ramasamy A, Heckbert SR, Heliovaara M, Homuth G, Hysi PG, James AL, Jankovic S, Joubert BR, Karrasch S, Klopp N, Koch B, Kritchevsky SB, Launer LJ, Liu Y, Loehr LR, Lohman K, Loos RJ, Lumley T, Al Balushi KA, Ang WQ, Barr RG, Beilby J, Blakey JD, Boban M, Boraska V, Brisman J, Britton JR, Brusselle GG, Cooper C, Curjuric I, Dahgam S, Deary IJ, Ebrahim S, Eijgelsheim M, Francks C, Gaysina D, Granell R, Gu X, Hankinson JL, Hardy R, Harris SE, Henderson J, Henry A, Hingorani AD, Hofman A, Holt PG, Hui J, Hunter ML, Imboden M, Jameson KA, Kerr SM, Kolcic I, Kronenberg F, Liu JZ, Marchini J, McKeever T, Morris AD, Olin AC, Porteous DJ, Postma DS, Rich SS, Ring SM, Rivadeneira F, Rochat T, Sayer AA, Sayers I, Sly PD, Smith GD, Sood A, Starr JM, Uitterlinden AG, Vonk JM, Wannamethee SG, Whincup PH, Wijmenga C, Williams OD, Wong A, Mangino M, Marciante KD, McArdle WL, Meibohm B, Morrison AC, North KE, Omenaas E, Palmer LJ, Pietilainen KH, Pin I, Polasek O, Pouta A, Psaty BM, Hartikainen AL, Rantanen T, Ripatti S, Rotter JI, Rudan I, Rudnicka AR, Schulz H, Shin SY, Spector TD, Surakka I, Vitart V, Volzke H, Wareham NJ, Warrington NM, Wichmann HE, Wild SH, Wilk JB, Wjst M, Wright AF, Zgaga L, Zemunik T, Pennell CE, Nyberg F, Kuh D, Holloway JW, Boezen HM, Lawlor DA, Morris RW, Probst-Hensch N, International Lung Cancer C, consortium G, Kaprio J, Wilson JF, Hayward C, Kahonen M, Heinrich J, Musk AW, Jarvis DL, Glaser S, Jarvelin MR, Ch Stricker BH, Elliott P, O'Connor GT, Strachan DP, London SJ, Hall IP, Gudnason V, Tobin MD. Genome-wide association and large-scale follow up identifies 16 new loci influencing lung function. Nature genetics 2011; 43: 1082-1090.
51. Hersh CP, DeMeo DL, Silverman EK. Respiratory genetics. London & New York: Hodder Arnold; Distributed in the United States of America by Oxford University Press; 2005. p. 253-296.
52. Wilk JB, Shrine NR, Loehr LR, Zhao JH, Manichaikul A, Lopez LM, Smith AV, Heckbert SR, Smolonska J, Tang W, Loth DW, Curjuric I, Hui J, Cho MH, Latourelle JC, Henry AP, Aldrich M, Bakke P, Beaty TH, Bentley AR, Borecki IB, Brusselle GG, Burkart KM, Chen TH, Couper D, Crapo JD, Davies G, Dupuis J, Franceschini N, Gulsvik A, Hancock DB, Harris TB, Hofman A, Imboden M, James AL, Khaw KT, Lahousse L, Launer LJ, Litonjua A, Liu Y, Lohman KK, Lomas DA, Lumley T, Marciante KD, McArdle WL, Meibohm B, Morrison AC, Musk AW, Myers RH, North KE, Postma DS, Psaty BM, Rich SS, Rivadeneira F, Rochat T, Rotter JI, Artigas MS, Starr JM, Uitterlinden AG, Wareham NJ, Wijmenga C, Zanen P, Province MA, Silverman EK, Deary IJ, Palmer LJ, Cassano PA, Gudnason V, Barr RG, Loos RJ, Strachan DP, London SJ, Boezen HM, Probst-Hensch N, Gharib SA, Hall IP, O'Connor GT, Tobin MD, Stricker BH. Genome-wide association studies identify CHRNA5/3 and HTR4 in the development of airflow obstruction. Am J Respir Crit Care Med 2012; 186: 622-632.
53. Repapi E, Sayers I, Wain LV, Burton PR, Johnson T, Obeidat M, Zhao JH, Ramasamy A, Zhai G, Vitart V, Huffman JE, Igl W, Albrecht E, Deloukas P, Henderson J, Granell R, McArdle WL, Rudnicka AR, Wellcome Trust Case Control C, Barroso I, Loos RJ, Wareham NJ, Mustelin L, Rantanen T, Surakka I, Imboden M, Wichmann HE, Grkovic I, Jankovic S, Zgaga L, Hartikainen AL, Peltonen L, Gyllensten U, Johansson A, Zaboli G, Campbell H, Wild SH, Wilson JF, Glaser S, Homuth G, Volzke H, Mangino M, Soranzo N, Spector TD, Polasek O, Rudan I, Wright AF, Heliovaara M, Ripatti S, Pouta A, Naluai AT, Olin AC, Toren K, Cooper MN, James AL, Palmer LJ, Hingorani AD, Wannamethee SG, Whincup PH, Smith GD, Ebrahim S, McKeever TM, Pavord ID, MacLeod AK, Morris AD, Porteous DJ, Cooper C, Dennison E, Shaheen S, Karrasch S, Schnabel E, Schulz H, Grallert H, Bouatia-Naji N, Delplanque J, Froguel P, Blakey JD, Team NRS, Britton JR, Morris RW, Holloway JW, Lawlor DA, Hui J, Nyberg F, Jarvelin MR, Jackson C, Kahonen M, Kaprio J, Probst-Hensch NM, Koch B, Hayward C, Evans DM, Elliott P, Strachan DP, Hall IP, Tobin MD. Genome-wide association study identifies five loci associated with lung function. Nature genetics 2010; 42: 36-44.
54. Hancock DB, Eijgelsheim M, Wilk JB, Gharib SA, Loehr LR, Marciante KD, Franceschini N, van Durme YM, Chen TH, Barr RG, Schabath MB, Couper DJ, Brusselle GG, Psaty BM, van Duijn CM, Rotter JI, Uitterlinden AG, Hofman A, Punjabi NM, Rivadeneira F, Morrison AC, Enright PL, North KE, Heckbert SR, Lumley T, Stricker BH, O'Connor GT, London SJ. Meta-analyses of genome-wide association studies identify multiple loci associated with pulmonary function. Nature genetics 2010; 42: 45-52.
55. Hancock DB, Artigas MS, Gharib SA, Henry A, Manichaikul A, Ramasamy A, Loth DW, Imboden M, Koch B, McArdle WL, Smith AV, Smolonska J, Sood A, Tang W, Wilk JB, Zhai G, Zhao JH, Aschard H, Burkart KM, Curjuric I, Eijgelsheim M, Elliott P, Gu X, Harris TB, Janson C, Homuth G, Hysi PG, Liu JZ, Loehr LR, Lohman K, Loos RJ, Manning AK, Marciante KD, Obeidat M, Postma DS, Aldrich MC, Brusselle GG, Chen TH, Eiriksdottir G, Franceschini N, Heinrich J, Rotter JI, Wijmenga C, Williams OD, Bentley AR, Hofman A, Laurie CC, Lumley T, Morrison AC, Joubert BR, Rivadeneira F, Couper DJ, Kritchevsky SB, Liu Y, Wjst M, Wain LV, Vonk JM, Uitterlinden AG, Rochat T, Rich SS, Psaty BM, O'Connor GT, North KE, Mirel DB, Meibohm B, Launer LJ, Khaw KT, Hartikainen AL, Hammond CJ, Glaser S, Marchini J, Kraft P, Wareham NJ, Volzke H, Stricker BH, Spector TD, Probst-Hensch NM, Jarvis D, Jarvelin MR, Heckbert SR, Gudnason V, Boezen HM, Barr RG, Cassano PA, Strachan DP, Fornage M, Hall IP, Dupuis J, Tobin MD, London SJ. Genome-wide joint meta-analysis of SNP and SNP-by-smoking interaction identifies novel loci for pulmonary function. PLoS genetics 2012; 8: e1003098.
56. Online Mendelian Inheritance in Man, OMIM®. Johns Hopkins University, Baltimore, MD. MIM Number: 608273. Date Last Edit: 08/18/2014. URL: http://omim.org/.
57. Huang N, Liu L, Wang XZ, Liu D, Yin SY, Yang XD. Association of interleukin (IL)-12 and IL-27 gene polymorphisms with chronic obstructive pulmonary disease in a Chinese population. DNA Cell Biol 2008; 27: 527-531.
58. Cao J, Zhang L, Li D, Xu F, Huang S, Xiang Y, Yin Y, Ren G. IL-27 is elevated in patients with COPD and patients with pulmonary TB and induces human bronchial epithelial cells to produce CXCL10. Chest 2012; 141: 121-130.
59. Angata T, Ishii T, Gao C, Ohtsubo K, Kitazume S, Gemma A, Kida K, Taniguchi N. Association of serum interleukin-27 with the exacerbation of chronic obstructive pulmonary disease. Physiol Rep 2014; 2.
60. UniProtKB: Protein knowledgebase. URL: http://www.uniprot.org/. [cited 2015 April 27].
61. UniProt C. UniProt: a hub for protein information. Nucleic acids research 2015; 43: D204-212.
62. GTEx Portal. Version: 4. Build #: 199. URL: http://www.gtexportal.org/home/. [cited 2015 April 25].
63. Consortium GT. The Genotype-Tissue Expression (GTEx) project. Nature genetics 2013; 45: 580-585.
64. The Human Protein Atlas. URL: http://www.proteinatlas.org/. [cited 2015 April 25].
65. Uhlen M, Fagerberg L, Hallstrom BM, Lindskog C, Oksvold P, Mardinoglu A, Sivertsson A, Kampf C, Sjostedt E, Asplund A, Olsson I, Edlund K, Lundberg E, Navani S, Szigyarto CA, Odeberg J, Djureinovic D, Takanen JO, Hober S, Alm T, Edqvist PH, Berling H, Tegel H, Mulder J, Rockberg J, Nilsson P, Schwenk JM, Hamsten M, von Feilitzen K, Forsberg M, Persson L, Johansson F, Zwahlen M, von Heijne G, Nielsen J, Ponten F. Proteomics. Tissue-based map of the human proteome. Science 2015; 347: 1260419.
66. Patra C, Monk KR, Engel FB. The multiple signaling modalities of adhesion G protein-coupled receptor GPR126 in development. Receptors Clin Investig 2014; 1: 79.
67. Han MK, Criner GJ. Update in chronic obstructive pulmonary disease 2012. Am J Respir Crit Care Med 2013; 188: 29-34.
68. Rennard SI, Vestbo J. The many "small COPDs": COPD should be an orphan disease. Chest 2008; 134: 623-627.
69. Hersh CP, Make BJ, Lynch DA, Barr RG, Bowler RP, Calverley PM, Castaldi PJ, Cho MH, Coxson HO, DeMeo DL, Foreman MG, Han MK, Harshfield BJ, Hokanson JE, Lutz S, Ramsdell JW, Regan EA, Rennard SI, Schroeder JD, Sciurba FC, Steiner RM, Tal-Singer R, van Beek E, Jr., Silverman EK, Crapo JD, Copdgene, Investigators E. Non-emphysematous chronic obstructive pulmonary disease is associated with diabetes mellitus. BMC pulmonary medicine 2014; 14: 164.

Supplement to:

Exome Array Analysis of Chronic Obstructive Pulmonary Disease

BD Hobbs1,2, MM Parker3, H Chen4, M Hardin1,2, D Qiao4, I Hawrylkiewicz5, P Sliwinski5, JJ Yim6, WJ Kim7, DK Kim8, A Agusti9, BJ Make10, JD Crapo10, PM Calverley11, CF Donner12, D Lomas13, E Wouters14, J Vestbo15, PD Paré16, RD Levy16, S Rennard17, NM Laird4, X Lin4, TH Beaty3, EK Silverman1,2, MH Cho1,2; COPDGene and International COPD Genetics Network Investigators

1) Channing Division of Network Medicine and 2) Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
3) Department of Epidemiology, Bloomberg School of Public Health, The Johns Hopkins University, Baltimore, Maryland, United States of America
4) Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
5) National Tuberculosis and Lung Disease Research Institute, Warsaw, Poland
6) Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Seoul National University College of Medicine, South Korea
7) Kangwon National University, Chuncheon, South Korea
8) Seoul National University Boramae Medical Center, Seoul, South Korea
9) Thorax Institute, Hospital Clinic, IDIBAPS, University of Barcelona, CIBERES, Barcelona, Spain
10) National Jewish Health, Denver, Colorado, United States of America
11) University of Liverpool, Liverpool, United Kingdom
12) Multidisciplinary and Rehabilitation Outpatient Clinic, Mondo Medico di I.F.I.M. srl, Borgomanero (NO), Italy
13) University College London, United Kingdom
14) University Hospital Maastricht, The Netherlands
15) Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
16) University of British Columbia, Vancouver, Canada
17) Pulmonary and Critical Care Medicine, University of Nebraska Medical Center, Omaha, Nebraska, United States of America

Supplemental Table S1. List of all candidate genes considered in the candidate gene evaluation phase of our study.

ADAM19, AGER, AGPAT1, AIF1, APOM, ARMC2, ATP13A2, ATP6V1G2, ATP7A^, BAG6, C10orf11, C6orf15, C6orf48, CCDC38, CDC123, CDSN, CFB, CFDP1, CFTR^, CHRNA3*, CHRNA5*, CHRNB4*, CHST6, CLIC1, COL3A1^, CROCC, CSNK2B, CYP2A6*, DDX39B, DNER, EFEMP2^, EGLN2*, ELN^, FAM13A*, FBLN1^, FBLN5^, FLCN^, FLJ43879, GPANK1, GPR126, GPSM3, GPX6, GSTCD, HDAC4, HHIP*, HIST1H1B, HIST1H2AL, HIVEP2, HLA-DQB1, HTR4, HYKK*, INTS12, IREB2*, KCNE2, KCNJ2, KCNMA1, LINC00310, LRP1, LTA, LTBP4^, LYPLAL1, MECOM, MFAP2, MIA*, MICA, MICB, MMP12*, MMP15, MMP3*, MSH5, NCR3, NFKBIL1, NIPAL4, NMBR, NOTCH4, NPC2^, NPNT, NUDT5, PADI2, PBX2, PGBD1, PPT2, PRRC2A, PRRT1, PSMA4*, PSORS1C1, PTCH1, RAB4B*, RARB, RHOBTB3, RIN3*, RNF5, SDHB, SERPINA1^, SLC17A5^, SMPD1^, SNRPF, SOX30, SPATA9, STAT6, STK19, TCF19, TGFB2*, THSD4, TMEM170A, TNF, TNS1, VARS, VTA1, ZKSCAN3, ZKSCAN4, ZNF165, ZNF192, ZNF193, ZNF323, ZSCAN12, ZSCAN16, ZSCAN23, ZSCAN26

Unless indicated with * or ^, all genes listed are candidates from GWAS of lung function or airflow obstruction (S1, S2).
*Indicates a COPD GWAS candidate gene (S3).
^Indicates a Mendelian candidate gene (S4).
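When the Table S1 list is reused, for example to group candidates by the evidence that nominated them, the marker convention can be decoded mechanically. The Python sketch below is illustrative only: the function name `decode_gene` and the short source labels are hypothetical, and it assumes each entry carries at most one trailing `*` or `^` marker.

```python
# Hypothetical helper: map the trailing Table S1 marker to a candidate source.
# Assumption: at most one trailing "*" or "^"; unmarked entries come from
# lung function / airflow obstruction GWAS.
SOURCE_BY_MARKER = {
    "*": "COPD GWAS",
    "^": "Mendelian",
}
DEFAULT_SOURCE = "lung function / airflow obstruction GWAS"


def decode_gene(entry: str) -> tuple[str, str]:
    """Split an entry such as 'SERPINA1^' into (gene symbol, candidate source)."""
    marker = entry[-1]
    if marker in SOURCE_BY_MARKER:
        return entry[:-1], SOURCE_BY_MARKER[marker]
    return entry, DEFAULT_SOURCE


# A few entries copied from Table S1.
entries = ["ADAM19", "CHRNA5*", "SERPINA1^", "HHIP*", "AGER"]
decoded = dict(decode_gene(e) for e in entries)
for symbol, source in decoded.items():
    print(f"{symbol}: {source}")
```

Keying the lookup on the final character keeps symbols that end in digits or letters (for example HLA-DQB1) untouched.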

Supplemental Table S2. Meta-analysis of non-synonymous variant associations with COPD affection status across non-Hispanic white (NHW) cohorts.

Chromosome | Position | SNP | Gene | Effect Allele Freq (Overall) | Beta | SE | Direction*^ | p value
15 | 78882925 | rs16969968 | CHRNA5 | 0.38 | 0.26 | 0.036 | +++ | 1.4E-12
16 | 28513403 | rs181206 | IL27 | 0.67 | -0.17 | 0.037 | --- | 6.5E-06
6 | 109885475 | rs10499052 | AKD1 | 0.28 | 0.16 | 0.038 | +++ | 1.5E-05
8 | 142367246 | rs36092215 | GPR20 | 0.040 | -0.37 | 0.087 | --- | 2.5E-05
2 | 24483979 | rs41281485 | ITSN2 | 0.99 | -0.86 | 0.21 | -?- | 5.0E-05
6 | 32151443 | rs2070600 | AGER | 0.038 | -0.34 | 0.084 | -+- | 6.3E-05
6 | 109767931 | rs59056467 | MICAL1 | 0.34 | 0.14 | 0.036 | +++ | 6.5E-05
15 | 79089111 | rs3825807 | ADAMTS7 | 0.54 | -0.13 | 0.035 | --- | 1.5E-04
12 | 6153534 | rs1063856 | VWF | 0.64 | 0.14 | 0.036 | +++ | 1.5E-04
14 | 94844947 | rs28929474 | SERPINA1 | 0.021 | 0.45 | 0.12 | +++ | 1.8E-04

Only variants observed in at least 2 of the 3 analysis cohorts are reported.
*The direction of effect is reported for each analysis cohort; from left to right, the symbols represent 1) BEOCOPD and ICGN, 2) TCGS - Poland, and 3) COPDGene NHW.
^The symbol for each cohort is "+" for variant association with increased COPD susceptibility, "-" for association with decreased COPD susceptibility, and "?" if the variant is not present in the analysis cohort.
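Because the single-variant models were logistic regressions combined in a fixed-effects meta-analysis, each Beta in Table S2 is a pooled per-allele log odds ratio, and its p value should be close to a two-sided Wald test of Beta/SE. The Python sketch below assumes inverse-variance weighting (the usual fixed-effects scheme; the supplement does not state the weighting used), builds the per-cohort direction string defined in the table footnotes, and rechecks three table rows against the exome-wide single-variant threshold of 2.3x10-6 given in the Methods. The per-cohort estimates passed to the hypothetical `fixed_effects_meta` function are invented; only the pooled rows come from the table. Since Beta and SE are rounded in the table, recomputed p values agree only approximately (for CHRNA5, about 5x10-13 versus the reported 1.4E-12).

```python
import math

EXOME_WIDE_SINGLE_VARIANT = 2.3e-6  # single-variant threshold stated in the Methods


def two_sided_p(z: float) -> float:
    """Two-sided p value of a standard-normal Wald statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def fixed_effects_meta(estimates):
    """Inverse-variance weighted fixed-effects meta-analysis (assumed scheme).

    `estimates` holds one (beta, se) pair per cohort, in the Table S2 order
    BEOCOPD and ICGN, TCGS - Poland, COPDGene NHW, or None when the variant
    was not observed in that cohort. Returns (beta, se, p, direction).
    """
    observed = [e for e in estimates if e is not None]
    weights = [1.0 / se ** 2 for _, se in observed]
    total_weight = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, observed)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    # "+" = higher COPD risk, "-" = lower risk, "?" = absent (a beta of exactly 0 is shown as "-")
    direction = "".join(
        "?" if e is None else ("+" if e[0] > 0 else "-") for e in estimates
    )
    return beta, se, two_sided_p(beta / se), direction


# Hypothetical per-cohort estimates for a variant missing from TCGS - Poland.
pooled = fixed_effects_meta([(0.30, 0.08), None, (0.20, 0.05)])
print("pooled beta=%.3f se=%.3f p=%.1e direction=%s" % pooled)

# Pooled rows copied from Table S2: gene -> (beta, se, reported p).
TABLE_S2_ROWS = {
    "CHRNA5": (0.26, 0.036, 1.4e-12),
    "IL27": (-0.17, 0.037, 6.5e-06),
    "SERPINA1": (0.45, 0.12, 1.8e-04),
}
checks = {}
for gene, (beta, se, reported_p) in TABLE_S2_ROWS.items():
    p = two_sided_p(beta / se)
    odds_ratio = math.exp(beta)  # per-allele odds ratio
    ci_low, ci_high = math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)
    checks[gene] = (p, p < EXOME_WIDE_SINGLE_VARIANT)
    print(f"{gene}: OR={odds_ratio:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f}), "
          f"Wald p={p:.1e} (reported {reported_p:.1e}), "
          f"exome-wide={p < EXOME_WIDE_SINGLE_VARIANT}")
```

Under this sketch only the CHRNA5 row clears the exome-wide threshold, matching the reported p values, where the next-strongest association (IL27, 6.5E-06) also falls short.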

Supplemental Table S3. Top 10 results from the SKAT-O gene-based meta-analysis of association with COPD affection status across non-Hispanic white cohorts.

Gene Set | Meta-analysis p value | BEOCOPD and ICGN p value | TCGS - Poland p value | COPDGene NHW p value
CYB5RL | 7.9E-05 | 9.9E-03 | 0.42 | 7.1E-03
PAICS | 9.5E-05 | 0.12 | --- | 2.6E-04
ALDH16A1 | 1.0E-04 | 0.054 | 0.34 | 1.2E-03
C16orf7 | 1.5E-04 | 0.027 | 1 | 8.5E-03
RP11-446E24.4 | 1.5E-04 | 8.6E-03 | 0.33 | 0.013
AGER | 2.5E-04 | 0.58 | 0.48 | 2.8E-05
MECR | 3.3E-04 | 0.072 | 0.31 | 0.010
KCNA3 | 6.4E-04 | 0.12 | --- | 2.0E-03
ACTR1B | 6.6E-04 | 0.43 | 0.11 | 1.2E-03
COQ2 | 6.6E-04 | 0.19 | 0.30 | 2.3E-03

Top results from the NHW meta-analysis, with p values from each individual cohort. Only gene sets present in at least 2 of the 3 cohort-level analyses are reported.
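The reporting rule for Table S3 (keep gene sets present in at least 2 of the 3 cohort-level analyses, then rank by meta-analysis p value) can be reproduced when rebuilding the table from per-cohort output. The Python sketch below uses the published values, treats each "---" cell as an absent analysis (an assumption about what "---" denotes), and compares each meta-analysis p value with the exome-wide gene-based threshold of 4.1x10-6 from the Methods. The function name `report_gene_sets` and the entry HYPOTHETICAL_SET are invented; the latter exists only to show the filter dropping a gene set seen in a single cohort.

```python
GENE_BASED_THRESHOLD = 4.1e-6  # exome-wide gene-based threshold from the Methods

# Meta-analysis p value, then per-cohort p values in the order
# BEOCOPD and ICGN, TCGS - Poland, COPDGene NHW; None stands for "---".
TABLE_S3 = {
    "CYB5RL": (7.9e-05, (9.9e-03, 0.42, 7.1e-03)),
    "PAICS": (9.5e-05, (0.12, None, 2.6e-04)),
    "ALDH16A1": (1.0e-04, (0.054, 0.34, 1.2e-03)),
    "C16orf7": (1.5e-04, (0.027, 1.0, 8.5e-03)),
    "RP11-446E24.4": (1.5e-04, (8.6e-03, 0.33, 0.013)),
    "AGER": (2.5e-04, (0.58, 0.48, 2.8e-05)),
    "MECR": (3.3e-04, (0.072, 0.31, 0.010)),
    "KCNA3": (6.4e-04, (0.12, None, 2.0e-03)),
    "ACTR1B": (6.6e-04, (0.43, 0.11, 1.2e-03)),
    "COQ2": (6.6e-04, (0.19, 0.30, 2.3e-03)),
    # Hypothetical gene set analysed in only one cohort; the filter should drop it.
    "HYPOTHETICAL_SET": (2.0e-06, (2.0e-06, None, None)),
}


def report_gene_sets(table, min_cohorts=2, threshold=GENE_BASED_THRESHOLD):
    """Keep gene sets seen in >= min_cohorts analyses, sorted by meta p value.

    Returns (meta_p, gene, significant) tuples; ties are broken by gene name.
    """
    kept = [
        (meta_p, gene, meta_p < threshold)
        for gene, (meta_p, cohort_ps) in table.items()
        if sum(p is not None for p in cohort_ps) >= min_cohorts
    ]
    return sorted(kept)


for meta_p, gene, significant in report_gene_sets(TABLE_S3):
    print(f"{gene:<14} meta p={meta_p:.1e} exome-wide significant={significant}")
```

Applied to the published rows, the sketch keeps all ten gene sets in the order shown, and none reaches the 4.1x10-6 threshold.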

Supplement References:

S1. Soler Artigas M, Loth DW, Wain LV, Gharib SA, Obeidat M, Tang W, Zhai G, Zhao JH, Smith AV, Huffman JE, Albrecht E, Jackson CM, Evans DM, Cadby G, Fornage M, Manichaikul A, Lopez LM, Johnson T, Aldrich MC, Aspelund T, Barroso I, Campbell H, Cassano PA, Couper DJ, Eiriksdottir G, Franceschini N, Garcia M, Gieger C, Gislason GK, Grkovic I, Hammond CJ, Hancock DB, Harris TB, Ramasamy A, Heckbert SR, Heliovaara M, Homuth G, Hysi PG, James AL, Jankovic S, Joubert BR, Karrasch S, Klopp N, Koch B, Kritchevsky SB, Launer LJ, Liu Y, Loehr LR, Lohman K, Loos RJ, Lumley T, Al Balushi KA, Ang WQ, Barr RG, Beilby J, Blakey JD, Boban M, Boraska V, Brisman J, Britton JR, Brusselle GG, Cooper C, Curjuric I, Dahgam S, Deary IJ, Ebrahim S, Eijgelsheim M, Francks C, Gaysina D, Granell R, Gu X, Hankinson JL, Hardy R, Harris SE, Henderson J, Henry A, Hingorani AD, Hofman A, Holt PG, Hui J, Hunter ML, Imboden M, Jameson KA, Kerr SM, Kolcic I, Kronenberg F, Liu JZ, Marchini J, McKeever T, Morris AD, Olin AC, Porteous DJ, Postma DS, Rich SS, Ring SM, Rivadeneira F, Rochat T, Sayer AA, Sayers I, Sly PD, Smith GD, Sood A, Starr JM, Uitterlinden AG, Vonk JM, Wannamethee SG, Whincup PH, Wijmenga C, Williams OD, Wong A, Mangino M, Marciante KD, McArdle WL, Meibohm B, Morrison AC, North KE, Omenaas E, Palmer LJ, Pietilainen KH, Pin I, Polasek O, Pouta A, Psaty BM, Hartikainen AL, Rantanen T, Ripatti S, Rotter JI, Rudan I, Rudnicka AR, Schulz H, Shin SY, Spector TD, Surakka I, Vitart V, Volzke H, Wareham NJ, Warrington NM, Wichmann HE, Wild SH, Wilk JB, Wjst M, Wright AF, Zgaga L, Zemunik T, Pennell CE, Nyberg F, Kuh D, Holloway JW, Boezen HM, Lawlor DA, Morris RW, Probst-Hensch N, International Lung Cancer C, consortium G, Kaprio J, Wilson JF, Hayward C, Kahonen M, Heinrich J, Musk AW, Jarvis DL, Glaser S, Jarvelin MR, Ch Stricker BH, Elliott P, O'Connor GT, Strachan DP, London SJ, Hall IP, Gudnason V, Tobin MD. Genome-wide association and large-scale follow up identifies 16 new loci influencing lung function. Nature genetics 2011; 43: 1082-1090.
S2. Hancock DB, Artigas MS, Gharib SA, Henry A, Manichaikul A, Ramasamy A, Loth DW, Imboden M, Koch B, McArdle WL, Smith AV, Smolonska J, Sood A, Tang W, Wilk JB, Zhai G, Zhao JH, Aschard H, Burkart KM, Curjuric I, Eijgelsheim M, Elliott P, Gu X, Harris TB, Janson C, Homuth G, Hysi PG, Liu JZ, Loehr LR, Lohman K, Loos RJ, Manning AK, Marciante KD, Obeidat M, Postma DS, Aldrich MC, Brusselle GG, Chen TH, Eiriksdottir G, Franceschini N, Heinrich J, Rotter JI, Wijmenga C, Williams OD, Bentley AR, Hofman A, Laurie CC, Lumley T, Morrison AC, Joubert BR, Rivadeneira F, Couper DJ, Kritchevsky SB, Liu Y, Wjst M, Wain LV, Vonk JM, Uitterlinden AG, Rochat T, Rich SS, Psaty BM, O'Connor GT, North KE, Mirel DB, Meibohm B, Launer LJ, Khaw KT, Hartikainen AL, Hammond CJ, Glaser S, Marchini J, Kraft P, Wareham NJ, Volzke H, Stricker BH, Spector TD, Probst-Hensch NM, Jarvis D, Jarvelin MR, Heckbert SR, Gudnason V, Boezen HM, Barr RG, Cassano PA, Strachan DP, Fornage M, Hall IP, Dupuis J, Tobin MD, London SJ. Genome-wide joint meta-analysis of SNP and SNP-by-smoking interaction identifies novel loci for pulmonary function. PLoS genetics 2012; 8: e1003098.
S3. Cho MH, McDonald ML, Zhou X, Mattheisen M, Castaldi PJ, Hersh CP, Demeo DL, Sylvia JS, Ziniti J, Laird NM, Lange C, Litonjua AA, Sparrow D, Casaburi R, Barr RG, Regan EA, Make BJ, Hokanson JE, Lutz S, Dudenkov TM, Farzadegan H, Hetmanski JB, Tal-Singer R, Lomas DA, Bakke P, Gulsvik A, Crapo JD, Silverman EK, Beaty TH, Nett Genetics IE, Investigators CO. Risk loci for chronic obstructive pulmonary disease: a genome-wide association study and meta-analysis. The lancet Respiratory medicine 2014; 2: 214-225.
S4. Hersh CP, DeMeo DL, Silverman EK. Respiratory genetics. London & New York: Hodder Arnold; Distributed in the United States of America by Oxford University Press; 2005. p. 253-296.