EVALUATION OF THE GENETIC ARCHITECTURE OF CYSTIC FIBROSIS

by Melis Atalar Aksit

A dissertation submitted to Johns Hopkins University in conformity with the requirements for the degree of Doctor of Philosophy

Baltimore, Maryland March 2020

© 2020 Melis Atalar Aksit All rights reserved

Abstract

Cystic fibrosis (CF) is an autosomal recessive disease caused by variants in CFTR. Individuals with CF show a high degree of variability in disease severity, complications and survival. This phenotypic variability is influenced by a combination of the allelic heterogeneity in CFTR, genetic modifiers outside of CFTR, environmental factors, and stochastic (random) factors. In this study, I evaluate genetic modifiers that influence the age of onset of diabetes occurring as a complication of CF (cystic fibrosis-related diabetes, or CFRD), and the impact of one of the most common CF-causing mutations in CFTR, W1282X.

In Chapter 2, in a genome-wide association study for CFRD, I obtained genome-wide significance for variants at one novel (PTMA) and two known (TCF7L2 and SLC26A9) CFRD genetic modifiers. Furthermore, I determined that CFRD and T2D have more etiologic and mechanistic overlap than previously known, aligning along pathways involving β-cell function rather than insulin sensitivity. In Chapter 3, I identified a variant in a well-studied locus, CDKN2B-AS1, that is associated with markedly earlier onset of cystic fibrosis-related diabetes in females. This finding could partially explain why females with cystic fibrosis develop diabetes at a younger age than males, and provides new information on disease etiology that could affect future risk-assessment strategies. In Chapter 4, I demonstrated how W1282X, one of the most common CF-causing mutations, results in a premature termination codon that induces nonsense-mediated mRNA decay. This finding indicates that transcripts with this mutation will not generate any protein, and hence treatment with CFTR modulators (which act on expressed protein) would not be beneficial.

Taken together, these findings highlight the importance of understanding the genetic etiology of each individual’s disease for appropriate treatment. The nature of the CF-causing mutation in CFTR will inform proper treatment, and genetic risk factors outside of CFTR that influence disease severity may be used to adjust disease surveillance strategies, such as the age or frequency of diabetes screening tests. Furthermore, this work provides a better understanding of the genetic architecture of cystic fibrosis-related diabetes, which gives us insight into the mechanism of this disease.

Thesis Committee

Advisor & 1st Reader: Garry R. Cutting, MD Professor of Pediatrics and Medicine McKusick-Nathans Institute of the Department of Genetic Medicine Johns Hopkins School of Medicine

Advisor & 2nd Reader: Scott M. Blackman, MD, PhD Associate Professor of Pediatrics McKusick-Nathans Institute of the Department of Genetic Medicine Johns Hopkins School of Medicine

Chair & 3rd Reader: Terri H. Beaty, PhD Professor Department of Epidemiology Johns Hopkins Bloomberg School of Public Health

Loyal A. Goff, PhD Assistant Professor of Neuroscience McKusick-Nathans Institute of the Department of Genetic Medicine Johns Hopkins School of Medicine

Dan E. Arking, PhD Professor of Medicine McKusick-Nathans Institute of the Department of Genetic Medicine Johns Hopkins School of Medicine

Dedication

To my parents, to whom I owe this PhD. Their unconditional support and selfless sacrifices have brought me to where I am today.

To my sister, who has been by my side since day one, and has brought so much joy into my life.

To my husband, who has brought out the best in me. I would not have been able to get through my thesis without him, and I know I can get through anything because he is by my side.

Acknowledgements

My thesis would not have been possible without the support of many people I am so lucky to have in my life. First and foremost, I’d like to thank my thesis advisors, Dr. Garry Cutting and Dr. Scott Blackman. I am very lucky to be mentored by these two amazing physician scientists. Dr. Cutting has provided a great lab environment with a lot of support. He keeps all of us motivated and provides us with all the resources we need to be successful. Dr. Blackman patiently guided me on how to do better analyses, paid attention to detail, and helped perfect my work. Together, the two of them provided the perfect balance, which made my doctoral thesis work very enjoyable. I am very grateful for all of their help and guidance. In addition, I would like to thank my colleagues in the lab, who have made me look forward to coming in to lab every day. Dr. Neeraj Sharma has been an unofficial third mentor to me. He is extremely hard working, very knowledgeable, and always happy to answer all of my questions. I am very grateful for all of the opportunities and support he gave me along the way. In addition, the upper year PhD students, Dr. Briana Vecchio-Pagan, Dr. Melissa Lee, Dr. Ted Han, Dr. Allison McCague, and Dr. Anh-Thu Lam, have been a great example for me and provided me advice that allowed my thesis work to go much more smoothly. I am especially grateful for Dr. Anh-Thu Lam, my “kitten pair”, who is always on top of things, has patiently listened and helped me prepare for all of my presentations and exams, and most importantly has become a great, lifelong friend. In addition, I am grateful for Karen Raraigh, who has been a great resource for me; she knows all of the details of all of the genetic variants in CFTR by heart, and always helps put things into perspective from a patient standpoint. I would also like to thank all of the other past and present lab members: Anya Joynt, Alyssa Bowling, Matthew Pellicore, Taylor Evans, Emily Marcisak, Corey Lu, Yasmin Akhtar, Kathleen Paul and Derek Osorio, for the countless great memories and for making my time as a doctoral thesis student so enjoyable. In addition, I am very grateful for Patricia Cornwall, who has made sure that everything goes very smoothly, and has always known how to make me laugh in the process.

I would also like to thank our collaborators at the Center for Inherited Disease Research (CIDR) of JHMI. Dr. Hua Ling, Dr. Peng Zhang, Dr. Elizabeth Pugh, and Kurt Hetrick have been extremely helpful and undoubtedly significantly improved my thesis work. Lastly, I would like to thank the human genetics program. I am grateful for Dr. Barbara Migeon for founding this PhD program, which provided me with so many opportunities. I am also very grateful for Dr. David Valle and Sandy Muscelli for providing an environment filled with so many opportunities and making sure everything goes smoothly in the process. Finally, I would like to thank my fellow doctoral students in the program, for their support and friendship. I owe my thesis to all of these people, who together provided me with this incredible support system. I am very grateful for all of their support, advice, and encouragement.

Table of Contents

Abstract
Thesis Committee
Dedication
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction
1.1 Cystic Fibrosis
1.2 Phenotypic heterogeneity in CF
1.3 Cystic Fibrosis-Related Diabetes
1.4 Genetic Modifiers of Cystic Fibrosis-Related Diabetes
1.5 Nonsense-mediated mRNA Decay and W1282X
Chapter 2: Genetic modifiers of cystic fibrosis-related diabetes have extensive overlap with type 2 diabetes and related traits
2.1 Introduction
2.2 Results
2.3 Discussion
2.4 Methods
Chapter 3: Common variants in CDKN2B-AS1 delay onset of CFRD in females
3.1 Introduction
3.2 Results
3.3 Discussion
3.4 Methods
Chapter 4: Decreased mRNA and protein stability of W1282X limits response to modulator therapy
4.1 Introduction
4.2 Results
4.3 Discussion
4.4 Materials and Methods
Chapter 5: Conclusions
References
Curriculum Vitae

List of Tables

Chapter 2:
Table 2.1 Characteristics of patients enrolled by the studies comprising the International Cystic Fibrosis Modifier Consortium.
Table 2.2 Association statistics for top variant at each genome-wide significant locus.
Table S2.1 Associations of potential covariates with CFRD onset.
Table S2.2 Association statistics of variants that exceeded genome-wide significance.
Table S2.3 Association statistics for all variants that exceeded suggestive significance.
Table S2.4 Association statistics of T2D risk variants previously identified as CFRD modifiers other than TCF7L2.
Table S2.5 Associations of variants with T2D and CFRD that were genome-wide significant in either study.
Table S2.6 Variants included in each polygenic risk score.
Table S2.7 Polygenic Risk Score (PRS) association statistics.
Table S2.8 Variants included in the CFRD PRS.
Table S2.9 Table S2.11. Association summary statistics for F508del only, non F508del homozygotes, and F508del homozygosity * SNP interaction term analyses in variants that exceeded genome-wide significance.

Chapter 4:
Table S4.1 EMGs containing full-length CFTR cDNA and flanking introns faithfully reproduce splicing patterns observed in affected tissues.

List of Figures

Chapter 2:
Figure 2.1 Manhattan plot of Phase 1 + 2 combined association analysis.
Figure 2.2 LocusZoom and Forest Plots of genome-wide significant loci, TCF7L2 (A-B), PTMA (C-D), and SLC26A9 (E-F).
Figure 2.3 Comparison of the genetic risk architectures of CFRD and T2D.
Figure 2.4 Cumulative incidence (A-B) and prevalence (C-D) plots by CFRD PRS in test and replication populations.
Figure S2.1 Q-Q plot for association with CFRD.
Figure S2.2 LocusZoom conditional plots at loci that associate with CFRD onset with genome-wide significance.
Figure S2.3 Epigenetic mapping at the TCF7L2 locus.
Figure S2.4 Epigenetic mapping at the PTMA locus.
Figure S2.5 Epigenetic mapping at the SLC26A9 locus.
Figure S2.6 Direction of effect concordance between CFRD and T2D GWASs.
Figure S2.7 Genetic overlap between T1D and CFRD.

Chapter 3:
Figure 3.1 Kaplan–Meier plot of CFRD onset in males and females.
Figure 3.2 Cumulative incidence of CFRD in males and females binned by rs1333045 genotype.
Figure 3.3 LocusZoom and forest plots of association with CFRD onset at the CDKN2A/B locus.
Figure 3.4 LocusZoom plots of association between rs1333045 and CFRD, T2D and CAD.
Figure 3.5 Manhattan plot of (A) sex*SNP interaction term, (B) male-only, and (C) female-only analyses.

Chapter 4:
Figure 4.1 Assessment of CFTR mRNA bearing W1282X in the primary nasal epithelial cells of healthy heterozygous carriers of W1282X.
Figure 4.2 Transcriptome analysis of the primary nasal epithelial cells of W1282X proband, carrier parents, and controls.
Figure 4.3 Evaluation of W1282X mRNA stability in primary nasal epithelial cells of W1282X carrier parents.
Figure 4.4 Assessment of mRNA stability and protein production in cell line model expressing an expression minigene (EMG) bearing W1282X.
Figure S4.1 Volcano plots of differential gene expression between the proband and control.
Figure S4.2 Normal splicing of WT-EMG i21-24 in Flp-In-293 stable cells.
Figure S4.3 Semi-log plot of the time-course experiment to determine the mRNA half-life.

Chapter 1

Introduction

1.1 Cystic Fibrosis

Cystic Fibrosis (CF) is a life-limiting, progressive, autosomal recessive disease that affects ~70,000 individuals worldwide. It is caused by mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene, which encodes a chloride and bicarbonate channel regulating fluid movement across epithelial tissue layers. Loss of CFTR causes chronic dehydration of the surface fluids of secretory epithelia, which in turn results in chronic obstruction, infection, inflammation, and ultimate destruction of affected organs.

The most severely affected organs with regard to human pathophysiology are the lungs, pancreas, and the gastrointestinal tract. Exocrine pancreatic dysfunction results in malnutrition and poor growth, which leads to death in the first decade of life for most untreated individuals. Pancreatic enzyme replacement and intensive pulmonary, nutritional, and antimicrobial therapies have revolutionized the treatment of cystic fibrosis, resulting in progressive improvements in survival. Based on the 2017 CF Foundation registry data, the life expectancy of individuals with CF born between 2013 and 2017 is predicted to be 44 years. Obstructive lung disease is currently the primary cause of morbidity and is responsible for ~80% of mortality among CF patients.

1.2 Phenotypic heterogeneity in CF

Individuals with cystic fibrosis show a high degree of variability in disease severity, complications and survival. This variable expressivity is influenced in part by the allelic heterogeneity in CFTR. More than 2000 variants have been identified in CFTR, over 300 of which are known to cause CF. These variants disrupt synthesis, processing, and/or function of CFTR to different degrees, which in turn influences the severity of clinical features of CF. Mutations predicted to prevent CFTR biosynthesis or produce nonfunctional protein tend to have more severe consequences, whereas mutations that cause minor changes in the protein function result in a milder phenotype. Functional studies have demonstrated that the CFTR genotype correlates with the severity of many CF manifestations, such as sweat chloride concentration and exocrine pancreatic insufficiency. Diabetes in CF is also highly influenced by the specific CFTR genotype; in the adult population, the overall observed prevalence of CFRD in those with severe CFTR genotypes was greater than in those with mild genotypes (60% vs. 14%; P < 0.0001) (1). In Chapter 4, I discuss the consequences of a nonsense mutation in CFTR, W1282X, which results in a severe CF phenotype.

Though CFTR mutations largely determine the severity of some features of CF, such as sweat chloride levels and pancreatic exocrine insufficiency, there is extensive heterogeneity in many other phenotypes among CF patients carrying the same CFTR genotype. To estimate the degree to which variables beyond CFTR influence variability in disease severity, families with affected twins and siblings have been analyzed. Affected twin and sibling pairs have the same CFTR genotype; however, monozygotic twins are genetically identical, whereas dizygotic twins and siblings share approximately 50% of their genome. Their environmental exposures are similar in utero and while they grow up in the same household, but diverge after they move apart. Hence, twin and sibling studies can provide a naturally balanced, age-matched, powerful dataset to quantify the degree to which trait variance can be attributed to genetic and environmental variance outside of CFTR. The contribution of genetic modifiers to several traits relevant to clinical CF has been estimated through twin and sibling studies as ~50% in lung function, 54-80% in body mass index (BMI), 42-98% in diabetes, and ~100% in meconium ileus.
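The twin and sibling comparison described above can be illustrated with a minimal sketch. It assumes Falconer's formula, a standard textbook approximation in which heritability is twice the difference between the trait correlation in monozygotic pairs and that in dizygotic or sibling pairs. This is not necessarily the method used in the studies cited here, and the correlations below are hypothetical values chosen only for illustration.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Falconer's approximation: h^2 = 2 * (r_MZ - r_DZ), clipped to [0, 1].

    r_mz: trait correlation among monozygotic twin pairs (share ~100% of genome)
    r_dz: trait correlation among dizygotic twin or sibling pairs (share ~50%)
    """
    h2 = 2.0 * (r_mz - r_dz)
    # Sampling noise can push the raw estimate outside the valid range.
    return max(0.0, min(1.0, h2))


# Hypothetical correlations, not taken from any study.
example_h2 = falconer_heritability(r_mz=0.80, r_dz=0.45)
print(f"Estimated heritability: {example_h2:.2f}")
```

The estimate grows as monozygotic pairs become more alike relative to sibling pairs, which is the intuition behind attributing the excess similarity to genetic factors outside of CFTR.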

1.3 Cystic Fibrosis-Related Diabetes

Cystic fibrosis-related diabetes (CFRD) is an important complication of CF. As in all types of diabetes, individuals with CFRD have elevated blood glucose levels (hyperglycemia); however, CFRD is characterized by a gradual decline in production of insulin, while insulin sensitivity is usually normal.

CFRD is a unique type of diabetes, distinct from type 1 and type 2 diabetes. In type 1 diabetes, hyperglycemia is due to absence of insulin-producing β-cells in the pancreatic islets, and usually has an abrupt and symptomatic onset. In type 2 diabetes, hyperglycemia results from a combination of reduced sensitivity to insulin and insufficient production of insulin. Reduced early-phase insulin release, prolonged hyperglycemia, and reactive hypoglycemia are all seen in both CFRD and in type 2 diabetes.

Individuals with CFRD tend to have worse lung disease, poorer nutritional status, and increased mortality, all of which have been shown to improve with treatment of CFRD. Therefore, detection and appropriate treatment of CFRD is a key component of the medical care of an individual with CF.

As of 2017, in the U.S. CF Foundation Patient Registry, the prevalence of CFRD among all living patients was 18.5%, with 5.3% of individuals younger than 18 and 31% of individuals older than 18 affected. The risk of CFRD is about 5x higher in people with CFTR genotypes causing exocrine pancreatic insufficiency than in those with mutations with some residual function resulting in pancreatic sufficiency. In addition, at all ages approximately 50% more females than males are affected by diabetes, even though cystic fibrosis is equally common in males and females.

The exact cause of CFRD is unknown. The strong correlation of CFRD with exocrine pancreatic insufficiency suggests that the abnormally viscous secretions that plug pancreatic ducts in pancreatic insufficient patients could be reducing endocrine pancreatic mass, hence resulting in CFRD. In autopsy studies, CFRD did not correlate with islet number or mass; pancreatic islets are relatively spared, but the number and mass of islets are reduced in all individuals with CF regardless of whether CFRD was present. However, the quantity of surviving β-cells is generally sufficient to otherwise prevent diabetes. Thus, the surviving β-cells in CFRD are believed to have functional deficiencies that also contribute to the development of CFRD. Pancreatic islet amyloid deposition, which is thought to either reflect β-cell dysfunction or actually be detrimental to β-cells, is a characteristic of CFRD. In addition, mice with CF do not have pancreatic insufficiency, but are more prone to develop diabetes after a mild β-cell injury. These observations suggest that β-cells have an intrinsic defect in CFRD.

The only treatment that has been shown to improve outcomes of CFRD is insulin. However, the recent advent of small-molecule therapies aimed at restoring CFTR function has raised the question as to whether these therapies will treat or even prevent CFRD. It is currently unknown whether the causes of poor insulin secretion in patients with CF are due to reversible or irreversible defects. Exocrine pancreatic damage is extensive and thought to be irreversible by 4 years of age or earlier in most patients with CF, and surviving β-cells exist in an abnormal environment and thus might be subject to failure over time even in the face of CFTR functional restoration. Both anecdotal studies on small groups of individuals (2-4), and larger studies in the US and UK patient registries comparing individuals with G551D eligible for the small-molecule therapy ivacaftor to comparators who had never received ivacaftor (5,6), showed some evidence that ivacaftor may reduce CFRD. However, only longer-term studies will be able to address whether restoration of CFTR function can prevent CFRD over a span of years to decades.

1.4 Genetic Modifiers of Cystic Fibrosis-Related Diabetes

As discussed in Section 1.2, a major risk factor for CFRD is the CFTR genotype; however, even among individuals with identical severe CFTR genotypes, there is wide variation in the onset of CFRD. Blackman et al. (7) demonstrated that concordance for diabetes among MZ twins (presumed to be 100% genetically identical) was 0.73, whereas concordance among age- and sex-matched affected siblings (where the sibs carry the same CFTR genotype, but are about 50% identical in the rest of their genome) was 0.18. This difference was statistically significant (p-value: 0.002) and yielded an estimated heritability for diabetes in CF of 0.98 (95% CI 0.42–1.0), demonstrating that genetic variants outside CFTR can contribute significantly to CFRD onset (7).

In addition, Blackman et al. found that a family history of type 2 diabetes increased the risk of diabetes in individuals with CF (OR=3.1; p=0.0009) (8). This raises the possibility that there is genetic overlap between type 2 diabetes and cystic fibrosis-related diabetes. Furthermore, this study demonstrated that a well-recognized T2D risk locus, at TCF7L2, is also associated with CFRD with study-wide significance (8).

In 2013, Blackman et al. conducted a genome-wide association study for CFRD on 3,059 individuals with cystic fibrosis. This identified variants 5’ and intronic of SLC26A9, which had previously been associated with meconium ileus (another complication of cystic fibrosis). In this same cohort, they also observed association between the T2D-associated loci TCF7L2, CDKAL1, CDKN2A/B and IGF2BP2 and CFRD.

In Chapter 2, I conducted a mega-analysis on individuals in the 2013 GWAS along with individuals genotyped in a second phase of this study. In addition, I used polygenic risk scores and various other methods to compare the genetic architecture of CFRD to that of other types of diabetes and their related traits to quantify the genetic overlap between T2D and CFRD.

Though CFRD is more common in females than males, the cause for this is not known. A few studies have tried to understand the reason behind this phenomenon; however, no study to date has reported any genetic variants that could cause earlier onset of CFRD in females.

In Chapter 3, I explore genetic variants that might cause the earlier onset. I conducted stratified analyses (female-only, male-only), tested for a sex*variant interaction term in a GWAS framework, and also tested candidate variants for any female- or male-specific associations.

1.5 Nonsense-mediated mRNA Decay and W1282X

Approximately 10% of individuals with CF harbor at least one nonsense variant in CFTR, which introduces premature termination codons (PTCs) [https://cftr2.org].

Transcripts with a PTC located at least 50-55 nucleotides upstream of the last exon-exon junction are subject to nonsense-mediated mRNA decay (NMD). NMD is thought to serve as an mRNA surveillance mechanism to prevent the synthesis of truncated proteins that have the potential to have toxic effects such as dominant negative interactions. Since no protein is being produced, such variants resulting in a PTC are often very harmful and will result in a severe phenotype.
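The positional rule described above can be written as a small check. This is a minimal sketch that assumes positions are given in spliced mRNA coordinates and uses 50 nucleotides as the default threshold; the function name, the parameter names, and the example coordinates are hypothetical and do not correspond to actual CFTR transcript positions.

```python
def predicts_nmd(ptc_position: int, last_junction_position: int,
                 min_distance: int = 50) -> bool:
    """Apply the 50-55 nucleotide rule for nonsense-mediated mRNA decay.

    ptc_position: position of the premature termination codon in the mRNA
    last_junction_position: position of the last exon-exon junction in the mRNA
    min_distance: minimum upstream distance (assumed 50 here; the rule is
                  usually quoted as a range of 50-55 nucleotides)
    """
    distance_upstream = last_junction_position - ptc_position
    return distance_upstream >= min_distance


# Hypothetical coordinates for illustration only.
print(predicts_nmd(ptc_position=3000, last_junction_position=4200))  # far upstream
print(predicts_nmd(ptc_position=4180, last_junction_position=4200))  # close to junction
```

A PTC that falls in the last exon, or within the threshold distance of the last junction, would return False under this rule and would be expected to escape NMD.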

W1282X is the most common nonsense variant reported in the CFTR2 database [https://cftr2.org]. It results in severe CF; individuals with CF in this database who were homozygous for W1282X have an average sweat chloride of 104 (compared to the average of 96 in all individuals with CF), and 98% of these individuals are pancreatic insufficient (compared to 85% of all individuals with CF). Many studies have previously reported that the nonsense mutation W1282X results in a truncated protein, and that the function of this truncated protein could be improved by treatment with modulators. In Chapter 4, we evaluate the consequence of this variant to inform whether this is indeed the case.


Chapter 2

Genetic modifiers of cystic fibrosis-related diabetes have extensive overlap with type 2 diabetes and related traits

MA Aksit, RG Pace, B Vecchio-Pagan, H Ling, JM Rommens, PY Boelle, L Guillot, KS Raraigh, E Pugh, P Zhang, LJ Strug, ML Drumm, MR Knowles, GR Cutting, H Corvol, SM Blackman. Genetic modifiers of cystic fibrosis-related diabetes have extensive overlap with type 2 diabetes and related traits. JCEM. 19 October 2019.

2.1 Introduction

Diabetes is a frequent complication of cystic fibrosis (CF), an autosomal recessive disorder affecting more than 70,000 individuals worldwide. CF is caused by loss-of-function mutations in the CF transmembrane conductance regulator (CFTR) gene, which is expressed in a variety of epithelial tissues including lungs and pancreas. CF-related diabetes (CFRD) has some aspects in common with type 1 diabetes (T1D) and type 2 diabetes (T2D), but is distinct from both. Development of CFRD involves a generally slow decline in β-cell function and increased production of islet amyloid (9,10), as is the case for T2D. However, unlike T2D, individuals with CFRD generally have normal insulin sensitivity (except during pulmonary disease exacerbations or glucocorticoid treatment) (11). While diabetes in CF is associated with some of the same long-term complications as T1D and T2D (11), the more important complications are more rapid decline in lung function and reduced survival (12).

The prevalence of CFRD increases with age, affecting more than 90% of individuals with common CFTR genotypes causing severe dysfunction by their 6th decade (1). However, age at onset varies considerably (e.g., from 10-50 years old (1)), and it has been shown that CFRD onset is highly heritable independent of CFTR genotype (h2 = 0.98, with 95% CI 0.4-1.0) (7), indicating an important role for genetic modifiers.

Evidence of shared heritability between CFRD and T2D was demonstrated by a family study in which a history of T2D in adult non-CF family members significantly increased the risk of CFRD (OR: 3.1), and known T2D susceptibility variants in TCF7L2 were also associated with CFRD (8).

In a genome-wide association study involving 3,059 unrelated individuals with CF (644 with CFRD) (“GWAS Phase 1” (13)), the International CF Gene Modifier Consortium identified genetic variants within and 5′ of the SLC26A9 gene that associated with age at onset of CFRD (hazard ratio [HR]: 1.38; p-value: 3.6e-8). Additionally, the previous association of CFRD with variants in TCF7L2 based on candidate gene studies was supported, and T2D-associated variants at three candidate loci (CDKAL1, CDKN2A/B, and IGF2BP2) associated with CFRD (p-value < 0.004). These five loci were estimated to account for 8.3% of the phenotypic variance in CFRD onset and had a combined population-attributable risk of 68% (13).

Given the partial phenotypic overlap between CFRD and T2D and the much earlier age at onset of diabetes in CF, we explored whether severe reduction in CFTR function in the presence of T2D risk alleles sensitizes individuals to development of diabetes. To do this, we analyzed 2,697 new subjects, updated the phenotypes of previously studied individuals (for a total of 5,740 individuals with CF; 5,364 of whom are unrelated), and tested for association with CFRD using genome-wide markers.

Comparison of association statistics with those for T2D and T1D, and polygenic risk scores (PRSs) for T1D, T2D and other diabetes-related traits revealed deep commonality but also distinct endophenotypic differences in the genetic architectures of T2D and CFRD.
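As a brief illustration of what a polygenic risk score is, the sketch below computes one in its simplest common form: a weighted sum of risk-allele dosages (0, 1, or 2 copies) with per-variant effect sizes as weights. The variant identifiers, weights, and genotypes are hypothetical placeholders, and the scoring used in this chapter may differ in detail (for example, in how variants are selected or how missing genotypes are handled).

```python
def polygenic_risk_score(dosages: dict, weights: dict) -> float:
    """Weighted sum of risk-allele dosages.

    dosages: variant id -> number of risk alleles carried (0, 1, or 2)
    weights: variant id -> effect size (e.g., log odds ratio or log hazard ratio)
    Variants missing from `dosages` are skipped, which is an assumption of
    this sketch rather than a recommended handling of missing genotypes.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


# Hypothetical effect sizes and one hypothetical individual's genotypes.
weights = {"variant_a": 0.20, "variant_b": -0.10, "variant_c": 0.05}
dosages = {"variant_a": 2, "variant_b": 1, "variant_c": 0}
print(f"PRS: {polygenic_risk_score(dosages, weights):.2f}")
```

Once computed for every individual, such a score can be tested for association with an outcome in the same way as a single variant.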

2.2 Results

2.2.1 Variants at three loci were associated with age at onset of CFRD at genome-wide significance

Genome-wide association with CFRD age at onset was performed on 5,740 subjects with CF with two severe CFTR mutations and/or clinically diagnosed exocrine pancreatic insufficiency, of whom 1,341 have CFRD (combined Phase 1 and Phase 2; see Table 1 and Methods). A separate replication data set (Phase 2 Replication; 2R) was composed of 591 individuals (204 with CFRD). A Cox proportional hazards (CoxPH) model was used to test for association (event = diagnosis of CFRD; time = age at CFRD diagnosis or age at last clinic visit if no CFRD). Study site and 4 principal components were included as covariates for an ‘unadjusted’ analysis, and sex was included as an additional covariate for an ‘adjusted’ analysis (Table S2.1).
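The time-to-event coding described above can be sketched as follows. This is a minimal, self-contained illustration and not the analysis pipeline used in this study: it fits a Cox model with a single covariate (a variant dosage) by Newton-Raphson on the partial log-likelihood, omits the study-site, principal-component, and sex covariates, and uses a hypothetical toy cohort. All function and variable names are assumptions made for the example.

```python
import math


def survival_outcome(age_at_cfrd, age_at_last_visit):
    """Return (time, event): age at diagnosis if diagnosed, else censored at last visit."""
    if age_at_cfrd is not None:
        return age_at_cfrd, 1
    return age_at_last_visit, 0


def fit_cox_one_covariate(times, events, x, n_iter=50, tol=1e-10):
    """Estimate the log hazard ratio for one covariate (Breslow handling of ties)."""
    beta = 0.0
    for _ in range(n_iter):
        grad, hess = 0.0, 0.0
        for i, (t_i, e_i) in enumerate(zip(times, events)):
            if not e_i:
                continue  # censored subjects contribute only through risk sets
            s0 = s1 = s2 = 0.0
            for t_j, x_j in zip(times, x):
                if t_j >= t_i:  # subject j is still at risk at time t_i
                    w = math.exp(beta * x_j)
                    s0 += w
                    s1 += w * x_j
                    s2 += w * x_j * x_j
            mean = s1 / s0
            grad += x[i] - mean
            hess -= s2 / s0 - mean * mean
        if hess == 0.0:
            break
        step = grad / hess
        beta -= step
        if abs(step) < tol:
            break
    return beta


# Hypothetical toy cohort: (age at CFRD diagnosis or None, age at last visit, dosage).
records = [(12, 14, 1), (15, 20, 1), (18, 21, 0), (None, 20, 1),
           (22, 30, 0), (25, 26, 1), (None, 30, 0), (None, 35, 0)]
outcomes = [survival_outcome(dx, last) for dx, last, _ in records]
times = [t for t, _ in outcomes]
events = [e for _, e in outcomes]
x = [dosage for _, _, dosage in records]

beta = fit_cox_one_covariate(times, events, x)
print(f"log HR = {beta:.3f}, HR = {math.exp(beta):.2f}")
```

In this toy cohort carriers tend to be diagnosed earlier, so the estimated hazard ratio is above 1. A genome-wide analysis would repeat such a fit for each variant, with the additional covariates included in the model.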

In the unadjusted analysis, variants exhibiting genome-wide significant association with CFRD age at onset were identified on chromosomes 1, 2, and 10 (Figure 2.1, Table 2, Figure S2.1, Table S2.2). In the adjusted analysis, the variants on chromosomes 1 and 10 exceeded genome-wide significance (Table 2), and no other locus exceeded genome-wide significance. Variants in 6 additional loci (RASAL2, UNQ6975, ELFN1, IMMP2L, MCPH1, and CYP11B2) were associated with CFRD with suggestive significance (p-value < 1.8e-6) (Table S2.3). For all loci, results were similar when CoxPH or linear regression using Martingale residuals was used to account for the different ages at onset of diabetes (data not shown). There was no significant heterogeneity in association based on site or platform for any locus (Figure 2.2).

The chromosome 10 locus contains associated variants within and around the TCF7L2 gene exceeding genome-wide significance (e.g., rs34872471; combined Phase 1 + 2 p-value: 2.80e-12; Phase 1 p-value: 2.58e-06; Phase 2 p-value: 9.69e-8) (Table S2.2). The same variants have been shown in several populations to be associated with increased T2D risk with the same direction of effect (14) and were previously reported to be associated with CFRD using a candidate gene based approach (8,13). CFRD-associated variants in TCF7L2 all appear to be in high LD with one another (Figure 2.2A), and conditional analysis showed the CFRD-associated variants at this locus (including rs7903146) tag the same genetic association signal as rs34872471 (Figure S2.2A). Both rs7903146 (T allele) and rs34872471 (C allele) showed evidence of association with CFRD in the Phase 2 replication dataset (N=591, one-sided p-values 1.90e-02 and 2.41e-2, respectively), and including an interaction term in the model demonstrated that the association with CFRD did not differ significantly between the replication dataset and the test dataset (p-values 0.54 and 0.46, respectively). All variants exceeding genome-wide significance at this locus fell into open chromatin regions intronic of TCF7L2 (Figure S2.3; data from Roadmap Epigenome Browser and UCSC Genome Browser).

The chromosome 2 locus contains multiple associated variants surrounding PTMA, PDE6D and COPS7B, with the most significantly associated variant located 12.5 kb 5’ of PTMA (e.g., rs838455; p-value: 2.98e-8; HR: 1.75; 95% CI: 1.43-2.13; Figure 2.2B). Association of rs838455 with CFRD was replicated in the Phase 2 replication dataset (N=591, one-sided p-value: 2.37e-2; HR: 1.67). In the analysis adjusting for sex, the p-values for variants at this locus were less significant in the Phase 1+2 dataset (p-value: 7.60e-8, HR: 1.74; Table 2), and more significant in the replication dataset (one-sided p-value: 2.06e-2; HR: 1.71). Individuals from both Phase 1 and Phase 2 contributed to this association signal (Table S2.2). Conditional analysis shows all CFRD-associated variants tag the same genetic association signal as rs838455 (Figure S2.2B). None of these variants are significant eQTLs for any gene within 100 kb of these variants in the pancreas, adipose tissue, brain or muscle tissue contained in the GTEx database (GTEx, v7, dbGaP Accession phs000424.v7.p2) (15). Of the genome-wide significant variants, rs838440 is located in the region predicted to be the promoter for PTMA by Chromatin State Segmentation using a Hidden Markov Model from ENCODE/Broad in many different cell types. This region has open chromatin based on DNase hypersensitivity tests across 125 cell types by ENCODE (16) (Figure S2.4), and falls on a transcription factor binding site of SMARCA4, VDR, RBL2, CTCF, FOS and ETS1 according to the Open Regulatory Annotation database (17). Additionally, the ancestral allele at this variant position is conserved across species (GERP score: 0.768).

The third locus at chromosome 1 encompassed variants located 5’ and intronic of the SLC26A9 gene (Figure 2.2C; rs4077468 p-value 2.25e-8, HR: 1.38), that were previously reported to be associated with CFRD in the Phase 1 study and replicated in a

Phase 1 replication population(13). Association between rs4077468 and CFRD onset was demonstrated in analyses of the Phase 2 individuals (some of whom were included in the

Phase 1 replication study; Table S2.3; rs4077468 Phase 2 p-value: 1.21e-2; Phase 1 p- value: 1.20e-7) and the subset of Phase 2 individuals who were not in the Phase 1 replication study (n=2170; p-value: 1.1e-2). In the Phase 2 replication data set (N=591), rs4077468 was not significantly associated with CFRD onset (p-value: 0.65, HR: 1.06),

however, interaction term analyses demonstrated that the effect size did not significantly differ between the replication dataset and test dataset (p-value: 0.107), suggesting the

Phase 2 replication may have limited power to further replicate association at this locus.

Conditional analysis demonstrates all of the significant variants tag the same genetic association signal as rs4077468 (Figure S2.2C). It is noted that the genotyping platforms all contain a 200 kb gap reflecting a former gap in the reference sequence.

To identify genes at the SLC26A9 locus whose expression might be affected by the CFRD modifier variants, eQTL data from GTEx (15) were analyzed. The variants in question are correlated not only with SLC26A9 expression in pancreas (e.g., rs4077468,

P=0.026; beta=0.19) but also are significant eQTLs for the adjacent gene, PM20D1

(rs4077468 eQTL p-values: pancreas: 7.1e-3; subcutaneous adipose: 7.5e-6; visceral

(omentum) adipose: 6.8e-4), raising the possibility these CFRD modifier variants could affect SLC26A9 and/or PM20D1 expression. No variant that exceeded genome-wide significance at this locus stood out based on chromatin state or conservation at its position (Figure S2.5; data from Roadmap Epigenome Browser and UCSC Genome Browser).

2.2.2 CFRD risk correlates strongly with genetic risk of T2D, and weakly with that of

T1D

To test broadly for variants associated with both CFRD and T2D, we compared the p-values of all variants from a T2D genome-wide association study (GWAS) carried out by the DIAGRAM consortium (18) to the p-values from our study (Figure 2.3A;

Table S2.6). The DIAGRAM consortium study is well powered and includes subjects of

mostly European descent, similar to our study. Known T2D loci selected using a candidate approach that were previously associated with age at onset of CFRD (at TCF7L2,

CDKAL1, IGF2BP2 and CDKN2A/B) (8,13) maintained significant association in this larger study (Figure 2.3A; Table S2.4). Conversely, CFRD-associated variants at

SLC26A9 and PTMA were not significantly associated with T2D (e.g., SLC26A9 top variant rs4077468 p-value: 0.13; PTMA top variant rs838455 p-value: 0.89) (18) (Figure

2.3A). Similarly, a comparison with the T2D GWAS(19) demonstrated that the hazard ratio (current study) and odds ratio (18) of variants at TCF7L2, CDKAL1, IGF2BP2 and CDKN2A/B were correlated, whereas those of variants at SLC26A9 and PTMA were not (Figure 2.3B; Table S2.5).

To quantify the shared heritability between CFRD and T2D, we estimated the genetic correlation between the CFRD and T2D(18) GWAS summary statistics using

LD-score regression (see Methods). We found that although the genetic correlation was high (0.6477; possible range -1 to 1), it was not statistically significant (SE: 0.983, p-value: 0.51). To further assess the genetic overlap between T2D and CFRD, we created a log-OR-weighted PRS for T2D from distinct loci reported in the most recent T2D GWAS by

DIAGRAM (18) (see Methods) (Table S2.6). This T2D PRS was significantly associated with CFRD (p-value 8.84e-17; HR: 1.29; Table S2.7; Figure 2.3C), and

CFRD onset in individuals with high T2D PRSs (PRS>6; n=973) was at a significantly younger age than in individuals with low T2D PRSs (PRS≤4; n=1467) (log-rank p-value:

1.03e-13). After excluding the four loci associated with CFRD previously and in this study (see above; TCF7L2, CDKAL1, CDKN2A/B and IGF2BP2), the T2D PRS retained significant association with CFRD (p-value:1.45e-6, HR: 1.19). To assess which variants

are responsible for the remaining association signal, we identified the variants that passed an FDR threshold of 0.1 using the Benjamini-Hochberg procedure. Variants at 13 loci were significant; 10 novel (CEBPB, ADCY5, LTK, SLC2A2, ANK1, BCAR1, GLIS3,

ETS1, SHQ1 and SLC30A8), and 3 previously known to influence CFRD (see above;

TCF7L2, CDKN2A/B and CDKAL1).
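The Benjamini-Hochberg step used to pick out individual variants from a PRS can be illustrated with a short Python sketch. This is not the code used in this study; the function name and the example p-values are hypothetical, and only the FDR threshold of 0.1 is taken from the text.

```python
def benjamini_hochberg(p_values, fdr=0.1):
    """Return a list of booleans, True where a test is declared significant.

    Standard step-up rule: sort the m p-values, find the largest rank k
    with p_(k) <= (k / m) * fdr, and reject every test ranked k or better.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical per-variant association p-values (not from this study).
example_p = [0.001, 0.2, 0.03, 0.04, 0.9]
significant = benjamini_hochberg(example_p, fdr=0.1)
```

In this toy input, the first, third and fourth p-values fall under their rank-specific thresholds, so those three tests are flagged.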

Two approaches were used to assess genetic overlap between CFRD and T2D beyond the variants associating with either trait with genome-wide significance. First, we constructed an optimized PRS using PRSice-2 (20) including variants with p-values <

0.0025 (1067 variants) reported in the most recent T2D GWAS by DIAGRAM (18). This

T2D PRS was also significantly associated with CFRD (p-value: 1.80e-18, HR: 1.297) indicating that variants beyond those reaching genome-wide significance influence both

CFRD and T2D. Second, we tested whether the direction of effect was concordant (i.e., having the same risk allele for T2D and CFRD) more often than the 50% expected by chance. To do so, variants outside of TCF7L2, CDKAL1, IGF2BP2 and

CDKN2A/B were LD-pruned (r-squared>0.1 within 500kb windows), then sorted by their

Fisher-combined p-value for association with CFRD (this study) and T2D (DIAGRAM)

(18). Among the top ranked 5,000 variants, the CFRD and T2D risk alleles were the same for 2,643 (instead of 2,500 expected by chance; p-value 0.004), demonstrating that there could be > 100 independent association signals associated with both CFRD and T2D. The percentage of variants within 1500-variant bins with concordant direction of effect in

CFRD and T2D (Figure S2.6A), and the chi-squared p-values in comparison to expected were plotted (Figure S2.6B; see Methods).

In contrast, a comparison of the association p-values from a T1D GWAS(21) with the current study did not highlight any variants associated with both diseases. Variants were either associated with CFRD only (e.g., at SLC26A9, PTMA and TCF7L2), T1D only (e.g., at HLA, IGF2 and PTPN22), or neither phenotype (Figure S2.7A). A T1D

PRS constructed from independent non-HLA T1D variants reported (48 variants(22);

Table S2.6) showed weak evidence of association with CFRD (p-value:2.45e-02, HR:

1.08), and CFRD age at onset in individuals with higher T1D PRSs (PRS>7; n=535) and lower T1D PRSs (PRS4; n=760) were significantly different (log-rank p-value: 0.02;

Figure S2.7B). A genome-wide T1D PRS generated using PRSice-2 (20) (1029 variants; p-values<0.06; see Methods) was significantly associated with CFRD (p-value: 7.1e-05,

HR: 1.13), indicating that there might be some overlap between CFRD and T1D. Of the

1029 variants included in the PRS, 567 associated with CFRD and T1D in the same direction; however, none had an FDR<0.1 using the Benjamini-Hochberg procedure. Of note, the T1D PRSs did not include variants at the HLA locus, which has a large effect size on T1D. Variants at the HLA locus did not associate with CFRD (Figure S2.7A).

2.2.3 CFRD is associated with genetic risk for reduced β-cell function, not insulin resistance

To evaluate the effect of genetic risk for insulin secretion and insulin action on

CFRD, we constructed trait-specific PRSs in three different ways. First, we used PRSs previously constructed (23) for homeostatic model assessment of β-cell function (HOMA-B; 20 variants) and homeostatic model assessment for insulin resistance (HOMA-IR; 10 variants). A second pair of PRSs included T2D-associated variants classified as influencing either insulin secretion (14 variants; 10 overlapping with HOMA-B), or insulin action (7 variants; 4 overlapping with HOMA-IR)(19). Third,

PRSs were constructed using PRSice-2 (20) from summary statistics of genome-wide association studies on HOMA-B and HOMA-IR(24) (3940 and 233 variants, respectively).

The HOMA-B and the insulin secretion PRSs significantly associated with CFRD onset (p-value: 1.47e-09; HR: 1.192, and p-value: 7.57e-18; HR: 1.247, respectively).

Similar results were obtained after removing variants known to influence CFRD (p-value:

9.92e-03; HR: 1.081, and p-value: 1.20e-02, HR:1.078, respectively) (Table S2.7). Of the variants included in the HOMA-B PRS, variants at 9 loci had an FDR<0.1 with the

Benjamini-Hochberg procedure; 5 were novel loci (ADCY5, SLC30A8, MAEA, GLIS3,

DGKB), of which ADCY5, GLIS3 and SLC30A8 were also significant in the T2D PRS

FDR analysis. The remaining 4 loci are known to be associated with CFRD (see above;

TCF7L2, CDKAL1, CDKN2A/B, IGF2BP2). Of the 14 variants included in the insulin secretion PRS, 7 significantly associated with CFRD (FDR<0.1), all of which were included in the HOMA-B PRS and had an FDR<0.1 (at TCF7L2, CDKAL1, CDKN2A/B,

IGF2BP2, ADCY5, SLC30A8 and GLIS3). In addition, the HOMA-B PRS constructed using PRSice-2, which includes 3940 variants with p-values < 0.02757 was significantly associated with CFRD (p-value: 1.50e-03, HR:1.103), demonstrating overlap in the genetic risk variants for CFRD and insulin secretion outside of the most significantly associated variants.

Neither the insulin action PRS nor the HOMA-IR PRS associated with CFRD (p-value: 0.949; HR: 0.998, and p-value: 0.903, HR: 1.004, respectively). An optimized PRS constructed using PRSice, including variants with p-values<2.54e-4 in the HOMA-IR GWAS (233 variants) showed weak evidence of association with CFRD (p-value:

0.02, HR: 1.070) (Table S2.7). None of the variants included in the PRS individually associated with CFRD with an FDR<0.1. Thus, genetic risk for reduced β-cell function

(HOMA-B and insulin secretion PRSs) was strongly associated with CFRD while the genetic risk for insulin resistance (HOMA-IR and insulin action PRSs) was associated with CFRD either weakly or not at all. This is also illustrated in Figure 2.3D, in which variants associated with HOMA-B (blue) (24) tended to be associated with CFRD, while variants associated with HOMA-IR (red) (24) were not.

In addition, we constructed PRSs for post-challenge glucose concentration (2- hour plasma glucose; 2hPG; 9 variants (25)), fasting plasma glucose levels (FPG; 36 variants (25)), autoimmune thyroid disease (7 variants; (26)), islet autoimmunity (8 variants (27)) and hemoglobin A1c (10 variants; (28)). Each PRS was constructed from variants reported in previous studies to be associated with the respective phenotype, with

1 variant per locus included (Methods; Table S2.6). The 2hPG and FPG PRSs were associated with CFRD onset (p-values 2.99e-8 and 1.25e-4, respectively), however upon removal of variants previously known to influence CFRD, the association was no longer significant. The autoimmune thyroid, islet autoimmunity and hemoglobin A1c PRSs were not associated with CFRD onset.

2.2.4 Construction of a CFRD PRS

With a more expansive list of variants that modify CFRD, including 3 genome-wide significant loci and 15 variants passing an FDR threshold of 0.1 in either or both of the T2D and HOMA-B PRSs, we constructed a log-HR weighted CFRD PRS from these loci

(n=18; Table S2.8), which, as expected, was associated with CFRD in our discovery population (p-value: 2.7e-51, HR: 1.54; Figures 4A and 4C). This PRS was validated by demonstrating an association with CFRD onset in the independent replication population

(n=591; 204 with CFRD; p-value: 5.8e-6, HR: 1.35; Figures 4B and 4D), and the ROC area under the curve was 0.5798. A 16-variant PRS not including the CF-specific loci (at

SLC26A9 and PTMA) was also significantly associated with CFRD in the replication population (p-value: 1.90e-5; HR: 1.29).

In addition, we constructed a CFRD PRS using PRSice-2 (20) which included 188 variants with a p-value<0.001 in the discovery sample. This PRS was also significantly associated with CFRD (p-value: 3.1e-03, HR: 1.21), and the ROC area under the curve was 0.5312.
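To make the ROC area under the curve concrete, the following Python sketch computes it as the probability that a randomly chosen individual with CFRD has a higher PRS than a randomly chosen individual without CFRD. It assumes CFRD status is treated as a binary label, which may not match exactly how the values above were obtained; the function name and the toy scores are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise case/control comparisons.

    labels: 1 for cases (e.g., with CFRD), 0 for controls (without CFRD).
    Ties between a case and a control count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical PRS values and CFRD status (not from this study).
example_scores = [5.1, 4.2, 6.3, 3.9, 5.1, 4.8]
example_labels = [1, 0, 1, 0, 0, 1]
example_auc = roc_auc(example_scores, example_labels)
```

An AUC of 0.5 corresponds to no discrimination, so values such as those reported above indicate modest separation between the groups. The pairwise loop is quadratic in sample size; a rank-based formula would be preferable for large cohorts.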

2.2.5 Associations at CFRD modifier loci are not CFTR genotype specific

We tested whether the modifier variants were acting in a CFTR-genotype-specific fashion. F508del is the most common CF-causing variant in CFTR. It causes misfolding of the CFTR protein, leading to degradation and absence of CFTR at the cell surface. We conducted F508del homozygote only (n=2,303) and non-F508del homozygote only (n=1,800) subset association analyses on the genome-wide significant loci to see if association signals were specific to either subgroup. Additionally, we conducted an F508del homozygosity interaction term analysis (in which homozygosity was coded as 0 or 1) to determine whether the associations between CFRD and the modifier variants differ between F508del homozygotes and non-F508del homozygotes. Association signals at TCF7L2, SLC26A9 and PTMA did not show any F508del-specific effect (Table

S2.9).
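A simple way to see what a subgroup-specific effect would look like is to compare the two subgroup hazard ratios directly. The Python sketch below is a two-sample z-test on the difference in log hazard ratios, which is a simpler stand-in for the interaction term analysis described above and assumes the two subgroups are independent. The function name and all numeric inputs are hypothetical.

```python
import math


def heterogeneity_test(hr_a, se_a, hr_b, se_b):
    """Test whether two hazard ratios differ.

    hr_a, hr_b: hazard ratios estimated in two independent subgroups.
    se_a, se_b: standard errors of the corresponding log hazard ratios.
    Returns (z, two-sided p-value) under a normal approximation.
    """
    diff = math.log(hr_a) - math.log(hr_b)
    z = diff / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical subgroup estimates (not from this study).
z_stat, p_value = heterogeneity_test(hr_a=1.40, se_a=0.08, hr_b=1.30, se_b=0.09)
```

A large p-value here would be read the same way as a non-significant interaction term: no evidence that the modifier effect differs between the subgroups.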

2.3 Discussion

This study provides evidence that genetic modification of CFRD incorporates pathways both in common with and dissimilar to T2D, and that T2D and CFRD have etiologic and mechanistic overlap to greater depth than previously known. We had previously demonstrated that a family history of T2D substantially increases the risk of developing CFRD (OR: 3.1) and that variants within TCF7L2, CDKAL1, CDKN2A/B and IGF2BP2 that influence T2D (29-31) are associated with risk of CFRD (8,13). The genetic overlap contrasted with clinical and pathophysiologic differences; people with CFRD tend to have normal insulin sensitivity and reduced/abnormal production of insulin, unlike T2D, which is due to a combination of reduced insulin sensitivity and insufficient production of insulin.

In the current study, GWAS has identified a novel modifier locus on chromosome

2 (PTMA), and replicated the previous association between modifier variants at the

SLC26A9 locus, which were both specific to CFRD. On the other hand, T2D risk variants at TCF7L2 were associated with CFRD achieving genome-wide significance, and the newly genotyped samples provided replication of variants at IGF2BP2, confirming there is a genetic overlap between T2D and CFRD. Moreover, multiple lines of evidence demonstrated that additional T2D susceptibility loci influence CFRD.

A key advance from this work was an increased understanding of the relationship between CFRD and related metabolic traits. As genetic variants known to influence

complex phenotypes (such as diabetes) are now better understood, PRSs can be used to explore the genetic overlap between various diabetes endophenotypes. Here, we demonstrated PRSs for insulin secretion, HOMA-B, FPG and 2hPG in the general population are associated with CFRD among CF patients. An in-depth look revealed the variants associated with both T2D and CFRD were frequently associated with HOMA-B (β-cell function), and none with HOMA-IR (insulin sensitivity) (Figure 2.3D). Taken together, these results indicated, in general, CFRD is modified by variants that tend to affect beta cell function rather than insulin sensitivity.

TCF7L2 is a well-known T2D-associated locus, and candidate studies have shown that it also influences risk of CFRD(8,13). Previous studies have shown rs7903146, one of the variants in TCF7L2 most significantly associated with T2D and CFRD, falls in a FOXA2 binding site (32), is located in islet-selective open chromatin (33), alters enhancer activity (33), and is considered a plausible causal variant for earlier onset of diabetes. Many studies have examined how variants at TCF7L2 contribute to diabetes; however, the mechanism remains unknown(34). TCF7L2 encodes a transcription factor, and it has been shown that reduction of TCF7L2 inhibits insulin secretion (35,36). This could similarly influence age at onset of CFRD, as a key feature of CFRD is decreased insulin secretion. Alternatively, a study has shown that rs7903146 influences the expression of a nearby gene, ACSL5, which encodes an acyl-CoA synthetase essential for fatty acid metabolism(37).

Additional T2D susceptibility genes also contribute to CFRD risk. By direction of effect concordance analysis, we found evidence that more than 100 additional independent loci are associated with both CFRD and T2D. Of the variants comprising the

T2D PRS, 10 novel variants (in addition to 3 previously identified variants) were significantly associated with CFRD using an FDR-based approach. Of these, the top variant at CEBPB (rs1169802, C allele) is in high LD with the variant (rs2094716; A allele) associated with meconium ileus in individuals with CF(38). A similar FDR-based approach taken with the HOMA-B PRS identified 2 additional loci (MAEA, DGKB) that contribute to this CFRD association.

Three loci were identified by both the T2D and HOMA-B FDR analyses (ADCY5,

GLIS3 and SLC30A8). ADCY5 encodes an adenylate cyclase enzyme which catalyzes the production of cyclic AMP, which is a second messenger molecule involved in insulin secretion. A previous study has shown that the risk-allele at this locus disrupts an islet enhancer, resulting in reduced ADCY5 expression and impaired insulin secretion(39), and it is plausible that variants at this locus are acting on CFRD through the same mechanism.

Defective expression of Glis3 has been shown to drive increased levels of beta cell apoptosis and senescence in non-obese diabetic mice (40). SLC30A8 transports zinc from the cytoplasm to insulin secretory granules in the pancreatic beta cells, and variants at this locus are associated with lower beta cell function and lower plasma insulin levels

(41). Consistently, all three of these loci are thought to influence T2D by affecting insulin secretion. Therefore, their association with CFRD demonstrates genetic predisposition to decreased insulin secretion increases risk for CFRD.

Studies have shown a T1D PRS is good at discriminating T1D from T2D (42) and can be used to identify patients with T2D who rapidly progress to insulin therapy (43).

Interestingly, we found some evidence of overlap between the genetic architecture of

CFRD and T1D using PRSs, even though PRSs for the T1D-related endophenotypes

(islet autoimmunity and autoimmune thyroiditis) were not associated with CFRD. Previous studies reported HLA haplotypes associated with T1D and autoantibody levels are not risk factors for CFRD (44,45). An FDR approach did not reveal any individual variant within the T1D PRS that associates with CFRD; therefore, a more highly powered analysis on more individuals will be needed to dissect the basis of the genetic overlap between T1D and CFRD.

The newly identified locus on chromosome 2 contains a compelling candidate gene, PTMA. PTMA encodes Prothymosin-α, which is involved in oxidative stress, inflammation, cell proliferation and apoptosis, all of which can plausibly be argued to be involved with CFRD pathogenesis. Recently, a study of 185 T2D and non-diabetes subjects showed the serum PTMA level was higher in T2D patients compared to healthy subjects(46). PTMA may also affect insulin sensitivity by acting as a ligand for Toll-like receptor 4(47). Additionally, transgenic mice overexpressing Prothymosin-α (ProT) had insulin resistance, and silencing hepatic ProT expression in mice ameliorated high-fat diet-induced insulin resistance. Insulin sensitivity in CF can be substantially reduced at times of acute pulmonary exacerbation, often a time of increased inflammation.

Furthermore, a study has shown that thymosin alpha could rectify the multiple tissue defects of CF mice and cells from subjects with the most common CF-causing mutation,

F508del (48), though another group was unable to replicate these findings(49). We hypothesize this variant is influencing PTMA expression, though it does not appear so in

GTEx, perhaps because the eQTL calculations in GTEx are based on whole tissue instead of being cell-type specific. In addition to PTMA, CFRD-associated variants 3’ of PTMA are located within and near COPS7B and PDE6D. COPS7B is a protein component of the COP9 signalosome complex, which is involved in regulation of the ubiquitin conjugation pathway. PDE6D is a phosphodiesterase involved in the phototransduction cascade.

Either of these genes could also be implicated in CFRD, although there is no evidence of these genes acting on diabetes in the general population to date.

The variants associated with CFRD at the SLC26A9 locus are 5’ and intronic (all noncoding), suggesting a role in gene regulation. SLC26A9 encodes a bicarbonate and chloride transport protein (50,51), and is a good biologic candidate as a modifier of

CFRD for several reasons. First, in vitro studies have shown that SLC26A9 interacts with

CFTR via its STAS domain and PDZ-binding motif (52). Second, a CFRD-associated variant in SLC26A9 was associated with exocrine pancreatic dysfunction (assessed by immunoreactive trypsinogen (IRT) levels at birth) (53). Third, the CFRD-associated variants in and near SLC26A9 have also been associated with risk for meconium ileus in

CF (54), a complication that requires the presence of pancreatic exocrine insufficiency

(55), and these modifier variants colocalized with eQTLs in the pancreas (38). Finally,

Slc26a9 knockout mice with CF manifest increased rates of mortality due to intestinal obstruction (56). Although SLC26A9 is a compelling candidate causal gene, the CFRD modifier variants at SLC26A9 are also significantly associated with expression of a neighboring gene, PM20D1. The PM20D1 gene is located 63 kb downstream of

SLC26A9, and encodes a secreted enzyme that acts as a regulator of N-acyl amino acids. PM20D1 is expressed across many tissues (15), and has been found to be involved in energy expenditure in brown and beige adipocytes, where it leads to reduced fat mass and lowered glucose (57). Thus, both PM20D1 and SLC26A9 are plausible candidate genes for CFRD modifiers.

There are several limitations of this study. First, diabetes is defined by data available in the clinical chart and/or CF Foundation Patient Registry, which does not always include laboratory confirmation. Our previous analysis found good agreement between the clinical and laboratory-based diagnoses of diabetes (13). Also, annual updates of the Patient Registry provide increasingly accurate ascertainment of CFRD as the cohort ages. Second, as with any association study, these data do not implicate any specific variant or gene as causal for CFRD. Further investigation will be required to identify the molecular mechanisms implicated in this study. Another limitation of the study is sample size; though this is a large group for a Mendelian disease, a much larger sample size would have increased power to detect more genetic variants for a complex phenotype such as diabetes. Additionally, most of the variants included in this study were imputed and may therefore carry imputation errors. We removed variants that were poorly genotyped or imputed, and this could have resulted in missing some associated loci.

Studying modifiers of Mendelian disorders is not only informative of that disorder, but also could be informative for the general population. By comparing summary statistics of T2D and CFRD GWASs, we identified two risk loci that are CFRD specific (at SLC26A9 and PTMA) and 16 loci that influence both CFRD and T2D (such as

TCF7L2 and CDKAL1). The overlapping variants tend to affect beta cell function rather than insulin sensitivity, supporting the hypothesis that CFRD is less related to insulin resistance than insulin secretion. Investigations of these differences hold the promise of delivering insight into overlapping and non-overlapping molecular etiologies of CFRD and T2D. Studying sequence data of a larger cohort and conducting functional studies on

our variants and genes of interest will be useful to further test hypotheses on variants influencing CFRD.

2.4 Methods

2.4.1 Samples, genotyping and quality control

Individuals with CF who have two severe CFTR mutations and/or exocrine pancreatic insufficiency were recruited in two phases (total n=5,740). Phase 1 subjects are those included in the prior CFRD genome-wide association study (GWAS) (13) excluding 16 subjects following additional data cleaning (n=3,043). These individuals were recruited from three cohorts (Johns Hopkins Twin and Sibling Study (TSS),

Canadian CF Gene Modifier Study (CGS) and Genetic Modifier Study (GMS))

(13,54,58). Phase 2 subjects included individuals from the above cohorts (TSS, CGS, and

GMS) who were recruited or acquired CFRD phenotype information since Phase 1, or who were excluded from Phase 1 for relatedness. Phase 2 also includes an additional cohort from the French CF Gene Modifier Consortium (FrGMC; Phase 2 n=2,697). The

GMS subjects in Phase 1 were recruited for an extremes-of-phenotype study (severe vs. mild CF lung disease), and were all F508del homozygotes; GMS subjects in Phase 2 included the above plus additional participants recruited without regard to genotype or lung function by the GMS study, or by CF centers at Children’s Hospital of

Denver, Wisconsin, and Boston (see (59) for more information). CGS participants were recruited from CF centers in Canada without regard to genotype or lung function. TSS participants were recruited from CF centers mostly in the U.S. based on having a surviving affected CF sibling. FrGMC recruited patients from 48 CF centers with both

parents born in European countries. Subjects all met diagnostic criteria for CF and were required to have two severe CFTR mutations and/or exocrine pancreatic insufficiency.

An independent dataset of 591 individuals was included as a replication dataset

(Phase 2 replication dataset). These individuals were recruited in Phase 1 and Phase 2 as a part of the Johns Hopkins TSS, but not included in the test dataset because of lack of phenotype information at the time of the analysis. Updated phenotype information from the 2017 CFF Patient Registry revealed that 204 of these individuals had CFRD and 387 did not. This dataset was genotyped and imputed together with the Phase 1 and Phase 2 datasets.

Written informed consent was obtained from each participant and/or parents/guardians. Studies were approved by institutional review boards at participating sites and include: Committee on Clinical Investigation, Boston Children's Hospital;

Institutional Review Board at Children’s Hospital of Wisconsin; Colorado Multiple

Institutional Review Board; Johns Hopkins School of Medicine eIRB2 (Committee: IRB-

3); Research Ethics Board of The Hospital for Sick Children; Biomedical Institutional

Review Board, Office of Human Research, University of North Carolina at Chapel Hill; and University Hospitals Case Medical Center, Institutional Review Board for Human

Investigation. In France, the study was approved by the French ethical committee (CPP n°2004/15) and the information collection was approved by Commission nationale de l'informatique et des libertés (CNIL; n°04.404).

Phenotypes were obtained from extracted medical charts and CF Foundation

Patient Registry through 2011. CFRD was defined by clinician diagnosis of diabetes plus insulin treatment for at least 1 year. The onset of CFRD was defined as the date at which

insulin was started, if it was subsequently continued for at least one year. In approximately 50% of the participants, laboratory data (such as oral glucose tolerance test or hemoglobin A1c) independently confirmed the diagnosis of

CFRD. Diabetes data were censored at the last clinic visit or date of solid organ transplant. Participants diagnosed with type 1 diabetes were excluded. Because phenotype data collection was prior to clinical use of CFTR modulators, no participants were treated with CFTR modulators during this study.

Genotyping for Phase 1 was performed on the Illumina Quad 610 platform.

Genotyping for Phase 2 was performed on the Illumina Quad 610 platform (for individuals typed for Phase 1 but excluded from the CFRD analysis for relatedness), and on the Illumina 660W and Omni5 platforms (TSS, CGS, and GMS cohorts). The French cohort was genotyped on the Illumina 370K and 660K platforms (59). Genotype calling was performed using GenomeStudio V2011.1. Individuals were removed if their initial call rates were <95%, or if they had extreme heterozygosity rates, using the threshold described in (54). Duplicated individuals/identical twins were identified using IBD/IBS estimation, and for each pair the one with a higher genotype call rate was kept.

2.4.2 Imputation and Quality Control

MaCH/Minimac software was used for phasing and imputation. The reference used was

Phase I, Version 3 haplotype data from 1000 Genomes project including all 1000

Genomes reference samples. Genotyped variants with a low minor allele frequency

(MAF<2%) and low call rate (<95%) were excluded prior to imputation. Imputed variants with a MaCH quality score r2<0.30 were excluded from the analysis.


2.4.3 Estimating population structure with principal components

Genotype principal components (PCs) were calculated using the EIGENSOFT package’s

EIGENSTRAT method (60). Genotyped variants common to all platforms, with a

MAF>5%, and not in linkage disequilibrium (pruned by plink indep parameter, with a

50bp window size that shifts by 5 bases, and prunes if r-squared>0.5) were included in the PC calculation. So that PCs were not influenced by related individuals, they were calculated on unrelated individuals only (determined by IBD/IBS estimation in plink as individuals with a probability of identity less than 40%, that is, individuals less closely related than first-degree relatives), and projected onto the related individuals using the poplist flag.

2.4.4 Genetic association testing

A combined analysis was conducted on individuals who were included in the first

CFRD GWAS (13) (Phase 1), in addition to newly genotyped individuals who were not previously reported (Phase 2). Since the majority of CF patients with exocrine pancreatic insufficiency (PI) are predicted to develop diabetes at some point, but the age at diagnosis of diabetes varies, a Cox proportional hazards (CoxPH) model was used to test for association (event = diagnosis of CFRD; time = age at CFRD diagnosis or age at last clinic visit if no CFRD). Four PCs and site (Johns Hopkins University, Hospital for Sick

Children, University of North Carolina/Case Western Reserve University, or University of Pierre and Marie Curie) were included as covariates for the “unadjusted” analysis, and

4 PCs, site and sex were included as covariates for the “adjusted” analysis.
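For readers less familiar with time-to-event models, the following Python sketch fits a Cox proportional hazards model with a single covariate by Newton-Raphson on the partial likelihood. It is a deliberately reduced illustration, not the model used in this study: it omits the PCs, site and sex covariates and the frailty term for relatedness, and the function name and toy data are hypothetical.

```python
import math


def cox_fit_single(times, events, x, n_iter=25, tol=1e-9):
    """Fit a one-covariate Cox model (Breslow handling of tied times).

    times:  age at event or at censoring.
    events: 1 if the event (e.g., CFRD diagnosis) occurred, 0 if censored.
    x:      covariate, e.g., risk-allele count or imputed dosage.
    Returns (beta, se); exp(beta) is the hazard ratio per unit of x.
    """
    n = len(times)
    beta = 0.0
    info = 0.0
    for _ in range(n_iter):
        score, info = 0.0, 0.0
        for i in range(n):
            if not events[i]:
                continue  # censored subjects enter only through risk sets
            # Risk set: everyone still under observation at this event time.
            s0 = s1 = s2 = 0.0
            for j in range(n):
                if times[j] >= times[i]:
                    w = math.exp(beta * x[j])
                    s0 += w
                    s1 += w * x[j]
                    s2 += w * x[j] * x[j]
            mean = s1 / s0
            score += x[i] - mean            # first derivative of log partial likelihood
            info += s2 / s0 - mean * mean   # negative second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    # Standard error from the information evaluated at the last iterate.
    return beta, 1.0 / math.sqrt(info)


# Toy data (hypothetical): carriers of more risk alleles tend to have earlier events.
toy_times = [5, 8, 10, 12, 15, 18, 20, 25]
toy_events = [1, 1, 1, 1, 1, 0, 1, 0]
toy_alleles = [2, 1, 1, 0, 1, 0, 0, 0]
beta_hat, se_hat = cox_fit_single(toy_times, toy_events, toy_alleles)
hazard_ratio = math.exp(beta_hat)
```

Individuals without the event contribute through the risk sets until their censoring age, which is why this model suits a phenotype that most individuals are expected to develop eventually but at varying ages.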

Due to the known presence of first-degree relatives (twins and siblings) and possibility of additional unknown first-degree relatives within our cohorts, we determined relatedness among individuals based on IBD/IBS estimation in PLINK. Individuals with probability of identity greater than 40% were considered to be in the same nuclear family.

Our analysis showed there were few 2nd or 3rd degree relatives in the study, and most relatives were nuclear families. To control for correlation among family members resulting in genomic inflation, we tested several different models on a subset of family samples from the Phase 1 TSS patients (396 samples, 288 families): 1) CoxPH with all the samples ignoring within family correlation (61), 2) CoxPH with maximum unrelated individuals, 3) CoxPH with marginal model (cluster), 4) CoxPH with frailty model

(random per-family effect), 5) mixed effects Cox model with family specific random intercept (62), 6) mixed effects Cox model with kinship coefficient matrix as random intercept, 7) mixed effects model using martingale residual with kinship coefficient as random intercept (63). Based on the quantile-quantile (Q-Q) plots and lambda values

(lambda = 1.124, 1.028, 1.086, 1.025, 1, 1.097, 1.097 for the seven models respectively) of these analyses, we decided to use the CoxPH with frailty model to control for relatedness in all analyses (64). Using LDSC (65), we determined our GWAS has a lambda GC of 1.0345, intercept of 1.0288 (SE: 0.0067), and a ratio of 0.88 (SE: 0.2052), which demonstrates little impact of confounding due to population stratification.
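The lambda values compared above are genomic inflation factors. A common definition, assumed here, is the median observed 1-degree-of-freedom chi-squared statistic divided by its expected median under the null (about 0.455). The Python sketch below computes it from a list of p-values; the function name is hypothetical and this is not the software used in this study.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Genomic inflation factor (lambda) from two-sided association p-values.

    Each p-value must lie in (0, 1]. It is converted back to a 1-df
    chi-squared statistic (the squared normal quantile), and the median
    is compared with the null expectation.
    """
    std_normal = NormalDist()
    # The lower-tail quantile at p/2 has the same square as the upper-tail one.
    chi2 = [std_normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    expected_median = std_normal.inv_cdf(0.25) ** 2  # about 0.455
    return median(chi2) / expected_median


# Evenly spaced p-values mimic a well-calibrated null distribution.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
lambda_null = genomic_inflation(null_p)
```

Values close to 1, as for the frailty model chosen above, suggest that the test statistics are not systematically inflated by relatedness or stratification.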

Analysis included 9,157,530 genotyped and imputed variants with minor allele frequency >2% and markers with imputation r-squared>0.3. Genome-wide significance was defined as p<5e-8. Suggestive significance was defined as p<1.8e-6 (13).

2.4.5 Estimating genetic correlation

Genetic correlation between T2D (taken from DIAGRAM (18)) and CFRD (our study) was estimated using GWAS summary statistics with the LD-score regression (LDSC) software, v1.0.0 (65,66), command “--rg”.

2.4.6 Polygenic Risk Scores

Weighted polygenic risk scores (PRSs) were calculated for various phenotypes from published GWAS results, using no more than 1 variant per locus. For T2D, we used 221 primary variants available in our dataset from the BMI-unadjusted analysis reported in the most recent T2D GWAS from the DIAGRAM consortium (18). For the T1D PRS, we included 48 T1D-associated variants available in our dataset reported by the Type 1 Diabetes Genetics Consortium (22). Of note, the T1D PRS does not include any variants at the HLA locus.

For insulin secretion and action, we used PRSs which were previously generated (23) for homeostatic model assessment of β-cell function (HOMA-B; reflects insulin secretion; 20 variants) and homeostatic model assessment for insulin resistance (HOMA-IR; reflects insulin resistance; 10 variants) based on prior genetic and physiologic evidence reported in literature. We also constructed PRSs using subsets of T2D-associated variants which were classified as acting on insulin secretion vs. insulin action by a recent study (19) which clustered phenotypic effects using z scores from published GWASs (67).

For post-challenge glucose concentration (2-hour plasma glucose; 2hPG; 9 variants) and fasting plasma glucose levels (FPG; 36 variants), we applied PRSs generated by a previous study (25), which included variants based on their association reported in another study (68). For the autoimmune thyroid disease PRS (7 variants), we used the variants reported in an association study (26), for islet autoimmunity (8 variants) we used associations reported in a recent TEDDY study (27), and for hemoglobin A1c (10 variants), we used variants reported from a meta-analysis of 23 GWASs on nondiabetic adults (28).

The PRSs were calculated as a log(OR)- or effect size-weighted sum of risk alleles, normalized and scaled from 0 to 10. A list of the variants included in each PRS can be found in Table S2.6 (69).
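The weighted sum described above can be illustrated with a short sketch. The text does not specify the normalization, so min-max rescaling to the 0-10 range is an assumption here, and the variant IDs, odds ratios and allele counts are hypothetical.

```python
import math

def weighted_prs(risk_allele_counts, weights):
    """Sum of risk-allele counts (0, 1 or 2), each weighted by its
    log(OR) or effect size."""
    return sum(weights[v] * risk_allele_counts[v] for v in weights)

def scale_0_to_10(raw_scores):
    """Min-max rescaling to [0, 10] (assumed form of the normalization)."""
    lowest, highest = min(raw_scores), max(raw_scores)
    if highest == lowest:
        raise ValueError("all scores are identical; cannot rescale")
    return [10.0 * (s - lowest) / (highest - lowest) for s in raw_scores]

# Hypothetical variants with published odds ratios of 1.5 and 1.1.
weights = {"rsA": math.log(1.5), "rsB": math.log(1.1)}
individuals = [
    {"rsA": 0, "rsB": 0},
    {"rsA": 1, "rsB": 1},
    {"rsA": 2, "rsB": 2},
]
raw = [weighted_prs(person, weights) for person in individuals]
scaled = scale_0_to_10(raw)
print(scaled)
```

Weighting by log(OR) rather than OR keeps the score additive on the log-odds scale, so each additional risk allele contributes the same increment regardless of the other variants carried.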

Weighted genome-wide T2D polygenic risk scores were constructed using PRSice-2 (20) from summary statistics with various p-value cut-offs for inclusion in the PRS. Clumping was performed with an r-squared of 0.1 in 250kb windows. The PRS at the most significantly associated p-value cut-off was reported.
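The clumping step can be pictured as a greedy procedure: take the most significant remaining variant as an index variant, then drop nearby variants that are in linkage disequilibrium with it. The sketch below is a simplified illustration of that idea and is not PRSice-2's implementation; the interpretation of the window as a distance of at most 250 kb from the index variant, the data layout, and all variant IDs and r-squared values are assumptions.

```python
def clump(variants, r2, r2_threshold=0.1, window_bp=250_000):
    """Greedy clumping sketch.

    variants: list of dicts with keys "id", "chrom", "pos", "p".
    r2: dict mapping frozenset({id1, id2}) to pairwise r-squared;
        missing pairs are treated as r-squared = 0.
    Returns the IDs of the retained index variants, most significant first.
    """
    removed = set()
    kept = []
    for index in sorted(variants, key=lambda v: v["p"]):
        if index["id"] in removed:
            continue
        kept.append(index["id"])
        for other in variants:
            if other["id"] == index["id"] or other["id"] in removed:
                continue
            same_window = (other["chrom"] == index["chrom"]
                           and abs(other["pos"] - index["pos"]) <= window_bp)
            pair_r2 = r2.get(frozenset((index["id"], other["id"])), 0.0)
            if same_window and pair_r2 >= r2_threshold:
                removed.add(other["id"])  # clumped under the index variant
    return kept

# Hypothetical variants on one chromosome.
variants = [
    {"id": "v1", "chrom": 10, "pos": 100_000, "p": 1e-8},
    {"id": "v2", "chrom": 10, "pos": 150_000, "p": 1e-5},
    {"id": "v3", "chrom": 10, "pos": 200_000, "p": 1e-4},
    {"id": "v4", "chrom": 10, "pos": 900_000, "p": 1e-3},
]
r2 = {
    frozenset(("v1", "v2")): 0.8,
    frozenset(("v1", "v3")): 0.05,
    frozenset(("v2", "v3")): 0.9,
    frozenset(("v3", "v4")): 0.5,
}
kept = clump(variants, r2)
print(kept)
```

In this toy input v2 is absorbed by v1, while v4 survives despite its correlation with v3 because it lies outside the window.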

PRSs were tested for association with CFRD onset using the CoxPH model with frailty, including 4 PCs and site as covariates in the discovery population, and including 4 PCs as covariates in the replication population (all individuals in the replication population were from JHU).

2.4.7 Direction of effect concordance analysis

P-values and hazard ratios (HRs) of variants in the current study were compared to p-values and odds ratios (ORs) of each variant reported by DIAGRAM (18). Summary statistics for DIAGRAM were downloaded from http://diagram-consortium.org/downloads.html (T2D GWAS meta-analysis - Unadjusted for BMI).

Variants at TCF7L2, CDKN2A/B, CDKAL1 and IGF2BP2 were removed. The remaining variants were LD-pruned (r-squared>0.1 within 500kb windows) and sorted by their T2D and CFRD Fisher-combined p-value. The percentage of variants with concordant direction of effect in CFRD and T2D was determined within bins of 1500 variants, taken as overlapping sliding windows advanced in 750-variant increments. Chi-squared p-values of each bin (1500 variants) were calculated in comparison to the expected: 750 variants (50%) concordant, 750 variants (50%) discordant.
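The concordance procedure can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code: Fisher's method is applied to the two p-values per variant (a chi-squared statistic with 4 degrees of freedom), concordance is taken to mean that log(HR) and log(OR) for the same allele share a sign, and the chi-squared test has 1 degree of freedom. Field names and the toy data are hypothetical, and the example uses bins of 4 variants instead of 1500 so it stays readable.

```python
import math

def fisher_combined_p(p1, p2):
    """Fisher's method for two p-values; X = -2 * sum(ln p) has 4 df,
    whose survival function is exp(-X / 2) * (1 + X / 2)."""
    x = -2.0 * (math.log(p1) + math.log(p2))
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

def chi2_1df_p(stat):
    """Upper-tail p-value of a chi-squared statistic with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))

def concordance_windows(variants, bin_size=1500, step=750):
    """Return (start rank, concordant fraction, chi-squared p-value) per bin."""
    ranked = sorted(variants,
                    key=lambda v: fisher_combined_p(v["p_cfrd"], v["p_t2d"]))
    results = []
    for start in range(0, len(ranked) - bin_size + 1, step):
        window = ranked[start:start + bin_size]
        concordant = sum(1 for v in window
                         if (v["log_hr"] > 0) == (v["log_or"] > 0))
        expected = bin_size / 2.0
        # (c - e)^2 / e + (d - e)^2 / e with d = bin_size - c
        stat = 2.0 * (concordant - expected) ** 2 / expected
        results.append((start, concordant / bin_size, chi2_1df_p(stat)))
    return results

# Toy data: 8 variants, ordered by increasing p-value.
flags = [True, True, True, True, True, False, True, False]
toy = [{"p_cfrd": (i + 1) / 10.0, "p_t2d": (i + 1) / 10.0,
        "log_hr": 0.1, "log_or": 0.1 if is_concordant else -0.1}
       for i, is_concordant in enumerate(flags)]
windows = concordance_windows(toy, bin_size=4, step=2)
for start, fraction, p_value in windows:
    print(start, fraction, p_value)
```

Because consecutive bins share half of their variants, the per-bin p-values are not independent, which is worth keeping in mind when reading a run of significant bins.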


Figure 2.1: Manhattan plot of Phase 1 + 2 combined association analysis.

Association analysis was performed on all variants with MAF>2% that passed quality control criteria. The x-axis indicates chromosomal position, and the y-axis indicates the strength of evidence for association with CFRD (-log10(p-value)) by Cox proportional hazards regression including frailty model to allow for clustering by family. The black line corresponds to the genome-wide significance threshold (p=5e-8).


Figure 2.2: LocusZoom and Forest Plots of genome-wide significant loci, TCF7L2 (A-B), PTMA (C-D), and SLC26A9 (E-F).

Forest plots are shown for the top variant at each locus, rs34872471, rs838455 and rs4077468, respectively. To the right of the forest plots are the p-values and HRs from the subset analyses, along with the p-value of the interaction term between each variable (site, sex, platform or CFTR genotype) and the variant.


Figure 2.3: Comparison of the genetic risk architectures of CFRD and T2D.

(A) Comparison of p-values for each variant in the T2D (18) and CFRD (current study) GWASs. All genotyped and imputed variants have been plotted. Variants at known CFRD modifier loci have been colored and labelled. The T2D log10(p-value) was defined as positive when the risk alleles were concordant between CFRD and T2D. (B) Comparison of CFRD log-transformed hazard ratios to T2D log-transformed odds ratios. The top variant of the loci that exceeded genome-wide significance (p<5e-8) in the T2D GWAS or our study has been plotted. Known CFRD modifiers have been labelled. (C) Cumulative incidence plot for CFRD onset in individuals divided into 6 bins based on their weighted T2D PRS. (D) Comparison of p-values for each variant in the T2D and CFRD GWASs. All genotyped and imputed variants reported in the T2D (18), HOMA-B (24) and HOMA-IR (24) studies have been plotted. Variants associated with T2D (p<0.00001) were colored by association with HOMA-B (p-value<0.01; blue), HOMA-IR (p-value<0.01; red), or both (purple). Variants not associated with HOMA-B, HOMA-IR or T2D were colored grey. The T2D log(p-value) was defined as positive when the risk alleles were concordant between CFRD and T2D.

[Figure 2.4, panels A-D. Panels A and B plot cumulative incidence of CFRD (y-axis, 0 to 1.0) against age in years (x-axis, 0 to 70) for the Phase 1 + 2 and replication (2R) populations, respectively, with one curve per CFRD PRS bin. Panels C and D plot prevalence of CFRD (y-axis, 0 to 0.6) by CFRD PRS bin for the same two populations. Bin sizes (Phase 1 + 2 / replication): PRS ≤ 3, n=532 / n=47; 3 < PRS ≤ 4, n=1,199 / n=130; 4 < PRS ≤ 5, n=1,765 / n=205; 5 < PRS ≤ 6, n=1,370 / n=130; PRS > 6, n=874 / n=79.]

Figure 2.4: Cumulative incidence (A-B) and prevalence (C-D) plots by CFRD PRS in test and replication populations

(A-B) Cumulative incidence plots of CFRD in the Phase 1 + 2 population (A) and the replication population (B). Individuals are divided into bins based on their CFRD PRS. (C-D) Prevalence plots of CFRD in the Phase 1 + 2 population (C) and the replication population (D), divided by CFRD PRS bin.
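The prevalence panels amount to assigning each individual to a PRS bin and dividing the number with CFRD by the bin size. A minimal sketch follows, using the five bin boundaries shown in the figure; the function names, scores and outcomes are hypothetical.

```python
BIN_LABELS = ["<=3", "(3-4]", "(4-5]", "(5-6]", ">6"]

def prs_bin(score):
    """Map a 0-10 scaled PRS to one of the five bins used in the figure."""
    if score <= 3:
        return "<=3"
    if score <= 4:
        return "(3-4]"
    if score <= 5:
        return "(4-5]"
    if score <= 6:
        return "(5-6]"
    return ">6"

def prevalence_by_bin(scores, has_cfrd):
    """Return {bin label: fraction with CFRD}; None for empty bins."""
    counts = {label: [0, 0] for label in BIN_LABELS}  # [cases, total]
    for score, is_case in zip(scores, has_cfrd):
        label = prs_bin(score)
        counts[label][0] += int(is_case)
        counts[label][1] += 1
    return {label: (cases / total if total else None)
            for label, (cases, total) in counts.items()}

# Hypothetical scores and CFRD status.
scores = [2.5, 3.0, 3.5, 4.2, 4.8, 5.5, 6.1, 7.0]
cfrd = [False, False, False, True, False, True, True, True]
prevalence = prevalence_by_bin(scores, cfrd)
print(prevalence)
```

Unlike the cumulative incidence panels, this simple proportion ignores age and censoring, so it complements rather than replaces the time-to-event view.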

Phase  Study  Site      All   Unrelated  DF/DF  Female  CFRD  Mean age (CFRD)  Mean age (No CFRD)  Mean age (Both)
1      TSS    JHU       285   285        171    147     103   17.3             15.3                16
1      CGS    HSC       1496  1462       907    819     163   23.9             18.8                19.3
1      GMS    UNC/CWRU  1262  1247       1213   674     373   23.3             22.4                22.7
2      TSS    JHU       688   403        394    352     149   21.5             12.4                14.4
2      CGS    HSC       379   351        247    192     55    24.4             13.4                15
2      GMS    UNC/CWRU  740   726        340    403     246   23.8             23.8                23.8
2      FrGMC  UPMC      890   890        588    453     252   24               21.2                22
Total phase 1 and 2:    5740  5364       3860   3040    1341  23               19.1                20
2R     TSS    JHU       591   427        311    267     204   20.8             23.6                22.7

Table 2.1. Characteristics of patients enrolled by the studies comprising the International Cystic Fibrosis Gene Modifier Consortium

DF/DF signifies individuals that are homozygous for the F508del variant. TSS, Twin and Sibling Study; CGS, Canadian CF Gene Modifier Study; GMS, Genetic Modifier Study; FrGMC, French CF Gene Modifier Consortium; JHU, Johns Hopkins University; HSC, Hospital for Sick Children; UNC, University of North Carolina; CWRU, Case Western Reserve University; UPMC, University of Pierre and Marie Curie; 2R, Phase 2 replication.

Chr  Position (hg19)  rsID        Risk/Alt allele  RAF   Unadjusted p-value  Unadjusted HR  Adjusted p-value  Adjusted HR  % Genotyped  Annotation
10   114754071        rs34872471  C/T              0.30  2.80e-12            1.51           1.30e-12          1.53         0%           TCF7L2
10   114758349        rs7903146   T/C              0.30  4.88e-12            1.50           2.54e-12          1.52         100%         TCF7L2
1    205914757        rs4077468   A/G              0.59  2.25e-8             1.38           4.12e-8           1.38         95.8%        SLC26A9
2    232560638        rs838455    T/C              0.08  2.98e-8             1.69           7.60e-8           1.74         0%           PTMA
2    232572011        rs838440    G/T              0.09  3.45e-8             1.75           8.44e-8           1.69         100%         PTMA

Table 2.2: Association statistics for top variant at each genome-wide significant locus.

CFRD onset was analyzed as a censored trait (event = diagnosis of diabetes; censoring = age at last normal diabetes screening test). Unadjusted analysis includes adjustment for 4 principal components and site. Adjusted analysis also includes sex as a covariate. Frailty model was used to account for relatedness. Three loci were genome-wide significant (p<5e-8). The most significantly associated variant and the most significant genotyped variant at each locus are listed. RAF is risk allele frequency. % Genotyped signifies the percentage of individuals who were genotyped at this variant, as opposed to being imputed.


Figure S2.1. Q-Q plot for association with CFRD.

Association analysis was performed on all variants with MAF>2% that passed quality control criteria. The y-axis indicates the observed –log10(p-value), and the x-axis indicates the expected –log10(p-value) given the total number of variants. The red line corresponds to the points at which observed and expected are equal. CFRD onset was analyzed as a censored trait (event = diagnosis of diabetes; censoring = age at last normal diabetes screening test). Analysis includes adjustment for 4 principal components and site. Frailty model was used to account for relatedness. Using LDSC, we determined that our GWAS has a lambda GC of 1.0345.
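The observed and expected values plotted in a Q-Q plot of this kind can be computed as in the sketch below. The plotting position (rank - 0.5) / n for the expected p-values is one common convention and is an assumption here, as are the toy p-values.

```python
import math

def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, smallest p-value first."""
    n = len(p_values)
    points = []
    for rank, p in enumerate(sorted(p_values), start=1):
        expected_p = (rank - 0.5) / n  # assumed plotting position
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points

# Toy p-values that sit exactly on the null expectation for n = 4.
toy_p_values = [0.875, 0.125, 0.625, 0.375]
points = qq_points(toy_p_values)
for expected, observed in points:
    print(round(expected, 3), round(observed, 3))
```

Points falling on the identity line indicate agreement with the null distribution; an upward departure confined to the smallest p-values is the pattern expected from true associations rather than global inflation.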


Figure S2.2. LocusZoom conditional plots at loci that associate with CFRD onset with genome-wide significance.

Conditional analysis on the top variant at each locus: (A) TCF7L2, (B) PTMA, and (C) SLC26A9.


Figure S2.3. Epigenetic mapping at the TCF7L2 locus.

The relative position of the TCF7L2 transcript (on the positive strand) and variants that reached genome-wide significance on chromosome 10 is shown. The chromatin state of this locus is detailed with H3K4me1, H3K4me3, H3K27ac, FAIRE-seq and DNaseI hypersensitivity in the pancreas, and the chromatin state segmentation track (chromHMM) on the pancreas, pancreatic islets, liver, adipose and skeletal muscle from the Roadmap Epigenome Browser. Additionally, conservation by PhyloP from the UCSC genome browser is shown below.


Figure S2.4: Epigenetic mapping at the PTMA locus.

The relative position of the PTMA transcript (on the positive strand) and variants that reached genome-wide significance on chromosome 2 is shown. The chromatin state of this locus is detailed with H3K4me1, H3K4me3, H3K27ac, FAIRE-seq and DNaseI hypersensitivity in the pancreas, and the chromatin state segmentation track (chromHMM) on the pancreas, pancreatic islets, liver, adipose and skeletal muscle from the Roadmap Epigenome Browser. Additionally, conservation by PhyloP from the UCSC genome browser is shown below.


Figure S2.5: Epigenetic mapping at the SLC26A9 locus.

The relative position of the SLC26A9 transcript (on the negative strand) and variants that reached genome-wide significance on chromosome 1 is shown. The chromatin state of this locus is detailed with H3K4me1, H3K4me3, H3K27ac, FAIRE-seq and DNaseI hypersensitivity in the pancreas, and the chromatin state segmentation track (chromHMM) on the pancreas, pancreatic islets, liver, adipose and skeletal muscle from the Roadmap Epigenome Browser. Additionally, conservation by PhyloP from the UCSC genome browser is shown below.

Figure S2.6. Direction of effect concordance between CFRD and T2D GWASs.

Variants within 500kb windows were pruned with a threshold of r-squared > 0.1 after excluding variants at TCF7L2, CDKAL1, CDKN2A/B, and IGF2BP2 (45,442 variants remained). Fisher combined p-values were generated, ignoring direction of effect. Bins containing 1500 variants were examined in overlapping sliding windows in 750 variant increments. (A) The percentage of variants concordant within each bin for the top 20,000 variants. (B) The chi-squared -log(p-value) for the concordant fraction within each bin being different from the expected 50%.

Figure S2.7. Genetic overlap between T1D and CFRD.

(A) Concordance of T1D and CFRD association p-values. Each point is a variant, with its T1D p-value plotted on the x-axis and CFRD p-value plotted on the y-axis. Variants exceeding genome-wide significance (p<5e-8) in the current study (CFRD) and variants with p-values < 5e-10 in the T1D GWAS have been colored and labeled with gene annotation. The remaining variants are colored grey. (B) Cumulative incidence plot binned by T1D PRS. The x-axis indicates the age, and the y-axis indicates the fraction of individuals with CFRD.

Table S2.1: Associations of potential covariates with CFRD onset.

Model                                                         Variable       HR        p-value
Recruitment Site (ref: JHU)                                   BOS            0.359     1.20E-08
                                                              DEN            1.337     0.11
                                                              FR             0.541     6.20E-12
                                                              TOR            0.293     <2e-16
                                                              UNC            0.496     <2e-16
                                                              WIS            0.661     0.12
Genotyping Platform (ref: GWAS1)                              370K           1.328     0.011
                                                              660K           1.122     0.187
                                                              660W           1.301     9.80E-05
                                                              GWAS3          0.839     0.564
                                                              Omni5          1.199     0.178
Recruitment site & Genotyping platform (ref: TOR_GWAS1)       BOS_660W       1.380     0.1004
                                                              BOS_Omni5      0.864     0.7516
                                                              DEN_660W       4.680     1.50E-14
                                                              DEN_Omni5      5.510     4.00E-05
                                                              FR_370K        2.150     1.90E-09
                                                              FR_660K        1.830     1.70E-08
                                                              JHU_660W       3.240     <2e-16
                                                              JHU_GWAS1      4.430     <2e-16
                                                              JHU_GWAS3      1.350     0.329
                                                              JHU_Omni5      4.210     6.20E-12
                                                              TOR_660W       1.790     0.0056
                                                              TOR_Omni5      0.000     0.9826
                                                              UNC_660W       1.710     2.40E-06
                                                              UNC_GWAS1      1.820     1.50E-11
                                                              UNC_Omni5      1.350     0.2029
                                                              WIS_660W       2.470     0.0024
                                                              WIS_Omni5      2.050     0.219
Site (ref: UNC)                                               FR             1.067     0.39
                                                              JHU            1.968     <2e-16
                                                              TOR            0.579     4.00E-12
Sex                                                           Sex            1.568     2.20E-16
PC1                                                           PC1            7267.840  1.90E-07
PC2                                                           PC2            0.618     0.89
PC3                                                           PC3            0.969     0.99
PC4                                                           PC4            1.714     0.83
PC5                                                           PC5            110.620   0.088
PC6                                                           PC6            218.850   0.1
PC7                                                           PC7            0.433     0.72
PC8                                                           PC8            1.762     0.8
PC9                                                           PC9            2.442     0.68
PC10                                                          PC10           0.012     0.096
Liver Disease                                                 Liver Disease  2.714     <2e-16

Association was conducted as a Cox proportional hazard regression with CFRD age at onset.

Table S2.2. Association statistics of variants that exceeded genome-wide significance (p-value<5e-8).

For each variant, freq, p-value and HR are listed for Phase 1 + 2 (n=5740, 1341 with CFRD), Phase 1 (n=3043, 639 with CFRD) and Phase 2 (n=2697, 702 with CFRD), in that order.

rsID  Chr  Position (hg19)  Al1  Al2  Annotated gene  Phase 1+2 freq  Phase 1+2 p-value  Phase 1+2 HR  Phase 1 freq  Phase 1 p-value  Phase 1 HR  Phase 2 freq  Phase 2 p-value  Phase 2 HR
rs1342063  1  205912859  C  T  SLC26A9  0.41  3.55E-08  0.73  0.42  2.10E-07  0.69  0.41  1.19E-02  0.80
rs1342064  1  205913073  T  C  SLC26A9  0.41  3.21E-08  0.73  0.42  1.37E-07  0.69  0.41  1.38E-02  0.80
rs4077468  1  205914757  A  G  SLC26A9  0.41  2.25E-08  0.72  0.42  1.20E-07  0.69  0.41  1.21E-02  0.80
rs4077469  1  205914885  C  T  SLC26A9  0.41  2.25E-08  0.72  0.42  1.20E-07  0.69  0.41  1.21E-02  0.80
rs4951271  1  205913848  A  G  SLC26A9  0.42  4.54E-08  0.73  0.43  2.18E-07  0.70  0.41  1.30E-02  0.80
rs6661355  1  205910631  T  C  SLC26A9  0.43  3.76E-08  1.36  0.42  3.21E-07  1.41  0.44  4.36E-03  1.28
rs6673820  1  205910604  G  A  SLC26A9  0.43  4.30E-08  1.36  0.42  4.29E-07  1.40  0.44  4.45E-03  1.28
rs7419153  1  205917309  G  A  SLC26A9  0.37  4.26E-08  1.38  0.37  2.78E-06  1.39  0.38  1.17E-03  1.33
rs838438  2  232579165  T  C  PTMA  0.09  3.12E-08  1.68  0.08  1.97E-04  1.56  0.09  9.65E-05  1.69
rs838440  2  232572011  T  G  PTMA  0.09  3.45E-08  1.69  0.08  4.14E-04  1.53  0.09  6.77E-05  1.72
rs838455  2  232560638  C  T  PTMA  0.08  2.98E-08  1.75  0.08  5.00E-04  1.56  0.08  5.01E-05  1.80
rs10659211  10  114782581  G  GCT  TCF7L2  0.30  1.79E-10  1.46  0.30  1.53E-05  1.39  0.30  1.13E-06  1.53
rs11196211  10  114817009  A  C  TCF7L2  0.29  1.83E-08  1.41  0.29  1.57E-04  1.33  0.29  1.07E-05  1.48
rs12243326  10  114788815  T  C  TCF7L2  0.29  3.13E-09  1.43  0.29  3.95E-05  1.36  0.29  9.18E-06  1.48
rs12244851  10  114773926  C  T  TCF7L2  0.33  2.67E-09  1.41  0.33  7.04E-06  1.40  0.33  3.83E-05  1.43
rs12255372  10  114808902  G  T  TCF7L2  0.29  7.24E-09  1.40  0.29  8.27E-05  1.34  0.29  8.33E-06  1.47
rs12260037  10  114791490  C  T  TCF7L2  0.29  5.15E-09  1.42  0.29  5.00E-05  1.36  0.29  1.35E-05  1.47
rs17747324  10  114752503  T  C  TCF7L2  0.24  5.56E-11  1.53  0.24  1.36E-05  1.45  0.25  1.32E-06  1.60
rs34872471  10  114754071  T  C  TCF7L2  0.30  2.80E-12  1.51  0.30  2.58E-06  1.43  0.31  9.69E-08  1.58
rs35011184  10  114749734  G  A  TCF7L2  0.24  7.76E-11  1.53  0.24  5.00E-05  1.41  0.25  4.80E-07  1.63
rs35198068  10  114754784  T  C  TCF7L2  0.30  3.23E-12  1.51  0.30  2.70E-06  1.43  0.31  1.07E-07  1.58
rs36090025  10  114774433  A  C  TCF7L2  0.31  2.44E-10  1.45  0.30  3.39E-05  1.36  0.31  5.90E-07  1.54
rs4132670  10  114767771  G  A  TCF7L2  0.33  6.46E-10  1.43  0.33  1.74E-06  1.43  0.34  3.56E-05  1.44
rs4267006  10  114758779  G  T  TCF7L2  0.24  7.82E-11  1.52  0.23  4.39E-05  1.41  0.24  4.60E-07  1.61
rs4367880  10  114795256  G  C  TCF7L2  0.21  8.95E-09  1.47  0.21  3.53E-04  1.36  0.21  8.41E-06  1.55
rs4506565  10  114756041  A  T  TCF7L2  0.33  1.40E-10  1.46  0.32  2.57E-06  1.43  0.33  5.53E-06  1.48
rs4575195  10  114765747  C  A  TCF7L2  0.33  4.38E-10  1.44  0.33  1.73E-06  1.43  0.34  2.28E-05  1.44
rs55853916  10  114781950  G  C  TCF7L2  0.23  3.36E-10  1.50  0.23  1.44E-04  1.37  0.24  5.13E-07  1.62
rs55899248  10  114773608  A  G  TCF7L2  0.24  3.07E-10  1.50  0.23  9.88E-05  1.38  0.24  7.06E-07  1.61
rs55972445  10  114782790  G  T  TCF7L2  0.23  3.61E-10  1.50  0.23  1.47E-04  1.37  0.24  5.47E-07  1.61
rs56087297  10  114799172  A  G  TCF7L2  0.28  6.84E-09  1.42  0.28  7.83E-05  1.35  0.28  9.35E-06  1.48
rs56299331  10  114788436  C  T  TCF7L2  0.21  7.19E-09  1.48  0.21  3.41E-04  1.36  0.21  7.21E-06  1.56
rs61872784  10  114798893  T  A  TCF7L2  0.28  4.04E-09  1.42  0.28  5.37E-05  1.36  0.29  7.66E-06  1.49
rs61872786  10  114806697  G  A  TCF7L2  0.22  1.33E-08  1.46  0.21  4.30E-04  1.35  0.22  8.38E-06  1.54
rs61875118  10  114750236  T  C  TCF7L2  0.24  7.12E-11  1.53  0.24  4.95E-05  1.41  0.25  4.37E-07  1.63
rs61875119  10  114750380  G  A  TCF7L2  0.24  1.50E-10  1.52  0.24  6.50E-05  1.40  0.25  7.22E-07  1.61
rs61875120  10  114753259  T  C  TCF7L2  0.24  1.15E-11  1.56  0.24  1.83E-05  1.44  0.24  1.58E-07  1.65
rs7074440  10  114785424  G  A  TCF7L2  0.31  8.03E-10  1.44  0.31  1.75E-05  1.38  0.31  4.67E-06  1.49
rs72826075  10  114750500  A  G  TCF7L2  0.24  7.02E-11  1.53  0.24  4.91E-05  1.41  0.25  4.34E-07  1.63
rs72826094  10  114801488  A  T  TCF7L2  0.21  9.00E-09  1.47  0.21  3.65E-04  1.36  0.21  8.01E-06  1.55
rs7901695  10  114754088  T  C  TCF7L2  0.32  9.22E-11  1.46  0.32  8.28E-07  1.45  0.33  1.20E-05  1.47
rs7903146  10  114758349  C  T  TCF7L2  0.30  4.88E-12  1.50  0.30  4.73E-06  1.41  0.30  9.22E-08  1.58

Table S2.3. Association statistics for all variants that exceeded suggestive significance

rsID  Chr  Position (hg19)  Al1  Al2  HR  p-value  Gene Annotation
rs79706452  1  178299962  T  C  0.34  5.10E-07  RASAL2
rs111417935  1  178328393  G  A  0.26  1.20E-07  RASAL2
rs112468284  1  178393562  G  C  0.28  3.20E-07  RASAL2
rs7512462  1  205899595  T  C  0.76  1.60E-06  SLC26A9
rs7549173  1  205906897  G  C  1.33  6.80E-07  SLC26A9
rs2036100  1  205907872  C  G  0.74  5.00E-07  SLC26A9
rs6593975  1  205909276  T  C  1.32  5.50E-07  SLC26A9
rs6593976  1  205909285  C  A  1.32  5.40E-07  SLC26A9
rs61814953  1  205910080  T  C  0.75  9.50E-07  SLC26A9
rs7415921  1  205910883  T  G  1.34  1.80E-07  SLC26A9
rs1342061  1  205911385  A  T  1.35  7.40E-08  SLC26A9
2:45327721:AGTT_  2  45327721  R  D  2.38  9.80E-07  UNQ6975
rs838431  2  232589343  G  A  1.67  1.20E-06  PTMA
rs752560  2  232589622  C  T  1.67  1.20E-06  PTMA
rs838429  2  232591793  C  T  1.66  1.80E-06  PTMA
rs838428  2  232591857  C  T  1.66  1.40E-06  PTMA
rs838426  2  232596066  C  A  1.65  1.80E-06  PTMA
rs838425  2  232597042  T  C  1.65  1.80E-06  PTMA
rs838423  2  232598197  G  A  1.65  1.80E-06  PTMA
rs1246076  2  232600629  T  C  1.65  1.80E-06  PTMA
rs838450  2  232624379  A  G  1.65  1.80E-06  PTMA
rs117044562  7  1678724  A  C  6.77  6.80E-07  ELFN1
rs7787637  7  110039723  A  C  2.08  1.00E-07  IMMP2L
rs73198785  7  110050180  G  T  2.04  1.10E-06  IMMP2L
rs6978971  7  110056350  G  A  2.03  9.80E-07  IMMP2L
rs6979574  7  110056953  A  G  2.06  9.00E-07  IMMP2L
rs117602021  7  110059908  T  C  2.07  8.10E-07  IMMP2L
rs73198792  7  110060417  G  C  2.07  8.00E-07  IMMP2L
rs73203207  7  110068084  C  A  2.09  6.90E-07  IMMP2L
rs13235769  7  110071923  T  C  2.09  6.70E-07  IMMP2L
rs7016179  8  6207719  C  G  2.09  4.70E-07  MCPH1
rs17533237  8  6208089  G  A  2.08  5.80E-07  MCPH1
rs35431302  8  144004698  A  G  3.08  1.10E-07  CYP11B2
rs142582579  8  144025469  C  G  4.58  1.40E-06  CYP11B2
rs4073672  8  144026552  G  A  4.74  1.00E-06  CYP11B2
rs138789732  8  144026980  C  T  4.74  1.00E-06  CYP11B2
rs12243578  10  114733456  C  T  1.38  1.30E-06  TCF7L2
rs11196175  10  114736614  T  C  1.39  7.10E-07  TCF7L2
10:114739524:T_TA  10  114739524  R  I  1.40  4.30E-07  TCF7L2
rs4073980  10  114746580  C  G  1.35  2.40E-07  TCF7L2
rs4073288  10  114747277  G  A  1.34  8.80E-07  TCF7L2
rs11196180  10  114748029  A  T  1.34  8.30E-07  TCF7L2
rs4074720  10  114748497  C  T  1.35  1.90E-07  TCF7L2
rs4074718  10  114748617  G  A  1.35  1.90E-07  TCF7L2
rs10885402  10  114761697  A  C  1.34  2.10E-07  TCF7L2
rs6585197  10  114762064  G  T  1.33  2.30E-07  TCF7L2
rs6585198  10  114762237  G  A  1.33  2.90E-07  TCF7L2
rs10787471  10  114764661  G  A  1.31  1.80E-06  TCF7L2
rs6585199  10  114765171  G  A  1.33  3.10E-07  TCF7L2
rs6585200  10  114768609  A  G  1.33  2.80E-07  TCF7L2
rs6585201  10  114768783  G  A  1.33  3.70E-07  TCF7L2
10:114772253:C_CA  10  114772253  R  I  1.32  6.20E-07  TCF7L2
rs7904519  10  114773927  A  G  1.32  5.20E-07  TCF7L2
10:114777138:CAGTG  10  114777138  R  D  1.32  5.80E-07  TCF7L2
rs7918599  10  114777396  C  T  1.32  5.10E-07  TCF7L2
rs10885405  10  114777670  C  T  1.31  7.20E-07  TCF7L2
rs10885406  10  114777724  A  G  1.32  6.40E-07  TCF7L2
rs11196190  10  114778252  A  G  1.31  9.30E-07  TCF7L2
rs7899529  10  114779538  G  A  1.31  8.20E-07  TCF7L2
rs11196191  10  114780633  A  C  1.32  4.80E-07  TCF7L2
rs10787472  10  114781297  A  C  1.32  5.70E-07  TCF7L2
rs10787473  10  114781400  C  A  1.32  5.70E-07  TCF7L2
rs12258200  10  114781698  T  C  1.32  4.90E-07  TCF7L2
rs6585202  10  114782803  T  C  1.32  6.30E-07  TCF7L2
rs7921525  10  114783333  C  T  1.35  8.60E-08  TCF7L2
rs6585203  10  114783403  C  G  1.32  5.50E-07  TCF7L2
rs6585204  10  114783586  C  G  1.32  6.10E-07  TCF7L2
rs11196193  10  114783838  T  C  1.32  5.00E-07  TCF7L2
rs4309084  10  114785278  A  G  1.32  4.50E-07  TCF7L2
rs4128598  10  114785939  A  G  1.32  4.50E-07  TCF7L2
rs4128597  10  114786154  A  G  1.32  4.00E-07  TCF7L2
rs7924080  10  114787012  T  C  1.33  3.10E-07  TCF7L2
rs7907610  10  114787090  A  G  1.33  3.50E-07  TCF7L2
rs7077039  10  114789077  T  C  1.31  1.10E-06  TCF7L2
rs10885408  10  114791388  A  G  1.32  4.60E-07  TCF7L2
rs7086857  10  114793237  A  G  1.31  7.70E-07  TCF7L2
rs12359102  10  114793451  G  A  1.31  7.60E-07  TCF7L2
rs7900150  10  114793823  T  A  1.31  7.40E-07  TCF7L2
rs7100927  10  114796048  A  G  1.31  9.90E-07  TCF7L2
rs7895340  10  114801525  G  A  1.31  6.80E-07  TCF7L2
rs11196200  10  114801938  C  G  1.31  6.70E-07  TCF7L2
rs11196205  10  114807047  G  C  1.31  6.00E-07  TCF7L2
rs10885409  10  114808072  T  C  1.31  6.10E-07  TCF7L2
rs12265291  10  114810240  T  C  1.31  6.80E-07  TCF7L2
rs11196208  10  114811316  T  C  1.31  8.10E-07  TCF7L2
rs7077247  10  114812071  T  C  1.31  7.40E-07  TCF7L2
rs4077527  10  114813041  C  T  1.31  9.50E-07  TCF7L2
rs12718338  10  114813047  C  T  1.31  9.60E-07  TCF7L2
rs7071302  10  114817527  T  G  1.32  7.10E-07  TCF7L2
rs35519679  10  114818754  G  A  1.40  3.40E-07  TCF7L2

Suggestive significance is p-value<1.8e-6.

Table S2.4. Association statistics of T2D risk variants previously identified as CFRD modifiers other than TCF7L2.

For each variant, freq, p-value and HR are listed for Phase 1 + 2 (n=5740; 1341 with CFRD), Phase 1 (n=3043; 639 with CFRD) and Phase 2 (n=2697; 702 with CFRD), in that order.

rsID  Chr  Position (hg19)  Al1  Al2  Gene  Phase 1+2 freq  Phase 1+2 p-value  Phase 1+2 HR  Phase 1 freq  Phase 1 p-value  Phase 1 HR  Phase 2 freq  Phase 2 p-value  Phase 2 HR
rs1412829  9  22043926  T  C  CDKN2A/B  0.399  1.42E-03  0.833  0.406  2.46E-04  0.772  0.392  3.45E-01  0.921
rs1470579  3  185529080  A  C  IGF2BP2  0.325  1.12E-04  1.261  0.317  4.56E-03  1.234  0.334  6.23E-03  1.280
rs7754840  6  20661034  G  C  CDKAL1  0.320  1.20E-02  1.160  0.325  2.61E-02  1.175  0.315  1.12E-01  1.152

Table S2.5. Associations of variants with T2D and CFRD that were genome-wide significant (p-value<5e-8) in either study. The most significantly associated variant at each locus has been listed.

rsID  Chr  Position (hg19)  Al1  Al2  Annotation  CFRD p-value  CFRD HR  T2D p-value  T2D log(OR)
rs3768321  1  40035928  G  T  LOC101929516  0.015  0.8302736  1.30E-26  -0.085
rs58432198  1  51256091  C  T  FAF1  0.213  0.893508  1.80E-10  0.065
rs12140153  1  62579891  G  T  PATJ  0.566  0.9330467  1.2E-08  0.064
rs1127215  1  117532790  C  T  PTGFRN  0.216  0.9304379  2.30E-13  0.047
rs1493694  1  120526982  C  T  NOTCH2  0.47  1.072294  2.10E-16  -0.084
rs145904381  1  151017991  T  C  BNIPL  0.224  1.362062  2.20E-08  0.17
rs539515  1  177889025  A  C  LINC01741  0.95  0.9955798  1.20E-10  -0.051
rs12048743  1  205114873  G  C  DSTYK  0.394  1.050956  4.40E-09  0.038
rs4077468  1  205914757  A  G  SLC26A9  2.20E-08  0.7235396  1.30E-01  -0.0099
rs9430095  1  206593900  C  G  SRGAP2  0.452  1.045087  2.30E-08  0.036
rs340874  1  214159256  T  C  PROX1-AS1  0.84  1.011465  5.60E-26  -0.068
rs2820446  1  219748818  C  G  LOC102723886  0.938  1.004912  3.70E-16  0.057
rs499689  1  229565732  A  G  CCSAP  0.716  1.023164  1.40E-08  0.037
rs348330  1  229672955  G  A  ABCB10  0.76  0.9809831  3.90E-14  0.051
rs291367  1  235690800  G  A  MIR5096  0.977  0.9983014  6.10E-10  0.044
rs62107261  2  422144  T  C  LINC01865  0.97  0.9914072  1.80E-11  0.11
rs35913461  2  653575  C  T  LOC105373352  0.949  1.004982  5.90E-11  0.056
rs11680058  2  16574669  A  G  GACAT3  0.31  0.8921688  1.30E-08  0.058
rs17802463  2  25643221  G  T  DTNB  0.355  0.9416704  3.50E-08  0.039
rs1260326  2  27730940  C  T  GCKR  0.42  1.047912  1.30E-24  0.067
rs80147536  2  43698028  A  T  THADA  0.433  0.9242247  2.70E-30  0.13
rs10193538  2  58981064  T  G  LINC01122  0.81  1.014403  1.70E-08  0.037
rs6545714  2  59307725  A  G  LINC01122  0.862  1.010252  1.70E-08  -0.037
rs243024  2  60583665  G  A  LINC01793  0.424  0.9544691  4.40E-20  -0.058
rs2249105  2  65287896  A  G  CEP68  0.326  0.9453501  1.20E-15  0.053
rs2028150  2  65655012  C  G  SPRED2  0.012  0.8639849  3.09E-15  0.052
rs11688682  2  121347612  G  C  LINC01101  0.464  0.9388496  1.40E-14  0.058
rs35999103  2  147861633  C  T  PABPC1P2  0.1  0.8775687  8.30E-09  -0.052
rs13426680  2  158339550  A  G  CYTIP  0.845  1.023062  6.40E-10  0.082
rs3772071  2  161135544  T  C  RBMS1  0.13  0.9090093  1.60E-11  0.048
rs10195252  2  165513091  T  C  GRB14  0.8  0.985112  1.60E-20  0.06
rs2972144  2  227101411  G  A  LOC646736  0.88  1.009293  7.90E-46  0.094
rs838455  2  232560638  C  T  PTMA  3.00E-08  1.746301  8.90E-01  0.0017
rs11709077  3  12336507  G  A  PPARG  0.764  0.9736533  1.60E-27  0.11
rs35352848  3  23455582  T  C  MIR548AC  0.691  1.028704  9.50E-20  0.071
rs11926707  3  46925539  C  T  PTH1R  0.936  1.005073  1.50E-08  0.038
rs4683324  3  47251284  C  T  KIF9-AS1  0.659  1.026136  3.60E-08  0.035
rs6446298  3  49860854  C  T  UBA7  0.728  0.9794148  5.60E-09  0.039
rs4688760  3  49980596  T  C  RBM6  0.219  0.9244096  4.50E-10  0.043
rs2014830  3  50172397  C  T  SEMA3F-AS1  0.357  0.9446886  5.9E-09  0.041
rs2581787  3  53127677  T  G  RFT1  0.74  1.019284  3.00E-08  0.036
rs76263492  3  54828827  G  T  CACNA2D3  0.474  1.097462  6.30E-09  -0.091
rs3774723  3  63962339  G  A  ATXN7  0.709  1.029116  2.00E-13  0.065
rs9860730  3  64701146  G  A  ADAMTS9-AS2  0.566  1.036967  7.40E-15  -0.055
rs13085136  3  72865183  C  T  SHQ1  0.017  0.6989332  1.40E-08  0.074
rs2272163  3  77671721  C  A  ROBO2  0.576  1.034792  1.20E-08  0.037
rs11708067  3  123065778  A  G  ADCY5  0.0011  0.7969208  1.30E-31  0.089
rs649961  3  124926637  C  T  SLC12A8  0.892  1.00776  1.30E-09  -0.038
rs9828772  3  129333182  C  G  PLXND1  0.54  0.9432726  4.20E-08  0.059
rs62271373  3  150066540  T  A  LINC01214  0.576  0.9298797  1.00E-09  -0.088
rs147579559  3  152086533  A  G  MBNL1  0.18  0.9234857  3.60E-09  0.039
rs74653713  3  152417881  C  A  MBNL1  0.76  0.9535151  5.70E-09  0.094
rs7629630  3  168218841  A  T  EGFEM1P  0.459  0.9410114  2.20E-08  0.051
rs9873618  3  170733076  G  A  SLC2A2  0.0069  0.8454384  8.50E-21  0.066
rs2872246  3  183738460  C  A  ABCC5  0.629  1.028704  1.80E-08  -0.036
rs6780171  3  185503456  T  A  IGF2BP2  0.024  1.144308  2.50E-58  -0.11
rs3887925  3  186665645  C  T  ST6GAL1  0.533  1.036448  1.40E-17  -0.055
rs4686471  3  187740899  C  T  LINC01991  0.945  0.9959881  3.10E-20  0.06
rs1531583  4  744972  G  T  PCGF3  0.93  1.012174  1.20E-12  -0.11
rs56337234  4  1784403  T  C  TACC3  0.231  1.106941  1.40E-17  -0.057
rs362307  4  3241845  C  T  HTT  0.113  1.200094  1.10E-09  -0.074
rs10937721  4  6306763  C  G  WFS1  0.165  0.9188797  1.60E-40  0.087
rs12640250  4  17792869  C  A  FAM184B  0.629  0.9686034  4.50E-08  0.039
rs10938398  4  45186139  G  A  GNPDA2  0.549  0.9663782  4.90E-12  -0.044
rs2102278  4  52818664  A  G  DCUN1D4  0.684  0.9740428  4.50E-08  -0.038
rs138641407  4  83578271  G  A  SCD5  0.937  0.9952513  5.70E-10  -0.042
rs1903002  4  89740894  G  C  FAM13A  0.174  1.084696  3E-08  0.036
rs6821438  4  95091911  G  A  LOC101929210  0.457  0.9584863  5.39E-11  -0.042
rs1580278  4  104140848  A  C  CENPE  0.53  1.039147  2.90E-10  -0.041
rs1296328  4  137083193  A  C  LINC00613  0.039  1.138601  4.30E-08  0.035
rs7669833  4  153513369  T  A  LINC02486  0.84  0.987479  1.80E-14  0.054
rs28819812  4  157652753  C  A  CTSO  0.762  0.9816701  2.70E-08  0.04
rs58730668  4  185717759  T  C  ACSL1  0.173  0.8917227  1.00E-13  0.068
rs6885132  5  14768092  C  G  ANKH  0.593  1.054852  9.50E-13  0.078
rs6884702  5  44682589  A  G  LINC02224  0.62  1.034999  5.80E-09  -0.038
rs3811978  5  52100489  A  G  ITGA1  0.195  0.9053805  4.20E-10  -0.053
rs702634  5  53271420  A  G  ARL15  0.11  0.9030296  2.1E-13  0.051
rs2454916  5  53448014  T  C  ARL15  0.675  0.9730693  4.00E-08  0.04
rs465002  5  55808475  T  C  C5orf67  0.82  0.9846195  3.80E-23  0.073
rs2307111  5  75003678  C  T  POC5  0.688  0.976774  3.30E-16  -0.053
rs4457053  5  76424949  A  G  ZBED3-AS1  0.95  0.9961076  1.40E-17  -0.059
rs1316776  5  78430607  C  A  BHMT  0.423  1.049905  3.50E-12  0.046
rs7719891  5  86577352  A  G  RASA1  0.239  1.081015  2.90E-08  -0.04
rs145510090  5  101273694  T  A  ST8SIA4  0.51  1.098889  1.10E-12  -0.1
rs115505614  5  102422968  C  T  GIN1  0.396  1.138259  1.70E-29  -0.17
rs116407196  5  102973337  A  G  NUDT12  0.757  0.9530385  4.40E-10  -0.1
rs329122  5  133864599  G  A  JADE2  0.299  1.0612  9.20E-09  -0.037
rs9379084  6  7231843  G  A  RREB1  0.276  0.8949388  2.30E-20  0.097
rs7756992  6  20679709  A  G  CDKAL1  0.000023  1.299137  3.00E-87  -0.14
rs116664241  6  31577825  G  A  NCR3  0.015  0.8138331  9.30E-17  0.073
rs116322440  6  32652196  T  C  HLA-DQB1  0.129  1.22544  4.50E-21  -0.1
rs34298980  6  40409243  T  C  LRFN2  0.74  1.021937  1.20E-09  0.04
rs6458354  6  43814190  T  C  VEGFA  0.381  0.9454446  3.70E-13  -0.051
rs3798519  6  50788778  A  C  TFAP2B  0.182  0.9050184  1.10E-12  -0.058
rs4946812  6  107431688  G  A  BEND3  0.85  1.015418  1.00E-08  0.039
rs11759026  6  126792095  A  G  CENPW  0.959  1.003868  1.30E-18  -0.067
rs719727  6  127414838  A  G  MIR588  0.038  1.151885  4.00E-15  0.059
rs1573090  6  137302159  T  G  SLC35D3  0.96  0.9972637  8.40E-15  0.05
rs474513  6  160770312  A  G  SLC22A3  0.953  0.996546  1.00E-09  0.039
rs4709746  6  164133001  C  T  LOC102724152  0.11  0.8669274  5.00E-09  0.056
rs10228066  7  15063569  T  C  DGKB  0.75  0.9818664  1.90E-25  0.066
rs4279506  7  23512896  C  G  IGF2BP3  0.935  1.004882  5.70E-09  -0.039
rs62451127  7  27971300  G  A  JAZF1  0.604  0.9347277  2.20E-09  0.086
rs1708302  7  28198677  C  T  JAZF1  0.851  1.010656  4.20E-48  0.092
rs917195  7  30728452  C  T  CRHR2  0.056  0.8695321  5.60E-11  0.051
rs878521  7  44255643  G  A  YKT6  0.335  0.9371611  1.60E-14  -0.057
rs11496066  7  102486254  T  C  FBXL13  0.258  0.9195232  1.20E-08  0.047
rs39328  7  103444978  C  T  RELN  0.294  0.940447  3.00E-08  -0.036
rs6976111  7  117495667  C  A  CTTNBP2  0.22  0.921272  1.50E-08  -0.042
rs1562396  7  130457914  A  G  KLF14  0.524  0.9573368  7.60E-17  -0.058
rs62492368  7  150537635  G  A  TMEM176A  0.042  0.8802934  1.50E-10  -0.044
rs6459733  7  156930550  G  C  MNX1-AS1  0.717  0.9758953  3.90E-17  0.058
rs2921077  8  8304502  A  G  PRAG1  0.81  0.9855061  1.30E-09  0.04
rs12542733  8  8824858  G  T  MFHAS1  0.97  1.001862  5.70E-11  0.042
rs6990912  8  9200472  C  A  LOC157273  0.239  0.9317414  1.00E-10  -0.044
rs151051146  8  9677216  T  G  TNKS  0.818  0.9854075  5.00E-10  -0.043
rs17689007  8  9974824  G  A  MSRA  0.31  1.059185  1.70E-13  0.048
rs9329221  8  10240202  G  T  MSRA  0.158  0.921272  1.70E-09  -0.039
rs2001433  8  10903475  T  A  XKR6  0.268  1.066946  3.10E-13  0.048
rs10096633  8  19830921  C  T  LPL  0.365  0.9273725  8.70E-13  0.07
rs10954772  8  30863938  C  T  PURG  0.265  0.9312756  2.30E-09  -0.041
rs13262861  8  41508577  C  A  NKX6-3  0.0087  0.8117198  1.80E-27  0.094
rs10097617  8  95961626  C  T  TP53INP1  0.037  1.126032  1.10E-15  -0.051
rs12680028  8  110123183  G  C  TRHR  0.19  0.927929  3.10E-08  -0.035
rs3802177  8  118185025  G  A  SLC30A8  0.018  0.8599337  6.30E-55  0.11
rs17772814  8  128711742  G  A  CASC8  0.566  0.9033908  5.00E-10  0.078
rs1561927  8  129568078  T  C  LINC00824  0.065  1.121761  1.90E-09  -0.043
rs13267392  8  145245647  C  T  MROH1  0.669  0.9545646  1.50E-09  -0.046
rs4977213  8  145507304  T  C  BOP1  0.82  0.9862948  4.40E-14  -0.051
rs12719778  8  145879883  T  C  ARHGAP39  0.875  1.009081  2.10E-09  0.039
rs10974438  9  4291928  A  C  GLIS3  0.014  1.17269  1.60E-14  -0.051
rs7022807  9  19067833  A  G  HAUS6  0.157  1.086542  3.60E-10  -0.04
rs7867635  9  20241069  C  T  SLC24A2  0.373  1.057915  4.10E-08  0.036
rs10811660  9  22134068  G  A  CDKN2B-AS1  0.017  0.824482  6.59E-79  0.16
rs1412234  9  28410683  T  C  LINGO2  0.182  1.085999  2.5E-10  -0.043
rs12001437  9  34074476  T  C  UBAP2  0.888  1.008415  3.70E-10  -0.041
rs11137820  9  81359113  C  G  PSAT1  0.777  1.017044  3.60E-08  0.035
rs17791513  9  81905590  A  G  LOC101927450  0.302  1.137008  2.90E-14  0.1
rs2796441  9  84308948  G  A  LOC101927502  0.11  0.908464  8.50E-24  0.066
rs55653563  9  97001682  A  C  MIRLET7DHG  0.34  0.9396009  3.20E-09  0.043
rs505922  9  136149229  T  C  ABO  0.889  1.008466  5.40E-12  -0.046
rs28642213  9  139248082  G  A  GPSM1  0.475  0.9393191  4.80E-20  0.073
rs11257655  10  12307894  C  T  CDC123  0.293  1.078639  3.70E-32  -0.09
rs177045  10  71321279  A  G  TSPAN15  0.564  1.035827  1.00E-12  -0.05
rs2642588  10  71466578  G  T  C10orf35  0.074  0.8948492  6.30E-14  0.052
rs703972  10  80952826  G  C  ZMIZ1  0.397  0.9524668  2.50E-28  0.071
rs146240813  10  93566126  C  T  TNKS2  0.88  1.018163  5.60E-10  -0.077
rs10882101  10  94462427  T  C  HHEX  0.87  1.009293  1.60E-62  0.11
rs7903146  10  114758349  C  T  TCF7L2  4.90E-12  1.501253  0.00E+00  -0.31
rs2280141  10  124193181  G  T  PLEKHA1  0.342  1.055485  2.00E-13  -0.047
rs4929965  11  2197286  G  A  MIR4686  0.847  0.988467  4.79E-25  -0.07
rs2237895  11  2857194  A  C  KCNQ1  0.349  1.061094  3.60E-44  -0.093
rs141521721  11  14763828  C  A  PDE3B  0.85  0.9610777  2.80E-08  -0.12
rs74643981  11  15103981  T  G  CALCB  0.397  0.8158702  3.70E-08  -0.12
rs5213  11  17408404  T  C  KCNJ11  0.792  0.9844226  1.90E-26  -0.071
rs145678014  11  32927778  G  T  QSER1  0.223  0.8140773  1.10E-11  0.11
rs2767036  11  34982148  A  C  PDHX  0.36  1.060033  2.50E-08  -0.039
rs1061810  11  43877934  C  A  HSD17B12  0.558  1.037797  8.50E-13  -0.05
rs7115753  11  45912013  A  G  MAPK8IP1  0.993  0.9994742  4.80E-09  0.038
rs7124681  11  47529947  C  A  CELF1  0.68  0.9765787  6.40E-09  -0.037
rs7122217  11  47777920  C  G  FNBP4  0.692  0.9777513  4.60E-08  0.035
rs1783541  11  65294799  C  T  SCYL1  0.116  1.117618  1.40E-14  -0.061
rs2276133  11  65562257  G  A  OVOL1  0.28  0.9309032  1.20E-08  0.042
rs11820019  11  69448758  T  C  LINC01488  0.783  0.934167  1.00E-11  0.14
rs77464186  11  72460398  A  C  ARAP1  0.291  1.086868  2.30E-33  0.11
rs1783598  11  72851463  T  C  FCHSD2  0.261  1.08307  3.80E-17  0.068
rs10830963  11  92708710  C  G  MTNR1B  0.5  0.9480007  1.50E-43  -0.099
rs57235767  11  93013531  C  T  SLC36A4  0.342  0.9411997  6.60E-11  0.046
rs10893829  11  128042575  T  C  LOC101929497  0.612  0.9613661  2.70E-10  0.057
rs10750397  11  128234144  G  A  LINC02098  0.227  0.9267235  2.00E-10  -0.045
rs67232546  11  128398938  C  T  LOC101929517  0.015  1.177037  1.40E-12  -0.056
rs117233107  12  4328521  G  A  PARP11  0.331  0.6989332  9.50E-34  0.35
rs2066827  12  12871099  T  G  CDKN1B  0.56  1.047912  3.5E-08  -0.044
rs10842675  12  26259919  G  A  RASSF8  0.21  0.927187  1.20E-08  -0.038
rs718314  12  26453283  A  G  SSPN  0.723  0.9772625  1.10E-10  -0.047
rs10842994  12  27965150  C  T  KLHL42  0.601  0.9628092  2.50E-20  0.074
rs2258238  12  66221060  A  T  HMGA2  0.595  1.047493  2.00E-25  -0.11
rs1796330  12  71522953  G  C  TSPAN8  0.046  0.8898521  3.20E-14  0.049
rs2197973  12  95928560  T  C  USP44  0.399  1.049905  4.40E-08  0.035
rs77864822  12  97848775  A  G  NEDD1  0.9  1.015621  2.20E-08  0.073
rs1426371  12  108629780  G  A  WSCD2  0.759  0.979219  1.10E-11  0.05
rs34965774  12  118412373  G  A  KSR2  0.733  1.030042  3.50E-09  -0.054
rs56348580  12  121432117  G  C  HNF1A  0.18  0.9175943  3.80E-19 
0.062 rs138062324 12 123296204 G A CCDC62 0.0088 1.303952 3.70E-09 0.054 rs4148856 12 123450765 C G ABCB9 0.042 1.152807 2.20E-10 0.049 rs12811407 12 133069698 G A FBRSL1 0.238 1.076591 2.40E-12 -0.049 rs34584161 13 26776999 A G RNF6 0.581 0.9629055 2.90E-10 0.048 (Table S2.5 is continued on next page..)

57 Position CFRD CFRD T2D T2D rsID Chr (hg19) Al1 Al2 Annotation p-value HR p-value log(OR) rs11842871 13 31042452 G T HMGB1 0.641 1.032931 1.50E-08 0.042 rs576674 13 33554302 A G LINC00423 0.252 1.090351 6.80E-10 -0.053 rs963740 13 51096095 A T DLEU1 0.049 0.8808218 2.6E-08 0.039 rs34927497 13 58472262 A C PCDH17 0.904 0.9909413 1.30E-08 0.048 rs9563615 13 59077406 A T LINC00374 0.202 1.082313 3.90E-09 0.042 rs1359790 13 80717156 G A LINC01080 0.032 0.8716215 5.70E-31 0.083 rs7987740 13 109947213 T C MYO16 0.353 0.9474321 4.1E-08 0.036 rs17122772 14 23288935 C G SLC7A7 0.934 0.9935609 2.00E-08 -0.043 rs17522122 14 33302882 G T AKAP6 0.299 1.061306 4.00E-09 -0.038 rs8017808 14 38848419 G T CLEC14A 0.24 1.080366 2.60E-08 0.041 rs17836088 14 79932041 G C NRXN3 0.225 1.089806 9.70E-14 -0.058 rs8010382 14 91963722 G A PPP4R3A 0.857 1.010757 8.10E-09 0.038 rs62007683 14 103894071 G T MARK3 0.61 0.9702514 3.80E-08 0.037 rs34715063 15 38873115 T C RASGRP1 0.85 0.9804928 3.29E-14 -0.076 rs11070332 15 41809205 G A RPAP1 0.0023 1.203941 1.30E-13 -0.049 rs2456530 15 53091553 C T ONECUT1 0.271 1.093737 4.70E-09 -0.056 rs117483894 15 57456802 A G TCF12 0.537 0.9068303 3.90E-08 -0.093 rs8037894 15 62394264 G C C2CD4A 0.91 0.9938589 3.70E-13 0.047 rs7178762 15 63871292 C T USP3 0.2 0.9299728 7.00E-10 0.039 rs4776970 15 68080886 T A MAP2K5 0.913 1.006521 6.20E-09 -0.039 rs13737 15 75932129 G T IMP3 0.12 0.8965511 7.30E-10 0.046 rs1005752 15 77818128 A C LOC101929457 0.512 1.042373 5.70E-29 0.079 rs4932265 15 90423293 C T AP3S2 0.971 1.002383 7.20E-20 -0.065 rs12910825 15 91511260 G A PRC1-AS1 0.035 0.8793256 2.40E-15 0.053 rs6600191 16 295795 T C FAM234A 0.809 1.017959 7.00E-13 0.061 rs3751837 16 3583173 C T CLUAP1 0.634 1.034481 1.70E-08 -0.044 rs8046545 16 28915217 A G ATP2A1 0.436 0.9517052 2.30E-08 -0.037 rs11642430 16 30045789 C G FAM57B 0.381 1.053481 1.20E-10 -0.042 rs62048489 16 53454855 G T LOC102723373 0.8 1.04029 3.10E-09 0.072 rs1421085 16 53800954 T C FTO 0.681 
0.9763833 2.40E-78 -0.12 rs862320 16 69651866 C T NFAT5 0.394 0.9517052 5.10E-11 0.042 rs72802342 16 75234872 C A ZFP1 0.012 0.7041246 1.30E-27 0.13 rs2925979 16 81534790 C T CMIP 0.639 1.029116 2.10E-14 -0.053 rs12920022 16 89564055 T A ANKRD11 0.634 0.9600211 2.90E-09 -0.053 rs1377807 17 4045440 G C ZZEF1 0.46 0.9540874 5.70E-17 -0.057 rs7222481 17 9785187 G C GLP2R 0.57 1.036759 1.70E-08 -0.039 rs4925109 17 17661802 A G RAI1 0.647 0.9725829 3.90E-12 0.048 rs71372253 17 29413019 T C LOC646030 0.754 1.051061 4.30E-08 -0.073 rs10908278 17 36099952 A T HNF1B 0.064 1.117395 3.10E-30 -0.074 rs34855406 17 40731411 G C RETREG3 0.441 1.052428 3.20E-12 -0.05 rs11552192 17 40962509 A T BECN1 0.248 1.131111 1.80E-09 -0.06 rs35895680 17 47060322 C A GIP 0.7 1.026444 3.80E-15 0.055 rs60276348 17 62203304 C T ERN1 0.478 0.9381926 2.90E-08 -0.052 rs61676547 17 65892507 G C BPTF 0.14 0.8894073 1.00E-11 -0.055 rs7240767 18 7070642 C T LAMA1 0.262 0.9355693 2.00E-08 0.037 rs62080313 18 36278709 T C MIR4318 0.72 0.9704455 9.10E-09 -0.056 rs72940580 18 52848102 A G LINC01929 0.48 1.119968 5.50E-09 -0.09 rs72926932 18 53050646 A C TCF4 0.126 1.181636 3.60E-13 -0.083 rs17684074 18 54675384 G C WDR7 0.54 0.9613661 3.50E-08 0.041 rs9957145 18 56876228 G A SEC11C 0.086 0.8728426 6.70E-09 0.05 rs523288 18 57848369 A T PMAIP1 0.81 1.016739 7.50E-14 -0.056 rs12454712 18 60845884 T C BCL2 1 0.9997311 5.10E-13 0.049 rs7249758 19 4948862 G A UHRF1 0.09 0.8802054 1.20E-08 -0.045 rs75253922 19 7240848 T C INSR 0.724 1.026238 2.20E-08 -0.046 rs4804833 19 7970635 G A MAP2K7 0.78 1.016739 1.10E-12 -0.047 rs3111316 19 13038415 A G FARSA 0.261 0.9354758 1.60E-12 0.046 rs8107974 19 19388500 A T SUGP1 0.88 0.9829471 6.30E-15 -0.093 rs10406327 19 33890838 C G PEPD 0.87 1.009404 4.60E-08 0.035 rs429358 19 45411941 T C APOE 0.79 0.9762857 1.80E-18 0.08 rs10406431 19 46157019 A G EML2 0.218 0.9278362 2.50E-19 0.059 rs3810291 19 47569003 G A ZC3H4 0.81 0.9829471 1.20E-11 -0.046 rs13041756 20 21466795 T C 
NKX2-4 0.65 1.044147 1.30E-08 -0.058 rs2747567 20 32381337 G A ZNF341-AS1 0.031 0.8735412 1.10E-08 0.038 rs2268078 20 32596704 A G RALY 0.628 1.030352 2.90E-10 0.043 rs1800961 20 43042364 C T HNF4A 0.46 0.8726681 3.20E-20 -0.16 (Table S2.5 is continued on next page..)

58 Position CFRD CFRD T2D T2D rsID Chr (hg19) Al1 Al2 Annotation p-value HR p-value log(OR) rs6063048 20 45598564 G A EYA2 0.242 1.078855 5.80E-11 0.047 rs11699802 20 48832135 C T CEBPB 0.000011 0.7540475 2.50E-11 0.043 rs34454109 20 51223594 A T ZFP64 0.97 1.003015 8.80E-09 0.044 rs6070625 20 57394628 C G GNAS-AS1 0.873 1.009192 3.20E-12 -0.044 rs6518681 22 30609554 G A HORMAD2 0.612 0.9462959 9.60E-13 0.083 rs117001013 22 32348841 C T YWHAH 0.379 1.100979 1.50E-08 0.065 rs5758223 22 41489920 A G EP300 0.148 0.9095548 4.60E-08 0.038 rs738408 22 44324730 C T PNPLA3 0.809 1.016129 1.80E-10 -0.049 rs1801645 22 50356850 T C PIM3 0.572 1.057915 1.50E-10 -0.048 (End of Table S2.5)

Table S2.6. Variants included in each polygenic risk score.

| Phenotype | # | rsID | Chr | Position (hg19) | Nearest Gene | Risk Allele | Other Allele | OR or Effect Size |
|---|---|---|---|---|---|---|---|---|
| Type 2 Diabetes | 1 | rs3768321 | 1 | 40035928 | PABPC4/MACF1 | T | G | 1.09 |
| | 2 | rs58432198 | 1 | 51256091 | FAF1 | C | T | 1.07 |
| | 3 | rs12140153 | 1 | 62579891 | PATJ | G | T | 1.07 |
| | 4 | rs184660829 | 1 | 115144899 | DENND2C | C | T | 8.05 |
| | 5 | rs1127215 | 1 | 117532790 | PTGFRN | C | T | 1.05 |
| | 6 | rs1493694 | 1 | 120526982 | NOTCH2 | T | C | 1.09 |
| | 7 | rs145904381 | 1 | 151017991 | BNIPL | T | C | 1.19 |
| | 8 | rs539515 | 1 | 177889025 | SEC16B | C | A | 1.05 |
| | 9 | rs12048743 | 1 | 205114873 | DSTYK | G | C | 1.04 |
| | 10 | rs9430095 | 1 | 206593900 | SRGAP2 | C | G | 1.04 |
| | 11 | rs340874 | 1 | 214159256 | PROX1 | C | T | 1.07 |
| | 12 | rs2820446 | 1 | 219748818 | LYPLAL1 | C | G | 1.06 |
| | 13 | rs348330 | 1 | 229672955 | ABCB10 | G | A | 1.05 |
| | 14 | rs291367 | 1 | 235690800 | GNG4 | G | A | 1.04 |
| | 15 | rs62107261 | 2 | 422144 | TMEM18 | T | C | 1.12 |
| | 16 | rs11680058 | 2 | 16574669 | FAM49A | A | G | 1.06 |
| | 17 | rs17802463 | 2 | 25643221 | DTNB | G | T | 1.04 |
| | 18 | rs1260326 | 2 | 27730940 | GCKR | C | T | 1.07 |
| | 19 | rs80147536 | 2 | 43698028 | THADA | A | T | 1.13 |
| | 20 | rs6545714 | 2 | 59307725 | FANCL | G | A | 1.04 |
| | 21 | rs243024 | 2 | 60583665 | BCL11A | A | G | 1.06 |
| | 22 | rs2249105 | 2 | 65287896 | CEP68 | A | G | 1.1 |
| | 23 | rs11688682 | 2 | 121347612 | GLI2 | G | C | 1.05 |
| | 24 | rs35999103 | 2 | 147861633 | PABPC1P2 | T | C | 1.05 |
| | 25 | rs13426680 | 2 | 158339550 | CYTIP | A | G | 1.09 |
| | 26 | rs3772071 | 2 | 161135544 | RBMS1 | T | C | 1.05 |
| | 27 | rs10195252 | 2 | 165513091 | GRB14/COBLL1 | T | C | 1.07 |
| | 28 | rs2972144 | 2 | 227101411 | IRS1 | G | A | 1.1 |
| | 29 | rs11709077 | 3 | 12336507 | PPARG | G | A | 1.14 |
| | 30 | rs35352848 | 3 | 23455582 | UBE2E2 | T | C | 1.07 |
| | 31 | rs11926707 | 3 | 46925539 | PTH1R | C | T | 1.27 |
| | 32 | rs4688760 | 3 | 49980596 | RBM6 | T | C | 1.04 |
| | 33 | rs2581787 | 3 | 53127677 | RFT1 | T | G | 1.04 |
| | 34 | rs76263492 | 3 | 54828827 | CACNA2D3 | T | G | 1.09 |
| | 35 | rs3774723 | 3 | 63962339 | ATXN7 | G | A | 1.07 |
| | 36 | rs9860730 | 3 | 64701146 | ADAMTS9 | A | G | 1.06 |
| | 37 | rs13085136 | 3 | 72865183 | SHQ1 | C | T | 1.08 |
| | 38 | rs2272163 | 3 | 77671721 | ROBO2 | C | A | 1.04 |
| | 39 | rs11708067 | 3 | 123065778 | ADCY5 | A | G | 1.09 |
| | 40 | rs649961 | 3 | 124926637 | SLC12A8 | T | C | 1.04 |
| | 41 | rs9828772 | 3 | 129333182 | TMCC1 | C | G | 1.06 |
| | 42 | rs62271373 | 3 | 150066540 | TSC22D2 | A | T | 1.09 |
| | 43 | rs7629630 | 3 | 168218841 | EGFEM1P | A | T | 1.05 |
| | 44 | rs9873618 | 3 | 170733076 | SLC2A2 | G | A | 1.07 |
| | 45 | rs2872246 | 3 | 183738460 | ABCC5 | A | C | 1.04 |
| | 46 | rs6780171 | 3 | 185503456 | IGF2BP2 | A | T | 1.14 |
| | 47 | rs3887925 | 3 | 186665645 | ST6GAL1 | T | C | 1.07 |
| | 48 | rs4686471 | 3 | 187740899 | LPP | C | T | 1.06 |
| | 49 | rs1531583 | 4 | 744972 | PCGF3 | T | G | 1.13 |
| | 50 | rs56337234 | 4 | 1784403 | FGFR3 | C | T | 1.06 |
| | 51 | rs362307 | 4 | 3241845 | HTT | T | C | 1.08 |
| | 52 | rs10937721 | 4 | 6306763 | WFS1 | C | G | 1.06 |
| | 53 | rs12640250 | 4 | 17792869 | FAM184B/DCAF16 | C | A | 1.04 |
| | 54 | rs10938398 | 4 | 45186139 | GNPDA2 | A | G | 1.05 |
| | 55 | rs2102278 | 4 | 52818664 | DCUN1D4/LRRC66 | G | A | 1.04 |
| | 56 | rs1903002 | 4 | 89740894 | FAM13A | G | C | 1.04 |
| | 57 | rs6821438 | 4 | 95091911 | SMARCAD1 | A | G | 1.04 |
| | 58 | rs1580278 | 4 | 104140848 | CENPE | C | A | 1.04 |
| | 59 | rs1296328 | 4 | 137083193 | PCDH18 | A | C | 1.04 |
| | 60 | rs7669833 | 4 | 153513369 | TMEM154 | T | A | 1.06 |
| | 61 | rs28819812 | 4 | 157652753 | PDGFC | C | A | 1.04 |
| | 62 | rs58730668 | 4 | 185717759 | ACSL1 | T | C | 1.07 |
| | 63 | rs6884702 | 5 | 44682589 | MRPS30 | G | A | 1.04 |
| | 64 | rs3811978 | 5 | 52100489 | ITGA1 | G | A | 1.06 |
| | 65 | rs702634 | 5 | 53271420 | ARL15 | A | G | 1.05 |
| | 66 | rs465002 | 5 | 55808475 | ANKRD55 | T | C | 1.11 |
| | 67 | rs2307111 | 5 | 75003678 | POC5 | T | C | 1.05 |
| | 68 | rs4457053 | 5 | 76424949 | ZBED3 | G | A | 1.06 |
| | 69 | rs1316776 | 5 | 78430607 | BHMT | C | A | 1.05 |
| | 70 | rs7719891 | 5 | 86577352 | RASA1 | G | A | 1.04 |
| | 71 | rs138337556 | 5 | 101232944 | SLCO6A1 | G | A | 1.56 |
| | 72 | rs115505614 | 5 | 102422968 | GIN1 | T | C | 1.19 |
| | 73 | rs329122 | 5 | 133864599 | PHF15 | A | G | 1.04 |
| | 74 | rs9379084 | 6 | 7231843 | RREB1 | G | A | 1.11 |
| | 75 | rs7756992 | 6 | 20679709 | CDKAL1 | G | A | 1.15 |
| | 76 | rs34298980 | 6 | 40409243 | LRFN2 | T | C | 1.04 |
| | 77 | rs6458354 | 6 | 43814190 | VEGFA | C | T | 1.05 |
| | 78 | rs3798519 | 6 | 50788778 | TFAP2B | C | A | 1.06 |
| | 79 | rs4946812 | 6 | 107431688 | BEND3 | G | A | 1.04 |
| | 80 | rs11759026 | 6 | 126792095 | CENPW | G | A | 1.07 |
| | 81 | rs2800733 | 6 | 127416930 | AK127472 | A | G | 1.05 |
| | 82 | rs9494624 | 6 | 137300960 | SLC35D3 | A | G | 1.04 |
| | 83 | rs474513 | 6 | 160770312 | SLC22A3 | A | G | 1.04 |
| | 84 | rs4709746 | 6 | 164133001 | AK093114/QKI | C | T | 1.06 |
| | 85 | rs10228066 | 7 | 15063569 | DGKB | T | C | 1.07 |
| | 86 | rs4279506 | 7 | 23512896 | IGF2BP3 | G | C | 1.06 |
| | 87 | rs1708302 | 7 | 28198677 | JAZF1 | C | T | 1.1 |
| | 88 | rs917195 | 7 | 30728452 | CRHR2 | C | T | 1.05 |
| | 89 | rs878521 | 7 | 44255643 | YKT6/CAMK2B | A | G | 1.06 |
| | 90 | rs11496066 | 7 | 102486254 | FBXL13 | T | C | 1.08 |
| | 91 | rs39328 | 7 | 103444978 | RELN | T | C | 1.04 |
| | 92 | rs6976111 | 7 | 117495667 | CTTNBP2 | A | C | 1.04 |
| | 93 | rs1562396 | 7 | 130457914 | KLF14 | G | A | 1.06 |
| | 94 | rs62492368 | 7 | 150537635 | AOC1 | A | G | 1.05 |
| | 95 | rs6459733 | 7 | 156930550 | MNX1 | G | C | 1.06 |
| | 96 | rs17689007 | 8 | 9974824 | MSRA | G | A | 1.04 |
| | 97 | rs57327348 | 8 | 10808687 | XKR6 | A | T | 1.04 |
| | 98 | rs10096633 | 8 | 19830921 | LPL | C | T | 1.07 |
| | 99 | rs10954772 | 8 | 30863938 | PURG | T | C | 1.04 |
| | 100 | rs13262861 | 8 | 41508577 | ANK1 | C | A | 1.07 |
| | 101 | rs10097617 | 8 | 95961626 | TP53INP1 | T | C | 1.04 |
| | 102 | rs149364428 | 8 | 97737741 | CPQ | A | G | 1.27 |
| | 103 | rs12680028 | 8 | 110123183 | TRHR | C | G | 1.04 |
| | 104 | rs3802177 | 8 | 118185025 | SLC30A8 | G | A | 1.11 |
| | 105 | rs17772814 | 8 | 128711742 | CASC11 | G | A | 1.08 |
| | 106 | rs1561927 | 8 | 129568078 | PVT1 | C | T | 1.04 |
| | 107 | rs4977213 | 8 | 145507304 | BOP1 | C | T | 1.05 |
| | 108 | rs10974438 | 9 | 4291928 | GLIS3 | C | A | 1.05 |
| | 109 | rs7022807 | 9 | 19067833 | HAUS6 | G | A | 1.04 |
| | 110 | rs7867635 | 9 | 20241069 | MLLT3 | C | T | 1.04 |
| | 111 | rs10811660 | 9 | 22134068 | CDKN2A/B | G | A | 1.27 |
| | 112 | rs1412234 | 9 | 28410683 | LINGO2 | C | T | 1.04 |
| | 113 | rs12001437 | 9 | 34074476 | UBAP2 | C | T | 1.04 |
| | 114 | rs11137820 | 9 | 81359113 | PSAT1 | C | G | 1.04 |
| | 115 | rs17791513 | 9 | 81905590 | TLE4 | A | G | 1.1 |
| | 116 | rs2796441 | 9 | 84308948 | TLE1 | G | A | 1.07 |
| | 117 | rs55653563 | 9 | 97001682 | ZNF169 | A | C | 1.04 |
| | 118 | rs505922 | 9 | 136149229 | ABO | C | T | 1.05 |
| | 119 | rs11257655 | 10 | 12307894 | CDC123/CAMK1D | T | C | 1.09 |
| | 120 | rs2642588 | 10 | 71466578 | COL13A1 | G | T | 1.05 |
| | 121 | rs703972 | 10 | 80952826 | ZMIZ1 | G | C | 1.07 |
| | 122 | rs10882101 | 10 | 94462427 | HHEX/IDE | T | C | 1.06 |
| | 123 | rs7903146 | 10 | 114758349 | TCF7L2 | T | C | 1.37 |
| | 124 | rs2280141 | 10 | 124193181 | PLEKHA1 | T | G | 1.05 |
| | 125 | rs4929965 | 11 | 2197286 | INS/IGF2/TH | A | G | 1.07 |
| | 126 | rs2237895 | 11 | 2857194 | KCNQ1 | C | A | 1.12 |
| | 127 | rs141521721 | 11 | 14763828 | PDE3B | A | C | 1.13 |
| | 128 | rs5213 | 11 | 17408404 | KCNJ11 | C | T | 1.07 |
| | 129 | rs145678014 | 11 | 32927778 | QSER1 | G | T | 1.11 |
| | 130 | rs2767036 | 11 | 34982148 | PDHX | C | A | 1.04 |
| | 131 | rs1061810 | 11 | 43877934 | HSD17B12 | A | C | 1.05 |
| | 132 | rs7115753 | 11 | 45912013 | MAPK8IP1 | A | G | 1.04 |
| | 133 | rs7124681 | 11 | 47529947 | CELF1 | A | C | 1.04 |
| | 134 | rs1783541 | 11 | 65294799 | SCYL1 | T | C | 1.06 |
| | 135 | rs11820019 | 11 | 69448758 | CCND1 | T | C | 1.16 |
| | 136 | rs77464186 | 11 | 72460398 | CENTD2/ARAP1 | A | C | 1.11 |
| | 137 | rs10830963 | 11 | 92708710 | MTNR1B | G | C | 1.1 |
| | 138 | rs67232546 | 11 | 128398938 | ETS1 | T | C | 1.06 |
| | 139 | rs2066827 | 12 | 12871099 | CDKN1B | G | T | 1.05 |
| | 140 | rs718314 | 12 | 26453283 | ITPR2 | G | A | 1.05 |
| | 141 | rs10842994 | 12 | 27965150 | KLHL42 | C | T | 1.08 |
| | 142 | rs2258238 | 12 | 66221060 | HMGA2 | T | A | 1.1 |
| | 143 | rs1796330 | 12 | 71522953 | TSPAN8/LGR5 | G | C | 1.05 |
| | 144 | rs2197973 | 12 | 95928560 | USP44 | T | C | 1.04 |
| | 145 | rs77864822 | 12 | 97848775 | RMST | A | G | 1.08 |
| | 146 | rs1426371 | 12 | 108629780 | WSCD2 | G | A | 1.05 |
| | 147 | rs34965774 | 12 | 118412373 | KSR2 | A | G | 1.06 |
| | 148 | rs56348580 | 12 | 121432117 | HNF1A | G | C | 1.05 |
| | 149 | rs4148856 | 12 | 123450765 | ABCB9 | C | G | 1.05 |
| | 150 | rs12811407 | 12 | 133069698 | FBRSL1 | A | G | 1.05 |
| | 151 | rs34584161 | 13 | 26776999 | RNF6 | A | G | 1.05 |
| | 152 | rs11842871 | 13 | 31042452 | HMGB1 | G | T | 1.04 |
| | 153 | rs576674 | 13 | 33554302 | KL | G | A | 1.05 |
| | 154 | rs963740 | 13 | 51096095 | DLEU1 | A | T | 1.04 |
| | 155 | rs9537803 | 13 | 58366634 | PCDH17 | C | T | 1.04 |
| | 156 | rs9563615 | 13 | 59077406 | PCDH17 | A | T | 1.05 |
| | 157 | rs1359790 | 13 | 80717156 | SPRY2 | G | A | 1.09 |
| | 158 | rs7987740 | 13 | 109947213 | MYO16/IRS2 | T | C | 1.04 |
| | 159 | rs17122772 | 14 | 23288935 | SLC7A7 | G | C | 1.04 |
| | 160 | rs17522122 | 14 | 33302882 | AKAP6 | T | G | 1.04 |
| | 161 | rs8017808 | 14 | 38848419 | CLEC14A | G | T | 1.04 |
| | 162 | rs17836088 | 14 | 79932041 | NRXN3 | C | G | 1.06 |
| | 163 | rs8010382 | 14 | 91963722 | SMEK1 | G | A | 1.04 |
| | 164 | rs62007683 | 14 | 103894071 | MARK3 | G | T | 1.04 |
| | 165 | rs34715063 | 15 | 38873115 | RASGRP1 | C | T | 1.1 |
| | 166 | rs11070332 | 15 | 41809205 | RPAP1 | A | G | 1.05 |
| | 167 | rs2456530 | 15 | 53091553 | ONECUT1 | T | C | 1.06 |
| | 168 | rs117483894 | 15 | 57456802 | TCF12 | G | A | 1.1 |
| | 169 | rs8037894 | 15 | 62394264 | C2CD4A/B | G | C | 1.05 |
| | 170 | rs7178762 | 15 | 63871292 | USP3 | C | T | 1.04 |
| | 171 | rs4776970 | 15 | 68080886 | MAP2K5 | A | T | 1.04 |
| | 172 | rs13737 | 15 | 75932129 | IMP3 | G | T | 1.05 |
| | 173 | rs1005752 | 15 | 77818128 | HMG20A | A | C | 1.08 |
| | 174 | rs4932265 | 15 | 90423293 | AP3S2 | T | C | 1.07 |
| | 175 | rs12910825 | 15 | 91511260 | PRC1 | G | A | 1.05 |
| | 176 | rs6600191 | 16 | 295795 | ITFG3 | T | C | 1.06 |
| | 177 | rs3751837 | 16 | 3583173 | CLUAP1 | T | C | 1.04 |
| | 178 | rs8046545 | 16 | 28915217 | ATP2A1 | G | A | 1.04 |
| | 179 | rs11642430 | 16 | 30045789 | FAM57B | G | C | 1.04 |
| | 180 | rs1421085 | 16 | 53800954 | FTO | C | T | 1.13 |
| | 181 | rs862320 | 16 | 69651866 | NFAT5 | C | T | 1.04 |
| | 182 | rs72802342 | 16 | 75234872 | CTRB2 | C | A | 1.17 |
| | 183 | rs2925979 | 16 | 81534790 | CMIP | T | C | 1.05 |
| | 184 | rs12920022 | 16 | 89564055 | SPG7 | A | T | 1.05 |
| | 185 | rs1377807 | 17 | 4045440 | ZZEF1 | C | G | 1.05 |
| | 186 | rs7222481 | 17 | 9785187 | GLP2R | C | G | 1.04 |
| | 187 | rs4925109 | 17 | 17661802 | RAI1 | A | G | 1.05 |
| | 188 | rs71372253 | 17 | 29413019 | NF1 | C | T | 1.08 |
| | 189 | rs10908278 | 17 | 36099952 | HNF1B | T | A | 1.08 |
| | 190 | rs34855406 | 17 | 40731411 | FAM134C | C | G | 1.05 |
| | 191 | rs35895680 | 17 | 47060322 | IGF2BP1 | C | A | 1.06 |
| | 192 | rs60276348 | 17 | 62203304 | ERN1 | T | C | 1.05 |
| | 193 | rs61676547 | 17 | 65892507 | BPTF | C | G | 1.06 |
| | 194 | rs7240767 | 18 | 7070642 | LAMA1 | C | T | 1.04 |
| | 195 | rs62080313 | 18 | 36278709 | CELF4 | C | T | 1.06 |
| | 196 | rs72926932 | 18 | 53050646 | TCF4 | C | A | 1.09 |
| | 197 | rs17684074 | 18 | 54675384 | WDR7 | G | C | 1.04 |
| | 198 | rs9957145 | 18 | 56876228 | GRP | G | A | 1.05 |
| | 199 | rs523288 | 18 | 57848369 | MC4R | T | A | 1.05 |
| | 200 | rs12454712 | 18 | 60845884 | BCL2 | T | C | 1.05 |
| | 201 | rs7249758 | 19 | 4948862 | UHRF1 | A | G | 1.05 |
| | 202 | rs75253922 | 19 | 7240848 | INSR | C | T | 1.05 |
| | 203 | rs4804833 | 19 | 7970635 | MAP2K7 | A | G | 1.05 |
| | 204 | rs3111316 | 19 | 13038415 | FARSA | A | G | 1.05 |
| | 205 | rs8107974 | 19 | 19388500 | SUGP1 | T | A | 1.1 |
| | 206 | rs10406327 | 19 | 33890838 | PEPD | C | G | 1.04 |
| | 207 | rs429358 | 19 | 45411941 | TOMM40/APOE | T | C | 1.08 |
| | 208 | rs10406431 | 19 | 46157019 | GIPR | A | G | 1.05 |
| | 209 | rs3810291 | 19 | 47569003 | ZC3H4 | A | G | 1.05 |
| | 210 | rs13041756 | 20 | 21466795 | NKX2.2 | C | T | 1.06 |
| | 211 | rs2268078 | 20 | 32596704 | RALY | A | G | 1.04 |
| | 212 | rs1800961 | 20 | 43042364 | HNF4A | T | C | 1.18 |
| | 213 | rs6063048 | 20 | 45598564 | EYA2 | G | A | 1.05 |
| | 214 | rs11699802 | 20 | 48832135 | CEBPB | C | T | 1.04 |
| | 215 | rs34454109 | 20 | 51223594 | TSHZ2 | A | T | 1.04 |
| | 216 | rs6070625 | 20 | 57394628 | GNAS | G | C | 1.05 |
| | 217 | rs6518681 | 22 | 30609554 | HORMAD2/LIF | G | A | 1.09 |
| | 218 | rs117001013 | 22 | 32348841 | YWHAH | C | T | 1.07 |
| | 219 | rs5758223 | 22 | 41489920 | EP300 | A | G | 1.04 |
| | 220 | rs738408 | 22 | 44324730 | PNPLA3 | T | C | 1.05 |
| | 221 | rs1801645 | 22 | 50356850 | PIM3 | C | T | 1.04 |
| Type 1 Diabetes | 1 | rs2476601 | 1 | 114377568 | PTPN22 | A | G | 1.89 |
| | 2 | rs6691977 | 1 | 200814959 | CAMSAP2 | C | T | 1.13 |
| | 3 | rs3024505 | 1 | 206939904 | IL10 | G | A | 1.16 |
| | 4 | rs13415583 | 2 | 100764087 | AFF3 | T | G | 1.11 |
| | 5 | rs4849135 | 2 | 111615079 | ACOXL | G | T | 1.12 |
| | 6 | rs2111485 | 2 | 163110536 | IFIH1 | G | A | 1.18 |
| | 7 | rs35667974 | 2 | 163124637 | IFIH1 | T | C | 1.69 |
| | 8 | rs72871627 | 2 | 163136942 | IFIH1 | A | G | 1.64 |
| | 9 | rs3087243 | 2 | 204738919 | CTLA4 | G | A | 1.19 |
| | 10 | rs113010081 | 3 | 46457412 | CCRL2 | T | C | 1.18 |
| | 11 | rs2611215 | 4 | 166574267 | CPE/LINC01179 | A | G | 1.18 |
| | 12 | rs11954020 | 5 | 35883251 | IL7R | G | C | 1.11 |
| | 13 | rs72928038 | 6 | 90976768 | BACH2 | A | G | 1.20 |
| | 14 | rs1538171 | 6 | 126752884 | CENPW | G | C | 1.12 |
| | 15 | rs62447205 | 7 | 50465830 | IKZF1 | A | G | 1.12 |
| | 16 | rs10277986 | 7 | 51028987 | GRB10/COBL | A | T | 1.32 |
| | 17 | rs6476839 | 9 | 4290823 | GLIS3 | T | A | 1.12 |
| | 18 | rs61839660 | 10 | 6094697 | IL2RA | C | T | 1.61 |
| | 19 | rs10795791 | 10 | 6108340 | IL2RA | G | A | 1.16 |
| | 20 | rs41295121 | 10 | 6129643 | IL2RA | C | T | 2.04 |
| | 21 | rs12416116 | 10 | 90035654 | RNLS | C | A | 1.18 |
| | 22 | rs689 | 11 | 2182224 | INS | T | A | 2.38 |
| | 23 | rs72853903 | 11 | 2198665 | INS | C | T | 1.18 |
| | 24 | rs917911 | 12 | 9905851 | CD69 | C | A | 1.10 |
| | 25 | rs705705 | 12 | 56435504 | RPS26 | C | G | 1.25 |
| | 26 | rs653178 | 12 | 112007756 | ATXN2 | C | T | 1.30 |
| | 27 | rs9585056 | 13 | 100081766 | UBAC2 | C | T | 1.12 |
| | 28 | rs1456988 | 14 | 98488007 | LINC01550 | G | T | 1.12 |
| | 29 | rs56994090 | 14 | 101306447 | MEG3 | T | C | 1.14 |
| | 30 | rs72727394 | 15 | 38847022 | RASGRP1 | T | C | 1.15 |
| | 31 | rs34593439 | 15 | 79234957 | CTSH | G | A | 1.28 |
| | 32 | rs12927355 | 16 | 11194771 | CLEC16A | C | T | 1.22 |
| | 33 | rs193778 | 16 | 11351211 | RMI2 | G | A | 1.14 |
| | 34 | rs8056814 | 16 | 75252327 | CTRB1/BCAR1 | A | G | 1.32 |
| | 35 | rs12453507 | 17 | 38053207 | IKZF3, ORMDL3, GSDM | G | C | 1.11 |
| | 36 | rs757411 | 17 | 38775150 | CCR7B | T | C | 1.11 |
| | 37 | rs1893217 | 18 | 12809340 | PTPN2 | G | A | 1.21 |
| | 38 | rs12971201 | 18 | 12830538 | PTPN2 | G | A | 1.12 |
| | 39 | rs1615504 | 18 | 67526644 | CD226 | T | C | 1.13 |
| | 40 | rs34536443 | 19 | 10463118 | TYK2 | G | C | 1.49 |
| | 41 | rs12720356 | 19 | 10469975 | TYK2 | A | C | 1.22 |
| | 42 | rs402072 | 19 | 47219122 | PRKD2 | T | C | 1.15 |
| | 43 | rs516246 | 19 | 49206172 | FUT2 | T | C | 1.15 |
| | 44 | rs6043409 | 20 | 1616206 | SIRPG | G | A | 1.14 |
| | 45 | rs11203202 | 21 | 43825357 | UBASH3A | G | C | 1.16 |
| | 46 | rs6518350 | 21 | 45621817 | ICOSLG | A | G | 1.14 |
| | 47 | rs4820830 | 22 | 30531091 | HORMAD2 | C | T | 1.14 |
| | 48 | rs229533 | 22 | 37587111 | C1QTNF6, RAC2 | C | A | 1.11 |
| HOMA-IR (Vassy et al. Diabetes. 2014) | 1 | rs780094 | 2 | 27741237 | GCKR | C | T | 0.02 |
| | 2 | rs13389219 | 2 | 165528876 | GRB14 | C | T | 0.012 |
| | 3 | rs2943640 * | 2 | 227093585 | IRS1 | C | A | 0.009 |
| | 4 | rs1801282 * | 3 | 12393125 | PPARG | C | G | 0.016 |
| | 5 | rs459193 * | 5 | 55806751 | ANKRD55 | G | A | 0.012 |
| | 6 | rs13233731 * | 7 | 130437689 | KLF14 | G | A | 0.008 |
| | 7 | rs2261181 | 12 | 66212318 | HMGA2 | T | C | 0.014 |
| | 8 | rs9936385 | 16 | 53819169 | FTO | C | T | 0.015 |
| | 9 | rs12970134 | 18 | 57884750 | MC4R | A | G | 0.008 |
| | 10 | rs8182584 | 19 | 33909710 | PEPD | T | G | 0.012 |
| Insulin Action (Scott, et al. Diabetes. 2017) | 1 | rs3768321 | 1 | 40035928 | MACF1 | T | G | 1.08 |
| | 2 | rs2972156 * | 2 | 227117778 | IRS1 | G | C | 1.08 |
| | 3 | rs11712037 * | 3 | 12344730 | PPARG | C | G | 1.14 |
| | 4 | rs11747901 | 5 | 53301561 | ARL15 | G | C | 1.07 |
| | 5 | rs9687833 | 5 | 55861601 | ANKRD55 | A | G | 1.1 |
| | 6 | rs173964 * | 5 | 55809465 | ANKRD55 | G | A | 1.06 |
| | 7 | rs10954284 * | 7 | 130463758 | KLF14 | T | A | 1.06 |
| HOMA-B (Vassy et al. Diabetes. 2014) | 1 | rs2075423 * | 1 | 214154719 | PROX1 | G | T | 0.013 |
| | 2 | rs10203174 * | 2 | 43690030 | THADA | C | T | 0.026 |
| | 3 | rs1496653 | 3 | 23454790 | UBE2E2 | A | G | 0.009 |
| | 4 | rs11717195 * | 3 | 123082398 | ADCY5 | T | C | 0.018 |
| | 5 | rs4402960 * | 3 | 185511687 | IGF2BP2 | T | G | 0.012 |
| | 6 | rs6819243 | 4 | 1293245 | MAEA | T | C | 0.025 |
| | 7 | rs7756992 * | 6 | 20679709 | CDKAL1 | G | A | 0.01 |
| | 8 | rs17168486 | 7 | 14898282 | DGKB | T | C | 0.013 |
| | 9 | rs10278336 * | 7 | 44245363 | GCK | A | G | 0.0128 |
| | 10 | rs3802177 * | 8 | 118185025 | SLC30A8 | G | A | 0.016 |
| | 11 | rs10758593 * | 9 | 4292083 | GLIS3 | A | G | 0.015 |
| | 12 | rs10811661 * | 9 | 22134094 | CDKN2A/B | T | C | 0.009 |
| | 13 | rs11257655 * | 10 | 12307894 | CDC123 | T | C | 0.009 |
| | 14 | rs1111875 * | 10 | 94462882 | HHEX/IDE | C | T | 0.004 |
| | 15 | rs7903146 * | 10 | 114758349 | TCF7L2 | T | C | 0.02 |
| | 16 | rs163184 | 11 | 2847069 | KCNQ1 | G | T | 0.009 |
| | 17 | rs5215 | 11 | 17408630 | KCNJ11 | T | C | 0.001 |
| | 18 | rs1552224 | 11 | 72433098 | ARAP1 | A | C | 0.017 |
| | 19 | rs10830963 | 11 | 92708710 | MTNR1B | G | C | 0.039 |
| | 20 | rs4502156 * | 15 | 62383155 | VPS13C | A | G | 0.01 |
| Insulin Secretion (Scott, et al. Diabetes. 2017) | 1 | rs340874 * | 1 | 214159256 | PROX1 | C | T | 1.07 |
| | 2 | rs6757251 * | 2 | 43734847 | THADA | C | T | 1.14 |
| | 3 | rs11708067 * | 3 | 123065778 | ADCY5 | A | G | 1.12 |
| | 4 | rs4402960 * | 3 | 185511687 | IGF2BP2 | T | G | 1.15 |
| | 5 | rs7451008 * | 6 | 20673880 | CDKAL1 | C | T | 1.19 |
| | 6 | rs10238625 | 7 | 15054232 | DGKB | A | G | 1.07 |
| | 7 | rs878521 * | 7 | 44255643 | GCK | A | G | 1.05 |
| | 8 | rs3802177 * | 8 | 118185025 | SLC30A8 | G | A | 1.12 |
| | 9 | rs10758593 * | 9 | 4292083 | GLIS3 | A | G | 1.05 |
| | 10 | rs10965248 * | 9 | 22132878 | CDKN2A/B | T | C | 1.15 |
| | 11 | rs11257659 * | 10 | 12309269 | CDC123/CAMK1D | T | C | 1.08 |
| | 12 | rs11187140 * | 10 | 94466910 | HHEX/IDE | G | A | 1.14 |
| | 13 | rs7903146 * | 10 | 114758349 | TCF7L2 | T | C | 1.34 |
| | 14 | rs4774420 * | 15 | 62117975 | C2CD4A | C | T | 1.08 |
| 2hPG | 1 | rs1260326 | 2 | 27730940 | GCKR | T | C | 0.07 |
| | 2 | rs2877716 | 3 | 123094451 | ADCY5 | C | T | 0.09 |
| | 3 | rs7651090 | 3 | 185513392 | IGF2BP2 | G | A | 0.058 |
| | 4 | rs1019503 | 5 | 96254817 | ERAP2 | A | G | 0.063 |
| | 5 | rs6975024 | 7 | 44231886 | GCK | C | T | 0.103 |
| | 6 | rs11782386 | 8 | 9201787 | PPP1R3B | C | T | 0.099 |
| | 7 | rs7903146 | 10 | 114758349 | TCF7L2 | T | C | 0.118 |
| | 8 | rs17271305 | 15 | 62332980 | VPS13C | G | A | 0.06 |
| | 9 | rs10423928 | 19 | 46182304 | GIPR | A | T | 0.09 |
| FPG | 1 | rs340874 | 1 | 214159256 | PROX1 | C | T | 0.013 |
| | 2 | rs780094 | 2 | 27741237 | GCKR | C | T | 0.029 |
| | 3 | rs560887 | 2 | 169763148 | G6PC2 | C | T | 0.075 |
| | 4 | rs11715915 | 3 | 49455330 | AMT | C | T | 0.012 |
| | 5 | rs11708067 | 3 | 123065778 | ADCY5 | A | G | 0.027 |
| | 6 | rs11920090 | 3 | 170717521 | SLC2A2 | T | A | 0.02 |
| | 7 | rs7651090 | 3 | 185513392 | IGF2BP2 | G | A | 0.013 |
| | 8 | rs7708285 | 5 | 76425867 | ZBED3 | G | A | 0.011 |
| | 9 | rs4869272 | 5 | 95539448 | PCSK1 | T | C | 0.018 |
| | 10 | rs17762454 | 6 | 7213200 | RREB1 | T | C | 0.012 |
| | 11 | rs9368222 | 6 | 20686996 | CDKAL1 | A | C | 0.014 |
| | 12 | rs2191349 | 7 | 15064309 | DGKB-TMEM195 | T | G | 0.03 |
| | 13 | rs4607517 | 7 | 44235668 | GCK | A | G | 0.062 |
| | 14 | rs6943153 | 7 | 50791579 | GRB10 | T | C | 0.015 |
| | 15 | rs983309 | 8 | 9177732 | PPP1R3B | T | G | 0.026 |
| | 16 | rs13266634 | 8 | 118184783 | SLC30A8 | C | T | 0.027 |
| | 17 | rs7034200 | 9 | 4289050 | GLIS3 | A | C | 0.018 |
| | 18 | rs10811661 | 9 | 22134094 | CDKN2B | T | C | 0.024 |
| | 19 | rs16913693 | 9 | 111680359 | IKBKAP | T | G | 0.043 |
| | 20 | rs3829109 | 9 | 139256766 | DNLZ | G | A | 0.017 |
| | 21 | rs10885122 | 10 | 113042093 | ADRA2A | G | T | 0.022 |
| | 22 | rs7903146 | 10 | 114758349 | TCF7L2 | T | C | 0.023 |
| | 23 | rs11605924 | 11 | 45873091 | CRY2 | A | C | 0.015 |
| | 24 | rs7944584 | 11 | 47336320 | MADD | A | T | 0.021 |
| | 25 | rs174550 | 11 | 61571478 | FADS1 | T | C | 0.017 |
| | 26 | rs11603334 | 11 | 72432985 | ARAP1 | G | A | 0.019 |
| | 27 | rs10830963 | 11 | 92708710 | MTNR1B | G | C | 0.067 |
| | 28 | rs2657879 | 12 | 56865338 | GLS2 | G | A | 0.012 |
| | 29 | rs10747083 | 12 | 133041618 | P2RX2 | A | G | 0.013 |
| | 30 | rs11619319 | 13 | 28487599 | PDX1 | G | A | 0.02 |
| | 31 | rs576674 | 13 | 33554302 | KL | G | A | 0.017 |
| | 32 | rs3783347 | 14 | 100839261 | WARS | G | T | 0.017 |
| | 33 | rs11071657 | 15 | 62433962 | C2CD4B | A | G | 0.008 |
| | 34 | rs2302593 | 19 | 46196634 | GIPR | C | G | 0.014 |
| | 35 | rs6113722 | 20 | 22557099 | FOXA2 | G | A | 0.035 |
| | 36 | rs6072275 | 20 | 39743905 | TOP1 | A | G | 0.016 |
| Autoimmune Thyroid | 1 | rs2843403 | 1 | 2529097 | MMEL1 | C | T | 1.19 |
| | 2 | rs1534422 | 2 | 12640741 | TRIB2 | G | A | 1.16 |
| | 3 | rs13093110 | 3 | 188125120 | LPP | T | C | 1.18 |
| | 4 | rs72928038 | 6 | 90976768 | BACH2 | A | G | 1.21 |
| | 6 | rs4409785 | 11 | 95311422 | FAM76B | C | T | 1.21 |
| | 7 | rs4768412 | 12 | 42869140 | PRICKLE1 | T | C | 1.19 |
| | 8 | rs57348955 | 16 | 31185882 | ITGAM | G | A | 1.20 |
| Islet Autoimmunity | 1 | rs6679677 | 1 | 114303808 | PTPN22 | A | C | 1.46 |
| | 2 | rs11705721 | 3 | 58400414 | PXK/PDHB | G | A | 1.27 |
| | 3 | rs73043122 | 6 | 167383267 | RNASET2/MIR3939 | C | G | 1.42 |
| | 4 | rs113306148 | 10 | 124159838 | PLEKHA1/MIR3941 | A | G | 1.97 |
| | 5 | rs3842727 | 11 | 2184848 | INS/TH | A | C | 1.34 |
| | 6 | rs3184504 | 12 | 111884608 | SH2B3 | A | G | 1.35 |
| | 7 | rs9934817 | 16 | 6136219 | RBFOX1 | G | A | 2.66 |
| | 8 | rs428595 | 22 | 22016391 | PPIL2 | A | G | 2.47 |
| Hemoglobin A1c | 1 | rs2779116 | 1 | 158585415 | SPTA1 | T | C | 0.024 |
| | 2 | rs552976 | 2 | 169791438 | G6PC2/ABCB11 | G | A | 0.047 |
| | 3 | rs1800562 | 6 | 26093141 | HFE | G | A | 0.063 |
| | 4 | rs1799884 | 7 | 44229068 | GCK | T | C | 0.038 |
| | 5 | rs4737009 | 8 | 41630405 | ANK1 | A | G | 0.058 |
| | 6 | rs16926246 | 10 | 71093392 | HK1 | C | T | 0.089 |
| | 7 | rs1387153 | 11 | 92673828 | MTNR1B | T | C | 0.028 |
| | 8 | rs7998202 | 13 | 113331868 | ATP11A/TUBGCP3 | G | A | 0.031 |
| | 9 | rs1046896 | 17 | 80685533 | FN3K | T | C | 0.035 |
| | 10 | rs855791 | 22 | 37462936 | TMPRSS6 | A | G | 0.027 |

* HOMA-IR: in high LD with variant in insulin action PRS. Insulin Action: in high LD with variant in HOMA-IR PRS. HOMA-B: in high LD with variant in insulin secretion PRS. Insulin Secretion: in high LD with variant in HOMA-B PRS.
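Table S2.6 gives, for each score, a risk allele and a weight per variant. A minimal sketch of how such a list could be turned into a per-person score follows. It assumes the common additive form (risk-allele count multiplied by log(OR), summed over variants); this section does not state the exact scoring formula used, so the form is an assumption. The function names, the two-letter genotype encoding, and the example individual are hypothetical. The excluded loci are those named in the note under Table S2.7 for the "without known" scores.

```python
import math

# A few Type 2 Diabetes rows from Table S2.6: rsID -> (nearest gene, risk allele, odds ratio).
T2D_VARIANTS = {
    "rs7903146": ("TCF7L2", "T", 1.37),
    "rs10811660": ("CDKN2A/B", "G", 1.27),
    "rs1260326": ("GCKR", "C", 1.07),
    "rs3802177": ("SLC30A8", "G", 1.11),
}

# Loci removed in the "without known" scores, per the note under Table S2.7.
KNOWN_LOCI = {"TCF7L2", "CDKAL1", "IGF2BP2", "CDKN2A/B"}


def drop_known_loci(variants, known=KNOWN_LOCI):
    """Return the variants whose nearest-gene annotation is not in `known` (hypothetical helper)."""
    return {rsid: v for rsid, v in variants.items() if v[0] not in known}


def polygenic_risk_score(genotypes, variants):
    """Sum risk-allele counts weighted by log(OR); variants with no genotype are skipped.

    Assumed additive scoring, not necessarily the formula used in the dissertation.
    """
    score = 0.0
    for rsid, (_gene, risk_allele, odds_ratio) in variants.items():
        genotype = genotypes.get(rsid)
        if genotype is None:
            continue
        score += genotype.count(risk_allele) * math.log(odds_ratio)
    return score


# Hypothetical individual: two-letter unphased genotypes keyed by rsID.
person = {"rs7903146": "CT", "rs10811660": "GG", "rs1260326": "TT", "rs3802177": "GA"}

full_score = polygenic_risk_score(person, T2D_VARIANTS)
reduced_score = polygenic_risk_score(person, drop_known_loci(T2D_VARIANTS))
print(round(full_score, 3), round(reduced_score, 3))
```

For the quantitative traits in the table (for example HOMA-B or FPG), the listed effect size would presumably be used directly as the weight rather than its logarithm; that, too, is an assumption of this sketch.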

Table S2.7. Polygenic Risk Score (PRS) association statistics.

| Phenotype | Variant selection | # of SNPs | Reference | p-value | HR |
|---|---|---|---|---|---|
| T2D | Genome-wide significant variants | 221 | Mahajan, et al. Nat Genet. 2018. | 8.84E-17 | 1.285 |
| | Genome-wide significant variants, without known | 217 | Mahajan, et al. Nat Genet. 2018. | 1.45E-06 | 1.194 |
| | Genome-wide (PRSice-2) | 1067 | Mahajan, et al. Nat Genet. 2018. | 1.80E-18 | 1.297 |
| T1D | Genome-wide significant variants | 48 | Onengut-Gumuscu, et al. Nat Genet. 2015. | 2.45E-02 | 1.077 |
| | Genome-wide (PRSice-2) | 1029 | Onengut-Gumuscu, et al. Nat Genet. 2015. | 7.09E-05 | 1.127 |
| Insulin resistance | HOMA-IR selected variants | 10 | Vassy, et al. Diabetes. 2014. | 9.03E-01 | 1.004 |
| | Insulin Action selected variants | 7 | Scott, et al. Diabetes. 2017. | 9.49E-01 | 0.998 |
| | HOMA-IR genome-wide (PRSice-2) | 233 | Dupuis, et al. Nat Genet. 2010. | 2.00E-02 | 1.070 |
| Insulin secretion | HOMA-B selected variants | 20 | Vassy, et al. Diabetes. 2014. | 1.47E-09 | 1.192 |
| | HOMA-B selected variants, without known | 16 | Vassy, et al. Diabetes. 2014. | 9.92E-03 | 1.081 |
| | Insulin Secretion selected variants | 14 | Scott, et al. Diabetes. 2017. | 7.57E-18 | 1.247 |
| | Insulin Secretion selected variants, without known | 10 | Scott, et al. Diabetes. 2017. | 1.20E-02 | 1.078 |
| | HOMA-B genome-wide (PRSice-2) | 3940 | Dupuis, et al. Nat Genet. 2010. | 1.50E-03 | 1.103 |
| 2hPG | Selected variants | 9 | Stančáková, et al. Diabetologia. 2017. | 2.99E-08 | 1.178 |
| | Selected variants, without known | 7 | Stančáková, et al. Diabetologia. 2017. | 7.86E-02 | 1.050 |
| FPG | Selected variants | 36 | Stančáková, et al. Diabetologia. 2017. | 1.25E-04 | 1.112 |
| | Selected variants, without known | 32 | Stančáková, et al. Diabetologia. 2017. | 5.70E-02 | 1.055 |
| Thyroid autoimmunity | Selected variants | 7 | Cooper, et al. Hum Mol Genet. 2012. | 6.60E-01 | 1.013 |
| Islet Autoimmunity | Selected variants | 8 | Sharma, et al. J Autoimmun. 2018. | 4.66E-01 | 1.026 |
| Hemoglobin A1C | Selected variants | 10 | Soranzo, et al. Diabetes. 2010. | 6.79E-01 | 1.012 |

Results of associations for PRSs of each disease or trait with CFRD are shown. Any variants annotated to TCF7L2, CDKAL1, IGF2BP2 or CDKN2A/B were removed in "without known" PRSs.

Table S2.8. Variants included in the CFRD PRS.

| rsID | Gene | Chr | Position (hg19) | Reason for Inclusion |
|---|---|---|---|---|
| rs4077468 | SLC26A9 | 1 | 205914757 | GW-sig |
| rs838455 | PTMA | 2 | 232560638 | GW-sig |
| rs7903146 | TCF7L2 | 10 | 114758349 | GW-sig, and significant FDR p-value in HOMA-B and T2D PRS |
| rs7756992 | CDKAL1 | 6 | 20679709 | Significant FDR p-value in HOMA-B and T2D PRS |
| rs11708067 | ADCY5 | 3 | 123065778 | Significant FDR p-value in HOMA-B and T2D PRS |
| rs10811660 | CDKN2A/B | 9 | 22134068 | Significant FDR p-value in HOMA-B and T2D PRS |
| rs3802177 | SLC30A8 | 8 | 118185025 | Significant FDR p-value in HOMA-B and T2D PRS |
| rs6819243 | MAEA | 4 | 1293245 | Significant FDR p-value in HOMA-B PRS |
| rs6780171 | IGF2BP2 | 3 | 185503456 | Significant FDR p-value in HOMA-B PRS |
| rs10974438 | GLIS3 | 9 | 4291928 | Significant FDR p-value in HOMA-B and T2D PRS |
| rs17168486 | DGKB | 7 | 14898282 | Significant FDR p-value in HOMA-B PRS |
| rs11699802 | CEBPB | 20 | 48832135 | Significant FDR p-value in T2D PRS |
| rs11070332 | LTK | 15 | 41809205 | Significant FDR p-value in T2D PRS |
| rs9873618 | SLC2A2 | 3 | 170733076 | Significant FDR p-value in T2D PRS |
| rs13262861 | ANK1 | 8 | 41508577 | Significant FDR p-value in T2D PRS |
| rs72802342 | BCAR1 | 16 | 75234872 | Significant FDR p-value in T2D PRS |
| rs67232546 | ETS1 | 11 | 128398938 | Significant FDR p-value in T2D PRS |
| rs13085136 | SHQ1 | 3 | 72865183 | Significant FDR p-value in T2D PRS |

GW-sig: genome-wide significant. Some variants in the HOMA-B PRS were in high LD with a variant in the T2D PRS; in this case the T2D variant was selected.

Table S2.9. Association summary statistics for F508del homozygotes only, non-F508del homozygotes only, and F508del homozygosity * SNP interaction term analyses in variants that exceeded genome-wide significance.

| rsID | Chr | Position (hg19) | F508del homozygotes (n=3860) p-value | F508del homozygotes (n=3860) HR | Non-F508del homozygotes (n=1880) p-value | Non-F508del homozygotes (n=1880) HR | F508del homozygosity * SNP (n=5740) p-value | F508del homozygosity * SNP (n=5740) HR |
|---|---|---|---|---|---|---|---|---|
| rs4077468 | 1 | 205914757 | 1.30E-07 | 0.70 | 1.70E-02 | 0.78 | 0.44 | 0.91 |
| rs838455 | 2 | 232560638 | 3.59E-06 | 1.72 | 2.93E-03 | 1.70 | 0.94 | 1.02 |
| rs34872471 | 10 | 114754071 | 3.16E-09 | 1.50 | 1.41E-04 | 1.51 | 0.99 | 1.00 |

The top variant at each locus is shown.


Chapter 3

Common variants in CDKN2B-AS1 delay onset of CFRD in females

3.1 Introduction

Females with cystic fibrosis have worse outcomes than males, including poorer survival, worse lung function, and earlier onset of cystic fibrosis-related diabetes (CFRD)(1). Women with CF had shorter life expectancies than men according to data from the United States Cystic Fibrosis Foundation Patient Registry (CFFPR) from 1988 through 1992 (n = 21,047; approximately 85% of all US patients diagnosed with CF at the time)(70). Females were 60% more likely to die than males from ages 1-20, and this difference remained even after adjusting for potential confounders: nutritional status, pulmonary function, airway microbiology, pancreatic insufficiency, age at diagnosis, mode of presentation, and race.

Similarly, the proportion of adult females with CFRD was higher at every age than the proportion of males (log-rank test; P = 0.0008) in a study that reviewed clinical information of individuals with CF with severe genotypes followed at the University of Minnesota CF center from 2008 to 2012. Although severe CFTR genotypes were equally prevalent in adult males and females (81.4% vs. 81.3%; P = 0.972), 66% of females with a severe genotype developed CFRD compared with 54% of males with a severe genotype (P = 0.024). Thus, a severe genotype increased the risk of CFRD by a factor of about four in males and five in females(1).

Many studies have explored potential explanations for this “gender gap” (71), focusing on the role of female sex hormones, in particular estrogen. Estrogen has been shown to modulate ion channels on airway epithelium(72) and therefore puts females with CF at a greater disadvantage than males by enhancing mucus viscosity. In addition, estrogen has been shown to promote conversion of P. aeruginosa from a nonmucoid to a mucoid form in CF, which is the more drug-resistant and pathogenic form (73). However, sex hormones are likely not the only explanation for this gender gap, as the gap has been observed before pubertal hormone surges. Others believe the gender disparity has more to do with differences in morphometrics, such as airway diameter or airway branch angles; however, this would not explain the sex difference in cystic fibrosis-related diabetes.

In Chapter 2, we identified genetic variants that modify the age of onset of CFRD. Motivated by this, in this chapter we tested whether these genetic modifiers of CFRD contributed to the gender gap in onset of CFRD in the same set of individuals (5,740 individuals with CF; 5,364 of whom are unrelated). To do so, we conducted male-only and female-only stratified analyses, as well as a model including a sex*genetic variant interaction term, in genome-wide association and candidate-based association tests.

3.2 Results

3.2.1 Variants at CDKN2B-AS1 have a female-specific association with CFRD

In our cohort, consistent with what has been observed in previous studies, females develop CFRD at a significantly younger age than males (p-value: 5.55e-20, HR: 2.11; Figure 3.1). To identify whether CFRD-associated variants at SLC26A9, PTMA, TCF7L2, CDKN2B-AS1, CDKAL1 and IGF2BP2 influence this gender gap, we conducted stratified analyses (male-only and female-only) and tested for association including a term for sex*SNP interaction at these loci.

Of these previously identified CFRD modifier loci, variants at SLC26A9, PTMA, TCF7L2, CDKAL1 and IGF2BP2 were not male or female specific. However, surprisingly, variants at CDKN2B-AS1 associated with earlier onset of CFRD in females (e.g., rs1333045; p-value: 2.2e-7, HR: 1.46, 95% CI: 1.27-1.69, effect allele: C), whereas they were not associated with CFRD in males (e.g., rs1333045; male-only analysis p-value: 0.18, HR: 1.11, 95% CI: 0.95-1.30) (Figure 3.2). Females of rs1333045 C/C or C/T genotypes developed CFRD at significantly earlier ages than did males (log rank p-values 5.04e-8 and 6.55e-14, respectively), while females with the T/T genotype had a lower rate of CFRD which did not differ from that of males (log rank p-value: 0.317). The sex*SNP interaction term associated with CFRD was also significant at this locus (e.g., rs1333045 p-value: 7.3e-3; HR: 1.37), indicating that the association differed significantly between males and females. The sex variable in this analysis associated with a similar effect size (p-value: 0.014, HR: 1.45). Thus, the effect size of this variant is sufficient to account for almost half of the observed difference in CFRD risk between males and females.
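The reported hazard ratios, confidence intervals and p-values for rs1333045 can be checked against one another. The following is a minimal sketch, assuming the intervals are Wald-type 95% intervals on the log hazard ratio and that the p-values come from a two-sided normal test; the function name `wald_p_from_hr` is hypothetical and this is not the code used in the study.

```python
import math

def wald_p_from_hr(hr, ci_low, ci_high, z_crit=1.96):
    """Approximate SE, z and two-sided p-value from an HR and its 95% CI.

    Assumes a Wald interval: log(HR) +/- z_crit * SE.
    """
    log_hr = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_hr / se
    # Two-sided tail probability of a standard normal variable.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# Values for rs1333045 as reported in the text above.
female_se, female_z, female_p = wald_p_from_hr(1.46, 1.27, 1.69)
male_se, male_z, male_p = wald_p_from_hr(1.11, 0.95, 1.30)
print(f"female-only: z = {female_z:.2f}, p = {female_p:.2e}")
print(f"male-only:   z = {male_z:.2f}, p = {male_p:.2f}")
```

Under these assumptions the back-calculated p-values fall close to the reported 2.2e-7 (females) and 0.18 (males); small differences are expected because the published HR and CI are rounded.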

The female-specific association at CDKN2A/B was consistently observed for variants in high LD, and across different subsets (Figure 3.3). When tested in an independent replication population (n=591), variants at this locus did not associate with CFRD in males or females (male-only p-value: 0.186, HR: 0.76; female-only p-value: 0.979; HR: 0.99). However, an interaction term analysis revealed that this association does not differ between the test population and the replication population in either males (p-value: 0.183) or females (p-value: 0.175), suggesting that the lack of replication may reflect limited power.

3.2.2 CFRD-associated variants at CDKN2B-AS1 are distinct from the T2D-associated variants at this locus

A detailed analysis of the locus architecture revealed physically distinct clusters of variants that also associate with T2D (Figure 3.4B) or with coronary artery disease (CAD) (Figure 3.4C). The variants most strongly associated with T2D at this locus (rs10965250; p-value: 7.60e-77; beta: -0.16 (18)) are weakly associated with CFRD in the same direction (p-value: 0.017; HR: 0.82), and conditioning on rs10965250 genotype did not affect the CFRD association (Figure 3.4D). The previously reported CFRD and T2D associated variant (rs1412829) lies within a secondary T2D-associated locus. Some decrease in CFRD signal was observed when conditioning on the secondary T2D variant (e.g., when adjusted for rs1412829, another nearby variant rs1333045 gave p-value: 0.0037; HR: 0.81) (Figure 3.4E). Notably, the CFRD-associated variants at CDKN2A/B (top variant: rs1333045; p-value: 2.4e-5; HR: 1.28) are the variants most significantly associated with risk of CAD(74); and conditioning on rs1333045 completely abolishes the signal for association with CFRD (Figure 3.4F). In summary, these analyses give the surprising result that the variants at CDKN2A/B associated with risk of CFRD appear to correlate with genetic susceptibility for CAD, to some extent with the other markers in this T2D cluster, and not with the primary T2D cluster at the CDKN2A/B locus.

3.2.3 Identification of additional sex-specific genetic modifiers of CFRD through GWASs

To identify additional sex-specific loci, we conducted a genome-wide analysis considering interaction with sex (Figure 3.5). This analysis revealed a rare (MAF=3.13%) variant intronic of CNTN3 exceeding genome-wide significance (rs79480701; interaction term p-value: 3.5e-8), which was associated with CFRD in both males and females, but with effects in opposite directions. The common allele (C) associated with later CFRD onset in males (male-only p-value: 4.7e-4, HR: 0.49), and with younger CFRD onset in females (female-only p-value: 8.7e-6, HR: 3.19), which is consistent with the younger onset of CFRD observed in females. Because this variant was imputed (imputation quality score > 0.97), we Sanger sequenced 52 individuals (1 T/T, 8 T/C and 43 C/C individuals) at this locus, and found only 1 of these individuals was imputed incorrectly (imputed dosage: 0.25, actual genotype: T/C), confirming that the variant imputation was of high quality. In an independent replication sample of 591 individuals, this variant did not associate with CFRD in males (male-only p-value: 0.7405, HR: 0.829) or females (female-only p-value: 0.20, HR: 2.12). However, the direction of the effect was consistent with the test population, and interaction term analyses revealed that the associations in the test and replication populations did not differ (p-values 0.90 and 0.54 for male-only and female-only, respectively). The lack of replication likely reflects limited power, as only 1 individual was homozygous and 33 individuals were heterozygous for the rare allele of this variant in the replication population.
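The small number of rare-allele carriers in the replication sample is roughly what would be expected from the allele frequency alone. The following sketch illustrates this, assuming Hardy-Weinberg equilibrium and assuming the replication sample has the same MAF as the test population (3.13%); both are assumptions made for illustration, and the function name is hypothetical.

```python
def expected_genotype_counts(n, maf):
    """Expected genotype counts under Hardy-Weinberg equilibrium."""
    q = maf          # rare-allele frequency
    p = 1.0 - q      # common-allele frequency
    return {
        "hom_common": n * p * p,
        "het": n * 2 * p * q,
        "hom_rare": n * q * q,
    }

# Replication sample size and MAF as given in the text.
counts = expected_genotype_counts(n=591, maf=0.0313)
for genotype, expected in counts.items():
    print(f"{genotype}: {expected:.1f}")
```

Under these assumptions about 36 heterozygotes and fewer than 1 rare-allele homozygote are expected, in line with the 33 heterozygotes and 1 homozygote observed.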

Additionally, we conducted male-only and female-only GWASs for CFRD (Figure 3.5). In the male-only analysis, variants intronic of KDM2B exceeded genome-wide significance, whereas in females there was no evidence of association (rs7962571; MAF: 5.8%, male-only p-value: 3.0e-8, HR: 4.5; female-only p-value: 0.34, HR: 0.6). However, all of the associated variants were imputed, with a lower imputation quality (imputation quality r-squared range: 0.39-0.52). Sequencing of 49 of these individuals (17 A/G and 32 G/G individuals) revealed 4 individuals were imputed incorrectly (imputed as heterozygous but sequenced as G/G). For this reason, this variant must be genotyped in a larger group of individuals to be verified as truly male-specific.
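The Sanger validation results for the two imputed variants can be summarized as genotype concordance rates. This is a small illustrative calculation using only the counts reported above (52 sequenced with 1 discordant for the CNTN3 variant; 49 sequenced with 4 discordant for the KDM2B variant); the function name is hypothetical.

```python
def genotype_concordance(n_sequenced, n_discordant):
    """Fraction of sequenced individuals whose imputed genotype was correct."""
    return (n_sequenced - n_discordant) / n_sequenced

cntn3 = genotype_concordance(n_sequenced=52, n_discordant=1)  # rs79480701
kdm2b = genotype_concordance(n_sequenced=49, n_discordant=4)  # rs7962571
print(f"CNTN3 rs79480701: {cntn3:.1%}")
print(f"KDM2B rs7962571:  {kdm2b:.1%}")
```

The lower concordance at KDM2B is consistent with its lower imputation quality, although with samples of this size the estimates are imprecise.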

In the female-only analysis, two loci exceeded genome-wide significance: variants at TCF7L2 (top variant: rs34872471) and a variant intronic of USP40 (rs838563). The variants at TCF7L2 were not female-specific (e.g., rs34872471; female-only p-value: 2.4e-9, HR: 1.57; male-only p-value: 8.6e-6, HR: 1.45; interaction term p-value: 0.37; HR: 0.89), demonstrating no statistically significant sex difference for the TCF7L2 association with CFRD. The variant in USP40 associated with CFRD in females but not males (male-only p-value: 0.34, HR: 0.69; female-only p-value: 3.9e-8, HR: 5.69). However, this variant was imputed (imputation quality r-squared range: 0.36-0.45), and had no other variants nearby showing association with CFRD. Sanger sequencing of 50 individuals (32 G/G, 16 A/G and 2 A/A) demonstrated imputation errors, suggesting that this signal may be spurious.

3.3 Discussion

This study provides evidence that genetic modifiers could explain at least part of the gender gap in age of onset of CFRD. Common genetic variants at CDKN2B-AS1 were associated with CFRD onset in females, but not males; this difference accounted for almost half of the observed difference in CFRD risk between males and females in our population. In addition, a rare variant at CNTN3 (MAF: 3.13%) associated with CFRD in males and females in opposite directions, and could also be contributing to the gender gap in risk of CFRD.

The T2D association of variants near CDKN2B-AS1 in a mostly European population is not sex-specific (e.g., rs1333045; female-only p-value: 0.009, OR: 0.95, 95% CI: 0.92-0.98; male-only p-value: 0.014, OR: 0.96, 95% CI: 0.94-0.99; data taken from (75)). Additionally, the T2D associations of the other two LD blocks in this region in the same population are also not female-specific (e.g., rs10811661; female-only p-value: 1.2e-13, OR: 0.85, 95% CI: 0.82-0.89; male-only p-value: 3.9e-13, OR: 0.86, 95% CI: 0.83-0.90; data taken from (75)). Interestingly, in a (North African) Tunisian Arabic population, the association at rs10811661 has been reported to be female-specific(76).

CDKN2A/B is a well-studied locus spanning >50kb which harbors at least 2 independent LD blocks(77). It has been shown to harbor variants influencing many diseases including T2D, CAD, and many types of cancer(78), though it is not clear why this region might influence the risk of so many diseases. This locus harbors 3 genes: CDKN2A, CDKN2B and CDKN2B-AS1. CDKN2A encodes the CDK inhibitor protein (CDKI) p16INK4a and the p53 regulatory protein p14ARF, which share common second and third exons but have different first exons and promoters about 20 kb apart. The CDKN2B gene encodes another CDKI, p15INK4b. These three proteins are involved in cell cycle regulation, aging, senescence, and apoptosis. CDKN2B-AS1 expresses the long noncoding RNA ANRIL (antisense noncoding RNA in the inhibitor of CDK4 (INK4) locus). Both type 2 diabetes and non-type 2 diabetes variants in this region appear to have a stronger effect on ANRIL expression than on CDKN2A or CDKN2B expression(79).

This observed difference in CFRD association between males and females could be due to sex differences in chromatin architecture at this locus, or in the availability of transcription factors. An important future direction would be to analyze the chromatin architecture at this region in males and females in various relevant cell types.

An important limitation of this study is that we have not yet provided any replication in truly independent samples. We see evidence of association within our dataset, in which individuals recruited from various sites and genotyped on various platforms mostly support association in females but not males. Nevertheless, testing for association of this locus in an independent population is needed to confirm this sex-specific association.

In conclusion, we identified genetic variants appearing to influence the earlier onset of CFRD in females. This study not only informs differences in the mechanism of CFRD between males and females, but also suggests that females harboring variants predisposing to earlier onset of CFRD could be screened for CFRD more frequently than males and than females without those variants.

3.4 Methods

3.4.1 Samples, phenotypes, genotyping, imputation and quality control

Individuals with CF who have two severe CFTR mutations and/or exocrine pancreatic insufficiency were recruited in two phases, from four sites (Johns Hopkins Twin and Sibling Study (TSS), Canadian CF Gene Modifier Study (CGS), Genetic Modifier Study (GMS), and French CF Gene Modifier Consortium (FrGMC); total n=5,740), as described in Section 2.4.1. Written informed consent was obtained from each participant and/or parents/guardians, and studies were approved by institutional review boards at participating sites as described in Section 2.4.1. Genotyping, phenotyping, imputation and quality control were performed as described in Sections 2.4.1 and 2.4.2.

3.4.2 Genetic association testing

A combined analysis was conducted on individuals who were included in both phases. Because a majority of CF patients with exocrine pancreatic insufficiency (PI) are predicted to develop diabetes at some point, but the age at diagnosis of diabetes varies, we used a Cox proportional hazards (CoxPH) model to test for association (event = diagnosis of CFRD; time = age at CFRD diagnosis or age at last clinic visit if no CFRD). Four PCs (estimated as described in Section 2.4.3) and site (Johns Hopkins University, Hospital for Sick Children, University of North Carolina/Case Western Reserve University, or University of Pierre and Marie Curie) were included as covariates in the analyses. Due to the known presence of first-degree relatives (twins and siblings) and the possibility of additional unknown first-degree relatives within our cohorts, we used the CoxPH with frailty model to control for relatedness in all analyses (see Section 2.4.4 for details). In the genome-wide association analyses, genome-wide significance was defined as p<5e-8.
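To make the structure of the sex*SNP interaction model concrete, the sketch below evaluates the Cox partial log-likelihood for a covariate vector containing SNP dosage, sex, and their product. It is a simplified illustration only: it omits the frailty term, the four PCs and the site covariate used in the actual analyses, it handles ties in the Breslow manner, and the function names and toy data are hypothetical.

```python
import math

def design_row(snp_dosage, is_female):
    """Covariates: SNP dosage, sex indicator, and the sex*SNP product."""
    return [snp_dosage, is_female, snp_dosage * is_female]

def cox_partial_loglik(beta, times, events, rows):
    """Breslow partial log-likelihood of a Cox proportional hazards model.

    times  : age at CFRD diagnosis, or age at last clinic visit if censored
    events : 1 if CFRD was diagnosed, 0 if censored
    rows   : one covariate vector per individual
    """
    eta = [sum(b * x for b, x in zip(beta, row)) for row in rows]
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue  # censored individuals contribute only to risk sets
        # Risk set: everyone still under observation at age t_i.
        risk = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        loglik += eta[i] - math.log(risk)
    return loglik

# Toy data (hypothetical, not from the study).
times = [5.0, 8.0, 12.0, 20.0]
events = [1, 1, 0, 1]
rows = [design_row(2, 1), design_row(1, 0), design_row(0, 1), design_row(1, 1)]

null_ll = cox_partial_loglik([0.0, 0.0, 0.0], times, events, rows)
alt_ll = cox_partial_loglik([0.1, 0.3, 0.3], times, events, rows)
print(null_ll, alt_ll)
```

In such a model, the exponentiated coefficient of the product term is the ratio of the SNP hazard ratio in females to that in males, which is why a significant interaction term indicates that the association differs by sex. In practice the coefficients are estimated by maximizing this likelihood with an established survival analysis package rather than evaluated by hand.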

Figure 3.1: Kaplan–Meier plot of CFRD onset in males and females.

[Figure: Kaplan–Meier curves for males and females. x-axis: Age (years); y-axis: Proportion without CFRD.]

Number of individuals without diabetes by age
Male (n=3040):     2011   1027   398   105   6   2
Female (n=2700):   1649    724   236    63   9   3

The table shows the number of individuals above a given age who have still not developed diabetes.

Figure 3.2. Cumulative incidence of CFRD in males and females binned by rs1333045 genotype.

[Figure: cumulative incidence curves. x-axis: Age (years), 0-60; y-axis: Cumulative Incidence of CFRD, 0%-80%. Legend: Female C/C (n=783), Female C/T (n=1,330), Female T/T (n=587), Male C/C (n=908), Male C/T (n=1,521), Male T/T (n=611).]

Cumulative incidence of CFRD plotted for females (red) and males (blue) divided by rs1333045 genotype (C/C: full line, C/T: dashed line, T/T: dotted line).

Figure 3.3. LocusZoom and forest plots of association with CFRD onset at the CDKN2A/B locus.

[Figure: (A) LocusZoom plots for the male-only and female-only analyses. x-axis: Position on chr 9 (Mb), 21.95-22.15; y-axis: -log10(p-value). Genes shown: C9orf53, CDKN2B-AS1, CDKN2A, CDKN2B. (B) Forest plot.]

(A) LocusZoom plot of female-only and male-only association with CFRD at the CDKN2A/B locus. (B) Forest plot of hazard ratio of association between rs1333045, the top CFRD-associated variant, and CFRD onset in males (M) and females (F), binned by platform, site and CFTR genotype. FF indicates F508del homozygotes.

Figure 3.4. LocusZoom plots of association between rs1333045 and CFRD, T2D and CAD.

[Figure: six LocusZoom panels (A-F) of the CDKN2A/B locus. x-axis: Position on chr9 (Mb), 21.95-22.15; y-axis: -log10(p-value) for association with CFRD, T2D or CAD, including conditional analyses. Genes shown: C9orf53, CDKN2B-AS1, CDKN2A, CDKN2B. Labeled variants: rs10965250, rs10217586, rs1333045, rs72652478.]

Each data point has been colored based on its strength of LD with rs1333045. LD blocks have been colored in blue, yellow and red. CFRD: cystic fibrosis-related diabetes. T2D: type 2 diabetes. CAD: coronary artery disease.

Figure 3.5. Manhattan plot of (A) sex*SNP interaction term, (B) male-only, and (C) female-only analyses.

[Figure: three Manhattan plots (A-C). Loci labeled in the plot: TCF7L2, USP40.]

Association analysis was performed on all variants with MAF>2% that passed quality control criteria. The x-axis indicates chromosomal position, and the y-axis indicates the strength of evidence for association with CFRD (-log10p) by Cox proportional hazard regression including frailty model to allow for clustering by family. The black line corresponds to the genome-wide significance threshold (p=5.0e-8).


Chapter 4

Decreased mRNA and protein stability of W1282X limits response to modulator therapy

MA Aksit†, AD Bowling†, TA Evans, AT Joynt, D Osorio, S Patel, N West, C Merlo, PR Sosnay, GR Cutting, N Sharma. Decreased mRNA and protein stability of W1282X limits response to modulator therapy. J Cyst Fibros. 2019 Sep;18(5):606-613. † equal contributions

4.1 Introduction

The development of modulators that target defective forms of the cystic fibrosis (CF) transmembrane conductance regulator (CFTR) protein has transformed the treatment of CF (80). The potentiator ivacaftor alters chloride channel gating and increases the open probability of CFTR. The corrector lumacaftor functions by augmenting folding of mutant forms of CFTR (81). The combination of ivacaftor and lumacaftor improves clinical endpoints for individuals homozygous for F508del (82). Another corrector compound, tezacaftor, when combined with ivacaftor, has also shown clinical efficacy for F508del homozygous individuals(83), as well as in individuals heterozygous for F508del carried in trans with a residual function variant (84).

However, the above CFTR modulator medications work only when there is expressed CFTR protein at the cell surface. CF-causing variants that cause absence of expressed protein constitute a challenging problem for modulator treatment. Prime among these are nonsense variants. Approximately 10% of individuals with CF harbor at least one nonsense variant in CFTR, which introduces a premature termination codon (PTC) [https://cftr2.org]. W1282X is the most common nonsense variant (n = 1556) reported in the CFTR2 database [https://cftr2.org]. In cell lines that express CFTR from intronless complementary DNA (cDNA), introduction of W1282X truncates CFTR protein, causing defective cellular processing and gating (85). Nonsense-mediated mRNA decay (NMD) of nonsense transcripts has been shown to affect the response to readthrough drugs (86). Several studies have shown that modulator treatment increases the activity of the truncated CFTR protein synthesized from the cDNA vectors (85,87). In contrast, modulators fail to activate CFTR in primary cells from an individual who is homozygous for W1282X (85) and have limited clinical benefits. We speculated that this discrepancy may be due to degradation of CFTR mRNA bearing W1282X in the primary cells. This concept is supported by the observation that mRNA transcribed from cDNAs does not trigger NMD, since it does not undergo intron-mediated splicing. Thus, cell lines expressing W1282X CFTR from cDNA generate truncated protein that is absent or present at very low abundance in primary cells.

It has been shown that the level of CFTR RNA transcript bearing W1282X can range from low (<5% of wild type) to near normal levels (80% of wild type) (85,86,88-91). Notably, two studies reported CFTR mRNA abundance at ~40% and ~80% of wild-type (WT) in primary nasal cells from the same individual homozygous for the W1282X mutation(85,91). Variation in the estimated levels of W1282X transcript may be due to technical issues such as non-linear amplification of target or control transcripts, or variation in cell culture conditions that may provoke mechanisms such as the unfolded protein response affecting NMD efficiency(92). Natural biologic variation in the efficiency of NMD among tissues and individuals may also be responsible (90,93). To address our hypothesis and to account for potential sources of non-biologic variation, we performed amplification-based and direct quantification of CFTR RNA transcripts in primary cells from an individual homozygous for W1282X, her heterozygous parents, and healthy controls. Heterozygous individuals have an ideal internal control for CFTR transcript levels as they carry a WT allele as well as the W1282X variant. Finally, we demonstrate the utility of expression minigenes (EMGs) for assessment of therapies for nonsense variants such as W1282X.

4.2 Results

4.2.1 CFTR mRNA bearing W1282X is substantially reduced compared to WT CFTR mRNA in the primary nasal epithelial cells of healthy heterozygous carriers of W1282X

To ascertain the level of CFTR transcript bearing W1282X compared to WT levels, we obtained nasal epithelial (NE) cells from an individual with CF who is a W1282X homozygote, her carrier parents, and two unrelated controls who do not harbor a disease-causing CFTR variant (Fig. 4.1A). Sanger sequencing of genomic DNA extracted from the NE cells confirmed genotypes of each family member (Fig. 4.1B). Total RNA extracted from the NE cells underwent reverse transcription (RT) to generate complementary DNA (cDNA), then polymerase chain reaction (PCR) using CFTR-specific primers from exon 22 and exon 24 to amplify the region encompassing the W1282X mutation. Amplification products of the correct size generated from CFTR cDNA of each parent were of reduced intensity compared to the healthy control (Fig. 4.1C, top panel). Notably, CFTR RNA was not observed in the proband. Increasing the quantity of total cDNA by tenfold (10×) revealed the quantity of CFTR cDNA in each parent remained substantially reduced compared to WT, indicating amplification was in the linear range (Fig. 4.1C, top panel). Only at the increased level of RNA could a faint amplification product of the correct size be observed in the proband, consistent with severely reduced levels of CFTR mRNA in the proband compared to the healthy control and her parents (estimated at ~2.1% of WT). Comparable levels of cDNA amplified from RNA encoding the TATA Box Binding protein (TBP), a housekeeping gene, indicated that similar amounts of RNA were used for each sample (Fig. 4.1C, bottom panel). Sanger sequencing of these PCR products revealed that RNA from the W1282X allele in each parent was dramatically lower than the level of WT CFTR RNA (Fig. 4.1D). Area under the peak from the W1282X allele (A allele) and WT allele (G allele) estimated that the mother had ~5% of WT and the father had ~20% of WT. To precisely quantify the relative levels of DNA amplified from W1282X and WT CFTR transcripts, we performed pyrosequencing on cDNA amplified from the RNA of NE cells from each parent. The relative expression of W1282X compared to normal transcript was 4.2 ± 0.9% (n = 3) in the mother and 12.4 ± 1.3% (n = 3) in the father (Fig. 4.1D), consistent with estimates from Sanger sequencing.

4.2.2 Transcriptome analysis confirms CFTR mRNA is severely decreased, and reveals a subset of NMD and UPR factors may be upregulated in the primary nasal epithelial cells of the W1282X homozygote

It is challenging to accurately quantify transcript levels relative to WT in individuals carrying two variants that reduce RNA stability, as they lack an internal WT CFTR control. Therefore, we employed RNA-seq to obtain a direct quantitative measurement of mRNA abundance at the global level in the primary NE cells of the W1282X homozygous proband, her heterozygous parents, and WT controls without nonsense variants in CFTR (Fig. 4.2A). We determined expression of each gene after normalizing for sequencing depth, gene length, and paired-end data (94-96). Volcano plots were generated to show the distribution of RNA transcripts from ~14,000 genes, with fold-change and statistical significance shown on the x-axis and y-axis, respectively (97). First, we compared RNA-seq data of the healthy control NE (n = 2) collected at Johns Hopkins University with healthy control NE (n = 2) from a previously published study conducted at George Washington University (Fig. 4.2B) (Accession IDs SRR1528464 and SRR1528468). CFTR was centered around zero, indicating its expression was very similar among the 4 healthy control samples (red dot, Fig. 4.2B). Analysis of the proband's RNA-seq data revealed that CFTR expression is reduced by 57-fold in comparison to control (control: 7.04 FPKM, proband: 0.12 FPKM, p-value: 4.6e-3), indicating a level of 1.7% of WT, consistent with our RT-PCR results (Fig. 4.2C). RNA-seq analysis of heterozygote parents revealed down-regulation of CFTR in both parents compared to control, consistent with amplification results (Fig. 4.2D, E).
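The relationship between the reported FPKM values, the percent of WT, and the fold reduction can be illustrated with a short calculation. The sketch below uses the standard FPKM definition and the two rounded FPKM values given above; the function names are hypothetical and this is not the pipeline used to generate the data.

```python
import math

def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)

def relative_expression(case_fpkm, control_fpkm):
    """Summarize expression of a gene in a case relative to a control."""
    ratio = case_fpkm / control_fpkm
    return {
        "percent_of_control": 100.0 * ratio,
        "fold_reduction": 1.0 / ratio,
        "log2_fold_change": math.log2(ratio),  # x-axis of a volcano plot
    }

# CFTR values reported in the text (control: 7.04 FPKM, proband: 0.12 FPKM).
cftr = relative_expression(case_fpkm=0.12, control_fpkm=7.04)
neg_log10_p = -math.log10(4.6e-3)  # y-axis of a volcano plot
print(cftr, neg_log10_p)
```

With these rounded inputs the proband is at about 1.7% of control, matching the text; the fold reduction computed from the rounded values is slightly higher than the reported 57-fold, which presumably reflects rounding of the proband FPKM.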

Since we had the transcriptome data available, we evaluated the relative abundance of NMD and unfolded protein response (UPR) factors in the proband vs. control. First, we extracted a list of 33 rigorously experimentally verified NMD factors from published literature(98-100). None of these factors significantly differed in expression levels between the proband and the control (Fig. S4.1A). Next, we expanded this list to 109 NMD-linked genes from (101). Eleven of these NMD-linked genes were significantly differentially expressed between the proband and the control with a p-value<0.01 (RPL7, NMD3, RPSA, RPL6, RPS7, RPL24, EIF3E, PPP2R2A, PPP2CA, RPS2 and ETF1) (Fig. S4.1B). Since many of these are ribosomal proteins, we evaluated the expression of 10 RNA polymerase III genes, of which POLR3G was significantly higher in the proband (15.44 FPKM) compared to WT (0.91 FPKM; p: 0.00165), the father (1.57 FPKM; p: 0.00265) and mother (1.83 FPKM, p: 0.0047). Finally, we looked at another homeostatic system, UPR, because it has been previously shown to regulate the level of nonsense transcripts (92). In our dataset, 6 out of 28 genes were significantly differentially expressed between the proband and control with a p-value <0.01 (DNAJB5, HSPA13, HSPA4, DNAJB1, DNAJB4 and HSP90B1) (Fig. S4.1C).

4.2.3 W1282X mRNA is unstable in primary nasal epithelial cells due to nonsense mediated mRNA decay (NMD)

To verify that the reduction in W1282X transcript abundance in the heterozygous cells was due to NMD, we used a small molecule, NMDI14, that enhances the stability of premature termination codon mutated mRNA by disrupting the interaction between UPF1 and SMG7, which mediate nonsense-mediated mRNA decay (Fig. 4.3A, left) (102). To determine whether an NMD inhibitor can increase the transcript levels of the W1282X allele, conditionally reprogrammed NE cell cultures at air-liquid interface (ALI) established from W1282X/WT heterozygous parents were treated for 12 h with NMDI14 (5 μM) (Fig. 4.3A, right). Following this treatment, cells received actinomycin D (2 μM) to inhibit synthesis of new transcripts. RNA was extracted at different time points after actinomycin D treatment so decay of the W1282X transcript could be monitored over time (allele G, shown in green, Fig. 4.3B). Heterozygous cells offered the advantage that comparison could be made to wild type transcript in the same cells (allele A, shown in white, Fig. 4.3B). The level of W1282X transcript diminished over time in the cells that did not receive NMDI14 (Fig. 4.3B). Notably, NMDI14 dramatically increased the level of W1282X transcript for all time points (Fig. 4.3B).


4.2.4 NMD inhibition improves the stability of CFTR protein in response to correctors in a cell line model expressing W1282X expression minigene

Studies of CFTR bearing the W1282X variant using intronless cDNA constructs demonstrate the cDNA construct produces a truncated, mature, complex glycosylated form of CFTR that responds to CFTR targeted modulators (85,103). However, no improvement in CFTR function was observed when primary NE cells of a W1282X homozygous individual were treated with modulators (85). We propose the lack of response is due to loss of W1282X-bearing CFTR transcript, which undergoes NMD in primary cells. This mechanism would not be operative when CFTR cDNA constructs are used, as they lack intron sequences that mediate splicing, leading to the formation of exon-junction complexes required for NMD (104). We have previously shown that EMGs containing full-length CFTR cDNA and flanking introns faithfully reproduce splicing patterns observed in affected tissues (Table S4.1). We therefore used an EMG to evaluate the transcriptional and translational effects of W1282X CFTR (Fig. 4.4A). To create an EMG to study W1282X, we incorporated abridged introns 21, 22, 23, and 24 into full-length CFTR cDNA. EMGs with introns 21–24 containing the W1282X variant and the WT version were individually integrated into a single genomic site in Human Embryonic Kidney (HEK) 293-Flp-In cells. CFTR mRNA transcripts generated by the integrated WT EMG were correctly spliced (Fig. S4.2). The steady state levels of W1282X CFTR mRNA abundance in 293 stable cells were found to be 30.1 ± 5.62% (n = 4) of WT when both were normalized to a housekeeping gene (B2M) (Fig. 4.4B). To evaluate W1282X mRNA stability over time, transcription was inhibited using actinomycin D, and we quantitated the remaining RNA at various time points (Fig. 4.4C). Marked reduction in mRNA produced from W1282X-EMG was observed within 1 h compared to WT-EMG (Fig. 4.4C). The approximate half-lives determined by semilogarithmic decay plots of the same data analyzed by nonlinear regression were ~260 min for W1282X-EMG, and ~530 min for WT-EMG (Fig. S4.3). These results demonstrate the level of W1282X mRNA is significantly reduced compared to WT in the 6 h following inhibition of transcription.
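The half-life estimate from an actinomycin D chase can be sketched as follows, assuming simple first-order (single-exponential) decay. Note that this example uses an ordinary least-squares fit on log-transformed values rather than the nonlinear regression described above, and that the time points and data are synthetic; only the ~260 min and ~530 min half-lives are taken from the text.

```python
import math

def fraction_remaining(t_min, half_life_min):
    """Fraction of transcript remaining after t_min under first-order decay."""
    return 0.5 ** (t_min / half_life_min)

def fit_half_life(times_min, fractions):
    """Estimate half-life from the slope of ln(fraction) versus time."""
    ys = [math.log(f) for f in fractions]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, ys))
    decay_rate = -sxy / sxx          # k in N(t) = N0 * exp(-k * t)
    return math.log(2) / decay_rate

# Synthetic, noise-free chase generated from the reported W1282X half-life.
times = [0, 60, 120, 240, 360]       # hypothetical sampling times (min)
synthetic = [fraction_remaining(t, 260) for t in times]
estimated = fit_half_life(times, synthetic)

# Expected fraction remaining 6 h after transcription is blocked.
w1282x_6h = fraction_remaining(360, 260)
wt_6h = fraction_remaining(360, 530)
print(estimated, w1282x_6h, wt_6h)
```

Under this model, roughly 38% of W1282X-EMG transcript and roughly 62% of WT-EMG transcript would remain at 6 h, which illustrates how a two-fold difference in half-life translates into the divergence seen over the chase.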

EMGs offer an additional advantage in that protein synthesis and RNA synthesis and splicing can be studied simultaneously. Immunoblotting of protein lysates collected from Flp-In-293 stable cells expressing EMGs and their intronless cDNA equivalents revealed three patterns: first, WT-EMG and WT-cDNA produced abundant full-length mature, complex-glycosylated CFTR protein (band C), and minor amounts of core glycosylated truncated CFTR protein (band B) (Fig. 4.4D, lanes 1 and 3); second, the W1282X cDNA construct generated truncated protein of an estimated molecular mass corresponding to complex and incompletely glycosylated 1281aa protein (asterisks, Fig. 4.4D, lane 4); and third, W1282X-EMG produced barely detectable truncated protein (Fig. 4.4D, lane 2). The latter observation is remarkable given the highly abundant protein generated by the WT-EMG. Thus, the EMG-based cell line indicates that the lack of CFTR modulator response in W1282X primary HNE cells is likely due to the extremely low levels of truncated CFTR protein as a result of efficient nonsense-mediated mRNA decay. Furthermore, our observation that the mature form of W1282X protein was produced at low levels compared to WT CFTR even from the intronless construct suggested that CFTR truncated at residue 1282 has impaired stability, as described previously (85). Taken together, our data suggest the W1282X variant affects both mRNA and protein levels.

Finally, we utilized the NMD inhibitor NMDI14 and FDA-approved CFTR correctors to test whether W1282X protein stability could be increased. Treatment of cells expressing the W1282X-CFTR-EMG with correctors (lumacaftor, tezacaftor, or both) resulted in minimal amounts of protein of molecular mass consistent with mature (i.e. glycosylated) truncated W1282X-CFTR (Fig. 4.4E, lanes 2–4 on left, and corresponding bars 2–4 on right). Treatment with NMDI14 alone resulted in a moderate increase of W1282X-CFTR (Fig. 4.4E, lane 5, and corresponding bar 5). Furthermore, treatment with NMDI14 in combination with correctors (lumacaftor, tezacaftor, or both) resulted in a marked increase in the amount of mature truncated W1282X-CFTR (Fig. 4.4E, lanes 6–8 on left, and corresponding bars 6–8 on right).

4.3 Discussion

Treatment of individuals bearing variants that generate premature termination codons requires detailed understanding of the effects on RNA and protein processing. The W1282X variant is common and has garnered substantial scrutiny since its discovery over 25 years ago (88,105,106). To this end, the W1282X homozygous proband studied here has been repeatedly evaluated for the effect of her nonsense variant on CFTR transcript levels (85,91). In contrast to prior studies, we show there is a marked reduction in the abundance of the W1282X-bearing transcript in this individual. Our assertion is based on multiple independent lines of evidence. First, RNA isolated from the primary NE cells of carrier parents showed significant reduction in the mRNA of the allele containing W1282X compared to the WT allele. Second, RNA-seq confirmed the abundance of CFTR mRNA was severely reduced in the proband. Third, NMD inhibition in the primary nasal epithelial cells of the parents resulted in stabilization of W1282X CFTR mRNA. Based on these observations, we propose that inhibition or evasion of NMD should be paramount in the development of therapies for individuals with CF carrying W1282X. Similarly, a previous study demonstrated that downregulation of NMD in cells carrying W1282X increased CFTR transcripts and improved response to gentamicin (90). However, NMD targets need to be carefully selected since the physiologic role of NMD is to mute transcriptional noise (98,107,108). To this end, it is notable that a subset of NMD genes was expressed at significantly higher levels in the proband compared to controls, which could result in higher efficiency of NMD and thus reduced levels of CFTR mRNA. These results could be used to prioritize components of the NMD machinery for therapeutic targeting. While manipulation of NMD pathways in primary cells will be an eventual goal, the availability of robust cell-based models is essential for testing the efficacy of various approaches. For this purpose, we developed a cell line stably expressing intron-containing CFTR EMGs and show that NMD is triggered when a nonsense variant is introduced. The isogenic EMG cell line provides a platform for development of non-toxic therapies that antagonize NMD. We have demonstrated that NMD inhibition using NMDI14 can markedly improve the stability of CFTR protein in a cell line model expressing W1282X-EMG. Successful approaches can subsequently be tested in the limited primary cells derived from heterozygous parents to obtain unambiguous evidence that nonsense-bearing transcript levels are stabilized relative to WT CFTR transcripts.

In the present study, inclusion of parents heterozygous for W1282X gave us an opportunity to rigorously assess the level of their CFTR transcripts containing W1282X in comparison to their WT allele. Both parents had significantly lower W1282X mRNA abundance compared to WT. Our data established that measuring expression of the W1282X allele relative to the normal allele in a heterozygous parent provides unprecedented accuracy in quantifying transcript stability because allelic differences are measured within rather than between samples. These results were consistent with the RNA-seq data. Using these data, we also evaluated the levels of NMD and UPR factors because it has been previously shown they influence gene expression (92,98-100). Two ‘NMD’ genes (PPP2R2A and PPP2CA) were differentially expressed between proband and control. These genes regulate NMD by controlling the phosphorylation status of Upf1 (109). It is unclear whether the CFTR mRNA in the proband is low due to high expression of these NMD factors. Additionally, we found several ribosomal proteins and translation factors listed as NMD-linked in the gene-ontology database (101) to be differentially expressed. However, these genes have not been experimentally verified to influence NMD. Of the UPR factors, the levels of DNAJB4 and HSPA13 were significantly different (p < 0.001) in the proband and control. Differential expression of these genes could lead to variability of expression of nonsense transcripts between individuals (90).
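
The within-sample comparison of alleles described above can be sketched as follows. This is a minimal illustration, assuming that each replicate yields one signal value for the W1282X allele and one for the WT allele; the function name and the replicate values are hypothetical and are not data from this study.

```python
import math
import statistics


def mutant_allele_summary(mutant_signal, wt_signal):
    """Return mean and SEM of the mutant-allele fraction across replicates.

    Each fraction is computed within one sample as mutant / (mutant + WT),
    so differences in input amount between samples cancel out.
    """
    fractions = [m / (m + w) for m, w in zip(mutant_signal, wt_signal)]
    mean = statistics.mean(fractions)
    sem = statistics.stdev(fractions) / math.sqrt(len(fractions))
    return mean, sem


# Hypothetical triplicate signals for one heterozygous parent
mean_fraction, sem_fraction = mutant_allele_summary([10, 12, 8], [90, 88, 92])
```

In a heterozygote whose two alleles are equally stable, the expected fraction is close to 0.5, so a value well below 0.5 points to selective loss of the mutant transcript.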

Treatment with CFTR modulators has been reported to increase the function of W1282X-truncated CFTR (85,87,91), but there was no improvement in CFTR function when W1282X primary nasal cells were treated with CFTR modulators (85). Our result suggests this discordance is a consequence of the expression system used, in that cDNAs lack introns and their encoded transcripts do not undergo splicing. Consequently, NMD is not engaged since at least one intron is required for this process to occur (110). Introduction of introns to full-length CFTR cDNA engaged splicing and activated NMD when the W1282X variant was introduced, as shown for RNA transcripts bearing other nonsense variants in CFTR (111). The steady-state levels of W1282X CFTR mRNA in Flp-In-293 stable cells (~30% of WT) were considerably higher compared to primary cells from the proband and from each heterozygous parent. We suspect use of a potent constitutive promoter (CMV) generated far higher levels of W1282X RNA transcript than the native CFTR promoter. Consequently, NMD was less effective in degrading the nonsense-containing transcript, as has been reported in other cell-based studies of NMD (112). This issue can be addressed in future studies by employing promoters such as EF1alpha that drive CFTR expression at levels comparable to those seen in primary tissues (113). Nonetheless, we have demonstrated that NMD inhibition is of utmost importance for the W1282X truncated protein to be efficiently processed using CFTR correctors (lumacaftor and tezacaftor, either alone or in combination). Additionally, the EMG-W1282X cell line model provides an opportunity to rigorously test the efficacy of CFTR modulators in combination with investigational NMD inhibitors and readthrough compounds (114-117).

Our study had the limitation that only a single W1282X family was studied. However, the proband of this family has been repeatedly studied by others with results that appear to be incongruent with responses observed in her primary cells. Our study provides a rationale for the observed inconsistency based on multiple lines of evidence. We suggest future studies enroll extended family members to enable assessment of the role of genetic (and non-genetic) factors in variation in NMD efficiency. Derivation of cell lines from additional heterozygous W1282X/WT individuals who exhibit varying efficiency of NMD will be invaluable for exploring the molecular mechanisms underlying NMD and for testing whether variation in NMD efficiency alters the effect of anti-NMD treatments. Another limitation is that bulk RNA-seq could not distinguish cell types within the nasal epithelia. This could be important as single-cell RNA-seq of the airway epithelia in the lung has revealed expression of CFTR in rare cell types (118,119).

In summary, our results indicate that W1282X mRNA degradation has to be addressed so that CFTR modulators have an opportunity to work in primary nasal cells. Upregulation of CFTR transcript following NMD suppression demonstrates that therapeutic approaches for individuals with nonsense variants must first increase levels of transcript before modulating protein activity. Since NMD plays a crucial role in maintaining cellular homeostasis by repressing the production of C-terminally truncated proteins, its global inhibition could be deleterious to the cells (120). Alternative approaches like pharmacological inhibition, tRNAs (near cognate or engineered), and antisense oligonucleotides could be considered to treat the primary molecular defect associated with nonsense variants (115,121-124).

4.4 Materials and Methods

4.4.1 Ethics statement

This study was approved by the Institutional Review Board at Johns Hopkins Medicine, Baltimore (approval number IRB00116966). Written informed consent was obtained from all subjects.

4.4.2 Study design

The purpose of this study was to evaluate stability of W1282X mRNA to inform treatments. Primary nasal cells were obtained from an individual with CF who is a W1282X homozygote, her carrier parents, and two unrelated controls who do not harbor disease-causing CFTR variants. CFTR mRNA abundance and mRNA stability in primary nasal cells were assessed by RT-PCR, Sanger sequencing, pyrosequencing, and RNA-seq. W1282X mRNA stability and protein production were also assessed in Flp-In-293 stable cells expressing W1282X EMG.

4.4.3 Collection of blood and primary nasal epithelial cells

Whole blood and primary nasal epithelial (NE) cells were collected from individuals with CF and healthy individuals following IRB protocols at Johns Hopkins University, Baltimore (IRB# 00116966). Blood samples (5 ml) were collected in tubes containing EDTA anticoagulant. An experienced physician performed endoscopic procedures to harvest NE cells from individuals after informed consent was obtained. NE cells were collected from the mid-part of the inferior turbinate of healthy/CF individuals by brushing with interdental brushes, after spraying a topical anesthetic on the nasal mucosa.

4.4.4 DNA extraction and sequencing

Genomic DNA samples were extracted from whole blood using phenol-chloroform extraction methods. The concentration of DNA in the samples was determined by NanoDrop (Thermo Fisher Scientific). DNA samples (50 ng/µl) were screened for W1282X mutations within exon 23 of CFTR by direct sequencing using CFTR-specific PCR primers designed from the flanking introns. The primers were tagged with M13F and M13R sequences (tgtaaaacgacggccagtATTGAAGTACAATACTGAATTATG and caggaaacagctatgaccGAGTACAAGTATCAAATAGCAG). PCR amplicons were purified on AMPure XP (Beckman Coulter). Purified amplicons were subjected to automated sequence analysis on an ABI Prism 3100 Genetic Analyzer using the ABI PRISM Big Dye Terminator Cycle Sequencing Kit (PE Applied Biosystems) with M13F and M13R primers.

4.4.5 Pyrosequencing

To determine the relative abundance of alternate alleles at the DNA level, pyrosequencing was performed on genomic DNA. Both PCR and sequencing primers were designed from within exon 23, the location of the W1282X variant, using PyroMark Assay Design Software. The reverse PCR primer was biotin-labeled to enable immobilization on streptavidin-coated beads. The PCR was performed on 50 ng genomic DNA in a standard 50 µl reaction containing PyroMark PCR master mix (2X) 25 µl, primer mix (10X) 5 µl, and RNase-free water 18 µl, following the manufacturer’s instructions (PyroMark PCR kit, Qiagen #978703). The CoralLoad concentrate provided in the kit was omitted. PCR conditions were 2 min at 95 °C, followed by 40 cycles of 30 sec at 94 °C, 30 sec at 60 °C, and 30 sec at 72 °C, and a final extension of 10 min at 72 °C. The products were sequenced on the PyroMark Q24 system (Qiagen) with 0.4 µM of specific pyrosequencing primers, and pyrograms were analysed with the software PyroMark Q24 V.2.0.6 (Qiagen).

4.4.6 RNA Isolation

One nasal brush from each family member was used directly for RNA extraction. Nasal epithelial cells were dissociated from the brush and collected in DMEM-FBS (10%). The cell suspension was centrifuged at 800 rcf for 5 minutes. Cells were washed with PBS made with DEPC-treated H2O and lysed by addition of 500 µl of TRIzol reagent (Life Technologies). Bead beating (Zirconium beads, OPS diagnostics #80025029) using FastPrep-24 (MP) was performed for effective lysis. Cellular debris was removed by passing the lysed suspension through a shredder column (Denville#CM-610250). The flow-through was collected and 200 µl chloroform was added. Aqueous and organic phases were allowed to separate at room temperature for 5 min followed by centrifugation at 12,000 × g for 5 minutes at 4°C. The clear aqueous layer containing RNA was collected, mixed with 70% ethanol, and further purified using the SpinSmart RNA binding columns, according to the manufacturer’s instructions (Denville#CM-610250). The RNA content and purity were determined using NanoDrop (ThermoScientific).

4.4.7 RNA sequencing

RNA sequencing was conducted on the primary NE cells obtained from the proband, carrier parents, and two controls. Additionally, raw reads from the RNA sequencing of NE cells from two healthy individuals were obtained from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database (study accession numbers SRR1528464 and SRR1528468) (125). Raw reads were mapped to the reference genome (hg19) with TopHat (v2.0.13) (95), using the Bowtie2 read aligner (version 2.1.0.0) (94). Mapped sequences were assembled with Cufflinks (v2.2.1) (126). Cuffquant was used to estimate the relative abundances of gene transcripts among samples, and Cuffdiff was then used to determine differential expression values among samples (126). In this step, the control samples sequenced in house (n = 2) were classified as a group, which was compared to the proband and carrier parents. In addition, the two control samples obtained from SRA were classified as a second control group, which was compared with the in-house control group. Volcano plots were created by plotting the -log10(p-value) against the log10(fold change) from the Cuffdiff output for autosomal genes with FPKM > 0.1. We extracted a list of genes associated with nonsense-mediated mRNA decay from the Gene Ontology knowledgebase (GO:0000184) (101) and colored those genes black on the volcano plots (97). CFTR was colored red, and all other genes were colored grey.
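
The construction of the volcano plot coordinates described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used in this study: the record layout (gene, control FPKM, sample FPKM, p-value), the function name, and the example values are all hypothetical and do not correspond to the column names of the Cuffdiff output.

```python
import math


def volcano_points(records, nmd_genes, min_fpkm=0.1):
    """Compute volcano plot coordinates and colours for expressed genes.

    records: iterable of (gene, fpkm_control, fpkm_sample, p_value) tuples.
    Genes are kept only if FPKM exceeds min_fpkm in both groups.
    p-values are assumed to be greater than zero.
    """
    points = []
    for gene, fpkm_control, fpkm_sample, p_value in records:
        if fpkm_control <= min_fpkm or fpkm_sample <= min_fpkm:
            continue
        x = math.log10(fpkm_sample / fpkm_control)  # log10(fold change)
        y = -math.log10(p_value)                    # -log10(p-value)
        if gene == "CFTR":
            colour = "red"
        elif gene in nmd_genes:
            colour = "black"
        else:
            colour = "grey"
        points.append((gene, x, y, colour))
    return points


# Hypothetical input: CFTR reduced tenfold, one NMD gene, one unexpressed gene
example_records = [
    ("CFTR", 10.0, 1.0, 0.001),
    ("UPF1", 5.0, 5.0, 0.5),
    ("LOWEXP", 0.05, 2.0, 0.01),
]
points = volcano_points(example_records, nmd_genes={"UPF1"})
```

The resulting tuples can be passed to any plotting library; genes that fail the expression filter are dropped before plotting, which is why the number of genes differs between comparisons.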

Figure 4.1. Assessment of CFTR mRNA bearing W1282X in the primary nasal epithelial cells of healthy heterozygous carriers of W1282X.

(A) Pedigree of family with W1282X CFTR mutation demonstrating segregation consistent with autosomal recessive inheritance of CF. (B) Electropherograms from Sanger sequencing of genomic DNA confirmed the segregation of W1282X CFTR mutation as indicated in the pedigree. Polymerase chain reaction (PCR) was performed on 50 ng genomic DNA using CFTR-specific primers (10 μM each) from introns flanking exon 23. The W1282X mutation is caused by the replacement of a “G” nucleotide at position 3846 by an “A” nucleotide (compare heterozygous parents and homozygous proband tracings). (C) RT-PCR analysis of W1282X CFTR. RT-PCR was performed on cDNAs using CFTR-specific primers from exon 22 and exon 24. Following electrophoresis on a 2% agarose gel, a single amplification product was visualized. The size of the amplification product matched the expected size (417 base pairs, based on the position of RT-PCR primers in the 5′- exon 22 and 3′- exon 24). TATA box binding protein (TBP) was amplified as control. No-RT and water lanes are negative controls. (D) Direct sequencing of the RT-PCR product to assess differential expression of W1282X mRNA in heterozygous parents. Top, electropherograms obtained from Sanger sequencing compare area under the peak for “nucleotide A” and “nucleotide G”, indicated by arrows, corresponding to W1282X and WT alleles, respectively. Bottom, graphs showing relative expression of W1282X and WT evaluated by pyrosequencing. The assay was designed such that exon 23 with upstream and downstream flanking exons was amplified from the corresponding cDNA preparations. The sequencing primer yielded relative abundance of alternate alleles at the W1282 codon (n = 3, Mean ± SEM).

Figure 4.2. Transcriptome analysis of the primary nasal epithelial cells of W1282X proband, carrier parents, and controls.

(A) Diagram of methodology used to generate and analyze RNA-seq data. (B) Volcano plot (x-axis: log10(fold change), y-axis: -log10(p-value)) comparing global mRNA expression of autosomal genes with gene expression >0.1 FPKM between two control groups (individuals without nonsense variants in CFTR): obtained and sequenced in house vs. obtained from SRA from a study conducted at George Washington (GW) University (study accession numbers SRR1528464 and SRR1528468) (14,457 genes). (C) Volcano plot of differential gene expression between the proband and control (n = 14,324 genes). (D) Volcano plot of differential gene expression between the carrier mother and control (n = 17,839 genes). (E) Volcano plot of differential gene expression between carrier father and control (n = 18,733 genes).

Figure 4.3. Evaluation of W1282X mRNA stability in primary nasal epithelial cells of W1282X carrier parents.

(A) Left, Schematic illustration showing premature termination codon (PTC), e.g. W1282X, recognized by components of nonsense-mediated decay (NMD). NMDI14 inhibits interaction of UPF1 (grey) and SMG7 (red) proteins that assemble at the PTC, as shown previously (102). Right, Schematic illustration of well-differentiated primary NE cells of the carrier parents that express W1282X and WT CFTR RNA. NE cells were conditionally reprogrammed (CR) using Rho-kinase inhibitor. CRNE cells were transferred onto snapwells and exposed to air at the apical side to establish air-liquid interface (ALI) culture. Four-week-old ALI culture was used to assess mRNA stability. (B) Assessment of W1282X mRNA stability in ALI cultures of both carrier parents. Cells were treated with vehicle (DMSO 0.1%) or NMDI14 (5 μM) for 12 h. Actinomycin D (2 μM) was added to terminate mRNA synthesis. RNA was collected at time points indicated on the graph after addition of actinomycin D. Relative expression of W1282X mRNA was assessed by pyrosequencing. Data are Mean ± SEM, n = 2 from each parent.


Figure 4.4. Assessment of mRNA stability and protein production in cell line model expressing an expression minigene (EMG) bearing W1282X CFTR

(A) A schematic of CFTR-EMG with abridged introns 21, 22, 23 and 24 (EMG-i21-i24) constructed in pcDNA5FRT plasmid. CFTR exons are shown in boxes and abridged introns in dashed lines. Location of W1282X variant in the plasmid DNA is indicated. EMGi21-i24 results in normal splicing. Both WT and W1282X EMG splice normally. CFTR mRNA splicing patterns of the total RNA extracted from Flp-In-293 cells transiently transfected with E831X-EMG. (B) RT-qPCR showing relative steady-state levels of CFTR transcript in Flp-In-293 stable cells expressing wild-type EMG or W1282X EMG. Values were normalized to B2M. Mean ± SEM (n = 3) measured in triplicates. (C) CFTR mRNA decay in Flp-In-293 cells stably expressing wild-type EMG and W1282X EMG. Actinomycin D (2 μM) was added at time 0 to induce transcriptional shut-down. Cells were collected at the indicated time points. Levels of the CFTR mRNAs were assessed by RT-qPCR, normalized to B2M mRNA and displayed as a percentage of the levels at t = 0. Mean ± SEM (n = 3). (D) Steady-state levels of CFTR protein in Flp-In-293 stable cells expressing wild-type EMG or W1282X EMG. 40 μg of total cell lysates were electrophoresed and the immunoblot (IB) was probed with anti-CFTR antibody 596 (CFFT). Lysates from cells expressing intronless WT CFTR or F508del served as positive controls, and non-transfected Flp-In-293 cells as negative control. Na+/K+-ATPase was used as loading control. (E) Left, immunoblot of W1282X CFTR-EMGi21–24, stably expressed in Flp-In-293 cells, in response to correctors (lumacaftor 3 μM, tezacaftor 3 μM, and their combination) either alone or in combination with NMD inhibitor (NMDI14 5 μM). Both corrector and NMDI14 treatments were given for 24 h. Arrows represent immature core-glycosylated and mature complex-glycosylated truncated W1282X protein generated from W1282X-EMG in Flp-In-293 stable cells. Lysates collected from Flp-In-293 stable cells expressing intronless WT CFTR or F508del served as positive controls, and non-transfected Flp-In-293 cells as negative control. Na+/K+-ATPase was used as loading control. Right, quantitative assessment by densitometry of three different exposures of that experiment (Mean ± SEM).

Figure S4.1. Volcano plots (x-axis: log10(fold change), y-axis: -log10(p-value)) of differential gene expression between the proband and control.

All autosomal genes with gene expression >0.1 FPKM in the proband and control are plotted (14,323 genes). (A) Rigorously experimentally verified NMD factors are labelled in black. (B) NMD factors listed in gene ontology are labelled in black. (C) UPR factors are labelled in black.

Figure S4.2. Normal splicing of WT-EMG i21-24 in Flp-In-293 stable cells.

Total RNA was extracted, and RT-PCR was performed using CFTR gene-specific primers. The forward primer was 6-FAM labelled. The amplified product labelled with 6-FAM was electrophoresed on an ABI 3100 sequencer with Genescan ROX 500 as size standard. The normally spliced product of the expected size is shown as a single blue peak. RFU refers to Relative Fluorescence Units.

Figure S4.3. Semi-log plot of the time-course experiment to determine the mRNA half-life (related to Figure 4.4C).

Actinomycin D was added at the 0 min time point to inhibit transcription. The cells were harvested for total RNA extraction at the indicated times, and mRNA levels were measured with real-time RT-PCR and normalized to B2M. mRNA abundances were expressed as percent remaining compared to time 0. The results shown are means of samples collected from four experiments. The semilogarithmic decay plots of the same data shown in Fig. 4.4 were analyzed by nonlinear regression using GraphPad software. The half-life (t1/2) was determined as the time to 50% mRNA decay for each experiment (the time points at which the red line (W1282X-EMG) and the blue line (WT-EMG), respectively, cross the black dotted line).

CFTR variant       Expression minigene (EMG)   Primary tissue        References
c.1680−877G>T      EMG i12                     Primary nasal cells   *
c.3717+40A>G       EMG i21-i23                 Primary nasal cells   *
G27X, L88X         EMG i1-i5                   Primary nasal cells   **
E831X              EMG i14-i18                 Primary nasal cells   **
R1158X, S1196X     EMG i21-i22                 Primary nasal cells   **
W1282X             EMG i21-i24                 Primary nasal cells   This study

Table S4.1: EMGs containing full-length CFTR cDNA and flanking introns faithfully reproduce splicing patterns observed in affected tissues.

* Lee M, Roos P, Sharma N, Atalar M, Evans TA, Pellicore MJ, et al. Systematic Computational Identification of Variants That Activate Exonic and Intronic Cryptic Splice Sites. Am J Hum Genet. 2017;100(5):751-765. doi: 10.1016/j.ajhg.2017.04.001. PubMed PMID: 28475858; PubMed Central PMCID: PMC5420354.

** Sharma N, Evans TA, Pellicore MJ, Davis E, Aksit MA, McCague AF, et al. Capitalizing on the heterogeneous effects of CFTR nonsense and frameshift variants to inform therapeutic strategy for cystic fibrosis. PLoS Genet. 2018;14(11):e1007723. doi: 10.1371/journal.pgen.1007723. PubMed PMID: 30444886.

Chapter 5

Conclusions

From the studies detailed in this thesis, we have observed that cystic fibrosis is a Mendelian disease with variable expressivity that is influenced by the severity of the CF-causing variant (a severe example, W1282X, was discussed in Chapter 4) and by genetic variants outside of the CFTR gene (Chapters 2 and 3). Understanding the genetic cause of each individual’s cystic fibrosis is essential for that individual’s treatment. An individual with a nonsense mutation such as W1282X will not produce any CFTR protein; hence modulator treatment cannot benefit this patient. Similarly, an individual with genetic variants associated with earlier or later onset of diabetes could have an adjusted schedule for diabetes screening tests. Screening for diabetes is time-consuming and unpopular with patients, and avoiding unnecessary screening tests could reduce part of the burden of care. Conversely, in patients at high risk of CFRD, discussions of the importance of adherence to screening schedules could begin at a young age, and screening itself could begin at a younger age. Screening for diabetes at the appropriate intervals/ages will allow complications to be detected as soon as they arise, preventing progression into a more severe disease resulting in a shorter lifespan.

Cystic fibrosis-related diabetes (CFRD) has been linked to worsened prognosis of cystic fibrosis; therefore, studying the etiology of CFRD is clinically relevant. Previously, it was shown that CFRD is largely influenced by genetic variants outside of CFTR. Genetic variants at SLC26A9 and at the T2D-associated loci TCF7L2, CDKAL1, CDKN2A/B and IGF2BP2 were associated with earlier CFRD onset. In Chapter 2, a GWAS on cystic fibrosis-related diabetes identified a novel modifier locus on chromosome 2 (PTMA) and replicated the previously reported association between CFRD and modifier variants at the SLC26A9 locus. In addition, T2D risk variants at TCF7L2 were associated with CFRD at genome-wide significance, and the newly genotyped samples provided replication of variants at IGF2BP2, confirming there is a genetic overlap between T2D and CFRD. Moreover, multiple lines of evidence demonstrated additional T2D susceptibility loci influence onset of CFRD.

A key advance presented in Chapter 2 was an increased understanding of the relationship between CFRD and related metabolic traits. We demonstrated that PRSs for insulin secretion, HOMA-B, FPG and 2hPG in the general population are associated with onset of CFRD. An in-depth look revealed the variants associated with both T2D and CFRD were frequently associated with HOMA-B (β-cell function), and none with HOMA-IR (insulin sensitivity). Taken together, these results suggest that, in general, CFRD is modified by variants that tend to affect β-cell function rather than insulin sensitivity.

An important yet understudied issue of CF is the worse prognosis in females. Females with cystic fibrosis tend to have a shorter life span, worse lung function, and younger onset of diabetes. Despite this, CF is treated the same in males and females. To better understand any key differences in the disease etiology between males and females, we sought to identify any female- or male-specific genetic modifiers of CFRD in Chapter 3. Through a candidate-based approach, we observed that common variants at CDKN2B-AS1 associate with CFRD onset in females, but not males. These variants could explain a large portion of the difference in CFRD onset between males and females. In addition, we identified a rare (3%) variant associated with onset of CFRD in females and males but in opposite directions.

These findings indicate genetic differences account for at least part of the difference in CFRD onset in males and females, suggesting differences in disease etiology of CFRD between males and females. Understanding the mechanism of disease and differences between males and females will be essential for better treatment of these individuals. Better understanding and treatment of CFRD will improve the overall prognosis of the disease.

In addition to the key issue of understanding the impact of genetic variants outside of CFTR on CF, it is also critical to understand the impact of genetic variants within CFTR because individuals with CF receive treatment based on the genetic mutations causing their disease. In Chapter 4, we studied the impact of a CF-causing mutation within CFTR, W1282X. Many studies previously reported that the nonsense mutation W1282X results in a truncated protein, and that the function of this truncated protein could be improved by treatment with modulators. In Chapter 4, we demonstrated that transcripts harboring this mutation are degraded through nonsense-mediated mRNA decay, hence no protein will be produced. This provided evidence that treatment with modulators would not benefit patients with this mutation. Preventing unnecessary treatment is essential for protection from unnecessary costs and side effects.

In conclusion, this work demonstrates the importance of understanding the genetic architecture of a disease, even for diseases such as cystic fibrosis that are autosomal recessive and fully penetrant.

References

1. Lewis C, Blackman SM, Nelson A, Oberdorfer E, Wells D, Dunitz J, Thomas W, Moran A. Diabetes-related mortality in adults with cystic fibrosis. Role of genotype and sex. Am J Respir Crit Care Med. 2015;191(2):194-200.
2. Hayes D, McCoy KS, Sheikh SI. Resolution of cystic fibrosis-related diabetes with ivacaftor therapy. Am J Respir Crit Care Med. 2014;190(5):590-591.
3. Bellin MD, Laguna T, Leschyshyn J, Regelmann W, Dunitz J, Billings J, Moran A. Insulin secretion improves in cystic fibrosis following ivacaftor correction of CFTR: a small pilot study. Pediatr Diabetes. 2013;14(6):417-421.
4. Tsabari R, Elyashar HI, Cymberknowh MC, Breuer O, Armoni S, Livnat G, Kerem E, Zangen DH. CFTR potentiator therapy ameliorates impaired insulin secretion in CF patients with a gating mutation. J Cyst Fibros. 2016;15(3):e25-27.
5. Bessonova L, Volkova N, Higgins M, Bengtsson L, Tian S, Simard C, Konstan MW, Sawicki GS, Sewall A, Nyangoma S, Elbert A, Marshall BC, Bilton D. Data from the US and UK cystic fibrosis registries support disease modification by CFTR modulation with ivacaftor. Thorax. 2018;73(8):731-740.
6. Volkova N, Moy K, Evans J, Campbell D, Tian S, Simard C, Higgins M, Konstan MW, Sawicki GS, Elbert A, Charman SC, Marshall BC, Bilton D. Disease progression in patients with cystic fibrosis treated with ivacaftor: Data from national US and UK registries. J Cyst Fibros. 2019.
7. Blackman SM, Hsu S, Vanscoy LL, Collaco JM, Ritter SE, Naughton K, Cutting GR. Genetic modifiers play a substantial role in diabetes complicating cystic fibrosis. J Clin Endocrinol Metab. 2009;94(4):1302-1309.
8. Blackman SM, Hsu S, Ritter SE, Naughton KM, Wright FA, Drumm ML, Knowles MR, Cutting GR. A susceptibility gene for type 2 diabetes confers substantial risk for diabetes complicating cystic fibrosis. Diabetologia. 2009;52(9):1858-1865.
9. Couce M, O'Brien TD, Moran A, Roche PC, Butler PC. Diabetes mellitus in cystic fibrosis is characterized by islet amyloidosis. J Clin Endocrinol Metab. 1996;81(3):1267-1272.
10. Hull RL, Westermark GT, Westermark P, Kahn SE. Islet amyloid: a critical entity in the pathogenesis of type 2 diabetes. J Clin Endocrinol Metab. 2004;89(8):3629-3643.
11. Moran A, Becker D, Casella SJ, Gottlieb PA, Kirkman MS, Marshall BC, Slovis B, Committee CCC. Epidemiology, pathophysiology, and prognostic implications of cystic fibrosis-related diabetes: a technical review. Diabetes Care. 2010;33(12):2677-2683.
12. Moran A, Dunitz J, Nathan B, Saeed A, Holme B, Thomas W. Cystic fibrosis-related diabetes: current trends in prevalence, incidence, and mortality. Diabetes Care. 2009;32(9):1626-1631.
13. Blackman SM, Commander CW, Watson C, Arcara KM, Strug LJ, Stonebraker JR, Wright FA, Rommens JM, Sun L, Pace RG, Norris SA, Durie PR, Drumm ML, Knowles MR, Cutting GR. Genetic modifiers of cystic fibrosis-related diabetes. Diabetes. 2013;62(10):3627-3635.
14. Jin T. Current Understanding on Role of the Wnt Signaling Pathway Effector TCF7L2 in Glucose Homeostasis. Endocr Rev. 2016;37(3):254-277.
15. Consortium G. The Genotype-Tissue Expression (GTEx) project. Nat Genet. 2013;45(6):580-585.
16. Consortium EP. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489(7414):57-74.

17. Lesurf R, Cotto KC, Wang G, Griffith M, Kasaian K, Jones SJ, Montgomery SB, Griffith OL, Consortium ORA. ORegAnno 3.0: a community-driven resource for curated regulatory annotation. Nucleic Acids Res. 2016;44(D1):D126-132. 18. Mahajan A, Taliun D, Thurner M, Robertson NR, Torres JM, Rayner NW, Payne AJ, Steinthorsdottir V, Scott RA, Grarup N, Cook JP, Schmidt EM, Wuttke M, Sarnowski C, Mägi R, Nano J, Gieger C, Trompet S, Lecoeur C, Preuss MH, Prins BP, Guo X, Bielak LF, Below JE, Bowden DW, Chambers JC, Kim YJ, Ng MCY, Petty LE, Sim X, Zhang W, Bennett AJ, Bork-Jensen J, Brummett CM, Canouil M, Eckardt KU, Fischer K, Kardia SLR, Kronenberg F, Läll K, Liu CT, Locke AE, Luan J, Ntalla I, Nylander V, Schönherr S, Schurmann C, Yengo L, Bottinger EP, Brandslund I, Christensen C, Dedoussis G, Florez JC, Ford I, Franco OH, Frayling TM, Giedraitis V, Hackinger S, Hattersley AT, Herder C, Ikram MA, Ingelsson M, Jørgensen ME, Jørgensen T, Kriebel J, Kuusisto J, Ligthart S, Lindgren CM, Linneberg A, Lyssenko V, Mamakou V, Meitinger T, Mohlke KL, Morris AD, Nadkarni G, Pankow JS, Peters A, Sattar N, Stančáková A, Strauch K, Taylor KD, Thorand B, Thorleifsson G, Thorsteinsdottir U, Tuomilehto J, Witte DR, Dupuis J, Peyser PA, Zeggini E, Loos RJF, Froguel P, Ingelsson E, Lind L, Groop L, Laakso M, Collins FS, Jukema JW, Palmer CNA, Grallert H, Metspalu A, Dehghan A, Köttgen A, Abecasis GR, Meigs JB, Rotter JI, Marchini J, Pedersen O, Hansen T, Langenberg C, Wareham NJ, Stefansson K, Gloyn AL, Morris AP, Boehnke M, McCarthy MI. Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps. Nat Genet. 2018;50(11):1505-1513. 19.
Scott RA, Scott LJ, Mägi R, Marullo L, Gaulton KJ, Kaakinen M, Pervjakova N, Pers TH, Johnson AD, Eicher JD, Jackson AU, Ferreira T, Lee Y, Ma C, Steinthorsdottir V, Thorleifsson G, Qi L, Van Zuydam NR, Mahajan A, Chen H, Almgren P, Voight BF, Grallert H, Müller- Nurasyid M, Ried JS, Rayner NW, Robertson N, Karssen LC, van Leeuwen EM, Willems SM, Fuchsberger C, Kwan P, Teslovich TM, Chanda P, Li M, Lu Y, Dina C, Thuillier D, Yengo L, Jiang L, Sparso T, Kestler HA, Chheda H, Eisele L, Gustafsson S, Frånberg M, Strawbridge RJ, Benediktsson R, Hreidarsson AB, Kong A, Sigurðsson G, Kerrison ND, Luan J, Liang L, Meitinger T, Roden M, Thorand B, Esko T, Mihailov E, Fox C, Liu CT, Rybin D, Isomaa B, Lyssenko V, Tuomi T, Couper DJ, Pankow JS, Grarup N, Have CT, Jørgensen ME, Jørgensen T, Linneberg A, Cornelis MC, van Dam RM, Hunter DJ, Kraft P, Sun Q, Edkins S, Owen KR, Perry JRB, Wood AR, Zeggini E, Tajes-Fernandes J, Abecasis GR, Bonnycastle LL, Chines PS, Stringham HM, Koistinen HA, Kinnunen L, Sennblad B, Mühleisen TW, Nöthen MM, Pechlivanis S, Baldassarre D, Gertow K, Humphries SE, Tremoli E, Klopp N, Meyer J, Steinbach G, Wennauer R, Eriksson JG, Mӓnnistö S, Peltonen L, Tikkanen E, Charpentier G, Eury E, Lobbens S, Gigante B, Leander K, McLeod O, Bottinger EP, Gottesman O, Ruderfer D, Blüher M, Kovacs P, Tonjes A, Maruthur NM, Scapoli C, Erbel R, Jöckel KH, Moebus S, de Faire U, Hamsten A, Stumvoll M, Deloukas P, Donnelly PJ, Frayling TM, Hattersley AT, Ripatti S, Salomaa V, Pedersen NL, Boehm BO, Bergman RN, Collins FS, Mohlke KL, Tuomilehto J, Hansen T, Pedersen O, Barroso I, Lannfelt L, Ingelsson E, Lind L, Lindgren CM, Cauchi S, Froguel P, Loos RJF, Balkau B, Boeing H, Franks PW, Barricarte Gurrea A, Palli D, van der Schouw YT, Altshuler D, Groop LC, Langenberg C, Wareham NJ, Sijbrands E, van Duijn CM, Florez JC, Meigs JB, Boerwinkle E, Gieger C, Strauch K, Metspalu A, Morris AD, Palmer CNA, Hu FB, Thorsteinsdottir U, Stefansson K, Dupuis J, Morris AP, Boehnke M, 
McCarthy MI, Prokopenko I, Consortium DGRAM-aD. An Expanded Genome-Wide Association Study of Type 2 Diabetes in Europeans. Diabetes. 2017;66(11):2888-2902. 20. Euesden J, Lewis CM, O'Reilly PF. PRSice: Polygenic Risk Score software. Bioinformatics. 2015;31(9):1466-1468. 21. Barrett JC, Clayton DG, Concannon P, Akolkar B, Cooper JD, Erlich HA, Julier C, Morahan G, Nerup J, Nierras C, Plagnol V, Pociot F, Schuilenburg H, Smyth DJ, Stevens H, Todd JA,

Walker NM, Rich SS, Consortium TDG. Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes. Nat Genet. 2009;41(6):703-707. 22. Onengut-Gumuscu S, Chen WM, Burren O, Cooper NJ, Quinlan AR, Mychaleckyj JC, Farber E, Bonnie JK, Szpak M, Schofield E, Achuthan P, Guo H, Fortune MD, Stevens H, Walker NM, Ward LD, Kundaje A, Kellis M, Daly MJ, Barrett JC, Cooper JD, Deloukas P, Todd JA, Wallace C, Concannon P, Rich SS, Consortium TDG. Fine mapping of type 1 diabetes susceptibility loci and evidence for colocalization of causal variants with lymphoid gene enhancers. Nat Genet. 2015;47(4):381-386. 23. Vassy JL, Hivert MF, Porneala B, Dauriz M, Florez JC, Dupuis J, Siscovick DS, Fornage M, Rasmussen-Torvik LJ, Bouchard C, Meigs JB. Polygenic type 2 diabetes prediction at the limit of common variant detection. Diabetes. 2014;63(6):2172-2182. 24. Dupuis J, Langenberg C, Prokopenko I, Saxena R, Soranzo N, Jackson AU, Wheeler E, Glazer NL, Bouatia-Naji N, Gloyn AL, Lindgren CM, Mägi R, Morris AP, Randall J, Johnson T, Elliott P, Rybin D, Thorleifsson G, Steinthorsdottir V, Henneman P, Grallert H, Dehghan A, Hottenga JJ, Franklin CS, Navarro P, Song K, Goel A, Perry JR, Egan JM, Lajunen T, Grarup N, Sparsø T, Doney A, Voight BF, Stringham HM, Li M, Kanoni S, Shrader P, Cavalcanti-Proença C, Kumari M, Qi L, Timpson NJ, Gieger C, Zabena C, Rocheleau G, Ingelsson E, An P, O'Connell J, Luan J, Elliott A, McCarroll SA, Payne F, Roccasecca RM, Pattou F, Sethupathy P, Ardlie K, Ariyurek Y, Balkau B, Barter P, Beilby JP, Ben-Shlomo Y, Benediktsson R, Bennett AJ, Bergmann S, Bochud M, Boerwinkle E, Bonnefond A, Bonnycastle LL, Borch-Johnsen K, Böttcher Y, Brunner E, Bumpstead SJ, Charpentier G, Chen YD, Chines P, Clarke R, Coin LJ, Cooper MN, Cornelis M, Crawford G, Crisponi L, Day IN, de Geus EJ, Delplanque J, Dina C, Erdos MR, Fedson AC, Fischer-Rosinsky A, Forouhi NG, Fox CS, Frants R, Franzosi MG, Galan P, Goodarzi MO, Graessler J, Groves 
CJ, Grundy S, Gwilliam R, Gyllensten U, Hadjadj S, Hallmans G, Hammond N, Han X, Hartikainen AL, Hassanali N, Hayward C, Heath SC, Hercberg S, Herder C, Hicks AA, Hillman DR, Hingorani AD, Hofman A, Hui J, Hung J, Isomaa B, Johnson PR, Jørgensen T, Jula A, Kaakinen M, Kaprio J, Kesaniemi YA, Kivimaki M, Knight B, Koskinen S, Kovacs P, Kyvik KO, Lathrop GM, Lawlor DA, Le Bacquer O, Lecoeur C, Li Y, Lyssenko V, Mahley R, Mangino M, Manning AK, Martínez-Larrad MT, McAteer JB, McCulloch LJ, McPherson R, Meisinger C, Melzer D, Meyre D, Mitchell BD, Morken MA, Mukherjee S, Naitza S, Narisu N, Neville MJ, Oostra BA, Orrù M, Pakyz R, Palmer CN, Paolisso G, Pattaro C, Pearson D, Peden JF, Pedersen NL, Perola M, Pfeiffer AF, Pichler I, Polasek O, Posthuma D, Potter SC, Pouta A, Province MA, Psaty BM, Rathmann W, Rayner NW, Rice K, Ripatti S, Rivadeneira F, Roden M, Rolandsson O, Sandbaek A, Sandhu M, Sanna S, Sayer AA, Scheet P, Scott LJ, Seedorf U, Sharp SJ, Shields B, Sigurethsson G, Sijbrands EJ, Silveira A, Simpson L, Singleton A, Smith NL, Sovio U, Swift A, Syddall H, Syvänen AC, Tanaka T, Thorand B, Tichet J, Tönjes A, Tuomi T, Uitterlinden AG, van Dijk KW, van Hoek M, Varma D, Visvikis-Siest S, Vitart V, Vogelzangs N, Waeber G, Wagner PJ, Walley A, Walters GB, Ward KL, Watkins H, Weedon MN, Wild SH, Willemsen G, Witteman JC, Yarnell JW, Zeggini E, Zelenika D, Zethelius B, Zhai G, Zhao JH, Zillikens MC, Borecki IB, Loos RJ, Meneton P, Magnusson PK, Nathan DM, Williams GH, Hattersley AT, Silander K, Salomaa V, Smith GD, Bornstein SR, Schwarz P, Spranger J, Karpe F, Shuldiner AR, Cooper C, Dedoussis GV, Serrano-Ríos M, Morris AD, Lind L, Palmer LJ, Hu FB, Franks PW, Ebrahim S, Marmot M, Kao WH, Pankow JS, Sampson MJ, Kuusisto J, Laakso M, Hansen T, Pedersen O, Pramstaller PP, Wichmann HE, Illig T, Rudan I, Wright AF, Stumvoll M, Campbell H, Wilson JF, Bergman RN, Buchanan TA, Collins FS, Mohlke KL, Tuomilehto J, Valle TT, Altshuler D, Rotter JI, Siscovick DS, Penninx BW, 
Boomsma DI, Deloukas P, Spector TD, Frayling TM, Ferrucci L, Kong A, Thorsteinsdottir U, Stefansson K, van Duijn CM, Aulchenko YS, Cao A, Scuteri A, Schlessinger D, Uda M, Ruokonen A, Jarvelin MR, Waterworth DM, Vollenweider P,

Peltonen L, Mooser V, Abecasis GR, Wareham NJ, Sladek R, Froguel P, Watanabe RM, Meigs JB, Groop L, Boehnke M, McCarthy MI, Florez JC, Barroso I, Consortium D, Consortium G, Consortium GB, Consortium AHoboP, investigators M. New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk. Nat Genet. 2010;42(2):105-116. 25. Stančáková A, Kuulasmaa T, Kuusisto J, Mohlke KL, Collins FS, Boehnke M, Laakso M. Genetic risk scores in the prediction of plasma glucose, impaired insulin secretion, insulin resistance and incident type 2 diabetes in the METSIM study. Diabetologia. 2017;60(9):1722-1730. 26. Cooper JD, Simmonds MJ, Walker NM, Burren O, Brand OJ, Guo H, Wallace C, Stevens H, Coleman G, Franklyn JA, Todd JA, Gough SC, Consortium WTCC. Seven newly identified loci for autoimmune thyroid disease. Hum Mol Genet. 2012;21(23):5202-5208. 27. Sharma A, Liu X, Hadley D, Hagopian W, Chen WM, Onengut-Gumuscu S, Törn C, Steck AK, Frohnert BI, Rewers M, Ziegler AG, Lernmark Å, Toppari J, Krischer JP, Akolkar B, Rich SS, She JX, Group TS. Identification of non-HLA genes associated with development of islet autoimmunity and type 1 diabetes in the prospective TEDDY cohort. J Autoimmun. 2018;89:90-100. 28.
Soranzo N, Sanna S, Wheeler E, Gieger C, Radke D, Dupuis J, Bouatia-Naji N, Langenberg C, Prokopenko I, Stolerman E, Sandhu MS, Heeney MM, Devaney JM, Reilly MP, Ricketts SL, Stewart AF, Voight BF, Willenborg C, Wright B, Altshuler D, Arking D, Balkau B, Barnes D, Boerwinkle E, Böhm B, Bonnefond A, Bonnycastle LL, Boomsma DI, Bornstein SR, Böttcher Y, Bumpstead S, Burnett-Miller MS, Campbell H, Cao A, Chambers J, Clark R, Collins FS, Coresh J, de Geus EJ, Dei M, Deloukas P, Döring A, Egan JM, Elosua R, Ferrucci L, Forouhi N, Fox CS, Franklin C, Franzosi MG, Gallina S, Goel A, Graessler J, Grallert H, Greinacher A, Hadley D, Hall A, Hamsten A, Hayward C, Heath S, Herder C, Homuth G, Hottenga JJ, Hunter- Merrill R, Illig T, Jackson AU, Jula A, Kleber M, Knouff CW, Kong A, Kooner J, Köttgen A, Kovacs P, Krohn K, Kühnel B, Kuusisto J, Laakso M, Lathrop M, Lecoeur C, Li M, Loos RJ, Luan J, Lyssenko V, Mägi R, Magnusson PK, Mälarstig A, Mangino M, Martínez-Larrad MT, März W, McArdle WL, McPherson R, Meisinger C, Meitinger T, Melander O, Mohlke KL, Mooser VE, Morken MA, Narisu N, Nathan DM, Nauck M, O'Donnell C, Oexle K, Olla N, Pankow JS, Payne F, Peden JF, Pedersen NL, Peltonen L, Perola M, Polasek O, Porcu E, Rader DJ, Rathmann W, Ripatti S, Rocheleau G, Roden M, Rudan I, Salomaa V, Saxena R, Schlessinger D, Schunkert H, Schwarz P, Seedorf U, Selvin E, Serrano-Ríos M, Shrader P, Silveira A, Siscovick D, Song K, Spector TD, Stefansson K, Steinthorsdottir V, Strachan DP, Strawbridge R, Stumvoll M, Surakka I, Swift AJ, Tanaka T, Teumer A, Thorleifsson G, Thorsteinsdottir U, Tönjes A, Usala G, Vitart V, Völzke H, Wallaschofski H, Waterworth DM, Watkins H, Wichmann HE, Wild SH, Willemsen G, Williams GH, Wilson JF, Winkelmann J, Wright AF, Zabena C, Zhao JH, Epstein SE, Erdmann J, Hakonarson HH, Kathiresan S, Khaw KT, Roberts R, Samani NJ, Fleming MD, Sladek R, Abecasis G, Boehnke M, Froguel P, Groop L, McCarthy MI, Kao WH, Florez JC, Uda M, Wareham NJ, Barroso I, Meigs JB, 
WTCCC. Common variants at 10 genomic loci influence hemoglobin A1C levels via glycemic and nonglycemic pathways. Diabetes. 2010;59(12):3229-3239. 29. Cauchi S, Froguel P. TCF7L2 genetic defect and type 2 diabetes. Curr Diab Rep. 2008;8(2):149-155. 30. Weedon MN. The importance of TCF7L2. Diabet Med. 2007;24(10):1062-1066. 31. Grant SF, Thorleifsson G, Reynisdottir I, Benediktsson R, Manolescu A, Sainz J, Helgason A, Stefansson H, Emilsson V, Helgadottir A, Styrkarsdottir U, Magnusson KP, Walters GB, Palsdottir E, Jonsdottir T, Gudmundsdottir T, Gylfason A, Saemundsdottir J, Wilensky RL, Reilly MP, Rader DJ, Bagger Y, Christiansen C, Gudnason V, Sigurdsson G, Thorsteinsdottir U,


Gulcher JR, Kong A, Stefansson K. Variant of transcription factor 7-like 2 (TCF7L2) gene confers risk of type 2 diabetes. Nat Genet. 2006;38(3):320-323. 32. Gaulton KJ, Ferreira T, Lee Y, Raimondo A, Mägi R, Reschen ME, Mahajan A, Locke A, Rayner NW, Robertson N, Scott RA, Prokopenko I, Scott LJ, Green T, Sparso T, Thuillier D, Yengo L, Grallert H, Wahl S, Frånberg M, Strawbridge RJ, Kestler H, Chheda H, Eisele L, Gustafsson S, Steinthorsdottir V, Thorleifsson G, Qi L, Karssen LC, van Leeuwen EM, Willems SM, Li M, Chen H, Fuchsberger C, Kwan P, Ma C, Linderman M, Lu Y, Thomsen SK, Rundle JK, Beer NL, van de Bunt M, Chalisey A, Kang HM, Voight BF, Abecasis GR, Almgren P, Baldassarre D, Balkau B, Benediktsson R, Blüher M, Boeing H, Bonnycastle LL, Bottinger EP, Burtt NP, Carey J, Charpentier G, Chines PS, Cornelis MC, Couper DJ, Crenshaw AT, van Dam RM, Doney AS, Dorkhan M, Edkins S, Eriksson JG, Esko T, Eury E, Fadista J, Flannick J, Fontanillas P, Fox C, Franks PW, Gertow K, Gieger C, Gigante B, Gottesman O, Grant GB, Grarup N, Groves CJ, Hassinen M, Have CT, Herder C, Holmen OL, Hreidarsson AB, Humphries SE, Hunter DJ, Jackson AU, Jonsson A, Jørgensen ME, Jørgensen T, Kao WH, Kerrison ND, Kinnunen L, Klopp N, Kong A, Kovacs P, Kraft P, Kravic J, Langford C, Leander K, Liang L, Lichtner P, Lindgren CM, Lindholm E, Linneberg A, Liu CT, Lobbens S, Luan J, Lyssenko V, Männistö S, McLeod O, Meyer J, Mihailov E, Mirza G, Mühleisen TW, Müller- Nurasyid M, Navarro C, Nöthen MM, Oskolkov NN, Owen KR, Palli D, Pechlivanis S, Peltonen L, Perry JR, Platou CG, Roden M, Ruderfer D, Rybin D, van der Schouw YT, Sennblad B, Sigurðsson G, Stančáková A, Steinbach G, Storm P, Strauch K, Stringham HM, Sun Q, Thorand B, Tikkanen E, Tonjes A, Trakalo J, Tremoli E, Tuomi T, Wennauer R, Wiltshire S, Wood AR, Zeggini E, Dunham I, Birney E, Pasquali L, Ferrer J, Loos RJ, Dupuis J, Florez JC, Boerwinkle E, Pankow JS, van Duijn C, Sijbrands E, Meigs JB, Hu FB, Thorsteinsdottir U, Stefansson 
K, Lakka TA, Rauramaa R, Stumvoll M, Pedersen NL, Lind L, Keinanen-Kiukaanniemi SM, Korpi-Hyövälti E, Saaristo TE, Saltevo J, Kuusisto J, Laakso M, Metspalu A, Erbel R, Jöckel KH, Moebus S, Ripatti S, Salomaa V, Ingelsson E, Boehm BO, Bergman RN, Collins FS, Mohlke KL, Koistinen H, Tuomilehto J, Hveem K, Njølstad I, Deloukas P, Donnelly PJ, Frayling TM, Hattersley AT, de Faire U, Hamsten A, Illig T, Peters A, Cauchi S, Sladek R, Froguel P, Hansen T, Pedersen O, Morris AD, Palmer CN, Kathiresan S, Melander O, Nilsson PM, Groop LC, Barroso I, Langenberg C, Wareham NJ, O'Callaghan CA, Gloyn AL, Altshuler D, Boehnke M, Teslovich TM, McCarthy MI, Morris AP, Consortium DGRAM-aD. Genetic fine mapping and genomic annotation defines causal mechanisms at type 2 diabetes susceptibility loci. Nat Genet. 2015;47(12):1415-1425. 33. Gaulton KJ, Nammo T, Pasquali L, Simon JM, Giresi PG, Fogarty MP, Panhuis TM, Mieczkowski P, Secchi A, Bosco D, Berney T, Montanya E, Mohlke KL, Lieb JD, Ferrer J. A map of open chromatin in human pancreatic islets. Nat Genet. 2010;42(3):255-259. 34. Adams JD, Vella A. What Can Diabetes-Associated Genetic Variation in TCF7L2 Teach Us About the Pathogenesis of Type 2 Diabetes? Metab Syndr Relat Disord. 2018;16(8):383-389. 35. da Silva Xavier G, Loder MK, McDonald A, Tarasov AI, Carzaniga R, Kronenberger K, Barg S, Rutter GA. TCF7L2 regulates late events in insulin secretion from pancreatic islet beta-cells. Diabetes. 2009;58(4):894-905. 36. Shu L, Sauter NS, Schulthess FT, Matveyenko AV, Oberholzer J, Maedler K. Transcription factor 7-like 2 regulates beta-cell survival and function in human pancreatic islets. Diabetes. 2008;57(3):645-653. 37. Xia Q, Chesi A, Manduchi E, Johnston BT, Lu S, Leonard ME, Parlin UW, Rappaport EF, Huang P, Wells AD, Blobel GA, Johnson ME, Grant SFA. The type 2 diabetes presumed causal variant within TCF7L2 resides in an element that controls the expression of ACSL5. Diabetologia. 2016;59(11):2360-2368.


38. Gong J, Wang F, Xiao B, Panjwani N, Lin F, Keenan K, Avolio J, Esmaeili M, Zhang L, He G, Soave D, Mastromatteo S, Baskurt Z, Kim S, O'Neal WK, Polineni D, Blackman SM, Corvol H, Cutting GR, Drumm M, Knowles MR, Rommens JM, Sun L, Strug LJ. Genetic association and transcriptome integration identify contributing genes and tissues at cystic fibrosis modifier loci. PLoS Genet. 2019;15(2):e1008007. 39. Roman TS, Cannon ME, Vadlamudi S, Buchkovich ML, Wolford BN, Welch RP, Morken MA, Kwon GJ, Varshney A, Kursawe R, Wu Y, Jackson AU, Erdos MR, Kuusisto J, Laakso M, Scott LJ, Boehnke M, Collins FS, Parker SCJ, Stitzel ML, Mohlke KL, Program NIoHISCNCS. A Type 2 Diabetes-Associated Functional Regulatory Variant in a Pancreatic Islet Enhancer at the ADCY5 Locus. Diabetes. 2017;66(9):2521-2530. 40. Dooley J, Tian L, Schonefeldt S, Delghingaro-Augusto V, Garcia-Perez JE, Pasciuto E, Di Marino D, Carr EJ, Oskolkov N, Lyssenko V, Franckaert D, Lagou V, Overbergh L, Vandenbussche J, Allemeersch J, Chabot-Roy G, Dahlstrom JE, Laybutt DR, Petrovsky N, Socha L, Gevaert K, Jetten AM, Lambrechts D, Linterman MA, Goodnow CC, Nolan CJ, Lesage S, Schlenner SM, Liston A. Genetic predisposition for beta cell fragility underlies type 1 and type 2 diabetes. Nat Genet. 2016;48(5):519-527. 41. Rutter GA, Chimienti F. SLC30A8 mutations in type 2 diabetes. Diabetologia. 2015;58(1):31-36. 42. Oram RA, Patel K, Hill A, Shields B, McDonald TJ, Jones A, Hattersley AT, Weedon MN. A Type 1 Diabetes Genetic Risk Score Can Aid Discrimination Between Type 1 and Type 2 Diabetes in Young Adults. Diabetes Care. 2016;39(3):337-344. 43. Grubb AL, McDonald TJ, Rutters F, Donnelly LA, Hattersley AT, Oram RA, Palmer CNA, van der Heijden AA, Carr F, Elders PJM, Weedon MN, Slieker RC, 't Hart LM, Pearson ER, Shields BM, Jones AG. A Type 1 Diabetes Genetic Risk Score Can Identify Patients With GAD65 Autoantibody-Positive Type 2 Diabetes Who Rapidly Progress to Insulin Therapy. Diabetes Care. 2019;42(2):208-214. 44.
Gottlieb PA, Yu L, Babu S, Wenzlau J, Bellin M, Frohnert BI, Moran A. No relation between cystic fibrosis-related diabetes and type 1 diabetes autoimmunity. Diabetes Care. 2012;35(8):e57. 45. Minicucci L, Cotellessa M, Pittaluga L, Minuto N, d'Annunzio G, Avanzini MA, Lorini R. Beta-cell autoantibodies and diabetes mellitus family history in cystic fibrosis. J Pediatr Endocrinol Metab. 2005;18(8):755-760. 46. Su YC, Ou HY, Wu HT, Wu P, Chen YC, Su BH, Shiau AL, Chang CJ, Wu CL. Prothymosin-α Overexpression Contributes to the Development of Insulin Resistance. J Clin Endocrinol Metab. 2015;100(11):4114-4123. 47. Mosoian A, Teixeira A, Burns CS, Sander LE, Gusella GL, He C, Blander JM, Klotman P, Klotman ME. Prothymosin-alpha inhibits HIV-1 via Toll-like receptor 4-mediated type I interferon induction. Proc Natl Acad Sci U S A. 2010;107(22):10178-10183. 48. Romani L, Oikonomou V, Moretti S, Iannitti RG, D'Adamo MC, Villella VR, Pariano M, Sforna L, Borghi M, Bellet MM, Fallarino F, Pallotta MT, Servillo G, Ferrari E, Puccetti P, Kroemer G, Pessia M, Maiuri L, Goldstein AL, Garaci E. Thymosin α1 represents a potential potent single-molecule-based therapy for cystic fibrosis. Nat Med. 2017;23(5):590-600. 49. Tomati V, Caci E, Ferrera L, Pesce E, Sondo E, Cholon DM, Quinney NL, Boyles SE, Armirotti A, Ravazzolo R, Galietta LJ, Gentzsch M, Pedemonte N. Thymosin α-1 does not correct F508del-CFTR in cystic fibrosis airway epithelia. JCI Insight. 2018;3(3). 50. Lohi H, Kujala M, Makela S, Lehtonen E, Kestila M, Saarialho-Kere U, Markovich D, Kere J. Functional characterization of three novel tissue-specific anion exchangers SLC26A7, -A8, and -A9. J Biol Chem. 2002;277(16):14246-14254.


51. Loriol C, Dulong S, Avella M, Gabillat N, Boulukos K, Borgese F, Ehrenfeld J. Characterization of SLC26A9, facilitation of Cl(-) transport by bicarbonate. Cell Physiol Biochem. 2008;22(1-4):15-30. 52. Chang MH, Plata C, Sindic A, Ranatunga WK, Chen AP, Zandi-Nejad K, Chan KW, Thompson J, Mount DB, Romero MF. Slc26a9 is inhibited by the R-region of the cystic fibrosis transmembrane conductance regulator via the STAS domain. J Biol Chem. 2009;284(41):28306-28318. 53. Soave D, Miller MR, Keenan K, Li W, Gong J, Ip W, Accurso F, Sun L, Rommens JM, Sontag M, Durie PR, Strug LJ. Evidence for a causal relationship between early exocrine pancreatic disease and cystic fibrosis-related diabetes: a Mendelian randomization study. Diabetes. 2014;63(6):2114-2119. 54. Sun L, Rommens JM, Corvol H, Li W, Li X, Chiang TA, Lin F, Dorfman R, Busson PF, Parekh RV, Zelenika D, Blackman SM, Corey M, Doshi VK, Henderson L, Naughton KM, O'Neal WK, Pace RG, Stonebraker JR, Wood SD, Wright FA, Zielenski J, Clement A, Drumm ML, Boëlle PY, Cutting GR, Knowles MR, Durie PR, Strug LJ. Multiple apical plasma membrane constituents are associated with susceptibility to meconium ileus in individuals with cystic fibrosis. Nat Genet. 2012;44(5):562-569. 55. Blackman SM, Deering-Brose R, McWilliams R, Naughton K, Coleman B, Lai T, Algire M, Beck S, Hoover-Fong J, Hamosh A, Fallin MD, West K, Arking DE, Chakravarti A, Cutler DJ, Cutting GR. Relative contribution of genetic and nongenetic modifiers to intestinal obstruction in cystic fibrosis. Gastroenterology. 2006;131(4):1030-1039. 56. Liu X, Li T, Riederer B, Lenzen H, Ludolph L, Yeruva S, Tuo B, Soleimani M, Seidler U. Loss of Slc26a9 anion transporter alters intestinal electrolyte and HCO3(-) transport and reduces survival in CFTR-deficient mice. Pflugers Arch. 2015;467(6):1261-1275. 57.
Long JZ, Svensson KJ, Bateman LA, Lin H, Kamenecka T, Lokurkar IA, Lou J, Rao RR, Chang MR, Jedrychowski MP, Paulo JA, Gygi SP, Griffin PR, Nomura DK, Spiegelman BM. The Secreted Enzyme PM20D1 Regulates Lipidated Amino Acid Uncouplers of Mitochondria. Cell. 2016;166(2):424-435. 58. Wright FA, Strug LJ, Doshi VK, Commander CW, Blackman SM, Sun L, Berthiaume Y, Cutler D, Cojocaru A, Collaco JM, Corey M, Dorfman R, Goddard K, Green D, Kent JW, Lange EM, Lee S, Li W, Luo J, Mayhew GM, Naughton KM, Pace RG, Paré P, Rommens JM, Sandford A, Stonebraker JR, Sun W, Taylor C, Vanscoy LL, Zou F, Blangero J, Zielenski J, O'Neal WK, Drumm ML, Durie PR, Knowles MR, Cutting GR. Genome-wide association and linkage identify modifier loci of lung disease severity in cystic fibrosis at 11p13 and 20q13.2. Nat Genet. 2011;43(6):539-546. 59. Corvol H, Blackman SM, Boëlle PY, Gallins PJ, Pace RG, Stonebraker JR, Accurso FJ, Clement A, Collaco JM, Dang H, Dang AT, Franca A, Gong J, Guillot L, Keenan K, Li W, Lin F, Patrone MV, Raraigh KS, Sun L, Zhou YH, O'Neal WK, Sontag MK, Levy H, Durie PR, Rommens JM, Drumm ML, Wright FA, Strug LJ, Cutting GR, Knowles MR. Genome-wide association meta-analysis identifies five modifier loci of lung disease severity in cystic fibrosis. Nat Commun. 2015;6:8382. 60. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38(8):904-909. 61. Therneau TM. A Package for Survival Analysis in S. version 2.38. 2015. 62. Therneau TM. Package ‘coxme’, version 2.2-7. 2018. 63. Therneau T. The lmekin function. 2018. 64. Ling H, Zhang P, Pugh EW, Atalar M, Blackman SM. A Comparison of Methods for Identification of Genetic Variants Related to Age-of-Onset of Cystic Fibrosis Related Diabetes.

The 2017 Annual Meeting of the International Genetic Epidemiology Society, Queens’ College Cambridge, UK, 2017, p 644–709. 65. Bulik-Sullivan BK, Loh PR, Finucane HK, Ripke S, Yang J, Patterson N, Daly MJ, Price AL, Neale BM, Consortium SWGotPG. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet. 2015;47(3):291-295. 66. Bulik-Sullivan B, Finucane HK, Anttila V, Gusev A, Day FR, Loh PR, Duncan L, Perry JR, Patterson N, Robinson EB, Daly MJ, Price AL, Neale BM, Consortium R, Consortium PG, 3 GCfANotWTCCC. An atlas of genetic correlations across human diseases and traits. Nat Genet. 2015;47(11):1236-1241. 67. Manning AK, Hivert MF, Scott RA, Grimsby JL, Bouatia-Naji N, Chen H, Rybin D, Liu CT, Bielak LF, Prokopenko I, Amin N, Barnes D, Cadby G, Hottenga JJ, Ingelsson E, Jackson AU, Johnson T, Kanoni S, Ladenvall C, Lagou V, Lahti J, Lecoeur C, Liu Y, Martinez-Larrad MT, Montasser ME, Navarro P, Perry JR, Rasmussen-Torvik LJ, Salo P, Sattar N, Shungin D, Strawbridge RJ, Tanaka T, van Duijn CM, An P, de Andrade M, Andrews JS, Aspelund T, Atalay M, Aulchenko Y, Balkau B, Bandinelli S, Beckmann JS, Beilby JP, Bellis C, Bergman RN, Blangero J, Boban M, Boehnke M, Boerwinkle E, Bonnycastle LL, Boomsma DI, Borecki IB, Böttcher Y, Bouchard C, Brunner E, Budimir D, Campbell H, Carlson O, Chines PS, Clarke R, Collins FS, Corbatón-Anchuelo A, Couper D, de Faire U, Dedoussis GV, Deloukas P, Dimitriou M, Egan JM, Eiriksdottir G, Erdos MR, Eriksson JG, Eury E, Ferrucci L, Ford I, Forouhi NG, Fox CS, Franzosi MG, Franks PW, Frayling TM, Froguel P, Galan P, de Geus E, Gigante B, Glazer NL, Goel A, Groop L, Gudnason V, Hallmans G, Hamsten A, Hansson O, Harris TB, Hayward C, Heath S, Hercberg S, Hicks AA, Hingorani A, Hofman A, Hui J, Hung J, Jarvelin MR, Jhun MA, Johnson PC, Jukema JW, Jula A, Kao WH, Kaprio J, Kardia SL, Keinanen-Kiukaanniemi S, Kivimaki M, Kolcic I, Kovacs P, Kumari M, Kuusisto J, Kyvik KO, Laakso M, Lakka 
T, Lannfelt L, Lathrop GM, Launer LJ, Leander K, Li G, Lind L, Lindstrom J, Lobbens S, Loos RJ, Luan J, Lyssenko V, Mägi R, Magnusson PK, Marmot M, Meneton P, Mohlke KL, Mooser V, Morken MA, Miljkovic I, Narisu N, O'Connell J, Ong KK, Oostra BA, Palmer LJ, Palotie A, Pankow JS, Peden JF, Pedersen NL, Pehlic M, Peltonen L, Penninx B, Pericic M, Perola M, Perusse L, Peyser PA, Polasek O, Pramstaller PP, Province MA, Räikkönen K, Rauramaa R, Rehnberg E, Rice K, Rotter JI, Rudan I, Ruokonen A, Saaristo T, Sabater-Lleal M, Salomaa V, Savage DB, Saxena R, Schwarz P, Seedorf U, Sennblad B, Serrano-Rios M, Shuldiner AR, Sijbrands EJ, Siscovick DS, Smit JH, Small KS, Smith NL, Smith AV, Stančáková A, Stirrups K, Stumvoll M, Sun YV, Swift AJ, Tönjes A, Tuomilehto J, Trompet S, Uitterlinden AG, Uusitupa M, Vikström M, Vitart V, Vohl MC, Voight BF, Vollenweider P, Waeber G, Waterworth DM, Watkins H, Wheeler E, Widen E, Wild SH, Willems SM, Willemsen G, Wilson JF, Witteman JC, Wright AF, Yaghootkar H, Zelenika D, Zemunik T, Zgaga L, Wareham NJ, McCarthy MI, Barroso I, Watanabe RM, Florez JC, Dupuis J, Meigs JB, Langenberg C, Consortium DGRAM-aD, Consortium MTHERM. A genome-wide approach accounting for body mass index identifies genetic variants influencing fasting glycemic traits and insulin resistance. Nat Genet. 2012;44(6):659-669. 68. 
Scott RA, Lagou V, Welch RP, Wheeler E, Montasser ME, Luan J, Mägi R, Strawbridge RJ, Rehnberg E, Gustafsson S, Kanoni S, Rasmussen-Torvik LJ, Yengo L, Lecoeur C, Shungin D, Sanna S, Sidore C, Johnson PC, Jukema JW, Johnson T, Mahajan A, Verweij N, Thorleifsson G, Hottenga JJ, Shah S, Smith AV, Sennblad B, Gieger C, Salo P, Perola M, Timpson NJ, Evans DM, Pourcain BS, Wu Y, Andrews JS, Hui J, Bielak LF, Zhao W, Horikoshi M, Navarro P, Isaacs A, O'Connell JR, Stirrups K, Vitart V, Hayward C, Esko T, Mihailov E, Fraser RM, Fall T, Voight BF, Raychaudhuri S, Chen H, Lindgren CM, Morris AP, Rayner NW, Robertson N, Rybin D, Liu CT, Beckmann JS, Willems SM, Chines PS, Jackson AU, Kang HM, Stringham HM, Song K, Tanaka T, Peden JF, Goel A, Hicks AA, An P, Müller-Nurasyid M, Franco-Cereceda A, Folkersen L, Marullo L, Jansen H, Oldehinkel AJ, Bruinenberg M, Pankow JS, North KE, Forouhi NG, Loos RJ, Edkins S, Varga TV, Hallmans G, Oksa H, Antonella M, Nagaraja R, Trompet S, Ford I, Bakker SJ, Kong A, Kumari M, Gigante B, Herder C, Munroe PB, Caulfield M, Antti J, Mangino M, Small K, Miljkovic I, Liu Y, Atalay M, Kiess W, James AL, Rivadeneira F, Uitterlinden AG, Palmer CN, Doney AS, Willemsen G, Smit JH, Campbell S, Polasek O, Bonnycastle LL, Hercberg S, Dimitriou M, Bolton JL, Fowkes GR, Kovacs P, Lindström J, Zemunik T, Bandinelli S, Wild SH, Basart HV, Rathmann W, Grallert H, Maerz W, Kleber ME, Boehm BO, Peters A, Pramstaller PP, Province MA, Borecki IB, Hastie ND, Rudan I, Campbell H, Watkins H, Farrall M, Stumvoll M, Ferrucci L, Waterworth DM, Bergman RN, Collins FS, Tuomilehto J, Watanabe RM, de Geus EJ, Penninx BW, Hofman A, Oostra BA, Psaty BM, Vollenweider P, Wilson JF, Wright AF, Hovingh GK, Metspalu A, Uusitupa M, Magnusson PK, Kyvik KO, Kaprio J, Price JF, Dedoussis GV, Deloukas P, Meneton P, Lind L, Boehnke M, Shuldiner AR, van Duijn CM, Morris AD, Toenjes A, Peyser PA, Beilby JP, Körner A, Kuusisto J, Laakso M, Bornstein SR, Schwarz PE, Lakka TA, Rauramaa R, Adair LS, Smith GD, Spector TD, Illig T, de Faire U, Hamsten A, Gudnason V, Kivimaki M, Hingorani A, Keinanen-Kiukaanniemi SM, Saaristo TE, Boomsma DI, Stefansson K, van der Harst P, Dupuis J, Pedersen NL, Sattar N, Harris TB, Cucca F, Ripatti S, Salomaa V, Mohlke KL, Balkau B, Froguel P, Pouta A, Jarvelin MR, Wareham NJ, Bouatia-Naji N, McCarthy MI, Franks PW, Meigs JB, Teslovich TM, Florez JC, Langenberg C, Ingelsson E, Prokopenko I, Barroso I, Consortium DGRaM-aD. Large-scale association analyses identify new loci influencing glycemic traits and provide insight into the underlying biological pathways. Nat Genet. 2012;44(9):991-1005. 69. Aksit M. Data from: Cystic Fibrosis-Related Diabetes & Type 2 Diabetes Supplemental Data. 2019. 10.17605/OSF.IO/AG6ZW, OSF.IO/AG6ZW. 70.
Rosenfeld M, Davis R, FitzSimmons S, Pepe M, Ramsey B. Gender gap in cystic fibrosis mortality. Am J Epidemiol. 1997;145(9):794-803. 71. Harness-Brumley CL, Elliott AC, Rosenbluth DB, Raghavan D, Jain R. Gender differences in outcomes of patients with cystic fibrosis. J Womens Health (Larchmt). 2014;23(12):1012-1020. 72. Sweezey N, Tchepichev S, Gagnon S, Fertuck K, O'Brodovich H. Female gender hormones regulate mRNA levels and function of the rat lung epithelial Na channel. Am J Physiol. 1998;274(2):C379-386. 73. Chotirmall SH, Smith SG, Gunaratnam C, Cosgrove S, Dimitrov BD, O'Neill SJ, Harvey BJ, Greene CM, McElvaney NG. Effect of estrogen on pseudomonas mucoidy and exacerbations in cystic fibrosis. N Engl J Med. 2012;366(21):1978-1986. 74. Schunkert H, König IR, Kathiresan S, Reilly MP, Assimes TL, Holm H, Preuss M, Stewart AF, Barbalic M, Gieger C, Absher D, Aherrahrou Z, Allayee H, Altshuler D, Anand SS, Andersen K, Anderson JL, Ardissino D, Ball SG, Balmforth AJ, Barnes TA, Becker DM, Becker LC, Berger K, Bis JC, Boekholdt SM, Boerwinkle E, Braund PS, Brown MJ, Burnett MS, Buysschaert I, Carlquist JF, Chen L, Cichon S, Codd V, Davies RW, Dedoussis G, Dehghan A, Demissie S, Devaney JM, Diemert P, Do R, Doering A, Eifert S, Mokhtari NE, Ellis SG, Elosua R, Engert JC, Epstein SE, de Faire U, Fischer M, Folsom AR, Freyer J, Gigante B, Girelli D, Gretarsdottir S, Gudnason V, Gulcher JR, Halperin E, Hammond N, Hazen SL, Hofman A, Horne BD, Illig T, Iribarren C, Jones GT, Jukema JW, Kaiser MA, Kaplan LM, Kastelein JJ, Khaw KT, Knowles JW, Kolovou G, Kong A, Laaksonen R, Lambrechts D, Leander K, Lettre G, Li M, Lieb W, Loley C, Lotery AJ, Mannucci PM, Maouche S, Martinelli N, McKeown PP, Meisinger C, Meitinger T, Melander O, Merlini PA, Mooser V, Morgan T, Mühleisen TW, Muhlestein JB, Münzel T, Musunuru K, Nahrstaedt J, Nelson CP, Nöthen MM, Olivieri O, Patel RS, Patterson CC, Peters A, Peyvandi F, Qu L, Quyyumi AA, Rader DJ, Rallidis LS, Rice C, Rosendaal FR, Rubin 
D, Salomaa V, Sampietro ML, Sandhu MS, Schadt E, Schäfer A, Schillert A, Schreiber S,

Schrezenmeir J, Schwartz SM, Siscovick DS, Sivananthan M, Sivapalaratnam S, Smith A, Smith TB, Snoep JD, Soranzo N, Spertus JA, Stark K, Stirrups K, Stoll M, Tang WH, Tennstedt S, Thorgeirsson G, Thorleifsson G, Tomaszewski M, Uitterlinden AG, van Rij AM, Voight BF, Wareham NJ, Wells GA, Wichmann HE, Wild PS, Willenborg C, Witteman JC, Wright BJ, Ye S, Zeller T, Ziegler A, Cambien F, Goodall AH, Cupples LA, Quertermous T, März W, Hengstenberg C, Blankenberg S, Ouwehand WH, Hall AS, Deloukas P, Thompson JR, Stefansson K, Roberts R, Thorsteinsdottir U, O'Donnell CJ, McPherson R, Erdmann J, Samani NJ, Cardiogenics, Consortium C. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat Genet. 2011;43(4):333-338. 75. Morris AP, Voight BF, Teslovich TM, Ferreira T, Segrè AV, Steinthorsdottir V, Strawbridge RJ, Khan H, Grallert H, Mahajan A, Prokopenko I, Kang HM, Dina C, Esko T, Fraser RM, Kanoni S, Kumar A, Lagou V, Langenberg C, Luan J, Lindgren CM, Müller-Nurasyid M, Pechlivanis S, Rayner NW, Scott LJ, Wiltshire S, Yengo L, Kinnunen L, Rossin EJ, Raychaudhuri S, Johnson AD, Dimas AS, Loos RJ, Vedantam S, Chen H, Florez JC, Fox C, Liu CT, Rybin D, Couper DJ, Kao WH, Li M, Cornelis MC, Kraft P, Sun Q, van Dam RM, Stringham HM, Chines PS, Fischer K, Fontanillas P, Holmen OL, Hunt SE, Jackson AU, Kong A, Lawrence R, Meyer J, Perry JR, Platou CG, Potter S, Rehnberg E, Robertson N, Sivapalaratnam S, Stančáková A, Stirrups K, Thorleifsson G, Tikkanen E, Wood AR, Almgren P, Atalay M, Benediktsson R, Bonnycastle LL, Burtt N, Carey J, Charpentier G, Crenshaw AT, Doney AS, Dorkhan M, Edkins S, Emilsson V, Eury E, Forsen T, Gertow K, Gigante B, Grant GB, Groves CJ, Guiducci C, Herder C, Hreidarsson AB, Hui J, James A, Jonsson A, Rathmann W, Klopp N, Kravic J, Krjutškov K, Langford C, Leander K, Lindholm E, Lobbens S, Männistö S, Mirza G, Mühleisen TW, Musk B, Parkin M, Rallidis L, Saramies J, Sennblad B, Shah S, Sigurðsson G, 
Silveira A, Steinbach G, Thorand B, Trakalo J, Veglia F, Wennauer R, Winckler W, Zabaneh D, Campbell H, van Duijn C, Uitterlinden AG, Hofman A, Sijbrands E, Abecasis GR, Owen KR, Zeggini E, Trip MD, Forouhi NG, Syvänen AC, Eriksson JG, Peltonen L, Nöthen MM, Balkau B, Palmer CN, Lyssenko V, Tuomi T, Isomaa B, Hunter DJ, Qi L, Shuldiner AR, Roden M, Barroso I, Wilsgaard T, Beilby J, Hovingh K, Price JF, Wilson JF, Rauramaa R, Lakka TA, Lind L, Dedoussis G, Njølstad I, Pedersen NL, Khaw KT, Wareham NJ, Keinanen-Kiukaanniemi SM, Saaristo TE, Korpi-Hyövälti E, Saltevo J, Laakso M, Kuusisto J, Metspalu A, Collins FS, Mohlke KL, Bergman RN, Tuomilehto J, Boehm BO, Gieger C, Hveem K, Cauchi S, Froguel P, Baldassarre D, Tremoli E, Humphries SE, Saleheen D, Danesh J, Ingelsson E, Ripatti S, Salomaa V, Erbel R, Jöckel KH, Moebus S, Peters A, Illig T, de Faire U, Hamsten A, Morris AD, Donnelly PJ, Frayling TM, Hattersley AT, Boerwinkle E, Melander O, Kathiresan S, Nilsson PM, Deloukas P, Thorsteinsdottir U, Groop LC, Stefansson K, Hu F, Pankow JS, Dupuis J, Meigs JB, Altshuler D, Boehnke M, McCarthy MI, Consortium WTCC, Investigators M-AoGaI-rtCM, Consortium GIoATG, Consortium AGENTDA-TD, Consortium SATDSD, Consortium DGRAM-aD. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat Genet. 2012;44(9):981-990.
76. Turki A, Al-Zaben GS, Khirallah M, Marmouch H, Mahjoub T, Almawi WY. Gender-dependent associations of CDKN2A/2B, KCNJ11, POLI, SLC30A8, and TCF7L2 variants with type 2 diabetes in (North African) Tunisian Arabs. Diabetes Res Clin Pract. 2014;103(3):e40-43.
77. Helgadottir A, Thorleifsson G, Magnusson KP, Grétarsdottir S, Steinthorsdottir V, Manolescu A, Jones GT, Rinkel GJ, Blankensteijn JD, Ronkainen A, Jääskeläinen JE, Kyo Y, Lenk GM, Sakalihasan N, Kostulas K, Gottsäter A, Flex A, Stefansson H, Hansen T, Andersen G, Weinsheimer S, Borch-Johnsen K, Jorgensen T, Shah SH, Quyyumi AA, Granger CB, Reilly MP, Austin H, Levey AI, Vaccarino V, Palsdottir E, Walters GB, Jonsdottir T, Snorradottir S, Magnusdottir D, Gudmundsson G, Ferrell RE, Sveinbjornsdottir S, Hernesniemi J, Niemelä M, Limet R, Andersen K, Sigurdsson G, Benediktsson R, Verhoeven EL, Teijink JA, Grobbee DE,

Rader DJ, Collier DA, Pedersen O, Pola R, Hillert J, Lindblad B, Valdimarsson EM, Magnadottir HB, Wijmenga C, Tromp G, Baas AF, Ruigrok YM, van Rij AM, Kuivaniemi H, Powell JT, Matthiasson SE, Gulcher JR, Thorgeirsson G, Kong A, Thorsteinsdottir U, Stefansson K. The same sequence variant on 9p21 associates with myocardial infarction, abdominal aortic aneurysm and intracranial aneurysm. Nat Genet. 2008;40(2):217-224.
78. Kong Y, Sharma RB, Nwosu BU, Alonso LC. Islet biology, the CDKN2A/B locus and type 2 diabetes risk. Diabetologia. 2016;59(8):1579-1593.
79. Cunnington MS, Santibanez Koref M, Mayosi BM, Burn J, Keavney B. Chromosome 9p21 SNPs Associated with Multiple Disease Phenotypes Correlate with ANRIL Expression. PLoS Genet. 2010;6(4):e1000899.
80. Cutting GR. Cystic fibrosis genetics: from molecular understanding to clinical application. Nat Rev Genet. 2015;16(1):45-56.
81. Han ST, Rab A, Pellicore MJ, Davis EF, McCague AF, Evans TA, Joynt AT, Lu Z, Cai Z, Raraigh KS, Hong JS, Sheppard DN, Sorscher EJ, Cutting GR. Residual function of cystic fibrosis mutants predicts response to small molecule CFTR modulators. JCI Insight. 2018;3(14).
82. Wainwright CE, Elborn JS, Ramsey BW. Lumacaftor-Ivacaftor in Patients with Cystic Fibrosis Homozygous for Phe508del CFTR. N Engl J Med. 2015;373(18):1783-1784.
83. Taylor-Cousar JL, Munck A, McKone EF, van der Ent CK, Moeller A, Simard C, Wang LT, Ingenito EP, McKee C, Lu Y, Lekstrom-Himes J, Elborn JS. Tezacaftor-Ivacaftor in Patients with Cystic Fibrosis Homozygous for Phe508del. N Engl J Med. 2017;377(21):2013-2023.
84. Rowe SM, Daines C, Ringshausen FC, Kerem E, Wilson J, Tullis E, Nair N, Simard C, Han L, Ingenito EP, McKee C, Lekstrom-Himes J, Davies JC. Tezacaftor-Ivacaftor in Residual-Function Heterozygotes with Cystic Fibrosis. N Engl J Med. 2017;377(21):2024-2035.
85. Haggie PM, Phuan PW, Tan JA, Xu H, Avramescu RG, Perdomo D, Zlock L, Nielson DW, Finkbeiner WE, Lukacs GL, Verkman AS. Correctors and Potentiators Rescue Function of the Truncated W1282X-Cystic Fibrosis Transmembrane Regulator (CFTR) Translation Product. J Biol Chem. 2017;292(3):771-785.
86. Linde L, Boelz S, Nissim-Rafinia M, Oren YS, Wilschanski M, Yaacov Y, Virgilis D, Neu-Yilik G, Kulozik AE, Kerem E, Kerem B. Nonsense-mediated mRNA decay affects nonsense transcript levels and governs response of cystic fibrosis patients to gentamicin. J Clin Invest. 2007;117(3):683-692.
87. Wang W, Hong JS, Rab A, Sorscher EJ, Kirk KL. Robust Stimulation of W1282X-CFTR Channel Activity by a Combination of Allosteric Modulators. PLoS One. 2016;11(3):e0152232.
88. Hamosh A, Rosenstein BJ, Cutting GR. CFTR nonsense mutations G542X and W1282X associated with severe reduction of CFTR mRNA in nasal epithelial cells. Hum Mol Genet. 1992;1(7):542-544.
89. Will K, Dörk T, Stuhrmann M, von der Hardt H, Ellemunter H, Tümmler B, Schmidtke J. Transcript analysis of CFTR nonsense mutations in lymphocytes and nasal epithelial cells from cystic fibrosis patients. Hum Mutat. 1995;5(3):210-220.
90. Linde L, Boelz S, Neu-Yilik G, Kulozik AE, Kerem B. The efficiency of nonsense-mediated mRNA decay is an inherent character and varies among different cells. Eur J Hum Genet. 2007;15(11):1156-1162.
91. Mutyam V, Libby EF, Peng N, Hadjiliadis D, Bonk M, Solomon GM, Rowe SM. Therapeutic benefit observed with the CFTR potentiator, ivacaftor, in a CF patient homozygous for the W1282X CFTR nonsense mutation. J Cyst Fibros. 2017;16(1):24-29.
92. Oren YS, McClure ML, Rowe SM, Sorscher EJ, Bester AC, Manor M, Kerem E, Rivlin J, Zahdeh F, Mann M, Geiger T, Kerem B. The unfolded protein response affects readthrough of premature termination codons. EMBO Mol Med. 2014;6(5):685-701.


93. Nguyen LS, Wilkinson MF, Gecz J. Nonsense-mediated mRNA decay: inter-individual variability and human disease. Neurosci Biobehav Rev. 2014;46 Pt 2:175-186.
94. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3).
95. Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25(9):1105-1111.
96. Trapnell C, Hendrickson DG, Sauvageau M, Goff L, Rinn JL, Pachter L. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat Biotechnol. 2013;31(1):46-53.
97. Li W. Volcano plots in analyzing differential expressions with mRNA microarrays. J Bioinform Comput Biol. 2012;10(6):1231003.
98. Isken O, Maquat LE. The multiple lives of NMD factors: balancing roles in gene and genome regulation. Nat Rev Genet. 2008;9(9):699-712.
99. Bhuvanagiri M, Schlitter AM, Hentze MW, Kulozik AE. NMD: RNA biology meets human genetic medicine. Biochem J. 2010;430(3):365-377.
100. Hug N, Longman D, Cáceres JF. Mechanism and regulation of the nonsense-mediated decay pathway. Nucleic Acids Res. 2016;44(4):1483-1495.
101. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25(1):25-29.
102. Martin L, Grigoryan A, Wang D, Wang J, Breda L, Rivella S, Cardozo T, Gardner LB. Identification and characterization of small molecules that inhibit nonsense-mediated RNA decay and suppress nonsense p53 mutations. Cancer Res. 2014;74(11):3104-3113.
103. Rowe SM, Varga K, Rab A, Bebok Z, Byram K, Li Y, Sorscher EJ, Clancy JP. Restoration of W1282X CFTR activity by enhanced expression. Am J Respir Cell Mol Biol. 2007;37(3):347-356.
104. Maquat LE. Nonsense-mediated mRNA decay: splicing, translation and mRNP dynamics. Nat Rev Mol Cell Biol. 2004;5(2):89-99.
105. Shoshani T, Augarten A, Gazit E, Bashan N, Yahav Y, Rivlin Y, Tal A, Seret H, Yaar L, Kerem E. Association of a nonsense mutation (W1282X), the most common mutation in the Ashkenazi Jewish cystic fibrosis patients in Israel, with presentation of severe disease. Am J Hum Genet. 1992;50(1):222-228.
106. Shoshani T, Kerem E, Szeinberg A, Augarten A, Yahav Y, Cohen D, Rivlin J, Tal A, Kerem B. Similar levels of mRNA from the W1282X and the delta F508 cystic fibrosis alleles, in nasal epithelial cells. J Clin Invest. 1994;93(4):1502-1507.
107. Mendell JT, Sharifi NA, Meyers JL, Martinez-Murillo F, Dietz HC. Nonsense surveillance regulates expression of diverse classes of mammalian transcripts and mutes genomic noise. Nat Genet. 2004;36(10):1073-1078.
108. Pertea M, Shumate A, Pertea G, Varabyou A, Breitwieser FP, Chang YC, Madugundu AK, Pandey A, Salzberg SL. CHESS: a new human gene catalog curated from thousands of large-scale RNA sequencing experiments reveals extensive transcriptional noise. Genome Biol. 2018;19(1):208.
109. Kashima I, Yamashita A, Izumi N, Kataoka N, Morishita R, Hoshino S, Ohno M, Dreyfuss G, Ohno S. Binding of a novel SMG-1-Upf1-eRF1-eRF3 complex (SURF) to the exon junction complex triggers Upf1 phosphorylation and nonsense-mediated mRNA decay. Genes Dev. 2006;20(3):355-367.
110. Zhang J, Sun X, Qian Y, LaDuca JP, Maquat LE. At least one intron is required for the nonsense-mediated decay of triosephosphate isomerase mRNA: a possible link between nuclear splicing and cytoplasmic translation. Mol Cell Biol. 1998;18(9):5272-5283.

111. Sharma N, Evans TA, Pellicore MJ, Davis E, Aksit MA, McCague AF, Joynt AT, Lu Z, Han ST, Anzmann AF, Lam AN, Thaxton A, West N, Merlo C, Gottschalk LB, Raraigh KS, Sosnay PR, Cotton CU, Cutting GR. Capitalizing on the heterogeneous effects of CFTR nonsense and frameshift variants to inform therapeutic strategy for cystic fibrosis. PLoS Genet. 2018;14(11):e1007723.
112. Gerbracht JV, Boehm V, Gehring NH. Plasmid transfection influences the readout of nonsense-mediated mRNA decay reporter assays in human cells. Sci Rep. 2017;7(1):10616.
113. Raraigh KS, Han ST, Davis E, Evans TA, Pellicore MJ, McCague AF, Joynt AT, Lu Z, Atalar M, Sharma N, Sheridan MB, Sosnay PR, Cutting GR. Functional Assays Are Essential for Interpretation of Missense Variants Associated with Variable Expressivity. Am J Hum Genet. 2018;102(6):1062-1077.
114. Mutyam V, Du M, Xue X, Keeling KM, White EL, Bostwick JR, Rasmussen L, Liu B, Mazur M, Hong JS, Falk Libby E, Liang F, Shang H, Mense M, Suto MJ, Bedwell DM, Rowe SM. Discovery of Clinically Approved Agents That Promote Suppression of Cystic Fibrosis Transmembrane Conductance Regulator Nonsense Mutations. Am J Respir Crit Care Med. 2016;194(9):1092-1103.
115. Xue X, Mutyam V, Thakerar A, Mobley J, Bridges RJ, Rowe SM, Keeling KM, Bedwell DM. Identification of the amino acids inserted during suppression of CFTR nonsense mutations and determination of their functional consequences. Hum Mol Genet. 2017;26(16):3116-3129.
116. Xue X, Mutyam V, Tang L, Biswas S, Du M, Jackson LA, Dai Y, Belakhov V, Shalev M, Chen F, Schacht J, Bridges RJ, Baasov T, Hong J, Bedwell DM, Rowe SM. Synthetic aminoglycosides efficiently suppress cystic fibrosis transmembrane conductance regulator nonsense mutations and are enhanced by ivacaftor. Am J Respir Cell Mol Biol. 2014;50(4):805-816.
117. Keeling KM, Xue X, Gunn G, Bedwell DM. Therapeutics based on stop codon readthrough. Annu Rev Genomics Hum Genet. 2014;15:371-394.
118. Plasschaert LW, Žilionis R, Choo-Wing R, Savova V, Knehr J, Roma G, Klein AM, Jaffe AB. A single-cell atlas of the airway epithelium reveals the CFTR-rich pulmonary ionocyte. Nature. 2018;560(7718):377-381.
119. Montoro DT, Haber AL, Biton M, Vinarsky V, Lin B, Birket SE, Yuan F, Chen S, Leung HM, Villoria J, Rogel N, Burgin G, Tsankov AM, Waghray A, Slyper M, Waldman J, Nguyen L, Dionne D, Rozenblatt-Rosen O, Tata PR, Mou H, Shivaraju M, Bihler H, Mense M, Tearney GJ, Rowe SM, Engelhardt JF, Regev A, Rajagopal J. A revised airway epithelial hierarchy includes CFTR-expressing ionocytes. Nature. 2018;560(7718):319-324.
120. Ottens F, Gehring NH. Physiological and pathophysiological role of nonsense-mediated mRNA decay. Pflugers Arch. 2016;468(6):1013-1028.
121. Nickless A, Jackson E, Marasa J, Nugent P, Mercer RW, Piwnica-Worms D, You Z. Intracellular calcium regulates nonsense-mediated mRNA decay. Nat Med. 2014;20(8):961-966.
122. Bhuvanagiri M, Lewis J, Putzker K, Becker JP, Leicht S, Krijgsveld J, Batra R, Turnwald B, Jovanovic B, Hauer C, Sieber J, Hentze MW, Kulozik AE. 5-azacytidine inhibits nonsense-mediated decay in a MYC-dependent fashion. EMBO Mol Med. 2014;6(12):1593-1609.
123. Roy B, Leszyk JD, Mangus DA, Jacobson A. Nonsense suppression by near-cognate tRNAs employs alternative base pairing at codon positions 1 and 3. Proc Natl Acad Sci U S A. 2015;112(10):3038-3043.
124. Huang L, Low A, Damle SS, Keenan MM, Kuntz S, Murray SF, Monia BP, Guo S. Antisense suppression of the nonsense mediated decay factor Upf3b as a potential treatment for diseases caused by nonsense mutations. Genome Biol. 2018;19(1):4.


125. Castro-Nallar E, Shen Y, Freishtat RJ, Pérez-Losada M, Manimaran S, Liu G, Johnson WE, Crandall KA. Integrating microbial and host transcriptomics to characterize asthma-associated microbial communities. BMC Med Genomics. 2015;8:50.
126. Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 2012;7(3):562-578.


Curriculum Vitae

Melis Atalar Aksit

733 N Broadway MRB 552, Baltimore, MD, 21205 Ph: +1 (443) 875 – 9534; e-mail: [email protected]

Education

Ph.D., Human Genetics, Johns Hopkins University School of Medicine, Baltimore, MD 2020

B.Sc., Molecular Biology and Genetics, Bilkent University, Ankara, Turkey 2015
  Tuition waiver merit scholarship
  Graduation with high honor

Research Experience

Doctoral Thesis Student, Johns Hopkins University School of Medicine, Baltimore, MD 2015 - 2020
  Advisors: Garry R. Cutting, M.D. & Scott M. Blackman, M.D., Ph.D.
  Thesis Title: Computational analysis of genetic modifiers of cystic fibrosis-related diabetes

Undergraduate Researcher, Bilkent University, Ankara, Turkey 2011-2015 Advisor: Tayfun Ozcelik, M.D.

Summer Undergraduate Researcher, Rockefeller University, New York, NY Summers of 2012, 2013, 2014 Advisor: Jeffrey Friedman, M.D., Ph.D. (Total 6 months)

Awards

1st place among doctoral students in Genetics Research Day poster competition April 3, 2019 Maryland Genetics, Epidemiology and Medicine Training Program (MD-GEM), Baltimore, MD

Semi-finalist for Junior Investigators Best Abstract in Basic Science October 18-20, 2018 North American Cystic Fibrosis Conference, Denver, CO

2nd place among doctoral students in Genetics Research Day poster competition February 9, 2018 Maryland Genetics, Epidemiology and Medicine Training Program (MD-GEM), Baltimore, MD

Teaching Experience and Mentoring

Teaching assistant, Johns Hopkins University School of Medicine, Baltimore, MD Nov 2017 – Jan 2018 Graduate course: Computational Biology and Bioinformatics

Teaching assistant, Johns Hopkins University, Center for Computational Genomics June 25 - 28 2018 Practical Genomics Workshop: From Biology to Biostatistics

Private tutor, Johns Hopkins University School of Medicine, Baltimore, MD April – May 2019

Mentor of 3 Ph.D. rotation students, Johns Hopkins University School of Medicine, Baltimore, MD 2017, 2019

Peer-Reviewed Publications

1. Aksit MA, Pace RG, Vecchio-Pagan B, Ling H, Rommens JM, Boelle P, Guillot L, Pugh E, Zhang P, Strug LJ, Drumm ML, Knowles MR, Cutting GR, Corvol H, Blackman SM. Genetic modifiers of cystic fibrosis-related diabetes have extensive overlap with type 2 diabetes and related traits. JCEM. 2019.
2. Lam ATN, Aksit MA, Vecchio-Pagan B, Shelton CA, Anzmann AF, Goff L, Whitcomb DC, Blackman SM, Cutting GR. Increased expression of SLC26A9 delays age-at-onset of cystic fibrosis-related diabetes. JCI. 2019.


3. Aksit MA†, Bowling AD†, Evans TA, Joynt AT, Osorio D, Patel S, West N, Merlo C, Sosnay PR, Cutting GR, Sharma N. Decreased mRNA and protein stability of W1282X limits response to modulator therapy. Journal of Cystic Fibrosis. Sep 2019. † equal contributions
4. Na CH, Sharma N, Madugundu AK, Chen R, Aksit MA, Rosson GD, Cutting GR, Pandey A. Integrated transcriptomic and proteomic analysis of human eccrine sweat glands identifies missing and novel proteins. Molecular & Cellular Proteomics. Apr 2019.
5. Sharma N, Evans TA, Pellicore MJ, Davis E, Aksit MA, McCague AF, Joynt AT, Lu Z, Han ST, Anzmann AF, Lam ANT, Thaxton A, West N, Merlo C, Gottschalk LB, Raraigh KS, Sosnay PR, Cotton CU, Cutting GR. Capitalizing on the heterogeneous effects of CFTR nonsense and frameshift variants to inform therapeutic strategy for cystic fibrosis. PLOS Genetics. Nov 2018.
6. Teerapuncharoen K, Wells JM, Raju SV, Raraigh KS, Atalar M, Cutting GR, Rasmussen L, Nath PH, Bhatt SP, Solomon GM, Dransfield MT, Rowe SM. Acquired CFTR Dysfunction and Radiographic Bronchiectasis in Current and Former Smokers: A Cross-Sectional Study. Ann Am Thorac Soc. 2018.
7. Raraigh KS, Han ST, Davis E, Evans TA, Pellicore MJ, McCague AF, Joynt AT, Lu Z, Atalar M, Sharma N, Sheridan MB, Sosnay PR, Cutting GR. Functional Assays Are Essential for Interpretation of Missense Variants Associated with Variable Expressivity. AJHG. 2018 June. 102(6):1062-1077.
8. Lee M, Roos P, Sharma N, Atalar M, Evans TA, Pellicore MJ, Davis E, Lam ATN, Stanley SE, Khalil SE, Solomon GM, Walker D, Raraigh KS, Vecchio-Pagán B, Armanios M, Cutting GR. Systematic Computational Identification of Variants That Activate Exonic and Intronic Cryptic Splice Sites. AJHG. 2017 May. 100(5):751-765.
9. Vecchio-Pagán B, Blackman SM, Lee M, Atalar M, Pellicore MJ, Pace RG, Franca AL, Raraigh KS, Sharma N, Knowles MR, Cutting GR. Deep resequencing of CFTR in 762 F508del homozygotes reveals clusters of non-coding variants associated with cystic fibrosis disease traits. Hum Genome Var. 2016 Nov 24;3:16038.
10. Sharma N, LaRusch J, Sosnay P, Gottschalk LB, Lopez A, Pellicore MJ, Evans T, Davis E, Atalar M, Na CH, Rosson GD, Belchis D, Milewski M, Pandey A, Cutting GR. A sequence upstream of canonical PDZ binding motif within CFTR C-terminus enhances NHERF1 interaction. Am J Physiol Lung Cell Mol Physiol. 2016.

First author conference poster and oral presentations

1. Aksit MA, Pace RG, Vecchio-Pagan B, Ling H, Rommens JM, Boelle PY, Guillot L, Raraigh KS, Pugh E, Zhang P, Strug LJ, Drumm ML, Knowles MR, Cutting GR, Corvol H, Blackman SM. Genetic modifiers of cystic fibrosis-related diabetes have extensive overlap with type 2 diabetes and related traits. Gordon Research Conference on Human Genetics and Genomics. July 7-12, 2019. Waterville Valley, NH. Poster Presentation.
2. Aksit MA, Pace RG, Vecchio-Pagan B, Ling H, Rommens JM, Boelle PY, Guillot L, Raraigh KS, Pugh E, Zhang P, Strug LJ, Drumm ML, Knowles MR, Cutting GR, Corvol H, Blackman SM. Common variants in CDKN2B-AS1 are associated with earlier onset of CFRD in females. Maryland Genetics, Epidemiology and Medicine (MD-GEM) Genetics Research Day. April 3, 2019. Baltimore, MD. Poster Presentation.
3. Atalar M, Vecchio-Pagan B, Ling H, Strug LJ, Pace RG, Corvol H, Rommens JM, Drumm ML, Knowles MR, Cutting GR, Blackman SM. Identification of new genetic modifiers of CF-related diabetes and evidence of sex-dependent association. North American Cystic Fibrosis Conference. Oct 18-20, 2018. Denver, CO. Poster Presentation.


4. Atalar M, Vecchio-Pagan B, Ling H, Strug LJ, Pace RG, Corvol H, Rommens JM, Drumm ML, Knowles MR, Cutting GR, Blackman SM. Genome-wide association analysis identifies modifiers of cystic fibrosis-related diabetes. Gordon Research Conference on Human Genetic Variation and Disease. June 10-15, 2018. Biddeford, ME. Poster Presentation.
5. Atalar M, Vecchio-Pagan B, Ling H, Strug LJ, Pace RG, Corvol H, Rommens JM, Drumm ML, Knowles MR, Cutting GR, Blackman SM. Genetic Architecture of Cystic Fibrosis-Related Diabetes. Maryland Genetics, Epidemiology and Medicine (MD-GEM) Genetics Research Day. February 9, 2018. Baltimore, MD. Poster Presentation.
6. Atalar M, Vecchio-Pagan B, Strug LJ, Pace RG, Corvol H, Rommens JM, Drumm ML, Knowles MR, Cutting GR, Blackman SM. Diabetes in cystic fibrosis and type 2 diabetes (T2D) have overlapping genetic risk architecture. Oct 17-21, 2017. The American Society of Human Genetics Meeting. Orlando, FL. Poster Presentation.
7. Atalar M, Vecchio-Pagan B, Lam AT, Davis E, Akhtar Y, Sharma N, Blackman SM, Cutting GR. Analysis of the SLC26A9 Locus as a modifier of Cystic Fibrosis. Oct 27-29, 2016. 30th Annual North American Cystic Fibrosis Conference. Orlando, FL. Platform Presentation.
