MOLECULAR CHARACTERIZATION OF SLC26A9 AND EVALUATION OF ITS POTENTIAL ROLE AS A MODIFIER OF CYSTIC FIBROSIS

By Anh-Thu Ngoc Lam

A dissertation submitted to The Johns Hopkins University in conformity with the requirements for the degree of Doctor of Philosophy

Baltimore, Maryland May 2019

© 2019 Anh-Thu Lam. All rights reserved.

Abstract

Diabetes is an age-dependent complication of cystic fibrosis (CF) that affects ~19% of adolescents and 40-50% of adults with CF. Genome-wide studies of age-at-onset of CF-related diabetes (CFRD) have identified variants near and within SLC26A9. SLC26A9 encodes a chloride/bicarbonate transporter that interacts with CFTR, the protein that is defective in CF.

We have previously evaluated the genetic architecture of SLC26A9 and identified two linkage disequilibrium (LD) blocks containing two common haplotypes that encompass all variants associated with a higher risk (HR) or lower risk (LR) of diabetes. No coding variants were in LD with either haplotype, and no individual variant showed a stronger association with diabetes than the two common haplotypes. We identified a single transcriptional start site, confirming inclusion of a non-coding exon 1 in SLC26A9 transcripts in human pancreas, lung and stomach. Single-cell RNA sequencing indicated that SLC26A9 and CFTR are coexpressed in a minor fraction of pancreatic ductal cells.
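LD blocks such as those described above are built from pairwise LD statistics between SNPs. As a minimal sketch, assuming two biallelic SNPs and haplotype frequencies that are already known or estimated, the standard measures D, D′ and r² can be computed as follows. The function name and the example frequencies are hypothetical and are not taken from this study's data or software.

```python
from typing import Tuple


def ld_stats(f_AB: float, f_Ab: float, f_aB: float, f_ab: float) -> Tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic SNPs (hypothetical helper).

    A/a are the alleles at SNP 1, B/b the alleles at SNP 2; the inputs are
    the four two-SNP haplotype frequencies, which must sum to 1.
    """
    if abs(f_AB + f_Ab + f_aB + f_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = f_AB + f_Ab  # frequency of allele A at SNP 1
    p_B = f_AB + f_aB  # frequency of allele B at SNP 2
    d = f_AB - p_A * p_B  # deviation from independence (linkage equilibrium)
    # D' scales D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max if d_max > 0 else 0.0
    # r^2 is undefined for a monomorphic SNP; report 0 in that case.
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical example: only the AB and ab haplotypes are observed (complete LD).
print(ld_stats(0.6, 0.0, 0.0, 0.4))
```

D′ reaches 1 whenever one haplotype is missing, whereas r² reaches 1 only when two SNPs are fully interchangeable; this is why r² is the more natural measure for asking whether a single variant could stand in for a whole haplotype.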

Characterization of the regulatory landscape 5’ of exon 1 and profiling of the transcriptome of pancreatic ductal cells that express SLC26A9 and CFTR suggested that FOS and JUN transcription factors may direct SLC26A9 expression. DNA fragments from the 5’ region of SLC26A9 bearing LR variants generated 12-20% higher levels of expression in PANC-1 and CFPAC-1 cells than fragments containing HR variants, consistent with eQTL analysis. Finally, using primary tissue, we were able to exclude alternative splicing of the 3’ end of SLC26A9 transcripts, which had been predicted by RNA-seq. We also attempted to verify our findings by quantifying levels of SLC26A9 and PM20D1 in primary human tissue relative to HPRT1 and TBP.
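Quantities "relative to HPRT1 and TBP" are commonly derived from qPCR threshold cycles (Ct) with the comparative ΔCt method. The sketch below assumes near-100% amplification efficiency (a factor of 2 per cycle) and normalizes to the mean Ct of the two reference genes, which is equivalent to normalizing to the geometric mean of their quantities. The exact normalization used in this dissertation is not stated here, and the function name and Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def relative_quantity(ct_target: float, ct_refs: Sequence[float], efficiency: float = 2.0) -> float:
    """Comparative delta-Ct: target quantity relative to reference genes.

    Averaging the reference Ct values corresponds to taking the geometric
    mean of their quantities, assuming every assay has the same efficiency.
    """
    if not ct_refs:
        raise ValueError("at least one reference gene Ct is required")
    delta_ct = ct_target - mean(ct_refs)
    return efficiency ** (-delta_ct)


# Hypothetical Ct values: SLC26A9 (target) vs. HPRT1 and TBP (references).
print(relative_quantity(30.0, [26.0, 28.0]))  # 0.125
```

A target that crosses threshold three cycles after the reference average is therefore present at roughly one eighth of the reference level, under the stated efficiency assumption.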

Quantitative PCR analysis revealed no observable correlation between LR or HR haplotypes and SLC26A9 expression, and a subtle trend for PM20D1 expression, as predicted by eQTL analysis. However, power analysis indicated that twice as many samples would need to be analyzed to conclude firmly (alpha = 0.05) that no correlation exists.
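The power statement above can be made concrete with the Fisher z approximation for the sample size needed to detect a nonzero Pearson correlation at a two-sided significance level alpha with a chosen power. This is a generic sketch, not the calculation performed in the dissertation: the effect sizes and the 80% power target are assumptions for illustration.

```python
import math
from statistics import NormalDist


def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size to detect a correlation r (two-sided test).

    Fisher z approximation: n = ((z_{1-alpha/2} + z_{power}) / atanh(r))^2 + 3.
    """
    if not 0 < abs(r) < 1:
        raise ValueError("r must satisfy 0 < |r| < 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile giving the desired power
    return math.ceil(((z_alpha + z_power) / math.atanh(abs(r))) ** 2 + 3)


for r in (0.5, 0.3):
    print(r, n_for_correlation(r))
```

With alpha = 0.05 and 80% power this approximation gives 85 samples for r = 0.3. Because n grows roughly with 1/r² for small r, halving the correlation one wishes to detect roughly quadruples the required sample size.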

Collectively, our findings indicate that a modest increase in SLC26A9 expression in ductal cells of the pancreas delays the age-at-onset of diabetes, thereby providing a new target for CFTR-agnostic treatment of a major complication of CF.

Thesis Committee

Advisor & 1st Reader: Garry R. Cutting, MD Professor Department of Genetic Medicine - Johns Hopkins University School of Medicine

Chair: Andrew S. McCallion, PhD Associate Professor Department of Molecular & Comparative Pathobiology Johns Hopkins University School of Medicine

Scott M. Blackman, MD/PhD Associate Professor Department of Pediatrics - Johns Hopkins University School of Medicine

Jill A. Fahrner, MD/PhD Assistant Professor Department of Pediatrics - Johns Hopkins University School of Medicine

2nd Reader: Daniel S. Warren, PhD Assistant Professor Department of Surgery - Johns Hopkins University School of Medicine

Dedication

I am dedicating this thesis to the beloved people whose love and support are the hidden strength behind my every success.

To my beloved paternal grandmother Thoai Thi Le, whose role in my life, though halfway around the world, has been, and remains, immeasurable. And in loving memory of my late paternal grandfather Binh Ngoc Lam, and my maternal grandparents, Long Trieu Tran and Sau Thi Nguyen.

To my parents, Tuan Ngoc Lam and Mai Ngoc Tran, who have always been a source of inspiration and motivation and have instilled in me the virtues of perseverance, commitment and hard work. They have relentlessly encouraged me to strive for excellence and to become a proud Vietnamese-American woman. Without their selfless sacrifices and the lessons that they have taught me, I would not be where I am today.

To my younger brother Minh Quang Lam, with whom I share the love of Harry Potter, the Patriots, Judo and Taekwondo. The rivalry between us growing up has pushed me to succeed, and I am so happy to see him pursue his PhD training.

To my husband Danh Cong Do who has taken every step with me over the past 11 years: marrying him was one of the highlights of graduate school. Danh is a very giving person who has inspired me to not only be a great scientist but a better person and has been my motivation for continuing to improve my knowledge. In addition to cooking dinner every night, he has patiently listened to all of my stories about lab and experimental challenges. I am so thankful for his confidence in my abilities and for his incredible patience. I look forward to continuing my life’s endeavors and sharing many more adventures with him (and Charles Gregor the dog) by my side.

Acknowledgments

It goes without saying that the most important person in shaping the scientific abilities of a graduate student is the thesis advisor, so I would like first to express my sincere gratitude to Dr. Garry Cutting for his continuous guidance, ceaseless patience, constant encouragement and contagious enthusiasm throughout the course of my training, especially during the low moments. He has constantly reminded me that a PhD is a marathon and not a sprint, and he has supported my career goals from day one.

He has pushed me to understand and practice rigorous scientific thinking, all of which has been instrumental in my formation as an independent scientist and scholar. The critical and creative thinking that he has taught me about solving scientific problems, whether big or small, was essential to the execution of this research and will undoubtedly guide me through the next phase of my career and beyond. I am continually inspired by his inexhaustible energy and commitment to science, his family, and the people around him, and I could not have imagined having a better advisor and mentor for my Ph.D. training.

Not least among the privileges of training with Dr. Cutting has been the opportunity to work with and learn from many amazingly talented people. I am so thankful for the members of the Cutting Lab, past and present, who have provided friendship and support, and with whom I have shared laughter, frustration and companionship. Some of my most challenging days in the lab were spent trying to generate the luciferase constructs, and Dr. Neeraj Sharma was always there to provide helpful direction, or plenty of laughs when my attempts failed. I am deeply grateful to Melis Atalar Aksit, my “kitten pair”, without whom this project would undoubtedly not have been possible. I value our brainstorming conversations and productive collaborations, as well as the resulting friendship, which I will forever treasure. Many thanks to Dr. Scott Blackman for providing a new perspective on a problem, especially when it involves statistics, and for allowing me to shadow him, along with Dr. Yasmin Akhtar and Dr. Patrick Sosnay, in clinic. Thank you to Tricia Cornwall for making sure I got paid, reimbursed and scheduled, and for keeping all of us in line, and to Karen for taking over as the lab social organizer and for motivating the members for our annual CF Great Strides fundraising. The Cutting lab members truly made coming to lab each day, and my graduate school experience, a pleasure. Thank you for the countless discussions, pep talks and laughs. Many thanks to my fellow graduate students Dr. Laura Gottschalk, Dr. Briana Vecchio-Pagan, Dr. Melissa Lee, Dr. Ted Han, and Dr. Allison McCague for paving the way for one another, and to Anya and Alyssa, our youngest members, for the fun lunch conversations. Many thanks to the technicians who have come through the lab, Matthew Pellicore, Taylor Evans, Emily Marcisak, Corey Lu, Kathleen Paul and, in particular, Derek Luciano, who contributed significantly to the second half of my thesis work, for keeping the lab well-stocked and making it a vibrant place to work.

I would like to gratefully acknowledge my thesis committee members, Dr. Andrew McCallion, Dr. Scott Blackman, Dr. Jill Fahrner, Dr. Daniel Warren, and Dr. Michael Parsons, for their precious time, expertise and insightful feedback on various aspects of my project. I would like to thank my oral examination committee, Dr. Susan Michaelis, Dr. Jill Fahrner, Dr. Kathleen Burns, Dr. Kirby Smith and Dr. Gregg Semenza, for having faith that I would be able to complete a doctoral dissertation. I would also like to acknowledge the generosity of the Cystic Fibrosis Foundation for funding my thesis project on SLC26A9.

A big thanks to my wider support system. Thank you to my inspiring and challenging teachers and professors along the way. Mrs. Alexandra Pollard and Mrs. Rita Silvia, whose pedagogical talents I witnessed in AP Biology and Spanish, respectively, at Hanover High School, are greatly responsible for my love of learning. As an undergraduate research student, I was fortunate to have the opportunity to work and write an honors thesis under the mentorship of Dr. Bryan Ballif. He was and remains my role model as a scientist, mentor and teacher. Dr. Carmen Pont was a perpetual guide during my tenure at UVM as I attempted to navigate my myriad academic interests. She encouraged me to pursue a degree in Spanish Language and Literature in addition to Biochemistry, and she continues to champion my success as a student and person. I will also be forever thankful for my experience at Boston Children’s Hospital prior to starting at Hopkins, because it was there that I was introduced to genetics research in Dr. Christopher Walsh’s lab. Dr. Walsh’s big scientific vision encompassing human disease and mechanisms of brain development, combined with his ability to steadily focus on tractable scientific questions, was truly inspirational and an important reason why I pursued PhD training.

Life has also given me the most amazing friends. Thank you to Dr. Sandeep Dhall, Dr. Jungeun Sung, Dr. Michael Chou, Amanda Schmitt, Tan Tran, Xinh-Xinh Nguyen, Ly Tran, Huy Tran, Amelia Palmer, Lauren Coneeny, Sarah Robbins, Derin Aksit, Alejandro Betancourt, Alberto Paredes and a myriad of other friends and extended family who have accompanied me along the way, for accepting nothing less than excellence from me.

In addition to my nuclear family, to whom I dedicate this thesis, I am blessed with a second pair of parents, Thuan Cong Do and Hoa Thi Nguyen, who provide additional pillars of security in my life and always keep me in their prayers. Through Danh, I have also gained eight more brothers and sisters: Anh Thanh, Chi Linh, Dat, Oanh, Loan, Jose, Phuong, and Nam. There’s never a dull moment at any of our family gatherings!

Finally, I am grateful to have had the privilege of training here at Johns Hopkins in the Human Genetics program. Many thanks to Dr. David Valle and the admissions committee for giving me the opportunity. I would also like to thank Ms. Sandy Muscelli for making sure everything was in line for interviews, orientation, qualifications, graduation, and everything in between. To my fellow doctoral students in the program, particularly the Class of 2014, thank you for your support, feedback and friendship. This experience has afforded me the opportunity to work with some of the best and brightest, and the resources to achieve great success.

This thesis would not have been possible without the enormous efforts and support of countless people I am so lucky to have in my life. I could not have made it through the challenging graduate school years without the love, support, and advice from all of you.

Table of Contents

Abstract

Thesis Committee

Dedication

Acknowledgments

Table of Contents

List of Tables

List of Figures

Chapter 1. Introduction

1.1 Cystic Fibrosis

1.2 Cystic Fibrosis-Related Diabetes

1.3 Genetic Modifiers of Cystic Fibrosis

1.4 SLC26A9

Chapter 2. Evaluation of SLC26A9 as a modifier of age-at-onset of CFRD

2.1 Introduction

2.2 Results

2.3 Discussion

2.4 Materials and Methods

Chapter 3. Evaluation of SLC26A9 and PM20D1 expression in Human Stomach, Lung and Pancreas

3.1 Introduction

3.2 Results

3.3 Discussion

3.4 Materials and Methods

Chapter 4. Conclusions

References

Curriculum Vitae

List of Tables

Chapter 2

Table 2.1 Expression of transcription factors and CFTR in the pancreas, PANC-1, and CFPAC-1 cells

Table 2.2 SLC26A9 and relevant expression in pancreatic cells

Table 2.3 Summary statistics of association for SNPs that reached genome-wide or suggestive significance in the genome-wide association study for CFRD onset (Blackman et al., 2013)

Table 2.4 The number of cells sequenced by predicted cell type in our study (‘Current Study’) and two publicly available datasets (Baron et al., 2016 (GSE84133); Segerstolpe et al., 2016 (E-MTAB-5061))

Table 2.5 Average normalized gene expression values of selected in cells that express only CFTR, only SLC26A9, both or neither

Table 2.6 Top CFRD-associated variants as eQTLs for SLC26A9

Chapter 3

Table 3.1 List of SNPs used to identify the haplotype of the gDNA extracted from human tissue samples

List of Figures

Chapter 2

Figure 2.1 Association of SNPs and 5 kb regions across a 47.7 kb region encompassing SLC26A9 with CFRD in 762 F508del homozygous individuals

Figure 2.2 Two common haplotypes that associate with age-at-onset of CFRD

Figure 2.3 Transcription Start Site of SLC26A9 begins in Exon 1

Figure 2.4 Regulatory annotations 5’ and within SLC26A9 from the UCSC Genome Browser

Figure 2.5 Coexpression of SLC26A9 and CFTR in pancreatic cells

Figure 2.6 Dissection of regulatory regions 5’ of SLC26A9

Figure 2.7 Haplotypes observed in 762 F508del homozygous samples (1,524 ) across the SLC26A9 locus

Figure 2.8 Violin plot of CFTR and SLC26A9 expression in pancreatic cells

Figure 2.9 Dual Luciferase-Renilla Experimental Design

Chapter 3

Figure 3.1 RNA Sequencing Data predicts three isoforms of SLC26A9 with varying levels of expression in different human tissues

Figure 3.2 SLC26A9 expression profile in human stomach tissues (N=20)

Figure 3.3 SLC26A9 expression profile in human lung tissue (N=25)

Figure 3.4 SLC26A9 expression profile in human pancreas tissue (N=25)

Figure 3.5 Gene expression of SLC26A9, CFTR, PM20D1 and housekeeping genes in the lung, salivary gland, pancreas, stomach and skin

Figure 3.6 SLC26A9 quantity relative to HPRT1 and TBP in human pancreas tissue

Figure 3.7 SLC26A9 quantity relative to HPRT1 and TBP in human stomach tissue

Figure 3.8 SLC26A9 quantity relative to HPRT1 and TBP in human lung tissue

Figure 3.9 PM20D1 quantity relative to HPRT1 and TBP in human pancreas tissue

Figure 3.10 Graphical depiction of SLC26A9 coding transcripts

Chapter 1 Introduction


1.1 Cystic Fibrosis

Cystic fibrosis (CF) is a chronic, progressive, and often fatal autosomal recessive disease and one of the most common lethal hereditary disorders among Caucasians. CF affects approximately 30,000 individuals in the United States, with 1,000 new cases reported annually (1). Worldwide, CF affects approximately 70,000 individuals (1).

CF is caused by deleterious variants in the CF transmembrane conductance regulator (CFTR) gene (2-4). CFTR is a cAMP-regulated chloride (Cl-) and bicarbonate transporter that is expressed on the apical membrane of epithelial cells (PMID: 20473396). CFTR plays a crucial role in the absorption and secretion of salt by numerous epithelial organs (5). In airway epithelium, lack of chloride transport due to loss of CFTR function leads to an increase in intracellular salt concentration, hyper-absorption of sodium chloride and a decrease in periciliary salt-water content, resulting in impaired mucociliary clearance (6). One of the hallmarks of CF is allelic heterogeneity: over 2000 variants in CFTR have been described to date (www.genet.sickkids.on.ca/cftr/). Some of the phenotypic variability in individuals with CF can be explained by the nature of their CFTR variants. However, genotype-phenotype studies have revealed that variability in disease severity is also influenced by factors other than the CFTR genotype (7).

Because mortality is primarily due to pulmonary insufficiency, early work to unravel the molecular basis and pathophysiology of CF mainly focused on the chronic obstruction of the lung that evolves from early-onset mucus plugging in the small airways (8-11). However, CF is a highly variable, multi-system disease caused by variation in CFTR. Individuals with CF suffer from obstruction and inflammation not only of the lung, but also of the pancreas, sweat glands, intestines, biliary tract, and male reproductive tract (12). The major manifestations of CF are briefly described below.

CF phenotype

Lung

Lung disease is the most significant cause of morbidity and mortality in CF and is the cause of death in almost 90% of individuals with CF (13). The defect in chloride secretion and the excessive sodium reabsorption in the lungs impair the clearance of noxious material and compromise the removal of mucins and defense substances deposited onto the airway surface. This sets the stage for chronic bacterial infection, most commonly by Pseudomonas aeruginosa, Burkholderia cepacia and Staphylococcus aureus (14, 15). Chronic infection and inflammation lead to bronchiectasis (permanent dilation and destruction of the large airways), irreversible lung damage, progressive decline in lung function, and death. Therapies targeting early eradication of these pathogens, as well as the efficacy of chronic suppressive antibiotic treatment, are being evaluated in clinical trials.

Pancreas

The exocrine pancreas provides digestive enzymes and other substances critical for the digestion of food into components that are readily absorbed by the intestinal epithelium (16). Acinar cells and duct cells are the key cellular components responsible for the secretion of fluid and digestive enzymes by the exocrine pancreas. CFTR is highly expressed in the pancreatic duct epithelium, where it allows anions and water to enter the ductal lumen, increasing the volume of alkaline fluid that solubilizes the highly concentrated proteins secreted by the acinar cells (17). Thus, absence of CFTR in the pancreas leads to a low rate of secretion with a high protein concentration, which can precipitate in the duct lumina, causing obstruction, inflammation, and damage (18). Without proper functioning of the exocrine pancreas, malabsorption and malnutrition occur, as is often seen in patients with CF.

Pancreatic insufficiency in CF is characterized by the loss of acinar cells with fatty replacement and interstitial fibrosis, and it affects approximately 80% of patients (19, 20). The spectrum of pancreatic disease in CF ranges from complete loss of exocrine and endocrine function to nearly normal pancreatic function, and it correlates directly with the class of CFTR variants (21). While the CFTR genotype does not predict survival, owing to the complex interaction between genetic and environmental modifiers of lung disease, it is highly predictive of pancreatic exocrine status (7, 22). For instance, only a small percentage of F508del homozygotes have preserved pancreatic function (23). Pancreatic insufficiency can be detected in neonates on the basis of elevated serum immunoreactive trypsinogen (IRT); hence, IRT serves as a marker in newborn screening for CF (24, 25).

Cystic fibrosis-related diabetes (CFRD; see Section 1.2 for details) is a unique type of diabetes, although it shares some features with both type 1 and type 2 diabetes. In individuals with CF, the thick mucus that is characteristic of the disease causes scarring of the pancreas, which prevents the pancreas from producing normal amounts of insulin. The resulting insulin deficiency is similar to that found in type 1 diabetes. In addition, individuals with CFRD may not respond to insulin as efficiently, similar to those with type 2 diabetes; this is referred to as insulin resistance. CFRD affects 2% of children, 19% of adolescents and 50% of adults with CF, and >90% of pancreatic-insufficient individuals with CF have CFRD by age ~50 (26, 27). The development of diabetes is associated with worse lung disease and increased morbidity (28) and mortality (26, 27).

Sweat gland

Individuals with CF have very salty sweat because the lack of functional CFTR prevents normal NaCl absorption. The resulting high salt concentration in sweat forms the basis of the Quantitative Pilocarpine Iontophoresis Test (QPIT), which is still used today as a diagnostic test for CF (29). The human sweat gland is composed of two structurally and functionally distinct units, the secretory coil and the reabsorptive duct epithelium (29, 30). CFTR is found in both units. The secretory coil secretes NaCl and water in response to sympathetic cholinergic innervation and beta-adrenergic stimulation. The epithelial cells of the reabsorptive duct express constitutively active CFTR on both the apical and basolateral membranes, which, together with ENaC, regulates net transepithelial NaCl absorption (31, 32). How CFTR and ENaC activities are coordinated is still poorly understood and requires further research. Although very salty sweat is not a major clinical problem, the sweat gland has been pivotal to the general understanding of the role of CFTR in CF.

Intestine

Approximately 20% of neonates with CF present with meconium ileus (MI), an intestinal obstruction caused by thick, dehydrated plugs of meconium in the terminal ileum (33). MI manifests in the distal ileum or proximal colon and can necessitate surgical treatment to relieve the obstruction (34, 35). The class or type of the underlying CFTR variant correlates only partially with the development of MI. Kerem et al. documented MI in 29% of subsequent siblings in families in which the first child had MI, compared with 6% of siblings in families in which the first child did not have MI (36). Familial clustering and the poor genotype-phenotype correlation suggest that other genetic factors, such as modifier genes, may be involved in the development of MI. Indeed, a CF modifier locus for MI has been implicated on the long arm of human chromosome 19 (19q13) (37). Variants in and near SLC26A9 have also been associated with risk for MI in CF (38).
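The sibling data reported by Kerem et al. can be summarized as a recurrence risk ratio: the MI rate in later siblings when the first child had MI, divided by the rate when the first child did not. The short sketch below uses only the two percentages quoted above; the helper name is hypothetical.

```python
def risk_ratio(risk_index_affected: float, risk_index_unaffected: float) -> float:
    """Ratio of two proportions, e.g. sibling recurrence of MI (hypothetical helper)."""
    if risk_index_unaffected <= 0:
        raise ValueError("reference risk must be positive")
    return risk_index_affected / risk_index_unaffected


# 29% of later siblings had MI when the first child had MI, vs. 6% otherwise.
print(f"{risk_ratio(0.29, 0.06):.1f}")  # 4.8
```

Because affected siblings from the same parents share the same CFTR genotype, a roughly five-fold difference in recurrence is consistent with the contribution of other familial factors argued for above.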

Distal intestinal obstruction syndrome (DIOS) is a common complication in adults with CF. Its hallmark is the accumulation of viscid fecal material with sticky mucoid intestinal content that adheres to the intestinal wall of the terminal ileum and caecum. As a result, patients with DIOS present with abdominal pain, distension, and vomiting due to complete or incomplete intestinal obstruction by a fecal mass in the ileocaecum (39). The pathogenesis of DIOS is unclear; poorly controlled fat absorption may contribute to DIOS by altering the viscosity of luminal contents (40). Abnormal bile acid absorption may also play a role in the obstructive process, given that bile acid secretion and acidification can occur through a CFTR-dependent mechanism. This offers a potential mechanism for DIOS and an explanation for its occurrence in the distal ileum (41).

Liver

Chronic fibrotic liver disease develops in 2-5% of CF patients and accounts for virtually all non-pulmonary causes of mortality in patients with CF (23). Liver disease in CF is characterized by focal biliary fibrosis. Studies characterizing liver disease in CF have failed to find any significant association between disease severity and specific classes of CFTR variants (42-44). Interestingly, patients with MI or DIOS seem to have an increased risk of developing liver disease compared to those unaffected by these complications (44). A limited number of studies have investigated the role of genetic modifiers in CF-related liver disease (45, 46); however, a relationship between liver disease and specific genetic variants has not been confirmed.

Male reproductive tract

Most men with CF are infertile because of obstructive azoospermia due to congenital bilateral absence of the vas deferens (CBAVD) (47, 48). These individuals lack the ductal system that carries sperm from the testis to the ejaculatory duct. It is hypothesized that CFTR variants cause defective chloride transport in the epididymis, leading to early regression of the mesonephric duct (49). Interestingly, CBAVD can manifest alone as a distinct genetic disorder without the other clinical features of classic CF (50). Reports of isolated CBAVD in men without a diagnosis of CF, in whom only one CFTR variant has been detected, suggest that this organ has a low tolerance for genetic variation in CFTR (50, 51).

Managing Complications of Cystic Fibrosis

CF is a progressive and life-limiting disease (52, 53). Advances in therapies that address malnutrition (54), infection (55-57), inflammation, mucus clearance (58, 59), and CFTR expression (60) have dramatically increased the longevity of patients with CF. Early screening for CF and its complications has also allowed prompt implementation of therapy and improved health (26, 61). For instance, pancreatic enzyme replacement and chest physiotherapy that promotes mucus clearance were vital early treatments that extended CF longevity into the adolescent years (54). Antibiotic therapy, such as aerosolized tobramycin or azithromycin targeting Staphylococcus aureus and Pseudomonas aeruginosa, was associated with improvement in clinically relevant end points and increased survival into the teen years (55-57, 62, 63). The discovery of lumacaftor and ivacaftor, drugs that correct CFTR misprocessing and potentiate CFTR function, respectively, has allowed targeting of CFTR protein dysfunction, the underlying cause of CF (64-71). Although neither ivacaftor nor lumacaftor alone has shown meaningful clinical efficacy in patients who are homozygous for the p.Phe508del CFTR variant, phase 2 and phase 3 studies suggested that the combination of lumacaftor and ivacaftor may be sufficient to improve clinical outcomes (71-73). Disease management combined with a CFTR corrector and potentiator, designed to address the underlying cause of CF by targeting CFTR, represents a treatment milestone for individuals with CF. Collectively, these advances have increased the life expectancy of CF patients from months to ~50 years of age over the last 75 years (1). Recent clinical trials (74, 75) have shown that triple-combination therapy in patients homozygous for the Phe508del CFTR variant improves percent predicted FEV1 compared to double-combination therapy, with no reported dose-limiting side effects or toxicity. These reports represent a significant breakthrough in CF therapeutics that is expected to improve survival in patients who carry the most common CFTR variant.

1.2 Cystic Fibrosis-Related Diabetes (CFRD)

Successful management of disease symptoms and malnutrition has dramatically improved CF life expectancy, well into adulthood (1). As individuals with CF live longer, age-dependent complications such as diabetes are becoming more prevalent. Although only 2% of children with CF have CFRD, the prevalence increases to 20% in adolescents and reaches 40-50% in adults with CF (76). In addition to CF complications, patients with CFRD experience further complications, including excessive protein catabolism (77), decline in pulmonary function (78), and autonomic neuropathy (79).

CFRD has overlapping features with type 1 and type 2 diabetes (T1D and T2D, respectively) but also displays cellular, histological, and clinical differences, thereby warranting a separate diagnostic classification (3). Reduced insulin production is observed in both T1D and CFRD; however, CFRD is not associated with the islet autoimmunity that causes T1D (80). Both CFRD and T2D show an increase in prevalence with age, a progressive defect in beta cell function, and an accumulation of amyloid polypeptide in pancreatic islets (81-83), and susceptibility genes for T2D also modify CFRD (84). However, insulin sensitivity is usually normal in CFRD, in contrast to T2D (26).


Etiology of CFRD

Though the etiology of CFRD is poorly understood and likely multifactorial, it has been proposed that insulin secretion diminishes as individuals with CF age, due to inflammation and subsequent destruction of pancreatic islet cells (85). Other studies report that CFTR plays a direct role in the release of insulin and glucagon, in the protection of beta cells from oxidative stress, and in controlling the resting potential of alpha and beta cells in rats (76, 86). CFTR has also been proposed to act as a glucose-sensing negative regulator of glucagon secretion in alpha cells in mice, and loss of this regulation has been postulated to contribute to glucose intolerance in CF and other forms of diabetes (87). However, several observations question whether CFTR plays a direct role in insulin release from beta cells. First, the specificity of compounds used to activate and inhibit CFTR function in beta cells has been debated (88). Second, the CFTR potentiator ivacaftor has inconsistent effects on glucose tolerance and insulin secretion in isolated pancreatic islets and in individuals with CF (89-93). Third, we and others have demonstrated that CFTR RNA is barely detectable in beta cells while being prominently transcribed in ductal cells (85, 88, 94, 95). Fourth, there is growing evidence that variation in the risk of CFRD correlates with ductal (i.e., exocrine) dysfunction. The SNP rs7512462 in intron 5 of SLC26A9 has been associated with variability in newborn-screening immunoreactive trypsinogen levels, a biomarker of prenatal exocrine pancreatic disease in individuals with CF (96), which, in turn, has been associated with decreased risk of CFRD (97).

Notably, exocrine pancreatic dysfunction has been observed in 10-30% of individuals with T1D and T2D (98, 99). Furthermore, loss of function of the pancreatic enzyme carboxyl ester lipase (CEL) due to deleterious genetic variants was associated with exocrine pancreatic disease and diabetes in two families (100). CEL is highly polymorphic; variation in the gene can cause maturity-onset diabetes of the young and is a significant risk factor for chronic pancreatitis (101). Together, these studies support the concept that aberrant exocrine ductal function is a major contributor to the development of diabetes in individuals with CF.

Screening and treatment for CFRD

CFRD is often clinically silent but is associated with weight loss, protein catabolism, decline in lung function and increased mortality (102). This warrants regular screening of individuals with CF. Hemoglobin A1C has been the standard screening test for type 2 diabetes; however, it is not sufficiently sensitive for the diagnosis of CFRD, and a large number of studies have reported a low positive predictive value of the A1C test in CF (26). Instead, screening for CFRD should be performed using the 2-hour, 75-g oral glucose tolerance test (OGTT). Although the OGTT has inherent variability, longitudinal studies have shown that a diagnosis of diabetes by OGTT correlates with clinically important CF outcomes (26, 103).

Treatment of diabetes improves nutritional status and pulmonary function (61, 84, 104).

Elucidating the mechanisms underlying this increasingly prevalent complication is essential to continued improvement in the survival of individuals with CF and may shed light on similar conditions in the general population. The goal of treating CFRD is to maintain blood glucose at normal or near-normal levels while sustaining a high-calorie diet and staying physically active. CFRD can be well managed with insulin therapy, which keeps blood glucose under control, can prevent health complications that may arise from diabetes and helps maintain a healthy weight. The combination of diabetes with CF can present challenges different from those of type 1 or type 2 diabetes. For instance, insulin requirements may need to be adjusted, particularly when corticosteroids are taken to manage the symptoms of CF (105).

1.3 Genetic modifiers of cystic fibrosis

Despite its Mendelian pattern of inheritance, CF demonstrates considerable variability in disease severity and clinical presentation, independent of CFTR genotype. While pancreatic insufficiency, determined by deleterious variants in CFTR, is a major risk factor for developing CFRD (8), there is a high degree of variability in age-at-onset of CFRD, even after accounting for the level of CFTR dysfunction (27). Over the years, numerous studies have examined pathways with high biological plausibility as modifiers of CF.

Twin studies

The lack of a cure for this life-limiting disease fuels the search for modifier genes, which may serve as new targets for therapeutic intervention. It is therefore critical to determine the magnitude of the effects of non-CFTR genetic variation and of the environment on phenotypic variability.

Twin and sibling studies have been used extensively to assess the relative contribution of genetic and environmental factors to a particular disease or trait (13). Although CF is relatively uncommon compared with diseases that have been studied using the twin method, there are sufficient numbers of affected twins and siblings to apply this method and draw some conclusions. By comparing intrapair variability in monozygotic (MZ) and dizygotic (DZ) twins, the degree to which heritable factors account for variance in the trait under study can be estimated, because DZ twins share on average 50% of their genetic variants whereas MZ twins share 100%. Furthermore, comparing intrapair variability between DZ twins and siblings can estimate the environmental contribution: DZ twins, like ordinary siblings, share on average 50% of their genetic variants, but siblings do not share the in utero environment and are born at different times (13). Studies of twins and siblings with CF indicate that variants in genes other than CFTR account for most of the risk for developing CFRD (106). The first twin-based study to identify the contribution of genetic modifiers to CF severity used a composite measure of lung function and body mass index (BMI) and identified a genetic effect on this trait (107). Twin studies have also been used to assess the contribution of genetic modifiers to CFRD: MZ twins display significantly higher rates of concordance than DZ twins and siblings when adjusted for age and sex, with a heritability estimate approaching 1.0 (106).
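The MZ/DZ comparison described above is often summarized with Falconer's classic approximation, h² ≈ 2(r_MZ − r_DZ), where r_MZ and r_DZ are intraclass correlations of a trait within MZ and DZ pairs. The sketch below is a minimal illustration that assumes an additive genetic model and a continuous trait; the studies cited here (106, 107) may have used different models (for example, liability-threshold models for concordance data), and the example pairs are hypothetical.

```python
"""Sketch: Falconer-style variance decomposition from twin-pair correlations."""
from statistics import mean


def intraclass_corr(pairs: list[tuple[float, float]]) -> float:
    """One-way ANOVA intraclass correlation for pairs (k = 2 members per pair)."""
    n = len(pairs)
    grand = mean(x for pair in pairs for x in pair)
    pair_means = [(a + b) / 2 for a, b in pairs]
    # Between-pair mean square: k * sum((pair mean - grand mean)^2) / (n - 1)
    msb = 2 * sum((m - grand) ** 2 for m in pair_means) / (n - 1)
    # Within-pair mean square: squared deviations from pair means / n(k - 1)
    msw = sum((a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, pair_means)) / n
    return (msb - msw) / (msb + msw)


def falconer(r_mz: float, r_dz: float) -> dict[str, float]:
    """Additive-model decomposition into genetic, shared and unique components."""
    return {
        "h2": 2 * (r_mz - r_dz),  # additive genetic fraction (heritability)
        "c2": 2 * r_dz - r_mz,    # shared (common) environment fraction
        "e2": 1 - r_mz,           # unique environment plus measurement error
    }


if __name__ == "__main__":
    # Hypothetical trait values for MZ and DZ pairs (illustration only).
    mz_pairs = [(10.0, 11.0), (20.0, 19.5), (15.0, 15.5), (30.0, 29.0)]
    dz_pairs = [(10.0, 14.0), (20.0, 16.0), (15.0, 19.0), (30.0, 24.0)]
    r_mz, r_dz = intraclass_corr(mz_pairs), intraclass_corr(dz_pairs)
    print(f"r_MZ={r_mz:.3f} r_DZ={r_dz:.3f}", falconer(r_mz, r_dz))
```

The three components always sum to 1; negative or >1 estimates in small samples signal that the simple additive model does not fit the data.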

Candidate gene studies

Prior to the genome-wide era, CF modifier gene studies employed biological insight to select potential candidates. Such studies tested for association between phenotypes and genetic variation within groups of genes involved in disease pathways. Candidate genes are most often selected based on a priori knowledge of the gene's functional impact on the trait or disease of interest. In the case of CF, these genes involve pathways relevant to ion transport, inflammation, tissue fibrosis and other disease processes. Regarding CFRD, Blackman et al. (108) found that both a variant in the T2D susceptibility gene TCF7L2 and a positive family history of T2D (which approximately triples the risk) were strongly associated with the risk of CFRD. This showed that pathways operating in a complex disease can overlap with those operating in a monogenic disease such as CF. In Blackman et al., 2013, variants in four loci (TCF7L2, CDKAL1, CDKN2A/B and IGF2BP2) were reported to be associated with both T2D and CFRD (p-value: 0.004) (84). These four loci (along with SLC26A9) were estimated to account for 8.3% of the phenotypic variance in CFRD onset and had a combined population-attributable risk of 68% (84).
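The population-attributable risk (PAR) quoted above can be illustrated with one standard approach: Levin's formula, PAR = p(RR − 1) / (1 + p(RR − 1)), for a risk factor with population prevalence p and relative risk RR, and, assuming independent factors acting multiplicatively, a combined PAR of 1 − ∏(1 − PARᵢ). Whether (84) combined its loci exactly this way is not stated here, so this is only a sketch, and the prevalences and relative risks below are hypothetical.

```python
"""Sketch: Levin's population-attributable risk and a combined PAR."""
from math import prod


def levin_par(prevalence: float, relative_risk: float) -> float:
    """PAR = p(RR - 1) / (1 + p(RR - 1)) for a single risk factor."""
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


def combined_par(pars: list[float]) -> float:
    """Combined PAR, assuming independent factors with multiplicative effects."""
    return 1.0 - prod(1.0 - p for p in pars)


if __name__ == "__main__":
    # Hypothetical (exposure prevalence, relative risk) pairs, NOT values from (84).
    factors = [(0.30, 1.4), (0.25, 1.3), (0.40, 1.2), (0.35, 1.2), (0.50, 1.5)]
    individual = [levin_par(p, rr) for p, rr in factors]
    print([round(x, 3) for x in individual], round(combined_par(individual), 3))
```

The combined value is smaller than the sum of the individual PARs because the same cases cannot be "attributed" twice.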

Genome-wide association studies

Genome-wide association studies (GWAS) have been widely used to discover novel modifier genes by localizing common variants associated with complex disorders. The HapMap project (https://www.genome.gov/10001688/international-hapmap-project) and the discovery of millions of single nucleotide polymorphisms (SNPs) scattered across the genome facilitated the implementation of GWAS.

To identify novel modifiers associated with the age-at-onset of CFRD and to further determine the degree of overlap between CFRD and type 2 diabetes, Blackman et al. performed a GWAS in 2013 in 3,059 individuals with CF (644 with CFRD). This study identified SNPs 5’ of and within noncoding regions of SLC26A9 on chromosome 1 (rs4077468; p-value: 2.8E-8) (84). No locus other than SLC26A9 contained SNPs with p-values surpassing a genome-wide suggestive threshold. A Phase 2 analysis performed in 2017 included an additional 2,681 individuals and replicated the same variants in the SLC26A9 locus (rs4077468; p-value: 1.2E-2). The combined GWAS analysis, which also included the individuals from the first CFRD GWAS, for a total of 5,740 individuals (1,339 with CFRD), confirmed significant association between the SLC26A9 locus (rs4077468; p-value: 2.25E-8) and CFRD. Restricting the analysis to only the Phase 2 individuals (individuals not in (84)) demonstrated independent support for the signal at this locus (rs4077468 p-value: 1.21E-2; Phase 1 p-value: 1.20E-7). Furthermore, restricting the analysis to individuals who were also not in the Phase 2 replication study (n=2,170) also supported association with CFRD (p-value: 1.1E-2).

From the combined analysis, variants exhibiting genome-wide significant association with age-at-onset of CFRD were also identified on chromosomes 2 and 10. The chromosome 10 locus contains associated variants within and around the TCF7L2 gene that exceed genome-wide significance (rs34872471; combined Phase 1 + 2 p-value: 2.80E-12; Phase 1 p-value: 2.58E-06; Phase 2 p-value: 9.69E-8). The CFRD-associated variants in TCF7L2 all appear to be in high LD. The same variants, with the same direction of effect, have been associated with increased risk of T2D in several populations and were previously reported to be associated with CFRD using candidate gene-based approaches and in GWAS Phase 1 (38, 84).

The chromosome 2 locus contains multiple associated variants surrounding the genes PTMA, PDE6D and COPS7B, with the most significantly associated variant located 12.5 kb 5’ of PTMA (rs838455; p-value: 2.98E-8). None of these variants are significant eQTLs for any gene within 100 kb in the pancreas, adipose tissue, brain or muscle (GTEx, v7, dbGaP Accession phs000424.v7.p2) (109). The new analysis (Aksit et al.) is currently under review.
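GWAS p-values such as those quoted in this section are usually read against the conventional genome-wide significance threshold of 5 × 10⁻⁸ (roughly a Bonferroni correction of α = 0.05 for about one million independent common-variant tests), a looser "suggestive" threshold (often 1 × 10⁻⁵), and, for replication, a nominal level such as 0.05. The thresholds actually used in these analyses are not stated here, so the constants below are assumptions; the sketch simply labels the p-values reported above.

```python
"""Sketch: labelling GWAS p-values against conventional thresholds (assumed)."""

GENOME_WIDE = 5e-8  # conventional genome-wide significance (~0.05 / 1e6 tests)
SUGGESTIVE = 1e-5   # commonly used suggestive threshold; assumption
NOMINAL = 0.05      # typical replication-level threshold; assumption


def classify(p: float) -> str:
    """Return a significance label for a single association p-value."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"p-value out of range: {p}")
    if p < GENOME_WIDE:
        return "genome-wide significant"
    if p < SUGGESTIVE:
        return "suggestive"
    if p < NOMINAL:
        return "nominal"
    return "not significant"


# p-values as reported in the text above.
REPORTED = {
    "rs4077468 (Phase 1, 2013)": 2.8e-8,
    "rs4077468 (Phase 2 only)": 1.21e-2,
    "rs4077468 (combined)": 2.25e-8,
    "rs34872471 TCF7L2 (combined)": 2.80e-12,
    "rs838455 near PTMA (combined)": 2.98e-8,
}

if __name__ == "__main__":
    for label, p in REPORTED.items():
        print(f"{label:32s} p={p:.2e} -> {classify(p)}")
```

A replication p-value that is only "nominal" is still meaningful: a replication cohort tests one pre-specified locus, not a million variants.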

1.4 SLC26A9

SLC26A9 is a member of the solute carrier 26 (SLC26) family of anion transporters and functions as a WNK kinase-regulated Cl⁻/HCO₃⁻ exchanger and Cl⁻ channel (and possibly as a Na⁺-anion cotransporter) (110-114). Interestingly, a recent cryo-EM study suggested that murine Slc26a9 can oligomerize to facilitate the rapid transport of chloride ions with negligible bicarbonate or sulfate permeability (115). SLC26 members share a conserved structural organization that includes an N-terminal sulphate permease transmembrane domain and a cytosolic C-terminal sulphate transporter and anti-sigma factor antagonist (STAS) domain (116). The STAS domain can interact with scaffolding proteins and other membrane ion channels such as CFTR. SLC26A9 has a diverse range of functions, including acid regulation in gastric parietal cells (117, 118), bicarbonate transport in the intestine (118), regulation of systemic arterial pressure and chloride excretion in the kidney medullary collecting duct (119), and secretion of bicarbonate-rich fluid in the exocrine pancreas (120). In the lung, SLC26A9 contributes to constitutive chloride secretion in the airway (121) and to mucociliary clearance (122). Consistent with this, Slc26a9-deficient mice showed early signs of airway mucus obstruction leading to epithelial necrosis and neutrophilic inflammation, resulting in death (123). Variants in SLC26A9 have previously been associated with atypical CF-like lung disease and risk for asthma (122, 124) and with modulation of the airway response to CFTR-directed therapeutics (125, 126).

SLC26A9 has been reported to be expressed in epithelial cells of the lung and stomach and in multiple other tissues, including salivary gland, heart, skin, kidney, thyroid and prostate (109, 110, 113, 127-129).

SLC26A9 is a compelling candidate as a modifier of CFRD. First, in vitro studies have suggested that SLC26A9 interacts with CFTR via its STAS domain and PDZ-binding motif, and that the constitutive basal chloride conductance generated by SLC26A9 is regulated by CFTR (113, 121, 130). Second, variants in SLC26A9 have been shown to modify prenatal exocrine pancreatic damage in cystic fibrosis (assessed by immunoreactive trypsinogen (IRT) levels at birth) (96), and these variants confer risk for CFRD and for meconium ileus by affecting exocrine pancreatic function (97, 131). Third, the CFRD-associated variants in and near SLC26A9 have also been associated with risk for neonatal intestinal obstruction (meconium ileus; MI) in CF (38), a complication that appears to be intimately linked to pancreatic exocrine insufficiency (132). Understanding SLC26A9 regulation and expression in pancreatic tissues, along with its interaction network, may serve as a foundation for further exploring SLC26A9-targeted therapy for CF and other epithelial obstructive diseases.


Chapter 2: Genetic and Regulatory Architecture and Molecular Characterization of human solute carrier family 26 member 9 (SLC26A9)

2.1 Introduction

Advances in CF-related therapies have led to an increase in the life expectancy of individuals with CF, and age-dependent complications such as diabetes are becoming more prevalent. Cystic fibrosis-related diabetes (CFRD) is the most common comorbidity in people with CF. While CFRD shares some features with type 1 and type 2 diabetes, it is a distinct clinical entity. CFRD is caused by insulin insufficiency (as in type 1 diabetes), but fluctuating levels of insulin resistance (as in type 2 diabetes) related to acute and chronic disease severity are also observed. Thus, elucidating the mechanisms underlying CFRD and its complications may be essential for continued improvement in the survival of individuals with CF.

Modifier genes can reveal pathways, operating beyond dysfunction of the causal gene, that can be targeted for therapeutic intervention and individualized treatment of CF.

Genetic modifiers are currently being explored as therapeutic targets for diseases such as Duchenne muscular dystrophy (133) and Huntington’s disease (134). Importantly, a CFTR-agnostic approach may be needed for diabetes, as CFTR modulators that effect dramatic improvements in lung function have not provided clear evidence of improvement in diabetic status (89, 90, 93, 135, 136). In this study, we analyzed the role of CFRD-risk variants that reside upstream of SLC26A9 to elucidate the mechanism by which these variants affect the age-at-onset of CFRD. Greater understanding of the pathologic mechanism(s) can inform molecularly based treatments to delay or avert the onset of diabetes.

This chapter details the molecular characterization of the SLC26A9 gene, evaluates SLC26A9 as a modifier gene and determines the role of the CFRD-risk variants in the age-at-onset of diabetes in CF. We show that although multiple alternatively spliced 5’ ends of SLC26A9 have been predicted, we found only one isoform, which begins with exon 1. Using single-cell RNA-sequencing, we show that SLC26A9 and CFTR are co-expressed in a minor fraction of ductal cells in the pancreas. We also provide a functional assessment of the variants 5’ of SLC26A9, showing that these variants act in concert with the basal promoter of SLC26A9 to alter its expression in pancreatic cells.

2.2 Results

CFRD-associated variants in SLC26A9 are common and noncoding. To evaluate the genetic architecture of SLC26A9, we sequenced 47.7 kb encompassing the SLC26A9 locus (9.9 kb 5’, 30.4 kb gene and 7.4 kb 3’) in 762 individuals with CF who are homozygous for the common CF-causing variant p.Phe508del (legacy name: F508del) (see Methods for details). The sequenced region completely encompassed the variants 5’ of and within SLC26A9 that are significantly associated with age-at-onset of diabetes (84). Using linear regression of martingale residuals of age-at-onset of CFRD (Figure 2.1A), we observed that the variants that achieved significance in the genome-wide study were associated with CFRD in this dataset (p<0.005) (Table 2.3). rs7512462 in intron 5 had the lowest p-value; however, a cluster of variants in intron 1 and 5’ of SLC26A9 was also significantly associated with age-at-onset of CFRD. All significantly CFRD-associated variants were in non-coding regions, either intronic or 5’ of the gene. No individual variant was associated with CFRD by more than an order of magnitude compared with the next most significant variant (Table 2.3; Figure 2.1A).
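Regressing martingale residuals on genotype is a common way to test a time-to-event phenotype such as age-at-onset of CFRD with ordinary linear regression. The sketch below is a minimal illustration under stated assumptions: a null model with no covariates, so that the residual for individual i is δᵢ − Ĥ(tᵢ) (δᵢ = 1 if CFRD was diagnosed, Ĥ the Nelson-Aalen cumulative hazard), and additive allele coding (0, 1 or 2 copies). The analysis described here may have used a Cox model with covariates, and the example data are hypothetical.

```python
"""Sketch: martingale residuals of age-at-onset, then OLS on allele dosage."""
import math


def martingale_residuals(times: list[float], events: list[int]) -> list[float]:
    """delta_i - H(t_i) under a covariate-free model (Nelson-Aalen estimate of H)."""
    event_times = sorted({t for t, d in zip(times, events) if d})
    increments = []  # (event time t_j, d_j / n_j)
    for tj in event_times:
        d_j = sum(1 for t, d in zip(times, events) if d and t == tj)
        n_j = sum(1 for t in times if t >= tj)  # individuals still at risk at t_j
        increments.append((tj, d_j / n_j))
    residuals = []
    for t, d in zip(times, events):
        cum_hazard = sum(h for tj, h in increments if tj <= t)
        residuals.append((1 if d else 0) - cum_hazard)
    return residuals


def ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Slope and its t-statistic for the simple linear regression y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2) / sxx)
    return slope, (slope / se if se > 0 else float("nan"))


if __name__ == "__main__":
    # Hypothetical ages (years), CFRD status (1 = diagnosed) and risk-allele dosage.
    ages = [12.0, 15.0, 18.0, 20.0, 22.0, 25.0, 30.0, 35.0]
    cfrd = [1, 1, 0, 1, 1, 0, 0, 0]
    dosage = [2, 2, 1, 1, 2, 0, 1, 0]
    slope, t_stat = ols(dosage, martingale_residuals(ages, cfrd))
    print(f"slope={slope:.3f} t={t_stat:.2f}")
```

A positive slope means each additional risk allele is associated with more CFRD events than expected by a given age, i.e., earlier onset.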

To determine whether any combination of physically close variants displays more robust association with age-at-onset of CFRD than individual variants, we conducted burden testing using the SKAT-O algorithm on 5-kb sliding windows (see Methods for details). For reference, 5 kb was sufficiently large to encompass all genome-wide significant variants in the 5’ region of SLC26A9. Numerous combinations of common variants (minor allele frequency; MAF>1%) in intron 1 and 5’ of SLC26A9 were significantly associated with age-at-onset of CFRD (p<2.7E-4), but none achieved greater significance than observed with individual common variants in this region (Figure 2.1B, top panel).

Notably, variant combinations that included rs7512462 in intron 5 generated less robust evidence of association than variant combinations in intron 1 and 5’ of SLC26A9. None of the rare variants or 5-kb windows containing only rare variants were significantly associated with age-at-onset of CFRD (Figure 2.1B, bottom panel). These results show that neither a single common or rare variant nor a combination of physically close variants solely accounts for the association with age-at-onset of CFRD in this region.

Consequently, we tested the association of naturally occurring combinations of variants (i.e., haplotypes) with age-at-onset of diabetes.
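The 5-kb sliding windows used for the SKAT-O burden tests above can be illustrated by the grouping step alone: variants are collected into overlapping windows of fixed width, and each distinct variant set is then tested jointly. The step size, the de-duplication of identical sets and the positions in the example are assumptions, and the SKAT-O statistic itself (implemented in the SKAT R package) is not reproduced here.

```python
"""Sketch: grouping variant positions into overlapping fixed-width windows."""


def sliding_windows(positions: list[int], width: int = 5000, step: int = 1000):
    """Yield (start, end, members) for windows [start, end) with >= 1 variant.

    Consecutive windows containing exactly the same variants are reported once,
    since testing an identical variant set twice adds no information.
    """
    pos = sorted(positions)
    if not pos:
        return
    start, previous = pos[0], None
    while start <= pos[-1]:
        members = [p for p in pos if start <= p < start + width]
        if members and members != previous:
            yield start, start + width, members
            previous = members
        start += step


if __name__ == "__main__":
    # Hypothetical variant positions (bp) relative to an arbitrary origin.
    for start, end, members in sliding_windows([100, 2000, 5500, 9000]):
        print(start, end, members)
```

Each yielded variant set would then be passed to a gene-based test such as SKAT-O together with a null model of the phenotype.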

CFRD-associated variants are in linkage disequilibrium and combine into haplotypes that associate with either high risk or low risk of CFRD. The analysis of single variants and small clusters of variants suggested that association with CFRD is likely due to multiple variants, possibly distributed over several regions of SLC26A9. To address this possibility, we derived the haplotypes formed by common variants (MAF>15%) for all 762 individuals that were sequenced. Two ancestrally maintained regions (i.e., linkage disequilibrium (LD) blocks) defined by a single recombination event between introns 5 and 8 were identified (Figure 2.2A bottom; note that SLC26A9 is on the (-) DNA strand). All CFRD-associated variants located in the region encompassing portions of intron 5 and extending 9.9 kb 5’ of the first exon of SLC26A9 were commonly inherited together (i.e., high LD; D’>0.80) (Figure 2.2A bottom). This LD block has two common haplotypes that were associated with CFRD: one associated with later onset of CFRD (Low Risk; LR; Minor Haplotype Frequency (MHF): 28.4%; p-value: 1.14E-03) and the other associated with earlier onset of CFRD (High Risk; HR; MHF: 24.1%; p-value: 4.34E-03) (Figure 2.2A top). The LR haplotype contains all the alleles of the variants that were associated with later onset of CFRD in the GWAS (84) (labeled with * in Figure 2.7), and the HR haplotype contains all alleles associated with earlier onset of CFRD. The association of the HR and LR haplotypes with CFRD is based on 594 individuals with phenotype information available, of whom 457 have at least one HR or LR haplotype and 137 do not. In addition to reporting the significance of the association of the LR and HR haplotypes with age-at-onset of CFRD, we illustrated the strength of the clinical association in the dataset by performing a log-rank test for difference in the proportion with CFRD among the 82 individuals carrying either two copies of the LR haplotype or two copies of the HR haplotype. In this subset, the cumulative incidence of CFRD differed significantly between individuals homozygous for the LR haplotype (LR/LR) and those homozygous for the HR haplotype (HR/HR) (log-rank p-value: 6.5E-3; Figure 2.2B). From a clinical perspective, by age 40, >80% of individuals with two copies of the HR haplotype (HR/HR) had developed CFRD compared with only ~25% of LR/LR individuals. A third, less common haplotype (High Risk 2) that shares 11 of the 12 CFRD-associated alleles with the HR haplotype was also associated with earlier age-at-onset of diabetes (Figure 2.7). These analyses indicate that the SLC26A9 variants operate in concert to modify age-at-onset of diabetes in CF.
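The LR/LR versus HR/HR comparison above relies on the log-rank test. A minimal two-group implementation is sketched below, assuming right-censored data (age at CFRD diagnosis, or age at last follow-up without CFRD); the statistic has one degree of freedom, so its p-value is erfc(√(χ²/2)). The software actually used for the analysis is not described here, and the example ages are hypothetical.

```python
"""Sketch: two-group log-rank test for right-censored time-to-event data."""
import math


def logrank(times_a, events_a, times_b, events_b):
    """Return (chi-square statistic with 1 df, p-value) comparing groups A and B.

    times: age at CFRD diagnosis or at last follow-up; events: 1 = CFRD, 0 = censored.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_minus_expected = 0.0
    var = 0.0
    for tj in sorted({t for t, e, _ in data if e}):
        at_risk = [(t, e, g) for t, e, g in data if t >= tj]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for t, e, _ in at_risk if e and t == tj)
        d_a = sum(1 for t, e, g in at_risk if e and t == tj and g == 0)
        observed_minus_expected += d_a - d * n_a / n  # O - E for group A
        if n > 1:  # hypergeometric variance of d_a at this event time
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / var if var > 0 else 0.0
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1 df)


if __name__ == "__main__":
    # Hypothetical ages (years) and CFRD status for two genotype groups.
    hr_ages, hr_cfrd = [14, 18, 21, 25, 28, 33, 38, 40], [1, 1, 1, 1, 1, 0, 1, 0]
    lr_ages, lr_cfrd = [16, 24, 30, 35, 38, 40, 40, 40], [1, 0, 1, 0, 0, 0, 0, 0]
    chi2, p = logrank(hr_ages, hr_cfrd, lr_ages, lr_cfrd)
    print(f"chi-square = {chi2:.2f}, p = {p:.3g}")
```

The test compares the whole incidence curves rather than a single age cutoff, which is why it suits statements such as ">80% versus ~25% by age 40".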

SLC26A9 mRNA transcripts from pancreas, lung and stomach contain non-coding exon 1. Exon 1 of SLC26A9 is predicted to be non-coding, contributing only to the 5’ untranslated region of mRNA transcripts. As non-coding 5’ exons can play a role in temporal or spatial gene expression (110), the location of the CFRD-associated variants upstream and downstream of exon 1 suggested that they may influence SLC26A9 expression. However, alternative splicing of the 5’ end of SLC26A9 leading to exclusion of exon 1 has been reported by the Human and Vertebrate Analysis and Annotation (HAVANA) project (http://www.sanger.ac.uk/research/projects/vertebrategenome/havana/). Furthermore, the transcription start site (TSS) of SLC26A9 has only been mapped in RNA from human lung. Therefore, we used 5’ Rapid Amplification of cDNA Ends (RACE) to determine whether SLC26A9 transcripts in additional tissues relevant to CF contain non-coding exon 1 and, if so, the exact location of the TSS. 5’ RACE products from three unrelated human lung samples (5, 16 and 8 transcripts, respectively), one human stomach sample (3 transcripts) and one human pancreas sample (2 transcripts) confirmed that SLC26A9 mRNA transcripts contain exon 1 and that the TSS maps in all three tissues to chr1:205,912,584 (hg19) (Figure 2.3). The major TSS is four nucleotides 3’ of a previously reported TSS (110). The sequencing traces were contiguous across 4 exon-exon junctions, confirming that amplification was from spliced mRNA rather than genomic DNA. Four 5’ RACE transcripts from one of the 3 lung samples had an alternative TSS beginning at position chr1:205,912,548 (hg19), which is 56 nucleotides upstream of the exon 1/exon 2 junction. It is not clear whether this is a minor TSS or the result of incomplete extension during 5’ RACE.

The establishment of the TSS confirmed that the first exon of the SLC26A9 gene is embedded within the variants that form the CFRD risk haplotypes.
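Because SLC26A9 lies on the minus strand, "upstream" and "downstream" run opposite to increasing hg19 coordinates, which is easy to get wrong when comparing the TSS positions above. The small helpers below (hypothetical names, 1-based coordinates assumed) make the convention explicit. With the two coordinates reported above, the alternative 5’ end lies 36 nt downstream (3’) of the major TSS, i.e., it produces a shorter 5’ end.

```python
"""Sketch: strand-aware distances between 1-based genomic coordinates."""


def strand_sign(strand: str) -> int:
    """+1 if coordinates increase 5'->3' on this strand, -1 otherwise."""
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    return 1 if strand == "+" else -1


def downstream_offset(ref: int, query: int, strand: str) -> int:
    """Signed distance from ref to query; positive means query is 3' of ref."""
    return strand_sign(strand) * (query - ref)


def upstream_position(ref: int, distance: int, strand: str) -> int:
    """Coordinate lying `distance` nucleotides upstream (5') of ref."""
    return ref - strand_sign(strand) * distance


MAJOR_TSS = 205_912_584  # chr1 (hg19), major TSS mapped by 5' RACE above
ALT_START = 205_912_548  # alternative 5' end observed in one lung sample

if __name__ == "__main__":
    print(downstream_offset(MAJOR_TSS, ALT_START, "-"))  # 36: alternative end is 3'
    # Coordinate implied for a TSS reported 4 nt further 5' (previously reported TSS).
    print(upstream_position(MAJOR_TSS, 4, "-"))
```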

Regulatory regions in the 5’ region and first intron of SLC26A9. The region 5’ of the major TSS contains a TATA box (TATAAAC) 29 bp upstream as well as a CCAAT box (GCCAATC) 77 bp upstream. In addition, the region encompassing exon 1 and extending approximately 550 bp upstream is highly conserved across species (Figure 2.4). These features are attributes of a basal promoter. To search for potential regulatory regions encompassing exon 1 of SLC26A9, we used the Open Regulatory Annotation database (ORegAnno) track on the UCSC genome browser, which contains curated regulatory annotations derived from experimental data (137) (Figure 2.4). General binding sequences (GBSs) that interact with transcription factors (TFs) GATA3, NFYA and NFYB were mapped to the immediate 5’ region (Figure 2.4, blue highlighted box). While the CFRD-associated variant rs1342063 falls within a TF cluster in this region, it does not affect any consensus TF binding motif according to the JASPAR core database (138). Also present 5’ of exon 1 are GBSs that interact with FOS, JUNB, JUND and FOSL2 (Figure 2.4, yellow highlighted box) as well as with MAFF, MAFK, TFAP2C, FOXA1, GATA3 and TFAP2A (Figure 2.4, green highlighted box). In intron 1, GBSs that interact with FOXA1, STAT1, SP1, USF2, TFAP2C and MAX have been mapped. The CFRD-associated variant rs7555534 in intron 1 falls within the GBS of TFAP2C and FOXA1, but it does not alter any consensus binding motif for these TFs according to the JASPAR core database (138). The location of ENCODE regulatory regions within 5 kb upstream of exon 1 and within the first intron suggests that the CFRD risk haplotypes may influence the expression of SLC26A9.
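Checking whether a variant such as rs1342063 or rs7555534 alters a consensus motif amounts to scoring the reference and alternative sequences against a position weight matrix (PWM). The sketch below uses a hypothetical count matrix loosely shaped like the AP-1 (FOS/JUN) consensus TGA(C/G)TCA, not an actual JASPAR matrix; the pseudocount, uniform background and example sequences are also assumptions. It scans both strands and reports the change in the best log-odds score.

```python
"""Sketch: does a SNP change the best PWM match? (hypothetical matrix, not JASPAR)"""
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical counts (20 sites per column) for a 7-bp motif resembling TGA(C/G)TCA.
TOY_COUNTS = {
    "A": [1, 1, 17, 1, 1, 1, 17],
    "C": [1, 1, 1, 9, 1, 17, 1],
    "G": [1, 17, 1, 9, 1, 1, 1],
    "T": [17, 1, 1, 1, 17, 1, 1],
}


def log_odds_matrix(counts, pseudocount=0.8, background=0.25):
    """Convert counts to log2(frequency / background) with a uniform background."""
    width = len(counts["A"])
    matrix = {b: [] for b in BASES}
    for i in range(width):
        total = sum(counts[b][i] for b in BASES) + pseudocount
        for b in BASES:
            freq = (counts[b][i] + pseudocount * background) / total
            matrix[b].append(math.log2(freq / background))
    return matrix


def best_score(matrix, seq):
    """Highest motif score over all windows on both strands of seq."""
    width = len(matrix["A"])
    seq = seq.upper()
    best = -math.inf
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
        for start in range(len(strand) - width + 1):
            window = strand[start:start + width]
            if set(window) <= set(BASES):  # skip windows containing N etc.
                best = max(best, sum(matrix[b][i] for i, b in enumerate(window)))
    return best


def snp_effect(matrix, ref_seq, alt_seq):
    """Change in best score caused by the alternative allele (negative = weaker)."""
    return best_score(matrix, alt_seq) - best_score(matrix, ref_seq)


if __name__ == "__main__":
    pwm = log_odds_matrix(TOY_COUNTS)
    ref = "GGTGACTCAGG"  # hypothetical flank carrying a consensus site
    alt = "GGTGAATCAGG"  # hypothetical C>A change in the motif core
    print(f"delta score = {snp_effect(pwm, ref, alt):.2f} bits")
```

A variant that leaves the best score essentially unchanged, as reported for rs1342063 and rs7555534 against JASPAR, is unlikely to disrupt that factor's consensus site directly.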

SLC26A9 and CFTR are co-expressed in a discrete population of pancreatic cells with ductal characteristics. To assess which pancreatic cell types express SLC26A9, and whether it is co-expressed with CFTR, we conducted single-cell RNA-sequencing (scRNA-seq) of pancreas obtained from a pediatric individual with early chronic pancreatitis in the absence of CF. Using the Seurat pipeline (139), we identified all major pancreatic cell types in addition to a cell type with characteristics of both ductal and acinar cells (ductal/acinar; Figure 2.5A). Of the 2,999 pancreatic single cells, CFTR was expressed in 531 cells (86.5% ductal and ductal/acinar), SLC26A9 was expressed in 15 cells, and 11 cells expressed both SLC26A9 and CFTR (100% ductal and ductal/acinar; hypergeometric test for co-expression p-value: 2.31E-07) (Figure 2.5B and C and Table 2.1). Re-analysis of scRNA-seq data from four studies containing a total of 31 pancreatic samples obtained from individuals of varying age and disease status (4 adults (95); 7 healthy adults, 1 T1D adult, 3 T2D adults and 2 healthy children (140); 4 adults (141); and 6 healthy and 4 T2D donors of varying BMI and age (94)) revealed that CFTR and SLC26A9 are co-expressed in a small subset of ductal pancreatic cells in each dataset (Table 2.2). Data from two studies (94, 95) also confirmed that the co-expressing cells were primarily ductal (Figures 2.5D and E). The fraction of ductal cells that express CFTR ranges from 35.7% to 96.9% across studies, whereas SLC26A9 expression is detected in a lower fraction of ductal cells, ranging from 1.4% to 17%. This variation likely reflects the different pancreatic tissue sampling approaches of the studies, as illustrated by their differences in cellular composition (Table 2.4).

While CFTR is expressed at relatively high levels in a fraction of ductal cells, both CFTR and SLC26A9 demonstrated variable expression among acinar and acinar/ductal cells in our sample (Figure 2.8). Importantly, the co-expression of CFTR and SLC26A9 is not merely a consequence of the broad presence of CFTR in ductal cells and the presence of SLC26A9 in the same cell type: the hypergeometric test showed that the co-occurrence of both transcripts in the same cells is highly significant given the distribution of the two genes across all pancreatic cell types. Of note, CFTR RNA expression is very low in beta cells (2/531 CFTR-expressing cells are beta cells), while CFTR is prominently transcribed in ductal cells (Table 2.2). This finding was consistent with our re-analysis of data from other studies (10/478 (94) and 0/389 (95) of CFTR-expressing cells are beta cells) (Figures 2.5D and E) and with re-analyses reported by other groups (85, 142).
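The co-expression test described above asks how likely it is to see at least the observed number of CFTR-positive cells among the SLC26A9-positive cells if SLC26A9 expression were assigned to cells at random. A minimal one-sided hypergeometric sketch using the counts reported for our sample is shown below; the value it produces need not reproduce the reported p-value exactly, since details such as cell filtering and the exact test formulation used in the original analysis are not described in this section.

```python
"""Sketch: one-sided hypergeometric test for co-expression of two genes."""
from math import comb


def hypergeom_sf(k: int, total: int, n_a: int, n_b: int) -> float:
    """P(X >= k), X = number of gene-B-expressing cells that also express gene A.

    total: all cells; n_a: cells expressing gene A; n_b: cells expressing gene B.
    """
    upper = min(n_a, n_b)
    numerator = sum(comb(n_a, i) * comb(total - n_a, n_b - i) for i in range(k, upper + 1))
    return numerator / comb(total, n_b)


if __name__ == "__main__":
    TOTAL, CFTR_POS, SLC26A9_POS, BOTH = 2999, 531, 15, 11  # counts from the text
    expected = SLC26A9_POS * CFTR_POS / TOTAL  # overlap expected by chance
    p = hypergeom_sf(BOTH, TOTAL, CFTR_POS, SLC26A9_POS)
    print(f"expected overlap = {expected:.2f} cells, observed = {BOTH}, p = {p:.2e}")
```

Only about 2.7 double-positive cells would be expected by chance, versus 11 observed, which is what the test formalizes.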

We next determined whether pancreatic cells that express SLC26A9 also express the TFs that have binding sites surrounding exon 1 (Figure 2.4). FOS, JUNB and JUND transcripts were broadly expressed and found in the majority of cells expressing SLC26A9 (Table 2.1). At the other end of the spectrum, FOXA1, TFAP2C, GATA3 and TFAP2A transcripts were not detected in cells expressing SLC26A9 in our pancreatic sample. Of the TFs expressed in fewer cells (32 to 296 of 2,999 cells), FOSL2, SP1 and MAFK were co-expressed in a small but significant fraction of SLC26A9-expressing cells (Table 2.1; above dotted line). Re-analysis of four published pancreatic scRNA-seq datasets (94, 95, 140, 141) revealed similar patterns, with FOS, JUNB and JUND being broadly expressed and found in the majority of SLC26A9-expressing ductal cells, while FOSL2 and SP1 were expressed in fewer cells but significantly co-expressed with SLC26A9 (Table 2.2). Furthermore, FOXA1, TFAP2C, GATA3 and TFAP2A were either absent or present in only a few cells that expressed SLC26A9. One notable difference from our scRNA-seq data was that MAFF was present in a relatively high fraction of SLC26A9-expressing cells in all four published datasets. From these results, we noted that the binding sequences of the four TFs consistently present in SLC26A9-expressing cells (FOS, JUNB, JUND and FOSL2) occur in a cluster 5’ of exon 1 (Figure 2.4).

To characterize the pancreatic ductal cells that express SLC26A9, we evaluated expression of apical and/or basolateral channels and bicarbonate transporters using our scRNA-seq data and the four publicly available datasets. We focused our search on genes encoding proteins that have been detected in pancreatic ductal cells by biochemical and electrophysiological methods (143-145). We also examined the expression of selected genes relevant to SLC26A9 and CFTR (e.g., the WNK family and FOXI1). Our analysis revealed that cells expressing CFTR and SLC26A9 also consistently express Aquaporin 1 (AQP1) and SLC4A4 (NBCe1-B) in our scRNA-seq study and the four publicly available datasets (Table 2.5). In most studies, SCNN1A (ENaC alpha subunit), SLC4A2 (AE2) and the regulatory kinases STK39 (SPAK) and WNK1 appear to be expressed in ductal cells that co-express SLC26A9 and CFTR. Notably absent (or very minimally expressed) are WNK4 and other SLC26 transporters (A3, A4 and A6). We did not find evidence of a cell population that expressed high levels of CFTR along with FOXI1 or vATPase genes (ATP6V1C2 and ATP6V0D2), similar to the ionocytes that have been reported in the lung (146, 147).

DNA fragments 5’ of SLC26A9 bearing the CFRD low-risk haplotype generate higher levels of reporter gene expression than those bearing the high-risk haplotype. To determine whether the region containing the diabetes-associated variants drives expression at different levels in the pancreas, four DNA fragments from the 5’ region of SLC26A9 (Figure 2.6A) containing either HR or LR variants were cloned into a firefly luciferase reporter construct (pGL4.10, Promega) in the native orientation (SLC26A9 resides on the negative strand). All SLC26A9 constructs were tested in PANC-1, a human pancreatic adenocarcinoma cell line of ductal origin (148) that also serves as a surrogate for pancreatic progenitor cells, since it can be induced to differentiate into insulin-producing cells (149). A Renilla luciferase construct (pRL-TK, Promega) was included to normalize for transfection efficiency. Analysis of RNA-seq data available in the Sequence Read Archive demonstrated that PANC-1 cells express the TFs FOS, JUNB, JUND and FOSL2, which have putative binding sites in the 5’ region of SLC26A9 (Table 2.1). Both SLC26A9 and CFTR are expressed in PANC-1 cells, albeit at low levels relative to the aforementioned TFs (Table 2.1), likely due to inactivation of their promoters, as observed in other immortalized cell lines (150).


The 1.172 kb DNA fragment immediately adjacent to exon 1 generated robust luciferase expression, consistent with our expectation that this region encompasses the basal promoter of SLC26A9. Although 2 CFRD-associated variants are in this region, no differences in expression levels were noted between DNA fragments bearing the LR (blue) or HR (red) alleles (Figure 2.6B). We next examined the 1.173 kb region immediately upstream of the 1.172 kb region, which contains 3 CFRD-associated variants. Constructs containing the 1.173 kb region alone displayed little to no luciferase expression, similar to negative controls (Figure 2.6B). However, when this region was fused to the 1.172 kb region to form a contiguous 2.3 kb fragment, 3 of the 4 LR 2.3 kb clones consistently differed in luciferase expression levels from clones with HR alleles (Figure 2.6B). Combined analysis of the normalized data from 3 independent transfections with 4 biological clones per haplotype (technical replicates: transfection wells N=71 for LR and N=72 for HR; Figure 2.9) revealed that the fragment containing variants associated with LR of diabetes had 12% higher mean activity than the HR fragment (p-value: 5.15E-09). Addition of 2.5 kb of sequence from the region immediately upstream of the 2.3 kb region formed a 4.8 kb fragment containing all 6 of the CFRD-associated variants residing 5’ of SLC26A9. Notably, both clones bearing the LR haplotype generated an overall 19% higher mean expression level than clones bearing the HR haplotype (p-value: 6.28E-07) (Figure 2.6B).
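Normalizing the firefly signal to the Renilla signal in each well and comparing haplotype means, as described above, can be sketched as follows. How the percentage difference was computed (assumed here: relative to the HR mean) and which test produced the reported p-values are not stated in this section; Welch's t statistic is shown only as one reasonable choice, and the luminescence readings are hypothetical.

```python
"""Sketch: dual-luciferase normalization and an LR vs HR comparison."""
import math
from statistics import mean, variance


def normalize(firefly: list[float], renilla: list[float]) -> list[float]:
    """Firefly/Renilla ratio per well, correcting for transfection efficiency."""
    return [f / r for f, r in zip(firefly, renilla)]


def percent_difference(test: list[float], reference: list[float]) -> float:
    """Difference in means expressed as a percentage of the reference mean."""
    return 100.0 * (mean(test) - mean(reference)) / mean(reference)


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic (unequal variances) for mean(a) - mean(b)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


if __name__ == "__main__":
    # Hypothetical raw readings from four wells per haplotype construct.
    lr = normalize([1180.0, 1250.0, 1120.0, 1210.0], [100.0, 104.0, 98.0, 101.0])
    hr = normalize([1020.0, 1090.0, 1000.0, 1060.0], [99.0, 102.0, 100.0, 103.0])
    print(f"LR vs HR: {percent_difference(lr, hr):+.1f}% (Welch t = {welch_t(lr, hr):.2f})")
```

With many transfection wells per haplotype, as in the experiments above, clone and transfection effects would ideally be modeled explicitly (e.g., with a mixed model) rather than pooling wells as independent replicates.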

We also tested the 2.3 kb LR and HR constructs in a second cell line, CFPAC-1, a pancreatic ductal adenocarcinoma cell line derived from an individual with CF (151, 152). CFPAC-1 cells express the TFs FOS, JUNB, JUND and FOSL2 and, as noted for PANC-1 cells, have very low levels of endogenous CFTR and SLC26A9 expression (Table 2.1). LR constructs demonstrated significantly higher expression than HR constructs in two independent transfections of 4 clones per construct (Figure 2.6C). Overall, LR exhibited 20% higher expression than HR in CFPAC-1 cells (p-value: 2.00E-03; N=48 for LR, N=47 for HR). From these results, we concluded that CFRD-associated variants in the 5’ region act in concert with the basal promoter to alter the expression of SLC26A9 in pancreatic cells.

eQTL analysis suggests that low-risk alleles of CFRD variants are associated with increased expression of SLC26A9. We downloaded publicly available data from the Genotype-Tissue Expression (GTEx, v7) portal to determine whether the CFRD risk variants are associated with SLC26A9 RNA expression in the pancreas. The CFRD-associated variants were indeed associated with SLC26A9 RNA expression in the pancreas: alleles on the LR haplotype were associated with increased expression of SLC26A9 in the pancreas but not with expression in the lung (Table 2.6), as recently reported (131).

2.3 Discussion

The goal of this study was to determine whether variants associated with age-at-onset of cystic fibrosis-related diabetes (CFRD) affect the expression of SLC26A9. We discovered that the alleles of the CFRD-risk variants are co-inherited as two common haplotypes, one associated with later onset of CFRD (Low Risk; LR) and the other associated with earlier onset of CFRD (High Risk; HR). A third, less common haplotype similar to HR was also associated with earlier onset of diabetes, and it is possible that other less common haplotypes bearing the majority of the CFRD-risk variants also correlate with CFRD but are not sufficiently frequent to allow detection of association in the 762 individuals studied here. There was no evidence that a coding or rare variant accounted for the CFRD association. Mapping of the major TSS indicates that the non-coding first exon of SLC26A9 lies in the middle of the cluster of CFRD-risk variants in the 5’ region of SLC26A9. These results suggested that the HR and LR CFRD haplotypes affect transcriptional regulation of SLC26A9. Characterization of the transcription factor binding sites 5’ of exon 1 and profiling of the transcriptome of the ductal pancreatic cells that express SLC26A9 indicated that the TFs FOS and JUN likely direct SLC26A9 expression. DNA fragments derived from the 5’ region of SLC26A9 were transcriptionally active in pancreatic ductal cell line models (PANC-1 and CFPAC-1) that express the FOS and JUN TFs. In reporter assays, constructs bearing the variants of the LR haplotype showed 12-20% higher levels of expression than those bearing the HR haplotype in both pancreatic ductal cell lines. The CFPAC-1 cell line demonstrated that the absence of CFTR (as seen in CF) did not alter the difference in expression between the LR and HR constructs. Collectively, our findings indicate that an increase in the expression of SLC26A9 in ductal cells of the pancreas delays the age-at-onset of diabetes in individuals with CF.

Locating the 5’ TSS was essential to establish whether the non-coding exon 1 is included in SLC26A9 transcripts. Mapping to the same nucleotide in multiple independent transcripts from three different tissues (pancreas, lung and stomach) confirmed that the full-length transcript had been obtained. As the previously reported TSS was also determined using RNA from lung, the discrepancy between the major TSS we found and the previously reported TSS, which is 4 base pairs longer (110), is likely due to technical reasons. Placement of the TSS upstream of exon 2 verifies that a non-coding first exon is included in the majority of SLC26A9 transcripts. Non-coding first exons are generally thought to fulfill regulatory roles in gene expression (e.g., by controlling translation efficiency and mRNA stability). This control may occur through the primary sequence of the 5’UTR as well as through the secondary structure of the RNA; the latter governs recognition by, and interaction with, a combination of factors important for translation and stability (153, 154). However, we did not discover any variants in the 5’UTR of SLC26A9 encoded by exon 1 that might be postulated to affect transcript stability, leading us to focus on upstream sequences.

To establish the appropriate cellular context for evaluating the putative regulatory regions and the effect of the CFRD-associated variants, we identified the pancreatic cell types that express SLC26A9. Single-cell RNA-sequencing (scRNA-seq) revealed that SLC26A9 is expressed in a minor fraction of ductal cells. Since our study was performed on a single pediatric chronic pancreatitis case, we confirmed and extended our findings using scRNA-seq data from four additional publicly available studies of 31 pancreas tissues from children and adults (94, 95, 140, 141). We have not been able to evaluate the expression profile of SLC26A9 during development, when exocrine pancreatic damage first occurs in individuals with CF. This is likely to be relevant, as observations in mice indicate that SLC26A9 expression is considerably higher in utero and decreases shortly after birth (125). Notably, CFTR is expressed in the majority of the pancreatic cells that express SLC26A9, and 100% of the cells expressing both genes are ductal or ductal/acinar.

Evidence of co-expression supports the concept that SLC26A9 and CFTR interact in vivo, as suggested by in vitro and cell-based studies (113, 121, 155). We further evaluated the expression levels of key genes in the WNK pathway whose proteins regulate SLC26A9 activity. Across the five scRNA-seq studies, there was evidence that WNK1 and STK39 (SPAK) are expressed in cells with SLC26A9, while WNK4 was almost absent.

How could variation in SLC26A9 expression in a small subset of ductal cells affect risk for diabetes in CF? First, transcript copy number has been shown to correlate only modestly with protein concentration (156). Thus, SLC26A9 protein levels in ductal cells might be considerably higher than RNA transcript counts imply.

Second, it is possible that the SLC26A9-expressing cells play a critical role in ductal ion transport, perhaps by being anatomically clustered in one portion of the pancreatic duct. This situation might be analogous to ionocytes in the lung, a rare cell type that expresses high levels of CFTR (146, 147). We did not, however, consistently observe expression of FOXI1 or of the vATPase genes (ATP6V1C2 and ATP6V0D2) that characterize ionocytes in the pancreatic cells co-expressing SLC26A9 and CFTR (Table 2.5). Third, the cells that express SLC26A9 may have other key roles in the pancreas, such as those reported for centroacinar cells (CACs), a specialized ductal cell type found near acini that expresses CFTR in fetal and adult pancreas (157, 158) and can replenish beta cells in zebrafish and mammals (157, 159, 160).

Though the etiology of CFRD is incompletely understood and likely multifactorial, it has been documented that insulin secretion diminishes as individuals with CF age, owing to inflammation and destruction of pancreatic islet cells (85). Other studies report that CFTR plays a direct role in the release of insulin and glucagon, in the protection of beta cells from oxidative stress, and in controlling the resting potential of alpha and beta cells in rats (86). CFTR has also been proposed as a glucose-sensing negative regulator of glucagon secretion in mouse alpha cells, and loss of this function has been postulated to contribute to glucose intolerance in CF and other forms of diabetes (87). However, several observations question whether CFTR plays a direct role in insulin release from beta cells (85, 89-91, 93-95, 142). Whether or not loss of CFTR function in beta cells contributes to the development of diabetes in CF, there is growing evidence that variation in the risk of CFRD correlates with ductal (i.e., exocrine) dysfunction. The CFRD-associated variant rs7512462 in intron 5 of SLC26A9 has also been associated with variability in newborn-screening immunoreactive trypsinogen levels, a biomarker of prenatal exocrine pancreatic disease in individuals with CF (96) which, in turn, has been associated with risk of CFRD (97). In the general population, exocrine pancreatic dysfunction has been observed in 10-30% of individuals with T1D and T2D (98, 161).

Furthermore, loss of function of the pancreatic enzyme carboxyl ester lipase due to deleterious genetic variants was associated with exocrine pancreatic disease and diabetes in two families (100). Together, these studies support the concept that aberrant exocrine ductal function can be a major contributor to reduced insulin secretion and the development of diabetes.

Based on crowd-sourced assessments provided in the Open Regulatory Annotation database, we suspected that the cluster of transcription factor binding sites for FOS, JUNB, JUND, and FOSL2 acts as an enhancer of SLC26A9 expression. This assertion was supported by the observation that the DNA fragment containing these putative binding sites drives expression only when fused to the native SLC26A9 promoter (2.3 kb fragment). Members of the FOS and JUN families are well known to dimerize via leucine zippers to create the AP-1 TF complex (162). AP-1 activity has been implicated in a variety of normal cellular functions, such as proliferation, differentiation and apoptosis, as well as in abnormal processes, in particular neoplastic transformation (163). Thus, expression of FOS and JUN is expected in a cancer cell line such as the pancreatic adenocarcinoma cell line (PANC-1) used in our studies. However, we posit that these TFs have a physiologic role in SLC26A9 expression, as RNAs encoding these TFs are consistently detected in the subset of ductal cells that express SLC26A9 (95, 140, 141).

Furthermore, we observed that the SLC26A9 2.3 kb construct was expressed in CFPAC-1 cells, a pancreatic adenocarcinoma cell line derived from an individual with CF (151, 152). FOS TFs have been implicated in diabetes and glucose homeostasis: computational analysis has suggested that FOS plays a role in the pathogenesis of T2D (164), and FOSL2 has been shown to be hypermethylated in individuals with T2D, leading to lower mRNA and protein expression levels (165). Finally, we observed that the TFs FOXA1, TFAP2A, TFAP2C and GATA3, which are known to be associated with type 2 diabetes risk (166), development and subsequent maintenance of beta cells (167) and insulin secretion (168), were absent in SLC26A9-expressing pancreatic cells in our study and in two of the four published studies (95, 140). The absence of these TFs likely explains why the 4.8 kb fragment, which contains the 2.5 kb region with binding sites for FOXA1, TFAP2A, TFAP2C and GATA3, displayed a similar level of reporter expression and maintained the allele-dependent expression observed with the 2.3 kb fragment. Together, these findings support a role for FOS and JUN in the transcriptional regulation of SLC26A9 in the postnatal pancreas.

Age-at-onset of diabetes in CF is a complex trait, modified by multiple genes, that develops over the lifetime of individuals with CF (84). As such, the 12-20% difference between the expression levels of the LR and HR haplotypes in PANC-1 and CFPAC-1 cells is consistent with the modest effect size expected of a gene operating in the context of a complex disorder (169). Indeed, more substantial changes in SLC26A9 expression cause distinct intestinal and pulmonary phenotypes in knock-out mouse models (118, 122).

Although we have not yet been able to determine the precise element(s) responsible for the difference observed between the LR and HR haplotypes, this information is not essential for moving forward with a strategy to treat CFRD. There is growing evidence that providing alternative pathways for chloride transport, via channels such as TMEM16A (170) or small-molecule ion channels (171), can restore anion secretion in CF tissues. Likewise, several studies suggest that SLC26A9, a chloride/bicarbonate transporter, may be able to compensate for the loss of CFTR function in individuals with CF (60, 115, 120). Consequently, our results indicate that strategies that increase the level and/or function of SLC26A9 may provide a viable approach to delaying the onset of diabetes in CF.

2.4 Materials and Methods

Diagnosis of CFRD

The CFRD phenotype was defined using clinical data extracted from medical charts or CFF Patient Registry data, including the clinician’s diagnosis of CFRD (or lack thereof), age at diagnosis and use of insulin. Minimum criteria were a clinical diagnosis of CFRD and at least 1 year of insulin use (106). Supporting laboratory data were used when available, including glucose tolerance and hemoglobin A1c (HbA1c); per CFRD guidelines, HbA1c is not used to rule out diabetes but can be used to rule it in. After review of chart data, fasting glucose was found to have low specificity for CFRD and was not used in the definition of CFRD.

Resequencing Cohort and Capture

Individuals homozygous for p.Phe508del (F508del) were recruited as part of the Johns Hopkins Twin and Sibling Study (TSS) and the University of North Carolina’s Genetic Modifiers Study (GMS). Cohort selection, sample consent and DNA preparation were previously described (172). A total of 790 p.Phe508del homozygotes (1,580 chromosomes) were sequenced; after discordant samples were removed, 762 p.Phe508del homozygotes were analyzed (172). A 47.7 kb region encompassing SLC26A9 and extending 9.9 kb 5’ and 7.4 kb 3’ of the gene was deep sequenced (capture design, library preparation, sequencing, variant calling and annotation, and data cleaning as reported by Vecchio-Pagán et al., 2016 (172)).

Linkage Disequilibrium, Haplotype Block Analysis and Association Testing

Each variant was tested for association with the martingale residual phenotype for cystic fibrosis-related diabetes (CFRD) using linear regression in the PLINK software package v1.07 (173). Data were first cleaned for individual and variant missingness, and identity-by-descent (IBD) estimates were used to remove related samples. Association of individual variants with CFRD was tested using the PLINK --assoc command. Log-transformed p-values were plotted with LocusZoom (174) (Figure 2.1A). Haplotype-based association testing was conducted in PLINK using the --chap and --each-vs-others commands. Only haplotypes with frequencies >2% containing variants with frequencies >15% were derived. LD blocks and haplotypes were confirmed and visualized using Haploview (Figure 2.2; Figure 2.7).
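To make the single-variant step concrete, the sketch below builds a martingale residual and regresses it on allele dosage. It assumes the residual comes from a Cox model with no covariates, in which case it reduces to the CFRD event indicator minus the Nelson-Aalen cumulative hazard at each person's age at diagnosis or last follow-up; the covariates and tie handling actually used for this cohort are not stated here, so treat that as an assumption. The function names `martingale_residuals` and `linear_assoc` are hypothetical, and the code is not PLINK's implementation of --assoc.

```python
import numpy as np


def martingale_residuals(age, event):
    """Martingale residuals from a Cox model with no covariates (assumption).

    Each residual is event_i - H(age_i), where H is the Nelson-Aalen
    estimate of the cumulative hazard of CFRD. `age` is the age at CFRD
    diagnosis (event = 1) or at last follow-up without CFRD (event = 0).
    """
    age = np.asarray(age, dtype=float)
    event = np.asarray(event, dtype=int)
    event_ages = np.unique(age[event == 1])
    # Nelson-Aalen increments: CFRD events at each age / number still at risk
    n_events = np.array([np.sum((age == t) & (event == 1)) for t in event_ages])
    n_at_risk = np.array([np.sum(age >= t) for t in event_ages])
    increments = n_events / n_at_risk
    cum_hazard = np.array([increments[event_ages <= a].sum() for a in age])
    return event - cum_hazard


def linear_assoc(dosage, phenotype):
    """OLS of phenotype on allele dosage (0/1/2): slope, its SE and t statistic."""
    x = np.asarray(dosage, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    xc = x - x.mean()
    sxx = np.sum(xc ** 2)
    beta = np.sum(xc * (y - y.mean())) / sxx
    intercept = y.mean() - beta * x.mean()
    resid = y - (intercept + beta * x)
    sigma2 = np.sum(resid ** 2) / (len(y) - 2)  # residual variance, n - 2 df
    se = np.sqrt(sigma2 / sxx)
    return beta, se, beta / se


if __name__ == "__main__":
    # Made-up toy data for illustration only
    m = martingale_residuals([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
    print(m, m.sum())  # residuals of a covariate-free Cox model sum to zero
    print(linear_assoc([0, 1, 2, 0, 1], m))
```

A positive slope then means that carrying more copies of the tested allele is associated with a larger residual, i.e., more (or earlier) CFRD than expected under the null model.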

Common and Rare Variant Burden Testing

To test for association between sets of variants and CFRD, a 5 kb sliding window was moved across the entire 47.7 kb capture region in 1,250 bp increments, and common and rare variants (MAF cut-off: 1% in our population) falling within each window were grouped for region-based testing with the SKAT-O algorithm (175). A total of 36 windows spanned the 47.7 kb captured region encompassing the SLC26A9 locus and its flanking sequence. SKAT-O was run in R using the “SSD” functions, which allow loading of a PLINK-formatted dataset, and the “optimal.adj” method, which implements the optimized test.
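The window layout can be sketched as follows. This assumes positions are expressed relative to the start of the capture (0 to 47,700 bp), that windows are half-open, that a final window overhanging the end is kept so the whole region is covered, and that variants with MAF below 1% count as rare; under those assumptions the scheme yields 36 windows, matching the number reported, and a Bonferroni threshold of 0.01/36 ≈ 2.78E-4. The helper names are hypothetical, and the SKAT-O test itself (run in R) is not reimplemented.

```python
import numpy as np

WINDOW_BP = 5_000   # window width
STEP_BP = 1_250     # step between window starts
MAF_CUTOFF = 0.01   # assumed: MAF < 1% is "rare", otherwise "common"


def sliding_windows(region_start, region_end, window=WINDOW_BP, step=STEP_BP):
    """Half-open windows [start, start + window), stepped until region_end is covered."""
    windows = []
    start = region_start
    while True:
        windows.append((start, start + window))
        if start + window >= region_end:
            break
        start += step
    return windows


def split_variants(positions, maf, window):
    """Indices of common and rare variants whose positions fall inside `window`."""
    positions = np.asarray(positions)
    maf = np.asarray(maf, dtype=float)
    lo, hi = window
    inside = (positions >= lo) & (positions < hi)
    common = np.flatnonzero(inside & (maf >= MAF_CUTOFF))
    rare = np.flatnonzero(inside & (maf < MAF_CUTOFF))
    return common, rare


windows = sliding_windows(0, 47_700)  # positions relative to the capture start
alpha = 0.01 / len(windows)           # Bonferroni-corrected significance threshold
print(len(windows), alpha)
```

Each `(common, rare)` index pair would define the two variant sets passed to SKAT-O for that window.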

Determination of transcription start site of SLC26A9

5’ Rapid Amplification of cDNA Ends (RACE) was performed using the SMARTer (“Switching Mechanism At RNA Termini”) RACE cDNA Amplification Kit (Clontech). RNA isolated from primary tissue (pancreas, lung and stomach) obtained from the Johns Hopkins Pathology Department was used to synthesize first-strand, 5’-RACE-Ready cDNA with SeqAmp™ DNA Polymerase in accordance with the manufacturer’s instructions. A gene-specific primer located in exon 5 of SLC26A9 (5’-GATTACGCCAAGCTTGGCAGGCTAGCGTAGCTGACACG-3’) was used for RACE PCR, and the products containing the 15 bp overlap (GATTACGCCAAGCTT) were cloned into the linearized pRACE vector with In-Fusion® HD Cloning. Plasmids were sent for Sanger sequencing with M13F and M13R primers.

Regulatory profile and conservation analysis of the region encompassing SLC26A9

To explore the regulatory profile of the region 5’ of and within intron 1 of SLC26A9, we used the Open Regulatory Annotation database (ORegAnno) track on the UCSC Genome Browser (https://www.genome.ucsc.edu), which contains curated regulatory annotations, including transcription factor binding sites, derived from experimental data (137). We also evaluated conservation in this region using the Vertebrate Multiz Alignment & Conservation track.

Single-cell RNA-sequencing of pancreatic cells

Sample collection: A human pancreas sample was collected following informed consent (see below).

Preparation of single cells: Following clinical islet collagenase digestion for total pancreatectomy with islet autotransplantation, leftover human pancreatic material was immediately dissociated into single cells by incubation with Accumax (Invitrogen). Cell clumps were removed with a MACS SmartStrainer (30 µm). Cells were then prepared according to the 10X Genomics® Cell Preparation Guide for Single Cell Protocols and resuspended in PBS with 0.04% BSA. Cell viability (~80%) and concentration were determined using the Cellometer Auto 2000 Cell Viability Counter.

Single-cell library preparation and sequencing: The single-cell cDNA library was prepared using droplet-based technology from 10X Genomics®. Approximately 17,400 single cells were immediately loaded into the 10X Genomics® Chromium Controller to prepare gel bead-in-emulsions (GEMs). Single-cell libraries were generated according to the 10X Genomics Chromium Single-Cell 3’ v2 protocol.

The library was sequenced on an Illumina NextSeq500 with 2 × 75-cycle paired-end sequencing.

Post single-cell RNA-seq processing: Reads were processed with the Cell Ranger Single Cell Software and pipeline v2.1.1 (http://software.10xgenomics.com/single-cell/overview/welcome). Raw base call files were demultiplexed into FASTQ files, and reads were aligned with STAR to the GRCh38 reference supplied by 10X Genomics®. Cell barcodes and unique molecular identifiers (UMIs) were counted and filtered for barcodes corresponding to a known barcode sequence and for unique RNA molecules. Cells were retained if their UMI counts exceeded 10% of the 99th percentile, a cut-off identified by Cell Ranger. The Seurat R package (version 2.3.3) (139) was used for further quality control. Genes expressed in fewer than 3 cells and cells with fewer than 200 detected genes were filtered out. Cells with greater than 50% mitochondrial expression, or with >3000 detected genes (possible doublets), were also filtered out. A total of 2,999 cells and 16,884 genes were retained. After filtering of the gene-barcode matrix, gene counts were log-normalized.

Dimensionality reduction, clustering and expression: Seurat was used to identify highly variable genes (default parameters, except for the dispersion selection method), perform principal component analysis (with n = 1000 highly variable genes), and determine significant principal components. The t-SNE projection was generated with the first 12 principal components. Graph-based clustering with K-nearest neighbors was used to predict cell populations. Cell-specific expression markers identified in previous single-cell studies (141) were then used to define and divide the predicted clusters: acinar (PRSS1, PNLIP), beta (INS), alpha (GCG), delta (SST), PP (PPY), ductal (KRT19, SPP1, ATP1B, SLC4A4), endothelial (ESAM) and mesenchyme (THY1, COL1A1).
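The gene and cell filters listed above can be mirrored on a plain genes × cells UMI matrix, as in the sketch below. It is not Seurat code: the order of filtering (genes first, then cells) is an assumption, mitochondrial genes are flagged by a caller-supplied boolean vector (e.g., genes whose names start with "MT-", also an assumption), and the normalization assumes Seurat's default LogNormalize scheme (each cell scaled to 10,000 counts, then natural log of 1 + x). Function names are hypothetical.

```python
import numpy as np


def qc_masks(counts, is_mito, min_cells=3, min_genes=200,
             max_genes=3000, max_mito_frac=0.5):
    """Return (genes_kept, cells_kept) boolean masks for a genes x cells UMI matrix."""
    counts = np.asarray(counts)
    is_mito = np.asarray(is_mito, dtype=bool)
    # Keep genes detected in at least `min_cells` cells
    genes_kept = (counts > 0).sum(axis=1) >= min_cells
    kept = counts[genes_kept]
    n_genes = (kept > 0).sum(axis=0)          # detected genes per cell
    total = kept.sum(axis=0).astype(float)    # UMIs per cell
    mito = kept[is_mito[genes_kept]].sum(axis=0)
    mito_frac = np.divide(mito, total, out=np.zeros_like(total), where=total > 0)
    # Drop low-complexity cells, likely doublets and high-mitochondrial cells
    cells_kept = ((n_genes >= min_genes) & (n_genes <= max_genes)
                  & (mito_frac <= max_mito_frac))
    return genes_kept, cells_kept


def log_normalize(counts, scale_factor=1e4):
    """log(1 + count / cell total * scale_factor), computed per cell (column)."""
    counts = np.asarray(counts, dtype=float)
    return np.log1p(counts / counts.sum(axis=0) * scale_factor)
```

With the thresholds above as defaults, a call such as `qc_masks(umi_matrix, mito_flags)` (hypothetical variable names) returns the masks used to subset the matrix before `log_normalize`.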

Reanalysis of published single-cell RNA-Sequencing of the Pancreas

Single-cell RNA-sequencing data from the pancreas studies referenced in Table 2 were reanalyzed. Data were downloaded from the Gene Expression Omnibus repository (accession numbers GSE84133, GSE83139, GSE85241) and from ArrayExpress (EBI) (E-MTAB-5061), and reanalyzed in R.

Determining co-expression: Significance of co-expression was determined with a hypergeometric test using the R phyper function, phyper(q, m, n, k), where q is the number of cells co-expressing SLC26A9 and gene B, m is the number of cells expressing SLC26A9, n is the number of cells that do not express SLC26A9, and k is the number of cells expressing gene B (e.g., CFTR). A gene was considered expressed in a cell if its count was >1 for data downloaded from the Gene Expression Omnibus repository, or if its log-normalized count was >0.5 for our data.
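The co-expression test can be sketched directly from its definition. This version assumes an enrichment (upper-tail) test is intended, i.e., P(X ≥ q) for a hypergeometric X with the parameters defined above; in R that corresponds to phyper(q - 1, m, n, k, lower.tail = FALSE), whereas phyper's default returns the lower tail P(X ≤ q). The function name and the boolean-vector inputs are hypothetical; expression calls would come from the thresholds given above.

```python
from math import comb

import numpy as np


def coexpression_pvalue(expr_a, expr_b):
    """Upper-tail hypergeometric P(X >= observed number of co-expressing cells).

    expr_a, expr_b: boolean vectors over the same cells, e.g. SLC26A9 and CFTR
    expression calls (count > 1 for GEO data, log-normalized > 0.5 for our data).
    Returns None when no cell expresses both genes (no test performed).
    """
    a = np.asarray(expr_a, dtype=bool)
    b = np.asarray(expr_b, dtype=bool)
    q = int(np.sum(a & b))   # cells co-expressing both genes
    m = int(a.sum())         # cells expressing gene A
    n = int(a.size - m)      # cells not expressing gene A
    k = int(b.sum())         # cells expressing gene B
    if q == 0:
        return None
    # Sum the exact hypergeometric probabilities from q up to the maximum possible
    tail = sum(comb(m, x) * comb(n, k - x) for x in range(q, min(m, k) + 1))
    return tail / comb(m + n, k)
```

Exact integer arithmetic via `math.comb` keeps the tail sum precise for the cell numbers involved here (a few thousand cells per dataset).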

Reanalysis of publicly available RNA-Sequencing data of PANC-1 and CFPAC-1 cells

Publicly available RNA-sequencing data from the Sequence Read Archive were used (accession IDs PANC-1: SRR5171012, SRR5171013, SRR1172002, SRR3615309, SRR5952226; CFPAC-1: SRR1736491). Raw reads were aligned to the reference genome (hg19) using Bowtie2 (176), and splice junctions were identified with TopHat2 (v2.0.13) (177) from the Tuxedo software suite. Cuffquant and Cuffdiff (Cufflinks v2.2.1) (178) were then used to assemble transcripts, estimate their abundances, and test for differential expression among samples.

Plasmid construction

Reporter constructs were generated containing regions of different lengths 5’ of SLC26A9 (4.8 kb, 2.3 kb, 1.172 kb and 1.173 kb) corresponding to either the high-risk (HR) or low-risk (LR) haplotype (Figure 2.9). Inserts were amplified from genomic DNA with specific primers using KOD Hot Start DNA polymerase. After overhangs were added, inserts were fused upstream of the firefly luciferase reporter in the pGL4.10 vector (Promega) using the In-Fusion Cloning Kit (Takara) according to the manufacturer’s instructions. After transformation into Stellar Competent Cells (provided with the In-Fusion Cloning kit), plasmids purified with both the Spinsmart™ Plasmid Miniprep DNA Purification Kit (Denville) and the Plasmid Maxi Kit (Qiagen) were verified by sequencing. As needed, site-directed mutagenesis (Site-Directed Mutagenesis Kit, NEB) was used to modify key variants, or to correct unwanted changes introduced during subcloning, so that each insert matched the sequence of the haplotype of interest.

Mammalian cell culture, transfection and Dual Luciferase-Renilla Reporter Assay

PANC-1 cells were maintained in Dulbecco’s modified Eagle’s medium (DMEM, Invitrogen) supplemented with 10% v/v fetal bovine serum (FBS) and 1% penicillin-streptomycin (PS). CFPAC-1 cells were maintained in Iscove’s modified Dulbecco’s medium (IMDM, ThermoFisher Scientific), also supplemented with 10% v/v FBS and 1% PS. When PANC-1 or CFPAC-1 cells approached 70-80% confluency, they were transfected with LR or HR reporter plasmids (Figure 2.9) using Lipofectamine 2000 (Invitrogen) according to the manufacturer’s instructions and then placed in antibiotic-free DMEM/FBS or IMDM/FBS for 48 hours. As transfection and expression efficiency can vary with plasmid structure (e.g., coiled or supercoiled), we used up to 4 independently derived plasmid clones for each SLC26A9 DNA fragment tested. DNA concentration was quantified with a spectrophotometer. The amount of each plasmid was calculated from its concentration adjusted for size (molar mass), so that every transfection contained an equal number of plasmid copies per technical replicate/well in each independent transfection (1.7E-13 mol, or ~1.0E11 copies). To address biological variation, each construct was transfected into 6 wells in each of at least 2-3 independent transfections. To normalize for transfection efficiency, the same amount of the Renilla luciferase-encoding plasmid pRL-TK (3.4E-15 mol, or approximately 2.0E9 copies) was added to all transfection wells (179, 180). Constitutive expression of Renilla luciferase served as the internal control to which expression of the experimental firefly luciferase reporter was normalized.

Whole-cell lysates were harvested after a 15-minute incubation with 1x passive lysis buffer (Promega). All samples were centrifuged at maximum speed for 15 minutes in a tabletop centrifuge at 4 °C and plated onto a 96-well plate in triplicate (20 µL lysate per well), then analyzed using the Dual-Luciferase® Reporter Assay System (Promega) on a BioTek plate reader (BioTek Instruments, Inc.). The luminometer was set to inject 50 µL of Luciferase Assay Reagent II (LAR II) and 50 µL of Stop & Glo® Reagent sequentially into each sample for independent measurement of fLUC and rLUC activities. Each injection was followed by a 3-second slow shake and a 2-second delay before the integration period. Luminescence for fLUC and rLUC, and the fLUC/rLUC ratio, were recorded in an Excel file.
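The copy numbers quoted above follow from Avogadro's number (1.7E-13 mol × 6.022E23 mol⁻¹ ≈ 1.0E11 copies; 3.4E-15 mol ≈ 2.0E9 copies). The sketch below shows that arithmetic and one common way to convert a fixed number of moles into a DNA mass and pipetting volume for plasmids of different size. The 650 g/mol average mass per base pair, the 6,500 bp plasmid and the 100 ng/µL stock in the example are assumptions for illustration, not values from this study.

```python
AVOGADRO = 6.02214076e23      # molecules per mol
BP_MASS_G_PER_MOL = 650.0     # assumed average molar mass of one DNA base pair


def copies_from_mol(mol):
    """Number of plasmid molecules in `mol` moles."""
    return mol * AVOGADRO


def ng_for_mol(mol, plasmid_bp, bp_mass=BP_MASS_G_PER_MOL):
    """Nanograms of a `plasmid_bp`-long plasmid that contain `mol` moles."""
    grams = mol * plasmid_bp * bp_mass
    return grams * 1e9


def ul_for_ng(ng, conc_ng_per_ul):
    """Volume (µL) of a stock at `conc_ng_per_ul` that delivers `ng` nanograms."""
    return ng / conc_ng_per_ul


# Hypothetical example: a 6,500 bp reporter plasmid from a 100 ng/µL stock
target_mol = 1.7e-13
ng = ng_for_mol(target_mol, 6_500)
print(f"{copies_from_mol(target_mol):.2e} copies, {ng:.1f} ng, {ul_for_ng(ng, 100):.2f} µL")
```

Because larger inserts make heavier plasmids, matching moles rather than nanograms keeps the number of reporter copies per well constant across the 1.172 kb to 4.8 kb constructs.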

Variant association with gene expression

Pancreas and lung cis-eQTL association statistics for the CFRD-associated variants reported in Blackman et al., 2013 (84) were downloaded from GTEx v7 (Table 2.6). The sign of the GTEx beta values was adjusted so that a positive beta value refers to the high-risk allele instead of the reference allele.
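A minimal sketch of that sign change, assuming a biallelic variant whose reported slope describes the per-copy effect of one allele (called `effect_allele` here): because the slope is linear in allele dosage, re-expressing it per copy of the other allele simply negates it. The function and argument names are hypothetical and are not part of the GTEx file format.

```python
def beta_for_allele(beta, effect_allele, target_allele, other_allele):
    """Re-express a per-allele eQTL slope relative to `target_allele`.

    `beta` is the reported per-copy effect of `effect_allele`; the variant is
    assumed biallelic, with `other_allele` as its second allele.
    """
    if target_allele == effect_allele:
        return beta
    if target_allele == other_allele:
        return -beta  # one more copy of the other allele = one fewer copy of the effect allele
    raise ValueError("target allele is not one of the two tested alleles")


# e.g. re-orient a slope reported for allele "A" so it refers to a high-risk "G" allele
print(beta_for_allele(0.3, effect_allele="A", target_allele="G", other_allele="G"))
```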

Statistics

SKAT-O tests: To test for association between sets of variants and CFRD, the optimized sequence kernel association test (SKAT-O) was used. Statistical significance after correction for the number of windows used in the analysis was defined as a p-value less than 0.01/36 = 2.7E-4. See Common and Rare Variant Burden Testing above for details.

Determining significance of co-expression of transcripts in scRNA-seq data: The hypergeometric test was used to assess the statistical significance of two genes being co-expressed in the same cell, given the total number of cells in which each gene is expressed. Significance of co-expression was calculated only if at least 1 cell expressed both genes. P-values less than 0.05 were considered significant. See Reanalysis of published single-cell RNA-Sequencing of the Pancreas above for details.

Dual luciferase-renilla assay (DLRA): Each transfection well was read three times, and the three readings were averaged. Differences between sample means were tested with Student’s t-test, with α = 0.05.
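The analysis can be sketched with the standard library. The sketch assumes that each of the three readings of a well is first converted to an fLUC/rLUC ratio and the ratios are then averaged (the text does not specify whether ratios or raw readings were averaged), that each well contributes one value to its group, and that the comparison is a two-sample Student's t-test with pooled variance. The function names are hypothetical; for a p-value, `scipy.stats.ttest_ind(lr, hr)` (which uses the pooled-variance test by default) returns the same statistic together with a two-sided p-value.

```python
import math
from statistics import mean, variance


def well_ratio(fluc_reads, rluc_reads):
    """Mean of the per-reading fLUC/rLUC ratios for one transfection well."""
    return mean(f / r for f, r in zip(fluc_reads, rluc_reads))


def students_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, na + nb - 2


def percent_difference(lr, hr):
    """How much higher mean LR expression is than mean HR expression, in percent."""
    return (mean(lr) / mean(hr) - 1) * 100


# Made-up normalized well values, for illustration only
lr = [1.1, 1.2, 1.3]
hr = [1.0, 1.0, 1.0]
print(students_t(lr, hr), percent_difference(lr, hr))
```

Reporting the percent difference alongside the t statistic matches how the LR versus HR results are summarized (e.g., 12% higher for the 2.3 kb construct).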

Study approval

For samples obtained from the Johns Hopkins Pathology Department: the IRB study number is IRB00157289 (date of acknowledgement: August 16, 2018; date of expiration: August 16, 2021). The RNA-seq was performed under IRB# PRO16030614 (demographic information) and PRO13020493 (genetic evaluation of pancreatic surgical waste).

Figure 2.1 Association of SNPs and 5 kb regions across a 47.7 kb region encompassing SLC26A9 with CFRD in 762 p.Phe508del (F508del) homozygous individuals. (A) Manhattan plot for association with CFRD (points, left y-axis) and recombination rate plotted by genomic location (blue line, right y-axis). (B) SKAT-O test for association of sets of common (top) and rare (bottom) variants with CFRD. All variants within each 5 kb window, moved across the entire region in increments of 1,250 bp, were tested for a combined association with CFRD via the SKAT-O test. The x-axis denotes chr1 hg19 bp position; the y-axis is −log10 of the regional p-value. Association values are plotted at the center of each 5 kb window. Common and rare variants were assigned based on a MAF cut-off of 1%. The red line indicates the significance threshold, Bonferroni corrected for the number of sliding windows (p = 0.01/36 = 2.7E-4). Below is a genomic map of SLC26A9; no RefSeq genes other than SLC26A9 are present in this region.

[Figure 2.1 graphic. (A) −log10(p-value) and recombination rate (cM/Mb) versus position on chr1 (Mb), 205.87-205.93, with SNPs colored by r² relative to chr1:205899595. (B) −log10(p-value) of SKAT-O tests for common and rare variants versus position on chr1 (Mb), above a map of SLC26A9 drawn 3’ to 5’.]

Figure 2.2. Two common haplotypes that associate with age-at-onset of CFRD. (A) Top: SLC26A9 SNP haplotypes with MAF > 15% and MHF > 20%. The locations of variants relative to SLC26A9 and the luciferase constructs are shown above the haplotypes (note: SLC26A9 is on the (-) DNA strand, so locations are shown 3’ to 5’ from left to right; not drawn to scale). † indicates TGGGGCCTCGGGTATCTCA. Haplotype frequencies, p-values and beta values are shown to the left of each haplotype. rsIDs are shown for the CFRD-associated variants reported by Blackman et al., 2013 (84). SNPs highlighted in blue indicate alleles composing the most common ancestral haplotype; variants highlighted in red indicate alleles that differ from those in the common haplotype. Bottom: LD heat map of variants with MAF > 15% in the p.Phe508del homozygous study sample, created with Haploview. Within the heat map, black boxes indicate an r² value of 1 (complete LD), while white boxes indicate an r² of 0 (linkage equilibrium). Proposed LD blocks are outlined (triangles), defined by a recombination event between introns 5 and 8. (B) Cumulative incidence of CFRD with age among individuals with low-risk (LR) or high-risk (HR) haplotypes. LR/LR homozygotes (n = 46) and HR/HR homozygotes (n = 36) are plotted. A log-rank test between LR and HR individuals showed a significant difference in the proportion with CFRD (p = 6.5E-3).

[Figure 2.2 graphic. (A) Haplotypes of CFRD-associated SNPs across SLC26A9 (including rs7512462 in intron 5), grouped into LD blocks, with the Haploview LD heat map. (B) Proportion with CFRD versus age for HR/HR (N = 36) and LR/LR (N = 46) individuals.]

Figure 2.3. The transcription start site of SLC26A9 lies in exon 1. (A) Schematic showing the first five exons of SLC26A9. SLC26A9 is transcribed from the minus strand, and the schematic is drawn in the native orientation. The sizes of the exons and introns are labeled (bp); hash marks denote where the figure is not drawn to scale. (B) The transcription start site of SLC26A9 is in exon 1. 5’ RACE was performed using a gene-specific primer (GSP) in exon 5 of SLC26A9; the portion of the GSP in red is the overhang required for In-Fusion cloning. The transcription start site is labeled in exon 1, and the translation start site is labeled with the Kozak consensus sequence in exon 2. (C) Sanger sequencing trace of the 5’ RACE product of the SLC26A9 mRNA transcript. The RACE plasmid product was sequence-confirmed with M13F/R. Sanger sequencing of one primary lung sample and one primary stomach sample confirms that the transcription start site (TSS) of the SLC26A9 mRNA transcript is in exon 1; further upstream of the TSS is the RACE adapter sequence. The sequencing trace crosses exon-exon junctions (shown here between exons 1 and 2 by the vertical black line), confirming that RACE used mRNA as the template.

[Figure 2.3 graphic: panels A-C, with the transcription start site and translation start site labeled.]

Figure 2.4 Regulatory annotations 5’ of and within SLC26A9 from the UCSC Genome Browser. The key CFRD-risk SNPs (Blackman et al., 2013 (84)) 5’ of and within SLC26A9 are annotated at the top. The blue region highlights the 1.172 kb region 5’ of SLC26A9. The yellow region highlights the 1.173 kb region that, together with the blue region, makes up the 2.3 kb region 5’ of SLC26A9. The green highlight denotes the 2.5 kb region, which encompasses the rest of the 4.8 kb region upstream of SLC26A9. The Open Regulatory Annotation database (ORegAnno) track displays transcription factor binding sites (https://genome.ucsc.edu/cgi-bin/hgTrackUi?g=oreganno&db=hg19). The bottom track displays the Vertebrate Multiz Alignment & Conservation.


Figure 2.5 Co-expression of SLC26A9 and CFTR in pancreatic cells, from single-cell RNA-sequencing data. (A) t-SNE plot of single-cell RNA-sequencing data; each point represents a cell, colored by cell type. (B) t-SNE plot of single-cell RNA-sequencing of the pancreas, with cells expressing CFTR and/or SLC26A9 at a log-normalized expression ≥0.50 colored. (C) Venn diagram representing the number of cells that express CFTR, SLC26A9, or both, and the percentage of each cell type among them. Numbers in parentheses are the number of cells per compartment. *Of the 520 CFTR-expressing cells, 3.1% (16 cells) are categorized as ‘OTHER’ and include 5 alpha cells, 2 beta cells, 3 mesenchymal cells and 6 unspecified other cells. (D, E) Venn diagrams, as in panel C, summarizing two of the four publicly available scRNA-seq datasets that we reanalyzed that provided cell type information: Baron et al., 2016 and Segerstolpe et al., 2016.

54

[Figure 2.5 image, panels (A)-(E): t-SNE plots and Venn diagrams with cell-type breakdowns (ductal, ductal/acinar, acinar, alpha, beta, delta, gamma, epsilon, endothelial, PSC, quiescent stellate, mesenchymal, unclassified endocrine, other).]
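The Venn-diagram counts in Figure 2.5 rest on a simple per-cell rule: a gene counts as expressed when its log-normalized value is at least 0.50. The Python sketch below shows that classification into the four compartments. It assumes Seurat-style log-normalization (counts divided by the cell's total, multiplied by a scale factor of 10,000, then log1p), which the caption does not specify, and the cells are hypothetical toy counts, not data from this study.

```python
import math
from collections import Counter

THRESHOLD = 0.50       # log-normalized cutoff stated in the Figure 2.5 caption
SCALE_FACTOR = 10_000  # assumption: Seurat-style LogNormalize scale factor

def log_normalize(count: int, total_counts: int) -> float:
    """log1p(count / total * scale): the assumed LogNormalize convention."""
    return math.log1p(count / total_counts * SCALE_FACTOR)

def classify(cell: dict) -> str:
    """Assign a cell to one Venn compartment from its CFTR and SLC26A9 calls."""
    total = cell["total_counts"]
    cftr = log_normalize(cell["CFTR"], total) >= THRESHOLD
    slc = log_normalize(cell["SLC26A9"], total) >= THRESHOLD
    if cftr and slc:
        return "both"
    if cftr:
        return "CFTR only"
    if slc:
        return "SLC26A9 only"
    return "neither"

# Hypothetical toy cells (raw counts), not values from this study.
cells = [
    {"CFTR": 3, "SLC26A9": 1, "total_counts": 5000},
    {"CFTR": 2, "SLC26A9": 0, "total_counts": 4000},
    {"CFTR": 0, "SLC26A9": 0, "total_counts": 6000},
]
print(Counter(classify(c) for c in cells))
```

Tallying these labels per annotated cell type gives the compartment counts and percentages shown in the diagrams.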

Figure 2.6. Dissection of regulatory regions 5’ of SLC26A9. (A) Diagram depicting the location and length of the regions studied relative to SLC26A9. (B) Transfections in PANC-1 cells. The 1.172 kb region containing 2 CFRD-risk SNPs generated robust luciferase expression, consistent with a promoter; however, levels did not differ between the LR and HR plasmids. The 1.173 kb region containing 3 CFRD-risk SNPs generated little to no activity, similar to negative controls. The 2.3 kb region, composed of the 1.172 kb and 1.173 kb regions, drove luciferase expression that was 12% higher for the LR than for the HR haplotype (p = 5.02E-09; indicated by ***). The 4.8 kb region encompassing all 6 LR alleles generated 19% higher expression than HR (p = 6.43E-07; indicated by ***). (C) Transfections in CFPAC-1 cells showed the same trend: the 2.3 kb region drove luciferase expression that was 20% higher for LR than for HR (p = 2.47E-03).

For plots in (B) and (C): results are shown for two or three separate transfections of PANC-1 and CFPAC-1 cells with two to four independent plasmid constructs (A-D), each containing alleles corresponding to the LR (blue) or HR (red) haplotype in their native orientation. In each plot, data points to the left of the vertical line are results from each independent clone; to the right, data points from all clones are combined into two groups. Negative controls (pGL4.10 empty vector and Renilla) are shown in gray.

Total data points (N values) are listed below each construct. Significance was assessed using Student’s t-test. Error bars represent SEM.


[Figure 2.6 image. (A) Luciferase constructs: 1.172 kb, 1.173 kb, 2.3 kb and 4.8 kb regions (labels: Intron 1, Upstream). (B) PANC-1 cells, Transfections 1-3, and (C) CFPAC-1 cells: relative luciferase activity for clones A-D (LR and HR), pooled LR and HR, and controls (pGL4.10 empty vector, RenTK), with N listed below each construct.]
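The percentages in the Figure 2.6 legend compare normalized reporter activity between LR and HR constructs. The Python sketch below shows that arithmetic under stated assumptions: each well's firefly reading is divided by its Renilla reading, and "X% higher" is read as (mean LR / mean HR - 1) x 100. The readings are hypothetical placeholders, and only the pooled-variance Student's t statistic is computed; a p-value would additionally need the t distribution (for example via scipy.stats.ttest_ind).

```python
from math import sqrt
from statistics import mean, variance

def normalize(firefly: list[float], renilla: list[float]) -> list[float]:
    """Per-well firefly/Renilla ratio, correcting for transfection efficiency."""
    return [f / r for f, r in zip(firefly, renilla)]

def percent_higher(a: list[float], b: list[float]) -> float:
    """How much higher mean(a) is than mean(b), in percent."""
    return (mean(a) / mean(b) - 1) * 100

def student_t(a: list[float], b: list[float]) -> float:
    """Two-sample Student's t statistic with pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))

# Hypothetical readings for illustration only (not values from Figure 2.6).
lr = normalize([112.0, 118.0, 115.0], [10.0, 10.0, 10.0])
hr = normalize([100.0, 104.0, 98.0], [10.0, 10.0, 10.0])
print(f"LR vs HR: {percent_higher(lr, hr):.1f}% higher, t = {student_t(lr, hr):.2f}")
```

Normalizing each well before pooling keeps differences in transfection efficiency between wells from being read as haplotype effects.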

Figure 2.7 Haplotypes observed in 762 F508del homozygous samples (1,524 chromosomes) across the SLC26A9 locus. Representation of SLC26A9 SNP haplotypes with MAF > 15% and MHF > 2%. Locations of variants relative to SLC26A9 are shown in the box diagram below the haplotypes (note: SLC26A9 is on the (-) DNA strand, so locations are shown 3’ to 5’ from left to right; not drawn to scale). Haplotype frequencies, p-values and beta values are shown to the left of the respective haplotypes. rsIDs are shown above, with gray dotted lines denoting the SNPs included in the respective luciferase constructs. CFRD-associated variants reported by Blackman et al. 2013 are marked by an asterisk (*) and bolded. SNPs highlighted in blue indicate the most common ancestral haplotype; variants highlighted in red indicate changes from the most common ancestral haplotype. § indicates TGGGGCCTCGGGTACCTCA, and † indicates TGGGGCCTCGGGTATCTCA. In addition to the Low Risk (LR) and High Risk (HR) haplotypes functionally tested in this study, High Risk 2 is also labeled; it is identical to HR at 11 of 12 CFRD-associated SNPs (the exception is rs7419153).


Figure 2.8. Violin plot of CFTR and SLC26A9 expression in pancreatic cells. From our current study, each data point is a cell, colored by the predicted cell type. (A) Expression of CFTR in cells that do and do not express SLC26A9. (B) Expression of SLC26A9 in cells that do and do not express CFTR.


Figure 2.9 Dual Luciferase-Renilla Experimental Design. Two versions of the 1.172 kb region (or other region of interest) were constructed, one containing all of the naturally occurring high risk variants and the other all of the low risk variants for CFRD, and cloned into a dual luciferase/Renilla reporter system (DLRS). For each version, clones containing experimental firefly luciferase from two to four maxipreps were prepared and arbitrarily designated clones A-D. These were co-transfected two to three independent times into PANC-1 cells along with the same quantity per well of the Renilla luciferase-encoding plasmid pRL-TK, a control reporter plasmid used to normalize for transfection efficiency. Each transfection consisted of 6 wells per transfection day. Dual luciferase-Renilla readings were performed three times for each well, and the three readings were averaged. Note that the amount of experimental luciferase construct used was calculated from the plasmid concentration, adjusted for plasmid size (molar mass). Experimental and control plasmids were added at a 50:1 ratio.
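Because constructs of different lengths carry different numbers of plasmid molecules per nanogram, dosing by size-adjusted (molar) amounts keeps the number of reporter copies per well comparable. The Python sketch below shows the conversion under stated assumptions: an average mass of about 650 g/mol per base pair of double-stranded DNA (660 is also commonly used), a 50:1 ratio interpreted as molar, and hypothetical plasmid sizes, target dose and concentration. It is an illustration, not the protocol used in this work.

```python
BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one dsDNA base pair

def ng_for_pmol(pmol: float, length_bp: int) -> float:
    """Mass (ng) of a plasmid of length_bp that contains `pmol` picomoles."""
    # pmol -> mol (1e-12), times molar mass (g/mol) -> g, then g -> ng (1e9)
    return pmol * 1e-12 * length_bp * BP_MASS_G_PER_MOL * 1e9

def ul_to_add(ng_needed: float, conc_ng_per_ul: float) -> float:
    """Volume of a plasmid prep needed to deliver ng_needed."""
    return ng_needed / conc_ng_per_ul

# Hypothetical inputs: molar dose per well and construct sizes.
EXPERIMENTAL_PMOL = 0.05
MOLAR_RATIO = 50                                      # assumed experimental:control molar ratio
constructs = {"short_insert": 5_400, "long_insert": 9_000}  # hypothetical sizes (bp)
CONTROL_BP = 4_000                                    # hypothetical control reporter size (bp)

for name, size in constructs.items():
    ng = ng_for_pmol(EXPERIMENTAL_PMOL, size)
    print(name, round(ng, 1), "ng =", round(ul_to_add(ng, 500.0), 3), "uL of a 500 ng/uL prep")
control_ng = ng_for_pmol(EXPERIMENTAL_PMOL / MOLAR_RATIO, CONTROL_BP)
print("control", round(control_ng, 2), "ng")
```

With a fixed molar dose of experimental construct, a molar 50:1 ratio implies a fixed Renilla amount per well, which matches the legend's "same quantity per well" of pRL-TK.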


Table 2.1. Expression of transcription factors and CFTR in the pancreas, PANC-1, and CFPAC-1 cells. The number of cells co-expressing SLC26A9 and the respective gene in the pancreas was quantified using single-cell RNA-sequencing. Cells were considered to express a gene if its normalized log-transformed expression was greater than 0.5. Significance of the co-occurrence of two genes was determined with a hypergeometric test; significance was not tested if 0 cells co-expressed both genes.

Numbers of ductal and ductal/acinar cells expressing the respective gene are shown in parentheses. The rightmost columns indicate average gene expression in PANC-1 and CFPAC-1 cells determined from RNA-sequencing data available in the Sequence Read Archive (PANC-1: SRR5171012, SRR5171013, SRR1172002, SRR3615309, SRR5952226; CFPAC-1: SRR1736491). SLC26A9 is expressed in PANC-1 and CFPAC-1 cells at 0.01 and 3.0003 FPKM, respectively.

Gene B | scRNA-seq of the pancreas (current study, n=2,999): cells expressing Gene B (ductal/ductal-acinar) | SLC26A9-expressing cells that express Gene B (ductal/ductal-acinar) [proportion] | Significance of co-expression (p-value) | PANC-1 RNA-seq (n=5): expression (FPKM) | CFPAC-1 RNA-seq (n=1): expression (FPKM)
--- | --- | --- | --- | --- | ---
CFTR | 531 (461) | 11 (11) / 15 (13) [73.4%] | 2.31E-07 | 0.02 | 0.04
FOS | 2633 (599) | 15 (13) / 15 (13) [100%] | <2.2e-16 | 54.94 | 325.72
JUND | 2251 (568) | 14 (12) / 15 (13) [93.4%] | 1.34E-02 | 56.17 | 123.31
JUNB | 935 (340) | 11 (10) / 15 (13) [73.4%] | 1.34E-04 | 64.71 | 218.20
FOSL2 | 284 (91) | 4 (4) / 15 (13) [26.7%] | 9.95E-03 | 12.95 | 9.26
SP1 | 101 (51) | 3 (3) / 15 (13) [20%] | 1.24E-03 | 24.48 | 29.12
MAFK | 275 (144) | 3 (3) / 15 (13) [20%] | 4.20E-02 | 50.66 | 17.88
STAT1 | 181 (96) | 2 (2) / 15 (13) [13.4%] | 5.75E-02 | 44.41 | 66.18
NFYA | 32 (22) | 1 (1) / 15 (13) [6.7%] | 1.06E-02 | 14.2 | 10.11
NFYB | 139 (63) | 1 (1) / 15 (13) [6.7%] | 1.51E-01 | 14.25 | 5.02
MAX | 156 (59) | 1 (1) / 15 (13) [6.7%] | 1.82E-01 | 23.3 | 42.68
USF2 | 265 (84) | 1 (1) / 15 (13) [6.7%] | 3.88E-01 | 72.65 | 40.46
MAFF | 296 (202) | 1 (1) / 15 (13) [6.7%] | 4.44E-01 | 5.52 | 46.05
GATA3 | 0 (0) | 0 (0) / 15 (13) [0%] | NA | 3.57 | 17.50
TFAP2A | 0 (0) | 0 (0) / 15 (13) [0%] | NA | 34.61 | 55.62
FOXA1 | 3 (2) | 0 (0) / 15 (13) [0%] | NA | 1.55 | 10.58
TFAP2C | 7 (2) | 0 (0) / 15 (13) [0%] | NA | 3.15 | 1.82
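The hypergeometric test named in Table 2.1 asks whether more SLC26A9-expressing cells also express gene B than would be expected if the two genes were distributed independently across cells. The pure-Python sketch below computes a one-sided upper-tail probability P(X >= observed). The function name and the choice of tail are assumptions: in R, phyper(q, m, n, k, lower.tail = FALSE) returns P(X > q), so the equivalent call uses q = observed - 1. The sketch is therefore not guaranteed to reproduce the exact p-values in the table. The CFTR row counts are used only as illustrative inputs.

```python
from math import comb

def hypergeom_upper_tail(observed: int, total: int, n_gene_b: int, n_slc26a9: int) -> float:
    """P(X >= observed) for X ~ Hypergeometric(total, n_gene_b, n_slc26a9).

    X is the number of SLC26A9-expressing cells that also express gene B when
    n_slc26a9 cells are drawn without replacement from `total` cells, of which
    n_gene_b express gene B.
    """
    if observed <= 0:
        return 1.0
    max_overlap = min(n_gene_b, n_slc26a9)
    # Exact integer numerator; Python ints do not overflow for large binomials.
    numerator = sum(
        comb(n_gene_b, x) * comb(total - n_gene_b, n_slc26a9 - x)
        for x in range(observed, max_overlap + 1)
    )
    return numerator / comb(total, n_slc26a9)

# Counts from the CFTR row of Table 2.1, used purely as example inputs.
p = hypergeom_upper_tail(observed=11, total=2999, n_gene_b=531, n_slc26a9=15)
print(f"P(X >= 11) = {p:.3g}")
```

Working with exact integer binomial coefficients avoids the underflow that naive floating-point products can hit when cell counts are in the thousands.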

Table 2.2 SLC26A9 and relevant gene expression in pancreatic cells. Results were obtained from single-cell RNA-sequencing data. The number of cells expressing each gene is listed, with the number of ductal cells expressing that gene in parentheses. Cells were considered to express a gene if the gene count was greater than 1. Significance of the co-occurrence of two genes was determined with a hypergeometric test; significance was not tested if 0 cells co-expressed both genes. Data are taken from Baron et al. (GSE84133), Wang et al. (GSE83139), Muraro et al. (GSE85241) and Segerstolpe et al. (E-MTAB-5061).


[Table 2.2 body: not recoverable from the extracted text.]

Table 2.3 Summary statistics of associations for SNPs that reached genome-wide or suggestive significance in the genome-wide association study for CFRD onset (Blackman et al., 2013). Association was conducted with martingale residuals of cystic fibrosis related diabetes in 762 F508del homozygotes.


Table 2.4 The number of cells sequenced by predicted cell type in our study (‘Current Study’) and two publicly available datasets (Baron et al., 2016 (GSE84133); Segerstolpe et al., 2016 (E-MTAB-5061)). Counts and percentages of the cell types in each study are displayed.


[Table 2.4 body: not recoverable from the extracted text.]

Table 2.5 Average normalized gene expression values of selected genes in cells that express only CFTR, only SLC26A9, both, or neither. The list encompasses selected ion channels, bicarbonate transporters, FOXI1 and WNK pathway genes. Gene expression values in our study (‘Current Study’) and four previously published studies (Baron et al. (GSE84133), Wang et al. (GSE83139), Muraro et al. (GSE85241) and Segerstolpe et al. (E-MTAB-5061)) are shown. Each expression value is colored according to its relative expression within each study, where green indicates higher expression and grey indicates lower expression. NA indicates that the gene was not detected in that study. Average SLC26A9 expression in the five studies is as follows: ‘Current Study’: 0.0043, ‘Baron’: 0.0066, ‘Wang’: 3.9926, ‘Muraro’: 0.0173 and ‘Segerstolpe’: 0.7564.


Table 2.6 Top CFRD-associated variants as eQTLs for SLC26A9. Top CFRD-associated SNPs (as determined by Blackman et al., 2013) and their association with SLC26A9 expression in the pancreas and lung, obtained from GTEx v7. A positive beta value indicates that the risk variant is associated with higher gene expression.


Chapter 3: Evaluation of SLC26A9 and PM20D1 expression in Human Stomach, Lung and Pancreas

3.1 Introduction

Genome-wide association studies (GWAS) have detected many disease-associated variants, mostly in non-coding regions (181), and determining their mechanism(s) of action is not usually straightforward. Dual luciferase-Renilla reporter assays (Chapter 2) have shown that DNA fragments containing CFRD-associated variants derived from the 5’ region of SLC26A9 generated 12-19% higher levels of expression in PANC-1 cells (p-value: 6.32E-07) and 20% higher levels in CFPAC-1 cells (p-value: 2.0E-3) when low risk variants were present compared with high risk variants. But despite the indirect evidence that SLC26A9 may be involved in CF physiology, there is as yet no direct evidence that the CFRD (or MI) modifier variants act through (or exclusively through) SLC26A9. For example, studies have shown that GWAS variants can affect the expression of genes located as much as 200 kb away (182). Thus, the CFRD-associated variants might affect the expression not only of SLC26A9 but also of adjacent genes. PM20D1 (peptidase M20 domain-containing 1) is located 63 kb downstream of SLC26A9 and encodes an enzyme that regulates N-acyl amino acids. PM20D1 is expressed in brown and beige adipocytes, where it is involved in energy expenditure, leading to reduced fat mass and lowered glucose (183).

The results from the reporter assay were consistent with our eQTL analysis of publicly available data from the Genotype-Tissue Expression (GTEx) portal, which we used to determine whether the CFRD-risk variants associate with SLC26A9 and PM20D1 RNA expression in the pancreas, lung and stomach. We found that the high risk alleles of CFRD variants are associated with decreased expression of SLC26A9, which indirectly supports the luciferase reporter assay results. Specifically, alleles on the LR haplotype were associated with increased expression of SLC26A9 in the pancreas but not in the lung or stomach, which will therefore provide useful controls. PM20D1, on the other hand, is expressed in brown and beige adipocytes as well as the pancreas, and at very low levels in skin; however, it is not expressed at appreciable levels in the lung or stomach. eQTL analysis showed that the top 12 CFRD-associated variants are as likely to be eQTLs for PM20D1 as for SLC26A9 in the pancreas.
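An eQTL effect size such as a GTEx beta value can be read as the slope of normalized expression regressed on the number of copies (0, 1 or 2) of an allele carried by each sample. The least-squares sketch below illustrates that idea only. It is a simplification: GTEx's own pipeline also adjusts for covariates (for example genotype principal components and PEER factors), and the genotypes and expression values here are hypothetical.

```python
from statistics import mean

def eqtl_slope(dosage: list[int], expression: list[float]) -> float:
    """Ordinary least-squares slope of expression on allele dosage (0, 1, 2).

    A positive slope means each extra copy of the counted allele is associated
    with higher expression; no covariates are modeled here.
    """
    mx, my = mean(dosage), mean(expression)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, expression))
    sxx = sum((x - mx) ** 2 for x in dosage)
    return sxy / sxx

# Hypothetical samples: copies of a risk allele and normalized expression.
risk_copies = [0, 0, 1, 1, 1, 2, 2]
norm_expr = [0.8, 0.6, 0.1, 0.3, -0.1, -0.5, -0.7]
print(f"beta = {eqtl_slope(risk_copies, norm_expr):.2f}")  # negative in this toy data
```

Under this reading, a negative beta for a risk allele corresponds to the pattern described above: HR alleles associated with lower SLC26A9 expression.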

In addition to altering expression of SLC26A9 or PM20D1, another (non-exclusive) possibility is that the CFRD-associated variants influence the splicing of SLC26A9, as at least three isoforms of this gene have been reported. Isoform-specific expression of genes can be mediated through enhancer-associated noncoding RNA intermediates (184). The SLC26A9 isoforms differ in the inclusion or exclusion of a C-terminal PDZ-domain binding motif at the 3’ end; this motif is essential for interaction with CFTR (155).

We aim to find evidence of alternatively spliced isoforms of SLC26A9 and to determine whether the isoform ratio correlates with CFRD risk status in primary human tissue.

Additionally, because several observations (luciferase-Renilla reporter assays and eQTL analysis) suggest that CFRD risk variants 5’ of SLC26A9 influence its level of expression, and possibly the expression of the adjacent gene PM20D1, we also aim to evaluate these findings in primary tissues. This is essential before developing strategies to alter the level of either gene product as a treatment for CFRD.

We propose to quantify the RNA abundance of each gene in primary human tissues and correlate these findings with the status of the CFRD variants. We hypothesize that low risk CFRD alleles are associated with higher expression of SLC26A9 and PM20D1 compared with high risk CFRD alleles.

3.2 Results

RNA sequencing data predict three isoforms of SLC26A9 with varying levels of expression in different human tissues. Using publicly available RNA sequencing data from the Human Protein Atlas, we found that the isoform profiles of SLC26A9 appear to vary by tissue. The predominant mRNA transcript in most tissues is isoform 1, which is expressed, in decreasing amounts, in the salivary glands, skin, stomach, lung, and pancreas. Isoform 2 is present in salivary gland, and isoform 3 in lung, skin and sweat gland, at very low levels (Figure 3.1).

3’ RT-PCR excludes alternative splicing of the 3’ end of SLC26A9 transcripts.

RNA-sequencing data predict alternative splicing of the 3’ end of SLC26A9, creating isoforms of SLC26A9 that differ in the inclusion or exclusion of a C-terminal PDZ-domain binding motif. Human tissue samples were obtained from the Johns Hopkins Hospital Department of Pathology (n=20 for stomach, n=25 for lung, and n=25 for pancreas). We analyzed cDNA synthesized from human stomach, lung, and pancreas RNA and determined whether alternative splicing occurs using RT-PCR targeted at the 3’ end of SLC26A9. Although the RNA sequencing data suggest that several isoforms occur in tissue, we could detect only one isoform, isoform 1, in each sample examined. SLC26A9 transcript was amplified in 8 of 20 stomach samples (Figure 3.2), 20 of 25 lung samples (Figure 3.3), and 16 of 25 pancreas samples (Figure 3.4). Isoform 2 and isoform 3 could not be detected in any sample. The results for stomach and pancreas are consistent with data from the Human Protein Atlas (Figure 3.1). For pancreas, the ~180 bp fragments from P20, P37 and P40 were gel-excised, purified, and sequenced, and were determined to be nonspecific amplification products. Although data from the Human Protein Atlas suggest that two isoforms occur in the lung, we could detect only one isoform, isoform 1 at 360 bp, in each lung sample that showed amplification of SLC26A9. The ~180 bp fragments from L14, L29, L46 and L44 were determined to be nonspecific amplification products.

qPCR reveals no observable correlation between LR/LR and HR/HR haplotypes and expression of SLC26A9 mRNA in human primary stomach, lung and pancreas tissue. Using cDNA synthesized from RNA extracted from human primary tissue, we performed qPCR to quantify SLC26A9 in human stomach, lung and pancreas. HPRT1 and TBP were both used as housekeeping genes because their expression levels are relatively similar to that of SLC26A9 (Figure 3.5). To assess the effect of CFRD-risk variants, we determined the haplotype status and quantified total SLC26A9 in 17 human stomach, 26 human lung, and 25 human pancreas tissue samples.

Haplotypes were determined by Sanger sequencing of gDNA (Table 3.1; see Materials and Methods). In Figures 3.6, 3.7 and 3.8, bar diagrams show the level of SLC26A9 mRNA, calculated using the 2^(-ΔΔCt) method, relative to HPRT1 (Figures 3.6A, 3.7A, 3.8A) and TBP (Figures 3.6B, 3.7B, 3.8B). Data are sorted in ascending order for human stomach, lung and pancreas. The haplotype of each sample is indicated by bar color: HR/HR homozygotes in red, HR/LR heterozygotes in gray, and LR/LR homozygotes in blue. Values in parentheses are quantities of SLC26A9 relative to the housekeeping genes, calculated using the 2^(-ΔΔCt) method. In pancreas (n=25), SLC26A9 expression varied 75-fold (0.38 to 28.47) relative to HPRT1 (Figure 3.6A) and 96-fold (0.26 to 24.85) relative to TBP (Figure 3.6B). In stomach (n=17), SLC26A9 expression varied 200-fold (0.141 to 28.218) relative to HPRT1 (Figure 3.7A) and 757-fold (0.028 to 21.205) relative to TBP (Figure 3.7B). In lung (n=26), SLC26A9 expression varied 17-fold (0.295 to 5.066) relative to HPRT1 (Figure 3.8A) and 12-fold (0.446 to 5.352) relative to TBP (Figure 3.8B). Note: with TBP, sample L13 did not amplify within 35 cycles.
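The relative quantities in parentheses follow the 2^(-ΔΔCt) method: ΔCt = Ct(target) - Ct(housekeeping gene), ΔΔCt = ΔCt(sample) - ΔCt(calibrator), and relative quantity = 2^(-ΔΔCt). The fold ranges quoted above are the ratio of the highest to the lowest relative quantity (for example, 28.47 / 0.38 ≈ 75). The Python sketch below assumes roughly 100% amplification efficiency and a single calibrator ΔCt; the text does not state which sample or average served as calibrator, and the Ct values are hypothetical.

```python
def relative_quantity(ct_target: float, ct_reference: float, calibrator_delta_ct: float) -> float:
    """2^(-ΔΔCt) relative quantity, assuming ~100% amplification efficiency."""
    delta_ct = ct_target - ct_reference
    delta_delta_ct = delta_ct - calibrator_delta_ct
    return 2.0 ** (-delta_delta_ct)

def fold_range(quantities: list[float]) -> float:
    """Ratio of the highest to the lowest relative quantity."""
    return max(quantities) / min(quantities)

# Hypothetical Ct values: (target gene, housekeeping gene) per sample.
samples = {"S1": (30.0, 24.0), "S2": (28.5, 24.5), "S3": (33.0, 24.0)}
CALIBRATOR_DELTA_CT = 6.0  # hypothetical calibrator ΔCt

rq = {name: relative_quantity(t, r, CALIBRATOR_DELTA_CT) for name, (t, r) in samples.items()}
for name, q in sorted(rq.items(), key=lambda kv: kv[1]):  # ascending, as in the bar plots
    print(name, round(q, 3))
print("fold range:", round(fold_range(list(rq.values())), 1))
```

Because each cycle doubles the product under the efficiency assumption, a 1-cycle difference in ΔCt corresponds to a 2-fold difference in relative quantity. A sample with no amplification within the cycle limit (such as L13 with TBP) has no defined Ct and cannot be included.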

For each tissue, SLC26A9 quantities relative to HPRT1 (Figures 3.6C, 3.7C, 3.8C) and TBP (Figures 3.6D, 3.7D, 3.8D) were grouped by haplotype (LR/LR, LR/HR, HR/HR). When grouped by haplotype, there was no statistically significant difference among LR/LR, LR/HR, and HR/HR in any of the three tissues studied.
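The haplotype comparison places each sample's relative quantity into one of three groups (LR/LR, LR/HR, HR/HR) and asks whether the groups differ. The text does not name the test used. As one plausible, rank-based option that makes no normality assumption, the sketch below computes a Kruskal-Wallis H statistic. With three groups, H is referred to a chi-square distribution with 2 degrees of freedom, whose upper tail is exactly exp(-H/2). Values are hypothetical, no tie correction is applied, and the chi-square approximation is rough for small groups.

```python
import math

def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_3groups(groups: dict[str, list[float]]) -> tuple[float, float]:
    """H statistic and chi-square (df = 2) p-value for exactly three groups."""
    assert len(groups) == 3, "exp(-H/2) is the df = 2 tail, valid only for three groups"
    pooled = [v for vals in groups.values() for v in vals]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    h, start = 0.0, 0
    for vals in groups.values():
        r_sum = sum(ranks[start:start + len(vals)])
        h += r_sum ** 2 / len(vals)
        start += len(vals)
    h = 12 / (n_total * (n_total + 1)) * h - 3 * (n_total + 1)
    return h, math.exp(-h / 2)

# Hypothetical relative quantities grouped by haplotype.
by_haplotype = {
    "LR/LR": [1.2, 0.9, 2.1, 1.5],
    "LR/HR": [0.8, 1.7, 1.1, 0.6, 1.3],
    "HR/HR": [1.0, 0.7, 1.9],
}
h_stat, p_value = kruskal_wallis_3groups(by_haplotype)
print(f"H = {h_stat:.2f}, p = {p_value:.2f}")
```

A rank-based test is less sensitive than a mean-based test to the large spread in relative quantities (up to several hundred-fold) seen across samples.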

qPCR reveals a potential correlation between LR/LR and HR/HR haplotypes and expression of PM20D1 in human primary pancreas tissue. Using cDNA synthesized from RNA extracted from human primary tissue, we performed qPCR to quantify PM20D1 in human pancreas. HPRT1 and TBP were both used as housekeeping genes because their expression levels are relatively similar to that of PM20D1 (Figure 3.5). To assess the effect of CFRD-risk variants, we determined the haplotype status and quantified total PM20D1 in 21 human pancreas tissue samples, and grouped the results by haplotype as described for SLC26A9. In the pancreas (n=21), PM20D1 expression ranged from almost undetectable (0.010) to 293-fold higher (2.927) relative to HPRT1, and from barely detectable (0.003) to 531-fold higher (1.592) relative to TBP. In Figure 3.9, bar diagrams show the level of PM20D1 mRNA, calculated using the 2^(-ΔΔCt) method, relative to the housekeeping genes HPRT1 (Figure 3.9A) and TBP (Figure 3.9B). Data are sorted in ascending order for human pancreas.

For the pancreas, PM20D1 quantities relative to HPRT1 (Figure 3.9C) and TBP (Figure 3.9D) were grouped by haplotype (LR/LR, LR/HR, HR/HR). When grouped by haplotype, there was a subtle trend among LR/LR, LR/HR, and HR/HR. The haplotype of each sample is indicated by bar color: HR/HR homozygotes in red, HR/LR heterozygotes in gray, and LR/LR homozygotes in blue.

3.3 Discussion

RNA sequencing of multiple tissues indicates that at least three isoforms of SLC26A9 occur, only one of which matches the transcript reported by Lohi et al. (110). The major species (isoform 1) has 21 exons, including exon 1, as observed by Lohi et al. The other two isoforms are alternatively spliced in the 3’ region; isoform 2 includes exon 1, whereas isoform 3 lacks exon 1 (Figure 3.10). Our re-analysis of predicted data obtained from the Human Protein Atlas suggests that isoform 1, the predominant mRNA transcript of SLC26A9, is expressed in the lung, pancreas, stomach, and salivary gland. However, alternative splicing of the 3’ end that retains or splices out a PDZ-binding motif, yielding alternative isoforms, has been reported. Determining whether alternative splicing of the 3’ end occurs is important because the PDZ-binding motif of SLC26A9 has been reported to play a role in the interaction with CFTR (113, 121, 130).

3’ RT-PCR analysis of human tissues using SLC26A9-specific primers confirmed the presence of SLC26A9 isoform 1 transcripts in stomach (Figure 3.2), lung (Figure 3.3) and pancreas (Figure 3.4). Salivary gland was not a candidate for analysis because it does not express CFTR. In stomach, neither isoform 2 nor isoform 3 was detected at the expected size of 411 base pairs, in agreement with data obtained from the Human Protein Atlas for pancreas and stomach tissue (Figures 3.2 and 3.4). Although two isoforms were predicted to exist in the lung, we detected only the major isoform, isoform 1, and no isoform 2 or 3 product at the expected size of 183 base pairs (Figure 3.3). In pancreas, two samples, P37 and P40, showed amplification at ~180 bp; however, sequencing showed these products to be nonspecific. It is possible that isoform 3 is expressed below the detection threshold of the assay. It is also possible that the isoform(s) are expressed in specific cell type(s) that are not well represented in the tissue samples analyzed; multiple studies have shown cell-type-specific expression of transcripts. In the current study, SLC26A9 was shown to be expressed in a subset of ductal cells, a finding replicated in four additional publicly available studies (94, 95, 140, 141). Similarly, CFTR transcripts in the airway epithelium have been shown to be expressed predominantly in ionocytes (146, 147). Lastly, isoform expression may vary between individuals or depend on disease state. The tissues obtained from Johns Hopkins Pathology are de-identified, and we therefore have no knowledge of the clinical history or disease phenotype of these samples.

To verify the luciferase expression studies in PANC-1 and CFPAC-1 cells, as well as the eQTL observations, we sought to quantify SLC26A9 RNA abundance in human primary pancreas, stomach, and lung tissue and to correlate the findings with CFRD-variant status. However, no correlation was observed between haplotype and SLC26A9 expression relative to the housekeeping genes HPRT1 and TBP (Figures 3.6, 3.7, 3.8). This may be because SLC26A9 expression is developmental-stage dependent, which is not captured in the sample cohort that we acquired. This explanation is plausible, as observations in mice indicate that Slc26a9 expression is considerably higher in utero and decreases shortly after birth (125).

We also sought direct evidence that the CFRD-risk SNPs affect expression of the downstream gene PM20D1 in human primary tissue, again correlating the findings with CFRD-variant status. A subtle trend was observed (Figure 3.9); however, additional samples are needed to achieve sufficient power for statistical testing. We estimated that if a minimum of 120 samples were analyzed, we could expect 10 samples to be LR1/LR1, 9 to be HR1/HR1, and at least 16 to carry one copy of each haplotype (LR1/HR1). Based on the variance of PM20D1 transcript levels in preliminary results, 80% power is achieved with a true difference in means of 1.5 × variance (2-sample t-test, N=10 and 8). This would be sufficient to statistically distinguish a difference of about 48% of the mean expression level of 0.27 (1.4 × 0.09 = 0.13 expression units). Power is improved by using all three CFRD-risk groups (LR1/LR1, LR1/HR1, HR1/HR1) and applying linear regression: with a total n=35 (10/8/16), the detectable beta is 0.49 × variance, or 0.0441, which is about 16% of the mean expression level of 0.27 (185).
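The minimum detectable difference for a two-group comparison of this size can be sketched in Python. This is a normal-approximation version that assumes a two-sided α of 0.05 (the level is not stated above) and expresses the difference in units of the common standard deviation; an exact t-based calculation, more appropriate for groups this small, gives a somewhat larger value. The function name `min_detectable_effect` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_effect(n1: int, n2: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest true difference in means (in standard-deviation units) detectable
    by a two-sided two-sample comparison, using the normal approximation."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)  # quantile corresponding to the desired power
    return (z_alpha + z_beta) * sqrt(1 / n1 + 1 / n2)


# Group sizes used in the text (10 LR1/LR1 vs 8 HR1/HR1).
d = min_detectable_effect(10, 8)
print(f"{d:.2f}")
```

Multiplying `d` by an estimate of the standard deviation of PM20D1 expression converts it to expression units; larger groups or a lower power target shrink it.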

Primary tissues for analysis were selected on the basis of gene expression profiling. Because SLC26A9 may interact with CFTR, we chose to study its expression in tissues that also express CFTR, namely the pancreas and lung (Figures 3.1 and 3.5). We also analyzed stomach tissue because GTEx shows that it expresses the highest level of SLC26A9 (Figures 3.1 and 3.5). PM20D1 is also expressed in the pancreas but is not expressed at appreciable levels in the lung. Although we are confident that our estimates of the CFRD haplotype frequencies are correct, as they are based on sequencing of approximately 800 individuals, it is possible that by chance we will not obtain the minimum number of LR1 homozygotes, HR1 homozygotes, or both. To overcome this challenge, we acquired additional samples from the Johns Hopkins Department of Pathology and supplemented our data by grouping samples carrying the LR2, Other 5 and Other 6 haplotypes with the LR1 samples, and samples carrying the HR2 and Other 1-4 haplotypes with the HR1 samples. “Other” haplotypes are additional haplotypes with frequencies greater than 2% (Table 3.1). The LR2 haplotype is identical to the LR1 haplotype at all 12 of the critical CFRD-associated variants. The HR2 haplotype is identical to HR1 at 10 of the 12 CFRD-associated variants.
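The pooling rule above can be expressed as a lookup table. In the Python sketch below, the mapping follows the text, while the function names and sample IDs are hypothetical. A haplotype missing from the table (for example, one below the 2% frequency threshold) raises a `KeyError` rather than being silently assigned to a group.

```python
from collections import defaultdict

# Pooling described in the text: LR2, Other 5 and Other 6 are grouped with LR1;
# HR2 and Other 1-4 are grouped with HR1.
RISK_GROUP = {
    "LR1": "LR", "LR2": "LR", "Other 5": "LR", "Other 6": "LR",
    "HR1": "HR", "HR2": "HR",
    "Other 1": "HR", "Other 2": "HR", "Other 3": "HR", "Other 4": "HR",
}


def diplotype_group(hap_a: str, hap_b: str) -> str:
    """Return LR/LR, LR/HR or HR/HR for a sample's two haplotypes."""
    n_low_risk = sum(RISK_GROUP[h] == "LR" for h in (hap_a, hap_b))
    return {2: "LR/LR", 1: "LR/HR", 0: "HR/HR"}[n_low_risk]


def group_samples(sample_haplotypes: dict) -> dict:
    """Collect sample IDs under their haplotype group."""
    groups = defaultdict(list)
    for sample_id, (hap_a, hap_b) in sample_haplotypes.items():
        groups[diplotype_group(hap_a, hap_b)].append(sample_id)
    return dict(groups)


# Hypothetical sample IDs and haplotype calls.
print(group_samples({"S-a": ("LR1", "Other 6"), "S-b": ("HR2", "LR1"), "S-c": ("Other 3", "HR1")}))
```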

Collectively, our data show that the 3’ end of SLC26A9 transcripts retains the PDZ-binding motif, which has been reported to play a role in the interaction with CFTR. This excludes, in the tissues examined, the alternative splicing of the 3’ end of SLC26A9 transcripts predicted by RNA-seq. Our findings also indicate that a more in-depth analysis is required to assess the correlation between haplotype status and the expression of SLC26A9 and PM20D1. This may require additional human primary tissue, as well as access to tissue from human fetal developmental stages or, more feasibly, animal tissue from earlier developmental stages.

3.4 Materials and Methods

RNA extraction from human stomach, lung and pancreas tissue samples. Total RNA was isolated from pancreas, lung, and stomach tissues using the Total RNA Mini Purification Kit (Denville Scientific). RNAlater-treated tissues were homogenized in TRIzol using a hand-held homogenizer.

cDNA synthesis from RNA isolated from human stomach, lung and pancreas tissue samples. cDNA templates were synthesized from isolated total RNA using SuperScript II (Life Technologies) as instructed by the manufacturer. Each reaction mix contained 8 uL buffer and 2 uL reverse transcriptase.

Haplotyping of stomach, lung and pancreas tissue samples. Tissue DNA was isolated from the same RNAlater-treated, de-identified donor pancreas, lung and stomach tissues from Johns Hopkins Hospital Pathology using the DNeasy Blood & Tissue Kit (Qiagen) as instructed by the manufacturer. The 5’ region upstream of SLC26A9 was PCR amplified using four primer pairs (expected fragment sizes 365-678 bp) encompassing the 12 variants that establish the CFRD-risk haplotypes (HR or LR). Fragments of the expected size were visualized by agarose gel electrophoresis, gel purified, and sent for Sanger sequencing. Haplotypes were determined using deep sequencing data of the entire SLC26A9 locus (Figure 3.9). Sentinel SNPs allow each sample to be categorized as carrying a high-risk (HR), low-risk (LR), or “other” haplotype. The 7 remaining variants differentiate the sub-haplotypes, i.e. LR1, LR2, HR1, HR2 or “Other”. Using the combination of these SNPs, we can confidently establish the haplotypes of each sample as HR1/HR1, HR1/LR1, LR1/LR1, Other/Other or any combination of these.
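Assigning a diplotype from unphased Sanger genotypes amounts to finding which pairs of known haplotypes could produce the observed alleles at every SNP. The Python sketch below illustrates the idea. The five-SNP allele strings are hypothetical placeholders, not the Table 3.1 definitions, and were chosen so that the LR1/LR2 and HR1/HR2 distinctions fall on different SNPs; if more than one pair remains compatible, phase cannot be resolved from genotype alone.

```python
from itertools import combinations_with_replacement

# Hypothetical 5-SNP allele strings (placeholders; the real panel is in Table 3.1).
# SNPs 1-3 act as sentinels (LR vs HR); SNP 4 separates LR1/LR2; SNP 5 separates HR1/HR2.
HAPLOTYPES = {
    "LR1": "ACGTC",
    "LR2": "ACGAC",
    "HR1": "GTCTC",
    "HR2": "GTCTG",
}


def compatible_diplotypes(genotype, haplotypes=HAPLOTYPES):
    """List haplotype pairs consistent with an unphased genotype.

    genotype: one two-letter string per SNP (e.g. "AG"); allele order is ignored.
    """
    matches = []
    for (name_a, seq_a), (name_b, seq_b) in combinations_with_replacement(haplotypes.items(), 2):
        if len(seq_a) != len(genotype):
            raise ValueError("genotype length does not match haplotype definitions")
        # Each SNP must carry exactly one allele from each haplotype of the pair.
        if all(sorted(a + b) == sorted(g) for a, b, g in zip(seq_a, seq_b, genotype)):
            matches.append(f"{name_a}/{name_b}")
    return matches


print(compatible_diplotypes(["AG", "CT", "GC", "TT", "CG"]))  # ['LR1/HR2']
```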

Primer sequences to amplify across 4.8 kb 5’ of SLC26A9:

Capture SNPs 15, 13, 8A (Expected size 678 bp): ANL 89R: 5’CATGGGTAGCCTGACCTGTT3’ ANL 90F: 5’TTCCAATTCCAGCATGTATCC3’

Capture SNPs 8 and 6 (Expected size 452 bp): ANL 91R: 5’GGGCTTCAGTAAGGCACCTC3’ ANL 29F: 5’TTCACGGTGTTTGCAGAAAG3’

Capture SNPs 5 and 4 (Expected size 488 bp): ANL 93R: 5’GAGCTATTGAGGCCACCTGA3’ ANL 95F: 5’TGAGGTCCTTCCGTAGCACT3’

Capture SNPs 3 and 2 (Expected size 365 bp): ANL 96R: 5’GCCTGAGGAGATAGGAAAAGG3’ ANL 22F: 5’CCACCCCCTAGATCTCCAAT3’

Primer sequences for Sanger sequencing:

Sequence product of ANL 89/ANL 90: ANL 27: 5’CCAGACGTCTTCATCCAGGT3’ ANL 28: 5’CTGCAAAAGCAGCCCTAGTC3’

Sequence product of ANL 91/ANL 29: ANL 92: 5’TGAATTCTGGCTCTCTTCAGC3’


Sequence product of ANL 93/ANL 95: ANL 94: 5’GGGCTGTGACTGGAGATTTG3’

Sequence product of ANL 96/ANL 22: ANL 97: 5’AGGACAGCAAGCTCAGAGGA3’

Determining the alternative splicing profile of the 3’ end of SLC26A9 transcripts using 3’ RT-PCR. The 3’ end of the SLC26A9 isoform(s) was captured using primers that bind exons 19-22 for stomach samples and exons 20-21 for lung and pancreas samples. DNA fragments of the expected size were purified from 2% agarose gels and sequenced. By comparison with the SLC26A9 genomic sequence, we determined whether intron 21 was spliced out to create isoforms 2 or 3 (Figure 3.2).

Primer sequences:

For SLC26A9:

Targeting exons 19-22 (Expected size: 588 bp for Isoform 1; 411 bp for Isoform 2/3)

ANL 50F: 5’CCCAGGCAAATGCTAGAGAC3’

ANL 51R: 5’GGCTCTGGGACACTTTTTGA 3’

Targeting exons 20-21 (Expected size: 360 bp for isoform 1; 183 bp for isoform 2/3)

DOL 1F: 5’AGGACATTCGCAGCTACTGG 3’

DOL 2R: 5’GAGCAGGAGGCTTGTCCATT 3’
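Both primer pairs predict a product 177 bp shorter for isoforms 2/3 than for isoform 1 (588 vs 411 bp; 360 vs 183 bp), so a band can be provisionally assigned by size. The Python sketch below does this; the ±15 bp tolerance and the function name are hypothetical. As the ~180 bp nonspecific products seen in lung and pancreas show, a size match is only a candidate call and must be confirmed by sequencing.

```python
# Expected amplicon sizes (bp), taken from the primer descriptions above.
EXPECTED_BP = {
    "ANL 50F/ANL 51R": {"isoform 1": 588, "isoform 2/3": 411},  # exons 19-22
    "DOL 1F/DOL 2R": {"isoform 1": 360, "isoform 2/3": 183},  # exons 20-21
}


def candidate_isoform(primer_pair: str, band_bp: float, tolerance_bp: float = 15.0) -> str:
    """Provisionally assign a gel band to an isoform by size (hypothetical tolerance)."""
    for isoform, size in EXPECTED_BP[primer_pair].items():
        if abs(band_bp - size) <= tolerance_bp:
            return isoform
    return "unexpected size"


# Both primer pairs imply the same size difference between isoforms.
for pair, sizes in EXPECTED_BP.items():
    print(pair, sizes["isoform 1"] - sizes["isoform 2/3"])
```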

Quantifying levels of SLC26A9 and PM20D1 using qPCR. Quantitative PCR (qPCR) was performed in triplicate using SYBR Green Universal 2× quantitative PCR Master Mix (Thermo Fisher Scientific) on a Bio-Rad CFX Connect Real-Time detection system. Each reaction contained 6 uL SYBR Green master mix, 3 uL water, 0.6 uL of primers and 2.4 uL of cDNA. Data were analyzed using the 2^−ΔΔCT method as described by Livak and Schmittgen (186). mRNA levels were normalized to the internal control genes HPRT1 and TBP.
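The 2^−ΔΔCT calculation can be sketched in Python as follows. Like the Livak and Schmittgen method, it assumes near-100% and equal amplification efficiencies for target and reference genes; the calibrator sample, the CT values and the function name are hypothetical. Running it once with HPRT1 and once with TBP as the reference gene gives the two normalizations used here.

```python
from statistics import mean


def relative_quantity(target_ct, reference_ct, calibrator_target_ct, calibrator_reference_ct):
    """Relative expression by the 2^-ddCT method, from technical-replicate CT values.

    target_ct / reference_ct: CT replicates for the gene of interest (e.g. SLC26A9)
    and a housekeeping gene (e.g. HPRT1 or TBP) in the test sample.
    calibrator_*: the same measurements in a calibrator sample (hypothetical choice).
    """
    delta_ct_sample = mean(target_ct) - mean(reference_ct)
    delta_ct_calibrator = mean(calibrator_target_ct) - mean(calibrator_reference_ct)
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2 ** (-delta_delta_ct)


# Hypothetical triplicates: relative to its reference gene, the target crosses the
# threshold 2 cycles earlier in the test sample than in the calibrator (about 4-fold more).
rq = relative_quantity([28.0, 28.2, 28.1], [24.0, 24.1, 23.9],
                       [30.1, 30.0, 30.2], [24.0, 24.0, 24.0])
print(round(rq, 2))  # 4.0
```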

Primer sequences:

For SLC26A9: One pair of primers targeted within the SLC26A9 transcript (exons 4-5) and the other pair targeted the 3’ end of the transcript (exons 19-22).

Targeting exons 4-5:

DOL 5F: 5’ TCCCCCTCCTGACCTACTTC 3’

DOL 6R: 5’ TGCAGACAGATGTTACCCACC 3’

Targeting exons 19-22:

DOL 12F: 5’CTAGAGACGTGACCCCAGGA 3’

DOL 13R: 5’GTCCCAGTAGCTGCGAATGT 3’

For PM20D1: One pair of primers targeted the beginning of the PM20D1 transcript (exons 2-3) and the other pair targeted the middle of the transcript (exons 11-12).

Targeting exons 2-3:

DOL 8F: 5’AGTCCAATACTACAGCCCTGG 3’

DOL 9R: 5’ TCCACGACTTCATGCTGGAT 3’

Targeting exons 11-12:

DOL 10F: 5’ GACCGTACAGTCCGTCTTCC 3’

DOL 11R: 5’ATGCCAGTGGTGAGGTTTGT 3’

For housekeeping genes:

Targeting TBP:

TBP F (ANL56): 5’ TCGTCTTCCTGAATCCCTTTA 3’

TBP R (ANL55): 5’ TGTTCTCCTTATTTTTGTTTCTGG 3’


Targeting HPRT1:

Primers were purchased from Bio-Rad: PrimePCR™ SYBR® Green Assay: HPRT1, Human (2500 x 20 µL reactions, desalted)

https://www.bio-rad.com/en-us/prime-pcr-assays/assay/qhsacid0016375-primepcr-sybr-green-assay-hprt1-human

SLC26A9 mRNA transcripts were quantified by qPCR with SYBR Green, using primers that amplify SLC26A9 cDNA reverse-transcribed from mRNA. To normalize SLC26A9 expression across samples, we used HPRT1 and TBP, two housekeeping genes that are expressed at similar levels among individuals and relatively evenly across tissue types, including the tissues of interest (pancreas, lung and stomach).

Figure 3.1 RNA sequencing data predict three isoforms of SLC26A9 with varying levels of expression in different human tissues. Reanalysis of data from the Human Protein Atlas. Isoform 1 is the predominant mRNA transcript and is expressed in the salivary gland, skin, stomach, lung, and pancreas. Isoform 2 is present in salivary gland, and isoform 3 in lung, skin and sweat gland, at very low levels.


Figure 3.2 SLC26A9 expression profile in human stomach tissues (N=20). 3’ RT-PCR analysis was performed on cDNA (500 ng/uL) using SLC26A9-specific primers spanning exon 19 to exon 21. Products were resolved on a 1% agarose gel stained with EtBr to visualize the expected 588 bp fragment for SLC26A9 isoform 1 (upper gel). Eight of the 20 (40%) stomach samples show a single amplification product. No isoform 2/3 product was observed at the expected size of 411 bp. TATA box binding protein (TBP) was amplified as a control (bottom gel), with an expected size of 150 bp for cDNA (797 bp for gDNA). No-RT and water lanes were used as negative controls.


Figure 3.3 SLC26A9 expression profile in human lung tissue (N=25). 3’ RT-PCR analysis was performed on cDNAs (500 ng/uL) using SLC26A9-specific primers spanning exon 20 to exon 21. Products were resolved on a 2% agarose gel stained with EtBr to visualize the expected 360 bp fragment for SLC26A9 isoform 1 (upper gel). Twenty of the 25 (80%) lung samples show a single amplification product. No isoform 2 or 3 product was observed at the expected size of 183 bp. Fragments at ~180 bp in samples L14, L29, L46 and L44 were determined to be nonspecific amplification products. TATA box binding protein (TBP) was amplified as a control (bottom gel), with an expected size of 150 bp for cDNA (797 bp for gDNA). No-RT and water lanes were used as negative controls.


Figure 3.4 SLC26A9 expression profile in human pancreas tissue (N=25). 3’ RT-PCR analysis was performed on cDNAs (1000 ng/uL) using SLC26A9-specific primers spanning exon 20 to exon 21. Products were resolved on a 2% agarose gel stained with EtBr to visualize the expected 360 bp fragment for SLC26A9 isoform 1 (upper gel). Sixteen of the 25 (64%) pancreas samples show amplification of the product. No isoform 2 or 3 product was observed at the expected size of 183 bp. Fragments at ~180 bp in P20, P37 and P40 were gel excised, purified and sequenced, and were determined to be nonspecific amplification products. TATA box binding protein (TBP) was amplified as a control (bottom gel), with an expected size of 150 bp for cDNA (797 bp for gDNA). No-RT and water lanes were used as negative controls.


Figure 3.5 Gene expression of SLC26A9, CFTR, PM20D1 and housekeeping genes in the lung, salivary gland, pancreas, stomach and skin. Gene expression data extracted from the Genotype-Tissue Expression (GTEx) portal are plotted on the y-axis as log-transformed RPKM+1 values (reads per kilobase of transcript per million mapped reads, plus 1). One was added to each RPKM value so that values between 0 and 1 remain non-negative after log transformation. The two hinges are the first and third quartiles, the horizontal line indicates the median, and the vertical lines indicate the maximum and minimum values, excluding the outliers shown as circles.
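The transformation and box-plot summary described for Figure 3.5 can be sketched in Python. The caption does not state the logarithm base or the outlier rule, so this sketch assumes log10 and the common 1.5 × IQR whisker convention with R-style (type 7, "inclusive") quartiles; the function names are hypothetical.

```python
from math import log10
from statistics import median, quantiles


def log_rpkm_plus_one(rpkm_values):
    """log10(RPKM + 1): an RPKM of 0 maps to 0, and no value becomes negative."""
    return [log10(x + 1) for x in rpkm_values]


def box_summary(values):
    """Quartile hinges, median, whiskers and outliers (1.5 x IQR rule, assumed)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inliers = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "whisker_low": min(inliers),
        "whisker_high": max(inliers),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


print(log_rpkm_plus_one([0, 9, 99]))  # [0.0, 1.0, 2.0]
```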


Figure 3.6 SLC26A9 quantity relative to HPRT1 and TBP in human pancreas tissue.

Analysis of SLC26A9 gene expression relative to the housekeeping genes HPRT1 and TBP in pancreas tissue using qPCR and the 2^−ΔΔCT method (N=25). The haplotype of each sample, confirmed by Sanger sequencing of gDNA, is represented by bar color: HR-HR in red, HR-LR (HET) in gray and LR-LR in blue. (A) SLC26A9 quantity in pancreas tissue relative to HPRT1 and (B) to TBP for each sample, sorted in ascending order. (C) SLC26A9 quantity relative to HPRT1 and (D) to TBP in pancreas tissue, grouped by haplotype (LR/LR, LR/HR, HR/HR).

[Figure 3.6 plots: y-axis, SLC26A9 quantity relative to HPRT1 (A, C) or TBP (B, D); x-axis, individual pancreas samples (A, B) or haplotype groups LR/LR, LR/HR, HR/HR (C, D)]

Figure 3.7 SLC26A9 quantity relative to HPRT1 and TBP in human stomach tissue.

Analysis of SLC26A9 gene expression relative to the housekeeping genes HPRT1 and TBP in stomach tissue using qPCR and the 2^−ΔΔCT method (N=17). The haplotype of each sample, confirmed by Sanger sequencing of gDNA, is represented by bar color: HR-HR in red, HR-LR (HET) in gray and LR-LR in blue. (A) SLC26A9 quantity in stomach tissue relative to HPRT1 and (B) to TBP for each sample, sorted in ascending order. (C) SLC26A9 quantity relative to HPRT1 and (D) to TBP in stomach tissue, grouped by haplotype (LR/LR, LR/HR, HR/HR).

[Figure 3.7 plots: y-axis, SLC26A9 quantity relative to HPRT1 (A, C) or TBP (B, D); x-axis, individual stomach samples (A, B) or haplotype groups LR/LR, LR/HR, HR/HR (C, D)]

Figure 3.8 SLC26A9 quantity relative to HPRT1 and TBP in human lung tissue.

Analysis of SLC26A9 gene expression relative to the housekeeping genes HPRT1 and TBP in lung tissue using qPCR and the 2^−ΔΔCT method (N=26; note: L13 did not amplify for TBP). The haplotype of each sample, confirmed by Sanger sequencing of gDNA, is represented by bar color: HR-HR in red, HR-LR (HET) in gray and LR-LR in blue. (A) SLC26A9 quantity in lung tissue relative to HPRT1 and (B) to TBP for each sample, sorted in ascending order. (C) SLC26A9 quantity relative to HPRT1 and (D) to TBP in lung tissue, grouped by haplotype (LR/LR, LR/HR, HR/HR).

[Figure 3.8 plots: y-axis, SLC26A9 quantity relative to HPRT1 (A, C) or TBP (B, D); x-axis, individual lung samples (A, B) or haplotype groups LR/LR, LR/HR, HR/HR (C, D)]

Figure 3.9 PM20D1 quantity relative to HPRT1 and TBP in human pancreas tissue.

Analysis of PM20D1 expression relative to the housekeeping genes HPRT1 and TBP in pancreas tissue using qPCR and the 2^−ΔΔCT method. The haplotype of each sample, confirmed by Sanger sequencing of gDNA, is represented by bar color: HR-HR in red, HR-LR (HET) in gray and LR-LR in blue. (A) PM20D1 quantity in pancreas tissue relative to HPRT1 and (B) to TBP for each sample, sorted in ascending order. (C) PM20D1 quantity relative to HPRT1 and (D) to TBP in pancreas tissue, grouped by haplotype (LR/LR, LR/HR, HR/HR).

[Figure 3.9 plots: y-axis, PM20D1 quantity relative to HPRT1 (A, C) or TBP (B, D); x-axis, individual pancreas samples (A, B) or haplotype groups LR/LR, LR/HR, HR/HR (C, D)]

Figure 3.10 Graphical depiction of SLC26A9 coding transcripts. The transcripts are depicted in their native orientation, with 5’ on the right and 3’ on the left. Isoform 1, colored in gold, contains 21 exons, including the 5’ UTR and a PDZ-binding motif in exon 21, which has been shown to mediate interaction with CFTR. Isoform 2 has the same 5’ end as isoform 1, but the PDZ-binding motif has been spliced out. Isoform 3 has alternative splicing at both ends: its 5’ UTR was annotated to begin 2 nt upstream of the translational start and, like isoform 2, it does not contain the PDZ-binding motif.

[Figure 3.10 schematic: exon structures of SLC26A9 isoforms 1, 2 and 3, with the 5’ UTR and the PDZ-binding motif indicated]

Table 3.1 List of SNPs used to identify the haplotype of the gDNA extracted from human tissue samples. Haplotypes have been determined using deep sequencing data of the entire SLC26A9 locus (Figure 3.9). SNPs, followed by their rsIDs, are listed at the far left. The CFRD-associated variants from Blackman et al., 2013 (84) are highlighted in yellow. The major haplotypes that we focused on are listed in columns across the table, with the allele found at each position. Alleles highlighted in red are key in differentiating LR from HR, LR1 from LR2, and HR1 from HR2. Sentinel SNPs allow us to categorize each sample as carrying a high-risk (HR), low-risk (LR) or “other” haplotype. The 7 remaining variants differentiate the sub-haplotypes, i.e. LR1, LR2, HR1 or HR2. Using the combination of these SNPs, we can confidently establish the haplotypes of each sample as HR1/HR1, HR1/LR1, LR1/LR1, Other/Other or any combination of these.

[Table 3.1 body: alleles at each SNP (rows, identified by rsID) for haplotypes LR1, LR2, HR1, HR2 and Other 1 to Other 6 (columns)]

Chapter 4: Conclusions


Cystic fibrosis is a Mendelian autosomal recessive genetic disorder caused by loss-of-function mutations in CFTR. Airway inflammation and chronic infection lead to progressive destruction of the lungs, making lung disease the major cause of morbidity and mortality in CF. Since the discovery of CFTR as the disease-causing gene for CF in 1989, considerable progress has been made in understanding its structure, function, and disease-causing mechanisms. Advances in the development of new drugs and therapies, such as correctors and potentiators, have contributed to a steady increase in the median predicted age of survival over the last 30 years. As individuals with CF live longer, other age-related complications are becoming prominent. Cystic fibrosis-related diabetes (CFRD) is an age-dependent complication that is associated with worse lung function, malnutrition and mortality.

The studies detailed here investigate variants near and within SLC26A9 that Blackman et al., 2013 (84) found to be associated with age-at-onset of CFRD. SLC26A9 is a compelling candidate as a modifier of CF, and multiple lines of evidence have shown its involvement in CF physiology. SLC26A9 encodes an electrogenic chloride/bicarbonate transporter that has been shown by in vitro studies to interact with CFTR in epithelial tissues via its STAS domain and PDZ-binding motif. The constitutive basal chloride conductance generated by SLC26A9 has been shown to depend on CFTR being correctly trafficked to the plasma membrane. Indeed, co-expression with F508del CFTR, a variant that causes a trafficking defect, leads to PDZ motif-sensitive intracellular retention of SLC26A9. CFRD-associated variants in SLC26A9 provided causal evidence for a relationship between exocrine ductal dysfunction (assessed by immunoreactive trypsinogen (IRT) levels at birth) and risk of CFRD. CFRD-associated variants in and near SLC26A9 have also been associated with risk for meconium ileus (MI) in patients with CF, a complication that requires the presence of pancreatic exocrine insufficiency. Slc26a9 knockout mice in a CF disease model show increased mortality due to intestinal obstruction. Collectively, these findings make SLC26A9 a good biological candidate for involvement in basic CF pathophysiology.

Though the etiology of CFRD is poorly understood, insulin secretion in individuals with CF has been observed to diminish with age, most likely due to inflammation and subsequent destruction of pancreatic islet cells. Interestingly, our scRNA-sequencing of a pancreas from an individual with chronic pancreatitis revealed that SLC26A9 is co-expressed with CFTR in ductal cells. Reanalysis of four independent, publicly available scRNA-sequencing datasets also showed that SLC26A9 is expressed in pancreatic ductal cells. This suggests that SLC26A9 expression may regulate pancreatic function in diseases such as CFRD.

Evaluation of the genetic architecture of SLC26A9 by deep sequencing of the entire locus in 762 individuals (all F508del homozygotes) revealed two linkage disequilibrium (LD) blocks. Haplotype testing revealed two common haplotypes (28.4% and 24.2% frequency) within the 5’ LD block that contain the GWAS-identified variants associated with lower and higher risk for CFRD, respectively.

Our sequencing data indicated that CFRD-associated variants flank exon 1 of SLC26A9, both upstream and downstream. We excluded the possibility that a variant in the exons or RNA splice junctions of SLC26A9 is responsible for the modifier effect: no coding SNPs were in LD with either haplotype, and no individual SNP showed greater association with CFRD than the two common haplotypes. Using a test called SKAT-O, all variants were tested for combined association with CFRD within 5 kb windows moved across the entire region in increments of 1,250 bp. This test suggests that CFRD risk is conferred either by a single SNP in high LD with the remainder of the haplotype or by a combination of these non-coding SNPs.
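The windowing scheme used for the SKAT-O analysis (5 kb windows advanced in 1,250 bp steps) can be sketched in Python. Only the tiling and the assignment of variants to windows are shown; the SKAT-O test itself (implemented, for example, in the R SKAT package) is not reproduced, and the function names, region coordinates and variant positions are hypothetical.

```python
def sliding_windows(region_start, region_end, window_bp=5000, step_bp=1250):
    """Half-open windows [start, end) of window_bp, advanced by step_bp, covering the region."""
    windows = []
    start = region_start
    while True:
        end = min(start + window_bp, region_end)
        windows.append((start, end))
        if end >= region_end:  # last window reaches the end of the region
            break
        start += step_bp
    return windows


def variants_by_window(variant_positions, windows):
    """Group variant positions by window; each group would be tested jointly."""
    return {w: [p for p in variant_positions if w[0] <= p < w[1]] for w in windows}


# Hypothetical 10 kb region and variant positions.
wins = sliding_windows(0, 10_000)
print(wins)
print(variants_by_window([100, 4000, 5200, 9999], wins))
```

Because the step is a quarter of the window, each variant away from the region edges is tested in four overlapping windows.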

The annotation of the transcription start site (TSS) is crucial in determining the temporal or spatial expression of SLC26A9, given that exclusion of exon 1 by alternative splicing has been reported. We showed that 5’ RACE products from three unrelated human lung samples, one human stomach sample and one human pancreas sample confirmed that SLC26A9 mRNA transcripts contain exon 1 and that the TSS maps to chr1:205,912,584 (hg19) in all three tissues. One of the three lung samples (4 transcripts) had an alternative TSS beginning at position chr1:205,912,548 (hg19), which is 56 nucleotides upstream of the exon 1/exon 2 junction. We concluded that transcription begins with the first exon of the SLC26A9 gene, which is embedded within the SNPs that form the CFRD risk haplotypes.

The region immediately 5’ of SLC26A9 contains a TATA box (TATAAAC), 29 base pairs upstream of the TSS, and a CCAAT box (GCCAATC), 77 base pairs upstream of the TSS. Annotation of potential regulatory regions encompassing exon 1 of SLC26A9 revealed binding sequences for the transcription factors (TFs) FOS, JUNB, JUND, and FOSL2, which scRNA-sequencing consistently detected in SLC26A9-expressing cells. Therefore, CFRD-associated variants flanking exon 1 may influence SLC26A9 expression by modifying potential promoter activity in the 5’ region of SLC26A9. These findings suggest that the region upstream of SLC26A9 exon 1 may serve as a promoter.
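Locating such elements relative to the TSS is a simple string search. The Python sketch below reports, for each exact occurrence of a motif, how many bases upstream of the TSS its first base lies. It assumes the input is the transcribed-strand sequence written 5’ to 3’ and ending with the base immediately before the TSS (for a minus-strand gene, the reverse complement of the reference sequence); the synthetic example sequence and the function name are hypothetical and serve only to place the two motifs at the distances given in the text.

```python
import re


def motif_distances_upstream(upstream_seq: str, motif: str) -> list:
    """Distances (bp) from the first base of each exact motif occurrence to the TSS.

    upstream_seq: transcribed-strand sequence, 5'->3', ending at the base just
    before the TSS, so its last base lies 1 bp upstream.
    """
    seq = upstream_seq.upper()
    # A zero-width lookahead also finds overlapping occurrences.
    pattern = f"(?={re.escape(motif.upper())})"
    return [len(seq) - m.start() for m in re.finditer(pattern, seq)]


# Synthetic 100 bp example (hypothetical) placing the motifs 77 and 29 bp upstream.
example = "G" * 23 + "GCCAATC" + "G" * 41 + "TATAAAC" + "G" * 22
print(motif_distances_upstream(example, "GCCAATC"))  # [77]
print(motif_distances_upstream(example, "TATAAAC"))  # [29]
```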

Systematic evaluation of DNA fragments derived from the 5’ region of SLC26A9 using a dual firefly/Renilla luciferase reporter system showed 12-19% higher expression in PANC-1 cells (p-value: 6.43E-07) when low-risk variants were present compared to high-risk variants, consistent with eQTL results from GTEx. This study provides evidence that the CFRD-risk SNPs may modulate diabetes risk by altering SLC26A9 expression in pancreatic ductal PANC-1 cells. Quantifying the RNA abundance of SLC26A9, and of the downstream gene PM20D1, in human primary tissues and correlating the findings with CFRD-variant status was an important step to verify the expression studies in PANC-1 cells as well as the eQTL observations. Although no correlation was observed between haplotype and expression of either gene relative to housekeeping genes, there are several possible reasons for this. From scRNA-seq, we know that SLC26A9 is expressed in very specific cell types within different tissues; however, the bulk RNA that we extracted from primary human tissue may contain varying proportions of the relevant cell types or tissue regions. Furthermore, expression levels of both SLC26A9 and PM20D1 may be developmental-stage dependent. Nevertheless, the expression observations in PANC-1 cells were replicated in a second cell line, CFPAC-1, also of pancreatic ductal adenocarcinoma origin. The CFPAC-1 cell line provided an opportunity to test whether the absence of CFTR (as seen in CF) altered the difference in expression between LR and HR constructs. Results showed a trend similar to that in PANC-1 cells: the 2.3 kb fragment bearing the LR haplotype, from two independent transfections of 4 clones per haplotype, yielded 20% higher expression than that bearing the HR haplotype (p-value: 2.0E-03; N=48 for LR, N=47 for HR). From these results, we concluded that CFRD-associated SNPs in the 5’ region act in concert with the basal promoter to alter the expression of SLC26A9 in pancreatic cells. In addition, we determined that SLC26A9 transcripts in pancreas, stomach, and lung retain the PDZ-binding motif, excluding the alternative splicing of the 3’ end predicted by RNA sequencing. Taken together, our findings indicate that a modest increase in SLC26A9 expression in ductal cells of the pancreas delays the age-at-onset of diabetes, thereby suggesting a CFTR-agnostic treatment for a major complication of CF.
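The reporter comparison rests on normalizing each firefly luciferase reading to its co-transfected Renilla control and comparing mean normalized activity between LR and HR constructs. A minimal Python sketch of that arithmetic follows; the readings and function names are hypothetical. The statistical test behind the reported p-values is not reproduced: a t-test such as `scipy.stats.ttest_ind` would be one option, but the text does not state which test was used.

```python
from statistics import mean


def normalized_activity(firefly, renilla):
    """Firefly/Renilla ratio for each well, correcting for transfection efficiency."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and renilla readings must be paired")
    return [f / r for f, r in zip(firefly, renilla)]


def percent_higher(lr_ratios, hr_ratios):
    """How much higher (in %) mean LR activity is than mean HR activity."""
    return 100 * (mean(lr_ratios) / mean(hr_ratios) - 1)


# Hypothetical luminescence readings (arbitrary units), three wells per construct.
lr = normalized_activity([1200, 1150, 1250], [1000, 950, 1050])
hr = normalized_activity([1000, 980, 1020], [1000, 980, 1020])
print(round(percent_higher(lr, hr), 1))
```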

Elucidating the mechanisms of diabetes in CF is essential, because decreasing the risk of developing CFRD may improve lung function and, in turn, survival of individuals with CF. There is growing evidence that alternative pathways for chloride transport can restore anion secretion in CF tissues. This study provides evidence of the effect of GWAS risk variants on a genetic modifier of CF. We postulate that SLC26A9 partially compensates for the loss of CFTR function in the pancreas in individuals with CF, and that modest differences in the level of SLC26A9 expression over the lifetime of an individual with CF affect the age-at-onset of diabetes, suggesting a viable approach to delaying CFRD.

Modifier genes point to pathways that could be targeted for therapeutic intervention and individualized treatment of CF. Lastly, characterization of genetic modifiers of monogenic disorders such as CF may shed light not only on CFRD but also on similar conditions in the general population, such as type 2 diabetes.

References

1. Cystic Fibrosis Foundation. Cystic Fibrosis Foundation Patient Registry Annual Data Report 2017.

2. Rommens JM, Iannuzzi MC, Kerem B, Drumm ML, Melmer G, Dean M, et al. Identification of the cystic fibrosis gene: chromosome walking and jumping. Science. 1989;245(4922):1059-65.

3. Riordan JR, Rommens JM, Kerem B, Alon N, Rozmahel R, Grzelczak Z, et al. Identification of the cystic fibrosis gene: cloning and characterization of complementary DNA. Science. 1989;245(4922):1066-73.

4. Kerem B, Rommens JM, Buchanan JA, Markiewicz D, Cox TK, Chakravarti A, et al. Identification of the cystic fibrosis gene: genetic analysis. Science. 1989;245(4922):1073-80.

5. Clarke LL, and Harline MC. CFTR is required for cAMP inhibition of intestinal Na+ absorption in a cystic fibrosis mouse model. Am J Physiol. 1996;270(2 Pt 1):G259-67.

6. Pilewski JM, and Frizzell RA. Role of CFTR in airway disease. Physiol Rev. 1999;79(1 Suppl):S215-55.

7. Zielenski J. Genotype and phenotype in cystic fibrosis. Respiration. 2000;67(2):117-33.

8. Davies JC, Alton EW, and Bush A. Cystic fibrosis. BMJ. 2007;335(7632):1255-9.

9. Boucher RC. Airway surface dehydration in cystic fibrosis: pathogenesis and therapy. Annu Rev Med. 2007;58:157-70.

10. Mall MA. Role of cilia, mucus, and airway surface liquid in mucociliary dysfunction: lessons from mouse models. J Aerosol Med Pulm Drug Deliv. 2008;21(1):13-24.

11. Milla CE, and Warwick WJ. Risk of death in cystic fibrosis patients with severely compromised lung function. Chest. 1998;113(5):1230-4.

12. Rowe SM, Miller S, and Sorscher EJ. Cystic fibrosis. N Engl J Med. 2005;352(19):1992-2001.

13. Cutting GR. Modifier genetics: cystic fibrosis. Annu Rev Genomics Hum Genet. 2005;6:237-60.

14. Zemanick ET, Harris JK, Wagner BD, Robertson CE, Sagel SD, Stevens MJ, et al. Inflammation and airway microbiota during cystic fibrosis pulmonary exacerbations. PLoS One. 2013;8(4):e62917.

15. Jones AM, Dodd ME, Govan JR, Barcus V, Doherty CJ, Morris J, et al. Burkholderia cenocepacia and Burkholderia multivorans: influence on survival in cystic fibrosis. Thorax. 2004;59(11):948-51.

16. Pandol SJ. The Exocrine Pancreas. 2010.

17. Quinton PM. Cystic fibrosis: impaired bicarbonate secretion and mucoviscidosis. Lancet. 2008;372(9636):415-7.

18. Marino CR, Matovcik LM, Gorelick FS, and Cohn JA. Localization of the cystic fibrosis transmembrane conductance regulator in pancreas. J Clin Invest. 1991;88(2):712-6.

19. Kessler L, and Abély M. [Pancreatic infringement exocrine and endocrine in cystic fibrosis]. Arch Pediatr. 2016;23(12S):12S21-12S32.

20. Coulier B. Pancreatic Lipomatosis: An Extensive Pictorial Review. J Belg Soc Radiol. 2016;100(1):39.

21. Gibson-Corley KN, Meyerholz DK, and Engelhardt JF. Pancreatic pathophysiology in cystic fibrosis. J Pathol. 2016;238(2):311-20.

22. Cutting GR. Modifier genes in Mendelian disorders: the example of cystic fibrosis. Ann N Y Acad Sci. 2010;1214:57-69.

23. Witt H. Chronic pancreatitis and cystic fibrosis. Gut. 2003;52 Suppl 2:ii31-41.

24. Crossley JR, Elliott RB, and Smith PA. Dried-blood spot screening for cystic fibrosis in the newborn. Lancet. 1979;1(8114):472-4.

25. Heeley AF, Heeley ME, King DN, Kuzemko JA, and Walsh MP. Screening for cystic fibrosis by dried blood spot trypsin assay. Arch Dis Child. 1982;57(1):18-21.

26. Moran A, Brunzell C, Cohen RC, Katz M, Marshall BC, Onady G, et al. Clinical care guidelines for cystic fibrosis-related diabetes: a position statement of the American Diabetes Association and a clinical practice guideline of the Cystic Fibrosis Foundation, endorsed by the Pediatric Endocrine Society. Diabetes Care. 2010;33(12):2697-708.

27. Lewis C, Blackman SM, Nelson A, Oberdorfer E, Wells D, Dunitz J, et al. Diabetes-related mortality in adults with cystic fibrosis. Role of genotype and sex. Am J Respir Crit Care Med. 2015;191(2):194-200.

28. Milla CE, Warwick WJ, and Moran A. Trends in pulmonary function in patients with cystic fibrosis correlate with the degree of glucose intolerance at baseline. Am J Respir Crit Care Med. 2000;162(3 Pt 1):891-5.

29. Quinton PM. Cystic fibrosis: lessons from the sweat gland. Physiology (Bethesda). 2007;22:212-25.

30. Cui CY, and Schlessinger D. Eccrine sweat gland development and sweat secretion. Exp Dermatol. 2015;24(9):644-50.

31. Reddy MM, Light MJ, and Quinton PM. Activation of the epithelial Na+ channel (ENaC) requires CFTR Cl- channel function. Nature. 1999;402(6759):301-4.

32. Reddy MM, and Quinton PM. Functional interaction of CFTR and ENaC in sweat glands. Pflugers Arch. 2003;445(4):499-503.

33. Sathe M, and Houwen R. Meconium ileus in Cystic Fibrosis. J Cyst Fibros. 2017;16 Suppl 2:S32-S9.

34. Stoltz DA, Rokhlina T, Ernst SE, Pezzulo AA, Ostedgaard LS, Karp PH, et al. Intestinal CFTR expression alleviates meconium ileus in cystic fibrosis pigs. J Clin Invest. 2013;123(6):2685-93.

35. Rescorla FJ, and Grosfeld JL. Contemporary management of meconium ileus. World J Surg. 1993;17(3):318-25.

36. Kerem E, Corey M, Kerem B, Durie P, Tsui L, and Levison H. J Pediatr. 1989.

37. Zielenski J, Corey M, Rozmahel R, Markiewicz D, Aznarez I, Casals T, et al. Detection of a cystic fibrosis modifier locus for meconium ileus on human chromosome 19q13. Nat Genet. 1999;22(2):128-9.

38. Sun L, Rommens JM, Corvol H, Li W, Li X, Chiang TA, et al. Multiple apical plasma membrane constituents are associated with susceptibility to meconium ileus in individuals with cystic fibrosis. Nat Genet. 2012;44(5):562-9.

39. Houwen RH, van der Doef HP, Sermet I, Munck A, Hauser B, Walkowiak J, et al. Defining DIOS and constipation in cystic fibrosis with a multicentre study on the incidence, characteristics, and treatment of DIOS. J Pediatr Gastroenterol Nutr. 2010;50(1):38-42.

40. Brown NJ, Read NW, Richardson A, Rumsey RD, and Bogentoft C. Characteristics of lipid substances activating the ileal brake in the rat. Gut. 1990;31(10):1126-9.

41. Colombo C, Ellemunter H, Houwen R, Munck A, Taylor C, Wilschanski M, et al. Guidelines for the diagnosis and management of distal intestinal obstruction syndrome in cystic fibrosis patients. J Cyst Fibros. 2011;10 Suppl 2:S24-8.

42. Duthie A, Doherty DG, Williams C, Scott-Jupp R, Warner JO, Tanner MS, et al. Genotype analysis for delta F508, G551D and R553X mutations in children and young adults with cystic fibrosis with and without chronic liver disease. Hepatology. 1992;15(4):660-4.

43. Colombo C, Apostolo MG, Ferrari M, Seia M, Genoni S, Giunta A, et al. Analysis of risk factors for the development of liver disease associated with cystic fibrosis. J Pediatr. 1994;124(3):393-9.

44. Waters DL, Dorney SF, Gruca MA, Martin HC, Howman-Giles R, Kan AE, et al. Hepatobiliary disease in cystic fibrosis patients with pancreatic sufficiency. Hepatology. 1995;21(4):963-9.

45. Duthie A, Doherty DG, Donaldson PT, Scott-Jupp R, Tanner MS, Eddleston AL, et al. The major histocompatibility complex influences the development of chronic liver disease in male children and young adults with cystic fibrosis. J Hepatol. 1995;23(5):532-7.

46. Gabolde M, Hubert D, Guilloud-Bataille M, Lenaerts C, Feingold J, and Besmond C. The mannose binding lectin gene influences the severity of chronic liver disease in cystic fibrosis. J Med Genet. 2001;38(5):310-1.

47. Daudin M, Bieth E, Bujan L, Massat G, Pontonnier F, and Mieusset R. Congenital bilateral absence of the vas deferens: clinical characteristics, biological parameters, cystic fibrosis transmembrane conductance regulator gene mutations, and implications for genetic counseling. Fertil Steril. 2000;74(6):1164-74.

48. Jarzabek K, Zbucka M, Pepiński W, Szamatowicz J, Domitrz J, Janica J, et al. Cystic fibrosis as a cause of infertility. Reprod Biol. 2004;4(2):119-29.

49. Dohle GR, Veeze HJ, Overbeek SE, van den Ouweland AM, Halley DJ, Weber RF, et al. The complex relationships between cystic fibrosis and congenital bilateral absence of the vas deferens: clinical, electrophysiological and genetic data. Hum Reprod. 1999;14(2):371-4.

50. Chillón M, Casals T, Mercier B, Bassas L, Lissens W, Silber S, et al. Mutations in the cystic fibrosis gene in patients with congenital absence of the vas deferens. N Engl J Med. 1995;332(22):1475-80.

51. Anguiano A, Oates RD, Amos JA, Dean M, Gerrard B, Stewart C, et al. Congenital bilateral absence of the vas deferens. A primarily genital form of cystic fibrosis. JAMA. 1992;267(13):1794-7.

52. Clancy JP, and Jain M. Personalized medicine in cystic fibrosis: dawning of a new era. Am J Respir Crit Care Med. 2012;186(7):593-7.

53. Dodge JA, Lewis PA, Stanton M, and Wilsher J. Cystic fibrosis mortality and survival in the UK: 1947-2003. Eur Respir J. 2007;29(3):522-6.

54. Levy L, Durie P, Pencharz P, and Corey M. Prognostic factors associated with patient survival during nutritional rehabilitation in malnourished children and adolescents with cystic fibrosis. J Pediatr Gastroenterol Nutr. 1986;5(1):97-102.

55. Hazinski TA. Intermittent administration of inhaled tobramycin in patients with cystic fibrosis. J Pediatr. 1999;135(1):130.

56. Ramsey BW, Pepe MS, Quan JM, Otto KL, Montgomery AB, Williams-Warren J, et al. Intermittent administration of inhaled tobramycin in patients with cystic fibrosis. Cystic Fibrosis Inhaled Tobramycin Study Group. N Engl J Med. 1999;340(1):23-30.

57. Saiman L, Marshall BC, Mayer-Hamblett N, Burns JL, Quittner AL, Cibene DA, et al. Azithromycin in patients with cystic fibrosis chronically infected with Pseudomonas aeruginosa: a randomized controlled trial. JAMA. 2003;290(13):1749-56.

58. Southern KW, Clancy JP, and Ranganathan S. Aerosolized agents for airway clearance in cystic fibrosis. Pediatr Pulmonol. 2019.

59. Fuchs HJ, Borowitz DS, Christiansen DH, Morris EM, Nash ML, Ramsey BW, et al. Effect of aerosolized recombinant human DNase on exacerbations of respiratory symptoms and on pulmonary function in patients with cystic fibrosis. The Pulmozyme Study Group. N Engl J Med. 1994;331(10):637-42.

60. Mall MA, and Galietta LJ. Targeting ion channels in cystic fibrosis. J Cyst Fibros. 2015;14(5):561-70.

61. Moran A, Becker D, Casella SJ, Gottlieb PA, Kirkman MS, Marshall BC, et al. Epidemiology, pathophysiology, and prognostic implications of cystic fibrosis-related diabetes: a technical review. Diabetes Care. 2010;33(12):2677-83.

62. Burns JL, Van Dalfsen JM, Shawar RM, Otto KL, Garber RL, Quan JM, et al. Effect of chronic intermittent administration of inhaled tobramycin on respiratory microbial flora in patients with cystic fibrosis. J Infect Dis. 1999;179(5):1190-6.

63. Ratjen F. Changes in strategies for optimal antibacterial therapy in cystic fibrosis. Int J Antimicrob Agents. 2001;17(2):93-6.

64. Ramsey BW, Davies J, McElvaney NG, Tullis E, Bell SC, Drevinek P, et al. A CFTR potentiator in patients with cystic fibrosis and the G551D mutation. N Engl J Med. 2011;365(18):1663-72.

65. Van Goor F, Hadida S, Grootenhuis PD, Burton B, Stack JH, Straley KS, et al. Correction of the F508del-CFTR protein processing defect in vitro by the investigational drug VX-809. Proc Natl Acad Sci U S A. 2011;108(46):18843-8.

66. Yu H, Burton B, Huang CJ, Worley J, Cao D, Johnson JP, Jr., et al. Ivacaftor potentiation of multiple CFTR channels with gating mutations. J Cyst Fibros. 2012;11(3):237-45.

67. Davies JC, Wainwright CE, Canny GJ, Chilvers MA, Howenstine MS, Munck A, et al. Efficacy and safety of ivacaftor in patients aged 6 to 11 years with cystic fibrosis with a G551D mutation. Am J Respir Crit Care Med. 2013;187(11):1219-25.

68. De Boeck K, Munck A, Walker S, Faro A, Hiatt P, Gilmartin G, et al. Efficacy and safety of ivacaftor in patients with cystic fibrosis and a non-G551D gating mutation. J Cyst Fibros. 2014;13(6):674-80.

69. McKone EF, Borowitz D, Drevinek P, Griese M, Konstan MW, Wainwright C, et al. Long-term safety and efficacy of ivacaftor in patients with cystic fibrosis who have the Gly551Asp-CFTR mutation: a phase 3, open-label extension study (PERSIST). Lancet Respir Med. 2014;2(11):902-10.

70. Bernarde C, Keravec M, Mounier J, Gouriou S, Rault G, Ferec C, et al. Impact of the CFTR-potentiator ivacaftor on airway microbiota in cystic fibrosis patients carrying a G551D mutation. PLoS One. 2015;10(4):e0124124.

71. Konstan MW, McKone EF, Moss RB, Marigowda G, Tian S, Waltz D, et al. Assessment of safety and efficacy of long-term treatment with combination lumacaftor and ivacaftor therapy in patients with cystic fibrosis homozygous for the F508del-CFTR mutation (PROGRESS): a phase 3, extension study. Lancet Respir Med. 2017;5(2):107-18.

72. Boyle MP, Bell SC, Konstan MW, McColley SA, Rowe SM, Rietschel E, et al. A CFTR corrector (lumacaftor) and a CFTR potentiator (ivacaftor) for treatment of patients with cystic fibrosis who have a phe508del CFTR mutation: a phase 2 randomised controlled trial. Lancet Respir Med. 2014;2(7):527-38.

73. Rehman A, Baloch NU, and Janahi IA. Lumacaftor-Ivacaftor in Patients with Cystic Fibrosis Homozygous for Phe508del CFTR. N Engl J Med. 2015;373(18):1783.

74. Davies JC, Moskowitz SM, Brown C, Horsley A, Mall MA, McKone EF, et al. VX-659-Tezacaftor-Ivacaftor in Patients with Cystic Fibrosis and One or Two Phe508del Alleles. N Engl J Med. 2018;379(17):1599-611.

75. Keating D, Marigowda G, Burr L, Daines C, Mall MA, McKone EF, et al. VX-445-Tezacaftor-Ivacaftor in Patients with Cystic Fibrosis and One or Two Phe508del Alleles. N Engl J Med. 2018;379(17):1612-20.

76. Kayani K, Mohammed R, and Mohiaddin H. Cystic Fibrosis-Related Diabetes. Front Endocrinol (Lausanne). 2018;9:20.

77. Konstan MW, Hilliard KA, Norvell TM, and Berger M. Bronchoalveolar lavage findings in cystic fibrosis patients with stable, clinically mild lung disease suggest ongoing infection and inflammation. Am J Respir Crit Care Med. 1994;150(2):448-54.

78. Lange P, Groth S, Kastrup J, Mortensen J, Appleyard M, Nyboe J, et al. Diabetes mellitus, plasma glucose and lung function in a cross-sectional population study. Eur Respir J. 1989;2(1):14-9.

79. Schwarzenberg SJ, Thomas W, Olsen TW, Grover T, Walk D, Milla C, et al. Microvascular complications in cystic fibrosis-related diabetes. Diabetes Care. 2007;30(5):1056-61.

80. Gottlieb PA, Yu L, Babu S, Wenzlau J, Bellin M, Frohnert BI, et al. No relation between cystic fibrosis-related diabetes and type 1 diabetes autoimmunity. Diabetes Care. 2012;35(8):e57.

81. Hull RL, Westermark GT, Westermark P, and Kahn SE. Islet amyloid: a critical entity in the pathogenesis of type 2 diabetes. J Clin Endocrinol Metab. 2004;89(8):3629-43.

82. Couce M, O'Brien TD, Moran A, Roche PC, and Butler PC. Diabetes mellitus in cystic fibrosis is characterized by islet amyloidosis. J Clin Endocrinol Metab. 1996;81(3):1267-72.

83. Dobson L, Sheldon CD, and Hattersley AT. Understanding cystic-fibrosis-related diabetes: best thought of as insulin deficiency? J R Soc Med. 2004;97 Suppl 44:26-35.

84. Blackman SM, Commander CW, Watson C, Arcara KM, Strug LJ, Stonebraker JR, et al. Genetic modifiers of cystic fibrosis-related diabetes. Diabetes. 2013;62(10):3627-35.

85. Hart NJ, Aramandla R, Poffenberger G, Fayolle C, Thames AH, Bautista A, et al. Cystic fibrosis-related diabetes is caused by islet loss and inflammation. JCI Insight. 2018;3(8).

86. Boom A, Lybaert P, Pollet JF, Jacobs P, Jijakli H, Golstein PE, et al. Expression and localization of cystic fibrosis transmembrane conductance regulator in the rat endocrine pancreas. Endocrine. 2007;32(2):197-205.

87. Huang WQ, Guo JH, Zhang XH, Yu MK, Chung YW, Ruan YC, et al. Glucose-Sensitive CFTR Suppresses Glucagon Secretion by Potentiating KATP Channels in Pancreatic Islet α Cells. Endocrinology. 2017;158(10):3188-99.

88. Norris AW, Ode KL, Merjaneh L, Sanda S, Yi Y, Sun X, et al. Survival in a bad neighborhood: pancreatic islets in cystic fibrosis. J Endocrinol. 2019.

89. Thomassen JC, Mueller MI, Alejandre Alcazar MA, Rietschel E, and van Koningsbruggen-Rietschel S. Effect of Lumacaftor/Ivacaftor on glucose metabolism and insulin secretion in Phe508del homozygous cystic fibrosis patients. J Cyst Fibros. 2018;17(2):271-5.

90. Kelly A, De Leon DD, Sheikh S, Camburn D, Kubrak C, Peleckis AJ, et al. Islet Hormone and Incretin Secretion in Cystic Fibrosis after Four Months of Ivacaftor Therapy. Am J Respir Crit Care Med. 2019;199(3):342-51.

91. Barry PJ, Banerjee A, Horsley A, and Brennan AL. 182 Impact of ivacaftor on glycaemic health in patients carrying the G551D mutation. J Cyst Fibros. 2015;14:S104.

92. Kirwan L, Fletcher G, Harrington M, Jeleniewska P, Zhou S, Casserly B, et al. Longitudinal Trends in Real-World Outcomes after Initiation of Ivacaftor. A Cohort Study from the Cystic Fibrosis Registry of Ireland. Ann Am Thorac Soc. 2019;16(2):209-16.

93. Bellin MD, Laguna T, Leschyshyn J, Regelmann W, Dunitz J, Billings J, et al. Insulin secretion improves in cystic fibrosis following ivacaftor correction of CFTR: a small pilot study. Pediatr Diabetes. 2013;14(6):417-21.

94. Segerstolpe Å, Palasantza A, Eliasson P, Andersson EM, Andréasson AC, Sun X, et al. Single-Cell Transcriptome Profiling of Human Pancreatic Islets in Health and Type 2 Diabetes. Cell Metab. 2016;24(4):593-607.

95. Baron M, Veres A, Wolock SL, Faust AL, Gaujoux R, Vetere A, et al. A Single-Cell Transcriptomic Map of the Human and Mouse Pancreas Reveals Inter- and Intra-cell Population Structure. Cell Syst. 2016;3(4):346-60.e4.

96. Miller MR, Soave D, Li W, Gong J, Pace RG, Boëlle PY, et al. Variants in Solute Carrier SLC26A9 Modify Prenatal Exocrine Pancreatic Damage in Cystic Fibrosis. J Pediatr. 2015;166(5):1152-7.e6.

97. Soave D, Miller MR, Keenan K, Li W, Gong J, Ip W, et al. Evidence for a causal relationship between early exocrine pancreatic disease and cystic fibrosis-related diabetes: a Mendelian randomization study. Diabetes. 2014;63(6):2114-9.

98. Hardt PD, Krauss A, Bretz L, Porsch-Ozcürümez M, Schnell-Kretschmer H, Mäser E, et al. Pancreatic exocrine function in patients with type 1 and type 2 diabetes mellitus. Acta Diabetol. 2000;37(3):105-10.

99. Nunes AC, Pontes JM, Rosa A, Gomes L, Carvalheiro M, and Freitas D. Screening for pancreatic exocrine insufficiency in patients with diabetes mellitus. Am J Gastroenterol. 2003;98(12):2672-5.

100. Raeder H, Johansson S, Holm PI, Haldorsen IS, Mas E, Sbarra V, et al. Mutations in the CEL VNTR cause a syndrome of diabetes and pancreatic exocrine dysfunction. Nat Genet. 2006;38(1):54-62.

101. Johansson BB, Fjeld K, El Jellas K, Gravdal A, Dalva M, Tjora E, et al. The role of the carboxyl ester lipase (CEL) gene in pancreatic disease. Pancreatology. 2018;18(1):12-9.

102. Bismuth E, Laborde K, Taupin P, Velho G, Ribault V, Jennane F, et al. Glucose tolerance and insulin secretion, morbidity, and death in patients with cystic fibrosis. J Pediatr. 2008;152(4):540-5, 5.e1.

103. Peckham D. Routine screening for cystic fibrosis-related diabetes. J R Soc Med. 2009;102 Suppl 1:36-9.

104. Martín-Frías M, Máiz L, Carcavilla A, and Barrio R. Long-term benefits in lung function and nutritional status of strict metabolic control of cystic fibrosis-related diabetes. Arch Bronconeumol. 2011;47(10):531-4.

105. Bridges N, Rowe R, and Holt RIG. Unique challenges of cystic fibrosis-related diabetes. Diabet Med. 2018.

106. Blackman SM, Hsu S, Vanscoy LL, Collaco JM, Ritter SE, Naughton K, et al. Genetic modifiers play a substantial role in diabetes complicating cystic fibrosis. J Clin Endocrinol Metab. 2009;94(4):1302-9.

107. Mekus F, Ballmann M, Bronsveld I, Bijman J, Veeze H, and Tümmler B. Categories of deltaF508 homozygous cystic fibrosis twin and sibling pairs with distinct phenotypic characteristics. Twin Res. 2000;3(4):277-93.

108. Blackman SM, Hsu S, Ritter SE, Naughton KM, Wright FA, Drumm ML, et al. A susceptibility gene for type 2 diabetes confers substantial risk for diabetes complicating cystic fibrosis. Diabetologia. 2009;52(9):1858-65.

109. GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat Genet. 2013;45(6):580-5.

110. Lohi H, Kujala M, Makela S, Lehtonen E, Kestila M, Saarialho-Kere U, et al. Functional characterization of three novel tissue-specific anion exchangers SLC26A7, -A8, and -A9. J Biol Chem. 2002;277(16):14246-54.

111. Dorwart MR, Shcheynikov N, Wang Y, Stippec S, and Muallem S. SLC26A9 is a Cl(-) channel regulated by the WNK kinases. J Physiol. 2007;584(Pt 1):333-45.

112. Loriol C, Dulong S, Avella M, Gabillat N, Boulukos K, Borgese F, et al. Characterization of SLC26A9, facilitation of Cl(-) transport by bicarbonate. Cell Physiol Biochem. 2008;22(1-4):15-30.

113. Chang MH, Plata C, Sindic A, Ranatunga WK, Chen AP, Zandi-Nejad K, et al. Slc26a9 is inhibited by the R-region of the cystic fibrosis transmembrane conductance regulator via the STAS domain. J Biol Chem. 2009;284(41):28306-18.

114. Salomon JJ, Spahn S, Wang X, Füllekrug J, Bertrand CA, and Mall MA. Generation and functional characterization of epithelial cells with stable expression of SLC26A9 Cl- channels. Am J Physiol Lung Cell Mol Physiol. 2016;310(7):L593-602.

115. Walter JD, Sawicka M, and Dutzler R. Cryo-EM structures and functional characterization of murine Slc26a9 reveal mechanism of uncoupled chloride transport. Elife. 2019;8.

116. Geertsma ER, Chang YN, Shaik FR, Neldner Y, Pardon E, Steyaert J, et al. Structure of a prokaryotic fumarate transporter reveals the architecture of the SLC26 family. Nat Struct Mol Biol. 2015;22(10):803-8.

117. Xu J, Song P, Miller ML, Borgese F, Barone S, Riederer B, et al. Deletion of the chloride transporter Slc26a9 causes loss of tubulovesicles in parietal cells and impairs acid secretion in the stomach. Proc Natl Acad Sci U S A. 2008;105(46):17955-60.

118. Liu X, Li T, Riederer B, Lenzen H, Ludolph L, Yeruva S, et al. Loss of Slc26a9 anion transporter alters intestinal electrolyte and HCO3(-) transport and reduces survival in CFTR-deficient mice. Pflugers Arch. 2015;467(6):1261-75.

119. Amlal H, Xu J, Barone S, Zahedi K, and Soleimani M. The chloride channel/transporter Slc26a9 regulates the systemic arterial pressure and renal chloride excretion. J Mol Med (Berl). 2013;91(5):561-72.

120. Balázs A, and Mall MA. Role of the SLC26A9 Chloride Channel as Disease Modifier and Potential Therapeutic Target in Cystic Fibrosis. Front Pharmacol. 2018;9:1112.

121. Bertrand CA, Zhang R, Pilewski JM, and Frizzell RA. SLC26A9 is a constitutively active, CFTR-regulated anion conductance in human bronchial epithelia. J Gen Physiol. 2009;133(4):421-38.

122. Anagnostopoulou P, Riederer B, Duerr J, Michel S, Binia A, Agrawal R, et al. SLC26A9-mediated chloride secretion prevents mucus obstruction in airway inflammation. J Clin Invest. 2012;122(10):3629-34.

123. Salomon JJ, Buechner PM, Spahn S, Wagner W, and Mall MA. Am J Respir Crit Care Med. 2019.

124. Bakouh N, Bienvenu T, Thomas A, Ehrenfeld J, Liote H, Roussel D, et al. Characterization of SLC26A9 in patients with CF-like lung disease. Hum Mutat. 2013;34(10):1404-14.

125. Strug LJ, Gonska T, He G, Keenan K, Ip W, Boëlle PY, et al. Cystic fibrosis gene modifier SLC26A9 modulates airway response to CFTR-directed therapeutics. Hum Mol Genet. 2016;25(20):4590-600.

126. Kmit A, Marson FAL, Pereira SV, Vinagre AM, Leite GS, Servidoni MF, et al. Extent of rescue of F508del-CFTR function by VX-809 and VX-770 in human nasal epithelial cells correlates with SNP rs7512462 in SLC26A9 gene in F508del/F508del Cystic Fibrosis patients. Biochim Biophys Acta Mol Basis Dis. 2019.

127. Xu J, Henriksnäs J, Barone S, Witte D, Shull GE, Forte JG, et al. SLC26A9 is expressed in gastric surface epithelial cells, mediates Cl-/HCO3- exchange, and is inhibited by NH4+. Am J Physiol Cell Physiol. 2005;289(2):C493-505.

128. Lee HJ, Yoo JE, Namkung W, Cho HJ, Kim K, Kang JW, et al. Thick airway surface liquid volume and weak mucin expression in -deficient human airway epithelia. Physiol Rep. 2015;3(8).

129. El Khouri E, and Touré A. Functional interaction of the cystic fibrosis transmembrane conductance regulator with members of the SLC26 family of anion transporters (SLC26A8 and SLC26A9): physiological and pathophysiological relevance. Int J Biochem Cell Biol. 2014;52:58-67.

130. Ousingsawat J, Schreiber R, and Kunzelmann K. Differential contribution of SLC26A9 to Cl(-) conductance in polarized and non-polarized epithelial cells. J Cell Physiol. 2012;227(6):2323-9.

131. Gong J, Wang F, Xiao B, Panjwani N, Lin F, Keenan K, et al. Genetic association and transcriptome integration identify contributing genes and tissues at cystic fibrosis modifier loci. PLoS Genet. 2019;15(2):e1008007.

132. Blackman SM, Deering-Brose R, McWilliams R, Naughton K, Coleman B, Lai T, et al. Relative contribution of genetic and nongenetic modifiers to intestinal obstruction in cystic fibrosis. Gastroenterology. 2006;131(4):1030-9.

133. van den Bergen JC, Hiller M, Böhringer S, Vijfhuizen L, Ginjaar HB, Chaouch A, et al. Validation of genetic modifiers for Duchenne muscular dystrophy: a multicentre study assessing SPP1 and LTBP4 variants. J Neurol Neurosurg Psychiatry. 2015;86(10):1060-5.

134. Moss DJH, Pardiñas AF, Langbehn D, Lo K, Leavitt BR, Roos R, et al. Identification of genetic variants associated with Huntington's disease progression: a genome-wide association study. Lancet Neurol. 2017;16(9):701-11.

135. Tsabari R, Elyashar HI, Cymberknowh MC, Breuer O, Armoni S, Livnat G, et al. CFTR potentiator therapy ameliorates impaired insulin secretion in CF patients with a gating mutation. J Cyst Fibros. 2016;15(3):e25-7.

136. Li A, Vigers T, Pyle L, Zemanick E, Nadeau K, Sagel SD, et al. Continuous glucose monitoring in youth with cystic fibrosis treated with lumacaftor-ivacaftor. J Cyst Fibros. 2019;18(1):144-9.

137. Lesurf R, Cotto KC, Wang G, Griffith M, Kasaian K, Jones SJ, et al. ORegAnno 3.0: a community-driven resource for curated regulatory annotation. Nucleic Acids Res. 2016;44(D1):D126-32.

138. Mathelier A, Zhao X, Zhang AW, Parcy F, Worsley-Hunt R, Arenillas DJ, et al. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res. 2014;42(Database issue):D142-7.

139. Butler A, Hoffman P, Smibert P, Papalexi E, and Satija R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol. 2018;36(5):411-20.

140. Wang YJ, Schug J, Won KJ, Liu C, Naji A, Avrahami D, et al. Single-Cell Transcriptomics of the Human Endocrine Pancreas. Diabetes. 2016;65(10):3028-38.

141. Muraro MJ, Dharmadhikari G, Grün D, Groen N, Dielen T, Jansen E, et al. A Single-Cell Transcriptome Atlas of the Human Pancreas. Cell Syst. 2016;3(4):385-94.e3.

142. Norris AW, Ode KL, Merjaneh L, Sanda S, Yi Y, Sun X, et al. Survival in a bad neighborhood: pancreatic islets in cystic fibrosis. J Endocrinol. 2019;241(1):R35-R50.

143. Sinđić A, Sussman CR, and Romero MF. Primers on molecular pathways: bicarbonate transport by the pancreas. Pancreatology. 2010;10(6):660-3.

144. Alka K, and Casey JR. Bicarbonate transport in health and disease. IUBMB Life. 2014;66(9):596-615.

145. Park HW, and Lee MG. Transepithelial bicarbonate secretion: lessons from the pancreas. Cold Spring Harb Perspect Med. 2012;2(10).

146. Montoro DT, Haber AL, Biton M, Vinarsky V, Lin B, Birket SE, et al. A revised airway epithelial hierarchy includes CFTR-expressing ionocytes. Nature. 2018;560(7718):319-24.

147. Plasschaert LW, Žilionis R, Choo-Wing R, Savova V, Knehr J, Roma G, et al. A single-cell atlas of the airway epithelium reveals the CFTR-rich pulmonary ionocyte. Nature. 2018;560(7718):377-81.

148. Lieber M, Mazzetta J, Nelson-Rees W, Kaplan M, and Todaro G. Establishment of a continuous tumor-cell line (panc-1) from a human carcinoma of the exocrine pancreas. Int J Cancer. 1975;15(5):741-7.

149. Wu Y, Li J, Saleem S, Yee SP, Hardikar AA, and Wang R. c-Kit and stem cell factor regulate PANC-1 cell differentiation into insulin- and glucagon-producing cells. Lab Invest. 2010;90(9):1373-84.

150. Gottschalk LB, Vecchio-Pagan B, Sharma N, Han ST, Franca A, Wohler ES, et al. Creation and characterization of an airway epithelial cell line for stable expression of CFTR variants. J Cyst Fibros. 2016;15(3):285-94.

151. McIntosh JC, Schoumacher RA, and Tiller RE. Pancreatic adenocarcinoma in a patient with cystic fibrosis. Am J Med. 1988;85(4):592.

152. Schoumacher RA, Ram J, Iannuzzi MC, Bradbury NA, Wallace RW, Hon CT, et al. A cystic fibrosis pancreatic adenocarcinoma cell line. Proc Natl Acad Sci U S A. 1990;87(10):4012-6.

153. Bockmühl Y, Murgatroyd CA, Kuczynska A, Adcock IM, Almeida OF, and Spengler D. Differential regulation and function of 5'-untranslated GR-exon 1 transcripts. Mol Endocrinol. 2011;25(7):1100-10.

154. Babendure JR, Babendure JL, Ding JH, and Tsien RY. Control of mammalian translation by mRNA structure near caps. RNA. 2006;12(5):851-61.

155. Bertrand CA, Mitra S, Mishra SK, Wang X, Zhao Y, Pilewski JM, et al. The CFTR trafficking mutation F508del inhibits the constitutive activity of SLC26A9. Am J Physiol Lung Cell Mol Physiol. 2017;312(6):L912-L25.

156. Ghazalpour A, Bennett B, Petyuk VA, Orozco L, Hagopian R, Mungrue IN, et al. Comparative analysis of proteome and transcriptome variation in mouse. PLoS Genet. 2011;7(6):e1001393.

157. Delaspre F, Beer RL, Rovira M, Huang W, Wang G, Gee S, et al. Centroacinar Cells Are Progenitors That Contribute to Endocrine Pancreas Regeneration. Diabetes. 2015;64(10):3499-509.

158. Harris A. The Duct Cell in Cystic Fibrosis. Ann N Y Acad Sci. 2006;30(880):17-30.

159. Beer RL, Parsons MJ, and Rovira M. Centroacinar cells: At the center of pancreas regeneration. Dev Biol. 2016;413(1):8-15.

144 160. Ghaye AP, Bergemann D, Tarifeño-Saldivia E, Flasse LC, Von Berg V, Peers B,

et al. Progenitor potential of nkx6.1-expressing cells throughout zebrafish life and

during beta cell regeneration. BMC Biol. 2015;13:70.

161. Nunes AC, Pontes JM, Rosa A, Gomes L, Carvalheiro M, and Freitas D.

Screening for pancreatic exocrine insufficiency in patients with diabetes mellitus.

Am J Gastroenterol. 2003;98(12):2672-5.

162. Vesely PW, Staber PB, Hoefler G, and Kenner L. Translational regulation

mechanisms of AP-1 proteins. Mutat Res. 2009;682(1):7-12.

163. Karin M, Liu Z, and Zandi E. AP-1 function and regulation. Curr Opin Cell Biol.

1997;9(2):240-6.

164. Gupta MK, and Vadde R. Identification and characterization of differentially

expressed genes in Type 2 Diabetes using in silico approach. Comput Biol Chem.

2019;79:24-35.

165. Li J, Li S, Hu Y, Cao G, Wang S, Rai P, et al. The Expression Level of mRNA,

Protein, and DNA Methylation Status of. J Diabetes Res. 2016;2016:5957404.

166. Huda N, Hosen MI, Yasmin T, Sarkar PK, Hasan AKMM, and Nabi AHMN.

Genetic variation of the transcription factor GATA3, not STAT4, is associated

145 with the risk of type 2 diabetes in the Bangladeshi population. PLoS One.

2018;13(7):e0198507.

167. Gao N, Le Lay J, Qin W, Doliba N, Schug J, Fox AJ, et al. Foxa1 and Foxa2 maintain the metabolic and secretory features of the mature beta-cell. Mol Endocrinol. 2010;24(8):1594-604.

168. Vatamaniuk MZ, Gupta RK, Lantz KA, Doliba NM, Matschinsky FM, and Kaestner KH. Foxa1-deficient mice exhibit impaired insulin secretion due to uncoupled oxidative phosphorylation. Diabetes. 2006;55(10):2730-6.

169. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, et al. Finding the missing heritability of complex diseases. Nature. 2009;461(7265):747-53.

170. Sondo E, Caci E, and Galietta LJ. The TMEM16A chloride channel as an alternative therapeutic target in cystic fibrosis. Int J Biochem Cell Biol. 2014;52:73-6.

171. Muraglia KA, Chorghade RS, Kim BR, Tang XX, Shah VS, Grillo AS, et al. Small-molecule ion channels increase host defences in cystic fibrosis airway epithelia. Nature. 2019;567(7748):405-8.

172. Vecchio-Pagán B, Blackman SM, Lee M, Atalar M, Pellicore MJ, Pace RG, et al. Deep resequencing of CFTR in 762 F508del homozygotes reveals clusters of non-coding variants associated with cystic fibrosis disease traits. Hum Genome Var. 2016;3:16038.

173. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559-75.

174. Pruim RJ, Welch RP, Sanna S, Teslovich TM, Chines PS, Gliedt TP, et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics. 2010;26(18):2336-7.

175. Lee S, Wu MC, and Lin X. Optimal tests for rare variant effects in sequencing association studies. Biostatistics. 2012;13(4):762-75.

176. Langmead B, and Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357-9.

177. Trapnell C, Pachter L, and Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25(9):1105-11.

178. Trapnell C, Hendrickson DG, Sauvageau M, Goff L, Rinn JL, and Pachter L. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat Biotechnol. 2013;31(1):46-53.

179. Behre G, Smith LT, and Tenen DG. Use of a promoterless Renilla luciferase vector as an internal control plasmid for transient co-transfection assays of Ras-mediated transcription activation. Biotechniques. 1999;26(1):24-6, 8.

180. Sherf B, Navarro S, Hannah R, and Wood K. Dual-Luciferase Reporter Assay: An Advanced Co-Reporter Technology Integrating Firefly and Renilla Luciferase Assays. 1996;57:2-8.

181. Zhang F, and Lupski JR. Non-coding genetic variants in human disease. Hum Mol Genet. 2015;24(R1):R102-10.

182. Brodie A, Azaria JR, and Ofran Y. How far from the SNP may the causative genes be? Nucleic Acids Res. 2016;44(13):6046-54.

183. Long JZ, Svensson KJ, Bateman LA, Lin H, Kamenecka T, Lokurkar IA, et al. The Secreted Enzyme PM20D1 Regulates Lipidated Amino Acid Uncouplers of Mitochondria. Cell. 2016;166(2):424-35.

184. Onodera CS, Underwood JG, Katzman S, Jacobs F, Greenberg D, Salama SR, et al. Gene isoform specificity through enhancer-associated antisense transcription. PLoS One. 2012;7(8):e43511.

185. Lenth RV. Statistical power calculations. J Anim Sci. 2007;85(13 Suppl):E24-9.

186. Livak KJ, and Schmittgen TD. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods. 2001;25(4):402-8.

Curriculum Vitae

ANH-THU NGOC LAM
733 N. Broadway, MRB-552, Baltimore, MD 21205
Email: [email protected] | Phone: (617) 648-6318

EDUCATION

JOHNS HOPKINS UNIVERSITY SCHOOL OF MEDICINE (Baltimore, MD) Aug 2014-May 2019
Ph.D. in Human Genetics and Molecular Biology
• Doctoral Thesis: Molecular characterization of SLC26A9 and evaluation of its potential role as a modifier of cystic fibrosis

UNIVERSITY OF VERMONT, CAS, Honors College (Burlington, VT) Aug 2007-Jan 2012
Bachelor of Science in Biochemistry
• Honors Thesis: The role of tyrosine 504 phosphorylation on the regulation of collapsin response mediator protein 1
Bachelor of Arts in Spanish Language and Literature
• The National Spanish Honor Society: Sigma Delta Pi, UVM Chapter; inducted Apr 2011

RESEARCH EXPERIENCES

JOHNS HOPKINS UNIVERSITY SCHOOL OF MEDICINE (Baltimore, MD) Aug 2014-May 2019
• Department of Genetic Medicine
• Mentor: Garry R. Cutting, M.D.
• Investigate genetic modifiers associated with age-at-onset of cystic fibrosis-related diabetes
• Functional evaluation of the effects of noncoding variants on promoter and enhancer activity

BOSTON CHILDREN’S HOSPITAL (Boston, MA) Jun 2012-May 2014
• HHMI, Manton Center for Orphan Disease Research, Department of Genetics, Harvard Medical School
• Mentor: Christopher A. Walsh, M.D., Ph.D.
• Genetic analysis of potentially pathogenic variants in families with children diagnosed with various atypical brain malformations

UNIVERSITY OF VERMONT (Burlington, VT) Aug 2008-Dec 2011
• Department of Biology
• Mentor: Bryan A. Ballif, Ph.D.
• Investigated the phospho-regulation of Collapsin Response Mediator Proteins (CRMPs)

UNIVERSITY OF VERMONT (Burlington, VT) Jun 2010-Aug 2011
• Department of Psychology
• Mentor: Sondra E. Solomon, Ph.D.
• Studied the psychological effects of HIV infection and the stigma experienced by men and women living with HIV

AWARDS/GRANTS

CYSTIC FIBROSIS FOUNDATION RESEARCH GRANT Sep 2017-Sep 2019
• Project: Analysis of SLC26A9 and PM20D1 as modifiers of cystic fibrosis
• Funding: $200,000/year
• Identification Number: CUTTIN17G0

APLE MINI RESEARCH GRANT AWARD Sep 2009-Aug 2010
• Project: Characterizing the role of CRMP1 in neuronal guidance
• APLE awards support expenses involved in credit-bearing undergraduate research.
• Each year, up to twenty-four grants are awarded across all disciplines.

PEER REVIEWED PUBLICATIONS

1. “Increased expression of SLC26A9 delays age at onset of diabetes in cystic fibrosis.” Lam, A. N., M. Atalar, B. Vecchio-Pagan, A. L. Anzmann, J. Shen, L. A. Goff, S. M. Blackman, G. R. Cutting. Under consideration at JCI, 30 April 2019.
2. “Homozygous deletions of non-coding DNA sequences in autism spectrum disorder.” Schmitz-Abe, K., R. Doan, M. Chahrour, J. N. Partlow, B. Barry, B. K. Mehta, S. Servattalab, B. Ataman, A. N. Lam, E. Morrow, M. E. Greenberg, T. W. Yu, C. A. Walsh and K. Markianos. Under review at the European Journal of Human Genetics, 14 April 2018.
3. “Capitalizing on the heterogeneous effects of CFTR nonsense and frameshift variants to inform therapeutic strategy for cystic fibrosis.” Sharma N, Evans TA, Pellicore M, Davis E, Atalar M, McCague A, Joynt A, Lu Z, Han S, Franca A, Lam AN, Thaxton A, West N, Merlo C, Gottschalk L, Raraigh KS, Sosnay PR, Cotton CU, Cutting GR. PLOS Genetics, eCollection 16 November 2018.
4. “Integrated genome and transcriptome sequencing identifies a noncoding mutation in the genome replication factor DONSON as the cause of microcephaly-micromelia syndrome.” Evrony, G. D., D. R. Cordero, J. Shen, J. N. Partlow, T. W. Yu, R. E. Rodin, R. S. Hill, M. E. Coulter, A. N. Lam, D. Jayaraman, D. Gerrelli, D. G. Diaz, C. Santos, V. Morrison, A. Galli, U. Tschulena, S. Wiemann, M. J. Martel, B. Spooner, S. C. Ryu, P. C. Elhosary, J. M. Richardson, D. Tierney, C. A. Robinson, R. Chibbar, D. Diudea, R. Folkerth, S. Wiebe, A. J. Barkovich, G. H. Mochida, J. Irvine, E. G. Lemire, P. Blakley and C. A. Walsh (2017). Genome Res 27(8): 1323-1335.
5. “Systematic Computational Identification of Variants That Activate Exonic and Intronic Cryptic Splice Sites.” Lee, M., P. Roos, N. Sharma, M. Atalar, T. A. Evans, M. J. Pellicore, E. Davis, A. N. Lam, S. E. Stanley, S. E. Khalil, G. M. Solomon, D. Walker, K. S. Raraigh, B. Vecchio-Pagan, M. Armanios and G. R. Cutting (2017). Am J Hum Genet 100(5): 751-765.
6. “Biallelic mutations in human DCC cause developmental split-brain syndrome.” Jamuar, S. S., K. Schmitz-Abe, A. M. D'Gama, M. Drottar, W. M. Chan, M. Peeva, S. Servattalab, A. N. Lam, M. R. Delgado, N. J. Clegg, Z. A. Zayed, M. A. Dogar, I. A. Alorainy, A. A. Jamea, K. Abu-Amero, M. Griebel, W. Ward, E. S. Lein, K. Markianos, A. J. Barkovich, C. D. Robson, P. E. Grant, T. M. Bosley, E. C. Engle, C. A. Walsh, and T. W. Yu. 2017. Nature Genetics 49: 606-612.
7. “Mutations in mitochondrial enzyme GPT2 cause metabolic dysfunction and neurological disease with developmental and progressive features.” Ouyang, Q., T. Nakayama, O. Baytas, S. M. Davidson, C. Yang, M. Schmidt, S. B. Lizarraga, S. Mishra, M. El-Quessny, S. Niaz, M. Gul Butt, S. Imran Murtaza, A. Javed, H. R. Chaudhry, D. J. Vaughan, R. S. Hill, J. N. Partlow, S. Y. Yoo, A. T. Lam, R. Nasir, M. Al-Saffar, A. J. Barkovich, M. Schwede, S. Nagpal, A. Rajab, R. J. DeBerardinis, D. E. Housman, G. H. Mochida, and E. M. Morrow. 2016. Proceedings of the National Academy of Sciences of the United States of America 113: E5598-5607.
8. “Targeted DNA Sequencing from Autism Spectrum Disorder Brains Implicates Multiple Genetic Mechanisms.” D'Gama, A. M., S. Pochareddy, M. Li, S. S. Jamuar, R. E. Reiff, A. N. Lam, N. Sestan, and C. A. Walsh. 2015. Neuron 88: 910-917.
9. “Somatic mutations in cerebral cortical malformations.” Jamuar, S. S., A. T. Lam, M. Kircher, A. M. D'Gama, J. Wang, B. J. Barry, X. Zhang, R. S. Hill, J. N. Partlow, A. Rozzo, S. Servattalab, B. K. Mehta, M. Topcu, D. Amrom, E. Andermann, B. Dan, E. Parrini, R. Guerrini, I. E. Scheffer, S. F. Berkovic, R. J. Leventer, Y. Shen, B. L. Wu, A. J. Barkovich, M. Sahin, B. S. Chang, M. Bamshad, D. A. Nickerson, J. Shendure, A. Poduri, T. W. Yu, and C. A. Walsh. 2014. The New England Journal of Medicine 371: 733-743.
10. “Mutations in QARS, encoding glutaminyl-tRNA synthetase, cause progressive microcephaly, cerebral-cerebellar atrophy, and intractable seizures.” Zhang, X., J. Ling, G. Barcia, L. Jing, J. Wu, B. J. Barry, G. H. Mochida, R. S. Hill, J. M. Weimer, Q. Stein, A. Poduri, J. N. Partlow, D. Ville, O. Dulac, T. W. Yu, A. T. Lam, S. Servattalab, J. Rodriguez, N. Boddaert, A. Munnich, L. Colleaux, L. I. Zon, D. Soll, C. A. Walsh, and R. Nabbout. 2014. Am J Hum Genet 94: 547-558.
11. “Deletions in GRID2 lead to a recessive syndrome of cerebellar ataxia and tonic upgaze in humans.” Hills, L. B., A. Masri, K. Konno, W. Kakegawa, A. T. Lam, E. Lim-Melia, N. Chandy, R. S. Hill, J. N. Partlow, M. Al-Saffar, R. Nasir, J. M. Stoler, A. J. Barkovich, M. Watanabe, M. Yuzaki, and G. H. Mochida. 2013. Neurology 81: 1378-1386.
12. “The Role of Tyrosine 504 Phosphorylation on the Regulation of Collapsin Response Mediator Protein 1.” Lam, A. N. University of Vermont, The Honors College, Bailey Howe Library, 2011.

SELECTED ABSTRACTS AND PRESENTATIONS

1. “Increased expression of SLC26A9 delays age-at-onset of diabetes in cystic fibrosis.” Lam, A., Atalar, M., Vecchio-Pagan, B., Blackman, S.M., Cutting, G.R. Gordon Research Conference: Human Genetic Variation and Disease, 2018 (Biddeford, ME)
• Invited Oral Presentation (Lecture) and Poster Presentation
2. “Upstream regulatory element(s) increase expression of SLC26A9 leading to a delayed age-at-onset of diabetes in cystic fibrosis.” Lam, A., Atalar, M., Vecchio-Pagan, B., Blackman, S.M., Cutting, G.R. Maryland Genetics, Epidemiology and Medicine Training Program: Genetics Research Day Symposium, 2018 (Baltimore, MD)
3. “Increased expression of SLC26A9 delays age-at-onset of Cystic Fibrosis-Related Diabetes.” Lam, A., Atalar, M., Vecchio-Pagan, B., Blackman, S.M., Cutting, G.R. American Society of Human Genetics Conference, 2017 (Orlando, FL)
• Reviewer’s Choice Abstract: scored in the top 10% of poster abstracts by reviewer score
4. “Analysis of the SLC26A9 Locus as a Modifier of Cystic Fibrosis.” Atalar, M., Vecchio-Pagan, B., Lam, A., Davis, E., Akhtar, Y., Sharma, N., Blackman, S.M., Cutting, G.R. North American Cystic Fibrosis Conference, 2016 (Orlando, FL)
5. “Systematic computational identification and experimental verification of variants that activate exonic and intronic cryptic splice sites.” Lee, M., Roos, P., Sharma, N., Evans, T.A., Pellicore, M.J., Stanley, S., Khalil, S., Lam, A., Vecchio-Pagan, B., Armanios, M., Cutting, G.R. American Society of Human Genetics Conference, 2016 (Vancouver, Canada)

LEADERSHIP AND SPECIAL TRAINING

2020 GORDON RESEARCH SEMINAR: Human Genetic Variation and Disease May 2018-present
• Co-Chair
• Work side-by-side with leaders in the field of human genetics to promote discussions at the intersection of genomics, molecular biology, computational biology and medicine
• Duties include developing a rigorous scientific program, leading discussion sessions, raising and managing financial support, reviewing abstract submissions, and organizing the poster session

WOMEN IN GENETICS - JOHNS HOPKINS MEDICINE Jun 2017-Aug 2019
• Organizing Committee Member, “Journeys of Women in Genetics” Seminar Series
• Orchestrate lunch forums with female trainees and faculty members to discuss solutions to challenges faced by women in the workplace and strategies for work/life balance

56th ANNUAL SHORT COURSE Jun 2015-Sep 2015
• Medical and experimental mammalian genetics
• The Jackson Laboratory, Bar Harbor, ME, Trainee
• Two-week course on heredity, disease, and genetics

TEACHING EXPERIENCES

JOHNS HOPKINS UNIVERSITY SCHOOL OF MEDICINE (Baltimore, MD) Sep 2015-Dec 2015
• Teaching Assistant, Fundamentals of Genetics Course
• Assisted in writing and grading the fly genetics problem set and final exam; held office hours

BOSTON CHILDREN’S HOSPITAL (Boston, MA) - Tutor Aug 2012-Mar 2014
• School Work Assistance Team (SWAT)
• Tutored inpatients and outpatients in a wide variety of subjects and grades through Volunteer Services

BOYS AND GIRLS CLUB OF BURLINGTON (Burlington, VT) May 2010-Sep 2010
• Arts/Music Assistant, Science Tutor

CLINICAL AND COMMUNITY EXPERIENCES

JOHNS HOPKINS HOSPITAL (Baltimore, MD) - Medical Shadowing Nov 2016-present
• Division of Endocrinology, Diabetes and Metabolism: Scott Blackman, M.D., Ph.D., Yasmin Akhtar, D.O.
• Department of Pulmonary and Critical Care Medicine: Patrick Sosnay, M.D.
• Department of Pediatric Genetics: Ada Hamosh, M.D.

MEDLIFE Sep 2010-Sep 2013
• Medicine, Education and Development for Low Income Families Everywhere
• Set up medical clinics in Lima, Peru, to provide basic medical care for local residents
• Served as a Spanish-English interpreter during mission trips
• Promotion and fundraising activities for MEDLIFE in the United States

HARVARD MEDICAL INSTITUTIONS (Boston, MA) - Medical Shadowing Jun 2012-Apr 2014
• Boston Children’s Hospital, Department of Neurosurgery: Joseph Madsen, M.D., epilepsy brain surgeries
• Massachusetts General Hospital, Department of Neurology: Ronald L. Thibert, D.O., acute stroke clinic rounds

UNIVERSITY OF VERMONT MEDICAL CENTER (Burlington, VT) - Volunteer Jun 2009-Sep 2009
• Pediatric Ward: Participated in the Child Life Play Program, involving pediatric patients in play activities

LANGUAGE PROFICIENCY

• Vietnamese, Fluent
• Spanish, Advanced

REFERENCES

Garry R. Cutting, M.D.
Johns Hopkins University School of Medicine - Department of Genetic Medicine
Miller Research Building - Suite 551/Room 559
733 N. Broadway, Baltimore, MD 21205
Phone: (410) 955-1773 | E-mail: [email protected]

Scott M. Blackman, M.D., Ph.D.
Johns Hopkins University School of Medicine - Division of Pediatric Endocrinology
David M. Rubenstein Children’s Health Building, 3rd Floor
200 N. Wolfe Street, Baltimore, MD 21205
Phone: (410) 955-6463 | E-mail: [email protected]

Christopher A. Walsh, M.D., Ph.D.
Harvard Medical School/Boston Children’s Hospital - Department of Genetics
Center for Life Science 15049
3 Blackfan Circle, Boston, MA 02115
Phone: (617) 919-2923 | E-mail: [email protected]

Bryan A. Ballif, Ph.D.
University of Vermont - Department of Biology
311 Marsh Life Sciences
Burlington, VT 05405
Phone: (802) 656-1389 | E-mail: [email protected]
