Evaluation of genetic variants for Type 2 diabetes associated kidney disease in African Americans

By

Meijian Guan

A Dissertation Submitted to the Graduate Faculty of

WAKE FOREST UNIVERSITY GRADUATE SCHOOL OF ARTS AND SCIENCES

in Partial Fulfillment of the Requirements

for the Degree of

DOCTOR OF PHILOSOPHY

In Integrative Physiology and Pharmacology

August 2017

Winston-Salem, North Carolina

Approved By:

Maggie Ng, Ph.D., Advisor

Barry Freedman, M.D., Chairman

Donald Bowden, Ph.D.

Fang-Chi Hsu, Ph.D.

Timothy Howard, Ph.D.

Acknowledgements

Foremost, I would like to thank my advisor, Maggie Ng, for giving me the opportunity to work in this lab and for providing extensive support, resources, and training. During my time in the lab, I have appreciated your unique insights on many subjects, including genetic research, statistics, scientific writing, and problem solving. You taught me how to pay attention to the small details during the problem-solving process.

Because of you, I learned the value of examining all sides of a problem, instead of just scratching the surface, to make a conclusion that truly stands up to scrutiny. The lessons that I have learned from you during the past few years can be applied to so many situations in my life, not just limited to research.

Dr. Bowden, thank you for creating such a wonderful environment that promotes open thinking and collaboration and encourages us to expand our interests. And thank you for being a member of my advisory committee. I appreciate your invaluable advice on both my research and my career development. I would never have been able to achieve what I have achieved without your generous support. Dr. Freedman, thank you for being my committee chair and for being so willing to share your expertise and experience with me. I cannot say how much I appreciate your support along the way, the advice on my research projects, the clinical interpretations of the genetic results, the help with manuscript revisions, and the opportunities you offered to collaborate with your research group. To Fang-Chi, I learned the fundamentals of statistics and SAS programming from your class. You have been such a good listener and were always willing to encourage me in my further pursuits. Dr. Howard, thank you so much for filling in to help me out when Dr. Zhou left my committee. I enjoyed the classes you offered at the genomics center, which furthered my understanding of human genetics.

To those in the Bowden Lab, I appreciate your training, guidance, advice, and support, as well as your great company. Nichole, thank you for sharing your experience in kidney disease research with me; the most updated APOL1 risk coding you provided was very helpful. Pam, thank you for your great help in providing the genotyping reports and phenotypic information. And thank you for your great patience; I don’t remember how many questions I have asked you regarding the details of the data. Becky, I will miss the cakes you baked for me, as well as the interesting conversations between us. To the data administrators in the Bowden Lab, JJ and Lucho, thank you for your help getting me started in the lab. Both of you helped to install software, set up systems and solve IT problems. Lucho, you gave me a lot of invaluable advice on how to be a good programmer and how to better manage my files. JJ, I appreciate your assistance in retrieving data from the database for me whenever I asked. Moreover, thank you for all your encouragement and advice on how to succeed. To Poorva, I still remember those complicated programs you helped me to run when I started my project. You were always there when I had programming issues. You have already left the lab; I wish you joy with your new baby and your new job. To Hayrettin, you are one of the best statisticians I have ever worked with. You were the “go-to” person whenever I had difficulty understanding statistical concepts.

To Jacob, Chuan, Jackie, Laura, Jeremy, Mary and Keri, we had so many good moments together. Thank you all for always sitting next to me and inspiring me to be a better version of myself. I really enjoyed hanging out with each of you. I wish you all the brightest future.

To the faculty and staff of Integrative Physiology and Pharmacology, particularly Dr. Allyn Howlett, Dr. Paul Czoty and Denise Wolfe, I appreciate your advice, guidance, and endless support over the past years; I would not be as far along in my career if it was not for the support I have received from IPP.

Lastly, I would like to thank my family for their endless support. They are my motivation to pursue a PhD. Thanks to my parents, Wanming Guan and Changfeng Chen, for their continuous love, support and encouragement. I am so appreciative that you taught me to be a good person and to work hard. To my parents-in-law, Zhe Wang and Sheying Ma, I appreciate your endless support. You offered your help when it was most needed. Thanks to my amazing wife Yan Wang for her unconditional love and support. We both know I would not have made it this far without you. Special thanks to my son Greyson, who has brought so many happy and enjoyable moments to this family. I appreciate every minute I spend with you, son.

Table of Contents

List of Figures and Tables
List of Abbreviations
Abstract
Chapter 1 Introduction
Chapter 2 Genome-wide association study in African Americans with T2D-attributed end-stage kidney disease
Chapter 3 An exome-wide association study for type 2 diabetes-attributed end-stage kidney disease in African Americans
Chapter 4 Association of kidney structure-related variants with type 2 diabetes-attributed end-stage kidney disease in African Americans
Chapter 5 Association analysis of the cubilin (CUBN) and megalin (LRP2) with end-stage kidney disease in African Americans
Chapter 6 Discussion and Conclusions
References
Curriculum Vitae

List of Figures and Tables

Chapter 2
Figure 1. Workflow of T2D-ESKD GWAS in AAs (Baseline model)
Figure 2.A. Locus plots of T2D-ESKD associations at P<5×10⁻⁸ in baseline model
Figure 2.B. Locus plots of all-cause ESKD associated variants at P<5×10⁻⁸ in baseline model
Figure 3. Locus plots of T2D-ESKD associations at P<5×10⁻⁸ in APOL1-negative model
Table 1. Clinical characteristics of Affy6.0 dataset (stage 1a)
Table 2. Clinical characteristics of Axiom and MEGA datasets (stage 1b and 1c)
Table 3. Independent T2D-ESKD associations (P<5×10⁻⁸) under the baseline model
Table 4. All-cause ESKD associated variants at P<5×10⁻⁶ in baseline model
Table 5. Independent T2D-ESKD associations (P<5×10⁻⁸) under APOL1-negative model
Table 6. All-cause ESKD associated variants at P<5×10⁻⁶ in APOL1-negative model
Table 7. CKD and related associations from previous studies replicated significance and consistent effect
Sup. Figure 1. QQ plot of GWAS results of T2D-ESKD vs. non-diabetic non-nephropathy controls under baseline model
Sup. Table 1. Discrimination analysis for genome-wide significant T2D-ESKD associated SNPs in baseline model
Sup. Table 2. Discrimination analysis for genome-wide significant T2D-ESKD associated SNPs in APOL1-negative model
Sup. Table 3. Results of APOL1-negative model for top associations in baseline model
Sup. Table 4. Evaluation of previous loci associated with kidney disease and related traits
Sup. Table 5. Examination of top associations in previous T2D-ESKD GWAS

Chapter 3
Figure 1. Analysis workflow of single-variant association analysis for T2D-ESKD Exome sequencing study (baseline model)
Table 1. Clinical characteristics of study cohorts
Table 2. T2D-ESKD associated variants in meta-analysis from discovery and replication cohorts (Baseline model)
Table 3. Top T2D-ESKD associations in meta-analysis after removing APOL1 renal-risk genotype carriers (APOL1-negative model)
Table 4. Meta-analysis combining T2D-ESKD and all-cause ESKD cohorts for rs41302867
Table 5. Top associations of gene-based analyses in baseline or APOL1-negative models
Sup. Table 1. Discrimination analysis for top associations of baseline model and APOL1-negative model
Sup. Table 2. GTEx results of top associations (P<1×10⁻⁴)
Sup. Table 3. Results of top associations from baseline model in APOL1-negative model
Sup. Table 4. Results of single-variant analysis detected loci in gene-based analysis

Chapter 4
Figure 1. Workflow of kidney structure-related gene analyses.
Table 1. Clinical characteristics of study cohorts
Table 2. T2D-ESKD associated SNPs in meta-analysis from discovery and replication cohorts (Baseline model)
Table 3. Additional associations in combined analysis after removing APOL1 renal-risk genotype carriers (APOL1-negative model)
Table 4. Top associations from the gene-based analysis in APOL1-negative model
Sup. Table 1. Kidney structure-related genes
Sup. Table 2. Differentiation steps for T2D-ESKD associated SNPs in Baseline model
Sup. Table 3. Associations approaching locus-wide significance in Baseline model or APOL1-negative model
Sup. Table 4. Differentiation steps for top associated SNPs in APOL1-negative model
Sup. Table 5. Results of the top three associations from Baseline model in APOL1-negative model
Sup. Table 6. eQTL analysis of T2D-ESKD associated SNPs and gene expression

Chapter 5
Figure 1. Cubilin gene (CUBN) and megalin gene (LRP2) SNP selection and genetic association analysis workflow.
Table 1. Demographic and clinical characteristics of study samples
Table 2. Association analysis between CUBN and LRP2 variants with T2D-ESKD (additive, fully-adjusted model)
Table 3. Strongest genetic associations in cases with non-diabetic ESKD (additive, fully-adjusted model)
Table 4. Trait discrimination analysis of T2D-ESKD associated SNPs in AXIOM array samples
Sup. Table S1. All 66 SNPs tested in CUBN and LRP2 selected from T2D-GENES, ESP, and
Sup. Table S2. Top hits from T2D-GENES
Sup. Table S3. Association analysis between CUBN and LRP2 variants with T2D-ESKD (additive, APOL1 risk removed)

List of Abbreviations

AA        African American
ACAD11    Acyl-CoA Dehydrogenase Family Member 11
AFR       African
AGEs      advanced glycation end products
AIM       ancestry informative marker
APOL1     apolipoprotein L1
ARHGAP24  Rho GTPase Activating Protein 24
CADD      Combined Annotation Dependent Depletion
CD2AP     CD2 Associated Protein
CHR       chromosome
CKD       chronic kidney disease
CLDN8     Claudin 8
CMC       Combined and Multivariate Collapsing test
COL4A3    Collagen Type IV Alpha 3 Chain
CUBN      Cubilin
DKD       diabetic kidney disease
DLGAP5    DLG Associated Protein 5
EA¹       European American
EA²       effect allele
EAF       effect allele frequency
EFNB2     ephrin-B2
eGFR      estimated glomerular filtration rate
ENPP7     Ectonucleotide Pyrophosphatase/Phosphodiesterase 7
eQTL      Expression quantitative trait loci
ESKD      end-stage kidney disease
EUR       European
GFR       glomerular filtration rate
GMMAT     Generalized linear Mixed Model Association Tests
GNG7      G Protein Subunit Gamma 7
GRM       genetic relationship matrix
GRM8      Glutamate Metabotropic Receptor 8
GWAS      genome-wide association study
HDL       high-density lipoprotein
IER2      Immediate Early Response 2
IFITM3    Interferon Induced Transmembrane Protein 3
ILDR2     Immunoglobulin Like Domain Containing Receptor 2
LD        linkage disequilibrium
LMM       linear mixed model
LRP2      LDL Receptor Related Protein 2
MALD      mapping by linkage disequilibrium
MAPK      mitogen-activated protein kinase
MB        Madsen-Browning test
MIR4739   MicroRNA 4739
MMP2      Matrix Metallopeptidase 2
MYH9      Myosin Heavy Chain 9
N         Number
NADK      NAD Kinase
NF-κB     Nuclear factor-κB
NPHP3     Nephrocystin 3
OR        odds ratio
OTUD7B    OTU Deubiquitinase 7B
PC1       principal component 1
PCA       principal component analysis
PEX6      Peroxisomal Biogenesis Factor 6
PKC       protein kinase C
PLEKHN1   Pleckstrin Homology Domain Containing N1
POS       Position
PRX       Periaxin
RAD51AP2  RAD51 Associated Protein 2
RAS       renin-angiotensin system
RBFOX3    RNA Binding Protein, Fox-1 Homolog 3
RBM43     RNA Binding Motif Protein 43
RNA       Ribonucleic acid
RREB1     Ras Responsive Element Binding Protein 1
SD        standard deviation
SKAT      sequence kernel association test
T1D       type 1 diabetes
T2D       type 2 diabetes
TFBS      transcription factor binding site
TGF-beta  transforming growth factor beta
TTC21B    Tetratricopeptide Repeat Domain 21B
VEGF      vascular endothelial growth factor
VEP       variant effect predictor
VT        Variable Threshold test

Abstract

End-stage kidney disease (ESKD) is a significant public health problem in the U.S., disproportionately affecting African Americans (AAs; 1,003 per million/year) at a 3.3-fold higher incidence rate than European Americans (301 per million/year) and a 2-fold higher rate than Native Americans (499 per million/year) in 2014. Diabetic kidney disease (DKD), primarily attributed to type 2 diabetes (T2D) and including ESKD and advanced chronic kidney disease (CKD), accounts for over 40% of all ESKD cases. Observation of familial aggregation in epidemiologic studies suggests that genetic factors may contribute to the risk of DKD. While the apolipoprotein L1 gene (APOL1) G1 and G2 alleles explain approximately 70% of the disparity in non-diabetic ESKD in AAs, they fail to account for the excess risk of T2D-ESKD in AAs. Genetic studies have revealed >70 genome-wide significant genetic determinants with impact on kidney diseases or kidney function. Despite this progress, the proportion of T2D-ESKD susceptibility attributable to these kidney-related genes is still unclear. The goal of this series of studies is to provide a comprehensive evaluation of the genetic architecture of T2D-ESKD in AAs.

Our previous T2D-ESKD genome-wide association study (GWAS) had limited sample size and statistical power to detect genome-wide significant associations. In the first study, we extended the GWAS to include 3,432 T2D-ESKD cases and 6,977 non-diabetic non-nephropathy controls, followed by a discrimination analysis with 2,756 T2D non-nephropathy subjects to exclude T2D-associated variants. Following replication analysis in 1,910 non-diabetic ESKD cases and 908 non-diabetic non-nephropathy controls, meta-analysis of 5,342 all-cause ESKD cases and 6,977 non-diabetic non-nephropathy controls revealed an additional novel locus at LINC00460/EFNB2 (rs77113398 (P=9.84×10⁻⁹, OR=1.94) and rs9622363 (P=1.96×10⁻²⁵, OR=0.68)). Given that previous GWAS efforts have largely focused on common genetic variants, which cumulatively explain only a low percentage of disease variance, rare coding variants may account for some part of the “missing heritability”. In the second study, we evaluated the role of low-frequency coding variants in T2D-ESKD using exome-wide sequencing data from 1,730 AA individuals, followed by replication in 3,141 additional AA individuals. A total of 15 coding variants putatively contributed to T2D-ESKD in AAs (P<1×10⁻⁴), including variants in PLEKHN1, NADK, RAD51AP2, RREB1, PEX6, GRM8, PRX, APOL1, OTUD7B, IFITM3, DLGAP5 and IER2. T2D-ESKD associated variants at GRM8, PEX6, ILDR2, RREB1 and PRX were also associated with obesity, dyslipidemia, T2D and the Mendelian disease Charcot-Marie-Tooth (CMT) neuropathy. Interestingly, rs41302867, located in RREB1, revealed consistent association with both T2D-ESKD and non-diabetic ESKD.

In addition to the genome-wide approaches, we conducted two hypothesis-driven studies targeting strong candidate genes that are important in kidney function and structure. In the third study we focused on 47 genes important in podocyte, glomerular basement membrane, mesangial cell, mesangial matrix, renal tubular cell, and renal interstitium structure for association with T2D-ESKD in AAs. A number of region-wide significant associations with T2D-ESKD were revealed at CD2AP, MMP2, TTC21B, COL4A3, NPHP3-ACAD11, CLDN8, and ARHGAP24. The fourth study further examined two related genes, CUBN (cubilin) and LRP2 (megalin), which contribute to albumin transport in the proximal tubule. CUBN SNP rs1801239 (I2984V), previously associated with albuminuria, as well as a novel LRP2 missense variant, rs17848169 (N2632D), were significantly associated with T2D-ESKD in African Americans.

These projects have identified a number of genetic loci contributing to T2D-ESKD in AAs using a broad range of genetic approaches. The results presented here provide new biological insights into T2D-ESKD disease pathogenesis, and expand our knowledge of the genetic factors underlying this disease. More importantly, these results may shed light on novel therapeutics and tractable drug targets for DKD.

Chapter 1

Introduction

Chronic kidney disease

The kidneys play a central role in whole-body homeostasis, regulating acid-base balance, electrolyte concentrations, and extracellular fluid volume, as well as controlling blood pressure. Chronic kidney disease (CKD) is characterized by a gradual loss of kidney function over periods of years to decades. Glomerular filtration rate (GFR), which describes the flow rate of filtered fluid through the kidney, has been used to characterize the progression of CKD in five stages: stage 1, normal or high GFR (GFR ≥90 ml/min/1.73 m²); stage 2, mild CKD (GFR = 60-89 ml/min/1.73 m²); stage 3, moderate CKD (GFR = 30-59 ml/min/1.73 m²); stage 4, severe CKD (GFR = 15-29 ml/min/1.73 m²); and stage 5, end-stage CKD (GFR <15 ml/min/1.73 m²). GFR is usually estimated from serum creatinine, an end product of muscle activity, along with other variables (age, ethnicity and gender) through a mathematical formula (Levey and Stevens 2010).
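The five-stage GFR classification described above can be written as a small lookup. The following is only an illustrative sketch: the function name ckd_stage is hypothetical, GFR is assumed to be given in ml/min/1.73 m², and non-integer values are assumed to fall into the stage whose lower bound they reach (for example, 59.5 is treated as stage 3).

```python
def ckd_stage(gfr: float) -> int:
    """Return the CKD stage (1-5) for a GFR value in ml/min/1.73 m^2.

    Cut-offs follow the staging quoted in the text: >=90, 60-89, 30-59,
    15-29 and <15. Treating the bins as continuous intervals is an
    assumption made for this sketch.
    """
    if gfr < 0:
        raise ValueError("GFR must be non-negative")
    if gfr >= 90:
        return 1  # normal or high GFR
    if gfr >= 60:
        return 2  # mild CKD
    if gfr >= 30:
        return 3  # moderate CKD
    if gfr >= 15:
        return 4  # severe CKD
    return 5      # end-stage CKD


if __name__ == "__main__":
    for value in (95, 72, 45, 20, 10):
        print(value, "-> stage", ckd_stage(value))
```

Note that staging by GFR alone is a simplification; in practice stage 1 and stage 2 also require other evidence of kidney damage.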

CKD is a significant health problem in the US, and is associated with an increased risk for cardiovascular disease, all-cause mortality and end-stage kidney disease (Fox et al. 2012). The overall prevalence of CKD increased from 12% to 14% between 1988-1994 and 1994-2004, and the most recent rate is 14.8% during 2011-2014, with CKD stage 3 being the most prevalent stage (USRDS, 2016). In 2014, mortality rates were nearly 3-fold higher for Medicare patients aged 66 and older with CKD (135 per 1000 per year) than for those without (44 per 1000 per year) (USRDS, 2016). In addition, CKD has become a significant economic burden in the US. Medicare spending for patients aged 65 and older who have CKD exceeded $50 billion in 2014, accounting for 20% of all Medicare spending in this age group. Medicare spending for beneficiaries with CKD who were younger than age 65 exceeded $8 billion in 2014, representing 44% of total spending in this age group (USRDS, 2016).

End-stage kidney disease

End-stage kidney disease (ESKD) is the final stage of CKD, with kidney function below 10% of its normal level (eGFR <15 ml/min/1.73 m²). Ultimately, ESKD results in anemia and accumulation of nitrogenous waste products, as well as other toxic metabolites that result in uremia unless renal replacement therapy (RRT) is started. Uremia has systemic consequences that impact nearly all organs and is fatal if left untreated. In 2014, 120,688 new ESKD cases were diagnosed; 97.4% of them received dialysis while only 2.6% received a kidney transplant (USRDS, 2016). The mortality rates for ESKD, dialysis, and transplant patients were 136, 166, and 30 per 1,000 patient-years, respectively, in 2014 (USRDS, 2016). In addition, ESKD has been a financial burden in the US: the annual expenditure for ESKD borne solely by Medicare exceeded $32.8 billion in 2014, 3.3% higher than in 2013 and accounting for 7.2% of all Medicare paid claims costs (USRDS, 2016). Actual costs are higher, as this figure excludes non-Medicare costs and pre-dialysis care for CKD. Incident ESKD rates, predominantly related to diabetes, have remained relatively constant in recent years despite medical advances in the treatment of hypertension and hyperglycemia.

Diabetic kidney disease

Diabetic kidney disease (DKD) is the most common cause of ESKD, accounting for more than 44% of all ESKD cases in the US in 2014, approximately 90% of which relate to type 2 diabetes (T2D) (USRDS, 2016). DKD is a devastating microvascular complication of diabetes, usually accompanied by proteinuria, declining kidney function (eGFR) and/or progression to ESKD. The etiology of DKD is multifactorial, with family history, hemoglobin A1c, systolic blood pressure, glycemic control, albuminuria, duration of diabetes, serum uric acid, dyslipidemia, obesity and smoking being identified as risk factors (Zoppini et al. 2012; Macisaac et al. 2014; Radcliffe et al. 2017). Notably, in a permissive environment of hyperglycemia, family history of kidney disease appears to be one of the strongest risk factors for initiation of DKD (Seaquist et al. 1989; Freedman et al. 1993; Quinn et al. 1996).

The molecular mechanisms responsible for the development and progression of DKD remain unclear. However, it is believed that prolonged hyperglycemia leads to chronic metabolic and hemodynamic changes that regulate a number of intracellular signaling pathways, transcription factors, cytokines, chemokines, and growth factors (Remuzzi et al. 2002; Soldatos and Cooper 2008). Cumulatively, these changes lead to structural abnormalities in the kidney, such as glomerular basement membrane thickening, podocyte injury, and mesangial matrix expansion. In addition, glomerular sclerosis and tubulointerstitial fibrosis, which are associated with declining GFR, occur during DKD progression (Badal and Danesh 2014). Oxidative stress caused by free radical production is believed to play a central role in the development of diabetes complications, especially in the kidneys, where increased oxidative stress has been reported to decrease O2 tension in the diabetic kidney (Nishikawa et al. 2000; Palm et al. 2003; Haidara et al. 2009). In addition to oxidative stress, enhanced flux into the polyol and hexosamine pathways, activation of protein kinase C (PKC) and transforming growth factor beta (TGF-beta)/Smad/mitogen-activated protein kinase (MAPK) signaling pathways, and formation of advanced glycation end products (AGEs) may also be implicated in the development of DKD (Ishii et al. 1996; Koya et al. 1997; Inoguchi et al. 2003; Lee et al. 2003). Moreover, high glucose levels can activate the proinflammatory transcription factor nuclear factor-κB (NF-κB), promoting increased inflammation (Schmid et al. 2006). Finally, hemodynamic changes also contribute to the pathogenesis of DKD via activation of the renin-angiotensin system (RAS) and vascular endothelial growth factor (VEGF) signaling cascades (Hostetter et al. 1982; Khamaisi et al. 2003). Additional potential mechanisms may also contribute to the progression of DKD, including microRNAs, epigenetics, and the Rho family of small GTPases (Badal and Danesh 2014).

In addition, genome-wide association studies (GWASs) have suggested that genetic components contribute to DKD susceptibility in multiple ethnic groups (Maeda 2004; Pezzolesi et al. 2009; McDonough et al. 2011; Sandholm et al. 2012; Iyengar et al. 2015). Moreover, investigations of low-frequency genetic variants have revealed a number of rare coding variants located in RREB1, NPHS1, CUBN, LRP2, COL4A3 and CLDN8 that also play a role in ESKD and T2D-ESKD susceptibility in AAs (Bonomo et al. 2014a, b; Ma et al. 2016; Guan et al. 2016). It has also been hypothesized that genetic variations in Mendelian disease genes may in part account for common disease susceptibility (Blair et al. 2013; Parsa et al. 2013a).

Genetic susceptibility to kidney diseases

Ethnic differences in the risk of ESKD and type 2 diabetes-attributed ESKD (T2D-ESKD) have been observed. This disparity is especially pronounced in African Americans (AAs), who have incidence rates of ESKD nearly 3.3-fold and 2-fold higher than European Americans (EAs) and Native Americans, respectively (USRDS, 2016). ESKD incidence rates per million people in the U.S. were 1,003 for AAs, 301 for EAs and 499 for Native Americans in 2014 (USRDS, 2016). One explanation put forth for these disparities is that racial or ethnic minorities have lower economic status and consequently more limited access to medical care. While this may explain some of the differences, several studies have controlled for socio-economic status and still observed significantly higher rates of renal complications in AA patients than in EA patients (Spray et al. 1995; Freedman et al. 1995; Lei et al. 1998; Song et al. 2009).
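The fold differences quoted at the start of this section follow directly from the 2014 incidence rates per million. As a minimal worked check, using only the three rates given in the text (the dictionary and function names are illustrative, not part of any analysis pipeline):

```python
# ESKD incidence per million population per year in 2014, as quoted in the
# text (USRDS, 2016). The group labels are chosen for this sketch.
ESKD_INCIDENCE_PER_MILLION = {
    "African American": 1003,
    "European American": 301,
    "Native American": 499,
}


def incidence_rate_ratio(group: str, reference: str, rates=None) -> float:
    """Ratio of the incidence rate in `group` to that in `reference`."""
    rates = ESKD_INCIDENCE_PER_MILLION if rates is None else rates
    return rates[group] / rates[reference]


if __name__ == "__main__":
    for reference in ("European American", "Native American"):
        ratio = incidence_rate_ratio("African American", reference)
        print(f"AA vs. {reference}: {ratio:.2f}-fold")
```

With these inputs the ratios come out at roughly 3.3 and 2.0. These are crude rate ratios and do not account for the adjustment for socio-economic status discussed above.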

Familial aggregation of DKD, including T2D-ESKD, has been previously reported in multiple populations (Seaquist et al. 1989; Spray et al. 1995; Freedman et al. 1995; Quinn et al. 1996). Among AAs with T2D, individuals who had close relatives with T2D-ESKD were at nearly 8-fold higher risk for developing T2D-ESKD compared with individuals without affected relatives; the difference remained even after controlling for yearly income, education, serum cholesterol level, smoking and hypertension (Freedman et al. 1995). These data suggest that genetics may partly explain the racial disparity and familial aggregation of various forms of kidney disease.

A story of APOL1

In an effort to explain the kidney disease disparity in AAs, the mapping by linkage disequilibrium (MALD) method was used in several studies to search for ESKD-predisposing loci and led to the identification of a genomic region on chromosome 22q12.3 containing 30 genes, including the MYH9 gene (Kao et al. 2008; Kopp et al. 2008; Shlush et al. 2010). MYH9 was considered the putative susceptibility locus for non-diabetic ESKD, partially based on previous knowledge that autosomal dominant mutations in MYH9 cause the Giant Platelet Syndrome group of disorders, which in some cases includes a form of renal failure of glomerular origin (Kopp 2010). However, researchers started looking for variants outside of the MYH9 region when intensive sequencing failed to identify the causal variants within MYH9 (Nelson et al. 2010).

Subsequent studies interrogated a larger chromosomal region adjacent to MYH9 and identified causal variants located in the APOL1 region that account for 70% of the prevalence of non-T2D ESKD in AAs (Genovese et al. 2010). The risk allelic variants in APOL1 comprise two missense variants in nearly perfect linkage disequilibrium (LD), rs73885319 (S342G) and rs60910145 (I382M), designated as G1, and a 6 bp deletion (rs71785313) designated as G2. The G1 and G2 variants have not been observed together on the same chromosome as a combined haplotype, indicating that they arose independently of one another and were each positively selected for (Genovese et al. 2010).

Apolipoprotein L1 (ApoL1) circulates in the blood at high levels as part of a high-density lipoprotein (HDL) complex. It is also widely expressed in tissues, particularly in the lung, placenta, pancreas, liver, and kidney (Duchateau et al. 2001; Page et al. 2001; Uhlén et al. 2015). ApoL1 protein with either the G1 or G2 allele, but not ApoL1 without these alleles, is able to lyse Trypanosoma brucei (T.b.) rhodesiense, a causative agent of African sleeping sickness (Genovese et al. 2010). Thus, the disparity of non-diabetic ESKD in those of African descent was very well explained: natural selection favored these common variants, which provide strong survival advantages in the local environment against certain strains of T.b. (Genovese et al. 2010).

The most striking risk of APOL1 variants for ESKD has been reported under a recessive model of inheritance: the high-risk genotype can be G1/G1, G1/G2, or G2/G2 (Genovese et al. 2010). About half of all AAs inherit at least one APOL1 risk allele, while 12-15% are risk homozygotes (Friedman et al. 2011). APOL1 renal-risk genotypes have shown strong association with many kidney diseases, especially focal segmental glomerulosclerosis (FSGS), hypertension-attributed ESKD, and HIV nephropathy (HIVAN) (Kopp et al. 2011; Pollak et al. 2012). In addition, APOL1 renal-risk alleles have been implicated in lupus kidney disease and subtypes of membranous nephropathy (Freedman et al. 2014; Larsen et al. 2014). That the same risk genotypes lead to greatly increased risk of many different types of kidney disease indicates similarities in disease pathogenesis. More recent evidence indicates that kidneys from AA deceased donors carrying two APOL1 renal-risk variants are at increased risk for early allograft failure relative to kidneys from AA donors with zero or one APOL1 renal-risk variant (Freedman et al. 2015, 2016). However, the relationship between APOL1 and DKD remains unclear. The APOL1 risk genotype demonstrated association with progression but not incidence of DKD (Friedman et al. 2011; Parsa et al. 2013b). Because DKD patients frequently lack a biopsy, misclassification of individuals in whom DKD coincides with an APOL1 risk genotype cannot be fully excluded.
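The recessive coding of APOL1 renal risk described above (two risk alleles in any combination of G1 and G2) can be illustrated with a short sketch. The genotype string format ("G1/G2") and the label "G0" for a non-risk allele are assumptions made for this example; they are not taken from the coding scheme used in the studies.

```python
RISK_ALLELES = {"G1", "G2"}  # APOL1 renal-risk alleles named in the text


def risk_allele_count(genotype: str) -> int:
    """Count APOL1 risk alleles in a genotype string such as 'G0/G1'.

    'G0' (non-risk allele) is a label assumed for this sketch.
    """
    alleles = genotype.strip().upper().split("/")
    if len(alleles) != 2:
        raise ValueError(f"expected two alleles, got {genotype!r}")
    return sum(allele in RISK_ALLELES for allele in alleles)


def is_high_risk(genotype: str) -> bool:
    """Recessive model: high risk only when both alleles are G1 or G2."""
    return risk_allele_count(genotype) == 2


if __name__ == "__main__":
    for g in ("G0/G0", "G0/G1", "G1/G1", "G1/G2", "G2/G2"):
        print(g, "high risk" if is_high_risk(g) else "low risk")
```

Under this coding, G1/G1, G1/G2 and G2/G2 are flagged as high risk, whereas carriers of zero or one risk allele are grouped together, which corresponds to the removal of renal-risk genotype carriers in an APOL1-negative analysis.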

Genetic approaches to identify loci for kidney diseases and their functions

Given the great healthcare burden associated with CKD, there has been much interest in the search for genetic contributors to CKD. Familial linkage analysis was commonly applied to identify genetic loci contributing to diabetic and non-diabetic kidney disease before the advent of GWAS. One successful example of linkage analysis was the implication of the CNDP1 gene in DKD, with replication in multiple populations (Vardarli et al. 2002; Janssen et al. 2005; Freedman et al. 2007; Ahluwalia et al. 2011). The candidate gene approach is another widely used methodology to assess variants located in genes with a plausible biological relationship with kidney disease. However, the majority of the findings of CKD candidate gene studies have turned out to be disappointing, with very few exceptions, for example the PRKCB gene with DKD among Asians, the EPO gene with DKD in Europeans, and the NOS3 gene with ESKD in both Asian and European populations (Noiri et al. 2002; Nagase et al. 2003; Liu et al. 2005; Palmer and Freedman 2012).

GWAS has become the most popular genetic tool in the last ten years to identify common variants associated with a disease or trait of interest. Large efforts have been spent on GWAS of CKD and its related continuous traits (eGFR, serum creatinine level), including up to 175,579 individuals of European ancestry from the CKDGen consortium. A dozen replicated GWAS loci, including SHROOM3, UMOD, NAT8 and SLC7A9, have been identified in association with kidney disease or function (Kottgen et al. 2010; Pattaro et al. 2012; Tin et al. 2013a; Pattaro et al. 2016). In addition, these results indicate that genes mapping to the associated loci show enriched expression in kidney tissues and are involved in pathways related to kidney development, transmembrane transporter activity, regulation of glucose metabolism, and kidney structure (Pattaro et al. 2016). Fewer GWAS have been performed specifically for type 1 diabetes- or type 2 diabetes-attributed kidney diseases, and only a handful of suggestive loci have been demonstrated in association with DKD or T2D-ESKD. The reported associations include ELMO1, NCALD, and ACACB with DKD in Japanese, the PVT1 gene with T2D-ESKD in Pima Indians, FRMD3 with type 1 diabetes-attributed ESKD in Europeans, and RPS12, LIMK2 and SFI1 with T2D-ESKD or all-cause ESKD in AAs (Vardarli et al. 2002; Shimazaki et al. 2005; Pezzolesi et al. 2009; McDonough et al. 2011).

Another powerful genetic approach is admixture mapping, which has been specifically used in admixed populations such as AAs. It successfully led to the identification of the APOL1 gene, which is predominantly associated with non-diabetic ESKD in AAs (Genovese et al. 2010). Follow-up studies have indicated that the G1 and G2 alleles of APOL1, which function under a recessive model, can cause increased risk of human immunodeficiency virus nephropathy, focal segmental glomerulosclerosis (FSGS), and CKD attributed to hypertension among AAs (Kopp et al. 2011; Pollak et al. 2012). Despite the generally held idea that APOL1 G1/G2 alleles only contribute to non-diabetic kidney disease, recent studies have described a relationship between APOL1 risk alleles and progression of DKD (Friedman et al. 2011; Parsa et al. 2013b).

Although numerous risk factors have been identified, the predominant genetic contributors to DKD or T2D-ESKD remain unclear. Because previous studies have largely focused on common genetic variants, rare variants may be the next avenue for deciphering the genetic architecture of DKD.

Rare genetic variants

Two hypotheses have generally been proposed to explain genetic susceptibility to complex traits: the common disease-common variant hypothesis and the common disease-rare variant hypothesis. Rare, low-frequency and common variants are commonly defined as variants with frequencies of <1%, 1-5% and >5%, respectively. Large-scale GWAS have proved to be a very powerful approach for identifying common genetic variants that affect complex human phenotypes. Numerous genetic variants have been shown to contribute to susceptibility to hundreds of complex diseases such as T2D, heart disease and cancer. However, with very few exceptions, common variants discovered by GWAS have modest effects and cumulatively explain only a small proportion of disease liability (Kiezun et al. 2012). This missing heritability has inspired extensive efforts on other genetic factors such as epigenetics, non-coding RNA, epistasis and rare variants. One attractive hypothesis is that rare variants tend to be more deleterious than common variants because of natural selection (Kircher et al. 2014). It has been noted that rare variants with large effect sizes are the major cause of Mendelian disorders and low-prevalence diseases (Gibson 2012). Moreover, there is increasing evidence that low-frequency or rare variants play an important role in complex human diseases (Rivas et al. 2011; Jonsson et al. 2012; Gudmundsson et al. 2012; Flannick et al. 2014; Do et al. 2015).

In recent years, next-generation sequencing has emerged as a very powerful tool to evaluate the effects of rare variants on traits of interest. Since the cost of whole-genome sequencing remains unaffordable to most investigators, exome sequencing, which focuses specifically on coding regions and is substantially cheaper, has been commonly used as an alternative. For rare variant association studies, the tag-SNP approach adapted from GWAS does not work well because of the low correlation between tag SNPs and rare variants. Instead, direct mapping or sequencing has been used to capture and analyze functional variants. Because of their low occurrence, rare variants are usually aggregated by gene or region for joint analysis, which can help improve power and reduce the multiple comparison problem (Wessel and Schork 2006). Several types of gene-based analysis are available, such as collapsing tests (burden tests) and kernel-based tests. Collapsing tests are specifically designed for analyzing rare variants, typically those with a minor allele frequency below a threshold of 0.01-0.05. They simply aggregate all the rare variants of a genetic region into one single variable that is tested for association with the disease of interest. Numerous collapsing tests have been developed for rare variants, such as combined multivariate and collapsing (CMC), weighted sum statistic (WSS), and kernel based adaptive cluster (KBAC) (Li and Leal 2008; Madsen and Browning 2009; Liu and Leal 2010). A limitation of these collapsing tests is that they assume that all rare variants affect the disease in the same direction and with the same magnitude. More recently, a non-collapsing sequence kernel association test (SKAT) has been proposed under the assumption that each variant may have a different magnitude of effect on disease, with some variants being protective and others deleterious (Wu et al. 2011). A series of tests based on SKAT have been developed, including a more robust extension, optimal SKAT (SKAT-O), which adaptively applies SKAT or a collapsing test depending on the scenario to maximize power (Lee et al. 2012a, b).
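The collapsing idea described above can be illustrated with a short sketch. It is not the implementation of CMC, WSS or KBAC; it only shows, under simplifying assumptions, how rare variants in one gene are reduced to a single per-subject variable. The function names, the toy genotypes and the 0.01 threshold are illustrative assumptions.

```python
def burden_scores(genotypes, mafs, maf_threshold=0.01):
    """Collapse rare variants of one gene into a per-subject burden score.

    genotypes: one list per subject of minor-allele counts (0, 1 or 2)
    mafs: minor allele frequency of each variant, in column order
    """
    # Keep only the columns whose MAF falls below the chosen threshold.
    rare = [j for j, maf in enumerate(mafs) if maf < maf_threshold]
    # The burden score is the number of rare minor alleles a subject carries.
    return [sum(row[j] for j in rare) for row in genotypes]


def carrier_indicator(genotypes, mafs, maf_threshold=0.01):
    """Indicator coding: 1 if a subject carries any rare allele, else 0."""
    scores = burden_scores(genotypes, mafs, maf_threshold)
    return [1 if s > 0 else 0 for s in scores]


# Hypothetical toy data: 3 subjects, 4 variants (the last one is common).
toy_genotypes = [
    [0, 1, 0, 2],
    [0, 0, 0, 1],
    [1, 0, 1, 0],
]
toy_mafs = [0.004, 0.008, 0.002, 0.20]

print(burden_scores(toy_genotypes, toy_mafs))      # [1, 0, 2]
print(carrier_indicator(toy_genotypes, toy_mafs))  # [1, 0, 1]
```

The resulting score or indicator would then enter a regression model as a single predictor, which is why such tests lose power when protective and deleterious variants are mixed in the same gene.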

Although many powerful genetic tools have been proposed, analyzing rare variants remains challenging. Several questions need to be considered before designing an association study for rare variants: Which variants should be collapsed? What is the best frequency threshold? How many cases and controls are needed to detect the association? Zuk and colleagues have proposed a conceptual framework to address these questions (Zuk et al. 2014). Ideally, one would aggregate only disruptive or loss-of-function variants to increase effect size. Unfortunately, no current annotation tool can perfectly distinguish deleterious variants from benign variants. Thus, multiple strategies are necessary to assess different groups of variants according to their predicted effects and allele frequencies (Zuk et al. 2014). Disruptive variants usually require no allele frequency threshold because they provide the most robust information, whereas missense variants need to be carefully selected because they are contaminated by benign variants. Ideally, the allele frequency threshold should be selected based on the proportion of damaging variants in each gene and the total sample size. In particular, a well-powered rare variant association study requires a sample size similar to that of a well-powered GWAS, that is, at least 25,000 cases as well as a substantial replication set (Zuk et al. 2014).

Recent advances in massively parallel sequencing technologies have provided a rich opportunity to study the effects of rare variants, particularly those with a traceable impact at the protein level, on complex human diseases. Significant effort continues to be devoted to developing more powerful and computationally efficient statistical approaches for rare variant studies. The rapid development of these technologies makes it possible to evaluate the role of rare variants in the etiology of complex traits and to address the missing heritability that is unexplained by common variants.

Effect of admixture and correction

American populations with African, Caribbean and Hispanic heritage are admixed with substantial but varying components of continental African ancestry according to studies with ancestry informative panels (Tishkoff et al. 2009; Bryc et al. 2010). This is a result of the tragic slave trade beginning in the 16th century, which brought ~12 million individuals from Western Africa to the Americas (Salas et al. 2005). The relocation was followed by mating with resident Amerindian and European populations, referred to as admixture, which led to the genetic profile of the present populations.

Previous studies have shown that AAs in the US carry segments of DNA from peoples of

Europe, Africa, and the Americas, with variation in African and European admixture proportions across individuals and differences in groups across parts of the country (Parra et al. 1998;

Smith et al. 2004). According to a recent study, AAs showed average proportions of 73.2%

African, 24.0% European, and 0.8% Native American ancestry (Bryc et al. 2015). Systematic differences across states in the US in mean ancestry proportions of AAs were observed. On average, the highest proportions of African ancestry were found in AAs living in or born in the

South, especially South Carolina and Georgia, while lower levels of African ancestry in the

Northeast, the Midwest, the Pacific Northwest, and California were discovered (Bryc et al. 2015).

In genetic association studies of admixed populations such as AAs, admixture can be a serious confounder that leads to spurious associations if not properly adjusted for. A number of methods can be used to control for the effect of admixture, including ancestry informative markers (AIMs), principal component analysis (PCA), and mixed models. AIMs are SNPs spaced across the genome with large frequency differences between the parental populations of an admixed population (Smith et al. 2004). An expectation-maximization algorithm is used to estimate the proportion of ancestry for each sample based on the distribution of AIMs (Keene et al. 2008a). The ancestry proportion calculated from AIMs is then used as a covariate in regression models to account for the effect of admixture. An alternative approach is to perform PCA, a statistical procedure that reduces multidimensional data to a small number of dimensions (eigenvectors) while preserving as much individual variation as possible. It has been widely used in genetic studies to correct for population stratification (Price et al. 2006). However, these two methods cannot fully capture population structure when family structure or cryptic relatedness is also present (Price et al. 2010b). More recently, approaches using linear mixed models (LMMs) that incorporate the full covariance structure across individuals have gained popularity (Yang et al. 2014). In particular, the LMM relies on an estimate of the genetic relationship matrix (GRM), which encodes the pairwise genetic similarity between individuals in the data set. The standard LMM is shown as follows:

$$y = X\beta + Q + \epsilon$$

where $y$ is a vector of outcomes on $n$ subjects; $X$ is a matrix of predictor variables to be modelled as fixed effects, including genetic and non-genetic covariates as well as a vector of variables representing the genotypes at the SNP currently being tested; $\beta$ is a vector of regression coefficients representing the linear effects of the predictors on the outcome; and $Q$ and $\epsilon$ are random effects assumed to follow the distributions $Q \sim N(0, 2\varphi\sigma_g^2)$ and $\epsilon \sim N(0, \sigma_e^2 I)$, respectively. Here $\sigma_g^2$ and $\sigma_e^2$ are parameters to be estimated, representing the genetic and environmental components of variance, $I$ is the identity matrix, and $\varphi$ is the matrix of pairwise kinship coefficients. LMMs have been widely used in large-scale genetic studies, including those with binary outcomes. However, LMMs assume that the trait has constant residual variance, an assumption that is usually violated by binary traits in the presence of covariates and that leads to failure to control type 1 error (Chen et al. 2016). A new method called GMMAT, based on a logistic mixed model, was proposed to correct for population structure in genetic studies with binary outcomes (Chen et al. 2016). For a single-variant test, the logistic mixed model can be represented as follows:

$$\mathrm{logit}(\pi_i) = X_i\alpha + G_i\beta + b_i$$

where $\pi_i = P(y_i = 1 \mid X_i, G_i, b_i)$ is the probability of the binary phenotype for subject $i$, $X_i$ is the vector of covariates, $\alpha$ is the vector of fixed covariate effects, $G_i$ is the genotype of the genetic variant for subject $i$, and $\beta$ is the genotype effect. In addition, $b$ is a vector of random effects with an assumed distribution $b \sim N(0, \sum_{k=1}^{K} \tau_k V_k)$, where $\tau_k$ are the variance component parameters and $V_k$ are the corresponding relationship matrices (here, the GRM). GMMAT was used in the majority of our single-variant analyses because of our case-control study design and multi-level population structure.
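Both mixed models above depend on a genetic relationship matrix. As a minimal sketch of what such a matrix contains, the following computes a GRM from standardized genotypes. The estimator shown (centre each variant by twice its allele frequency, scale by its binomial standard deviation, average the cross-products) is a commonly used one and is an assumption here; the text does not specify which estimator or software was used to build the GRM. The function name and toy data are hypothetical.

```python
def genetic_relationship_matrix(genotypes):
    """Estimate a GRM from allele counts (rows: subjects, columns: variants).

    Each variant is centred by 2p and scaled by sqrt(2p(1-p)), where p is the
    sample allele frequency; entry (i, k) is the average over variants of the
    product of the standardized genotypes of subjects i and k.
    """
    n = len(genotypes)
    n_variants = len(genotypes[0])
    standardized = []
    for j in range(n_variants):
        column = [row[j] for row in genotypes]
        p = sum(column) / (2 * n)
        if p <= 0.0 or p >= 1.0:
            continue  # monomorphic variants carry no information
        sd = (2 * p * (1 - p)) ** 0.5
        standardized.append([(g - 2 * p) / sd for g in column])
    if not standardized:
        raise ValueError("no polymorphic variants available")
    m = len(standardized)
    return [
        [sum(z[i] * z[k] for z in standardized) / m for k in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: 4 subjects, 3 variants.
toy = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]]
grm = genetic_relationship_matrix(toy)
for row in grm:
    print([round(value, 3) for value in row])
```

In practice the GRM is estimated from many thousands of high-quality autosomal variants, and a matrix library would replace the nested loops.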

Project Objectives

In this dissertation we aimed to address the hypothesis that the disproportionate burden of T2D-ESKD in the AA community is attributable to genetic risk loci that can be identified through genetic association analyses. Throughout the following projects we sought to leverage high-density genotyping arrays with exome and customized content, exome sequencing, and a number of statistical and bioinformatic methodologies to aid in the identification of T2D-ESKD loci in AAs. By unifying these genetic resources, we may provide novel insights into the pathogenesis of T2D-ESKD and add to the knowledge base of potential therapeutic targets.

The project presented in Chapter 2 performed a GWAS to investigate the hypothesis that common genetic variants are associated with T2D-ESKD susceptibility in AAs. In this study, we examined 3,432 T2D-ESKD cases and 6,977 non-diabetic non-nephropathy controls across three genome-wide genotyping platforms for association with T2D-ESKD. A discrimination stage was performed in a cohort of 2,756 T2D non-nephropathy subjects to remove variants associated with T2D alone. In addition, T2D-ESKD associated variants were tested in 1,910 AA non-diabetic ESKD patients to evaluate their contribution to general forms of ESKD. A secondary analysis was also conducted with APOL1 renal-risk genotype carriers excluded from cases, given their increased risk of non-diabetic kidney disease. Six variants were identified in association with T2D-ESKD at the genome-wide significance level (P<5×10⁻⁸). This study involved a total of 15,075 AA subjects, which provided substantial power to detect disease loci with moderate effects. The findings suggested that multiple common variants underlie susceptibility to T2D-ESKD in AAs.

In Chapter 3 we utilized whole-exome sequencing data to examine the hypothesis that low-frequency coding variants are novel drivers of T2D-ESKD in AAs. Genetic variants located in coding regions were evaluated in three independent AA cohorts for association with T2D-ESKD and all-cause ESKD through a multi-stage study design. A total of 8,577 subjects, including 2,476 T2D-ESKD, 2,057 non-diabetic non-nephropathy, 1,003 T2D-lacking nephropathy, and 2,129 non-diabetic ESKD individuals, were involved in a series of analyses. Overall, we detected 13 suggestive T2D-ESKD associations (excluding APOL1 G1 alleles) located in 11 distinct regions, including two missense variants, in either the baseline or the APOL1-negative models. Gene-based methods identified seven suggestive associations with T2D-ESKD. The results of this project suggest that genetic variation in coding regions may partially explain the genetic predisposition to T2D-ESKD in AAs.

In Chapter 4 we hypothesized that genetic variants in glomerular and tubulointerstitial structure-related genes may be associated with T2D-ESKD in AAs. Forty-seven candidate genes involved in kidney structure were examined using a high-coverage genotyping array with customized content in the targeted regions, to evaluate the association of common and low-frequency variants with T2D-ESKD in 2,524 T2D-ESKD cases, 1,694 non-diabetic non-nephropathy controls, and 667 T2D-lacking nephropathy individuals. Evidence of association with T2D-ESKD was observed for seven loci (CD2AP, MMP2, TTC21B, COL4A3, NPHP3-ACAD11, CLDN8 and ARHGAP24) in single-variant analysis of either the baseline or the APOL1-negative models. Gene-based analysis and multiple bioinformatic techniques supported a potential cumulative effect and functional relevance at the associated loci. These findings suggest that genetic variation in kidney structure-related genes contributes to predisposition to T2D-ESKD in AAs.

In Chapter 5, we took advantage of exome-sequencing technology to assess the role of genetic variants in CUBN and in LRP2, the gene encoding its transport partner megalin, in AA ESKD patients. The cubilin (CUBN) gene was previously identified as a novel locus for albuminuria; a missense variant, rs1801239 (I2984V), in CUBN was associated with elevated urine albumin-to-creatinine ratio in individuals of European and recent African ancestry (Böger et al. 2011). Cubilin forms a functional receptor complex with megalin (encoded by LRP2) in the proximal tubule to reabsorb filtered urinary albumin (Birn et al. 2000; Dickson et al. 2014). Sixty-six CUBN and LRP2 genetic variants were selected for association with all-cause ESKD in this multistage study. CUBN variant rs1801239 (I2984V) was significantly associated with T2D-ESKD in AAs. A novel LRP2 missense variant, rs17848169 (N2632D), was also significantly protective against T2D-ESKD. However, no evidence of association with non-diabetic etiologies of ESKD was identified for CUBN or LRP2.


Chapter 2

Genome-wide association study in African Americans with T2D-attributed end-stage kidney disease

Meijian Guan, Jacob M. Keaton, BS, Latchezar Dimitrov, Pamela J. Hicks, Jianzhao Xu, John R.

Sedor, Rulan S. Parekh, Denyse Thornley-Brown, FIND Consortium, Nora Franceschini, Joe

Coresh, Myriam Fornage, Adrienne Tin, Anna Kottgen, Jerome I. Rotter, Stephen S. Rich, Ida

Chen, James G Wilson, Laura J Rasmussen-Torvik, Carl Langefeld, Nicholette Allred, Barry I.

Freedman, Donald W. Bowden, Maggie C. Y. Ng


Abstract

End-stage kidney disease (ESKD) is a significant public health problem in the U.S., disproportionately affecting African Americans (AAs), whose incidence rate is 3.1-fold higher than that of European Americans. Diabetes is the leading cause of ESKD, accounting for more than 40% of all ESKD cases. Previous genome-wide association studies (GWASs) have identified >60 genetic variants related to kidney disease or kidney function; however, efforts to uncover genetic susceptibility variants for diabetic kidney disease (DKD) have had limited success. In this study, we extended our previous efforts to examine a large AA population with GWAS data imputed to the 1000 Genomes reference panel. The discovery analysis was performed in 3,432 T2D-ESKD cases and 6,977 non-diabetic non-nephropathy controls (N=10,409), followed by a discrimination analysis with 2,756 T2D non-nephropathy subjects to exclude T2D-associated variants. We identified six independent variants that achieved genome-wide significant association (P<5×10⁻⁸) with T2D-ESKD, located in or close to LOC101929282/RBM43, LINC01322, RBFOX3/MIR4739, ENPP7, GNG7 and APOL1. Following replication analysis in 1,910 non-diabetic ESKD cases and 908 non-diabetic non-nephropathy controls, meta-analysis of 5,342 all-cause ESKD cases and 6,977 non-diabetic non-nephropathy controls revealed an additional novel locus at LINC00460/EFNB2 (rs77113398; P=9.84×10⁻⁹, OR=1.94) as well as rs9622363 in APOL1 (P=1.96×10⁻²⁵, OR=0.68). Exclusion of APOL1 renal-risk genotype carriers identified an additional variant, located in TCF7L2, that achieved genome-wide significant association with T2D-ESKD. These findings provide further evidence for the role of genetic factors impacting advanced kidney disease in diabetic AA individuals.

Introduction

Increasing evidence suggests that genetics plays a major role in end-stage kidney disease (ESKD). This is particularly relevant in African Americans (AAs), in whom incidence rates of ESKD are 3.1-fold greater than in European Americans (EAs) (USRDS, 2016). ESKD is a great health and financial burden in the US. The mortality rates for ESKD, dialysis, and transplant patients were 136, 166 and 30 per 1,000 patient-years, respectively, and ESKD accounts for 7.2% of all Medicare paid claims costs (USRDS, 2016). Diabetes is the leading cause of ESKD, accounting for >44% of all cases of ESKD in the US, of which approximately 90% relate to type 2 diabetes (T2D) (USRDS, 2016). Unfortunately, implementation of glycemic, lipid and blood pressure control has not significantly decreased the prevalence of diabetic kidney disease (DKD) (de Boer et al. 2011; USRDS, 2016). In addition, incidence rates and familial aggregation of DKD remained significant after controlling for socioeconomic status and environmental factors (Spray et al. 1995; Freedman et al. 1995). Although the G1 and G2 alleles of the apolipoprotein L1 gene (APOL1) explain approximately 70% of non-diabetic ESKD in AAs, they do not explain the excess risk of T2D-attributed ESKD (T2D-ESKD) in AAs (Tzur et al. 2010a; Genovese et al. 2010; Kopp et al. 2011).

Genome-wide association studies (GWAS) have identified >70 genome-wide significant variants associated with various kidney diseases and measures of kidney function (Kottgen et al. 2010; Pattaro et al. 2012; Tin et al. 2013a; Pattaro et al. 2016). However, fewer loci have been reported in association with DKD, and these did not replicate consistently owing to limited sample sizes (Maeda 2004; Pezzolesi et al. 2009; McDonough et al. 2011; Sandholm et al. 2012, 2017; Iyengar et al. 2015). Compared with DKD in type 1 diabetes, the etiology of kidney complications in T2D patients is more heterogeneous and thus requires more careful assessment and larger sample sizes to achieve statistical power. To explore the underlying genetic architecture of advanced kidney disease in T2D patients, we extended our previous efforts to perform GWAS in a larger sample of AAs with kidney disease. In the discovery stage, GWAS was performed in 3,432 T2D-ESKD cases and 6,977 non-diabetic non-nephropathy controls, followed by a discrimination analysis with 2,756 T2D non-nephropathy individuals to exclude T2D loci. Replication in an additional cohort of 1,910 non-diabetic ESKD subjects and 908 controls was performed to assess the contribution of T2D-ESKD loci to general forms of kidney disease. Meta-analysis of both diabetic and non-diabetic ESKD individuals was performed to evaluate the overall effect in all-cause ESKD. A secondary analysis excluding APOL1 renal-risk genotype carriers was also conducted to minimize the misclassification of DKD in the GWAS.

Materials and Methods

Study Participants

This study included participants recruited at the Wake Forest School of Medicine (WFSM; N=8,052), the Family Investigation of Nephropathy and Diabetes (FIND; N=926), the Jackson Heart Study (JHS; N=1,912), the Atherosclerosis Risk in Communities Study (ARIC; N=2,221), Coronary Artery Risk Development in Young Adults (CARDIA; N=912) and the Multi-Ethnic Study of Atherosclerosis (MESA; N=1,052). The study was approved by the Institutional Review Board of each participating center, and all participants provided written informed consent. Patients were considered to have T2D-ESKD when diabetes had been diagnosed ≥5 years prior to the onset of ESKD (or in the presence of diabetic retinopathy, ensuring adequate T2D duration) and one or more of the following were present: renal replacement therapy, estimated glomerular filtration rate (eGFR) ≤30 ml/min/1.73 m², or urine albumin:creatinine ratio (UACR) >300 mg/g. T2D was diagnosed according to the American Diabetes Association criteria with at least one of the following: fasting glucose ≥126 mg/dL, 2-h oral glucose tolerance test glucose ≥200 mg/dL, random glucose ≥200 mg/dL, use of diabetes medications, or physician-diagnosed diabetes. Non-diabetic ESKD cases lacked diabetes or developed diabetes after initiation of renal replacement therapy, and their ESKD was attributed to chronic glomerular disease (e.g. FSGS), HIV-associated nephropathy, hypertension or unknown cause. Patients with ESKD attributed to surgical or urologic causes, polycystic kidney disease, autoimmune disease, hepatitis, IgA nephropathy, membranous glomerulonephritis, membranoproliferative glomerulonephritis, or monogenic kidney diseases were excluded. Non-diabetic non-nephropathy controls included participants without diabetes or kidney disease (eGFR >60 ml/min/1.73 m² and UACR <30 mg/g). Individuals with T2D-lacking nephropathy had eGFR ≥60 ml/min/1.73 m² and UACR <30 mg/g.

Sample Preparation, genotyping, imputation and quality control

Three sources of genetic data were analyzed in this study: 1) 8,704 samples, recruited from WFSM, ARIC, CARDIA, JHS, MESA and FIND, genotyped on the Affymetrix Genome-wide

Human SNP array 6.0 (Affy6.0); 2) 3,133 samples (WFSM) with data from Affymetrix Axiom

Biobank Genotyping Array (Axiom); and 3) 3,238 samples (WFSM) genotyped with Illumina

Multi-Ethnic Genotyping Array (MEGA). Quality control and imputation were performed in each platform separately.

Variants that passed QC were imputed to a combined cosmopolitan reference haplotype panel from the African Genome Variation Project (AGVP) (Gurdasani et al. 2015) and the 1000 Genomes Project phase 3 (1000 Genome Consortium 2010) using SHAPEIT2 (Delaneau et al. 2012) and IMPUTE2 (Marchini et al. 2007). Post-imputation QC was conducted to exclude variants with an allele mismatch, a large frequency discrepancy (≥0.2) from the reference panel (expected frequency computed as 0.2 × frequency in EUR + 0.8 × frequency in AFR), or an imputation info score <0.4.
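The post-imputation filter can be sketched as follows. The 0.2/0.8 weights, the 0.2 frequency threshold and the 0.4 info threshold come from the text; the function names, argument layout and example values are illustrative assumptions and do not reproduce the actual pipeline.

```python
def expected_aa_frequency(freq_eur, freq_afr, eur_weight=0.2, afr_weight=0.8):
    """Expected allele frequency in AAs as a weighted mix of EUR and AFR."""
    return eur_weight * freq_eur + afr_weight * freq_afr


def passes_post_imputation_qc(freq_study, freq_eur, freq_afr, info,
                              alleles_match=True,
                              max_freq_diff=0.2, min_info=0.4):
    """Return True if an imputed variant survives the three exclusion rules."""
    if not alleles_match:
        return False  # allele mismatch with the reference panel
    if info < min_info:
        return False  # poorly imputed variant
    expected = expected_aa_frequency(freq_eur, freq_afr)
    # Exclude variants whose frequency differs from expectation by >= 0.2.
    return abs(freq_study - expected) < max_freq_diff


# Hypothetical variants: (study frequency, EUR frequency, AFR frequency, info)
print(passes_post_imputation_qc(0.35, 0.50, 0.25, 0.90))  # True
print(passes_post_imputation_qc(0.60, 0.50, 0.25, 0.90))  # False, frequency
print(passes_post_imputation_qc(0.35, 0.50, 0.25, 0.30))  # False, info score
```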

Affy6.0 datasets

As described in detail previously (Ng et al. 2013), 1,513 T2D-ESKD cases, 5,299 non-diabetic non-nephropathy controls, and 1,892 T2D non-nephropathy subjects from the WFSM, FIND, JHS, ARIC, CARDIA, and MESA cohorts were genotyped using Affy6.0 (Table 1). In each study, standard quality controls (QCs) were applied to exclude variants with call rate <95%, minor allele frequency (MAF) <0.01, or departure from Hardy-Weinberg equilibrium (HWE; P<1×10⁻⁴). Sample QC was performed to exclude subjects with call rates <95%, contamination, duplicates, or population outliers. Given that CARDIA, JHS and MESA did not have T2D-ESKD cases, variants that passed QC in each study were combined for imputation and association analysis.

Axiom dataset

In WFSM, a total of 1,700 AA participants with T2D-ESKD, 770 AA controls without diabetes or nephropathy, and 663 AAs with T2D who lacked evidence of nephropathy were genotyped on a customized Axiom genotyping array. Detailed variant information, custom content design (including fine mapping of candidate regions), genotyping methods, and quality control (QC) are described in a previous report (Guan et al. 2016). In brief, this array includes approximately 264K coding variants and insertions/deletions (indels), 70K loss-of-function variants, 2K pharmacogenomic variants, 23K eQTL markers, 246K multi-ethnic population-based genome-wide tag markers, and 115K custom content markers. A total of 724,530 variants were successfully called for downstream QC and analyses. Variants with call rates <95%, departure from Hardy-Weinberg equilibrium (HWE) (P<0.0001), and monomorphic variants were excluded. Sample QC was also performed to exclude individuals with low call rate, gender discordance, DNA contamination or non-AA ancestry. Duplicate samples were identified, and one of each duplicate pair was removed.
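The variant-level exclusions above (call rate, HWE, monomorphic) can be sketched in a few lines. The text does not state which HWE test was used; the sketch substitutes a simple 1-degree-of-freedom chi-square goodness-of-fit test, whereas many pipelines use an exact test instead. Function names and genotype counts are hypothetical.

```python
import math


def hwe_chi_square_p(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic: the test is undefined
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_variant_qc(n_aa, n_ab, n_bb, n_missing,
                      min_call_rate=0.95, hwe_p_threshold=1e-4):
    """Apply the call-rate, monomorphic and HWE filters to one variant."""
    called = n_aa + n_ab + n_bb
    total = called + n_missing
    if total == 0 or called / total < min_call_rate:
        return False
    if (2 * n_aa + n_ab) in (0, 2 * called):
        return False  # monomorphic variant
    return hwe_chi_square_p(n_aa, n_ab, n_bb) >= hwe_p_threshold


print(passes_variant_qc(25, 50, 25, 0))   # True: in equilibrium
print(passes_variant_qc(50, 0, 50, 0))    # False: no heterozygotes
print(passes_variant_qc(100, 0, 0, 0))    # False: monomorphic
print(passes_variant_qc(25, 50, 25, 10))  # False: call rate below 95%
```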

MEGA dataset

In WFSM, 1,910 non-diabetic ESKD cases, 219 T2D-ESKD cases, 201 T2D-lacking nephropathy subjects, and 908 non-diabetic non-nephropathy controls were genotyped on the MEGA array. This array was designed by the Population Architecture using Genomics and Epidemiology (PAGE) consortium and Illumina to improve fine-mapping and functional discovery by increasing variant coverage across multiple ethnicities. MEGA content includes two major categories: 1) backbone content containing highly informative variants for GWAS and exome analyses in ancestrally diverse populations, and 2) custom content used to replicate or generalize index GWAS associations, augment GWAS tagging variants in priority regions, enhance exome content in priority regions, fine-map GWAS loci, identify functional regulatory variants, explore medically important variants, and identify novel variant loci in candidate pathways (Bien et al. 2016). Genotyping was performed at Wake Forest School of Medicine. DNA from cases and controls was equally interleaved on 96-well plates to minimize artifactual errors during sample processing. A total of 48 samples from the Coriell Institute for Medical Research that had been sequenced as part of the 1000 Genomes Project (1000 Genome Consortium 2010) were included in genotyping and had a concordance rate of 98.57%. Genotype calling was performed using GenomeStudio (Illumina, CA, USA). A total of 1,705,970 variants were successfully called for downstream quality control (QC) and analyses. Variants with missing position or allele, allele mismatch, call rates <95%, departure from HWE (P<0.0001), or a frequency difference >0.2 compared with the 1000 Genomes Project phase 3 reference panel, as well as monomorphic variants, were removed. Where a variant was targeted by multiple probe sets, only the probe set with the highest call rate was kept. Sample QC was also performed to exclude individuals with low call rate, gender discordance, DNA contamination or non-AA ancestry. Sample swaps were identified and corrected. For duplicate samples, one of each duplicate pair was removed.

Statistical Analysis

Discovery stage

In the discovery stage, association analysis was performed for each dataset using a logistic mixed model method implemented in the program GMMAT (Chen et al. 2016) under an additive genetic model. This method controls for population structure and cryptic relatedness by including a genetic relationship matrix (GRM), estimated from a set of high-quality autosomal variants, as a random effect. Principal component analysis was performed using EIGENSOFT (Patterson et al. 2006). The first eigenvector (PC1), age and sex were used as covariates. A meta-analysis of the three datasets was performed using the fixed-effect inverse-variance weighting method implemented in METAL (Willer et al. 2010). Suggestive associations with T2D-ESKD (P<1×10⁻⁵) were selected for discrimination analysis.
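The fixed-effect inverse-variance weighting step can be written out explicitly. The sketch below shows the standard calculation for one variant given per-dataset effect estimates and standard errors; it is not METAL itself, and the function name and example numbers are hypothetical.

```python
import math


def inverse_variance_meta(betas, standard_errors):
    """Fixed-effect inverse-variance weighted meta-analysis of one variant.

    betas: per-dataset log odds ratios
    standard_errors: per-dataset standard errors of those estimates
    Returns the pooled beta, its standard error, the Z score and the
    two-sided P-value under a normal approximation.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


# Hypothetical estimates for one variant from three genotyping platforms.
beta, se, z, p = inverse_variance_meta([0.25, 0.18, 0.30], [0.08, 0.07, 0.15])
print(f"beta={beta:.3f} se={se:.3f} OR={math.exp(beta):.2f} P={p:.2e}")
```

Datasets with smaller standard errors receive larger weights, so the pooled estimate is dominated by the most precisely estimated effects.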


Discrimination stage

To determine whether T2D-ESKD associations from the discovery stage were driven by association with T2D itself, 2,756 AA subjects with T2D-lacking nephropathy and 6,977 non-diabetic non-nephropathy controls from the three datasets were examined. Variants showing nominal association (P<0.05) with T2D were excluded.
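The two-step selection logic (suggestive association with T2D-ESKD in discovery, no nominal association with T2D in discrimination) can be expressed as a simple filter. The thresholds come from the text; the record layout, field names and variant labels are hypothetical.

```python
def select_after_discrimination(results,
                                discovery_threshold=1e-5,
                                t2d_threshold=0.05):
    """Keep variants suggestive for T2D-ESKD but not associated with T2D."""
    return [
        r["variant"]
        for r in results
        if r["p_discovery"] < discovery_threshold
        and r["p_t2d"] >= t2d_threshold
    ]


# Hypothetical per-variant summary statistics.
example_results = [
    {"variant": "var_a", "p_discovery": 3e-8, "p_t2d": 0.40},   # kept
    {"variant": "var_b", "p_discovery": 2e-6, "p_t2d": 0.01},   # T2D-driven
    {"variant": "var_c", "p_discovery": 4e-4, "p_t2d": 0.70},   # not suggestive
]

print(select_after_discrimination(example_results))  # ['var_a']
```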

Replication and meta-analysis of all-cause ESKD

Genetic variants that showed suggestive association (P<1×10⁻⁵) in the T2D-ESKD meta-analysis and passed the discrimination test were examined in a non-diabetic ESKD cohort containing 1,910 non-diabetic ESKD cases and 908 non-diabetic non-nephropathy controls for association with non-diabetic etiologies of kidney disease. Variants showing nominal association (P<0.05) were included in a meta-analysis of all-cause ESKD using all T2D-ESKD cases, non-diabetic ESKD cases and controls from the three datasets (N=12,319). This meta-analysis evaluated whether T2D-ESKD associations contributed to the risk of all-cause ESKD.

Exclusion of APOL1 risk genotype carriers

APOL1 G1 and G2 risk alleles explain nearly 70% of the genetic susceptibility to non-diabetic kidney disease in AAs (Genovese et al. 2010). To minimize misclassification of T2D-ESKD, we performed a secondary analysis excluding APOL1 renal-risk-variant carriers and those with missing APOL1 genotypes from the T2D-ESKD samples (APOL1-negative model). This analysis reduced the heterogeneity of our case group, at the cost of a smaller sample size and lower statistical power. Specifically, we removed 308 of 1,513 T2D-ESKD cases from the Affy6.0 datasets, 323 of 1,700 T2D-ESKD cases from the Axiom dataset, and 33 of 219 from the MEGA dataset. In addition, we excluded 891 of 1,910 non-diabetic ESKD cases, considering the great impact of APOL1 on non-diabetic kidney disease; this may help to “unmask” the effects of non-diabetic ESKD variants other than the APOL1 risk alleles. Individuals were considered APOL1 renal-risk-variant carriers if they carried two G1 alleles (rs60910145 G allele, rs73885319 G allele), two G2 alleles (rs143830837, a 6-bp in-frame deletion), or were compound heterozygotes (one G1 and one G2 allele) (Genovese et al. 2010).
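The carrier definition reduces to a recessive rule on the total number of G1 and G2 risk alleles. The sketch below assumes that the per-subject counts of G1 and G2 alleles have already been derived from the genotyped variants (that derivation is not shown); the function names are hypothetical.

```python
def is_apol1_renal_risk_carrier(n_g1, n_g2):
    """True for G1/G1, G2/G2 or G1/G2 (two risk alleles in total)."""
    return n_g1 + n_g2 >= 2


def keep_in_apol1_negative_model(n_g1, n_g2):
    """Cases are kept only if genotyped and not renal-risk carriers.

    A count of None marks a missing APOL1 genotype, which is also excluded.
    """
    if n_g1 is None or n_g2 is None:
        return False
    return not is_apol1_renal_risk_carrier(n_g1, n_g2)


print(is_apol1_renal_risk_carrier(2, 0))      # True: two G1 alleles
print(is_apol1_renal_risk_carrier(1, 1))      # True: compound heterozygote
print(is_apol1_renal_risk_carrier(1, 0))      # False: one risk allele
print(keep_in_apol1_negative_model(None, 0))  # False: missing genotype
```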

Transferability of previous kidney disease and kidney function loci

We queried the GWAS catalog for genome-wide significant kidney disease and related phenotypes, including CKD, DKD, ESKD, eGFR, and albuminuria. A total of 77 associated variants located in 71 regions were found and evaluated for association with T2D-ESKD as well as all-cause ESKD in our data set.

Functional characterization

A publicly available expression quantitative trait loci (eQTL) database, GTEx (http://www.gtexportal.org/home/), was used to explore the potential influence of T2D-ESKD associated variants on the expression of nearby genes. We queried the 16 top associations from either the baseline or the APOL1-negative models in GTEx across multiple tissues. In addition, functional annotation of genetic variants was performed with ANNOVAR to assess the functional relevance of the identified variants (Wang et al. 2010).

Results

Study overview

Association analyses were performed in six independent AA cohorts (WFSM, FIND, ARIC, MESA, JHS, CARDIA) for T2D-ESKD or non-diabetic ESKD through a multi-stage study design (Figure 1). We brought together a total of 15,075 AA subjects classified into four phenotypic groups: T2D-ESKD (N=3,432), non-diabetic non-nephropathy controls (N=6,977), T2D-lacking nephropathy (N=2,756), and non-diabetic ESKD (N=1,910). This study has 80% power to detect common variants (MAF≥0.10) with moderate effects (OR≥1.3) at a significance level of α=5×10⁻⁸ (http://csg.sph.umich.edu/abecasis/cats/). Altogether, we identified seven genome-wide significant loci associated with T2D-ESKD in either the baseline model or the APOL1-negative model, including LOC101929282/RBM43, LINC01322, RBFOX3/MIR4739, ENPP7, GNG7, APOL1, and TCF7L2. In addition, one locus, LINC00460/EFNB2, reached genome-wide significance in a meta-analysis of 5,342 all-cause ESKD cases and 6,977 controls under the baseline model. Eight additional loci showed suggestive association with all-cause ESKD: LPP, FSTL5, OPRK1/ATP6V1H, SYBU/KCNV1, ALK/YPEL5, MNX1-AS1/UBE3C, ZFHX4/MIR3149, and ZNF536.

Clinical characteristics of study participants

The detailed characteristics of study participants from all stages are described in Table 1 and Table 2. All the ESKD subjects were contributed by three WFSM datasets (Affy6.0, Axiom, MEGA), FIND and ARIC. Individuals with T2D-ESKD and T2D-lacking nephropathy were generally older than, or similar in age to, the non-diabetic non-nephropathy controls at recruitment. However, the average age at diagnosis of T2D in T2D-ESKD and T2D-lacking nephropathy individuals was younger than the age of the healthy controls at recruitment. T2D-lacking nephropathy subjects and non-diabetic, non-nephropathy controls across all studies had normal kidney function (eGFR>60 ml/min/1.73m2). In addition, all controls had normal fasting glucose levels (<126 mg/dl).

Individuals with T2D-lacking nephropathy were more obese than individuals with diabetic or non-diabetic ESKD and healthy controls except in ARIC, where T2D-ESKD was the most obese group.

Stage 1 T2D-ESKD association analysis

In stage 1 discovery, we conducted genome-wide association analyses separately in three datasets: 1) 1,513 T2D-ESKD cases and 5,299 non-diabetic non-nephropathy controls genotyped on Affy6.0, contributed by WFSM, FIND, ARIC, JHS, MESA and CARDIA (stage 1a); 2) 1,700 T2D-ESKD cases and 770 non-diabetic non-nephropathy controls from WFSM genotyped with the Axiom Biobank genotyping array (stage 1b); and 3) 219 T2D-ESKD cases and 908 non-diabetic, non-nephropathy controls from WFSM genotyped on MEGA (stage 1c). A meta-analysis (stage 1d) followed to combine association results for 3,432 T2D-ESKD cases and 6,977 non-diabetic, non-nephropathy controls from stages 1a, 1b and 1c (stage 2).

Variants with cumulative minor allele count (MAC) <400 were filtered out. Meta-analysis of 3,432 T2D-ESKD cases and 6,977 non-diabetic, non-nephropathy controls yielded an inflation factor λ=1.013 after genomic control correction (Supplementary Figure 1), suggesting that population structure and cryptic relatedness were sufficiently adjusted for. We excluded an additional 59 variants with I2>=80 due to their high heterogeneity in the meta-analysis. The 478 variants that reached suggestive significance (P<1x10-5) were further assessed in a discrimination analysis.
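To make the meta-analysis filters concrete, the sketch below combines per-dataset log odds ratios with a fixed-effect, inverse-variance weighted model and derives Cochran's Q and I2. The text does not state which meta-analysis model or software was used, so the weighting scheme is an assumption, and the function names are hypothetical. Only the thresholds (MAC>=400, I2<80, P<1x10-5) are taken from the text.

```python
import math
from statistics import NormalDist


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect meta-analysis of log odds ratios.

    Returns (pooled beta, pooled SE, two-sided P, I2 in percent).
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    p = 2.0 * NormalDist().cdf(-abs(beta / se))
    # Cochran's Q and I2 = max(0, (Q - df) / Q), expressed in percent.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return beta, se, p, i2


def passes_stage1_filters(mac, p, i2, min_mac=400, p_cut=1e-5, i2_cut=80.0):
    """Apply the MAC, heterogeneity and suggestive-significance thresholds."""
    return mac >= min_mac and i2 < i2_cut and p < p_cut


# Illustrative (made-up) inputs: three datasets with concordant effects.
beta, se, p, i2 = fixed_effect_meta([0.40, 0.35, 0.45], [0.08, 0.12, 0.20])
print(round(math.exp(beta), 2), p, round(i2, 1))
```

A variant with opposite-direction effects across datasets would produce a large Q and an I2 near 100, and would be dropped by the heterogeneity filter.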

Stage 2 discrimination analysis

To determine whether the T2D-ESKD associations identified in the meta-analysis were driven by association with T2D, we performed a discrimination analysis on T2D, comparing 2,756 AA T2D-lacking nephropathy subjects with 6,977 non-diabetic non-nephropathy controls from stage 1 (Supplementary Table 1). We excluded 174 of the 478 T2D-ESKD associated variants because they were nominally associated with T2D-lacking nephropathy. Among the remaining T2D-ESKD associations, 10 variants representing 6 independent loci achieved genome-wide significance (Table 3). The top association was rs9622363 (P=1.42x10-10, OR=0.77), located in APOL1. This variant was in moderate linkage disequilibrium (LD; r2=0.33 and 0.34, YRI) with the APOL1 G1 alleles (rs60910145, rs73885319) that are associated with non-diabetic ESKD (Genovese et al. 2010). In contrast, the APOL1 G1 and G2 alleles have shown a substantially greater impact on non-diabetic ESKD, with ORs ranging from 6.7 to 11 (Genovese et al. 2010; Tzur et al. 2010b; Friedman et al. 2011). A variant located in a non-protein coding RNA gene, LINC01322, rs58627064 (P=6.81x10-10, OR=1.62), demonstrated the second most significant association. Two independent signals on chromosome 17, rs148187038 (P=1.71x10-8, OR=0.74) at RBFOX3/MIR4739 and rs142671759 (P=5.53x10-9, OR=2.26) at ENPP7, also showed genome-wide significance. In addition, two genome-wide significant T2D-ESKD associations, rs4807299 (P=3.21x10-8, OR=1.67) located in GNG7 and rs72858591 (P=4.54x10-8, OR=1.43) located in LOC101929282/RBM43, were identified (Figure 2A).
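The discrimination step described in this section amounts to a filter on the T2D association P value. The sketch below applies it to the six lead variants, using the P values reported in Supplementary Table 1 (T2D-lacking nephropathy vs. non-diabetic non-nephropathy controls). The 0.05 cut-off is the nominal-significance threshold used elsewhere in this chapter and is assumed to apply here; the names `T2D_P` and `passes_discrimination` are hypothetical.

```python
NOMINAL_P = 0.05  # assumed nominal-significance threshold for T2D association

# P values for association with T2D, from Supplementary Table 1.
T2D_P = {
    "rs72858591": 0.073,
    "rs58627064": 0.083,
    "rs148187038": 0.060,
    "rs142671759": 0.070,
    "rs4807299": 0.15,
    "rs9622363": 0.25,
}


def passes_discrimination(p_t2d: float, threshold: float = NOMINAL_P) -> bool:
    """Retain a variant only if it is NOT nominally associated with T2D."""
    return p_t2d >= threshold


retained = [rsid for rsid, p in T2D_P.items() if passes_discrimination(p)]
print(retained)
```

All six lead variants are retained, consistent with their appearance in Table 3.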

Stage 3 non-diabetic ESKD analysis and stage 4 all-cause ESKD meta-analysis

We tested 245 variants that indicated suggestive association (P<1x10-5) with T2D-ESKD, passed the discrimination stage, and had I2<80 in an independent non-diabetic ESKD AA cohort (N=1,910). The goal of this analysis (stage 3) was to evaluate the contribution of T2D-ESKD association loci to non-diabetic kidney disease. A total of 45 variants demonstrated nominal significance (P<0.05) in stage 4; 20 of the 45 were located in the APOL1 and MYH9 regions, which further confirmed their role in non-diabetic kidney disease in AAs. An all-cause ESKD meta-analysis, including 5,342 all-cause ESKD cases and 6,977 non-diabetic non-nephropathy controls, was conducted to evaluate the generalizability of the 45 T2D-ESKD variants to broader forms of kidney disease (stage 4). We identified 35 genome-wide significant variants associated with all-cause ESKD from two regions, 15 from LINC00460/EFNB2 and 20 from the APOL1 region (Figure 2B). The lead association in APOL1 was rs9622363 (P=1.96x10-25, OR=0.68), and the top signal in LINC00460/EFNB2 was rs77113398 (P=9.84x10-9, OR=1.94) (Table 4). Four additional independent loci also demonstrated suggestive association (P<5x10-6) with all-cause ESKD: LPP, FSTL5, OPRK1/ATP6V1H, and SYBU/KCNV1 (Table 4).

Association analysis with exclusion of APOL1 renal-risk carriers

We performed a secondary analysis excluding APOL1 G1 and G2 carriers from the T2D-ESKD cases (APOL1-negative model) due to their substantially increased risk of non-diabetic kidney disease. The baseline model showed strong association of APOL1 and MYH9 with T2D-ESKD, which suggests that some cases may have been misclassified as having T2D-ESKD. We excluded 664 T2D-ESKD cases from stage 1 and stage 2 analyses, which left 2,768 T2D-ESKD cases in the analyses. A total of 458 variants that showed suggestive association with T2D-ESKD (P<1x10-5) were selected for stage 3 discrimination analysis. We dropped 153 variants that indicated evidence of association with T2D. In addition, we removed 27 associations that showed strong heterogeneity (I2>=80) in the meta-analysis. Two genome-wide significant variants identified in the baseline model replicated consistent association with T2D-ESKD in the APOL1-negative model: rs72858591 (P=3.22x10-8, OR=1.47) in LOC101929282/RBM43 and rs142671759 (P=4.10x10-8, OR=2.30) in ENPP7. We identified an additional variant that reached genome-wide significance, rs73358292 (P=1.36x10-8, OR=1.60), located in TCF7L2 (Table 5). This variant showed modest LD with rs7903146 (r2=0.22, YRI), which is the most significant association with T2D in AAs (Ng et al. 2014a); however, rs73358292 was not associated with T2D in this study (P=0.070; Supplementary Table 2). We further tested 278 suggestive T2D-ESKD associations that passed discrimination and had I2<80 in an additional 1,019 AA non-diabetic ESKD subjects with APOL1 renal-risk carriers excluded. The 14 variants showing nominal evidence of association with non-diabetic ESKD were further tested in an all-cause ESKD meta-analysis without APOL1 renal-risk carriers (N=10,764). No genome-wide significant association was observed. However, five loci showed suggestive association (P<5x10-6) with all-cause ESKD, including LPP, previously associated with all-cause ESKD in the baseline model, as well as four additional loci: ALK/YPEL5, MNX1-AS1/UBE3C, ZFHX4/MIR3149 and ZNF536 (Table 6). Of note, all the top associations from the baseline model showed moderate attenuation in significance despite similar effect sizes, due in part to the reduced sample size (Supplementary Table 3).

Evaluation of previous kidney disease and related GWAS associations

We evaluated 76 variants from 71 regions that were associated with kidney disease or related traits in previous reports for association with T2D-ESKD as well as all-cause ESKD (Supplementary Table 4). Three variants replicated directionally consistent association with T2D-ESKD or all-cause ESKD in either the baseline model or the APOL1-negative model (Table 7). A “stop gained” variant, rs1044261, located in IDI2 and previously associated with CKD (Pattaro et al. 2016), replicated consistent association with both T2D-ESKD (P=0.013, OR=1.26) and all-cause ESKD (P=0.014, OR=1.24) in the baseline model. Notably, after excluding APOL1 renal-risk carriers, we observed stronger association at rs1044261 for both T2D-ESKD (P=0.0006, OR=1.41) and all-cause ESKD (P=0.00053, OR=1.39). This may indicate a potential interaction between IDI2 and the APOL1 G1 and G2 alleles. An eGFR-associated variant, rs9895661 (Köttgen et al. 2010), from BCAS3 showed association with both T2D-ESKD (P=0.031, OR=1.08) and all-cause ESKD (P=0.034, OR=1.08) in the baseline model. In addition, another variant previously associated with eGFR, rs17536527, located in SPATA5L1/C15orf48 (Tin et al. 2013a), replicated association with T2D-ESKD (P=0.040, OR=1.09) in the APOL1-negative model.

In addition, we examined top associations from our previous T2D-ESKD GWAS (McDonough et al. 2011), located in RPS12/HMGB1P13, SASH1, LINC00484/AUH, PCNPP3/RPSAP52, and LIMK2. They all showed attenuated association in this study (Supplementary Table 5).

Discussion

We performed a high-density GWAS to investigate genetic susceptibility to T2D-ESKD in 15,075 AAs. Top T2D-ESKD associated variants were further tested for association with non-diabetic ESKD and in an all-cause ESKD meta-analysis to test their generalizability to common forms of ESKD. Seven independent genetic loci achieved genome-wide significant association with T2D-ESKD in either the baseline or the APOL1-negative models, including LOC101929282/RBM43, LINC01322, RBFOX3/MIR4739, ENPP7, GNG7, APOL1, and TCF7L2. We also identified two genome-wide significant all-cause ESKD loci located in LINC00460/EFNB2 and APOL1. In addition, eight genetic loci demonstrated suggestive association (P<5x10-6) with all-cause ESKD, including LPP, FSTL5, OPRK1/ATP6V1H, SYBU/KCNV1, ALK/YPEL5, MNX1-AS1/UBE3C, ZFHX4/MIR3149, and ZNF536.

An intronic variant, rs9622363, located in the APOL1 region, revealed the most significant association with T2D-ESKD (OR=0.77, P=1.42x10-10) as well as all-cause ESKD (OR=0.68, P=1.96x10-25) in the baseline model. Rs9622363 was originally identified along with the APOL1 G1 and G2 alleles for association with non-diabetic kidney disease in individuals with African ancestry, and conditioning on APOL1 G1 and G2 dramatically diminished its significance (Genovese et al. 2010). In a more recent study, rs9622363 and the APOL1 G1 alleles formed a haplotype that achieved the strongest association with CKD in Nigerians (Tayo et al. 2013). Unlike G1 or G2, the major allele (G, allele frequency=0.57) of rs9622363 is responsible for the increased disease risk. After the removal of APOL1 risk carriers, the association of rs9622363 was attenuated, which confirmed that rs9622363 and the APOL1 G1 and G2 alleles contribute to the same signal. The appearance of rs9622363 in the baseline model may be due in part to individuals misclassified as T2D-ESKD cases.

An intergenic variant (rs72858591) located between an RNA gene, LOC101929282, and RBM43, which encodes RNA binding motif protein 43, revealed genome-wide significance. An independent intergenic variant in this region (rs7560163, r2=0.01, YRI) was previously reported to be associated with T2D in AAs (Palmer et al. 2012). In contrast, rs72858591 showed no evidence of association (P=0.073) with T2D in our study. This may suggest that two different sets of variants in this locus contribute to T2D and T2D-ESKD separately, which suggests a pleiotropic effect of this region. Another intergenic variant (rs148187038) in genome-wide significant association with T2D-ESKD was located between RBFOX3 and a microRNA gene, MIR4739. RBFOX3 encodes an RNA-binding protein, fox-1 homolog 3, which is involved in neural tissue development and regulation of adult brain function. Variations in this gene have been reported in association with neurological disorders (Lucas et al. 2014; Wang et al. 2015). A recent study suggested that RBFOX3 variation interacts with BMI for increased serum urate levels (Huffman et al. 2015). The biological relationship between the RBFOX3/MIR4739 region and T2D-ESKD remains unclear and requires further assessment.

There were two additional genome-wide significant T2D-ESKD signals, rs142671759 (P=5.53x10-9) and rs4807299 (P=3.21x10-8), located in ENPP7 and GNG7. The protein encoded by ENPP7 is an intestinal alkaline sphingomyelin phosphodiesterase that converts sphingomyelin to ceramide and phosphocholine. This gene has been reported to affect cholesterol absorption in an animal study (Zhang et al. 2014). ENPP7 may be associated with DKD, as numerous studies have suggested that high-density lipoprotein cholesterol is a risk factor for kidney disease in diabetic patients (Chang et al. 2013; Williams and Conway 2017; Ceriello et al. 2017). GNG7 encodes G Protein Subunit Gamma 7, which has been implicated in central nervous system function (Schwindinger et al. 2003) as well as multiple types of cancer (Ohta et al. 2008; Demokan et al. 2013).

Analyses excluding APOL1 renal-risk genotype carriers from the T2D-ESKD cases provided an opportunity to uncover the genetic architecture of T2D-ESKD. In the APOL1-negative model, in addition to LOC101929282/RBM43 and ENPP7, which were identified in the baseline model, one variant in TCF7L2 achieved genome-wide significant association with T2D-ESKD. TCF7L2 is a well-established T2D locus implicated in multiple ethnic groups including AAs (Saxena et al. 2013; DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium et al. 2014; Ng et al. 2014a). The variant we identified in this study had no evidence of association with T2D (P=0.070) and very modest LD (r2=0.22, YRI) with the known T2D variant rs7903146. Furthermore, variations in TCF7L2 previously demonstrated compelling correlation with the development of kidney disease in both diabetic and non-diabetic individuals (Köttgen et al. 2008; Araoka et al. 2010; Fan et al. 2016). In addition, experimental results suggest that TCF7L2 impacts the development of DKD through regulating the activin receptor-like kinase 1 (ALK1)/Smad1 pathway (Araoka et al. 2010). Taken together, our finding provides further evidence of the involvement of TCF7L2 in advanced DKD. The pleiotropic effect of TCF7L2 on T2D and kidney disease will require further investigation.

We included a cohort of AA individuals with non-diabetic ESKD to evaluate the generalizability of T2D-ESKD associated loci to broader forms of kidney disease. The meta-analysis combining both T2D-ESKD and non-diabetic ESKD samples (baseline model) revealed two genome-wide significant loci associated with all-cause ESKD, with lead variants rs77113398 (P=9.84x10-9) in LINC00460/EFNB2 and rs9622363 (P=1.96x10-25) in APOL1. Genome scans in AAs identified significant evidence for linkage to ESKD on chromosome 13q33, containing the EFNB2 region, in both diabetic and non-diabetic ESKD individuals (Bowden et al. 2004; Freedman et al. 2005). A follow-up study examined 28 tag SNPs spanning the 39 kilobases (kb) of the EFNB2 coding region for association with all-cause ESKD; nominal associations were observed at two SNPs independent of rs77113398 (Hicks et al. 2008). Ephrin-B2 (EFNB2) is expressed in the developing nephron, and interactions between ephrin-B2 and its receptors appear to play an important role in glomerular microvascular assembly (Takahashi et al. 2001). In addition, ephrin-B2 reverse signaling protects against peritubular capillary rarefaction by regulating angiogenesis and vascular stability during kidney injury (Kida et al. 2013). Ephrin-B1 was also found to co-localize with CD2-associated protein (CD2AP) and nephrin at the podocyte slit diaphragm and plays an important role in maintaining barrier function at the slit diaphragm (Hashimoto et al. 2007), and ephrin B4 receptor kinase transgenic mice develop glomerulopathy, manifested by fused afferent and efferent arterioles bypassing the glomeruli (Andres et al. 2003). Taken together, multiple lines of evidence support that EFNB2 is associated with various kidney functions, and it is the most promising causal gene underlying the association of rs77113398. A more comprehensive investigation of the EFNB2 gene is essential to further understand the pathophysiology associated with ESKD.

The extended sample size provided an opportunity to evaluate previous genetic associations with kidney disease or its related traits. We observed directionally consistent association at three loci, IDI2, BCAS3 and SPATA5L1/C15orf48 (Köttgen et al. 2010; Tin et al. 2013a; Pattaro et al. 2016), previously associated with CKD or eGFR in Europeans. The lack of replication of other previous associations may largely be due to ancestral differences in genetic architecture, since the majority of the associations were identified in Europeans. Another reason is that previous studies mostly focused on early stages of kidney disease, which differ from ESKD with respect to genetic etiology. In addition, many previous DKD associations are likely to be false positives because of limited sample sizes and heterogeneous disease etiologies.

This study has limitations. Although the multi-stage study design, including 15,075 AA individuals, is well-powered, it still lacks T2D-ESKD replication. We uncovered several low-frequency variants but lacked additional T2D-ESKD samples in which to replicate them. There are few other existing collections of appropriate AA samples, which limited possible replication studies.

Moreover, it is nearly impossible to exclude all individuals misclassified as DKD due to the frequent lack of kidney biopsies. However, we carefully excluded samples with ESKD attributed to non-diabetic etiologies, and subsequently excluded APOL1 renal-risk-allele carriers with high risk for non-diabetic kidney disease. These steps should minimize misclassification.

To summarize, we conducted a GWAS in AA individuals with T2D-ESKD. Variants in seven genetic loci, LOC101929282/RBM43, LINC01322, RBFOX3/MIR4739, ENPP7, GNG7, APOL1, and TCF7L2, achieved genome-wide significant association with T2D-ESKD. The contribution of top T2D-ESKD associations to non-diabetic etiologies of ESKD was also evaluated. APOL1 and LINC00460/EFNB2 were associated with non-diabetic ESKD, and reached genome-wide significance for association with all-cause ESKD. Future investigations of the newly identified associations, including genetic replication and experimental validation, will be necessary to determine their potential impact on the biological processes related to DKD.


Figure 1. Workflow of T2D-ESKD GWAS in AAs (Baseline model)


Figure 2.A. Locus plots of T2D-ESKD associations at P<5x10-8 in baseline model


Figure 2.B. Locus plots of all-cause ESKD associated variants at P<5x10-8 in baseline model

Abbreviations: T2D, type 2 diabetes; ESKD, end-stage kidney disease; P, p value. Baseline model: adjusted for age, sex and PC1, APOL1 risk genotype carriers included; APOL1-negative model: adjusted for age, sex and PC1, APOL1 risk genotype carriers excluded.


Figure 3. Locus plots of T2D-ESKD associations at P<5x10-8 in APOL1-negative model

Abbreviations: T2D, type 2 diabetes; ESKD, end-stage kidney disease; P, p value. Baseline model: adjusted for age, sex and PC1, APOL1 risk genotype carriers included; APOL1-negative model: adjusted for age, sex and PC1, APOL1 risk genotype carriers excluded.


Table 1. Clinical characteristics of Affy6.0 dataset (stage 1a)

WFSM FIND ARIC JHS MESA CARDIA Non- Non- Non- Non- Non- T2D T2D- T2D- T2D- T2D- T2D- T2D- diabetic T2D- diabetic diabetic diabetic diabetic Characteristics - lacking lacking lacking lacking lacking ESK non- ESK non- non- non- non- ESK nephrop nephrop nephrop nephrop nephrop D nephropat D nephropat nephropat nephropat nephropat D athy athy athy athy athy hy hy hy hy hy N 790 891 627 299 96 1318 807 1569 343 774 278 747 165 63.0 54.2 64.5 Females (%) 56 75.92 62.22 63.07 58.57 67.64 53.88 53.24 60.11 66.06 4 3 8 62.1 56.8 60.3 47.23±11.7 59.77±11. 60.26±5.7 53.37±11.5 59.29±9.0 65.60±9.2 49.39±4.3 Age (years) 1±9. 4±1 4±5. 60.16±6.11 63.84±9.21 47.57±5.03 5 10 9 7 5 2 1 95 1.95 60 Age of onset of 40.3 47.2 57.41±10. 44.92±11. 32.28±7.8 diabetes 5±11 - - - 8±10 - - - - - 79 26 7 (years) .76 .42 Duration of 18.4 20.4 diabetes prior 7±9. - - - 8±8. ------to ESKD 59 59 (years) 3.31 14.6 Duration of ±3.5 - - - 8±5. ------ESKD (years) 8 74 293. 192. Fasting serum 22±1 105.12±9.1 192.42±9 156.49±6 164.26±5 166.63±6 75±6 - - - 89.83±8.48 98.09±9.6 96.44±8.83 glucose (mg/dl) 14.9 7 4.32 5.82 8.50 5.70 4.13 6 eGFR 96.67±21.3 78.58±12.4 79.50±12. 95.53±17.5 92.72±18. 78.87±12.1 82.86±16. 91.05±13.1 95.60±13. - - - - (ml/min/1.73m2) 3 7 63 7 48 0 00 0 58 29.8 33.4 Body mass 32.24±6.7 35.41±7.7 31.83±6.4 1±6. 29.99±6.99 - - 3±7. 29.24±6.12 31.59±7.67 29.84±5.8 - - index (kg/m2) 5 2 8 98 25 Categorical data expressed as percentage; continuous data as mean ± SD. Abbreviations: T2D, type 2 diabetes; ESKD, end-stage kidney disease; N, number; eGFR, estimated glomerular filtration rate


Table 2. Clinical characteristics of Axiom and MEGA datasets (stage 1b and 1c)

                                            Axiom                                            MEGA
Characteristics                             T2D-ESKD       Non-diabetic     T2D-lacking      T2D-ESKD       Non-diabetic     T2D-lacking      Non-diabetic
                                                           non-nephropathy  nephropathy                     non-nephropathy  nephropathy      ESKD
N                                           1700           770              663              219            908              201              1910
Females (%)                                 55.65          48.96            64.4             49.12          58.78            62.1             41.23
Age (years)                                 62.03±10.82    47.89±12.01      55.72±11.62      61.99±10.99    44.77±13.86      55.81±9.51       55.35±14.44
Age of onset of diabetes (years)            39.68±12.77    -                46.20±12.27      37.82±9.58     -                43.74±10.30      -
Duration of diabetes prior to ESKD (years)  19.09±9.97     -                -                20.42±9.53     -                -                -
Duration of ESKD (years)                    3.56±3.58      -                -                4.08±3.09      -                -                6.2±5.81
Fasting serum glucose (mg/dl)               184.26±89.23   95.92±21.89      163.21±92.29     126.75±33.43   96.03±10.51      173.59±61.75     90.33±3.06
eGFR (ml/min/1.73m2)                        33.49±28.56    96.00±20.84      91.3±19.66       -              85.86±17.33      95.23±17.26      19.01±4.64
Body mass index (kg/m2)                     30.74±7.05     29.58±7.37       33.13±7.83       30.8±7.03      29.73±6.61       33.01±6.52       27.75±7.21

Categorical data expressed as percentage; continuous data as mean ± SD. Abbreviations: T2D, type 2 diabetes; ESKD, end-stage kidney disease; N, number; eGFR, estimated glomerular filtration rate


Table 3. Independent T2D-ESKD associations (P<5x10-8) under the baseline model

Stage 1a: 1513 T2D-ESKD cases vs. 5299 non-diabetic non-nephropathy controls. Stage 1b: 1700 T2D-ESKD cases vs. 770 non-diabetic non-nephropathy controls. Stage 1c: 219 T2D-ESKD cases vs. 908 non-diabetic non-nephropathy controls. Stage 2: Meta-analysis (3432 T2D-ESKD cases vs. 6977 non-diabetic non-nephropathy controls).

                                                                   Stage 1a               Stage 1b               Stage 1c               Stage 2
Lead variant  CHR  POS        Locus               Effect/Other     EAF   OR    P          EAF   OR    P          EAF   OR    P          EAF   OR    P
                                                  alleles
rs72858591    2    151711452  LOC101929282/RBM43  C/T              0.10  1.55  1.32E-08   0.09  1.19  0.18       0.10  1.14  0.57       0.10  1.43  4.54E-08
rs58627064    3    165051826  LINC01322           T/G              0.07  1.84  4.38E-11   0.06  1.19  0.27       0.06  1.25  0.46       0.06  1.62  6.81E-10
rs148187038   17   77666704   RBFOX3/MIR4739      A/G              0.22  0.69  3.27E-08   0.23  0.81  0.018      0.25  0.92  0.62       0.23  0.74  1.71E-08
rs142671759   17   77706698   ENPP7               C/T              0.03  2.82  1.53E-10   0.02  1.15  0.66       0.02  1.53  0.40       0.02  2.26  5.53E-09
rs4807299     19   2570002    GNG7                A/C              0.05  1.95  7.31E-10   0.04  1.12  0.56       0.05  1.16  0.68       0.05  1.67  3.21E-08
rs9622363     22   36656555   APOL1               A/G              0.46  0.78  2.55E-07   0.44  0.77  0.00040    0.48  0.77  0.08       0.45  0.77  1.42E-10

Abbreviations: T2D, type 2 diabetes; ESKD, end-stage kidney disease; CHR, chromosome; POS, position; EAF, effect allele frequency; OR, odds ratio; P, p value; Baseline model: adjusted for age, sex and PC1, APOL1 risk genotype carriers included.


Table 4. All-cause ESKD associated variants at P<5x10-6 in baseline model

Stage 2: 3432 T2D-ESKD cases vs. 6977 non-diabetic non-nephropathy controls. Stage 4: 1910 non-diabetic ESKD cases vs. 908 non-diabetic non-nephropathy controls. Stage 5: Meta-analysis (5342 all-cause ESKD cases vs. 6977 non-diabetic non-nephropathy controls).

                                                               Stage 2                   Stage 4                   Stage 5
Lead variant  CHR  POS        Locus            Effect/Other    EAF     OR    P          EAF    OR    P          EAF    OR    P
                                               alleles
rs76971802    3    188607071  LPP              T/C             0.087   1.35  8.66E-06   0.084  1.42  0.013      0.087  1.35  1.36E-06
rs5863506     4    162909217  FSTL5            TA/T            0.33    1.22  1.64E-06   0.33   1.19  0.033      0.33   1.21  4.57E-07
rs141746998   8    54310938   OPRK1/ATP6V1H    CAT/C           0.031   0.62  9.40E-06   0.034  0.61  0.022      0.031  0.61  1.44E-06
rs11997465    8    110891977  SYBU/KCNV1       C/G             0.28    1.22  3.21E-06   0.28   1.21  0.026      0.28   1.22  6.72E-07
rs77113398    13   107103906  LINC00460/EFNB2  A/G             0.023   1.94  1.25E-07   0.021  1.87  0.025      0.023  1.94  9.84E-09
rs9622363     22   36656555   APOL1            A/G             0.4547  0.77  1.42E-10   0.35   0.42  4.32E-29   0.43   0.68  1.96E-25

Abbreviations: T2D, type 2 diabetes; ESKD, end-stage kidney disease; CHR, chromosome; POS, position; EAF, effect allele frequency; OR, odds ratio; P, p value; Baseline model: adjusted for age, sex and PC1, APOL1 risk genotype carriers included.


Table 5. Independent T2D-ESKD associations (P<5x10-8) under APOL1-negative model

Stage 1a: 1205 T2D-ESKD cases vs. 5299 non-diabetic non-nephropathy controls. Stage 1b: 1377 T2D-ESKD cases vs. 770 non-diabetic non-nephropathy controls. Stage 1c: 186 T2D-ESKD cases vs. 908 non-diabetic non-nephropathy controls. Stage 2: Meta-analysis (2768 T2D-ESKD cases vs. 6977 non-diabetic non-nephropathy controls).

                                                                   Stage 1a                Stage 1b             Stage 1c              Stage 2
Lead variant  CHR  POS        Locus               Effect/Other     EAF    OR    P          EAF    OR    P       EAF    OR    P        EAF    OR    P
                                                  alleles
rs72858591    2    151711452  LOC101929282/RBM43  T/C              0.096  1.62  8.53E-09   0.094  1.22  0.15    0.097  1.17  0.52     0.096  1.47  3.22E-08
rs73358292    10   114806988  TCF7L2              A/C              0.063  1.78  1.97E-08   0.073  1.15  0.37    0.076  2.15  0.0033   0.067  1.60  1.36E-08
rs142671759   17   77706698   ENPP7               T/C              0.025  2.92  1.37E-09   0.019  1.27  0.45    0.016  1.25  0.69     0.023  2.30  4.10E-08

Abbreviations: T2D, type 2 diabetes; ESKD, end-stage kidney disease; CHR, chromosome; POS, position; N, number; EAF, effect allele frequency; OR, odds ratio; P, p value; APOL1-negative model: adjusted for age, sex and PC1, APOL1 risk genotype carriers excluded.


Table 6. All-cause ESKD associated variants at P<5x10-6 in APOL1-negative model

Stage 2: Meta-analysis (2768 T2D-ESKD cases vs. 6977 non-diabetic non-nephropathy controls). Stage 4: 1019 non-diabetic ESKD cases vs. 908 non-diabetic non-nephropathy controls. Stage 5: 3787 all-cause ESKD cases vs. 6977 non-diabetic non-nephropathy controls.

                                                             Stage 2                  Stage 4                Stage 5
Lead variant  CHR  POS        Locus           Effect/Other   EAF    OR    P          EAF    OR    P         EAF    OR    P
                                              alleles
rs12472637    2    30304514   ALK/YPEL5       A/G            0.33   0.82  6.07E-06   0.36   0.82  0.024     0.33   0.83  3.08E-06
rs76971802    3    188607071  LPP             T/C            0.087  1.42  1.38E-06   0.079  1.40  0.040     0.087  1.40  5.39E-07
rs6459733     7    156930550  MNX1-AS1/UBE3C  C/G            0.61   0.81  4.69E-07   0.61   0.81  0.032     0.61   0.81  1.72E-07
rs111267392   8    77815702   ZFHX4/MIR3149   T/C            0.96   0.61  5.33E-06   0.96   0.60  0.029     0.96   0.62  2.83E-06
rs58114373    19   30965425   ZNF536          T/C            0.42   0.81  8.95E-07   0.44   0.82  0.034     0.42   0.81  1.46E-07

Abbreviations: T2D, type 2 diabetes; ESKD, end-stage kidney disease; CHR, chromosome; POS, position; EAF, effect allele frequency; OR, odds ratio; P, p value; APOL1-negative model: adjusted for age, sex and PC1, APOL1 risk genotype carriers excluded.


Table 7. CKD and related associations from previous studies that replicated with nominal significance and consistent direction of effect

Baseline model. Stage 2: Meta-analysis (3432 T2D-ESKD cases vs. 6977 non-diabetic non-nephropathy controls). Stage 5: Meta-analysis (5342 all-cause ESKD cases vs. 6977 non-diabetic non-nephropathy controls).

                                                                                                                                Stage 2                Stage 5
Variant     CHR  POS       Citation               Phenotype  Reported  Reported  Locus              Annotation   Effect/Other   EAF    OR    P         EAF    OR    P
                                                             EA        BETA/OR                                   allele
rs1044261   10   1065710   (Pattaro et al. 2016)  CKD        T         1.15      IDI2               stop_gained  T/C            0.046  1.26  0.013     0.045  1.24  0.014
rs9895661   17   59456589  (Köttgen et al. 2010)  eGFR       C         0.01      BCAS3              intron       C/T            0.48   1.08  0.031     0.49   1.08  0.034

APOL1-negative model. Stage 2: Meta-analysis (2768 T2D-ESKD cases vs. 6977 non-diabetic non-nephropathy controls). Stage 5: 3787 all-cause ESKD cases vs. 6977 non-diabetic non-nephropathy controls.

                                                                                                                                Stage 2                Stage 5
Variant     CHR  POS       Citation               Phenotype  Reported  Reported  Locus              Annotation   Effect/Other   EAF    OR    P         EAF    OR    P
                                                             EA        BETA/OR                                   allele
rs1044261   10   1065710   (Pattaro et al. 2016)  CKD        T         1.15      IDI2               stop_gained  T/C            0.047  1.41  0.0006    0.046  1.39  0.00053
rs17536527  15   45719187  (Tin et al. 2013a)     eGFR       C         1.13      SPATA5L1/C15orf48  intergenic   C/G            0.34   1.09  0.040     0.34   1.06  0.16

Abbreviations: CHR, chromosome; POS, position; EA, effect allele; OR, odds ratio; BETA, regression coefficient; EAF, effect allele frequency; P, p value; Baseline model: adjusted for age, sex and PC1, APOL1 risk genotype carriers included; APOL1-negative model: adjusted for age, sex and PC1, APOL1 risk genotype carriers excluded.


Supplementary Figure 1. QQ plot of GWAS results of T2D-ESKD vs. non-diabetic non-nephropathy controls under baseline model
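As a companion to the QQ plot, the genomic inflation factor λ can be computed from the association P values as the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). The sketch below uses only the Python standard library; the function names are hypothetical, and the P values are a synthetic uniform grid standing in for a null GWAS, not data from this study.

```python
from statistics import NormalDist, median

MEDIAN_CHI2_1DF = 0.4549364  # median of a chi-square distribution with 1 df


def p_to_chi2(p: float) -> float:
    """Convert a two-sided P value (0 < p <= 1) to a 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(p / 2.0)
    return z * z


def genomic_inflation(p_values) -> float:
    """Lambda = median observed chi-square / expected median under the null."""
    return median(p_to_chi2(p) for p in p_values) / MEDIAN_CHI2_1DF


# Synthetic null example: evenly spaced P values should give lambda close to 1.
n = 1001
null_p = [(i + 0.5) / n for i in range(n)]
lam = genomic_inflation(null_p)
print(round(lam, 3))
```

Values of λ close to 1, such as the 1.013 reported for the stage 1 meta-analysis, indicate little residual inflation of the test statistics.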


Supplementary Table 1. Discrimination analysis for genome-wide significant T2D-ESKD associated SNPs in baseline model

Variant      CHR  POS        Effect/Other allele  N     EAF    OR    P
rs72858591   2    151711452  C/T                  9733  0.091  1.12  0.073
rs58627064   3    165051826  T/G                  9733  0.058  1.14  0.083
rs148187038  17   77666704   A/G                  9733  0.23   0.91  0.060
rs142671759  17   77706698   C/T                  9733  0.021  1.29  0.070
rs4807299    19   2570002    A/C                  9733  0.048  1.14  0.15
rs9622363    22   36656555   A/G                  9733  0.47   0.96  0.25

Abbreviations: T2D, type 2 diabetes; ESKD, end-stage kidney disease; CHR, chromosome; POS, position; N, number; EAF, effect allele frequency; OR, odds ratio; P, p value.


Supplementary Table 2. Discrimination analysis for genome-wide significant T2D-ESKD associated SNPs in APOL1-negative model

Variant      CHR  POS        Effect/Other allele  N     EAF    OR    P
rs72858591   2    151711452  C/T                  9733  0.091  1.12  0.073
rs73358292   10   114806988  C/A                  9733  0.062  1.15  0.070
rs142671759  17   77706698   C/T                  9733  0.021  1.29  0.070

Abbreviations: T2D, type 2 diabetes; ESKD, end-stage kidney disease; CHR, chromosome; POS, position; N, number; EAF, effect allele frequency; OR, odds ratio; P, p value.


Supplementary Table 3. Results of APOL1-negative model for top associations in baseline model

Variant      CHR  POS        EA  OA  EAF    N      OR    P         Phenotype
rs72858591   2    151711452  T   C   0.096  9745   1.47  3.22E-08  T2D-ESKD
rs58627064   3    165051826  T   G   0.061  9745   1.56  2.06E-07  T2D-ESKD
rs148187038  17   77666704   A   G   0.230  9745   0.74  7.98E-08  T2D-ESKD
rs142671759  17   77706698   T   C   0.023  9745   2.30  4.10E-08  T2D-ESKD
rs4807299    19   2570002    A   C   NA     NA     1.50  6.91E-05  T2D-ESKD
rs9622363    22   36656555   A   G   NA     NA     1.16  6.08E-04  T2D-ESKD
rs76971802   3    188607071  T   C   0.087  10764  1.40  5.39E-07  all-cause ESKD
rs11997465   8    110891977  C   G   0.28   10764  1.22  4.89E-06  all-cause ESKD
rs77113398   13   107103906  A   G   0.023  10764  1.86  1.21E-06  all-cause ESKD
rs5863506    4    162909217  D   I   NA     NA     0.85  1.40E-04  all-cause ESKD
rs141746998  8    54310938   D   I   NA     NA     0.61  7.78E-06  all-cause ESKD

Abbreviations: T2D, type 2 diabetes; ESKD, end-stage kidney disease; CHR, chromosome; POS, position; EA, effect allele; OA, other allele; N, number; EAF, effect allele frequency; OR, odds ratio; P, p value.


Supplementary Table 4. Evaluation of previous loci associated with kidney disease and related traits

T2D-ESKD cases vs. non- All-ESKD cases vs. non- Baseline model diabetic non-nephropathy diabetic non-nephropathy controls controls C Conseq E O SNP H POS Traits Study Gene EAF OR P N EAF OR P N uence A A R rs129 1 2036 (Köttgen et UMOD/PDIL downstr 1231 1770 eGFR T G 0.047 0.97 0.730 10409 0.045 0.96 0.624 6 7690 al. 2009) T eam 9 7 rs102 Serum 7390 (Chambers 1231 0689 2 creatinine ALMS1P intron T C 0.453 0.95 0.187 10409 0.453 0.96 0.291 0900 et al. 2010) 9 9 levels 1606 Serum rs312 (Chambers SLC22A2/LO 5 prime 1231 6 8139 creatinine A G 0.842 0.99 0.798 10409 0.842 1.00 0.928 7573 et al. 2010) C105378088 UTR 9 3 levels Serum rs806 1 5948 (Chambers 1231 creatinine TBX2 intron T C 0.276 1.02 0.728 10409 0.275 1.00 0.956 8318 7 3766 et al. 2010) 9 levels Serum rs480 1 3345 (Chambers 1231 creatinine CEP89 intron T C 0.023 1.13 0.368 10409 0.023 1.13 0.308 5834 9 3659 et al. 2010) 9 levels 1509 rs267 (Köttgen et CERS2/LOC upstrea 1231 1 5147 CKD T C 0.963 1.00 0.979 10409 0.963 0.97 0.739 734 al. 2010) 105371438 m gene 9 7 rs126 2773 (Köttgen et missen 1231 2 CKD GCKR T C 0.138 0.93 0.167 10409 0.138 0.94 0.253 0326 0940 al. 2010) se 9 rs135 7386 (Köttgen et missen 1231 2 CKD NAT8 A G 0.455 0.96 0.246 10409 0.455 0.97 0.345 38 8328 al. 2010) se 9 1418 rs347 (Köttgen et 1231 3 0713 CKD TFDP2 intron A C 0.747 0.97 0.434 10409 0.746 0.95 0.231 685 al. 2010) 9 7 rs119 3939 (Köttgen et 1231 5992 5 CKD DAB2 intron A T 0.320 0.92 0.046 10409 0.317 0.92 0.020 7132 al. 2010) 9 8 regulat rs881 4380 (Köttgen et LOC1079865 1231 6 CKD ory A G 0.364 0.98 0.604 10409 0.364 1.00 0.910 858 6609 al. 2010) 98 9 region 1606 rs227 (Köttgen et 1231 6 6838 CKD SLC22A2 intron A G 0.803 1.03 0.533 10409 0.803 1.03 0.551 9463 al. 2010) 9 9 rs646 7741 (Köttgen et RSBN1L/TM downstr 1231 7 CKD T C 0.510 1.04 0.311 10409 0.509 1.03 0.411 5825 6439 al. 2010) EM60 eam 9


1514 rs780 (Köttgen et 1231 7 0780 CKD PRKAG2 intron A G 0.307 0.99 0.811 10409 0.309 0.98 0.599 5747 al. 2010) 9 1 rs101 2375 (Köttgen et STC1/LOC10 interge 1231 0941 8 CKD T C 0.326 0.95 0.252 10409 0.325 0.97 0.469 1151 al. 2010) 7986931 nic 9 4 rs474 7143 (Köttgen et 1231 9 CKD PIP5K1B intron A C 0.412 1.00 0.929 10409 0.412 1.00 0.973 4712 4707 al. 2010) 9 rs107 1 1156 (Köttgen et 1231 9472 CKD WDR37 intron T C 0.211 0.95 0.245 10409 0.212 0.95 0.240 0 165 al. 2010) 9 0 rs401 1 6550 (Köttgen et KRT8P26/AP interge 1231 CKD C G 0.789 1.04 0.350 10409 0.788 1.04 0.395 4195 1 6822 al. 2010) 5B1 nic 9 rs107 1 3492 (Köttgen et 1231 7402 CKD SLC6A13 intron T C 0.496 1.09 0.024 10409 0.495 1.08 0.021 2 98 al. 2010) 9 1 1120 rs653 1 (Köttgen et 1231 0775 CKD ATXN2 intron T C 0.926 0.88 0.092 10409 0.925 0.87 0.057 178 2 al. 2010) 9 6 rs626 1 7234 (Köttgen et 1231 CKD DACH1 intron A C 0.350 1.06 0.125 10409 0.348 1.04 0.314 277 3 7696 al. 2010) 9 LOC1005338 rs245 1 4564 (Köttgen et interge 1231 CKD 53/RNU6- A C 0.844 0.97 0.552 10409 0.845 0.99 0.861 3533 5 1225 al. 2010) nic 9 953P rs491 1 5394 (Köttgen et 1231 CKD WDR72 intron A C 0.441 1.01 0.784 10409 0.440 1.01 0.875 567 5 6593 al. 2010) 9 rs139 1 7615 (Köttgen et 1231 CKD UBE2Q2 intron A G 0.358 1.01 0.900 10409 0.357 0.99 0.704 4125 5 8983 al. 2010) 9 rs989 1 5945 (Köttgen et 1231 CKD BCAS3 intron T C 0.517 0.92 0.031 10409 0.513 0.93 0.034 5661 7 6589 al. 2010) 9 rs124 1 3335 (Köttgen et 1231 6087 CKD SLC7A9 intron T C 0.718 1.00 0.994 10409 0.718 1.00 0.976 9 6891 al. 2010) 9 6 rs911 2 2361 (Köttgen et downstr 1231 CKD CST3 T C 0.638 1.03 0.515 10409 0.637 1.01 0.724 119 0 2737 al. 2010) eam 9 regulat rs482 2 3661 (Bostrom et APOL4/APO 5.742E- 1231 ESKD ory T C 0.463 0.88 0.0013 10409 0.446 0.80 1469 2 6445 al. 2010) L2 10 9 region CKD and (Gudbjartss rs429 1 2036 serum upstrea 1231 on et al. 
UMOD A G 0.795 0.97 0.527 10409 0.794 0.97 0.470 3393 6 4588 creatinine m gene 9 2010) levels rs643 1586 (Pattaro et LOC1019269 1231 2 CKD intron T C 0.990 0.96 0.840 10409 0.990 0.98 0.900 1731 3002 al. 2012) 66/RNU5E- 9


7P rs392 1 3076 (Pattaro et LOC1019283 interge 1231 CKD T C 0.881 0.94 0.288 10409 0.882 0.96 0.507 5584 1 0335 al. 2012) 38 nic 9 1004 rs758 T1D (Sandholm 1231 2 6065 AFF3 intron T C 0.449 0.99 0.710 10409 0.449 0.99 0.720 3877 nephropathy et al. 2012) 9 4 rs124 1 9414 T1D (Sandholm LOC1079839 1231 3785 intron T G 0.845 0.99 0.864 10409 0.843 0.99 0.854 5 1833 nephropathy et al. 2012) 74 9 4 rs175 1 4571 (Tin et al. SPATA5L1/C upstrea 1231 3652 eGFR C G 0.338 1.08 0.057 10409 0.337 1.06 0.144 5 9187 2013a) 15orf48 m gene 9 7 1744 rs497 (Sandholm LOC1079859 interge 1231 2 6285 T1D-ESKD A T 0.296 0.95 0.272 10409 0.299 0.97 0.457 2593 et al. 2013) 61 nic 9 4 (Nanayakka rs606 2 4528 1231 CKD ra et al. SLC13A3 intron A G 0.036 1.16 0.171 10409 0.035 1.14 0.191 6043 0 8453 9 2014) 1549 rs955 (Iyengar et RPL31P29/L interge 1231 6 4740 DKD A G 0.972 1.11 0.405 10409 0.971 1.08 0.495 333 al. 2015) OC646274 nic 9 8 rs107 1 1696 Microalbuminur (Teumer et 1231 9543 CUBN intron A C 0.393 1.02 0.562 10409 0.391 1.02 0.569 0 9923 ia al. 2016) 9 3 rs180 1583 (Pattaro et 1231 1 eGFR CASP9 intron T C 0.270 1.03 0.535 10409 0.270 1.04 0.330 0615 2281 al. 2016) 9 rs121 1586 (Pattaro et 1231 2407 1 eGFR DNAJC16 intron A G 0.692 0.99 0.763 10409 0.693 0.97 0.490 9899 al. 2016) 9 8 2010 rs385 (Pattaro et missen 1231 1 1629 eGFR CACNA1S A G 0.020 1.26 0.107 10409 0.020 1.14 0.331 0625 al. 2016) se 9 6 2435 rs280 (Pattaro et 1231 1 0176 eGFR SDCCAG8 intron A C 0.477 1.06 0.158 10409 0.480 1.05 0.203 2729 al. 2016) 9 3 rs807 1579 (Pattaro et DDX1/LOC1 interge 1231 2 eGFR T G 0.766 0.94 0.182 10409 0.765 0.93 0.067 601 3014 al. 2016) 01926966 nic 9 rs654 7367 (Pattaro et missen 1231 2 eGFR ALMS1 A G 0.240 0.96 0.411 10409 0.239 0.97 0.491 6838 9280 al. 2016) se 9 1700 rs466 (Pattaro et 1231 2 0850 eGFR LRP2 intron A T 0.606 1.02 0.654 10409 0.605 1.02 0.579 7594 al. 2016) 9 6 rs742 2 2115 CKD (Pattaro et CPS1 Missen A C 0.371 0.98 0.644 10409 0.370 0.98 0.614 1231


2339 4050 al. 2016) se 9 7 2176 rs271 (Pattaro et LOC1019282 1231 2 8277 eGFR intron A C 0.476 1.02 0.628 10409 0.475 1.00 0.977 2184 al. 2016) 78 9 9 rs679 1390 (Pattaro et 1231 3 eGFR WNT7A intron A G 0.211 1.05 0.309 10409 0.211 1.06 0.170 5744 6850 al. 2016) 9 1700 rs968 (Pattaro et 1231 3 9190 eGFR SKIL intron T C 0.752 0.97 0.546 10409 0.753 0.99 0.863 2041 al. 2016) 9 2 rs105 1858 (Pattaro et 1231 1380 3 2235 eGFR ETV5 intron T G 0.970 1.08 0.503 10409 0.970 1.07 0.553 al. 2016) 9 1 3 rs173 7736 (Pattaro et 1231 1972 4 CKD SHROOM3 intron A G 0.213 0.97 0.580 10409 0.211 0.97 0.540 8847 al. 2016) 9 1 1035 rs228 (Pattaro et 1231 4 6170 eGFR MANBA intron A G 0.250 1.07 0.116 10409 0.250 1.05 0.240 611 al. 2016) 9 9 1768 rs642 (Pattaro et 1231 5 1763 CKD SLC34A1 intron A G 0.805 1.00 0.942 10409 0.803 1.01 0.843 0094 al. 2016) 9 6 rs775 2734 (Pattaro et upstrea 1231 6 eGFR ZNF204P A G 0.868 1.03 0.581 10409 0.868 1.03 0.627 9001 1409 al. 2016) m gene 9 rs102 1285 (Pattaro et UNCX/MICA interge 1231 7711 7 eGFR A T 0.771 0.93 0.157 10409 0.773 0.94 0.200 195 al. 2016) LL2 nic 9 5 rs375 3291 (Pattaro et 1231 7 eGFR KBTBD2 intron A T 0.613 1.04 0.274 10409 0.614 1.03 0.382 0082 9927 al. 2016) 9 rs848 7755 (Pattaro et 1231 7 eGFR PHTF2 intron C G 0.857 0.96 0.501 10409 0.858 0.96 0.386 490 5005 al. 2016) 9 1562 rs645 (Pattaro et LOC285889/ interge 1231 7 5856 eGFR T G 0.522 1.00 0.983 10409 0.519 0.99 0.721 9680 al. 2016) LINC01006 nic 9 8 rs104 1 1065 (Pattaro et stop 1231 CKD IDI2 T C 0.046 1.26 0.013 10409 0.045 1.24 0.014 4261 0 710 al. 2016) gained 9 rs109 1 5264 (Pattaro et 5 prime 1231 9486 eGFR A1CF T C 0.209 1.03 0.558 10409 0.211 1.02 0.706 0 5424 al. 2016) UTR 9 0 rs163 1 2789 (Pattaro et 1231 eGFR KCNQ1 intron A G 0.922 0.94 0.363 10409 0.924 1.00 0.991 160 1 955 al. 2016) 9 rs963 1 3074 (Pattaro et LOC1019283 interge 1231 CKD T C 0.877 0.96 0.442 10409 0.879 0.98 0.674 837 1 9090 al. 2016) 16 nic 9

53 rs104 1 3368 (Pattaro et 1231 9196 eGFR TSPAN9 intron A G 0.407 1.02 0.632 10409 0.407 1.01 0.800 2 093 al. 2016) 9 7 rs795 1 1532 (Pattaro et 1231 eGFR RERG intron T C 0.493 1.01 0.726 10409 0.490 0.99 0.881 6634 2 1194 al. 2016) 9 rs110 1 5780 (Pattaro et 1231 eGFR R3HDM2 intron T C 0.098 1.00 0.974 10409 0.097 0.96 0.447 6766 2 9456 al. 2016) 9 rs476 1 4139 (Pattaro et 1231 eGFR INO80 intron C G 0.220 1.00 0.950 10409 0.218 0.98 0.645 633 5 2134 al. 2016) 9 rs246 1 4569 1231 CKD (Pattaro et SPATA5L1 intron T G 0.157 1.03 0.523 10409 0.156 1.01 0.871 7853 5 8793 9 al. 2016) rs164 1 8970 (Pattaro et DPEP1/CHM downstr 1231 eGFR C G 0.895 0.96 0.562 10409 0.896 0.96 0.471 748 6 8292 al. 2016) P1A eam 9 rs245 1 1943 (Pattaro et 1231 eGFR SLC47A1 intron T C 0.609 0.98 0.691 10409 0.609 0.99 0.758 3580 7 8321 al. 2016) 9 rs991 1 3749 (Pattaro et 1231 eGFR FBXL20 intron T C 0.515 0.98 0.588 10409 0.515 0.99 0.772 6302 7 9949 al. 2016) 9 rs809 1 7716 (Pattaro et 1231 eGFR NFATC1 intron A G 0.194 1.01 0.863 10409 0.194 1.02 0.759 1180 8 4243 al. 2016) 9 rs116 1 3846 (Pattaro et 1231 6649 eGFR SIPA1L3 intron T C 0.071 1.06 0.446 10409 0.070 1.06 0.370 9 4262 al. 2016) 9 7 rs608 2 3328 (Pattaro et PIGU/LOC10 splice 1231 eGFR C G 0.347 1.00 0.935 10409 0.346 1.00 0.972 8580 0 5053 al. 2016) 5372599 region 9 rs172 regulat 2 5273 (Pattaro et BCAS1/CYP 1231 1670 eGFR ory T C 0.937 0.93 0.361 10409 0.937 0.96 0.565 0 2362 al. 2016) 24A1 9 7 region rs357 1768 (Mahajan et RGS14/SLC upstrea 1231 1609 5 0663 eGFR T C 0.397 1.00 0.964 10409 0.397 0.98 0.576 al. 2016) 34A1 m gene 9 7 6 rs779 (Mahajan et 1 2039 1231 2461 eGFR al. 2016) PDILT intron A G 0.058 0.91 0.307 10409 0.056 0.89 0.147 6 2332 9 5 T2D-ESKD cases vs. non- All-ESKD cases vs. non- APOL1-negative model diabetic non-nephropathy diabetic non-nephropathy controls controls rs129 1 2036 (Köttgen et UMOD/PDIL downstr 1076 1770 eGFR T G 0.047 0.98 0.834 9745 0.045 0.95 0.625 6 7690 al. 
2009) T eam 4 7 rs102 7390 Creatinine (Chambers 1076 0689 2 ALMS1P intron T C 0.454 0.96 0.371 9745 0.453 0.97 0.468 0900 levels et al. 2010) 4 9


1606 rs312 Creatinine (Chambers SLC22A2/LO 5 prime 1076 6 8139 A G 0.843 1.02 0.721 9745 0.843 1.03 0.534 7573 levels et al. 2010) C105378088 UTR 4 3 rs806 1 5948 Creatinine (Chambers 1076 TBX2 intron T C 0.276 1.01 0.866 9745 0.275 0.99 0.822 8318 7 3766 levels et al. 2010) 4 rs480 1 3345 Creatinine (Chambers 1076 CEP89 intron T C 0.023 1.16 0.282 9745 0.023 1.19 0.175 5834 9 3659 levels et al. 2010) 4 1509 rs267 (Köttgen et CERS2/LOC upstrea 1076 1 5147 CKD T C 0.962 0.97 0.747 9745 0.962 0.97 0.762 734 al. 2010) 105371438 m gene 4 7 rs126 2773 (Köttgen et missen 1076 2 CKD GCKR T C 0.138 0.88 0.043 9745 0.137 0.90 0.078 0326 0940 al. 2010) se 4 rs135 7386 (Köttgen et missen 1076 2 CKD NAT8 A G 0.456 0.97 0.426 9745 0.455 0.97 0.512 38 8328 al. 2010) se 4 1418 rs347 (Köttgen et 1076 3 0713 CKD TFDP2 intron A C 0.748 0.97 0.536 9745 0.747 0.96 0.374 685 al. 2010) 4 7 rs119 3939 (Köttgen et 1076 5992 5 CKD DAB2 intron A T 0.320 0.92 0.069 9745 0.318 0.92 0.038 7132 al. 2010) 4 8 regulat rs881 4380 (Köttgen et LOC1079865 1076 6 CKD ory A G 0.363 0.96 0.364 9745 0.364 0.99 0.774 858 6609 al. 2010) 98 4 region 1606 rs227 (Köttgen et 1076 6 6838 CKD SLC22A2 intron A G 0.803 1.02 0.650 9745 0.803 1.02 0.703 9463 al. 2010) 4 9 rs646 7741 (Köttgen et RSBN1L/TM downstr 1076 7 CKD T C 0.510 1.06 0.178 9745 0.509 1.05 0.225 5825 6439 al. 2010) EM60 eam 4 1514 rs780 (Köttgen et 1076 7 0780 CKD PRKAG2 intron A G 0.307 0.98 0.677 9745 0.308 0.97 0.520 5747 al. 2010) 4 1 rs101 2375 (Köttgen et STC1/LOC10 interge 1076 0941 8 CKD T C 0.326 0.95 0.278 9745 0.326 0.98 0.564 1151 al. 2010) 7986931 nic 4 4 rs474 7143 (Köttgen et 1076 9 CKD PIP5K1B intron A C 0.411 1.01 0.745 9745 0.411 1.00 0.955 4712 4707 al. 2010) 4 rs107 1 1156 (Köttgen et 1076 9472 CKD WDR37 intron T C 0.212 0.97 0.582 9745 0.213 0.98 0.597 0 165 al. 2010) 4 0 rs401 1 6550 (Köttgen et KRT8P26/AP interge 1076 CKD C G 0.789 1.05 0.311 9745 0.788 1.04 0.404 4195 1 6822 al. 
2010) 5B1 nic 4 rs107 1 3492 (Köttgen et 1076 7402 CKD SLC6A13 intron T C 0.497 1.10 0.019 9745 0.496 1.08 0.035 2 98 al. 2010) 4 1


1120 rs653 1 (Köttgen et 1076 0775 CKD ATXN2 intron T C 0.926 0.92 0.344 9745 0.924 0.90 0.159 178 2 al. 2010) 4 6 rs626 1 7234 (Köttgen et 1076 CKD DACH1 intron A C 0.349 1.05 0.260 9745 0.348 1.04 0.380 277 3 7696 al. 2010) 4 LOC1005338 rs245 1 4564 (Köttgen et interge 1076 CKD 53/RNU6- A C 0.843 0.96 0.497 9745 0.843 0.98 0.776 3533 5 1225 al. 2010) nic 4 953P rs491 1 5394 (Köttgen et 1076 CKD WDR72 intron A C 0.441 1.01 0.802 9745 0.441 1.00 0.939 567 5 6593 al. 2010) 4 rs139 1 7615 (Köttgen et 1076 CKD UBE2Q2 intron A G 0.356 0.99 0.807 9745 0.356 0.97 0.427 4125 5 8983 al. 2010) 4 rs989 1 5945 (Köttgen et 1076 CKD BCAS3 intron T C 0.519 0.94 0.141 9745 0.517 0.95 0.184 5661 7 6589 al. 2010) 4 rs124 1 3335 (Köttgen et 1076 6087 CKD SLC7A9 intron T C 0.716 0.99 0.854 9745 0.717 0.99 0.841 9 6891 al. 2010) 4 6 rs911 2 2361 (Köttgen et downstr 1076 CKD CST3 T C 0.639 1.03 0.469 9745 0.640 1.03 0.431 119 0 2737 al. 2010) eam 4 regulat rs482 2 3661 (Bostrom et APOL4/APO 3.52E- 0.00000 1076 ESKD ory T C 0.488 1.21 9745 0.488 1.20 1469 2 6445 al. 2010) L2 06 24 4 region CKD and (Gudbjartss rs429 1 2036 serum upstrea 1076 on et al. UMOD A G 0.795 0.95 0.303 9745 0.794 0.95 0.336 3393 6 4588 creatinine m gene 4 2010) levels LOC1019269 rs643 1586 (Pattaro et 1076 2 CKD 66/RNU5E- intron T C 0.990 0.96 0.849 9745 0.990 0.99 0.965 1731 3002 al. 2012) 4 7P rs392 1 3076 (Pattaro et LOC1019283 interge 1076 CKD T C 0.881 0.94 0.340 9745 0.882 0.97 0.664 5584 1 0335 al. 2012) 38 nic 4 1004 rs758 T1D (Sandholm 1076 2 6065 AFF3 intron T C 0.450 0.97 0.517 9745 0.451 0.98 0.547 3877 nephropathy et al. 2012) 4 4 rs124 1 9414 T1D (Sandholm LOC1079839 1076 3785 intron T G 0.846 1.03 0.643 9745 0.845 1.03 0.626 5 1833 nephropathy et al. 2012) 74 4 4 rs175 1 4571 (Tin et al. 
SPATA5L1/C upstrea 1076 3652 eGFR C G 0.337 1.09 0.040 9745 0.337 1.06 0.160 5 9187 2013a) 15orf48 m gene 4 7 1744 rs497 (Sandholm LOC1079859 interge 1076 2 6285 T1D-ESKD A T 0.297 0.97 0.534 9745 0.297 0.97 0.542 2593 et al. 2013) 61 nic 4 4


(Nanayakka rs606 2 4528 1076 CKD ra et al. SLC13A3 intron A G 0.036 1.19 0.151 9745 0.035 1.19 0.121 6043 0 8453 4 2014) 1549 rs955 (Iyengar et RPL31P29/L interge 1076 6 4740 DKD A G 0.971 1.17 0.233 9745 0.971 1.13 0.298 333 al. 2015) OC646274 nic 4 8 rs107 1 1696 Microalbuminur (Teumer et 1076 9543 CUBN intron A C 0.392 1.02 0.692 9745 0.389 1.00 0.986 0 9923 ia al. 2016) 4 3 rs180 1583 (Pattaro et 1076 1 eGFR CASP9 intron T C 0.270 1.02 0.589 9745 0.269 1.03 0.493 0615 2281 al. 2016) 4 rs121 1586 (Pattaro et 1076 2407 1 eGFR DNAJC16 intron A G 0.693 0.99 0.753 9745 0.693 0.97 0.530 9899 al. 2016) 4 8 2010 rs385 (Pattaro et missen 1076 1 1629 eGFR CACNA1S A G 0.021 1.32 0.076 9745 0.021 1.20 0.200 0625 al. 2016) se 4 6 2435 rs280 (Pattaro et 1076 1 0176 eGFR SDCCAG8 intron A C 0.476 1.07 0.121 9745 0.480 1.06 0.113 2729 al. 2016) 4 3 rs807 1579 (Pattaro et DDX1/LOC1 interge 1076 2 eGFR T G 0.766 0.94 0.178 9745 0.764 0.92 0.061 601 3014 al. 2016) 01926966 nic 4 rs654 7367 (Pattaro et missen 1076 2 eGFR ALMS1 A G 0.241 0.98 0.620 9745 0.241 0.98 0.696 6838 9280 al. 2016) se 4 1700 rs466 (Pattaro et 1076 2 0850 eGFR LRP2 intron A T 0.606 1.02 0.578 9745 0.604 1.02 0.682 7594 al. 2016) 4 6 2115 rs742 (Pattaro et missen 1076 2 4050 CKD CPS1 A C 0.372 1.01 0.794 9745 0.372 1.02 0.615 2339 al. 2016) se 4 7 2176 rs271 (Pattaro et LOC1019282 1076 2 8277 eGFR intron A C 0.477 1.04 0.386 9745 0.477 1.02 0.578 2184 al. 2016) 78 4 9 rs679 1390 (Pattaro et 1076 3 eGFR WNT7A intron A G 0.211 1.08 0.154 9745 0.211 1.08 0.099 5744 6850 al. 2016) 4 1700 rs968 (Pattaro et 1076 3 9190 eGFR SKIL intron T C 0.753 0.97 0.465 9745 0.753 0.98 0.691 2041 al. 2016) 4 2 rs105 1858 (Pattaro et 1076 1380 3 2235 eGFR ETV5 intron T G 0.970 1.09 0.481 9745 0.970 1.07 0.577 al. 2016) 4 1 3 rs173 7736 (Pattaro et 1076 1972 4 CKD SHROOM3 intron A G 0.213 0.97 0.502 9745 0.212 0.97 0.501 8847 al. 2016) 4 1


1035 rs228 (Pattaro et 1076 4 6170 eGFR MANBA intron A G 0.249 1.05 0.268 9745 0.249 1.03 0.488 611 al. 2016) 4 9 1768 rs642 (Pattaro et 1076 5 1763 CKD SLC34A1 intron A G 0.805 1.00 0.962 9745 0.804 1.01 0.859 0094 al. 2016) 4 6 rs775 2734 (Pattaro et upstrea 1076 6 eGFR ZNF204P A G 0.867 1.02 0.721 9745 0.868 1.02 0.735 9001 1409 al. 2016) m gene 4 rs102 1285 (Pattaro et UNCX/MICA interge 1076 7711 7 eGFR A T 0.772 0.94 0.218 9745 0.771 0.93 0.152 195 al. 2016) LL2 nic 4 5 rs375 3291 (Pattaro et 1076 7 eGFR KBTBD2 intron A T 0.612 1.04 0.317 9745 0.614 1.04 0.318 0082 9927 al. 2016) 4 rs848 7755 (Pattaro et 1076 7 eGFR PHTF2 intron C G 0.857 0.96 0.482 9745 0.858 0.96 0.486 490 5005 al. 2016) 4 1562 rs645 (Pattaro et LOC285889/ interge 1076 7 5856 eGFR T G 0.523 1.00 0.998 9745 0.521 1.00 0.994 9680 al. 2016) LINC01006 nic 4 8 rs104 1 1065 (Pattaro et stop 1076 CKD IDI2 T C 0.047 1.41 0.0006 9745 0.046 1.39 0.00053 4261 0 710 al. 2016) gained 4 rs109 1 5264 (Pattaro et 5 prime 1076 9486 eGFR A1CF T C 0.208 1.03 0.510 9745 0.210 1.03 0.517 0 5424 al. 2016) UTR 4 0 rs163 1 2789 (Pattaro et 1076 eGFR KCNQ1 intron A G 0.923 0.96 0.609 9745 0.924 1.01 0.851 160 1 955 al. 2016) 4 rs963 1 3074 (Pattaro et LOC1019283 interge 1076 CKD T C 0.877 0.96 0.475 9745 0.879 0.99 0.825 837 1 9090 al. 2016) 16 nic 4 rs104 1 3368 (Pattaro et 1076 9196 eGFR TSPAN9 intron A G 0.407 1.02 0.608 9745 0.407 1.02 0.679 2 093 al. 2016) 4 7 rs795 1 1532 (Pattaro et 1076 eGFR RERG intron T C 0.491 0.97 0.384 9745 0.489 0.95 0.172 6634 2 1194 al. 2016) 4 rs110 1 5780 (Pattaro et 1076 eGFR R3HDM2 intron T C 0.098 0.97 0.713 9745 0.098 0.95 0.399 6766 2 9456 al. 2016) 4 rs476 1 4139 (Pattaro et 1076 eGFR INO80 intron C G 0.220 1.01 0.781 9745 0.221 1.00 0.941 633 5 2134 al. 2016) 4 rs246 1 4569 1076 CKD (Pattaro et SPATA5L1 intron T G 0.158 1.04 0.501 9745 0.158 1.01 0.833 7853 5 8793 4 al. 
2016) rs164 1 8970 (Pattaro et DPEP1/CHM downstr 1076 eGFR C G 0.895 1.01 0.865 9745 0.896 0.99 0.890 748 6 8292 al. 2016) P1A eam 4 rs245 1 1943 (Pattaro et 1076 eGFR SLC47A1 intron T C 0.608 0.97 0.439 9745 0.607 0.96 0.319 3580 7 8321 al. 2016) 4

58 rs991 1 3749 (Pattaro et 1076 eGFR FBXL20 intron T C 0.513 0.96 0.298 9745 0.513 0.97 0.390 6302 7 9949 al. 2016) 4 rs809 1 7716 (Pattaro et 1076 eGFR NFATC1 intron A G 0.195 1.02 0.793 9745 0.195 1.01 0.789 1180 8 4243 al. 2016) 4 rs116 1 3846 (Pattaro et 1076 6649 eGFR SIPA1L3 intron T C 0.072 1.12 0.164 9745 0.071 1.11 0.152 9 4262 al. 2016) 4 7 rs608 2 3328 (Pattaro et PIGU/LOC10 splice 1076 eGFR C G 0.347 0.99 0.899 9745 0.346 0.99 0.802 8580 0 5053 al. 2016) 5372599 region 4 rs172 regulat 2 5273 (Pattaro et BCAS1/CYP 1076 1670 eGFR ory T C 0.936 0.91 0.260 9745 0.936 0.93 0.373 0 2362 al. 2016) 24A1 4 7 region rs357 1768 (Mahajan et RGS14/SLC upstrea 1076 1609 5 0663 eGFR T C 0.397 1.00 0.993 9745 0.399 0.99 0.754 al. 2016) 34A1 m gene 4 7 6 rs779 1 2039 (Mahajan et 1076 2461 eGFR PDILT intron A G 0.058 0.94 0.526 9745 0.057 0.91 0.302 6 2332 al. 2016) 4 5 Abbreviations: T2D, type 2 diabetes; ESKD, end-stage kidney disease; CHR, chromosome; POS, position; EA, effect allele; OA, other allele; N, number; EAF, effect allele frequency; OR, odds ratio; P, p value.


Supplementary Table 5. Examination of top associations in previous T2D-ESKD GWAS

Baseline model                                                  T2D-ESKD cases vs. non-diabetic   All-ESKD cases vs. non-diabetic
                                                                non-nephropathy controls          non-nephropathy controls
SNP        CHR  POS        Gene            Consequence  EA  OA  EAF    OR     P         N         EAF    OR     P         N
rs7769051  6    133146796  RPS12/HMGB1P13  intergenic   A   G   0.32   1.12   0.0042    10409     0.32   1.13   0.001346  12319
rs6930576  6    148704954  SASH1           intron       A   G   0.31   1.07   0.11      10409     0.31   1.06   0.1036    12319
rs773506   9    93975471   LINC00484/AUH   downstream   A   G   0.22   0.90   0.021     10409     0.22   0.93   0.07365   12319
rs2358944  12   66117558   PCNPP3/RPSAP52  intergenic   A   G   0.23   0.90   0.031     10409     0.23   0.90   0.01525   12319
rs2106294  22   31645759   LIMK2           intron       T   C   0.95   1.21   0.033     10409     0.95   1.22   0.01549   12319

APOL1-negative model                                            T2D-ESKD cases vs. non-diabetic   All-ESKD cases vs. non-diabetic
                                                                non-nephropathy controls          non-nephropathy controls
SNP        CHR  POS        Gene            Consequence  EA  OA  EAF    OR     P         N         EAF    OR     P         N
rs7769051  6    133146796  RPS12/HMGB1P13  intergenic   A   G   0.32   1.09   0.059     9745      0.32   1.11   0.015     10764
rs6930576  6    148704954  SASH1           intron       A   G   0.31   1.06   0.16      9745      0.31   1.06   0.14      10764
rs773506   9    93975471   LINC00484/AUH   downstream   A   G   0.22   0.88   0.0083    9745      0.22   0.90   0.023     10764
rs2358944  12   66117558   PCNPP3/RPSAP52  intergenic   A   G   0.23   0.89   0.017     9745      0.23   0.89   0.014     10764
rs2106294  22   31645759   LIMK2           intron       T   C   0.95   1.05   0.59      9745      0.95   1.01   0.93      10764

Abbreviations: T2D, type 2 diabetes; ESKD, end-stage kidney disease; CHR, chromosome; POS, position; EA, effect allele; OA, other allele; N, number; EAF, effect allele frequency; OR, odds ratio; P, p value.


Chapter 3

An exome-wide association study for type 2 diabetes-attributed end-stage kidney disease in African Americans

Meijian Guan, Jacob M. Keaton, Latchezar Dimitrov, Pamela J. Hicks, Jianzhao Xu, Nicholette Allred, James G. Wilson, Barry I. Freedman, Donald W. Bowden, Maggie C. Y. Ng


Abstract

Compared to European Americans, African Americans (AAs) are at higher risk for developing end-stage kidney disease (ESKD). Genome-wide association studies (GWAS) have identified >70 genetic variants, primarily non-coding, associated with kidney function and chronic kidney disease (CKD) in patients with and without diabetes. However, these variants explain a small proportion of disease liability. This study examined the contribution of coding genetic variants for risk of type 2 diabetes (T2D)-attributed ESKD and advanced CKD in AAs.

Exome sequencing was performed in 456 AA T2D-ESKD cases, 936 AA non-diabetic, non-nephropathy controls, and 338 AA cases with T2D lacking nephropathy at the discovery stage. Single-variant association with T2D-ESKD was tested under an additive model using mixed logistic regression. Nominal associations (P<0.05) were then tested for replication in an independent cohort of AAs including 2,020 T2D-ESKD cases and 1,121 non-diabetic, non-nephropathy controls. A discrimination analysis was performed in 1,003 T2D subjects lacking nephropathy from the discovery and replication stages to exclude variants associated with T2D per se. Meta-analysis of 4,533 discovery and replication samples revealed 11 suggestive T2D-ESKD associations (P<1×10⁻⁴) from eight loci (PLEKHN1, NADK, RAD51AP2, RREB1, PEX6, GRM8, PRX, APOL1). Exclusion of APOL1 renal-risk genotype carriers identified four additional suggestive loci (OTUD7B, IFITM3, DLGAP5, IER2). These putative T2D-ESKD associations were tested in an additional 2,129 AA non-diabetic and diabetic ESKD cases and 912 AA non-diabetic non-nephropathy controls. The RREB1 variant rs41302867 displayed consistent association with T2D-ESKD and non-diabetic ESKD (OR=0.47, P=1.2×10⁻⁶ in 4,605 all-cause ESKD cases and 2,969 non-diabetic non-nephropathy controls). Gene-based analysis identified suggestive associations (P<1×10⁻⁴) at seven genes (TMEM5, SPATS2, ZIC4, HELZ2, ILDR2, LGALS3BP, RSAD2). The T2D-ESKD-associated variants at GRM8, PEX6, ILDR2, RREB1 and PRX may support shared genetic risk between T2D-ESKD and other phenotypes, including obesity, dyslipidemia, T2D and the Mendelian disease Charcot-Marie-Tooth (CMT) neuropathy. Our findings suggest that coding genetic variants are implicated in predisposition to T2D-ESKD in AAs.

Introduction

African Americans (AAs) are disproportionately affected by end-stage kidney disease (ESKD); incidence rates of ESKD are 3.1-fold higher in AAs than in European Americans (EAs) (USRDS, 2016). In 2014, 120,688 new cases of ESKD were diagnosed, of which 97.4% received dialysis and only 2.6% underwent kidney transplantation (USRDS, 2016). The mortality rates for patients with ESKD on dialysis and after transplant were 166 and 30 per 1,000 patient-years, respectively, in 2014 (USRDS, 2016). Diabetes is one of the most common causes of ESKD, accounting for >44% of ESKD cases in the US; of these, ~90% relate to type 2 diabetes (T2D) (USRDS, 2016). Even after adjustment for socioeconomic status and environmental factors, incidence rates and familial aggregation of diabetic kidney disease (DKD), including T2D-attributed ESKD (T2D-ESKD), remain significantly higher in AAs (Spray et al. 1995; Freedman et al. 1995). Several lines of evidence support genetic contributors to ESKD susceptibility (Köttgen 2010; Friedman and Pollak 2011). Prior studies have shown that common genetic variants contribute to DKD susceptibility in multiple ethnic groups (Maeda 2004; Pezzolesi et al. 2009; McDonough et al. 2011; Sandholm et al. 2012, 2017). While the apolipoprotein L1 gene (APOL1) G1 and G2 alleles explain a substantial proportion (~70%) of the disparity in non-diabetic ESKD in AAs vs. EAs, they fail to account for the excess risk of T2D-ESKD in AAs (Tzur et al. 2010a; Genovese et al. 2010; Kopp et al. 2011).

Although DKD-associated variants have been identified by genome-wide association studies (GWAS), their contributions to DKD risk are modest (Maeda 2004; Pezzolesi et al. 2009; McDonough et al. 2011; Sandholm et al. 2012; Iyengar et al. 2015). GWAS is not efficient in testing low frequency variants, and these may account for a portion of the missing heritability. In contrast, exome sequencing is a powerful tool to explore coding regions, particularly low frequency variants, which are poorly imputed in GWAS. In our prior studies, several coding variants located in RREB1, NPHS1, CUBN, LRP2, COL4A3 and CLDN8 were shown to contribute to diabetic and/or non-diabetic ESKD in AAs (Bonomo et al. 2014a, b; Ma et al. 2016; Guan et al. 2016). Herein, we extend prior studies by performing a comprehensive association study of coding variants for T2D-ESKD risk in AAs. We examined 1,392 AA individuals with or without T2D-ESKD by exome sequencing (stage 1), followed by replication in 3,141 AA subjects (stage 2), a meta-analysis combining 4,533 AAs from stages 1 and 2 (stage 3), and a discrimination analysis in 1,003 T2D subjects without nephropathy to exclude T2D-associated variants (stage 4). We also performed replication in an additional 3,041 all-cause ESKD cases and controls (stage 5). Finally, all T2D-ESKD and non-diabetic ESKD cases as well as controls were meta-analyzed (N=7,574) to evaluate the impact of associated variants on risk of all-cause ESKD (stage 6).
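The sample sizes quoted for stages 1 to 6 can be cross-checked against the per-cohort case and control counts reported in the Abstract and Methods. The short Python sketch below does this tally. The counts come from the text, while the dictionary layout, key names, and helper function are illustrative assumptions rather than part of the original analysis.

```python
# Case/control counts as quoted in this chapter; the dictionary layout and
# key names are illustrative assumptions, not part of the original analysis.
COHORTS = {
    "stage1_discovery": {"cases": 456, "controls": 936},
    "stage2_replication": {"cases": 2020, "controls": 1121},
    # Stage 5 cases: 1,910 non-diabetic ESKD plus 219 additional T2D-ESKD.
    "stage5_all_cause": {"cases": 1910 + 219, "controls": 912},
}


def cohort_size(*names):
    """Total number of cases plus controls across the named cohorts."""
    return sum(COHORTS[n]["cases"] + COHORTS[n]["controls"] for n in names)


stage1_n = cohort_size("stage1_discovery")
stage2_n = cohort_size("stage2_replication")
stage3_n = cohort_size("stage1_discovery", "stage2_replication")
stage5_n = cohort_size("stage5_all_cause")
stage6_n = cohort_size(*COHORTS)  # all cohorts combined

print(stage1_n, stage2_n, stage3_n, stage5_n, stage6_n)
# prints: 1392 3141 4533 3041 7574
```

The 1,003 T2D subjects without nephropathy used in stage 4 (338 from the discovery cohort and 665 from the replication cohort) are not part of these case-control totals.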

Materials and Methods

Study Participants

Study participants were recruited by Wake Forest School of Medicine (WFSM) (stages 1-5) and the Jackson Heart Study (JHS) (stage 1). The studies were approved by the Institutional Review Board of each participating center. All participants provided written informed consent.

Participants were considered to have T2D-ESKD when T2D was clinically diagnosed ≥5 years prior to the onset of ESKD (or diabetic retinopathy was present, to ensure long-duration T2D), and at least one of the following applied: renal replacement therapy, estimated glomerular filtration rate (eGFR) <30 ml/min/1.73 m², or urine albumin/creatinine ratio (UACR) >300 mg/g if an eGFR was unavailable. Participants with CKD stage 4 or macroalbuminuria were included as cases given their high risk of developing ESKD. Non-diabetic ESKD cases lacked diabetes (or developed diabetes after initiating renal replacement therapy), and their kidney disease was attributed to chronic glomerular disease (e.g. focal segmental glomerulosclerosis, FSGS), HIV-associated nephropathy, hypertension or unknown cause. Those with ESKD attributed to surgical or urologic causes, polycystic kidney disease, autoimmune disease, hepatitis, IgA nephropathy, membranous glomerulonephritis, membranoproliferative glomerulonephritis, or monogenic kidney diseases were excluded. Non-diabetic non-nephropathy controls included those lacking diabetes and kidney disease (eGFR ≥60 ml/min/1.73 m² and UACR <30 mg/g [when UACR was available]). Individuals were considered to have T2D without nephropathy when their eGFR was ≥60 ml/min/1.73 m² and UACR <30 mg/g in the presence of T2D.
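The phenotype definitions above amount to a small decision procedure. The following Python sketch encodes three of the groups (T2D-ESKD cases, T2D without nephropathy, and non-diabetic non-nephropathy controls) under stated assumptions. The `Participant` fields and the `classify` function are hypothetical names introduced for illustration. Requiring a measured UACR for the T2D-without-nephropathy group is an assumption, and the etiology-based rules for non-diabetic ESKD cases are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    # All field names are hypothetical; units follow the text.
    has_t2d: bool
    t2d_years_before_eskd: Optional[float] = None  # None if unknown
    diabetic_retinopathy: bool = False
    on_renal_replacement: bool = False
    egfr: Optional[float] = None  # ml/min/1.73 m^2
    uacr: Optional[float] = None  # mg/g


def has_advanced_kidney_disease(p: Participant) -> bool:
    """RRT, eGFR <30, or UACR >300 mg/g when no eGFR is available."""
    if p.on_renal_replacement:
        return True
    if p.egfr is not None:
        return p.egfr < 30
    return p.uacr is not None and p.uacr > 300


def classify(p: Participant) -> str:
    if p.has_t2d:
        long_duration = p.diabetic_retinopathy or (
            p.t2d_years_before_eskd is not None and p.t2d_years_before_eskd >= 5
        )
        if long_duration and has_advanced_kidney_disease(p):
            return "T2D-ESKD case"
        # Assumption: both eGFR and UACR must be measured for this group.
        if p.egfr is not None and p.egfr >= 60 and p.uacr is not None and p.uacr < 30:
            return "T2D without nephropathy"
        return "unclassified"
    # Controls: eGFR >=60 and UACR <30 mg/g when UACR is available.
    if p.egfr is not None and p.egfr >= 60 and (p.uacr is None or p.uacr < 30):
        return "non-diabetic non-nephropathy control"
    return "unclassified"


example = Participant(has_t2d=True, t2d_years_before_eskd=8, on_renal_replacement=True)
print(classify(example))
```

Keeping the kidney-disease test in its own function mirrors the wording of the criteria: UACR is consulted only when an eGFR is unavailable.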

Sequencing, genotyping and quality control

Discovery cohort

A total of 456 AA T2D-ESKD cases (5 with CKD4), 936 AA non-diabetic non-nephropathy controls, and 338 AA T2D cases lacking nephropathy underwent exome sequencing as part of the T2D-GENES consortium project (Fuchsberger et al. 2016). Details of sequencing data generation have been described previously (Fuchsberger et al. 2016). Briefly, exome sequencing was performed using an Agilent V2 capture array platform (Agilent Technologies) at the Broad Institute (Cambridge, MA). Sequence data underwent multiple levels of quality control (QC) for both samples and sequence reads. Samples with average coverage ≤20x or possible DNA contamination were excluded. Aligned sequence reads were filtered based on multiple QC criteria, including number of mapped reads, fraction of properly paired reads, distribution of insert sizes, distribution of average base quality, and GC bias. A total of 928,860 polymorphic exome variants were included in this study.


Replication and discrimination cohort

The replication and discrimination cohorts included 2,020 AA T2D-ESKD cases (143 had CKD4 or macroalbuminuria), 1,121 AA non-diabetic non-nephropathy controls, and 665 AAs with T2D who lacked nephropathy. Samples were genotyped on a custom Affymetrix Axiom Biobank Genotyping Array (Affymetrix, Santa Clara, CA, USA). Detailed variant information, custom content design (including fine mapping of candidate regions), genotyping methods, and quality control (QC) were described previously (Guan et al. 2016). In brief, this array includes approximately 264K coding variants and insertions/deletions (indels), 70K loss-of-function variants, 2K pharmacogenomic variants, 23K eQTL markers, 246K multi-ethnic population-based genome-wide tag markers, and 115K custom content. A total of 724,530 variants were successfully called for downstream QC and analyses. Variants with call rates <95% or departure from Hardy-Weinberg equilibrium (HWE) (P<0.0001), as well as monomorphic variants, were removed. Sample QC was also performed to remove individuals with low call rate, gender discordance, DNA contamination, or non-AA ancestry. Duplicate samples were identified, and one of each duplicate pair was removed. Variants that passed QC were imputed to a combined haplotype reference panel including the 1000 Genomes phase 3 cosmopolitan reference panel (October 2014 version) (1000 Genomes Project Consortium 2010) and a version of the African Genome Variation Project (AGVP) reference panel including 640 African ancestry haplotypes kindly provided by the African Partnership for Chronic Disease Research and Wellcome Trust Sanger Institute (Gurdasani et al. 2015). Pre-phasing was performed using SHAPEIT2 (Delaneau et al. 2012) and imputation was performed using IMPUTE2 (Marchini et al. 2007). Variants with imputation info scores <0.4 were excluded from analysis.
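As an illustration of the variant-level filters described for the genotyping array (call rate, HWE, and monomorphic variants), the Python sketch below applies them to genotype counts for a single variant. The function names and count-based interface are hypothetical, and the HWE test shown is a 1-degree-of-freedom Pearson chi-square test. The text does not state which HWE test was used, so this is an assumed stand-in rather than the study's procedure.

```python
import math


def hwe_chisq_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic: the test is not defined
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_variant_qc(n_aa: int, n_ab: int, n_bb: int, n_missing: int,
                      min_call_rate: float = 0.95,
                      hwe_p_min: float = 1e-4) -> bool:
    """Apply the three variant filters named in the text to one variant."""
    n_called = n_aa + n_ab + n_bb
    n_total = n_called + n_missing
    if n_total == 0 or n_called / n_total < min_call_rate:
        return False  # call rate below 95%
    if min(2 * n_aa + n_ab, 2 * n_bb + n_ab) == 0:
        return False  # monomorphic
    return hwe_chisq_pvalue(n_aa, n_ab, n_bb) >= hwe_p_min


# Genotype counts below are made up for illustration.
print(passes_variant_qc(360, 480, 160, n_missing=0))  # in exact HWE
print(passes_variant_qc(500, 0, 500, n_missing=0))    # no heterozygotes
```

An exact test would be preferable for low-count variants. The chi-square version is used here only because it needs nothing beyond the standard library.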


All-cause ESKD replication cohort

Stage 5 analyses included 1,910 non-diabetic ESKD cases, 219 T2D-ESKD cases (7 with CKD4 or macroalbuminuria), and 912 non-diabetic non-nephropathy controls. These samples were genotyped on the Multi-Ethnic Genotyping Array (MEGA, Illumina, CA, USA), designed to improve fine-mapping and functional variant discovery by increasing variant coverage across multiple ethnicities (Bien et al. 2016). In brief, the array includes variants from two major categories: 1) backbone content containing highly informative variants for GWAS and exome analyses in ancestrally diverse populations, and 2) custom content used to replicate or generalize index GWAS associations, augment GWAS tagging variants in priority regions, enhance exome content in priority regions, fine-map GWAS loci, identify functional regulatory variants, explore medically important variants, and identify novel variant loci in candidate pathways. Genotyping was performed at Wake Forest School of Medicine. DNA samples from cases and controls were equally interleaved on 96-well plates to minimize artifactual errors during sample processing. A total of 48 samples sequenced as part of the 1000 Genomes Project (1000 Genomes Project Consortium 2010) at the Coriell Institute for Medical Research were included in genotyping and had a concordance rate of 98.6%. Genotype calling was performed using GenomeStudio (Illumina, CA, USA). A total of 1,705,970 variants were successfully called for downstream quality control (QC) and analyses. Variants with missing position or alleles, allele mismatch, call rates <95%, departure from HWE (P<0.0001), or an allele frequency difference >0.2 compared with the 1000 Genomes Project phase 3 reference panel, as well as monomorphic variants, were removed. When multiple probe sets were available for a variant, only the set with the highest call rate was kept. Sample QC was performed to remove individuals with low call rates (<0.95), gender discordance, DNA contamination, or non-AA ancestry. Duplicate samples were compared, and one of each duplicate pair was removed. Variants and samples that passed QC were used to perform pre-phasing with SHAPEIT2 (Delaneau et al. 2012) and imputation with IMPUTE2 (Marchini et al. 2007) using a combined haplotype reference panel from the 1000 Genomes Project phase 3 (1000 Genomes Project Consortium 2010) and the African Genome Variation Project (AGVP) (Gurdasani et al. 2015), described above. Variants with imputation info scores <0.4 were excluded from analysis.

Statistical Analysis

Discovery and replication of associations in T2D-ESKD (baseline model)

Single-variant association analyses in case-control samples (from all analysis stages) were performed using a logistic mixed model implemented in the program GMMAT (Chen et al. 2016) under an additive genetic model. This approach controlled for population structure and cryptic relatedness by incorporating a genetic relationship matrix (GRM), estimated from a set of high-quality autosomal variants, as a random effect. Principal components analysis was performed using EIGENSOFT (Price et al. 2006). The first eigenvector (PC1), representing African-European ancestry, age, and sex were used as covariates.

In the stage 1 analysis, exome-wide analysis was performed in 456 T2D-ESKD cases and 936 non-diabetic non-nephropathy controls on 193,646 variants that passed QC and had minor allele count (MAC) >10. Variants with nominal associations (P<0.05) were further tested in a replication cohort with 2,020 T2D-ESKD cases and 1,121 non-diabetic non-nephropathy controls (stage 2 analysis). In the stage 3 analysis, meta-analysis of the discovery and replication stages (2,476 cases and 2,057 controls) was performed using a fixed-effect inverse variance weighting method implemented in METAL (Willer et al. 2010). Variants with suggestive evidence of association (P<1×10⁻⁴) were selected for discrimination analysis.
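For readers unfamiliar with fixed-effect inverse variance weighting, the calculation can be sketched as follows. This is an illustrative re-implementation and not METAL itself; the function names are hypothetical. Because only rounded odds ratios and P values are tabulated, the standard errors are back-calculated here from the two-sided P values, which is an assumption of this sketch.

```python
import math
from statistics import NormalDist


def se_from_or_and_p(odds_ratio: float, p_value: float) -> float:
    """Recover SE of log(OR) from a two-sided Wald p-value (an approximation)."""
    z = NormalDist().inv_cdf(1 - p_value / 2)
    return abs(math.log(odds_ratio)) / z


def ivw_fixed_effect(betas, ses):
    """Pool per-study log odds ratios with weights 1 / SE^2."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, p


# Rounded rs41302867 (RREB1) values from Table 2: (OR, P) per stage.
studies = [(0.35, 0.0016), (0.51, 0.0030)]
betas = [math.log(odds_ratio) for odds_ratio, _ in studies]
ses = [se_from_or_and_p(odds_ratio, p) for odds_ratio, p in studies]
beta, se, p = ivw_fixed_effect(betas, ses)
print(f"pooled OR = {math.exp(beta):.2f}, P = {p:.2e}")
```

With these rounded inputs the pooled odds ratio should land close to the meta-analysis value reported in Table 2 for this variant; small differences in the P value are expected because the inputs are rounded.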

Discrimination analysis

To distinguish whether the meta-analysis association results were driven by T2D alone or T2D-ESKD, 1,003 AAs with T2D-lacking nephropathy and 2,057 non-diabetic non-nephropathy controls from analysis stages 1 and 2 were compared in a discrimination analysis (stage 4 analysis). Variants showing nominal association with T2D (P<0.05) were excluded.

Replication of associations in all-cause ESKD

Variants that showed suggestive association (P<1×10⁻⁴) in the stage 3 T2D-ESKD meta-analysis and no evidence of association with T2D in the stage 4 discrimination analysis were tested for association with all-cause ESKD in 1,910 non-diabetic ESKD cases, 219 additional T2D-ESKD cases, and 912 non-diabetic non-nephropathy controls (stage 5 analysis). Variants showing nominal association with all-cause ESKD (P<0.05) were tested in a meta-analysis of all-cause ESKD that included all of the T2D-ESKD cases, non-diabetic ESKD cases, and controls from the stage 1, 2 and 5 analyses (N=7,574). This meta-analysis evaluated whether T2D-ESKD associated gene variants contributed more broadly to other causes of ESKD. The power to detect a low frequency variant (MAF=0.05) with OR=1.50 at a significance level of α=5×10⁻⁷ for a disease with 0.1% prevalence was estimated with CaTS (http://csg.sph.umich.edu/abecasis/cats/).
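The staged selection rules described in this section can be summarized as a small decision function. The sketch below is only a restatement of the thresholds in the text; the `VariantResult` record and function names are hypothetical, and the second example record is invented to show a variant that would be dropped at the discrimination step.

```python
from dataclasses import dataclass
from typing import Optional

NOMINAL = 0.05      # nominal association threshold used in stages 1, 4 and 5
SUGGESTIVE = 1e-4   # suggestive threshold used for the stage 3 meta-analysis


@dataclass
class VariantResult:
    """P values for one variant across stages (field names are illustrative)."""
    name: str
    p_stage1: float                   # discovery, T2D-ESKD vs. controls
    p_stage3: float                   # meta-analysis of discovery + replication
    p_stage4: float                   # discrimination, T2D alone vs. controls
    p_stage5: Optional[float] = None  # all-cause ESKD replication


def is_t2d_eskd_candidate(v: VariantResult) -> bool:
    """Nominal in stage 1, suggestive in stage 3, not associated with T2D alone."""
    return (v.p_stage1 < NOMINAL
            and v.p_stage3 < SUGGESTIVE
            and v.p_stage4 >= NOMINAL)


def generalizes_to_all_cause_eskd(v: VariantResult) -> bool:
    """Candidate that is also nominally associated in the stage 5 cohort."""
    return (is_t2d_eskd_candidate(v)
            and v.p_stage5 is not None
            and v.p_stage5 < NOMINAL)


# rs41302867 values as reported in Tables 2 and 4.
rreb1 = VariantResult("rs41302867", 0.0016, 2.43e-5, 0.096, 0.015)
# Invented example: suggestive for T2D-ESKD but also associated with T2D alone.
t2d_driven = VariantResult("hypothetical_variant", 0.001, 5e-5, 0.01, 0.2)

print(generalizes_to_all_cause_eskd(rreb1))   # passes every stage
print(is_t2d_eskd_candidate(t2d_driven))      # removed at stage 4
```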

Association analyses with exclusion of APOL1 risk genotype carriers (APOL1-negative model)

APOL1 G1 and G2 risk alleles explain a substantial proportion of genetic susceptibility to non-diabetic kidney disease in AAs (Genovese et al. 2010). To minimize effects of misclassification, the same single-variant analyses were repeated (APOL1-negative model) by excluding APOL1 renal-risk-variant carriers and those missing APOL1 genotype data from T2D-ESKD and non-diabetic ESKD samples. Individuals were considered APOL1 renal-risk-variant carriers if they possessed two G1 alleles (rs60910145 G allele, rs73885319 G allele), two G2 alleles (6 base pair in-frame deletion), or were compound heterozygotes (one G1 and one G2 allele) (Genovese et al. 2010). This secondary analysis reduces heterogeneity in the T2D-ESKD case group, despite its smaller sample size. Specifically, 98 of 456 T2D-ESKD cases from stage 1 discovery and 386 of 2,020 T2D-ESKD cases from stage 2 replication were removed. In addition, 936 of 2,136 all-cause ESKD cases in the stage 5 analysis were removed, considering the impact of APOL1 in non-diabetic ESKD. This strategy may unmask effects of non-diabetic ESKD variants beyond APOL1.
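The carrier definition above amounts to counting risk alleles across the two chromosome copies. The following sketch encodes that rule; the function names and the use of `None` for missing genotypes are assumptions made for illustration, and the code assumes G1 and G2 are counted per chromosome copy so that the two counts never sum to more than two.

```python
from typing import Optional


def is_apol1_renal_risk_carrier(g1_alleles: int, g2_alleles: int) -> bool:
    """True for G1/G1, G2/G2, or G1/G2 compound heterozygotes."""
    if not (0 <= g1_alleles <= 2 and 0 <= g2_alleles <= 2):
        raise ValueError("allele counts must be between 0 and 2")
    if g1_alleles + g2_alleles > 2:
        raise ValueError("G1 and G2 counts cannot sum to more than 2")
    return g1_alleles + g2_alleles == 2


def keep_in_apol1_negative_model(g1_alleles: Optional[int],
                                 g2_alleles: Optional[int]) -> bool:
    """Cases with missing APOL1 genotypes or two risk alleles are excluded."""
    if g1_alleles is None or g2_alleles is None:
        return False
    return not is_apol1_renal_risk_carrier(g1_alleles, g2_alleles)


print(keep_in_apol1_negative_model(1, 0))     # one risk allele: kept
print(keep_in_apol1_negative_model(1, 1))     # compound heterozygote: excluded
print(keep_in_apol1_negative_model(None, 0))  # missing genotype: excluded
```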

Gene-based analysis

We applied four gene-based tests implemented in RAREMETAL (Liu et al. 2014), the Sequence Kernel Association Test (SKAT) (Wu et al. 2011), the Madsen-Browning test (MB) (Madsen and Browning 2009), the Variable Threshold test (VT) (Price et al. 2010a), and the Combined and Multivariate Collapsing test (CMC) (Li and Leal 2008), to test for the joint effect of rare variants within a gene in the stage 1 discovery analysis with exome sequencing data. Variants were categorized into four groups in all methods: 1) moderate to high impact protein structure altering variants (transcript ablation, splice acceptor, splice donor, stop gained, frameshift, stop lost, start lost, transcript amplification, inframe insertion, inframe deletion, missense, and protein altering); 2) a more restricted group with only high impact protein structure altering variants (transcript ablation, splice acceptor, splice donor, stop gained, frameshift, stop lost, start lost, transcript amplification); 3) variants predicted to be deleterious by at least one of four prediction methods, SIFT (Ng and Henikoff 2003), LRT (Chun and Fay 2009), MutationTaster (Schwarz et al. 2010), and CADD (Kircher et al. 2014); and 4) variants predicted to be deleterious by all four prediction methods. Age, gender and PC1 were included as fixed-effect covariates. A P value <2.5×10⁻⁶ was considered exome-wide significant. Both baseline and APOL1-negative models were evaluated.
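The four variant groups and the exome-wide threshold can be expressed compactly. The sketch below is illustrative only: the consequence labels, the dictionary of per-tool calls, and the function names are hypothetical conventions, and how each tool's score is turned into a "deleterious" call is left outside the sketch. The threshold reproduces the Bonferroni adjustment for 20,000 genes stated in the Results.

```python
HIGH_IMPACT = {
    "transcript_ablation", "splice_acceptor", "splice_donor", "stop_gained",
    "frameshift", "stop_lost", "start_lost", "transcript_amplification",
}
MODERATE_IMPACT = {
    "inframe_insertion", "inframe_deletion", "missense", "protein_altering",
}
PREDICTORS = ("SIFT", "LRT", "MutationTaster", "CADD")


def variant_groups(consequence: str, deleterious_calls: dict) -> set:
    """Return the set of group numbers (1 to 4) that a variant belongs to.

    deleterious_calls maps a predictor name to True when that tool calls
    the variant deleterious; missing tools count as not deleterious.
    """
    groups = set()
    if consequence in HIGH_IMPACT or consequence in MODERATE_IMPACT:
        groups.add(1)  # moderate to high impact
    if consequence in HIGH_IMPACT:
        groups.add(2)  # high impact only
    n_deleterious = sum(bool(deleterious_calls.get(tool, False))
                        for tool in PREDICTORS)
    if n_deleterious >= 1:
        groups.add(3)  # deleterious by at least one method
    if n_deleterious == len(PREDICTORS):
        groups.add(4)  # deleterious by all four methods
    return groups


def bonferroni_threshold(alpha: float = 0.05, n_tests: int = 20000) -> float:
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


print(variant_groups("missense", {"SIFT": True}))
print(bonferroni_threshold())
```

A variant can fall into several groups at once, which is why the function returns a set instead of a single label.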

Functional characterization

The publicly available expression quantitative trait loci (eQTL) database GTEx (http://www.gtexportal.org/home/) was used to determine potential influences of T2D-ESKD associated variants on nearby gene expression. We queried 15 top associations from either the baseline or APOL1-negative models in GTEx across multiple tissues. Additional functional annotation of genetic variants was performed with VEP (McLaren et al. 2010), SnpEff (Cingolani et al. 2012) and dbNSFP (Liu et al. 2011).

Results

Genetic variants located in coding regions were evaluated in three independent AA cohorts for association with T2D-ESKD and all-cause ESKD through a multi-stage study design (Figure 1). A total of 8,577 individuals categorized as having T2D-ESKD, non-diabetic non-nephropathy, T2D-lacking nephropathy, and non-diabetic ESKD were included in a sequence of association analyses. We had sufficient power (82%) to detect a low frequency variant (MAF=0.05) with OR=1.50 at a significance level of α=5×10⁻⁷ (http://csg.sph.umich.edu/abecasis/cats/). Overall, 19 suggestive T2D-ESKD associations in 12 distinct regions were identified, including two missense variants, in either the baseline or the APOL1-negative models. Rs41302867, located in the RREB1 region, showed consistent association with T2D-ESKD and non-diabetic ESKD, confirming our previous report. Additionally, gene-based analyses identified seven genes with suggestive evidence of association with T2D-ESKD.

Clinical characteristics of study participants

Clinical characteristics of study participants from all analysis stages are presented in Table 1. Individuals with T2D-ESKD and T2D-lacking nephropathy were older than the non-diabetic non-nephropathy controls at recruitment. However, the average age at T2D diagnosis in T2D-ESKD and T2D-lacking nephropathy participants was similar to or younger than the age of healthy controls at recruitment. Overall, individuals with T2D-lacking nephropathy were more obese than individuals with T2D-ESKD or non-diabetic ESKD and healthy controls.

Stage 1 discovery analysis in exome sequencing cohort

The stage 1 discovery analysis with exome sequencing data included 456 T2D-ESKD patients and 936 non-diabetic non-nephropathy controls (Figure 1). A total of 193,646 exome variants that passed QC and had MAC ≥10 were tested for association with T2D-ESKD using a logistic mixed model (Chen et al. 2016). Age, gender, and PC1 were included as fixed effects in the model (baseline model). The association analysis yielded an inflation factor of 1.001 (Supplementary Figure 1), indicating that population structure and cryptic relatedness were sufficiently controlled. No exome-wide significant association (P<5×10⁻⁷) was observed. However, 21 variants showed suggestive significance at P<1×10⁻⁴. Given the relatively small sample size and low power of the discovery cohort, we selected 9,436 variants with nominal significance (P<0.05) for replication analysis.
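The inflation factor quoted above is conventionally computed as the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). The sketch below shows that calculation from a list of P values; it is a generic illustration using the standard definition, not the code used in this study, and the function name is hypothetical.

```python
from statistics import NormalDist, median


def genomic_inflation_factor(p_values) -> float:
    """Median observed chi-square (1 df) over its expected null median."""
    nd = NormalDist()
    # A two-sided p-value p corresponds to a chi-square of z^2 with
    # z = inv_cdf(p / 2); p-values must lie in (0, 1].
    chisq = [nd.inv_cdf(p / 2) ** 2 for p in p_values]
    expected_median = nd.inv_cdf(0.25) ** 2  # about 0.455
    return median(chisq) / expected_median


# Evenly spaced p-values mimic a well-calibrated null distribution.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_inflation_factor(null_like), 3))
```

Values close to 1, such as the 1.001 reported here, indicate that the bulk of the test statistics follow the null distribution; values well above 1 would suggest residual confounding.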

Stage 2 replication analysis, stage 3 meta-analysis, and stage 4 discrimination analysis

An independent cohort of 2,020 AA T2D-ESKD cases and 1,121 AA non-diabetic non-nephropathy controls with imputed dosage data was used to replicate the selected variants from the discovery stage (Figure 1). More than 93% of the variants were imputed with imputation quality info >0.4 for association analyses. A total of 9,436 variants with nominal association (P<0.05) were selected for meta-analysis of the discovery and replication samples. Fifteen variants with suggestive association with T2D-ESKD at P<1×10⁻⁴ were identified. To differentiate T2D-ESKD associations identified in meta-analysis as putative T2D-ESKD or T2D-only (non-nephropathy) associated loci, we performed a T2D discrimination analysis comparing 1,003 AA T2D cases lacking nephropathy with 2,057 non-diabetic non-nephropathy controls from stages 1 and 2. Four of the 15 T2D-ESKD associated variants were nominally associated (P<0.05) with T2D and were removed from subsequent analyses (Supplementary Table 1). The remaining 11 T2D-ESKD associated variants were located in 8 loci: PLEKHN1, NADK, RAD51AP2, RREB1, PEX6, GRM8, PRX, and APOL1 (P<1×10⁻⁴) (Table 2). The appearance of two APOL1 G1 alleles, rs73885319 (OR=1.32, P=9.90×10⁻⁶) and rs60910145 (OR=1.33, P=5.74×10⁻⁶), known to be associated with non-diabetic ESKD, may be due in part to misclassification of DKD cases (Genovese et al. 2010). The T2D-ESKD associated variants within RAD51AP2 and within PRX were highly correlated (r²>0.9); rs834514 at RAD51AP2 (OR=0.76, P=6.56×10⁻⁵) and rs268671 at PRX (OR=0.75, P=2.11×10⁻⁵) were missense variants. Moreover, variants from PLEKHN1, RAD51AP2, PEX6, and PRX were associated with the expression level of nearby genes across multiple tissues, suggesting potential regulatory roles (Supplementary Table 2).

Association analysis excluding APOL1 renal-risk genotype carriers

The primary analysis showed moderate association of APOL1 G1 alleles with T2D-ESKD, suggesting misclassification of some cases. A secondary analysis excluding APOL1 G1 and G2 risk genotype carriers (APOL1-negative model) was therefore performed. In the stage 1 discovery analysis, 358 T2D-ESKD cases were compared with 936 non-diabetic non-nephropathy controls. A total of 9,227 variants showing nominal association (P<0.05) were selected for replication in 1,636 T2D-ESKD cases and 1,121 non-diabetic non-nephropathy controls in a stage 2 analysis. Meta-analysis of stages 1 and 2 samples revealed 11 suggestive signals with P<1×10⁻⁴ (Table 3). Discrimination analysis on the top associations removed 3 variants with nominal association (P<0.05) with T2D per se. Among the 8 top T2D-ESKD associated variants, four had also been identified in the baseline model: rs41302867 at RREB1 (OR=0.44, P=3.60×10⁻⁵), rs74678433 at GRM8 (OR=2.11, P=6.24×10⁻⁵), and rs268672 and rs268671 at PRX (OR=0.74-0.75, P=1.87×10⁻⁵ to 5.21×10⁻⁵). It is of note that the other 5 top associations (excluding two APOL1 G1 alleles) from the baseline model had moderate attenuation in significance, despite similar effect sizes, due in part to the reduced sample size (Supplementary Table 3). In contrast, 4 additional associations were identified at OTUD7B (rs115499155; OR=0.43, P=8.58×10⁻⁵), IFITM3 (rs34481144; OR=0.67, P=5.25×10⁻⁵), DLGAP5 (rs8009244; OR=0.75, P=6.42×10⁻⁵), and IER2 (rs148278682; OR=0.25, P=6.49×10⁻⁵). Variants in IFITM3 and DLGAP5 demonstrated association with transcript abundance of nearby genes based on GTEx (Supplementary Table 2).

Replication analysis in all-cause ESKD

In the stage 5 analysis, an all-cause ESKD cohort consisting of 1,910 non-diabetic ESKD cases, 219 T2D-ESKD cases, and 912 non-diabetic non-nephropathy controls was used to replicate the 15 T2D-ESKD associated signals identified from the baseline and the APOL1-negative models. This analysis evaluated the generalizability of T2D-ESKD loci to non-diabetic etiologies of ESKD. Initially, the 15 suggestive T2D-ESKD associations were tested in 2,129 all-cause ESKD cases and 912 non-diabetic non-nephropathy controls. Only one variant, located in RREB1 (rs41302867, OR=0.52, P=0.015), achieved P<0.05. Similar results were obtained after excluding the 219 T2D-ESKD cases from the association analysis; only rs41302867 reached P<0.05 (OR=0.50, P=0.011). Consistent results were observed after excluding APOL1 renal-risk genotype carriers. Meta-analysis of rs41302867 in 4,605 all-cause ESKD cases and 2,969 non-diabetic non-nephropathy controls from the stage 1, 2, and 5 analyses revealed strong evidence of association (OR=0.47, P=1.2×10⁻⁶; Table 4), but did not reach exome-wide significance.

Gene-based analysis

To increase power to detect association with low frequency and rare variants, gene-based analyses were performed to aggregate effects of functional variants within each gene using only stage 1 exome sequencing data. Four complementary approaches were used to model differences in effect sizes and direction of effects in association tests: a kernel-based test that allows variants to have different directions of effect (SKAT) (Wu et al. 2011), and three types of burden test, the Madsen-Browning burden test (MB) (Madsen and Browning 2009), the variable threshold burden test (VT) (Price et al. 2010a), and the combined and multivariate collapsing test (CMC) (Li and Leal 2008). Functional variants were categorized into four groups based on the level of deleterious impact. Both baseline and APOL1 risk genotype-negative models were examined, followed by a discrimination analysis comparing T2D-lacking nephropathy with non-diabetic non-nephropathy controls. Although no gene reached exome-wide significance (P<2.5×10⁻⁶, adjusted for 20,000 genes), seven genes showed suggestive association with T2D-ESKD at P<1×10⁻⁴ and no association with T2D (P≥0.05). TMEM5 showed the strongest association (P=4.08×10⁻⁶ in the APOL1-negative model, P=2.7×10⁻⁵ in the baseline model). Six additional genes, SPATS2, ZIC4, HELZ2, ILDR2, LGALS3BP, and RSAD2, also showed suggestive association with T2D-ESKD (P<1×10⁻⁴). We evaluated the T2D-ESKD associated loci from single-variant analyses in the gene-based association results, but no strong gene-based associations were observed. The top overlapping association was from PEX6 (P=6.3×10⁻³ in gene-based analysis) (Supplementary Table 4).

Discussion

This study presents results of an exome sequencing-based genetic association study evaluating the contribution of coding variants to risk of T2D-ESKD in AAs. We tested the top variants for replication in additional T2D-ESKD and control cohorts, and evaluated the generalizability of T2D-ESKD associations to common forms of non-diabetic ESKD. Evidence of suggestive association with T2D-ESKD was observed for 12 loci, PLEKHN1, NADK, RAD51AP2, RREB1, PEX6, APOL1, GRM8, PRX, OTUD7B, IFITM3, DLGAP5 and IER2, in single-variant analysis of either baseline or APOL1-negative models. Two associated variants from RAD51AP2 and PRX were missense mutations. Meta-analysis of all-cause ESKD revealed strong association at RREB1, previously identified as associated with ESKD (Bonomo et al. 2014a). These results confirm its role in diabetic and non-diabetic kidney disease in AAs. Gene-based analyses identified additional suggestive T2D-ESKD loci with multiple variants having cumulative effects: TMEM5, SPATS2, ZIC4, HELZ2, ILDR2, LGALS3BP, and RSAD2. However, there were minimal overlapping associations between single-variant and gene-based tests.

The only locus that revealed association with both T2D-ESKD and non-diabetic ESKD was an intron variant at RREB1, a large complex gene encoding the ras responsive transcription factor. RREB1 has repeatedly been implicated in fasting glucose, T2D susceptibility, fat distribution, and adipocyte development in large-scale genetic studies (Below et al. 2011; Liu et al. 2013; DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium 2014; Mahajan et al. 2015; Chu et al. 2017). Relevant to kidney disease susceptibility, RREB1 variants have been reported to be associated with kidney function and to interact with APOL1 renal-risk alleles in non-diabetic nephropathy (Yang et al. 2010; Bostrom et al. 2012). More recently, the same variant, rs41302867, was shown to be associated with all-cause ESKD in AAs and EAs (Bonomo et al. 2014a). Despite partially overlapping participants, the present results confirmed a role for RREB1 in common forms of ESKD in AAs with a substantially expanded sample. In addition, it is noteworthy that the ESKD-associated variant rs41302867 had no association with T2D alone in this study (P=0.096); this suggests a pleiotropic effect of RREB1 in multiple traits including T2D and kidney disease.

Among the putative T2D-ESKD associated loci, an intronic variant at GRM8 (rs74678433) showed strong and consistent associations in the baseline and APOL1-negative models (P=5.96×10⁻⁶ in the baseline model and P=6.24×10⁻⁵ in the APOL1-negative model). Metabotropic glutamate receptor 8, the protein product encoded by GRM8, is associated with weight gain in mouse models (Duvoisin et al. 2005; Davis et al. 2013). Previous studies suggest that obesity is a major risk factor for kidney disease in patients with diabetes (Zoppini et al. 2012). This result may suggest potential genetic correlation between obesity and T2D-ESKD.

Two missense variants showing putative association with T2D-ESKD were identified, rs834514 and rs268671, located at RAD51AP2 and PRX, respectively. RAD51 Associated Protein 2, encoded by RAD51AP2, is a recombinase which plays a critical role in both DNA repair and meiotic recombination. Rs834514 may regulate the expression of RAD51AP2, GEN1 and VSNL1 in multiple tissues including pancreas and adipose. The missense variant rs268671 is located in PRX, which encodes periaxin, a key myelination molecule forming tight junctions between myelin loops and axons. Mutations in PRX cause late-onset Charcot-Marie-Tooth (CMT) neuropathy, a common inherited neurological disorder (Tokunaga et al. 2012; Renouil et al. 2013). CMT neuropathy has been reported to be associated with kidney disease, mainly FSGS (Nadal et al. 1998; Boyer et al. 2011; De Rechter et al. 2015). The underlying mechanism linking these entities remains unknown.

A splice region variant (rs9986447) located in PEX6 showed suggestive association with T2D-ESKD in the baseline model. Rs9986447 showed significant impact on the expression level of PEX6 in a number of tissues, including adipose and pancreas, indicating a regulatory role for this gene. PEX6 plays a direct role in the biosynthesis of peroxisomes, subcellular organelles involved in lipid metabolism. PEX6 may be associated with DKD susceptibility, as dyslipidemia is thought to be a contributor (Stadler et al. 2015).

Analyses excluding APOL1 renal-risk genotype carriers provided an opportunity to uncover the genetic architecture of T2D-ESKD. In the APOL1-negative model, four additional variants showing suggestive association with T2D-ESKD were identified, in OTUD7B, IFITM3, DLGAP5, and IER2. OTU deubiquitinase 7B, encoded by OTUD7B, may be involved in the inflammatory response of DKD progression via regulation of the NF-kappa-B signalling pathway (Gao et al. 2014). However, there is no literature supporting roles for IFITM3 (interferon induced transmembrane protein 3), DLGAP5 (DLG associated protein 5), or IER2 (immediate early response 2) in T2D-ESKD. Further studies are required to evaluate their potential biological roles.

Gene-based methods provide an opportunity to evaluate the cumulative impact of low frequency variants within a region on the disease of interest. We utilized four gene-based analysis methods to test four categories of variants grouped by their functional impact. The top signal was found at a transmembrane protein gene, TMEM5, which showed consistent association in both the baseline and APOL1-negative models. ILDR2 (immunoglobulin like domain containing receptor 2) has been implicated in T2D susceptibility and hepatic lipid metabolism in mouse models (Cooper et al. 2015; Watanabe et al. 2016). Potential involvement of ILDR2 in T2D-ESKD remains to be assessed.

This study suggests that low frequency and rare variants located in coding regions provide crucial information for understanding the genetic architecture of complex diseases such as DKD. We identified suggestive T2D-ESKD signals in 19 loci using multiple single-variant and gene-based analyses; however, additional replication is required to confirm the findings given the limited power of the current study. There was little overlap between the single-variant analyses and gene-based tests. This may be partially because the deleterious variants included in gene-based tests tend to have low frequencies and may be excluded from (or underpowered in) single-variant analyses. On the other hand, gene-based analyses were performed only in the exome sequencing data and thus had limited power.

The pathophysiology and pathology of DKD are heterogeneous, with effects of glycemia, blood pressure, albuminuria, diabetes duration, serum uric acid, dyslipidemia, obesity, and smoking (Zoppini et al. 2012; Macisaac et al. 2014; Radcliffe et al. 2017). Thus, T2D-associated DKD may share a common genetic background with other phenotypes. It has also been proposed that genetic variation in Mendelian disease genes may in part account for common disease susceptibility (Blair et al. 2013; Parsa et al. 2013a). Identification of GRM8, PEX6, ILDR2, RREB1 and PRX may indicate potential genetic correlations of T2D-ESKD with obesity, dyslipidemia, T2D, and Mendelian CMT neuropathy.

This study shares limitations with other reports. First, it is difficult to exclude all individuals misclassified as DKD due to the frequent lack of kidney biopsies. However, we carefully removed patients with ESKD attributed to non-diabetic etiologies, and subsequently excluded APOL1 risk genotype carriers at high risk for non-diabetic kidney disease. These steps should minimize misclassification. In addition, although the multi-stage study design brings together 8,577 AA individuals, study power was moderate. This is especially true of our discovery exome sequencing cohort, so additional variants of modest effect and low frequency may not have been captured. There are few other existing collections of appropriate AA samples; this limited possible replication studies.

In conclusion, an exome sequencing-based, multi-phase study to identify T2D-ESKD susceptibility loci was performed in AAs, and 19 suggestive associations with T2D-ESKD were detected. RREB1 was consistently associated with diabetic and non-diabetic etiologies of ESKD in AAs. T2D-ESKD associated variants at GRM8, PEX6, ILDR2, RREB1 and PRX may support genetic correlation between T2D-ESKD and related phenotypes. Future efforts to confirm the newly identified associations and determine their potential impact on the biological processes related to DKD require investment in additional sample recruitment and comprehensive functional evaluation.

Acknowledgements

This work was supported by NIH grants R01 DK53591 (DWB), DK070941 and DK084149 (BIF). The JHS is supported by contracts HHSN268201300046C, HHSN268201300047C, HHSN268201300048C, HHSN268201300049C, HHSN268201300050C from the National Heart, Lung, and Blood Institute and the National Institute on Minority Health and Health Disparities. Dr. Wilson is supported by U54GM115428 from the National Institute of General Medical Sciences. We acknowledge the contributions of the study participants, coordinators, physicians, staff and laboratory from the Wake Forest School of Medicine (WFSM) and the Jackson Heart Study (JHS).

Figure 1. Analysis workflow of single-variant association analysis for T2D-ESKD Exome sequencing study (baseline model)

Table 1. Clinical characteristics of study cohorts Discovery cohort Replication and discrimination cohort All-cause ESKD replication cohort

T2D- Non-diabetic non- T2D- Non-diabetic non- non- Non-diabetic non- T2D-lacking T2D lacking T2D- ESKD nephropathy ESKD nephropathy diabetic nephropathy nephropathy nephropathy ESKD cases controls cases controls ESKD controls N 456 936 338 2020 1121 665 1910 219 912

Female (%) 61.62 59.72 68.05 57.12 51.74 64.47 58.74 50.68 41.89 61.99 64.61±8. 61.46±10 55.35±14. Age (years) 52.49±11.25 57.42±10.16 46.50±12.09 55.78±11.61 ±10.9 44.82±13.83 55 .88 44 9 46.2±9.2 38.62±12 37.81 Age at T2D onset (years) - - - 46.20±12.27 - - 7 .76 ±9.60 Duration of T2D for T2D- lacking nephropathy - - - - - 9.58±9.12 - - - (years) Duration of T2D prior to 15.36±8. 19.66±9. 20.41 ------ESKD (years) 80 91 ±9.55 3.11±3.7 3.61±3.6 4.04± Duration of ESKD (years) - - - - 6.17±5.80 - 3 4 3.11 Fasting serum glucose 87.00±8.7 - 84.75±9.69 - - 96.52±20.93 - - 96.09±10.46 (mg/dl) 2 eGFR (ml/min/1.73m2) - 92.10±17.33 94.44±19.78 - 103.45±19.08 95.71±18.28 - - 89.29±16.72 29.43±6. 30.74±7. 27.76±7.2 29.73 Body Mass Index (kg/m2) 30.86±6.88 33.11±6.22 29.81±7.36 33.10±7.81 29.73±6.60 52 13 1 ±7.03 Categorical data expressed as percentage; continuous data as mean ± SD. Abbreviations: T2D, type 2 diabetes; ESKD, end-stage kidney disease; N, number; eGFR, estimated glomerular filtration rate

Table 2. T2D-ESKD associated variants in meta-analysis from discovery and replication cohorts (Baseline model)

Discovery and Replication columns compare T2D-ESKD cases vs. non-diabetic non-nephropathy controls.

| Variant | Gene | Consequence | CHR | POS | EA | Discovery N case/control | Discovery EAF | Discovery OR | Discovery P | Replication N case/control | Replication EAF | Replication OR | Replication P | Meta-analysis N case/control | Meta-analysis EAF | Meta-analysis OR | Meta-analysis P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs59928984 | PLEKHN1 | Intron | 1 | 907622 | C | 456/936 | 0.22 | 1.46 | 0.0017 | 2020/1121 | 0.21 | 1.27 | 0.0041 | 2476/2057 | 0.21 | 1.33 | 3.32E-05 |
| rs180881514 | NADK | Intron | 1 | 1685713 | T | 456/936 | 0.01 | 4.87 | 0.00064 | 2020/1121 | 0.011 | 2.23 | 0.011 | 2476/2057 | 0.011 | 2.85 | 5.84E-05 |
| rs834514 | RAD51AP2 | Missense | 2 | 17696573 | T | 456/936 | 0.21 | 0.70 | 0.0037 | 2020/1121 | 0.21 | 0.79 | 0.0043 | 2476/2057 | 0.21 | 0.76 | 6.56E-05 |
| rs665312 | RAD51AP2 | Synonymous | 2 | 17698678 | G | 456/936 | 0.24 | 0.69 | 0.0011 | 2020/1121 | 0.24 | 0.81 | 0.0077 | 2476/2057 | 0.24 | 0.77 | 5.27E-05 |
| rs41302867 | RREB1 | Intron | 6 | 7240876 | A | 456/936 | 0.023 | 0.35 | 0.0016 | 2020/1121 | 0.020 | 0.51 | 0.0030 | 2476/2057 | 0.021 | 0.45 | 2.43E-05 |
| rs9986447 | PEX6 | Splice region | 6 | 42942779 | G | 456/936 | 0.22 | 0.71 | 0.0028 | 2020/1121 | 0.21 | 0.80 | 0.0074 | 2476/2057 | 0.21 | 0.77 | 9.04E-05 |
| rs74678433 | GRM8 | Intron | 7 | 126173950 | G | 456/936 | 0.026 | 2.73 | 0.00060 | 2020/1121 | 0.028 | 1.92 | 0.0019 | 2476/2057 | 0.027 | 2.17 | 5.96E-06 |
| rs268672 | PRX | Synonymous | 19 | 40901604 | A | 456/936 | 0.19 | 0.73 | 0.0096 | 2020/1121 | 0.20 | 0.77 | 0.0020 | 2476/2057 | 0.19 | 0.76 | 5.87E-05 |
| rs268671 | PRX | Missense | 19 | 40901614 | A | 456/936 | 0.19 | 0.73 | 0.0096 | 2020/1121 | 0.20 | 0.76 | 0.00072 | 2476/2057 | 0.19 | 0.75 | 2.11E-05 |
| rs73885319 | APOL1 | Missense | 22 | 36661906 | G | 456/936 | 0.23 | 1.39 | 0.0036 | 2020/1121 | 0.24 | 1.29 | 0.00076 | 2476/2057 | 0.24 | 1.32 | 9.9E-06 |
| rs60910145 | APOL1 | Missense | 22 | 36662034 | G | 456/936 | 0.23 | 1.42 | 0.0017 | 2020/1121 | 0.24 | 1.29 | 0.00081 | 2476/2057 | 0.24 | 1.33 | 5.74E-06 |

Abbreviations: T2D, type 2 diabetes; ESKD, end-stage kidney disease; CHR, chromosome; POS, position; EA, effect allele; N, number; EAF, effect allele frequency; OR, odds ratio; P, p value; Baseline model: adjusted for age, sex and PC1, APOL1 risk genotype carriers included.

Table 3. Top T2D-ESKD associations in meta-analysis after removing APOL1 renal-risk genotype carriers (APOL1-negative model)

Discovery and Replication columns compare T2D-ESKD cases vs. non-diabetic non-nephropathy controls.

| Variant | Gene | Consequence | CHR | POS | EA | Discovery N case/control | Discovery EAF | Discovery OR | Discovery P | Replication N case/control | Replication EAF | Replication OR | Replication P | Meta-analysis N case/control | Meta-analysis EAF | Meta-analysis OR | Meta-analysis P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs115499155 | OTUD7B | Synonymous | 1 | 149943088 | T | 358/936 | 0.020 | 0.44 | 0.030 | 1636/1121 | 0.021 | 0.42 | 0.00106 | 1994/2057 | 0.021 | 0.43 | 8.58E-05 |
| rs41302867 | RREB1 | Intron | 6 | 7240876 | A | 358/936 | 0.025 | 0.36 | 0.0032 | 1636/1121 | 0.021 | 0.49 | 0.00282 | 1994/2057 | 0.022 | 0.44 | 3.60E-05 |
| rs74678433 | GRM8 | Intron | 7 | 126173950 | G | 358/936 | 0.024 | 2.47 | 0.0066 | 1636/1121 | 0.027 | 1.97 | 0.00277 | 1994/2057 | 0.026 | 2.11 | 6.24E-05 |
| rs34481144 | IFITM3 | 5' UTR | 11 | 320836 | T | 358/936 | 0.10 | 0.66 | 0.017 | 1636/1121 | 0.11 | 0.68 | 0.00111 | 1994/2057 | 0.11 | 0.67 | 5.25E-05 |
| rs8009244 | DLGAP5 | Intron | 14 | 55643752 | A | 358/936 | 0.23 | 0.75 | 0.030 | 1636/1121 | 0.21 | 0.75 | 0.00080 | 1994/2057 | 0.21 | 0.75 | 6.42E-05 |
| rs148278682 | IER2 | Synonymous | 19 | 13264042 | A | 358/936 | 0.01 | 0.21 | 0.0041 | 1636/1121 | 0.007 | 0.28 | 0.00501 | 1994/2057 | 0.0080 | 0.25 | 6.49E-05 |
| rs268672 | PRX | Synonymous | 19 | 40901604 | A | 358/936 | 0.19 | 0.71 | 0.0089 | 1636/1121 | 0.20 | 0.76 | 0.00184 | 1994/2057 | 0.19 | 0.75 | 5.21E-05 |
| rs268671 | PRX | Missense | 19 | 40901614 | A | 358/936 | 0.19 | 0.71 | 0.0089 | 1636/1121 | 0.20 | 0.74 | 0.00068 | 1994/2057 | 0.20 | 0.74 | 1.87E-05 |

Abbreviations: T2D, type 2 diabetes; ESKD, end-stage kidney disease; CHR, chromosome; POS, position; EA, effect allele; N, number; EAF, effect allele frequency; OR, odds ratio; P, p value; APOL1-negative model: adjusted for age, sex and PC1, APOL1 renal-risk genotype carriers removed.

Table 4. Meta-analysis combining T2D-ESKD and all-cause ESKD cohorts for rs41302867

| rs41302867 (RREB1) | N case/control | EAF | OR | P |
|---|---|---|---|---|
| Discovery (T2D-ESKD cases vs. non-diabetic non-nephropathy controls) | 456/936 | 0.023 | 0.35 | 0.0016 |
| Replication (T2D-ESKD cases vs. non-diabetic non-nephropathy controls) | 2020/1121 | 0.020 | 0.51 | 0.0030 |
| Discrimination (T2D-lacking nephropathy cases vs. non-diabetic non-nephropathy controls) | 1003/2057 | 0.027 | 0.73 | 0.096 |
| All-cause ESKD cohort (non-diabetic ESKD & T2D-ESKD cases vs. non-diabetic non-nephropathy controls) | 2129/912 | 0.020 | 0.52 | 0.015 |
| Meta-analysis of Discovery & Replication & All-cause ESKD cohort | 4605/2969 | 0.021 | 0.47 | 1.20E-06 |

Abbreviations: N, number; EAF, effect allele frequency; OR, odds ratio; P, p value.

Table 5. Top associations of gene-based analyses in baseline or APOL1-negative models

The first N case/control, N of Variants, Average Frequency and P columns refer to T2D-ESKD vs. non-diabetic non-nephropathy controls; the last three columns refer to T2D-lacking nephropathy vs. controls.

| Gene | Method | Model | Variant Group | N case/control | N of Variants | Average Frequency | P | N case/control (T2D-lacking nephropathy) | N of Variants (T2D-lacking nephropathy) | P (T2D-lacking nephropathy) |
|---|---|---|---|---|---|---|---|---|---|---|
| TMEM5 | SKAT | Baseline | single predicted | 456/936 | 9 | 0.0056 | 2.70E-05 | 338/936 | 10 | 0.25 |
| TMEM5 | SKAT | APOL1-negative | single predicted | 358/936 | 8 | 0.0062 | 4.08E-06 | 338/936 | 10 | 0.25 |
| SPATS2 | SKAT | APOL1-negative | single predicted | 358/936 | 11 | 0.0019 | 9.22E-05 | 338/936 | 11 | 0.93 |
| ZIC4 | VT | APOL1-negative | single predicted | 358/936 | 6 | 0.00039 | 4.68E-05 | 338/936 | 2 | 0.13 |
| HELZ2 | MB | APOL1-negative | H&M | 358/936 | 69 | 0.0035 | 8.12E-05 | 338/936 | 73 | 0.78 |
| ILDR2 | MB | APOL1-negative | H&M | 358/936 | 11 | 0.00070 | 9.31E-05 | 338/936 | 6 | 0.16 |
| LGALS3BP | MB | APOL1-negative | H&M | 358/936 | 19 | 0.0016 | 3.81E-05 | 338/936 | 3 | 0.93 |
| RSAD2 | MB | APOL1-negative | H&M | 358/936 | 9 | 0.0074 | 8.31E-05 | 338/936 | 3 | 0.56 |

Abbreviations: N, number; EAF, effect allele frequency; OR, odds ratio; P, p value; single predicted, variant group containing variants predicted to be deleterious by at least one prediction method; H&M, variant group containing variants with high to moderate protein structure disrupting effect

Supplementary Table 1. Discrimination analysis for top associations of baseline model and APOL1-negative model

All columns compare T2D-lacking nephropathy vs. control.

| Variant | CHR | POS | EA | Discovery N case/control | Discovery EAF | Discovery OR | Discovery P | Replication N case/control | Replication EAF | Replication OR | Replication P | Meta-analysis N case/control | Meta-analysis EAF | Meta-analysis OR | Meta-analysis P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs59928984 | 1 | 907622 | C | 338/936 | 0.198 | 0.96 | 0.74 | 665/1121 | 0.199 | 1.10 | 0.37 | 1003/2057 | 0.199 | 1.04 | 0.65 |
| rs180881514 | 1 | 1685713 | T | 338/936 | 0.005 | 0.73 | 0.62 | 665/1121 | 0.008 | 1.29 | 0.57 | 1003/2057 | 0.007 | 1.07 | 0.86 |
| rs115499155 | 1 | 149943088 | T | 338/936 | 0.024 | 1.17 | 0.61 | 665/1121 | 0.026 | 0.79 | 0.40 | 1003/2057 | 0.025 | 0.95 | 0.79 |
| rs834514 | 2 | 17696573 | T | 338/936 | 0.221 | 0.85 | 0.16 | 665/1121 | 0.223 | 0.92 | 0.42 | 1003/2057 | 0.222 | 0.89 | 0.13 |
| rs665312 | 2 | 17698678 | G | 338/936 | 0.256 | 0.88 | 0.24 | 665/1121 | 0.250 | 0.90 | 0.26 | 1003/2057 | 0.252 | 0.89 | 0.10 |
| rs41302867 | 6 | 7240876 | A | 338/936 | 0.028 | 0.64 | 0.12 | 665/1121 | 0.027 | 0.81 | 0.39 | 1003/2057 | 0.027 | 0.73 | 0.10 |
| rs9986447 | 6 | 42942779 | G | 338/936 | 0.229 | 0.88 | 0.24 | 665/1121 | 0.216 | 0.86 | 0.12 | 1003/2057 | 0.222 | 0.87 | 0.051 |
| rs74678433 | 7 | 126173950 | G | 338/936 | 0.021 | 1.42 | 0.28 | 665/1121 | 0.021 | 1.23 | 0.48 | 1003/2057 | 0.021 | 1.31 | 0.21 |
| rs34481144 | 11 | 320836 | T | 338/936 | 0.116 | 1.07 | 0.66 | 665/1121 | 0.116 | 0.75 | 0.03 | 1003/2057 | 0.116 | 0.88 | 0.19 |
| rs8009244 | 14 | 55643752 | A | 338/936 | 0.234 | 0.78 | 0.03 | 665/1121 | 0.231 | 0.94 | 0.55 | 1003/2057 | 0.232 | 0.87 | 0.06 |
| rs148278682 | 19 | 13264042 | A | 338/936 | 0.012 | 0.63 | 0.29 | 665/1121 | 0.010 | 0.83 | 0.67 | 1003/2057 | 0.011 | 0.72 | 0.29 |
| rs268672 | 19 | 40901604 | A | 338/936 | 0.197 | 0.86 | 0.20 | 665/1121 | 0.207 | 0.88 | 0.18 | 1003/2057 | 0.202 | 0.87 | 0.07 |
| rs268671 | 19 | 40901614 | A | 338/936 | 0.197 | 0.87 | 0.21 | 665/1121 | 0.210 | 0.86 | 0.14 | 1003/2057 | 0.204 | 0.86 | 0.052 |
| rs73885319 | 22 | 36661906 | G | 338/936 | 0.210 | 1.06 | 0.62 | 665/1121 | 0.210 | 0.95 | 0.63 | 1003/2057 | 0.210 | 1.00 | 0.96 |
| rs60910145 | 22 | 36662034 | G | 338/936 | 0.204 | 1.05 | 0.65 | 665/1121 | 0.208 | 0.95 | 0.62 | 1003/2057 | 0.207 | 0.99 | 0.93 |

Abbreviations: T2D, type 2 diabetes; ESKD, end-stage kidney disease; CHR, chromosome; POS, position; EA, effect allele; N, number; EAF, effect allele frequency; OR, odds ratio; P, p value; APOL1-negative model: adjusted for age, sex and PC1, APOL1 risk carriers removed.


Supplementary Table 2. GTEx results of top associations (P<1x10^-4)

Variant | Associated region | Target Gene | P-Value | Effect Size | Tissue
rs268671 | PRX | PRX | 3.90E-05 | -0.23 | Whole Blood
rs268671 | PRX | SERTAD3 | 1.30E-16 | -0.5 | Skin - Sun Exposed (Lower leg)
rs268671 | PRX | SERTAD3 | 3.20E-08 | -0.41 | Skin - Not Sun Exposed (Suprapubic)
rs268671 | PRX | SERTAD3 | 1.90E-07 | -0.22 | Whole Blood
rs268671 | PRX | SERTAD3 | 2.40E-07 | -0.32 | Nerve - Tibial
rs268671 | PRX | SERTAD3 | 1.90E-06 | -0.27 | Cells - Transformed fibroblasts
rs268671 | PRX | SERTAD3 | 2.00E-06 | -0.34 | Artery - Tibial
rs268671 | PRX | SERTAD3 | 4.20E-06 | -0.29 | Muscle - Skeletal
rs268671 | PRX | SERTAD3 | 9.40E-06 | -0.26 | Adipose - Subcutaneous
rs268671 | PRX | SERTAD3 | 2.00E-05 | -0.33 | Heart - Left Ventricle
rs268672 | PRX | PRX | 3.90E-05 | -0.23 | Whole Blood
rs268672 | PRX | SERTAD3 | 1.30E-16 | -0.5 | Skin - Sun Exposed (Lower leg)
rs268672 | PRX | SERTAD3 | 3.20E-08 | -0.41 | Skin - Not Sun Exposed (Suprapubic)
rs268672 | PRX | SERTAD3 | 1.90E-07 | -0.22 | Whole Blood
rs268672 | PRX | SERTAD3 | 2.40E-07 | -0.32 | Nerve - Tibial
rs268672 | PRX | SERTAD3 | 1.90E-06 | -0.27 | Cells - Transformed fibroblasts
rs268672 | PRX | SERTAD3 | 2.00E-06 | -0.34 | Artery - Tibial
rs268672 | PRX | SERTAD3 | 4.20E-06 | -0.29 | Muscle - Skeletal
rs268672 | PRX | SERTAD3 | 9.40E-06 | -0.26 | Adipose - Subcutaneous
rs268672 | PRX | SERTAD3 | 2.00E-05 | -0.33 | Heart - Left Ventricle
rs34481144 | IFITM3 | IFITM1 | 7.70E-06 | 0.2 | Whole Blood
rs34481144 | IFITM3 | IFITM1 | 9.50E-06 | 0.41 | Heart - Atrial Appendage
rs34481144 | IFITM3 | IFITM1 | 1.40E-05 | 0.25 | Esophagus - Muscularis
rs34481144 | IFITM3 | IFITM1 | 1.40E-05 | 0.17 | Skin - Sun Exposed (Lower leg)
rs34481144 | IFITM3 | IFITM1 | 1.90E-05 | 0.16 | Muscle - Skeletal
rs34481144 | IFITM3 | IFITM1 | 3.60E-05 | 0.24 | Nerve - Tibial
rs34481144 | IFITM3 | IFITM2 | 4.20E-09 | 0.72 | Cells - EBV-transformed lymphocytes
rs34481144 | IFITM3 | IFITM2 | 1.40E-08 | 0.33 | Cells - Transformed fibroblasts
rs34481144 | IFITM3 | IFITM2 | 2.10E-05 | 0.19 | Lung
rs34481144 | IFITM3 | IFITM2 | 2.60E-05 | 0.28 | Adipose - Visceral (Omentum)
rs34481144 | IFITM3 | IFITM3 | 3.60E-15 | -0.49 | Whole Blood
rs34481144 | IFITM3 | IFITM3 | 9.60E-11 | -0.25 | Testis
rs34481144 | IFITM3 | IFITM3 | 4.00E-08 | -0.35 | Pancreas
rs34481144 | IFITM3 | IFITM3 | 7.50E-08 | -0.34 | Brain - Caudate (basal ganglia)
rs34481144 | IFITM3 | IFITM3 | 1.00E-06 | -0.29 | Brain - Putamen (basal ganglia)
rs34481144 | IFITM3 | IFITM3 | 1.50E-06 | -0.35 | Brain - Nucleus accumbens (basal ganglia)
rs34481144 | IFITM3 | IFITM3 | 1.10E-05 | -0.38 | Brain - Hypothalamus
rs34481144 | IFITM3 | IFITM3 | 7.30E-05 | -0.19 | Thyroid
rs34481144 | IFITM3 | RP11-326C3.10 | 1.30E-05 | 0.53 | Testis
rs34481144 | IFITM3 | RP11-326C3.11 | 1.20E-16 | -0.61 | Adipose - Subcutaneous
rs34481144 | IFITM3 | RP11-326C3.11 | 4.10E-11 | -0.48 | Skin - Sun Exposed (Lower leg)
rs34481144 | IFITM3 | RP11-326C3.11 | 1.70E-07 | -0.37 | Lung
rs34481144 | IFITM3 | RP11-326C3.11 | 5.00E-07 | -0.49 | Breast - Mammary Tissue
rs34481144 | IFITM3 | RP11-326C3.11 | 2.70E-06 | -0.56 | Pancreas
rs34481144 | IFITM3 | RP11-326C3.11 | 1.00E-05 | -0.47 | Skin - Not Sun Exposed (Suprapubic)
rs34481144 | IFITM3 | RP11-326C3.11 | 1.30E-05 | -0.48 | Cells - EBV-transformed lymphocytes
rs34481144 | IFITM3 | RP11-326C3.11 | 2.30E-05 | -0.34 | Colon - Transverse
rs34481144 | IFITM3 | RP11-326C3.11 | 2.70E-05 | -0.36 | Stomach
rs34481144 | IFITM3 | RP11-326C3.11 | 5.90E-05 | -0.33 | Artery - Tibial
rs34481144 | IFITM3 | RP11-326C3.12 | 1.20E-13 | 0.77 | Esophagus - Muscularis
rs34481144 | IFITM3 | RP11-326C3.12 | 1.80E-11 | 0.64 | Nerve - Tibial
rs34481144 | IFITM3 | RP11-326C3.12 | 6.10E-10 | 0.8 | Brain - Caudate (basal ganglia)
rs34481144 | IFITM3 | RP11-326C3.12 | 9.80E-10 | 0.9 | Brain - Hypothalamus
rs34481144 | IFITM3 | RP11-326C3.12 | 1.90E-09 | 0.83 | Colon - Sigmoid
rs34481144 | IFITM3 | RP11-326C3.12 | 4.20E-09 | 0.77 | Brain - Cortex
rs34481144 | IFITM3 | RP11-326C3.12 | 5.70E-09 | 0.55 | Artery - Tibial
rs34481144 | IFITM3 | RP11-326C3.12 | 6.20E-09 | 0.8 | Brain - Putamen (basal ganglia)
rs34481144 | IFITM3 | RP11-326C3.12 | 5.90E-08 | 0.44 | Skin - Sun Exposed (Lower leg)
rs34481144 | IFITM3 | RP11-326C3.12 | 2.30E-07 | 0.69 | Brain - Nucleus accumbens (basal ganglia)
rs34481144 | IFITM3 | RP11-326C3.12 | 3.00E-07 | 0.71 | Brain - Frontal Cortex (BA9)
rs34481144 | IFITM3 | RP11-326C3.12 | 7.60E-07 | 0.64 | Esophagus - Gastroesophageal Junction
rs34481144 | IFITM3 | RP11-326C3.12 | 8.70E-07 | 0.55 | Artery - Aorta
rs34481144 | IFITM3 | RP11-326C3.12 | 1.70E-06 | 0.24 | Whole Blood
rs34481144 | IFITM3 | RP11-326C3.12 | 3.80E-06 | 0.62 | Brain - Cerebellum
rs34481144 | IFITM3 | RP11-326C3.12 | 1.10E-05 | 0.5 | Testis
rs34481144 | IFITM3 | RP11-326C3.12 | 2.10E-05 | 0.39 | Colon - Transverse
rs34481144 | IFITM3 | RP11-326C3.13 | 6.20E-22 | 0.56 | Whole Blood
rs34481144 | IFITM3 | RP11-326C3.13 | 7.30E-13 | 0.63 | Artery - Tibial
rs34481144 | IFITM3 | RP11-326C3.13 | 7.40E-13 | 0.7 | Nerve - Tibial
rs34481144 | IFITM3 | RP11-326C3.13 | 2.40E-12 | 0.55 | Lung
rs34481144 | IFITM3 | RP11-326C3.13 | 4.00E-12 | 0.81 | Esophagus - Gastroesophageal Junction
rs34481144 | IFITM3 | RP11-326C3.13 | 1.90E-11 | 0.62 | Adipose - Subcutaneous
rs34481144 | IFITM3 | RP11-326C3.13 | 1.30E-08 | 0.81 | Colon - Sigmoid
rs34481144 | IFITM3 | RP11-326C3.13 | 8.70E-08 | 0.48 | Skin - Sun Exposed (Lower leg)
rs34481144 | IFITM3 | RP11-326C3.13 | 1.20E-07 | 0.52 | Testis
rs34481144 | IFITM3 | RP11-326C3.13 | 1.70E-07 | 0.51 | Breast - Mammary Tissue
rs34481144 | IFITM3 | RP11-326C3.13 | 2.80E-07 | 0.42 | Adipose - Visceral (Omentum)
rs34481144 | IFITM3 | RP11-326C3.13 | 5.10E-07 | 0.55 | Artery - Aorta
rs34481144 | IFITM3 | RP11-326C3.13 | 1.60E-05 | 0.46 | Heart - Left Ventricle
rs34481144 | IFITM3 | RP11-326C3.13 | 2.40E-05 | 0.42 | Esophagus - Mucosa
rs34481144 | IFITM3 | RP11-326C3.14 | 4.40E-06 | 0.51 | Testis
rs34481144 | IFITM3 | RP11-326C3.15 | 1.70E-09 | 0.52 | Artery - Tibial
rs34481144 | IFITM3 | RP11-326C3.15 | 2.00E-08 | 0.48 | Nerve - Tibial
rs34481144 | IFITM3 | RP11-326C3.15 | 7.40E-08 | 0.28 | Whole Blood
rs34481144 | IFITM3 | RP11-326C3.15 | 3.00E-06 | 0.47 | Esophagus - Muscularis
rs34481144 | IFITM3 | RP11-326C3.15 | 7.60E-06 | 0.6 | Brain - Anterior cingulate cortex (BA24)
rs34481144 | IFITM3 | RP11-326C3.15 | 1.40E-05 | 0.59 | Brain - Caudate (basal ganglia)
rs34481144 | IFITM3 | RP11-326C3.15 | 4.20E-05 | 0.57 | Brain - Cerebellum
rs34481144 | IFITM3 | RP11-326C3.7 | 2.80E-06 | 0.59 | Cells - EBV-transformed lymphocytes
rs34481144 | IFITM3 | RP11-326C3.7 | 1.70E-05 | 0.25 | Whole Blood
rs59928984 | PLEKHN1 | RP11-206L10.1 | 1.30E-07 | -0.97 | Adrenal Gland
rs59928984 | PLEKHN1 | RP11-206L10.1 | 2.00E-06 | -0.62 | Nerve - Tibial
rs59928984 | PLEKHN1 | RP11-206L10.1 | 7.60E-06 | -0.82 | Stomach
rs59928984 | PLEKHN1 | RP11-206L10.1 | 1.70E-05 | -0.94 | Spleen
rs665312 | RAD51AP2 | GEN1 | 2.10E-06 | -0.24 | Skin - Sun Exposed (Lower leg)
rs665312 | RAD51AP2 | GEN1 | 2.10E-05 | -0.26 | Skin - Not Sun Exposed (Suprapubic)
rs665312 | RAD51AP2 | GEN1 | 2.30E-05 | -0.28 | Adipose - Visceral (Omentum)
rs665312 | RAD51AP2 | RAD51AP2 | 7.50E-08 | 0.33 | Esophagus - Mucosa
rs665312 | RAD51AP2 | RAD51AP2 | 8.30E-06 | 0.38 | Stomach
rs665312 | RAD51AP2 | SMC6 | 3.10E-06 | -0.2 | Thyroid
rs665312 | RAD51AP2 | VSNL1 | 4.00E-06 | 0.32 | Liver
rs665312 | RAD51AP2 | VSNL1 | 9.10E-06 | 0.47 | Pancreas
rs8009244 | DLGAP5 | LGALS3 | 1.70E-05 | -0.51 | Artery - Aorta
rs8009244 | DLGAP5 | RP11-454L9.2 | 2.70E-05 | -0.55 | Testis
rs8009244 | DLGAP5 | RP11-665C16.6 | 1.90E-06 | -0.75 | Testis
rs834514 | RAD51AP2 | GEN1 | 1.40E-08 | -0.36 | Thyroid
rs834514 | RAD51AP2 | GEN1 | 1.20E-07 | -0.27 | Skin - Sun Exposed (Lower leg)
rs834514 | RAD51AP2 | GEN1 | 2.80E-07 | -0.31 | Skin - Not Sun Exposed (Suprapubic)
rs834514 | RAD51AP2 | GEN1 | 1.20E-06 | -0.43 | Heart - Left Ventricle
rs834514 | RAD51AP2 | GEN1 | 2.10E-06 | -0.33 | Artery - Tibial
rs834514 | RAD51AP2 | GEN1 | 6.00E-06 | -0.27 | Stomach
rs834514 | RAD51AP2 | GEN1 | 1.20E-05 | -0.25 | Adipose - Subcutaneous
rs834514 | RAD51AP2 | GEN1 | 1.20E-05 | -0.29 | Adipose - Visceral (Omentum)
rs834514 | RAD51AP2 | GEN1 | 2.00E-05 | -0.26 | Lung
rs834514 | RAD51AP2 | GEN1 | 2.00E-05 | -0.27 | Nerve - Tibial
rs834514 | RAD51AP2 | GEN1 | 2.40E-05 | -0.5 | Esophagus - Gastroesophageal Junction
rs834514 | RAD51AP2 | GEN1 | 3.00E-05 | -0.18 | Colon - Transverse
rs834514 | RAD51AP2 | GEN1 | 4.30E-05 | -0.4 | Pancreas
rs834514 | RAD51AP2 | RAD51AP2 | 1.60E-07 | 0.32 | Esophagus - Mucosa
rs834514 | RAD51AP2 | RAD51AP2 | 2.10E-06 | 0.41 | Stomach
rs834514 | RAD51AP2 | SMC6 | 3.50E-07 | 0.51 | Liver
rs834514 | RAD51AP2 | SMC6 | 3.90E-06 | -0.2 | Thyroid
rs834514 | RAD51AP2 | VSNL1 | 4.60E-07 | 0.34 | Liver
rs834514 | RAD51AP2 | VSNL1 | 3.70E-05 | 0.45 | Pancreas
rs9986447 | PEX6 | C6orf226 | 2.50E-05 | -0.18 | Nerve - Tibial
rs9986447 | PEX6 | CNPY3 | 4.50E-05 | -0.22 | Testis
rs9986447 | PEX6 | GNMT | 8.00E-19 | -0.53 | Thyroid
rs9986447 | PEX6 | GNMT | 6.00E-17 | -0.55 | Artery - Tibial
rs9986447 | PEX6 | GNMT | 1.70E-15 | -0.43 | Muscle - Skeletal
rs9986447 | PEX6 | GNMT | 2.20E-14 | -0.49 | Nerve - Tibial
rs9986447 | PEX6 | GNMT | 1.30E-13 | -0.52 | Adipose - Subcutaneous
rs9986447 | PEX6 | GNMT | 1.40E-13 | -0.91 | Brain - Frontal Cortex (BA9)
rs9986447 | PEX6 | GNMT | 2.70E-13 | -0.5 | Heart - Left Ventricle
rs9986447 | PEX6 | GNMT | 1.20E-12 | -1 | Brain - Cortex
rs9986447 | PEX6 | GNMT | 2.80E-11 | -0.46 | Esophagus - Muscularis
rs9986447 | PEX6 | GNMT | 1.30E-10 | -0.43 | Skin - Sun Exposed (Lower leg)
rs9986447 | PEX6 | GNMT | 5.90E-10 | -0.62 | Brain - Cerebellar Hemisphere
rs9986447 | PEX6 | GNMT | 7.40E-10 | -0.87 | Brain - Putamen (basal ganglia)
rs9986447 | PEX6 | GNMT | 9.30E-10 | -0.82 | Brain - Hippocampus
rs9986447 | PEX6 | GNMT | 6.00E-09 | -0.86 | Brain - Hypothalamus
rs9986447 | PEX6 | GNMT | 8.10E-09 | -0.66 | Colon - Sigmoid
rs9986447 | PEX6 | GNMT | 1.40E-08 | -0.92 | Brain - Anterior cingulate cortex (BA24)
rs9986447 | PEX6 | GNMT | 1.60E-08 | -0.28 | Lung
rs9986447 | PEX6 | GNMT | 4.10E-08 | -0.44 | Artery - Aorta
rs9986447 | PEX6 | GNMT | 4.00E-07 | -0.7 | Brain - Nucleus accumbens (basal ganglia)
rs9986447 | PEX6 | GNMT | 5.00E-07 | -0.61 | Brain - Caudate (basal ganglia)
rs9986447 | PEX6 | GNMT | 6.60E-07 | -0.57 | Artery - Coronary
rs9986447 | PEX6 | GNMT | 2.00E-06 | -0.41 | Heart - Atrial Appendage
rs9986447 | PEX6 | GNMT | 2.10E-06 | -0.48 | Brain - Cerebellum
rs9986447 | PEX6 | GNMT | 8.10E-06 | -0.28 | Cells - Transformed fibroblasts
rs9986447 | PEX6 | GNMT | 1.30E-05 | -0.43 | Adrenal Gland
rs9986447 | PEX6 | GNMT | 1.40E-05 | -0.34 | Esophagus - Gastroesophageal Junction
rs9986447 | PEX6 | PEX6 | 1.10E-92 | 0.95 | Muscle - Skeletal
rs9986447 | PEX6 | PEX6 | 1.20E-77 | 0.66 | Whole Blood
rs9986447 | PEX6 | PEX6 | 1.30E-70 | 0.82 | Cells - Transformed fibroblasts
rs9986447 | PEX6 | PEX6 | 1.60E-68 | 0.84 | Adipose - Subcutaneous
rs9986447 | PEX6 | PEX6 | 5.90E-67 | 0.79 | Skin - Sun Exposed (Lower leg)
rs9986447 | PEX6 | PEX6 | 2.00E-58 | 0.78 | Esophagus - Mucosa
rs9986447 | PEX6 | PEX6 | 8.50E-58 | 0.72 | Lung
rs9986447 | PEX6 | PEX6 | 2.90E-57 | 0.65 | Nerve - Tibial
rs9986447 | PEX6 | PEX6 | 1.30E-54 | 0.54 | Artery - Tibial
rs9986447 | PEX6 | PEX6 | 3.00E-50 | 0.47 | Thyroid
rs9986447 | PEX6 | PEX6 | 8.30E-48 | 0.87 | Skin - Not Sun Exposed (Suprapubic)
rs9986447 | PEX6 | PEX6 | 3.80E-47 | 0.58 | Esophagus - Muscularis
rs9986447 | PEX6 | PEX6 | 6.40E-42 | 0.72 | Colon - Transverse
rs9986447 | PEX6 | PEX6 | 1.50E-40 | 0.68 | Breast - Mammary Tissue
rs9986447 | PEX6 | PEX6 | 1.10E-34 | 0.68 | Heart - Left Ventricle
rs9986447 | PEX6 | PEX6 | 8.00E-34 | 0.67 | Testis
rs9986447 | PEX6 | PEX6 | 1.30E-32 | 0.66 | Adipose - Visceral (Omentum)
rs9986447 | PEX6 | PEX6 | 6.70E-30 | 0.41 | Artery - Aorta
rs9986447 | PEX6 | PEX6 | 3.20E-29 | 1.2 | Cells - EBV-transformed lymphocytes
rs9986447 | PEX6 | PEX6 | 2.90E-28 | 0.63 | Heart - Atrial Appendage
rs9986447 | PEX6 | PEX6 | 2.20E-25 | 0.6 | Colon - Sigmoid
rs9986447 | PEX6 | PEX6 | 7.80E-25 | 0.69 | Esophagus - Gastroesophageal Junction
rs9986447 | PEX6 | PEX6 | 1.60E-24 | 0.81 | Adrenal Gland
rs9986447 | PEX6 | PEX6 | 5.40E-22 | 0.72 | Spleen
rs9986447 | PEX6 | PEX6 | 5.70E-20 | 0.65 | Stomach
rs9986447 | PEX6 | PEX6 | 9.40E-17 | 0.7 | Liver
rs9986447 | PEX6 | PEX6 | 3.30E-16 | 0.75 | Pancreas
rs9986447 | PEX6 | PEX6 | 3.40E-16 | 0.57 | Brain - Caudate (basal ganglia)
rs9986447 | PEX6 | PEX6 | 5.30E-15 | 0.68 | Ovary
rs9986447 | PEX6 | PEX6 | 7.00E-15 | 0.76 | Small Intestine - Terminal Ileum
rs9986447 | PEX6 | PEX6 | 9.70E-15 | 0.48 | Artery - Coronary
rs9986447 | PEX6 | PEX6 | 1.30E-13 | 0.73 | Brain - Cortex
rs9986447 | PEX6 | PEX6 | 7.20E-13 | 0.67 | Vagina
rs9986447 | PEX6 | PEX6 | 7.50E-13 | 0.59 | Pituitary
rs9986447 | PEX6 | PEX6 | 1.20E-12 | 0.58 | Brain - Frontal Cortex (BA9)
rs9986447 | PEX6 | PEX6 | 1.30E-12 | 0.54 | Brain - Cerebellum
rs9986447 | PEX6 | PEX6 | 4.40E-11 | 0.71 | Brain - Putamen (basal ganglia)
rs9986447 | PEX6 | PEX6 | 1.90E-10 | 0.43 | Prostate
rs9986447 | PEX6 | PEX6 | 2.80E-10 | 0.47 | Brain - Nucleus accumbens (basal ganglia)
rs9986447 | PEX6 | PEX6 | 4.00E-10 | 0.62 | Brain - Hippocampus
rs9986447 | PEX6 | PEX6 | 2.40E-09 | 0.61 | Brain - Anterior cingulate cortex (BA24)
rs9986447 | PEX6 | PEX6 | 1.40E-08 | 0.35 | Brain - Cerebellar Hemisphere
rs9986447 | PEX6 | PEX6 | 6.90E-07 | 0.36 | Brain - Hypothalamus
rs9986447 | PEX6 | RP3-475N16.1 | 2.10E-06 | 0.41 | Stomach

Abbreviations: P, p value.


Supplementary Table 3. Results of top associations from baseline model in APOL1-negative model

Comparison in the discovery and replication stages: T2D-ESKD cases vs. non-diabetic non-nephropathy controls. N is given as case/control.

Variant | Gene | Consequence | CHR | POS | EA | Discovery N | Discovery EAF | Discovery OR | Discovery P | Replication N | Replication EAF | Replication OR | Replication P | Meta-analysis N | Meta-analysis EAF | Meta-analysis OR | Meta-analysis P | Direction
rs59928984 | PLEKHN1 | intron | 1 | 907622 | C | 358/936 | 0.210 | 1.36 | 0.0194 | 1636/1121 | 0.208 | 1.25 | 0.01128 | 1994/2057 | 0.209 | 1.28 | 0.00065 | ++
rs180881514 | NADK | intron | 1 | 1685713 | T | 358/936 | 0.009 | 4.34 | 0.0065 | 1636/1121 | 0.011 | 2.35 | 0.00879 | 1994/2057 | 0.011 | 2.76 | 0.00026 | ++
rs834514 | RAD51AP2 | missense | 2 | 17696573 | T | 358/936 | 0.217 | 0.74 | 0.0206 | 1636/1121 | 0.209 | 0.80 | 0.00865 | 1994/2057 | 0.211 | 0.78 | 0.00052 | --
rs665312 | RAD51AP2 | synonymous | 2 | 17698678 | G | 358/936 | 0.248 | 0.72 | 0.0081 | 1636/1121 | 0.239 | 0.82 | 0.01665 | 1994/2057 | 0.242 | 0.79 | 0.00054 | ++
rs41302867 | RREB1 | intron | 6 | 7240876 | A | 358/936 | 0.025 | 0.36 | 0.0032 | 1636/1121 | 0.021 | 0.49 | 0.00282 | 1994/2057 | 0.022 | 0.44 | 3.60E-05 | --
rs9986447 | PEX6 | splice region | 6 | 42942779 | G | 358/936 | 0.223 | 0.69 | 0.0024 | 1636/1121 | 0.212 | 0.82 | 0.02231 | 1994/2057 | 0.215 | 0.77 | 0.00030 | ++
rs74678433 | GRM8 | intron | 7 | 126173950 | G | 358/936 | 0.024 | 2.47 | 0.0066 | 1636/1121 | 0.027 | 1.97 | 0.00277 | 1994/2057 | 0.026 | 2.11 | 6.24E-05 | --
rs268672 | PRX | synonymous | 19 | 40901604 | A | 358/936 | 0.191 | 0.71 | 0.0089 | 1636/1121 | 0.196 | 0.76 | 0.00184 | 1994/2057 | 0.194 | 0.75 | 5.21E-05 | --
rs268671 | PRX | missense | 19 | 40901614 | A | 358/936 | 0.191 | 0.71 | 0.0089 | 1636/1121 | 0.198 | 0.74 | 0.00068 | 1994/2057 | 0.196 | 0.74 | 1.87E-05 | --

Abbreviations: T2D, type 2 diabetes; ESKD, end-stage kidney disease; CHR, chromosome; POS, position; EA, effect allele; N, number; EAF, effect allele frequency; OR, odds ratio; P, p value; APOL1-negative model: adjusted for age, sex and PC1, APOL1 risk carriers removed.
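The table reports a meta-analysis odds ratio and p value for each variant, but this section does not state how the two stages were combined. As an illustration only, the sketch below assumes an inverse-variance fixed-effect model and back-calculates each stage's standard error from its rounded OR and two-sided P; the function names are hypothetical and are not taken from the dissertation.

```python
import math
from statistics import NormalDist


def se_from_or_p(odds_ratio, p_value):
    """Back-calculate the standard error of log(OR) from a two-sided p value.

    Assumes a Wald test, so |log(OR)| / SE equals the normal quantile of P.
    Not defined for OR == 1 or P == 1 (division by zero).
    """
    z = NormalDist().inv_cdf(1.0 - p_value / 2.0)
    return abs(math.log(odds_ratio)) / z


def fixed_effect_meta(stages):
    """Inverse-variance fixed-effect meta-analysis (assumed method).

    stages: list of (odds_ratio, two_sided_p) pairs, one per stage.
    Returns (pooled_odds_ratio, pooled_two_sided_p).
    """
    total_weight = 0.0
    weighted_beta_sum = 0.0
    for odds_ratio, p_value in stages:
        beta = math.log(odds_ratio)
        weight = 1.0 / se_from_or_p(odds_ratio, p_value) ** 2
        total_weight += weight
        weighted_beta_sum += weight * beta
    pooled_beta = weighted_beta_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    pooled_p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return math.exp(pooled_beta), pooled_p


# rs268671 (PRX): discovery OR 0.71, P 0.0089; replication OR 0.74, P 0.00068
pooled_or, pooled_p = fixed_effect_meta([(0.71, 0.0089), (0.74, 0.00068)])
print(f"pooled OR = {pooled_or:.2f}, P = {pooled_p:.2e}")
```

Because the inputs are rounded table entries, the result should only be expected to land close to the tabulated meta-analysis values for this variant, not to reproduce them exactly.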


Supplementary Table 4. Results of single-variant analysis detected loci in gene-based analysis

Baseline Model: T2D-ESKD cases vs non-diabetic non-nephropathy controls. APOL1-negative Model: T2D-ESKD cases vs non-diabetic non-nephropathy controls. Discrimination: T2D non-nephropathy cases vs non-diabetic non-nephropathy controls. N is given as case/control.

Gene | Method | Variant Group | Baseline N | Baseline N of variants | Baseline Average Frequency | Baseline P | APOL1-negative N | APOL1-negative N of variants | APOL1-negative Average Frequency | APOL1-negative P | Discrimination N | Discrimination N of variants | Discrimination Average Frequency | Discrimination P
APOL1 | CMC | H&M | 456/936 | 21 | 0.0084 | 0.14 | 358/936 | 19 | 0.0071 | 0.92 | 338/936 | 18 | 0.0073 | 0.73
APOL1 | MB | H&M | 456/936 | 21 | 0.0084 | 0.33 | 358/936 | 19 | 0.0071 | 0.61 | 338/936 | 18 | 0.0073 | 0.18
APOL1 | SKAT | H&M | 456/936 | 21 | 0.0084 | 0.17 | 358/936 | 19 | 0.0071 | 0.21 | 338/936 | 18 | 0.0073 | 0.67
APOL1 | VT | H&M | 456/936 | 19 | 0.0042 | 0.085 | 358/936 | 11 | 0.00039 | 0.39 | 338/936 | 11 | 0.00039 | 0.12
APOL1 | CMC | High Impact | 456/936 | 1 | 0.0014 | 0.84 | 358/936 | 1 | 0.0015 | 0.75 | 338/936 | 1 | 0.0016 | 0.85
APOL1 | MB | High Impact | 456/936 | 1 | 0.0014 | 0.84 | 358/936 | 1 | 0.0015 | 0.75 | 338/936 | 1 | 0.0016 | 0.85
APOL1 | SKAT | High Impact | 456/936 | 1 | 0.0014 | 0.84 | 358/936 | 1 | 0.0015 | 0.75 | 338/936 | 1 | 0.0016 | 0.85
APOL1 | VT | High Impact | 456/936 | 1 | 0.0014 | 0.84 | 358/936 | 1 | 0.0015 | 0.75 | 338/936 | 1 | 0.0016 | 0.85
APOL1 | CMC | Single predicted | 456/936 | 4 | 0.00036 | 0.95 | 358/936 | 4 | 0.00039 | 0.89 | 338/936 | 3 | 0.00039 | 0.23
APOL1 | MB | Single predicted | 456/936 | 4 | 0.00036 | 0.95 | 358/936 | 4 | 0.00039 | 0.89 | 338/936 | 3 | 0.00039 | 0.23
APOL1 | SKAT | Single predicted | 456/936 | 4 | 0.00036 | 0.81 | 358/936 | 4 | 0.00039 | 0.80 | 338/936 | 3 | 0.00039 | 0.66
APOL1 | VT | Single predicted | 456/936 | 4 | 0.00036 | 0.95 | 358/936 | 4 | 0.00039 | 0.89 | 338/936 | 3 | 0.00039 | 0.23
DLGAP5 | CMC | H&M | 456/936 | 22 | 0.0016 | 0.18 | 358/936 | 20 | 0.0017 | 0.23 | 338/936 | 18 | 0.0017 | 0.91
DLGAP5 | MB | H&M | 456/936 | 22 | 0.0016 | 0.018 | 358/936 | 20 | 0.0017 | 0.038 | 338/936 | 18 | 0.0017 | 0.75
DLGAP5 | SKAT | H&M | 456/936 | 22 | 0.0016 | 0.43 | 358/936 | 20 | 0.0017 | 0.71 | 338/936 | 18 | 0.0017 | 0.49
DLGAP5 | VT | H&M | 456/936 | 13 | 0.00041 | 0.070 | 358/936 | 11 | 0.00039 | 0.12 | 338/936 | 14 | 0.00053 | 0.68
DLGAP5 | CMC | Single predicted | 456/936 | 6 | 0.00078 | 0.23 | 358/936 | 6 | 0.00084 | 0.40 | 338/936 | 7 | 0.00073 | 0.55
DLGAP5 | MB | Single predicted | 456/936 | 6 | 0.00078 | 0.27 | 358/936 | 6 | 0.00084 | 0.45 | 338/936 | 7 | 0.00073 | 0.65
DLGAP5 | SKAT | Single predicted | 456/936 | 6 | 0.00078 | 0.50 | 358/936 | 6 | 0.00084 | 0.57 | 338/936 | 7 | 0.00073 | 0.14
DLGAP5 | VT | Single predicted | 456/936 | 6 | 0.00078 | 0.44 | 358/936 | 6 | 0.00084 | 0.68 | 338/936 | 4 | 0.00049 | 0.87
GRM8 | CMC | All predicted | 456/936 | 3 | 0.00036 | 0.48 | 358/936 | 3 | 0.00039 | 0.64 | 338/936 | 3 | 0.00039 | 0.96
GRM8 | MB | All predicted | 456/936 | 3 | 0.00036 | 0.48 | 358/936 | 3 | 0.00039 | 0.64 | 338/936 | 3 | 0.00039 | 0.96
GRM8 | SKAT | All predicted | 456/936 | 3 | 0.00036 | 0.27 | 358/936 | 3 | 0.00039 | 0.26 | 338/936 | 3 | 0.00039 | 0.22
GRM8 | VT | All predicted | 456/936 | 3 | 0.00036 | 0.48 | 358/936 | 3 | 0.00039 | 0.64 | 338/936 | 3 | 0.00039 | 0.96
GRM8 | CMC | H&M | 456/936 | 22 | 0.0033 | 0.54 | 358/936 | 21 | 0.0035 | 0.37 | 338/936 | 21 | 0.0037 | 0.92
GRM8 | MB | H&M | 456/936 | 22 | 0.0033 | 0.98 | 358/936 | 21 | 0.0035 | 0.90 | 338/936 | 21 | 0.0037 | 0.70
GRM8 | SKAT | H&M | 456/936 | 22 | 0.0033 | 0.31 | 358/936 | 21 | 0.0035 | 0.34 | 338/936 | 21 | 0.0037 | 0.92
GRM8 | VT | H&M | 456/936 | 18 | 0.00078 | 0.48 | 358/936 | 17 | 0.00080 | 0.78 | 338/936 | 14 | 0.00039 | 0.93
GRM8 | CMC | High Impact | 456/936 | 1 | 0.00036 | 0.44 | 358/936 | 1 | 0.00039 | 0.35 | NA | NA | NA | NA
GRM8 | MB | High Impact | 456/936 | 1 | 0.00036 | 0.44 | 358/936 | 1 | 0.00039 | 0.35 | NA | NA | NA | NA
GRM8 | SKAT | High Impact | 456/936 | 1 | 0.00036 | 0.44 | 358/936 | 1 | 0.00039 | 0.35 | NA | NA | NA | NA
GRM8 | VT | High Impact | 456/936 | 1 | 0.00036 | 0.44 | 358/936 | 1 | 0.00039 | 0.35 | NA | NA | NA | NA
GRM8 | CMC | Single predicted | 456/936 | 21 | 0.0035 | 0.46 | 358/936 | 20 | 0.0036 | 0.30 | 338/936 | 21 | 0.0037 | 0.92
GRM8 | MB | Single predicted | 456/936 | 21 | 0.0035 | 0.71 | 358/936 | 20 | 0.0036 | 0.58 | 338/936 | 21 | 0.0037 | 0.70
GRM8 | SKAT | Single predicted | 456/936 | 21 | 0.0035 | 0.33 | 358/936 | 20 | 0.0036 | 0.36 | 338/936 | 21 | 0.0037 | 0.92
GRM8 | VT | Single predicted | 456/936 | 17 | 0.00080 | 0.68 | 358/936 | 20 | 0.0036 | 0.77 | 338/936 | 14 | 0.00039 | 0.93
IER2 | CMC | H&M | 456/936 | 1 | 0.00037 | 0.75 | 358/936 | 1 | 0.00039 | 0.77 | 338/936 | 1 | 0.00040 | 0.66
IER2 | MB | H&M | 456/936 | 1 | 0.00037 | 0.75 | 358/936 | 1 | 0.00039 | 0.77 | 338/936 | 1 | 0.00040 | 0.66
IER2 | SKAT | H&M | 456/936 | 1 | 0.00037 | 0.75 | 358/936 | 1 | 0.00039 | 0.77 | 338/936 | 1 | 0.00040 | 0.66
IER2 | VT | H&M | 456/936 | 1 | 0.00037 | 0.75 | 358/936 | 1 | 0.00039 | 0.77 | 338/936 | 1 | 0.00040 | 0.66
IER2 | CMC | Single predicted | 456/936 | 1 | 0.00037 | 0.75 | 358/936 | 1 | 0.00039 | 0.77 | 338/936 | 1 | 0.00040 | 0.66
IER2 | MB | Single predicted | 456/936 | 1 | 0.00037 | 0.75 | 358/936 | 1 | 0.00039 | 0.77 | 338/936 | 1 | 0.00040 | 0.66
IER2 | SKAT | Single predicted | 456/936 | 1 | 0.00037 | 0.75 | 358/936 | 1 | 0.00039 | 0.77 | 338/936 | 1 | 0.00040 | 0.66
IER2 | VT | Single predicted | 456/936 | 1 | 0.00037 | 0.75 | 358/936 | 1 | 0.00039 | 0.77 | 338/936 | 1 | 0.00040 | 0.66
IFITM3 | CMC | Single predicted | 456/936 | 1 | 0.00036 | 0.70 | 358/936 | 1 | 0.00039 | 0.78 | 338/936 | 1 | 0.0012 | 0.074
IFITM3 | MB | Single predicted | 456/936 | 1 | 0.00036 | 0.70 | 358/936 | 1 | 0.00039 | 0.78 | 338/936 | 1 | 0.0012 | 0.074
IFITM3 | SKAT | Single predicted | 456/936 | 1 | 0.00036 | 0.70 | 358/936 | 1 | 0.00039 | 0.78 | 338/936 | 1 | 0.0012 | 0.074
IFITM3 | VT | Single predicted | 456/936 | 1 | 0.00036 | 0.70 | 358/936 | 1 | 0.00039 | 0.78 | 338/936 | 1 | 0.0012 | 0.074
NADK | CMC | All predicted | 456/936 | 1 | 0.00036 | 0.056 | 358/936 | 1 | 0.00039 | 0.049 | NA | NA | NA | NA
NADK | MB | All predicted | 456/936 | 1 | 0.00036 | 0.056 | 358/936 | 1 | 0.00039 | 0.049 | NA | NA | NA | NA
NADK | SKAT | All predicted | 456/936 | 1 | 0.00036 | 0.056 | 358/936 | 1 | 0.00039 | 0.049 | NA | NA | NA | NA
NADK | VT | All predicted | 456/936 | 1 | 0.00036 | 0.056 | 358/936 | 1 | 0.00039 | 0.049 | NA | NA | NA | NA
NADK | CMC | H&M | 456/936 | 14 | 0.0013 | 0.92 | 358/936 | 13 | 0.0015 | 0.93 | 338/936 | 11 | 0.0017 | 0.89
NADK | MB | H&M | 456/936 | 14 | 0.0013 | 0.78 | 358/936 | 13 | 0.0015 | 0.96 | 338/936 | 11 | 0.0017 | 0.88
NADK | SKAT | H&M | 456/936 | 14 | 0.0013 | 0.97 | 358/936 | 13 | 0.0015 | 1.00 | 338/936 | 11 | 0.0017 | 0.28
NADK | VT | H&M | 456/936 | 8 | 0.00036 | 0.86 | 358/936 | 6 | 0.00039 | 0.96 | 338/936 | 3 | 0.00039 | 0.53
NADK | CMC | Single predicted | 456/936 | 9 | 0.00068 | 0.78 | 358/936 | 8 | 0.00077 | 0.94 | 338/936 | 6 | 0.0010 | 0.64
NADK | MB | Single predicted | 456/936 | 9 | 0.00068 | 0.48 | 358/936 | 8 | 0.00077 | 0.78 | 338/936 | 6 | 0.0010 | 0.87
NADK | SKAT | Single predicted | 456/936 | 9 | 0.00068 | 0.50 | 358/936 | 8 | 0.00077 | 0.76 | 338/936 | 6 | 0.0010 | 0.69
NADK | VT | Single predicted | 456/936 | 7 | 0.00036 | 0.40 | 358/936 | 6 | 0.00039 | 0.82 | 338/936 | 2 | 0.00039 | 0.62
OTUD7B | CMC | H&M | 456/936 | 13 | 0.00091 | 0.099 | 358/936 | 13 | 0.00095 | 0.16 | 338/936 | 13 | 0.0011 | 0.98
OTUD7B | MB | H&M | 456/936 | 13 | 0.00091 | 0.23 | 358/936 | 13 | 0.00095 | 0.41 | 338/936 | 13 | 0.0011 | 0.88
OTUD7B | SKAT | H&M | 456/936 | 13 | 0.00091 | 0.15 | 358/936 | 13 | 0.00095 | 0.15 | 338/936 | 13 | 0.0011 | 0.70
OTUD7B | VT | H&M | 456/936 | 13 | 0.00091 | 0.23 | 358/936 | 13 | 0.00095 | 0.35 | 338/936 | 12 | 0.00052 | 0.94
OTUD7B | CMC | Single predicted | 456/936 | 8 | 0.00036 | 0.91 | 358/936 | 8 | 0.00039 | 0.65 | 338/936 | 8 | 0.00039 | 0.37
OTUD7B | MB | Single predicted | 456/936 | 8 | 0.00036 | 0.91 | 358/936 | 8 | 0.00039 | 0.65 | 338/936 | 8 | 0.00039 | 0.37
OTUD7B | SKAT | Single predicted | 456/936 | 8 | 0.00036 | 0.25 | 358/936 | 8 | 0.00039 | 0.20 | 338/936 | 8 | 0.00039 | 0.19
OTUD7B | VT | Single predicted | 456/936 | 7 | 0.00036 | 0.66 | 358/936 | 7 | 0.00039 | 0.43 | 338/936 | 8 | 0.00039 | 0.37
PEX6 | CMC | All predicted | 456/936 | 8 | 0.00040 | 0.10 | 358/936 | 8 | 0.00039 | 0.15 | 338/936 | 5 | 0.00047 | 0.92
PEX6 | MB | All predicted | 456/936 | 8 | 0.00040 | 0.092 | 358/936 | 8 | 0.00039 | 0.15 | 338/936 | 5 | 0.00047 | 0.98
PEX6 | SKAT | All predicted | 456/936 | 8 | 0.00040 | 0.10 | 358/936 | 8 | 0.00039 | 0.030 | 338/936 | 5 | 0.00047 | 0.62
PEX6 | VT | All predicted | 456/936 | 6 | 0.00036 | 0.10 | 358/936 | 7 | 0.00039 | 0.13 | 338/936 | 4 | 0.00039 | 0.84
PEX6 | CMC | H&M | 456/936 | 33 | 0.0044 | 0.91 | 358/936 | 33 | 0.0044 | 0.84 | 338/936 | 29 | 0.0050 | 0.84
PEX6 | MB | H&M | 456/936 | 33 | 0.0044 | 0.89 | 358/936 | 33 | 0.0044 | 0.85 | 338/936 | 29 | 0.0050 | 0.92
PEX6 | SKAT | H&M | 456/936 | 33 | 0.0044 | 0.050 | 358/936 | 33 | 0.0044 | 0.035 | 338/936 | 29 | 0.0050 | 0.33
PEX6 | VT | H&M | 456/936 | 26 | 0.00058 | 0.71 | 358/936 | 26 | 0.00057 | 0.79 | 338/936 | 23 | 0.00099 | 0.34
PEX6 | CMC | Single predicted | 456/936 | 25 | 0.0031 | 0.040 | 358/936 | 26 | 0.0049 | 0.0006 | 338/936 | 22 | 0.0052 | 0.66
PEX6 | MB | Single predicted | 456/936 | 25 | 0.0031 | 0.13 | 358/936 | 26 | 0.0049 | 0.029 | 338/936 | 22 | 0.0052 | 0.93
PEX6 | SKAT | Single predicted | 456/936 | 25 | 0.0031 | 0.13 | 358/936 | 26 | 0.0049 | 0.035 | 338/936 | 22 | 0.0052 | 0.21
PEX6 | VT | Single predicted | 456/936 | 25 | 0.0031 | 0.18 | 358/936 | 26 | 0.0049 | 0.0040 | 338/936 | 12 | 0.00039 | 0.30
PLEKHN1 | CMC | H&M | 456/936 | 32 | 0.0031 | 0.33 | 358/936 | 31 | 0.0032 | 0.37 | 338/936 | 31 | 0.0028 | 0.0012
PLEKHN1 | MB | H&M | 456/936 | 32 | 0.0031 | 0.18 | 358/936 | 31 | 0.0032 | 0.20 | 338/936 | 31 | 0.0028 | 0.0028
PLEKHN1 | SKAT | H&M | 456/936 | 32 | 0.0031 | 0.36 | 358/936 | 31 | 0.0032 | 0.71 | 338/936 | 31 | 0.0028 | 0.26
PLEKHN1 | VT | H&M | 456/936 | 18 | 0.00046 | 0.16 | 358/936 | 23 | 0.00073 | 0.35 | 338/936 | 31 | 0.0028 | 0.0096
PLEKHN1 | CMC | Single predicted | 456/936 | 13 | 0.0047 | 0.22 | 358/936 | 13 | 0.0048 | 0.47 | 338/936 | 13 | 0.0043 | 0.014
PLEKHN1 | MB | Single predicted | 456/936 | 13 | 0.0047 | 0.17 | 358/936 | 13 | 0.0048 | 0.35 | 338/936 | 13 | 0.0043 | 0.027
PLEKHN1 | SKAT | Single predicted | 456/936 | 13 | 0.0047 | 0.61 | 358/936 | 13 | 0.0048 | 0.75 | 338/936 | 13 | 0.0043 | 0.14
PLEKHN1 | VT | Single predicted | 456/936 | 9 | 0.00068 | 0.28 | 358/936 | 9 | 0.00073 | 0.49 | 338/936 | 13 | 0.0043 | 0.082
PRX | CMC | H&M | 456/936 | 43 | 0.0029 | 0.34 | 358/936 | 40 | 0.0031 | 0.44 | 338/936 | 43 | 0.0028 | 0.85
PRX | MB | H&M | 456/936 | 43 | 0.0029 | 0.83 | 358/936 | 40 | 0.0031 | 0.48 | 338/936 | 43 | 0.0028 | 0.88
PRX | SKAT | H&M | 456/936 | 43 | 0.0029 | 0.50 | 358/936 | 40 | 0.0031 | 0.56 | 338/936 | 43 | 0.0028 | 0.41
PRX | VT | H&M | 456/936 | 29 | 0.00036 | 0.82 | 358/936 | 27 | 0.00039 | 0.52 | 338/936 | 33 | 0.00046 | 0.99
PRX | CMC | Single predicted | 456/936 | 22 | 0.0024 | 0.72 | 358/936 | 22 | 0.0024 | 0.62 | 338/936 | 25 | 0.0021 | 0.83
PRX | MB | Single predicted | 456/936 | 22 | 0.0024 | 0.26 | 358/936 | 22 | 0.0024 | 0.45 | 338/936 | 25 | 0.0021 | 0.89
PRX | SKAT | Single predicted | 456/936 | 22 | 0.0024 | 0.31 | 358/936 | 22 | 0.0024 | 0.40 | 338/936 | 25 | 0.0021 | 0.67
PRX | VT | Single predicted | 456/936 | 15 | 0.00036 | 0.25 | 358/936 | 15 | 0.00039 | 0.40 | 338/936 | 21 | 0.00047 | 0.95
RAD51AP2 | CMC | H&M | 456/936 | 36 | 0.0060 | 0.22 | 358/936 | 34 | 0.0064 | 0.28 | 338/936 | 40 | 0.0061 | 0.62
RAD51AP2 | MB | H&M | 456/936 | 36 | 0.0060 | 0.23 | 358/936 | 34 | 0.0064 | 0.15 | 338/936 | 40 | 0.0061 | 0.35
RAD51AP2 | SKAT | H&M | 456/936 | 36 | 0.0060 | 0.29 | 358/936 | 34 | 0.0064 | 0.37 | 338/936 | 40 | 0.0061 | 0.38
RAD51AP2 | VT | H&M | 456/936 | 30 | 0.0015 | 0.77 | 358/936 | 28 | 0.0016 | 0.48 | 338/936 | 20 | 0.00039 | 0.63
RAD51AP2 | CMC | High Impact | 456/936 | 2 | 0.00036 | 0.64 | 358/936 | 2 | 0.00039 | 0.64 | 338/936 | 3 | 0.00039 | 0.98
RAD51AP2 | MB | High Impact | 456/936 | 2 | 0.00036 | 0.64 | 358/936 | 2 | 0.00039 | 0.64 | 338/936 | 3 | 0.00039 | 0.98
RAD51AP2 | SKAT | High Impact | 456/936 | 2 | 0.00036 | 0.84 | 358/936 | 2 | 0.00039 | 0.83 | 338/936 | 3 | 0.00039 | 0.70
RAD51AP2 | VT | High Impact | 456/936 | 2 | 0.00036 | 0.64 | 358/936 | 2 | 0.00039 | 0.64 | 338/936 | 2 | 0.00039 | 0.64
RAD51AP2 | CMC | Single predicted | 456/936 | 13 | 0.0058 | 0.18 | 358/936 | 13 | 0.0059 | 0.25 | 338/936 | 16 | 0.0054 | 0.61
RAD51AP2 | MB | Single predicted | 456/936 | 13 | 0.0058 | 0.10 | 358/936 | 13 | 0.0059 | 0.14 | 338/936 | 16 | 0.0054 | 0.81
RAD51AP2 | SKAT | Single predicted | 456/936 | 13 | 0.0058 | 0.41 | 358/936 | 13 | 0.0059 | 0.58 | 338/936 | 16 | 0.0054 | 0.40
RAD51AP2 | VT | Single predicted | 456/936 | 12 | 0.0037 | 0.51 | 358/936 | 12 | 0.0038 | 0.62 | 338/936 | 6 | 0.00039 | 0.46
RREB1 | CMC | All predicted | 456/936 | 3 | 0.011 | 0.27 | 358/936 | 3 | 0.012 | 0.19 | 338/936 | 3 | 0.0097 | 0.058
RREB1 | MB | All predicted | 456/936 | 3 | 0.011 | 0.18 | 358/936 | 3 | 0.012 | 0.24 | 338/936 | 3 | 0.0097 | 0.085
RREB1 | SKAT | All predicted | 456/936 | 3 | 0.011 | 0.34 | 358/936 | 3 | 0.012 | 0.25 | 338/936 | 3 | 0.0097 | 0.080
RREB1 | VT | All predicted | 456/936 | 2 | 0.00072 | 0.41 | 358/936 | 3 | 0.012 | 0.44 | 338/936 | 3 | 0.0097 | 0.11
RREB1 | CMC | H&M | 456/936 | 46 | 0.0047 | 0.85 | 358/936 | 44 | 0.0050 | 0.86 | 338/936 | 46 | 0.0047 | 0.54
RREB1 | MB | H&M | 456/936 | 46 | 0.0047 | 0.91 | 358/936 | 44 | 0.0050 | 0.85 | 338/936 | 46 | 0.0047 | 0.85
RREB1 | SKAT | H&M | 456/936 | 46 | 0.0047 | 0.20 | 358/936 | 44 | 0.0050 | 0.18 | 338/936 | 46 | 0.0047 | 0.78
RREB1 | VT | H&M | 456/936 | 11 | 0.00036 | 0.97 | 358/936 | 13 | 0.00039 | 0.82 | 338/936 | 16 | 0.00039 | 0.95
RREB1 | CMC | Single predicted | 456/936 | 29 | 0.0037 | 0.49 | 358/936 | 27 | 0.0041 | 0.56 | 338/936 | 26 | 0.0041 | 0.029
RREB1 | MB | Single predicted | 456/936 | 29 | 0.0037 | 0.92 | 358/936 | 27 | 0.0041 | 0.85 | 338/936 | 26 | 0.0041 | 0.095
RREB1 | SKAT | Single predicted | 456/936 | 29 | 0.0037 | 0.059 | 358/936 | 27 | 0.0041 | 0.11 | 338/936 | 26 | 0.0041 | 0.40
RREB1 | VT | Single predicted | 456/936 | 28 | 0.0027 | 0.62 | 358/936 | 26 | 0.0029 | 0.55 | 338/936 | 26 | 0.0041 | 0.14

Abbreviations: N, number; P, p value; single predicted, variant group containing variants predicted to be deleterious by at least one prediction method; all predicted, variant group containing variants predicted to be deleterious by all prediction methods; H&M, variant group containing variants with high to moderate protein structure disrupting effect; high impact, variant group containing variants with high protein structure disrupting effect.


Chapter 4

Association of kidney structure-related gene variants with type 2 diabetes-attributed end-stage kidney disease in African Americans

Meijian Guan, Jun Ma, Jacob M. Keaton, Latchezar Dimitrov, Poorva Mudgal, Mary Stromberg, Jason A. Bonomo, Pamela J. Hicks, Barry I. Freedman, Donald W. Bowden, Maggie C. Y. Ng

This manuscript was published in Human Genetics.

Meijian Guan et al. Association of genetic variations in kidney structure-related genes in African Americans with type 2 diabetes associated end-stage kidney disease. Hum Genet (2016). doi:10.1007/s00439-016-1714-2.


Abstract

African Americans (AAs) are at higher risk for developing end-stage kidney disease (ESKD) compared to European Americans. Genome-wide association studies have identified variants associated with diabetic and non-diabetic kidney disease. Nephropathy loci including SLC7A9, UMOD and SHROOM3 have been implicated in the maintenance of normal glomerular and renal tubular structure and function. Herein, 47 genes important in podocyte, glomerular basement membrane, mesangial cell, mesangial matrix, renal tubular cell, and renal interstitium structure were examined for association with type 2 diabetes (T2D)-attributed ESKD in AAs. Single-variant association analysis was performed in the discovery stage, which included 2,041 T2D-ESKD cases and 1,140 controls (non-diabetic, non-nephropathy). Discrimination analyses in 667 T2D cases lacking nephropathy excluded T2D-associated SNPs. Nominal associations were tested in an additional 483 T2D-ESKD cases and 554 controls in the replication stage. Meta-analysis of 4,218 discovery and replication samples revealed 3 significant associations with T2D-ESKD at CD2AP and MMP2 (Pcorr<0.05, corrected for the effective number of SNPs in each locus). Removal of APOL1 renal-risk genotype carriers revealed additional associations at 5 loci: TTC21B, COL4A3, NPHP3-ACAD11, CLDN8, and ARHGAP24 (Pcorr<0.05). Genetic variants at COL4A3, CLDN8, and ARHGAP24 were potentially pathogenic. Gene-based analyses revealed suggestively significant aggregate effects of coding variants at 4 genes. Our findings suggest that genetic variation in kidney structure-related genes may contribute to T2D-attributed ESKD in the AA population.
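
The abstract reports significance as Pcorr, a p value corrected for the effective number of SNPs in each locus. The exact correction is not spelled out here, so the following is only a minimal sketch that assumes a Bonferroni-style adjustment (multiply the nominal P by the effective number of independent SNPs and cap at 1). The function name and the example numbers are hypothetical.

```python
def corrected_p(nominal_p, effective_snps):
    """Bonferroni-style correction by the effective number of SNPs in a locus.

    Assumed form: Pcorr = min(1, P * M_eff). Both the formula and the
    argument names are illustrative assumptions.
    """
    if not 0.0 <= nominal_p <= 1.0:
        raise ValueError("nominal_p must lie in [0, 1]")
    if effective_snps < 1:
        raise ValueError("effective_snps must be at least 1")
    return min(1.0, nominal_p * effective_snps)


# Hypothetical locus: nominal P = 0.002 with 12 effectively independent SNPs
p_corr = corrected_p(0.002, 12)
print(p_corr < 0.05)  # the locus-level threshold used in the abstract
```

Correcting within each locus rather than across all 47 genes is less stringent than a study-wide correction, which is worth keeping in mind when reading the Pcorr<0.05 results.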


Introduction

Relative to European Americans (EAs), African Americans (AAs) are at higher risk for developing end-stage kidney disease (ESKD). In 2012, the incidence rate of ESKD was 908 cases per million in AAs, nearly 3.3-fold higher than in EAs (USRDS, 2016). The excess rate of ESKD in AAs is partly explained by socioeconomic factors (Byrne et al. 1994; Freedman 2002). Diabetes mellitus contributes to more than 44% of all cases of ESKD in the US, and approximately 90% of these relate to type 2 diabetes (T2D) (USRDS, 2016). Familial aggregation of diabetic kidney disease (DKD), including ESKD attributed to T2D (T2D-ESKD), has been reported (Seaquist et al. 1989; Spray et al. 1995; Freedman et al. 1995; Quinn et al. 1996; Iyengar et al. 2015). AAs with T2D who have relatives affected by T2D-ESKD have a nearly 8-fold higher risk for developing T2D-ESKD compared to those who lack relatives with T2D-ESKD. The association remains after controlling for income, education, serum cholesterol, smoking, and hypertension (Freedman et al. 1995). In addition, prior genome-wide association studies (GWAS) have identified variants associated with DKD in multiple populations (Maeda 2004; Pezzolesi et al. 2009; McDonough et al. 2011; Sandholm et al. 2012). These findings suggest that genetic factors may play a role in the development of nephropathy and ESKD in T2D patients. While the apolipoprotein L1 gene (APOL1) explains a substantial proportion of the disparity in non-diabetic ESKD in AAs (Tzur et al. 2010a; Genovese et al. 2010), less is known about the genetic contributors to the high T2D-ESKD risk in AAs (Bonomo et al. 2014a; Ma et al. 2016; Skorecki and Wasser 2016).

Structural abnormalities in the kidney, including glomerular basement membrane (GBM) thickening, podocyte effacement, mesangial matrix expansion, glomerulosclerosis, and tubulointerstitial fibrosis are observed during the development of kidney disease in patients with diabetes (Badal and Danesh 2014). A number of gene products play critical roles in maintaining normal kidney substructure and variants in several of these genes have been associated with DKD. For example, three single nucleotide polymorphisms (SNPs) in the CD2-associated protein gene (CD2AP), essential for podocyte function, were associated with ESKD in Europeans with type 1 diabetes (T1D) (Hyvönen et al. 2013). Coding variants in the podocyte slit diaphragm gene nephrin (NPHS1) were associated with T2D-ESKD in AAs (Bonomo et al. 2014b). In addition, genetic variants at shroom family member 3 (SHROOM3), which plays a role in maintaining the integrity of the glomerular filtration barrier (Yeo et al. 2015), and uromodulin (UMOD), a kidney-specific gene expressed in the thick ascending limb (TAL) of the loop of Henle, were associated with kidney filtration in Europeans with T2D (Deshmukh et al. 2013).

These findings suggest that additional variants in glomerular and tubulointerstitial structure-related genes may be associated with ESKD in AAs with T2D. Herein, 47 candidate genes involved in kidney structure were examined using a high-coverage genotyping array with customized content covering the targeted regions, in order to evaluate common and low-frequency SNPs for association with T2D-ESKD in 4,885 AA subjects.
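
The overall sample size of 4,885 can be checked against the stage-level counts given in the abstract. The short tally below uses only those reported counts; the variable names are illustrative.

```python
# Stage-level counts as reported in the abstract
discovery = {"T2D-ESKD cases": 2041, "controls": 1140}
replication = {"T2D-ESKD cases": 483, "controls": 554}
t2d_lacking_nephropathy = 667  # used only in the discrimination analyses

# Samples entering the discovery + replication meta-analysis
meta_analysis_total = sum(discovery.values()) + sum(replication.values())

# All subjects, including the T2D group lacking nephropathy
all_subjects = meta_analysis_total + t2d_lacking_nephropathy

print(meta_analysis_total, all_subjects)  # 4218 4885
```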

Materials and Methods

Study samples and clinical characteristics

The discovery stage included 2,041 AA participants with T2D-ESKD and 1,140 AA controls without diabetes or nephropathy. The replication stage included 483 AA participants with T2D-ESKD and 554 AA controls without diabetes or nephropathy. Trait discrimination analyses included T2D-ESKD cases and controls from the discovery stage, as well as 667 AAs with T2D who lacked evidence of nephropathy (Table 1). All subjects resided in North Carolina, South Carolina, Georgia, Tennessee, and Virginia. Patients were considered to have T2D-ESKD when diabetes was diagnosed after age 25 years and was present for >5 years prior to the onset of ESKD (or in the presence of diabetic retinopathy ensuring adequate T2D durations), and one or more of the following were present: renal replacement therapy (N=1,928), estimated glomerular filtration rate (eGFR) <30 ml/min/1.73 m2 (N=43), or urine albumin:creatinine ratio (UACR) >300 mg/g (N=70), when eGFR was missing. Patients with ESKD attributed to non-diabetic causes were excluded. Non-diabetic non-nephropathy controls included participants without diabetes or kidney disease (eGFR >60 ml/min/1.73 m2 and UACR <30 mg/g). Individuals with T2D lacking nephropathy had eGFR >60 ml/min/1.73 m2 and UACR <30 mg/g. Detailed descriptions of inclusion criteria for T2D-ESKD patients, non-diabetic non-nephropathy controls, and T2D lacking nephropathy subjects are included in the Supplementary Methods. This study was approved by the Wake Forest School of Medicine Institutional Review Board. All participants provided written informed consent.
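The case definition above can be expressed as a small predicate. The following is a minimal sketch, not the study's actual phenotyping code: the function and parameter names are hypothetical, and it assumes one reading of the text, namely that diabetic retinopathy substitutes for the >5 year duration requirement and that UACR is consulted only when eGFR is missing.

```python
from typing import Optional


def is_t2d_eskd_case(
    age_at_t2d_diagnosis: float,
    t2d_years_before_eskd: float,
    has_diabetic_retinopathy: bool,
    on_renal_replacement: bool,
    egfr: Optional[float],
    uacr: Optional[float],
) -> bool:
    """Return True when a participant meets the T2D-ESKD case definition (hypothetical helper)."""
    # Diabetes criterion: diagnosis after age 25 and either >5 years of T2D
    # before ESKD or diabetic retinopathy (assumed to stand in for duration).
    adequate_t2d = age_at_t2d_diagnosis > 25 and (
        t2d_years_before_eskd > 5 or has_diabetic_retinopathy
    )

    # Kidney criterion: renal replacement therapy, or eGFR < 30 ml/min/1.73 m2,
    # or UACR > 300 mg/g when eGFR is missing.
    if on_renal_replacement:
        kidney_criterion = True
    elif egfr is not None:
        kidney_criterion = egfr < 30
    else:
        kidney_criterion = uacr is not None and uacr > 300

    return adequate_t2d and kidney_criterion
```

Exclusion of ESKD attributed to non-diabetic causes is a separate clinical judgment and is not modelled here.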

Selection of kidney structure-related candidate genes

Although the precise mechanisms underlying kidney disease progression in patients with T2D are unclear, dysfunction of the glomerular filtration barrier, mesangial cell proliferation, mesangial matrix expansion, renal tubular atrophy, and renal interstitial fibrosis are observed. Therefore, genes related to the structure or function of podocytes, the slit diaphragm, GBM, mesangial cells, mesangial matrix, renal tubules, and renal interstitium were selected. Kidney structure-related genes were selected based on literature search and online database queries of GeneCards and Online Mendelian Inheritance in Man (OMIM) using keywords including “glomerular”, “podocytes”, “mesangial matrix”, “glomerular basement membrane” and “renal tubular”. A total of 47 genes reported to impact kidney structures were selected. These included NPHS1, NPHS2, CD2AP and ACTN4 in podocytes/slit diaphragms (Fan et al. 2006), type IV collagen and LAMB2 in GBM (Miner 2011), ITGA3 and MMP2 in mesangial matrix (Prete et al. 1997; Shukrun et al. 2014), and SLC12A3, ATP6V1BA and the claudin gene-family in renal tubules (Abu Seman et al. 2014; Brown et al. 2009; Yu 2015) (full list in Supplementary Table 1).


Genotyping, imputation and quality control

Discovery and trait discrimination samples. Samples from the discovery and trait discrimination stages (to discriminate association with nephropathy from association with T2D per se) were genotyped on a custom Affymetrix Axiom Biobank Genotyping Array (Affymetrix, Santa Clara, CA, USA). Detailed SNP information, custom content design including fine mapping of candidate regions, genotyping methods, and quality control are described in the Supplementary Methods. The final dataset consisted of 3,848 subjects with relevant phenotypes for analysis. Relevant to this study, 4,588 densely spaced SNPs in the 47 selected candidate genes and their 25kb upstream and downstream flanking regions, all of which passed standard quality control (QC), were examined in the discovery stage.

Replication samples. In the replication stage, 483 AA T2D-ESKD cases and 554 AA controls (non-diabetic, non-nephropathy) were genotyped with the Affymetrix Genome-wide Human SNP array 6.0 (Affymetrix, Santa Clara, CA, USA). Standard quality controls (QCs) were applied to remove SNPs with call rate <95%, minor allele frequency (MAF) <0.01, or departure from Hardy-Weinberg equilibrium (HWE) (P<1x10-4). Sample QC was performed to remove subjects with call rates <95%, contaminated samples, duplicates, and population outliers. SNPs that passed QC were used for pre-phasing and imputation to the cosmopolitan reference haplotype panel from the 1000 Genomes Project phase 1 version 3 (March 2012) (Consortium 2010) using SHAPEIT2 (Delaneau et al. 2012) and IMPUTE2 (Marchini et al. 2007). Imputed variants with imputation info score <0.4 were removed.
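The SNP-level filters described above (call rate, MAF, HWE) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and genotype-count inputs are hypothetical, and a 1-df chi-square goodness-of-fit test stands in for the HWE test, since the text does not state which HWE test was applied or in which samples.

```python
import math


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium (assumed test)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic SNP: no evidence against HWE
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_snp_qc(
    n_aa: int,
    n_ab: int,
    n_bb: int,
    n_missing: int,
    min_call_rate: float = 0.95,
    min_maf: float = 0.01,
    hwe_alpha: float = 1e-4,
) -> bool:
    """Apply the call rate, MAF, and HWE thresholds quoted in the text."""
    n_called = n_aa + n_ab + n_bb
    n_total = n_called + n_missing
    if n_called == 0:
        return False
    call_rate = n_called / n_total
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1.0 - freq_a)
    return (
        call_rate >= min_call_rate
        and maf >= min_maf
        and hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= hwe_alpha
    )
```

In practice such filters are usually applied with dedicated genotype QC software; the sketch only makes the thresholds explicit.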

Statistical analysis

Single variant association. Single variant association in case-control samples was performed using linear mixed models (LMM) implemented in the program MMM (Pirinen et al. 2013) under an additive genetic model with adjustment for age and gender (Baseline model). This method incorporated a genetic relationship matrix (GRM) estimated from a set of high quality autosomal SNPs as a random effect to control for population structure and cryptic relatedness. Whole genome analysis of discovery samples with 516,226 SNPs that passed QC yielded an inflation factor of 1.02, suggesting population structure was sufficiently controlled by including GRM as a random effect. In the discovery stage, 4,588 genetic variants from 47 candidate gene regions were tested in 2,041 AA T2D-ESKD cases and 1,140 AA non-diabetic, non-nephropathy controls. Nominally significant SNPs (P<0.05) were further analyzed in the discrimination stages.
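The inflation factor mentioned above is conventionally the ratio of the median observed association chi-square statistic to the median of a 1-df chi-square distribution. A minimal sketch follows, assuming two-sided P-values as input; the function name is hypothetical and this is not the code used to obtain the reported value of 1.02.

```python
from statistics import NormalDist, median

_NORM = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = _NORM.inv_cdf(0.75) ** 2


def genomic_inflation_factor(p_values):
    """Genomic-control lambda from two-sided P-values, each in the interval (0, 1]."""
    # A two-sided P-value p corresponds to the 1-df chi-square statistic z**2,
    # where z is the normal quantile at p/2.
    chi2_stats = [_NORM.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN
```

A value close to 1 indicates that the test statistics follow the null distribution in bulk, which is the interpretation given in the text.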

To distinguish whether discovery stage associations were driven by T2D alone or T2D-ESKD, 667 AAs with T2D lacking nephropathy were included in two trait discrimination analyses. First, association was assessed in 2,041 AAs with T2D-ESKD versus 667 AAs with T2D lacking nephropathy. Differences in the effect size between T2D-ESKD versus non-diabetic controls (discovery stage) and T2D-ESKD versus T2D alone (discrimination stage) analyses were tested using Cochran’s Q-test implemented in METAL (Willer et al. 2010). SNPs showing heterogeneity (Cochran’s Q-test P<0.05) and small effects (P>0.05) in the T2D-ESKD versus T2D non-nephropathy analysis, suggesting lack of association with ESKD, were excluded. Second, the remaining SNPs were tested for association between the 667 AA T2D non-nephropathy participants versus the 1,140 AA non-diabetic non-nephropathy controls. SNPs showing nominal association (P<0.05) with T2D were excluded. These analyses removed SNPs that were solely associated with T2D. In secondary discrimination analyses, T2D samples without nephropathy that had T2D durations less than 5 years (N=204) were removed to minimize misclassification.
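The two-step discrimination rule can be illustrated as follows. This is a sketch under assumptions, not the METAL implementation: it computes the standard Cochran's Q statistic for two effect estimates (log odds ratios with standard errors), treats the estimates as independent even though both analyses share the same cases, and encodes the exclusion rules as read from the text. All function names are hypothetical.

```python
import math


def cochran_q_two_estimates(beta1: float, se1: float, beta2: float, se2: float):
    """Cochran's Q and its 1-df P-value for two effect estimates."""
    w1, w2 = 1.0 / se1 ** 2, 1.0 / se2 ** 2
    pooled = (w1 * beta1 + w2 * beta2) / (w1 + w2)
    q = w1 * (beta1 - pooled) ** 2 + w2 * (beta2 - pooled) ** 2
    # With two estimates, Q has 1 degree of freedom under homogeneity.
    p_het = math.erfc(math.sqrt(q / 2.0))
    return q, p_het


def keep_after_discrimination(
    p_het: float, p_stage2a: float, p_stage2b: float, alpha: float = 0.05
) -> bool:
    """Return True when a SNP survives both discrimination steps.

    p_het:     heterogeneity P between discovery and T2D-ESKD vs. T2D-only analyses
    p_stage2a: association P in T2D-ESKD vs. T2D lacking nephropathy
    p_stage2b: association P in T2D lacking nephropathy vs. non-diabetic controls
    """
    lacks_eskd_effect = p_het < alpha and p_stage2a > alpha
    associated_with_t2d = p_stage2b < alpha
    return not lacks_eskd_effect and not associated_with_t2d
```

For two estimates, Q reduces to the squared difference of the estimates divided by the sum of their variances, which gives a quick manual check of the result.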

SNPs remaining after the discovery and discrimination stages were further tested for replication in 483 AA T2D-ESKD cases and 554 AA controls lacking diabetes and nephropathy. A meta-analysis was performed that included the discovery and replication cases and controls (N=4,885) using a fixed-effect inverse variance weighting method implemented in METAL (Willer et al. 2010).
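A fixed-effect inverse variance weighted meta-analysis combines per-study log odds ratios using weights equal to the reciprocal of their squared standard errors. The sketch below is illustrative rather than the METAL implementation: the function names are hypothetical, and because only rounded odds ratios and P-values are available from Table 2, the standard errors are back-calculated under a normal approximation, which is an assumption.

```python
import math
from statistics import NormalDist

_NORM = NormalDist()


def se_from_or_and_p(odds_ratio: float, p_value: float) -> float:
    """Approximate SE of log(OR) from an OR and its two-sided P-value."""
    z = abs(_NORM.inv_cdf(p_value / 2.0))
    return abs(math.log(odds_ratio)) / z


def fixed_effect_meta(betas, ses):
    """Inverse variance weighted pooled estimate, its SE, and two-sided P-value."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    p_value = 2.0 * _NORM.cdf(-abs(beta / se))
    return beta, se, p_value


# Rounded discovery and replication results for rs115912771 (Table 2): (OR, P).
studies = [(0.64, 0.0010), (0.60, 0.060)]
betas = [math.log(odds_ratio) for odds_ratio, _ in studies]
ses = [se_from_or_and_p(odds_ratio, p) for odds_ratio, p in studies]
pooled_beta, pooled_se, pooled_p = fixed_effect_meta(betas, ses)
pooled_or = math.exp(pooled_beta)
```

With these rounded inputs the pooled odds ratio comes out close to the 0.63 reported for this SNP; small differences from the published P-value are expected because of rounding and the back-calculated standard errors.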

Effective number of SNPs calculation. In each candidate region, the effective number of SNPs was calculated from the eigenvalues of the pair-wise SNP correlation matrix created with composite linkage disequilibrium (LD) for SNPs included in the discovery stage using simpleM (Gao et al. 2008). Meta-analysis results were corrected for the effective number of SNPs in each region given their candidacy for associations. A corrected p-value (Pcorr) <0.05 was considered significant.
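The correction step can be sketched as follows, assuming the eigenvalues of the regional SNP correlation matrix have already been computed. Two points are assumptions rather than statements from the text: the 99.5% variance cutoff (the value commonly used with simpleM) and the Bonferroni-style multiplication of the P-value by the effective number of SNPs. Function names are hypothetical.

```python
def effective_number_of_snps(eigenvalues, variance_explained: float = 0.995) -> int:
    """Number of leading eigenvalues needed to reach the given share of total variance."""
    # Small negative eigenvalues from numerical error are clipped to zero.
    eigs = sorted((max(e, 0.0) for e in eigenvalues), reverse=True)
    total = sum(eigs)
    if total <= 0.0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    for count, eig in enumerate(eigs, start=1):
        cumulative += eig
        if cumulative / total >= variance_explained:
            return count
    return len(eigs)


def locus_corrected_p(p_value: float, m_eff: int) -> float:
    """Bonferroni-style correction for m_eff effective tests (assumed form of Pcorr)."""
    return min(1.0, p_value * m_eff)
```

As a usage example with a hypothetical value of 46 effective SNPs (not reported in the text), locus_corrected_p(1.6e-4, 46) gives about 0.0074. Independent SNPs yield as many effective tests as SNPs, whereas SNPs in perfect LD collapse to a single effective test.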

Controlling for the effect of the apolipoprotein L1 gene (APOL1). Two sets of variants in the APOL1 gene are strong predictors of non-diabetic kidney disease in AAs. In order to account for the confounding effect of APOL1, the same single variant analyses were repeated (APOL1-negative model) by removing 1,019 APOL1 renal-risk-variant carriers and those missing APOL1 genotypes from T2D-ESKD, T2D non-nephropathy, and non-diabetic non-nephropathy groups, in both discovery and replication cohorts. Specifically, we removed 425 of 2,041 T2D-ESKD cases, 293 of 1,140 non-diabetic non-nephropathy controls and 131 of 667 T2D non-nephropathy individuals from the discovery cohort. In addition, 100 of 483 T2D-ESKD cases and 70 of 554 non-diabetic non-nephropathy controls were removed from the replication cohort. Individuals were considered APOL1 renal-risk-variant carriers if they possessed two G1 alleles (rs60910145 G allele, rs73885319 G allele), two G2 alleles (6 base pair in-frame deletion), or were compound heterozygotes (one G1 and one G2 allele) (Genovese et al. 2010). The power to detect odds ratio (OR) ≥1.50 for SNPs with an effect allele frequency ≥0.05 at α level =1×10-4 under an additive model fell from 68% to 50% after removing APOL1 risk allele carriers (http://csg.sph.umich.edu//abecasis/cats/).
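The APOL1 exclusion rule reduces to counting G1 and G2 alleles. The sketch below is a minimal illustration with hypothetical function names; it assumes G1 and G2 allele counts (0, 1, or 2) have already been derived per individual, with None marking a missing genotype. It also checks that the removal counts quoted above add up to the stated total.

```python
from typing import Optional


def carries_two_risk_variants(g1_alleles: int, g2_alleles: int) -> bool:
    """True for G1/G1, G2/G2, or G1/G2 compound heterozygotes."""
    return g1_alleles + g2_alleles >= 2


def keep_in_apol1_negative_model(
    g1_alleles: Optional[int], g2_alleles: Optional[int]
) -> bool:
    """Keep only individuals with known APOL1 genotypes and fewer than two risk variants."""
    if g1_alleles is None or g2_alleles is None:
        return False  # missing APOL1 genotypes were also removed
    return not carries_two_risk_variants(g1_alleles, g2_alleles)


# Individuals removed per group, as reported in the text.
removed = {
    "discovery T2D-ESKD cases": 425,
    "discovery non-diabetic controls": 293,
    "discovery T2D non-nephropathy": 131,
    "replication T2D-ESKD cases": 100,
    "replication non-diabetic controls": 70,
}
total_removed = sum(removed.values())  # equals the 1,019 stated in the text
```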


Gene-based association. Sequence kernel association test (SKAT) (Wu et al. 2011) and Madsen-Browning test (MB) (Madsen and Browning 2009), implemented in RAREMETAL (Liu et al. 2014), were used to perform gene-based analyses with GRM included as a random effect. Only loss of function (LOF) and missense variants for each region were aggregated, based on annotations from Variant Effect Predictor (VEP) (McLaren et al. 2010). Candidate regions with only one variant were not tested. Age and gender were included as fixed-effect covariates. A p-value <0.001, corrected for the number of genes tested, was considered statistically significant.
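To make the idea of aggregating variants concrete, the following sketch computes the frequency-based weights and per-individual burden scores of the weighted-sum approach described by Madsen and Browning (2009). It is an illustration only: the function names and input layout are hypothetical, it omits the covariates and GRM random effect used in the study, and it does not reproduce the RAREMETAL implementation or the test statistic itself.

```python
import math


def madsen_browning_weights(genotypes, is_control):
    """Per-variant weights sqrt(n * q * (1 - q)), with q estimated in controls.

    genotypes:  one list per individual of minor-allele counts (0, 1, or 2) per variant
    is_control: one boolean per individual
    """
    n = len(genotypes)
    n_controls = sum(1 for flag in is_control if flag)
    weights = []
    for v in range(len(genotypes[0])):
        minor_in_controls = sum(
            g[v] for g, flag in zip(genotypes, is_control) if flag
        )
        # Adding 1 to the count and 2 to the denominator avoids a zero frequency.
        q = (minor_in_controls + 1) / (2 * n_controls + 2)
        weights.append(math.sqrt(n * q * (1.0 - q)))
    return weights


def burden_scores(genotypes, weights):
    """Weighted sum of minor alleles per individual; rarer variants count more."""
    return [sum(count / w for count, w in zip(g, weights)) for g in genotypes]
```

In the weighted-sum test, these scores are then compared between cases and controls; that step is not shown here.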

Bioinformatic characterization

Expression quantitative trait loci analysis. We identified proxies (r2>0.8) for top associated variants using SNAP (https://www.broadinstitute.org/mpg/snap/ldsearch.php) based on African population and 1000 Genomes Project data. Top variants and their LD proxies were then searched in the publicly available eQTL database GTEx (http://www.gtexportal.org/home/).
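The r2>0.8 proxy criterion can be illustrated with a genotype-based calculation. This sketch is an approximation under stated assumptions: it uses the squared Pearson correlation of genotype counts (a composite LD measure), whereas SNAP reports r2 from phased reference haplotypes, and the function names and input layout are hypothetical.

```python
def genotype_r2(x, y) -> float:
    """Squared Pearson correlation between two vectors of genotype counts (0, 1, 2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var_x * var_y)


def ld_proxies(index_genotypes, candidates, threshold: float = 0.8):
    """Names of candidate SNPs whose r2 with the index SNP exceeds the threshold.

    candidates: dict mapping SNP name to its genotype vector
    """
    return [
        name
        for name, genotypes in candidates.items()
        if genotype_r2(index_genotypes, genotypes) > threshold
    ]
```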

Comprehensive functional annotation. Comprehensive variant annotation was performed using whole genome sequencing annotator (WGSA) through its Amazon Machine Image. WGSA incorporates functional annotation from ANNOVAR (Wang et al. 2010), SnpEff (Cingolani et al. 2012), and VEP (McLaren et al. 2010) using contigs from both RefSeq (Pruitt et al. 2014) and Ensembl (Yates et al. 2016). In addition, it integrates five functional prediction scores, eight conservation scores, four disease-related annotations and regulatory region-centric resources from multiple epigenomics projects.

Results

The impact of 4,588 variants in 47 genes involved in kidney structure was investigated in 5 AA cohorts for associations with T2D-ESKD. Overall, 9 variants (2 missense and 7 intronic, upstream, or downstream of candidate regions) located in 7 distinct regions reached locus-wide significance in either the baseline model or the APOL1-negative model.

Clinical characteristics of study participants. Characteristics of the samples in discovery, discrimination, and replication stages are shown in Table 1. Clinical characteristics of the individuals in the discovery stage were similar to those in the replication stage. The average age at recruitment was greater for individuals with T2D-ESKD and T2D lacking nephropathy, compared to non-diabetic non-nephropathy controls (P<1x10-27 in all stages). However, the average age of non-diabetic controls was greater than the average age at T2D diagnosis in T2D-ESKD cases (P<1x10-9).

Association analysis in discovery stage. The impact of 4,588 variants in 47 genes involved in kidney structure (Supplementary Table 1) was tested in 2,041 T2D-ESKD cases and 1,140 non-diabetic non-nephropathy controls for association with T2D-ESKD using a linear mixed model-based method (stage 1, Figure 1) (Sawcer et al. 2011; Pirinen et al. 2013). Age and gender were included as fixed effects (Baseline model). These analyses did not adjust for the first principal component (PC1) given that the effect of ancestry is accounted for by GRM. A secondary analysis was performed which removed APOL1 two-renal-risk-variant carriers, given their increased risk for non-diabetic kidney disease (APOL1-negative model). For single SNP analysis, only SNPs with MAF≥0.01 were included. The number of SNPs in each gene region ranged from 11 to 205. The coverage for each of the 47 genes was calculated using SNPs with MAF≥0.01 in 1000 Genomes Project phase 1 version 3 AFR samples. The average coverage at r2>0.5 (pairwise LD) was 54.4%, and varied from 27.6% to 92.8%. Of 4,588 SNPs, 212 from 39 candidate regions showed nominal association (P<0.05) in the baseline model; these were further examined in the discrimination stages.


Discrimination analysis. Two analyses were performed to differentiate T2D-ESKD loci identified in the discovery stage as putative ESKD-associated or T2D-associated loci. An independent cohort of 667 AA individuals with T2D lacking nephropathy was compared with T2D-ESKD cases and with non-diabetic non-nephropathy controls from the discovery stage in separate analyses (Table 1). First, we tested for association of the 212 associated SNPs between 2,041 AA T2D-ESKD cases and 667 AAs with T2D lacking nephropathy (stage 2a). Cochran’s Q-test (Willer et al. 2010) was used to test for differences in allelic effects of association results between stage 1 and stage 2a, assuming that SNPs associated with T2D-ESKD have similar effects in both analyses. The smaller sample size in stage 2a reduced power to detect significant associations. Of the 212 SNPs, 169 did not demonstrate significant heterogeneity in effect sizes in stage 1 and stage 2a (Phet >0.05). They were further tested for association with T2D by comparing 667 AA individuals with T2D lacking nephropathy with 1,140 AA non-diabetic non-nephropathy controls (stage 2b). Evidence of association with T2D was observed for 3 SNPs (P<0.05); these were excluded from further analysis. The remaining 166 candidate SNPs from 35 distinct genomic regions were carried to the replication stage. Secondary discrimination analyses excluding T2D patients with <5 years disease duration yielded similar results (data not shown).

Replication and meta-analysis. An independent cohort of 483 AA T2D-ESKD cases and 554 AA non-diabetic non-nephropathy controls with imputed dosage data was used to replicate the 166 SNPs selected from stage 2b (stage 3). All SNPs met QC criteria in the replication cohort. The average imputation quality score (info) was 0.90, with the lowest info score of 0.45. Ninety SNPs from 30 candidate regions showed consistent directions of association with T2D-ESKD. Meta-analysis was performed to combine association results from stage 1 and stage 3 (labeled ‘stage 4’). Three SNPs reached locus-wide significance after correcting for the effective number of SNPs in the respective regions (Pcorr<0.05) (Table 2, Supplementary Table 2). The two most significant locus-wide associations were intronic variants, rs116139597 (OR [95% confidence interval] (OR [95% CI])=1.28 [1.13-1.46], P=1.2x10-4, Pcorr=0.0056) and rs115912771 (OR [95% CI]=0.63 [0.49-0.80], P=1.6x10-4, Pcorr=0.0074), located in the CD2AP region, which overlap with a small portion of the adjacent gene ADGRF2. Rs115912771 is located in intron 11 of CD2AP, while rs116139597 is in intron 2 of ADGRF2, downstream from CD2AP. The third locus-wide association (rs7185763, OR [95% CI]=0.87 [0.81-0.95], P=9.5x10-4, Pcorr=0.036) was a variant located 12kb upstream from the MMP2 gene. Of note, two additional associations approached locus-wide significance, rs6804354 (OR [95% CI]=1.19 [1.07-1.32], P=1.2x10-3, Pcorr=0.055) located 18kb upstream of the NPHP3-ACAD11 read-through region and rs9927174 (OR [95% CI]=0.88 [0.81-0.94], P=5.5x10-4, Pcorr=0.060) located 13kb downstream from SLC12A3 (Supplementary Table 3).

Association analyses excluding APOL1 renal-risk carriers. The same series of single variant analyses was performed under the APOL1-negative model after removing APOL1 renal-risk genotype carriers. A total of 70 SNPs showed evidence of association with T2D-ESKD (P<0.05) and no evidence of association with T2D (stage 1 through stage 3). Meta-analysis of T2D-ESKD cases and non-nephropathy controls from the discovery and replication stages (stage 4) revealed 6 additional SNPs demonstrating locus-wide significance (Table 3, Supplementary Table 4). These included two missense variants, located at CLDN8 (rs55884670, M97T, OR [95% CI]=1.35 [1.12-1.64], P=1.89x10-3, Pcorr=0.021) and COL4A3 (rs34505188, R408H, OR [95% CI]=1.55 [1.22-1.97], P=4.75x10-4, Pcorr=0.029), respectively. Two variants in perfect LD (r2=1.0), rs74504809 (OR [95% CI]=0.80 [0.70-0.91], P=8.71x10-4, Pcorr=0.040) and rs78174962 (OR [95% CI]=0.79 [0.69-0.91], P=1.0x10-3, Pcorr=0.048) from the NPHP3-ACAD11 read-through region also demonstrated locus-wide significance, but are located at introns of ACAD11. After conditioning on rs78174962, rs74504809 did not demonstrate association (P>0.05), suggesting they comprise a single signal. In addition, a locus-wide significant signal (rs6742727, OR [95% CI]=0.81 [0.72-0.92], P=7.99x10-4, Pcorr=0.023) was observed at an RNA gene AC009495.3, located upstream of TTC21B. Finally, an intronic variant at ARHGAP24 (rs10433935, OR [95% CI]=0.62 [0.48-0.82], P=5.43x10-4, Pcorr=0.041) demonstrated locus-wide significant association with T2D-ESKD. Two additional associations approached locus-wide significance: rs72654165 (OR [95% CI]=2.06 [1.37-3.10], P=5.3x10-4, Pcorr=0.081) located in the second intron of COL4A1 and a missense variant rs57737815 (P555R, OR [95% CI]=0.70 [0.55-0.89], P=3.2x10-3, Pcorr=0.060) located in exon 7 of WNK4 (Supplementary Table 3). All three locus-wide significant SNPs from the baseline model were no longer significant in the APOL1-negative model (Pcorr>0.05), despite similar effect sizes in the two models. This suggests that the attenuation of association may be due to decreased sample size with loss of power (Supplementary Table 5).

Gene-based analysis. To improve statistical power, gene-based analysis was performed to aggregate the effects of functional variants within each candidate region using samples from the discovery and discrimination stages. Only missense and loss of function (LOF) variants were considered; the latter are predicted to cause splice site changes, frameshifts or stop codon gains (McLaren et al. 2010). Initially, 2,041 AA T2D-ESKD cases and 1,140 AA non-diabetic non-nephropathy controls were examined in the discovery stage, followed by testing 2,041 AA T2D-ESKD cases and 667 AAs with T2D lacking nephropathy in the discrimination stage under models 1 and 2. Analyses were performed for 39 genes with variant counts ≥2. After excluding genes showing association with T2D alone, no gene reached the defined significance level (P<0.001, correction for 39 comparisons). However, nominal associations were observed at EMP2 (variant count=2, P=0.0065), COL4A3 (variant count=12, P=0.0044), WNK4 (variant count=8, P=0.0081), and CLDN14 (variant count=2, P=0.0026) under the APOL1-negative model (Table 4).


Bioinformatic characterization. We applied expression quantitative trait loci (eQTL) analysis and a functional annotation tool to uncover functional relevance underlying the identified associations. In the eQTL analysis, locus-wide significant SNPs from stage 4 and their proxies were queried in the publicly available eQTL database GTEx. The intronic SNP rs10433935 in ARHGAP24 was strongly associated with expression level of ARHGAP24 in artery-tibial tissue (P=0.000011), indicating its regulatory role in this gene. In addition, rs6742727 and rs7185763 were associated with transcript abundance of nearby genes GALNT3 and AC00949.2 at the TTC21B region, and IRX6 at the MMP2 region, respectively (Supplementary Table 6).

We used the Amazon Compute Cloud version of WGSA (Liu et al. 2016) to annotate T2D-ESKD associated SNPs. Three SNPs (rs55884670, M97T at CLDN8; rs34505188, R408H at COL4A3; and rs57737815, P555R at WNK4) were annotated as missense variants by VEP (McLaren et al. 2010), ANNOVAR (Wang et al. 2010) and SnpEff (Cingolani et al. 2012). In addition, rs34505188 and rs57737815 were predicted to be probably damaging by PolyPhen2 (Adzhubei et al. 2010). All three missense SNPs, along with rs10433935 from ARHGAP24, were located within regions highly conserved across mammals according to their high GERP++ rejected substitutions scores (Davydov et al. 2010). Two recently developed functional prediction scores, CADD (Kircher et al. 2014) and DANN (Quang et al. 2015), were evaluated for the significantly associated SNPs. Both methods predict potential pathogenicity of genetic variants by integrating a wide range of annotations into a single metric using machine learning methods (Kircher et al. 2014; Quang et al. 2015). Again, rs34505188, rs57737815 and rs10433935 at COL4A3, WNK4 and ARHGAP24 loci, respectively, were predicted within the top pathogenic variants (0.4% to 8%) in the human genome. In addition, rs55884670 (CLDN8) was ranked within the top 16% and 28% pathogenic variants in the human genome by CADD and DANN, respectively. For regulatory region-centric annotations, rs57737815 at WNK4 overlapped with a transcription factor binding site (TFBS) of POLR2A in HepG2 cells, as well as a DNase I hypersensitive site detected in two cell types. In addition, rs10433935 from ARHGAP24 overlaps with a general developmental enhancer with <5% false discovery rate (FDR) (Erwin et al. 2014). Full results are shown in Supplementary Table 7.

Discussion

The contribution of genetic variants in 47 kidney structure-related genes to T2D-ESKD was evaluated in 4,885 AAs with high density genotype data which included coding and non-coding variants. Evidence of association with T2D-ESKD was observed for seven loci including CD2AP, MMP2, TTC21B, COL4A3, NPHP3-ACAD11, CLDN8 and ARHGAP24 in single variant analysis of either baseline or APOL1-negative models. The associated variants at COL4A3 and CLDN8 were missense mutations. All associated SNPs achieved locus-wide significance in the meta-analysis of T2D-ESKD cases and non-diabetic non-nephropathy controls and showed no evidence of association with T2D. Gene-based and multiple bioinformatic analyses supported potential cumulative effect and functional relevance at associated loci.

The most significant association was observed for two SNPs located at CD2AP in baseline model analysis. CD2-associated protein (CD2AP) is an adapter protein between membrane proteins and the actin cytoskeleton (Kirsch et al. 1999). CD2AP is highly expressed in glomeruli and has been associated with podocyte injury and focal segmental glomerulosclerosis (FSGS) (Shih et al. 1999; Löwik et al. 2007). Other variants in CD2AP demonstrated suggestive associations with ESKD in Europeans with T1D (Hyvönen et al. 2013). Together, this and other studies suggest CD2AP variants are associated with both non-diabetic and diabetic (T1D and T2D) kidney disease.

The SNP rs7185763 located 12kb upstream to matrix metallopeptidase 2 (MMP2) also achieved locus-wide significance in baseline model analysis. Metallopeptidase 2 is a member of the MMP family and is involved in the degradation and turnover of extracellular matrix (ECM). Decreased expression of matrix metallopeptidase 2 in glomeruli was observed in patients with DKD, suggesting that lower expression level of MMP2 is associated with mesangial matrix expansion and T2D-ESKD (Prete et al. 1997; Mason and Wahab 2003).

Analyses considering the impact of APOL1 renal-risk genotypes revealed further insights into genetic susceptibility to T2D-ESKD. In APOL1-negative model analysis that removed APOL1 risk genotype carriers, a low frequency missense variant R408H (rs34505188) in type IV collagen (COL4A3) was associated with T2D-ESKD with the frequency of risk allele A being 0.033. R408H has a lower frequency in African populations (0.028), compared to European (0.069) and East Asian (0.16) populations based on the 1000 Genomes Project data. The alpha 3 chain in type IV collagen forms a heterotrimer with alpha 4 and alpha 5 chains, forming the major component of the GBM; it may be synthesized by podocytes (Miner 2011). Mutations in COL4A3 are known to cause autosomal forms of Alport Syndrome (AS) and are associated with FSGS and chronic kidney disease in patients with thin basement membrane nephropathy (Kashtan 1995; Voskarides et al. 2007). Similar to the other five alpha chains in type IV collagen, alpha 3 contains a short amino-terminal domain, a long collagenous central domain consisting of Gly-X-Y repeats, and a non-collagenous domain. Replacement of a glycine in the Gly-X-Y repeat region is the most common mutation type in AS. These missense variants may induce kinks in folding with aberrant formation of the collagen triple helix (Prockop 1992). R408H is located in the Gly-X-Y repeat region and a prior study suggested this coding variant was not pathogenic for AS (Heidet et al. 2001). However, R408H is located in a highly conserved region, and was predicted within the top 1% of potentially pathogenic variants in the human genome by both CADD and DANN analyses. Furthermore, COL4A3 trended toward association with T2D-ESKD in the gene-based analysis, with the most significant association observed for a rare missense mutation rs55816283 (P1109S, P=0.00096, effect allele frequency [EAF] =0.0008). Taken together, multiple lines of evidence support that COL4A3 variants may be involved in ESKD susceptibility in AAs with T2D.

Another missense variant, M97T (rs55884670) in the claudin 8 gene (CLDN8) demonstrated locus-wide association with T2D-ESKD in AAs (risk G allele frequency = 0.052). This variant was not observed in non-African populations based on the 1000 Genomes Project data. Claudins are integral membrane components of tight junctions that support pore and barrier functions of paracellular pathways in epithelial cells. They confer selective paracellular permeability in different segments of the renal tubule (Yu 2015). Claudin 8 is expressed in the thin descending limb of the loop of Henle, the distal tubule, and the collecting duct. It is required for the formation of the paracellular renal chloride channel by interacting with claudin 4 (Kiuchi-Saishin et al. 2002; Li et al. 2004; Hou et al. 2010). Other claudins have been associated with DKD (Gaut et al. 2014; Molina-Jijón et al. 2014); this is the first study to report association of a CLDN8 gene variant with T2D-ESKD. Interestingly, another claudin family member CLDN14 also demonstrated a trend toward association with T2D-ESKD in the gene-based analysis. Given the vital role of claudins in the renal tubule, variants in claudin genes may impact progression of kidney disease under the stress of hyperglycemia.

Two locus-wide significant SNPs (rs74504809, rs78174962) in complete LD located in the NPHP3-ACAD11 region were associated with T2D-ESKD in the APOL1-negative model. NPHP3 is involved in ciliogenesis and possibly renal tubular development (Zhou et al. 2010). Mutations in NPHP3 cause type 3 nephronophthisis, a relatively common cause of kidney failure in children and young adults (Olbrich et al. 2003). Both variants were located outside NPHP3; their potential regulatory role in NPHP3 gene function remains to be determined.

The SNP rs10433935 located at ARHGAP24, which encodes Rho GTPase activating protein 24, was also associated with T2D-ESKD. Multiple lines of evidence, including eQTL analysis and WGSA annotation, suggest that rs10433935 overlaps with a highly functional region in ARHGAP24. Rho GTPase-activating protein 24 is involved in cell polarity, cell morphology, and cytoskeletal organization (Katoh and Katoh 2004). It plays a role in podocyte differentiation and contributes to the balance of podocyte Rho A and Rac1 signaling (Akilesh et al. 2011). Furthermore, a mutation in this gene has been associated with familial FSGS (Akilesh et al. 2011).

Collectively, these results suggest that variation at glomerular and tubulointerstitial-related structural genes may contribute to T2D-ESKD susceptibility in AAs. DKD is a complex disease with multiple environmental risk factors and polygenic inheritance. Microvascular changes, for example, abnormal capillary permeability and retinopathy, may precede early structural glomerular changes in patients with diabetes. Thus, variations in structural genes may be manifest specifically in the kidneys, despite possible altered expression in other organs (Alpert et al. 1972; Girach and Vignati 2006). It is noteworthy that mutations in CD2AP, COL4A3, and ARHGAP24 are known to cause the podocyte disorder FSGS (Pollak 2014). The present findings suggest that structural kidney genetic variants may contribute to both DKD and FSGS.

Alternatively, some T2D-ESKD cases might have been misclassified and had proteinuric FSGS or GBM diseases with coincident T2D. Most cases lacked a kidney biopsy; this limitation is present in virtually all large-scale genetic studies of T2D-ESKD. We carefully attempted to minimize effects of misclassification by removing patients with T2D who had other causes of kidney disease and by removing APOL1 risk genotype carriers who likely had non-diabetic kidney diseases (APOL1-negative model). The latter should lead to enrichment for true DKD cases, and additional loci may be discovered as these samples were less heterogeneous despite the smaller sample size. A similar approach led to the identification of FRMD3 association with DKD in non-APOL1 carriers (Freedman et al. 2011). Renal tubulointerstitial fibrosis and ECM accumulation have great impact on progression of DKD, as well as on other etiologies of nephropathy. Our results are consistent with variation in tubulointerstitium-related structural genes such as MMP2, NPHP3, TTC21B, CLDN8, CLDN14 and WNK4 contributing to interstitial injury and fibrosis in patients susceptible to DKD.

This study has limitations. As stated, it is difficult to fully exclude all cases misclassified as DKD due to the frequent lack of kidney biopsy material. However, removing patients with ESKD attributed to non-diabetic etiologies, as well as excluding APOL1 renal-risk allele carriers, should minimize misclassification despite considerable reductions in power. An additional limitation is that many other kidney structure-related genes, for instance SLC5A2, which plays a role in tubuloglomerular feedback and hyperfiltration injury (Škrtić and Cherney 2015), could not be included due to the gene selection criteria. Higher density genetic data remain necessary in order to thoroughly examine these genes. Moreover, we lack prospective follow-up data to investigate genetic contributions to the progression of kidney disease in diabetic patients; this could be a powerful analysis to understand the dynamics of DKD. Finally, we did not adjust for several well-established DKD risk factors in our analyses, including blood pressure, degree of albuminuria, body mass index, HbA1c or medication usage, because these change frequently over time as disease progresses.

Conclusion

This is the first study to comprehensively survey the genetic effects of kidney structure-related loci in T2D-ESKD susceptibility in the AA population. Using high density genetic data and a multi-stage study design, strong genetic associations were identified between several kidney structure genes and T2D-ESKD risk. T2D-ESKD associated variants at COL4A3, CLDN8, and ARHGAP24 were potentially pathogenic. Future functional studies are warranted to elucidate the potential mechanisms underlying the effect of these newly identified genetic associations.


Acknowledgements

This work was supported by NIH grants R01 DK53591 (DWB), DK070941 and DK084149 (BIF), and by the National Natural Science Foundation of China (No. 81200488). This work has also been made possible through an International Society of Nephrology Fellowship and Shanghai Jiaotong University K.C. Wong Medical Fellowship Fund (Jun Ma). We acknowledge the contributions of the study participants, coordinators, physicians, staff and laboratory.

Compliance with Ethical Standards

Disclosure of potential conflicts of interest

The authors declare that they have no conflict of interest.

Research involving Human Participants and/or Animals

Informed consent: All participants provided written informed consent.


Figure 1: Workflow of kidney structure-related gene analyses. Abbreviations: T2D, type 2 diabetes; ESKD, end-stage kidney disease; MAF, minor allele frequency


Table 1. Clinical characteristics of study cohorts

Columns: (1) Discovery stage, T2D-ESKD cases; (2) Discovery stage, non-diabetic, non-nephropathy controls; (3) Discrimination stage, T2D lacking nephropathy; (4) Replication stage, T2D-ESKD cases; (5) Replication stage, non-diabetic, non-nephropathy controls.

Characteristic                                        (1)          (2)          (3)          (4)          (5)
N                                                     2,041        1,140        667          483          554
Female (%)                                            57.2         51.7         64.5         62           55.4
Age (years)                                           61.4±10.8    46.5±12.0    55.7±11.6    64.5±8.6     49.5±11.1
Age at T2D onset (years)                              38.6±12.7    -            46.2±12.2    45.4±9.8     -
Duration of T2D for T2D lacking nephropathy cohort    -            -            9.6±9.1      -            -
Duration of T2D prior to ESKD (years)                 19.6±9.9     -            16.1±9.0     -            -
Duration of ESKD (years)                              3.6±3.6      -            -            3.2±3.7      -
Fasting serum glucose (mg/dl)                         -            96.5±20.9    -            -            85.1±10.0
Serum creatinine (mg/dl)                              -            0.97±0.2     0.94±0.2     -            0.95±0.2
Body Mass Index (kg/m2)                               30.8±7.1     29.7±7.4     33.1±7.8     29.3±6.5     28.4±4.7

Categorical data expressed as percentage; continuous data as mean ± SD. Abbreviations: T2D, type 2 diabetes; ESKD, end-stage kidney disease; N, number


Table 2. T2D-ESKD associated SNPs in meta-analysis from discovery and replication cohorts (Baseline model)

Discovery and replication columns both compare T2D-ESKD cases with non-diabetic, non-nephropathy controls.

| Candidate region | SNP | Annotation | Chr | Position | EA | Discovery N case/control | Discovery EAF case/control | Discovery OR | Discovery P | Replication N case/control | Replication EAF | Replication OR | Replication P | Meta-analysis EAF | Meta-analysis OR (95% CI) | Meta-analysis P | Pcorr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CD2AP | rs115912771 | Intronic | 6 | 47551182 | A | 2040/1138 | 0.019/0.032 | 0.64 | 0.0010 | 483/554 | 0.027 | 0.60 | 0.060 | 0.024 | 0.63 (0.49,0.8) | 1.6E-4 | 0.0074 |
| CD2AP | rs116139597 | Downstream | 6 | 47633775 | C | 2038/1136 | 0.11/0.080 | 1.28 | 0.0013 | 483/554 | 0.096 | 1.30 | 0.034 | 0.099 | 1.28 (1.13,1.46) | 1.2E-4 | 0.0056 |
| MMP2 | rs7185763 | Upstream | 16 | 55411450 | C | 2034/1136 | 0.28/0.33 | 0.87 | 0.0042 | 483/554 | 0.30 | 0.88 | 0.098 | 0.30 | 0.87 (0.81,0.95) | 9.5E-4 | 0.036 |

Abbreviations: SNP, single nucleotide polymorphism; T2D, type 2 diabetes; ESKD, end-stage kidney disease; Chr, chromosome; EA, effect allele; N, number; EAF, effect allele frequency; OR, odds ratio; P, p value; CI, confidence interval; Pcorr, p value adjusted for multiple comparison.
Baseline model: adjusted for age and sex, APOL1 risk genotype carriers included.


Table 3. Additional associations in combined analysis after removing APOL1 renal-risk genotype carriers (APOL1-negative model)

Discovery and replication columns both compare T2D-ESKD cases with non-diabetic, non-nephropathy controls.

| Candidate region | SNP | Annotation | Chr | Position | EA | Discovery N case/control | Discovery EAF case/control | Discovery OR | Discovery P | Replication N case/control | Replication EAF | Replication OR | Replication P | Meta-analysis EAF | Meta-analysis OR (95% CI) | Meta-analysis P | Pcorr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TTC21B | rs6742727 | Intronic | 2 | 166692686 | T | 1583/828 | 0.11/0.15 | 0.76 | 0.00025 | 383/484 | 0.13 | 0.94 | 0.62 | 0.13 | 0.81 (0.72,0.916) | 8.0E-4 | 0.023 |
| COL4A3 | rs34505188 | Missense | 2 | 228128568 | A | 1613/846 | 0.035/0.024 | 1.57 | 0.0035 | 383/484 | 0.037 | 1.50 | 0.054 | 0.033 | 1.55 (1.21, 1.97) | 4.7E-4 | 0.029 |
| NPHP3-ACAD11 | rs78174962 | Intronic | 3 | 132306663 | A | 1612/847 | 0.086/0.11 | 0.80 | 0.0058 | 383/484 | 0.094 | 0.78 | 0.075 | 0.095 | 0.79 (0.69, 0.91) | 1.0E-3 | 0.048 |
| NPHP3-ACAD11 | rs74504809 | Intronic | 3 | 132366578 | G | 1613/846 | 0.099/0.12 | 0.81 | 0.0084 | 383/484 | 0.10 | 0.75 | 0.035 | 0.11 | 0.80 (0.70, 0.91) | 8.7E-4 | 0.040 |
| ARHGAP24 | rs10433935 | Intronic | 4 | 86597245 | C | 1608/843 | 0.020/0.029 | 0.63 | 0.003 | 383/484 | 0.031 | 0.62 | 0.077 | 0.025 | 0.62 (0.48, 0.82) | 5.4E-4 | 0.041 |
| CLDN8 | rs55884670 | Missense | 21 | 31587954 | G | 1612/845 | 0.060/0.043 | 1.29 | 0.024 | 383/484 | 0.048 | 1.56 | 0.022 | 0.052 | 1.35 (1.12, 1.64) | 1.9E-3 | 0.021 |

Abbreviations: SNP, single nucleotide polymorphism; T2D, type 2 diabetes; ESKD, end-stage kidney disease; Chr, chromosome; EA, effect allele; N, number; EAF, effect allele frequency; OR, odds ratio; P, p value; CI, confidence interval; Pcorr, p value adjusted for multiple comparison.
APOL1-negative model: adjusted for age and sex, APOL1 risk carriers removed.


Table 4. Top associations from the gene-based analysis in APOL1-negative model

The first five result columns refer to T2D-ESKD vs. non-diabetic, non-nephropathy controls (N=2,463); the last two refer to T2D lacking nephropathy vs. non-T2D (N=1,383).

| Gene | Method | Number of SNPs | P | Best SNP | P (Best SNP) | EAF (Best SNP) | Number of SNPs (T2D lacking nephropathy vs. non-T2D) | P (T2D lacking nephropathy vs. non-T2D) |
|---|---|---|---|---|---|---|---|---|
| EMP2 | SKAT | 2 | 0.0065 | rs73503834 | 0.0064 | 0.0333 | 2 | 0.93 |
| COL4A3 | SKAT | 12 | 0.0044 | rs55816283 | 0.00096 | 0.000813 | 12 | 0.56 |
| WNK4 | SKAT | 8 | 0.0081 | rs57737815 | 0.005 | 0.0279 | 8 | 0.64 |
| CLDN14 | Burden | 2 | 0.0026 | rs140918123 | 0.032 | 0.00406 | 2 | 0.78 |

Abbreviations: T2D, type 2 diabetes; ESKD, end-stage kidney disease; P, p value; EAF, effect allele frequency; SKAT, sequence kernel association test; Burden, Madsen-Browning test; Best SNP, the most significant SNP within each region.
APOL1-negative model: adjusted for age and sex, APOL1 risk genotype carriers excluded.


Supplementary Methods

Sample inclusion criteria. Patients were considered to have T2D-ESKD (or severe forms of nephropathy likely to progress to ESKD) when diabetes was diagnosed after age 25 and was present for at least 5 years prior to the onset of ESKD (or cases had diabetic retinopathy to ensure adequate T2D durations), and one or more of the following: renal replacement therapy (N=1,879), estimated glomerular filtration rate (eGFR) <30 ml/min/1.73 m2 (N=50), or urine albumin:creatinine ratio (UACR) >300 mg/g (N=94) if eGFR is missing. Patients with ESKD attributed to surgical or urologic causes, polycystic kidney disease, autoimmune disease, hepatitis, IgA nephropathy, membranous glomerulonephritis, membranoproliferative glomerulonephritis, or monogenic kidney diseases were excluded. Controls without diabetes or kidney disease (eGFR >60 ml/min/1.73 m2 and UACR <30 mg/g) were recruited from the community and internal medicine clinics at Wake Forest School of Medicine (WFSM). T2D was diagnosed in individuals who did not receive insulin treatment alone since diagnosis. Participants with T2D lacking kidney disease had eGFR >60 ml/min/1.73 m2 and UACR <30 mg/g and were recruited from medical clinics and community resources.

Design, genotyping and quality control of Axiom custom array. This array includes standard exome chip content of approximately 264K coding SNPs and insertions/deletions (indels), 70K loss-of-function variants, 2K pharmacogenomic variants, 23K eQTL markers, and 246K multi-ethnic population-based genome-wide tag markers. The custom content of the array includes ~115K variants containing additional genome-wide tag SNPs for the Yoruba (YRI) population in HapMap, SNPs associated with kidney function, T2D, glycemia, or adiposity from the literature, and haplotype-tagging SNPs in biological candidate genes/pathways and GWAS loci for fine mapping.

DNA from cases and controls was equally interleaved on 96-well plates to minimize artifactual errors during sample processing. A total of 81 samples sequenced as part of the 1000 Genomes Project (http://www.1000genomes.org/) at the Coriell Institute for Medical Research were included in genotyping and had a concordance rate of 99.2%. Genotype calling was performed using Affymetrix Power Tools (APT). A total of 724,530 SNPs were successfully called for downstream quality control (QC) and analyses. QC was performed using PLINK (http://pngu.mgh.harvard.edu/~purcell/plink/), unless otherwise specified. SNPs with call rates <95%, departure from Hardy-Weinberg equilibrium (HWE) (P<0.0001), and monomorphic SNPs were removed. Samples with call rates <95% were removed from downstream analyses. Cryptic relatedness was assessed by identity-by-descent (IBD) analysis. Duplicate samples were identified, and one of each duplicate pair was removed. Samples with gender discordance between genetic estimation and self-reported gender were removed. Samples with negative inbreeding coefficients (<3 SD from mean) suggesting DNA contamination were removed. Principal components (PC) analysis was performed using EIGENSOFT (https://data.broadinstitute.org/alkesgroup/EIGENSOFT/) by combining these samples with those from the 1000 Genomes EUR, AFR, and ASN populations. Samples with only European ancestry or ancestry other than European and African were removed. The first 10 PCs were calculated in the study samples. PC1 was associated with African-European ancestry and no additional population substructure was observed.
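
The SNP-level filters described above (call rate, HWE, monomorphic markers) can be illustrated with a short sketch. This is a minimal Python illustration under stated assumptions, not the PLINK implementation used in the study: genotypes are assumed to be coded as 0, 1, or 2 copies of one allele with None marking a missing call, all function names are hypothetical, and the HWE check uses a 1-df chi-square goodness-of-fit approximation, whereas PLINK applies its own HWE test.

```python
import math

def call_rate(genotypes):
    """Fraction of non-missing calls; None marks a missing genotype."""
    return sum(g is not None for g in genotypes) / len(genotypes)

def hwe_chi2_pvalue(n_hom_a, n_het, n_hom_b):
    """P value of a 1-df chi-square test of Hardy-Weinberg proportions."""
    n = n_hom_a + n_het + n_hom_b
    p = (2 * n_hom_a + n_het) / (2 * n)  # frequency of allele a
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_a, n_het, n_hom_b)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))

def passes_snp_qc(genotypes, min_call_rate=0.95, hwe_threshold=1e-4):
    """Apply the three SNP filters named in the text to one marker."""
    if call_rate(genotypes) < min_call_rate:
        return False
    called = [g for g in genotypes if g is not None]
    n_hom_a, n_het, n_hom_b = (called.count(k) for k in (0, 1, 2))
    # Monomorphic: only one allele is observed among the called genotypes.
    if n_het == 0 and (n_hom_a == 0 or n_hom_b == 0):
        return False
    return hwe_chi2_pvalue(n_hom_a, n_het, n_hom_b) >= hwe_threshold
```

The thresholds default to the values quoted in the text (call rate 95%, HWE P of 0.0001); sample-level filters such as relatedness, sex discordance, and ancestry outliers are not covered by this sketch.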


Supplementary Table 1. Kidney structure-related genes

| Gene | Location* | Gene Name | Related Kidney Structure |
|---|---|---|---|
| NPHS2 | 1:179519674-179545087 | Podocin | slit diaphragm |
| NPHS1 | 19:36316274-36360189 | Nephrin | slit diaphragm |
| TTC21B | 2:166713985-166810353 | tetratricopeptide repeat domain 21B | Podocytes |
| ARHGAP24 | 4:86396267-86923823 | Rho GTPase activating protein 24 | Podocytes |
| CD2AP | 6:47445525-47594999 | CD2-associated protein | Podocytes |
| ANLN | 7:36429415-36493400 | anillin, actin binding protein | Podocytes |
| PLCE1 | 10:95753746-96092580 | phospholipase C, epsilon 1 | Podocytes |
| WT1 | 11:32409321-32457176 | Wilms tumor 1 | Podocytes |
| TRPC6 | 11:101322295-101743293 | transient receptor potential cation channel, subfamily C, member 6 | Podocytes |
| INF2 | 14:105155943-105185947 | inverted formin, FH2 and WH2 domain containing | Podocytes |
| MYO1E | 15:59427113-59665099 | myosin IE | Podocytes |
| EMP2 | 16:10622279-10674555 | epithelial membrane protein 2 | Podocytes |
| DGKE | 17:54911460-54946036 | diacylglycerol kinase, epsilon 64kDa | Podocytes |
| ACTN4 | 19:39138267-39222223 | actinin, alpha 4 | Podocytes |
| COL4A4 | 2:227867427-228029275 | collagen, type IV, alpha 4 | GBM |
| COL4A3 | 2:228029281-228179508 | collagen, type IV, alpha 3 | GBM |
| LAMB2 | 3:49158547-49170599 | laminin, beta 2 (laminin S) | GBM |
| LMX1B | 9:129376722-129463311 | LIM homeobox transcription factor 1, beta | GBM |
| COL4A1 | 13:110801310-110959496 | collagen, type IV, alpha 1 | GBM |
| COL4A2 | 13:110958159-111165374 | collagen, type IV, alpha 2 | GBM |
| CFH | 1:196621008-196716634 | complement factor H | mesangial cell and matrix |
| MMP2 | 16:55423612-55540603 | matrix metallopeptidase 2 | mesangial cell and matrix |
| ITGA3 | 17:48133332-48167849 | integrin, alpha 3 | mesangial cell and matrix |
| MMP9 | 20:44637547-44645200 | matrix metallopeptidase 9 | mesangial cell and matrix |
| TIMP3 | 22:33196802-33259030 | TIMP metallopeptidase inhibitor 3 | mesangial cell and matrix |
| NPHP4 | 1:5922870-6052533 | nephronophthisis 4 | Tubulointerstitium |
| REN | 1:204123944-204135465 | Renin | Tubulointerstitium |
| ATP6V1B1 | 2:71162998-71192561 | ATPase, H+ transporting, lysosomal 56/58kDa, V1 subunit B1 | Tubulointerstitium |
| NPHP1 | 2:110879888-110962643 | nephronophthisis 1 | Tubulointerstitium |
| NPHP3 | 3:132276986-132441303 | nephronophthisis 3 | Tubulointerstitium |
| CLDN18 | 3:137717577-137752494 | claudin 18 | Tubulointerstitium |
| CLDN11 | 3:170136653-170578169 | claudin 11 | Tubulointerstitium |
| CLDN1 | 3:190023490-190040264 | claudin 1 | Tubulointerstitium |
| CLDN16 | 3:190040330-190129932 | claudin 16 | Tubulointerstitium |
| SLC2A9 | 4:9772777-10056560 | solute carrier family 2, member 9 | Tubulointerstitium |
| SLC12A7 | 5:1050489-1112172 | solute carrier family 12, member 7 | Tubulointerstitium |
| CLDN20 | 6:155585147-155597682 | claudin 20 | Tubulointerstitium |
| CLDN12 | 7:90013035-90142716 | claudin 12 | Tubulointerstitium |
| ATP6V0A4 | 7:138391039-138484305 | ATPase, H+ transporting, lysosomal V0 subunit a4 | Tubulointerstitium |
| CLDN10 | 13:96085853-96232013 | claudin 10 | Tubulointerstitium |
| UMOD | 16:20344373-20367623 | Uromodulin | Tubulointerstitium |
| SLC12A3 | 16:56899119-56949762 | solute carrier family 12, member 3 | Tubulointerstitium |
| HNF1B | 17:36046434-36105237 | HNF1 homeobox B | Tubulointerstitium |
| WNK4 | 17:40932649-40949084 | WNK lysine deficient protein kinase 4 | Tubulointerstitium |
| SLC7A9 | 19:33311415-33370672 | solute carrier family 7, member 9 | Tubulointerstitium |
| CLDN8 | 21:31586324-31588469 | claudin 8 | Tubulointerstitium |
| CLDN14 | 21:37832919-37948867 | claudin 14 | Tubulointerstitium |

*NCBI Build 37. GBM, glomerular basement membrane


Supplementary Table 2. Differentiation steps for T2D-ESKD associated SNPs in Baseline model

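| Candidate region | Marker | EA | Discovery (T2D-ESKD cases vs. non-diabetic, non-nephropathy controls): N case/control | Discovery: EAF case/control | Discovery: OR | Discovery: P | Discrimination (T2D-ESKD cases vs. T2D lacking nephropathy): N case/control | Discrimination (T2D-ESKD vs. T2D lacking nephropathy): EAF case/control | Discrimination (T2D-ESKD vs. T2D lacking nephropathy): OR | Discrimination (T2D-ESKD vs. T2D lacking nephropathy): P | Cochran's Q test: P | Cochran's Q test: Direction | Discrimination (T2D lacking nephropathy vs. non-T2D): N case/control | Discrimination (T2D lacking nephropathy vs. non-T2D): EAF case/control | Discrimination (T2D lacking nephropathy vs. non-T2D): OR | Discrimination (T2D lacking nephropathy vs. non-T2D): P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CD2AP | rs115912771 | A | 2040/1138 | 0.019/0.032 | 0.64 | 0.0010 | 2040/664 | 0.019/0.023 | 0.74 | 0.14 | 0.54 | -- | 664/1138 | 0.023/0.032 | 0.78 | 0.20 |
| CD2AP | rs116139597 | C | 2038/1136 | 0.110/0.080 | 1.28 | 0.0013 | 2038/667 | 0.110/0.102 | 1.11 | 0.29 | 0.28 | ++ | 667/1136 | 0.103/0.080 | 1.23 | 0.06 |
| MMP2 | rs7185763 | C | 2034/1136 | 0.283/0.326 | 0.87 | 0.0042 | 2034/664 | 0.283/0.310 | 0.88 | 0.049 | 0.95 | -- | 664/1136 | 0.310/0.326 | 0.99 | 0.85 |

Abbreviations: T2D, type 2 diabetes; ESKD, end-stage kidney disease; CTRL, control; EA, effect allele; N, number; EAF, effect allele frequency; OR, odds ratio; P, p value.
Baseline model: adjusted for age and sex, APOL1 risk carriers included.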


Supplementary Table 3. Associations approaching locus-wide significance in Baseline model or APOL1-negative model

Discovery and replication columns both compare T2D-ESKD cases with non-diabetic, non-nephropathy controls.

| Model | Candidate region | SNP | Annotation | Chr | Position | EA | Discovery N case/control | Discovery EAF case/control | Discovery OR | Discovery P | Replication N case/control | Replication OR | Replication P | Meta-analysis OR (95% CI) | Meta-analysis P | Pcorr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | NPHP3-ACAD11 | rs6804354 | Upstream | 3 | 132258639 | G | 2033/1129 | 0.16/0.15 | 1.21 | 0.0029 | 483/554 | 1.14 | 0.17 | 1.19 (1.07,1.32) | 0.0012 | 0.055 |
| Baseline | SLC12A3 | rs9927174 | Downstream | 16 | 56962630 | T | 2031/1131 | 0.38/0.42 | 0.90 | 0.020 | 483/554 | 0.81 | 0.0048 | 0.88 (0.81,0.94) | 0.00055 | 0.060 |
| APOL1-negative | COL4A1 | rs72654165 | Intronic | 13 | 110887334 | A | 1613/844 | 0.016/0.011 | 1.89 | 0.0071 | 383/484 | 2.86 | 0.020 | 2.06 (1.37,3.10) | 0.00053 | 0.081 |
| APOL1-negative | WNK4 | rs57737815 | Missense | 17 | 40939483 | G | 1609/844 | 0.024/0.035 | 0.67 | 0.0048 | 383/484 | 0.79 | 0.30 | 0.70 (0.55,0.89) | 0.0032 | 0.084 |

Abbreviations: SNP, single nucleotide polymorphism; T2D, type 2 diabetes; ESKD, end-stage kidney disease; Chr, chromosome; EA, effect allele; N, number; EAF, effect allele frequency; OR, odds ratio; P, p value; CI, confidence interval; Pcorr, corrected p value.
Baseline model: adjusted for age and sex, APOL1 renal-risk genotypes included.
APOL1-negative model: adjusted for age and sex, APOL1 renal-risk carriers removed.


Supplementary Table 4. Differentiation steps for top associated SNPs in APOL1-negative model

| Candidate region | Marker | EA | Discovery (T2D-ESKD cases vs. non-diabetic, non-nephropathy controls): N case/control | Discovery: EAF case/control | Discovery: OR | Discovery: P | Discrimination (T2D-ESKD cases vs. T2D lacking nephropathy): N case/control | Discrimination (T2D-ESKD vs. T2D lacking nephropathy): EAF case/control | Discrimination (T2D-ESKD vs. T2D lacking nephropathy): OR | Discrimination (T2D-ESKD vs. T2D lacking nephropathy): P | Cochran's Q test: P | Cochran's Q test: Direction | Discrimination (T2D lacking nephropathy vs. non-T2D): N case/control | Discrimination (T2D lacking nephropathy vs. non-T2D): EAF case/control | Discrimination (T2D lacking nephropathy vs. non-T2D): OR | Discrimination (T2D lacking nephropathy vs. non-T2D): P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TTC21B | rs6742727 | T | 1583/828 | 0.11/0.15 | 0.76 | 0.00025 | 1583/525 | 0.11/0.15 | 0.69 | 0.00029 | 0.46 | -- | 654/1116 | 0.14/0.14 | 1.09 | 0.33 |
| COL4A3 | rs34505188 | A | 1613/846 | 0.035/0.024 | 1.57 | 0.0035 | 1613/532 | 0.035/0.032 | 1.11 | 0.60 | 0.17 | ++ | 662/1139 | 0.032/0.029 | 1.11 | 0.57 |
| NPHP3-ACAD11 | rs78174962 | A | 1612/847 | 0.086/0.112 | 0.80 | 0.0058 | 1612/536 | 0.086/0.090 | 0.94 | 0.60 | 0.27 | -- | 665/1140 | 0.091/0.103 | 0.89 | 0.28 |
| NPHP3-ACAD11 | rs74504809 | G | 1613/846 | 0.099/0.123 | 0.81 | 0.0084 | 1613/535 | 0.099/0.107 | 0.90 | 0.37 | 0.45 | -- | 666/1138 | 0.107/0.115 | 0.94 | 0.52 |
| ARHGAP24 | rs10433935 | C | 1608/843 | 0.020/0.029 | 0.63 | 0.0030 | 1608/530 | 0.019/0.020 | 0.94 | 0.80 | 0.16 | -- | 661/1134 | 0.022/0.025 | 0.88 | 0.54 |
| CLDN8 | rs55884670 | G | 1612/845 | 0.060/0.043 | 1.29 | 0.024 | 1612/536 | 0.060/0.059 | 1.07 | 0.64 | 0.32 | ++ | 667/1135 | 0.056/0.050 | 1.09 | 0.52 |

Abbreviations: T2D, type 2 diabetes; ESKD, end-stage kidney disease; EA, effect allele; N, number; EAF, effect allele frequency; OR, odds ratio; P, p value.
APOL1-negative model: adjusted for age and sex, APOL1 renal-risk genotype carriers excluded.


Supplementary Table 5. Results of the top three associations from Baseline model in APOL1-negative model

Discovery and replication columns both compare T2D-ESKD cases with non-diabetic, non-nephropathy controls.

| Candidate region | SNP | Annotation | Chr | Position | EA | Discovery N case/control | Discovery EAF case/control | Discovery OR | Discovery P | Replication N case/control | Replication EAF | Replication OR | Replication P | Meta-analysis EAF | Meta-analysis OR (95% CI) | Meta-analysis P | Pcorr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CD2AP | rs115912771 | Intronic | 6 | 47551182 | A | 1615/845 | 0.0198/0.0284 | 0.68 | 0.014 | 383/484 | 0.0277 | 0.54 | 0.043 | 0.0238 | 0.65 (0.49,0.85) | 0.0019 | 0.089 |
| CD2AP | rs116139597 | Downstream | 6 | 47633775 | C | 1613/846 | 0.109/0.078 | 1.28 | 0.0049 | 383/484 | 0.101 | 1.21 | 0.15 | 0.0991 | 1.26 (1.09,1.45) | 0.00162 | 0.078 |
| MMP2 | rs7185763 | Upstream | 16 | 55411450 | C | 1609/845 | 0.286/0.321 | 0.89 | 0.025 | 383/484 | 0.305 | 0.88 | 0.13 | 0.300 | 0.87 (0.81,0.97) | 0.0068 | 0.26 |

Abbreviations: SNP, single nucleotide polymorphism; T2D, type 2 diabetes; ESKD, end-stage kidney disease; Chr, chromosome; EA, effect allele; N, number; EAF, effect allele frequency; OR, odds ratio; P, p value; CI, confidence interval; Pcorr, corrected p value.
Baseline model: adjusted for age and sex, APOL1 renal-risk genotypes included.
APOL1-negative model: adjusted for age and sex, APOL1 renal-risk carriers removed.


Supplementary Table 6. eQTL analysis of T2D-ESKD associated SNPs and gene expression

| Candidate region | Marker | Chr | Position | Regulated Gene | P-Value | Effect Size | Tissue | eQTL Type |
|---|---|---|---|---|---|---|---|---|
| TTC21B | rs6742727 | 2 | 166692686 | GALNT3 | 2.70E-07 | 0.42 | Nerve - Tibial | cis-eQTL |
| TTC21B | rs6742727 | 2 | 166692686 | AC009495.2 | 3.20E-07 | 0.26 | Whole Blood | cis-eQTL |
| TTC21B | rs6742727 | 2 | 166692686 | AC009495.2 | 3.70E-07 | 0.6 | Prostate | cis-eQTL |
| TTC21B | rs6742727 | 2 | 166692686 | GALNT3 | 8.0E-06 | 0.39 | Adipose - Subcutaneous | cis-eQTL |
| ARHGAP24 | rs10433935 | 4 | 86597245 | ARHGAP24 | 1.10E-05 | -0.47 | Artery - Tibial | cis-eQTL |
| MMP2 | rs7185763 | 16 | 55411450 | IRX6 | 7.40E-06 | 0.48 | Adipose - Subcutaneous | cis-eQTL |

Abbreviations: SNP, single nucleotide polymorphism; Chr, chromosome; eQTL, expression quantitative trait loci.


Chapter 5

Association analysis of the cubilin (CUBN) and megalin (LRP2) genes with end-stage kidney disease in African Americans

Jun Ma*, Meijian Guan*, Donald W. Bowden, Maggie C. Y. Ng, Pamela J. Hicks, Janice P. Lea, Lijun Ma, Chuan Gao, Nicholette D. Palmer, Barry I. Freedman

This manuscript was published in Clinical Journal of the American Society of Nephrology.

Jun Ma*, Meijian Guan* et al. Association analysis of the cubilin (CUBN) and megalin (LRP2) genes with end-stage kidney disease in African Americans. Clin J Am Soc Nephrol, 2016 May 19. pii: CJN.12971215.

*Jun Ma and Meijian Guan were equal contributors.


Abstract

Background and objectives: Genetic variation in the cubilin gene (CUBN) is associated with albuminuria and chronic kidney disease. Common and rare coding variants in CUBN and the gene encoding its transport-partner megalin (LRP2) were assessed for association with end-stage kidney disease (ESKD) in African Americans (AAs).

Design, setting, participants and measurements: 66 CUBN and LRP2 single nucleotide polymorphisms (SNPs) were selected and analyzed in this multi-stage study. Exome sequencing data from 529 AA type 2 diabetes (T2D)-attributed ESKD cases and 535 controls lacking T2D or nephropathy (T2D-GENES Consortium) were first evaluated, focusing on coding variants in CUBN and LRP2. Fifteen potentially associated SNPs identified from T2D-GENES, as well as 51 other selected SNPs, were then assessed in an independent AA T2D-ESKD sample set (Affymetrix Axiom Biobank Genotyping Array [AXIOM]; 2041 T2D-ESKD cases, 667 T2D without nephropathy, and 1140 non-diabetic, non-nephropathy controls). A meta-analysis combining T2D-GENES and AXIOM data was performed for 18 overlapping SNPs. Additionally, all 66 SNPs were genotyped in Wake Forest AA non-diabetic ESKD samples (885 non-diabetic ESKD cases and 721 controls). Association testing with ESKD was performed in models including age, sex, African ancestry proportion, and apolipoprotein L1 gene renal-risk variants.

Results: CUBN SNP rs1801239 (I2984V), previously associated with albuminuria, was significantly associated with T2D-ESKD in AAs (T2D-GENES+AXIOM meta-analysis p=0.027; odds ratio [OR] 1.31; 95% confidence interval [CI] 1.03-1.67; minor allele frequency [MAF]=0.028). A novel LRP2 missense variant, rs17848169 (N2632D), was also significantly protective from T2D-ESKD (T2D-GENES+AXIOM p<0.002; OR 0.47; 95% CI 0.29-0.75; meta-analysis MAF=0.007). Neither SNP was associated with T2D, when contrasting T2D cases with controls lacking diabetes. CUBN and LRP2 SNPs were not associated with non-diabetic etiologies of ESKD.

Conclusions: Evidence exists for genetic association of a cubilin variant and a rare megalin variant with diabetes-attributed ESKD in populations with recent African ancestry.


Introduction

Increasing evidence supports that inherited factors make major contributions to end-stage kidney disease (ESKD) susceptibility (Köttgen 2010; Friedman and Pollak 2011). This is particularly true in African Americans, who have high rates of ESKD with marked familial aggregation of nephropathy (Freedman et al. 1995). The incidence rate of ESKD in African Americans is 3.3-fold higher than in European Americans (USRDS 2016). Apolipoprotein L1 gene (APOL1) renal-risk alleles associate with approximately 70% of non-diabetic ESKD in African Americans (Genovese et al. 2010; Tzur et al. 2010b; Kopp et al. 2011); however, they do not explain the excess risk for type 2 diabetes (T2D)-associated ESKD (McDonough et al. 2011). Additional genetic loci likely contribute to this risk (Palmer et al. 2014; Bonomo et al. 2014a, b).

The cubilin gene (CUBN) was identified as a novel locus for albuminuria from a genome-wide association study (GWAS)-based meta-analysis (Böger et al. 2011). The missense single nucleotide polymorphism (SNP) rs1801239 (I2984V) in CUBN was associated with elevated urine albumin-to-creatinine ratio in individuals of European and recent African ancestry. Another intronic CUBN variant, rs10795433, in moderate linkage disequilibrium with rs1801239 (r2=0.54), was associated with urine albumin-to-creatinine ratio in patients with diabetes (Teumer et al. 2016). Cubilin forms a functional receptor complex with megalin (encoded by the LRP2 gene) in the proximal tubule to reabsorb filtered urinary albumin (Birn et al. 2000; Dickson et al. 2014). Megalin is important in facilitating the internalization of the cubilin-albumin complex (Amsellem et al. 2010).

Because albuminuria is an important risk factor for progression of kidney disease, we hypothesized that variation in CUBN and LRP2 could contribute to nephropathy susceptibility. A recent analysis in European kidney transplant donors and recipients demonstrated that CUBN SNP rs7918972 was significantly associated with risk for ESKD (Reznichenko et al. 2012). However, the role of CUBN genetic variation in ESKD susceptibility among African Americans remains unknown. We analyzed next generation exome sequencing (NGES) data to survey the CUBN and megalin (LRP2) genes to determine whether variation in these genes impacted risk for ESKD in African Americans, beyond reducing renal proximal tubule reabsorption of albumin.

Materials and Methods

Study Participants

This study was approved by the Institutional Review Board at the Wake Forest School of Medicine; all participants provided written informed consent. Detailed recruitment and sample collection procedures have been reported (Bonomo et al. 2014a). African Americans with ESKD were recruited from dialysis facilities. T2D was diagnosed in those whose illness developed after 25 years of age and who lacked diabetic ketoacidosis or receipt of insulin alone since diagnosis. ESKD was attributed to T2D with >5 years diabetes duration before the start of renal replacement therapy in the absence of other causes of nephropathy. Cases with non-diabetic ESKD had nephropathy attributed to chronic glomerulosclerosis, focal segmental glomerulosclerosis (FSGS), HIV-associated nephropathy, hypertension, or unknown causes. Those with ESKD due to urologic or surgical causes, polycystic kidney disease, IgA nephropathy, membranous, or membranoproliferative glomerulonephritis were excluded. African Americans with T2D lacking nephropathy were receiving insulin and/or oral hypoglycemic agents, had a hemoglobin (Hb) A1C >6.5%, or a fasting plasma glucose >126 mg/dl, with a serum creatinine concentration <1.5 (males) or <1.3 mg/dl (females). Unrelated African Americans without diabetes or kidney disease (estimated glomerular filtration rate [eGFR] >60 ml/min/1.73 m2 and urine albumin-to-creatinine ratio <30 mg/g) were recruited as controls (described as non-T2D, non-nephropathy or healthy controls). Genomic DNA was extracted with the PureGene system (Gentra Systems, Minneapolis, MN) according to manufacturer instructions. Ethnicity was self-reported and confirmed using African ancestry proportions calculated with seventy ancestry informative markers (AIMs) (Tang et al. 2005; Keene et al. 2008b).

SNP selection, genotyping, and quality control

A total of 66 SNPs in CUBN (n=50) and LRP2 (n=16) were selected to evaluate potential ESKD associations in African Americans. Figure 1 displays the study design and Supplementary Table S1 displays the SNPs and their sources of selection. Fifteen SNPs were selected from samples provided by Wake Forest to the T2D-GENES Consortium exome sequencing project (https://t2d-genessph.umich.edu/), which included 529 African American T2D-ESKD cases and 535 non-diabetic non-nephropathy controls. T2D-GENES genotyping and quality control methods have been reported (Bonomo et al. 2014b). Among the total of 407 CUBN SNPs and 581 LRP2 SNPs identified in T2D-GENES, we selected 4 CUBN and 11 LRP2 SNPs for analysis based on association results with T2D-ESKD with p-values <0.10. Eleven additional coding variants were selected from Exome Sequencing Project (ESP) data based on minor allele frequencies (MAF) in African Americans >0.01 and “probably damaging” effects using Polyphen2 prediction; 6 were CUBN SNPs and 5 were in LRP2. From the literature, two CUBN SNPs previously associated with kidney disease phenotypes, rs1801239 and rs7918972, hereafter referred to as “index” SNPs, were included (Böger et al. 2011; Reznichenko et al. 2012). In addition, 38 haplotype-tagging SNPs with MAF >5% across the HapMap region of linkage disequilibrium (chr10:16897609-17012313) and inclusive of the index variants were selected. Because the index SNPs were primarily identified in European populations, the borders of their linkage disequilibrium blocks in Utah residents with Northern and Western European ancestry were determined in Haploview and tagging SNPs were selected between these regions in Yoruba in Ibadan, Nigeria (YRI) to tag differential linkage disequilibrium block structures in populations with recent African ancestry.

Association analyses were performed in 2,041 independent African American cases with T2D-ESKD and 1,807 independent non-nephropathy controls (667 controls with T2D lacking nephropathy and 1,140 controls without T2D) who had been genotyped on the Affymetrix Axiom Biobank Genotyping Array (AXIOM samples; none of these individuals overlapped with T2D-GENES). Detailed SNP information, genotyping methods, and AXIOM quality control data are reported in Supplementary Methods. Of the initial 66 SNPs selected in the discovery analysis, 35 were present in the AXIOM samples, 27 in CUBN and 8 in LRP2. Of these, 10 CUBN SNPs and 8 LRP2 SNPs were present in both T2D-GENES and AXIOM data; these 18 SNPs were included in a meta-analysis from both datasets (Figure 1).

To assess associations in non-diabetic ESKD, 885 African Americans with non-diabetic ESKD and 721 non-diabetic non-nephropathy controls were genotyped for the 66 CUBN and LRP2 SNPs. Genotyping was performed using the Sequenom MassArray system (Sequenom, San Diego, CA). Polymerase chain reaction primers were designed using MassARRAY Assay Design 3.1 (Sequenom), and genotypes analyzed using MassARRAY Typer (Sequenom). Of all 66 SNPs, 60 were successfully genotyped, had call rates >95%, and met quality control standards based on 100% concordance with blind duplicates and Hardy-Weinberg Equilibrium (HWE) p-value >1×10^-4. Two APOL1 G1 nephropathy-risk SNPs (rs73885319; rs60910145) and an insertion/deletion for the APOL1 G2 risk allele (rs71785313) were genotyped in all samples on the same platform.

Statistical analysis

For data from the Axiom custom array, a linear mixed model-based method was used to correct for population structure and cryptic relatedness (Sawcer et al. 2011). This resulted in an inflation factor <1.002, computed from 315,610 high-quality autosomal SNPs with a MAF >0.05. Since all samples in T2D-GENES and non-diabetic ESKD datasets at Wake Forest were from unrelated individuals, logistic regression was performed using PLINK for the NGES and directly genotyped data.
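
The text does not state how the inflation factor was computed. A common definition, assumed here, is the median of the observed 1-df chi-square association statistics divided by the median expected under the null (about 0.455). The sketch below converts two-sided p-values back to chi-square statistics with the Python standard library; the function name is hypothetical and p-values must lie in (0, 1].

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 df: (Phi^-1(0.75))^2, about 0.455.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2

def inflation_factor(p_values):
    """Genomic inflation factor (lambda) from two-sided association p-values."""
    # A two-sided p corresponds to |z| = -Phi^-1(p / 2); chi-square = z^2.
    chi2 = [_STD_NORMAL.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN
```

A value close to 1, such as the <1.002 reported above, indicates that the test statistics are not systematically inflated by residual population structure or relatedness.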

Single SNP association tests in all sets were computed using an additive genetic model. The fully-adjusted model for association with ESKD included participant age, sex, African ancestry proportion, and recessive APOL1 G1/G2 risk alleles. The adjusted model for association with T2D per se in the Axiom samples (T2D-only versus non-T2D, non-nephropathy controls) included participant age, sex, and African ancestry proportion (not APOL1). A corrected P value (Pcorr) was calculated for SNP associations by adjusting for the number of SNPs tested in each study. Pcorr values <0.05 were considered statistically significant.
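
The correction method is not named in the text; the worked figure given in the Results (0.002*18=0.036) is consistent with a Bonferroni-style multiplication by the number of SNPs tested, which is what this minimal sketch assumes. The function names are illustrative only.

```python
def corrected_p(p_value, n_tests):
    """Bonferroni-style correction: scale by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)

def is_significant(p_value, n_tests, alpha=0.05):
    """Apply the Pcorr < 0.05 rule stated in the text."""
    return corrected_p(p_value, n_tests) < alpha

# Example with the 18 SNPs carried into the meta-analysis.
pcorr = corrected_p(0.002, 18)
```

Under this assumption a nominal p-value must fall below 0.05/18 (about 0.0028) to remain significant across the 18 meta-analyzed SNPs.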


Meta-analysis

Summary statistics from 18 overlapping SNPs in the T2D-GENES and AXIOM samples (10 CUBN and 8 LRP2) were combined using the fixed-effects meta-analysis method implemented in METAL (Willer et al. 2010). Of these, CUBN index SNP rs1801239 was included, but the second CUBN index SNP rs7918972 was not present in T2D-GENES and could not be meta-analyzed.
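
A fixed-effects combination of two studies can be sketched as follows. METAL offers both a sample-size-weighted and an inverse-variance (standard-error) scheme, and the text does not say which was used; this illustration assumes inverse-variance weighting and recovers each study's standard error from its reported 95% confidence interval. The function names are hypothetical, and the example inputs are the rs17848169 odds ratios reported in the Results for T2D-GENES (0.19; 0.05-0.68) and AXIOM (0.54; 0.32-0.90).

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()
Z95 = _STD_NORMAL.inv_cdf(0.975)  # about 1.96

def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Log odds ratio and its standard error from an OR and its 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
    return beta, se

def fixed_effects_meta(studies):
    """Inverse-variance fixed-effects meta-analysis.

    studies: iterable of (odds_ratio, ci_low, ci_high) tuples.
    """
    sum_w = 0.0
    sum_w_beta = 0.0
    for odds_ratio, ci_low, ci_high in studies:
        beta, se = log_or_and_se(odds_ratio, ci_low, ci_high)
        weight = 1.0 / se ** 2
        sum_w += weight
        sum_w_beta += weight * beta
    beta = sum_w_beta / sum_w
    se = math.sqrt(1.0 / sum_w)
    p_value = 2 * (1 - _STD_NORMAL.cdf(abs(beta / se)))
    ci = (math.exp(beta - Z95 * se), math.exp(beta + Z95 * se))
    return {"or": math.exp(beta), "ci": ci, "p": p_value}

# rs17848169: T2D-GENES and AXIOM estimates taken from the Results.
result = fixed_effects_meta([(0.19, 0.05, 0.68), (0.54, 0.32, 0.90)])
```

With these inputs the pooled estimate lands close to the reported meta-analysis odds ratio of 0.47 (95% CI 0.29-0.75); small differences are expected because the inputs are rounded.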

Results

Demographic data from cases and controls in the T2D-GENES, AXIOM, and Wake Forest non-diabetic ESKD samples are summarized in Table 1. Participant characteristics were generally similar among cases with T2D-ESKD; however, age at onset of T2D was younger in T2D-ESKD cases than individuals with T2D lacking nephropathy. T2D-ESKD cases had older ages at enrollment, compared with individuals with T2D lacking nephropathy and non-diabetic non-nephropathy controls. Mean ages at enrollment and body mass index (BMI) were lower in non-diabetic ESKD cases than in the T2D-ESKD groups, while their duration of ESKD was longer.

The initial evaluation of 15 SNPs in 529 T2D-GENES African Americans with T2D-ESKD versus 535 non-diabetic non-nephropathy controls identified 6 variants (3 in CUBN, 3 in LRP2) nominally associated with T2D-ESKD (p<0.05 in additive models adjusted for age, sex, African ancestry proportion, and APOL1). Among these, common synonymous CUBN variant rs1873469 showed the strongest association, MAF 26% in T2D-ESKD cases and 33% in controls (p=0.003; odds ratio [OR] 0.72; 95% confidence interval [CI] 0.58-0.90). In addition, two low frequency protective missense LRP2 variants were identified: rs17848169 (N2632D, p=0.011; OR 0.19; 95% CI 0.05-0.68) and rs34291900 (G669D, p=0.015; OR 0.17; 95% CI 0.04-0.71); MAF were 0.39% and 0.29% in cases; 1.2% and 1.1% in controls, respectively. The other 9 SNPs showed trends toward association (p<0.10; Supplementary Table S2).
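
As a rough plausibility check on the reported effect sizes, an unadjusted allelic odds ratio can be computed directly from the case and control allele frequencies. This is only a sketch: it ignores the covariates (age, sex, ancestry, APOL1) in the fitted models, so it is not expected to reproduce the adjusted odds ratios exactly, and the function name is hypothetical.

```python
def allelic_odds_ratio(freq_cases, freq_controls):
    """Unadjusted odds ratio for an allele from its frequency in each group."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls

# rs1873469: MAF 26% in T2D-ESKD cases and 33% in controls (from the text).
crude_or = allelic_odds_ratio(0.26, 0.33)
```

For rs1873469 the crude value is about 0.71, in line with the adjusted odds ratio of 0.72 reported above.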

Thirty-five of the 66 SNPs selected for analysis could be surveyed in the AXIOM replication sample consisting of 2,041 cases with T2D-ESKD, 667 individuals with T2D lacking nephropathy, and 1,140 non-diabetic non-nephropathy controls (Supplementary Table S1). In fully-adjusted models, LRP2 SNP rs17848169 (p=0.018; OR 0.54; 95% CI 0.32-0.90) and CUBN index SNP rs1801239 (p=0.018; OR 1.37; 95% CI 1.06-1.78) were associated with T2D-ESKD (vs. non-nephropathy controls). The LRP2 SNP rs34291900 detected in the T2D-GENES data showed a weak trend toward association (p=0.18; OR 0.67; 95% CI 0.38-1.20, Table 2). Other SNPs tested in AXIOM samples, including the second CUBN index SNP rs7918972, were not associated with T2D-ESKD (Supplementary Table S1).

A meta-analysis considering 10 CUBN and 8 LRP2 SNPs genotyped in T2D-GENES and AXIOM samples was performed for association with T2D-ESKD (Table 2). LRP2 SNP rs17848169 was significantly associated with T2D-ESKD (vs. non-nephropathy controls) after correction for multiple testing with Pcorr=0.031 (p-value 0.002*18=0.036) and LRP2 SNP rs34291900 was nominally associated (p=0.032); both variants showed the same directions of effect in each sample set. CUBN index SNP rs1801239 also replicated association in the meta-analysis with p=0.027 and the same direction of effect in each set. The meta-analysis was repeated by removing cases and controls felt likely to be at risk for non-diabetic ESKD based on possession of two APOL1 renal-risk variants. Despite a smaller sample, results generally remained consistent (Supplementary Table S3).

Associations between CUBN and LRP2 variants with non-diabetic ESKD were next assessed. The 66 selected SNPs were genotyped in 885 Wake Forest African American cases with non-diabetic ESKD and 721 non-diabetic non-nephropathy controls (independent from T2D-GENES); 60 SNPs were successfully genotyped and met quality control standards for analysis. Among these, 2 SNPs in LRP2 and 4 SNPs in CUBN were nominally associated with non-diabetic ESKD (vs. non-nephropathy controls) in the fully-adjusted model with p-values of 0.018 to 0.047 under the additive model (Table 3). None remained significantly associated after correction for multiple comparisons (Pcorr>0.05). One CUBN index SNP rs7918972 trended toward significant association with non-T2D ESKD (p=0.06; OR=1.25; 95% CI 0.99-1.58), while the second CUBN index SNP rs1801239 was not associated (p=0.90; OR=1.03; 95% CI 0.61-1.74, Supplementary Table S1).

To determine whether SNPs associated with T2D-ESKD reflected association with T2D per se or kidney disease, trait discrimination analyses were performed in the AXIOM samples. None of the associated variants were associated with T2D per se, when comparing

T2D non-nephropathy cases with non-diabetic non-nephropathy controls (for example, p=0.40 and 0.78, respectively, for LRP2 SNP rs17848169 and CUBN index SNP rs1801239;

Table 4). Furthermore, CUBN SNP rs1801239 was associated with nephropathy in cases with T2D-ESKD compared to those with T2D lacking nephropathy (p=0.037). These findings support risk or protective allele associations with nephropathy, not with diabetes.


Discussion

This study investigated the genetic association of CUBN and LRP2 gene variants with diabetic and non-diabetic etiologies of ESKD in African Americans from T2D-GENES, independent AXIOM array-based samples, and additional cases with non-diabetic etiologies of ESKD and controls from Wake Forest. Because NGES is a powerful technology that allows one to comprehensively identify and test genetic variations in coding sequences of genes for disease association, we used NGES data (T2D-GENES) to survey the CUBN and LRP2 genes as a first step and identified 15 SNPs suggestively associated with T2D-ESKD (P<0.10).

Considering most of the SNPs from T2D-GENES were rare variants, 11 additional coding variants from ESP were selected as a supplement based on their allele enrichment

(MAF>0.01) and in silico prediction. To explore the role of common variants in disease susceptibility, 38 tagging SNPs (MAF>0.05) across the HapMap region of linkage disequilibrium with the two index variants were also selected. Thus, this study provided a locus-wide association analysis instead of simple replication for the identified variants.

CUBN index SNP rs1801239 was associated with risk for T2D-ESKD; this variant was previously associated with albuminuria. A novel LRP2 missense variant rs17848169

(N2632D) was also found to be protective from T2D-ESKD. In contrast, no CUBN or LRP2

SNPs were significantly associated with non-diabetic ESKD in African Americans.

Trait discrimination analyses supported that the associated CUBN and LRP2 variants play roles in nephropathy susceptibility in African Americans, not diabetes per se. The present results suggest an important role of the cubilin-megalin complex in development of progressive diabetic kidney disease in populations with recent African ancestry, beyond

albuminuria due to reduced proximal tubule reabsorption. Recent evidence supports the importance of endocytotic reabsorption of filtered albumin in health, because glomerular filtration of albumin appears to be greater than initially appreciated (Osicka et al. 2004;

Russo et al. 2007; Gagliardini et al. 2010). Albumin reabsorption occurs in proximal tubule cells, where the cubilin-megalin receptor complex is expressed on the apical brush border and plays a critical role in receptor-mediated endocytosis (Christensen and Birn 2002; Grant and Donaldson 2009; Amsellem et al. 2010). Although albuminuria often leads to nephropathy progression, roles of the CUBN and LRP2 genes in T2D-ESKD in African

Americans were not previously studied.

Several CUBN and cubilin-associated amnionless gene variants cause Imerslund-Gräsbeck syndrome, a rare autosomal-recessive disease characterized by megaloblastic anemia, recurrent infections, failure to thrive, and proteinuria (Storm et al. 2013; Drögemüller et al. 2014). However, the common CUBN variants rs1801239 and rs7918972 were only recently found to associate with albuminuria and nephropathy. In initial studies, rs1801239 was associated with albuminuria, not eGFR or ESKD (Böger et al. 2011). In the present report, rs1801239 was associated with T2D-ESKD with the same direction of effect as for albuminuria, indicating that the C allele carries risk for T2D-ESKD in African Americans. We did not replicate association with the rs7918972 index SNP in CUBN in these African

American cases; this SNP was previously associated with ESKD in a European sample

(Reznichenko et al. 2012). Additional studies with large sample sizes and in different ethnic groups are necessary to clarify the correlations between genetic variation in CUBN and

LRP2 with diabetic ESKD.


An LRP2 missense variant, rs17848169 (N2632D), protective for T2D-ESKD in

African Americans, was identified for the first time. Although the variant is present at low frequency, its MAFs in cases and controls were concordant across the independent T2D-GENES

(<0.004 cases/0.012 controls) and AXIOM array (<0.005 cases/<0.01 controls) samples. The same trend was observed in the non-diabetic ESKD samples (<0.008 cases/0.011 controls, p=0.14). Therefore, the association with T2D-ESKD appears credible.

Cubilin is a 460 kDa multipurpose receptor which can bind to a range of ligands including intrinsic factor/vitamin B12, transferrin, hemoglobin, high density lipoprotein-cholesterol, apolipoprotein A1, megalin, and albumin (Christensen et al. 2013). As a peripheral membrane protein, cubilin contains a 110-residue N-terminal domain, an eight epidermal growth factor (EGF)-like repeat domain, and 27 CUB domains (Moestrup et al.

1998). Because cubilin lacks transmembrane and cytoplasmic domains, internalization of albumin is thought to be mediated via its interaction with megalin, a 600 kDa transmembrane protein in the LDL-receptor family (Christensen et al. 2009).

LRP2 variant rs17848169 (N2632D) is located in the extracellular low density lipoprotein-receptor repeat segments of megalin where ligand binding sites exist (Saito et al.

1994). Although megalin can bind albumin (Cui et al. 1996), animal studies reveal the major role of megalin in albumin reabsorption is to drive internalization of cubilin-albumin complexes (Amsellem et al. 2010). CUBN index SNP rs1801239 (I2984V) is located in the

22nd CUB domain of cubilin, one of the three fragments that binds to megalin (Ahuja et al.

2008). Therefore, I2984V and N2632D may interfere with the interaction between cubilin and megalin to alter albumin reabsorption. Functional studies will be required to clarify potential

mechanisms.

A recent report revealed that the CUBN rs1801239 risk variant appeared on a derived low frequency European haplotype consisting of 19 SNPs; the frequency of each SNP differed significantly in Africans (and was absent in West Africans). This European haplotype may represent a region of extended linkage disequilibrium, conceivably reflecting the effect of positive selective pressure under nutritional influences during evolution (Tzur et al. 2012).

Based on results in admixed African Americans, we observed that variation at rs1801239 was slightly higher than that in the 1000 Genomes or HapMap YRI data (0.024-0.031 in our

Axiom dataset versus 0.018 in public datasets). Among the 19 CUBN variants assessed by

Tzur et al., only rs1801239 and rs62619939 were available in the current study based on the differential selection strategy. Therefore, further association studies using variants in continental African cohorts, lacking this European-origin haplotype, will be important to clarify the causative variant.

It is noteworthy that none of the CUBN or LRP2 SNPs was associated with non-diabetic etiologies of ESKD in this African American sample. Although the non-diabetic ESKD sample was smaller than the T2D-ESKD cohorts and had reduced statistical power (at a significance level of

8.3×10⁻⁴, based on the 60 successfully genotyped SNPs in non-diabetic ESKD samples, the expected power to detect SNPs with frequency=0.05 and OR=1.5 was 0.31), we feel it is likely that mechanisms beyond impaired albumin reabsorption in proximal tubule cells contribute to nephropathy in African Americans with non-diabetic kidney disease. Studies have demonstrated that the APOL1 G1 and G2 renal-risk alleles markedly increase risk for

FSGS, focal global glomerulosclerosis, HIV-associated nephropathy and lupus nephritis


(Genovese et al. 2010; Tzur et al. 2010b). The younger age of controls relative to cases in this report warrants comment, since this could bias results toward the null hypothesis. In analyses of T2D-ESKD, the mean age of controls was older than the age at onset of T2D in cases. Therefore, the controls are less likely to develop T2D and/or subsequent T2D-ESKD.

In analyses of non-diabetic ESKD, the mean age of controls was nearly four years younger than the age at onset of ESKD in cases ([age at recruitment] – [ESKD duration]); therefore, they are also far less likely to develop ESKD within this short time frame given that they were non-diabetic, non-nephropathy and had a normal serum creatinine concentration

(0.97mg/dl).
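The multiple-testing thresholds quoted in this chapter follow from a Bonferroni correction, which can be sketched in a few lines. The function names below are hypothetical. The first call reproduces the significance level for the 60 SNPs analyzed in the non-diabetic ESKD samples; the second applies the same correction to a rounded p-value of 0.002 across the 18 SNPs in the T2D-ESKD meta-analysis. The corrected value reported in the Results was presumably derived from the unrounded p-value, so it need not match this rounded product exactly.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

def bonferroni_adjust(p_value, n_tests):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p_value * n_tests)

# 60 successfully genotyped SNPs in the non-diabetic ESKD analysis
threshold = bonferroni_threshold(0.05, 60)
print(f"{threshold:.1e}")  # 8.3e-04

# Rounded meta-analysis p-value for rs17848169, corrected for 18 SNPs
adjusted = bonferroni_adjust(0.002, 18)
print(f"{adjusted:.3f}")  # 0.036
```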

In conclusion, the genetic association of the cubilin and megalin genes with susceptibility to advanced nephropathy in African Americans was explored. CUBN variant rs1801239, previously associated with albuminuria in predominantly European populations, was associated with T2D-ESKD in individuals with recent African ancestry. A novel LRP2 missense variant rs17848169 (N2632D) was also found to be associated with lower risk for

T2D-ESKD in this population. Variants in CUBN and LRP2 were not associated with type 2 diabetes or non-diabetic etiologies of ESKD in African Americans.


Acknowledgements: This work was supported by NIH grants R01 DK53591 (DWB), R01

DK070941 (BIF), DK071891 (BIF), and by the National Natural Science Foundation of China

(No. 81200488). Dr. Ma was supported by an International Society of Nephrology Fellowship and the Shanghai Jiaotong University K.C. Wong Medical Fellowship Fund. The authors thank Dr. Mark D. Okusa (University of Virginia School of Medicine) for assistance with participant recruitment. No author reports a conflict of interest.


Figure 1. Cubilin gene (CUBN) and megalin gene (LRP2) SNP selection and genetic association analysis workflow.

Figure abbreviations: T2D – type 2 diabetes; ESKD – end-stage kidney disease; T2D-GENES – Type 2 diabetes GENES Consortium; ESP – Exome Sequencing Project; SNPs – single nucleotide polymorphisms; MAF – minor allele frequency; AXIOM – Affymetrix Axiom Biobank Genotyping Array.


Table 1. Demographic and clinical characteristics of study samples

Sample source | T2D-GENES: T2D-ESKD cases | T2D-GENES: Healthy controls | AXIOM: T2D-ESKD cases | AXIOM: T2D-only (no kidney disease) | AXIOM: Healthy controls | Non-T2D-ESKD: Non-T2D-ESKD cases | Non-T2D-ESKD: Healthy controls
Number | 529 | 535 | 2041 | 667 | 1140 | 885 | 721
Female (%) | 61.2 | 57.3 | 57.1 | 64.5 | 51.7 | 44.1 | 48.8
Age at recruitment (years) | 61.6±10.5 | 49.0±11.9 | 61.4±10.8 | 55.7±11.6 | 46.5±12.0 | 55.3±14.4 | 45.9±12.2
Age at T2D (years) | 47.3±9.9 | ─ | 38.6±12.7 | 46.2±12.3 | ─ | ─ | ─
Duration of T2D prior to ESKD (years) | 12 (6, 19) | ─ | 19 (13, 26) | ─ | ─ | ─ | ─
Duration of ESKD (years) | 3.77±3.8 | ─ | 3.64±3.51 | ─ | ─ | 6.26±6.15 | ─
Blood glucose (mg/dl) | ─ | 88.8±13.1 | ─ | ─ | 96.5±20.9 | NA | 89.3±13.5
Serum creatinine (mg/dl) | ─ | 0.99±0.25 | ─ | 0.94±0.20 | 0.97±0.20 | ─ | 0.97±0.19
BMI at recruitment (kg/m2) | 29.7±7.0 | 30.0±7.0 | 30.8±7.1 | 33.1±7.8 | 29.7±7.4 | 27.1±7.0 | 29.2±7.3
African ancestry (%) | 80.13±11.44 | 78.02±11.25 | 83.87±11.11 | 82.01±12.58 | 82.34±10.90 | 84.35±11.79 | 82.29±10.75

Categorical data expressed as percentage; Continuous data presented as mean ± SD; Duration of T2D prior to ESKD presented as median (25th percentile, 75th percentile); T2D, type 2 diabetes; ESKD, end-stage kidney disease; BMI, body mass index


Table 2. Association analysis between CUBN and LRP2 variants with T2D-ESKD (additive, fully-adjusted model)

T2D-GENES samples: T2D-ESKD cases vs healthy controls. AXIOM samples: T2D-ESKD cases vs T2D-only+healthy controls.

GENE | SNP | Minor allele | T2D-GENES N case/control | T2D-GENES MAF case/control | T2D-GENES P-value | T2D-GENES OR | T2D-GENES 95% CI | AXIOM N case/control | AXIOM MAF case/control | AXIOM P-value | AXIOM OR | AXIOM 95% CI | Meta P-value | Meta OR | Direction
LRP2 | rs17848169 | C | 512/502 | 0.004/0.012 | 0.011 | 0.19 | 0.05, 0.68 | 1997/1579 | 0.005/0.01 | 0.018 | 0.54 | 0.32, 0.90 | 0.002 | 0.47 | − −
LRP2 | rs34291900 | T | 511/501 | 0.003/0.011 | 0.015 | 0.17 | 0.04, 0.71 | 1997/1584 | 0.005/0.006 | 0.18 | 0.67 | 0.38, 1.20 | 0.032 | 0.56 | − −
LRP2 | rs4667591 | G | 511/502 | 0.24/0.23 | 0.74 | 1.04 | 0.81, 1.34 | 1991/1582 | 0.20/0.22 | 0.17 | 0.93 | 0.83, 1.03 | 0.26 | 0.94 | + −
LRP2 | rs144081819 | T | 512/502 | 0.036/0.033 | 0.63 | 1.15 | 0.65, 2.01 | 1997/1584 | 0.036/0.038 | 0.95 | 0.99 | 0.79, 1.24 | 0.91 | 1.01 | + −
LRP2 | rs143367996 | A | 512/502 | 0.022/0.032 | 0.30 | 1.41 | 0.74, 2.70 | 1995/1584 | 0.023/0.023 | 0.39 | 1.13 | 0.85, 1.50 | 0.23 | 1.17 | + +
LRP2 | rs61995913 | C | 512/502 | 0.046/0.047 | 0.59 | 0.87 | 0.53, 1.43 | 1995/1582 | 0.043/0.046 | 0.37 | 0.91 | 0.74, 1.12 | 0.30 | 0.90 | − −
LRP2 | rs116456291 | T | 512/502 | 0.036/0.030 | 0.90 | 1.04 | 0.58, 1.88 | 1993/1580 | 0.037/0.037 | 0.81 | 1.03 | 0.82, 1.29 | 0.79 | 1.03 | + +
LRP2 | rs144864408 | T | 512/502 | 0.001/0.002 | 0.091 | 0.090 | 0.01, 1.47 | 1997/1584 | 0.001/0.001 | 0.74 | 0.73 | 0.12, 4.48 | 0.23 | 0.39 | − −
CUBN | rs1873469 | A | 512/502 | 0.26/0.33 | 0.003 | 0.72 | 0.58, 0.90 | 1994/1581 | 0.28/0.29 | 0.66 | 0.98 | 0.89, 1.07 | 0.12 | 0.93 | − −
CUBN | rs144360241 | C | 512/502 | 0.002/0.001 | 0.022 | 16.1 | 1.51, 172.2 | 1998/1584 | 0.0008/0.002 | 0.88 | 0.92 | 0.28, 2.97 | 0.31 | 1.61 | + −
CUBN | rs148100631 | C | 512/502 | 0.001/0.005 | 0.039 | 0.073 | 0.01, 0.88 | 1997/1583 | 0.001/0.001 | 0.63 | 1.48 | 0.29, 7.48 | 0.47 | 0.60 | − +
CUBN | rs1801239 | C | 512/502 | 0.029/0.029 | 0.95 | 1.02 | 0.55, 1.91 | 1993/1582 | 0.031/0.024 | 0.018 | 1.37 | 1.06, 1.78 | 0.027 | 1.31 | + +
CUBN | rs74431427 | A | 512/502 | 0.023/0.018 | 0.64 | 1.18 | 0.59, 2.33 | 1989/1581 | 0.020/0.019 | 0.88 | 1.02 | 0.76, 1.38 | 0.75 | 1.05 | + +
CUBN | rs111265129 | C | 512/502 | 0.038/0.041 | 0.95 | 0.98 | 0.58, 1.66 | 1996/1579 | 0.043/0.041 | 0.46 | 1.08 | 0.88, 1.34 | 0.51 | 1.07 | − +
CUBN | rs780807 | A | 511/502 | 0.31/0.29 | 0.52 | 1.08 | 0.86, 1.35 | 1988/1582 | 0.30/0.30 | 0.70 | 1.02 | 0.93, 1.12 | 0.55 | 1.03 | + +
CUBN | rs2271460 | C | 512/502 | 0.016/0.015 | 0.41 | 1.43 | 0.61, 3.34 | 1993/1578 | 0.01/0.007 | 0.37 | 1.24 | 0.78, 1.96 | 0.24 | 1.28 | + +
CUBN | rs2271462 | T | 512/502 | 0.14/0.13 | 0.63 | 1.08 | 0.80, 1.44 | 1991/1580 | 0.14/0.14 | 0.90 | 1.01 | 0.89, 1.14 | 0.76 | 1.02 | + +
CUBN | rs12259370 | T | 512/502 | 0.071/0.061 | 0.42 | 1.19 | 0.78, 1.83 | 1955/1544 | 0.068/0.067 | 0.72 | 1.03 | 0.87, 1.22 | 0.53 | 1.05 | + +

T2D, type 2 diabetes; ESKD, end-stage kidney disease; MAF, minor allele frequency; OR, odds ratio; 95% CI, 95% confidence interval; sample sizes may be smaller than those displayed in Table 1 due to missing APOL1 genotypes.


Table 3. Strongest genetic associations in cases with non-diabetic ESKD (additive, fully-adjusted model)

Gene | SNP | Minor allele | N cases/controls | MAF cases/controls | P-value | OR | 95% CI
LRP2 | rs11898106 | G | 868/713 | 0.30/0.26 | 0.043 | 1.21 | 1.01, 1.46
LRP2 | rs78750385 | G | 885/720 | 0.11/0.09 | 0.032 | 1.34 | 1.03, 0.76
CUBN | rs3808925 | C | 873/715 | 0.17/0.15 | 0.019 | 1.30 | 1.05, 1.62
CUBN | rs2796838 | T | 883/718 | 0.40/0.43 | 0.048 | 0.85 | 0.72, 1.00
CUBN | rs11254267 | A | 882/717 | 0.092/0.07 | 0.018 | 1.43 | 1.06, 1.93
CUBN | rs7921129 | T | 876/714 | 0.32/0.35 | 0.047 | 0.84 | 0.71, 1.00

MAF, minor allele frequency; OR, odds ratio; 95% CI, 95% confidence interval


Table 4. Trait discrimination analysis of T2D-ESKD associated SNPs in AXIOM array samples

GENE | SNP | Minor allele | Subgroups | N cases/controls | MAF cases/controls | P-value | OR | 95% CI
LRP2 | rs17848169 | C | T2D-ESKD cases vs healthy controls | 1997/976 | 0.005/0.010 | 0.018 | 0.54 | 0.32, 0.90
LRP2 | rs17848169 | C | T2D-ESKD cases vs T2D-only controls | 1997/603 | 0.005/0.008 | 0.16 | 0.58 | 0.27, 1.25
LRP2 | rs17848169 | C | T2D-only cases vs healthy controls* | 662/1140 | 0.008/0.011 | 0.40 | 0.75 | 0.39, 1.45
CUBN | rs1801239 | C | T2D-ESKD cases vs healthy controls | 1993/975 | 0.031/0.025 | 0.24 | 1.18 | 0.90, 1.55
CUBN | rs1801239 | C | T2D-ESKD cases vs T2D-only controls | 1993/607 | 0.031/0.022 | 0.037 | 1.57 | 1.03, 2.40
CUBN | rs1801239 | C | T2D-only cases vs healthy controls* | 666/1138 | 0.023/0.024 | 0.78 | 0.94 | 0.63, 1.42

* adjusted for age, sex, and African ancestry proportion; other comparisons included APOL1 adjustment. Sample numbers varied due to missing APOL1 genotypes. T2D, type 2 diabetes; ESKD, end-stage kidney disease; MAF, minor allele frequency; OR, odds ratio; 95% CI, 95% confidence interval.


Supplementary Methods

Genotyping and quality control of AXIOM array data

The custom Affymetrix Axiom Biobank Genotyping Array (Affymetrix, Santa Clara, CA,

USA) includes the standard content of 264,000 exome variants, 70,000 loss-of-function variants,

2,000 pharmacogenomic variants, 23,000 eQTL markers, and 265,000 multi-ancestry based genome-wide tag SNPs. The custom content of the array includes 115,000 variants containing

SNPs associated with kidney disease, type 2 diabetes or adiposity-related phenotypes from the literature, and tag SNPs in the respective candidate gene/loci. DNA samples from cases and controls were distributed equally across 96-well plates to minimize errors during sample processing. A total of 81 blind Coriell samples were included in genotyping and had a concordance rate of 99.21% compared to 1000 Genomes genotypes. Genotype calling was performed using Affymetrix Power

Tools (APT). A total of 724,530 SNPs were successfully called for downstream QC and analyses. All QC was performed using the PLINK package

(http://pngu.mgh.harvard.edu/~purcell/plink/), unless otherwise specified. SNPs with less than

95% call rate, departure from Hardy-Weinberg equilibrium (P<1×10⁻⁴), and monomorphic SNPs were removed. Samples with call rate <95% were removed. Cryptic relatedness was assessed by identity-by-descent (IBD) analysis. Duplicate samples were identified, and one of each duplicate pair was removed. Subjects with gender discordance between genetic estimation and self-report were removed. Subjects with very negative inbreeding coefficients, suggesting DNA contamination, were removed. Principal Components (PC) Analysis was performed using

EIGENSOFT (https://data.broadinstitute.org/alkesgroup/EIGENSOFT/), combining our samples and the 1000 Genomes EUR, AFR and ASN. Samples with only European ancestry or ancestry other than European and African were removed. The first 10 Principal Components (PCs) were calculated using these samples. PC1 was associated with African-European ancestry and no additional population substructure was observed.
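The SNP-level filters described above (call rate, Hardy-Weinberg equilibrium, monomorphic markers) can be sketched as a single pass/fail check. This is an illustration only, not the PLINK implementation used in the study: the function names are hypothetical, the input is assumed to be genotype counts for one biallelic SNP, and a 1-degree-of-freedom chi-square test stands in for whichever Hardy-Weinberg test PLINK applied.

```python
import math

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for Hardy-Weinberg equilibrium.

    Returns None when the SNP is monomorphic, since no test is possible.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return None
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))

def snp_passes_qc(n_aa, n_ab, n_bb, n_missing,
                  min_call_rate=0.95, hwe_p_min=1e-4):
    """Apply the call-rate, monomorphic, and HWE filters to one SNP."""
    called = n_aa + n_ab + n_bb
    total = called + n_missing
    if total == 0 or called / total < min_call_rate:
        return False  # fails the 95% call-rate filter
    p_hwe = hwe_chi2_pvalue(n_aa, n_ab, n_bb)
    if p_hwe is None:
        return False  # monomorphic SNP
    return p_hwe >= hwe_p_min

print(snp_passes_qc(25, 50, 25, 0))   # genotype counts in exact HWE
print(snp_passes_qc(50, 0, 50, 0))    # no heterozygotes at all
```

The thresholds default to the values quoted in the text (95% call rate, P<1×10⁻⁴ for Hardy-Weinberg departure) and can be changed through the keyword arguments.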


Supplementary Table S1. All 66 SNPs tested in CUBN and LRP2 selected from T2D-GENES, ESP, and Hapmap

Gene | SNP | Source | Protein change | Polyphen2 | Minor allele | p-value and OR pairs, in column order: T2D-GENES set*, AXIOM set#, Non T2D-ESKD setΔ
CUBN | rs1873469 | T2D-GENES | G1174= | - | A | 0.0034 0.72 0.66 0.98 0.92 0.99
CUBN | rs144360241 | T2D-GENES | N2157D | Pro dam | C | 0.022 16.1 0.88 0.92 0.29 4.12
CUBN | rs148100631 | T2D-GENES | N2320S | Benign | C | 0.039 0.073 0.63 1.48 0.54 2.16
CUBN | rs186593805 | T2D-GENES | intron | - | A | 0.072 0.079 0.77 1.36
LRP2 | rs17848169 | T2D-GENES | N2632D | Pos dam | C | 0.011 0.19 0.018 0.54 0.14 0.52
LRP2 | rs34291900 | T2D-GENES | G669D | Pro dam | T | 0.015 0.17 0.18 0.67 0.18 0.48
LRP2 | rs139916510 | T2D-GENES | intron | - | A | 0.024 0.25 0.26 1.65
LRP2 | rs150373759 | T2D-GENES | T4645= | - | A | 0.051 0.19 0.074 3.53
LRP2 | rs78750385 | T2D-GENES | intron | - | G | 0.062 0.71 0.032 1.34
LRP2 | rs73970129 | T2D-GENES | P4378= | - | C | 0.062 1.80 0.24 1.32
LRP2 | rs150122414 | T2D-GENES | intron | - | A | 0.064 0.58 0.40 0.83
LRP2 | rs148452352 | T2D-GENES | intron | - | T | 0.067 0.33 0.85 1.10
LRP2 | rs115854094 | T2D-GENES | intron | - | G | 0.067 2.67
LRP2 | rs11898106 | T2D-GENES | intron | - | G | 0.068 0.81 0.042 1.21
LRP2 | rs144864408 | T2D-GENES | V1203I | Benign | T | 0.091 0.090 0.74 0.73
CUBN | rs2271462 | ESP | G1840S | Pro dam | T | 0.63 1.08 0.90 1.01 0.13 1.20
CUBN | rs62619939 | ESP | L2153F | Pro dam | G | 0.72 1.06
CUBN | rs12259370 | ESP | G66R | Pro dam | T | 0.42 1.19 0.72 1.03 0.39 1.15
CUBN | rs111265129 | ESP | I3189V | Pro dam | C | 0.95 0.98 0.46 1.08 0.29 1.25
CUBN | rs74431427 | ESP | S3329L | Pro dam | A | 0.64 1.18 0.88 1.02 0.40 0.76
CUBN | rs2271460 | ESP | F2263C | Pro dam | C | 0.41 1.43 0.37 1.24
LRP2 | rs4667591 | ESP | I4210L | Pro dam | G | 0.74 1.04 0.17 0.93 0.31 0.90
LRP2 | rs61995913 | ESP | A1901G | Pro dam | C | 0.59 0.87 0.37 0.91 0.17 1.30
LRP2 | rs144081819 | ESP | A3602T | Pro dam | T | 0.63 1.15 0.95 0.99 0.77 0.94
LRP2 | rs116456291 | ESP | V1765M | Pro dam | T | 0.90 1.04 0.81 1.03 1.00 1.00
LRP2 | rs143367996 | ESP | P3468 | Pro dam | A | 0.30 1.41 0.39 1.13 0.80 1.07
CUBN | rs1801239 | Index | I2984V | Benign | C | 0.95 1.02 0.018 1.37 0.90 1.03
CUBN | rs7918972 | Index | intron | - | G | 0.34 0.94 0.061 1.25
CUBN | rs7099855 | Tag SNP | intron | - | C | 0.53 0.93
CUBN | rs1849381 | Tag SNP | intron | - | C | 0.37 0.92
CUBN | rs2796838 | Tag SNP | intron | - | T | 0.048 0.85
CUBN | rs780830 | Tag SNP | intron | - | C | 0.21 0.93 0.62 1.05
CUBN | rs11254248 | Tag SNP | intron | - | G |
CUBN | rs2458399 | Tag SNP | intron | - | A | 0.24 1.05 0.085 0.86
CUBN | rs1276721 | Tag SNP | intron | - | G | 0.73 1.05
CUBN | rs7068023 | Tag SNP | intron | - | G | 0.78 1.01 0.99 1.00
CUBN | rs780811 | Tag SNP | intron | - | C | 0.66 0.96
CUBN | rs17139398 | Tag SNP | intron | - | C | 0.16 1.22
CUBN | rs17139411 | Tag SNP | intron | - | A | 0.33 0.94 0.33 1.11
CUBN | rs780837 | Tag SNP | intron | - | G | 0.10 1.08 0.80 1.02
CUBN | rs3808925 | Tag SNP | intron | - | C | 0.77 0.98 0.019 1.30
CUBN | rs11254276 | Tag SNP | intron | - | G | 0.10 1.18
CUBN | rs12251746 | Tag SNP | intron | - | T | 0.24 1.08 0.59 1.07
CUBN | rs809698 | Tag SNP | intron | - | A | 0.85 0.98
CUBN | rs780635 | Tag SNP | 3’utr | - | C | 0.23 1.11
CUBN | rs2603794 | Tag SNP | intron | - | G | 0.37 0.89
CUBN | rs11814160 | Tag SNP | intron | - | C | 0.60 1.11
CUBN | rs10904824 | Tag SNP | intron | - | A |
CUBN | rs7897716 | Tag SNP | intron | - | C | 0.74 0.98 0.076 1.19
CUBN | rs7099178 | Tag SNP | intron | - | C | 0.37 0.92
CUBN | rs703062 | Tag SNP | intron | - | T | 0.79 1.03 0.47 0.94
CUBN | rs7908449 | Tag SNP | intron | - | A | 0.068 1.17
CUBN | rs7087545 | Tag SNP | intron | - | G | 0.17 0.93 0.24 1.12
CUBN | rs7075036 | Tag SNP | intron | - | C | 0.35 0.92
CUBN | rs7921129 | Tag SNP | intron | - | T | 0.047 0.84
CUBN | rs17139378 | Tag SNP | intron | - | C | 0.34 0.96 0.57 0.95
CUBN | rs1891473 | Tag SNP | intron | - | T | 0.75 0.99 0.47 0.94
CUBN | rs7904368 | Tag SNP | intron | - | C | 0.25 1.10
CUBN | rs12049735 | Tag SNP | intron | - | G | 0.75 0.95
CUBN | rs10752062 | Tag SNP | intron | - | T | 0.75 0.98 0.84 1.02
CUBN | rs11254267 | Tag SNP | intron | - | A | 0.077 0.87 0.018 1.43
CUBN | rs780807 | Tag SNP | intron | - | A | 0.52 1.08 0.70 1.02 0.26 0.90
CUBN | rs796667 | Tag SNP | intron | - | C | 0.43 1.03 0.18 0.90
CUBN | rs812975 | Tag SNP | intron | - | T | 0.55 0.95
CUBN | rs3740169 | Tag SNP | intron | - | T | 0.71 0.98 0.42 1.07
CUBN | rs11254244 | Tag SNP | intron | - | C | 0.47 1.04 0.44 0.93

T2D, type 2 diabetes; ESKD, end stage kidney disease; MAF, minor allele frequency; OR, odds ratio; ESP, Exome Sequencing Project; Pro dam, probably damaging; Pos dam, possibly damaging; N/A, not available (six SNPs failed during genotyping or QC in non-T2D ESKD samples); *: 529 T2D-ESRD cases vs 535 non-T2D, non-nephropathy controls; #: 2041 T2D-ESRD cases vs 667 T2D+1140 non-T2D, non-nephropathy controls; Δ: 885 non-T2D ESRD cases vs 721 non-T2D, non-nephropathy controls.


Supplementary Table S2. Top hits from T2D-GENES

Gene | SNP | Minor allele | Number Case/control | MAF Case/control | P-value | OR | 95% CI
CUBN | rs1873469 | A | 512/502 | 0.26/0.33 | 0.0034 | 0.72 | 0.58, 0.90
CUBN | rs144360241 | C | 512/502 | 0.0020/0.0010 | 0.022 | 16.1 | 1.51, 172.20
CUBN | rs148100631 | C | 512/502 | 0.0010/0.0050 | 0.039 | 0.073 | 0.0061, 0.88
CUBN | rs186593805 | A | 512/502 | 0.0010/0.0030 | 0.072 | 0.079 | 0.0050, 1.25
LRP2 | rs17848169 | C | 512/502 | 0.0039/0.012 | 0.011 | 0.19 | 0.053, 0.68
LRP2 | rs34291900 | T | 511/501 | 0.0029/0.011 | 0.015 | 0.17 | 0.039, 0.71
LRP2 | rs139916510 | A | 510/501 | 0.0059/0.013 | 0.024 | 0.25 | 0.074, 0.83
LRP2 | rs150373759 | A | 512/502 | 0.0020/0.0090 | 0.051 | 0.19 | 0.035, 1.01
LRP2 | rs78750385 | G | 511/501 | 0.084/0.10 | 0.062 | 0.71 | 0.50, 1.02
LRP2 | rs73970129 | C | 512/502 | 0.034/0.021 | 0.062 | 1.80 | 0.97, 3.33
LRP2 | rs150122414 | A | 512/502 | 0.027/0.040 | 0.064 | 0.58 | 0.33, 1.03
LRP2 | rs148452352 | T | 512/502 | 0.0049/0.010 | 0.067 | 0.33 | 0.10, 1.08
LRP2 | rs115854094 | G | 510/500 | 0.011/0.0070 | 0.067 | 2.67 | 0.93, 7.65
LRP2 | rs11898106 | G | 510/501 | 0.26/0.29 | 0.068 | 0.81 | 0.64, 1.02
LRP2 | rs144864408 | T | 512/502 | 0.0010/0.0020 | 0.091 | 0.090 | 0.0055, 1.47

MAF, minor allele frequency; OR, odds ratio; 95% CI, 95% confidence interval.


Supplementary Table S3. Association analysis between CUBN and LRP2 variants with T2D-ESKD (additive, APOL1 risk removedΔ)

T2D-GENES samples: T2D-ESKD cases vs healthy controls. AXIOM samples: T2D-ESKD cases vs T2D-only+healthy controls.

GENE | SNP | Minor allele | T2D-GENES N case/control | T2D-GENES P-value | T2D-GENES OR | AXIOM N case/control | AXIOM P-value | AXIOM OR | Meta P-value | Meta OR | Direction
LRP2 | rs17848169 | C | 400/441 | 0.018 | 0.19 | 1615/1379 | 0.055 | 0.59 | 0.0078 | 0.50 | --
LRP2 | rs34291900 | T | 399/440 | 0.015 | 0.15 | 1615/1383 | 0.37 | 0.75 | 0.081 | 0.60 | --
LRP2 | rs4667591 | G | 399/441 | 0.68 | 1.06 | 1611/1381 | 0.25 | 0.93 | 0.37 | 0.95 | +-
LRP2 | rs144081819 | T | 400/441 | 0.31 | 1.38 | 1616/1383 | 0.69 | 0.95 | 0.99 | 1.00 | +-
LRP2 | rs143367996 | A | 400/441 | 0.38 | 1.36 | 1614/1383 | 0.76 | 1.05 | 0.52 | 1.10 | ++
LRP2 | rs61995913 | C | 400/441 | 0.36 | 0.77 | 1614/1381 | 0.50 | 0.93 | 0.34 | 0.91 | --
LRP2 | rs116456291 | T | 400/441 | 0.38 | 1.35 | 1612/1379 | 0.95 | 0.99 | 0.80 | 1.03 | +-
LRP2 | rs144864408 | T | 400/441 | 0.99 | 0.00 | 1615/1383 | 0.83 | 0.82 | 0.83 | 0.82 | --
CUBN | rs1873469 | A | 400/441 | 0.043 | 0.78 | 1612/1382 | 0.81 | 0.99 | 0.32 | 0.95 | --
CUBN | rs144360241 | C | 400/441 | 0.031 | 15.1 | 1616/1384 | 0.88 | 1.11 | 0.25 | 1.99 | ++
CUBN | rs148100631 | C | 400/441 | 0.038 | 0.070 | 1615/1382 | 0.31 | 0.25 | 0.027 | 0.13 | --
CUBN | rs1801239 | C | 400/441 | 0.87 | 0.95 | 1612/1382 | 0.062 | 1.31 | 0.10 | 1.25 | -+
CUBN | rs74431427 | A | 400/441 | 0.85 | 0.93 | 1610/1380 | 0.77 | 1.05 | 0.85 | 1.03 | -+
CUBN | rs111265129 | C | 400/441 | 0.32 | 0.74 | 1614/1379 | 0.48 | 1.09 | 0.75 | 1.03 | -+
CUBN | rs780807 | A | 400/441 | 0.68 | 1.06 | 1608/1381 | 0.35 | 1.05 | 0.30 | 1.05 | ++
CUBN | rs2271460 | C | 400/441 | 0.43 | 1.45 | 1612/1379 | 0.50 | 1.19 | 0.33 | 1.25 | ++
CUBN | rs2271462 | T | 400/441 | 0.37 | 1.16 | 1611/1379 | 0.60 | 0.97 | 0.89 | 0.99 | +-
CUBN | rs12259370 | T | 400/441 | 0.60 | 1.14 | 1581/1352 | 0.46 | 1.07 | 0.39 | 1.08 | ++

T2D, type 2 diabetes; ESKD, end-stage kidney disease; MAF, minor allele frequency; OR, odds ratio; 95% CI, 95% confidence interval; sample sizes may be smaller than those displayed in Table 2 due to removal of individuals with two APOL1 renal risk alleles. Δ: Individuals with two APOL1 nephropathy-risk variants were removed, adjusted for age, sex, and ancestry.


Chapter 6

Discussion and Conclusions

Diabetic kidney disease (DKD) is the most devastating complication of type 2 diabetes

(T2D), accounting for almost the entire excess mortality in T2D patients; the risk of death among diabetic patients without kidney disease is similar to that of the general population (Afkarian et al. 2013). Each year, ~120,000 individuals develop end-stage kidney disease (ESKD) in the US, and nearly 44% of these cases are caused by DKD (USRDS, 2016). The mortality rates for ESKD, dialysis, and transplant patients were 136, 166, and 30 per 1,000 patient-years, respectively, in

2014 (USRDS, 2016). In addition, the annual expenditure for ESKD borne solely by Medicare exceeded $32.8 billion in 2014, accounting for 7.2% of all Medicare paid claims costs (USRDS,

2016). The high mortality rate and extensive medical treatment have made ESKD a great health burden worldwide. Considerable effort has been devoted to DKD research, aiming to identify the major risk factors and ultimately novel therapeutic targets. A number of contributors to

DKD have been suggested by previous reports; these include family history, glycemia, hemoglobin A1c, albuminuria, duration of diabetes, serum uric acid, systolic blood pressure, dyslipidemia, obesity, and smoking (Zoppini et al. 2012; Macisaac et al. 2014; Radcliffe et al.

2017). Notably, multiple lines of evidence, including family studies and GWAS, have suggested a strong genetic component in DKD (Seaquist et al. 1989; Spray et al. 1995; Freedman et al.

1995; Pezzolesi et al. 2009; McDonough et al. 2011; Sandholm et al. 2012; Iyengar et al. 2015).

However, the majority of DKD genetic studies suffer from limited sample sizes, as well as heterogeneous disease etiologies in the case group, which may lead to both false negative results and false positive discoveries (Sandholm et al. 2017). While the APOL1 renal-risk variants explain a substantial proportion of disease risk of non-diabetic ESKD (Genovese et al.

2010), the genetic architecture underlying DKD remains unclear. It is unclear whether genetic

findings from studies of kidney function, milder chronic kidney disease (CKD) and non-diabetic

ESKD are transferable to the risk prediction of ESKD attributed to T2D, which may involve different pathophysiology. Studies of kidney diseases in African Americans (AAs) are relatively limited as compared to Europeans. Therefore, the major goal of these thesis projects is to gain further insight into the genetic landscape of T2D-ESKD in AAs.

Throughout these studies, we utilized a vast wealth of clinical and genetic data in an effort to identify genetic variants associated with T2D-ESKD. Specifically, these projects involved up to 15,075 AA individuals with T2D-ESKD, T2D, non-diabetic ESKD and control subjects recruited at Wake Forest and collaborative study sites. Their clinical records and lab measurements were carefully evaluated to classify subjects by diabetes and kidney function status. A variety of genetic data, produced by GWAS chips with additional exome and/or custom content and by exome sequencing, provided high genomic coverage. The richness of the datasets allowed us to explore several hypotheses from multiple perspectives. First, we evaluated the “common disease-common variant” hypothesis in chapter 2 by performing an extended T2D-ESKD GWAS incorporating 3,432 T2D-ESKD cases and 6,977 non-diabetic non-nephropathy controls across six cohorts (WFSM, FIND, ARIC, MESA, CARDIA, JHS). It was followed by a discrimination analysis with 2,756 T2D non-nephropathy individuals to remove signals attributed to T2D alone. An additional cohort of 1,910 non-diabetic ESKD subjects was used to assess the contribution of T2D-ESKD associations to general forms of kidney disease.

This study revealed genome-wide significant associations with T2D-ESKD in 7 regions, including LOC101929282/RBM43, LINC01322, RBFOX3/MIR4739, ENPP7, GNG7, APOL1, and TCF7L2. In addition, one locus, LINC00460/EFNB2, reached genome-wide significant association with all-cause ESKD under the baseline model. Our results are not consistent with findings from studies of CKD, which include milder forms of kidney disease and measures of kidney function; only three genes showed association with either diabetic or non-diabetic ESKD, suggesting

different pathogenesis. These results highlight the importance of studying advanced disease status to identify genetic variants that otherwise may not be uncovered using biochemical kidney markers or milder forms of disease. Second, we explored the “common disease-rare variant” hypothesis for T2D-ESKD, that is, that rare variants located in coding regions may partially account for disease susceptibility. We performed a comprehensive association study in

1,730 AA individuals with exome-sequencing data, followed by replication in an independent AA cohort containing 3,806 subjects. In addition, an all-cause ESKD cohort consisting of 3,048 samples was used to examine the role of T2D-ESKD variants in general causes of ESKD.

Fifteen nominally associated T2D-ESKD loci were identified, supporting the view that genetic variation in coding regions may explain part of the genetic predisposition to T2D-ESKD in AAs. The most significant locus was RREB2, which was associated with both T2D-ESKD and non-diabetic ESKD.

There is no enrichment between exome-sequencing study signals and those arising from our

GWAS in chapter 2. Thus the suggestive ESKD associations generated by variants in coding regions seem independent of the non-coding variants identified by GWAS. Finally, in chapters 4 and 5 we hypothesized that genes important in kidney structure or kidney function may contribute to the disease risk of DKD. Genome-wide genetic approaches usually suffer from the multiple comparisons problem. Thus, a hypothesis-driven candidate gene method, which incorporates previous findings and biological knowledge, provides a complementary approach that focuses on a smaller number of regions. In chapter 4 we assessed 47 kidney structure-related genes in 5 AA cohorts for association with T2D-ESKD. Overall, 9 variants (2 missense and 7 intronic, upstream, or downstream) in 7 distinct candidate regions achieved locus-wide significance. These regions include CD2AP, MMP2, TTC21B, COL4A3, NPHP3-ACAD11,

CLDN8 and ARHGAP24, which are implicated in podocyte and tubulointerstitium-related structures. In chapter 5, we investigated 66 genetic variants located in CUBN and LRP2 for association with both T2D-ESKD and non-diabetic ESKD. We established that low-frequency variants from CUBN and LRP2 modulate susceptibility to T2D-ESKD rather than non-diabetic

etiologies of ESKD. These projects represent a systematic evaluation of both common genetic risk factors and rare coding variants for the most severe kidney complication in AAs with T2D.

We performed various association, enrichment, and bioinformatic inference analyses to obtain insights into the genetic architecture, as well as biological pathophysiology for T2D-ESKD.

T2D-ESKD GWAS Summary

Previous GWASs have discovered >70 genome-wide significant hits associated with kidney disease or its indicators, mainly in early-stage, non-diabetic kidney disease in Europeans (Köttgen et al. 2010; Pattaro et al. 2012, 2016; Tin et al. 2013b). However, efforts to identify DKD genetic susceptibility variants have had limited success, partially due to limited power and heterogeneous disease etiologies (Maeda 2004; Pezzolesi et al. 2009; McDonough et al. 2011; Sandholm et al. 2012, 2017; Iyengar et al. 2015). We therefore extended our previous effort to investigate genetic susceptibility to T2D-ESKD in 15,075 AAs through a high-density GWAS. We located seven independent genome-wide significant associations with T2D-ESKD in LOC101929282/RBM43, LINC01322, RBFOX3/MIR4739, ENPP7, GNG7, APOL1, and TCF7L2. In addition, we identified two genome-wide significant all-cause ESKD loci located in LINC00460/EFNB2 and APOL1.

The identification of APOL1 variant rs9622363, which is in moderate linkage disequilibrium (LD) with the APOL1 G1 and G2 alleles, may suggest that some T2D-ESKD samples in our case group were misclassified. When we excluded APOL1 risk carriers to minimize the impact of misclassification, the association of rs9622363 was attenuated, which indicates that rs9622363 and the APOL1 G1 and G2 alleles contribute to the same signal.

Variations in two T2D-ESKD loci, LOC101929282/RBM43 and TCF7L2, showed strong evidence of association with T2D in previous reports (Palmer et al. 2012; Saxena et al. 2013; DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium 2014; Ng et al. 2014b). It is noteworthy that neither of these two loci showed any evidence of association with T2D in this study. In addition, variations in TCF7L2 have previously shown compelling correlation with the development of kidney disease in both diabetic and non-diabetic individuals (Köttgen et al. 2008; Araoka et al. 2010; Fan et al. 2016). Moreover, experimental follow-up suggests that TCF7L2 contributes to the progression of DKD by regulating the activin receptor-like kinase 1 (ALK1)/Smad1 pathway (Araoka et al. 2010). In sum, our findings provide evidence of a potential pleiotropic effect at LOC101929282/RBM43 and TCF7L2 on T2D and kidney disease in AAs.

Another encouraging discovery of chapter 2 is a genome-wide significant association with all-cause ESKD identified at LINC00460/EFNB2. Previous linkage analyses detected a chromosomal region, 13q33, which includes EFNB2 and harbors significant risk for both diabetic and non-diabetic ESKD in AAs (Bowden et al. 2004; Freedman et al. 2005). A follow-up study examined 28 tag SNPs spanning the 39 kilobases (kb) of the EFNB2 coding region for association with all-cause ESKD and observed nominal association at two SNPs (Hicks et al. 2008). Ephrin-B2 (EFNB2) is expressed in the developing nephron; it appears to play an important role in glomerular microvascular assembly by interacting with its receptors (Takahashi et al. 2001). Ephrin-B1 was also found to co-localize with CD2-associated protein (CD2AP) and nephrin at the podocyte slit diaphragm, where it plays an important role in maintaining barrier function (Hashimoto et al. 2007). Interestingly, variation in CD2AP showed significant association with T2D-ESKD in AAs in chapter 4. Taken together, multiple lines of evidence support a role for EFNB2 in kidney development and function. It is the most promising causal gene underlying the association at the LINC00460/EFNB2 region. A more comprehensive investigation of the EFNB2 gene is necessary to further understand the pathophysiology associated with ESKD.

T2D-ESKD Exome-sequencing Project Summary

Despite the countless disease associations identified by GWAS, the associated variants are mainly non-coding, have modest effects, and cumulatively explain only a small proportion of disease liability. Therefore, in chapter 3, we evaluated the contribution of genetic variants, especially low-frequency variants, located in coding regions in AA T2D patients on dialysis or at high risk for ESKD. The analyses did not reveal associations at the conventionally accepted exome-wide significance threshold (P<5×10⁻⁷). However, meta-analysis of 4,533 discovery and replication samples revealed 11 suggestive T2D-ESKD associations (P<1×10⁻⁴) from eight regions. In addition, removal of APOL1 renal-risk genotype carriers identified four additional associations.

These findings suggest that exome sequencing is a powerful tool for discovering the missing disease heritability attributed to low-frequency genetic variants. For instance, a low-frequency variant (rs74678433, MAF=0.027) located in the GRM8 region revealed strong evidence of association (P=5.96×10⁻⁶) with T2D-ESKD. The product of this gene, metabotropic glutamate receptor 8, was associated with weight gain in mouse models (Duvoisin et al. 2005; Davis et al. 2013). This finding may suggest a shared genetic background between obesity and T2D-ESKD, since obesity is one of the risk factors for DKD. Another example is the discovery of two missense variants associated with T2D-ESKD, rs834514 and rs268671, located in RAD51AP2 and PRX, respectively. Rs834514 may regulate the expression of RAD51AP2, GEN1, and VSNL1 in multiple tissues, including pancreas and adipose, according to the GTEx database (http://www.gtexportal.org/home/). The other missense variant (rs268671) is located in PRX, a gene that has been associated with an inherited neurological disorder, late-onset Charcot-Marie-Tooth (CMT) neuropathy. CMT neuropathy has been repeatedly reported to be associated with renal diseases, mostly focal segmental glomerulosclerosis (FSGS) (Nadal et al. 1998; Boyer et al. 2011; De Rechter et al. 2015). This finding may indicate potential involvement of Mendelian disease genes in DKD susceptibility. Previous studies have suggested that genetic variations in Mendelian disease genes may in part account for the genetic predisposition to common disease (Blair et al. 2013; Parsa et al. 2013a).

The most encouraging discovery of chapter 3 is the replication of the RREB1 gene in both T2D-ESKD and non-diabetic ESKD in AAs. Genetic variation in RREB1 was previously reported to be associated with fasting glucose, T2D susceptibility, fat distribution, and adipocyte development (Below et al. 2011; DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium 2014; Mahajan et al. 2015; Chu et al. 2017). In addition, RREB1 variants have been reported to be associated with kidney function and to interact with APOL1 renal-risk alleles (Yang et al. 2010; Bostrom et al. 2012). Relevant to ESKD susceptibility, the same variant, rs41302867, was identified in strong association with all-cause ESKD in both AAs and European Americans (Bonomo et al. 2014a). This study confirmed the role of RREB1 in general forms of kidney disease in AAs.

Gene-based analysis revealed two additional loci, TMEM5 and ILDR2, in suggestive association with T2D-ESKD. Ildr2 has been implicated in type 2 diabetes susceptibility and hepatic lipid metabolism in mouse models (Watanabe et al. 2013, 2016). The lack of consistency between gene-based analyses and single-variant analyses may be due to the insufficient statistical power of our study. According to a recent study, most genes require >100,000 samples in order to robustly detect an association using gene-based methods (Auer et al. 2016).

This project confirmed that low-frequency and rare variants located in coding sequence provide crucial information for understanding the genetic architecture of complex diseases such as DKD. In addition, the identification of GRM8, PEX6, ILDR2, RREB1, and PRX may indicate a potential genetic correlation between DKD and obesity, dyslipidemia, and the Mendelian disease CMT neuropathy.

Kidney Structure-Related Candidate Gene Project Summary

High-density GWAS suffer from the multiple comparisons problem when millions of variants are tested. Current adjustment procedures, such as Bonferroni correction, can lead to overly conservative results. On the other hand, the hypothesis-driven candidate gene method, which incorporates previous findings and biological knowledge into the study design, minimizes the burden of multiple comparisons because it targets a much smaller number of regions.
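The contrast between genome-wide and candidate-gene testing burdens can be illustrated with a minimal sketch of the Bonferroni adjustment. The test counts below (1,000,000 and 500) are hypothetical values chosen only for illustration; they are not the numbers of variants tested in this work, and the helper names are likewise illustrative.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value threshold that holds the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


def passes(p_value: float, alpha: float, n_tests: int) -> bool:
    """True if p_value is significant after Bonferroni correction."""
    return p_value < bonferroni_threshold(alpha, n_tests)


# Hypothetical numbers of tests, for illustration only.
genome_wide = bonferroni_threshold(0.05, 1_000_000)  # approximately 5e-8
candidate_gene = bonferroni_threshold(0.05, 500)     # approximately 1e-4

# The same P-value can fail a genome-wide threshold yet pass a
# candidate-gene threshold because far fewer tests are performed.
example_p = 1e-6
print(passes(example_p, 0.05, 1_000_000), passes(example_p, 0.05, 500))
```

Because the threshold scales inversely with the number of tests, restricting the analysis to a few hundred variants relaxes the per-test threshold by several orders of magnitude. Bonferroni correction also assumes independent tests, so it is conservative when variants are in LD.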

Hypothesis-driven studies provide unique opportunities to target candidate genes that have previously demonstrated strong associations with kidney function and structure. Microvascular changes, for instance abnormal capillary permeability and retinopathy, may precede early structural glomerular changes in patients with diabetes. Thus, variations in structural genes may manifest specifically in the kidneys, despite possible altered expression in other organs (Alpert et al. 1972; Girach and Vignati 2006). Therefore, in chapter 4 we examined 47 genes important in podocyte, GBM, mesangial cell, mesangial matrix, renal tubular cell, and renal interstitium structure for association with advanced DKD in AAs. These genes were carefully selected based on literature searches and online database queries. However, we were not able to include many other kidney structure-related genes, for example SLC5A2, due to poor genomic coverage of these regions on the genotyping arrays.

This study identified strong evidence of association with T2D-ESKD at seven regions. The most exciting finding was the identification of two missense variants located in COL4A3 and CLDN8, respectively. The alpha 3 chain of type IV collagen (COL4A3), which may be synthesized by podocytes, forms a heterotrimer with the alpha 4 and alpha 5 chains that is the major component of the GBM (Miner 2011). This gene has been associated with Alport Syndrome (AS), FSGS, and CKD (Kashtan 1995; Voskarides et al. 2007). A direct connection between COL4A3 and DKD has not been previously reported. Our finding suggests a strong candidacy of COL4A3 in DKD in AAs. Another missense variant was located in the claudin 8 gene (CLDN8). Claudin 8 is expressed in the thin descending limb of the loop of Henle, the distal tubule, and the collecting duct. It is required for the formation of the paracellular renal chloride channel through its interaction with claudin 4 (Kiuchi-Saishin et al. 2002; Li et al. 2004; Hou et al. 2010). Interestingly, other claudins have been associated with DKD (Gaut et al. 2014; Molina-Jijón et al. 2014). Given the vital role of claudins in the renal tubule and these previous studies, variants in claudin genes may affect the progression of kidney disease under the stress of hyperglycemia. A variant located at ARHGAP24 is also worth mentioning. ARHGAP24 plays a role in podocyte differentiation and contributes to the balance of podocyte RhoA and Rac1 signaling (Akilesh et al. 2011). In addition, a mutation in this gene has been associated with familial FSGS.

Furthermore, we performed bioinformatic characterization to evaluate the functional relevance of the identified variants. Multiple lines of evidence support that the T2D-ESKD associated variants at COL4A3, CLDN8, and ARHGAP24 are potentially pathogenic. These findings may aid the search for novel therapeutic targets for DKD.

CUBN and LRP2 Project Summary

Albuminuria is one of the most frequently assessed markers of kidney damage in clinical practice because it reflects damage to glomerular macromolecular ultrafiltration and/or dysfunctional proximal tubule processing of albumin (Roscioni et al. 2014). Higher levels of albuminuria, even within the low normal range, are associated with increased risks of ESRD, all-cause mortality, and cardiovascular mortality (Hillege et al. 2002; Astor et al. 2011; van der Velde et al. 2011). The majority of albumin that enters the proximal tubules is reclaimed by a functional receptor complex formed by the endocytic proteins cubilin (CUBN) and megalin (LRP2) and reintroduced into the bloodstream (Birn et al. 2000; Woredekal and Friedman 2009; Dickson et al. 2014). A missense variant (I2984V) in CUBN was associated with elevated urinary albumin-to-creatinine ratio (UACR) and microalbuminuria in a meta-analysis comprising 63,153 European Americans and 6,981 AAs performed by the CKDGen Consortium (Böger et al. 2011). Therefore, an investigation of genetic variants in CUBN and LRP2 was warranted to better understand the increased risk for DKD in AAs.

One important finding of this study was the replicated association of the CUBN index variant rs1801239 with risk of T2D-ESKD; the same variant was previously associated with albuminuria. In addition, we identified a novel LRP2 missense variant, rs17848169 (N2632D), that was protective against T2D-ESKD. In contrast, no CUBN or LRP2 variants were significantly associated with non-diabetic etiologies of ESKD in AAs. These results suggest a crucial role for the cubilin-megalin complex in DKD development in populations with recent African ancestry. Recent studies support that the endocytic reabsorption of filtered albumin plays a critical role in health, because glomerular filtration of albumin seems to be greater than initially expected (Osicka et al. 2004; Russo et al. 2007; Gagliardini et al. 2010). CUBN encodes a large 460-kD glycosylated multiligand extracellular protein that interacts with a range of membrane proteins involved in endocytosis. Megalin is a 600-kD transmembrane protein initially identified as the autoantigen in Heymann nephritis, an experimental model for membranous nephropathy. The interaction of the multiligand endocytic receptor cubilin with megalin has been identified as essential for the uptake of filtered proteins in the kidney proximal tubule (Christensen et al. 2012). The CUBN index variant rs1801239 (I2984V) is located in the 22nd CUB domain of cubilin, one of the three segments that bind to megalin (Ahuja et al. 2008). In addition, the LRP2 variant rs17848169 (N2632D) is located in the extracellular LDL receptor repeat segments of megalin, where the ligand binding sites reside (Saito et al. 1994). Therefore, I2984V and N2632D may interfere with the interaction between cubilin and megalin and thereby affect albumin reabsorption. Genetic testing for CUBN and LRP2 risk variants in the early stages of DKD may allow early treatment to slow the progression of kidney dysfunction in diabetic patients.

Limitations

There are a number of limitations in these projects. Foremost is a limitation shared by the majority of DKD studies: because kidney biopsies are frequently lacking, individuals may be misclassified as having DKD as the primary cause of kidney disease simply because T2D is present. Without special attention, some T2D-ESKD signals may turn out to be associated with non-diabetic etiologies of ESKD. We attempted to minimize the effect of misclassification by removing patients with T2D who had other causes of kidney disease, for example, proteinuric FSGS or GBM diseases. In addition, we removed APOL1 risk-allele carriers, who likely had non-diabetic kidney diseases, in a secondary model (APOL1-negative model) analysis. These two methods should lead to enrichment for true DKD cases, and additional loci may be discovered because these samples were less heterogeneous, despite the smaller sample size. A similar approach was used in the identification of the FRMD3 association with DKD in non-APOL1 carriers (Freedman et al. 2011).

Another universal limitation of DKD studies is that associations driven solely by T2D are very difficult to differentiate from true DKD loci. Although a discrimination procedure was implemented in each study by comparing T2D patients lacking nephropathy to non-diabetic, non-nephropathy controls, this comparison often lacks statistical power to detect all associations due to the small sample size. Thus, we used a very stringent exclusion criterion (P<0.05) to remove any plausible T2D-driven associations.

A limitation specific to Chapter 2 is that we combined directly genotyped data from WFU, FIND, MESA, ARIC, JHS, and CARDIA for the association study in order to avoid case-control imbalance in individual studies, which would otherwise lead to inflated statistics. Despite the increased statistical power, combining genotype data across study centers may introduce artifacts and biases that can lead to spurious associations. We performed imputation based on the intersection of high-quality genotyped SNPs across studies, a strategy recommended by a previous study to eliminate biases (Johnson et al. 2013). We also applied principal components analysis to help control for artifact effects.

Furthermore, a limitation of Chapters 3-5 is the lack of statistical power to detect associations with rare and low-frequency variants. Although we brought together as many as 8,581 AA individuals in these projects, our study power remains moderate. Few other collections of appropriate AA samples exist, which limited possible replication studies.

Finally, we had limited resources to incorporate functional follow-up of identified loci in our lab. Although we utilized bioinformatic tools and online databases to gain insights into the functional relevance of identified T2D-ESKD loci, it is still necessary to perform in vitro and in vivo experimental verification of predicted molecular mechanisms. It would be interesting to study missense variants that were associated with T2D-ESKD in vitro and in vivo to further characterize attributes of those variants and aid in the development of potential drug targets.

Conclusions

In summary, this thesis work aimed to identify genetic variations contributing to advanced kidney disease in AA T2D patients. Most diabetic nephropathy studies examined a mosaic of case subjects, for example, individuals with varying levels of albuminuria or stages of CKD (Pezzolesi et al. 2009; Bowden and Freedman 2012; Williams et al. 2012). Because these diabetic nephropathy sub-phenotypes usually differ with respect to genetic etiology, it is not surprising that such studies have yielded relatively few consistent genetic associations. In contrast, ESRD is a well-defined, hard end point, which represents a significant advantage over studying dynamic quantitative traits or less severe phenotypes.

AAs carry a significantly greater risk of T2D-ESKD than other ethnic groups. However, it remains unclear how genetic background contributes to the increased T2D-ESKD susceptibility. It has been noted that AAs have smaller haplotype blocks: the median haplotype block length in those of African ancestry is 22 kb, whereas it is 44 kb in European and Asian populations (Gabriel et al. 2002). Lower degrees of LD in AAs provide a unique opportunity for fine mapping of causal variants in loci shared by AAs and other populations (Ng et al. 2013). However, differences in genetic variation across ethnicities result in a complex pattern, which further complicates drawing clear conclusions regarding the relationship between genetic risk factors and ethnic disparities in T2D-ESKD prevalence. For example, we replicated association at only 3 of the 76 genome-wide significant signals previously associated with kidney disease or function, which were mainly detected in Europeans. The lack of replication may be largely due to ancestral differences in genetic architecture.

In the studies presented here we identified a number of T2D-ESKD associated genetic regions, independent of APOL1. These include six missense variants: rs34505188 (R408H, OR=1.55) in COL4A3, rs55884670 (M97T, OR=1.35) in CLDN8, rs1801239 (I2984V, OR=1.31) in CUBN, rs17848169 (N2632D, OR=0.47) in LRP2, rs834514 (G1037D, OR=0.77) in RAD51AP2, and rs268671 (V882A, OR=0.75) in PRX. It is noteworthy that the RREB1 variant rs41302867 (OR=0.47) and a genome-wide significant variant, rs77113398 (OR=1.94), located upstream of EFNB2, revealed association with both T2D-ESKD and non-diabetic ESKD, which suggests involvement in general forms of kidney disease in African Americans.
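As a reading aid, the reported odds ratios for the six missense variants can be grouped by direction of effect. The sketch below is illustrative only: the variant identifiers, genes, and odds ratios are taken from the text above, while the function names and the simple OR>1 versus OR<1 rule are assumptions of this example. The rule ignores confidence intervals and depends on which allele was coded as the effect allele, and only rs17848169 is explicitly described as protective in the text.

```python
# Odds ratios for the six missense variants, as reported in the text.
MISSENSE_VARIANTS = {
    "rs34505188": ("COL4A3", 1.55),
    "rs55884670": ("CLDN8", 1.35),
    "rs1801239": ("CUBN", 1.31),
    "rs17848169": ("LRP2", 0.47),
    "rs834514": ("RAD51AP2", 0.77),
    "rs268671": ("PRX", 0.75),
}


def effect_direction(odds_ratio: float) -> str:
    """Classify an odds ratio by direction relative to the effect allele."""
    if odds_ratio <= 0:
        raise ValueError("an odds ratio must be positive")
    if odds_ratio > 1:
        return "risk"
    if odds_ratio < 1:
        return "protective"
    return "neutral"


def group_by_direction(variants: dict) -> dict:
    """Map each direction label to the set of variant IDs with that direction."""
    groups: dict = {}
    for rsid, (_gene, odds_ratio) in variants.items():
        groups.setdefault(effect_direction(odds_ratio), set()).add(rsid)
    return groups


groups = group_by_direction(MISSENSE_VARIANTS)
for direction, rsids in sorted(groups.items()):
    print(direction, sorted(rsids))
```

Under these assumptions, the variants in COL4A3, CLDN8, and CUBN fall in the risk-increasing group, and those in LRP2, RAD51AP2, and PRX fall in the risk-decreasing group.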

Collectively, the association between these variants and T2D-ESKD has markedly expanded our understanding of genetic architecture of T2D-ESKD in African Americans, and will aid in future genetic T2D-ESKD prognostication, as well as the identification of tractable therapeutic targets.

References

Abu Seman N, He B, Ojala JRM, et al (2014) Genetic and biological effects of sodium-chloride cotransporter (SLC12A3) in diabetic nephropathy. Am J Nephrol 40:408–416. doi: 10.1159/000368916

Adzhubei IA, Schmidt S, Peshkin L, et al (2010) A method and server for predicting damaging missense mutations. Nat Methods 7:248–249. doi: 10.1038/nmeth0410-248

Afkarian M, Sachs MC, Kestenbaum B, et al (2013) Kidney Disease and Increased Mortality Risk in Type 2 Diabetes. J Am Soc Nephrol 24:302–308. doi: 10.1681/ASN.2012070718

Ahluwalia TS, Lindholm E, Groop LC (2011) Common variants in CNDP1 and CNDP2, and risk of nephropathy in type 2 diabetes. Diabetologia 54:2295–2302. doi: 10.1007/s00125-011-2178-5

Ahuja R, Yammani R, Bauer JA, et al (2008) Interactions of cubilin with megalin and the product of the amnionless gene (AMN): effect on its stability. Biochem J 410:301–308. doi: 10.1042/BJ20070919

Akilesh S, Suleiman H, Yu H, et al (2011) Arhgap24 inactivates Rac1 in mouse podocytes, and a mutant form is associated with familial focal segmental glomerulosclerosis. J Clin Invest 121:4127–4137. doi: 10.1172/JCI46458

Alpert JS, Coffman JD, Balodimos MC, et al (1972) Capillary Permeability and Blood Flow in Skeletal Muscle of Patients with Diabetes Mellitus and Genetic Prediabetes. N Engl J Med 286:454–460. doi: 10.1056/NEJM197203022860903

Amsellem S, Gburek J, Hamard G, et al (2010) Cubilin is essential for albumin reabsorption in the renal proximal tubule. J Am Soc Nephrol JASN 21:1859–1867. doi: 10.1681/ASN.2010050492

Andres A-C, Munarini N, Djonov V, et al (2003) EphB4 receptor tyrosine kinase transgenic mice develop glomerulopathies reminiscent of aglomerular vascular shunts. Mech Dev 120:511–516.

Araoka T, Abe H, Tominaga T, et al (2010) Transcription factor 7-like 2 (TCF7L2) regulates activin receptor-like kinase 1 (ALK1)/Smad1 pathway for development of diabetic nephropathy. Mol Cells 30:209–218. doi: 10.1007/s10059-010-0109-9

Astor BC, Matsushita K, Gansevoort RT, et al (2011) Lower estimated glomerular filtration rate and higher albuminuria are associated with mortality and end-stage renal disease. A collaborative meta-analysis of kidney disease population cohorts. Kidney Int 79:1331–1340. doi: 10.1038/ki.2010.550

Auer PL, Reiner AP, Wang G, et al (2016) Guidelines for Large-Scale Sequence-Based Complex Trait Association Studies: Lessons Learned from the NHLBI Exome Sequencing Project. Am J Hum Genet 99:791. doi: 10.1016/j.ajhg.2016.08.012

Badal SS, Danesh FR (2014) New Insights Into Molecular Mechanisms of Diabetic Kidney Disease. Am J Kidney Dis 63:S63–S83. doi: 10.1053/j.ajkd.2013.10.047

Below JE, Gamazon ER, Morrison JV, et al (2011) Genome-wide association and meta-analysis in populations from Starr County, Texas, and Mexico City identify type 2 diabetes susceptibility loci and enrichment for expression quantitative trait loci in top signals. Diabetologia 54:2047–2055. doi: 10.1007/s00125-011-2188-3

Bien SA, Wojcik GL, Zubair N, et al (2016) Strategies for Enriching Variant Coverage in Candidate Disease Loci on a Multiethnic Genotyping Array. PLOS ONE 11:e0167758. doi: 10.1371/journal.pone.0167758

Birn H, Fyfe JC, Jacobsen C, et al (2000) Cubilin is an albumin binding protein important for renal tubular albumin reabsorption. J Clin Invest 105:1353–1361. doi: 10.1172/JCI8862

Blair DR, Lyttle CS, Mortensen JM, et al (2013) A Nondegenerate Code of Deleterious Variants in Mendelian Loci Contributes to Complex Disease Risk. Cell 155:70–80. doi: 10.1016/j.cell.2013.08.030

Böger CA, Chen M-H, Tin A, et al (2011) CUBN is a gene locus for albuminuria. J Am Soc Nephrol JASN 22:555–570. doi: 10.1681/ASN.2010060598

Bonomo JA, Guan M, Ng MCY, et al (2014a) The ras responsive transcription factor RREB1 is a novel candidate gene for type 2 diabetes associated end-stage kidney disease. Hum Mol Genet. doi: 10.1093/hmg/ddu362

Bonomo JA, Ng MCY, Palmer ND, et al (2014b) Coding variants in nephrin (NPHS1) and susceptibility to nephropathy in African Americans. Clin J Am Soc Nephrol CJASN 9:1434–1440. doi: 10.2215/CJN.00290114

Bostrom MA, Kao WHL, Li M, et al (2012) Genetic association and gene-gene interaction analyses in African American dialysis patients with nondiabetic nephropathy. Am J Kidney Dis Off J Natl Kidney Found 59:210–221. doi: 10.1053/j.ajkd.2011.09.020

Bostrom MA, Lu L, Chou J, et al (2010) Candidate genes for non-diabetic ESRD in African Americans: a genome-wide association study using pooled DNA. Hum Genet 128:195–204. doi: 10.1007/s00439-010-0842-3

Bowden DW, Colicigno CJ, Langefeld CD, et al (2004) A genome scan for diabetic nephropathy in African Americans. Kidney Int 66:1517–1526. doi: 10.1111/j.1523-1755.2004.00915.x

Bowden DW, Freedman BI (2012) The challenging search for diabetic nephropathy genes. Diabetes 61:1923–1924. doi: 10.2337/db12-0596

Boyer O, Nevo F, Plaisier E, et al (2011) INF2 mutations in Charcot-Marie-Tooth disease with glomerulopathy. N Engl J Med 365:2377–2388. doi: 10.1056/NEJMoa1109122

Brown D, Paunescu TG, Breton S, Marshansky V (2009) Regulation of the V-ATPase in kidney epithelial cells: dual role in acid-base homeostasis and vesicle trafficking. J Exp Biol 212:1762–1772. doi: 10.1242/jeb.028803

Bryc K, Durand EY, Macpherson JM, et al (2015) The Genetic Ancestry of African Americans, Latinos, and European Americans across the United States. Am J Hum Genet 96:37–53. doi: 10.1016/j.ajhg.2014.11.010

Bryc K, Velez C, Karafet T, et al (2010) Colloquium paper: genome-wide patterns of population structure and admixture among Hispanic/Latino populations. Proc Natl Acad Sci U S A 107 Suppl 2:8954–8961. doi: 10.1073/pnas.0914618107

Byrne C, Nedelman J, Luke RG (1994) Race, socioeconomic status, and the development of end-stage renal disease. Am J Kidney Dis Off J Natl Kidney Found 23:16–22.

Ceriello A, De Cosmo S, Rossi MC, et al (2017) Variability in HbA1c, blood pressure, lipid parameters and serum uric acid and risk of development of chronic kidney disease in type 2 diabetes. Diabetes Obes Metab. doi: 10.1111/dom.12976

Chambers JC, Zhang W, Lord GM, et al (2010) Genetic loci influencing kidney function and chronic kidney disease. Nat Genet 42:373–375. doi: 10.1038/ng.566

Chang Y-H, Chang D-M, Lin K-C, et al (2013) High-density lipoprotein cholesterol and the risk of nephropathy in type 2 diabetic patients. Nutr Metab Cardiovasc Dis NMCD 23:751–757. doi: 10.1016/j.numecd.2012.05.005

Chen H, Wang C, Conomos MP, et al (2016) Control for Population Structure and Relatedness for Binary Traits in Genetic Association Studies via Logistic Mixed Models. Am J Hum Genet 98:653–666. doi: 10.1016/j.ajhg.2016.02.012

Christensen EI, Birn H (2002) Megalin and cubilin: multifunctional endocytic receptors. Nat Rev Mol Cell Biol 3:256–266. doi: 10.1038/nrm778

Christensen EI, Birn H, Storm T, et al (2012) Endocytic Receptors in the Renal Proximal Tubule. Physiology 27:223–236. doi: 10.1152/physiol.00022.2012

Christensen EI, Nielsen R, Birn H (2013) From bowel to kidneys: the role of cubilin in physiology and disease. Nephrol Dial Transplant Off Publ Eur Dial Transpl Assoc - Eur Ren Assoc 28:274–281. doi: 10.1093/ndt/gfs565

Christensen EI, Verroust PJ, Nielsen R (2009) Receptor-mediated endocytosis in renal proximal tubule. Pflugers Arch 458:1039–1048. doi: 10.1007/s00424-009-0685-8

Chu AY, Deng X, Fisher VA, et al (2017) Multiethnic genome-wide meta-analysis of ectopic fat depots identifies loci associated with adipocyte development and differentiation. Nat Genet 49:125–130. doi: 10.1038/ng.3738

Chun S, Fay JC (2009) Identification of deleterious mutations within three human genomes. Genome Res 19:1553–1561. doi: 10.1101/gr.092619.109

Cingolani P, Platts A, Wang LL, et al (2012) A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly (Austin) 6:80–92. doi: 10.4161/fly.19695

The 1000 Genomes Project Consortium (2010) A map of human genome variation from population-scale sequencing. Nature 467:1061–1073. doi: 10.1038/nature09534

Cooper JN, Wei L, Fernandez SA, et al (2015) Pre-operative prediction of surgical morbidity in children: comparison of five statistical models. Comput Biol Med 57:54–65. doi: 10.1016/j.compbiomed.2014.11.009

Cui S, Verroust PJ, Moestrup SK, Christensen EI (1996) Megalin/gp330 mediates uptake of albumin in renal proximal tubule. Am J Physiol 271:F900-907.

Davis MJ, Duvoisin RM, Raber J (2013) Related functions of mGlu4 and mGlu8. Pharmacol Biochem Behav 111:11–16. doi: 10.1016/j.pbb.2013.07.022

Davydov EV, Goode DL, Sirota M, et al (2010) Identifying a High Fraction of the Human Genome to be under Selective Constraint Using GERP++. PLOS Comput Biol 6:e1001025. doi: 10.1371/journal.pcbi.1001025

de Boer IH, Rue TC, Hall YN, et al (2011) Temporal trends in the prevalence of diabetic kidney disease in the United States. JAMA 305:2532–2539. doi: 10.1001/jama.2011.861

De Rechter S, De Waele L, Levtchenko E, Mekahli D (2015) Charcot–Marie–Tooth: Are you testing for proteinuria? Eur J Paediatr Neurol 19:1–5. doi: 10.1016/j.ejpn.2014.08.004

Delaneau O, Marchini J, Zagury J-F (2012) A linear complexity phasing method for thousands of genomes. Nat Methods 9:179–181. doi: 10.1038/nmeth.1785

Demokan S, Chuang AY, Chang X, et al (2013) Identification of guanine nucleotide-binding protein γ-7 as an epigenetically silenced gene in head and neck cancer by gene expression profiling. Int J Oncol 42:1427–1436. doi: 10.3892/ijo.2013.1808

Deshmukh HA, Palmer CNA, Morris AD, Colhoun HM (2013) Investigation of known estimated glomerular filtration rate loci in patients with type 2 diabetes. Diabet Med J Br Diabet Assoc 30:1230–1235. doi: 10.1111/dme.12211

DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium (2014) Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility. Nat Genet. doi: 10.1038/ng.2897

DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium, Asian Genetic Epidemiology Network Type 2 Diabetes (AGEN-T2D) Consortium, South Asian Type 2 Diabetes (SAT2D) Consortium, et al (2014) Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility. Nat Genet 46:234–244. doi: 10.1038/ng.2897

Dickson LE, Wagner MC, Sandoval RM, Molitoris BA (2014) The proximal tubule and albuminuria: really! J Am Soc Nephrol JASN 25:443–453. doi: 10.1681/ASN.2013090950

Do R, Stitziel NO, Won H-H, et al (2015) Exome sequencing identifies rare LDLR and APOA5 alleles conferring risk for myocardial infarction. Nature 518:102–106. doi: 10.1038/nature13917

Drögemüller M, Jagannathan V, Howard J, et al (2014) A frameshift mutation in the cubilin gene (CUBN) in Beagles with Imerslund-Gräsbeck syndrome (selective cobalamin malabsorption). Anim Genet 45:148–150. doi: 10.1111/age.12094

Duchateau PN, Pullinger CR, Cho MH, et al (2001) Apolipoprotein L gene family: tissue-specific expression, splicing, promoter regions; discovery of a new gene. J Lipid Res 42:620–630.

Duvoisin RM, Zhang C, Pfankuch TF, et al (2005) Increased measures of anxiety and weight gain in mice lacking the group III metabotropic glutamate receptor mGluR8. Eur J Neurosci 22:425–436. doi: 10.1111/j.1460-9568.2005.04210.x

Erwin GD, Oksenberg N, Truty RM, et al (2014) Integrating Diverse Datasets Improves Developmental Enhancer Prediction. PLOS Comput Biol 10:e1003677. doi: 10.1371/journal.pcbi.1003677

Fan Q, Xing Y, Ding J, et al (2006) The relationship among nephrin, podocin, CD2AP, and alpha-actinin might not be a true “interaction” in podocyte. Kidney Int 69:1207–1215. doi: 10.1038/sj.ki.5000245

Fan Z, Cai Q, Chen Y, et al (2016) Association of the Transcription Factor 7 Like 2 (TCF7L2) Polymorphism With Diabetic Nephropathy Risk: A Meta-Analysis. Medicine (Baltimore) 95:e3087. doi: 10.1097/MD.0000000000003087

Flannick J, Thorleifsson G, Beer NL, et al (2014) Loss-of-function mutations in SLC30A8 protect against type 2 diabetes. Nat Genet 46:357–363. doi: 10.1038/ng.2915

Fox CS, Matsushita K, Woodward M, et al (2012) Associations of kidney disease measures with mortality and end-stage renal disease in individuals with and without diabetes: a meta-analysis. The Lancet 380:1662–1673. doi: 10.1016/S0140-6736(12)61350-6

Freedman BI (2002) End‐stage renal failure in African Americans: insights in kidney disease susceptibility. Nephrol Dial Transplant 17:198–200. doi: 10.1093/ndt/17.2.198

Freedman BI, Bowden DW, Rich SS, et al (2005) A genome scan for all-cause end-stage renal disease in African Americans. Nephrol Dial Transplant Off Publ Eur Dial Transpl Assoc - Eur Ren Assoc 20:712–718. doi: 10.1093/ndt/gfh704

Freedman BI, Hicks PJ, Sale MM, et al (2007) A leucine repeat in the carnosinase gene CNDP1 is associated with diabetic end-stage renal disease in European Americans. Nephrol Dial Transplant Off Publ Eur Dial Transpl Assoc - Eur Ren Assoc 22:1131–1135. doi: 10.1093/ndt/gfl717

Freedman BI, Julian BA, Pastan SO, et al (2015) Apolipoprotein L1 gene variants in deceased organ donors are associated with renal allograft failure. Am J Transplant Off J Am Soc Transplant Am Soc Transpl Surg 15:1615–1622. doi: 10.1111/ajt.13223

Freedman BI, Langefeld CD, Andringa KK, et al (2014) End-stage renal disease in African Americans with lupus nephritis is associated with APOL1. Arthritis Rheumatol Hoboken NJ 66:390–396. doi: 10.1002/art.38220

Freedman BI, Langefeld CD, Lu L, et al (2011) Differential Effects of MYH9 and APOL1 Risk Variants on FRMD3 Association with Diabetic ESRD in African Americans. PLoS Genet. doi: 10.1371/journal.pgen.1002150

Freedman BI, Pastan SO, Israni AK, et al (2016) APOL1 genotype and kidney transplantation outcomes from deceased African American donors. Transplantation 100:194. doi: 10.1097/TP.0000000000000969

Freedman BI, Spray BJ, Tuttle AB, Buckalew VM (1993) The familial risk of end-stage renal disease in African Americans. Am J Kidney Dis Off J Natl Kidney Found 21:387–393.

Freedman BI, Tuttle AB, Spray BJ (1995) Familial predisposition to nephropathy in African-Americans with non-insulin-dependent diabetes mellitus. Am J Kidney Dis 25:710–713. doi: 10.1016/0272-6386(95)90546-4

Friedman DJ, Kozlitina J, Genovese G, et al (2011) Population-based risk assessment of APOL1 on renal disease. J Am Soc Nephrol JASN 22:2098–2105. doi: 10.1681/ASN.2011050519

Friedman DJ, Pollak MR (2011) Genetics of kidney failure and the evolving story of APOL1. J Clin Invest 121:3367–3374. doi: 10.1172/JCI46263

Fuchsberger C, Flannick J, Teslovich TM, et al (2016) The genetic architecture of type 2 diabetes. Nature 536:41–47. doi: 10.1038/nature18642

Gabriel SB, Schaffner SF, Nguyen H, et al (2002) The structure of haplotype blocks in the human genome. Science 296:2225–2229. doi: 10.1126/science.1069424

Gagliardini E, Conti S, Benigni A, et al (2010) Imaging of the porous ultrastructure of the glomerular epithelial filtration slit. J Am Soc Nephrol JASN 21:2081–2089. doi: 10.1681/ASN.2010020199

Gao C, Huang W, Kanasaki K, Xu Y (2014) The Role of Ubiquitination and Sumoylation in Diabetic Nephropathy. BioMed Res Int 2014:e160692. doi: 10.1155/2014/160692

Gao X, Starmer J, Martin ER (2008) A multiple testing correction method for genetic association studies using correlated single nucleotide polymorphisms. Genet Epidemiol 32:361–369. doi: 10.1002/gepi.20310

Gaut JP, Hoshi M, Jain S, Liapis H (2014) Claudin-1 and Nephrin Label Cellular Crescents in Diabetic Glomerulosclerosis. Hum Pathol 45:628–635. doi: 10.1016/j.humpath.2013.10.030

Genovese G, Friedman DJ, Ross MD, et al (2010) Association of Trypanolytic ApoL1 Variants with Kidney Disease in African Americans. Science 329:841–845. doi: 10.1126/science.1193032

Gibson G (2012) Rare and common variants: twenty arguments. Nat Rev Genet 13:135–145. doi: 10.1038/nrg3118

Girach A, Vignati L (2006) Diabetic microvascular complications—can the presence of one predict the development of another? J Diabetes Complications 20:228–237. doi: 10.1016/j.jdiacomp.2006.03.001

Grant BD, Donaldson JG (2009) Pathways and mechanisms of endocytic recycling. Nat Rev Mol Cell Biol 10:597–608. doi: 10.1038/nrm2755

Guan M, Ma J, Keaton JM, et al (2016) Association of kidney structure-related gene variants with type 2 diabetes-attributed end-stage kidney disease in African Americans. Hum Genet 135:1251–1262. doi: 10.1007/s00439-016-1714-2

Gudbjartsson DF, Holm H, Indridason OS, et al (2010) Association of Variants at UMOD with Chronic Kidney Disease and Kidney Stones--Role of Age and Comorbid Diseases. PLoS Genet. doi: 10.1371/journal.pgen.1001039

Gudmundsson J, Sulem P, Gudbjartsson DF, et al (2012) A study based on whole-genome sequencing yields a rare variant at 8q24 associated with prostate cancer. Nat Genet 44:1326–1329. doi: 10.1038/ng.2437

Gurdasani D, Carstensen T, Tekola-Ayele F, et al (2015) The African Genome Variation Project shapes medical genetics in Africa. Nature 517:327–332. doi: 10.1038/nature13997

Haidara MA, Mikhailidis DP, Rateb MA, et al (2009) Evaluation of the effect of oxidative stress and vitamin E supplementation on renal function in rats with streptozotocin-induced Type 1 diabetes. J Diabetes Complications 23:130–136. doi: 10.1016/j.jdiacomp.2008.02.011

Hashimoto T, Karasawa T, Saito A, et al (2007) Ephrin-B1 localizes at the slit diaphragm of the glomerular podocyte. Kidney Int 72:954–964. doi: 10.1038/sj.ki.5002454

Heidet L, Arrondel C, Forestier L, et al (2001) Structure of the human type IV collagen gene COL4A3 and mutations in autosomal Alport syndrome. J Am Soc Nephrol JASN 12:97–106.

Hicks PJ, Staten JL, Palmer ND, et al (2008) Association analysis of the ephrin-B2 gene in African-Americans with end-stage renal disease. Am J Nephrol 28:914–920. doi: 10.1159/000141934

Hillege HL, Fidler V, Diercks GFH, et al (2002) Urinary albumin excretion predicts cardiovascular and noncardiovascular mortality in general population. Circulation 106:1777–1782.

Hostetter TH, Rennke HG, Brenner BM (1982) The case for intrarenal hypertension in the initiation and progression of diabetic and other glomerulopathies. Am J Med 72:375–380.

Hou J, Renigunta A, Yang J, Waldegger S (2010) Claudin-4 forms paracellular chloride channel in the kidney and requires claudin-8 for tight junction localization. Proc Natl Acad Sci U S A 107:18010–18015. doi: 10.1073/pnas.1009399107

Huffman JE, Albrecht E, Teumer A, et al (2015) Modulation of genetic associations with serum urate levels by body-mass-index in humans. PloS One 10:e0119752. doi: 10.1371/journal.pone.0119752

Hyvönen ME, Ihalmo P, Sandholm N, et al (2013) CD2AP is associated with end-stage renal disease in patients with type 1 diabetes. Acta Diabetol 50:887–897. doi: 10.1007/s00592-013-0475-9

Inoguchi T, Sonta T, Tsubouchi H, et al (2003) Protein kinase C-dependent increase in reactive oxygen species (ROS) production in vascular tissues of diabetes: role of vascular NAD(P)H oxidase. J Am Soc Nephrol JASN 14:S227-232.

Ishii H, Jirousek MR, Koya D, et al (1996) Amelioration of vascular dysfunctions in diabetic rats by an oral PKC beta inhibitor. Science 272:728–731.

Iyengar SK, Sedor JR, Freedman BI, et al (2015) Genome-Wide Association and Trans-ethnic Meta-Analysis for Advanced Diabetic Kidney Disease: Family Investigation of Nephropathy and Diabetes (FIND). PLoS Genet 11:e1005352. doi: 10.1371/journal.pgen.1005352

Janssen B, Hohenadel D, Brinkkoetter P, et al (2005) Carnosine as a protective factor in diabetic nephropathy: association with a leucine repeat of the carnosinase gene CNDP1. Diabetes 54:2320–2327.

Johnson EO, Hancock DB, Levy JL, et al (2013) Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy. Hum Genet 132:509–522. doi: 10.1007/s00439-013-1266-7

Jonsson T, Atwal JK, Steinberg S, et al (2012) A mutation in APP protects against Alzheimer’s disease and age-related cognitive decline. Nature 488:96–99. doi: 10.1038/nature11283

Kao WHL, Klag MJ, Meoni LA, et al (2008) MYH9 is associated with nondiabetic end-stage renal disease in African Americans. Nat Genet 40:1185–1192. doi: 10.1038/ng.232

Kashtan CE (1995) Clinical and molecular diagnosis of Alport syndrome. Proc Assoc Am Physicians 107:306–313.

Katoh M, Katoh M (2004) Identification and characterization of ARHGAP24 and ARHGAP25 genes in silico. Int J Mol Med 14:333–338.

Keene KL, Mychaleckyj JC, Leak TS, et al (2008a) Exploration of the utility of ancestry informative markers for genetic association studies of African Americans with type 2 diabetes and end stage renal disease. Hum Genet 124:147–154. doi: 10.1007/s00439-008-0532-6

Keene KL, Mychaleckyj JC, Smith SG, et al (2008b) Association of the distal region of the ectonucleotide pyrophosphatase/phosphodiesterase 1 gene with type 2 diabetes in an African-American population enriched for nephropathy. Diabetes 57:1057–1062. doi: 10.2337/db07-0886

Khamaisi M, Schrijvers BF, De Vriese AS, et al (2003) The emerging role of VEGF in diabetic kidney disease. Nephrol Dial Transplant Off Publ Eur Dial Transpl Assoc - Eur Ren Assoc 18:1427–1430.

Kida Y, Ieronimakis N, Schrimpf C, et al (2013) EphrinB2 reverse signaling protects against capillary rarefaction and fibrosis after kidney injury. J Am Soc Nephrol JASN 24:559–572. doi: 10.1681/ASN.2012080871

Kiezun A, Garimella K, Do R, et al (2012) Exome sequencing and the genetic basis of complex traits. Nat Genet 44:623–630. doi: 10.1038/ng.2303

Kircher M, Witten DM, Jain P, et al (2014) A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 46:310–315. doi: 10.1038/ng.2892

Kirsch KH, Georgescu MM, Ishimaru S, Hanafusa H (1999) CMS: an adapter molecule involved in cytoskeletal rearrangements. Proc Natl Acad Sci U S A 96:6211–6216.

Kiuchi-Saishin Y, Gotoh S, Furuse M, et al (2002) Differential Expression Patterns of Claudins, Tight Junction Membrane Proteins, in Mouse Nephron Segments. J Am Soc Nephrol 13:875–886.

Kopp JB (2010) Glomerular pathology in autosomal dominant MYH9 spectrum disorders: what are the clues telling us about disease mechanism? Kidney Int 78:130–133. doi: 10.1038/ki.2010.82

Kopp JB, Nelson GW, Sampath K, et al (2011) APOL1 Genetic Variants in Focal Segmental Glomerulosclerosis and HIV-Associated Nephropathy. J Am Soc Nephrol 22:2129–2137. doi: 10.1681/ASN.2011040388

Kopp JB, Smith MW, Nelson GW, et al (2008) MYH9 is a major-effect risk gene for focal segmental glomerulosclerosis. Nat Genet 40:1175–1184. doi: 10.1038/ng.226

Köttgen A (2010) Genome-wide association studies in nephrology research. Am J Kidney Dis Off J Natl Kidney Found 56:743–758. doi: 10.1053/j.ajkd.2010.05.018

Köttgen A, Glazer NL, Dehghan A, et al (2009) Multiple loci associated with indices of renal function and chronic kidney disease. Nat Genet 41:712–717. doi: 10.1038/ng.377

Köttgen A, Hwang S-J, Rampersaud E, et al (2008) TCF7L2 variants associate with CKD progression and renal function in population-based cohorts. J Am Soc Nephrol JASN 19:1989–1999. doi: 10.1681/ASN.2007121291

Köttgen A, Pattaro C, Böger CA, et al (2010) Multiple New Loci Associated with Kidney Function and Chronic Kidney Disease: The CKDGen consortium. Nat Genet 42:376–384. doi: 10.1038/ng.568

Köttgen A, Pattaro C, Böger CA, et al (2010) New loci associated with kidney function and chronic kidney disease. Nat Genet 42:376–384. doi: 10.1038/ng.568

Koya D, Jirousek MR, Lin YW, et al (1997) Characterization of protein kinase C beta isoform activation on the gene expression of transforming growth factor-beta, extracellular matrix components, and prostanoids in the glomeruli of diabetic rats. J Clin Invest 100:115–126. doi: 10.1172/JCI119503

Larsen CP, Beggs ML, Walker PD, et al (2014) Histopathologic effect of APOL1 risk alleles in PLA2R-associated membranous glomerulopathy. Am J Kidney Dis Off J Natl Kidney Found 64:161–163. doi: 10.1053/j.ajkd.2014.02.024

Lee HB, Yu M-R, Yang Y, et al (2003) Reactive oxygen species-regulated signaling pathways in diabetic nephropathy. J Am Soc Nephrol JASN 14:S241-245.

Lee S, Emond MJ, Bamshad MJ, et al (2012a) Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am J Hum Genet 91:224–237. doi: 10.1016/j.ajhg.2012.06.007

Lee S, Wu MC, Lin X (2012b) Optimal tests for rare variant effects in sequencing association studies. Biostatistics 13:762–775. doi: 10.1093/biostatistics/kxs014

Lei HH, Perneger TV, Klag MJ, et al (1998) Familial aggregation of renal disease in a population-based case-control study. J Am Soc Nephrol 9:1270–1276.

Levey AS, Stevens LA (2010) Estimating GFR using the CKD Epidemiology Collaboration (CKD-EPI) creatinine equation: more accurate GFR estimates, lower CKD prevalence estimates, and better risk predictions. Am J Kidney Dis Off J Natl Kidney Found 55:622–627. doi: 10.1053/j.ajkd.2010.02.337

Li B, Leal SM (2008) Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet 83:311–321. doi: 10.1016/j.ajhg.2008.06.024

Li WY, Huey CL, Yu ASL (2004) Expression of claudin-7 and -8 along the mouse nephron. Am J Physiol - Ren Physiol 286:F1063–F1071. doi: 10.1152/ajprenal.00384.2003

Liu C-T, Monda KL, Taylor KC, et al (2013) Genome-Wide Association of Body Fat Distribution in African Ancestry Populations Suggests New Loci. PLOS Genet 9:e1003681. doi: 10.1371/journal.pgen.1003681

Liu DJ, Leal SM (2010) A novel adaptive method for the analysis of next-generation sequencing data to detect complex trait associations with rare variants due to gene main effects and interactions. PLoS Genet 6:e1001156. doi: 10.1371/journal.pgen.1001156

Liu DJ, Peloso GM, Zhan X, et al (2014) Meta-analysis of gene-level tests for rare variant association. Nat Genet 46:200–204. doi: 10.1038/ng.2852

Liu X, Jian X, Boerwinkle E (2011) dbNSFP: A lightweight database of human nonsynonymous SNPs and their functional predictions. Hum Mutat 32:894–899. doi: 10.1002/humu.21517

Liu X, White S, Peng B, et al (2016) WGSA: an annotation pipeline for human genome sequencing studies. J Med Genet 53:111–112. doi: 10.1136/jmedgenet-2015-103423

Liu Y, Burdon KP, Langefeld CD, et al (2005) T-786C polymorphism of the endothelial nitric oxide synthase gene is associated with albuminuria in the diabetes heart study. J Am Soc Nephrol JASN 16:1085–1090. doi: 10.1681/ASN.2004100817

Löwik MM, Groenen PJTA, Pronk I, et al (2007) Focal segmental glomerulosclerosis in a patient homozygous for a CD2AP mutation. Kidney Int 72:1198–1203. doi: 10.1038/sj.ki.5002469

Lucas C-H, Calvez M, Babu R, Brown A (2014) Altered subcellular localization of the NeuN/Rbfox3 RNA splicing factor in HIV-associated neurocognitive disorders (HAND). Neurosci Lett 558:97–102. doi: 10.1016/j.neulet.2013.10.037

Ma J, Guan M, Bowden DW, et al (2016) Association Analysis of the Cubilin (CUBN) and Megalin (LRP2) Genes with ESRD in African Americans. Clin J Am Soc Nephrol 11:1034–1043. doi: 10.2215/CJN.12971215

Macisaac RJ, Ekinci EI, Jerums G (2014) Markers of and risk factors for the development and progression of diabetic kidney disease. Am J Kidney Dis Off J Natl Kidney Found 63:S39-62. doi: 10.1053/j.ajkd.2013.10.048

Madsen BE, Browning SR (2009) A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet 5:e1000384. doi: 10.1371/journal.pgen.1000384

Maeda S (2004) Genome-wide search for susceptibility gene to diabetic nephropathy by gene-based SNP. Diabetes Res Clin Pract 66:S45–S47. doi: 10.1016/j.diabres.2003.09.017

Mahajan A, Rodan AR, Le TH, et al (2016) Trans-ethnic Fine Mapping Highlights Kidney-Function Genes Linked to Salt Sensitivity. Am J Hum Genet 99:636–646. doi: 10.1016/j.ajhg.2016.07.012

Mahajan A, Sim X, Ng HJ, et al (2015) Identification and functional characterization of G6PC2 coding variants influencing glycemic traits define an effector transcript at the G6PC2-ABCB11 locus. PLoS Genet 11:e1004876. doi: 10.1371/journal.pgen.1004876

Marchini J, Howie B, Myers S, et al (2007) A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet 39:906–913. doi: 10.1038/ng2088

Mason RM, Wahab NA (2003) Extracellular Matrix Metabolism in Diabetic Nephropathy. J Am Soc Nephrol 14:1358–1373. doi: 10.1097/01.ASN.0000065640.77499.D7

McDonough CW, Palmer ND, Hicks PJ, et al (2011) A genome wide association study for diabetic nephropathy genes in African Americans. Kidney Int 79:563–572. doi: 10.1038/ki.2010.467

McLaren W, Pritchard B, Rios D, et al (2010) Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics 26:2069–2070. doi: 10.1093/bioinformatics/btq330

Miner JH (2011) Glomerular basement membrane composition and the filtration barrier. Pediatr Nephrol Berl Ger 26:1413–1417. doi: 10.1007/s00467-011-1785-1

Moestrup SK, Kozyraki R, Kristiansen M, et al (1998) The intrinsic factor-vitamin B12 receptor and target of teratogenic antibodies is a megalin-binding peripheral membrane protein with homology to developmental proteins. J Biol Chem 273:5235–5242.

Molina-Jijón E, Rodríguez-Muñoz R, Namorado M del C, et al (2014) Oxidative stress induces claudin-2 nitration in experimental type 1 diabetic nephropathy. Free Radic Biol Med 72:162–175. doi: 10.1016/j.freeradbiomed.2014.03.040

Nadal MA, Lago NR, Olivieri LE, et al (1998) Fibrillary glomerulonephritis and Charcot-Marie-Tooth disease. Am J Kidney Dis Off J Natl Kidney Found 32:E3.

Nagase S, Suzuki H, Wang Y, et al (2003) Association of ecNOS gene polymorphisms with end stage renal diseases. Mol Cell Biochem 244:113–118.

Nanayakkara S, Senevirathna STMLD, Abeysekera T, et al (2014) An integrative study of the genetic, social and environmental determinants of chronic kidney disease characterized by tubulointerstitial damages in the North Central Region of Sri Lanka. J Occup Health 56:28–38.

Nelson GW, Freedman BI, Bowden DW, et al (2010) Dense mapping of MYH9 localizes the strongest kidney disease associations to the region of introns 13 to 15. Hum Mol Genet 19:1805–1815. doi: 10.1093/hmg/ddq039

Ng MCY, Saxena R, Li J, et al (2013) Transferability and fine mapping of type 2 diabetes loci in African Americans: the Candidate Gene Association Resource Plus Study. Diabetes 62:965–976. doi: 10.2337/db12-0266

Ng MCY, Shriner D, Chen BH, et al (2014a) Meta-analysis of genome-wide association studies in African Americans provides insights into the genetic architecture of type 2 diabetes. PLoS Genet 10:e1004517. doi: 10.1371/journal.pgen.1004517

Ng MCY, Shriner D, Chen BH, et al (2014b) Meta-analysis of genome-wide association studies in African Americans provides insights into the genetic architecture of type 2 diabetes. PLoS Genet 10:e1004517. doi: 10.1371/journal.pgen.1004517

Ng PC, Henikoff S (2003) SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res 31:3812–3814.

Nishikawa T, Edelstein D, Du XL, et al (2000) Normalizing mitochondrial superoxide production blocks three pathways of hyperglycaemic damage. Nature 404:787–790. doi: 10.1038/35008121

Noiri E, Satoh H, Taguchi J, et al (2002) Association of eNOS Glu298Asp Polymorphism With End-Stage Renal Disease. Hypertension 40:535–540. doi: 10.1161/01.HYP.0000033974.57407.82

Ohta M, Mimori K, Fukuyoshi Y, et al (2008) Clinical significance of the reduced expression of G protein gamma 7 (GNG7) in oesophageal cancer. Br J Cancer 98:410–417. doi: 10.1038/sj.bjc.6604124

Olbrich H, Fliegauf M, Hoefele J, et al (2003) Mutations in a novel gene, NPHP3, cause adolescent nephronophthisis, tapeto-retinal degeneration and hepatic fibrosis. Nat Genet 34:455–459. doi: 10.1038/ng1216

Osicka TM, Strong KJ, Nikolic-Paterson DJ, et al (2004) Renal processing of serum proteins in an albumin-deficient environment: an in vivo study of glomerulonephritis in the Nagase analbuminaemic rat. Nephrol Dial Transplant Off Publ Eur Dial Transpl Assoc - Eur Ren Assoc 19:320–328.

Page NM, Butlin DJ, Lomthaisong K, Lowry PJ (2001) The human apolipoprotein L gene cluster: identification, classification, and sites of distribution. Genomics 74:71–78. doi: 10.1006/geno.2001.6534

Palm F, Cederberg J, Hansell P, et al (2003) Reactive oxygen species cause diabetes-induced decrease in renal oxygen tension. Diabetologia 46:1153–1160. doi: 10.1007/s00125-003-1155-z

Palmer ND, Freedman BI (2012) Insights into the genetic architecture of diabetic nephropathy. Curr Diab Rep 12:423–431. doi: 10.1007/s11892-012-0279-2

Palmer ND, McDonough CW, Hicks PJ, et al (2012) A genome-wide association search for type 2 diabetes genes in African Americans. PloS One 7:e29202. doi: 10.1371/journal.pone.0029202

Palmer ND, Ng MCY, Hicks PJ, et al (2014) Evaluation of Candidate Nephropathy Susceptibility Genes in a Genome-Wide Association Study of African American Diabetic Kidney Disease. PLoS ONE. doi: 10.1371/journal.pone.0088273

Parra EJ, Marcini A, Akey J, et al (1998) Estimating African American admixture proportions by use of population-specific alleles. Am J Hum Genet 63:1839–1851. doi: 10.1086/302148

Parsa A, Fuchsberger C, Köttgen A, et al (2013a) Common Variants in Mendelian Kidney Disease Genes and Their Association with Renal Function. J Am Soc Nephrol 24:2105–2117. doi: 10.1681/ASN.2012100983

Parsa A, Kao WHL, Xie D, et al (2013b) APOL1 risk variants, race, and progression of chronic kidney disease. N Engl J Med 369:2183–2196. doi: 10.1056/NEJMoa1310345

Pattaro C, Köttgen A, Teumer A, et al (2012) Genome-Wide Association and Functional Follow-Up Reveals New Loci for Kidney Function. PLoS Genet. doi: 10.1371/journal.pgen.1002584

Pattaro C, Teumer A, Gorski M, et al (2016) Genetic associations at 53 loci highlight cell types and biological pathways relevant for kidney function. Nat Commun 7:10023. doi: 10.1038/ncomms10023

Patterson N, Price AL, Reich D (2006) Population Structure and Eigenanalysis. PLoS Genet 2:e190. doi: 10.1371/journal.pgen.0020190

Pezzolesi MG, Poznik GD, Mychaleckyj JC, et al (2009) Genome-wide association scan for diabetic nephropathy susceptibility genes in type 1 diabetes. Diabetes 58:1403–1410. doi: 10.2337/db08-1514

Pirinen M, Donnelly P, Spencer CCA (2013) Efficient computation with a linear mixed model on large-scale data sets with applications to genetic studies. Ann Appl Stat 7:369–390. doi: 10.1214/12-AOAS586

Pollak MR (2014) Familial FSGS. Adv Chronic Kidney Dis 21:422–425. doi: 10.1053/j.ackd.2014.06.001

Pollak MR, Genovese G, Friedman DJ (2012) APOL1 and kidney disease. Curr Opin Nephrol Hypertens 21:179–182. doi: 10.1097/MNH.0b013e32835012ab

Prete DD, Anglani F, Forino M, et al (1997) Down-regulation of glomerular matrix metalloproteinase-2 gene in human NIDDM. Diabetologia 40:1449–1454. doi: 10.1007/s001250050848

Price AL, Kryukov GV, de Bakker PIW, et al (2010a) Pooled association tests for rare variants in exon-resequencing studies. Am J Hum Genet 86:832–838. doi: 10.1016/j.ajhg.2010.04.005

Price AL, Patterson NJ, Plenge RM, et al (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38:904–909. doi: 10.1038/ng1847

Price AL, Zaitlen NA, Reich D, Patterson N (2010b) New approaches to population stratification in genome-wide association studies. Nat Rev Genet 11:459–463. doi: 10.1038/nrg2813

Prockop DJ (1992) Mutations in Collagen Genes as a Cause of Connective-Tissue Diseases. N Engl J Med 326:540–546. doi: 10.1056/NEJM199202203260807

Pruitt KD, Brown GR, Hiatt SM, et al (2014) RefSeq: an update on mammalian reference sequences. Nucleic Acids Res 42:D756–D763. doi: 10.1093/nar/gkt1114

Quang D, Chen Y, Xie X (2015) DANN: a deep learning approach for annotating the pathogenicity of genetic variants. Bioinformatics 31:761–763. doi: 10.1093/bioinformatics/btu703

Quinn M, Angelico MC, Warram JH, Krolewski AS (1996) Familial factors determine the development of diabetic nephropathy in patients with IDDM. Diabetologia 39:940–945.

Radcliffe NJ, Seah J-M, Clarke M, et al (2017) Clinical predictive factors in diabetic kidney disease progression. J Diabetes Investig 8:6–18. doi: 10.1111/jdi.12533

Remuzzi G, Schieppati A, Ruggenenti P (2002) Nephropathy in Patients with Type 2 Diabetes. N Engl J Med 346:1145–1151. doi: 10.1056/NEJMcp011773

Renouil M, Stojkovic T, Jacquemont ML, et al (2013) [Charcot-Marie-Tooth disease associated with periaxin mutations (CMT4F): Clinical, electrophysiological and genetic analysis of 24 patients]. Rev Neurol (Paris) 169:603–612. doi: 10.1016/j.neurol.2013.07.004

Reznichenko A, Snieder H, van den Born J, et al (2012) CUBN as a novel locus for end-stage renal disease: insights from renal transplantation. PloS One 7:e36512. doi: 10.1371/journal.pone.0036512

Rivas MA, Beaudoin M, Gardet A, et al (2011) Deep resequencing of GWAS loci identifies independent rare variants associated with inflammatory bowel disease. Nat Genet 43:1066–1073. doi: 10.1038/ng.952

Roscioni SS, Lambers Heerspink HJ, de Zeeuw D (2014) Microalbuminuria: target for renoprotective therapy PRO. Kidney Int 86:40–49. doi: 10.1038/ki.2013.490

Russo LM, Sandoval RM, McKee M, et al (2007) The normal kidney filters nephrotic levels of albumin retrieved by proximal tubule cells: retrieval is disrupted in nephrotic states. Kidney Int 71:504–513. doi: 10.1038/sj.ki.5002041

Saito A, Pietromonaco S, Loo AK, Farquhar MG (1994) Complete cloning and sequencing of rat gp330/"megalin," a distinctive member of the low density lipoprotein receptor gene family. Proc Natl Acad Sci U S A 91:9725–9729.

Salas A, Carracedo A, Richards M, Macaulay V (2005) Charting the ancestry of African Americans. Am J Hum Genet 77:676–680. doi: 10.1086/491675

Sandholm N, McKnight AJ, Salem RM, et al (2013) Chromosome 2q31.1 associates with ESRD in women with type 1 diabetes. J Am Soc Nephrol JASN 24:1537–1543. doi: 10.1681/ASN.2012111122

Sandholm N, Salem RM, McKnight AJ, et al (2012) New Susceptibility Loci Associated with Kidney Disease in Type 1 Diabetes. PLoS Genet. doi: 10.1371/journal.pgen.1002921

Sandholm N, Zuydam NV, Ahlqvist E, et al (2017) The Genetic Landscape of Renal Complications in Type 1 Diabetes. J Am Soc Nephrol 28:557–574. doi: 10.1681/ASN.2016020231

Sawcer S, Hellenthal G, Pirinen M, et al (2011) Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis. Nature 476:214–219. doi: 10.1038/nature10251

Saxena R, Saleheen D, Been LF, et al (2013) Genome-wide association study identifies a novel locus contributing to type 2 diabetes susceptibility in Sikhs of Punjabi origin from India. Diabetes 62:1746–1755. doi: 10.2337/db12-1077

Schmid H, Boucherot A, Yasuda Y, et al (2006) Modular activation of nuclear factor-kappaB transcriptional programs in human diabetic nephropathy. Diabetes 55:2993–3003. doi: 10.2337/db06-0477

Schwarz JM, Rödelsperger C, Schuelke M, Seelow D (2010) MutationTaster evaluates disease-causing potential of sequence alterations. Nat Methods 7:575–576. doi: 10.1038/nmeth0810-575

Schwindinger WF, Betz KS, Giger KE, et al (2003) Loss of G protein gamma 7 alters behavior and reduces striatal alpha(olf) level and cAMP production. J Biol Chem 278:6575–6579. doi: 10.1074/jbc.M211132200

Seaquist ER, Goetz FC, Rich S, Barbosa J (1989) Familial Clustering of Diabetic Kidney Disease. N Engl J Med 320:1161–1165. doi: 10.1056/NEJM198905043201801

Shih NY, Li J, Karpitskii V, et al (1999) Congenital nephrotic syndrome in mice lacking CD2-associated protein. Science 286:312–315.

Shimazaki A, Kawamura Y, Kanazawa A, et al (2005) Genetic variations in the gene encoding ELMO1 are associated with susceptibility to diabetic nephropathy. Diabetes 54:1171–1178.

Shlush LI, Bercovici S, Wasser WG, et al (2010) Admixture mapping of end stage kidney disease genetic susceptibility using estimated mutual information ancestry informative markers. BMC Med Genomics 3:47. doi: 10.1186/1755-8794-3-47

Shukrun R, Vivante A, Pleniceanu O, et al (2014) A human integrin-α3 mutation confers major renal developmental defects. PloS One 9:e90879. doi: 10.1371/journal.pone.0090879

Skorecki K, Wasser WG (2016) Beyond APOL1: Genetic Inroads into Understanding Population Disparities in Diabetic Kidney Disease. Clin J Am Soc Nephrol 11:928–931. doi: 10.2215/CJN.04680416

Škrtić M, Cherney DZI (2015) Sodium–glucose cotransporter-2 inhibition and the potential for renal protection in diabetic nephropathy. Curr Opin Nephrol Hypertens 24:96–103. doi: 10.1097/MNH.0000000000000084

Smith MW, Patterson N, Lautenberger JA, et al (2004) A high-density admixture map for disease gene discovery in African Americans. Am J Hum Genet 74:1001–1013. doi: 10.1086/420856

Soldatos G, Cooper ME (2008) Diabetic nephropathy: Important pathophysiologic mechanisms. Diabetes Res Clin Pract 82, Supplement 1:S75–S79. doi: 10.1016/j.diabres.2008.09.042

Song E-Y, McClellan WM, McClellan A, et al (2009) Effect of Community Characteristics on Familial Clustering of End-Stage Renal Disease. Am J Nephrol 30:499–504. doi: 10.1159/000243716

Spray BJ, Atassi NG, Tuttle AB, Freedman BI (1995) Familial risk, age at onset, and cause of end-stage renal disease in white Americans. J Am Soc Nephrol JASN 5:1806–1810.

Stadler K, Goldberg IJ, Susztak K (2015) The evolving understanding of the contribution of lipid metabolism to diabetic kidney disease. Curr Diab Rep 15:40. doi: 10.1007/s11892-015-0611-8

Storm T, Zeitz C, Cases O, et al (2013) Detailed investigations of proximal tubular function in Imerslund-Gräsbeck syndrome. BMC Med Genet 14:111. doi: 10.1186/1471-2350-14-111

Takahashi T, Takahashi K, Gerety S, et al (2001) Temporally Compartmentalized Expression of Ephrin-B2 during Renal Glomerular Development. J Am Soc Nephrol 12:2673–2682.

Tang H, Peng J, Wang P, Risch NJ (2005) Estimation of individual admixture: analytical and study design considerations. Genet Epidemiol 28:289–301. doi: 10.1002/gepi.20064

Tayo BO, Kramer H, Salako BL, et al (2013) Genetic variation in APOL1 and MYH9 genes is associated with chronic kidney disease among Nigerians. Int Urol Nephrol 45:485–494. doi: 10.1007/s11255-012-0263-4

Teumer A, Tin A, Sorice R, et al (2016) Genome-wide Association Studies Identify Genetic Loci Associated With Albuminuria in Diabetes. Diabetes 65:803–817. doi: 10.2337/db15-1313

Tin A, Colantuoni E, Boerwinkle E, et al (2013a) Using Multiple Measures for Quantitative Trait Association Analyses: Application to Estimated Glomerular Filtration Rate (eGFR). J Hum Genet 58:461–466. doi: 10.1038/jhg.2013.23

Tin A, Colantuoni E, Boerwinkle E, et al (2013b) Using Multiple Measures for Quantitative Trait Association Analyses: Application to Estimated Glomerular Filtration Rate (eGFR). J Hum Genet 58:461–466. doi: 10.1038/jhg.2013.23

Tishkoff SA, Reed FA, Friedlaender FR, et al (2009) The genetic structure and history of Africans and African Americans. Science 324:1035–1044. doi: 10.1126/science.1172257

Tokunaga S, Hashiguchi A, Yoshimura A, et al (2012) Late-onset Charcot–Marie–Tooth disease 4F caused by periaxin gene mutation. neurogenetics 13:359–365. doi: 10.1007/s10048-012-0338-5

Tzur S, Rosset S, Shemer R, et al (2010a) Missense mutations in the APOL1 gene are highly associated with end stage kidney disease risk previously attributed to the MYH9 gene. Hum Genet 128:345–350. doi: 10.1007/s00439-010-0861-0

Tzur S, Rosset S, Shemer R, et al (2010b) Missense mutations in the APOL1 gene are highly associated with end stage kidney disease risk previously attributed to the MYH9 gene. Hum Genet 128:345–350. doi: 10.1007/s00439-010-0861-0

Tzur S, Wasser WG, Rosset S, Skorecki K (2012) Linkage disequilibrium analysis reveals an albuminuria risk haplotype containing three missense mutations in the cubilin gene with striking differences among European and African ancestry populations. BMC Nephrol 13:142. doi: 10.1186/1471-2369-13-142

Uhlén M, Fagerberg L, Hallström BM, et al (2015) Proteomics. Tissue-based map of the human proteome. Science 347:1260419. doi: 10.1126/science.1260419

van der Velde M, Matsushita K, Coresh J, et al (2011) Lower estimated glomerular filtration rate and higher albuminuria are associated with all-cause and cardiovascular mortality. A collaborative meta-analysis of high-risk population cohorts. Kidney Int 79:1341–1352. doi: 10.1038/ki.2010.536

Vardarli I, Baier LJ, Hanson RL, et al (2002) Gene for susceptibility to diabetic nephropathy in type 2 diabetes maps to 18q22.3-23. Kidney Int 62:2176–2183. doi: 10.1046/j.1523-1755.2002.00663.x

Voskarides K, Damianou L, Neocleous V, et al (2007) COL4A3/COL4A4 Mutations Producing Focal Segmental Glomerulosclerosis and Renal Failure in Thin Basement Membrane Nephropathy. J Am Soc Nephrol 18:3004–3016. doi: 10.1681/ASN.2007040444

Wang H-Y, Hsieh P-F, Huang D-F, et al (2015) RBFOX3/NeuN is Required for Hippocampal Circuit Balance and Function. Sci Rep 5:17383. doi: 10.1038/srep17383

Wang K, Li M, Hakonarson H (2010) ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 38:e164–e164. doi: 10.1093/nar/gkq603

Watanabe K, Nakayama K, Ohta S, et al (2016) ZNF70, a novel ILDR2-interacting protein, contributes to the regulation of HES1 gene expression. Biochem Biophys Res Commun 477:712–716. doi: 10.1016/j.bbrc.2016.06.124

Watanabe K, Watson E, Cremona ML, et al (2013) ILDR2: an endoplasmic reticulum resident molecule mediating hepatic lipid homeostasis. PloS One 8:e67234. doi: 10.1371/journal.pone.0067234

Wessel J, Schork NJ (2006) Generalized genomic distance-based regression methodology for multilocus association analysis. Am J Hum Genet 79:792–806. doi: 10.1086/508346

Willer CJ, Li Y, Abecasis GR (2010) METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26:2190–2191. doi: 10.1093/bioinformatics/btq340

Williams AN, Conway BN (2017) Effect of high density lipoprotein cholesterol on the relationship of serum iron and hemoglobin with kidney function in diabetes. J Diabetes Complications. doi: 10.1016/j.jdiacomp.2017.03.010

Williams WW, Salem RM, McKnight AJ, et al (2012) Association testing of previously reported variants in a large case-control meta-analysis of diabetic nephropathy. Diabetes 61:2187–2194. doi: 10.2337/db11-0751

Woredekal Y, Friedman EA (2009) Chapter 54. Diabetic Nephropathy. In: Lerma EV, Berns JS, Nissenson AR (eds) CURRENT Diagnosis & Treatment: Nephrology & Hypertension. The McGraw-Hill Companies, New York, NY.

Wu MC, Lee S, Cai T, et al (2011) Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet 89:82–93. doi: 10.1016/j.ajhg.2011.05.029

Yang J, Zaitlen NA, Goddard ME, et al (2014) Advantages and pitfalls in the application of mixed-model association methods. Nat Genet 46:100–106. doi: 10.1038/ng.2876

Yang Q, Köttgen A, Dehghan A, et al (2010) Multiple genetic loci influence serum urate levels and their relationship with gout and cardiovascular disease risk factors. Circ Cardiovasc Genet 3:523–530. doi: 10.1161/CIRCGENETICS.109.934455

Yates A, Akanni W, Amode MR, et al (2016) Ensembl 2016. Nucleic Acids Res 44:D710–D716. doi: 10.1093/nar/gkv1157

Yeo NC, O’Meara CC, Bonomo JA, et al (2015) Shroom3 contributes to the maintenance of the glomerular filtration barrier integrity. Genome Res 25:57–65. doi: 10.1101/gr.182881.114

Yu ASL (2015) Claudins and the kidney. J Am Soc Nephrol JASN 26:11–19. doi: 10.1681/ASN.2014030284

Zhang P, Chen Y, Cheng Y, et al (2014) Alkaline sphingomyelinase (NPP7) promotes cholesterol absorption by affecting sphingomyelin levels in the gut: A study with NPP7 knockout mice. Am J Physiol Gastrointest Liver Physiol 306:G903-908. doi: 10.1152/ajpgi.00319.2013

Zhou W, Dai J, Attanasio M, Hildebrandt F (2010) Nephrocystin-3 is required for ciliary function in zebrafish embryos. Am J Physiol - Ren Physiol 299:F55–F62. doi: 10.1152/ajprenal.00043.2010

Zoppini G, Targher G, Chonchol M, et al (2012) Predictors of estimated GFR decline in patients with type 2 diabetes and preserved kidney function. Clin J Am Soc Nephrol CJASN 7:401–408. doi: 10.2215/CJN.07650711

Zuk O, Schaffner SF, Samocha K, et al (2014) Searching for missing heritability: designing rare variant association studies. Proc Natl Acad Sci U S A 111:E455-464. doi: 10.1073/pnas.1322563111

United States Renal Data System. 2016 USRDS annual data report: Epidemiology of kidney disease in the United States. National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD, 2016.

U.S. Renal Data System, USRDS 2014 Annual Data Report: Atlas of End-Stage Renal Disease in the United States, National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD, 2014.

CURRICULUM VITAE

Meijian Guan
336-391-6110

Education

Aug 2016-present M.S. Candidate, Department of Computer Science, Wake Forest University, NC, U.S.

Aug 2012-present PhD Candidate, Integrative Physiology and Pharmacology Program, Winston-Salem, NC, U.S.

Sep 2006-Jun 2010 B.S., Biotechnology, School of Biology and Basic Medical Sciences, Soochow University, China

Research Experience

May 2012-present Graduate Student, Center for Genomics and Personalized Medicine Research, Wake Forest School of Medicine, NC, U.S.
Dissertation project: Evaluation of the contribution of genetic variants to Type 2 diabetes associated nephropathy in African Americans.
• Project 1: T2D-ESKD genome-wide association study (GWAS) in African Americans
• Project 2: Exome-sequencing study for African American ESKD patients with diabetes
• Project 3: Evaluation of the role of candidate genes in T2D-ESKD in African Americans

Sep 2011-May 2012 Research Assistant, Department of Surgery, James H. Quillen College of Medicine, East Tennessee State University, TN, U.S.
Project: TLR9 ligand induces protection against cerebral I/R injury and myocardial ischemic injury through a PI3K/Akt-dependent mechanism.

Jun 2010-Dec 2010 Brand Ambassador, Pfizer, China

Sep 2009-Jun 2010 Internship, Laboratory of Aging and Nervous Diseases, Department of Pharmacology, School of Pharmacy, Soochow University, China
Project: The application of high-fidelity enzyme to detect the related Tamoxifen metabolism genotype, P450 CYP2D6.

Sep 2008-Sep 2009 Internship, Department of Cell Biology, School of Biology and Basic Medical Sciences, Soochow University, China
Project: Silk fibroin nanofibers support growth of neurons in vitro (National Outstanding College Student Innovation Research Grant, China)

Key Skills

Programming skills: C++, UNIX Shell Script, R, SAS, MATLAB, Python, LaTeX, HTML5

Analytical and statistical skills: data management/manipulation, data cleaning/quality control, data visualization, data interpretation, analytical protocol design, statistical model selection, result validation/replication, ANOVA, Principal Component Analysis, linear regression model, linear mixed model, logistic regression, genome-wide association study, next-generation sequencing data analysis, region-based test, meta-analysis, genetic locus fine-mapping

Statistical applications and research tools: PLINK, METAL, EIGENSOFT, ggplot2, RAREMETAL/RAREMETALWORKER, RvTests, EPACTS, GTOOL, SKAT, CMC, VT, CADD, simpleM, RegulomeDB, EasyQC, zCall, Haploview, Variant Effect Predictor, ANNOVAR, GENCODE, Exome Variant Server, ExAC Browser, 1000 Genomes Browser, UCSC Genome Browser, dbGaP, GeneALaCart/GeneCards, LocusZoom, OMIM, Nephromine

Other laboratory skills: PCR, immunoblotting, cell culture, immunofluorescence, myocardial infarct mouse model, myocardial ischemia/reperfusion, focal cerebral ischemia/reperfusion

Honors and Awards

2012-present Graduate Fellowship, Wake Forest University, NC, U.S.

2015 Runner-up, 15th Wake Forest Graduate Student and PostDoc Research Day Three Minute Thesis Competition

2014 Travel Award for 4th NIGMS-funded Short Course on Statistical Genetics and Genomics at UAB

2008-2010 National Outstanding College Student Innovation Research Grant, China

2008-2010 President of Student Association, Biotechnology Program, Soochow University, China

2007-2010 Integrative Courses Scholarship, Soochow University, China

2007-2009 President of Students' Scientific Association, School of Biology and Basic Medical Sciences, Soochow University, China

2006-2007 Outstanding Student Leadership, Soochow University, China

Membership

2014-present Member, American Society of Human Genetics

Conferences and Presentations

2016 ASHG 2016 Annual Meeting in Vancouver, Canada
Poster presentation: An exome-wide association study for type 2 diabetes-attributed end-stage kidney disease in African Americans

2015 5th Annual Triangle Statistical Genetics Conference, Durham, NC
Invited talk: An exome-wide sequencing study for type 2 diabetes-attributed end-stage kidney disease in African Americans

2015 ASHG 2015 Annual Meeting in San Diego, CA
Poster presentation: Variation in kidney structure-related genes in African Americans with type 2 diabetes-attributed end-stage kidney disease

2015 15th Wake Forest Graduate Student and PostDoc Research Day Three Minute Thesis Competition
Oral presentation: High throughput genetic testing for human diseases.

2015 14th Wake Forest Graduate Student and PostDoc Research Day
Poster presentation: Variation in kidney structure-related genes in African Americans with type 2 diabetes-attributed end-stage kidney disease

2014 4th Annual Triangle Statistical Genetics Conference, Durham, NC

2014 ASHG 2014 Annual Meeting in San Diego, CA
Poster presentation: An exome-wide sequencing study for type 2 diabetes-attributed kidney disease in African Americans.

2014 14th Wake Forest Graduate Student and PostDoc Research Day
Poster presentation: Genome wide association study on diabetic chronic kidney disease in African Americans.

2013 3rd Annual Triangle Statistical Genetics Conference, Durham, NC

Other Activities

2016 Ambassador of 2016 NC DNA Day

2014 Organizer of 2014 virtual conference on Genetics and Genomics at WFSM

2014 4th NIGMS-funded Short Course on Statistical Genetics and Genomics at UAB

Article Reviews

Journal of Medical Genetics

BMC Medical Genomics

European Archives of Psychiatry and Clinical Neuroscience

Publications

Meijian Guan, Jacob M. Keaton, BS, Latchezar Dimitrov, Pamela J. Hicks, Jianzhao Xu, John R. Sedor, Rulan S. Parekh, Denyse Thornley-Brown, FIND Consortium, Nora Franceschini, Joe Coresh, Myriam Fornage, Adrienne Tin, Anna Kottgen, Jerome I. Rotter, Stephen S. Rich, Ida Chen, James G Wilson, Laura J Rasmussen-Torvik, Carl Langefeld, Nicholette Allred, Barry I. Freedman, Donald W. Bowden, Maggie C. Y. Ng. “Genome-wide association study in African Americans with T2D-attributed end-stage kidney disease.” In preparation.

Meijian Guan, Jacob Keaton, Latchezar Dimitrov, Mary Stromberg, Pamela J. Hicks, Barry Freedman, Donald Bowden, Maggie C. Y. Ng. “An exome-wide association study for type 2 diabetes-attributed end-stage kidney disease in African Americans.” In preparation.

Meijian Guan, Jun Ma, Jacob Keaton, Latchezar Dimitrov, Poorva Mudgal, Mary Stromberg, Pamela J. Hicks, Barry Freedman, Donald Bowden, Maggie C. Y. Ng. “Association of genetic variations in kidney structure-related genes in African Americans with type 2 diabetes associated end-stage kidney disease.” Hum Genet (2016). doi:10.1007/s00439-016-1714-2.

Jun Ma*, Meijian Guan*, Donald W. Bowden, Maggie C. Y. Ng, Pamela J. Hicks, Janice P. Lea, Lijun Ma, MD, Chuan Gao, Nicholette D. Palmer, Barry I. Freedman. “Association analysis of the cubilin (CUBN) and megalin (LRP2) genes with end-stage kidney disease in African Americans.” Clin J Am Soc Nephrol, 2016 May 19. pii: CJN.12971215. *Co-first author.

Jun Ma, Jasmin Divers, Nicholette D. Palmer, Bruce A. Julian, Ajay K. Israni, David Schladt, Stephen O. Pastan, Kryt Chattrabhuti, Michael D Gautreaux, Vera Hauptfeld, Robert A Bray, Allan D Kirk, W Mark Brown, Robert S Gaston, Jeffrey Rogers, Alan C Farney, Giuseppe Orlando, Robert J Stratta, Meijian Guan, Amudha Palanisamy, Amber M Reeves-Daniel, Donald W Bowden, Carl D Langefeld, Pamela J Hicks, Lijun Ma and Barry I Freedman. “Deceased Donor Multidrug Resistance Protein 1 and Caveolin 1 Gene Variants May Influence Allograft Survival in Kidney Transplantation.” Kidney International, April 8, 2015. doi:10.1038/ki.2015.105.

Jason A. Bonomo, Meijian Guan, Maggie C.Y. Ng, Nicholette D. Palmer, Pamela J. Hicks, Jacob M. Keaton, Janice P. Lea, Carl D. Langefeld, Barry I. Freedman, Donald W. Bowden. “The ras responsive transcription factor RREB1 is a novel candidate gene for type 2 diabetes associated end-stage kidney disease.” Human Molecular Genetics, July 15, 2014. doi:10.1093/hmg/ddu362.

Lu, Chen, Tuanzhu Ha, Xiaohui Wang, Li Liu, Xia Zhang, Erinmarie Olson Kimbrough, Zhanxin Sha, Meijian Guan, John Schweitzer, John Kalbfleisch, David Williams, Chuanfu Li. “The TLR9 Ligand, CpG-ODN, Induces Protection against Cerebral Ischemia/reperfusion Injury via Activation of PI3K/Akt Signaling.” Journal of the American Heart Association, 3, no. 2 (2014): e000629. doi:10.1161/JAHA.113.000629.

Abstract

Guan MJ, Nautiyal M, Hakuda D, de Lima R, Rose J, Pirro NT, Chappell MC. Renal mitochondria predominantly express [des-Ang I]-angiotensinogen and renin [abstract]. FASEB J. 2013;27():909.5.
