BASIC RESEARCH www.jasn.org

Functional Genomic Annotation of Genetic Risk Loci Highlights Inflammation and Epithelial Biology Networks in CKD

Nora Ledo, Yi-An Ko, Ae-Seo Deok Park, Hyun-Mi Kang, Sang-Youb Han, Peter Choi, and Katalin Susztak

Renal Electrolyte and Hypertension Division, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania

ABSTRACT Genome-wide association studies (GWASs) have identified multiple loci associated with the risk of CKD. Almost all risk variants are localized to the noncoding region of the genome; therefore, the role of these variants in CKD development is largely unknown. We hypothesized that polymorphisms alter transcription factor binding, thereby influencing the expression of nearby genes. Here, we examined the regulation of transcripts in the vicinity of CKD-associated polymorphisms in control and diseased samples and used systems biology approaches to identify potentially causal genes for prioritization. We interrogated the expression and regulation of 226 transcripts in the vicinity of 44 single nucleotide polymorphisms using RNA sequencing and expression arrays from 95 microdissected control and diseased tubule samples and 51 glomerular samples. Gene expression analysis from 41 tubule samples served for external validation. Ninety-two transcripts in the tubule compartment and 34 transcripts in glomeruli showed statistically significant correlation with eGFR. Many novel genes, including ACSM2A/2B, FAM47E, and PLXDC1, were identified. We observed that the expression of multiple genes in the vicinity of any single CKD risk allele correlated with renal function, potentially indicating that genetic variants influence multiple transcripts. Network analysis of GFR-correlating transcripts highlighted two major clusters: a cluster of epithelial and vascular function genes that correlated positively with GFR and an inflammatory gene cluster that correlated inversely. In summary, our functional genomics analysis highlighted novel genes and critical pathways associated with kidney function for future analysis.

J Am Soc Nephrol 26: 692–714, 2015. doi: 10.1681/ASN.2014010028

Received January 8, 2014. Accepted July 8, 2014.
Published online ahead of print. Publication date available at www.jasn.org.
Present address: Dr. Sang-Youb Han, Department of Internal Medicine, Inje University, Ilsan-Paik Hospital, Goyang, Gyeonggi, South Korea.
Correspondence: Dr. Katalin Susztak, Perelman School of Medicine, University of Pennsylvania, 415 Curie Boulevard, 415 Clinical Research Building, Philadelphia, PA 19104. Email: [email protected]
Copyright © 2015 by the American Society of Nephrology
ISSN : 1046-6673/2603-692

Twenty million people suffer from CKD and ESRD in the United States. The risk of death significantly increases as kidney function (GFR) declines and it can be as high as 20% for patients with diabetes on hemodialysis.1 Diseases of the kidney and urinary tract are ranked 12th in the mortality charts (www.cdc.org), indicating their importance in public health.

CKD is a typical gene-environment disease. Several environmental factors play important roles in CKD development; diabetes and hypertension are the two most important causes, accounting for close to 75% of ESRD cases. In addition, CKD has a clear genetic component, because <20% of patients with diabetes or hypertension will actually develop kidney disease. At present, one of the most powerful experiments to understand the genetics of a complex trait such as CKD is the genome-wide association study (GWAS).2 These studies compare genetic variants in two groups of participants: people with the disease (patients) and similar people without the disease (controls). If a variant (single nucleotide polymorphism [SNP]) is more frequent in people with the disease, the SNP is said to be associated with the disease. GWASs, however, have several limitations. First, GWASs became possible, because the genetic information is inherited in fairly large blocks. Therefore, we do not have to test the association with each of the close to 20 million genetic variations but can use fewer (about 1 million) SNPs representing the genetic variation of larger genetic regions (called haplotype or linkage disequilibrium block). Although haplotype blocks made GWAS convenient and financially feasible, they also mean that we do not know which of the many variants within a single haplotype block is functionally relevant.

Furthermore, >83% of the disease-associated SNPs are localized to the noncoding region of the genome3; therefore, it is unclear how they induce illness. Recent reports from the Encyclopedia of DNA Elements (ENCODE) project indicate that most complex trait polymorphisms are localized to gene regulatory regions in target cell types.4 Disease-associated genetic variants can alter binding sites for important transcription factors and influence the expression of nearby genes.3,5–7 Genetic variants can potentially alter steady-state expression of genes, in which case they interfere with basal transcription factor binding, or they can alter the amplitude of transcript changes after signal-dependent transcription factor binding.

Here, we hypothesized that polymorphisms associated with renal disease will influence the expression of nearby transcripts in the kidney. We used genomics and systems biology approaches to investigate tissue-specific expression of transcripts and their correlation with kidney function.

RESULTS

CKD Risk-Associated Transcripts
By manual literature search, we identified all GWASs reporting genetic association for CKD-related traits (Supplemental Table 1). Many of these studies, however, used different parameters as kidney disease indicators. We included SNPs associated with eGFR (on the basis of serum creatinine or cystatin C calculations) or the presence of ESRD. Our analysis identified 10 publications meeting these criteria.8–25 Most publications did not differentiate cases on the basis of disease etiology and included cases with hypertensive and diabetic kidney disease. Coding polymorphisms and SNPs that did not reach genome-wide significance (P>5×10^-8) were excluded.26 Finally, 44 leading SNPs meeting all of these criteria were used for further analysis (Supplemental Table 2). Three SNPs associated only with diabetic CKD development were also analyzed separately; all other SNPs were from studies including both diabetic and nondiabetic cases. There were only two SNPs that reached genome-wide significance in multiple studies (rs12917707 and rs9895661). These two SNPs were counted only once.

Reports from the ENCODE project indicate that the majority (70%–80%) of the gene regulatory elements (promoters, enhancers, and insulators) are within 250 kb of the gene.3 Using these criteria, we identified 306 genes within 500 kb of 44 CKD SNPs. There was no gene within the 500-kb window around the rs12437854 SNP; therefore, 43 loci were followed. We called these transcripts CKD risk-associated transcripts (CRATs).

CRATs Are Enriched for Kidney-Specific Expression
We hypothesized that cells that express CRATs play an important role in controlling kidney function. Therefore, we determined expression levels of all CRATs in control (normal) human kidney samples (n=2) using comprehensive RNA sequencing analysis. We found that 41% of the CKD risk loci-associated transcripts showed high expression (upper quartile) and that only 6% of CRAT transcripts were not detectable in human kidney tubule samples (Supplemental Figure 1). Overall, we found that a large percentage of the CKD SNP neighboring transcripts (94%; 287 of 306) were expressed in the human kidney, indicating statistically significant kidney-specific enrichment compared with 44 randomly selected loci, where only 13% of the transcripts showed high expression and 16% of the nearby transcripts were not expressed in the kidney (P=1.25×10^-9). An analysis (david.abcc.ncifcrf.gov) to understand the tissue specificity of CRATs indicated specific and significant enrichment in the kidney and peripheral leukocytes (P value=0.0082 and P value=0.0014, respectively). Next, we compared absolute expression levels of CRATs by RNA sequencing in 16 different human organs using the Illumina Body Map database (www.ebi.ac.uk). The atlas confirmed the statistically significant kidney-specific expression enrichment of CRATs (Supplemental Figure 2). For example, the atlas highlighted the high and kidney-specific expression of Uromodulin (UMOD). In summary, expression of CRATs was enriched in the kidney and peripheral lymphocytes, potentially indicating the role of these cells in kidney disease development.

CRAT Expression in Normal and Diseased Human Kidney Glomerular Samples
We hypothesized that functionally important CRATs are not only expressed in relevant cell types (kidney and leukocytes) but that their expression level will change in CKD. To test this hypothesis, we analyzed gene expression levels in a large collection of microdissected human glomerular (n=51) and tubule (n=95) samples. Kidney samples were obtained from a diverse population (Supplemental Tables 3 and 4). Statistical analysis failed to detect ethnicity-driven gene expression differences (data not shown).

Transcript profiling was performed for each individual sample using Affymetrix U133v2 arrays. The data were processed using established pipelines, and they contained probe set identifications for 226 transcripts from the 306 original CRATs. We analyzed the expression levels of 226 CRATs in 51 microdissected glomerular samples (Supplemental Table 3). On the basis of the Chronic Kidney Disease Epidemiology Collaboration eGFR determination, we had 27 samples with normal renal function (eGFR>60 ml/min per 1.73 m2) and 24 samples with reduced GFR (eGFR<60 ml/min per 1.73 m2).27 This eGFR cutoff was used as a CKD definition in the included GWASs. To match the transcript data with GWAS cases, we included samples from patients with diabetic and hypertensive CKD.

Linear correlation analysis identified the association of 34 CRATs with eGFR (P<0.05) (Table 1). The correlation between the expression of seven CRATs and eGFR remained significant, even after Benjamini–Hochberg-based multiple testing correction. The expression of multiple novel transcripts showed excellent correlation with kidney function. For example, expression levels of Family with sequence similarity 47, member E (FAM47E) (Figure 1A), Plexin domain-containing 1 (PLXDC1) (Figure 1B), Vascular endothelial growth factor A (VEGFA) (Figure 1C), and Membrane-associated guanylate kinase, WW and PDZ domain-containing 2 (MAGI2) (Figure 1D) correlated with eGFR. In normal nondisease human kidney tissue, proteins coded by FAM47E (Figure 1E) and VEGFA (Figure 1G) were highly expressed in glomeruli. Immunostaining studies (from the Human Protein Atlas) showed that the protein encoded by PLXDC1 (also known as Tumor endothelial marker 7 [TEM7]) exhibited glomerular endothelial-specific expression in normal human kidney tissue (Figure 1F). MAGI2 seemed to have a podocyte-specific expression pattern (Figure 1H), potentially indicating its role in this cell type. Interestingly, FAM47E, PLXDC1, and MAGI2 have not been identified in GWASs as potential causal genes in the vicinity of CKD risk loci. We also separately examined the expression of diabetic CKD-associated transcripts and their correlation with glomerular gene expression (Supplemental Table 5). In summary, the analysis highlighted that the expression of several CRATs in glomeruli correlates with renal function.
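The Benjamini–Hochberg correction mentioned above can be illustrated with a short sketch. This is only a minimal illustration of the standard step-up procedure on made-up P values; it is not the authors' pipeline, and the function name `benjamini_hochberg` and the toy inputs are assumptions introduced here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Hypothetical helper for illustration; the paper does not describe
    its implementation.
    """
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy P values (not taken from the study).
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_adjusted = benjamini_hochberg(toy_p)
significant = [p < 0.05 for p in toy_adjusted]
```

A transcript would then be called significant after correction when its adjusted value falls below the chosen threshold (0.05 in the text).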

Table 1. Expression levels of 34 transcripts (CRATs) in glomeruli showed significant correlation with eGFR

Gene Symbol       Pearson R   95% Confidence Interval   P Value (Two-Tailed)   P Value (Corrected)
CTSS              -0.501      -0.68 to -0.26            1.8×10^-4              0.0426
FAM47E///STBD1    0.496       0.26 to 0.68              2.1×10^-4              0.0426
FYB               -0.478      -0.67 to -0.23            3.9×10^-4              0.0427
LTB               -0.471      -0.66 to -0.23            4.8×10^-4              0.0427
EHBP1L1           -0.466      -0.66 to -0.22            5.7×10^-4              0.0427
MFAP4             -0.462      -0.65 to -0.21            6.4×10^-4              0.0427
MICALL2           -0.457      -0.65 to -0.21            7.5×10^-4              0.0428
CTSK              -0.428      -0.63 to -0.17            1.7×10^-3              0.077
PLXDC1            -0.417      -0.62 to -0.16            2.3×10^-3              0.085
F12               -0.390      -0.60 to -0.13            4.6×10^-3              0.154
VEGFA             0.372       0.11 to 0.59              7.2×10^-3              0.222
PCOLCE            -0.369      -0.59 to -0.10            7.8×10^-3              0.222
MYCN              0.364       0.10 to 0.58              8.6×10^-3              0.231
GP2               0.356       0.09 to 0.58              0.010                  0.235
SLC34A1           0.356       0.09 to 0.58              0.010                  0.235
EFHD2             -0.355      -0.58 to -0.09            0.011                  0.235
LST1              -0.352      -0.57 to -0.09            0.011                  0.236
MICB              -0.345      -0.57 to -0.08            0.013                  0.25
LRCH4             -0.336      -0.56 to -0.07            0.016                  0.275
ANXA9             0.334       0.07 to 0.56              0.017                  0.275
MAGI2             0.321       0.05 to 0.55              0.022                  0.336
ACSM5             0.313       0.04 to 0.54              0.025                  0.375
SLC22A3           -0.308      -0.54 to -0.04            0.028                  0.387
PSRC1             -0.304      -0.54 to -0.03            0.030                  0.398
UMOD              0.301       0.03 to 0.53              0.032                  0.401
ALDH3A2           0.298       0.02 to 0.53              0.034                  0.401
GATM              0.297       0.02 to 0.53              0.035                  0.401
SORT1             0.294       0.02 to 0.53              0.036                  0.401
AGMAT             0.293       0.02 to 0.53              0.037                  0.401
DRAP1             0.293       0.02 to 0.53              0.037                  0.401
CASP9             0.289       0.01 to 0.52              0.040                  0.401
KDM5A             -0.287      -0.52 to -0.01            0.041                  0.401
SLC6A13           0.286       0.01 to 0.52              0.042                  0.401
HLA-C             -0.284      -0.52 to -0.01            0.044                  0.406

Pearson product moment correlation coefficient (Pearson R) was used to measure the strength of association between gene expression and eGFR. Two-tailed test was used to determine the statistical significance. With Benjamini–Hochberg multiple testing correction, seven transcripts showed significant correlation with eGFR (P corrected<0.05). Gene symbols are official symbols approved by the Human Genome Organization Gene Nomenclature Committee (HGNC).
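As a reading aid for the confidence intervals in Table 1, the sketch below derives an approximate 95% interval for a Pearson R from the coefficient and the sample size. It assumes the intervals were obtained with Fisher's z-transformation, which the source does not state; the function name `pearson_ci` is hypothetical. Under that assumption, the CTSS row (R=-0.501, 51 glomerular samples) is reproduced.

```python
import math
from statistics import NormalDist


def pearson_ci(r, n, level=0.95):
    """Approximate confidence interval for Pearson r via Fisher's z-transform.

    Assumption for illustration: the paper does not say how its
    intervals were computed.
    """
    z = math.atanh(r)                 # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# CTSS in glomeruli: R = -0.501 with n = 51 samples (values from Table 1).
low, high = pearson_ci(-0.501, 51)
print(f"{low:.2f} to {high:.2f}")  # prints -0.68 to -0.26
```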


Figure 1. Correlation between CRAT expression in glomeruli and renal function. The y axis shows the relative normalized glomerular expressions of (A) FAM47E, (B) PLXDC1, (C) VEGFA, and (D) MAGI2. The x axes show the eGFR for each sample. Each dot represents one individual microdissected glomerular sample. The lines represent the fitted linear correlation values. Immunohistochemistry shows the protein expression in human glomeruli ([E] FAM47E, [F] PLXDC1, [G] VEGFA, and [H] MAGI2). Scale bars, 100 μm. Reprinted from www.proteinatlas.org.

CRAT Expression in Normal and Diseased Human Kidney Tubule Samples
Next, we analyzed the expression levels of 226 CRATs in 95 microdissected human kidney tubule samples. Samples were obtained from patients with a wide range of kidney function (Supplemental Table 4): 56 samples with normal eGFR (eGFR>60 ml/min per 1.73 m2) and 39 samples with kidney disease (eGFR<60 ml/min per 1.73 m2). We performed a binary analysis by comparing the expression levels of CRATs in control versus CKD samples. Using statistical correction for multiple testing (Benjamini–Hochberg corrected P value<0.05), 73 CRATs (from 226 CRATs) showed differential expression when CKD tubule samples were compared to controls (Supplemental Table 6).

We also looked for linear correlation between CRATs and renal function. Pearson correlation identified 92 transcripts with statistically significant (P<0.05) linear correlation with kidney function (Table 2). The correlation between the expression of 70 CRATs and eGFR remained significant, even after Benjamini–Hochberg-based multiple testing correction. More transcripts (58%) showed a positive correlation with renal function (i.e., their expression was decreased in samples with lower GFR), whereas 42% showed an inverse correlation. Renal function correlated with the expression of 25 CRATs both in glomeruli and tubules. Tubule-specific expression of solute carriers had the strongest correlation with renal function. For example, the levels of Solute carrier family 34, member 1 (SLC34A1), which codes a type II sodium/phosphate cotransporter, and SLC7A9, which codes the light chain of an amino acid transporter (Figure 2, A and B), correlated strongly with eGFR (with R values of 0.61 and 0.59, respectively). Both transcripts encode proteins that are highly and specifically expressed in renal tubule epithelial cells (Figure 2, D and E). In addition to solute carriers, the expression of a metabolic enzyme, Acyl-CoA synthetase medium chain family member 5 (ACSM5), also highly correlated with renal function and showed high protein expression in tubule epithelial cells (Figure 2, C and F).

For external validation, we used a gene expression dataset containing genome-wide transcription profiling from 41 microdissected tubule samples. The clinical characteristics of these samples are described in Supplemental Table 7. Samples in this dataset were different from the primary dataset, and a slightly different method was used for microarray probe labeling. Although this dataset was much smaller with a narrower GFR range, we confirmed the significant linear correlation of 51 transcripts, highlighting the importance of these CRATs (Table 2). Next, we also specifically examined the correlation of the diabetic CKD-associated polymorphisms (rs12437854, rs7583877, and rs1617640) and transcript changes only in diabetic kidney disease (Supplemental Table 5). The analysis highlighted that Procollagen C-endopeptidase enhancer (PCOLCE) and Thyroid hormone receptor interactor 6 (TRIP6) in the vicinity of diabetic CKD SNPs correlate with kidney function. In summary, the gene expression and kidney function correlation analysis underscored CRATs for future prioritization.
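The linear correlation analysis described in this section can be sketched in a few lines. The example below computes a Pearson R from paired eGFR and expression values and converts a correlation into the usual t statistic with n-2 degrees of freedom. The eGFR and expression numbers are invented toy data, the function names are hypothetical, and the source does not state which test produced its two-tailed P values, so the t-test is an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Toy data (hypothetical): eGFR in ml/min per 1.73 m2 and normalized expression.
egfr = [95.0, 80.0, 62.0, 45.0, 30.0, 15.0]
expression = [9.1, 8.7, 8.0, 7.4, 6.9, 6.1]
r_toy = pearson_r(egfr, expression)

# t statistic implied by the reported SLC34A1 correlation (R = 0.610, n = 95).
t_slc34a1 = t_statistic(0.610, 95)
```

A positive R here corresponds to the situation described in the text, where expression is lower in samples with lower GFR.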


Table 2. In tubules, expression levels of 92 transcripts (CRATs) showed significant correlation with eGFR

Gene Symbol         Validation   Pearson R   95% Confidence Interval   P Value (Two-Tailed)   P Value (Corrected)
SLC34A1             a            0.610       0.47 to 0.72              5.3×10^-11             2.1×10^-8
SLC7A9              a            0.588       0.44 to 0.71              3.6×10^-10             7×10^-8
ACSM5               a            0.551       0.39 to 0.68              7.3×10^-9              9.6×10^-7
FYB                 a            -0.531      -0.66 to -0.37            3×10^-8                2.4×10^-6
ACSM2A///ACSM2B     b            0.526       0.36 to 0.66              4.3×10^-8              2.4×10^-6
NAT8B               b            0.518       0.35 to 0.65              7.4×10^-8              3.3×10^-6
ALDH3A2             a            0.517       0.35 to 0.65              8.3×10^-8              3.3×10^-6
LTB                 a            -0.517      -0.65 to -0.35            8.3×10^-8              3.3×10^-6
LST1                a            -0.514      -0.65 to -0.35            1×10^-7                3.6×10^-6
UMOD                a            0.504       0.34 to 0.64              1.9×10^-7              6.1×10^-6
ACAD10              b            0.485       0.31 to 0.63              6.5×10^-7              1.8×10^-5
DNAJC16                          0.478       0.31 to 0.62              9.7×10^-7              2.5×10^-5
GSTM4               a            0.474       0.30 to 0.62              1.2×10^-6              3×10^-5
SLC6A13             b            0.469       0.30 to 0.61              1.7×10^-6              3.3×10^-5
VEGFA               a            0.468       0.29 to 0.61              1.7×10^-6              3.3×10^-5
CTSS                a            -0.468      -0.61 to -0.29            1.8×10^-6              3.3×10^-5
ANXA9               a            0.464       0.29 to 0.61              2.2×10^-6              3.9×10^-5
SLC6A12                          0.454       0.28 to 0.60              3.8×10^-6              6×10^-5
FAM47E///STBD1      a            0.444       0.27 to 0.59              6.5×10^-6              9.5×10^-5
SLC47A1             a            0.444       0.27 to 0.59              6.5×10^-6              9.5×10^-5
NAT8///NAT8B        b            0.441       0.26 to 0.59              7.7×10^-6              1.1×10^-4
ALDH2               b            0.440       0.26 to 0.59              8.1×10^-6              1.1×10^-4
CERS2                            0.436       0.26 to 0.59              1×10^-5                1.2×10^-4
STC1                b            0.431       0.25 to 0.58              1.3×10^-5              1.6×10^-4
APOM                a            0.428       0.25 to 0.58              1.5×10^-5              1.8×10^-4
DAB2                a            0.410       0.23 to 0.57              3.7×10^-5              3.9×10^-4
AGMAT               a            0.397       0.21 to 0.56              6.7×10^-5              6.9×10^-4
GATM                a            0.393       0.21 to 0.55              8×10^-5                7.8×10^-4
SLC22A2                          0.393       0.21 to 0.55              8.3×10^-5              7.9×10^-4
FAM89B              b            -0.391      -0.55 to -0.21            8.9×10^-5              8.1×10^-4
SLC30A4                          0.382       0.20 to 0.54              1×10^-4                1.1×10^-3
MYCN                a            0.376       0.19 to 0.54              2×10^-4                1.4×10^-3
AIF1                b            -0.352      -0.52 to -0.16            5×10^-4                3.4×10^-3
TRIP6               b            0.352       0.17 to 0.52              5×10^-4                3.5×10^-3
LARP4B                           0.347       0.16 to 0.51              6×10^-4                4×10^-3
GPER                a            0.347       0.16 to 0.51              6×10^-4                4×10^-3
LRCH4                            -0.342      -0.51 to -0.15            7×10^-4                4.7×10^-3
SLC22A1                          0.342       0.15 to 0.51              7×10^-4                4.7×10^-3
FAM193B                          -0.338      -0.51 to -0.15            8×10^-4                5.4×10^-3
CTSK                a            -0.335      -0.50 to -0.14            9×10^-4                5.9×10^-3
IGF2R                            0.321       0.13 to 0.49              1.5×10^-3              9.5×10^-3
DDI2///RSC1A1                    0.320       0.13 to 0.49              1.6×10^-3              9.9×10^-3
PLXDC1              a            -0.315      -0.49 to -0.12            1.9×10^-3              0.012
TFDP2               b            0.313       0.12 to 0.48              2×10^-3                0.012
BAG6                             0.311       0.12 to 0.48              2.1×10^-3              0.013
EHBP1L1             a            -0.311      -0.48 to -0.12            2.1×10^-3              0.013
CTSW                             -0.309      -0.48 to -0.12            2.3×10^-3              0.013
ATXN2                            0.309       0.12 to 0.48              2.3×10^-3              0.013
ERBB2               b            0.308       0.11 to 0.48              2.4×10^-3              0.013
PHTF2               b            -0.307      -0.48 to -0.11            2.5×10^-3              0.013
TBX2                             0.305       0.11 to 0.48              2.7×10^-3              0.014
MICB                a            -0.303      -0.48 to -0.11            2.8×10^-3              0.014
SIPA1                            -0.298      -0.47 to -0.10            3.3×10^-3              0.017
PRUNE                            0.298       0.10 to 0.47              3.4×10^-3              0.017
CCDC85B             b            -0.287      -0.46 to -0.09            4.9×10^-3              0.023
GRK6                             -0.287      -0.46 to -0.09            4.9×10^-3              0.023
PIP5K1A                          0.278       0.08 to 0.46              6.3×10^-3              0.029
LTBP3               b            -0.276      -0.45 to -0.08            6.9×10^-3              0.032
CASP9               b            0.273       0.08 to 0.45              7.5×10^-3              0.034
NSD1                             0.272       0.07 to 0.45              7.7×10^-3              0.034
SOX11                            -0.271      -0.45 to -0.07            7.8×10^-3              0.034
DUOX1                            -0.269      -0.45 to -0.07            8.4×10^-3              0.036
AFF3                             -0.267      -0.45 to -0.07            8.8×10^-3              0.036
NCR3                             -0.267      -0.45 to -0.07            8.8×10^-3              0.036
FIBP                             0.265       0.07 to 0.44              9.6×10^-3              0.039
GNAI3                            -0.263      -0.44 to -0.07            0.010                  0.040
FOSL1                            0.261       0.06 to 0.44              0.011                  0.042
MPPED2                           0.259       0.06 to 0.44              0.011                  0.044
MFAP4               a            -0.258      -0.44 to -0.06            0.012                  0.045
B9D1                             -0.254      -0.43 to -0.06            0.013                  0.049
SLC12A9                          0.254       0.06 to 0.43              0.013                  0.050
CFL1                             -0.250      -0.43 to -0.05            0.015                  0.055
CST8                             -0.248      -0.43 to -0.05            0.015                  0.057
DBN1                a            -0.242      -0.42 to -0.04            0.018                  0.067
CDC42SE1            a            -0.242      -0.42 to -0.04            0.018                  0.067
GP2                 a            0.233       0.03 to 0.42              0.023                  0.079
MYH9                b            -0.233      -0.42 to -0.03            0.023                  0.079
IDI1                             0.233       0.03 to 0.42              0.023                  0.079
MLLT11              b            -0.233      -0.42 to -0.03            0.023                  0.079
DACH1                            0.229       0.03 to 0.41              0.026                  0.087
DDX1                             0.226       0.03 to 0.41              0.028                  0.093
ACSM1                            -0.225      -0.41 to -0.03            0.028                  0.094
PDLIM7              b            -0.225      -0.41 to -0.02            0.029                  0.094
RELA                             -0.224      -0.41 to -0.02            0.029                  0.096
SORT1                            0.221       0.02 to 0.41              0.031                  0.10
SLC28A2             a            0.221       0.02 to 0.41              0.031                  0.10
EPN2                b            -0.213      -0.40 to -0.01            0.038                  0.119
CELA2A///CELA2B     a            -0.213      -0.40 to -0.01            0.039                  0.120
CUX2                             -0.212      -0.40 to -0.01            0.040                  0.122
PTPN12              b            -0.211      -0.40 to -0.01            0.040                  0.122
MICALL2             a            -0.206      -0.39 to -<0.004          0.046                  0.137
MKKS                             0.204       <0.01 to 0.39             0.048                  0.141

Pearson product moment correlation coefficient (Pearson R) was used to measure the strength of association between gene expression and eGFR. Two-tailed test was used to determine the statistical significance. Seventy transcripts showed significant correlation with eGFR (P corrected<0.05) after Benjamini–Hochberg-based multiple testing correction. Gene symbols are official symbols approved by the Human Genome Organization Gene Nomenclature Committee (HGNC).
a The gene expression significantly correlated with eGFR in the external validation microarray dataset containing 41 tubule samples and the correlation remained significant after multiple testing correction.
b The gene expression significantly correlated with eGFR in the external validation microarray dataset containing 41 tubule samples.

Figure 2. Correlation between CRAT expression in tubules and renal function. Expressions of (A) SLC34A1, (B) SLC7A9, and (C) ACSM5 correlate with eGFR in tubule samples. The x axes represent eGFR (ml/min per 1.73 m2), whereas the y axes represent the normalized gene expression values of the transcript. Each dot represents transcript levels and eGFR values from a single kidney sample. The lines are the fitted correlation values. Immunohistochemistry shows tubular-specific expression of (D) SLC34A1, (E) SLC7A9, and (F) ACSM5. Scale bars, 100 μm. Reprinted from www.proteinatlas.org.

Transcript Levels around the UMOD Locus
We specifically further investigated expression changes of the UMOD transcript, because it is a potential causal gene underlying the polymorphism of some of the best characterized CKD-associated SNPs on chromosome 16 (rs12917707, rs4293393, and rs11864909). This gene encodes one of the most abundant proteins in human urine: Uromodulin or Tamm–Horsfall protein. Furthermore, functional studies seem to link UMOD expression both as a biomarker and a causal gene for CKD development.28 We found that UMOD transcript levels showed a highly significant linear correlation with renal function (P=1.9×10^-7) in tubule samples (Figure 3A). Figure 3, B–E, shows results of immunohistochemistry staining from samples used for the transcriptomic analysis, indicating the excellent correlation between uromodulin protein expression and its transcript levels.

Although UMOD has emerged as an important causal gene for CKD, unexpectedly, we found that three other nearby genes were also highly expressed in renal tubules, and their expression strongly correlated with eGFR. To illustrate this observation, Figure 4, A–C shows the locus, including three leading SNPs (rs12917707, rs4293393, and rs11864909) with polymorphisms that best correlate with CKD. Closest genes to these polymorphisms are UMOD and PDILT (Protein disulfide isomerase-like, testis expressed). The expression level of one of the flanking genes, PDILT, was nearly undetectable, but our RNA sequencing analysis confirmed high UMOD transcript levels in human kidney samples (Figure 4B). Unfortunately, PDILT probes were absent from the human U133 chips, and therefore, the correlation between PDILT and renal function could not be analyzed. However, we observed that ACSM5 (Figure 2, C and F) and ACSM2A/B were highly expressed in human kidney tubule samples (Figure 4, A and B). Furthermore, we also validated the transcript expression of UMOD, Glycoprotein 2 (GP2), ACSM5, ACSM2A/B, ACSM1, and PDILT by quantitative real-time RT-PCR (QRT-PCR) (Supplemental Table 8) to confirm the microarray results (Figure 4C). ACSM5 and ACSM2A/2B (ACSM family members) are three genes in the α-fatty acid oxidation pathway. Interestingly, these transcripts not only showed high expression in the kidney, but also, their expression strongly correlated with renal function (Figure 3, F–J), potentially indicating a functional role for these transcripts. Similar results are shown for a chromosome 5 region (Figure 4, D–F). In summary, our observations showed that the expression of multiple genes around a single locus correlated with kidney function, potentially indicating that the regulation of these genes could be linked.

Next, we examined whether in the proximity of a single SNP we could observe changes in expression of a single gene or multiple genes. We found that, on 23 of 43 examined CKD risk loci, multiple neighboring transcripts correlated with renal function (Supplemental Figure 3). For example, at the SLC47A1 locus (rs2453580), not only the SLC47A1 (multidrug extrusion protein) but also, the Aldehyde dehydrogenase 3 family, member A2 (ALDH3A2) correlated with eGFR (P=8×10^-8). We found that, around the rs267734 polymorphism on chromosome 1, both Ceramide synthase 2 (CERS2; P=1.01×10^-5) and Annexin A9 (ANXA9) (P=2.2×10^-6) transcripts correlated with eGFR in tubule samples. In addition, transcript level of Cathepsin S (CTSS) correlated with renal function both in tubules and glomeruli (P=1.77×10^-6 and P=1.8×10^-4, respectively). On chromosome 5 at the rs11959928 polymorphism, both Disabled homolog 2 (DAB2, a putative mitogen-responsive phosphoprotein) and FYN binding protein (FYB) showed strong correlation with renal function (P=3.68×10^-5 and P=3×10^-8, respectively) (Figure 4, D–F). On the basis of renal expression and renal function association, we could prioritize potential target and/or causal genes for CKD development for 39 of 44 examined loci. As mentioned earlier, there was no gene around rs12437854, and the only nearby gene (WDR72) around rs491567 had no probe on the human U133 chips, albeit WDR72 is highly expressed in human kidney (Supplemental Table 2) by RNA sequencing analysis. No nearby transcript showed association with renal function for three SNPs (rs1394125, rs7805747, and rs4744712) (Supplemental Figure 3). The correlation between these loci and kidney function would need to be re-evaluated.

Taken together, we identified 104 transcripts of 226 CRATs showing significant correlation with eGFR. We could highlight genes for further prioritization for 39 of 44 loci (89%). Using UMOD, ACSM2A, and VEGFA genes as examples, we showed that these expression changes likely correlate with protein


levels. Our results also suggest that not only the closest gene but also, several genes in the close vicinity correlate highly with renal function, indicating their potential importance and their potential coregulation.

Figure 3. UMOD and ACSM2A expressions correlate with renal function. The expressions of (A) UMOD and (F) ACSM2A correlate with eGFR in tubule samples. The x axes represent eGFR (ml/min per 1.73 m2), whereas the y axes represent the normalized gene expression values of the transcript. Each dot represents transcript levels and eGFR values from a single kidney sample. The lines are the fitted correlation value (Pcorr, P value after Benjamini–Hochberg multiple testing correction). Immunohistochemistry of the samples with low and high mRNA expression showed differences of (B–E) the UMOD and (G–J) the ACSM2A expression on protein level. Scale bars, 50 μm.

Expressions Quantitative Trait Loci Highlight CKD Candidate Genes
Polymorphisms associated with kidney function can also directly control baseline transcript levels in disease-relevant cell types. To examine whether CKD risk SNPs influence local transcript levels (in cis; i.e., within 1-Mb distance), we examined multiple different datasets where genotype and gene expression correlation data were available. These datasets included the MuTHER (Multiple Tissue Human Expression Resource) and other studies,29–33 where transcript levels were available from liver, adipose, and lymphoblastoid samples. Cis-expressions quantitative trait loci (cis-eQTLs) often can be detected in multiple tissues. We found that 4 SNPs from the previously identified 44 leading SNPs and 16 SNPs in their linkage disequilibrium (r2≥0.8) acted as cis-eQTLs for 11 different transcripts (P<0.05) (Supplemental Table 9). Four of these transcripts (33%) were outside of the 500-kb window that we used to identify CRATs. All 11 transcripts were at least moderately expressed in human kidney tissue. Only one transcript, CLTB (Clathrin, light chain B), showed significant linear correlation with eGFR in glomerulus samples (P=0.016). Another transcript, CERS2 (also known as LASS2), showed variation in gene expression in lymphoblastoid tissue on the basis of the rs267734 and rs267738 genotypes. Furthermore, CERS2 was differentially expressed in CKD and highly correlated with eGFR (Table 2 and Supplemental Table 6), making it a potential candidate gene for CKD development.

Unfortunately, we did not have genomic DNA from all analyzed kidney samples to examine genotype and gene expression correlations, but we genotyped 21 control (eGFR>85 ml/min per 1.73 m2) samples for the rs881858 polymorphism. In the same samples, tubule-specific VEGFA transcript levels were determined by QRT-PCR. Tubule-specific VEGFA transcript levels were lower in patients who were homozygous for the major allele on the rs881858 locus compared to heterozygous or minor allele homozygous samples (Figure 5A). Glomerular or tubule-specific VEGFA transcript and protein expression level correlated with GFR (Figures 1C and 5, B–D). These results


Figure 4. Tubule-specific transcript levels correlate with renal function near the UMOD locus (rs4293393, rs12917707, and rs11864909 polymorphisms) and the disabled homolog 2 (DAB2) locus (rs11959928). The x axes represent the genomic positions of each gene on chromosomes (A and C) 16 and (D and F) 5. The y axes represent the negative logarithms of the corrected P values (significances) between the expressions of each gene and eGFR (ml/min per 1.73 m2). (A and D) Color coding represents the baseline expression of the

indicate that the rs881858 polymorphism likely influences VEGF transcript levels and that VEGFA could be an important CKD candidate gene. Additionally, we examined whether the genetic polymorphism (rs6420094) on chromosome 5 around SLC34A1 influences transcript expression. We found that tubule-specific SLC34A1 expression was significantly higher in patients who were homozygous for the major allele on the rs6420094 locus compared with heterozygous or minor allele homozygous samples (Supplemental Figure 4A).

Transcription factor binding analysis around 44 CKD risk-associated polymorphisms indicated that, at 42 loci, multiple binding motifs are altered by the genetic variants (Supplemental Table 10). This altered transcription factor binding can potentially highlight the mechanism of CKD risk variant-associated disease development.

CRATs Form Networks Highlighting the Role of Inflammation and Epithelial Biology
Table 3 summarizes our results and provides a comprehensive evaluation of the loci and transcripts studied here. Finally, we examined whether the 104 renal function-correlating CRATs (either in tubule or glomerular samples) in the neighborhood of 39 CKD risk loci show relatedness and can form a network. The network analysis was performed separately on genes that showed positive or negative correlation with kidney function. Genes showing negative correlation with kidney function (higher expression in CKD) clustered at the TNF-α, TGF-β1, and NF-κB/RelA regulatory nodes (Figure 6A). Most members of this cluster are known to play a role in immune function and regulation of inflammation. The second cluster (transcripts with expression that positively correlated with kidney function) centered at VEGFA and EGF receptor 2 molecules (Figure 6B). As indicated by their name, these molecules play important roles in maintaining epithelial and endothelial functions. In summary, network analysis highlighted the relatedness of the regulated genes and the potential role of epithelial cell biology and inflammation in CKD.

DISCUSSION
Understanding complex trait development is a formidable challenge. The first step is to understand the genetic architecture of the disease. Initial GWASs have provided the first glimpse of critical regions in the genome with variations that are associated with kidney function. The second step is to identify transcripts that are regulated by SNPs. The working hypothesis in the field is that causal polymorphisms alter transcription factor binding, causing changes in transcript levels in target cell types and inducing disease in specific organs. Because there are hundreds of genetic variants associated with disease development, analyzing variants individually is a daunting task. Recently, two complementary methods have been developed and successfully applied to identify genes that are targets of the polymorphisms. The first method uses the transcript levels as quantitative traits to identify polymorphisms that influence their levels (eQTL).34 To perform such analysis, a large human tissue bank from target cell types is necessary where both genetic polymorphisms and transcript levels are analyzed. The second (newer) method uses the cell type-specific cellular epigenome for regulatory element annotation and identifies target transcripts that are associated with genetic variants.35 A critical limitation of these methods is that they only identify transcripts that are influenced by a basal transcription factor, because these datasets are generated from control healthy samples. However, it is possible that polymorphisms control transcription factor binding sites for signal-dependent transcription factors. This would mean that the expression of a CKD-causing gene is not altered at baseline but shows differences under stress conditions. Figure 7 summarizes the concept underlying this work. Here, we performed the initial level of such analysis by examining the correlation between transcripts in the vicinity of CKD SNPs and GFR.

On the basis of recent observations that close to 90% of target transcripts are within 250 kb of the polymorphism, we defined 306 CRATs. Most prior studies focused on the two flanking genes, ignoring transcripts that are farther away.8,9 These 306 CRATs could be important for future studies as potential candidates for CKD development. We determined their baseline expression patterns using highly accurate RNA sequencing methods. Their strong enrichment in the kidney supports their functional role. However, it also highlighted that two separate cell types are likely important for CKD development: the kidney and peripheral leukocytes. This finding is supported by both network analysis and tissue-specific gene expression analysis. Mechanistic studies shall determine the role of these cells in CKD development. Diabetic and hypertensive renal disease are considered nonimmune-mediated renal diseases; however, this dogma might need to be revisited.
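The ±250-kb window used to define CRATs can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Gene` record, the function name `crats_for_snp`, and all gene coordinates are hypothetical and invented for illustration, and counting any gene that overlaps the window (rather than, say, only its transcription start site) is an assumption. Only the SNP position for rs881858 is taken from Table 3.

```python
from dataclasses import dataclass

WINDOW_BP = 250_000  # half-width of the window around each SNP (from the text)


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int  # gene start coordinate (bp)
    end: int    # gene end coordinate (bp)


def crats_for_snp(snp_chrom, snp_pos, genes, window=WINDOW_BP):
    """Return symbols of genes overlapping [snp_pos - window, snp_pos + window]."""
    lo, hi = snp_pos - window, snp_pos + window
    return [
        g.symbol
        for g in genes
        if g.chrom == snp_chrom and g.start <= hi and g.end >= lo
    ]


# Hypothetical gene coordinates, invented for illustration only.
example_genes = [
    Gene("GENE_A", "6", 43_700_000, 43_720_000),  # inside the window
    Gene("GENE_B", "6", 44_100_000, 44_150_000),  # beyond +250 kb
    Gene("GENE_C", "6", 44_050_000, 44_070_000),  # overlaps the window edge
    Gene("GENE_D", "5", 43_700_000, 43_720_000),  # different chromosome
]

# SNP position for rs881858 as listed in Table 3 (chromosome 6, 43806609).
print(crats_for_snp("6", 43_806_609, example_genes))
```

With these made-up coordinates, only GENE_A and GENE_C fall in the window; narrowing `window` shows how a flanking-gene-only approach would miss more distant transcripts.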

transcripts in human kidney on the basis of the RNA sequencing data. Red, high expression in the kidney; yellow, medium expression in the kidney; green, low expression in the kidney. (B and E) On the basis of the results of the Illumina Body Map (www.ebi.ac.uk), a heat map was generated from the FPKM values of the CRATs near these SNPs. High expression values (90th percentile) are marked red, and low expression values (<10th percentile) are marked blue. Expressions with FPKM values <0.1 are marked white. *Genes without probe set identifications on the Affymetrix arrays. QRT-PCR validation confirmed the significant correlation with eGFR of the following transcripts: (C) glycoprotein 2 (GP2), UMOD, ACSM5, ACSM2A, and ACSM2B and (F) FYN binding protein (FYB) and DAB2. Panel A shows a strong correlation between UMOD expression and eGFR, whereas the expressions of ACSM5 and ACSM2A/2B also highly correlate with renal function. (D) At the rs11959928 locus, not only the transcript DAB2 but also FYB shows high correlation with eGFR (PDILT, protein disulfide isomerase-like, testis expressed; C9, complement component 9).
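The y axes in Figure 4 show negative logarithms of P values after Benjamini-Hochberg correction. The sketch below shows one common way to compute Benjamini-Hochberg adjusted P values and their negative base-10 logarithms. It is an illustration rather than the GeneSpring GX implementation used in the study; the function names and the input P values are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def neg_log10(p):
    """Negative base-10 logarithm, as plotted on the y axes."""
    return -math.log10(p)


# Hypothetical raw P values for five transcripts at one locus.
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
corrected = benjamini_hochberg(raw)
significant = [p < 0.05 for p in corrected]
print(corrected, significant, [round(neg_log10(p), 2) for p in corrected])
```

In this made-up example, the third and fourth transcripts have raw P values below 0.05 but no longer pass the threshold after correction.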


Figure 5. The expression of VEGFA correlates with renal function. The expression of VEGFA is significantly lower (*P=0.025) in samples homozygous for A alleles (A/A; n=7) at the rs881858 locus compared with samples with minor alleles (A/G; n=7 or G/G; n=7) at this locus. (A) Only control samples (eGFR>85 ml/min per 1.73 m²) were used for the analysis. (B) Microarray-based transcript levels of VEGFA correlate with renal function in tubule samples (R²=0.219, P=1.7×10⁻⁶). (C) QRT-PCR–based VEGFA transcript levels (R²=0.228, P=7.8×10⁻⁴) confirm its correlation with kidney function. (D) VEGFA protein expression (by immunohistochemistry) correlates with transcript levels. Counterstained with hematoxylin. Scale bars, 50 μm.

The highlight of our work is the identification of novel genes in the vicinity of CKD-associated SNPs that show strong correlation with kidney function; thereby, they are potential candidates for CKD development (for example, FAM47E, PLXDC1, ACSM2A/B, ACSM5, and MAGI2). PLXDC1 (previously known as tumor endothelial marker 7) is primarily associated with angiogenesis in the cancer field, including kidney cancers.36 Recently, its increased expression in diabetic retinopathy has been reported.37 We found that the MAGI2 expression correlates with renal function in glomeruli. Although MAGI2 is expressed in the […], MAGI2 expression is enriched in podocytes.38 Given the critical role of podocytes in kidney disease development, this gene could be an important candidate. The expression of CERS2 not only correlates with kidney function, but in other tissues, CERS2 levels are strongly influenced by a nearby polymorphism, making this gene a very strong CKD candidate.

Our analysis highlighted a large number of novel genes located in the vicinity of CKD-related GWAS hits; these genes can be the target of additional analysis. A critically important observation of the work is that the expression of more than one gene correlated with eGFR on a single genetic locus. We illustrated this observation on the chromosome 16 locus, where not only UMOD but also a cluster of ACSM genes (ACSM2A/B and ACSM5) showed association with eGFR. Because ACSM2A/B and ACSM5 are part of the same fatty acid oxidation pathway, the examination of this pathway warrants additional scrutiny. This interesting coregulatory pattern was present for most of the CKD GWAS SNPs, potentially indicating that a single polymorphism can control the expression of multiple genes. These observations could indicate that a SNP may not just regulate a single gene but may cause the differential regulation of an entire gene cluster.

Our analysis emphasized the importance of small expression differences in many genes in CKD, but these genes do not seem to be independent; instead, they form organized clusters and pathways. We identified two major clusters. One of them centered at epithelial and VEGF signaling. These genes show a linear correlation with kidney function, likely indicating the relationship between epithelial and vascular integrity in progressive nephropathy. The second cluster highlighted TNF and TGF-β1; these genes are known to play important roles in inflammation and fibrosis. Expressions of these transcripts showed an inverse correlation with renal function, indicating an increased expression of these genes in CKD. Functional experiments support our findings. Increased inflammation and destruction of functioning epithelial cells are cornerstones of fibrosis development.39,40

A limitation of the study is that it is from a single center, and we did not have genetic and genomic information from the same kidney samples to directly correlate genetic variation and gene expression. Furthermore, as with every human study, the work mostly highlights an association and cannot fully establish causality. Changes of transcript levels do not fully indicate that they are functionally relevant. However, even if some of the identified genes are not causally linked to CKD development, the expression of these transcripts correlates with kidney function

702 Journal of the American Society of Nephrology J Am Soc Nephrol 26: 692–714, 2015 www.jasn.org BASIC RESEARCH

Table 3. Summary of the CRATs correlating with eGFR

Each entry gives the CKD risk-associated locus, its location (chromosome) and position, and the number of total correlating CRATs; the genes within ±250 kb; and the CRATs correlating with eGFR in T and G samples (P<0.05), each followed by the position of the SNP relative to that CRAT. Footnote letters follow a caret (^).

rs267734 | chromosome 1 | position 150951477 | total correlating CRATs: 8
Genes within ±250 kb: CTSS^b,e; CTSK^b,e; ARNT^b,e; SETDB1^b,e; CERS2^a,e; ANXA9^a,e; FAM63A^a,e; PRUNE^a,e; MLLT11^b,e; BNIPL^c; C1orf56^b,e; GABPB2^b; SEMA6C^b,e; CDC42SE1^a,e; LYSMD1^b; SCNM1^b,e; TMOD4^c; VPS72^a,e; PIP5K1A^a,e; TNFAIP8L2^c,e
Correlating CRATs: CTSS (T and G)^f, upstream; CTSK (T^f and G), upstream; CERS2 (T)^f, upstream; ANXA9 (T^f and G), downstream; PRUNE (T^f), downstream; MLLT11 (T), downstream; CDC42SE1 (T), downstream; PIP5K1A (T)^f, downstream

rs1933182 | chromosome 1 | position 109999588 | total correlating CRATs: 4
Genes within ±250 kb: SARS^a,e; CELSR2^b,e; PSRC^c,e; MYBPHL^c; SORT1^a,e; PSMA5^a,e; SYPL2^a; ATXN7L2^b; CYB561D1^b; AMIGO1^b; GPR61^c; GNAI3^a,e; AMPD2^b,e; GSTM2^a,e; GSTM4^a,e; GSTM1^a,e; GNAT2^c,e
Correlating CRATs: PSRC1 (G), upstream; SORT1 (T and G), upstream; GNAI3 (T^f), downstream; GSTM4 (T^f), downstream

rs12124078 | chromosome 1 | position 15869899 | total correlating CRATs: 8
Genes within ±250 kb: FHAD1^c; EFHD2^a,e; CTRC^d,e; CELA2A^d,e; CELA2B^c,e; CASP9^a,e; DNAJC16^a,e; AGMAT^a,e; DDI2^b,e; RSC1A1^b,e; SLC25A34^b; TMEM82^a; FBLIM1^a
Correlating CRATs: EFHD2 (G), upstream; CELA2A (T), upstream; CELA2B (T), upstream; CASP9 (T^f and G), upstream; DNAJC16 (T^f), intronic; AGMAT (T^f and G), downstream; DDI2 (T^f), downstream; RSC1A1 (T^f), downstream

rs16864170 | chromosome 2 | position 5907880 | total correlating CRATs: 1
Genes within ±250 kb: SOX11^c,e
Correlating CRATs: SOX11 (T^f), upstream

rs6431731 | chromosome 2 | position 15863002 | total correlating CRATs: 2
Genes within ±250 kb: DDX1^a,e; MYCN^b,e
Correlating CRATs: DDX1 (T), upstream; MYCN (T^f and G), downstream

rs10206899 | chromosome 2 | position 73900900 | total correlating CRATs: 2
Genes within ±250 kb: ALMS1^b,e; NAT8^a,e; NAT8B^a,e; TPRKB^a,e; DUSP11^a,e; C2orf78^c; STAMBP^b,e; ACTG2^c,e
Correlating CRATs: NAT8 (T^f), upstream; NAT8B (T^f), downstream

rs7583877 | chromosome 2 | position 100460654 | total correlating CRATs: 1
Genes within ±250 kb: AFF3^c,e
Correlating CRATs: AFF3 (T^f), intronic

rs347685 | chromosome 3 | position 141807137 | total correlating CRATs: 1
Genes within ±250 kb: ATP1B3^a,e; TFDP2^b,e; GK5^b; XRN1^b
Correlating CRATs: TFDP2 (T^f), intronic

rs17319721 (position 77368847) and rs13146355 (position 77412140) | chromosome 4 | total correlating CRATs: 2
Genes within ±250 kb: SCARB2^a,e; FAM47E^b,e; STBD1^b,e; CCDC158^c; SHROOM3^b
Correlating CRATs: FAM47E (T^f and G^f), upstream; STBD1 (T^f and G^f), upstream

rs6420094 | chromosome 5 | position 176817636 | total correlating CRATs: 7
Genes within ±250 kb: NSD1^b,e; RAB24^a; PRELID1^a; MXD3^c,e; LMAN2^a,e; RGS14^a,e; SLC34A1^a,e; PFN3^b; F12^b,e; GRK6^b,e; PRR7^c,e; DBN1^a,e; PDLIM7^a,e; DOK3^c,e; DDX41^a,e; FAM193B^b,e; TMED9^a,e; B4GALT7^a,e
Correlating CRATs: NSD1 (T^f), upstream; SLC34A1 (T^f and G), intronic; F12 (G), downstream; GRK6 (T^f), downstream; DBN1 (T), downstream; PDLIM7 (T), downstream; FAM193B (T^f), downstream

rs11959928 | chromosome 5 | position 39397132 | total correlating CRATs: 2
Genes within ±250 kb: FYB^b,e; C9^c,e; DAB2^a,e
Correlating CRATs: FYB (T^f and G^f), upstream; DAB2 (T^f), intronic

rs881858 | chromosome 6 | position 43806609 | total correlating CRATs: 1
Genes within ±250 kb: POLH^b,e; GTPBP2^a,e; MAD2L1BP^a,e; RSPH9^c; MRPS18A^a,e; VEGFA^a,e; C6orf223^c
Correlating CRATs: VEGFA (T^f and G), upstream

rs2279463 | chromosome 6 | position 160668389 | total correlating CRATs: 4
Genes within ±250 kb: IGF2R^a,e; SLC22A1^c,e; SLC22A2^a,e; SLC22A3^b,e
Correlating CRATs: IGF2R (T^f), upstream; SLC22A1 (T^f), upstream; SLC22A2 (T^f), intronic; SLC22A3 (G), downstream

rs3828890 | chromosome 6 | position 31440669 | total correlating CRATs: 8
Genes within ±250 kb: HLA-C^a,e; HLA-B^a,e; MICA^b,e; MICB^c,e; DDX39B^a,e; ATP6V1G2^b,e; LTA^c,e; NFKBIL1^a,e; LST1^b,e; NCR3^c,e; AIF1^a,e; PRRC2A^a,e; BAG6^a,e; C6orf47^a,e; GPANK1^b; CSNK2B^a,e; LY6G5B^b; ABHD16A^a; LY6G5C^b,e; APOM^a,e; LY6G6F^d; LY6G6C^c,e; DDAH2^a,e; C6orf25^c,e; LTB^b,e; TNF^c,e
Correlating CRATs: HLA-C (G), upstream; MICB (T^f and G), downstream; LST1 (T^f and G), downstream; NCR3 (T^f), downstream; AIF1 (T^f), downstream; BAG6 (T^f), downstream; APOM (T^f), downstream; LTB (T^f and G^f), downstream

rs6465825 | chromosome 7 | position 77416439 | total correlating CRATs: 3
Genes within ±250 kb: PTPN12^a,e; RSBN1L^b; TMEM60^a; PHTF2^b,e; MAGI2^b,e
Correlating CRATs: PTPN12 (T), upstream; PHTF2 (T^f), downstream; MAGI2 (G), downstream

rs7805747 | chromosome 7 | position 151407801 | total correlating CRATs: 0
Genes within ±250 kb: RHEB^a,e; PRKAG2^a,e
Correlating CRATs: none

rs10277115 | chromosome 7 | position 1285195 | total correlating CRATs: 2
Genes within ±250 kb: C7orf50^a; GPR146^b; GPER^b,e; ZFAND2A^a; UNCX^c; MICALL2^b,e; INTS1^a,e
Correlating CRATs: GPER (T^f), upstream; MICALL2 (G^f), downstream

rs1617640 | chromosome 7 | position 100317298 | total correlating CRATs: 4
Genes within ±250 kb: TSC22D4^a,e; NYAP1^c; AGFG2^b,e; SAP25^b,e; LRCH4^a,e; FBXO24^c,e; PCOLCE^b,e; MOSPD3^a,e; TFR2^c,e; ACTL6B^d,e; GNB2^a,e; GIGYF1^b; POP7^a,e; EPO^b,e; ZAN^d; EPHB4^a,e; SLC12A9^a,e; TRIP6^a,e; SRRT^a,e; UFSP1^b; ACHE^c,e
Correlating CRATs: LRCH4 (T^f and G), upstream; PCOLCE (G), upstream; SLC12A9 (T), downstream; TRIP6 (T^f), downstream

rs10109414 (position 23751151) and rs1731274 (position 23766319) | chromosome 8 | total correlating CRATs: 1
Genes within ±250 kb: NKX3-1^c,e; NKX2-6^d; STC1^a,e
Correlating CRATs: STC1 (T^f), upstream

rs4744712 | chromosome 9 | position 71434707 | total correlating CRATs: 0
Genes within ±250 kb: PIP5K1B^b,e; FAM122A^b; PRKACG^d,e; FXN^b,e
Correlating CRATs: none

rs10794720 | chromosome 10 | position 1156165 | total correlating CRATs: 2
Genes within ±250 kb: LARP4B^b,e; GTPBP4^a,e; IDI2^b,e; IDI1^a,e; WDR37^b,e; ADARB2^c,e
Correlating CRATs: LARPB4 (T^f), upstream; IDI1 (T), upstream

rs4014195 | chromosome 11 | position 65506822 | total correlating CRATs: 11
Genes within ±250 kb: SCYL1^a; LTBP3^a,e; SSSCA1^a,e; FAM89B^a,e; EHBP1L1^b,e; KCNK7^c,e; MAP3K11^a,e; PCNXL3^b; SIPA1^b,e; RELA^a,e; KAT5^a,e; RNASEH2C^a; AP5B1^b; OVOL1^b,e; SNX32^c; CFL1^a,e; MUS81^a,e; EFEMP2^a,e; CCDC85B^a,e; FOSL1^b,e; CTSW^c,e; FIBP^a,e; C11orf68^a,e; TSGA10IP^d; SART1^a,e; DRAP1^a,e
Correlating CRATs: LTBP3 (T^f), upstream; FAM89B (T^f), upstream; EHBP1L1 (T^f and G^f), upstream; SIPA1 (T^f), upstream; RELA (T), upstream; CFL1 (T), downstream; CCDC85B (T^f), downstream; FOSL1 (T^f), downstream; CTSW (T^f), downstream; FIBP (T^f), downstream; DRAP1 (G), downstream

rs3925584 | chromosome 11 | position 30760335 | total correlating CRATs: 1
Genes within ±250 kb: MPPED2^b,e; DCDC5^c; DCDC1^c
Correlating CRATs: MPPED2 (T^f), upstream

rs653178 | chromosome 12 | position 112007756 | total correlating CRATs: 4
Genes within ±250 kb: CUX2^c,e; FAM109A^b; SH2B3^b,e; ATXN2^b,e; BRAP^b,e; ACAD10^a,e; ALDH2^a,e
Correlating CRATs: CUX2 (T), upstream; ATXN2 (T^f), intronic; ACAD10 (T^f), downstream; ALDH2 (T^f), downstream

rs10774021 | chromosome 12 | position 349298 | total correlating CRATs: 3
Genes within ±250 kb: IQSEC3^b,e; SLC6A12^a,e; SLC6A13^a,e; KDM5A^b,e; CCDC77^b; B4GALNT3^b
Correlating CRATs: SLC6A12 (T^f), upstream; SLC6A13 (T^f and G), intronic; KDM5A (G), downstream

rs626277 | chromosome 13 | position 72347696 | total correlating CRATs: 1
Genes within ±250 kb: DACH1^b,e
Correlating CRATs: DACH1 (T), intronic

rs491567 | chromosome 15 | position 53946593 | total correlating CRATs: 0
Genes within ±250 kb: WDR72^a
Correlating CRATs: none

rs1394125 | chromosome 15 | position 76158983 | total correlating CRATs: 0
Genes within ±250 kb: SNUPN^a,e; IMP3^a,e; SNX33^b; CSPG4^c,e; ODF3L1^c; UBE2Q2^a; NRG4^c; C15orf27^c
Correlating CRATs: none

rs2453533 | chromosome 15 | position 45641225 | total correlating CRATs: 4
Genes within ±250 kb: DUOX1^c,e; DUOXA2^c; DUOXA1^d; SHF^c; SLC28A2^b,e; GATM^a,e; SPATA5L1^b,e; C15orf48^b; SLC30A4^b,e; BLOC1S6^a
Correlating CRATs: DUOX1 (T^f), upstream; SLC28A2 (T), upstream; GATM (T^f and G), downstream; SLC30A4 (T^f), downstream

rs12437854 | chromosome 15 | position 94141833 | total correlating CRATs: 0
Genes within ±250 kb: no gene in <250 kb distance
Correlating CRATs: none

rs12917707 (position 20367690), rs4293393 (position 20364588), and rs11864909 (position 20400839) | chromosome 16 | total correlating CRATs: 6
Genes within ±250 kb: GP2^b,e; UMOD^a,e; PDILT^d; ACSM5^a,e; ACSM2A^a,e; ACSM2B^a,e; ACSM1^c,e
Correlating CRATs: GP2 (T and G), upstream; UMOD (T^f and G), upstream; ACSM5 (T^f and G), downstream; ACSM2A (T^f), downstream; ACSM2B (T^f), downstream; ACSM1 (T), downstream

rs9895661 | chromosome 17 | position 59456589 | total correlating CRATs: 1
Genes within ±250 kb: BCAS3^b,e; TBX2^a,e; C17orf82^c; TBX4^d,e; NACA2^b,e
Correlating CRATs: TBX2 (T^f), downstream

rs2453580 | chromosome 17 | position 19438321 | total correlating CRATs: 4
Genes within ±250 kb: EPN2^b,e; B9D1^a,e; MAPK7^b,e; MFAP4^a,e; RNF112^c; SLC47A1^a,e; ALDH3A2^a,e; ALDH3A1^c,e; SLC47A2^a; ULK2^b,e
Correlating CRATs: EPN2 (T), upstream; MFAP4 (T^f and G^f), upstream; SLC47A1 (T^f), intronic; ALDH3A2 (T^f and G), downstream

rs11078903 | chromosome 17 | position 37631924 | total correlating CRATs: 1
Genes within ±250 kb: FBXL20^b; MED1^b,e; CDK12^b,e; NEUROD2^d,e; PPP1R1B^b; STARD3^a,e; PNMT^c,e; PGAP3^a,e; ERBB2^a,e; TCAP^c,e
Correlating CRATs: ERBB2 (T^f), downstream

rs7208487 | chromosome 17 | position 37543449 | total correlating CRATs: 1
Genes within ±250 kb: PLXDC1^c,e; CACNB1^c,e; ARL5C^d; RPL19^a,e; STAC2^b; FBXL20^b; MED1^b,e; CDK12^b,e; NEUROD2^d,e; PPP1R1B^b; STARD3^a,e
Correlating CRATs: PLXDC1 (T^f and G), upstream

rs12460876 | chromosome 19 | position 33356891 | total correlating CRATs: 1
Genes within ±250 kb: ANKRD27^b,e; RGS9BP^c; NUDT19^b,e; TDRD12^c,e; SLC7A9^a,e; CEP89^b; C19orf40^c,e; RHPN2^a; GPATCH1^b,e
Correlating CRATs: SLC7A9 (T^f), intronic

rs911119 (position 23612737) and rs13038305 (position 23610262) | chromosome 20 | total correlating CRATs: 1
Genes within ±250 kb: NAPB^b; CSTL1^d; CST11^c,e; CST8^c,e; CST9L^d; CST9^c; CST3^a,e; CST4^d,e; CST1^d,e; CST2^d,e; CST5^c,e
Correlating CRATs: CST8 (T), upstream
Table 3. Continued

rs6040055 | chromosome 20 | position 10633313 | total correlating CRATs: 1
Genes within ±250 kb: MKKS^a,e; SLX4IP^c; JAG1^a,e
Correlating CRATs: MKKS (T), upstream

rs4821469 | chromosome 22 | position 36616445 | total correlating CRATs: 1
Genes within ±250 kb: APOL3^a,e; APOL4^b; APOL2^a,e; APOL1^a,e; MYH9^a,e; TXN2^a,e
Correlating CRATs: MYH9 (T), downstream

The table shows the list of genes within the 250-kb region of the CKD-associated SNPs. Baseline expression of the transcripts in human kidneys (by RNA sequencing) is shown. T, tubule; G, glomerular.
^a High expression.
^b Medium expression.
^c Low expression.
^d No expression.
^e Genes with available probe set identifications on the microarray chip.
^f The correlation with eGFR in T and G samples remained significant (P<0.05) after Benjamini-Hochberg multiple testing correction.

in a large collection of human kidney samples. Therefore, these genes could be important potential candidate biomarkers for renal function decline.

In summary, this study is one of the first studies to perform a comprehensive functional genomic analysis of CKD-associated GWAS hits. These results highlight multiple new CKD risk-associated candidate genes that were not originally considered by GWAS experiments. Future candidate molecular and cell biology experiments will be needed to understand the functional role of these CRATs.

CONCISE METHODS

Human Kidney Samples
Kidney samples were obtained from routine surgical nephrectomies and leftover portions of diagnostic kidney biopsies. Only the normal, non-neoplastic part of the tissue was used for further investigation. Samples were deidentified, and corresponding clinical information was collected by an individual who was not involved in the research protocol.41,42 The study was approved by the institutional review boards (IRBs) of the Albert Einstein College of Medicine and Montefiore Medical Center (IRB 2002–202) and the University of Pennsylvania (IRB 815796). Glomerular sclerosis and interstitial fibrosis were evaluated using periodic acid–Schiff-stained kidney sections by two independent nephropathologists.

Tissue Handling and Microdissection
The kidney tissue was immediately placed and stored in RNAlater (Ambion) according to the manufacturer’s instruction. The tissue was manually microdissected under a microscope in RNAlater for glomerular and tubular compartment. Dissected tissue was homogenized, and RNA was prepared using RNeasy mini columns (Qiagen, Valencia, CA) according to the manufacturer’s instructions. RNA quality and quantity were determined using the Laboratory-on-Chip Total RNA PicoKit Agilent BioAnalyzer. Only samples without evidence of degradation were further used (RNA integrity number >6).

Microarray Procedure
Purified total RNAs from 95 tubule samples were amplified using the Ovation Pico WTA System V2 (NuGEN) and labeled with the Encore Biotin Module (NuGEN) according to the manufacturer’s protocol. The purified total RNAs from 51 glomerular samples and 41 tubule samples used for validation were amplified using the Two-Cycle Target Labeling Kit (Affymetrix) as per the manufacturer’s protocol. Transcript levels were analyzed using Affymetrix U133A arrays.

Microarray Data Processing
After hybridization and scanning, raw data files were imported into GeneSpring GX software, version 12.6 (Agilent Technologies). Raw expression levels were summarized using the RMA16 algorithm. Normalized values were generated after log transformation and baseline transformation. GeneSpring GX software then was used for statistical analysis. We used Benjamini–Hochberg multiple testing correction with a P value <0.05. In the case of genes with more than one probe set identification, the results with the lowest P values are represented. Statistical analyses for the patient demographics and the linear correlation tests between the gene expression arrays and eGFR were performed using Prism 6 software (GraphPad).

Network Analyses
Transcripts with expression levels showing significant linear correlation with eGFR were exported to Ingenuity Network Analysis software (Ingenuity Systems). This software determines the top canonical pathways by using a ratio (calculated by dividing the number of genes in a given pathway that meet cutoff criteria by the total number of genes that constitute that pathway) and then scoring the pathways using a Fisher exact test (P value <0.05).
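The pathway ratio and Fisher exact test described above can be sketched with the standard library alone. This is an illustration, not the Ingenuity implementation: the function names and all counts are hypothetical, and a right-tailed (over-representation) test against a fixed background gene set is an assumption, because the text does not state the sidedness or the background.

```python
from math import comb


def pathway_ratio(hits_in_pathway, pathway_size):
    """Genes in the pathway meeting the cutoff divided by all genes in the pathway."""
    return hits_in_pathway / pathway_size


def fisher_right_tail(hits_in_pathway, pathway_size, total_hits, background_size):
    """Right-tailed Fisher exact P value: P(overlap >= hits_in_pathway).

    Computed from the hypergeometric distribution for drawing `total_hits`
    genes from a background of `background_size` genes, of which
    `pathway_size` belong to the pathway.
    """
    k, K, n, N = hits_in_pathway, pathway_size, total_hits, background_size
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts: 20 background genes, a 5-gene pathway,
# 4 eGFR-correlating genes, 3 of which are in the pathway.
ratio = pathway_ratio(3, 5)
p_value = fisher_right_tail(3, 5, 4, 20)
print(ratio, p_value, p_value < 0.05)
```

The ratio describes how much of a pathway is covered, whereas the P value asks whether that overlap is larger than expected by chance given the background; the two can rank pathways differently.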


Figure 6. Kidney function-correlating CRATs form tight networks. (A) CRATs showing negative correlation with eGFR (green with corrected P<0.05) clustered around TNF and TGF-β. (B) CRATs showing positive correlation with eGFR (red with corrected P<0.05) centered around VEGFA and ERBB2 [erythroblastic leukemia viral oncogene homolog 2 (EGFR2, epidermal growth factor receptor 2)] (Ingenuity Systems).


After quantile normalization, transcripts with zero FPKM values were classified as having no expression. The remaining transcripts (FPKM values >0) were divided into three equal groups: transcripts with high, medium, and low expression in the kidney. We used these four groups (high, medium, low, and no expression) to describe the baseline expression of the CRATs in the kidney.
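The four-group classification can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `classify_baseline_expression` and the FPKM values are hypothetical, the quantile normalization step is taken as already done, and splitting by rank into thirds (with ties broken by sort order) is one plausible reading of "three equal groups".

```python
def classify_baseline_expression(fpkm_by_gene):
    """Label each gene as 'none', 'low', 'medium', or 'high'.

    Genes with zero FPKM are 'none'; the remaining genes are ranked by
    FPKM and split into three equally sized groups.
    """
    labels = {g: "none" for g, v in fpkm_by_gene.items() if v == 0}
    expressed = sorted(
        (g for g, v in fpkm_by_gene.items() if v > 0),
        key=lambda g: fpkm_by_gene[g],
    )
    n = len(expressed)
    for rank, gene in enumerate(expressed):
        if rank < n / 3:
            labels[gene] = "low"
        elif rank < 2 * n / 3:
            labels[gene] = "medium"
        else:
            labels[gene] = "high"
    return labels


# Hypothetical normalized FPKM values for seven transcripts.
example_fpkm = {
    "g1": 0.0, "g2": 0.5, "g3": 1.2, "g4": 3.0,
    "g5": 8.0, "g6": 20.0, "g7": 150.0,
}
print(classify_baseline_expression(example_fpkm))
```

Because the thresholds come from ranks rather than fixed FPKM cutoffs, the labels describe expression relative to the other expressed transcripts in the same dataset.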

QRT-PCR
Two hundred fifty nanograms of RNA was reverse transcribed using the cDNA Archival Kit (Life Technology), and QRT-PCR was run in the ViiA 7 System (Life Technology) machine using SYBRGreen Master Mix and gene-specific primers. The data were normalized and analyzed using the ΔΔCT method.
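The ΔΔCT calculation can be written out in a few lines. The sketch assumes the usual form of the method, with a reference (housekeeping) gene, a calibrator sample, and roughly 100% amplification efficiency so that each cycle corresponds to a twofold change. The text names neither the reference gene nor the calibrator, so the function names and CT values below are hypothetical.

```python
def delta_delta_ct(ct_target_sample, ct_ref_sample,
                   ct_target_calibrator, ct_ref_calibrator):
    """Return ΔΔCT = (target - reference) in sample minus the same in calibrator."""
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    return delta_sample - delta_calibrator


def relative_expression(ddct):
    """Fold change relative to the calibrator, assuming doubling per cycle."""
    return 2.0 ** (-ddct)


# Hypothetical CT values: the target amplifies two cycles earlier in the
# sample than in the calibrator, with identical reference-gene CTs.
ddct = delta_delta_ct(24.0, 18.0, 26.0, 18.0)
print(ddct, relative_expression(ddct))  # -2.0 4.0
```

A negative ΔΔCT means the target transcript is more abundant in the sample than in the calibrator; here the fold change is 4.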

Immunohistochemistry
Immunohistochemistry was performed on paraffin-embedded sections with the following antibodies: UMOD (AAH35975; Sigma-Aldrich), VEGFA (Ab46154), and ACSM2A (Ab181865). We used the Vectastain MOM or anti-rabbit Elite ABC Peroxidase Kit and 3,3′-diaminobenzidine for visualizations. Antibody specificity was evaluated separately; secondary antibodies alone showed no positive staining.

RNA Sequencing Analyses
RNA sequencing was carried out on microdissected kidney tubules. Total RNA was isolated using the miRNeasy Kit (Qiagen) according to the manufacturer’s protocol. An additional DNase1 digestion step was performed to ensure that the samples were not contaminated with genomic DNA. RNA purity was assessed using the Agilent 2100 Bioanalyzer. Each RNA sample had an A260:A280 ratio >1.8, an RNA integrity number >9, and an A260:A230 ratio >2.2. Single-end 100-bp RNA sequencing was carried out on an Illumina HiSeq2000 machine. RNA sequencing reads were aligned to the human genome (GRCh37/hg19) and transcriptome (hg19 RefSeq from Illumina iGenomes) using the software TopHat (version 2.0.9) and Cufflinks (version 2.1.1.Linux_x86_64), respectively.43,44 We counted the number of fragments mapped to each gene annotated in the UCSC hg19. Transcript abundances were measured in fragments per kilobase of exon per million fragments mapped (FPKM). Sequence data can be accessed at the National Center for Biotechnology Information’s Gene Expression Omnibus (Accession number: GSE60119).

Figure 7. Schematic representation of the experimental design. GWASs examine the relationship between genetic variants (SNP) and disease state (CKD). The eQTL examines the relationship between transcript levels and genetic variation in control samples. Here, we investigated the relationship between transcript levels around CKD risk variants and kidney function by examining the contribution of genetic and environmental factors.

ACKNOWLEDGMENTS
The work was supported by National Institutes of Health Grants DK087635 (to K.S.) and DK076077 (to K.S.). Part of the work was presented at the Annual Meeting of the American Society of Nephrology (November 5–10, 2013, Atlanta, GA).

DISCLOSURES
The laboratory of K.S. received research support from Boehringer Ingelheim.

REFERENCES
1. Go AS, Chertow GM, Fan D, McCulloch CE, Hsu CY: Chronic kidney disease and the risks of death, cardiovascular events, and hospitalization. N Engl J Med 351: 1296–1305, 2004
2. Böger CA, Heid IM: Chronic kidney disease: Novel insights from genome-wide association studies. Kidney Blood Press Res 34: 225–234, 2011
3. Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, Reynolds AP, Sandstrom R, Qu H, Brody J, Shafer A, Neri F, Lee K, Kutyavin T, Stehling-Sun S, Johnson AK, Canfield TK, Giste E, Diegel M, Bates D, Hansen RS, Neph S, Sabo PJ, Heimfeld S, Raubitschek A,


Ziegler S, Cotsapas C, Sotoodehnia N, Glass I, Sunyaev SR, Kaul R, Stamatoyannopoulos JA: Systematic localization of common disease-associated variation in regulatory DNA. Science 337: 1190–1195, 2012
4. Susztak K: Understanding the epigenetic syntax for the genetic alphabet in the kidney. J Am Soc Nephrol 25: 10–17, 2014
5. Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, Zhang X, Wang L, Issner R, Coyne M, Ku M, Durham T, Kellis M, Bernstein BE: Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473: 43–49, 2011
6. Musunuru K, Strong A, Frank-Kamenetsky M, Lee NE, Ahfeldt T, Sachs KV, Li X, Li H, Kuperwasser N, Ruda VM, Pirruccello JP, Muchmore B, Prokunina-Olsson L, Hall JL, Schadt EE, Morales CR, Lund-Katz S, Phillips MC, Wong J, Cantley W, Racie T, Ejebe KG, Orho-Melander M, Melander O, Koteliansky V, Fitzgerald K, Krauss RM, Cowan CA, Kathiresan S, Rader DJ: From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus. Nature 466: 714–719, 2010
7. Harismendy O, Notani D, Song X, Rahim NG, Tanasa B, Heintzman N, Ren B, Fu XD, Topol EJ, Rosenfeld MG, Frazer KA: 9p21 DNA variants associated with coronary artery disease impair interferon-γ signalling response. Nature 470: 264–268, 2011
8. Köttgen A, Glazer NL, Dehghan A, Hwang SJ, Katz R, Li M, Yang Q, Gudnason V, Launer LJ, Harris TB, Smith AV, Arking DE, Astor BC, Boerwinkle E, Ehret GB, Ruczinski I, Scharpf RB, Chen YD, de Boer IH, Haritunians T, Lumley T, Sarnak M, Siscovick D, Benjamin EJ, Levy D, Upadhyay A, Aulchenko YS, Hofman A, Rivadeneira F, Uitterlinden AG, van Duijn CM, Chasman DI, Paré G, Ridker PM, Kao WH, Witteman JC, Coresh J, Shlipak MG, Fox CS: Multiple loci associated with indices of renal function and chronic kidney disease. Nat Genet 41: 712–717, 2009
9. Köttgen A, Pattaro C, Böger CA, Fuchsberger C, Olden M, Glazer NL, Parsa A, Gao X, Yang Q, Smith AV, O’Connell JR, Li M, Schmidt H, Tanaka T, Isaacs A, Ketkar S, Hwang SJ, Johnson AD, Dehghan A, Teumer A, Paré G, Atkinson EJ, Zeller T, Lohman K, Cornelis MC, Probst-Hensch NM, Kronenberg F, Tönjes A, Hayward C, Aspelund T, Eiriksdottir G, Launer LJ, Harris TB, Rampersaud E, Mitchell BD, Arking DE, Boerwinkle E, Struchalin M, Cavalieri M, Singleton A, Giallauria F, Metter J, de Boer IH, Haritunians T, Lumley T, Siscovick D, Psaty BM, Zillikens MC, Oostra BA, Feitosa M, Province M, de Andrade M, Turner ST, Schillert A, Ziegler A, Wild PS, Schnabel RB, Wilde S, Munzel TF, Leak TS, Illig T, Klopp N, Meisinger C, Wichmann HE, Koenig W, Zgaga L, Zemunik T, Kolcic I, Minelli C, Hu FB, Johansson A, Igl W, Zaboli G, Wild SH, Wright AF, Campbell H, Ellinghaus D, Schreiber S, Aulchenko YS, Felix JF, Rivadeneira F, Uitterlinden AG, Hofman A, Imboden M, Nitsch D, Brandstätter A, Kollerits B, Kedenko L, Mägi R, Stumvoll M, Kovacs P, Boban M, Campbell S, Endlich K, Völzke H, Kroemer HK, Nauck M, Völker U, Polasek O, Vitart V, Badola S, Parker AN, Ridker PM, Kardia SL, Blankenberg S, Liu Y, Curhan GC, Franke A, Rochat T, Paulweber B, Prokopenko I, Wang W, Gudnason V, Shuldiner AR, Coresh J, Schmidt R, Ferrucci L, Shlipak MG, van Duijn CM, Borecki I, Krämer BK, Rudan I, Gyllensten U, Wilson JF, Witteman JC, Pramstaller PP, Rettig R, Hastie N, Chasman DI, Kao WH, Heid IM, Fox CS: New loci associated with kidney function and chronic kidney disease. Nat Genet 42: 376–384, 2010
10. Böger CA, Gorski M, Li M, Hoffmann MM, Huang C, Yang Q, Teumer A, Krane V, O’Seaghdha CM, Kutalik Z, Wichmann HE, Haak T, Boes E, Coassin S, Coresh J, Kollerits B, Haun M, Paulweber B, Köttgen A, Li G, Shlipak MG, Powe N, Hwang SJ, Dehghan A, Rivadeneira F, Uitterlinden A, Hofman A, Beckmann JS, Krämer BK, Witteman J, Bochud M, Siscovick D, Rettig R, Kronenberg F, Wanner C, Thadhani RI, Heid IM, Fox CS, Kao WH; CKDGen Consortium: Association of eGFR-Related Loci Identified by GWAS with Incident CKD and ESRD. PLoS Genet 7: e1002292, 2011
11. Pattaro C, Köttgen A, Teumer A, Garnaas M, Böger CA, Fuchsberger C, Olden M, Chen MH, Tin A, Taliun D, Li M, Gao X, Gorski M, Yang Q, Hundertmark C, Foster MC, O’Seaghdha CM, Glazer N, Isaacs A, Liu CT, Smith AV, O’Connell JR, Struchalin M, Tanaka T, Li G, Johnson AD, Gierman HJ, Feitosa M, Hwang SJ, Atkinson EJ, Lohman K, Cornelis MC, Johansson Å, Tönjes A, Dehghan A, Chouraki V, Holliday EG, Sorice R, Kutalik Z, Lehtimäki T, Esko T, Deshmukh H, Ulivi S, Chu AY, Murgia F, Trompet S, Imboden M, Kollerits B, Pistis G, Harris TB, Launer LJ, Aspelund T, Eiriksdottir G, Mitchell BD, Boerwinkle E, Schmidt H, Cavalieri M, Rao M, Hu FB, Demirkan A, Oostra BA, de Andrade M, Turner ST, Ding J, Andrews JS, Freedman BI, Koenig W, Illig T, Döring A, Wichmann HE, Kolcic I, Zemunik T, Boban M, Minelli C, Wheeler HE, Igl W, Zaboli G, Wild SH, Wright AF, Campbell H, Ellinghaus D, Nöthlings U, Jacobs G, Biffar R, Endlich K, Ernst F, Homuth G, Kroemer HK, Nauck M, Stracke S, Völker U, Völzke H, Kovacs P, Stumvoll M, Mägi R, Hofman A, Uitterlinden AG, Rivadeneira F, Aulchenko YS, Polasek O, Hastie N, Vitart V, Helmer C, Wang JJ, Ruggiero D, Bergmann S, Kähönen M, Viikari J, Nikopensius T, Province M, Ketkar S, Colhoun H, Doney A, Robino A, Giulianini F, Krämer BK, Portas L, Ford I, Buckley BM, Adam M, Thun GA, Paulweber B, Haun M, Sala C, Metzger M, Mitchell P, Ciullo M, Kim SK, Vollenweider P, Raitakari O, Metspalu A, Palmer C, Gasparini P, Pirastu M, Jukema JW, Probst-Hensch NM, Kronenberg F, Toniolo D, Gudnason V, Shuldiner AR, Coresh J, Schmidt R, Ferrucci L, Siscovick DS, van Duijn CM, Borecki I, Kardia SL, Liu Y, Curhan GC, Rudan I, Gyllensten U, Wilson JF, Franke A, Pramstaller PP, Rettig R, Prokopenko I, Witteman JC, Hayward C, Ridker P, Parsa A, Bochud M, Heid IM, Goessling W, Chasman DI, Kao WH, Fox CS; CARDIoGRAM Consortium; ICBP Consortium; CARe Consortium; Wellcome Trust Case Control Consortium 2 (WTCCC2): Genome-wide association and functional follow-up reveals new loci for kidney function. PLoS Genet 8: e1002584, 2012
12. Chasman DI, Fuchsberger C, Pattaro C, Teumer A, Böger CA, Endlich K, Olden M, Chen MH, Tin A, Taliun D, Li M, Gao X, Gorski M, Yang Q, Hundertmark C, Foster MC, O’Seaghdha CM, Glazer N, Isaacs A, Liu CT, Smith AV, O’Connell JR, Struchalin M, Tanaka T, Li G, Johnson AD, Gierman HJ, Feitosa MF, Hwang SJ, Atkinson EJ, Lohman K, Cornelis MC, Johansson A, Tönjes A, Dehghan A, Lambert JC, Holliday EG, Sorice R, Kutalik Z, Lehtimäki T, Esko T, Deshmukh H, Ulivi S, Chu AY, Murgia F, Trompet S, Imboden M, Coassin S, Pistis G, Harris TB, Launer LJ, Aspelund T, Eiriksdottir G, Mitchell BD, Boerwinkle E, Schmidt H, Cavalieri M, Rao M, Hu F, Demirkan A, Oostra BA, de Andrade M, Turner ST, Ding J, Andrews JS, Freedman BI, Giulianini F, Koenig W, Illig T, Meisinger C, Gieger C, Zgaga L, Zemunik T, Boban M, Minelli C, Wheeler HE, Igl W, Zaboli G, Wild SH, Wright AF, Campbell H, Ellinghaus D, Nöthlings U, Jacobs G, Biffar R, Ernst F, Homuth G, Kroemer HK, Nauck M, Stracke S, Völker U, Völzke H, Kovacs P, Stumvoll M, Mägi R, Hofman A, Uitterlinden AG, Rivadeneira F, Aulchenko YS, Polasek O, Hastie N, Vitart V, Helmer C, Wang JJ, Stengel B, Ruggiero D, Bergmann S, Kähönen M, Viikari J, Nikopensius T, Province M, Ketkar S, Colhoun H, Doney A, Robino A, Krämer BK, Portas L, Ford I, Buckley BM, Adam M, Thun GA, Paulweber B, Haun M, Sala C, Mitchell P, Ciullo M, Kim SK, Vollenweider P, Raitakari O, Metspalu A, Palmer C, Gasparini P, Pirastu M, Jukema JW, Probst-Hensch NM, Kronenberg F, Toniolo D, Gudnason V, Shuldiner AR, Coresh J, Schmidt R, Ferrucci L, Siscovick DS, van Duijn CM, Borecki IB, Kardia SL, Liu Y, Curhan GC, Rudan I, Gyllensten U, Wilson JF, Franke A, Pramstaller PP, Rettig R, Prokopenko I, Witteman J, Hayward C, Ridker PM, Parsa A, Bochud M, Heid IM, Kao WH, Fox CS, Köttgen A; CARDIoGRAM Consortium; ICBP Consortium; CARe Consortium; WTCCC2: Integration of genome-wide association studies with biological knowledge identifies six novel genes related to kidney function. Hum Mol Genet 21: 5329–5343, 2012
13. Sandholm N, Salem RM, McKnight AJ, Brennan EP, Forsblom C, Isakova T, McKay GJ, Williams WW, Sadlier DM, Mäkinen VP, Swan EJ, Palmer C, Boright AP, Ahlqvist E, Deshmukh HA, Keller BJ, Huang H, Ahola AJ, Fagerholm E, Gordin D, Harjutsalo V, He B, Heikkilä O, Hietala K, Kytö J, Lahermo P, Lehto M, Lithovius R, Osterholm AM, Parkkonen M, Pitkäniemi J, Rosengård-Bärlund M, Saraheimo M, Sarti C, Söderlund J, Soro-Paavonen A, Syreeni A, Thorn LM, Tikkanen H,

712 Journal of the American Society of Nephrology J Am Soc Nephrol 26: 692–714, 2015 www.jasn.org BASIC RESEARCH



This article contains supplemental material online at http://jasn.asnjournals.org/lookup/suppl/doi:10.1681/ASN.2014010028/-/DCSupplemental.

Supplementary Material

Functional genomic annotation of genetic risk loci highlights inflammation and epithelial biology networks in chronic kidney disease

Nora Ledo, Yi-An Ko, Ae Seo Deok Park, Hyun Mi Kang, Sang-Youb Han*, Peter Choi and Katalin Susztak

Renal Electrolyte and Hypertension Division, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104

* Current address: Department of Internal Medicine, Inje Univ. Ilsan-Paik Hospital, Joowha-ro 170, Ilsanseo-gu, Goyang, Gyunggi-prov. South Korea, zip: 411-706

Correspondence:

Katalin Susztak MD, PhD

Associate Professor of Medicine

Perelman School of Medicine, University of Pennsylvania

415 Curie Blvd

415 Clinical Research Building

Philadelphia, PA 19104 [email protected]

Tel: 215 898 2009

Supplementary Table 1

List of the publications included in the analysis

The table lists the reviewed articles that describe an association between single nucleotide polymorphisms (SNPs) and kidney function, chronic kidney disease, end stage kidney failure or other related symptoms. The 44 leading SNPs from the first 10 publications were included in our study.

List of Reviewed Articles

1. New loci associated with kidney function and chronic kidney disease. Köttgen A et al. Nat Genet. 2010 May;42(5):376-84. doi: 10.1038/ng.568. Epub 2010 Apr 11.
2. Genome-wide association and functional follow-up reveals new loci for kidney function. Pattaro C et al. PLoS Genet. 2012;8(3):e1002584. doi: 10.1371/journal.pgen.1002584. Epub 2012 Mar 29.
3. Association of variants at UMOD with chronic kidney disease and kidney stones-role of age and comorbid diseases. Gudbjartsson DF et al. PLoS Genet. 2010 Jul 29;6(7):e1001039. doi: 10.1371/journal.pgen.1001039. Erratum in: PLoS Genet. 2010;6(11).
4. Multiple loci associated with indices of renal function and chronic kidney disease. Köttgen A et al. Nat Genet. 2009 Jun;41(6):712-7. doi: 10.1038/ng.377. Epub 2009 May 10.
5. Genetic loci influencing kidney function and chronic kidney disease. Chambers JC et al. Nat Genet. 2010 May;42(5):373-5. doi: 10.1038/ng.566. Epub 2010 Apr 11.
6. Meta-analysis identifies multiple loci associated with kidney function-related traits in east Asian populations. Okada Y et al. Nat Genet. 2012 Jul 15;44(8):904-9. doi: 10.1038/ng.2352.
7. Integration of genome-wide association studies with biological knowledge identifies six novel genes related to kidney function. Chasman DI et al. Hum Mol Genet. 2012 Dec 15;21(24):5329-43. doi: 10.1093/hmg/dds369. Epub 2012 Sep 8.
8. Candidate genes for non-diabetic ESRD in African Americans: a genome-wide association study using pooled DNA. Bostrom MA et al. Hum Genet. 2010 Aug;128(2):195-204. doi: 10.1007/s00439-010-0842-3. Epub 2010 Jun 8.
9. New susceptibility loci associated with kidney disease in type 1 diabetes. Sandholm N et al. PLoS Genet. 2012 Sep;8(9):e1002921. doi: 10.1371/journal.pgen.1002921. Epub 2012 Sep 20.
10. Promoter polymorphism of the erythropoietin gene in severe diabetic eye and kidney complications. Tong Z et al. Proc Natl Acad Sci U S A. 2008 May 13;105(19):6998-7003. doi: 10.1073/pnas.0800454105. Epub 2008 May 5.
11. Genetic association for renal traits among participants of African ancestry reveals new loci for renal function. Liu CT et al. PLoS Genet. 2011 Sep;7(9):e1002264. doi: 10.1371/journal.pgen.1002264. Epub 2011 Sep 8.
12. CUBN is a gene locus for albuminuria. Böger CA et al. J Am Soc Nephrol. 2011 Mar;22(3):555-70. doi: 10.1681/ASN.2010060598.
13. Genome-wide association study for renal traits in the Framingham and Atherosclerosis Risk in Communities Studies. Köttgen A et al. BMC Med Genet. 2008 Jun 3;9:49. doi: 10.1186/1471-2350-9-49.
14. A single nucleotide polymorphism within the acetyl-coenzyme A carboxylase beta gene is associated with proteinuria in patients with type 2 diabetes. Maeda S et al. PLoS Genet. 2010 Feb 12;6(2):e1000842. doi: 10.1371/journal.pgen.1000842.
15. Genome-wide association scan for diabetic nephropathy susceptibility genes in type 1 diabetes. Pezzolesi MG et al. Diabetes. 2009 Jun;58(6):1403-10. doi: 10.2337/db08-1514. Epub 2009 Feb 27.
16. Effects of MCF2L2, ADIPOQ and genetic polymorphisms on the development of nephropathy in type 1 Diabetes Mellitus. Zhang D et al. BMC Med Genet. 2010 Jul 28;11:116. doi: 10.1186/1471-2350-11-116.
17. A GREM1 gene variant associates with diabetic nephropathy. McKnight AJ et al. J Am Soc Nephrol. 2010 May;21(5):773-81. doi: 10.1681/ASN.2009070773. Epub 2010 Feb 11.
18. Genetic analysis of coronary artery disease single-nucleotide polymorphisms in diabetic nephropathy. McKnight AJ et al. Nephrol Dial Transplant. 2009 Aug;24(8):2473-6. doi: 10.1093/ndt/gfp015. Epub 2009 Mar 31.
19. Confirmation of genetic associations at ELMO1 in the GoKinD collection supports its role as a susceptibility gene in diabetic nephropathy. Pezzolesi MG et al. Diabetes. 2009 Nov;58(11):2698-702. doi: 10.2337/db09-0641. Epub 2009 Aug 3.
20. An intergenic region on chromosome 13q33.3 is associated with the susceptibility to kidney disease in type 1 and 2 diabetes. Pezzolesi MG et al. Kidney Int. 2011 Jul;80(1):105-11. doi: 10.1038/ki.2011.64. Epub 2011 Mar 16.
21. Targeted genome-wide investigation identifies novel SNPs associated with diabetic nephropathy. McKnight AJ et al. Hugo J. 2009 Dec;3(1-4):77-82. doi: 10.1007/s11568-010-9133-2. Epub 2010 Feb 24.
22. Identification of PVT1 as a candidate gene for end-stage renal disease in type 2 diabetes using a pooling-based genome-wide single nucleotide polymorphism association study. Hanson RL et al. Diabetes. 2007 Apr;56(4):975-83.
23. A genome-wide association study for diabetic nephropathy genes in African Americans. McDonough CW et al. Kidney Int. 2011 Mar;79(5):563-72. doi: 10.1038/ki.2010.467. Epub 2010 Dec 8.
24. Genetic variations in the gene encoding ELMO1 are associated with susceptibility to diabetic nephropathy. Shimazaki A et al. Diabetes. 2005 Apr;54(4):1171-8.

Supplementary Table 2

List of single nucleotide polymorphisms (SNPs) that met our criteria

The table lists the single nucleotide polymorphisms (SNPs) that reached genome-wide significance (P < 5 × 10^-8) for association with eGFR (estimated glomerular filtration rate, based on creatinine (crea) or cystatin C (cys) levels) and/or the presence of chronic kidney disease (CKD) or end stage renal disease (ESRD).

Genes less than 250 kb from the leading SNPs are listed. Color-coding shows the baseline expression of the transcripts based on human kidney RNA sequencing: red, high expression; yellow, medium expression; green, low expression; blue, no expression. Genes with available probe set IDs on the microarray chip are marked bold.

No. | Leading SNP | Location (chr) | Position | Leading SNP functional location | Association parameter | Association p-value | Genes within 250 kb | Journal
--- | --- | --- | --- | --- | --- | --- | --- | ---
1 | rs10794720 | 10 | 1156165 | Intronic | eGFRcrea | 2.1 × 10^-8 | LARP4B, GTPBP4, IDI2, IDI1, WDR37, ADARB2 | 1
2 | rs491567 | 15 | 53946593 | Intronic | eGFRcrea | 1.3 × 10^-8 | WDR72 | 1
3 | rs267734 | 1 | 150951477 | Upstream | eGFRcrea | 5.2 × 10^-9 | CTSS, CTSK, ARNT, SETDB1, CERS2, ANXA9, FAM63A, PRUNE, MLLT11, BNIPL, C1orf56, GABPB2, SEMA6C, CDC42SE1, LYSMD1, SCNM1, TMOD4, VPS72, PIP5K1A, TNFAIP8L2 | 1
4 | rs347685 | 3 | 141807137 | Intronic | eGFRcrea | 7.0 × 10^-9 | ATP1B3, TFDP2, GK5, XRN1 | 1
5 | rs4744712 | 9 | 71434707 | Intronic | eGFRcrea | 7.2 × 10^-10 | PIP5K1B, FAM122A, PRKACG, FXN | 1
6 | rs626277 | 13 | 72347696 | Intronic | eGFRcrea | 2.9 × 10^-10 | DACH1 | 1
7 | rs1394125 | 15 | 76158983 | Intronic | eGFRcrea | 3.7 × 10^-10 | SNUPN, IMP3, SNX33, CSPG4, ODF3L1, UBE2Q2, NRG4, C15orf27 | 1
8 | rs9895661 | 17 | 59456589 | Intronic | eGFRcrea | 1.4 × 10^-8 | BCAS3, TBX2, C17orf82, TBX4, NACA2 | 1
9 | rs10109414 | 8 | 23751151 | Intergenic | eGFRcrea | 1.0 × 10^-8 | NKX3-1, NKX2-6, STC1 | 1
10 | rs911119 | 20 | 23612737 | Intergenic | eGFRcys | 2.3 × 10^-138 | NAPB, CSTL1, CST11, CST8, CST9L, CST9, CST3, CST4, CST1, CST2, CST5 | 1
11 | rs6465825 | 7 | 77416439 | Intergenic | eGFRcrea | 3.5 × 10^-9 | PTPN12, RSBN1L, TMEM60, PHTF2, MAGI2 | 1
12 | rs653178 | 12 | 112007756 | Intronic | eGFRcys | 3.8 × 10^-8 | CUX2, FAM109A, SH2B3, ATXN2, BRAP, ACAD10, ALDH2 | 1
13 | rs6420094 | 5 | 176817636 | Intronic | eGFRcrea | 3.8 × 10^-12 | NSD1, RAB24, PRELID1, MXD3, LMAN2, RGS14, SLC34A1, PFN3, F12, GRK6, PRR7, DBN1, PDLIM7, DOK3, DDX41, FAM193B, TMED9, B4GALT7 | 1
14 | rs11959928 | 5 | 39397132 | Intronic | eGFRcrea | 1.8 × 10^-11 | FYB, C9, DAB2 | 1
15 | rs12917707 | 16 | 20367690 | Upstream | eGFRcrea | 1.2 × 10^-20 | GP2, UMOD, PDILT, ACSM5, ACSM2A, ACSM2B | 1
16 | rs2453533 | 15 | 45641225 | Intergenic | eGFRcrea | 4.6 × 10^-22 | DUOX1, DUOXA2, DUOXA1, SHF, SLC28A2, GATM, SPATA5L1, C15orf48, SLC30A4, BLOC1S6 | 1
17 | rs17319721 | 4 | 77368847 | Intronic | eGFRcrea | 1.1 × 10^-19 | SCARB2, FAM47E, STBD1, CCDC158, SHROOM3 | 1
18 | rs1933182 | 1 | 109999588 | Intergenic | eGFRcrea | 1.3 × 10^-8 | SARS, CELSR2, PSRC1, MYBPHL, SORT1, PSMA5, SYPL2, ATXN7L2, CYB561D1, AMIGO1, GPR61, GNAI3, AMPD2, GSTM2, GSTM4, GSTM1, GNAT2 | 1
19 | rs16864170 | 2 | 5907880 | Intergenic | CKD | 4.5 × 10^-8 | SOX11 | 1
20 | rs881858 | 6 | 43806609 | Intergenic | eGFRcrea | 2.2 × 10^-11 | POLH, GTPBP2, MAD2L1BP, RSPH9, MRPS18A, VEGFA, C6orf223 | 1
21 | rs7805747 | 7 | 151407801 | Intronic | CKD | 8.6 × 10^-9 | RHEB, PRKAG2 | 1
22 | rs4014195 | 11 | 65506822 | Intergenic | eGFRcrea | 3.3 × 10^-8 | SCYL1, LTBP3, SSSCA1, FAM89B, EHBP1L1, KCNK7, MAP3K11, PCNXL3, SIPA1, RELA, KAT5, RNASEH2C, AP5B1, OVOL1, SNX32, CFL1, MUS81, EFEMP2, CCDC85B, FOSL1, CTSW, FIBP, C11orf68, TSGA10IP, SART1, DRAP1 | 1
23 | rs12460876 | 19 | 33356891 | Intronic | eGFRcrea | 5.5 × 10^-9 | ANKRD27, RGS9BP, NUDT19, TDRD12, SLC7A9, CEP89, C19orf40, RHPN2, GPATCH1 | 1
24 | rs2279463 | 6 | 160668389 | Intronic | eGFRcrea | 8.7 × 10^-10 | IGF2R, SLC22A1, SLC22A2, SLC22A3 | 1
25 | rs10774021 | 12 | 349298 | Intronic | eGFRcrea | 6.7 × 10^-9 | IQSEC3, SLC6A12, SLC6A13, KDM5A, CCDC77, B4GALNT3 | 1
26 | rs6431731 | 2 | 15863002 | Intergenic | eGFRcrea | 4.6 × 10^-8 | DDX1, MYCN | 2
27 | rs3925584 | 11 | 30760335 | Intergenic | eGFRcrea | 1 × 10^-9 | MPPED2, DCDC5, DCDC1 | 2
28 | rs12124078 | 1 | 15869899 | Intronic | eGFRcrea | 9.8 × 10^-10 | FHAD1, EFHD2, CTRC, CELA2A, CELA2B, CASP9, DNAJC16, AGMAT, DDI2, RSC1A1, SLC25A34, TMEM82, FBLIM1 | 2
29 | rs2453580 | 17 | 19438321 | Intronic | eGFRcrea | 4.6 × 10^-8 | EPN2, B9D1, MAPK7, MFAP4, RNF112, SLC47A1, ALDH3A2, ALDH3A1, SLC47A2, ULK2 | 2
30 | rs11078903 | 17 | 37631924 | Intronic | eGFRcrea | 2.4 × 10^-9 | FBXL20, MED1, CDK12, NEUROD2, PPP1R1B, STARD3, PNMT, PGAP3, ERBB2, TCAP | 2
31 | rs4293393 | 16 | 20364588 | Intronic | eGFRcrea | 2.6 × 10^-10 | GP2, UMOD, PDILT, ACSM5, ACSM2A, ACSM2B | 3
X | rs12917707 | 16 | 20367690 | Intronic | CKD | 2.9 × 10^-9 | GP2, UMOD, PDILT, ACSM5, ACSM2A, ACSM2B | 4
32 | rs6040055 | 20 | 10633313 | Intronic | eGFRcrea | 1 × 10^-8 | MKKS, SLX4IP, JAG1 | 4
33 | rs1731274 | 8 | 23766319 | Intergenic | eGFRcys | 4.6 × 10^-8 | STC1, NKX3-1, NKX2-6 | 4
34 | rs13038305 | 20 | 23610262 | Intronic | eGFRcys | 2.2 × 10^-88 | NAPB, CSTL1, CST11, CST8, CST9L, CST9, CST3, CST4, CST1, CST2, CST5 | 4
35 | rs10206899 | 2 | 73900900 | Intronic | eGFRcrea | 2.3 × 10^-8 | ALMS1, NAT8, NAT8B, TPRKB, DUSP11, C2orf78, STAMBP, ACTG2 | 5
X | rs9895661 | 17 | 59456589 | Intronic | eGFRcrea | 4.8 × 10^-11 | BCAS3, TBX2, C17orf82, TBX4, NACA2 | 6
36 | rs11864909 | 16 | 20400839 | Intronic | eGFRcrea | 3.6 × 10^-10 | GP2, UMOD, PDILT, ACSM5, ACSM2A, ACSM2B, ACSM1 | 6
37 | rs13146355 | 4 | 77412140 | Intronic | eGFRcrea | 6.6 × 10^-11 | FAM47E, STBD1, CCDC158, SHROOM3 | 6
38 | rs10277115 | 7 | 1285195 | Intergenic | eGFRcrea | 1.0 × 10^-10 | , GPR146, GPER, ZFAND2A, UNCX, MICALL2, INTS1 | 6
39 | rs3828890 | 6 | 31440669 | Unknown | eGFRcrea | 1.2 × 10^-9 | HLA-C, HLA-B, MICA, MICB, DDX39B, ATP6V1G2, LTA, NFKBIL1, LST1, NCR3, AIF1, PRRC2A, BAG6, C6orf47, GPANK1, CSNK2B, LY6G5B, ABHD16A, LY6G5C, APOM, LY6G6F, LY6G6C, DDAH2, C6orf25, LTB, TNF | 6
40 | rs7208487 | 17 | 37543449 | Intronic | eGFRcrea | 5.6 × 10^-9 | PLXDC1, CACNB1, ARL5C, RPL19, STAC2, FBXL20, MED1, CDK12, NEUROD2, PPP1R1B, STARD3 | 7
41 | rs4821469 | 22 | 36616445 | Intergenic | ESRD | 1.78 × 10^-19 | APOL3, APOL4, APOL2, APOL1, MYH9, TXN2 | 8
42 | rs12437854 | 15 | 94141833 | Intergenic | ESRD | 2 × 10^-9 | no gene in <250 kb distance | 9
43 | rs7583877 | 2 | 100460654 | Intronic | ESRD | 1.2 × 10^-8 | AFF3 | 9
44 | rs1617640 | 7 | 100317298 | In promoter but not missense | ESRD | 2.66 × 10^-8 | TSC22D4, NYAP1, AGFG2, SAP25, LRCH4, FBXO24, PCOLCE, MOSPD3, TFR2, ACTL6B, GNB2, GIGYF1, POP7, EPO, ZAN, EPHB4, SLC12A9, TRIP6, SRRT, UFSP1, ACHE | 10

List of Journals

1. New loci associated with kidney function and chronic kidney disease. Köttgen A et al. Nat Genet. 2010 May;42(5):376-84. doi: 10.1038/ng.568. Epub 2010 Apr 11.
2. Genome-wide association and functional follow-up reveals new loci for kidney function. Pattaro C et al. PLoS Genet. 2012;8(3):e1002584. doi: 10.1371/journal.pgen.1002584. Epub 2012 Mar 29.
3. Association of variants at UMOD with chronic kidney disease and kidney stones-role of age and comorbid diseases. Gudbjartsson DF et al. PLoS Genet. 2010 Jul 29;6(7):e1001039. doi: 10.1371/journal.pgen.1001039. Erratum in: PLoS Genet. 2010;6(11).
4. Multiple loci associated with indices of renal function and chronic kidney disease. Köttgen A et al. Nat Genet. 2009 Jun;41(6):712-7. doi: 10.1038/ng.377. Epub 2009 May 10.
5. Genetic loci influencing kidney function and chronic kidney disease. Chambers JC et al. Nat Genet. 2010 May;42(5):373-5. doi: 10.1038/ng.566. Epub 2010 Apr 11.
6. Meta-analysis identifies multiple loci associated with kidney function-related traits in east Asian populations. Okada Y et al. Nat Genet. 2012 Jul 15;44(8):904-9. doi: 10.1038/ng.2352.
7. Integration of genome-wide association studies with biological knowledge identifies six novel genes related to kidney function. Chasman DI et al. Hum Mol Genet. 2012 Dec 15;21(24):5329-43. doi: 10.1093/hmg/dds369. Epub 2012 Sep 8.
8. Candidate genes for non-diabetic ESRD in African Americans: a genome-wide association study using pooled DNA. Bostrom MA et al. Hum Genet. 2010 Aug;128(2):195-204. doi: 10.1007/s00439-010-0842-3. Epub 2010 Jun 8.
9. New susceptibility loci associated with kidney disease in type 1 diabetes. Sandholm N et al. PLoS Genet. 2012 Sep;8(9):e1002921. doi: 10.1371/journal.pgen.1002921. Epub 2012 Sep 20.
10. Promoter polymorphism of the erythropoietin gene in severe diabetic eye and kidney complications. Tong Z et al. Proc Natl Acad Sci U S A. 2008 May 13;105(19):6998-7003. doi: 10.1073/pnas.0800454105. Epub 2008 May 5.
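The two selection criteria stated in the legend of Supplementary Table 2 (genome-wide significance, P < 5 × 10^-8, and genes less than 250 kb from the leading SNP) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study. The SNP positions and p-values are taken from the table, but the gene intervals (`GENE_A` to `GENE_D`), the sub-threshold SNP `rs_example`, and all function and class names are hypothetical and serve only to show the logic. Distance is measured here to the nearest gene boundary, which is an assumption; the legend does not specify how distance was computed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

GENOME_WIDE_P = 5e-8   # significance threshold from the table legend
WINDOW_BP = 250_000    # flanking distance from the table legend


@dataclass(frozen=True)
class Snp:
    rsid: str
    chrom: str
    pos: int
    p_value: float


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int
    end: int


def distance_to_gene(snp: Snp, gene: Gene) -> Optional[int]:
    """Distance in bp from the SNP to the nearest gene boundary.

    Returns 0 if the SNP lies inside the gene and None if the two
    are on different chromosomes.
    """
    if snp.chrom != gene.chrom:
        return None
    if gene.start <= snp.pos <= gene.end:
        return 0
    return min(abs(snp.pos - gene.start), abs(snp.pos - gene.end))


def genes_near_significant_snps(
    snps: List[Snp],
    genes: List[Gene],
    p_max: float = GENOME_WIDE_P,
    window: int = WINDOW_BP,
) -> Dict[str, List[str]]:
    """Map each genome-wide significant SNP to genes closer than `window` bp."""
    hits: Dict[str, List[str]] = {}
    for snp in snps:
        if snp.p_value >= p_max:
            continue  # SNP does not reach genome-wide significance
        nearby = []
        for gene in genes:
            dist = distance_to_gene(snp, gene)
            if dist is not None and dist < window:
                nearby.append(gene.symbol)
        hits[snp.rsid] = nearby
    return hits


# SNP rows from Supplementary Table 2, plus one hypothetical sub-threshold SNP.
snps = [
    Snp("rs12917707", "16", 20367690, 1.2e-20),
    Snp("rs12437854", "15", 94141833, 2e-9),
    Snp("rs_example", "16", 20367000, 1e-6),  # hypothetical, filtered out
]

# Hypothetical gene intervals; these are NOT real annotations.
genes = [
    Gene("GENE_A", "16", 20_300_000, 20_360_000),  # 7,690 bp from rs12917707
    Gene("GENE_B", "16", 20_700_000, 20_750_000),  # beyond the 250 kb window
    Gene("GENE_C", "15", 95_000_000, 95_100_000),  # beyond the 250 kb window
    Gene("GENE_D", "16", 20_360_000, 20_370_000),  # contains rs12917707
]

hits = genes_near_significant_snps(snps, genes)
```

A SNP with no gene inside the window maps to an empty list, which corresponds to the "no gene in <250 kb distance" entry in the table.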

Supplementary Figure 1

Baseline expression of the CKD risk loci associated transcripts in kidney

Human kidney RNA sequencing data were used to examine the baseline expression of the transcripts. Of the transcripts neighboring CKD risk loci, 41% are highly expressed in the kidney (red), 32% show medium expression (yellow), 21% show low expression (green), and 6% are not expressed (blue).

[Pie chart: Baseline expression of the CKD risk associated transcripts in kidney. High: 41%; Medium: 32%; Low: 21%; No expression: 6%.]

Supplementary Figure 2

Comprehensive RNA sequencing map of the CKD risk associated transcripts in 16 different human tissues

Expression levels of 306 CRATs were examined using the Illumina Body Map (www.ebi.ac.uk).

The relative expression of each gene in each tissue is shown in the table. The transcripts with the highest expression are shown at the top of the map. High expression values (> 90th percentile) are marked red; low expression values (< 10th percentile) are marked blue. Expression levels with FPKM values (fragments per kilobase of exon per million fragments mapped) lower than 0.1 are white.

Gene Symbol  adipose  adrenal  brain  breast  colon  heart  kidney  leukocyte  liver  lung  lymph node  ovary  prostate  skeletal muscle  testes  thyroid
UMOD 1417 0.1
RPL19 975 1359 275 665 1003 273 567 1536 413 1091 1339 1169 1435 964 745 737
CFL1 338 467 334 209 227 112 327 695 169 501 332 231 269 78 208 193
CERS2 53 34 52 37 45 27 149 113 278 110 81 100 18 37 58
CST3 307 218 252 331 310 177 146 510 244 347 334 135 477 212 313 234
ACSM2A 0.2 0.3 0.6 0.5 0.3 133 0.2 98 0.1 0.5 0.6 0.1 0.6 0.2
ACSM2B 0.3 0.3 0.1 1 0.1 121 0.6 0.1 0.4 0.2
ATP1B3 59 91 61 49 63 31 105 65 9 211 135 64 51 13 94 92
GATM 6118971168932145153617123320
ALDH2 39 121 384 134 98 81 86 307 143 52 123 57 53 96
DAB2 6025328207787 41262019 1913
TMED9 54 66 16 39 37 19 75 60 117 73 70 51 47 21 44 70
MYH9 143 23 107 26 63 194 23 225 97 93 87 41 95 75
RHEB 70 55 99 52 103 47 62 29 36 88 38 79 110 85 67 122
GNB2 127 70 42 65 72 28 61 159 36 132 80 112 69 88 74 65
NAT8 0.1 61 0.1 5 0.2 0.2 0.2 0.4
PSMA5 33 40 28 39 37 30 56 74 66 52 37 46 45 41 58 48
LMAN2 28 33 12 22 37 15 55 62 85 42 40 31 44 22 61 60
SLC47A2 0.1 1 0.1 48 0.1 0.1 2 0.1 0.1 0.7 0.4
DDX1 41 39 64 53 44 43 47 21 21 17 16 57 40 50 57 81
SARS 71 70 87 44 63 26 46 57 37 73 38 90 70 76 60 107
SCARB2 51 42 94 71 63 28 46 23 57 75 39 69 73 88 48 76
TXN2 27 32 28 34 24 36 44 37 43 28 30 43 46 39 32 58
VEGFA 17 5 15 13 28 43 3 32 65 32 21 32 29 6 93
WDR72 0.1 1 0.2 0.3 0.2 43 10 0.1 0.7 0.2 0.6 27
SLC47A1 52221241129429126813
SORT1 98738143041176179 616404546
PRELID1 32 83 20 26 35 7 39 83 39 36 34 31 31 4 28 30
FIBP 22 31 37 23 14 16 38 27 13 11 14 29 29 17 49 22
ALDH3A2 28 15 15 68 21 14 37 12 38 47 11 35 34 26 73
SLC6A13 0.2 0.2 2 0.2 0.7 35 2 0.2 0.1 0.6 0.2 0.3 0.9 5
IDI1 13 27 64 20 12 10 31 28 53 46 26 46 15 18 25 38
DRAP1 62 55 64 51 61 17 30 46 19 46 28 59 43 53 121 78
GNAI3 29 27 12 25 29 6 28 52 12 24 22 33 31 6 33 24
SLC6A12 0.6 0.5 4 0.4 0.2 0.3 28 2 13 1 1 1 1 7
CDC42SE1 23 44 12 15 14 10 26 184 8 55 71 27 30 11 13 25
ERBB2 3 4 9 16 26 2 2 4 14 5
MKKS 8 9 18 13 12 16 26 16 14 6 13 13 16 6 24 14
TPRKB 17 36 15 27 29 24 26 20 17 12 21 46 30 11 61 34
APOL1 243051825102518217518415272413
PTPN12 43 33 14 27 37 20 24 57 8 90 36 29 37 11 33 48
BLOC1S6 42 41 28 20 23 26 12 20 21 41 25 6 27 41
PCOLCE 15743361014211 93050594024124
SCNM1 22 46 20 20 21 5 20 34 7 15 22 30 27 32 30
ACTG2 36 173 3 40 2101 2 19 2 86 63 66 1888 11 162 13
GSTM4 91719181313195 611113845131620
PRKAG2 12 17 18 10 13 44 19 23 9 9 10 10 33 3 22 24
SRRT 9 21 12 10 13 9 19 15 11 14 33 30 26 26 60 17
UBE2Q2 17 17 14 18 35 11 19 27 2 20 18 26 33 10 25 23
VPS72 13 18 21 17 22 10 19 16 9 14 14 24 35 44 27 25
MRPS18A 16 19 13 16 12 12 18 15 24 10 13 16 26 31 15 25
SLC22A2 0.7 0.1 18 0.1 0.2 0.1 0.1 0.3
C7orf50 17 15 17 19 18 10 17 13 6 11 29 25 6 18 38
SART1 4751112731917281126301836383844
APOL2 171622169131620134215141591013
DUSP11 10 12 7 10 8 6 16 25 6 8 11 16 13 5 17 14
IMP3 10 24 14 15 12 8 16 29 15 6 17 20 13 15 24
KAT5 13 22 16 14 14 8 16 17 6 12 9 24 26 20 27 37
MFAP4 11 36 3 39 70 23 16 551 67 66 221 7 57 23
APOL3 66 3 30 38 10 15 4 62 42 14 30 7
DBN1 22 12 28 19 19 3 15 3 0.7 8 9 39 22 4 25 16
GTPBP4 15 21 16 14 13 9 15 14 7 11 13 25 15 21 26 28
IGF2R 27 11 4 13 12 16 15 68 13 9 15 11 12 46 11 20
RELA 372910181491537114127 25321824
RNASEH2C 21 40 8 17 19 8 15 13 7 11 13 26 23 23 9
STAMBP 9 9 22 11 9 9 15 17 7 7 15 11 19 13 16 14
DDX41 19 22 24 14 15 4 14 23 9 11 15 29 14 9 20 27
LTBP3 13 23 18 24 24 11 14 4 4 28 25 50 31 12 23 30
PIP5K1A 8 15 8 9 6 10 14 13 5 15 14 7 18 12
SCYL1 15 20 13 11 9 14 17 15 17 21
TBX2 5 3 0.7 4 2 2 14 0.9 26 3 3 11 0.9 7 4
TRIP6 11 12 4 15 8 3 14 1 6 12 8 16 18 1 13 22
EPHB4 22 27 2 15 11 9 13 2 14 7 10 42 6 0.8 15 10
FAM193B 6 13 6 5 10 3 13 6 8 19 28 7 11 5 9 8
FAM63A 438661213163456125914
FBLIM1 18 42 2 7 13 14 13 0.6 0.7 20 28 35 18 1 39 7
GSTM2 4 9 12 38 12 20 13 4 2 24 12 67 30 14 18 23
HLA-B 37 45 15 28 5 7 13 757 0.8 506 44 2 31 8 16 9
JAG1 22 15 4 16 21 6 13 1 1 22 13 11 20 4 13 10
POP7 10 13 16 10 8 6 13 11 16 7 10 10 13 19 11 11
RAB24 6 14 11 8 6 4 13 24 16 21 9 15 4 7 7
SLC34A1 13 0.1
RHPN2 0.923230.6120.5652230.10.93
SIPA1 114517731235718461215166
TSC22D4 9 12 88 8 10 8 12 36 2 14 26 10 14 3 11 9
ARNT 2111515 811216 132915181614
CTSS 31 43 10 26 16 10 11 2006 19 131 63 22 25 3 9 10
MAP3K11 20 17 5 14 10 4 11 27 16 29 18 8 12 4 11 10
PRUNE 77681117112048101216101420
RGS14 4 12 5 3 2 0.6 11 27 8 5 23 3 3 0.2 3 3
SNUPN 5 11 6 10 8 5 11 10 5 7 11 14 14 9 24 15
TFDP2 635106141165656565210
AGFG2 10128544107195112292127
FXN 4 12 5 8 6 15 10 18 32 4 14 14 16 7 12 8
SSSCA1 19 22 8 14 7 10 10 19 11 23 13 19 12 9 16 23
TMEM60 8 12 7 11 8 4 10 28 6 7 15 12 11 9 19 14
ATXN2 99139699106171015161811
B9D1 69 21714
C11orf68 12 9 14 11 13 7 9 20 4 19 18 16 26 23 21 18
INTS1 619914749786961217219
MAD2L1BP 494448915657855610
FAM89B 813710206 82951725152323108
LARP4B 9 10 7 10 5 9 8 19 4 10 8 10 8 12 14 16
MPPED2 22613280.90.612262331
STARD3 916129106 828813321212111317
ZFAND2A 61535873812161615121221940
ACAD10 37 3
AGMAT 2 0.8 0.7 0.9 7 2 21 0.3 0.8 0.7
ANKRD27 5546113710157101512157
ANXA9 0.7 0.5 1 0.7 0.8 0.8 7 2 6 0.4 0.8 1 4 0.1 1 4
GTPBP2 1412488671523017812201014
PDLIM7 17 47 5 21 69 5 7 13 2 18 11 44 83 32 57 30
SYPL2 0.70.90.22157 20.313235315
B4GALT7 58516336343876467
DACH1 3 0.8 1 1 1 0.8 6 0.8 0.2 3 0.8 0.9 2 1 5 0.3
DNAJC16 3444236642564654
EFHD2 14 27 26 9 15 3 6 145 4 27 19 14 12 1 22 8
FAM122A 564775611336108577
LRCH4 41634336394113819792411
MOSPD3 7655526666610102156
NACA2 72 86 19 69 69 4 6 12 4 15 10 127 14 11 120 40
NUDT19 44363868535451134
PGAP3 46 32666 2 51
PPP1R1B 14 0.3 94 35 30 0.5 6 0.1 15 2 13 25 0.5 10 0.5
RSBN1L 597572623399131051717
SLC7A9 0.1 0.2 0.2 0.1 6 4 0.3 0.5 0.3 0.3 0.2 0.2
ACSM5 30.40.5910.35 230.620.41112
AMPD2 491145252377191010255
BRAP 88879451034699135612
CELSR2
0.30.41411150.610.40.723254 CTSK 171243459175213012055108104621 DDI2 94483557655652745 GK5 3436135314256255 MED1 81056665193581310688 NAPB 341823825315568167 NSD1 55444451423585688 PHTF2 4767 958 7876221011 PIP5K1B 23102724540.155260.7211 SNX33 11611082524101011 277 SPATA5L1 2423325633766368 WDR37 26842551433543347 APOL4 673343410.555 100.733 CASP9 2232234824495354 CCDC85B 19 24 16 13 14 7 4 14 3 32 15 10 20 6 10 9 CDK12 773553414359754106 EHBP1L1 14 30 3 8 21 11 4 1 15 17 14 27 43 12 16 EPN2 1238640.91 68125115 FAM47E 0.3 7 4 0.1 0.2 2 6 GIGYF1 6935334414886377 KDM5A 67 5434 247753107 PCNXL3 3623214533774265 SH2B3 1715395104313161375366 SHROOM3 0.220.62214 360.3240.123 STAC2 0.6 0.4 13 5 0.7 0.3 4 0.1 0.6 0.9 1 6 0.3 2 0.2 STC1 26 3 0.7 5 51 4 4 0.1 11 18 1 23 2 1 14 ULK2 6311754460.943572153 XRN1 76553441548566867 CEP89 333442330.734106466 CYB561D1 121210.9370.90.73320.914 GABPB2 442432350.713741124 GP2 0.1 0.2 0.1 0.4 0.1 3 0.2 0.8 0.2 0.1 0.7 0.1 0.1 GPATCH1 2543323411454264 GRK6 6238553347 713756119 LYSMD1 2243343 32348 33 MAPK7 551 34 7 264 MICALL2 164340.730.30.665240.422 SLC12A9 373321311 39640.523 SLC22A3 2 11 0.6 5 5 0.7 3 26 3 11 10 9 15 12 11 SLC30A4 324331320.713540.672 AMIGO1 22114332511254745 AP5B1 32230.8121212321214 ATXN7L2 0.6 2 0.7 0.7 0.4 0.3 2 0.7 0.4 0.5 2 2 0.9 0.9 5 1 B4GALNT3 0.10.80.60.413210.120.710.60.432 BNIPL 0.3 0.7 0.3 2 0.4 0.3 2 0.3 0.1 2 4 0.5 4 0.1 1 2 CSPG4 9328922 722111644 FAM109A 120.42322212252132 FBXL20 3244332512334 93 FYB 39420.50.821240.9613210.620.5 GPER 0.921430.820.340.520.820.423 GPR146 15165187220.734 16342 MLLT11 44 3 14 2 0.6 3 2 0.4 2 1 POLH 231211240.8 343264 SLC25A34 0.7 3 2 0.4 4 ABHD16A 0.3 0.5 0.9 0.1 0.6 0.6 1 1 0.9 2 1 0.5 2 0.5 1 0.4 ALDH3A1 4 0.7 0.6 0.8 0.3 1 0.1 7 0.9 0.4 0.7 2 0.8 ALMS1 322120.8130.522831193 C15orf27 0.4 0.9 1 0.6 0.1 0.1 1 0.2 0.2 0.6 0.4 0.3 0.3 0.5 3 0.6 DOK3 2 17 1 0.9 0.4 0.6 1 45 0.7 3 20 1 2 0.5 2 0.7 DUOX1 0.2351370.112 90.4361317 NRG4 0.5 2 3 0.8 0.5 0.4 1 0.1 2 2 0.8 1 2 0.8 
3 0.7 OVOL1 0.1 0.2 1 1 0.2 1 8 0.3 PLXDC1 43 55110.147 30.712 PSRC1 0.60.7170.510.3120.911230.170.7 SEMA6C 12132810.20.213321353 SLC22A1 0.5 0.4 1 1 0.2 350 0.3 0.1 0.5 0.6 0.1 SLC28A2 0.1 0.2 0.1 0.1 0.5 0.1 1 0.1 0.1 0.1 0.2 0.2 0.4 0.1 STBD1 630.5142610.5 4124835 TMEM82 12 SHF 3242 0.90.90.510 4110.56 CACNB1 0.7110310.90.80.9 1323102319 F12 0.8 63 PNMT 0.6 0.2 1 0.1 1 1 0.8 0.4 0.3 0.2 2 0.8 1 0.8 ACSM1 0.1 0.2 0.5 0.2 0.3 0.1 0.7 0.6 0.3 0.4 2 7 0.2 2 0.3 AFF3 0.955330.90.74 2240.330.6 C19orf40 0.6 1 0.4 1 0.6 0.4 0.7 1 0.3 0.8 0.7 1 0.8 0.5 3 1 CST5 0.7 2 PRRC2A 1 0.6 0.3 0.4 0.7 4 0.7 0.2 0.1 FHAD1 0.6 0.5 1 0.4 0.2 0.6 0.1 0.1 4 0.2 4 0.3 0.1 8 0.4 FOSL1 721220.70.60.50.41120.5110.54 SLX4IP 0.4 0.5 0.5 0.4 0.2 0.4 0.6 2 0.4 0.1 0.8 1 0.9 0.3 0.5 0.8 ARL5C 0.9 0.2 0.5 0.2 0.1 0.5 0.6 0.4 0.6 0.2 0.2 0.5 MYCN 0.3 2 0.2 0.5 0.2 0.6 1 0.2 1 0.8 NKX3‐1 0.610.80.9110.50.41425243141 ADARB2 0.2 0.2 5 2 0.2 0.4 0.7 0.1 0.2 0.5 0.7 0.4 0.1 1 0.1 CSNK2B 0.4 0.3 0.3 DCDC1 0.1 0.1 0.6 0.1 0.4 0.3 0.3 1 0.1 0.9 1 0.8 HLA‐C 17 5 0.4 99 0.6 70 2 0.3 2 0.6 0.7 RNF112 0.4 2 6 0.3 0.6 0.2 0.4 0.1 0.9 0.5 0.7 1 0.8 0.8 1 RSPH9 0.5 0.3 0.4 4 1 5 TCAP 0.3 0.3 0.4 47 0.9 588 0.4 0.2 0.1 0.5 3 0.6 0.4 284 9 99 UFSP1 0.5 2 0.1 1 0.3 0.1 0.4 0.7 0.5 0.1 0.2 1 0.9 1 0.3 2 UNCX 0.4 ACHE 0.334230.20.30.612112118 CST1 0.3 0.3 0.1 0.6 0.2 0.3 CSTL1 0.3 0.2 0.4 0.3 0.3 0.3 2 0.1 NYAP1 0.3 0.3 6 0.2 0.5 0.1 0.3 0.1 0.1 0.5 1 0.4 2 0.3 PRR7 0.1 0.4 1 0.3 0.2 0.2 0.3 1 0.1 1 0.7 0.3 0.3 0.3 0.4 0.2 C6orf223 0.1 0.1 0.1 0.2 0.5 0.1 0.1 1 CCDC158 0.2 0.5 0.7 0.9 0.3 0.4 0.2 0.4 0.6 0.1 0.2 0.3 0.1 17 0.5 CTRC 0.2 0.7 0.2 0.3 0.1 0.1 0.2 0.1 0.1 0.3 0.2 0.3 0.2 0.1 0.3 0.2 DDX39B 0.2 0.1 0.2 0.1 0.1 GPR61 0.1 0.2 1 0.1 0.2 0.1 0.1 0.3 0.1 0.4 0.6 IQSEC3 0.8 0.3 12 0.7 0.7 0.4 0.2 0.4 1 0.7 1 1 1 0.5 LY6G5B 0.7 0.4 0.3 0.3 0.2 0.2 0.1 0.3 0.4 0.2 0.5 PDILT 0.2 0.1 0.1 0.2 0.2 0.1 0.2 3 TFR2 0.5 0.6 2 0.5 0.2 0.3 0.2 0.8 174 0.7 0.9 0.4 0.5 0.1 0.6 0.4 BAG6 0.1 0.2 0.2 0.1 0.1 0.1 0.1 0.1 0.1 0.3 
0.2 0.6 1 NFKBIL1 0.1 0.1 0.1 0.1 0.1 ODF3L1 0.2 0.2 0.1 1 0.2 0.1 0.1 0.3 0.1 0.3 0.2 0.2 8 0.2 TDRD12 0.8 0.2 0.1 0.5 0.4 0.1 0.1 0.2 0.1 0.3 0.8 0.3 0.1 11 0.1 ACTL6B 0.1 22 0.3 0.1 0.5 0.2 AIF1 0.1 APOM 0.1 0.1 0.1 0.1 0.1 0.1 0.2 0.1 ATP6V1G2 0.3 0.1 BCAS3 10 3 4 3 9 10 4 5 11 11 C15orf48 106 14 126 3 C17orf82 0.1 0.1 0.4 0.1 0.3 0.1 0.2 0.1 C1orf56 22 2 331194 C2orf78 4 C6orf25 0.1 0.1 C6orf47 C9 571 CCDC77 236 CELA2A 11 CELA2B CST11 0.8 17 CST2 0.1 0.1 0.1 0.2 0.1 0.3 0.1 CST4 0.1 0.1 0.6 CST8 10 CST9 0.2 CST9L 23 CTSW 28914102 CUX2 3 0.1 0.1 0.1 8 1 0.5 1 DCDC5 DDAH2 1 DUOXA1 5 0.2 4 0.8 11 DUOXA2 EFEMP2 17 24 16 10 44 23 21 23 EPO 0.1 4 0.1 0.1 0.4 FBXO24 0.3 0.2 19 GNAT2 0.1 0.1 0.1 0.1 0.5 GPANK1 0.1 0.1 0.1 0.1 GSTM1 12 24 28 0.7 0.2 17 57 154 26 IDI2 184 KCNK7 0.1 1 2 0.1 0.3 0.5 0.3 0.2 0.4 0.3 0.2 0.2 0.3 LST1 0.7 1 LTA LTB 4 1 0.4 4 0.1 LY6G5C 0.2 0.1 0.1 0.1 LY6G6C LY6G6F MAGI2 221521 0.3 134 4 MICA 0.1 0.2 MICB 0.3 0.1 0.1 0.9 0.3 0.4 0.1 MUS81 MXD3 MYBPHL 0.1 2 1 0.1 0.1 0.4 0.3 0.8 0.2 NAT8B NCR3 0.1 NEUROD2 5 0.1 NKX2‐6 PFN3 0.4 PRKACG 10 0.1 RGS9BP 0.4 3 RSC1A1 SAP25 198 SETDB1 513354210 4 75178 SNORD84 SNX32 SOX11 0.5 0.2 0.6 0.1 0.1 TBX4 0.2 0.1 0.2 0.1 0.3 18 0.2 0.1 9 0.7 0.4 TMOD4 12 250 4 12 TNF 0.2 0.8 0.1 0.2 TNFAIP8L2 9 64236 TSGA10IP 0.1 0.1 0.9 0.3 ZAN 0.2 Supplementary Table 3

Demographics, clinical information and histological analysis of glomerular samples

Data are presented as mean and standard deviation with the median values or percentage (%).

Estimated glomerular filtration rate (eGFR) was calculated according to the CKD-EPI equation. Depending on the results of the D'Agostino-Pearson normality test, the Pearson product-moment or Spearman correlation coefficient (R coefficient) was used to measure the strength of association between eGFR and age, BMI, serum glucose, blood pressure (systolic and diastolic), serum creatinine, BUN, serum albumin, and the percentages of glomerulosclerosis and interstitial fibrosis. Asterisks (*) mark correlations that reached statistical significance in two-tailed tests (P < 0.05).
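The eGFR calculation mentioned above can be sketched as follows. The source only says "the CKD-EPI equation"; this sketch assumes the 2009 CKD-EPI creatinine equation, and the function name and argument names are illustrative rather than taken from the study.

```python
def ckd_epi_2009(scr_mg_dl: float, age_years: float, female: bool, black: bool = False) -> float:
    """Return eGFR in ml/min/1.73 m^2 using the 2009 CKD-EPI creatinine equation.

    Assumption: the 2009 creatinine-based form; the source does not name the version.
    """
    kappa = 0.7 if female else 0.9          # sex-specific creatinine reference (mg/dL)
    alpha = -0.329 if female else -0.411    # exponent applied below the reference value
    ratio = scr_mg_dl / kappa
    egfr = 141.0 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.209 * 0.993 ** age_years
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


if __name__ == "__main__":
    # Hypothetical patient: 60-year-old man with serum creatinine 1.2 mg/dL.
    print(round(ckd_epi_2009(1.2, 60, female=False), 1))
```

Higher serum creatinine yields a lower eGFR under this equation, which is in line with the strong negative correlation between serum creatinine and eGFR reported in the demographic tables.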

Patient Demographics (Samples from Glomeruli), Total: n=51

| Characteristic | % or mean ± SD (median) | Correlation with eGFR (R coefficient) |
|---|---|---|
| Gender: Male | 47.1 % | |
| Gender: Female | 52.9 % | |
| Race: Non-Hispanic White | 19.6 % | |
| Race: African American | 35.3 % | |
| Race: Asian | 5.9 % | |
| Race: Hispanic | 15.7 % | |
| Race: Multiracial | 9.8 % | |
| Race: Unknown | 13.7 % | |
| Diabetes | 45.1 % | |
| Hypertension | 80.4 % | |
| Age (years) | 61.08 ± 12.9 (63) | -0.262 |
| BMI (Body Mass Index) (kg/m2) | 32.18 ± 15.7 (29.2) | -0.097 |
| Serum glucose (mg/dL) | 124.8 ± 51.3 (115) | -0.254 |
| Blood pressure - systole (mm Hg) | 136.52 ± 20.2 (130) | -0.153 |
| Blood pressure - diastole (mm Hg) | 81.24 ± 13.4 (80) | -0.081 |
| eGFR (ml/min/1.73m2) | 58.53 ± 28.5 (60.9) | |
| Serum creatinine (mg/dL) | 1.66 ± 1.4 (1.2) | -0.893 * |
| BUN (Blood Urea Nitrogen) (mg/dL) | 21.59 ± 14.2 (19) | -0.653 * |
| Serum albumin (g/dL) | 3.75 ± 0.8 (4) | 0.219 |
| Glomerulosclerosis (%) | 11.45 ± 17.4 (3.9) | -0.511 * |
| Interstitial Fibrosis (%) | 13.91 ± 13.6 (10) | -0.586 * |

Supplementary Table 4

Demographics, clinical information and histological analysis of tubule samples

Data are presented as mean and standard deviation with the median values or percentage (%).

Estimated glomerular filtration rate (eGFR) was calculated according to the CKD-EPI equation. Depending on the results of the D'Agostino-Pearson normality test, the Pearson product-moment or Spearman correlation coefficient (R coefficient) was used to measure the strength of association between eGFR and age, BMI, serum glucose, blood pressure (systolic and diastolic), serum creatinine, BUN, serum albumin, and the percentages of glomerulosclerosis and interstitial fibrosis. Asterisks (*) mark correlations that reached statistical significance in two-tailed tests (P < 0.05).
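The choice between the two correlation coefficients can be illustrated with a small dependency-free sketch. The normality decision is passed in as a flag (`both_normal`), standing in for the outcome of a D'Agostino-Pearson test run elsewhere; the function names and the toy values are hypothetical and not taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def association_with_egfr(values, egfr, both_normal):
    """Use Pearson when both variables passed the normality test, else Spearman."""
    if both_normal:
        return "pearson", pearson_r(values, egfr)
    return "spearman", spearman_rho(values, egfr)


if __name__ == "__main__":
    creatinine = [1.0, 1.2, 1.5, 2.1, 9.0]   # toy values with one extreme observation
    egfr = [95.0, 80.0, 55.0, 35.0, 6.0]
    print(association_with_egfr(creatinine, egfr, both_normal=False))
```

A skewed variable such as serum creatinine (mean above median in every table here) is the kind of input for which the rank-based coefficient would typically be selected.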

Patient Demographics (Samples from Tubules), Total: n=95

| Characteristic | % or mean ± SD (median) | Correlation with eGFR (R coefficient) |
|---|---|---|
| Gender: Male | 57.9 % | |
| Gender: Female | 42.1 % | |
| Race: Non-Hispanic White | 20.0 % | |
| Race: African American | 36.8 % | |
| Race: Asian | 3.2 % | |
| Race: Hispanic | 6.3 % | |
| Race: Multiracial | 17.9 % | |
| Race: Unknown | 15.8 % | |
| Diabetes | 38.9 % | |
| Hypertension | 76.8 % | |
| Age (years) | 63.57 ± 13.5 (65) | -0.131 |
| BMI (Body Mass Index) (kg/m2) | 29.77 ± 9.3 (29) | 0.150 |
| Serum glucose (mg/dL) | 135.4 ± 65.3 (118) | 0.153 |
| Blood pressure - systole (mm Hg) | 138.97 ± 24.8 (136.5) | -0.299 * |
| Blood pressure - diastole (mm Hg) | 78.05 ± 13.7 (78.5) | -0.174 |
| eGFR (ml/min/1.73m2) | 60.08 ± 29.8 (64.1) | |
| Serum creatinine (mg/dL) | 2.05 ± 2.5 (1.1) | -0.894 * |
| BUN (Blood Urea Nitrogen) (mg/dL) | 23.2 ± 13.7 (19) | -0.696 * |
| Serum albumin (g/dL) | 3.96 ± 0.7 (4.1) | 0.228 * |
| Glomerulosclerosis (%) | 17.97 ± 27.3 (5.5) | -0.570 * |
| Interstitial Fibrosis (%) | 16.47 ± 21.6 (10) | -0.732 * |

Supplementary Table 5

The correlation between levels of diabetic CKD risk associated transcripts (D-CRATs) and kidney function

We identified 18 D-CRATs in the neighborhood of three loci associated with the development of diabetic kidney disease (rs12437854, rs7583877 and rs1617640). The Pearson product-moment correlation coefficient (Pearson R) was used to measure the strength of association between gene expression and eGFR, and a two-tailed test was used to determine statistical significance. Four transcripts showed significant correlation with eGFR after Benjamini-Hochberg multiple-testing correction (P corrected < 0.05), and 10 transcripts showed correlation with eGFR based on uncorrected P values. Gene symbols are official symbols approved by HGNC (HUGO Gene Nomenclature Committee).
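The Benjamini-Hochberg correction referred to above can be sketched in a few lines. This is a generic implementation of the step-up procedure, not the authors' code; the function name is illustrative, and the number of tests that entered the correction in the study is not stated in the source.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest to largest P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so each adjusted value is capped by those above it.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy P values, not taken from the table.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

As a rough consistency check, the smallest glomerular P value in the table (0.0024) and its corrected value (0.0912) differ by a factor of 38, which is what this procedure would give if 38 tests were corrected together. That count is an inference from the numbers, not something the source states.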

eGFR correlation of DKD specific CRATs in Glomeruli

| Gene Symbol | Pearson R | 95% confidence interval | P (two-tailed) | P corrected |
|---|---|---|---|---|
| PCOLCE | -0.4555 | -0.6671 to -0.1759 | 0.0024 | 0.0912 |
| LRCH4 | -0.3602 | -0.5986 to -0.0631 | 0.0191 | 0.3629 |
| TFR2 | 0.2841 | -0.0217 to 0.5413 | 0.068 | 0.8639 |
| MOSPD3 | 0.1742 | -0.1370 to 0.4541 | 0.270 | 0.9931 |
| TSC22D4 | 0.1661 | -0.1451 to 0.4475 | 0.293 | 0.9931 |
| AGFG2 | 0.1561 | -0.1552 to 0.4392 | 0.323 | 0.9931 |
| TRIP6 | 0.1498 | -0.1614 to 0.4340 | 0.343 | 0.9931 |
| SRRT | 0.1461 | -0.1652 to 0.4309 | 0.356 | 0.9931 |
| EPO | 0.1256 | -0.1854 to 0.4138 | 0.428 | 0.9931 |
| EPHB4 | -0.1240 | -0.4124 to 0.1869 | 0.434 | 0.9931 |
| GNB2 | -0.0958 | -0.3884 to 0.2144 | 0.546 | 0.9931 |
| ACHE | -0.0864 | -0.3803 to 0.2234 | 0.586 | 0.9931 |
| LRCH4///SAP25 | 0.0805 | -0.2290 to 0.3753 | 0.612 | 0.9931 |
| AFF3 | -0.0439 | -0.3432 to 0.2636 | 0.783 | 0.9931 |
| SLC12A9 | -0.0265 | -0.3278 to 0.2797 | 0.868 | 0.9931 |
| POP7 | 0.0186 | -0.2870 to 0.3207 | 0.907 | 0.9931 |
| ACTL6B | 0.0156 | -0.2897 to 0.3180 | 0.922 | 0.9931 |
| FBXO24 | 0.0036 | -0.3006 to 0.3072 | 0.982 | 0.9931 |

eGFR correlation of DKD specific CRATs in Tubules

| Gene Symbol | Pearson R | 95% confidence interval | P (two-tailed) | P corrected |
|---|---|---|---|---|
| TRIP6 | 0.5121 | 0.3241 to 0.6612 | 2.26 × 10^-6 | 8.59 × 10^-5 |
| LRCH4 | -0.3761 | -0.5545 to -0.1646 | 8.14 × 10^-4 | 0.0155 |
| SLC12A9 | 0.3564 | 0.1423 to 0.5385 | 1.58 × 10^-3 | 0.0200 |
| MOSPD3 | 0.320 | 0.1019 to 0.5087 | 4.84 × 10^-3 | 0.0459 |
| AFF3 | -0.2731 | -0.4696 to -0.0507 | 0.017 | 0.1002 |
| SRRT | 0.2666 | 0.0438 to 0.4642 | 0.020 | 0.1002 |
| ACHE | -0.2642 | -0.4621 to -0.0412 | 0.021 | 0.1002 |
| AGFG2 | 0.2518 | 0.0279 to 0.4516 | 0.028 | 0.1191 |
| TFR2 | -0.2316 | -0.4344 to -0.0065 | 0.044 | 0.1674 |
| EPO | -0.2036 | -0.4103 to 0.0229 | 0.078 | 0.2459 |
| EPHB4 | 0.1858 | -0.0413 to 0.3948 | 0.108 | 0.3156 |
| FBXO24 | -0.1379 | -0.3524 to 0.0904 | 0.235 | 0.5579 |
| ACTL6B | -0.0941 | -0.3129 to 0.1342 | 0.419 | 0.7275 |
| LRCH4///SAP25 | -0.0763 | -0.2966 to 0.1518 | 0.512 | 0.8467 |
| POP7 | -0.0682 | -0.2892 to 0.1597 | 0.558 | 0.8501 |
| PCOLCE | 0.0429 | -0.1844 to 0.2658 | 0.713 | 0.9128 |
| TSC22D4 | 0.0284 | -0.1983 to 0.2523 | 0.807 | 0.9128 |
| GNB2 | 0.0140 | -0.2121 to 0.2387 | 0.904 | 0.9128 |
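The table reports a 95% confidence interval for each Pearson R, but the source does not say how the intervals were computed. A common approach is the Fisher z-transformation, sketched below under that assumption; the function name and the sample size in the example are hypothetical.

```python
import math


def pearson_ci(r, n, z_crit=1.959964):
    """Approximate confidence interval for a Pearson correlation via Fisher's z-transform.

    r: sample correlation; n: number of paired observations (must exceed 3);
    z_crit: standard normal quantile (1.959964 for a two-sided 95% interval).
    """
    z = math.atanh(r)                 # transform r to an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


if __name__ == "__main__":
    # Hypothetical inputs: r = 0.5 from 28 paired observations.
    low, high = pearson_ci(0.5, 28)
    print(round(low, 3), round(high, 3))
```

Because the interval is built on the transformed scale, it is asymmetric around R after back-transformation, as are the intervals listed in the table.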

Supplementary Table 6

List of CRATs showing differential expression in control vs. CKD tubule samples

In tubules, 73 transcripts in the neighborhood of the CKD risk loci showed significant differences when CKD samples were compared with controls. Significance was defined as a P value < 0.05 after Benjamini-Hochberg multiple-testing correction.

Differentially expressed CRATs in chronic kidney disease

| Gene Symbol | P value (corrected) | Regulation |
|---|---|---|
| LST1 | 6.40 × 10^-8 | up |
| SLC7A9 | 4.56 × 10^-7 | down |
| ALDH3A2 | 1.89 × 10^-6 | down |
| SLC34A1 | 2.04 × 10^-6 | down |
| CTSS | 4.19 × 10^-6 | up |
| FYB | 4.19 × 10^-6 | up |
| ACSM5 | 1.85 × 10^-5 | down |
| LTB | 2.55 × 10^-5 | up |
| UMOD | 2.70 × 10^-5 | down |
| ACSM2A///ACSM2B | 3.81 × 10^-5 | down |
| SLC47A1 | 4.26 × 10^-5 | down |
| ANXA9 | 4.54 × 10^-5 | down |
| DNAJC16 | 1.24 × 10^-4 | down |
| NAT8B | 1.63 × 10^-4 | down |
| ACAD10 | 1.97 × 10^-4 | down |
| GSTM4 | 2.55 × 10^-4 | down |
| VEGFA | 4.05 × 10^-4 | down |
| CTSW | 5.04 × 10^-4 | up |
| NAT8///NAT8B | 5.04 × 10^-4 | down |
| FAM89B | 6.09 × 10^-4 | up |
| AFF3 | 6.28 × 10^-4 | up |
| MYCN | 6.28 × 10^-4 | down |
| ALDH2 | 6.64 × 10^-4 | down |
| FAM47E///STBD1 | 6.70 × 10^-4 | down |
| GNAI3 | 6.87 × 10^-4 | up |
| SLC22A2 | 8.54 × 10^-4 | down |
| DAB2 | 1.21 × 10^-3 | down |
| STC1 | 1.59 × 10^-3 | down |
| APOM | 1.83 × 10^-3 | down |
| GPER | 1.93 × 10^-3 | down |
| SLC22A1 | 1.96 × 10^-3 | down |
| AGMAT | 2.18 × 10^-3 | down |
| EHBP1L1 | 2.47 × 10^-3 | up |
| SLC6A13 | 2.47 × 10^-3 | down |
| FAM193B | 2.66 × 10^-3 | up |
| CERS2 | 3.14 × 10^-3 | down |
| LRCH4 | 3.38 × 10^-3 | up |
| PLXDC1 | 3.63 × 10^-3 | up |
| SLC30A4 | 4.33 × 10^-3 | down |
| GATM | 5.84 × 10^-3 | down |
| PGAP3 | 6.17 × 10^-3 | down |
| SLC6A12 | 6.81 × 10^-3 | down |
| IGF2R | 7.18 × 10^-3 | down |
| MICALL2 | 8.09 × 10^-3 | up |
| CTSK | 0.011 | up |
| DDI2///RSC1A1 | 0.012 | down |
| ATXN2 | 0.013 | down |
| CCDC85B | 0.014 | up |
| HLA-C | 0.014 | up |
| TFDP2 | 0.015 | down |
| AIF1 | 0.018 | up |
| DDX1 | 0.018 | down |
| PRUNE | 0.018 | down |
| MFAP4 | 0.018 | up |
| DBN1 | 0.021 | up |
| CELA2A///CELA2B | 0.024 | up |
| DACH1 | 0.024 | down |
| TBX2 | 0.024 | down |
| ERBB2 | 0.027 | down |
| GP2 | 0.030 | down |
| F12 | 0.031 | up |
| PHTF2 | 0.031 | up |
| CDC42SE1 | 0.033 | up |
| LARP4B | 0.033 | down |
| PTPN12 | 0.035 | up |
| PDLIM7 | 0.036 | up |
| IDI1 | 0.037 | down |
| BRAP | 0.039 | down |
| DUOX1 | 0.045 | up |
| FIBP | 0.047 | down |
| MPPED2 | 0.048 | down |
| MYH9 | 0.048 | up |
| WDR37 | 0.049 | down |

Supplementary Table 7

Demographics, clinical information and histological analysis of the 41 tubule samples for external microarray validation

Data are presented as mean and standard deviation with the median values or percentage (%).

Estimated glomerular filtration rate (eGFR) was calculated according to the CKD-EPI equation. Depending on the results of the D'Agostino-Pearson normality test, the Pearson product-moment or Spearman correlation coefficient (R coefficient) was used to measure the strength of association between eGFR and age, BMI, serum glucose, blood pressure (systolic and diastolic), serum creatinine, BUN, serum albumin, and the percentages of glomerulosclerosis and interstitial fibrosis. Asterisks (*) mark correlations that reached statistical significance in two-tailed tests (P < 0.05).

Patient Demographics (Samples from Tubules for Replication), Total: n=41

| Characteristic | % or mean ± SD (median) | Correlation with eGFR (R coefficient) |
|---|---|---|
| Gender: Male | 41.5 % | |
| Gender: Female | 58.5 % | |
| Race: Non-Hispanic White | 19.5 % | |
| Race: African American | 41.5 % | |
| Race: Asian | 2.4 % | |
| Race: Hispanic | 14.6 % | |
| Race: Multiracial | 4.9 % | |
| Race: Unknown | 17.1 % | |
| Diabetes | 51.2 % | |
| Hypertension | 78.0 % | |
| Age (years) | 60.2 ± 13.3 (60) | -0.177 |
| BMI (Body Mass Index) (kg/m2) | 30.26 ± 6.5 (30.5) | 0.042 |
| Serum glucose (mg/dL) | 140.83 ± 65.9 (129) | 0.072 |
| Blood pressure - systole (mm Hg) | 142.44 ± 22.7 (151) | -0.504 |
| Blood pressure - diastole (mm Hg) | 76.22 ± 13.8 (75) | -0.246 |
| eGFR (ml/min/1.73m2) | 52.7 ± 28.2 (55.7) | |
| Serum creatinine (mg/dL) | 2.01 ± 1.8 (1.2) | -0.796 * |
| BUN (Blood Urea Nitrogen) (mg/dL) | 25.0 ± 13.1 (22) | -0.749 * |
| Serum albumin (g/dL) | 3.69 ± 0.9 (3.9) | 0.409 * |
| Glomerulosclerosis (%) | 17.97 ± 25.5 (14.3) | -0.641 * |
| Interstitial Fibrosis (%) | 19.93 ± 22.0 (15) | -0.769 * |

Supplementary Table 8

Demographics, clinical information and histological analysis of the 46 tubule samples used for qRT-PCR validation

Data are presented as mean and standard deviation with the median values or percentage (%).

Estimated glomerular filtration rate (eGFR) was calculated according to the CKD-EPI equation. Depending on the results of the D'Agostino-Pearson normality test, the Pearson product-moment or Spearman correlation coefficient (R coefficient) was used to measure the strength of association between eGFR and age, BMI, serum glucose, blood pressure (systolic and diastolic), serum creatinine, BUN, serum albumin, and the percentages of glomerulosclerosis and interstitial fibrosis. Asterisks (*) mark correlations that reached statistical significance in two-tailed tests (P < 0.05).

Patient Demographics (Tubule samples with qRT-PCR validation), Total: n=46

| Characteristic | % or mean ± SD (median) | Correlation with eGFR (R coefficient) |
|---|---|---|
| Gender: Male | 54.35 % | |
| Gender: Female | 45.65 % | |
| Race: Non-Hispanic White | 21.7 % | |
| Race: African American | 41.3 % | |
| Race: Asian | 4.35 % | |
| Race: Hispanic | 4.35 % | |
| Race: Multiracial | 8.7 % | |
| Race: Unknown | 19.6 % | |
| Diabetes | 52.2 % | |
| Hypertension | 73.9 % | |
| Age (years) | 62.2 ± 13.1 (63.5) | 0.162 |
| BMI (Body Mass Index) (kg/m2) | 28.4 ± 6.3 (28.5) | 0.197 |
| Serum glucose (mg/dL) | 145.8 ± 79.6 (117.5) | 0.015 |
| Blood pressure - systole (mm Hg) | 139.47 ± 29.9 (135) | -0.377 * |
| Blood pressure - diastole (mm Hg) | 77.81 ± 15.5 (76.5) | -0.291 * |
| eGFR (ml/min/1.73m2) | 54.2 ± 32.8 (58.1) | |
| Serum creatinine (mg/dL) | 2.60 ± 3.1 (1.2) | -0.743 * |
| BUN (Blood Urea Nitrogen) (mg/dL) | 25.93 ± 13.7 (21) | -0.712 * |
| Serum albumin (g/dL) | 3.94 ± 0.6 (4) | 0.064 |
| Glomerulosclerosis (%) | 23.7 ± 33.1 (6.2) | -0.748 * |
| Interstitial Fibrosis (%) | 21.53 ± 25.3 (10) | -0.737 * |

Supplementary Figure 3

Transcript level expression and correlation of tubule specific transcript levels with renal function around all 44 CKD risk loci

The x-axis represents the genomic position of each gene on different chromosomes. The y-axis represents the negative logarithm of the p-value (significance) of the correlation between the expression of each gene and eGFR (estimated glomerular filtration rate, ml/min/1.73m2). The lower panel of each chart represents the expression of transcripts within the 250 kb vicinity of the CKD SNP in 16 human organs. Asterisks indicate genes without probe set IDs on the Affymetrix arrays. Two loci are not shown. There is no gene in the vicinity of rs12437854. No genes correlated with renal function around rs491567.

[Figure panels not reproduced. In each panel the x-axis is "Position on the chromosome" and the y-axis is "Gene expression correlation with eGFR (-log P value)". The panels, with the genes labeled in each, are listed below.]

rs10109414 and rs1731274, Chr8 (p21.2): STC1, NKX3-1
rs10206899, Chr2 (p13.1): NAT8B, NAT8, ALMS1, ACTG2, TPRKB, DUSP11, STAMBP
rs10277115, Chr7 (p22.3): GPER, MICALL2, INTS1
rs10774021, Chr12 (p13.33): SLC6A13, SLC6A12, IQSEC3, KDM5A
rs10794720, Chr10 (p15.3): LARP4B, GTPBP4, IDI2, IDI1, WDR37, ADARB2
rs11959928, Chr5 (p13.1): FYB, DAB2, C9
rs12124078, Chr1 (p36.21): DNAJC16, AGMAT, DDI2, RSC1A1, CASP9, EFHD2, CELA2A/2B, CTRC
rs12460876, Chr19 (q13.11): SLC7A9, TDRD12, ANKRD27, NUDT19, C19orf40, GPATCH1
rs12917707, rs4293393 and rs11864909, Chr16 (p12.3): ACSM5, ACSM2A, ACSM2B, UMOD, GP2, ACSM1
rs1394125, Chr15 (q24.2): IMP3, CSPG4, SNUPN
rs1617640, Chr7 (q22.1): TRIP6, SLC12A9, TFR2, ACHE, AGFG2, FBXO24, EPO, ACTL6B, POP7, LRCH4, PCOLCE, SAP25, MOSPD3, EPHB4, SRRT, TSC22D4, GNB2
rs16864170, Chr2 (p25.2): SOX11
rs17319721 and rs13146355, Chr4 (q21.1): FAM47E/STBD1, SCARB2
rs1933182, Chr1 (p13.3): GSTM4, GNAI3, SORT1, GNAT, GSTM1, CELSR2, PSRC1, AMPD2, SARS, PSMA5, GSTM2
rs2279463, Chr6 (q25.3): SLC22A2, IGF2R, SLC22A1, SLC22A3
rs2453580, Chr17 (p11.2): ALDH3A2, SLC47A1, MFAP4, B9D1, EPN2, MAPK7, ALDH3A1, ULK2
rs2453533, Chr15 (p21.1): GATM, SLC30A4, DUOX1, SLC28A2, SPATA5L1
rs267734, Chr1 (q21.3): CTSS, ANXA9, CERS2, CTSK, PRUNE, PIP5K1A, CDC42SE1, MLLT11, SETDB1, ARNT, SCNM1/TNFAIP8L2, FAM63A, C1orf56, SEMA6C, VPS72
rs347685, Chr3 (q23): TFDP2, ATP1B3
rs3828890, Chr6 (p21.33): LST1, LTB, APOM, AIF1, LY6G5C, LY6G6C, DDAH2, BAG6, HLA-B, MICB, DDX39B, PRRC2A, NCR3, C6orf25, MICA, CSNK2B, ATP6V1G2, HLA-C, TNF, LTA, NFKBIL1
rs3925584, Chr11 (p14.1): MPPED2
rs4014195, Chr11 (q13.1): FAM89B, EHBP1L1, CTSW, SIPA1, LTBP3, CCDC85B, FIBP, FOSL1, CFL1, RELA, SART1, MAP3K11, KAT5, EFEMP2, KCNK7, C11orf68, SSSCA1, OVOL1, MUS81, DRAP1
rs4744712, Chr9 (q21.11): PIP5K1B, PRKACG, FXN
rs4821469, Chr22 (q12.3): MYH9, TXN2, APOL1, APOL2, APOL3
rs6040055, Chr20 (p12.2): MKKS, JAG1
rs626277, Chr13 (q21.33): DACH1
rs6420094, Chr5 (q35.3): SLC34A1, FAM193B, NSD1, GRK6, DBN1, PDLIM7, F12, DOK3, MXD3, RGS14, DDX41, LMAN2, PRR7, TMED9, B4GALT7
rs6431731, Chr2 (p24.3): MYCN, DDX1
rs6465825, Chr7 (q11.23): PHTF2, PTPN12, MAGI2
rs653178, Chr12 (q24.12): ACAD10, ALDH2, ATXN2, CUX2, BRAP, SH2B3
rs7208487 and rs11078903, Chr17 (q12): PLXDC1, ERBB2, CACNB1, PGAP3, RPL19, PNMT, MED1, NEUROD2, CDK12, STARD3, TCAP
rs7583877, Chr2 (q11.2): AFF3
rs7805747, Chr7 (q36.1): RHEB, PRKAG2
rs881858, Chr6 (p21.1): VEGFA, MAD2L1BP, POLH, GTPBP2, MRPS18A
rs911119 and rs13038305, Chr20 (p11.21): CST8, CST5, CST2, CST3, CST4, CST1
rs9895661, Chr17 (q23.2): TBX2, NACA2, BCAS3, TBX4

Supplementary Table 9

eQTL analysis of the CKD risk loci in external datasets

SNPs associated with chronic kidney disease (CKD) act as cis expression quantitative trait loci (eQTL) in other tissues. P-values indicate the strength of the association between the SNP (as eQTL) and the nearby gene. The sources of the analyses are numbered (analysis ID: 1-4) and detailed at the bottom of the table. For these genes, the table also shows the results of our study: baseline expression in the kidney, based on our RNA sequencing data, is indicated in the last column; transcripts that are differentially expressed in CKD and/or whose expression levels correlate with estimated glomerular filtration rate (eGFR) are marked in red.

| Gene Symbol | SNP | Chromosome | Distance from the SNP (kb) | Gene in the same LD block | P-value (eQTL) | Analysis ID | Baseline expression in the kidney |
|---|---|---|---|---|---|---|---|
| SYPL2 | rs10857787 | 1 | 0 | Yes | 2.85 × 10^-35 | 1 | High |
| ATXN7L2 | rs10857787 | 1 | 16 | Yes | 1.5 × 10^-3 | 1 | Medium |
| CERS2 | rs267734 | 1 | 4 | Yes | 3.70 × 10^-5 | 2 | High |
| CERS2 | rs267738 | 1 | 0 | Yes | 7.51 × 10^-5 | 2 | |
| SHROOM3 | rs17253722 | 4 | 243 | Yes | 7.32 × 10^-6 | 3 | Medium |
| CLTB | rs3812035 | 5 | 974 | No | 1.5 × 10^-4 | 2 | High |
| CLTB | rs6420094 | 5 | 974 | No | 1.5 × 10^-4 | 2 | |
| CLTB | rs6862195 | 5 | 979 | No | 1.5 × 10^-4 | 2 | |
| RMND5B | rs3812035 | 5 | 748 | No | 4.06 × 10^-5 | 2 | High |
| RMND5B | rs6420094 | 5 | 747 | No | 4.06 × 10^-5 | 2 | |
| RMND5B | rs6862195 | 5 | 743 | No | 4.06 × 10^-5 | 2 | |
| SLC25A37 | rs17786744 | 8 | 347 | No | 4.68 × 10^-5 | 3 | Medium |
| AP5B1 | rs11227299 | 11 | 0 | Yes | 2.39 × 10^-6 | 4 | Medium |
| AP5B1 | rs4014195 | 11 | 35 | Yes | 2.96 × 10^-6 | 4 | |
| AP5B1 | rs9666878 | 11 | 66 | Yes | 4.03 × 10^-6 | 4 | |
| THUMPD1 | rs13333226 | 16 | 379 | No | 8.99 × 10^-5 | 2 | High |
| THUMPD1 | rs13335818 | 16 | 385 | No | 8.99 × 10^-5 | 2 | |
| THUMPD1 | rs4293393 | 16 | 380 | No | 8.99 × 10^-5 | 2 | |
| CDK12 | rs11078895 | 17 | 217 | Yes | 1.97 × 10^-11 | 1 | Medium |
| PGAP3 | rs8076494 | 17 | 311 | Yes | 7.87 × 10^-7 | 1 | High |

Type of the eQTL analysis

| ID | Title | Tissue | Expression profiling | Samples (n) |
|---|---|---|---|---|
| 1 | Mapping the genetic architecture of gene expression in human liver | Liver | Array | 427 |
| 2 | Transcriptome genetics using second generation sequencing in a Caucasian population | Lymphoblastoid | RNAseq | 60 |
| 3 | Mapping cis- and trans-regulatory effects across multiple tissues in twins | Adipose | Array | 856 |
| 4 | Population genomics of human gene expression | Lymphoblastoid | Array | 210 |

Supplementary Figure 4

Transcript level of SLC34A1 is different by rs6420094 genotype

The expression of SLC34A1 (solute carrier family 34, member 1) is significantly higher (P=0.0305) in samples homozygous for the A allele (A/A, n=9) at the rs6420094 locus compared with samples carrying the minor allele (A/G, n=3 or G/G, n=6) at this locus. Only control samples (eGFR > 85 ml/min/1.73m2) were used for the analysis (A). Microarray-based transcript levels of SLC34A1 correlate with renal function in tubule samples (R2=0.372, P=5.3 x 10^-11) (B). qRT-PCR-based SLC34A1 transcript levels (R2=0.293, P=1.3 x 10^-4) confirm its correlation with kidney function (C).

[Figure panels not reproduced. Panel A: relative SLC34A1 expression for genotypes at rs6420094 (normalized to A/A genotype), A/A versus A/G or G/G; the difference is marked with an asterisk. Panel B: SLC34A1 (microarray), normalized gene expression value versus eGFR (ml/min/1.73m2); R2 = 0.372, R = 0.61, P = 5.3 x 10^-11. Panel C: SLC34A1 (validation with qRT-PCR), normalized gene expression value versus eGFR (ml/min/1.73m2); R2 = 0.293, R = 0.541, P = 1.3 x 10^-4.]

Supplementary Table 10

Disrupted transcription factor binding site motifs at the 44 CKD risk associated loci

The table shows the disrupted transcription factor binding site (TFBS) motifs at the 44 chronic

kidney disease (CKD) associated SNPs (single nucleotide polymorphisms). The table also lists

the CRATs (CKD risk associated transcripts) correlating with eGFR (estimated glomerular

filtration rate) at each CKD risk-associated SNP. Color-coding shows the baseline expression of

the transcripts based on human kidney RNA sequencing, red: high expression, yellow: medium

expression, green: low expression, blue: no expression. Transcripts in significant (P < 0.05)

correlation with eGFR in tubule (T) and glomerular (G) samples are listed; an asterisk (*) marks

correlations that remained significant with a corrected P value below 0.05.

| CKD risk associated locus | Location (chromosome) | Position | CRATs correlating with eGFR in tubule (T) and glomerular (G) samples (* P corrected < 0.05) | Number of total correlating CRATs (P<0.05) | Disrupted transcription factor binding site (TFBS) motifs | Number of disrupted TFBS motifs |
|---|---|---|---|---|---|---|
| rs267734 | 1 | 150951477 | CTSS (T* and G*); CTSK (T* and G); CERS2 (T*); ANXA9 (T* and G); PRUNE (T*); MLLT11 (T); CDC42SE1 (T); PIP5K1A (T*) | 8 | PTF1-beta, STAT | 2 |
| rs1933182 | 1 | 109999588 | PSRC1 (G); SORT1 (T and G); GNAI3 (T*); GSTM4 (T*) | 4 | Ik-1, Lhx3, Pax-4, Pou2f2, ZBTB7A, Zfp740 | 6 |
| rs12124078 | 1 | 15869899 | EFHD2 (G); CELA2A (T); CELA2B (T); CASP9 (T* and G); DNAJC16 (T*); AGMAT (T* and G); DDI2 (T*); RSC1A1 (T*) | 8 | Ets | 1 |
| rs16864170 | 2 | 5907880 | SOX11 (T*) | 1 | BATF, HNF4, Irf, RFX5 | 4 |
| rs6431731 | 2 | 15863002 | DDX1 (T); MYCN (T* and G) | 2 | FAC1, RFX5 | 2 |
| rs10206899 | 2 | 73900900 | NAT8 (T*); NAT8B (T*) | 2 | SP1 | 1 |
| rs7583877 | 2 | 100460654 | AFF3 (T*) | 1 | Gfi1 | 1 |
| rs347685 | 3 | 141807137 | TFDP2 (T*) | 1 | AP-1, Irf | 2 |
| rs17319721 | 4 | 77368847 | FAM47E (T* and G*); STBD1 (T* and G*) | 2 | LBP-1, Nanog, Smad3, Sox, YY1 | 5 |
| rs13146355 | 4 | 77412140 | FAM47E (T* and G*); STBD1 (T* and G*) | 2 | Foxa, PLAG1 | 2 |
| rs6420094 | 5 | 176817636 | NSD1 (T*); SLC34A1 (T* and G); F12 (G); GRK6 (T*); DBN1 (T); PDLIM7 (T); FAM193B (T*) | 7 | GATA | 1 |
| rs11959928 | 5 | 39397132 | FYB (T* and G*); DAB2 (T*) | 2 | Irf, Maf, WT1 | 3 |
| rs881858 | 6 | 43806609 | VEGFA (T* and G) | 1 | Pitx2, p300, GATA, CEBPD, CEBPB, CEBPA | 6 |
| rs2279463 | 6 | 160668389 | IGF2R (T*); SLC22A1 (T*); SLC22A2 (T*); SLC22A3 (G) | 4 | Cdx2, [?], TATA, Zfp281 | 4 |
| rs3828890 | 6 | 31440669 | HLA-C (G); MICB (T* and G); LST1 (T* and G); NCR3 (T*); AIF1 (T*); BAG6 (T*); APOM (T*); LTB (T* and G*) | 8 | T3R | 1 |
| rs6465825 | 7 | 77416439 | PTPN12 (T); PHTF2 (T*); MAGI2 (G) | 3 | COMP1, Ik-2, Mef2 | 3 |
| rs7805747 | 7 | 151407801 | | 0 | Tgif1, Pax-6, HMGN3, GATA | 4 |
| rs10277115 | 7 | 1285195 | GPER (T*); MICALL2 (G*) | 2 | | 0 |
| rs1617640 | 7 | 100317298 | LRCH4 (T* and G); PCOLCE (G); SLC12A9 (T); TRIP6 (T*) | 4 | AP-1, BDP1 | 2 |
| rs10109414 | 8 | 23751151 | STC1 (T*) | 1 | Irx, Pou2f2 | 2 |
| rs1731274 | 8 | 23766319 | STC1 (T*) | | AP-1, INSM1, STAT | 3 |
| rs4744712 | 9 | 71434707 | | 0 | ERalpha-a, Egr-1, Esr2, Prrx2, RXRA, VDR | 6 |
| rs10794720 | 10 | 1156165 | LARPB4 (T*); IDI1 (T) | 2 | HEN1, PTF1-beta | 2 |
| rs4014195 | 11 | 65506822 | LTBP3 (T*); FAM89B (T*); EHBP1L1 (T* and G*); SIPA1 (T*); RELA (T); CFL1 (T); CCDC85B (T*); FOSL1 (T*); CTSW (T*); FIBP (T*); DRAP1 (G) | 11 | Mef2, Pou1f1, TATA | 3 |
| rs3925584 | 11 | 30760335 | MPPED2 (T*) | 1 | PU.1 | 1 |
| rs653178 | 12 | 112007756 | CUX2 (T); ATXN2 (T*); ACAD10 (T*); ALDH2 (T*) | 4 | Esr2 | 1 |
| rs10774021 | 12 | 349298 | SLC6A12 (T*); SLC6A13 (T* and G); KDM5A (G) | 3 | GR, Irf, Zec, Zic | 4 |
| rs626277 | 13 | 72347696 | DACH1 (T) | 1 | NF-I | 1 |
| rs491567 | 15 | 53946593 | | 0 | Barx1, Barx2, Dbx1, Dbx2, En-1, Esx1, Gbx1, Gbx2, Hlx1, Hlxb9, HNF1, Hoxa10, Hoxa3, Hoxa5, Hoxb13, Hoxb3, Hoxb4, Hoxb7, Hoxb8, Hoxb9, Hoxc6, Hoxd8, Isx, Lhx3, Msx-1, Ncx, Nkx6-1, Nkx6-2, Nobox, Pax-6, Pax7, Pbx-1, Phox2a, Pou1f1, Pou2f2, Pou3f2, Pou3f4, Pou4f3, Prrx1, Prrx2, Sox, Vax2 | 42 |
| rs1394125 | 15 | 76158983 | | 0 | GATA, TATA, p300 | 3 |
| rs2453533 | 15 | 45641225 | DUOX1 (T*); SLC28A2 (T); GATM (T* and G); SLC30A4 (T*) | 4 | [?], TATA | 2 |
| rs12437854 | 15 | 94141833 | | 0 | GR, Hoxa9, Pdx1, Pitx2 | 4 |
| rs12917707 | 16 | 20367690 | GP2 (T and G); UMOD (T* and G); ACSM5 (T* and G); ACSM2A (T*); ACSM2B (T*) | 5 | E2A, ERalpha-a, ZEB1 | 3 |
| rs4293393 | 16 | 20364588 | GP2 (T and G); UMOD (T* and G); ACSM5 (T* and G); ACSM2A (T*); ACSM2B (T*) | 5 | | 0 |
| rs11864909 | 16 | 20400839 | GP2 (T and G); UMOD (T* and G); ACSM5 (T* and G); ACSM2A (T*); ACSM2B (T*); ACSM1 (T) | 6 | Ehf, Ets, Nrf-2 | 3 |
| rs9895661 | 17 | 59456589 | TBX2 (T*) | 1 | Eomes, hR, NRSF, Pax-8, TBX5 | 5 |
| rs2453580 | 17 | 19438321 | EPN2 (T); MFAP4 (T* and G*); SLC47A1 (T*); ALDH3A2 (T* and G) | 4 | EBF, ERalpha-a, Ets, HEN1, Pou2f2 | 5 |
| rs11078903 | 17 | 37631924 | ERBB2 (T*) | 1 | Nkx2 | 1 |
| rs7208487 | 17 | 37543449 | PLXDC1 (T* and G) | 1 | DEC, RXRA | 2 |
| rs12460876 | 19 | 33356891 | SLC7A9 (T*) | 1 | AP-1, Pax-4, Rad21 | 3 |
| rs911119 | 20 | 23612737 | CST8 (T) | 1 | SZF1-1, TCF12, VDR | 3 |
| rs13038305 | 20 | 23610262 | CST8 (T) | 1 | ERRA | 1 |
| rs6040055 | 20 | 10633313 | MKKS (T) | 1 | [?], Ets, Nrf-2, Sox | 4 |
| rs4821469 | 22 | 36616445 | MYH9 (T) | 1 | [?], Eomes, TBX5 | 3 |
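
The asterisk convention in the caption of Supplementary Table 10 (listed at nominal P < 0.05, starred when the corrected P value is also below 0.05) can be illustrated with a short sketch. The caption does not state which correction was applied, so the Benjamini-Hochberg procedure used here is an assumption chosen for illustration only. The function names, the labels, and the P values in `example` are likewise hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order.

    Assumption: BH is used only as an example of a multiple-testing
    correction; the source does not name the method behind "corrected P".
    """
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def label_transcripts(raw_p, alpha=0.05):
    """Label each transcript following the table's asterisk convention."""
    names = list(raw_p)
    adjusted = benjamini_hochberg([raw_p[name] for name in names])
    labels = {}
    for name, p_adj in zip(names, adjusted):
        if raw_p[name] >= alpha:
            labels[name] = "not listed"   # nominal P not below alpha
        elif p_adj < alpha:
            labels[name] = "listed*"      # also significant after correction
        else:
            labels[name] = "listed"       # nominally significant only
    return labels


# Hypothetical transcript names and P values, for illustration only.
example = {"GENE_A": 0.001, "GENE_B": 0.03, "GENE_C": 0.04, "GENE_D": 0.5}
print(label_transcripts(example))
```

In this sketch a transcript is starred only when both the nominal and the adjusted P value fall below the threshold, which mirrors the distinction between entries such as "(T)" and "(T*)" in the table.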