BASIC RESEARCH www.jasn.org

Whole-Genome Sequencing of Finnish Type 1 Diabetic Siblings Discordant for Kidney Disease Reveals DNA Variants associated with Diabetic Nephropathy

Jing Guo ,1,2 Owen J. L. Rackham ,2 Niina Sandholm ,3,4,5 Bing He ,1 Anne-May Österholm,1,2 Erkka Valo ,3,4,5 Valma Harjutsalo ,3,4,5,6 Carol Forsblom,3,4,5 Iiro Toppila,3,4,5 Maija Parkkonen,3,4,5 Qibin Li,7 Wenjuan Zhu,7 Nathan Harmston ,2,8 Sonia Chothani,2 Miina K. Öhman ,2 Eudora Eng,2 Yang Sun,2 Enrico Petretto ,2,9 Per-Henrik Groop,3,4,5,10 and Karl Tryggvason1,2,11

Due to the number of contributing authors, the affiliations are listed at the end of this article.

ABSTRACT Background Several genetic susceptibility loci associated with diabetic nephropathy have been documen- ted, but no causative variants implying novel pathogenetic mechanisms have been elucidated. Methods We carried out whole-genome sequencing of a discovery cohort of Finnish siblings with type 1 diabetes who were discordant for the presence (case) or absence (control) of diabetic nephropathy. Con- trols had diabetes without complications for 15–37 years. We analyzed and annotated variants at genome, , and single-nucleotide variant levels. We then replicated the associated variants, , and regions in a replication cohort from the Finnish Diabetic Nephropathy study that included 3531 unrelated Finns with type 1 diabetes. Results We observed -altering variants and an enrichment of variants in regions associated with the presence or absence of diabetic nephropathy. The replication cohort confirmed variants in both regulatory and protein-coding regions. We also observed that diabetic nephropathy–associated variants, when clus- tered at the gene level, are enriched in a core protein-interaction network representing essential for podocyte function. These genes include protein kinases (protein kinase C isoforms « and i) and protein tyrosine kinase 2. Conclusions Our comprehensive analysis of a diabetic nephropathy cohort of siblings with type 1 di- abetes who were discordant for kidney disease points to variants and genes that are potentially caus- ative or protective for diabetic nephropathy. This includes variants in two isoforms of the protein kinase C family not previously linked to diabetic nephropathy, adding support to previous hypotheses that the protein kinase C family members play a role in diabetic nephropathy and might be attractive therapeutic targets.

JASN 31: ccc–ccc, 2020. doi: https://doi.org/10.1681/ASN.2019030289

With the increase in the incidence of diabetes worldwide, complications such as diabetic nephrop- Received March 22, 2019. Accepted October 19, 2019. athy (DN), retinopathy, neuropathy, skin ulcers, and Published online ahead of print. Publication date available at amputations have become a major global health and www.jasn.org. socioeconomic threat. In addition to intensive blood Correspondence: Prof. Karl Tryggvason or Prof. Enrico Petretto, 1 glucose control, the only drugs providing a signif- Cardiovascular and Metabolic Disorders Program, Duke-NUS Medical icant delay in progression of DN are angiotensin- School, 8 College Road Singapore 169857. Email: karl.tryggvason@ converting enzyme inhibitors or angiotensin receptor duke-nus.edu.sg or [email protected] blockers, which reduce intraglomerular pressure and Copyright © 2020 by the American Society of Nephrology

JASN 31: ccc–ccc,2020 ISSN : 1046-6673/3102-ccc 1 BASIC RESEARCH www.jasn.org efferent arteriolar vasoconstriction.2 The molecular pathogene- Significance Statement sis of DN is still poorly understood. Hyperglycemia, a major risk factor for complications, causes accumulation of toxic glucose Although diabetic nephropathy is partly genetic in nature, the un- derivatives, such as methylglyoxal, that bind covalently to the side derlying pathogenetic mechanisms are obscure. The authors as- chains of amino acids, particularly arginine and lysine, and also sembled from the homogeneous Finnish population a cohort of 76 sibling pairs with type 1 diabetes who were discordant for diabetic 3,4 fi methionine and cysteine. Hyperglycemia alone is not suf cient nephropathy. Using whole-genome sequencing and multiple to trigger the development of complications, as only 30%–40% of analytic approaches, they identified DNA variants associated with individuals with type 1 diabetes (T1D) develop diabetic micro- nephropathy or its absence and validated their findings in a 3531- angiopathy.1,5,6 Independent familial studies have shown a trend member cohort of unrelated Finns with type 1 diabetes. The genes of family aggregation of DN in different populations,7,8 most strongly associated with diabetic nephropathy encode two protein kinase C isoforms (isoforms « and i) not previously impli- suggesting a genetic predisposition to DN. At least four metabolic cated in the condition. Besides providing a resource for studies on pathways have been implicated in the development of complica- diabetic complications, these findings support previous hypotheses tions: polyol flux, increased the formation of advanced glycation that the protein kinase C family plays a role in diabetic nephropathy end products, hyperactivity of the hexosamine pathway, and and suggest potential targets for treatment. activation of protein kinase C (PKC) isoforms.4,9,10 Genome-wide association studies (GWAS) and candidate Study Participants fi gene approaches have identi ed several potential genomic loci The discovery cohort consisted of sibling pairs and small fami- 11 for DN susceptibility, but no variants with a major effect on lies, whereas the replication cohort consisted of unrelated indi- the risk of complications have been found, suggesting that DN viduals, all having T1D. The renal status was on the basis of the is modulated by a number of variants in genes that cooperate albumin excretion rate (AER) in a 24-hour urine collection or the within complex pathways. It is intriguing, however, that sev- albumin-to-creatinine ratio (ACR) in a random spot-urine col- eral independent, genome-wide linkage analysis studies car- lection. The presence of ESKD was defined according to whether ried out in white Americans, Pima Indians, black Americans, patients were undergoing dialysis or had received a kidney trans- fi and Finns have identi ed the same DN susceptibility plant. DN was defined by (1) persistent macroalbuminuria 12–15 on 3q. The complex interaction between (AER$300 mg/24 h or ACR.30 mg/mmol) in two of three con- genetics, risk factors such as hyperglycemia and environmen- secutive measurements or the presence of ESKD; and (2)the fi fi tal components makes it more challenging to nd speci c absence of clinical or laboratory evidence of nondiabetic renal genes for DN using genetic association studies. To that end, or urinary tract disease. Control status was defined by normoal- it could be advantageous to search for DN susceptibility genes buminuria (AER,30 mg/24 h or ACR,3 mg/mmol) despite in populations such as Finns, a uniquely homogeneous European duration of diabetes for at least 15 years (range, 15–37 years). In 16 ’ 17,18 population with the world s highest incidence of T1D. the discovery cohort, all study participants had been diagnosed With a combination of founder effects and genetic isolation, with T1D for at least 15 years, with the age at onset ,30 years; the population has accumulated rare genetic traits referred to in the replication cohort, age at diabetes onset was #40 years, “ ”19 as the Finnish Disease Heritage. In addition, Finland has a with insulin dependence within 1 year after the diagnosis of di- good public health care system, including nationwide disease abetes (or age at diabetes onset #15 years). Controls in the fi and treatment registries, which facilitates identi cation of replication cohort had minimum diabetes duration of 15 years. patients and follow-up of their clinical records. The replication cohort included 2187 controls with normal AER and 1344 cases with macroalbuminuria and ESKD.

METHODS Ethical Permits All patients with diabetes gave written, informed consent Experimental Design to participate in the study and the Ethical Committee of the To search for DN susceptibility genes, we have assembled a Finnish National Public Institute, the Ethical Committee of cohort of Finnish T1D siblings with extreme phenotypes re- the Helsinki and Uusimaa Health District, and Karolinska In- garding the presence (case) or absence (control) of DN. This stitutet approved the protocol for the study. The transgene discovery cohort contained 76 T1D sibling pairs discordant manipulation in zebrafish was approved by the local ethical for DN, and three T1D families with three siblings (total of committee (the North Stockholm District Court). 80 cases and 81 controls). The samples came from two sources: the Finnish National Institutes of Health and Welfare diabetes Whole-Genome Sequencing collections, as described elsewhere15; and the Finnish Diabetic Whole-genome sequencing (WGS) was carried out on the dis- Nephropathy (FinnDiane) study.20 Furthermore, 3531 unre- covery cohort using both Illumina HiSeq 2000 and Complete lated individuals with T1D (1344 cases and 2187 controls) Genomics platforms. To evaluate the quality of the two different (Figure 1A) from FinnDiane were used for replication of findings sequencing methods, we sequenced four discordant sibling made in the discovery cohort. The main clinical characteristics of pairs with both platforms and compared the difference of the patients in the discovery cohort are summarized in Table 1. called variants across different platforms. The methods used

2 JASN JASN 31: ccc–ccc,2020 www.jasn.org BASIC RESEARCH

AB Discovery Cohort Whole Genome Sequence Identification of SNVs by WGS of Finnish type 1 diabetic sib-pairs (n=76) discordant for nephropathy

T1D + Nephropathy T1D only SNV level Gene level Genome level Protein-altering Sequence Kernel Recurrently mutated Mutations in Not Available DSP and IncRNAs Association Test regions regulatory regions n = 76

Association with DN susceptibility or protection

MS MS n = 2 n = 1 Prioritization and annotation of mutations, genes and pathways

Replication Cohort Replication in FinnDiane (3,531 unrelated T1D patients) SNParray + Imputation

in vivo functional studies (zebrafish)

n = 1344 n = 2187

T1D + Nephropathy T1D only

Figure 1. Study design illustrates patient cohort and principles of DNA variant analysis. (A) Cohorts used in the search for DN sus- ceptibility genes in Finnish patients with T1D: the genomes of a total of 76 sibling pairs concordant for T1D but discordant for diabetic nephropathy (DSPs) were subjected to WGS. Additionally, T1D siblings from three families with three siblings (multiple siblings [MS]) with or without DN were included in the sequencing analyses. The control siblings (n=81) have had diabetes for at least 15 years (range, 15–37 years) without developing DN, and have never been on angiotensin-converting enzyme inhibitor or angiotensin receptor blocker medication for kidney disease. The case siblings (n=80) have had overt proteinuria, been on dialysis, received a kidney transplant, or have died from kidney complications. (B) Multilevel strategy used to analyze the WGS data from Finnish T1D individuals with or without diabetes complications. SNV, single-nucleotide variants. for sequence alignment, quality control, variant calling, and sample size analysis22,23 to assess the significance of the asso- single-nucleotide variant (SNV) annotation can be found in ciation (P value) in the discovery, replication, and combined Supplemental Appendix 1. cohorts. Odds ratio (OR) and P values for association were calculated using the Firth bias-reduced, penalized-likelihood Bioinformatics Approaches for WGS Analysis logistic regression method, and was implemented in the R pack- To fully utilize WGS data, we performed the association anal- age logistf.24 The association test results were used to select SNVs ysis with DN at three levels (Figure 1B): (1) genome-level for gene-level test, and SNV-level test. The criteria for selection analyses to study hotspots of mutations and SNVs affecting reg- are different in gene- and SNV-level tests (see details below). ulatory regions; (2) gene-level aggregation tests to identify genes with DN-predisposing (or protecting) variants; and (3) SNV-level Genome-Level Analysis tests focusing on the protein-altering variants (PAVs) present only To identify genomic regions with frequent variants associated in cases or only in controls, and therefore potentially causal or with DN in the 76 discordant sibling pairs, we set out to (1)iden- protective for DN. Each level of analysis uses different criteria tify regions that are significantly recurrently mutated (recurrently for statistical significance; a brief summary of the statistical mutated regions [RMRs]) compared with the distribution of models and criteria used in each analysis is reported in Table 2. mutations across the genome, and (2) test each region for signif- A global snapshot of all DN-associated variants and replicated icant over-representation of mutations in DN cases or controls. in the FinnDiane cohort is provided in Figure 2. For (1), the identification of RMR was carried out following the method proposed by Weinhold et al.25 Briefly, all mutations lo- Association Test for SNVs cated within 50 bps of each other were merged using BEDTools26 For each SNV, we tested the association with DN using into hotspot clusters, and this procedure was repeated until no four genetic models: (1) case-dominant, (2) case-recessive, cluster was found within 50 bp of another cluster. The optimal (3) control-dominant, and (4) control-recessive.21 To this aim, cluster size was determined empirically given the observed dis- we used the Firth logistic regression method that accounts for tribution of mutations and their distance in the genome (data not rare variants and provides bias reduction in case of small shown). Clusters with fewer than three mutations were removed.

JASN 31: ccc–ccc,2020 DNA Variants associated with Diabetic Nephropathy 3 BASIC RESEARCH www.jasn.org

Table 1. Clinical characteristics of the Finnish T1D patient of their minor allele frequency (MAF); (2)allSNVswith discovery cohort MAF,0.01, irrespective of their association with DN in discovery Characteristic Cases Controls cohort; and (3) all SNVs with MAF,0.05, irrespective of their N (male %)a 80 (61.3) 81 (46.9) association with DN in discovery cohort. Genes that reached sig- T1D nificance in the F-SKATanalysis (nominal F-SKAT P,0.01) have Duration, yrb Range, 21–38 Range, 15–37 been annotated for functional enrichment test using Enrichr.28 Age at onset, yr 11.668.1 16.6611.3 BP, mm Hg SNV-Level Analysis Systolic 149.2623.1 (n=60) 135.4615.3 (n=59) To select significant SNVs for replication, we focused on the SNVs Diastolic 82.1611.3 (n=60) 79.268.0 (n=59) that are PAV (missense, nonsense, stop-loss, splicing site) or lo- Antihypertensive catedinanexonicregioninnoncodingRNAs.SNVspresentonly medication (%) in cases or only in controls in dominant (three or more individ- At baseline 83.8 25.9 uals) or recessive (one or more individuals) were selected for During follow-up 98.0 74.0 replication in the FinnDiane cohort. For analysis of replication, Hemoglobin A1c (%) 9.062.0 (n=70) 8.461.4 (n=57) Body mass index, kg/m2 26.365.0 (n=63) 26.463.9 (n=57) an association test, as described above, was carried out in the Total cholesterol, mmol/L 5.561.2 (n=69) 5.161.0 (n=77) discovery (T1D discordant sibling pairs, n=152), replication Lipid-lowering medication (%) (FinnDiane, n=3531), and combined cohorts (discovery plus At baseline 22.5 9.9 replication, n=3683). Methods for power calculation for SNVs During follow-up 82.5 69.1 association tests are described in Supplemental Appendix 1. ESKD (%) 46.3 0 6 Data are reported as range or mean SD. Analysis of Replication Cohort aN, number of subjects. bDuration till year 2017. Genome-wide genotyping was performed on the Illumina HumanCoreExome Bead arrays 12–1.0, 12–1.1, and 24–1.0. The arrays include a core set of genome-wide variants plus an For each cluster, a P value was calculated using the negative extensive set of exome variants. Data processing and quality binomial distribution, taking into account the length of the control methods have been described earlier.29 The genotype candidate hotspot region, the number of mutations in the cluster, data were imputed with Micmac3 using the 1000 Genomes and the background mutation rate (average mutation rate per sam- reference panel (phase 3, version 5). SNVs with poor quality ple) for the cluster that was estimated using the genome-wide (R2,0.3) were removed from analysis. Samples overlapping expectation. The candidate hotspot regions were selected for with the discovery cohort were excluded. Candidate SNVs were further analyses on the basis of their P value for significance and extracted from the GWAS imputation data and the number of using a stringent Bonferroni correction for the number of regions genotypes was counted for controls and cases on the basis of tested (Supplemental Figure 1). To identify recurrently mutated the most likely genotypes using SNPTest. regions associated with DN (DN-RMR), for each region we coun- To evaluate the false positive rate of replication at the SNV ted the number of mutations found in DN cases or controls and level, we performed an empirical test in three steps: (1)selecta carried out a Fisher exact test (FET) to assess whether a mutation random set of SNVs from discovery PAVs; (2) test association was over-represented in either cases or controls. The Benjamini– on this random set, and count the number of significant variants Hochberg false discovery rate (FDR) correction to account for (OR.1.5; P,0.05); and (3) repeat the steps 10,000 times to assess the number of regions tested by FET was applied to identify the false positive rate. For gene-level replication, we could not apply DN-RMR at the genome-wide level. For details of the analy- F-SKAT to FinnDiane data because that replication cohort does ses performed on transcription factor binding sites (TFBS), not contain familial data. Instead, we used the sequence kernel promoters, and enhancers, please see Supplemental Appendix 1. association test30 on the same SNV set (if found in FinnDiane) as we used for F-SKAT in the discovery cohort. For replication in Gene-Level Analysis the genome level DN-RMR test, we extracted all SNVs within the We applied the adjusted sequence kernel association test for famil- RMR regions defined by tests of the discovery cohort, and tested ial data of dichotomous traits (F-SKAT27) on the multisibling co- enrichment of variants in cases or controls for each region using hort (n=161). SNVs within a gene region were clustered together two-tailed FET, then corrected by Bonferroni P,0.01. for the analysis. The gene region included variants in upstream 1000 bp, downstream 1000 bp, untranslated regions (UTR) at the 39 side and 59 side, intron, and exon. Only the PAVs in the exonic RESULTS region were included, i.e.,nonsynonymous,stop-gain,stop-loss, and splicing site variants in RefSeq. We performed the gene-level Variants Detected by WGS aggregation test on three different sets of variants: (1)SNVsnom- We evaluated the sequencing quality by sequencing inally associated with a case-control phenotype in the discovery four sibling pairs with both Complete Genomics and Illumina cohort (OR.1.5 and nominal P,0.05, Firth test), irrespective HiSeq 2000 platforms. The concordance rate across the two

4 JASN JASN 31: ccc–ccc,2020 JASN 31: ccc – ccc ,2020

Table 2. Summary of test results from the genomic, gene, and single-variant levels of data analysis on 76 discordant sibling pairs Multi-test Correction Functional Replicated in Test Name Test Model and threshold Results Result Report Annotation FinnDiane (discovery) Genome level RMRs Hotspot clustering, Bonferroni N.A. 850,137 RMR N.A. Only DN-RMR are Figure 3, Supplemental 2 negative binomial P,3.7310 5 replicated Figure 1 distributiona DN-RMR FET FDR,0.05 Genome location, 732 DN-RMR Bonferroni P,0.01 141 Figure 3, Supplemental pathway over- and the DN-RMR replicated Table 4 representation pathways (KEGG), protein- involved protein interaction Promoters, enhancer, FET FDR,0.05 Functional enrichment 270 promoters, Bonferroni P,0.01, 68 Figure 3, Supplemental TFBS test, protein-protein 44 enhancers, promoters, 5 Tables 5–7 interaction 40 TFBS enhancers, 6 TFBS replicated Gene level F-SKAT (76 pairs plus F-SKAT on DN- N.A. nominal Functional enrichment 206 F-SKAT significant 9 genes using strict Figure 4, Supplemental three multisibling associated SNVs P,0.01 test, protein-protein genes replication approach Tables 9–11 families) (OR.1.5; P,0.05) interaction

N ainsascae ihDaei Nephropathy Diabetic with associated Variants DNA F-SKAT on rare SNVs N.A. Supplemental Table 8 (MAF,0.05; MAF,0.01) Single-variant level Single-variant OR in dominant and N.A. Case-only or SNV location, SIFT, 3562 PAVs, 3259 OR.1.5 and P,0.05, Table 3, Supplemental association test recessive model control- only, Polyphen2 variants in ncRNA 47 recessive PAVs Table 12 and 13 PAV or ncRNA exonic replicated, 86 www.jasn.org exonicb,c recessive ncRNA variants replicated N.A., not available; ncRNA, noncoding RNA. aVariant clustering method proposed by Weinhold et al.25 bCase-only or control-only: $3 heterozygous individuals in only case/control in dominant model; $1 homozygous individual in only cases/control in recessive model. ci.e., nonsynonym, stop-gain, and stop-loss. RESEARCH BASIC 5 BASIC RESEARCH www.jasn.org RPAP1 AVEN LTK

UBR7

TPPP2

PML

WDR73

KRT32 KRTAP2−3 SEPT9

PPP4R1 ZNF407 ZNF844HKR1

OR6X1 TEX101 TYR OR51G2 ANO9 ETV6 SIGIRR NEFH

STAMBPL1 PNPLA3 ERG Replicated SNVs C

a s e & Control ANKRD26 Replicated

F-SKAT

DN-RMR

Promoter PNPLA7 Enhancer TFBS DBH

SIGMAR1 TP73 CPTP PTK2 KAZN ATAD3B PKHD1L1 RALYL TNFRSF14 CSMD2 ESCO2 ADAM32 WDR63 CSMD1 URGCP- DARS2 MRPS24

TMEM176A

UNC93A

SFT2D1

C6orf118 COL6A5

C6orf10 ATP10D

TBC1D9

C4orf51

Figure 2. Analysis of whole-genome sequencing data from Finnish T1D sibling pairs reveals DNA variants and regions that are associated with DN. The Circos plot consists of multiple layers, each of which represents a bioinformatic analysis approach and its significant outcomes in the discovery cohort. From the outside to the center, with cytoband as a genome location reference. DN-associated PAVs that are replicated in FinnDiane are highlighted. PAVs that are highly enriched in cases are marked in red and controls are marked in green. In the second layer, genes with a highly enriched cluster of DN-associated variants that has been pri- oritized by F-SKAT are depicted in the orange circle, and those passing the stringent replication are marked by their names. From the third layer, regions of recurrent mutations that are associated with case or control (DN-RMR) are shown in the light green circle, fol- lowed by promoters (6500 bp from a promoter annotated CAGE cluster according to FANTOM5) in light blue, enhancers (6500 bp from an enhancer annotated CAGE cluster according to FANTOM5) in light purple, and TFBS in light red. The details of the statistical models and the call of significance of association for each approach are listed in Table 2. platforms for all eight individuals was 98.8% (Supplemental deletions (Supplemental Table 3). Here, we focused on genetic Table 1, A–C). variants functionally associated with the DN phenotype, i.e., WGS of the discovery cohort revealed 12 million SNVs variants affecting gene regulatory elements and/or coding (Supplemental Table 2) and .6 million short insertions and regions.

6 JASN JASN 31: ccc–ccc,2020 www.jasn.org BASIC RESEARCH

Genome-Level Analysis whether there was a significant over-representation of variants We analyzed the complete genome sequences of the 76 T1D in DN-ascertained cases or in controls (Figure 3C). Overall, discordant sibling pairs to systematically identify genomic re- we found more variants affecting TFBS in controls than in gions that are recurrently mutated and over-represented in cases, and in some instances these variants are present only the DN cases or controls (i.e., individuals with T1D but without in controls and across multiple families. By pooling results for DN). A similar approach has been used in studies of genome- TFs over their corresponding TFBSs, we identified 40 TFs with wide noncoding regulatory mutations in cancer,25 and is on significantly different variant frequencies between cases and the basis of (1) genome-wide “hotspot” mutation analysis controls (Benjamini–Hochberg corrected P,0.05) and six to identify small regions with frequent (recurrent) mutations (out of 20 TFs for which genotype data were available in the and analysis of clusters of recurrent mutations, (2) DNA vari- replication cohort) were replicated in FinnDiane (Bonferroni ants affecting TFBS, and (3) annotated regulatory regions (e.g., P,0.01) (Supplemental Table 5). The 40 TFs were enriched promoters and enhancers). for pathways relevant to the pathophysiology of DN (Figure For the genome-wide hotspot mutation analysis, we 3D). These include the EGF receptor-dependent endothelin identified a total of 850,137 RMRs. Each RMR represents a signaling (implicated in the development and progression of genomic locus enclosing a cluster of variants within 50 bp of renal fibrosis and hypertrophy of the glomerular basement each other, and genome-wide RMRs have a median size of membrane), which has been proposed for targeting by endothelin 436 (4–37,433) bp. Each identified RMR is significantly recur- antagonist therapy in DN.34 We also found the structurally related rently mutated compared with a random distribution of muta- transmembrane receptors belonging to the receptor tyrosine 2 tions across the genome (Bonferroni-corrected P,3.7310 5; kinase superfamily (e.g., ErbB1) that are involved in the devel- Supplemental Figure 1). opment and progression of DN.35 Of note, variants in ERBB4 We first tested whether these RMRs are significantly over- have previously been suggested to be associated with DN,18,36 represented in DN cases or in controls. After correcting for the although the causal variants were not identified. number of total RMRs analyzed, we detected 732 RMRs that The third genome-level analysis approach was to study an- are over-represented in either DN cases or in controls at notated regulatory regions in the genome (gene promoters FDR,5%, thereby identifying a set of DN-RMRs (Figure 3A, and enhancers) that are derived from the FANTOM5 data- Supplemental Table 4). A total of 141 of these DN-RMRs base37 and were further supported by ENCODE38 histone (19.26%) were replicated in the FinnDiane cohort (Bonferroni modification data, and to test whether variants in these P,0.01). We found that 458 (63%) DN-RMRs are intergenic, regions were significantly over-represented in DN cases or whereas 274 (37%) overlap with 194 annotated genes. When controls. We found significant enrichment (FDR,0.05) for compared with the whole set of RMRs identified at the genome- DN-associated variants in 270 promoter regions (61kb wide level in the discovery cohort, the DN-RMRs more frequently around the annotated gene transcription start site), 68 (25.2%) overlap with exons, introns, 39UTRs, 59 UTRs, enhancers, and were replicated in the FinnDiane cohort (Bonferroni P,0.01) gene promoter regions (Figure 3A). This suggests that DN- (Supplemental Table 6A). We also found significant enrichment associated clusters of mutations are more likely to affect exons (FDR,0.05) for DN-associated variants within 61kbof44 and regulatory regions than the RMRs that are not associated predicted enhancers (Supplemental Table 6B). DN-associated with DN. The genes overlapping with DN-RMRs are signifi- variants in five enhancers were replicated in the larger FinnDiane cantly enriched for several canonical KEGG pathways relevant cohort (Bonferroni P,0.01). We further prioritized candidate to the pathobiology of DN (P,0.01), including ECM-receptor genes within these replicated enhancers using data related to interaction, focal adhesion, and T1D (Figure 3B). These pathways topologically associated domains, epigenetic regulation, and have several genes in common, suggesting that the identified transcriptome analysis of DN in human39 (Supplemental Table7). DN-RMRs affect multiple genes interacting across overlapping Not surprisingly, in a few cases distinct genome-level anal- functional pathways (Figure 3B). Interestingly, COL4A1 and yses prioritized the same gene locus. For instance, ALOX5, COL4A2, which encode the most prominent non-GBM colla- encoding arachidonate 5-lipoxygenase (a member of the lip- gens were shown to be associated with DN, as previously re- oxygenase gene family regulating metabolites of AA), was ported.31,32 Both genes were enriched for variants in intronic found to overlap with an intragenic DN-RMR spanning regulatory regions, but their possible role in the pathogenesis 4724 bp and has DN-associated variants in two predicted en- of DN remains obscure, especially as no exonic mutations hancers and in its annotated promoter region, suggesting po- were different between cases and controls in these genes in tential enhancer–promoter interaction40 (Figure 3E). A role the discovery cohort. for lipoxygenase inhibitors in DN has been proposed in the As the second genome-level approach, to investigate the rat41 and 12-lipoxygenase is increased in glucose-stimulated potential regulatory effect of DN-associated variants, we re- cultured mesangial cells and in kidney of rat DN model.42 trieved and annotated experimentally derived TFBS data Furthermore, it has been shown that 5-lipoxygenase contrib- from a large repository of chromatin immunoprecipitation utes to degeneration of retinal capillaries in a mouse model sequencing data representing DNA binding data for 237 tran- of diabetic retinopathy, suggesting a proinflammatory role of scription factors (TFs).33 Within each TFBS region, we tested 5-lipoxygenase in the pathogenesis of DN.43

JASN 31: ccc–ccc,2020 DNA Variants associated with Diabetic Nephropathy 7 BASIC RESEARCH www.jasn.org

A B RYR3 CCKBR PLCG2 all RMR CHRM2 RELN DN-RMR (different between cases/controls) 50% TACR3 Arrhythmogenic right COL4A2 44% 43% ventricular B cell receptor cardiomyopathy signaling pathway 40% 37% 37% GNAL Leukocyte COL4A1 ECM-receptor transendothelial interaction 30% migration 24% VAV1 CTNNA3 20% Calcium signaling 15% Focal adhesion pathway VAV2 CTNNA2 Small cell lung cancer 10% 9% 7% 5% 3% 3% 2% 3% 1% CARD11 RYR2 0% ITGA2 FHIT ITGA1 Type I diabetes mellitus Arrhythmogenic right ventricular cardiomyopathy ECM-receptor interaction Calcium signaling pathway Focal adhesion Leukocyte transendothelial migration B cell receptor signaling pathway Cell adhesion molecules (CAMs) Small cell lung cancer 4.0

(Fantom5) Promoters (Fantom5) HLA-A Enhancers Exons (refseq) 3'UTR (refseq) 5'UTR (refseq) 3.0 Genes (refseq) Introns (refseq) HLA-DPA1 PTPRN2 Type I diabetes 2.0 mellitus (P-value) 10 1.0 Cell adhesion -log CDH4 molecules NFASC

Significance of enrichment 0.0 KEGG pathways NCAM2

Thrombin/protease-activated C D Nectin adhesion pathway receptor (PAR) pathway

Non-redundant ChIP-seq peaks for 237 transcription factors

668 ChIP-seq experiments present the GEO (public) + ENCODE

PDGF receptor signaling EGFR-dependent Endothelin NFYA NFKB1 network signaling events DN-RMR NFYB MYC

NR3C1 MAX

RELA JUND DN cases TF1 controls Genome RUNX2 FOXA1 Urokinase-type plasminogen activator (uPA) and uPAR-mediated mTOR signaling pathway DN cases signaling SMARCA4 ETS1 TF2 controls Genome STAT5A CEBPB

Transcription factors with mutations associated with DN co-localizing to their ChlP-peaks TP53 AR

VDR CTCF TGFBR VEGF and VEGFR signaling ErbB1 downstream signaling network

S1P1 pathway

E chr10 83 kb 860 kb 45,870 kb 45,880 kb 45,890 kb 45,900 kb 45,910 kb 45,920 kb 45,930 kb 45,940 kb ALOX5 locus

[-7.000 - 15]

Distribution of mutations SNV counts in cases cases SNV counts in controls controls

ALOX5

DN-RMR

±1kb promoter

±1kb enhancer

Figure 3. Genome-wide analysis of variants reveals regions and TFBSs associated with DN in the discovery cohort. (A) Annotation of RMRs with respect to overlapping gene regulatory elements; relative frequencies have been calculated with respect to each group: all RMRs (white) and DN-RMR (red). (B) Significantly over-represented KEGG pathways comprise common genes overlapping with DN-RMR. The relationships between genes overlapping with DN-RMR and KEGG pathways is depicted as a network graph, wherein the outer circle comprises genes and inner circle comprises the pathways. (C) Schematic representation of genome-wide analysis of

8 JASN JASN 31: ccc–ccc,2020 www.jasn.org BASIC RESEARCH

Gene-Level Analysis Furthermore, we tested the 206 genes that were found to be To investigate the aggregated gene-level contribution of multi- significant using F-SKAT in the replication cohort. This rep- ple SNVs, we used the F-SKAT framework.27 We tested differ- lication is limited by the less numerous SNVs in FinnDiane ent sets of SNVs that were aggregated at the gene level (see compared with the discovery cohort (2316 out of 3755 SNVs), Methods). We only found a few genes that reached the nominal which also does not include family data. Therefore, we applied significance level of P,0.05 by testing on the rare variants SKAT using only the same SNVs used by F-SKAT in the dis- (Supplemental Table 8), and found no associations with any covery cohort. This is a rather stringent replication approach, relevant functional pathways or networks. Alternatively, we as it tests for both the genes and the specific SNVs that were first identified 28,237 SNVs (within 3745 genes) that were found to be associated with DN in the discovery cohort. Out nominally associated with DN susceptibility or protection of the 206 genes tested, only 120 genes were found with at (OR.1.5; P,0.05). Then we gathered all DN-associated least one F-SKAT SNV, and nine genes passed the nominal SNVs that were within upstream 1000 bp, downstream 1000 criteria P,0.05, including a protein kinase gene PTK2 bp, 39UTR and 59UTR regions, intron, and PAVs, and tested (Supplemental Table 9C). The replicated genes are highlighted their accumulative effect on each gene. We found 206 genes in Figure 2. that reach a significance level of P,0.01 in the F-SKATanalysis (Supplemental Table 9, A and B). Analyses of PAVs To investigate the potential function of the SNVs in the It has been estimated that about 85% of mutations underlying 206 genes detected by F-SKAT, we analyzed these SNVs Mendelian diseases reside in coding sequences or at exon-in- using a recent expression quantitative trait locus44 data set tron borders.47,48 Numerous reports have described rare but from the glomerulus and tubulointerstitium of patients with highly penetrant exon mutations in Mendelian disease,49,50 nephrotic syndrome. We found that these F-SKAT significant and it is likely that such mutations also frequently contribute to genes are more likely to be under cis-acting regulation in the complex disease phenotypes. Our initial exon variant analyses glomeruli of patients with nephrotic syndrome than genes have focused on 53,449 PAVs (nonsynonymous, stop-gain, with nonsignificant F-SKAT (OR=3.84; 95% confidence in- stop-loss, and splice site variants; Supplemental Table 2) that 2 terval, 2.83 to 5.21; P=2.2310 16). This suggests that the were exclusively found in cases or controls in the 76 T1D SNVs contributing to the gene-level association with DN discordant sibling pairs and are associated with DN suscepti- (detected by F-SKAT) may exert their pathologic function bility or DN protection. The PAVs were tested for association by regulating in the kidney. We then used with DN in the FinnDiane cohort using a recessive disease Enrichr28 to test for functional enrichment in the 206 genes model for the homozygous variants detected in one or more identified by F-SKAT, and observed the only significant enrich- cases/controls in the discovery cohort and by a dominant ment for protein–protein interactions in the podocyte network model for the heterozygous SNVs detected in three or more expanded by STRING (XPodNet45) (22 out of 808 genes, en- cases/controls in the discovery cohort. The 47 PAVs identified richment P=0.005; Wikipathways46). The F-SKAT–associated in the recessive model were replicated in FinnDiane (P,0.05; genes within the core XPodNet are shown in Figure 4A and OR.1.5). By using a permutation-based strategy (see Supplemental Table 10. The genes in this subnetwork of Methods), we estimated the probability that these 47 PAVs XPodNet are enriched for several pathways, including focal are replicable by chance alone is only 2.3%. However, the false adhesion and insulin signaling (Figure 4B, Supplemental positive rate in the dominant model is estimated to be high Table 11). The top candidate gene from the F-SKAT test (Supplemental Figure 2). Therefore, only candidate SNVs that is the protein kinase C ɛ gene (PRKCE)(F-SKATP,0.001), were replicated in the recessive model are reported (top SNVs with multiple intronic DN-associated SNVs that overlap with in Table 3, and in full in Supplemental Tables 12 and 13). Some predicted regulatory regions (Figure 4C, Supplemental Table of the top-replicated PAVsare within genes that have previously 9B). Protein kinases PRKCE, PTK2 (F-SKAT P=0.004), and been linked to renal disease, implying a potential role in DN, PRKCI (F-SKAT P=0.009) are part of a “core protein-interaction e.g., mutations in WDR73 have been reported to be responsible network” representing proteins essential for podocyte function. for late-onset steroid-resistant nephrotic syndrome.51 We also These genes are particularly interesting as PKCs have been impli- studied the gene function of ABTB1, where we found the only cated in the pathogenesis of DN.10 However, specific inhibitors for case-only homozygous mutation that is truncating the protein. those three PKCs have not yet been developed to our knowledge. Zebrafish knockout of the gene displayed a phenotype that is

variants occurring in TFBSs, that were derived from 668 chromatin immunoprecipitation sequencing data sets (see Methods). (D) We identified 40 TFs with significantly different variant frequencies between cases and controls, in the TFBS, which were significantly enriched for pathways relevant to the pathophysiology of DN. For the top ten enriched KEGG pathways, the known relationships (edges) between TFs (inner circle) and the KEGG pathways (outer circle) are depicted as a network graph. (E) We found an enrichment of variants in cases in the promoter and enhancer regions (61 kb) of the ALOX5 gene locus. Enhancers and promoter regions were retrieved from FANTOM5 and crosschecked with chromHMM, whereas other gene annotations were obtained from RefSeq (see Methods).

JASN 31: ccc–ccc,2020 DNA Variants associated with Diabetic Nephropathy 9 BASIC RESEARCH www.jasn.org

A Dock4 Nrxn1 Grid2 Tbc1d4 Kif2a

Camk2b Dnm1 Psen1 Jup mICA Ywhab Vim Ctnnb1 Cdh4 Grin1 Matr3 Ctnnd1

Ldb3 Racgap1 Prkce Rac1 Neo1 Actb Trf Nrp1 Krt8 Anxa2 Palld Bcl2 Efs Src Bcar1 Cav1 Shc1 Ptpn11 Nedd9 Insr Grb7 Ptk2 Plcg1 Cbl Pik3r3 Inpp5d Ephb2 Csk Itgb5 Grb2 Sorbs3 Pxn Shc2 Kirrel Pard3 L1cam Fyn Pick1 Frs3 Kirrel2 Sorbs1 Frs2 Pard6a Kirrel3 Lrp1b Arrb1 Prkci Cdc42 Vcl

Rab4b Stxbp5 Cdc42ep4 p value 0.05 0

B C 10 CAV1 BCL2 PTK2 VCL INSR SORBS1CAMK2BTBC1D4 PRKCE PRKCI 8 Focal adhesion 6 4 Adherens junction Odds Ratio 2 Bacterial invasion 0 of epithelial cells NM_005400 45.9 mb 46.2 mb HIF-1 signaling pathway Chr 2 exon1 exon2 exon3 Insulin resistance

Insulin signaling pathway Case Dominant/Recessive SNVs Enhancer Control Dominant/Recessive SNVs Promoter Flanking Region Open Chromatin Region CTCF Binding Site; Promoter Flanking Region cis-eQTL in tubuli

Figure 4. FSKAT gene-level analysis identifies genes associated with DN in core podocyte network. (A) Graphical representation of the core podocyte network that includes the genes associated with DN by F-SKAT analysis in the discovery cohort. Node color indicates the statistical significance (P value) of the F-SKAT test. White color nodes indicates podocyte network genes not detected in this study. (B) The F-SKAT–associated genes within the podocyte network are enriched (adjusted P,0.05) for several pathways; top six pathways and contributing genes are reported. Full functional enrichment results are reported in Supplemental Table 6. (C) Details on the PRKCE gene that showed the highest association with DN (by F-SKAT) and location of the intronic SNVs associated with DN. For each SNV, the association with DN is reported by OR tested in either a recessive or dominant model. Full statistics and regulatory information on the SNVs are reported in Supplemental Table 4B. specific for kidney damage (Supplemental Appendix 1, of mutations leading to individual amino acid(s) substitution Supplemental Figure 3). showed significant over-representation/depletion in either Hyperglycemia causes an increase in intracellular reactive cases or controls. oxygen species that leads to increase in glucose derivatives, such as methylglyoxal, that readily react with amino groups Power Calculation of protein amino acid residues, particularly arginine, lysine, To estimate the statistical power for detecting association in cysteine, and methionine.52 Here, PAVs altering amino acid our sibship discovery cohort, we used a method described by codons to arginine were found to be significantly less repre- Li et al.53 We estimated the power assuming different levels sented in the set of mutations detected in controls only as of penetrance (Supplemental Table 14A). Our sample size of compared with all PAVs (OR=0.66; 95% confidence interval, 76 discordant sibling pairs reaches .80% power to detect 2 0.43 to 0.97; P=0.03; Supplemental Figure 4). No other classes significant associations (P,4.11310 9)forrarevariants

10 JASN JASN 31: ccc–ccc,2020 JASN 31: P, . ccc Table 3. Top protein-altering variants replicated with criteria 0.05 and OR 1.5 in the FinnDiane cohort

– MAF Replication, n=3531 Combined, n=3683 ccc Discovery SIFT | AA

,2020 Gene Symbol Gene Description dbSNP ID 1000 ExAC Case| P P PP2 Change OR (95% CI) OR (95% CI) Genomes (All|Finns) Controla Value Value WDR73 WD repeat domain 73 rs72750868 0.044 0.076 T|B D-.G 2|0 2.52(1.36-4.77) 0.002 2.64 (1.44-4.96) 0.001 TPPP2 Tubulin polymerization promoting protein rs9624 0.160 0.148 D|D R-.L 1|0 3.28 (1.4-8.32) 0.003 3.4 (1.46-8.55) 0.002 family member 2 UBR7 Ubiquitin protein ligase E3 component rs2286653 0.113 0.147 T|B A-.T 1|0 3.55 (1.25-11.41) 0.008 3.74 (1.35-11.92) 0.005 n-recognin 7 ATP10D ATPase phospholipid transporting 10D rs34208443 0.077 0.141 T|B P-.T 1|0 1.65 (1.11-2.46) 0.009 1.65 (1.11-2.44) 0.009 ANO9 Anoctamin 9 rs114405390 0.015 0.027 T|B T-.A 1|0 4.09 (1.18-17.89) 0.01 4.41 (1.3-19.01) 0.007 SIGIRR Single Ig and TIR domain containing rs117739035 0.016 0.029 D|D S-.Y 1|0 3.6 (1.15-13.25) 0.01 3.85 (1.26-13.97) 0.008 SFT2D1 SFT2 domain containing 1 rs11551053 0.111 0.077 T|B I-.V 1|0 3.04 (1.12-9.02) 0.02 3.21 (1.21-9.41) 0.009 HKR1 HKR1, GLI-Kruppel zinc finger family member rs2921563 0.098 0.054 T|D R-.H 1|0 5.72 (1.09-56.43) 0.02 6.4 (1.28-62.01) 0.009 KRT32 Keratin 32 rs2604956 0.046 0.071 T|D D-.E 1|0 2.15 (1.07-4.43) 0.02 2.21 (1.1-4.52) 0.02 C6orf118 open reading frame 118 rs17852379 0.103 0.073 T|D G-.E 1|0 2.55 (1.02-6.69) 0.03 2.67 (1.09-6.95) 0.02 PPP4R1 Protein phosphatase 4 regulatory subunit 1 rs329003 0.041 0.073 .|B I-.V 2|0 3 (1.01-9.9) 0.03 3.47 (1.23-11.17) 0.009 ANKRD26 Ankyrin repeat domain 26 rs12572862 0.067 0.036 T|B V-.L 1|0 8.16 (0.91-385.51) 0.03 9.59 (1.16-440.85) 0.01 PKHD1L1 Polycystic kidney and hepatic disease 1 rs117037399 0.005 0.019 T|P G-.V 1|0 8.16 (0.91-385.51) 0.03 9.59 (1.16-440.74) 0.01 CSMD1 CUB and Sushi multiple domains 1 rs34337712 0.021 0.069 T|B Q-.H 1|0 1.86 (1-3.49) 0.03 1.9 (1.03-3.53) 0.03 C6orf10 Chromosome 6 open reading frame 10 rs7775397 0.019 0.060 T|P K-.Q 1|0 1.55 (1-2.38) 0.04 1.55 (1.01-2.37) 0.04 TMEM176A Transmembrane protein 176A rs10378 0.128 0.139 D|D L-.F 0|1 0.38 (0.17-0.78) 0.004 0.37 (0.16-0.74) 0.002 C4orf51 Chromosome 4 open reading frame 51 rs10008599 0.077 0.098 D|B D-.N 0|1 0.18 (0.02-0.75) 0.007 0.17 (0.02-0.69) 0.004 SIGMAR1 Sigma nonopioid intracellular receptor 1 rs1800866 0.217 0.184 T|B Q-.P 0|2 0.43 (0.2-0.86) 0.01 0.4 (0.19-0.8) 0.005 .

N ainsascae ihDaei Nephropathy Diabetic with associated Variants DNA CPTP Ceramide-1-phosphate transfer protein rs150672559 0.005 0.007 T|B R- H 0|1 0 (0-0.82) 0.01 0 (0-0.71) 0.008 NEFH Neurofilament heavy polypeptide rs5763269 0.151 0.182 D|B P-.L 0|1 0.49 (0.26-0.9) 0.01 0.47 (0.25-0.86) 0.009 TNFRSF14 TNF receptor superfamily member 14 rs2234167 0.114 0.130 T|B V-.I 0|1 0.47 (0.22-0.92) 0.02 0.45 (0.22-0.88) 0.01 TBC1D9 TBC1 domain family member 9 rs13118702 0.010 0.020 T|B E-.K 0|1 0.13 (0-0.91) 0.02 0.12 (0-0.81) 0.01 UNC93A Unc-93 homolog A rs2235197 0.110 0.109 .|. W-.* 0|4 0.51 (0.27-0.94) 0.02 0.46 (0.24-0.84) 0.007 TYR Tyrosinase rs1042602 0.123 0.252 .|D S-.Y 0|5 0.63 (0.41-0.96) 0.03 0.58 (0.38-0.88) 0.007 ATAD3B ATPase family, AAA domain containing 3B rs139902189 0.078 0.076 D|P C-.T 0|1 0.32 (0.08-0.97) 0.03 0.3 (0.08-0.9) 0.02 www.jasn.org TEX101 Testis expressed 101 rs35033974 0.041 0.084 D|D G-.T 0|3 0.51 (0.25-0.98) 0.03 0.47 (0.23-0.88) 0.01 AVEN Apoptosis and caspase activation inhibitor rs61729120 0.007 0.016 D|D G-.T 0|2 0.23 (0.03-1.01) 0.03 0.2 (0.02-0.84) 0.01 ZNF844 Zinc finger protein 844 rs76842919 0.026 0.060 D|B A-.G 0|1 0.23 (0.03-1.01) 0.03 0.21 (0.02-0.91) 0.02 ZNF844 Zinc finger protein 844 rs8102258 0.119 0.095 T|B T-.C 0|1 0.23 (0.03-1.01) 0.03 0.21 (0.02-0.91) 0.02 OR631 Olfactory receptor family 6 subfamily rs12364099 0.077 0.122 D|B C-.A 0|1 0.54 (0.28-0.99) 0.04 0.51 (0.27-0.94) 0.02

X member 1 RESEARCH BASIC Case/control only protein-altering SNVs that remain significant (OR.1.5; P,0.05) after replication in the FinnDiane cohort (1344 cases, 2187 controls). Only the top 15 protein-altering SNVs detected in the recessive model (case or control only) are listed here (full results are reported in Supplemental Table 12). MAF in general population is annotated from the 1000 Genomes project and ExAC. dbSNP, single nucleotide polymorphism database; ExAC, the exome aggregation consortium. The potential effect of a variant in the protein is predicted by SIFT (T: Tolerant; D: Deleterious) and Polyphen2 (B: Benign; P: Possibly damaging; D: Damaging). “.” data not available; *, stop codon. aThe number of homozygous carriers of the SNV in the case group and in the control group. “|” sign is used to separate the two numbers. 11 BASIC RESEARCH www.jasn.org with high penetrance (penetrance, 90%; MAF=0.01). Further- model (SKAT versus F-SKAT) might also introduce a bias in more, we estimate the power for the replication study. Similar the replication test. The constraints caused by limited geno- to a previous report,18 our replication cohort (n=3531) reaches types in the replication cohort also apply to the RMR and TFBS at least 80% power detect common variants with high OR replications. The data-driven detection of RMRs requires (OR=2, MAF=0.05 in the dominant model; OR=5, MAF=0.2 comprehensive SNV data (i.e., WGS data). Using a panel of in the recessive model). predefined genotyped single-nucleotide polymorphisms (i.e., single-nucleotide polymorphism array data), even if the panel is large and supported by imputation, might introduce a DISCUSSION considerable bias in the replication of DN-associated RMRs. The analyses of the discovery cohort led to the identifica- To the best of our knowledge, this is the first study where WGS tion of several novel DN candidate genes in Finns, including has been applied in a search for genomic variants specifically PRKCE, PTK1, PRKCI, ABTB1, and ALOX5, as discussed above. associated with the presence or absence of DN in patients The significant association of three protein kinase genes with with T1D. The challenge with finding susceptibility genes DN is intriguing, as the large PKC protein family has long been for diabetes complications is that one searches for muta- associated with diabetes complications.4,10 Several clinical tions that only cause complications if the individual has trials have been carried out for the treatment of DN with hyperglycemia. We assembled a unique discovery cohort ruboxistaurin, a compound that inhibits PRKC-b.54 This of T1D siblings from the highly homogeneous Finnish popu- suggests that hyperglycemia-driven PKC activation, particu- lation and replicated key findings in a larger cohort of unrelated larly that of the b-isoform, may underlie endothelial dysfunc- T1D Finns. This enabled a direct comparison of whole-genome tion. In this study, we identified two novel isoforms of the PKC sequences in individuals with extreme phenotypes, i.e., T1D family (i.e., epsilon and iota) that have not been previously with progressive DN on one hand, and siblings with no com- linked to DN. The results strongly support and extend previous plications for at least 15 years (range, 15–37 years) on the other. hypotheses that protein kinases, especially the PKC family, play The results provide a unique catalog of DNA variants in Finns. a role in the pathogenesis of DN, and could be attractive novel We have developed a comprehensive panel of multiple targets for the development of PKC inhibitors for DN treatment. bioinformatic approaches to detect genetic predisposition DN is a disorder characterized by hyperglycemia, which can of DN in the discovery sibling cohort. The SNVs approach, lead to nonenzymatic glycation of amino acids and formation which evaluates PAVs that are present only in cases or con- of advanced glycation end products in both intracellular and trols,focusesonthepotentialproteinfunctioninDN.The extracellular proteins.4,9,55 It can be speculated that glycation kernel test (F-SKAT) prioritizes genes with multiple associated of amino acids in functionally important regions of the pro- variants within the gene region, and hypothesizes that the ac- tein can affect functionality of the protein or promote their cumulated burden leads to malfunction of the gene. The geno- degradation.3 Amino acids that are most prone to become non- mic approach includes variants in other genome regions and enzymatically glycated by methylglyoxal and other carbonyls could potentially detect functionally important regions. These are arginine and, to a lesser extent, lysine,56 cysteine, and approaches identified different individual variants, genes, and methionine.4,9 Our study highlighted mutated arginine codons regulatory regions that are potentially involved in DN as being of special interest when considering mutations that susceptibility. can cause pathogenic nonenzymatic glycation of proteins and Although the discovery cohort only consisted of 161 indi- consequent development of DN. viduals with T1D, together with the FinnDiane replication Previously reported genes/regions associated with DN were cohort, we show that they can provide enough power to iden- not strongly replicated in our discovery cohort (Supplemental tify and replicate potential causative and protective mutations Table 15), suggesting that different sets of loci/variants for DN. Here, the use of discordant T1D sibling pairs for DN contribute to the pathogenesis of DN. However, despite has been pivotal to increase power to identify variants associ- the scarce replication of previous loci in our cohort, we report ated with DN susceptibility of protection. the identification of variants/genes in functional pathways rel- We have also studied the replication of candidates and re- evant to the pathobiology of DN, many of which have been port candidates with robust signals for each analysis approach. previously reported (e.g., EGF receptor-dependent endothelin However, although the replication of SNVs is commonly used signaling34 and PodNet45). for GWAS where it applies on the same loci, the replication for Overall, we have performed a comprehensive study on the statistical tests that involve multiple loci (i.e., RMRs, F-SKAT, genetics of a unique T1D Finnish cohort of siblings discordant and TFBS) have limitations that need to be taken into consid- for nephropathy using WGS data. Although the sample size is eration. For replication of F-SKAT in FinnDiane, about one relatively small and the association test for SNV cannot reach 2 third of F-SKAT SNVs cannot be found by array genotyping the genome-wide significance (P,4310 9), efforts were made plus imputation. Thus, the number of replicable genes is lim- to optimize the test model to fit for the specific sibship cohort, ited (120 out of 206), and within each gene, SNVs are also and the top-listed SNVs were replicated (when applicable) in less represented. Additionally, the use of a different statistical larger Finnish cohort. Novel potential DN susceptibility genes

12 JASN JASN 31: ccc–ccc,2020 www.jasn.org BASIC RESEARCH and regulatory variants are promoted in hope to merit further Supplemental Table 3. Frameshift-causing small insertions and investigation in other populations and animal models. deletions (indels) found in DN cases-only or controls-only individ- uals of the Finnish T1D discordant sibling pair discovery cohort. Supplemental Table 4. RMRs significantly over-represented in DN ACKNOWLEDGMENTS cases or controls (FDR,5% in discovery cohort) and replication in FinnDiane cohort. All genetic association data presented here are made freely accessible Supplemental Table 5. TFBS affected by DN mutations. via http://dnc.systems-genetics.net. Supplemental Table 6. Enhancer (S6a) and promoter (S6b) region We are grateful to the physicians, nurses, and researchers in the with mutations over-represented in DN cases or controls (FDR,0.05 FinnDiane study group and at each center participating in the in discovery cohort) and replication statistics in FinnDiane cohort. collection of patients. We thank Leena Ollitervo and Maire Jarva Supplemental Table 7. Enhancers replicated in FinnDiane cohort for technical assistance on DNA extraction. We also acknowledge and gene prioritization. Dr. Jaakko Tuomilehto for the original collection of samples. The Supplemental Table 8. (A) Genes associated with DN by F-SKAT computational analyses were performed on resources provided by analysis (P,0.01). (B) Details on the DN-associated SNVs used in the Swedish National Infrastructure for Computing through Uppsala F-SKAT analysis. Multidisciplinary Center for Advanced Computational Science Supplemental Table 9. (A) F-SKAT test results on rare SNVs with (UPPMAX) under Project b2013027, and the High-Performance MAF,0.01. Only top genes with P,0.1 are reported. (B) F-SKAT test Computing Cluster in the Duke-National University of Singapore on SNVs with MAF,0.05. Only top genes with P,0.1 are reported. Computational Center. (C) Replication of F-SKAT significant (P,0.01) genes in FinnDiane cohort. Supplemental Table 10. PodNet genes detected by F-SKAT(P,0.01). DISCLOSURES Supplemental Table 11. Functional enrichment test (KEGG pathways) on the core genes within the XPodNet network in Figure 4. Dr. Groop reports personal fees from AbbVie, personal fees from Astellas, Supplemental Table 12. Protein-altering SNVs replicated in FinnDiane personal fees from Astra Zeneca, personal fees from Boehringer Ingelheim, , . personal fees from Eli Lilly, personal fees from Elo Water, personal fees from cohort (combined P-value 0.05; OR 1.5), in each genetic model. Janssen, personal fees from Medscape, personal fees from MSD, personal fees Supplemental Table 13. Noncoding RNA SNVs replicated in FinnDiane from Mundipharma, personal fees from Novo Nordisk, and personal fees from cohort (combined P-value,0.05; OR.1.5), in each genetic model. fi Sano , outside the submitted work. Supplemental Table 14. (A) Power estimation of discovery cohort (76 discordant sibling pairs) on the whole genome level of significance , 3 29 FUNDING (12 million, P 4.11 10 ) of case-only and control-only variants. Power estimation assuming different levels of penetrance. (B) Power The work was supported by grants to Prof. Tryggvason from the Novo Nordisk Foundation, Knut and Alice Wallenberg Foundation, Söderberg estimation of replication cohort (2187 controls and 1344 cases) with 28 Foundation, Hedlund Foundation, the Swedish Medical Research Council, genome-wide significance level (P,5310 ) with one-stage study design. the Swedish Foundation for Strategic Research, Sigrid Juselius Foundation, Supplemental Table 15. Test previously reported SNVs in dis- Folkhälsan Research Foundation, the Wilhelmand Else Stockmann Founda- covery cohort. SNVs were downloaded from GWAS catalog. tion, the Liv och Hälsa Foundation; and grants to Prof. Groop from Helsinki Supplemental Figure 1. Manhattan plot of the RMRs identified University Central Hospital Research Funds (EVO), Juvenile Diabetes Re- search Foundation (17-2013-7 [Diabetic Nephropathy Collaborative Research genome-wide in the 76 T1D discordant sibling pairs. Initiative]), European Foundation for the Study of Diabetes Young Investiga- Supplemental Figure 2. Estimation of replication false positive rate tor Research Award funds, and the Academy of Finland grants (38387, 46558, on PAVs in FinnDiane cohort. 275614, 299200, and 316664). This work has also been supported by Singapore Supplemental Figure 3. Expression and functional analysis of Abtb1. National Medical Research Council grants (NMRC/OFLCG/001/2017 and Supplemental Figure 4. Forest plots showing that PAVs altering NMRC/STaR/0010/2012). amino acid codons for arginine (Arg) are less represented in the set of mutations detected in controls as compared with all protein-altering SUPPLEMENTAL MATERIAL mutations (indicated by solid diamond). Supplemental Figure 5. Chromosome 3q21 locus for DN sus- This article contains the following supplemental material ceptibility that was previously identified. online at http://jasn.asnjournals.org/lookup/suppl/doi:10.1681/ ASN.2019030289/-/DCSupplemental. Supplemental Appendix 1. Supplemental methods, results, web REFERENCES resources, and references. Supplemental Table 1. Comparison of DNA sequencing quality 1. Intensive blood-glucose control with sulphonylureas or insulin com- using the Illumina and Complete Genomics platforms. pared with conventional treatment and risk of complications in patients with type 2 diabetes (UKPDS 33). UK Prospective Diabetes Study Supplemental Table 2. Annotation of SNVs and small insertions (UKPDS) Group. Lancet 352: 837–853, 1998 and deletions (indels) identified in 161 genomes in the discovery 2. Braam B, Koomans HA: Renal responses to antagonism of the renin- cohort by RefSeq. angiotensin system. Curr Opin Nephrol Hypertens 5: 89–96, 1996

JASN 31: ccc–ccc,2020 DNA Variants associated with Diabetic Nephropathy 13 BASIC RESEARCH www.jasn.org

3. Westwood ME, Argirov OK, Abordo EA, Thornalley PJ: Methylglyoxal- 25. Weinhold N, Jacobsen A, Schultz N, Sander C, Lee W: Genome-wide modified arginine residues--a signal for receptor-mediated endocytosis and analysis of noncoding regulatory mutations in cancer. Nat Genet 46: degradation of proteins by monocytic THP-1 cells. Biochim Biophys Acta 1160–1165, 2014 1356: 84–94, 1997 26. Quinlan AR, Hall IM: BEDTools: A flexible suite of utilities for comparing 4. Brownlee M: Biochemistry and molecular cell biology of diabetic genomic features. Bioinformatics 26: 841–842, 2010 complications. Nature 414: 813–820, 2001 27.YanQ,TiwariHK,YiN,GaoG,ZhangK,LinWY,etal.:Asequence 5. Thomas MC, Groop PH, Tryggvason K: Towards understanding the kernel association test for dichotomous traits in family samples under a inherited susceptibility for nephropathy in diabetes. Curr Opin Nephrol generalized linear mixed model. Hum Hered 79: 60–68, 2015 Hypertens 21: 195–202, 2012 28. Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, 6. Thomas MC, Brownlee M, Susztak K, Sharma K, Jandeleit-Dahm KA, et al.: Enrichr: A comprehensive gene set enrichment analysis web Zoungas S, et al.: Diabetic kidney disease. Nat Rev Dis Primers 1: server 2016 update. Nucleic Acids Res 44: W90-7, 2016 15018, 2015 29. Syreeni A, Sandholm N, Cao J, Toppila I, Maahs DM, Rewers MJ, et al.: 7. Borch-Johnsen K, Nørgaard K, Hommel E, Mathiesen ER, Jensen JS, DCCT/EDIC Research Group; Groop PH; FinnDiane Study Group: Ge- Deckert T, et al.: Is diabetic nephropathy an inherited complication? netic determinants of glycated hemoglobin in type 1 diabetes. Diabetes Kidney Int 41: 719–722, 1992 68: 858–867, 2019 8. Seaquist ER, Goetz FC, Rich S, Barbosa J: Familial clustering of diabetic 30. Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X: Rare-variant association kidney disease. Evidence for genetic susceptibility to diabetic ne- testing for sequencing data with the sequence kernel association test. phropathy. NEnglJMed320: 1161–1165, 1989 Am J Hum Genet 89: 82–93, 2011 9. Giacco F, Brownlee M: Oxidative stress and diabetic complications. 31. Pihlajaniemi T, Myllylä R, Kivirikko KI, Tryggvason K: Effects of strep- Circ Res 107: 1058–1070, 2010 tozotocin diabetes, glucose, and insulin on the metabolism of type IV 10. Geraldes P, King GL: Activation of protein kinase C isoforms and its collagen and proteoglycan in murine basement membrane-forming impact on diabetic complications. Circ Res 106: 1319–1331, 2010 EHS tumor tissue. JBiolChem257: 14914–14920, 1982 11. Dahlström E, Sandholm N: Progress in defining the genetic basis of 32. Mason RM, Wahab NA: Extracellular matrix metabolism in diabetic diabetic complications. Curr Diab Rep 17: 80, 2017 nephropathy. JAmSocNephrol14: 1358–1373, 2003 12. Moczulski DK, Rogus JJ, Antonellis A, Warram JH, Krolewski AS: Major 33. Griffon A, Barbier Q, Dalino J, van Helden J, Spicuglia S, Ballester B: susceptibility locus for nephropathy in type 1 diabetes on chromosome Integrative analysis of public ChIP-seq experiments reveals a complex 3q: Results of novel discordant sib-pair analysis. Diabetes 47: 1164–1169, multi-cell regulatory landscape. Nucleic Acids Res 43: e27, 2015 1998 34. Barton M: Therapeutic potential of endothelin receptor antagonists for 13. Imperatore G, Hanson RL, Pettitt DJ, Kobes S, Bennett PH, Knowler chronic proteinuric renal disease in humans. Biochim Biophys Acta WC: Sib-pair linkage analysis for susceptibility genes for microvascular 1802: 1203–1213, 2010 complications among Pima Indians with type 2 diabetes. Pima diabetes 35. Zhang MZ, Wang Y, Paueksakon P, Harris RC: Epidermal growth factor genes group. Diabetes 47: 821–830, 1998 receptor inhibition slows progression of diabetic nephropathy in as- 14. Bowden DW, Colicigno CJ, Langefeld CD, Sale MM, Williams A, sociation with a decrease in endoplasmic reticulum stress and an in- Anderson PJ, et al.: A genome scan for diabetic nephropathy in African crease in autophagy. Diabetes 63: 2063–2072, 2014 Americans. Kidney Int 66: 1517–1526, 2004 36. Sandholm N, Salem RM, McKnight AJ, Brennan EP, Forsblom C, 15. Österholm AM, He B, Pitkäniemi J, Albinsson L, Berg T, Sarti C, et al.: Isakova T, et al.: DCCT/EDIC Research Group: New susceptibility loci Genome-wide scan for type 1 diabetic nephropathy in the Finnish associated with kidney disease in type 1 diabetes. PLoS Genet 8: population reveals suggestive linkage to a single locus on chromosome e1002921, 2012 3q. Kidney Int 71: 140–145, 2007 37. Andersson R, Gebhard C, Miguel-Escalada I, Hoof I, Bornholdt J, 16. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, Boyd M, et al.: An atlas of active enhancers across human cell types et al.: Exome Aggregation Consortium: Analysis of protein-coding and tissues. Nature 507: 455–461, 2014 genetic variation in 60,706 humans. Nature 536: 285–291, 2016 38. ENCODE Project Consortium: An integrated encyclopedia of DNA 17. Harjutsalo V, Sund R, Knip M, Groop PH: Incidence of type 1 diabetes in elements in the . Nature 489: 57–74, 2012 Finland. JAMA 310: 427–428, 2013 39. Woroniecka KI, Park AS, Mohtat D, Thomas DB, Pullman JM, Susztak K: 18. Sandholm N, Van Zuydam N, Ahlqvist E, Juliusdottir T, Deshmukh HA, Transcriptome analysis of human diabetic kidney disease. Diabetes 60: Rayner NW, et al.: The FinnDiane Study Group; The DCCT/EDIC Study 2354–2369, 2011 Group; GENIE Consortium; SUMMIT Consortium: The genetic land- 40. Nolis IK, McKay DJ, Mantouvalou E, Lomvardas S, Merika M, Thanos D: scapeofrenalcomplicationsintype1diabetes.J Am Soc Nephrol 28: Transcription factors mediate long-range enhancer-promoter interac- 557–574, 2017 tions. Proc Natl Acad Sci U S A 106: 20222–20227, 2009 19. Peltonen L, Jalanko A, Varilo T: Molecular genetics of the Finnish dis- 41. Ma J, Natarajan R, LaPage J, Lanting L, Kim N, Becerra D, et al.: 12/15- ease heritage. Hum Mol Genet 8: 1913–1923, 1999 lipoxygenase inhibitors in diabetic nephropathy in the rat. Prostaglandins 20. Thorn LM, Forsblom C, Fagerudd J, Thomas MC, Pettersson-Fernholm K, Leukot Essent Fatty Acids 72: 13–20, 2005 Saraheimo M, et al.: FinnDiane Study Group: Metabolic syndrome in 42. Kang SW, Adler SG, Nast CC, LaPage J, Gu JL, Nadler JL, et al.: type 1 diabetes: Association with diabetic nephropathy and glycemic 12-lipoxygenase is increased in glucose-stimulated mesangial cells and control (the FinnDiane study). Diabetes Care 28: 2019–2024, 2005 in experimental diabetic nephropathy. Kidney Int 59: 1354–1362, 2001 21. Clarke GM, Anderson CA, Pettersson FH, Cardon LR, Morris AP, 43. Gubitosi-Klug RA, Talahalli R, Du Y, Nadler JL, Kern TS: 5-Lipoxygenase, Zondervan KT: Basic statistical analysis in genetic case-control studies. but not 12/15-lipoxygenase, contributes to degeneration of retinal Nat Protoc 6: 121–133, 2011 capillaries in a mouse model of diabetic retinopathy. Diabetes 57: 1387– 22. Wang X: Firth logistic regression for rare variant association tests. Front 1393, 2008 Genet 5: 187, 2014 44. Gillies CE, Putler R, Menon R, Otto E, Yasutake K, Nair V, et al.: 23. Ma C, Blackwell T, Boehnke M, Scott LJ; GoT2D investigators: Recom- Nephrotic Syndrome Study Network (NEPTUNE): An eQTL land- mended joint and meta-analysis strategies for case-control association test- scape of kidney tissue in human nephrotic syndrome. Am J Hum ing of single low-count variants. Genet Epidemiol 37: 539–550, 2013 Genet 103: 232–244, 2018 24. Georg Heinze M.P. logistf: Firth’s Bias-Reduced Logistic Regression. R 45. Warsow G, Endlich N, Schordan E, Schordan S, Chilukoti RK, Homuth package version 1.22. 2016. Available at: https://rdrr.io/cran/logistf/ G, et al.: PodNet, a protein-protein interaction network of the podo- man/logistf.html. Accessed October 11, 2019 cyte. Kidney Int 84: 104–115, 2013

14 JASN JASN 31: ccc–ccc,2020 www.jasn.org BASIC RESEARCH

46. Kutmon M, Riutta A, Nunes N, Hanspers K, Willighagen EL, Bohler A, microcephaly and steroid-resistant nephrotic syndrome: Galloway- et al.: WikiPathways: capturing the full diversity of pathway knowledge. Mowat syndrome. Am J Hum Genet 95: 637–648 Nucl. Acids Res. 44: D488–D494, 2016 52. Thao MT, Gaillard ER: The glycation of fibronectin by glycolaldehyde 47. Ng SB, Buckingham KJ, Lee C, Bigham AW, Tabor HK, Dent KM, et al.: and methylglyoxal as a model for aging in Bruch’smembrane.Amino Exome sequencing identifies the cause of a mendelian disorder. Nat Acids 48: 1631–1639, 2016 Genet 42: 30–35, 2010 53. Li Z, McKeague IW, Lumey LH: Optimal design strategies for sibling 48. Lalonde E, Albrecht S, Ha KC, Jacob K, Bolduc N, Polychronakos C, studies with binary exposures. Int J Biostat 10: 185–196, 2014 et al.: Unexpected allelic heterogeneity and spectrum of mutations in 54. Bansal D, Badhan Y, Gudala K, Schifano F: Ruboxistaurin for the Fowler syndrome revealed by next-generation exome sequencing. treatment of diabetic peripheral neuropathy: A systematic review of Hum Mutat 31: 918–923, 2010 randomized clinical trials. Diabetes Metab J 37: 375–384, 2013 49. Tsui LC, Dorfman R: The cystic fibrosis gene: A molecular genetic 55. Qi W, Keenan HA, Li Q, Ishikado A, KanntA,SadowskiT,etal.:Pyruvatekinase perspective. Cold Spring Harb Perspect Med 3: a009472, 2013 M2 activation may protect against the progression of diabetic glomerular 50. Sulem P, Helgason H, Oddson A, Stefansson H, Gudjonsson SA, Zink F, pathology and mitochondrial dysfunction. Nat Med 23: 753–762, 2017 et al.: Identification of a large set of rare complete human knockouts. 56. Dhar I, Dhar A, Wu L, Desai K: Arginine attenuates methylglyoxal- and Nat Genet 47: 448–452, 2015 high glucose-induced endothelial dysfunction and oxidative stress by 51. Colin E, Huynh Cong E, Mollet G, Guichet A, Gribouval O, Arrondel C, an endothelial nitric-oxide synthase-independent mechanism. JPhar- et al.: Loss-of-function mutations in WDR73 are responsible for macol Exp Ther 342: 196–204, 2012

AFFILIATIONS

1Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden; 2Cardiovascular and Metabolic Disorders Programme, Duke-National University of Singapore Medical School, Singapore; 3Folkhälsan Institute of Genetics, Folkhälsan Research Centre, Helsinki, Finland; 4Abdominal Center Nephrology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland; 5Research Program for Clinical and Molecular Metabolism, Faculty of Medicine, University of Helsinki, Helsinki, Finland; 6Chronic Disease Prevention Unit, National Institute for Health and Welfare, Helsinki, Finland; 7Complex Disease Research Center, BGI Genomics, Shenzhen, China; 8Science Division, Yale-National University of Singapore College, National University of Singapore, Singapore; 9MRC London Institute of Medical Sciences, Imperial College London, London, United Kingdom; 10Department of Diabetes, Central Clinical School, Monash University, Melbourne, Victoria, Australia; and 11Division of Nephrology, Department of Medicine, Duke University, Durham, North Carolina

JASN 31: ccc–ccc,2020 DNA Variants associated with Diabetic Nephropathy 15