BASIC RESEARCH www.jasn.org

Contributions of Rare Variants to Familial and Sporadic FSGS

Minxian Wang ,1,2,3 Justin Chun,1,2,4 Giulio Genovese,5 Andrea U. Knob,1 Ava Benjamin,1 Maris S. Wilkins,1 David J. Friedman,1,2 Gerald B. Appel,6 Richard P. Lifton,7 Shrikant Mane,8 and Martin R. Pollak1,2,3

Due to the number of contributing authors, the affiliations are listed at the end of this article.

ABSTRACT Background Over the past two decades, the importance of genetic factors in the development of FSGS has become increasingly clear. However, despite many known monogenic causes of FSGS, single gene defects explain only 30% of cases. Methods To investigate underlying FSGS, we sequenced 662 whole exomes from individuals with sporadic or familial FSGS. After quality control, we analyzed the exome data from 363 unrelated family units with sporadic or familial FSGS and compared this to data from 363 ancestry-matched controls. We used rare variant burden tests to evaluate known disease-associated and potential new genes. Results We validated several FSGS-associated genes that show a marked enrichment of deleterious rare variants among the cases. However, for some genes previously reported as FSGS related, we identified rare variants at similar or higher frequencies in controls. After excluding such genes, 122 of 363 cases (33.6%) had rare variants in known disease-associated genes, but 30 of 363 controls (8.3%) also harbored rare variants that would be classified as “causal” if detected in cases; applying American College of Medical Genetics filtering guidelines (to reduce the rate of false-positive claims that a variant is disease related) yielded rates of 24.2% in cases and 5.5% in controls. Highly ranked new genes include SCAF1, SETD2,andLY9. Network analysis showed that top-ranked new genes were located closer than a random set of genes to known FSGS genes. Conclusions Although our analysis validated many known FSGS-causing genes, we detected a nontrivial number of purported “disease-causing” variants in controls, implying that filtering is inadequate to allow clinical diagnosis and decision making. Genetic diagnosis in patients with FSGS is complicated by the nontrivial rate of variants in known FSGS genes among people without disease.

JASN 30: 1625–1640, 2019. doi: https://doi.org/10.1681/ASN.2019020152

Focal and segmental glomerulosclerosis (FSGS) is a onset, associated with progressive kidney disease, common histologically defined pattern of kidney and often manifest with subnephrotic proteinuria injury. The clinical syndrome associated with and the absence of the full set of features that define FSGS varies in severity, age of onset, progression the nephrotic syndrome. This is typical of kidney of disease, responsiveness to steroids, and degree of proteinuria.1 Over the past two decades, the im- portance of genetic factors in the development of Received February 14, 2019. Accepted April 25, 2019. FSGS has become increasingly clear. More than 50 Published online ahead of print. Publication date available at single gene forms of the pathologically defined en- www.jasn.org. fi tity FSGS or the related clinically de ned pheno- Correspondence: Dr. Martin Pollak, Beth Israel Deaconess Medical type steroid-resistant nephrotic syndrome (SRNS) Center, 99 Brookline Avenue, Boston, MA 02215-5491. Email: have been identified.2 As a general rule, the auto- [email protected] somal dominant forms of FSGS tend to be late Copyright © 2019 by the American Society of Nephrology

JASN 30: 1625–1640, 2019 ISSN : 1046-6673/3009-1625 1625 BASIC RESEARCH www.jasn.org disease associated with mutations in ACTN4, INF2, and Significance Statement TRPC6.3–5 In contrast, recessive forms of FSGS and SRNS tend to be earlier in onset and associated with more severe Despite many known monogenic causes of FSGS, single gene de- proteinuria and overt nephrotic syndrome.6 Among the most fects explain only 30% of cases. In this study, sequencing of 662 severe form of inherited SRNS is congenital nephrotic syn- exomes from families with FSGS and 622 control exomes validated many known FSGS-causing genes. However, for some genes pre- drome caused by mutations in the nephrin gene, NPHS1. Mu- viously reported as FSGS related, they identified a number of pur- tations in NPHS1 typically cause neonatal nephrosis and dif- ported “disease-causing” variants in controls at similar or higher fuse podocyte foot process effacement, rather than FSGS. frequencies. They also identified multiple additional candidate However, at least one report has suggested increased fre- FSGS genes in which rare variants were more common among ca- quency of rare NPHS1 variants in patients with FSGS.7 It is ses. Network analysis showed that their top-ranked genes were located significantly closer to known FSGS genes compared with a reasonable to view early onset nephrotic syndrome and random gene set. These findings imply that genetic diagnosis in later onset FSGS as part of a spectrum of phenotypes caused patients with FSGS is complicated by the nontrivial rate of variants by a variety of mutations in genes that primarily affect po- in known FSGS genes among people without kidney disease. docyte function.8 In addition, it is increasingly clear that mutations in genes not typically thought of as glomerular nephrotic syndrome. We obtained blood or saliva for DNA disease genes can lead to FSGS as a secondary response to isolation as well as clinical information after receiving in- primary injury. As examples, mutations in type 4 collagen formed consent. Control exome data were obtained from the genes and nephronophthisis genes have been reported in Yale Center for Mendelian Genetics. DNA for sequencing was association with FSGS.9–12 extracted from blood or saliva using standard methods. The majority of known FSGS and SRNS genes have been Of 1284 exomes, 1079 (84%) were captured by Roche’s identified through linkage analysis (including homozygosity MedExome target enrichment kit. The other 205 exomes mapping) and subsequent sequence analysis of the candi- were captured by different capture technologies. This study date regions.6 Here, we report on the results of an exome- analyzed 1284 exomes. Excluding the repeated samples, this sequencing study aimed at understanding the genetic landscape represents 1241 unique samples, 622 controls and 619 affected of mutations and variants in patients and families carrying the individuals or their family members. The number of se- diagnosis of FSGS. We examined the frequencies of rare variants quenced individuals per family ranged from one to seven in known FSGS-causing genes, other known kidney disease (Supplemental Table 1). genes, and the rest of the human exome in a cohort of patients with familial or sporadic FSGS. We performed the same analysis in a similar number of control individuals. We analyzed the Variant Calling, Quality Control, and Inbreeding fi burden of such variants in cases versus controls. We identified Coef cient Estimation 13 several new genes that may be possible contributors to the risk of Read quality was checked and preprocessed by FastQC and fi FSGS. The top 50 new genes in our analysis are significantly Trimmomatic (version 0.36). Quality-checked FASTQ les closer to known FSGS genes in a gene interaction network anal- were aligned against the human reference genome 14 ysis than are randomly chosen sets of genes, suggesting that (GRCH38, hg38) by BWA (version 0.7.13-r1126). Variants several of these genes are in fact true contributors to FSGS. were called according to the best practices for use of the Ge- – 15,16 Importantly, we find that for many genes previously described nome Analysis Toolkit (version 3.5 0-g36282e4). Only fl as FSGS or kidney disease genes, nonsynonymous and loss-of- variants located in exon regions and the 100-base anking function (LOF) variants are as common in matched controls region of each exon were called. The gene exon annotations 17 18 as they are in cases. These results have important implications were from refSeq (hg38). CrossMap (v0.2.2) was used to for understanding the landscape of genetic variation in FSGS, as convert the genome coordinates from hg38 to hg19 for the fi fi well as for the interpretation of genetic testing in patients VCF le. Three additional lters were also applied at each $ $ with FSGS. individual call, read depth 4, genotype quality 20, and minor read ratio $0.2 for heterozygous sites. The average read depth was approximately 20–303 after removing indi- METHODS viduals with read depth #103. Three individuals with an excess number of Mendelian errors were also removed. Human Subjects Individuals were also removed if genetic relationship was The study was approved by the institutional review board at not consistent with pedigree structure. Additionally, the fi Beth Israel Deaconess Medical Center. Individuals belonging genome-wide inbreeding coef cient was estimated by 19 to 395 families with FSGS were included in this study. Famil- IBDLD software. ial FSGS–affected status was defined as having either a reported history of proteinuria with urine albumin-to-creatinine ratio Variant Annotation of 250 mg/g, nephrotic syndrome, or biopsy-proven FSGS in a Variants were annotated by Variant Effect Predictor (VEP) soft- family with at least one other case of documented FSGS or ware,20 with annotated information including: minor allele

1626 JASN JASN 30: 1625–1640, 2019 www.jasn.org BASIC RESEARCH frequency [MAF] of continental populations (in the GnomAD Ricans from Puerto Rico; CLM, Colombians from Medellin, project21 and in the Exome Aggregation Consortium [ExAC] Colombia; and PEL, Peruvians from Lima, Peru]; SAS, South project21); the Genomic Evolutionary Rate Profiling (GERP) Asian; AA, African American; AFR, African). The ancestry of score22; CADD score23; and deleterious effect prediction by our samples was inferred by the k-nearest neighbors (k-NN) PolyPhen2,24 SIFT,25 M-CAP,26 LRT,27 and the MutationTaster algorithm on the basis of the distance between individuals method.28 The confidences of LOF effects of splice sites, non- measured in the top ten principal components space. Specif- sense sites, and indels were annotated using the LOFTEE plugin ically, we set k=1, and the distance between two individuals 20 ∑10 ∙ 2 of the VEP software. (i and j) was measured as Dij ¼ m¼1Vm ðPmiPmjÞ , where Vm is the proportion of variance explained by the mth principal Rare Variant Spectrum in Known Disease-Causing component, and Pkl is the kth projection value of individual l. Genes We matched each case by one control with the closest distance In order to catalog the spectrum of rare variants in known dis- measured by D. ease-causing genes, we performed rare variant filtering according to the suspected genetic inheritance model, the allele frequency, Association Test for Common Variants and Gene-Based and the genetic prediction scores from in silico genetic score Burden Test for Moderately Rare Variants prediction software. Variant cosegregation with phenotype was The genomic inflation factor (ƛ)30,31 and the data for QQ plot checked whenever possible. Only the splicing sites, stop gain/loss were estimated by PLINK2 software32 for the case/control sites, in-frame/frame-shift indels, and nonsynonymous sites data of before and after case/control matching. Only common were analyzed. Note that for the compound heterozygous model variants (MAF$0.05) were used in this analysis. In order to we did not have phasing information available. In order to com- deal with heterogeneous variant effects and suboptimal allele ply with the American College of Medical Genetics (ACMG) frequency cut-offs, three methods (variable threshold [VT] standards for gaining at least a supporting level of pathogenic test, C-a test, and SKATO test) were applied for moderately classification by computational prediction, we required that puta- rare variants (MAF#5%). These tests were performed using tive causal variants should be predicted as damaging by at least PLINK/seq (http://atgu.mgh.harvard.edu/plinkseq/) and three algorithms of five prediction methods used for nonsynon- EPACTS software (http://genome.sph.umich.edu/wiki/ ymous rare variants, or as LOF variants classified by the LOTTEE EPACTS). plugin of VEP20 as high-confidence LOF sites. Additional filters specifically for each inheritance model were as follows: Gene-Based Burden Test for Extremely Rare Variants For different inheritance models, the qualified variants were Dominant Model classified similarly to how variants in known disease-causing (1) The maximum population-level allele frequency (pop_max) genes were classified, but only case/control-matched data were was assumed to be ,0.0001 in the general population. (2)The used. After variant filtering and classification, two tests were identified rare variant was absent from control samples. performed. The “single-group” burden test33,34 (Supplemen- (3)CADDscore$10. tal Figure 1A) compares the observed rare variant rate to the expected rate of each gene. The “case-control” burden test Recessive Model (Homozygous) (Supplemental Figure 1B) compares the observed rare variant (1) Maximum population frequency ,1%. (2)Nohomozygous rate between case and control samples. To validate the perfor- alternative allele genotype in GnomAD and ExAC databases, or mance of this method, we explored the model performance in our in-house control samples. (3)CADDscore$10. by different parameter combinations. The performance was assessed by the total information content I (I ¼ 2 ∑log2ðÞPi ) Recessive Model (Compound Heterozygous) and by the ranking of these genes (Supplemental Figure 2). To (1) Maximum population frequency ,8%. Note: the allele test the asymmetric distribution of the top genes between case frequency of R229Q in NPHS2 was 0.06893 in the European and control samples, the rank sums of genes from each group Finnish population in the GnomAD database. (2) At least two were compared using the Wilcoxon rank-sum test (U test) to rare heterozygous genotypes in a single gene, such that at least check the overall rank orders between these groups.35 one of these heterozygous sites did not have a homozygous genotype found in the GnomAD and ExAC databases. Enrichment Test for Top-Ranked Genes in Renal Glomerulus–Enriched Genes Ancestry Labeling and Case/Control Matching Nine glomerular expression datasets were collected by Ding Our samples were mixed with the samples from the 1000 Ge- et al.36 A high-confidence set of glomerulus-enriched genes nomes Project29 for the principal component analysis (PCA). was defined by those genes that have significant differential ex- The population label was consistent with the label from the pression between glomerular and nonglomerular compartments 1000 Genomes Project; in total, six ancestry labels were used in at least two datasets (see Table 1 of Ding, et al.36). Fisher’s exact (EUR, European; EAS, East Asian; AMR, admixed American test was used to evaluate the significance of the enrichment of [MXL,MexicanAncestryfromLosAngeles;PUR,Puerto renal glomerulus–enriched genes in top-ranked genes.

JASN 30: 1625–1640, 2019 Rare Gene Variants Contributions to FSGS 1627 BASIC RESEARCH www.jasn.org

Table 1. Summary for familial and sporadic cases phenotypes. The number of sequenced in- Sex Ancestry dividuals per family in our cohort ranged Group Patientsa Age of Onsetb Male Female EUR EAS SAS AA AMR AFR from one to seven (Supplemental Table 1). In total, we sequenced samples from 395 Familial cases 337 22.6 (15.8) 179 158 216 5 16 31 69 0 distinct family units, including 337 cases Sporadic cases 187 20.1 (14.9) 77 110 162 2 1 8 13 1 coming from 208 families with multiple af- EUR, European; EAS, East Asian; SAS, South Asian; AA, African American; AMR, admixed American (MXL, Mexican Ancestry from Los Angeles, PUR, Puerto Ri-cans from Puerto Rico, CLM, Colombians fecteds and 187 from sporadic cases (Sup- from Medellin, Colombia, and PEL, Peruvians from Lima, Peru); AFR, African. plemental Table 1, Table 1). Variants from b Mean in years (SD). our cohort were called using the Genome Analysis Toolkit.15,16 To estimate individual- Network Distance Analysis level ancestry and perform quality control, we performed PCA We define the distance (dij) of two genes (i and j) as the shortest and genetic relationship estimation using the SNPRelate path between gene i and j of all of the possible paths between R package.40 Our case and control samples had a world-wide them on a given network. This shortest path problem was ancestry distribution, with the majority having predominantly solved by Dijkstra’salgorithm.37 Given two gene sets K and recent European ancestry (Figure 2, Table 1). As revealed by T,wedefine the distance between a gene t from T to gene set PCA-based clustering, the breakdown for families was 73.4%, K as DtK ¼ minfdtk; for k ∈ Kg, the minimal distance of 21.7%, 4.5%, and 0.25% for European, African American and k t with all of the genes from K.Then,thedistance(GTK)ofa Admixed American, Asian, and African ancestry, respectively candidate gene set T with a known gene set K was defined as (see Methods). GTK ¼ averagefDtK ; for t ∈ Tg, the average of the distance of Next, given the diverse ancestry distribution, we applied a all of the candidate genes from T with its minimal distance to case/control matching approach to control for population genes from K. Depending on the comparisons, the known structure in our case/control statistical testing.41,42 After gene set K and the novel gene set T were set appropriately. the removal of low–call-rate samples, 363 family units, either Statistical significance was estimated by repeating the random unrelated FSGS families or sporadic FSGS cases, with high- process 1000 times. Each time, T was replaced by 50 random quality sequence were used for downstream analysis (Supple- genes from a gene pool that was covered by the exome- mental Table 1). PCA plots for the matched case and control sequencing target. We also performed another analysis by fixing samples are shown in Figure 2, C and D. As shown in Figure 2, the top 50–ranked new genes, but replacing the known genes by E and F, the genomic inflation factor (ƛ) for a common variant randomly picked genes for the estimation of the background association test dropped from 1.21 to 1.06 after case/control distribution, repeating this process 1000 times. The network matching, indicative of an improvement in our ability to con- analysis was implemented in Java on the basis of the Java version trol for population structure. of Dijkstra’s algorithm from Princeton University (https:// algs4.cs.princeton.edu/44sp/DijkstraUndirectedSP.java.html). Exome Sequence Analysis for Genetic Diagnosis and Two gene-gene interaction networks were used as back- Identifying Candidate Genes for FSGS bones for checking and comparing distances, the STRING38 We hypothesized that in our FSGS cohort, approximately 30% 39 and inBio Map networks. The connection score (Ci)in of samples would be explained by mutations in previously these networks was converted to Cmax– Ci in order to apply reported genes on the basis of earlier reports, with a higher 43,44 the shortest path algorithm. Cmax is the maximum edge score rate of identification in the familial cases. We in a network. In order to check the robustness of the gene set also hypothesized that mutations in genes not yet identified distance between top-ranked new genes and known disease- as FSGS-causing contributed to the development of disease associated genes, we performed “leave one out” and “leave in a nontrivial fraction of these subjects. We therefore per- three out” analyses (details in Supplemental Material). formed several analyses: (1) characterizing the rare variant distribution spectrum in known disease-causing genes (i.e., genetic diagnosis (2) common variant association test; and RESULTS (3) burden test for moderately rare variants and extremely rare variants. For each class of variants, different kinds of statis- Study Design, Samples, Sequencing, and Quality tical methods were applied to test the association of variants/ Control genes with the disease status. Toidentify genes that contribute to FSGS, we analyzed a total of 1284 whole exomes including 662 exomes from individuals Rare Variants Identified in Known Disease-Causing diagnosed with, or with family members diagnosed with, FSGS Genes (Figure 1), either sporadic or familial, and 622 exomes from In order to define the distribution of rare variants in known unrelated control individuals (all adults). None of the control FSGS-associated genes, we performed variant filtering on the individuals were reported to have kidney disease, although, in basis of each family’s compatibility with the different inheritance most cases, they were not extensively studied for kidney-related models. Variants were filtered on the basis of the annotation of

1628 JASN JASN 30: 1625–1640, 2019 www.jasn.org BASIC RESEARCH

395 case families (662 exomes) + 622 control exomes

QC and case/control matching 363 unrelated case families and 363 controls

Genetic diagnosis Common variants association test Moderately rare variants Extremely rare variants

ACMG guideline

49.6% cases 30.6% cases Top 10 ranked gene list Top 20 ranked gene list includes Validation of APOL1 gene 35.5% controls 18.7% controls includes APOL1, WT1, COL4A5 WT1, COL4A5, TRPC6, NPHS2, INF2, ACTN4.

No control No control Top ranked new genes have high burden genes high burden genes closer network distance than random background 33.6% cases 24.2% cases 8.3% controls 5.5% controls

Figure 1. Data analysis diagram. After quality control, 363 unrelated case families with ancestry-matched controls were included in this study. Rare variants in known disease-causing genes were profiled (genetic diagnosis); common variants association test and gene burden test for moderately rare and extremely rare variants were conducted thereafter. allele frequency and the predicted effects of several in silico controls, namely LMX1B, SIX2, UPK3A, BMP4, DSTYK, genetic prediction tools (see Methods).20 We checked the dis- HNF1B, and SIX5. tribution of these variants in 165 known kidney disease– Given this observation, we explored whether a stricter filter related genes (excluding kidney stone genes, Supplemental using ACMG guidelines could help us better distinguish Table 2)6. the causal rare variants from noise. We applied filters to include only those rare variants with multiple lines of compu- Known Dominantly Acting Genes tational evidence supporting pathogenicity following First, dominant inheritance genes were analyzed in the 363 these ACMG guidelines. Specially, we included only case family units (including a total of 483 affected individ- nonsynonymous rare variants predicted as damaging by at least uals) and 363 matched controls. The rare variants in each three of five algorithms (PolyPhen2,24 SIFT,25 M-CAP,26 LRT,27 individual were filtered by consistency with a dominant in- and MutationTaster28), as well as splice sites, nonsense sites, heritance model (i.e., shared by all affecteds in a family if and indels that were classified as high-confidence LOF by the multiple affecteds were available), allele frequency #0.0001 LOTTEE plugin of VEP.20 We refer to such variants as LOF+3D. in ExAC and GnomAD databases, as well as CADD$10. After application of these stricter filtering guidelines, most of Figure 3A and Supplemental Table 3 illustrate the numbers the known dominant disease genes in which we observed a high of families (cases) and controls carrying rare variant geno- variant burden in controls still retained this trend (Figure 3D). types in known disease-associated genes that passed these Thus, applying a stricter in silico classification is not sufficient filters. COL4A5, WT1, UMOD, and INF2 were the genes to distinguish causal variants from noise. identified most frequently as harboring likely causal muta- We further partitioned the rare variants seen in cases ac- tions, with only a small number of similar variants identified cording to whether they were observed in patients with familial in controls. However, we identified several known disease or sporadic FSGS (Supplemental Figure 3). The rare variants in genes in which a nontrivial number of rare variant genotypes COL4A5 and WT1 were evenly distributed between familial passing these same filtering conditions in the control sam- and sporadic patients. In contrast, rare variants in UMOD and ples (Figure 3A). For example, genes MYH9, FN1, SALL1, COL4A3 were significantly enriched in those patients with PODXL, TBX18, ROBO2, COL4A6, and SRGAP1 were all familial inheritance (P value ,0.05, Fisher’s exact test). For found to have a higher burden of such variants in control INF2, TRPC6, and PAX2, we observed trends toward detec- samples than in cases. There are also seven kidney disease tion in familial cases, but the statistical significance was lim- genes in which suspicious variants were detected only in ited by the smaller numbers.

JASN 30: 1625–1640, 2019 Rare Gene Variants Contributions to FSGS 1629 BASIC RESEARCH www.jasn.org

A B 0.04

0.02 0.03

0 0.02

–0.02 EUR 0.01 PC2 PC3 EAS –0.04 AMR 0 SAS –0.06 AA –0.01 AFR –0.08 CaseFamily Control –0.02 –0.01 00.01 0.02 0.03 0.04 –0.010 0.01 0.02 0.03 0.04 PC1 PC1

C D CASE 0.1 CASE 0.2 CTRL CTRL

0.05 0.15

0 0.1 pc3 pc2

–0.05 0.05

–0.1 0

–0.05 –0.15 0 0.05 0.1 0.15 0 0.05 0.1 0.15 pc1 pc1

E F 6 5

4 4

3

2 2 Observed -log10(Pvalue)

Observed -log10 (Pvalue) 1

λ = 1.21 λ = 1.06 0 0

042 0 24 Expected -log10(pValue) Expected -log10(pValue)

Figure 2. Quality control for study samples. (A and B) PCA plot for case and control samples mixed with 1000 Genomes Project samples. EUR, European; EAS, East Asian; AMR, admixed American; SAS, South Asian; AA, African American; AFR, African. (C and D) PCA plot for case and control matched samples. (E) QQ plot for the genome-wide association study before case/control matching. (F) QQ plot for the genome-wide association study after case/control matching, where ƛ is the genomic inflation factor.

1630 JASN JASN 30: 1625–1640, 2019 www.jasn.org BASIC RESEARCH

Known Recessive Disease Genes databases (details in Methods section). Eleven families have For genes following recessive inheritance, 266 affected cases rare homozygous variants in the known disease-causing genes from 236 unrelated family units (one case per family plus C5orf42, COL4A3, IFT80, NPHP3, NPHS2, PLCE1, NUP93, sporadic cases) with an inheritance pattern compatible and OFD1 (Figure 3C, Supplemental Table 6). All of these with a recessive model and 236 matched controls were ana- families show a high level of inbreeding (Supplemental lyzed. For the compound heterozygous model, the rare variants Table 6); ten of 11 families had a genome-wide inbreeding in each family were filtered by: (1) allele frequency #0.08 in coefficient ranked in the top 12% of all of the families tested, ExAC and GnomAD databases (a number slightly greater than and four ranked in the top 1%. All of the presumed causal the most common recessive monogenic FSGS variant NPHS2 variants detected in these families are very rare, with the max- p.R229Q, with allele frequency as high as 6.8% in some pop- imum population allele frequency ,0.0066 as estimated from ulations per the GnomAD database); (2) the presence of the ExAC database.21 However, attributing a causal relation- at least two heterozygous rare variants in the same gene; and ship to variants in C5orf42, IFT80, and OFD1 is difficult, be- (3) at least one of the heterozygous variants lacking any cor- cause these genes contained candidate variants that were not responding homozygous genotype found in either the ExAC predicted to be damaging and were detected in control sam- or GnomAD databases (for details see the Methods section). ples. Except for the variants in these three genes, all of the other We also required that this genotype be shared by all of the missense variants were predicted to be either very strongly dam- affecteds in a family. We identified FRAS1, FAT1, and aging or LOF and all were located in evolutionarily conserved PKHD1 as the three genes most likely to have a genotype regions22 (Supplemental Table 6). All of the rare variant geno- consistent with disease contribution under this model types identified in NPHS2, COL4A3, NPHP3, NUP93, and (Figure 3B). However, we detected a similar burden of such PLCE1 show compelling genetic evidence for being disease caus- genotypes in control samples for FRAS1, FAT1, PKHD1, ing (Figure 3, Supplemental Table 6). ALMS1, DYNC2H1, CUBN, AHI1, NPHP4, WDR35, WDR19, TTC21B, and RPGRIP1L, as well as C5orf42 Fraction of Families Explained by Rare Variants in (Figure 3B, Supplemental Table 4 illustrated by orange bars). Known Genes There are also several genes in which suspicious variants were By grouping together all of the rare variants detected under the detected only in controls, including ZNF423, PDE6D, NUP93, above models, rare variants or rare variant genotypes in known NPHP3,NEK1,KANK4,ITGB4,ITGA3,IFT140,GRIP1, FSGS and FSGS-like disease-causing genes were detected in GRHPR, EMP2, COL4A6, CEP41, CC2D2A, BBS9, and ACE 49.6% (180 of 363) of unrelated families and sporadics, al- genes. Even after applying stricter in silico thresholds for clas- though 35.5% (129 of 363) of control individuals showed a sification of LOF (LOF+3D sites only), we still find a high similar rare variant genotype. This observation in cases is burden of variants in these genes in control individuals, in- clearly an overestimation of the disease explanation rate, cluding FRAS1, FAT1, FRME2, ADCK4, RPGRIP1L, PKHD1, because a nontrivial proportion of genes harbor a similar or NPHP4, NPHP3, DYNC2H1, and CUBN genes (Figure 3E). higher level of burden in control samples (Figure 3). In order To assess the hypothesis that these genes are likely to harbor to give a better estimate of the identification of true causal false-positive “disease-causing” genotypes, we used the gene genotypes, we removed those genes that were among the top damage index score45 and frequently mutated genes (FLAGS) 100 FLAGS genes46,47 or with a similar or higher level of rare classification (rare variants in gene “Frequently Mutated variant burden in controls (with enrichment of rare variants in in Public Exomes”).46,47 On the basis of the gene damage index cases versus controls #1.5 in our data set). We refer to these prediction, FRAS1, FAT1, ALMS1, DYNC2H1, and CUBN are genes as “high control burden genes” hereafter. When we do all significantly enriched for rare variants in the general pop- this, 33.6% (122 of 363) of unrelated families or sporadic cases ulations.45 These five genes have also been labeled as FLAGS, have rare variants in known kidney disease genes (Figure 4A). as all are observed to be frequently mutated in public exomes, By comparison, only 8.3% (30 of 363) of the control individ- irrespective of the phenotypes studied.46,47 By contrast, for uals have rare variant genotypes that pass the same filtering several other genes, the burden of disease-consistent geno- procedure as used for case samples. types in cases is much greater than in controls. For example, Weobservedahigherproportionofrarevariantratein we saw no candidate disease-causing genotypes in NPHS2 or familial cases (37.2%, 70 of 188) than sporadic cases (29.7%, COL4A4 among controls, but a significant burden in cases. As 52 of 175). These numbers are similar to previous findings shown in Supplemental Table 5, the distribution of variant in SRNS, where NPHS1, NPHS2, and WT1 are the major con- NPHS2 genotypes is consistent with previous reports, with tributors.43,44 In our cohort, COL4A5, WT1, and COL4A4 are the disease-causing variants seen in combination with the rel- the top three contributors under a dominant or X-linked atively common p.Arg229Gln variant all located in exon model and NPHS2 is the top contributor under a recessive 7or8.48 model (Figure 3). The differences in the relative contributions Under a homozygous recessive model, the rare variants were of the major genes reflect differences in subject ascertain- filtered by CADD$10, allele frequency #0.01, and the absence ment.43,44 COL4A5, WT1, NPHS2, COL4A4, UMOD, of any homozygous genotype in ExAC and GnomAD COL4A3, INF2, ACTN4, PLCE1, CEP290, PAX2, and

JASN 30: 1625–1640, 2019 Rare Gene Variants Contributions to FSGS 1631 BASIC RESEARCH www.jasn.org

A C 16 4 CTRL CTRL 14 CASE CASE 12 3 10 8 2 6 4 1 2 0 0 C3 CFI FN1 RET WT1 INF2 SIX2 SIX1 SIX5 INF2* CD46 PAX2 IFT80 TNXB ANLN OFD1 BMP4 MYH9 WNT4 SALL1 UMOD TBX18 PLCE1 NUP93 CHD1L ACTN4 TRPC6 HNF1B GATA3 UPK3A LMX1B DSTYK NPHS2 NPHP3 PODXL ROBO2 C5orf42 ACTN4* COL4A4 COL4A3 COL4A6 COL4A3 COL4A5 SRGAP1 ARHGAP24 B 28 CTRL 26 CASE 24 22 20 18 16 14 12 10 8 6 4 2 0 CFH EVC ACE KIF7 AHI1 INVS FAT1 TTC8 BBS4 BBS9 NEK1 OFD1 XPO5 MKS1 EMP2 CUBN COQ2 COQ6 LRIG2 ITGA8 ITGB4 ITGA3 GRIP1 IFT172 IFT140 PLCE1 CEP41 TCTN2 TCTN1 TRAP1 NUP93 FRAS1 APOL1 LAMB2 KANK1 KANK4 ALMS1 ADCK4 NPHS1 NPHP4 PDE6D NPHP3 PKHD1 NPHS2 FREM1 FREM2 POC1A WDR35 WDR19 CHRM3 C5orf42 GRHPR ZNF423 TTC21B CEP164 CEP290 NUP205 COL4A4 COL4A6 CC2D2A SCARB2 TMEM67 SLC12A3 TBC1D32 DYNC2H1 RPGRIP1L D F 4 14 CTRL CTRL 12 CASE CASE 3 10

8 2 6 4 1 2 0 0 C3 CFI FN1 RET WT1 INF2 SIX5 SIX1 SIX2 INF2* PAX2 ANLN TNXB OFD1 BMP4 MYH9 WNT4 SALL1 UMOD TBX18 PLCE1 HNF1B ACTN4 TRPC6 CHD1L UPK3A LMX1B DSTYK NPHS2 NPHP3 PODXL ROBO2 C5orf42 ACTN4* COL4A6 COL4A5 COL4A3 COL4A4 COL4A3 SRGAP1 ARHGAP24 E 5 CTRL CASE 4

3

2

1

0 FAT1 TTC8 CUBN COQ6 LRIG2 ITGA8 PLCE1 FRAS1 ALMS1 LAMB2 NPHS1 ADCK4 PKHD1 NPHP4 NPHP3 FREM2 TTC21B CEP290 TMEM67 DYNC2H1 RPGRIP1L

Figure 3. Rare variants discovered in known genes. Variants were filtered by allele frequency, CADD score, and genotype-phenotype cosegregation. (A) Variants were filtered by dominant model/X-linked model. (B) Variants were filtered by recessive compound heterozygous model. (C) Variants were filtered by single site homozygous recessive model. (D–F) Variants filtered by the same conditions as A, B, and C, respectively, with further variant filtering following ACMG guidelines, where only variants with multiple lines of computational evidence supporting pathogenicity were included. *Variants outside of the known disease-causing domain were not included.

1632 JASN JASN 30: 1625–1640, 2019 www.jasn.org BASIC RESEARCH

AB

WT1 COL4A5 9.46% 10.8% NPHS2 6.76%

COL4A4 6.08%

OTHERS 23% UMOD 33.6% 5.41%

COL4A3 66.4% 4.73%

INF2 4.73% TNXB 2.03% KANK1 ACTN4 3.38% 2.03%

TRPC6 INVS PLCE1 2.7% 3.38% 2 .7% 2.03% 2 CEP290 COQ2 PAX 2.7% 2.03% C3 2.03% ARHGAP24 2.03% ADCK4 2.03%

Figure 4. Pie chart of rare variants in known kidney disease–associated genes. Genes with a similar or higher level of burden in control samples were not included here. (A) The proportion of families with rare variants in known genes (blue section) versus without (orange section). (B) A pie chart showing variants in FSGS cases with disease explained by rare variants in known genes.

TRPC6 accounted for 62.8% of the families with likely causal genotype. All of the members tested from these families were rare variant genotypes (Figure 4B). For these 12 genes, to- found to be admixed individuals (labeled as AA or AMR) by gether with known disease-causing domain infor- PCA-based ancestry inference (see Methods section), which mation, we have very high confidence in assigning disease is consistent with the African origin of the G1 and G2 vari- causality to the rare variants identified.3–5 However, we are ants.49–51 None of the control samples studied were found to not able to use in silico prediction alone to attribute disease have a high-risk APOL1 genotype. We note that two sporadic causality for high control burden genes (Figure 3). The in silico cases and one control are compound heterozygous for rare predictions of CADD and GERP scores of the extremely rare APOL1 variants (Supplemental Tables 4 and 7). When includ- variants (frequency,1/10,000) seen in case and control sam- ing APOL1 high-risk genotypes, 41.3% (150 of 363) of fami- ples are indistinguishable (P value .0.05, t test; Supplemental lies or sporadic cases can be explained by a genetic cause or Figure 4). Even by applying stricter filtering (LOF+3D sites), genetic susceptibility. We note that ten families with high-risk an equal or higher level of burden in control samples remained APOL1 genotypes also had rare variants in known kidney dis- (Figure 3, lower panel). This highlights the difficulty in ease genes, of which eight families qualify for LOF+3D criteria using simple sequencing and filtering for classifying variants (Supplemental Table 8). The severity of the phenotype, as as disease-causing even in those genes previously reported as assessed by age of disease onset, was not significantly different disease-associated. Previously reported data as well as future in the individuals with mutations in a known disease gene data regarding these genes and their variants therefore need with a high-risk APO1 genotype compared with those to be interpreted cautiously. with a high-risk APOL1 genotype alone. In addition, we found no evidence to support a genetic interaction between APOL1 APOL1 and these other genes, as the number of co-occurrences of Two specific variants in APOL1 (termed G1 and G2) greatly these events is less than the expected value of independence influence the risk of FSGS (with an odds ratio of approximately (specifically : 10,36330:09430:33611). 10 (95% confidence interval 6.0 to 18.4) under a recessive model), and are common in people of recent African ances- Age of Disease Onset in the Different Inheritance try.49–51 We therefore looked specifically at APOL1. We coun- Models ted the number of samples carrying two copies of the so-called We looked at the age of disease onset in families with rare G1 and G2 risk variants (genotypes G1G1, G2G2, and G1G2, variantsdetected in known disease-associated genes(excluding referred as high-risk APOL1 genotypes). Of the 363 unrelated high control burden genes) after partitioning by inheritance families studied, 34 families (9.4%) carried a high-risk APOL1 model. Age of onset is significantlylowerfor thosefamilies with

JASN 30: 1625–1640, 2019 Rare Gene Variants Contributions to FSGS 1633 BASIC RESEARCH www.jasn.org the homozygous recessive inheritance model when compared genes), and the best overall ranking (summation of the rank with the other two models (P value ,0.05, U test35; Supple- of each gene, Supplemental Figure 2, A and B). mental Figure 5). The age of disease onset in families with Tohelpvisualizeanddistinguishthedistributionofdisease- dominant inheritance is also slightly higher than that of associated genes, we compiled the information from the families with the compound heterozygous model, but not single-group burden test with that of the case-control burden significantly (P value=0.14, U test35). test in a single plot. We subtracted the number of rare variants seen in each gene in controls from that of cases (x axis of Figure 5) Analysis for the Effect of Common Variants and and plotted this difference against the P value of single-group Moderately Rare Variants analysis (y axis of Figure 5). Using this graphing method, the We performed an association study to try to identify cod- known disease-causing genes PAX2, NPHS2, TRPC6, WT1, ing variants associated with increased risk of FSGS, given the COL4A5, INF2, and ACTN4 are all located in the top-right high-density coverage of common coding variants in the quadrant of the graph (Figure 5), as expected. whole-exome–sequencing data. The only that reached In order to quantitatively determine whether the distribu- genome-wide significance was APOL1, specifically the G1 tion of candidate genes in cases compared with controls was (rs73885319 and rs60910145) variant (P value=1.69E-7; Sup- asymmetric, we compared the rank order of genes from the plemental Figure 6). This implies that larger sample sizes are single-group analysis. We expected that the genes from the case required to detect the contribution of common coding vari- group would have a smaller rank sum (with smaller P value) ants other than those in APOL1 to FSGS risk. than those from the control group, because the case group To look for the possible contribution of moderately rare should include rare variants that are in fact causally related variants, we used three other burden tests: (1) variant thresh- to the disease in addition to statistical noise. As shown in old (VT) test, (2)C-a test, and (3) SKAT-O, in order to deal Table2,weobservedasignificantly smaller rank sum in the with the unknown optimal cut-off of allele frequency (with case group when either the top 50 or 75 genes were chosen for the VT test) and heterogeneity of variant effects (with C-a test the analysis (P value=0.02 and 0.01, respectively). The same and SKAT-O). We applied these three methods to our ancestry- trend was also found when the top 100 genes were compared matched case/control data for those rare variants with (P value=0.08). As a negative control, we analyzed rare syn- MAF,5%. APOL1, WT1, and COL4A5 were ranked in the onymous variants (zero allele frequency in GnomAD top ten by the P values from the VT test (Supplemental database) in the same manner as we did for the rare missense Table 9), but none of them reached a genome-wide level of and LOF variants. In the case of the synonymous variants, significance. The APOL1 signal was driven by the G1 and G2 a smaller rank sum from the case group was not observed variants, and the WT1 and COL4A5 signals were driven by the (Supplemental Table 10). In fact, there was a trend toward very rare variants as discussed earlier (Figure 3, Supplemental lower rank sum in controls (Supplemental Table 10). There- Table 3). fore, despite a limited number of genes reaching genome-wide significance, we conclude that there is an enrichment of true Gene-Based Burden Tests for Extremely Rare Variants disease-causing rare variants in the case group, where enrich- Next, we wanted to determine whether additional new genes ment of disease-causing rare missense and LOF variants is could also be identified as likely contributors to the develop- driving genes to have smaller rank sum in the case group. ment of FSGS, given that in only 33.6% of families with disease be explained by rare variants in known Mendelian disease Ranking FSGS Genes Using the Rare Variant Burden genes. We performed a single-group burden test and a case- Test control burden test. In the single-group burden test, the We next combined the burden information from the y axis observed number of rare variants was compared with the ex- (single-group analysis) and x axis (case-control analysis) pected number estimated from the whole-genome back- shown in Figure 5 as a single value using Fisher’smethod. ground (Supplemental Figure 1A).33,34 In the case-control The combination of the P values as a single value provides a burden test, the number of observed rare variants in cases simpler gene rank list. As shown in Table 3, the top four genes was compared with the observed number in controls (Supple- in this analysis were WT1, COL4A5, TRPC6, and NPHS2. The mental Figure 1B). The statistical power of the single-group known disease-causing genes INF2, ACTN4, and PAX2 also test was tuned by testing several combinations of different rank near the top of this list, with ranks of 6, 20, and 56, re- parameters. As shown in Supplemental Figure 2, we found spectively. As expected, we found a significant enrichment that the best combination of parameters for the single-group of our top-ranked genes toward those genes enriched for analysis was to use missense and LOF variants (splice-altering, glomerular expression (Supplemental Figure 7, Supplemental nonsense, frameshift, and indels) with zero allele frequency as Table 11).36 When the top 20 genes were considered, the en- estimated from the GnomAD database.21 In this setting, the richment odds ratio was 8.51 (P value ,0.001; 95% confi- positive control genes WT1, COL4A5, ACTN4, INF2, TRPC6, dence interval, 3.21 to 22.09). The top 60 genes ranked by and PAX2 had the largest total information content I,(where this method (and the associated rare variants) are listed in I ¼ 2 ∑log2ðÞPi ; the smallest overall P values of these Supplemental Tables 12 and 13.

1634 JASN JASN 30: 1625–1640, 2019 www.jasn.org BASIC RESEARCH

WT1

15

COL4A5

10

TRPC6 5 NPHS2 ACTN4 INF2

rare variant) (-log P value) PAX2

Observed over expected number of 0

–15 –10 –5 0 5 10 15 Observed number of rare variants (#Case – #Control)

Figure 5. Burden test for extremely rare variants of the dominant model. Volcano plot for “single-group” analysis (comparison of observed rare variant rate with expected rate of each gene) and “case-control” analysis. The x axis shows the difference in the number of families with qualified rare variants between case and control samples. The y axis height of each dot shows the 2log10(P value) of the “single-group” analysis, where on the left (x,0), P value is on the basis of the analysis of control samples, whereas in the right part of this figure (x.0), each dot shows the P value of a gene from the analysis of case samples.

Network-Based Analysis of Top Candidate Genes Table 15). Next, we computed the gene set distance between Next, we wanted a better sense as to whether it is likely that the 50 top-ranked new genes (T, Supplemental Table 12, with some of our highly ranked new genes are true contributors to known kidney disease risk genes excluded if any) and the Kfsgs kidney disease and FSGS. Totest this, we compared the gene set or Kexpanded gene panel, respectively. As shown in Figure 6 and distance between the set of our top-ranked new genes and Supplemental Table 15, compared with a randomly chosen set previously identifiedgenesassociatedwithkidneydiseaseor of 50 genes, the top 50 genes ranked from the control group, or with FSGS (Supplemental Tables 2 and 14), because previous the top 50 genes ranked by synonymous rare variants from the studies have shown that disease-causing genes tend to cluster case group, our top 50 new genes had shorter distances to 52,53 together. We curated two sets of disease-causing genes, the known disease-causing genes. This held true for both the Kfsgs (Supplemental Table 14) and Kexpanded (Supplemental STRING and the inBio Map networks (Figure 6, Supplemental fi Table 2). The rst gene set, Kfsgs, comprises genes with compelling Table 15).38,39 Fixing these 50 new genes and bootstrapping the genetic evidence as causative of FSGS or FSGS-like phenotypes known genes 1000 times by choosing random genes, we saw when mutated; this set is called the FSGS panel.6 The other set, that the distance between the top 50 new genes was also K , is a mixture of genes that are associated with a expanded significantly closer to known disease-associated genes than broader range of kidney phenotypes (excluding kidney to genes chosen by random bootstrapping (Z score #21.87, stones).6 We referred to this set as the EXPANDED panel. P value #0.02). The shorter distance between disease-causing genes was vali- To determine whether this shorter distance of our top- dated by a shorter distance between the K gene panel expanded ranked new genes was driven by a few genes or driven by at (excluding genes in K ) and the K gene panel than the fsgs fsgs least several genes, we performed “leave one out” and “leave distance between disease-causing genes and the randomly three out” analyses. As shown in Supplemental Figure 8, the picked genes (P value #0.002; Figure 6, Supplemental shorter distance was robust to these analyses, confirming that the shorter distance was driven by at least several genes ranked Table 2. Higher rank of genes from case samples ranked by missense and LOF variants among the top. Taken together, these results strongly suggest that among the variants identified in the most highly Top Genes Case Rank Sum Control Rank Sum P Value ranked genes (Supplemental Table 12, Table 3), a nontriv- 50 2212 2838 0.03 ial subset are in fact true contributors to the development 75 5044 6281 0.02 of FSGS. 100 9475 10,625 0.16 Comparison of the gene rank sum between the case and control groups by Wilcoxon rank-sum test. Equal numbers of top genes were picked from the “single-group” test of case and control samples, respectively, then these DISCUSSION genes were ranked together on the basis of the “single-group” test P value. The rank sums of genes from the case group were compared with those from the control group. Under the null hypotheses, an equal number of case and In this study, we performed exome sequencing in a large set control total rank sums is expected. of individuals diagnosed with familial or sporadic FSGS and

JASN 30: 1625–1640, 2019 Rare Gene Variants Contributions to FSGS 1635 BASIC RESEARCH www.jasn.org

Table 3. Top 20 ranked genes using rare variant burden test Order Gene No. Cases Single-Group P Value No. Controls Case-Control P Value Combined P Value 1WT1a 14 8.27E219 0 5.54E205 1.93E219 2COL4A5a 14 1.93E212 0 6.05E205 1.20E214 3TRPC6a 5 4.67E205 0 3.09E202 2.22E205 4NPHS2a 5 1.91E204 0 3.11E202 6.38E205 5 SCAF1 5 3.10E204 0 2.89E202 8.42E205 6 INF2a 7 6.71E205 2 8.01E202 1.11E204 7 SETD2 6 1.11E203 0 1.57E202 1.14E204 8 LY9 5 9.10E204 0 3.10E202 2.12E204 9 ANKRD34A 3 9.21E205 0 1.22E201 2.63E204 10 CDC16 4 8.16E204 0 6.25E202 4.63E204 11 KCNS1 4 9.75E204 0 6.21E202 5.23E204 12 SMC6 4 1.17E203 0 6.23E202 6.03E204 13 PRCP 4 1.40E203 0 6.25E202 6.90E204 14 ATP13A3 4 1.64E203 0 6.25E202 7.80E204 15 RNF115 3 4.94E204 0 1.24E201 8.32E204 16 NCAPG 4 1.84E203 0 6.16E202 8.34E204 17 MAP3K15 5 5.63E203 0 3.12E202 9.35E204 18 MAATS1 6 4.81E204 2 1.44E201 1.02E203 19 GTPBP2 3 7.08E204 0 1.23E201 1.04E203 20 ACTN4a 5 1.03E203 1 1.08E201 1.13E203 The summary statistics of the top 20 ranked genes. Genes were ordered by the combined P value from the single-group analysis and the case-control analysis using Fisher’smethod. aKnown disease-causing genes. compared this data with exome sequence data from control diagnosis of FSGS; (2) in a significant subset of these cases, the individuals. We identified rare nonsynonymous variants and genotype is of unclear significance, because genetic variants LOF variants in Mendelian disease genes that are plausible indistinguishable from putative disease-causing variants are contributors to disease in 49.6% of cases (not including found in several Mendelian disease genes in control individ- APOL1, a non-Mendelian FSGS risk gene). This is almost cer- uals at rates similar to that in cases. Even by applying stricter tainly an overestimate of the true percentage of families genetic score–based filtering according to ACMG guidelines54 explained by known disease genes, because (1) similar rare in which multiple lines of in silico mutation effect prediction variant genotypes were detected in 35.5% of controls and were used, a similar or higher rare variant burden in controls (2) for a number of known kidney disease genes (specifically was still present in several genes. This demonstrates the diffi- MYH9,FN1,SALL1,PODXL,TBX18,ROBO2,COL4A6, culty in using filtering and in silico prediction alone for judging SRGAP1, FRAS1, FAT1, PKHD1, ALMS1, DYNC2H1, the pathogenicity of rare variants in many genes reported to CUBN, AHI1, NPHP4, WDR35, WDR19, RPGRIP1L, and cause FSGS (or kidney diseases that can phenocopy FSGS). C5orf42) we identified rare variant genotypes at similar or After filtering by ACMG guidelines (and for consistency higher frequencies in controls as in cases. A more conservative with observed inheritance pattern in familial cases), the only estimate, excluding those genes with an equal or higher variant genes in which the remaining variants were limited to cases burden in controls, puts the percentage at 33.6% for cases but not controls were COL4A5, WT1, INF2, ACTN4 (but only compared with 8.3% in controls. In our analysis, a higher ex- when restricting to the known disease-associated domain), planation rate was found in familial cases: 37.2% of familial and TRPC6, under a dominant model; and NPHS2, PLCE1, cases and 29.7% of sporadic cases could be attributed to var- OFD1, NPHP3, COL4A3, C5orf42, CEP290, TTC8, TTC21B, iants in known FSGS genes. This is consistent with the notion TMEM67, PLCE1, NPHS1, LRIG2, LAMB2, ITGA8, COQ6, that familial cases are more likely to be the result of single gene and ALMS1, under a recessive model. However, attributing defects than nonfamilial cases. disease to genes with a high control variant burden, such as Ourcollection of FSGS subjects reflectsthe specificnatureof C5orf42, TTC21B, and ALMS1, should be done with extreme our group’s ascertainment over the past 20 years, and may not caution. correlate closely with genetic variant detection in the clinical The difficulty of assigning disease causality to rare variants setting. The same caveats should be applied to making clinical was also demonstrated by the large proportion of cases with decision-making on the basis of other published reports. candidate rare variants detected in more than one disease- Nevertheless, our results do suggest that (1) a rare genotype associated gene. In 14.9% (54 of 363) of familial cases or spo- consistent with an inherited form of FSGS or FSGS-like injury radiccases, we detected two or more genes with a candidate rare is present in about a third of individuals with the histologic variant genotype. After removing those genes with a similar or

1636 JASN JASN 30: 1625–1640, 2019 www.jasn.org BASIC RESEARCH

A B STRING NETWORK inBio Map

0.4 0.4

0.3 0.3 fsgs.CASE.syn fsgs.CASE.syn expanded.CASE.syn expanded.CASE.syn ty ty i i 0.2 fsgs.CTRL 0.2 fsgs.CTRL ens ens

D expanded.CTRL

D expanded.CTRL expanded.CASE fsgs.CASE 0.1 fsgs.CASE 0.1 expanded.CASE

expanded.fd.fsgsg expanded.fsgsgss

0 0

–3 –2 –10 1 2 3 –3 –2 –1 0 1 2 3 Normalized minimal distance between two gene sets Normalized minimal distance between two gene sets

Figure 6. Closer network distance of top ranked genes with known kidney disease–associated genes than a random set of genes. We computed the network distance between the 50 top-ranked genes (with known kidney disease–associated genes excluded) identified from the rare variant burden test under a dominant model and known disease-associated genes. “fsgs” refers to genes known to cause FSGS when mutated; “expanded” means an expanded set of genes that are associated with various kidney disease phenotypes. “expanded.fsgs” indicates the normalized network distance between the “fsgs” panel and the “expanded” gene panel (excluding overlaps with “fsgs” panel) as a positive control. “fsgs.CASE” indicates the network distance of the 50 top-ranked genes from case samples on the basis of rare missense variants and LOF variants compared with the FSGS panel. “expanded.CASE” indicates the distance from the top 50 genes to the “expanded” panel. “fsgs.CTRL” and “expanded.CTRL” are control analyses that indicate the network distance between the 50 top-ranked genes from the control samples chosen by the burden of rare missense variants and LOF variants and the “fsgs” or “expanded” panels. “fsgs.CASE.syn” and “expanded.CASE.syn” indicate the network distance between the 50 top-ranked genes from case samples by synonymous rare variants and the “‘fsgs” or “expanded” panels. The background distri- bution was estimated by bootstrapping 1000 times, and randomly picking 50 genes from the pool of all genes to replace the 50 top- ranked genes. (A) Network distance measured on the basis of the structure from the STRING network. (B) Network distance measured on the basis of the structure from the inBio Map network. higher burden in control samples, 5.9% (21 of 363) of the kidney disease gene, compared with 18.7% (68 of 363) in con- familial and sporadic cases still remain with more than one trols (Figure 1). Thus, a nontrivial proportion of rare variant candidate mutation. genotypes remained in control samples after application of Among the merits of this study is the analysis of case samples this strict filter. Removing high control burden genes gives together with a set of well matched controls. On the basis of the the values 24.2% and 5.5%, respectively (Figure 1). The high observation of many putative disease genes in which the rare proportion in control samples illustrates the limited help of variant burden in cases was similar to that in control samples, ACMG guidelines in this setting. Reasons for this may include we conclude that special care must be taken to assign disease incomplete penetrance of disease-causing variants and incor- causality in those genes based solely on in silico genetic pre- rect labeling of some genes as disease-causing in the existing diction. When we used a larger set of candidate genes for anal- literature. ysis, rare variant genotypes were detected in 49.6% (180 of After filtering by ACMG guidelines and removal of genes 363) of cases, although a surprisingly high 35.5% (129 of 363) with high variant burden in controls, there are still 5.5% of of control individuals showed similar rare variant genotypes. control samples with rare variants in previously reported dis- The high rare variant burden in known kidney disease genes in ease genes that appear similar to what are presumed to be controls was also highlighted recently in a large cohort with disease-causing variantsin cases. Thismaybe because ofseveral 7974 self-declared healthy adults in which such genes were reasons: (1) incomplete penetrance, (2) the existence of “neu- analyzed.55 Panels of genes for analysis in individuals with tral” or unknown significance rare variants in our genome, (3) kidney disease are being used with increasing frequency in false-positive calls from sequencing errors or unknown phas- the clinic. Our results emphasize that substantial caution ing information under a compound heterozygous model, and should be taken in both the design and interpretation of (4) limited power to filter out “neutral” rare variants and false- such panels. positive genes. This nontrivial number cannot be attributed ACMG guidelines are widely used for reducing the false- to undetected kidney phenotypes in the control, because rare positive rate in claiming a variant to be disease related. After variants were observed in cases but almost none in controls for applying ACMG guidelines in our cohort, 30.9% (112 of 363) several well established disease-causing genes, including of case families have a rare variant genotype in a plausible COL4A5, WT1, INF2, ACTN4, TRPC6, NPHS2, NPHP3,

JASN 30: 1625–1640, 2019 Rare Gene Variants Contributions to FSGS 1637 BASIC RESEARCH www.jasn.org and APOL1. Unlike other genes with very high control variant cannot at present determine which of these genes are most burdens, these genes are not “contaminated” by controls in this likely to be true contributors to disease. study. The sample with variants that are “neutral” or of un- known significance in cases is expected to be ,5.5%, because we have .30% of families in which we sequenced two or more ACKNOWLEDGMENTS members (and 17% in which we sequenced three or more mem- bers). Therefore, we are better able to filter out false-positive We thank the study participants. variants by checking cosegregation in case families. Dr. Wang, Dr. Chun, and Dr. Pollak designed the study. On the other hand, this strict filtering removed variant Ms. Wilkins, Ms. Benjamin, Ms. Knob, and Dr. Appel carried out genotypes in several case families which are likely to be in patient ascertainment and managed the DNA sequencing. Dr. Wang fact disease-causing (for example, in WT and NPHS2). There analyzed the data with the help of Dr. Chun, Dr. Genovese, Dr. Friedman, are several possible reasons for this. For example, there is dis- and Dr. Pollak. Dr. Lifton and Dr. Mane provided control sample data cordance between different genetic prediction algorithms. The and performed exome sequencing in both cases and controls. Dr. Wang, availability of increasing numbers of large next-generation Dr. Chun, and Dr. Pollak drafted and revised the paper. sequencing cohorts, together with better resolution of regional constraints in genes and protein domains, is expected to help improve the sensitivity and specificity of in silico prediction DISCLOSURES tools.56 In the case of two well known autosomal domi- nant FSGS genes, INF2 and ACTN4, the variants we found Dr. Friedman and Dr. Pollak are co-inventors on patents related to APOL1 in control samples did not localize to the well defined disease- diagnostics and therapeutics. Dr. Friedman and Dr. Pollak receive research causing domains (Figure 3A, Supplemental Table 3).3,5 This support from and have consulted for Vertex Pharmaceuticals. Dr. Friedman and Dr. Pollak have equity in Apolo1Bio. observation highlights the importance of the delineation of disease-causing domains using both functional studies and further analyses of variant localization in large case and con- FUNDING trol populations. It also suggests that even for genes that are thought to cause a Mendelian form of kidney disease, assign- This work was supported by National Institutes of Health grant ing causality to a specific variant is not simple, particularly R01DK54931 (to Dr. Pollak) and by National Institutes of Health award when the biology of disease causation is poorly understood UM1HG006504-08 (Yale Center for Mendelian Genomics, to Dr. Lifton and disease-specific domains are not known. and Dr. Mane). Dr. Chun was supported by an Alberta Innovates Health fi Solutions Clinician Fellowship and a KRESCENT (Kidney Research Scientist This study con rms the observation that mutations in type Core Education and National Training Program) postdoctoral fellowship. IV collagen genes are a frequent contributor to FSGS.11,57 It also confirms that WT1 mutations, which cause a variety of syndromes of abnormal urogenital development with SUPPLEMENTAL MATERIAL glomerulopathy, are a frequent cause of inherited FSGS.44,58,59 In addition, UMOD variants, a known cause of CKD char- This article contains the following supplemental material online at acterized by tubule dysfunction, can present with a histo- http://jasn.asnjournals.org/lookup/suppl/doi:10.1681/ASN.2019020152/-/ logic lesion of FSGS (presumably developing secondary to DCSupplemental. the tubulopathy).60 Detailed methods. Althoughwe werenot able to identify new FSGS genes in this Supplemental Figures: study, our results make us confident that our top-ranking can- Figure S1: Illustration figures for the “single-group” analysis (A) didate genes include some that do in fact contribute to FSGS and “case-control” analysis (B). risk. Specifically, in a network analysis, we find that the top Figure S2: “Single-group” burden test for extreme rare variants of 50 genes on our list (after eliminating the known genes) are dominant model, showing the statistics for known FSGS genes WT1, significantly closer to known FSGS genes as well as a longer list COL4A5, ACTN4, INF2, TRPC6, and PAX2. of kidney disease genes than they are to a randomly chosen set Figure S3: The rare variant distribution in familial and sporadic of genes. By comparing the rare variant distribution in cases families for each dominant disease-causing gene. with controls and by comparing the distribution pattern of Figure S4: The comparison of rare variant CADD and GERP score missense and LOF variants with synonymous rare variants, we between the case and control samples for those dominant genes with observe that there is enrichment of missense and LOF rare high burden in control samples. variants in case samples (Figure 5, Supplemental Table 10, Figure S5: Diagnosis age distribution for affected individuals with Table 2). We conclude, therefore, that some of these genes rare variants detected in known disease-causing genes (without genes and their variants are almost certainly contributing to FSGS. with high level of burden in control samples), partitioned by inher- However, without either analyses of larger families (e.g.,large itance model. families segregating these variants) or larger sample sizes Figure S6: Manhattan plot for the results of the association test (more individuals with familial and/or sporadic FSGS), we of the exonic common variants.

1638 JASN JASN 30: 1625–1640, 2019 www.jasn.org BASIC RESEARCH

Figure S7: The enrichment of renal glomerulus expression en- 10. Al-Romaih KI, Genovese G, Al-Mojalli H, Al-Othman S, Al-Manea H, riched genes in top ranked genes. Al-Suleiman M, et al.: Genetic diagnosis in consanguineous families Figure S8: Robustness analysis of gene set distance comparison. with kidney disease by homozygosity mapping coupled with whole- exome sequencing. Am J Kidney Dis 58: 186–195, 2011 Supplemental Tables: 11. Gast C, Pengelly RJ, Lyon M, Bunyan DJ, Seaby EG, Graham N, et al.: Table S1. Summary of study samples. Collagen (COL4A) mutations are the most frequent mutations Table S2. The gene list of kidney disease-associated genes. underlying adult focal segmental glomerulosclerosis. Nephrol Dial Table S3. The rare variants in dominant/X-linked kidney disease- Transplant 31: 961–970, 2016 associated genes. 12. Brown EJ, Pollak MR, Barua M: Genetic testing for nephrotic syndrome and FSGS in the era of next-generation sequencing. Kidney Int 85: Table S4. The rare variants in recessive (compound-heterozygous) 1030–1038, 2014 kidney disease-associated genes. 13. Andrews S: FastQC: A quality control tool for high throughput sequence Table S5. The rare variants in NPHS2 gene. data, 2010. Available at: http://www.bioinformatics.babraham.ac.uk/ Table S6. The rare variants in recessive kidney disease-associated projects/fastqc/. Accessed July X, 2019 genes. 14. Li H, Durbin R: Fast and accurate long-read alignment with Burrows- Wheeler transform. Bioinformatics 26: 589–595, 2010 Table S7: Individuals of compound heterozygous for rare APOL1 15. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, variants. et al.: A framework for variation discovery and genotyping using next- Table S8. Rare variants detected in APOL1 high risk genotype generation DNA sequencing data. Nat Genet 43: 491–498, 2011 families. 16. Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, Del Angel G, Levy- fi Table S9. Top 100 genes for moderate and rare variant burden Moonshine A, et al.: From FastQ data to high con dence variant calls: The genome analysis Toolkit best practices pipeline. Curr Protoc test. Bioinformatics 43: 11.10.1–11.10.33, 2013 Table S10. Comparison of gene rank sum between case and 17. O’Leary NA, Wright MW, Brister JR, Ciufo S, Haddad D, McVeigh R, control group, genes were ranked by synonymous sites. et al.: Reference sequence (RefSeq) database at NCBI: Current status, Table S11. The enrichment of glomerulus-expression enriched taxonomic expansion, and functional annotation. Nucleic Acids Res 44 – genes in top ranked genes. [D1]: D733 D745, 2016 18. Zhao H, Sun Z, Wang J, Huang H, Kocher JP, Wang L: CrossMap: A Table S12. Top ranked 60 genes, known genes and new genes. versatile tool for coordinate conversion between genome assemblies. Table S13. The rare variants in top ranked 60 genes (Table S12). Bioinformatics 30: 1006–1007, 2014 Table S14. FSGS disease associated genes. 19. Han L, Abney M: Identity by descent estimation with dense genome- Table S15. Network distance of top ranked new genes with known wide genotype data. Genet Epidemiol 35: 557–567, 2011 genes. 20. McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GR, Thormann A, et al.: The ensembl variant effect predictor. Genome Biol 17: 122, 2016 21. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al.: Exome Aggregation Consortium: Analysis of protein-coding – REFERENCES genetic variation in 60,706 humans. Nature 536: 285 291, 2016 22. Davydov EV, Goode DL, Sirota M, Cooper GM, Sidow A, Batzoglou S: Identifying a high fraction of the to be under selective 1. Rosenberg AZ, Kopp JB: Focal segmental glomerulosclerosis. Clin constraint using GERP++. PLOS Comput Biol 6: e1001025, 2010 JAmSocNephrol12: 502–517, 2017 23. Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J: A 2. Lovric S, Ashraf S, Tan W, Hildebrandt F: Genetic testing in steroid- general framework for estimating the relative pathogenicity of human resistant nephrotic syndrome: When and how? Nephrol Dial Transplant genetic variants. Nat Genet 46: 310–315, 2014 31: 1802–1813, 2016 24. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, 3. Kaplan JM, Kim SH, North KN, Rennke H, Correia LA, Tong HQ, et al.: Bork P, et al.: A method and server for predicting damaging mis- Mutations in ACTN4, encoding a-actinin-4, cause familial focal seg- sense mutations. Nat Methods 7: 248–249, 2010 mental glomerulosclerosis. Nat Genet 24: 251–256, 2000 25. Kumar P, Henikoff S, Ng PC: Predicting the effects of coding non- 4. Winn MP, Conlon PJ, Lynn KL, Farrington MK, Creazzo T, Hawkins AF, synonymous variants on protein function using the SIFT algorithm. Nat et al.: A mutation in the TRPC6 cation channel causes familial focal Protoc 4: 1073–1081, 2009 segmental glomerulosclerosis. Science 308: 1801–1804, 2005 26. Jagadeesh KA, Wenger AM, Berger MJ, Guturu H, Stenson PD, Cooper 5. Brown EJ, Schlöndorff JS, Becker DJ, Tsukaguchi H, Tonna SJ, DN, et al.: M-CAP eliminates a majority of variants of uncertain signifi- Uscinski AL, et al.: Mutations in the formin gene INF2 cause focal cance in clinical exomes at high sensitivity. Nat Genet 48: 1581–1586, 2016 segmental glomerulosclerosis. Nat Genet 42: 72–76, 2010 27. Chun S, Fay JC: Identification of deleterious mutations within three 6. Vivante A, Hildebrandt F: Exploring the genetic basis of early-onset human genomes. Genome Res 19: 1553–1561, 2009 chronic kidney disease. Nat Rev Nephrol 12: 133–146, 2016 28. Schwarz JM, Rödelsperger C, Schuelke M, Seelow D: MutationTaster 7. Santín S, García-Maset R, Ruíz P, Giménez I, Zamora I, Peña A, et al.: evaluates disease-causing potential of sequence alterations. Nat FSGS Spanish Study Group: Nephrin mutations cause childhood- and Methods 7: 575–576, 2010 adult-onset focal segmental glomerulosclerosis. Kidney Int 76: 1268– 29. Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, 1276, 2009 et al.: 1000 Genomes Project Consortium: A global reference for hu- 8. Stokman MF, Renkema KY, Giles RH, Schaefer F, Knoers NV, van Eerde man genetic variation. Nature 526: 68–74, 2015 AM: The expanding phenotypic spectra of kidney diseases: Insights 30. Devlin B, Roeder K: Genomic control for association studies. Biometrics from genetic studies. Nat Rev Nephrol 12: 472–483, 2016 55: 997–1004, 1999 9. Mistry K, Ireland JH, Ng RC, Henderson JM, Pollak MR: Novel mu- 31. Clayton DG, Walker NM, Smyth DJ, Pask R, Cooper JD, Maier LM, tations in NPHP4 in a consanguineous family with histological find- et al.: Population structure, differential bias and genomic control in a ings of focal segmental glomerulosclerosis. AmJKidneyDis50: large-scale, case-control association study. Nat Genet 37: 1243–1246, 855–864, 2007 2005

JASN 30: 1625–1640, 2019 Rare Gene Variants Contributions to FSGS 1639 BASIC RESEARCH www.jasn.org

32. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ: Second- 47. Fuentes Fajardo KV, Adams D, Mason CE, Sincan M, Tifft C, Toro C, generation PLINK: rising to the challenge of larger and richer datasets. et al.: NISC Comparative Sequencing Program: Detecting false-posi- Gigascience 4: 7, 2015 tive signals in exome sequencing. Hum Mutat 33: 609–613, 2012 33. Besse W, Dong K, Choi J, Punia S, Fedeles SV, Choi M, et al.: Isolated 48. Tory K, Menyhárd DK, Woerner S, Nevo F, Gribouval O, Kerti A, et al.: polycystic disease genes define effectors of polycystin-1 function. Mutation-dependent recessive inheritance of NPHS2-associated ste- JClinInvest127: 1772–1785, 2017 roid-resistant nephrotic syndrome. Nat Genet 46: 299–304, 2014 34.JinSC,HomsyJ,ZaidiS,LuQ,MortonS,DePalmaSR,etal.:Contri- 49. Genovese G, Friedman DJ, Ross MD, Lecordier L, Uzureau P, Freedman BI, bution of rare inherited and de novo variants in 2,871 congenital heart et al.: Association of trypanolytic ApoL1 variants with kidney disease in disease probands. Nat Genet 49: 1593–1601, 2017 African Americans. Science 329: 841–845, 2010 35. Fay MP, Proschan MA: Wilcoxon-Mann-Whitney or t-test? On as- 50. Tzur S, Rosset S, Shemer R, Yudkovsky G, Selig S, Tarekegn A, et al.: sumptions for hypothesis tests and multiple interpretations of decision Missense mutations in the APOL1 gene are highly associated with end rules. Stat Surv 4: 1–39, 2010 stage kidney disease risk previously attributed to the MYH9 gene. Hum 36. Ding F, Tan A, Ju W, Li X, Li S, Ding J: The prediction of Key cyto- Genet 128: 345–350, 2010 skeleton components involved in glomerular diseases based on a 51. Kopp JB, Nelson GW, Sampath K, Johnson RC, Genovese G, An P, et al.: protein-protein interaction network. PLoS One 11: e0156024, 2016 APOL1 genetic variants in focal segmental glomerulosclerosis and 37. Sedgewick R, Wayne K: Algorithms, Boston, Addison-Wesley, 2011 HIV-associated nephropathy. JAmSocNephrol22: 2129–2137, 2011 38. von Mering C, Jensen LJ, Snel B, Hooper SD, Krupp M, Foglierini M, 52. Moreau Y, Tranchevent LC: Computational tools for prioritizing can- et al.: STRING: Known and predicted protein-protein associations, in- didate genes: Boosting disease gene discovery. Nat Rev Genet 13: tegrated and transferred across organisms. Nucleic Acids Res 33: 523–536, 2012 D433–D437, 2005 53. Cowen L, Ideker T, Raphael BJ, Sharan R: Network propagation: A 39. Li T, Wernersson R, Hansen RB, Horn H, Mercer J, Slodkowicz G, et al.: A universal amplifier of genetic associations. Nat Rev Genet 18: 551–562, scored human protein-protein interaction network to catalyze genomic 2017 interpretation. Nat Methods 14: 61–64, 2017 54. Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al.: ACMG 40. Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS: A high- Laboratory Quality Assurance Committee: Standards and guidelines performance computing toolset for relatedness and principal compo- for the interpretation of sequence variants: A joint consensus recom- nent analysis of SNP data. Bioinformatics 28: 3326–3328, 2012 mendation of the American College of Medical Genetics and Geno- 41. Luca D, Ringquist S, Klei L, Lee AB, Gieger C, Wichmann HE, et al.: On mics and the Association for Molecular Pathology. Genet Med 17: the use of general control samples for genome-wide association 405–424, 2015 studies: Genetic matching highlights causal variants. Am J Hum Genet 55. Rasouly HM, Groopman EE, Heyman-Kantor R, Fasel DA, Mitrotti A, 82: 453–463, 2008 Westland R, et al.: The burden of candidate pathogenic variants for 42. Epstein MP, Duncan R, Broadaway KA, He M, Allen AS, Satten GA: kidney and genitourinary disorders emerging from exome sequencing. Stratification-score matching improves correction for confounding by AnnInternMed170: 11–21, 2019 population stratification in case-control association studies. Genet 56. Samocha KE, Kosmicki JA, Karczewski KJ, O’Donnell-Luria AH, Pierce- Epidemiol 36: 195–205, 2012 Hoffman E, MacArthur DG, et al.: Regional missense constraint im- 43. Giglio S, Provenzano A, Mazzinghi B, Becherucci F, Giunti L, Sansavini G, proves variant deleteriousness prediction [published online ahead of et al.: Heterogeneous genetic alterations in sporadic nephrotic syndrome print June 12, 2017]. bioRxiv doi:1101/148353 associate with resistance to immunosuppression. J Am Soc Nephrol 26: 57. Voskarides K, Damianou L, Neocleous V, Zouvani I, Christodoulidou S, 230–236, 2015 Hadjiconstantinou V, et al.: COL4A3/COL4A4 mutations producing 44. Trautmann A, Bodria M, Ozaltin F, Gheisari A, Melk A, Azocar M, et al.: focal segmental glomerulosclerosis and renal failure in thin basement PodoNet Consortium: Spectrum of steroid-resistant and congenital membrane nephropathy. J Am Soc Nephrol 18: 3004–3016, 2007 nephrotic syndrome in children: The PodoNet registry cohort. Clin J 58. Sadowski CE, Lovric S, Ashraf S, Pabst WL, Gee HY, Kohl S, et al.: SRNS Am Soc Nephrol 10: 592–600, 2015 Study Group: A single-gene cause in 29.5% of cases of steroid-resistant 45. Itan Y, Shang L, Boisson B, Patin E, Bolze A, Moncada-Vélez M, et al.: nephrotic syndrome. JAmSocNephrol26: 1279–1289, 2015 The human gene damage index as a gene-level approach to priori- 59. Schumacher V, Schärer K, Wühl E, Altrogge H, Bonzel KE, Guschmann tizing exome variants. Proc Natl Acad Sci U S A 112: 13615–13620, M, et al.: Spectrum of early onset nephrotic syndrome associated with 2015 WT1missensemutations.Kidney Int 53: 1594–1600, 1998 46. Shyr C, Tarailo-Graovac M, Gottlieb M, Lee JJ, van Karnebeek C, 60. Devuyst O, Pattaro C: The UMOD locus: Insights into the pathogen- Wasserman WW: FLAGS, frequently mutated genes in public exomes. esis and prognosis of kidney disease. JAmSocNephrol29: 713–726, BMC Med Genomics 7: 64, 2014 2018

AFFILIATIONS

1Division of Nephrology, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts; 2Department of Medicine, Harvard Medical School, Boston, Massachusetts; 3Program in Medical and Population Genetics, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts; 4Division of Nephrology, Department of Medicine, University of Calgary, Cumming School of Medicine, Calgary, Alberta, Canada; 5Stanley Center for Psychiatric Research, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts; 6Division of Nephrology, Department of Medicine, Columbia University College of Physicians and Surgeons, New York, New York; 7Laboratory of Human Genetics and Genomics, The Rockefeller University, New York, New York; and 8Department of Genetics, Yale University School of Medicine, New Haven, Connecticut

1640 JASN JASN 30: 1625–1640, 2019