CLINICAL RESEARCH www.jasn.org

Characterization of Coding/Noncoding Variants for SHROOM3 in Patients with CKD

Jeremy W. Prokop,1,2 Nan Cher Yeo,3 Christian Ottmann,4,5 Surya B. Chhetri,1,6 Kacie L. Florus,1 Emily J. Ross,1,7 Nadiya Sosonkina,1 Brian A. Link,8 Barry I. Freedman,9 Candice J. Coppola,6 Chris McDermott-Roe,10 Seppe Leysen,4 Lech-Gustav Milroy,4 Femke A. Meijer,4 Aron M. Geurts,10 Frank J. Rauscher III,11 Ryne Ramaker,1 Michael J. Flister,10 Howard J. Jacob,1 Eric M. Mendenhall,1,6 and Jozef Lazar1

Due to the number of contributing authors, the affiliations are listed at the end of this article.

ABSTRACT Background Interpreting genetic variants is one of the greatest challenges impeding analysis of rapidly increasing volumes of genomic data from patients. For example, SHROOM3 is an associated risk gene for CKD, yet the causative mechanism(s) of SHROOM3 allele(s) are unknown.

Methods We used our analytic pipeline that integrates genetic, computational, biochemical, CRISPR/Cas9 editing, molecular, and physiologic data to characterize coding and noncoding variants to study the human SHROOM3 risk locus for CKD.

Results We identified a novel SHROOM3 transcriptional start site, which results in a shorter isoform lacking the PDZ domain and is regulated by a common noncoding sequence variant associated with CKD (rs17319721, allele frequency: 0.35). This variant disrupted allele binding to the transcription factor TCF7L2 in podocyte cell nuclear extracts and altered transcription levels of SHROOM3 in cultured cells, potentially through the loss of repressive looping between rs17319721 and the novel start site. Although common variant mechanisms are of high utility, sequencing is beginning to identify rare variants involved in disease; therefore, we used our biophysical tools to analyze an average of 112,849 individual sequences for rare SHROOM3 missense variants, revealing 35 high-effect variants. The high-effect alleles include a coding variant (P1244L) previously associated with CKD (P=0.01, odds ratio=7.95; 95% CI, 1.53 to 41.46) that we find to be present in East Asian individuals at an allele frequency of 0.0027. We determined that P1244L attenuates the interaction of SHROOM3 with 14-3-3, suggesting alterations to the Hippo pathway, a known mediator of CKD.

Conclusions These data demonstrate multiple new SHROOM3-dependent genetic/molecular mechanisms that likely affect CKD.

J Am Soc Nephrol 29: 1525–1535, 2018. doi: https://doi.org/10.1681/ASN.2017080856

Significance Statement
Although the genetics of CKD are beginning to be deciphered, interpretation of how variants result in disease remains a challenge that is increasing as more and more genomes are being sequenced. In this paper, we use our workflow designed to assess variants to develop mechanistic insights into CKD variants, highlighting new knowledge of both common noncoding and rare coding variants within SHROOM3. The detailed knowledge gleaned for function of SHROOM3 in podocytes advances novel pathways and mechanisms for CKD.

Received August 8, 2017. Accepted January 19, 2018.
Published online ahead of print. Publication date available at www.jasn.org.
Correspondence: Dr. Jeremy W. Prokop or Dr. Jozef Lazar, HudsonAlpha Institute for Biotechnology, 601 Genome Way Northwest, Huntsville, AL 35806. E-mail: [email protected] or [email protected]
Copyright © 2018 by the American Society of Nephrology
ISSN: 1046-6673/2905-1525

The combination of current sequencing and genome-wide association studies (GWAS) has linked hundreds of human loci with CKD. Yet the majority of causative variant(s) and mechanism(s) within each locus that confer risk remain largely unknown. Thus, rapidly growing volumes of genetic data without functional validation have dramatically increased the catalog of variants that have not been categorized with respect to function. We and others postulate that the next greatest challenge faced by genetic research is to develop strategies that will rapidly characterize variants and provide insight for personalizing disease interventions.1

Here, we deployed our sequence-to-structure-to-function approach2 that integrates genetic, computational, biochemical, CRISPR/Cas9 editing, molecular, and physiologic data to characterize coding and noncoding genomic variants identified for CKD3–10 risk within SHROOM3. SHROOM3 is an actin-binding protein involved in cell shape, neural tube formation, and epithelial morphogenesis.11,12 A recent study showed that rs17319721 was associated with changes in expression of SHROOM3, and is a leading SNP increasing renal fibrosis in patients with kidney transplantation.13,14 Our group identified missense variants within Shroom3 of the FHH rat that affect normal maintenance of kidney glomerular filtration.15 In mice, genetic deletion of Shroom3 confirms its role in glomerular function and maintenance of proper podocyte morphology, with alterations of apically distributed actin.12 The apical constriction role of SHROOM3 was first documented in neurulation.16 This paper lays out mechanistic insights into both noncoding GWAS-associated common variants and rare coding variants of SHROOM3, laying out a workflow for additional GWAS LD block analysis.

METHODS

Analysis of SHROOM3 Regulation
The Roadmap Epigenomics (ref. 17) core 15-state data were viewed for SHROOM3 and sorted on the basis of expression and isoform detection. Start sites predicted for SHROOM3 were identified using SwitchGear. The LD block for the GWAS near SHROOM3 was identified using the SNAP tool18 with a 0.8 correlation. Biotin-conjugated DNA probes were used to perform LightShift Chemiluminescent Electrophoresis Mobility Shift Assays (EMSA) (ThermoFisher) as previously published.19

CRISPR/Cas9 Modification
CRISPR/Cas9 replacement of rs17319721 was performed in HEK293T cells using gRNAs following previously published conditions.20 Cells underwent clonal expansion and variants were confirmed with Sanger sequencing. Real-time qPCR was performed using the RNeasy Plus Mini kit for mRNA isolation, QuantiTect SYBR Green PCR Kit, and a QuantStudio 6 Flex Real-Time PCR system, with relative gene expression calculated using the ΔΔCT method21 with GAPDH as a reference.

Zebrafish Experiments to Test the Short Isoform of SHROOM3
Human SHROOM3 cDNA ORF (NM_020859) was purchased from OriGene Technologies and the ASD2 (ΔASD2) or PDZ (ΔPDZ) domains were removed using Phusion site-directed mutagenesis. Zebrafish coinjections were performed on one- to four-cell–stage zebrafish embryos, and the dextran clearance assay was performed as previously described.15 A p53 morpholino was coinjected to reduce off-target cell death,22 and efficiency of the morpholino has been previously published.15

P1244L Analysis
Human coding variants for SHROOM3 were pulled from the gnomAD database23 and potential functional variants were assessed using our sequence-to-structure-to-function tools.2 LATS2 phosphorylation was analyzed using multiple custom peptides and the ADP-Glo Kinase Assay. For determining Kd and solving the crystal structure of 14-3-3 with SHROOM3 (either WT or P1244L), the peptides were synthesized24 and combined with purified 14-3-3. Isothermal titration calorimetry measurements were performed with a Malvern MicroCal iTC200. Crystals were set up using 14-3-3 and SHROOM3 peptide dissolved in crystallization buffer and mixed in a 1:1 stoichiometry for sitting-drop crystallization; crystals were captured after 10 days of incubation at 4°C; and diffraction data were collected at the 306SA/PXI beamline.

Figure 1. Analysis of SHROOM3 data from Roadmap Epigenetics. Core 15-state model for multiple human tissue types for SHROOM3 gene. Colors indicate the predicted states: red=active TSS, orange red=flanking active TSS, green=transcript, yellow=enhancer, gray=repressed polycomb, white=quiescent. All 15 colors for each state can be found at http://egg2.wustl.edu/roadmap/web_portal/chr_state_learning.html#core_15state. Three active TSSs (labeled on top), resulting in three isoforms for the SHROOM3 gene. Neural tissue is boxed in blue, fetal kidney in red, and adrenal gland in green. The bottom of the figure is the zoomed in view of the CKD-associated LD block of SNPs associated in GWAS showing a breakdown of the 15-state model, 25-state model, DNase hypersensitivity, H3kme1, H3kme3, vertebrate conservation, human GWAS lead SNPs, and the HapMap CEU Utah LD analysis. The red intensity shows the correlation of any two points (red is highest correlation) on the chromosome for coinheritance of genetic variants, such that the point of the triangle is the correlation of the two edges of the base. CEU, Utah Residents with Northern and Western European Ancestry; LD, linkage disequilibrium; SNP, single nucleotide polymorphism.

Figure 2. Nuclear interactions with the SHROOM3 CKD noncoding associated SNPs. (A) EMSA using probes with minor (A) or major (G) alleles of rs17319721 and nuclear lysates from HEK293 and the three primary kidney cell nuclear extracts (endothelial, tubule, and podocyte). Shown to the side of the representative EMSA is the quantification of free probe and shifted probe from three separate replicates. The stars are sites significantly different from the control. (B) EMSA assays of rs17319721 using recombinant TCF7L2 and FOXO1 compared with the shifting seen by the podocyte nuclear lysate. The bands for TCF7L2 are identified in blue and those for FOXO1 in magenta with the quantification of three separate experiments shown below the representative EMSA. An additional scrambled probe was used for the EMSA where eight bases were mutated in the middle of the probe. (C) Models for both TCF7L2 (blue) and FOXO1 (magenta) interacting with DNA near variant rs17319721. DNA is shown in gray with the variant location colored in red. Nuc, nuclear.

Figure 3. Demonstrating expression changes in SHROOM3 by allelic specific changes in rs17319721. (A) Transcripts of the SHROOM3 gene with primer combinations designed to identify isoform 1 using primers on exons 2 and 4 (E2+E4) or all isoforms (exons 8+9, E8+E9). (B) RT-PCR of exon primer pairs showing expression of full-length isoform (E2E4) only in RPMI8226 and not HEK293T cells, whereas all other exon pairs can be seen in all samples. (C) Using a donor sequence after CRISPR/Cas9 editing, the wild-type G allele (WT) at rs17319721 converted into a homozygous A allele (Mut) in HEK293T cells confirmed by Sanger sequencing. (D) Expression of short (E8E9) forms of SHROOM3 were seen (larger change of shorter isoforms) to elevate after the generation of the A allele (Mut) of rs17319721. Expression of the genes that flank SHROOM3, SEPT11 and FAM47E, was not changed. Error bars for all represent the SEM with significance (*) determined as P<0.05. SNP, single nucleotide polymorphism.

RESULTS

Transcript Characterization of SHROOM3 Isoforms in Human and Mouse Tissues
Although the SHROOM3 locus has been implicated in CKD,3–10 the molecular mechanisms that affect CKD through SHROOM3 are unknown. Data from the human Roadmap Epigenomics Project (ref. 17) were scanned for transcriptional regulation surrounding SHROOM3 (Figure 1). Analysis of the core 15-state model25 suggests that there is active SHROOM3 transcription (green) in most tissue types. However, the transcriptional state (green, Figure 1) is of various sizes and can be binned into three isoforms on the basis of different transcriptional start sites (TSSs: TSS1 at chr4:77356253, TSS2 at chr4:77507377, and TSS3 at chr4:77610545). Neuronal cells contain active histone marks suggestive of a full-length isoform with a promoter (red, Figure 1) annotated at TSS1 (blue box, Figure 1). Tissues such as fetal kidney (red box, Figure 1) and fetal adrenal gland (green box, Figure 1) have a shorter isoform (isoform 2) with transcription starting between exons two and three (of long isoform 1), with TSS2 and TSS3 identified as promoter state (red, Figure 1). Analysis of frequency of reads from RNAseq for the Human Roadmap Epigenomics Program (Supplemental Figure 1), the Human Protein Atlas (Supplemental Table 2), and 1830 FANTOM CAGE datasets (Supplemental Figure 2) shows strong correlation with data from the core 15-state model for a full-length isoform in neuronal cells (isoform 1) and a shorter isoform found in most other tissues including kidney (isoform 2/3). Sixty RNAseq datasets generated from different cell types (Supplemental Figure 3) were analyzed for reads that map to SHROOM3, showing peaks of transcription start at TSS2/3 in kidney and TSS1 in brain. Analysis of RNAseq from isolated mouse glomeruli and 20 single cell RNAseq runs from podocytes shows a similar short isoform (NM_001077595) that is not present in cells from tubule. HEK293 cells, used for variant analysis, show a similar isoform to podocytes but with lower levels of transcripts. Breaking down individual glomeruli and podocyte datasets with SHROOM3 expression, a fourth novel mouse isoform was mapped that results in a 45.9 kD protein (Supplemental Figure 4) that was confirmed in addition to isoform 2/3 protein in immunoprecipitations of primary podocytes (Supplemental Figure 5). The CKD-associated LD block was identified (zoomed in bottom, Figure 1) after TSS1 and is located approximately 100 kb upstream of TSS2/3. The implication of the shorter isoform in kidney is relevant to CKD because early papers claimed the associated SNP was intronic,13 which changes how one interprets the biologic role of SNPs.

Analysis of LD for rs17319721 identified 11 noncoding SNPs with 0.8 correlation in the Utah Residents with Northern and Western European Ancestry population, four of which have been associated with CKD. Using RegulomeDB26 to prioritize these 11 SNPs, only rs17319721 is predicted to have high effect with high conservation, DNase hypersensitivity (bottom, Figure 1), and prediction to alter TCF7L2 or FOXO1 binding. Utilizing publicly available HiChIP data,27 we show that in a cell line, GM12878 (with no detectable levels of SHROOM3), chromatin contacts exist between areas near rs17319721 and TSS2 (Supplemental Figure 6), suggesting a looping structure potentially involved in gene repression. Analyses of current ENCODE ChIP-Seq data for transcription factor (TF) binding near rs17319721 and TSS2 reveal eight factors shared between both sites (Supplemental Figure 7), including association of TCF7L2 to TSS2 and association of linker protein CTCF at both sites. A network analysis of HEK293 cells for TFs near TSS2 shows association of TCF7L2 and also multiple TFs for gene repression such as TRIM28/KAP1 (Supplemental Figure 8). These data suggest that rs17319721 is potentially the functional SNP for CKD as a regulatory site for TSS2 within the shorter isoform of SHROOM3.
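The 0.8 correlation threshold used to define the LD block can be illustrated with a short sketch. This is not the SNAP tool or the pipeline used in this study; it assumes the correlation is the standard pairwise r² computed from phased two-SNP haplotype counts. The names `HaplotypeCounts`, `ld_r_squared`, and `proxies`, and any counts supplied to them, are hypothetical and for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class HaplotypeCounts:
    """Counts of the four two-SNP haplotypes (hypothetical container)."""
    n_AB: int  # allele A at the lead SNP, allele B at the candidate SNP
    n_Ab: int
    n_aB: int
    n_ab: int


def ld_r_squared(h: HaplotypeCounts) -> float:
    """Return r^2 = D^2 / (pA(1-pA) pB(1-pB)), where D = pAB - pA*pB."""
    total = h.n_AB + h.n_Ab + h.n_aB + h.n_ab
    if total == 0:
        raise ValueError("no haplotypes observed")
    p_AB = h.n_AB / total
    p_A = (h.n_AB + h.n_Ab) / total
    p_B = (h.n_AB + h.n_aB) / total
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("a SNP is monomorphic; r^2 is undefined")
    d = p_AB - p_A * p_B
    return d * d / denom


def proxies(pairs: dict[str, HaplotypeCounts], threshold: float = 0.8) -> list[str]:
    """Return candidate SNP names whose r^2 with the lead SNP meets the threshold."""
    return sorted(name for name, h in pairs.items() if ld_r_squared(h) >= threshold)
```

Under these assumptions, a candidate SNP whose alleles are always coinherited with the lead SNP gives r² of 1 and is retained, whereas a SNP inherited independently of the lead SNP gives r² of 0 and is excluded from the block.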


Figure 4. Kidney functionality of the shortened SHROOM3 protein. (A) Schematic of the full-length human SHROOM3, PDZ, and ASD2 deletion constructs. The protein resulting from isoform 2/3 lacks the PDZ domain relative to isoform 1. A control construct removing the ASD2 domain was also designed. Isoform 2/3 starts protein production at the conserved MM site (amino acid 177 of full-length SHROOM3). (B) Evolutionary analysis of SHROOM3 and the highly homologous gene SHROOM2 identifying functional domains and linear motifs. Previously unpublished motifs are seen conserved in SHROOM2 and SHROOM3 (magenta). (C) Representative images of zebrafish dorsal aorta at 1, 24, and 48 hours after injection of 70 kD FITC dextran. (D) Coinjection of shroom3+tp53 MO with SHROOM3ΔPDZ but not SHROOM3ΔASD2 mRNA rescued dextran leakage induced by the MO (n=15, 9, 9, and 9, respectively; *P<0.05 versus uninjected). hpi, hours post injection; MM, double methionine start site; MO, morpholino.

EMSA to Map Functional Noncoding Variants
To confirm the potential role of the noncoding rs17319721, we performed biochemical TF interaction tests using EMSA, identifying only two of the LD SNPs to have potential altered binding (Supplemental Figure 9). Taking nuclear extract from multiple cell lines with variable expression of SHROOM3, we show a strong shifted band for rs17319721 that can be cold-outcompeted (Supplemental Figure 9). Nuclear extracts were taken from three primary kidney cell types (glomerular endothelial, tubule, and podocyte) and the HEK293 cell line. These extracts were mixed with rs17319721 DNA probes containing either major (G) or minor (A) alleles, and EMSA was performed in three independent experiments (Figure 2A). Podocytes had the largest and most significant shift between major and minor alleles for rs17319721, with no changes in binding observed in HEK293 or glomerular endothelial extracts, and some relative loss of binding from tubule nuclear extracts. Because rs17319721 falls between FOXO1 and TCF7L2 binding motifs13 (Figure 2C), we evaluated the ability of recombinant protein for each to compete with shifts induced by podocyte nuclear extracts. TCF7L2 protein shifts the probe to the position of the lower band (Figure 2A) whereas FOXO1 shifts to the higher band, both of which were present in podocyte nuclear lysate shifts (Figure 2B). The lower band shift by TCF7L2 is altered by the minor allele to the same level as an 8-bp scramble of the locus, whereas the FOXO1 shift is altered less by the minor allele than by the scramble element. Of note, lack of complete loss of binding by TCF7L2 is not a surprise because it is a minor groove binding protein, with high nonspecific DNA interaction. This suggests that within podocytes rs17319721 changes the binding of TCF7L2, a factor with known human CKD effect28 and suggested podocyte function.29

Modification of LD Block and rs17319721 Using CRISPR/Cas9
To test rs17319721 in a cellular model for SHROOM3 regulation, we used CRISPR/Cas9 to generate the homozygous A allele in HEK293T cells (Figure 3A). HEK293T cells were used as a model for kidney because this cell line expresses the short isoform of SHROOM3 and, as a stable cell line, facilitated gene editing. HEK293T cells homozygous for the A allele at rs17319721 were confirmed by Sanger sequencing (Figure 3C). The A allele variant resulted in elevated SHROOM3 expression of the short isoform relative to GAPDH, without changing expression of two flanking genes, SEPT11 and FAM47E (Figure 3D), and still did not produce any detectable long isoform (Figure 3B). To confirm the role of the LD block in TSS2 regulation, we removed the entire LD block from HEK293 cells, which also resulted in an elevation of the short and not the long isoform of SHROOM3. The number of modified alleles in these HEK293 cells correlates with the expression change in SHROOM3 (Supplemental Figure 10). This study demonstrates that the minor allele (A variant) can significantly increase expression of a short isoform of SHROOM3.

Functionality of the Shorter SHROOM3 Isoform in Kidney
With strong evidence that the shorter SHROOM3 isoform in kidney is altered by the minor allele of rs17319721 in the CKD GWAS LD block, we evaluated the resulting protein. The three isoforms of human transcripts result in two SHROOM3 proteins, with isoform 1 being the longest and isoform 2/3 sharing the same initiator methionine found as an MM sequence at amino acid 177 of full-length isoform 1 (Figure 4A). A fourth isoform resulting from an unknown TSS or splicing has also been identified that results in a 45.9 kD protein (Supplemental Figures 4 and 5); however, little is currently known about the functionality of this isoform 4 besides the fact that it lacks both the PDZ and ASD1 regions. Isoform 1 is the longest protein, containing PDZ, ASD1, and ASD2 domains. The second protein contains all of the same domains except the PDZ (Figure 4A), similar to previously identified mouse shrmS.11 Our sequence-to-structure-to-function approach2 was used to assess conservation in multiple sequences for full-length SHROOM3 protein (Supplemental Figures 11 and 12). To validate our prediction of the potential functional regions of SHROOM3, we ran the same analysis for the highly homologous SHROOM2 (Figure 4B). The PDZ domain of SHROOM3 is highly conserved and has been suggested to be a critical functional region in neural biology,11,30 but it is not known what effect the lack of the PDZ domain in isoform 2/3 has on functional protein within kidney.

To test functionality, we created a human SHROOM3 mutant allele lacking either ASD2 (ΔASD2) or PDZ (ΔPDZ, podocyte isoform 2) (Figure 4A) and tested these alleles using a zebrafish assay for renal function. Coinjection of a shroom3+tp53 morpholino with SHROOM3ΔPDZ mRNA successfully prevented the dextran leakage phenotype caused by injection of the morpholino alone, similar to isoform 1, whereas SHROOM3ΔASD2 mRNA did not (Figure 4, C and D). This suggests that isoforms 1 and 2/3 both have functional capacity in kidney. Therefore, the shorter isoform 2/3 identified in podocytes is sufficient for kidney function.

Rare Variants in Human Populations for SHROOM3
Given the evidence that commonly inherited variants within regulatory regions of SHROOM3 may be functional in GWAS, we set out to look for rare coding variants that also could contribute to CKD but are present at lower frequencies than needed for genome-wide significance. This analysis is important to this paper for two reasons: (1) identification of rare missense variants within a gene near GWAS can serve as a secondary independent confirmation of disease linkage, thus allowing for the breakdown of complex CKD GWAS LD blocks; and (2) discovery of novel SHROOM3 mechanisms through rare disease-associated variants in unknown regions of the SHROOM3 protein. We queried all of the human SHROOM3 variants identified in the gnomAD database23 (covering an average of 112,849 individual whole genomes or exomes) and identified 2237 variants in SHROOM3 nucleotides (Supplemental Table 3). Of these variants, 1043 were missense (Figure 5A). Each variant was run through PolyPhen2, with 518 of 1043 (49.66%) identified as benign, 195 of 1043 (18.70%) as possibly damaging, and 330 of 1043 (31.64%) as "probably damaging." To prioritize the "probably damaging" variants, we created an effect score by multiplying the allele counts, codon selection score, and 21-codon sliding window score for each variant, identifying 35 variants with an effect score of at least 100 (red box, Figure 5A; details provided in Supplemental Table 4). It should be noted that our evolutionary metrics used in this paper add codon selection, detection of conserved linear motifs, and allele frequencies to outputs of PolyPhen2 prediction status.

When we evaluated the distribution of these 35 variants within specific population cohorts (Supplemental Figure 13, Supplemental Table 5), a specific variant, G186R, within South Asian individuals (allele frequency of 5.09%) had the second highest effect score (14,902). Within the East Asian population, P1244L was identified with an allele frequency of 0.27% (50 of 18,674 alleles) (blue box, Figure 5A). We have shown that P1244L is associated with CKD (P=0.01, OR=7.95) and that the mutation is unable to recover the SHROOM3 dextran zebrafish kidney assays, suggesting a loss of function in animal studies.15 However, no cellular mechanism has been put forth for this variant's loss of function.

Biophysical Insights into SHROOM3 P1244L CKD-Associated Variant
A deeper analysis of P1244 revealed it to be highly conserved in 125 species that have SHROOM3, and it is also conserved in 128 species within a homologous region of SHROOM2 (Figure 5B). The site is within a predicted linear motif for 14-3-3 interaction (HVRSRSSP), which must be phosphorylated for binding. However, this site within SHROOM3 has not been previously described despite its high conservation. This motif has a highly conserved histidine, which we predict is a substrate for LATS1/2 kinase–mediated phosphorylation, thus regulating the 14-3-3 binding motif. To test this hypothesis, we conducted an in vitro LATS2 phosphorylation assay. Both the WT and P1244L variant could be phosphorylated by LATS2 in a concentration-dependent mode as expected on the basis of the variant location (Figure 5C), whereas controls were not. Given that P1244L can still be phosphorylated, the 14-3-3 interaction kinetics are the next question to address. We solved the crystal structures of complexes of WT or P1244L mutant with 14-3-3 (PDB: 6FBB/6FCP, Table 1; Figure 5D) at high resolution (1.30 and 1.45 Å, respectively), allowing coordination of water to be visualized. A magnified view of these structures showed proline at 1244 to be well packed with 14-3-3, whereas leucine at 1244 lacks multiple contacts (Figure 5E). The decrease in interaction with 14-3-3 resulting from P1244L also results in an increase in water coordination (Figure 5F), strongly suggesting there is reduced binding to 14-3-3. Molecular dynamic simulations of these two crystal structures confirmed P1244L destabilizes contacts with 14-3-3 (Figure 5G). Further validation using isothermal titration calorimetry and affinity capture showed that the P1244L variant decreases binding to 14-3-3 (Supplemental Figure 14). Therefore, we have identified a previously unpublished conserved site within SHROOM3 for LATS1/2 phosphorylation and 14-3-3 interaction that contains a CKD-associated variant that alters the interaction between SHROOM3 and 14-3-3. In affinity capture experiments in vitro for 14-3-3 β, SHROOM3 was one of many binding proteins,31 further supporting an in vitro/vivo interaction of the two proteins. Given the role of LATS1/2 and 14-3-3 in renal disease,32,33 these results create a better understanding of how SHROOM3 might contribute to CKD.

Figure 5. Biochemistry of the CKD-associated P1244L SHROOM3 variant. (A) Within gnomAD are 1043 missense variants, with 35 having an effect score >100 (red box). The variant our group previously identified in association with CKD,15 P1244L (OR 7.95; 95% CI 1.53 to 41.46, P=0.01), is found in East Asian individuals within gnomAD (blue box). (B) Evolutionary analysis of the P1244 location (red) showing conserved sites for 14-3-3 binding (gray) and LATS1/2 kinase recognition (yellow). The site is also conserved in the SHROOM2 protein. (C) LATS2 kinase assay on peptides of SHROOM3 containing multiple mutations including the P1244L CKD variant. Wild-type SHROOM3 (black) and SHROOM3 P1244L (red) were both phosphorylated. Removal of the histidine (SHROOM3 H123A), no peptide, and scrambled (SHROOM3 SS1241AA) all failed to phosphorylate. (D) Crystal structure of 14-3-3 (gray) interacting with either SHROOM3 WT (cyan) or P1244L (red). (E) Final electron density map (2Fo-Fc, contoured at 1σ) of the WT (top) or P1244L variant showing detailed changes to the binding pocket, particularly around the P to L change, with no modification for the binding of the phosphorylated Ser. (F) Water coordination is increased in the crystal structure for P1244L relative to WT. (G) Molecular dynamic simulations were performed for 125 nanoseconds on both the WT and P1244L SHROOM3–14-3-3 structures using the AMBER03 force field. All amino acids interacted the same except for the one at position 1244. L, leucine; P, proline; WT, wild type.

Table 1. Statistics for crystal structure

Statistic | Wild-Type | P1244L
Data collection | |
PDB accession codes | 6FBB | 6FCP
Wavelength, Å | 1.0 | 1.0
Resolution, Å | 45.63–1.30 (1.32–1.30) | 45.55–1.45 (1.48–1.45)
Space group | C2221 | C2221
Cell parameter a, Å | 82.53 | 82.53
Cell parameter b, Å | 112.24 | 112.18
Cell parameter c, Å | 62.72 | 62.53
CC1/2, % (a,b) | 99.8 (87.3) | 99.9 (90.9)
Rsym, % (a,c) | 8.7 (76.8) | 6.8 (65.6)
Rmeas, % (a,d,e) | 9.5 (84.2) | 7.4 (71.2)
Average I/σ(I) (a) | 15.9 (3.2) | 21.1 (4.1)
Completeness, % (a) | 100 (100) | 99.6 (97.3)
No. of unique reflections (a) | 71785 (3534) | 51510 (2541)
Redundancy (a) | 12.7 (11.9) | 12.9 (12.9)
Wilson B-factor, Å2 | 11.6 | 12.6
Refinement | |
Number of nonhydrogen protein/solvent atoms | 2008/304 | 2027/400
Rwork, % | 15.41 | 13.32
Rfree, % | 16.64 | 15.90
No. of reflections in the "free" set | 3513 | 2554
R.m.s. deviations from ideal values: bond lengths (Å)/bond angles (°) | 0.009/1.2 | 0.008/1.17
Average protein/solvent B-factor, Å2 | 18.6/31.2 | 17.4/33.7
Ramachandran plot: favored/outlier residues, % | 98.8/0.0 | 99.0/0.0

Multiple statistics (first column) for solved crystal structures of Wild-Type SHROOM3 (middle column) or P1244L (last column) when interacting with 14-3-3. PDB, Protein Data Bank; I/σ(I), signal to noise of intensity values; r.m.s., root mean square.
(a) Number in parentheses is for the highest resolution shell used in the refinement.
(b) CC1/2 = Pearson's intradataset correlation coefficient, as described by Karplus and Diederichs.43
(c) $R_{sym} = \sum_h \sum_l |I_{hl} - \langle I_h \rangle| / \sum_h \sum_l \langle I_h \rangle$, where $I_{hl}$ is the intensity of the $l$th observation of reflection $h$ and $\langle I_h \rangle$ is the average intensity of reflection $h$.
(d) $R_{meas} = \sum_h \left[ \sqrt{n_h/(n_h - 1)} \sum_l |I_{hl} - \langle I_h \rangle| \right] / \sum_h \sum_l \langle I_h \rangle$, where $n_h$ is the number of observations of reflection $h$.
(e) Correlation of experimental intensities with intensities calculated from refined model, as described by Karplus and Diederichs.43

Figure 6. Schematic of potential mechanism for SHROOM3 rs17319721 TSS2 regulation proposed in this study for individuals (A) without or (B) with rs17319721. LD, linkage disequilibrium.

DISCUSSION

As sequencing expands for clinical diagnosis and to guide treatment, the list of variants will expand for genes already associated with disease, as well as genes not previously connected to pathogenicity. We applied our sequence-to-structure-to-function approach for variants of SHROOM3 in both protein coding and noncoding regions to assign preliminary function to these changes. In this process, we gained considerable insight into SHROOM3 and its role in CKD. This work is not limited to SHROOM3; the strategies can be applied to nearly any GWAS LD block, allowing for the breakdown of more complex multigene loci using a combination of common GWAS and rare missense variants within each gene.

Broadly, the network of factors found in this study suggests an enrichment of apoptotic signaling pathways and Hippo signaling for SHROOM3 (including FOXO1, TCF7L2, LATS1/2, and 14-3-3 proteins). The Hippo pathway has previously been associated with kidney function34–37 and is a known repressor of the Wnt/β-catenin signaling system,38 and regulation at the noncoding variant rs17319721 by TCF7L2 is Wnt/β-catenin dependent.13 In evaluating noncoding variants of SHROOM3, we determined that there are multiple isoforms predicted to encode three proteins, two that lack the PDZ domain. Public databases revealed this shorter isoform is expressed in kidney and more specifically podocytes. Isoforms of Shroom in Drosophila melanogaster are known to change cellular localization and thus actomyosin networks and cellular morphology.39 In mouse, Shroom3 isoforms lacking the PDZ domain were originally identified by gene trap, resulting in the same sized protein as TSS2-derived human SHROOM3 protein.11 We show using ENCODE datasets that the rs17319721 region loops back on TSS2 in cells that do not express SHROOM3. More importantly, both sites have CTCF association, a factor known to serve a looping role in insulating transcription.40 The HEK293 TF binding suggests factors such as TCF7L2 serve a repression role at TSS2, with factors such as HDAC2 (ref. 41) and TRIM28/KAP1 (ref. 42) bound near TSS2 known to associate with transcriptional repression. The rs17319721 A allele decreases TCF7L2 binding and thus possibly results in the loss of looping with TSS2 that results in TSS2 transcriptional activation (Figure 6). To our knowledge this is the first time a single noncoding SNP has been integrated into a genome using CRISPR/Cas9 and shown to change gene regulation with effect on CKD, highlighting the future potential of precision genome editing in deciphering complex disease mechanisms. In conclusion, we show for the first time that variants in SHROOM3 that alter gene expression and protein function have the ability to affect CKD, laying a foundation for future CKD and disease mechanistic insights.

ACKNOWLEDGMENTS

This work was supported by National Institutes of Health (NIH)–K01ES025435 (to J.W.P.), NIH-R01HL069321 (to H.J.J.), NIH-R01EY014167 (to B.A.L.), the Collaborative Research Centre 1093 through the Deutsche Forschungsgemeinschaft (to C.O.), and HudsonAlpha Institute for Biotechnology.

J.W.P., N.C.Y., B.I.F., F.J.R., M.J.F., H.J.J., E.M.M., and J.L. designed the studies; J.W.P., N.C.Y., C.O., B.A.L., S.B.C., C.M.-R., A.M.G., K.L.F., E.J.R., N.S., S.L., L.-G.M., F.A.M., C.J.C., and R.R. carried out the experiments and analyzed the data; J.W.P. and N.C.Y. made the figures; J.W.P., N.C.Y., H.J.J., E.M.M., and J.L. drafted and revised the manuscript; all authors approved the final version of the manuscript.

J Am Soc Nephrol 29: 1525–1535, 2018

DISCLOSURES

None.

REFERENCES

1. Jacob HJ: Next-generation sequencing for clinical diagnostics. N Engl J Med 369: 1557–1558, 2013
2. Prokop JW, Lazar J, Crapitto G, Smith DC, Worthey EA, Jacob HJ: Molecular modeling in the age of clinical genomics, the enterprise of the next generation. J Mol Model 23: 75, 2017
3. Köttgen A, Glazer NL, Dehghan A, Hwang S-J, Katz R, Li M, et al.: Multiple loci associated with indices of renal function and chronic kidney disease. Nat Genet 41: 712–717, 2009
4. Meyer TE, Verwoert GC, Hwang S-J, Glazer NL, Smith AV, van Rooij FJA, et al.; Genetic Factors for Osteoporosis Consortium; Meta Analysis of Glucose and Insulin Related Traits Consortium: Genome-wide association studies of serum magnesium, potassium, and sodium concentrations identify six Loci influencing serum magnesium levels. PLoS Genet 6: e1001045, 2010
5. Köttgen A, Pattaro C, Böger CA, Fuchsberger C, Olden M, Glazer NL, et al.: New loci associated with kidney function and chronic kidney disease. Nat Genet 42: 376–384, 2010
6. Chambers JC, Zhang W, Lord GM, van der Harst P, Lawlor DA, Sehmi JS, et al.: Genetic loci influencing kidney function and chronic kidney disease. Nat Genet 42: 373–375, 2010
7. Pattaro C, De Grandi A, Vitart V, Hayward C, Franke A, Aulchenko YS, et al.; EUROSPAN consortium: A meta-analysis of genome-wide data from five European isolates reveals an association of COL22A1, SYT1, and GABRR2 with serum creatinine level. BMC Med Genet 11: 41, 2010
8. Böger CA, Gorski M, Li M, Hoffmann MM, Huang C, Yang Q, et al.; CKDGen Consortium: Association of eGFR-related loci identified by GWAS with incident CKD and ESRD. PLoS Genet 7: e1002292, 2011
9. Ellis JW, Chen M-H, Foster MC, Liu C-T, Larson MG, de Boer I, et al.; CKDGen Consortium; CARe Renal Consortium: Validated SNPs for eGFR and their associations with albuminuria. Hum Mol Genet 21: 3293–3298, 2012
10. Okada Y, Sim X, Go MJ, Wu J-Y, Gu D, Takeuchi F, et al.; KidneyGen Consortium; CKDGen Consortium; GUGC consortium: Meta-analysis identifies multiple loci associated with kidney function-related traits in east Asian populations. Nat Genet 44: 904–909, 2012
11. Hildebrand JD, Soriano P: Shroom, a PDZ domain-containing actin-binding protein, is required for neural tube morphogenesis in mice. Cell 99: 485–497, 1999
12. Khalili H, Sull A, Sarin S, Boivin FJ, Halabi R, Svajger B, et al.: Developmental origins for kidney disease due to shroom3 deficiency. J Am Soc Nephrol 27: 2965–2973, 2016
13. Menon MC, Chuang PY, Li Z, Wei C, Zhang W, Luan Y, et al.: Intronic locus determines SHROOM3 expression and potentiates renal allograft fibrosis. J Clin Invest 125: 208–221, 2015
14. Yan L, Li Y, Tang J-T, An Y-F, Luo L-M, Dai B, et al.: The influence of living donor SHROOM3 and ABCB1 genetic variants on renal function after kidney transplantation. Pharmacogenet Genomics 27: 19–26, 2017
15. Yeo NC, O’Meara CC, Bonomo JA, Veth KN, Tomar R, Flister MJ, et al.: Shroom3 contributes to the maintenance of the glomerular filtration barrier integrity. Genome Res 25: 57–65, 2015
16. Haigo SL, Hildebrand JD, Harland RM, Wallingford JB: Shroom induces apical constriction and is required for hingepoint formation during neural tube closure. Curr Biol 13: 2125–2137, 2003
17. Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, et al.; Roadmap Epigenomics Consortium: Integrative analysis of 111 reference human epigenomes. Nature 518: 317–330, 2015
18. Johnson AD, Handsaker RE, Pulit SL, Nizzari MM, O’Donnell CJ, de Bakker PIW: SNAP: A web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinformatics 24: 2938–2939, 2008
19. Araujo FC, Milsted A, Watanabe IK, Puerto HL, Santos RA, Lazar J, et al.: Similarities and differences of X and Y homologous genes, SRY and SOX3, in regulating the renin-angiotensin system promoters. Physiol Genomics 47: 177–186, 2015
20. Ran FA, Hsu PD, Wright J, Agarwala V, Scott DA, Zhang F: Genome engineering using the CRISPR-Cas9 system. Nat Protoc 8: 2281–2308, 2013
21. Livak KJ, Schmittgen TD: Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods 25: 402–408, 2001
22. Robu ME, Larson JD, Nasevicius A, Beiraghi S, Brenner C, Farber SA, et al.: p53 activation by knockdown technologies. PLoS Genet 3: e78, 2007
23. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al.; Exome Aggregation Consortium: Analysis of protein-coding genetic variation in 60,706 humans. Nature 536: 285–291, 2016
24. Chan WC, White PD: Fmoc Solid Phase Peptide Synthesis: A Practical Approach, New York, Oxford University Press, 2000
25. Ernst J, Kellis M: ChromHMM: Automating chromatin-state discovery and characterization. Nat Methods 9: 215–216, 2012
26. Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M, et al.: Annotation of functional variation in personal genomes using RegulomeDB. Genome Res 22: 1790–1797, 2012
27. Mumbach MR, Rubin AJ, Flynn RA, Dai C, Khavari PA, Greenleaf WJ, et al.: HiChIP: Efficient and sensitive analysis of protein-directed genome architecture. Nat Methods 13: 919–922, 2016
28. Köttgen A, Hwang S-J, Rampersaud E, Coresh J, North KE, Pankow JS, et al.: TCF7L2 variants associate with CKD progression and renal function in population-based cohorts. J Am Soc Nephrol 19: 1989–1999, 2008
29. Kato H, Gruenwald A, Suh JH, Miner JH, Barisoni-Thomas L, Taketo MM, et al.: Wnt/β-catenin pathway in podocytes integrates cell adhesion, differentiation, and survival. J Biol Chem 286: 26003–26015, 2011
30. Chu C-W, Gerstenzang E, Ossipova O, Sokol SY: Lulu regulates Shroom-induced apical constriction during neural tube closure. PLoS One 8: e81854, 2013
31. Couzens AL, Knight JDR, Kean MJ, Teo G, Weiss A, Dunham WH, et al.: Protein interaction network of the mammalian Hippo pathway reveals mechanisms of kinase-phosphatase interactions. Sci Signal 6: rs15, 2013
32. McNeill H, Reginensi A: Lats1/2 regulate Yap/Taz to control nephron progenitor epithelialization and inhibit myofibroblast formation. J Am Soc Nephrol 28: 852–861, 2017
33. Faul C, Donnelly M, Merscher-Gomez S, Chang YH, Franz S, Delfgaauw J, et al.: The actin cytoskeleton of kidney podocytes is a direct target of the antiproteinuric effect of cyclosporine A. Nat Med 14: 931–938, 2008
34. Happé H, van der Wal AM, Leonhard WN, Kunnen SJ, Breuning MH, de Heer E, et al.: Altered Hippo signalling in polycystic kidney disease. J Pathol 224: 133–142, 2011
35. Seo E, Kim W-Y, Hur J, Kim H, Nam SA, Choi A, et al.: The Hippo-Salvador signaling pathway regulates renal tubulointerstitial fibrosis. Sci Rep 6: 31931, 2016
36. Wong JS, Meliambro K, Ray J, Campbell KN: Hippo signaling in the kidney: The good and the bad. Am J Physiol Renal Physiol 311: F241–F248, 2016
37. Chen J, Harris RC: Interaction of the EGF receptor and the hippo pathway in the diabetic kidney. J Am Soc Nephrol 27: 1689–1700, 2016
38. Varelas X, Miller BW, Sopko R, Song S, Gregorieff A, Fellouse FA, et al.: The Hippo pathway regulates Wnt/beta-catenin signaling. Dev Cell 18: 579–591, 2010


39. Bolinger C, Zasadil L, Rizaldy R, Hildebrand JD: Specific isoforms of drosophila shroom define spatial requirements for the induction of apical constriction. Dev Dyn 239: 2078–2093, 2010
40. Holwerda SJB, de Laat W: CTCF: The protein, the binding partners, the binding sites and their chromatin loops. Philos Trans R Soc Lond B Biol Sci 368: 20120369, 2013
41. Wang H, Matise MP: Tcf7l2/Tcf4 transcriptional repressor function requires HDAC activity in the developing vertebrate CNS. PLoS One 11: e0163267, 2016. Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5036887/
42. Peng H, Ivanov AV, Oh HJ, Lau Y-FC, Rauscher FJ 3rd: Epigenetic gene silencing by the SRY protein is mediated by a KRAB-O protein that recruits the KAP1 co-repressor machinery. J Biol Chem 284: 35670–35680, 2009
43. Karplus PA, Diederichs K: Linking crystallographic model and data quality. Science 336: 1030–1033, 2012

See related editorial, “Using Large Datasets to Understand CKD,” on pages 1351–1353.

This article contains supplemental material online at http://jasn.asnjournals.org/lookup/suppl/doi:10.1681/ASN.2017080856/-/DCSupplemental.

AFFILIATIONS

1HudsonAlpha Institute for Biotechnology, Huntsville, Alabama; 2Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, Michigan; 3Department of Genetics, Harvard Medical School, Boston, Massachusetts; 4Department of Biomedical Engineering and Institute for Complex Molecular Systems, Eindhoven University of Technology, Eindhoven, The Netherlands; 5Department of Chemistry, University of Duisburg-Essen, Essen, Germany; 6Department of Biological Sciences, The University of Alabama in Huntsville, Huntsville, Alabama; 7Department of Chemical and Physical Biology, Vanderbilt University, Nashville, Tennessee; 8Department of Cell Biology, Neurobiology and Anatomy and 9Section on Nephrology, Department of Internal Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina; and 10Department of Physiology, Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, Wisconsin; 11Gene Expression & Regulation Program, Wistar Institute, Philadelphia, Pennsylvania

Supplemental Material

Detailed Methods

Analysis of SHROOM3 Regulation and Transcripts

Roadmap epigenomics 15 core data was accessed from the Wash U browser (http://egg2.wustl.edu/roadmap/web_portal/chr_state_learning.html#core_15state). SwitchGear Genomics Transcription Start Sites were identified using the human genome browser in the hg19 build with the lower stringency setting (https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=647211331_6qX1NPkqJmLzEbxdLX2UaLwwrruH&c=chr4&g=switchDbTss). The LD block for the GWAS near SHROOM3 was identified using the SNAP tool (ref. 1) (http://archive.broadinstitute.org/mpg/snap/) with a 0.8 correlation. The 11 variants identified with SNAP were then analyzed using RegulomeDB (ref. 2) for integration analysis of multiple ENCODE and roadmap datasets at each variant. To identify cell usage of TSS1 and TSS2, Cap Analysis of Gene Expression (CAGE) data was analyzed on the datasets of FANTOM (refs. 3, 4). The robust promoter data of Phase 1 and 2 FANTOM5 was extracted from the Zenbu browser for sites near either TSS1 (hg19: chr4:77,352,033-77,362,449) or TSS2 (hg19: chr4:77,504,889-77,509,140).

In order to build transcript maps using multiple RNAseq datasets, reads were extracted using NCBI SRA BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&BLAST_SPEC=SRA&LINK_LOC=blasttab) for SHROOM3/Shroom3, aligned to transcripts using BOWTIE 2 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml), and visualized using Unipro UGENE (http://ugene.net/). Reads mapped per million reads sequenced (RPM) were normalized between RNAseq runs by dividing the number of reads mapped to each transcript by the total number of reads sequenced in the dataset, expressed in millions. Transcripts used were: mouse Shroom3 (NM_015756), human SHROOM3 (NM_020859), mouse Nphs2 (XM_006496683), and human NPHS2 (NM_014625).
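The RPM normalization described above can be sketched in a few lines of Python. This is only an illustration of the stated formula (mapped reads divided by millions of reads sequenced); the function name and the read counts are hypothetical and do not come from the study.

```python
def reads_per_million(mapped_reads: int, total_reads: int) -> float:
    """Reads mapped to a transcript per million reads sequenced (RPM)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    # Divide by the sequencing depth expressed in millions of reads.
    return mapped_reads / (total_reads / 1_000_000)


# Hypothetical (mapped reads, total reads) pairs for two RNAseq runs.
runs = {"run_A": (150, 30_000_000), "run_B": (40, 8_000_000)}
rpm = {name: reads_per_million(m, t) for name, (m, t) in runs.items()}

for name, value in rpm.items():
    print(f"{name}: {value:.2f} RPM")
```

Although the two hypothetical runs differ almost fourfold in raw mapped reads, they give the same RPM, which is the point of normalizing between runs of different depth.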
RNAseq datasets used were:
mouse podocyte single cell: SRX2248950, SRX2248951, SRX2248952, SRX2248953, SRX2248954, SRX2248955, SRX2248956, SRX2248957, SRX2248958, SRX2248959, SRX2248960, SRX2248961, SRX2248962, SRX2248963, SRX2248964, SRX2248965, SRX2248966, SRX2248967, SRX2248968, SRX2248969;
mouse isolated glomeruli: SRX1637541, SRX1637540, SRX1637539, SRX1637538, SRX1637537, SRX1637536, SRX1637535, SRX1637534, SRX1637533, SRX1637532, SRX1637531;
mouse single cell proximal tubule cells: SRX2884964, SRX2884963, SRX2884962, SRX2884961, SRX2884960, SRX2884959, SRX2884958, SRX2884957, SRX2884956, SRX2884955, SRX2884954, SRX2884953, SRX2884952, SRX2884951, SRX2884950, SRX2884949, SRX2884948, SRX2884947, SRX2884946;
Human Protein Atlas kidney: ERX288476, ERX288607, ERX288628, ERX288622;
Human Protein Atlas brain: ERX288614, ERX288639, ERX288561;
HEK293: SRX3015634, SRX3015633, SRX3015632.

HiChIP analysis was performed using the Yue lab's integrated viewer (http://promoter.bx.psu.edu/hi-c/chiapet.php?method=hichip&species=human&assembly=hg19&tissue=GM12878&target=cohesin&lab=Chang&gene=shroom3&window=250&sessionID=&browser=ucsc). HiChIP data was from the previously published cohesin datasets (ref. 5).

To identify all ENCODE Transcription Factor (TF) binding sites, as determined by ChIP-Seq, that overlap either the rs17319721 region (hg19: chr4:77,361,347-77,376,346) or TSS2 (hg19: chr4:77487991-77513325), we pulled peaks from the Human Genome Browser (https://genome.ucsc.edu/) using the Transcription Factor ChIP-Seq track (161 factors). For HEK293 cells we downloaded ChIP-Seq maps of 190 unique TFs from the ENCODE portal (https://www.encodeproject.org) and retrieved only the IDR (Irreproducible Discovery Rate) passed peaks or binding sites for each factor. Peaks were analyzed for binding overlap of all 190 TFs with the rs17319721 or TSS2 loci using a window of 50 bp up- and downstream from the summit of each peak with Pybedtools (refs. 6, 7).
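The summit-window overlap step can be illustrated with a minimal pure-Python sketch. It is a stand-in for the Pybedtools analysis, not the authors' script: the `Peak` type, the function names, and the example peaks are hypothetical, and only the two locus coordinates and the 50 bp flank are taken from the text.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    tf: str      # transcription factor name
    chrom: str   # chromosome
    summit: int  # summit position (hg19)


# Locus coordinates as given in the text (hg19).
LOCI = {
    "rs17319721": ("chr4", 77_361_347, 77_376_346),
    "TSS2": ("chr4", 77_487_991, 77_513_325),
}


def summit_window(peak, flank=50):
    """Window of `flank` bp up- and downstream of the peak summit."""
    return peak.chrom, max(0, peak.summit - flank), peak.summit + flank


def overlaps(window, locus):
    """True if two (chrom, start, end) intervals share at least one base."""
    w_chrom, w_start, w_end = window
    l_chrom, l_start, l_end = locus
    return w_chrom == l_chrom and w_start <= l_end and l_start <= w_end


def tfs_bound(peaks, loci=LOCI, flank=50):
    """Map each locus name to the set of TFs with an overlapping summit window."""
    bound = {name: set() for name in loci}
    for peak in peaks:
        window = summit_window(peak, flank)
        for name, locus in loci.items():
            if overlaps(window, locus):
                bound[name].add(peak.tf)
    return bound


# Hypothetical peaks, for illustration only.
example_peaks = [
    Peak("TF_A", "chr4", 77_368_000),  # inside the rs17319721 region
    Peak("TF_B", "chr4", 77_487_950),  # window reaches the start of TSS2
    Peak("TF_C", "chr4", 77_400_000),  # between the two loci
    Peak("TF_A", "chr5", 77_368_000),  # same position, different chromosome
]
bound = tfs_bound(example_peaks)
print(bound)
```

Using the summit window rather than the full peak keeps broad peaks from being counted as bound at a locus they only graze.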
GO enrichment analysis of all TF groupings was performed using the Enrichment tools (http://geneontology.org/). STRING analysis (ref. 8) was performed on the list of bound TFs to show connections between them.

Electrophoresis Mobility Shift Assays

Nuclear extracts were purchased for K562 (#36015, Active Motif), HEK293 (#36033, Active Motif), and A-431 (#36004, Active Motif) and prepared from Human Primary Kidney Endothelial Cells (#H-6014, Cell Biologics), Human Primary Tubule cells (#PCKDH01, Primecells), and Primary Monkey Podocytes (#PCKDP01, Primecells) grown according to the manufacturers' instructions and extracted with the NE-PER kit (#78833, ThermoFisher). Recombinant proteins were purchased for FOXO1 (#TP300477, Origene) and TCF7L2 (#513437, Novoprolabs). Biotin conjugated DNA probes (5’ tagged) were ordered (IDT) for each SNP (Table S1), annealed (90°C-10min, 80°C-10min, 70°C-10min, 60°C-10min, 50°C-10min, 40°C-10min, 30°C-10min, 20°C-hold) with the reverse complement strand (nonbiotin tagged) in annealing buffer (10mM Tris, 1mM EDTA, 100mM NaCl, pH 8), and diluted to a final concentration of 0.25uM. EMSAs were performed using the LightShift Chemiluminescent EMSA kit (ThermoFisher). Binding reactions consisted of 2uL of binding buffer, 1uL glycerol, 1uL NP40, 1uL Poly-dIdC, 1uL of annealed and diluted DNA probe, and 2uL of nuclear extract (at 2 ug/uL), brought to 20uL with water. Competition binding experiments were performed with non-biotin tagged probe at 40X concentration in addition to the labeled probe. Binding reactions were incubated at room temperature for 20 min followed by the addition of 5uL of 5X loading buffer. Reactions were run on a 6% DNA Retardation Gel (ThermoFisher, #EC6365BOX) at 100V for 1hr with 0.5X TBE and transferred to Biodyne B membrane (ThermoFisher, #77016). Membranes were dried and crosslinked. Membranes were probed with streptavidin-HRP exactly as recommended in the LightShift Chemiluminescent EMSA kit.
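The binding reaction composition lends itself to a short arithmetic check. The sketch below tallies the listed components and derives the water volume needed to reach 20 uL; it assumes the 0.25 uM figure is the concentration of the diluted annealed probe, and all variable and function names are hypothetical.

```python
FINAL_VOLUME_UL = 20.0

# Component volumes (uL) as listed for a standard binding reaction.
COMPONENTS_UL = {
    "binding buffer": 2.0,
    "glycerol": 1.0,
    "NP40": 1.0,
    "Poly-dIdC": 1.0,
    "annealed probe": 1.0,
    "nuclear extract": 2.0,
}

EXTRACT_CONC_UG_PER_UL = 2.0  # nuclear extract stock concentration
PROBE_STOCK_UM = 0.25         # assumed concentration of the diluted probe


def water_volume(components, final_volume):
    """Volume of water (uL) needed to bring the reaction to final_volume."""
    used = sum(components.values())
    if used > final_volume:
        raise ValueError("components exceed the final reaction volume")
    return final_volume - used


water_ul = water_volume(COMPONENTS_UL, FINAL_VOLUME_UL)
extract_ug = COMPONENTS_UL["nuclear extract"] * EXTRACT_CONC_UG_PER_UL
# Final probe concentration in nM after dilution into the full reaction.
probe_final_nm = PROBE_STOCK_UM * 1000 * COMPONENTS_UL["annealed probe"] / FINAL_VOLUME_UL

print(f"water: {water_ul} uL, extract: {extract_ug} ug, probe: {probe_final_nm} nM")
```

Competition reactions add unlabeled probe on top of these components, so the water volume would shrink accordingly; that case is not modeled here.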
CRISPR/Cas9 Modification and Expression Analysis

For generation of the single variant, CRISPR gRNAs were cloned into U6-Chimeric_BB-CBh-hSpCas9 (PX330), which was a kind gift from Feng Zhang (Addgene plasmid # 42230). Oligos for a gRNA (GAGTAGCAGGGCAAAAACA) near the SNPs were identified, ordered through Integrated DNA Technologies (IDT), and cloned downstream of a U6 promoter element as previously described (ref. 9). Single strand donor DNA was designed to include the SNP rs17319721, which also removed the PAM sequence, and ordered as ultramers through IDT. HEK293T cells were grown under recommended growth conditions. Nucleofection (Lonza) was done using the SF kit with the Nucleofector 4D in a 20 uL total volume with 2 ug of PX330 and 0.1 nmol of ssDNA donor. Cells were expanded and diluted to single cell colonies. These colonies were expanded and tested for genetic engineering at the Cas9 target, and a colony homozygous for rs17319721 was confirmed by Sanger sequencing.

For generation of a cell line without the LD block, single guide RNA (sgRNA) oligos targeting intron 1 of SHROOM3 were designed and cloned into pX330 plasmids as previously described (ref. 10). pX330 plasmids, containing spCas9 and the designed sgRNAs, were transfected into HEK 293T cells using Lipofectamine 2000 according to the manufacturer’s protocol (Thermo Fisher Scientific). All cell lines were cultured and maintained in Dulbecco modified Eagle medium (DMEM) with 10% fetal bovine serum (FBS), 1% L-glutamine, penicillin streptomycin (1X), and sodium pyruvate (1X). Single cell clones were isolated and genomic DNA of each clone was extracted using the DNeasy Blood & Tissue Kit (Qiagen). Genotypes of each cell clone were checked by PCR amplification using primers flanking the two CRISPR target sites, as cell clones harboring the desired deletion, but not the unmodified cells, were expected to produce PCR amplicons of ~200bp. The PCR product was run on a gel, purified, and sequenced to verify deletions.
Target | Primer sequence (5’ to 3’) (F: forward, R: reverse)
sgRNA1_F | CACCGGTATGCTGCAGGATGACTA
sgRNA1_R | AAACTAGTCATCCTGCAGCATACC
sgRNA2_F | CACCGGGAGGATGTATCGGACTTT
sgRNA2_R | AAACAAAGTCCGATACATCCTCCC
PCR_F | TTCCCCACACTCAGAAAGGA
PCR_R | CTTATCTGCCGCTCCACCAT
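The sgRNA oligo pairs listed above appear to follow a simple pattern: the forward oligo is CACC followed by the guide sequence, and the reverse oligo is AAAC followed by the reverse complement of the guide. That pattern is inferred here from the listed sequences rather than stated in the text, and the function names below are hypothetical. A minimal Python sketch that reproduces both pairs:

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def cloning_oligos(guide: str) -> tuple:
    """Return (forward, reverse) oligos with CACC/AAAC 5' overhangs.

    The overhang pattern is inferred from the sgRNA primers listed in
    the table above; it is an assumption, not a quoted protocol step.
    """
    guide = guide.upper()
    return "CACC" + guide, "AAAC" + reverse_complement(guide)


# Guide sequences recovered by stripping CACC from the listed forward oligos.
for name, guide in [("sgRNA1", "GGTATGCTGCAGGATGACTA"),
                    ("sgRNA2", "GGGAGGATGTATCGGACTTT")]:
    forward, reverse = cloning_oligos(guide)
    print(name, forward, reverse)
```

Checking designed oligos this way before ordering catches strand or orientation slips, since the reverse oligo must anneal to the forward one outside the overhangs.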

Cells were grown to 70% confluency and then lysed in 350uL of RLT Plus buffer supplemented with 2-Mercaptoethanol, and RNA was purified using the RNeasy Plus Mini kit (QIAGEN) according to the manufacturer’s instructions. For all samples, cDNA was generated with the RevertAid First Strand cDNA Synthesis Kit (Thermo Scientific) using 1ug of total RNA. Quantitative PCR was performed using AmpliTaq DNA polymerase (Applied Biosystems), and the products were separated on a 2% agarose gel. Quantitative real-time PCR was performed on the QuantStudio 6 Flex Real-Time PCR system (Applied Biosystems) using primers listed in Table S1 and the QuantiTect SYBR Green PCR Kit (QIAGEN) with the following cycling conditions: 95°C for 15 min, then 40 cycles of 94°C for 20 sec, 60°C for 20 sec, and 70°C for 20 sec. The reactions were conducted in triplicate. Relative gene expression levels were analyzed using the Comparative CT Method (ΔΔCT). The GAPDH and B2M genes were used as references.

Zebrafish Experiments to Test the Short Isoform of SHROOM3

Human SHROOM3 cDNA ORF (NM_020859) was purchased from OriGene Technologies (Rockville, MD). To test the function of SHROOM3 protein domains, human SHROOM3 mutant alleles lacking either the ASD2 (∆ASD2) or PDZ (∆PDZ) site were generated using the Phusion site-directed mutagenesis kit according to the manufacturer’s protocol (Thermo Fisher Scientific) using the phosphorylated primers listed in the table below. All plasmids were sequenced to verify the constructs were correct. For rescue experiments in zebrafish, plasmids were linearized by PmeI and in vitro transcribed into mRNA using the mMESSAGE mMACHINE® kit (Ambion) according to the manufacturer’s instructions. 400 pg of full-length or mutant SHROOM3 mRNA was co-injected along with the shroom3 morpholino (MO) into 1- to 4-cell stage zebrafish embryos. A modified dextran clearance assay was developed from that of a previous study (ref. 11).
In short, 55-h post-fertilization zebrafish were anesthetized with tricaine, embedded in 1% agarose, and the cardinal vein injected with 9.2 nl of 70-kDa FITC-labeled dextran (1 mg/mL). Following injection, the dorsal aorta was imaged by confocal microscopy at 1, 24, and 48 h post-injection (hpi). FITC fluorescent intensities were quantified using ImageJ software (National Institutes of Health).

Target | Primer sequence (5’ to 3’) (F: forward, R: reverse)
∆PDZ_F | TGCCAGCATGATGCAGATATCTCAGGG
∆PDZ_R | GGCGGCAGATCTCCTCGGTA
∆ASD2_F | CCAACATTAACCTCTCCACTT
∆ASD2_R | TAAACCCTGAGTTCCACTGAC
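For the quantitative real-time PCR analysis described earlier in this section, the Comparative CT calculation can be sketched as follows. This is a generic illustration of the 2^(-ΔΔCT) arithmetic, not the authors' analysis script: the function name and CT values are hypothetical, and a single reference gene is used for simplicity, whereas the text names two (GAPDH and B2M).

```python
from statistics import mean


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the comparative CT method.

    Each argument is a list of replicate CT values (e.g. triplicates).
    """
    # Delta CT: target normalized to the reference gene within each condition.
    delta_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    delta_control = mean(ct_target_control) - mean(ct_ref_control)
    # Delta-delta CT: edited sample relative to the control condition.
    delta_delta = delta_sample - delta_control
    return 2 ** (-delta_delta)


# Hypothetical triplicate CT values, for illustration only.
fold = relative_expression(
    ct_target_sample=[24.0, 24.5, 23.5],
    ct_ref_sample=[18.0, 18.0, 18.0],
    ct_target_control=[25.0, 25.0, 25.0],
    ct_ref_control=[18.0, 18.0, 18.0],
)
print(f"fold change: {fold:.2f}")
```

One cycle lower ΔCT in the sample than in the control corresponds to a twofold higher relative expression, which is why the result is reported as a power of two.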

Immunofluorescence and Immunoprecipitation of SHROOM3 in Podocytes

Human (Celprogen, #36036-08) primary podocytes were grown on podocyte extra-cellular expansion flasks (Celprogen, #E36036-08-T75) with complete human podocyte media (Celprogen, #M36036-08S) at 5%CO2/20%O2. K3 (gift from Duncan lab) induced pluripotent stem cells (iPSCs) were cultured on StemAdhere with mTeSR1 media (StemCell Technology, #85870) at 5%CO2/5%O2. Cells were plated at 90% confluency onto iMatrix Laminin-511-E8 (Iwai, #N-892012) coated 6-well plates (VWR, #82050-846) and grown using a previously published podocyte differentiation protocol (PMID: 29038743). The iPSC derived podocytes were plated onto 6-well plates coated with iMatrix Laminin-511-E8 and the human podocytes onto podocyte treated 6-well plates (Celprogen, E36036-08-6W).

Immunoprecipitations were performed using 100 ug of SHROOM3 antibody from either Sigma (#HPA047784) or ThermoFisher (#PA5-61589) crosslinked to resin using the Pierce Co-Immunoprecipitation kit (#26149) according to the recommended protocol. Primary human podocytes were lysed using the IP lysis buffer from the kit, precleared with control agarose resin, and then incubated with SHROOM3 antibody resin. A control resin without SHROOM3 antibody was used. Proteins were removed from the resin as recommended in the kit and run on 4-12% SDS-PAGE followed by either silver staining (ThermoFisher, #24612) or transfer to PVDF and western blotting with a third anti-SHROOM3 antibody (ThermoFisher, #PA5-34482) or the Sigma anti-SHROOM3 antibody.

LATS2 Kinase Assay

All phosphorylation sites shown in Figure 4A were identified from UniProt (http://www.uniprot.org/) and the LATS1/2 kinase prediction was done using NetPhosK (ref. 12).
Peptides for the kinase assay were ordered from GenScript at >90% purity, as confirmed by HPLC and mass spec, and suspended in water: SHROOM3 (AGPVHVRSRSSPATAD), SHROOM3 H1237A (AGPVAVRSRSSPATAD), SHROOM3 SS1241AA (AGPVHVRSRAAPATAD), SHROOM3 P1244L (AGPVHVRSRSSLATAD). LATS2 active enzyme (#L02-11G, SignalChem) was set up to run reactions at 400ng, 300ng, 200ng, 100ng, 50ng, 25ng, and a 0 control in two separate experiments. Reactions were set up to a total volume of 20uL using 5ug of each peptide, LATS2 enzyme, and 5uL Kinase Reaction Buffer (40mM Tris pH 7.5, 20mM MgCl2, 0.1mg/mL BSA). Kinase reactions were performed using the ADP-Glo Kinase Assay (#V6930, Promega) such that reactions were initiated with the addition of 5uL UltraPure ATP to a final concentration of 1mM and incubated at 23°C for 30min. 25uL of ADP-Glo reagent was added, reactions were incubated at 23°C for 40min, 50uL of kinase detection reagent was added, reactions were incubated at 23°C for 30min, and luminescence was read using a BioTek Synergy H4 multimode plate reader.

Peptide Synthesis for Crystal Structures

The SHROOM3 linear motif at 1244 was identified to be a potential 14-3-3 binding site based on ELM analysis (ref. 13). The SHROOM3-derived peptides, wt (H2N-AGPVHVRSRpSSPATA-NH2, where pS represents a phospho Ser) and P1244L mutant (H2N-AGPVHVRSRpSSLATA-NH2), were synthesized via an Fmoc solid phase peptide synthesis strategy (ref. 14) using an Intavis MultiPep RSi peptide synthesizer. The phosphorylated peptides were synthesized using Fmoc-protected amino acid building blocks (4 eq., Novabiochem) and a HBTU (Biosolve b.v., 4 eq.)/N,N-diisopropylethylamine (8 eq., Biosolve b.v.) coupling strategy at 50 μmol scale on a Tentagel®R RAM resin (Rapp Polymere; 0.19 mmol/g loading). The phosphoserine amino acid was specifically introduced via Fmoc-Ser(PO(OBzl)OH)-OH (Novabiochem).
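As a cross-check on the identity of the two synthetic peptides, their monoisotopic masses can be computed from the sequences and compared with the exact masses reported in the LC-MS characterization in this section (1570.7724 and 1586.8037). The sketch below is an illustration under stated assumptions: a free N-terminus, a C-terminal amide, standard monoisotopic element masses, and a lowercase `p` prefix marking a phosphorylated residue. All function names are hypothetical.

```python
from collections import Counter

# Monoisotopic element masses (Da).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401,
        "O": 15.99491462, "P": 30.973762}

# Elemental composition of in-chain residues (amino acid minus water).
RESIDUES = {
    "A": {"C": 3, "H": 5, "N": 1, "O": 1},
    "G": {"C": 2, "H": 3, "N": 1, "O": 1},
    "P": {"C": 5, "H": 7, "N": 1, "O": 1},
    "V": {"C": 5, "H": 9, "N": 1, "O": 1},
    "L": {"C": 6, "H": 11, "N": 1, "O": 1},
    "H": {"C": 6, "H": 7, "N": 3, "O": 1},
    "R": {"C": 6, "H": 12, "N": 4, "O": 1},
    "S": {"C": 3, "H": 5, "N": 1, "O": 2},
    "T": {"C": 4, "H": 7, "N": 1, "O": 2},
    "pS": {"C": 3, "H": 6, "N": 1, "O": 5, "P": 1},  # Ser + HPO3
}

PROTON = 1.007276  # mass added for a singly protonated ion, [M+H]+


def tokenize(seq):
    """Split a sequence into residues; 'p' marks the next residue as phospho."""
    tokens, i = [], 0
    while i < len(seq):
        if seq[i] == "p":
            tokens.append("p" + seq[i + 1])
            i += 2
        else:
            tokens.append(seq[i])
            i += 1
    return tokens


def composition(seq):
    """Elemental composition of an H2N-...-NH2 (C-terminal amide) peptide."""
    total = Counter({"H": 3, "N": 1})  # N-terminal H plus C-terminal NH2
    for token in tokenize(seq):
        total.update(RESIDUES[token])
    return total


def monoisotopic_mass(seq):
    return sum(MONO[element] * n for element, n in composition(seq).items())


for name, seq in [("wt", "AGPVHVRSRpSSPATA"), ("P1244L", "AGPVHVRSRpSSLATA")]:
    mass = monoisotopic_mass(seq)
    print(f"{name}: M = {mass:.4f}, [M+H]+ = {mass + PROTON:.4f}")
```

Under these assumptions the computed masses agree with the reported exact masses to within 0.01 Da, and the [M+H]+ values sit close to the observed ions at 1571.7 and 1587.8.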
The peptides were cleaved using TFA/triisopropylsilane/H2O (95/2.5/2.5 v/v), precipitated into ice-cold ether, and the resultant crude isolated as a solid pellet by centrifugation. Peptide purification was performed using a preparative LC-MS system, which comprised an LCQ Deca XP Max (Thermo Finnigan) ion-trap mass spectrometer equipped with a Surveyor autosampler and Surveyor photodiode detector array (PDA) detector (Thermo Finnigan). Solvents were pumped using a high-pressure gradient system using two LC-8A pumps (Shimadzu) for the preparative system (20 mL min-1 flow rate) and two LC-20AD pumps (Shimadzu) for the analytical system (0.2 mL min-1 flow rate). The crude mixture was purified on a reverse-phase C18 column (Atlantis T3 prep OBD, 5 µm, 150 x 19 mm, Waters) using a linear acetonitrile gradient in water with 0.1% v/v trifluoroacetic acid (TFA). Fractions with the correct mass were collected using a PrepFC fraction collector (Gilson Inc). Molecular characterization of the SHROOM3-derived peptides (LC-MS) is as follows: wt sequence, C62H107N25O22P1, Exact Mass: 1570.7724, found: 1571.7 [M+H]+; P1244L mutant sequence, C63H111N25O22P1, Exact Mass: 1586.8037, found: 1587.8 [M+H]+.

14-3-3 Expression

His6-tagged 14-3-3 proteins (full-length and ΔC-terminus) were expressed in NiCo21(DE3) competent cells with a pPROEX HTb plasmid, and purified using Ni2+-affinity chromatography. The ΔC variant meant for crystallization was treated with TEV-protease to cleave off the His6-tag, followed by a second Ni2+-affinity column and size exclusion chromatography. The proteins were dialyzed against ITC or crystallization buffers before use (described below).

Isothermal Titration Calorimetry (ITC)

The ITC measurements were performed with the Malvern MicroCal iTC200. The protein and peptides were dissolved in ITC-buffer (25 mM HEPES pH 7.4, 100 mM NaCl, 10 mM MgCl2, 0.5 mM TCEP). Eighteen titrations of 2 µL were performed at 37 °C (reference power: 5 µCal/sec., initial delay: 60 sec., stirring speed: 750 rpm, spacing: 180 sec.). Where two titration series were run, the data were merged with ConCat32 software.

Crystallography

The 14-3-3σ protein was C-terminally truncated after T231 to enhance crystallization (14-3-3σΔC). The 14-3-3 protein and peptide were dissolved in crystallization buffer (25 mM HEPES, 0.1 M NaCl, 2 mM DTT, pH 7.4) and mixed in a 1:1 stoichiometry to a final protein concentration of 10 mg/mL. This peptide-protein mixture was set up for sitting-drop crystallization in crystallization liquor (0.095 M HEPES, 0.19 M CaCl2, 26% (v/v) PEG 400, 5% (v/v) glycerol, pH 7.3). Crystals were fished after 10 days of incubation at 4°C and flash-cooled in liquid nitrogen. Diffraction data were collected at 100 K at the X06SA/PXI beamline (Swiss Light Source, Villigen, Switzerland). The data were indexed and integrated using XDS (ref. 15) and scaled using Aimless (ref. 16). The structure was phased by molecular replacement using PHASER (ref. 17) with PDB 3LW1 (ref. 18) as the search model. Coot (ref. 19) and phenix.refine (ref. 20) were used in alternating cycles of manual and automatic refinement to complete the atomic model.

SHROOM3 and 14-3-3 affinity capture

Affinity capture was performed by binding 400ug of GenScript synthesized (>95% purity) N-terminally biotin tagged phospho peptides for SHROOM3 (AGPVHVRSRpSSPATAD) or SHROOM3 P1244L (AGPVHVRSRpSSLATAD) to 80uL Pierce™ Avidin Agarose (#20219, ThermoFisher). Beads were washed three times with PBS and then 5ug of 14-3-3 beta (LF-P0040, ThermoFisher) was captured for 1 hour at 23°C followed by two washes with PBS+500mM NaCl and two washes with PBS.
Beads were boiled with LDS sample buffer and samples were run on NuPAGE 4-12% Bis-Tris Protein Gels (#NP0321BOX, ThermoFisher). Proteins were imaged using Pierce™ Silver Stain (#24600, ThermoFisher) on a myECL system.

Molecular Dynamic Simulations

Each of the PDB files was loaded into YASARA and set up for molecular dynamic simulations. Simulations were set up using a pH of 7.4 for pKa predictions, 0.997g/mL of explicit water, and 0.9% NaCl in a simulation square that extends 10 angstroms from all atoms. Simulations were run for 125 nanoseconds with the AMBER03 force field (ref. 21) and periodic boundaries, capturing the structure coordinates every 24 picoseconds. This recorded a total of 5000 trajectory files for each protein, which were then assessed for movement as root-mean-square deviation (RMSD, in angstroms), averaging every atom of each residue that is connected to a hydrogen throughout all 5000 trajectory files.

References in Supplemental File

1. Johnson AD, Handsaker RE, Pulit SL, Nizzari MM, O’Donnell CJ, de Bakker PIW: SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinforma. Oxf. Engl. 24: 2938–2939, 2008

2. Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M, Karczewski KJ, Park J, Hitz BC, Weng S, Cherry JM, Snyder M: Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 22: 1790–1797, 2012

3. Abugessaisa I, Noguchi S, Carninci P, Kasukawa T: The FANTOM5 Computation Ecosystem: Genomic Information Hub for Promoters and Active Enhancers. Methods Mol. Biol. Clifton NJ 1611: 199–217, 2017

4. A promoter-level mammalian expression atlas. - PubMed - NCBI [Internet]. Available from: https://www.ncbi.nlm.nih.gov/pubmed/24670764 [cited 2017 Dec 12]

5. Mumbach MR, Rubin AJ, Flynn RA, Dai C, Khavari PA, Greenleaf WJ, Chang HY: HiChIP: efficient and sensitive analysis of protein-directed genome architecture. Nat. Methods 13: 919–922, 2016

6. Dale RK, Pedersen BS, Quinlan AR: Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinforma. Oxf. Engl. 27: 3423–3424, 2011

7. Quinlan AR, Hall IM: BEDTools: a flexible suite of utilities for comparing genomic features. Bioinforma. Oxf. Engl. 26: 841–842, 2010

8. Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, Lin J, Minguez P, Bork P, von Mering C, Jensen LJ: STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 41: D808-815, 2013

9. Ran FA, Hsu PD, Wright J, Agarwala V, Scott DA, Zhang F: Genome engineering using the CRISPR-Cas9 system. Nat. Protoc. 8: 2281–2308, 2013

10. Cong L, Ran FA, Cox D, Lin S, Barretto R, Habib N, Hsu PD, Wu X, Jiang W, Marraffini LA, Zhang F: Multiplex genome engineering using CRISPR/Cas systems. Science 339: 819–823, 2013

11. Hentschel DM, Mengel M, Boehme L, Liebsch F, Albertin C, Bonventre JV, Haller H, Schiffer M: Rapid screening of glomerular slit diaphragm integrity in larval zebrafish. Am. J. Physiol. Renal Physiol. 293: F1746-1750, 2007

12. Blom N, Sicheritz-Pontén T, Gupta R, Gammeltoft S, Brunak S: Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics 4: 1633–1649, 2004

13. Dinkel H, Michael S, Weatheritt RJ, Davey NE, Van Roey K, Altenberg B, Toedt G, Uyar B, Seiler M, Budd A, Jödicke L, Dammert MA, Schroeter C, Hammer M, Schmidt T, Jehl P, McGuigan C, Dymecka M, Chica C, Luck K, Via A, Chatr-Aryamontri A, Haslam N, Grebnev G, Edwards RJ, Steinmetz MO, Meiselbach H, Diella F, Gibson TJ: ELM--the database of eukaryotic linear motifs. Nucleic Acids Res. 40: D242-251, 2012

14. Fmoc Solid Phase Peptide Synthesis: A Practical Approach. Oxford, New York: Oxford University Press;

15. Kabsch W: XDS. Acta Crystallogr. D Biol. Crystallogr. 66: 125–132, 2010

16. Evans PR, Murshudov GN: How good are my data and what is the resolution? Acta Crystallogr. D Biol. Crystallogr. 69: 1204–1214, 2013

17. McCoy AJ, Grosse-Kunstleve RW, Adams PD, Winn MD, Storoni LC, Read RJ: Phaser crystallographic software. J. Appl. Crystallogr. 40: 658–674, 2007

18. Schumacher B, Mondry J, Thiel P, Weyand M, Ottmann C: Structure of the p53 C-terminus bound to 14-3-3: implications for stabilization of the p53 tetramer. FEBS Lett. 584: 1443–1448, 2010

19. Emsley P, Cowtan K: Coot: model-building tools for molecular graphics. Acta Crystallogr. D Biol. Crystallogr. 60: 2126–2132, 2004

20. Adams PD, Afonine PV, Bunkóczi G, Chen VB, Davis IW, Echols N, Headd JJ, Hung L-W, Kapral GJ, Grosse-Kunstleve RW, McCoy AJ, Moriarty NW, Oeffner R, Read RJ, Richardson DC, Richardson JS, Terwilliger TC, Zwart PH: PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D Biol. Crystallogr. 66: 213–221, 2010

21. Duan Y, Wu C, Chowdhury S, Lee MC, Xiong G, Zhang W, Yang R, Cieplak P, Luo R, Lee T, Caldwell J, Wang J, Kollman P: A point-charge force field for molecular mechanics simulations of proteins based on condensed-phase quantum mechanical calculations. J. Comput. Chem. 24: 1999–2012, 2003

Supplemental Figures

Table S1 Oligos used

Name | Use | Sequence (5'->3') | Assay
rs17319721 | Probe | 5' Biotin-CTAGTGAGTAGCAGGGCAAAAACAA[A/G]GCAGCCATTGATCTAAGTGTAAGG | EMSA
rs17319721 | Scramble Probe | 5' Biotin-CTAGTGAGTAGCAGGGCAAAGGTGGAATAGCCATTGATCTAAGTGTAAGG | EMSA
∆PDZ | F | 5’-Phospho-TGCCAGCATGATGCAGATATCTCAGGG | Mut Construct
∆PDZ | R | 5’-Phospho-GGCGGCAGATCTCCTCGGTA | Mut Construct
∆ASD2 | F | 5’-Phospho-CCAACATTAACCTCTCCACTT | Mut Construct
∆ASD2 | R | 5’-Phospho-TAAACCCTGAGTTCCACTGAC | Mut Construct
rs17319721 CRISPR | gRNA | GAGTAGCAGGGCAAAAACA | CRISPR / Cas9
rs17319721 CRISPR | Donor | GAAAATAGACTCTTTTCTCTCATGCTTTGCTTTTGCTTTTGTTTATTTTTTATATTTTTACTTTTGAGTATCAGCTAGTGAGTAGCAGGGCAAAAACAA[A]GCAGCCATTGATCTAAGTGTAAGGCAGCAGGAGGAAGGTTAAGAGAATGAGTGTTATTCTTTAGGACTACATTCATAGTTGCATGGTGTTGCTCTTGGCTC | CRISPR / Cas9
rs17319721 Sanger | F | TGTGATGAACATAGGGTGCAA | Sanger
rs17319721 Sanger | R | CAGAGGAAAGGGGACACTGA | Sanger
SHROOM3 E2E4 | F | GGATCCTACAAGACCCTCAG | qPCR
SHROOM3 E2E4 | R | TCCCCAGATTTAGTTGTGTG | qPCR
SHROOM3 E8E9 | F | TGTCCACCAAGACAAATCTC | qPCR, qRT-PCR
SHROOM3 E8E9 | R | GCTTCCTTGTCTTCATTCCT | qPCR, qRT-PCR
SHROOM3 E4E5 | F | CTCACTCGTGGCACACAACTA | qPCR
SHROOM3 E4E5 | R | GACCATGCTTATCTAAGGCGG | qPCR
SEPT11 | F | AAGCCAGAAGTTATGAGCTTCAG | qRT-PCR
SEPT11 | R | GCCTCGAACTGGGCATCAAT | qRT-PCR
FAM47E | F | AGACCTTCACGGAACAGTTGC | qRT-PCR
FAM47E | R | CACGTTTACATTCCTTCCCGA | qRT-PCR
GAPDH | F | CGGAGTCAACGGATTTGGTCGTAT | qRT-PCR
GAPDH | R | AGCCTTCTCCATGGTGGTGAAGAC | qRT-PCR
B2M | F | TGAGTATGCCTGCCGTGTGAAC | qRT-PCR
B2M | R | ATGCTGCTTACATGTCTCGATCCC | qRT-PCR
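As an illustration of how the bracketed allele notation in Table S1 can be read, the following minimal Python sketch expands the rs17319721 EMSA probe into its two allele-specific sequences and checks where the CRISPR gRNA sequence falls within them. The helper name `expand_alleles` is hypothetical and is not part of the original methods; the 5' biotin modification is omitted from the strings.

```python
import re

# Sequences copied from Table S1 (5'->3'); the 5' biotin label is omitted.
PROBE = "CTAGTGAGTAGCAGGGCAAAAACAA[A/G]GCAGCCATTGATCTAAGTGTAAGG"
GRNA = "GAGTAGCAGGGCAAAAACA"


def expand_alleles(seq):
    """Expand one bracketed site such as [A/G] into one sequence per allele.

    Hypothetical helper for illustration only.
    """
    match = re.search(r"\[([ACGT](?:/[ACGT])*)\]", seq)
    if match is None:
        return [seq]
    left, right = seq[:match.start()], seq[match.end():]
    return [left + allele + right for allele in match.group(1).split("/")]


probes = expand_alleles(PROBE)
for probe in probes:
    # Print each allele-specific probe, its length, and the 0-based offset
    # of the gRNA sequence within it (-1 would mean it is absent).
    print(probe, len(probe), probe.find(GRNA))
```

Under these assumptions, both allele-specific probes have the same length and contain the gRNA sequence immediately upstream of the variant position, which is consistent with the gRNA targeting the rs17319721 site.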

Table S2 Expression of transcripts for SHROOM3 and TCF7L2 from the Human Protein Atlas. Kidney is marked in red and the top four tissues for each gene are colored in cyan.

Tissue | SHROOM3 FPKM ± SEM | SHROOM3 Isoform | SHROOM3 Staining | TCF7L2 FPKM ± SEM | TCF7L2 Staining
Colon | 30.6 ± 3.1 | 2/3 | Glandular | 23.6 ± 2.3 | Glandular, Endothelial, Ganglion
Salivary Gland | 25.1 ± 1.3 | 2/3 | Glandular | 12.2 ± 1.4 | Glandular
Rectum | 24.1 ± 4.2 | 2/3 | Glandular | 25.7 ± 1.4 * | Glandular
Gallbladder | 21.3 ± 2.5 | 2/3 | Glandular | 12.3 ± 1.6 | Glandular
Stomach | 21.0 ± 3.9 | 2/3 | Glandular | 25.3 ± 8.5 * | Glandular
Small Intestine | 18.6 ± 0.7 | 2/3 | Glandular | 18.9 ± 0.2 | Glandular
Fallopian Tube | 18.3 ± 5.0 | 2/3 | Glandular | 23.4 ± 1.0 | Glandular
Duodenum | 17.5 ± 0.9 | 2/3 | Glandular | 20.3 ± 0.6 | Glandular
Lung | 16.8 ± 2.5 | 2/3 | - | 14.5 ± 0.2 | Macrophages, Pneumocytes
Esophagus | 16.4 ± 0.4 | 2/3 | Squamous epithelial cells | 11.9 ± 1.1 | Squamous Epithelial
Prostate | 11.5 ± 1.7 | 2/3 | Glandular | 13.8 ± 1.0 | Glandular
Kidney | 11.0 ± 0.9 | 2/3 | Glomeruli and Tubules | 9.7 ± 0.9 | Glomeruli and Tubules
Thyroid Gland | 9.7 ± 2.1 | 1 | - | 14.4 ± 3.4 | Glandular
Smooth Muscle | 7.8 ± 1.1 | 2/3 | - | 21.2 ± 4.9 | Smooth muscle
Liver | 6.5 ± 1.1 | 1 | - | 14.6 ± 2.5 | Bile duct, Hepatocytes
Heart Muscle | 6.1 ± 1.7 | 1 | - | 8.7 ± 1.5 | Myocytes
Appendix | 5.2 ± 0.5 | 2/3 | Glandular | 9.3 ± 1.2 | Glandular, Lymphoid
Pancreas | 5.1 ± 0.4 | 2/3 | Glandular | 8.2 ± 1.6 | Glandular, Islets
Spleen | 4.6 ± 0.9 | 1 | - | 23.8 ± 0.6 * | Red Pulp, White Pulp
Endometrium | 4.0 ± 1.8 | 2/3 | Glandular | 22.1 ± 1.3 | Glandular, endometrial stroma
Urinary Bladder | 3.7 ± 1.5 | 2/3 | Glandular | 12.7 ± 4.1 | Urothelial
Placenta | 3.7 ± 1.1 | 2/3 | - | 19.4 ± 2.4 | Decidual, Trophoblastic
Tonsil | 3.0 ± 0.5 | 1 | - | 4.9 ± 0.2 | Squamous Epithelial
Adrenal Gland | 2.4 ± 0.3 | 2/3 | - | 9.1 ± 0.3 | Glandular
Cerebral Cortex | 2.2 ± 0.5 | 1 | - | 8.5 ± 2.8 | Neuronal, Glial, Endothelial
Testis | 1.7 ± 0.3 | 1 | - | 10.3 ± 1.2 | Seminiferous Ducts, Leydig
Skin | 1.6 ± 0.2 | 2/3 | - | 7 ± 0.5 | Fibroblasts, Langerhans, Melanocytes
Adipose Tissue | 1.0 ± 0.4 | 1 | - | 23.3 ± 2.7 | Adipocytes, Fibroblasts
Ovary | 0.5 ± 0.1 | - | - | 27.8 ± 1.6 * | Ovarian Stroma
Lymph Node | 0.4 ± 0.1 | - | - | 5.9 ± 0.8 | Germinal Center, Non-germinal Center
Skeletal Muscle | 0.1 ± 0.1 | - | - | 4.0 ± 0.4 | Myocytes
Bone Marrow | 0.0 ± 0.0 | - | - | 4.7 ± 0.9 | Hematopoietic

Table S3 gnomAD human variants within SHROOM3. Transcript ENST0000029604 annotated variants were grouped and all missense variants (red) analyzed with PolyPhen2 as either benign, possibly damaging or probably damaging followed by impact scoring for those predicted probably damaging (score>100).

Annotation # variants

3' UTR 278

frameshift 21

inframe deletion 13

intron 298

Non-coding transcript exon 0

splice acceptor 1

splice donor 4

splice region 37

start lost 0

stop gained 11

synonymous 447

missense 1043

benign 518

possibly damaging 195

probably damaging 330

High Impact Variants (>100 Score) 35

Table S4 Top 35 high impact gnomAD human variants within SHROOM3.

RSID | Position | Protein Consequence | Triplet | Normalized dN-dS | Conservation Score | 21 codon window | PolyPhen2 | Allele Count | Impact
rs139272770 | chr4:77659945 C/T | His207Tyr | CAT | -0.890867 | 1.25 | 14.75 | probably damaging | 846 | 15598.13
rs145112769 | chr4:77652057 G/C | Gly186Arg | GGC | -0.476875 | 1 | 9.25 | probably damaging | 1611 | 14901.75
rs147732653 | chr4:77662361 C/A | Thr1012Asn | ACT | -2.27092 | 0.5 | 15.5 | probably damaging | 1630 | 12632.5
rs144435434 | chr4:77476772 G/T | Gly60Val | GGG | -1.31398 | 0.25 | 13.25 | probably damaging | 538 | 1782.125
rs201389959 | chr4:77660493 T/A | Ser389Arg | AGT | -3.04693 | 2 | 15.5 | probably damaging | 56 | 1736
rs371533598 | chr4:77662358 T/C | Leu1011Pro | CTG | -0.312924 | 1 | 16 | probably damaging | 68 | 1088
rs199558629 | chr4:77680831 G/C | Asp1778His | GAT | -2.4882 | 1.5 | 13.25 | probably damaging | 38 | 755.25
rs141646361 | chr4:77676155 G/A | Glu1507Lys | GAA | -1.69351 | 0.25 | 6.75 | probably damaging | 424 | 715.5
rs141239704 | chr4:77476831 A/T | Asn80Tyr | AAT | -2.15082 | 1.5 | 14 | probably damaging | 34 | 714
rs76406459 | chr4:77659958 G/A | Arg211Lys | AGG | -0.874031 | 1.25 | 13.5 | probably damaging | 41 | 691.875
rs376718092 | chr4:77691999 G/A | Arg1857His | CGT | -2.19996 | 1.5 | 21.5 | probably damaging | 20 | 645
rs199533144 | chr4:77357309 G/T | Gly35Val | GGA | -1.9075 | 1.5 | 15.25 | probably damaging | 28 | 640.5
rs745611618 | chr4:77700166 A/C | Lys1943Gln | AAG | -2.42386 | 1.5 | 20.75 | probably damaging | 19 | 591.375
rs181194611 | chr4:77663057 C/T | Pro1244Leu | CCC | -0.238437 | 1 | 10.5 | probably damaging | 50 | 525
rs367782393 | chr4:77662214 C/T | Ser963Leu | TCG | -2.86125 | 2 | 10.75 | probably damaging | 23 | 494.5
rs181584053 | chr4:77691937 G/A | Met1836Ile | ATG | 0 | 1 | 18.5 | probably damaging | 24 | 444
rs370967544 | chr4:77660063 T/C | Ile246Thr | ATT | -1.8975 | 1.5 | 16.5 | probably damaging | 17 | 420.75
rs144913986 | chr4:77476909 G/C | Val106Leu | GTG | -1.32553 | 0.25 | 7.75 | probably damaging | 183 | 354.5625
rs775459440 | chr4:77691866 G/A | Ala1813Thr | GCC | -1.43062 | 1.25 | 18.5 | probably damaging | 14 | 323.75
rs761396961 | chr4:77476813 G/A | Asp74Asn | GAT | -3.96299 | 2 | 10.75 | probably damaging | 14 | 301
rs777929615 | chr4:77659915 T/C | Ser197Pro | TCC | -0.95375 | 1.25 | 15 | probably damaging | 13 | 243.75
rs533735013 | chr4:77660489 G/A | Arg388Gln | CGG | -1.28941 | 1.25 | 15.25 | probably damaging | 10 | 190.625
rs769359406 | chr4:77660732 C/T | Pro469Leu | CCG | -2.38437 | 0.5 | 15.75 | probably damaging | 21 | 165.375
rs146652221 | chr4:77660381 G/A | Arg352Gln | CGG | -0.932178 | 0.25 | 2.75 | probably damaging | 238 | 163.625
rs371504856 | chr4:77660201 G/A | Arg292Gln | CGA | -1.43898 | 1.25 | 8 | probably damaging | 15 | 150
rs79007254 | chr4:77660567 G/A | Arg414Gln | CGG | -2.05531 | 1.5 | 6.5 | probably damaging | 15 | 146.25
rs780349816 | chr4:77660717 C/T | Pro464Leu | CCG | -2.74088 | 1 | 15.75 | probably damaging | 8 | 126
rs752388322 | chr4:77677916 T/C | Ile1675Thr | ATT | -1.97434 | 1.5 | 16.5 | probably damaging | 5 | 123.75
rs144435434 | chr4:77476772 G/C | Gly60Ala | GGG | -1.31398 | 0.25 | 13.25 | probably damaging | 36 | 119.25
rs147919147 | chr4:77676154 C/G | Asp1506Glu | GAC | -5.24274 | 1 | 7 | probably damaging | 17 | 119
rs767412535 | chr4:77357350 G/A | Gly49Arg | GGA | -1.19219 | 1.25 | 18.75 | probably damaging | 5 | 117.1875
rs377199268 | chr4:77660023 C/A | Pro233Thr | CCA | -1.09899 | 0.25 | 11.5 | probably damaging | 40 | 115
rs770837589 | chr4:77700037 C/T | Arg1900Cys | CGC | -1.77718 | 1.5 | 19 | probably damaging | 4 | 114
rs757040989 | chr4:77651977 G/A | Arg159Gln | CGA | -1.03479 | 1.25 | 8.25 | probably damaging | 10 | 103.125
rs751264874 | chr4:77662571 A/G | Gln1082Arg | CAG | -0.407049 | 1 | 14.5 | probably damaging | 7 | 101.5
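The Impact values in Table S4 are numerically consistent with the product of the Conservation Score, the 21 codon window score, and the Allele Count. The minimal Python sketch below recomputes the impact for a few rows under that assumption. The function name `impact_score` is hypothetical, and the published pipeline may compute or round the value differently.

```python
# A few rows transcribed from Table S4:
# (protein consequence, conservation score, 21 codon window,
#  allele count, tabulated impact)
ROWS = [
    ("His207Tyr", 1.25, 14.75, 846, 15598.13),
    ("Gly186Arg", 1.0, 9.25, 1611, 14901.75),
    ("Pro1244Leu", 1.0, 10.5, 50, 525.0),
    ("Gln1082Arg", 1.0, 14.5, 7, 101.5),
]

# Threshold used in Tables S3 and S4 to call a variant "high impact".
HIGH_IMPACT_THRESHOLD = 100


def impact_score(conservation, window, allele_count):
    """Assumed relationship: product of the three tabulated quantities."""
    return conservation * window * allele_count


for name, conservation, window, count, tabulated in ROWS:
    computed = impact_score(conservation, window, count)
    is_high = computed > HIGH_IMPACT_THRESHOLD
    print(f"{name}: computed={computed:.3f} tabulated={tabulated} high_impact={is_high}")
```

Because the score multiplies conservation by allele count, a moderately conserved but relatively frequent variant (such as His207Tyr) can outrank a rarer variant at a more conserved site.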

Table S5 Allele frequencies (%) of the top 35 high impact gnomAD human variants within SHROOM3, by gnomAD population.

RSID Protein Consequence African Ashkenazi Jewish European (Finnish) European (Non-Finnish) Latino East Asian South Asian Other

rs139272770 His207Tyr 0.054 0.000 0.826 0.464 0.029 0.000 0.000 0.356

rs145112769 Gly186Arg 0.012 0.000 0.000 0.016 0.003 0.011 5.087 0.294

rs147732653 Thr1012Asn 0.193 0.981 0.310 1.020 0.586 0.000 0.201 0.854

rs144435434 Gly60Val 0.042 0.345 0.054 0.279 0.218 0.005 0.101 0.294

rs201389959 Ser389Arg 0.013 0.000 0.000 0.017 0.079 0.000 0.000 0.062

rs371533598 Leu1011Pro 0.000 0.000 0.000 0.000 0.000 0.000 0.227 0.034

rs199558629 Asp1778His 0.000 0.000 0.004 0.011 0.012 0.000 0.068 0.000

rs141646361 Glu1507Lys 0.017 0.000 0.307 0.247 0.026 0.005 0.023 0.202

rs141239704 Asn80Tyr 0.004 0.000 0.000 0.026 0.000 0.000 0.000 0.000

rs76406459 Arg211Lys 0.000 0.000 0.043 0.024 0.000 0.000 0.000 0.000

rs376718092 Arg1857His 0.004 0.000 0.012 0.006 0.020 0.005 0.003 0.000

rs199533144 Gly35Val 0.000 0.000 0.000 0.025 0.000 0.000 0.000 0.000

rs745611618 Lys1943Gln 0.000 0.000 0.000 0.002 0.036 0.000 0.000 0.091

rs181194611 Pro1244Leu 0.000 0.000 0.000 0.000 0.000 0.268 0.000 0.000

rs367782393 Ser963Leu 0.014 0.022 0.000 0.000 0.000 0.191 0.000 0.028

rs181584053 Met1836Ile 0.000 0.000 0.004 0.017 0.003 0.000 0.003 0.000

rs370967544 Ile246Thr 0.013 0.000 0.000 0.002 0.018 0.017 0.010 0.018

rs144913986 Val106Leu 0.200 0.246 0.000 0.045 0.122 0.000 0.003 0.155

rs775459440 Ala1813Thr 0.004 0.000 0.000 0.002 0.000 0.058 0.000 0.000

rs761396961 Asp74Asn 0.000 0.000 0.054 0.001 0.000 0.000 0.000 0.018

rs777929615 Ser197Pro 0.000 0.000 0.000 0.000 0.039 0.000 0.000 0.000

rs533735013 Arg388Gln 0.000 0.010 0.000 0.003 0.003 0.021 0.000 0.000

rs769359406 Pro469Leu 0.004 0.000 0.000 0.009 0.000 0.000 0.029 0.000

rs146652221 Arg352Gln 0.054 0.355 0.066 0.132 0.009 0.000 0.000 0.031

rs371504856 Arg292Gln 0.025 0.000 0.000 0.006 0.000 0.000 0.000 0.031

rs79007254 Arg414Gln 0.000 0.000 0.000 0.005 0.003 0.042 0.000 0.000

rs780349816 Pro464Leu 0.012 0.000 0.000 0.002 0.000 0.005 0.003 0.000

rs752388322 Ile1675Thr 0.004 0.000 0.000 0.003 0.000 0.000 0.000 0.000

rs144435434 Gly60Ala 0.000 0.000 0.132 0.000 0.003 0.005 0.000 0.000

rs147919147 Asp1506Glu 0.000 0.000 0.000 0.013 0.003 0.000 0.000 0.000

rs767412535 Gly49Arg 0.007 0.000 0.000 0.000 0.000 0.012 0.006 0.000

rs377199268 Pro233Thr 0.000 0.000 0.000 0.000 0.006 0.201 0.000 0.000

rs770837589 Arg1900Cys 0.000 0.000 0.000 0.003 0.000 0.000 0.000 0.000

rs757040989 Arg159Gln 0.000 0.000 0.000 0.001 0.009 0.006 0.016 0.000

rs751264874 Gln1082Arg 0.036 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Figure S1 RNAseq of adult tissues from Roadmap Epigenomics. Darker black represents a higher density of reads mapped to the genome through RNAseq. Isoform 1 is identified by expression of exon 1, found on the leftmost side. This figure shows that isoform 1 is used in only a few cell types, such as neural cells, while transcripts not including exon 1 are found in more tissue types.

Figure S2 FANTOM CAGE data for either TSS1 or TSS2 of SHROOM3 in 1,830 datasets. More datasets use TSS2 as a start site than use TSS1, and with higher positive strand values. Below each plot are the top ten tissues, with noticeable elevation of neuron and astrocyte cells using TSS1 and a list of more diverse organs that use TSS2.

Figure S3 Transcript mapping of SHROOM3 using publicly available RNAseq datasets. BLAST analysis was performed using SHROOM3/Shroom3 transcripts (NM_015756/NM_020859) or NPHS2/Nphs2 and plotted as reads mapped per million reads sequenced (RPM) for six different clusters of RNAseq datasets. Only whole kidney, isolated glomeruli, and single cell podocytes have expression of NPHS2/Nphs2. Below the RPM plots are the Bowtie2 alignments of transcripts with a zoomed in view of the secondary transcriptional start site in either mouse or human.

Figure S4 Transcripts in individual podocyte RNAseq datasets. Three transcripts of the mouse are shown above in red. Reads from five datasets for both the isolated glomeruli and for individual podocytes were aligned and shown for read depth. A third and novel transcript is shown in one of the isolated podocyte RNAseq datasets, enlarged in the red box. This transcript is predicted to generate a protein of 45.9 kDa that contains only the ASD2 domain.

Figure S5 Immunoprecipitation of SHROOM3 from primary human podocytes. Two SHROOM3 antibodies and a negative antibody control (bottom) were used to capture proteins from cellular extracts made from two separate isolates of primary human podocytes. Captured proteins were then run on SDS-PAGE and detected using either silver staining (left) or a western blot using a SHROOM3 antibody (right). Protein bands were identified at ~45 kDa, the predicted size of the new SHROOM3 isoform 4, and at ~190 kDa for the isoform 2/3 predicted from Figure S4.

Figure S6 HiChIP data for GM12878, a cell line without SHROOM3 expression. On top is shown the DHS-Linkage, a compiled dataset for DNaseI-hypersensitive sites (DHS) of ~100 cell types of ENCODE correlating linkage in gene regulation areas. This suggests that regulation is correlated from before the first TSS through the second TSS. The GM12878 cell line does not express SHROOM3. The HiChIP data for cohesin is shown in the middle and shows where cohesin, a factor involved in chromosome organization, links multiple regions of the genome together. Shown below the DHS data are the global interaction sites for HiChIP, and below that are the interaction sites directly around the rs17319721 variant. On the bottom is the human genome browser showing the location of SHROOM3, GWAS SNPs (green), H3K27Ac marks of ENCODE, DNaseI hypersensitive sites, transcription factor binding sites, and SwitchGear TSS sites. Boxed in black is the location of rs17319721 and boxed in red is the TSS2 site.

Figure S7 Transcription Factors (TFs) from ENCODE near rs17319721 and TSS2. The binding of TFs from Phase 1 transcription factor ChIP-Seq data, as determined by a threshold of the ENCODE consortium, for the region around rs17319721 (hg19: chr4:77,361,347-77,376,346) or the TSS2 (hg19: chr4:77487991-77513325). A total of 53 factors are bound to the TSS2 location and 10 near rs17319721 with 8 TFs overlapping between the two datasets including the well-known linker CTCF.

Figure S8 Pathway enrichment of TFs from all ENCODE cell ChIPseq and those of HEK293 for rs17319721 and TSS2 sites. STRING analysis was performed on the list of TFs from Figure S7 (top) or those TFs found in the HEK293 ChIP-Seq (bottom) for either rs17319721 alone, TSS2 alone, or found in both. It should be noted that TCF7L2 is found in the TSS2 site for both panels.

Figure S9 Testing of top five SNPs from RegulomeDB with EMSA. Each of the SNPs in LD were processed through RegulomeDB with scores shown: rs17319721 (2b: http://www.regulomedb.org/snp/chr4/77368846); rs1398016 (5); rs17253722 (5); rs28418670 (6); rs11724003 (5); rs1398018 (6); rs2870238 (No Data); rs4859682 (5); rs13146355 (5); rs9992101 (5); rs10023335 (No Data). Shifting of the major and minor allele in three cell lines that have variable expression of SHROOM3 for five probes. Cold outcompetition was performed on the HEK293 binding using 40X non-biotin tagged oligo probe in addition to the normal biotin probe. Two probes show promising shifting; however, rs4859682 does not have changes in binding to the primary kidney nuclear extracts (red box).

Figure S10 Removal of the entire SHROOM3 GWAS LD block changes regulation similar to rs17319721. A) A CRISPR/Cas9 system was designed to remove the entire LD block from C1 to C2, resulting in a 223 bp fragment, amplified by the F and R primers, that was used to detect successful editing within single cell isolates. B) Change in expression of the E2E4 and E8E9 primer sets following the LD block removal as determined by qRT-PCR. C) Number of modified alleles (x-axis) was determined by quantifying the F+R PCR for four separate single cell isolates and plotted relative to the E8E9 expression levels (y-axis), showing a correlation between the expression change and the number of modified alleles within the cell.

Figure S11 Molecular Phylogenetic analysis by Maximum Likelihood for full-length SHROOM3 sequences. It can be noted that species included in our evolution range from mammals to reptiles (such as Anolis) and birds (such as Gallus). Methods for tree generation: The evolutionary history was inferred by using the Maximum Likelihood method based on the Tamura-Nei model. The bootstrap consensus tree inferred from 1000 replicates is taken to represent the evolutionary history of the taxa analyzed. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) are shown next to the branches. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach, and then selecting the topology with superior log likelihood value. The analysis involved 39 nucleotide sequences. All positions with less than 80% site coverage were eliminated. That is, fewer than 20% alignment gaps, missing data, and ambiguous bases were allowed at any position. There was a total of 5,856 positions in the final dataset.

Figure S12 Sliding window for motif discovery of Shroom3 open reading frame. A) The conservation of each amino acid was determined using the 39 sequences from Figure S11. B-E) The scores for each position were then put on a sliding window, i.e., summing the values for position X with the number of amino acids Y upstream and downstream to equal Z codons in the window. This was performed for a window of 11 (B), 21 (C), 51 (D), and 101 (E) codons. For example, the 21-codon window value at position 101 is the combined evolutionary conservation values for amino acids 90-111, and at position 102 for amino acids 91-112.
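The sliding-window summation described for Figure S12 can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes a symmetric window of Y residues on each side of position X (so Z = 2Y + 1), and it assumes that windows are simply truncated at the ends of the protein. With Y = 10 the window for position 101 covers residues 91-111, whereas the example ranges quoted in the legend span 22 residues, so the exact boundaries used in the original analysis may differ. The function name `sliding_window_sum` and the toy scores are hypothetical.

```python
def sliding_window_sum(scores, window):
    """Sum per-residue scores over a window centred on each position.

    scores: per-residue conservation scores (index 0 = residue 1).
    window: odd number of codons Z; Y = (Z - 1) // 2 residues are taken
            on each side of the centre position.
    Windows are truncated at the sequence ends (an assumption).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number of codons")
    half = window // 2
    n = len(scores)
    summed = []
    for i in range(n):
        start = max(0, i - half)
        end = min(n, i + half + 1)
        summed.append(sum(scores[start:end]))
    return summed


# Toy input: a uniform score of 1.0 at each of 200 residues.
toy_scores = [1.0] * 200

# Window sizes used in panels B-E of Figure S12.
for size in (11, 21, 51, 101):
    windowed = sliding_window_sum(toy_scores, size)
    # Report the value at residue 101 (index 100) and at residue 1.
    print(size, windowed[100], windowed[0])
```

Larger windows smooth the conservation profile and highlight extended conserved domains, while smaller windows are better suited to short linear motifs.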

Figure S13 Analysis of variants from gnomAD. Allele frequency percentages for the five variants within SHROOM3 with the top impact scores (given in parentheses) in the 8 populations of gnomAD.

Figure S14 Biophysical characterization of the P1244L variant. A-B) Isothermal titration calorimetry (ITC) results of binding of the SHROOM3 (A) and SHROOM3 P1244L (B) peptides to 14-3-3. C) Affinity capture of recombinant 14-3-3 by SHROOM3 phosphorylated peptides fixed to avidin in 4 separate capture experiments. The quantification of 6 independent capture experiments is shown below, with the error bars representing the SEM. Significance was determined with a Student’s t-test.

SIGNIFICANCE STATEMENT

Although the genetics of CKD are beginning to be deciphered, interpretation of how variants result in disease remains a challenge that is increasing as more and more genomes are being sequenced. In this paper, we use our workflow designed to assess variants to develop mechanistic insights into CKD variants, highlighting new knowledge of both common noncoding and rare coding variants within SHROOM3. The detailed knowledge gleaned for function of SHROOM3 in podocytes advances novel pathways and mechanisms for CKD.