
ARTICLE

https://doi.org/10.1038/s41467-021-22266-1 OPEN

Single cell regulatory landscape of the mouse kidney highlights programs and disease targets

Zhen Miao¹,²,³,⁸, Michael S. Balzer¹,²,⁸, Ziyuan Ma¹,²,⁸, Hongbo Liu¹,², Junnan Wu¹,², Rojesh Shrestha¹,², Tamas Aranyi¹,², Amy Kwan⁴, Ayano Kondo⁴, Marco Pontoglio⁵, Junhyong Kim⁶, Mingyao Li⁷, Klaus H. Kaestner²,⁴ & Katalin Susztak¹,²,⁴ ✉

Determining the epigenetic program that generates unique cell types in the kidney is critical for understanding cell-type heterogeneity during tissue homeostasis and injury response. Here, we profile open chromatin and gene expression in developing and adult mouse kidneys at single cell resolution. We show critical reliance of gene expression on distal regulatory elements (enhancers). We reveal key cell type-specific transcription factors and major gene-regulatory circuits for kidney cells. Dynamic chromatin and expression changes during nephron progenitor differentiation demonstrate that podocyte commitment occurs early and is associated with sustained Foxl1 expression. Renal tubule cells follow a more complex differentiation, where Hnf4a is associated with proximal and Tfap2b with distal fate. Mapping single nucleotide variants associated with kidney disease implicates critical cell types, developmental stages, genes, and regulatory mechanisms. The single cell multi-omics atlas reveals key chromatin remodeling events and gene expression dynamics associated with kidney development.

¹Renal, Electrolyte, and Hypertension Division, Department of Medicine, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA. ²Institute for Diabetes, Obesity, and Metabolism, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA. ³Graduate Group in Genomics and Computational Biology, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA. ⁴Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA. ⁵Epigenetics and Development Laboratory, Université de Paris Inserm U1151/CNRS UMR 8253, Institut Necker Enfants Malades, Paris, France. ⁶Department of Biology, University of Pennsylvania, Philadelphia, PA, USA. ⁷Department of Epidemiology and Biostatistics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA. ⁸These authors contributed equally: Zhen Miao, Michael S. Balzer, Ziyuan Ma. ✉ email: [email protected]

NATURE COMMUNICATIONS | (2021) 12:2277 | https://doi.org/10.1038/s41467-021-22266-1 | www.nature.com/naturecommunications

The mammalian kidney maintains fluid, electrolyte, and metabolite balance of the body and plays an essential role in blood pressure regulation and red blood cell homeostasis. The human kidney makes roughly 180 liters of primary filtrate each day that is then reabsorbed and modified by a long tubule segment. To perform this highly choreographed and sophisticated function, the kidney contains close to 20 highly specialized epithelial cell types. The renal glomerulus acts as a 60 kD size-selective filter. The proximal part of the tubules is responsible for reclaiming more than 70% of the primary filtrate, which is done via unregulated active and passive paracellular transport¹, while the loop of Henle plays an important role in concentrating the urine. The distal convoluted tubule is critical for regulated sodium reabsorption. The last segment of kidney tubules is the collecting duct, where the final concentration of the urine is determined via regulation of water channels, acid, or base secretion. Understanding the development of these diverse cell types in the kidney is essential to understand kidney homeostasis, disease, and regeneration.

The mammalian kidney develops from the intermediate mesoderm via a complex interaction between the ureteric bud and the metanephric mesenchyme². In the mouse kidney, Six2 marks the self-renewing nephron progenitor population³. The nephron progenitors commit and undergo a mesenchymal-to-epithelial transformation giving rise to the renal vesicle³. The renal vesicle then undergoes segmentation and elongation, giving rise to epithelia from the podocytes to the distal convoluted tubules, while the ureteric bud becomes the collecting duct. Unbiased and hypothesis-driven studies have highlighted critical stages and drivers of early kidney development⁴, which have been essential for the development of in vitro kidney organoid differentiation protocols⁵⁻⁷. However, cells in organoids are still poorly differentiated, and improving cellular differentiation and maturation of these structures remains a major challenge⁸. Thus, the understanding of late kidney development, especially the cell type-specific driver transcription factors (TFs), is of great importance⁹⁻¹¹. Alteration in Wnt, Notch, Bmp, and Egf signaling significantly impacts cellular differentiation, but only a handful of TFs that directly drive the differentiation of distinct segments have been identified, such as Pou3f3, Lhx1, Irx2, Foxc2, and Mafb¹². Further understanding of the terminal differentiation program could aid the understanding of kidney disease development.

While single cell RNA sequencing (scRNA-seq) has improved our understanding of kidney development in mice and humans⁹,¹⁰,¹³,¹⁴, it provides limited information on TFs, which are usually expressed at low levels. It is equally difficult to understand how genes are regulated from scRNA-seq data alone. Chromatin state profiles, on the other hand, determine the gene expression potential and can pinpoint the availability of TF binding sites. Together with gene expression, open chromatin profiles can define the gene regulatory logic, which is the fundamental element of cell identity. However, there is a scarcity of open chromatin information by Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) or chromatin immunoprecipitation (ChIP) data by ChIP-seq related to kidney postnatal development. In addition, epigenetic changes observed in bulk analyses mostly represent changes in cell composition, rather than cell type-specific changes¹⁵, making it challenging to interpret bulk ATAC-seq data. The publicly available mouse kidney ATAC-seq data contain a limited number of cells from adult tissues, and therefore do not provide insight into chromatin changes during development¹⁶,¹⁷.

Here, we generate a single cell open chromatin and corresponding expression survey for the developing and adult mouse kidney, which is available to the community via searchable websites (susztaklab.com/developing_adult_kidney/snATAC/ for snATAC-seq data, susztaklab.com/developing_adult_kidney/scRNA/ for scRNA-seq data, and susztaklab.com/developing_adult_kidney/igv/ for IGV view of peak tracks). Using this atlas, we produce an epigenome-based classification of developing and mature cells and define cell type-specific regulatory networks. We also investigate key TFs and cell–cell interactions associated with developmental cellular transitions. Finally, we use the single cell open chromatin information to pinpoint putative target genes and cell types of several chronic kidney disease noncoding genome-wide association study (GWAS) loci.

Results

Single cell accessible chromatin landscape of the developing and adult mouse kidneys. To characterize the accessible chromatin landscape of the developing and adult mouse kidneys at single cell resolution, we performed single nuclei ATAC-seq (snATAC-seq) on kidneys of mice on postnatal day 0 (P0) and at 3 and 8 weeks (P21 and P56) of age (Fig. 1a). In mice at birth, the nephron progenitors are still present and nephrons are formed until around day 21¹³,¹⁸. In parallel, we also performed bulk (whole kidney) ATAC-seq analysis at matched developmental stages. Following sequencing, we aggregated all high-quality mapped reads in each sample irrespective of barcode. The combined snATAC-seq dataset from all samples showed the expected insert size periodicity (Supplementary Fig. 1a) with a strong enrichment of signal at transcription start sites (TSS), indicating high data quality (Supplementary Fig. 1b). The snATAC-seq data showed high concordance with the bulk ATAC data (Spearman correlation coefficient >0.84 between matched stages, see “Methods” section, Supplementary Fig. 1c).

We next derived cell type annotations from the open chromatin information. After conducting stringent filtering on the number of unique fragments, promoter ratio, and mitochondria ratio (see “Methods” section, Supplementary Fig. 2a), we kept 28,316 cells across the samples (Fig. 1b). Cells were then clustered using SnapATAC¹⁹, which binned the whole genome into 5 kb regions to address the sparsity of the data (see “Methods” section). Prior to clustering, we used Harmony²⁰, an iterative batch correction method, to correct for variability across samples (Supplementary Fig. 2b). Using batch-corrected low dimensional embeddings, we retained 13 clusters, all of which had consistent representation across the number of peaks, samples, and read depth profiles (Fig. 1b and Supplementary Fig. 2c, d).

To determine the cell types represented by each cluster, we examined chromatin accessibility around the TSS and gene body regions of the known cell type-specific marker genes²¹. Based on the accessibility of the known marker genes, we identified clusters representing nephron progenitors, endothelial cells, podocytes, proximal tubule segment 1 and segment 3 cells, loop of Henle, distal convoluted tubule, connecting tubule, collecting duct principal cells, collecting duct intercalated cells, stromal, and immune cells. Fig. 1d and Supplementary Fig. 3 show chromatin accessibility information for key cell type marker genes, such as Uncx and Cited1 for nephron progenitors, Nphs1 and Nphs2 for podocytes, Akr1c21 for both segments of proximal tubules, Slc34a1 and Slc5a2 for proximal convoluted tubules (PCT), Kap for proximal straight tubules (PST), Slc12a1 and Umod for loop of Henle, Slc12a3 and Pvalb for distal convoluted tubule, Trpv5 for connecting tubule, Aqp2 and Fxyd4 for principal cells, Atp6v1g3 and Atp6v0d2 for intercalated cells, Egfl7 for endothelial cells, C1qb for immune cells, and Col3a1 for different types of stromal cells, respectively²¹. As expected, some clusters such as nephron progenitors and stromal cells were enriched in the developing kidney (P0).

In order to identify cell type-specific open chromatin regions, we conducted peak calling using MACS2²² on each cell type

separately. The peaks were then merged to obtain a comprehensive open chromatin set. We found that the single nuclei open chromatin set showed good concordance with bulk ATAC-seq samples, with most of the peaks in bulk ATAC-seq data captured by the single nuclei data. On the other hand, single nuclei chromatin accessibility data showed roughly 50% more accessible chromatin peaks (total of 300,693 peaks) than the bulk ATAC-seq data (Fig. 1e, see “Methods” section). As expected, in general, the overlap with bulk samples was greater for common cell types, such as PT and DCT, than for rare cells, such as immune cells, indicating that the snATAC-seq data were particularly powerful in identifying open chromatin areas that are accessible in single cell types.

In parallel, we also generated a single cell RNA sequencing (scRNA-seq) atlas for mouse kidney samples at P0 and P56. Rigorous quality control yielded a set of 43,636 single cells (Fig. 1b and Supplementary Data 1). Quality control metrics such as gene counts, UMI counts, and mitochondrial gene percentage along with batch correction results are shown in Supplementary Fig. 4a–d. We obtained 17 clusters by unbiased clustering²³ and batch effect correction (Supplementary Fig. 4e, f). On the basis of marker gene expression, we identified kidney epithelial, immune, and endothelial cells (Fig. 1f and Supplementary Fig. 5a–c), closely resembling the clustering obtained from snATAC-seq analysis. We then conducted differential expression (DE) analysis on the clusters and identified key marker genes for each cell type (Supplementary Data 2). Correcting gene expression matrices for ambient RNA with SoupX²⁴ yielded very similar clusters. Average gene expression was highly correlated before and after ambient RNA correction (Supplementary Figs. 6 and 7, see “Methods” section). DE genes for cell types retrieved after correction for ambient RNA are accessible in Supplementary Data 3.

To compare the consistency between cluster assignment in the snATAC-seq data and the scRNA-seq data, we next pooled each snATAC cluster, derived gene activity scores for the top 3000 highly variable genes, and computed the Pearson’s correlation coefficient between each snATAC cluster and scRNA cluster (see “Methods” section). This analysis indicated good concordance between the two datasets (correlation for P0 samples see Fig. 1g, for adult samples see Supplementary Fig. 8). While the correlation between gene expression and inferred gene activity score was high, we noted some differences in cell proportions, which were likely related to the sample preparation-induced cell drop-out (Supplementary Figs. 2d and 5a). Consistent with previous observations that single cell preparations are more biased towards immune cells than single nuclear preparations²⁵, we noted that the immune cell repertoire was limited in the snATAC-seq dataset; on the other hand, stromal cells were better captured by the nuclear preparation. Further sub-analysis of the stromal cluster indicated a heterogeneous stromal cell population in both P0 and adult samples (Supplementary Figs. 9 and 10). Our data are in keeping with recent findings highlighting considerable heterogeneity in the developing kidney interstitium²⁶. DE genes for stroma subclusters are shown in Supplementary Data 4. Additional experimental validation is needed to confirm and further delineate stromal cells.

To allow the interactive use of this dataset by the community, we not only made the raw data available but also the processed dataset via our searchable website (susztaklab.com/developing_adult_kidney/snATAC/ for snATAC-seq data, susztaklab.com/developing_adult_kidney/scRNA/ for scRNA-seq data, and susztaklab.com/developing_adult_kidney/igv/ for IGV view of peak tracks). Supplementary Fig. 11 shows an example view of Ace2 on the website.

Characterization of the cell type-specific regulatory landscape. To characterize different genomic elements captured by snATAC-seq data, we first stratified the genome into promoters, exons, 5′ and 3′ untranslated regions, introns, and distal regions using the GENCODE annotation²⁷ (see “Methods” section). We noticed that, concordant with bulk ATAC-seq data, most peaks in snATAC-seq data were in regions characterized as distal elements or introns, and relatively small portions (<10%) were in promoter or 5′ untranslated regions (Supplementary Fig. 12a). Moreover, there were more developmental stage-specific distal and intronic regions (Supplementary Fig. 12b), which is consistent with developmental stage-specific DNA demethylation of the kidney²⁸. In addition, around half of the open chromatin peaks overlapped with previously published P0 or adult H3K27Ac ChIP-seq signals²⁹ (Supplementary Fig. 12c, see “Methods” section). Taken together, these results indicated a critical role of distal regulatory elements.

To study the open chromatin heterogeneity across cell types and developmental stages, we derived a cell type-specific accessible chromatin landscape by conducting pairwise Fisher’s exact tests for each peak between every pair of clusters (Benjamini–Hochberg adjusted q value ≤ 0.05, see “Methods” section). In total, we identified 60,684 differentially accessible open chromatin peaks (DAPs) across the 13 cell types (Supplementary Data 5 and Fig. 2a). Among these peaks, most showed high specificity for a single cluster. However, we noticed overlaps between the PCT- and PST-specific peaks, as well as between the loop of Henle and distal convoluted tubule segments, which is consistent with their biological similarities. In addition to the cell type-specific peaks, we also found some cell type-independent open chromatin areas (present across nephron progenitors, podocytes, proximal tubule, and loop of Henle cells), likely consisting of basal housekeeping genes and regulatory elements (Supplementary Fig. 12d).

We noticed that many genes had strong cell type-specific DAPs at their TSS. Other genes, however, had accessible chromatin at their TSS in multiple cell types. For example, Umod, a marker gene for loop of Henle, showed accessible chromatin at its TSS in multiple tubule cell types (Supplementary Fig. 12e). Rather than with its TSS, cell type-specific chromatin accessibility of Umod strongly correlated with an upstream peak, which is likely an enhancer region, as indicated by the H3K27Ac ChIP-seq signal. In addition, we noticed enrichment of intronic regions and distal elements (Supplementary Fig. 13a, b) in cell type-specific DAPs, indicating their role in cell type-specific gene regulation.

These observations motivated us to study cis-regulatory elements using the snATAC-seq data and scRNA-seq data. We reasoned that a subset of the cell type-specific regulatory elements should correlate with cell type-specific gene expression. Inspired by Zhu et al.³⁰, we aligned DAPs and DEGs from our snATAC-seq and scRNA-seq datasets, and inferred the putative regulatory peak-gene pairs by their proximity (see “Methods” section). These cis-regulatory element predictions were supported by comparison with cis-regulatory elements inferred previously³¹, as we recapitulated roughly 20% of elements from that analysis. In addition, our analysis was able to identify several known enhancers, such as those for Six2 and Slc6a18³¹,³² (Supplementary Fig. 13c).

To quantify the contribution of cis-regulatory elements, we analyzed peak co-accessibility patterns using Cicero³³. Using a heuristic co-accessibility score of 0.4 as a cutoff, we identified 232,380 and 206,701 cis-regulatory element links in the P0 and adult data, respectively. However, it is worth noting that the interaction between genomic elements is complex and is not limited to co-accessible peaks.

Given the complex interaction between genomic regions, we next looked into identifying key TFs that occupy the cell type-specific open chromatin regions. Until now, information on cell type-specific TFs in the kidney has been scarce. Therefore, we performed motif enrichment analysis on the cell type-specific


Figure 1 (image not reproduced; recoverable panel labels follow).
a Workflow for P0 and adult kidneys: kidney tissue digestion, single cell/single nucleus extraction, GEM capture, snATAC-seq and scRNA-seq analysis.
b UMAPs of snATAC (28,316 cells) and scRNA (43,636 cells). snATAC labels: NP, Podo, PCT, PST, LOH, DCT, CNT, PC, IC, Endo, Immune, Stroma 1, Stroma 2. scRNA labels: NP, Podo, Early PT, PCT, PST, LOH, DCT, PC, IC, Endo, Macro, Neutro, Stroma 1, Stroma 2, Proliferating.
c UMAPs of snATAC and scRNA data colored by batch.
d Genome browser tracks per cluster (NP, Podo, PCT, PST, LOH, DCT, CNT, PC, IC, Endo, Immune, Stroma1, Stroma2) at: Uncx (chr5:139,543,229-139,545,299), Nphs2 (chr1:156,308,359-156,313,692), Akr1c21 (chr13:4,572,243-4,575,528), Slc12a1 (chr2:125,151,789-125,154,132), Slc12a3 (chr8:94,327,939-94,331,469), Trpv5 (chr6:41,679,664-41,683,749), Aqp2 (chr15:99,578,086-99,580,924), Atp6v1g3 (chr1:138,272,545-138,275,211), Egfl7 (chr2:26,580,190-26,585,159), C1qb (chr4:136,885,198-136,887,332), Col3a1 (chr1:45,310,499-45,312,950).
e Bar chart: number of peaks (×1,000) in single cell and bulk data, split into overlapped and not overlapped.
f Violin plots of marker gene expression by scRNA-seq cell type.
g Correlation heatmap between scRNA-seq and snATAC-seq cell types (color scale −0.7 to 0.7).

Fig. 1 snATAC-seq and scRNA-seq identified major cell types in developing and adult mouse kidney. a Schematics of the study design. Kidneys from P0 and adult mice were processed for snATAC-seq and scRNA-seq followed by data processing and analysis including cell type identification and peak calling (artwork: own production and from https://smart.servier.com, license https://creativecommons.org/licenses/by-sa/3.0/). b UMAP embeddings of snATAC-seq data and scRNA-seq data. Using marker genes, cells were annotated into nephron progenitors (NP), collecting duct intercalated cells (IC), collecting duct principal cells (PC), proximal convoluted and straight tubule (PCT and PST), loop of Henle (LOH), distal convoluted tubules (DCT), stromal cells (Stroma), podocytes (Podo), endothelial cells (Endo), and immune cells (Immune). In scRNA-seq data, the same cell types were identified, along with an additional proliferative population, and immune cells were clustered into neutrophils and macrophages. c UMAP embeddings of snATAC-seq and scRNA-seq data colored by P0 and adult batches. d Genome browser view of read density in each snATAC-seq cluster at cell type marker gene transcription start sites. Additional marker gene examples are shown in Supplementary Fig. 3a. e Comparison of peaks identified from snATAC-seq data and bulk ATAC-seq data. Peaks that are identified in both datasets are colored blue, and peaks that are dataset-specific are gray. f Violin plots showing cell type-specific gene expression in scRNA-seq data. g Heatmap showing Pearson’s correlation coefficients between snATAC-seq gene activity scores and gene expression values in P0 data. Each row represents a cell type in scRNA-seq data and each column represents a cell type in snATAC-seq data. The correlation of the adult dataset is shown in Supplementary Fig. 3b.
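The cluster-to-cluster comparison summarized in panel g can be illustrated with a small sketch. It assumes that per-cluster average gene activity scores (snATAC-seq) and average expression values (scRNA-seq) are already available over a shared gene list; the function names and all numbers below are hypothetical toy values, and this is not the authors' pipeline, which is described in the “Methods” section.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither vector is constant

def cluster_correlation(activity, expression):
    """Rows: scRNA-seq clusters. Columns: snATAC-seq clusters (as in Fig. 1g)."""
    return {
        rna: {atac: pearson(expr, act) for atac, act in activity.items()}
        for rna, expr in expression.items()
    }

# Hypothetical per-cluster averages over the same four genes:
# Nphs1, Slc34a1, Umod, Aqp2 (toy numbers, not data from the study).
activity = {
    "Podo": [9.0, 1.0, 0.5, 0.2],
    "PCT": [0.5, 8.0, 1.0, 0.3],
    "LOH": [0.3, 1.2, 7.5, 0.6],
}
expression = {
    "Podo": [12.0, 0.4, 0.2, 0.1],
    "PCT": [0.2, 15.0, 0.8, 0.1],
    "LOH": [0.1, 0.9, 11.0, 0.5],
}

corr = cluster_correlation(activity, expression)
for rna, row in corr.items():
    best = max(row, key=row.get)  # snATAC cluster with the highest correlation
    print(rna, "->", best, round(row[best], 3))
```

In this toy input each scRNA-seq cluster correlates best with the snATAC-seq cluster of the same name, which is the diagonal pattern a concordant heatmap would show.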


Figure 2 (image not reproduced; recoverable panel labels follow).
a Left: heatmap of 60,684 cell type-specific peaks (normalized accessibility signal) for NP, Podo, PCT, PST, LOH, DCT, PC, IC, Endo, Immune, Stroma 1, and Stroma 2. Middle: Homer enrichment. Right: cell type-specific TF expression (normalized gene expression, scale −1.5 to 1.5; columns IC, PT, PC, NP, DCT, LOH, Podo, Endo, Stroma, Immune).

Motif      % target   P-val     TF (expression panel)
Six2       35.73      1e-270    Six2
Hoxc9      18.33      1e-151    Hoxc9
Wt1        26.07      1e-125    Wt1
Mafb       12.79      1e-20     Mafb
Hnf4a      29.78      1e-491    Hnf4a
Ppara      12.14      1e-267    Ppara
Bhlhe41    18.71      1e-31     Bhlhe41
Esrrb      24.56      1e-20     Esrrb
Foxa1      20.89      1e-2      Foxa1
Vdr        4.74       1e-5      Vdr
Elf5       18.49      1e-34     Elf5
Tcfcp2l1   8.34       1e-74     Tfcp2l1
Erg        64.77      1e-1009   Erg
Sox17      22.37      1e-189    Sox17
PU.1       34.78      1e-410    Spi1
Batf       13.82      1e-25     Batf
Twist1     3.05       1e-10     Twist1
Nr2f2      29.11      1e-24     Nr2f2

b Heatmap of regulon activity (off/on) across cells, grouped by cell cluster (Podo, PCT, Stroma, Early PT, NP, LOH, PST, Proliferating).
c tSNE of regulon density (low to high) with clusters LOH, PST, PCT, Early PT, Stroma, Podo, Proliferating, and NP.
d tSNE panels of regulon activity, TF gene expression (low to high), and predicted target gene expression (low to high) for Hnf1a, Uncx, Ppargc1a, Hmgb3, and Mafb, with Uncx targets Eya1, Hoxc8, Pax2, Spock2, and Wnt4.

Fig. 2 Cell type-specific gene regulatory landscape of the mouse kidney. a Left panel: Heatmap showing all cell type-specific differentially accessible peaks (DAPs) (yellow: open chromatin, blue: closed chromatin) (peak loci are provided in Supplementary Data 5). Middle panel: Examples of cell type-specific motif enrichment analysis using Homer (full results are shown in Supplementary Data 6). Right panel: TF expression z-score heatmap that corresponds to the motif enrichment in each cell type. b Heatmap of cell type-specific regulons, as inferred by the SCENIC algorithm. Regulon activity was binarized to “on” (black) or “off” (white). c tSNE representation of regulon density as a surrogate for stability of regulon states, as inferred by the SCENIC algorithm. d tSNE depiction of regulon activity (“on-blue”, “off-gray”) and TF gene expression (red scale) of exemplary regulons for proximal tubule (Hnf1a), nephron progenitors (Uncx), loop of Henle (Ppargc1a), proliferating cells (Hmgb3), and podocytes (Mafb). Examples of target gene expression of the Uncx regulon (Eya1, Hoxc8, Pax2, Spock2, and Wnt4) are shown in purple scale. Expression of target genes of Hnf1a, Six2, Ppargc1a, Hmgb3, and Mafb is shown in Supplementary Fig. 15a.
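The differentially accessible peaks shown in panel a were defined in the Results by pairwise Fisher's exact tests with Benjamini–Hochberg adjustment (q ≤ 0.05). The sketch below illustrates that kind of test for a single pair of clusters using only the Python standard library. The count layout (cells with and without a peak in each cluster), the function names, and the toy counts are assumptions made for illustration; the authors' exact procedure is given in their “Methods” section.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted q values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts per peak:
# (open in cluster A, closed in A, open in cluster B, closed in B).
peaks = {
    "peak_1": (40, 10, 5, 45),
    "peak_2": (20, 30, 22, 28),
    "peak_3": (30, 20, 10, 40),
}
pvals = [fisher_exact_two_sided(*counts) for counts in peaks.values()]
qvals = benjamini_hochberg(pvals)
daps = {name: q for name, q in zip(peaks, qvals) if q <= 0.05}
print(sorted(daps))
```

A full analysis would repeat this for every pair of clusters and every peak; for large count tables a dedicated statistics library would be a more practical choice than this exhaustive summation.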

NATURE COMMUNICATIONS | (2021) 12:2277 | https://doi.org/10.1038/s41467-021-22266-1 | www.nature.com/naturecommunications 5 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-021-22266-1 open chromatin regions using HOMER (see “Methods” both data modalities, we identified that the podocyte precursors section)34. The full list of cell type-specific TF binding motifs is differentiated early from the nephron progenitor pool (Fig. 3a, b). shown in Supplementary Data 6. Since several TFs have identical The tubule cell trajectory was more complex with a shared inter- or similar binding sequences, we next correlated motif enrich- mediate stage and later differentiation into proximal tubules and ment with scRNA-seq TF expression. Using this combined motif distal tubules/loop of Henle (Fig. 3a, b and Supplementary enrichment and gene expression approach, we have defined the Fig. 17a–c). We also integrated snATAC-seq and scRNA-seq mouse kidney cell type-specific TF landscape. Examples include data to obtain a single trajectory (see “Methods” section). The cell Six2 and Hoxc9 in nephron progenitors, Wt1 and Mafb in types in this dataset were correctly mapped and the trajectory podocytes, Hnf4a, Ppara, and Bhle41 in proximal tubules, Esrrb resembled the path observed in individual analyses (Supplementary and Foxa1 in loop of Henle, Vdr in distal convoluted tubule, Elf5 Fig. 17e–g). The robustness of developmental trajectories was fur- in principal cells, Tcfcp2l1 in intercalated cells, Erg and Sox17 in ther supported by RNA velocity analysis using Velocyto37 (Sup- endothelial cells, Spi1 and Batf in immune cells, and Twist1 and plementary Fig. 17d), and by comparing with previous human and Nr2f2 in stromal cells (Fig. 2a and Supplementary Fig. 14). mouse kidney developmental studies9,13,14. 
In order to study the putative target genes of TFs, we examined Building on both the SCENIC-generated gene regulatory TF regulon activity using Single-Cell rEgulatory Network network and the robust differentiation trajectories of the Inference and Clustering (SCENIC)35. SCENIC was designed to snATAC-seq and scRNA-seq datasets, we next aimed to under- reveal TF-centered gene co-expression networks. By inferring a stand chromatin dynamics, identify TFs and driver pathways for gene correlation network followed by motif-based filtration, cell type specification. To this end, we first determined variation SCENIC keeps only potential direct targets of each TF as modules in chromatin accessibility along the three differentiation trajec- (regulons). The activity of each regulon in each cell was quantified tories using ChromVAR38. ChromVAR estimates the accessibility and then binarized to “on” or “off” based on activity distribution dynamics of motifs in snATAC-seq data (see “Methods” section). across cells (see “Methods” section). SCENIC was also able to The cell type-specific TF enrichment score matrix is shown in conduct clustering based on the regulon states of each cell. Supplementary Fig. 18a and Supplementary Data 9. We observed SCENIC results (Fig. 2b–d) indicated strong enrichment in Trps1, three different patterns when analyzing genes of interest (Fig. 3c): Hnf1b, Maf, Hnf1a, and Hnf4a regulon activity in proximal (1) Decrease of TF motif accessibility in all lineages. For example, tubules, Hmga2, Hoxc6, Hoxd11, Meox1, Six2, Tcf4, and Uncx in Sox11 motif enrichment score was high in nephron progenitor nephron progenitors, Esrrg, and Ppargc1a in loop of Henle, cells at the beginning of all three trajectories. It then decreased in Hmgb3 in proliferating cells and Foxc1, Foxc2, Foxd1, Lef1, and all three lineages in parallel, underlining the role of Sox11 in early Mafb in podocytes, respectively. As the expression of several of kidney development. 
these TFs was relatively low, possibly exacerbated further by transcript drop-outs, they did not show strong cell type enrichment. The regulon-based analysis, however, showed a very clear enrichment. SCENIC also successfully inferred multiple downstream target genes. The full list of regulons and their respective predicted target genes can be found in Supplementary Data 7; scaled and binarized regulon activity is available in Supplementary Data 8. Examples of regulon activity, corresponding TF expression, and predicted target gene expression are depicted in Fig. 2d and Supplementary Fig. 15a. For example, TFs such as Eya1, Hoxc8, Hoxc9, Pax2, Spock2, and Wnt4 are important downstream targets within the regulon of the nephron progenitor-specific TF Uncx, indicating an important transcriptional hierarchy of nephron development36. Corresponding snATAC-seq tracks for these predicted target genes along with the TF motif are depicted in Supplementary Fig. 15b. By comparing the number of cell type-specific TFs reported by HOMER and SCENIC to that from DEGs in RNA expression data, it became evident that our integrative cis-regulatory analysis with snATAC-seq and scRNA-seq datasets yielded significant benefits in discovering the TF-regulatory network over analyzing transcript data alone (Supplementary Fig. 16a, b).

The regulatory trajectory of nephron progenitor differentiation. All cells in the body differentiate from the same genetic template. Cell type-specific chromatin opening and closing events associated with TF binding changes set up the cell type-specific regulatory landscape, resulting in cell type specification and development. We found that closing of open chromatin regions was the predominant event during nephron progenitor differentiation (Supplementary Fig. 12d). We then evaluated the cellular differentiation trajectory in the snATAC-seq and scRNA-seq datasets (see “Methods” section). We identified multiple nephron progenitor sub-groups (Fig. 3a, b), which will need to be carefully mapped to prior gene expression-driven and anatomical location-driven nephron progenitor sub-classification. Consistently, across

Several other TFs followed this pattern, such as Six2 and Sox9. (2) Cell type-specific maintenance of chromatin accessibility with advancing differentiation. We observed that chromatin accessibility for the Wt1 motif was high initially but declined in proximal tubule and loop of Henle lineages, while its expression increased in the podocyte lineage. This is consistent with the important role of Wt1 in nephron progenitors and podocytes39,40. Other TFs that followed this pattern include Foxc2 and Foxl1. (3) A de novo increase in chromatin accessibility with cell type commitment and advancing differentiation. For example, the chromatin accessibility of the Hnf4a and Pou3f3 motifs increased in proximal tubule and loop of Henle trajectories, respectively, coinciding with the cellular differentiation program41. A large number of TFs followed this pattern, such as Mafb (in podocytes), Hnf4a and Hnf1a (in proximal tubule), Hnf1b (in both proximal tubule and loop of Henle), as well as Esrrb and Tfap2b (in loop of Henle).

Next, we correlated changes in chromatin accessibility-based TF motif enrichment with TF expression and their respective predicted target genes along the trajectories. To this end, we investigated TFs and target genes differentially expressed over scRNA-seq pseudotime (Supplementary Data 10). We noticed a good concordance of time-dependent changes of TF and predicted target gene expression along with motif enrichment, including the lineages for podocytes (e.g., Foxc2, Foxl1, Mafb, Magi2, Nphs1, Nphs2, Plat, Synpo, Thsd7a, Wt1, and Zbtb7c), proximal tubule (e.g., Ace2, Atp1a1, Dab2, Hnf1a, Hnf4a, Hsd17b2, Lrp2, Maf, Slc12a3, Slc22a12, Slc34a1, and Wnt9b), loop of Henle (e.g., Cyfip2, Cytip, Esrrb, Esrrg, Irx1, Irx2, Mecom, Pla2g4a, Pou3f3, Ppargc1a, Stat3, Sytl2, Tfap2b, Thsd4, and Umod), as well as for both proximal tubule and loop of Henle (e.g., Bhlhe40, Hnf1b, and Tmprss2), respectively (Fig. 3c and Supplementary Fig. 18b). Most interestingly, we noticed two distinct patterns of how gene expression was related to chromatin accessibility. While gene expression of TFs increased over pseudotime, their corresponding motif accessibility either increased in parallel (such as Hnf4a and Pou3f3) or was maintained in a lineage-specific manner (such as Wt1). This might indicate different regulatory mechanisms during differentiation.
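The two expression-versus-accessibility patterns described above can be illustrated with a minimal sketch. It assumes that per-bin averages along pseudotime are already available; the function name `classify_motif_pattern`, the `min_rise` cutoff, and all numbers are hypothetical toy values chosen for illustration, not the authors' method or data.

```python
def classify_motif_pattern(tf_expression, motif_accessibility, min_rise=0.05):
    """Label how motif accessibility behaves while TF expression rises.

    Both inputs are per-bin averages ordered by pseudotime.
    `min_rise` is a hypothetical cutoff, not a value from the paper.
    """
    expr_change = tf_expression[-1] - tf_expression[0]
    acc_change = motif_accessibility[-1] - motif_accessibility[0]
    if expr_change <= 0:
        return "expression not increasing"
    if acc_change >= min_rise:
        return "parallel increase"
    return "maintained"


# Toy pseudotime bins (invented numbers, for illustration only).
rising_expression = [0.1, 0.3, 0.6, 0.9]
rising_accessibility = [-0.05, 0.00, 0.05, 0.10]  # Hnf4a-like pattern
stable_accessibility = [0.08, 0.09, 0.08, 0.09]   # Wt1-like pattern

print(classify_motif_pattern(rising_expression, rising_accessibility))
print(classify_motif_pattern(rising_expression, stable_accessibility))
```

Comparing only the first and last bins keeps the sketch short; a real analysis would more plausibly fit a smooth curve over all cells, as the local polynomial regression in Fig. 3c suggests.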

6 NATURE COMMUNICATIONS | (2021) 12:2277 | https://doi.org/10.1038/s41467-021-22266-1 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-021-22266-1 ARTICLE

[Figure 3 graphic. Panels a (snATAC-seq) and b (scRNA-seq): UMAP 1 versus UMAP 2 embeddings colored by pseudotime, with NP, Podo, PT, LOH, and DCT labels. Panel c: chromVAR TF enrichment, TF expression, and predicted target gene expression plotted against pseudotime, with rows for Sox11 (NP), Wt1 (Podo), Hnf4a (PT), Pou3f3 (LOH), and Hnf1b (PT & LOH).]

Fig. 3 The cellular trajectory of nephron progenitor differentiation. a UMAP representation of snATAC-seq nephron progenitor differentiation trajectory towards podocytes, proximal tubule, loop of Henle, and distal convoluted tubule, respectively, as inferred by Cicero. Cells are colored by pseudotime. b UMAP representation of scRNA-seq nephron progenitor differentiation trajectory towards podocytes, proximal tubule, and loop of Henle, respectively, as inferred by Monocle3. Cells are colored by pseudotime. c Pseudotime-dependent chromatin accessibility and gene expression changes along the proximal tubule (red), podocyte (green), and loop of Henle (blue) cell lineages. The first column shows the dynamics of the chromVAR TF enrichment score, the second column shows the dynamics of TF gene expression values, and the third and fourth columns show the dynamics of SCENIC-reported target gene expression values of the corresponding TFs. Error bars denote 95% confidence intervals of local polynomial regression fitting. Additional examples are given in Supplementary Fig. 18b.


We next aimed to interrogate the stage-dependent chromatin dynamics along the identified differentiation trajectory. The differentiation trajectory was binned into 15 developmental steps based on the lineage specification in pseudotime inference (Supplementary Fig. 17b, c). These stages were labeled as NP1-3 (nephron progenitor), IM1-2 (intermediate cells), Podo1-3 (podocytes), PT1-3 (proximal tubule), LOH1-2 (loop of Henle), and DCT (distal convoluted tubule). By studying gene activity score enrichment of prior cell marker-based annotations, we showed that NP3 matches renal vesicle signatures, while IM1-2 and Podo1 match comma-shaped and S-shaped bodies that are committed to tubule and podocyte fates, respectively (Fig. 4b). To study chromatin opening and closing, we conducted differential chromatin accessibility analysis between subsequent stages. To understand the biological processes controlled by the epigenetic changes, we examined the nearby genes and performed functional annotation of these peaks (see “Methods” section). We found that open chromatin profiles were relatively stable in the early precursor stages such as NP1 to NP3, with fewer than 70 DAPs identified (Supplementary Data 11 and Supplementary Fig. 19). The podocyte differentiation branch was associated with a marked increase in the number of DAPs (796 DAPs between NP3 and Podo1). This mainly represented the closing of chromatin areas around nephron progenitor-specific genes such as Osr1, Gdnf, Sall1, and Pax2 and the opening of areas around podocyte-specific genes and key TFs such as Foxc2 and Efnb2, both of which are validated to be important for early podocyte differentiation42,43. At later stages, there was a strong increase in chromatin accessibility around genes involved in actin filament-based processes and a significant decrease around Notch1, Notch2, and Ctnnb1 in the podocyte lineages (Supplementary Data 12). The chromatin changes from NP3 to intermediate cells 1 (IM1) were mainly associated with closing of the chromatin around Osr1 and opening around tubule cell-specific TFs, such as Lhx1 and Pax3 (Fig. 4). The decrease in Six2 expression only occurred at the IM2 stage, at which we also observed an increase in tubule specification genes such as Hnf1a. Results from the 820 up-regulated peaks between PT1 and IM2 showed enrichment associated with typical proximal tubule functions including sodium-dependent phosphate transport, maintenance of osmotic response in the loop of Henle, and active sodium transport in the distal convoluted tubule (Supplementary Fig. 19; the full list can be found in Supplementary Data 11, 12, and 13).

In addition to analyzing changes along the trajectory, we also examined cell-fate bifurcation events by comparing two descendant lineages. We found that podocyte specification from nephron progenitors was associated with differential opening of Foxl1, Zbtb7c, and Smad2 in the podocyte lineage and Lhx1, Sall1, Dll1, Jag1, Cxcr3, and Pax3 in the other lineage, respectively. While the role of several TFs has been established for podocyte specification, the expression of Foxl1 has not been described in the kidney until now (Fig. 4). Our analysis pinpointed four peaks in the vicinity of Foxl1 that were accessible only in the podocyte lineage, located at +53,381 bp, +152,832 bp, +237,019 bp, and +268,550 bp of the Foxl1 TSS, respectively. To confirm the expression of Foxl1 in nephron progenitors and podocytes, we performed immunofluorescence studies on developing kidneys (E13.5, P0, and P6). Consistent with the computational analysis, we found strong expression of FOXL1 in nephron progenitors (E13.5). At later stages, FOXL1 expression in glomerular podocytes was confirmed by co-localization with WT1 (Supplementary Fig. 20). Early and late stage expression of Foxl1 were also confirmed in scRNA-seq data (Supplementary Fig. 21). While further experimental validation will be important, our study has illustrated the critical role of open chromatin state information and dynamics in cellular differentiation.

The intermediate cells (IM) gave rise to proximal and distal branches representing the proximal tubules and the loop of Henle as well as distal convoluted tubule segments. The proximal tubule region was characterized by chromatin opening around Hnf4a, Maf, Tprkb, and Gpat2. The loop of Henle and distal convoluted tubule segments were remarkable for multiple DAPs in the vicinity of Tfap2a, Tfap2b, Cited4, Ephb2, Ephb3, Hoxd8, Mecom, and Prdm16, indicating a critical role for these TFs in distal tubule differentiation (Figs. 3c and 4 and Supplementary Fig. 21). Consistently, we saw a reduction in chromatin accessibility of Six2 promoters and enhancers along all three trajectories (podocyte, proximal tubule, and loop of Henle) (Supplementary Fig. 22). There was also a decrease in accessibility of Jag1 and Heyl in the distal loop of Henle segment, concordant with the putative role of Notch driving the proximal tubule fate44 (Supplementary Data 14). Another striking observation was that tubule segmentation and specification occurred early, marked by an increase in chromatin accessibility around Lhx1, Hnf1a, Hnf4a, and Maf for proximal tubule and Tfap2b for loop of Henle. Terminal differentiation of proximal tubule and loop of Henle cells was strongly linked to nuclear receptors that regulate metabolism, such as Esrra and Ppara in proximal tubules and Esrra and Ppargc1a in the loop of Henle segment, once more indicating the critical role of metabolism in driving gene expression and differentiation45.

Stromal-to-epithelial communication is critical in the developing and adult kidneys. Previous studies indicated that the survival, renewal, and differentiation of nephron progenitors are largely regulated through their cross-talk with the adjacent ureteric bud46. To investigate the complex cellular communication network, we used CellPhoneDB47 to systematically infer potential cell–cell communication in the developing and adult kidney. CellPhoneDB provides a comprehensive database and a statistical method for the identification of ligand–receptor interactions in scRNA-seq data. Analysis of our scRNA-seq dataset indicated that the number of cell–cell interaction pairs was larger in the developing kidney than in the adult kidney (Fig. 5a). In the developing kidney, the stroma showed the greatest number of interactions among all cell types, coinciding with the well-known role of epithelial–stromal interactions in driving kidney development44. Of the identified interactions, many were related to stroma-secreted molecules such as collagen 1, 3, 4, 6, and 14 (Fig. 5b). Interestingly, the nephron progenitor cluster showed important ligand–receptor interactions between Fgf1, Fgf8, and Fgf9 and the corresponding receptor Fgfr1, which is consistent with the well-known role of FGF signaling in kidney development48. Of the manifold identified interactions in the fetal kidney, stromal interactions and VEGF-involving interactions remained significant in the adult data set, underscoring the importance of endothelial-to-epithelial communication.

We next individually examined the expression of several key pathways known to play important roles in kidney development, such as Gdnf-Ret, sonic hedgehog, FGF, Bmp, Wnt, and others9. Expression of these key ligand–receptor pairs showed strong cell type specificity (Fig. 5c). For example, Robo2 of the Gdnf-Ret pathway was expressed in nephron progenitors and in podocytes of P0 and adult kidney49–51. Eya1, however, is genetically upstream of Gdnf and acts as a positive regulator for its activation52. Consistently, we noted distinct cell type specificity of Eya1 expression only in nephron progenitors, which was also true for other important signaling molecules such as Ptch1, Smo, and Gli3 of the sonic hedgehog pathway. Fgfr1 showed the highest expression in nephron progenitors as well as in fetal and adult stroma, underscoring the importance of FGF signaling for


[Figure 4 graphic. Panel a: directed graph of the developmental stages (NP1-3, IM1-2, Podo1-3, PT1-3, LOH1-2, DCT) annotated with genes near stage-dependent chromatin closing (blue), stage-dependent chromatin opening (red), and binary-decision (green) DAPs, shown with FOXL1 staining in P0 mouse kidney and HNF4A and TFAP2B staining in adult human kidney. Panel b: heatmap of normalized gene activity (scale −1.5 to 1.5) for NP, Podo, PCT, PST, LOH, and DCT marker genes.]

Fig. 4 Chromatin dynamics of nephron progenitor differentiation. a Di-graph representing cell type and lineage divergence, as derived from Cicero trajectory inference. Nephron progenitors (NP), podocytes (Podo), intermediate stage (IM), proximal tubule (PT), loop of Henle (LOH), and distal convoluted tubule (DCT) are connected with their developmental precursor stages and represented by ascending numbering. Arrows represent cell differentiation along respective trajectories. Genes listed next to the trajectories were derived from analyzing gene enrichment of differentially accessible peaks (DAPs) between two stages. Genes colored red were derived from the opening DAPs between two stages, genes colored blue were derived from the closing DAPs between two stages, and genes colored green were derived from opening DAPs between two branches. Three important genes, Foxl1, Hnf4a, and Tfap2b, are shown along with their cell type-specific accessibility peaks and immunostaining results. Peaks that were open during the development of specific cell types are shown in red boxes. Immunofluorescence staining of P0 mouse kidney shows FOXL1 in red in the proximal part (Pr) of late S-shaped bodies (cross) and in podocytes within primitive glomeruli (#); green staining denotes E-Cadherin, blue DAPI; Med, medial part of S-shaped body; Di, distal part of S-shaped body; UE, ureteric epithelium; images are representative of three independent experiments, scale bars = 25 µm. HNF4A and TFAP2B in human adult kidney samples (images CAB019417 and HPA034683, respectively, taken from the Human Protein Atlas version 19.3, http://www.proteinatlas.org (ref. 83), https://creativecommons.org/licenses/by-sa/3.0/) are visualized by immunohistochemistry in brown, scale bar = 25 µm. In addition, gene expression changes along the three trajectories, for genes identified by GREAT analysis as lying near chromatin closing and opening events, were consistent with the chromatin information; examples are visualized in the top right subpanel. b Heatmap showing normalized gene activity scores of lineage marker genes in the 15 developmental stages.


[Figure 5 graphic. Panel a: heatmaps of log(count) of cell–cell interactions (scale 2 to 4) between NP, Podo, PT, LOH, DCT, PC, IC, Endo, Macro, Neutro, and Stroma clusters. Panel b: dot plot of ligand–receptor pairs grouped by first interacting cell type (Stroma, Podo, Endo, PT, LOH, NP). Panel c: dot plots of pathway gene expression across cell types in P0 and adult kidneys.]

Fig. 5 Cell–cell communication analysis in the developing and adult mice highlighted the critical role of stroma in driving cell differentiation. a Heatmaps showing the number of cell–cell interactions in the scRNA-seq dataset of P0 (top) and adult (bottom) kidneys, as inferred by CellPhoneDB. Dark blue and dark red colors denote low and high numbers of cell–cell interactions, respectively. b CellPhoneDB-derived measures of cell–cell interaction scores and p values. Each row shows a ligand–receptor pair, and each column shows the two interacting cell types, which are binned by cell type. Columns are sub-ordered by first interacting cell type into stroma, podocytes, endothelial cells, proximal tubule, loop of Henle, and nephron progenitors. Color scale denotes the mean values for all the interacting partners, where mean value refers to the total mean of the individual partner average expression values in the interacting cell type pairs. Orange scale denotes P0, blue scale denotes adult. Dot size denotes corresponding p values of the (one-sided) permutation test with 1000 permutations. c Dot plots of RNA expression of important cell–cell communication candidates within the Gdnf-Ret, Sonic hedgehog, Fgf, Bmp, Wnt, and other pathways in both P0 (top) and adult (bottom) kidney. Dot size denotes percentage of cells expressing the marker. Color scale represents average gene expression values, orange denotes P0, blue denotes adult. Arrows indicate ligand–receptor pairs.
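The interaction score and one-sided permutation test described in this caption can be sketched as follows. This is a simplified illustration written from the caption's description, not the CellPhoneDB implementation; the function names, the toy expression values, and the fixed random seed are assumptions.

```python
import random
from statistics import mean


def interaction_mean(ligand, receptor, labels, type_a, type_b):
    """Mean of the ligand average in type_a and the receptor average in type_b."""
    lig = mean(v for v, lab in zip(ligand, labels) if lab == type_a)
    rec = mean(v for v, lab in zip(receptor, labels) if lab == type_b)
    return (lig + rec) / 2


def permutation_p_value(ligand, receptor, labels, type_a, type_b,
                        n_perm=1000, seed=0):
    """One-sided p value: share of label shuffles with a mean >= the observed mean."""
    rng = random.Random(seed)
    observed = interaction_mean(ligand, receptor, labels, type_a, type_b)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between cells and cell types
        if interaction_mean(ligand, receptor, shuffled, type_a, type_b) >= observed:
            hits += 1
    return observed, hits / n_perm


# Toy data: 10 cells per type; ligand only in Stroma, receptor only in NP.
labels = ["Stroma"] * 10 + ["NP"] * 10 + ["PT"] * 10
ligand = [5.0] * 10 + [0.0] * 20
receptor = [0.0] * 10 + [5.0] * 10 + [0.0] * 10

observed, p_value = permutation_p_value(ligand, receptor, labels, "Stroma", "NP")
print(observed, p_value)
```

Shuffling the cell type labels keeps the number of cells per type fixed, so each permuted mean is computed on groups of the same size as the observed ones.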

cell–cell interactions in both the developing and developed kidney. Because not much is known about some of these markers, the significance of these putative interactions requires further investigation. For example, Rspo3 has been implicated in nephron progenitor-associated interactions during nephrogenesis10. Mutations in the Itga8 gene are known to cause isolated congenital anomalies of kidney and urinary tract in humans50.

Murine single cell chromatin accessibility implicates potential human kidney GWAS mechanisms, target regulatory regions, genes, and cell types. Finally, we examined whether single cell level chromatin accessibility data can help implicate cell and gene targets for human kidney diseases. GWAS have been exceedingly successful in identifying nucleotide variations associated with specific diseases or traits. However, the majority of the identified genetic variants are in the non-coding region of the genome. Initial epigenome annotation studies indicated that GWAS hits are enriched in tissue-specific enhancer regions. As there are many different cell types in the kidney with differing function, understanding the true cell type specificity of these enhancers is critically important.

Here, we reasoned that murine single cell accessible chromatin information could be useful in nominating potential cell type-specific regulatory regions, and thereby the target cell type for the GWAS hits. We combined three recent kidney disease GWAS53–55 and obtained 26,637 single nucleotide polymorphisms (SNPs) that passed the genome-wide significance level. After lift-over, we identified 7923 variants by orthologous mapping from human to mouse.

We first examined eGFR GWAS loci where functional validation studies reported conflicting results on target cell types and target genes (Fig. 6). We started with the region around Uncx, for which reproducible association with kidney function was shown in multiple kidney function GWAS53,54. Interestingly, the orthologous GWAS locus demonstrated a strong open chromatin region in nephron progenitors but not in any other differentiated cell types of the murine (Fig. 6a) or human (Fig. 6b) kidney. Consistently, by examining epigenome data (including H3K27ac, H3K4me1, H3K4me3, and Six2 ChIP-seq data across multiple stages), we observed enrichment for H3K27ac, H3K4me1, and Six2 binding at this locus. We also detected Uncx gene expression in the fetal kidney samples, but not in the adult kidney (Fig. 6c).

Next, we analyzed the chromosome 15 GWAS region around Dab2, where we identified open chromatin regions in multiple kidney cells. Dab2 expression, on the other hand, strongly correlated with open distal enhancer regions in proximal tubule cells (Fig. 6d). More importantly, this region was also accessible in human proximal tubule cells (Fig. 6e). This is consistent with earlier publications indicating the role of proximal tubule-specific DAB2 expression in kidney disease development56. Interestingly, while single cell analysis indicated an additional distal enhancer in intercalated cells, the GWAS-significant region coincided with the proximal tubule-specific enhancer region, and the open chromatin areas showed strong coregulation (Fig. 6f). Bulk kidney ChIP-seq regulatory annotation indicated enhancer mark enrichment in the adult but not in the fetal kidneys.

We also examined loci where functional validation studies reported conflicting results on target cell types and target genes. The SHROOM3 locus has shown a reproducible association with kidney function in multiple GWAS55. While one study indicated that the genetic variants were associated with an increase in SHROOM3 levels in tubule cells inducing kidney fibrosis57, the other suggested that the variant was associated with lower Shroom3 levels in podocytes resulting in chronic kidney disease development58. We noticed that the expression of Shroom3 in the adult kidney was low (Fig. 6g). We observed open chromatin areas in multiple cell types including nephron progenitors, podocytes, and proximal tubule (Fig. 6g). The human kidney cell type-specific open chromatin landscape showed consistent overlap with multiple SNPs in multiple cell types (Fig. 6h). Interestingly, one SNP with strong nephron progenitor-specific enrichment also coincided with the Six2 binding area (Fig. 6i). While this finding is enticing, further experimental validation is warranted in order to uncover mechanistic insights. Closer views of all three loci are shown in Supplementary Fig. 23.

Lastly, we also overlapped all disease-associated SNPs (after lift-over) with peaks in snATAC-seq data. The full table including nearest genes is provided in Supplementary Data 15. These results indicate that variants associated with kidney disease development are located in regions with murine cell type-specific and developmental stage-specific regulatory activity, and illustrate the potential power of snATAC-seq in nominating target genes and target cell types for GWAS variants.

Discussion
In summary, here we present the first cellular resolution open chromatin map of both the developing and adult mouse kidney. Using this dataset, we identified key cell type-specific regulatory networks for kidney cells, defined the cellular differentiation trajectory, characterized regulatory dynamics, and identified key driving TFs for nephron development, especially for the terminal differentiation of epithelial cells. Furthermore, our results shed light on potential cell types and target genes for genetic variants associated with kidney disease.

By performing massively parallel single cell profiling of chromatin state, we were able to define the key regulatory logic for each kidney cell type by investigating cis-regulatory elements, motif enrichment, and TF-target gene co-expression. These analyses helped the identification of key TFs for different kidney cell types. TF identification is challenging in scRNA-seq data, since the expression of several cell type-specific TFs is low59. We showed that by extracting motif information, snATAC-seq data provided critical complementary information for TF identification.

In our dataset, we also observed that the single cell open chromatin atlas was able to define more distinct cell types even in the developing kidney compared to scRNA-seq analysis. Given the continuous nature of RNA expression, it has been exceedingly difficult to dissect specific cell types in the developing kidney9,10,13. In addition, it has been difficult to resolve the cell type origin of transcripts expressed at low levels in scRNA-seq data. However, in our snATAC-seq data, the low dimensional embedding showed distinct clusters. In addition, snATAC-seq data were able to capture the chromatin state irrespective of gene expression magnitude. There were several examples where accessible peaks were identified in specific cell types even for genes expressed at low levels, such as Shroom3.

Our studies revealed dynamic chromatin accessibility that tracks with renal cell differentiation. These states may reveal mechanisms governing the establishment of cell fate during development. We found a coherent pattern between gene expression and open chromatin information, where the nephron progenitors differentiated into two branches representing podocytes and tubule cells60. We found that podocyte commitment occurred earlier, while tubule differentiation was more complex. The podocyte specification correlated with maintained expression of Foxc2 and Foxl1, as evidenced by immunofluorescence. While Foxc2 has been known to play a role in nephron progenitors and podocytes, this is the first description of Foxl1 in kidney and podocyte development. We confirmed the key role of Hnf4a in proximal tubules and identified transcriptional regulators, such as


[Figure 6 graphic. Panels a, d, g: mouse (mm10) snATAC-seq genome browser tracks for NP, Podo, PCT, PST, LOH, DCT, PC, IC, Endo, Imm, and Stroma, with RNA expression subpanels, at Uncx (chr5:139,525,148-139,576,090, 50 kb), Dab2 (chr15:6,361,549-6,437,170, 74 kb), and Shroom3 (chr5:92,553,048-92,778,286, 221 kb). Panels b, e, h: human (hg19) snATAC-seq tracks with H3K27ac, H3K4me1, and H3K4me3 at UNCX (chr7:1,253,160-1,304,903, 50 kb), DAB2 (chr5:39,338,045-39,518,009, 177 kb), and SHROOM3 (chr4:77,154,436-77,495,974, 336 kb). Panels c, f, i: whole mouse kidney H3K27ac, H3K4me1, H3K4me3, Six2 (NPC), WGBS, and RNA-seq tracks across E14.5, E15.5, E16.5, P0, and adult stages at the same mm10 windows.]

Fig. 6 Single cell level chromatin accessibility highlighted human kidney GWAS target genes and cell types. a, d, g Genome browser view of Uncx, Dab2, and Shroom3 loci; from top to bottom: mouse orthologue of eGFR GWAS significant SNPs (after lift-over); mouse kidney single nuclei chromatin accessibility for nephron progenitors (NP), podocytes (Podo), proximal convoluted and straight tubules (PCT and PST), loop of Henle (LOH), distal convoluted tubule (DCT), collecting duct intercalated cells (IC), collecting duct principal cell types (PC), endothelial cells (Endo), immune cells (Imm), and stromal cells (Stroma). Data range in all tracks is set to the same scale. Examples of cell type-specific accessible chromatin overlapped with significant SNPs are highlighted with dashed lines. Right subpanel shows violin plots of cell type-specific gene expression in P0 (orange) and adult (blue) kidneys in the scRNA-seq dataset. b, e, h Corresponding genome browser views in adult human (hg19) kidney snATAC-seq data; from top to bottom: eGFR GWAS significant SNPs, adult human kidney single nuclei chromatin accessible landscape, whole kidney H3K27ac, H3K4me1, and H3K4me3 ChIP-seq tracks. The genomic location in human was matched to the mouse orthologue. Note that for DAB2 (e), the image was mirrored to facilitate comparison along genomic read direction. Alternative genome browser views at more zoomed-in locations are available in Supplementary Fig. 23. c, f, i Whole mouse kidney epigenomics tracks from E14.5, E15.5, E16.5, P0, and adult mice. From top to bottom: H3K27ac, H3K4me1, H3K4me3, and Six2 ChIP-seq; whole genome bisulfite sequencing (WGBS); and bulk RNA-seq. The bottom Refseq visualization corresponds to the Refseq track at the top in a, d, g, respectively. Six2 binding signal in nephron progenitor cells.
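Overlapping lifted-over SNP positions with cell type-specific peaks, as highlighted in this figure, amounts to an interval lookup. The sketch below is one minimal way to do it, assuming half-open, BED-style peak intervals; the function names and all coordinates are hypothetical toy values, not the loci or SNP positions from the study.

```python
from bisect import bisect_right


def build_peak_index(peaks):
    """Group (chrom, start, end, cell_type) peaks by chromosome, sorted by start."""
    index = {}
    for chrom, start, end, cell_type in peaks:
        index.setdefault(chrom, []).append((start, end, cell_type))
    for intervals in index.values():
        intervals.sort()
    return index


def overlapping_cell_types(index, chrom, pos):
    """Return the cell types whose peaks contain `pos` (start <= pos < end)."""
    intervals = index.get(chrom, [])
    # Only peaks that start at or before `pos` can contain it.
    upper = bisect_right(intervals, (pos, float("inf"), ""))
    return [ct for start, end, ct in intervals[:upper] if start <= pos < end]


# Toy peaks and SNP positions (invented coordinates, for illustration only).
toy_peaks = [
    ("chr5", 100, 200, "NP"),
    ("chr5", 150, 300, "Podo"),
    ("chr15", 50, 80, "PT"),
]
peak_index = build_peak_index(toy_peaks)

print(overlapping_cell_types(peak_index, "chr5", 160))
print(overlapping_cell_types(peak_index, "chr15", 90))
```

The scan over candidate peaks is linear per chromosome, which is adequate for a sketch; a genome-scale analysis would more likely use an interval tree or an established genomics toolkit.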


Tfap2a for the distal portion of the nephron. Furthermore, the cells in 1X PBS for further steps. Cell number and viability were analyzed using terminal differentiation of proximal tubule cells correlated with Countess AutoCounter (Invitrogen, C10227). The cell concentration was 2.2 mil- the increase in Ppara and Esrra expression, both of which are lion cells/mL with 92% viability. Ten thousand cells were loaded into the Chro- mium Controller (10X Genomics, PN-120223) on a Chromium Single Cell B Chip known regulators of oxidative and fatty acid (10X Genomics, PN-120262), and processed to generate single cell gel beads in the oxidation45. Loop of Henle differentiation strongly correlated emulsion (GEM) according to the manufacturer’s protocol (10X Genomics, with Essrb and Ppargc1a expression. These studies potentially CG000183). The library was generated using the Chromium Single Cell 3′ Reagent fi Kits v3 (10X Genomics, PN-1000092) and Chromium i7 Multiplex Kit (10X indicate that cell speci cation events occur early and metabolism ’ 61 Genomics, PN-120262) according to the manufacturer s manual. Quality control controls terminal differentiation of tubule cells . Impaired for constructed library was performed by Agilent Bioanalyzer High Sensitivity metabolic fitness of proximal tubules has been a key contributor DNA kit (Agilent Technologies, 5067-4626) for qualitative analysis. Quantification to kidney dysfunction, explaining the critical association with analysis was performed by Illumina Library Quantification Kit (KAPA Biosystems, tubule metabolism and function62. KK4824). The library was sequenced on Illumina HiSeq 4000 system with 2 × 150 paired-end kits using the following read length: 28 bp Read1 for cell barcode and Furthermore, we show that murine single cell-level and dif- UMI, 8 bp I7 index for sample index and 91 bp Read2 for transcript. ferentiation stage-level epigenome annotation is potentially powerful for the annotation of human GWAS. 
Data from human kidney tissues are needed to understand whether the knowledge from mouse orthologs is transferable to humans; however, our analyses highlight the usefulness of murine analyses in the context of ethical and practicability issues in obtaining human samples, especially during developmental stages. Most identified GWAS signals are in the non-coding region of the genome, and the target gene as well as the target cell type remain unknown. Our results indicate that multiple GWAS regions are conserved between mice and humans. Single cell open chromatin information makes it possible not only to infer the affected cell types, but also to understand the co-regulation of open chromatin areas, which helps infer target genes. Here, we show that the GWAS variants mapped only to those regions where chromatin was open exclusively in nephron progenitors, whereas chromatin became inaccessible as differentiation progressed during later stages, such as for Uncx. This is an interesting and important potential mechanism, indicating that the altered expression of this gene might play a role in the developmental rewiring of the kidney. This mechanism would be similar to genes associated with autism that are known to be expressed in the fetal, but not in the adult stages63, and highlights the critical role of understanding chromatin accessibility at multiple stages of differentiation.

While we have generated a large amount of high-quality data, this information will need further experimental validation. In addition, one needs to be aware of the limitations when interpreting different computational analyses; for example, motif enrichment analyses such as those implemented by HOMER, SCENIC, and chromVAR are not able to distinguish between TFs with similar binding sites. Future high-throughput studies that analyze open chromatin and gene expression information from the same cells will be exceedingly helpful to correlate open chromatin and gene expression information along the differentiation trajectory31,64,65.

In summary, our dataset provides critical insight into the cell type-specific gene regulatory network, cell differentiation program, and human complex disease-associated genes.

Methods

Material table. The material table can be viewed in Supplementary Data 16.

Research animals. Mice were housed in a temperature-controlled specific-pathogen-free facility under 12 h light/dark cycles (lights on at 07:00 h, off at 19:00 h). The animal experiments were reviewed and approved by the Institutional Animal Care and Use Committee (IACUC) of the University of Pennsylvania in accordance with the guidelines of the National Institutes of Health. All applicable international, national, and institutional guidelines for the care and use of animals were followed.

Single cell RNA sequencing of P0 and adult mice. One-day-old wild type mouse neonate and adult mouse (C57BL/6) kidneys were harvested and minced into 1 mm³ pieces and incubated with digestion solution containing Enzyme D, Enzyme R, and Enzyme A from Multi Tissue Dissociation Kit 1 (Miltenyi, 130-110-201) at 37 °C for 15 min with agitation. The reaction was deactivated by adding FBS to 10%, then the solution was passed through a 40 μm cell strainer. After centrifugation at 500 × g for 5 min, the cell pellet was incubated with 500 μL of RBC lysis buffer on ice for 3 min. We centrifuged the cells at 500 × g for 5 min at 4 °C and resuspended the

Single cell ATAC sequencing. Kidneys from 3-week-old and 8-week-old mice were harvested, minced and lysed in 5 mL lysis buffer (10 mM Tris HCl pH 7.4, 10 mM NaCl, 3 mM MgCl2, and 0.1% Nonidet™ P40 Substitute in nuclease-free water) for 15 min. Kidneys from P0 mice were minced and lysed in 2 mL lysis buffer for 15 min. The tissue lysis reaction was then blocked by adding 10 mL 1× PBS into each tube, and the solution was passed through a 40 μm cell strainer. Cell debris and cytoplasmic contaminants were removed by Nuclei PURE Prep Nuclei Isolation Kit (Sigma, NUC-201) after centrifugation at 30,000 × g for 45 min. Nuclei concentration was calculated with Countess AutoCounter (Invitrogen, C10227). Diluted nuclei suspension was loaded and incubated in transposition mix from Chromium Single Cell ATAC Library & Gel Bead Kit (10X Genomics, PN-1000110) by targeting 10,000 nuclei recovery. GEMs were then captured on the Chromium Chip E (10X Genomics, PN-1000082) in the Chromium Controller according to the manufacturer's protocol (10X Genomics, CG000168). Libraries were generated using the Chromium Single Cell ATAC Library & Gel Bead Kit and Chromium i7 Multiplex Kit N (10X Genomics, PN-1000084) according to the manufacturer's manual. Quality control for the constructed library was performed with the Agilent Bioanalyzer High Sensitivity DNA kit. The library was sequenced on an Illumina HiSeq 4000 system with 2 × 50 paired-end kits using the following read lengths: 50 bp Read1 for DNA fragments, 8 bp i7 index for sample index, 16 bp i5 index for cell barcodes and 50 bp Read2 for DNA fragments.

Bulk ATAC sequencing

Nuclei preparation. Bulk ATAC sequencing was performed with previously published methods66,67. Kidneys were minced and lysed in 5 mL lysis buffer (10 mM Tris HCl pH 7.4, 10 mM NaCl, 3 mM MgCl2, and 0.1% Nonidet™ P40 Substitute in nuclease-free water) for 15 min. The reaction was then blocked by adding 10 mL 1× PBS into each tube, and the solution was passed through a 40 μm cell strainer. The nuclei were centrifuged down at 500 × g at 4 °C and resuspended in the resuspension buffer (10 mM pH 7.5 Tris-HCl, 10 mM NaCl, 3 mM MgCl2). Nuclei numbers were estimated with Countess AutoCounter (Invitrogen, C10227).

Transposition. Fifty thousand nuclei/sample were tagmented with Tagment DNA TDE1 Enzyme and Buffer Kit (Illumina, 20034198) in 50 μL reaction volume of transposition buffer (25 μL 2× TD buffer (Tagment DNA Buffer), 2.5 μL Tn5 transposase (Tagment DNA Enzyme 1), 0.5 μL 10% Tween-20, 0.5 μL 1% Digitonin, 16.5 μL 1× PBS, 5 μL nuclease-free water). The reaction was carried out at 37 °C for 30 min in a thermomixer.

DNA purification and library construction. Isolated DNA was purified by MinElute Reaction Cleanup Kit (Qiagen, 28204) by following the manufacturer's manual. The purified DNA was finally eluted in 10 μL elution buffer. The DNA was then amplified by NebNext High-Fidelity 2× PCR Master Mix (NEB, M0541S) and quantified by qPCR to make libraries. The libraries were purified by AMPure XP beads (Beckman Coulter, A63880). Quality control for the constructed library was performed with the Agilent Bioanalyzer High Sensitivity DNA kit. Libraries were submitted to 150 bp PE sequencing with an Illumina HiSeq 3000 system.

snATAC-seq data analysis

Data processing and quality control. Raw fastq files were aligned to the mm10 (GRCm38) reference genome and quantified using Cell Ranger ATAC (v. 1.1.0). We only kept valid barcodes with a number of fragments ranging from 1000 to 40,000 and a mitochondria ratio less than 10%. One of the important indicators for ATAC-seq data quality is the fraction of peaks in promoter regions, so we did further filtration based on promoter ratio. We noticed the promoter ratio seemed to follow a bimodal distribution, with most cells either having a promoter ratio around 5% (background) or more than 20% (valid cells) (Supplementary Fig. 2a). We therefore filtered out cells with a promoter ratio <20%. After this stringent quality control, we obtained 11,429 P0 single cells (5993 in P0_batch_1 and 5436 in P0_batch_2) and 16,887 adult single cells (7129 in P56_batch_3, 6397 in P56_batch_4, and 3361 in P21_batch_5).
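The barcode filter described above can be illustrated with a minimal sketch. It only restates the three thresholds given in the text (1000 to 40,000 fragments, mitochondria ratio below 10%, promoter ratio of at least 20%); the record type, its field names, and the toy barcodes are hypothetical and are not Cell Ranger ATAC output columns.

```python
from dataclasses import dataclass


@dataclass
class BarcodeStats:
    """Per-barcode summary; field names are hypothetical, for illustration only."""
    barcode: str
    n_fragments: int          # fragments assigned to this barcode
    mito_fraction: float      # fraction of fragments mapping to the mitochondrial genome
    promoter_fraction: float  # fraction of fragments overlapping promoter regions


def passes_qc(cell, min_fragments=1000, max_fragments=40_000,
              max_mito=0.10, min_promoter=0.20):
    """Apply the three thresholds stated in the text to one barcode."""
    return (min_fragments <= cell.n_fragments <= max_fragments
            and cell.mito_fraction < max_mito
            and cell.promoter_fraction >= min_promoter)


# Toy barcodes (made-up values) covering each way a barcode can fail.
cells = [
    BarcodeStats("AAAC", 5200, 0.02, 0.31),    # passes all filters
    BarcodeStats("AAAG", 800, 0.01, 0.35),     # too few fragments
    BarcodeStats("AAAT", 61000, 0.03, 0.28),   # too many fragments
    BarcodeStats("AACA", 3000, 0.15, 0.30),    # high mitochondrial fraction
    BarcodeStats("AACC", 4100, 0.04, 0.05),    # background-like promoter ratio
]

kept = [c.barcode for c in cells if passes_qc(c)]
print(kept)
```

Whether the fragment bounds are inclusive is not stated in the text; the sketch assumes inclusive bounds.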


Preprocessing. Since snATAC-seq data are very sparse, previous methods either conducted peak calling or binarization before clustering. Here, we chose to do binarization instead of peak calling for two reasons: (1) peak calling is time consuming; (2) many peaks are cell type-specific, and open chromatin regions in rare populations are more likely to be treated as background. After binning fragments into 5 kb windows and removing the fragments not matched to chromosomes or aligned to the mitochondria, we binarized the cell–bin matrix. In order to keep only bins that were informative for clustering, we removed the top 5% most accessible bins and bins overlapping with the ENCODE blacklist. The 484,606 remaining bins were used as input for clustering.

Dimension reduction, batch effect correction, and clustering. Clustering was conducted using snapATAC19, a single-cell ATAC-seq algorithm scalable to large datasets. Previous benchmarking evaluation has shown that snapATAC was one of the best-performing methods for snATAC-seq clustering68. Diffusion map was applied as a dimension reduction method using the function runDiffusionMaps. To remove batch effect, we used Harmony20, in which the low dimensional embeddings obtained from the diffusion map were used as an input. Harmony iteratively pulled batch-specific centroids to cluster centroids until convergence to remove the variability across batches. After batch correction, a graph was constructed using the k-Nearest Neighbor (kNN) algorithm with k = 15, which was then used as input for Louvain clustering. We used the first 20 dimensions for the Louvain algorithm. The number of dimensions was chosen using a method recommended by snapATAC, although we noticed that the clustering results were similar among a series of dimensions from 18 to 30.

Cell type annotation. We used a published list of marker genes9,21 to annotate kidney cell types. In order to infer gene expression of each cell type, we built a cell-gene activity score matrix by integrating all fragments that overlapped with the gene transcript. We used GENCODE Mouse release VM16 (ref. 27) as reference annotation.

Comparison with mouse ATAC Atlas dataset. We compared our cell type annotations and QC procedures with previously published sciATAC-seq data17 with mouse adult kidney samples included. As for the QC procedure, in the published mouse ATAC atlas, cells were kept with at least 500 UMIs and less than 10% mitochondria reads, while in our analysis, we kept cells with at least 1000 UMIs and less than 10% mitochondria reads. In addition, we also filtered out cells with promoter ratio below 20%. Despite the more stringent QC step, the median number of accessible peaks per cell is comparable, with 4413 in our dataset and 4333 in the Mouse ATAC Atlas dataset.

Our dataset contains a larger number of cells (28,316 in our dataset vs. 6299 in the mouse atlas dataset), allowing us to improve the cell type resolution of the prior map. For example, we were able to identify immune and CNT cells and separate the PCT and PST segments of the proximal tubules and IC and PC cells in the collecting duct. Thus, our dataset includes 13 annotated cell types while the previous dataset has 6. In addition, the different resolution may also result from the difference in experimental technologies.

Among the annotated cells, the cell type proportion is mostly consistent. By investigating the cell type-specific marker genes reported in the Mouse ATAC Atlas dataset, we observe good consistency between the cell type-specific peaks, indicating a consistent cell type annotation.

Peak calling and visualization. Peak calling was conducted for each cell type separately using MACS222. We aggregated all fragments obtained from the same cell types to build a pseudo-bulk ATAC dataset and conducted peak calling with parameters "--nomodel --keep-dup all --shift 100 --ext 200 --qval 1e-2 -B --SPMR --call-summits". By specifying "--SPMR", MACS2 generated "fragment pileup per million reads" pileup files, which were converted to bigwig format for visualization using the UCSC bedGraphToBigWig tool.

We also visualized public chromatin ChIP-seq data and RNA-seq data obtained from ENCODE Encyclopedia with the following identifiers: ENCFF338WZP, ENCFF872MVE, ENCFF455HPY, ENCFF049LRQ, ENCFF179NTO, ENCFF071PID, ENCFF746MFH, ENCFF563LOO, ENCFF184AYF, ENCFF107NQP, ENCFF465THI, ENCFF769XWI, and ENCFF591DAX. The Six2 ChIP-seq data were obtained from ref. 69 and the WGBS data were obtained from ref. 70.

Genomic elements stratification. Mouse mm10 genome annotation files were downloaded from UCSC Table Browser (https://genome.ucsc.edu/cgi-bin/hgTables) using GENCODE VM23. TSS upstream 5 kb regions were included as promoter regions, but the results were similar when using 2 kb upstream regions as promoters. We then studied the number of overlapped regions between open chromatin regions identified from the snATAC-seq and bulk ATAC-seq datasets and genome annotations. Since one open chromatin region could overlap with multiple genomic elements, we defined an order of genomic elements as exon > 5′-UTR > 3′-UTR > intron > promoter > distal elements. To be more specific, if one peak overlapped with both exon and 5′-UTR, the algorithm would count it as an exon-region peak.

Identification of differentially accessible regions. Peaks identified in each cell type were combined to build a union peak set. Overlapping peaks were then merged to one peak using the reduce function from the GenomicRanges package. This resulted in 300,755 peaks, which were used to build a binarized cell-by-peak matrix. Differentially accessible peaks (DAPs) for each cell type were identified by pairwise peak comparison.

Because of the binary nature of the single cell peak matrix, Fisher's exact test has been widely used to compute differentially accessible peaks65,71–73. To define cell type-specific peaks, we required the peaks to be stringently accessible in one specific cell type. Specifically, for each peak we conducted a Fisher's exact test between a cell type and each of the other cell types. Peaks with corrected p values (Benjamini–Hochberg approach) below the significance level (0.05) in all pairwise tests were defined as cell type-specific peaks. This approach was inspired by our previous experience on scRNA-seq data differential expression analysis21, and is similar to another published snATAC-seq paper73. In total, we obtained 60,683 highly specific DAPs, which were used for motif enrichment analysis.

Motif enrichment analysis. Motif enrichment analysis was conducted using DAPs by HOMER34 with hypergeometric test for enrichment (one-sided). We used the parameters background = "automatic" and scan.size = 300. We noticed that de novo motif identification only generated few significant results, so we focused on known motifs for our following study. We used the significance level of 0.05 for Benjamini–Hochberg (BH) corrected p-value to determine the enriched results. The motif enrichment results are provided in Supplementary Data 6.

Peak–peak correlation analysis. Peak–peak correlation analysis was conducted using Cicero33. In order to find developmental stage-specific peak–peak correlations, the analysis was conducted for P0 and adult separately. Cicero uses graphical lasso with a distance penalty to assess the co-accessibility between different peaks. Cicero analysis was conducted using the run_cicero function with default parameters. A heuristic co-accessibility score cutoff of 0.25 was used to determine the connections between two peaks.

snATAC-seq trajectory analysis. snATAC-seq trajectory analysis was conducted using Cicero, which extended Monocle3 to snATAC-seq analysis. We obtained the preprocessed P0 snATAC-seq cell-peak matrix from snapATAC as input for Cicero, conducted dimension reduction using latent semantic indexing (LSI), and visualized using UMAP. The trajectory graph was built using the function learn_graph. Batch effect was not observed between the two P0 batches, and the trajectory graph was consistent with cell type assignment with clustering analysis (Fig. S17a, b).

In order to study how open chromatin changes are associated with the cell fate decision, we first binned the cells into 15 groups based on their pseudotime and cell type assignment. Next, we studied the DAPs between each group and its ancestral group using the same methods described above. The numbers of newly open and closed chromatin regions were reported using pie charts. The exact peak locations are provided in Supplementary Data 11.

Genes and gene ontology terms associated with snATAC-seq trajectory. Based on the binned trajectory graphs and DAPs between each group and its ancestral group, we next used the GREAT tool v4.0.4 (ref. 74) to study the enrichment of associated genes and gene ontology (GO) terms along the trajectory. We used the newly open or closed peaks as test regions and all the peaks from peak-calling output as the background regions for the analysis. The output can be found in Supplementary Data 13 and 14.

Prediction of cis-regulatory elements. We implemented two methods to study cis-regulatory elements in the snATAC-seq data. The first method was inspired by Zhu et al.30, which was based on the observation that there was co-enrichment in the genome between the snATAC-seq cell type-specific peaks and scRNA-seq cell type-specific genes. This method links a gene with a peak if (1) they were both specific in the same cell type, (2) they were in cis, meaning that the peak is in the ±100 kb region of the TSS of the corresponding gene, and (3) the peak did not directly overlap with the TSS of the gene. This method successfully inferred several known distal elements such as for Six2 and Slc6a18 (Supplementary Fig. 13c).

Alternatively, we assessed the co-accessibility of two peaks. We implemented Cicero33, which aggregates similar cells to obtain a set of "meta-cells" and addresses the issue with sparsity in the snATAC-seq data. We used the run_cicero function with default parameters to predict cis-regulatory elements (CREs). Although it is recommended to use 0.25 as a cutoff for co-accessibility score, we noticed that this resulted in a large number of CREs, which could contain many false positives. Thus, we used a more stringent score of 0.4 for the cutoff and retained 232,380 and 206,701 CRE links in the P0 and adult data, respectively.

Bulk ATAC sequencing analysis. Bulk ATAC-seq raw fastq files were processed using the end-to-end tool ENCODE ATAC-seq pipeline (Software and Algorithms). This tool provided a standard workflow for ATAC-seq data quality control, adapter removal, alignment, and peak calling. To obtain high quality ATAC-seq peaks, peak calling results from two biological replicates were compared and only those peaks that were present in both replicates were kept, which were further used to compare with snATAC-seq peaks.
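The pairwise criterion described under "Identification of differentially accessible regions" can be sketched as follows: a minimal, standard-library illustration, not the code used in the study. It rests on several assumptions that the text does not specify: the Fisher's exact test is taken as one-sided (the target cell type being more accessible), the Benjamini–Hochberg correction is applied across peaks within each pairwise comparison, and all function names, cell counts, and peak names are made up for the example.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]:
    probability of seeing at least `a` open cells in the first group."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    upper = min(row1, col1)
    p = sum(comb(row1, x) * comb(row2, col1 - x) for x in range(a, upper + 1)) / denom
    return min(1.0, p)


def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def cell_type_specific_peaks(target, open_counts, n_cells, alpha=0.05):
    """Peaks significant (adjusted p < alpha) against every other cell type."""
    peaks = list(open_counts)
    specific = set(peaks)
    for other in (ct for ct in n_cells if ct != target):
        pvals = []
        for peak in peaks:
            a = open_counts[peak][target]   # target cells with the peak open
            c = open_counts[peak][other]    # other cells with the peak open
            pvals.append(fisher_exact_greater(a, n_cells[target] - a,
                                              c, n_cells[other] - c))
        adj = benjamini_hochberg(pvals)
        specific &= {p for p, q in zip(peaks, adj) if q < alpha}
    return sorted(specific)


# Toy binarized data: number of cells per type with each peak open (made-up values).
n_cells = {"Podo": 50, "PT": 60, "LOH": 40}
open_counts = {
    "peak_A": {"Podo": 40, "PT": 3, "LOH": 2},    # open almost only in podocytes
    "peak_B": {"Podo": 25, "PT": 31, "LOH": 19},  # open everywhere
    "peak_C": {"Podo": 30, "PT": 28, "LOH": 1},   # shared by two cell types
}

podo_peaks = cell_type_specific_peaks("Podo", open_counts, n_cells)
print(podo_peaks)
```

Requiring significance in every pairwise comparison, rather than in one test of a cell type against all remaining cells, is what excludes peaks such as `peak_C` that are shared by two cell types.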


Correlation of bulk and single nuclei ATAC sequencing data. snATAC-seq reads were aggregated into pseudo-bulk data for comparison. To prevent the effect of sex chromosomes and the mitochondrial chromosome, reads from chromosomes X, Y and M were excluded from our analysis. We used the multiBigwigSummary tool from deeptools75 to study the correlation between different samples. Specifically, the whole genome was binned into equally sized (10 kb) windows, and the reads in each bin were aggregated, generating a bin-read count vector for each sample. The correlation of these vectors was computed as a measure of pairwise similarity between samples.

To compare the number of peaks in these two datasets, we used as input the narrowpeak files from the snATAC-seq and bulk ATAC-seq analysis. We filtered out bulk ATAC-seq peaks with q value > 0.01 to be consistent with the snATAC-seq setting. Since the snATAC peaks were called after merging different time points, we also took the union set of bulk ATAC-seq peaks from different time points. We then used the findOverlaps function in the GenomicRanges package76 to identify overlapping peaks.

Comparison between single nuclei ATAC sequencing data and single cell RNA sequencing data. In order to compare the cluster assignment between snATAC-seq data and scRNA-seq data, we obtained the average gene expression values and peak accessibility in each cluster for P0 and adult samples separately. We next transformed snATAC-seq data by summing up the reads within gene body and 2 kb upstream regions to build a gene activity score matrix, as suggested in Seurat23. Then, we normalized the data, computed the mean expression and mean gene activity scores in each cell type, and calculated z-scores of each gene. Pearson's correlation coefficient was then calculated among the top 3000 highly variable genes between snATAC-seq data and scRNA-seq data. We found high concordance between these two datasets in terms of cell type assignment (Fig. 1g and Supplementary Fig. 3b).

Single cell RNA sequencing data analysis

Alignment and quality control. Raw fastq files were aligned to the mm10 (Ensembl GRCm38.93) reference genome and quantified using CellRanger v3.1.0. Seurat R package v3.0 was used for data quality control, preprocessing, and dimensional reduction analysis. After gene-cell data matrix generation of both P0 and adult datasets, matrices were merged and poor-quality cells with <200 or >3000 expressed genes and mitochondrial gene percentages >50 were excluded, leaving 25,138 P0 and 18,498 adult cells for further analytical processing, respectively (Supplementary Fig. 4a, b).

Pre-processing, batch effect correction, and dimension reduction. Data were normalized by RPM followed by log transformation, and 3000 highly variable genes were selected for scaling and principal component analysis (PCA). Harmony R package v1.0 (ref. 20) was used to correct batch effects. The top 20 dimensions of Harmony embeddings were used for downstream uniform manifold approximation and projection (UMAP) visualization and clustering (Supplementary Fig. 4c, d).

Cell clustering, identification of marker genes, and differentially expressed genes. The Louvain algorithm with resolution 0.4 was used to cluster cells, which resulted in 18 distinct cell clusters. A gene was considered to be differentially expressed if it was detected in at least 25% of one group, showed at least 0.25 log fold change between two groups, and had a Benjamini–Hochberg (BH) adjusted p value < 0.05 in the Wilcoxon rank sum test. We used a list of marker genes9,21 to manually annotate cell types. Two distal convoluted tubule clusters were merged based on the marker gene expression, resulting in a total of 17 clusters (Supplementary Fig. 5a–c).

Ambient RNA quantification. In droplet based scRNA-seq experiments, there is always a certain amount of background mRNA present that gets distributed into droplets and sequenced along with cells77. In order to quantify the net effect of ambient RNA contamination, we used the R package SoupX24. The function autoEstCont was used to estimate the contamination fraction in P0 and adult batches separately. We visualized the change in expression due to soup correction in UMAP space. The function adjustCounts was used to correct the count expression matrices for downstream processing. We then used the corrected matrices and reran the whole Seurat pipeline with the same parameters. The average expression of genes per cluster was used for Pearson correlation coefficient analysis to compare between matrices with and without ambient RNA correction. As results with and without ambient RNA correction were similar (Supplementary Figs. 6 and 7), results without ambient RNA correction are shown throughout the manuscript.

Subclustering of stroma populations. To investigate the heterogeneity within the stroma clusters, we conducted subclustering analysis on P0 (Supplementary Fig. 9) and adult (Supplementary Fig. 10) stroma cells. Using recently reported marker genes, we were able to recapitulate some cell types including mesangial cells, fibroblasts, smooth muscle cells, and juxtaglomerular cells. We also found some subpopulations with various signatures. Further experimental validation is needed to determine whether these populations are due to artifacts in single cell data (doublets or contaminated reads) or real populations.

scRNA-seq trajectory analysis

Monocle3
To construct a single cell pseudotime trajectory and to identify genes whose expression changed as the cells underwent transition, Monocle3 v0.1.3 (ref. 78) was applied to P0 cells of the following Seurat cell clusters: nephron progenitors (NP), proliferating cells, stroma-like cells, podocytes, loop of Henle (LOH), early proximal tubule (PT), proximal convoluted tubule (PCT), and proximal straight tubule (PST) cells.

To show cell trajectories from both small (nephron progenitors) and large cell populations (proximal tubule), an equal number of 450 cells per cluster was randomly subsampled. Cells were re-clustered by Monocle3 using a resolution of 0.0005 with kNN k = 29. Highly variable genes along pseudotime were identified using the differentialGeneTest function and cells were ordered along the pseudotime trajectory. The NP cluster was defined as the earliest principal node. In order to find genes differentially expressed along pseudotime, trajectories for podocytes, loop of Henle, and proximal tubule clusters were analyzed separately with the fit_models function of Monocle3. Genes with a q value < 0.05 in the differentialGeneTest analysis were kept. In an alternate approach, the graph_test function of Monocle3 was used and trajectory-variable genes were collected into modules at a resolution of 0.01.

RNA velocity
To calculate RNA velocity, the Python-based Velocyto command-line tool as well as the Velocyto.R package were used as instructed37. We used Velocyto to calculate the single-cell trajectory/directionality using spliced and unspliced reads. From loom files produced by the command-line tool, we subset the exact same cells that were previously selected randomly for Monocle trajectory analysis. This subset was loaded into R using the SeuratWrappers v0.1.0 package. RNA velocity was estimated using the gene-relative model with kNN cell pooling (k = 25). The parameter n was set at 200 when visualizing RNA velocity on the UMAP embedding.

Gene regulatory network inference. In order to identify TFs and characterize cell states, we employed cis-regulatory analysis using the R package SCENIC v1.1.2.2 (ref. 35), which infers the gene regulatory network based on co-expression and DNA motif analysis. The network activity is then analyzed in each cell to identify recurrent cellular states. In short, TFs were identified using GENIE3 and compiled into modules (regulons), which were subsequently subjected to cis-regulatory motif analysis using RcisTarget with two gene-motif rankings: 10 kb around the TSS and 500 bp upstream. Regulon activity in every cell was then scored using AUCell. Finally, binarized regulon activity was projected onto Monocle3-created UMAP trajectories.

Ligand–receptor interactions. To assess cellular crosstalk between different cell types, we used the CellPhoneDB repository to infer cell–cell communication networks from single cell transcriptome data47. We used the Python package CellPhoneDB v2.1.2 with the database v2.0.0 to predict cell type-specific ligand–receptor interactions as per the authors' instructions. Only receptors and ligands expressed in more than 5% of the cells in the specific cluster were considered. One thousand iterations of the permutation test were conducted and p values were corrected using FDR methods.

Human kidney tissue processing. Ten adult human kidney samples were obtained from the non-tumor tissue of six partial or radical nephrectomy patients from the Hospital of the University of Pennsylvania. Institutional Review Boards at the University of Pennsylvania reviewed this study. This project utilized de-identified kidney biospecimens collected via CHTN (Cooperative Human Tissue Network), and therefore was considered "exempt" by the local IRB. The work was completed in compliance with all relevant ethical regulations. Fresh tissues were shipped to the lab in RPMI on ice the same day after nephrectomy. Nuclei preparation, library construction, and sequencing methods were the same as for mouse kidneys described above.

snATAC-seq data analysis of human kidneys

Data processing and quality control. Raw fastq files were aligned to the b37 (GRCh37) genome and quantified using Cell Ranger ATAC (v. 1.2.0). The quality control and barcode filtration steps were similar to those for the mouse samples described above. Briefly, we only kept cells with promoter ratio between 20 and 60%, UMIs between 10³ and 10⁵, and mitochondria ratio below 10%. In total, we obtained 61,440 single cells across ten adult human kidney samples after quality control.

Preprocessing, dimension reduction, batch effect correction, clustering, and cell type annotation. Filtered barcode-by-cell matrices were processed with the SnapATAC pipeline similar to that in mouse samples described above. Specifically, we used 5 kb bin size to create cell-by-bin matrices; following dimension reduction, we removed batch effect by Harmony. We used k = 15 for the k-nearest neighbor algorithm and used the first 24 dimensions and resolution = 0.8 for clustering with the Louvain algorithm.

Peak calling and visualization. We used MACS2 for peak calling of each cell type with the same parameters as for mouse samples described above. To visualize the

chromatin ChIP-seq data for H3K4me3, H3K4me1, and H3K27ac, we used public data from GEO with the following identifiers: GSM1586397, GSM3716711, and GSM3716714 (ref. 79).

Immunofluorescence staining. Mouse kidneys were fixed with 4% paraformaldehyde overnight, rinsed in PBS, and dehydrated for paraffin embedding. Antigen retrieval was performed using Tris-EDTA buffer pH 9.0 with a pressure cooker (PickCell Laboratories, Agoura Hills, CA) and antibody staining performed as described80. Antibodies used were as follows: guinea pig FOXL1 (1:1500)81, mouse E-cadherin (1:250; BD Transduction 610182, Franklin Lakes, NJ), goat WT1 (clone F6) (1:50; Santa Cruz sc-7385). Cy2-conjugated, Cy3-conjugated, and Cy5-conjugated donkey secondary antibodies (1:2000) were purchased from Jackson ImmunoResearch Laboratories, Inc; AlexaFluor 488-conjugated donkey secondary antibodies were from LifeSciences (1:1000). Fluorescence images were collected on a Keyence microscope.

Statistical information. The differential accessibility analysis was conducted by pairwise Fisher's exact test. The differential expression analysis was conducted using the Wilcoxon rank sum test. The motif enrichment was based on the hypergeometric test. All statistical tests were multitest corrected using the FDR method and a significance level of 0.05 was used throughout the manuscript.

Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
Raw data, processed data, and metadata from mouse samples have been deposited in GEO with the accession code GSE157079. The processed data and metadata can also be viewed, analyzed, and downloaded at susztaklab.com/developing_adult_kidney/snATAC/, susztaklab.com/developing_adult_kidney/scRNA/, and susztaklab.com/developing_adult_kidney/igv/. The raw data, processed data, and metadata from human samples are available at https://www.diabetesepigenome.org. In this study, we downloaded public data from the following databases with accession numbers: GUDMAP (RID:Q-Y4CY); ENCODE (ENCFF338WZP, ENCFF872MVE, ENCFF455HPY, ENCFF049LRQ, ENCFF179NTO, ENCFF071PID, ENCFF746MFH, ENCFF563LOO, ENCFF184AYF, ENCFF107NQP, ENCFF465THI, ENCFF769XWI, ENCFF591DAX); GEO (GSM1051156, GSM3716711, GSM3716714, GSM1586397). Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact: Katalin Susztak. email: [email protected].

Code availability
Code is available at https://github.com/Zhen-Miao/dev-kidney-snATAC (https://doi.org/10.5281/zenodo.4421623)82.

Received: 5 June 2020; Accepted: 28 February 2021;

References
1. Reidy, K., Kang, H. M., Hostetter, T. & Susztak, K. Molecular mechanisms of diabetic kidney disease. J. Clin. Invest. 124, 2333–2340 (2014).
2. Costantini, F. & Kopan, R. Patterning a complex organ: branching morphogenesis and nephron segmentation in kidney development. Dev. Cell 18, 698–712 (2010).
3. Park, J. S. et al. Six2 and Wnt regulate self-renewal and commitment of nephron progenitors through shared gene regulatory networks. Dev. Cell 23, 637–651 (2012).
4. Harding, S. D. et al. The GUDMAP database–an online resource for genitourinary research. Development 138, 2845–2853 (2011).
5. Wu, H. et al. Comparative analysis and refinement of human PSC-derived kidney organoid differentiation with single-cell transcriptomics. Cell Stem Cell 23, 869–881 e868 (2018).
6. Takasato, M. et al. Kidney organoids from human iPS cells contain multiple lineages and model human nephrogenesis. Nature 526, 564–568 (2015).
7. Morizane, R. & Bonventre, J. V. Generation of nephron progenitor cells and kidney organoids from human pluripotent stem cells. Nat. Protoc. 12, 195–207 (2017).
8. Nishinakamura, R. Human kidney organoids: progress and remaining challenges. Nat. Rev. Nephrol. 15, 613–624 (2019).
9. Combes, A. N. et al. Correction: single cell analysis of the developing mouse kidney provides deeper insight into marker gene expression and ligand-receptor crosstalk. Development https://doi.org/10.1242/dev.182162 (2019).
10. Adam, M., Potter, A. S. & Potter, S. S. Psychrophilic proteases dramatically reduce single-cell RNA-seq artifacts: a molecular atlas of kidney development. Development 144, 3625–3632 (2017).
11. Schmidt-Ott, K. M. How to grow a kidney: patient-specific kidney organoids come of age. Nephrol. Dial. Transpl. 32, 17–23 (2017).
12. Menon, R. et al. Single-cell analysis of progenitor cell dynamics and lineage specification in the human fetal kidney. Development 145, dev164038 (2018).
13. Lindstrom, N. O. et al. Progressive recruitment of mesenchymal progenitors reveals a time-dependent process of cell fate acquisition in mouse and human nephrogenesis. Dev. Cell 45, 651–660 e654 (2018).
14. Hochane, M. et al. Single-cell transcriptomics reveals gene expression dynamics of human fetal kidney development. PLoS Biol. 17, e3000152 (2019).
15. Park, J., Liu, C. L., Kim, J. & Susztak, K. Understanding the kidney one cell at a time. Kidney Int. 96, 862–870 (2019).
16. Li, Z. et al. scOpen: chromatin-accessibility estimation of single-cell ATAC data. Preprint at bioRxiv https://doi.org/10.1101/865931 (2019).
17. Cusanovich, D. A. et al. A single-cell atlas of in vivo mammalian chromatin accessibility. Cell 174, 1309–1324 e1318 (2018).
18. McMahon, A. P. et al. GUDMAP: the genitourinary developmental molecular anatomy project. J. Am. Soc. Nephrol. 19, 667–671 (2008).
19. Fang, R. et al. Fast and accurate clustering of single cell epigenomes reveals cis-regulatory elements in rare cell types. Nat. Commun. 12, 1337 (2021).
20. Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with harmony. Nat. Methods 16, 1289–1296 (2019).
21. Park, J. et al. Single-cell transcriptomics of the mouse kidney reveals potential cellular targets of kidney disease. Science 360, 758–763 (2018).
22. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
23. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019).
24. Young, M. D. & Behjati, S. SoupX removes ambient RNA contamination from droplet based single-cell RNA sequencing data. GigaScience 9, (2020).
25. Wu, H., Kirita, Y., Donnelly, E. L. & Humphreys, B. D. Advantages of single-nucleus over single-cell RNA sequencing of adult kidney: rare cell types and novel cell states revealed in fibrosis. J. Am. Soc. Nephrol. 30, 23–32 (2019).
26. England, A. R. et al. Identification and characterization of cellular heterogeneity within the developing renal interstitium. Development 147, dev190108 (2020).
27. Frankish, A. et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 47, D766–D773 (2019).
28. Guan, Y. et al. Dnmt3a and Dnmt3b-Decommissioned Fetal Enhancers are Linked to Kidney Disease. J. Am. Soc. Nephrol. 31, 765 (2020).
29. Davis, C. A. et al. The encyclopedia of DNA elements (ENCODE): data portal update. Nucleic Acids Res. 46, D794–D801 (2018).
30. Zhu, Q. et al. Developmental trajectory of pre-hematopoietic stem cell formation from endothelium. Blood 136, 845–856 (2020).
31. Cao, J. et al. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science 361, 1380–1385 (2018).
32. O'Brien, L. L. et al. Transcriptional regulatory control of mammalian nephron progenitors revealed by multi-factor cistromic analysis and genetic studies. PLoS Genet. 14, e1007181 (2018).
33. Pliner, H. A. et al. Cicero predicts cis-regulatory DNA interactions from single-cell chromatin accessibility data. Mol. Cell 71, 858–871 e858 (2018).
34. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
35. Aibar, S. et al. SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086 (2017).
36. Nittoli, V. et al. Characterization of paralogous uncx  encoding genes in zebrafish. Gene 2, 100011 (2019).
37. La Manno, G. et al. RNA velocity of single cells. Nature 560, 494–498 (2018).
38. Schep, A. N., Wu, B., Buenrostro, J. D. & Greenleaf, W. J. chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data. Nat. Methods 14, 975–978 (2017).
39. Guo, J. K. et al. WT1 is a key regulator of podocyte function: reduced expression levels cause crescentic glomerulonephritis and mesangial sclerosis. Hum. Mol. Genet. 11, 651–659 (2002).
40. Kann, M. et al. WT1 targets Gas1 to maintain nephron progenitor cells by modulating FGF signals. Development 142, 1254–1266 (2015).
41. Nakai, S. et al. Crucial roles of Brn1 in distal tubule formation and function in mouse kidney. Development 130, 4751–4759 (2003).
42. Nilsson, D., Heglind, M., Arani, Z. & Enerbäck, S. Foxc2 is essential for podocyte function. Physiol. Rep. 7, e14083–e14083 (2019).
43. Takahashi, T. et al. Temporally compartmentalized expression of ephrin-B2 during renal glomerular development. J. Am. Soc. Nephrol. 12, 2673–2682 (2001).
44. Cheng, H. T. & Kopan, R. The role of Notch signaling in specification of podocyte and proximal tubules within the developing mouse kidney. Kidney Int. 68, 1951–1952 (2005).
45. Kang, H. M. et al. Defective fatty acid oxidation in renal tubular epithelial cells has a key role in kidney fibrosis development. Nat. Med. 21, 37–46 (2015).


46. Das, A. et al. Stromal–epithelial crosstalk regulates kidney progenitor cell differentiation. Nat. Cell Biol. 15, 1035–1044 (2013).
47. Vento-Tormo, R. et al. Single-cell reconstruction of the early maternal-fetal interface in humans. Nature 563, 347–353 (2018).
48. Trueb, B., Amann, R. & Gerber, S. D. Role of FGFRL1 and other FGF signaling in early kidney development. Cell Mol. Life Sci. 70, 2505–2518 (2013).
49. Costantini, F. & Shakya, R. GDNF/Ret signaling and the development of the kidney. Bioessays 28, 117–127 (2006).
50. Hwang, D. Y. et al. Mutations of the SLIT2-ROBO2 pathway genes SLIT2 and SRGAP1 confer risk for congenital anomalies of the kidney and urinary tract. Hum. Genet. 134, 905–916 (2015).
51. Fan, X. et al. SLIT2/ROBO2 signaling pathway inhibits nonmuscle myosin IIA activity and destabilizes kidney podocyte adhesion. JCI Insight 1, e86934 (2016).
52. Sajithlal, G., Zou, D., Silvius, D. & Xu, P. X. Eya 1 acts as a critical regulator for specifying the metanephric mesenchyme. Dev. Biol. 284, 323–336 (2005).
53. Wuttke, M. et al. A catalog of genetic loci associated with kidney function from analyses of a million individuals. Nat. Genet. 51, 957–972 (2019).
54. Hellwege, J. N. et al. Mapping eGFR loci to the renal transcriptome and phenome in the VA Million Veteran Program. Nat. Commun. 10, 3842 (2019).
55. Pattaro, C. et al. Genetic associations at 53 loci highlight cell types and biological pathways relevant for kidney function. Nat. Commun. 7, 10023 (2016).
56. Qiu, C. et al. Renal compartment-specific genetic variation analyses identify new pathways in chronic kidney disease. Nat. Med. 24, 1721–1731 (2018).
57. Menon, M. C. et al. Intronic locus determines SHROOM3 expression and potentiates renal allograft fibrosis. J. Clin. Invest. 125, 208–221 (2015).
58. Khalili, H. et al. Developmental origins for kidney disease due to Shroom3 deficiency. J. Am. Soc. Nephrol. 27, 2965–2973 (2016).
59. Ransick, A. et al. Single-cell profiling reveals sex, lineage, and regional diversity in the mouse kidney. Dev. Cell 51, 399–413 (2019).
60. Kobayashi, A. et al. Six2 defines and regulates a multipotent self-renewing nephron progenitor population throughout mammalian kidney development. Cell Stem Cell 3, 169–181 (2008).
61. Han, S. H. et al. PGC-1alpha protects from Notch-induced kidney fibrosis development. J. Am. Soc. Nephrol. 28, 3312–3322 (2017).
62. Dhillon, P. et al. The Nuclear Receptor ESRRA Protects from Kidney Disease by Coupling Metabolism and Differentiation. Cell Metab. 33, 379–394.e8 (2021).
63. Parikshak, N. N. et al. Integrative functional genomic analyses implicate specific molecular pathways and circuits in autism. Cell 155, 1008–1021 (2013).
64. Zhu, C. et al. An ultra high-throughput method for single-cell joint analysis of open chromatin and transcriptome. Nat. Struct. Mol. Biol. 26, 1063–1070 (2019).
65. Chen, S., Lake, B. B. & Zhang, K. High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell. Nat. Biotechnol. 37, 1452–1457 (2019).
66. Buenrostro, J. D., Wu, B., Chang, H. Y. & Greenleaf, W. J. ATAC-seq: a method for assaying chromatin accessibility genome-wide. Curr. Protoc. Mol. Biol. 109, 21 29 21–21 29 29 (2015).
67. Corces, M. R. et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat. Methods 14, 959–962 (2017).
68. Chen, H. et al. Assessment of computational methods for the analysis of single-cell ATAC-seq data. Genome Biol. 20, 241 (2019).
69. O’Brien, L. L. et al. Differential regulation of mouse and human nephron progenitors by the Six family of transcriptional regulators. Development 143, 595–608 (2016).
70. Hon, G. C. et al. Epigenetic memory at embryonic enhancers identified in DNA methylation maps from adult mouse tissues. Nat. Genet. 45, 1198–1206 (2013).
71. Chiou, J. et al. Single cell chromatin accessibility reveals pancreatic islet cell type- and state-specific regulatory programs of diabetes risk. Preprint at bioRxiv https://doi.org/10.1101/693671 (2019).
72. Ziffra, R. S. et al. Single cell epigenomic atlas of the developing human brain and organoids. Preprint at bioRxiv https://doi.org/10.1101/2019.12.30.891549 (2020).
73. Pijuan-Sala, B. et al. Single-cell chromatin accessibility maps reveal regulatory programs driving early mouse organogenesis. Nat. Cell Biol. 22, 487–497 (2020).
74. McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501 (2010).
75. Ramirez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).
76. Lawrence, M. et al. Software for computing and annotating genomic ranges. PLOS Comput. Biol. 9, e1003118 (2013).
77. Balzer, M. S. et al. How to Get Started with Single Cell RNA Sequencing Data Analysis. J. Am. Soc. Nephrol., ASN.2020121742, doi:10.1681/asn.2020121742 (2021).
78. Cao, J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019).
79. Meyer, M. B. et al. Targeted genomic deletions identify diverse enhancer functions and generate a kidney-specific, endocrine-deficient Cyp27b1 pseudo-null mouse. J. Biol. Chem. 294, 9518–9535 (2019).
80. Zhao, J., Hu, Z. Z., Zheng, X. G. & Ng, S. W. [2-(Tetrazol-1-yl)acetato-kappaO]tris-(triphenylphosphine-kappaP)silver(I) monohydrate. Acta Crystallogr. Sect. E Struct. Rep. Online 65, m1601 (2009).
81. Aoki, R. et al. Foxl1-expressing mesenchymal cells constitute the intestinal stem cell niche. Cell Mol. Gastroenterol. Hepatol. 2, 175–188 (2016).
82. Miao, Z. et al. Single cell regulatory landscape of the mouse kidney highlights cellular differentiation programs and disease targets, Zhen-Miao/dev-kidney-snATAC https://doi.org/10.5281/zenodo.4421623 (2020).
83. Uhlen, M. et al. Tissue-based map of the human proteome. Science 347, 1260419 (2015).

Acknowledgements
This work in the Susztak lab is supported by the NIH DK076077, DK087635 and DK105821. M.S.B. is supported by a German Research Foundation grant (BA 6205/2-1). We thank the University of Pennsylvania Diabetes Research Center (DRC) for the use of the next generation sequencing Core (P30-DK19525). We thank Dr. Pablo G. Cámara, Dr. Wenliang Wang, Qin Zhu, Feiya Ou, Siling Du, and Lulu Liu for constructive suggestions that improved the manuscript.

Author contributions
Z.M. and K.S. designed and conceived the experiment. Z.Y.M., J.W., R.S., and T.A. conducted the experiment. Z.M. conducted snATAC-seq analysis with advice from K.S., H.L., M.S.B., and J.K. M.S.B. and Z.M. conducted scRNA-seq bioinformatics analysis with advice from K.S., M.L., J.K., and M.P. M.S.B., A.M.K., and A.Y.K. conducted immunofluorescence staining with supervision from K.H.K. All authors discussed and commented on the results. K.S., Z.M., and M.S.B. wrote the manuscript and all authors edited and approved the final manuscript.

Competing interests
The authors declare no competing interests.

Additional information
Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41467-021-22266-1.

Correspondence and requests for materials should be addressed to K.S.

Peer review information Nature Communications thanks Joseph Bonventre, Mark Knepper and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Reprints and permission information is available at http://www.nature.com/reprints

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2021
