BASIC RESEARCH www.jasn.org

Single-Cell RNA Sequencing Reveals mRNA Splice Isoform Switching during Kidney Development

Yishay Wineberg,1 Tali Hana Bar-Lev,1 Anna Futorian,2 Nissim Ben-Haim,1 Leah Armon,2 Debby Ickowicz,2 Sarit Oriel,1 Efrat Bucris,1 Yishai Yehuda,1 Naomi Pode-Shakked ,3,4,5 Shlomit Gilad,6 Sima Benjamin,6 Peter Hohenstein,7 Benjamin Dekel,3,4,5 Achia Urbach,2 and Tomer Kalisky1

Due to the number of contributing authors, the affiliations are listed at the end of this article.

ABSTRACT Background During mammalian kidney development, nephron progenitors undergo a mesenchymal- to-epithelial transition and eventually differentiate into the various tubular segments of the nephron. Recently, Drop-seq single-cell RNA sequencing technology for measuring expression from thousands of individual cells identified the different cell types in the developing kidney. However, that analysis did not include the additional layer of heterogeneity that alternative mRNA splicing creates. Methods Full transcript length single-cell RNA sequencing characterized the transcriptomes of 544 indi- vidual cells from mouse embryonic kidneys. Results levels measured with full transcript length single-cell RNA sequencing identified each cell type. Further analysis comprehensively characterized splice isoform switching during the tran- sition between mesenchymal and epithelial cellular states, which is a key transitional process in kidney development. The study also identified several putative splicing regulators, including the Esrp1/2 and Rbfox1/2. Conclusions Discovery of the sets of genes that are alternatively spliced as the fetal kidney mesenchyme differentiates into tubular will improve our understanding of the molecular mechanisms that drive kidney development.

JASN 31: ccc–ccc, 2020. doi: https://doi.org/10.1681/ASN.2019080770

Kidney development is a complex process that in- population. Next, cells from the CM undergo a mes- volves multiple interacting cell types.1–5 It starts at enchymal-to-epithelial transition (MET) and the embryonic stage and continues from week 5 to progressively differentiate into early epithelial struc- week 36 of gestation in humans and from embry- tures: pretubular aggregates, renal vesicles, and onic day (E) 10.5 to approximately day 3 after birth comma and S-shaped bodies. The S-shaped bodies in mice. The process is initiated by signaling inter- actions between two lineages originating from the intermediate mesoderm: the ureteric duct and the Received August 2, 2019. Accepted May 23, 2020. metanephric mesenchyme (Figure S1 in Supple- Y.W., T.H.B.-L., A.F., and N.B.-H. are cofirst authors and B.D., A. mental Appendix 1). These interactions invoke U., and T.K. are cosenior authors. the ureteric duct to invade the metanephric mesen- Published online ahead of print. Publication date available at chyme and create a tree-like structure. Around the www.jasn.org. tip of each branch of this tree, the ureteric tip, cells Correspondence: Dr. Tomer Kalisky, Department of Bio- from the metanephric mesenchyme are induced engineering, Bar-Ilan University, Ramat Gan, Israel 52900. Email: to condense and form the cap mesenchyme (CM), [email protected] which is a transient nephron progenitor cell Copyright © 2020 by the American Society of Nephrology

JASN 31: ccc–ccc, 2020 ISSN : 1046-6673/3110-ccc 1 BASIC RESEARCH www.jasn.org further elongate and differentiate to form the various epithelial Significance Statement tubular segments of the fully developed nephron, whose main constituents are the podocytes (PODOs), the proximal tubule, Kidney development is a complex process involving multiple in- the loop of Henle (LOH), and the distal tubule. At an early stage teracting and transitioning cell types. Drop-seq single-cell tech- in their differentiation the distal tubules connect to the ureteric nology, which measures gene expression from many thousands of individual cells, has been used to characterize these cellular differ- tips that form the collecting duct system for draining the neph- entiation changes that underlie organ development. However, the rons. Meanwhile, the uninduced cells of the metanephric mes- alternative splicing of many genes creates an additional layer of enchyme differentiate into other supporting cell types of the cellular heterogeneity that Drop-seq technology cannot measure. kidney such as interstitial fibroblasts, pericytes, and mesangial Therefore, in this study, full transcript length single-cell RNA se- cells (Figure S1 in Supplemental Appendix 1). In the past few quencing was used to characterize alternative splicing in the mouse embryonic kidney, with particular attention to the identification of years, the various cell populations of the developing kidney genes that are alternatively spliced during the transition from – were characterized,6 8 mainly using the Drop-seq single-cell mesenchymal to epithelial cell states, as well as their splicing reg- RNA sequencing protocol that enables measuring of gene ex- ulators. These results improve our understanding of the molecular pression levels from many thousands of individual cells.9 mechanisms that underlie kidney development. A central process in kidney development is the MET that occurs during the differentiation of cells from the metanephric heterogeneous organ that is composed of numerous cell mesenchyme to the CM and then to nephron tubules. Similar populations of widely varying proportions,6–8,24 it is diffi- transitions from mesenchymal to epithelial states and vice cult to isolate pure populations of mesenchymal and epi- versa (epithelial-to-mesenchymal transition [EMT]) are thelial states. Moreover, typical sequencing depths from a thought to play a central role in development, as well as in single cell are not sufficient for splicing analysis. We there- 10 pathologic processes such as cancer metastasis and organ fore performed single-cell RNA sequencing on 576 indi- fi 11 fi brosis. There are signi cant structural and functional vidual cells from the kidneys of E18.5 mouse embryos differences between mesenchymal and epithelial cells. Mesen- using the Smartseq2 protocol for sequencing full transcript chymal cells are typically loosely associated with each other, lengths.25,26 We first identified the main cell lineages that surrounded by an extracellular matrix, and have migratory coexist in the nephrogenic zone of the fetal developing capabilities, whereas epithelial cells are tightly interconnected kidney: the uninduced metanephric mesenchyme, the by junctions and are polarized with distinct apical and baso- CM, podocytes, epithelial cells, endothelial cells, and in- lateral membranes. Thus, epithelial cells can create two- filtrating immune cells (e.g., macrophages). We then dimensional surfaces and tubes with a clear in/out distinction merged the raw reads from all cells belonging to each pop- that are capable of absorption and secretion. There are also ulation in order to create “bulk” in silico transcriptomes large differences in gene expression: mesenchymal cells typi- that represent each cell population. These bulk transcrip- cally express Fibronectin (Fn1), Vimentin (Vim), and tomes had sufficient sequencing depth to allow us to char- 12 N- (Cdh2), as well as the transcription factors acterize splice isoform switching and to identify putative Snai1/2, Zeb1/2, and Twist1/2, whereas epithelial cells typi- splicing regulators. cally express other genes such as E-cadherin (Cdh1) and Epcam. It was recently discovered that mesenchymal and epithelial METHODS cells also express alternative splice isoforms of genes that are expressed in both cell states. For example, in many systems Tissue Collection, Dissociation, and Flow Cytometry both in vivo and in vitro, the genes Enah, Cd44, Ctnnd1, and A wild-type female mouse was crossed with a male that was Fgfr2 were found to be expressed in both mesenchymal and heterozygous for the Six2-GFP transgene.27 The female was epithelial states, but with unique isoforms specifictoeach euthanized at E18.5 of pregnancy and kidneys from 11 em- state.13–18 It was also found that RNA binding , such bryos were dissected, placed in PBS on ice, and examined un- as ESRP1/2, RBFOX1/2, RBM47, QKI, act as splicing regula- der a fluorescence microscope to check for the presence of tors that promote splicing of specific mesenchymal or epithe- GFP. Kidneys from transgenic embryos showed a clear fluo- lial variants.12,17,19,20 mRNA splicing creates an additional rescent pattern marking the CM (Figure 1A), whereas those layer of heterogeneity that, apart from specific genes,21–23 from the nontransgenic embryos showed uniform back- has not yet been comprehensively studied in the developing ground fluorescence. Seven out of the 11 embryos contained kidney. the Six2-GFP transgene. Keeping the transgenic and nontrans- Therefore, in this study we set to characterize the splice genic kidneys separate, each kidney was then cut into two to isoform switching events that occur during the transition four small pieces using a surgical razor blade and placed in between the mesenchymal and epithelial cellular states in 1mloftrypsin0.25%(03–046–1A, Trypsin Solution B the course of kidney development by comparing gene ex- [0.25%]; Biologic Industries) using forceps. Initial tissue trit- pression and alternative splicing in the various mesenchy- uration was performed using a tissue grinder (D8938, Dounce, mal and epithelial cell populations. Because the kidney is a large clearance pestle) followed by up-down pipetting with a

2 JASN JASN 31: ccc–ccc,2020 www.jasn.org BASIC RESEARCH

A Mouse fetal kidney (E18.5) Cell count Fluorescence LASER

SIX2 promoter EGFP

In-depth characterization: Smart-seq2 1.Identify cell populations Single cell RNAseq 2.Find splice isoform switching events (Full transcript length sequence) 3.Identify splicing regulator candidates

B

Serpine1 Foxd1 Osr1 Six2 Eya1 Cited1 Sall2 Ncam1 CM Wt1 Cdh11 Rhbg Clcnkb Foxi1 Slc4a9 Atp6v1g3 CD Atp6v0d2 Pax2 Pax8 Gapdh Krt19 Wnt9b Muc1 Gata3 Cldn8 DIST/CD Cldn4 Aqp2 Umod Slc12a1 Kcnj1 Clcnka LOH Prom1 Ocln Krt18 Epcam Cldn7 EPI Cldn6 Cdh1 Cdh6 Lrp2 Slc22a6 Slc34a1 Hnf4a Cldn2 PROX Slc5a1 Aqp1 Top2a Cell Mki67 Cd97 Actb division Meis1 Serpine2 Fn1 Col1a1 Pdgfrb Col3a1 UM Col1a2 Col5a2 Acta2 Synpo Podxl Nphs2 Nphs1 Ptpro PODO Mafb S100a8 S100a9 Ngp Ltf NEUTR Edntb Cd93 Dysf Fdr Cdh5 ENDO Cd34 Cd14 Cxcl2 Emr1 Ccl3 Cd68 Ccl2 Mrc1 MACRO C1qb C1qa

CM UM DISTAL/CD PROX

Six2-high

Six2-low PODO MACRO ENDO LOH

Figure 1. Single-cell RNA sequencing was used to identify and transcriptionally characterize the main cell lineages that coexist in the nephrogenic zone of the developing mouse fetal kidney. (A) Shown is the general outline of the experiment. Kidneys from transgenic mouse embryos with a Six2-GFP reporter gene were harvested at E18.5. The Six2-GFP reporter gene shows a clear fluorescent pattern marking the CM. After tissue dissociation, single cells from the Six2-high and Six2-low cell fractions were sorted into individual wells and their mRNA was sequenced for full transcript length using the Smartseq2 protocol. (B) A clustergram for 83 selected genes from the literature versus 544 cells shows the main cell lineages in the developing kidney. We manually classified the cells to lineages known

JASN 31: ccc–ccc, 2020 mRNA Splice Isoforms Switch during Kidney Development 3 BASIC RESEARCH www.jasn.org

P1000 pipette. Tissue fragments were then incubated with Purification Kit (51800; Norgen Biotek) was used in one ex- trypsin at 37°C for 20 minutes, followed by additional tritu- periment where we sorted 50,000 Six2-high and 50,000 Six2- ration by pipetting. After visual confirmation that the majority low cells, and the Direct-zol RNA MiniPrep Kit (R2050; Zymo of cells were fully dissociated, enzyme digestion was stopped Research) was used in two experiments where we sorted ap- by adding 2 ml of DMEM with 10% FBS and placed on ice. proximately 100,000 Six2-high cells and 400,000 Six2-low Cells were first filtered with 70-mm cell strainers (CSS- cells. Bulk total RNA was purified according to the manufac- 010–070; Lumitron) and then 40-mmcellstrainers turer’s instructions and stored at 280°C. Altogether, six (732–2757; VWR International), pelleted by centrifugation samples of total RNA were collected (three replicates, each for 5 minutes at 5003g, resuspended in 3 ml PBS, and kept containing total RNA from Six2-high and Six2-low cells). on ice until FACS sorting. Sequencing libraries were prepared using the G-INCPM Single-cell sorting was done using a BD FACSAria III flow in-house protocol for mRNA sequencing. Briefly, the poly-A cytometer with an 85 mm nozzle. Forward and side scatter fraction (mRNA) was purified from 80 to 280 ng of total RNA, were used to filter out red blood cells and to select for live followed by fragmentation and generation of double-stranded single cells. Gating on the basis of the negative control—the cDNA. Then, end repair, A-tailing, adapter ligation, and PCR cells from the nontransgenic kidneys—was used to select for amplification steps were performed. Libraries were evaluated cells that were either positive (Six2-high) or negative (Six2- using a Qubit (Thermo Fisher Scientific) and TapeStation low) for expression of the Six2-GFP transgene. (Agilent). Sequencing libraries were constructed with barco- All procedures were approved by the institutional Animal des to allow for multiplexing of six libraries in a single lane. All Care and Use Committee in Bar-Ilan University. six libraries were paired-end sequenced (23126 bases) on an Illumina HiSeq 2500 platform in the G-INCPM. For each Single-Cell RNA Sequencing sample we obtained 35–42 million reads. For single-cell RNA sequencing, single cells were sorted into The GEO series record for the bulk RNA sequencing data is 96 individual wells from a 384-well plate that was prefilled GSE147525. with Smartseq2 cell lysis buffer, RNase inhibitor, oligo-dT primer, ERCC RNA spike-in mix, and dNTPs. Plates were Single-Cell RNA Sequencing Data Preprocessing and spun for 1 minute to collect the liquid and cells at the bottom Gene Expression Analysis of the wells and immediately frozen. The Smartseq2 protocol Raw reads from 576 cells (6396-well plates) were aligned by was performed by the Israel National Center for Personalized TopHat228 to the mouse mm10 genome. Aligned reads were Medicine (G-INCPM) as previously described,25 with 22 am- counted by HTSeq.29 Data normalization and estimation of plification cycles. Altogether, six plates were processed, each size factors was done by DESeq2,30 resulting in a matrix of containing 96 individual cells (576 cells in total), with three normalized gene expression counts (Supplemental Table 1). plates containing Six2-high cells and three plates containing We then filtered out 11 cells that expressed zero levels of Six2-low cells. Each one of the six plates was separately se- the housekeeping genes Gapdh or Actb, resulting in 565 quenced (1361 bases) on the Illumina HiSeq 2500 platform in cells. We chose highly variable genes using the method by the G-INCPM. Macosko et al.9 to select for genes whose variance exceeds The GEO series record for the single-cell RNA sequencing those of other genes having a similar mean expression value data is GSE146988. (Figure S2A in Supplemental Appendix 1). This step resulted in 647 highly variable genes, to which we added a list of genes Bulk RNA Sequencing fromtheliteraturethatwerepreviouslyshowntobeinvolved Six2-high and Six2-low cells were sorted into separate 1.5 ml in kidney development (Supplemental Table 2). We also tubes, each containing RNA purification buffer. This was re- added an additional list of 48 genes from a previous single- peated in three experiments in which we used two types of cell quantitative PCR study that we previously conducted on RNA purification kits: the Norgen single-cell RNA human fetal kidney cells23 (SupplementalTable2),which,in to coexist in the nephrogenic zone of the developing fetal kidney, including the un-induced mesenchyme (UM), the cap mesenchyme (CM), podocytes (PODO), proximal tubular epithelial cells (PROX), the loop of Henle (LOH), distal tubular cells and collecting duct (DIST_CD), endothelial cells (ENDO), and infiltrating immune cells, mainly macrophages (MACRO) and a small number of neutrophils (NEUTR). The horizontal bar at bottom of the figure depicts the gate used by FACS to sort each cell (Six2-high or Six2-low). Consistent with the fact that Six2 is highly expressed in the CM, it can be seen that cells originating from the Six2-high fraction predominantly belong to the CM, whereas cells originating from the Six2-low fraction belong mostly to the other populations. The gene panel in- cludes genes that were previously shown to be specific to the different populations, as well as general epithelial markers (EPI) and genes that are overexpressed in the S-G2-M phase of the cell cycle, indicating cell division. Notice that some cells (within the second cluster from the left) express a mixture of CM markers and epithelial markers (EPI, PODO), as well as cell division. These are presumably cells from early epithelial structures, presumably pretubular aggregates, renal vesicles, and C/S-shaped bodies (see Supplemental Appendix 1).

4 JASN JASN 31: ccc–ccc,2020 www.jasn.org BASIC RESEARCH retrospect, were not crucial to the identification of the dif- enrichment of genes containing differen- ferent cell populations. These steps resulted in a gene ex- tial splicing events (Supplemental Table 9) was done with pression matrix of 677 genes 3 565 cells. Each gene was ToppGene (https://toppgene.cchmc.org).38 rMAPS (http:// then modified-log-transformed (log2[11expression]) and rmaps.cecsresearch.org/)39 was used to test for enrichment standardized by subtracting the mean, dividing by the SD, of binding motifs of RNA binding proteins in the vicinity of and truncating to the range (21,1). alternatively spliced cassette exons in order to identify pu- We used tSNE31 to project the 565 single cells profiles tative splicing regulators. A list of 84 RNA binding proteins into a two-dimensional plane (Figure S2E in Supplemental was obtained from the rMAPS website (http://rmaps. Appendix 1) and used genes that are known to mark differ- cecsresearch.org/Help/RNABindingProtein).39–41 Apart ent populations in the fetal kidney to identify the various from the RNA binding motifs that are tested by the default populations, including a population of 21 low-quality cells settings in the rMATS website, we also tested additional that appears as a “mixture” of many cell types. This popu- UGG-enriched motifs that were previously found to be lation of cells displayed low expression levels of the house- binding sites for the RNA binding proteins Esrp119,42 and keeping genes Actb and Gapdh, as well as low DESeq size Esrp243 (Supplemental Table 10). For the RNA binding pro- factors (Figure S3 in Supplemental Appendix 1). After re- teins Rbfox1 and Rbfox2, following Yang et al.12 and the moving these 21 low-quality cells, the process of selecting CISBP-RNA database40 (http://cisbp-rna.ccbr.utoronto. for highly variable genes (and adding known genes related ca),weassumedthatbothproteins(Rbfox1andRbfox2) to kidney development as described above) was then re- preferentially bind to the same motif ([AT]GCATG[AC]) peated for the remaining high-quality cells, resulting in a on mRNA. matrix of 728 genes 3 544 cells, whose analysis is shown below. We note that once a sufficient number of highly vari- able genes are included, the ability to identify the different RESULTS cell populations (Figure 2A) is not very sensitive to the exact choice of genes. Gene Expression Levels Were Used to Classify Each Cell into One of the Various Cell Types that Coexist Creation of Bulk In Silico Transcriptomes Representing within the Nephrogenic Zone of the Developing the Different Cell Populations Mouse Fetal Kidney After identifying the different populations (Supplemental We collected kidneys from transgenic mouse embryos that Table 3) according to known gene markers from the literature express GFP under the control of a Six promoter27 (e.g., Adam et al.6 and Magella et al.7), we created bulk in silico (Figure 1A). These mice have the advantage that cells from transcriptomes representing the different cell populations by the CM—previously shown to express the transcription factor merging reads (*.bam files) from all cells belonging to each Six2—are fluorescently tagged and can be enriched by flow population (using samtools merge). Again, aligned reads were cytometry. The kidneys were collected at E18.5 of gestations counted by HTSeq29,32 and data normalization and estimation because at this stage we expect to observe a still-active neph- of size factors was done by DESeq2.30,33 rogenic zone containing both early progenitor cell populations as well as fully developed nephrons.6,7 After kidney dissocia- Splicing Analysis: Identification of Splicing Events and tion, we used flow cytometry to select 288 cells expressing high Putative Splicing Regulators levels and 288 cells expressing low levels of the Six2-GFP rMATS34 was used to detect splice isoform switching events transgene, and for each individual cell we performed full tran- between the bulk in silico transcriptomes representing the dif- script length single-cell RNA sequencing using the Smartseq2 ferent cell populations and to infer the inclusion levels of se- protocol.25,26 After discarding low-quality cells, this resulted lected splicing events. In order to compare the mesenchymal in gene expression and sequence information for 544 individ- populations (uninduced mesenchyme [UM] and CM) to the ual cells, with approximately equal proportions of cells orig- epithelial populations (early epithelial structures [PROX_1], inating from the Six2-high and Six2-low fractions. proximal epithelial tubules [PROX_2], LOH, distal tubule and Using expression levels of selected genes from the literature collecting duct [DIST_CD]), we performed the following that were previously shown to be specific to each cell popula- comparisons: UM-PROX_1, UM-PROX_2, UM-LOH, UM- tion,24,44,45 genes from recent publications on single-cell pro- DIST_CD, CM-PROX_1, CM-PROX_2, CM-LOH, and CM- filing of the fetal kidney using the Drop-seq technology,6–8 as DIST_CD. Additionally, we compared the mesenchymal well as general epithelial markers and genes indicating cell populations (UM and CM) to the PODOs: UM-PODO and division, we manually classified each cell into one of the major CM-PODO (see Supplemental Tables 4–8). Selected splicing cell types that coexist in the nephrogenic zone of the develop- events (e.g., cassette exons) were visualized and validated using ing kidney (Figures 1B and 2A, Figures S1 and S4–S9 in IGV35 and Sashimi plots,36 as well as with bar plots of inclu- Supplemental Appendix 1). These include the un-induced sion levels inferred from rMATS34 or DEXSeq37 (see mesenchyme (UM), the cap mesenchyme (CM), podocytes Supplemental Appendix 1). (PODO), early epithelial structures (PROX_1; presumably

JASN 31: ccc–ccc, 2020 mRNA Splice Isoforms Switch during Kidney Development 5 BASIC RESEARCH www.jasn.org

A B tSNE CM only (PCA) 40 Original MKI67 TOP2A ENDO CCNA2 CM UM ANLN 20 BIRC5 5

0 0 MACRO -5 -10

PROX_1 PC3 (1.92% Var)

-20 PODO PROX_2 0 0 PC1 (8.02% Var) LOH Uninduced M (COL1A1 and COL3A11) 10 10 CM (SIX2) PC2 (2.35% Var) 20 -40 Proximal tubule (SLC22A6) Distal tububle (MUC1) DISTAL/CD LOH (SLC12A1) CD (AQP2) -60 Podocytes (NPHS2) -50 0 Endothelial (KDR and CDH5) Macrophages (C1QB and CCL3) CD Mesenchymal markers Epithelial markers

Marking levels of Snai2 *104 Marking levels of cdh6 *104 40 2.5 40 3 ENDO ENDO CM UM Snai2 CM UM 2.5 Cdh6 20 2 20

2 0 1.5 0 MACRO MACRO PROX_1 PROX_1 1.5 -20 PODO PROX_2 1 -20 PODO PROX_2 LOH LOH 1 -40 0.5 -40 0.5 DISTAL/CD DISTAL/CD -60 0 -60 0 -50 0 50 -50 0 50 UM UM LOH LOH PODO PODO CM_ALL CM_ALL PROX_1 PROX_2 PROX_1 PROX_2 DIST_CD 4 DIST_CD Marking levels of *10 Marking levels of *103 40 8 40 16 ENDO 7 ENDO CM UM Cdh11 CM UM 14 Cdh1 20 20 6 12

5 0 0 10 MACRO MACRO PROX_1 4 PROX_1 8 -20 PODO PROX_2 3 -20 PODO PROX_2 6 LOH LOH 2 4 -40 -40 DISTAL/CD 1 DISTAL/CD 2 -60 0 -60 0 -50 0 50 -50 0 50 UM UM LOH LOH PODO PODO CM_ALL PROX_1 PROX_2 CM_ALL PROX_1 PROX_2

DIST_CD 4 Marking levels of *103 Marking levels of Epcam *10 DIST_CD 40 8 40 6 ENDO 7 ENDO Epcam CM Cdh2 CM UM 5 20 UM 20 6 4 5 0 0 MACRO MACRO PROX_1 4 PROX_1 3 -20 PODO PROX_2 3 -20 PODO PROX_2 LOH LOH 2 2 -40 -40 1 DISTAL/CD 1 DISTAL/CD -60 0 -60 0 -50 0 50 -50 0 50 UM UM LOH LOH PODO PODO CM_ALL CM_ALL PROX_1 PROX_2 PROX_1 PROX_2 DIST_CD DIST_CD

Figure 2. Single-cell gene expression analysis enables characterization of cellular heterogeneity, cell cycle dynamics, and the Mes- enchymal to Epithelial Transition (MET) in the developing mouse fetal kidney. (A) Shown is a tSNE plot of 544 single-cell gene ex- pression profiles, each consisting of 728 highly variable genes (see Methods). Each cell is represented by a dot. Cells overexpressing genes that were previously shown to mark different cell types are marked by additional symbols. (B) Cells of the CM create a circular manifold in gene expression space that corresponds to the cell cycle. Shown is a PCA figure of cells from the CM only. Each cell is

6 JASN JASN 31: ccc–ccc,2020 www.jasn.org BASIC RESEARCH pretubular aggregates, renal vesicles, and C/S-shaped bodies), Supplemental Appendix 1), with the highest expression levels proximal epithelial tubules (PROX_2), the loop of Henle being expressed in the UM, medium levels in the CM, lower (LOH), distal tubule and collecting duct (DIST_CD), endo- levels in the PODOs, and very low levels in the epithelial line- thelial cells (ENDO), and infiltrating immune cells, mainly ages (PROX_1 and PROX_2). Likewise, when comparing macrophages (MACRO). PROX_1 cells differ from PROX_2 Cdh6 and Cdh1 (Figure 2D, Figure S12 in Supplemental Ap- cells in that they overexpress markers for early epithelial struc- pendix 1) we noticed that Cdh6 is higher in the early epithelial tures such as Mdk and Lhx1 (Figure S7 in Supplemental Ap- structures (PROX_1) and PROX_2, whereas Cdh1 is higher in pendix 1), as well as markers for actively dividing cells such as the LOH and DIST_CD.47 Mki67 and Top2a (Figure S10 in Supplemental Appendix 1). We note that we found it extremely difficult to distinguish rMATS Was Used to Characterize the Splice Isoform between cells of the distal tubule and cells of the collecting Switching Events that Occur during the Transition duct in our data set, probably because of their transcriptional between the Mesenchymal and Epithelial Cellular similarity as well as the relatively small number of cells in this States in the Course of Kidney Development experiment, and therefore we merged them into a single We next used the full transcript length sequence information population. to characterize the splice isoform switching that occurs during After identifying the various cell subpopulations, we in- the MET in the course of kidney development. We first focused spected the expression levels of the genes Mki67 and Top2a on identifying alternatively spliced exons (cassette/skipped that are known to be overexpressed during cell division (Fig- exons) by searching for exons whose inclusion levels (defined ure S10 in Supplemental Appendix 1). We found that the UM, as the fraction of transcripts that include the exon out of the CM, PROX_1, LOH, and DIST_CD each contain a substantial total number of transcripts that either include the exon or skip subset of dividing cells that overexpress these genes, whereas over it) changes significantly between the mesenchymal and the PODOs and PROX_2 do not. Moreover, we found that epithelial states. Because the coverage for each single cell was the cells of the CM create a circular manifold (i.e., a high di- rather low for splicing analysis, we first merged the raw reads mensional ring) in gene expression space, whose segments from all cells belonging to each population in order to create correspond to the different phases of the cell cycle bulk in silico transcriptomes that represent each cell popula- (Figure 2B, Figure S11 in Supplemental Appendix 1). Using tion, and then used the resulting bulk transcriptomes as input RNA velocity46—a computational tool for inferring a vector to rMATS34 (Supplemental Tables 4–8). We searched for cas- between the present and predicted future transcriptional state sette exons whose inclusion levels change significantly of each single cell by distinguishing between the spliced mRNA (FDR50, difference in inclusion levels .0.2) between either (present state) and yet-unspliced mRNA (future state)—we of the mesenchymal populations, the UM or the CM, and all of observed a consistent directional flow along this circular man- theepithelialpopulations,PROX_1,PROX_2,LOH,and ifold (Figure S11 in Supplemental Appendix 1). DIST_CD. Next, we inspected the expression levels of genes that are We found 57 cassette exons that were thus differentially known to be involved in the MET or that are known to be expressed between the mesenchymal and epithelial lineages, preferentially overexpressed in mesenchymal or epithelial out of which 55 could be validated by manual inspection in the lineages10,12,47 (Figure2,CandD,FiguresS12andS13in IGV genome browser35 (Figure 3, A and C, selected exam- Supplemental Appendix 1). We observed high levels of ples in Figures S23–S41 in Supplemental Appendix 1, mesenchyme-associated genes such as Snai2, Cdh11, and Supplemental Table 9). These exons include some known ex- Cdh2 in the earlier developmental lineages: the UM and amples that were previously observed in EMT such as the CM. Likewise, higher levels of epithelial genes such as Cdh6, epithelial-associated cassette exons in Map3k7,42,48,49 Cdh1, and EpCAM were prevalent in the more differentiated Dnm2,42,50,51 Pard3,50 and the mesenchymal-associated exons lineages: the early epithelial structures (PROX_1), PROX_2, Plod2,12,18,52 Csnk1g3,18,49 and Ctnnd1.14,18 We next used LOH, and DIST_CD. We noticed that the expression of Cdh11 ToppGene 38 to perform gene ontology enrichment analysis showed a gradual decrease (Figure 2C, Figure S12 in (Figure 3B, Supplemental Table 9) and found that the genes represented by a dot. Cells overexpressing genes such as Top2a and Mki67 (genes that were shown to be overexpressed in the S-G2-M phases of the cell cycle) are marked by additional symbols. These cells are located in a specific segment of the circular manifold representing the S-G2-M segment of the cell cycle. (C and D) Expression levels of the mesenchyme related genes Snai2, Cdh11, and Cdh2 are typically higher in the UM and CM, whereas the epithelial genes Cdh6, Cdh1, and Epcam are typically higher in the PROX_1, PROX_2, LOH, and DIST_CD. Shown are tSNE plots and bar plots showing the expression levels of selected genes. The area of each circle in each tSNE plot is proportional to log2(11expression) of the specific gene in that particular cell. The expression level in each cell is also encoded by the circle color (red, high expression; green, low expression). The bar plots show gene expression levels in bulk in silico cell transcriptomes representing the different populations (UM, CM, PODO, PROX_1, PROX_2, LOH, and DIST_CD) that were created by uniting raw reads from all cells belonging to each population. The annotations “CM_ALL” and “CM” are used in- terchangeably to represent all cells that were classified as belonging to the CM (see also Supplemental Appendix 1).

JASN 31: ccc–ccc, 2020 mRNA Splice Isoforms Switch during Kidney Development 7 BASIC RESEARCH www.jasn.org

A B -log10(p_val) Inclusion levels 01234567 Cadherin binding Dmtf1 molecule binding Myo1b Cytoskeletal binding Bnip2 Sorbs1 Actin binding Pam Cytoskeleton organization Palm Plod2 Cell junction organization Lrrfip1 Actin cytoskeleton Uap1 Trp53bp1 Actin filament Rnf138 Adherens junction Mprip Cttn Anchoring junction Cnnm2 Lamellipodium Mpzl1 Mark3 Cell junction App Cell leading edge Dtnb Rps24 Actin-based cell projection Ktn1 Filopodium Ctnnd1 Csnk1g3 Focal adhesion Gyk Adherens junction Abi1 The splicing regulators Esrp1 and Esrp2 Osbpl9 Nf2 direct an epithelial splicing program Trappc13 essential for mammalian development Flnb (PMID: 26371508) Fxr1 Dzip3 Tcf7l2

Scarb1 Dnm2_chr9_21490657_21490669_21489742_21489794_21494503_21494617_ Gpbp1 C 1 Map3k7_chr4_31994873_31994954_31992386_31992516_32002088_32002153_ 1 Mtmr2 Ankrd17 0.8 Map3k7 0.8 Dnm2 Tpp2 0.6 0.6 Tjp1 Map3k7 0.4 0.4 Macf1 Inclusion Levels Inclusion Levels Picalm 0.2 0.2 Ctage5 0 Car12 0 UM CM UM CM

Csnk1d LOH LOH PODO

Aplp2 PODO PROX_1 PROX_2 DIST_CD PROX_1 PROX_2 St3gal6 DIST_CD Nadkd1 [0 - 97] UM.bam [0 - 153] UM.bam March7 54 31 69 26 Sorbs2 2 14 22 [0 - 606] CM_ALL.bam [0 - 553] CM_ALL.bam Golgb1 467 84 323 120 126 Add3 48 39 [0 - 100] PODO.bam [0 - 81] PODO.bam 42 33 Strn3 61 14 39 Rai14 6 17 [0 - 99] PROX_1.bam [0 - 119] PROX_1.bam 59 61 Dnm2 61 51 24 Pard3 4 37 [0 - 132] PROX_2.bam 72 75 Immt [0 - 63] PROX_2.bam 33 32 33 0 16 LOH.bam [0 - 57] 43 47 [0 - 41] LOH.bam 29 33 7 UM CM 8 5 [0 - 306] DIST_CD.bam LOH 219 230 [0 - 163] DIST_CD.bam 119 22

PODO 110

08 35 21489714 21491382 21493050 21494718 PROX_1 PROX_2

DIST_CD 31992643 31995909 31999176 32002443

Pard3_chr8_127410728_127410773_127409555_12409707_127415569_127415797_ Plod2_chr9_92601984_92602041_92600740_92600882_92602947_92603061_ Csnk1g3_chr18_53948644_53948668_53948047_53948154_53953244_53955684_ 1 0.8 Ctnnd1_chr2_84612531_84612549_84612023_84612092_84612935_84613089_ 0.8 0.8 0.8 Pard3 Plod2 Csnk1g3 Ctnnd1 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 Inclusion Levels Inclusion Levels Inclusion Levels 0.2 Inclusion Levels 0.2 0.2 0.2

0 0 0 0 UM CM UM CM UM CM LOH UM CM LOH LOH LOH PODO PODO PODO PODO PROX_1 PROX_2 DIST_CD PROX_1 PROX_2 PROX_1 PROX_2 PROX_1 PROX_2 DIST_CD DIST_CD DIST_CD

[0 - 34] UM.bam 87 11 33 11 [0 - 241] UM.bam [0 - 176] UM.bam [0 - 114] UM.bam 130 41 2 20 47 49 24 [0 - 222] CM_ALL.bam 41 2 3 77 [0 - 804] CM_ALL.bam 660 622 CM_ALL.bam [0 - 585] CM_ALL.bam 207 [0 - 175] 252 230 4 68 96 33 401 240 102 [0 - 45] PODO.bam [0 - 40] PODO.bam 8 5 16 [0 - 186] PODO .bam v 23 15 19 82 48 [0 - 279] PODO.bam 4 9 15 40 84 102 13 [0 - 59] PROX_1.bam [0 - 65] 12 PROX_1.bam 33 49 [0 - 76] PROX_1.bam 41 8 [0 - 277] 28 PROX_1.bam 3 15 12 75 8 60 46 [0 - 86] PROX_2.bam [0 - 68] PROX_2.bam 19 58 32 9 [0 - 231] PROX_2.bam 162 11 [0 - 277] 12 PROX_2.bam 2 5 6 5 [0 - 20] LOH.bam 66 6 [0 - 41] LOH.bam 5 12 24 4 [0 - 36] LOH.bam 23 [0 - 119] LOH.bam 3 3 19 5 8 [0 - 188] DIST_CD.bam [0 - 89] DIST_CD.bam 8 134 2 46 [0 - 456] DIST_CD.bam 9 3 285 36 [0 - 517] 141 DIST_CD.bam 74 19 46 84611880 84612342 84612805 84613268 27409389 127411663 127413937 12741621 43 92600750 92601558 92602366 92603174 53948089 53949864 53951640 53953415

Figure 3. Characterization of splice isoform switching events that occur during the Mesenchymal to Epithelial transition (MET) in the course of kidney development. (A) Shown is a heatmap of inclusion levels of selected cassette exons that change significantly (FDR50, difference in inclusion levels .0.2) between either of the mesenchymal populations (UM or CM) and all of the epithelial populations

8 JASN JASN 31: ccc–ccc,2020 www.jasn.org BASIC RESEARCH containing these exons are related to epithelial characteristics the mesenchymal rather than the epithelial isoforms (e.g., cadherin binding or cell junction) or mesenchymal char- (Figure 3A, Figures S15–S17 in Supplemental Appendix 1). acteristics (e.g., lamellipodium or cell leading edge, related to This reveals another aspect in which PODOs, which form a cellular motility). Moreover, the enrichment analysis pointed specialized type of epithelial tissue,5 are different from most out that splicing in many of these genes is controlled by the other forms of epithelia. splicing regulators Esrp1 and Esrp2, as was recently shown in The genes Enah and Cd44 are prominent examples of genes developing mice.19 that undergo splice isoform switching during EMT.17,18 Both Using similar analysis, we found additional MET-related genes contain cassette exons that are expressed in epithelial alternative splicing events of types other than cassette/skipped cells only. However, this behavior was difficult to observe in exons (SE), although to a much smaller extent. These include our single-cell data set, probably because of the relative low mutually exclusive exons (MXE) such as Fgfr214,53 (Figure S15 coverage and bias of our single-cell protocol with respect to in Supplemental Appendix 1), Actn1 (Figure S42 in Supple- bulk RNA sequencing. We therefore performed additional mental Appendix 1), and Tpm1 (Figure S43 in Supplemental bulk RNA sequencing on three replicates of sorted Six2-high Appendix 1); alternative 39 splice sites (A3SS) such as Acsl4 and Six2-low cell fractions. Because Six2 is uniquely expressed (Figure S44 in Supplemental Appendix 1) and Bmp1 (Figure in the CM, the cell fraction that was gated Six2-high is pre- S45 in Supplemental Appendix 1); and alternative 59 splice dominantly composed of cells originating from the CM, sites (A5SS) such as Polr2k (Figure S46 in Supplemental Ap- whereas the cell fraction gated for Six2-low contains a mixture pendix 1). We also found a retained intron (RI) in the gene of all of the other mesenchymal and epithelial populations. Srsf1 (Figure S47 in Supplemental Appendix 1) that was ex- Nevertheless, by manually comparing Sashimi plots from pressed at higher levels in the PODOs with respect to other cell these two cell fractions we were able to observe the alterna- populations. tively spliced cassette exons in the genes Enah and Cd44 (Fig- ure S18 in Supplemental Appendix 1). Supervised Analysis Identifies Additional Genes that A different form of alternative splicing, not related to Undergo Splice Isoform Switching during Kidney EMT, was observed in gene Cldn10, which is an important Development component of epithelial tight junctions in the kidney and Because the low coverage and bias of single-cell protocols provides a barrier that permits selective paracellular trans- limits the power of automated tools for discovering splice port.54 Two isoforms of the gene Cldn10 were previously isoform switching, we searched the existing literature for ad- found to coexist in the kidney, one being highly expressed ditional genes for which different splice isoforms are ex- in the cortex and the other in the medulla.55 These alterna- pressed in mesenchymal versus epithelial cells or which are tively spliced isoforms are thought to generate different known to undergo splice isoform switching during EMT, and permselectivities in different segments of the nephron. In examined their alternative splicing between the different cell our data set we observed that the cortical isoform was indeed populations of the developing kidney. Indeed, we found that overexpressed in PROX_1 and PROX_2, which are located in the genes Fgfr214,53 (Figure S15 in Supplemental Appendix 1, the cortex, whereas the medullary isoform was overexpressed also found in our previous analysis), Epb41l517,51 (Figure S15 in the LOH, which is predominantly located in the medulla in Supplemental Appendix 1), Fat118,50 (Figure S16 in Sup- (Figure S19 in Supplemental Appendix 1). This confirmed plemental Appendix 1), and Arhgef10l19,20,43 (Figure S17 in previous in situ hybridization measurements55 also at the Supplemental Appendix 1) express their mesenchymal iso- single-cell transcriptomic level. forms predominantly in the early mesenchymal populations We also observed two known splice isoform switching events (UM and CM), and their epithelial isoforms mostly in the in the gene Wt1, a central gene in kidney development4–8: cassette more differentiated epithelial populations (LOH and exon 5 is low in the mesenchymal populations and increases grad- DIST_CD). PROX_1 and PROX_2 express either the mesen- ually during MET, and the KTS1:KTS2 isoform ratio in exon 9 chymal or the epithelial isoform or a mixture of both. Inter- converges to 60:40 in the epithelial populations56–65 (see Figure estingly, we observed that the PODOs in many cases express S20 in Supplemental Appendix 1).

(PROX_1, PROX2, LOH, and DIST_CD). Exon inclusion levels were derived from in silico bulk transcriptomes representing the different cell populations (UM, CM, PODO, PROX_1, PROX_2, LOH, and DIST_CD) that were created by uniting raw reads from all cells be- longing to each population. Colors indicate relative high (red) versus low (green) inclusion levels. The inclusion levels for each exon (row) were independently standardized by mean-centering and dividing by the SD. (B) Gene ontology (GO) enrichment analysis shows that the genes containing these differentially expressed cassette exons are related to structural and functional properties of epithelial cells (e.g., cadherin binding or cell junction) or to characteristics of mesenchymal cells (mainly cellular motility, e.g., lamellipodium and cell leading edge). (C) Sashimi plots and bar plots of inclusion levels in selected exons show gradual increase (Map3k7,42,48,49 Dnm2,42,50,51 and Pard350) or decrease (Plod2,12,18,52 Csnk1g3,18,49 and Ctnnd114,18) of inclusion levels during the transition from mesenchymal (UM and CM) to epithelial states (PROX_1, PROX_2, LOH, and DIST_CD; see also Supplemental Appendix 1).

JASN 31: ccc–ccc, 2020 mRNA Splice Isoforms Switch during Kidney Development 9 nw rmteltrtr orglt piigtruhbnigo RAtasrps eue xrsinlvl rmbulk from 10 levels expression are used that We (RBPs) pro transcripts. proteins single-cell mRNA binding the of RNA of plot 84 tSNE binding A of through (B) levels population. is splicing expression cell Shown each regulate (A) mean represent development. the to that kidney of transcriptomes during literature states occurs the that epithelial (MET) from and known transition mesenchymal Epithelial between to comparison Mesenchymal the a of regulators splicing are Rbfox1/2 4. Figure piigrgltr bo12adEr12 tcnb enta bo1adRfx r ihyepesdi h eecya populations, mesenchymal the in expressed highly are Rbfox2 and Rbfox1 that seen be can It Esrp1/2. and Rbfox1/2 regulators splicing AI RESEARCH BASIC CM-DIST_CD A C Motif Score (mean) UM-DIST_CD Motif Score (mean) Counts JASN 0.013 0.026 0.039 0.052 0.017 0.034 0.050 0.067 1000 1200 ifrnilepeso fRAbnigpoen n N idn oi nihetaayi ugs htEr12and Esrp1/2 that suggest analysis enrichment motif binding RNA and proteins binding RNA of expression Differential 200 400 600 800 0.0 0.0 Mean expression [log2(1+Counts)] 0

5’ Epithelial populations (PROX1,PROX2,LOH,DIST_CD)

-50 -50 10 12 14 16 18

UM 0 2 4 6 8 0 0 CM_ALL 015 10 5 0 Motif MAP:RBFOX1-[AT]GCATG[AC] Motif MAP:RBFOX1-[AT]GCATG[AC](PresumablyRBFOX2) A1cf PODO Esrp1 Upregulated(695) -log(pVal) upvs.bg Upregulated(375) 125 -log(pVal) upvs.bg 125 PROX_1 Esrp2 www.jasn.org

250 250 PROX_2

-250 Ybx2 -250 Pabpc5 Mesenchymal populations(UM,CM) Rbfox1 Mean expression[log2(1+Counts)] Mesenchymal lineages LOH Cpeb2 DIST_CD Rbm47 -125 -125

0 0 Downregulated(480) -log(pVal) dnvs.bg 50 Downregulated(603) -log(pVal) dnvs.bg 50

-50 -50 lgf2bp3 0 Counts Khdrbs2 5000 6000 1000 2000 3000 4000

0 Rbm24 RBFOX1/2 Enox1 0 UM>DIST_CD CM>DIST_CD 125 125 Rbm46 CM_ALLUM 250 250 Rbfox2 -250 -250 PODO Rbms3 Background(809) Background(525) PROX_1

-125 -125 PROX_2 Rbfox1 Rbfox2 3’ 0 0

50 50 LOH 0.0 0.765 1.531 2.296 3.061 0.0 0.907 1.814 2.721 3.628 DIST_CD

Negative log10(pValue) Negative Negative log10(pValue) Negative D B Motif Score (mean) Motif Score (mean) CM-LOH Counts CM-PODO 0.040 0.081 0.121 0.161 4000 3000 3500 1000 1500 2000 2500 0.032 0.064 0.096 0.128 500 0.0 0.0 0

5’ -50 -50

UM Esrp1 0 0 CM_ALL Motif MAP:ESRP1-TGGTGG

Motif MAP:Additional.motif.14-TGGTG(PresumablyESRP2) PODO Upregulated(574) 125 125 -log(pVal) upvs.bg PROX_1 Upregulated(553) -log(pVal) upvs.bg

250 250 PROX_2 CM

fi -250 -250 e,hglgtn el htepesteputative the express that cells highlighting les, Rbfox1 DIST_CDLOH Esrp1 Epithelial linea -125 -125

0 0 50 50 Downregulated(174) -log(pVal) dnvs.bg Downregulated(232) -log(pVal) dnvs.bg -50 -50 Counts 0 3000 3500 0 1000 1500 2500 g 500 200 es ESRP1/2 0 CM

CM_ALLUM Esrp2 250 250

-250 -250 PODO Background(960) PROX_1 Background(837) JASN -125 -125 PROX_2 Rbfox2 Esrp2

3’ 0 0 31: 50 50 LOH

0.0 0.886 1.772 2.658 3.544 DIST_CD 0.0 1.984 3.968 5.952 7.935

ccc

Negative log10(pValue) Negative – log10(pValue) Negative ccc nsilico in ,2020 www.jasn.org BASIC RESEARCH

Differential Expression of RNA Binding Proteins and full transcript length RNA sequencing for deeper analysis of RNA Binding Motif Enrichment Analysis Suggest that selected cell populations, in order to obtain a detailed under- Esrp1/2 and Rbfox1/2 Are Splicing Regulators of the standing of the molecular mechanisms involved in kidney de- MET that Occurs during Kidney Development velopment and disease. We next used rMAPS39 to identify putative RNA binding pro- Because the Smartseq2 protocol that we used is practically teins that act as splicing regulators for splice isoform switching limited to a few hundreds of cells, we were unable to detect between the mesenchymal and epithelial states during renal very small immune populations67 or to discern between all cell development. We first compared the mean expression levels of subtypes. For example, we were unable to discern subpopula- 84 known RNA binding proteins39–41 (Supplemental Table 10) tions within the very early epithelial structures (PROX_1) or between the mesenchymal (UM, CM) and epithelial popula- stromal subtypes within the UM.6 Nevertheless, the Smart- tions (PROX_1, PROX_2, LOH, DIST_CD) and found several seq2 protocol does have the advantage of being able to mea- putative splicing regulators that were differentially expressed sure expression levels more precisely. For example, we were (Figure 4A, Figure S21 in Supplemental Appendix 1). able to discern high, medium, and low expression levels of Of these differentially expressed RNA binding proteins, we genes such as Cdh11 (Figure S12 in Supplemental Appendix found Rbfox1 and Rbfox2 to be overexpressed in the mesen- 1) or Wt1 (Figure S20 in Supplemental Appendix 1). chymal cells, whereas Esrp1 and Esrp2 are overexpressed in the Each cell in our analysis was sequenced at roughly 1–2 epithelial cells (Figure 4B). Likewise, we found that Rbfox1/2 million reads per cell. Because splicing analysis typically re- and Esrp1/2 have RNA binding sites (motifs) that are enriched quires roughly 20–40 million reads, we merged the raw reads in the 59 or 39 neighboring introns of the cassette exons that from all cells belonging to each population in order to create are differentially expressed between mesenchymal and epithe- bulk in silico transcriptomes that represent each cell popu- lial states (Figure 4, C and D, Figure S22 in Supplemental lation and then performed splicing analysis on these. We Appendix 1).12,19,42,43 This indicates that RBFOX1/2 and note, however, that splicing analysis can also be done for ESRP1/2 are splicing regulators involved in MET during kid- individual cells,68 but because of the small number of reads ney development, similar to what was previously observed in the inferred inclusion levels for most genes will be less ac- EMT12: in the mesenchymal cell populations Rbfox1 and/or curate and have wide margins of error, except for the most Rbfox2 bind to mRNA in the downstream 39-flanking introns highly expressed genes. of the mesenchymal-associated cassette exons and promote their We note that the motif enrichment analysis for RNA bind- inclusion,12,40,66 whereas in the epithelial cell populations Esrp1 ing proteins often gave nonspecific results, such as candidate and/or Esrp2 bind to the downstream 39-flanking introns19,42,43 splicing regulators that were not even expressed in some pop- (and in some cases also to an additional site at the far 59 end of the ulations. We hypothesize that this stems from the fact that the upstream flanking intron42) of the epithelial-associated cassette binding motifs are not very specific because they are typically exons and promote their inclusion. only a few bases long. We therefore based our identification of the splicing regulators Esrp1/2 and Rbfox1/2 on the existence of two additional criteria apart from RNA binding motif en- DISCUSSION richment: first, the expression levels of Esrp1/2 and Rbfox1/2 differ significantly between the mesenchymal and epithelial In this study we used the Smartseq2 protocol25,26 for full tran- cell states; and second, there is much previous evidence for script length single-cell RNA sequencing to characterize the similar functionality in other developing organs and in vitro splice isoform switching events that occur during the MET in systems.12,14,17,42,43,48,51 The marked differences in expres- the course of kidney development. Such splicing information sion between mesenchymal and epithelial populations of is not obtainable using 39-end digital counting protocols such other RNA binding proteins such as Cpeb2, Rbm47,12 as Drop-seq,6–9 apart from splicing events located at the very Msi1, Rbms3, and others (Figure 4A, Figure S21 in Supple- 39 end of mRNA transcripts. These results highlight the im- mental Appendix 1) indicate that they might also be involved portance of combining 39-end digital counting technologies in renal MET splicing regulation. However, we did not ob- for transcriptional profiling of many thousands of cells, with serve a consistent enrichment of known binding motifs for

whereas Esrp1 and Esrp2 are highly expressed in the epithelial populations. The area of each circle is proportional to log2(11ex- pression) of each gene in that particular cell. (C and D) Cassette exons that are over-expressed in the mesenchymal populations contain asignificant enrichment of Rbfox1/2 binding motifs at their downstream introns.12,40,66 Likewise, cassette exons that are over- expressed in the epithelial populations contain a significant enrichment of Esrp1/2 binding motifs in their downstream in- trons,19,42,43 and in some cases (CM versus LOH), also in the far 59 end of their upstream introns (as previously observed in EMT42). This indicates that Rbfox1/2 and Esrp1/2 are splicing regulators12 involved in the MET that occurs during kidney development. The an- notations “CM_ALL” and “CM” are used interchangeably to represent all cells that were classified as belonging to the CM (see also Methods and Supplemental Appendix 1).

JASN 31: ccc–ccc, 2020 mRNA Splice Isoforms Switch during Kidney Development 11 BASIC RESEARCH www.jasn.org these genes as we did for Esrp1/2 and Rbfox1/2. Likewise, DISCLOSURES there may be additional splicing regulators whose expression does not change and whose RNA binding activity is modulated T. Kalisky has a patent application #US-20100255471, “Single cell gene by protein modification. One example is QK (also known as expression for diagnosis, prognosis and identification of drug targets,” with QKI),12 but we did not find significant motif enrichment for Michael F. Clarke, Stephen R. Quake, Piero D. Dalerba, Huiping Liu, Anne A. Leyrat, and Maximilian Diehn with royalties paid to Stanford University; and a this gene in the MET-associated differentially expressed patent application #US-20130225435, “Methods and systems for analysis of cassette exons. single cells,” with Michael F. Clarke, Stephen R. Quake, Piero D. Dalerba, In a recent study it was shown that ablation of Esrp1 in Huiping Liu, Anne A. Leyrat, Maximilian Diehn, Michael Rothenberg, Jianbin mice, alone or together with Esrp2, resulted in reduced kid- Wang, and Neethan Lobo with royalties paid to Stanford University. All re- ney size, fewer ureteric tips, reduced nephron numbers, and maining authors have nothing to disclose. a global reduction of epithelial splice isoforms in the tran- scriptome of ureteric epithelial cells.69 We believe that our results provide a detailed picture at the single-cell level that FUNDING complements the above study. Moreover, the fact that kid- neys still develop under the ablation of Esrp1/2, taken Y. Wineberg, T.H. Bar-Lev, N. Ben-Haim, S. Oriel, E. Bucris, Y. Yehuda, with our results, suggests that there are multiple splicing andT.KaliskyweresupportedbytheIsrael Science Foundation (Israeli Center of Research Excellence [I-CORE] no. 1902/12 and grants 1634/13 regulators acting combinatorically thus creating a bypass and 2017/13), the Israel Cancer Association (grant 20150911), the Israel mechanisms so that one splicing regulator compensates, al- Ministry of Health (grant 3-10146), and the European Union’sSeventh though partially, for lack of another. Because it is infeasible Framework Programme (Marie Curie International Reintegration grant to create transgenic mice containing knockouts of the many 618592). N. Pode-Shakked and B. Dekel were supported by the Israel possible combinations of multiple regulators with current Science Foundation (grants 2071/17 and 910/11). The funders had no role in study design, data collection and analysis, decision to publish, or prepa- technology, we suggest using kidney organoids as a model ration of the manuscript. system along with single-cell analysis for future functional studies. SUPPLEMENTAL MATERIAL

This article contains the following supplemental material online at ACKNOWLEDGMENTS http://jasn.asnjournals.org/lookup/suppl/doi:10.1681/ASN.2019080770/-/ DCSupplemental. We thank Nelly Komorovsky for assistance in fetal kidney collection. Supplemental Appendix 1. Supplemental text and figures. Likewise, we thank Jordan Kreidberg, Steve Potter, Oded Volovelsky, Supplemental Table 1. Single-cell gene expression values (raw and Morris Nehama, Tal Shay, Rotem Karni, and all members of our normalized counts). laboratories for useful comments and suggestions. Dr. Kalisky reports Supplemental Table 2. A list of genes from the literature that were grants from the Israel Cancer Research Fund (ICRF), grants from The previously shown to be involved in kidney development and an ad- Israel Ministry of Science, and other funds from TaGeza Bio- ditional list of 48 genes from a previous single-cell quantitative PCR pharmaceuticals (for consulting services provided), outside the study that we previously conducted on human fetal kidney cells.23 We submitted work. note that once a sufficient number of genes are included, the ability to Dr. Benjamin Dekel, Dr. Achia Urbach, and Dr. Tomer Kalisky identify the different populations (Figures 1B and 2A) is not very were responsible for study initiation and conception. Dr. Tali Hana sensitive to the exact choice of genes. Bar-Lev, Anna Futorian, Nissim Ben-Haim, Dr. Leah Armon, Dr. Supplemental Table 3. Lists of cells in each population. Efrat Bucris, Dr. Achia Urbach, and Dr. Tomer Kalisky were re- Supplemental Table 4. rMATS tables for cassette/skipped exons sponsible for fetal kidney collection and dissociation. Dr. Debby (SEs) that are alternatively spliced between the bulk in silico tran- Ickowicz was responsible for FACS sorting. Dr. Tali Hana Bar-Lev scriptomes representing the different populations. handled C1 experiments (data not used). Dr. Sarit Oriel handled Supplemental Table 5. rMATS tables for mutually exclusive exons Biomark experiments (data not used). Anna Futorian, Dr. Leah Ar- (MXEs) that are alternatively spliced between the bulk in silico mon, and Dr. Achia Urbach handled bulk RNA extraction. Dr. Tali transcriptomes representing the different populations. Hana Bar-Lev handled bulk RNA quality checks. Dr. Shlomit Gilad Supplemental Table 6. rMATS tables for alternative 39 splice sites and Dr. Sima Benjamin were responsible for Smartseq2 and (A3SS) that are alternatively spliced between the bulk in silico tran- RNA sequencing. Nissim Ben-Haim was responsible for bulk and scriptomes representing the different populations. single-cell RNA sequence preprocessing. Yishay Wineberg, Nissim Supplemental Table 7. rMATS tables for alternative 59 splice sites Ben-Haim, and Dr. Tomer Kalisky were responsible for single-cell (A5SS) that are alternatively spliced between the bulk in silico tran- gene expression and splicing analysis. Dr. Yishai Yehuda, Dr. Naomi scriptomes representing the different populations. Pode-Shakked, and Dr. Peter Hohenstein made other intellectual Supplemental Table 8. rMATS tables for retained introns (RIs) that contributions. Yishay Wineberg and Dr. Tomer Kalisky were re- are alternatively spliced between the bulk in silico transcriptomes sponsible for manuscript writing. representing the different populations.

12 JASN JASN 31: ccc–ccc,2020 www.jasn.org BASIC RESEARCH

Supplemental Table 9. Gene ontology (GO) enrichment analysis 17. Warzecha CC, Shen S, Xing Y, Carstens RP: The epithelial splicing results from ToppGene for genes containing alternatively spliced exons. factors ESRP1 and ESRP2 positively and negatively regulate diverse – Supplemental Table 10. A list of RNA binding proteins and types of alternative splicing events. RNA Biol 6: 546 562, 2009 18. Shapiro IM, Cheng AW, Flytzanis NC, Balsamo M, Condeelis JS, Oktay binding sites used for identifying putative splicing regulators. MH, et al.: An EMT-driven alternative splicing program occurs in human Supplemental Table 11. DEXSeq counts. This table contains the breast cancer and modulates cellular phenotype. PLoS Genet 7: number of reads that align to each exon within each bulk in silico e1002218, 2011 transcriptome representing each cell population. 19. Bebee TW, Park JW, Sheridan KI, Warzecha CC, Cieply BW, Rohacek AM, Program. A compressed directory containing a short MATLAB et al.: The splicing regulators Esrp1 and Esrp2 direct an epithelial splicing program essential for mammalian development. eLife 4: 1–27, 2015 program and data sets for single-cell data visualization. 20. Bangru S, Arif W, Seimetz J, Bhate A, Chen J, Rashan EH, et al.: Alternative splicing rewires Hippo signaling pathway in hepato- cytes to promote liver regeneration. NatStructMolBiol25: 928–939, 2018 REFERENCES 21. Brunskill EW, Park J-S, Chung E, Chen F, Magella B, Potter SS: Single cell dissection of early kidney development: Multilineage priming. 1. Hohenstein P, Pritchard-Jones K, Charlton J: The yin and yang of kidney Development 141: 3093–3101, 2014 development and Wilms’ tumors. Genes Dev 29: 467–482, 2015 22. Pode-Shakked N, Pleniceanu O, Gershon R, Shukrun R, Kanter I, Bucris 2. Gilbert SF: Intermediate mesoderm. Developmental Biology, 6th Ed, E, et al.: Dissecting stages of human kidney development and tumor- Sunderland, MA, Sinauer Associates, 2000 igenesis with surface markers affords simple prospective purification of 3. Little MH, McMahon AP: Mammalian kidney development: Principles, nephron stem cells. Sci Rep 6: 23562, 2016 progress, and projections. Cold Spring Harb Perspect Biol 4: a008300, 23. Pode-Shakked N, Gershon R, Tam G, Omer D, Gnatek Y, Kanter I, et al.: 2012 Evidence of in vitro preservation of human nephrogenesis at the single- 4. Little M, Georgas K, Pennisi D, Wilkinson L: Kidney development: Two cell level. Stem Cell Reports 9: 279–291, 2017 tales of tubulogenesis. In: Current Topics in Developmental Biology, 24. Brunskill EW, Aronow BJ, Georgas K, Rumballe B, Valerius MT, Aronow edited by Chevalier RL, Thornhill BA, Amsterdam, The Netherlands, J, et al.: Atlas of gene expression in the developing kidney at micro- Elsevier, 2010, pp 193–229 anatomic resolution [published correction appears in Dev Cell 16: 482, 5. Schell C, Wanner N, Huber TB: Glomerular development--shaping the 2009]. Dev Cell 15: 781–791, 2008 multi-cellular filtration unit. Semin Cell Dev Biol 36: 39–49, 2014 25. Picelli S, Faridani OR, Björklund AK, Winberg G, Sagasser S, Sandberg 6. Adam M, Potter AS, Potter SS: Psychrophilic proteases dramatically R: Full-length RNA-seq from single cells using Smart-seq2. Nat Protoc reduce single-cell RNA-seq artifacts: A molecular atlas of kidney de- 9: 171–181, 2014 velopment. Development 144: 3625–3632, 2017 26. Picelli S, Björklund ÅK, Faridani OR, Sagasser S, Winberg G, Sandberg 7. Magella B, Adam M, Potter AS, Venkatasubramanian M, Chetal K, Hay R: Smart-seq2 for sensitive full-length transcriptome profiling in single SB, et al.: Cross-platform single cell analysis of kidney development cells. Nat Methods 10: 1096–1098, 2013 shows stromal cells express Gdnf. Dev Biol 434: 36–47, 2018 27. Kobayashi A, Valerius MT, Mugford JW, Carroll TJ, Self M, Oliver G, 8. Lindström NO, Guo J, Kim AD, Tran T, Guo Q, De Sena Brandine G, et al.: Six2 defines and regulates a multipotent self-renewing nephron et al.: Conserved and divergent features of mesenchymal progenitor progenitor population throughout mammalian kidney development. cell types within the cortical nephrogenic niche of the human and Cell Stem Cell 3: 169–181, 2008 mouse kidney. JAmSocNephrol29: 806–824, 2018 28. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL: To- 9. Macosko EZ, Basu A, Satija R, Nemesh J, Shekhar K, Goldman M, et al.: pHat2: Accurate alignment of transcriptomes in the presence of in- Highly parallel genome-wide expression profiling of individual cells sertions, deletions and gene fusions. Genome Biol 14: R36, 2013 using nanoliter droplets. Cell 161: 1202–1214, 2015 29. Anders S, Pyl PT, Huber W: HTSeq--a Python framework to work with 10. Zhang Y, Weinberg RA: Epithelial-to-mesenchymal transition in cancer: high-throughput sequencing data. Bioinformatics 31: 166–169, 2015 Complexity and opportunities. Front Med 12: 361–373, 2018 30. Love MI, Huber W, Anders S: Moderated estimation of fold change and 11. Lovisa S, LeBleu VS, Tampe B, Sugimoto H, Vadnagara K, Carstens JL, dispersion for RNA-seq data with DESeq2. Genome Biol 15: 550, 2014 et al.: Epithelial-to-mesenchymal transition induces cell cycle arrest 31. Van Der Maaten LJP, Hinton GE: Visualizing high-dimensional data and parenchymal damage in renal fibrosis. Nat Med 21: 998–1009, using t-SNE. J Mach Learn Res 9: 2579–2605, 2008 2015 32. Anders S, McCarthy DJ, Chen Y, Okoniewski M, Smyth GK, Huber W, 12. Yang Y, Park JW, Bebee TW, Warzecha CC, Guo Y, Shang X, et al.: et al.: Count-based differential expression analysis of RNA sequencing Determination of a comprehensive alternative splicing regulatory net- data using R and Bioconductor. Nat Protoc 8: 1765–1786, 2013 work and combinatorial regulation by key factors during the epithelial- 33. Anders S, Huber W: Differential expression analysis for sequence count to-mesenchymal transition. Mol Cell Biol 36: 1704–1719, 2016 data. Genome Biol 11: R106, 2010 13. Di Modugno F, Iapicca P, Boudreau A, Mottolese M, Terrenato I, 34. ShenS,ParkJW,LuZX,LinL,HenryMD,WuYN,etal.:rMATS:Robust Perracchio L, et al.: Splicing program of human MENA produces a and flexible detection of differential alternative splicing from replicate previously undescribed isoform associated with invasive, mesenchymal- RNA-Seq data. Proc Natl Acad Sci U S A 111: E5593–E5601, 2014 like breast tumors. Proc Natl Acad Sci U S A 109: 19280–19285, 35. Thorvaldsdóttir H, Robinson JT, Mesirov JP: Integrative genomics 2012 viewer (IGV): High-performance genomics data visualization and ex- 14. Warzecha CC, Sato TK, Nabet B, Hogenesch JB, Carstens RP: ESRP1 ploration. Brief Bioinform 14: 178–192, 2013 and ESRP2 are epithelial cell-type-specific regulators of FGFR2 splic- 36. Katz Y, Wang ET, Silterra J, Schwartz S, Wong B, Thorvaldsdóttir H, ing. Mol Cell 33: 591–601, 2009 et al.: Quantitative visualization of alternative exon expression from 15. Brown RL, Reinke LM, Damerow MS, Perez D, Chodosh LA, Yang J, RNA-seq data. Bioinformatics 31: 2400–2402, 2015 et al.: CD44 splice isoform switching in human and mouse epithelium is 37. Anders S, Reyes A, Huber W: Detecting differential usage of exons from essential for epithelial-mesenchymal transition and breast cancer pro- RNA-seq data. Genome Res 22: 2008–2017, 2012 gression. JClinInvest121: 1064–1074, 2011 38. Chen J, Bardes EE, Aronow BJ, Jegga AG: ToppGene suite for gene list 16. Sneath RJ, Mangham DC: The normal structure and function of CD44 enrichment analysis and candidate gene prioritization. Nucleic Acids and its role in neoplasia. Mol Pathol 51: 191–200, 1998 Res 37: W305–W311, 2009

JASN 31: ccc–ccc, 2020 mRNA Splice Isoforms Switch during Kidney Development 13 BASIC RESEARCH www.jasn.org

39. Park JW, Jung S, Rouchka EC, Tseng YT, Xing Y: rMAPS: RNA map 54. Denker BM, Sabath E: The biology of epithelial cell tight junctions in the analysis and plotting server for alternative exon regulation. Nucleic kidney. JAmSocNephrol22: 622–625, 2011 Acids Res 44: W333–W338, 2016 55. Van Itallie CM, Rogan S, Yu A, Vidal LS, Holmes J, Anderson JM: Two 40. Ray D, Kazan H, Cook KB, Weirauch MT, Najafabadi HS, Li X, et al.: A splice variants of claudin-10 in the kidney create paracellular pores with compendium of RNA-binding motifs for decoding gene regulation. different ion selectivities. Am J Physiol Renal Physiol 291: Nature 499: 172–177, 2013 F1288–F1299, 2006 41. Anderson ES, Lin CH, Xiao X, Stoilov P, Burge CB, Black DL: The car- 56. Hohenstein P, Hastie ND: The many facets of the Wilms’ tumour gene, diotonic steroid digitoxin regulates alternative splicing through de- WT1. Hum Mol Genet 15: R196–R201, 2006 pletion of the splicing factors SRSF3 and TRA2B. RNA 18: 1041–1049 57. Ozdemir DD, Hohenstein P: Wt1 in the kidney--a tale in mouse models. 42. Dittmar KA, Jiang P, Park JW, Amirikian K, Wan J, Shen S, et al.: Pediatr Nephrol 29: 687–693, 2014 Genome-wide determination of a broad ESRP-regulated post- 58. Hastie ND: The genetics of Wilms’ tumor--a case of disrupted devel- transcriptional network by high-throughput sequencing. Mol Cell Biol opment. Annu Rev Genet 28: 523–558, 1994 32: 1468–1482, 2012 59. Lefebvre J, Clarkson M, Massa F, Bradford ST, Charlet A, Buske F, et al.: 43. Bhate A, Parker DJ, Bebee TW, Ahn J, Arif W, Rashan EH, et al.: ESRP2 Alternatively spliced isoforms of WT1 control podocyte-specificgene controls an adult splicing programme in hepatocytes to support post- expression. Kidney Int 88: 321–331, 2015 natal liver maturation. Nat Commun 6: 8768, 2015 60. Haber DA, Sohn RL, Buckler AJ, Pelletier J, Call KM, Housman DE: 44. McMahon AP, Aronow BJ, Davidson DR, Davies JA, Gaido KW, Alternative splicing and genomic structure of the Wilms tumor gene Grimmond S, et al.; GUDMAP project: GUDMAP: The genitourinary WT1. Proc Natl Acad Sci U S A 88: 9618–9622, 1991 developmental molecular anatomy project. J Am Soc Nephrol 19: 61. Laity JH, Dyson HJ, Wright PE: Molecular basis for modulation of bi- 667–671, 2008 ological function by alternate splicing of the Wilms’ tumor suppressor 45. Harding SD, Armit C, Armstrong J, Brennan J, Cheng Y, Haggarty B, protein. Proc Natl Acad Sci U S A 97: 11932–11935, 2000 et al.: The GUDMAP database--an online resource for genitourinary 62. Larsson SH, Charlieu JP, Miyagawa K, Engelkamp D, Rassoulzadegan research. Development 138: 2845–2853, 2011 M, Ross A, et al.: Subnuclear localization of WT1 in splicing or tran- 46. La Manno G, Soldatov R, Zeisel A, Braun E, Hochgerner H, Petukhov V, scription factor domains is regulated by alternative splicing. Cell 81: et al.: RNA velocity of single cells. Nature 560: 494–498, 2018 391–401, 1995 47. Cho EA, Patterson LT, Brookhiser WT, Mah S, Kintner C, Dressler GR: 63. Barbaux S, Niaudet P, Gubler MC, Grünfeld JP, Jaubert F, Kuttenn F, Differential expression and function of cadherin-6 during renal epi- et al.: Donor splice-site mutations in WT1 are responsible for Frasier thelium development. Development 125: 803–812, 1998 syndrome. Nat Genet 17: 467–470, 1997 48. Warzecha CC, Jiang P, Amirikian K, Dittmar KA, Lu H, Shen S, et al.: An 64. Hammes A, Guo JK, Lutsch G, Leheste JR, Landrock D, Ziegler U, et al.: ESRP-regulated splicing programme is abrogated during the epithelial- Two splice variants of the Wilms’ tumor 1 gene have distinct functions mesenchymal transition. EMBO J 29: 3286–3300, 2010 during sex determination and nephron formation. Cell 106: 319–329, 49.VenablesJP,BrosseauJ-P,GadeaG,KlinckR,PrinosP,BeaulieuJ-F, 2001 et al.: RBFOX2 is an important regulator of mesenchymal tissue- 65. Menke AL, Schedl A: WT1 and glomerular function. Semin Cell Dev Biol specific splicing in both normal and cancer tissues. Mol Cell Biol 33: 14: 233–240, 2003 396–405, 2013 66. Jin Y, Suzuki H, Maegawa S, Endo H, Sugano S, Hashimoto K, et al.: A 50. Braeutigam C, Rago L, Rolke A, Waldmeier L, Christofori G, Winter J: vertebrate RNA-binding protein Fox-1 regulates tissue-specific splicing The RNA-binding protein Rbfox2: An essential regulator of EMT-driven via the pentanucleotide GCAUG. EMBO J 22: 905– 912, 2003 alternative splicing and a mediator of cellular invasion. Oncogene 33: 67. Park J, Shrestha R, Qiu C, Kondo A, Huang S, Werth M, et al.: Single-cell 1082–1092, 2014 transcriptomics of the mouse kidney reveals potential cellular targets of 51. Warzecha CC, Carstens RP: Complex changes in alternative pre-mRNA kidney disease. Science 360: 758–763, 2018 splicing play a central role in the epithelial-to-mesenchymal transition 68. Shalek AK, Satija R, Adiconis X, Gertner RS, Gaublomme JT, (EMT). Semin Cancer Biol 22: 417–427, 2012 Raychowdhury R, et al.: Single-cell transcriptomics reveals bimodality in 52. Yeowell HN, Walker LC: Tissue specificity of a new splice form of the expression and splicing in immune cells. Nature 498: 236–240, 2013 human lysyl hydroxylase 2 gene. Matrix Biol 18: 179–187, 1999 69. Bebee TW, Sims-Lucas S, Park JW, Bushnell D, Cieply B, Xing Y, et al.: 53. Hovhannisyan RH, Warzecha CC, Carstens RP: Characterization of se- Ablation of the epithelial-specific splicing factor Esrp1 results in ureteric quences and mechanisms through which ISE/ISS-3 regulates FGFR2 branching defects and reduced nephron number. Dev Dyn 245: splicing. Nucleic Acids Res 34: 373–385, 2006 991–1000, 2016

AFFILIATIONS

1Department of Bioengineering and Bar-Ilan Institute of Nanotechnology and Advanced Materials, Bar-Ilan University, Ramat Gan, Israel 2The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel 3Pediatric Stem Cell Research Institute, Edmond and Lily Safra Children’s Hospital, Sheba Medical Center, Tel-Hashomer, Israel 4Division of Pediatric Nephrology, Sheba Medical Center, Tel-Hashomer, Israel 5Sackler Faculty of Medicine, Tel-Aviv University, Tel-Aviv, Israel 6The Nancy and Stephen Grand Israel National Center for Personalized Medicine, Weizmann Institute of Science, Rehovot, Israel 7Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands

14 JASN JASN 31: ccc–ccc,2020