<<

Transcriptomes of major renal collecting duct cell types PNAS PLUS in mouse identified by single-cell RNA-seq

Lihe Chena, Jae Wook Leeb, Chung-Lin Choua, Anil V. Nairc, Maria A. Battistonec, Teodor G. Paunescu˘ c, Maria Merkulovac, Sylvie Bretonc, Jill W. Verlanderd, Susan M. Walle,f, Dennis Brownc, Maurice B. Burga,1, and Mark A. Kneppera,1

aSystems Center, National Heart, , and Institute, National Institutes of Health, Bethesda, MD 20892; bNephrology Clinic, National Center, Goyang, 10408, South Korea; cCenter for Systems Biology, Program in Membrane Biology and Division of , Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114; dDivision of Nephrology, and Renal Transplantation, University of Florida College of Medicine, Gainesville, FL 32610; eDepartment of Medicine, Emory University School of Medicine, Atlanta, GA 30322; and fDepartment of Physiology, Emory University School of Medicine, Atlanta, GA 30322

Contributed by Maurice B. Burg, October 3, 2017 (sent for review June 22, 2017; reviewed by Andrew P. McMahon and George J. Schwartz) Prior RNA sequencing (RNA-seq) studies have identified complete count for a small fraction of the . Therefore, transcriptomes for most renal epithelial cell types. The exceptions methods were required for selective enrichment of the three cell are the cell types that make up the renal collecting duct, namely types from mouse kidney-cell suspensions. Here, we have identi- intercalated cells (ICs) and principal cells (PCs), which account for fied cell-surface markers for A-ICs, B-ICs, and PCs, allowing these only a small fraction of the kidney mass, but play critical physio- cell types to be enriched from kidney-cell suspensions by using logical roles in the regulation of , FACS. We used the resulting enrichment protocols upstream from volume, and extracellular fluid composition. To enrich these cell microfluidic-based scRNA-seq to successfully identify transcriptomes types, we used FACS that employed well-established lectin cell of all three cell types. These three transcriptomes have been surface markers for PCs and type B ICs, as well as a newly identified permanently posted online to provide a community resource. Our cell surface marker for type A ICs, c-Kit. Single-cell RNA-seq using the bioinformatic analysis of the data addresses the possible roles of IC- and PC-enriched populations as input enabled identification of A-IC–, B-IC–, and PC-selective in regulation of renal complete transcriptomes of A-ICs, B-ICs, and PCs. The data were transport, total body , and renal pathophysiology. PHYSIOLOGY used to create a freely accessible online -expression database for collecting duct cells. This database allowed identification of Results genes that are selectively expressed in each cell type, including cell- Single- RNA-Seq in Microdissected Mouse Cortical Collecting surface receptors, transcription factors, transporters, and secreted Ducts. To provide reference data for interpretation of scRNA- . The analysis also identified a small fraction of hybrid cells seq experiments in mouse, we have carried out single-tubule expressing -2 and anion exchanger 1 or tran- RNA-seq in cortical collecting ducts (CCDs) rapidly micro- scripts. In many cases, mRNAs for receptors and their ligands were dissected from mouse kidneys without protease treatment. Data identified in different cells (e.g., Notch2 chiefly in PCs vs. Jag1 chiefly were highly concordant among 11 replicates from seven different in ICs), suggesting signaling cross-talk among the three cell types. The identified patterns of among the three types of Significance collecting duct cells provide a foundation for understanding physi- ological regulation and pathophysiology in the renal collecting duct. A long-term goal in mammalian biology is to identify the genes expressed in every cell type of the body. In the kidney, the systems biology | intercalated cell | principal cell | kidney expressed genes (i.e., transcriptome) of all epithelial cell types have already been identified with the exception of the cells that hole-body homeostasis is maintained in large part by make up the renal collecting duct, which is responsible for reg- Wtransport processes in the kidney. The transport occurs ulation of blood pressure and body fluid composition. Here, along the renal tubule, which is made up of multiple segments single-cell RNA-sequencing was used in mouse to identify tran- consisting of epithelial cells, each with unique sets of transporter scriptomes for the major collecting duct cell types: type A in- proteins. There are at least 14 renal tubule segments containing tercalated cells, type B intercalated cells, and principal cells. The at least 16 epithelial cell types (1, 2). A systems-level under- information was used to create a publicly accessible online re- standing of renal function depends on knowledge of which source. The data allowed identification of genes that are selec- -coding genes are expressed in each of these cell types. tively expressed in each cell type, which is informative for cell- Most renal tubule segments contain only one cell type, and the level understanding of physiology and pathophysiology. genes expressed in these cells have been elucidated through the application of RNA sequencing (RNA-seq) or serial analysis of Author contributions: L.C., J.W.L., S.M.W., D.B., M.B.B., and M.A.K. designed research; L.C., gene expression applied to microdissected from rodent J.W.L., C.-L.C., A.V.N., M.A.B., T.G.P., M.M., and J.W.V. performed research; S.M.W., D.B., and kidneys (2, 3), which identify and quantify all mRNA species M.A.K. contributed new reagents/analytic tools; L.C., J.W.L., C.-L.C., A.V.N., M.A.B., T.G.P., (i.e., transcriptomes) expressed in them. The exception is the M.M., S.B., J.W.V., S.M.W., D.B., M.B.B., and M.A.K. analyzed data; and L.C., J.W.L., C.-L.C., A.V.N., M.A.B., T.G.P., M.M., S.B., J.W.V., S.M.W., D.B., M.B.B., and M.A.K. wrote the paper. renal collecting ducts, which are made up of at least three cell Reviewers: A.P.M., University of Southern California; and G.J.S., University of Rochester types, known as type A intercalated cells (A-ICs), type B in- Medical Center. tercalated cells (B-ICs), and principal cells (PCs). Single-tubule The authors declare no conflict of interest. RNA-seq applied to collecting duct segments provides an ag- gregate transcriptome for these three cell types. Hence, to Published under the PNAS license. identify separate transcriptomes for A-ICs, B-ICs, and PCs, it is Data deposition: The sequences and metadata reported in this paper have been depos- ited in the Gene Expression Omnibus (GEO) database, https://www.ncbi.nlm.nih.gov/geo necessary to carry out RNA-seq at a single-cell level. Recent (accession no. GSE99701). advances in single-cell RNA-seq (scRNA-seq) have facilitated 1To whom correspondence may be addressed. Email: [email protected] or knepperm@ our understanding of heterogeneous tissues like (4), lung nhlbi.nih.gov. (5), pancreas (6), and retina (7). However, a barrier to success This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. with such an approach exists because collecting duct cells ac- 1073/pnas.1710964114/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1710964114 PNAS Early Edition | 1of10 Downloaded by guest on September 29, 2021 untreated mice (Dataset S1). The single-tubule RNA-seq data for A + - CD+ + mouse CCDs are provided as a publicly accessible web page GFP :GFP c-Kit :DBA DBA+:c-Kit+ (https://hpcwebapps.cit.nih.gov/ESBL/Database/mTubule_RNA-Seq/). Asb15 Slc4a1 Ptgds Among the most abundant transcripts in mouse CCDs are those Hepacam2 Oxgr1 Fxyd4 typical of PCs (e.g., Aqp2, Aqp3, Scnn1b, Scnn1g, Kcnj1,and Rhbg Cpsf4l Aqp2 Atp6v0d2 Kit Npnt Avpr2) and ICs (e.g., Car2, Slc4a1, Slc26a4, Rhcg,andAtp6v1b1). Aqp6 Aqp6 Apela Wscd2 Dmrt2 Avpr2 Identification of Cell-Surface Markers for ICs. To identify potential Slc4a9 Tmem61 Mcoln3 cell-surface marker proteins that can be used for FACS enrich- Slc26a4 Avpr1a Tmem45b ment of ICs, we used transgenic mice that express GFP driven by + Avpr1a Atp6v1g3 C1qa the promoter for the B1 subunit of the H -ATPase (Atp6v1b1) (8). Atp6v1g3 Hepacam2 Hsd11b2 Atp6v1b1 is known to be expressed in A- and B-ICs and is abun- Foxi1 Ociad2 C1qc dant in connecting tubule (CNT), CCD, and outer medullary Atp6v1c2 Adgrf5 Ptgs1 collecting duct (2), the segments that contain ICs. We used en- Spink8 Atp6v0d2 C1qb zymatic tissue dissociation and FACS to enrich GFP-expressing Nefl Guca2a Kcne1 + Atp6v1c2 Csf1r (GFP ) cells and carried out RNA-seq to quantify mRNA abun- Kit + − Dagla Mme Apoe dance levels for all expressed genes in GFP -cells vs. GFP -cells. + − Insrr Slc4a9 Rnf186 Fig. 1A shows the 24 transcripts with GFP :GFP mRNA ex- Slc4a1 Bsnd Rasl11a pression ratios greater than 50 based on two pairs of samples Serpinb9 Aldh1l1 Scin isolated on different days (full listing of ratios is provided in Slc8a1 Rcan2 Tmem229a Dataset S2). Consistent with the idea that these are IC-selective Pkib Foxi1 Trf genes, 12 of 24 of the transcripts in Fig. 1A are already widely Bglap3 Serpinb9 Tacstd2 Slc26a7 Slc2a4 Kcnj1 known to be expressed in ICs (shown in boldface). Notably, there Gcgr Scnn1g are two transcripts that code for potential cell surface marker Ccbe1 proteins, specifically Hepacam2 and Kit (also known as c-Kit). 0 800 060015 Both are integral membrane proteins with long extracellular N- terminal regions (i.e., type I membrane proteins). Anti–c-Kit an- B tibodies are used extensively for cell-surface labeling of hemato- poietic cells, and excellent reagents are already available for flow sorting. Immunocytochemical labeling with an antibody to c-Kit + (Fig. 1B, red) and H -ATPase A subunit (Fig. 1B, green) confirms the basolateral expression of c-Kit in A-ICs, and weak intracellular labeling for c-Kit was also detectable in B-ICs (Fig. 1B).

+ + RNA-Seq Profiling of Populations of c-Kit and DBA -Enriched Renal Cells. We adapted existing protocols for hematopoietic cells to + sort c-Kit ICs from mouse kidney (Methods and Fig. S1 A and B). We chose fluorophore-labeled Dolichos biflorus agglutinin (DBA), a lectin, to decorate PCs (9, 10). Initially, we carried out Fig. 1. Identification of cell surface markers for ICs. (A) The 24 transcripts with + + − RNA-seq analysis of pooled flow-sorted c-Kit cells, comparing the highest mRNA abundance (i.e., RPKM) ratios between GFP (ICs) and GFP + them with DBA cells. Fig. 1C shows the 24 transcripts with the cells in pooled samples from Atp6v1b1-GFP transgenic mice. Bold type indicates + a gene generally recognized to be expressed in ICs. The x-axis indicates the highest mRNA abundance ratios for pooled c-Kit cells vs. + abundance ratios (full listing of ratios provided in Dataset S2). (B) Collecting duct pooled DBA cells (full listing of ratios provided in Dataset S3). from mouse kidney immunolabeled with anti–H+-ATPase (A-subunit) antibody These transcripts include many that are known from previous (green) and anti–c-Kit antibody (red) shows strong basolateral c-Kit localization + data to be selectively expressed in ICs (indicated in boldface in A-ICs (cells labeled “A”) that have apical H -ATPase labeling. (Inset) B-IC (cells + in Fig. 1C), including Slc4a1 (anion-exchanger 1), Aqp6 (an labeled “B”) with apical and basolateral H -ATPase labeling but only weak IC aquaporin), Avpr1a (the V1a ), Foxi1 basolateral c-Kit localization. In contrast, the adjacent A-IC has strong basolateral [a (TF)], Slc4a9 (a - c-Kit expression. Some weaker intracellular labeling for c-Kit is also detectable in + ), and several H -ATPase subunits, as well as c-Kit B-ICs. Images were captured by using a Zeiss LSM800 confocal microscope with Airyscan. (Scale bar: 5 μm).L,.(C) Transcripts enriched in c-Kit+ cell pop- (11–13). II (Car2), the chief intracellular + + ulation relative to DBA cell population. Transcript abundance (i.e., TPM) ratio + + carbonic anhydrase in ICs (12), is also abundant in the c-Kit (c-Kit :DBA , presumed IC:PC) greater than 10 and TPM greater than 200 in fraction, even though it was not among those with the top either population. Bold type indicates a gene generally accepted to be expressed 24 abundance ratios. Fig. 2A, Left, shows examples of mapped in ICs. The x-axis indicates the abundance ratios (full listing of ratios provided in reads for recognized IC transcripts, namely Slc4a1, Slc26a4, and Dataset S3). (D) Transcripts enriched in DBA+ cell population relative to c-Kit+ cell + Avpr1a. Overall, the analysis of the c-Kit cell population is population. Transcript abundance ratio (TPM, DBA+:c-Kit+, presumed PC:IC) consistent with the conclusion that these cells are highly enriched greater than 4.77 and TPM greater than 200 in either population. Bold type + in ICs, but could also include c-Kit blood-borne cells present in indicates a gene widely accepted to be expressed in PCs. The x-axis indicates the the kidney. Also found on this list are two receptor proteins not abundance ratios (full listing of ratios provided in Dataset S3). generally known to be associated with ICs, specifically Adgrf5 [adhesion -coupled receptor (GPCR) F5] and Gcgr have not been widely associated with PCs, including neph- (the ). α β Fig. 1D shows the 24 transcripts with the highest mRNA ronectin (a functional ligand of integrin -8/ -1 involved in + + abundance ratios for pooled DBA cells vs. pooled c-Kit cells ) and D2 synthase. Fig. 2A, Right, shows examples of mapped reads for recognized PC pro- (full listing of ratios provided in Dataset S3). Many of these + transcripts encode known PC proteins (boldface in Fig. 1D), teins, namely Aqp2, Avpr2, and Aqp3. Thus, the analysis of DBA including Aqp2, Kcne1, Kcnj1 (ROMK), Scnn1g (ENaC-γ), Avpr2 cells is consistent with identification of them predominately as (vasopressin V2 receptor), and Hsd11b2 (hydroxysteroid 11-β PCs, and supports the use of DBA for enrichment of PCs from dehydrogenase 2). Additional transcripts were identified that heterogeneous mixtures.

2of10 | www.pnas.org/cgi/doi/10.1073/pnas.1710964114 Chen et al. Downloaded by guest on September 29, 2021 A IC representa ve transcripts PC representa ve transcripts Group 1 cells appear to express genes associated with PCs. Group PNAS PLUS 2 cells express genes associated with ICs. Most of these express the c-Kit+ c-Kit+ Aqp2 A-IC marker anion-exchanger 1 (Slc4a1), whereas only a few ex- Slc4a1 + + DBA DBA press the B-IC marker pendrin (Slc26a4). Notably, one other IC marker, receptor-related receptor (Insrr), also clustered c-Kit+ c-Kit+ with pendrin. The Insrr protein has been identified as a pH-sensor Slc26a4 Avpr2 protein that triggers cellular responses to alkalosis (14). Group DBA+ DBA+ 3 cells appear to express markers for ICs and PCs and could be hybrid cells. Group 4 cells generally do not contain markers for + + c-Kit c-Kit either cell type, consistent with the demonstrated lack of complete Avpr1a Aqp3 + + DBA+ DBA+ purity in the original c-Kit and DBA populations. Examination of the group 1 cells (identified as PCs) reveals Atp6v1b1 Aqp2 some degree of heterogeneity within the group. For example, the B ROMK channel (Kcnj1) was strongly expressed in most of these cells, but a few had little or no ROMK mRNA. The same Cell 1 Cell 1 is true for several other proteins that are generally considered PC Cell 2 Cell 2 markers. Thus, among PCs, considerable heterogeneity can be Cell 3 Cell 3 observed at a transcript level. In general, such heterogeneity has Cell 4 Cell 4 been seen in other tissues (4–7, 15), and may reflect genes whose Cell 5 Cell 5 transcripts are produced in bursts (16) or periodically as a result of oscillatory nuclear translocation of TFs, as is seen with and Cell 6 Cell 6 NF-κB (17, 18). Similar heterogeneity was observed for the group Fig. 2. Visualization of RNA-seq mapping reads for c-Kit+ and DBA+ cell 2 cells, identified as ICs. In addition, several PC markers are populations and scRNA-seq mapped reads for representative transcripts. expressed at low levels in group 2 cells. Especially prominent (A) Distribution of RNA-seq reads across gene bodies of selected genes for among these is aquaporin-3 (Aqp3), which appears to be expressed + + + pooled c-Kit (Top) and pooled DBA (Bottom) cells. c-Kit cells abundantly in most of the ICs, albeit at lower levels than in PCs. + express IC genes: Slc4a1, Slc26a4, and Avpr1a. DBA cells abundantly express Fig. 3B shows the cell-to-cell Spearman correlation for the PHYSIOLOGY PC genes: Aqp2, Avpr2, and Aqp3. (B) Distribution of scRNA-seq reads across same markers used in Fig. 3A. This method, which employs the gene bodies of Atp6v1b1 and Aqp2 in cells 1–6. Cells 1–3 abundantly express the H+-ATPase Atp6v1b1, an IC marker. Cells 4–6 abundantly express AQP2, a PC marker. Data were visualized in the UCSC Genome Browser. Vertical axis shows read counts. Map of exon/ organization of each gene is shown AB on top of individual panels. Slc4a9 Foxi1 Avpr1a 0 1 scRNA-Seq Profiling. The transcript lists of pooled cell populations Correla on + Oxgr1 show that some AQP2-positive cells were present in the c-Kit Atp6v0d2 population and some AE1-positive cells were present in the Atp6v1c2 + markers IC 218 Cells

DBA population, as indicated by noninfinite ratios for these Dmrt2 218 Cells markers in Fig. 1 C and D. Hence, to define the transcriptomes Kit of IC cells (A- and B-ICs), it is necessary to carry out scRNA- Aqp6 seq. In initial experiments, we determined single-cell tran- Slc4a1 + + Rhbg scriptomes by using flow-sorted c-Kit and DBA populations as Atp6ap1 input to a microfluidic chip (C1 System; Fluidigm; Fig. S1A), Slc26a4 recognizing that some non-collecting duct cells are likely to be Insrr trapped along with the target cell types. This system allows direct Egr1 visualization of each chamber to identify and eliminate those L1cam with more than one cell or fragmented cells. In the first run, we Mcoln3 carried out scRNA-seq with cells derived from a 1:1 mixture of c- Aqp3 1 2 + + Scnn1g Kit and DBA populations. Sixty-six single-cell cDNA libraries Kcne1 PC markers were constructed for paired-end sequencing and sequenced to an Avpr2 average depth of 10 million read-pairs per library with a high Kcnj1 percentage of uniquely mapped reads and gene body coverage Hsd11b2 (Fig. S1C). On average, the 66 cells were sequenced to a depth of Fxyd4 -10 10 2,855 genes [transcripts per million (TPM) > 1]. Examples of the Aqp2 pattern of mapped reads for six representative cells are pre- 1 2 Log2(Normalized TPM) sented in Fig. 2B. Cells 1–3 abundantly express Atp6v1b1,anIC marker. Cells 4–6 abundantly express Aqp2 and are likely to be Fig. 3. Supervised clustering of cells based on genes known to be expressed in PCs. Note that the reads map almost exclusively to exons and ICsorPCs.(A) Heat map showing four different cell groups with different gene that the reads are distributed to all exons. expression patterns. Group 1 cells appear to express genes associated with PCs. Given success with these initial studies, we carried out three Group 2 cells express genes associated with ICs. Group 3 cells are likely hybrid additional single-cell profiling runs: (i) 68 cells from a 1:1 mixture cells with the expression of both IC and PC genes. Group 4 cells express neither + + + of c-Kit and DBA populations, (ii) 43 cells from a c-Kit pop- of these genes. Note that, in group 2 cells, only a few cells express Slc26a4 and + ulation, and (iii)58cellsfromaDBA population. Fig. 3A shows Insrr and are likely B-ICs. Columns are individual cells and rows are specific genes. Colors indicate the row-wise mean centered gene expression levels a heat-map representation of supervised clustering from all [mean normalized log2(TPM + 1)]. (B) Heat map showing the same cell groups 218 cells. For this analysis, the classifiers are composed of TPM (218 cells) identified in A grouped by calculating cell–cell Spearman correlation values for generally recognized IC and PC markers shown on the coefficients. Colors indicate the cell correlation by Spearman correlation co- right of the figure. The cells divided into four general groups. efficient between cells. Color bars indicate different cell clusters in A and B.

Chen et al. PNAS Early Edition | 3of10 Downloaded by guest on September 29, 2021 TPM rank rather than the TPM value, identified the same flow-sorting protocol for enrichment of pendrin-expressing ICs, groups of cells as in Fig. 3A. The greater granularity in this figure i.e., B-ICs (Fig. S1D). First, the procedure used negative selec- emphasizes the heterogeneity within cell groups. tion markers L1CAM (19) (to eliminate PCs), CD45 (to elimi- Fig. 4A shows so-called t-distributed stochastic neighbor em- nate hematopoietic cells), and DAPI (to eliminate nonviable bedding (t-SNE) visualization of principal component analysis for cells). The remaining cells were further sorted (Fig. S1D, Right) the cells, a method of unsupervised clustering (Fig. S2). Transcripts by using peanut agglutinin (PNA) lectin as a positive marker and Lotus tetragonolobus lectin (LTL; a negative marker to eliminate in each principal component are listed in Fig. S2C.Fig.4A identifies + − four cell groups indicated by colors. Based on known markers for cells). The PNA /LTL cells were taken for single-cell RNA-seq analysis (n = 95), yielding 17 cells that were ICs (Avpr1a, Aqp6, Slc4a1, Slc26a4,andAtp6v1c2)andPCs(Aqp2, + Avpr2, Scnn1g,andKcnj1)summarizedinFig.4B, we identify the pendrin-positive (i.e., B-ICs), giving a total of 23 pendrin cells cells represented as green points in Fig. 4A as ICs and the cells in all runs. Many of the non-ICs found in this run expressed represented as blue points in Fig. 4A as PCs. Most ICs expressed Slc12a1 (the -sensitive Na-K-2Cl cotransporter 2) Slc4a1 rather than Slc26a4, indicating that they are predominantly and are therefore thick ascending limb cells. A-ICs. One nominal PC marker, Aqp3,mappedtotheblueandthe green cells in Fig. 4A, indicating that it is expressed in ICs as well as Combined Data for all Single Cells. Combining all data, we obtained PCs, albeit at lower levels in the former. Attempts to carry out transcriptomes from 74 cells classified as PCs, 87 cells classified as secondary clustering in the PC cell cluster did not yield clear-cut A-ICs, and 23 cells classified as B-ICs (Dataset S4). Examples of transcripts expressed among A-ICs, B-ICs, and PCs are shown in subgroups of cells (Fig. S3B). The third row in Fig. 4B shows that Fig. 5A. Expression of AE1 (Slc4a1) and pendrin (Slc26a4) appear proximal tubule markers [Slc5a2 (SGLT2), Lrp2 (Megalin), Pdzk1, to be mutually exclusive (Fig. 5A), i.e., no cells were found with Slc34a1 (NaPi-2), and Slc13a1 (Na sulfate cotransporter-1)] map TPM values greater than 10 for both transcripts. In contrast, many chiefly to the cells labeled in purple in Fig. 4A, identifying these as cells were found that express either of the IC-marker transcripts proximal tubule cells. and Aqp2, consistent with the presence of hybrid IC/PCs. Simi- + larly, a minority of A-ICs and B-ICs expressed relatively high scRNA-Seq Profiling of Pendrin Renal Cells. A limitation of the levels of the nominally PC Scnn1g. aforementioned analysis is the low number of B-ICs identified. An aggregate transcriptome list for each collecting duct cell Evidently, c-Kit is a good surface marker for enrichment of type is provided as a resource on a permanent web page (https:// A-ICs, but not B-ICs. Consequently, we developed an alternative hpcwebapps.cit.nih.gov/ESBL/Database/scRNA-Seq/). The online data can be sorted or filtered in various ways and can be down- loaded as a resource for experimental design or bioinformatic A 10 Proximal tubule cells modeling. An important issue in presenting the aggregate lists is whether to represent TPM values for each gene in each cell Non-epithelial cells type as the mean or the median across all single-cell replicates. 5 To address this, we used the single-tubule RNA-seq data for non–-treated microdissected mouse CCDs reported here earlier to compare with scRNA-seq data. The analysis (Fig. 5B) showed that, when mean values for scRNA-seq data 0 were assumed, there was a much better correlation with the

tSNE_2 single-tubule data than when median values were used. (We assumed a ratio of 20%:20%:60% for A-ICs, B-ICs, and PCs, -5 based on a consensus of published data.) Thus, in the re- Principal cells mainder of this paper, we work with mean values to represent Intercalated cells the aggregate transcriptomes of A-ICs, B-ICs, and PCs. -20 -10 0 10 B tSNE_1 Cell Type-Selective Transcripts. Transcripts expressed predominately in a single cell type are of interest as potential cell-type markers Avpr1a Aqp6 Slc4a1 Slc26a4 Atp6v1c2 or for investigating intercellular signaling among the three cell types. Fig. 5C and Dataset S4 summarize the transcripts selective for A-ICs (including Slc4a1 and Aqp6), B-ICs (in- cluding Slc26a4 and Insrr), both IC types (including Foxi1 and + 2pqA Ag2rpv 1nncS 3pqA1jncK multiple H -ATPase subunits), and PCs (including Aqp2 and Hsd11b2).

Bioinformatic Analysis of Transcriptomes. The data can be analyzed Lrp2 bioinformatically to address several questions, as follows. Slc5a2 Pdzk1 Slc34a1 Slc13a1 Is there evidence for cross-talk between cell types, i.e., cell-surface receptors with cognate ligands expressed in a neighboring cell type? We found several examples of such relationships. For example, the receptor tyrosine-kinase c-Kit is expressed chiefly in A-ICs, Fig. 4. Unsupervised clustering of single-cell RNA-seq data. (A) t-SNE plot whereas its ligand (Kitl or stem-cell growth factor) is expressed showing four different cell clusters. Different cell groups are color-coded. chiefly in PCs. Furthermore, Notch2 is expressed chiefly in PCs, Brown, nonepithelial cells; green, ICs; blue, PCs; purple, proximal tubule whereas Jagged1 (Jag1) is expressed predominantly in ICs (Fig. cells. (B) t-SNE plot showing expression of selected gene among single cells. 5D). Nephronectin (Npnt) is expressed predominantly in PCs First row shows the expression of IC genes among single cells, second row and interacts with integrin β1(Itgb1), which is expressed pre- shows the expression of PC genes among single cells, and third row shows the expression of proximal tubule cell genes. Red color indicates higher and dominantly in B-ICs. Other examples are provided in Dataset S4. gray color indicates lower gene expression. Expression of nonepithelial What GPCRs are expressed in A-ICs, B-ICs, or PCs? Fig. 6A lists the genes is shown in Fig. S3A. A full listing of transcripts for each single cell is GPCRs abundantly expressed in PCs and ICs with their mean provided in Dataset S4. TPM values (full list provided in Dataset S4). Of particular interest

4of10 | www.pnas.org/cgi/doi/10.1073/pnas.1710964114 Chen et al. Downloaded by guest on September 29, 2021 are the receptors that are selectively expressed in particular cell PNAS PLUS A types: the A-ICs selectively express the UDP--selective P2ry14 (20), the adhesion receptor Adgrf5, and, somewhat surprisingly, the thyroid stimulating re- ceptor. B-ICs selectively express the extracellular -sensing receptor, the β2-, and the type 3 prostaglandin E2 receptor. PCs selectively express the V2 , the atypical receptor 3, and the type 1 prostaglandin E2 receptor. IC-selective receptors not limited to A-ICs or B-ICs above include the glucagon receptor, the V1a vasopressin re- ceptor, and the oxoglutarate receptor. We also performed pairwise B correlation analysis of all of the GPCRs listed in Dataset S4 with known cell-type markers and constructed a GPCR correlation network (Fig. 6B). The analysis revealed distinct patterns of GPCR expression among three cell types. What secreted proteins are selectively expressed in PCs? Cells secrete proteins to carry out extracellular signaling and to regulate the ECM, among other functions. We extracted the transcripts that code for secreted proteins in the three cell types and present the ones that are abundantly expressed in ICs or PCs in Fig. 6C (see also Dataset S4). Examples of important secreted proteins in ICs are (Guca2a), serine peptidase inhibitors (Serpinb1a and Serpinb6b), and endothelin 3 (Edn3). Guanylin binds to integral C kinase/guanylyl proteins of the ANP- receptor family that are found downstream in the inner medullary collecting duct (IMCD) (Npr1 and Gucy2g) (2). Endothelin 3 also

plays a role in regulation of collecting duct transport (21). Se- PHYSIOLOGY creted proteins selectively expressed in PCs include follistatin-like 1 (Fstl1; a potential activin antagonist), elabela (Apela; a ligand for the ), and nephronectin (Npnt;functionalligandof integrin α-8/β-1 in kidney development). What TFs are selectively expressed in A-ICs, B-ICs, or PCs? DNA-binding TFs can play roles in cell type-specific gene expression or in day- to-day physiological regulation. Fig. 7A lists abundant TFs identified in at least one of the three cell types (full list provided in Dataset S4). Among IC-selective TFs, the most selectively expressed in A-ICs were the developmental gene Dmrt2 and the Y-box gene Ybx2 (that also regulates mRNA stability). Several TFs were predominantly expressed in B-ICs, including the de- velopmental gene Hmx2, the b-ZIP TF Jund, and the forkhead box TFs Foxi1 and Foxq1. A third forkhead box TF, Foxp1,is selectively expressed in both IC subtypes. Its localization in ICs was confirmed by immunocytochemistry (Fig. 8A). PCs abun- dantly express several TFs already implicated in regulation of Aqp2 gene transcription, including Gata3, Ehf, and Tfap2b (22). D Localization of Tfap2b protein in PCs was confirmed by immu- nocytochemistry (Fig. 8A). The immunocytochemical localiza- tions of other differentially expressed transcripts, namely Eps8, Fosb (Dots in the cell nucleus), Hoxd8, and Scin, also matched the scRNA-seq data (Fig. 8). We also performed pairwise correlation analysis of all of the Fig. 5. Gene distribution among three cell types. (A) Violin plot showing Slc4a1, TFs listed in Dataset S4 with known cell-type markers and Atp6v1c2, Slc26a4, Insrr, Aqp2,andScnn1g expression in A-ICs, B-ICs, and PCs. (B) constructed a TF correlation network (Fig. 7B) and heat map Correlation between scRNA-seq and single-tubule RNA-seq. Predicted CCD (Fig. 7C). The analysis revealed three subnetworks (Fig. 7B) + + transcriptome (20% A-ICs 20% B-ICs 60% PCs) was calculated with median representing PC, A-IC, and B-IC TFs. PC TFs connect to A-IC TPM (Left)andmeanTPM(Right) of each cell type. Pearson correlation was calculated between predicted CCD transcriptome with CCD transcriptome TFs through the developmental TF Tfcp2l1 (23), whereas B-IC obtained from microdissected CCDs (Dataset S1) using the cor.test function in R. TFs connect to A-IC TFs through Foxi1.

Data are log2-transformed before plotting.Atotalof8,022genesthatare What transporters and channels are expressed in A-ICs, B-ICs, or PCs? abundantly expressed in all three cell types (Dataset S1) are plotted. (C)Top These three cell types are generally characterized on the basis of 25 transcripts enriched in A-ICs, B-ICs, ICs, and PCs. Transcript abundance fraction their transport functions. The chief transporters and channels found (mean TPM) greater than 0.85 [i.e., top 25 IC enriched transcripts: A-IC/(A-IC + in these three cell types are listed in Fig. 9A (full list provided B-IC + PC) > 0.85 and sorted by TPM]. Transcripts known to be expressed in in Dataset S4). Fig. 9B shows the pairwise correlation network collecting duct are indicated in bold type (full list provided in Dataset S4). between transporters and channels and cell type-selective genes. (D) Violin plot showing ligand Kitl and Jag1 and its corresponding receptors Kit TheA-ICssecreteprotonsandabsorb bicarbonate, and the key and Notch2 expression in A-ICs, B-ICs, and PCs. Each dot in the plot indicates an – individual cell. A-IC cluster is colored green, B-IC cluster is colored light green, and bicarbonate exchanger expressed in these cells is AE1 PC cluster is colored blue. Vertical axis shows the gene expression value (Slc4a1). In addition, the results indicate that the chloride trans-

[log2(TPM + 1)]. Other examples are listed in Dataset S4. porter Aqp6 is also most abundant in A-ICs.

Chen et al. PNAS Early Edition | 5of10 Downloaded by guest on September 29, 2021 A B

C

Fig. 6. Seven-membrane spanning receptors (i.e., GPCRs) expressed in ICs or PCs. (A) Tables show GPCRs selectively expressed in PCs and ICs. A yellow color gradient was used to indicate expression levels (full list provided in Dataset S4). (B) Pairwise correlation of the GPCRs, PC genes (Avpr2, Aqp2,andScnn1g),andICgenes(Slc4a1, Kit, Aqp6, Slc4a9, Slc26a4,andInsrr) was calculated (detailed description provided in Methods). Genes with pairwise correlation >0.3 and FDR <0.05 were included for network construction. Gene pairs were presented as edges linking respective genes. PC genes are colored dark blue, B-IC genes are colored light green, and A-IC genes (Slc4a9 is a IC gene connecting A-IC and B-IC) are colored green. All other genes are colored light blue. (C) Top 16 secreted proteins abundantly expressed in A-ICs, B-ICs, and PCs (full list provided in Dataset S4). Bold type indicates cell type-specific secreted protein transcripts in A-ICs (ICs), B-ICs (ICs), and PCs.

B-ICs secrete bicarbonate in exchange for chloride, and the key Discussion anion transporter in this process is pendrin (Slc26a4). They also Collecting duct cells account for only a small fraction of cells in mediate the electroneutral component of transepithelial NaCl ab- the kidney. In this study, we have used scRNA-seq to obtain sorption seen in CCDs (24, 25). In addition to pendrin, B-IC se- comprehensive transcriptome profiles for the three major subtypes lective channels and transporters include the sodium-bicarbonate of collecting duct cells (A-ICs, B-ICs, and PCs) in the mouse kid- cotransporter Slc4a9 (Fig. 9A), whereas a related sodium- ney. It should be emphasized that FACS was used in these exper- bicarbonate cotransporter Slc4a8 is not strongly expressed (Data- iments only as a means of enriching the cell types of interest. The set S4). Surprisingly, Slc4a9 is also detected in A-ICs, albeit at a identification of individual cell types did not depend on the anti- lower level. Fig. 9C shows immunocytochemistry findings for Slc4a9 bodies or lectins. Instead, cells were classified as A-ICs, B-ICs, or in the mouse , confirming its predominant B-IC localization PCs strictly according to the RNA-seq data. Because the data are and weaker expression in A-ICs. Another potentially important likely to be of value to other researchers, we have curated the transcript that is selectively expressed in the B-ICs is Tmem37, scRNA-seq data (as well as newly generated single-tubule data for which is a subunit of the L-type , the target of the mouse CCD) to create online resources that will allow users to dihydropyridine class of antihypertensive drugs such as nifedipine. easily access the data for their own studies. This has been preserved PCs mediate regulated of and the elec- as part of a larger resource called the Kidney Systems Biology trogenic component of NaCl absorption in the collecting duct. Project (KSBP; https://hpcwebapps.cit.nih.gov/ESBL/Database). Not surprisingly, the water channels AQP2 and AQP3 are highly One of the objectives of the KSBP is to identify transcriptomes for expressed in these cells, and the epithelial sodium channel each major cell type in the kidney. Coupled with RNA-seq analysis (ENaC) β-andγ-subunits are strongly expressed as well (Fig. 9A). of renal glomerular epithelial cells (27), our previous single-tubule The ENaC α-subunit is expressed only at a low level in PCs, as RNA-seq study in rat (2) and new single-tubule RNA-seq data for expected in that have not been sodium-restricted or mouse CCD in the present study, we have now achieved coverage -treated (26). Also, PCs secrete potassium and, as ofallofthemajorepithelialcelltypes in the kidney. We expect that expected, the Kcnj1, Kcnj10, Kcnj16, and Kcne1 potassium the aggregate data will be useful for a variety of applications, in- channels are strongly and selectively expressed in PCs. We did cluding selection of drug targets, understanding of renal patho- not find evidence for Kcnma1, commonly referred to as the physiology of salt and water balance disorders, and interpretation of Maxi-K or BK . data from transgenic animals in which particular genes have been

6of10 | www.pnas.org/cgi/doi/10.1073/pnas.1710964114 Chen et al. Downloaded by guest on September 29, 2021 AB PNAS PLUS

C PHYSIOLOGY

Fig. 7. TFs expressed in ICs or PCs. (A) TFs selectively expressed in PCs and ICs. A yellow color gradient was used to indicate expression levels (full list provided in Dataset S4). (B) Pairwise correlation of the TFs, PC genes (Avpr2, Aqp2, and Scnn1g), and IC genes (Slc4a1, Kit, Aqp6, Slc4a9, Slc26a4, and Insrr) was cal- culated (detailed description provided in Methods). Genes with pairwise correlation >0.3 and FDR <0.05 were included for network construction. Gene pairs were presented as edges linking respective genes. Nodes with less than two edges were removed before network construction. PC genes are colored dark blue, B-IC genes are colored light green, and A-IC genes (Slc4a9 is an IC gene connect A-IC and B-IC) are colored green. All other genes are colored light blue. (C) Heat map showing gene expression pattern of the nodes (TFs, PC, and IC genes) from B among A-ICs, B-ICs, and PCs. Colors indicate the gene expression

levels [Z-normalized log2(TPM + 1)]. Columns are individual cells, and rows are specific genes listed in the box.

knocked out. Several general issues that arose during the conduct of for collecting duct cells is likely to be representative of the cells as these studies are discussed in the following paragraphs. they are in the mice. Saxena et al. (31) have previously succeeded in obtaining viable collecting duct cell suspensions by FACS using Viability of Collecting Duct Cells in Single-Cell Isolation. One ques- transgenes that code for fluorescent proteins to label ICs and PCs, tion that may be asked is, “Does enzymatic treatment of cells and supporting the stable nature of collecting duct cells. This is not the long processing time (>4 h) needed for single cell isolation necessarily true of the other renal epithelial cell types because, substantially affect the transcriptomic profile?” We addressed this although we typically observe sustained function of CCDs for as question in part by carrying out single-tubule RNA-seq in rapidly long as 8 h after the death of the animal in isolated perfused tu- dissected mouse CCDs and comparing the transcriptomic profile bule experiments (28), the viable lifetime of isolated perfused so obtained with the single-cell data. These single-tubule micro- proximal tubules and thick ascending limbs is considerably shorter dissection experiments were done without the aid of enzymatic (29, 32), typically extending only to a maximum of approximately treatment, exactly as we would microdissect tubules for isolated 60–90 min after the death of the animal. Thus, single-cell data perfused tubule experiments that consistently show stable trans- from pre-collecting duct renal tubule segments may be worthy of port characteristics indicative of sustained viability (28–30). The skepticism without special procedures to maintain the viability of comparison of our single-tubule RNA-seq results with the com- these cell types. bined scRNA-seq profiles of the three cell types indicates a very high degree of correlation, indicating that the transcriptomes in The Validity of c-Kit as an A-IC Marker. RNA-seq analysis of ICs + + the single cells are not markedly affected by the isolation pro- isolated by FACS in H -ATPase-GFP mice identified strongly cedure. Therefore, we believe that the single-cell RNA-seq data expressed transcripts corresponding to two cell surface proteins,

Chen et al. PNAS Early Edition | 7of10 Downloaded by guest on September 29, 2021 A Hybrid Cells. Another observation derived from single-cell data is that a small proportion of collecting duct cells were not readily classified as being one of the three recognized cell types, but in- stead appear to be hybrid cells expressing aquaporin-2 and either Eps8 Aqp2 Merge Aqp2 Merge anion exchanger 1 or pendrin. The physiological role and de- Hoxd8 velopmental significance of these cells remains to be determined.

Prospects for Transformed IC Lines. Although transformed cultured cell lines with properties of PCs are available (34, 35) and are Aqp2 Merge Aqp2 Merge Fosb Scin widely used in renal physiological research, we are not aware of the availability of similar cell lines for the study of ICs. Accord- ingly, one motivation for the work described in the present paper is to develop the information and tools necessary to create cul- Foxp1 Aqp2 Merge Tfap2b Aqp2 Merge tured cell lines with properties of the two IC subtypes. The IC B enrichment procedures described in the present paper provide an Gene Gene Annota on A-IC B-IC PC important step in that direction. The current lack of IC lines raises Symbol the possibility that the cells will not grow or maintain a differen- Eps8 EGFR pathway substrate 8 1047.6 545.1 86.0 tiated state in the absence of PCs or substances derived from Fosb FBJ osteosarcoma oncogene B 73.1 44.1 168.3 them. Therefore, the identification in the present study of many Foxp1 forkhead box P1 162.5 104.7 22.5 secreted proteins expressed in PCs (Fig. 6C) and the identification Hoxd8 D8 157.3 85.2 413.5 of corresponding receptor proteins in ICs (Fig. 5D)isofobvious Scin scinderin 12.1 0.0 99.3 relevance. One possible mediator is the Kit ligand (Kitl or stem Tfap2b transcrip on factor AP-2 beta 108.1 20.4 491.1 cell factor), which is a growth factor that we show in the present paper to be expressed in PCs. It binds and activates c-Kit, which is Fig. 8. Immunolabeling of cell type-selective proteins. (A) Immunolabeling showing labeling for cell type-selective proteins (green) with Aqp2 (red) in mouse a receptor , and therefore has potential effects on kidney sections. Eps8 and Foxp1 are predominately expressed in ICs. Scin is lo- growth, proliferation, and differentiation of A-ICs. calized solely in PCs. Hoxd8 and Tfap2b are abundantly detected in the nucleus of PCs, albeit not limited to collecting ducts. Fosb shows nuclear dots staining in Methods both ICs and PCs. Arrows indicate representative cells positive for cell type- Full details of study methods are provided in SI Methods. selective proteins. (B) Tables showing the mean TPM values for proteins la- beled green in A. A yellow color gradient was used to highlight expression levels. Animals. Male C57BL/6 mice aged 2 mo (Taconic) were used. These experi- ments were conducted in accordance with National Institutes of Health (NIH) animal protocol H-0047R4. Atp6v1b1GFP mice (8) used to isolate GFP+ ICs namely c-Kit and Hepacam2. We adopted a strategy to use were housed at the Massachusetts General Hospital (MGH; protocol no. commercially available antibodies to c-Kit in an attempt to enrich 2001N000101) under standard conditions. the ICs from cell suspensions derived from non-genetically mod- ified mice by using FACS. RNA-seq profiling of the cells revealed Cell Isolation, FACS Purification, and RNA Extraction. The kidneys were per- fused via the left ventricle with perfusion buffer (5 mM Hepes, 120 mM NaCl, that c-Kit successfully enriched A-ICs but not B-ICs. Consistent 5 mM KCl, 2 mM CaCl2, 2.5 mM Na2HPO4, 1.2 mM MgSO4, 5.5 mM glucose, with this, immunocytochemical localization of c-Kit demonstrated 5 mM Na acetate, pH 7.4) to remove blood-borne protease inhibitors, fol- stronger labeling in A-ICs than in B-ICs. Thus, within the kidney lowed by the same solution with 1 mg·mL−1 Collagenase B (Roche). Kidneys parenchyma, c-Kit can be viewed as an A-IC marker. However, we were immediately removed into PBS solution and the cortex was dissociated − − emphasize that c-Kit is also expressed in a number of blood-borne by using collagenase B (1.8 mg·mL 1), Dispase II (1.2 U·mL 1; Roche), and · −1 cells that could contaminate kidney preparations. Thus, as we DNase I (10 U mL ; Qiagen) in perfusion buffer (37 °C for 30 min with fre- quent agitation). The partially disaggregated tissue was centrifuged at 70 × observed, cell populations enriched by FACS using c-Kit as a g at 4 °C for 30 s, and then the supernatant was centrifuged for 5 min at surface marker were not composed purely of ICs. 350 × g. The pelleted cells were resuspended in 5 mL lysis buffer (118-156- 101; Quality Biological) for 30 s to remove red blood cells, followed by di- Cell Heterogeneity. Clustering of transcriptomic data for single lution in 45 mL PBS solution. The suspension was pelleted by centrifugation cells allowed us to identify the three distinct cell types known to be at 350 × g for 5 min, resuspended in FACS buffer (perfusion buffer with present in the renal collecting duct, namely PCs, A-ICs, and B-ICs. 0.05% BSA and 2 mM EDTA), and then passed sequentially through 100-μm, 70-μm, and 40-μm cell strainers, producing a single-cell suspension. These cells express the defining markers for the cell types, spe- + + cifically aquaporin-2, anion exchanger 1, and pendrin, respectively. ForenrichmentofDBA and c-Kit cells from C57BL/6 mice, the single-cell suspension was incubated with FITC-conjugated DBA (FL-1031; Vector Labora- However, the scRNA-seq analysis demonstrated considerable tories), PE-conjugated anti–c-Kit antibody (130–102-796; Miltenyi Biotec), and heterogeneity within each of the cell types. For example, many DAPI (30 min at 4 °C) and washed twice in perfusion buffer. The DBA+ (FITC + cells classified as PCs did not express high levels of the mRNA channel) population and c-Kit (PE channel) population were flow-sorted (Aria II; BD Biosciences) at the National Heart, Lung, and Blood Institute (NHLBI) Flow coding for the ROMK potassium channel, a well-established + mediator of potassium by PCs. Many other examples Cytometry Core with the help of Venina M. Dominical, eliminating DAPI cells. – can be gleaned from Fig. 3A. Thus, each of the cells profiled ex- For isolation of a B-IC enriched cell fraction, the single-cell suspension was incubated with APC-conjugated anti-mouse CD45 (17–0451-83; eBioscience), hibits its own unique mRNA expression pattern. We speculate APC-conjugated anti-mouse L1CAM/CD171 (130–102-221; Miltenyi Biotec), that the differences among individual cells relate to the fact that FITC-conjugated LTL (FL-1321; Vector Laboratories), rhodamine-conjugated transcription is not a static event but tends to occur in bursts or as PNA (RL-1072; Vector Laboratories), and DAPI for 30 min at 4 °C with rota- + − − − − a periodic function, governed by oscillatory translocation of TFs into tion and washed twice. PNA LTL CD45 L1CAM DAPI cells were obtained and out of the nucleus as seen for p53 and NF-κB (17, 18). Thus, by FACS (Aria II; BD Biosciences). certain transcripts may come and go in any particular cell. Because Total RNA for bulk RNA-seq was extracted by using an RNeasy Mini Kit (Qiagen) according to the manufacturer’s protocol. RNA integrity was assessed most proteins have much longer half-lives than their respective by using an Agilent 2100 Bioanalyzer with the RNA 6000 Pico Kit (Agilent). mRNAs (33), we would expect protein levels to vary less. However, protein MS is not yet sufficiently sensitive to identify a deep pro- Immunocytochemical Labeling. Perfusion-fixed mouse kidneys were labeled teome in single cells, and hence we cannot test the hypothesis. immunocytochemically (36) or immunohistochemically (37) by using the

8of10 | www.pnas.org/cgi/doi/10.1073/pnas.1710964114 Chen et al. Downloaded by guest on September 29, 2021 ABPNAS PLUS

C PHYSIOLOGY

Fig. 9. Transporters and channels expressed in ICs or PCs. (A) Transporters and channels selectively expressed in PCs and ICs. A yellow color gradient was used to indicate expression levels (full list provided in Dataset S4). (B) Network showing the cell type-selective transporters and channels along with PC genes (Aqp2 and Scnn1g) and IC genes (Slc4a1, Kit, Aqp6, Slc4a9, Slc26a4,andInsrr). Genes with pairwise correlation >0.3 and FDR <0.05 were included for network construction. Gene pairs were presented as edges linking respective genes. Nodes with less than three edges were removed before network construction. (C) Immunohisto- chemistry showing labeling for AE4 (Slc4a9)inbrownandpendrin(Slc26a4) in blue. (Left) Single labeling for AE4. (Right) Double labeling with pendrin. Images were obtained using DIC optics. Arrowheads in double labeling indicate pendrin-negative cells with basolateral AE4 signal. Arrows indicate pendrin-positive cells.

following antibodies: c-Kit (cat. no. 3074; Cell Signaling Technology), determined by using an Agilent 2100 Bioanalyzer using the High-Sensitivity DNA + H -ATPase A-subunit (36), EPS8 (sc-390257; Santa Cruz), Fosb (sc-398595; Kit. For single-cell library preparation, the Nextera XT DNA Sample Preparation Santa Cruz), Foxp1 (sc-398811; Santa Cruz), Hoxd8 (sc-515357; Santa Cruz), Kit (Illumina) was used to generate the libraries with input of only 0.25 ng cDNA Scin (sc-376136; Santa Cruz), Tfap2b (sc-390119; Santa Cruz), AE4 (38), according to the modified protocol from Fluidigm. Libraries were pooled and pendrin (37), and AQP2 (K5007; laboratory of M.A.K.). sequenced to obtain 50-bp paired-end reads on the Illumina HiSeq 3000 platform to a depth of more than 40 million reads per sample (>10 million reads per cell). Single-Cell Capture and Single-Cell cDNA Generation. The Fluidigm C1 system was used to generate single-cell cDNAs. Briefly, post-FACS cells were pelleted Data Processing and Transcript Abundance Quantification. Sequencing data and resuspended to a final concentration of 300–500 cells per microliter. Cells quality checks were performed by using FastQC (https://www.bioinformatics. were combined with Fluidigm C1 Suspension Reagent at a ratio of 3:2 and babraham.ac.uk/projects/fastqc/) followed by read alignments using STAR loaded on a medium-sized (for 10–17-μm-diameter cells) Fluidigm integrated (39) with alignment to the mouse Ensembl genome (GRCm38.p5) and fluidic circuit (IFC) for cell capture in the C1 system. Each IFC chamber was Ensembl annotation (Mus_musculus.GRCm38.83.gtf). Unique reads from visually examined to ensure the removal of doublet or fragmented cells in genomic alignment were processed for visualization on the University of the downstream analysis. Each single cell was then automatically washed, California, Santa Cruz (UCSC), Genome Browser. RSEM (40) was used for lysed, reverse-transcribed by using an oligo-dT primer, and amplified by PCR transcript abundance quantification. Transcript abundances were estimated in the C1 system with a SMART-seq Kit (Clontech). cDNAs were harvested in TPM by using the NIH Biowulf High-Performance Computing platform. from the outlets of IFC and quantified by using a Qubit 2.0 Fluorometer, and Cells associated with low-quality data were excluded based on (i) sequenc- the size distribution was determined by the Agilent 2100 Bioanalyzer by ing depth less than 2 million reads per cell and/or (ii) percentage of uniquely using the High-Sensitivity DNA Kit (Agilent). mapped reads lower than 55% and/or (iii) strong 3′ bias [determined by skewness lower than 0 calculated with RSeQC (41)]. Library Construction and DNA Sequencing. For bulk RNA-seq, cDNA was generated by SMARTer V4 Ultra Low RNA Kit (Clontech) according to the man- Supervised Clustering of Cells Based on Known Gene Expression Profiles. Genes − ufacturer’s protocol. cDNA 1 ng at a concentration of 0.2 ng·μL 1 was “tag- known to be expressed in PCs and ICs were used in clustering. In brief, gene TPM

mented” and barcoded by using a Nextera XT DNA Sample Preparation Kit values were log2-transformed and mean-centered to highlight the different (Illumina). Libraries were generated after 12 cycles of PCR, purified by AmPure expression across cells. Heat maps were generated by the heatmap.2 package XP magnetic beads, and quantified by Qubit. Library size distribution was in the R environment.

Chen et al. PNAS Early Edition | 9of10 Downloaded by guest on September 29, 2021 Unsupervised Clustering of the scRNA-Seq Data. To classify the cells in an Single-Tubule RNA-Seq in Mouse CCDs and cTALs. Kidneys were removed from unbiased way, we used the Seurat package (satijalab.org/seurat/). Highly 5-wk-old male mice and immediately placed in ice-cold PBS solution. CCDs and expressed genes (mean TPM > 4,000) or low-abundance genes (mean TPM < thick ascending limbs were manually dissected in ice-cold dissection solution 5) or genes expressed in less than 10% of total cells were excluded in the (described here earlier) without protease treatment under a Wild M8 dissection downstream analysis. Furthermore, cells with fewer than 2,000 genes stereomicroscope equipped with on-stage cooling. After washes in ice-cold PBS expressed (TPM cutoff at 0) were excluded. Highly variable genes were solution (two times), the microdissected tubules were transferred to TRIzol identified by MeanVarPlot function (y.cutoff = 1,x.low.cutoff = 5.5, fxn.x = reagent for RNA extraction. One to four tubules were collected for each sample. = = expMean, fxn.y logVarDivMean, x.high.cutoff 11). Then, principal Library generation and sequencing details were as described here earlier. component analysis (PCA) and ProjectPCA was performed on these highly ∼ variable genes ( 800 genes). We examined each principal component and RNA-Seq Analysis of Cells Expressing GFP Driven by the Atp6v1b1 Promoter. + − visualized the heterogeneity of the data by heat maps. In PCA, we de- GFP and GFP-negative (GFP ) cells were isolated from Atp6v1b1GFP mice (8) as termined the first eight significant components and proceeded to nonlinear + − described previously (36). GFP and GFP cells were collected in PBS buffer and dimensional reduction (i.e., t-SNE). centrifuged for 5 min at 500 × g. For each experiment, between 20,000 and + − 49,000 GFP cells, and between 500,000 and 600,000 GFP cells, were collected. Cell Type-Specific Transcriptomes. The mean and median TPM were calculated RNA was extracted by using the Quick-RNA MicroPrep (Zymo Research) per the across all single cells of each cell type (A-ICs, B-ICs, and PCs), and the resulting manufacturer’s instructions, with on-column DNase I treatment. RNA was eluted values were used to assemble transcriptomes as described in Results.Thesedata in 2 × 6 μL DNase, RNase-free water. The RNA-seq was the same as described by are reported on specialized publicly accessible, permanent web pages to Lee et al. (2) with the following exceptions: 50-bp paired-end from Illumina HiSeq provide a community resource: https://hpcwebapps.cit.nih.gov/ESBL/Database/ 2500 platform were aligned to mouse mm10 by STAR. The data were normalized scRNA-Seq/index.html. to obtain values as reads per kilobase of transcript per million (RPKM). Pairwise Correlation and Network Construction. Pairwise correlations for TFs, ACKNOWLEDGMENTS. The study was supported by the Division of Intramu- GPCRs, and transporters and channels were calculated based on a modified ral Research of the NHLBI (M.B.B. and M.A.K., principal investigators), Pro- Spearman’s ρ by using the scran package in R (42). Each pairwise correlation jects ZIA-HL001285 and ZIA-HL006129, as well as NIH Grants R01-DK104125 matrix was then converted into a graph object by using RBGL (bioconductor. (to S.M.W.), R01-DK107798 (to J.W.V.), and R37-DK042956 (to D.B.). Next- org/packages/release/bioc/html/RBGL.html) and igraph (igraph.org/redirect. generation sequencing was done in the NHLBI DNA Sequencing Core Facility html) packages implemented in R and finished in Cytoscape (www.cytoscape. (Y. Li, Director). Cell sorting was done in the NHLBI Flow Cytometry Core org). Gene pairs with correlation >0.3 and false discovery rate (FDR) <0.05 were Facility (P. McCoy, Director). Confocal images were taken in the NHLBI Light presented as edges linking the respective genes. Microscopy Core Facility (C. Combs, Director).

1. Kriz W, Bankir L; The Renal Commission of the International Union of Physiological 23. Werth M, et al. (2017) Transcription factor TFCP2L1 patterns cells in the mouse kidney Sciences (IUPS) (1988) A standard nomenclature for structures of the kidney. Kidney collecting ducts. Elife 6:e24265. Int 33:1–7. 24. Leviel F, et al. (2010) The Na+-dependent chloride-bicarbonate exchanger SLC4A8 2. Lee JW, Chou CL, Knepper MA (2015) Deep sequencing in microdissected renal tu- mediates an electroneutral Na+ reabsorption process in the renal cortical collecting bules identifies segment-specific transcriptomes. J Am Soc Nephrol 26: ducts of mice. J Clin Invest 120:1627–1635, and erratum (2011) 121:1668. 2669–2677. 25. Tomita K, Pisano JJ, Burg MB, Knepper MA (1986) Effects of vasopressin and brady- 3. Cheval L, et al. (2011) Atlas of gene expression in the mouse kidney: New features of kinin on anion transport by the rat cortical collecting duct. Evidence for an electro- glomerular parietal cells. Physiol Genomics 43:161–173. neutral transport pathway. J Clin Invest 77:136–141. 4. Zeisel A, et al. (2015) Brain structure. Cell types in the mouse cortex and hippocampus 26. Masilamani S, Kim GH, Mitchell C, Wade JB, Knepper MA (1999) Aldosterone- revealed by single-cell RNA-seq. Science 347:1138–1142. mediated regulation of ENaC alpha, beta, and gamma subunit proteins in rat kid- 5. Treutlein B, et al. (2014) Reconstructing lineage hierarchies of the distal lung epi- ney. J Clin Invest 104:R19–R23. thelium using single-cell RNA-seq. Nature 509:371–375. 27. Kann M, et al. (2015) Genome-wide analysis of Wilms’ tumor 1-controlled gene ex- 6. Xin Y, et al. (2016) Use of the Fluidigm C1 platform for RNA sequencing of single pression in reveals key regulatory mechanisms. J Am Soc Nephrol 26: mouse pancreatic islet cells. Proc Natl Acad Sci USA 113:3293–3298. 2097–2104. 7. Macosko EZ, et al. (2015) Highly parallel genome-wide expression profiling of indi- 28. Grantham JJ, Burg MB (1966) Effect of vasopressin and cyclic AMP on permeability of vidual cells using nanoliter droplets. Cell 161:1202–1214. isolated collecting tubules. Am J Physiol 211:255–259. 8. Miller RL, et al. (2005) V-ATPase B1-subunit promoter drives expression of EGFP in 29. Knepper MA (1983) transport in isolated thick ascending limbs and collecting intercalated cells of kidney, clear cells of and airway cells of lung in ducts from rats. Am J Physiol 245:F634–F639. transgenic mice. Am J Physiol Cell Physiol 288:C1134–C1144. 30. Pech V, et al. (2007) II increases chloride absorption in the cortical col- 9. Murata F, et al. (1983) Distribution of glycoconjugates in the kidney studied by use of lecting duct in mice through a pendrin-dependent mechanism. Am J Physiol Renal labeled lectins. J Histochem Cytochem 31(1A suppl):139–144. Physiol 292:F914–F920. 10. Labarca M, et al. (2015) Harvest and primary culture of the murine aldosterone- 31. Saxena V, et al. (May 3, 2017) Cell specific qRT-PCR of renal epithelial cells reveals a sensitive distal nephron. Am J Physiol Renal Physiol 308:F1306–F1315. novel innate immune signature in murine collecting duct. Am J Physiol Renal Physiol, 11. Alper SL, Natale J, Gluck S, Lodish HF, Brown D (1989) Subtypes of intercalated cells in 10.1152/ajprenal.00512.2016. rat kidney collecting duct defined by antibodies against erythroid band 3 and renal 32. Garvin JL, Knepper MA (1987) Bicarbonate and transport in isolated per- vacuolar H+-ATPase. Proc Natl Acad Sci USA 86:5429–5433. fused rat proximal straight tubules. Am J Physiol 253:F277–F281. 12. Roy A, Al-bataineh MM, Pastor-Soler NM (2015) Collecting duct intercalated cell 33. Schwanhäusser B, et al. (2011) Global quantification of mammalian gene expression function and regulation. Clin J Am Soc Nephrol 10:305–324. control. Nature 473:337–342. 13. Yasui M, Kwon TH, Knepper MA, Nielsen S, Agre P (1999) Aquaporin-6: An in- 34. Bens M, et al. (1999) -dependent sodium transport in a novel immor- tracellular vesicle water channel protein in renal epithelia. Proc Natl Acad Sci USA 96: talized mouse collecting duct principal cell line. J Am Soc Nephrol 10:923–934. 5808–5813. 35. Gaeggeler HP, et al. (2005) Mineralocorticoid versus occu- 14. Deyev IE, et al. (2011) -related receptor as an extracellular alkali pancy mediating aldosterone-stimulated sodium transport in a novel renal cell line. sensor. Cell Metab 13:679–689. J Am Soc Nephrol 16:878–891. 15. Shalek AK, et al. (2013) Single-cell transcriptomics reveals bimodality in expression 36. Vedovelli L, et al. (2013) Altered V-ATPase expression in renal intercalated cells iso- and splicing in immune cells. Nature 498:236–240. lated from B1 subunit-deficient mice by fluorescence-activated cell sorting. Am J 16. Sanchez A, Golding I (2013) Genetic determinants and cellular constraints in noisy Physiol Renal Physiol 304:F522–F532. gene expression. Science 342:1188–1193. 37. Knauf F, et al. (2001) Identification of a chloride-formate exchanger expressed on the 17. Niopek D, Wehler P, Roensch J, Eils R, Di Ventura B (2016) Optogenetic control of brush border membrane of renal proximal tubule cells. Proc Natl Acad Sci USA 98: nuclear protein export. Nat Commun 7:10624. 9425–9430. 18. Nelson DE, et al. (2004) Oscillations in NF-kappaB signaling control the dynamics of 38. Hentschke M, Hentschke S, Borgmeyer U, Hübner CA, Kurth I (2009) The murine gene expression. Science 306:704–708. AE4 promoter predominantly drives type B intercalated cell specific transcription. 19. Debiec H, Christensen EI, Ronco PM (1998) The cell adhesion molecule L1 is de- Histochem Cell Biol 132:405–412. velopmentally regulated in the renal and is involved in kidney branching 39. Dobin A, et al. (2013) STAR: Ultrafast universal RNA-seq aligner. 29: morphogenesis. J Cell Biol 143:2067–2079. 15–21. 20. Azroyan A, et al. (2015) Renal intercalated cells sense and mediate inflammation via 40. Li B, Dewey CN (2011) RSEM: Accurate transcript quantification from RNA-Seq data the P2Y14 receptor. PLoS One 10:e0121419. with or without a reference genome. BMC Bioinformatics 12:323. 21. Kohan DE (2013) Role of collecting duct endothelin in control of renal function and 41. Wang L, Wang S, Li W (2012) RSeQC: Quality control of RNA-seq experiments. blood pressure. Am J Physiol Regul Integr Comp Physiol 305:R659–R668. Bioinformatics 28:2184–2185. 22. Wilson JL, Miranda CA, Knepper MA (2013) Vasopressin and the regulation of 42. Lun AT, McCarthy DJ, Marioni JC (2016) A step-by-step workflow for low-level analysis aquaporin-2. Clin Exp Nephrol 17:751–764. of single-cell RNA-seq data with Bioconductor. F1000 Res 5:2122.

10 of 10 | www.pnas.org/cgi/doi/10.1073/pnas.1710964114 Chen et al. Downloaded by guest on September 29, 2021