bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
1 Diversification of Reprogramming Trajectories Revealed by Parallel Single-cell 2 Transcriptome and Chromatin Accessibility Sequencing
3 Qiao Rui Xing1,2,19, Chadi El Farran1,3,19, Pradeep Gautam1,3, Yu Song Chuah1, Tushar Warrier1,3, 4 Cheng-Xu Delon Toh1, Nam-Young Kang4,5, Shigeki Sugii6,7, Young-Tae Chang4,5, Jian Xu3, 5 James J. Collins8,9,10,11, George Q. Daley8,12,13,14,15, Hu Li16,*, Li-Feng Zhang2,*, Yuin-Han 6 Loh1,3,17,18*
7 1 Epigenetics and Cell Fates Laboratory, Programme in Stem Cell, Regenerative Medicine and 8 Aging, A*STAR Institute of Molecular and Cell Biology, Singapore 138673, Singapore 9 2 School of Biological Sciences, Nanyang Technological University, 637551, Singapore 10 3 Department of Biological Sciences, National University of Singapore, Singapore 117543, 11 Singapore 12 4 Laboratory of Bioimaging Probe Development, Singapore Bioimaging Consortium, A*STAR, 13 Singapore 138667, Singapore 14 5 Department of Chemistry, National University of Singapore, Singapore 117543, Singapore 15 6 Fat Metabolism and Stem Cell Group, Laboratory of Metabolic Medicine, Singapore Bioimaging 16 Consortium, A*STAR, 11 Biopolis Way, 138667 Singapore 17 7 Cardiovascular and Metabolic Disorders Program, Duke-NUS Medical School, 8 College Road, 18 169857 Singapore 19 8 Howard Hughes Medical Institute, Boston, MA 02114, USA 20 9 Institute for Medical Engineering and Science Department of Biological Engineering, and 21 Synthetic Biology Center, Massachusetts Institute of Technology, Cambridge, MA 02114, USA 22 10 Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA 23 11 Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA 24 12 Stem Cell Transplantation Program, Division of Pediatric Hematology/Oncology, Boston 25 Children’s Hospital and Dana-Farber Cancer Institute, Boston, MA 02115, USA 26 13 Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, 27 Boston, MA 02115, USA 28 14 Harvard Stem Cell Institute, Boston, MA 02115, USA 29 15 Manton Center for Orphan Disease Research, Boston, MA 02115, USA 30 16 Center for Individualized Medicine, Department of Molecular Pharmacology & Experimental 31 Therapeutics, Mayo Clinic, Rochester, Minnesota 55905, USA 32 17 NUS Graduate School for Integrative Sciences and Engineering, National University of 33 Singapore, 28 Medical Drive, Singapore 117456, Singapore 34 18 Department of Physiology, Yong Loo Lin School of Medicine, National University of Singapore, 35 Singapore 117593, Singapore 36 19These authors contributed equally to this work 37 38 39 40 41 *Correspondence: 42 Yuin-Han Loh, Epigenetics and Cell Fates Laboratory, A*STAR, Institute of Molecular and Cell 43 Biology, 61 Biopolis Drive Proteos, Singapore 138673, 44 Singapore; e-mail: [email protected] 45 Li-Feng Zhang, School of Biological Sciences, Nanyang Technological University, 60 Nanyang 46 Drive, Singapore, 637551, Singapore; e-mail: [email protected] 47 Hu Li, Center for Individualized Medicine, Department of Molecular Pharmacology & 48 Experimental Therapeutics, Mayo Clinic, 200 First St. SW Rochester, MN 55905, USA; e-mail: 49 [email protected] 1
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
50 Keywords: human cellular reprogramming, single-cell ATAC-Seq (scATAC-Seq), single-cell 51 RNA-Seq (scRNA-Seq), GDF3, BDD2-C8, CD13, CD44, CD201, stage-specific regulatory 52 networks, lineage specifiers, chromatin accessibility, FOSL1, TEAD4.
53 Running title: Parallel Single-cell Transcriptome and Chromatin Sequencing on Human Cellular 54 Reprogramming
55
56 SUMMARY
57 To unravel the mechanism of human cellular reprogramming process at single-cell
58 resolution, we performed parallel scRNA-Seq and scATAC-Seq analysis. Our analysis reveals that
59 the cells undergoing reprogramming proceed in an asynchronous trajectory and diversify into
60 heterogeneous sub-populations. BDD2-C8 fluorescent probe staining and negative staining for
61 CD13, CD44 and CD201 markers, could enrich for the GDF3+ early reprogrammed cells.
62 Combinatory usage of the surface markers enables the fine segregation of the early-intermediate
63 cells with diverse reprogramming propensities. scATAC-Seq analysis further uncovered the
64 genomic partitions and transcription factors responsible for the regulatory phasing of
65 reprogramming process. Binary choice between a FOSL1 or a TEAD4-centric regulatory network
66 determines the outcome of a successful reprogramming. Altogether, our study illuminates the
67 multitude of diverse routes transversed by individual reprogramming cells and presents an
68 integrative roadmap for identifying the mechanistic part-list of the reprogramming machinery.
69
70 INTRODUCTION
71 Somatic cells can be reverted to pluripotency by inducing the expression of four
72 transcription factors namely OCT4, SOX2, KLF4 and MYC in a process known as cellular
73 reprogramming1–6. Discovery of this phenomenon has raised the hopes for advancing the field of
74 regenerative medicine7. However, cellular reprogramming suffers from extremely low efficiency
75 especially for the human cells, resulting in a heterogeneous population in which few cells can be
76 characterized as pluripotent8–10. Although a handful of studies analyzed bulk population to
2
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
77 understand the reprogramming mechanisms11–17, ensemble measurement of the heterogeneous
78 population impedes the discerning of transcriptomic and epigenetic changes taking place in the
79 minority of cells undergoing the route towards successful reprogramming. Single-cell sequencing
80 technologies provide tools to decipher the types of cells present in a heterogeneous mixture18. In the
81 present study, we adopted the parallel genome-wide single-cell assays of scRNA-Seq and scATAC-
82 Seq19–21 to profile transcriptome and chromatin accessibility of human reprogramming cells across
83 various stages. We identified cellular diversification and trajectories where individual cell displays
84 different dynamics and potential for reprogramming. Moreover, with a set of cell surface markers
85 and a fluorescent probe BDD2-C8, we were able to enrich for the early-intermediary cells
86 undergoing route towards successful reprogramming. In addition, we identified the modulators
87 driving the changes in gene regulatory network and chromatin accessibility, as cells advanced
88 towards the diverse reprogramming trajectories. Of note, the pivot from FOSL1 to TEAD4-centric
89 regulatory networks is essential for the acquisition of the pluripotent state.
90
91 RESULTS
92 Single-cell Profiling of Cell-Fate Reprogramming
93 To study the heterogeneity of human reprogramming, we analyzed a total of 33468 scRNA-
94 Seq and scATAC-Seq libraries with good quality, including day 0 (BJ), day 2 (D2), day 8 (D8),
95 day12 (D12) and day 16 (D16) OSKM-induced reprogramming cells (Figure 1a and Supplementary
96 Table 1). On D16, cells were sorted using the TRA-1-60 marker to distinguish the successfully
97 reprogrammed (D16+) from the non-reprogrammed (D16-) cells (Figures 1a and S1a-b). The
98 generated iPSCs were characterized with immunostaining, DNA methylation, and terotoma assay
99 (Figures S1c-e). Two distinct approaches were used for scRNA-Seq library preparation.
100 Microfluidic cell capture-based assay (Fluidigm C1) reads the full-length transcripts from hundreds
101 of cells with high resolution, whereas the droplet-based assay (10X Genomics) probes the 3’ end of 3
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
102 the transcripts from thousands of cells albeit at a relatively low genomic coverage. In addition, we
103 have also screened for fluorescence probes to distinguish early-intermediate cells poised for
104 successful reprogramming (Figure 1a). The cumulative data enable us to characterize the sub-
105 populations in-depth and to construct a trajectory map of human reprogramming.
106 Majority of the capture-based scRNA-Seq libraries showed high exon mapping percentage
107 (>=75%) and gene detection rate (>=20%) and displayed even distribution over the gene-bodies
108 (Figure 1b-c, and Supplementary Table 1). Furthermore, epithelial and pluripotency genes were
109 progressively expressed with the advancement of reprogramming, as opposed to the mesenchymal
110 and fibroblast genes (Figure S1f). Similarly, majority of the 10X libraries were of good quality
111 (Figures S1g-h). Notably, t-SNE plot revealed a dynamic transcriptomic transition from the parental
112 BJ to D16+ cells (Figure 1d). Expectedly, ZEB1 (mesenchymal) and COL1A1 (somatic) were
113 abundantly expressed in the early time-points and non-reprogrammed cells (Figures 1e-f). On the
114 contrary, EPCAM (epithelial), and NANOG and LIN28A (pluripotent) were expressed highly in the
115 successfully reprogrammed cells (Figures 1e-f and S1i). Likewise, most of the scATAC-Seq
116 libraries passed the previously reported QC indices19, and exhibited enrichment over TSS regions
117 and nucleosomal distributions (Figure 1g-i and S1j-k, and Supplementary Table 1). Collectively, we
118 generated reliable libraries for tens of thousands of reprogramming cells, providing a rich resource
119 to decipher its deep molecular mechanisms.
120 Deciphering the Heterogeneous Subgroups with Diverse Reprogramming Potentials
121 Due to its high sensitivity, capture-based scRNA-Seq libraries were analyzed first to
122 decipher the heterogeneity. CellNet22 revealed the dynamics of reduced fibroblast similarity and
123 increased ESC correlation (Figure S2a). To determine the diverse populations present at each
124 reprogramming time-point, we clustered scRNA-Seq libraries using Reference Component Analysis
125 (RCA)23. Interestingly, BJ cells correlated significantly with the smooth muscle lineage, which was
126 also detected across the published BJ libraries (Figures S2b-c). D2 cells were marked by four 4
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
127 distinct subgroups (G1-G4), among which G1-3 cells displayed lower correlation to the fibroblasts
128 and mesenchymal stem cells (MSCs) (Figures 2a and S2d). D8 cells were distributed among three
129 discrete sub-populations (Figures 2b and S2d). D8 G1 cells corresponded to the fibroblasts, smooth
130 muscles, myocytes, and MSCs lineages, whilst G3 cells displayed substantial similarity to the
131 pluripotent stem cells (PSCs). Interestingly, D8 G2 cells represented the intermediate state. Two
132 sub-populations were present in the D16+ cells (Figures 2c and S2d). D16+ G2 cells were highly
133 associated with PSCs, while G1 cells maintained detectable correlation to the MSCs, adipose cells,
134 and endothelial cells, other than PSCs. In summary, RCA analysis strongly indicates that the
135 reprogramming cells are highly diverse, some of which may deviate from the route to pluripotency
136 and acquire alternative lineage cell-fates.
137 We then performed differential gene expression (DGE) analysis to evaluate the subgroup
138 specific genes (Figure 2d and Supplementary Table 2). Among D2 subgroups, G3 cells had the
139 most distinct transcriptomic profile with exclusive expression of a remarkable number of genes,
140 including ERBB3 (Figures 2d-e). Majority of D2 G1-2 genes, for example CDK1, were expressed
141 highly in D16+ G2, suggestive of their higher reprogramming propensity (Figures 2d-f). Among D8
142 subgroups, G1-2 specific genes were expressed highly in BJ and D16- but not the D16+ cells, such
143 as JUNB, LUM, COL1A1, and COL6A3 (Figure 2d). The opposite trend was observed for D8 G2-3
144 genes. RFC3 was vastly expressed in D8 G2-3 cells, whereas GDF3 was specifically expressed in
145 the G3 (Figures 2d-f). In agreement with the correlation to the differentiated lineages, D16+ G1
146 specific genes, including MMP2, were also expressed highly in the D16- cells, suggesting that
147 D16+ G1 cells were at most partially reprogrammed (Figures 2d-f). On the other hand, epithelial
148 genes and pluripotent genes including CDH1, NANOG and LIN28A, as well as genes associated
149 with mRNA splicing and transcription of the small RNAs including WBP11 and POLR3K, were
150 specifically expressed in D16+ G2 cells. Intriguingly, depletion of POLR3K and WBP11 resulted in
151 ablated reprogramming efficiency, indicating the functional importance of the D16+ G2 specific
152 genes for reprogramming (Figure S2e). Additionally, D8 G3 and D16+ G2 specific genes exhibited 5
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
153 high stemness score for PSCs, whereas D8 G1-2 and D16+ G1 specific genes significantly
154 associated with the differentiated lineages (Figures S2f-g).
155 To test the notion of the diverse reprogramming potentials, D8 subgroups were correlated to
156 the subgroups of various time-points (Figure S2h). Interestingly, G3 cells highly correlated with
157 D16+ G2 cells, whereas G2 cells represented an intermediate state in which the cells were
158 moderately correlated with all D16 cells. G1 cells, on the other hand, strongly correlated with D16-
159 and cells from the early time-points (BJ and D2). In-house devised classifier displayed a similar
160 correlation trend of the reprogramming time-points to the D8 subgroups (Figure 2g).
161 Pseudotemporal trajectory of reprogramming cells
162 We next analyzed 10X libraries to construct pseudotemporal map of cellular reprogramming.
163 CellNet and RCA of 10X libraries reproduced the reprogramming dynamics and diverse subgroups
164 with variable lineage correlations (Figures S3a-c). Resultant pseudotemporal trajectories24,25
165 consisted of 9 states and 4 branching events (Figures 3a and S3d). Notably, pseudotime highly
166 correlated with the reprogramming time-points (Figures 3a-c). We then traced the trajectory of RCA
167 subgroups. Interestingly, majority of the D2 G1-3 cells were found in state 3 (95% of G1, 65% of
168 G2, and 80% of G3), whereas G4 cells scattered across the early states (Figure S3e). Joint RCA
169 analysis demonstrated that D2 G1-2 cells clustered closer to the D8 G2 cells and correlated stronger
170 to the ESC fate, indicating their higher reprogramming propensity (Figures S3f-g). Of note, D8 G1
171 cells enriched across the different states other than state 9 (successfully reprogrammed) (Figures 3d-
172 e). On the contrary, D8 G3 cells were mostly found in state 9. D8 G2 cells distributed across the
173 intermediate (3-5) and late states (7-9). Intriguingly, state 4 comprised almost entirely of D8 G2
174 cells (544 vs. 627) (Figures 3d-e). Expectedly, D16+ G2 cells were the major constituents of state 9,
175 whereas cells of D16+ G1 were enriched in both state 8 (non-reprogrammed) and state 9,
176 corroborating their partially- and non-reprogrammed identities (Figures 3d and S3h). Further,
177 subgroup specific markers were expressed differentially along the pseudotime axis (Figure 3f). 6
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
178 RFC3 (D8 G2-3) and GDF3 (D8 G3), and NANOG and LIN28A (D16+ G2) were expressed highly
179 in the cells on the successful reprogramming trajectory, whereas MMP2 (D16+ G1) showed the
180 opposite trend. Next, we determined the gene expression patterns and significant biological
181 processes defining the trajectory (Figure S3i and Supplementary Table 3). At branching event 1,
182 cells with high levels of lineage genes advanced towards successful reprogramming, whereas cells
183 with abundant DNA replication genes deviated from the path (Figures 3g-h and S3i). Interestingly,
184 successfully reprogrammed cells at branching point 4 were highly active for DNA replication and
185 mRNA splicing, which were implicated to be essential for reprogramming26,27. On the other hand,
186 collagen and extracellular matrix related genes emerged to be detrimental for reprogramming at the
187 branching point 3 and 4, and lineage processes such as angiogenesis and epidermis development
188 contributed to the unsuccessful reprogramming at branching point 4. Together, these analyses
189 provide a plethora of data for identifying the novel processes and modulators affecting the
190 reprogramming trajectory at an unprecedented single-cell resolution.
191 Toolkits to enrich for early-intermediate cells with diverse reprogramming potentials
192 To enrich for the intermediate cells with high reprogramming potential, we conducted a
193 screen for a library of 34 DOLFA28,29 fluorescent dyes (Figure S4a). Candidate dyes differentially
194 stain the intermediate reprogramming cells with the accelerated dynamics upon treatment with a
195 TGFβ inhibitor30,31, A83-01. BDD1-A2, BDD2-A6, and BDD2-C8 were identified as the top hits,
196 which showed co-localized signal with TRA-1-60 (Figures S4a-c). Importantly, D8 cells enriched
197 with the candidate probes gave rise to a significantly higher number of TRA-1-60+ colonies
198 (Figures 4a and S4d). Among them, BDD2-C8 displayed the best performance in an alternative
199 reprogramming of MRC5 fibroblasts (Figure S4e). BDD2-C8 also precisely captured changes in
200 reprogramming efficiency upon depletion of the key modulators15 (Figure S4f). Single-cell qPCR
201 showed that BDD2-C8+ cells expressed higher levels of epithelial and pluripotent genes, and lower
202 levels of mesenchymal and somatic genes (Figure 4b). To further characterize, we prepared 192
7
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
203 scRNA-Seq libraries for D8 cells stained highly (D8BDD2-C8+) and lowly (D8BDD2-C8-) for BDD2-C8.
204 In the ensuing RCA analysis, D8BDD2-C8+ and D8BDD2-C8- cells clustered close to the D8 G2-3 and G1
205 respectively, which were substantiated by the similar expression profiles for the subgroup specific
206 genes (Figures 4c-d and S4g-h). GO enriched for D8BDD2-C8+ specific genes were related to cell
207 cycle, embryo development and stem cell population maintenance, whereas D8BDD2-C8- genes were
208 predominantly represented by epithelial to mesenchymal transition, extracellular matrix
209 organization, and development processes (Figure 4e and Supplementary Table 4). In terms of
210 structural complexes, D8BDD2-C8- genes were specifically enriched in the endoplasmic reticulum (ER)
211 lumen and Golgi (Figure S4i). Interestingly, BDD2-C8 were localized in the ER and Golgi (Figure
212 S4j). In addition, higher expression of secretory genes in the D8BDD2-C8- cells implicated its active
213 ER-Golgi secretion pathway (Figure S4k). Depletion of these genes indeed resulted in the retention
214 of BDD2-C8 (Figures S4l-m). This indicates that BDD2-C8 may be actively effluxed from BJ and
215 the non-reprogrammed cells (D8 G1) but retained in the pluripotent cells and intermediate cells with
216 high reprogramming potential, due to the differential ER-Golgi secretion activities.
217 We next identified surface markers with differential expression among D8 subgroups to
218 enrich for the intermediate cells with varying reprogramming potentials (Supplementary Table 4).
219 Shortlisted surface markers displayed higher expression in D8 G1/G2 and D16- cells than D8 G3
220 and D16+ cells respectively (Figures 4f-g). Majority of them were abundantly enriched in the
221 parental BJ cells, except for CD201. Expression dynamics of the surface markers were validated by
222 time-course FACS analysis (Figure 4h). Further, D8 cells stained negatively for the surface markers
223 exhibited lower expression of mesenchymal markers but higher epithelial and pluripotency genes,
224 including the D8 G3 marker GDF3 (Figure 4i). Noteworthy, negatively stained D8 populations
225 gave rise to more TRA-1-60+ colonies, indicating the capacity of surface markers to isolate early
226 reprogrammed cells with high stemness feature (Figure 4j). We then examined the similarity of
227 cells sorted by BDD2-C8 and the identified surface markers. Indeed, D8BDD2-C8+ cells demonstrated
228 lower level of CD13, CD44, and CD201 (Figure 4k-l). Of note, difference in the CD201 protein 8
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
229 amount was subtler, which could be due to its inconsistent expression across the D8 subgroups (G1-
230 2 like) with BDD2-C8 (G2-3 like). Of note, co-staining of CD13 and CD44 markers showed
231 extensive overlaps across the time-points (Figures 4m and S4n). Surface markers were verified in
232 an alternative reprogramming of MSCs induced by Sendai viruses (Figure S4o-p). D8 sorted MSC
233 reprogramming cells demonstrated the similar expression trends as BJ reprogramming (Figures 4n
234 and S4q). Importantly, CD13- MSCs resulted in a remarkably higher reprogramming efficiency
235 with or without the aid of TGF-beta inhibition (Figure 4o). Collectively, the fluorescent dye and
236 surface markers established the technological platforms to enrich for the minority of intermediate
237 cells primed for successful reprogramming.
238 Refined classification of the intermediate population
239 To decipher the intermediate reprogramming cells, we prepared 10X libraries on D8 cells
240 sorted with CD13. Majority of the libraries passed the QC thresholds (Figure S5a). Expectedly,
241 libraries showed 2 distinct groups with differential CD13 expression (Figures 5a and S5b). Notably,
242 majority of the genes that were expressed highly in CD13+ cells were correspondingly expressed in
243 the D8BDD2-C8- cells and vice versa (Figure S5c). The libraries demonstrated 8 clusters. Clusters 5-8
244 were mainly comprised of CD13+ cells (Figures 5b and S5d). Consistently, CellNet analysis
245 showed that clusters 1-4 correlated with ESCs, whereas the other clusters to the fibroblast state
246 (Figure S5e). RCA corroborated the association of CD13+ cells to the MSCs and Fibroblast
247 lineages (resembling G1) (Figure 5c). Interestingly, among the CD13- cells, cluster 1 showed the
248 highest correlation score to the PSCs, indicating similarity to the D8 G3 cells (Figure 5c). Whereas,
249 cells of cluster 3 and 4 showed resemblance to both the MSCs and ESCs, albeit at a lower
250 significance than cluster 1 in terms of the ESCs (Figure 5c). On the other hand, cells of cluster 2
251 and 7 exhibited a transitional intermediate profile (Figure 5c). We next used MAGIC32 for pairwise
252 comparison between CD13 and GDF3. Interestingly, cells with lower CD13 levels exhibited higher
253 expression of GDF3 and NANOG, among which cluster 1 and 4 exhibited the highest GDF3
9
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
254 expression (Figures 5d-e and S5f). Next, to trace the trajectory of the CD13 sub-clusters, we
255 aggregated the CD13 sorted 10X libraries with the other timepoints and performed pseudotemporal
256 analysis. Remarkably, D8 CD13+ and CD13- cells were enriched at the distinct trajectory states
257 (Figure 5f). Cells of cluster 1 and 4 concentrated in the branch shared by D16+ cells. Notably, cells
258 of cluster 3 were found at the unsuccessful reprogramming branch, whereas cluster 2 cells were
259 scattered.
260 DGE analysis demonstrated that cluster 7 had unique expression trends among the CD13+
261 clusters, which associated with epidermis development, and I-KB/NF-KB signaling (Figures 5g-h).
262 On the other hand, cluster 5, 6 and 8 specific genes related to extracellular matrix organization and
263 collagen catabolic process (Figure 5h). cluster 1 and 4 specific genes related to cell division,
264 heterochromatin assembly, embryonic development, organ development of multiple lineages, and
265 MAPK cascade, suggesting that reprogramming cells of cluster 1 and 4 might adopt an open
266 chromatin structure, especially at genes crucial for multi-lineage development and differentiation.
267 Mathematical imputation showed an extensive correlation between CD13 and CD44 (Figures S5g-
268 h). Interestingly, we detected a group of cells which were low in CD13 but high in CD201,
269 representing the D8 G2-like intermediate cells belonging mostly to cluster 2 and 7 (Figures 5i, and
270 S5i). To substantiate their existence, we performed time-course co-staining using CD13 and CD201
271 antibodies. A significant number of intermediate cells were CD13-CD201+, and their proportion
272 increased as reprogramming advanced (Figures 5j and S5j). Furthermore, CD13-CD201+ cells
273 corresponded to higher BDD2-C8 staining signals than CD13+CD201+ cells (Figure S5k). Hence,
274 we hypothesized that dual sorting with CD13 and CD201 markers would allow us to enrich for
275 successfully reprogrammed early-intermediary cells with higher purity.
276 We thus categorized D8 cells into double negative (CD13-CD201-), double positive
277 (CD13+CD201+) and intermediate (CD13-CD201+) cells, which were then subjected to gene
278 measurement (Supplementary Table 5). CD13+CD201+ cells expressed fibroblast and
10
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
279 mesenchymal genes and genes associated with extracellular matrix and cell adhesion (Figures 5k-l).
280 On the contrary, CD13-CD201- cells expressed genes related to pluripotency, epithelial lineage, cell
281 division, neuronal differentiation, and stem cell population maintenance. CD13-CD201+ cells
282 exhibited an intermediate transcription profile (Figures 5k-l). In addition, genes highly expressed in
283 CD13+CD201+ cells were mostly enriched in the D16- but not D16+ cells, whereas the opposite
284 was observed for CD13-CD201- specific genes (Figure S5l). Notably, depletion of genes highly
285 expressed in the CD13+CD201+ population resulted in more reprogrammed colonies (Figure 5m).
286 Importantly, enrichment of CD13-CD201- population gave rise to the highest reprogramming
287 efficiency, in comparison with the CD13-CD201+ (intermediate) and CD13+CD201+ cells (lowest)
288 (Figure 5n). We validated the presence of these distinct D8 populations in an alternative
289 reprogramming of BJ cells using Sendai viruses (Figures S5m-o). Altogether, concurrent use of
290 CD13 and CD201 antibodies enable us to dissect the precise populations differentially poised for
291 successful reprogramming.
292 Regulatory networks of Transcription Factors in cellular reprogramming
293 Transcription Factors (TFs) define the cell-selective regulatory network underlying the
294 cellular identity and function33. However, the stage-specific core regulatory networks of the human
295 cellular reprogramming remain elusive. To this end, we performed DGE analysis for TFs across the
296 pseudotemporal states, which were then categorized as Early Silenced, Late Silenced, Transient,
297 Early Expressed and Late Expressed (Figures 6a and S6a, and Supplementary Table 6). Notably,
298 many TFs exhibited similar expression trends in the D8 RCA subgroups and the D8 sorted libraries.
299 Specifically, Early Silenced TFs (e.g. FOSL1, CREB3L1, AHRR, DRAP1 and ELL2) showed higher
300 expression in the D8 BDD2-C8- and CD13+CD201+ populations, whereas Late Silenced TFs
301 exhibited the opposite trends (Figures 6a and S6b). Majority of Transient TFs exhibited higher
302 expression in CD13+CD201+ and CD13-CD201+ populations, including lineage associated factors
303 namely HAND1 (mesoderm), ASXL3 and NEUROG2 (neuroectoderm) (Figures 6a and S6c).
11
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
304 Contrastingly, Early Expressed TFs adopted higher expression in the cells of CD13-CD201- and
305 CD13-CD201+ (Figures 6a and S6d). CD13-CD201- cells expressed the highest level of Late
306 Expressed TFs, such as PRDM14, DNMT3B and LHX6.
307 To investigate how the regulatory TFs accessed its genomic targets, we then analyzed time-
308 course scATAC-Seq libraries of reprogramming cells (Figure 1a). Both batches of libraries showed
309 similar promoter accessibility profiles (Figure S6e). We also observed diminished accessibility on
310 the fibroblast-specific DHS and increased accessibility on the pluripotency-specific DHS as
311 reprogramming progressed (Figure S6f). Next, we utilized chromVAR34 to identify the TFs
312 determining the variable epigenomes accessibility. Correlation of scATAC-Seq libraries among
313 themselves resulted in three major clusters. The early cells consisting mostly of BJ and D2 cells
314 clustered together (Cluster II), while D8, D16+ and H1 cells shared similar accessibility profile
315 (Cluster I). A third cluster composed mainly of D8 and D16- cells (Cluster III) (Figure 6b). of note,
316 FOSL1 and its partners, CEBPA, ZEB1, PAX6, SOX8, SOX10, POU5F1 and TEAD4 were found
317 to be the TFs contributing to the reprogramming heterogeneity (Figure 6c and Supplementary Table
318 6). TFs were then categorized to open starting from the intermediate stage (Early open) or the late
319 stage (Late open), open transiently at the intermediate stage (Transient open), and close starting
320 from the intermediate stages of reprogramming (Close) which were either homogenously or
321 heterogeneously open in the early cells (Figure 6d and 6e). Particularly, Close TFs belonged mostly
322 to the FOS-JUN-AP1 complex, such as FOSL1 and JDP2 (Figures 6f and S6g). Intriguingly, motifs
323 of lineage specifiers (Mesendoderm: GATA1, SOX8 and HNF4G; Ectoderm: PAX6, NRL and
324 RXR) were found to be accessible only in the intermediate reprogramming cells, but not the H1
325 hESCs (Figures 6d, 6g and S6h). This observation corroborated with the model of counteracting
326 lineage specification networks underlying the induction of pluripotency35,36. On the contrary, Early
327 Open TFs (e.g. TEAD4 and POU5F1) exhibited accessibility starting from the D8 reprogramming
328 cells, whilst Late Open TFs (e.g. FOXL1, TCF4, and YY2) demonstrated accessibility in D16+
329 cells and ESCs exclusively (Figures 6d, 6h-i, S6i-k). Of note, TFs of FOX family and YY1 were 12
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
330 previously reported to be important for reprogramming37–39. Independently, we used SCENIC
331 analysis to infer the interaction between TFs and their target genes. Corroborating with our earlier
332 results, Silenced and Close TFs showed decrease in their regulon activity as reprogramming
333 progressed, whereas majority of Expressed and Open TFs displayed the opposite dynamics (Figure
334 S6l). Likewise, regulon activity of the Transient TFs was observed only in the intermediate cells.
335 Together, these data represent the compendium of TFs which regulate the networks of downstream
336 key modulators in the cellular reprogramming process.
337 Identification of key regulators for the intermediate stage of reprogramming
338 In order to deduce TFs essential for the intermediate cells to acquire pluripotency, we
339 further sub-clustered the D8 scATAC-Seq libraries (Figures S7a-b). Intriguingly, the most variable
340 motifs of D8 cells belonged to the FOS-JUN-AP1 and TEAD families (Figure 7a and
341 Supplementary Table 7). Strikingly, D8 cells were either accessible for FOSL1-JUN-AP1 or
342 TEAD4 motif (Figures 7b-c). Of note, FOSL1 and TEAD4 displayed a contrasting expression
343 pattern and regulon activity during reprogramming (Figures 7d-g). In D8, FOSL1 and TEAD4 were
344 exclusively expressed in the CD13+ and CD13- cells (Figure 7h).
345 Given its expression and motif accessibility at the early stage of reprogramming, FOSL1
346 was hypothesized to act as a roadblock. To verify, reprogramming efficiency was assessed upon the
347 depletion of FOSL1 by siRNA, which showed no effect on cell proliferation and lasted for about 4
348 days (Figures S7c-e). Remarkably, depletion of FOSL1 at day 1, 2, 3, or 5 post-OSKM induction
349 resulted in an increased reprogramming efficiency (Figure 7i). Specificity of the FOSL1 siRNA
350 construct was validated using a mutant construct which showed no effect (Figures S7f-g). Depletion
351 of FOSL1 in an alternative Sendai virus reprogramming reproduced the phenotypic change (Figures
352 S7h-i). Conversely, ectopic expression induced a drastic reduction in the number of reprogrammed
353 colonies (Figure 7j). Noteworthy, high reprogramming propensity of CD13- cells was negated by
354 the elevated FOSL1 level (Figure 7k). 13
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
355 Contrastingly, we posit that TEAD4 could serve as an effector. Indeed, depletion of TEAD4
356 resulted in the reduced number of reprogrammed colonies (Figure 7l). Specificity of siTEAD4 was
357 affirmed by the mutant construct with no phenotypic change (Figures S7j-k). Knock-down effect of
358 siTEAD4 lasted for around 5 days (Figure S7l). Interestingly, when knock-down was performed on
359 D5 cells, TEAD4 expression did not restore, which could be due to the perpetuations of the non-
360 reprogrammed state introduced by siTEAD4 (Figure S7m). Notably, depletion of TEAD4 only from
361 day 5 onwards induced the decreased reprogramming efficiency (Figure 7l). This established the
362 vital role of TEAD4 at the intermediate-late stages of reprogramming. In contrast, overexpression
363 of TEAD4 resulted in a significant increase in the number of reprogrammed colonies (Figure 7m).
364 Importantly, depletion of TEAD4 revoked the reprogramming potential of D8 CD13- cells (Figure
365 7n). Collectively, these analyses illustrate that the pivot from FOSL1 to TEAD4-directed regulatory
366 network is essential for successful reprogramming.
367 Mechanistic roles of FOSL1 and TEAD4 in cellular reprogramming
368 To find the direct targets of FOSL1, we prepared ChIP-Seq libraries for D8 cells, which
369 demonstrated the expected genomic distribution profile and enriched with FOSL1 motif and its
370 regulatory partners (Figures S8a-c). We also investigated the genomic binding profile of TEAD4 in
371 D8 cells and CD13 sorted D8 cells (Figures S8d-e). Majority of the D8 TEAD4 bound sites were
372 shared by both CD13+ and CD13- cells (Figures S8f). 1.7-fold higher numbers of TEAD4 bound
373 loci were detected in the CD13- than CD13+ cells (18550 vs. 10986 sites). In addition, CD13-
374 specific sites showed greater enrichment of TEAD4 motif and exclusive enrichment of pluripotency
375 associated OCT4-SOX2-TCF-NANOG motif, indicating the differential regulatory role of TEAD4
376 (Figures S8g).
377 Consistent with their contrasting roles, majority of loci highly enriched with FOSL1 were
378 distinctive from that with TEAD4 (Figure 8a). We next clustered BJ, D16- and D16+ scATAC-Seq
379 libraries, based on the accessibility of FOSL1 and TEAD4 bound sites (Figure 8b). Notably, FOSL1 14
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
380 specific sites exhibited higher accessibility in BJ and D16- cells, whereas CD13- specific and CD13
381 common TEAD4 bound sites were mostly accessible in D16+ cells (Figures 8b-c and S8h-i). these
382 differential accessible sites were annotated as the functional FOSL1 and TEAD4 targets
383 (Supplementary Table 8). Similar to the motif enrichment pattern, most of the D8 cells were either
384 accessible for FOSL1 (“FOSL1 ChIP only” cells) or CD13- TEAD4 bound sites (“TEAD4 ChIP
385 only” cells) (Figures 8d-e and S8k-l). We next asked if their binding has any consequence to the
386 transcription of target genes. To this end, we analyzed the expression of functional FOSL1 and
387 TEAD4 targets across the D8 CD13-CD201 sorted cells. Remarkably, majority of the D8 FOSL1
388 targets (e.g. TGFBR2, SMAD3, and COL5/7/21A1) were expressed highly in the CD13+CD201+
389 cells (Figure 8f). In contrast, D8 CD13-CD201- cells demonstrated high expression of the TEAD4
390 targets, including key pluripotent genes such as DNMT3B, LIN28A, SOX2, and PODXL which
391 codes for TRA-1-60 (Figure 8g). This was further substantiated at single-cell resolution using the
392 coupled NMF analysis of D8 scRNA-Seq and scATAC-Seq libraries, by which regulatory
393 connectivity between the accessible chromatin regions and expression of target genes were
394 established (Figure S8m). Notably, cluster 1 composed of “TEAD4 ChIP only” cells (scATAC-Seq)
395 and D8 RCA G3 cells (scRNA-Seq), whilst cluster 2 comprised of “FOSL1 ChIP only” cells
396 (scATAC-Seq) and D8 RCA G1 (scRNA-Seq) cells. Differential analysis demonstrated that
397 TEAD4 and its targets, such as TET1 and CDH1, were highly accessible and expressed in the
398 cluster 2 cells, whereas FOSL1 and its targets, such as MMP2 and SMAD3, displayed the opposite
399 trends (Figures S8m-n). Besides, interactome analysis revealed that cluster 1 genes associated with
400 splicing process, whereas cluster 2 genes related to ER-Golgi transport and extracellular matrix
401 organization, including FOSL1 targets MMP2 and several collagen genes (Figure S8o). More
402 importantly, downstream targets of FOSL1 and TEAD4 were themselves modulators of
403 reprogramming. For instance, knock-down of FOSL1 bound genes, MMP2 and SMAD3, resulted in
404 higher numbers of the reprogrammed colonies (Figure 8h). Conversely, depletion of TEAD4 targets,
405 PRDM14 and SOX2, showed the opposite effect (Figure 8i).
15
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
406 Taken together, we present the single-cell roadmap of the human cellular reprogramming
407 process, which reveals the diverse cell-fate trajectory of individual reprogramming cells (Figure 8j).
408 D2 cells consist of four subgroups, out of which CDK1+ cells showed greater propensity for
409 reprogramming. Among the three subgroups of D8 cells, G1 represents the unsuccessful
410 reprogramming cells, whereas G3 cells with GDF3 expression are highly primed for successful
411 reprogramming. G2 is the intermediary group between G1 and G3. In-house developed fluorescent
412 probe (BDD2-C8) and the identified surface markers (CD13, CD44 and CD201), enables the
413 segregation of the heterogeneous population based on their reprogramming potential. Moreover,
414 TFs analysis reveals the stage-specific regulatory networks of reprogramming. Importantly, we
415 describe the crucial switch from a FOSL1 to a TEAD4-centric expression which collectively
416 regulate genomic accessibility, cell-lineage transcription program, and network of functional
417 downstream modulators favoring the acquisition of the pluripotent state.
418
419 DISCUSSION
420 Heterogeneity of Cellular Reprogramming
421 Due to the inherent heterogeneity, Single-cell NGS techniques are more suited to
422 characterize the molecular and epigenetics signatures of reprogramming cells with diverse
423 trajectories. Several studies described the mouse reprogramming process at single-cell resolution40–
424 42. However, there have been reports for extensive differences between the mouse and human
425 reprogramming in terms of kinetics, modulators involved, and the molecular nature of the generated
426 iPSCs. In addition, this is the first study presenting a comprehensive human reprogramming
427 roadmap by integrating the transcriptomic changes and the alteration of accessible epigenetic
428 regions at single-cell resolution.
16
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
429 Our analyses also identify an early reprogramming marker GDF3, which marks the
430 intermediate cells primed for successful reprogramming. GDF3 was previously shown to be
431 expressed in pluripotent stem cells and played a role in regulating cell fate via BMP signaling
432 inhibition43. We also develop a fluorescent probe, BDD2-C8, and identify a panel of cell surface
433 markers, namely CD13, CD44 and CD201, to distinguish and characterize the diverse sub-
434 populations of the intermediate reprogramming cells. Interestingly, CD44 sorting was previously
435 reported as a mean to isolate reprogrammed cells in the mouse system44. Noteworthy, combinatorial
436 use of the surface markers enables a more refined segregation. The toolkits to decipher the
437 intermediate cells with different stemness capacity, will help deepen our understanding of the
438 mechanisms of reprogramming process. Additionally, the ability to enrich for early reprogramming
439 cells will help increase the success rate of iPSC generation from the cell lines or patient-derived
440 primary cells which are refractory to reprogramming.
441 Transcription Factors Contribute to the Heterogeneity of Cellular Reprogramming
442 Facilitated by the integrative analysis of transcriptomic and chromatin accessibility profiles,
443 TFs contributing to the heterogeneous reprogramming trajectories are identified. Accessible regions
444 with the motifs of FOS-JUN-AP1 and CEBPA are rapidly closed upon induction, which were
445 reported to act as repressors in mouse reprogramming45,46 (Figure 6). An earlier report showed that
446 FOSL1 lost many of its binding as early as day 2 of mouse reprogramming46. Our study reveals that,
447 in the human system, FOSL1 regulates myriad of genes which serve as barriers of reprogramming,
448 including MMP2, SMAD3, TGFBR2, and collagen genes. Strikingly, chromatin regions with the
449 motifs of lineage TFs are open in the intermediate cells, which might be due to the induction of ME
450 and ECT lineages driven by POU5F1 and SOX247. This is further supported by the replacement of
451 POU5F1 and SOX2 with the ME and ECT lineage specifiers, for both mouse and human
452 reprogramming35,36. We unravel for the first time the transitory epigenetic accessibility directed by
17
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
453 the lineage TFs, which contribute extensively to the diverse cell fate potentials observed during
454 cellular reprogramming.
455
456 REFERENCE
457 1. Loh, Y.-H. et al. Reprogramming of T Cells from Human Peripheral Blood. Cell Stem Cell 7, 458 15–19 (2010).
459 2. Park, I. H. et al. Reprogramming of human somatic cells to pluripotency with defined factors. 460 Nature 451, 141–146 (2008).
461 3. Takahashi, K. et al. Induction of Pluripotent Stem Cells from Adult Human Fibroblasts by 462 Defined Factors. Cell 131, 861–872 (2007).
463 4. Takahashi, K. & Yamanaka, S. Induction of Pluripotent Stem Cells from Mouse Embryonic 464 and Adult Fibroblast Cultures by Defined Factors. Cell 126, 663–676 (2006).
465 5. Tan, H.-K. et al. Human Finger-Prick Induced Pluripotent Stem Cells Facilitate the 466 Development of Stem Cell Banking. Stem Cells Transl. Med. 3, 586–598 (2014).
467 6. Yu, J. et al. Induced pluripotent stem cell lines derived from human somatic cells. Science 468 318, 1917–20 (2007).
469 7. Seah, Y. F. S., El Farran, C. A., Warrier, T., Xu, J. & Loh, Y. H. Induced pluripotency and 470 gene editing in disease modelling: Perspectives and challenges. Int. J. Mol. Sci. 16, 28614– 471 28634 (2015).
472 8. Cheow, L. F. et al. Single-cell multimodal profiling reveals cellular epigenetic heterogeneity. 473 Nat. Methods 13, 833–836 (2016).
474 9. Polo, J. M. et al. A molecular roadmap of reprogramming somatic cells into iPS cells. Cell 475 151, 1617–1632 (2012).
476 10. Stadtfeld, M. & Hochedlinger, K. Induced pluripotency: History, mechanisms, and 477 applications. Genes and Development 24, 2239–2263 (2010).
478 11. Maherali, N. et al. A High-Efficiency System for the Generation and Study of Human 479 Induced Pluripotent Stem Cells. Cell Stem Cell (2008). doi:10.1016/j.stem.2008.08.003
480 12. Merkl, C. et al. Efficient Generation of Rat Induced Pluripotent Stem Cells Using a Non- 481 Viral Inducible Vector. PLoS One 8, (2013).
482 13. Borkent, M. et al. A Serial shRNA Screen for Roadblocks to Reprogramming Identifies the 483 Protein Modifier SUMO2. Stem Cell Reports 6, 704–716 (2016).
484 14. Qin, H. et al. Systematic identification of barriers to human Ipsc generation. Cell 158, 449– 485 461 (2014).
18
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
486 15. Toh, C. X. D. et al. RNAi Reveals Phase-Specific Global Regulators of Human Somatic Cell 487 Reprogramming. Cell Rep. 15, 2597–2607 (2016).
488 16. Yang, C. S., Chang, K. Y. & Rana, T. M. Genome-wide Functional Analysis Reveals Factors 489 Needed at the Transition Steps of Induced Reprogramming. Cell Rep. 8, 327–337 (2014).
490 17. Fang, H. T. et al. Global H3.3 dynamic deposition defines its bimodal role in cell fate 491 transition. Nat. Commun. 9, (2018).
492 18. Gawad, C., Koh, W. & Quake, S. R. Single-cell genome sequencing: Current state of the 493 science. Nature Reviews Genetics 17, 175–188 (2016).
494 19. Buenrostro, J. D. et al. Single-cell chromatin accessibility reveals principles of regulatory 495 variation. Nature 523, 486–490 (2015).
496 20. Islam, S. et al. Characterization of the single-cell transcriptional landscape by highly 497 multiplex RNA-seq. Genome Res. 21, 1160–1167 (2011).
498 21. Ramsköld, D. et al. Full-length mRNA-Seq from single-cell levels of RNA and individual 499 circulating tumor cells. Nat. Biotechnol. 30, 777–782 (2012).
500 22. Cahan, P. et al. CellNet: Network biology applied to stem cell engineering. Cell 158, 903– 501 915 (2014).
502 23. Li, H. et al. Reference component analysis of single-cell transcriptomes elucidates cellular 503 heterogeneity in human colorectal tumors. Nat. Genet. 49, 708–718 (2017).
504 24. Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by 505 pseudotemporal ordering of single cells. Nat. Biotechnol. 32, 381–386 (2014).
506 25. Qiu, X. et al. Reversed graph embedding resolves complex single-cell trajectories. Nat. 507 Methods 14, 979–982 (2017).
508 26. Tsubouchi, T. et al. DNA synthesis is required for reprogramming mediated by stem cell 509 fusion. Cell 152, 873–883 (2013).
510 27. Lu, Y. et al. Alternative splicing of MBD2 supports self-renewal in human pluripotent stem 511 cells. Cell Stem Cell (2014). doi:10.1016/j.stem.2014.04.002
512 28. Jeong, Y. M. et al. CDy6, a Photostable Probe for Long-Term Real-Time Visualization of 513 Mitosis and Proliferating Cells. Chem. Biol. (2015). doi:10.1016/j.chembiol.2014.11.018
514 29. Yun, S.-W. et al. Neural stem cell specific fluorescent chemical probe binding to FABP7. 515 Proc. Natl. Acad. Sci. (2012). doi:10.1073/pnas.1200817109
516 30. Tan, F., Qian, C., Tang, K., Abd-Allah, S. M. & Jing, N. Inhibition of transforming growth 517 factor β (TGF-β) signaling can substitute for Oct4 protein in reprogramming and maintain 518 pluripotency. J. Biol. Chem. (2015). doi:10.1074/jbc.M114.609016
519 31. Ruetz, T. et al. Constitutively Active SMAD2/3 Are Broad-Scope Potentiators of 520 Transcription-Factor-Mediated Cellular Reprogramming. Cell Stem Cell (2017). 521 doi:10.1016/j.stem.2017.10.013
19
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
522 32. van Dijk, D. et al. Recovering Gene Interactions from Single-Cell Data Using Data Diffusion. 523 Cell (2018). doi:10.1016/j.cell.2018.05.061
524 33. Neph, S. et al. Circuitry and dynamics of human transcription factor regulatory networks. 525 Cell (2012). doi:10.1016/j.cell.2012.04.040
526 34. Schep, A. N., Wu, B., Buenrostro, J. D. & Greenleaf, W. J. ChromVAR: Inferring 527 transcription-factor-associated accessibility from single-cell epigenomic data. Nat. Methods 528 14, 975–978 (2017).
529 35. Shu, J. et al. XInduction of pluripotency in mouse somatic cells with lineage specifiers. Cell 530 (2013). doi:10.1016/j.cell.2013.05.001
531 36. Montserrat, N. et al. Reprogramming of human fibroblasts to pluripotency with lineage 532 specifiers. Cell Stem Cell (2013). doi:10.1016/j.stem.2013.06.019
533 37. Koga, M. et al. Foxd1 is a mediator and indicator of the cell reprogramming process. Nat. 534 Commun. (2014). doi:10.1038/ncomms4197
535 38. Takahashi, K. et al. Induction of pluripotency in human somatic cells via a transient state 536 resembling primitive streak-like mesendoderm. Nat. Commun. (2014). 537 doi:10.1038/ncomms4678
538 39. Gabut, M. et al. An alternative splicing switch regulates embryonic stem cell pluripotency 539 and reprogramming. Cell (2011). doi:10.1016/j.cell.2011.08.023
540 40. Buganim, Y. et al. Single-cell expression analyses during cellular reprogramming reveal an 541 early stochastic and a late hierarchic phase. Cell 150, 1209–1222 (2012).
542 41. Schiebinger, G. et al. Optimal-Transport Analysis of Single-Cell Gene Expression Identifies 543 Developmental Trajectories in Reprogramming. Cell (2019). doi:10.1016/j.cell.2019.01.006
544 42. Guo, L. et al. Resolving Cell Fate Decisions during Somatic Cell Reprogramming by Single- 545 Cell RNA-Seq. Mol. Cell (2019). doi:10.1016/j.molcel.2019.01.042
546 43. Levine, A. J. & Brivanlou, A. H. GDF3, a BMP inhibitor, regulates cell fate in stem cells and 547 early embryos. Development 133, 209–16 (2006).
548 44. O’Malley, J. et al. High-resolution analysis with novel cell-surface markers identifies routes 549 to iPS cells. Nature 499, 88–91 (2013).
550 45. Liu, J. et al. The oncogene c-Jun impedes somatic cell reprogramming. Nat. Cell Biol. 17, 551 856–867 (2015).
552 46. Chronis, C. et al. Cooperative Binding of Transcription Factors Orchestrates Reprogramming. 553 Cell 168, 442-459.e20 (2017).
554 47. Loh, K. M. & Lim, B. A precarious balance: Pluripotency factors as lineage specifiers. Cell 555 Stem Cell (2011). doi:10.1016/j.stem.2011.03.013
556
557 ACKNOWLEDGMENTS
20
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
558 We are grateful for Ahad Khalilnezhad for technical assistance and Lin Yang and Samantha Seah
559 for helpful discussion. H.L. is supported by Glenn Foundation for Medical Research, Mayo Clinic
560 Center for Biomedical Discovery and Mayo Clinic Cancer Center. Y-H.L. is supported by the [NRF
561 Investigatorship award -NRFI2018-02] and [JCO Development Programme Grant - 1534n00153]
562 grants. Y-H.L. and L-F.Z. are supported by the Singapore National Research Foundation under its
563 Cooperative Basic Research Grant administered by the Singapore Ministry of Health’s National
564 Medical Research Council [NMRC/CBRG/0092/2015]. We are grateful to the Biomedical Research
565 Council, Agency for Science, Technology and Research, Singapore for research funding.
566
567 AUTHOR CONTRIBUTIONS
568 Contribution: Q.R.X, C.E.F designed and performed research, analyzed data, and wrote the
569 paper; P.G, Y.S.C, C.X.D.T, T.W designed and conducted research; N.Y.K, S.S, Y.T.C, J.X, J.C,
570 H.L, L.F.Z analyzed data; and Y.H.L. designed research, analyzed data, and wrote the paper.
571
572 AUTHOR INFORMATION
573 The authors declare no competing financial interests.
574
575 FIGURE LEGENDS
576 Figure 1. Schematic of the single-cell systems used for de-convoluting the heterogeneity in
577 human cellular reprogramming
578 (a) Overview of the prepared single-cell NGS libraries at the indicated time-points of the human
579 cellular reprogramming. Two single-cell platforms were utilized for the study. Microfluidics
580 platform was used to prepare 439 single cell RNA-Seq and 891 single cell ATAC-Seq libraries
21
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
581 (Duplicates). 10x Genomics platform provided us with an additional 32138 cells for scRNA-Seq
582 analysis. (b) Quality control of Microfluidics-capture based scRNA-Seq libraries. Dotplot
583 demonstrates the exon mapping percentage (X-axis) of each scRNA-Seq libraries, along with its
584 corresponding detected gene rates (Y-axis). Axis crosses each other at the cutoff values that filter
585 the libraries. Blue dots represent libraries that passed the QC filters. (c) Average enrichment of
586 scRNA-Seq libraries over genebodies. (d) t-SNE plot of the prepared 10x scRNA-Seq libraries
587 based on total cellular transcriptomes. (e-f) Super-imposition of the single-cell expression levels of
588 MET genes: ZEB1 and EPCAM (e), and fibroblast and pluripotent genes: COL1A1 and NANOG (f),
589 on the tSNE plot. The expression ranges from no expression (grey) to high expression (red). (g)
590 Quality control of scATAC-Seq libraries. Dotplot demonstrating the library size (X-axis) of each
591 scATAC-Seq library, along with its corresponding contribution to its respective time-point’s HARs
592 (Y-axis). Red dots represent the cells that pass the QC filters. (h) Average enrichment profile of a
593 D16+ scATAC-Seq library around Transcription Start Sites (TSS) of the genome with a window of
594 -3K to 3K. Y-axis denotes the average normalized read counts of the library over the indicated
595 region in the genome (x-axis). (i) Histograms of insert size metric of a D16+ scATAC-Seq library
596 revealing a nucleosomal pattern, which is characteristic of a good ATAC-Seq library.
597 Figure 2. Identification of diverse reprogramming subgroups
598 (a-c) RCA clustering of D2 (a), D8 (b), and D16+ (c) cells based on the profile of the expressed
599 genes. The PCAs show subgroups of reprogramming cells at individual time points. Each color
600 represents a subgroup. (d) Heatmap showing dynamics of the genes differentially expressed among
601 4 subgroups of D2, 3 subgroups of D8, and 2 subgroups of D16+ cells, across the reprogramming
602 process. Color represents the expression level, ranging from dark blue (low) to dark red (high). The
603 expression values are Log10 (FPKM+1). The color code on top represents the time points (above)
604 and their respective subgroups (below). (e) Bar charts demonstrating the expression pattern of D2
605 (left), D8 (middle), and D16+ (right) subgroup specific genes. Color represents the expression level.
606 X-axis represents the percentage of cells with the expression within the indicated range. Red dotted 22
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
607 boxes highlight the subgroups with high expression of the respective gene. (f) Box plots showing
608 the single cell expression of the differentially expressed genes CDK1 (D2), GDF3 (D8), MMP2
609 (D16+), and LIN28A (D16+) across all the time points and their respective subgroups. Lines in the
610 box represent the median expression in the respective subgroups or time points. The expression
611 values are Log10 (FPKM+1). (g) Confusion matrix of the scRNA-seq libraries of all time points,
612 using Random Forest algorithm. Color represents the number of libraries from each time point
613 predicted to be similar to the indicated subgroups of D8. Scale ranges from dark blue (low number)
614 to dark red (high number).
615 Figure 3. Trajectories of cellular reprogramming
616 (a) Trajectory of reprogramming cells identified from the 10x scRNA-Seq libraries based on
617 DDRTree dimension reduction. Color represents the time points. (b) Pseudotime calculation based
618 on the DDRTree trajectory. Color indicates pseudotime, ranging from dark blue (early) to light blue
619 (late). The plot shows four branching events. “+succ” and “-succ” represents the successful and
620 unsuccessful braches. (c) Stacked columns indicating the percentage of cells belonging to the
621 respective time-points in the reprogramming trajectory states. Color represents the time points. (d)
622 Super-imposition of D8 subgroups (left) and D16+ subgroups (right) on the trajectory of
623 reprogramming. Color represents the subgroups. (e) Stacked columns revealing the percentage of
624 cells of the respective D8 subgroups in the indicated reprogramming trajectory states. Colors
625 represent the subgroups of D8 and grey color indicates cells of the other time points. (f) Super-
626 imposition of the expression of D8 subgroup genes (RFC3, GDF3) and D16+ subgroup genes
627 (NANOG, MMP2 and LIN28A) on the reprogramming trajectories. Expression ranges from purple
628 (low) to yellow (high). (g-h) GO analysis of the differentially expressed genes associated with the
629 unsuccessful (g) and successful (h) reprogramming branches. The enrichment score ranges from
630 white (no) to red (high). The braches are labelled in Fig.3b.
23
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
631 Figure 4. Identification of chemical dyes and surface markers for early-intermediate
632 reprogramming cells
633 (a) Quantification of TRA-1-60+ colonies yielded from D8 cells sorted with BDD2-C8 (top).
634 Representative images are shown below. n=3; error bar indicates SD. (b) Gene expression in the D8
635 BDD2-C8+ and BDD2-C8- cells, measured by single cell qRT-PCR. Student t-test (2-tails) was
636 applied (p<0.05). (c) RCA plot of D8, D8BDD2-C8+ and D8BDD2-C8- cells. (d) Expression heatmap of
637 the D8 differential genes in D8BDD2-C8+ and D8BDD2-C8- cells. Color code on top represents the time
638 points (above) and the subgroups or sorted cells (below). (e) GO terms enriched by the differential
639 genes of D8BDD2-C8+ and D8BDD2-C8- cells. (f) Heatmap showing the expression dynamics of the
640 surface markers. Color code on top represents the time point and their respective subgroups. (g)
641 Boxplots showing the surface markers expression across the time points and their respective
642 subgroups. Lines in the box represent the median expression. (h) Stacked histograms (top) showing
643 the fluorescence intensity (X-axis) of the surface markers. Red dotted boxes highlight the positively
644 stained cells. Quantifications are shown below. (i) Barchart exhibiting the relative expression levels
645 in the D8 sorted cells, normalized to that of D8 CD44+ cells. n=2; error bar indicates SD. (j)
646 Quantification of TRA-1-60+ colonies yielded from the D8 sorted cells. Representative images are
647 shown below. n=2; error bar indicates SD. (k) Boxplots showing the surface marker expression in
648 the D8BDD2-C8+ and D8BDD2-C8- cells. Lines represent the median expression. (l) Overlaid histograms
649 showing the surface marker staining intensity in the BDD2-C8+ and BDD2-C8- cells. Red dotted
650 boxes highlight the positively stained cells. The numbers on top indicate the percentage of
651 positively stained cells. (m) Barchart showing the distribution of co-staining signals of CD13 and
652 CD44 in the cells of various reprogramming time points and cell lines. (n) Barcharts exhibiting the
653 relative expression of the collagen/ mesenchymal genes (top) and pluripotent genes (bottom) in the
654 D8 CD13 sorted cells. n=2; error bar indicates SD. (o) Quantification of TRA-1-60+ colonies
655 yielded from D8 CD13 sorted cells (top). Representative images are shown below. n=2; error bar
656 indicates SD.
24
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
657 Figure 5. Refined classification and enrichment of early-intermediate reprogramming cells
658 (a) t-SNE plot generated by 10X libraries of D8 CD13 sorted cells. (b) t-SNE plot showing the
659 clusters among the CD13+ and CD13- cells. (c) RCA heatmap demonstrating the clustering of
660 CD13 sorted cells (column) based on their correlation to the cells of different lineages (row). Color
661 code on top indicates the CD13 antigen profile (below) and clusters (above). (d) MAGIC plot
662 showing the correlative expression of CD13 and GDF3 in the D8 CD13-sorted 10x libraries. Color
663 represents the expression level of NANOG. (e) Violin plot demonstrating the expression of GDF3
664 across the identified clusters. (f) Top: Reprogramming trajectory constructed by the 10X scRNA-
665 Seq libraries of various time points and the D8 CD13 sorted cells. Bottom: Super-imposition of
666 CD13 clusters on the reprogramming trajectory. (g) Heatmap showing the differentially expressed
667 genes of CD13 clusters. Genes highlighted in orange were expressed higher in D8 G2 and G3 than
668 G1. (h) Barcharts showing the GO terms enriched by the genes highly expressed in the indicated
669 CD13 clusters. (i) MAGIC plot showing the correlative expression of CD13 and CD201 in the
670 CD13 10X libraries. Color represents the expression level of GDF3. (j) Barchart showing the
671 distribution of co-staining signals across the various reprogramming time points and cell lines. (k)
672 Barchart exhibiting the relative gene expression in the D8 dual antibody sorted cells, normalized to
673 that in CD13+CD201+ cells. n=2; error bar indicates SD. (l) Left: Heatmap showing the
674 differentially expressed genes among the D8 dual antibody sorted cells. GO terms (Middle)
675 enriched by the differentially expressed genes, with the enrichment values and the related genes
676 indicated on the right. (m) Bar chart demonstrating the number of normalized TRA-1-60+ colonies
677 (Y-axis) upon the knock-down of genes highly expressed in CD13+ CD201+ or CD13-CD201+
678 cells, at day 5 of reprogramming. Representative images are shown above. n=3. Error bar indicates
679 SD. (n) Quantification of TRA-1-60+ colonies yielded from the D8 dual antibody sorted cells.
680 Representative images are shown above. n=2; error bar indicates SD.
681 Figure 6. Transcription factors critical for reprogramming
25
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
682 (a) Heatmap showing the expression dynamics of the transcription factors across the pseudotime
683 states. Color represents the expression level, ranging from purple (low) to yellow (high). Color code
684 on top represents the pseudotime states. Transcription factors were classified to 5 categories
685 according to the states when they were silenced or expressed. The representative TFs of each
686 category are listed on the right. (b) Correlation heatmap of scATAC-Seq libraries based on the
687 calculated JASPAR motifs deviations in the HARs peaks. Color represents the correlation value,
688 ranging from blue (no) to red (high). Side color bar indicates the time-point to which each scATAC-
689 Seq library belongs. (c) Plot indicating the significantly variable motifs in terms of accessibility
690 from the scATAC-Seq libraries. Y-axis represents the variability score assigned to each JASPAR
691 motif whereas the X-axis represents the motif rank. (d) Heatmap of scATAC-Seq libraries based on
692 the deviation scores of the significantly variable JASPAR motif. Color indicates the accessibility
693 level, ranging from dark blue (no enrichment) to dark red (high enrichment). Color code on top
694 represents the time points. Motifs were classified to 3 major types according to the dynamics of
695 accessibility across the time points. (e) t-SNE plot of scATAC-Seq libraries based on the deviation
696 scores of JASPAR motifs. (f-i) Super-imposition of motif enrichment scores for Close motif-
697 FOSL1 and CEBPA (f), Transient motif-GATA1:TAL1 (g), Open motif: Early Open-TEAD4 (h);
698 Late Open- FOXL1 (i) on the scATAC-Seq tSNE plot. Color indicates the accessibility level and
699 ranges from dark blue (no enrichment) to dark red (high enrichment).
700 Figure 7. Transcription factors contributing to the heterogeneity in chromatin accessibility of
701 the intermediate reprogramming cells
702 (a) Plot indicating the significantly variable motifs in terms of accessibility among D8 cells. Y-axis
703 represents the variability score assigned to each JASPAR motif whereas X-axis represents the motif
704 rank. (b) Heatmap showing clustering of D8 scATAC-Seq libraries based on the deviations scores
705 of the significantly variable JASPAR motif. (c) Super-imposition of motif enrichment scores for
706 FOSL1 and TEAD4 on the D8 scATAC-Seq tSNE plot. (d) Super-imposition of FOSL1 and TEAD4
707 expression on the reprogramming trajectory displayed in Figure 3a. (e) Dotplots indicating the 26
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
708 expression of FOSL1 and TEAD4 along the pseudotime. Smooth lines represent the mean
709 expression level at each pseudotime, regardless of the state. (f) Top: MAGIC plot showing
710 correlative expression of FOSL1 and TEAD4 in the 10X libraries. Bottom: Super-imposition of
711 GDF3 expression on the magic plot. (g) Top: tSNE plot based on the regulon activity matrix
712 generated by SCENIC. Bottom: Activity of FOSL1 and TEAD4 regulons across the time-points. (h)
713 Super-imposition of FOSL1 and TEAD4 expression on the D8 CD13 sorted tSNE plot. (i) Bar chart
714 demonstrating the number of normalized TRA-1-60+ colonies upon the knock-down of FOSL1 at
715 the indicated time-points of reprogramming. Representative images are shown below, n=3. Error
716 bar indicates SD. (j) Bar chart demonstrating the quantification of TRA-1-60+ colonies upon the
717 overexpression of FOSL1 at D5. Representative images are shown below, n=3. Error bar indicates
718 SD. (k) Bar chart demonstrating the quantification of TRA-1-60+ colonies yielded from D8 CD13-
719 cells with FOSL1 overexpression. Representative images are shown below, n=3. Error bar indicates
720 SD. (l) Bar chart demonstrating the number of normalized TRA-1-60+ colonies upon the knock-
721 down of TEAD4 at the indicated time-points of reprogramming. Representative images are shown
722 below, n=2. Error bar indicates SD. (m) Bar chart demonstrating the quantification of TRA-1-60+
723 colonies upon the overexpression of TEAD4. Representative images are shown below, n=3. Error
724 bar indicates SD. (n) Bar chart demonstrating the number of TRA-1-60+ colonies yielded from D8
725 CD13- cells with TEAD4 depletion. Representative images are shown below, n=3. Error bar
726 indicates SD.
727 Figure 8. Mechanistic role of FOSL1 and TEAD4 during reprogramming
728 (a) Heatmaps exhibiting the D8 FOSL1 and D8 TEAD4 ChIP-Seq enrichment over merged binding
729 loci of both these factors. (b) Left: t-SNE clustering of BJ, D16+, and D16- scATAC-Seq libraries
730 based on the deviation scores of common and specific sites identified from D8 FOSL1 ChIP-seq,
731 D8 CD13- TEAD4 ChIP-seq, and D8 CD13+ TEAD4 ChIP-seq. Right: Super-imposition of
732 deviation score for FOSL1 specific bound sites and TEAD4 CD13- specific bound sites on the
733 scATAC-Seq t-SNE plot. Color indicates the accessibility level. (c) MA plots of scATAC-Seq 27
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
734 revealing the differentially accessible FOSL1 bound sites (left) and CD13- specific TEAD4 bound
735 sites (right) between BJ and D16+ cells. X-axis denotes the mean of normalized counts in BJ cells.
736 Red dots denote the sites with significant accessibility changes. (d) Super-imposition of enrichment
737 scores for FOSL1 specific bound sites (left) and CD13- TEAD4 specific bound sites (right) on the
738 D8 scATAC-Seq tSNE plot. Color indicates the accessibility leve. (e) UCSC screenshots
739 illustrating the chromatin accessibility of FOSL1bound genes (MMP2 and SMAD3) and TEAD4
740 CD13- bound sites (TET1 and RFC3) in D8 “FOSL1 ChIP only” and D8 “TEAD4 ChIP only” cells.
741 The differentially accessible sites with highlighted with dotted rectangles. (f-g) Expression heatmap
742 of the differentially accessible genes bound by FOSL1 (f) in D8 and TEAD4 in D8 CD13- cells (g),
743 among the D8 dual antibody sorted cells. (h-i) Bar chart demonstrating the number of TRA-1-60+
744 colonies upon knock-down of FOSL1 targets (h) and TEAD4 targets (i) at D5. Representative
745 images are shown below. n=3. Error bar indicates SD. (j) Proposed model of the study.
28
bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. ad 40 OSKM
20 BJ BJ scRNA-seq scATAC-seq D2 D8 10xGenomics Microfluidics-capture 0 D12
Oil tSNE2 D16- Day 2 -20 D16+
Cell Oil -40 Day 8 33468 cells -40 -20 0 20 40 tSNE1 e ZEB1 EPCAM Day 12 Enrichment of early-intermediate cells Transcription Factors
Surface markers Chemical screen Accessibility & ChIP-seq 25
Day 16 0 High tSNE2
-25
TRA-1-60- TRA-1-60+ Low
-25 0 25 -25 0 25 tSNE1 bc f COL1A1 NANOG 60 Enrichment over genebodies 1 50 0.8 25 40 25
30 0.6 0 High
20 0.4 tSNE2 20 40 60 80 100 120
Detected gene rate 10 0.2 -25 0 0 Low Exon mapping rate 0 20 40 60 80 100 -25 0 25 -25 0 25 gh i tSNE1
1200 70 D16+(No.52) D16+(No.52) 1 60 1000 0.8 50 800
40 0.6 600 RPM 30 400 0.4 Read Count % in HARs 20 0.2 200 2310 4 567 0 0 -3000 -1500 0 1500 3000 0 200 400 600 800 1000 Distance to TSS (bp) Insert-size (bp) Library size (Log10 )
Figure 1 bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. a b c D2 D8 D16+ 4 3 1 0 1 G1 G1 PC2 PC2 -1 G2 PC2 G1 G2 G3 -4 -1 G2 G3 G4 -3 -3 -8 -6-2 2 6 -5 0 5 -10 -5 0 PC1 PC1 PC1 d BJ D2 D8 D16- D16+ G1 G2 G3 G4 G1 G2 G3 G1 G2 Time points Subgroups ERBB3 GAA WNT1 CDC20 CDK1 UBE2C FOXF1 expressed FOSL1 MBD3 FOS D2 differentially D2 differentially ID1 ID1 JUNB LUM CARM1 PLAU GDF3 SNAI2 ENG TFAP2C FBLN1 TH TRIM32 COL1A1 CXADR COL1A1 TGFBR2 COL6A3 STRA6 expressed BMP1 CDKN2A D8 differentially D8 differentially COL7A1 EMP1 IGFBP5 MMP2 RUNX1 TGFBI CDK1 TNC CDH1 WNT5A CDC6 ZEB1 CDC73 DPPA4 EPCAM NASP JARID2 KRT7 expressed LIN28A NANOG 4
D16+ differentially D16+ differentially NODAL ORC6 POLR3K 2 RNASEH2A (FPKM+1)
SETD6 10 TEAD4 0 WBP11 Log ZSCAN10 HNRNPA1 SNRPD1 e g 0
f CDK1 GDF3 MMP2 LIN28A 4 4 3 3
3 3 2 2
2 2 (FPKM+1) 10 1 1
Log 1 1
0 0 0 0 BJ BJ BJ BJ D2 D2 D2 D2 D8 D8 D8 D8 D16- D16- D16- D16- D16+ D16+ D16+ D16+
Figure 2 bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. a b Pseudotime c
BJ D2 D8 D12 D16- D16+ 02040 100%
20 20 80% BJ D2 10 9 10 60% D8 +succ 4 +succ 7 -succ 3 D12 -succ 40% PC2 2 PC2 +succ 4 1 -succ -succ 8 D16- 0 5 0 2 +succ 3 6 20% D16+
1 -10 -10 0% -20 -10 0 10 -20 -10 0 10
PC1 PC1 State 1 State 2 State 3 State 4 State 5 State 6 State 7 State 8 State 9 d e
100% D8 G1 D8 G2 D8 G3 D16+ G1 D16+ G2 20 15 80% 15 D8 G1 10 60% D8 G2 10 D8 G3 5 Others 5 40% PC2 PC2 0 0 20%
-5 -5 0%
-20 -10 0 10 -10 0 10 PC1 PC1 State 1 State 2 State 3 State 4 State 5 State 6 State 7 State 8 State 9 f RFC3 GDF3 NANOG MMP2 LIN28A 20
high 10 (value+0.1) 10 PC2 0 Log Low
-10 -20 -10 0 10 -20 -10 0 10 -20 -10 0 10 -20 -10 0 10 -20 -10 0 10
PC1
g Unsuccessful branches h Successful branches -Log (P value) -Log (P value) 10 10 Branch point 20 36 Branch point 20 36 134 134 Cell adhesion Cell division Extracellular matrix organization Cell proliferation Heart development DNA replication Cytoskeleton organization Chromosome organization Cardiac septum morphogenesis Extracellular matrix organization Kidney development Collagen catabolic process Angiogenesis Cell adhesion Positive regulation of osteoblast differentiation Angiogenesis Glomerular visceral epithelial cell development Endothelial cell migration +ve. reg of canonical Wnt signaling pathway Skeletal system development Cell division Epidermis development Cell proliferation mRNA splicing, via spliceosome DNA damage response, detection of DNA damage
Figure 3 bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available a b under aCC-BY-NC 4.0 International license. c D8 FACS sort, D21 assay
12 BDD2-C8+BDD2-C8+ BDD2-C8- SALL4 DNMT3B 4 D8 BDD2-C8+ 8 RFC3 D8 BDD2-C8- genes
Pluripotent CDH1
4 PC2 D8G1 WNT5A 0 D8G2 SNAI2 D8G3 0 CD44 No. of TRA-1-60+ colonies No. of PLAUR -4
BDD2-C8- BDD2-C8+ -chymal genes D8 D8 Somatic/ Mesen Log2 expression -5 015 0 0 Low High PC1
d e f BJ D2D8 D16- D16+
D8 BDD2-C8- D8 BDD2-C8+ G1 G2 G3 G1 G2 G3 G4 G1 G2 G3 G1 G2 BDD2-C8- BDD2-C8+ CD13 1 D8 D8 CD44 Embryo development Regulation of transcription involved in CD59 G1/S transition of mitotic cell cycle CD201 0 Cellular response to cadmium ion Per row normalized Stem cell population maintenance g CD13 CD44 CD201 Epithelial- Mesenchymal Transition 2.5 3 Extracellular matrix organization 2 3 2 Inflammatory response 1.5 2
KRAS Signalling (FPKM+1) 1 10 1 1
D8 differentially expressed D8 differentially Response to hypoxia
Log 0.5 Cell-cell adhesion 0 0 0 Heart development BJ BJ BJ D2 D2 D2 D8 D8 D8 D16- D16- D16- D16+ D16+ D16+
-Log10 (p value) Log (FPKM+1) i 10 0 20 CD13+ CD44+ CD201+ 3.5 042 CD13- CD44- CD201- ** ** * 3 ** ** *** h CD13 CD44 CD201 2.5 ** ** ** * * * * * ** 2 * * * * ** * ** H1 1.5 * ** 1 D12 0.5 Relative expression D10 0
D8 ZEB1 ZEB2 CDH1 GDF3 SNAI1 SNAI2 LIN28A EPCAM NANOG POU5F1 D6 j 75 25 75 D4 m 60 20 60 Percentage (%) 0 20406080100 D2 45 15 45 H1 BJ D12 30 10 30 D8 01103 104 105 0103 104 105 0 03 104 105 5 15 100 CD13CD44 CD201 15 D4 80 D2 No. of TRA-1-60+ colonies No. of 0 0 0 60 CD13+ CD13- CD44+ CD44- CD201+ CD201- BJ 40 20 CD13+CD44+ CD13-CD44+
% of +ve cells 0 CD13+CD44- CD13-CD44- BJ BJ BJ D2 D4 D6 D8 H1 D2 D4 D6 D8 H1 D2 D4 D6 D8 H1 D10 D12 D10 D12 D10 D12
k CD13 CD44 CD201 200 250 200 n o 400 D8 Sendai repro (MSC) 150 D8 Sendai repro (MSC) 150 CD13+ CD13- 1.2 100 1000 200 100 FPKM 0.9 50 50 800 0.6 0 0 0 600 D8 BDD2-C8- D8 BDD2-C8+ D8 BDD2-C8- D8 BDD2-C8+ D8 BDD2-C8- D8 BDD2-C8+ 0.3 400 l Relative expression 0 Top 10% BDD2-C8 Bottom 10% BDD2-C8 200
SNAI1 TRA-1-60+ colonies No. of 15.2% 13.6% 79.8% 60 COL1A1 COL6A3 GAPDH 0 51.5% 52.9% 87.3% 50 CD13+ CD13- CD13+ CD13- 40 +A83-01 +A83-01
Count 30 20 10 Relative expression 0 103 104 105 0 103 104 105 0 103 104 105 0 CD13 CD44 CD201 GDF3 TEAD4 SALL4 LIN28A NANOG DNMT3B Figure 4 bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. abc d Correlation value NANOG CD13+ CD13- 1 2 3 4 5 6 7 8 00.20.1 0.3 40 40 Low high 0.6
20 20 Clusters CD13 MSC_Fetal Liver 0.4 MSC_Fetal Blood 0 0 MSC_Fetal Bone Marrow Fibroblast Foreskin GDF3 tSNE 2 tSNE 2 MSC MSC 0.2 -20 MSC -20 ESC iPSC ESC ESC ESC 0 -40 -40 -50 -25 0 25 ESC -50 -25 0 25 5 6 7 8 732 4 1Clusters 00.511.52 tSNE 1 tSNE 1 (G1) (G2) (G2) (G3) CD13
egGDF3 CD13 Clusters 8 3 7 6 5 2 4 High
3 UMI UMI 1 Low 2
0 1 12345678 AK4 FN1 AXL FLG LOX LIM2 IGF2 MGP IFI27 KRT8 PLAT SOX4 CAV1 FLNB CD44 CD83 SPP1 CD13 CD70 CD36 CD55 GDF3 PLAU JUNB MT1E RGS5 MT2A CSTB MT1X EMP2 RGS2 MT1G APOE SNAI2 TIMP1 TIMP3 C9orf3 LYPD2 PALLD CALB1 KRT18 CD201 SULF1 TDGF1 CD13 Clusters KRT6A TAGLN TEAD4 THBS1 MFAP5 CHPT1 ANXA1 GSTP1 PRSS3 ANXA2 SPARC HAND1 CENPF KDM1A MFGE8 HMGB2 ATP2B1 ARGFX IGFBP4 IGFBP6 IGFBP5 LEFTY2 LGALS1 TUBB4A MAD2L1 OLFML3 COL6A1 COL3A1 COL1A1 COL6A3 KCTD12 COL6A2 COL1A2 B3GNT7 ASRGL1 AKR1C1 CRABP2 MIR302B C9orf135 C12orf75 ALDH1A1 SCGB3A2 SOSTDC1 HIST1H4C SERPINB9 MIR205HG LINC00437 TNFRSF12A f hiD8-G2&G3 high -Log (P value) BJ D2 D8 CD13+ D8 CD13- 10 GDF3 D12 D16- D16+ 024 00.40.2 0.6 mitotic nuclear division 10 0.75 chromosome segregation nucleosome assembly pericentric heterochromatin assembly 0.5 PC2 Cluster 1&4 0 in utero embryonic development CD201 organ regeneration epithelial cell differentiation erythrocyte differentiation 0.25 -10 liver regeneration eye development -20 -10 0 10 20 brain development 0 0.5 1 1.5 2 CD13 -Log (P value) 1 2 3 4 5 6 7 8 10 0 5 10 15 20 25 j 10 Percentage (%) extracellular matrix organization 0 20406080100 collagen catabolic process cell adhesion BJ PC2 0 endodermal cell differentiation Cluster 5&6&8 D2 response to hypoxia Cluster 7 +ve reg. of cell migration D4 epidermis development D8 antigen process and presentation -10 reg. of I-KB/NF-KB signaling D12 cell division -10 0 10 CD13+CD201+ CD13-CD201+ PC1 sister chromatid cohesion CD13+CD201- CD13-CD201- k lm
Relative expression normalized to GAPDH 8 012345678 GO terms -Log10(p-value) Genes 7 D8 CD13+CD201+D8 CD13-CD201+D8 CD13-CD201- FOSL1 6 D8 CD13+CD201+ extracellular matrix organization 20.16 CNN1 D8 CD13-CD201+ CD44; SERPINE1; 5 collagen catabolic process 12.97 COL1A1;COL6A3; CD44 D8 CD13-CD201- LUM; FN1; FOSL1; 4 cell adhesion 11.23 CNN1; FBN1 3 LUM type I interferon signaling pathway 12.04 2 ISG20; ISG15; SNAI2 antigen processing via MHC class I 6.22 IRF1; IRF2; IRF9 1 ZEB1 TGFB1; TGFB2; TGFBR2; 0 response to hypoxia 7.53 SMAD3; MMP14; PLAU Normalized TRA-1-60+ colonies COL1A1 VEGFA; VEGFC; CD13; siNT2 CD44 siJUN angiogenesis 10.44 si siFBN1 siCNN1 FZD8; ELK3 COL6A3 n siCOL6A3 angiogenesis 6.21 MMP2; ID1; JUN;TGFBI CD201 epithelial to mesenchymal transition 2.99 BMP4; NOTCH1; SNAI1 epidermis development 2.62 KRT17; CRABP2; GJB5 CDH1 100 muscle filament sliding 4.11 TEAD4 cell adhesion 2.96 80 DNA replication 18.46 NANOG cell division 7.87 NANOG; KDM1A; MCM10; CDC25A cell proliferation 2.55 60 GDF3 protein localization to plasma membrane 1.89 F11R; TSPAN33; CDH1; TSPAN15 inner cell mass cell proliferation 1.56 GINS1;GINS4; SALL4 40 positive regulation of neuron differentiation 2.4 MEF2C; DNMT3B; LIN28A SALL4 cellular response to cadmium ion 2.4 MT1H; MT1X; MT1G; MT1F SMAD protein signal transduction 1.5 GDF3; SMAD1; LEFTY1 20 DNMT3B somatic stem cell population maintenance 1.43 POU5F1; SOX2; LIN28A; DPPA4
LIN28A No. of TRA-1-60+ colonies 0 Per row normalized CD13 + - - GAPDH CD201 + + - 00.41 Figure 5 bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available a under aCC-BY-NC 4.0 International license. 1236845 7 9JUNB States CREB3L1 CREBRF DOT1L DRAP1 EPAS1 FOSL1 FOXC2 FOXL1 Early silenced (-succ/+succ) HLX JUND JUN KLF2 KLF6 NFKBIA SNAI2 (+succ) ZEB1
Late silenced TCF4 TFEB ASCL2 CITED4 HAND1 KLF3 TEAD4 LMO2 JARID2 (-succ) MAF Transient SALL4 NEUROG2 TET1 HOXA1 KLF4
Early ZSCAN10 exp (+) SOX2 NANOG DNMT3B DNMT3L PRDM14 FOXA3 SP5 FOXD3 SRY (+suuc) Late exp FOXH1 TFCP2L1 HIF3A ZFP42 ZIC2 UMI OTX2 POU5F1B ZIC3 ZIC5 bce -2 20 POU5F1 JUNB FOSL2 5 FOS FOSL1 JUND 10 lse lse ICluster III Cluster II Cluster I 0.5 4 JDP2 NFE2 BJ D2 3 0 D8 -0.5 OCT4 PAX6 D16- Variability SOX10 CEBPA tSNE2 2 SRY ZEB1 D16+ BJ SOX8 TEAD4 H1 D2 -10 D8 1 D16- D16+ 0 H1 0 100 200 300 400 -10 0 10 Sorted TFs tSNE1 d BJ D2 D8 D16- D16+ H1
Deviation score POU1F1 POU2F1-2 -15 0 15 POU3F1-4 POU5F1B SP1/2/4 KLF5/14 SOX8 V NEUROG2 GATA1:TAL1 SOX10 NRL TFAP4 PAX6 III TAL1:TCF3 HNF4G IV TEAD1/3/4 MAFK III RXRA/B/G ELF5 CEBPA/G II VI RFX2-5 SNAI2 FOXA1 ZEB1 FOXB1 TCF4 FOXC1/2 CTCF VI FOXD1/2 ID4 NFE2 FOXF2 YY2 MAF:NFE2 FOXO3/4/6 SRY JDP2 FOXP2/3 TGIF1 FOXL1 PKNOX1 I JUNB/D FOS:JUN FOXI1 ESR1 FOS NKX2-3 Close I: Homo open in BJ & early Transient III: Open in intermediates & D16+ Open IV-V: Open early FOSL1/2 (D16+&hESC) II: Hete open in BJ & early (intermediates) Close in BJ & H1 (D16+&hESC) VI: Open late
f Deviation score Deviation score giDeviation score h Deviation score Deviation score -10 0 10 05 05 05 -5 010 FOSL1 CEBPA GATA1:TAL1 TEAD4 FOXL1
10 10 10 10 10
0 0 0 0 0 tSNE2 tSNE2 tSNE2 tSNE2 tSNE2
-10 -10 -10 -10 -10
-10 0 10 -10 0 10 -10 0 10 -10 0 10 -10 0 10 tSNE1 tSNE1 tSNE1 tSNE1 tSNE1
Figure 6 bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
ab cDeviation score Deviation score 5 FOS:JUN -10 0 10 -4 0 4 JUNB FOSL1 TEAD4 FOSL2 4 FOSL1 Deviation score FOS 10 BATF:JUN -6 0 6 3 NFE2 JUNB 5 FOSL2 TEAD4 TEAD3 FOS 0 Variability 2 FOSL1
FOS:JUN tSNE2 BATF:JUN -5 1 NFE2 TEAD3 -10 TEAD4 0 0 100 200 300 400 -15 Sorted TFs -10 -50 5 -10-5 0 5 tSNE1
deExpression (UMI) f BJ D2 D8 g BJ D2 D8 G1 D8 G2 D8 G3 D12 D16- D16+ D12 D16- D16+ G1 D16+ G2 Low High State 1 State 2 State 3 State 4 State 5 1.2 State 6 State 7 State 8 State 9 20 FOSL1 1 20
FOSL1 0.8 0 10 0.6 FOSL1 10 tSNE2 0.4 -20 0.2 0 1 0.2 0.4 0.6 0.81 1.2 -40 -20 020 40
Relative expression TEAD4 tSNE1 -10 0 10203040 Pseudotime GDF3 20 TEAD4
PC2 00.5 1.5 2.5 TEAD4 1 FOSL1 10 10 High FOSL1 0.5 0 1
Regulon activity Low Relative expression 0 10203040 TEAD4 -10 Pseudotime -20 -10 0 10 0.25 0.5 0.75 1.251 TEAD4 PC1
i NT2 KD FOSL1 KD j *** h 6 400 UMI ***
FOSL1 Low High TEAD4 300 40 * 4 200 20 * 2 100 0 tSNE 2 No. of TRA-1-60+ Colonies No. of 0 -20 Normalized TRA-1-60+ colonies 0 D1 KD D2 KD D3 KD D5 KD OE GFP -40 OE FOSL1 -50 -25 0 25 -50 -25 0 25 siNT2 tSNE 1
siFOSL1
N.S. N.S. k lm2 n D8 CD13- * D8 CD13- * * 250 350 350 1.6 ** 300 300 200 1.2 250 250 150 200 200 0.8 150 150 100 100 100 0.4 50 50 50 No. of TRA-1-60+ colonies No. of 0 No. of TRA-1-60+ Colonies No. of No. of TRA-1-60+ Colonies No. of Normalized TRA-1-60+ colonies 0 0 0 D1 KD D2 KD D5 KD D7 KD D8 KD D10 KD siNT2 OE GFP siTEAD4 OE GFP siNT2 OE TEAD4 OE FOSL1
siTEAD4
Figure 7 bioRxiv preprint doi: https://doi.org/10.1101/829853; this version posted November 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. a b D8 FOSL1 D8 TEAD4 BJ D16- D16+ D8 FOSL1 bound sites D8 TEAD4 bound sites (CD13-)
-4 -2 0 2 5 -5 -2 0 4 7 10 10 10
5 5 5
0 0 0 tSNE2 -5 -5 -5
-10 -10 -10
-30 -20 -10 0 10 20 30 -200 20 -200 20 -2.5K-5K 3’5’ 2.5K 5K -5K -2.5K 5’ 2.5K3’ 5K tSNE1 tSNE1 tSNE1
c d D8 cells D8 FOSL1 bound sites D8 TEAD4 bound sites (CD13- specific) D8 TEAD4 bound sites D8 FOSL1 bound sites 6 368 sites 2314 sites (CD13- specific)
4 5 4 2
0 0 0 -2 tSNE2 -4 -5 -4 -6 1009 sites 393 sites Log(D16+/BJ scATAC-seq reads) Log(D16+/BJ scATAC-seq reads) Log(D16+/BJ scATAC-seq 0.01 0.11 10 100 0.011 100 Mean of normalized counts Mean of normalized counts -10 -5 05 10 -10 -5 05 10 tSNE1
e 5kb hg19 h i D8 FOSL1 900 350 ChIP only 750 300 D8 TEAD4 250 ChIP only 600 200 450 MMP2 150 300 20kb hg19 100 D8 FOSL1 150 50 ChIP only 0 0
D8 TEAD4 TRA-1-60+ colonies No. of TRA-1-60+ colonies No. of siNT ChIP only siNT2 SOX2 siMMP2 si siSMAD3 SMAD3 siPRDM14 50kb hg19 D8 FOSL1 ChIP only
D8 TEAD4 ChIP only TET1 2kb hg19 D8 FOSL1 ChIP only j D8 TEAD4 ChIP only Transcriptome Chromatin accessibility
RFC3 OSKM fg D8 FOSL1 bound sites D8 TEAD4 bound sites (CD13-) BJ CD13 + - - CD13 + - - AP-1 CD201 + + - CD201 + + - CEBPA/G MAF D2 G3 G4 G1,2 CDK1+
TGFBR2 MMP2 CD13+ CD44+ COL5/7/21A1 COL5A1 n=165 COL7A1 G1 BDD2-C8- SMAD3 CDH1 FOSL1,COL6A3, COL21A1 MT1E CNN1, FBN1, CREB3L1 SMAD3, CD44 CD13-/CD13low TET1 CD201+ EPAS1 G2 FOSL1 TEAD4 n=122 RFC3 BDD2-C8+ D8 GATA1:TAL1 NR2F2 F11R MMP2, PRDM14 SOX8 MICALL2 JUN CD83 LIN28A NRL NFATC2 SOX2 PODXL G3 CD13- PAX6 SMAD3 GDF3+ CD201- HNF4G HOXA1 BDD2-C8+ RXRA/B/G MT1X FOSL1 TEAD4 OSNOCT4 n=234 DNMT3B TEAD4 LIN28A SOX2
n=27 SOX21 D16 TRA1-60+ Per row normalized Per row normalized D16- D16+ G2 OCT FOX 00.41 00.41 KLF5/14 YY2 SP1/2/4 SRY
Figure 8