LETTER doi:10.1038/nature18323

Dissecting direct reprogramming from fibroblast to using single-cell RNA-seq Barbara Treutlein1,2*, Qian Yi Lee1,3,4*, J. Gray Camp5, Moritz Mall3,4, Winston Koh1, Seyed Ali Mohammad Shariati6, Sopheak Sim3, Norma F. Neff1, Jan M. Skotheim6, Marius Wernig3,4§ & Stephen R. Quake1,7,8§

Direct lineage reprogramming represents a remarkable with Ascl1 only (hereafter referred to as Ascl1-only cells) using PCA conversion of cellular and transcriptome states1–3. However, and identified three distinct clusters (A, B, C), which correlated with the the intermediate stages through which individual cells progress level of Ascl1 expression (Fig. 1b–e). Cluster A consisted of all control during reprogramming are largely undefined. Here we use single- d0 MEFs and a small fraction of day 2 cells (~​12%) which showed cell RNA sequencing4–7 at multiple time points to dissect direct no detectable Ascl1 expression, suggesting these day 2 cells were not reprogramming from mouse embryonic fibroblasts to induced infected with the Ascl1 virus. This is consistent with typical Ascl1 neuronal cells. By deconstructing heterogeneity at each time infection efficiencies of about 80–90%. We found that the day 0 MEFs point and ordering cells by transcriptome similarity, we find that were surprisingly homogeneous, with much of the variance due to cell the molecular reprogramming path is remarkably continuous. cycle (Extended Data Fig. 1b–g, Supplementary Data 3, Supplementary Overexpression of the proneural pioneer factor Ascl1 results in a Information). Cluster C was characterized by high expression of Ascl1, well-defined initialization, causing cells to exit the cell cycle and Ascl1-target (Zfp238, Hes6, Atoh8 and so on) and genes involved re-focus expression through distinct neural in neuron remodelling, as well as the downregulation of genes involved factors. The initial transcriptional response is relatively in cell cycle and (Fig. 1c, e, f and Supplementary Data 2). homogeneous among fibroblasts, suggesting that the early steps Cluster B cells represent an intermediate population that expressed are not limiting for productive reprogramming. Instead, the Ascl1 at a low level, and were characterized by a weaker upregulation later emergence of a competing myogenic program and variable of Ascl1-target genes and less efficient downregulation of cell cycle transgene dynamics over time appear to be the major efficiency genes compared to cluster C cells. This suggests that an Ascl1 expres- limits of direct reprogramming. Moreover, a transcriptional state, sion threshold is required to productively initiate the reprogramming distinct from donor and target cell programs, is transiently induced process. In addition, we found that forced Ascl1 expression resulted in in cells undergoing productive reprogramming. Our data provide less intracellular transcriptome variance, a lower number of expressed a high-resolution approach for understanding transcriptome states genes (Fig. 1d) and a lower total number of transcripts per single cell during lineage differentiation. (Extended Data Fig. 2a, b). Notably, the distribution of average expres- Direct lineage reprogramming bypasses an induced pluripotent stage sion levels per gene was similar for all experiments independent of to directly convert somatic cell types. Using the three transcription Ascl1 overexpression (Extended Data Fig. 2c). We observed that the factors Ascl1, Brn2 and Myt1l (BAM), mouse embryonic fibroblasts upregulation of neuronal targets and downregulation of cell cycle genes (MEFs) can be directly reprogrammed to induced neuronal (iN) cells in response to Ascl1 expression are uniform, indicating that the initial within 2 to 3 weeks at an efficiency of up to 20%8. Several groups have transcriptional response to Ascl1 is relatively homogenous among all further developed this conversion using combi- cells (Fig. 1e). This suggests that most fibroblasts are initially competent nations that almost always contain Ascl1 (refs 9–12). Recently, one of to reprogram and later events must be responsible for the moderate our groups showed that Ascl1 is an ‘on target’ pioneer factor initiating reprogramming efficiency of about 20%. the reprogramming process13, and inducing conversion of MEFs into To explore the effect of transgene copy number variation on the functional iN cells alone, albeit at a much lower efficiency compared heterogeneity of the early response, we analysed single-cell tran- to BAM14. These findings raised the question whether and when a scriptomes of an additional 47 cells induced with Ascl1 for two days heterogeneous cellular response to the reprogramming factors occurs from secondary MEFs derived via blastocyst injection from a clonal, during reprogramming and which mechanisms might cause failure Ascl1-inducible embryonic stem cell line. As expected, the induction of reprogramming. We hypothesized that single-cell RNA sequencing efficiency of Ascl1 was 100% since the secondary MEFs are genetically (RNA-seq) could be used as a high-resolution approach to reconstruct identical and all cells carry the transgene in the same genomic location the reprogramming path of MEFs to iN cells and uncover mechanisms (Fig. 1g). Nevertheless, these clonal MEFs had similar transcriptional limiting reprogramming efficiencies4,15,16. responses and heterogeneity as primary infected MEFs at the day 2 time In order to understand transcriptional states during direct conver- point, as well as comparable reprogramming efficiencies and maturation sion between somatic fates, we measured 405 single-cell transcriptomes (Extended Data Fig. 3a). Finally, we compared the early response in our (Supplementary Data 1) at multiple time points during iN cell repro- Ascl1-only single-cell RNA-seq data with our previously reported bulk gramming (Fig. 1a and Extended Data Fig. 1a). We first explored how RNA-seq data of Ascl1-only and BAM-mediated reprogramming13 individual cells respond to Ascl1 overexpression during the initial phase (Extended Data Fig. 3b). We found similar downregulation of MEF- of reprogramming. We analysed day 0 MEFs and day 2 cells induced related genes and upregulation of pro-neural marker genes in both

1Department of Bioengineering, Stanford University, Stanford, California 94305, USA. 2Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig 04103, Germany. 3Institute for Stem Cell Biology and Regenerative Medicine, Stanford University School of Medicine, Stanford, California 94305, USA. 4Department of Pathology, Stanford University School of Medicine, Stanford, California 94305, USA. 5Department of Developmental Biology, Stanford University School of Medicine, Stanford, California 94305, USA. 6Department of Biology, Stanford University, Stanford, California 94305, USA. 7Howard Hughes Medical Institute, Stanford, California 94305 USA. 8Department of Applied Physics, Stanford University, Stanford, California 94305, USA. *These authors contributed equally to this work. §These authors jointly supervised this work.

00 MONTH 2016 | VOL 000 | NATURE | 1 © 2016 Macmillan Publishers Limited. All rights reserved RESEARCH LETTER

+ − a Single-cell RNA-seq measurements b Cluster C BA 40 Tau–eGFP and 15 Tau–eGFP cells for transcriptome analysis − d0 d2 d5 d22 25 by fluorescence-activated cell sorting. We found that Tau–eGFP BAM cells lacked expression of neuronal Ascl1-target genes (genes B), and Ascl1 0 MEF Infected Clonal Tau–eGFP+ iN cell maintained expression of fibroblast-associated genes (genes A and C; + –25 Tau–eGFP– Fig. 2a, b, Extended Data Fig. 4a, Supplementary Data 4). In addition, PC2 (2.11%) d0 2 d2 we found a positive correlation (R =​ 0.49) between Ascl1 expres- c 25 d 2.0 ×10–4 sion and Tau–eGFP intensities (Extended Data Fig. 4b, Fig. 2a, b). A Ascl1 6.0 B log 0 Quantitative real-time (qRT)–PCR and western blot analysis of Ascl1

1.5 2

C [FPKM] 5 4.0 expression on day 5 to day 12 Tau–eGFP-sorted cells validated a 1.0 − –25 0 significant decrease in Ascl1 expression in Tau–eGFP cells compared 0.5 2.0 d0 PC2 (2.11%) + d2 Single cell densit y to Tau–eGFP cells (Fig. 2c, Supplementary Data 5). Thus, Ascl1 0 0 –50 –25 025 3.0 4.0 5.0 3,000 5,000 7,000 expression is correlated to Tau–eGFP levels and expression of neu- PC1 (12.58%) Transcriptome variance No. of expressed genes ronal genes at day 5. This raises the hypothesis that Ascl1 is silenced e Genes I Genes II in cells that fail to reprogram. Alternatively, cells with low or no Ascl1

Cluster expression at day 5 and day 22 might have never highly expressed A Ascl1. To distinguish between these two mechanisms, we used live cell

Cluster microscopy to track cells over a time course from 3–6 days after Ascl1 B induction using an eGFP–Ascl1 fusion construct (Fig. 2d, Extended Cluster C Data Fig. 5). We immunostained the cells at day 6 using Tuj1 antibod- d0 ies recognizing the neuronal β​3-tubulin (Tubb3) to identify cells that 0 12 d2 Tcf4 Sox9 Snca Cav1 Hes6 Birc 5 Cnn2 Ascl 1 Pval b log [FPKM] Tcf12 differentiated towards neuronal fate. We found that transgenic Ascl1 Atoh8

2 Foxp2 Tubb 3 Cdca 3 Ube2c Zfp238 Hmga2 Ppp3ca levels varied substantially over time and, on average, contin- f Genes I –log10[P value] Genes II –log10[P value] g Single + 0 10 0 10 cells d0 MEF ued to increase over time in Tuj1 cells, but decreased or plateaued Translation Reg. pol II transcription 20 d2 Ascl1 − Cell division Reg. infected in Tuj1 cells, leading to a significant difference in Ascl1 expression M phase Striated muscle dev. 10 d2 Ascl1 Cell cycle Reg. exc. memb. pot. clonal within six days of Ascl1 induction (Fig. 2e, Extended Data Fig. 4c). Rnp complex Axon Densit y 0 –0.10 0.1 Organelle Cell projection BP This time-lapse analysis demonstrated that Ascl1 is silenced in many Ribosome Neuron projection CC PC1 loadings cells that fail to reprogram. Figure 1 | Ascl1 overexpression elicits a homogeneous early response We next analysed the maturation events occurring during late and initiates expression of neuronal genes. a, Mouse embryonic reprogramming stages. We performed principal component analysis fibroblasts stably integrated with neuronal reporter Tau–eGFP8 were (PCA) on the single-cell transcriptomes of all reprogramming stages directly transformed to neuronal cells through overexpression of a single 8 analysed, including day 22 cells reprogrammed with Ascl1 alone or (Ascl1), or three factors (Brn2, Ascl1, Myt1l; BAM) as described . Cells with all three BAM factors (Extended Data Fig. 6a). PC1 separated were sampled using single-cell RNA-seq at day 0 without infection (d0, MEFs and early time points (day 2, day 5) from most of the day 22 cells. 73 cells), day 2 (d2, 81 cells Ascl1-infected and 47 cells clonal), day 5 (d5, 55 cells, eGFP+ and eGFP− cells), day 20 (d20, 33 cells, eGFP+ cells), Surprisingly, PC2 separated most day 22 BAM cells from day 22 Ascl1- and day 22 (d22, 73 cells, eGFP+ cells) post-induction with Ascl1. As only cells despite robust Tau–eGFP expression in both groups. We used a comparison, cells reprogrammed using all three BAM factors were t-distributed stochastic neighbour embedding (tSNE) to organize analysed at 22 days (d22, 43 cells, eGFP+ cells). b, c, PCA of single-cell all day 22 cells into transcriptionally distinct clusters, and identified transcriptomes from day 0 MEFs (circle, 73 cells) and day 2 Ascl1-induced differentially expressed genes marking each cluster (Fig. 3a). We identified cells (square, 81 cells) shows reduced intercellular variation at day 2. Points 3 clusters, which contained cells expressing neuron (Syp), fibroblast are coloured based on hierarchical clustering shown in e (b), or Ascl1 (Eln), or myocyte (Tnnc2) marker genes, respectively (Fig. 3b). expression (c). d, Left, distribution of transcriptome variance within single Consistent with this marker gene expression, cells in each cluster had cells grouped by cluster assignment of b and e shows that Ascl1 expression a maximum correlation with bulk RNA-seq data from purified , reduces the intracellular transcriptome variance. Right, distribution embryonic fibroblasts, or myocytes (Fig. 3c). Neuron- and myocyte-like of total number of genes expressed by single cells grouped by cluster assignment shows that Ascl1 overexpression reduces the range of gene cells expressed a clear signature of each cell type (Fig. 3d). Although we expression. e, Hierarchical clustering of day 0 and day 2 cells (rows) using observed cells with complex neuronal morphologies in the Ascl1-only 14 the top 50 genes (columns) correlating positively (genes I) and negatively reprogramming experiments as we had reported previously (Fig. 3e), (genes II) with PC1. Cells are clustered into three clusters (left sidebar): their frequency was too low to be captured in the single-cell RNA-seq A (83 cells, MEFs), B (20 cells, intermediates), C (51 cells, day 2 induced experiments. All of the day 22 Ascl1-only cells, and 33% of BAM cells cells). f, Top enrichments of genes I and II (d) are shown had a highest correlation with myocytes or fibroblasts. with Bonferroni-corrected P values. BP, biological process; CC, cellular We applied an analytical technique based on quadratic programming component; reg. exc. memb. pot., regulation of excitatory postsynaptic to quantify fate conversion and to predict when during reprogram- membrane potential. g, Distribution of PC1 loadings are shown for day 2 ming the alternative muscle program emerges (Extended Data Fig. 6b). cells carrying variable numbers of Ascl1 transgene copies (dark green, Ascl1-infected) or carrying the same Ascl1 copy number and genomic This method allowed us to decompose each single cell’s transcrip- location (yellow, clonal). PC1 effectively separates un-induced MEFs tome and express each cell’s identity as a linear combination of the (cluster A) from induced cells highly expressing Ascl1-target genes transcriptomes from the three different observed fates (neuron, MEF, (cluster C) and both, Ascl1-infected and clonal cells, productively initiate myocyte; Supplementary Data 6). Using this method, we observed that reprogramming. The induction efficiency is higher for clonally induced there is an initial loss of MEF identity concomitant with an increase in MEFs, however even in the clonal population Ascl1 induction is variable. neuronal and myocyte identity over the first five days of Ascl1 repro- gramming. The neuronal identity is maintained and matures in day 22 cells transduced with BAM (Extended Data Fig. 6c). However, the day Ascl1- and BAM-mediated reprogramming. These data suggest that 22 Ascl1-only cells failed to mature to neurons and adopted a predomi- the overexpression of Ascl1 focuses the transcriptome and directs the nantly myogenic transcriptional program. This divergence was already expression of target genes. apparent in some day 5 cells (Extended Data Fig. 6d, e). These findings We next analysed the transcriptomes of reprogramming cells on day 5. raised the question whether the additional two reprogramming factors At this time point, the first robust Tau–eGFP signal can be detected Brn2 and Myt1l suppress the aberrant myogenic program. Compatible in successfully reprogramming cells and we therefore purified with this notion, we observed that Brn2 and Myt1l had low expression

2 | NATURE | VOL 000 | 00 MONTH 2016 © 2016 Macmillan Publishers Limited. All rights reserved LETTER RESEARCH

a cdeGFP iN cell Genes A Genes B Genes C –Ascl1 Dox N3 media Tau–eGFP+ + ** Tau–eGFP– Tuj1 60 –1d d0 d3 d6 staining * Failed MEF Live cell – repro- 40 ** *** imaging gramming 20 e Tuj1+ * ECM, ECM, 120 Tuj1– collagen Neurogenesis transcription (norm. to Gapdh ) 0 Ascl1 transcript level Ascl1 eGFP log [FPKM] Fluor. 2 MEF d2 d5 d7 d10 d12 012 012 b 60 Ascl1 Tubb3 eGFP: NA NA – + –++ –+– Intensit y 10 40 Average EGFP-Ascl1 [FPKM]) Ascl1 2 0 0 30 3456 Expression (log MEF GFP– GFP+ MEF GFP– GFP+ Days post Ascl1 induction Figure 2 | Transgenic Ascl1 silencing explains early reprogramming RNA and protein levels of Ascl1 are significantly higher in Tau–eGFP+ failure. a, Hierarchical clustering of day 5 cells using genes correlating cells, and gradually decrease in Tau–eGFP− cells (*​P <​ 0.05, **​ ​P <​ 0.01, positively and negatively with PC1 and PC2 from PCA of day 5 Ascl1-only **​ *​ ​P <​ 0.001, two-tailed t-test; error bars, s.e.m.). d, Schematic for live cells. Note that eGFP fluorescence intensity and Ascl1 mRNA expression cell imaging experiment. CD1 MEFs were infected with an eGFP–Ascl1 shown in the left side bar appear correlated. b, Violin plots show the construct at −​1 day, induced with doxycycline at day 0, switched to N3 distribution of Ascl1 and neuronal marker Tubb3 in day 0 MEFs, as well media at day 1 and imaged between 3 and 6 days post doxycycline. Cells as Tau–eGFP+ and Tau–eGFP− day 5 cells. c, qRT–PCR for exogenous were fixed at 6 days and stained for Tubb3 expression. e, Average eGFP– Ascl1 expression (top, n =​ 4, biological replicates) and western blot of Ascl1 intensity (error bars, s.e.m.) was plotted at 45-min intervals for Ascl1 protein levels (bottom, Supplementary Data 5 ) for unsorted control Tuj1 + (n =​ 10) and Tuj1− (n =​ 12) cells between day 3 and day 6. Tuj1+ MEFs and day 2 cells (NA, not applicable), as well as day 5, day 7, day 10 cells significantly (one-tailed t-test, P <​ 0.05) increased Ascl1 expression and day 12 cells FAC-sorted using Tau–eGFP as a neuronal marker. Both through time compared to Tuj1− cells, which appeared to silence Ascl1. in the five day 22 BAM cells that expressed a myogenic program. To and neurogenic fates at day 22 based on immunostaining and directly address this question, we infected MEFs with Ascl1 alone qRT–PCR (Fig. 3e, Extended Data Fig. 6f–i). Indeed, myocyte mark- or in combination with Brn2 and/or Myt1l and assessed myogenic ers (Myh3, Myo18b, Tnnc2) were upregulated in Tau–eGFP-positive versus negative cells and were strongly repressed when Brn2 and/or a b c d d22 cells Myt1l was overexpressed together with Ascl1. Moreover, Brn2 and Syp Neuron BAM Ascl1

[FPKM ] lo g Myt1l enhanced the expression of the synaptic genes Gria2, Nrxn3, Tubb3 Myocyte 2 100 Lrrtm4 Stmn3, and Snap25 but not the immature pan-neuronal genes Gria2 Syngr3 Syp 0 Tubb3 and Map2. As expected, fibroblast markers were repressed in Kif5a + Tnnc2 Myocyte Eno2 Syt4 15 Tau–eGFP cells. Stmn2 0 Snap25 We next set out to reconstruct the reprogramming path from Snhg11 Fibroblast Stmn3 MEFs to iN cells. By deconstructing heterogeneity at each time point

tSNE 2 Cox6a2 Cryab Eln Fibroblast Hspb8 as described above, we removed cells that appeared stalled in repro- Mybpc1 Neuron Myl1 –100 Tnni1 Fluor. gramming due to Ascl1 silencing or cells converging on the alternative d20/22 Ascl1 Acta1 Cox8b d22 BAM Pgam2 myogenic fate. We used quadratic programming to order the cells based Tpm2 0 –40 040 Tnnt2 on fractional similarity to MEF and neuron bulk transcriptomes. This Expression Correlation eGFP tSNE 1 Low High Low High Ascl1 10 revealed a continuum of intermediate states through the 22-day repro-

e eGFP Dapi Myh3 Tubb3 Merged 1 gramming period (Fig. 4a, b). Notably, the total number of transcripts per single cell decreased as a function of fractional neuron identity cells + (Extended Data Fig. 7a). Our ordering of cells based on fractional iden- 15 0 tities correlated well with pseudotemporal ordering using Monocle , Ascl1-only 1 an alternative algorithm for delineating differentiation paths (Extended Data Fig. 7b–d). Heat map visualization of genes identified by PCA of all cells on the iN cell lineage revealed two gene regulatory events BA M Fraction of Tau–eGFP 0 during reprogramming with many cells at intermediate stages (Fig. 4c, + + Supplementary Data 7). First, there is an initiation stage where MEFs Myh3 Tubb3 exit the cell cycle upon Ascl1 induction, and genes involved in mitosis Figure 3 | iN cell maturation competes with an alternative myogenic cell are turned down or off (such as Birc5, Ube2c, Hmga2). Concomitantly, fate that is repressed by Brn2 and Myt1l. a, tSNE reveals alternative cell fates genes associated with cytoskeletal reorganization (Sept3/4, Coro2b, that emerge during direct reprogramming. Shapes and colours indicate the Ank2, Mtap1a, Homer2, Akap9), synaptic transmission (Snca, Stxbp1, day 20/22 Ascl1-only (dark green) or day 22 BAM-induced (blue) cells. Note + Vamp2, Dmpk, Ppp3ca), and neural projections (Cadm1, Dner, Klhl24, that all cells are Tau–eGFP . b, c, tSNE plot from a with cells coloured based on expression level of marker genes (b), or correlation with bulk RNA-seq Tubb3, Mapt (Tau)) increase in expression. This indicates that Ascl1 data from different purified cell types (neurons24, myocytes25, fibroblasts13; c). induces genes involved in defining neuronal morphology early in the d, Heat map showing expression of genes marking the two alternative fates reprogramming process. The initiation phase is followed by a matura- in day 20/22 Ascl1-only (upper sidebar, dark green) and day 22 BAM (upper tion stage whereby MEF extracelluar matrix genes are turned off and sidebar, blue) Tau–eGFP+ cells. Genes (rows) have the highest positive and genes involved in synaptic maturation are turned on (Syp, Rab3c, Gria2, negative correlation with the first principal component in a PCA analysis on Syt4, Nrxn3, Snap25, Sv2a). These results are consistent with previous all day 20/22 cells and all genes. Columns represent 121 single cells, ordered findings that Tuj1+ cells with immature neuron-like morphology can based on their correlation coefficient with the first principal component. be found as early as three days after Ascl1 induction, while functional Lower sidebars, Ascl1 transcript level and Tau–eGFP fluorescence for each synapses are only formed 2 to 3 weeks into the reprogramming pro- cell. e, Immunofluorescent detection of Tau–eGFP (green), DAPI (blue), 8 Myh3 (red) and Tubb3 (cyan) for day 22 cells infected with Ascl1 alone, or cess . Finally, we constructed a transcription regulator network on the with all BAM factors. Images are representative of four biological replicates. basis of pairwise correlation of transcription regulator expression across Right, mean fractions of eGFP+ cells that express either Tubb3 or Myh3. Only all stages of the MEF-to-iN cell reprogramming. This revealed three Tubb3 + cells with a neuronal morphology were counted. Six or seven images densely connected sub-networks identifying transcription regulators were analysed for each of four biological replicates. Error bars, s.e.m. influencing MEF cell biology, iN cell initiation, and iN cell maturation

00 MONTH 2016 | VOL 000 | NATURE | 3 © 2016 Macmillan Publishers Limited. All rights reserved RESEARCH LETTER a c d0 d2 d5 d22 0 1 PC1 - Initiation PC2 - Maturation 0 8 Fraction neuron identit y + correlation – correlation–+ correlation correlation 0.75 0.75 d0 d2 0.50 0.50 d5

0.25 0.25 d2 2 Fraction MEF identity

4,000

2,000 Residual 0100 200 Cells ordered according to fraction neuron identity b M phase Synaptosome Synaptic transmission Extracellular matrix Cell cycle Neuron projection Neurotransmitter transport Collagen bril organization 0.6 Negative reg. of kinase activity Transmission of nerve impulse Lysosome Ascl 1 log [FPKM] Reg. of synaptic transmission Neuron projection Reg. of cell proliferation 0.2 2

Vesicle-mediated transport Skeletal system development eGFP uor . Experiment Fraction 0510 15 20

neuron identity Time post Ascl1 induction (days) 012 d Fraction neuron e f 0 1 NPCNIPC euron Prolif. MEF subnetwork Initiation subnetwork Maturation subnetwork d0 Tcf12 d2 Klf10 Dlx3 Fraction NPC identit y d5 d0 0.6 Gata3 Hes6 Hsf2 Onecut2 Sp8 0.6 d22 Sox9 Atoh8 L3mbtl4 d2 Runx2 Foxn3 Maf Smarce1 Zmiz2 Nr3c1 Id3 Tfdp2Scx Sox11 Tox3 d5 Lass5 Tfdp1 Hbp1 2 Atf4 Klf6 Zfp612 0.4 0.4 2 Fosl2 Tcf7l2 Id2 Zfp238 Myt1 Peg3 Neurod1 d Zfp948 Trp53 Prrx1 Pou3f2 Litaf Id1 Tcf4 Ascl1 Ddit3 Zfp367 Nfib Nfia Zfp941 Zfp523 Csda Lass2 Kdm5b Camta1 Hmga1-rs1 Ets1 Sox4 Arnt2 0.2 0.2 Hmgb2 Insm1 St18 Hmga1 Hmgb1 3 x4 Wdhd1 Csdc2 Etv1 x6 x2 x9 x1 1 bb Dc x Gli3 Thrb bp 7 Fosl1 Tcf4

Hmga2 Tbr1

Zfp423 Fraction MEF identity Pa So So So Hes6 Hes1 Mapt Ascl 1 Insm 1 Mki67 Tu So Fa Mtap 2 Zkscan16 Spag 5 Foxm1 Ccnb 1 Eome s Stat6 Bcl11b Snap2 5 Neurod 6 Bhlhe40 Ssrp1 Tub Myt1l Lass4 0 0 Neurod 4 Mybl2 Foxs1 0.0 0.2 0.4 0.6 Experiment Pbx4 0 10

Fraction neuron identity Fraction neuron log2[FPKM]

Figure 4 | Reconstructing the direct reprogramming path from identity (yellow/red). Right sidebars, Ascl1 transcript levels (log2[FPKM], MEFs to iN cells. a, Top, for each cell on the iN cell reprogramming blue/yellow) and eGFP fluorescence intensities (log10[RFU], black/ path, the similarity to bulk RNA-seq from either MEFs13 or neurons24 white; RFU, relative fluorescence units). d, Transcriptional regulator was calculated using quadratic programming and plotted as fractional covariance network during iN cell lineage progression. Shown are nodes identities (left axis, circle, fractional MEF identity; right axis, triangle, (transcriptional regulators) with more than three edges, with each fractional neuron identity). Points are coloured based on the experimental edge reflecting a correlation >​0.25 between connected transcriptional time point. Bottom, Lagrangian residuals of the quadratic programming regulators. e, Fractional MEF (left axis) or fractional neural precursor cell for each single cell ordered based on their fractional identity as above. (NPC) identities (right axis) are plotted against fractional neuron identity Points are coloured based on the experimental time point. b, Fractional for single cells on the MEF-to-iN cell lineage. Points are shaped based neuron identities of all cells on the iN cell reprogramming path are shown on the experiment. f, Expression of selected genes (columns) that mark as a function of the experimental time point. c, Ordering of single cells NPCs, intermediate progenitor cells (IPCs), neurons, or proliferating cells (rows) according to fractional neuron identity revealed a cascade of gene (Prolif.) are shown for cells on the iN cell lineage (rows). Left sidebars, expression changes leading to neuronal identity. Genes (columns) with fractional neuron identity (yellow/red) and experimental time point the highest positive and negative correlation to PC1 and PC2 are shown. (green/blue). Left sidebars, experimental time point (green/blue) and fractional neuron

(Fig. 4d, Extended Data Fig. 8, Supplementary Data 8, Supplementary similar to that which was observed for induced pluripotent stem cell Information). Notably, Ascl1 was found to strongly positively correlate reprogramming21–23 with the transcription regulators in both the initiation and maturation A fundamental question in cell reprogramming is whether there are subnetworks and negatively correlate with transcription regulators spe- pre-determined mechanisms that prevent the majority of the fibro- cific to MEFs. This data corroborates evidence that persistent Ascl1 blasts from reprogramming or whether all donor cells are competent expression is required to maintain chromatin states conducive to iN to reprogram but the reprogramming procedure is inefficient. We did cell maturation13. not observe any MEF subpopulations, other than cell cycle variation, It has been suggested that direct somatic lineage reprogramming that suggested differences in the capacity to initiate reprogramming. may not involve an intermediate progenitor cell state as seen dur- Furthermore, we observed that 48 h after infection the majority of the ing induced pluripotent stem cell differentiation17–19. However, our cells induced Ascl1-target genes and silenced MEF-associated genes. fractional analysis showed that the identity of intermediate repro- This does not preclude the possibility that underlying epigenetic vari- gramming cells could not be explained by a simple linear mixture of ation in donor cells influences reprogramming outcomes; however, our the differentiated fibroblast and neuron identities, as revealed by an analysis suggests that it is unlikely that MEF heterogeneity contributes intermediary increase of Lagrangian residuals (Fig. 4a). Therefore, significantly to reprogramming efficiency. We found that divergence we tested whether a neural precursor cell (NPC) state is transiently from the neuronal differentiation path into an alternative myogenic induced by adding NPC bulk transcriptome data along with that of fate, as well as Ascl1 transgene silencing, were both significant factors MEFs and neurons into the quadratic programming analysis (Fig. 4e). contributing to reprogramming efficiency. Though Ascl1 induces lin- We found that the fractional NPC identity of cells increased specif- eage conversion, it is inefficient in restricting cells to the neuronal fate. ically for cells at intermediate positions on the MEF-to-iN cell line- This suggests that intermediate stages of iN cell progression are unsta- age path, and then decreased as a function of iN cell maturation. In ble, perhaps due to epigenetic barriers, and additional factors promote addition, several NPC genes (that is, Gli3, Sox9, , Fabp7, Hes1) cells to permanently acquire neuron-like identity, rather than revert are expressed in intermediates of the iN cell reprogramming path20 to MEF-like or diverge towards the alternative myocyte-like fate. In (Fig. 4f). However, canonical NPC marker genes such as and summary, we present a single-cell transcriptomic approach that can Pax6 were never induced. This indicates that cells do not go through a be used to dissect direct cellular reprogramming pathways or devel- canonical NPC stage, yet a unique intermediate transcriptional state is opmental programs in which cells transform their identity through a induced transiently that is unrelated to donor and target cell program series of intermediate states.

4 | NATURE | VOL 000 | 00 MONTH 2016 © 2016 Macmillan Publishers Limited. All rights reserved LETTER RESEARCH

Online Content Methods, along with any additional Extended Data display items and 20. Camp, J. G. et al. Human cerebral organoids recapitulate gene expression Source Data, are available in the online version of the paper; references unique to programs of fetal neocortex development. Proc. Natl Acad. Sci. USA 112, these sections appear only in the online paper. 15672–15677 (2015). 21. Di Stefano, B. et al. C/EBPα​ poises B cells for rapid reprogramming into Received 10 March 2015; accepted 17 May 2016. induced pluripotent stem cells. Nature 506, 235–239 (2014). 22. Lujan, E. et al. Early reprogramming regulators identified by prospective Published online 8 June 2016. isolation and mass cytometry. Nature 521, 352–356 (2015). 23. Takahashi, K. et al. Induction of pluripotency in human somatic cells via a 1. Xu, J., Du, Y. & Deng, H. Direct lineage reprogramming: strategies, mechanisms, transient state resembling primitive streak-like mesendoderm. Nat. Commun. and applications. Cell Stem Cell 16, 119–134 (2015). 5, 3678 (2014). 2. Arlotta, P. & Berninger, B. Brains in metamorphosis: reprogramming cell 24. Zhang, Y. et al. An RNA-sequencing transcriptome and splicing database of identity within the central . Curr. Opin. Neurobiol. 27, 208–214 glia, neurons, and vascular cells of the cerebral cortex. J. Neurosci. 34, (2014). 11929–11947 (2014). 3. Graf, T. Historical origins of and reprogramming.Cell Stem 25. Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals Cell 9, 504–516 (2011). unannotated transcripts and isoform switching during cell differentiation. 4. Treutlein, B. et al. Reconstructing lineage hierarchies of the distal lung Nat. Biotechnol. 28, 511–515 (2010). epithelium using single-cell RNA-seq. Nature 509, 371–375 (2014). 5. Shalek, A. K. et al. Single-cell transcriptomics reveals bimodality in expression Supplementary Information is available in the online version of the paper. and splicing in immune cells. Nature 498, 236–240 (2013). 6. Zeisel, A. et al. Brain structure. Cell types in the mouse cortex and Acknowledgements The authors would like to acknowledge B. Passarelli and hippocampus revealed by single-cell RNA-seq. Science 347, 1138–1142 B. Vernot for discussions regarding bioinformatic pipelines, P. Lovelace for support (2015). with FACS and other Quake and Wernig laboratory members for discussions 7. Ramsköld, D. et al. Full-length mRNA-seq from single-cell levels of RNA and and support. This work was supported by NIH grant RC4NS073015-01 individual circulating tumor cells. Nat. Biotechnol. 30, 777–782 (2012). (M.W., S.Q.R., B.T.), the Stinehart-Reed Foundation, the Ellison Medical 8. Vierbuchen, T. et al. Direct conversion of fibroblasts to functional neurons by Foundation, the New York Stem Cell Foundation, CIRM grant RB5-07466 (all to defined factors.Nature 463, 1035–1041 (2010). M.W.), a National Science Scholarship from the Agency for Science, Technology 9. Pfisterer, U.et al. Direct conversion of human fibroblasts to dopaminergic and Research (Q.Y.L.), NIH grant GM092925 (S.A.M.S., J.S.), the German neurons. Proc. Natl Acad. Sci. USA 108, 10343–10348 (2011). Research Foundation (M.M.) and a PhRMA foundation Informatics fellowship 10. Yoo, A. S. et al. MicroRNA-mediated conversion of human fibroblasts to (J.G.C.). S.R.Q. is an investigator of the Howard Hughes Medical Institute. M.W. is neurons. Nature 476, 228–231 (2011). a New York Stem Cell Foundation (NYSCF) Robertson Investigator and a Tashia 11. Ambasudhan, R. et al. Direct reprogramming of adult human fibroblasts to and John Morgridge Faculty Scholar at the Child Health Research Institute at functional neurons under defined conditions.Cell Stem Cell 9, 113–118 Stanford. (2011). 12. Caiazzo, M. et al. Direct generation of functional dopaminergic neurons from Author Contributions B.T., Q.Y.L., M.W. and S.R.Q. conceived the study and mouse and human fibroblasts.Nature 476, 224–227 (2011). designed the experiments. Q.Y.L. performed direct reprogramming, qRT–PCR 13. Wapinski, O. L. et al. Hierarchical mechanisms for direct reprogramming of and western blot experiments; B.T., Q.Y.L., and S.S. performed single-cell fibroblasts to neurons.Cell 155, 621–635 (2013). RNA-seq experiments; N.F.N. assisted with single-cell RNA-seq experiments 14. Chanda, S. et al. Generation of induced neuronal cells by the single and sequenced the libraries; Q.Y.L., S.A.M.S. and M.M. performed time-lapse reprogramming factor ASCL1. Stem Cell Rep. 3, 282–296 (2014). imaging experiments. B.T., J.G.C. and W.K. analysed single-cell RNA-seq data, 15. Trapnell, C. et al. The dynamics and regulators of cell fate decisions are Q.Y.L. analysed qRT–PCR and time-lapse imaging data, J.M.S., M.W. and S.R.Q. revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32, provided intellectual guidance in the interpretation of the data. B.T., Q.Y.L., J.G.C., 381–386 (2014). M.W., and S.R.Q. wrote the paper. 16. Buettner, F. et al. Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Author Information The single-cell RNA-seq data were deposited on NCBI GEO Nat. Biotechnol. 33, 155–160 (2015). with the accession number GSE67310. Reprints and permissions information is 17. Merkle, F. T. & Eggan, K. Modeling human disease with pluripotent stem available at www.nature.com/reprints. The authors declare competing financial cells: from genome association to function. Cell Stem Cell 12, 656–668 interests: details are available in the online version of the paper. Readers are (2013). welcome to comment on the online version of the paper. Correspondence and 18. Perrier, A. L. et al. Derivation of midbrain dopamine neurons from human requests for materials should be addressed to S.R.Q. ([email protected]) and embryonic stem cells. Proc. Natl Acad. Sci. USA 101, 12543–12548 M.W. ([email protected]). (2004). 19. Li, X. J. et al. Specification of motoneurons from human embryonic stem cells. Reviewer Information Nature thanks F. Tang and the other anonymous Nat. Biotechnol. 23, 215–221 (2005). reviewer(s) for their contribution to the peer review of this work.

00 MONTH 2016 | VOL 000 | NATURE | 5 © 2016 Macmillan Publishers Limited. All rights reserved RESEARCH LETTER

METHODS distinguishing more than one cell from the reference genome. We clustered cells No statistical methods were used to predetermine sample size. The experiments based on their genotype (homozygous reference, heterozygous, homozygous alter- were not randomized and the investigators were not blinded to allocation during nate), and identified cells that were strongly different from the reference genome. experiments and outcome assessment. These cells expressed either astrocyte (Gfap) or microglia marker genes suggesting Cell derivation, cell culture and iN cell generation. Tau–eGFP reporter MEFs, they were contaminants from the feeder cell culture. We removed these cells from tested negative for mycoplasma contamination, were isolated, infected with doxy­ subsequent analyses. cycline (dox)-inducible lentiviral constructs and reprogrammed into iN cells as Approximate number of transcripts was calculated from FPKM values by using previously described8. Day 0 (d0) cells were uninfected MEFs that served as a the correlation between number of transcripts of exogenous spike-in mRNA negative control. Day 2 (d2) cells were infected with Ascl1 and harvested two days sequences and their respective measured mean FPKM values (Extended Data after dox-induction. Day 5 (d5) cells were infected with Ascl1, FAC-sorted for Tau– Fig. 2). The number of spike-in transcripts per single cell lysis reaction was calcu- eGFP+ and Tau–eGFP− cells five days after dox induction and the two cell popu- lated using the concentration of each spike-in provided by the vendor (Ambion, lations were mixed again in a 1:1 ratio. Day 20 or 22 (d20/d22) cells were infected Life Technologies), the approximate volume of the lysis chamber (10 nl) as well as either with Ascl1 alone, or combined with Brn2 and Myt1l, plated with glia seven the dilution of spike-in transcripts in the lysis reaction mix (40,000×). Transcript days post dox induction, and FAC-sorted for Tau–eGFP+ iN cells 20 or 22 days levels were converted to the log-space by taking the logarithm to the base 2 after dox induction. Each of these groups was then loaded onto separate micro- (Supplementary Data 1). R studio37 (https://www.rstudio.com/) was used to run fluidic mRNA-seq chips for preparation of pre-amplified cDNA from single cells. custom R38 scripts to perform principal component analysis (PCA, FactoMineR Clonal Ascl1-inducible MEFs were derived as previously described13. Twelve- package), hierarchical clustering (stats package), variance analysis and to construct well plates were coated with Matrigel and incubated at 37 °C overnight. 350,000 heat maps, correlation plots, box plots, scatter plots, violin plots, dendrograms, cells were then plated per well and kept in MEF media. Dox was added a day after bar graphs, and histograms. Generally, ggplot2 and gplots packages were used to plating. For single-cell RNA-seq, cells were harvested two days post dox induction generate data graphs. and loaded onto a microfluidic mRNA-seq chip. To evaluate efficiency in repro- The Seurat package39,40 implemented in R was used to identify distinct cell gramming, MEF +​ dox media was switched out for N3 +​ dox media after 48 h, populations present at d22 of Ascl1-only and BAM reprogramming (Fig. 3a–d). and cells were fixed for immunostaining 12 days post dox. t-distributed stochastic neighbour embedding (tSNE) was performed on all Capturing of single cells and preparation of cDNA. Single cells were captured on d20/d22 cells using the most significant genes (P <​ 1 ×​ 10−3, with a maximum of a medium-sized (10–17 μ​m cell diameter) microfluidic RNA-seq chip (Fluidigm) 100 genes per principal component) that define the first three principal components using the Fluidigm C1 system. Cells were loaded onto the chip at a concentration of a PCA analysis on the data set. To further estimate the identity of each cell on of 350–500 cells μ​l−1, stained for viability (live/dead cell viability assay, Molecular the tSNE plot, we colour coded cells based on Pearson correlation of each single Probes, Life Technologies) and imaged by phase-contrast and fluorescence cell’s expression profile with the expression profile of bulk cortical neurons13,24, microscopy to assess number and viability of cells per capture site. For d5 and d22 myocytes25, and MEFs13 (Fig. 3). The Monocle package15 was used to order cells experiments, cells were only stained with the dead stain ethidium homodimer on a pseudo-time course during MEF to iN cell reprogramming (Extended Data (emission ~​635 nm, red channel) and Tau–eGFP fluorescence was imaged in the Fig. 7). Covariance network analysis and visualizations were done using igraph green channel. Only single, live cells were included in the analysis. cDNAs were implemented in R41 (http://igraph.sf.net). prepared on chip using the SMARTer Ultra Low RNA kit for Illumina (Clontech). To generate PCA plots and heat maps in Figs 1c–e, 2a, 3a and 4c, PCA was per- ERCC (External RNA Controls Consortium) RNA spike-in Mix (Ambion, Life formed on cells using all genes expressed in more than two cells and with a variance 26,27 Technologies) was added to the lysis reaction and processed in parallel to in transcript level (log2[FPKM]) across all single cells greater than 2. This threshold cellular mRNA. Tau–eGFP fluorescence intensity of each single cell was deter- resulted generally in about 8,000–12,000 genes. Subsequently, genes with the high- mined using CellProfiler28 by first identifying the outline of the cell in the image of est PC loadings (highest (top 50–100) positive or negative correlation coefficient the respective capture site and then integrating over the signal in the eGFP channel. with one of the first one to two principal components) were identified and a heat RNA-seq library construction and cDNA sequencing. Size distribution and con- map was plotted with genes ordered based on their correlation coefficient with the centration of single-cell cDNA was assessed on a capillary electrophoresis based respective PC (Figs 1e, 2a, 4c). Cells in rows were ordered based on unsupervised fragment analyser (Advanced Analytical Technologies) and only single cells with hierarchical clustering using Pearson correlation as distance metric (Figs 1e, 2a) high quality cDNA were further processed. Sequencing libraries were constructed or based on their fractional identity as determined by quadratic programming in 96-well plates using the Illumina Nextera XT DNA Sample Preparation kit (Fig. 4c, see below) according to the protocol supplied by Fluidigm and as described previously29. Gene ontology enrichment analyses were performed using DAVID Libraries were quantified by Agilent Bioanalyzer using High Sensitivity DNA Bioinformatics Resources 6.7 of the National Institute of Allergy and Infectious analysis kit as well as fluorometrically using Qubit dsDNA HS Assay kits and a Diseases42. Functional annotation clustering was performed and GO terms repre- Qubit 2.0 Fluorometer (Invitrogen, Thermo Fisher Scientific). Up to 110 single-cell sentative for top enriched annotation clusters are shown in Fig. 1f, Extended Data libraries were pooled and sequenced 100 bp paired-end on one lane of Illumina Figs 1e and 4a with their Bonferroni corrected P values. In addition, results of GO HiSeq 2000 or 75 bp paired-end on one lane of Illumina NextSeq 500 to a depth of enrichment analyses are provided in the Supplementary Data. 1–7 million reads. CASAVA 1.8.2 was used to separate out the data for each single To express a single cell transcriptome as a linear combination of primary cell cell using unique barcode combinations from the Nextera XT preparation and to type transcriptomes, we used published bulk RNA-seq data sets for primary murine generate *.fastq​ files. In total, the transcriptome of a total of 405 cells was measured neurons24, myocytes25, and embryonic fibroblasts13 (Extended Data Fig. 6b, c), from the following eight independent experiments: d0 (73 cells, 1 experiment), d2 neurons24 and embryonic fibroblasts13 (Fig. 4a) or neurons24, embryonic fibro- (Ascl1-only in regular MEFs, 81 cells, 1 experiment; Ascl1-only in clonal MEFs, blasts13 and neuronal progenitor cells13 (Fig. 4e). In each quadratic programming 47 cells, 1 experiment), d5 (Ascl1-only, 55 cells, 1 experiment) and d20 (Ascl1-only, analysis, we first identified genes that were specifically (log2 fold change of 3 or 33 cells, 1 experiment) and d22 (BAM, 43 cells, 1 experiment; Ascl1-only, 34 and higher) expressed in each of the bulk data sets compared to the respective others 39 cells, 2 independent experiments). See Supplementary Data 1 for the transcrip- (Supplementary Data 6). Using these genes, we then calculated the fractional iden- tome data for all 405 cells with annotations (quantification in log2[FPKM]). tities of each single cell using quadratic programming (R package ‘quadprog’). The Processing, analysis and graphic display of single cell RNA-seq data. Raw reads resulting fractional neuron identities of cells on the MEF-to-iN cell reprogram- were pre-processed with sequence grooming tools FASTQC30, cutadapt31, and ming path (265 cells in total, excluding cells that were Tau–eGFP-negative at d5 or PRINSEQ32 followed by sequence alignment using the Tuxedo suite (Bowtie33, myocyte- and fibroblast-like cells at d22) were used to order cells in a pseudo- Bowtie234,TopHat35 and SAMtools36) using default settings. Transcript levels temporal manner (Fig. 4a–c, e, f). We compared this fractional neuron identity were quantified as fragments per kilobase of transcript per million mapped reads based cell ordering with pseudo-temporal ordering of cells based on Monocle (FPKM) generated by TopHat/ Cufflinks25. (Extended Data Fig. 7b–d), an algorithm that combines differential dimension After seven days of reprogramming, Tau–eGFP reporter MEFs (with C57BL/6J reduction using independent component analysis with minimal spanning tree and 129S4/SvJae background) were co-cultured with glia derived from CD-1 mice. construction to link cells along a pseudotemporally ordered path15. Monocle To determine if any feeder cells contaminated the 20–22-day time points, we used analysis was performed using genes differentially expressed between neuron24 and the single cell RNA-seq reads to identify positions that differ from the mouse embryonic fibroblast13 bulk RNA-seq data (same gene set that was used when reference genome (mm10, built from strain C57BL/6J mice). We used the mpileup calculating fractional neuron and fibroblast identities in Fig. 4a, genes listed in fuction in samtools to generate a multi-sample variant call format file (vcf), and a Supplementary Data 6). custom python script to genotype the cells by requiring coverage in all cells for all For the transcription factor network analysis (Fig. 4d), we computed a pairwise positions, with a coverage depth of five reads, a phred GT likelihood =​ 0 for called correlation matrix (Pearson correlation, visualized in correlogram in Extended genotype and ≥​40 for next-best genotype. This resulted in 95 informative sites Data Fig. 8a) for transcriptional regulators annotated as such in the Animal

© 2016 Macmillan Publishers Limited. All rights reserved LETTER RESEARCH

Transcription Factor Database (http://www.bioguo.org/AnimalTFDB/)43 and Sox11 (forward: CCTGTCGCTGGTGGATAAGG, reverse: CTGCGCCTCTC identified those transcriptional regulators (TRs) with a Pearson correlation of AATACGTGA); Sox9 (forward: CGAGCACTCTGGGCAATCTCA, reverse: greater than 0.35 with at least five other TRs (82 TRs, shown in Extended Data ATGACGTCGCTGCTCAGTTC); Tcf4 (forward: CAGTGCGATGT Fig. 8b). We used a permutation approach to determine the probability of finding TTTCGCCTC, reverse: ATGTGACCCAAGATCCCTGC); Tcf12 (forward: TRs meeting this threshold by chance. We performed 500 random permutations GTCTCGAATGGAAGACCGCT, reverse: GTTCCGACCATCGAAGCTGA). of the expression matrix of all TRs across cells on the MEF-to-iN cell lineage, and Maturation factors. Camta1 (forward: CCCCTAAGACAAGACCGCAG, calculated the pairwise correlation matrix for each permutation of the input data reverse: ACATAGCAGCCGTACAAGCA); Insm1 (forward: GACCCGG frame. All randomized data frames resulted in 0 TRs that met our threshold. This CACATCAACAAGT, reverse: GAAGCGAAGCGAAGAGGACA); Myt1l shows that our correlation threshold is strict, and all nodes and connections that we (forward: ATGTTCCCACAACCACACCA, reverse: TACCGCTTGGCATCG present in the TR network are highly unlikely to be by chance. We used the pairwise TCATA); St18 (forward: TGCCAAGGGAGCTGAGATAGA, reverse: GAAGG correlation matrix for the selected TRs as input into the function graph.adjacency() CTGCTTGCGTTGAAT). of igraph implemented in R41 (http://igraph.sf.net) to generate a weighted network Neuronal genes. Gria2 (forward: GGGGACAAGGCGTGGAAATA, graph, in which the selected TRs are presented as vertices and all pairwise corre- reverse: GTACCCAATCTTCCGGGGTC); Map2 (forward: CAGAGAAA lations >​0.25 are presented as edges linking the respective vertices. The network CAGCAGAGGAGGT, reverse: TTTGTTCTGAGGCTGGCGAT); Nrxn3 (forward: graph was visualized using the Fruchterman–Reingold layout and the three clear TGTGAACCAAGTACAGATAAGAGT, reverse: CAGCTCAGGGGAC subnetworks (MEF, initiation, maturation) were manually colour coded. AAAGAGG); Snap25 (forward: TTCATCCGCAGGGTAACAAA, We used Pearson correlation of each single cell expression profile with the reverse: GTTGCACGTTGGTTGGCTT); Stmn3 (forward: AGCACCGT expression profile of bulk cortical neurons13,24, myocytes25, and MEFs13 to further ATCTGCCTACAAG, reverse: TGGTAGATGGTGTTCGGGTG); Tubb3 (forward: estimate the identity of each single cell and to estimate when alternative fates CAGATAGGGGCCAAGTTCTGG, reverse: GTTGTCGGGCCTGAATAGGT). emerge (Fig. 3c, Extended Data Fig. 6d, e). For this analysis, we considered the same Myocyte genes. Acta1 (forward: CTAGACACCATGTGCGACGA, reverse: cell type specific gene sets that were used in the quadratic programming analysis, CATACCTACCATGACACCCTGG); Myh3 (forward: AAATGAAGGGGACG that is, were genes specifically expressed (log2 fold change of 3 or higher) in a CTGGAG, reverse: CAGCTGGAAGGTGACTCTGG); Myo18b (forward: respective bulk RNA-seq data set compared to the others (Supplementary Data 6). TGCCCTCTTCAGGGAAGGTA, reverse: GAGCTTCTCCACTGACACCC); To estimate intercellular heterogeneity of d0 MEFs, we calculated the variance Tnnc2 (forward: CAACCATGACGGACCAACAG, reverse: GTGTCTGCC for each gene across all MEF cells as well as across mouse embryonic stem cells CTAGCATCCTC). under 2iLIF culture conditions44 and across glioblastoma cells45. We then plotted Fibroblast genes. Col1a2 (forward: AGTCGATGGCTGCTCCAAAA, reverse: the distribution of variances for all genes per cell population as box plots. ATTTGAAACAGACGGGGCCA); Dcn (forward: GCAAAATCAGT Quantitative RT–PCR and immunostaining. Ascl1 infected Tau–eGFP reporter CCAGAGGCA, reverse: CGCCCAGTTCTATGACAAGC). MEFs were FAC-sorted 5, 7, 10, 12 or 22 days post-Ascl1 induction with dox. RNA was then extracted from both Tau–eGFP positive and negative populations 26. Baker, S. C. et al. The External RNA Controls Consortium: a progress report. from each time point, as well as uninfected control MEFs and unsorted d2 Ascl1- Nat. Methods 2, 731–734 (2005). infected MEFs using the TRIzol RNA isolation protocol (Invitrogen, 15596-018). 27. Jiang, L. et al. Synthetic spike-in standards for RNA-seq experiments. Genome Reverse transcription into cDNA was performed using the SuperScript III First- Res. 21, 1543–1551 (2011). strand Synthesis System (Invitrogen, 18080-051) and qRT–PCR was performed 28. Carpenter, A. E. et al. CellProfiler: image analysis software for identifying and using Sybr Green (Thermo Fisher Scientific, 4309155). Immunostaining was per- quantifying cell phenotypes. Genome Biol. 7, R100 (2006). 8 29. Wu, A. R. et al. Quantitative assessment of single-cell RNA-sequencing formed as previously described . Antibodies and qRT–PCR primers are listed in methods. Nat. Methods 11, 41–46 (2014). the Supplementary Information. 30. Babraham Institute. Babraham Bioinformatics. FASTQC. http://www. Time-lapse imaging of Ascl1 expression. MEFs were isolated from E13.5 CD-1 bioinformatics.bbsrc.ac.uk/projects/fastqc embryos (Charles River) and infected with a dox-inducible, N-terminal-tagged 31. Martin, M. Cutadapt removes adapter sequences from high-throughput eGFP–Ascl1 fusion construct using the protocol previously described1. Cells were sequencing reads. EMBnet.journal 17, 10–12 (2011). 32. Schmieder, R. & Edwards, R. Quality control and preprocessing of plated on 35 cm glass bottom dishes (MatTek), coated with polyorthinine (Sigma metagenomic datasets. Bioinformatics 27, 863–864 (2011). P3655) and laminin (Invitrogen 23017-015). Imaging experiments were performed 33. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory- between 3 and 6 days post dox induction, in a temperature- and CO2-controlled efficient alignment of short DNA sequences to the .Genome chamber. Images were taken for up to 10 positions per dish, for 3 dishes, every Biol. 10, R25 (2009). 45 min with a Zeiss AxioVert 200M microscope with an automated stage using 34. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. × × Nat. Methods 9, 357–359 (2012). an EC Plan-Neofluar 5 ​/0.16 NA Ph1 objective or an A-plan 10 ​/0.25 NA Ph1 35. Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions objective. Cells were fixed at 6 days and immunostained using Tuj1 antibodies rec- with RNA-seq. Bioinformatics 25, 1105–1111 (2009). ognizing neuronal Tubb3 (Covance MRB-435P) to confirm neuronal identity. We 36. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics used ImageJ to segment individual cells and measure the level of GFP for 7 Tuj1+ 25, 2078–2079 (2009). cells and 7 Tuj1− cells over time. Average intensity was obtained by normalizing 37. RStudio. Integrated Development for R. RStudio, Inc., Boston, MA URL the average intensity of a cell segment by the average background intensity of an http://www.rstudio.com/ (2015). + 38. R Core Team. R: A language and environment for statistical computing. adjacent segment of the same size. A t-test was performed comparing Tuj1 and R Foundation for Statistical Computing. http://www.R-project.org/ − Tuj1 cells at each time point to evaluate significance. 39. Macosko, E. Z. et al. Highly parallel genome-wide expression profiling of Antibodies. Rabbit anti-Ascl1 (Abcam ab74065), chicken anti-GFP (Abcam individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015). ab13970), rabbit anti-Tubb3 (Covance MRB-435P), mouse anti-Tubb3 (Covance 40. Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction MMS-435P), mouse anti-Map2 (Sigma M4403), rabbit anti-Myh3 (Santa of single-cell gene expression data. Nat. Biotechnol. 33, 495–502 (2015). 41. Csardi, G. & Nepusz, T. The igraph software package for complex network β Cruzsc-20641), goat anti-Dlx3 (Santa Cruz sc-18143), mouse anti- -​ (Sigma research. InterJournal 1695 (2006). A5441), rabbit anti-Tcf12 (Bethyl A300-754A). 42. Huang, W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis Primers. General. Gapdh (forward: AGGTCGGTGTGAACGGATTTG, of large gene lists using DAVID bioinformatics resources. Nat. Protocols 4, reverse: TGTAGACCATGTAGTTGAGGTCA); Ascl1 (TetO) (forward: CCGAA 44–57 (2009). TTCGCTAGCCACCAT, reverse: AAGAAGCAGGCTGCGGG). 43. Zhang, H. M. et al. AnimalTFDB: a comprehensive animal transcription factor database. Nucleic Acids Res. 40, D144–D149 (2012). Initiation factors. Atoh8 (forward: GCCAAGAAACGGAAGGAGTGA, 44. Kumar, R. M. et al. Deconstructing transcriptional heterogeneity in pluripotent reverse: CTGAGAGATGGTACACGGGC); Dlx3 (forward: CGCCGCTCCAA stem cells. Nature 516, 56–61 (2014). GTTCAAAAA, reverse: GTGGTACCAGGAGTTGGTGG); Hes6 (forward: 45. Patel, A. P. et al. Single-cell RNA-seq highlights intratumoral heterogeneity in TACCGAGGTGCAGGCCAA, reverse: AGTTCAGCTGAGACAGTGGC); primary glioblastoma. Science 344, 1396–1401 (2014).

© 2016 Macmillan Publishers Limited. All rights reserved RESEARCH LETTER

Extended Data Figure 1 | The majority of MEFs are actively undergoing cell cycle, but exit cell cycle upon Ascl1 induction. a, Live cell imaging of Tau–eGFP reporter over the course of BAM-mediated iN cell reprogramming. Tau–eGFP fluorescence normalized to the maximum expression is shown in relation to days post-BAM induction. Tau–eGFP expression began at day 5 and reached a peak at day 8 after induction. Shown are representative images from day 0, day 5 and day 9. b, Box plots of intercellular transcriptome variance showed that MEFs are more heterogeneous than mouse embryonic stem cells under 2iLIF culture conditions44 and less heterogeneous than glioblastoma cells45. c, PCA of genes with most variance in day 0 MEFs revealed MEF heterogeneity (blue, A). Density plot showing the distribution of number of cells along PC1 loading is shown above the PCA plot. d, Heat map and hierarchical clustering of genes used for the PCA in panel c shows to major MEF subpopulations. Each column represents a single cell, and each row a gene. Subpopulation A is highlighted in blue in the dendrogram. e, GO enrichment for genes in c shows that MEF subpopulation A is distinguished by the low or lack of expression of genes enriched for cell cycle terms. f, g, PCA and heat map of the same genes used in panels c–e, this time including day 0 MEFs (circles, light green) and day 2 cells (squares, dark green), showed that most of the day 2 cells had the same cell cycle signature as MEF subpopulation A. Cells in columns of both heat maps are ordered based on PC1 loading.

© 2016 Macmillan Publishers Limited. All rights reserved LETTER RESEARCH

Extended Data Figure 2 | Total number of transcripts per cell decreases number of transcripts per single cell for each experiment. Number of during MEF-to-iN cell reprogramming. a, Average detected transcript transcripts per cell were calculated from the FPKM values of all genes in levels (mean FPKM, log2) for 92 ERCC RNA spike-ins as a function each cell using the correlation between number of transcripts of exogenous of provided number of molecules per lysis reaction for each of the spike-in mRNA sequences and their respective measured mean FPKM 8 independent single-cell RNA-seq experiments. Linear regression fits values (calibration curves are shown in panel a). The total number of through data points are shown. The length of each ERCC RNA spike-in transcripts expressed by a single cell and detected by single-cell RNA-seq is transcript is encoded in the size of the data points. No particular bias highest in MEFs and is more than twofold decreased upon overexpression towards the detection of shorter versus longer transcripts is observed. of Ascl1 or BAM. c, Box plots showing the distribution of the median The linear regression fit was used to convert FPKM values to approximate transcript number per gene across all cells of one experiment. The number of transcripts. b, Box plots showing the distribution of the total distributions are similar over the course of iN cell reprogramming.

© 2016 Macmillan Publishers Limited. All rights reserved RESEARCH LETTER

Extended Data Figure 3 | Clonal MEFs reprogram successfully into iN cells, and Ascl1-only and BAM induce similar responses during early iN cell reprogramming. a, Immunostaining of heterogenous Ascl1-infected MEFs and clonal MEFs with homogenous Ascl1 transgene insertions, fixed 12 days after Ascl1 induction, using rabbit anti-Tubb3 (red) and mouse anti-Map2 (cyan) antibodies and DAPI (blue) as a nuclear stain. Reprogramming efficiencies are comparable regardless of variation in Ascl1 copy numbers. Images are representative for one reprogramming experiment. b, Bar plots showing expression of Ascl1-target genes (Hes6, Zfp238, Snca, Cox8b, Bex1, Dner) and MEF marker genes averaged across single cells from day 0 MEFs and day 2 Ascl1-only cells, as well as from bulk RNA-seq data from MEFs, day 2 BAM, and day 2 Ascl1-only cells. This data shows that the initiation of reprogramming at day 2 is similar for Ascl1-alone and BAM-mediated reprogramming.

© 2016 Macmillan Publishers Limited. All rights reserved LETTER RESEARCH

Extended Data Figure 4 | Failed reprogramming at day 5 correlates with silencing of Ascl1. a, Bonferroni-corrected P values for gene ontology enrichments are shown for each group of genes from Fig. 2a, with representative genes listed (Supplementary Data 4). b, Biplot showing Tau–eGFP fluorescence intensity as a function of Ascl1 transcript level in day 5 cells. Point size is proportional to eGFP transcript levels in 2 log2[FPKM]. There is a positive correlation (R =​ 0.49) indicating that cells with higher Ascl1 expression are more likely to reprogram. c, Heat map of eGFP–Ascl1 expression in 14 individual cells (columns) during live cell imaging. Rows represent time post Ascl1 induction in 45-min intervals.

© 2016 Macmillan Publishers Limited. All rights reserved RESEARCH LETTER

Extended Data Figure 5 | Live cell imaging shows diminishing of one reprogramming experiment per condition. b, Representative images eGFP–Ascl1 signal in cells that fail to reprogram. a, Immunostaining from live cell imaging showing an example of diminishing of eGFP signal for Tubb3 and Map2 at day 12 post induction of Ascl1, C-terminal tagged in a cell that failed to reprogram (that is, cell was Tuj1-negative at day 6). Ascl1–eGFP and N-terminal tagged eGFP–Ascl1 in CD-1 MEFs. eGFP– c, Live cell imaging of eGFP signal of eGFP–Ascl1 infected MEFs between Ascl1 has comparable reprogramming efficiency with untagged Ascl1 3–6 days post dox induction. d, eGFP imaging of live cells 6 days post while Ascl1–eGFP has a much reduced reprogramming efficiency, so induction of Ascl1 and corresponding immunostaining for Tubb3 after eGFP–Ascl1 was chosen for live cell imaging. Images are representative for fixation.

© 2016 Macmillan Publishers Limited. All rights reserved LETTER RESEARCH

Extended Data Figure 6 | Brn2 and Myt1l repress alternative fates that with Ascl1 co-infected with Brn2 or Myt1l. See Fig. 3e for respective compete with the iN cell fate during advanced Ascl1 reprogramming. data for cells infected with Ascl1-only or all three BAM factors. Images a, Scatter plot showing PC1 and PC2 loadings from principal component are representative for four biological replicates. Right, mean fractions of analysis (PCA) of single cells from all time points with experimental time eGFP+ cells that express either Tubb3 or Myh3. Only Tubb3+ cells with point and reprogramming condition (Ascl1 versus BAM) encoded in point a neuronal morphology were counted. Co-expression of Ascl1 with Brn2 shape and colour. b, Overview of quadratic programming. Fractional and/or Myt1l increases fraction of Tau–eGFP+ cells that are also Tubb3+, identities are calculated assuming a linear combination of different cell while decreasing the number of cells that are Myh3+. Six or seven images fates. c, Biplots showing the fractional fibroblast identity as a function of were analysed for each of four biological replicates. Error bars, s.e.m. fractional neuron (left) and fractional myocyte (right) identity for each g–i, qRT–PCR of selected myogenic (g), neuronal (h), and fibroblast (i) cell with points shaped and colour coded based on reprogramming time markers using day 22 cells that are infected with Ascl1 only or point and condition. d, Correlation of transcriptomes from days 0, 2, 5, co-infected with Brn2 or Myt1l or both and FAC-sorted by Tau–eGFP and 20/22 cells (Ascl1-only and BAM-induced) with bulk RNA-seq from (n =​ 3, biological replicates; error bars, s.e.m.). Myogenic genes were MEFs, cortical neurons and myocytes. Bottom bars show Tau–eGFP significantly downregulated in Tau–eGFP+ cells that were co-infected fluorescence intensity. e, Bar plot quantifying the number of cells with a with Brn2 and/or Myt1l compared to those infected with Ascl1 alone, maximum correlation to bulk RNA-seq data from each of the observed while some neuronal genes are significantly upregulated (Map2, Gria) fates (d). f, Immunofluorescent detection of Tau–eGFP (green), DAPI (*​P <​ 0.05, **​ ​P <​ 0.01, **​ *​ ​P <​ 0.001, two-tailed t-test). (blue), Myh3 (red) and Tubb3 (cyan) for day 22 cells that were infected

© 2016 Macmillan Publishers Limited. All rights reserved RESEARCH LETTER

Extended Data Figure 7 | Comparison of Monocle and quadratic programming with respect to ordering of neuronal cells through the reprogramming path. a, Biplot showing the total number of transcripts per cell for all cells on the MEF-to-iN cell lineage as a function of the fraction neuron identity of each cell (see Fig. 4). The total number of transcripts decreases during the reprogramming process. b, Cells (depicted as circles) are arranged in the 2D independent component space based on the expression of genes used for quadratic programming in Fig. 4a. Lines connecting cells represent the edges of a minimal spanning tree with the bold black line indicating the longest path. Time points are colour coded. c, Monocle plots with single cells coloured based on gene expression that distinguishes the stages of iN cell reprogramming. d, Biplot shows the correlation between ordering of cells based on pseudo-time (Monocle) and fractional identity (quadratic programming). Time points are colour coded. Pearson correlation coefficient =​ 0.91.

© 2016 Macmillan Publishers Limited. All rights reserved LETTER RESEARCH

Extended Data Figure 8 | Neuronal maturation proceeds through cells on the MEF-to-iN cell lineage ordered based on fractional neuron expression of distinct transcriptional regulators. a, Correlogram identity. Growth curves based on a model-free spline method were fitted showing transcriptional regulators (TRs) highly correlated within MEFs as to the data. f, qRT–PCR of selected TRs from initiation and maturation well as the initiation phase and the maturation phase of reprogramming. subnetworks from Fig. 4d. Uninfected MEF controls and day 2–12 Ascl1- b, Heat map shows expression of TRs that control the two stages of MEF infected cells were assayed for all selected TRs, and day 22 Ascl1-alone to iN cell reprogramming (Fig. 4d) in cells ordered based on fractional and BAM-infected cells were additionally assayed for maturation TRs. neuron identity. Each row represents a single cell, each column a gene. Cells for day 5 to day 22 samples were FAC-sorted into Tau–eGFP+ and Experimental time point (green/blue sidebar) and fractional neuron Tau–eGFP− populations (n =​ 4 for all populations, biological replicates; identity (yellow/red sidebar) are shown at the top. c–e, Pseudo-temporal error bars, s.e.m.). g, Western blot for selected TRs from the initiation expression dynamics of exemplary TRs marking the initiation stage (c) subnetwork presented in panel b. β​-Actin was used as a loading control and the maturation stage (d) of iN cell reprogramming as well as MEF (Supplementary Data 8). identity (e). Transcript levels of the TRs are shown across all single

© 2016 Macmillan Publishers Limited. All rights reserved