Department of Haematology, Wellcome Trust–Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge Institute for Medical Research, Cambridge CB2 0XY, United Kingdom

Edited by Ellen V. Rothenberg, California Institute of Technology, Pasadena, CA (received July 25, 2016; reviewed by Berthold Gottgens)

Reconstructing regulatory network models from single-cell blood stem cell profiles

Fiona K. Hamey, Sonia Nestorowa, Sarah J. Kinston, David G. Kent, Nicola K. Wilson, Berthold Gottgens

Hematopoiesis is an extensively studied and well-characterized system. Hematopoietic stem cells (HSCs) are able to maintain the mammalian blood system throughout life, and adult blood contains a mixture of functionally specialized cells. To obtain comprehensive coverage of the murine bone marrow hematopoietic stem and progenitor cell (HSPC) compartment, we extended previously published qRT-PCR data (2) to provide a large pool of cells for this investigation. To infer regulatory relationships, we used single-cell gene expression profiling of transcription factors.

Computational modeling of gene regulatory networks has been widely applied to a variety of developmental processes, providing new understanding of how these networks operate. Several studies have used mathematical approaches to develop models of gene regulation in Drosophila development (13). In particular, Peter et al. (14) used extensive experimental evidence of transcriptional regulation in the developing sea urchin embryo to create a computational network model that was capable of recapitulating known patterning behavior and of making predictions by simulating perturbations.

Identifying true transcriptional regulatory networks remains an enormous challenge, at least in part because experimental validation of the interactions does not readily scale to a system-wide approach. Computational methods to predict functional relationships have therefore become widely used to infer transcriptional regulatory networks. Computational network inference methods have been used to discover relationships between regulator genes and target genes, acting as components of transcriptional regulatory networks (5). For example, modeling of the gap genes of the Drosophila embryo improved the understanding of the role of gene interactions in patterning (13). However, most network reconstruction methods have been restricted to data measured in whole populations of cells. Computational approaches basing network reconstruction on single-cell data have emerged more recently (6, 7), and the power of discovering simple relationships from single-cell data has been recognized in blood systems (8–11).

Data deposition: The ChIP data reported in this paper have been deposited in the Gene Expression Omnibus (GEO) database (accession no. GSE84328). Network models 1 and 2 reported in this paper have been deposited in the BioModels database (accession nos. MODEL1610060000 and MODEL1610060001, respectively).

This article is a PNAS Direct Submission. E.V.R. is a guest editor invited by the Editorial Board.

This paper results from the Arthur M. Sackler Colloquium of the National Academy of Sciences, "Gene Regulatory Networks and Network Models in Development and Evolution," held April 12–14, 2016, at the Arnold and Mabel Beckman Center of the National Academies of Sciences and Engineering in Irvine, CA. The complete program and video recordings of most presentations are available on the NAS website.

Author contributions: B.G. and N.K.W. designed research; F.K.H., S.N., S.J.K., N.K.W., and D.G.K. performed research; F.K.H. and N.K.W. contributed new reagents/analytic tools; F.K.H., S.N., D.G.K., N.K.W., and B.G. analyzed data; and F.K.H. wrote the paper.

The authors declare no conflict of interest.

To whom correspondence may be addressed. Email: [email protected] or [email protected]

F.K.H. and S.N. contributed equally to this work.

BIOPHYSICS AND COMPUTATIONAL BIOLOGY | COLLOQUIUM PAPER

To address the question of how HSPC fate decisions are controlled, and to identify regulatory relationships, we infer regulatory network models from single-cell profiles of the murine bone marrow HSPC compartment. Using these data, differentiation trajectories from HSCs to progenitor cells were constructed. These were used to infer and validate regulatory network models, thereby gaining greater insight into the transcriptional programs governing HSC differentiation.

Results

Single-Cell Snapshot Measurements Capture Progression Through HSPC Differentiation. To study the transcriptional control of HSPC differentiation, we previously collected single-cell qRT-PCR data for HSCs and progenitor cells, in which we quantified the expression levels of 48 genes in 1,626 HSPCs using the Fluidigm Biomark system (2). This study profiled megakaryocyte–erythroid progenitors (MEPs), granulocyte–monocyte progenitors (GMPs), lymphoid-primed multipotent progenitors (LMPPs), common myeloid progenitors (CMPs), HSCs with finite self-renewal (FSR-HSCs), and long-term HSCs (LT-HSCs). However, the primary focus was to resolve heterogeneity within four different LT-HSC populations isolated by fluorescence-activated cell sorting. Furthermore, the study profiled a limited number of progenitor populations. As we were interested in understanding progression through differentiation, we generated equivalent expression profiles for over 500 single cells from three additional populations to increase the coverage of intermediate cell stages and therefore improve our resolution of the hematopoietic hierarchy (Fig. 1A). FSR-HSC2, multipotent progenitor (MPP), and pre-megakaryocyte-erythroid progenitor (preMegE) (15) populations were profiled using the same single-cell qRT-PCR assays as before. Combined with the earlier profiles, these data provide extensive coverage of murine HSPC populations (Fig. 1A). The gene set used included 33 transcription factors known to play a role in HSC or myeloid differentiation, 12 nontranscription factor genes implicated in HSPC biology, and three housekeeping genes.

To visualize the broader expression landscape captured by these 2,167 single-cell transcriptional profiles, we used diffusion maps (16). Diffusion maps use properties of random walks between cells to describe the underlying structure of the data. This method offers an advantage over linear dimensionality reduction techniques, such as principal component analysis, as it can capture a variety of more complex structures. The diffusion map method has been specifically adapted for use with single-cell expression data (17) and has proved to be a powerful tool for representing spatial heterogeneity in single-cell data from mouse embryos (18), and branching differentiation dynamics for both single-cell qRT-PCR data describing embryonic blood development (10) and single-cell RNA sequencing (scRNA-Seq) data for adult HSPCs (19).

When applied to our data, diffusion map analysis using all of the genes analyzed by single-cell qRT-PCR demonstrated that the new and old data sets integrated well (SI Appendix, Fig. S1). The location of specific HSPC populations in the diffusion map was consistent with known lineage relationships between mature cell types and their respective precursor populations. Fig. 1B highlights two progenitor cell populations, MEPs and LMPPs, along with the so-called molecular overlap, or “MolO” HSCs, as identified by Wilson et al. (2). MolO cells are HSCs with a shared transcriptional profile and increased probability of long-term multilineage reconstitution upon single-cell transplantation. Cells belonging to intermediate populations, such as MPPs and preMegEs, were present in regions of the diffusion map between the highlighted cell types. Taken together, diffusion map analysis of this comprehensive single-cell data set reveals a transcriptional landscape of expression states characteristic for early HSPC differentiation (Fig. 1C). In addition, the coordinates of the data in the diffusion map provide more than a visualization, as distances in diffusion space represent a measure of similarity between cells that avoids some of the effects of noise present in single-cell expression measurements (11, 20).

Single-Cell Expression Profiles Can Be Used to Construct Differentiation Trajectories. Motivated by the consistency between the location of HSPC populations in the diffusion map and the hematopoietic hierarchy, we aimed to use the underlying coordinate space to better understand transcriptional changes throughout differentiation. Recent work introduced the concept of inferring “pseudotime” trajectories from single-cell expression data, where a sample of cells is ordered by progress through differentiation based on the strength of similarities between cells.



Fig. 1. Single-cell profiling captures the transcriptional landscape of HSC differentiation. (A) The hematopoietic hierarchy, with populations profiled by qRT-PCR highlighted in boxes. The sorting strategies used to isolate each population are displayed to the right of the lineage tree. The three cell types with starred sorting strategies were collected and profiled specifically for this study; unstarred populations were profiled in our previous study, and the lineage tree diagram is also adapted from this paper (2). (B) Diffusion map dimensionality reduction of the populations highlighted in A based on gene expression as quantified by qRT-PCR. MolO stem cells (a subset of the LT-HSC sorting strategies enriched for functional LT-HSCs) are shown in purple, MEPs in red, and LMPPs in blue. All other cell types are in gray. For diffusion map, principal component analysis (PCA), and t-distributed stochastic neighbor embedding (t-SNE) plots showing all cell types, see SI Appendix, Fig. S1. (C) Diagram highlighting how these single-cell data capture HSC fate choice. HSCs can self-renew, or differentiate toward alternative lineages. Single-cell expression data are sampled from a transcriptional landscape that contains cells at different stages along differentiation trajectories toward MEP or LMPP progenitor cells.
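As a rough illustration of the random-walk idea behind the diffusion map in B, the sketch below builds a Gaussian-kernel transition matrix over a handful of made-up expression vectors and compares multi-step transition profiles between cells. It is a minimal pure-Python sketch and not the implementation used in the study: the kernel width `sigma`, the number of steps, the unweighted profile distance, and the toy data are all assumptions chosen for illustration.

```python
import math

def transition_matrix(cells, sigma=1.0):
    """Row-normalised Gaussian kernel: P[i][j] is the probability that a
    random walk steps from cell i to cell j."""
    n = len(cells)
    kernel = [[math.exp(-sum((a - b) ** 2 for a, b in zip(cells[i], cells[j]))
                        / (2.0 * sigma ** 2))
               for j in range(n)] for i in range(n)]
    return [[k / sum(row) for k in row] for row in kernel]

def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def walk_profiles(p, steps):
    """Transition probabilities after `steps` random-walk steps (P^steps)."""
    result = p
    for _ in range(steps - 1):
        result = matmul(result, p)
    return result

def profile_distance(pt, i, j):
    """Euclidean distance between the multi-step profiles of cells i and j
    (a simplified, unweighted stand-in for a diffusion distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pt[i], pt[j])))

# Hypothetical two-gene expression values for five cells along one trajectory
toy_cells = [(0.0, 0.0), (1.0, 0.2), (2.0, 0.1), (3.0, 0.3), (4.0, 0.0)]
P = transition_matrix(toy_cells, sigma=1.0)
P3 = walk_profiles(P, steps=3)
near = profile_distance(P3, 0, 1)   # neighbouring cells
far = profile_distance(P3, 0, 4)    # opposite ends of the trajectory
```

In this toy setting, cells that sit next to each other on the trajectory end up with similar random-walk profiles, while cells at opposite ends do not, which is the intuition behind using distances in diffusion space as a noise-tolerant similarity measure.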

Fig. 2. Computationally ordering single cells along differentiation trajectories captures gene expression dynamics.
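The paper describes selecting the cells of a trajectory by finding a shortest path between start and end cells with Dijkstra's algorithm on distances between cells in diffusion space. The sketch below shows that step in isolation, assuming a k-nearest-neighbor graph with Euclidean edge weights. The function names, the toy coordinates, and the choice `k=2` are hypothetical, and the subsequent ordering step (the Wanderlust algorithm in the paper) is not reproduced here.

```python
import heapq
import math

def knn_graph(points, k):
    """Undirected k-nearest-neighbour graph with Euclidean edge weights."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j)
                       for j, q in enumerate(points) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph

def dijkstra_path(graph, start, end):
    """Shortest path from start to end as a list of node indices
    (empty list if end cannot be reached)."""
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    visited = set()
    while queue:
        dist, node = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        if node == end:
            break
        for neighbour, weight in graph[node].items():
            candidate = dist + weight
            if candidate < best.get(neighbour, math.inf):
                best[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(queue, (candidate, neighbour))
    if end not in best:
        return []
    path = [end]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return path[::-1]

# Hypothetical diffusion-space coordinates: cells 0-4 lie along a trajectory,
# cell 5 sits off to the side on a different branch.
coords = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0), (3.0, 0.1), (4.0, 0.0), (2.0, 3.0)]
graph = knn_graph(coords, k=2)
path = dijkstra_path(graph, start=0, end=4)
```

A branch could then be assembled by collecting the nearest neighbors of each cell on `path`, which is the role the shortest path plays in the trajectory construction described in the paper.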

Fig. 3. Single-cell molecular profiles allow inference of regulatory network models. (A) Schematic showing the network inference steps starting from gene expression profiling using single-cell qRT-PCR data. (B) Potential regulators of each gene are identified by calculating a pairwise gene–gene correlation network. The highest correlating gene pairs are linked in the gene network. Activating (red edge) or repressing (blue edge) relationships correspond to positive or negative correlations, respectively. The regulators of each gene then define a set of potential Boolean functions governing the expression of that gene. Three of the possible functions for G1 are shown here. (C) The pseudotime trajectory is then used to identify the most suitable Boolean functions. Cells are ordered in pseudotime (based on continuous expression data) and then converted to binary expression. Pairs of cells a fixed distance apart then represent input–output pairs to the Boolean function. These pairs are used to score a Boolean function F by comparing F(Ik) to Ok for a pair (Ik, Ok). The highest scoring function is the one where these values agree for the greatest number of pairs.

[Panel labels: (A) single-cell qRT-PCR data; order cells in pseudotime; correlation network; Boolean network inference. (B) Gene correlation network gives potential Boolean functions; example functions for G1: Fa, G2 → G1; Fb, G2 ∧ G3 → G1; Fc, G2 ∨ G3 → G1. (C) Score functions using pseudotime ordering; input Ik, Boolean function F, predicted output F(Ik), observed output Ok.]
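The scoring step in C can be sketched in a few lines. The example below pairs each binarized cell with the cell a fixed distance later in pseudotime and scores the three candidate functions for G1 shown in B by the fraction of pairs for which F(Ik) equals the observed Ok. The binary profiles, the distance `k=1`, and the helper names are hypothetical; the study itself searched over candidate rules with a satisfiability solver rather than scoring a fixed list.

```python
def make_pairs(binary_cells, k):
    """Pair each cell's state (input) with the state k positions later
    in the pseudotime ordering (output)."""
    return [(binary_cells[t], binary_cells[t + k])
            for t in range(len(binary_cells) - k)]

def score(function, target, pairs):
    """Fraction of pairs where function(input) equals the observed
    output value of the target gene."""
    agree = sum(function(inp) == out[target] for inp, out in pairs)
    return agree / len(pairs)

# Candidate rules for G1, mirroring Fa, Fb, and Fc in the figure
candidates = {
    "G2": lambda s: s["G2"],
    "G2 AND G3": lambda s: s["G2"] and s["G3"],
    "G2 OR G3": lambda s: s["G2"] or s["G3"],
}

# Hypothetical binarised cells, already ordered in pseudotime
ordered = [
    {"G1": 0, "G2": 0, "G3": 0},
    {"G1": 0, "G2": 1, "G3": 0},
    {"G1": 0, "G2": 1, "G3": 1},
    {"G1": 1, "G2": 1, "G3": 1},
    {"G1": 1, "G2": 0, "G3": 1},
    {"G1": 0, "G2": 0, "G3": 0},
]

pairs = make_pairs(ordered, k=1)
scores = {name: score(f, "G1", pairs) for name, f in candidates.items()}
best = max(scores, key=scores.get)
```

With these made-up cells the AND rule agrees with every pair, the single-input rule with four of five, and the OR rule with three of five, so the AND rule would be selected.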

As well as providing a score for the Boolean functions, this method can also enable a direction of regulation to be inferred from the undirected correlation network. Using this method, we identified potential transcriptional regulatory network models for differentiation from HSCs to MEPs and LMPPs, with regulatory rules for each gene given by the highest scoring Boolean functions (see SI Appendix, Table S1 for full set of results). Examples of the dynamic expression patterns seen in Fig. 2B can be readily explained by the Boolean rules, such as differences in Notch expression between the two trajectories. In the LMPP trajectory, the expression increases throughout differentiation, whereas the majority of cells on the MEP differentiation trajectory do not express Notch. Investigating the Boolean rules for Notch shows that it is predicted within the LMPP trajectory to be regulated via Lmo2 AND NOT (Gata2 AND Gfi1b). A similar rule was found as one of the alternatives for the MEP trajectory [Lmo2 AND NOT (Gata2 OR Gfi1b) activates Notch]. The different behavior of Gata2 and Gfi1b along both trajectories can account for the different dynamics of Notch expression, as Gata2 and Gfi1b are downregulated toward LMPPs but remain expressed in MEPs.

Stable State Analysis of Network Models Identifies States Corresponding to in Vivo Cell Types. The Boolean network models reconstructed from pseudotime ordering of LMPP and MEP differentiation trajectories were found to have complex structures, with each gene receiving inputs from an average of four upstream regulators, often as part of composite Boolean functions such as “(Notch AND Tcf7) AND NOT Etv6 activates Ets1.” Simplified graphical representation, depicting regulation as only activation or repression rather than the Boolean AND/OR relations forming the regulatory rules, illustrates the highly connected nature of both networks (Fig. 4A and SI Appendix, Fig. S2). To assess whether the reconstructed LMPP and MEP network models were able to recapitulate HSPC differentiation, we identified the stable states of both models. Importantly, this analysis demonstrated that, within the set of stable states for the MEP network model, there were several states exactly matching binary gene expression profiles of MEP but not LMPP cells, with the other MEP stable states having expression close to cells on the MEP trajectory. Similarly, for the LMPP network model, stable states were found that either closely or exactly matched the expression profiles of LMPP cells from the primary bone marrow data (SI Appendix, Fig. S3). To visualize how closely these stable states matched the location of cells sorted on LMPP and MEP surface markers, we also projected the stable states onto the diffusion map (Fig. 4B). Close matches between the sorted qRT-PCR data and the stable states were seen along the relevant lineage trajectories for both network models.

Stable state analysis identifies all stable states of the network model, regardless of whether they can be reached from a biologically meaningful starting condition. We therefore simulated the network with initial conditions corresponding to binary expression in MolO cells (see Materials and Methods for details). Simulations starting from several of the MolO binary states could stabilize on both the MEP and LMPP binary states when simulated with the relevant networks, demonstrating that the two network models could recapitulate differentiation trajectories from HSCs to MEPs and LMPPs, respectively.

Differences in Network Model Connectivity Are Supported by Transcription Factor Binding. Given the differences in dynamic expression of genes such as Notch and Gata1 between the two differentiation trajectories, it was not unexpected that the inferred Boolean networks for the two trajectories show differences in the regulatory rules for some genes. Comparing rules in the two network models highlighted a trio of genes with regulation unique to the MEP network model (Fig. 5A). In the MEP network model, Gata2 positively regulates Cbfa2t3h and Nfe2, with this regulation not present in the LMPP network model. Classical assays for the functional validation of the specificity of regulatory relationships require the use of model cell lines. We therefore considered previously published single-cell expression profiles of the 416B myeloid progenitor cell line (26, 27), which can be induced toward megakaryocyte differentiation (28). Projection of the 416B expression profiles onto our bone marrow HSPC diffusion map indeed demonstrated that 416B cells occupy a territory that forms part of the MEP differentiation trajectory (Fig. 5B).
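To make the stable state analysis and the asynchronous simulations concrete, the sketch below enumerates the stable states of a four-gene toy network and then explores every asynchronous update order from one starting state. Only the Notch rule, Lmo2 AND NOT (Gata2 AND Gfi1b), is taken from the text; the rules for Lmo2, Gata2, and Gfi1b, the starting state, and all function names are hypothetical and chosen so that the example stays small. Exhaustive exploration is used here in place of the repeated random simulations described in the paper.

```python
from itertools import product

GENES = ("Lmo2", "Gata2", "Gfi1b", "Notch")

RULES = {
    "Lmo2": lambda s: s["Lmo2"],                     # hypothetical: self-maintained
    "Gata2": lambda s: s["Gata2"] and s["Gfi1b"],    # hypothetical
    "Gfi1b": lambda s: s["Gata2"],                   # hypothetical simplification
    # LMPP rule for Notch as quoted in the text
    "Notch": lambda s: s["Lmo2"] and not (s["Gata2"] and s["Gfi1b"]),
}

def successors(state):
    """States reached by updating exactly one gene whose rule disagrees
    with its current value (asynchronous update)."""
    named = dict(zip(GENES, state))
    result = []
    for idx, gene in enumerate(GENES):
        new_value = bool(RULES[gene](named))
        if new_value != state[idx]:
            result.append(state[:idx] + (new_value,) + state[idx + 1:])
    return result

def is_stable(state):
    """A state is stable when no gene can change."""
    return not successors(state)

def reachable_stable_states(start):
    """All stable states reachable from start under any update order."""
    seen, stack, stable = {start}, [start], set()
    while stack:
        state = stack.pop()
        following = successors(state)
        if not following:
            stable.add(state)
        for nxt in following:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return stable

# Every stable state of the toy network, reachable or not
all_stable = [s for s in product((False, True), repeat=len(GENES))
              if is_stable(s)]

# Hypothetical starting state: Lmo2 and Gata2 ON, Gfi1b and Notch OFF
start = (True, True, False, False)
endpoints = reachable_stable_states(start)
```

In this toy network the same starting state can settle into a Notch-ON state with Gata2 and Gfi1b OFF or a Notch-OFF state with both ON, depending on which gene updates first, which mirrors the distinction drawn above between listing all stable states and asking which ones are reachable from a given starting condition.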

Hamey et al. | PNAS Early Edition

We generated new ChIP-Seq data for Gata2 in 416B cells and HoxB8-FL cells to investigate Gata2 binding to the Cbfa2t3h and Nfe2 loci. The HoxB8-FL cell line was recently reported to have both myeloid and lymphoid potential (29). We therefore also generated 107 single-cell expression profiles for HoxB8-FL cells, which, when projected onto the diffusion map, confirmed that HoxB8-FL cells resemble a state from the LMPP trajectory. Our single-cell profiling showed that, like primary bone marrow LMPP cells, only a small minority of HoxB8-FL cells express Cbfa2t3h.

At the Cbfa2t3h locus, two prominent Gata2 binding peaks were identified in 416B cells, with very limited Gata2 binding in HoxB8-FL cells, consistent with the model (Fig. 5C). At the Nfe2 locus, a prominent peak was identified at the −7-kb enhancer region in 416B cells and, at a much lower level, in HoxB8-FL cells, again consistent with our model.

To validate whether the binding of Gata2 in 416B cells causes activation of Cbfa2t3h and Nfe2, we generated luciferase reporter constructs for the minimal and full Cbfa2t3h promoter and for the Nfe2 −7-kb enhancer, which were complemented by constructs with the corresponding Gata2 binding sites mutated. The wild-type constructs showed significant activation over promoter/enhancer-less control constructs in 416B cells (Fig. 5D). Moreover, mutation of the Gata2 binding sites resulted in significantly reduced luciferase activity, consistent with the proposed role of Gata2 activation during MEP differentiation. The rules found only in the MEP network model, between Gata2 and Cbfa2t3h and between Gata2 and Nfe2, were therefore consistent with both the ChIP-Seq and luciferase assays.

We also interrogated existing chromatin immunoprecipitation data for several of the Boolean rules for the MEP network model, which we found to be consistent with previously reported relationships [growth factor independent 1B (Gfi1b) being regulated via Gata2 and Nfe2 and via T-cell acute lymphocytic leukemia 1 (Tal1)/Ets/Gata, or friend leukemia integration 1 (Fli1) being regulated via Ets factors], thereby reiterating the utility of the Boolean network model proposed in this study (27, 30).

Briefly, to construct the differentiation trajectories, start and end cells were identified for each of the MEP and LMPP trajectories from the populations highlighted in the diffusion map visualization. A distance between cells was constructed on the first four diffusion components, and Dijkstra's algorithm was used to find the shortest path from the start to the end cell populations. Branches were then formed from this path, and the cells in each branch were ordered in pseudotime using the Wanderlust algorithm (21) with default parameters.

Discussion

In this study, we used single-cell gene expression profiling to define two transcriptional regulatory network models capturing differentiation toward alternative blood lineages. By contrasting the network rules, we identified Gata2 control of Cbfa2t3h and Nfe2, found only in the MEP network model, to be supported by experimental evidence.

Our network inference method combines recently developed ideas about pseudotime trajectories with Boolean network modeling to exploit the dynamic information captured by single-cell data. Boolean models can be easily used to simulate network perturbations and readily relate to experimental approaches feasible in the laboratory. Using models to develop and test hypotheses on the effect of network perturbations will improve our understanding of how cell fate decisions are regulated in the blood. As disruption of regulatory programs is linked to serious blood conditions such as leukemia (31), discovering the mechanisms regulating differentiation of blood stem cells can provide insights into the role that subverted cell fate decisions play in these disorders.

Many methods exist with the aim of constructing regulatory network models from gene expression data. Transcription factor networks have been successfully modeled using methods such as Bayesian networks (27), which are computationally efficient and allow network perturbations to be simulated. However, Bayesian network topology is limited to acyclic graphs and therefore cannot capture feedback between transcription factors in the network. An advantage of Boolean network modeling, as used in this work, is that it does not suffer from this limitation. A recent study used Boolean abstraction to model the pluripotency network in embryonic stem cells (24), and was able to predict the results of experimental network perturbations. However, this study relied on performing gene expression profiling in multiple experimental conditions, which is not always feasible, and was limited to bulk data rather than single-cell data. Several studies have used single-cell data to discover regulatory relationships, but only focused on

Fig. 4. Stable state analysis demonstrates biological relevance of the network models for HSC differentiation. (A) Transcriptional regulatory networks for differentiation from HSCs to MEPs or HSCs to LMPPs. Activation is indicated with a red pointed arrow, and repression is indicated with a blue flat-headed arrow. A full description of Boolean rules for both networks is available in SI Appendix, Table S1. (B) Stable states of MEP (red) and LMPP (blue) networks projected onto the diffusion map of primary bone marrow qRT-PCR data (small gray points). For each stable state, its nearest neighbors were found in the binary gene expression data. The average expression of these nearest neighbors (in the nonbinary data) was then projected into the diffusion map and highlighted (large red/blue circles). The color intensity of each state indicates how closely it matches the measured binary expression values; for example, a value of zero indicates an exact match to the primary bone marrow data, and a value of 1 means that the stable state matched the primary bone marrow data except for a single gene.
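The matching of stable states to measured cells described for Fig. 4B can be illustrated as a nearest-neighbor search under Hamming distance, followed by averaging the continuous expression of the matched cells. The sketch below is a hedged illustration with made-up three-gene profiles; the function names are hypothetical, and the final projection into the diffusion map (done in the study with an R package) is not shown.

```python
def hamming(a, b):
    """Number of genes whose binary state differs between two profiles."""
    return sum(x != y for x, y in zip(a, b))

def nearest_cells(stable_state, binary_cells):
    """Indices of the cells at minimal Hamming distance from the stable
    state, together with that distance (0 means an exact match)."""
    distances = [hamming(stable_state, cell) for cell in binary_cells]
    best = min(distances)
    return [i for i, d in enumerate(distances) if d == best], best

def mean_expression(indices, continuous_cells):
    """Gene-by-gene average of the continuous expression of matched cells."""
    n_genes = len(continuous_cells[0])
    return [sum(continuous_cells[i][g] for i in indices) / len(indices)
            for g in range(n_genes)]

# Hypothetical binary and continuous profiles for four cells and three genes
binary_cells = [(1, 0, 1), (1, 1, 1), (0, 0, 1), (1, 1, 0)]
continuous_cells = [(2.0, 0.0, 3.0), (4.0, 1.0, 5.0),
                    (0.0, 0.0, 2.0), (3.0, 2.0, 0.0)]

exact_idx, exact_dist = nearest_cells((1, 1, 1), binary_cells)
close_idx, close_dist = nearest_cells((0, 1, 1), binary_cells)
averaged = mean_expression(close_idx, continuous_cells)
```

Here the first stable state matches one cell exactly (distance 0), while the second differs from its two closest cells by a single gene (distance 1), and the averaged continuous profile of those two cells is what would be placed in the map.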

[Fig. 5 panel labels: (A) MEP network only: Gata2, Cbfa2t3h, Nfe2. (B) Diffusion map with 416B, HoxB8-FL, and in vivo cells. (C) Gata2 binding to Cbfa2t3h and Gata2 binding to Nfe2 (mm10) in 416B and HoxB8-FL cells. (D) Fold change in luciferase activity at the Cbfa2t3h promoter, the Cbfa2t3h minimal promoter, and the Nfe2 −7-kb enhancer; WT vs. Gata2 mutant.]

Fig. 5. Regulatory relationships unique to the MEP network model are supported by transcription factor binding. (A) Diagram of the trio of genes with a regulatory pattern identified as unique to the MEP network model. Red arrows indicate binding and positive regulation of genes by Gata2. (B) Diffusion map with projected qRT-PCR data for 416B and HoxB8-FL cells, showing gene expression similarities between the cell lines and in vivo data. (C) ChIP-Seq analysis of Gata2 in 416B and HoxB8-FL cell lines, showing Gata2 binds the Cbfa2t3h promoter in 416B cells only, and binds the Nfe2 enhancer in both cell lines but with greater binding in 416B cells. (D) Fold change in luciferase activity at the Cbfa2t3h promoter and Nfe2 enhancer, comparing the wild-type and Gata2 mutant regulatory regions. (*P < 0.05, **P < 0.01, ***P < 0.001; two-tailed unpaired t test, n = 3 ± SD.)

simple correlation analyses (6, 7), which cannot infer the direction of regulation without additional experimental data.

More recently, single-cell gene expression data have been used to construct Boolean network models, but the models either relied on the assumption of cells being in a steady-state (8), which is not applicable to differentiating systems, or only used binary gene expression data, thereby losing information present in the level of gene expression (10). Regulatory factors in Boolean rules with many OR logic inputs could play different roles in the regulation of expression levels, which would not be captured by the Boolean model. For example, when Gata2 is predicted by our model to act in OR logic control of Nfe2, this would lead to the prediction that the loss of Gata2 is as important as any of the other factors also involved in the OR rules. This may not be true in vivo, as the relative expression levels of genes will vary, and loss of a transcription factor that is very highly expressed may have different consequences than that of the loss of a factor that is lowly expressed. An alternative approach, using the pseudotime ordering of single-cell expression profiles to construct an ordinary differential equation network model, was recently described (11). This approach can model more sensitive changes in gene expression levels, but is limited to smaller networks. We believe that the ability of our method to simulate and infer larger networks is a reasonable trade-off for modeling binary gene expression states.

Our method is particularly useful for studying regulation of differentiation processes, as it uses the dynamic pseudotime ordering to identify regulatory rules. A limitation of using qRT-PCR profiling is that it can only measure the expression of a limited number of genes. This restriction will affect the accuracy of the pseudotime ordering and means some important regulatory relationships cannot be described in the network, as the relevant genes were not included. Nevertheless, this study demonstrates an advantage to performing single-cell rather than bulk expression analysis, as it allows the construction of differentiation trajectories (11, 21, 22) and the reconstruction of transcriptional relationships involved in HSC cell fate decisions made by single cells. Future work may focus on expanding the set of profiled genes by using other high-throughput single-cell approaches, such as RNA-seq, which may also resolve heterogeneities within HSPC populations linked to fate choices (3, 32, 33). However, single-cell RNA-seq is currently less sensitive than qRT-PCR, which presents its own set of challenges for network inference methods.

The two MEP trajectory-specific network rules we identified, namely, the positive regulation of Cbfa2t3h and Nfe2 by Gata2, are both consistent with the known biological functions of the genes involved. Cbfa2t3h functions as a key component of multimeric transcription factor complexes that regulate both erythroid and megakaryocytic expression programs (34–36), whereas Nfe2 was originally discovered as an upstream regulator of globin gene expression (37) and is required for megakaryocyte maturation (38). Gata2 is primarily recognized as a regulator of HSPC function (39, 40). It is involved in HSC maintenance and expansion, playing a role in early hematopoietic cell formation (41), where Gata2 knockout mice display defects in primary hematopoiesis (42). Cbfa2t3h encodes the transcription factor Eto2, a corepressor in complex with Scl/Tal1 (43). Gata2 binds and activates Cbfa2t3h and, during differentiation, Eto2 represses its own promoter, leading to erythroid maturation and a Gata1-driven transcriptional program (44). Directly linking Gata2 to Cbfa2t3h and Nfe2 in the MEP regulatory network model but not the LMPP network model therefore provides an illustration of how differences in network topology guide the interaction between HSPC regulators such as Gata2 and more lineage-restricted regulators such as Cbfa2t3h and Nfe2. Interestingly, although Cbfa2t3h is traditionally reported to be a corepressor, our model predicts that it would activate several genes in the network. This activity could be directly a result of Cbfa2t3h (depending on its cofactors) or a double repressive link (involving a gene not included in our dataset). An important area of future research will be the identification of the mechanisms that direct stem cells into entering specific differentiation trajectories. By identifying and validating

6 of 8 | Hamey et al. Downloaded by guest on October 1, 2021 Downloaded by guest on October 1, 2021 (O formedfrom,atmost, and function f activators possible (I taking by generated then tr e oadfutof default a to set eters functions The 2. in-degree of nodes OR and nodes AND to restricted (c cells ordered ae tal. et Hamey vector a by represented is cell (ON/OFF) each ( binary in to where expression converted After Binary was order. expression pseudotime expression. the gene from ordering, pairs input–output val- pseudotime the correlation generate of to (distribution in used for edges available self-activation potential plus is as pairs, ues correlating taken 100 were top gene, The each zero. significance to with func- set coefficients were using Correlation 0.01 package. factors R transcription ppcor of the pairs from tions all on calculated were coefficients Construction. Stable State Analysis. For the LMPP and MEP networks, simulations were run starting from MolO cells, and the end states of the network expression were recorded. For each of the 237 binary stable states identified, its nearest neighbor was found in the primary bone marrow expression data. If more than one neighbor was identified, the expression across genes was averaged. For each stable state, the best match was the neighbor in the diffusion map. The average gene expression levels were then projected onto the diffusion map from the Destiny R package using the dm.predict function expression.

Chromatin Immunoprecipitation Sequencing. ChIP-Seq data have been submitted to the NCBI Gene Expression Omnibus with identifier GSE84328 (see (48) described previously (49) Assays.

Luciferase Assays (see SI Appendix for details).

Bone marrow cells were isolated from 8- to 12-wk-old C57BL/6 mice. Cells were red-cell-depleted by ammonium chloride and lineage-depleted using the EasySep Mouse Hematopoietic Progenitor Cell Enrichment Kit (STEMCELL Technologies). Antibodies used for isolation of FSR-HSC2, MPPs, and PreMegEs are listed PCR 96-well a of wells individual into mice C57BL/6 12-wk-old to EasySep 8- the of using lineage-depleted Technologies). crest were (STEMCELL Cells lysis iliac chloride ammonium and by red-cell-depleted tibiae, and femurs, Cells. the Progenitor from and Stem of Purification Methods and Materials systems. be biological will differentiating approach other inference to network applicable our widely and differen- regulated, HSC capture is how about tiation models hypotheses network provide Our and biology differences models. known of network validation two provide the blood and between in cells, models progenitor network and regulatory describe stem differentia- to this cell use We governing tion. programs regulatory transcriptional mutation. by cell affected specific be as well may as alone, that interact, types knockouts that genes gene of from combinations as available such can not in we information and investigations, gain vitro silico in in also these guide doing to By inference investigations. network vivo the silico show in we using models, network of LMPP value and MEP the in rules simple slto fFRHC,MP,adPeeE r itdin listed are for PreMegEs and S2 used MPPs, Antibodies FSR-HSC2, Technologies). of (STEMCELL isolation Kit Enrichment Cell genitor .Wlo K ta.(05 obndsnl-elfntoa n eeexpression myeloid in commitment gene lineage and heterogeneity and Transcriptional (2015) al. functional et F, Paul single-cell 3. 5. Peter IS, Davidson EH (2015) Genomic Control Process: Development and Evolution 2nd Ed. (Academic, New York).
6. Moignard V, et al. (2013) Characterization of transcriptional networks in blood stem and progenitor cells using high-throughput single-cell gene expression analysis. Nat Cell Biol 15(4):363–372.

F.K.H. and S.N. are recipients of Medical Research Council PhD Studentships. D.G.K. is supported by a Bloodwise Bennett Fellowship (15008) and a European Hematology Association Non-Clinical Advanced Research Fellowship.

Stable State Analysis. Stable states were found with the GenYsis algorithm (47) using asynchronous updates (expression of one gene changing at a time). To identify the states reachable from MolO starting points, the Boolean rules for a network were encoded in R and simulated with asynchronous updates of a randomly chosen gene until the network stabilized and no genes could change expression. For the MEP and LMPP networks, 1,000 simulations were run starting from each of the 237 binary expression states corresponding to MolO cells, and the end stable state of each simulation was recorded. For each stable state, its nearest neighbor in the binary primary bone marrow expression data was identified and highlighted in the diffusion map. If more than one neighbor was the best match, gene expression levels were averaged across neighbors, and the average continuous expression state was projected onto the diffusion map using the dm.predict function from the Destiny R package.

To find the rule for a gene, the criteria for a valid Boolean function were encoded as a Boolean satisfiability problem and solved with the Z3 solver. This method was implemented in Python (see SI Appendix for details).
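The asynchronous simulation used in the stable state analysis can be illustrated with a minimal Python sketch. It is not the authors' R implementation: the two-gene mutual-repression network, the `simulate_async` helper, the `max_steps` cap, and the random seed are all hypothetical choices made only for illustration. At each step, one randomly chosen gene whose rule disagrees with its current value is updated, and the run stops when no gene can change expression.

```python
import random
from collections import Counter
from typing import Callable, Dict, Tuple

State = Tuple[int, ...]                    # one binary value per gene
Rules = Dict[int, Callable[[State], int]]  # gene index -> Boolean update rule


def simulate_async(state: State, rules: Rules, rng: random.Random,
                   max_steps: int = 10_000) -> State:
    """Update one randomly chosen gene at a time until no gene can change."""
    for _ in range(max_steps):
        # Genes whose rule output differs from their current value.
        unstable = [g for g, rule in rules.items() if rule(state) != state[g]]
        if not unstable:
            return state  # stable state reached
        g = rng.choice(unstable)
        state = state[:g] + (rules[g](state),) + state[g + 1:]
    raise RuntimeError("no stable state reached within max_steps")


# Hypothetical toy network: gene 0 and gene 1 repress each other.
rules: Rules = {
    0: lambda s: int(not s[1]),
    1: lambda s: int(not s[0]),
}

rng = random.Random(0)
# Repeat the simulation from one starting state and tally the end states.
end_states = Counter(simulate_async((1, 1), rules, rng) for _ in range(1000))
```

Restricting the random choice to genes that can change is a shortcut: picking a gene that is already consistent with its rule would leave the state untouched, so the set of reachable stable states is the same.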

To identify the best rules for each gene, each candidate function was scored by counting how many times its predicted output agreed with the observed output from pseudotime. Because inputs precede outputs in the pseudotime ordering, this method can also infer a direction of regulation from the undirected correlation network.
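The agreement count can be sketched in Python as follows. This is an illustration under stated assumptions, not the published implementation, which used the Z3 solver: here candidates are simply enumerated and scored. The rule form (at least one activator on and no repressor on), the helper names `make_rule` and `agreement`, and the toy input–output pairs are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, ...]
Pair = Tuple[State, State]  # (input state, output state)


def make_rule(activators: Sequence[int],
              repressors: Sequence[int]) -> Callable[[State], int]:
    """Assumed rule form: some activator is on and no repressor is on."""
    def rule(state: State) -> int:
        active = any(state[i] for i in activators)
        repressed = any(state[i] for i in repressors)
        return int(active and not repressed)
    return rule


def agreement(rule: Callable[[State], int], pairs: List[Pair], target: int) -> int:
    """Count pairs where the predicted output equals the observed output."""
    return sum(rule(inp) == out[target] for inp, out in pairs)


# Hypothetical input-output pairs for three genes; gene 2 is the target.
pairs: List[Pair] = [
    ((1, 0, 0), (1, 0, 1)),
    ((1, 1, 0), (1, 1, 0)),
    ((0, 0, 1), (0, 0, 0)),
    ((1, 0, 1), (1, 0, 1)),
]

# Candidate rules as (activators, repressors) gene indices.
candidates = {
    "0 and not 1": ([0], [1]),
    "0": ([0], []),
    "1 and not 0": ([1], [0]),
}
scores = {name: agreement(make_rule(a, r), pairs, target=2)
          for name, (a, r) in candidates.items()}
best = max(scores, key=scores.get)
```

In this toy example the rule "0 and not 1" agrees with all four pairs, so it is selected over the two alternatives.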

For each gene, a set of potential Boolean functions was generated, taking activators and repressors from the partial correlation network. The pseudotime trajectory was then used to identify the most suitable Boolean functions. Cells were ordered in pseudotime and then converted to binary expression. Pairs of cells a fixed distance apart in this ordering then represented input–output pairs for the Boolean function.
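The construction of input–output pairs can be sketched in Python. The binarization threshold, the distance `k`, the helper names, and the toy expression values below are assumptions for illustration only; the values actually used are given in the SI Appendix.

```python
from typing import List, Tuple

State = Tuple[int, ...]  # one binary expression value per gene


def binarize(expr: List[List[float]], threshold: float) -> List[State]:
    """Convert continuous expression (cells x genes) to 0/1 states."""
    return [tuple(int(v > threshold) for v in cell) for cell in expr]


def input_output_pairs(states: List[State], k: int) -> List[Tuple[State, State]]:
    """Pair each cell with the cell k positions later in pseudotime."""
    return [(states[t], states[t + k]) for t in range(len(states) - k)]


# Hypothetical toy data: 5 cells already sorted by pseudotime, 2 genes.
expression = [[0.1, 2.0], [0.2, 1.8], [1.5, 1.7], [1.9, 0.3], [2.2, 0.1]]
states = binarize(expression, threshold=1.0)
pairs = input_output_pairs(states, k=2)
```

Each pair treats the earlier cell as the state of the regulators and the later cell as the resulting state of the target, which is what allows a candidate Boolean function to be checked against the trajectory.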

In conclusion, we present an algorithm for discovering the regulatory programs controlling differentiation toward alternative blood lineages. By using single-cell gene expression profiling to capture the transcriptional heterogeneity within stem cell populations, we were able to develop network models and test hypotheses on the effect of perturbations. As transcriptional dysregulation is linked to serious conditions such as leukemia, using approaches to simulate network perturbations will improve our understanding of how cell fate decisions are regulated and the role that disruption of these regulatory programs plays in blood disorders.

7. Pina C, et al. (2015) Single-cell network analysis identifies DDIT3 as a nodal lineage regulator in hematopoiesis. Cell Rep 11(10):1503–1510.
8. Xu H

