Stem cell differentiation as a many-body problem

Bin Zhang^a,b and Peter G. Wolynes^a,b,c,1

Departments of ^aChemistry and ^cPhysics and Astronomy, and ^bCenter for Theoretical Biological Physics, Rice University, Houston, TX 77005

Contributed by Peter G. Wolynes, May 9, 2014 (sent for review March 25, 2014)

www.pnas.org/cgi/doi/10.1073/pnas.1408561111 | PNAS | July 15, 2014 | vol. 111 | no. 28 | 10185–10190

BIOPHYSICS AND COMPUTATIONAL BIOLOGY | CHEMISTRY

Stem cell differentiation has been viewed as coming from transitions between attractors on an epigenetic landscape that governs the slow dynamics of a regulatory network involving many genes. Rigorous definition of such a landscape is made possible by the realization that regulation is stochastic, owing to the small copy number of the transcription factors that regulate genes and because of the single-molecule nature of the gene itself. We develop an approximation that allows the quantitative construction of the epigenetic landscape for large realistic model networks. Applying this approach to the network for embryonic stem cell development explains many experimental observations, including the heterogeneous distribution of Nanog and its role in safeguarding stem cell pluripotency, which can be understood by finding stable steady-state attractors and the most probable transition paths between those attractors. We also demonstrate that the switching rate between attractors can be significantly influenced by the gene expression noise arising from the fluctuations of DNA occupancy when binding to a specific DNA site is slow.

gene network | most probable path | master equation

Significance

A molecular understanding of stem cell differentiation requires the study of gene regulatory network dynamics that includes the statistics of synthesizing transcription factors, their degradation, and their binding to the DNA. Brute-force simulation of complex, large, realistic networks can be computationally challenging. Here we develop a sound approximation method, built upon theories originating from quantum statistical mechanics, to study the network for embryonic stem cell development. Mechanistic insight is provided on the role of the master regulator Nanog in safeguarding stem cell pluripotency. We also demonstrate the significant influence of DNA-binding kinetics, an aspect that often has been overlooked, on the transition rate between gene network steady states.

Author contributions: B.Z. and P.G.W. designed research, performed research, analyzed data, and wrote the paper.
The authors declare no conflict of interest.
Freely available online through the PNAS open access option.
^1To whom correspondence should be addressed. E-mail: [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1408561111/-/DCSupplemental.

Understanding the underlying mechanisms of the differentiation of stem cells into many distinct cell types has long been a goal of developmental biology (1) and regenerative medicine (2). The epigenetic landscape, originally proposed by Waddington, has proved to be a useful metaphor for visualizing cellular dynamics (3). However, is it more than a metaphor? In this view, cell phenotypes are identified as attractors with well-defined patterns of robust gene expression, and differentiation occurs through transitions from the stem cell attractor to other attractors for the differentiated cells. It has become clear that, for simple models, taking stochastic effects into account allows a well-defined landscape to be constructed (4–6). This generalized potential landscape governs much of the gene regulatory network dynamics, and corrections to the landscape picture can also be defined and formalized (7, 8). It remains a challenge to construct such landscapes for realistically large network models.

Describing the stochastic dynamics of gene networks must include the statistics of synthesizing transcription factors, their degradation, and their binding to genes on the DNA. These all play crucial roles in shaping the epigenetic landscape (9). Because of the complexity of assembling the machinery involved in protein synthesis, it has often been assumed that protein translation is much slower than the diffusion-limited DNA-binding process. Under this view, the ensemble of DNA occupancies, i.e., the set of transcription factors bound at a gene's regulatory elements, can be assumed to have reached a quasi-equilibrium, so that the expression dynamics of this gene can be described as a birth–death process governed by an averaged protein production rate that depends nonlinearly on transcription factor concentrations (10). Although perhaps valid for some networks in prokaryotic cells, this assumption on timescales is probably not completely adequate for eukaryotic systems, because the processes of chromatin decondensation and unwrapping of DNA from nucleosomes, both of which are required for proteins to bind, take time. The high-level architecture of chromatin can severely limit the access of transcription factors to the DNA, slowing down DNA-binding kinetics (11, 12) and resulting in gene expression noise that deviates from Poisson statistics (13). Even when transcription factors function as pioneers that can bind directly to chromatin sites occupied by nucleosomes, slow DNA binding (14, 15) is still a good approximation for describing how the progressive changes in chromatin structure and histone modification induced by the pioneer factors affect gene regulation (16). As a result, DNA binding must be treated on an equal footing with protein synthesis and degradation to fully understand eukaryotic gene regulation (14–18).

Because explicit DNA binding increases the dimensionality of the problem, investigating the effects of slow DNA-binding kinetics on gene network dynamics becomes computationally challenging. Although directly simulating the chemical reactions involved in a gene network with a Monte Carlo algorithm (19) is in some respects conceptually transparent, such a procedure quickly becomes inefficient as the system complexity increases, and it is even impractical for studying rare, transient switching events between steady states that occur on long timescales. Recognizing the close analogy between gene networks and magnetic systems, Sasai and Wolynes suggested that analytical approaches originating from quantum statistical mechanics could be used to study the epigenetic landscape of networks, and they discussed the steady states of a very stylized model network with many attractors (4). Although their approach was efficient in identifying steady-state solutions, it remained to be shown how the approach can characterize transitions between different attractors. In the adiabatic limit, where DNA binding is fast, analytical approaches to the transition process based on large deviation theory have proved successful in studying noise-induced transitions (7, 20–22). Here we show how to find equivalent approaches for large networks in which DNA binding must be treated explicitly.

In this paper, we generalize a kinetic model originally proposed by Sasai and Wolynes (4) to explicitly model DNA binding along with protein synthesis and degradation in large gene networks with multifactorial and complex switches. We further develop an approximation method that allows the construction of the network's epigenetic landscape via the determination of not only steady-state solutions but also most probable transition paths and stochastic switching rates between steady states. When applied to a model of the core transcriptional regulatory network of embryonic stem cells, these approximations turn out to explain the observed fluctuations in the expression of the master pluripotent regulator Nanog and the important role it has in safeguarding stem cell pluripotency against differentiation. We also demonstrate the striking effect that DNA-binding kinetics have on the switching rate between steady states, making clear the importance of explicitly modeling fluctuations of DNA occupancy in studying the fate of gene networks.

Theoretical Approaches for Gene Networks

Viewed as a molecular process, gene regulation is complicated, with many actors as well as attractors, and spans a large spatial and temporal domain. Several approximations are needed to make the mathematical models tractable. We use an admittedly simplified model, but one that contains many of the crucial features of the real embryonic stem cell network.

Constructing Schematic Kinetic Models for Gene Networks. Following Sasai and Wolynes (4), we model gene regulation at a level that includes explicit binding and unbinding of transcription factors to the DNA, protein translation, and protein degradation. Both the mRNA intermediary and the serial nature of macromolecular synthesis are neglected in the present model, although they are likely important. Fig. 1A illustrates how we can diagram a mutual repression motif between the genes Cdx2 and Oct4. The gene Cdx2 produces proteins at a rate g, which changes depending on whether the transcription factors produced by the Oct4 gene are bound at the Cdx2 gene's regulatory element. Here, we assume that Oct4 proteins interact with the Cdx2 regulatory element in the form of a homodimer, binding to the regulatory element at a rate h and unbinding at a rate f. Similarly, Cdx2 proteins can interact with the Oct4 gene's regulatory element and regulate the production rate of Oct4 proteins. Proteins in our model are also degraded at a rate k. We do not explicitly model nucleosome dynamics or covalent chromatin modifications such as DNA methylation and histone acetylation and methylation; instead, we assume these affect gene expression only by regulating the kinetics of transcription factor binding to the DNA. Accounting for such epigenetic modifications may be necessary to fully appreciate the complexity of eukaryotic gene regulation, as emphasized by Sasai et al. (16). Simplified as arrows, this mutual repression motif can be represented as the diagram shown in Fig. 1A, Inset. Including interactions with other genes, a relatively realistic network for the embryonic stem cell can be diagrammed as in Fig. 1B. We emphasize that every link in Fig. 1B includes the series of reactions illustrated in Fig. 1A.

Obviously, many quantitative molecular details of even this simplified scheme are still unknown. In building a kinetic model for the gene network, further approximations are thus necessary and worth clarifying, so the resulting model can be deemed only "stylized." First, we assume that all of the transcription factors, except the Oct4–Sox2 protein complex, interact with the DNA sequence in the form of homodimers. Although many transcription factors are indeed found to function as dimers (23), it is unlikely that this is true for all of the proteins included in our current network. Second, many genes in the network are regulated by a multiplicity of transcription factors. Again, for simplicity and generality, we assume that all of the different transcription factors occupy unique, nonoverlapping regulatory elements. Finally, because many of the kinetic parameters are not available at this point, all of the genes in the network are assumed to function with the same set of rates. We see, then, that this is something like an "Ising model" for gene networks. Despite these simplifications, the significance of having such a specific yet stylized model cannot be overemphasized. As shown below, many concrete conclusions that appear insensitive to these simplifications can be drawn, owing to the robustness of the underlying gene network.

[Figure 1: (A) reaction scheme for the mutual repression between Oct4 and Cdx2, showing the regulatory elements and the rates g, h, f, and k; (B) network of OCT4, SOX2, NANOG, KLF4, PBX1, GCNF, GATA6, CDX2, and the Oct4–Sox2 complex, with activation and repression links and the pluripotency and differentiation groups.]

Fig. 1. A schematic gene regulatory network for the embryonic stem cell. (A) Schematics for the detailed chemical reactions included in modeling the mutual repression between genes Oct4 and Cdx2. Inset simplifies the reactions into a diagram that is useful for illustrating the topology of complex networks. (B) Topology of the stem cell gene network. Each node represents either a transcription factor (single word) or a protein complex (two words) involved in early stem cell differentiation. Arrows that end at transcription factors represent transcriptional regulations with the mechanism shown in A. Green is used for activation and red indicates repression. Arrows targeting protein complexes emerge from the monomers involved in forming the complex. Gene markers found in cells in pluripotent states are colored red and orange, and those found for differentiated phenotypes are colored blue.

Master Equation for Network Dynamics. As shown in ref. 4, fully specifying a gene's state requires knowledge of both the occupancy of the DNA site and the number of proteins. For a gene that is regulated by only one transcription factor, the joint probability of the DNA occupancy and the number of proteins $n$ can be conveniently written as a two-component column vector $\mathbf{P}(n;t) = (P_1(n;t), P_2(n;t))^T$, with $T$ the transpose operator. The subscripts 1 and 2 refer to the two DNA occupancy states, in which the gene's regulatory element is either empty or bound with the transcription factor, respectively. When the gene is regulated by a multiplicity of transcription factors, as is the case for most realistic large networks, describing the DNA occupancy states requires more dimensions. Because binding at the gene's regulatory elements is assumed to be independent, there are a total of $N = 2^{N_{TF}}$ DNA occupancy states for a gene regulated by $N_{TF}$ different transcription factors. Generalizing from the two-state case, the joint probability for the state of such a gene can be represented with an $N$-component column vector $\mathbf{P}(n;t) = (P_1(n;t), P_2(n;t), \dots, P_N(n;t))^T$.

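A minimal sketch of the counting behind $N = 2^{N_{TF}}$, assuming, as stated above, that each regulating transcription factor owns its own nonoverlapping site and binds independently: each occupancy state is then a tuple of 0/1 flags, and the only transitions a rate matrix like W has to couple are single-site binding or unbinding events. The function names and the generic regulator labels are hypothetical, and the actual rates entering W (defined in SI Text) are not reproduced here.

```python
from itertools import product

def occupancy_states(regulators):
    """Enumerate the DNA occupancy states of one gene.

    Assumption (from the text): each regulator binds its own nonoverlapping
    site independently, so a state is a tuple of 0/1 flags
    (0 = site empty, 1 = site bound by that regulator).
    """
    return list(product((0, 1), repeat=len(regulators)))

def single_site_moves(state):
    """List the states reachable by binding or unbinding exactly one factor.

    Under the independent-site assumption these are the only off-diagonal
    couplings a rate matrix like W needs; the rates themselves are omitted.
    """
    moves = []
    for i, bit in enumerate(state):
        neighbor = list(state)
        neighbor[i] = 1 - bit
        moves.append(("unbind" if bit else "bind", i, tuple(neighbor)))
    return moves

# Hypothetical gene regulated by three generic transcription factors.
regulators = ["TF_A", "TF_B", "TF_C"]
states = occupancy_states(regulators)
print(len(states))  # 8, i.e. 2**3
print(single_site_moves((0, 1, 0)))
# [('bind', 0, (1, 1, 0)), ('unbind', 1, (0, 0, 0)), ('bind', 2, (0, 1, 1))]
```

Each state has exactly $N_{TF}$ neighbors, so W stays sparse even though the number of states grows exponentially with $N_{TF}$.
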
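The reactions of Fig. 1A can also be sampled directly with a Gillespie-type Monte Carlo algorithm, the brute-force route the Introduction describes as costly for large networks. The sketch below is a hypothetical illustration, not the authors' code: it simulates one gene with a single regulatory site, holds the regulator's copy number fixed (a mean-field simplification loosely in the spirit of the self-consistent proteomic field), and folds that level, together with the homodimer requirement, into an effective binding rate h. All rate values are made up. Because occupancy switching is then independent of n, the time-averaged protein number should approach $(g_{\text{empty}} f + g_{\text{bound}} h)/[k(h+f)]$, the occupancy-weighted synthesis rate divided by k, whatever the switching speed.

```python
import random

def gillespie_gene(g, k, h, f, t_end, n0=0, j0=0, seed=1):
    """Gillespie simulation of one gene with a single regulatory site.

    g     : (g_empty, g_bound) synthesis rates for the two occupancy states
    k     : degradation rate per protein molecule
    h     : effective binding rate (hypothetical; regulator level held fixed)
    f     : unbinding rate
    t_end : simulated time; returns the time-averaged protein number
    """
    rng = random.Random(seed)
    t, n, j = 0.0, n0, j0          # time, protein number, occupancy (0 empty, 1 bound)
    area = 0.0                     # running integral of n(t) dt
    while True:
        rates = [g[j], k * n, h if j == 0 else 0.0, f if j == 1 else 0.0]
        total = sum(rates)
        dt = rng.expovariate(total)
        if t + dt >= t_end:        # stop at t_end, counting the last partial interval
            area += n * (t_end - t)
            return area / t_end
        area += n * dt
        t += dt
        r = rng.random() * total   # choose which reaction fires
        if r < rates[0]:
            n += 1                 # synthesis
        elif r < rates[0] + rates[1]:
            n -= 1                 # degradation
        elif j == 0:
            j = 1                  # regulator binds
        else:
            j = 0                  # regulator unbinds

def stationary_mean(g, k, h, f):
    """Occupancy-weighted synthesis rate divided by k (exact for this toy model)."""
    return (g[0] * f + g[1] * h) / (k * (h + f))

g, k, h, f = (5.0, 50.0), 1.0, 2.0, 2.0   # hypothetical rates, in units of k
print(gillespie_gene(g, k, h, f, t_end=2000.0), stationary_mean(g, k, h, f))
```

With these numbers the prediction is 27.5; a run of this length should typically land within a few percent of it. Even this single gene requires on the order of $10^5$ reaction events, which illustrates why direct simulation becomes expensive for rare switching events in a full network.
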
For a network consisting of $M$ genes, we use the self-consistent proteomic field approximation (24) to construct the probability function of the entire network as a simple product of its component individual genes, $\mathbf{P}(n_1,\dots,n_M;t) = \mathbf{P}_1(n_1;t)\otimes\mathbf{P}_2(n_2;t)\otimes\cdots\otimes\mathbf{P}_M(n_M;t)$. Including the chemical reactions illustrated in Fig. 1A, the self-consistent field regulatory dynamics for a given gene $m$ can be described via the following master equation:

$$\frac{\partial}{\partial t}\mathbf{P}_m(n;t) = \mathbf{G}\{\mathbf{P}_m(n-1;t) - \mathbf{P}_m(n;t)\} + \mathbf{K}\{(n+1)\mathbf{P}_m(n+1;t) - n\,\mathbf{P}_m(n;t)\} + \mathbf{W}\,\mathbf{P}_m(n;t). \tag{1}$$

$\mathbf{G}$ and $\mathbf{K}$ are diagonal matrices whose diagonal elements $G_{jj}$ and $K_{jj}$ are the protein synthesis and degradation rates at DNA occupancy state $j$. $\mathbf{W}$ is the transition rate matrix for exchanging probability among DNA occupancy states. Detailed definitions of the rate matrices are provided in SI Text.

Steady States and Most Probable Transition Paths. We follow the approach of Doi (25) and Peliti (26) to rewrite the master equation in an operator form. Defining creation and annihilation operators such that $a^\dagger|n\rangle = |n+1\rangle$ and $a|n\rangle = n|n-1\rangle$, and the state vector $|\psi(t)\rangle = \sum_n P(n;t)|n\rangle$, the master equation is written as

$$\partial_t|\psi(t)\rangle = \Omega\,|\psi(t)\rangle. \tag{2}$$

$\Omega$ is thus a non-Hermitian "Hamiltonian" operator defined in terms of $a^\dagger$ and $a$, whose explicit form is provided in SI Text. From Eq. 2, the transition probability $P(\mathbf{n}_f,\tau\,|\,\mathbf{n}_i,0)$ of finding the protein numbers $\mathbf{n}_f = (n_f^1,\dots,n_f^M)$ at a final time $t=\tau$, starting from $\mathbf{n}_i = (n_i^1,\dots,n_i^M)$ at $t=0$, can be written as $P(\mathbf{n}_f,\tau\,|\,\mathbf{n}_i,0) = \langle\mathbf{n}_f|\exp(\Omega\tau)|\mathbf{n}_i\rangle$. Next, following Zhang et al. (27) and using a resolution of the identity generalized from the coherent-state basis set, the transition probability can be further represented in a path-integral form involving an action that follows from the master equation:

$$P(\mathbf{n}_f,\tau\,|\,\mathbf{n}_i,0) \propto \int\prod_m\mathcal{D}\mathbf{x}_m\prod_m\mathcal{D}\mathbf{c}_m\,\exp\Big\{-\int dt\sum_m\big[\mathbf{p}_m\cdot\dot{\mathbf{q}}_m - H(\mathbf{q}_m,\mathbf{p}_m)\big]\Big\}, \tag{3}$$

where $\mathbf{q}_m = \big(c_1x_1,\dots,c_Nx_N,(c_N-c_1)/2,\dots,(c_N-c_{N-1})/2\big)$ and $\mathbf{p}_m = (p_1^x,\dots,p_N^x,p_1^c,\dots,p_{N-1}^c)$. Here $c_j$ is the probability of finding gene $m$ in DNA occupancy state $j$ with an average protein number $x_j$, and $p_j^x$ and $p_j^c$ are the corresponding conjugate momenta. The separability of the Hamiltonian $H(\mathbf{q}_m,\mathbf{p}_m)$ over the set of genes is a consequence of the self-consistent proteomic field approximation.

The steady states of the master equation can be found as attractors of the following deterministic dynamics, which extremize the action:

$$\frac{d\mathbf{q}_m}{dt} = \left.\frac{\partial H}{\partial\mathbf{p}_m}\right|_{\mathbf{p}_m=0}. \tag{4}$$

From the steady-state solutions, the average level of protein expression $x_m$ and the probability distribution $P_m(n)$ for gene $m$ can be calculated as

$$x_m = \sum_{j=1}^{N}c_jx_j,\qquad P_m(n) = \sum_{j=1}^{N}c_j\,\frac{x_j^{\,n}e^{-x_j}}{n!}. \tag{5}$$

The most probable transition paths between steady states are determined from Hamilton's equations (28)

$$\frac{d\mathbf{q}_m}{dt} = \frac{\partial H}{\partial\mathbf{p}_m},\qquad \frac{d\mathbf{p}_m}{dt} = -\frac{\partial H}{\partial\mathbf{q}_m}, \tag{6}$$

subject to the two-ended boundary conditions $\mathbf{q}_m(0) = \mathbf{q}_m^{\mathrm{st}0}$ and $\mathbf{q}_m(\tau) = \mathbf{q}_m^{\mathrm{st}1}$, where $\mathbf{q}_m^{\mathrm{st}0}$ and $\mathbf{q}_m^{\mathrm{st}1}$ are the two steady-state solutions of interest. Many numerical schemes have been proposed to solve such two-sided first-order equations (21, 29, 30); here we use the geometric minimum action method (gMAM) (29) because of its robustness and computational efficiency. From the most probable transition path, we can estimate the transition rate $k \propto \exp[-S]$ (31), with the transition action $S$ defined as

$$S = \int dt\sum_m\mathbf{p}_m\cdot\dot{\mathbf{q}}_m. \tag{7}$$

A demonstration of the accuracy of the approximation method in predicting the transition rate between steady states is provided in Fig. S1.

To highlight the effects of DNA-binding kinetics, i.e., the rate of fluctuations of DNA occupancy, on the epigenetic landscape, it is often convenient to compare results obtained in the adiabatic limit, in which DNA binding is considered to be fast, with the full results obtained directly from Eqs. 4 and 6. In the adiabatic limit, the state probabilities $c_j$ are approximated with an equilibrium partition, and simplified versions of Eqs. 4 and 6 can be derived as shown in SI Text.

Cell Types as Steady States of Gene Networks

Realistic models of the core transcriptional regulatory network for the embryonic stem cell have already been constructed by others (16, 32, 33). We illustrate the network used in this study in Fig. 1B. This core model consists of eight unique transcription factors and their genes, interconnected via transcriptional activation (green) and repression (red) inputs. The triad of master pluripotency regulators, Oct4, Sox2, and Nanog, is highlighted in red. These regulators maintain stem cell pluripotency by activating a set of genes crucial for self-renewal and pluripotency (orange) while simultaneously suppressing differentiation genes (blue). Using the kinetic parameters given in SI Text, we determine the steady-state solutions of this network and compare them with the gene expression patterns of known cell phenotypes.

Fig. 2A presents the steady-state gene expression profiles calculated in the adiabatic limit and labels these profiles using abbreviations of the corresponding cell types: SC for stem cell, PE for primitive endoderm, TE for trophectoderm, and DC for differentiated cell type. For the two steady states SC1 and SC2 found at the bottom of Fig. 2A, proteins for the pluripotency gene markers Oct4, Sox2, and Klf4 are expressed in high amounts, thus defining the corresponding phenotypes as stem cells. Compared with the steady-state SC2, SC1 has a much higher level of Nanog gene expression. The prediction of two distinct levels of Nanog gene expression in stem cell steady states agrees with the experimentally observed heterogeneous distribution of Nanog proteins in a population of stem cells (34–36). In the remaining three steady states, the pluripotency gene markers are turned off, allowing the expression of differentiation genes such as Gata6 and Cdx2. These steady states thus have expression profiles characteristic of differentiated cells. In particular, consistent with primitive endoderm cells, the Gata6 gene is turned on in the steady-state PE. And consistent with trophectoderm cells, the Cdx2 protein is highly expressed in the steady-state TE. Finally, the network also supports the steady-state DC, in which both Gata6 and Cdx2 proteins are produced in large amounts. An expression pattern similar to this predicted steady state has been observed in Oct4 knockdown embryonic stem cells (37). Stability of the steady-state DC might be compromised in the presence of cell–cell communication, which is
known to lead to mutual repression between the Gata6 and Cdx2 genes (16). We note that gene expression profiles consistent with the five predicted steady states in Fig. 2A have been observed in previous studies from the analysis of embryonic stem cell networks (15, 16, 32).

The observed five steady-state gene expression profiles are consistent with the network topology shown in Fig. 1B. The stem cell network contains three self-activating genes: Nanog, Cdx2, and Gata6. The self-activation of Nanog arises indirectly via the intermediate gene Pbx1. Because bistability is known to arise in certain regimes for self-activating motifs (38), three such motifs would in principle give rise to a total of $N = 2^3 = 8$ steady states if they were independent. However, mutual antagonism exists between the Nanog (Oct4) and the Gata6 (Cdx2) genes in the network. These counteracting interactions silence the differentiation genes when the Nanog gene is on, and they render inaccessible the three potential steady states in which Nanog is on together with either one or both of the differentiation genes. Therefore, the total number of steady states is reduced from a potential eight to five.

Fig. 2. Steady-state solutions for the embryonic stem cell network. (A) Gene expression levels in the steady-state solutions identified for the network when its dynamics are modeled in the adiabatic limit. (B) Dependence of the number of steady-state solutions, $N_{ss}$, on the DNA-unbinding rate f. (C) The probability distribution of Nanog expression at different DNA-unbinding rates.

The steady-state solutions shown in Fig. 2A are obtained from calculations performed in the adiabatic limit, in which the DNA occupancy states are assumed to reach a conditional equilibrium. This assumption is, however, not always valid for eukaryotic cells because of slow chromatin dynamics (12, 14–16). To investigate the effect of DNA-binding kinetics on the epigenetic landscape, we determined the number of steady-state solutions for the stem cell network over a wide range of DNA-unbinding rates. The results are presented in Fig. 2B.

Fig. 2B demonstrates that from the weakly adiabatic regime ($f \sim 10$) to the strongly adiabatic regime ($f \sim 10^4$), the network exhibits five steady states. Fig. S2 further confirms that these steady states are similar to the completely adiabatic solutions and that the average steady-state gene expression levels are quantitatively comparable. As the unbinding rate slows down, however, the two stem cell steady states collapse together ($f = 2.0$). From the probability distribution of gene expression shown in Fig. 2C, it is clear that the heterogeneous distribution of Nanog proteins is preserved in this collapsed situation. When the unbinding rate is decreased still further, the differentiated state disappears altogether, and only one stem cell state remains ($f = 1.0$). Finally, in the limit of extremely slow DNA kinetics, the regulatory network fails to function at all, and all of the genes are minimally expressed at a basal translation rate. The sensitivity of the number of steady states to the DNA-unbinding rate argues for the importance of explicitly modeling DNA kinetics when studying epigenetic landscapes.

Most Probable Transition Pathways

The excellent agreement between the steady-state gene expression levels shown in Fig. 2A and the gene expression levels of cell phenotypes known in the laboratory strongly suggests that cell types do indeed correspond to attractors on an epigenetic landscape. We next set out to test whether aspects of stem cell differentiation can be understood by studying the stochastic transitions between attractors. In particular, we focus on the differentiation of stem cells into primitive endoderm cells and on the role of the Nanog gene in regulating this differentiation process. We first determine the most probable stochastic transition path from the stem cell steady-state SC1 to the primitive endoderm steady-state PE in the adiabatic limit; we then discuss the effect of DNA-binding kinetics on the mechanism of this transition in the next section. Details regarding these calculations are provided in SI Text and Tables S1–S3.

Fig. 3A presents the changing gene expression patterns along the most probable transition pathway from the steady-state SC1 to the steady-state PE. Starting from SC1, the pathway first transits to the steady-state SC2. During the transition from SC1 to SC2, the Nanog gene switches from an on state to an off state, accompanied by a coherent down-regulation in the expression of the Pbx1 gene. Little change is observed in the expression of the other genes. As the pathway exits from the steady-state SC2, the expression level of the differentiation gene Gata6 starts to increase as its antagonizing gene Nanog is turned off. Accumulation of Gata6 proteins leads to the inhibition of pluripotency genes such as Oct4 and Sox2, while the decreased expression of pluripotency genes in turn diminishes their repressive effect on the Gata6 gene. This positive reinforcement leads to the final stable expression of the Gata6 gene in the steady-state PE.

The changing gene expression patterns shown in Fig. 3A suggest that the transition from the steady-state SC1 to PE occurs in two steps mediated by the intermediate-state SC2. This two-step transition mechanism becomes more apparent when the differentiation pathway is plotted using the expression levels of the Nanog and Gata6 genes, as shown in Fig. 3B. The three red circles correspond to the gene expression levels in each of the three steady states. The transition pathway clearly passes through the steady-state SC2 before committing to the steady-state PE.

To further quantify the role of individual genes along the transition pathway, we decompose the transition action defined in Eq. 7 and use $S_m = \int dt\,\mathbf{p}_m\cdot\dot{\mathbf{q}}_m$ to assess the contribution coming from each given gene $m$. This additive decomposition is possible owing to the self-consistent proteomic field approximation. Fig. 3C, Upper presents the change of the transition actions for various genes along the transition path from the steady-state SC1 to SC2. The two genes Nanog and Pbx1 contribute most to this transition, which is consistent with the dramatic change in the expression of these two genes and the minimal change of the others shown in Fig. 3A.

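Because the Hamiltonian is separable over genes, the per-gene actions $S_m$ simply add up to $S$. The sketch below accumulates $S_m = \int p_m\,dq_m$ with the trapezoid rule along a discretized path for each gene and sums them. The path samples are invented for illustration, with one coordinate and one momentum per gene, whereas the paper's $\mathbf{q}_m$ and $\mathbf{p}_m$ are vectors. Setting the momenta to zero on the final segment mimics a path that follows the zero-momentum deterministic trajectory after a saddle point, so the running action stops growing there, the same qualitative plateau that Fig. 3C shows.

```python
def cumulative_action(q, p):
    """Running value of S_m = integral of p_m dq_m along a discretized path.

    q, p : coordinate and conjugate momentum of one gene, sampled at the same
           points along the path (trapezoid rule between samples).
    """
    running = [0.0]
    for i in range(1, len(q)):
        running.append(running[-1] + 0.5 * (p[i] + p[i - 1]) * (q[i] - q[i - 1]))
    return running

def decompose(path):
    """Per-gene contributions S_m and their sum (the additive decomposition)."""
    per_gene = {name: cumulative_action(q, p)[-1] for name, (q, p) in path.items()}
    return per_gene, sum(per_gene.values())

# Hypothetical path samples; momenta vanish on the last segment (past the saddle).
path = {
    "gene_A": ([0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 0.5, 1.0, 0.0, 0.0]),
    "gene_B": ([5.0, 5.0, 5.5, 6.0, 6.0], [0.0, 0.1, 0.2, 0.0, 0.0]),
}
per_gene, total = decompose(path)
print(per_gene, total)
print(cumulative_action(*path["gene_A"]))  # flat over the final segment
```
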
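The steady-state count given in the Cell Types section (three bistable self-activating motifs, $2^3 = 8$ on/off combinations, minus the three in which Nanog is on together with Cdx2 and/or Gata6) can be reproduced with a Boolean caricature. This is only a bookkeeping sketch under that on/off simplification. It does not solve the network dynamics, and the mapping of surviving patterns to SC1, SC2, PE, TE, and DC is read off the expression profiles described for Fig. 2A.

```python
from itertools import product

GENES = ("Nanog", "Cdx2", "Gata6")   # the three self-activating genes

def allowed(state):
    """Counting rule from the text: with Nanog on, Cdx2 and Gata6 are silenced."""
    nanog, cdx2, gata6 = state
    return not (nanog and (cdx2 or gata6))

candidates = list(product((0, 1), repeat=len(GENES)))   # 2**3 on/off patterns
surviving = [s for s in candidates if allowed(s)]

# Boolean labels read off the expression profiles described for Fig. 2A.
LABELS = {(1, 0, 0): "SC1", (0, 0, 0): "SC2", (0, 0, 1): "PE",
          (0, 1, 0): "TE", (0, 1, 1): "DC"}
print(len(candidates), len(surviving))   # 8 5
print(sorted(LABELS[s] for s in surviving))
```
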
Fig. 3C, Lower further characterizes the transition from the steady-state SC2 to PE. Major contributions again originate from genes with significant expression changes. The transition action for the Gata6 gene, however, emerges first along the pathway, indicating its important role in initiating the transition. We note that in Fig. 3C, Upper and Lower, the transition actions increase only up to a point and plateau afterward. This behavior arises because the most probable transition path connecting two steady states travels through a saddle point. The transition action increases only along the path from the locally probable starting steady state to the less probable saddle point. The path segment from the saddle point to the end steady state coincides with the zero-momentum deterministic path and makes no contribution to the transition action (28).

The stochastic transition path shown in Fig. 3 provides a mechanistic understanding of the important role of the Nanog gene in safeguarding stem cell pluripotency. The calculations suggest that for a stem cell to differentiate, the network must transit through the intermediate steady-state SC2 by turning off the Nanog gene. Because the Nanog off state acts as a bridge that connects the stem cell attractor SC1 to the differentiated cell attractor PE, it is reasonable to expect that stem cells constructed with low expression of Nanog will be more likely to differentiate than those constructed with high expression of Nanog, as has been observed in various experiments (34, 35). A similar analysis of the transition from the steady-state SC1 to TE, shown in Fig. S3, leads to consistent conclusions. We note that transition through an intermediate state with low Nanog expression along the differentiation pathway has also been observed in previous studies (8, 14–16, 39).

[Figure 3: (A) number of protein molecules (0–3,500) of Gcnf, Cdx2, Gata6, Klf4, Pbx1, Sox2, Oct4, and Nanog versus the reaction coordinate from SC1 through SC2 to PE; (B) Gata6 versus Nanog protein number (10^3); (C) action (10^3) contributed by individual genes versus the reaction coordinate for SC1 to SC2 and SC2 to PE.]

Fig. 3. The switching mechanism between steady states inferred from the most probable transition pathway. (A) Number of protein molecules along the most probable transition path from the stem cell steady-state SC1 to the primitive endoderm steady-state PE. The x axis indicates the progression along the transition, and the y axis refers to different transcription factors. (B) The most probable differentiation pathway as in A, plotted through the numbers of Nanog and Gata6 protein molecules. The steady-state solutions are shown as red circles. (C) Contributions to the action from individual genes along the most probable differentiation pathway from the steady-state SC1 to SC2 (Upper) and from the steady-state SC2 to PE (Lower).

Effect of DNA-Binding Kinetics on Stochastic Switching

To quantify the effect of DNA-binding kinetics on the stochastic transition mechanism and transition rate, we calculate the most …

Fig. 4A compares the average expression of the two genes Nanog and Gata6 along the most probable transition pathways calculated at several different DNA-unbinding rates. The blue line is the path calculated in the adiabatic limit and is the same as the transition path shown in Fig. 3B. The yellow and red lines are obtained at the DNA-unbinding rates $f = 10^4$ and $f = 10$, respectively. Despite the three orders of magnitude change in the DNA-unbinding rate, gene expression differs little along the three pathways.

On the other hand, Fig. 4B compares the magnitudes of the transition actions along the most probable pathways calculated at different DNA-unbinding rates. The dashed line indicates the action value in the adiabatic limit. The black solid line is a numerical fit to the transition actions (red circles) using the expression $a/(1/f + b)$, in which $f$ is the DNA-unbinding rate and $a$ and $b$ are two fitting parameters. As expected, the transition action approaches the adiabatic value for large DNA-unbinding rates ($f > 10^4$). Unlike the average gene expression paths shown in Fig. 4A, which exhibit little dependence on the DNA-unbinding rate, the transition action itself changes by nearly two orders of magnitude as the DNA-unbinding rate decreases from $10^4$ to 10. As shown in Fig. S4, transition actions between other steady states exhibit similarly strong dependences on DNA-binding kinetics. Because the transition rate depends exponentially on the transition action ($k \propto e^{-S}$), these results suggest that noise-driven switching between attractors can occur on reasonable timescales only in the weakly adiabatic regime.

The results shown in Fig. 4 A and B can be understood from the "churning mechanism" proposed in ref. 38. In the weakly to strongly adiabatic regime, the DNA occupancy responds quickly to changes in the proteomic atmosphere and reaches a local steady state before the protein number changes by a large amount. The mean expression of a given gene is mostly determined by an effective synthesis rate averaged over the ensemble of DNA occupancy states and is relatively insensitive to the DNA-binding kinetics, as shown in Fig. 4A. Unlike the mean, however, the fluctuations of gene expression depend strongly on the DNA-binding kinetics. This fluctuation in gene expression can be quantified via a local diffusion rate in protein number, $D$. In the adiabatic limit, the local diffusion $D_{BD}$ comes purely from the stochastic birth and death of proteins and is independent of the DNA-binding kinetics. As DNA binding slows down in the weakly adiabatic regime, protein number diffusion arising from the fluctuation of DNA occupancy states becomes more and more significant. Indeed, the binding and unbinding of transcription factors onto the DNA churn the protein number like a turbulent surf, as suggested in ref. 38. This …

[Figure 4: (A) Gata6 versus Nanog protein number (10^3) for the adiabatic limit, $f = 10^4$, and $f = 10$; (B) transition action (log scale, 10^1 to 10^3) versus DNA-unbinding rate $f$ (in units of k, 10^1 to 10^6).]

Fig. 4. The effect of DNA-binding kinetics on the switching pathway and switching rate between steady states. (A) Comparison of the most probable differentiation pathways from the steady-state SC1 to PE calculated at several different DNA-unbinding rates. The blue line is the same as the adiabatic result also shown in Fig. 3B. (B) Dependence of the transition action on the DNA-…

unbinding rate f. The black solid line is a numerical fit to the calculated actions CHEMISTRY probable transition path between the steady-state SC1 and PE (red circles), using the expression a=ð1=f + bÞ,witha and b as fitting parameters. for various DNA-unbinding rates. The dashed line indicates the action value obtained in the adiabatic limit.

Zhang and Wolynes PNAS | July 15, 2014 | vol. 111 | no. 28 | 10189 Downloaded by guest on September 27, 2021 additional diffusion in protein number caused by the churning In that respect, it is particularly encouraging to note that stem cells mechanism Dchurn scales as 1=f. The total diffusion rate is therefore can be classified into distinct clusters, in different ones of which the a sum of the two effects D = DBD + Dchurn. Finally, because the gene expression profiles bear close resemblance to those found for transition barrier scales as 1=D, the strong dependence of the restricted lineages (41, 44). The fast switching between attractors transition action on the DNA-binding kinetics shown in Fig. 4B in the weakly adiabatic regime may be beneficial to allow de- is understood. velopmental plasticity of stem cells. However, this tendency of switching can also be detrimental and can compromise the phe- Discussion notypic robustness of differentiated cells. We therefore expect that The strong dependence of transition rates between steady states cell fate may be fixed by a transition of gene network dynamics on the DNA-binding kinetics suggests a potential mechanism for from the weakly to the strongly adiabatic regime during cell dif- the heterogeneous distribution of gene expression levels in em- ferentiation to suppress the switching probability, possibly via co- bryonic cells (40), adult stem cells (41) and cancer cells (42). valent modifications of the chromatin such as methylation and Many kinetic studies suggest that the stem cell gene network may acetylation. function in a weakly adiabatic regime (16, 43), which according to Fig. S4 will allow fast switching between different steady-state ACKNOWLEDGMENTS. We thank Dr. Davit Potoyan, Dr. Masaki Sasai, gene expression profiles. Therefore, the experimentally measured Dr. Aleksandra Walczak, and Dr. Jin Wang for critical reading of the manuscript. 
gene expression levels from a population of cells may in fact rep- B.Z. acknowledges help from Dr. Matthias Heymann on the implementation of gMAM for finding the most probable path. We acknowledge financial resent, instead of gene expression from a single attractor, an en- support by the D. R. Bullard-Welch Chair at Rice University and by the Center semble mixture of different steady-state gene expressions, which for Theoretical Biological Physics sponsored by the National Science Founda- naturally leads to a heterogeneous distribution of expression levels. tion (Grant PHY-0822283).
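As a supplementary illustration of the churning argument in the section on DNA-binding kinetics, the scalings stated there can be combined algebraically. Assuming $D_{\mathrm{churn}} = c/f$ and an action $S = \kappa/D$, where the constants $c$ and $\kappa$ are hypothetical placeholders not given in the text, one gets $S(f) = \kappa/(D_{\mathrm{BD}} + c/f) = (\kappa/c)/(1/f + D_{\mathrm{BD}}/c)$, which has the fitting form $a/(1/f + b)$ used for Fig. 4B with $a = \kappa/c$ and $b = D_{\mathrm{BD}}/c$; the adiabatic value $a/b = \kappa/D_{\mathrm{BD}}$ is approached as $f \to \infty$. The Python sketch below checks this and recovers $a$ and $b$ from $(f, S)$ pairs. The numbers are arbitrary and are not the paper's data, and the linearized least-squares fit is a swapped-in choice, not necessarily the authors' fitting procedure.

```python
# Illustrative sketch with HYPOTHETICAL constants (not the paper's values):
# D = D_BD + D_churn with D_churn = c / f, and an action S = kappa / D,
# reproduce the fitting form a / (1/f + b); (a, b) are then recovered from (f, S).


def total_diffusion(f, d_bd, c):
    """Total protein-number diffusion D = D_BD + D_churn, assuming D_churn = c / f."""
    return d_bd + c / f


def action_from_diffusion(f, d_bd, c, kappa):
    """Transition action modeled as kappa / D (barrier scaling as 1/D)."""
    return kappa / total_diffusion(f, d_bd, c)


def action_fit_form(f, a, b):
    """Fitting expression a / (1/f + b), with f the DNA-unbinding rate."""
    return a / (1.0 / f + b)


def fit_a_b(fs, actions):
    """Estimate (a, b) by ordinary least squares on the linearized form
    1/S = (1/a) * (1/f) + b/a, i.e. a straight line in x = 1/f, y = 1/S."""
    xs = [1.0 / f for f in fs]
    ys = [1.0 / s for s in actions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # estimates 1/a
    intercept = mean_y - slope * mean_x  # estimates b/a
    a = 1.0 / slope
    b = intercept * a
    return a, b


if __name__ == "__main__":
    # Arbitrary constants chosen so that S drops by roughly two orders of
    # magnitude between f = 1e4 and f = 10; they are not fitted to Fig. 4B.
    d_bd, c, kappa = 1.0, 1.0e3, 1.0e4
    rates = [1e1, 1e2, 1e3, 1e4, 1e5, 1e6]
    actions = [action_from_diffusion(f, d_bd, c, kappa) for f in rates]
    a, b = fit_a_b(rates, actions)
    # Should recover a = kappa/c = 10 and b = d_bd/c = 0.001 up to rounding.
    print(f"a = {a:.4g}, b = {b:.4g}, adiabatic limit a/b = {a / b:.4g}")
    for f, s in zip(rates, actions):
        print(f"f = {f:8.0e}  S = {s:10.4g}")
```

For noise-free actions the linearization is exact; with noisy computed actions, a direct nonlinear fit of $a/(1/f + b)$ would weight the errors differently and may be preferable.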

1. Davidson E (2006) The Regulatory Genome: Gene Regulatory Networks in Development and Evolution (Academic, San Diego).
2. Cherry AB, Daley GQ (2012) Reprogramming cellular identity for regenerative medicine. Cell 148(6):1110–1122.
3. Waddington CH (1957) The Strategy of the Genes (Allen & Unwin, London).
4. Sasai M, Wolynes PG (2003) Stochastic gene expression as a many-body problem. Proc Natl Acad Sci USA 100(5):2374–2379.
5. Wang J, Xu L, Wang E (2008) Potential landscape and flux framework of nonequilibrium networks: Robustness, dissipation, and coherence of biochemical oscillations. Proc Natl Acad Sci USA 105(34):12271–12276.
6. Lv C, Li X, Li F, Li T (2014) Constructing the energy landscape for genetic switching system driven by intrinsic noise. PLoS ONE 9(2):e88167.
7. Wang J, Zhang K, Wang E (2010) Kinetic paths, time scale, and underlying landscapes: A path integral framework to study global natures of nonequilibrium systems and networks. J Chem Phys 133(12):125103.
8. Wang J, Zhang K, Xu L, Wang E (2011) Quantifying the Waddington landscape and biological paths for development and differentiation. Proc Natl Acad Sci USA 108(20):8257–8262.
9. Kaern M, Elston TC, Blake WJ, Collins JJ (2005) Stochasticity in gene expression: From theories to phenotypes. Nat Rev Genet 6(6):451–464.
10. Segal E, Widom J (2009) From DNA sequence to transcriptional behaviour: A quantitative approach. Nat Rev Genet 10(7):443–456.
11. Mariani L, et al. (2010) Short-term memory in gene induction reveals the regulatory principle behind stochastic IL-4 expression. Mol Syst Biol 6:359.
12. Miller-Jensen K, Dey SS, Schaffer DV, Arkin AP (2011) Varying virulence: Epigenetic control of expression noise and disease processes. Trends Biotechnol 29(10):517–525.
13. Raj A, van Oudenaarden A (2009) Single-molecule approaches to stochastic gene expression. Annu Rev Biophys 38:255–270.
14. Feng H, Wang J (2012) A new mechanism of stem cell differentiation through slow binding/unbinding of regulators to genes. Sci Rep 2:550.
15. Li C, Wang J (2013) Quantifying Waddington landscapes and paths of non-adiabatic cell fate decisions for differentiation, reprogramming and transdifferentiation. J R Soc Interface 10(89):20130787.
16. Sasai M, Kawabata Y, Makishi K, Itoh K, Terada TP (2013) Time scales in epigenetic dynamics and phenotypic heterogeneity of embryonic stem cells. PLoS Comput Biol 9(12):e1003380.
17. Feng H, Han B, Wang J (2012) Landscape and global stability of nonadiabatic and adiabatic oscillations in a gene network. Biophys J 102(5):1001–1010.
18. Potoyan DA, Wolynes PG (2014) On the dephasing of genetic oscillators. Proc Natl Acad Sci USA 111(6):2391–2396.
19. Gillespie DT (1977) Exact stochastic simulation of coupled chemical reactions. J Phys Chem 81:2340–2361.
20. Aurell E, Sneppen K (2002) Epigenetics as a first exit problem. Phys Rev Lett 88(4):048101.
21. Roma DM, O'Flanagan RA, Ruckenstein AE, Sengupta AM, Mukhopadhyay R (2005) Optimal path to epigenetic switching. Phys Rev E Stat Nonlin Soft Matter Phys 71(1 Pt 1):011902.
22. Wang P, et al. (2014) Epigenetic state network approach for describing cell phenotypic transitions. Interface Focus 4(3):20130068.
23. Chapman K, Higgins S (2001) Regulation of Gene Expression (Portland Press, London).
24. Walczak AM, Sasai M, Wolynes PG (2005) Self-consistent proteomic field theory of stochastic gene switches. Biophys J 88(2):828–850.
25. Doi M (1976) Second quantization representation for classical many-particle system. J Phys A: Math Gen 9:1465–1477.
26. Peliti L (1985) Path integral approach to birth-death processes on a lattice. J Phys 46:1469–1483.
27. Zhang K, Sasai M, Wang J (2013) Eddy current and coupled landscapes for nonadiabatic and nonequilibrium complex system dynamics. Proc Natl Acad Sci USA 110(37):14930–14935.
28. Dykman MI, Mori E, Ross J, Hunt PM (1994) Large fluctuations and optimal paths in chemical-kinetics. J Chem Phys 100:5735–5750.
29. Heymann M, Vanden-Eijnden E (2008) The geometric minimum action method: A least action principle on the space of curves. Commun Pure Appl Math 61:1052–1117.
30. Lindley BS, Schwartz IB (2013) An iterative action minimizing method for computing optimal paths in stochastic dynamical systems. Physica D 255:22–30.
31. Caroli B, Caroli C, Roulet B (1981) Diffusion in a bistable potential: The functional integral approach. J Stat Phys 26(1):83–111.
32. Chickarmane V, Peterson C (2008) A computational model for understanding stem cell, trophectoderm and endoderm lineage determination. PLoS ONE 3(10):e3478.
33. Chang R, Shoemaker R, Wang W (2011) Systematic search for recipes to generate induced pluripotent stem cells. PLoS Comput Biol 7(12):e1002300.
34. Chambers I, et al. (2007) Nanog safeguards pluripotency and mediates germline development. Nature 450(7173):1230–1234.
35. Kalmar T, et al. (2009) Regulated fluctuations in nanog expression mediate cell fate decisions in embryonic stem cells. PLoS Biol 7(7):e1000149.
36. Miyanari Y, Torres-Padilla ME (2012) Control of ground-state pluripotency by allelic regulation of Nanog. Nature 483(7390):470–473.
37. Hay DC, Sutherland L, Clark J, Burdon T (2004) Oct-4 knockdown induces similar patterns of endoderm and trophoblast differentiation markers in human and mouse embryonic stem cells. Stem Cells 22(2):225–235.
38. Walczak AM, Onuchic JN, Wolynes PG (2005) Absolute rate theories of epigenetic stability. Proc Natl Acad Sci USA 102(52):18926–18931.
39. Li C, Wang J (2013) Quantifying cell fate decisions for differentiation and reprogramming of a human stem cell network: Landscape and biological paths. PLoS Comput Biol 9(8):e1003165.
40. Martinez Arias A, Brickman JM (2011) Gene expression heterogeneities in embryonic stem cell populations: Origin and function. Curr Opin Cell Biol 23(6):650–656.
41. Moignard V, et al. (2013) Characterization of transcriptional networks in blood stem and progenitor cells using high-throughput single-cell gene expression analysis. Nat Cell Biol 15(4):363–372.
42. Gupta PB, et al. (2011) Stochastic state transitions give rise to phenotypic equilibrium in populations of cancer cells. Cell 146(4):633–644.
43. Teles J, et al. (2013) Transcriptional regulation of lineage commitment—a stochastic model of cell fate decisions. PLoS Comput Biol 9(8):e1003197.
44. Chang HH, Hemberg M, Barahona M, Ingber DE, Huang S (2008) Transcriptome-wide noise controls lineage choice in mammalian progenitor cells. Nature 453(7194):544–547.

Zhang and Wolynes | www.pnas.org/cgi/doi/10.1073/pnas.1408561111