Liang J, Qian H. Computational cellular dynamics based on the chemical master equation: A challenge for understanding complexity. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 25(1): 1– Jan. 2010

Computational Cellular Dynamics Based on the Chemical Master Equation: A Challenge for Understanding Complexity

Jie Liang1,2 (梁 杰) and Hong Qian3,4 (钱 纮)

1Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, U.S.A. 2Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China 3Department of Applied Mathematics, University of Washington, Seattle, WA 98195, U.S.A. 4Kavli Institute for Theoretical China, Chinese Academy of Sciences, Beijing 100190, China E-mail: [email protected]; [email protected] Received August 29, 2009; revised November 16, 2009.

Abstract Modern molecular biology has always been a great source of inspiration for computational science. Half a century ago, the challenge from understanding macromolecular dynamics has led the way for computations to be part of the tool set to study molecular biology. Twenty-five years ago, the demand from genome science has inspired an entire generation of computer scientists with an interest in discrete mathematics to join the field that is now called bioinformatics. In this paper, we shall lay out a new mathematical theory for dynamics of biochemical reaction systems in a small volume (i.e., mesoscopic) in terms of a stochastic, discrete-state continuous-time formulation, called the chemical master equation (CME). Similar to the wavefunction in , the dynamically changing landscape associated with the state space provides a fundamental characterization of the biochemical reaction system. The stochastic trajectories of the dynamics are best known through the simulations using the Gillespie algorithm. In contrast to the Metropolis algorithm, this Monte Carlo sampling technique does not follow a process with . We shall show several examples how CMEs are used to model cellular biochemical systems. We shall also illustrate the computational challenges involved: multiscale phenomena, the interplay between stochasticity and nonlinearity, and how macroscopic determinism arises from mesoscopic dynamics. We point out recent advances in computing solutions to the CME, including exact solution of the steady state landscape and stochastic differential equations that offer alternatives to the Gilespie algorithm. We argue that the CME is an ideal system from which one can learn to understand “complex behavior” and complexity theory, and from which important biological insight can be gained. Keywords biochemical networks, cellular signaling, epigenetics, master equation, nonlinear reactions, stochastic modeling

1 Introduction The Law of Mass Action and the CME are two parts of a single mathematical theory of chemical reaction Cellular biology has two important foundations: ge- systems, with the latter being fundamental. When the nomics focuses on DNA sequences and their evolu- number of molecules in a reaction system are large, tionary dynamics; and biochemistry studies molecular stochasticity in the CME disappears and the Law of reaction kinetics that involve both small metabolites Mass Action can be shown, mathematically, to arise as and large macromolecules. Computational science has the limit[1-2]. been an essential component of genomics. In recent In this article, we shall introduce the CME approach years, cellular biochemistry is also increasingly relying to biochemical reaction kinetics. We use simply exam- on mathematical models for biochemical reaction net- ples to illustrate some of the salient features of this works. Two approaches have been particularly promi- yet to be fully developed theory. We then discuss the nent: the Law of Mass Action for deterministic non- challenges one faces in applying this theory to computa- linear chemical reactions in terms of the concentrations tional cellular biology. There have been several recent of chemical species, and the Chemical Master Equation texts which cover some of the materials we discuss. See (CME) for stochastic reactions in terms of the numbers [2-3]. of reaction species.

Survey This work is supported by US NIH under Grant Nos. GM079804, GM081682, GM086145, GM068610, NSF of USA under Grant Nos. DBI-0646035 and DMS-0800257, and ‘985’ Phase II Grant of Shanghai Jiao Tong University under Grant No. T226208001. ©2010 Springer Science + Business Media, LLC 2 J. Comput. Sci. & Technol., Jan. 2010, Vol.25, No.1

2 A System of Nonlinear Reactions For complex biochemical reactions, master equa- tion like this in general cannot be solved analytically. To illustrate the theory of the CME and the Law Various algorithms exist for simulating its stochastic of Mass Action, let us first consider a simple system of [8] [4] trajectories . For the above specific example, however, nonlinear chemical reactions first proposed by Schl¨ogl the exact stationary to (3), i.e.,

α β1 after the system reaches stationarity, can be found as A + 2X ­1 3X,B + X ­ C, (1) α2 β2 [9-10]: kY−1 v in which species A, B and C are at fixed concentrations P (k) = C j , (4) 0 w a, b and c, respectively. The traditional, macroscopic j=0 j kinetics of the system (1), according to the Law of Mass where C0 is a normalization constant such that P∞ Action, is described by a deterministic ordinary differ- k=0 P (k) = 1. The number of X molecules still fluc- ential equation (ODE)[5] tuates in the steady state. We note that for large V ,

k−1 k−1 dx 3 2 X v X v(k/V ) = −α2x + α1ax − β1bx + β2c, (2) ln P (k) = ln j + C ≈ ln dt w 1 w(k/V ) j=0 j j=0 where x represents the concentration of X. It is ³ ´ Z k/V straightforward to show that (2) exhibits bistabi- 1 v(z) + o + C1 ≈ V ln dz + C1, lity (via the so called pitchfork bifurcation) when V 0 w(z) 2 [4-5] α2β1b/(α1a) = 1/3 : that is, the polynomial on in which the right-hand-side switches from having only one pos- itive root to have three positive roots. The system v(z) = z2 + σ, w(z) = z3 + µz, also shows another bifurcations when varying another 2 2 3 2 3 µ = α2β1b/(α1a) , σ = α2β2c/(α1a) , and C1 = ln C0. lumped parameter α2β2c/(α1a) (this time via the so called saddle-node bifurcation). Therefore in terms of the concentration x = k/V , we We now turn to the CME approach to this reaction have the probability distribution f(x) = VP (V x): system (1). If in a small volume such as that of a cell, 1 1 ln f(x) = ln P (V x) + C the number of X is sufficiently small, its concentration 2V 2V 2 fluctuations become significant[6]. The dynamics of re- Z x 1 v(z) ˆ action (1) then is stochastic, which should be described ≈ ln dz + C (5) 2 0 w(z) in terms of a master equation, also known as a birth- Z 1 x z2 + σ death process in the theory of Markov processes[7]. ˆ = ln 3 dz + C. (6) The system is represented by a discrete random vari- 2 0 z + µz able nX (t): the number of X at time t (0 6 nX < ∞). Therefore, the stationary probability distribution of Let P (k, t) = Pr{nX (t) = k}, and we have the concentration of X: dP (k, t) −V φ(x) = v P (k − 1, t) + w P (k + 1, t) f(x) ≈ e , (7) dt k−1 k − (v + w )P (k, t), (3) where Z k k−1 x z2 + σ φ(x) = − ln 3 dz, (8) where 0 z + µz α1ak(k − 1) is independent of V . It is easy to verify that φ(x) is vk = + β2c, V 2 at its extrema exactly when the ODE (2) is at its fixed and points. The function φ(x) can be thought as a “land- α (k + 1)k(k − 1) β b(k + 1) scape” for the nonlinear chemical reaction system. w = 2 + 1 . k V 3 V Closed System, Detailed Balance and Chemical Equi- librium. A chemical equilibrium is reached in the reac- Here V is the volume of the reaction system. It is tion system (1) when a very important parameter of the model. The basic 3 rule is still the Law of Mass Action: the rate of one [X] α1 [C] β1 β 1 2 = , = . (9) step reaction B + X −→ C, when there are k + 1 num- [A][X] α2 [B][X] β2 ber of X molecules, is β b(k + 1)/V . This gives the 1 This leads to the equilibrium condition that above last term. Similarly, the rate of one step reac- α 1 ³ ´eq tion A + 2X −→ 3X, when there are k number of X [C] α1β1 2 = . (10) molecules, is α1ak(k − 1)/V . [A][B] α2β2 Jie Liang et al.: Computational Cellular Dynamics Based on the CME 3

In term of the two model parameters µ and σ intro- precisely the extrema of φ(x) where duced above, this equilibrium (also called detailed ba- lance) condition is expressed as x2 + σ φ0(x) = − ln = 0. (16) x3 + µx 2 3 σ α2β2c/(α1a) α2β2c = 2 = = 1. (11) Therefore, the nonlinear chemical reaction system is µ α2β1b/(α1a) α1β1ab bistable. The dynamics of the system exhibits multiple This equation has a very strong thermodynamic mean- time scale: the relaxation within each “well” and tran- ing: the term ln(σ/µ) = ∆G/(kBT ) is the chemical sitions between the two wells. The former can be accu- potential difference between A + B and C. If one con- rately described by a Gaussian (linear) random process. siders A + B and C as two nodes in a circuit, then ∆G The latter, as two-state transitions, is on a much longer is the potential between them. When ∆G 6= 0, there time scale. exists a nonequilibrium chemical driving force exerted It can be shown, according to the CME, that for a on the reaction system. closed nonlinear chemical reaction system, its stationa- [9,11] Mathematically, the ODE (2) can be simplified. Let ry distribution has a unique peak, the equilibrium . 2 u = α2x/(α1a) and τ = (α1a) t/α2, then (2) becomes Furthermore, the fluctuating dynamics, i.e., the sta- tionary stochastic process in equilibrium is statistically du time reversible[12]. These theoretical results indicate = −u3 + u2 − µu + σ, (12) dτ that complex behavior such as chemical bistability in- deed can only occur in a “living system” with dissipa- in which µ, σ > 0. If σ = µ, the right-hand-side of (12) tion, i.e., useful chemical energy is converted into heat, 2 becomes −(u +µ)(u−1). There is only one unique fixed and the process sustains a self-organizing complex dy- eq point, i.e., u = 1, the equilibrium point. This result namical system[13-14]. is general. For equilibrium system, the steady state dis- Multiscale Dynamics and the Keizer’s Paradox. Ev- tribution obtained from the CME is always uni-modal, ery CME model contains the parameter V , the volume corresponding to the unique fixed point obtained from of the reaction system. When the number of molecules, [9,11] the Law of Mass Action ODE . N, and V → ∞, the mathematical solution to the CME Nonequilibrium Steady State, Gaussian Approxima- agrees with that from the Law of Mass Action which de- tion, and Multiscale Dynamics. When σ 6= µ, the che- scribes concentration x = N/V [1-2]. For most biochem- mical reaction system is not in detailed balance. In ical models, one might also be interested in the station- this case, there is a continuous conversion of chemical ary behavior of the solution to the CME. This repre- energy to heat, even in the steady state. Therefore, sents all the numbers of molecules in a reaction sys- there is a continuous production of entropy due to the tem, which are statistically independent of time, with conversion of more useful chemical energy to less useful stationary number fluctuations due to the biochemical heat. The entropy production rate reactions. One naturally identifies this with the home- ostasis of a cell. Mathematically, this means one is µ [15] epr = kBTJ ln . (13) interested in the limit of t → ∞. H¨anggi et al. and σ Baras et al.[16] correctly pointed out, however, there The nonequilibrium steady-state (NESS) has a net flux is a delicate computational issue of V (and N) → ∞ in the overall reaction A + B → C: and t → ∞ and changing the order of the limits can lead to different mathematical predictions. This non- J = u2 − u3 = µu − σ. (14) exchangability between V → ∞ and t → ∞ has been named Keizer’s paradox. One needs to be extra care- It is easy to show that the epr in (13) is always posi- ful in dealing with the steady state behavior of a CME tive in the NESS. This result should be compared with model. “power = current × voltage” being always positive in This issue has been re-examined recently[11,17] in a stationary electrical circuit. more details. It is shown to be intimately related to For certain parameter values, say µ = 0.25 and the multiple time scales of the bistability. The transi- σ = 0.01, the landscape function φ(x) in (8) has two tion rates between the two states of a bistable system minima and one maximum in-between: are exponentially small with increasing V : ∝ e−αV where α is a positive constant. −u3+u2−µu+σ ≈ −(u−0.05)(u−0.32)(u−0.63). (15) One naturally would like to approximate the CME in terms of a Fokker-Planck equation (second order It is easy to see that the root of u3 + µu = u2 + σ is PDE). The Fokker-Planck approximation of the CME 4 J. Comput. Sci. & Technol., Jan. 2010, Vol.25, No.1 has been discussed in several treatises (e.g., p. 116 of to the CME, the stationary probability distribution for [18]). The approach is similar to the diffusion approxi- the number of E∗ is mation theory for Boltzmann equation (Subsections 3.2 nY−1 ˆ and 3.3 of [18]). Keizer also discussed multiple steady- (k1j + k−2)(Nt − j) pss(n) = C , (19) states in biochemical reaction systems. However, the ˆ j=0 (k−1j + k2)(j + 1) consequence of the multi-stability with diffusion appro- ximation has not been fully discussed. van Kampen has ˆ e where ki = ki/V , i = ±1. C is a normalization factor. repeatedly emphasized that the Fokker-Planck approxi- It is easy to check that the distribution in (19) has mation can be obtained for master equations only with ∗ ∗ two peaks, one at n1 = 0 and the other at n2: small individual jumps. A more sophisticated treat- ment of the Fokker-Planck approximation for master ˆ ˆ ∗ k2 + k−2 + k−1 − k1Nt+ equation was given in terms of the Ω-expansion (Ch. 10 n2 = of [10]). This theory provides a more satisfying appro- ³ 2 ´ 1 (k + k + kˆ − kˆ N ) 2 ximation for the stochastic relaxation in the limit of 2 −2 −1 1 t −4(kˆ + kˆ )(k − k N ) large V . However, it does not address how to obtain the −1 1 2 −2 t . (20) ˆ ˆ stationary distribution with multistability. Computing 2(k1 + k−1) such a stationary distribution is a major challenge. It is usually not an integer. Hence it exhibits stochastic 3 Stochastic Bistability in the CME bistability in a small volume. In the previous section we stated that for suffi- 4 Biochemical Bistability in a Cell and ciently large V , the CME gives a stationary probabi- Epigenetic Inheritance with a Distributive lity distribution for the numbers of all the dynamical Code species, which is in complete agreement with the pre- Since the discovery of DNA double helix, it has diction from the Law of Mass Action. A bistable sys- been well understood that DNA replication is the mole- tem according to the Law of Mass Action, thus, corre- cular basis of biological inheritance. However, in addi- sponds to a bimodal distribution in the CME. The con- tion to DNA based inheritance, epigenetic inheritance verse is not true, however. In recent years, there have has become an increasingly important concept in cell been increasingly more examples showing that non- differentiation, stem cell research, as well as bacte- linear biochemical reaction systems with macroscopic rial persistence[20]. Current research has been focusing unistability can exhibit bistable behavior in a small on several specific molecular processes as the possible volume. These results have important implications to “code” for epigenetics, e.g., histone acetylation[21] and cellular biochemistry. We shall give one example: the DNA methylation[22-23]. One of the key issues is that phosphorylation-dephosphorylation cycle (PdPC) with the code has to be sufficiently stable. This leads re- autocatalytic kinase[19]: searchers to look for specific covalent modifications of transcriptional regulation apparatus. ∗ k1 ∗ ∗ ∗ k2 E+E +AT P ­ E +E +ADP, E ­ E+Pi. (17) However, specific covalent modifications might not k−1 k−2 be necessary in some cases. According to the theory of If we use x to denote the fraction of the phosphory- the CME, the stability of a state of a biochemical re- lated E∗, then according to the Law of Mass Action: action system, i.e., the peak in the stationary distribu- tion, is due to the biochemical reaction network[24]. In dx e e 2 other words, the epigenetic code could be distributive, = k1x(1 − x) − k−1x − k2x + k−2(1 − x) dt namely, properties such as state stabilities are the out- e e 2 e = − (k1 + k−1)x + (k1 − k2 − k−2)x + k−2, come of the collective behavior of many components of a (18) biochemical network[23]. Therefore, detailed molecular mechanism(s) aside, the nonlinear biochemical reaction e e where k1 = k1EtcT, k−1 = k−1EtcD, Et is the total network(s) as the foundation of cellular epigenetics has ∗ concentration of E and E , cT and cD are ATP and to be valid. ADP concentrations. (18) has two steady states, only Ptashne has recently re-emphasized the importance one is positive and chemically meaningful. Hence there of heritability in the term of “epigenetics”[25]. We shall is no bistability in macroscopic size reaction system, point out that the states of bi- or multistable nonlinear with any parameters. biochemical reaction systems, as defined above, natu- However, if the exactly same nonlinear PdPC is in rally give rise to heritability. It is important to recall a small reaction volume such as a cell, then according that the function φ(x) above is independent of V , and Jie Liang et al.: Computational Cellular Dynamics Based on the CME 5 the variable x is the concentration. Hence, assuming algorithm. In analyzing a CME model, these two ap- there is no specific mechanism of regulating the pro- proach complement to each other. duction of molecule X, if the system’s volume is in- One type of chemical reaction systems, the single creased, the concentration x will go down. However, molecules or uni-molecular reaction system, has been the nonlinear dynamic nature of the network automa- extensively studied in the past. It is important to tically regulates the system and the steady state con- note that such systems are linear chemical reaction sys- centration of X is regained. Thus, as long as the vo- tems. Since all the molecules in uni-molecular reaction lume of the system is slowly increasing in the synthesis systems are statistically independent, it can be repre- phase of the cell cycle, the concentrations of all the key sented by either the particle-state-tracking (PST) algo- biochemical species (i.e., transcriptional regulators) are rithm or particle-number-tracking (PNT) algorithm[26]. always maintained at its steady state value. Only when The simulation can also be carried out approximately, the volume change occurs in short time period and the but satisfactorily, by a continuous model of Langevin amount of change is sufficiently large, there would be a dynamics[27]. There is no multistability in such sys- chance that the system “jumps” into another basin of tems; nor complex dynamics. attraction (Fig.1). If the basins of attraction of states The difference between PST and PNT is as follows: are broad, then a daughter cell will still be in the same one either considers the discrete states of the particles state as the parent cell without the need for any addi- in the simulation, or considers the number of particles tional signal and regulation. in a particular state. These two approaches correspond precisely to the Lagrangian and Euler descriptions of fluid particles — in terms of trajectories of particles and in terms of the density[28]. In the current re- search on stochastic simulation of biochemical reaction systems, these correspond to the StochSim/MCell[29] and the StochKit, respectively. The Langevin approx- imated algorithm is closely related to the linear noise approximation (LNA)[30]. The LNA can be only valid within each “peak” region, i.e., a basin of attraction, of the CME. For nonlinear reaction systems with multi- stability, the Keizer’s paradox can occur which invali- dates the Langevin approximation for the longer time Fig.1. Schematics showing how two biochemical states of a non- scale dynamics. linear biochemical reaction system can be inheritable if the vol- On more general terms, there are many reasons to ume of the reaction system is increased, and then divided into seek accurate solution to the CME directly, although two. Note that the abscissa is concentration, not number of much has been learned about the overall probabilis- molecules. In the figure, an increase in volume with a factor tic landscape of many biochemical networks through of 2, corresponding to a decrease of concentration to one half, stochastic simulations (Gillespie, StochSim/MCell, and will still maintain the system in its original attractors. Division StochKit) and approximated continuous models based does not change the concentration. on stochastic differential equations. First, details of the 5 Computational Challenges from the topological features and their quantification such as the Chemical Master Equation existence and location of basins of attraction, craters, peaks, and saddle points of various dimensions, their In the theory of the CME, the dynamics of a bio- widths, breadth, and depths on the probabilistic land- chemical reaction system, in a small volume, is repre- scape, as well as their possible biological implications sented by a multi-dimensional, integer-valued stochas- such as the inheritable epigenetic state arising from the tic jump process in Zn. The process is a discrete- properties of the network are largely unexplored. This state, continuous-time Markov process. As any Markov is true even for simple reaction systems such as the process, it can be mathematically characterized ei- 2-dimensional Schnakenberg model[31], which is only ther in terms of its ensemble of stochastic trajectories, slightly more complex than the 1-dimensional Schl¨ogl or by its probability distribution as function of time. model discussed above, as there are no general exact These correspond to the stochastic differential equa- probabilistic solutions available yet. Second, accurate tion and the Fokker-Planck equation representations solution to the CME problems can facilitate develop- of a Brownian dynamics. The CME is the differential ment of approximation methods that are capable of equation for the probability distribution; the stochas- solving large-size problems. There is a large body of tic trajectory is defined by the well-known Gillespie studies on theoretical approaches approximating the 6 J. Comput. Sci. & Technol., Jan. 2010, Vol.25, No.1

CME through the Fokker-Planck, and equivalently system. For example, if there are 16 molecular species Langevin, equations. For effective design of these mod- in a network, and there are only a total of 33 copies of els and efficient computations of accurate solutions to molecules in the whole system, one can estimate some- large biochemical systems, it is essential to have some what naively the upper bound of the state space as a priori knowledge of the ground truth. Third, per- (33 + 1)16 = 3.19 × 1024. Note the +1 counts the zero haps most importantly, an accurate solution to the copy as a state. This is an astronomic number that is CME of simpler model systems can reveal important well beyond what can be computed with current and insights into basic principles on how biological net- for-seeable future computing technology. works function and how they respond to various envi- Below we discuss the enumeration of the state space ronmental perturbations. A shining example of study- of CME and exam how to obtain exact steady state so- ing complex systems using manageably simple mod- lutions to the CME for biochemical systems with small els is the study of protein folding. Models such as and moderate sizes. two- and three-dimensional lattice self-avoiding walks Optimal Enumeration of State Space. Although in with only hydrophobic and polar (HP) interactions principle the size of the state space grows exponen- allow complete enumeration of all feasible conforma- tially with the number of molecular species and the tions and calculation of exact thermodynamic param- copy numbers of molecules in the system, all is not eters for molecules with short chain lengths. They lost. There are two important observations about gene- have played important roles in elucidating the princi- ral biochemical networks. First, the Markovian transi- ples of protein folding[32], including collapse and fold- tion is very sparse. For any given microstate, ing transitions[33-40], influence of packing on secondary the number of reactions that could occur in a short structure and void formation[41-44], the evolution of time interval is small, which could be bounded by the protein function[45-46], nascent chain folding[47], and the total number of possible reactions in a biochemical net- effects of chirality and side chains[44]. work. Second, as an open system, molecules are synthe- sized and degraded constantly. However, the number of 6 State Space of the Chemical Master molecules that can be synthesized is never infinite, as Equation and Exact Calculation of Steady synthesis is constrained by the time and resources re- State Probability Landscape quired. With these two considerations, an algorithm to enumerate the state space of CME has recently been The state space of the CME is that of M-dimensional developed[48]. The algorithm is optimal in memory re- vectors with non-negative integers, which represents the quirement, as it allows the enumeration of all states copy numbers of molecular species in a network; M is that can be reached from a given initial state, without the number of dynamic species. These states are micro- including any irrelevant states. In addition, all possible scopic in nature, as they provide a detailed, chemical transitions are recorded, and no infeasible transitions amount of each and every molecular species. An impor- are attempted. The resulting transition matrix based tant advantage of treating these microscopic states of on the enumerated state space is compact without re- copy numbers explicitly is that both linear and nonlin- dundant information, and is minimal in size. In addi- ear reactions (such as synthesis, degradation, bimole- tion, its computational time is also optimal[48]. cular association, and polymerization) can be modeled Exact Calculation of Probability Distribution of the as Markovian transitions between two microstates, one Steady State. Once the states reachable from a given reaction at a time. Here the transition rates between initial state are enumerated, the rates of chemical re- states are determined by the intrinsic propensities of actions connecting two of these states can be com- the reaction, and the copy numbers of molecules in- puted. For example, we can study a simplified model of volved. protein-DNA interaction. For the process of two pro- For any biochemical systems beyond the simplest toy tein monomer (ProteinA) dimerize and bind to a seg- problems, a challenging issue in obtaining an accurate ment of DNA (GeneB), we can use the simplified model solution to the CME is the characterization of the state below. If we denote the rate of the reaction that brings space. That is, what are all the possible combinations the before-state i to the after-state j as aj,i, we have of concentrations (or copy numbers) of the molecular for the third order reaction: species for a given set of reactions represented by a 2 × ProteinA + GeneB −→b BoundGeneB, network? An accurate description of the state space is a prerequisite for computationally obtaining solu- with the following reaction rate coefficient aj,i: tions to the CME. In principle, the size of the state aj,i = b · ngB, i · npA, i · (npA, i − 1)/2, space grows exponentially with the number of molec- ular species and the copy numbers of molecules in the where b is the intrinsic reaction rate which contains Jie Liang et al.: Computational Cellular Dynamics Based on the CME 7

hidden systems volume V , npA, i is the copy number course, gives more than just a stationary distribution, of protein A in state i, and ngB, i is the copy num- but solving the steady state is almost obligatory in any ber of unbound gene B. Here the combination num- analysis of mathematical models. ¡ber of¢ the protein for this second order reaction is Toggle Switch. In Section 3, we already discussed npA, i 2 = npA, i ·(npA, i −1)/2. Note that in addition to how bistability arises from stochasticity. Another ex- the volume V , there is a factor of 2 difference between ample is the well studied genetic toggle-switch system. the intrinsic reaction rate here and the macroscopic rate This is a small network consisting of two genes, A and constant discussed in Section 2. B, each inhibits the other (Fig.2). It was the first syn- Once the full reaction rate matrix A = {aj,i} ∈ thetic network constructed in a wet lab from two re- Rn×n is filled with computed rates, the chemical master pressible promoters arranged in a mutually inhibitory equation can be written in a matrix-vector form as: network in Escherichia coli by Gardner et al.[52]. It is flippable between two stable states by chemical or ther- ˙ P (t) = AP (t). (21) mal induction and exhibits an ideal switching thresh- old. This toggle switch forms a synthetic cellular mem- Here the matrix A represents the infinitesimal gene- [52] rator of a continuous time Markov process. The dia- ory unit . Although this is the simplest network P with bistability that can already be identified from gonal elements aii is set as: aii = − i6=j aj,i, and all off-diagonal elements are nonnegative. The analytical ODE models based on the Law of Mass Action, impor- solution at time t to (21) can be written as a matrix tant questions such as switching probability between exponential: the “on” and “off” states requires a treatment of the stochasticity. Although there have been great recent P (t) = exp(At)P (0). (22) progresses in deriving analytical solutions[53-56], they are applicable under special conditions, such as fast At The matrix e is the Markovian state transition pro- transition between the on- and off-states, or overall bability matrix with time duration t. We can also ob- tain its discrete equivalent M as[40]:

M = I + A · ∆t, (23) where I is the identity matrix, ∆t is a small time inter- val during which one reaction occurs. When the system has reached the steady state, the probability landscape over the enumerated states P can be computed by solv- ing the equation: P = MP . Here P can be obtained with an iterative solver such as that based on the successive over-relaxation (SOR) technique[49]. Alternatively, since P for the steady state corresponds to the eigenvector of M with eigen- value 1.0, one can obtain P by using eigenvector method such as the Arnoldi method[50], as done in [48]. By examining computationally the stochastic be- havior of genetic circuits for wild type and mutant networks, and by studying the of rare events, one can gain further understanding of the re- gulation mechanism of genetic circuits, its system sta- bility against perturbation (such as fluctuations in nu- Fig.2. Model of a toggle switch. (a) The network model and the tritional conditions), and its robustness against genetic reaction rates. Single copies of gene A and gene B encode protein mutations (such as those due to DNA damage)[51]. products. Two protein monomers can repress the transcription 7 Two Examples of Stochastic Biochemical of the other gene. The synthesis of protein product of gene A and Systems and Their CMEs gene B depends on the bound or unbound state of the gene. (b) The chemical reactions of the 8 stochastic processes involved in In this section we give two examples on how exact the toggle-switch system. The reaction rates include s for protein stationary probability landscapes of a biochemical net- synthesis, d for protein degradation, b for protein-gene binding, work can be computed from its CME. The CME, of and u for protein-gene unbinding (adapted from [48]). 8 J. Comput. Sci. & Technol., Jan. 2010, Vol.25, No.1 small noise associated with high concentrations. With phage lambda to survive, as it allows phage to escape the algorithm for state enumeration, the steady state from hopelessly distressed E. coli host cells. In phage landscape probability of the toggle-switch can be solved lambda, a gene regulatory circuit controls the switching exactly for models with arbitrary parameter specifica- between the maintenance of the lysogenic state and the tions. induction of the lytic state. The CME model analysis Epigenetic Switch in Phage Lambda. Exact solution clearly demonstrates the idea of a distributive epige- of the CME can also be obtained for larger systems in netic code. As a paradigm for understanding cell de- which biological phenomenon are modeled more realis- velopment, phage lambda has been extensively studied, tically. An example is the epigenetic switch of phage with the molecular components and reaction rates well lambda. Phage lambda is a virus that infects E. coli characterized (see the seminal book by Ptashne[57]). bacteria. It is the system in which gene regulation The key components of the switch of the genetic cir- was first studied. Upon infection, phage lambda can cuits and their wirings can be summarized in Fig.3. choose two different life styles. In the lysogenic path- There are three operators (OR1, OR2, and OR3) and way, the DNA of phage lambda becomes integrated two promoters (Pr and Prm). These are used to control into the chromosome of the host, and can replicate the transcription of CI and Cro proteins, which dimer- for many generations along with the host. Upon ad- ize and bind to the operator sites with different affinity verse environmental perturbations such as UV irradia- and inhibit the expression of each other[57]. tion, phage lambda switches from the lysogenic path- The importance of stochasticity in the genetic cir- way to the lytic pathway, in which it uses the protein cuit of lambda phage is well recognized, and its effects synthesis machinery of the host, and replicate to 100s have been studied using stochastic simulations[58] and of copies, which leads to the burst of the host cell. The stochastic differential equations[24,59]. The steady state lytic pathway offers critical evolutionary advantage for probability landscape of the CME model based on the

Fig.3. Phage λ switching network. Reactions including binding and unbinding, synthesis and degradation, dimerization are labeled as arrows, along with the corresponding kinetic constants (adapted from [51]).

Fig.4. Lysogenic and lytic states and CI synthesis rate. (a) Lysogenic state, Ks CI=0.045/s. (b) The switching state, Ks CI=0.0245/s. (c) Lytic state, Ks CI=0.0077/s. X and Y axes are copy numbers of CI and Cro dimers; and Z axis is the marginal probability (adapted from [51]). Jie Liang et al.: Computational Cellular Dynamics Based on the CME 9

[51] network depicted in Fig.3 can be solved directly . matrix obtained by selecting states in J1 and J2 are

Fig.4 shows the probability landscape under several AJ1 and AJ2 , respectively. The first observation is: physiological conditions when the system is in the lyso- A A genic state, in transitory switching state, and in lytic J2 J1 (e )J1 > e > 0. (24) state[51]. Fig.5 shows the phase diagram of concentra- tions of CI and Cro at different CI synthesis rate. This relation implies that by increasing the size of the selected subset of states, the approximation improves monotonically. Second, if one obtains a reduced state space by selecting states contained in the index set J T tAJ and if 1 e PJ (0) > 1 − ² for ² > 0 and t > 0, then:

tAJ tAJ e P J (0) 6 P J (t) 6 e P J (0) + ²I. (25)

That is, starting with the initial probability of the re- duced vector P J (0), compute the probability vector in tAJ the reduced space e P J (0) at time t using the re- duced rate matrix AJ . If the inner-product of this vector for time t with 1 is no less than 1 − ², then the error of this vector with the projected true vector P J (t) from the true probability P (t) is no more than ²I. This inequality guarantees that the approximation obtained with reduced state space will never exceed the actual solution, and its error is bounded by ²[60]. These key observations led to the Finite State Project Algorithm, which iteratively adds new states to an initial reduced state space, until the approxima- tion error is within a prescribed bound[60]. Munsky and Fig.5. Relative CI and Cro dimer levels for wild type and mu- Khammash further extended the original Finite State [61] tant lambda phage. The lysogenic state has high CI (solid line) Projection method , and recommends running a few concentration, and the lytic state has high Cro (dashed line) con- steps of stochastic simulation to obtain the initial prob- centration. Wild type lambda phage has a well-behaving switch, ability vector P (0) that is non-sparse. However, there while mutants are all leaky (adapted from [51]). are no generally applicable strategies as to what states to add to a finite projection to most efficiently improve By examining computationally the stochastic beha- the approximation accuracy. vior of genetic circuits for wild type and mutant net- Krylov Subspace Method. The analytical solution to work, and by studying the probabilities of rare events, the CME can be expressed in the form of a matrix ex- one can gain further understanding of the regula- ponential P (t) = eAtP (0). As discussed before, the tion mechanism of genetic circuits, its system stabil- rate matrix A has a very large dimension but is sparse. ity against perturbation (such as fluctuations in nutri- An alternative approach to reduced the state space is tional conditions), and its robustness against genetic to convert the problem of exponentiating a large sparse [51] mutations (such as those due to DNA damage) . matrix to that of exponentiating a small dense matrix in the Krylov subspace K [62]: 8 Methods for State Space Simplification m

m−1 For large systems in which enumeration is no longer Km(At, P (0)) ≡ Span{P (0), ··· , (At) P (0)}. feasible, one approach for numerical computation is to (26) reduce the large number of microstates to a smaller fi- The idea is that the Krylov subspace used is of a very [60] nite number . small dimension of m = 30−60. Denoting ||·||2 as the 2- Finite State Projection. Munsky and Khammash norm of a vector or matrix, the approximation then be- made two key observations about projecting the high comes P (t) ≈ ||P (0)||2V m+1 exp(tHm+1)e1, where e1 dimensional state space to a lower dimensional finite is the first unit basis vector, V m+1 is a (m+1)×(m+1) space by including only a subset of the original states. matrix formed by the orthonormal basis of the Krylov Denote two sets of indice of the states being chosen subspace, and Hm+1 the upper Hessenberg matrix, [63] as J1 and J2, and assume J1 ⊆ J2. The reduced rate both computed from an Arnoldi algorithm . The 10 J. Comput. Sci. & Technol., Jan. 2010, Vol.25, No.1 error can be bounded by Approximation by Continuous Stochastic Differen- tial Equation. An effective approach to study bioche- m−t||A||2 m O(e (t||A||2/m) ). mical networks whose chemical master equations can- not be solved directly is to approximate them with One only needs to compute explicitly exp(H t). m+1 stochastic differential equations. One widely used ap- This is a simpler problem as m is much smaller. A proach is that of the Fokker-Planck-Langevin model[10]. special form of the well-known Pad´erational of polyno- The Langevin equation for concentration flux consists mials instead of Taylor expansion is used[64-65]: of a drift term and a diffusion term. The drift term models the macroscopic deterministic component of the etHm+1 ≈ N (tH )/N (−tH ), pp m+1 pp m+1 system. It reflects the time-dependent evolution of the P p k mean concentrations of the molecular species. The dif- where Npp(tHm+1) = k=0 ck(tHm+1) and ck = p+1−k fusion term models the intrinsic stochasticity of the sys- ck−1 · . The Expokit software by Sidje pro- (2p+1−k)k tem. The basic form of a Langevin, stochastic differen- vides an excellent implementation of this method[65]. tial equation is: This approach has been shown to be very effective in 5 studying large dynamic system (n = 8.0 × 10 ) such as dX ³ 1 ´ [40] = µ(X) + σ(X)N 0, . (28) protein folding and signaling transmission in macro- dt dt molecular assembly of GroEL-GroES[66]. The Krylov subspace method concurrently evaluate Here X is the vector of concentrations of molecular the matrix exponential. The overall scheme can be ex- species in the reaction system, µ(X) the drift term, and pressed as follows: the second term is the diffusion term. Here N (0, 1/dt) is a vector of one-dimensional Gaussians, with zero P (t) ≈ exp(τK AK ) ... exp(τ0A0)P (0), mean and 1/dt variance. The coefficient σ(X) controls XK the amplitude of the Gaussian noise. It can be either t = τk, (27) a function of X or a constant. The key issue in deve- k=0 loping Langevin models for biochemical networks is to determine µ(X) and σ(X). When σ(X) is a vector of in which the evaluation is from right to left. Here {τi} are size of time steps, and K is the total number of time constants, one adjusts its values so the variance of the steps[62]. Gaussian noise produce the correct fluctuations in the [10] MacNamara et al. further extends the Krylov sub- system . space method by splitting the rate matrix A. Based on One of the most important issues to keep in mind the reachability criteria, one can divide the states into when developing Fokker-Planck-Langevin approxima- the “fast partition” and the “slow partition”[67]. Here tions for a CME is the Keizer’s paradox previously dis- the condition is that two states belong to the same sub- cussed. For dynamical system with a single, globally set of the fast partition if and only if one can be reached attracting attractor, however, this is not an issue. We from the other via a sequence of finite fast reactions[67]. shall use the Schnakenberg model to demonstrate how Correspondingly, the matrix can be splitted into two: well the Langevin approach works. Originally deve- loped for studying the limit cycle behavior in a sim- [31,69] A = Af + As, ple chemical reaction system , the Schnakenberg model is a simple system with two reacting components where Af corresponds to the fast CME, and As corre- and three reversible reactions: sponds to the slow CME, and one has:

k1 k2 k3 P˙ (t) = A P (t) X ­ A, B ­ Y, 2X + Y ­ 3X, (29) f f f k−1 k−2 k−3 and where X and Y are reacting species of the system, and ˙ P s(t) = AsP s(t). A and B are external reactants whose concentrations With this deliberate separation, both Af and As (or copy numbers) are fixed constants. Each reaction maintain the important property of being infinitesi- has a corresponding microscopic reaction rate. The mal generators of continuous time Markov processes by fixed copy numbers or concentrations of A and B can be themselves[67]. With more elaborated splitting scheme adjusted, which lead to different behavior of the system. for aggregation of Markov processes, the Krylov sub- This simple system already produces complex behavior space projection method have been shown to be com- such as oscillation and has a single stable limit cycle putationally very efficient[62]. (see [6, 70] for recent examples). Jie Liang et al.: Computational Cellular Dynamics Based on the CME 11

Fig.6. Calculated steady state probability distributions over different copy numbers of X and Y and the trajectories of evolving concentrations of X and Y of the Schnakenberg model. (a) and (b): Trajectories of evolving concentrations of X and Y according to the deterministic ordinary differential equation (ODE). Here (a) shows the well-known oscillating limit cycle behavior of the Schnakenberg model, and (b) shows the convergence towards a fixed point. The concentrations of A and B are set at values equivalent to the copy numbers used in stochastic models. (c) and (d): Reconstructed probability distributions over X and Y obtained from 200 000 simulations of the Langevin equation (LE). (e) and (f): Exact probability distributions over copy numbers X and Y obtained by solving the chemical master equation (CME). Two sets of copy numbers of (A, B) at (10, 50) and (20, 40) are used for the fixed parameters A and B (adapted from [68]).

The macroscopic concentration obtained by solv- As can be seen in Figs. 6, and 7, the model of ing the corresponding ODE model, the approximated Langevin equation approximates well the true proba- steady state probability distribution obtained by inte- bility landscape of the chemical master equation. This grating the Langevin model, and the exact probability demonstrates that the diffusion term models the intrin- distributions obtained by solving the chemical master sic stochasticity of the Schnakenberg model well. equation[48] are shown in Fig.6. At the parameter va- Alternative models account for the stochasticity lues of A = 10 and B = 50, the well-known oscillating by replacing the diffusion term with a term for the limit cycle behavior of the Schnakenberg model can be variance-covariance between pairs of the molecular seen in Fig.6(a). At A = 20 and B = 40, the be- reactions[71], or between concentrations of different havior of the system converges towards a fixed point molecular species, without the explicit inclusion of a (Fig.6(b)). The landscapes of the steady state proba- random process[72]. The magnitude of the covariance bility distributions obtained from solving the Langevin is determined by the Hessian matrix of the second- equation (Fig.6(c)) and the chemical master equation order partial derivative of the propensity functions of (Fig.6(e)) all show a crater, or a basin surrounded by a the reactions[71-72]. This inclusion of the second mo- mountainous ridge for the parameter set of A = 10 and ments to account for the stochasticity is the basis of B = 50 (details not shown). This corresponds well with the stochastic kinetic model[71] and the mass fluctua- the limit cycle behavior observed in the ODE model. tion kinetic model (MFK)[72]. These models can model At A = 20 and B = 40, the landscapes show a single reactions involving one or two molecules well[71-72]. peak (Figs. 6(c) and 6(d)), which again corresponds They are similar in spirit to the Fokker-Planck equation well with the fixed-point behavior observed in the ODE model of the CME as described in [73] by including a model. second moment term for better approximation, but are 12 J. Comput. Sci. & Technol., Jan. 2010, Vol.25, No.1 different from that of [73] as they are macroscopic in and degradation of molecules an integral part of the sys- nature and do not involve any random processes. tem, hence there are no equilibrium states. Instead, one Yet another approach is to directly model explicitly frequently seeks to study the non-equilibrium steady the stochastic coupling of the macroscopic concentra- state. Second, once the energy of a protein conforma- tions of molecular species, in addition to the Gaussian tion is known, the relative probability of its sequence noise of the original Langevin model[68]. The steady adopting this conformation in the equilibrium state can state probability landscape of the Schnakenberg model be calculated from the Boltzmann distribution, without resulting from this approach is shown in Fig.7. Sig- the need of knowing all other possible conformations nificant improvement after incorporating the coupling and their associated probabilities. In other words, the term can be seen in Fig.7. protein folding problem is local in the energy landscape. In contrast, it is not possible to calculate the relative probability of a specific microstate of copy numbers a priori without solving the entire CME, as the proba- bility distribution of network states do not generally follow any specific analytical forms (no detailed bal- ance and the existence of cycle fluxes). Third, tran- sitions between microstates are clearly defined in bio- chemical networks by the reactions, whereas transitions between different protein conformations often techni- cally depend on specific move sets, which are different in terms of allowable transitions between states and tran- sition rates, although all such move-sets are developed with the goal to mimic physical movement of molecules.

9 Discussions and Outlooks

In this review, we have discussed the significance of the chemical master equation (CME) as a theoretical framework for modeling nonlinear, biochemical reaction networks inside cells, and the possible mechanism of cel- lular states, or attractors, as the inheritable phenotypes with a distributive epigenetic code. The validity of such a grand theory requires close comparisons between the- oretical predictions with experiments. Solving a given CME, however, is a computationally challenging task at the present time. We have outlined several key diffi- Fig.7. Comparison of errors between different steady state so- culties, as well as some of the progresses that have been lutions of the Schnakenberg model. (a) Difference between the made so far. probability landscapes of the Langevin equation and that of the In addition to the subject of studying algorithmic chemical master equation. This represents errors in the Langevin complexity, complex system, in a broady sense, is a model. (b) The amount of the errors in (a) that are corrected by major scientific problem of computer science and com- introducing explicitly a coupling term between X and Y (adapted putational science[74]. One needs not to be reminded of from [68]). the complex phenomena exhibited in the natural world Remark. The complex nature of the stochastic dy- of her/his surroundings. How to characterize and quan- namics arising from biochemical networks bears much tify such complex behavior is of great interests for un- resemblance to another complex system, namely, that derstanding our physical and biological worlds. But of protein folding. Both have very large space of micro- what is complexity and how to define complex beha- states, and both can be modeled by transitions between viors? Through studies of the CME, one seems to be micro-states using master equations (for master equa- able to gain some deeper understanding of the issues in- tion approach in protein folding studies, see [37, 39-40]). volved through concrete physical and biology examples. However, these two systems differ in several important Recently, one of us has suggested that a key to meso- aspects. First, while protein folding can be modeled as scopic complexity[75] is in the multi-stability with mul- a relaxation process towards the equilibrium state, bio- tiple time scale dynamics[76]. Nonlinear biochemical re- chemical networks are intrinsically open, with synthesis action systems in a cell-size volume can be a prototype Jie Liang et al.: Computational Cellular Dynamics Based on the CME 13

for studying complexity. [23] Dodd I B, Micheelsen M A, Sneppen K, Thon G. Theoretical analysis of epigenetic cell memory by nucleosome modifica- Acknowledgement We thank Professors Ping tion. Cell, 2007, 129(4): 813-822. Ao, Bai-Lin Hao, Shou-Dan Liang, Kim Sneppen, Jin [24] Zhu X M, Yin L, Hood L, Ao P. Robustness, stability and ef- Wang, and Peter Wolynes for helpful discussions, and ficiency of phage λ genetic switch: Dynamical structure anal- ysis. J. Bioinf. Compt. Biol., 2004, 2(4): 785-817. Youfang Cao for reading a draft of the manuscript. [25] Ptashne M. On the use of the word “epigenetic”. Curr. Biol., 2007, 17(7): R233-R236. References [26] Mino H, Rubinstein J T, White J A. Comparison of algo- rithms for the simulation of action potentials with stochastic [1] Kurtz T G. The relationship between stochastic and deter- sodium channels. Ann. Biomed. Eng., 2002, 30(4): 578-587. ministic models for chemical reactions. J. Chem. Phys., 1972, [27] Fox R F. Stochastic versions of the Hodgkin-Huxley equa- 57(7): 2976-2978. tions. Biophys. J., 1997, 72(5): 2069-2074. [2] Beard D A, Qian H. Chemical Biophysics: Quantitative Anal- [28] Lamb H. Hydrodynamic. New York: Dover, 1945. ysis of Cellular Systems. London: Cambridge Univ. Press, [29] Morton-Firth C J, Bray D. Predicting temporal fluctuations 2008. in an intracellular signalling pathway J. Theoret. Biol., 1998, [3] Wilkinson D J. Stochastic Modeling for Systems Biology. New 192(1): 117-128. York: Chapman & Hall/CRC, 2006. [30] Elf J, Ehrenberg M. Fast evaluation of fluctuations in [4] Schl¨ogl F. Chemical reaction models for non-equilibrium biochemical networks with the linear noise approximation. phase transition. Z. Physik., 1972, 253(2): 147-161. Genome Res., 2003, 13(11): 2475-2484. [5] Murray J D. Mathematical Biology: An Introduction. 3rd [31] Vellela M, Qian H. On Poincar´e-Hillcycle map of rotational Ed., New York: Springer, 2002. random walk: Locating stochastic limit cycle in reversible [6] Qian H, Saffarian S, Elson E L. Concentration fluctuations Schnakenberg model. Proc. Roy. Soc. A: Math. Phys. in a mesoscopic oscillating chemical reaction system. Proc. Engr. Sci., 2009. (in the press) Natl. Acad. Sci. USA, 2002, 99(16): 10376-10381. [32] Dill K A, Bromberg S, Yue K, Fiebig K M, Yee D P, Thomas [7] Taylor H M, Karlin S K. An Introduction to Stochastic Mod- P D, Chan H S. Principles of protein-folding — A perspective eling. 3rd Ed., New York: Academic Press, 1998. from simple exact models. Prot. Sci., 1995, 4(4): 561-602. [8] Resat H, Wiley H S, Dixon D A. Probability-weighted dy- [33] Saliˇ A, Shakhnovich E I, Karplus M. How does a protein fold? namic Monte Carlo method for reaction kinetics simulations. Nature, 1994, 369(6477): 248-251. J. Phys. Chem. B, 2001, 105(44): 11026-11034. [34] Socci N D, Onuchic J N. Folding kinetics of protein like het- [9] Gardiner C W. Handbook of Stochastic Methods for Physics, eropolymer. J. Chem. Phys., 1994, 101: 1519-1528. Chemistry and the Natural Sciences. 3rd Ed., New York: [35] Shrivastava I, Vishveshwara S, Cieplak M, Maritan A, Ba- Springer, 2004. navar J R. Lattice model for rapidly folding protein-like het- [10] van Kampen N G. Stochastic Processes in Physics and Chem- eropolymers. Proc. Natl. Acad. Sci. U.S.A, 1995, 92(20): istry. 3rd Ed., Amsterdam: Elsevier Science, 2007. 9206-9209. [11] Vellela M, Qian H. Stochastic dynamics and nonequilibrium [36] Klimov D K, Thirumalai D. Criterion that determines the thermodynamics of a bistable chemical system: The Schl¨ogl foldability of proteins. Phys. Rev. Lett., 1996, 76(21): 4070- model revisited. J. R. Soc. Interf., 2009, 6(39): 925-940. 4073. [12] Qian H, Qian M, Tang X. Thermodynamics of the general [37] Cieplak M, Henkel M, Karbowski J, Banavar J R. Master diffusion process: Time-reversibility and entropy production. equation approach to protein folding and kinetic traps. Phys. J. Stat. Phys., 2002, 107(5/6): 1129-1141. Rev. Lett., 1998, 80(16): 3654-3657. [13] Schr¨odingerE. What Is Life? The Physical Aspect of the [38] M´elinR, Li H, Wingreen N, Tang C. Designability, thermo- Living Cell. New York: Cambridge Univ. Press, 1944. dynamic stability, and dynamics in protein folding: A lattice [14] Nicolis G, Prigogine I. Self-Organization in Nonequilibrium model study. J. Chem. Phys., 1999, 110(2): 1252-1262. Systems. New York: Wiley-Interscience, 1977. [39] Ozkan S B, Bahar I, Dill K A. Transition states and the mean- [15] H¨anggiP, Grabert H, Talkner P, Thomas H. Bistable systems: ing of φ-values in protein folding kinetics. Nature Struct. Master equation versus Fokker-Planck modeling. Phys. Rev. Biol., 2001, 8(9): 765-769. A., 1984, 29(1): 371-378. [40] Kachalo S, Lu H, Liang J. Protein folding dynamics via quan- [16] Baras F, Mansour M M, Pearson J E. Microscopic simulation tification of kinematic energy landscape. Phys. Rev. Lett., of chemical bistability in homogeneous systems. J. Chem. 2006, 96(5): 058106. Phys. 1996, 105(18): 8257-8261. [41] Chan H S, Dill K A. Compact polymers. Macromolecules, [17] Vellela M, Qian H. A quasistationary analysis of a stochastic 1989, 22(12): 4559-4573. chemical reaction: Keizer’s paradox. Bull. Math. Biol., 2007, [42] Chan H S, Dill K A. The effects of internal constraints on 69(5): 1727-1746. the configurations of chain molecules. J. Chem. Phys., 1990, [18] Keizer J. Statistical Thermodynamics of Nonequilibrium Pro- 92(5): 3118-3135. cesses. New York: Springer-Verlag, 1987. [43] Liang J, Zhang J, Chen R. Statistical geometry of packing de- [19] Bishop L, Qian H. Stochastic bistability and bifurcation in a fects of lattice chain polymer from enumeration and sequential mesoscopic signaling system with autocatalytic kinase. Bio- Monte Carlo method. J. Chem. Phys., 2002, 117(7): 3511- phys. J., 2010. (in the press) 3521. [20] Kussell E, Kishony R, Balaban N Q, Leibler S. Bacterial per- [44] Zhang J, Chen Y, Chen R, Liang J. Importance of chirality sistence: A model of survival in changing environments. Ge- and reduced flexibility of protein side chains: A study with netics, 2005, 169(4): 1804-1807. square and tetrahedral lattice models. J. Chem. Phys., 2004, [21] Turner B M. Histone acetylation and an epigenetic code. 121(1): 592-603. Bioessays, 2000, 22(9): 836-845. [45] Williams P D, Pollock D D, Goldstein R A. Evolution of func- [22] Jones P A, Takai D. The role of DNA methylation in mam- tionality in lattice proteins. J. Mole. Graph. Modelling, 2001, malian epigenetics. Science, 2001, 293(5532): 1068-1070. 19(1): 150-156. 14 J. Comput. Sci. & Technol., Jan. 2010, Vol.25, No.1

[46] Bloom J D, Wilke C O, Arnold F H, Adami C. Stability and [66] Lu H M, Liang J. Perturbation-based Markovian transmission the evolvability of function in a model protein. Biophys. J., model for probing allosteric dynamics of large macromolecular 2004, 86(5): 2758-2764. assembling: A study of GroEL-GroES. PLoS Comput. Biol., [47] Lu H M, Liang J. A model study of protein nascent chain 2009, 5(10): e1000526. and cotranslational folding using hydrophobic-polar residues. [67] Cao Y, Gillespie D T, Petzold L R. The slow-scale stochastic Prot. Struct. Funct. Bioinf., 2008, 70(2): 442-449. simulation algorithm. J. Chem. Phys., 2005, 122(1): 14116. [48] Cao Y, Liang J. Optimal enumeration of state space of finitely [68] Cao Y, Liang J. Nonlinear coupling for improved stochastic buffered stochastic molecular networks and exact computa- network model: A study of Schnakenberg model. In Proc. tion of steady state landscape probability. BMC Syst. Biol., the 3rd Symp. Optimiz. Syst. Biol., Zhangjiajie, China, 2008, 2: 30. Sept. 20–22, 2009, pp.379-386. [49] Barrett R, Berry M, Chan T F, Demmel J, Donato J, Don- [69] Schnakenberg J. Simple chemical reaction systems with limit garra J, Eijkhout V, Pozo R, Romine C, van der Vorst H. Tem- cycle behaviour. J. Theoret. Biol., 1979, 81(3): 389-400. plates for the Solution of Linear Systems: Building Blocks for [70] Qian H. Open-system nonequilibrium steady state: Statisti- Iterative Methods. 2nd Ed., Philadelphia, PA: SIAM, 1994. cal thermodynamics, fluctuations, and chemical oscillations. [50] Lehoucq R, Sorensen D, Yang C. Arpack Users’ Guide: So- J. Phys. Chem. B, 2006, 110(31): 15063-15074. lution of Large Scale Eigenvalue Problems with Implicitly [71] Goutsias J. Classical versus stochastic kinetics modeling of Restarted Arnoldi Methods. Philadelphia, PA: SIAM, 1998. biochemical reaction systems. Biophys. J., 2007, 92(7): 2350- [51] Cao Y, Lu H M, Liang J. Stochastic probability landscape 2365. model for switching efficiency, robustness, and differential [72] Uribe C A, Verghese G C. Mass fluctuation kinetics: Captur- threshold for induction of genetic circuit in phage λ. In Proc. ing stochastic effects in systems of chemical reactions through the 30th Annual Int. Conf. IEEE Eng. Med. Biol. Soc., coupled mean-variance computations. J. Chem. Phys., 2007, Vancouver, Canada, Aug. 20–24, 2008, pp.611-614. 126(2): 024109. [52] Gardner T S, Canter C R, Collins J J. Construction of [73] Keizer J. On the macroscopi equivalence of descriptions of a genetic toggle switch in Escherichia coli. Nature, 2000, fluctuations for chemical reactions. J. Math. Phys., 1977, 18: 403(6767): 339-342. 1316-1321. [53] Kepler T B, Elston T C. Stochasticity in transcriptional reg- [74] Mitchell M. Complexity: A Guided Tour. London: Oxford ulation: Origins, consequences, and mathematical represen- Univ. Press, 2009. tations. Biophys. J., 2001, 81(6): 3116-3136. [75] Laughlin R B, Pines D, Schmalian J, Stojkovi´cB P, Wolynes [54] Schultz D, Onuchic J N, Wolynes P G. Understanding P G. The middle way. Proc. Natl. Acad. Sci. USA, 2000, stochastic simulations of the smallest genetic networks. J. 97(1): 32-37. Chem. Phys., 2007, 126(24): 245102. [76] Qian H, Shi P Z, Xing J. Stochastic bifurcation, slow fluctu- [55] Kim K Y, Wang, J. Potential energy landscape and robustness ations, and bistability as an origin of biochemical complexity. of a gene regulatory network: Toggle Switch. PLoS Comput. Physical Chemistry Chemical Physics, 2009, 11(24): 4861- Biol., 2007, 3(3): e60. 4870. [56] Wang J, Xu L, Wang E. Potential landscape and flux frame- work of nonequilibrium networks: Robustness, dissipation, Jie Liang is a professor in the and of biochemical oscillations. Proc. Natl. Acad. Department of Bioengineering at the Sci. U.S.A., 2008, 105(34): 12271-12276. University of Illinois at Chicago, and [57] Ptashne M. Genetic Switch: Phage Lambda Revisited. New holds a visiting position at Shanghai York: Cold Spring Harbor Laboratory Press, 2004. Jiao Tong University. He received his [58] Arkin A, Ross J, McAdams H H. Stochastic kinetic analy- B.S. degree from Fudan University sis of developmental pathway bifurcation in phage λ-infected Escherichia coli cells. Genetics, 1998, 149(44): 1633-1648. (1986), MCS and Ph.D. degree from [59] Aurell E, Brown S, Johanson J, Sneppen K. Stability puzzles the University of Illinois at Urbana- in phage λ. Phys. Rev. E., 2002, 65(5): 051914. Champaign (1994). He was an NSF [60] Munsky B, Khammash M. The finite state projection algo- CISE postdoctoral research associate rithm for the solution of the chemical master equation. J. (1994∼1996) at the Beckman Institute and National Center Chem. Phys., 2006, 124(4): 044104. for Supercomputing and its Applications (NCSA) in Ur- [61] Munsky B, Khammash M. A multiple time interval finite state bana, Illinois. He was a visiting fellow at the Institute of projection algorithm for the solution to the chemical master equation. J. Comput. Phys., 2007, 226(1): 818-835. Mathematics and Applications at Minneapolis, Minnesota [62] Macnamara S, Bersani A M, Burrage K, Sidje R B. Stochastic in 1996, and an Investigator at SmithKline Beecham Phar- chemical kinetics and the total quasi-steady-state assumption: maceuticals in Philadelphia from 1997 to 1999. He was a Application to the stochastic simulation algorithm and chem- recipient of the NSF CAREER award in 2003. He is a fel- ical master equation. J. Chem. Phys., 2008, 129(9): 095105. low of American Institute of Medical and Biological En- [63] Datta B N. Numerical Linear Algebra and Applications. gineering, and served as regular member of the NIH Bio- Brooks/Cole Pub. Co., 1995. logical Data Management and Analysis study section. His [64] Golub G H, van Loan C F. Matrix Computations. Johns research interests are in biogeometry, biophysics, computa- Hopkins Univ. Press, 1996. [65] Sidje R B. Expokit: A software package for computing ma- tional proteomics, stochastic molecular networks, and cellu- trix exponentials. ACM Trans. Math. Softw., 1998, 24(1): lar pattern formation. His recent work can be accessed at 130-156. (http://www.uic.edu/∼jliang). Jie Liang et al.: Computational Cellular Dynamics Based on the CME 15

Hong Qian received his B.S. de- gree in astrophysics from Peking Uni- versity. He worked on fluorescence correlation spectroscopy (FCS) and single-particle tracking (SPT) and obtained his Ph.D. degree in bio- chemistry from Washington Univer- sity (St. Louis). His research inter- ests turned to theoretical biophysical chemistry and mathematical biology when he was a postdoctoral fellow at the University of Ore- gon and at the California Institute of Technology. In that period of time, he worked on protein thermodynamics, fluc- tuations and folding. Between 1994 and 1997, he was with the Department of Biomathematics at the UCLA School of Medicine, where he worked on the theory of motor proteins and single-molecule biophysics. This work led to his current interest in mesoscopic open chemical systems. He joined the University of Washington (Seattle) in 1997 and is now pro- fessor of applied mathematics, and an adjunct professor of bioengineering. His current research is in stochastic anal- ysis and statistical physics of cellular systems. His recent book “Chemical Biophysics: Quantitative Analysis of Cel- lular Systems”, co-authored with Daniel A. Beard, has been published by the Cambridge University Press.