arXiv:1207.1631v2 [q-bio.QM] 15 Jan 2013

Computation of biochemical pathway fluctuations beyond the linear noise approximation using iNA

Philipp Thomas∗†‡¶, Hannes Matuschek§¶ and Ramon Grima∗†

∗ School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
† SynthSys Edinburgh, University of Edinburgh, Edinburgh, United Kingdom
‡ Department of Physics, Humboldt University of Berlin, Berlin, Germany
§ Institute of Physics and Astronomy, University of Potsdam, Potsdam, Germany
¶ These authors contributed equally.

Abstract: The linear noise approximation is commonly used to obtain intrinsic noise statistics for biochemical networks. These estimates are accurate for networks with large numbers of molecules. However, it is well known that many biochemical networks are characterized by at least one species with a small number of molecules. We here describe version 0.3 of the software intrinsic Noise Analyzer (iNA), which allows for accurate computation of noise statistics over wide ranges of molecule numbers. This is achieved by calculating the next order corrections to the linear noise approximation's estimates of variance and covariance of concentration fluctuations. The efficiency of the methods is significantly improved by automated just-in-time compilation using the LLVM framework, leading to a fluctuation analysis which typically outperforms that obtained by means of exact stochastic simulations. iNA is hence particularly well suited for the needs of the computational biology community.

Keywords: Stochastic modeling; Linear Noise Approximation; genetic regulatory circuits

I. INTRODUCTION

Experimental studies have shown that the protein abundance varies from a few tens to several thousands per species per cell [1]. It is also known that the standard deviation of the fluctuations due to the random timing of molecular events (intrinsic noise) roughly scales as the square root of the number of molecules [2]. Hence it is expected that intrinsic noise plays an important role in the dynamics of those biochemical networks characterized by at least one species with low molecule numbers.

The stochastic simulation algorithm (SSA) is the conventional means of probing stochasticity in biochemical reaction systems [3]. This method simulates every reaction event and hence is typically slow for large reaction networks; this is particularly true if one is interested in inferring the intrinsic noise statistics, which require considerable ensemble averaging of the trajectories produced by the SSA. A different route to the required statistics involves finding an approximate solution of the chemical master equation (CME), a set of differential equations for the probabilities of the states of the system, which is mathematically equivalent to the SSA. We recently developed intrinsic Noise Analyzer (iNA) [4], the first software package enabling a fluctuation analysis of biochemical networks via the Linear Noise Approximation (LNA) and Effective Mesoscopic Rate Equation (EMRE) approximations of the CME. The former gives the variance and covariance of concentration fluctuations in the limit of large molecule numbers, while the latter gives the mean concentrations for intermediate to large molecule numbers and is more accurate than the conventional Rate Equations (REs).

In this proceeding we develop and efficiently implement in iNA the Inverse Omega Square (IOS) method, which complements the EMRE method by providing estimates for the variance of noise about the EMRE mean concentrations. These estimates are accurate for systems whose molecule numbers vary over wide ranges (few to thousands of molecules). The new method is tested on a model of gene expression involving a bimolecular reaction. Remarkably, the results of the fast IOS calculations are found to agree very well with those from hour-long ensemble averaging over thousands of SSA trajectories.

II. RESULTS

In this section we describe the results of the novel IOS method implemented inside iNA, compare them with the RE, LNA and EMRE approximations of the CME and with exact stochastic simulations using the SSA, and finally discuss the implementation and its performance. The three methods (LNA, EMRE, IOS) are obtained from the system-size expansion (SSE) of the CME [2], which is applicable to monostable chemical systems. Technical details of the various approximation methods are provided in the section Methods.

A. Applications

We consider a two-stage model of gene expression with an enzymatic degradation mechanism:

$$
\begin{aligned}
&\text{Gene} \xrightarrow{k_0} \text{Gene} + \text{mRNA}, \qquad \text{mRNA} \xrightarrow{k_{dM}} \varnothing, \\
&\text{mRNA} \xrightarrow{k_s} \text{mRNA} + \text{Protein}, \\
&\text{Protein} + \text{Enzyme} \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} \text{Complex} \xrightarrow{k_2} \text{Enzyme} + \varnothing.
\end{aligned}
\tag{1}
$$

Figure 1: Gene expression model with fast transcription rate. In panels (a) and (b) we compare the RE predictions of mean concentrations with those obtained from ensemble averaging 3,000 SSA trajectories. The shaded areas denote the region of one standard deviation around the average concentrations, which in (a) has been computed using the LNA and in (b) using the SSA. For comparison, in panel (c) we show the mean concentration prediction according to the EMRE and the variance prediction according to IOS. The results in (a), (b) and (c) are in good agreement.
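Scheme (1) translates into mass-action rate equations in the usual way, and their steady state can be written in closed form. The following is a minimal sketch of that calculation, not part of iNA; it assumes mass-action REs with conserved total enzyme, takes the parameter values and the volume from Table I, and uses illustrative function and variable names (`re_steady_state`, `molecules`) that do not appear in the paper.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def re_steady_state(k0G, kdM, ks, k1, km1, k2, enzyme_total):
    """Steady state of the mass-action rate equations of scheme (1).

    Concentrations are in micromolar and times in minutes.
    Returns the mRNA and free protein concentrations.
    """
    mrna = k0G / kdM                  # transcription balances mRNA decay
    complex_ = ks * mrna / k2         # translation flux equals catalytic flux k2*[C]
    enzyme = enzyme_total - complex_  # free enzyme from enzyme conservation
    if enzyme <= 0:
        raise ValueError("enzyme is saturated: no steady state exists")
    km = (km1 + k2) / k1              # Michaelis-Menten constant
    protein = km * complex_ / enzyme  # from d[Complex]/dt = 0
    return mrna, protein


def molecules(conc_uM, volume_l=1e-15):
    """Convert a micromolar concentration to a molecule number."""
    return conc_uM * 1e-6 * volume_l * AVOGADRO


# Parameter sets (i) and (ii) differ only in k0[G] and kdM.
for label, k0G, kdM in [("(i)", 2.4, 20.0), ("(ii)", 0.024, 0.2)]:
    mrna, protein = re_steady_state(k0G, kdM, ks=1.5, k1=400.0,
                                    km1=2.0, k2=2.0, enzyme_total=0.1)
    print(label, round(mrna, 3), round(protein, 3),
          round(molecules(mrna)), round(molecules(protein)))
```

Both parameter sets give 0.12 µM mRNA and 0.09 µM protein, that is, about 72 and 54 molecules in one femtoliter, because only the ratio of the transcription and mRNA degradation rates enters the steady state.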

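For readers who want to reproduce the SSA side of the comparison, a single stochastic trajectory of scheme (1) can be generated with Gillespie's direct method. The sketch below is an illustration under stated assumptions and not iNA's implementation: it is pure Python, starts from zero mRNA and protein, uses the units of Table I (minutes, micromolar, one femtoliter), and the function name `ssa_gene_expression` is hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
import random

AVOGADRO = 6.02214076e23                         # molecules per mole
VOLUME_L = 1e-15                                 # one femtoliter, as in Table I
MOLECULES_PER_UM = AVOGADRO * VOLUME_L * 1e-6    # about 602 molecules per micromolar

# Changes (mRNA, protein, enzyme, complex) caused by each reaction of scheme (1).
CHANGES = [
    (+1, 0, 0, 0),    # Gene -> Gene + mRNA
    (-1, 0, 0, 0),    # mRNA -> 0
    (0, +1, 0, 0),    # mRNA -> mRNA + Protein
    (0, -1, -1, +1),  # Protein + Enzyme -> Complex
    (0, +1, +1, -1),  # Complex -> Protein + Enzyme
    (0, 0, +1, -1),   # Complex -> Enzyme
]


def ssa_gene_expression(k0G, kdM, ks, k1, km1, k2,
                        enzyme_uM=0.1, t_end=50.0, seed=0):
    """Simulate scheme (1) up to t_end minutes; return final molecule numbers."""
    rng = random.Random(seed)
    v = MOLECULES_PER_UM
    m, p, c = 0, 0, 0
    e = round(enzyme_uM * v)
    t = 0.0
    while True:
        # Running sums of the six propensities (molecules per minute).
        cumulative = list(accumulate([
            k0G * v,            # zeroth order: concentration rate times volume factor
            kdM * m,
            ks * m,
            (k1 / v) * p * e,   # bimolecular: rate constant divided by volume factor
            km1 * c,
            k2 * c,
        ]))
        total = cumulative[-1]
        # Exponentially distributed waiting time until the next reaction.
        t += rng.expovariate(total)
        if t > t_end:
            return {"mRNA": m, "Protein": p, "Enzyme": e, "Complex": c}
        # Choose reaction j with probability proportional to its propensity.
        j = bisect_right(cumulative, rng.random() * total)
        dm, dp, de, dc = CHANGES[j]
        m, p, e, c = m + dm, p + dp, e + de, c + dc


# One trajectory for parameter set (ii) of Table I.
state = ssa_gene_expression(k0G=0.024, kdM=0.2, ks=1.5, k1=400.0, km1=2.0, k2=2.0)
print(state)
```

Means and variances follow from averaging many such runs with different seeds, which is why the ensemble sizes of thousands of trajectories quoted in the figure captions make the SSA expensive compared with solving the SSE equations once.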
The scheme involves the transcription of mRNA, its translation to protein and subsequent degradation of both mRNA and protein. Note that while mRNA is degraded via an unspecific linear reaction, the degradation of protein occurs via an enzyme catalyzed reaction. The latter may model proteolysis, the consumption of protein by a metabolic pathway or other post-translational modifications. A simplified version of this model is one in which the protein degradation is replaced by the linear reaction Protein → ∅. Over the past decade the latter model has been the subject of numerous studies, principally because it can be solved exactly since the scheme is composed of purely first-order reactions [5]–[7]. However, the former model, as given by scheme (1), cannot be solved exactly because of the bimolecular association reaction between enzyme and protein. Hence in what follows we demonstrate the power of approximation methods to infer useful information regarding the mechanism's intrinsic noise properties.

We consider the model for two parameter sets (see Tab. I) at a fixed compartmental volume of one femtoliter (one micron cubed). For both cases, the REs predict the same steady state mRNA and protein concentrations: [mRNA] = 0.12 µM and [Protein] = 0.09 µM. These correspond to 72 and 54 molecules, respectively.

Table I: Kinetic parameters used for the gene expression model, scheme (1), as discussed in the main text. The volume is fixed to Ω = 10^-15 l with an enzyme concentration of 0.1 µM, corresponding to 60 enzyme molecules. The Michaelis-Menten constant is 0.01 µM in all cases.

    parameter    set (i)            set (ii)
    k0[G]        2.4 min^-1 µM      0.024 min^-1 µM
    kdM          20 min^-1          0.2 min^-1
    ks           1.5 min^-1         1.5 min^-1
    k−1, k2      2 min^-1           2 min^-1
    k1           400 (µM min)^-1    400 (µM min)^-1

We have used iNA to compute the mean concentrations using the REs and the variance of fluctuations using the LNA for parameter set (i), in which transcription is fast. Comparing these with SSA estimates (see Fig. 1) we see that the REs and LNA provide reasonably accurate results for this parameter set. This analysis was within the scope of the previous version of iNA [4]. However, the scenario considered is not particularly realistic. This is because the ratio of protein and mRNA lifetimes in this example is approximately 100 (as estimated from the time taken for the concentrations to reach 90% of their steady state values), while an evaluation of 1,962 genes in budding yeast showed that the ratios have median and mode close to 3 [7].

We now consider parameter set (ii), the case of moderate transcription. In Fig. 2(a) and (b) we compare the RE and LNA predictions of mean concentrations and variance of fluctuations with those obtained from the SSA. Notice that in this case the two are in severe disagreement. The SSA predicts that the mean concentration of protein is larger than that of the mRNA, while the REs predict the opposite. This discreteness-induced inversion phenomenon has been described in Ref. [8]. It is also the case that the variance estimate of the LNA is considerably smaller than that of the SSA. In Fig. 2(c) we show the mean concentrations computed using the EMRE and the variance computed using the IOS method. Note that the latter are in good agreement with the SSA in Fig. 2(b). Note further that while the EMRE was already implemented in the previous version of iNA, the IOS was not. Hence the present version is the first to provide estimates of the mean concentrations and of the size of the noise which are more accurate than both the REs and the LNA. The transient times for this case are 37 minutes for protein and 12 minutes for mRNA concentrations, with a ratio of approximately 3, in agreement with the median and mode of experimentally measured ratios. Hence this example provides clear evidence of the need to go beyond the RE and LNA level of approximation for physiologically relevant parameters of the gene expression model.

Figure 2: Gene expression model with moderate transcription rate. In panels (a) and (b) we compare the RE and LNA predictions of mean concentrations and variance of fluctuations with those obtained from ensemble averaging 30,000 stochastic realizations computed using the SSA. Note that the RE and LNA predictions are very different from the actual values. In panel (c) we show the mean concentration prediction according to the EMRE and the variance prediction according to IOS. These are in good agreement with those obtained from the SSA.

B. Implementation

iNA's framework consists of three layers of abstraction: the SBML parser, which sets up a mathematical representation of the reaction network; a module which performs the SSE analytically using the computer algebra system GiNaC [9]; and a just-in-time (JIT) compiler based on the LLVM [10] framework, which compiles the mathematical model into platform specific machine code at runtime, yielding high performance of the SSE and SSA analyses implemented in iNA. Generally, methods based on the SSE require the numerical solution of a high dimensional system of linear equations. In the present version 0.3 of iNA, time-course analysis is efficiently performed by means of the all-round ODE integrator LSODA, which automatically switches between explicit and implicit methods [11]. In particular, the number of simultaneous equations to be solved for the LNA and EMRE analysis is approximately proportional to $N^2$, where $N$ is the number of independent species after conservation analysis [12], while for IOS the number of equations is approximately proportional to $N^3$. Quadratic and cubic dependencies represent a challenge for software development. iNA's previous version 0.2 addressed this need for performance by providing a bytecode interpreter (BCI) for efficient expression evaluation [4]. The latter concept has proven its performance for both SSE and SSA methods while maintaining compatibility over many platforms. With the present version we introduce a strategy for JIT compilation based on the modern compiler framework LLVM, which allows iNA to emit and execute platform specific machine code at runtime [10]. The system size expansion ODEs are automatically compiled to native machine code that is executed directly on the CPU. Therefore iNA's JIT feature enjoys the speed of statically compiled code while maintaining the flexibility common to interpreters. Moreover, the LLVM framework allows optimizations on platform independent and platform specific instruction levels, which are beneficial for computationally expensive calculations.

Using JIT compilation we have observed speedups, compared to iNA's previous implementation, by factors of 10 to 20 for the SSE analysis and by factors of 1.5 to 2 for the SSA.

III. METHODS

We consider a general reaction network confined in a volume $\Omega$ under well-mixed conditions and involving the interaction of $N$ distinct chemical species via $R$ chemical reactions of the type

$$
s_{1j} X_1 + \ldots + s_{Nj} X_N \xrightarrow{k_j} r_{1j} X_1 + \ldots + r_{Nj} X_N, \tag{2}
$$

where $j$ is the reaction index running from 1 to $R$, $X_i$ denotes chemical species $i$, $k_j$ is the reaction rate of the $j$th reaction and $s_{ij}$ and $r_{ij}$ are the stoichiometric coefficients. Note that our general formulation does not require all reactions to be elementary.

The CME gives the time-evolution equation for the probability $P(\vec{n}, t)$ that the system is in a particular mesoscopic state $\vec{n} = (n_1, \ldots, n_N)^T$, where $n_i$ is the number of molecules of the $i$th species. It is given by:

$$
\frac{\partial P(\vec{n}, t)}{\partial t} = \sum_{j=1}^{R} \left( \prod_{i=1}^{N} E_i^{-S_{ij}} - 1 \right) \hat{a}_j(\vec{n}, \Omega)\, P(\vec{n}, t), \tag{3}
$$

where $S_{ij} = r_{ij} - s_{ij}$, $\hat{a}_j(\vec{n}, \Omega)$ is the propensity function such that the probability for the $j$th reaction to occur in the time interval $[t, t + dt)$ is given by $\hat{a}_j(\vec{n}, \Omega)\,dt$ [3], and $E_i^{-S_{ij}}$ is the step operator defined by its action on a general function of molecular populations as $E_i^{-S_{ij}} g(n_1, \ldots, n_i, \ldots, n_N) = g(n_1, \ldots, n_i - S_{ij}, \ldots, n_N)$ [2].

The CME is typically intractable for computational purposes because of the inherently large state space. iNA circumvents this problem by systematically approximating the moments of the probability distribution that solves the CME using van Kampen's SSE [2], [13]. The starting point of the analysis is van Kampen's ansatz

$$
\frac{\vec{n}}{\Omega} = [\vec{X}] + \Omega^{-1/2}\, \vec{\epsilon}, \tag{4}
$$

by which one separates the instantaneous concentration into a deterministic part, given by the solution $[\vec{X}]$ of the macroscopic REs for the reaction scheme (2), and the fluctuations around it, parametrized by $\vec{\epsilon}$. The change of variable causes the probability distribution of molecular populations $P(\vec{n}, t)$ to be transformed into a new probability distribution of fluctuations $\Pi(\vec{\epsilon}, t)$. The time derivative, the step operator and the propensity functions are also transformed (see [4] for details). In particular, using ansatz (4) together with the explicit $\Omega$-dependence of the propensities as given by $\hat{a}_j(\vec{n}, \Omega) = \sum_{m=0}^{\infty} \Omega^{1-m} f_j^{(m)}\!\left(\frac{\vec{n}}{\Omega}\right)$, the propensities can be expanded in powers of $\Omega^{-1/2}$:

$$
\begin{aligned}
\frac{\hat{a}_j(\vec{n}, \Omega)}{\Omega} ={}& f_j^{(0)}([\vec{X}]) + \Omega^{-1/2} \epsilon_\alpha \frac{\partial f_j^{(0)}([\vec{X}])}{\partial [X_\alpha]} + \Omega^{-1} f_j^{(1)}([\vec{X}]) \\
&+ \frac{1}{2} \Omega^{-1} \epsilon_\alpha \epsilon_\beta \frac{\partial^2 f_j^{(0)}([\vec{X}])}{\partial [X_\alpha]\, \partial [X_\beta]} + \Omega^{-3/2} \epsilon_\alpha \frac{\partial f_j^{(1)}([\vec{X}])}{\partial [X_\alpha]} \\
&+ \frac{1}{3!} \Omega^{-3/2} \epsilon_\alpha \epsilon_\beta \epsilon_\gamma \frac{\partial^3 f_j^{(0)}([\vec{X}])}{\partial [X_\alpha]\, \partial [X_\beta]\, \partial [X_\gamma]} + O(\Omega^{-2}).
\end{aligned}
\tag{5}
$$

Note that $\vec{f}^{(0)}$ is the macroscopic rate function. Here we have used the convention that twice repeated Greek indices are summed over 1 to $N$, which we use in what follows as well. Consequently the CME up to order $\Omega^{-1}$ can be written as

$$
\begin{aligned}
&\frac{\partial \Pi(\vec{\epsilon}, t)}{\partial t} - \Omega^{1/2} \left( \frac{\partial [X_\alpha]}{\partial t} - \sum_{k=1}^{R} S_{\alpha k} f_k^{(0)}([\vec{X}]) \right) \partial_\alpha \Pi(\vec{\epsilon}, t) \\
&\qquad = \left( \Omega^{0} \mathcal{L}^{(0)} + \Omega^{-1/2} \mathcal{L}^{(1)} + \Omega^{-1} \mathcal{L}^{(2)} \right) \Pi(\vec{\epsilon}, t) + O(\Omega^{-3/2}),
\end{aligned}
\tag{6}
$$

where $\partial_\alpha$ denotes $\partial / \partial \epsilon_\alpha$ and the operators are defined as

$$
\begin{aligned}
\mathcal{L}^{(0)} ={}& -\partial_\alpha J_\alpha^{\beta} \epsilon_\beta + \frac{1}{2} \partial_\alpha \partial_\beta D_{\alpha\beta}, \\
\mathcal{L}^{(1)} ={}& -\partial_\alpha D_\alpha^{(1)} - \frac{1}{2!} \partial_\alpha J_\alpha^{\beta\gamma} \epsilon_\beta \epsilon_\gamma + \frac{1}{2!} \partial_\alpha \partial_\beta J_{\alpha\beta}^{\gamma} \epsilon_\gamma - \frac{1}{3!} \partial_\alpha \partial_\beta \partial_\gamma D_{\alpha\beta\gamma}, \\
\mathcal{L}^{(2)} ={}& -\partial_\alpha J_\alpha^{(1)\beta} \epsilon_\beta + \frac{1}{2!} \partial_\alpha \partial_\beta D_{\alpha\beta}^{(1)} - \frac{1}{3!} \partial_\alpha J_\alpha^{\beta\gamma\delta} \epsilon_\beta \epsilon_\gamma \epsilon_\delta \\
&+ \frac{1}{2!} \frac{1}{2!} \partial_\alpha \partial_\beta J_{\alpha\beta}^{\gamma\delta} \epsilon_\gamma \epsilon_\delta - \frac{1}{3!} \partial_\alpha \partial_\beta \partial_\gamma J_{\alpha\beta\gamma}^{\delta} \epsilon_\delta + \frac{1}{4!} \partial_\alpha \partial_\beta \partial_\gamma \partial_\delta D_{\alpha\beta\gamma\delta},
\end{aligned}
\tag{7}
$$

and the SSE coefficients are given by

$$
\begin{aligned}
D^{(n)}_{ij..r} &= \sum_{k=1}^{R} S_{ik} S_{jk} \ldots S_{rk}\, f_k^{(n)}([\vec{X}]), \\
J^{(n)\,st..z}_{ij..r} &= \frac{\partial}{\partial [X_s]} \frac{\partial}{\partial [X_t]} \ldots \frac{\partial}{\partial [X_z]} D^{(n)}_{ij..r}.
\end{aligned}
\tag{8}
$$

Note that the above expressions generalize the expansion carried out in Ref. [14] to include also non-elementary reactions, for instance trimolecular reactions or reactions with propensities of the Michaelis-Menten type [15]. Note also that the $\Omega^{1/2}$ term vanishes since the macroscopic REs are given by $\partial_t [X_\alpha] = \sum_{k=1}^{R} S_{\alpha k} f_k^{(0)}([\vec{X}])$, leaving us with a series expansion of the CME in powers of the inverse square root of the volume. In Eqs. (7) we have omitted the upper index in the bracket of the SSE coefficients in the case of $n = 0$.

The method now proceeds by constructing equations for the moments of the $\vec{\epsilon}$ variables. This is accomplished by expanding the probability distribution of fluctuations $\Pi(\vec{\epsilon}, t)$ in terms of the inverse square root of the volume as $\Pi(\vec{\epsilon}, t) = \sum_{j=0}^{\infty} \Pi_j(\vec{\epsilon}, t)\, \Omega^{-j/2}$, which implies an equivalent expansion of the moments

$$
\langle \epsilon_k \epsilon_l \ldots \epsilon_m \rangle = \sum_{j=0}^{\infty} [\epsilon_k \epsilon_l \ldots \epsilon_m]_j\, \Omega^{-j/2}. \tag{9}
$$

In order to relate the above moments back to the moments of the concentration variables we use Eqs. (4) and (9) to find expressions for the mean concentrations and covariance of fluctuations, which are given by

$$
\left\langle \frac{n_i}{\Omega} \right\rangle = [X_i] + \Omega^{-1} [\epsilon_i]_1 + O(\Omega^{-2}), \tag{10a}
$$

$$
\begin{aligned}
\Sigma_{ij} &= \left\langle \left( \frac{n_i}{\Omega} - \left\langle \frac{n_i}{\Omega} \right\rangle \right) \left( \frac{n_j}{\Omega} - \left\langle \frac{n_j}{\Omega} \right\rangle \right) \right\rangle \\
&= \Omega^{-1} [\epsilon_i \epsilon_j]_0 + \Omega^{-2} \left( [\epsilon_i \epsilon_j]_2 - [\epsilon_i]_1 [\epsilon_j]_1 \right) + O(\Omega^{-3}).
\end{aligned}
\tag{10b}
$$

The order $\Omega^0$ term of Eq. (10a) denotes the mean concentrations as given by the macroscopic REs, while the $\Omega^{-1}$ term in Eq. (10b) gives the LNA estimate for the covariance. Including terms to order $\Omega^{-1}$ in Eq. (10a) gives the EMRE estimate of the mean concentrations, which corrects the estimate of the REs. Finally, considering also the $\Omega^{-2}$ term in Eq. (10b) gives the IOS (Inverse Omega Square) estimate of the variance, which is centered around the EMRE concentrations and is of higher accuracy than the LNA method. The general procedure together with the equations determining the coefficients of Eq. (9) is presented in Refs. [14] and [16]. In brief, the result can be summarized as follows: as mentioned above, truncating Eq. (6) to order $\Omega^{1/2}$ yields the macroscopic REs, while truncation after terms up to order $\Omega^0$ gives a Fokker-Planck equation with linear drift and diffusion coefficients, which is also called the Linear Noise Approximation. Considering terms up to $\Omega^{-1/2}$ one obtains the EMREs, while the terms up to $\Omega^{-1}$ determine the corrections beyond the LNA as given by the present IOS method. Since the IOS method has been derived from van Kampen's ansatz, Eq. (4), which expands the CME around the macroscopic concentrations, it has the same limitation as the LNA, namely that it cannot account for systems exhibiting bistability.

IV. DISCUSSION

In this proceeding we have introduced and implemented the IOS approximation in the software package iNA. This allows the variance of fluctuations to be determined accurate to order $\Omega^{-2}$, an approximation which complements the EMRE method (mean concentrations accurate to order $\Omega^{-1}$) and is superior to the previously implemented LNA method (variances accurate to order $\Omega^{0}$). As we have shown, this increased accuracy is needed to account for the effects of intrinsic noise in biochemical reaction networks under low molecule number conditions. In particular, we have demonstrated the utility of the software by analyzing an example of gene expression with a functional enzyme. We have also extended iNA by a more efficient JIT compilation strategy in combination with improved numerical algorithms, which offers high performance and makes computations feasible even on desktop PCs. This feature is particularly important when analyzing noise in reaction networks of intermediate or large size with bimolecular reactions, conditions that have been shown to amplify the deviations from the conventional rate equation description [17].

ACKNOWLEDGMENT

RG gratefully acknowledges support by SULSA (Scottish Universities Life Science Alliance).

AVAILABILITY

The software iNA version 0.3 is freely available under http://code.google.com/p/intrinsic-noise-analyzer/ as executable binaries for Linux, MacOSX and Microsoft Windows, as well as the full source code under an open source license.

REFERENCES

[1] Y. Ishihama, T. Schmidt, J. Rappsilber, M. Mann, U. Hartl, M. J. Kerner, and D. Frishman. Protein abundance profiling of the Escherichia coli cytosol. BMC Genomics, 9:102, 2008.

[2] N.G. van Kampen. Stochastic processes in physics and chemistry. North-Holland, 3rd edition, 2007.

[3] D.T. Gillespie. Stochastic simulation of chemical kinetics. Annu. Rev. Phys. Chem., 58:35–55, 2007.

[4] P. Thomas, H. Matuschek, and R. Grima. Intrinsic noise analyzer: A software package for the exploration of stochastic biochemical kinetics using the system size expansion. PLoS ONE, 7(6):e38518, 2012.

[5] E.M. Ozbudak, M. Thattai, I. Kurtser, A.D. Grossman, and A. van Oudenaarden. Regulation of noise in the expression of a single gene. Nature Genetics, 31(1):69–73, 2002.

[6] J. Paulsson. Models of stochastic gene expression. Physics of Life Reviews, 2(2):157–175, 2005.

[7] V. Shahrezaei and P.S. Swain. Analytical distributions for stochastic gene expression. Proceedings of the National Academy of Sciences, 105(45):17256, 2008.

[8] R. Ramaswamy, N. González-Segredo, I.F. Sbalzarini, and R. Grima. Discreteness-induced concentration inversion in mesoscopic chemical systems. Nature Communications, 3:779, 2012.

[9] C. Bauer, A. Frink, and R. Kreckel. Introduction to the GiNaC framework for symbolic computation within the C++ programming language. Journal of Symbolic Computation, 33(1):1–12, 2002.

[10] C. Lattner and V. Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In Code Generation and Optimization, 2004. CGO 2004. International Symposium on, pages 75–86. IEEE, 2004.

[11] L. Petzold. Automatic selection of methods for solving stiff and nonstiff systems of ordinary differential equations. SIAM Journal on Scientific and Statistical Computing, 4(1):136–148, 1983.

[12] R.R. Vallabhajosyula, V. Chickarmane, and H.M. Sauro. Conservation analysis of large biochemical networks. Bioinformatics, 22(3):346–353, 2006.

[13] N.G. van Kampen. The expansion of the master equation. Advances in Chemical Physics, pages 245–309, 1976.

[14] R. Grima, P. Thomas, and A.V. Straube. How accurate are the nonlinear chemical Fokker-Planck and chemical Langevin equations? The Journal of Chemical Physics, 135:084103, 2011.

[15] C.V. Rao and A.P. Arkin. Stochastic chemical kinetics and the quasi-steady-state assumption: Application to the Gillespie algorithm. The Journal of Chemical Physics, 118:4999, 2003.

[16] R. Grima. A study of the accuracy of moment-closure approximations for stochastic chemical kinetics. The Journal of Chemical Physics, 136:154105, 2012.

[17] P. Thomas, A.V. Straube, and R. Grima. Stochastic theory of large-scale enzyme-reaction networks: Finite copy number corrections to rate equation models. The Journal of Chemical Physics, 133(19):195101, 2010.