Tensor-train approximation of the chemical and its application for parameter inference

Tensor-train approximation of the chemical master equation and its application for parameter inference Ion Gabriel Ion,1, 2 Christian Wildner,2 Dimitrios Loukrezis,1, 2 Heinz Koeppl,1, 3, 2 and Herbert De Gersem1, 2 1)Centre for Computational Engineering, Technische Universität Darmstadt 2)Department of Electrical Engineering, Technische Universität Darmstadta) 3)Centre for Synthetic Biology, Technische Universität Darmstadt In this work, we perform Bayesian inference tasks for the chemical master equation in the tensor-train format. The tensor-train approximation has been proven to be very efficient in representing high dimensional data arising from the explicit representation of the chemical master equation solution. An additional advantage of representing the probability mass function in the tensor train format is that parametric dependency can be easily incorporated by introducing a tensor product basis expansion in the parameter space. Time is treated as an additional dimension of the tensor and a linear system is derived to solve the chemical master equation in time. We exemplify the tensor-train method by performing inference tasks such as smoothing and parameter inference using the tensor-train framework. A very high compression ratio is observed for storing the probability mass function of the solution. Since all linear algebra operations are performed in the tensor-train format, a significant reduction of the computational time is observed as well.

I. INTRODUCTION Low-rank tensor decompositions have also been considered in the context of parameter-dependent CMEs, however, with 10 Traditional chemical kinetic models use ordinary differen- limited applications so far . For example, considering sys- tial equations (ODEs) to predict the concentrations of the in- tems and synthetic biology, stochastic chemical kinetics are volved molecule types. The evolution of the corresponding used to construct quantitatively predictive models of biomolec- is given by the chemical master equa- ular circuits. This requires the solution of an inverse problem, tion (CME)1 which, in principle, can be solved by numerical where it is often necessary to estimate the rate parameters integration. In practice, the state space even of simple models of a structurally known candidate system from time course is too large for a naive integration of the CME. Therefore, a data. For population snapshot data, e.g. obtained from flow number of approximation techniques have been developed over cytometry measurements, calibration via moment-based in- 16–18 the years, e.g. stochastic simulation methods2–4 and suitable ference is a well-established method . In the last decade, time and space discretizations5,6. Many CME approximations advances in fluorescence microscopy have led to an increasing are based on the observation that the probability mass is often availability of single-cell time course data. When the number concentrated on a small fraction of the state space. For ex- of observed cells is small, standard moment-based methods ample, the finite state projection method solves the CME on a break down because they rely on the central limit theorem to 19–21 rectangular subspace with appropriate boundary conditions7,8. compute a likelihood function for sample moments . In A problem here is that the region where the significant part such a scenario, inference based on the path likelihood of in- of the probability mass function (PMF) is located may change dividual trajectories provides a principled way to extract more over time. This is tackled in the sliding window method, where information from the data. Unfortunately, these approaches the region of interest is adapted based on the solution at the are computationally challenging since they require integrating previous time step9. Unfortunately, the computational cost of the CME multiple times for different parameter configurations. these methods grows exponentially with the number of species, More effective approaches can be obtained by approximating 22 due to the fact that the system states must be labeled explicitly the likelihood, e.g. by Gaussian moment closure or the linear 23,24 to cast the CME into a ODE. noise approximation . While reducing the computational A different line of research has explored approximations of demand significantly, the underlying approximations are not the CME based on low-rank tensor formats10–14. The idea is applicable to systems with low copy number of species, such to project the probability distribution onto a subspace of the as single genes. Alternatively, approximate likelihoods can be arXiv:2106.15188v2 [stat.CO] 15 Jul 2021 25 tensor-product space induced by the reaction system. The solu- computed via particle filtering . For Bayesian parameter in- tion is then propagated by a small time step and projected back ference, the approximate likelihood expression are typically onto the chosen space. Alternatively, considering time as an used within a Monte Carlo scheme. Alter- additional dimension, a joint approximation of the space-time natively, the parameters can be included into an augmented system in a low-rank tensor format can be obtained10,11. The state space allowing for direct estimation via sequential Monte 26 low-rank tensor representation not only preserves the structure Carlo . of the CME, but is also much more memory-efficient com- pared to the matrix representation approach15. In addition, the In this work, we suggest a framework for performing Bayesian inference tasks for the parameter-dependent CME, low-rank tensor representation allows for accurate, dynamic 27 approximations via rank rounding. by exploiting the so-called tensor-train (TT) decomposition to approximate the joint distribution over the CME states and parameters. For that purpose, we construct an explicit repre- sentation of the evolution operator in the TT format and show a)Electronic mail: [email protected] that it can be constructed without ever assembling the corre- Tensor-train approximation of the chemical master equation and its application for parameter inference 2 sponding matrix. The TT format has the advantage that the is described by the so-called chemical master equation storage requirement scales linearly with respect to the number (CME)1, such that of dimensions, while at the same time being a numerically 27,28 푀 robust tensor decomposition . We also develop a time- d푝푡 x ∑︁ n 푚 푚 o ( ) = 훼푚 x ν ( ) 푝푡 x ν ( ) 훼푚 x 푝푡 x , domain solver based on the time-dependent alternating mini- d푡 ( − ) ( − ) − ( ) ( ) mal energy (tAMEn) algorithm29, which additionally incorpo- 푚=1 (3a) rates parameter dependence. To that end, we combine the the 0 state space and the parameter space into a higher-dimensional 푝0 x = 푃( ) x , (3b) ( ) ( ) tensor-product space. The parameter dependence is expressed 0 by of a B-spline basis expansion and a Galerkin formu- where 푃( ) is the initial probability and 훼푚 is called the propen- lation is employed to derive the multilinear system with respect sity function. For a well-mixed system at thermal equilibrium, to the full tensor. Since typically every reaction is governed the mass-action propensity reads by an individual rate constant, the parameters can be seam- 푑 lessly included in the tensor representation, thus allowing for Ö 푥푘 ! 훼푚 x = 푐푚 , (4) efficiently solving the joint system. In practice, however, the ( ) 푞 ! 푥 푞 ! 푘=1 푚,푘 푘 푚,푘 system parameters are often unknown. Therefore, we develop ( − ) a framework for filtering, smoothing, and parameter inference where 푐푚 is the so-called specific probability rate, which is a based on the efficient TT representation of the joint system. measure for the probability that reaction 푚 occurs. The proposed framework allows us to perform Bayesian infer- Until now, the CME has been defined for an infinite state 푑 ence for the model parameters with a single forward-backward space N0 , which is computationally intractable, thus inap- pass, as demonstrated on several synthetic examples. propriate for a numerical solution. To that end, a trun- The remaining of this paper is organized as follows. In cation of the state space is necessary, such that 푥푘 < 푛푘 , section II, we recall the CME and explain how it can be ex- 푘 = 1, . . ., 푑. We denote the truncated state space as =  푑 X pressed in tensor format. Next, in section III, we present x N 푥푘 < 푛푘 , 푘 = 1, . . ., 푑 . The choice of a box domain ∈ 0 | the TT decomposition, apply it to the tensor-formatted CME, truncation is not strictly necessary, however, it is beneficial and present a TT-based solution method. In section IV, we for the TT format used in the following. All states in can X푑 consider the setting of a CME with parameter dependencies, be uniquely indexed as x i , where i = 푖1, . . .,푖푑 N and ( ) ( ) ∈ which we include in the TT-based CME format. Subsequently, 푥푘 푖푘 = 푖푘 1. Accordingly, 푖푘 = 1, . . .,푛푘 . ( ) − we exploit the TT-format of the parameter-dependent CME for Using the truncated state space defined above and for a inference tasks, such as filtering, smoothing, and parameter given time instance 푡, the PMF 푝 푡,x can be represented as ( 푛1 ) 푛푑 identification. Numerical results are presented in section V, a multidimensional array p 푡 R ×···× , the elements of ( ) ∈ where we validate the TT-based CME solver and showcase the which are given as benefits of performing inference in the TT-format. The paper p 푡 = 푝 푡,x i . (5) closes with our conclusions, presented in section VI. i ( ) ( ( )) In the following, we shall refer to such multidimensional arrays as tensors28. The evolution equation (3a) can then be written II. CHEMICAL MASTER EQUATION IN TENSOR FORMAT in tensor format, such that dp 푡 We consider a well-mixed reaction system with 푑 chemical ( ) = Ap 푡 , (6) species denoted with 푆 , ..., 푆푑 involved in 푀 reactions. We d푡 ( ) { 1 } consider reactions of type: 푛1 푛푑 푛1 푛푑 where A R( ×···× )×( ×···× ) is a tensor-operator, also called a tensor-matrix,∈ that acts on the tensor p 푡 . Tensor- 푞푚, 푆 푞푚,푑 푆푑 푠푚, 푆 푠푚,푑 푆푑, (1) 1 1 + · · · + −→ 1 1 + · · · + operators can be seen as generalizations of the commonly( ) em- ployed matrix-based operators to more than two dimensions. with 푚 = 1, . . ., 푀 and 푞푚,푘 , 푠푚,푘 N0, 푘 = 1, . . ., 푑. The state of the system at time 푡 is described∈ by the vector x N푑 The elements of the CME tensor-operator are given as ∈ 0 which contains the number of elements per species at that 푀 time instant. To describe the change in the state vector after ∑︁ 푚 x j x j Ai,j = 훼푚 x i ν ( ) 훿 ( ) 푚 훼푚 x i 훿x(i) , (7) ( ( ) − ) x i ν ( ) − ( ( )) reaction 푚 occurs, we introduce the stoichiometric change 푚=1 ( )− ( ) 푚 푑 푚 vector ν ( ) Z , the 푘th element of which is given as 휈푘( ) = ∈ where 훿j = 훿 푗1 훿 푗푑 , with 훿 푗푘 denoting the Kronecker delta. 푠푚,푘 푞푚,푘 . The change in the state vector due to the 푚th i 푖1 푖푑 푖푘 − 푚 ··· reaction is then given as x x ν ( ) . Accordingly, the product between a tensor-operator and a ten- Assuming a stochastic model→ + of the system, the state vec- sor, the result of which is elementwise given as tor x is a realization of a continuous-time jump process ∑︁ X 푡 . Then, the evolution of the time-dependent PMF Ap 푡 i = Ai,j pj 푡 , (8) 푡 0 ( ( )) ( ) { ( )} ≥ j 푝 푡,x = 푝푡 x = Pr X 푡 = x ( ) ( ) ( ( ) ) can be seen as a generalization of the standard matrix-vector = Pr 푋 푡 = 푥 , . . ., 푋푑 푡 = 푥푑 (2) ( 1 ( ) 1 ( ) ) product. Tensor-train approximation of the chemical master equation and its application for parameter inference 3

The complexity for storing for storing the tensor p 푡 , which a low-rank TT approximation can be efficiently computed with contains all state probabilities at time instance 푡,( is) 푛푑 , 푑 sequential singular value decompositions (SVDs) of auxil- O( ) 푘 푛푘 푛1푛2 푛푘 1푛푘 1 푛푑 27 where 푛 = max푘 푛푘 . Therefore, even if a truncated state iary matrices A( ) R ×( ··· − + ··· ) . For tensors space is employed,{ the} storage needs can become intractable defined implicitly by∈ multidimensional functions, e.g. similar even for a relatively small number of species. The exponential to (5), efficient interpolation-based TT approximation meth- dependence of storage needs to the number of species is one ods are available31,32, in which case the assembly of the full manifestation of the so-called curse of dimensionality30. As tensor is not necessary. 28 푛1 푛푑 푚1 푚푑 a remedy to this problem, low-rank tensor formats can be A tensor-operator A R( ×···× )×( ×···× ) can be sim- employed, such as the TT decomposition27 discussed next. ilarly decomposed using∈ the TT format, in which case it is In the following, a commonly employed operation between elementwise written as: tensors, tensor-operators, or matrices, is the Kronecker prod- 푅1 푅2 푅푑 1 푛1 푛푝 uct. The Kronecker product between two tensors x R ×···× ∑︁ ∑︁ ∑︁− 1 2 푑 Ai,j = g( ) g( ) g( ) , (11) 푚1 푚푞 ∈ 1푖1 푗1푟1 푟1푖2 푗2푟2 푟푑 1푖푑 푗푑1 and y R ×···× is defined elementwise as ··· ··· − ∈ 푟1=1 푟2=1 푟푑 1=1 − x y ij = xiyj . (9) 푘 푅푘 1 푛푘 푚푘 푅푘 ( ⊗ ) where the TT-cores g( ) R − × × × are now four- The definition holds also for matrices and vectors, as they dimensional. The product∈ between a tensor-operator and a can be interpreted as two-dimensional and one-dimensional tensor, e.g. the Kronecker product defined in (8), can be effi- tensors, respectively. ciently computed directly in the TT-format27.

III. SOLVING THE CHEMICAL MASTER EQUATION IN THE B. Linear system solutions in the TT format TENSOR-TRAIN FORMAT Of particular interest in the context of this work is to A. Tensor-train decomposition solve efficiently a multilinear system Ax = b, where A 푛1 푛푑 푛1 푛푑 푛1 푛푑 ∈ R( ×···× )×( ×···× ) and x, b R ×···× are given in the As discussed in the previous section, the size of a tensor TT format. Iterative Krylov subspace∈ solvers, e.g. based on scales exponentially with the number of dimensions, equiva- the conjugate gradient (CG) or generalized minimal residual lently, number of species in this work. To mitigate the curse of (GMRES) methods, can be generalized to multilinear systems dimensionality, we employ tensor decompositions, resulting in given in the TT format33. However, without preconditioning, tensor formats whose sizes scale linearly with the number of the number of iterations is large and leads to a large number of dimensions, instead of exponentially. Several tensor decompo- rounding operations in order to keep the TT-rank small, thus sitions have been developed over the last decades, resulting in resulting in an undesirable computational cost33,34. better-scaling tensor formats28. In this work, we focus on the An alternative approach is to minimize the norm of the so-called tensor-train (TT) decomposition, which combines system’s residual with respect to the cores of the solution’s linear complexity scaling with respect to the dimensions and TT-decomposition. The corresponding minimization problem computational stability27. reads 푛1 푛푑 A tensor x R ×···× is said to be in the TT format if it ∈ 2 can be elementwise written as min Ax b F , (12) x R푛1 푛푑 k − k ∈ ×···× 푅1 푅2 푅푑 1 ∑︁ ∑︁ ∑︁− 1 2 푑 xi = g( ) g( ) g( ) , (10) where F denotes the Frobenius norm. The feasible set 1푖1푟1 푟1푖2푟2 푟푑 1푖푑1 푛 k·k푛 푛 푛 ··· ··· − R 1 푑 R 1 푑 푟1=1 푟2=1 푟푑 1=1 ×···× is restricted to a subset of ×···× which contains − all 푑-dimensional tensors that can be represented in the TT 푘 푅푘 1 푛푘 푅푘 where the three-dimensional tensors g( ) R − × × are format with TT-ranks 푅푘 푅 , 푘 = 1, . . ., 푑 1. In this case, ∈ ≤ max − called the TT-cores and R = 1, 푅1, ..., 푅푑 1,1 are called the the minimization problem is nonlinear with respect to the core ( − ) 푘 TT-ranks. The storage complexity of a tensor in the TT for- tensors g( ) of the TT representation of x given in (10). mat is reduced to 푁푅2푑 , i.e. it is linear with respect to The nonlinear minimization problem (12) can be solved O( ) the number of dimensions 푑. Moreover, all basic multilinear using the alternating least squares (ALS) method35,36. The algebraic operations scale also linearly with the dimensions method fixes all cores but one, thus resulting in a quadratic and polynomially with the TT-ranks27. It should be noted optimization problem for the minimizing core. The process that the TT-ranks grow after a multilinear algebraic operation is then repeated iteratively, minimizing one core at a time is performed in the TT-format. Therefore, a rank-reduction until the objective function decreases to a sufficiently small procedure is performed after the operation, called rounding27. value34. The main drawback of the ALS method is that the Rounding decreases the TT-rank of the tensor while maintain- TT ranks must be given a priori. This can be avoided by ing a prescribed accuracy 휖 and its complexity is 푅3푁푑 . minimizing over two consecutive cores instead of just one, an O( ) An exact TT decomposition of a full tensor typically leads to idea that was first introduced by the Density Matrix Renor- high ranks R, hence, to high storage needs and computational malization Group (DMRG) algorithm37 and which was later costs as well. However, in many cases, using a low-rank TT ap- used in the Alternating Minimal Energy (AMEn) method for proximation eA A is sufficient . If the full tensor is available, solving high-dimensional multilinear systems29,38. In order ≈ Tensor-train approximation of the chemical master equation and its application for parameter inference 4 to bring the minimization problem to a quadratic form, a so- alternative method10,29 is to employ a basis representation of 푘,푘 1 푅 푛 푛 푅 called supercore g( + ) R 푘 1 푘 푘 1 푘 1 is first defined the time-dependent solution over an interval 0,Δ푇 , such that e − × × + × + [ ] as ∈ 푇 푘,푘 1 푘 푘 1 ∑︁ g( + ) = g g , (13) 푝 푡,x i = pi 푗 푏 푗 푡 , (18) e ( ) ( + ) ( ( )) ( ) 푗=1 thus removing all information regarding the rank 푅푘 . Then, the TT-representation of x takes the following form (in ele- where 푏 푗 푡 are basis functions for the interpolation in time. mentwise notation): In this work,( ) we employ a Chebyshev basis, however, other options are also possible, e.g. hat functions or Lagrange poly- ∑︁ 1 푘,푘 1 푑 xi = g( ) g( + ) g( ) . (14) 1푖1푟1 e푟푘 1푖푘 푖푘 1푟푘 1 푟푑 1푖푑1 nomials. ··· − + + ··· − 푟푖 ,푖≠푘 By including time as an additional dimension next to the states, a 푑 1 -dimensional tensor is obtained. We then per- The minimization then proceeds similar to the ALS method, ( + ) where now one supercore is minimized in each iteration. Af- form a Galerkin projection to recover the degrees of freedom ter the optimization procedure is completed, the supercore is for the entire subinterval, by solving the system divided into two separate TT-cores, e.g. by means of SVDs of 31 Mp = f, (19a) auxiliary matrices . Then, the rank 푅푘 is not a priori given, M = In S V In P A I푇 , (19b) but identified as part of the supercore’s separation. ⊗ ( + ) − ( ⊗ )( ⊗ ) 0 f = p( ) v, (19c) ⊗ C. Low-rank TT representation of the CME operator 29,41 where S is the stiffness matrix, P is the mass matrix , I푁 푁 푁 ∈ R × denotes an identity matrix, and In = I푛1 I푛푑 . The The CME operator in (7) can be represented in the TT system (19a) can then be solved as described in⊗ ·section · · ⊗ III B. format using a sum of rank-1 tensors, without ever assembling Due to the increased complexity of the solution over long the full tensor-operator10,39. For a reaction of type (1) with the simulation times, one can divide the time domain and apply propensity function (4), one can use the separation the presented method to the individual subdomains by taking the end state as initial condition for the next subinterval. For 푚 푚 훼푚 x = 푐푚 푓1( ) 푥1 푓푑( ) 푥푑 , (15a) a constant subinterval length and if the solution is smooth, ( ) ( )··· ( ) 29,41 푚 푥푘 ! the convergence is exponential in 푇 . An error indicator is 푓푘( ) 푥푘 =  . (15b) represented by computing the norm of the residual of (19a) for ( ) 푞푚,푘 ! 푥푘 푞푚,푘 ! − an enriched basis29 Then, the CME tensor-operator defined in (7) can be writ- ten as the difference between two tensor-matrices B, C 휀 푇,Δ푡 = M0Qp f0 퐹 , (20) 푛1 푛푑 푛1 푛푑 ∈ ( ) || − || R( ×···× )×( ×···× ) , i.e. where M0 and f0 are the tensors from (19b) and (19c) con- Ai,j = Bi,j Ci,j , (16) structed for 푇 0 = 2푇. The operator Q interpolates the solution − on the finer time-domain basis with 푇 = 2푇. This can be used both of which admit exact rank-1 TT decompositions, with 0 푘 to adapt the subinterval length if the indicator 휀 푇,Δ푡 is larger the corresponding four-dimensional TT-cores g( ) (for B) and 29 ( ) 푘 than a prescribe tolerance tol h( ) (for C) given by 1   푇 푘 푚 푖푘 푗푘 푚 tol g1(푖 )푗 1 = 푓푘( ) 푥푘( ) 훿푖 I푛푘 푥푘 푗푘 휈푘( ) , (17a) Δ푡0 = Δ푡, (21) 푘 푘 ( ) 푘 ( ( ) + ) 휀 푇,Δ푡 푘 푚 푗푘 푗푘 푚 h( ) = 푓 ( ) 푥 ( ) 훿 I푛 푥푘 푗푘 휈( ) , (17b) ( ) 1푖푘 푗푘 1 푘 ( 푘 ) 푖푘 휈푘 푘 ( ( ) + 푘 ) − where Δ푡0 is the modified subinterval length. One bottleneck where the indicator functions I푛 are defined as I푛 푥 = 1 is the prescribed accuracy of the TT-solver, which, as shown 푘 푘 ( ) if 푥 푛푘 and 0 otherwise. Then, for the 푚th reaction, the in the numerical results, acts like a lower limit for the error. maximum≤ TT-rank of the tensor-operator A is at most equal In the same framework, one can also include classical im- to 2. If the operators coming from 푀 different reactions are plicit time stepping schemes, such as the Crank-Nicolson and added, the maximum TT-rank is at most 2푀. However, in the implicit Euler, by appropriately choosing the stiffness and practical examples, rank rounding with a very high accuracy, mass matrices29. The motivation behind this is that the time 12 e.g. eA A F 10− A F, yields a TT approximation eA A dynamics is captured in the low rank structure of the tensor, withk much− lowerk ≈ TT-ranks.|| || ≈ reducing the storage complexity. Moreover, as observed from numerical experiments, the runtime time is also reduced.

D. Solving the CME in the TT format E. Quantized TT CME solver The CME can be solved numerically in the TT-format using finite differences40. This limits the choice of the time dis- One way to optimize the aforementioned TT-based CME cretization to classical implicit and explicit schemes29. An solver is to employ the so-called quantized tensor train (QTT) Tensor-train approximation of the chemical master equation and its application for parameter inference 5 decomposition42. The QTT format has proven to further in- to the full tensor. We choose the test functions q θ from the crease the storage and computational efficiency of the TT rep- same space, which yield the formulation ( ) resentation by reshaping the tensors into higher dimensional ones while reducing the mode sizes. Let x R푛1 푛2 푛푑 be dp θ ∈ × ×···× ( ) , q = Ap, q , (24) a tensor with log2 푛푘 N, 푘 = 1, ..., 푑, i.e. with mode sizes that h d푡 i h i are powers of 2. If the∈ tensor x is represented in the TT format, where , is an inner product with respect to θ. One can then then reshaping it into a Í log 푛 -dimensional tensor can 푘 2 푘 deriveh· the·i multilinear system be easily achieved by performing( the) TT-decomposition on the individually reshaped cores. dp θ 푛1 푛푑 Let x R ×···× be a tensor with log2 푛푘 N. The tensor M ( ) = Kp, (25) ∈ ∈ 푘 d푡 admits a rank R TT-decomposition with the cores g( ) . The quantization process implies reshaping the individual cores with the mass tensor-matrix to tensors of shape 푟푘 1 2 2 푟푘 and then the TT- ∫ decomposition of the reshaped− × × · ·cores · × × is computed. The re- i Mmn,il = 훿m 퐿n θ 퐿l θ dθ, (26) sulting cores correspond to the QTT decomposition of the ( ) ( ) tensor32. In the case of the tensor-matrices, the procedure is P similar. Compared to the computational complexity of the and the stiffness tensor-matrix solver, the complexity of the transformation between TT and ∫ QTT can be neglected. This procedure has been proven to be Kmn,il = Am,i θ 퐿n θ 퐿l θ dθ. (27) effective for reducing the storage requirements and the com- ( ) ( ) ( ) 10,11 putation time for solving the CME in the TT format . If the P ranks of the QTT decomposition remain bounded, the storage The mass matrix can be easily constructed as a rank-1 TT- complexity 푑 log 푁 43. Moreover, the solver benefits from O( 2 ) operator using the individual mass matrices of the univariate the linearity with respect to the number of dimensions. bases. On the contrary, the stiffness matrix requires the evalu- ation of the parameter dependent CME operator over a tensor  푟1 ℓ1  푟2 ℓ2 product quadrature grid Θ = 휃 ( ) 휃 ( ) IV. BAYESIAN INFERENCE FOR THE CHEMICAL MASTER 1 푟1=1 × 2 푟2=1 × · · · × EQUATION WITH PARAMETER DEPENDENCIES  푟푛푝 ℓ푛푝 휃푛( ) , such that 푝 푟푛푝 =1

A. Parameter-dependent CME ∑︁ r r Kmn,il wrA¯ mr,ir 퐿n θ( ) 퐿l θ( ) , (28) ≈ r ( ) ( ) We now consider a parameter-dependent CME10, described ¯ r  k by where w is the weight tensor and Aik,jr = A θ( ) 훿r . The tensor-matrix K admits a TT representation or approximation, dp θ ( ) = A θ p θ , (22) assuming that the evaluation of the CME operator can also be d푡 ( ) ( ) represented or approximated in the TT format. ¯ where θ R푛푝 denotes the parameter vector. The parameter A direct construction of the extended operator A can be vector θ is∈ assumed to take values in the tensor-product space easily accomplished if each parameter affects only one reac- 휃min, 휃max 휃min, 휃max . tion. For example, this situation occurs if the parameters are = 1 1 푛푝 푛푝 P [ ] × · · · × [ ] l the reaction rates, i.e. θ = 푐1, 푐2,... . Then, the operator is Solving the CME for one parameter realization θ( ) Θ, ( ) l ∈ extended using the Kronecker product, such that l = 푙 , . . .,푙푛 , yields the conditional PMF 푝푡 x θ . If ( 1 푝 ) | ( ) instead of a fixed initial PMF, one starts with the joint PMF 1   1 ℓ1   A¯ = A( ) diag 휃 ( ) , ..., 휃 ( ) Iℓ Iℓ 푝푡 x,θ , the solution will be the joint PMF over the dis- 1 1 2 3 0 ( ) ⊗ ⊗ ⊗ ⊗ · · · + cretized time interval. Since our goal is to compute the con- 2   1 ℓ2   A( ) Iℓ diag 휃 ( ) , ..., 휃 ( ) Iℓ ..., (29) ditional/joint PMF over the entire parameter space, we use a ⊗ 1 ⊗ 2 2 ⊗ 3 ⊗ · · · + basis expansion44 to describe the parameter dependence, such 푚 that where A( ) is the CME operator corresponding to reaction 푚 for a unity reaction rate. We note that parameter dependence is not necessarily re- ∑︁∑︁ 푝푡 x i ,θ p 푡 퐿l θ , (23) stricted to the reaction rates. Equation (29) can be extended to ( ( ) ) ≈ il 푗 ( ) ( ) 푗 l accommodate other types of parameter dependencies. If the propensity functions have a representation as in (15b) and ev- 푛1 푛푑 ℓ1 ℓ푛푝 ` where p R ×···× × ×···× , 퐿l l=1 is a tensor-product ery 훼푚 depends on at most one parameter, then the individual ∈ 1 푛푝 { } basis 퐿l θ = 퐿 ( ) 휃1 퐿 ( ) 휃푛푝 . CME operator for every reaction evaluated on the grid Θ can In this( work,) B-spline( )··· basis( functions) are chosen for the be expressed as a sum of rank-1 TT tensors and then rounded parameter dependence45. to eliminate overshooting ranks. Moreover, even the structure In order to retrieve the degrees of freedom, a Galerkin for- of the reaction network can be incorporated as a parameter, mulation is used to derive a multilinear system with respect however, this is out of scope for the present work. Tensor-train approximation of the chemical master equation and its application for parameter inference 6

B. Filtering and smoothing in the TT format with the time-varying smoothing propensity functions

푚 훽 x ν ( ) ,푡 One relevant inference task in computational biology ap- 훼˜ 푚 x,푡 = 훼푚 x ( + ) . (35) plications is the filtering and smoothing of observations, for ( ) ( ) 훽 x,푡 ( ) example, in order to estimate the dynamics of genes that are The method is also known in the literature as the forward- measured indirectly via a fluorescent reporter protein. We 47 푁 backward algorithm and can be interpreted as a message- consider 푁 state observations y 푗 o , which are sampled o ( ) 푗=1 passing algorithm in a Hidden Markov Model (HMM). More-  푁o at discrete time steps 푡 푗 . The observations are considered over, it can be efficiently performed in the TT-format, as shown 푗=1 푗 0 푗 1  푗 푁o in Algorithm 1. The PMF 푝 x( ) y ( ) ,...,y ( − ) is the for- to be realizations of the random variables Y ( ) , which ( | 푗 ) 푛1 푛푑 푗=1 ward message and is denoted by the tensor a( ) R ×···× . are assumed to be conditionally independent given the latent The prediction step is performed by solving the CME∈ over the  푁o 푗 1 states X 푡 푗 푗 1. Thus, the observation model is assumed interval 푡 푗 1,푡 푗 with the initial condition a( − ) . The ob- ( ) = − [ ] obs 푛1 푛푑 to be dependent only on the current state. Its probability den- servation is used to construct a tensor p R ×···× with obs 푗  ∈ sity function (PDF) is denoted with 푝Y X y x . In practice, p = 푝Y X y ( ) x i . If every species is observed indepen- | ( | ) i | | ( ) 푝Y X often corresponds to additive Gaussian or multiplica- dently, i.e. the observation model can be factorized, the tensor | tive lognormal noise. However, the presented framework is pobs is rank-1 and can be expressed as a Kronecker product. 푗 1 푁o 푗  not limited to these particular cases. For the backward pass, the message is 푝 y ( + ) ,...,y ( ) x( ) 1 푗 푗 | The conditional probability Pr X 푡 = x y ( ) ,...,y ( ) for and is represented with the tensor b . The CME is now solved ( ( ) | ) ( ) 푗 = max 푘 N 푡푘 < 푡 satisfies the unconditional master using the transposed operator A> and with the initial condition 46{ ∈ | } 푗 1 obs equation b( + ) p , where denotes the elementwise multiplication operation.∗ The last step∗ is to multiply and normalize the for- 푀 d푝푡 x ∑︁ n 푚 푚 o ward and the backward messages in order to get the conditional ( ) = 훼푚 x ν ( ) 푝푡 x ν ( ) 훼푚 x 푝푡 x , 푗 0 푁o  d푡 ( − ) ( − ) − ( ) ( ) 푝 x( ) y ( ) ,...,y ( ) . In the presented framework, we a get 푚=1 smoother| distribution only at the observation points. In order (30a) to have information in between the observations, prediction steps must be added. with the reset conditions

1  푗  Algorithm 1 Forward-backward algorithm in the TT format. 푝 x = 푝 x 푝 y ( ) x , (30b) 푡푗 푡−푗 Y X 푁 ( ) 푍 푗 ( ) | | Input: Sample y 푗 o , initial PMF p 0 { ( ) } 푗=0 ( ) 0 0 a( ) p( ) where 푝푡 x represents the left approaching limit 푡 푡 푗 ,푡 < ← −푗 ( ) → for 푗 = 1, ..., 푁o do Í 푗  푗 1 푡 푗 , and 푍푖 = x 푝푡− x 푝Y X y ( ) x . If all observations are Solve the CME with a( − ) as initial condition. 푗 ( ) | | obs 푗 taken into consideration, we deal with the smoothing case. The Compute p for y( ) in the TT format. 푗 obs 푗 1 1 푁o a p a PMF 푝˜푡 x of Pr X 푡 = x y ( ) ,...,y ( ) can be factorized ( ) ← i ∗ ( − ) as ( ) ( ( ) | ) end for b 푁o 1 ( ) ← for 푗 = 푁o 1,...,0 do 푝˜푡 x = 푝푡 x 훽푡 x , (31) − obs 푗 1 ( ) ( ) ( ) Compute p for y( + ) in the TT format. Solve the CME with operator A> and initial condition where the PMF satisfies the backward master equation 1 푗 1 obs 푍− b( + ) p . 푗 obs∗ 푗 1 b( ) p b( + ) 푀 end for← ∗ d훽푡 x ∑︁ n 푚 o ( ) = 훼푚 x 훽푡 x 훼푚 x 훽푡 x ν ( ) , (32a) for 푗 0, ..., 푁 do d푡 ( ) ( ) − ( ) ( + ) = o 푚=1 p 푗 푍 1a 푗 b 푗 ( ) ← − ( ) ∗ ( ) 1 푗 end for 훽 x = 푝 x 푝 y ( ) x , (32b) 푗 푡−푗 푡푗 Y X Output: p for 푗 = 0, ..., 푁o ( ) 푍 푗 ( ) | ( | ) ( ) where 훽 x,푡푁 = 1 is the terminal condition and ( o ) C. Bayesian parameter inference in the TT format 푗 푁o 훽푡 x 푝 y ( ) ,...,y ( ) X 푡 = x , (33) ( ) ∝ ( | ( ) )  푗 푁o Consider an observation sample y ( ) satisfying the where 푗 = min 푘 N 푡푘 > 푡 . The PMF 푝˜푡 x satisfies the 푗=1 evolution equation{ ∈ | } ( ) assumptions detailed in section IV B, but now connected to a realization of the random process X 푡,θˆ , where θˆ is the ( ) 푀 parameter vector governing the system. Given a prior dis- d푝 ˜푡 x ∑︁ n 푚 푚 o ( ) = 훼˜ 푚 x ν ( ) ,푡 푝˜푡 x ν ( ) 훼˜ x,푡 푝˜푡 x , tribution 푝 θ , we are interested in computing the Bayesian d푡 ( − ) ( − ) − ( ) ( ) ( ) 0 푁o 푚 1 parameter posterior 푝 θ y ,...,y . By viewing the pa- = ( | ( ) ( ) ) (34) rameters as part of an augmented process X 푡 ,θ 푡 푡 0, the { ( ) ( )} ≥ Tensor-train approximation of the chemical master equation and its application for parameter inference 7 distribution of the parameters over the parameter space can TABLE I: Reactions of the simple gene expression model. be obtained by performing filtering over the joint spaceP of states and parameters. The prediction step is given by Reaction 훼푚 x Rates 푐푖 Description mRNA 푐 푥( ) 0.002 mRNA degradation → ∅ 1 1 mRNA mRNA Protein 푐2푥1 0.015 Translation  푗 푗 0 푗 1  → + 푝 x( ) ,θ( ) y ( ) ,...,y ( − ) = mRNA 푐3 0.1 Transcription | Protein∅ → 푐 푥 0.01 Protein degradation ∫ → ∅ 4 2 ∑︁ n  푘 푗 푗 1 푗 1  푝 푗 푗 1 x( ) ,θ( ) x( − ) ,θ( − ) | − 푗 1 | 푥 ( − )  푗 1 푗 1 0 푗 1  o 푗 1 Algorithm 2 Parameter identification for the parameter- 푝 x( − ) ,θ( − ) y ( ) ,...,y ( − ) dθ( − ) , (36) | dependent CME in the TT-format. 푁 n 푗 o o 0 Input: Sample y( ) , initial PMF p( ) , prior over the param- 푗=0 where 푝 푗 푗 1 is the transition PDF and implies solving the prior | − eter space p parameter-dependent CME from 푡 푗 1 to 푡 푗 . Next, the update p 0 p 0 pprior − ( ) ← ( ) ∗ step reads for 푗 = 1, ..., 푁o do 푗 1 Solve the CME with p( − ) as initial condition to obtain the 푗 푗 1 solution p( → + ) .     obs 푗 푗 푗 0 푗 1 푗 Compute p for y( ) in TT. 푝 x( ) ,θ( ) y ( ) ,...,y ( ) = 푝 y ( ) x Y X 푗 1 obs 푗 푗 1 | 푍 | | p( + ) p p( → + )   il ← i il 푗 푗 0 푗 1 푗 1 1 푗 1 Í 푗 1 푝 x( ) ,θ( ) y ( ) ,...,y ( − ) . (37) p( + ) 푍− p( + ) for 푍 = pil( + ) wl | ← il end for post 1 Í 푁o pl 푍− pil( ) ← i In the TT-format, this parameter inference proce- Output: ppost dure can be implemented as follows. The posterior 푗 푗 0 푗  푝 x( ) ,θ( ) y ( ) ,...,y ( ) is represented by the tensor p 푛1 푛푑 ℓ1| ℓ푛 ∈ R ×···× × ×···× 푝 , such that V. NUMERICAL RESULTS

 푗 푗 0 푗  ∑︁ 푗 푝 x( ) ,θ( ) y ( ) ,...,y ( ) = p( ) 퐿l θ . (38) The following numerical experiments aim to showcase the | xl ( ) l advantages of the proposed framework in terms of accuracy and computational efficiency. With respect to the latter, storage The prediction step involves solving the parameter-dependent requirements and computation times are reported for every CME with 푝 x 푗 ,θ 푗 y 0 ,...,y 푗  as initial condition, re- individual test case. All tests were run on a workstation with a ( ) ( ) | ( ) ( ) 10-core Intel Xeon processor with 64GB of RAM. For the TT turning the predicted PMF p pred as a result. The resulting ( ) operations, the ttpy Python package was used in combination tensor is multiplied with the observation tensor at step 푗 1 and with the Intel MKL library. normalization is performed to get the new joint distribution+

푗 1 1 obs 푗 푗 1 A. Validation of the TT-based CME solver pxl( + ) = 푍− px pxl( → + ) , (39)

1. Two-dimensional simple gene expression model Í 푗 1 where 푍 = pil( + ) wl is the normalization constant and comes il from numerically integrating over the parameter space with We first validate the developed TT ODE solver and perform the integration weights tensor w. A prior distribution 푝 θ 0  a convergence study based on the two-dimensional simple gene ( ) 48 and an exact knowledge of state for 푗 = 0 is used for the first expression model . The four reactions are presented in Table 0 0 0  0 0  I. The initial state is x 0 = 2,4 with probability 1. The step, where 푝 x( ) ,θ( ) y ( ) = 푝 x( ) 푝 θ( ) . Once all ( ) > | ) ( CME is solved in the time interval( ) 0,1024 with a subinterval observations have been used, the state is marginalized to obtain [ ] the posterior over the parameter space as size of 128, where arbitrary time units are used. The first validation test concerns the maximum rela- tive error of the solution to the CME, computed with the  푗 0 푗  ∑︁  푗 푗 0 푗  푝 θ( ) y ( ) ...,y ( ) = 푝 x( ) ,θ( ) y ( ) ,...,y ( ) , (40) method presented in III D, in dependence to the time di- | | x 푗 mension 푡 of the basis representation from (18) inside ( ) one subinterval. The maximum relative error is given as ref max 푝 ( ) x 푝푡 x max 푝푡 x , where 푡end = 1024 which is computationally efficient if performed in the TT- 푡end ( ) − end ( ) / end ( ) ref format. The procedure is summarized in Algorithm 2. and the reference solution 푝 ( ) x is computed by numer- 푡end ( ) Tensor-train approximation of the chemical master equation and its application for parameter inference 8

3 10− 2 10−

5 10− 5 10− 7 10− ) = 2 ) = 3 9 ) = 4 10− 8 10− ) = 5 maximum relative error Implicit Euler maximum relative error ) = 6 11 10− Crank-Nicolson ) = 7 11 Chebyshev 10− ) = 8 13 10− 1 2 12 10 8 6 4 2 10 10 10− 10− 10− 10− 10− 10− #C n

FIG. 1: Convergence of the TT-solver with respect to the FIG. 2: Error versus solver accuracy for different sizes of the dimension of the time basis. basis.

TABLE II: Reactions of the SEIR model. ically solving the CME without the TT-decomposition for a very fine time grid. We note that no truncation of the TT- Reaction 훼푚 x Rate 푐푖 Description rank was performed during this validation test, and the relative 푆 퐼 퐸 퐼 푐 푥( 푥) 0.1 Susceptible becomes exposed + → + 1 1 3 residual which signifies the convergence of the TT-solver was 퐸 퐼 푐2푥2 0.5 Exposed becomes infected 13 → set to 10− . 퐼 푆 푐3푥3 1.0 Infected recovers without immunity The results of this first validation set are presented in Fig. 1, 푆 → 푐 푥 0.01 Susceptible dies → ∅ 4 1 where the employed time-interpolation on the Chebyshev poly- 퐸 푐5푥2 0.01 Exposed dies → ∅ nomial basis, as shown in (18), is compared against classi- 퐼 푅 푐6푥3 0.01 Infected recovers with immunity → 푆 푐 0.4 New susceptible individuals arrive cal time-stepping methods such as implicit Euler and Crank- ∅ → 7 Nicolson finite-difference schemes. As would be expected, irrespective of the time-stepping method, the TT-based CME solver yields increasingly more accurate results for finer dis- in terms of computation time and storage needs. cretizations of the time interval. Moreover, as expected from The individuals of the virus spreading model are separated theory, the convergence of the explicit Euler scheme is Δ푡 , into four distinct categories: 1) Susceptible (푆), i.e. indi- O( ) accordingly Δ푡2 for Crank-Nicolson49. In the case of the viduals who may become infected; 2) Exposed (퐸), i.e. in- O( ) Chebyshev polynomials, an exponential convergence is ob- fected individuals who are not yet contagious; 3) Infected served and the error stagnates for a basis of only 푇 = 8 poly- (퐼), i.e. infected individuals who are contagious; 4) Re- nomials. covered (푅), i.e. individuals with immunity to the virus. The combined impact of the time step and the maximum The interactions between the individuals are described by residual of the TT solver is investigated next for the Chebyshev the reactions presented in Table II. The initial condition basis representation, with the results presented in Fig. 2. As is x 0 = 50,4,0,0 and the state space is truncated to ( ) ( )> can be observed, for a fixed 푇, the accuracy of the TT-solver’s n = 푛1,푛2,푛3,푛4 = 128,128,64,64 . The simulation was solution stagnates after a certain value of the maximum resid- performed( over the) interval( 0,8 .) For the TT-solver, the ual. The stagnation point is actually dependent on the value of subinterval length is equal to[ 0.5] and the subinterval basis 푇, i.e. smaller 푇 values allow for smaller maximum residuals. dimension is 푇 = 8. The reference solution is computed by Hence, the step size and maximum residual must be chosen ac- numerically solving the CME without the TT-decomposition cording to the desired accuracy of the TT-solution, also taking for a very fine time grid. into consideration the related computational cost. Even if a sparse format is employed, 1.3 GB RAM are needed to store the CME operator for the≈ reference solution. In comparison, using the TT-format, the CME TT-operator has 2. Four-dimensional SEIR model the TT-ranks R = 1,5,6,3,1 , resulting in storage needs of only 2.32 MB RAM,( i.e. 0.17%) of the storage space needed For the previously considered two-dimensional model, the by the≈ standard solver. If the operator is reshaped in the QTT reference solution could easily be obtained using an ODE format, the storage requirements decreases to 42 KB. More- solver. We now consider a four-dimensional virus spread- over, using the solver with the QTT format,≈ the solution is ing model, namely, the SEIR model50, in which case standard obtained in 180 s, which is a considerably smaller compu- ODE solution methods result in high computational demands tation time than≈ the one needed for the reference solution, i.e. Tensor-train approximation of the chemical master equation and its application for parameter inference 9

10 2 1 · − 60 60 0.8 3

40 0.6 40 3 3

G 2 G 0.4 20 20 1 0.2

0 0 0 0 0 20 40 60 0 20 40 60 G 2 G2 (a) 푡 = 0. (b) 푡 = 2. 10 3 10 2 · − · − 60 60 1 6

40 40 4 3 3 G G 0.5

20 2 20

0 0 0 0 0 20 40 60 0 20 40 60 G2 G2 (c) 푡 = 4. (d) 푡 = 6. 10 2 10 7 · − · − 60 60

1 6

40 40 3 3 4 G G 0.5 20 20 2

0 0 0 0 0 20 40 60 0 20 40 60 G2 G2 (e) 푡 = 8. (f) Error at 푡 = 8.

FIG. 3: Time evolution of Exposed-Infected (퐸퐼) marginal distribution at 푡 0,2,4,6,8 and pointwise absolute error at ∈ { } 푡end = 8. The solution is computed in the TT-format and the reference is obtained by integrating the CME over a fine time grid.

12600 s. Without the use of the QTT format, the number the relative errors ≈of solver iterations and the computation time increase by one order of magnitude. Fig. 3 shows the time evolution of the marginal 퐸퐼 distribu- ref maxx 푝푡( ) x 푝푡end x 휖 = | end ( ) − ( )| = 2.9 10 5, tion, as well as the pointwise error at the end of the simulation max ref − maxx 푝 ( ) x · compared to the reference marginals. Finally, at 푡end we obtain | 푡end ( )| Tensor-train approximation of the chemical master equation and its application for parameter inference 10

1 ref Í 푝 ( ) x 푝 x TT-format. 푁 4 푡end 푡end x | ( ) − ( )| 휖 = = 2.539 10 9. ref − max 푝 ( ) x · x | 푡end ( )| 60 observations Hence, the TT-solver yields accurate solutions at significantly true state reduced execution times and with tremendous storage savings mean compared to the standard solver. Indicatively, the solution at 푡 = 8 requires only 2.5 MB storage, which is approximately 40 0.4% the storage requirement of the reference solution. One issue is the ordering of the species. If species that are highly correlated are apart from each other in the train, the #individuals ranks in between must carry the information and therefore the 20 overall rank structure increases. This can also be observed in the representation of the CME operator. In this example, the S,E,I,R ordering is chosen so that most of the reactions involve species that are neighbors in the chain. 0 2 4 6 8 10 C (a) Evolution of the number of susceptible individuals. B. Filtering and smoothing observations true state As discussed in IV B, state filtering and smoothing can 30 mean be performed in the TT-framework. We consider here the SEIR model presented in section V A 2. We assume 푁o = 33 equidistant observations with Δ푡 = 0.3125. The time interval 20 is now chosen as 0,10 . The realizations are obtained using the Stochastic Simulation[ ] Algorithm (SSA)4 (blue continuous lines in Fig. 4). The noise is assumed to be lognormal with #individuals 0.1 for 푆, 퐸, and 퐼, and 0.05 for 푅 (black symbol 10 in Fig. 4). × The TT-based forward-backward Algorithm 1 presented in section IV B is used to perform state filtering and smooth- ing.The state is truncated to 128,128,64,32 and the Cheby- 0 2 4 6 8 10 shev basis is used for the time( dependency. The) runtime of the C 6 SEIR experiment is 12 minutes for a solver accuracy of 10− (b) Evolution of the number of exposed individuals. in terms of relative TT-solver residual. As the estimated state, observations we compute the expected value of the distribution given by 20 푘 0 푁o  true state 푝 x( ) y ( ) ,...,y ( ) (red discontinuous line in Fig. 4) and the corresponding| standard deviation (grey envelope in Fig. 4). mean In this case, the incorporation of the observation model in the TT framework is beneficial to the reduction of the error since it acts like a reset condition. This can be observed in the de- 10 crease of the TT-rank after the multiplication with the tensor corresponding to the observation operator. The numerical ex- #individuals periment shows a decrease in the rank of up to 3 times, as can be observed in Fig. 5. One more significant advantage of the TT representation is 0 that the storage of the messages decreases dramatically com- pared to the full tensor representation, as the total storage 0 2 4 6 8 10 needed is 150 MB for the forward propagating messages C and 190≈MB for the backward propagating messages. In (c) Evolution of the number of infected individuals. addition≈ to that, computing the moments of the smooth dis- tribution consists of multiplication with rank-1 TT tensors. FIG. 4: Smoothing for the SEIR model with initial Moreover, the lognormal observation model is also translated population x = 50,2,1,0 >. The sample path is given by the to a rank-1 tensor. The presented results are performed in blue line, the observations( ) are marked with “ ”, and for the the QTT format, however the same test was performed with- smoothed distribution the mean (red dashed× line) and the out quantization. For the given state truncation, using the standard deviation (gray envelope) are plotted. QTT format results in an acceleration by more than an order of magnitude in computation time, compared to the standard Tensor-train approximation of the chemical master equation and its application for parameter inference 11

of the PDF: 150 E θ = 0.001924,0.01512,0.09985,0.01057 , [ ] ( ) 6 6 4 6 V θ = 1.034 10− ,8.685 10− ,5.428 10− ,7.624 10− , 100 [ ] ( × × × × ) θˆ = 0.001373,0.01412,0.09065,0.009567 . ( ) For comparison, the mean and variance of the reference pos- 50 terior sample are: maximum TT-rank

µθ = 0.001922,0.01507,0.09992,0.01052 , ( ) 2 7 6 4 6 σ = 9.975 10− ,8.498 10− ,5.233 10− ,7.518 10− . 0 θ ( × × × × ) 0 2 4 6 8 10 Since there is no analytical estimate for the combined error C of the method, several runs with different hyperparameters are performed for this model. First, the accuracy of the solver in FIG. 5: Ranks for the forward pass over simulation time terms of relative residual, here denoted with 휖, is varied and (triangle markers). the relative error of the TT-solver’s solution is analyzed, first with respect to the MCMC solution, and second with respect 6 to the most accurate solution of the TT-solver, i.e. for 휖 = 10− . C. Parameter inference The corresponding results are presented in Table III, where the simulation time and memory requirement for storing the joint in the TT-format are also reported. As can be seen from the 1. Simple gene expression table, the accuracy of the MCMC solution is reached already 4 for a TT-sover accuracy of 휖 = 10− . Looking at the memory We now use Algorithm 2 to identify all four parameters consumption and the execution time, they both increase for a higher TT-solver accuracy, which is expected since more θ = 휃1, 휃2, 휃3, 휃4 = 푐1, 푐4, 푐3, 푐4 of the simple gene expres- ( ) ( ) solver iterations are needed to reach the desired residual. sion model from a noisy sample with 푁o = 64 observations. In this case, the solver uses the QTT format, the observations are taken equidistantly every 4 time units, and the parameter TABLE III: Simple gene expression model error analysis priors are independent, truncated Gamma distributions, cho- with respect to solver accuracy 휖. sen such that they do not match the actual parameter. The 4 parameter domain R is restricted to 휃푖 0,6푐푖 . As a refer- 휖 Error w.r.t. Error w.r.t. Time [min] Memory [MB] 6 ence, a sample of size+ 5 105 is drawn from∈ [ the posterior] using MCMC 휖 = 10− · 10 3 7.35 10 3 4.376 10 3 2.4 1.8 the Metropolis–Hastings algorithm. The CME in this case is − × − × − 10 4 2.35 10 3 1.309 10 3 7.4 4.57 solved in the full format with the built-in Python ODE solver. − × − × − 10 5 3.59 10 3 6.399 10 5 21 9.29 With respect to the parameter dependence approximation, − × − × − 10 6 3.64 10 3 - 60 17.74 the basis of choice in this case is quadratic B-splines with − × − equidistant knots scaled to the parameter range. The dimen- sion of the individual univariate bases is 64. For the time Additionally, the dimension of the parameter basis has been integrator, a Chebyshev basis of dimension 푇 = 8 is used with investigated. If the tensor product basis is constructed using an subinterval size of 0.5 time units. The runtime is in this case univariate B-spline bases of dimension 16, the prior can be well 21 minutes with a maximum storage requirement of 9.2 ≈ ≈ approximated. However during the inference, the decrease of MB for the joint over state and parameters (represented by a the variance of the joint leads to an incapability to resolve the 6D tensor with mode size 64). The storage requirement for the posterior, since a finer basis is needed. If the discretization is extended CME operator in the QTT format is only 128 KB. increased to 32 for every parameter, the oscillations become Storing the tensor in the full format is intractable on standard negligible. machines even for this 2D example. For the given sample size, As a conclusion, the limiting factor in the inference frame- the Metropolis-Hastings algorithm run for 1.5 days. work seems to be the accuracy of the TT-solver. For the pur- ≈ 5 Since the posterior over the parameter space is four- pose of inference, however, a relative residual value of 휖 = 10− dimensional, hence, not easy to visualize, the marginals for seems to be sufficient for obtaining an accurate approximation the individual parameters are computed and compared to the and an acceptable computational time. 1D histograms of the posterior sample, for the purpose of val- idation. The results are presented in Fig. 6, where it can be observed that the posterior modes offer a reasonable approx- 2. Gene expression model with feedback imation of the true values, the latter being marked with red vertical dashed lines. As a further characterization of the pos- The second model where the parameter identification is em- terior, we compute its expected value, variance, and the mode ployed is the 3-stage gene expression model with feedback Tensor-train approximation of the chemical master equation and its application for parameter inference 12

1 0.005

0.000

0.05 2

0.00

0.25 3

0.00 0.04

4 0.02

0.00 0.000 0.005 0.00 0.05 0.00 0.25 0.000 0.025

1 2 3 4

FIG. 6: Posterior marginal distributions for the four unknown reaction rates of the simple gene expression model. The black regions correspond to high density of the PDF. The exact parameters are marked with the red dashed lines. For the 1D marginals, a histogram of the posterior sample is represented as a reference, as well as the prior (green dashed lines).

TABLE IV: Reactions of the 3 stage gene expression model. parameter dependence is approximated using a tensor product basis of univariate quadratic B-splines with dimension 64. The 5 Reaction 훼푚 x Rate 푐푖 tolerance of the TT-solver is set to 10− in terms of relative 퐺 퐺 푀 푐 푥( ) 4.0 residual. → + 1 1 푀 푀 푃 푐2푥2 10.0 The results are reported in Fig. 7 where we can see the 푀→ + 푐 푥 1.0 visual match between the histograms and the 1D marginals on → ∅ 3 2 퐺 푃 퐺∗ 푐4푥1푥3 0.2 the diagonal. The expected value and variance of the posterior 퐺 + →퐺 푃 푐 푥 0.6 ∗ → + 5 4 are: 푃 푐6푥3 1.0 → ∅ E θ = 4.0358,9.1720,1.8398,0.2378,1.0686 , [ ] ( ) V θ = 0.9649,1.5117,0.1669,0.005187,0.1172 . [ ] ( ) 51 loop . The reactions as well as the reaction rates are pre- As a comparison, the mean and variance of the reference pos- sented in Table IV. A realization is drawn using the SSA terior sample are computed using MCMC: and equidistant sampling is performed with additive Gaussian noise (see Fig. 8). µθ = 4.0503,9.1995,1.8443,0.2379,1.0680 , ( ) The parameters to be identified are in this case θ = 2 σθ = 0.9874,1.2367,0.4123,0.0720,0.3467 . 푐1, ..., 푐5 and the parameter space is bounded to 0,5푐푖 . ( ) For( the priors,) we choose again independent Gamma distribu-[ ] The relative error between the reference and the TT-solver- 3 tions, which are truncated within the parameter space. The based modes is in the range of 10− for the expectation and Tensor-train approximation of the chemical master equation and its application for parameter inference 13

20 1

0 50

2 25

5.0

3 2.5

1.0

4 0.5

3 2 5 1

10 20 25 50 2.5 5.0 0.5 1.0 0 2

1 2 3 4 5

FIG. 7: Posterior marginal distributions for the five unknown reaction rates of the 3 stage gene expression model. The black regions correspond to high density of the PDF. The exact parameters are marked with the red dashed lines. For the 1D marginals, a histogram of the posterior sample is represented as a reference, as well as the prior (green dashed lines).

2 10− for the variance. The limiting factor is in this case the rior size in the QTT-format of 30 MB. As a comparison, the small MCMC sample size. Using the TT solver, the execu- chosen state truncation of 128≈,64,64,32,32 would require tion time for this test case is 50 minutes. Regarding storage 4.2 GB only for storing( the state for one) parameter real- needs, only 12 MB of RAM are used. As a comparison, the ≈ization. The storage complexity for the parameter-dependent MCMC simulation≈ took approximately 2.5 days to complete CME operator in the QTT format is 200 KB. for a sample size equal to 5 105. ≈ ·

3. SEIQR model For this setup, the variance of the approximated posterior The model considered now is an extension of the SEIR (see Fig. 10) is two orders lower than the prior for the first model presented in the filtering section and has one additional parameter and one order lower for the second and third param- species: quarantined (Q). The modified reactions are found in eters. This implies a higher confidence in the reconstruction Table V. We infer in this case the parameters θ = 푐1, 푐2, 푐3, 푐4 of the parameters, which also indicates the need for a denser from 45 observations affected by lognormal noise( (see Fig.) basis. In this case, choosing less than 64 points per parameter 9). The species Susceptibe and Exposed are observed with would lead to oscillations and inability to infer the posterior. a higher degree of uncertainty while Quarantined and Recov- This can be overcome by adaptively reducing the bounds of ered are observed exactly. The execution time for a TT-solver the parameter domain and re-interpolating the posterior on the accuracy of 휖 = 10 5 is 55 minutes with a maximum poste- new basis. − ≈ Tensor-train approximation of the chemical master equation and its application for parameter inference 14

VI. CONCLUSION G 20 M P We presented a method based on the TT decomposition G∗ to solve the CME, either in its standard form or including 15 observations parameter dependencies, and approximate the joint distribu- tion over the state-parameter space, including the time depen- dency as well. Using the considered TT-framework, inference 10 tasks such as state smoothing and parameter identification can be performed accurately and efficiently. The proposed TT- #individuals framework is also applied to solve the inverse problem of 5 identifying the values of the parameters governing the sys- tem under investigation from noisy state observations. The resulting numerical approximation of the posterior PDF over 0 the truncated parameter space can be efficiently stored and manipulated in the TT format. 0 2 4 6 8 10 12 A series of numerical experiments with a simple gene ex- t [s] pression model and with the SEIR virus model show clearly that the state-time TT-approximation reduces the storage needs FIG. 8: Noisy observation sample for the 3 stage gene of the CME to a mere fraction of what a standard CME so- expression model (number of observations is 45). lution method requires. Moreover, by performing multilinear algebraic operations in the TT-format, the execution time is significantly reduced as well. Similar benefits are observable TABLE V: Reactions of the SEIQR model. in the context of inference tasks, where the proposed TT-based filtering, smoothing, and parameter identification approaches Reaction 훼푚 x Rate 푐푖 Description ( ) yield accurate results for a significantly reduced computational 푆 퐼 퐸 퐼 푐1푥1푥3 0.04 Susceptible becomes exposed cost. +퐸→ 퐼+ 푐 푥 0.4 Exposed becomes infected → 2 2 퐼 푄 푐3푥3 0.4 Infected is quarantined While standard inference procedures for the single trajectory 퐼 → 푐 푥 0.004 Infected individual dies setting such as MCMC require repeated solutions of the CME → ∅ 4 3 퐼 푅 푐5푥3 0.12 Infected recovers with immunity for different parameter configurations, the joint approach pre- 푄→ 푅 푐 푥 0.8765 Quarantined recovers with immunity sented here requires only one forward pass on the augmented → 6 4 퐼 푆 푐7푥3 0.01 Infected recovers without immunity state space. One drawback of the parameter space discretiza- → 푄 푆 푐8푥4 0.01 Quarantined recovers without immunity tion is that it can cause problems when the posterior is much → 푆 푐 0.01 New susceptible individual ∅ → 9 more concentrated then the prior. As demonstrated on the SEIQR model, this can be overcome by dynamically adapting the basis.

100 In this work, we have focused on inferring the rate constants S of structurally known models from a single trajectory. An im- E portant direction for future research is to extend the approach 80 I to multiple trajectories with shared parameters. Another in- Q teresting direction is to consider different types of uncertainty, R e.g. the involved species or the types of the reaction. Since 60 observations the presented method is fully Bayesian, this could be realized by scoring different candidate structures via Bayesian model 40 comparison. #individuals

20

0 ACKNOWLEDGMENT

0 1 2 3 4 5 C The work of I.G. Ion is supported by the Graduate School Computational Engineering within the Centre for Computa- FIG. 9: Noisy observation sample for the SEIQR model (num- tional Engineering at Technische Universität Darmstadt. D. ber of observations is 45). Loukrezis is supported by the German Federal Ministry for Education and Research (BMBF) via the research contract 05K19RDB. Tensor-train approximation of the chemical master equation and its application for parameter inference 15

0.2 1

0.0 2

2 1

0 2

3 1

0 0.02

4 0.01

0.00 0.0 0.1 0.2 0 1 2 0 1 2 0.00 0.02

1 2 3 4

FIG. 10: Posterior marginal distributions for the four unknown reaction rates of the SEIQR model. The black regions correspond to high density of the PDF. The exact parameters are marked with the red dashed lines and the prior with green dashed lines.

DATA AVAILABILITY arising in the discrete modelling of biological systems,” Proc of the Markov 150th Anniversary Conference (2006). 9V. Wolf, R. Goel, M. Mateescu, and T. A. Henzinger, “Solving the chemical The data and code that support the findings of this study are master equation using sliding windows,” 4, 42 (2010). freely available at https://github.com/ion-g-ion/paper-cme-tt. 10S. Dolgov and B. Khoromskij, “Simultaneous state-time approximation of the chemical master equation using tensor product formats,” Numerical Linear Algebra with Applications 22 (2015), 10.1002/nla.1942. 11 1D. T. Gillespie, “A rigorous derivation of the chemical master equation,” V. Kazeev, M. Khammash, M. Nip, and C. Schwab, “Direct Solution of the Physica A: Statistical Mechanics and its Applications 188, 404–425 (1992). Chemical Master Equation Using Quantized Tensor Trains,” PLOS Com- 2 putational Biology 10, 1–19 (2014). M. A. Gibson and J. Bruck, “Efficient exact stochastic simulation of chemical 12 systems with many species and many channels,” The journal of physical T. Dinh and R. B. Sidje, “An adaptive solution to the chemical master equa- chemistry A 104, 1876–1889 (2000). tion using quantized tensor trains with sliding windows,” Physical Biology 3 (2020). M. Hemberg and M. Barahona, “Perfect sampling of the master equation 13 for gene regulatory networks,” Biophysical journal 93, 401–410 (2007). H. D. Vo and R. B. Sidje, “An adaptive solution to the chemical mas- 4D. T. Gillespie, “A general method for numerically simulating the stochastic ter equation using tensors,” The Journal of chemical physics 147, 044102 (2017). time evolution of coupled chemical reactions,” Journal of computational 14 physics 22, 403–434 (1976). T. Jahnke and W. Huisinga, “A Dynamical Low-Rank Approach to the 5S. Engblom, “Spectral approximation of solutions to the chemical master Chemical Master Equation,” Bulletin of Mathematical Biology 70, 2283 (2008). equation,” Journal of computational and applied mathematics 229, 208–221 15 (2009). A. Gupta, J. Mikelson, and M. Khammash, “A finite state projection al- 6P. Deuflhard, W. Huisinga, T. Jahnke, and M. Wulkow, “Adaptive discrete gorithm for the stationary solution of the chemical master equation,” The Journal of Chemical Physics 147 (2017), 10.1063/1.5006484. Galerkin methods applied to the chemical master equation,” SIAM Journal 16 on Scientific Computing 30, 2990–3011 (2008). P.Kügler, “Moment fitting for parameter inference in repeatedly and partially 7 observed stochastic biological models,” PLOS ONE 7 (2012). B. Munsky and M. Khammash, “The finite state projection algorithm for the 17 solution of the chemical master equation,” The Journal of Chemical Physics C. Zechner, J. Ruess, P. Krenn, S. Pelet, M. Peter, J. Lygeros, and H. Koeppl, 124, 044104 (2006). “Moment-based inference predicts bimodality in transient gene expression,” 8K. Burrage, M. Hegland, F. Macnamara, and R. Sidje, “A Krylov-based Proceedings of the National Academy of Sciences 109, 8340–8345 (2012). Finite State Projection algorithm for solving the chemical master equation Tensor-train approximation of the chemical master equation and its application for parameter inference 16

18F. Fröhlich, P. Thomas, A. Kazeroonian, F. J. Theis, R. Grima, and J. Hase- 10.1137/110833142. nauer, “Inference for stochastic chemical kinetics using moment equations 35S. Holtz, T. Rohwedder, and R. Schneider, “The alternating linear scheme for and system size expansion,” PLOS Computational Biology 12 (2016). tensor optimization in the tensor train format,” SIAM Journal on Scientific 19B. Munsky, G. Li, Z. R. Fox, D. P. Shepherd, and G. Neuert, “Distribu- Computing 34, A683–A713 (2012). tion shapes govern the discovery of predictive models for gene regulation,” 36L. Grasedyck, M. Kluge, and S. Kramer, “Variants of alternating least Proceedings of the National Academy of Sciences 115, 7533–7538 (2018). squares tensor completion in the tensor train format,” SIAM Journal on 20Z. Cao and R. Grima, “Accuracy of parameter estimation for auto-regulatory Scientific Computing 37, A2424–A2450 (2015). transcriptional feedback loops from noisy data,” Journal of The Royal So- 37S. R. White, “Density-matrix algorithms for quantum renormalization ciety Interface 16, 20180967 (2019). groups,” Physical Review B 48, 10345 (1993). 21L. Bronstein and H. Koeppl, “A variational approach to moment-closure 38S. Dolgov and D. Savostyanov, “Alternating minimal energy methods for lin- approximations for the kinetics of biomolecular reaction networks.” The ear systems in higher dimensions,” SIAM Journal on Scientific Computing Journal of chemical physics 148 1, 014105 (2018). 36 (2013), 10.1137/140953289. 22P. Milner, C. S. Gillespie, and D. J. Wilkinson, “Moment closure based 39M. Hegland and J. Garcke, “On the numerical solution of the chemical parameter inference of stochastic kinetic models,” Statistics and Computing master equation with sums of rank one tensors,” ANZIAM Journal 52 23, 287–295 (2013). (2011), 10.21914/anziamj.v52i0.3895. 23V. Stathopoulos and M. A. Girolami, “Markov chain monte carlo infer- 40P. Gelß, The Tensor-Train Format and Its Applications: Modeling and ence for markov jump processes via the linear noise approximation,” Philo- Analysis of Chemical Reaction Networks, Catalytic Processes, Fluid Flows, sophical Transactions of the Royal Society A: Mathematical, Physical and and Brownian Dynamics, Ph.D. thesis (2017). Engineering Sciences 371, 20110541 (2013). 41L. Trefethen, Spectral Methods in MATLAB, Software, Environments, and 24P. Fearnhead, V. Giagos, and C. Sherlock, “Inference for reaction networks Tools (Society for Industrial and Applied Mathematics (SIAM, 3600 Market using the linear noise approximation,” Biometrics 70, 457–466 (2014). Street, Floor 6, Philadelphia, PA 19104), 2000). 25A. Golightly and D. J. Wilkinson, “Bayesian parameter inference for stochas- 42B. N. Khoromskij, “O (d log n)-quantics approximation of n-d tensors in tic biochemical network models using particle Markov chain Monte Carlo,” high-dimensional numerical modeling,” Constructive Approximation 34, Interface Focus 1, 807–820 (2011). 257–280 (2011). 26 C. Zechner, M. Unger, S. Pelet, M. Peter, and H. Koeppl, “Scalable Inference 43S. V. Dolgov, B. N. Khoromskij, and I. V. Oseledets, “Fast solution of of Heterogeneous Reaction Kinetics from Pooled Single-Cell Recordings,” parabolic problems in the tensor train/quantized tensor train format with ini- Nat Meth 11, 197–202 (2014). tial application to the fokker–planck equation,” SIAM Journal on Scientific 27 I. Oseledets, “Tensor-Train Decomposition,” SIAM J. Scientific Computing Computing 34, A3016–A3038 (2012), https://doi.org/10.1137/120864210. 33, 2295–2317 (2011). 44D. Bigoni, A. Engsig-Karup, and Y. Marzouk, “Spectral Tensor-Train 28 T. Kolda and B. Bader, “Tensor decompositions and applications,” SIAM Decomposition,” SIAM Journal on Scientific Computing 38 (2014), Rev. 51, 455–500 (2009). 10.1137/15M1036919. 29 S. Dolgov, “A Tensor Decomposition Algorithm for Large ODEs with 45C. de Boor, A Practical Guide to Spline, Vol. Volume 27 (1978). Conservation Laws,” Computational Methods in Applied Mathematics 19 46L. Huang, L. Pauleve, C. Zechner, M. Unger, A. S. Hansen, and H. Koeppl, (2018), 10.1515/cmam-2018-0023. “Reconstructing dynamic molecular states from single-cell time series,” 30 I. Oseledets and E. Tyrtyshnikov, “Breaking the Curse of Dimensionality, Journal of The Royal Society Interface 13, 20160533 (2016). Or How to Use SVD in Many Dimensions,” SIAM J. Scientific Computing 47S. Z. Li and A. Jain, eds., “Forward-backward algorithm,” in Encyclopedia 31, 3744–3759 (2009). of Biometrics (Springer US, Boston, MA, 2009) pp. 580–580. 31 D. Savostyanov and I. Oseledets, “Fast adaptive interpolation of multi- 48B. Alberts, A. Johnson, J. Lewis, P. Walter, M. Raff, and K. Roberts, dimensional arrays in tensor train format,” in The 2011 International Work- Molecular Biology of the Cell 4th Edition: International Student Edition shop on Multidimensional (nD) Systems (IEEE, 2011) pp. 1–8. (Routledge, 2002). 32 I. Oseledets and E. Tyrtyshnikov, “TT-cross approximation for multidimen- 49“Numerical differential equation methods,” in Numerical Methods for sional arrays,” Linear Algebra and its Applications 432, 70–88 (2010). Ordinary Differential Equations (John Wiley and Sons, Ltd, 2016) 33 S. V. Dolgov, “TT-GMRES: solution to a linear system in the structured https://onlinelibrary.wiley.com/doi/pdf/10.1002/9781119121534.ch2. tensor format,” Russian Journal of Numerical Analysis and Mathematical 50H. Hethcote, “The mathematics of infectious diseases,” SIAM Review 42, Modelling 28, 149–172 (2013). 599–653 (2000). 34 I. Oseledets and S. Dolgov, “Solution of linear systems and matrix inver- 51K. Öcal, R. Grima, and G. Sanguinetti, “Parameter estimation for biochem- sion in the tt-format,” SIAM Journal on Scientific Computing 34 (2012), ical reaction networks using wasserstein distances,” Journal of Physics A: Mathematical and Theoretical 53, 034002 (2019).