Systems Biology: Parameter Estimation for Biochemical Models Maksat Ashyraliyev1, Yves Fomekong-Nanfack2, Jaap A
Total Page:16
File Type:pdf, Size:1020Kb
MINIREVIEW Systems biology: parameter estimation for biochemical models Maksat Ashyraliyev1, Yves Fomekong-Nanfack2, Jaap A. Kaandorp2 and Joke G. Blom1 1 Centrum voor Wiskunde en Informatica, Amsterdam, The Netherlands 2 Section Computational Science, University of Amsterdam, The Netherlands Keywords Mathematical models of biological processes have various applications: to a prioiri and a posteriori identifiability; local assist in understanding the functioning of a system, to simulate experiments and global optimization; parameter before actually performing them, to study situations that cannot be dealt estimation with experimentally, etc. Some parameters in the model can be directly Correspondence obtained from experiments or from the literature. Others have to be J. G. Blom, Centrum voor Wiskunde en inferred by comparing model results to experiments. In this minireview, we Informatica, Science Park 123, 1098 XG discuss the identifiability of models, both intrinsic to the model and taking Amsterdam, The Netherlands into account the available data. Furthermore, we give an overview of the Fax: +31 20 5924199 most frequently used approaches to search the parameter space. Tel: +31 20 5924263 E-mail: [email protected] (Received 8 April 2008, revised 21 October 2008, accepted 28 November 2008) doi:10.1111/j.1742-4658.2008.06844.x these results, the forward problem. The inverse problem Introduction is the problem at hand: the estimation of parameters in Parameter estimation in systems biology is usually part a mathematical model from measured observations. of an iterative process to develop data-driven models There are a number of difficulties involved [6]. The for biological systems that should have predictive forward problem requires a fast and robust time inte- value. In this minireview, we discuss how to obtain grator. Fast, because the model will be evaluated many parameters for mathematical models by data fitting. times. Robust, because the whole parameter and state We restrict ourselves to the case where a deterministic space will be visited, which most likely will result in a model in the form of a mathematical function-based different character of the mathematical model (i.e. model is available, such as a system of differential and number and range of time scales involved). The inverse algebraic equations. For example, in the case of a bio- problem has even more pitfalls. The first question is chemical process, hypotheses based on the knowledge whether the parameters for the mathematical model of the underlying network structure of a pathway are can be determined assuming that for all observables translated into a system of kinetic equations, para- continuous and error-free data are available. This is the meters are obtained from literature or estimated from a subject of a priori identifiability or structural identifi- data fit, and, with the resulting model, predictions are ability analysis of the mathematical model. The actual made that can be tested with further experiments. To parameter estimation or data fitting typically starts compare model results with the experimental data, one with a guess about parameter values and then changes first has to simulate the mathematical model to produce those values to minimize the discrepancy between Abbreviations DAE, differential algebraic equations; EA, evolutionary algorithms; MLE, maximum likelihood estimator; SA, simulated annealing; SRES, stochastic ranking evolutionary strategy. 886 FEBS Journal 276 (2009) 886–902 ª 2009 The Authors Journal compilation ª 2009 FEBS M. Ashyraliyev et al. Parameter estimation in systems biology ( model and data using a particular metric. Kinetic dxðt; pÞ A ¼ fðt; xðt; pÞ; p; uðtÞÞ; t < t t models with nonlinear rate equations have in general dt 0 e ð1Þ multiple sets of parameters that lead to such minimiza- xðt0; pÞ¼x0ðpÞ tions, some of those minima may only be local. The where t denotes time, the m-dimensional vector p value of parameters and model variables may range contains all unknown parameters, x is an n-dimensional over many orders of magnitude, one can get stuck in a vector with the state variables (e.g. concentration val- local minimum or one can wander around in a very flat ues), u are the externally input signals, and f is a given part of the solution space. Given a particular set of vector function. When components of the initial state experimental data, and one particular acceptable model vector x are not known, they are considered as parameterization obtained by a parameter estimation 0 unknown parameters, so x may depend on p. In most procedure, does not mean that all obtained parameters 0 cases, A is a constant diagonal n · n matrix with 1 or 0 can be trusted. After the minimum has been found, an on the diagonal; 1 for an ODE and 0 for an algebraic a posteriori or practical identifiability study can show equation. how well the parameter vector has been determined In addition, a vector of observables is given: given a data set that is possibly sparse and noisy. That this part of model fitting should not be underestimated gðt; xðt; pÞ; p; uðtÞÞ ð2Þ is shown by Gutenkunst et al. [7]. For all 17 systems which are quantities in the model [in general (a combi- biology models that they considered, the obtained nation of) state variables] that can be experimentally parameters are ‘sloppy’, meaning not well-defined. On measured, and possibly a vector of (non)linear con- the other hand, one could argue that often the precise straints: value of a parameter is not required to draw biological conclusions [8]. cðt; xðt; pÞ; p; uðtÞÞ 0 ð3Þ In this minireview, we first discuss the identifiability of the model, both a priori and a posteriori, the latter by Let us assume that N measurements are available to a small example. Next, we give a brief survey of the cur- find parameters of Eqns (1–3). Each measurement, rent methods used in parameter estimation with a focus which we denote by yi, is specified by the time ti when on those that are implemented in toolboxes for systems the ith component of the observable vector g is mea- biology. In the Discussion, we give some guidelines on sured. The corresponding model value for a specific the application of these methods in practice. Finally, in parameter vector p^,which can be obtained sufficiently the supporting information (Doc. S1), an overview is accurate by numerical integration of Eqn (1) and com- given of the contents of some well-known toolboxes. puting the observable function of Eqn (2), is denoted For further reading on identifiability, we refer to the by ^gi ¼ giðti; x; p^; uÞ. The vector of discrepancies classical textbook of Ljung [9] and the recent review between the model values and the experimental values paper on regression by Jaqaman and Danuser [4]. An is then given by eðp^Þ¼jgðt; xðt; p^Þ; p^; uðtÞÞ À yj.We overview on local and global parameter estimation assume that Eqn (1) is a sufficiently accurate mathe- methods applied to a systems biology benchmark set is matical description approximating reality. This means given elsewhere [10,11]. We also recommend the easily that all relevant knowledge about the biological pro- readable books on this subject by Schittkowski [6] and cesses is incorporated correctly in the vector function by Aster et al. [12], which touch many of the subjects f. Thus, the only uncertainty in Eqn (1) is the vector discussed in this minireview, with the exception of of unknown parameters p. In this case, the difference à à global search methods. ei(p ) ¼ |gi(ti,x,p ,u))yi| is solely due to experimental errors, where pà is the true solution. The m-dimensional optimization problem is given by Problem definition the task to minimize some measure, V(p), for the dis- Deterministic models arising from kinetic equations crepancy e(p). By far the most used measure for the are typically given by a system of differential algebraic discrepancy is the Euclidean norm or the sum of the equations (DAEs)1 (i.e. ordinary differential equations squares weighted with the error in the measurement: coupled to algebraic equations) of the form: XN 2 ðg ðt ; x; p; uÞy Þ V ðpÞ¼ i i i ¼ eT ðpÞWeðpÞð4Þ MLE r2 1 The content of this paper is also applicable to (discretized) systems i¼1 i of partial differential equations and delay differential equations. Fitting parameters of stochastic models requires a different approach see [13,14]. This measure results from the maximum [1–3]. likelihood estimator (MLE) theory. Under the assump- FEBS Journal 276 (2009) 886–902 ª 2009 The Authors Journal compilation ª 2009 FEBS 887 Parameter estimation in systems biology M. Ashyraliyev et al. tion that the experimental errors are independent and determined with the available, noisy, experimental normally distributed with standard deviation ri, the data. In this case, locally means in the neighborhood least squares estimate p^ of the parameters is the value of the obtained parameter. of p that minimizes the sum of squares: A priori identifiability p^ ¼ arg min VMLEðpÞð5Þ p There are several techniques to determine a priori When these assumptions do not hold, other mea- global identifiability of the model, but for realistic sit- uations (i.e. nonlinear models of a certain size), it is sures than VMLE(p) might be used like the sum of the absolute values. The MLE theory then does not apply very difficult to obtain any results. Still, it is advisable so p^ is not the least squares estimate and the statistical to always perform an a priori analysis because parame- analysis in the section ‘A Posteriori identifiability’ does ter estimation methods can have problems with locally not hold. Dependent on the optimization method or identifiable or unidentifiable systems. Symbolic algebra maple mathematica the mathematical discipline the function V(p) is called packages like [15] and [16] can objective function, cost function, goal function, energy be of great help.