Institutionen för systemteknik
Department of Electrical Engineering

Master's Thesis (Examensarbete)

Evaluation of two Methods for Identifiability Testing

Master's thesis carried out in Automatic Control
at the Institute of Technology, Linköping University,
by

Peter Nyberg

LiTH-ISY-EX--09/4278--SE
Linköping 2009

Department of Electrical Engineering
Linköpings universitet
SE-581 83 Linköping, Sweden


Supervisors: Gunnar Cedersund (IKE, Linköpings universitet),
Christian Lyzell (ISY, Linköpings universitet),
Jan Brugård (MathCore)
Examiner: Martin Enqvist (ISY, Linköpings universitet)

Linköping, 7 October, 2009


Swedish title: Utvärdering av två metoder för identifierbarhetstestning
English title: Evaluation of two Methods for Identifiability Testing


Keywords: Identifiability, Mean optimal transformation approach, Multistart simulated annealing, Sedoglavic observability test

Abstract

This thesis concerns the identifiability issue: which, if any, parameters can be deduced from the input and output behavior of a model? The two types of identifiability concepts, a priori and practical, will be addressed and explained. Two methods for identifiability testing are evaluated, and the results show that the two methods work well if they are combined. The first method is for a priori identifiability analysis, and it can determine the a priori identifiability of a system in polynomial time. The result from the method is probabilistic, with a high probability of a correct answer. The other method takes a simulation approach to determine whether the model is practically identifiable. Non-identifiable parameters manifest themselves as a functional relationship between the parameters, and the method uses transformations of the parameter estimates to conclude whether the parameters are linked. The two methods are verified on models with known identifiability properties and then tested on some examples from systems biology. Although the output from one of the methods is cumbersome to interpret, the results show that the number of parameters that can be determined in practice (practical identifiability) is far smaller than the number that can be determined in theory (a priori identifiability). The reason for this is the limited quality of the measurements, i.e., noise and lack of excitation.

Sammanfattning

The focus of this report is on the identifiability problem: which parameters can be uniquely determined from a model? There are two types of identifiability concepts, a priori and practical identifiability, which will be explained. Two methods for identifiability testing are evaluated, and the results show that the two methods work well if they are combined. The first method is for a priori identifiability analysis, and it can determine the identifiability of a system in polynomial time. The result from the method is probabilistic, with a high probability of a correct answer. The second method uses simulations to determine whether the model is practically identifiable. Non-identifiable parameters manifest themselves as functional relations between parameters, and the method uses transformations of the parameter estimates to determine whether the parameters are linked. The two methods are verified on models where the identifiability is known and are then tested on some examples from systems biology. Although the results from one of the methods are cumbersome to interpret, they show that the number of parameters that can be determined in practice (practically identifiable) is considerably smaller than the number that can be determined in theory (a priori identifiable). The reason is the limited quality of the measurements, i.e., both noise and lack of excitation.

Acknowledgments

First I would like to thank my examiner Martin Enqvist and my co-supervisor Christian Lyzell for their help with the report. I would also like to thank my co-supervisor Gunnar Cedersund for all the good answers to my questions and for the different models from systems biology. Thanks to Jan Brugård, who gave me the opportunity to do my thesis at MathCore. Last but not least I would like to thank my wonderful girlfriend Eva for sticking with me.


Contents

1 Introduction
  1.1 Thesis Objectives
  1.2 Computer Software
    1.2.1 Mathematica
    1.2.2 MathModelica
    1.2.3 Matlab
    1.2.4 Maple
  1.3 Organization of the Report
  1.4 Limitations
  1.5 Acronyms

2 Theoretical Background
  2.1 Identifiability
  2.2 Multistart Simulated Annealing
    2.2.1 Cost Function
    2.2.2 Settings
    2.2.3 Input
    2.2.4 Output
  2.3 Mean Optimal Transformation Approach
    2.3.1 Alternating Conditional Expectation
    2.3.2 Input
    2.3.3 The Idea Behind MOTA
    2.3.4 Test-function
    2.3.5 Output
  2.4 Observability
  2.5 Sedoglavic's Method
    2.5.1 A Quick Introduction
    2.5.2 Usage
    2.5.3 Drawbacks of the Algorithm

3 Results
  3.1 Verifying MOTA
    3.1.1 Example 1
    3.1.2 Example 2
  3.2 Verifying Sedoglavic Observability Test
    3.2.1 A Linear Model
    3.2.2 Goodwin's Napkin Example
    3.2.3 A Nonlinear Model
    3.2.4 Compartmental Model
  3.3 Implementation Aspects and Computational Complexity
    3.3.1 Simulation
    3.3.2 ACE
    3.3.3 Selection Algorithm
  3.4 Evaluation on Real Biological Data
    3.4.1 Model 1: Simplified Model
    3.4.2 Model 2: Addition of insulin to the media
    3.4.3 Model 3: A model for the insulin receptor signaling, including internalization

4 Conclusions and Future Work
  4.1 Conclusions
  4.2 Proposal for Future Work

A Programming Examples
  A.1 Mathematica
  A.2 MathModelica
  A.3 Maple

Chapter 1

Introduction

A common question in control theory and systems biology is whether or not a model is identifiable. The question matters because the parameters of the model cannot be uniquely determined if the model is non-identifiable. Why is this important? The answer is that the parameters can have some physical meaning, or the search procedures for the parameter estimates may suffer if these are not unique (Ljung and Glad, 1994). In systems biology this is a big problem, due to the small number of measurements compared to the number of parameters in the model. The models in systems biology are often described by systems of differential equations (Hengl et al., 2007), also known as the state-space representation

\dot{x} = f(x, p, u)
y = g(x, p, u)

where x are the states, p the parameters, y the outputs, and u the inputs of the model. The dynamics are described by \dot{x} = dx(t)/dt = f(x, p, u). The parameters are, e.g., reaction rates which have to be determined with the help of the measured data. As we will see, there are two types of identifiability concepts: the first one regards only the model structure, and the other one originates from the limited quality of the measurements. In this thesis two methods will be evaluated regarding the identifiability issue. The first method focuses on the model equations, and the second method also handles the quality of the measurements.
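To make the representation concrete, the following Matlab sketch simulates a small hypothetical two-state model of this form; the states, the rate parameters p and the input u are invented for illustration only and are not one of the thesis models.

    % Hypothetical two-state model dx/dt = f(x,p,u), y = g(x,p,u).
    p = [0.5; 0.1];                        % parameters, e.g., reaction rates
    u = @(t) 1;                            % externally given input signal
    f = @(t, x) [-p(1)*x(1) + u(t);        % state equations f(x,p,u)
                  p(1)*x(1) - p(2)*x(2)];
    [t, x] = ode45(f, [0 50], [0; 0]);     % simulate from x(0) = [0; 0]
    y = x(:, 2);                           % output equation g: observe state 2
    plot(t, y)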

1.1 Thesis Objectives

In this thesis we will describe the difference between the two types of identifiability concepts which will be explained in Section 2. The main purpose of this thesis is to investigate methods for determining the identifiability property of a given model. This includes both implementation and evaluation of existing methods in the MathModelica software environment. The algorithms which have been

translated from Matlab to MathModelica/Mathematica are described in Section 2.2 and in Section 2.3.

1.2 Computer Software

During the work with this thesis several programming languages have been used. Two algorithms have been translated from Matlab to Mathematica. For the simulations, the Modelica-based language MathModelica and the Systems Biology Toolbox (SBTB) have been used, and finally Maple has been required because another algorithm is written in that language. These four languages are briefly described in this section. Some syntax examples are shown in Appendix A.

1.2.1 Mathematica

Mathematica is a program originally developed by Stephen Wolfram. The largest part of the Mathematica user community consists of technical professionals, but the language is also used in education around the globe. One useful feature of Mathematica is that you can choose the programming style that suits you: ordinary or procedural programming goes hand in hand with functional and rule-based programming. The downside is that you can write one line of code that does almost anything, but that a reader may understand nothing of it. This can be remedied with proper comments and basic knowledge of the language. An example in Appendix A illustrates the different ways to write a piece of code that sums up all the even numbers from 1 to 1000. For more information about Mathematica see Wolfram (1999).

1.2.2 MathModelica

Due to the demand of experiment fitting, that is, fitting the model several¹ times to data to obtain estimates of the parameters, there is a need for simulation environments, and in this thesis MathModelica has partly been used for the simulations. MathModelica System Designer Professional from MathCore has been used; it provides a graphical environment for modeling and an environment for simulation tasks. With the Mathematica link, which connects MathModelica with Mathematica, one can make use of the Mathematica notebook environment and its facilities (Modelica, 2009). For more information about MathModelica see Mathcore (2009), and for the Modelica language see Fritzson (2003). One of the main advantages of a Modelica-based simulation environment is that it is acausal: there is no predetermined input or output (unless you force it) for the components, meaning, e.g., that for a resistor component (see Appendix A) either i or v can serve as input or output depending on the structure of the whole system.

¹Several is due to the choice of the parameter estimator in this thesis, see Section 2.2.

1.2.3 Matlab

Matlab is a language based on numerical computing. It is developed by The MathWorks and is used heavily around the world for technical and educational purposes. Matlab can be extended with several toolboxes, and in this thesis the SBTB has been used together with the package SBADDON for the purpose of simulating in the Matlab environment.

1.2.4 Maple

Maple is a technical computing and documentation environment based on a computer algebra system, originating from the Symbolic Computation Group at the University of Waterloo, Canada. Maple has been used because one of the algorithms for testing identifiability is written in Maple. The algorithm is explained in Section 2.5.

1.3 Organization of the Report

The organization of this thesis is as follows. The background theory and the methods for testing identifiability are explained in Section 2. In Section 3 the results are presented: the two methods are first verified with the help of some examples, the implementation aspects of the translation from Matlab to MathModelica/Mathematica are discussed, and thereafter the methods are tested on examples from systems biology. In Section 4 the conclusions and proposals for future work are found.

1.4 Limitations

One limitation of this thesis is that we focus on systems biology, and the examples are all taken from that field of science. Another limitation is that the second method for determining the identifiability of a model, see Section 2.5, has not been translated to MathModelica/Mathematica and has therefore not been studied as thoroughly.

1.5 Acronyms

MOTA: Mean Optimal Transformation Approach
MSA: Multistart Simulated Annealing
ACE: Alternating Conditional Expectation
SOT: Sedoglavic's Observability Test
SBTB: Systems Biology Toolbox

Chapter 2

Theoretical Background

In systems biology, but also within many other areas, over-parametrization and non-identifiability are big problems. The problem is present when there are parts of a model that cannot be identified uniquely (or observed) from the given measurements. There are two types of identifiability: a priori and practical. Practical identifiability implies a priori identifiability, but not vice versa. For a model with unknown parameters there is a possibility to determine these parameters from the input and output signals. If the model is a priori non-identifiable, the parameters cannot be determined even if the input and output signals are free from noise. Analyzing the model structure itself with respect to non-identifiability is called a priori (structural) identifiability analysis, since the analysis is done before any experiment fitting and simulation. In this thesis the a priori or structural identifiability concerns the local identifiability aspects of a model and not the global property. Global identifiability implies local identifiability, but not the other way around; see Section 2.1 for the definitions of global and local identifiability. For nonlinear models numerous approaches have been proposed, e.g., the power series expansion (Pohjanpalo, 1978), differential algebra (Ljung and Glad, 1994) and the similarity transform (Vajda et al., 1989). However, with increasing model complexity these methods become mathematically intractable (Hengl et al., 2007; Audoly et al., 2001). Due to the importance of time-efficient algorithms, a method proposed by Alexandre Sedoglavic has been used in this thesis. The algorithm, Sedoglavic's Observability Test (SOT), is polynomial in time, with the drawback that the result is probabilistic (Sedoglavic, 2002). The probabilistic nature of the algorithm originates from simplifications made to speed up the calculations (Sedoglavic, 2002). In Section 2.5 the SOT is further explained.

There is another way to detect non-identifiable parameters, besides a priori identifiability analysis, namely with the help of simulation and parameter fitting. Non-identifiable parameters manifest themselves as functionally related parameters; in other words, the parameters depend on each other. If the parameters are functionally related, it may be that only, for example, their sum or quotient can be determined. The individual parameters then cannot be determined uniquely and are non-identifiable.

Figure 2.1. This figure shows the block diagram of the different paths used by the two algorithms. The simulation approach is the upper one: from the model equations the MSA algorithm produces estimates of the parameters, which are used in MOTA to determine which of the parameters are identifiable. The a priori identifiability analysis is a bit more straightforward: from the model equations, the parameters which are a priori identifiable are determined with the help of the SOT.

Non-identifiable parameters can be detected by fitting the model to data consecutively, several¹ times, to obtain estimates of the parameters which are then examined. One such method is presented in Hengl et al. (2007). Their method, the Mean Optimal Transformation Approach (MOTA), is based on the simulation approach and is developed from the Alternating Conditional Expectation (ACE) (Breiman and Friedman, 1985). The simulation approach needs a parameter estimator that produces estimates of the parameters. One such algorithm is Potterswheel (Maiwald and Timmer, 2008), which Hengl et al. used. Another one is the well-known Gauss-Newton algorithm. However, in this thesis the Multistart Simulated Annealing (MSA) algorithm (Pettersson, 2008) is used instead. One of the advantages of the latter is that it only needs function evaluations and not derivatives of the function. An advantage over the Gauss-Newton method is that the MSA algorithm can find parameter estimates that describe the model fairly well even if the initial guess is far from the minimum. In Figure 2.1 the a priori identifiability analysis and the simulation approach are illustrated. This chapter contains an introduction to some basic results about identifiability and observability. Furthermore, the two algorithms that have been used in this thesis are explained.

¹Due to the choice of the parameter estimator.

2.1 Identifiability

To be more specific and formal about identifiability, study the following case. Given a constrained model structure, a structure with constraints given by the model,

\frac{dx(t,p)}{dt} = f(x(t,p), u(t), t, p)   (2.1a)
y(t,p) = g(x(t,p), p)   (2.1b)
x_0 = x(t_0, p)   (2.1c)
h(x(t,p), u(t); p) \ge 0   (2.1d)
t_0 \le t \le t_f,   (2.1e)

where x denotes the state variables, u the externally given input signals, p the system parameters, and y the observations. The initial values are x_0 = x(t_0, p), and h denotes all additional constraints formulated as explicit or implicit algebraic equations.

A single parameter p_i in (2.1) is globally identifiable if there exists a unique solution for p_i from the constrained model structure. A parameter with a countable or uncountable number of solutions is locally identifiable or non-identifiable, respectively. In theory one can assume that the measurements are ideal, e.g., noise-free and continuous, but in many situations this is not the case; biological data in particular include observational noise. To take this into consideration we can generalize (2.1b) to

y(t,p) = g(x(t,p), p) + \epsilon(t),

where \epsilon(t) represents the noise. As mentioned before, practical identifiability implies a priori identifiability, but not the other way around. Practical identifiability of a given model structure holds when we can determine the values of the parameters, with a small enough variance, from measurements of the input and output signals. Due to the noise, a priori identifiable parameters can become practically non-identifiable (Hengl et al., 2007).
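A standard textbook-style illustration of the local/global distinction, not taken from this thesis, is the model

\dot{x}(t) = -p^2 x(t), \qquad y(t) = x(t), \qquad x(0) = x_0,

for which y(t) = x_0 e^{-p^2 t}. The parameter values p and -p produce identical outputs, so p has exactly two solutions (a countable number): p is locally, but not globally, identifiable, whereas the combination p^2 is globally identifiable.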

2.2 Multistart Simulated Annealing

To determine the functional relationships between parameters, there is a need for an algorithm that produces estimates of these parameters that can be further analyzed. In this thesis the MSA algorithm has been used. Another algorithm that could be used is the multi-experiment fitting presented by Maiwald and Timmer (2008). From now on we shall concentrate on the first one and use it to obtain the estimates of the parameters. These estimates are then used by the algorithm presented in Section 2.3; see also Figure 2.1.

The MSA algorithm is a random search method that tries to mimic the behavior of atoms in equilibrium at a given temperature. The algorithm starts at one temperature and then cools down until it reaches the stop temperature. The temperature is proportional to the randomness of the search: the higher the temperature, the more random the search, while at lower temperatures the algorithm behaves more like a local search algorithm. One of the advantages of this algorithm is that it only needs function evaluations, not derivatives of the function. For more information about the technical aspects of the MSA algorithm see Chapter 3 in Pettersson (2008) and the references therein.

Algorithm 1 Multistart Simulated Annealing

Requirements: initial starting guess X_0, start temperature T_1, stop temperature T_s, temperature factor \nu_{temp}, cost function, low bound l_b, high bound h_b, a positive number \sigma, and the number of iterations N used for each temperature.

1: Initiate parameters and put k = 1.
2: while T_k > T_s do
3:   Perform N iterations of Annealed Downhill-Simplex, a random search (Pettersson, 2008), at temperature T_k. Each iteration gives a point \bar{p}, and N iterations give the points P = [\bar{p}_1, \bar{p}_2, \dots, \bar{p}_N].
4:   Set T_{k+1} lower than T_k, T_{k+1} = \nu_{temp} T_k, and put k = k + 1.
5:   The suitable restart points R are calculated with the help of clustering analysis. From the N points a critical distance,

       r_k = \pi^{-1/2} \left( \Gamma\left(1 + \frac{k}{2}\right) F(\sigma, l_b, h_b) \, \frac{\log(kN)}{kN} \right)^{1/k},

     is calculated. The restart points R are the points \bar{p} that lie sufficiently far from the other points according to

       R = \left\{ \bar{p} \;:\; \| \mathrm{col}(P) - \bar{p} \|_2 > r_k \right\},

     where F is a rational function, \Gamma is the Gamma function, and \mathrm{col}(P) denotes the columns of P other than \bar{p}.
6: end while
7: return the best points from each valley found during the last temperature iteration.
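As a complement to the pseudo-code, the core of a simulated-annealing search can be sketched in a few lines of Matlab. The sketch below uses a plain random step with Metropolis acceptance instead of the Annealed Downhill-Simplex and clustering analysis of Pettersson (2008), and the toy cost function is an assumption for illustration only.

    % Minimal simulated-annealing sketch (not the thesis implementation).
    cost = @(pbar) sum((pbar - [1; 2]).^2);    % toy cost function
    pbar = [0; 0];                             % initial guess X0
    T = 1; Ts = 1e-3; nu = 0.9; N = 50;        % T1, Ts, nu_temp, iterations
    best = pbar; fbest = cost(pbar);
    while T > Ts                               % temperature loop
        for i = 1:N
            cand = pbar + T*randn(size(pbar)); % random step, scale ~ T
            df = cost(cand) - cost(pbar);
            if df < 0 || rand < exp(-df/T)     % accept downhill moves, or
                pbar = cand;                   % uphill with prob. exp(-df/T)
            end
            if cost(pbar) < fbest
                best = pbar; fbest = cost(pbar);
            end
        end
        T = nu*T;                              % cool down: T_{k+1} = nu*T_k
    end
    disp(best)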

2.2.1 Cost Function

To determine which parameter vector \bar{p} gives the best fit, the principle of minimizing the prediction error has been used. Let y_i be measured output data points from the system whose parameters we want to estimate. For a parameter vector \bar{p}, the constrained model structure given by (2.1a)-(2.1e) is simulated and the outcome is a prediction \hat{y}(\bar{p})_i. The input u is known and is used when we simulate the model for different parameter vectors. The prediction error \epsilon(\bar{p})_i = y_i - \hat{y}(\bar{p})_i is used to measure how good the fit is between the measured data points and the simulated ones. From the measured output data points y_i and the simulated ones \hat{y}(\bar{p})_i the cost function calculates the cost, e.g., by

V = \sum_{i}^{N} \frac{(y_i - \hat{y}(\bar{p})_i)^2}{\mathrm{std}(\bar{y})^2},   (2.2)

where std denotes the standard deviation. When the algorithm minimizes V in (2.2) we obtain estimates of parameter vectors that hopefully can explain the output from the system fairly well. The length of the simulation is determined by the measured data points y_i, which have been measured before the experiment fitting. The time samples determine which predictions \hat{y}_i will be used in, e.g., the cost function (2.2). There are also possibilities to take other things into consideration than just the prediction error normalized by the standard deviation. We can choose how the weight is distributed between the prediction error, \tilde{V}, and an ad-hoc extra cost, \bar{V}, by

V = \alpha \tilde{V} + \bar{V},   (2.3)

where \alpha determines the weight between the two terms. For instance, if the measured signal has an overshoot, all \bar{p} that do not produce this overshoot are given a larger cost in (2.3). For models with more than one output, one choice of cost function is the trivial expansion of (2.2),

V = \sum_{i_1}^{N_1} \frac{(y_{i_1} - \hat{y}_{i_1})^2}{\mathrm{std}(y_1)^2} + \sum_{i_2}^{N_2} \frac{(y_{i_2} - \hat{y}_{i_2})^2}{\mathrm{std}(y_2)^2} + \dots   (2.4)

The model, or the constrained model structure, is simulated numerous times in the MSA algorithm to calculate the cost function, which the algorithm minimizes with respect to the parameter vector \bar{p}. In this thesis all models have been simulated either with the help of Mathematica and MathModelica or with Matlab and SBTB. In the first case the model has been written in MathModelica and then transferred to interact with Mathematica by the Mathematica Link.
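A direct Matlab transcription of the cost functions might look as follows; here y and yhat are column vectors (or matrices with one output per column), and yhat is assumed to come from simulating the model for the current parameter vector.

    % Cost (2.2): squared prediction errors normalized by the standard
    % deviation of the measured output.
    costV = @(y, yhat) sum((y - yhat).^2) / std(y)^2;

    % Multi-output expansion (2.4): one normalized term per output signal,
    % with the outputs stored as the columns of Y and Yhat.
    costVmulti = @(Y, Yhat) sum(sum((Y - Yhat).^2) ./ std(Y).^2);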

Acceptable Parameters

The MSA algorithm is an optimization algorithm that searches for an optimal parameter vector \bar{p} that minimizes the cost. How can this be of any use regarding the issue of identifiability? The reason will be thoroughly explained in Section 2.3, where the MOTA algorithm is presented. The MOTA algorithm takes an n × q matrix K = [\bar{p}_1, \bar{p}_2, \dots, \bar{p}_q] as input,

K = \begin{pmatrix}
p_{1,1} & p_{1,2} & \dots & p_{1,q} \\
p_{2,1} & p_{2,2} & \dots & p_{2,q} \\
\vdots & \vdots & \ddots & \vdots \\
p_{n,1} & p_{n,2} & \dots & p_{n,q}
\end{pmatrix},   (2.5)

where row r contains the r-th estimate of the parameters. The parameters are represented by the columns; e.g., the element in the third row and fifth column is the third estimate of the fifth parameter of the model. Here lies the answer to the question above: the MSA algorithm is used to produce this matrix K, and this is done by searching for acceptable parameters. The acceptable parameters are all parameter combinations that have a cost near the cost of the best parameter vector so far, where by the best parameter vector we mean the one with the lowest cost so far. In other words, all parameter vectors that produce an acceptable cost are regarded as acceptable parameters. In this thesis we have used a threshold of 110 percent of the best cost so far to determine whether a cost is near or not. For each cost-function evaluation, the current parameter vector \bar{p} with a given cost is taken as acceptable if its cost is near the best cost so far. These acceptable parameters are then taken as estimates of the parameters and are used to determine whether there exist any functional relationships between them.
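The bookkeeping of acceptable parameters can be sketched as below; costfun and the stream of candidate vectors are hypothetical stand-ins for the evaluations performed inside the MSA search.

    % Collect acceptable parameters: vectors whose cost is within 110
    % percent of the best cost seen so far. Each accepted vector becomes
    % one row of the matrix K in (2.5).
    K = []; bestCost = inf;
    for r = 1:size(candidates, 2)
        pbar = candidates(:, r);           % current parameter vector
        c = costfun(pbar);                 % one cost-function evaluation
        bestCost = min(bestCost, c);
        if c <= 1.10 * bestCost            % near the best cost so far
            K = [K; pbar.'];
        end
    end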

2.2.2 Settings

The MSA algorithm can be controlled by a number of settings that affect the random search. This section explains what these settings are and what they do. The settings of the algorithm are presented in Table 2.1.

Table 2.1: Settings in the MSA algorithm

Temp-start: the temperature at which the algorithm starts
Temp-end: the temperature at which the search does its last iteration
Temp-factor: the factor of the cool-down process; a higher factor implies faster cool-down after each temperature loop
Maxitertemp: the number of iterations for each temperature
Maxitertemp0: the number of iterations when temp-end has been reached
Max-time: the maximum time the search can proceed until termination
Tolx: a critical distance between the evaluated points in the optimization
Tolfun: a tolerance between the maximum and minimum function value
Maxrestartpoints: the number of possible restart points after each iteration in the temperature loop
Low bound: a low bound for the parameters being optimized
High bound: a high bound for the parameters being optimized

The algorithm consists mainly of two nested loops: one outer loop called the temperature loop and one inner loop that contains the main algorithm. The temperature loop runs as long as the temperature is above the critical stop-temperature setting. For each temperature this loop constructs a simplex, a geometrical figure, for each restart point. From the restart point, which is a q-dimensional vector, the simplex is created by modifying the restart point element by element. This modification is either a relative change or an absolute one. In the current implementation the relative change² is 1.25 times the element, and if the relative change results in an element larger than the high bound (or lower than the low bound), then an absolute³ change to ±0.5 (the sign depends on the settings of the low and high bounds in the MSA) is done instead. The result is a geometrical figure with q + 1 corners; in the two-dimensional case the simplex is a triangle. Let R = \bar{p} = [p_1, p_2] denote a restart point. The simplex,

simplex = \begin{pmatrix} p_1 & 1.25 p_1 & p_1 \\ p_2 & p_2 & 1.25 p_2 \end{pmatrix},   (2.6)

is constructed by copying the restart point and modifying its elements. For example, if the modification is only relative, the simplex is the one shown in (2.6). The idea is to update the worst point/column, the one with the highest function value, in the simplex with a better one. In the inner loop the simplex is contracted, reflected, and/or expanded, and the new point that has been retrieved is compared with the rest of the points in the simplex. When the outer loop has iterated through all the restart points, the next restart points are calculated with clustering techniques that take all the evaluated points together with their function values. Depending on a critical distance, the output is new restart points that will be used at the next temperature. More information can be found in Pettersson (2008).
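The simplex construction can be sketched as follows; the handling of the sign of the absolute change is an assumption here, since the thesis only states that it depends on the bound settings.

    % Build the initial simplex (2.6) from a restart point pbar: q+1
    % columns, where column j+1 has element j modified.
    function S = buildSimplex(pbar, lb, hb)
        q = numel(pbar);
        S = repmat(pbar(:), 1, q + 1);        % q+1 copies of the restart point
        for j = 1:q
            cand = 1.25 * pbar(j);            % relative change: +25 percent
            if cand > hb(j) || cand < lb(j)   % outside the box, so use an
                cand = 0.5;                   % absolute value of +/-0.5,
                if cand > hb(j)               % with the sign chosen to
                    cand = -0.5;              % respect the bounds
                end
            end
            S(j, j + 1) = cand;
        end
    end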

2.2.3 Input

The algorithm needs two sorts of inputs. The first is a cost function, e.g., (2.2), that determines how the function evaluations are conducted for each parameter vector \bar{p}. The second is a start guess of the parameters, \bar{p}_0, where the algorithm starts the search. This start guess is similar to the internal restart points, with the exception that, in the current version, it has to be a single vector (a single restart point).

2.2.4 Output

The original outputs from the algorithm are the best points in each valley together with their costs. However, in this thesis we are not solely interested in the optimal point. We want to get hold of several parameter estimates that can later be analyzed with respect to functional relationships between the parameters. These estimates, or acceptable parameters, form a matrix J that contains m estimates of each of the q parameters. This matrix is then used, after some modification,

in the MOTA algorithm, which will be explained in the next section. Usually the matrix J contains a large number of estimates; due to computational complexity, further explained in Section 3.3, some problems would occur if J were used directly as input to MOTA.

²A 25 percent increase of the value in the element.
³An absolute value of ±0.5 instead of the old value of the element.

2.3 Mean Optimal Transformation Approach

The Mean Optimal Transformation Approach (MOTA) was proposed by Hengl et al. (2007) and is a non-parametric bootstrap-based identifiability testing algorithm. It uses optimal transformations that are estimated with the use of the Alternating Conditional Expectation (ACE) (Breiman and Friedman, 1985). The MOTA algorithm finds linear and/or nonlinear relations between the parameters regardless of the model complexity or size. This functional relationship between the parameters is then mapped to the identifiability problem: a parameter that can be expressed by other parameters is not identifiable, and vice versa.

2.3.1 Alternating Conditional Expectation

The Alternating Conditional Expectation (ACE) algorithm was developed by Breiman and Friedman (1985). It was first intended to be used in regression analysis but has also been applied in several other fields; e.g., Wang and Murphy (2005) used it to identify nonlinear relationships. The algorithm estimates, non-parametrically, optimal transformations. In the bivariate case the algorithm estimates the optimal transformations \hat\Theta(p_1) and \hat\Phi_1(p_2) which maximize the linear correlation R between \hat\Theta(p_1) and \hat\Phi_1(p_2),

\{\hat\Theta, \hat\Phi\}_{p_1, p_2} = \sup_{\tilde\Theta, \tilde\Phi} | R(\tilde\Theta(p_1), \tilde\Phi(p_2)) |.   (2.7)

At the core of the algorithm there is a simple iterative procedure that uses bivariate conditional expectations. When the conditional expectations are estimated from a finite data set, the conditional expectation is replaced by smoothing techniques⁴ (Breiman and Friedman, 1985).

The two-dimensional case can easily be extended to any size. Let K denote an n × m matrix where n is the number of estimates and m is the number of parameters. Suppose that the m parameters have an unknown functional relationship, and let \Theta and \Phi_j denote the true transformations between the parameters,

\Theta(p_i) = \sum_{j \neq i} \Phi_j(p_j) + \epsilon,

where \epsilon is normally distributed noise. The algorithm estimates optimal transformations \hat\Theta(p_i) and \hat\Phi_j(p_j), where j \neq i, such that

\hat\Theta(p_i) = \sum_{j \neq i} \hat\Phi_j(p_j),   (2.8)

where optimal means in the sense of (2.7). The ACE algorithm distinguishes between the left- and right-hand side terms: the left-hand term is denoted the response and the right-hand terms the predictors. The calculation of (2.8) is done iteratively by the algorithm; new estimates of the transformation of the response serve as input to new estimates of the transformations of the predictors, and vice versa. The ACE algorithm is summarized in Algorithm 2. For further information about the algorithm see Breiman and Friedman (1985) and Hengl et al. (2007) and the references therein.

⁴In Breiman and Friedman (1985) they use the so-called super-smoother.

Algorithm 2 Alternating Conditional Expectation

ACE minimizes the unexplained variance between the response and the predictors. For e^2(\Theta, \Phi_1, \dots, \Phi_p) = E[\Theta(Y) - \sum_{j=1}^{p} \Phi_j(X_j)]^2 the algorithm is the following (Breiman and Friedman, 1985):

1: Initiate \Theta(Y) = Y / \|Y\| and \Phi_1(X_1) = \dots = \Phi_p(X_p) = 0.
2: while e^2(\Theta, \Phi_1, \dots, \Phi_p) decreases do
3:   while e^2(\Theta, \Phi_1, \dots, \Phi_p) decreases do
4:     for k = 1 to p do: \Phi_{k,1}(X_k) = E[\Theta(Y) - \sum_{i \neq k} \Phi_i(X_i) \,|\, X_k]; replace \Phi_k(X_k) with \Phi_{k,1}(X_k). The conditional expectation is estimated by smoothing techniques (Breiman and Friedman, 1985).
5:     end for loop
6:   end inner while
7:   \Theta_1(Y) = E[\sum_i \Phi_i(X_i) \,|\, Y] / \| E[\sum_i \Phi_i(X_i) \,|\, Y] \|; replace \Theta(Y) with \Theta_1(Y).
8: end outer while
9: \Theta, \Phi_1, \dots, \Phi_p are the solutions \Theta^*, \Phi_1^*, \dots, \Phi_p^*.
10: return
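For the bivariate case, a minimal ACE sketch in Matlab could look like the following. A moving average over the data sorted by the conditioning variable stands in for the super-smoother, and a fixed number of iterations replaces the e^2-convergence tests; movmean requires Matlab R2016a or later.

    % Minimal bivariate ACE sketch: alternating conditional expectations.
    function [theta, phi] = aceBivariate(y, x, nIter)
        theta = y / norm(y);              % initiate Theta(Y) = Y/||Y||
        for it = 1:nIter
            phi   = condExp(theta, x);    % Phi(X)   = E[Theta(Y) | X]
            theta = condExp(phi, y);      % Theta(Y) = E[Phi(X) | Y],
            theta = theta / norm(theta);  % then normalize
        end
    end

    function s = condExp(t, z)
        % Conditional expectation E[t | z], estimated by a moving average
        % over the data sorted by z (a crude stand-in for the super-smoother).
        [~, idx] = sort(z);
        s = zeros(size(t));
        s(idx) = movmean(t(idx), 15);
    end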

2.3.2 Input

The input to the MOTA algorithm is an n × q matrix K containing n estimates of the q parameters. This matrix is then analyzed with respect to functional relationships between the parameters. How the algorithm finds these relations is the topic of the next section.

2.3.3 The Idea Behind MOTA

Non-identifiability manifests itself as functionally related parameters. These relationships can be estimated by ACE, and the idea is to use these estimates to investigate the identifiability of the parameters. If there exist relationships between the parameters, the optimal transformations are quite stable from one sample to another, from a matrix K_1 to a new draw of the matrix K_2: if the first matrix K_1 renders one optimal transformation, then K_2 will render a similar one if there exists a relation between the parameters. If there is no functional relationship, these transformations will differ from sample to sample. This is a consequence of the data smoother/filter applied by the ACE algorithm (Breiman and Friedman, 1985; Hengl et al., 2007), and it is what distinguishes parameters that are linked with each other from independent ones. The process of drawing new matrices K is replaced by bootstrapping. Bootstrapping is a re-sampling method that creates re-samples from a dataset; in this case the dataset is the input matrix K. The outcome of the bootstrapping is a new matrix created by random sampling with replacement from the matrix K. The bootstrapping speeds up the algorithm significantly. Each such matrix K is denoted a single fitting sequence.
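In code, one bootstrap fitting sequence is just a draw of rows of K with replacement; with the thesis convention of half the number of estimates per sequence, a sketch is:

    % One bootstrap fitting sequence: rows of K resampled with replacement.
    n  = size(K, 1);
    Kb = K(randi(n, floor(n/2), 1), :);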

2.3.4 Test-function

A well-behaved test-function is of greatest importance, and for robustness all estimated optimal transformations are ranked. The following definition is presented in Hengl et al. (2007).

Definition 1 Let \hat\Phi_k^i(p_k^r) denote the value of the optimal transformation of parameter p_k at its r-th estimate (r-th row) in the i-th fitting sequence, and let card denote the cardinal number of a given set, i.e., the number of elements contained in the set. Then we define \alpha_k^i(p_k^r) as the function which maps each parameter estimate of a certain fitting sequence onto its cardinality divided by the total number N of fits conducted within one fitting sequence,

\alpha_k^i(p_k^r) = \frac{1}{N} \, \mathrm{card}\left\{ \hat\Phi_k^i(p_k^{r'}) \;\middle|\; r' \in \{1, \dots, N\},\ \hat\Phi_k^i(p_k^{r'}) \le \hat\Phi_k^i(p_k^r) \right\}.

This function is then used to calculate the average optimal transformation. Note that the values of \alpha_k^i(p_k^r) lie in the range [0, 1]. Let M denote the number of fitting sequences. The test-function used is the following:

H_k := \widehat{\mathrm{var}}_r \left[ \frac{1}{M} \sum_{i=1}^{M} \alpha_k^i(p_k^r) \right],   (2.9)

where \widehat{\mathrm{var}} is the empirical variance, \widehat{\mathrm{var}} = \frac{1}{q - p} \sum (\dots)^2. A motivation is needed here. For parameters that have a strong functional relationship, the average transformation

\bar{\alpha}_k(p_k^r) = \frac{1}{M} \sum_{i=1}^{M} \alpha_k^i(p_k^r)

is independent of M, meaning the variance is constant. For parameters without any functional relationship, the function \alpha_k^i(p_k^r) is not stable from one fitting sequence to another; it takes values from zero to one between fitting sequences. This implies that \bar{\alpha}_k \to 0.5 when M \to \infty, in other words zero variance. In the supplementary material of Hengl et al. (2007) the test-function is thoroughly explained; there it is shown that E[H_k] = \left( \frac{1}{12} - O(N^{-1/2}) \right) \frac{1}{M} holds for parameters that are independent, while for parameters which have a functional relationship it holds that E[H_k] = \frac{1}{12}\left(1 - \frac{1}{N^2}\right).

When the test-function H_k in (2.9) is low, this indicates that there is no functional relationship between the current response and the predictors. The other case, when H_k is not low, is more difficult. There are three threshold values, T_1, T_2 and T_3, that are used to determine whether there exists any functional relationship between the parameters. If the test-function falls below T_1, the response parameter is regarded as having no functional relationship with the predictors. If the test-function is between T_1 and T_2, there is not enough information to establish whether there exists a functional relationship between the parameters. When the test-function is above threshold T_2, there is enough information to confirm that there exists a strong relation between the response and the predictors. The third threshold, T_3, concerns the performance of the algorithm and is not important for the functionality of MOTA. In Hengl et al. (2007) and Hengl (2007) these things are thoroughly explained.
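In code, \alpha_k^i is the normalized rank of the transformed estimates within one fitting sequence, and H_k is the empirical variance, over the estimates, of their average over the M sequences. A sketch, where Phi is an N × M matrix holding the optimal-transformation values of parameter k with one column per fitting sequence:

    % Normalized ranks (the alpha values) and the test-function (2.9).
    [N, M] = size(Phi);
    alpha = zeros(N, M);
    for i = 1:M
        [~, ord] = sort(Phi(:, i));
        alpha(ord, i) = (1:N).' / N;   % rank of each estimate divided by N
    end
    Hk = var(mean(alpha, 2));          % variance over r of the mean over i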

2.3.5 Output

MOTA determines which parameters are linked with each other. These relationships are one of the outputs from MOTA and can be seen in Table 2.2. The table has a simple structure: each row corresponds to one parameter taken as the response, and the columns represent the predictors. For example, the first row is obtained when parameter p1 is taken as the response. Due to the simple structure of the table one can equally well show these relations in matrix form (2.10). In this thesis the matrix (2.10) is denoted the output q × q matrix S, where q is the number of parameters. The matrix contains only zeros and ones; in each row, ones indicate which parameters have a functional relationship. The matrix (2.10) indicates that the first parameter p1 is independent of the other parameters. This is also true for p5. The second row shows that when p2 is taken as the response, the MOTA algorithm finds that p2 is related to p2 (trivially) but also to p3 and p4. Rows three and four also display this.

Table 2.2: The parameter relations from MOTA

*    p1  p2  p3  p4  p5
p1   1   0   0   0   0
p2   0   1   1   1   0
p3   0   1   1   1   0
p4   0   1   1   1   0
p5   0   0   0   0   1

S = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 0 \\
0 & 1 & 1 & 1 & 0 \\
0 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}   (2.10)

This is an ideal output, a symmetric matrix. However, this is not the case for all matrices S. The reason is that not all parameters have equal contribution strength⁵ when taken as predictors for a certain response. The less a parameter, a predictor, contributes to the response, the noisier the transformation \hat\Phi_j(p_j) becomes. Eventually the noise is so high that the algorithm cannot distinguish the parameter from an independent one (Hengl et al., 2007). A simple example of this is a low gradient, p_1 = 0.001 p_2 + \epsilon: the transformation \hat\Phi_2(p_2) will be noisy, and the algorithm will have difficulties concluding that the two parameters are functionally related. Uneven contribution strength can thus result in output matrices that have a non-symmetric shape. A problem arises from this: if the matrix is non-symmetric, is this due to uneven contribution strength of the parameters, or is the result incorrect for some parameters?

MOTA also gives the r2-values,

r^2 = 1 - \frac{\sum (\Theta - \Phi)^2}{\sum (\Theta - \mathrm{mean}(\Theta))^2},   (2.11)

i.e., the fractional amount of the variance of the response that is explained by the predictors, as output. This will be used in the next section when we analyze the output from MOTA. The coefficient of variation,

cv = \mathrm{std}(\bar{p}) / \mathrm{mean}(\bar{p}),   (2.12)

will also be used, according to the recommendation given by Hengl (2007). Another tool in the investigation of the identifiability of a model can be seen in Table 2.3. The table contains information about the parameter estimates and also information from MOTA from a single run. The first column of the table is the index, ix, of the response parameter. The table also contains the familiar output matrix. Next are the r2-values, which give the fractional amount of variance of the response explained by the predictors; the larger the value (the maximum is one), the more variance is explained by the predictors. The cv-value is the coefficient of variation; note that the cv-value is not a percentage value as usual. The #-column is the number of special parameter combinations that have been found by the MOTA algorithm. In this case the output matrix contains two identical rows, the first and the third, which show that p1 and p3 are related; this information is stored in that column. The last column shows the parameter combination to which the response parameter ix has been found related. The difference between the r2-value of the second row and the fourth is that the test-function (2.9) was below threshold T1 (Section 2.3.4) when parameter p4 was taken as the response, but between thresholds T1 and T2 when parameter p2 was taken as the response.

⁵Some predictors contribute more or less to the response variable.

When the test-function drops below threshold T1, the r2-value is never calculated, which the zero value in row 4 shows. When the test-function has a value between T1 and T2, there is not enough information to establish whether there exists a functional relationship between the parameters; the r2-value is calculated and the algorithm does another loop with more predictors. If the test-function does not reach T2, the result is the one shown in row 2 of Table 2.3: the output matrix shows that parameter p2 is not linked to any other parameter, but the r2-value is nonzero. In Hengl (2007) there are recommendations on how to interpret the output from MOTA: functional relations with an r2-value greater than 0.9 and a cv-value greater than 0.1 are recommended, and if a functional relation has been found more than once, this is a strong indication that the parameters are linked.

Table 2.3: Output from a MOTA run and properties of the estimates

ix  p1 p2 p3 p4  r2      cv      #  pars
1   1  0  1  0   0.9936  0.5836  2  p1, p3
2   0  1  0  0   0.5203  0.6592  1  p2
3   1  0  1  0   0.9936  0.8413  2  p1, p3
4   0  0  0  1   0.0000  0.6263  1  p4
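Both diagnostics are one-liners in Matlab; a sketch computing (2.11) and (2.12), where Theta and Phi are column vectors of transformation values and pbar is one column of parameter estimates:

    % r2 (2.11): fraction of the response variance explained by predictors.
    r2 = 1 - sum((Theta - Phi).^2) / sum((Theta - mean(Theta)).^2);

    % cv (2.12): coefficient of variation of one parameter's estimates.
    cv = std(pbar) / mean(pbar);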

Accumulated Output Matrix

Due to the bootstrap-based technique used to speed up MOTA, the output can vary from one run to another. The number of estimates from the MSA also affects the output matrix. In this thesis we have taken this into account and use several MOTA runs in the hope that the result will be more robust. An accumulated output matrix is just an ordinary output matrix whose elements have been summed up, accumulated, over numerous MOTA runs. An example of an accumulated output matrix is

S_{100}^{200} = \begin{pmatrix}
100 & 0 & 100 & 0 \\
0 & 100 & 0 & 0 \\
100 & 0 & 100 & 0 \\
0 & 0 & 0 & 100
\end{pmatrix},   (2.13)

where the lower index stands for how many times we have run the MOTA algorithm and the upper index stands for how many estimates served as input to the algorithm. In this case MOTA has been run 100 times (lower index), and each run has been conducted with 200 estimates (upper index) of each parameter. The elements of the matrix can be at most as large as the number of runs of MOTA, since the ordinary output matrix only contains ones and zeros. From the matrix (2.13) one can conclude that parameters p1 and p3 seem to have some relation with each other.
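Accumulation is simply an elementwise sum of the 0/1 output matrices over repeated runs; in the sketch below, mota is a hypothetical wrapper around one MOTA run (each run bootstraps K differently, so its output can vary).

    % Accumulate output matrices over nRuns MOTA runs.
    q = size(K, 2);
    Sacc = zeros(q);
    nRuns = 100;
    for run = 1:nRuns
        Sacc = Sacc + mota(K);   % add one 0/1 output matrix per run
    end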

2.4 Observability

The observability of a system S is determined by the state variables and their impact on the output. If a state variable has no effect on the output, directly or indirectly through other state variables, the system is unobservable. A definition is given in Ljung and Glad (2006).

Definition 2 A state vector x* ≠ 0 is said to be unobservable if the output is identically zero when the initial value is x* and the input is identically zero. The system S is said to be observable if it lacks unobservable state vectors.

Observability is related to identifiability. Parameters can be considered as state variables with time derivative zero. By doing so the identifiability of the parameters is mapped to the observability rank test. This observability rank test is performed by rank calculation of the Jacobian (Anguelova, 2007). For a linear system

\dot{x} = Ax + Bu,   (2.14a)
y = Cx + Du,   (2.14b)

the observability can be determined by the well-known observability matrix

\mathcal{O}(A, C) = \begin{pmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{pmatrix},   (2.15)

where n denotes the number of state variables. The unobservable states form the linear null-space of \mathcal{O} (Ljung and Glad, 2006). There is an equivalent test, named the PBH test: the system (2.14) is observable if and only if

\begin{pmatrix} A - \lambda I \\ C \end{pmatrix}

has full rank for all \lambda (Kailath, 1980). When the parameters are considered as state variables, the model often becomes nonlinear, and therefore a tool that can determine the observability of such a system is needed. One such tool, SOT, is presented in the next section. For more information about the observability rank test see Anguelova (2007).
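For the linear case, the rank test is immediate in Matlab; the (A, C) pair below is an invented example.

    % Observability matrix (2.15) and rank test.
    A = [0 1; -2 -3]; C = [1 0];
    n = size(A, 1);
    O = C;
    for k = 1:n-1
        O = [O; C*A^k];            % stack C, CA, ..., CA^(n-1)
    end
    observable = (rank(O) == n);   % full rank <=> observable
    nullDirs = null(O);            % basis for the unobservable subspace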

2.5 Sedoglavic’s Method

Besides the Mean Optimal Transformation Approach (MOTA), which was presented in Section 2.3, the other main algorithm used in this thesis is Sedoglavic's Observability Test (SOT) (Sedoglavic, 2002). SOT is an algorithm that determines the identifiability of a system in polynomial time. A Maple implementation of the algorithm can be retrieved from Sedoglavic's homepage (Sedoglavic, 2009). SOT is an a priori (local) observability algorithm that only considers the model structure. Due to its polynomial-time properties, the algorithm is of interest when investigating the identifiability⁶ of fairly large models. With the default settings SOT is quicker than MOTA: with a 14 × 233 input matrix K, a single run of MOTA takes around an hour, whereas the a priori identifiability analysis with SOT takes a couple of seconds (Intel Celeron M processor 410, 1.46 GHz). In this thesis the algorithm has only been used in its current form in Maple.

2.5.1 A Quick Introduction

The algorithm is based on differential algebra and is related to the power series approach of Pohjanpalo (1978). In this section we present the main result from Sedoglavic (2002). Let Σ denote an algebraic system of the following form,

\Sigma : \begin{cases} \dot{\bar{p}} = 0, \\ \dot{\bar{x}} = f(\bar{x}, \bar{p}, \bar{u}), \\ \bar{y} = g(\bar{x}, \bar{p}, \bar{u}), \end{cases}   (2.16)

and assume that there are l parameters p = [p_1, p_2, \dots, p_l], n state variables x = [x_1, x_2, \dots, x_n], m output variables y = [y_1, y_2, \dots, y_m] and r input variables u = [u_1, u_2, \dots, u_r]. Let the system \Sigma also be represented by a straight-line program which requires L arithmetic operations; e.g., the expression e = (x+1)^3 is represented as the instructions t_1 := x + 1, t_2 := t_1^2, t_3 := t_2 t_1, and L = 3. The following theorem is the main result (Sedoglavic, 2002).

Theorem 2.1 Let \Sigma be a differential system described by (2.16). There exists a probabilistic algorithm which determines the set of observable variables of \Sigma and gives the number of unobservable variables which should be assumed to be known in order to obtain an observable system. The arithmetic complexity of this algorithm is bounded by

O\bigl( M(\nu) \bigl( N(n + l) + (n + m)L \bigr) + m \nu N(n + l) \bigr)

with \nu \le n + l and with M(\nu) (resp. N(\nu)) the cost of power series multiplication at order \nu + 1 (resp. of \nu \times \nu matrix multiplication). Let \mu be a positive integer, let D be 4(n + l)^2 (n + m) d, and let D' be

\bigl( 2 \ln(n + l + r + 1) + \ln \mu D \bigr) D + 4 (n + l)^2 \bigl( (n + m) h + \ln 2nD \bigr).

If the computations are done modulo a prime number p > 2 D' \mu, then the probability of a correct answer is at least (1 - \frac{1}{\mu})^2.

The detected observable variables are observable for sure; it is the unobservable variables that are unobservable with high probability. If we choose \mu = 3000, the probability of a correct answer is 0.9993 and the modulus is 10859887151 (Sedoglavic, 2002). As we pointed out earlier, the algorithm is fairly complicated, and more information can be found in Sedoglavic (2002) and Sedoglavic (2009).

⁶Parameters are regarded as states with \dot{p} = 0.

2.5.2 Usage

This section describes how to use the SOT, which is written in Maple. For more information about the usage and the syntax of the algorithm, see Sedoglavic (2009). Given an algebraic system \Sigma (2.16), we write down the equations as in the following example.

Example 2.1: Calling Observability Test in Maple

f := [x*(a-b*x)-c*x];          # vector field: diff(x,t) = x*(a-b*x) - c*x
x := [x];                      # state variables
g := [x];                      # outputs
p := [a,b,c];                  # parameters
u := [];                       # inputs (none)
observabilityTest(f,x,g,p,u);

f: a list of algebraic expressions representing a vector field.
x: a list of names such that diff(x[i],t) = f[i].
g: a list of algebraic expressions representing outputs.
p: a list of the names of the parameters.
u: a list of the names of the inputs.

All parameters are regarded as states with \dot{p} = 0, and the algorithm tests whether the states are a priori observable. If a parameter, regarded as a state variable, is a priori observable, then the parameter is a priori identifiable, as discussed in Section 2.4. The output of the algorithm is a vector that contains information about which parameters/states are observable, which parameters/states are unobservable, and also the transcendence degree: how many parameters need to be assumed known for the system to become observable/identifiable.

2.5.3 Drawbacks of the Algorithm

The SOT implementation is a pilot implementation and therefore contains some defects: the variable t must be unassigned; the list x of the names of the state variables has to be ordered such that diff(x[i],t) = f[i] represents the vector field associated with the model; and a division by zero can occur if the chosen initial conditions cancel the separant. Some functions cannot be handled. For example, the use of a square root implies that we work on an algebraic extension of a finite field; some variables and some equations have to be added in order to handle this case. The implementation is efficient for one output. If there are many outputs and the computation time is too long, then some tests can be done in the main loop and some useless computations avoided (Sedoglavic, 2009).

Chapter 3

Results

In this chapter the results are presented. First, the algorithms that have been used are verified. This is done with the help of examples for which the identifiability/observability properties are already known: the two algorithms have been applied to these examples and the outputs compared with the correct ones. Furthermore, a comparison between the MOTA algorithm written in Matlab and in Mathematica is presented. Finally, the algorithms MOTA and SOT have been applied to three models of different complexity and size.

3.1 Verifying MOTA

In the article by Hengl et al. (2007) there are two examples that are used to illustrate and demonstrate MOTA. Those examples have been reused, and the results are presented below. For the first example, the amount of Gaussian noise is not stated in Hengl et al. (2007); in this thesis we have used ε ∈ N(0, 0.1). The second example is exactly the one that Hengl et al. (2007) used. All MOTA runs have the default settings, which are: T1 = 0.01, T2 = 0.07, T3 = 0.08; the number of bootstrap samples drawn from the input matrix K is set to half the number of estimates in the matrix; and the number of bootstraps, or fitting sequences, is set to 35.

3.1.1 Example 1

The first example contains four parameters. The parameters p2, p3 and p4 have a uniform distribution on the interval I = [0, 5]. The parameter p1 depends on two of the other parameters according to p1 = p2^2 + sin(p3) + ε, where ε ∈ N(0, 0.1). To verify the MOTA algorithm we draw 100 and 200 estimates independently of each other and compare the results. The correct solution is that the first three parameters p1, p2 and p3 are functionally related and that p4 lacks any relationship with the other parameters. In other words, the parameter relations are as shown in Table 3.1, or in the output matrix form (3.1),


Table 3.1: Relationships between the parameters

*    p1  p2  p3  p4
p1   1   1   1   0
p2   1   1   1   0
p3   1   1   1   0
p4   0   0   0   1

S = \begin{pmatrix}
1 & 1 & 1 & 0 \\
1 & 1 & 1 & 0 \\
1 & 1 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}.   (3.1)

From now on we prefer to use the latter representation when we show the relationships between the parameters. As already explained, a symmetric matrix is not always obtained, due to the different contribution strengths from the predictors to the response parameter (Hengl et al., 2007; Hengl, 2007). We will also see that the number of estimates taken as input to MOTA affects the outcome considerably.
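Generating the input matrix K for this example is straightforward; a sketch with n = 200 estimates, interpreting N(0, 0.1) as a standard deviation of 0.1:

    % Example 1: four parameters, with p1 = p2^2 + sin(p3) + eps.
    n  = 200;
    p2 = 5*rand(n, 1); p3 = 5*rand(n, 1); p4 = 5*rand(n, 1);
    p1 = p2.^2 + sin(p3) + 0.1*randn(n, 1);   % eps ~ N(0, 0.1)
    K  = [p1 p2 p3 p4];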

100 estimates

When 100 estimates are drawn, the accumulated output matrix, described in Section 2.3.5, of 100 MOTA runs has the following composition (100 runs and 100 estimates):

S_{100}^{100} = \begin{pmatrix}
100 & 0 & 100 & 0 \\
0 & 100 & 0 & 0 \\
100 & 0 & 100 & 0 \\
0 & 0 & 0 & 100
\end{pmatrix}.   (3.2)

If one only looked at this matrix, one would assume that p2 and p4 are independent and that a functional relationship exists only between p1 and p3. The reason for this faulty behavior of the MOTA algorithm is that only 100 estimates are drawn in this case; more estimates have to be drawn due to the low contribution strength. The number of fits required by the algorithm depends on the functional relations of the parameters (Hengl et al., 2007). In Table 3.2, which is the outcome of a single MOTA run, we can see that the parameters p1 and p3 are functionally related and also meet the recommendations from Hengl (2007), r2 ≥ 0.9 and cv ≥ 0.1. However, since the number of fits is too low, the algorithm does not reveal that the parameter p2 is also related to p1 and p3.

Table 3.2: Output from a MOTA run and properties of the estimates from Example 1, 100 estimates

ix  p1 p2 p3 p4  r2      cv      #  pars
1   1  0  1  0   0.9936  0.5836  2  p1, p3
2   0  1  0  0   0.5203  0.6592  1  p2
3   1  0  1  0   0.9936  0.8413  2  p1, p3
4   0  0  0  1   0.0000  0.6263  1  p4

200 estimates

In this experiment 200 estimates are drawn instead of the 100 estimates in the previous experiment. The accumulated output matrix from 100 MOTA runs is shown below (100 runs and 200 estimates):

S_{100}^{200} = \begin{pmatrix}
100 & 99 & 100 & 0 \\
0 & 100 & 0 & 0 \\
100 & 99 & 100 & 0 \\
0 & 0 & 0 & 100
\end{pmatrix}.   (3.3)

The second row of S_{100}^{200} is the same as in S_{100}^{100}. If one only looked at this row, one could assume that parameter p2 lacks any relationship with the other parameters. However, rows one and three indicate that parameters p1, p2 and p3 have a strong functional relationship (the correct conclusion). This situation, where some of the rows contradict each other, is a common result from MOTA. One reason is the contribution strength; another is the bootstrap technique, which results in some random behavior of the MOTA algorithm. A third reason, as mentioned previously, is the number of estimates taken as input. It can be a bit tricky to choose how many estimates are needed for a specific run, since the underlying relationship between the parameters is in general not known. The problem is that the more estimates that are used as input, the longer the algorithm takes to calculate the relations between the parameters. However, if the number of estimates is too low, as seen above when 100 estimates are drawn, the output matrix can differ considerably from the correct one.

In Table 3.3, which is from a single MOTA run, one can see that the parameters p1, p2 and p3 are related. Rows 1 and 3 show the same result, that all three of them are functionally related, and because of the nonzero r2-value in the second row it is likely that p2 is related to the other parameters. The test-function, when parameter p2 is taken as the response, is between the first and second thresholds, which indicates that the algorithm cannot conclude whether there exists any relationship between the parameters in that case. However, since the test-function does not drop below threshold T1 and there are other rows that show that p2 is functionally related, one can suspect that the parameter p2 is indeed related.

Table 3.3: Output from a MOTA run and properties of the estimates from Example 1, 200 estimates

ix   p1   p2   p3   p4     r²       cv     #   pars
 1    1    1    1    0   0.9992   0.5590   2   p1, p2, p3
 2    0    1    0    0   0.4728   0.6062   1   p2
 3    1    1    1    0   0.9992   0.8621   2   p1, p2, p3
 4    0    0    0    1   0.0000   0.5788   1   p4

3.1.2 Example 2

The second example, also taken from Hengl et al. (2007), contains seven parameters which are related as

p1 = −p2 + 10
p3 = 5/(p4 p5)                                          (3.4)
p6 = η
p7 = 0.1,

where p2, p4, p5 and η are all uniformly distributed, drawn independently from the interval I = [0, 5]. The input to the MOTA algorithm is a matrix K = [p̄1, . . . , p̄7] and the output matrix should look something like this

S =
[ 1 1 0 0 0 0 0 ]
[ 1 1 0 0 0 0 0 ]
[ 0 0 1 1 1 0 0 ]
[ 0 0 1 1 1 0 0 ]
[ 0 0 1 1 1 0 0 ]
[ 0 0 0 0 0 1 0 ]
[ 0 0 0 0 0 0 1 ]                                       (3.5)

Two draws, one consisting of 100 estimates and the other of 200 estimates, are presented below.

100 estimates

100 estimates of p2, p4, p5 and η are drawn from the interval I = [0, 5] and the other parameters are created from (3.4). The MOTA algorithm is run 100 times, resulting in the accumulated output matrix

S^100_100 =
[ 100 100   0   0   0   0   0 ]
[ 100 100   0   0   0   0   0 ]
[   0   0 100 100 100   0   0 ]
[   0   0 100 100 100   0   0 ]
[   0   0 100 100 100   0   0 ]
[   0   0   0   0   0 100   0 ]
[   0   0   0   0   0   0 100 ]                         (3.6)

This is clearly the correct output for each run. Due to the strong contribution strength, the 100 × 7 input matrix K is sufficient to reveal these relationships between the parameters. The parameters p1 and p2 have a connection, p3, p4 and p5 show a functional relationship, and p6 and p7 are independent of all other parameters. From Table 3.4 one can see that MOTA finds the relationships between the parameters and there is no ambiguity about which parameters are linked and which are independent. The high r²-values for ix ∈ [1, 5] indicate that the predictors explain the variance very well. The independent parameters p6 and p7 are identified fairly fast, which can be seen in the zero r²-values, indicating that the test-function drops below threshold T1 in its first iteration. During the first iteration, if the predictor that gives the largest value of the test-function is not good enough, the test-function is below T1 and the algorithm concludes that the response parameter is independent.

Table 3.4: Output from a MOTA run and properties of the estimates from Example 2, 100 estimates

ix   p1  p2  p3  p4  p5  p6  p7     r²       cv     #   pars
 1    1   1   0   0   0   0   0   1.0000   0.1924   2   p1, p2
 2    1   1   0   0   0   0   0   1.0000   0.6134   2   p1, p2
 3    0   0   1   1   1   0   0   0.9832   2.4188   3   p3, p4, p5
 4    0   0   1   1   1   0   0   0.9778   0.6452   3   p3, p4, p5
 5    0   0   1   1   1   0   0   0.9692   0.5753   3   p3, p4, p5
 6    0   0   0   0   0   1   0   0.0000   0.5999   1   p6
 7    0   0   0   0   0   0   1   0.0000   0.0000   1   p7

200 estimates

When 200 estimates are drawn for the 200 × 7 matrix K, the result is the same as in the 100-estimate case, as it should be. The accumulated output matrix is shown below

S^100_200 =
[ 100 100   0   0   0   0   0 ]
[ 100 100   0   0   0   0   0 ]
[   0   0 100 100 100   0   0 ]
[   0   0 100 100 100   0   0 ]
[   0   0 100 100 100   0   0 ]
[   0   0   0   0   0 100   0 ]
[   0   0   0   0   0   0 100 ]                         (3.7)

The number of estimates, or in the general case acceptable parameters, that the MOTA algorithm needs in order to behave well obviously depends on the structure of the underlying system. In this example a change from 100 to 200 estimates does not change the behavior of the algorithm as it does for the first example. If one suspects that the number of estimates is too few, a re-run of the algorithm with more estimates is a good idea. Table 3.5 shows basically the same result as for the 100-estimate case, with one interesting change. The r²-value when ix = 6 is no longer zero. The algorithm has problems determining that parameter p6 is independent of all other parameters in this case. During the first iteration the test-function is between T1 and T2 and the r²-value is calculated. However, there are no other rows that indicate that p6 would be linked to any other parameter. Even if the algorithm gives a nonzero r²-value, this does not automatically mean that the parameter has an undiscovered functional relationship.

Table 3.5: Output from a MOTA run and properties of the estimates from Example 2, 200 estimates

ix   p1  p2  p3  p4  p5  p6  p7     r²       cv     #   pars
 1    1   1   0   0   0   0   0   1.0000   0.2045   2   p1, p2
 2    1   1   0   0   0   0   0   1.0000   0.5690   2   p1, p2
 3    0   0   1   1   1   0   0   0.9934   2.9393   3   p3, p4, p5
 4    0   0   1   1   1   0   0   0.9885   0.4898   3   p3, p4, p5
 5    0   0   1   1   1   0   0   0.9894   0.5615   3   p3, p4, p5
 6    0   0   0   0   0   1   0   0.6531   0.6175   1   p6
 7    0   0   0   0   0   0   1   0.0000   0.0000   1   p7

3.2 Verifying Sedoglavic Observability Test

To verify SOT (Sedoglavic, 2002, 2009) the algorithm has been tested on linear and nonlinear models. The properties of these models are known and the output from SOT can therefore be verified. Most of the examples have been taken from Ljung and Glad (1994).

3.2.1 A Linear Model

The linear model

ẋ1 = −x1
ẋ2 = −2x2
ẋ3 = −3x3
 y = 2x1 + x2

has been taken from Ljung and Glad (2006), page 174. The observability of the system above can be analyzed with the observability matrix (2.15). With

A =
[ −1   0   0 ]
[  0  −2   0 ]
[  0   0  −3 ]

and C = [ 2  1  0 ]

the observability matrix is

O(A, C) =
[  2   1   0 ]
[ −2  −2   0 ]
[  2   4   0 ]

Since the third column only consists of zeros, one can conclude that the state x3 is unobservable. SOT applied to the model equations above gives that the third state is unobservable, which is correct.
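As a quick numerical check, the observability matrix and its rank can be computed directly in Matlab. This is a minimal sketch of the calculation above, not part of SOT itself.

% Observability analysis of the linear model above.
A = [-1  0  0;
      0 -2  0;
      0  0 -3];
C = [2 1 0];
O = [C; C*A; C*A^2];   % observability matrix for a third-order system
rank(O)                % returns 2 < 3: the system is not observable
% The all-zero third column of O shows that x3 never reaches the output.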

3.2.2 Goodwin’s Napkin Example

Goodwin’s Napkin example, from Ljung and Glad (1994), page 9, is described by

ÿ + 2θẏ + θ²y = 0.                                      (3.8)

In its current form one cannot use SOT; however, with the help of the following transformations,

x1 = y  ⇒  ẋ1 = ẏ = x2
x2 = ẏ  ⇒  ẋ2 = ÿ = −2θẏ − θ²y = −2θx2 − θ²x1,

one can use SOT. When observing y, the output of the system, SOT gives that all parameters and states are observable. This is also the conclusion made by Ljung and Glad (1994), page 9.
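The rewritten first-order system can be simulated directly. The following Matlab sketch uses an assumed illustrative value θ = 0.5 and an assumed initial state [1; 0]; neither value is specified in the thesis.

% Simulate Goodwin's napkin example in its state-space form.
theta = 0.5;                                   % assumed illustrative value
f = @(t, x) [x(2);                             % x1' = x2
             -2*theta*x(2) - theta^2*x(1)];    % x2' = -2*theta*x2 - theta^2*x1
[t, x] = ode45(f, [0 10], [1; 0]);             % assumed initial state [y; ydot]
plot(t, x(:, 1));                              % the observed output y = x1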

3.2.3 A Nonlinear Model

In Ljung and Glad (1994), page 9, a simple nonlinear model is presented. The model is described by the following equations

ẋ1 = θx2²
ẋ2 = u
 y = x1.

With the algorithm presented in Ljung and Glad (1994) the output is that the system is a priori identifiable. SOT gives the same result.

3.2.4 Compartmental Model

A compartmental model is given as Example 4 in Ljung and Glad (1994), page 10. The model equations are

ẋ(t) = −Vm x(t)/(km + x(t)) − k01 x(t)
x(0) = D
   y = cx.

Ritt’s algorithm in Ljung and Glad (1994) gives that the only parameter that is a priori identifiable is k01, if the initial value D is regarded as unknown. SOT does not take initial values of the states into account, and the outcome is the same as for Ritt’s algorithm: the parameter k01 is a priori identifiable.

3.3 Implementation Aspects and Computational Complexity

The main part of this master’s thesis work has been to translate the Multistart Simulated Annealing (MSA) and Mean Optimal Transformation Approach (MOTA) algorithms from Matlab to MathModelica/Mathematica. In the beginning it was planned that all testing and evaluation would be conducted in the MathModelica and Mathematica environment. However, due to timing issues, testing has mainly been performed in Matlab. The reason for this is twofold. Firstly, the MSA algorithm requires a large number of simulations of the system with different sets of parameters, so a fast simulation is vital for the performance and timing of the algorithm. Secondly, in MOTA the ACE algorithm is run many times, and without efficient code the whole MOTA becomes very slow, so testing fairly complicated models takes a long time. Another problem is the computational complexity of MOTA. The number of estimates that MSA produces is in most cases far more than MOTA can handle. A selection algorithm is needed, and the one used in this thesis is presented in a section below.

3.3.1 Simulation

In the MSA algorithm at least one simulation is performed each time the cost function is calculated. This process can be troublesome if the simulation is not fast enough. In the MathModelica/Mathematica software environment the simulation is performed in MathModelica with the help of the Mathematica link. The main program is written in Mathematica and uses MathModelica when the simulation is to be carried out. For each parameter vector p̄ the simulation is run and the output is then used in Mathematica. The simulation is initiated anew for every p̄, without any pre-compilation that would make the simulation more efficient. Herein lies the reason why Matlab is a lot faster than the current implementation of the MSA algorithm in MathModelica/Mathematica. In Matlab the simulation is performed in the SBTB with the SBADDON package. When the MSA starts, the model equations are converted to a compiled C-file and the parameters are passed as arguments each time the simulation is run. The result is that the simulation is about 100 times faster than the usual implementation. This is the main reason why the MSA algorithm is much faster in Matlab than in Mathematica. In MathModelica one can do similar things for a faster simulation, and this has also been pointed out to MathCore. It is essential for the performance of the algorithm that the simulation is time efficient.

3.3.2 ACE

As mentioned earlier, MOTA uses optimal transformations to identify functionally related parameters. In practice this is done by calculating the average of the optimal transformations and then applying a test-function to determine which parameters are linked to each other. These optimal transformations are calculated in ACE and the core function is vital for the speed of the whole algorithm. In the current version of MOTA in Mathematica, this is conducted either by call-by-value¹ or by using a global structure to calculate the optimal transformations. However, this is not fast enough. In the Matlab version of MOTA the core function in ACE is written in a C-file, and the result is much faster than it would be if the function were written as an ordinary .m-file. In MathModelica one can use MathCode to do similar things: a pre-compiled C-file can be used instead of evaluation of Mathematica notebook (.nb) files. MathCore has been informed about this timing problem with ACE.

3.3.3 Selection Algorithm

Although the MOTA algorithm has no limit on the number of estimates in the input matrix K, the computational complexity increases with an increasing number of estimates. Therefore a selection algorithm is needed that reduces the number of estimates from MSA. The idea is to get fewer and sparser estimates that MOTA can handle. This is in line with the recommendation cv ≥ 0.1 in Hengl (2007); the sparser the estimates, the higher the cv-value in general. The selection algorithm used is

sqrt( Σ_{k=1}^{n} (p̄_{i,k} − p̄_{j,k})² ) ≥ d,    i ∈ I, i ≠ j,    n = #parameters    (3.9)

where I denotes the determined acceptable parameters. The selection algorithm calculates the Euclidean distances between the parameter vectors p̄, and if the distance is lower than a critical distance d, for any i ∈ I, the current parameter vector is not used. The algorithm is also described in Algorithm 3.

3.4 Evaluation on Real Biological Data

In this section the algorithms SOT and MOTA (together with MSA) will be applied to a number of models from systems biology. The data used in the cost function is real measurement data.

¹The values of the variables are copied, which takes computational time.

Algorithm 3 Selection algorithm for the acceptable parameters

The critical distance d can be changed in (3.9) and the selection algorithm is the following:

1: sort all parameter vectors p̄ in ascending order on their corresponding value of the cost function; denote this matrix B.
2: take the first parameter vector in matrix B, the one with the lowest value/cost.
3: remove all points in matrix B that have a Euclidean distance smaller than d to the taken parameter vector.
4: the taken parameter vector is chosen as an acceptable parameter; if B is not empty, go to 2.
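A minimal Matlab sketch of Algorithm 3 is given below. The function name selectEstimates and its calling convention are our own, not from the thesis software; P is assumed to hold one acceptable parameter vector per row and cost the corresponding cost values.

function sel = selectEstimates(P, cost, d)
% Greedy selection of sparse estimates (Algorithm 3).
% P    - m-by-n matrix, one acceptable parameter vector per row
% cost - m-by-1 vector of cost function values for the rows of P
% d    - critical Euclidean distance from (3.9)
[~, order] = sort(cost, 'ascend');        % best (lowest cost) first
B   = P(order, :);
sel = zeros(0, size(P, 2));
while ~isempty(B)
    p   = B(1, :);                        % lowest-cost remaining vector
    sel = [sel; p];                       %#ok<AGROW> keep it
    dist = sqrt(sum(bsxfun(@minus, B, p).^2, 2));
    B(dist < d, :) = [];                  % drop everything closer than d
end
end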

3.4.1 Model 1: SimplifiedModel

The first model that will be investigated is called SimplifiedModel and consists of the following equations

ẋ1 = −Vm x1/(km + x1)                                   (3.10a)
ẋ3 =  Vm x1/(km + x1)                                   (3.10b)
x1(0) = 4                                               (3.10c)
x3(0) = 0                                               (3.10d)
    y = x3.                                             (3.10e)

The name SimplifiedModel originates from the fact that the model is a simplification of a larger model, in the sense of having fewer states. This is the reason why the first model only contains the states x1 and x3. The decay rate of x3 in the larger model is assumed to be negligible and the remaining reactions are in the classical Michaelis-Menten form. This is the reason why the model (3.10) is used. The larger model is presented below,

ẋ1 = −k1x1 + km1x2
ẋ2 = k1x1 − km1x2 − k2x2
ẋ3 = k2x2 − k3x3
x1(0) = 4
x2(0) = 0
x3(0) = 0
   y = x3.

The output data ymeas = x3(t) has been measured eight consecutive times at the time-samples t ∈ [0, 0.2, 0.4, . . . , 2.0]. The results are presented in Table 3.6. The measurements have been made using so-called Western blots, which allow for time-resolved measurements of the state of phosphorylation of various proteins; here the insulin receptor and the insulin receptor substrate-1. The measurements were made by the group of Peter Strålfors at IKE, Linköping University.

Table 3.6: This table presents the values from the eight measurements. The time samples are shown in the first column

time    ȳ1      ȳ2      ȳ3      ȳ4      ȳ5      ȳ6      ȳ7      ȳ8
0.0   -0.087  -0.333   0.025   0.058  -0.230   0.238   0.238  -0.008
0.2    3.367   3.549   3.286   3.841   3.377   3.427   3.617   3.416
0.4    4.014   3.688   4.098   4.280   3.817   4.127   4.206   3.636
0.6    3.913   4.131   4.156   4.136   4.251   4.127   4.231   3.753
0.8    3.674   4.046   3.783   4.277   3.833   4.100   4.038   3.810
1.0    3.791   4.116   4.094   4.331   4.111   3.864   4.069   3.791
1.2    3.991   3.928   4.210   3.617   4.077   4.170   4.138   4.107
1.4    4.104   3.939   3.914   3.931   3.695   3.943   4.013   4.053
1.6    4.113   4.148   4.176   3.790   4.031   4.036   3.787   3.840
1.8    4.065   4.004   3.859   3.875   4.075   3.797   4.143   4.100
2.0    3.747   3.545   4.182   3.881   4.050   4.032   3.989   3.784

In Figure 3.1 all eight measurements are plotted against time. Note that these samples are measurements of the same signal, and due to noise they are not all identical. Because of the variation between the measurements, a mean value is computed. The mean can also be seen in Figure 3.1. The model has been tested with the help of MOTA and SOT to identify which, if any, parameters are non-identifiable.
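As a small illustration, the mean curve and a quadratic cost of the kind minimized by MSA can be computed as follows. Here Y and ysim are assumed variables (the data matrix from Table 3.6 and a simulated output at the same time samples), and the sum-of-squares form is an illustrative assumption, not necessarily identical to the cost function (2.4) used in the thesis.

% Y is assumed to hold the eight measurement series as columns (11-by-8),
% ysim a simulated output at the same time samples (11-by-1).
ymean = mean(Y, 2);              % average the replicates column-wise
cost  = sum((ysim - ymean).^2);  % illustrative quadratic cost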

Results from SOT

The equations of SimplifiedModel (3.10) are similar to those of the compartmental model in Section 3.2.4. A significant difference is that in the former case the measured signal y is x3, without any unknown parameter involved in the expression. When applying SOT to SimplifiedModel the output is that all states and parameters are a priori observable.

Results from MOTA

The model (3.10) has been simulated in the MSA algorithm and the acceptable parameters, that is, the parameters that give a sufficiently low value of the cost function (within 110 percent of the best one), were collected. These acceptable parameters (542 × 2) were taken as input to the MOTA algorithm. 10 runs of MOTA have been conducted and the accumulated output matrix is the following


Figure 3.1. This figure shows (a) the eight measurements of ymeas = x3(t), and (b) the mean of the measurements, which is used in the MSA when calculating the cost for a given parameter vector p̄

S^10_542 =
[ 10  10 ]
[ 10  10 ]                                              (3.11)

MOTA identifies that there exists a strong functional relationship between the parameters Vm and km. In Figure 3.2 the optimal solution for the parameters from the MSA is plotted against the mean values of the measurements. In Table 3.7 the result from a single MOTA run is presented. The r²-value is high and the functional relationship between Vm and km is strong. This follows the recommendations given in Hengl (2007), r² ≥ 0.9 and cv ≥ 0.1.

Table 3.7: Output from a MOTA run and properties of the estimates from Model 1, 542 estimates

ix   Vm   km     r²       cv     #   pars
 1    1    1   0.9991   0.3623   2   p1, p2
 2    1    1   0.9991   0.3637   2   p1, p2


Figure 3.2. A comparison between the mean values and the output from the model when the optimal parameter vector p̄ is used. The stars represent the mean values


Figure 3.3. This figure is for the model (3.10) and shows the estimates from MSA of the two parameters Vm and km plotted against each other. The linear relationship is prominent

Analysis

The results from SOT and MOTA differ. This is not a contradiction, as one might assume, but rather an indication that the model is practically non-identifiable. In other words, in the ideal case the parameters can be estimated from the output y (there exists no input u in this case), but because the data is not of sufficient quality, the parameters become non-identifiable in practice. An indication of why the model is practically non-identifiable can be derived from the expression (3.12). When x1 is small, the expression can be approximated as follows,

ẋ3 = Vm x1/(km + x1) ≈ (Vm/km) x1 = k x1,               (3.12)

that is, the original model equation can be approximated by the linear equation k x1. This implies that only the quotient Vm/km = k can be determined. The result is that the parameters Vm and km are functionally related and non-identifiable. Figure 3.3 shows this behavior; one can clearly see a linear relationship between the parameters. The correlation value between the two vectors V̄m and k̄m of acceptable parameters is as high as 0.99. This is a schoolbook example of the impact of practical non-identifiability: even if the a priori identifiability analysis indicates that the parameters can be estimated from the input and output, this does not imply that the parameters can be estimated in practice.
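This behavior is easy to reproduce numerically. The Matlab sketch below simulates (3.10) for two parameter pairs with the same quotient Vm/km = 10; the specific values are assumed for illustration only (they lie in the range of the estimates in Figure 3.3, where km ≫ x1 so the approximation (3.12) holds).

% Two (Vm, km) pairs with the same quotient Vm/km give almost the same output.
rhs = @(Vm, km) @(t, x) [-Vm*x(1)/(km + x(1));    % x1' from (3.10a)
                          Vm*x(1)/(km + x(1))];   % x3' from (3.10b)
tspan = 0:0.2:2;                                  % same sampling as Table 3.6
x0    = [4; 0];                                   % initial state from (3.10c-d)
[~, xa] = ode45(rhs(2e4, 2e3), tspan, x0);
[~, xb] = ode45(rhs(2e5, 2e4), tspan, x0);
max(abs(xa(:, 2) - xb(:, 2)))                     % tiny: only Vm/km is visible in y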

3.4.2 Model 2: Addition of insulin to the media

The model described by (3.13) is a minimal model of the first few steps in the insulin receptor activation. It contains insulin binding, receptor internalization into the cytosol, and recycling from the cytosol back to the membrane. A common observation in systems biology is an overshoot, that is, a measured signal that shoots over its final value. The model in this section has been created to give one such overshoot before reaching the final value. For more information about the model see Cedersund et al. (2008) and Brännmark et al. (2009). The model equations are

İR  = −k1 u IR + kR IRi                                 (3.13a)
İRp = k1 u IR − kID IRp                                 (3.13b)
İRi = kID u IRp − kR IRi                                (3.13c)
IR(0)  = 10.0                                           (3.13d)
IRp(0) = 0.0                                            (3.13e)
IRi(0) = 0.0                                            (3.13f)
     y = kY IRp.                                        (3.13g)

The input u = ins is the insulin level. In the following measurements this signal has been equal to one, u = u0 ≡ 1. The measured signal in the model and in the experiment is y = kY IRp(t) and it has been measured three consecutive times. The result can be viewed in Figure 3.4. As one can see, the experimental data differs from measurement to measurement, indicating significant noise. The time-samples, t ∈ [0, 0.9, 1.15, 1.30, 2.30, 3.30, 5.30, 7, 15], at which the measurements are taken are not equidistant. Due to bad measurements the third sample, at t = 2.3, has been corrupted and is not used. The parameter vector in this example is p̄ = [k1, kID, kR, kY]. The mean of the measurements is illustrated in Figure 3.4 and it is used in the MSA algorithm when calculating the cost for a given parameter vector p̄.
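The overshoot that (3.13) was constructed to produce can be inspected by simulating the model directly. The rate constants in the sketch below are assumed illustrative values, not the estimates obtained by MSA.

% Simulate the insulin receptor model (3.13) with constant insulin input u = 1.
k1 = 1; kID = 2; kR = 0.2; kY = 10; u = 1;        % assumed illustrative values
f = @(t, x) [-k1*u*x(1) + kR*x(3);                % IR'
              k1*u*x(1) - kID*x(2);               % IRp'
              kID*u*x(2) - kR*x(3)];              % IRi'
[t, x] = ode45(f, [0 15], [10; 0; 0]);            % initial state from (3.13d-f)
plot(t, kY*x(:, 2));                              % y = kY*IRp overshoots its final value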

Results from SOT

When applying SOT to the model (3.13) we get that the only parameter that is a priori non-identifiable is kY; all other parameters are a priori identifiable. This result will be taken into account when MOTA is used.

Results from MOTA

The model described by (3.13) has been run and simulated in the MSA algorithm. The acceptable parameters were taken as the parameter vectors p̄ with a cost below 110 percent of the best cost so far. Six different starting points for the algorithm were chosen randomly between the low and high bounds in the MSA algorithm. The settings for the MSA can be seen in Table 3.8. The low and high bounds and the six starting points can be viewed in Table 3.9 and Table 3.10.


Figure 3.4. This figure shows (a) the three measurements of y = kY IRp(t), and (b) the mean of the measurements

The result was six different sets of acceptable parameters. Simply summing up these subsets would give 32946 estimates for every parameter, which is far more than MOTA can handle in the input matrix K. Therefore the selection algorithm (3.9) has been applied to this larger set of acceptable parameters.

Table 3.8: Settings in the MSA for the experiments in Model 2, p̄ = [k1, kID, kR, kY]

Temp start           10000
Temp end             1
Temp factor          0.1
Maxitertemp          4000
Maxitertemp0         12000
Max restart points   10

Table 3.9: The low and high bounds and the starting points for the first and second run for the parameters

Parameter   Low bound   High bound   Start guess run 1   Start guess run 2
k1          0.01        500          387.6278            495.2873
kID         0.01        500          103.3198            55.929
kR          0.01        500          211.8210            257.3806
kY          0.01        500          489.6217            106.6258

Table 3.10: The starting points of the parameters for run 3 to run 6

Parameter   Start guess run 3   Start guess run 4   Start guess run 5   Start guess run 6
k1          224.8788            161.3668            22.9590             460.4434
kID         113.3238            281.1164            260.1913            156.9335
kR          20.3859             414.9658            418.6167            243.7368
kY          244.8283            23.4521             483.8139            205.4742

After the larger set has been run through (3.9) with d = 1, the acceptable parameters are reduced to a 558 × 4 matrix K, and it is this matrix that is taken as input to MOTA. The MOTA algorithm has been run 100 times and the result is the accumulated output matrix

S^100_558 =
[ 100    0    2    0 ]
[   0  100   22  100 ]
[  10    0  100    0 ]
[   0  100    8  100 ]                                  (3.14)

This accumulated output matrix is non-uniform and the rows contradict each other. However, the indication is that the parameters kID and kY have a strong functional relationship and are therefore non-identifiable. There are some other, smaller indications: rows 1 and 3 suggest that k1 and kR would be connected, and rows 2 and 4 suggest that kR would be related to both kID and kY. To determine whether these smaller indications describe the true linkage between the parameters, or are just a random element in MOTA, another run of the MSA is conducted. The new run uses new low and high bounds, taken from the lowest and highest value of each parameter among the previous acceptable parameters. The settings are the same as before and can be viewed in Table 3.8. In Table 3.11 the start guess for the seventh run is presented.

Table 3.11: Settings in the MSA for the experiment M7

Parameter   Low bound   High bound   Start guess run 7
k1          0.69        625          270.7719
kID         0.68        26           6.6090
kR          0.28        10           2.514
kY          15          500          488.6613

The selection algorithm (3.9) is once again applied, with d = 1, to the new set, and the resulting input matrix K consists of 233 estimates for each parameter. MOTA has now been run 1000 times and the accumulated output matrix is the following


Figure 3.5. This figure shows the output from the optimal parameter vector p̄ when applied to the model (3.13). The mean values from the measurements are marked with stars

S^1000_233 =
[ 1000     0     0     0 ]
[    0  1000   151  1000 ]
[    0     0  1000     0 ]
[    0  1000   109  1000 ]                              (3.15)

The indication that k1 and kR would have a connection is gone. On the other hand, the indication that kR is related to kID and kY is still there. The result is similar to the result in Section 3.1.1, with the difference that in this example the percentage values are lower. In Table 3.12 the result from a single MOTA run is shown. With the recommendations from Hengl (2007), r² ≥ 0.9 and cv ≥ 0.1, we conclude that the parameters kID and kY have a strong functional relation.

Table 3.12: Output from a MOTA run and properties of the estimates from Model 2, 223 estimates

ix   k1  kID  kR  kY     r²       cv     #   pars
 1    1    0   0   0   0.7914   0.6365   1   p1
 2    0    1   0   1   0.9421   1.4250   2   p2, p4
 3    0    0   1   0   0.7928   0.0227   1   p3
 4    0    1   0   1   0.9421   1.4066   2   p2, p4

In Figure 3.5 the optimal solution for the parameters from the MSA is plotted against the mean values of the measurements. One can see the overshoot clearly, and the model follows the measurements fairly well.


Figure 3.6. A plot showing the estimates of kID and kY . The cluster in the lower left corner is zoomed in and presented in Figure 3.7

Analysis

In Figure 3.6 the parameters kID and kY are plotted against each other. Here one can see why the MOTA algorithm finds a strong functional relationship between the parameters. There are two clusters, and the first cluster, the one in the lower left corner, is enlarged in Figure 3.7. The linear properties are clear, and only the quotient between the parameters kID and kY seems to be identifiable, which renders both of these parameters non-identifiable. The parameter kY is a priori (and practically) non-identifiable and the parameter kID is practically non-identifiable. The other parameters, k1 and kR, are practically identifiable.


Figure 3.7. This figure shows one of the clusters in Figure 3.6. The linear property is striking

3.4.3 Model 3: A model for the insulin receptor signaling, including internalization

The third and last model that is examined in this thesis is also a biological one. This model describes the same system as Model 2, but somewhat more extensively. More states are included in the insulin receptor activation, and also the first substrate, insulin receptor substrate-1, is included. For more information about the model and the measurements see Cedersund et al. (2008) and Brännmark et al. (2009). The model is described by

İR     = −k1 u IR − k1basal IR + kR IRptp + km1 IRins   (3.16a)
İRins  = k1 u IR + k1basal IR − k2 IRins − km1 IRins    (3.16b)
İRp    = k2 IRins − k3 IRp + km3 IRpPTP                 (3.16c)
İRpPTP = k3 IRp − km3 IRpPTP − kD IRpPTP                (3.16d)
İRptp  = kD IRpPTP − kR IRptp                           (3.16e)
İRs    = −k4 (IRp + IRpPTP) IRs + km4 IRSP              (3.16f)
İRSP   = k4 (IRp + IRpPTP) IRs − km4 IRSP               (3.16g)
IR(0)     = 10.0                                        (3.16h)
IRins(0)  = 0.0                                         (3.16i)
IRp(0)    = 0.0                                         (3.16j)
IRpPTP(0) = 0.0                                         (3.16k)
IRptp(0)  = 0.0                                         (3.16l)
IRs(0)    = 10.0                                        (3.16m)
IRSP(0)   = 0.0                                         (3.16n)
yIRp        = kY1 (IRp + IRpPTP)                        (3.16o)
yDoubleStep = kY2 IRSP                                  (3.16p)
yAnna       = kYAnna IRSP                               (3.16q)
yDosR       = kYDosR IRSP.                              (3.16r)

As one can see, the model contains seven states and fourteen parameters. Three consecutive measurements have been used when calculating the cost for the different parameter vectors. The cost function in this case is similar to the one presented in (2.4). The model (3.16) is tested with SOT and MOTA and the results are presented in the following sections.

Results from SOT

SOT applied to the model (3.16) gives that the following parameters are a priori non-identifiable: [k3, km3, kD, k4, kY1, kY2, kYAnna, kYDosR]. The remaining parameters, [k1, k1basal, km1, k2, kR, km4], are a priori identifiable.

Results from MOTA

Before testing the model in MOTA, a run of the MSA algorithm is done. The settings can be viewed in Table 3.13. The start guess, shown in Table 3.14, is the best known parameter vector minimizing the current cost function, and it is used in this case.

Table 3.13: Settings in the MSA for the experiments in Model 3

Temp start           10000
Temp end             0.5
Temp factor          0.1
Maxitertemp          14000
Maxitertemp0         42000
Max restart points   30

Table 3.14: Low and high bounds and the starting point in MSA

Parameter   Low bound   High bound   Start guess
k1          0.01        500          149.1
k1basal     0.0001      100          0.0001
km1         0.01        500          413.1
k2          0.01        500          0.7564
k3          0.01        500          6.0889
km3         0.01        500          381.0
kD          0.01        500          481.0
kR          0.01        500          0.3369
k4          0.01        1500         292.5
km4         0.01        1500         1496.5
kY1         10          25           15.2186
kY2         1           100          53.1776
kYAnna      1           100          100.0
kYDosR      1           100          98.2

From the MSA algorithm we get over 200000 estimates (acceptable parameters) when we run the algorithm with the settings from Table 3.13. These are too many, so we use the selection algorithm described in Algorithm 3 with d = 100. The number of estimates then drops to 290, and these are used as input to the MOTA algorithm. 30 runs of MOTA give us the accumulated output matrix (3.17),

S^30_290 =
[ 30  0  0  0  0  0  0  0  0  0  0  0  0  0 ]
[  0 30  0  0  0  0  0  0  0  0  0  0  0  0 ]
[  8  8 30  8  0  0  0  1  0  0  0  1  7  5 ]
[  0  0  0 30  0  0  0 30  0  0  0  0  0  0 ]
[  0  0  0 11 30 21 21 12 21 21  0 18  2  1 ]
[  0  0  0  0 23 30 23  0 23 22  0  4  8  0 ]
[  0  0  0  0 18 18 30  0 18  0  0  0  0  0 ]
[  0  0  0 30  0  0  0 30  0  0  0  0  0  0 ]
[  0  0  0  0  0  0  0 30 30 30  0  0  0  0 ]
[  0  1  0  1  1  1  0 28 29 30  0  2  0  0 ]
[  0  0  0  0  0  0  0  0  0  0 30  0  0  0 ]
[  0  0  0  0  0  0  0  0  0  0  0 30 30  0 ]
[  0  0  0  0  0  0  0  0  0  0  0 30 30  0 ]
[  0  0  0  0  0  0  0  0  0  0  0 30  0 30 ]           (3.17)

It is hard to deduce which parameters are identifiable and which are not. However, it seems that k1, k1basal, km1 and kY1 are practically identifiable. The rest of the parameters seem to be functionally related to each other in different ways. The parameters k2 and kR appear to have a connection, and kR, k4, km4 is another parameter combination that is linked. Rows 12 and 13 show that kY2 and kYAnna are solely connected; however, row 14 indicates that the linkage between kY2 and kYDosR is strong. In Table 3.15 the result from a single MOTA run is presented. Due to the large output matrix, the parameter relations from Table 3.15 are shown in Table 3.16 instead. In this MOTA run the conclusions from (3.17) are confirmed. The parameters that seem to be identifiable are k1, k1basal, km1 and kY1. The second observation is that k2 and kR are related. On the other hand, the r²-values for rows 4 and 8 are below 0.90, which does not follow the recommendations of Hengl (2007). A third observation concerns the parameters kY2 and kYAnna. Because the r²-value is higher than 0.90 and the cv-value is higher than 0.1, Hengl (2007) recommends concluding that there is a real linkage between them. The other rows contradict each other, and it is difficult to see which parameters are related and in what way.

Table 3.15: Output from a MOTA run and properties of the estimates from Model 3, 290 estimates. The found relations can be viewed in Table 3.16

ix     r²       cv     #   pars
 1   0.9978   0.3518   1   k1
 2   0.9961   0.4518   1   k1basal
 3   0.9977   0.2640   1   km1
 4   0.8797   0.5076   2   k2, kR
 5   0.9982   2.6240   1   k2, k3, km3, kD, kR, k4, km4, kY2
 6   0.9670   0.6022   1   k3, km3, kD, k4, km4
 7   0.9916   0.5630   1   k3, km3, kD, k4
 8   0.8797   0.2710   2   k2, kR
 9   0.9304   0.6395   2   kR, k4, km4
10   0.8555   0.3539   2   kR, k4, km4
11   0.9936   0.3188   1   kY1
12   0.9440   0.1862   2   kY2, kYAnna
13   0.9440   0.1726   2   kY2, kYAnna
14   0.8951   0.1898   1   kY2, kYDosR

Table 3.16: Relationships between the parameters from a single MOTA run for Model 3. k1b = k1basal, kYA = kYAnna and kYD = kYDosR

k1  k1b km1  k2  k3 km3  kD  kR  k4 km4 kY1 kY2 kYA kYD
 1   0   0   0   0   0   0   0   0   0   0   0   0   0
 0   1   0   0   0   0   0   0   0   0   0   0   0   0
 0   0   1   0   0   0   0   0   0   0   0   0   0   0
 0   0   0   1   0   0   0   1   0   0   0   0   0   0
 0   0   0   1   1   1   1   1   1   1   0   1   0   0
 0   0   0   0   1   1   1   0   1   1   0   0   0   0
 0   0   0   0   1   1   1   0   1   0   0   0   0   0
 0   0   0   1   0   0   0   1   0   0   0   0   0   0
 0   0   0   0   0   0   0   1   1   1   0   0   0   0
 0   0   0   0   0   0   0   1   1   1   0   0   0   0
 0   0   0   0   0   0   0   0   0   0   1   0   0   0
 0   0   0   0   0   0   0   0   0   0   0   1   1   0
 0   0   0   0   0   0   0   0   0   0   0   1   1   0
 0   0   0   0   0   0   0   0   0   0   0   1   0   1

Analysis

Let us recollect the outcome of the two algorithms. SOT gives the result that the parameters [k1, k1basal, km1, k2, kR, km4] are a priori identifiable. The rest, [k3, km3, kD, k4, kY1, kY2, kYAnna, kYDosR], are concluded to be non-identifiable.

MOTA, on the other hand, gives that the parameters that seem to be practically identifiable are k1, k1basal, km1 and kY1. As said before, practical identifiability implies a priori identifiability. Conversely, if a parameter is a priori non-identifiable, it cannot be practically identifiable; otherwise the parameter would be a priori identifiable, which is a contradiction. A reader who has paid attention to the results above has already noticed that the two results contradict each other. The contradiction is due to the parameter kY1. SOT classifies it as non-identifiable while MOTA classifies it as practically identifiable. How can this happen? Which algorithm is correct? As a matter of fact, SOT is the one with the correct result. The reason for this faulty behavior of MOTA is that the parameter kY1 is never used in the cost function used by MSA. This is a user mistake rather than a fault in MOTA: the measurements of yIRp were not used when constructing the cost function. If a parameter is not used when calculating the cost, the MSA algorithm cannot handle it correctly. A change in the parameter kY1 does not affect the cost, and the parameter becomes disconnected from the rest of the parameters. This disconnection is treated by MOTA as an independent parameter, and the outcome is the same as if the parameter were identifiable. SOT, on the other hand, considers only the model equations, and from that perspective the parameter kY1 is a priori non-identifiable. There are three other parameters that seem to be practically identifiable according to MOTA: k1, k1basal and km1. They are, according to SOT, a priori identifiable, so there is no contradiction in this case. Therefore it is likely that these are the only parameters in the model (3.16) that can be determined from the input and output signals. The other parameters that are a priori identifiable according to SOT are k2, kR and km4. Both k2 and kR show an indication that they are related, according to the accumulated output matrix (3.17) and also the single run of MOTA in Table 3.16. However, the r²-value is below the recommended level. This is a sign that there may be more parameters related to k2 and kR, rather than a sign that the parameters are independent. Parameter km4 appears to be related at least to parameter k4. In the accumulated output matrix (3.17) there are some contradicting rows with respect to km4. This non-symmetric output matrix is a real problem when determining which parameters are functionally related. To summarize the conclusions for this third model: three parameters, k1, k1basal and km1, appear to be practically identifiable. Three parameters, k2, kR and km4, are a priori identifiable but practically non-identifiable. Parameter kY1 is a special case; its value is never used in the MSA algorithm and it is therefore mistaken for an independent parameter by MOTA. The rest of the parameters are a priori non-identifiable.

Chapter 4

Conclusions and Future Work

In this chapter the conclusions of the thesis will be presented together with some ideas for future work.

4.1 Conclusions

First of all, the algorithms that have been evaluated, MOTA and SOT, can be used together. The use of one of them does not exclude the use of the other. When we try to determine the identifiability issue of a model both the a priori identifiability analysis and the simulation approach can be used successively, which is also addressed by Hengl et al. (2007). As a matter of fact, the use of a priori identifiability analysis, in this case SOT, helps a great deal when we are trying to decipher the output matrix from MOTA. The parameters that have been found a priori identifiable from the a priori identifiability analysis are the only ones that can be practically identifiable. This is under the assumption that the model equations are used correctly and all parameters affect the cost when calculating the cost function, when the acceptable parameters are searched for. This was not the case in model 3 in Section 3.4.3. The SOT algorithm, although it has some known drawbacks (Sedoglavic, 2009), is efficient for the examples and models that have been tested in this thesis. Time is an important observation and since the algorithm is polynomial in time the SOT is highly interesting when a fast algorithm is needed to determine the a priori identifiability aspects of a model. However, even if the algorithm is polynomial in time, large models with many states and parameters can be time-consuming, but it is anyway far better than most of the exponential ones. A problem that is prominent during this thesis is the non-symmetric shape of the output matrix from MOTA. The reason for this is that the matrices often consist of contradicting rows which makes it difficult to decode the functional relationship of the parameters. Fortunately the SOT was a big help many times

47 48 Conclusions and Future Work when the relation between the parameters were examined. The non-symmetric matrices from MOTA depend on a great deal of the es- timates of the parameters, or acceptable parameters. These estimates have been produced by MSA. Therefore the settings of the MSA are of importance for the result of the MOTA algorithm. The number of fits/estimates depends on the underlying functional relation and for different models these relations differ. Be- cause of that it is hard to know how many fits that are required by the MOTA algorithm to reveal the parameters that are connected with each other. The con- tribution strength also inflicts on the output matrix. A parameter when taken as a predictor on the right-hand side in MOTA can contribute more or less to the response. A parameter with low contribution strength to a certain response can therefore be mistaken as a non-related parameter to the response leading to a non-symmetric output matrix. Also the quality of the acceptable parameters affects the behavior of MOTA. How to get sufficiently good estimates from the MSA? In this thesis a selection algorithm is used to get more sparse estimates of the parameters which are then used by MOTA. This is according to Hengl (2007) recommended due to the cv ≥ 0.1 recommendation when deciphering the functional relationship from the output matrix. Even if the MOTA algorithm is difficult to manage, the problem of practically non-identifiable parameters is of great interest and a big problem. If the model in question is a priori identifiable this does not directly imply that the parameters can be estimated in practice. The quality of data may not have been considered and could result in practically non-identifiable parameters. Due to this, more focus is required on the quality of the measurements from the input and output signals.

4.2 Proposal for Future Work

One proposal for future work is to go through the work of Sedoglavic more thoroughly. In this thesis we have only scratched the surface of his algorithm, and a deeper understanding of SOT would shed more light on the subject of identifiability. Another interesting subject is the relations between the parameters: how the parameters are connected and which parameters need to be known for the model to be identifiable. An algorithm that can be of help here is one from Sedoglavic, Observability Symmetries (Sedoglavic, 2009), which determines how the parameters are linked. The space of acceptable parameters is also a field that can be examined more thoroughly. Which methods are more suitable than others for identifying these acceptable parameters? How good is the MSA algorithm for obtaining these parameters? The last proposal for future work is to investigate how to obtain symmetric output matrices from MOTA. What can be done to reduce the possibility of getting non-symmetric output matrices? If this were solved, MOTA would be a lot more useful and reliable to work with.

Bibliography

M. Anguelova. Observability and identifiability of nonlinear systems with appli- cations in biology. PhD thesis, Chalmers University of Technology, 2007.

S. Audoly, G. Bellu, L. D’Angiò, M.P. Saccomani, and C. Cobelli. Global identifiability of nonlinear models of biological systems. Biomedical Engineering, 48(1):55–65, 2001.

L. Breiman and J. Friedman. Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association, 80(19):580–598, 1985.

C. Brännmark, R. Palmer, T. Glad, G. Cedersund, and P. Strålfors. Receptor internalization is necessary but not sufficient for control of insulin signalling in adipocytes. Submitted, 2009.

G. Cedersund, J. Roll, E. Ulfheilm, A. Danielsson, H. Tidefelt, and P. Strålfors. Model-based hypothesis testing of key mechanisms in initial phase of insulin signaling. PLoS Comput Biol., 4(6), 2008.

P. Fritzson. Principles of object-oriented modelling and simulation with Modelica 2.1. IEEE Press, 2003. ISBN 0-471-47163-1.

S. Hengl. Quickstart to the MOTA-Software, 2007.

S. Hengl, C. Kreutz, J. Timmer, and T. Maiwald. Data-based identifiability analysis of non-linear dynamical models. Bioinformatics, 23(19):2612–2618, July 2007.

T. Kailath. Linear Systems. Prentice Hall, 1980.

L. Ljung and T. Glad. On global identifiability for arbitrary model parametriza- tions. Automatica, 30(2):265–276, 1994.

L. Ljung and T. Glad. Reglerteknik – Grundläggande teori. Studentlitteratur, 2006. ISBN 978-91-44-02275-8.

T. Maiwald and J. Timmer. Dynamical modeling and multi-experiment fitting with PottersWheel. Bioinformatics, 24(18):2037–2043, 2008.


Mathcore. http://www.mathcore.com, 2009.

Modelica. http://www.modelica.org/tools, 2009.

T. Pettersson. Global optimization methods for estimation of descriptive models. Master’s thesis, Linköping University, 2008.

H. Pohjanpalo. System identifiability based on the power series expansion of the solution. Mathematical Biosciences, 41:21–33, 1978.

Sedoglavic. http://www2.lifl.fr/~sedoglav/, 2009.

A. Sedoglavic. A probabilistic algorithm to test local algebraic observability in polynomial time. Journal of Symbolic Computation, 33(5):735–755, 2002.

S. Vajda, K. Godfrey, and H. Rabitz. Similarity transformation approach to identifiability analysis of nonlinear compartmental models. Mathematical Biosciences, 93:217–248, 1989.

D. Wang and M. Murphy. Identifying nonlinear relationships in regression using the ACE algorithm. Journal of Applied Statistics, 32:243–258, 2005.

S. Wolfram. The Mathematica Book. Cambridge University Press, 1999. ISBN 0-521-64314-7.

Appendix A

Programming Examples

A.1 Mathematica

Example A.1: An example of Mathematica Coding

Procedural programming:

sum=0;
For[i=1, i<=1000, i++, If[Mod[i,2]==0, sum+=i]];
sum

Functional programming:

Apply[Plus, Select[Range[1000], EvenQ]]

Rule-based programming:

Range[2,1000,2] //. {y_, x_, z___} -> {x+y, z}

A.2 MathModelica

Example A.2: An Example of Modeling in MathModelica

type Voltage = Real(unit="V");
type Current = Real(unit="A");

connector Pin
  Voltage v;
  flow Current i;
end Pin;

model TwoPin "Superclass of elements with two electrical pins"
  Pin p, n;
  Voltage v;
  Current i;
equation
  v = p.v - n.v;
  0 = p.i + n.i;
  i = p.i;
end TwoPin;

model Resistor "Ideal electrical resistor"
  extends TwoPin;
  parameter Real R(unit="ohm") "Resistance";
equation
  R*i = v;
end Resistor;

A.3 Maple

Example A.3: Coding the factorial function in Maple

Imperative programming:

myfac := proc(n::nonnegint)
  local out, i;
  out := 1;
  for i from 2 to n do
    out := out * i
  end do;
  out
end proc;

Another way, using the ’maps to’ arrow notation:

myfac := n -> product( i, i=1..n );