THE STOCHASTIC DYNAMICS OF BIOCHEMICAL SYSTEMS

A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Engineering and Physical Sciences

2013

By Joseph Daniel Challenger
School of Physics and Astronomy

Contents

Abstract 8

Declaration 9

Copyright 10

Acknowledgements 11

Publications 12

1 Introduction 13
1.1 Stochastic effects in biochemical systems ...... 15
1.2 The role of noise in oscillatory behaviour ...... 17
1.3 Theoretical and numerical techniques ...... 18
1.4 Our approach ...... 19
1.5 Introducing COPASI ...... 20
1.6 Outlook ...... 22

2 Background 23
2.1 The master equation ...... 23
2.1.1 An example ...... 26
2.2 Deterministic dynamics ...... 27
2.3 Simulating the master equation ...... 28
2.3.1 The Gillespie algorithm ...... 29
2.3.2 Other simulation algorithms ...... 33
2.4 The van Kampen expansion ...... 34
2.5 The Fokker-Planck equation ...... 37
2.5.1 Solving the Lyapunov equation ...... 39
2.6 The Langevin equation ...... 41

2.7 Stochastic oscillations & power spectra ...... 43
2.8 Summary ...... 46

3 Fluctuation Analysis and COPASI 50
3.1 Using COPASI ...... 50
3.2 Formalism ...... 53
3.3 Details of the implementation ...... 59
3.4 Conservation relations ...... 60
3.5 The linear noise approximation in COPASI ...... 63
3.6 A MAPK signalling system ...... 66
3.7 Summary ...... 69

4 Multi-compartment LNA 73
4.1 A two-compartment system ...... 74
4.2 The general case ...... 77
4.3 Relation between Â and Ã ...... 81
4.4 A model of metabolism in cardiac muscle ...... 83
4.5 A yeast glycolysis model ...... 86
4.6 Discussion ...... 91

5 Synchronisation of stochastic oscillators 92
5.1 Collective behaviour in Dictyostelium ...... 99
5.2 Synchronisation of glycolytic oscillations ...... 106
5.3 Discussion ...... 110

6 Conclusions 112

Bibliography 118

A Ill-Conditioned Systems 128

B Preliminary Code 133

C Form of the complex coherence function 140

Word Count: 28,981

List of Tables

3.1 The matrix for the Michaelis-Menten reaction system ...... 65
3.2 Reaction scheme for the Kholodenko MAPK signalling model ...... 67
3.3 Covariance matrix for the MAPK signalling model ...... 70
3.4 A comparison of the optimisation algorithms in COPASI ...... 71

4.1 The three-compartment reaction system ...... 80
4.2 Covariances of the fluctuations for the three-compartment system ...... 81
4.3 The creatine kinase model at the steady-state ...... 85
4.4 Covariances for the creatine kinase model ...... 86
4.5 Reaction scheme for the yeast glycolysis model ...... 88
4.6 Covariances for the yeast glycolysis model ...... 90

List of Figures

1.1 The user interface for the software package COPASI...... 21

2.1 Flow chart illustrating the Gillespie algorithm ...... 31
2.2 Output from the Gillespie algorithm ...... 32
2.3 A comparison of the stochastic (dark blue) and deterministic (light blue) dynamics ...... 32
2.4 Time series of stochastic oscillations ...... 47

2.5 Power spectra for species Y1, Y2 and Y3...... 48

3.1 List of chemical reactions, as displayed in COPASI ...... 52
3.2 The COPASI display for the LNA task ...... 64
3.3 Reaction scheme for the Kholodenko model ...... 67
3.4 Oscillations observed in MAPKK ...... 68
3.5 The of MKKK fluctuations around the steady state ...... 70
3.6 Two-dimensional parameter scan of the covariance of the fluctuations of MKKK and MKK-P ...... 71

4.1 The proposed geometry of the two-compartment model ...... 75
4.2 Reaction scheme of the creatine kinase model ...... 84
4.3 Reaction scheme of the yeast glycolysis model ...... 87
4.4 Stochastic oscillations in the yeast glycolysis model ...... 89

5.1 Absolute value of the CCF for species Y and Z ...... 97
5.2 Parametric plot of the CCF for species Y and Z ...... 98
5.3 Plot of the phase spectrum for species Y and Z ...... 98
5.4 Theoretical power spectra for the internal (larger peak) and external cAMP ...... 100
5.5 Magnitude of the (theoretical) CCF for the internal and external cAMP fluctuations ...... 101

5.6 Parametric plot of the (theoretical) CCF for the internal and external cAMP fluctuations ...... 101
5.7 Diagram from Kim et al. illustrating the three-cell model ...... 102
5.8 Time series for cAMPi for a three-cell model ...... 103
5.9 Theoretical power spectra for cAMPi in each cell of the two-cell model ...... 103
5.10 The magnitude of the CCF for the cAMPi in each cell of the two-cell model ...... 104
5.11 Parametric plot of the CCF for the cAMPi in each cell of the two-cell model ...... 104
5.12 Phase lag for oscillations of cAMPi in each cell for the two-cell model ...... 105
5.13 Results for a two-cell model, where the cell volumes are not identical ...... 107
5.14 Results for a two-cell model, where the cell volumes are not identical ...... 108
5.15 Phase spectrum for the cAMP fluctuations in a two-cell model for non-equal cell volumes ...... 109

5.16 Results for oscillations in species A3 ...... 110

5.17 Results for oscillations in species A3 for increased coupling strength ...... 111

Abstract

The stochastic dynamics of biochemical systems
Joseph Daniel Challenger
A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy, 2013

The topic of this thesis is the stochastic dynamics of biochemical reaction systems. The importance of the intrinsic fluctuations in these systems has become more widely appreciated in recent years, and should be accounted for when modelling such systems mathematically. These models are described as continuous-time Markov processes, with their dynamics defined by a master equation. Analytical progress is made possible by the use of the van Kampen system-size expansion, which splits the dynamics into a macroscopic component plus stochastic corrections, statistics for which can then be obtained. In the first part of this thesis, the terms obtained from the expansion are written down for an arbitrary model, enabling the expansion procedure to be automated and implemented in the software package COPASI. This means that the fluctuation analysis may be used in tandem with other tools in COPASI, in particular parameter scanning and optimisation. This scheme is then extended so that models involving multiple compartments (e.g. cells) may be studied. This increases the range of models that can be evaluated in this fashion.
The second part of this thesis also concerns these multi-compartment models, and examines how oscillations can synchronise across a population of cells. This has been observed in many biochemical processes, such as yeast glycolysis. However, the vast majority of modelling of such systems has used the deterministic framework, which ignores the effect of fluctuations. It is now widely known that the type of models studied here can exhibit sustained temporal oscillations when formulated stochastically, despite no such oscillations being found in the deterministic version of the model. Using the van Kampen expansion as a starting point, multi-cell models are studied, to see how stochastic oscillations in one cell may influence, and be influenced by, oscillations in neighbouring cells. Analytical expressions are found, indicating whether or not the oscillations will synchronise across multiple cells and, if synchronisation does occur, whether the oscillations synchronise in phase, or with a phase lag.

Declaration

No portion of the work referred to in this thesis has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.

Copyright

i. The author of this thesis (including any appendices and/or schedules to this thesis) owns certain copyright or related rights in it (the “Copyright”) and s/he has given The University of Manchester certain rights to use such Copyright, including for administrative purposes.

ii. Copies of this thesis, either in full or in extracts and whether in hard or electronic copy, may be made only in accordance with the Copyright, Designs and Patents Act 1988 (as amended) and regulations issued under it or, where appropriate, in accordance with licensing agreements which the University has from time to time. This page must form part of any such copies made.

iii. The ownership of certain Copyright, patents, designs, trade marks and other intellectual property (the “Intellectual Property”) and any reproductions of copyright works in the thesis, for example graphs and tables (“Reproductions”), which may be described in this thesis, may not be owned by the author and may be owned by third parties. Such Intellectual Property and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property and/or Reproductions.

iv. Further information on the conditions under which disclosure, publication and commercialisation of this thesis, the Copyright and any Intellectual Property and/or Reproductions described in it may take place is available in the University IP Policy (see http://documents.manchester.ac.uk/DocuInfo.aspx?DocID=487), in any relevant Thesis restriction declarations deposited in the University Library, the University Library’s regulations (see http://www.manchester.ac.uk/library/aboutus/regulations) and in the University’s policy on Presentation of Theses.

Acknowledgements

First of all, I would like to thank my supervisor Alan McKane, for giving me the opportunity to work with him for the last three years. Alan’s knowledge of the field and his intuitive ability to identify and refine research ideas have been extremely useful, and working with him has been a very rewarding experience. I would also like to thank Tobias Galla for his support, and for encouraging me to present my work wherever possible. I have benefited greatly from working in the adventure-filled office 7.26, where ideas and help are both readily available. This is useful at any stage of a PhD, but especially at the start, when the learning curve is steep. Therefore, I thank past and present office-sharers Richard, Alex, Chris, Andy, Duncan, Mohammed, Tommaso, George, James and Toby. Our group has benefited from the presence of several post-docs over the last two years. I am especially grateful to Tim Rogers, whose tenacity for solving research problems and wide mathematical knowledge have been a great resource and example. I would also like to thank our collaborators in the COPASI team, Jürgen Pahle and Pedro Mendes. Finally, I must thank EPSRC for the postgraduate grant that made this work possible.

Publications

The work described in Chapter 3 was done in collaboration with Alan McKane, Jürgen Pahle and Pedro Mendes, and has been published as:

Biochemical fluctuations, optimisation and the linear noise approximation. BMC Systems Biology, 6:86, (2012).

The work described in Chapter 4 was done in collaboration with Alan McKane and Jürgen Pahle, and has been published as:

Multi-compartment linear noise approximation. Journal of Statistical Mechanics, P11010, (2012).

The work described in Chapter 5 was done in collaboration with Alan McKane, and has been accepted for publication in Physical Review E.

Chapter 1

Introduction

Biochemistry is the study of the chemical processes at work in living organisms. These processes underpin crucial tasks such as controlling chemical energy flow, signalling between cells and gene regulation. Understanding the dynamics of biochemical reaction systems is therefore a very important part of understanding the processes at work in living organisms. The dynamics of these systems have been examined using experimental, theoretical and numerical techniques. Both theoretical and numerical work require a model of the system to be made. This model must contain a description of the different types, or ‘species’, of molecules present in the system and the possible chemical reactions which could occur between these molecules. The model also requires some knowledge about the compartment (e.g. cell) in which these molecules are contained: its size, and whether molecules of some or all species can enter or leave it. In making such a model, the modeller, whether aware of it or not, is answering the following question: “for my purposes, what level of detail is required to capture the behaviour of the real biochemical system?”

In the last two decades, the importance of stochastic effects in these systems has become much more widely appreciated, in part due to several important experimental studies. The reason why these effects should be present is clear: for a bimolecular reaction event to occur, two molecules of the requisite species must collide, and it is these random collisions that introduce the stochasticity. The modeller must decide whether to explicitly include this randomness in their approach.

Any modelling approach will need to quantify the abundances of the various chemical species present in the system. Because the system is composed of molecules, a natural approach is to enumerate the system by the number of molecules of each species present at a given time. Another approach considers the concentrations of the chemical species, usually in molar concentrations, denoted by M (moles per litre). As Avogadro’s number is so large (approximately 6.022 × 10^23) it is often assumed that such a concentration can be considered to be a continuous variable. That is, if a single molecule is gained or lost due to a reaction event, the change to the concentration is tiny. This assumption is particularly useful, as it means that the time-evolution of the biochemical process may be described by a set of coupled ordinary differential equations (ODEs). Of course, in reality the populations of molecules will still fluctuate, due to the underlying stochasticity of the reaction processes. However, if the populations are sufficiently large, the fluctuations will be tiny in comparison, and thus can be ignored.
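As a concrete sketch of this deterministic description, consider a hypothetical birth-death system (not a model analysed in this thesis): a species X is produced at constant rate k1 and degraded at rate k2·x, giving the ODE dx/dt = k1 − k2·x. The rate values below are purely illustrative.

```python
# Sketch: deterministic (ODE) description of a hypothetical birth-death system,
#   0 -> X at constant rate k1,   X -> 0 at rate k2 * x.
# The reaction scheme and rate values are illustrative only.
from scipy.integrate import solve_ivp

k1, k2 = 10.0, 0.5   # illustrative rate constants

def rhs(t, x):
    # dx/dt = k1 - k2*x: the concentration x is treated as a continuous variable
    return [k1 - k2 * x[0]]

sol = solve_ivp(rhs, (0.0, 20.0), [0.0])
x_final = sol.y[0, -1]   # relaxes towards the fixed point x* = k1/k2 = 20
```

The solution relaxes smoothly to the fixed point x* = k1/k2, with no trace of the molecular-level fluctuations: in this picture the discreteness of the population has been averaged away.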

However, because cells are very small [1], and chemical species may be present in concentrations as low as µM, or even nM, the populations of some, or all, of the species present in a cell may not be large enough for the fluctuations to be negligible. In these cases, the ODE approach will not give a wholly accurate description of the biochemical process, and a fully stochastic approach should be used.

Another important decision to be made when modelling biochemical reaction systems is whether or not the model explicitly involves space. In principle, a molecular dynamics approach could be used, where the position and velocity of every molecule is tracked via computer simulation, with reaction events occurring due to collisions between molecules. However, this approach is extremely computationally expensive. In fact, most modellers make the assumption that the system under consideration is ‘well mixed’. That is, the concentration of any particular species is the same at any point within the compartment. This assumes that either the system is being constantly stirred, or that diffusion is fast enough to ensure an even distribution of molecules [2, 3]. Another justification for this assumption is that, of all the collision events that occur in the compartment, only a small minority will lead to a reaction. The non-reactive collisions randomise the positions of the molecules, and stop spatial correlations from emerging [2]. One reason why explicitly spatial techniques, such as modelling a process as a reaction-diffusion system, are not widely used is that spatially resolved experimental data, which could be used to validate such a model, are still rare [4].

In the following section we will give some examples of biochemical systems where stochastic effects are significant. We will also discuss the consequences of these effects, which can be detrimental; in some cases, however, they appear to be integral to the underlying processes. It is generally true that, when making mathematical models of biochemical systems, the deterministic formalism is more straightforward to work with. The stochastic formalism introduces the concept of probability into the model, and so solutions to the equations can only be discussed in terms of statistical quantities, or probability distributions. Therefore, it is important to provide stark examples of how the stochastic effects present in these systems can have profound effects on the system dynamics, and why they should be properly considered in the modelling.

1.1 Stochastic effects in biochemical systems

As already touched upon, stochastic effects in biochemical systems are particularly significant when the populations of some, or all, of the constituent species are low. This is often the case in gene expression, the process by which proteins are synthesised. A gene, which is a length of DNA, is expressed via two processes: transcription and translation [5]. In transcription, information contained in the gene is copied to RNA. The number of transcripts from a particular gene is sometimes called the ‘copy number’. In translation, information stored in the RNA is used to construct chains of amino acid molecules. These chains are proteins, which carry out tasks within the cell. Experiments have shown that, in many cases, the average copy numbers of certain genes per cell are in single figures [6]. For example, in yeast cells Zenklusen et al. found that for the gene MDN1 the average copy number was 6.1, and for the gene KAP104 it was 2.6 [7]. Furthermore, across the population of cells, a large variation in copy number was found: the copy number of gene MDN1 varied between 1 and 15 in a single cell. For systems like this one, displaying low copy numbers, the deterministic approach, which replaces a discrete population with a continuously-varying one, is clearly limited, as a change in the population, due to e.g. a degradation event, results in a large ‘jump’ in the relative population size. Neither will the deterministic model tell us anything about the variation across the population of cells, which in this case results in large deviations from the average behaviour.

A great deal of experimental work has been carried out to quantify the variability observed in gene expression, due to the underlying randomness of the interactions. These experiments have been motivated by questions such as how noise can lead to phenotypic variation [8, 9] and how noise can propagate in a gene network [10]. These experiments will continue to become more sophisticated, as experimental techniques develop. For example, single-molecule techniques enable experimentalists to count the individual mRNAs and proteins in the cell, which is very useful for quantifying the presence, and role of, noise in the process [11].

Stochasticity, or ‘noise’, is often considered to have a detrimental effect on the functioning of a biochemical process. This is because it can cause large variations in reaction rates, due to the fluctuating molecular populations, which can lead to significant deviations in the process output. An example of this is the loss of precision in cell signals [12]. Clearly, intracellular processes need to be robust to fluctuations or perturbations, whether internal or external [13]. Some reaction systems employ negative feedback loops to minimise the effects of noise on the overall process [14, 15], suggesting that noise has affected the way in which the reaction system has evolved [16, 17].

However, for many processes, it has been shown that noise has beneficial effects, or is even integral to the process, as is the case for probabilistic pathway selection. A well-known example of this is the lysis/lysogeny decision in λ-phage-infected E. coli cells [18]. Here, molecular-level fluctuations influence which pathway the cell follows, and therefore determine the fate of the cell as a whole. The result of this is that a homogeneous population of cells splits into distinct sub-populations, in a way which cannot be predicted deterministically. This behaviour is due to two proteins, independently produced, which ‘compete’ to control a developmental switch. Therefore, fluctuations in the rates of expression of the two relevant genes can affect which pathway the cell follows [18]. Probabilistic differentiation can also be interpreted as a strategy enabling division of labour, or bet-hedging [19, 20].

It has been claimed that in certain cases, noise actually enhances the robustness of particular properties of the system. Calcium oscillations, in their signalling role, are involved in the regulation of many cellular processes. Using a mathematical model of calcium oscillations, Perc and Marhl showed that the addition of noise makes the oscillations more robust to external perturbations [21].

1.2 The role of noise in oscillatory behaviour

Temporal oscillations are observed in a wide range of biochemical processes, e.g. metabolism and signalling [22, 23, 24]. If a system is known to exhibit oscillatory behaviour from experimental observation, this will have a big impact on how the system is modelled theoretically, as the modeller will wish to capture this behaviour [25]. It also poses questions such as, “over what range of reaction parameter values will oscillations be observed?”. Whether a system is modelled as a stochastic or deterministic process can determine whether oscillations are observed. To capture sustained oscillations using the deterministic framework, the ODEs must exhibit either limit cycle behaviour or chaotic behaviour [26]. In the limit cycle case, the concentrations of the chemical species oscillate with a unique frequency. If the solution to the ODEs is a chaotic attractor, the oscillations will not have a unique period. However, it is now well established that intrinsic noise can produce oscillations (sometimes called quasi-cycles) which are not observed in a deterministic model [27, 28, 29]. This effect can produce a significant, qualitative difference in results obtained from the stochastic and deterministic models of the same biochemical process. This has important consequences for modellers attempting to capture sustained oscillations, observed in experiment, in their model: it is not necessary to create a model which, described deterministically, has limit cycle or chaotic behaviour. Purely considering oscillations resulting from deterministic models limits the model types, and parameter ranges, for which oscillations are observed. In Chapter 2 we will describe the mechanism which leads to quasi-cycles, and detail the cases in which they occur.

Single-cell oscillators can also synchronise in a population of cells. This has been observed experimentally in a range of systems, e.g. in glycolytic oscillations in yeast cells [30, 31], where the individual cells are coupled via a shared extracellular reservoir. The idea of synchrony is an important one, as many biochemical processes require a population of cells working together. Much theoretical work has looked at synchronisation of oscillations in deterministic models where the individual cells show limit cycle behaviour [32, 33], but less work has been done on synchronisation of stochastic oscillations. Using numerical simulations, Kim et al. examined stochastic oscillations across a population of cells in a model of the social amoeba Dictyostelium discoideum, and showed that synchronisation of the oscillations is observed [34]. In Chapter 5, we shall look at this effect from an analytical viewpoint.
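The spectral signature of quasi-cycles can be illustrated numerically with a generic sketch (not one of the models analysed later): a damped linear oscillator, whose deterministic dynamics decay to a fixed point, acquires a sustained spectral peak at nonzero frequency when driven by noise. All parameter values below are illustrative.

```python
# Sketch: power spectrum of a noise-driven damped linear oscillator. The
# deterministic dynamics decay to a fixed point, yet the noisy trajectory
# shows a spectral peak at nonzero frequency (a quasi-cycle).
# All parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
dt, n_steps = 0.01, 200_000
omega0, gamma, sigma = 2.0 * np.pi, 0.5, 1.0  # natural frequency, damping, noise strength

# Euler-Maruyama integration of  x'' + 2*gamma*x' + omega0^2 * x = sigma * xi(t)
x, v = 0.0, 0.0
traj = np.empty(n_steps)
for i in range(n_steps):
    traj[i] = x
    x += v * dt
    v += (-2.0 * gamma * v - omega0**2 * x) * dt + sigma * np.sqrt(dt) * rng.standard_normal()

# Periodogram, lightly smoothed so the location of the peak is easy to read off
freqs = np.fft.rfftfreq(n_steps, d=dt)
power = np.abs(np.fft.rfft(traj - traj.mean())) ** 2
power = np.convolve(power, np.ones(101) / 101, mode="same")
peak_freq = freqs[np.argmax(power)]  # close to omega0 / (2*pi) = 1
```

The peak sits near the deterministic oscillation frequency omega0/(2π), even though any single deterministic trajectory simply decays; the noise continually re-excites the damped oscillatory mode.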

1.3 Theoretical and numerical techniques

We have looked at some examples of biochemical processes for which stochastic effects are important, to justify the need for a stochastic framework to describe the dynamics of these systems. A common assumption made when studying stochastic processes is that the process is memory-less. That is, predictions of future events depend only on the current state of the system. This greatly simplifies the analysis, as no knowledge of the history of the process is required. Processes which have this property are called Markov processes [35]. We shall describe the state of the system by the vector n, where n = (n1, n2, ..., nk). This specifies the molecular populations of all k species present in the system. This can be seen as a mesoscopic approach, as the number of molecules of each species is counted at the population level, and no attempt is made to record the fate of any particular molecule. The system ‘jumps’ from one state to another when a reaction event occurs. These jumps are governed by transition probabilities, which are determined by the details of the chemical reactions. The dynamics of such a system is described by the so-called master equation [36], which we will derive in Chapter 2. This equation has long been used to study the type of systems we shall encounter here [37]. The equation describes the time-evolution of P(n, t), the probability for the system to be found in state n at time t. In general, however, this equation cannot be solved exactly. Probabilistically exact trajectories of the stochastic process can be found computationally, using numerical methods such as the Gillespie algorithm [2, 38]. However, to obtain relevant statistics for the process, such as a mean or variance, it is necessary to average over a large number of such trajectories, which can be computationally expensive. For this reason, it is often considered advantageous to make analytic progress by finding an approximate solution to the master equation. This can be done by noting that a jump process, such as the one described here, can be approximated by a diffusion process [39], which, in general, is easier to study. Thus, the master equation is reduced to a diffusion process by approximate methods, such as the Kramers-Moyal expansion, or the van Kampen expansion [35, 39]. The latter will be described in detail in Chapter 2. Due to its relative ease of use, the van Kampen expansion, which is sometimes referred to as the ‘system-size expansion’ or linear noise approximation (LNA), has often been used to describe stochastic effects in the type of systems we shall study here [40, 41, 29].
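The simulation approach just described can be made concrete with a minimal implementation of the Gillespie direct method, here applied to a hypothetical birth-death reaction (production at constant rate k1, degradation at rate k2·n); the scheme and parameter values are illustrative only, not one of the models studied later.

```python
# Sketch: the Gillespie direct method for a hypothetical birth-death reaction,
#   0 -> X at rate k1,   X -> 0 at rate k2 * n.
# Scheme and parameters are illustrative only.
import numpy as np

rng = np.random.default_rng(1)
k1, k2 = 10.0, 0.5
n, t, t_end = 0, 0.0, 200.0
samples = []

while t < t_end:
    a1, a2 = k1, k2 * n             # propensities of the two reactions
    a0 = a1 + a2
    t += rng.exponential(1.0 / a0)  # waiting time to the next reaction event
    if rng.random() * a0 < a1:      # choose which reaction fires
        n += 1
    else:
        n -= 1
    samples.append(n)

# Event-averaged population over the second half of the run; the deterministic
# fixed point is k1/k2 = 20
mean_n = float(np.mean(samples[len(samples) // 2:]))
```

Each run produces one probabilistically exact trajectory; reliable statistics require averaging over many such runs (or a long stationary stretch, as above), which is precisely the computational cost that motivates approximate analytical methods such as the LNA.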

1.4 Our approach

In this work, we will use techniques mentioned in the previous section, developed in statistical physics, to describe the time-evolution of a stochastic process. We will describe biochemical reaction systems by integer molecular populations, with ni denoting the population of species i, rather than using continuous chemical concentrations. In this thesis, we will focus on stochastic models for which the deterministic analogue has a fixed point, or ‘steady-state’ [26]. That is, the concentrations of all species reach fixed values at long times. We will use these analytical techniques to obtain statistics for the fluctuations around the steady-state. The system will be described by the master equation [35], which we will solve approximately, using the expansion method due to van Kampen [42, 35]. We will automate this expansion process so that, once a reaction system has been described, statistics for the noise, here in the form of a covariance matrix, can be found by computer software. This will eliminate the lengthy algebra produced by the expansion, and will make the technique available to a wider audience. In addition to biochemistry, the techniques used here have been applied to problems in many disciplines, such as ecology [28, 43], genetics [44], linguistics [44], epidemiology [45] and game theory [46]. To study the effects of noise in such systems, it is possible, and indeed simpler, to use the deterministic model of the system, i.e. the set of coupled ODEs, and simply add a Gaussian noise term to each equation. This approach is not ideal, however, since it is not clear how large these noise terms should be, or how they are correlated with each other. The master equation, on the other hand, contains knowledge of the actual reaction events, so even if approximate methods, such as the LNA, are used to solve it, the form of the noise is properly motivated.
It is possible for two different reaction systems, with different statistics for their fluctuations, to share the same set of ODEs when studied deterministically [40], so when studying such systems stochastically it is important to begin with a framework which elucidates the individual reaction events.
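Within the LNA, the stationary covariance matrix C of the fluctuations satisfies a Lyapunov equation, A C + C Aᵀ + B = 0, where A is the Jacobian of the deterministic system at its fixed point and B is the noise matrix determined by the reactions (the equation is discussed in Section 2.5.1). As a sketch, with a made-up stable Jacobian and noise matrix rather than matrices derived from a real reaction scheme, it can be solved numerically:

```python
# Sketch: within the LNA the stationary covariance C of the fluctuations solves
#   A C + C A^T + B = 0   (a Lyapunov equation),
# where A is the Jacobian at the fixed point and B the noise matrix. The two
# matrices below are made up for illustration, not derived from a real model.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[-2.0,  1.0],
              [ 0.5, -1.0]])   # stable: both eigenvalues in the left half-plane
B = np.array([[ 1.0,  0.2],
              [ 0.2,  0.5]])   # symmetric, positive-definite noise matrix

# scipy solves A X + X A^T = Q, so pass Q = -B
C = solve_continuous_lyapunov(A, -B)
residual = A @ C + C @ A.T + B  # should be (numerically) zero
```

For a stable A and positive-definite B the resulting C is itself a valid (positive-definite) covariance matrix; automating the construction of A and B from an arbitrary reaction scheme is what allows this analysis to be performed by software.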

1.5 Introducing COPASI

As outlined in the previous sections, we will take a theoretical approach to understanding noise in biochemical systems, using analytical techniques developed in statistical physics. The mathematics required by these techniques is manageable, but can become tedious for complicated reaction systems, involving many chemical species and reactions. Therefore, it would be desirable to automate the analytical procedure, so that it could be carried out computationally. This would also allow the procedure to be performed by the wide range of people interested in the role of noise in biochemical systems, not all of whom may have the mathematical background required to perform the expansion. We have worked in collaboration with members of the COPASI project, which is a collaboration between research groups located in three universities: the Virginia Bioinformatics Institute, the University of Heidelberg and the University of Manchester. They have developed, and maintain, COPASI (a Complex Pathway Simulator) [47, 48], a software package designed to study chemical reaction systems. Models of these systems may be formulated in terms of a single compartment (e.g. cell), or in terms of multiple compartments, which can exchange molecules. Once the reaction system has been described in terms of the chemical species, reactions and their reaction kinetics and rates, compartment volume(s) etc., COPASI can perform a range of tasks, such as:

• Simulating the time evolution of the system (either deterministically or stochastically)

• Finding a fixed point, and determining its stability via an eigenvalue analysis

• Calculating Lyapunov exponents

• Parameter scanning and optimisation

The results from these tasks can be described graphically in COPASI, or they can be saved to file for future reference. The graphical user interface (GUI) for COPASI is shown in Fig. 1.1. Our aim is to incorporate the LNA procedure into COPASI so that, for biochemical models with a steady-state, COPASI can calculate statistics for the fluctuations around the steady-state in the form of a covariance matrix. Then the LNA could be used in tandem with the other tasks in COPASI, such as parameter scanning.

Figure 1.1: The user interface for the software package COPASI. The menu shows the tasks available to the user.

Concurrent to this work, a similar project has been devised by Thomas et al. [49]. They have designed and built a software package called the ‘Intrinsic Noise Analyser’. They are also interested in quantifying the fluctuations around a fixed point using the van Kampen expansion. A strength of their approach is that higher-order terms in the expansion are used to calculate corrections to the mean values of the chemical species. This is done by using the so-called ‘effective mesoscopic rate equations’ (EMREs) [50]. A strength of our approach is that we are adding our analysis to what is already a large software package. This means that we have the advantage of the tools already existing in COPASI. Particularly useful are the model reduction techniques (which detect and eliminate conserved quantities) and the optimisation algorithms. We discuss the relevance of both of these in more detail in Chapter 3.

1.6 Outlook

In this chapter, we have explained the aims of this thesis, namely to use analytical techniques to quantify the noise present in biochemical reaction systems. Some of the effects of noise in these systems, both constructive and detrimental, were also outlined. The rest of the thesis is organised as follows. Chapter 2 provides the technical background which will be needed in subsequent chapters. Some simple examples of biochemical systems are used to illustrate the mathematical ideas introduced. In Chapter 3 a framework is established to describe the stochastic formulation of a reaction system in a very general way, so that the fluctuation analysis for models with a steady-state can be automated and, therefore, performed by COPASI. Chapter 4 extends these ideas so that reaction systems involving many compartments can be studied in this way. Chapter 5 examines how coupled stochastic oscillators can become synchronised, and describes this behaviour quantitatively. Conclusions and suggestions for further work are given in Chapter 6. Appendix A gives some information about ill-conditioned systems, which will be useful for the linear algebra used in this thesis. Appendix B displays some preliminary code, used to guide the COPASI implementation. Appendix C contains some algebra pertaining to Chapter 5.

Chapter 2

Background

This chapter contains details of the formalism required to study a reaction sys- tem as a stochastic process. All stochastic processes described in this thesis are assumed to share a crucial property: that they are memory-less. Such processes are known as Markov processes, and may be described by the master equation, which will be derived and discussed. The master equation will be a key equa- tion in the work that follows this chapter. The relation between the reaction system described by the master equation and the system’s equivalent determin- istic description, namely a set of ordinary differential equations (ODEs), will be presented. Aside from certain cases, in which the master equation can be solved exactly, one must use either computational techniques to simulate the random process which the equation represents, or employ an approximation to enable analytical progress to be made. Both of these avenues will be explored. The analytical approach will require some tools from linear algebra which will also be outlined here.

2.1 The master equation

For much of what will follow, the master equation will be the starting point for describing the reaction systems under consideration. First, some notation will be introduced. P(y1, t1) is the probability that a stochastic variable, Y, has the value y1 at time t1. P(y2, t2; y1, t1) is the joint probability that the stochastic variable has the value y1 at time t1 and the value y2 at time t2. This expression may be extended to an arbitrary number of times, e.g. P(yn, tn; ...; y2, t2; y1, t1), where we take t1 < t2 < t3 < ... < tn. Such joint probabilities can be reduced as

follows:

$$\int P(y_n, t_n; \ldots; y_2, t_2; y_1, t_1)\, dy_n = P(y_{n-1}, t_{n-1}; \ldots; y_2, t_2; y_1, t_1). \tag{2.1}$$

In the above equation, Y is a continuous stochastic variable. If, instead, Y takes discrete values, the integral is replaced by a summation. Notation for a conditional probability will also be required. We write P(y2, t2|y1, t1) to denote the probability for the stochastic variable Y to have value y2 at time t2 given that it had value y1 at time t1. It appears in the following useful identity [36]

$$P(y_2, t_2; y_1, t_1) = P(y_2, t_2 | y_1, t_1)\, P(y_1, t_1). \tag{2.2}$$

We write P(y_{k+l}, t_{k+l}; ...; y_{k+1}, t_{k+1} | y_k, t_k; ...; y_1, t_1) for the joint conditional probability. It is defined as

$$P(y_{k+l}, t_{k+l}; \ldots; y_{k+1}, t_{k+1} \,|\, y_k, t_k; \ldots; y_1, t_1) = \frac{P(y_{k+l}, t_{k+l}; \ldots; y_1, t_1)}{P(y_k, t_k; \ldots; y_1, t_1)}. \tag{2.3}$$

As already mentioned, the stochastic processes studied here are considered to be memory-less, or Markov, processes. When this is the case, this expression can be simplified

$$P(y_n, t_n | y_{n-1}, t_{n-1}; \ldots; y_1, t_1) = P(y_n, t_n | y_{n-1}, t_{n-1}). \tag{2.4}$$

This allows any number of joint probabilities to be constructed e.g.

$$P(y_3, t_3; y_2, t_2; y_1, t_1) = P(y_3, t_3 | y_2, t_2)\, P(y_2, t_2 | y_1, t_1)\, P(y_1, t_1). \tag{2.5}$$

Integrating over y2, where again we understand that t1 < t2 < t3,

$$P(y_3, t_3; y_1, t_1) = P(y_1, t_1) \int P(y_3, t_3 | y_2, t_2)\, P(y_2, t_2 | y_1, t_1)\, dy_2, \tag{2.6}$$

and dividing both sides by P(y1, t1) leads to

$$P(y_3, t_3 | y_1, t_1) = \int P(y_3, t_3 | y_2, t_2)\, P(y_2, t_2 | y_1, t_1)\, dy_2. \tag{2.7}$$

This equation is known as the Chapman-Kolmogorov equation. So far, only continuous stochastic variables have been considered. When studying biochemical reaction systems, we will think in terms of numbers of molecules, n, where the system makes instantaneous jumps between these discrete states due to reaction events. For discrete random variables, in the case when t3 = t2 + δt, where δt is a small increment of time, the Chapman-Kolmogorov equation can be written as

$$P(n, t + \delta t \,|\, n_0, t_0) = \sum_{n'} P(n, t + \delta t \,|\, n', t)\, P(n', t \,|\, n_0, t_0). \tag{2.8}$$

The probability of a jump from state n′ to n in a time interval of size δt has the form T(n|n′) δt. Therefore we write

$$P(n, t + \delta t \,|\, n', t) = T(n|n')\,\delta t + \Big(1 - \sum_{n''} T(n''|n')\,\delta t\Big)\,\delta_{n',n}. \tag{2.9}$$

That is, in time interval δt, the system either makes a transition from state n0 to n, or it remains in the initial state. We substitute this equation into the Chapman-Kolmogorov equation and eliminate the Kronecker delta

$$P(n, t + \delta t \,|\, n_0, t_0) = \sum_{n'} T(n|n')\, P(n', t \,|\, n_0, t_0)\,\delta t + \Big(1 - \sum_{n'} T(n'|n)\,\delta t\Big) P(n, t \,|\, n_0, t_0). \tag{2.10}$$

Subtracting P (n, t|n0, t0) from both sides and dividing by δt leads to

$$\frac{P(n, t + \delta t \,|\, n_0, t_0) - P(n, t \,|\, n_0, t_0)}{\delta t} = \sum_{n'} T(n|n')\, P(n', t \,|\, n_0, t_0) - \sum_{n'} T(n'|n)\, P(n, t \,|\, n_0, t_0). \tag{2.11}$$

The left hand side of this equation may be written as a derivative in the limit δt → 0. This equation is the master equation

$$\frac{dP(n, t)}{dt} = \sum_{n' \neq n} T(n|n')\, P(n', t) - \sum_{n' \neq n} T(n'|n)\, P(n, t). \tag{2.12}$$

For brevity, the conditional probability notation has been dropped, but clearly the equation must be solved subject to an initial condition. Here we have considered the one-variable case, but the equation naturally extends to many variables, which we will use for biochemical models involving several chemical species. The equation has a clear interpretation: P(n, t), the probability for the system to be found in state n at time t, increases due to transitions into state n from the other states n′. Similarly, P(n, t) decreases due to transitions out of state n to

other states. The transition rates T describe the propensity for these transitions to occur. In the notation employed here, T(n′|n) is the transition rate from state n to state n′. Unfortunately, it is rarely possible to solve the master equation for P(n, t) exactly. If the transition rates are linear then a solution can be found, but in general this will not be the case for the systems examined in this work. Therefore, one of two paths must be chosen: either an approximation must be made, so that analytical progress becomes feasible, or the stochastic system must be simulated numerically, using Monte Carlo methods. Before describing these paths in more detail, a simple example will be used to show how the master equation is written down for a particular reaction system.

2.1.1 An example

To show how the master equation is constructed for a particular case, we take a simple example of a reaction system from the literature [40]. It contains two dynamic molecular species, X and Y, and three species, A, B and C, whose concentrations, for simplicity, are assumed to be held constant. The molecules are contained in a compartment of volume Ω. Here we take Ω = 1 pl, which is typical for a eukaryotic cell [1]. In the original article, it is used as a simple example of an anabolic system, where X and Y are produced from two reservoirs, and then react irreversibly to form a heterodimer, C. The possible chemical reactions are

$$A \xrightarrow{k_1} X, \qquad X \xrightarrow{\mu_1} A, \qquad B \xrightarrow{k_3} Y, \qquad Y \xrightarrow{\mu_2} B, \qquad X + Y \xrightarrow{k_5} C. \tag{2.13}$$

The transition rates are presented below, where n = (n1, n2) gives the numbers of X and Y molecules respectively. The transition rates are taken to be proportional to the concentration(s) (that is, particle number per unit volume) of the substrate(s) of each reaction, and also proportional to the compartment volume, Ω. The forms of T1 and T3 are simplified using the fact that the numbers of A and B molecules, na and nb, are constant. For simplicity, these two transition rates are taken to be equal: we write k1 na = k3 nb = κΩ, but in general this need not be the case. The transition rates are

$$\begin{aligned}
T_1(n'|n) &= \Omega\, k_1 \frac{n_a}{\Omega} = \Omega\kappa, & n' &= n + (1, 0),\\
T_2(n'|n) &= \Omega\, \mu_1 \frac{n_1}{\Omega} = \mu_1 n_1, & n' &= n + (-1, 0),\\
T_3(n'|n) &= \Omega\, k_3 \frac{n_b}{\Omega} = \Omega\kappa, & n' &= n + (0, 1),\\
T_4(n'|n) &= \Omega\, \mu_2 \frac{n_2}{\Omega} = \mu_2 n_2, & n' &= n + (0, -1),\\
T_5(n'|n) &= \Omega\, k_5 \frac{n_1}{\Omega}\frac{n_2}{\Omega} = k_5 \frac{n_1 n_2}{\Omega}, & n' &= n + (-1, -1).
\end{aligned} \tag{2.14}$$

These transition rates, together with an initial condition, describe the stochastic process. Notice that the transition rate for the fifth reaction is non-linear. The effect of this is evident in the moment equations, which can be found from the master equation. The equation for the expectation of n1, ⟨n1⟩, is written below.

It is found by multiplying both sides of the master equation by n1 and summing over all n:

$$\frac{d\langle n_1 \rangle}{dt} = \kappa\Omega - k_5 \frac{\langle n_1 n_2 \rangle}{\Omega} - \mu_1 \langle n_1 \rangle. \tag{2.15}$$

A similar equation may be written down for ⟨n2⟩. The ODE describing ⟨n1⟩ depends on the second-order moment, ⟨n1 n2⟩. Similarly, the equations for the second-order moments will depend on the third-order moments. Therefore, the coupled set of equations describing the moments is not 'closed', and cannot be solved exactly. As mentioned above, either an approximation must be made to close these equations, or numerical simulations must be employed. Both these options will be explored later in this chapter.
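To make the step leading to Eq. (2.15) explicit, consider the contribution of the first reaction alone; a sketch of the standard manipulation (multiply by n1, sum over all states, and shift the summation index in the gain term) is

$$\kappa\Omega \sum_{n_1, n_2} n_1 \left[ P(n_1 - 1, n_2, t) - P(n_1, n_2, t) \right] = \kappa\Omega \sum_{n_1, n_2} \left[ (n_1 + 1) - n_1 \right] P(n_1, n_2, t) = \kappa\Omega,$$

using the normalisation of P. The remaining reactions are treated in the same way; for the fifth reaction, the factor n1 n2 in T5 is what produces the second-order moment ⟨n1 n2⟩ in Eq. (2.15).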

2.2 Deterministic dynamics

The focus of the work presented here will be quantifying the stochastic fluctuations present in biochemical systems. However, we will also touch on some ideas used to study such systems deterministically. First we show how the mean field equations are obtained from the master equation, using the reaction system described in Eq. (2.13). The equations for the first-order moments are

$$\begin{aligned}
\frac{d\langle n_1 \rangle}{dt} &= \kappa\Omega - k_5 \frac{\langle n_1 n_2 \rangle}{\Omega} - \mu_1 \langle n_1 \rangle,\\
\frac{d\langle n_2 \rangle}{dt} &= \kappa\Omega - k_5 \frac{\langle n_1 n_2 \rangle}{\Omega} - \mu_2 \langle n_2 \rangle.
\end{aligned} \tag{2.16}$$

The mean field equations are found by dividing both sides of the equations by the volume, Ω, and then taking Ω → ∞. We define the chemical concentrations in the deterministic limit to be xi, that is, xi = lim_{Ω→∞} ⟨ni⟩/Ω. In the infinite-volume limit, the correlations between variables can be ignored, i.e. ⟨n1 n2⟩ = ⟨n1⟩⟨n2⟩. The mean field equations are found to be

$$\frac{dx_1}{dt} = \kappa - k_5 x_1 x_2 - \mu_1 x_1, \qquad \frac{dx_2}{dt} = \kappa - k_5 x_1 x_2 - \mu_2 x_2. \tag{2.17}$$

A set of such equations has the general form dxi/dt = fi(x), where in this case i = 1, 2. Setting the left-hand sides of these equations to zero, and solving the resulting set of simultaneous equations, yields the fixed points of the system, denoted by x*. To find out whether a fixed point is stable or unstable, it is necessary to linearise about it by making a small perturbation, x̂, away from it. Retaining terms linear in the perturbation variable, we find dx̂i/dt = Σj Aij x̂j, where Aij = ∂fi(x)/∂xj. If the eigenvalues of the matrix A all have negative real parts, the perturbation decays to zero. That is, the perturbed system returns to the fixed point, which is said to be stable¹. Stable fixed points are sometimes referred to as 'steady-states'. If, in addition to this, the eigenvalues have non-zero imaginary parts, the perturbation decays to the fixed point via damped oscillations [26]. Other behaviours are generally possible for these systems, e.g. limit cycles or chaotic attractors, but this work will focus on models which, in the deterministic framework, contain stable fixed points.
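For the example system of Eq. (2.17), the fixed point and its stability can be checked numerically. The following is a minimal sketch (not the COPASI implementation); the parameter values follow the later figure captions and are treated here as dimensionless for illustration:

```python
import numpy as np

# Mean-field fixed point and stability for Eq. (2.17). Parameter values are
# those quoted in the simulation figures, treated as dimensionless.
kappa, mu1, mu2, k5 = 0.2, 0.2, 0.3, 1.5

# Subtracting the two steady-state conditions gives mu1*x1 = mu2*x2, which
# reduces the problem to one quadratic: (k5*mu1/mu2)*x1**2 + mu1*x1 - kappa = 0.
roots = np.roots([k5 * mu1 / mu2, mu1, -kappa])
x1 = float(max(roots.real))        # the positive root is the physical one
x2 = mu1 * x1 / mu2

# Jacobian A_ij = df_i/dx_j evaluated at the fixed point, cf. Eq. (2.34)
A = np.array([[-k5 * x2 - mu1, -k5 * x1],
              [-k5 * x2,       -k5 * x1 - mu2]])
eigenvalues = np.linalg.eigvals(A)
print(x1, x2, eigenvalues)   # both eigenvalues have negative real part
```

Both eigenvalues having negative real parts confirms that this fixed point is stable, so the linear noise approximation developed below applies.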

2.3 Simulating the master equation

In Section 2.1 it was shown that, due to non-linear transition rates, the master equation cannot be solved exactly. One way of obtaining statistics of the un- derlying stochastic process is to perform a Monte-Carlo simulation to generate a realisation of the time evolution of the stochastic process. A large number of

these realisations can then be used to calculate, for example, expectation values or autocorrelation functions. Here we describe one such algorithm, due to Gillespie, and briefly summarise other algorithms which have been suggested as improvements.

¹ Or, more precisely, it is locally stable, as we have only considered a small perturbation: a larger displacement could, in theory, send the system elsewhere.

2.3.1 The Gillespie algorithm

This section describes the Gillespie algorithm, which will be used to simulate reaction systems throughout this thesis. There is more than one way of implementing the algorithm; the method used here is called the 'direct' method [2]. The first step, of course, is to define the reaction system under investigation. We take a system of N chemical species, Si (i = 1, 2, ..., N), each with molecular population Xi, contained in a volume V. These species can participate in M chemical reactions,

Rµ (µ = 1, 2, ..., M). For each reaction Rµ, a quantity aµ is defined: the probability per unit time that reaction µ will occur. For a particular reaction this quantity is calculated from the probability per unit time that the required combination of molecules will react, multiplied by the number of distinct combinations of those molecules present in the system. These are, in fact, the transition rates which appear in the master equation. The aµ are also used to define the quantity a:

$$a = \sum_{\mu=1}^{M} a_\mu. \tag{2.18}$$

These quantities define the 'reaction probability density function', P(τ, µ), where P(τ, µ)dτ is the probability (at time t) that the next reaction event in the system will take place in the interval between time t + τ and time t + τ + dτ, and will be a reaction of type Rµ. It has the form [2]

$$P(\tau, \mu) = a_\mu \exp(-a\tau). \tag{2.19}$$

When simulating the system, Monte Carlo methods are used to select the values of τ and µ from this distribution. To achieve this, two random numbers, b1 and b2, are drawn from a uniform distribution over the unit interval. Time is initialised to zero and incremented by the value of τ after each reaction event. The value of τ is given by [2]

$$\tau = \frac{1}{a} \ln\!\left(\frac{1}{b_1}\right). \tag{2.20}$$

The reaction which occurs is specified as the integer value of µ which satisfies

$$\sum_{\nu=1}^{\mu-1} a_\nu < b_2\, a \leq \sum_{\nu=1}^{\mu} a_\nu, \tag{2.21}$$

such that the larger the value of aµ, the more likely reaction µ is to be chosen [2]. In this way, two numbers are generated: τ, the value by which time should be incremented, and µ, which specifies which reaction occurs. When µ is chosen, we update the molecular populations that are altered by the occurrence of this reaction. As a result, the aµ and a must be recalculated.

The process is repeated, with two new random numbers generated, until t exceeds tstop, which is defined by the user. A value may be chosen for tint, the time interval at which the molecular populations are to be stored for later use. The procedure is illustrated in Figure 2.1. After the simulation has finished, a time series plot can be made. It should be stressed that, because the simulations are probabilistic, two simulations of the same system with the same initialisation will result in different outcomes. They are simply two different realisations of the same process. To demonstrate the type of output produced by the algorithm, we use as an example the reaction system introduced in Eq. (2.13). Figure 2.2 shows one trajectory generated by the algorithm, for a particular initial condition. The numerical values chosen for the reaction parameters are given in the caption.
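The direct method described above can be sketched in a few lines of Python for the toy model of Eq. (2.13). This is an illustrative implementation, not the code used by COPASI; the parameter values and the volume Ω are chosen for convenience:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy anabolic model of Eq. (2.13). Parameter values are illustrative and the
# volume omega is chosen so that populations are a few hundred molecules.
kappa, mu1, mu2, k5, omega = 0.2, 0.2, 0.3, 1.5, 1000.0

# State-change vectors (rows: the five reactions; columns: species X, Y)
stoich = np.array([[1, 0], [-1, 0], [0, 1], [0, -1], [-1, -1]])

def propensities(n):
    """The a_mu of the text: probability per unit time of each reaction."""
    n1, n2 = n
    return np.array([kappa * omega, mu1 * n1,
                     kappa * omega, mu2 * n2,
                     k5 * n1 * n2 / omega])

def gillespie_direct(n0, t_stop):
    """One realisation via the direct method; returns event times and states."""
    t, n = 0.0, np.array(n0, dtype=np.int64)
    times, states = [t], [n.copy()]
    while t < t_stop:
        a = propensities(n)
        a_tot = a.sum()
        if a_tot == 0.0:          # no reaction can fire
            break
        b1, b2 = rng.random(2)
        tau = np.log(1.0 / b1) / a_tot                  # Eq. (2.20)
        mu = np.searchsorted(np.cumsum(a), b2 * a_tot)  # Eq. (2.21)
        t += tau
        n = n + stoich[mu]
        times.append(t)
        states.append(n.copy())
    return np.array(times), np.array(states)

times, states = gillespie_direct(n0=[300, 200], t_stop=5.0)
```

Two runs with different random seeds give different trajectories, as stressed above: they are simply two realisations of the same stochastic process.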

We can compare the results from this numerical simulation of the finite-size system with the solution of the corresponding deterministic equations, which we found in Section 2.2 by taking the infinite system-size limit, thereby eliminating the random fluctuations evident in Figure 2.2. Figure 2.3 shows one realisation of the stochastic process for species X, generated by the Gillespie algorithm, and compares it to the solution of the differential equation for its concentration, x1, given in Eq. (2.17). The concentration is converted into particle numbers using the system volume. It can be seen that the trajectory generated from the simulation fluctuates around the average behaviour, given by the deterministic dynamics. This observation will be the starting point for the analytical work, due to van Kampen, which will be described in the next section. First, however, we summarise other algorithms, building on Gillespie's original, used to simulate the stochastic process described by the master equation.

[Flow chart: initialise model → draw two random numbers → calculate τ and µ → update molecular populations and reaction propensities → write output each time tint is exceeded → stop once tstop is exceeded.]

Figure 2.1: Flow chart illustrating the Gillespie algorithm.

Figure 2.2: Using the Gillespie algorithm to simulate the reaction system described in Section 2.1. Species X is in blue, species Y is in red. Values for the reaction parameters were: κ = 0.2 s⁻¹, µ1 = 0.2 s⁻¹, µ2 = 0.3 s⁻¹ and k5 = 1.5 nM⁻¹ s⁻¹. The system volume was 10⁻¹² l.

Figure 2.3: A comparison of the stochastic (dark blue) and deterministic (light blue) dynamics for species X. The parameters chosen are those stated in Figure 2.2.

2.3.2 Other simulation algorithms

Many other algorithms have built on Gillespie's original. These aim to speed up the simulations, either by improving on the efficiency of the original algorithm, or by employing approximations. For reviews of these algorithms see [38, 51]. An example of the former is the 'Next Reaction Method', due to Gibson & Bruck [52]. Here, information on which species are affected by which reaction is stored, so that unnecessary recalculations of aµ are avoided. An index of the reactions is kept, which orders them in terms of their propensities to be chosen next. On average this reduces the time required to select which reaction will 'fire' next. In addition, the random numbers are reused, after suitable transformations are applied, which also speeds up the simulation. One example of an approximate algorithm is the τ-leap method [53]. Rather than depicting every reaction event, this method 'leaps' forward in time in steps of τ, during which time many reaction events will have occurred². The number of these events can be approximated by Poisson random variables. This procedure assumes that the propensities, aµ, do not change appreciably over the time interval τ; therefore, the size of the interval must be chosen carefully. This issue has been discussed in the original article [53], and elsewhere in the literature [54, 55]. All of the numerical results in this thesis are derived from probabilistically exact algorithms, such as Gillespie's original. However, the τ-leap method was frequently used in preliminary work, when it was desirable to obtain results quickly. If a reaction system contains a mixture of species with large and small molecular populations, a hybrid method may be considered [38]. This involves partitioning the master equation into two 'subsystems', of fast and slow reactions. Then, rather than performing a full stochastic simulation, the system is evolved using a mixture of ODEs and stochastic simulation.
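A single fixed-step τ-leap update for the toy model of Eq. (2.13) might look as follows. This is a deliberately naive sketch: the clipping of negative populations stands in for the adaptive step-size selection discussed in the literature [54, 55], and the parameter values and leap size are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed-step tau-leap for the toy model of Eq. (2.13); illustrative values.
kappa, mu1, mu2, k5, omega = 0.2, 0.2, 0.3, 1.5, 1000.0
stoich = np.array([[1, 0], [-1, 0], [0, 1], [0, -1], [-1, -1]])

def propensities(n):
    n1, n2 = n
    return np.array([kappa * omega, mu1 * n1,
                     kappa * omega, mu2 * n2,
                     k5 * n1 * n2 / omega])

def tau_leap_step(n, tau):
    # Approximate the number of firings of each reaction in [t, t + tau) by
    # Poisson variates with means a_mu * tau, holding the a_mu fixed.
    k = rng.poisson(propensities(n) * tau)
    # Crude safeguard against negative populations; production schemes
    # instead adapt tau [54, 55].
    return np.maximum(n + stoich.T @ k, 0)

n = np.array([300, 200])
for _ in range(200):
    n = tau_leap_step(n, tau=0.01)
print(n)
```

One leap replaces, on average, a·τ individual Gillespie events with a single update, which is where the speed-up comes from.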
In summary, many variations of the Gillespie algorithm have been proposed, with the aim of reducing the computational time required to simulate reaction systems. However, simulating realistic reaction systems can still be extremely time-consuming, especially as a large ensemble of trajectories must be generated before statistics can be computed. This makes analytical approximations desirable:

² Here we define τ in a different way than when describing the Gillespie algorithm. In that case, every reaction event is described, and τ was the calculated time between reaction events. In this case, τ has a fixed value.

one of these, due to van Kampen, is the subject of the next section.

2.4 The van Kampen expansion

In the previous section, we looked at the deterministic dynamics, which are obtained in the thermodynamic limit, as the molecular population of the system approaches infinity. We obtained this by taking the limit Ω → ∞, where Ω is the compartment volume. All information about the fluctuations was lost. To find expressions for the fluctuations via analytical work, it is necessary to return to the master equation and make an approximation which allows a solution to be found. Recall that, in general, a direct solution to the master equation (for P(n, t)) cannot be found, and the moment equations do not close. A systematic approach to this task is the system-size expansion due to van Kampen [42, 35], sometimes called the 'linear noise approximation' (LNA), or Ω-expansion. Using this expansion assumes that the fluctuations scale with the square root of the system size, and that they are centred on the deterministic dynamics. The fluctuations can be said to "ride on the back" of the deterministic dynamics [56]. By system size we mean a large parameter which characterises the system being studied. In this work, we use Ω, the compartment volume, which we have already used to define chemical concentrations. To perform the expansion, the following change of variables is made:

$$n_i = \Omega x_i + \sqrt{\Omega}\, \xi_i, \tag{2.22}$$

where xi is the chemical concentration obtained from the system's deterministic model, and ξi is a random variable representing the fluctuations. Dividing both sides of the above equation by Ω and taking Ω → ∞, the fluctuations disappear, as expected in the thermodynamic limit. To demonstrate the expansion, we return again to the model described in Eq. (2.13), which contains five reactions. The master equation for this two-variable system can be written as

$$\frac{dP(n, t)}{dt} = \sum_{n' \neq n} T(n|n')\, P(n', t) - \sum_{n' \neq n} T(n'|n)\, P(n, t),$$

where n = (n1, n2). Each reaction in the system contributes to the transition rates both into and out of state n, as shown in Eq. (2.14). Their contributions to the master equation are written below.

$$\begin{aligned}
&(\mathcal{E}_1^{-1} - 1)\,\kappa\Omega\, P(n, t), \qquad (\mathcal{E}_1^{+1} - 1)\,\mu_1 n_1\, P(n, t),\\
&(\mathcal{E}_2^{-1} - 1)\,\kappa\Omega\, P(n, t), \qquad (\mathcal{E}_2^{+1} - 1)\,\mu_2 n_2\, P(n, t),\\
&(\mathcal{E}_1^{+1}\mathcal{E}_2^{+1} - 1)\, k_5 \frac{n_1 n_2}{\Omega}\, P(n, t),
\end{aligned} \tag{2.23}$$

where we have introduced the step operators $\mathcal{E}_1^{\pm 1}$ and $\mathcal{E}_2^{\pm 1}$, such that $\mathcal{E}_1^{+1} f(n_1, n_2) = f(n_1 + 1, n_2)$. Now the substitution of variables can be made. The operators can be written in terms of the new variables as

$$\mathcal{E}_i^{\pm 1} = 1 \pm \frac{1}{\sqrt{\Omega}} \frac{\partial}{\partial \xi_i} + \frac{1}{2\Omega} \frac{\partial^2}{\partial \xi_i^2} + \ldots, \tag{2.24}$$

which can be demonstrated by performing a Taylor expansion of f(n1 ± 1, n2) expressed in the new variables. The right-hand side of the master equation also has to be transformed. The probability distribution is now denoted by Π, not P, to indicate that it is a function of ξ rather than of n. By writing out the total derivative of P the following relation can be found

$$\frac{dP}{dt} = \frac{\partial \Pi(\xi, t)}{\partial t} - \Omega^{1/2} \sum_{i=1}^{2} \frac{dx_i}{dt} \frac{\partial \Pi}{\partial \xi_i}, \tag{2.25}$$

where we have used the fact that the time derivative is taken at constant n [35]. Due to the form of the step operators, we now have an infinite number of terms on the right-hand side of the master equation. We proceed by equating terms of the same order in Ω on each side of the equation. The leading-order terms, of order √Ω, recover the mean field equations, which for this example are those in Eq. (2.17). The next-to-leading-order terms, of order Ω⁰, obey a partial differential equation (PDE) for Π(ξ, t), the probability distribution of the fluctuations around the mean-field dynamics. We now expand out the terms in the master equation given in Eq. (2.23) and, for each reaction, we find

$$(\mathcal{E}_1^{-1} - 1)\,\kappa\Omega P(n,t) \longrightarrow \Big[\Big(1 - \frac{1}{\sqrt{\Omega}}\frac{\partial}{\partial \xi_1} + \frac{1}{2\Omega}\frac{\partial^2}{\partial \xi_1^2} + \ldots\Big) - 1\Big]\kappa\Omega\,\Pi, \tag{2.26}$$

$$(\mathcal{E}_1^{+1} - 1)\,\mu_1 n_1 P(n,t) \longrightarrow \Big[\Big(1 + \frac{1}{\sqrt{\Omega}}\frac{\partial}{\partial \xi_1} + \frac{1}{2\Omega}\frac{\partial^2}{\partial \xi_1^2} + \ldots\Big) - 1\Big]\mu_1\big(\Omega x_1 + \sqrt{\Omega}\,\xi_1\big)\Pi, \tag{2.27}$$

$$(\mathcal{E}_2^{-1} - 1)\,\kappa\Omega P(n,t) \longrightarrow \Big[\Big(1 - \frac{1}{\sqrt{\Omega}}\frac{\partial}{\partial \xi_2} + \frac{1}{2\Omega}\frac{\partial^2}{\partial \xi_2^2} + \ldots\Big) - 1\Big]\kappa\Omega\,\Pi, \tag{2.28}$$

$$(\mathcal{E}_2^{+1} - 1)\,\mu_2 n_2 P(n,t) \longrightarrow \Big[\Big(1 + \frac{1}{\sqrt{\Omega}}\frac{\partial}{\partial \xi_2} + \frac{1}{2\Omega}\frac{\partial^2}{\partial \xi_2^2} + \ldots\Big) - 1\Big]\mu_2\big(\Omega x_2 + \sqrt{\Omega}\,\xi_2\big)\Pi, \tag{2.29}$$

$$\big(\mathcal{E}_1^{+1}\mathcal{E}_2^{+1} - 1\big)\,k_5 \frac{n_1 n_2}{\Omega} P(n,t) \longrightarrow \Big[\Big(1 + \frac{1}{\sqrt{\Omega}}\frac{\partial}{\partial \xi_1} + \frac{1}{2\Omega}\frac{\partial^2}{\partial \xi_1^2} + \ldots\Big)\Big(1 + \frac{1}{\sqrt{\Omega}}\frac{\partial}{\partial \xi_2} + \frac{1}{2\Omega}\frac{\partial^2}{\partial \xi_2^2} + \ldots\Big) - 1\Big]\,k_5\Omega\Big(x_1 + \frac{\xi_1}{\sqrt{\Omega}}\Big)\Big(x_2 + \frac{\xi_2}{\sqrt{\Omega}}\Big)\Pi. \tag{2.30}$$

After expanding out the brackets, we collect terms of the same order. The terms of leading order (√Ω) are found to be

$$-\frac{\partial}{\partial \xi_1}(\kappa\Pi) - \frac{\partial}{\partial \xi_2}(\kappa\Pi) + \frac{\partial}{\partial \xi_1}(\mu_1 x_1 \Pi) + \frac{\partial}{\partial \xi_2}(\mu_2 x_2 \Pi) + \frac{\partial}{\partial \xi_1}(k_5 x_1 x_2 \Pi) + \frac{\partial}{\partial \xi_2}(k_5 x_1 x_2 \Pi). \tag{2.31}$$

The terms of next-to-leading order (Ω⁰) are

$$\begin{aligned}
&\frac{1}{2}\frac{\partial^2}{\partial \xi_1^2}\big[(\kappa + \mu_1 x_1 + k_5 x_1 x_2)\Pi\big] + \frac{1}{2}\frac{\partial^2}{\partial \xi_2^2}\big[(\kappa + \mu_2 x_2 + k_5 x_1 x_2)\Pi\big] + \frac{\partial^2}{\partial \xi_1 \partial \xi_2}\big(k_5 x_1 x_2 \Pi\big)\\
&\quad + \frac{\partial}{\partial \xi_1}\big[(\mu_1 \xi_1 + k_5 x_2 \xi_1 + k_5 x_1 \xi_2)\Pi\big] + \frac{\partial}{\partial \xi_2}\big[(\mu_2 \xi_2 + k_5 x_2 \xi_1 + k_5 x_1 \xi_2)\Pi\big]. \tag{2.32}
\end{aligned}$$

The next step is to equate all these terms with the right-hand side of the master equation, given by Eq. (2.25), and compare terms of the same order. For the leading-order terms, we compare the coefficients of the ∂Π/∂ξ1 and ∂Π/∂ξ2 terms. This leads to the ODEs found previously in Eq. (2.17). The PDE obtained by collecting terms of order Ω⁰ can be written as

$$\frac{\partial \Pi}{\partial t} = -\sum_{i=1}^{2} \frac{\partial}{\partial \xi_i}\big(M_i(\xi)\,\Pi\big) + \frac{1}{2}\sum_{i,j=1}^{2} B_{ij} \frac{\partial^2 \Pi}{\partial \xi_i \partial \xi_j}, \tag{2.33}$$

which describes the probability distribution for the fluctuations around the deterministic (or 'mean field') dynamics. The vector M is linear in ξ and is often written as $M_i(\xi) = \sum_k A_{ik}\xi_k$, where A is found to be the Jacobian of the corresponding deterministic system. We will prove this general result in Chapter 3. The symmetric matrix B is independent of ξ. For the system considered here, the matrices A and B (sometimes called the drift and diffusion matrices respectively) have the form

$$A = \begin{pmatrix} -k_5 x_2 - \mu_1 & -k_5 x_1\\ -k_5 x_2 & -k_5 x_1 - \mu_2 \end{pmatrix}, \tag{2.34}$$

$$B = \begin{pmatrix} \kappa + \mu_1 x_1 + k_5 x_1 x_2 & k_5 x_1 x_2\\ k_5 x_1 x_2 & \kappa + \mu_2 x_2 + k_5 x_1 x_2 \end{pmatrix}. \tag{2.35}$$

This type of Fokker-Planck equation, in which the first term on the right-hand side is linear in ξ and the second term is independent of ξ, admits a Gaussian solution for Π(ξ, t) [57]. We remark that A and B depend on time through their dependence on x1(t) and x2(t), the values of which are obtained by solving the mean field ODEs. If the mean field equations reach a stable fixed point, A and B become constant matrices; Π then describes the fluctuations around the fixed point. The Fokker-Planck equation is characterised by A and B alone. In Chapter 3 we will show how the form of A and B may be found directly from a general reaction system, without the need for the algebra performed here.

2.5 The Fokker-Planck equation

In the previous section we showed how the van Kampen expansion can be used to find an approximate solution to the master equation. This solution corresponds to the mean field equations which we found earlier, plus Gaussian fluctuations

which are described via a Fokker-Planck equation. In this section we will show how to obtain statistics of the stochastic process from this equation. As the fluctuations are Gaussian, the first and second moments are sufficient to describe them. We will consider a Fokker-Planck equation with the matrices A and B constant, corresponding to the case where the mean field dynamics have already reached a stable fixed point. We consider a general Fokker-Planck equation with K variables,

$$\frac{\partial \Pi}{\partial t} = -\sum_{i,j=1}^{K} \frac{\partial}{\partial \xi_i}\big(A_{ij}\xi_j \Pi\big) + \frac{1}{2}\sum_{i,j=1}^{K} B_{ij} \frac{\partial^2 \Pi}{\partial \xi_i \partial \xi_j}. \tag{2.36}$$

The initial condition for the probability distribution is taken to be a delta function located at the point ξ0. We begin by calculating the first moments: we multiply both sides of Eq. (2.36) by ξk and integrate over ξ, giving

$$\frac{\partial \langle \xi_k \rangle}{\partial t} = \sum_{j=1}^{K} A_{kj} \langle \xi_j \rangle. \tag{2.37}$$

The solution of this equation, with the initial condition, is

$$\langle \xi \rangle_t = e^{tA}\, \xi_0. \tag{2.38}$$

Since we are interested in systems with a stable fixed point, A (which is equal to the Jacobian of the deterministic system, evaluated at the fixed point) has eigenvalues with negative real parts. As a result, the first moments ⟨ξ⟩t tend to zero at large times. The second moments are found in a similar fashion. We multiply both sides of Eq. (2.36) by ξk ξl and integrate over ξ, obtaining the following coupled equations

$$\frac{\partial \langle \xi_k \xi_l \rangle}{\partial t} = \sum_{i=1}^{K} A_{ki} \langle \xi_i \xi_l \rangle + \sum_{j=1}^{K} A_{lj} \langle \xi_k \xi_j \rangle + B_{kl}. \tag{2.39}$$

The covariances, Ξkl, are constructed from the first and second moments:

$$\Xi_{kl} \equiv \langle \xi_k \xi_l \rangle - \langle \xi_k \rangle \langle \xi_l \rangle. \tag{2.40}$$

They obey the same equation as the second-order moments. In matrix form this may be written as

$$\partial_t \Xi = A\Xi + \Xi A^T + B. \tag{2.41}$$

This is sometimes called the Lyapunov equation [58]. At long times, information about the initial conditions is lost and the covariances reach constant values. The stationary covariance matrix is then the solution of

$$A\Xi + \Xi A^T + B = 0. \tag{2.42}$$

In the next subsection we will discuss methods of solution for this equation. Once the matrix Ξ is found, the explicit form of the probability distribution is completely known. Up to normalisation, a Gaussian distribution with zero first-order moments has the general form [35]

$$\Pi(\xi, t) = (\det \Xi)^{-1/2} \exp\!\Big[-\tfrac{1}{2}\, \xi^T \Xi^{-1} \xi\Big]. \tag{2.43}$$
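Once Ξ is known, Eq. (2.43) can be used directly to generate realisations of the fluctuations: they are simply draws from a zero-mean multivariate Gaussian with covariance Ξ. A small sketch, using a hypothetical 2-by-2 covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 2x2 stationary covariance matrix Xi (symmetric, positive
# definite), standing in for a solution of the Lyapunov equation.
Xi = np.array([[0.40, -0.15],
               [-0.15, 0.30]])

# Eq. (2.43): the stationary fluctuations are zero-mean Gaussian with
# covariance Xi, so realisations of xi can be drawn directly.
samples = rng.multivariate_normal(mean=np.zeros(2), cov=Xi, size=200_000)
print(np.cov(samples.T))   # approaches Xi as the sample size grows
```

This gives a cheap consistency check on any Lyapunov-equation solver: the sample covariance of the draws should reproduce the Ξ that was fed in.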

2.5.1 Solving the Lyapunov equation

In the previous subsection we showed that the covariance matrix Ξ obeys the Lyapunov equation, Eq. (2.42). Before looking at the various approaches that can be used to solve this equation, we shall establish the conditions under which it has a unique solution. To do this, we rewrite the Lyapunov equation for a model with K chemical species in the form Px = b, where [59]

$$P = A \otimes I_K + I_K \otimes A, \qquad x = \mathrm{vec}(\Xi), \qquad b = -\mathrm{vec}(B). \tag{2.44}$$

The operator vec takes the columns of a matrix and stacks them on top of each other, forming a vector of length K². The object I_K is the identity matrix of size K. The binary operator denoted by ⊗ is the Kronecker product [58]. The Kronecker product of two matrices, say S (an m-by-n matrix) and R (a u-by-v matrix), is the block matrix

$$S \otimes R \equiv \begin{pmatrix} s_{11}R & \ldots & s_{1n}R\\ \vdots & \ddots & \vdots\\ s_{m1}R & \ldots & s_{mn}R \end{pmatrix}, \tag{2.45}$$

which has dimensions mu-by-nv. The operator P is therefore a square matrix of size K². To prove the uniqueness of the solution of the equation Px = b, and hence that of the Lyapunov equation, we need the eigenvectors of the matrix A, which we will call z_i, as well as the corresponding eigenvalues, λ_i. In addition, we shall use the matrix identity [58]

$$(A \otimes B)(C \otimes D) = AC \otimes BD. \tag{2.46}$$

Dropping the subscript on the identity matrix (all matrices are square and of size K), we find

$$[(A \otimes I) + (I \otimes A)](z_i \otimes z_j) = (Az_i \otimes z_j) + (z_i \otimes Az_j) = (\lambda_i + \lambda_j)(z_i \otimes z_j). \tag{2.47}$$

That is, the eigenvalues of the matrix P are λi + λj, where i, j = 1, 2, ..., K, and the corresponding eigenvectors of P are zi ⊗ zj. This tells us that there is a unique solution to Px = b provided that

$$\lambda_i + \lambda_j \neq 0 \quad \forall\; i, j, \tag{2.48}$$

i.e. no pair of eigenvalues of A sums to zero [60]. So, when looking at the covariances around a stable fixed point, there will always be a unique solution to the Lyapunov equation. To solve the Lyapunov equation, one can simply re-cast it, as done in Eq. (2.44), and invert P to solve for the elements of Ξ. The cost of this approach is that the matrix P can be very large: its size is the square of the number of chemical species. Other approaches have been suggested, including algorithms due to Bartels & Stewart [61] and Hammarling [62]. Here we briefly describe the Bartels-Stewart algorithm, as it will be mentioned again in Chapter 3. By using similarity transformations, Eq. (2.42) is transformed so that the matrix A becomes lower triangular and the matrix A^T becomes upper triangular. The result of these transformations is that elements of the matrix Ξ′, the transformed covariance matrix, may be solved for successively, and then transformed back to find Ξ. Several software packages can solve the Lyapunov equation, including Mathematica [63] and Octave [64]. The work discussed in the following two chapters will involve solutions of this equation. In Chapter 3, we will discuss removing conservation relations from reaction systems. These conservation relations are present in systems in which not all chemical species vary independently of each other. If these relations are not properly accounted for, the Jacobian will contain one or more zero eigenvalues, and hence the solution of the Lyapunov equation will not be unique. In certain cases, the Lyapunov equation is found to be 'ill-conditioned'. This means that very small changes to the entries of the matrices A and B, due e.g. to small changes to the reaction parameters, can result in surprisingly large changes to the solution of the equation, which in our case is the covariance matrix.
One reason that this is of interest is that there is often an uncertainty associated with a reaction parameter, whose value may have been estimated from experimental data. Results for ill-conditioned systems must therefore be appraised carefully, as the numerical values of the covariances cannot be trusted to a high degree of accuracy. It is thus important to be able to assess whether a system can be described as ill-conditioned, and to quantify this effect. This analysis will be the subject of Appendix A.
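To make the recasting concrete, the sketch below solves a small Lyapunov equation via the Kronecker form of Eq. (2.44), checks the uniqueness condition (2.48), and reports the condition number of P, which quantifies the ill-conditioning discussed above. The matrices A and B are illustrative toy values, not taken from any model in this thesis.

```python
import numpy as np

# Toy stable Jacobian A and symmetric diffusion matrix B (illustrative values)
A = np.array([[-2.0, 1.0],
              [0.0, -3.0]])
B = np.array([[1.0, 0.2],
              [0.2, 0.5]])

# Uniqueness condition, Eq. (2.48): no pair of eigenvalues of A may sum to zero
lam = np.linalg.eigvals(A)
pair_sums = lam[:, None] + lam[None, :]
assert not np.any(np.isclose(pair_sums, 0.0))

# Recast A Xi + Xi A^T + B = 0 as P vec(Xi) = -vec(B), with column-major vec
K = A.shape[0]
P = np.kron(np.eye(K), A) + np.kron(A, np.eye(K))
Xi = np.linalg.solve(P, -B.flatten(order='F')).reshape((K, K), order='F')

# The residual of the Lyapunov equation should vanish
assert np.allclose(A @ Xi + Xi @ A.T + B, np.zeros((K, K)))

# A large condition number of P signals an ill-conditioned system
print(np.linalg.cond(P))
```

SciPy's `solve_continuous_lyapunov` (and, e.g., `lyap` in Octave's control package) implements the Bartels–Stewart approach instead, which avoids forming the K² × K² matrix P explicitly.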

2.6 The Langevin equation

Rather than examining the fluctuations via a Fokker-Planck equation, it is sometimes easier to deal with an equation for the stochastic variable itself. This can be done using a stochastic differential equation (SDE) known as the Langevin equation [39]. The Langevin equation which is equivalent to the Fokker-Planck equation in Eq. (2.36) is

\frac{d\xi_i}{dt} = \sum_{j=1}^{K} A_{ij}\xi_j + \eta_i,   (2.49)

where \eta is a noise term with correlator \langle \eta_i(t)\eta_j(t') \rangle = B_{ij}\delta(t - t'). Below we show the equivalence of the SDE approach and the Fokker-Planck approach in the one-variable case. This can easily be generalised to an arbitrary number of variables. We write the 1-D Langevin equation in the general SDE form, where W is a Wiener process [65]. We will use the variable z to distinguish it from the Langevin equation already introduced,

dz = f(z, t)dt + g(z, t)dW, (2.50)

and the function g is related to the noise correlator by g^2 = B. This is a more general equation than we require, as our noise correlator does not depend on the random variable. We begin by writing down the SDE for an arbitrary function of z, q(z). To apply a transformation of variables to an SDE, Itô's rule must be used [65]. The SDE for q(z) is then

dq = \left[\frac{dq}{dz} f(z,t) + \frac{g^2(z,t)}{2}\frac{d^2q}{dz^2}\right] dt + g(z,t)\frac{dq}{dz}\, dW.   (2.51)

To find the differential equation for the mean of q, we take the average of each side:

d\langle q \rangle = \left\langle \frac{dq}{dz} f(z,t) \right\rangle dt + \left\langle \frac{g^2(z,t)}{2}\frac{d^2q}{dz^2} \right\rangle dt.   (2.52)

This can be re-expressed as

\frac{d\langle q \rangle}{dt} = \int_{-\infty}^{\infty} \left[\frac{dq}{dz} f(z,t) + \frac{g^2(z,t)}{2}\frac{d^2q}{dz^2}\right] P(z,t)\, dz.   (2.53)

Using integration by parts, allied to the fact that the probability distribution must go to zero as z → ±∞, we find

\frac{d\langle q \rangle}{dt} = \int_{-\infty}^{\infty} q(z)\left( -\frac{\partial}{\partial z}[f(z,t)P(z,t)] + \frac{1}{2}\frac{\partial^2}{\partial z^2}[g^2(z,t)P(z,t)] \right) dz.   (2.54)

The left hand side of the above equation can be written as

\frac{d\langle q \rangle}{dt} = \int_{-\infty}^{\infty} q(z)\, \frac{\partial}{\partial t}P(z,t)\, dz.   (2.55)

This equation must be true for any function q(z). Equating the integrands yields the desired Fokker-Planck equation

\frac{\partial}{\partial t}P(z,t) = -\frac{\partial}{\partial z}[f(z,t)P(z,t)] + \frac{1}{2}\frac{\partial^2}{\partial z^2}[B(z,t)P(z,t)],   (2.56)

where B(z,t) \equiv g^2(z,t). This result can be extended to multi-variate equations.
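The equivalence can also be checked numerically in the simplest setting: taking f(z, t) = −λz and constant g gives an Ornstein–Uhlenbeck process, whose Fokker–Planck equation has a Gaussian stationary distribution with variance g²/(2λ). The sketch below, with illustrative parameter values, simulates the SDE with the Euler–Maruyama scheme and compares the sampled variance with this prediction.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, g = 1.0, 0.5           # dz = -lam*z dt + g dW  (illustrative values)
dt, n_steps = 0.01, 400_000

z, samples = 0.0, []
for step in range(n_steps):
    # Euler-Maruyama update: dW ~ Normal(0, dt)
    z += -lam * z * dt + g * np.sqrt(dt) * rng.standard_normal()
    if step > 10_000:        # discard the transient before sampling
        samples.append(z)

# Stationary variance predicted by the Fokker-Planck equation: g^2 / (2 lam)
var_theory = g**2 / (2.0 * lam)
var_sim = np.var(samples)
assert abs(var_sim - var_theory) / var_theory < 0.1
```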

2.7 Stochastic oscillations & power spectra

As mentioned in Chapter 1, there are certain cases where qualitative, as well as quantitative, differences between the deterministic and stochastic models of the same system are visible. In particular, models studied stochastically can exhibit sustained oscillations, sometimes called ‘quasi-cycles’, whereas the corresponding deterministic model will go to a fixed point solution [28, 29]. In this section, we will show how to identify whether such oscillations will be present in a model, and, if so, how to describe them analytically. We will do this by using the Langevin description of the fluctuations, given by Eq. (2.49), for a model with S chemical species. To examine the frequencies of the fluctuations described by this equation, we take the Fourier transform, which yields

-i\omega\tilde{\xi}_i(\omega) = \sum_{j=1}^{S} A_{ij}\tilde{\xi}_j(\omega) + \tilde{\eta}_i(\omega),   (2.57)

where the tilde represents the Fourier transform. To simplify the notation, we introduce the matrix \Phi(\omega) \equiv -i\omega I_S - A, where I_S is the identity matrix of dimension S. This relation can be rearranged as

\tilde{\xi}_i(\omega) = \sum_{j=1}^{S} \Phi_{ij}^{-1}(\omega)\tilde{\eta}_j(\omega).   (2.58)

The power spectrum of the fluctuations of species i, P_i(\omega), is given by \langle |\tilde{\xi}_i(\omega)|^2 \rangle. To calculate this, we shall also need the Fourier transform of the noise correlator in Eq. (2.49), which is

\langle \tilde{\eta}_i(\omega)\tilde{\eta}_j(\omega') \rangle = 2\pi B_{ij}\delta(\omega + \omega').   (2.59)

Now we are able to calculate P_i(\omega). Up to a multiplicative factor, it has the form

P_i(\omega) = \sum_{j=1}^{S}\sum_{k=1}^{S} \Phi_{ij}^{-1}(\omega)\, B_{jk}\, (\Phi^{\dagger})_{ki}^{-1}(\omega).   (2.60)

We can describe how the power spectrum depends on \omega, for a system of general dimension S, by considering the form of the inverse of the matrix \Phi(\omega). The inverse of a matrix can be expressed as the adjugate of the matrix (the transpose of the matrix of cofactors [66]) divided by its determinant. By calculating the cofactors of \Phi we can see that they are polynomials in \omega of degree up to S - 1. Similarly, the determinant of \Phi is a polynomial in \omega of degree up to S. Putting these pieces of information together, and using the fact that the matrix B has no \omega dependence, we can write a general form of P_i(\omega) for a system with S chemical species:

P_i(\omega) = \frac{\sum_{j=0}^{S-1} b_j \omega^{2j}}{\sum_{i=0}^{S} a_i \omega^{2i}},   (2.61)

where the coefficients a_i and b_j are constructed from the entries of the matrices A and B. The presence of oscillations is indicated by P_i(\omega) having a peak at a non-zero frequency. Whether such a peak appears can be deduced by looking at the eigenvalues of the system's Jacobian: sustained oscillations are due to complex eigenvalues. In the analysis of the deterministic system, complex eigenvalues with a negative real part result in the system approaching the fixed point via oscillations which are damped within an exponential envelope. The noise in the Langevin equation is white, and hence contains oscillations of all frequencies. The frequency of the damped oscillations is picked out, and the oscillations are sustained. This effect can be described more formally by looking at the denominator of the power spectrum, which is the same for all species. By diagonalising the Jacobian, we can write the denominator, R(\omega), in terms of its eigenvalues, \{\lambda\} [67],

R(\omega) = \prod_{j=1}^{S} |\lambda_j - i\omega|^2.   (2.62)

The location of the peak in the power spectrum is strongly influenced by the minimum of R(\omega). In the presence of complex eigenvalues, R(\omega) will be minimised near \omega = \text{Im}(\lambda_i). The numerator of the power spectrum is also a function of \omega, so it may shift the position of the maximum of P_i(\omega). It is not true, however, that complex eigenvalues always lead to visible oscillations in the stochastic model of the reaction system. The size of the real part of the eigenvalue indicates the strength of the damping, so if |\text{Re}(\lambda_i)| is very large then the oscillations will be suppressed, and will not have a significant impact on the overall dynamics [68].
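The role of the eigenvalues can be illustrated with a small numerical check of Eq. (2.62): for a weakly damped complex pair, the denominator R(ω) is minimised very close to the imaginary part of the pair. The eigenvalue set below is illustrative, not taken from any model in this thesis.

```python
import numpy as np

# Illustrative eigenvalue set: one weakly damped complex pair, one real mode
eigs = np.array([-0.05 + 0.4j, -0.05 - 0.4j, -1.0])

def R(omega, eigs):
    """Denominator of the power spectrum, Eq. (2.62)."""
    return np.prod(np.abs(eigs[:, None] - 1j * omega[None, :])**2, axis=0)

omega = np.linspace(0.0, 1.0, 5001)
omega_min = omega[np.argmin(R(omega, eigs))]

# For weak damping the minimum sits very close to Im(lambda) = 0.4
assert abs(omega_min - 0.4) < 0.02
```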

To demonstrate these ideas, we will use another reaction system, described below:

A + Y_2 \xrightarrow{g_1} Y_1,
Y_1 + Y_2 \xrightarrow{g_2} C,
B + Y_1 \xrightarrow{g_3} 2Y_1 + Y_3,
2Y_1 \xrightarrow{g_4} Q,
Y_3 \xrightarrow{g_5} 2Y_2,
A \xrightarrow{g_6} Y_3.   (2.63)

The model is motivated by the Oregonator model [69], a theoretical scheme for an autocatalytic reaction which was designed to exhibit limit-cycle behaviour, although we shall restrict ourselves to parameter choices for which the fixed point is stable. We choose this model as it is the simplest realistic model which can display this behaviour; by realistic we mean that it contains no reactions more complicated than bimolecular. It contains three dynamic variables,

(Y_1, Y_2, Y_3), whose concentrations we write as (y_1, y_2, y_3). The concentrations of the other species are fixed; for simplicity, we shall set them to one. The ODEs for the system are found to be

\frac{dy_1}{dt} = g_1 y_2 - g_2 y_1 y_2 + g_3 y_1 - 2g_4 y_1^2,
\frac{dy_2}{dt} = 2g_5 y_3 - g_1 y_2 - g_2 y_1 y_2,
\frac{dy_3}{dt} = g_6 - g_5 y_3 + g_3 y_1.   (2.64)

This system of ODEs has multiple fixed points, but only one which is physically realistic, i.e. for which all concentrations are positive. For the reaction parameters chosen here, this fixed point is stable, and two of the eigenvalues of A form a complex-conjugate pair. A Fokker-Planck equation, describing the fluctuations around the fixed point, is found by using the van Kampen expansion. The explicit forms of the matrices A and B are

A = \begin{pmatrix} -g_2 y_2 + g_3 - 4g_4 y_1 & g_1 - g_2 y_1 & 0 \\ -g_2 y_2 & -g_2 y_1 - g_1 & 2g_5 \\ g_3 & 0 & -g_5 \end{pmatrix},   (2.65)

 2  4g4y1 + g3y1 + g2y1y2 + g1y2 g2y1y2 − g1y2 g3y1   B =  g2y1y2 − g1y2 g2y1y2 + 4g5y3 + g1y2 −2g5y3  , g3y1 −2g5y3 g3y1 + g5y3 + g6 (2.66) where the yi are evaluated at their fixed point values. The conjugate pair of eigenvalues for the parameters chosen are −0.0525 ± 0.133401i. As the real parts of the conjugate eigenvalues are small, we can expect to see stochastic oscillations. These can be seen in the time series shown in Figure 2.4, the caption of which contains the values of the reaction parameters. The power spectrum for the oscillations of all three species is shown in Figure 2.5. It can be clearly seen that the peaks of the power spectra are all close to the value of the imaginary part of the conjugate pair of eigenvalues. We compare the theoretical prediction, from Eq. (2.60), with the power spectra obtained from numerical simulations. These are found after taking the discrete Fourier transform of the time series [70].

2.8 Summary

This chapter contains the technical details we will require in the rest of this thesis. We have shown how the master equation is derived, and how information about the stochastic process it describes can be obtained, using both numerical and analytical approaches. In Section 2.4 we described the van Kampen expansion, which we used to quantify the fluctuations around a stable fixed point. Results from the expansion were then used to calculate the power spectra of these fluctuations. This expansion technique will be the subject of the next two chapters. The technique is well understood and is not particularly difficult to apply, but the procedure can be quite tedious for the more complicated, more realistic systems we wish to study. Therefore, we will look at how to automate the procedure, so that once a reaction system is described, computer software can carry out the relevant linear algebra. In Chapter 5 we will use the techniques required to compute the power spectra


Figure 2.4: Time series showing stochastic oscillations in the number of Y_1 (top) and Y_3 (bottom) molecules, generated using the Gillespie algorithm. The red, dashed line indicates the fixed-point value. The values of the reaction parameters are: g_1 = 0.2 nM^{-1}s^{-1}, g_2 = 0.2 nM^{-1}s^{-1}, g_3 = 0.2 nM^{-1}s^{-1}, g_4 = 0.002 nM^{-1}s^{-1}, g_5 = 0.15 s^{-1} and g_6 = 0.01 s^{-1}. The compartment volume is 10^{-12} l.


Figure 2.5: Power spectra of the stochastic oscillations for species Y_1 (top), Y_2 (middle) and Y_3 (bottom). The theoretical prediction (orange) is compared to the spectrum obtained from numerical simulations (blue), averaged over 3000 runs of the Gillespie algorithm.

of stochastic oscillations to examine how two stochastic oscillators in, say, two neighbouring cells can synchronise under certain conditions.

Chapter 3

Fluctuation Analysis and COPASI

In this chapter we will outline a formalism which will enable us to describe a biochemical reaction system in a very general way. This will allow us to perform the van Kampen expansion for a general reaction system, with an arbitrary number of chemical species and reactions. This is our starting point for implementing the fluctuation analysis in the software package COPASI. We begin this chapter by describing how reaction systems are constructed and studied in COPASI, as our work will build on several tasks already available in the software package.

3.1 Using COPASI

In this section we will briefly discuss how a reaction system is described in COPASI. The user must list the number of compartments in the model, their volumes, and which chemical species are present in each compartment. It is possible to define a compartment whose volume changes with time, but throughout this work we will only consider compartments of fixed size. The user must also choose the units to be used in the model: for the volume (normally given in litres, but cubic metres may also be chosen), for time, and for the quantity of the chemical species, which is often in moles, though sometimes unscaled molecular numbers are preferable. The symbol '#' is used to denote this unit. It is possible to attach prefixes to these units, e.g. picolitres rather than litres, if desired. The user must also list and describe the possible chemical reactions. Figure 3.1 shows the list of reactions for the system introduced in Section 2.7, and details how these reactions are defined. For this model, the units for volume, time and quantity were chosen to be millilitres, seconds and nanomoles. Once the model has been defined, COPASI uses the description of the reaction kinetics to produce the set of ODEs that describe the deterministic time evolution of the system. If the user wishes to find a fixed-point solution of the ODEs, COPASI sets their time derivatives to zero and solves the resulting non-linear simultaneous equations numerically [48]. As we found in Chapter 2, the stability of a fixed point is determined by calculating the eigenvalues of the system's Jacobian, evaluated at that fixed point. COPASI calculates the Jacobian using finite differences, and obtains the eigenvalues using routines from the numerical linear algebra library LAPACK [71]. These routines can be accessed via the 'Steady-State' task in COPASI. COPASI can perform a number of tasks which give the user information about the properties of the model and how it will behave dynamically. One task that we will use in this and subsequent chapters is the 'Time Course' task, which describes the time evolution of the system. This can be done deterministically (by forward integration of the ODEs), or stochastically. The stochastic simulation employs the Gillespie algorithm, which was outlined in Section 2.3. Variants of the algorithm, such as the Gibson & Bruck algorithm or the τ-leap method, are also available. Other tasks which we will use include 'Mass Conservation', 'Optimisation' and 'Parameter Scanning'. These will be described later in this chapter.

We wish to add an additional task to COPASI so that, once a stable fixed point is found using the Steady-State task, the fluctuations around this fixed point can be quantified using the van Kampen expansion. In Chapter 2 we showed how to do this for a particular model. To automate this procedure, however, we need to show how it is done for a general model, with an arbitrary number of species and reactions. This will be the subject of the next section. At this stage we wish to point out that at the beginning of this project we were unaware of the paper by Elf and Ehrenberg, who had previously formulated the van Kampen expansion in this general way [40]. In this chapter, we shall only consider models with a single compartment: multi-compartment models will be considered in Chapter 4.

Figure 3.1: Top: List of chemical reactions, as displayed in COPASI. Extra reactions may be added by clicking on 'New Reaction'. Bottom: For each reaction, the reaction kinetics, or 'rate law', must be chosen, along with numerical values for the reaction parameters. COPASI calculates the correct units for these parameters.

3.2 Formalism

Here we outline a completely general reaction system. It consists of \hat{K} species, labelled Y_1, \ldots, Y_{\hat{K}}, contained in a cell of volume V. These chemical species are involved in M reactions, which we write in the following form:

r_{11}Y_1 + \ldots + r_{\hat{K}1}Y_{\hat{K}} \longrightarrow p_{11}Y_1 + \ldots + p_{\hat{K}1}Y_{\hat{K}},
\vdots   (3.1)
r_{1M}Y_1 + \ldots + r_{\hat{K}M}Y_{\hat{K}} \longrightarrow p_{1M}Y_1 + \ldots + p_{\hat{K}M}Y_{\hat{K}}.

The numbers r_{i\mu} and p_{i\mu} (i = 1, \ldots, \hat{K}; \mu = 1, \ldots, M) are stoichiometric coefficients, which state the substrates and products of each reaction. These reactions may be written more succinctly as

\sum_{i=1}^{\hat{K}} r_{i\mu} Y_i \longrightarrow \sum_{i=1}^{\hat{K}} p_{i\mu} Y_i, \quad \mu = 1, 2, \ldots, M.   (3.2)

All the reactions above are strictly irreversible; any reversible reaction must be described by two separate irreversible reactions. We introduce the stoichiometry matrix, \nu_{i\mu} \equiv p_{i\mu} - r_{i\mu}, to describe how many particles of species Y_i are gained or lost in reaction \mu. Although there are \hat{K} species present in the reaction system, they may not all be able to vary independently, as mass conservation relations may be present in the system. This means that the population of molecules of a particular species can be expressed as a linear combination of the populations of the other species. To illustrate this, we use the well-known Michaelis-Menten reaction mechanism [72]. A substrate, S, is converted to a product, P, via an enzyme, E. The substrate and enzyme bind reversibly to form a complex, SE, which can be converted to the product P and the enzyme. A constant flux of S molecules is supplied to the system and P molecules are able to leave the system. In the notation above, Y_1 = S, Y_2 = E, Y_3 = P and Y_4 = SE. We write the reactions as

\longrightarrow S,
S + E \longrightarrow SE,
SE \longrightarrow S + E,
SE \longrightarrow P + E,
P \longrightarrow .   (3.3)

The total number of enzyme molecules, i.e. the number of free enzyme molecules plus the number bound in complexes, is fixed: if the number of SE molecules decreases by one, the number of E molecules increases by one. We express the conservation relation as n_2 + n_4 = \beta, where n_2 and n_4 are the numbers of E and SE molecules respectively, and \beta is a constant integer. This means that there is only one independent variable here, not two. In general, if the system contains \Lambda conservation relations, then the dimension of the system can be reduced from \hat{K} to K = \hat{K} - \Lambda. It is necessary to reduce the number of variables in this way to facilitate the linear algebra to be done later. In the Michaelis-Menten system above, \hat{K} = 4 and \Lambda = 1, so K = 3. The general formalism for finding and describing the conservation relations will be given in Section 3.4. To fully specify the general model given in Eq. (3.2), kinetic functions \tilde{f}_\mu(n, V) associated with each reaction \mu need to be given. They are functions of the vector of particle numbers and the volume, V. For the Michaelis-Menten example above we have, enumerating the reactions from top to bottom, \tilde{f}_1 = V k_1, \tilde{f}_2 = k_2 n_1 n_2 / V, \tilde{f}_3 = k_3 n_4, \tilde{f}_4 = k_4 n_4 and \tilde{f}_5 = k_5 n_3. Due to the conservation relations, the vector of particle numbers can be fully described by K species, so we write n = (n_1, \ldots, n_K). However, this vector of particle numbers does not include n_4, so we must rewrite any kinetic functions involving n_4 using the conservation relation. As discussed in Chapter 2, in the large-volume limit the kinetic functions become functions of the species concentrations n_i/V only. The kinetic functions are then denoted by f_\mu(x), where x_i = \lim_{V\to\infty} n_i/V. In this limit the conventional (macroscopic and deterministic) description of the system applies, and a set of ordinary differential equations (ODEs) can be written down to describe it. This has the general form:

\frac{dx_i}{dt} = \sum_{\mu=1}^{M} \nu_{i\mu} f_\mu(x), \quad i = 1, \ldots, K.   (3.4)
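The general scheme of Eqs. (3.2)–(3.4) maps directly onto code: the stoichiometry matrix is ν = p − r, and the deterministic right-hand side is ν f(x). The sketch below encodes the full four-species Michaelis–Menten scheme of Eq. (3.3) in this way, with illustrative rate constants, and checks the enzyme conservation relation.

```python
import numpy as np

# Michaelis-Menten scheme, Eq. (3.3); species order (S, E, P, SE)
# Substrate (r) and product (p) stoichiometries, one column per reaction
r = np.array([[0, 1, 0, 0, 0],
              [0, 1, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 1, 1, 0]])
p = np.array([[1, 0, 1, 0, 0],
              [0, 0, 1, 1, 0],
              [0, 0, 0, 1, 0],
              [0, 1, 0, 0, 0]])
nu = p - r                      # stoichiometry matrix, nu = p - r

k = np.array([1.0, 1.0, 0.5, 0.3, 0.2])   # illustrative rate constants

def f(x):
    """Kinetic functions f_mu(x) for the five reactions."""
    s, e, pr, se = x
    return np.array([k[0], k[1]*s*e, k[2]*se, k[3]*se, k[4]*pr])

def rhs(x):
    """Deterministic ODEs, Eq. (3.4): dx/dt = nu @ f(x)."""
    return nu @ f(x)

# The conservation relation d(x_E + x_SE)/dt = 0 holds for any state
x = np.array([0.4, 0.7, 0.2, 0.3])
assert np.isclose(rhs(x)[1] + rhs(x)[3], 0.0)
```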

These equations for the Michaelis-Menten system are:

\frac{dx_1}{dt} = k_1 - k_2 x_1 (\beta - x_2) + k_3 x_2,
\frac{dx_2}{dt} = k_2 x_1 (\beta - x_2) - (k_3 + k_4) x_2,
\frac{dx_3}{dt} = k_4 x_2 - k_5 x_3,   (3.5)

where the conservation relation, expressed in terms of the chemical concentrations, has been used to eliminate one enzyme species (in Eq. (3.5), x_2 denotes the concentration of the complex SE, with the free enzyme given by \beta - x_2). The long-time behaviour of this system of ODEs is a stable fixed point. To find it, we set the time derivatives to zero and solve the resulting equations simultaneously. The fixed-point values, denoted by an asterisk, are:

x_1^* = \frac{k_1 k_3 + k_1 k_4}{k_2 k_4 (\beta - k_1/k_4)}, \quad x_2^* = \frac{k_1}{k_4}, \quad x_3^* = \frac{k_1}{k_5}.   (3.6)
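The fixed-point expressions of Eq. (3.6) can be checked by direct substitution into Eqs. (3.5); the rate constants and β below are illustrative.

```python
import numpy as np

k1, k2, k3, k4, k5 = 0.3, 1.0, 0.5, 0.4, 0.2
beta = 2.0                       # total enzyme concentration (illustrative)

# Fixed point, Eq. (3.6)
x1 = (k1*k3 + k1*k4) / (k2*k4*(beta - k1/k4))
x2 = k1 / k4
x3 = k1 / k5

# Substitute into the ODEs, Eq. (3.5): all time derivatives must vanish
dx1 = k1 - k2*x1*(beta - x2) + k3*x2
dx2 = k2*x1*(beta - x2) - (k3 + k4)*x2
dx3 = k4*x2 - k5*x3
assert np.allclose([dx1, dx2, dx3], 0.0)
```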

As outlined in Chapter 2, our starting point for a stochastic description of the reaction system is the chemical master equation, which specifies how the probability P(n, t) that the system is in state n at time t changes with time. If T_\mu(n|n') is the transition rate from state n' to state n associated with reaction \mu, then the master equation takes the form

\frac{dP(n,t)}{dt} = \sum_{\mu=1}^{M} \left[ T_\mu(n|n-\nu_\mu)P(n-\nu_\mu, t) - T_\mu(n+\nu_\mu|n)P(n, t) \right],   (3.7)

where \nu_\mu = (\nu_{1\mu}, \ldots, \nu_{K\mu}) is the stoichiometric vector corresponding to reaction \mu. This completely defines the stochastic dynamics of the system once the initial condition P(n, 0) is given. If we multiply Eq. (3.7) by n and sum over all possible values of n, one finds, after the change of variable n \to n + \nu_\mu in the first term,

\frac{d\langle n(t) \rangle}{dt} = \sum_{\mu=1}^{M} \nu_\mu \langle T_\mu(n+\nu_\mu|n) \rangle.   (3.8)

Dividing Eq. (3.8) by V and taking the limit V \to \infty, we see that we recover the deterministic description given by Eq. (3.4) if we make the identification f_\mu(x) = \lim_{V\to\infty} V^{-1} \langle T_\mu(n+\nu_\mu|n) \rangle. To obtain more detail than the macroscopic description provides, we need to employ an approximation scheme which can be applied in a systematic way. In Chapter 2 we outlined the system-size expansion of van Kampen, which allows one to calculate corrections to the deterministic results in powers of V^{-1/2} by writing n/V = x + \xi/\sqrt{V}, where x is found by solving Eq. (3.4). This leads to a description in which the fluctuations are Gaussian and centred on the solution of the system's macroscopic equations. As described in the previous chapter, the fluctuations are described by a Fokker-Planck equation,

\frac{\partial \Pi}{\partial t} = -\sum_{i=1}^{K} \frac{\partial}{\partial \xi_i}\left(M_i(\xi)\Pi\right) + \frac{1}{2}\sum_{i,j=1}^{K} B_{ij} \frac{\partial^2 \Pi}{\partial \xi_i \partial \xi_j},   (3.9)

where \Pi(\xi, t) replaces P(n, t) as the probability distribution after the change of variables is made. The vector M_i(\xi) is linear in \xi, and may be written as M_i(\xi) = \sum_{j=1}^{K} A_{ij}\xi_j. The entire dynamics is characterised by two matrices, A and B, whose form for a general reaction system will now be found. These general forms can be found with the aid of the step operators, \mathbb{E}, which were introduced in Section 2.4. For the reaction labelled by \mu, the relevant combination of step operators can be written as \prod_{i=1}^{K} \mathbb{E}^{-\nu_{i\mu}} [40], which enumerates how the molecular population of each species is altered by one reaction event of type \mu. After the change of variables is introduced, these step operators can be written as an infinite series of derivatives, as shown in Section 2.4. The product can then be written as

\prod_{i=1}^{K} \mathbb{E}^{-\nu_{i\mu}} = \prod_{i=1}^{K} \left( 1 - \frac{\nu_{i\mu}}{\sqrt{V}}\frac{\partial}{\partial \xi_i} + \frac{\nu_{i\mu}^2}{2V}\frac{\partial^2}{\partial \xi_i^2} + \ldots \right).   (3.10)

Although the product runs over all species, species i only contributes to the expression if \nu_{i\mu} is non-zero, i.e. if the population of i molecules is affected by reaction \mu. With the product operator in this form, we can perform the expansion of the master equation and identify the contributions to the matrices A and B from this reaction. We follow the same procedure carried out for the example in Section 2.4. The product of step operators in Eq. (3.10) is used to rewrite the right-hand side of the master equation, as we did in Eq. (2.23). The contribution from reaction \mu is

\left[ \left( \prod_{i=1}^{K} \mathbb{E}^{-\nu_{i\mu}} \right) - 1 \right] T_\mu(n+\nu_\mu|n) P(n, t).   (3.11)

After the change of variables (n/V = x + \xi/\sqrt{V}) is made, the right-hand side of the master equation can be rewritten using Eq. (3.10), noting that the transformed transition rates can be expressed in terms of the f_\mu(x). We recall that terms of order V^{1/2} contribute to the deterministic description of the system (the ODEs) in Eq. (3.4), and terms of order V^0 contribute to the Fokker-Planck equation in Eq. (3.9). We can now write down expressions for the entries of the matrices A and B:

A_{ij}(x) = \sum_{\mu=1}^{M} \nu_{i\mu} \frac{\partial f_\mu(x)}{\partial x_j}, \qquad B_{ij}(x) = \sum_{\mu=1}^{M} \nu_{i\mu} \nu_{j\mu} f_\mu(x),   (3.12)

where we have summed over \mu to collect terms from all M reactions. By comparing the entries of the matrix A with the ODEs in Eq. (3.4), we can see that A is identical to the Jacobian matrix. The explicit forms of these matrices for the Michaelis-Menten example are

A = \begin{pmatrix} -k_2(\beta - x_2) & k_2 x_1 + k_3 & 0 \\ k_2(\beta - x_2) & -k_2 x_1 - (k_3 + k_4) & 0 \\ 0 & k_4 & -k_5 \end{pmatrix},

B = \begin{pmatrix} k_1 + k_2 x_1(\beta - x_2) + k_3 x_2 & -k_2 x_1(\beta - x_2) - k_3 x_2 & 0 \\ -k_2 x_1(\beta - x_2) - k_3 x_2 & k_2 x_1(\beta - x_2) + (k_3 + k_4) x_2 & -k_4 x_2 \\ 0 & -k_4 x_2 & k_5 x_3 + k_4 x_2 \end{pmatrix},   (3.13)

where the parameter associated with the conservation relation, \beta, has been converted into a concentration. In general, these matrices will depend on time, through the time-varying solutions of the ODEs. However, as mentioned in Chapter 2, we will be interested in fluctuations about the steady state. In terms of the deterministic dynamics, Eq. (3.4), the solution x(t) is replaced by its fixed-point value x^*, found for the Michaelis-Menten example in Eq. (3.6), and so the A and B matrices are independent of time. We describe these fluctuations in terms of their covariances, \Xi_{ij}. The ansatz used in the van Kampen expansion implies that the first moment of the fluctuations, \langle \xi_i(t) \rangle, is zero^1. Therefore, the covariances simplify to \Xi_{ij} \equiv \langle \xi_i(t)\xi_j(t) \rangle. As described in Section 2.5, the matrix of covariances is the solution to the Lyapunov equation

A\Xi + \Xi A^T + B = 0.   (3.14)
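Before moving on, the construction of A and B via Eq. (3.12) can be verified numerically for the reduced Michaelis–Menten system (here x = (S, SE, P), with the free enzyme given by β − x₂, as in Eqs. (3.5) and (3.13)). The sketch below builds B as a weighted sum of outer products of the stoichiometric vectors and approximates A by a finite-difference Jacobian, as COPASI itself does; the rate constants are illustrative.

```python
import numpy as np

k1, k2, k3, k4, k5 = 0.3, 1.0, 0.5, 0.4, 0.2
beta = 2.0                                  # illustrative values

# Reduced system x = (S, SE, P), with E eliminated via E = beta - SE
nu = np.array([[1, -1, 1, 0, 0],
               [0, 1, -1, -1, 0],
               [0, 0, 0, 1, -1]])

def f(x):
    """Reduced kinetic functions for the five reactions of Eq. (3.3)."""
    x1, x2, x3 = x
    return np.array([k1, k2*x1*(beta - x2), k3*x2, k4*x2, k5*x3])

def rhs(x):
    return nu @ f(x)

x = np.array([0.54, 0.75, 1.5])             # fixed point for these parameters

# B_ij = sum_mu nu_imu nu_jmu f_mu(x), Eq. (3.12)
B = (nu * f(x)) @ nu.T

# A_ij = sum_mu nu_imu df_mu/dx_j: central-difference Jacobian of the ODEs
h = 1e-6
A = np.column_stack([(rhs(x + h*e) - rhs(x - h*e)) / (2*h) for e in np.eye(3)])

# Compare with the explicit forms of Eq. (3.13)
A_explicit = np.array([[-k2*(beta - x[1]), k2*x[0] + k3, 0.0],
                       [k2*(beta - x[1]), -k2*x[0] - (k3 + k4), 0.0],
                       [0.0, k4, -k5]])
B_explicit = np.array([[k1 + k2*x[0]*(beta - x[1]) + k3*x[1],
                        -k2*x[0]*(beta - x[1]) - k3*x[1], 0.0],
                       [-k2*x[0]*(beta - x[1]) - k3*x[1],
                        k2*x[0]*(beta - x[1]) + (k3 + k4)*x[1], -k4*x[1]],
                       [0.0, -k4*x[1], k5*x[2] + k4*x[1]]])
assert np.allclose(A, A_explicit, atol=1e-5)
assert np.allclose(B, B_explicit)
```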

All the matrices in Eq. (3.14) are of dimension K \times K and are independent of time.

^1 Or, more correctly, it is zero to this order in the expansion [35, 50].

For fluctuations in biochemical reaction systems, it is often more useful to describe the fluctuations in terms of particle numbers, rather than the random variable \xi. We call the resulting covariance matrix C, which is defined as

C_{ij} = \langle (n_i - \langle n_i \rangle)(n_j - \langle n_j \rangle) \rangle.   (3.15)

Since the average of the fluctuations is zero, we can write \langle n_i \rangle = V x_i. Relating \xi_i and n_i allows a relation between our two covariance matrices, \Xi and C, to be found:

C_{ij} = \left\langle \left(\sqrt{V}\,\xi_i\right)\left(\sqrt{V}\,\xi_j\right) \right\rangle = V \Xi_{ij}.   (3.16)

The Lyapunov equation analogous to Eq. (3.14) is therefore

AC + CA^T + VB = 0.   (3.17)

As discussed in the previous chapter, there are a number of ways of solving this matrix equation; in this application, we shall favour the Bartels-Stewart algorithm [61]. We wish to automate this procedure in the software package COPASI so that, once a reaction system is described, the covariance matrix for the fluctuations around a stable fixed point (if one exists and can be found) can be obtained without the lengthy calculations encountered in the example in Chapter 2. In order to do this, COPASI must perform the following steps:

(i) Check the reaction system for reversible reactions, which must be split into pairs of irreversible reactions. For a large class of models, COPASI can do this automatically; otherwise, they must be split manually

(ii) Use the deterministic description of the model (ODEs) to find a fixed point, by setting the time derivative in each equation to zero, and solving the resulting equations simultaneously

(iii) Detect any conservation relations present, and use these to reduce the num- ber of variables in the system

(iv) Evaluate the Jacobian matrix (of the reduced system) at the fixed point, and find its eigenvalues. If they all have negative real parts, the fixed point is stable

(v) Calculate the matrices A and B, which characterise the Fokker-Planck equa- tion given in Eq. (3.9) 3.3. DETAILS OF THE IMPLEMENTATION 59

(vi) Use the values of A and B to solve Eq. (3.17) for the covariance matrix, C. This equation is solved using an implementation of the Bartels-Stewart algorithm, described in Chapter 2, built in to COPASI

(vii) Use the conservation relations (if any) to calculate the covariances of any remaining species

These steps contain a mixture of tasks which COPASI is already able to perform, and tasks which we must add to the software package. Details of the implementation will be given in the next section.
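Steps (ii)–(vii) can be sketched end to end for the reduced Michaelis–Menten system: evaluate the fixed point, build A and B, solve Eq. (3.17) for C, and recover the covariances of the eliminated enzyme from the conservation relation. The parameter values are illustrative, and a direct Kronecker-product solve stands in for the Bartels–Stewart routine used in COPASI.

```python
import numpy as np

k1, k2, k3, k4, k5 = 0.3, 1.0, 0.5, 0.4, 0.2
beta, V = 2.0, 1000.0                       # illustrative values

# (ii) Fixed point of the reduced system x = (S, SE, P), from Eq. (3.6)
x = np.array([(k1*k3 + k1*k4) / (k2*k4*(beta - k1/k4)), k1/k4, k1/k5])

# (iv)-(v) A (Jacobian) and B at the fixed point, Eq. (3.13)
A = np.array([[-k2*(beta - x[1]), k2*x[0] + k3, 0.0],
              [k2*(beta - x[1]), -k2*x[0] - (k3 + k4), 0.0],
              [0.0, k4, -k5]])
B = np.array([[k1 + k2*x[0]*(beta - x[1]) + k3*x[1],
               -k2*x[0]*(beta - x[1]) - k3*x[1], 0.0],
              [-k2*x[0]*(beta - x[1]) - k3*x[1],
               k2*x[0]*(beta - x[1]) + (k3 + k4)*x[1], -k4*x[1]],
              [0.0, -k4*x[1], k5*x[2] + k4*x[1]]])
assert np.all(np.linalg.eigvals(A).real < 0)   # fixed point is stable

# (vi) Solve A C + C A^T + V B = 0 via the Kronecker recasting
K = 3
P = np.kron(np.eye(K), A) + np.kron(A, np.eye(K))
C = np.linalg.solve(P, -V * B.flatten(order='F')).reshape((K, K), order='F')
assert np.allclose(A @ C + C @ A.T + V * B, 0.0, atol=1e-8)

# (vii) Covariances of the eliminated enzyme E, using n_E = const - n_SE:
# Var(n_E) = Var(n_SE), and Cov(n_E, n_i) = -Cov(n_SE, n_i) for i != E
var_E = C[1, 1]
cov_E_S = -C[0, 1]
```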

3.3 Details of the implementation

In the previous section we listed the steps COPASI must perform to calculate the fluctuations around the fixed point. COPASI is already capable of some of these tasks; for example, it can find the fixed points of a reaction system and check their stability by calculating the Jacobian. In the previous section, we found that the Jacobian matrix is identical to the matrix A which appears in the Fokker-Planck equation. Therefore we only need to add an algorithm for the calculation of the matrix B, and the Fokker-Planck equation is defined. I (the author of this thesis) wrote a simple algorithm in C++ describing how B is calculated. This also ensured that the theoretical description of the reaction system matched the manner in which the system is described in COPASI. This guided the implementation of the calculation in COPASI, which was performed by Jürgen Pahle. The C++ code is displayed in Appendix B. The code is split into two files: one, a header file, describes a particular reaction system; this file is the input for the second, which calculates the numerical values of the entries of the matrix B. I was involved in the COPASI implementation in an advisory capacity. For example, at the beginning of the project we were unaware of the consequences of the conservation relations for the linear algebra needed to solve the Lyapunov equation. Therefore, I had to consider how B should be calculated when conservation relations are present, and how the covariance matrix for the full system should be recovered from the covariance matrix for the reduced system (with the conserved quantities removed). Jürgen Pahle and I also had to decide which algorithm would be the most suitable for solving the Lyapunov equation. I carried out an investigation into how ill-conditioning could affect the reliability of the solution to the Lyapunov equation, some details of which are found in Appendix A. I was also responsible for testing the implementation, by comparing the results found by COPASI with those I obtained myself, either by hand or computationally.

Several of the steps in the list above require knowledge and manipulation of the conservation relations. The next section describes how these relations are found, and how the dependent species are eliminated from the system.

3.4 Conservation relations

To show how we deal with conservation relations, we return to the Michaelis-Menten reaction system described in the previous section. The deterministic description of the model is given by ODEs for the four species:

\frac{d}{dt}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 1 & -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & -1 \\ 0 & 1 & -1 & -1 & 0 \end{pmatrix} \begin{pmatrix} k_1 \\ k_2 x_1 x_2 \\ k_3 x_4 \\ k_4 x_4 \\ k_5 x_3 \end{pmatrix}.   (3.18)

As already mentioned, the conservation relation in this system stems from the fact that the overall amount of enzyme is constant. The rows of the stoichiometry matrix, \nu_{i\mu}, corresponding to these species are the second and the fourth, which are clearly linearly dependent, since we can write

\frac{d}{dt}(x_2 + x_4) = 0.   (3.19)

Therefore, we wish to reduce the stoichiometry matrix so that it contains only linearly independent rows. We do this using ideas already implemented in COPASI, due to Reder [73] and Vallabhajosyula et al. [74]. Each row of the 'full' stoichiometry matrix, denoted by \nu^{(f)}, should be a linear combination of the rows of the 'reduced' matrix, \nu. Following Reder [73], we relate these two matrices via a link matrix, L, such that \nu^{(f)} = L\nu. By inspection we find

\begin{pmatrix} 1 & -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & -1 \\ 0 & 1 & -1 & -1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix} \begin{pmatrix} 1 & -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & -1 \end{pmatrix}.   (3.20)

The full stoichiometry matrix, of size K̂ × M, is written as the product of the link matrix and the reduced stoichiometry matrix, which is of size K × M. So, we can describe the reaction system with our reduced equations, along with the conservation relation(s). This is useful, as a square matrix with linearly dependent rows has at least one zero eigenvalue [66]. Recall that, for a stable system, the Jacobian evaluated at the fixed point has eigenvalues whose real parts are all strictly less than zero, but this is only the case if the linearly dependent rows have been removed. In addition to this, in Chapter 2 we showed that the solution to the Lyapunov equation is only unique if no pair of eigenvalues of A sums to zero, a condition which fails if A has one or more zero eigenvalues. So, to automate the fluctuation analysis in COPASI, we need to find the matrices A and B in their reduced form, and use these to solve the Lyapunov equation for the reduced K × K covariance matrix. The complete, K̂ × K̂ covariance matrix, which we label C^(f), can then be constructed with the aid of the conservation relations. This will be described below.

In all of what follows, we write the state vector n with the K = K̂ − Λ independent species first, with the dependent species filling the remaining Λ positions. This anticipates 'shortening' n to contain K elements, rather than K̂. Using the conservation relations, a dependent variable can be expressed in terms of the K independent variables. The conservation relations will always be linear [73] and so, in general, may be written as:

\[
n_j = c_j + \sum_{k=1}^{K} \alpha_{jk} n_k, \qquad j = K+1, \ldots, \hat{K}, \qquad (3.21)
\]
where the c_j and α_jk are integer constants. In our Michaelis-Menten example,

α_42 = −1 and the remaining α_jk are zero. If the total number of enzyme molecules is 1000, then c_4 = 1000. To see how the conservation relations can be used to find covariances involving dependent species, we make the change of variables introduced in the van Kampen expansion (n_j = V x_j + √V ξ_j) in the above equation, which gives
\[
V x_j + \sqrt{V}\,\xi_j = c_j + \sum_{k=1}^{K} \alpha_{jk}\left( V x_k + \sqrt{V}\,\xi_k \right). \qquad (3.22)
\]
The conservation relations should hold in the deterministic limit, so we can write

\[
V x_j = c_j + \sum_{k=1}^{K} \alpha_{jk} V x_k. \qquad (3.23)
\]

Hence, we see that
\[
\xi_j = \sum_{k=1}^{K} \alpha_{jk} \xi_k. \qquad (3.24)
\]
So, once the Lyapunov equation has been solved, and the reduced covariance matrix is found, the relations in Eq. (3.24) can be used to compute the remaining covariances. To demonstrate this, we first calculate Ξ_ij, where i is an independent species and j is a dependent species:

\[
\Xi_{ij} = \langle \xi_i \xi_j \rangle = \Big\langle \xi_i \Big( \sum_{k=1}^{K} \alpha_{jk} \xi_k \Big) \Big\rangle, \qquad (3.25)
\]
since ⟨ξ_i⟩ = ⟨ξ_j⟩ = 0. Now we have an expression for Ξ_ij in terms of known quantities, the covariances of the independent species. Next we calculate Ξ_ij for the case where both i and j are dependent species. We find that in this case

\[
\Xi_{ij} = \langle \xi_i \xi_j \rangle = \Big\langle \Big( \sum_{k=1}^{K} \alpha_{ik} \xi_k \Big) \Big( \sum_{l=1}^{K} \alpha_{jl} \xi_l \Big) \Big\rangle. \qquad (3.26)
\]

Again, we have obtained an expression in terms of known quantities. As before,

C_ij = V Ξ_ij. We can write the link matrix, L, in terms of the α_jk:

1 0  I   ..     .  α(K+1)1 ··· α(K+1)K  L =   =  . . .  , (3.27) 0 1  . .. .     . .  L0 αKˆ 1 ··· αKKˆ where I is the K × K identity matrix. The relation between the reduced and full 3.5. THE LINEAR NOISE APPROXIMATION IN COPASI 63 covariance matrices can then be written as

\[
C^{(f)} = L C L^{T}, \qquad (3.28)
\]
with C the K × K covariance matrix of the reduced system, and C^(f) the covariance matrix for all K̂ species. We now have all the tools we need to implement our method in COPASI, which will be the subject of the next section.

3.5 The linear noise approximation in COPASI

The techniques described in this chapter and in Chapter 2 can now be used to automate the calculations required to estimate the size of the noise around a steady state. This is done using the list of tasks given at the end of Section 3.2. We will illustrate how COPASI does this by returning to the Michaelis-Menten reaction system. Once numerical values have been chosen for the reaction parameters, values for the concentrations at the fixed point can be found, as displayed in Eq. (3.6). COPASI then calculates the matrices A and B, checking the fixed point's stability via the eigenvalues of A, evaluated at the fixed point. If the fixed point is stable, the covariance matrix for the fluctuations can be calculated by solving the Lyapunov equation, Eq. (3.17). COPASI displays the numerical values for the matrices A and B, as well as the covariance matrix. Figure 3.2 shows the numerical values for B and the covariance matrix. The parameter values, along with the exact form of the conservation relation and the compartment volume, are given in the caption. Once the Lyapunov equation has been solved for the reduced system, the covariance matrix for the full system can be found, as described in the previous section. In Table 3.1 the values of the covariances predicted by the LNA are compared to those found from simulations (in brackets), using the Gillespie algorithm.

In addition to avoiding lengthy algebra, another advantage of implementing the LNA in COPASI is that the LNA calculations can be used in concert with the other tools available in the software package. Of particular interest are the parameter scan and optimisation tasks. One reason for this is that there is often uncertainty associated with the values chosen for the reaction parameters, which could be inferred from, e.g., experimental data. In COPASI, the parameter scan task is used to repeatedly perform a calculation, e.g. a steady-state calculation, for

Figure 3.2: The COPASI display for the LNA task, showing the diffusion matrix (top) and the covariance matrix (bottom). Reaction parameters were chosen to be k1 = 0.2 nM s^-1, k2 = 4 nM^-1 s^-1, k3 = 3 s^-1, k4 = 1 s^-1, k5 = 0.15 s^-1. The system volume was 10^-12 l. The total number of enzyme molecules was set at 220.

      S                SE               P                E
S     1455.8 (1455.6)  61.4 (61.2)      33.5 (32.8)      -61.4 (-61.2)
SE    61.4 (61.2)      59.1 (59.1)      -4.4 (-4.4)      -59.1 (-59.1)
P     33.5 (32.8)      -4.4 (-4.4)      773.9 (773.4)    4.4 (4.4)
E     -61.4 (-61.2)    -59.1 (-59.1)    4.4 (4.4)        59.1 (59.1)

Table 3.1: The covariance matrix C for the Michaelis-Menten reaction system. The covariances calculated using the LNA are compared with those obtained from simulation (values in brackets), using 10^4 time series generated with the Gillespie algorithm in COPASI.

a prescribed range of parameter choices, allowing the user to see how the system's behaviour changes with the reaction parameter values. The optimisation task also requires the user to define a parameter range, but instead attempts to find a maximum or minimum value for an objective function (defined by the user) within this parameter range. This objective function could be the value of the concentration of a particular species at a fixed point or, in our case, a particular variance or covariance. COPASI contains a number of different algorithms for this task. These include traditional gradient-based methods, population-based algorithms such as evolutionary algorithms and particle swarm [75, 76], and stochastic searches, such as simulated annealing. If the objective function has several local minima, but the global minimum is sought, the gradient-based techniques may not be suitable, as they can become 'stuck' in a local minimum. In general, however, no one particular method is best for an arbitrary fitness landscape [77].

In addition to studying models defined by the user, COPASI can be used to study models from the literature, provided they can be imported into COPASI. In particular, COPASI can read models written in SBML (Systems Biology Markup Language) [78, 79], which is designed to be the standard language in which to represent models of biological processes. Some of the models used in this thesis were found in the BioModels database, which is an online repository for computational models of biological systems [80, 81]. Many of these models are curated, meaning that the syntax of the SBML file is checked, and the model is compared with the description in the corresponding journal article for consistency.

3.6 A MAPK signalling system

We will now apply our method to a model of signal transduction, to demonstrate how the calculation of the LNA in COPASI can be used with other tasks, namely parameter scan and optimisation. We will use a model of signalling through mitogen-activated protein kinase cascades. Signalling through mitogen-activated protein kinases (MAPK) is involved in a wide range of cellular processes, such as proliferation, differentiation, stress responses and apoptosis. As a result, it is implicated in diseases such as cancer, stroke and diabetes [82]. A number of computational modelling studies have helped to elucidate dynamic properties of the system, such as amplification of signals, noise reduction or switching behaviour [83]. There exist different specific MAPK signalling pathways with different functions, topologies and characteristics [83]. However, in most cases the basic structure is that of a three-tier cascade. This cascade structure significantly enhances the system's sensitivity to external stimuli [84]. The model we will study has such a three-tier cascade. It is due to Kholodenko [84], and is a generic model of an extracellular signal-regulated kinase (ERK) MAPK signalling cascade. The reactions are listed in Table 3.2, and shown schematically in Figure 3.3. There are three conservation laws present, one for each level of the cascade, so the system has five independent variables. Throughout this investigation, we use the conservation relations in [84], given in terms of concentrations (denoted by square brackets): [MKKK] + [MKKK-P] = 100 nM, [MKK] + [MKK-P] + [MKK-PP] = 300 nM and [MAPK] + [MAPK-P] + [MAPK-PP] = 300 nM. Due to a negative feedback loop, the model can exhibit limit cycle behaviour for some parameter values, and a stable steady state for others.
Whilst Kholodenko examined the model in the limit cycle regime [84], we reduced the feedback strength by increasing the kinetic constant KI to 45, so that a stable steady state exists (all other parameter values remain as in the original paper, unless stated). Table 3.3 shows the covariance matrix, C, for the fluctuations around the steady state for a particular choice of reaction parameters and compartment volume. The values obtained from the LNA are compared with those from simulation. In the steady-state regime, quasi-cycles are observed for a range of parameter choices. Temporal oscillations in the concentration of MAPKK are shown in Figure 3.4, along with the corresponding power spectra, obtained from both theory and simulation. The fluctuations for this species are quite small relative to the

Figure 3.3: Reaction Scheme for the Kholodenko model. The dashed lines indicate species modifying the reaction rates.

Chemical Reaction             Reaction Kinetics
MKKK → MKKK-P; MAPK-PP        V1[MKKK]/((1 + ([MAPK-PP]/KI)^n)(K1 + [MKKK]))
MKKK-P → MKKK                 V2[MKKK-P]/(K2 + [MKKK-P])
MKK → MKK-P; MKKK-P           k3[MKKK-P][MKK]/(K3 + [MKK])
MKK-P → MKK-PP; MKKK-P        k4[MKKK-P][MKK-P]/(K4 + [MKK-P])
MKK-PP → MKK-P                V5[MKK-PP]/(K5 + [MKK-PP])
MKK-P → MKK                   V6[MKK-P]/(K6 + [MKK-P])
MAPK → MAPK-P; MKK-PP         k7[MKK-PP][MAPK]/(K7 + [MAPK])
MAPK-P → MAPK-PP; MKK-PP      k8[MKK-PP][MAPK-P]/(K8 + [MAPK-P])
MAPK-PP → MAPK-P              V9[MAPK-PP]/(K9 + [MAPK-PP])
MAPK-P → MAPK                 V10[MAPK-P]/(K10 + [MAPK-P])

Table 3.2: Reaction scheme for the Kholodenko MAPK signalling model: the chemical reactions and their reaction kinetics. A species appearing after a semicolon in a chemical reaction indicates that the rate of that reaction is modified by the presence of this species, as shown explicitly in the reaction kinetics. The reaction parameters chosen by Kholodenko are as follows: V1 = 2.5, n = 1, KI = 9, K1 = 10, V2 = 0.25, K2 = 8, k3 = 0.025, K3 = 15, k4 = 0.025, K4 = 15, V5 = 0.75, K5 = 15, V6 = 0.75, K6 = 15, k7 = 0.025, K7 = 15, k8 = 0.025, K8 = 15, V9 = 0.5, K9 = 15, V10 = 0.5, K10 = 15. See [84] for their units.


Figure 3.4: Oscillations observed in MAPKK, shown by a time series generated from the Gillespie algorithm, together with the power spectrum of the oscillations. The power spectrum was averaged over 5 × 10^3 time series.

fixed point value, adding corrections of less than 10%. It is interesting to see how the magnitude of the fluctuations changes with the reaction parameters. As an example, we used our LNA implementation in COPASI in combination with a parameter scan to investigate how changes in the reaction parameter V2 affect the variance of MKKK (MAPK kinase kinase). Values of V2 were scanned over a certain range and the LNA calculated for each value of V2. The results of this scan are shown in Figure 3.5. Notably, there is a local maximum of 987.7 particles^2 at V2 = 0.32, due to the feedback present in the system.

Then, as V2 increases beyond 0.4, the variance increases steeply. This is because the system undergoes a Hopf bifurcation at V2 = 0.446. In the model, this parameter corresponds to the Vmax of phospho-MKKK dephosphorylation and so refers to the activity of MKKK-phosphatase. Presently, the effect of phosphatases is not often studied in signalling models, as they are not very well characterised at a molecular level. The results here, however, suggest that they have a strong influence on both the system's intrinsic fluctuations and overall dynamics. We also wanted to investigate the conditions under which fluctuations in chemical species at different positions in the signalling cascade become correlated. To achieve this, we used the parameter scan task in COPASI to observe how the covariance of the fluctuations of MKKK and MKK-P, which are on different levels in the cascade, varied with the reaction parameters. Some interesting results were found when reaction parameters V2 and k4 were varied, as the magnitude and sign of the covariance varied significantly over a relatively small area of parameter space. The outcome of this task is shown in Figure 3.6. It shows that there is one local maximum for the covariance within the prescribed parameter space. If such maxima (or minima) are sought in a high-dimensional parameter space, it will generally be easier to find them using an optimisation algorithm, rather than exploring parameter space exhaustively. Using the evolutionary programming algorithm [76], which took 199 seconds to run, 4004 steady-state and LNA evaluations were carried out for different pairs of parameter values. A local maximum of the covariance was found with a value of 4035 particles^2 for

V2 = 0.3226 and k4 = 0.0166, as suggested by Figure 3.6. The algorithm found what would ultimately be its best solution to the optimisation problem after 540 iterations. Figure 3.6 gives a good illustration of the fitness landscapes these optimisation algorithms must explore. As stressed earlier, no one algorithm is preferable for a general fitness landscape. Table 3.4 compares the performance of some of the algorithms provided by COPASI for this particular problem. The gradient-based algorithms, such as the Levenberg-Marquardt algorithm [85, 86], compare well with the more sophisticated options, as the objective function has only one local maximum within the parameter region chosen.

3.7 Summary

In this chapter, we have shown how to calculate the fluctuations around the steady state for a general reaction system, using the linear noise approximation. This was used to automate the analysis, so that the process could be written as a task in the software package COPASI. A model of a signal transduction network due to Kholodenko was used to illustrate how the calculation of the covariance


Figure 3.5: The variance of the MKKK fluctuations around the steady state (in units of particle numbers squared) as a function of the reaction parameter V2. The compartment volume was 10^-14 l.

           MAPKK-P      MAPK-P       MAPKKK       MAPKK        MAPK
MAPKK-P    4025 (4038)  675 (690)    3259 (3272)  1411 (1427)  24 (25)
MAPK-P     675 (690)    1187 (1201)  621 (630)    314 (324)    4 (5)
MAPKKK     3259 (3272)  621 (630)    4603 (4582)  1734 (1747)  22 (23)
MAPKK      1411 (1427)  314 (324)    1734 (1747)  1062 (1089)  11 (12)
MAPK       24 (25)      4 (5)        22 (23)      11 (12)      19 (19)

Table 3.3: Covariance matrix C for the MAPK signalling model. Results obtained from the LNA are compared with those from 5 × 10^3 simulations. Reaction parameters are as in Table 3.2, except for KI = 45, V2 = 0.33 and V9 = 3.8. The compartment volume was 5 × 10^-14 l.

Figure 3.6: Two-dimensional parameter scan of the covariance of the fluctuations of MKKK and MKK-P around their fixed point values, in units of particle numbers squared. The reaction parameter V2 was varied between 0.22 and 0.41 and the parameter k4 between 0.015 and 0.035. The compartment volume was 10^-14 l.

Algorithm                  Optimal Value Found    Functions Evaluated
Particle Swarm             4037.11                100000 (1691)
Simulated Annealing        4037.11                101341 (564)
Levenberg-Marquardt        4037.06                175 (119)
Genetic Algorithm          4033.43                3755 (1705)
Evolution Strategy         4037.11                23904 (2060)
Evolutionary Programming   4037.11                4004 (540)
Nelder-Mead                4037.11                212 (89)

Table 3.4: A comparison of the optimisation algorithms in COPASI. Each algorithm was required to find the maximum value of a covariance over a two-dimensional parameter space. The total number of parameter choices evaluated is shown, along with the number of choices required to find what would turn out to be the optimal solution. The parameter space and the corresponding values of the covariance are given in Figure 3.6.

matrix for the fluctuations in COPASI can be employed in tandem with other tasks in COPASI, i.e. parameter scan and optimisation. The ability to make a quick, accurate estimate of the noise becomes very useful in these circumstances. Without an analytical approximation, the fluctuations would have to be quantified using stochastic simulation, which can be very slow for realistic biochemical reaction systems. Optimisation or parameter scanning require repeated calculations of the covariances for different parameter choices, which compounds the problem. COPASI can also perform an optimisation task subject to one or more constraints. This could be useful if, for instance, the number of particles of a particular species is low, and the user does not want the fixed point value to become too close to zero, where the LNA can lose accuracy. In the optimisation tasks considered here, the objective function has been taken to be a covariance, but this need not be the case. An objective function can be formed from any combination of quantities which COPASI is able to calculate. For example, in certain cases, the Fano factor (the variance divided by the mean value) may be a more useful quantity.

In the work presented so far, we have only considered reaction systems involving one compartment. This is a limitation when studying biochemical models, as many models contain multiple compartments, e.g. two or more cells, or an extracellular reservoir, where molecules can be exchanged between the compartments.
In the next chapter, we extend our analysis so that multi-compartment models may be studied using our methods.

Chapter 4

Multi-compartment LNA

In Chapter 3 we outlined a formalism for quantifying the fluctuations around a steady state for a general biochemical reaction model. This framework was used to automate the calculation of a covariance matrix, characterising these fluctuations, within the software package COPASI. However, our procedure was limited in an important way: we assumed that all the species were contained in a single compartment, e.g. a cell. In the Michaelis-Menten reaction system, molecules were allowed to enter or leave the cell. However, once a molecule had left the cell, it was lost from the system. Similarly, molecules entering the cell were assumed to come from a reservoir, rather than a neighbouring compartment. In many biochemical processes, molecules can be exchanged between cells directly, or cells can exchange molecules with a shared extracellular compartment. Furthermore, eukaryotic cells contain subcellular compartments called organelles. Examples of these include the nucleus, lysosomes and mitochondria [5]. These compartments are extremely small: the volume of a typical mitochondrion is less than 1 fl [49]. Therefore, in this chapter we will extend the formalism of the LNA, so that biochemical models with multiple compartments can be studied in this way, and so that the whole routine can again be automated in COPASI. We will use some very simple models to illustrate these ideas, before applying them to more realistic models of biochemical processes. One of these is a model of metabolic signalling in muscle; the other is a model of yeast glycolysis.


4.1 A two-compartment system

To illustrate how the method applies to systems with more than one compartment, we shall look at a simple system involving the diffusion of molecules between two compartments. Molecules in the first compartment, of volume V1, are labelled D; molecules in the second compartment, of volume V2, are labelled E. We use n1 to denote the number of D molecules, and n2 for the number of E molecules. The reactions and their transition rates are:

\[
D \longrightarrow E, \qquad T_1(n + \nu_1 | n) = k n_1,
\]
\[
E \longrightarrow D, \qquad T_2(n + \nu_2 | n) = k \frac{V_1}{V_2} n_2, \qquad (4.1)
\]
where ν is the stoichiometry matrix, as defined in Section 3.2. There is an extra complication to consider when defining the transition rates for a system with many compartments. That is, how do the transition rates of compartment-crossing reactions scale with the volumes of the compartments? The rates of transitions from one compartment to another should depend on properties of the contact surface between the compartments. These properties, such as the size of the diffusion area or the number of channels, will generally not scale with any compartment volume in practice. Rather, this relation will be different for different reaction systems, and will depend on the geometry of the overall system, and on how the compartments are connected. The transition rates for both reactions considered here are proportional to the concentration of the substrate, and scale with the volume of the first compartment. This could correspond to a situation where the first compartment is located inside the second, so that the contact surface depends only on the inner cell, as illustrated in Figure 4.1. It can be argued that these transition rates should scale with the surface area of this compartment, rather than with its volume. However, for simplicity, in this work we shall only consider transition rates that scale linearly with volume in this way. We shall discuss this issue in more detail in Section 4.6. There is a conserved quantity present in the system: the total number of molecules, u, is unaffected by each reaction, i.e. n1 + n2 = u. Therefore, the system has only one independent variable, so we can describe the system by either the number of D or E molecules present. Here we choose to use D. We rewrite the transition rate for the second reaction as T2 = k(V1/V2)(u − n1), using


Figure 4.1: The proposed geometry of the two-compartment model. Molecules are able to diffuse through the cell wall of Compartment 1.

the conservation equation. Using the step operators, E, the contributions from each reaction to the master equation, given in Eq. (3.7), may be rewritten as

\[
\text{Reaction 1:} \quad (\mathbb{E}^{+1} - 1)\left[ k n_1 P(n, t) \right],
\]
\[
\text{Reaction 2:} \quad (\mathbb{E}^{-1} - 1)\left[ k \frac{V_1}{V_2} (u - n_1) P(n, t) \right]. \qquad (4.2)
\]

We make the usual change of variables for the van Kampen expansion,

\[
\frac{n_1}{V_1} = x + \frac{\xi}{\sqrt{V_1}}. \qquad (4.3)
\]

After this change of variables, the step operator may be rewritten as

\[
\mathbb{E}^{\pm 1} = 1 \pm \frac{1}{\sqrt{V_1}} \frac{\partial}{\partial \xi} + \frac{1}{2 V_1} \frac{\partial^2}{\partial \xi^2} \pm \cdots. \qquad (4.4)
\]

The master equation is then rewritten in terms of the new variables and terms of the same order are equated. The first order terms recover the macroscopic description of the system,

\[
\frac{dx}{dt} = k\left[ \alpha^2 (\tilde{u} - x) - x \right], \qquad (4.5)
\]

where α^2 = V1/V2 and ũ = lim_{V1→∞} u/V1. Thus, the volume ratio becomes an extra parameter of the problem. The macroscopic equation has a stable fixed point, which we shall denote by x = x*. The second order terms define a Fokker-Planck equation for Π(ξ, t), the probability distribution of the random variable ξ,
\[
\frac{\partial \Pi}{\partial t} = -\frac{\partial}{\partial \xi} \left( A \xi \Pi \right) + \frac{1}{2} B \frac{\partial^2 \Pi}{\partial \xi^2}. \qquad (4.6)
\]
This equation is fully characterised by A and B, which are functions of the macroscopic concentration of the chemical species, x(t). In this example it is found that they are

\[
A = -k(\alpha^2 + 1), \qquad B = kx + k\alpha^2 (\tilde{u} - x). \qquad (4.7)
\]

The general form of A and B will be discussed in Section 4.2. For models with K > 1, the Fokker-Planck equation will be multivariate, and A and B will be square matrices of dimension K. The K × K matrix of covariances satisfies the following Lyapunov equation

\[
A \Xi + \Xi A^{T} + B = 0, \qquad (4.8)
\]
which was introduced in Chapter 2. For our one-dimensional example, Ξ is just the variance of the random variable ξ, i.e. Ξ = ⟨(ξ − ⟨ξ⟩)(ξ − ⟨ξ⟩)⟩. As discussed in our previous work, we are more interested in the variance in terms of the molecular number n1, rather than ξ. That is, the desired form of the variance, C, in this example is C = ⟨(n1 − ⟨n1⟩)(n1 − ⟨n1⟩)⟩. Using the calculation performed in

Section 3.2, it is possible to link the two quantities by C = V1 Ξ. The general form of the relationship between C and Ξ in a model with many compartments will be discussed in the next section. For one-dimensional problems, Eq. (4.8) can be trivially rearranged to solve for Ξ, the variance of the fluctuations around the steady state. We can compare this result from theory with that obtained via simulation, using the Gillespie algorithm, as implemented in COPASI. The value for the variance found by using the LNA was 124.0, in units of particle numbers squared, compared to 124.0 (both to one decimal place) obtained from simulation, averaged over 2000 time series. The numerical values used for the parameters in this example are as follows: V1 = 0.1, V2 = 1 (both in picolitres), k = 0.2 s^-1, and the total number of molecules in the system was 1500. The volumes chosen could correspond to a eukaryotic cell and its nucleus.

4.2 The general case

As mentioned in the previous section, the A and B matrices are functions of the macroscopic concentrations. But the transition rates that define the master equation depend on the molecular populations, which are discrete quantities. So, to define the general form of A and B we must define a macroscopic quantity

F_μ(x) that corresponds to the transition rate T_μ(n + ν_μ|n) for reaction μ. To do this, we simply make the replacement n_i/V^(i) → x_i, since ⟨n_i⟩/V^(i) becomes equal to x_i in the thermodynamic limit. Here, V^(i) denotes the volume of the compartment within which species i is located. In this chapter, all volumes written with bracketed superscripts will be defined in this way; volumes with subscripts denote the actual volume of a compartment. The general forms of A and B will look very similar to those given in Chapter 3, except for extra volume factors, which are picked up from the step operators, as shown in Eq. (4.4). In one-compartment models these volume factors cancel. We write the step operator associated with species i as

\[
\mathbb{E}_i^{\pm 1} = 1 \pm \frac{1}{\sqrt{V^{(i)}}} \frac{\partial}{\partial \xi_i} + \frac{1}{2 V^{(i)}} \frac{\partial^2}{\partial \xi_i^2} \pm \cdots. \qquad (4.9)
\]

We proceed as we did in Section 3.2. That is, we collect the terms of order V^0, which form a Fokker-Planck equation. When doing this, it is important to note that the transition rates and, therefore, the F_μ(x) are of order V. The general forms of the matrices A and B for multi-compartment models are found to be

\[
A_{ij}(x) = \sum_{\mu=1}^{M} \frac{\nu_{i\mu}}{\sqrt{V^{(i)} V^{(j)}}} \frac{\partial F_\mu(x)}{\partial x_j}, \qquad i, j = 1, \ldots, K,
\]
\[
B_{ij}(x) = \sum_{\mu=1}^{M} \frac{\nu_{i\mu} \nu_{j\mu}}{\sqrt{V^{(i)} V^{(j)}}} F_\mu(x), \qquad i, j = 1, \ldots, K. \qquad (4.10)
\]

We note at this point that, to perform the expansion in this way, we have assumed that the volumes of the compartments are of comparable order. This is because terms of order e.g. 1/√V1 and 1/√V2 are equated when performing the expansion of the master equation.

It is convenient to define the above matrices in terms of the F_μ(x), since these are quantities which COPASI already calculates. In COPASI they are called 'particle fluxes', and are calculated as part of the steady-state task. Once the covariance matrix Ξ is found, we need a way of converting it to C, the covariance matrix in units of particle numbers. We will use a simple example to show how these two matrices are related. We consider an example with two species. The number of molecules of species 1 is n1, and the number of molecules of species 2 is n2. Species 1 is located within a compartment of volume V1, species 2 within a compartment of volume V2. The change of variables given in Eq. (4.3) leads to

\[
n_1 = V_1 x_1 + \sqrt{V_1}\,\xi_1, \qquad n_2 = V_2 x_2 + \sqrt{V_2}\,\xi_2. \qquad (4.11)
\]

The quantity C12 is defined as

\[
C_{12} = \langle (n_1 - \langle n_1 \rangle)(n_2 - \langle n_2 \rangle) \rangle. \qquad (4.12)
\]

From Eq. (4.11) we can express the expectation values of n1 and n2 to be

\[
\langle n_1 \rangle = V_1 x_1, \qquad \langle n_2 \rangle = V_2 x_2. \qquad (4.13)
\]
Using the three equations above, we find that C_12 = √(V1 V2) ⟨ξ1 ξ2⟩, which is equal to √(V1 V2) Ξ_12, where Ξ is the covariance matrix for the random variable ξ. In general, C_ij, the covariance between species i and j in terms of particle numbers, can be written as
\[
C_{ij} = \sqrt{V^{(i)} V^{(j)}}\, \Xi_{ij}. \qquad (4.14)
\]

This relationship can also be expressed as a matrix transformation. For a general system, with K species, the relationship is
\[
C = S \Xi S, \qquad S = \mathrm{diag}\left( \sqrt{V^{(1)}}, \sqrt{V^{(2)}}, \ldots, \sqrt{V^{(K)}} \right). \qquad (4.15)
\]

As mentioned in Chapter 2, the entries of the matrix A are found to be identical to the entries of the Jacobian of the macroscopic system evaluated at the fixed point, for one compartment models. If the general form of the ODEs

is taken to be dx_i/dt = g_i(x), then A_ij = ∂g_i/∂x_j. This is not the case in models with many compartments. However, the correct form of A for multi-compartment models can be found by applying a similarity transformation to the Jacobian, which from now on we will call Ã. The relationship is found to be
\[
A = S \tilde{A} S^{-1}, \qquad S = \mathrm{diag}\left( \sqrt{V^{(1)}}, \sqrt{V^{(2)}}, \ldots, \sqrt{V^{(K)}} \right). \qquad (4.16)
\]

As COPASI is able to calculate the Jacobian, it would be useful to utilise this calculation, instead of performing an extra one. However, there is an additional complication here, since COPASI uses ODEs for the expectation of the number of molecules of the chemical species, rather than for their concentrations. The Jacobian calculated from the former, which we will denote Â, is not identical to the one calculated from the latter, Ã. However, these two matrices are similar, and can easily be related to each other:

\[
\hat{A} = S_2 \tilde{A} S_2^{-1}, \qquad S_2 = \mathrm{diag}\left( V^{(1)}, V^{(2)}, \ldots, V^{(K)} \right). \qquad (4.17)
\]

We will leave the details of this until the next section. We can use this relation, along with the relation linking Ã with A, the desired matrix, to find the relation between Â and A. It is Â = S A S^{-1}. We want to use these matrices to define a Lyapunov equation which can be solved to yield the covariances. One course of action would be to convert Â to the desired form, A, use this to solve Eq. (4.8) for Ξ, and then convert to C, the covariances in terms of particle numbers, using Eq. (4.15). It is more straightforward, however, to define an equivalent Lyapunov equation, involving Â and C,

ÂC + CÂ^T + B̂ = 0, (4.18)

where, in order for the above equation to be equivalent to Eq. (4.8), we make the identification B̂ = SBS. This is the same as defining B in Eq. (4.10) without the square-rooted volume factors in the denominator. This is the form displayed in COPASI. COPASI then solves this equation for C, using the Bartels-Stewart algorithm [61]. Here, we will demonstrate the equivalence of the two Lyapunov equations. We start by rewriting Eq. (4.8) as

AS^−1SΞ + ΞSS^−1A^T + B = 0, (4.19)

Reaction                        Particle Flux
1.  G6 → G1                     F1(x) = k1x6V3
2.  2G1 → 3G2 + 4G3             F2(x) = k2x1²V1
3.  G2 → ∅ (G5 as modifier)     F3(x) = k3x2x5V2
4.  G3 → ∅                      F4(x) = k4x3V2
5.  G4 → G5                     F5(x) = k5x4V1
6.  G5 → G4                     F6(x) = k6x5V1
7.  G1 → G6                     F7(x) = k7x1V3
8.  ∅ → G1                      F8(x) = k8V1

Table 4.1: The three-compartment reaction system.

where the matrix S has its usual form. Next we pre- and post-multiply by S:

SAS^−1SΞS + SΞSS^−1A^T S + SBS = 0. (4.20)

Using Eq. (4.15), Eq. (4.31) and defining B̂ = SBS, we can reduce the above equation to

ÂC + CÂ^T + B̂ = 0, (4.21)

which recovers Eq. (4.18), the form of the Lyapunov equation solved by COPASI. We end this section with another example, this time with three compartments instead of two, to illustrate the natural extension of the method to an arbitrary number of compartments. This three-compartment model has six species. Species G1 and G4 are located in the compartment of volume V1, G2, G3 and G5 in the compartment of volume V2, and G6 in the compartment with volume V3. The number of G1 molecules is denoted by n1, the number of G2 molecules by n2, and so on. The reactions are described in Table 4.1, along with their particle fluxes.

In reaction 3, the rate of degradation of species G2 is now modified by the concentration of G5 within the compartment. We notice that species G4 and

G5 are linearly dependent on each other, and we choose to eliminate G5. The matrices A and B have the following form:

 2  −4k2x1 − k7β 0 0 0 k1β  2   6αk2x1 −k3Γ 0 α k3x2 0    A =  8αk2x1 0 −k4 0 0  , (4.22)    2   0 0 0 −(α k6 + k5) 0  k7β 0 0 0 −k1 4.3. RELATION BETWEEN Aˆ AND A˜ 81

      G1          G2          G3          G4        G6
G1    229 (229)   -48 (-48)   -55 (-55)   0 (0)     -17 (-17)
G2    -48 (-48)   841 (841)   592 (592)   34 (34)   -87 (-88)
G3    -55 (-55)   592 (592)   846 (846)   0 (0)     -68 (-68)
G4    0 (0)       34 (34)     0 (0)       89 (89)   0 (0)
G6    -17 (-17)   -87 (-88)   -68 (-68)   0 (0)     2396 (2393)

Table 4.2: Covariances of the fluctuations around the steady state for the three-compartment system, in units of particle numbers squared. Results obtained from the LNA are compared with those from simulation (in brackets), via the Gillespie algorithm. 10000 time series were generated.

B =
[ β²(k1x6 + k7x1) + 4k2x1² + k8    −6αk2x1²              −8αk2x1²              0             −β(k1x6 + k7x1)
  −6αk2x1²                          9α²k2x1² + k3x2Γ      12α²k2x1²            0              0
  −8αk2x1²                          12α²k2x1²             16α²k2x1² + k4x3     0              0
  0                                 0                     0                    k5x4 + k6Γ     0
  −β(k1x6 + k7x1)                   0                     0                    0              k1x6 + k7x1 ] ,  (4.23)

where Γ = ũ − α²x4, α² = V1/V2 and β² = V3/V1. The species are ordered G1, G2, G3, G4 and G6: species G5 having been eliminated due to conservation. The concentrations x are evaluated at the fixed point values x = x*. COPASI solves the Lyapunov equation for the five-dimensional 'reduced' system, then recovers the full six-dimensional covariance matrix using the conservation relation. Table 4.2 shows the numerical values obtained for the covariances. The parameter values were chosen to be k1 = 0.1 s⁻¹, k2 = 0.02 pl #⁻¹ s⁻¹, k3 = 0.1 pl #⁻¹ s⁻¹, k4 = 2 s⁻¹, k5 = 0.1 s⁻¹, k6 = 0.1 s⁻¹, k7 = 0.1 s⁻¹ and k8 = 50 # pl⁻¹ s⁻¹, where # denotes particle numbers. The compartment volumes are V1 = 8, V2 = 100 and V3 = 72 (in picolitres). Results for species G5 are not given, but may be found from conservation considerations. The conservation relation for this system is n4 + n5 = 1300 molecules.

4.3 Relation between Â and Ã

In Section 4.2 we discussed the relationship between the matrix A, calculated from the van Kampen expansion, and Â, the form of the Jacobian calculated by COPASI using ODEs for the expectation of the particle numbers. Equation (4.17) gives the relationship between Â and Ã, the Jacobian calculated by using ODEs for the concentrations of the chemical species. We will highlight the differences between these quantities with a simple example, just considering reactions 1 and 7 in Table 4.1. These reactions make the following contributions to the macroscopic rate equations,

dx1/dt = (V3/V1)(k1x6 − k7x1),
dx6/dt = k7x1 − k1x6. (4.24)

From the above equations, the entries of Ã are found to be

Ã11 = −β²k7,   Ã16 = β²k1,
Ã61 = k7,      Ã66 = −k1, (4.25)

where β² = V3/V1. We can rewrite Eq. (4.24) in terms of ⟨n1⟩ and ⟨n6⟩:

d⟨n1⟩/dt = k1⟨n6⟩ − k7(V3/V1)⟨n1⟩,
d⟨n6⟩/dt = k7(V3/V1)⟨n1⟩ − k1⟨n6⟩. (4.26)

For the equations above, the elements of Â are

Â11 = −β²k7,   Â16 = k1,
Â61 = β²k7,    Â66 = −k1. (4.27)

By considering how the xi vary compared to the ⟨ni⟩, it is possible to find the following relation by inspection:

Âij = (V^(i)/V^(j)) Ãij. (4.28)

Again, this may be written as a matrix transformation. For a general system,

Â = S2ÃS2^−1,   S2 = diag(V^(1), V^(2), ..., V^(K)). (4.29)

As mentioned in the previous section, a similar relation may be found between A and Ã:

A = SÃS^−1,   S = diag(√V^(1), √V^(2), ..., √V^(K)). (4.30)

Putting all this together, we find a relation between A and Â:

Â = SAS^−1, (4.31)

as stated in the previous section. For convenience, we collect all the relations between matrices used in this chapter and state them below:

C = SΞS,        S = diag(√V^(1), √V^(2), ..., √V^(K)),
A = SÃS^−1,
Â = S2ÃS2^−1,   S2 = diag(V^(1), V^(2), ..., V^(K)),     (4.32)
Â = SAS^−1.
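The relations collected above, and the equivalence of the two Lyapunov equations (4.8) and (4.18), can be verified numerically. The sketch below is illustrative only: it uses an arbitrary stable matrix A and a synthetic symmetric B (not one of the thesis models), and solves the Lyapunov equations by the vectorised Kronecker-product method rather than the Bartels-Stewart algorithm used by COPASI.

```python
import numpy as np

def solve_lyapunov(a, b):
    """Solve a @ x + x @ a.T + b = 0 via the vectorised (Kronecker) form."""
    n = a.shape[0]
    eye = np.eye(n)
    # Column-stacked vec: vec(A X) = (I kron A) vec(X), vec(X A^T) = (A kron I) vec(X)
    m = np.kron(eye, a) + np.kron(a, eye)
    x = np.linalg.solve(m, -b.flatten(order="F"))
    return x.reshape((n, n), order="F")

# Arbitrary stable (Hurwitz) A and symmetric positive B -- placeholder values,
# not taken from any model in this chapter.
A = np.array([[-2.0, 0.5, 0.0],
              [0.3, -1.5, 0.2],
              [0.0, 0.4, -1.0]])
G = np.random.default_rng(1).normal(size=(3, 3))
B = G @ G.T

# Volumes of the compartments hosting each species
V = np.array([8.0, 100.0, 72.0])
S = np.diag(np.sqrt(V))
S_inv = np.diag(1.0 / np.sqrt(V))

# Route 1: solve A Xi + Xi A^T + B = 0 for Xi, then transform, C = S Xi S
Xi = solve_lyapunov(A, B)
C_route1 = S @ Xi @ S

# Route 2: transform first (A-hat = S A S^-1, B-hat = S B S), then solve for C
C_route2 = solve_lyapunov(S @ A @ S_inv, S @ B @ S)

assert np.allclose(C_route1, C_route2)
```

Either route yields the same particle-number covariance matrix C, which is the content of the identification B̂ = SBS made below Eq. (4.18).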

4.4 A model of metabolism in cardiac muscle

In this section we look at a model due to Kongas and van Beek [87] that studies the role of creatine kinase in the heart by examining energy metabolism in cardiac muscle. It is a two-compartment system, with a cytoplasm and an intermembrane space. We shall label the volumes of these compartments as Vc and Vm respectively. In the original article the model is studied deterministically. Here, we will reduce the model in size, without changing the volume ratio used in the article, to study it stochastically. The model is described schematically in Figure 1 of [87]. It involves 5 chemical species: ADP, ATP, creatine (Cr), phosphocreatine (PCr) and inorganic phosphate (Pi). All of these metabolites are present in both compartments, so we have 10 variables. We use a subscript, m, to denote the species in compartment Vm. The reactions are as follows:

ADPm + Pim ⇌ ATPm
ATPm + Crm ⇌ ADPm + PCrm
ATP + Cr ⇌ PCr + ADP
ATP → ADP + Pi
Pim ⇌ Pi
Crm ⇌ Cr
ADPm ⇌ ADP
PCrm ⇌ PCr
ATPm ⇌ ATP. (4.33)

Figure 4.2: Reaction scheme of the creatine kinase model. The scheme was created using CellDesigner [88].

A schematic depiction of the reactions is shown in Figure 4.2. An SBML (Systems Biology Markup Language [78]) implementation of the model is available from the BioModels Database [81, 89]. This file, which can be downloaded and then read by COPASI, corrects an error in the rate equations given in [87]. As for our previous models, not all of the 10 species can vary independently, as there are three conservation relations present. Hence, COPASI reduces the dimensionality

Compartment   Species   Steady-State Value
              ADP       17490
              ATP       10
Vc            Cr        5859
              PCr       13
              Pi        16501
              ADPm      11819
              ATPm      5681
Vm            Crm       5857
              PCrm      15
              Pim       13984

Table 4.3: Description of the creatine kinase model at the steady state. Steady-state values are given in terms of particle numbers, and are rounded to the nearest integer.

of the model to 7. The conservation relations, in terms of molecule numbers, are

ATPm + ADPm + ATP + ADP = C1,
Pi − ADPm + PCr + Pim − ADP + PCrm = C2,
Cr + Crm + PCr + PCrm = C3, (4.34)

where C1, C2 and C3 are integer constants. We chose the values C1 = 35000, C2 = 1204, C3 = 11743 and did not alter the reaction parameters given in [87]. The values for C1, C2 and C3 were chosen to speed up the numerical simulations of the system (which are extremely slow) by reducing the overall number of molecules in the system, whilst ensuring that each species did not get too close to the zero-particle boundary. The deterministic model of the system has a unique steady state, which is described in Table 4.3. We calculated the covariances of the fluctuations around the steady state using the LNA Task in COPASI. Table 4.4 shows these results and compares the values with those obtained from numerical simulation. Only results for 7 of the 10 species present in the model are shown: the covariances for the other species may be obtained from the conservation equations.
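The three relations in Eq. (4.34) can also be obtained mechanically, as left null vectors of the net stoichiometric matrix. The sketch below encodes the scheme (4.33); the species and reaction ordering is chosen here for illustration and is not COPASI's internal ordering.

```python
import numpy as np

# Species order chosen for this sketch
species = ["ADP", "ATP", "Cr", "PCr", "Pi",
           "ADPm", "ATPm", "Crm", "PCrm", "Pim"]
# Net stoichiometry of each reaction (each reversible pair counts once)
reactions = [
    {"ADPm": -1, "Pim": -1, "ATPm": +1},                  # ADPm + Pim <-> ATPm
    {"ATPm": -1, "Crm": -1, "ADPm": +1, "PCrm": +1},      # ATPm + Crm <-> ADPm + PCrm
    {"ATP": -1, "Cr": -1, "PCr": +1, "ADP": +1},          # ATP + Cr <-> PCr + ADP
    {"ATP": -1, "ADP": +1, "Pi": +1},                     # ATP -> ADP + Pi
    {"Pim": -1, "Pi": +1},                                # Pim <-> Pi
    {"Crm": -1, "Cr": +1},                                # Crm <-> Cr
    {"ADPm": -1, "ADP": +1},                              # ADPm <-> ADP
    {"PCrm": -1, "PCr": +1},                              # PCrm <-> PCr
    {"ATPm": -1, "ATP": +1},                              # ATPm <-> ATP
]
N = np.zeros((len(species), len(reactions)))
for j, stoich in enumerate(reactions):
    for name, nu in stoich.items():
        N[species.index(name), j] = nu

def moiety(terms):
    """Build a candidate conservation vector from (species, sign) pairs."""
    v = np.zeros(len(species))
    for name, sign in terms:
        v[species.index(name)] = sign
    return v

# The three conservation relations of Eq. (4.34)
c1 = moiety([("ATPm", 1), ("ADPm", 1), ("ATP", 1), ("ADP", 1)])
c2 = moiety([("Pi", 1), ("Pim", 1), ("PCr", 1), ("PCrm", 1),
             ("ADP", -1), ("ADPm", -1)])
c3 = moiety([("Cr", 1), ("Crm", 1), ("PCr", 1), ("PCrm", 1)])

for c in (c1, c2, c3):
    assert np.allclose(c @ N, 0)          # conserved by every reaction
assert np.linalg.matrix_rank(N) == 7      # 10 species minus 3 relations
```

The rank condition confirms that COPASI's reduction of the model from 10 species to 7 is exactly accounted for by these three conservation relations.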

        ADPm           ATP        Crm            PCr        Pim            ATPm           Cr
ADPm    8407 (8406)    -4 (-4)    1 (1)          0 (0)      1512 (1511)    -2252 (-2252)  -1 (-1)
ATP     -4 (-4)        10 (10)    0 (0)          0 (0)      -5 (-5)        -2 (-2)        0 (0)
Crm     1 (1)          0 (0)      2936 (2936)    -6 (-6)    8 (7)          -1 (-1)        -2922 (-2922)
PCr     0 (0)          0 (0)      -6 (-6)        13 (13)    -7 (-7)        0 (0)          -6 (-6)
Pim     1512 (1511)    -5 (-5)    8 (7)          -7 (-7)    9467 (9469)    -2732 (-2731)  7 (7)
ATPm    -2252 (-2252)  -2 (-2)    -1 (-1)        0 (0)      -2732 (-2731)  4846 (4846)    0 (1)
Cr      -1 (-1)        0 (0)      -2922 (-2922)  -6 (-6)    7 (7)          0 (1)          2936 (2936)

Table 4.4: Covariances for the creatine kinase model. Values for the covariances of the fluctuations around the steady state, in units of particle numbers squared. Results obtained from the LNA are compared with those found from numerical simulation (in brackets), via the Gillespie algorithm. 4000 time series were generated.
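The 'numerical simulation' entries in tables such as Table 4.4 come from averaging Gillespie realisations. The sketch below applies the algorithm to a single birth-death species, a stand-in chosen because its stationary law is known exactly, not the creatine kinase model itself, and checks the time-averaged mean and variance against the analytical prediction.

```python
import numpy as np

# Minimal Gillespie simulation of the birth-death process
#   0 -> X at rate k*V,   X -> 0 at rate d per molecule.
# Placeholder parameters: stationary mean = k*V/d = 50 molecules.
rng = np.random.default_rng(0)
k, V, d = 5.0, 10.0, 1.0
n, t = 0, 0.0
t_end, t_burn = 2000.0, 100.0
acc_t = acc_n = acc_n2 = 0.0

while t < t_end:
    a_birth, a_death = k * V, d * n
    a_total = a_birth + a_death
    dt = rng.exponential(1.0 / a_total)       # waiting time to next event
    if t > t_burn:                            # time-weighted moments after burn-in
        w = min(dt, t_end - t)
        acc_t += w
        acc_n += w * n
        acc_n2 += w * n * n
    t += dt
    # Choose which reaction fires, proportional to its propensity
    n += 1 if rng.uniform() * a_total < a_birth else -1

mean = acc_n / acc_t
var = acc_n2 / acc_t - mean**2
# For this process the stationary law is Poisson, so variance = mean,
# which is also what the LNA predicts here.
assert abs(mean - 50.0) / 50.0 < 0.1
assert abs(var / mean - 1.0) < 0.2
```

For the multi-compartment models in this chapter the same kind of time averaging is performed over many realisations and compared with the LNA covariances.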

4.5 A yeast glycolysis model

We will now look at a model of yeast glycolysis, due to Wolf & Heinrich [33]. In this chapter, we look at a model with one cell, of volume Ω1, which is coupled with an extracellular compartment, of volume Ω2. In their paper, Wolf & Heinrich begin with this model, then progress to a more complicated model with multiple cells, which are coupled via the extracellular compartment. This is done to study the oscillatory behaviour of the metabolites across the population of cells. We will look at this situation in the next chapter. The reaction scheme to be considered here is given in Table 4.5, and is shown schematically in Figure 4.3. Diffusion reactions between the cell and the extracellular compartment were included as it is believed that, in the many-cell case, it is this indirect coupling between cells that causes oscillations in different cells to synchronise, as observed in experiments [31]. We label the chemical concentrations in the following way: (S1, S2, S3, S4, A3, N2, S4ex, A2, N1) = (x1, x2, x3, x4, x5, x6, x7, x8, x9). There are two conservation relations in the reaction system: the concentration of ADP plus the concentration of ATP is fixed, as is the concentration of NAD+

Figure 4.3: Reaction scheme of the yeast glycolysis model. The scheme was created using CellDesigner [88].

Reaction                      Particle Flux
1.  S1 + 2A3 → 2S2 + 2A2      F1(x) = k1x1x5Ω1 / (1 + (x5/K1)^q)
2.  S2 + N1 → S3 + N2         F2(x) = k2x2x9Ω1
3.  S3 + 2A2 → S4 + 2A3       F3(x) = k3x3x8Ω1
4.  S4 + N2 → N1              F4(x) = k4x4x6Ω1
5.  A3 → A2                   F5(x) = k5x5Ω1
6.  N2 + S2 → N1              F6(x) = k6x2x6Ω1
7.  ∅ → S1                    F7(x) = J0Ω1
8.  S4 → S4ex                 F8(x) = κx4Ω1
9.  S4ex → S4                 F9(x) = κx7Ω1
10. S4ex → ∅                  F10(x) = kx7Ω2

Table 4.5: Reaction scheme for the yeast glycolysis model. The variables are defined as follows: S1: glucose, S2: pool of glyceraldehyde 3-phosphate and dihydroxyacetone phosphate, S3: 1,3-bisphosphoglycerate, S4: pool of pyruvate and acetaldehyde within the cell, S4ex: pool of pyruvate and acetaldehyde in the extra-cellular region, A2: ADP, A3: ATP, N1: NAD+, N2: NADH. Some of these variables are combinations of different molecular species. This has been done to simplify the model.

plus the concentration of NADH. We write these relations as: x5 + x8 = u1 and x6 + x9 = u2. The reaction parameters used by Wolf & Heinrich to study the single-cell model with a stable fixed point are: J0 = 3 mM min⁻¹, k1 = 100 mM⁻¹ min⁻¹, k2 = 6 mM⁻¹ min⁻¹, k3 = 16 mM⁻¹ min⁻¹, k4 = 100 mM⁻¹ min⁻¹, k5 = 1.28 min⁻¹, k6 = 12 mM⁻¹ min⁻¹, k = 1.3 min⁻¹, κ = 13 min⁻¹, q = 4, K1 = 0.52 mM, u1 = 4 mM, u2 = 1 mM, ϕ = Ω1/Ω2 = 0.1. We retained these parameter choices, with two exceptions. The stable fixed point for the system described above is very close to the Hopf bifurcation point, which causes difficulties when simulating the system stochastically. Therefore, we reduced the parameter governing the degradation of S4ex, setting k = 0.5 min⁻¹. We also changed the value of J0 to 2 mM min⁻¹. With these parameter choices (as was the case for the original parameters), the Jacobian evaluated at the fixed point has a pair of complex eigenvalues, and quasi-cycles are visible, as shown in Figure 4.4 for species A3. However, we note that the fluctuations are very small, relative to the fixed point values. The covariance matrix for the fluctuations around the fixed point is shown in Table 4.6, in units of 1000 particle numbers squared. Because the species are present in high concentrations, the number of molecules is very large. To aid


Figure 4.4: Stochastic oscillations in the yeast glycolysis model. The concentration of species A3 is plotted against time in minutes.

numerical simulation, which was very slow, we used very small compartment volumes, much smaller than the typical size of a yeast cell. We conclude that in this system little is gained by studying the reaction system stochastically (compared to studying it deterministically), as the fluctuations are tiny, relative to the fixed point values. However, it is useful to be able to determine this analytically, via the LNA, as the stochastic simulations were extremely slow for this system.

        S1               S2               S3             S4             A3               N2             S4ex
S1      491.6 (489.8)    -148.9 (-148.8)  -18.0 (-18.0)  -25.1 (-25.2)  130.6 (129.8)    -12.5 (-12.5)  -36.4 (-36.4)
S2      -148.9 (-148.8)  350.8 (350.7)    27.1 (27.1)    29.7 (29.6)    -379.4 (-379.3)  27.4 (27.4)    -15.6 (-15.6)
S3      -18.0 (-18.0)    27.1 (27.1)      8.4 (8.4)      2.5 (2.5)      -24.2 (-24.2)    5.2 (5.2)      1.6 (1.6)
S4      -25.1 (-25.2)    29.7 (29.6)      2.5 (2.5)      15.7 (15.6)    -15.7 (-15.7)    2.2 (2.2)      0.26 (0.3)
A3      130.6 (129.8)    -379.4 (-379.3)  -24.2 (-24.2)  -15.7 (-15.7)  608.9 (608.7)    -31.7 (-31.7)  70.1 (70.2)
N2      -12.5 (-12.5)    27.4 (27.4)      5.2 (5.2)      2.2 (2.2)      -31.7 (-31.7)    8.5 (8.5)      -1.1 (-1.1)
S4ex    -36.4 (-36.4)    -15.6 (-15.6)    1.6 (1.6)      0.26 (0.3)     70.1 (70.2)      -1.1 (-1.1)    97.4 (97.0)

Table 4.6: Covariance matrix for the fluctuations around the steady state for the yeast glycolysis model, in units of a thousand particle numbers squared (e.g. the variance of species N2, as predicted by the LNA, is 8506 particle numbers squared). Results calculated using the LNA (as implemented in COPASI) are compared with those from numerical simulation (in brackets). Using the Gillespie algorithm, 1000 simulations, each of length 800 minutes, were generated. The compartment volumes were chosen to be Ω1 = 10⁻¹⁶ l and Ω2 = 10⁻¹⁵ l.

4.6 Discussion

In this chapter, we have extended our existing analysis, so that we can use the LNA to quantify fluctuations in biochemical models containing multiple compartments. This significantly increases the range of models to which our analysis can be applied. It is possible to transform a multi-compartment model to a one-compartment model, but this must be done 'by hand' and is time consuming and prone to error. Therefore, it is desirable to study the model in its original form. Some issues remain which have not yet been resolved. Reaction kinetics for cross-compartment reactions need to be carefully considered. In many published models, cross-compartmental reactions are, for convenience, defined so that they scale with volume in the same way as reactions in the 'bulk' of the cell. This is the approach we have taken here. Many biochemical models in the literature are described deterministically, for instance the model studied here due to Wolf & Heinrich. In a stochastic treatment of such a model, it is generally true that this bulk scaling must be used, in order to recover the correct ODEs in the thermodynamic limit. However, in reality this scaling will not always be realistic, as mentioned in Section 4.1. The LNA will work with any particular scaling that is universal for the system. However, a consistent methodology for analysing fluctuations in systems with a mixture of scalings present (e.g. some reactions scale with a cell's surface area and some with its volume) has not yet been proposed. Proposals for further work in this area will be discussed in Chapter 6. It should also be noted that, when describing the reaction systems, no mention of the geometry of the compartments has been made. Any geometrical considerations should be reflected in the specific form of the transition rates chosen by the user.
As we have already touched upon, the LNA for multi-compartment models is technically only valid when the compartment volumes are of comparable order, as terms of order 1/√V1 and 1/√V2 are equated when performing the expansion of the master equation. However, no significant discrepancies between theory and simulation were found due to this issue. It would be interesting to investigate this problem in more detail. In the next chapter, we will use this multi-compartment analysis to see how stochastic oscillations can become synchronised across a population of cells.

Chapter 5

Synchronisation of stochastic oscillators

Oscillatory behaviour is observed in a great variety of biological systems, over a wide range of time periods [22, 24]. These oscillations have been modelled extensively, using both deterministic and stochastic frameworks [29, 90, 91]. Most of this modelling is done on a single-cell level. However, many processes display coherent behaviour over a population of cells. This requires that the oscillations in the individual cells must influence each other. One reason for this is that random fluctuations, intrinsic to these systems, can introduce random phase lags, so a population of isolated cells, demonstrating oscillatory dynamics, will not remain synchronised over time, even if they are synchronised initially. Therefore, for coherent, collective behaviour (such as intercellular signalling), some form of communication between the cells is necessary. Before describing these ideas any further, we should say what we mean by synchronisation. Pikovsky, Rosenblum & Kurths define synchronisation as "an adjustment of rhythms of oscillating objects due to their weak interaction" [92]. Using some reaction systems as examples, we will show the forms that the interaction can take, and what 'weak' means in this context. The role and mechanisms of synchronisation have been much discussed in the study of the dynamics of glycolytic metabolism. Richard et al. have performed experiments in yeast cells, and observed oscillations both in individual cells and populations of cells [30, 31]. The authors conclude that the coupling of the cells via an extracellular metabolite (proposed to be acetaldehyde) causes the oscillations of the individual cells to synchronise. In [31], two cell populations,

originally oscillating 180° out of phase, became synchronised upon coupling, and a time to synchronise was obtained.

Several mathematical models have attempted to capture and explain this behaviour. One of these is due to Wolf & Heinrich, devised in [32] and refined in [33]. They begin by describing a deterministic, one-cell model, where the cell is connected to an extra-cellular compartment, which exhibits either a stable fixed point or a stable limit cycle, depending on the values of the reaction parameters. This two-compartment system was introduced in Chapter 4, and was studied stochastically. Wolf & Heinrich then extended the model to n cells, where the cells are coupled by exchanging molecules with a shared extra-cellular compartment. In [33], the authors show that in a two-cell model, synchronisation with either zero or non-zero phase lag is observed, depending on the reaction parameters chosen. Following the experiments by Richard et al., the authors also mixed two populations of cells which, before mixing, were internally synchronised but out of phase with the other population. Wolf & Heinrich found that, once mixed, the two populations did indeed synchronise, although this took considerably longer to achieve than it did in the experiments. This may suggest that the form of the coupling, or the chemical species chosen to be responsible for the coupling, may not be the correct one. In many models it has been found that the form of the coupling chosen in the model can significantly alter the dynamics observed, as found in [93, 94].

Another system in which synchronisation is often studied involves oscillations in the concentration of adenosine 3',5'-cyclic monophosphate (cAMP) in the amoeba Dictyostelium discoideum [24, 95]. The life cycle of the amoeba commences with the germination of a spore, from which the cell emerges [24]. The cells feed on bacteria in the forest soil, and grow and divide until the food source becomes scarce. During the onset of starvation, the cells alter their behaviour to aggregate, and become able to communicate via cAMP signals, in the form of oscillations. In the early stages of aggregation, certain cells begin to oscillate, attracting neighbouring cells. The neighbouring cells do not exhibit sustained oscillations themselves, but they are able to pass on the signal. This is known as 'relay response', and it has been modelled extensively [96, 24]. The stimulus from the oscillating cell perturbs the neighbouring cell away from its fixed point value. Before relaxing back to the fixed point, the neighbouring cell produces a large, single pulse of cAMP, which can then be passed on to other cells. In

this way, populations of the Dictyostelium cells form large aggregates, which can be as large as 10⁵ cells. This aggregate has a slug-like form, which is capable of movement. The cells in the 'slug' then differentiate, forming spores atop a stalk. This structure is known as a fruiting body. The culmination of the life cycle is the dispersion of spores from the top of the fruiting body, leading back to the start of the life cycle [24]. One reason why this is much studied is that the amoeba is able to switch from uni- to multi-cellular behaviour over the course of the life cycle. Another is that many of the components involved in the cAMP signalling have corresponding components in mammalian pathways, hence the desire to understand this system in more detail. We shall analyse a model of cAMP oscillations later in this chapter. The vast majority of models of such systems found in the literature are described by systems of ODEs, which show limit cycle behaviour. These display oscillations of only a single frequency. Describing the synchrony of stochastic oscillations around a fixed point is slightly more complicated, as these exhibit oscillations over a continuous range of frequencies. Therefore, the mathematical quantities describing this synchronisation (or otherwise) will be a function of frequency, ω. These will be derived from the off-diagonal elements of the power spectral density matrix, Pij(ω). In Chapter 2 we showed how to use the van Kampen expansion to find approximate expressions for the noise present in the system, which, by the van Kampen ansatz, is assumed to be centred on its macroscopic or average behaviour. In a K-dimensional system, the random variable describing the fluctuations is ξi, i = 1, ..., K, which is described by a Langevin equation

dξi/dt = Σj Aij ξj + ηi, (5.1)

where η is the noise term with correlator ⟨ηi(t)ηj(t′)⟩ = 2Bij δ(t − t′). As we found in Chapter 2, the power spectral density matrix (PSDM) is

Pij(ω) = ⟨ξ̃i(ω)ξ̃j*(ω)⟩ = Σ_{l=1}^{K} Σ_{m=1}^{K} (Φ⁻¹)il(ω) Blm ((Φ†)⁻¹)mj(ω), (5.2)

where Φij ≡ −Aij − iωδij. So far in this thesis, we have only considered the diagonal elements of this matrix, which are the power spectra. In the context of epidemics, Rozhnova et al. used the off-diagonal elements to show how stochastic oscillations can become synchronised [97]. They constructed an epidemiological model of a network of cities, where a disease can be transported between cities due to movement of infected commuters. We will use a similar approach, but instead of cities and people we will work with cells and molecules. Unlike the diagonal elements, the off-diagonal elements of the PSDM will in general be complex. Often, these elements are normalised, by using the relevant power spectra. This quantity is known as the complex coherence function (CCF) [70, 98, 97],

Cjk(ω) = Pjk(ω) / √(Pjj(ω)Pkk(ω)). (5.3)

As this is a complex function, it can be expressed in terms of a magnitude and a phase:

|Cjk(ω)| = |Pjk(ω)| / √(Pjj(ω)Pkk(ω)), (5.4)

φjk(ω) = arctan[Im(Cjk(ω))/Re(Cjk(ω))] = arctan[Im(Pjk(ω))/Re(Pjk(ω))]. (5.5)

The magnitude of the CCF tells us the similarity between two signals, as a function of ω. The phase of the CCF tells us the phase lag between two signals [98] for each value of the frequency ω. For example, if the two signals are in phase the CCF will be a real function. These ideas will be illustrated using the reaction system introduced in Section 2.7, which was used to show sustained stochastic oscillations. In this chapter we will be interested primarily in looking at the relationship between oscillations in neighbouring cells, but we will start with this simple, one-compartment model, and look at the relationship between oscillations of two species in the model. The reaction scheme was described in Eq. (2.63). For convenience, we re-state the form of the matrices A and B for this system,

A = [ −g2y2 + g3 − 4g4y1    g1 − g2y1     0
      −g2y2                 −g2y1 − g1    2g5
      g3                     0           −g5 ] ,

 2  4g4y1 + g3y1 + g2y1y2 + g1y2 g2y1y2 − g1y2 g3y1   B =  g2y1y2 − g1y2 g2y1y2 + 4g5y3 + g1y2 −2g5y3  , g3y1 −2g5y3 g3y1 + g5y3 + g6 (5.6) where the gi are the reaction rates and the vector of concentrations, (y1, y2, y3), are evaluated at the fixed point. We use these quantities to calculate the power 96 CHAPTER 5. SYNCHRONISATION OF STOCHASTIC OSCILLATORS

spectral density matrix in Eq. (5.2). The elements of this matrix may then be used to calculate complex coherence functions, such as C12(ω), which tells us the relation between oscillations of species 1 and 2 (Y1 and Y2 in Eq. (2.63)). All the elements of the power spectral density matrix have the same denominator, as can be seen by inspecting Eq. (5.2). Therefore, these denominators cancel when constructing complex coherence functions, the form of which is given by Eq. (5.3). Because the diagonal elements of the PSDM are real, the denominator of the complex coherence functions will be real. In contrast, the numerator will generally be complex. In Section 2.7, we constructed a general form for the diagonal elements of the PSDM, which are written as fractions, with polynomials of ω² in the numerator and denominator. For a system of S species (here S = 3), the highest power present in the numerator is ω^{2(S−1)} and in the denominator it is ω^{2S}. The numerator for the off-diagonal elements of the PSDM will be a polynomial in iω, i.e. odd powers of ω will also appear, and will have an imaginary coefficient. Again, the highest power of ω present in the numerator is ω^{2(S−1)}. We can write

the absolute value of a CCF (here we look at C12(ω)) as

|C12(ω)| = √[ (α4ω⁸ + α3ω⁶ + α2ω⁴ + α1ω² + α0) / (β4ω⁸ + β3ω⁶ + β2ω⁴ + β1ω² + β0) ], (5.7)

where the coefficients αi and βi (i = 0, 1, 2, 3, 4) are made up of combinations of the entries of matrices A and B. Their full form is rather lengthy, and is displayed in Appendix C. This is typical of these functions, which are easier to understand when displayed graphically, as a function of ω. We can use the fact that the coefficients of the even powers of ω will be real, and the coefficients of the odd powers imaginary, to construct the form of the phase lag:

φ12(ω) = arctan[ (γ1ω + γ3ω³) / (γ0 + γ2ω² + γ4ω⁴) ]. (5.8)

Again, the coefficients γi, where i = 0, 1, 2, 3, 4, are constructed from combinations of the entries of matrices A and B. We will now plot some of these quantities as a function of frequency. We use the same parameter values as in Section 2.7

and plot these quantities as functions of ω, looking at species Y2 and Y3, whose power spectra were given in Figure 2.5. The absolute value of the CCF for these two species is given in Figure 5.1. We also plot the CCF parametrically in Figure 5.2. From this figure we see that the CCF has an imaginary, as well as a 97


Figure 5.1: Plot of the absolute value of the CCF for species Y and Z. Theoretical results (orange line) are compared with those from 1000 numerical simulations (blue dots).

real component. From this we conclude that there is a phase lag between these two species. This is shown in Figure 5.3. The phase spectrum can be difficult to interpret, as it has a different value for each frequency. When looking at these plots it is important to bear in mind that the stochastic oscillations being considered here are only significant over a certain frequency range, as displayed in the power spectra plot in Figure 2.5. Therefore, it is the phase lag over this range that is most important. We have highlighted this range in red for Figures 5.2 and 5.3. For completeness, we show a wider range of frequencies in all the plots in this chapter. This also highlights the fact that, despite the relatively low power associated with oscillations of frequencies that are outside of this range, very good agreement between theory and simulation is still found. We note at this stage that we have not used the word 'synchronisation' during this example. We will only use this term to describe oscillations in different cells, which will continue to oscillate if isolated from each other, but when coupled through an extracellular compartment will weakly influence each other. In this one cell example, the oscillations will stop if one species is removed, or its concentration becomes fixed [92].
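Quantities of the kind plotted in Figures 5.1 to 5.3 can be generated directly from Eqs. (5.2) to (5.5), without expanding the polynomial coefficients of Appendix C. In the sketch below, the rate constants gi and the fixed-point concentrations yi are placeholder values, since the Section 2.7 parameter set is not restated in this chapter.

```python
import numpy as np

# Placeholder rate constants and fixed-point concentrations (illustrative
# only; not the Section 2.7 values).
g = {1: 0.2, 2: 0.05, 3: 0.1, 4: 0.01, 5: 0.1, 6: 1.0}
y1, y2, y3 = 10.0, 5.0, 8.0

# Matrices A and B of Eq. (5.6)
A = np.array([[-g[2]*y2 + g[3] - 4*g[4]*y1, g[1] - g[2]*y1, 0.0],
              [-g[2]*y2, -g[2]*y1 - g[1], 2*g[5]],
              [g[3], 0.0, -g[5]]])
B = np.array([[4*g[4]*y1**2 + g[3]*y1 + g[2]*y1*y2 + g[1]*y2,
               g[2]*y1*y2 - g[1]*y2, g[3]*y1],
              [g[2]*y1*y2 - g[1]*y2,
               g[2]*y1*y2 + 4*g[5]*y3 + g[1]*y2, -2*g[5]*y3],
              [g[3]*y1, -2*g[5]*y3, g[3]*y1 + g[5]*y3 + g[6]]])

def psdm(omega):
    """Power spectral density matrix, Eq. (5.2), with Phi = -A - i*omega*I."""
    phi_inv = np.linalg.inv(-A - 1j * omega * np.eye(3))
    return phi_inv @ B @ phi_inv.conj().T

def ccf(omega, j, k):
    """Complex coherence function, Eq. (5.3)."""
    p = psdm(omega)
    return p[j, k] / np.sqrt(p[j, j].real * p[k, k].real)

omega = 0.2
c = ccf(omega, 1, 2)                     # species 2 and 3 (0-indexed)
magnitude = abs(c)                       # Eq. (5.4)
phase = np.arctan2(c.imag, c.real)       # Eq. (5.5)
assert magnitude <= 1.0 + 1e-9           # coherence is bounded by one
```

Sweeping omega over a grid and plotting the magnitude and phase reproduces curves of the type shown in Figures 5.1 and 5.3; the bound |Cjk(ω)| ≤ 1 follows from the PSDM being positive semi-definite.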


Figure 5.2: Parametric plot of the CCF for species Y and Z. Theoretical results (orange line) are compared with those from 1000 numerical simulations (blue dots). Theoretical results for 0.06 ≤ ω ≤ 0.23 are highlighted in red, as this is the frequency range over which the oscillations are most prominent.


Figure 5.3: Plot of the phase spectrum for species Y and Z. Theoretical results (orange line) are compared with those from 1000 numerical simulations (blue dots). Theoretical results for 0.06 ≤ ω ≤ 0.23 are highlighted in red, as this is the frequency range over which the oscillations are most prominent.

5.1 Collective behaviour in Dictyostelium

The first reaction system we shall consider is a model of oscillations in the concentration of adenosine 3',5'-cyclic monophosphate (cAMP), which are observed within the amoeba Dictyostelium discoideum. The model is due to Kim et al. and in their work was used to study the synchrony of stochastic oscillations in the concentration of cAMP across multiple cells via numerical simulation. Previously, the pathway that produces these oscillations had been modelled using a set of non-linear ODEs, which exhibit limit cycle behaviour [24]. Starting from such a model, described in [99], Kim et al. [34] show that considering stochastic oscillations, in addition to those in the limit cycle regime, increases the robustness of the oscillations. That is, oscillations (whether deterministic or stochastic in origin) are observed over a larger volume of parameter space than would be the case if only deterministic oscillations were considered. This is desirable, as any parameter values present in a mathematical model of a biochemical reaction system will have an associated uncertainty. In addition to this, because the physical system involves an aggregation of cells, conditions (e.g. temperature) may vary from cell to cell, so the reaction parameters may not be identical inside each cell. Because of this, Kim et al. made small perturbations to the reaction parameters in different cells to see if the oscillations still synchronised. We start with the one-cell model, which consists of a cell and an extra-cellular environment. The cAMP within the cell is denoted by cAMPi, the cAMP in the extra-cellular environment is denoted by cAMPe. The other chemical species present in the model are adenylyl cyclase (ACA), protein kinase (PKA), mitogen-activated protein kinase (ERK2), the cAMP phosphodiesterase (RegA) and the ligand-bound cell receptor (CAR1). The reaction scheme is:

CAR1 −→ ACA + CAR1      (k1)
ACA + PKA −→ PKA        (k2)
cAMPi −→ PKA + cAMPi    (k3)
PKA −→ ∅                (k4)
CAR1 −→ ERK2 + CAR1     (k5)
PKA + ERK2 −→ PKA       (k6)
∅ −→ RegA               (k7)
ERK2 + RegA −→ ERK2     (k8)


Figure 5.4: Theoretical power spectra for the internal (larger peak) and external cAMP. The peak is around ω = 0.83.

ACA −→ cAMPi + ACA      (k9)
RegA + cAMPi −→ RegA    (k10)
ACA −→ cAMPe + ACA      (k11)
cAMPe −→ ∅              (k12)
cAMPe −→ CAR1 + cAMPe   (k13)
CAR1 −→ ∅               (k14)          (5.9)
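To make the simulation procedure concrete, scheme (5.9) can be simulated directly with the Gillespie algorithm of Section 2.3.1. The sketch below (Python; this is not the COPASI implementation) uses the rate constants quoted in the text. The initial particle numbers are an assumption, chosen only to be of the same order as those seen in Figure 5.8, since the fixed-point values of Figure 1 in [34] are not reproduced here; bimolecular rates (in µM⁻¹min⁻¹) are converted to particle-number units by dividing by Ω = N_A·V·10⁻⁶, the number of particles per µM.

```python
import numpy as np

# Rate constants from the text; units are min^-1 (unimolecular),
# uM^-1 min^-1 (bimolecular) or uM min^-1 (k7, zeroth order).
k = [2.0, 0.882, 2.55, 1.53, 0.588, 0.816, 1.02,
     1.274, 0.306, 0.816, 0.686, 4.998, 22.54, 4.59]
V = 3.672e-14                     # cell volume in litres, as in [34]
OMEGA = 6.022e23 * V * 1e-6      # particles per uM

# Species order: ACA, PKA, ERK2, RegA, cAMPi, cAMPe, CAR1.
# Each reaction in (5.9) changes exactly one population by +1 or -1.
S = np.zeros((14, 7), dtype=int)  # net stoichiometric change per reaction
for r, (sp, d) in enumerate([(0, +1), (0, -1), (1, +1), (1, -1),
                             (2, +1), (2, -1), (3, +1), (3, -1),
                             (4, +1), (4, -1), (5, +1), (5, -1),
                             (6, +1), (6, -1)]):
    S[r, sp] = d

def propensities(n):
    ACA, PKA, ERK2, RegA, cAMPi, cAMPe, CAR1 = n
    return np.array([
        k[0] * CAR1,                  # CAR1 -> ACA + CAR1
        k[1] / OMEGA * ACA * PKA,     # ACA + PKA -> PKA
        k[2] * cAMPi,                 # cAMPi -> PKA + cAMPi
        k[3] * PKA,                   # PKA -> 0
        k[4] * CAR1,                  # CAR1 -> ERK2 + CAR1
        k[5] / OMEGA * PKA * ERK2,    # PKA + ERK2 -> PKA
        k[6] * OMEGA,                 # 0 -> RegA
        k[7] / OMEGA * ERK2 * RegA,   # ERK2 + RegA -> ERK2
        k[8] * ACA,                   # ACA -> cAMPi + ACA
        k[9] / OMEGA * RegA * cAMPi,  # RegA + cAMPi -> RegA
        k[10] * ACA,                  # ACA -> cAMPe + ACA
        k[11] * cAMPe,                # cAMPe -> 0
        k[12] * cAMPe,                # cAMPe -> CAR1 + cAMPe
        k[13] * CAR1])                # CAR1 -> 0

def gillespie(n0, t_end, seed=0):
    rng = np.random.default_rng(seed)
    t, n = 0.0, np.array(n0, dtype=int)
    times, states = [t], [n.copy()]
    while t < t_end:
        a = propensities(n)
        a0 = a.sum()
        if a0 == 0:
            break
        t += rng.exponential(1.0 / a0)  # exponential waiting time
        r = np.searchsorted(np.cumsum(a), rng.random() * a0)  # which reaction fires
        n = n + S[min(r, 13)]
        times.append(t)
        states.append(n.copy())
    return np.array(times), np.array(states)
```

Calling, for example, `gillespie([10_000] * 7, t_end=50)` produces trajectories of the kind plotted in Figure 5.8 (the starting values here are illustrative only).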

Some results from the one-cell model will be shown, before looking at the two-cell model. The reaction parameters are chosen to be those which yielded the steady state in Figure 1 in [34]. They are as follows: k1 = 2 min⁻¹, k2 = 0.882 µM⁻¹min⁻¹, k3 = 2.55 min⁻¹, k4 = 1.53 min⁻¹, k5 = 0.588 min⁻¹, k6 = 0.816 µM⁻¹min⁻¹, k7 = 1.02 µM min⁻¹, k8 = 1.274 µM⁻¹min⁻¹, k9 = 0.306 min⁻¹, k10 = 0.816 µM⁻¹min⁻¹, k11 = 0.686 min⁻¹, k12 = 4.998 min⁻¹, k13 = 22.54 min⁻¹, k14 = 4.59 min⁻¹. In [34] the cell volume was chosen to be 3.672 × 10⁻¹⁴ l. Figure 5.4 shows the power spectra for the internal and external cAMP. Figure 5.5 shows the magnitude of the CCF for these two species, showing a very strong correlation, especially in the frequency range within which the oscillations are significant. The CCF is displayed parametrically in Figure 5.6, and indicates a phase lag. In [34], the authors construct models of many cells, which are all coupled via the external cAMP. A diagram of such a model (with 3 cells), taken from the


Figure 5.5: Magnitude of the (theoretical) CCF for the internal and external cAMP fluctuations.


Figure 5.6: Parametric plot of the (theoretical) CCF for the internal and external cAMP fluctuations. Only results for 0.5 ≤ ω ≤ 1.5 are shown.

Figure 5.7: Diagram from Kim et al. [34], illustrating the three-cell model. The reaction parameters in the diagram correspond to the reactions shown in Eq. 5.9, the superscript denoting the relevant cell.

paper, is shown in Figure 5.7. If the reaction parameters within each cell are identical, the oscillations in the cells become synchronised with zero phase lag. This effect can easily be seen by looking at the time series. This is shown for a three-cell model in Figure 5.8. Synchronisation with a non-zero phase lag can be found when the reaction parameters differ from cell to cell. Here this is illustrated with a two-cell model, where the reaction parameters in one cell are those used for the single-cell model, and the reaction parameters in the second cell are obtained by making small, random perturbations to the original parameters. Below, the relation between the internal cAMP oscillations in each cell is examined, for one such parameter choice. Figure 5.9 shows the power spectra for these two species. Figures 5.10 & 5.11 show the relevant CCF for these two species, whilst Figure 5.12 displays the phase lag present. Although the reaction parameters are different in each cell, there remains a strong shared signal in the oscillations. Throughout this section so far, we have assumed that all of the cells have the same volume. We will now look at the consequences of relaxing this restriction, which could be due to natural variation in the cell size. We find that a phase lag is introduced when the cells are of different sizes. Given that, for this particular


Figure 5.8: Time series for cAMPi (in particle numbers) in different cells for a three-cell model, generated using the Gillespie algorithm. Time is measured in minutes. The three cells are given different initial conditions, but within a short amount of time they oscillate in phase with each other around their common fixed point.


Figure 5.9: Theoretical power spectra for cAMPi in each cell of the two-cell model. The dashed line corresponds to the oscillations in the second cell, for which the reaction parameters are: k1 = 2.3 min⁻¹, k2 = 0.96 µM⁻¹min⁻¹, k3 = 2.1 min⁻¹, k4 = 1.9 min⁻¹, k5 = 0.4 min⁻¹, k6 = 0.72 µM⁻¹min⁻¹, k7 = 0.7 µM min⁻¹, k8 = 1.05 µM⁻¹min⁻¹, k9 = 0.26 min⁻¹, k10 = 0.89 µM⁻¹min⁻¹, k11 = 0.46 min⁻¹, k13 = 15 min⁻¹, k14 = 5.8 min⁻¹. The parameter k12 was set to 9 min⁻¹.


Figure 5.10: The magnitude of the CCF for the cAMPi in each cell of the two-cell model. The solid line is the theoretical result, the dots are from 6163 numerical simulations.


Figure 5.11: Parametric plot of the CCF for the cAMPi in each cell of the two-cell model. The solid line is the theoretical result, the dots are from 6163 numerical simulations. Results are shown for 0 ≤ ω ≤ 3.


Figure 5.12: Phase lag for oscillations of cAMP i in each cell for the two-cell model. The solid line is the theoretical result, the dots are from 6163 numerical simulations. Over the frequency range where oscillations are significant, the phase lag remains fairly constant.

system, it is desirable to have in-phase synchronisation, it is important to see how large these phase lags are for various differences in cell size. To do this, we again studied a two-cell model, in which we fixed the volume of one cell at the volume used previously, which we will call VI, fixed the extracellular region at 5VI, and varied the volume of the other cell. The reaction parameters in each cell were identical and were chosen to be those used for the one-cell model. The parameter governing the rate of degradation of cAMPe, k12, was set to 12 min⁻¹. Figure 5.13 shows details of the CCF for the case where the second cell has volume 1.4VI. The parametric plot shows that the imaginary component of the CCF is much smaller than the real component: this means that the phase lag is very small.

We then increased the volume of the second cell to 2VI. Figure 5.14 shows that the imaginary part of the CCF is larger than before, indicating a more significant phase lag. The phase spectrum for this system is shown in Figure 5.15. At the frequency for which the oscillations are most significant (the frequency for which the power is greatest), the phase lag is about 0.2 radians, roughly 11°. This shows that quite large differences in volume are required to introduce measurable phase lags: varying the cell size by a few percent does not have a significant effect. We also repeated this analysis whilst varying the volume of the extracellular region. However, this did not produce significantly different results from those presented here.
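The simulation dots in figures such as 5.10 and 5.12 come from averaging spectra over an ensemble of runs. A minimal sketch of such an estimator follows (Python; the array layout and FFT normalisation are assumptions, and any constant prefactors cancel in the coherence):

```python
import numpy as np

def coherence_spectrum(x_runs, y_runs, dt):
    """Estimate the complex coherence function from an ensemble of paired
    fluctuation time series, each array of shape (runs, samples).
    Cross- and auto-spectra are averaged over runs before dividing."""
    X = np.fft.rfft(x_runs, axis=1)
    Y = np.fft.rfft(y_runs, axis=1)
    Pxy = (X * np.conj(Y)).mean(axis=0)   # averaged cross-spectrum
    Pxx = (np.abs(X) ** 2).mean(axis=0)   # averaged power spectra
    Pyy = (np.abs(Y) ** 2).mean(axis=0)
    C = Pxy / np.sqrt(Pxx * Pyy)          # complex coherence function
    omega = 2 * np.pi * np.fft.rfftfreq(x_runs.shape[1], d=dt)
    return omega, C
```

Here `np.abs(C)` gives curves of the kind shown in Figure 5.10 and `np.angle(C)` the phase spectrum of Figure 5.12; reading off `np.angle(C)` at the frequency of peak power gives the kind of phase lag quoted above (0.2 radians, roughly 11°, for the 2VI case).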

We end this section by commenting on the extension of these models to large aggregates of Dictyostelium cells. In reality, these aggregates can contain thousands of cells. Here the matrix-based approach described in this work becomes intractable. In addition to this, a large aggregate of cells in this system should be modelled spatially. In the small models considered here, diffusion is fast enough that the effects of spatial distributions of the cAMP on the oscillations can be neglected [34]. Clearly, this won't be the case in systems containing many more cells. Allowing for diffusion would also affect the phase lag across the population of cells. The results obtained here for the two-cell model suggest that in-phase synchronisation is robust in this system: relatively large perturbations to the parameters (the rate constants or cell volumes) have to be made before moderate phase lags can be measured. Both experimental and numerical work has found that phase lags can be introduced by perturbations made during the time-evolution of the system [100, 99]. A fully spatial model should capture the spatial waves of cAMP observed in experiments [24, 95]. However, simpler models, such as those studied here, can be helpful for proposing (or eliminating) possible mechanisms or forms for the synchronisation observed in these social amoebae.

5.2 Synchronisation of glycolytic oscillations

For the next reaction system, we return to the model due to Wolf & Heinrich [33], which was described and studied in Chapter 4 for the single-cell case. In their paper, Wolf & Heinrich extend the model to include multiple cells, coupled via a shared extracellular environment, to see if the oscillations in the individual cells would synchronise across the population, as this behaviour has been observed in experiment. This was done in the limit cycle regime. However, for other parameter choices, a stable fixed point exists and stochastic oscillations are observed about this fixed point. The cells are coupled in a slightly different way than in the previous example. In the model, species S4, which is a 'lumped' variable representing pyruvate and acetaldehyde, is present in all model compartments, and can diffuse between compartments. It is this diffusion which allows the cells to communicate with each other, and it was postulated to be the mechanism for synchronisation in [33]. Here, we look at a two-cell system in the case where the fixed point is stable, and stochastic oscillations are observed. We retain the



Figure 5.13: Some results for a two-cell model, where the cell volumes are not identical. Here, the cell volumes were chosen to be VI and 1.4VI, where VI is the cell volume chosen by Kim et al. The reaction parameters were the same in each cell, and were those used for the one-cell model. Top: the magnitude of the complex coherence function for the cAMP fluctuations in each cell. Bottom: a parametric plot, showing the real and imaginary parts of the complex coherence function for the frequency range 0 ≤ ω ≤ 2. A small phase lag has been introduced, due to the different cell volumes.



Figure 5.14: Some results for a two-cell model, where the cell volumes are not identical. Here, the cell volumes were chosen to be VI and 2VI, where VI is the cell volume chosen by Kim et al. The reaction parameters were the same in each cell, and were those used for the one-cell model. Top: the magnitude of the complex coherence function for the cAMP fluctuations in each cell. Bottom: a parametric plot, showing the real and imaginary parts of the complex coherence function for the frequency range 0 ≤ ω ≤ 2. A phase lag has been introduced, due to the different cell volumes.


Figure 5.15: The phase spectrum for the cAMP fluctuations in each cell in a two-cell model for the case when the cell volumes are not identical. Here, the cell volumes were chosen to be VI and 2VI . The power spectra for these oscillations peak at ω = 0.9. The phase lag for oscillations of this frequency is 0.2 radians.

same set of parameters as used in Chapter 4. Since the reaction parameters are the same in each cell, the power spectra associated with the fluctuations of a particular species are the same in each cell. The power spectrum for species A3 is shown in the left-hand part of Figure 5.16. The form of the CCF for the

A3 fluctuations in each cell looks very different to those found in the example shown in the previous section. The results obtained for the cAMP oscillations showed that oscillations in one cell were strongly influenced by oscillations in the other, with very high values of the CCF in the frequency region where significant oscillations were observed, see e.g. Figure 5.10. In contrast, the corresponding

CCFs for this system are much lower. The CCF for species A3 is shown in the right-hand part of Figure 5.16. Throughout the frequency range over which the oscillations are significant, the CCF is less than 0.01. This means that there is minimal influence between the oscillations in each cell. Similar results are found for all the other species in the model. Even when the coupling between the cells is greatly increased, the CCF remains small. We repeated the same analysis, but with the coupling constant, κ, increased twenty-fold. The power spectra and

CCF, again for the oscillations in species A3, are shown in Figure 5.17. The CCF reaches higher values than previously (≈ 0.02), but is still very low. So it appears that, for this model, the communication between the cells is very weak. It is worth mentioning some of the analysis performed in the limit cycle regime by Wolf & Heinrich in the original paper. Isolated cells displaying oscillations were


Figure 5.16: Left: theoretical estimate for the power spectrum of the fluctuations in species A3. The spectra for the oscillations in each cell are identical. Right: the magnitude of the complex coherence function for the fluctuations of species A3 in each cell. The common signal in the fluctuations is tiny. The model parameters were those used in Chapter 4. The volume of each cell was chosen to be 10⁻¹⁵ l, and the volume of the extracellular region was 10⁻¹³ l.

mixed, initially out of phase, and were then observed to synchronise in phase, as observed in the experiments by Richard et al. However, the time taken to synchronise in the ODE model was much longer than was found experimentally. This suggests that the coupling between the cells is very weak: indeed, too weak to completely capture the behaviour found experimentally. Alternatively, the metabolite proposed to couple the cells is not in fact the metabolite (or, at least, not the only metabolite) which couples the cells. This emphasises the point that the manner in which the coupling between cells is modelled can have important consequences for the dynamics observed.
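The theoretical curves in this chapter all derive from the LNA spectrum matrix. Under the conventions of Chapter 2, with linearised dynamics dξ/dt = Aξ + η and noise correlator ⟨η(t)η(t′)ᵀ⟩ = B δ(t − t′), the spectrum matrix is S(ω) = (A + iωI)⁻¹ B (Aᵀ − iωI)⁻¹, and the CCF follows by normalising its entries. A numerical sketch (Python; overall 2π conventions may differ from those in the text, but they cancel in the CCF):

```python
import numpy as np

def lna_spectrum(A, B, omega):
    """Spectrum matrix S(w) of the linearised Langevin equation
    d(xi)/dt = A xi + eta, with <eta(t) eta(t')^T> = B delta(t - t')."""
    n = A.shape[0]
    M = np.linalg.inv(A + 1j * omega * np.eye(n))
    return M @ B @ M.conj().T

def ccf(A, B, omega, i, j):
    """Complex coherence function between species i and j at frequency w."""
    S = lna_spectrum(A, B, omega)
    return S[i, j] / np.sqrt(S[i, i].real * S[j, j].real)
```

As a one-species check, A = [[−γ]] and B = [[D]] reproduce the Ornstein-Uhlenbeck Lorentzian S(ω) = D/(γ² + ω²).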

5.3 Discussion

In this chapter we have shown how stochastic oscillations in biochemical models can synchronise, and how analytical expressions for this effect may be found. Comparisons between numerical results and theoretical predictions for the CCF (and related quantities) were found to be good, especially in frequency ranges where the oscillations were significant. Very few models have been made which study the synchronisation of stochastic oscillations across multiple cells, and those which have been made, such as [34], are examined using numerical simulation. We hope that the analytical work shown here can complement the numerical approach. In our study of the model of cAMP signalling, we


Figure 5.17: Left: theoretical estimate for the power spectrum of the fluctuations in species A3. The spectra for the oscillations in each cell are identical. Right: the magnitude of the complex coherence function for the fluctuations of species A3 in each cell. The model parameters were those used in Chapter 4, except for the diffusion constant, κ, which was increased twenty-fold. The common signal in the fluctuations remains very small. The volume of each cell was chosen to be 10⁻¹⁵ l, and the volume of the extracellular region was 10⁻¹³ l.

found that the stochastic oscillations across cells strongly influenced each other, and oscillations of species in different cells synchronised rapidly, as illustrated in Figure 5.8. In the case where the cells were of equal volume, and the reaction parameters across cells were the same, the oscillations synchronised in phase. Introducing changes to the reaction parameters in different cells, or making the cell volumes different from each other, introduced a phase lag. These findings are similar to those reported in [97], in the context of epidemiological modelling. However, quite large changes had to be made before the lag was found to be significant. One slightly surprising result we found was that the size of the phase lag was unaffected by varying the volume of the extracellular compartment. This is probably not realistic, and could be due to the well-mixed assumption used throughout this thesis. For a larger extracellular compartment, communication between cells would be expected to take longer, which could affect the phase lag. A similar analysis for a two-cell model of glycolytic oscillations, due to Wolf & Heinrich, was also performed. Here, however, the correlations between oscillations in different cells were much weaker. This could explain why, in the limit cycle regime studied in [33], the two oscillating populations take so long to synchronise after mixing, but this conclusion is speculative at this stage.

Chapter 6

Conclusions

Throughout this work, we have studied the stochastic dynamics of biochemical systems using a consistent methodology. We have assumed that these systems can be described by memoryless, or Markov, processes, and that their dynamics can be described by the master equation. In addition to this, we have assumed that compartments in our models are well-mixed, so that the models do not explicitly involve space. We have restricted ourselves to studying systems whose deterministic dynamics go to a stable fixed point after the transient dynamics have disappeared. Using this framework we have examined models of increasing complexity. We commenced our investigation in Chapter 2 with simple, one-compartment models which were used to introduce the mathematical and computational tools required in this thesis. In particular, we showed how to perform the celebrated van Kampen expansion, which was used to provide an approximate, analytical solution to the master equation. This was then used to obtain information about the fluctuations around the fixed point, such as their covariances, or their power spectra. In Chapter 3, we showed how the van Kampen expansion could be performed for a general model. This meant that the procedure could be automated, eliminating the need for the lengthy algebra encountered in Chapter 2. An implementation of this fluctuation analysis was then written for the software package COPASI, making it available to a wider range of researchers who are interested in studying this type of model. In Chapter 4 we extended this analysis, so that models with multiple compartments could be studied in this way. This facilitated the study of the synchronisation of stochastic oscillations across a population of cells, which was the subject of Chapter 5. This behaviour has not been widely considered in biological applications. The only paper found in the literature tackled the

problem using numerical simulations only [34], so the analytical approach used here gives extra insight into this behaviour, and enables predictions to be made as to whether or not synchronisation of stochastic oscillations will be observed in a particular case. Furthermore, when trying to understand the mechanism by which the cells synchronise, the analytical approach can give more insight than a purely numerical one. Before moving on to discuss possible extensions to this work, I will summarise my original contributions. For the COPASI implementation, I provided some preliminary code to guide the implementation within the software package. I also described how the appearance of conservation relations should be dealt with when calculating the LNA in COPASI. In order that multi-compartment models could be studied using the LNA in COPASI, I extended the LNA formalism so that this type of model could be studied. I also used the LNA to describe how stochastic oscillations can synchronise in biochemical systems, something that had only been described numerically before. I used these techniques to study a model of stochastic oscillations due to Kim et al., and found theoretical results which complemented their findings. Using the LNA, I could also quantify the values of phase lags in that reaction system, which was not done in their work. When making a model of a complicated system, one should always be aware of the simplifications and assumptions made when constructing the model, and of the cases in which these might not be so appropriate. The increasing sophistication of experimental techniques could shed light on the suitability of the modelling assumptions used. For example, how reasonable is the well-mixed assumption, and under which conditions is it inappropriate?
If spatial correlations of species build up within a cell, this could effectively alter the reaction rates, as the probabilities for particular types of molecules to collide become significantly different from those found under well-mixed conditions [3]. Throughout this thesis we have used numerical simulations to assess the accuracy of the LNA in particular situations. Generally, we find the agreement between theory and simulation to be good, as found when calculating quantities like a covariance matrix, a power spectrum or a complex coherence function. However, there are cases where the approximation does not yield such good results. This can be the case when the system size is extremely small, or when the fixed point value(s) of one or more species lie close to the zero-molecule line, which clearly cannot be crossed. In the latter case, the probability distribution for the

fluctuations can become 'squashed' near the boundary, meaning that it is no longer Gaussian, as assumed by the van Kampen expansion. In the former case, it has been found that the fluctuations are no longer centred on the value given by the macroscopic equations [35]. Rather, they become shifted, by a small but measurable amount [50]. This shift can be measured by considering higher order corrections to the van Kampen expansion. This can be done in a systematic way, by retaining higher order terms in the expansion, e.g. extra terms in Eq. (2.30). These additional terms become corrections to the moment equations, with the consequence that the first order moments, ⟨ξi⟩, no longer have zero value. Other, similar methods to calculate these corrections have been proposed, such as the so-called 'two-moment approximation' (2MA) [101]. Although cases have been found where these higher order corrections measurably improve the accuracy of the expansion, it is our experience that in most systems the fluctuations are well characterised by the approach considered here, where higher order terms are discarded. It is important, however, to appreciate the types of cases where this might not be so.

In this work, we have focused on models with one stable fixed point. It is possible for a system to have multiple fixed points. These fixed points may be locally stable, but a sufficiently large fluctuation can result in a transition to another fixed point. When using the van Kampen expansion, it is important to be aware of this possibility. This is because the results obtained from this expansion do not allow for such 'hops' from one fixed point to another. This can be understood by recalling that the van Kampen expansion uses the solution to the macroscopic equations to help quantify the fluctuations around the macroscopic behaviour. In the macroscopic regime, once the system reaches a fixed point which is locally stable, it will never leave the fixed point. This is why the van Kampen expansion does not take the system's other fixed points into account. It is possible to study such multi-stable systems using the Kramers-Moyal expansion. This method does not involve any such linearisation, and so knowledge of multiple attractors (if present) is retained.
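As a toy illustration of the 'hops' just described (this is not a chemical model; the drift, noise strength and time step are all assumptions chosen for the sketch), an Euler-Maruyama simulation of a one-dimensional bistable system shows transitions between basins that a linearisation about either fixed point cannot capture:

```python
import numpy as np

def euler_maruyama(f, sigma, x0, dt, steps, seed=1):
    """Integrate dx = f(x) dt + sigma dW with the Euler-Maruyama scheme."""
    rng = np.random.default_rng(seed)
    noise = sigma * np.sqrt(dt) * rng.standard_normal(steps)
    x = np.empty(steps + 1)
    x[0] = x0
    for i in range(steps):
        x[i + 1] = x[i] + f(x[i]) * dt + noise[i]
    return x

# Bistable drift f(x) = x - x^3: stable fixed points at x = -1 and x = +1.
# With moderate noise the trajectory hops between the two basins even though
# it starts at x = +1; an expansion about x = +1 would only describe
# Gaussian fluctuations around that single fixed point.
path = euler_maruyama(lambda x: x - x**3, sigma=0.5, x0=1.0, dt=0.01, steps=200_000)
```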

As we stated in Chapter 3, one of the benefits of incorporating our fluctuation analysis in COPASI is COPASI's ability to read in SBML files. This file type is widely used when describing models of biological processes, and many models in the literature have been written up as SBML files. Many of these models were originally studied deterministically, but it is now straightforward for us to load a description of these models into COPASI and quickly estimate the magnitude of the fluctuations in the model. This informs us how appropriate the macroscopic formalism is for this model. However, there are issues to be aware of when doing this. When we perform this procedure, we are converting a model from its macroscopic form to a mesoscopic one, and the forms of the transition rates must be inferred. This is in contrast to the approach used here, where we start with a microscopic description of the reactions using transition rates, and then find the macroscopic description by taking the thermodynamic limit. Doing the reverse of this is straightforward for elementary reactions, but not so for non-elementary ones. By a non-elementary reaction we mean a reaction which is included in the model in the place of several reactions, which are not shown, and which describes the net result of these reactions. An example of this is the Michaelis-Menten enzyme reaction which was introduced in Chapter 3. We used all the intermediate steps, such as the substrate forming a complex with the enzyme, but in deterministic models these reactions are frequently represented as a single net reaction (here it would be S + E −→ P + E), and an effective reaction rate calculated [72]. There is much debate about the interpretation of these effective reaction rates in a stochastic model [102, 103, 104].
We take the view that, where possible, such reactions should be reduced to several elementary reactions; if this is not possible, the effective reaction rate may offer a reasonable description of the reaction in the stochastic model, but should be viewed cautiously. Constructing a more rigorous treatment of these reactions in stochastic models would be a valuable contribution, and presents an interesting opportunity for further work.
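For reference, the elementary scheme underlying the Michaelis-Menten example of Chapter 3, and the effective net rate obtained from it under the standard quasi-steady-state reduction, can be written as follows (the rate-constant names here are illustrative, not necessarily those used in Chapter 3):

```latex
% Elementary steps: binding, unbinding and catalysis
S + E \underset{k_{d}}{\overset{k_{a}}{\rightleftharpoons}} C
      \overset{k_{c}}{\longrightarrow} P + E ,
\qquad \text{net reaction: } S + E \longrightarrow P + E .
% Effective (deterministic) rate, with total enzyme E_T = [E] + [C]
\frac{d[P]}{dt} \;\approx\; \frac{k_{c}\, E_{T}\, [S]}{K_{M} + [S]},
\qquad K_{M} = \frac{k_{d} + k_{c}}{k_{a}} .
```

The debate referred to above concerns whether the hyperbolic rate on the second line can be inserted directly as a mesoscopic transition rate, or whether the three elementary steps must be simulated individually.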

In Chapter 4 we encountered difficulty in how to characterise the transition rates for cross-compartment reactions, in particular how they should scale with volume. For models with only one compartment, all transition rates are considered to scale linearly with the compartment volume. This seems to be the most physically realistic choice, although in fact any universal scaling can be used, by rescaling time, as performed in e.g. Ref. [29]. In Chapter 4 we argued that, for cross-compartment reactions, the transition rates should scale with the contact area between the compartments. For the case depicted in Figure 4.1, this scales with V1^(2/3). However, the way in which the molecules move between compartments may not be linked to the diffusion area, but rather to a discrete number of channels, so the reaction rates will not necessarily scale with the volume of any of the model compartments. In this thesis, for simplicity, we have assumed that all

cross-compartment reactions scale linearly with volume, so that all reactions scale with volume in the same way. A model containing a mixture of these scalings is problematic for two reasons in particular. When performing the van Kampen expansion, a whole mixture of terms now appears, and it is unclear how to collect these together in a systematic way. In addition to this, it is uncertain how the macroscopic description of the system should be obtained. In this work we have taken the infinite volume limit in order to obtain the macroscopic description of the system. This is straightforward for models with a universal scaling, as we found in e.g. the example in Section 2.2. However, when this is carried out for, say, a model with a mixture of volume and surface area scalings, the reactions in the bulk dominate, and contributions from the other reactions disappear. To consider the impact of this, we must think about why the macroscopic picture of the system is useful. It is not because one is interested in studying cells with extremely large volume, but because one believes that the molecular populations are sufficiently large that the fluctuations in the populations are not significant, and the molecular concentrations can be considered as continuous variables. It may be the case that multi-compartment models of the type described here should only be considered valid for one set of compartment volumes. If one wishes to study the model for a different set of volumes, the reaction rates should be recalculated from first principles, e.g. by considering the diffusion area, or the number of channels, in the 'new' model. Then the macroscopic description of the system should be obtained by considering the numerical values of the fluxes of the reactions, rather than considering how these reactions scale in general with volume. However, this idea is speculative at this stage. Currently, the vast majority of multi-compartment models in the biochemical literature are studied deterministically, and the stochastic analogue of the model must be inferred. Hopefully, as stochastic modelling becomes more widespread, these problems will be studied in greater detail.

In addition to looking at fluctuations around a fixed point, it is possible to quantify fluctuations around a limit cycle [105, 106]. One can also write down a chemical reaction scheme for which the macroscopic system moves on a chaotic attractor [26]. This is different from the limit cycle or fixed point case, as in those cases there is an isolated deterministic trajectory, about which the fluctuations are distributed. Quite the opposite is true for a chaotic attractor, which contains an infinite number of equally viable trajectories. Fox and Keizer attempted a van Kampen-like expansion for such a system, only to discover that the covariances, found as the solution to a time-dependent Lyapunov equation, fluctuate wildly and quickly become unfeasibly large [107, 56]. They then abandoned this in favour of a Kramers-Moyal approach. Here the fluctuations remain large, but do not 'blow up'. This is because the Kramers-Moyal approach retains knowledge of the whole attractor, whereas the van Kampen expansion linearises the system around a single trajectory. The latter is clearly not a sensible approach, as random fluctuations will continually induce the system to switch to other nearby trajectories. From these results, Fox & Keizer question the viability of using the macroscopic equations to represent an underlying microscopic process, as the uncertainties can become as large as the actual variables. This view is not universally shared, as others have argued that the chaotic attractor described by the ODEs still represents the most probable behaviour of the system, as supported by numerical simulations [108]. It could be interesting to look at how the values of quantities like the positive Lyapunov exponent, the indicator of chaos, are affected by random fluctuations, how this changes with system size, and whether one can move from looking at the behaviour of particular systems to making more general statements about microscopic models exhibiting this type of behaviour.
In summary, stochastic effects in biochemical systems have been shown to be significant in a wide range of processes. In this thesis we have used a methodology to characterise the fluctuations present in models with a stable fixed point. Despite the increasing computational power available to scientists, it can still be computationally expensive to simulate realistic biochemical reaction systems, especially if one wishes to study the behaviour of the model in a detailed way, e.g. by exploring parameter space. This makes approximate, analytical solutions of the dynamics an attractive option. We hope that the automation of the method used here will encourage the study of these systems using a stochastic, rather than a deterministic, formalism. Synchronisation of cellular oscillations is implicated in tasks such as signalling between cells. In the literature, the standard method for modelling these systems is deterministic, and involves constructing limit cycle oscillators, which are then coupled. The theoretical work shown here, which compares well with numerical simulation, demonstrates that limit cycles are not required for synchronisation to be observed. As argued by Kim et al. [34], one consequence of this is that synchronisation of oscillations can occur over a greater parameter range than that predicted deterministically.

Bibliography

[1] W. M. Becker, L. J. Kleinsmith, and J. Hardin. The world of the cell. Benjamin Cummings, San Francisco, fourth edition, 2000.

[2] D. T. Gillespie. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J. Comp. Phys., 22:403–434, 1976.

[3] R. Grima and S. Schnell. Modelling reaction kinetics inside cells. Essays Biochem., 45:41–56, 2008.

[4] U. Kummer, B. Krajnc, J. Pahle, A. K. Green, C. J. Dixon, and M. Marhl. Transition from stochastic to deterministic behavior in calcium oscillations. Biophys. J., 89(3):1603–1611, 2005.

[5] H. J. Fromm and M. S. Hargrove. Essentials of Biochemistry. Springer-Verlag, Berlin, 2012.

[6] B. Munsky, G. Neuert, and A. van Oudenaarden. Using gene expression noise to understand gene regulation. Science, 336:183–187, 2012.

[7] D. Zenklusen, D. R. Larson, and R. H. Singer. Single RNA counting reveals alternative modes of gene expression in yeast. Nat. Struct. Mol. Biol., 15:1263, 2008.

[8] M. B. Elowitz, A. J. Levine, E. D. Siggia, and P. S. Swain. Stochastic gene expression in a single cell. Science, 297:1183–1186, 2002.

[9] E. M. Ozbudak, M. Thattai, I. Kurtser, A. D. Grossman, and A. van Oudenaarden. Regulation of noise in the expression of a single gene. Nature Genetics, 31:69–73, 2002.


[10] J. M. Pedraza and A. van Oudenaarden. Noise propagation in gene networks. Science, 307:1965–1969, 2005.

[11] A. Raj and A. van Oudenaarden. Single-molecule approaches to stochastic gene expression. Annu. Rev. Biophys., 38:255–270, 2009.

[12] O. G. Berg, J. Paulsson, and M. Ehrenberg. Fluctuations and quality of control in biological cells: zero-order ultrasensitivity reinvestigated. Biophys. J., 79:1228–1236, 2000.

[13] H. Kitano. Biological robustness. Nat. Rev. Genet., 5(11):826–837, 2004.

[14] H. H. McAdams and A. Arkin. It’s a noisy business! Genetic regulation at the nanomolar scale. Trends Genet., 15(2):65–69, 1999.

[15] N. Maheshri and E. K. O’Shea. Living with noisy genes: How cells function reliably with inherent variability in gene expression. Annu. Rev. Biophys. Biomol. Struct., 36:413–434, 2007.

[16] C. V. Rao, D. W. Wolf, and A. P. Arkin. Control, exploitation and tolerance of intracellular noise. Nature, 420:231–237, 2002.

[17] A. Eldar and M. B. Elowitz. Functional roles for noise in genetic circuits. Nature, 467:167–173, 2010.

[18] A. Arkin, J. Ross, and H. H. McAdams. Stochastic kinetic analysis of developmental pathway bifurcation in phage λ-infected Escherichia coli cells. Genetics, 149(4):1633–1648, 1998.

[19] R. Losick and C. Desplan. Stochasticity and cell fate. Science, 320:65–68, 2008.

[20] A. Raj and A. van Oudenaarden. Nature, nurture, or chance: Stochastic gene expression and its consequences. Cell, 135:216–226, 2008.

[21] M. Perc and M. Marhl. Noise enhances robustness of intracellular Ca2+ oscillations. Physics Letters A, 316:304–310, 2003.

[22] B. Hess and A. Boiteux. Oscillatory phenomena in biochemistry. Ann. Rev. Biochem., 40:237–258, 1971.

[23] A. Goldbeter and S. R. Caplan. Oscillatory enzymes. Annu. Rev. Biophys. Bioeng., 5:449, 1976.

[24] A. Goldbeter. Biochemical oscillations and cellular rhythms: the molecular bases of periodic and chaotic behaviour. Cambridge University Press, Cambridge, 1996.

[25] B. Novak and J. J. Tyson. Design principles of biochemical oscillators. Nat. Rev. Mol. Cell Bio., 9(12):981–991, 2008.

[26] S. H. Strogatz. Nonlinear Dynamics and Chaos. Perseus Books, Cambridge, MA, 1994.

[27] J. M. G. Vilar, H. Y. Kueh, N. Barkai, and S. Leibler. Mechanisms of noise-resistance in genetic oscillators. Proc. Natl. Acad. Sci., 99(9):5988–5992, 2002.

[28] A. J. McKane and T. J. Newman. Predator-prey cycles from resonant amplification of demographic stochasticity. Phys. Rev. Lett., 94(21):218102, 2005.

[29] A. J. McKane, J. D. Nagy, T. J. Newman, and M. O. Stefani. Amplified biochemical oscillations in cellular systems. J. Stat. Phys., 128(1/2):165–191, 2007.

[30] P. Richard, B. Teusink, M. B. Hemker, K. van Dam, and H. V. Westerhoff. Sustained oscillations in free-energy state and hexose phosphates in yeast. Yeast, 12:731–740, 1996.

[31] P. Richard, B. M. Bakker, B. Teusink, K. van Dam, and H. V. Westerhoff. Acetaldehyde mediates the synchronization of sustained glycolytic oscillations in populations of yeast cells. Eur. J. Biochem., 235:238–241, 1996.

[32] J. Wolf and R. Heinrich. Dynamics of two-component biochemical systems in interacting cells; synchronization and desynchronization of oscillations and multiple steady states. BioSystems, 43:1–24, 1997.

[33] J. Wolf and R. Heinrich. Effect of cellular interaction on glycolytic oscillations in yeast: a theoretical investigation. Biochem. J., 345:321–334, 2000.

[34] J. Kim, P. Heslop-Harrison, I. Postlethwaite, and D. G. Bates. Stochastic noise and synchronisation during Dictyostelium aggregation make cAMP oscillations robust. PLoS Comput. Biol., 3(11):2190–2198, 2007.

[35] N. G. van Kampen. Stochastic Processes in Physics and Chemistry. Elsevier, Amsterdam, third edition, 2007.

[36] L. E. Reichl. A modern course in statistical physics. John Wiley and Sons, New York, second edition, 1994.

[37] M. Delbrück. Statistical fluctuations in autocatalytic reactions. J. Chem. Phys., 8:120, 1940.

[38] J. Pahle. Biochemical simulations: stochastic, approximate stochastic and hybrid approaches. Briefings in Bioinformatics, 10(1):53–64, 2009.

[39] C. W. Gardiner. Handbook of Stochastic Methods. Springer-Verlag, Berlin, 2004.

[40] J. Elf and M. Ehrenberg. Fast evaluation of fluctuations in biochemical networks with the linear noise approximation. Genome Research, 13:2475–2484, 2003.

[41] R. Tomioka, H. Kimura, T. J. Kobayashi, and K. Aihara. Multivariate analysis of noise in genetic regulatory networks. J. Theor. Biol., 229:501–521, 2004.

[42] N. G. van Kampen. A power series expansion of the master equation. Canadian Journal of Physics, 39(4):551–567, 1961.

[43] A. J. Black and A. J. McKane. Stochastic formulation of ecological models and their applications. Trends in Ecology & Evolution, 27(6):337–345, 2012.

[44] R. A. Blythe and A. J. McKane. Stochastic models of evolution in genetics, ecology and linguistics. J. Stat. Mech., page P07018, 2007.

[45] A. J. Black, A. J. McKane, A. Nunes, and A. Parisi. Stochastic fluctuations in the susceptible-infective-recovered model with distributed infectious periods. Phys. Rev. E, 80:021922, 2009.

[46] A. J. Bladon, T. Galla, and A. J. McKane. Evolutionary dynamics, intrinsic noise, and cycles of cooperation. Phys. Rev. E, 81:066122, 2010.

[47] COPASI. http://www.copasi.org.

[48] S. Hoops, S. Sahle, R. Gauges, C. Lee, J. Pahle, N. Simus, M. Singhal, L. Xu, P. Mendes, and U. Kummer. COPASI - a COmplex PAthway SImulator. Bioinformatics, 22(24):3067–3074, 2006.

[49] P. Thomas, H. Matuschek, and R. Grima. Intrinsic noise analyzer: A software package for the exploration of stochastic biochemical kinetics using the system size expansion. PLoS ONE, 7(6):e38518, 2012.

[50] R. Grima. An effective rate equation approach to reaction kinetics in small volumes: Theory and application to biochemical reactions in nonequilibrium steady-state conditions. J. Chem. Phys., 133:035101, 2010.

[51] D. T. Gillespie. Stochastic simulation of chemical kinetics. Annu. Rev. Phys. Chem., 58:35–55, 2007.

[52] M. A. Gibson and J. Bruck. Efficient exact stochastic simulation of chemical systems with many species and many channels. J. Phys. Chem. A, 104:1876–1889, 2000.

[53] D. T. Gillespie. Approximate accelerated stochastic simulation of chemically reacting systems. J. Chem. Phys., 115(4):1716, 2001.

[54] D. T. Gillespie and L. R. Petzold. Improved leap-size selection for accelerated stochastic simulation. J. Chem. Phys., 119(16):8229, 2003.

[55] Y. Cao, D. T. Gillespie, and L. R. Petzold. Efficient step size selection for the tau-leaping simulation method. J. Chem. Phys., 124:044109, 2006.

[56] R. F. Fox and J. Keizer. Amplification of intrinsic fluctuations by chaotic dynamics in physical systems. Physical Review A, 43(4):1709–1720, 1991.

[57] H. Risken. The Fokker-Planck equation. Springer, Berlin, second edition, 1989.

[58] R. A. Horn and C. R. Johnson. Topics in Matrix Analysis. Cambridge University Press, Cambridge, 1991.

[59] G. Hewer and C. Kenney. The sensitivity of the stable Lyapunov equation. SIAM J. Control and Optimization, 26(2):321–344, 1988.

[60] A. J. Laub. Matrix analysis for scientists & engineers. Society for Industrial and Applied Mathematics, Philadelphia, 2005.

[61] R. H. Bartels and G. W. Stewart. Solution of the matrix equation AX + XB = C [F4]. Comm. ACM, 15(9):820–826, 1972.

[62] S. J. Hammarling. Numerical solution of the stable, non-negative definite Lyapunov equation. IMA Journal of Numerical Analysis, 2:303–323, 1982.

[63] Wolfram Research, Inc. Mathematica, Version 8.0, 2010.

[64] John W. Eaton. GNU Octave Manual. Network Theory Limited, 2002. Available from http://www.octave.org.

[65] K. Jacobs. Stochastic Processes for Physicists. Cambridge University Press, Cambridge, 2010.

[66] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge, 1985.

[67] Richard P. Boland. The stochastic dynamics of oscillatory systems and ecosystems. PhD thesis, University of Manchester, UK, 2009.

[68] S. Ghose and R. Adhikari. Endogenous quasicycles and stochastic coherence in a closed endemic model. Phys. Rev. E, 82:021913, 2010.

[69] R. J. Field and R. M. Noyes. Oscillations in chemical systems. IV. Limit cycle behaviour in a model of a real chemical reaction. J. Chem. Phys., 60:1877–1884, 1974.

[70] M. B. Priestley. Spectral analysis and time series. Academic Press, London, 1981.

[71] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users’ Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, third edition, 1999.

[72] L. Michaelis and M. L. Menten. Die Kinetik der Invertinwirkung. Biochem. Z., 49:333–369, 1913.

[73] C. Reder. Metabolic control theory: a structural approach. J. Theor. Biol., 135(2):175–201, 1988.

[74] R. R. Vallabhajosyula, V. Chickarmane, and H. M. Sauro. Conservation analysis of large biochemical networks. Bioinformatics, 22(3):346–353, 2006.

[75] J. Kennedy and R. Eberhart. Particle swarm optimization. In Proceedings of the Fourth IEEE International Conference on Neural Networks, Perth, Australia, volume 4, pages 1942–1948, 1995.

[76] D. B. Fogel, L. J. Fogel, and J. W. Atmar. Meta-evolutionary programming. In 25th Asilomar Conference on Signals, Systems and Computers. IEEE Computer Society, Asilomar, volume 1, pages 540–545, 1992.

[77] D. H. Wolpert and W. G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997.

[78] M. Hucka, A. Finney, H. M. Sauro, H. Bolouri, J. C. Doyle, H. Kitano, A. P. Arkin, B. J. Bornstein, D. Bray, A. Cornish-Bowden, A. A. Cuellar, S. Dronov, E. D. Gilles, M. Ginkel, V. Gor, I. I. Goryanin, W. J. Hedley, T. C. Hodgman, J.-H. Hofmeyr, P. J. Hunter, N. S. Juty, J. L. Kasberger, A. Kremling, U. Kummer, N. Le Novère, L. M. Loew, D. Lucio, P. Mendes, E. Minch, E. D. Mjolsness, Y. Nakayama, M. R. Nelson, P. F. Nielsen, T. Sakurada, J. C. Schaff, B. E. Shapiro, T. S. Shimizu, H. D. Spence, J. Stelling, K. Takahashi, M. Tomita, J. Wagner, and J. Wang. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics, 19(4):524–531, 2003.

[79] M. Hucka, A. Finney, B. J. Bornstein, S. M. Keating, B. E. Shapiro, J. Matthews, B. L. Kovitz, M. J. Schilstra, A. Funahashi, J. C. Doyle, and H. Kitano. Evolving a lingua franca and associated software infrastructure for computational systems biology: the systems biology markup language (SBML) project. Systems Biology, 1(1):41–53, 2004.

[80] BioModels Database. http://www.ebi.ac.uk/biomodels-main/. Last accessed: August 22 2012.

[81] C. Li, M. Donizelli, N. Rodriguez, H. Dharuri, L. Endler, V. Chelliah, L. Li, E. He, A. Henry, M. I. Stefan, J. L. Snoep, M. Hucka, N. Le Novère, and C. Laibe. BioModels Database: An enhanced, curated and annotated resource for published quantitative kinetic models. BMC Systems Biology, 4:92, 2010.

[82] M. C. Lawrence, A. Jivan, C. Shao, L. Duan, D. Goad, E. Zaganjor, J. Osborne, K. McGlynn, S. Stippec, S. Earnest, W. Chen, and M. H. Cobb. The roles of MAPKs in disease. Cell Research, 18(4):436–442, 2008.

[83] B. N. Kholodenko and M. R. Birtwistle. Four-dimensional dynamics of MAPK information-processing systems. Wiley Interdisciplinary Reviews: Systems Biology and Medicine, 1(1):28–44, 2009.

[84] B. N. Kholodenko. Negative feedback and ultrasensitivity can bring about oscillations in the mitogen-activated protein kinase cascades. Eur. J. Biochem., 267:1583–1588, 2000.

[85] K. Levenberg. A method for the solution of certain nonlinear problems in least squares. Quart. Appl. Math., 2:164–168, 1944.

[86] D. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. SIAM J. Appl. Math., 11:431–441, 1963.

[87] O. Kongas and J. van Beek. Creatine kinase in energy metabolic signalling in muscle. Available from Nature Precedings, 2007.

[88] A. Funahashi, M. Morohashi, H. Kitano, and N. Tanimura. CellDesigner: a process diagram editor for gene-regulatory and biochemical networks. BIOSILICO, 1(5):159–162, 2003.

[89] BioModels Database. http://www.ebi.ac.uk/biomodels-main/BIOMD0000000041. Last accessed: November 1 2011.

[90] R. Wang, C. Li, L. Chen, and K. Aihara. Modeling and analyzing biological oscillations in molecular networks. Proceedings of the IEEE, 96(8):1361–1385, 2008.

[91] D. Gonze, J. Halloy, and A. Goldbeter. Deterministic versus stochastic models for circadian rhythms. J. Biol. Phys., 28:637–653, 2002.

[92] A. Pikovsky, M. Rosenblum, and J. Kurths. Synchronization. Cambridge University Press, Cambridge, 2001.

[93] R. Wang and L. Chen. Synchronizing genetic oscillators by signaling molecules. J. Biol. Rhythms, 20(3):257–269, 2005.

[94] D. Gonze, N. Markadieu, and A. Goldbeter. Selection of in-phase or out-of-phase synchronization in a model based on global coupling of cells undergoing metabolic oscillations. Chaos, 18:037127, 2008.

[95] K. Kamino, K. Fujimoto, and S. Sawai. Collective oscillations in developing cells: Insights from simple systems. Develop. Growth Differ., 53:503–517, 2011.

[96] A. Goldbeter and L. A. Segel. Unified mechanism for relay and oscillation of cyclic AMP in Dictyostelium discoideum. Proc. Natl. Acad. Sci., 74(4):1543–1547, 1977.

[97] G. Rozhnova, A. Nunes, and A. J. McKane. Phase lag in epidemics on a network of cities. Phys. Rev. E, 85:051912, 2012.

[98] S. L. Marple, Jr. Digital spectral analysis with applications. Prentice-Hall, New Jersey, 1987.

[99] M. T. Laub and W. F. Loomis. A molecular network that produces spontaneous oscillations in excitable cells of Dictyostelium. Mol. Biol. Cell., 9:3521–3532, 1998.

[100] G. Gerisch and B. Hess. Cyclic-AMP-controlled oscillations in suspended Dictyostelium cells: their relation to morphogenetic cell interactions. Proc. Nat. Acad. Sci., 71(5):2118–2122, 1974.

[101] M. Ullah and O. Wolkenhauer. Investigating the two-moment characterisation of subcellular biochemical networks. J. Theor. Biol., 260:340–352, 2009.

[102] B. P. English, W. Min, A. M. van Oijen, K. T. Lee, G. Luo, H. Sun, B. J. Cherayil, S. C. Kou, and X. S. Xie. Ever-fluctuating single enzyme molecules: Michaelis-Menten equation revisited. Nature Chemical Biology, 2(2):87–94, 2006.

[103] R. Grima. Noise-induced breakdown of the Michaelis-Menten equation in steady-state conditions. Phys. Rev. Lett., 102:218103, 2009.

[104] K. R. Sanft, D. T. Gillespie, and L. R. Petzold. Legitimacy of the stochastic Michaelis-Menten approximation. IET Syst. Biol., 5:58–69, 2011.

[105] R. P. Boland, T. Galla, and A. J. McKane. How limit cycles and quasi-cycles are related in systems with intrinsic noise. J. Stat. Mech., page P09001, 2008.

[106] R. P. Boland, T. Galla, and A. J. McKane. Limit cycles, complex Floquet multipliers, and intrinsic noise. Phys. Rev. E, 79:051131, 2009.

[107] R. F. Fox and J. Keizer. Effect of molecular fluctuations on the description of chaos by macrovariable equations. Phys. Rev. Lett., 64(3), 1990.

[108] P. Geysermans and G. Nicolis. Thermodynamic fluctuations and chemical chaos in a well-stirred reaction: A master equation analysis. J. Chem. Phys., 99(11), 1993.

[109] B. Noble and J. W. Daniel. Applied Linear Algebra. Prentice-Hall, Englewood Cliffs, second edition, 1977.

Appendix A

Ill-Conditioned Systems

In this appendix, we will give some details about ill-conditioned systems. These systems were briefly mentioned in Section 2.5, in connection with the solution of the Lyapunov equation, which in the application considered here is the matrix of covariances, C. To recap, the Lyapunov equation has the form

AC + CA^T + B = 0, (A.1)

where A, B and C are all square matrices of size K, where K is the number of independent variables present in the model. In certain cases, linear equations like these have been found to be ill-conditioned. That is, very small changes in A or B can result in much larger changes in the solution, C. This is important to know if there is any uncertainty in the model, e.g. in the reaction parameters. Using Kronecker products (denoted by ⊗), the Lyapunov equation may be rewritten in the form P x = b, where

P = A ⊗ I_K + I_K ⊗ A,    x = vec(C),    b = −vec(B), (A.2)

where I_K is the K × K identity matrix and the operator vec takes the columns of a matrix and stacks them on top of each other to form a vector of length K². We can also define the operator unvec, which is the reverse operation, i.e. it acts on a vector and turns it into a matrix. The cost of writing the equation in this simpler form is that the dimension of the problem is much larger: the matrix P is a square matrix of dimension K². To find out how sensitive the solution, the vector x, is to small perturbations in P and b in a particular case, we need to be able to quantify the ‘size’ of a matrix. We do this using a matrix norm. Here we use the spectral norm, or 2-norm, which we write as ‖·‖₂, although any operator norm may be used [109]. We will discover that a very useful quantity in these systems is the condition number of the matrix P, which we write as κ(P). We define it as

κ(P) = ‖P‖₂‖P⁻¹‖₂, (A.3)

When using the spectral norm, the norm of the matrix P is the square root of the largest eigenvalue of the positive-semidefinite matrix P*P (where P* is the conjugate transpose of P). Equivalently, it is the largest singular value of P. The singular values of P are found by performing a singular value decomposition, P = UΣV*, where U and V are unitary matrices and Σ is a diagonal matrix which lists the singular values of P, σᵢ, in descending order [66]. To calculate the condition number of P, we also need the largest singular value of P⁻¹. The singular values of P⁻¹ are just the reciprocals of the singular values of P.

Therefore, the largest of these will be 1/σ_n, where σ_n is the smallest singular value of P. Hence, we can write the condition number as

κ(P) = ‖P‖₂‖P⁻¹‖₂ = σ₁/σ_n. (A.4)

We will now show why κ(P) is a key quantity here. We assume that the equation actually solved, rather than P x = b, has the form

(P + δP)(x + δx) = b + δb. (A.5)

That is, small perturbations have been made to the true values of P and b, and so the solution to this perturbed equation will change as a result. We want to be able to find a bound on the size of δx, compared to x. If this is much bigger than the relative perturbations to P and b, then the system could be ill-conditioned. We say ‘could’, because the upper bound on ‖δx‖/‖x‖ can be very loose, and gives what can be considered a worst-case scenario. We rewrite the perturbed equation as

δx = (P + δP)⁻¹(δb − δP x). (A.6)

To proceed, we need some results for the inverse of perturbed matrices. In particular, if P is invertible, then P + δP is invertible provided ‖P⁻¹δP‖ < 1. Furthermore, if this is the case, it can be proved that [109]

‖(P + δP)⁻¹‖ ≤ ‖P⁻¹‖ / (1 − ‖(δP)P⁻¹‖). (A.7)

This caveat is important, since an ill-conditioned P will be close to a singular matrix, so that P + δP may fail to be invertible even for a small δP. Taking the norm of Eq. (A.6), and using the fact that ‖AB‖ ≤ ‖A‖‖B‖, we find

‖δx‖ ≤ [‖P⁻¹‖ / (1 − ‖(δP)P⁻¹‖)] ‖δb − δP x‖
     ≤ [‖P⁻¹‖ / (1 − ‖(δP)P⁻¹‖)] [‖δb‖ + ‖δP‖‖x‖]. (A.8)

Dividing both sides by ‖x‖,

‖δx‖/‖x‖ ≤ [‖P⁻¹‖ / (1 − ‖(δP)P⁻¹‖)] [‖δb‖/‖x‖ + ‖δP‖]. (A.9)

Using the fact that, from the original equation, ‖b‖ ≤ ‖P‖‖x‖, we find

‖δx‖/‖x‖ ≤ [‖P⁻¹‖ / (1 − ‖(δP)P⁻¹‖)] [‖δb‖‖P‖/‖b‖ + ‖δP‖]. (A.10)

Finally, we multiply the numerator and denominator of the second term in the square brackets by ‖P‖ and, introducing the condition number, we write

‖δx‖/‖x‖ ≤ [κ(P) / (1 − ‖(δP)P⁻¹‖)] [‖δb‖/‖b‖ + ‖δP‖/‖P‖], (A.11)

which is the desired formula. The interpretation is clear: the relative error in P and b is scaled up by the factor κ(P)/(1 − ‖(δP)P⁻¹‖). Provided that δP is small, the denominator will be close to one, and the condition number will give a good indication of this scaling. If κ(P) is small or moderate, then the equations can be said to be well-conditioned. If κ(P) is large, however, this does not imply that the equations are ill-conditioned, as the above equation only gives an upper bound for ‖δx‖/‖x‖, which can be very loose. It does, however, give an indication that the equation could be ill-conditioned, and that further investigation is required.

An example

To illustrate these ideas, we use the model of MAPK signalling due to Kholodenko, which was studied in Chapter 3. We reduce the feedback strength by changing the value of the reaction parameter K_I to 31 and look at the system close to the bifurcation point. The numerical forms of the matrices A and B for this parameter choice are

A = [ −0.0048469     0             0.00297298    0.00599676    0
       0.00204132   −0.318926      0             0.00204132    0.345808
       0            −0.000618717  −0.00148891    0             0.000618789
       0.00212612    0             0.0168718    −0.00621679    0
       0.000213613   0.0275914     0             0.000213613  −0.345884 ],  (A.12)

B = [ 15.6045     0          0        −7.17048    0
       0          6.27707    0         0         −0.543181
       0          0          2.45458   0          0
      −7.17048    0          0         7.17048    0
       0         −0.543181   0         0          0.543181 ],  (A.13)

where entries are given to six significant figures, as displayed in the user interface of COPASI. The eigenvalues of A are (−0.43101, −0.2338, −0.00916956, −0.00169137 + 0.000230046i, −0.00169137 − 0.000230046i): the complex pair of eigenvalues has real part very close to zero. Solving the Lyapunov equation (or, alternatively, reformulating it into the form P x = b), the solution is found to be

 22253.0 290.899 2735.97 15328.6 46.4182     290.899 13.7663 40.3683 216.307 0.626166    C =  2735.97 40.3683 810.187 2514.22 6.43541  . (A.14)      15328.6 216.307 2514.22 12642.4 34.5085  46.4182 0.626166 6.43541 34.5085 0.885136

Our procedure is as follows. We make a random perturbation to the seventh significant figure of the entries of A and B. If an entry is identically equal to zero, e.g. B₂₃, we do not perturb it. For a particular choice of δP and δb, we can calculate the bound on ‖δx‖/‖x‖ using Eq. (A.11), and compare it with the value

obtained from solving the perturbed equation. For one randomly generated perturbation, we calculated the bound on ‖δx‖/‖x‖ to be 0.0207. When one considers that ‖δP‖/‖P‖ and ‖δb‖/‖b‖ were both of order 10⁻⁷ in this case, this would be quite a significant change to the solution of the equation. The condition number of P here was calculated to be 14436. However, when we calculate the size of δx by directly solving the perturbed equation, we find ‖δx‖/‖x‖ to be equal to 1 × 10⁻⁵, which is much smaller than the value given by the bound. We cannot rule out, however, that some perturbation of this magnitude exists for which the deviation would be much more significant.

Summary

In this appendix, we have introduced some ideas about ill-conditioned systems, and how to describe the conditioning quantitatively. Because the values chosen for the reaction parameters must be estimated, e.g. from experiment, there will always be some uncertainty associated with the model of the reaction system. Therefore, it is important to know how much trust one can place in the obtained solution for the covariance matrix. If desired, the conditioning analysis described here could be automated, using a library of numerical routines such as LAPACK [71] to carry out the linear algebra. As already discussed, the bound obtained for ‖δx‖/‖x‖ can be very loose, but it does give an indication that the conditioning of the system could affect the reliability of the result obtained.

Appendix B

Preliminary Code

Here we present the preliminary C++ code, written to provide details for the calculation of the diffusion matrix within COPASI. The code is split into two parts, a header file, which defines the particular biochemical model to be studied, and the ‘main’ file, which reads the information from the header file and performs the calculation. The header file also contains the required C++ libraries:

#include <iostream>
#include <cmath>
#include <cstdlib>
#include <fstream>
using namespace std;

Then we begin the model description. The model we are using here is a model of yeast glycolysis, due to Wolf & Heinrich [33], which is studied in Chapters 4 and 5. The model description should include the cell volume and the number of species and reactions. The system has nine chemical species and nine reactions. The details of the reactions are used to define the stoichiometry matrices for the substrates and products.

//Define system size (cell volume or total population)
double SIZE=10000;
//number of species, c
int c=9;
//number of reactions, A
int A=9;
//The dimension of the stoichiometry matrices should be A times c.

133 134 APPENDIX B. PRELIMINARY CODE

//Species are in the following order:
//(S1,S2,S3,S4,A3,N2,A2,N1,S4^ex)
//substrates:
int s[9][9]={{1,0,0,0,2,0,0,0,0},
             {0,1,0,0,0,0,0,1,0},
             {0,0,1,0,0,0,2,0,0},
             {0,0,0,1,0,1,0,0,0},
             {0,0,0,0,1,0,0,0,0},
             {0,1,0,0,0,1,0,0,0},
             {0,0,0,0,0,0,0,0,0},
             {0,0,0,1,0,0,0,0,0},
             {0,0,0,0,0,0,0,0,1}};
//products:
int p[9][9]={{0,2,0,0,0,0,2,0,0},
             {0,0,1,0,0,1,0,0,0},
             {0,0,0,1,2,0,0,0,0},
             {0,0,0,0,0,0,0,1,0},
             {0,0,0,0,0,0,1,0,0},
             {0,0,0,0,0,0,0,1,0},
             {1,0,0,0,0,0,0,0,0},
             {0,0,0,0,0,0,0,0,1},
             {0,0,0,0,0,0,0,0,0}};

We need to find the concentrations of all the species at the stable fixed point, which is calculated in another COPASI task. In COPASI, these values should be ‘fed into’ this task. We define these concentrations (here they are set to arbitrary values) in an array. We then define pointer variables which refer to these values. In COPASI, the fixed point values can be given either in terms of concentrations or particle numbers. Here we assume that particle numbers are used, so we divide each entry by the volume.

double phi[9]={100/SIZE,100/SIZE,100/SIZE,100/SIZE,
               100/SIZE,100/SIZE,100/SIZE,100/SIZE,50/SIZE};
//define a pointer to each element of phi
double *P1=&phi[0];
double *P2=&phi[1];
double *P3=&phi[2];
double *P4=&phi[3];
double *P5=&phi[4];
double *P6=&phi[5];
double *P7=&phi[6];
double *P8=&phi[7];
double *P9=&phi[8];

We also need to define the reaction kinetics for each reaction. We do this by defining a function, which takes the concentration of the species as arguments.

//define kinetic functions here, one for each reaction
double kinetic1 (double P1, double P5);
double kinetic2 (double P2, double P8);
double kinetic3 (double P7, double P3);
double kinetic4 (double P4, double P6);
double kinetic5 (double P5);
double kinetic6 (double P2, double P6);
double kinetic7();//constant glucose influx
double kinetic8 (double P4, double P9);
double kinetic9 (double P9);

The reaction kinetics for this system are described below. They include the values of the rate constants.

double q= 4;//reaction parameter
double KK= 0.52;//reaction parameter
double kinetic1(double P1, double P5){
  double temp;
  temp= 5*pow(P5,2)*P1/2*(1+ pow(P5/KK,q));
  return temp;
}
double kinetic2(double P2, double P8){
  double temp;
  temp= 3*P2*P8;
  return temp;
}
double kinetic3(double P7, double P3){
  double temp;
  temp= 3*P3*P7;
  return temp;
}
double kinetic4(double P4, double P6){
  double temp;
  temp= 3*P4*P6;
  return temp;
}
double kinetic5(double P5){
  double temp;
  temp= 5*P5;
  return temp;
}
double kinetic6(double P2, double P6){
  double temp;
  temp= 3*P2*P6;
  return temp;
}
double kinetic7(){
  double temp;
  temp= 5;
  return temp;
}
double kinetic8(double P4, double P9){
  double temp;
  temp= 3*(P4-P9);
  return temp;
}
double kinetic9(double P9){
  double temp;
  temp=4*P9;
  return temp;
}

For convenience, we group all of these functions as an array.

//Array should be of length A.
double array[9]={kinetic1(*P1,*P5),kinetic2(*P2,*P8),
                 kinetic3(*P7,*P3),kinetic4(*P4,*P6),kinetic5(*P5),
                 kinetic6(*P2,*P6),kinetic7(),kinetic8(*P4,*P9),kinetic9(*P9)};

Now that the model is defined, the header file can be read in the other file, where the diffusion matrix is calculated. We start by checking that the model is in a suitable form:

#include "header.cpp"//all other libraries defined in linked file
int main(){
for(int i2=0;i2<c;i2++){

We can use the two stoichiometry matrices, defined in the header file, to calculate the net stoichiometry matrix. Here this is defined as a two-dimensional array of integers, r. In Chapter 3 it is called ν_iμ. In the code below, we print the net stoichiometry for each reaction to the screen. The first for loop runs over j, which lists the reactions.

int r[A][c];
//reaction loop
for(int j=0; j<A; j++){
  for(int i2=0; i2<c; i2++){
    r[j][i2]=p[j][i2]-s[j][i2];
    cout<<r[j][i2]<<" ";
  }
  cout<<endl;
}

We are now able to compute the contribution to the matrix B from each reaction. This is done by combining the net stoichiometry with the kinetic functions according to Eq. (3.12). Again, j labels the reactions. We create the two-dimensional array ‘BMatrix’ to store the values of the elements of B. Each element is initialised to zero and then contributions from each reaction are accrued. The contributions from each reaction are printed to the screen, along with the matrix B in its final form.

double BMatrix[c][c];
for(int i1=0;i1<c;i1++){
  for(int i2=0;i2<c;i2++){
    BMatrix[i1][i2]=0;
  }
}
//reaction loop
for(int j=0; j<A; j++){
  for(int j6=0;j6<c;j6++){
    for(int j7=0;j7<c;j7++){
      double BContr= array[j]*r[j][j6]*r[j][j7]*Q;
      BMatrix[j6][j7]+= BContr;
      cout<< BContr <<" ";
    }
    cout<<endl;
  }
}
//print B in its final form
for(int j6=0;j6<c;j6++){
  for(int j7=0;j7<c;j7++){
    cout<<BMatrix[j6][j7]<<" ";
  }
  cout<<endl;
}
return 0;
}

Appendix C

Form of the complex coherence function

In this appendix we state the coefficients of ω in Eq. (5.7), which gives the magnitude of the complex coherence function C12(ω).

2 2 4 2 2 4 2 2 4 2 2 2 4 2 α0 =16g1g3g5g6 − 32g1g2g3g5g6y1 − 128g1g3g4g5g6y1 − 256g1g3g4g5g6y1 2 2 4 2 2 4 2 2 2 2 4 2 2 2 4 3 +16g2g3g5g6y1 + 256g1g2g3g4g5g6y1 + 256g1g4g5g6y1 + 256g1g2g3g4g5g6y1 2 2 4 3 2 4 2 3 2 4 2 3 2 2 2 4 4 +1024g1g3g4g5g6y1 − 128g2g3g4g5g6y1 − 512g1g2g4g5g6y1 + 1024g1g3g4g5y1 2 4 4 2 2 4 2 4 2 4 2 2 2 4 −1024g1g2g3g4g5g6y1 + 256g2g4g5g6y1 − 32g1g2g3g5g6y2 − 88g1g2g3g5g6y1y2 2 4 2 2 4 2 2 2 4 2 2 4 2 +64g1g2g3g5g6y1y2 + 128g1g2g4g5g6y1y2 + 96g1g2g3g5g6y1y2 + 640g1g2g3g4g5g6y1y2 3 4 2 2 2 4 2 2 2 2 4 3 3 2 4 3 −32g2g3g5g6y1y2 − 256g1g2g4g5g6y1y2 + 704g1g2g3g4g5y1y2 − 8g2g3g5g6y1y2 2 4 3 2 2 4 3 3 4 2 3 2 2 4 4 −640g1g2g3g4g5g6y1y2 − 128g1g2g4g5g6y1y2 + 128g2g4g5g6y1y2 − 64g1g2g3g4g5y1y2 2 2 4 4 2 2 4 5 3 2 4 5 2 2 4 2 2 −256g1g2g3g4g5y1y2 − 256g1g2g3g4g5y1y2 + 128g2g4g5g6y1y2 + 16g1g2g5g6y2 2 2 4 2 3 4 2 2 2 2 2 4 2 2 3 4 2 2 +120g1g2g3g5g6y1y2 − 32g1g2g5g6y1y2 + 121g1g2g3g5y1y2 − 128g1g2g3g5g6y1y2 2 2 4 2 2 4 4 2 2 2 3 2 4 3 2 2 2 4 3 2 −160g1g2g4g5g6y1y2 + 16g2g5g6y1y2 − 22g1g2g3g5y1y2 − 344g1g2g3g4g5y1y2 4 4 3 2 3 4 3 2 4 2 4 4 2 3 4 4 2 2 2 2 4 4 2 +8g2g3g5g6y1y2 + 128g1g2g4g5g6y1y2 + g2g3g5y1y2 − 80g1g2g3g4g5y1y2 + 16g1g2g4g5y1y2 4 4 4 2 4 4 5 2 3 2 4 5 2 4 2 4 6 2 +32g2g4g5g6y1y2 + 8g2g3g4g5y1y2 + 32g1g2g4g5y1y2 + 16g2g4g5y1y2 2 3 4 3 2 3 4 2 3 4 4 2 3 4 4 3 3 −32g1g2g5g6y1y2 − 88g1g2g3g5y1y2 + 32g1g2g5g6y1y2 + 8g1g2g3g5y1y2 2 3 4 3 3 4 4 4 3 2 4 4 2 4 +32g1g2g4g5y1y2 + 32g1g2g4g5y1y2 + 16g1g2g5y1y2 (C.1)

2 4 2 2 2 3 2 4 2 4 4 2 α1 =16g1g5g6 − 16g1g3g5g6y1 + 48g1g3g5g6y1 − 16g1g3g5g6y1 − 32g1g2g5g6y1 2 2 4 2 2 3 2 2 3 2 4 2 2 4 2 +36g1g3g5y1 − 64g1g3g4g5g6y1 + 128g1g3g4g5g6y1 − 64g1g2g3g5g6y1 + 16g2g3g5g6y1


4 2 2 4 2 2 2 2 3 3 2 4 3 2 4 3 −64g1g3g4g5g6y1 + 16g2g5g6y1 − 64g1g3g4g5y1 − 24g1g2g3g5y1 − 64g1g3g4g5y1 2 2 3 3 2 3 3 2 3 3 2 4 3 4 3 +16g2g3g5g6y1 − 128g2g3g4g5g6y1 − 512g1g3g4g5g6y1 + 16g2g3g5g6y1 + 64g2g3g4g5g6y1 2 2 2 2 4 2 2 3 4 2 2 4 4 2 4 4 2 2 4 4 +256g1g3g4g5y1 − 512g1g3g4g5y1 + 4g2g3g5y1 + 64g2g3g4g5y1 + 256g3g4g5y1 2 3 4 2 3 4 2 2 2 5 2 2 3 5 2 2 3 5 +64g2g3g4g5g6y1 + 512g2g3g4g5g6y1 + 512g1g2g3g4g5y1 + 64g2g3g4g5y1 + 512g2g3g4g5y1 2 2 2 2 6 2 2 3 2 4 2 2 4 2 2 2 +256g2g3g4g5y1 + 16g1g3g5g6y2 − 16g1g3g5g6y2 − 36g1g3g5y1y2 + 8g1g2g3g5g6y1y2 2 3 2 3 2 3 4 −80g1g2g3g5g6y1y2 + 16g1g2g3g5g6y1y2 − 64g1g3g4g5g6y1y2 + 8g1g2g3g5g6y1y2 2 2 3 2 2 2 3 2 2 4 2 2 4 2 2 4 2 −100g1g2g3g5y1y2 − 32g1g3g4g5y1y2 + 32g1g2g3g5y1y2 − 16g1g3g4g5y1y2 + 96g1g3g4g5y1y2 2 3 2 2 2 3 2 3 2 2 4 2 +96g1g2g3g5g6y1y2 − 32g2g3g5g6y1y2 − 192g1g2g3g4g5g6y1y2 + 8g2g3g5g6y1y2 4 2 2 2 2 3 2 2 3 3 2 3 3 +32g1g2g4g5g6y1y2 + 320g1g2g3g4g5y1y2 + 88g1g2g3g5y1y2 − 16g1g2g3g4g5y1y2 2 3 3 2 2 3 3 2 2 4 3 4 3 2 4 3 −64g1g2g3g4g5y1y2 − 128g1g3g4g5y1y2 + 4g2g3g5y1y2 + 32g1g2g3g4g5y1y2 + 64g2g3g4g5y1y2 2 4 3 3 2 2 3 2 2 2 3 3 3 3 −128g1g3g4g5y1y2 − 8g2g3g5g6y1y2 − 128g1g2g4g5g6y1y2 − 16g2g3g5g6y1y2 2 3 3 2 4 3 2 2 2 4 2 2 2 4 +256g2g3g4g5g6y1y2 − 32g2g4g5g6y1y2 + 320g1g2g3g4g5y1y2 − 256g1g2g3g4g5y1y2 3 2 3 4 2 3 4 2 2 3 4 2 4 4 2 2 2 5 −4g2g3g5y1y2 − 32g1g2g3g4g5y1y2 + 96g2g3g4g5y1y2 − 16g2g3g4g5y1y2 − 256g1g2g3g4g5y1y2 3 3 5 2 2 3 5 3 2 2 5 2 2 4 2 2 3 2 −16g2g3g4g5y1y2 + 128g2g3g4g5y1y2 + 128g2g4g5g6y1y2 + 9g1g3g5y2 − 16g1g2g3g5g6y2 2 4 2 2 2 3 2 2 4 2 2 4 2 2 4 2 +8g1g2g5g6y2 + 28g1g2g3g5y1y2 + 2g1g2g3g5y1y2 + 12g1g2g3g5y1y2 − 24g1g3g4g5y1y2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 3 2 2 +24g1g2g3g5g6y1y2 − 16g1g2g3g5g6y1y2 + 122g1g2g3g5y1y2 − 16g1g2g3g5y1y2 2 2 3 2 2 2 3 2 2 2 2 4 2 2 2 4 2 2 −36g1g2g3g5y1y2 − 144g1g2g3g4g5y1y2 + 4g2g3g5y1y2 + 8g1g2g4g5y1y2 4 2 2 2 2 4 2 2 3 2 2 2 2 2 2 2 2 −80g1g2g3g4g5y1y2 + 16g1g4g5y1y2 − 32g1g2g3g5g6y1y2 − 160g1g2g4g5g6y1y2 3 3 2 2 3 4 2 2 3 2 2 3 2 2 2 2 3 2 
+32g2g3g5g6y1y2 − 8g2g5g6y1y2 − 20g1g2g3g5y1y2 − 336g1g2g3g4g5y1y2 3 3 3 2 3 2 3 3 2 2 3 3 2 3 4 3 2 4 2 3 2 −16g1g2g3g5y1y2 + 8g2g3g5y1y2 + 112g1g2g3g4g5y1y2 − 2g2g3g5y1y2 + 8g2g3g5g6y1y2 3 2 3 2 4 2 2 4 2 3 2 4 2 2 2 2 2 4 2 +128g1g2g4g5g6y1y2 + 2g2g3g5y1y2 − 64g1g2g3g4g5y1y2 + 32g1g2g4g5y1y2 3 3 4 2 3 4 4 2 4 2 4 2 4 2 5 2 3 2 2 5 2 +32g2g3g4g5y1y2 − 8g2g4g5y1y2 + 32g2g4g5g6y1y2 + 16g2g3g4g5y1y2 + 64g1g2g4g5y1y2 4 2 2 6 2 2 4 3 2 2 3 3 2 2 4 3 2 4 3 +32g2g4g5y1y2 − 12g1g2g3g5y2 − 32g1g2g3g5y1y2 + 8g1g2g5y1y2 − 8g1g2g3g5y1y2 2 4 3 2 3 2 3 2 3 2 2 3 3 3 2 3 3 4 2 3 +16g1g2g4g5y1y2 − 32g1g2g5g6y1y2 − 80g1g2g3g5y1y2 + 32g1g2g3g5y1y2 − 8g1g2g5y1y2 4 2 2 3 4 2 3 3 2 3 2 3 3 4 2 4 3 +32g1g2g5g6y1y2 + 16g1g2g3g5y1y2 + 64g1g2g4g5y1y2 + 64g1g2g4g5y1y2 2 2 4 4 2 4 2 2 4 2 2 3 2 3 2 3 +4g1g2g5y2 + 32g1g2g5y1y2 + 32g1g3g5g6y3 − 64g1g2g3g5g6y1y3 − 256g1g3g4g5g6y1y3 2 2 3 2 2 2 3 2 3 2 2 2 3 2 −256g1g3g4g5y1y3 + 32g2g3g5g6y1y3 + 512g1g2g3g4g5g6y1y3 + 512g1g4g5g6y1y3 2 2 3 3 2 3 3 2 3 3 2 3 4 +1024g1g3g4g5y1y3 − 256g2g3g4g5g6y1y3 − 1024g1g2g4g5g6y1y3 − 1024g1g2g3g4g5y1y3 2 2 3 4 2 3 2 2 3 2 3 +512g2g4g5g6y1y3 − 64g1g2g3g5g6y2y3 − 88g1g2g3g5y1y2y3 + 128g1g2g3g5g6y1y2y3 2 3 2 2 3 2 2 3 2 3 3 2 +256g1g2g4g5g6y1y2y3 + 96g1g2g3g5y1y2y3 + 640g1g2g3g4g5y1y2y3 − 64g2g3g5g6y1y2y3 2 3 2 3 2 3 3 2 3 3 2 2 3 3 −512g1g2g4g5g6y1y2y3 − 8g2g3g5y1y2y3 − 640g1g2g3g4g5y1y2y3 − 128g1g2g4g5y1y2y3 3 3 3 3 2 3 5 2 2 3 2 2 2 3 2 +256g2g4g5g6y1y2y3 + 128g2g4g5y1y2y3 + 32g1g2g5g6y2y3 + 120g1g2g3g5y1y2y3 3 3 2 3 3 2 2 2 2 3 2 2 −64g1g2g5g6y1y2y3 − 128g1g2g3g5y1y2y3 − 160g1g2g4g5y1y2y3 4 3 2 2 4 3 3 2 3 3 3 2 +32g2g5g6y1y2y3 + 8g2g3g5y1y2y3 + 128g1g2g4g5y1y2y3 4 3 4 2 2 3 3 3 4 3 2 3 2 3 3 +32g2g4g5y1y2y3 − 32g1g2g5y1y2y3 + 32g1g2g5y1y2y3 + 256g1g2g3g4g5y1y3

(C.2)

3 2 2 2 2 2 3 2 2 4 2 3 2 α2 = − 16g1g3g5g6y1 + 4g1g3g5y1 − 16g1g3g5y1 + 4g3g5y1 + 16g2g3g5g6y1 2 2 3 2 3 3 2 2 2 4 2 2 2 4 2 2 2 2 3 +8g1g2g3g5y1 + 16g2g3g5y1 + 4g2g3g5y1 + 256g3g4g5y1 − 20g1g3g5y1y2 − 4g1g3g5y1y2 2 3 4 2 2 2 2 2 2 2 +4g1g3g5y1y2 − 4g1g3g5y1y2 + 8g1g2g3g5g6y1y2 − 4g1g2g3g5y1y2 + 32g1g3g4g5y1y2 2 2 2 2 2 2 2 2 2 2 3 2 3 2 +48g1g2g3g5y1y2 − 16g1g3g4g5y1y2 + 96g1g3g4g5y1y2 − 8g2g3g5y1y2 + 16g1g3g4g5y1y2 4 2 2 2 2 2 2 2 2 3 2 3 +4g2g3g5y1y2 − 8g2g3g5g6y1y2 + 32g1g2g4g5g6y1y2 − 8g1g2g3g5y1y2 − 16g1g2g3g4g5y1y2 2 2 3 2 2 2 3 2 3 2 2 3 2 2 3 −128g1g3g4g5y1y2 − 28g2g3g5y1y2 + 32g1g2g3g4g5y1y2 + 64g2g3g4g5y1y2 − 128g1g3g4g5y1y2 2 3 3 3 3 2 2 3 3 2 4 2 4 +4g2g3g5y1y2 − 32g2g3g4g5y1y2 − 32g2g4g5g6y1y2 − 4g2g3g5y1y2 − 32g1g2g3g4g5y1y2 2 2 4 2 2 4 3 5 2 2 5 2 2 2 2 −32g2g3g4g5y1y2 − 16g2g3g4g5y1y2 − 16g2g3g4g5y1y2 + 128g2g3g4g5y1y2 + 10g1g3g5y2 2 3 2 2 4 2 2 2 2 2 2 2 2 2 2 2 2 2 +4g1g3g5y2 + g1g5y2 + 8g1g2g5g6y2 + 28g1g2g3g5y1y2 + 4g1g2g3g5y1y2 + 8g1g2g3g5y1y2 2 2 2 3 2 4 2 2 2 2 2 2 2 2 2 2 −32g1g3g4g5y1y2 − 4g1g2g3g5y1y2 − 2g1g2g5y1y2 + g1g2g3y1y2 − 16g1g2g3g5y1y2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 −36g1g2g3g5y1y2 − 144g1g2g3g4g5y1y2 + 8g2g3g5y1y2 + 16g1g2g4g5y1y2 − 64g1g2g3g4g5y1y2 2 2 2 2 2 2 3 2 2 2 4 2 2 3 2 2 2 3 2 3 2 +32g1g4g5y1y2 − 8g2g3g5y1y2 + g2g5y1y2 − 8g2g5g6y1y2 + 2g1g2g3y1y2 2 2 3 2 3 3 2 3 2 3 2 2 3 2 3 2 3 2 +8g1g2g3g4y1y2 − 16g1g2g3g5y1y2 + 8g2g3g5y1y2 + 112g1g2g3g4g5y1y2 − 4g2g3g5y1y2 4 2 4 2 3 4 2 2 2 2 4 2 3 4 2 3 2 4 2 +g2g3y1y2 + 16g1g2g3g4y1y2 + 16g1g2g4y1y2 + 32g2g3g4g5y1y2 − 16g2g4g5y1y2 4 5 2 3 2 5 2 4 2 6 2 2 2 3 2 2 3 +8g2g3g4y1y2 + 32g1g2g4y1y2 + 16g2g4y1y2 − 16g1g2g3g5y2 − 32g1g2g3g5y1y2 2 2 2 3 2 2 3 2 3 2 3 3 2 3 3 2 2 3 +16g1g2g5y1y2 + 32g1g2g4g5y1y2 + 8g1g2g3y1y2 + 32g1g2g3g5y1y2 − 16g1g2g5y1y2 4 3 3 2 3 3 3 4 4 3 2 2 2 4 2 4 2 4 2 3 +8g1g2g3y1y2 + 32g1g2g4y1y2 + 32g1g2g4y1y2 + 8g1g2g5y2 + 16g1g2y1y2 + 32g1g5g6y3 2 2 2 2 3 2 3 3 2 2 2 −16g1g3g5y1y3 + 48g1g3g5y1y3 − 16g1g3g5y1y3 − 64g1g2g5g6y1y3 − 64g1g3g4g5y1y3 2 2 2 3 
2 2 3 2 3 2 2 3 2 +128g1g3g4g5y1y3 − 64g1g2g3g5y1y3 + 16g2g3g5y1y3 − 64g1g3g4g5y1y3 + 32g2g5g6y1y3 2 2 2 3 2 2 3 2 2 3 2 3 3 3 3 +16g2g3g5y1y3 − 128g2g3g4g5y1y3 − 512g1g3g4g5y1y3 + 16g2g3g5y1y3 + 64g2g3g4g5y1y3 2 2 4 2 2 4 2 2 2 2 3 2 2 +64g2g3g4g5y1y3 + 512g2g3g4g5y1y3 + 16g1g3g5y2y3 − 16g1g3g5y2y3 + 8g1g2g3g5y1y2y3 2 2 2 2 2 2 3 −80g1g2g3g5y1y2y3 + 16g1g2g3g5y1y2y3 − 64g1g3g4g5y1y2y3 + 8g1g2g3g5y1y2y3 2 2 2 2 2 2 2 3 2 3 2 −32g2g3g5y1y2y3 − 192g1g2g3g4g5y1y2y3 + 8g2g3g5y1y2y3 + 32g1g2g4g5y1y2y3 3 2 3 2 2 3 3 2 3 2 2 3 −8g2g3g5y1y2y3 − 128g1g2g4g5y1y2y3 − 16g2g3g5y1y2y3 + 256g2g3g4g5y1y2y3 3 2 5 2 2 2 2 3 2 2 2 2 +128g2g4g5y1y2y3 − 16g1g2g3g5y2y3 + 8g1g2g5y2y3 + 24g1g2g3g5y1y2y3 2 2 2 3 2 2 2 2 2 2 3 2 2 2 −16g1g2g3g5y1y2y3 − 32g1g2g3g5y1y2y3 − 160g1g2g4g5y1y2y3 + 32g2g3g5y1y2y3 3 3 2 2 4 3 2 3 3 2 4 4 2 2 3 3 −8g2g5y1y2y3 + 8g2g3g5y1y2y3 + 128g1g2g4g5y1y2y3 + 32g2g4g5y1y2y3 − 32g1g2g5y1y2y3 4 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 +32g1g2g5y1y2y3 + 16g1g3g5y3 − 32g1g2g3g5y1y3 − 128g1g3g4g5y1y3 + 16g2g3g5y1y3 2 2 2 2 2 2 2 2 2 2 3 2 2 2 3 2 +256g1g2g3g4g5y1y3 + 256g1g4g5y1y3 − 128g2g3g4g5y1y3 − 512g1g2g4g5y1y3 2 2 2 4 2 2 2 2 2 2 2 2 2 2 +256g2g4g5y1y3 − 32g1g2g3g5y2y3 + 64g1g2g3g5y1y2y3 + 128g1g2g4g5y1y2y3 3 2 2 2 2 2 2 2 3 2 3 2 2 3 3 −32g2g3g5y1y2y3 − 256g1g2g4g5y1y2y3 + 128g2g4g5y1y2y3 − 32g2g4g5y1y2y3 2 2 2 2 2 3 2 2 2 4 2 2 2 2 2 2 2 +16g1g2g5y2y3 − 32g1g2g5y1y2y3 + 16g2g5y1y2y3 + 96g1g2g3g5y1y2y3

(C.3)

2 2 2 2 2 2 2 2 α3 =4g3g5y1 − 4g1g3g5y1y2 + 4g1g3g5y1y2 − 4g1g3g5y1y2 − 8g2g3g5y1y2 2 2 2 2 3 3 2 2 2 2 2 +16g1g3g4g5y1y2 + 4g2g3g5y1y2 + 4g2g3g5y1y2 − 32g2g3g4g5y1y2 + g1g3y2 + 4g1g3g5y2 2 2 2 2 2 2 2 2 2 2 2 2 +2g1g5y2 + 2g1g2g3y1y2 − 4g1g2g3y1y2 − 8g1g3g4y1y2 − 4g1g2g3g5y1y2 − 4g1g2g5y1y2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 +4g2g3y1y2 + 8g1g2g4y1y2 + 16g1g2g3g4y1y2 + 16g1g4y1y2 − 8g2g3g5y1y2 2 2 2 2 3 3 2 3 4 2 2 3 2 2 3 2 3 +2g2g5y1y2 − 2g2g3y1y2 − 8g2g4y1y2 − 4g1g2g3y2 + 8g1g2y1y2 + 8g1g2g3y1y2 2 3 3 2 3 2 2 4 2 2 2 +16g1g2g4y1y2 − 8g1g2y1y2 + 4g1g2y2 − 16g1g3g5y1y3 + 16g2g3g5y1y3 + 8g1g2g3g5y1y2y3 2 2 2 2 3 2 2 −8g2g3g5y1y2y3 + 32g1g2g4g5y1y2y3 − 32g2g4g5y1y2y3 + 8g1g2g5y2y3 3 2 2 2 2 2 2 2 2 2 2 2 −8g2g5y1y2y3 + 16g1g5y3 − 32g1g2g5y1y3 + 16g2g5y1y3 (C.4)

α4 = g1^2 y2^2 − 2 g1 g2 y1 y2^2 + g2^2 y1^2 y2^2 = (g1 − g2 y1)^2 y2^2 (C.5)

2 2 4 2 2 3 4 2 4 2 2 4 2 β0 =16g1g3g5g6 + 36g1g3g5g6y1 − 32g1g2g3g5g6y1 − 128g1g3g4g5g6y1 3 4 2 2 2 4 2 2 2 4 2 2 4 2 2 2 2 4 2 2 −24g1g2g3g5g6y1 − 208g1g3g4g5g6y1 + 16g2g3g5g6y1 + 256g1g2g3g4g5g6y1 + 256g1g4g5g6y1 2 3 4 3 2 3 4 3 2 4 3 2 2 4 3 2 4 2 3 +144g1g3g4g5y1 + 4g2g3g5g6y1 + 96g1g2g3g4g5g6y1 + 704g1g3g4g5g6y1 − 128g2g3g4g5g6y1 2 4 2 3 3 4 4 2 2 2 4 4 2 2 4 4 2 4 4 −512g1g2g4g5g6y1 − 96g1g2g3g4g5y1 + 640g1g3g4g5y1 + 48g2g3g4g5g6y1 − 1152g1g2g3g4g5g6y1 2 3 4 4 2 2 4 2 4 2 3 4 5 2 2 4 5 +256g1g4g5g6y1 + 256g2g4g5g6y1 + 16g2g3g4g5y1 − 256g1g2g3g4g5y1 2 3 4 5 2 2 4 5 3 4 5 2 2 2 4 6 3 4 6 +256g1g3g4g5y1 + 192g2g3g4g5g6y1 + 512g1g2g4g5g6y1 + 128g2g3g4g5y1 + 512g1g2g3g4g5y1 2 3 4 6 2 3 4 7 3 2 4 2 4 2 3 3 4 +256g2g4g5g6y1 + 256g2g3g4g5y1 + 36g1g3g5g6y2 − 32g1g2g3g5g6y2 + 81g1g3g5y1y2 2 2 4 3 4 2 4 2 2 4 2 −124g1g2g3g5g6y1y2 − 96g1g3g4g5g6y1y2 + 64g1g2g3g5g6y1y2 + 128g1g2g4g5g6y1y2 2 3 4 2 3 2 4 2 2 2 4 2 2 4 2 −45g1g2g3g5y1y2 − 180g1g3g4g5y1y2 + 92g1g2g3g5g6y1y2 + 352g1g2g3g4g5g6y1y2 3 2 4 2 3 4 2 2 2 4 2 2 2 3 4 3 +64g1g4g5g6y1y2 − 32g2g3g5g6y1y2 − 256g1g2g4g5g6y1y2 + 3g1g2g3g5y1y2 2 2 4 3 3 2 4 3 3 2 4 3 2 4 3 +356g1g2g3g4g5y1y2 + 48g1g3g4g5y1y2 − 4g2g3g5g6y1y2 − 544g1g2g3g4g5g6y1y2 2 2 4 3 3 4 2 3 3 3 4 4 2 2 4 4 +320g1g2g4g5g6y1y2 + 128g2g4g5g6y1y2 + g2g3g5y1y2 + 36g1g2g3g4g5y1y2 2 2 4 4 3 3 4 4 3 4 4 2 2 4 4 +144g1g2g3g4g5y1y2 + 64g1g4g5y1y2 + 32g2g3g4g5g6y1y2 + 448g1g2g4g5g6y1y2 2 2 4 5 2 3 4 5 3 2 4 5 3 2 4 6 +144g1g2g3g4g5y1y2 + 192g1g2g4g5y1y2 + 192g2g4g5g6y1y2 + 48g2g3g4g5y1y2 2 3 4 6 3 3 4 7 3 4 2 2 2 4 2 2 3 2 4 2 +192g1g2g4g5y1y2 + 64g2g4g5y1y2 − 48g1g2g3g5g6y2 + 16g1g2g5g6y2 − 72g1g2g3g5y1y2 2 2 4 2 3 4 2 3 4 2 2 2 2 2 4 2 2 +104g1g2g3g5g6y1y2 + 64g1g2g4g5g6y1y2 − 32g1g2g5g6y1y2 + 121g1g2g3g5y1y2 3 4 2 2 2 2 4 2 2 4 4 2 2 2 3 2 4 3 2 −112g1g2g3g5g6y1y2 + 32g1g2g4g5g6y1y2 + 16g2g5g6y1y2 − 14g1g2g3g5y1y2 2 2 4 3 2 3 2 4 3 2 4 4 3 2 3 4 3 2 −216g1g2g3g4g5y1y2 + 128g1g2g4g5y1y2 + 8g2g3g5g6y1y2 + 192g1g2g4g5g6y1y2 4 2 4 4 2 3 4 4 2 2 2 2 4 4 2 4 4 4 2 4 
4 5 2 +g2g3g5y1y2 − 16g1g2g3g4g5y1y2 + 272g1g2g4g5y1y2 + 32g2g4g5g6y1y2 + 8g2g3g4g5y1y2

3 2 4 5 2 4 2 4 6 2 3 2 4 3 3 2 4 3 2 3 4 3 +160g1g2g4g5y1y2 + 16g2g4g5y1y2 + 16g1g2g5g6y2 − 12g1g2g3g5y1y2 − 16g1g2g5g6y1y2 2 3 4 2 3 3 2 4 2 3 4 4 2 3 4 4 3 3 −68g1g2g3g5y1y2 + 80g1g2g4g5y1y2 + 32g1g2g5g6y1y2 + 8g1g2g3g5y1y2 2 3 4 3 3 4 4 4 3 3 3 4 4 2 4 4 2 4 3 2 4 5 +112g1g2g4g5y1y2 + 32g1g2g4g5y1y2 + 16g1g2g5y1y2 + 16g1g2g5y1y2 + 12g2g3g4g5y1y2 (C.6)

2 4 2 2 3 2 3 3 2 4 3 4 4 2 β1 =16g1g5g6 + 4g1g3g5g6y1 − 16g1g3g5g6y1 + 52g1g3g5g6y1 + 4g3g5g6y1 − 32g1g2g5g6y1 2 2 4 2 3 2 2 2 2 2 2 3 3 2 2 3 2 +36g1g3g5y1 + 8g1g2g3g5g6y1 − 16g1g3g4g5g6y1 + 16g2g3g5g6y1 + 128g1g3g4g5g6y1 4 2 2 4 2 2 4 2 2 4 2 2 2 3 2 3 −56g1g2g3g5g6y1 + 16g1g4g5g6y1 − 16g3g4g5g6y1 + 16g2g5g6y1 + 16g1g3g4g5y1 3 3 3 2 4 3 2 4 3 3 4 3 2 3 2 3 2 2 3 −64g1g3g4g5y1 − 24g1g2g3g5y1 + 16g1g3g4g5y1 + 16g3g4g5y1 + 4g2g3g5g6y1 − 32g1g2g3g4g5g6y1 2 2 2 3 2 3 3 2 3 3 2 4 3 4 3 −64g1g3g4g5g6y1 − 128g2g3g4g5g6y1 − 256g1g3g4g5g6y1 + 20g2g3g5g6y1 + 32g1g2g4g5g6y1 2 4 3 3 2 4 2 2 2 2 4 3 3 4 2 2 3 4 −64g3g4g5g6y1 + 32g1g2g3g4g5y1 + 128g1g3g4g5y1 + 64g2g3g4g5y1 − 256g1g3g4g5y1 2 2 4 4 4 4 2 2 4 4 2 2 2 4 2 2 4 +4g2g3g5y1 + 32g1g2g3g4g5y1 + 128g3g4g5y1 − 16g2g3g4g5g6y1 − 128g1g2g3g4g5g6y1 2 3 2 4 2 3 4 2 4 4 3 4 4 2 3 2 5 +256g1g4g5g6y1 + 256g2g3g4g5g6y1 + 16g2g4g5g6y1 + 256g4g5g6y1 + 16g2g3g4g5y1 2 2 2 5 2 3 2 5 2 2 3 5 2 4 5 3 4 5 +256g1g2g3g4g5y1 + 256g1g3g4g5y1 + 256g2g3g4g5y1 + 16g2g3g4g5y1 + 256g3g4g5y1 2 2 2 5 3 2 5 2 2 2 2 6 3 2 6 2 3 2 6 −64g2g3g4g5g6y1 + 512g1g2g4g5g6y1 + 128g2g3g4g5y1 + 512g1g2g3g4g5y1 + 256g2g4g5g6y1 2 3 2 7 3 2 2 3 3 3 4 2 4 3 3 2 +256g2g3g4g5y1 + 4g1g3g5g6y2 + 16g1g3g5g6y2 + 4g1g5g6y2 + 4g1g3g5g6y2 + 18g1g3g5y1y2 3 2 3 2 3 3 3 4 3 4 2 2 2 +36g1g3g5y1y2 − 36g1g3g5y1y2 + 9g1g3g5y1y2 + 9g1g3g5y1y2 + 4g1g2g3g5g6y1y2 3 2 2 3 2 3 2 4 2 4 −32g1g3g4g5g6y1y2 − 64g1g2g3g5g6y1y2 + 32g1g2g3g5g6y1y2 + 12g1g2g5g6y1y2 − 4g2g3g5g6y1y2 4 2 3 2 2 3 2 2 2 2 2 3 2 3 3 2 −32g1g3g4g5g6y1y2 + 22g1g2g3g5y1y2 − 56g1g3g4g5y1y2 − 96g1g2g3g5y1y2 + 32g1g2g3g5y1y2 3 3 2 2 2 3 2 2 4 2 3 4 2 3 4 2 +16g1g3g4g5y1y2 + 96g1g3g4g5y1y2 + 19g1g2g3g5y1y2 + g2g3g5y1y2 + 4g1g4g5y1y2 2 4 2 2 2 2 2 2 2 2 3 2 2 2 +28g1g3g4g5y1y2 − 4g1g2g3g5g6y1y2 − 96g1g2g3g4g5g6y1y2 + 64g1g4g5g6y1y2 2 3 2 2 2 3 2 3 2 2 4 2 +80g1g2g3g5g6y1y2 − 32g2g3g5g6y1y2 − 128g1g2g3g4g5g6y1y2 + 12g1g2g5g6y1y2 4 2 2 4 2 2 3 2 3 2 2 2 3 −32g2g3g4g5g6y1y2 + 64g1g4g5g6y1y2 + 6g1g2g3g5y1y2 + 88g1g2g3g4g5y1y2 3 2 2 
3 2 2 3 3 2 3 3 3 2 3 3 +32g1g3g4g5y1y2 + 52g1g2g3g5y1y2 + 4g2g3g5y1y2 − 128g1g2g3g4g5y1y2 2 2 3 3 2 4 3 2 4 3 2 4 3 2 4 3 −64g1g3g4g5y1y2 + 11g1g2g3g5y1y2 + 12g1g2g4g5y1y2 + 28g2g3g4g5y1y2 − 16g1g3g4g5y1y2 3 2 2 3 2 2 3 2 2 2 3 3 3 3 −4g2g3g5g6y1y2 − 96g1g2g3g4g5g6y1y2 + 320g1g2g4g5g6y1y2 − 32g2g3g5g6y1y2 2 3 3 3 4 3 2 4 3 3 3 2 4 2 2 2 4 +128g2g3g4g5g6y1y2 + 4g2g5g6y1y2 + 192g2g4g5g6y1y2 + 2g2g3g5y1y2 + 152g1g2g3g4g5y1y2 2 2 2 4 3 3 2 4 3 2 3 4 2 3 4 +96g1g2g3g4g5y1y2 + 128g1g4g5y1y2 − 8g2g3g5y1y2 − 48g1g2g3g4g5y1y2 2 2 3 4 3 4 4 2 4 4 2 4 4 3 4 4 +32g2g3g4g5y1y2 + g2g3g5y1y2 + 12g1g2g4g5y1y2 + 112g2g3g4g5y1y2 + 64g1g4g5y1y2 3 2 4 2 2 2 4 3 2 2 5 2 2 2 5 −32g2g3g4g5g6y1y2 + 448g1g2g4g5g6y1y2 + 8g2g3g4g5y1y2 + 96g1g2g3g4g5y1y2 2 3 2 5 3 3 5 2 2 3 5 3 4 5 3 4 5 +384g1g2g4g5y1y2 − 32g2g3g4g5y1y2 + 64g2g3g4g5y1y2 + 4g2g4g5y1y2 + 64g2g4g5y1y2 3 2 2 5 3 2 2 6 2 3 2 6 3 3 2 7 2 2 4 2 +192g2g4g5g6y1y2 + 32g2g3g4g5y1y2 + 384g1g2g4g5y1y2 + 128g2g4g5y1y2 + 9g1g3g5y2 3 2 2 4 2 3 2 2 2 3 3 2 2 2 3 2 −16g1g2g3g5g6y2 − 8g1g2g3g5g6y2 − 8g1g2g3g5y1y2 + 16g1g2g3g5y1y2 + 48g1g2g3g5y1y2 3 4 2 2 4 2 2 4 2 2 2 2 2 3 2 2 +4g1g2g5y1y2 − 2g1g2g3g5y1y2 − 24g1g3g4g5y1y2 + 8g1g2g3g5g6y1y2 + 64g1g2g4g5g6y1y2

2 3 2 2 4 2 4 2 2 2 2 2 2 2 −16g1g2g3g5g6y1y2 − 4g2g3g5g6y1y2 + 32g1g2g4g5g6y1y2 + 58g1g2g3g5y1y2 3 2 2 2 2 2 3 2 2 2 2 3 2 2 2 3 2 2 −32g1g2g3g4g5y1y2 − 16g1g2g3g5y1y2 − 52g1g2g3g5y1y2 − 64g1g2g3g4g5y1y2 2 2 4 2 2 2 2 4 2 2 4 2 2 2 2 4 2 2 3 2 2 2 +8g1g2g5y1y2 + 2g2g3g5y1y2 − 48g1g2g3g4g5y1y2 + 16g1g4g5y1y2 − 48g1g2g3g5g6y1y2 2 2 2 2 2 3 3 2 2 2 4 2 2 3 2 2 3 2 +32g1g2g4g5g6y1y2 + 16g2g3g5g6y1y2 + 48g2g4g5g6y1y2 − 12g1g2g3g5y1y2 2 2 2 3 2 3 2 2 3 2 3 3 3 2 3 2 3 3 2 −272g1g2g3g4g5y1y2 + 256g1g2g4g5y1y2 − 32g1g2g3g5y1y2 + 4g2g3g5y1y2 3 4 3 2 2 4 3 2 2 4 3 2 4 2 3 2 +4g1g2g5y1y2 + 16g2g3g4g5y1y2 + 96g1g2g4g5y1y2 + 8g2g3g5g6y1y2 3 2 3 2 4 2 2 4 2 3 2 4 2 2 2 2 2 4 2 +192g1g2g4g5g6y1y2 + 2g2g3g5y1y2 − 32g1g2g3g4g5y1y2 + 544g1g2g4g5y1y2 3 3 4 2 2 2 4 4 2 4 2 4 2 4 2 5 2 +16g2g3g4g5y1y2 + 32g2g4g5y1y2 + 32g2g4g5g6y1y2 + 16g2g3g4g5y1y2 3 2 2 5 2 4 2 2 6 2 2 4 3 3 2 2 3 +320g1g2g4g5y1y2 + 32g2g4g5y1y2 − 12g1g2g3g5y2 + 16g1g2g5g6y2 2 4 3 3 2 2 3 2 2 3 3 2 4 3 2 4 3 +4g1g2g5g6y2 − 24g1g2g3g5y1y2 − 16g1g2g3g5y1y2 − 7g1g2g3g5y1y2 + 16g1g2g4g5y1y2 2 3 2 3 3 4 3 2 3 2 2 3 3 2 2 2 3 3 3 2 3 −16g1g2g5g6y1y2 + 4g2g5g6y1y2 − 72g1g2g3g5y1y2 + 160g1g2g4g5y1y2 + 16g1g2g3g5y1y2 3 4 2 3 2 4 2 3 4 2 2 3 4 2 3 3 +g2g3g5y1y2 + 36g1g2g4g5y1y2 + 32g1g2g5g6y1y2 + 16g1g2g3g5y1y2 2 3 2 3 3 3 4 3 3 4 2 4 3 2 2 4 4 3 3 2 4 +224g1g2g4g5y1y2 + 4g2g4g5y1y2 + 64g1g2g4g5y1y2 + 4g1g2g5y2 + 32g1g2g5y1y2 3 4 4 2 4 2 2 4 2 2 3 2 3 3 2 3 +4g1g2g5y1y2 + 32g1g2g5y1y2 + 32g1g3g5g6y3 + 36g1g3g5y1y3 − 64g1g2g3g5g6y1y3 2 3 3 3 2 2 2 3 2 2 2 3 2 −256g1g3g4g5g6y1y3 − 24g1g2g3g5y1y3 − 208g1g3g4g5y1y3 + 32g2g3g5g6y1y3 2 2 3 2 2 3 3 3 2 3 3 2 2 3 3 +512g1g4g5g6y1y3 + 4g2g3g5y1y3 + 96g1g2g3g4g5y1y3 + 704g1g3g4g5y1y3 2 3 3 2 2 3 4 2 3 4 2 3 3 4 −1024g1g2g4g5g6y1y3 + 48g2g3g4g5y1y3 − 1152g1g2g3g4g5y1y3 + 256g1g4g5y1y3 2 2 3 4 2 2 3 5 3 3 5 2 3 3 6 3 2 3 +512g2g4g5g6y1y3 + 192g2g3g4g5y1y3 + 512g1g2g4g5y1y3 + 256g2g4g5y1y3 + 36g1g3g5y2y3 2 3 2 2 3 3 3 2 3 −64g1g2g3g5g6y2y3 − 124g1g2g3g5y1y2y3 − 96g1g3g4g5y1y2y3 + 
128g1g2g3g5g6y1y2y3 2 3 2 2 3 2 2 3 2 3 2 3 2 +256g1g2g4g5g6y1y2y3 + 92g1g2g3g5y1y2y3 + 352g1g2g3g4g5y1y2y3 + 64g1g4g5y1y2y3 3 3 2 2 3 2 3 2 3 3 2 3 3 −64g2g3g5g6y1y2y3 − 512g1g2g4g5g6y1y2y3 − 4g2g3g5y1y2y3 − 544g1g2g3g4g5y1y2y3 2 2 3 3 3 3 3 3 3 4 2 2 3 4 +320g1g2g4g5y1y2y3 + 256g2g4g5g6y1y2y3 + 32g2g3g4g5y1y2y3 + 448g1g2g4g5y1y2y3 3 2 3 5 3 3 2 2 2 3 2 2 2 3 2 +192g2g4g5y1y2y3 − 48g1g2g3g5y2y3 + 32g1g2g5g6y2y3 + 104g1g2g3g5y1y2y3 3 3 2 3 3 2 3 3 2 2 +64g1g2g4g5y1y2y3 − 64g1g2g5g6y1y2y3 − 112g1g2g3g5y1y2y3 2 2 3 2 2 4 3 2 2 4 3 3 2 3 3 3 2 +32g1g2g4g5y1y2y3 + 32g2g5g6y1y2y3 + 8g2g3g5y1y2y3 + 192g1g2g4g5y1y2y3 4 3 4 2 3 2 3 3 2 3 3 3 4 3 2 3 +32g2g4g5y1y2y3 + 16g1g2g5y2y3 − 16g1g2g5y1y2y3 + 32g1g2g5y1y2y3 2 3 3 3 2 2 3 3 2 −256g2g3g4g5g6y1y3 + 512g1g2g3g4g5g6y1y3 + 48g1g2g3g4g5y1y2 (C.7)

2 2 3 2 3 4 2 2 2 2 2 3 2 β2 =4g1g3g5g6y1 + 4g3g5g6y1 − 16g1g3g5g6y1 + 4g3g5g6y1 + 4g1g3g5y1 − 16g1g3g5y1 2 4 2 2 2 2 2 2 2 2 2 3 2 4 2 +4g3g5y1 + 8g1g2g3g5g6y1 + 16g1g4g5g6y1 − 16g3g4g5g6y1 + 16g2g3g5g6y1 + 16g4g5g6y1 2 2 3 2 2 3 3 2 3 2 3 3 4 3 2 2 3 +8g1g2g3g5y1 + 16g1g3g4g5y1 + 16g3g4g5y1 + 16g2g3g5y1 + 16g3g4g5y1 + 4g2g3g5g6y1 2 3 2 2 3 2 2 2 4 2 4 2 2 2 4 +32g1g2g4g5g6y1 − 64g3g4g5g6y1 + 4g2g3g5y1 + 32g1g2g3g4g5y1 + 128g3g4g5y1 2 2 4 3 2 4 2 2 5 3 2 5 3 2 2 2 +16g2g4g5g6y1 + 256g4g5g6y1 + 16g2g3g4g5y1 + 256g3g4g5y1 + 4g1g5g6y2 + 4g1g3g5g6y2

3 3 3 2 2 3 3 2 2 2 2 3 2 +g1g3y1y2 + 4g1g3g5y1y2 − 4g1g3g5y1y2 + 10g1g3g5y1y2 − 16g1g3g5y1y2 + 10g1g3g5y1y2 2 3 4 2 2 2 2 2 4 +4g1g3g5y1y2 + 5g1g3g5y1y2 + 12g1g2g5g6y1y2 − 4g2g3g5g6y1y2 − 32g1g3g4g5g6y1y2 + 4g2g5g6y1y2 2 3 2 3 2 2 3 2 2 2 2 2 2 2 +3g1g2g3y1y2 − 4g1g3g4y1y2 + 16g1g3g4g5y1y2 + 32g1g3g4g5y1y2 + 22g1g2g3g5y1y2 2 2 2 3 2 2 3 2 2 2 2 2 2 3 2 +48g1g2g3g5y1y2 + 2g2g3g5y1y2 + 8g1g4g5y1y2 + 24g1g3g4g5y1y2 − 8g2g3g5y1y2 3 2 4 2 4 2 2 2 2 2 2 +16g1g3g4g5y1y2 + 5g2g3g5y1y2 + 4g1g4g5y1y2 + 12g1g2g5g6y1y2 − 32g2g3g4g5g6y1y2 2 2 2 2 3 3 2 2 3 3 2 3 2 2 3 +64g1g4g5g6y1y2 + 3g1g2g3y1y2 − 12g1g2g3g4y1y2 − 16g1g3g4y1y2 − 12g1g2g3g5y1y2 2 3 3 2 2 3 2 2 3 2 2 2 3 2 2 3 +4g2g3g5y1y2 − 64g1g3g4g5y1y2 + 14g1g2g3g5y1y2 − 32g2g3g5y1y2 + 24g1g2g4g5y1y2 2 2 3 2 2 3 2 3 3 3 3 4 3 +24g2g3g4g5y1y2 − 32g1g3g4g5y1y2 + 4g2g3g5y1y2 − 32g2g3g4g5y1y2 + 4g2g4g5y1y2 3 2 3 2 2 3 3 3 4 2 2 4 2 2 4 2 3 +4g2g5g6y1y2 + 192g2g4g5g6y1y2 + g2g3y1y2 − 12g1g2g3g4y1y2 − 48g1g2g3g4y1y2 − 4g1g3g5y1y2 3 3 4 3 2 4 2 4 2 2 4 3 2 4 2 2 4 +64g1g4y1y2 − 8g2g3g5y1y2 − 48g1g2g3g4g5y1y2 − 32g2g3g4g5y1y2 + 2g2g3g5y1y2 + 24g1g2g4g5y1y2 2 2 4 3 2 4 3 2 5 2 2 5 2 3 5 +96g2g3g4g5y1y2 + 128g1g4g5y1y2 − 4g2g3g4y1y2 − 48g1g2g3g4y1y2 + 192g1g2g4y1y2 3 5 2 2 5 3 2 5 3 2 5 3 2 6 −32g2g3g4g5y1y2 + 64g2g3g4g5y1y2 + 8g2g4g5y1y2 + 128g2g4g5y1y2 − 16g2g3g4y1y2 2 3 6 3 3 7 2 2 2 2 2 3 2 2 4 2 2 2 +192g1g2g4y1y2 + 64g2g4y1y2 + 10g1g3g5y2 + 4g1g3g5y2 + g1g5y2 − 8g1g2g3g5g6y2 3 2 2 2 2 3 2 2 2 2 2 2 2 2 +16g1g2g3g5y1y2 + 16g1g2g3g5y1y2 + 8g1g2g5y1y2 − 4g1g2g3g5y1y2 − 32g1g3g4g5y1y2 3 2 4 2 2 2 2 2 2 2 2 2 2 2 −4g1g2g3g5y1y2 + 2g1g2g5y1y2 − 4g2g3g5g6y1y2 + 32g1g2g4g5g6y1y2 + g1g2g3y1y2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 −32g1g2g3g4y1y2 − 16g1g2g3g5y1y2 − 20g1g2g3g5y1y2 − 64g1g2g3g4g5y1y2 + 16g1g2g5y1y2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 4 2 2 +4g2g3g5y1y2 − 64g1g2g3g4g5y1y2 + 32g1g4g5y1y2 − 8g2g3g5y1y2 + g2g5y1y2 2 2 2 2 3 2 3 2 2 2 3 2 3 2 3 2 3 3 2 +48g2g4g5g6y1y2 + 2g1g2g3y1y2 − 56g1g2g3g4y1y2 + 128g1g2g4y1y2 − 
32g1g2g3g5y1y2 3 2 3 2 2 3 2 3 2 3 2 2 2 3 2 2 2 3 2 +4g2g3g5y1y2 + 48g1g2g3g4g5y1y2 + 8g1g2g5y1y2 + 16g2g3g4g5y1y2 + 192g1g2g4g5y1y2 4 2 4 2 3 4 2 2 2 2 4 2 3 4 2 2 2 2 4 2 +g2g3y1y2 − 16g1g2g3g4y1y2 + 272g1g2g4y1y2 + 16g2g3g4g5y1y2 + 64g2g4g5y1y2 4 5 2 3 2 5 2 4 2 6 2 2 2 3 2 2 3 +8g2g3g4y1y2 + 160g1g2g4y1y2 + 16g2g4y1y2 − 16g1g2g3g5y2 + 4g1g2g5g6y2 3 2 3 2 2 3 2 2 3 2 2 3 3 2 3 −12g1g2g3y1y2 − 16g1g2g3g5y1y2 − 6g1g2g3g5y1y2 + 32g1g2g4g5y1y2 + 4g2g5g6y1y2 2 3 2 3 3 2 2 3 3 2 3 3 2 2 3 2 2 2 3 −4g1g2g3y1y2 + 80g1g2g4y1y2 + 16g1g2g3g5y1y2 + 2g2g3g5y1y2 + 72g1g2g4g5y1y2 4 3 3 2 3 3 3 3 2 3 3 4 4 3 2 2 2 4 +8g1g2g3y1y2 + 112g1g2g4y1y2 + 8g2g4g5y1y2 + 32g1g2g4y1y2 + 8g1g2g5y2 3 3 4 3 2 4 2 4 2 4 2 3 2 3 3 2 +16g1g2y1y2 + 8g1g2g5y1y2 + 16g1g2y1y2 + 32g1g5g6y3 + 4g1g3g5y1y3 − 16g1g3g5y1y3 2 3 3 3 3 3 2 2 2 2 +52g1g3g5y1y3 + 4g3g5y1y3 − 64g1g2g5g6y1y3 + 8g1g2g3g5y1y3 − 16g1g3g4g5y1y3 3 2 2 2 2 2 3 2 2 3 2 2 3 2 +16g2g3g5y1y3 + 128g1g3g4g5y1y3 − 56g1g2g3g5y1y3 + 16g1g4g5y1y3 − 16g3g4g5y1y3 2 3 2 2 3 3 2 3 2 2 3 2 2 3 +32g2g5g6y1y3 + 4g2g3g5y1y3 − 32g1g2g3g4g5y1y3 − 64g1g3g4g5y1y3 − 128g2g3g4g5y1y3 2 2 3 2 3 3 3 3 2 3 3 2 2 4 −256g1g3g4g5y1y3 + 20g2g3g5y1y3 + 32g1g2g4g5y1y3 − 64g3g4g5y1y3 − 16g2g3g4g5y1y3 2 4 2 3 4 2 2 4 2 3 4 3 3 4 −128g1g2g3g4g5y1y3 + 256g1g4g5y1y3 + 256g2g3g4g5y1y3 + 16g2g4g5y1y3 + 256g4g5y1y3 2 2 5 3 5 2 3 6 3 2 3 2 −64g2g3g4g5y1y3 + 512g1g2g4g5y1y3 + 256g2g4g5y1y3 + 4g1g3g5y2y3 + 16g1g3g5y2y3 3 3 2 3 2 2 3 2 2 +4g1g5y2y3 + 4g1g3g5y2y3 + 4g1g2g3g5y1y2y3 − 32g1g3g4g5y1y2y3 − 64g1g2g3g5y1y2y3 2 2 2 3 2 3 3 2 2 2 +32g1g2g3g5y1y2y3 + 12g1g2g5y1y2y3 − 4g2g3g5y1y2y3 − 32g1g3g4g5y1y2y3 − 4g1g2g3g5y1y2y3 2 2 3 2 2 2 2 2 2 2 2 2 −96g1g2g3g4g5y1y2y3 + 64g1g4g5y1y2y3 + 80g1g2g3g5y1y2y3 − 32g2g3g5y1y2y3 2 2 2 3 2 3 2 2 3 2 3 2 3 −128g1g2g3g4g5y1y2y3 + 12g1g2g5y1y2y3 − 32g2g3g4g5y1y2y3 + 64g1g4g5y1y2y3 − 4g2g3g5y1y2y3

2 3 2 2 3 3 2 3 2 2 3 −96g1g2g3g4g5y1y2y3 + 320g1g2g4g5y1y2y3 − 32g2g3g5y1y2y3 + 128g2g3g4g5y1y2y3 3 3 3 2 3 3 3 4 2 2 4 3 2 5 +4g2g5y1y2y3 + 192g2g4g5y1y2y3 − 32g2g3g4g5y1y2y3 + 448g1g2g4g5y1y2y3 + 192g2g4g5y1y2y3 3 2 3 2 2 2 2 3 2 2 2 2 −16g1g2g3g5y2y3 − 8g1g2g3g5y2y3 + 8g1g2g3g5y1y2y3 + 64g1g2g4g5y1y2y3 − 16g1g2g3g5y1y2y3 2 3 2 3 2 3 2 2 2 2 2 2 −4g2g3g5y1y2y3 + 32g1g2g4g5y1y2y3 − 48g1g2g3g5y1y2y3 + 32g1g2g4g5y1y2y3 3 2 2 2 2 3 2 2 4 3 2 3 3 2 4 4 2 +16g2g3g5y1y2y3 + 48g2g4g5y1y2y3 + 8g2g3g5y1y2y3 + 192g1g2g4g5y1y2y3 + 32g2g4g5y1y2y3 3 2 3 2 3 3 2 3 3 3 3 3 4 2 3 +16g1g2g5y2y3 + 4g1g2g5y2y3 − 16g1g2g5y1y2y3 + 4g2g5y1y2y3 + 32g1g2g5y1y2y3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 +16g1g3g5y3 − 32g1g2g3g5y1y3 − 128g1g3g4g5y1y3 + 16g2g3g5y1y3 + 256g1g2g3g4g5y1y3 2 2 2 2 2 2 2 3 2 2 2 3 2 2 2 2 4 2 2 2 2 2 +256g1g4g5y1y3 − 128g2g3g4g5y1y3 − 512g1g2g4g5y1y3 + 256g2g4g5y1y3 − 256g1g2g4g5y1y2y3 2 2 2 2 2 2 2 2 2 3 2 2 2 −32g1g2g3g5y2y3 + 64g1g2g3g5y1y2y3 + 128g1g2g4g5y1y2y3 − 32g2g3g5y1y2y3 3 2 3 2 2 2 2 2 2 3 2 2 2 4 2 2 2 2 4 +128g2g4g5y1y2y3 + 16g1g2g5y2y3 − 32g1g2g5y1y2y3 + 16g2g5y1y2y3 + 4g1g5g6y2 (C.8)

2 2 2 2 2 2 2 3 2 3 3 β3 =4g3g5g6y1 + 4g3g5y1 + 16g4g5g6y1 + 16g3g4g5y1 + 4g1g5g6y2 + g1g3y1y2 + g1g3y1y2 2 2 2 2 2 3 2 3 2 +4g1g3g5y1y2 + 6g1g3g5y1y2 + 4g2g5g6y1y2 + 3g1g2g3y1y2 + g2g3y1y2 + 4g1g4y1y2 2 2 2 2 2 2 2 2 2 2 3 −4g1g3g4y1y2 − 8g2g3g5y1y2 + 16g1g3g4g5y1y2 + 6g2g3g5y1y2 + 8g1g4g5y1y2 + 3g1g2g3y1y2 2 3 2 3 2 3 2 3 3 2 3 +12g1g2g4y1y2 − 4g2g3g4y1y2 − 16g1g3g4y1y2 + 4g2g3g5y1y2 − 32g2g3g4g5y1y2 + 8g2g4g5y1y2 2 4 2 4 3 4 3 5 3 5 2 2 2 2 2 +12g1g2g4y1y2 − 16g2g3g4y1y2 + 64g1g4y1y2 + 4g2g4y1y2 + 64g2g4y1y2 + g1g3y2 + 4g1g3g5y2 3 2 2 2 2 2 2 2 2 2 2 2 2 +4g1g2y1y2 − 2g1g2g3y1y2 − 8g1g3g4y1y2 − 4g1g2g3g5y1y2 + 4g1g2g5y1y2 + 8g1g2y1y2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 2 2 3 2 −16g1g2g3g4y1y2 + 16g1g4y1y2 − 8g2g3g5y1y2 + 2g2g5y1y2 + 4g1g2y1y2 + 96g1g2g4y1y2 2 2 4 2 2 3 2 3 2 3 3 2 3 2 2 3 +32g2g4y1y2 − 4g1g2g3y2 + g1g2g3y1y2 + 16g1g2g4y1y2 + g2g3y1y2 + 36g1g2g4y1y2 3 3 3 2 2 4 3 4 2 3 2 3 +4g2g4y1y2 + 4g1g2y2 + 4g1g2y1y2 + 4g1g3g5y1y3 + 4g3g5y1y3 − 16g1g3g5y1y3 + 4g3g5y1y3 2 2 2 2 2 2 2 3 2 2 3 +8g1g2g3g5y1y3 + 16g1g4g5y1y3 − 16g3g4g5y1y3 + 16g2g3g5y1y3 + 16g4g5y1y3 + 4g2g3g5y1y3 3 2 3 2 4 3 4 3 +32g1g2g4g5y1y3 − 64g3g4g5y1y3 + 16g2g4g5y1y3 + 256g4g5y1y3 + 4g1g5y2y3 2 3 2 2 +4g1g3g5y2y3 + 4g1g5y2y3 + 12g1g2g5y1y2y3 − 4g2g3g5y1y2y3 − 32g1g3g4g5y1y2y3 2 2 2 2 2 +12g1g2g5y1y2y3 − 32g2g3g4g5y1y2y3 + 64g1g4g5y1y2y3 3 3 2 3 2 2 2 2 +4g2g5y1y2y3 + 192g2g4g5y1y2y3 − 8g1g2g3g5y2y3 − 4g2g3g5y1y2y3 + 32g1g2g4g5y1y2y3 2 2 2 2 3 3 3 2 2 2 2 2 2 2 2 2 +48g2g4g5y1y2y3 + 4g1g2g5y2y3 + 4g2g5y1y2y3 + 16g1g5y3 − 32g1g2g5y1y3 + 16g2g5y1y3 2 3 4 2 2 2 2 2 2 2 3 −4g1g3g5y1y2 + g2g3y1y2 + 2g1g5y2 + 2g2g3y1y2 + 4g2g5y1y2y3 (C.9)

2 2 3 2 2 2 β4 =g1g3y1y2 + g2g3y1y2 + 4g1g4y1y2 + 4g2g4y1y2 + g1y2 + 2g1g2y1y2 2 2 2 2 +g2y1y2 + 4g3g5y1y3 + 16g4g5y1y3 + 4g1g5y2y3 + 4g2g5y1y2y3. (C.10)