Systems Analysis of Stochastic and Population Balance Models for Chemically Reacting Systems

by

Eric Lynn Haseltine

A dissertation submitted in partial fulfillment

of the requirements for the degree of

DOCTOR OF PHILOSOPHY

(Chemical Engineering)

at the

UNIVERSITY OF WISCONSIN–MADISON

2005

© Copyright by Eric Lynn Haseltine 2005
All Rights Reserved

To Lori and Grace, for their love and support

Systems Analysis of Stochastic and Population Balance Models for Chemically Reacting Systems

Eric Lynn Haseltine

Under the supervision of Professor James B. Rawlings
at the University of Wisconsin–Madison

Chemical reaction models present one method of analyzing complex reaction pathways. Most models of chemical reaction networks employ a traditional, deterministic setting. The shortcomings of this traditional framework, namely difficulty in accounting for population heterogeneity and discrete numbers of reactants, motivate the need for more flexible modeling frameworks such as stochastic and cell population balance models. How to efficiently use models to perform systems-level tasks such as parameter estimation and feedback controller design is important in all frameworks. Consequently, this thesis focuses on three main areas:

1. improving the methods used to simulate and perform systems-level tasks using stochastic models,

2. formulating and applying cell population balance models to better account for experimental data, and

3. applying moving-horizon estimation to improve state estimates for nonlinear reaction systems.

For stochastic models, we have derived and implemented techniques that improve simulation efficiency and perform systems-level tasks using these simulations. For discrete stochastic models, these systems-level tasks rely on approximate, biased sensitivities, whereas continuous models (i.e. stochastic differential equations) permit calculation of unbiased sensitivities. Numerous examples illustrate the efficiency of these methods, including an application to modeling of batch crystallization systems.

We have also investigated using cell population balance models to incorporate both intracellular and extracellular levels of information in viral infections. Given experimental images of the focal infection system for vesicular stomatitis virus, we have applied these models to better understand the dynamics of multiple rounds of virus infection and the interferon (antiviral) host response. The model provides estimates of key parameters and suggests that the experimental technique may cause salient features in the data. We have also proposed an efficient and accurate model decomposition that predicts population-level measurements of intracellular and extracellular species.

Finally, we have assessed the capabilities of several state estimators, including moving-horizon estimation (MHE) and the extended Kalman filter (EKF). When multiple optima arise in the estimation problem, the judicious use of constraints and nonlinear optimization as employed by MHE can lead to better state estimates and closed-loop control performance than the EKF provides. This improvement comes at the price of the computational expense required to solve the MHE optimization.

Acknowledgments

“Whatever you do, work at it with all your heart, as working for the Lord, not for men, since you know that you will receive an inheritance from the Lord as a reward.” –Colossians 3:23-24

I first thank God, creator of heaven and earth, by whose grace I have had the opportunity to complete the work comprising this thesis. I thank my wife Lori, for her love, patience, and support. I would not have had the courage to aim so high without your encouragement. Also, the years in Madison would not have been as special without your presence. I thank my daughter Grace, who has always been able to make me smile during this past year no matter how far away graduation seemed.

I am grateful to my family: my parents, Doug and Lydia, and my brother, David. Without your support and guidance through the years of my life, I would not be where I am today. I also wish to thank my in-laws, Carl and Linda Rutkowski, in particular for supporting my wife these past five years. I thank my extended church family at Mad City Church: the Billers, the Thompsons, the Smiths, the Sells, and the Konkols. In particular, I wish to acknowledge Shane and Karen Biller, who have loved, supported, and prayed for my family as if we were part of their own.

There are many people in the chemical engineering department at the University of Wisconsin whom I must also acknowledge. First, I thank my advisor, Jim Rawlings, for giving me great latitude to exercise my creativity and to study interesting problems. I am always amazed by your ability to identify the important problems in a field. It has been a great honor to work with you and learn from you. I am also grateful to John Yin for first listening to my modeling ideas, then making ways for me to collaborate with his group. I am deeply indebted to Gabriele Pannocchia, who always made time to answer my questions, no matter how trivial. Since imitation is the highest form of flattery, I have tried to be as patient, kind, and understanding to my junior group members as you were to me. I could always count on either reasoning out research problems or taking a break for humor with Aswin Venkat (a.k.a. the British spy). Thank you, Matt Tenny, for your help in the office and the weight room, although perhaps I would have graduated sooner if you had not introduced me to Nethack. Brian Odelson and Daniel Patience always kept me from taking research too seriously, be it rounding everyone up for a game of darts, or getting MJ to drop by for an ice cream break. Thanks also to John Eaton for Octave and Linux support; who would have figured five years ago that I would install Linux on my laptop? It has been a pleasure getting to know Paul Larsen, Murali Rajamani, and especially Ethan Mastny, who listened to almost all of my ideas on stochastic simulation. I also thank former Rawlings group members Jenny Wang, Scott Middlebrooks, and Chris Rao for their help during my first years in the group.

Finally, I have had the great pleasure of getting to know the Yin group over the past year. In particular, I thank Vy Lam for graciously putting up with all of my experimental questions. I am also grateful to Patrick Suthers and Hwijin Kim for their friendship.

ERIC LYNN HASELTINE

University of Wisconsin–Madison
February 2005

Contents

Abstract

Acknowledgments

List of Tables

List of Figures

Chapter 1 Introduction

Chapter 2 Literature Review
2.1 Traditional Deterministic Reaction Models
2.2 Systems Level Tasks for Deterministic Models
2.2.1 Optimal Control
2.2.2 State Estimation
2.2.3 Parameter Estimation
2.2.4 Sensitivities
2.3 Stochastic Reaction Models
2.3.1 Monte Carlo Simulation of the Stochastic Model
2.3.2 Performing Systems Level Tasks with Stochastic Models
2.4 Population Balance Models

Chapter 3 Motivation
3.1 Current Limitations of Stochastic Models
3.1.1 Integration Methods
3.1.2 Systems Level Tasks
3.2 Current Limitations of Traditional Deterministic Models
3.3 Current Limitations of State Estimation Techniques

Chapter 4 Approximations for Stochastic Reaction Models
4.1 Stochastic Partitioning
4.1.1 Slow Reaction Subset
4.1.2 Fast Reaction Subset
4.1.3 The Combined System
4.1.4 The Equilibrium Approximation
4.1.5 The Langevin and Deterministic Approximations
4.2 Numerical Implementation of the Approximations
4.2.1 Simulating the Equilibrium Approximation
4.2.2 Simulating the Langevin and Deterministic Approximations: Exact Next Reaction Time
4.2.3 Simulating the Langevin and Deterministic Approximations: Approximate Next Reaction Time
4.3 Practical Implementation
4.4 Examples
4.4.1 Enzyme Kinetics
4.4.2 Simple Crystallization
4.4.3 Intracellular Viral Infection
4.5 Critical Analysis of the Stochastic Approximations

Chapter 5 Sensitivities for Stochastic Models
5.1 The Chemical Master Equation
5.2 Sensitivities for Stochastic Systems
5.2.1 Approximate Methods for Generating Sensitivities
5.2.2 Deterministic Approximation for the Sensitivity
5.2.3 Finite Difference Sensitivities
5.2.4 Examples
5.3 Parameter Estimation With Approximate Sensitivities
5.3.1 High-Order Rate Example Revisited
5.4 Steady-State Analysis
5.4.1 Lattice-Gas Example
5.5 Conclusions

Chapter 6 Sensitivity Analysis of Discrete Markov Chain Models
6.1 Smoothed Perturbation Analysis
6.1.1 Coin Flip Example
6.1.2 State-Dependent Simulation Example
6.2 Smoothing by Integration
6.3 Sensitivity Calculation for Stochastic Chemical Kinetics
6.4 Conclusions and Future Directions

Chapter 7 Sensitivity Analysis of Stochastic Differential Equation Models
7.1 The Master Equation
7.2 Sensitivity Examples
7.2.1 Simple Reversible Reaction
7.2.2 Oregonator
7.3 Applications of Parametric Sensitivities
7.3.1 Parameter Estimation
7.3.2 Calculating Steady States
7.3.3 Simple Dumbbell Model of a Polymer in Solution
7.4 Conclusions

Chapter 8 Stochastic Simulation of Particulate Systems
8.1 Introduction
8.2 Stochastic Chemical Kinetics Overview
8.2.1 Stochastic Formulation of Isothermal Chemical Kinetics
8.2.2 Extension of the Problem Scope
8.2.3 Interpretation of the Simulation Output
8.3 Crystallization Model Assumptions
8.4 Stochastic Simulation of Batch Crystallization
8.4.1 Isothermal Nucleation and Growth
8.4.2 Nonisothermal Nucleation and Growth
8.4.3 Isothermal Nucleation, Growth, and Agglomeration
8.5 Parameter Estimation With Stochastic Models
8.5.1 Trust-Region Optimization
8.5.2 Finite Difference Sensitivities
8.5.3 Parameter Estimation for Isothermal Nucleation, Growth, and Agglomeration
8.6 Critical Analysis of Stochastic Simulation as a Modeling Tool
8.7 Conclusions

Chapter 9 Population Balance Models for Cellular Systems
9.1 Population Balance Modeling
9.2 Application of the Model to Viral Infections
9.2.1 Intracellular Model
9.2.2 Extracellular Events
9.2.3 Final Model Refinements
9.2.4 Model Solution
9.3 Application to In Vitro and In Vivo Conditions
9.3.1 In Vitro Experiment
9.3.2 In Vivo Initial Infection
9.3.3 In Vivo Drug Therapy
9.4 Future Outlook and Impact

Chapter 10 Modeling Virus Dynamics: Focal Infections
10.1 Experimental System
10.1.1 Modeling the Experiment
10.1.2 Modeling the Measurement
10.1.3 Analyzing and Modeling the Images
10.2 Propagation of VSV on BHK-21 Cells
10.2.1 Development of a Reaction-Diffusion Model
10.2.2 Analysis of the Model Fit
10.3 Propagation of VSV on DBT Cells
10.3.1 Refinement of the Reaction-Diffusion Model
10.3.2 Discussion
10.3.3 Model Prediction: Infection Propagation in the Presence of Interferon Inhibitors
10.4 Conclusions
10.5 Appendix

Chapter 11 Multi-level Dynamics of Viral Infections
11.1 Modeling Framework
11.2 Examples
11.2.1 Initial Infection for a Generic Viral Infection
11.2.2 VSV/DBT Focal Infection
11.2.3 Model Solution
11.3 Conclusions

Chapter 12 Moving-Horizon State Estimation
12.1 Formulation of the Estimation Problem
12.2 Nonlinear Observability
12.3 Extended Kalman Filtering
12.4 Monte Carlo Filters
12.5 Moving-Horizon Estimation
12.6 Example 1
12.6.1 Comparison of Results
12.6.2 Evaluation of Arrival Cost Strategies
12.7 EKF Failure
12.7.1 Chemical Reaction Systems
12.7.2 Example 2
12.7.3 Example 3
12.7.4 Computational Expense
12.8 Conclusions
12.9 Appendix
12.9.1 Derivation of the MHE Smoothing Formulation
12.9.2 Derivation of the MHE Filtering Formulation
12.9.3 Equivalence of the Full Information and Least Squares Formulations
12.9.4 Evolution of a Nonlinear Probability Density

Chapter 13 Closed Loop Performance Using Moving-Horizon Estimation
13.1 Regulator
13.2 Disturbance Models for Nonlinear Models
13.2.1 Plant-model Mismatch: Exothermic CSTR Example
13.2.2 Maximum Yield Example
13.3 Conclusions

Chapter 14 Conclusions

Bibliography

Vita

List of Tables

2.1 Types of cell population models

4.1 Model parameters and reaction extents for the enzyme kinetics example
4.2 Model parameters and reaction extents for the simple crystallization example
4.3 Comparison of time steps for the simple crystallization example
4.4 Model parameters and reaction extents for the intracellular viral infection example
4.5 Simulation time comparison for the intracellular viral infection example

5.1 Parameters for the lattice-gas example

6.1 Parameters for the coin flip example

7.1 Parameter values for the simple reversible reaction
7.2 Parameter values for the Oregonator system of reactions
7.3 Parameters for the simple dumbbell model
7.4 Results for the simple dumbbell model

8.1 Nucleation and growth parameters for an isothermal batch crystallizer
8.2 Nonisothermal nucleation and growth parameters for a batch crystallizer
8.3 Nucleation, growth, and agglomeration parameters for an isothermal, batch crystallizer
8.4 Parameters for the parameter estimation example
8.5 Estimated parameters

9.1 Model parameters for in vitro simulation
9.2 Model parameters for in vivo simulation
9.3 Comparison of actual and fitted parameter values for in vivo simulation of an initial infection
9.4 Additional model parameters for in vivo drug therapy

10.1 Parameters used to describe the experimental conditions
10.2 Parameter estimates for the VSV/BHK-21 focal infection models
10.3 Hessian analysis for the parameter estimates of the original VSV/BHK-21 focal infection model
10.4 Hessian analysis for the parameter estimates of the revised VSV/BHK-21 focal infection model
10.5 Parameter estimates for the VSV/DBT focal infection models
10.6 Hessian analysis for the parameter estimates of the reaction-diffusion VSV/DBT focal infection model
10.7 Hessian analysis for the parameter estimates of the first segregated VSV/DBT focal infection model
10.8 Hessian analysis for the parameter estimates of the second segregated VSV/DBT focal infection model

11.1 Model parameters for the initial infection simulation
11.2 Initial conditions and rate constants for the intracellular reactions of the VSV infection of DBT cells
11.3 Initial conditions and rate constants for the reactions describing the intracellular host antiviral response of the VSV infection of DBT cells
11.4 Extracellular model parameters for the infection of DBT cells by VSV

12.1 Sample size required to ensure that the relative mean square error at zero is less than 0.1
12.2 EKF steady-state behavior, no measurement or state noise
12.3 EKF steady-state behavior, no measurement or state noise
12.4 A priori initial conditions for state estimation
12.5 Effects of a priori initial conditions, constraints, and horizon length on state estimation
12.6 Comparison of MHE and EKF computational expense

13.1 Model steady states for a plant with Tc = 300 K, T = 350 K
13.2 Maximum yield CSTR parameters

List of Figures

2.1 Microscopic volume considered in the equation of continuity for two dimensions
2.2 Optimal control seeks to drive the output to set point
2.3 Parameter estimation seeks to minimize the deviations between the model prediction and the data
2.4 Illustration of the strong law of large numbers given a uniform distribution
2.5 Illustration of the central limit theorem given a uniform distribution

3.1 Computational time per simulation as a function of nAo

3.2 Extent of reaction as a function of nAo
3.3 Finite difference sensitivity for the stochastic model
3.4 Cyclic nature of viral infections

4.1 Comparison of the stochastic-equilibrium simulation to exact stochastic simulation
4.2 Comparison of approximate tau-leap simulation to exact stochastic simulation
4.3 Comparison of approximate stochastic-Langevin simulation to exact stochastic simulation
4.4 Comparison of exact stochastic-deterministic simulation to exact stochastic simulation
4.5 Comparison of approximate stochastic-deterministic simulation to exact stochastic simulation
4.6 Squared error trends for the exact and approximate stochastic-deterministic simulations
4.7 Intracellular viral infections: (a) typical and (b) aborted
4.8 Evolution of the template probability distribution for the (a) exact stochastic and (b) approximate stochastic-deterministic simulations
4.9 Comparisons of the template probability distribution for the exact stochastic and approximate stochastic-deterministic simulations
4.10 Comparison of the template mean and standard deviation for exact stochastic, approximate stochastic-deterministic, and deterministic simulations
4.11 Comparison of the genome mean and standard deviation for exact stochastic, approximate stochastic-deterministic, and deterministic simulations
4.12 Comparison of the structural protein mean and standard deviation for exact stochastic, approximate stochastic-deterministic, and deterministic simulations

5.1 Comparison of the exact, approximate, and central finite difference sensitivities for a second-order reaction
5.2 Comparison of the exact and approximate sensitivities for the high-order rate example
5.3 Relative error of the approximate sensitivity s with respect to the exact sensitivity s as the number of nA,o molecules increases for the high-order rate example
5.4 Comparison of the exact, approximate, and finite difference sensitivity for the high-order rate example
5.5 Comparison of the (a) parameter estimates per Newton-Raphson iteration and (b) model fit at iteration 20 using the approximate and finite difference sensitivities for the high-order rate example
5.6 Results for the lattice-gas model

6.1 Mean E[Sn] as a function of the number of coin flips n
6.2 Mean sensitivity ∂E[Sn]/∂θ as a function of the number of coin flips n
6.3 Comparison of nominal and perturbed path for SPA analysis
6.4 SPA analysis of the discrete decision
6.5 Illustration of the branching nature of the perturbed path for SPA analysis
6.6 Mean E[nk] as a function of the number of decisions k
6.7 Mean sensitivity ∂E[nk]/∂θ as a function of the number of decisions k
6.8 Comparison of the exact and simulated (a) mean and (b) mean integrated sensitivity for the irreversible reaction 2A → B

7.1 Results for the simple reversible reaction re-using the same random numbers
7.2 Results for the simple reversible reaction using different random numbers
7.3 Results for one trajectory of the Oregonator cyclical reactions
7.4 Results for parameter estimation of the simple reversible reaction example
7.5 Results for steady-state analysis of the Oregonator reaction example: estimated state per Newton iteration

8.1 Method for calculating the population balance from stochastic simulation
8.2 Mean of the stochastic solution for an isothermal crystallization with nucleation and growth, 1 simulation, characteristic particle size ∆ = 0.01, system volume V = 1
8.3 Mean of the stochastic solution for an isothermal crystallization with nucleation and growth, average of 100 simulations, characteristic particle size ∆ = 0.01, system volume V = 1
8.4 Average stochastic simulation time based on 10 simulations and V = 1
8.5 Mean of the stochastic solution for an isothermal crystallization with nucleation and growth, average of 100 simulations, characteristic particle size ∆ = 0.1, system volume V = 1
8.6 Deterministic solution by orthogonal collocation for isothermal crystallization with nucleation and growth
8.7 Deterministic solution by orthogonal collocation for isothermal crystallization with nucleation and growth, inclusion of the diffusivity term
8.8 Total and supersaturated monomer profiles for nonisothermal crystallization
8.9 Crystallizer and cooling jacket temperature profiles
8.10 Mean of the exact stochastic solution for nonisothermal crystallization with nucleation and growth
8.11 Mean of the approximate stochastic solution for nonisothermal crystallization with nucleation and growth, propensity of no reaction a0 = 10
8.12 Deterministic solution by orthogonal collocation for nonisothermal crystallization with nucleation and growth, inclusion of the diffusivity term
8.13 Zeroth moment comparisons
8.14 First moment comparisons
8.15 Mean of the stochastic solution for an isothermal crystallization with nucleation, growth, and agglomeration
8.16 Comparison of final model prediction and measurements for the parameter estimation example
8.17 Convergence of parameter estimates as a function of the optimization iteration

9.1 Fit of a structured, unsegregated model to experimental results
9.2 Time evolution of intracellular components and secreted virus for the intracellular model
9.3 Fit of a structured, unsegregated model to experimental results
9.4 Dynamic in vivo response of the cell population balance to initial infection
9.5 Extracellular model fit to dynamic in vivo response of an initial infection
9.6 Dynamic in vivo response to initial treatment with inhibitor drugs I1 and I2
9.7 Effect of drug therapy on in vivo steady states

10.1 Overview of the experimental system
10.2 Measurement model
10.3 Comparison of representative experimental images to model fits
10.4 Comparison of the initial uninfected cell concentration for the original and revised models
10.5 Comparison of representative experimental images to model fits for VSV propagation on DBT cells
10.6 Comparison of intracellular production rates of virus and interferon for the segregated model of VSV propagation on DBT cells
10.7 Comparison of representative experimental images to model predictions for VSV propagation on DBT cells in the presence of interferon inhibitors
10.8 Experimental (averaged) images obtained from the dynamic propagation of VSV on BHK-21 cells
10.9 Experimental (averaged) images obtained from the dynamic propagation of VSV on DBT cells

11.1 (a) Comparison of the full and decoupled model solutions for the initial infection example. (b) Percent error for the decoupled model solution, assuming the full solution is exact
11.2 Schematic of modeled events for the infection of DBT cells by VSV
11.3 Detailed schematic of modeled events for the up-regulation of interferon (IFN) genes
11.4 Comparison of experimental data, simple segregated model fit, and the developed model
11.5 Comparison of total production of virus (VSV) and interferon (IFN) per cell for the simple segregated model and intracellularly-structured, segregated model
11.6 Dynamic measurement of mRNA species for the focal infection system

12.1 Comparison of potential point estimates (mean and mode) for (a) unimodal and (b) bimodal a posteriori distributions
12.2 Example of using the kernel method to estimate the density of samples drawn from a normal distribution
12.3 Example of using a histogram to estimate the density of samples drawn from a normal distribution
12.4 Extended Kalman filter results
12.5 Contours of P(x1|y0, y1)
12.6 Clipped extended Kalman filter results
12.7 Moving-horizon estimation results
12.8 Contours of max_{x0} P(x1, x0|y0, y1)
12.9 A posteriori density P(x1|y0, y1) calculated using a Monte Carlo filter with density estimation
12.10 Contours of P(x4|y0, ..., y4)
12.11 Contours of max_{x1,...,x3} P(x1, ..., x4|y0, ..., y4) with the arrival cost approximated using the smoothing update
12.12 Contours of max_{x1,...,x3} P(x1, ..., x4|y0, ..., y4) with the arrival cost approximated as a uniform prior
12.13 Contours of max_{x1,...,x9} P(x1, ..., x10|y0, ..., y10) with the arrival cost approximated using the smoothing update
12.14 Extended Kalman filter results
12.15 Clipped extended Kalman filter results
12.16 Moving-horizon estimation results
12.17 Extended Kalman filter results
12.18 Clipped extended Kalman filter results
12.19 Moving-horizon estimation results
12.20 Extended Kalman filter results
12.21 Moving-horizon estimation results
12.22 Extended Kalman filter results
12.23 Moving-horizon estimation results
12.24 Clipped extended Kalman filter results
12.25 Moving-horizon estimation results
12.26 Clipped extended Kalman filter results
12.27 Moving-horizon estimation results

13.1 General diagram of closed-loop control for the model-predictive control framework
13.2 Exothermic CSTR diagram
13.3 Steady states for the exothermic CSTR example
13.4 Exothermic CSTR feed disturbance
13.5 Exothermic CSTR results: rejection of a feed disturbance using an output disturbance model
13.6 Exothermic CSTR: comparison of best nonlinear results to linear MPC results
13.7 Maximum yield CSTR
13.8 Maximum yield CSTR steady states
13.9 Maximum yield CSTR: temporary output disturbance
13.10 Maximum yield CSTR results

Chapter 1

Introduction

Chemical reaction models present one method of assimilating and interpreting complex reaction pathways. Usually a deterministic framework is employed to model these networks of chemical reactions. This framework assumes that a system evolves in a continuous, well-prescribed manner. Systems-level tasks seek to extract the maximum amount of utility from these models. Most of these tasks, such as parameter estimation and feedback control, can be posed in terms of optimization problems.

For systems containing small numbers of particles, such as intracellular reaction networks, concentrations are not large enough to justify applying the usual smoothly-varying assumption made in deterministic models. Rather, there are a countably finite number of chemical species in the given system. Stochastic reaction models consider such mesoscopic phenomena in terms of discrete, molecular events that, given a cursory examination, occur in a “random” fashion. These stochastic simulations are merely realizations of a deterministically evolving probability distribution. Here, one must use simulation to reconstruct moments of this distribution due to the tremendous size of the probability space. The basis for these models is well established in the literature, but the methods that govern the exact simulation of these models often become computationally expensive to evaluate and hence have great room for improvement. Additionally, relatively little work has been performed in extending systems-level tasks to handle these sorts of models. Consequently, there exists a need to first formulate reasonable analogs of these traditionally deterministic tasks in a stochastic setting, and then propose methods for efficiently performing these tasks.

One of the simplest, yet most intriguing biological organisms is the virus. The virus contains enough genetic information to replicate itself given the machinery of a living host. So powerful is this strategy that viral infections present one of the most potent threats to human survival and well-being. The Joint United Nations Programme on HIV/AIDS (UNAIDS) estimates that in 2002, 42 million people were living with HIV/AIDS, 5 million people were newly infected with HIV, and 3.1 million people died due to AIDS-related illnesses. The World Health Organization estimates that of the 170 million people currently suffering from hepatitis C, roughly one million will develop cancer of the liver during the next 10 years. In the United States alone, researchers estimate that the 500 million cases of the common cold contracted annually cost $40 billion in health care costs and lost productivity [31]. Hence there is a vital humanitarian and economic interest in systematically understanding how viral infections progress and how this progression can be controlled. Accordingly, researchers have invested significant amounts of time and money towards determining the roles that individual components such as the genome or proteins play in viral infections. As of yet, however, there exists no comprehensive picture that quantitatively incorporates and integrates data on viral infections from multiple levels. Again, models offer one manner of consolidating the vast amount of information contained across these levels, and systems-level tasks provide one method of conveniently extracting information.

This dissertation considers the role of deterministic and stochastic models in assimilating dynamic data.
The primary focus is on maximizing the information available from these models as well as applying such models to experimental systems. The remainder of this thesis is organized as follows:

• Chapter 2 reviews literature pertaining to simulation of deterministic and stochastic chemical reaction models and methods for extracting information from these simulations, such as parameter estimation and state estimation. Here, we introduce the sensitivity as a useful quantity for performing systems-level tasks.

• Chapter 3 provides motivation for solving the problems addressed in this thesis.

• Chapters 4 through 7 examine stochastic simulation with an emphasis on stochastic chemical kinetics. We present this material in the following order:

– In Chapter 4, we derive approximations for stochastic chemical kinetics for systems with coupled fast and slow reactions. These approximations lead to simulation strategies that result in drastic reductions of computational expense when compared to exact simulation methods.

– Chapter 5 considers biased approximations for calculating mean sensitivities from simulation for the stochastic chemical kinetics problem, and then applies these sensitivities to calculate steady states and estimate parameters.

– Chapter 6 explains how the discrete nature of the stochastic chemical kinetics formulation makes obtaining unbiased estimates of mean sensitivities difficult, then explores several techniques for calculating these unbiased estimates.

– Chapter 7 considers unbiased estimates for sensitivities of simulations governed by stochastic differential equations. Here, we simply differentiate the continuous sample paths to obtain the desired sensitivities, then use the sensitivities to perform useful tasks.

– Chapter 8 applies some of the stochastic simulation methods developed in previous chapters to solve the batch crystallization population balance. The flexibility of the simulation allows the modeler to focus on modeling the experimental system rather than the numerical methods required to solve the resulting models.

• Chapters 9 through 11 address population balance models for viral infections. We consider the following issues:

– Chapter 9 derives a population balance model incorporating information from both the intracellular and extracellular levels of description. To explore the utility of this model, we compare numerical results from this model to other simpler models for experimentally relevant conditions.

– Chapter 10 considers modeling of experimental data from the focal infection system. This experimental system provides dynamic image data for multiple rounds of virus infection and antiviral host response. Here, we place an emphasis on determining the minimal level of modeling complexity necessary to adequately describe the experimental data.

– Chapter 11 proposes a decomposition technique for solving population balance models when flow of information is restricted from the extracellular to intracellular level. The goal is to efficiently and accurately solve population balance models while reconstructing population-level dynamics for intracellular and extracellular species.

• Chapters 12 and 13 consider one specific systems-level task, namely state estimation. These chapters focus on the probabilistic formulation of the state estimation problem, in which the goal is to calculate the state estimate that maximizes the a posteriori distribution (the probability of the current state conditioned on all available experimental measurements). We examine the following topics:

– Chapter 12 outlines conditions for generating multiple modes in the a posteriori distribution for some relevant chemically reacting systems. We then construct examples exhibiting such conditions, and compare how several state estimators, namely the extended Kalman filter, moving-horizon estimator, and Monte Carlo filters, perform for these examples.

– Chapter 13 examines how multiple modes in the a posteriori distribution can affect the performance of closed-loop feedback control for different estimators.

• Finally, Chapter 14 presents conclusions, outlines major accomplishments, and discusses potential areas of future work.

Chapter 2

Literature Review

Models for chemical reaction networks usually arise in a traditional, deterministic setting. Given a deterministic model, we can consider performing various systems level tasks such as parameter estimation and control. We can generally pose these tasks in terms of an optimization. In this context, a quantity known as the sensitivity becomes useful for efficient solution of the optimization. The shortcomings of the traditional deterministic framework motivate the need for alternatives that provide a more flexible foundation for chemical reaction modeling. Two such alternatives are stochastic and population balance models. This chapter presents a brief review of the modeling literature for both of these subjects and the traditional models.

2.1 Traditional Deterministic Reaction Models

In a deterministic setting, we perform mass balances for the reactants and products of interest using the equation of continuity. Here we define the mass of these species as a function of time (t) and the internal (y) and external (x) characteristics of the system:

$$\eta(t, z)\, dz = \text{mass of reactants or products} \tag{2.1}$$

$$z = \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \text{external characteristics} \\ \text{internal characteristics} \end{bmatrix} \tag{2.2}$$

We now consider an arbitrary, time-varying control volume V (t) spanning a space in z. This volume has a time-varying surface S(t). The normal vector ns points from the surface away from the volume, and the vector vs specifies the velocity of the surface. The vector vz specifies the velocity of material flowing through the volume. Figure 2.1 depicts a low-dimensional representation of this volume. Assuming that V (t) contains a statistically significant amount of mass, the conservation equation for the species contained in V (t) is

$$\underbrace{\frac{d}{dt}\int_{V(t)} \eta(t,z)\, dz}_{\text{accumulation}} = \underbrace{\int_{V(t)} R_\eta\, dz}_{\text{generation}} - \underbrace{\int_{S(t)} \mathbf{F} \cdot \mathbf{n}_s\, d\Omega}_{\text{convective + diffusive flux}} + \underbrace{\int_{S(t)} \eta(t,z)(\mathbf{v}_s \cdot \mathbf{n}_s)\, d\Omega}_{\text{flux due to surface motion}} \tag{2.3}$$


Figure 2.1: Microscopic volume considered in the equation of continuity for two dimensions.

in which Rη refers to the production rate of the species η, F is the total flux, and dΩ is the differential change in the surface. Making use of the Leibniz formula permits differentiating the volume integral

$$\frac{d}{dt}\int_{V(t)} \eta(t,z)\, dz = \int_{V(t)} \frac{\partial \eta(t,z)}{\partial t}\, dz + \int_{S(t)} \eta(t,z)(\mathbf{v}_s \cdot \mathbf{n}_s)\, d\Omega \tag{2.4}$$

Substituting equation (2.4) into equation (2.3) yields

$$\int_{V(t)} \frac{\partial \eta(t,z)}{\partial t}\, dz = \int_{V(t)} R_\eta\, dz - \int_{S(t)} \mathbf{F} \cdot \mathbf{n}_s\, d\Omega \tag{2.5}$$

Now apply the divergence theorem to the surface integral to obtain

$$\int_{V(t)} \frac{\partial \eta(t,z)}{\partial t}\, dz = \int_{V(t)} R_\eta\, dz - \int_{V(t)} \nabla \cdot \mathbf{F}\, dz \tag{2.6}$$

Combining all terms into the same integral yields

$$\int_{V(t)} \left[ \frac{\partial \eta(t,z)}{\partial t} + \nabla \cdot \mathbf{F} - R_\eta \right] dz = 0 \tag{2.7}$$

Since the element V (t) is arbitrary, the argument of the integral must be zero; this result yields the microscopic equation of continuity:

$$\frac{\partial \eta(t,z)}{\partial t} + \nabla \cdot \mathbf{F} = R_\eta \tag{2.8}$$

Equation (2.8) is the most general form of our proposed model. Both Bird, Stewart, and Lightfoot [11] and Deen [24] derive this equation without consideration of internal characteristics. We consider a time-varying control element, so our derivation is more akin to that of Deen [24]. Traditionally, one assumes that there are no internal characteristics of interest. Equation (2.8) then further reduces to:

$$\frac{\partial \eta(t,x)}{\partial t} + \nabla \cdot \mathbf{F} = R_\eta \tag{2.9}$$

Additionally, we can write the total flux F as the sum of convective and diffusive fluxes

$$\mathbf{F} = \eta(t,x)\mathbf{v}_x + \mathbf{f} \tag{2.10}$$

We now assume that the reactor is well-stirred so that neither η nor Rη depends on the external coordinates x. This assumption implies that there is no diffusive flux, i.e. f = 0, which yields

$$\frac{\partial \eta(t)}{\partial t} + \nabla \cdot (\eta(t)\mathbf{v}_x) = R_\eta \tag{2.11}$$

Next, we integrate over the time-varying reactor volume Ve:

$$\int_{V_e} \left[ \frac{\partial \eta(t)}{\partial t} + \nabla \cdot (\eta(t)\mathbf{v}_x) \right] dx = \int_{V_e} R_\eta\, dx \tag{2.12}$$

$$\int_{V_e} \frac{\partial \eta(t)}{\partial t}\, dx + \int_{V_e} \nabla \cdot (\eta(t)\mathbf{v}_x)\, dx = \int_{V_e} R_\eta\, dx \tag{2.13}$$

$$V_e \frac{d\eta}{dt} + \int_{V_e} \nabla \cdot (\eta\mathbf{v}_x)\, dx = R_\eta V_e \tag{2.14}$$

in which we have dropped the time dependence of η for notational convenience. Applying the divergence theorem to change the volume integral to a surface integral yields

$$V_e \frac{d\eta}{dt} + \int_{S_e} \mathbf{n}_e \cdot (\eta\mathbf{v}_x)\, d\Omega_e = R_\eta V_e \tag{2.15}$$

in which

• Se is the time-varying surface of the reactor volume Ve,

• dΩe is the differential change in this surface, and

• ne is the normal vector with respect to the surface pointing away from the reactor volume.

Clearly η does not change within the reactor volume. However, changes to the surface as well as influx and outflow of material across the reactor boundary affect η as follows

$$\int_{S_e} \mathbf{n}_e \cdot (\eta\mathbf{v}_x)\, d\Omega_e = \underbrace{\int_{S_{e,1}} \mathbf{n}_e \cdot (\eta\mathbf{v}_x)\, d\Omega_{e,1}}_{\text{flow across the reactor surface}} + \underbrace{\int_{S_{e,2}} \mathbf{n}_e \cdot (\eta\mathbf{v}_x)\, d\Omega_{e,2}}_{\text{surface expansion due to reactor volume changes}} \tag{2.16}$$

$$= q\eta - q_f\eta_f + \eta\frac{dV_e}{dt} \tag{2.17}$$

in which q and qf are respectively the effluent and feed volumetric flow rates, and ηf is the concentration of η in the feed. The resulting conservation equation is

$$V_e\frac{d\eta}{dt} - q_f\eta_f + q\eta + \eta\frac{dV_e}{dt} = R_\eta V_e \tag{2.18}$$

$$\frac{d(\eta V_e)}{dt} = q_f\eta_f - q\eta + R_\eta V_e \tag{2.19}$$

Equation (2.19) is commonly associated with continuous stirred tank reactors (CSTRs). Alternatively, we could have derived the plug flow reactor (PFR) design equation by starting with equation (2.9) and assuming that the reactor is well mixed in only two external dimensions.
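To make equation (2.19) concrete, the following minimal sketch integrates the constant-volume case for a single species consumed by a first-order reaction. The rate law and all parameter values are illustrative assumptions, not taken from the text.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Minimal sketch of equation (2.19) for a constant-volume CSTR with a
# single species consumed by first-order reaction (R_eta = -k*eta).
# All parameter values here are illustrative assumptions.
k = 0.5      # 1/min, first-order rate constant
q = 1.0      # L/min, feed and effluent flow rate (constant volume)
Ve = 10.0    # L, reactor volume
eta_f = 2.0  # mol/L, feed concentration

def cstr_rhs(t, eta):
    # d(eta)/dt = (q_f*eta_f - q*eta)/Ve + R_eta, with q_f = q and dVe/dt = 0
    return q * (eta_f - eta) / Ve - k * eta

sol = solve_ivp(cstr_rhs, (0.0, 50.0), [0.0])
print("steady state ~", sol.y[0, -1])  # approaches q*eta_f/(q + k*Ve)
```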

2.2 Systems Level Tasks for Deterministic Models

Performing systems level tasks such as parameter estimation, model based feedback control, and process and product design requires a different set of tools than those required for pure simulation. Many systems level tasks are conveniently posed as optimization problems. We briefly review several of these tasks, namely optimal control, state estimation, and parameter estimation, and introduce the sensitivity as a useful quantity for performing these tasks.

2.2.1 Optimal Control

Optimal control consists of minimizing an objective of the form

$$\min_{u_0,\dots,u_N} \Phi = \sum_{k=0}^{N} (y_k - y_{sp})^T Q (y_k - y_{sp}) + (u_k - u_{sp})^T R (u_k - u_{sp}) + (\Delta u_k)^T S\,\Delta u_k \tag{2.20a}$$

$$\text{s.t.} \quad x_{k+1} = F(x_k, u_k) \tag{2.20b}$$

$$y_k = h(x_k) \tag{2.20c}$$

$$\Delta u_k = u_k - u_{k-1}, \qquad d(x_k) \ge 0, \qquad g(u_k) \ge 0 \tag{2.20d}$$

in which

• yk is the measurement at time tk;

• uk is the input at time tk;

• xk is the state at time tk;

• F (xk, uk) is the solution to a first-principles model (e.g. equation (2.19)) over the time interval [tk, tk+1);

• ysp and usp are the measurement and input, respectively, at the desired set point;

• Q and R are matrices that penalize deviations of the measurement and input from set point; and

• S is a matrix that penalizes changes in the input.

In general, the optimal control problem considers an infinite number of decisions, i.e. the control horizon N is infinite. As shown in Figure 2.2, the goal of optimization (2.20) is to drive the measurements to their set points. Most control applications involve discrete time samples, so we have formulated the model, equation (2.20b), in discrete time also.


Figure 2.2: Optimal control seeks to drive the output to set point by minimizing deviations of both the output y and the input u from their respective set points.

There is a wealth of control literature that examines the properties of equation (2.20). For example, this formulation does not even guarantee that the controller will drive the outputs to set point. Rather, one must include additional conditions such as enforcing a terminal constraint on each optimization (i.e. yN = ysp) or adding a terminal penalty to the final measurement yN that quantifies the cost-to-go for an infinite horizon. We refer the interested reader to the literature for additional information on this subject [119, 118, 90].
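A minimal sketch of a finite-horizon instance of optimization (2.20) follows, using a scalar linear model and a general-purpose optimizer. The system matrices, weights, and horizon are illustrative assumptions; a practical controller would exploit the problem's quadratic structure rather than call a generic solver.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of the finite-horizon problem (2.20) for a scalar linear system
# x_{k+1} = a*x_k + b*u_k, y_k = x_k. Model and weights are illustrative.
a, b = 0.9, 0.5
N = 20
y_sp = 1.0
u_sp = (1 - a) / b * y_sp   # steady-state input that holds y at set point
Q, R, S = 1.0, 0.1, 0.01

def objective(u, x0=0.0):
    phi, x, u_prev = 0.0, x0, 0.0
    for uk in u:
        y = x
        du = uk - u_prev
        phi += Q * (y - y_sp)**2 + R * (uk - u_sp)**2 + S * du**2
        x = a * x + b * uk      # advance the model one sample, eq. (2.20b)
        u_prev = uk
    return phi

res = minimize(objective, np.zeros(N))
print("first input move:", res.x[0])
```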

2.2.2 State Estimation

State estimation poses the problem: given a time course of experimental measurements and a dynamic model of the system, what is the most likely state of the system? This problem is usually formulated probabilistically, that is, we would like to calculate

$$\hat{x}_{k|k} = \arg\max_{x_k} P(x_k | y_0, \dots, y_k) \tag{2.21}$$

in which xk is the state at time tk, yk is the measurement at time tk, and x̂k|k is the a posteriori state estimate of x at time tk given all measurements up to time tk. The nature of the estimator depends greatly on the choice of dynamic model. For linear, unconstrained systems with additive Gaussian noise, the Kalman filter [144] provides a closed-form solution to equation (2.21). For constrained or nonlinear systems, solution of this equation may or may not be tractable. One computationally attractive method for addressing the nonlinear system is the extended Kalman filter, which first linearizes the nonlinear system, then applies the Kalman filter update equations to the linearized system [144]. This technique assumes that the a posteriori distribution is normally distributed (unimodal). Examples of implementations include estimation for the production of silicon/germanium alloy films [93], polymerization reactions [103], and fermentation processes [55]. However, the extended Kalman filter, or EKF, is at best an ad hoc solution to a difficult problem, and hence there exist many barriers to the practical implementation of EKFs (see, for example, Wilson et al. [163]).

2.2.3 Parameter Estimation

Parameter estimation seeks to reconcile model predictions with experimental data, as shown in Figure 2.3.


Figure 2.3: Parameter estimation seeks to minimize the deviations between the model predic- tion (solid line) and the data (points).

In particular, we would like to maximize the probability of the mean parameter set θ given the measurements yk's:

$$\max_\theta P_{\Theta|Y_0,\dots,Y_N}(\theta | y_0, \dots, y_N) \tag{2.22}$$

in which θ and yk are realizations of the random variables Θ and Yk, respectively. For convenience, we drop the subscript denoting the random variable unless required for clarity. We assume that the measurements yk's are generated from an underlying deterministic model

$$x_{k+1} = F(x_k; \theta) \tag{2.23}$$

$$y_k = h(x_k) + v_k \tag{2.24}$$

$$v_k \sim N(0, \Pi) \quad \forall k = 0, \dots, N \tag{2.25}$$

in which

• the state variables xk’s are simply convenient functions of the parameters θ and

• the variables vk's are realizations of the normally distributed random variable ξ ∼ N(0, Π), in which the notation N(0, Π) refers to a normally distributed random variable with mean 0 and covariance Π.

Using Bayes’ Theorem to manipulate the joint distribution P (θ|y0,..., yN ) yields

$$P(\theta | y_0, \dots, y_N) \underbrace{P(y_0, \dots, y_N)}_{\text{constant}} = P(y_0, \dots, y_N | \theta)\, P(\theta) \tag{2.26}$$

$$P(\theta | y_0, \dots, y_N) \propto P(y_0, \dots, y_N | \theta)\, P(\theta) \tag{2.27}$$

In general, P(θ) is assumed to be a noninformative prior so as not to unduly influence the estimate of the parameters. For the chosen disturbances (i.e. normally distributed), Box and Tiao show that the noninformative prior is the distribution P(θ) = constant [14]. We derive the distribution P(y0, ..., yN, θ) from the known distribution P(v0, ..., vN, θ) in the manner described by Ross [130]. This derivation requires use of the inverse function theorem from calculus [132]. First define the function mapping (v0, ..., vN, θ) onto (y0, ..., yN, θ) as

$$f(v_0, \dots, v_N, \theta) = \begin{bmatrix} h(x_0(\theta)) + v_0 \\ \vdots \\ h(x_N(\theta)) + v_N \\ \theta \end{bmatrix} \tag{2.28}$$

We require that

1. f(v0, ..., vN, θ) can be uniquely solved for v0, ..., vN and θ in terms of y0, ..., yN and θ. This condition is trivially true because

$$v_k = y_k - h(x_k(\theta)) \quad \forall k = 0, \dots, N \tag{2.29a}$$

$$\theta = \theta \tag{2.29b}$$

2. f(v0, ..., vN, θ) has continuous partial derivatives at all points and the determinant of its Jacobian is nonzero. The Jacobian J of equation (2.28) is

$$J = \frac{\partial f(v_0, \dots, v_N, \theta)}{\partial z^T} = \begin{bmatrix} I & & & \frac{\partial h(x_0(\theta))}{\partial x_0^T}\frac{\partial x_0}{\partial \theta^T} \\ & \ddots & & \vdots \\ & & I & \frac{\partial h(x_N(\theta))}{\partial x_N^T}\frac{\partial x_N}{\partial \theta^T} \\ & & & I \end{bmatrix} \tag{2.30}$$

$$z^T = \begin{bmatrix} v_0^T & \cdots & v_N^T & \theta^T \end{bmatrix} \tag{2.31}$$

If h(xk) and xk are at least once continuously differentiable for all k = 0, ..., N, then the Jacobian has continuous partial derivatives. Also, J is a block upper-triangular matrix with ones on the diagonal, so its determinant is one (nonzero).

Since these conditions hold, we can calculate the distribution P (y0,..., yN , θ) via

$$P(y_0, \dots, y_N, \theta) = \det(J)^{-1}\, P(v_0, \dots, v_N, \theta) \tag{2.32}$$

$$= \left( \prod_{k=0}^{N} P_\xi(v_k) \right) P(\theta) \tag{2.33}$$

Then the desired conditional is

$$P(y_0, \dots, y_N | \theta) = \frac{P(y_0, \dots, y_N, \theta)}{P(\theta)} \tag{2.34}$$

$$= \prod_{k=0}^{N} P_\xi(v_k) \tag{2.35}$$

$$= \prod_{k=0}^{N} P_\xi\big(y_k - h(x_k(\theta))\big) \tag{2.36}$$

We derive the desired optimization problem next:

$$\max_\theta P(\theta | y_0, \dots, y_N) \propto \max_\theta \prod_{k=0}^{N} P_\xi(v_k) \tag{2.37}$$

$$= \max_\theta \log\left( \prod_{k=0}^{N} P_\xi(v_k) \right) \tag{2.38}$$

$$= \max_\theta \sum_{k=0}^{N} \log P_\xi\big(y_k - h(x_k(\theta))\big) \tag{2.39}$$

$$\propto \min_\theta \frac{1}{2}\sum_{k=0}^{N} (y_k - h(x_k))^T \Pi^{-1} (y_k - h(x_k)) \tag{2.40}$$

Therefore, this problem is equivalent to the optimization

$$\min_\theta \Phi = \frac{1}{2}\sum_{k=0}^{N} e_k^T \Pi^{-1} e_k \tag{2.41a}$$

$$e_k = y_k - h(x_k) \tag{2.41b}$$

$$x_{k+1} = F(x_k; \theta) \tag{2.41c}$$

We refer the reader to Box and Tiao [14] and Stewart, Caracotsios, and Sørensen [145] for a more detailed account of estimating parameters from data. Their discussion includes, for example, calculation of confidence intervals for estimated parameters.
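The following minimal sketch solves the weighted least-squares problem (2.41) for a toy model. The first-order decay model, noise level, and synthetic data are illustrative assumptions; only the problem structure (residuals weighted by the noise covariance, model recursion x_{k+1} = F(x_k; θ)) follows the derivation above.

```python
import numpy as np
from scipy.optimize import least_squares

# Sketch of optimization (2.41): weighted least squares over a
# discrete-time model x_{k+1} = F(x_k; theta). The decay model and
# synthetic data below are illustrative assumptions.
rng = np.random.default_rng(0)
N, dt, theta_true = 30, 0.1, 0.7
Pi = 0.05**2                               # measurement noise variance

def simulate(theta, x0=1.0):
    x = np.empty(N + 1); x[0] = x0
    for k in range(N):
        x[k + 1] = x[k] - dt * theta * x[k]   # F(x_k; theta)
    return x

y = simulate(theta_true) + rng.normal(0.0, np.sqrt(Pi), N + 1)

def residuals(theta):
    # e_k / sqrt(Pi), so the summed squares give (1/2) sum e_k' Pi^{-1} e_k
    return (y - simulate(theta[0])) / np.sqrt(Pi)

fit = least_squares(residuals, x0=[0.3])
print("estimated theta:", fit.x[0])
```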

2.2.4 Sensitivities

We define the sensitivity s as

$$s = \frac{\partial x}{\partial \theta^T} \tag{2.42}$$

in which x is the state of the system and θ is a vector containing the parameters of interest for the system. This quantity is useful for efficiently performing optimization. In particular, sensitivities provide precise first-order information about the solution of the system, and this first-order information is manipulated to calculate gradients and Hessians that guide the nonlinear optimization routines. For example, consider the nonlinear optimization for parameter estimation, equation (2.41). A strict local solution to this optimization is obtained when the gradient is zero and the Hessian is positive definite. Calculating these quantities yields

$$\nabla_\theta \Phi = \frac{\partial}{\partial \theta^T}\, \frac{1}{2}\sum_k e_k^T \Pi^{-1} e_k \tag{2.43}$$

$$= -\sum_k \left( \frac{\partial h(x_k)}{\partial x_k^T}\frac{\partial x_k}{\partial \theta^T} \right)^T \Pi^{-1} e_k \tag{2.44}$$

$$= -\sum_k \left( \frac{\partial h(x_k)}{\partial x_k^T} s_k \right)^T \Pi^{-1} e_k \tag{2.45}$$

$$\nabla_{\theta\theta} \Phi = \frac{\partial}{\partial \theta^T}\, \nabla_\theta \Phi \tag{2.46}$$

$$= \frac{\partial}{\partial \theta^T}\left( -\sum_k \left( \frac{\partial h(x_k)}{\partial x_k^T} s_k \right)^T \Pi^{-1} e_k \right) \tag{2.47}$$

$$= \sum_k \left( \frac{\partial h(x_k)}{\partial x_k^T} s_k \right)^T \Pi^{-1} \frac{\partial h(x_k)}{\partial x_k^T} s_k - \left( \frac{\partial h(x_k)}{\partial x_k^T}\frac{\partial^2 x_k}{\partial \theta\, \partial \theta^T} \right)^T \Pi^{-1} e_k \tag{2.48}$$

The sensitivity s clearly arises in calculation of both of these quantities. Next, we consider calculation of sensitivities for ordinary differential equations (ODE’s) and differential algebraic equations (DAE’s). This analysis basically summarizes the excellent work presented by Caracotsios et al. [17].

ODE Sensitivities

ODE systems may be written in the following form:

$$\frac{dx}{dt} = f(x, \theta) \tag{2.49a}$$

$$x(0) = x_0 \tag{2.49b}$$

Accordingly, we can obtain an expression for the evolution of the sensitivity by differentiating equation (2.49a) with respect to the parameters θ:

$$\frac{\partial}{\partial \theta^T}\frac{dx}{dt} = \frac{\partial}{\partial \theta^T} f(x, \theta) \tag{2.50}$$

$$\frac{d}{dt}\left( \frac{\partial x}{\partial \theta^T} \right) = \frac{\partial f(x, \theta)}{\partial x^T}\frac{\partial x}{\partial \theta^T} + \frac{\partial f(x, \theta)}{\partial \theta^T}\frac{\partial \theta}{\partial \theta^T} \tag{2.51}$$

$$\frac{ds}{dt} = \frac{\partial f(x, \theta)}{\partial x^T} s + \frac{\partial f(x, \theta)}{\partial \theta^T} \tag{2.52}$$

This analysis demonstrates that the evolution equation for the sensitivity is the following ODE system:

$$\frac{ds}{dt} = \frac{\partial f(x, \theta)}{\partial x^T} s + \frac{\partial f(x, \theta)}{\partial \theta^T} \tag{2.53a}$$

$$s_{i,j}(0) = \begin{cases} 1 & \text{if } x_{0,i} = \theta_j \\ 0 & \text{otherwise} \end{cases} \tag{2.53b}$$

Equation (2.53) demonstrates two distinctive features about the evolution equation for the sensitivity:

1. it is linear with respect to s, and

2. it depends only on the current values of s and x.

Therefore, we can solve for s by merely integrating equation (2.53) along with the ODE system (2.49), as sketched below.
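The sketch below augments a scalar ODE with its sensitivity equation (2.53) and integrates both simultaneously. The model dx/dt = −θx is an illustrative assumption chosen because its sensitivity has the closed form s = −t x0 e^{−θt}, which lets the numerical result be checked.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Integrate the ODE (2.49) together with its sensitivity equation (2.53)
# for the scalar model dx/dt = -theta*x (illustrative assumption).
theta, x0 = 0.5, 2.0

def augmented_rhs(t, z):
    x, s = z
    dfdx, dfdtheta = -theta, -x        # partials of f(x, theta) = -theta*x
    return [-theta * x,                # equation (2.49a)
            dfdx * s + dfdtheta]       # equation (2.53a)

# s(0) = 0 because x0 is not one of the estimated parameters, per (2.53b).
sol = solve_ivp(augmented_rhs, (0.0, 5.0), [x0, 0.0], rtol=1e-8)
t_end = sol.t[-1]
print("s(t_end) numeric:", sol.y[1, -1])
print("s(t_end) exact:  ", -t_end * x0 * np.exp(-theta * t_end))
```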

DAE Sensitivities

DAE systems take the following general form:

$$0 = g(\dot{x}, x, \theta) \tag{2.54a}$$

$$x(0) = x_0 \tag{2.54b}$$

$$\dot{x}(0) = \dot{x}_0 \tag{2.54c}$$

where x is the state of the system, ẋ is the first derivative of x, and θ is a vector containing the parameters of interest for the system. Again, we define the sensitivity s by equation (2.42) and differentiate equation (2.54a) with respect to θ to determine an expression for the evolution of the sensitivity:

$$0 = \frac{\partial}{\partial \theta^T} g(\dot{x}, x, \theta) \tag{2.55}$$

$$0 = \frac{\partial g(\dot{x}, x, \theta)}{\partial \dot{x}^T}\frac{\partial \dot{x}}{\partial \theta^T} + \frac{\partial g(\dot{x}, x, \theta)}{\partial x^T}\frac{\partial x}{\partial \theta^T} + \frac{\partial g(\dot{x}, x, \theta)}{\partial \theta^T}\frac{\partial \theta}{\partial \theta^T} \tag{2.56}$$

$$0 = \frac{\partial g(\dot{x}, x, \theta)}{\partial \dot{x}^T}\frac{d}{dt}\left( \frac{\partial x}{\partial \theta^T} \right) + \frac{\partial g(\dot{x}, x, \theta)}{\partial x^T}\frac{\partial x}{\partial \theta^T} + \frac{\partial g(\dot{x}, x, \theta)}{\partial \theta^T} \tag{2.57}$$

$$0 = \frac{\partial g(\dot{x}, x, \theta)}{\partial \dot{x}^T}\frac{ds}{dt} + \frac{\partial g(\dot{x}, x, \theta)}{\partial x^T} s + \frac{\partial g(\dot{x}, x, \theta)}{\partial \theta^T} \tag{2.58}$$

This analysis demonstrates that the evolution equation for the sensitivity of a DAE system yields a linear DAE system:

$$0 = \frac{\partial g(\dot{x}, x, \theta)}{\partial \dot{x}^T}\dot{s} + \frac{\partial g(\dot{x}, x, \theta)}{\partial x^T} s + \frac{\partial g(\dot{x}, x, \theta)}{\partial \theta^T} \tag{2.59a}$$

$$s_{i,j}(0) = \begin{cases} 1 & \text{if } x_{0,i} = \theta_j \\ 0 & \text{otherwise} \end{cases} \tag{2.59b}$$

$$\dot{s}(0) = \dot{s}_0 \tag{2.59c}$$

As is the case for the original DAE system (2.54), we must pick a consistent initial condition (i.e. s0 and ṡ0 must satisfy equation (2.59a)). Again, we find that we can solve for the sensitivities of the system by merely integrating equation (2.59) along with the original DAE system (2.54).

2.3 Stochastic Reaction Models

When dealing with systems containing a countably finite number of molecules, deterministic models make the unrealistic assumptions that

1. mesoscopic phenomena can be treated as continuous events; and

2. identical systems given identical perturbations behave precisely the same.

For example, most models of intracellular kinetics inherently examine a small number of molecules contained within a single cell (the finite number of chromosomes in the nucleus, for example), making the first assumption invalid. Additionally, identical systems given identical perturbations may elicit completely different responses. Stochastic models of chemical kinetics make no such assumptions, and hence offer one alternative to traditional deterministic models. These models have recently received an increased amount of attention from the modeling community (see, for example, [3, 91, 79]).

Stochastic models of chemical kinetics postulate a deterministic evolution equation for the probability of being in a state rather than the state itself, as is the case in the usual deterministic models. Gillespie outlines the derivation of the evolution equation for this probability distribution in depth [48]. The basis of this derivation depends on the “fundamental hypothesis” of the stochastic formulation of chemical kinetics, which defines the reaction parameter cµ characterizing reaction µ as:

cµdt = average probability, to first order in dt, that a particular combination of µ reactant molecules will react accordingly in the next time interval dt.

We also define

• hµ as the number of distinct molecular reactant combinations for reaction µ at a given time, and

• aµ(n)dt = hµcµ dt as the probability, first order in dt, that a µ reaction will occur in the next time interval dt.

Given this “fundamental hypothesis”, the governing equation for this system is the chemical master equation

$$\frac{dP(n, t)}{dt} = \sum_{k=1}^{m} a_k(n - \nu_k)P(n - \nu_k, t) - a_k(n)P(n, t) \tag{2.60}$$

in which

• n is the state of the system in terms of number of molecules (a p-vector),

• P (n, t) is the probability that the system is in state n at time t,

• ak(n)dt is the probability to order dt that reaction k occurs in the time interval [t, t + dt), and

• νk is the kth column of the stoichiometric matrix ν (a p × m matrix).

Here, we assume that the initial condition P(n, t0) is known. The solution of equation (2.60) is computationally intractable for all but the simplest systems. Rather, Monte Carlo methods are employed to reconstruct the probability distribution and its statistics (usually the mean and variance). We consider such methods subsequently.
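For a state space small enough to enumerate, equation (2.60) is simply a linear ODE in P and can be solved directly. The sketch below does this for the irreversible reaction A → B, an illustrative assumption chosen so the result can be checked against the known mean n0 e^{−ct}; realistic networks have state spaces far too large for this approach, which is the point of the Monte Carlo methods discussed next.

```python
import numpy as np
from scipy.linalg import expm

# Direct solution of the master equation (2.60) for A -> B with rate
# constant c and n0 initial A molecules (illustrative assumptions).
# States are indexed by the number of A molecules remaining.
c, n0, t = 0.3, 10, 2.0
A = np.zeros((n0 + 1, n0 + 1))
for n in range(n0 + 1):
    a = c * n                  # propensity of A -> B in state n
    A[n, n] -= a               # probability flow out of state n
    if n > 0:
        A[n - 1, n] += a       # flow into state n-1
P0 = np.zeros(n0 + 1); P0[n0] = 1.0    # start in state n0 with certainty
P = expm(A * t) @ P0                   # P(n, t), the full distribution
print("mean number of A molecules:", P @ np.arange(n0 + 1))
print("exact mean for this system:", n0 * np.exp(-c * t))
```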

2.3.1 Monte Carlo Simulation of the Stochastic Model

Monte Carlo methods take advantage of the fact that any statistic can be written in terms of a large sample limit of observations, i.e.

$$\overline{h(n)} \triangleq \int h(n)P(n,t)\, dn = \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}h(n^i) \approx \frac{1}{N}\sum_{i=1}^{N}h(n^i) \quad \text{for } N \text{ sufficiently large} \tag{2.61}$$

in which n^i is the ith Monte Carlo reconstruction of the state n. Accordingly, the desired statistic can be reconstructed to sufficient accuracy given a large enough number of observations. This statement follows as a direct result of the strong law of large numbers, which we state next.

Figure 2.4: Illustration of the strong law of large numbers given a uniform distribution over the interval [0, 1]. As the number of samples increases, the sample mean converges to the true mean of 0.5.

Theorem 2.1 (Strong Law of Large Numbers [130].) Let X1, X2, ..., Xn be a sequence of independent and identically distributed random variables, each having finite mean E[Xi] = m. Then, with probability 1,

$$\lim_{n\to\infty}\frac{X_1 + \dots + X_n}{n} = m \tag{2.62}$$

Proof: See Ross for details of the proof [130]. □

In this case, reconstructions of the desired statistic, i.e. h(n^i), are independent and identically distributed variables according to the common density function given by the chemical master equation (2.60). Therefore, sampling sufficiently many of these h(n^i) gives us the convergence to h(n) specified by the strong law of large numbers.

We illustrate the strong law of large numbers with a simple example. Consider a uniform distribution over the interval [0, 1]. This distribution has a finite mean of 0.5. The strong law of large numbers requires the average of samples drawn from this distribution to approach the mean with probability one. Figure 2.4 plots the average as a function of sample size; clearly this value approaches 0.5 as the number of samples increases.

Unfortunately, the strong law of large numbers gives no indication as to the accuracy of the reconstructed statistic given a finite number of samples. An estimate for the degree of accuracy actually arises from the central limit theorem, which we state next.

Theorem 2.2 (Central Limit Theorem [130].) Let X1, X2, ..., Xn be a sequence of independent and identically distributed random variables, each having finite mean m and finite variance σ². Then the distribution of

$$Z_n = \frac{X_1 + \dots + X_n - nm}{\sigma\sqrt{n}} \tag{2.63}$$

tends to the standard normal as n → ∞. That is,

$$\lim_{n\to\infty} P(Z_n \le a) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{a} e^{-x^2/2}\, dx$$

Proof: See Ross for details of the proof [130]. □

In this case, we now expect the reconstruction of the desired statistic, i.e. the sample mean of h(n), to be normally distributed assuming a large enough finite sample N. Simulating this statistic multiple times (e.g. twenty samples of the statistic, each reconstructed from N samples, or 20 × N total samples) permits indirect estimation of standard statistics such as confidence intervals.

How does one check whether or not the finite sample size N is large enough to justify invocation of the central limit theorem? Kreyszig proposes the following rule of thumb for determining this number of samples: if the skewness of the distribution is small, use at least twenty and fifty samples to reconstruct the mean and variance, respectively [75]. We can also reconstruct multiple realizations of the ZN distribution, then use statistical tests such as the Shapiro-Wilk test to test this distribution for normality [137, 131]. If these tests indicate normality, then we are free to apply the usual statistical inferences for the ZN distribution and hence obtain some measure of the accuracy of the reconstructed statistic.

We illustrate the central limit theorem using again the uniform density over the range [0, 1]. Figure 2.5 compares the Monte Carlo reconstructed density for ZN to the standard normal distribution. For N = 1, the reconstructed density of ZN is obviously not normal; in fact, this plot merely reconstructs the underlying uniform distribution (appropriately shifted). For N = 20, the reconstructed density of ZN compares favorably to the standard normal.
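As a sketch of how one might reconstruct ZN and apply the normality test mentioned above (Python with NumPy and SciPy; the numbers of realizations and samples are illustrative assumptions of ours):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
m, var = 0.5, 1.0 / 12.0           # mean and variance of uniform [0, 1]
N = 20                             # samples per realization of Z_N

# Build many realizations of Z_N = (X_1 + ... + X_N - N m) / (sigma sqrt(N)).
X = rng.uniform(0.0, 1.0, size=(500, N))
Z = (X.sum(axis=1) - N * m) / np.sqrt(var * N)

# Shapiro-Wilk: a large p-value is consistent with normality of Z_N.
W, p = stats.shapiro(Z)
print(f"Shapiro-Wilk: W = {W:.3f}, p = {p:.3f}")
```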

These statistical theorems, then, ultimately require samples to be drawn exactly from the master equation. For nontrivial examples, direct solution of the master equation is not feasible. Alternatively, one could consider an exact stochastic simulation of the "fundamental hypothesis" as examined by Gillespie [45]. This method examines the joint probability function, P(τ, µ)dτ, that governs when the next reaction occurs and which reaction occurs. Here,

$$P(\tau,\mu\,|\,n,t) = a_\mu(n)\exp\left(-\sum_{k=1}^{m} a_k(n)\tau\right) \qquad (2.64)$$

in which P(τ, µ|n, t)dτ is the probability that the next reaction will occur in the infinitesimal time interval [t + τ, t + τ + dτ) and will be a µ reaction, given that the original state is n at time t. One can then construct numerical algorithms for simulating trajectories obeying the density (2.64).

To our knowledge, no one has yet demonstrated the equivalence between the chemical master equation and stochastic simulation. The fact that these two formulas are somehow


equivalent rests solely on the basis that both arise from the "fundamental hypothesis". This reasoning is tantamount to the logical statement "if A implies B and A implies C, then B implies C and C implies B", which is incorrect. Here, we demonstrate that one can derive equations (2.60) and (2.64) from one another.

Figure 2.5: Illustration of the central limit theorem given a uniform distribution over the interval [0, 1]: (a) N = 1 sample and (b) N = 20 samples. Solid line plots the Monte Carlo reconstructed density. Dashed line plots the standard normal distribution.

Theorem 2.3 (Equivalence of the master equation and the next reaction probability density.) Assume that P(N0, t0) is known, where

$$N_0 = \begin{bmatrix} n_0 & n_1 & \cdots \end{bmatrix} \qquad (2.65)$$

The probability densities generated by the chemical master equation (i.e. equation (2.60)) and the joint density P(τ, µ|n, t)dτ (i.e. equation (2.64)) are identical.

Proof. If these probability densities are indeed equivalent, the evolution equations for these densities must be equivalent. Therefore we can prove this theorem by demonstrating that (1) P (τ, µ|n, t)dτ gives rise to the chemical master equation and (2) the chemical master equation gives rise to P (τ, µ|n, t)dτ.

1. Given P (τ, µ|n, t)dτ, derive the chemical master equation.

We consider propagating the marginal density P(nj, t) (dropping the conditional argument (N0, t0) for convenience) from time t to the future time t + dτ. Noting that the probability of having multiple reactions occur over this interval is O(dτ²), we have

$$\begin{aligned} P(n_j, t + d\tau) = {} & P(n_j, t)\left(1 - \sum_{k=1}^{m}\lim_{\tau\to 0} P(\tau, k\,|\,n_j, t)\,d\tau\right) && (2.66)\\ & + \sum_{k=1}^{m} P(n_j - \nu_k, t)\lim_{\tau\to 0} P(\tau, k\,|\,n_j - \nu_k, t)\,d\tau + O(d\tau^2) && (2.67)\end{aligned}$$

Manipulating this equation gives rise to the chemical master equation:

$$\frac{P(n_j, t + d\tau) - P(n_j, t)}{d\tau} = \sum_{k=1}^{m}\left[-a_k(n_j)P(n_j, t) + P(n_j - \nu_k, t)a_k(n_j - \nu_k)\right] + O(d\tau) \qquad (2.68)$$

$$\lim_{d\tau\to 0}\frac{P(n_j, t + d\tau) - P(n_j, t)}{d\tau} = \lim_{d\tau\to 0}\left(\sum_{k=1}^{m}\left[-a_k(n_j)P(n_j, t) + P(n_j - \nu_k, t)a_k(n_j - \nu_k)\right] + O(d\tau)\right) \qquad (2.69)$$

$$\frac{dP(n_j, t)}{dt} = \sum_{k=1}^{m}\left[-a_k(n_j)P(n_j, t) + P(n_j - \nu_k, t)a_k(n_j - \nu_k)\right] \qquad (2.70)$$

2. Given the chemical master equation, derive P (τ, µ|n, t)dτ. In this case, the master equation (2.60) is known. Given that the system is in state n at time t, we seek to derive the probability that the next reaction will occur at time t+τ and will be reaction µ. This statement is equivalent to specifying

(a) P (n, t) = 1 and

(b) no reactions occur over the interval [t, t + τ).

Accordingly, the master equation reduces to the following form:

 0   Pm   0  P (n, t ) − k=1 ak(n) 0 ... 0 P (n, t )  P (n + ν , t0)   a (n) 0 ... 0  P (n + ν , t0)  d  1   1   1  0   =     , t ≤ t ≤ t + τ dt0  .   . . .  .   .   . . .  .  0 0 P (n + νm, t ) am(n) 0 ... 0 P (n + νm, t ) (2.71) 21 in which we have now effectively conditioned each P (n, t0) on the basis that no reaction occurs over the given interval. Solving for the desired probabilities yields

$$P(n, t') = \exp\left(-\sum_{k=1}^{m} a_k(n)(t' - t)\right) \qquad (2.72)$$

$$P(n + \nu_j, t') = \frac{a_j(n)}{\sum_{k=1}^{m} a_k(n)}\left[1 - \exp\left(-\sum_{k=1}^{m} a_k(n)(t' - t)\right)\right], \quad 1 \le j \le m \qquad (2.73)$$

Our strategy now is to first note that P(τ, µ|n, t)dτ consists of the independent probabilities

$$P(\tau, \mu\,|\,n, t)\,d\tau = P(\mu\,|\,n, t)P(\tau\,|\,n, t)\,d\tau \qquad (2.74)$$

then solve for these marginal densities as functions of the P(n, t')'s. Conceptually, P(τ|n, t)dτ is the probability that the first reaction occurs in the interval [t + τ, t + τ + dτ). We solve for this quantity by taking advantage of its relationship with P(n, t + τ)

$$P(\tau\,|\,n, t)\,d\tau = \sum_{j=1}^{m}\left.\frac{dP(n + \nu_j, t')}{dt'}\right|_{t'=t+\tau} d\tau \qquad (2.75)$$

$$= -\left.\frac{dP(n, t')}{dt'}\right|_{t'=t+\tau} d\tau \qquad (2.76)$$

$$= \sum_{k=1}^{m} a_k(n)P(n, t + \tau)\,d\tau \qquad (2.77)$$

$$= \sum_{k=1}^{m} a_k(n)\exp\left(-\sum_{k=1}^{m} a_k(n)\tau\right) d\tau \qquad (2.78)$$

As expected, P (τ|n, t)dτ is independent of µ.

Similarly, we express P(µ|n, t) as a function of the P(n + νj, t')'s

$$P(\mu\,|\,n, t) = \frac{P(n + \nu_\mu, t')}{\sum_{k=1}^{m} P(n + \nu_k, t')} \qquad (2.79)$$

$$= \frac{\dfrac{a_\mu(n)}{\sum_{k=1}^{m} a_k(n)}\left[1 - \exp\left(-\sum_{k=1}^{m} a_k(n)(t' - t)\right)\right]}{\sum_{j=1}^{m}\dfrac{a_j(n)}{\sum_{k=1}^{m} a_k(n)}\left[1 - \exp\left(-\sum_{k=1}^{m} a_k(n)(t' - t)\right)\right]} \qquad (2.80)$$

$$= \frac{a_\mu(n)}{\sum_{k=1}^{m} a_k(n)} \qquad (2.81)$$

As expected, P(µ|n, t) is independent of τ.

Combining the two marginal densities, we obtain

$$P(\tau, \mu\,|\,n, t)\,d\tau = P(\mu\,|\,n, t)P(\tau\,|\,n, t)\,d\tau \qquad (2.82)$$

$$= \frac{a_\mu(n)}{\sum_{k=1}^{m} a_k(n)}\sum_{k=1}^{m} a_k(n)\exp\left(-\sum_{k=1}^{m} a_k(n)\tau\right) d\tau \qquad (2.83)$$

$$= a_\mu(n)\exp\left(-\sum_{k=1}^{m} a_k(n)\tau\right) d\tau \qquad (2.84)$$

as claimed. □

Theorem 2.4 (Reconstruction of the master equation density from exact simulation.) Assuming conservation of mass and a finite number of reactions, the probability density at a single future time point t reconstructed from Monte Carlo simulations converges to the density governed by the chemical master equation almost surely over the interval [t0, t]. That is,

$$P\left\{\lim_{N\to\infty} P_N(n_i, t\,|\,N_0, t_0) = P(n_i, t\,|\,N_0, t_0)\right\} = 1 \quad \forall i = 1, \ldots, n_s \qquad (2.85)$$

in which

• N is the number of exact Monte Carlo simulations,

• PN (n, t|N 0, t0) is the Monte Carlo reconstruction of the probability density given N exact simulations,

• P (n, t|N 0, t0) is the density governed by the master equation, and

• ns is the total number of possible species.

Proof: We must show that

$$P\left\{\psi : \lim_{N\to\infty} n_{i,N}(\psi, t) = n_i(\psi, t)\right\} = 1 \quad \forall i = 1, \ldots, n_s \qquad (2.86)$$

in which

• $N = \begin{bmatrix} n_1 & \cdots & n_{n_s} \end{bmatrix}^T$, and

• ni,N is the Monte Carlo reconstruction of ni given N simulations.

Let ε > 0. We must show that there exists an N such that if m > N,

$$\left|P\{\psi : n_{i,m}(\psi, t) = n_i(\psi, t)\} - 1\right| < \varepsilon \quad \forall i = 1, \ldots, n_s \qquad (2.87)$$

The assumption of conservation of mass and a finite number of reactants indicates that n_s is finite. Choose

$$X_i(\psi, t) = \delta(\psi - n_i, t) \qquad (2.88)$$

in which the random variable ψ is generated by running an exact stochastic simulation until time t. The mean of this random variable is P(n_i, t|N0, t0). Theorem 2.3 states that any simulation scheme obeying the next reaction probability density P(τ, µ|n, t) generates exact trajectories from the master equation. Therefore, we can apply the strong law of large numbers, which says that there exists an N_i ∀ i = 1, ..., n_s such that if m > N_i,

$$\left|P\{\psi : X_{i,m}(\psi, t) = P(n_i, t)\} - 1\right| \le \frac{\varepsilon}{2} \quad \forall i = 1, \ldots, n_s \qquad (2.89)$$

Let N = max_i N_i. Then if m > N,

$$\left|P\{\psi : n_{i,m}(\psi, t) = n_i(\psi, t)\} - 1\right| \le \left|P\{\psi : n_{i,N}(\psi, t) = n_i(\psi, t)\} - 1\right| \quad \forall i = 1, \ldots, n_s \qquad (2.90)$$

$$\le \frac{\varepsilon}{2} \quad \forall i = 1, \ldots, n_s \qquad (2.91)$$

$$< \varepsilon \quad \forall i = 1, \ldots, n_s \qquad (2.92)$$

Since ε is arbitrary, the proof is complete. □

In his seminal works, Gillespie proposes two simple and efficient methods for generating exact trajectories obeying the probability function P(τ, µ) [45, 46]. Theorem 2.3 proves that these trajectories obey exactly the chemical master equation (2.60). Gillespie appropriately named these algorithms the direct method and the first reaction method. We summarize these methods in algorithms 1 and 2.

Algorithm 1 Direct Method.
Initialize. Set the time, t, equal to zero. Set the number of species n to n0.

1. Calculate:

(a) the reaction rates ak(n) for k = 1, ..., m; and

(b) the total reaction rate, $r_{tot} = \sum_{k=1}^{m} a_k(n)$.

2. Select two random numbers p1, p2 from the uniform distribution (0, 1). Let τ = −log(p1)/rtot. Choose j such that

$$\sum_{k=1}^{j-1} a_k(n) < p_2\,r_{tot} \le \sum_{k=1}^{j} a_k(n)$$

3. Let t ← t + τ. Let n ← n + νj. Go to 1.
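A compact implementation of Algorithm 1 might look like the following sketch (Python with NumPy; the reversible dimerization system and the rate constants at the bottom are our own illustrative choices):

```python
import numpy as np

def gillespie_direct(n0, nu, propensities, t_final, seed=0):
    """Simulate one trajectory of Algorithm 1 (Gillespie's direct method).

    n0           -- initial molecule counts (p-vector)
    nu           -- stoichiometric update of each reaction (m x p array)
    propensities -- function n -> length-m array of a_k(n)
    """
    rng = np.random.default_rng(seed)
    t, n = 0.0, np.array(n0, dtype=float)
    history = [(t, n.copy())]
    while t < t_final:
        a = propensities(n)
        r_tot = a.sum()
        if r_tot <= 0.0:                  # no reaction can fire
            break
        p1, p2 = rng.uniform(size=2)
        tau = -np.log(p1) / r_tot
        j = np.searchsorted(np.cumsum(a), p2 * r_tot)   # select reaction j
        t, n = t + tau, n + nu[j]
        history.append((t, n.copy()))
    return history

# Example: 2A -> B and B -> 2A
nu = np.array([[-2.0, 1.0], [2.0, -1.0]])
prop = lambda n: np.array([0.5 * 0.1 * n[0] * max(n[0] - 1, 0.0), 0.05 * n[1]])
traj = gillespie_direct([100, 0], nu, prop, t_final=10.0)
print(len(traj), traj[-1])
```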

Exact algorithms such as the direct method treat microscopic phenomena as discrete, molecular events.

Algorithm 2 First Reaction Method.
Initialize. Set the time, t, equal to zero. Set the number of species n to n0.

1. Calculate the reaction rates ak(n) for k = 1, . . . , m.

2. Select m random numbers p1, ..., pm from the uniform distribution (0, 1). Let τk = −log(pk)/ak(n), k = 1, ..., m. Choose j such that

$$j = \arg\min_k \tau_k$$

3. Let t ← t + τj. Let n ← n + νj. Go to 1.

For intracellular models, this discrete, molecular treatment of reaction events is appealing because of the inherently small number of molecules contained within a single cell (the finite number of chromosomes in the nucleus, for example). As models become progressively more complex, however, these algorithms often become expensive computationally. Some recent efforts have focused upon reducing this computational load. He, Zhang, Chen, and Yang employ a deterministic equilibrium assumption on polymerization reaction kinetics [61]. Gibson and Bruck refine the first reaction method, i.e. algorithm 2, to reduce the required number of random numbers, a technique that works best for systems in which some reactions occur much more frequently than others [43]. Rao and Arkin demonstrate how to numerically simulate systems reduced by the quasi-steady-state assumption [113]. This work expands upon ideas by Janssen [69, 70] and Vlad and Pop [157], who first examined the adiabatic elimination of fast relaxing variables in stochastic chemical kinetics. Resat, Wiley, and Dixon address systems with reaction rates varying by several orders of magnitude by applying a probability-weighted Monte Carlo approach, but this method increases error in species fluctuations [126]. Gillespie examines two approximate methods, tau leaping and kα leaping, for accelerating simulations by modeling the selection of "fast" reactions with Poisson distributions [50]. These methods employ explicit, first-order Euler approximations of the next reaction distribution that permit larger time steps than exact methods by allowing multiple firings of fast reactions per step. In explicit tau leaping, one chooses a fixed time step τ, then increments the state by

$$n(t + \tau) \approx n(t) + \sum_{k=1}^{m}\nu_k\,\mathcal{P}_k\big(a_k(n(t))\tau\big) \qquad (2.93)$$

in which $\mathcal{P}_k(a_k(n(t))\tau)$ is a Poisson random variable with mean $a_k(n(t))\tau$. In kα leaping, one chooses a particular reaction to undergo a predetermined number of events kα, then determines the time τ required for these events to occur by drawing a gamma random variable Γ(aα(n), kα). Using this value of τ, one draws Poisson random variables to determine how many events the remaining reactions undergo. A subsequent paper by Gillespie and Petzold discusses the error associated with the tau leaping approximation by using Taylor-series expansion arguments [51]. These conditions specify restrictions on the time increment τ to ensure that the error in the reconstructed mean and variance remains below a user-specified tolerance. However, this error only quantifies the effects of the reaction rate (the aj(n)'s) dependence upon the state n, not the effect of approximating the exact next reaction distribution with a Poisson distribution. Rathinam, Petzold, Cao, and Gillespie later present a first-order implicit version of tau leaping, i.e.

$$n(t + \tau) \approx n(t) + \sum_{k=1}^{m}\nu_k\,a_k\big(n(t + \tau)\big)\tau + \sum_{k=1}^{m}\nu_k\left[\mathcal{P}_k\big(a_k(n(t))\tau\big) - a_k(n(t))\tau\right] \qquad (2.94)$$

This method has greater numerical stability than the explicit version [117].
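For comparison with the exact methods, the explicit tau-leap update of equation (2.93) takes only a few lines. The sketch below (Python with NumPy; the fixed step size, rate constants, and the clipping safeguard against negative populations are our own assumptions, not part of the formulation above) advances a reversible dimerization with a fixed τ:

```python
import numpy as np

def tau_leap_step(n, nu, propensities, tau, rng):
    """One explicit tau-leap update: n <- n + sum_k nu_k * Poisson(a_k(n) tau)."""
    a = propensities(n)
    fires = rng.poisson(a * tau)            # number of firings of each reaction
    return np.maximum(n + fires @ nu, 0.0)  # clip as a practical safeguard

rng = np.random.default_rng(2)
nu = np.array([[-2.0, 1.0], [2.0, -1.0]])   # 2A -> B and B -> 2A
prop = lambda n: np.array([0.5 * 1e-4 * n[0] * max(n[0] - 1, 0.0), 0.05 * n[1]])
n, tau = np.array([1000.0, 0.0]), 0.01
for _ in range(1000):
    n = tau_leap_step(n, nu, prop, tau, rng)
print(n)
```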

2.3.2 Performing Systems Level Tasks with Stochastic Models

Employing kinetic Monte Carlo models for systems level tasks is an area of active research. Raimondeau, Aghalayam, Mhadeshwar, and Vlachos consider sensitivities via finite differences and parameter estimation for kinetic Monte Carlo simulations [105]. Drews, Braatz, and Alkire consider calculating the sensitivity for the mean of multiple Monte Carlo simulations via finite differences, and apply this method to copper electrodeposition to determine which parameter perturbations most significantly affect the measurements [25]. Gallivan and Murray consider model reduction techniques for the chemical master equation [39], then use the reduced models to determine optimal open-loop temperature profiles for epitaxial thin film growth [38]. Lou and Christofides consider control of growth rate and surface roughness in thin film growth [81, 82], employing proportional-integral control that uses a kinetic Monte Carlo model to provide information about interactions between outputs and manipulated inputs. This simple form of feedback control does not require an optimization. Laurenzi uses a genetic algorithm to estimate parameters for a model of aggregating blood platelets and neutrophils [78]. Armaou and Kevrekidis employ a coarse time-stepper and a direct stochastic optimization method (Hooke-Jeeves) to determine an optimal control policy for a set of reactions on a catalyst surface [4]. Siettos, Armaou, Makeev and Kevrekidis use the coarse time-stepper to identify the local linearization of the nonlinear stochastic model at a steady state of interest [138]. Given the local linearization of the model, standard linear quadratic control theory is then applied. Armaou, Siettos and Kevrekidis consider extending this control approach to spatially distributed processes [5]. Finally, Siettos, Maroudas and Kevrekidis construct bifurcation diagrams for the mean of the stochastic models [139].

2.4 Population Balance Models

Stochastic models of chemical kinetics pose one alternative to traditional deterministic models for modeling intracellular kinetics. Many biological systems of interest, however, consist of populations of cells influencing one another. Here, we consider the dynamic behavior of cell populations undergoing viral infections.

Traditionally, mathematical models for viral infections have focused solely on events occurring at either the intracellular or extracellular level. At the intracellular level, kinetic models have been applied to examine the dynamics of how viruses harness host cells to replicate more virus [73, 27, 29, 3], and how drugs targeting specific virus components affect this replication [122, 30]. These models, however, consider only one infection cycle, whereas infections commonly consist of numerous infection cycles. At the extracellular level, researchers have considered how drug therapies affect the dynamics of populations of viruses [164, 62, 98, 13, 100]. These models, though, neglect the fact that these drugs target specific intracellular viral components. To better understand the interplay of intracellular and extracellular events, a different modeling framework is necessary. We propose cell population balances as one such framework.

Mathematical models for cell population dynamics may be effectively grouped by two distinctive features: whether or not the model has structure, and whether or not the model has segregations [6]. If a model has structure, then multiple intracellular components affect the dynamics of the cell population. If a model has segregations, then some cellular characteristic can be employed to distinguish among different cells in a population. Table 2.1 summarizes the different combinations of models arising from these features. In this context, current extracellular models are equivalent to unstructured, unsegregated models because the cells in each population (uninfected and infected cells) are assumed indistinguishable from each other.

                 Unstructured                         Structured

  Unsegregated   Most idealized case:                 Multicomponent average
                 cell population treated as           cell description
                 a one-component solute

  Segregated     Single component,                    Multicomponent description of
                 heterogeneous individual cells       cell-to-cell heterogeneity;
                                                      most realistic case

Table 2.1: Types of cell population models [6]

The derivation of structured, segregated models stems from the equation of continuity. In particular, the derivation is identical to the one presented earlier up to the microscopic equation (2.8), but now considers the effect of various internal segregations upon the population behavior.

Fredrickson, Ramkrishna, and Tsuchiya consider the details of this derivation in their seminal contribution [36]. In recent years, this modeling framework has returned to the literature as researchers strive to adequately reconcile model predictions with the dynamics demonstrated by experimental data [80, 10, 33]. Also, new measurements such as flow cytometry offer the promise of actually differentiating between cells of a given population [1, 67], again implying the need to model distinctions between cells in a given population.

Notation

aµ(n)       µth reaction rate
cµdt        average probability to O(dt) that reaction µ will occur in the next time interval dt
dΩ          differential change in the control surface S(t)
dΩe         differential change in the reactor surface Se
ek          deviation between the predicted and actual measurement at time tk
F           total flux of the quantity η(t, z)
f           diffusive contribution to the total flux F
hµ          number of distinct molecular reactant combinations for reaction µ at a given time
J           Jacobian
m           mean of a probability distribution
N(m, C)     normal distribution with mean m and covariance C
N0          matrix containing all possible molecular configurations at time t0
n           vector of the number of molecules for each chemical species
ni          ith Monte Carlo reconstruction of the vector n
ne          normal vector pointing from the reactor surface Se away from the volume Ve
ns          normal vector pointing from the surface S(t) away from the volume V(t)
ns          total number of possible species
P           probability
P(m)        random number drawn from the Poisson distribution with mean m
p           random number from the uniform distribution (0, 1)
q           effluent volumetric flow rate
qf          feed volumetric flow rate
Rη          production rate of the species η
rtot        sum of reaction rates
Se          time-varying surface of the reactor volume Ve
S(t)        arbitrary, time-varying control surface spanning a space in z
s           sensitivity of the state x with respect to the parameters θ
ṡ           first derivative of the sensitivity with respect to time
t           time
tk          discrete sampling time
Ve          time-varying reactor volume
V(t)        arbitrary, time-varying control volume spanning a space in z
vk          realization of the variable ξ at time tk
vs          velocity vector for the surface S(t)
vx          x-component of the velocity vector vz
vz          velocity vector for material flowing through the volume V(t)
X           random variable
x           external characteristics
x           state
ẋ           first derivative of the state with respect to time
xk          state at time tk
Yk          distribution for the measurement yk
y           internal characteristics
yk          measurement at time tk
ZN          random variable whose limiting distribution as N → ∞ is the normal distribution
z           internal and external characteristics
Γ           random number drawn from the gamma distribution
δ           Dirac delta function
η(t, z)dz   mass of reactants or products
Θ           distribution for the parameter set θ
θ           parameter set for a given model
µ           one possible reaction in the stochastic kinetics framework
ν           stoichiometric matrix
ξ           N(0, Π)-distributed random variable
Π           covariance of the random variable ξ
σ           standard deviation
τ           time of the next stochastic reaction
φ           objective function
ψ           random variable

Chapter 3

Motivation

The motivation for this work is the current state of stochastic and deterministic methods used to model chemically reacting systems. For example, the rapid growth of biological measurements on the intracellular level (e.g. microarray and proteomic data) will require much more complicated models to adequately assimilate the data contained in these measurements. Therefore we seek to improve the current techniques used to evaluate and manipulate stochastic and deterministic models. In this chapter, we examine the current limitations of the existing methods for using stochastic models, traditional deterministic models, and state estimation techniques.

3.1 Current Limitations of Stochastic Models

We see two primary limitations of current methods for handling stochastic models:

1. exact integration methods scale with the number of reaction events, and

2. methods for performing systems level tasks require the use of noisy finite difference techniques.

We illustrate these points next.

3.1.1 Integration Methods

The current options for performing exact simulation of stochastic chemical kinetics are Gillespie's direct and first reaction methods [45, 46], and the next reaction method of Gibson and Bruck [43]. Gibson and Bruck [43] analyze the computational expenditure of these methods, and find that Gillespie's methods at best scale with the number of reaction events, whereas their next reaction method scales with the log of the number of reaction events. To illustrate this point, we consider the simple reaction

$$2A \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} B, \qquad a(\epsilon) = \tfrac{1}{2}k_1 n_A(n_A - 1) \qquad (3.1)$$

in which

• k1 = 4/(3nAo) and k−1 = 0.1,

• ε is the dimensionless extent of reaction,

• a(ε) is the reaction propensity function,

• nA is the number of A molecules, and

• nAo is the initial number of A molecules.

We consider simulating this system in which there are initially zero B molecules and a variable number of A molecules. For this system, the number of possible reactions scales with nAo. We scale rate constants for reactions with nonlinear rates so that the dimensionless extent of reaction remains constant as the variable nAo changes. Figure 3.1 demonstrates that the computational time for one simulation scales linearly with nAo, as expected.


Figure 3.1: Computational time per simulation as a function of nAo. Line represents the least-squares fit of the data assuming that a simulation with nAo = 0 requires no computational time.

The question arises, then, as to the suitability of these methods for simulating intracellular chemistry. As an example, we consider the case of a rapidly growing Escherichia coli (E. coli) cell. For this circumstance, one E. coli cell contains approximately four molecules of deoxyribonucleic acid (DNA), 1000 molecules of messenger ribonucleic acid (mRNA), and 10^6 proteins [6]. Simulating these conditions with methods that scale with the number of reaction events is clearly acceptable for modeling the DNA and mRNA species, but simulating events at the protein level is not a trivial task.

Now consider Figure 3.2, which plots how an intensive variable such as the extent of reaction ε changes as nAo increases. This figure demonstrates that, as the number of molecules increases, the extent appears to be converging to a smoothly-varying deterministic trajectory. This simulation exhibits precisely the mathematical result proven by Kurtz: in the thermodynamic limit (n → ∞, V → ∞, n/V = constant), the master equation written for n (number of molecules) collapses to a deterministic equation for c (concentration of molecules) [76]. The appeal of the deterministic equation is that the computational time required for its solution does not scale with the simulated number of molecules. For E. coli, such an approximation may certainly be valid for reactions among proteins, but not for those among DNA. We address this issue further in Chapter 4.


Figure 3.2: Extent of reaction as a function of nAo.

3.1.2 Systems Level Tasks

A secondary issue arising from stochastic models is how to extract information from these models. Currently, most researchers merely integrate these types of models to determine the dynamic behavior of the system given a specific initial condition and inputs. As pointed out previously, this integration is potentially expensive. One recent strategy for obtaining more information from the model involves using finite difference methods to obtain estimates of the model sensitivity [105, 25], then using these sensitivities for parameter estimation and steady-state analysis. For example, we could determine the sensitivity of reaction 3.1 to the forward rate constant k1 by evaluating the central finite difference

$$s = \frac{\partial n_A}{\partial k_1} \approx \frac{F(k_1 + \delta) - F(k_1 - \delta)}{2\delta} \qquad (3.2)$$

in which

• s is the sensitivity of the state nA with respect to the parameter k1,

• F (x) yields a trajectory from a stochastic model integration given the parameter k1 = x and nAo initial molecules, and

• δ is a perturbation to the parameter k1.
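A minimal sketch of this central finite difference follows (Python with NumPy; the stochastic integrator F and its signature are hypothetical placeholders for any simulator such as the direct method, and averaging over replicates is our own addition to tame the noise):

```python
import numpy as np

def fd_sensitivity(F, k1, delta, n_reps=10):
    """Central finite difference estimate of s = d n_A / d k1 (equation 3.2).

    F(k1, seed) -- hypothetical stochastic integrator: returns the n_A
                   trajectory, sampled on a fixed time grid, for rate k1.
    """
    plus = np.mean([F(k1 + delta, seed) for seed in range(n_reps)], axis=0)
    minus = np.mean([F(k1 - delta, seed) for seed in range(n_reps)], axis=0)
    return (plus - minus) / (2.0 * delta)
```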

Figure 3.3 plots the perturbed trajectories and the desired sensitivity. At the smaller perturbation of δ = 0.2k1, the stochastic fluctuations of the simulation dominate, yielding a noisy, poor sensitivity estimate. The larger perturbation of δ = 0.8k1 yields a smoother sensitivity, but the accuracy of the central finite difference is questionable. There is obviously significant room for improvement in the methods used to calculate this quantity. We consider this issue further in Chapters 5 and 6. Additionally, little work has focused on how best to use information obtained from simulations of stochastic differential equations. Accordingly, we consider sensitivities for these types of models in Chapter 7. Finally, we apply many of the tools developed in these chapters to crystallization systems in Chapter 8.


Figure 3.3: Finite difference sensitivity for the stochastic model: (a) small perturbation (δ = 0.2k1) and (b) large perturbation (δ = 0.8k1).

3.2 Current Limitations of Traditional Deterministic Models

We restrict this examination to modeling of viral infections, although the same arguments generally hold for virtually all systems involving populations of cells. Figure 3.4 generalizes the cyclic nature of viral infections. The initiation of a viral infection occurs when the virus is introduced to a host organism. The virus then targets specific uninfected host cells for infection. Once infected, these host cells become in essence "factories" that replicate and secrete the virus. The cycle of infection and virus production then continues. During this infection cycle, uninfected cells may continue to reproduce. This cycle is essentially the one proposed by Nowak and May [98].


Figure 3.4: Cyclic nature of viral infections.

These types of models usually assume that the production rate of virus is directly proportional to the concentration of infected cells. This assumption generally permits reduction of the model to a coupled set of ordinary differential equations (e.g. three ODEs to model the uninfected cell population, the infected cell population, and the virus population). This assumption is a gross simplification; in fact, many modelers have focused entirely on considering the complex chemistry required at the intracellular level to produce viral progeny [73, 27, 29, 3]. A more realistic picture of viral infections consists of a combination of the intracellular and extracellular levels. As described in Chapter 2, cell population balance models offer one means of combining these two levels. Since the literature review uncovered little active research in this area, we seek to explore the utility of the cell population balance in explaining biological phenomena. We believe that refined versions of these models may lead to insights on how best to control viral propagation. We first explore the utility of the cell population balance in a numerical setting in Chapter 9, then investigate whether or not these types of models are useful in explaining actual experimental data in Chapter 10. Finally, we introduce an approximation that significantly reduces the computational expense of solving this class of models in Chapter 11.

3.3 Current Limitations of State Estimation Techniques

It is well established that the Kalman filter is the optimal state estimator for unconstrained, linear systems subject to normally distributed state and measurement noise. Many physical systems, however, exhibit nonlinear dynamics and have states subject to hard constraints, such as nonnegative concentrations or pressures. Hence Kalman filtering is no longer directly applicable. Perhaps the most popular method for estimating the state of nonlinear systems is the extended Kalman filter, which first linearizes the nonlinear system, then applies the Kalman filter update equations to the linearized system [144]. The extended Kalman filter assumes that the a posteriori distribution is normally distributed (unimodal), hence the mean and the mode of the distribution are equivalent. Questions that arise are: how does this strategy perform when multiple modes arise in the a posteriori distribution? Also, are multiple modes even a concern for chemically reacting systems? Finally, can multiple modes in the estimator hinder closed-loop performance? We address the first two of these questions in Chapter 12, and the final question in Chapter 13.

Notation

a(ε)    reaction propensity function
c       concentrations for all reaction species
kj      rate constant for reaction j
n       number of molecules for all reaction species
nA      number of molecules for species A
s       sensitivity
δ       finite difference perturbation
ε       extent of reaction

Chapter 4

Approximations for Stochastic Reaction Models

(Portions of this chapter appear in Haseltine and Rawlings [57].)

Exact methods are available for the simulation of isothermal, well-mixed stochastic chemical kinetics. As increasingly complex physical systems are modeled, however, these methods become difficult to solve because the computational burden scales with the number of reaction events [43]. We address one aspect of this problem: the case in which reacting species fluctuate by different orders of magnitude. We expand upon the idea of a partitioned system [113, 157] and simulation via Gillespie's direct method [45, 46] to construct approximations that reduce the computational burden for simulation of these species. In particular, we partition the system into subsets of "fast" and "slow" reactions. We make various approximations for the "fast" reactions (either invoking an equilibrium approximation, or treating them deterministically or as Langevin equations), and treat the "slow" reactions as stochastic events. Such approximations can significantly reduce computational load while accurately reconstructing at least the first two moments of the probability distribution for each species.

This chapter provides a theoretical background for such approximations and outlines strategies for computing these approximations. First, we examine the theoretical underpinnings of the approximations. Next, we propose numerical algorithms for performing the simulations, review several practical implementation issues, and propose a further approximation. We then consider three motivating examples drawn from the fields of enzyme kinetics, particle technology, and biotechnology that illustrate the accuracy and computational efficiency of these approximations. Finally, we critically examine the technique and present conclusions.

4.1 Stochastic Partitioning

The key ideas are to 1) model the state of the reaction system using extents of reaction as opposed to molecules of species, and 2) partition the state into subsets of "fast" and "slow" reactions. With these two modeling choices, we can exploit the structure of the chemical master equation, the governing equation for the evolution of the system probability density, by

making order of magnitude arguments. We then derive the master equations that govern the "fast" and "slow" reaction subsets. This section outlines these manipulations in greater detail.

We model the state of the system, x, using an extent for each irreversible reaction (note that reversible reactions can be modeled as two irreversible reactions). An extent of reaction model is consistent with a molecule balance model since

$$n = n_0 + \nu^T x \qquad (4.1)$$

in which, assuming that there are m extents of reaction and p chemical species:

• x is the state of the system in terms of extents (an m-vector),

• n is the number of molecules (a p-vector),

• n0 is the initial number of molecules (a p-vector), and

• ν is the stoichiometric matrix (an m × p-matrix).
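As a quick check of equation (4.1) in code (Python with NumPy; the two-reaction system is an illustrative assumption of ours, echoing the dimerization example of Chapter 3):

```python
import numpy as np

# Stoichiometric matrix for 2A -> B and B -> 2A; row k holds reaction k.
nu = np.array([[-2, 1],
               [ 2, -1]])
n0 = np.array([100, 0])      # initial molecules of A and B
x = np.array([5, 2])         # extents: reaction 1 fired 5 times, reaction 2 twice
n = n0 + nu.T @ x            # equation (4.1)
print(n)                     # -> [94, 3]
```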

The upper and lower bounds of x are constrained by the limiting reactant species. We arbitrarily set the initial condition to the origin. Given assumptions outlined by Gillespie [48], the governing equation for this system is the chemical master equation

$$\frac{dP(x;t)}{dt} = \sum_{k=1}^{m}\left[a_k(x - I_k)P(x - I_k; t) - a_k(x)P(x; t)\right] \qquad (4.2)$$

in which

• P (x; t) is the probability that the system is in state x at time t,

• ak(x)dt is the probability to order dt that reaction k occurs in the time interval [t, t + dt), and

• Ik is the kth column of the (m × m)-identity matrix I.

The structure of I arises for this particular chemical master equation because the reactions are irreversible. Also, we have implicitly conditioned the master equation (4.2) on a specific initial condition, i.e. n0. Generalizing the analysis presented in this chapter to a distribution of initial conditions (n0,1, ..., n0,n) is straightforward due to the relation

$$P(x\,|\,n_{0,1}, \ldots, n_{0,n}; t) = \sum_j P(x\,|\,n_{0,j}; t)P(n_{0,j}) \qquad (4.3)$$

and the fact that the values of P(n0,j) are specified in the initial condition.

Now we examine the time scale over which the extents of reaction change. We must first determine a relevant time scale so that we can partition the extents into two subsets: those that have small propensity functions (the ak(x)'s) and occur few if any times over the time scale, and those that have large propensity functions and occur numerous times over the given time scale.

We designate these subsets of x as the (m − l)-vector y and the l-vector z, respectively. Note that

$$x = \begin{bmatrix} y \\ z \end{bmatrix} \quad \text{and} \quad I = \begin{bmatrix} I^y & 0 \\ 0 & I^z \end{bmatrix} \qquad (4.4)$$

in which I^y and I^z are (m − l × m − l)- and (l × l)-identity matrices, respectively. We also partition the reaction propensities into groups of fast (the ck's) and slow (the bj's)

$$\begin{bmatrix} a_1(y, z; t) \\ \vdots \\ a_{m-l}(y, z; t) \\ a_{m-l+1}(y, z; t) \\ \vdots \\ a_m(y, z; t) \end{bmatrix} = \begin{bmatrix} b_1(y, z; t) \\ \vdots \\ b_{m-l}(y, z; t) \\ c_1(y, z; t) \\ \vdots \\ c_l(y, z; t) \end{bmatrix} \qquad (4.5)$$

Equation (4.2) becomes

$$\frac{dP(y, z; t)}{dt} = \sum_{j=1}^{m-l}\left[b_j(y - I_j^y, z)P(y - I_j^y, z; t) - b_j(y, z)P(y, z; t)\right] + \sum_{k=1}^{l}\left[c_k(y, z - I_k^z)P(y, z - I_k^z; t) - c_k(y, z)P(y, z; t)\right] \qquad (4.6)$$

Ultimately, we are interested in determining an approximate governing equation for the evolution of the joint density, P(y, z; t), in regimes where fast reaction extents are much greater than slow reaction extents. Denoting the total extent space as X, we define a subspace Xp ⊂ X for which

$$c_k(y, z) \gg b_j(y, z) \quad \forall\, 1 \le k \le l,\ 1 \le j \le m - l,\ \begin{bmatrix} y \\ z \end{bmatrix} \in X_p \qquad (4.7)$$

By defining the conditional and marginal probabilities over this subspace as

$$P(y, z; t) = P(z\,|\,y; t)P(y; t) \quad \forall \begin{bmatrix} y \\ z \end{bmatrix} \in X_p \qquad (4.8)$$

$$P(y; t) = \sum_z P(y, z; t) \quad \forall \begin{bmatrix} y \\ z \end{bmatrix} \in X_p \qquad (4.9)$$

we can alternatively derive evolution equations for both the marginal probability of the slow reactions, P(y; t), and the probability of the fast reactions conditioned on the slow reactions, P(z|y; t). Consequently, we then know how the fast and slow reactions evolve over this time scale. Also, this partitioning is similar to that used by Rao and Arkin [113], who partition the master equation by species to treat the quasi-steady-state assumption. We partition by reaction extents to treat fast and slow reactions.

All the manipulations performed in the next two subsections apply only for fast and slow reactions in the partitioned subspace Xp. To simplify the presentation of the results, we drop the implied notation

$$\forall \begin{bmatrix} y \\ z \end{bmatrix} \in X_p$$

from all subsequent equations.

4.1.1 Slow Reaction Subset

We first address the subset of slow reaction extents y. From the definition of the marginal density,

$$P(y; t) = \sum_z P(y, z; t) \qquad (4.10)$$

Differentiating equation (4.10) with respect to time yields

$$\frac{dP(y; t)}{dt} = \sum_z \frac{dP(y, z; t)}{dt} \qquad (4.11)$$

Now substitute the master equation (4.6) into equation (4.11) and manipulate to yield

$$\frac{dP(y; t)}{dt} = \sum_z\left(\sum_{j=1}^{m-l}\left[b_j(y - I_j^y, z)P(y - I_j^y, z; t) - b_j(y, z)P(y, z; t)\right] + \sum_{k=1}^{l}\left[c_k(y, z - I_k^z)P(y, z - I_k^z; t) - c_k(y, z)P(y, z; t)\right]\right) \qquad (4.12)$$

$$= \sum_z\sum_{j=1}^{m-l}\left[b_j(y - I_j^y, z)P(y - I_j^y, z; t) - b_j(y, z)P(y, z; t)\right] + \underbrace{\sum_z\sum_{k=1}^{l}\left[c_k(y, z - I_k^z)P(y, z - I_k^z; t) - c_k(y, z)P(y, z; t)\right]}_{0} \qquad (4.13)$$

$$= \sum_z\sum_{j=1}^{m-l}\left[b_j(y - I_j^y, z)P(y - I_j^y, z; t) - b_j(y, z)P(y, z; t)\right] \qquad (4.14)$$

Equation (4.14) is exact; we have made no approximations in its derivation. Also, if we rewrite the joint density in terms of the conditional density using the definition

$$P(y, z; t) = P(z\,|\,y; t)P(y; t) \qquad (4.15)$$

then one interpretation of this analysis is that the evolution of the marginal P(y; t) depends on the conditional density P(z|y; t). We consider deriving an evolution equation for this conditional density next.

4.1.2 Fast Reaction Subset

We now address the evolution of the probability density for the subset of fast reactions conditioned on the subset of slow reactions, P(z|y; t). For our starting point, we use order of magnitude arguments, i.e. equation (4.7), to approximate the original master equation (4.6) as

$$\frac{dP(y, z; t)}{dt} \approx \sum_{k=1}^{l} c_k(y, z - I_k^z)P(y, z - I_k^z; t) - c_k(y, z)P(y, z; t) \qquad (4.16)$$

We define this approximate joint density as PA(y, z; t), and thus its evolution equation is

$$\frac{dP_A(y, z; t)}{dt} \triangleq \sum_{k=1}^{l} c_k(y, z - I_k^z)P_A(y, z - I_k^z; t) - c_k(y, z)P_A(y, z; t) \qquad (4.17)$$

Following Rao and Arkin [113], we define the joint density PA(y, z; t) as the product of the desired conditional density PA(z|y; t) and the marginal density PA(y; t):

$$P_A(y, z; t) = P_A(z\,|\,y; t)P_A(y; t) \qquad (4.18)$$

Differentiating equation (4.18) with respect to time yields

$$\frac{dP_A(y, z; t)}{dt} = \frac{dP_A(z\,|\,y; t)}{dt}P_A(y; t) + P_A(z\,|\,y; t)\frac{dP_A(y; t)}{dt} \qquad (4.19)$$

Solving equation (4.19) for the desired conditional derivative yields

$$\frac{dP_A(z\,|\,y; t)}{dt} = \frac{1}{P_A(y; t)}\left[\frac{dP_A(y, z; t)}{dt} - P_A(z\,|\,y; t)\frac{dP_A(y; t)}{dt}\right] \qquad (4.20)$$

Evaluating the marginal evolution equation by summing equation (4.17) over the fast extents z yields

$$\frac{dP_A(y; t)}{dt} = \sum_z\sum_{k=1}^{l} c_k(y, z - I_k^z)P_A(y, z - I_k^z; t) - c_k(y, z)P_A(y, z; t) \qquad (4.21)$$

$$= 0 \qquad (4.22)$$

Consequently, equation (4.20) becomes

$$\frac{dP_A(z\,|\,y; t)}{dt} = \frac{1}{P_A(y; t)}\left(\sum_{k=1}^{l} c_k(y, z - I_k^z)P_A(y, z - I_k^z; t) - c_k(y, z)P_A(y, z; t)\right) \qquad (4.23)$$

$$= \sum_{k=1}^{l} c_k(y, z - I_k^z)P_A(z - I_k^z\,|\,y; t) - c_k(y, z)P_A(z\,|\,y; t) \qquad (4.24)$$

which is the desired closed-form expression for the conditional density PA(z|y; t).

4.1.3 The Combined System

For the slow reactions, we approximate the joint density P (y, z; t) as

$$P(y, z; t) \approx P_A(z\,|\,y; t)P(y; t) \qquad (4.25)$$

Combining the evolution equations for the slow and fast reaction extents, i.e. equations (4.14) and (4.24) respectively, then yields the following coupled master equations

$$\frac{dP(y; t)}{dt} \approx \sum_{j=1}^{m-l}\left(\sum_z b_j(y - I_j^y, z)P_A(z\,|\,y - I_j^y; t)\right)P(y - I_j^y; t) - \left(\sum_z b_j(y, z)P_A(z\,|\,y; t)\right)P(y; t) \qquad (4.26a)$$

$$\frac{dP_A(z\,|\,y; t)}{dt} = \sum_{k=1}^{l} c_k(y, z - I_k^z)P_A(z - I_k^z\,|\,y; t) - c_k(y, z)P_A(z\,|\,y; t) \qquad (4.26b)$$

From these equations, using order of magnitude arguments to produce a time-scale separation has clearly had two effects: first, the coupled expressions for the marginal and conditional evolution equations in (4.26) are Markov in nature; and second, the evolution equation for the fast extents conditioned on the slow extents, PA(z|y), has decoupled from the slow extent marginal, P(y). Additionally, exact solution of the coupled master equations (4.26) is at least as difficult as the original master equation (4.2) due to the fact that one must solve an individual master equation of the form of equation (4.26b) for every element of the slow conditional equation (4.26a). From a simulation perspective, equation (4.26) is also as difficult to evaluate as the original master equation (4.2) since both of the coupled master equations are discrete and time-varying. However, approximating the fast extents can significantly reduce the computational expense involved with simulating these coupled equations. Different approximations are applicable based on the characteristic relaxation times of the fast and slow extents. Next, we investigate two such approximations: an equilibrium approximation for the case in which the fast extents relax significantly faster than the slow extents, and a Langevin or deterministic approximation for the case in which both fast and slow extents relax at similar rates.

4.1.4 The Equilibrium Approximation

We first consider the case in which the relaxation time for the fast extents is significantly smaller than the expected time to the first slow reaction. To illustrate this case, we consider the simple example

$$A \underset{k_2}{\overset{k_1}{\rightleftharpoons}} B \xrightarrow{k_3} C \qquad (4.27)$$

We denote the extents of reaction for this example as ε1, ε2, and ε3, and define the reaction propensities as

$$a_1(x) = k_1 n_A \qquad (4.28a)$$

$$a_2(x) = k_2 n_B \qquad (4.28b)$$

$$a_3(x) = k_3 n_B \qquad (4.28c)$$

If k1, k2 ≫ k3, then we can partition ε1 and ε2 as the fast extents z, and ε3 as the slow extent of reaction y. Additionally, we would expect the fast extents of reaction to equilibrate (relax) before the expected time to the first slow reaction. Returning to the master equation formalism, this equilibration implies that we should approximate the fast reactions, equation (4.26b), as

$$0 \approx \sum_{k=1}^{l} c_k(y, z - I_k^z)P_A(z - I_k^z\,|\,y; t) - c_k(y, z)P_A(z\,|\,y; t) \qquad (4.29)$$

The resulting coupled master equations are

$$\frac{dP(y; t)}{dt} \approx \sum_{j=1}^{m-l}\left(\sum_z b_j(y - I_j^y, z)P_A(z\,|\,y - I_j^y; t)\right)P(y - I_j^y; t) - \left(\sum_z b_j(y, z)P_A(z\,|\,y; t)\right)P(y; t) \qquad (4.30a)$$

$$0 = \sum_{k=1}^{l} c_k(y, z - I_k^z)P_A(z - I_k^z\,|\,y; t) - c_k(y, z)P_A(z\,|\,y; t) \qquad (4.30b)$$

This coupled system, equation (4.30), is markedly similar to the governing equations for the slow-scale simulation recently proposed by Cao, Gillespie, and Petzold [16]. Their derivation deviates from ours, however, and the differences deserve some attention. First, Cao, Gillespie, and Petzold [16] partition on the basis of fast and slow species rather than extents, with fast species affected by at least one fast reaction and slow species affected by solely slow reactions. We have chosen to remain in the extent space because extents are equilibrating, not chemical species. Also, Cao, Gillespie, and Petzold [16] use the construct of a virtual fast system to arrive at an evolution equation for the slow species (similar to our evolution equation for the slow extent marginal, equation (4.14)), a choice that obviates the need for defining an evolution equation for the conditional density P(z|y). In contrast to this approach, we believe that our approach has a much tighter connection to the original master equation because we derived the coupled system, equation (4.30), directly from the original master equation, and because we can obtain an approximate value of the joint density P(y, z; t) through equation (4.25). Also, all approximations arise directly from order of magnitude and relaxation time arguments.

4.1.5 The Langevin and Deterministic Approximations

We now consider the case in which both fast and slow extents relax on similar time scales. Revisiting the reaction example (4.27), we consider the case in which k1 ≫ k2, k3 and nAo ≫

l l l 2 ∂PA(z|y; t) X ∂ 1 X X ∂ = − (A (y, z)P (z|y; t))+ B (y, z)2P (z|y; t) (4.31) ∂t ∂z i A 2 ∂z ∂z ij A i=1 i i=1 j=1 i j in which (noting that z consists of extents of reaction):

l T X z h i A(y, z) = Ii ci(y, z) = c1(y, z) c2(y, z) ··· cl(y, z) (4.32) i=1 l 2 X z z T [B(y, z)] = Ii (Ii ) ci(y, z) = diag (c1(y, z), c2(y, z), . . . , cl(y, z)) (4.33) i=1 Here, diag(a, . . . , z) defines a matrix with elements a, . . . , z on the diagonal. Equation (4.31) has Itoˆ solution of the form l X dzi = Ai(y, z)dt + Bij(y, z)dW j ∀1 ≤ i ≤ l (4.34a) j=1 p = ci(y, z)dt + ci(y, z)dW i ∀1 ≤ i ≤ l (4.34b) in which W is a vector of Wiener processes. Equation (4.34) is the chemical Langevin equation, whose formulation was recently readdressed by Gillespie [49]. Note the difference between equations (4.31) and (4.34). The Fokker-Planck equation (4.31) specifies the distribution of the stochastic process, whereas the stochastic differential equation (4.34) specifies how the trajectories of the state evolve. Also, bear in mind that whether or not a given Ω is large 43 enough to permit truncation of the system size expansion is relative. In this case, Ω is of sufficient magnitude to make this approximation valid for only a subset of the reactions, not the entire system. Combining the evolution equations for the slow and fast reaction extents, i.e. equa- tions (4.26a) and (4.31) respectively, the problem of interest is the coupled set of master equa- tions

Combining the evolution equations for the slow and fast reaction extents, i.e. equations (4.26a) and (4.31) respectively, the problem of interest is the coupled set of master equations

$$\frac{dP(y; t)}{dt} \approx \sum_{k=1}^{m-l}\left(\int_z b_k(y - I_k^y, z')P_A(z'\,|\,y - I_k^y; t)\,dz'\right)P(y - I_k^y; t) - \left(\int_z b_k(y, z')P_A(z'\,|\,y; t)\,dz'\right)P(y; t) \qquad (4.35a)$$

$$\frac{\partial P_A(z\,|\,y; t)}{\partial t} = -\sum_{i=1}^{l}\frac{\partial}{\partial z_i}\Big(A_i(y, z)P_A(z\,|\,y; t)\Big) + \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\frac{\partial^2}{\partial z_i \partial z_j}\Big([B(y, z)]^2_{ij}P_A(z\,|\,y; t)\Big) \qquad (4.35b)$$

If we can solve these equations simultaneously, then we in fact have an approximate solution to the original master equation (4.6) due to the definition of the conditional density given by equation (4.25). Note that the solution is approximate due to the fact that we have used the Fokker-Planck approximation for the master equation of the fast reactions.

In the thermodynamic limit (z → ∞, Ω → ∞, z/Ω finite), the intensive variables for the fast subset of reactions evolve deterministically [76]. Accordingly, we propose further approximating the Langevin equation (4.34) as

$$dz_i = c_i(y, z)\,dt \quad \forall\, 1 \le i \le l \qquad (4.36)$$

In this case, the coupled master equations (4.35) reduce to

$$\frac{dP(y; t)}{dt} \approx \sum_{k=1}^{m-l} b_k(y - I_k^y, z(t))P(y - I_k^y; t) - b_k(y, z(t))P(y; t) \qquad (4.37a)$$

$$dz_i = c_i(y, z)\,dt \quad \forall\, 1 \le i \le l \qquad (4.37b)$$

in which z(t) is the solution to the differential equation (4.36). The benefit of this assumption is that equation (4.36) can be solved rigorously using an ODE solver. Unfortunately for physical systems, the thermodynamic limit is obviously unattainable. However, knowledge of the modeled system can lead to this simplification. If the magnitude of the fluctuations in this term is small compared to the sensitivity of ci(y, z) to the subset y, then equation (4.36) is a valid approximation. This approximation is also valid if one is primarily concerned with the fluctuations in the small-numbered species as opposed to the large-numbered species, assuming that the extents approximated by equation (4.36) predominantly affect the population size of large-numbered species.

4.2 Numerical Implementation of the Approximations

We now outline procedures for implementing the equilibrium, Langevin, and deterministic approximations presented in the previous section. We propose using simulation to reconstruct moments of the underlying master equation. For the slow reactions, Gillespie [47] outlines a general method for exact stochastic simulation that is applicable to the desired problem, equation (4.26a). This method examines the joint probability function, P(τ, µ), that governs when the next reaction occurs, and which reaction occurs. We present a brief derivation of this function.

We proceed by noting that the key probabilistic questions are: when will the next reaction occur, and which reaction will it be [45]? To this end, we define

$$b_\mu(y, z; t)\,dt = \begin{cases} \displaystyle\sum_z b_\mu(y, z)P_A(z\,|\,y; t)\,dt & \text{equilibrium approximation} \\[2ex] \displaystyle\int_z b_\mu(y, z')P_A(z'\,|\,y; t)\,dz'\,dt & \text{Langevin or deterministic approximation} \end{cases} \qquad (4.38)$$

in which bµ(y, z; t)dt is the probability (first order in dt) that reaction µ occurs in the next time interval dt. We express the joint probability P(τ, µ)dτ as the product of the independent probabilities

$$P(\tau, \mu)\,d\tau = P_0(\tau)P(\mu)\,d\tau \qquad (4.39)$$

in which

• P0(τ) is the probability that no reaction occurs within [t, t + τ), and

• P (µ)dτ is the probability that reaction µ takes place within [t + τ, t + τ + dτ).

To determine P0(τ), consider the change in this probability over the differential increment in time dt, assuming that probabilities are independent over disjoint periods of time [68]:

$$P_0(\tau + dt) = P_0(\tau)\left(1 - \sum_{j=1}^{m-l} b_j(y, z; t + \tau)\,dt\right) \qquad (4.40a)$$

$$= P_0(\tau)\left(1 - r_{tot}^y(t + \tau)\,dt\right) \qquad (4.40b)$$

Here, $r_{tot}^y(t)$ is the sum of reaction rates for subset y at time t. Rearranging equation (4.40a) and taking the limit as dt → 0 yields the differential equation

$$\frac{dP_0(\tau)}{d\tau} = -r_{tot}^y(t + \tau)P_0(\tau) \qquad (4.41)$$

which has solution

$$P_0(\tau) = \exp\left(-\int_t^{t+\tau} r_{tot}^y(t')\,dt'\right) \qquad (4.42)$$

The joint probability function P(τ, µ) is therefore:

$$P(\tau, \mu) = b_\mu(y, z; t + \tau)\exp\left(-\int_t^{t+\tau} r_{tot}^y(t')\,dt'\right) \qquad (4.43)$$

We now address our key questions by conditioning the joint probability function P (τ, µ):

$$P(\tau, \mu) = P(\mu\,|\,\tau)P(\tau) \qquad (4.44)$$

in which P(τ) is the probability that a reaction occurs in the differential instant after time t + τ, and P(µ|τ) is the probability that this reaction will be µ. First note that by definition:

$$P(\tau) = \sum_{\mu=1}^{m-l} P(\tau, \mu) \qquad (4.45)$$

Implicit in this equation is the assumption that a reaction occurs, and hence the probability of not having a reaction is zero. Then by rearranging equation (4.44) and incorporating (4.45), it can be deduced that:

$$P(\mu\,|\,\tau) = \frac{P(\tau, \mu)}{\sum_{\mu=1}^{m-l} P(\tau, \mu)} \qquad (4.46)$$

Equation (4.46) can be solved exactly by employing equation (4.43) to yield:

$$P(\mu\,|\,\tau) = \frac{b_\mu(y, z; t + \tau)}{\sum_{j=1}^{m-l} b_j(y, z; t + \tau)} \qquad (4.47)$$

We then solve equation (4.45) by employing equation (4.43):

$$P(\tau) = \left(\sum_{j=1}^{m-l} b_j(y, z; t + \tau)\right)\exp\left(-\int_t^{t+\tau} r_{tot}^y(t')\,dt'\right) \qquad (4.48a)$$

$$= r_{tot}^y(t + \tau)\exp\left(-\int_t^{t+\tau} r_{tot}^y(t')\,dt'\right) \qquad (4.48b)$$

Using Monte Carlo simulation, we obtain realizations of the desired joint probability function P(τ, µ) by randomly selecting τ and µ from the probability densities defined by equations (4.48b) and (4.47). Such a method is the equivalent of the direct method for hybrid systems. Given two random numbers p1 and p2 uniformly distributed on (0, 1), τ and µ are constrained accordingly:

$$\int_t^{t+\tau} r_{tot}^y(t')\,dt' + \log(p_1) = 0 \qquad (4.49a)$$

$$\sum_{k=1}^{\mu-1} b_k(y, z; t + \tau) < p_2\,r_{tot}^y(t + \tau) \le \sum_{k=1}^{\mu} b_k(y, z; t + \tau) \qquad (4.49b)$$

Simulating the different approximations requires slightly different algorithms, which we address next.

4.2.1 Simulating the Equilibrium Approximation

We first address the equilibrium approximation. For this case,

$$b_j(y, z; t) = \sum_z b_j(y, z)P_A(z\,|\,y; t) \quad \forall\, 1 \le j \le m - l \qquad (4.50)$$

Additionally, the quantities bj(y, z; t) are actually time invariant between slow reactions. Thus, the integral constraint (4.49a) reduces to the algebraic relation

$$\tau = -\frac{\log(p_1)}{r_{tot}^y(t)} \qquad (4.51)$$

Algorithm 3 Exact solution of the partitioned stochastic system for the equilibrium approximation.
Off-line. Partition the set x of m extents of reaction into fast and slow extents. Determine the partitioned stoichiometric matrices (the (m − l × p)-matrix ν^y and the (l × p)-matrix ν^z) and the reaction propensity laws (the ak(y, z)'s). Also, choose a strategy for solving the distribution PA(z|y) given by equation (4.30) for the fast reactions in the partitioned case.
Initialize. Set the time, t, equal to zero. Set the number of species n to n0.

1. Solve for the distribution PA(z|y), denoting all possible combinations of z as (z(0), ..., z(t)). Record the initial value of z as z(i).

2. For subset y, calculate

(a) the reaction propensities, $b_j(y, z; t) = \sum_z b_j(y, z)P_A(z\,|\,y)$ ∀ j = 1, ..., m − l, and

(b) the total reaction propensity, $r_{tot}^y = \sum_{j=1}^{m-l} b_j(y, z; t)$.

3. Select three random numbers p1, p2, and p3 from the uniform distribution (0, 1).

4. Choose z(j) from the distribution PA(z|y) such that

$$\sum_{k=1}^{j-1} P_A(z(k)\,|\,y) < p_1 \le \sum_{k=1}^{j} P_A(z(k)\,|\,y)$$

Set ν̂z = z(j) − z(i).

5. Let τ = −log(p2)/r^y_tot. Choose j such that

$$\sum_{k=1}^{j-1} b_k(y, z; t) < p_3\,r_{tot}^y \le \sum_{k=1}^{j} b_k(y, z; t)$$

6. Let t ← t + τ. Let $n \leftarrow n + \nu_j^{yT} + \hat{\nu}_z$, where $\nu_j^y$ is the jth row of ν^y. Go to step 1.

Algorithm 3 presents one method of solving this system. Note that we could draw a sample from the equilibrium distribution PA(z|y) at any time to determine a current value of the state, which may be desirable for sampling the system at uniform time increments. Also, this algorithm is very similar to the slow-scale stochastic simulation algorithm proposed by Cao, Gillespie, and Petzold [16], with the exception that our algorithm partitions extents as opposed to species.

Solution of the equilibrated density PA(z|y) deserves some further attention. If we stack the probabilities for all possible values of the fast extents into a vector P, we can recast the continuous-time master equation as a vector-matrix problem, i.e.

$$\frac{dP}{dt} = AP \approx 0 \quad \text{(equilibrium assumption)} \qquad (4.52)$$

in which A is the matrix of reaction propensities. The equilibrium distribution is then the null space of the matrix A, which we can compute numerically. In general, we expect A to be a sparse matrix. Consequently, we can efficiently solve the linear system (4.52) for P using Krylov iterative methods [153] such as the biconjugate gradient stabilized method. Cao, Gillespie, and Petzold [16] outline some alternative, approximate methods for evaluating this equilibrated density.
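As a sketch of this computation (Python with NumPy/SciPy; the two-state fast subsystem and its rates are our own illustrative assumptions, and for brevity we extract the null space by a dense eigendecomposition rather than the Krylov iteration suggested above):

```python
import numpy as np
from scipy import sparse

# Assumed fast subsystem: a single reversible reaction hopping between two
# states of z, with propensities c_f (forward) and c_r (reverse).
c_f, c_r = 10.0, 4.0
A = sparse.csr_matrix([[-c_f,  c_r],
                       [ c_f, -c_r]])

# Equilibrium distribution: the null space of A, normalized to sum to one.
w, v = np.linalg.eig(A.toarray())
p = np.real(v[:, np.argmin(np.abs(w))])
p /= p.sum()
print(p)   # -> [c_r, c_f] / (c_f + c_r) = [2/7, 5/7]
```

For large fast subsystems, one would keep A sparse and replace the dense eigendecomposition with an iterative solver, as the text suggests.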

4.2.2 Simulating the Langevin and Deterministic Approximations: Exact Next Reaction Time

We now address methods for simulating the Langevin and deterministic approximations. These approximations have time-varying reaction propensities, so we must satisfy equation (4.49a) by integrating $r^y_{tot}$ and the fast subset of reactions z forward in time until the following condition is met:

$$\int_t^{t+\tau} r^y_{tot}(t')\, dt' + \log(p_1) = 0 \qquad (4.53)$$

$$r^y_{tot}(t) = \sum_{j=l+1}^{m} \bar{b}_j(y, z; t) \qquad (4.54)$$

$$\bar{b}_j(y, z; t) = \int_z b_j(y, z')\, P_A(z'|y; t)\, dz' \qquad \forall\, 1 \le j \le m-l \qquad (4.55)$$

For the Langevin approximation, we propose reconstructing the density P_A(z|y; t) by simulating the stochastic differential equation (4.34) (also known as the Langevin equation). In this case, equation (4.55) becomes

$$\bar{b}_j(y, z; t) \approx \frac{1}{N} \sum_{k=1}^{N} b_j(y, z^k) \qquad \forall\, 1 \le j \le m-l \qquad (4.56)$$

in which $z^k$ is the kth of N simulations of equation (4.34). For the deterministic approximation, equation (4.37) indicates that we need only solve for the deterministic evolution of the fast extents. We propose using algorithm 4 to solve this partitioned reaction system, in which we

Algorithm 4 Exact solution of the partitioned stochastic system for the Langevin and deterministic approximations.
Off-line. Determine the criteria for when and how the set x of m extents of reaction should be partitioned. Determine the stoichiometric matrices of the form given in equation (4.1) and the reaction propensity laws for the unpartitioned case (the m × p matrix ν and the a_k(x)'s) and the partitioned case (the (m-l) × p matrix ν^y, the l × p matrix ν^z, and the a^y_k(y, z)'s). Also, determine the necessary Langevin or deterministic equations for the fast reactions in the partitioned case.
Initialize. Set the time, t, equal to zero. Set the number of species n to n_0.

1. If the partitioning criteria established off-line are met, go to step 5.

2. Calculate

   (a) the reaction propensities, $r_k = a_k(x)$, and

   (b) the total reaction propensity, $r_{tot} = \sum_{k=1}^{m} r_k$.

3. Select two random numbers p_1, p_2 from the uniform distribution (0, 1). Let $\tau = -\log(p_1)/r_{tot}$. Choose j such that

$$\sum_{k=1}^{j-1} r_k < p_2\, r_{tot} \le \sum_{k=1}^{j} r_k$$

4. Let t ← t + τ. Let $n \leftarrow n + \nu_j^T$, where $\nu_j$ is the jth row of ν. Go to step 1.

5. For subset y, calculate

   (a) the reaction propensities, $r^y_k = \bar{b}_k(y, z)$, and

   (b) the total reaction propensity, $r^y_{tot} = \sum_{k=1}^{m-l} r^y_k$.

6. Select two random numbers p_1, p_2 from the uniform distribution (0, 1).

7. Determine $\hat{\nu}_z = (\nu^z)^T [z(t+\tau) - z(t)]$ by integrating $r^y_{tot}(t)$ and the subset of fast reactions z until the following condition is met:

$$\int_t^{t+\tau} r^y_{tot}(t')\, dt' + \log(p_1) = 0 \quad \text{s.t.:}\quad r^y_{tot}(t) = \sum_{k=1}^{m-l} a^y_k(y, z; t)$$

8. Let t ← t + τ. Let $n \leftarrow n + \hat{\nu}_z$.

9. Choose j such that

$$\sum_{k=1}^{j-1} r^y_k < p_2\, r^y_{tot}(t) \le \sum_{k=1}^{j} r^y_k$$

Current values of the $r^y_k$'s and $r^y_{tot}$ should be available from step 7.

10. Let $n \leftarrow n + (\nu^y_j)^T$, where $\nu^y_j$ is the jth row of $\nu^y$. Go to step 1.

choose only to use one simulation to evaluate equation (4.56) for the Langevin case. Using more than one simulation is also possible. Over the time interval τ, implementation of this algorithm actually enforces the more stringent requirement that

$$\frac{dP(y)}{dt} = 0 \qquad (4.57)$$

Hence equation (4.22) is exact, not approximate.

4.2.3 Simulating the Langevin and Deterministic Approximations: Approximate Next Reaction Time

One major difficulty in this method is satisfying the constraint

$$\int_t^{t+\tau} r^y_{tot}(t')\, dt' + \log(p_1) = 0 \qquad (4.58)$$

in step 7 of algorithm 4, as opposed to the simple algebraic relation for τ used in the unmodified Gillespie algorithm (i.e. step 3 of algorithm 4). Satisfying this constraint can prove computationally expensive. If the reaction propensities for the fast subset of extents z change insignificantly over the stochastic time step τ, the unmodified Gillespie algorithm can still provide an approximate solution. When the reaction propensities change significantly over τ, steps can be taken to reduce the error of the Gillespie algorithm. One idea is to scale the stochastic time step τ by artificially introducing a probability of no reaction into the system:

• Let a0dt be the contrived probability, first order in dt, that no reaction occurs in the next time interval dt.

This probability does not affect the number of molecules of the modeled reaction system, while allowing adjustment of the stochastic time step by changing the magnitude of a_0. Theoretically, as the magnitude of a_0 becomes infinite, the total reaction rate becomes infinite. As the total reaction rate approaches infinity, the error of the stochastic simulation subject to constraints approaches zero, because the algorithm checks whether or not a reaction occurs at every instant. Even though the method outlined by Gillespie [47] and Jansen [68] is "exact", for this case there is still error associated with (1) the number of simulations performed, since it is a Monte Carlo method, and (2) integration of the Langevin equations for the fast extents of reaction. Thus it is plausible that these errors may be greater than the error introduced by the approximation. Hence our approximation may often prove to be less computationally expensive than the exact simulation while generating an acceptable amount of simulation error. The approximation replaces steps 5-10 of algorithm 4 with those given by algorithm 5.

Algorithm 5 Approximate solution of the partitioned stochastic system.

5. For subset y, calculate

   (a) the reaction propensities, $r^y_k = \bar{b}_k(y, z)$, and

   (b) the total reaction propensity, $r^y_{tot} = \sum_{k=0}^{m-l} r^y_k$, in which $r^y_0 = a_0$ is the propensity of no reaction.

6. Select two random numbers p1, p2 from the uniform distribution (0, 1).

7. Let $\tau = -\log(p_1)/r^y_{tot}$. Integrate subset z over the range [t, t+τ) to determine $\hat{\nu}_z = (\nu^z)^T [z(t+\tau) - z(t)]$. Let t ← t + τ. Let $n \leftarrow n + \hat{\nu}_z$.

8. Recalculate the reaction propensities $r^y_k$ and the total reaction propensity $r^y_{tot}(t)$. Choose j such that

$$\sum_{k=0}^{j-1} r^y_k < p_2\, r^y_{tot}(t) \le \sum_{k=0}^{j} r^y_k$$

9. Let $n \leftarrow n + (\nu^y_j)^T$, where $\nu^y_j$ is the jth row of $\nu^y$ (if j = 0, no reaction fires and n is unchanged). Go to step 1.
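To make the bookkeeping concrete, here is a minimal sketch (Python with NumPy/SciPy) of steps 5-9 for a system with one fast reaction treated deterministically and one slow reaction, using the simple crystallization system of section 4.4.2 (parameters from Table 4.2) with a_0 = 10. The single-slow-reaction loop and the helper names are our own simplifications.

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)

# Simple crystallization: 2A -> B (fast, deterministic), A + C -> D (slow).
k1, k2, a0 = 1e-7, 1e-7, 10.0            # rate constants; a0 = no-reaction propensity
n = np.array([1e6, 0.0, 10.0, 0.0])      # state [A, B, C, D]

def fast_rhs(t, n):
    """Deterministic evolution of the fast subset (2A -> B)."""
    r1 = 0.5 * k1 * n[0] * (n[0] - 1.0)
    return [-2.0 * r1, r1, 0.0, 0.0]

t, t_end = 0.0, 10.0
while t < t_end:
    # Steps 5-7: time step from the slow propensity augmented with a0.
    r_slow = k2 * n[0] * n[2]            # propensity of A + C -> D
    p1, p2 = rng.uniform(size=2)
    tau = -np.log(p1) / (r_slow + a0)
    n = solve_ivp(fast_rhs, (t, t + tau), n, rtol=1e-8).y[:, -1]
    t += tau
    # Step 8: recalculate the slow propensity and decide which event fired.
    r_slow = k2 * n[0] * n[2]
    if p2 * (r_slow + a0) <= r_slow:     # otherwise j = 0: no reaction
        n += np.array([-1.0, 0.0, -1.0, 1.0])
```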

4.3 Practical Implementation

Partitioning of the state x into "fast" and "slow" extents should be intuitive. We recommend maintaining at least two orders of magnitude difference between the values of the partitioned reaction propensities. It may also be helpful to generate results for a full stochastic simulation, and then identify which reactions are bottlenecks (i.e. the ones occurring most frequently). Note that there may exist several regimes that require different partitioning of the state. Also, care should be exercised to maintain the validity of the order-of-magnitude partition between y and z. It is obviously undesirable for "slow" reaction extents to become the same order of magnitude as the "fast" extents during the time increment τ. Finally, nothing precludes one from invoking the equilibrium approximation for one subset of fast reactions, and the deterministic or Langevin approximation for another subset of reactions. For notational simplicity, we did not carry out such an analysis.

4.4 Examples

We now consider three motivating examples that illustrate the accuracy of the approximations. For clarity, we first briefly review the nomenclature that indicates which approximations, if any, are performed in a given simulation.

Parameter                        Symbol   Value
reaction propensity 4.59a        a1(x)    k1 nE nS
reaction propensity 4.59b        a2(x)    k2 nES
reaction propensity 4.59c        a3(x)    k3 nES
reaction 4.59a rate constant     k1       20.
reaction 4.59b rate constant     k2       200.
reaction 4.59c rate constant     k3       1.
initial number of E molecules    nE0      20
initial number of S molecules    nS0      10
initial number of ES molecules   nES0     0
initial number of P molecules    nP0      0

Table 4.1: Model parameters and reaction extents for the enzyme kinetics example

We can either perform a purely stochastic simulation on the unpartitioned reaction system, or we can partition the system into "fast" and "slow" reactions. For this partitioned case, a stochastic-equilibrium simulation equilibrates the fast reactions, a stochastic-Langevin simulation treats the fast reactions as Langevin equations, and a stochastic-deterministic simulation treats the fast reactions deterministically. We can then simulate this partitioned reaction system by exact simulation, in which the next reaction time exactly accounts for the time dependence of the "fast" reactions upon the "slow" reactions; or by approximate simulation, which neglects this time dependence but scales the next reaction time with a propensity of no reaction. For comparison to other approximate techniques, we simulate the simple crystallization example using implicit tau leaping. In contrast to the partitioning techniques proposed here, tau leaping approximates the number of times every reaction fires in a fixed time interval using a rate-dependent Poisson distribution. The details of this method are presented in Chapter 2.

4.4.1 Enzyme Kinetics

We consider the simple enzyme kinetics problem

$$E + S \xrightarrow{k_1} ES \qquad \varepsilon_1 \qquad (4.59a)$$

$$ES \xrightarrow{k_2} E + S \qquad \varepsilon_2 \qquad (4.59b)$$

$$ES \xrightarrow{k_3} E + P \qquad \varepsilon_3 \qquad (4.59c)$$

The model parameters and the reaction extents are given in Table 4.1. For this example, the first and second reactions equilibrate before the expected time of one occurrence of the third reaction. Hence we partition the extents of reaction (the ε_i's) as follows:

• ε_3 comprises the subset of slow reactions y, and

• ε_1 and ε_2 comprise the subset of fast reactions z.
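For reference, the exact (unpartitioned) stochastic simulations used as the comparison baseline throughout this section follow Gillespie's direct method. A minimal sketch for this system (Python with NumPy, parameters from Table 4.1; the state ordering and function names are our own):

```python
import numpy as np

rng = np.random.default_rng(1)

# Enzyme kinetics (4.59) by the direct method; state n = [E, S, ES, P].
k1, k2, k3 = 20.0, 200.0, 1.0
nu = np.array([[-1, -1, +1, 0],     # E + S -> ES
               [+1, +1, -1, 0],     # ES -> E + S
               [+1, 0, -1, +1]])    # ES -> E + P

def propensities(n):
    return np.array([k1 * n[0] * n[1], k2 * n[2], k3 * n[2]])

def ssa(t_end=10.0):
    n = np.array([20, 10, 0, 0])
    t = 0.0
    while t < t_end:
        a = propensities(n)
        a_tot = a.sum()
        if a_tot == 0.0:
            break                                      # no reaction can fire
        p1, p2 = rng.uniform(size=2)
        t += -np.log(p1) / a_tot                       # time to the next reaction
        j = np.searchsorted(np.cumsum(a), p2 * a_tot)  # which reaction fires
        n = n + nu[j]
    return n

print(ssa())
```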

Figure 4.1: Comparison of the stochastic-equilibrium simulation (dashed lines) to exact stochastic simulation (solid lines) based on 50 simulations. (Plot of the number of molecules of E, S, ES, and P versus time.)

We calculate the averages of all species using fifty simulations sampled at a time interval of 0.1 units. We use both the stochastic-equilibrium and exact simulations to compute these averages. For the stochastic-equilibrium simulation, solving for the equilibrium distribution in equation (4.52) is easiest if one treats the fast reactions ε_1 and ε_2 as one extent. Figure 4.1 presents the results of the comparison. The stochastic-equilibrium simulation provides an excellent reconstruction of the mean behavior. The exact simulation requires roughly twenty-three times the computational expense of the stochastic-equilibrium simulation. We refer the interested reader to Cao, Gillespie, and Petzold [16] for additional examples and discussion on the equilibrium approximation. While their derivation of the equilibrium approximation differs from ours, their simulation algorithm is very similar to our algorithm 3.

4.4.2 Simple Crystallization

Consider a simplified reaction system for the crystallization of species A:

$$2A \xrightarrow{k_1} B \qquad \varepsilon_1 \qquad (4.60a)$$

$$A + C \xrightarrow{k_2} D \qquad \varepsilon_2 \qquad (4.60b)$$

The model parameters and the reaction extents are given in Table 4.2. For this example, the first reaction occurs many more times than the second reaction. Hence we partition the extents of reaction (the ε_i's) as follows:

• ε_2 comprises the subset of slow reactions y, and

• ε_1 comprises the subset of fast reactions z.

(Reactions are partitioned on the basis of the magnitude of their extents, not their rate constants.)

Parameter                        Symbol   Value
reaction propensity 4.60a        a1(x)    (1/2) k1 nA (nA - 1)
reaction propensity 4.60b        a2(x)    k2 nA nC
reaction 4.60a rate constant     k1       1 × 10^-7
reaction 4.60b rate constant     k2       1 × 10^-7
initial number of A molecules    nA0      1 × 10^6
initial number of B molecules    nB0      0
initial number of C molecules    nC0      10
initial number of D molecules    nD0      0

Table 4.2: Model parameters and reaction extents for the simple crystallization example


We first integrate the system using the implicit tau leap method [117]. We choose a time step of 0.2, and generate Poisson random numbers using code from Numerical Recipes in C [104]. Figure 4.2 demonstrates that this approximation adequately reconstructs the mean and standard deviation for all species.

We next perform an approximate stochastic-Langevin simulation. Here we approximate the fast reaction subset using the Langevin approximation and attempt to reconstruct the first two moments of each species. The Langevin equations are integrated using the Euler-Maruyama method [40] with a time increment of 0.01. We account for the time-varying propensity of the slow reaction by employing the approximate scheme, setting the propensity of no reaction (a_0) to 10. Figure 4.3 compares these results to the exact stochastic results for ten thousand simulations. The approximation accurately reconstructs the mean and standard deviation for all species.

Next, we approximate the fast reaction subset deterministically and attempt to reconstruct the first two moments of each species based upon ten thousand simulations. For this case, we consider both the exact and approximate stochastic-deterministic simulations. Figure 4.4 compares the results of exact stochastic simulation to the exact stochastic-deterministic solution. This approximation does an excellent job of reconstructing all of the means as well as the standard deviations for species C and D. However, we are not able to reconstruct the standard deviations for species A and B. This phenomenon is expected because, by approximating ε_1 deterministically, we neglect all fluctuations caused by the first reaction.

Figure 4.5 compares the results of exact stochastic simulation to the approximate stochastic-deterministic solution given a small value for the propensity of no reaction, a_0. For this value of a_0, the approximation accurately reconstructs the means of species A and B, but fails to reconstruct the moments of species C and D as well as the standard deviations of species A and B. This phenomenon indicates that the value of a_0 is too small. By examining the cumulative squared error, however, Figure 4.6 demonstrates that increasing the value of a_0 results in comparable error for the approximate and exact stochastic-deterministic simulations. Here, the least squares error is based on the deviation of the species C trajectories between the approximation techniques and the exact stochastic simulation.


Figure 4.2: Comparison of approximate tau-leap simulation (points) to exact stochastic simulation (lines) based on 10,000 simulations and a time step of 0.2. (a) Comparison of the mean for species A and B. (b) Comparison of the standard deviations for species A and B. (c) Comparison of the mean (C) and standard deviation (±σ) for species C. (d) Comparison of the mean (D) and standard deviation (±σ) for species D.

Table 4.3 compares the order of magnitude of the limiting time step for the different methods in this example. The major improvement in the approximate methods is that the time step is now limited by the "slow" reaction time as opposed to the "fast" reaction time. Note that the solution methods for the partitioned reaction system require more computational expense per limiting time step than the exact stochastic solution method. However, we still observed an order of magnitude improvement in computational expense by employing the approximate solution methods. Also, the results indicate that the tau leap method is the fastest approximation. This result is a little misleading because we employed an implicit first-order method for tau leaping, whereas we integrated the deterministic equations using stiff predictor-corrector methods. For a comparison using the same order of method, we expect the stochastic-deterministic simulation to yield slightly faster results than tau leaping because the former method does not draw any Poisson random variables.

− 10 600

10 (a) (b)

× 8 500 400 6 A B 300 4 200 B A 2 100 Number of Molecules 0 0 Number of Molecules ( 200 40 8060 100 200 40 8060 100 Time Time 10 10 (c) +σ 8 8 D −σ 6 6

4 4 +σ 2 C 2

Number of Molecules −σ Number of Molecules (d) 0 0 0 10 3020 40 9080706050 100 0 10 3020 40 9080706050 100 Time Time

Figure 4.3: Comparison of approximate stochastic-Langevin simulation (points) to exact stochastic simulation (lines) based on 10,000 simulations, propensity of no reaction a_0 = 10, and Langevin time step of 0.01. (a) Comparison of the mean for species A and B. (b) Comparison of the standard deviations for species A and B. (c) Comparison of the mean (C) and standard deviation (±σ) for species C. (d) Comparison of the mean (D) and standard deviation (±σ) for species D.


Figure 4.4: Comparison of exact stochastic-deterministic simulation (points) to exact stochastic simulation (lines) based on 10,000 simulations. (a) Comparison of the mean for species A and B. (b) Comparison of the standard deviations for species A and B. (c) Comparison of the mean (C) and standard deviation (±σ) for species C. (d) Comparison of the mean (D) and standard deviation (±σ) for species D.

Solution Method            System Type     Limiting Time Step                          O(Time Step)   Relative CPU Time
Exact Stochastic           unpartitioned   fast reaction time                          O(10^-5)       12.3
Tau Leap                   unpartitioned   slow reaction time                          O(0.25)        1.00
Stochastic-Langevin        partitioned     slow reaction time (Langevin integration)   O(0.01)        1.31
Stochastic-Deterministic   partitioned     slow reaction time (ODE solver)             O(1)           1.40

Table 4.3: Comparison of time steps for the simple crystallization example


Figure 4.5: Comparison of approximate stochastic-deterministic simulation (points) to exact stochastic simulation (lines) based on 10,000 simulations and propensity of no reaction a_0 = 0.01. (a) Comparison of the mean for species A and B. (b) Comparison of the standard deviations for species A and B. (c) Comparison of the mean (C) and standard deviation (±σ) for species C. (d) Comparison of the mean (D) and standard deviation (±σ) for species D.


Figure 4.6: Squared error trends for the exact and approximate stochastic-deterministic simulations based on 10,000 simulations. The squared error is calculated from the deviation of the moments for species C between the approximation techniques and the exact stochastic simulation. (a) Plot of the error in the mean of species C. (b) Plot of the error in the standard deviation of species C.

Parameter                              Symbol      Value
reaction propensity 4.61a              a1(x)       k1 (template)
reaction propensity 4.61b              a2(x)       k2 (genome)
reaction propensity 4.61c              a3(x)       k3 (template)
reaction propensity 4.61d              a4(x)       k4 (template)
reaction propensity 4.61e              a5(x)       k5 (struct)
reaction propensity 4.61f              a6(x)       k6 (genome)(struct)
reaction 4.61a rate constant           k1          1. day^-1
reaction 4.61b rate constant           k2          0.025 day^-1
reaction 4.61c rate constant           k3          1000. day^-1
reaction 4.61d rate constant           k4          0.25 day^-1
reaction 4.61e rate constant           k5          1.9985 day^-1
reaction 4.61f rate constant           k6          7.5 × 10^-6 (molecules day)^-1
initial number of template molecules   template0   1
initial number of genome molecules     genome0     0
initial number of struct molecules     struct0     0

Table 4.4: Model parameters and reaction extents for the intracellular viral infection example

4.4.3 Intracellular Viral Infection

We now consider a general model of an infection of a cell by a virus. A reduced system model consists of the following reaction mechanism [143]:

$$\text{nucleotides} \xrightarrow{\text{template}} \text{genome} \qquad \varepsilon_1 \qquad (4.61a)$$

$$\text{nucleotides} + \text{genome} \longrightarrow \text{template} \qquad \varepsilon_2 \qquad (4.61b)$$

$$\text{nucleotides} + \text{amino acids} \xrightarrow{\text{template}} \text{struct} \qquad \varepsilon_3 \qquad (4.61c)$$

$$\text{template} \longrightarrow \text{degraded} \qquad \varepsilon_4 \qquad (4.61d)$$

$$\text{struct} \longrightarrow \text{secreted/degraded} \qquad \varepsilon_5 \qquad (4.61e)$$

$$\text{genome} + \text{struct} \longrightarrow \text{secreted virus} \qquad \varepsilon_6 \qquad (4.61f)$$

where genome and template are the genomic and template viral nucleic acids, respectively, and struct is the viral structural protein. Additional assumptions include:

1. nucleotides and amino acids are available at constant concentrations, and

2. template catalyzes reactions (4.61a) and (4.61c).

We are interested in the time evolution of the template, genome, and struct species. We assume that the initial "infection" of a cell corresponds to the insertion of one template molecule into the cell.


Figure 4.7: Intracellular viral infections: (a) typical and (b) aborted. (Plots of the number of molecules of template, genome, and struct versus time in days.)

The model parameters and reaction extents are presented in Table 4.4. This model has two interesting features, best illustrated by the two exact stochastic simulations presented in Figure 4.7. First, the three components of the model exhibit fluctuations that vary by differing orders of magnitude. For the same time scale, the struct species fluctuates by hundreds to thousands of molecules, whereas the template and genome species fluctuate by tens of molecules. Second, the model solution exhibits a bimodal distribution. In particular, a cell may exhibit either a "typical" infection in which all species become populated, or an "aborted" infection in which all species are eliminated from the cell. When the numbers of template and struct molecules are greater than zero and one hundred, respectively, reactions 4.61c and 4.61e occur many more times than the remaining reactions. Hence when template > 0 and struct > 100, we partition the system as follows:

Solution Method            System Type     Relative CPU Time
Exact Stochastic           unpartitioned   51.5
Stochastic-Deterministic   partitioned     1

Table 4.5: Simulation time comparison for the intracellular viral infection example

• ε_1, ε_2, ε_4, and ε_6 comprise the subset of slow reactions y, and

• ε_3 and ε_5 comprise the subset of fast reactions z.

Figure 4.7 indicates that the simulation should traverse between the partitioned and unpartitioned reaction systems. Since our approximation makes fast reactions continuous events as opposed to discrete ones, we round all species when transitioning from the approximate to the exact stochastic simulation to prevent non-integer values. This rounding only affects the struct species, and therefore introduces negligible error into the system.

We choose to approximate the fast reaction subset deterministically, so we employ the approximate stochastic-deterministic simulation with propensity of no reaction a_0 = 0. We compare the approximate stochastic-deterministic simulation to the exact stochastic simulation by reconstructing the statistics for each species based upon one thousand simulations. We also compare the evolution of the mean for these two simulations to the solution of the purely deterministic model.

Figures 4.8 through 4.10 compare the time evolution of the probability distribution for template, the small-numbered species. These figures indicate that the approximate stochastic-deterministic simulation accurately reconstructs the entire template probability distribution. Note that the purely deterministic model, however, is unable to accurately reconstruct even the evolution of the mean. This phenomenon occurs because the deterministic model cannot describe the bimodal nature of the probability density. Figure 4.11 compares the evolution of the mean and standard deviation for the genome species. Again, the approximate simulation accurately reconstructs the time evolution of these moments.

Figure 4.12 compares the evolution of the mean and standard deviation for struct, the large-numbered species. Surprisingly, the approximate stochastic-deterministic simulation accurately reconstructs the time evolution of both of these statistics. Since we approximated the fast reactions deterministically, we did not expect to accurately reconstruct moments higher than the mean for the large-numbered species. For this example, though, fluctuations in the small-numbered species, template, are amplified into the struct species via reaction 4.61c. Thus we are able to accurately reconstruct moments of order higher than the mean.

Table 4.5 compares the computational expense between the exact stochastic and approximate stochastic-deterministic solution methods. The approximate solution method results in a fifty-fold reduction in computational expense over the exact solution method.
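The traversal between the partitioned and unpartitioned systems described above reduces to a state-dependent predicate plus a rounding step on re-entry to the exact simulation. A minimal sketch (Python), with function names of our own choosing:

```python
import numpy as np

def use_partitioned(template: float, struct: float) -> bool:
    """Partitioning criterion for this example: treat reactions 4.61c and
    4.61e deterministically only while template > 0 and struct > 100."""
    return template > 0 and struct > 100

def to_exact_state(n: np.ndarray) -> np.ndarray:
    """On switching back to exact stochastic simulation, round the
    continuously evolved species to integer molecule counts; here only
    struct is affected by the fast (continuous) reactions."""
    return np.rint(n).astype(int)
```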


Figure 4.8: Evolution of the template probability distribution for the (a) exact stochastic and (b) approximate stochastic-deterministic simulations.

4.5 Critical Analysis of the Stochastic Approximations

The primary contribution of this work is the idea of partitioning a purely stochastic reaction system using extents of reaction into subsets of slow and fast reactions. Using order of magnitude arguments, we can derive approximate Markov evolution equations for the slow extent marginal and the fast extents conditioned on the slow extents. The evolution equation for the fast extents conditioned on the slow extents is a closed-form expression, whereas the evolution equation for the slow extent marginal depends on this conditional probability. Using relaxation time arguments, we can propose two approximations for the fast extents: an equilibrium approximation when the fast extents relax faster than the slow extents, and a Langevin or deterministic approximation when both fast and slow extents exhibit similar relaxation times. The equilibrium assumption is similar in nature to the slow-reaction simulation recently proposed in the literature by Cao, Gillespie, and Petzold [16]. In contrast to this approach, we believe that our approach has a much tighter connection to the original master equation.

By equilibrating the fast reaction subset, we can substantially reduce the computational requirement by integrating the system over a much larger time step than the exact stochastic simulation. This method requires solving for the equilibrium distribution of the fast reactions.


Figure 4.9: Comparisons of the (a) (template = 0, t) and (b) (template, t = 200 days) cross-sections of the template probability distribution for the exact stochastic (solid line) and approximate stochastic-deterministic (dashed line) simulations.


Figure 4.10: Comparison of the template mean and standard deviation (±σ) for exact stochastic (solid lines), approximate stochastic-deterministic (long dashed lines), and deterministic (short dashed lines) simulations.


Figure 4.11: Comparison of the genome mean and standard deviation (±σ) for exact stochastic (solid lines), approximate stochastic-deterministic (dashed lines), and deterministic (points) simulations.

If there are few fast extents, or if many of the fast extents are independent of one another, then exactly solving for this distribution is possible, as illustrated by the enzyme kinetics example. If there are a large number of coupled fast extents, then exact solution may not be computationally feasible.


Figure 4.12: Comparison of the structural protein (struct) mean and standard deviation (±σ) for exact stochastic (solid lines), approximate stochastic-deterministic (dashed lines), and deterministic (points) simulations.

For example, consider the coupled, fast reactions

$$A + E \rightleftharpoons B + E \rightleftharpoons C + E \rightleftharpoons D + E$$

A minimal representation of these reactions requires three (reversible) extents of reaction, which is difficult to solve given a reasonable number of molecules for each species.

By approximating the fast reaction subset using Langevin equations, we can reduce the computational requirement by integrating the system over a much larger time step than the exact stochastic simulation. However, we must now employ schemes for integrating stochastic differential equations. By approximating the fast reaction subset deterministically, we can bound the computational requirements for simulation of the system. For this case, we can employ existing and robust ordinary differential equation solvers for integration of this reaction subset. In contrast, the computational expense for exact stochastic simulation scales with the number of reaction events. For example, reconsider simulation of the simple crystallization system presented in section 4.4.2. Doubling the initial amount of A doubles the number of times the fast reaction must occur, and thus significantly increases the computational load of an exact stochastic simulation. On the other hand, if the fast reaction is approximated deterministically, then doubling the initial amount of A does not require stochastic simulation of any additional reaction events, and thus results in no change in the computational load.

The partitioning techniques presented here sacrifice some numerical accuracy for a bound on the computational load. By equilibrating some fast reactions, one cannot expect to accurately reconstruct statistics for species affected by these fast reactions at very fine time scales. However, we are often interested in the macroscopic behavior of the system, and it may not be possible to even observe a physical system at such fine time scales. Approximating some discrete, molecular reaction events as continuous events via the Langevin approximation loses the discrete nature of the entire system. However, as illustrated by the simple crystallization example, this approximation still accurately reconstructs at least the first two moments of each reacting species. Furthermore, approximating fast reactions deterministically eliminates all fluctuations contributed to the system by these reactions. Depending upon the system and the modeling objective, though, these sacrifices may be acceptable. In the simple crystallization example, the stochastic-deterministic simulations accurately reconstructed the means of all species as well as the standard deviations for the small-numbered species. If fluctuations in the larger species are not of interest, then these results are acceptable. In the intracellular viral infection example, the approximate stochastic-deterministic simulation accurately reconstructed the evolution of the probability distribution for the small-numbered species, as well as the means and standard deviations for the large-numbered species. Here, amplification of fluctuations from the small- to the large-numbered species (template to struct) led to accurate estimates of the statistics of the large-numbered species.

A secondary contribution of this work is an approximate simulation for master equations subject to time-varying constraints. As demonstrated by the simple crystallization example, this approximate simulation approaches the accuracy of the exact simulation as the magnitude of the propensity of no reaction increases.
This approximation is most useful for cases in which the total reaction rate, r_tot, is not integrable analytically. For this case, we must use an ODE solver with a stopping criterion to determine the next reaction time. Since calling such an ODE solver requires some overhead computational expense, performing the approximate simulation may be computationally favorable.

The work presented here reflects only a fraction of the approximations that should prove useful for simulating stochastic chemical kinetics. For example, one could simulate fast reactions using tau-leaping schemes instead of deterministic or Langevin approximations. Also, we did not address the quasi-steady state assumption (QSSA). In a deterministic setting, the QSSA equilibrates the rate of change for a given chemical species. In terms of our previous example, reaction 4.27, such an assumption would set 0 = a_1(x) - a_2(x) - a_3(x). For the discrete master equation, however, it is unlikely that such a situation can arise due to the integer nature of all chemical species. The most likely situation is for either ε_1 > 0 and ε_2 = ε_3 = 0, or ε_2, ε_3 ≫ 1. In this case, we would expect to almost never find a B molecule in an exact simulation. Although Rao and Arkin [113] recently addressed this issue, they assumed a Markovian form for their governing master equation rather than derive it directly from the original master equation (4.2). A tighter connection between the original and approximate systems should be possible.

We believe that the future of stochastic simulation lies in software packages that can

1. adaptively partition reactions into subsets, using appropriate approximations for each subset (i.e. exact, Poisson, Langevin, or deterministic approximations); and

2. adaptively adjust the integration time step to control the error induced at each step.

For reconstruction of only the mean and variance, this software should dramatically reduce the amount of computational expense required to generate approximate realizations from the underlying master equation.

We envision that the primary benefit of the tools presented in this work is bridging the gap from the microscopic to the macroscopic. In particular, researchers are becoming increasingly interested in modeling nanomaterials, phenomena at interfaces, and site interactions on catalysts. In each of these problems, macroscopic interactions in the bulk influence microscopic interactions at interfaces. Although most of the action is at the interface, we cannot neglect the bulk or we lose the ability to model the effect of process design and control strategies. The techniques presented here provide one method of modeling these interactions.

Notation

A           matrix of reaction propensities
a_j(n)      jth reaction propensity (rate)
b̄_j(y, z)   jth slow reaction rate averaged over values of the fast extents
b_j(y, z)   jth slow reaction rate
c_j(y, z)   jth fast reaction rate
I           identity matrix
k_j         rate constant for reaction j
N           number of Monte Carlo simulations
n_j         number of molecules for species j
n_j0        initial number of molecules for species j
n           number of molecules for all reaction species
n_0         initial number of molecules for all reaction species
n_0,j       jth initial number of molecules for all reaction species
P           probability vector for all possible values of the extents of reaction
P           probability
P_A         approximate probability (reduced by order of magnitude arguments)
p           random number from the uniform distribution (0, 1)
r_tot       sum of reaction rates
r^y_tot     sum of reaction rates for the slow reaction partition
t           time
W           vector of Wiener processes
x           state of the system in terms of extents
y           subset of slow reaction extents
z           subset of fast reaction extents
z̄           subset of fast reaction extents scaled by Ω
ε           extent of reaction
µ           one possible reaction in the stochastic kinetics framework
ν           stoichiometric matrix
σ           standard deviation
τ           time of the next stochastic reaction
Ω           characteristic system size

Chapter 5

Sensitivities for Stochastic Models

Recently, models of isothermal, well-mixed stochastic chemical kinetics and Monte Carlo techniques for simulating these models have garnered significant attention from researchers in a wide variety of disciplines. This chapter considers a next logical step in applying these models: performing systems-level tasks such as parameter estimation and steady-state analysis. One useful quantity in performing these tasks is the sensitivity. Various methods for calculating sensitivities of the underlying probability distribution and its moments are considered. For nontrivial models, the most computationally efficient method of evaluating the sensitivity consists of coupling an approximate evolution equation for the sensitivity with Monte Carlo reconstruction of the desired moments. Several parameter estimation and steady-state analysis examples demonstrate that, for systems-level tasks, this approximation is well suited. We also show that highly accurate sensitivities are not critical, because optimization algorithms generally converge without exact gradients.

This chapter is organized as follows. First we review the chemical kinetics master equation and define the sensitivity of moments of this equation with respect to model parameters. Next we propose and compare several methods for calculating approximations of the sensitivities, with an eye on computational efficiency. Finally we illustrate how to use the sensitivities for (1) calculating parameter estimates for several linear and nonlinear kinetic models and (2) performing steady-state analysis.

5.1 The Chemical Master Equation

The governing equation for the system of interest is again the chemical master equation. In this case, however, we consider the dependence of the master equation upon the set of parameters θ:

$$\frac{dP(n, t; \theta)}{dt} = \sum_{k=1}^{m} a_k(n - \nu_k, \theta)\, P(n - \nu_k, t; \theta) - a_k(n, \theta)\, P(n, t; \theta) \qquad (5.1)$$

in which

• n is the state of the system in terms of number of molecules (a p-vector),

• θ is a vector containing the system parameters (an l-vector), 70

• P (n, t; θ) is the probability that the system is in state n at time t given parameters θ,

• ak(n, θ)dt is the probability to order dt that reaction k occurs in the time interval [t, t+dt), and

• νk is the kth column of the stoichiometric matrix ν (a p × m matrix).

Here, we assume that the initial condition P (n, t0; θ) is known. One useful quantity in performing systems level tasks is the sensitivity. We consider in the next section the calculation of the sensitivity for stochastic systems governed by the chemical master equation.

5.2 Sensitivities for Stochastic Systems

The sensitivity indicates how responsive the state is to perturbations of a given parameter. For the master equation (5.1), the state is the probability P (n, t; θ), and its sensitivity is

$$s(n, t; \theta) = \frac{\partial P(n, t; \theta)}{\partial \theta} \qquad (5.2)$$

Here, s(n, t; θ) is an l-vector. We derive the evolution equation for this sensitivity by differentiating the master equation (5.1) with respect to the parameters θ:

$$\frac{\partial}{\partial\theta}\frac{dP(n, t; \theta)}{dt} = \frac{\partial}{\partial\theta}\sum_{k=1}^{m} a_k(n - \nu_k, \theta)\, P(n - \nu_k, t; \theta) - a_k(n, \theta)\, P(n, t; \theta) \qquad (5.3)$$

$$\frac{ds(n, t; \theta)}{dt} = \sum_{k=1}^{m} a_k(n - \nu_k, \theta)\, s(n - \nu_k, t; \theta) - a_k(n, \theta)\, s(n, t; \theta) + \frac{\partial a_k(n - \nu_k, \theta)}{\partial\theta} P(n - \nu_k, t; \theta) - \frac{\partial a_k(n, \theta)}{\partial\theta} P(n, t; \theta) \qquad (5.4)$$

We make two observations about equation (5.4):

1. it is linear in the sensitivity s(n, t; θ) and

2. solution of this equation requires simultaneous solution of the master equation (5.1), but not vice versa.

For engineering purposes, we are interested in moments of the probability distribution, i.e.

$$\overline{g(n)} = \sum_n g(n)\, P(n, t; \theta) \qquad (5.5)$$

in which g(n) and $\overline{g(n)}$ are q-vectors. For example, we might seek to implement control moves that drive the mean system behavior towards a desired set point. Such tasks require knowledge of how sensitive these moments are with respect to the parameters. The master equation (5.1) indicates that the probability distribution evolves continuously with time; consequently, moments of this distribution (assuming that they are well defined) evolve continuously as well. Therefore we can simply differentiate equation (5.5) with respect to the parameters to define the sensitivity of these moments, $s(\overline{g(n)})$, as follows:

$$\frac{\partial}{\partial\theta^T}\,\overline{g(n)} = \frac{\partial}{\partial\theta^T}\sum_n g(n)\, P(n, t; \theta) \qquad (5.6)$$

$$s(\overline{g(n)}, t; \theta) = \sum_n g(n)\, s(n, t; \theta)^T \qquad (5.7)$$

Here, $s(\overline{g(n)}, t; \theta)$ is a q × l matrix. Equation (5.7) indicates that these sensitivities depend upon the sensitivity of the master equation, s(n, t; θ). Therefore, the exact solution of $s(\overline{g(n)})$ requires solving the following set of coupled equations:

$$\frac{dP(n, t; \theta)}{dt} = \sum_{k=1}^{m} a_k(n - \nu_k, \theta)\, P(n - \nu_k, t; \theta) - a_k(n, \theta)\, P(n, t; \theta) \qquad (5.8a)$$

$$\frac{ds(n, t; \theta)}{dt} = \sum_{k=1}^{m} a_k(n - \nu_k, \theta)\, s(n - \nu_k, t; \theta) - a_k(n, \theta)\, s(n, t; \theta) + \frac{\partial a_k(n - \nu_k, \theta)}{\partial\theta} P(n - \nu_k, t; \theta) - \frac{\partial a_k(n, \theta)}{\partial\theta} P(n, t; \theta) \qquad (5.8b)$$

$$s(\overline{g(n)}, t; \theta) = \sum_n g(n)\, s(n, t; \theta)^T \qquad (5.8c)$$

Exact solution of even just the master equation (5.1) is computationally intractable for all but the simplest systems. Consequently, exact calculation of both the master equation and its sensitivity (i.e. equation (5.8)) is also intractable in general. However, Monte Carlo methods such as those proposed by Gillespie [45, 46] and Gibson and Bruck [43] can reconstruct moments of the master equation to some degree of accuracy (error associated with the finite number of simulations corrupts these reconstructed quantities). In the next section, we examine methods for reconstructing the sensitivities given only information about how moments of the master equation evolve.

5.2.1 Approximate Methods for Generating Sensitivities

Approximate methods of generating sensitivities for this system include

1. deriving an approximate model for the sensitivity of a desired moment and

2. applying finite difference schemes.

The primary benefit of these alternatives is that they require only reconstruction of the desired moment, not necessarily via solution of the master equation (5.1). For systems-level tasks such as parameter estimation and steady-state analysis, we are particularly interested in the dynamic behavior of the mean $\bar n$

$$\bar{n} = \sum_n n\, P(n, t; \theta) \qquad (5.9)$$

and its sensitivity

$$\bar{s} = s(\bar{n}, t; \theta) = \sum_n n\, s(n, t; \theta)^T \qquad (5.10)$$

in which $\bar n$ is a p-vector and $\bar s$ is a p × l matrix. We consider deriving approximations for the mean sensitivity $\bar s$ subsequently. We note that the sensitivity for any moment could be derived and calculated similarly.

5.2.2 Deterministic Approximation for the Sensitivity

Combining equations (5.4) and (5.10) yields the following evolution equation for the mean sensitivity $\bar s$:

$$\frac{d\bar{s}}{dt} = \frac{\partial}{\partial\theta^T}\sum_n n \sum_{k=1}^{m}\big(a_k(n - \nu_k, \theta)\, P(n - \nu_k, t; \theta) - a_k(n, \theta)\, P(n, t; \theta)\big) \qquad (5.11)$$

$$= \frac{\partial}{\partial\theta^T}\sum_{k=1}^{m}\Big(\sum_n n\, a_k(n - \nu_k, \theta)\, P(n - \nu_k, t; \theta) - \sum_n n\, a_k(n, \theta)\, P(n, t; \theta)\Big) \qquad (5.12)$$

$$= \frac{\partial}{\partial\theta^T}\sum_{k=1}^{m}\Big(\sum_n (n + \nu_k)\, a_k(n, \theta)\, P(n, t; \theta) - \sum_n n\, a_k(n, \theta)\, P(n, t; \theta)\Big) \qquad (5.13)$$

$$= \frac{\partial}{\partial\theta^T}\sum_{k=1}^{m}\sum_n \nu_k\, a_k(n, \theta)\, P(n, t; \theta) \qquad (5.14)$$

Consider a Taylor series expansion of $a_k(n, \theta)$ about the mean value $\bar n$:

$$a_k(n, \theta) = a_k(\bar{n}, \theta) + \frac{\partial a_k(n, \theta)}{\partial n^T}\Big|_{n=\bar{n}}(n - \bar{n}) + \frac{1}{2}(n - \bar{n})^T \frac{\partial^2 a_k(n)}{\partial n\, \partial n^T}\Big|_{n=\bar{n}}(n - \bar{n}) + \cdots \qquad (5.15)$$

One approximation consists of incorporating only the first two terms of the expansion (5.15) into equation (5.14) to obtain

$$\frac{d\bar{s}}{dt} \approx \frac{\partial}{\partial\theta^T}\sum_{k=1}^{m}\sum_n \nu_k\Big(a_k(\bar{n}, \theta) + \frac{\partial a_k(n, \theta)}{\partial n^T}\Big|_{n=\bar{n}}(n - \bar{n})\Big)\, P(n, t; \theta) \qquad (5.16)$$

$$= \frac{\partial}{\partial\theta^T}\sum_{k=1}^{m}\nu_k\, a_k(\bar{n}, \theta) \qquad (5.17)$$

$$= \frac{\partial}{\partial\theta^T}\, \nu\, a(\bar{n}, \theta) \qquad (5.18)$$

$$= \nu\left(\frac{\partial a(\bar{n}, \theta)}{\partial \bar{n}^T}\frac{\partial \bar{n}}{\partial\theta^T} + \frac{\partial a(\bar{n}, \theta)}{\partial\theta^T}\frac{\partial\theta}{\partial\theta^T}\right) \qquad (5.19)$$

$$\frac{d\bar{s}}{dt} \approx \nu\left(\frac{\partial a(\bar{n}, \theta)}{\partial \bar{n}^T}\, \bar{s} + \frac{\partial a(\bar{n}, \theta)}{\partial\theta^T}\right) \qquad (5.20)$$

in which

$$a(\bar{n}, \theta) = \begin{bmatrix} a_1(\bar{n}, \theta) & \cdots & a_m(\bar{n}, \theta) \end{bmatrix}^T \qquad (5.21)$$

Equation (5.20), then, is the first-order approximation of the sensitivity evolution equation assuming that the mean $\bar n$ is known. Logically, then, we must specify how we plan on calculating the mean. Clearly we can also approximate the mean evolution equation using the first two terms of the truncated Taylor series expansion (5.15) as follows:

$$\frac{d\bar{n}}{dt} = \sum_{k=1}^{m}\sum_n \nu_k\, a_k(n)\, P(n, t; \theta) \qquad (5.22)$$

$$\approx \sum_{k=1}^{m}\sum_n \nu_k\Big(a_k(\bar{n}, \theta) + \frac{\partial a_k(n, \theta)}{\partial n^T}\Big|_{n=\bar{n}}(n - \bar{n})\Big)\, P(n, t; \theta) \qquad (5.23)$$

$$= \sum_{k=1}^{m} \nu_k\, a_k(\bar{n}, \theta) \qquad (5.24)$$

$$\frac{d\bar{n}}{dt} \approx \nu\, a(\bar{n}, \theta) \qquad (5.25)$$

Equation (5.25) is the usual deterministic approximation of the chemical master equation [154]. In general, the mean behavior of the chemical master equation does not obey the deterministic equation (5.25); see Arkin, Ross, and McAdams [3] and Srivastava, You, Summers, and Yin [143] for recent biological examples of this phenomenon. Therefore, we do not advise calculating both the mean and the sensitivity in this fashion. We propose to estimate the mean by averaging the results of multiple Monte Carlo simulations, and to approximate the sensitivity of the mean using equation (5.20). Since both the mean and the sensitivity are linear functions, exchanging the order of evaluation is valid. So the following strategies are equivalent:

1. Evaluate $\bar{s}^k$, $\bar{n}^k$ for every simulation using equation (5.20), in which $\bar{n}^k$ denotes the kth Monte Carlo simulation of $\bar n$. Since the reaction rate vector $a(\bar{n}, \theta)$ is constant between reaction events, equation (5.20) can be solved exactly via a matrix exponential [21]. Finally, calculate $\bar{s} = E[\bar{s}^k]$, in which E[n] denotes the expectation of n.

2. Evaluate $\bar{n}^k$ for every simulation, calculate $E[\bar{n}^k]$, then calculate $\bar s$ using $E[\bar{n}^k]$ and equation (5.20).

The first option is presumably the more computationally expensive option, since exact solution of equation (5.20) requires evaluation of a matrix exponential for every reaction step. The second option, however, may experience difficulties because

• depending on the behavior of the mean $\bar n$, explicit strategies for evaluating equation (5.20) (e.g. Runge-Kutta methods) may require small time steps to ensure stability; and

• random noise associated with the finite number of Monte Carlo simulations may induce inaccuracies for higher-order methods.

In spite of these problems, we advocate using the second option to calculate the approximate sensitivity if performing Monte Carlo simulations is computationally expensive. We note that elementary chemical reactions are generally bimolecular. For this case, the Taylor series expansion consists exactly of the first three terms of equation (5.15), and we expect that equation (5.20) adequately approximates the true sensitivity. For unimolecular or zero-order reactions, the Taylor series expansion is exact, so equation (5.20) is exact. Finally, reducing the master equation to a series of moments truncates some of the information contained by the probability distribution of the initial state. For the remainder of this chapter, we assume that this probability distribution is a delta function at the initial mean value. Our method is not restricted to this particular choice of distribution, however. Rather, one may set this initial distribution arbitrarily via proper configuration of the Monte Carlo simulations used to reconstruct the desired moments as discussed in Chapter 4 (see equation (4.3)).
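A minimal sketch of the second strategy (Python with NumPy), using the second-order reaction example considered later in section 5.2.4 with the parameters given there; the explicit Euler update for equation (5.20) and the helper names are our own simplifications.

```python
import numpy as np

rng = np.random.default_rng(0)

# Reaction A -> B with propensity a = 0.5*k1*nA*(nA - 1) (section 5.2.4).
k1, nA0, t_grid = 0.0333, 20, np.linspace(0.0, 10.0, 101)

def ssa_path(t_grid):
    """One SSA realization of nA sampled on a uniform time grid."""
    nA, t, out, i = nA0, 0.0, np.empty_like(t_grid), 0
    while i < len(t_grid):
        a = 0.5 * k1 * nA * (nA - 1)
        t_next = t - np.log(rng.uniform()) / a if a > 0 else np.inf
        while i < len(t_grid) and t_grid[i] < t_next:
            out[i] = nA
            i += 1
        nA, t = nA - 1, t_next
    return out

# Step 1: reconstruct the mean of nA from N Monte Carlo simulations.
N = 50
nA_mean = np.mean([ssa_path(t_grid) for _ in range(N)], axis=0)

# Step 2: integrate the approximate sensitivity ODE (5.20) along the mean,
# here for s = d(nB_mean)/d(k1) with stoichiometry nu = [-1, +1]^T.
s = np.zeros(2)                            # [s_A, s_B]
for i in range(len(t_grid) - 1):
    dt = t_grid[i + 1] - t_grid[i]
    nA = nA_mean[i]
    da_dn = 0.5 * k1 * (2.0 * nA - 1.0)    # da/d(nA) evaluated at the mean
    da_dk = 0.5 * nA * (nA - 1.0)          # da/d(k1) evaluated at the mean
    s += dt * np.array([-1.0, 1.0]) * (da_dn * s[0] + da_dk)  # explicit Euler
print("approximate d nB_mean / d k1 at t = 10:", s[1])
```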

5.2.3 Finite Difference Sensitivities

For finite differences, we assume that we have some evolution equation for the mean $\bar n$ that depends on the system parameters θ:

$$\bar{n}_{k+1} = F(\bar{n}_k; \theta, \Omega) \qquad (5.26)$$

Here, the notation $\bar{n}_k$ denotes the value of the mean $\bar n$ at time $t_k$. Also, Ω denotes the string of random numbers used to propagate the state. Recall that the sensitivity $\bar s$ indicates how sensitive the mean is to perturbations of a given parameter, i.e.

$$\bar{s}_k = \frac{\partial \bar{n}_k}{\partial \theta^T} \qquad (5.27)$$

We could then approximate the jth column of the desired sensitivity using, for example, a central difference scheme:

$$\bar{s}_{k+1,j} = \frac{F(\bar{n}_k; \theta + \delta c_j, \Omega_1) - F(\bar{n}_k; \theta - \delta c_j, \Omega_2)}{2\delta} + i \cdot O(\delta^2) \qquad (5.28)$$

Here, δ is a small positive constant, $c_j$ is the jth unit vector, and i is a vector of ones. If we use the mean of Monte Carlo simulations to determine the state propagation function $F(\bar{n}_k; \theta, \Omega)$ and choose Ω_1 ≠ Ω_2, then we have essentially amplified the error associated with the finite number of simulations into evaluation of equation (5.28). On the other hand, evaluating the means by using the same strings of random numbers, i.e. Ω_1 = Ω_2, eliminates this amplification. However, we now have the potential of choosing a sufficiently small, non-zero perturbation such that $F(\bar{n}_k; \theta + \delta c_j, \Omega_1) = F(\bar{n}_k; \theta - \delta c_j, \Omega_2)$. If we choose the parameter perturbation to be too large, then the O(δ²) term is not negligible in equation (5.28). Hence special care must be taken in the selection of the perturbation δ. The subsequent chapter, Chapter 6, discusses these subtleties in greater detail. Finally, the computational expense of this method may be prohibitive if evaluating the mean is computationally intensive, because calculating the sensitivity requires, in this case, two mean evaluations per parameter. In contrast, calculating the additional sensitivities using the approximate calculation of equation (5.20) does not require any additional stochastic simulations.

Raimondeau, Aghalayam, Mhadeshwar, and Vlachos recently examined using finite differences to calculate sensitivities for kinetic Monte Carlo simulations [105]. However, they use only a single simulation to generate their sensitivity and require relatively large parameter perturbations to generate measurable changes in model responses (one of their examples uses a parameter perturbation of approximately 30%). These authors make no appeal to the master equation nor to the fact that the mean should be a smoothly-varying function. We interpret their approach as a mean sensitivity calculation using a poor reconstruction of the mean. Due to the large choice of parameter perturbation, we infer that the authors did not use the same strings of random numbers to evaluate equation (5.28), i.e. Ω_1 ≠ Ω_2.

Drews, Braatz, and Alkire also recently examined using finite differences to calculate sensitivities for kinetic Monte Carlo code simulating copper electrodeposition [25]. These authors consider the specific case of the mean sensitivity, and derive finite differences for cases with significant finite simulation error. In these cases, the finite simulation error is greater than higher-order contributions of the finite difference expansion, so the authors derive first-order finite differences that minimize the variance of the finite simulation error. No appeal is made to the master equation, and they implicitly assume that the mean should be a smoothly-varying function. Their computational requirements certainly motivate the approximations made in this chapter, however. Each simulation required on average 64 hours to complete, and the total computational requirement was 92,547 hours for 22 parameters. Additionally, the authors employed perturbations of +100% and -50%, so the accuracy of the finite difference is questionable. Solving for the approximate sensitivity would require only one mean evaluation (roughly 1400 hours) plus the computational time required for the sensitivity calculation, a computational savings of at least an order of magnitude. These authors have also chosen a rather large parameter perturbation, again leading us to infer that they did not use the same strings of random numbers to evaluate equation (5.28), i.e. Ω_1 ≠ Ω_2.
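A sketch of the common-random-number central difference (Python with NumPy), again for the second-order reaction example of section 5.2.4; seeding the generator identically on both sides plays the role of Ω_1 = Ω_2. The function name and default arguments are our own.

```python
import numpy as np

def mean_nA(k1, seed, N=50, t_end=10.0, nA0=20):
    """Monte Carlo estimate of the mean of nA at t_end for A -> B with
    propensity 0.5*k1*nA*(nA-1); `seed` fixes the random number string."""
    rng = np.random.default_rng(seed)
    total = 0
    for _ in range(N):
        nA, t = nA0, 0.0
        while nA > 1:
            a = 0.5 * k1 * nA * (nA - 1)
            t += -np.log(rng.uniform()) / a
            if t > t_end:
                break
            nA -= 1
        total += nA
    return total / N

# Central difference with common random numbers (Omega_1 = Omega_2):
# reusing the same seed on both sides keeps the finite-simulation error
# from being amplified into the difference.
k1, delta, seed = 0.0333, 0.1 * 0.0333, 42
s_fd = (mean_nA(k1 + delta, seed) - mean_nA(k1 - delta, seed)) / (2.0 * delta)
print("central-difference sensitivity d nA_mean / d k1:", s_fd)
```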

5.2.4 Examples

We now illustrate these different methods of calculating the sensitivity with two simple examples. For clarity, we first briefly review the nomenclature that indicates which approximations, if any, are performed in a given simulation. We can either reconstruct the mean exactly by solving the master equation, or approximately via Monte Carlo simulation. Given a reconstruction of the mean, we can then calculate the sensitivity using the approximate equation (5.20), or by finite differences, i.e. equation (5.28). Solving for the exact sensitivity of the mean requires solution of equation (5.8), namely the master equation, the desired moment, and their respective sensitivities.


Figure 5.1: Comparison of the exact, approximate, and central finite difference sensitivities for a second-order reaction.

Second-Order Reaction Example

We consider the simple second-order reaction

$$A \rightarrow B \qquad a_1 = \tfrac{1}{2} k_1 n_A (n_A - 1) \qquad (5.29a)$$

with initial condition $n_{A,0} = 20$ and $n_{B,0} = 0$, and $k_1 = 0.0333$. For this example, we define

$$x = \begin{bmatrix} n_A & n_B \end{bmatrix}^T, \qquad \theta = k_1, \qquad \bar{s} = \partial \bar{n}_B / \partial k_1$$

The reaction rate is nonlinear, implying that equation (5.20) is an approximation of the actual sensitivity. We solve for the exact sensitivity. We also reconstruct the mean via Monte Carlo simulation, then calculate the sensitivity by both the approximate equation (5.20) and central finite differences. Each mean evaluation is calculated by averaging fifty Monte Carlo simulations. Additionally, we perturbed $k_1$ by 10% to generate the finite difference sensitivity. Figure 5.1 compares the exact, approximate, and central finite difference sensitivities. For this example, the exact and approximate sensitivities are virtually identical. The central finite difference sensitivity, on the other hand, yields a very noisy and poor reconstruction at roughly twice the cost of the approximate sensitivity. Performing more Monte Carlo simulations per mean evaluation would improve this estimate at the expense of additional computational burden.


Figure 5.2: Comparison of the exact and approximate sensitivities for the high-order rate example.

High-Order Reaction Example

We consider the simple set of reactions

$$A \rightarrow B \qquad a_1 = \frac{k_1 n_A}{1 + K n_A} \qquad (5.30a)$$

$$B \rightarrow A \qquad a_2 = k_2 n_B \qquad (5.30b)$$

with initial condition $n_{A,0} = 20$ and $n_{B,0} = 0$, and parameters $k_1 = 4.0$, $k_2 = 0.1$, and $K = 20/n_{A,0}$. For this example, we define

$$x = \begin{bmatrix} n_A & n_B \end{bmatrix}^T, \qquad \theta = K, \qquad \bar{s} = \partial \bar{n}_A / \partial K$$

The first reaction rate is nonlinear, implying that equation (5.20) is an approximation of the actual sensitivity. In fact, for this case the Taylor series expansion (5.15) has an infinite number of terms. We solve for the exact sensitivity. We also reconstruct the mean exactly, then solve for the sensitivity via the approximate equation (5.20). Figure 5.2 plots this comparison, and demonstrates a large discrepancy between the exact and approximate sensitivities. As the initial number of A molecules increases, Figure 5.3 shows that the relative error between the exact and approximate sensitivities decreases. This trend is expected because, in the thermodynamic limit (i.e. $x \rightarrow \infty$, $\Omega \rightarrow \infty$, $z = x/\Omega \rightarrow$ constant), the chemical master equation reduces to a deterministic evolution equation for the concentrations z of the form given by the first-order approximation of the mean, equation (5.25) [76].

Next, we consider reconstructing the mean of the system via Monte Carlo simulation, and evaluate the sensitivity by both the approximate equation (5.20) and central finite differences. For this example, we set the initial condition $n_{A,0} = 20$. Each mean evaluation is calculated by averaging fifty Monte Carlo simulations. Figure 5.4 compares the exact, approximate, and central finite difference sensitivities.


Figure 5.3: Relative error of the approximate sensitivity $\bar s$ with respect to the exact sensitivity s as the initial number of molecules $n_{A,0}$ increases for the high-order rate example.


Figure 5.4: Comparison of the exact, approximate, and finite difference sensitivities for the high-order rate example.

The approximate sensitivity differs significantly from the exact sensitivity at later times, but compares favorably with the approximate sensitivity obtained from an exact reconstruction of the mean (i.e. Figure 5.2). Therefore the error in the approximate sensitivity is due to the truncation of the Taylor series expansion and not to the Monte Carlo simulations. We perturbed K by 10% to generate the finite difference sensitivity. This sensitivity better approximates the exact sensitivity, but the method amplifies the error associated with the finite number of simulations. Additionally, the computational expense is roughly twice that required for the approximate sensitivity. Finally, we note that this computational expense results from perturbing only a single parameter. If we had required sensitivities for all parameters ($k_1$, $k_2$, and K), the computational expense would triple, since the required number of simulations scales linearly with the desired number of sensitivities. In contrast, determining additional sensitivities using the approximate calculation does not require any additional stochastic simulations.

5.3 Parameter Estimation With Approximate Sensitivities

The goal of parameter estimation is to determine the set of parameters that best reconciles the measurements with model predictions. The classical approach is to assume that measurements are corrupted by normally distributed noise. Accordingly, we calculate the optimal parameters via the least squares optimization

min_θ Φ = (1/2) Σ_k e_k^T Π^{-1} e_k    (5.31a)

subject to:  x_{k+1} = F(x_k, θ)    (5.31b)

e_k = y_k − h(x_k)    (5.31c)

in which the e_k's denote the difference between the measurements y_k and the model predictions h(x_k), and Π is the covariance matrix for the measurement noise. For the optimal set of parameters, the gradient ∇_θΦ is zero. We can numerically evaluate the gradient according to

∇_θΦ = ∂/∂θ^T [(1/2) Σ_k e_k^T Π^{-1} e_k]    (5.32)
     = −Σ_k (∂h(x_k)/∂x^T · ∂x_k/∂θ^T)^T Π^{-1} e_k    (5.33)
     = −Σ_k s_k^T (∂h(x_k)/∂x^T)^T Π^{-1} e_k    (5.34)

Equation (5.34) indicates that the gradient depends upon s_k, the sensitivity of the state with respect to the parameters.

In general, most experiments do not include many replicates due to cost and time constraints. Therefore, the best experimental data we are likely to obtain is the average. In fitting these data to stochastic models governed by the master equation, we accordingly choose the mean n̄ as the state of interest. Monte Carlo simulation and evaluation of equation (5.20) provide estimates of the mean and the sensitivities. Since equation (5.20) is approximate (first-order with respect to the mean), evaluating the gradient using this sensitivity is also approximate. For the sake of illustration, we obtain optimal parameter estimates using an optimization scheme analogous to the Newton-Raphson method. In particular, we perform a Taylor series expansion of the gradient around the current parameter estimate θ_k to generate the next estimate θ_{k+1}

∇_θΦ|_{θ_{k+1}} ≈ ∇_θΦ|_{θ_k} + ∇_{θθ}Φ|_{θ_k} (θ_{k+1} − θ_k)    (5.35)

Since we desire the gradient at the next iterate to be zero,

θ_{k+1} = θ_k − (∇_{θθ}Φ|_{θ_k})^{-1} ∇_θΦ|_{θ_k}    (5.36)

Differentiating the gradient (i.e. equation (5.34)) yields the Hessian

∇_{θθ}Φ = ∂/∂θ^T [ −Σ_k s_k^T (∂h(x_k)/∂x^T)^T Π^{-1} e_k ]    (5.37)
        = Σ_k s_k^T (∂h(x_k)/∂x^T)^T Π^{-1} (∂h(x_k)/∂x^T) s_k − Σ_k (∂h(x_k)/∂x^T · ∂²x_k/∂θ∂θ^T)^T Π^{-1} e_k    (5.38)

Making the usual Gauss-Newton approximation for the Hessian (i.e. ek ≈ 0), we obtain

∇_{θθ}Φ ≈ Σ_k s_k^T (∂h(x_k)/∂x^T)^T Π^{-1} (∂h(x_k)/∂x^T) s_k    (5.39)

Finally, since we estimate both the mean and the sensitivities using Monte Carlo simulations, the finite number of simulations introduces some error into both of these estimates. Properly specifying a convergence criterion for this method must take this error into account.

Raimondeau, Aghalayam, Mhadeshwar, and Vlachos argue that using kinetic Monte Carlo simulation to perform parameter estimation is too computationally expensive [105]. They claim that a model with two to three parameters requiring 0.5 hours per simulation needs roughly 10^5 function evaluations for direct optimization. We believe that the actual number of function evaluations required for direct optimization is significantly lower if one uses the approximate sensitivity coupled with the optimization scheme presented in this section. In the next example, we demonstrate that surprisingly few function evaluations lead to accurate parameter estimates.
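A minimal sketch of the Newton-Raphson iteration (5.36) with the Gauss-Newton Hessian (5.39) follows, assuming a scalar parameter, h(x) = x (the mean itself is measured), and user-supplied functions mean(theta) and sens(theta) that return Monte Carlo estimates of the mean and the approximate sensitivity on the measurement grid; all names are illustrative.

import numpy as np

def gauss_newton(theta0, y, mean, sens, pi_inv=1.0, n_iter=20):
    """Newton-Raphson parameter search using approximate sensitivities."""
    theta = float(theta0)
    for _ in range(n_iter):
        e = y - mean(theta)               # residuals e_k = y_k - h(x_k)
        s = sens(theta)                   # approximate sensitivities s_k
        grad = -np.sum(s * pi_inv * e)    # gradient, equation (5.34)
        hess = np.sum(s * pi_inv * s)     # Gauss-Newton Hessian, equation (5.39)
        theta -= grad / hess              # Newton step, equation (5.36)
    return theta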

5.3.1 High-Order Rate Example Revisited

We consider parameter estimation for the high-order rate example reactions (5.30). Our “experimental data” consist of the time evolution of species A obtained from the average of fifty Monte Carlo simulations. We assume that the values of k₁ and k₂ are known, and attempt to estimate K using the Newton-Raphson method described in the previous section. Sensitivities for this method are obtained using both the approximate and central finite difference sensitivities. For each method, mean evaluations are calculated by averaging fifty Monte Carlo simulations using the same strings of random numbers (note that a different string of random numbers is used to generate the experimental data).

[Figure 5.5 appears here: (a) parameter estimate K versus Newton-Raphson iteration, with the actual value K = 1 marked; (b) predicted measurement h(E(x)) versus time against the measurement points.]

Figure 5.5: Comparison of the (a) parameter estimates per Newton-Raphson iteration and (b) model fit at iteration 20 using the approximate (dashed line) and finite difference (solid line) sensitivities for the high-order rate example. Points represent the actual measurement data.

Calculation of the approximate sensitivity requires one mean evaluation per iteration, while the finite difference sensitivity requires three mean evaluations (at K, K − δ, and K + δ). We perturbed K by 10% to calculate the central finite difference sensitivity.

Figure 5.5 plots the results of this parameter estimation. Both sensitivities lead to correct estimation of the parameter K in approximately the same number of Newton-Raphson iterations. Clearly the error in the approximate sensitivity does not significantly hinder the search. Note that neither method converges exactly to the true parameter value. This offset results from the fact that different strings of random numbers are used to generate the experimental data and the data used to estimate the parameter K. Finally, the estimation using the central finite difference required roughly three times the computational expense of that using the approximate sensitivity.

5.4 Steady-State Analysis

Exact determination of steady states requires solving for the stationary state of the master equation (5.1). The difficulty of this task is comparable to that of solving the dynamic response. The next logical question, then, is whether we can determine steady states from simulation. Unfortunately, we can only reconstruct the entire probability distribution from an infinite number of simulations. Given a finite number of simulations, we can reconstruct only a limited number of moments. Hence we can seek a steady state consistent with this desired number of moments. An additional complication associated with simulation is that we only have information from integrating the model forward in time. At steady state, we know that

x_{k+1} = x_k = steady state    (5.40)

Thus, we propose two methods for determining steady states from simulation:

1. Run Monte Carlo simulation for a long time.

2. Guess the steady state.

Check whether x_{k+1} = x_k holds over a short simulation.

If not, use a Newton-Raphson algorithm to search for an improved estimate of the steady state:

(x_{k+1} − x_k)|_{θ_{j+1}} ≈ (x_{k+1} − x_k)|_{θ_j} + ∂/∂θ^T (x_{k+1} − x_k)|_{θ_j} (θ_{j+1} − θ_j)    (5.41)
0 = (x_{k+1} − x_k)|_{θ_j} + (s_{k+1} − s_k)|_{θ_j} (θ_{j+1} − θ_j)    (5.42)
θ_{j+1} = θ_j − (s_{k+1} − s_k)^{-1} (x_{k+1} − x_k)|_{θ_j}    (5.43)

Here, θ_j denotes the estimate of the initial state at iteration j, and x_k denotes the value of the state x at simulation time k for a given iteration.

The second method, recently employed by Makeev et al. [85] in the same capacity, uses short bursts of simulation to determine whether or not the system is at a steady state. Clearly this method may be significantly faster than the first method, which requires a lengthy simulation. Additionally, the second method permits use of the approximate sensitivity, which can be calculated inexpensively from simulation; a sketch of the resulting search is given below. We apply the second method in the next example.
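The following is a minimal sketch of the Newton-Raphson steady-state search (5.41)-(5.43), assuming user-supplied functions simulate(x0), returning the mean state after a short burst of simulation started at x0, and sens(x0), returning the approximate sensitivity ∂x_{k+1}/∂x_k^T; both names are placeholders.

import numpy as np

def steady_state_search(x0, simulate, sens, n_iter=20):
    """Search for a steady state using short simulation bursts."""
    theta = np.asarray(x0, dtype=float)      # theta_j: current steady-state guess
    for _ in range(n_iter):
        x1 = simulate(theta)                 # x_{k+1} after a short burst
        residual = x1 - theta                # x_{k+1} - x_k, zero at steady state
        jac = sens(theta) - np.eye(theta.size)           # (s_{k+1} - s_k) in (5.43)
        theta = theta - np.linalg.solve(jac, residual)   # Newton step (5.43)
    return theta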

Parameter               Value
Total catalyst sites    200^2
k₁                      1.6
k₂                      0.04
k₃                      1.0 × 10^{-4}
k₄                      1.03 × 10^{-3}
k₅                      0.36
k₆                      1.6 × 10^{-2}

Table 5.1: Parameters for the lattice-gas example.

5.4.1 Lattice-Gas Example

We consider the following lattice-gas reaction model [85]:

A + ∗ --k₁--> A∗    (5.44a)
A∗ --k₂--> A + ∗    (5.44b)
A∗ + B∗ --k₃--> C + 2∗    (5.44c)
B₂ + 2∗ --k₄--> 2B∗    (5.44d)
C + ∗ --k₅--> C∗    (5.44e)
C∗ --k₆--> C + ∗    (5.44f)

All reactions are elementary as written. Parameters for these reactions are given in Table 5.1. Figure 5.6 plots the results for a dynamic simulation of the lattice-gas model and the convergence of the steady-state search algorithm. For this example, the model response corresponds to a limit cycle; therefore, the eigenvalues of the state sensitivity matrix s_{k+1} = ∂x_{k+1}/∂x_k^T should contain values with absolute value greater than unity to reflect the unstable nature of this steady state. The search algorithm finds a steady state within the region of this limit cycle with eigenvalues (calculated using the approximate sensitivity)

λ(s_{k+1}) = [−2.58  −1.96  1.38 × 10^{-16}]^T

Hence the approximate sensitivity indicates that the steady state is indeed unstable.

5.5 Conclusions

We have examined various methods of calculating sensitivities for the moments of the chemical master equation, and explicitly derived methods for calculating the mean sensitivity. Exact solution of the mean sensitivity requires solving the chemical master equation and its sensitivity, a task that is infeasible for all but trivial systems.

[Figure 5.6 appears here: (a) surface species A, B, and C versus time; (b) surface species A, B, and C versus search iteration.]

Figure 5.6: Results for the lattice-gas model: (a) dynamic response of the model from an empty lattice initial condition and (b) convergence of the steady-state search algorithm.

For more complex systems, the mean and its sensitivity must be reconstructed from Monte Carlo simulations. If carefully implemented, finite differences can generate accurate sensitivities. However, the computational expense of this method scales linearly with the number of parameters, and is particularly burdensome for computationally intensive Monte Carlo simulations. In contrast, employing a first-order approximation of the sensitivity permits inexpensive calculation of the mean sensitivity from a reconstruction of the mean.

Knowledge of model sensitivities permits execution of systems-level tasks such as parameter estimation, optimal control, and steady-state analysis. In these operations, highly accurate sensitivities are not critical because optimization algorithms generally converge, albeit more slowly, without exact gradients. For use in an optimization context, the efficient evaluation of the approximate sensitivity proposed in this chapter seems well suited.

Notation

a(n, t)   vector of all reaction rates (the a_k(n)'s)
a_k(n)    kth reaction rate
c_j       the jth unit vector
e_k       deviation between the predicted and actual measurement vectors at time t_k
i         a vector of ones
n         vector of the number of molecules for all reaction species
n̄         vector of the mean number of molecules for all reaction species
n^k       kth Monte Carlo reconstruction of the vector n
P         probability
s         sensitivity of the state x with respect to the parameters θ
s̄         sensitivity of the mean n̄ with respect to the parameters θ
s^k       kth Monte Carlo reconstruction of the sensitivity s
t         time
t_k       discrete sampling time
x         state of the system
y_k       measurement vector at time t_k
z         state vector x scaled by the characteristic system size Ω
δ         finite difference perturbation
λ         eigenvalue
θ         parameter vector for a given model
ν         stoichiometric matrix
Π         covariance matrix for the measurement noise
Φ         objective function value
Ω         characteristic system size

Chapter 6

Sensitivity Analysis of Discrete Markov Chain Models

In the previous chapter, we considered two approximations to the sensitivity equation: (1) finite differences, which offer inherently biased estimates of the sensitivity at significant computational expense, and (2) a first-order approximation to the sensitivity that requires trivial computational expense. The second of these methods is analogous to the stochastic fluid models currently proposed in the field of perturbation analysis [19, 167]. For use in the context of unconstrained optimization, we demonstrated that both of these approximations to the sensitivities permit efficient optimization. In this chapter, we consider methods for exactly calculating sensitivities for discrete Markov chain models from solely simulation information. In general, a discrete Markov chain model provides simple rules for propagating the discrete state n forward in time, i.e.

P(n_{k+1}) = Σ_{n_k} P(n_{k+1}|n_k) P(n_k)    (6.1)

in which P(·) denotes the probability of (·), and n_k refers to the state at time t_k. Usually the accessible state space is too large to permit computation of the entire probability distribution, so we are forced to sample the distribution via Monte Carlo methods. These methods take advantage of the fact that any statistic can be written in terms of a large sample limit of observations, i.e.

h̄(n) ≜ ∫ h(n) P(n, t) dn = lim_{N→∞} (1/N) Σ_{i=1}^N h(n^i) ≈ (1/N) Σ_{i=1}^N h(n^i) for N sufficiently large    (6.2)

in which n^i is the ith Monte Carlo reconstruction of the state n. The desired statistic can then be reconstructed to sufficient accuracy given a large enough number of observations. Ultimately, we are interested in calculating sensitivities of expectations, i.e.

s̄ = ∂/∂θ E[h(n; θ)]    (6.3a)
  = lim_{∆→0} (E[h(n; θ + ∆)] − E[h(n; θ)]) / ∆    (6.3b)
  = lim_{∆→0} lim_{N→∞} Σ_{i=1}^N (h(n^i; θ + ∆) − h(n^i; θ)) / (∆N)    (6.3c)

in which θ is a parameter of interest.¹ For purposes of simulation, we must truncate the number of simulations N at some (hopefully) large, finite value. Then perhaps the easiest approximation to solving equation (6.3) is to fix ∆ at some nonzero value and use, for example, a forward finite difference scheme:

s̄ = Σ_{i=1}^N (h(n^i; θ + ∆, Ω₁^i) − h(n^i; θ, Ω₂^i)) / (∆N) + O(∆) + O(N^{-0.5})/∆    (6.4)

Here, Ω₁^i and Ω₂^i refer to the strings of random numbers used in the ith simulation. As noted by Fu and Hu [37] in their introduction, we can reduce the variance of this estimate by taking Ω₁^i = Ω₂^i, that is, by using the same seed for both simulations. However, finite difference methods are inherently biased estimators because the O(∆) term does not go to zero as N → ∞. Additionally, these methods can suffer tremendously from finite simulation error. If h̄(n) can only be reconstructed to several significant figures, we must choose a large value for the perturbation ∆; otherwise the O(N^{-0.5})/∆ term dominates the estimate (6.4).
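A minimal sketch of this variance-reduction strategy follows, assuming a user-supplied function simulate(theta, seed) that returns h(n) for one sample path; the function name and the reported standard error are illustrative.

import numpy as np

def fd_sensitivity(simulate, theta, delta=1e-2, N=50):
    """Forward finite difference (6.4) with common random numbers."""
    diffs = []
    for i in range(N):
        # Reusing seed i couples the nominal and perturbed sample paths
        # (Omega_1 = Omega_2), cancelling much of the Monte Carlo noise.
        h_plus = simulate(theta + delta, seed=i)
        h_nom = simulate(theta, seed=i)
        diffs.append((h_plus - h_nom) / delta)
    return np.mean(diffs), np.std(diffs, ddof=1) / np.sqrt(N)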

Alternatively, we could seek to derive unbiased sensitivity estimates from the simulated sample paths alone. Accordingly, we would like to be able to justify the interchange of expectation and differentiation in equation (6.3). This particular problem has been well characterized in the field of perturbation analysis; see, for example, Ho and Cao [63] and Cassandras and Lafortune [18]. When n is discrete, it is clear that for any finite N we can always choose a ∆⁺ > 0 such that

Σ_{i=1}^N (h(n^i; θ + ∆) − h(n^i; θ)) / (∆N) = 0  if 0 < ∆ < ∆⁺    (6.5)

To overcome this problem, we must devise a means to make the sample paths continuous so that the exchange of expectation and differentiation is valid. In this chapter, we consider smoothing by both conditional expectation (smoothed perturbation analysis) and integration.

¹This analysis can easily be extended to multiple parameters. We choose to examine a single parameter for notational simplicity.

6.1 Smoothed Perturbation Analysis

Smoothed perturbation analysis or SPA “smooths” discrete sample paths by using conditional expectation. Choosing a characterization z for each simulated sample path, we see that

h̄(n; θ) = Σ_n h(n; θ) P(n; θ)    (6.6)
        = Σ_n Σ_z h(n; θ) P(n, z; θ)    (6.7)
        = Σ_n Σ_z h(n; θ) P(n, z; θ) P(z)/P(z)    (6.8)
        = Σ_n Σ_z h(n; θ) P(n|z; θ) P(z)    (6.9)
        = Σ_z Σ_n h(n; θ) P(n|z; θ) P(z)    (6.10)
        = Σ_z P(z) Σ_n h(n; θ) P(n|z; θ)    (6.11)
        = Σ_z P(z) E[h(n; θ)|z; θ]    (6.12)

Because E[h(n; θ)|z; θ] varies continuously with θ, we can evaluate the desired sensitivity by differentiating both sides of equation (6.12)

∂/∂θ h̄(n; θ) = ∂/∂θ Σ_z P(z) E[h(n; θ)|z; θ]    (6.13)
s̄ = Σ_z P(z) ∂/∂θ E[h(n; θ)|z; θ]    (6.14)
  = Σ_z P(z) lim_{∆→0} (E[h(n; θ)|z; θ + ∆] − E[h(n; θ)|z; θ]) / ∆    (6.15)

Because each Monte Carlo sample path determines the characterization z, equation (6.15) corresponds to first differentiating the smoothed sample paths, then averaging the results. We refer the interested reader to Fu and Hu [37] for the proofs of the unbiasedness of this estimator. The remaining questions are how to choose the characterization z, and how to evaluate the conditional expectation E[h(n; θ)|z; θ]. We examine these issues further by considering two motivating examples.

We consider the example of flipping a coin. Define S_n to be the sum of n independent flips, in which

S_n = Σ_{j=1}^n x_j    (6.16)
P(X = 0) = θ    (6.17)
P(X = 1) = 1 − θ    (6.18)
0 ≤ θ ≤ 1    (6.19)

Here, xj is the jth realization of the random variable X. It is straightforward to show that

E[S_n] = Σ_{j=1}^n E[x_j]    (6.20)

We are interested in calculating the sensitivity of S_n with respect to the parameter θ. It is easy to show that

E[S_n] = Σ_{j=1}^n (1 − θ) = n(1 − θ)    (6.21)

∂E[S_n]/∂θ = −n    (6.22)

For the sake of illustration, we compute the SPA estimate for this process. We choose the characterization z^i to be the ith outcome of the n flips given the nominal parameter value θ, e.g. z^i = {x₁^i(θ) = 0, . . . , x_n^i(θ) = 1}. Then equation (6.20) becomes

E[S_n] = Σ_{i=1}^N P(z^i) E[S_n^i | z^i]    (6.23)
       = (1/N) Σ_{i=1}^N E[S_n^i | z^i]    (6.24)
       = (1/N) Σ_{i=1}^N Σ_{j=1}^n E[x_j^i | z^i]    (6.25)

We turn our attention to calculating the quantity E[x_j^i | z^i]. Suppose that the jth flip yields x_j(θ) = 0. Because the n flips are independent, each x_j depends only on the jth element of the characterization z. Therefore, we must calculate the conditional probabilities P(x_j(θ + ∆) = 0 | x_j(θ) = 0) and P(x_j(θ + ∆) = 1 | x_j(θ) = 0). To do so, we use the fact that the random variable X(θ) can be written in terms of the uniform distribution U

P(X(θ) = 0) = P(U < θ)    (6.26)

Assuming that the parameter perturbation ∆ > 0,² we evaluate the conditional probabilities:

P(x_j(θ + ∆) = 0 | x_j(θ) = 0) = P(U < θ + ∆ | U < θ)    (6.27)
                               = P(U < θ + ∆, U < θ) / P(U < θ)    (6.28)
                               = 1    (6.29)

P(x_j(θ + ∆) = 1 | x_j(θ) = 0) = P(U > θ + ∆ | U < θ)    (6.30)
                               = 0    (6.31)

Then the desired conditional expectation is

∂E[x_j(θ)|z]/∂θ = lim_{∆→0} [ (P(x_j(θ + ∆) = 0 | x_j(θ) = 0) − 1)/∆ · 0
                  + (P(x_j(θ + ∆) = 1 | x_j(θ) = 0) − 0)/∆ · 1 ]    (6.32, 6.33)
                = 0    (6.34)

Alternatively, we consider the case in which the jth flip yields x_j = 1. Then the conditional probabilities are

P(x_j(θ + ∆) = 0 | x_j(θ) = 1) = P(U < θ + ∆ | U > θ)    (6.35)
                               = P(θ < U < θ + ∆) / P(U > θ)    (6.36)
                               = ∆ / (1 − θ)    (6.37)

P(x_j(θ + ∆) = 1 | x_j(θ) = 1) = P(U > θ + ∆ | U > θ)    (6.38)
                               = P(U > θ + ∆, U > θ) / P(U > θ)    (6.39)
                               = (1 − (θ + ∆)) / (1 − θ)    (6.40)

and the desired conditional expectation is

∂E[x_j(θ)|z]/∂θ = lim_{∆→0} [ (P(x_j(θ + ∆) = 0 | x_j(θ) = 1) − 0)/∆ · 0
                  + (P(x_j(θ + ∆) = 1 | x_j(θ) = 1) − 1)/∆ · 1 ]    (6.41, 6.42)
                = lim_{∆→0} −∆ / (∆(1 − θ))    (6.43)
                = −1/(1 − θ)    (6.44)

²We can also calculate the conditional expectations assuming that ∆ < 0.

Parameter                       Symbol   Value
Finite difference perturbation  ∆        0.1
Number of simulations           N        50
Coin flip probability           θ        0.25

Table 6.1: Parameters for the coin flip example

[Figure 6.1 appears here: E[S_n] versus the number of coin flips n, exact and Monte Carlo.]

Figure 6.1: Mean E[S_n] as a function of the number of coin flips n.

From this analysis, it is clear that the only trials that impact the sensitivity are those in which xj(θ) = 1. Our estimator of the sensitivity is then

N  n  ∂E[Sn] 1 X X 1 = 1 xi (θ) − 1 (6.45) ∂θ N  1 − θ j  i=1 j=1 ( 1 if δ = 0 1 {δ} = (6.46) 0 otherwise

We now examine the numerical results for this process via simulation. We use the parameters given in Table 6.1. Figure 6.1 plots the average E[S_n] as a function of the number of flips n. Figure 6.2 plots the exact, SPA, and finite difference sensitivities as a function of the number of flips n. This figure illustrates that the SPA estimate varies significantly less than the finite difference estimate. In fact, the SPA sensitivity appears to have roughly the same amount of error as the simulated estimate of the mean.
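As a quick numerical check, the following sketch evaluates the SPA estimator (6.45) by direct simulation and compares it with the exact value −n; the settings mirror Table 6.1 but all names are otherwise illustrative.

import numpy as np

theta, n, N = 0.25, 10, 50
rng = np.random.default_rng(0)

spa = 0.0
for _ in range(N):
    u = rng.random(n)
    x = (u >= theta).astype(float)       # x_j = 0 with probability theta, else 1
    # Only flips with x_j = 1 contribute, each by -1/(1 - theta); see (6.45).
    spa += -np.sum(x) / (1.0 - theta)
spa /= N

print(spa, -n)    # the SPA estimate should be close to the exact value -n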

[Figure 6.2 appears here: ∂E[S_n]/∂θ versus the number of coin flips n, with exact, SPA, and finite difference curves.]

Figure 6.2: Mean sensitivity ∂E[S_n]/∂θ as a function of the number of coin flips n.

6.1.2 State-Dependent Simulation Example

In the previous example, each of the n flips is independent, and the probability of choosing heads or tails depends solely on the parameter θ. We now consider calculating the sensitivity for a Markov chain in which the transition probabilities are state dependent, e.g.

n_{k+1} = n_k + ν₁ if U < H₁(n_k; θ), or n_k + ν₂ if U > H₁(n_k; θ)    (6.47)

n_k = [A_k  B_k]^T    (6.48)

ν = [ν₁  ν₂] = [−1  1; 1  −1]    (6.49)

r₁(n_k; θ) = k̄₁ A_k / (1 + θ A_k)    (6.50)

r₂(n_k; θ) = k̄₂ B_k    (6.51)

H_j(n_k; θ) = r_j(n_k; θ) / Σ_{m=1}^2 r_m(n_k; θ)    (6.52)
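A sketch of simulating this chain appears below; the values of k̄₁, k̄₂, θ, and the initial state are illustrative choices, not taken from the text.

import numpy as np

def simulate_chain(n0, theta, n_steps, k1bar=1.0, k2bar=0.5, seed=0):
    """Propagate the state-dependent chain (6.47)-(6.52) for n_steps decisions."""
    rng = np.random.default_rng(seed)
    nu1, nu2 = np.array([-1, 1]), np.array([1, -1])   # columns of nu in (6.49)
    n = np.array(n0, dtype=float)
    path = [n.copy()]
    for _ in range(n_steps):
        r1 = k1bar * n[0] / (1.0 + theta * n[0])      # transition rate (6.50)
        r2 = k2bar * n[1]                             # transition rate (6.51)
        H1 = r1 / (r1 + r2)                           # decision probability (6.52)
        n = n + (nu1 if rng.random() < H1 else nu2)   # discrete decision (6.47)
        path.append(n.copy())
    return np.array(path)

path = simulate_chain(n0=[20, 0], theta=0.05, n_steps=10)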

We consider a total of n discrete decisions. One characterization for this system is z^i = {n₀, v₀^i(θ), . . . , v_{n−1}^i(θ)}; namely, the initial state n₀ and the string of discrete decisions v_j^i(θ) for the ith simulation. We note that the simulation uses random numbers to generate the discrete decisions (the v_j(θ)'s). Identifying the discrete decisions by the v_j(θ)'s is more conducive to calculating sensitivities than the string of random numbers. We turn our attention to calculating the quantity E[n₁^i | z^i].

[Figure 6.3 appears here: nominal and perturbed sample paths of x_k versus time.]

Figure 6.3: Comparison of nominal and perturbed path for SPA analysis

[Figure 6.4 appears here: the uniform random number U on [0, 1] relative to the thresholds H(n₀, θ + ∆) < H(n₀, θ).]

Figure 6.4: SPA analysis of the discrete decision. Given a positive perturbation ∆ > 0, H(n0, θ + ∆) < H(n0, θ). Therefore if decision v2 is chosen given the nominal parameter θ, no perturbed parameter can change the choice to v1.

Suppose that the first decision yields v₀(θ) = ν₁. We again ask the same question: what if we had chosen ν₂ instead of ν₁? Figure 6.3 illustrates this question, in which a new perturbed path deviates from the nominal path. Therefore, we must calculate the conditional probabilities P(v₀(θ + ∆, n₀) = ν₁ | v₀(θ, n₀) = ν₁) and P(v₀(θ + ∆, n₀) = ν₂ | v₀(θ, n₀) = ν₁). To do so, we again use the fact that the random variable v₀(θ, n₀) can be written in terms of the uniform distribution U

P(v₀(θ, n₀) = ν₁) = P(U < H₁(θ, n₀))    (6.53)

Assuming that the parameter perturbation ∆ > 0,³ we evaluate the conditional probabilities:

P(v₀(θ + ∆, n₀) = ν₁ | v₀(θ, n₀) = ν₁) = P(U < H₁(θ + ∆, n₀) | U < H₁(θ, n₀))    (6.54)
    = P(U < H₁(θ + ∆, n₀), U < H₁(θ, n₀)) / P(U < H₁(θ, n₀))    (6.55)
    = H₁(θ + ∆, n₀) / H₁(θ, n₀)    (6.56)

P(v₀(θ + ∆, n₀) = ν₂ | v₀(θ, n₀) = ν₁) = P(U > H₁(θ + ∆, n₀) | U < H₁(θ, n₀))    (6.57)
    = P(H₁(θ + ∆, n₀) < U < H₁(θ, n₀)) / P(U < H₁(θ, n₀))    (6.58)
    = (H₁(θ, n₀) − H₁(θ + ∆, n₀)) / H₁(θ, n₀)    (6.59)

We note that if ν₂ is chosen, no perturbation ∆ > 0 could change the decision; see Figure 6.4 for an illustration. Define

n_k = n₀ + Σ_{j=0}^{k−1} v_j(θ, n_j)    (6.60)
n̂_k = n₀ + Σ_{j=0}^{k−1} v_j(θ + ∆, n̂_j)    (6.61)

Then the desired conditional expectation is

∂E[n₁(θ)|z]/∂θ = lim_{∆→0} [ P(v₀(θ + ∆, n₀) = ν₁ | v₀(θ, n₀) = ν₁)/∆ · (n₁ − n₁)
                  + P(v₀(θ + ∆, n₀) = ν₂ | v₀(θ, n₀) = ν₁)/∆ · (n̂₁ − n₁) ]    (6.62, 6.63)
                = lim_{∆→0} (H₁(θ, n₀) − H₁(θ + ∆, n₀)) / (∆ H₁(θ, n₀)) · (n̂₁ − n₁)    (6.64)
                = −(∂H₁(θ, n₀)/∂θ) / H₁(θ, n₀) · (ν₂ − ν₁)    (6.65)

In general, we are interested in calculating the sensitivity of E[nj(θ)|z], i.e.

∂E[n_j(θ)|z]/∂θ = lim_{∆→0} (1/(N∆)) Σ_{i=1}^N Σ_{v_{j−1}} · · · Σ_{v₀} P(n̂_j(θ + ∆) | z^i(θ)) (n̂_j − n_j)    (6.66)

so we must consider the probability P(n̂_j(θ + ∆) | z^i(θ)).

³We can also calculate the conditional expectations assuming that ∆ < 0.

[Figure 6.5 appears here: a nominal path of x_k with perturbed paths branching from it at each decision.]

Figure 6.5: Illustration of the branching nature of the perturbed path for SPA analysis.

Using the properties of conditional densities and Markov chains, we have

P(n̂_j(θ + ∆) | z^i(θ)) = Σ_{v₁} · · · Σ_{v_{j−1}} P(v_j, . . . , v₁; θ + ∆ | z^i(θ))    (6.67)
    = Σ_{v₁} · · · Σ_{v_{j−1}} P(v_j; θ + ∆ | z^i(θ), v_{j−1}, . . . , v₁; θ + ∆) · · · P(v₁; θ + ∆ | z^i(θ))    (6.68)

in which we use the notation P(·; θ) to denote that the quantity P(·) is a function of the parameter θ. It is clear that this process branches at every discrete decision, as shown in Figure 6.5, and that we must follow each of these branches with nonzero weight throughout the duration of the simulation.

Figures 6.6 and 6.7 plot the mean and sensitivity comparison for this example. The SPA estimate demonstrates superior reconstruction of the sensitivity in comparison to finite differences, albeit at a greater computational expense. Surprisingly, though, the SPA estimate for each sample path did not require tracking perturbed paths for all possible state combinations due to the coalescing of many perturbed paths. However, we do not expect this feature to hold for models that span larger dimensions, particularly those that include more discrete decisions of the form

n_{k+1} = n_k + ν₁ if U < H₁(n_k; θ);  n_k + ν₂ if H₁(n_k; θ) < U ≤ H₂(n_k; θ);  . . . ;  n_k + ν_m if U > H_{m−1}(n_k; θ)    (6.69)

Accordingly, we could consider using a particle filter to track the perturbed paths.

[Figure 6.6 appears here: E[n_k] versus the number of decisions k, exact and Monte Carlo.]

Figure 6.6: Mean E[n_k] as a function of the number of decisions k.

[Figure 6.7 appears here: ∂E[n_k]/∂θ versus the number of decisions k, with exact, SPA, and finite difference curves.]

Figure 6.7: Mean sensitivity ∂E[n_k]/∂θ as a function of the number of decisions k.

6.2 Smoothing by Integration

In some cases, the SPA estimate is not easily calculable. Consequently, we are interested in simpler means of calculating sensitivities. As noted in the previous section, conditional expectation provides one means of “smoothing” discrete sample paths. In some sense, expectation may be viewed as an integration over the state space. For some systems, it may be more advantageous to integrate over a variable other than the state space.

In this section, we consider the simple example of a state-dependent timing event with no discrete decisions. In particular, we address calculation of sensitivities for stochastic chemical kinetics given only one reaction. In this case, infinitesimal parameter perturbations take effect only when the single reaction occurs. To begin this analysis, we examine the first possible reaction. We can then define the discrete state n as

n(t; θ) = n₀ for t₀ ≤ t < t₀ + τ₁, and n₀ + ν for t ≥ t₀ + τ₁    (6.70)

τ₁ = −log(p₁) / r_tot(n₀; θ)    (6.71)

in which t₀ is the initial time, n₀ is the initial state, t₀ + τ₁ is the next reaction time, ν is the stoichiometric matrix, and θ is a vector of parameters. We can write equation (6.70) alternatively as

n(t; θ) = n₀ + ν ∫₀^{t−t₀} δ(t′ − τ₁(n₀; θ)) dt′    (6.72)

The smoothing trick that we apply here is to define an integrated sensitivity s^I in terms of the state integrated with respect to time, i.e.

s^I ≜ ∂/∂θ^T ( ∫_{t₀}^t n(t′; θ) dt′ )    (6.73)

Integrating equation (6.72) with respect to time yields

∫_{t₀}^t n(t*; θ) dt* = ∫_{t₀}^t ( n₀ + ν ∫₀^{t*−t₀} δ(t′ − τ₁(n₀; θ)) dt′ ) dt*    (6.74)

We can differentiate equation (6.74) with respect to the parameters θ to yield

s^I(t; θ) = s₀^I + ν (τ₁ / r_tot(n₀; θ)) (∂r_tot(n₀; θ)/∂θ^T) ∫₀^{t−t₀} δ(t′ − τ₁(n₀; θ)) dt′    (6.75)

We can similarly show for an arbitrary number µ of reactions that

∫_{t₀}^t n(t*; θ) dt* = ∫_{t₀}^t ( n₀ + ν Σ_{j=1}^µ ∫₀^{t*−t₀} δ(t′ − τ_j(n₀ + (j−1)ν; θ)) dt′ ) dt*    (6.76)

s^I(t; θ) = s₀^I − ν Σ_{j=1}^µ (∂τ_j(n₀ + (j−1)ν; θ)/∂θ^T) ∫₀^{t−t₀} δ(t′ − τ_j(n₀ + (j−1)ν; θ)) dt′    (6.77)

Convergence of s^I to the integral of the mean sensitivity follows from the law of large numbers. We consider a simple example to illustrate this technique. The single reaction is

2A --k--> B    (6.78)

[Figure 6.8 appears here: (a) the mean x versus time, exact and simulated; (b) the integrated sensitivity s^I versus time, exact and simulated.]

Figure 6.8: Comparison of the exact and simulated (a) mean and (b) mean integrated sensitivity for the irreversible reaction 2A → B.

with the reaction elementary as written. The initial condition is n_A = 20 and n_B = 0 molecules, and the parameter value is k = 1/15. We solve for the mean and its integrated sensitivity both exactly (via solution of the master equation) and by Monte Carlo reconstruction. For the latter case, we average fifty simulations to reconstruct the mean behavior and apply equation (6.77) to evaluate the integrated sensitivity. Figure 6.8 presents the results for this case, and demonstrates excellent agreement between the exact and reconstructed values for both the mean and the integrated sensitivity.

In general, we require values for the sensitivity rather than the integrated sensitivity. There are numerous possibilities for deriving this quantity. For example, a polynomial can be fitted through the integrated sensitivity; the derivative of this fitted polynomial then provides an estimate for the desired sensitivity. As seen in Figure 6.8 (b), however, the reconstructed integrated sensitivity can be noisy. Therefore, we recommend against low-order differencing of the integrated sensitivity because such differencing amplifies noise.

6.3 Sensitivity Calculation for Stochastic Chemical Kinetics

Thus far, we have considered calculation of sensitivities first for the discrete-time case (choosing only from a finite number of discrete events for the next reaction), and then for the time-dependent case with no discrete event selection. The stochastic chemical kinetics problem, however, is a combination of both time-dependent and discrete events. We envision that this problem could be addressed using the tools presented in previous sections, namely smoothed perturbation analysis (for the “which reaction” choice) and smoothing by integration (for the timing of the reaction). The problematic part, however, is implementing SPA. Because time is continuous, the discrete events do not occur at the same time in every simulation, as was the case in the discrete-time setting. Hence there is no fortuitous coalescing of perturbed paths; in fact, one must nominally track every generated perturbed path to obtain the SPA estimate. Such a task seems computationally unreasonable for all but the simplest models. One potential means around this problem is to bound the computational expense by tracking only the paths that contribute the most to the SPA estimate. However, the problem of continuous time again appears because the perturbed paths are potentially at different time points in the simulation, making comparison of these paths difficult.

6.4 Conclusions and Future Directions

This chapter explored methods for solving the sensitivity of moments of the master equation from simulation via smoothing. We first examined smoothing by conditional expectation, or smoothed perturbation analysis, to address the case of sensitivities for time-independent, discrete event systems. We then applied smoothing by time integration to account for the effect of parameters on the timing of continuous events. Finally, we briefly examined how one might apply these two methods to evaluate sensitivities for stochastic chemical kinetics. As of the writing of this thesis, we do not know of a satisfactory method for efficiently evaluating unbiased estimates of the sensitivity of moments of the discrete master equation. Thus we are forced to conclude that the best options for evaluating these sensitivities are either the approximation proposed previously in Chapter 5, or finite differences. We can, however, speculate on possible methods as presented next.

We speculate that directly solving the companion sensitivity equation to the master equation may offer some hope for calculation of unbiased sensitivities. Considering all the individual probabilities and their respective sensitivities, i.e.

P(n; θ) = [P(n₀; θ)  P(n₁; θ)  · · ·]^T    (6.79)

S(n; θ) = ∂P(n; θ)/∂θ^T    (6.80)

we can write the evolution equations for the master equation and its sensitivity as the linear systems

dP(n; θ)/dt = A(θ) P(n; θ)    (6.81)
dS(n; θ)/dt = A(θ) S(n; θ) + J(θ) P(n; θ)    (6.82)
J(θ) = ∂A(θ)/∂θ    (6.83)

Integrating equation (6.82) with respect to time yields the convolution integral

S(n, t; θ) = e^{A(θ)t} S(n; θ)|₀ + ∫₀^t e^{A(θ)(t−t′)} J(θ) P(n, t′; θ) dt′    (6.84)

The primary drawback to this method is that the sensitivity equation (6.84) has the same large dimensionality as the master equation, the same problem that forced us to use simulation to solve the master equation. We can attempt to solve the sensitivity equation using simulation, but this method also suffers several drawbacks. First, the sensitivity is not a probability distribution, so we must recast the problem into a form conducive to solution by simulation. Even after doing so, solving for the sensitivity requires knowledge of the probability distribution, which is presumably reconstructed from simulation. Hence even if we could exactly solve for the sensitivity, the result would only be as accurate as the reconstructed probability density. The primary appeal of this method, though, is that the simulations used to reconstruct the probability density can also be used to evaluate the sensitivity, i.e. the convolution integral in equation (6.84). If we can efficiently store and retrieve this information, solving the sensitivity equation would require little or no additional simulation.
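For a state space small enough to enumerate, equations (6.81)-(6.84) can be solved directly. The toy sketch below does so for the single reaction A → B with rate constant θ and three initial A molecules; the generator, quadrature rule, and all names are illustrative choices.

import numpy as np
from scipy.linalg import expm

theta, t = 0.5, 2.0
nmax = 3                                    # enumerate the states nA = 0, 1, 2, 3

# Generator for dP/dt = A(theta) P with propensity theta * nA.
A = np.zeros((nmax + 1, nmax + 1))
for nA in range(1, nmax + 1):
    A[nA, nA] -= theta * nA                 # probability flows out of state nA
    A[nA - 1, nA] += theta * nA             # and into state nA - 1
J = A / theta                               # J = dA/dtheta, since A is linear in theta

P0 = np.zeros(nmax + 1)
P0[nmax] = 1.0                              # start from nA = 3 with probability one
P = expm(A * t) @ P0

# Sensitivity from the convolution integral (6.84), S(0) = 0, by midpoint quadrature.
S = np.zeros(nmax + 1)
tau = np.linspace(0.0, t, 201)
for t1, t2 in zip(tau[:-1], tau[1:]):
    tm = 0.5 * (t1 + t2)
    S += (t2 - t1) * (expm(A * (t - tm)) @ (J @ (expm(A * tm) @ P0)))

# Sensitivity of the mean copy number; exact answer is -nmax * t * exp(-theta * t).
print(np.arange(nmax + 1) @ S, -nmax * t * np.exp(-theta * t))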

Notation

H_j   jth transition probability
n     discrete state vector
N     number of simulations
P     probability
p_j   jth random number
P     vector of probabilities
r_j   jth transition rate
S_n   sum of n independent coin flips
S     matrix of probability sensitivity vectors
s^I   time-integrated sensitivity
t     time
U     uniform distribution
v_j   jth discrete decision
X     random variable
x_j   jth realization of the random variable X
z     characterization of a simulated trajectory
∆     finite difference perturbation
ν     stoichiometric matrix
τ     next reaction time
θ     parameter
θ     vector of parameters
Ω     random number string used for simulation

Chapter 7

Sensitivity Analysis of Stochastic Differential Equation Models

The purpose of this chapter is to develop and present methods for using stochastic differential equation models for purposes other than pure simulation. As a simulation tool, these types of models are becoming an increasingly popular method for introducing science and engineering students and researchers to the molecular world, in which random fluctuations are an important physical phenomenon to be captured in the model. If we consider systems-level tasks, such as parameter estimation, model-based feedback control, and process and product design, we require a different set of tools than those required for pure simulation. Many systems-level tasks are conveniently posed as optimization problems, and brute-force optimization of these highly “noisy” simulation models either fails outright or is so time consuming that the entire exercise becomes tedious and frustrating.

Simply attaching an optimization method to a stochastic simulation model is inefficient if, when the simulation is created, we do not consider the engineering task that might come later. We propose adding a small piece of code to the stochastic simulation that exactly computes the sensitivity of the trajectory to all model parameters of interest. These parameters may be kinetic parameters to be estimated from data or control decisions used to control the dynamic or steady-state behavior of the system.

Sensitivity analysis of stochastic differential equations (SDEs) is by no means a new concept. To the best of our knowledge, Dacol and Rabitz [23] first proposed such an analysis. These authors suggested using a Green’s function approach to solve for the sensitivity of moments of the underlying probability distribution. In this chapter, we propose differentiating simulated sample paths directly to calculate the same sensitivities. We first review the master equation of interest and define the sensitivity of moments of this equation with respect to model parameters. Next we propose and compare several methods for calculating these sensitivities with an eye on computational efficiency. Finally, we illustrate how to use the sensitivities for calculating parameter estimates, computing steady states, and computing quantities for polymer models.

7.1 The Master Equation

We consider the following master (Fokker-Planck) equation:

∂P(x, t; θ)/∂t = −Σ_{i=1}^l ∂/∂x_i (A_i(x; θ) P(x, t; θ)) + (1/2) Σ_{i=1}^l Σ_{j=1}^l ∂²/∂x_i∂x_j (B_{ij}(x; θ)² P(x, t; θ))    (7.1)

in which x is the state vector for the system, θ is the vector of parameters, t is time, P(x, t; θ) is the probability distribution function, A_i denotes the ith element of the vector A, and B_{ij} denotes the (i, j)th element of the matrix B. Many different boundary conditions are possible for this system (see, for example, Gardiner [41]); for this chapter, we use reflecting boundary conditions of the form

Σ_{i=1}^l [ A_i(x; θ) P(x, t; θ) + (1/2) Σ_{j=1}^l ∂/∂x_j (B_{ij}(x; θ)² P(x, t; θ)) ] = 0    (7.2)

unless specified otherwise. Defining the sensitivity S(x, t; θ) as

S(x, t; θ) ≜ ∂P(x, t; θ)/∂θ    (7.3)

we can differentiate equation (7.1) with respect to θ to obtain the sensitivity evolution equation

∂/∂θ (∂P(x, t; θ)/∂t) = ∂/∂θ [ −Σ_{i=1}^l ∂/∂x_i (A_i(x; θ) P(x, t; θ)) + (1/2) Σ_{i=1}^l Σ_{j=1}^l ∂²/∂x_i∂x_j (B_{ij}(x; θ)² P(x, t; θ)) ]    (7.4)

∂S(x, t; θ)/∂t = −Σ_{i=1}^l ∂/∂x_i ( (∂A_i(x; θ)/∂θ) P(x, t; θ) + A_i(x; θ) S(x, t; θ) )
    + (1/2) Σ_{i=1}^l Σ_{j=1}^l ∂²/∂x_i∂x_j ( 2 B_{ij}(x; θ) (∂B_{ij}(x; θ)/∂θ) P(x, t; θ) + B_{ij}(x; θ)² S(x, t; θ) )    (7.5)

Clearly solution of equation (7.5) requires the solution of equation (7.1), but not vice versa. In general, we are interested in moments of the probability distribution, i.e.

ḡ(x) = ∫_x g(ω) P(ω, t; θ) dω    (7.6)

in which g(x) and ḡ(x) are vectors. For example, we might seek to implement control moves that drive the mean system behavior towards a desired set point. Such tasks require knowledge of how sensitive these moments are with respect to the parameters. The master equation (7.1) indicates that the probability distribution evolves continuously with time; consequently, moments of this distribution (assuming that they are well defined) evolve continuously as well.

Therefore we can simply differentiate equation (7.6) with respect to the parameters to define the sensitivity of these moments, s(g(x)), as follows:

∂/∂θ^T ḡ(x) = ∂/∂θ^T ∫_x g(ω) P(ω, t; θ) dω    (7.7)
s̄(g(x), t; θ) = ∫_x g(x) S(x, t; θ)^T dx    (7.8)

Here, s̄(g(x), t; θ) is a matrix. Equation (7.8) indicates that these sensitivities depend upon the sensitivity of the master equation, S(x, t; θ). Therefore, the exact solution of s̄(g(x)) requires simultaneous solution of equations (7.1), (7.5), and (7.8). As opposed to exactly solving for the desired moments of the master equation, we can reconstruct these moments via simulation. The master equation (7.1) has the Itô solution

dx_i = A_i(x; θ) dt + Σ_{j=1}^l B_{ij}(x; θ) dW_j    (7.9)

in which W is a vector of Wiener processes. We can simulate trajectories of equation (7.9) using, for example, an Euler scheme [40], then tabulate this trajectory information to reconstruct the desired moments (7.6) by applying the law of large numbers

ḡ(x) = ∫_x g(ω) P(ω, t; θ) dω = lim_{N→∞} (1/N) Σ_{i=1}^N g(x^i) ≈ (1/N) Σ_{i=1}^N g(x^i) for finite N    (7.10)

in which N is the number of simulated trajectories and x^i is the value of the state for the ith simulation. Logically, then, we could also attempt to reconstruct the sensitivities from the simulated sample paths alone. This analysis requires some care; in particular, we must justify interchanging the operators of expectation and differentiation, i.e.

lim_{∆→0} (E[g(x^i; θ + ∆, Ω₁)] − E[g(x^i; θ, Ω₂)]) / ∆ = E[ lim_{∆→0} (g(x^i; θ + ∆, Ω₁) − g(x^i; θ, Ω₂)) / ∆ ]    (7.11)

in which E(·) denotes the expectation operator and Ω₁ and Ω₂ refer to the random numbers used to generate the desired expectations. Because the individual sample paths are continuous, the interchange is justifiable and we can merely differentiate equation (7.9) with respect to θ

 l  ∂ ∂ X dx = A (x; θ)dt + B (x; θ)dW (7.12) ∂θ i ∂θ  i ij j j=1 l   ∂Ai(x; θ) ∂Ai(x; θ) X ∂Bij(x; θ) ∂Bij(x; θ) ds = s + dt + s + dW (7.13) i ∂x i ∂θ ∂x i ∂θ j j=1 106 in which si is defined as ∂xi s (7.14) i , ∂θ Consequently, we can evaluate the desired moments and sensitivities of these moments by si- multaneously evaluating equations (7.9) and (7.13). Additionally, we have the choice of using either the same strings of random numbers for evaluation of equation (7.11), i.e. Ω1 = Ω2, or different strings of random numbers. The former case corresponds to differentiating individ- ual sample paths and consequently using the same values for the state and Brownian increments to evaluate the desired moments and their sensitivities. This subtle distinction actually results in dramatic differences in the evaluated sensitivities as pointed out by Fu and Hu [37]. We illustrate this point in the examples.

7.2 Sensitivity Examples

We now consider two motivating examples comparing parametric and finite difference sensitivities. The first example is a single, reversible reaction that demonstrates the accuracy of the parametric and finite difference sensitivities. The second example consists of the Oregonator reactions and illustrates the superiority of the parametric sensitivity over finite differences.

7.2.1 Simple Reversible Reaction

We consider the reversible reaction

2A ⇌ B (forward rate constant k₁, reverse rate constant k₂),  ε̇ = 0.5 k₁ c_A (c_A − 1) − k₂ c_B    (7.15)

in which ε denotes the extent of reaction. Parameter values for this example are given in Table 7.1. We solve the master equation (7.1) and its sensitivity (7.5) for the extent of reaction ε by using finite differences to discretize the ε dimension (∆ε = 2), then using DASKR (a variant of the package DASPK [15]) to integrate the resulting system of differential-algebraic equations. We also use simulation to evaluate the mean of the stochastic differential equation (7.9) and its sensitivity (7.13). Here, we use a first-order Euler integration with a time increment of ∆t = 10^{-2}. We reconstruct the mean with ten simulations.

Figure 7.1 compares the mean results for the master equation, parametric sensitivity, and finite difference sensitivity. For this figure, we have chosen a central finite difference scheme with a perturbation of 10^{-8} of the parameter value. Figure 7.1 (a) demonstrates that ten simulations yield a reasonable approximation to the mean. Figures 7.1 (b) and (c) illustrate that the parametric and finite difference mean sensitivities yield indistinguishable results (to the scale of the graph), and that these results are similar to those of the master equation.

Figure 7.2 compares the mean sensitivity of parameter k₁ for the master equation and finite difference sensitivity. Rather than use the same random numbers for evaluation of the sensitivity, we evaluate the perturbed expectations using different strings of random numbers (i.e. Ω₁ ≠ Ω₂ in equation (7.11)). Figure 7.2 presents these results for a parameter perturbation of 10%. The finite difference result is substantially noisier than when using the same strings of random numbers for each perturbation (e.g. Figure 7.1 (b)). The noise directly results from the error due to the finite number of simulations used to reconstruct the mean. In fact, the finite simulation error completely swamps the sensitivity calculation when using a parameter perturbation of 1% or less. This result underscores the importance of differentiating the sample trajectories to obtain sensitivity information.

Parameter                        Value
k₁                               1/450
k₂                               1/30
P(c_A = 150, c_B = 25, t = 0)    1

Table 7.1: Parameter values for the simple reversible reaction.

7.2.2 Oregonator

We now consider the Oregonator system of reactions [32]

W + B --k₁--> A    (7.16)
A + B --k₂--> X    (7.17)
Y + A --k₃--> 2A + C    (7.18)
2A --k₄--> Z    (7.19)
C --k₅--> B    (7.20)

Reactions are elementary as written. Parameters for this system are given in Table 7.2. It is assumed that concentrations of species W and Y remain constant throughout the reaction. Additionally, we track only species A, B, and C since species X and Z are products. We use simulation to evaluate a single trajectory of the stochastic differential equation (7.9) and its sensitivity (7.13). Here, we use a first-order Euler integration with a time increment of ∆t = 10^{-3}. The initial condition is P(c_A = 500, c_B = 1000, c_C = 2000, t₀) = 1. Figure 7.3 presents the results for this example. Figure 7.3 (a) demonstrates that for the given set of parameters, these reactions yield a stable, oscillatory response. Although Figure 7.3 (b) shows good visual agreement between the parametric and finite difference sensitivities, plot (c) clearly shows that the difference between these two sensitivities is actually increasing with time even though the finite difference perturbation is small (10^{-8} of the parameter k₁).

[Figure 7.1 appears here: (a) mean numbers of molecules of A and B versus time, exact and simulation; (b) sensitivities s_{k₁} versus time; (c) sensitivities s_{k₂} versus time.]

Figure 7.1: Results for the simple reversible reaction: (a) comparison of the exact and reconstructed mean (by simulation); (b) comparison of the exact, parametric, and finite difference (FD) sensitivities for parameter k₁; and (c) comparison of the exact, parametric, and finite difference (FD) sensitivities for parameter k₂. Here, the finite difference perturbation is 10^{-8} of each parameter.

[Figure 7.2 appears here: sensitivity s_{k₁} for species A and B versus time, exact and finite difference.]

Figure 7.2: Results for the simple reversible reaction: comparison of the exact and finite difference (FD) sensitivities for parameter k₁ using different random numbers for each finite difference expectation. Here, the finite difference perturbation is 10^{-1} of the parameter k₁.

Parameter   Value
k₁ c_W      2
k₂          0.1
k₃ c_Y      10^4
k₄          0.016
k₅          26

Table 7.2: Parameter values for the Oregonator system of reactions.

7.3 Applications of Parametric Sensitivities

We now turn our attention to applications of parametric sensitivities. We first consider estimating parameters for the simple, reversible reaction of section 7.2.1. We then perform steady-state analysis for the Oregonator reactions of section 7.2.2. Finally, we use parametric sensitivities to evaluate the viscosity of a simple dumbbell model.

7.3.1 Parameter Estimation

The goal of parameter estimation is to determine the set of parameters that best reconciles the measurements with model predictions. The classical approach is to assume that measurements are corrupted by normally distributed noise. Accordingly, we calculate the optimal parameters via the least squares optimization

[Figure 7.3 appears here: (a) concentrations c_A, c_B, and c_C versus time for one trajectory; (b) sensitivity with respect to k₁ versus time; (c) difference ∆ between the parametric and finite difference sensitivities versus time.]

Figure 7.3: Results for one trajectory of the Oregonator cyclical reactions: (a) simulated trajectory, (b) parametric and finite difference (FD) sensitivity for parameter k₁, and (c) difference between the parametric and finite difference sensitivities. Here, the finite difference perturbation is 10^{-8} of the parameter.

min_θ Φ = (1/2) Σ_k e_k^T Π^{-1} e_k    (7.21a)

subject to:  x_{k+1} = F(x_k; θ)    (7.21b)

e_k = y_k − h(x_k)    (7.21c)

in which Φ is the objective function value, the e_k's denote the difference between the measurements y_k and the model predictions h(x_k), and Π is the covariance matrix for the measurement noise. For the optimal set of parameters, the gradient ∇_θΦ is zero. We can numerically evaluate the gradient according to

∇_θΦ = ∂/∂θ^T [(1/2) Σ_k e_k^T Π^{-1} e_k]    (7.22)
     = −Σ_k (∂h(x_k)/∂x^T · ∂x_k/∂θ^T)^T Π^{-1} e_k    (7.23)
     = −Σ_k s_k^T (∂h(x_k)/∂x^T)^T Π^{-1} e_k    (7.24)

Equation (7.24) indicates that the gradient depends upon s_k, the sensitivity of the state with respect to the parameters.

In general, most experiments do not include many replicates due to cost and time constraints. Therefore, the best experimental data we are likely to obtain is the average. In fitting these data to stochastic models governed by the master equation, we accordingly choose the mean x̄ as the state of interest. Monte Carlo simulation and parametric sensitivities provide estimates of the mean and its sensitivity. For the sake of illustration, we obtain optimal parameter estimates using an unconstrained, line-search optimization with BFGS Hessian update; for further details on this method, we refer the interested reader to Nocedal and Wright [97]. Here, we provide the optimizer with both the objective function and the gradient given in equations (7.21) and (7.24), respectively. Although the Monte Carlo reconstruction of the mean is nominally “stochastic”, by reusing the same string of random numbers for every optimization iteration the objective function given in equation (7.21) becomes, in a sense, deterministic. Additionally, the objective function is continuous with respect to the parameters. Some care must be taken to ensure that the string of random numbers used by the optimization gives a representative reconstruction of the mean (recall that the finite number of simulations introduces some error in the reconstructed mean). Practically, this condition can be checked by optimizing the model with several different random number strings.

We reconsider the simple reversible reaction of section 7.2.1. We assume that we can measure the average amount of c_B with a sampling time of ∆t = 0.2. “Experimental” data are generated using the parameters given in section 7.2.1, with the exception that one hundred simulations are used to generate the mean behavior.
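The sketch below illustrates this fixed-seed strategy, assuming a user-supplied function simulate_mean(theta, seed) that returns the Monte Carlo mean of the measured species on the sampling grid; the function name and the use of scipy's BFGS are illustrative, not the thesis implementation.

import numpy as np
from scipy.optimize import minimize

def make_objective(y_meas, simulate_mean, seed=1234):
    """Least squares objective (7.21) with Pi = I and a frozen random seed."""
    def objective(log10_theta):
        theta = 10.0 ** log10_theta      # estimate log10 values for conditioning
        e = y_meas - simulate_mean(theta, seed=seed)
        return 0.5 * float(e @ e)
    return objective

# Example call, given measurements y_meas on the sampling grid:
# result = minimize(make_objective(y_meas, simulate_mean),
#                   x0=np.array([-2.0, -1.0]), method="BFGS")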

[Figure 7.4 appears here: (a) measured and predicted n_B versus time; (b) parameter estimates versus optimization iteration, approaching the true values of k₁ and k₂.]

Figure 7.4: Results for parameter estimation of the simple reversible reaction example: (a) comparison of the “experimental” (points) and predicted (line) measurements and (b) convergence of the optimized parameters (dashed lines) to the true values (solid lines) for the proposed scheme.

For the parameter estimation, we attempt to estimate the log base ten values of both k₁ and k₂, using a different seed for the random number generator than that used to generate the experimental data. We estimate log₁₀ values to prevent both numerical conditioning problems and negative estimates of the rate constants. Figure 7.4 presents the results of this estimation. The experimental and predicted measurements agree well, and the parameter estimates quickly converge to values close to the true ones. The offset between the estimated and true parameters is expected due to the finite simulation error, since different seeds are used to generate the experimental and predicted measurements.

To determine the accuracy of the optimization, we analyze both the gradient and the Hessian of the objective function. Differentiating the gradient (i.e. equation (7.24)) yields the Hessian ∇_{θθ}Φ

∇_{θθ}Φ = ∂/∂θ^T [ −Σ_k s_k^T (∂h(x_k)/∂x^T)^T Π^{-1} e_k ]    (7.25)
        = Σ_k s_k^T (∂h(x_k)/∂x^T)^T Π^{-1} (∂h(x_k)/∂x^T) s_k − Σ_k (∂h(x_k)/∂x^T · ∂²x_k/∂θ∂θ^T)^T Π^{-1} e_k    (7.26)

Making the usual Gauss-Newton approximation for the Hessian (i.e. ek ≈ 0), we obtain

∇_{θθ}Φ ≈ Σ_k s_k^T (∂h(x_k)/∂x^T)^T Π^{-1} (∂h(x_k)/∂x^T) s_k    (7.27)

For this optimization, the values of the gradient and the approximate Hessian are

∇_θΦ = [1.03 × 10^{-5}  −4.83 × 10^{-6}]    (7.28)

∇_{θθ}Φ = [ 1.36 × 10^5   −3.37 × 10^4
           −3.37 × 10^4    1.09 × 10^4 ]    (7.29)

Examining the eigenvalue/eigenvector (λ/ν) decomposition of the Hessian yields

λ₁ = 2.40 × 10^3,  ν₁ = [−0.245  −0.969]    (7.30)

λ₂ = 1.44 × 10^5,  ν₂ = [−0.969  0.245]    (7.31)

Because the gradient is reasonably small and the Hessian is positive definite (eigenvalues are positive), we conclude that the optimizer has indeed converged to a local minimum.

7.3.2 Calculating Steady States

Exact determination of steady states requires solving for the stationary state of the master equation (7.1). The difficulty of this task is comparable to that of solving the dynamic response. In this section, we use the result from section 5.4, which allows us to determine stationary points for moments of the underlying probability distribution given short bursts of simulation. The differences in the analysis presented here are that (1) the considered master equation is of the Fokker-Planck type, i.e. equation (7.1), and (2) sensitivities of the simulated moments can be determined exactly. We now apply this method to the Oregonator system of reactions previously presented in section 7.2.2.

For the steady-state calculation, we calculate the evolution of the mean using a short burst of simulation (∆_ss = 10^{-2}). We use an Euler integration with time increment ∆t = 10^{-3} to evaluate ∆_ss. One hundred simulations are used to reconstruct the mean. Figure 7.5 presents the convergence of the steady-state calculation per completed Newton iteration.

[Figure 7.5 appears here: estimated mean concentrations of A, B, and C versus Newton iteration.]

Figure 7.5: Results for steady-state analysis of the Oregonator reaction example: estimated state per Newton iteration.

The majority of the convergence occurs within the first five iterations. The calculated mean and sensitivity are

x̄ = [491.7  1008.6  1979.9]^T    (7.32)

s̄ = [ 1.031  −0.344  −0.036
     −0.654   0.748   0.179
      0.838  −0.146   0.780 ]    (7.33)

Analyzing the eigenvalues of the mean sensitivity yields

λ = [0.341  1.11 + 0.065i  1.11 − 0.065i]    (7.34)

which indicates by linear stability analysis (see Chen [21] for further details) that the steady state is unstable, as expected.

7.3.3 Simple Dumbbell Model of a Polymer in Solution

We now consider calculation of the zero-shear viscosity for a simple dumbbell model of a polymer molecule in solution. In this model, two beads are connected by a Hookean spring. We track the coordinates of each bead, in which

dx₁ = ( (H/ζ)(x₂ − x₁) + (∇v)^T x₁ ) dt + √(2D) dW₁    (7.35a)
dx₂ = ( −(H/ζ)(x₂ − x₁) + (∇v)^T x₂ ) dt + √(2D) dW₂    (7.35b)

∇v = [ 0 0 0; γ̇ 0 0; 0 0 0 ]    (7.35c)

in which x₁ and x₂ are the Cartesian coordinates of the beads, H is the spring constant, ζ is the friction coefficient, v is the velocity field, D is the diffusivity of each bead, and W is a vector of Wiener processes. The stress τ is defined as

τ = ⟨H q q^T⟩ − nkT δ    (7.36)

in which ⟨·⟩ denotes the expectation operator and q = x₁ − x₂. For this system, the viscosity η is

η = ∂τ₁₂/∂γ̇ |_{γ̇=0}    (7.37)

Defining γ̇ as the parameter of interest, the viscosity η clearly becomes a function of the sensitivities

s₁ = ∂x₁/∂γ̇    (7.38)
s₂ = ∂x₂/∂γ̇    (7.39)

Parameter             Symbol   Value
Friction coefficient  ζ        1
Diffusivity           D        10^{-4}
Spring constant       H        1

Table 7.3: Parameters for the simple dumbbell model.

Quantity              Symbol   Value
Analytical viscosity  η        2.5 × 10^{-5}
Estimated viscosity   η_e      2.43 × 10^{-5} ± 3.90 × 10^{-6}

Table 7.4: Results for the simple dumbbell model. The standard deviation is calculated by grouping the simulation results into groups of ten, then determining the standard deviation of the resulting ten averages.

We simulate equation (7.35) using an Euler discretization with a time increment of ∆t = 10^{-2}. The expectation ⟨H q q^T⟩ is calculated by averaging the time courses of one hundred simulations over a time period of 10. Parameters for the model are given in Table 7.3. For this simple example, the viscosity can be calculated exactly as

η = Dζ²/(4H)    (7.40)
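Before turning to the results, here is a minimal sketch of this calculation at γ̇ = 0, integrating the beads (7.35) together with the pathwise sensitivities (7.38)-(7.39) and time-averaging ∂τ₁₂/∂γ̇; the zero initial condition and all names are illustrative assumptions.

import numpy as np

H, zeta, D, dt, T, N = 1.0, 1.0, 1e-4, 1e-2, 10.0, 100
rng = np.random.default_rng(0)
samples = []
for _ in range(N):
    x1, x2 = np.zeros(3), np.zeros(3)       # assumed initial bead positions
    s1, s2 = np.zeros(3), np.zeros(3)       # s_i = dx_i/d(gamma_dot) at gamma_dot = 0
    acc, n_steps = 0.0, int(T / dt)
    for _ in range(n_steps):
        dW1 = rng.normal(0.0, np.sqrt(dt), 3)
        dW2 = rng.normal(0.0, np.sqrt(dt), 3)
        # d/d(gamma_dot) of (7.35): the flow term (grad v)^T x contributes
        # (x_y, 0, 0) per unit gamma_dot for the matrix in (7.35c).
        s1n = s1 + ((H / zeta) * (s2 - s1) + np.array([x1[1], 0.0, 0.0])) * dt
        s2n = s2 + (-(H / zeta) * (s2 - s1) + np.array([x2[1], 0.0, 0.0])) * dt
        dx = x2 - x1
        x1 = x1 + (H / zeta) * dx * dt + np.sqrt(2.0 * D) * dW1
        x2 = x2 - (H / zeta) * dx * dt + np.sqrt(2.0 * D) * dW2
        s1, s2 = s1n, s2n
        q, sq = x1 - x2, s1 - s2
        acc += H * (sq[0] * q[1] + q[0] * sq[1])    # d(tau_12)/d(gamma_dot)
    samples.append(acc / n_steps)

print(np.mean(samples))    # compare with the exact value D*zeta**2/(4*H) = 2.5e-5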

Table 7.4 presents the results of this simulation. The viscosity calculated using parametric sensitivities compares favorably to the exact value.

7.4 Conclusions

We have proposed differentiating simulated sample paths to obtain parametric sensitivities for models consisting of stochastic differential equations. The sensitivity equations are evaluated simultaneously with the model equations to yield accurate, first-order information about the simulated trajectories. Two simple examples demonstrated the accuracy of this technique in comparison to both finite differences and the solution of the underlying master equation and its sensitivity. These results underscore the importance of differencing each simulated trajectory rather than trajectories generated using different strings of random numbers; beyond this point, we observed little difference between the accuracy of parametric and finite difference sensitivities. Additionally, we have demonstrated how these sensitivities can be used to perform systems-level tasks for this class of models. The examples included using nonlinear optimization to estimate parameters, performing steady-state analysis, and efficiently evaluating derivatives for polymer models. We expect these tools to prove useful in a wide range of applications, from more complex polymer models to financial models.

Notation

c_j          concentration of species j
D            diffusivity
e_k          difference vector between the measurements y_k and the model predictions h(x_k) at time t_k
ḡ(x)         moment of the probability distribution
H            spring constant
h(x_k)       model-predicted measurement vector at time t_k
k_j          rate constant for the jth reaction
N            number of simulated trajectories
P(x, t; θ)   probability distribution function
q            distance vector for the dumbbell model
S(x, t; θ)   sensitivity of the probability distribution function
s̄(g(x))      sensitivity of a moment of the probability distribution
s            sensitivity of x for a simulated trajectory
t            time
W            vector of Wiener processes
x            state vector
x^i          value of the state x for the ith simulation
y_k          measurement vector at time t_k
ε            extent of reaction
η            viscosity
η_e          estimated viscosity
λ            eigenvalue
ν            eigenvector
Φ            objective function value
Π            covariance matrix for the measurement noise
ζ            friction coefficient
τ            stress tensor
θ            vector of model parameters
Ω            random number string used for simulation

Chapter 8

Stochastic Simulation of Particulate Systems 1

The stochastic chemical kinetics approach provides one method of formulating the stochastic crystallization population balance equation (PBE). In this formulation, crystal nucleation and growth are modeled as sequential additions of solubilized ions or molecules (units) to either other units or an assembly of any number of units. Monte Carlo methods provide one means of solving this problem. In this chapter, we assess the limitations of such methods by both (1) simulating models for isothermal and nonisothermal size-independent nucleation, growth and agglomeration; and (2) performing parameter estimation using these models. We also derive the macroscopic (deterministic) PBE from the stochastic formulation, and compare the numerical solutions of the stochastic and deterministic PBEs. The results demonstrate that even as we approach the thermodynamic limit, in which the deterministic model becomes valid, stochastic simulation provides a general, flexible solution technique for examining many possible mechanisms. Thus the stochastic simulation permits the user to focus more on modeling issues as opposed to solution techniques.

8.1 Introduction

Both deterministic and stochastic frameworks have been used to describe the time evolution of a population of particles. The classical deterministic framework consists of coupled population, mass, and energy balances which describe crystal nucleation, growth, agglomeration, and breakage as smooth, continuous processes. Randolph and Larson [110], Hulburt and Katz [65], and Ramkrishna and Borwanker [108, 109] have extensively studied the analysis of the deterministic population balance equation (PBE) and its application to these crystal formation mechanisms. Hulburt and Katz [65] made a seminal contribution in which they develop a population balance that includes an arbitrary number of characteristic variables. They use the method of moments to solve the PBE for a variety of applications such as modeling systems with one or two length dimensions, and modeling agglomerating systems. Ramkrishna [107] provides an excellent summary of techniques used to solve the deterministic balances for models in a single distributed dimension. Ma, Braatz, and Tafti [84] apply high resolution methods to solve the deterministic balances with two characteristic length scales.

If the population is large, single microscopic events such as incorporation of growth units into a crystal lattice and biparticle collisions are not significant. Microscopic events tend to occur on short time scales relative to those required to make a significant change in the macroscopic particle size density (PSD). If fluctuations about the average PSD are large, then the deterministic PBE is no longer valid. Large fluctuations about the average density occur when the population modeled is small. Examples of small populations in particulate systems in which fluctuations are significant include such varied applications as aggregation of platelets and neutrophils in flow fields, growth and aggregation of proteins, and aggregation of cell mixtures [79]. The deterministic PBE is also not valid in modeling precipitation reactions in micelles in which the micelles act as micro-scale reactors containing a small population of fine particles [86, 8].

In contrast to the deterministic framework, the stochastic framework models crystal nucleation, growth and agglomeration as random, discrete processes. Ramkrishna and Borwanker [108, 109] introduce the stochastic framework to modeling particulate processes. The authors show that the deterministic PBE is one of an infinite sequence of equations, called product densities, that describe the mean behavior and fluctuations about the mean behavior of the PSD. The deterministic PBE is, in fact, the expectation density of the infinite sequence of equations satisfied by the product density equations. As the population decreases, higher order product density equations are required to describe the time behavior and fluctuations about the expected behavior of the population. We refer the interested reader to Ramkrishna [107] for the details of this analysis.

One approach to solving the stochastic model for any population of crystals is the Monte Carlo simulation method. Kendall [72] first applies the concept of exponentially distributed time intervals between birth and death events in a single-species population. Shah, Ramkrishna, and Borwanker [136] use the same approach and simulate breakage and agglomeration in a dispersed-phase system. The rates of agglomeration and breakage are proportional to the number of particles in the system and the size-dependent mechanism of breakage and agglomeration. Laurenzi and Diamond [79] apply the same technique as Shah, Ramkrishna, and Borwanker to model aggregation kinetics of platelets and neutrophils in flow fields. Gooch and Hounslow [52] apply a Monte Carlo technique similar to Shah, Ramkrishna, and Borwanker to model breakage and agglomeration. Gooch and Hounslow calculate the event time interval from the numerical solution to the zeroth moment equation with ∆N = 1 for breakage, and ∆N = −1 for agglomeration. Manjunath et al. [86] and Bandyopadhyaya et al. [8] use the stochastic approach to model precipitation in small micellar systems. The model specifies the minimum number of solubilized ions and molecules to form a stable nucleus. Once a particle nucleates, growth is rapid and depletes the micelle of growth units. Brownian collisions govern the interaction between micelles. Solubilized ions and molecules are transferred during collisions.

¹Portions of this chapter to appear in Haseltine, Patience, and Rawlings [56].

In the stochastic approach developed here, nucleation and growth in a large-scale batch crystallizer are considered as a sequence of bimolecular chemical reactions. In particular, solubilized ions or molecules (units) sequentially add to other units or to an assembly of any number of units. Both Gillespie [46] and Shah, Ramkrishna, and Borwanker [136] propose equivalent methods for simulating exact trajectories of this random process. The expected behavior of the system can then be evaluated by averaging over many trajectory simulations. The burden of model solution rests mainly with the computing hardware, and these Monte Carlo simulations can be time intensive depending on the number of particles and the size of the molecular unit. Currently, desktop computers can simulate systems with reasonably large particle populations and small molecular units in a matter of seconds or minutes.

In this chapter we first review the stochastic formulation of chemical kinetics and summarize the exact simulation method used to solve this system. We then extend the scope of the formulation to describe nonisothermal systems. Since this extension leads to a constraint that hinders the computation, we suggest an approximation that overcomes this obstacle. We then outline assumptions for formulation of the crystallization model. We illustrate the dependence of the stochastic solution on key stochastic parameters, such as cluster size and simulation volume. We also provide an analysis showing the connection between the stochastic formulation and the deterministic PBE. Next, we solve the stochastic formulation for models incorporating isothermal and nonisothermal, size-independent nucleation, growth and agglomeration and contrast the solution to that from the deterministic framework. We then address how to estimate parameters using stochastic models, and provide an example. Finally, we assess the limitations of the Monte Carlo simulation technique.

8.2 Stochastic Chemical Kinetics Overview

In this section, we first review the stochastic formulation of chemical kinetics and one computational method for solving this problem. We then relax key assumptions of this problem formulation in order to address other interesting physical systems, and discuss one approximate computational solution method.

8.2.1 Stochastic Formulation of Isothermal Chemical Kinetics

The stochastic formulation of chemical kinetics has its physical basis in the kinetic theory of gases [48]. The modeled system consists of well-mixed, gas-phase chemical species maintained at thermal equilibrium. The key model assumptions include (1) a hard-sphere molecular model and (2) that non-reactive collisions occur much more frequently than reactive collisions. It is then possible to derive a deterministic time-evolution equation not for the state, but rather for the probability of being in a given state at a specific time. This evolution equation is the chemical master equation

dP(x, t)/dt = Σ_{k=1}^{m} [ a_k(x − ν_k) P(x − ν_k, t) − a_k(x) P(x, t) ]   (8.1)

in which

• x is the state of the system in terms of number of molecules (a p-vector),

• P (x, t) is the probability that the system is in state x at time t,

• ak(x)dt is the probability to order dt that reaction k occurs in the time interval [t, t + dt), and

• νk is the kth column of the stoichiometric matrix ν (a p × m matrix).

Here, we assume that the initial condition P(x, t_0) is known. The solution of equation (8.1) is computationally intractable for all but the simplest systems. Rather, Monte Carlo methods are employed to reconstruct the probability distribution and its moments (usually the mean and variance). Monte Carlo methods take advantage of the strong law of large numbers, which permits reconstruction of functions of the probability distribution g(x) by drawing exact samples from this distribution, i.e.

ḡ(x) ≜ ∫ g(x) P(x, t) dx = lim_{N→∞} (1/N) Σ_{i=1}^{N} g(x^i) ≈ (1/N) Σ_{i=1}^{N} g(x^i)  for N sufficiently large   (8.2)

in which ḡ(x) is the average value of g(x), N is the number of samples, and x^i is the ith Monte Carlo reconstruction of x. One efficient method for generating exact trajectories from the master equation is Gillespie's direct method [45, 46]. As noted previously, this particular simulation method is equivalent to the "interval of quiescence" technique proposed by Shah et al. [136]. This method was previously summarized in algorithm 1.
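For reference, the following is a minimal sketch of the direct method (algorithm 1 appears earlier in the thesis and is not reproduced here); the function, its argument names, and the toy A + B → C mechanism are illustrative assumptions.

import numpy as np

def gillespie(x0, stoich, propensities, t_final, rng):
    # Simulate one exact sample path of the master equation (8.1).
    # x0: initial numbers of molecules (p-vector)
    # stoich: p x m stoichiometric matrix; column k is nu_k
    # propensities: function mapping the state x to the m-vector a(x)
    t, x = 0.0, np.array(x0, dtype=float)
    times, states = [t], [x.copy()]
    while t < t_final:
        a = propensities(x)
        a_tot = a.sum()
        if a_tot <= 0.0:                  # no reaction can fire
            break
        p1, p2 = rng.random(2)
        t += -np.log(p1) / a_tot          # exponential "interval of quiescence"
        k = np.searchsorted(np.cumsum(a), p2 * a_tot)   # choose reaction k
        x += stoich[:, k]
        times.append(t)
        states.append(x.copy())
    return np.array(times), np.array(states)

# Example usage: A + B -> C with a hypothetical rate constant of 0.01
stoich = np.array([[-1], [-1], [1]])
prop = lambda x: np.array([0.01 * x[0] * x[1]])
times, states = gillespie([100, 80, 0], stoich, prop, 10.0, np.random.default_rng(1))

Averaging the states from many such sample paths reconstructs the moments in equation (8.2).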

8.2.2 Extension of the Problem Scope

The previous problem formulation is quite restrictive from a modeling perspective. Firstly, many systems of interest are not solely gas phase. This restriction can be overcome by judicious modeling assumptions to ensure that neither thermodynamics nor conservation laws are violated. Secondly, the reaction propensities (the a_k's) often change between reaction events. For example, subjecting the system to a deterministic energy balance introduces time-varying reaction propensities into the system. In such cases the problem of interest is actually the following master equation subject to constraints:

dP(x; t)/dt = Σ_{k=1}^{m} [ a_k(x − ν_k, y) P(x − ν_k; t) − a_k(x, y) P(x; t) ]   (8.3a)
dy(t)/dt = b(P(x), y; t)   (8.3b)

To solve equation (8.3) exactly, we must revise algorithm 1 to account for the time dependence of the propensity functions, a_k(x, y) [47]. Since r_tot and r_k are functions of time, they must be recalculated after determination of τ in order to choose which reaction occurs next. The major difficulty in this method is that in step 2 of algorithm 1, we must now satisfy the constraint

∫_t^{t+τ} r_tot(t′) dt′ + log(p_1) = 0   (8.4)

as opposed to a simple algebraic relation. This constraint often proves to be computationally expensive. If the reaction propensities do not change significantly over the stochastic time step τ, the unmodified algorithm 1 can still provide an approximate solution. When the reaction propensities change significantly over τ, steps can be taken to reduce the error of algorithm 1. One idea is to scale the stochastic time step τ by artificially introducing a probability of no reaction into the system [57]:

• Let a_0 dt be the contrived probability, first order in dt, that no reaction occurs in the next time interval dt.

This probability does not affect the number of molecules of the modeled reactive system while allowing adjustment of the stochastic time step by changing the magnitude of a_0. Theoretically, as the magnitude of a_0 becomes infinite, the total reaction rate becomes infinite. As the total reaction rate approaches infinity, the error of the stochastic simulation subject to ODE constraints approaches zero because the algorithm checks whether or not a reaction occurs at every instant of time. Practically, the algorithm should first check the "no reaction" propensity at each iteration to prevent needless calculation of the entire range of actual reactions. Finally, we note that even though the method outlined by Gillespie is "exact" [47], there is still error associated with the finite number of simulations performed since it is a Monte Carlo method. Thus it is plausible that the inherent sampling error may be greater than the error introduced by our approximation. Hence our approximation may often prove to be less computationally expensive than the simulation by Gillespie [47] while generating an acceptable amount of simulation error. We summarize our approximation in algorithm 6.

8.2.3 Interpretation of the Simulation Output

Stochastic simulations of population balances involve two inherent and completely different distributions. First, each particle size N_j has its own probability distribution P(N_j, t) dictating the likelihood that the particle size contains a prescribed number of particles. Second, the population balance encompasses the entire distribution of these N_j's. For the simulation results in this chapter, we perform multiple simulations given a specific initial condition. For each particle size at a given time, we then average over all simulations to obtain the expected number of particles for the given size, i.e.

N̄_j(t) = (1/N) Σ_{i=1}^{N} N_j^i(t)   (8.5)

Algorithm 6 Approximate Method (time-dependent reaction propensities).

Initialize. Set the time, t, equal to zero. Set x and y to x_0 and y_0, respectively.

1. Calculate:
   (a) the reaction propensities, r_k = a_k(x, y), and
   (b) the total reaction propensity, r_tot = Σ_{k=0}^{m} r_k.

2. Select two random numbers p_1, p_2 from the uniform distribution (0, 1).

3. Let τ = −log(p_1)/r_tot. Integrate dy/dt = b(x, y; t) over the range [t, t + τ) to determine y(t + τ). Let t ← t + τ.

4. Recalculate the reaction propensities r_k and the total reaction propensity r_tot. Choose j such that

   Σ_{k=0}^{j−1} r_k < p_2 r_tot ≤ Σ_{k=0}^{j} r_k

5. Let x ← x + ν_j. Update y if necessary. Go to 1.
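A compact sketch of algorithm 6 in code form follows; it is not the thesis implementation. The propensity function and the constraint right-hand side b are placeholders supplied by the modeler, and the simple Euler update of y over [t, t + τ) stands in for a proper ODE solver.

import numpy as np

def approximate_ssa(x0, y0, stoich, propensities, b, a0, t_final, rng):
    t, x, y = 0.0, np.array(x0, dtype=float), np.array(y0, dtype=float)
    while t < t_final:
        a = propensities(x, y)                 # step 1: current propensities
        r_tot = a0 + a.sum()                   # include the no-reaction channel
        p1, p2 = rng.random(2)                 # step 2
        tau = -np.log(p1) / r_tot              # step 3: stochastic time step
        y = y + b(x, y, t) * tau               # Euler stand-in for the ODE solve
        t += tau
        a = propensities(x, y)                 # step 4: recalculate propensities
        r_tot = a0 + a.sum()
        target = p2 * r_tot
        if target <= a0:                       # the "no reaction" channel fired
            continue
        k = np.searchsorted(a0 + np.cumsum(a), target)
        x = x + stoich[:, k]                   # step 5: fire reaction k
    return x, y

Checking the no-reaction channel first, as noted above, avoids computing the cumulative sum over the actual reactions on the (frequent) iterations in which no reaction occurs.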

Here, N̄_j(t) is clearly a scalar value. Finally, we tabulate all of these N̄_j(t)'s, the expected numbers of particles, to yield a mean population balance. This procedure is illustrated in Figure 8.1.

8.3 Crystallization Model Assumptions

Certain key assumptions ensure the validity of the stochastic problem formulation. These assumptions are:

1. The system of interest is a well-mixed, constant volume, batch crystallizer. The well-mixed assumption implies that the crystallizer temperature is homogeneous; that is, if any event creates a temperature change, the thermal energy is instantaneously distributed throughout the crystallizer.

2. Particles have discrete sizes and size changes occur in discrete increments. On an atomic level, this assumption is physically true since crystals are composed of a discrete number of molecules.

Figure 8.1: Method for calculating the population balance from stochastic simulation. Each particle size N_j has its own inherent probability distribution. Monte Carlo methods provide samples from these distributions, and the samples are averaged to yield the mean value. Tabulating the mean values yields the mean of the stochastic population balance.

3. The degree of supersaturation acts as the thermodynamic driving force for crystallization. This assumption is necessary to account for the system thermodynamics. Otherwise we would need to employ molecular dynamics simulations using an appropriate model for the potential energy function to more accurately describe the time evolution of the population balance. The downside of that choice is that our problem of interest, the macroscopic behavior of the crystallizer, becomes computationally intractable.

The additional assumptions we use to simplify the solution of the population balance and reduce computational load are:

1. Physical properties for the heat capacity, liquid and crystal densities, and the heat of crystallization remain constant.

2. Nucleation, growth, and agglomeration rate constants are independent of temperature.

3. Crystal growth occurs in integer steps of a monomer unit.

4. The number of saturated monomers is an empirical function of temperature.

8.4 Stochastic Simulation of Batch Crystallization

To illustrate the solution of the population balance via stochastic simulation, we examine three examples:

1. isothermal nucleation and growth;

2. nonisothermal nucleation and growth; and

3. isothermal nucleation, growth, and agglomeration.

The mechanisms for each of these examples are size-independent. Also, we define the following nomenclature:

• M_tot, M_sat, and M are the total number of monomers, number of saturated monomers, and number of supersaturated monomers, respectively, on a per volume basis. Hence:

M = M_tot − M_sat   (8.6)

• ∆ is the characteristic volume of one monomer unit.

• N_n is the number of particles with size l_n = ∆(n + 1).

• V is the system volume.

• V_mon^0 is the initial volume of monomer. For these examples, V_mon^0 = 800V.

• n_mon^0 is the initial number of monomer particles, and is determined by the relation:

n_mon^0 = V_mon^0 / ∆   (8.7)

• n_seed^0 is the initial number of seed particles. For these examples, n_seed^0 = 10V.

8.4.1 Isothermal Nucleation and Growth

Consider the isothermal reaction system with second-order nucleation and growth and a uniformly incremented volume scale ∆ = l_i − l_{i−1}:

2M −(k_n)→ N_1   (8.8a)

N_n + M −(k_g)→ N_{n+1}   (8.8b)

The model parameters are given in Table 8.1. We have chosen to model the crystallization mechanism using a volume scale in order to conserve mass (recall the constant crystal density assumption). In accord with this choice, the initial number of monomers is computed based on the assigned value of ∆. Finally, we quadratically distribute the seeds over the particle volume interval l ∈ [2, 2.5].

Parameter                      Symbol   Value
nucleation rate constant       k_n      3.125 × 10^−9
growth rate constant           k_g      2.5 × 10^−4
number of saturated monomers   M_sat    0

Table 8.1: Nucleation and growth parameters for an isothermal batch crystallizer

Figure 8.2: Mean of the stochastic solution for an isothermal crystallization with nucleation and growth, 1 simulation, characteristic particle size ∆ = 0.01, system volume V = 1

Since k_g is a constant, size-independent growth exhibits the same kinetics as the second-order reaction:

A + M −(k_g)→ B   (8.9)

Here the number of species A molecules is equivalent to the zeroth moment of the particle distribution N. We can reduce computational expense by using reaction (8.9) to calculate the total reaction propensity (r_tot) in algorithm 1, then only calculating individual reaction propensities as needed to determine the next reaction, as sketched below.
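A small sketch of this lumping idea follows (the variable names are hypothetical): the total growth propensity is computed from the zeroth moment alone, and the particular size class that grows is resolved only after a growth event has been selected.

import numpy as np

def select_event(M, N, kn, kg, rng):
    # M: supersaturated monomer count; N[i]: particles in size class i
    r_nuc = 0.5 * kn * M * (M - 1)     # 2M -> N1 as a molecular event
    r_gro = kg * M * N.sum()           # all growth events lumped via mu_0
    r_tot = r_nuc + r_gro
    if r_tot <= 0.0:
        return ("none", None)
    u = rng.random() * r_tot
    if u < r_nuc:
        return ("nucleation", None)
    # only now resolve which size class grows, weighting by particle count
    j = np.searchsorted(np.cumsum(N), (u - r_nuc) / (kg * M))
    return ("growth", j)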

Simulation Results

The stochastic simulation contains two parameters, the simulation volume V and the characteristic particle size ∆, that do not exist in deterministic population balances.

Figure 8.3: Mean of the stochastic solution for an isothermal crystallization with nucleation and growth, average of 100 simulations, characteristic particle size ∆ = 0.01, system volume V = 1

Figure 8.4: Average stochastic simulation time (seconds for one simulation) as a function of the characteristic particle size, based on 10 simulations and V = 1


Figure 8.5: Mean of the stochastic solution for an isothermal crystallization with nucleation and growth, average of 100 simulations, characteristic particle size ∆ = 0.1, system volume V = 1

In deterministic population balances, the simulation volume is specified by the volume of the modeled crystallizer. In general, stochastic techniques cannot simulate the system volume due to excessive computational expense. To overcome this difficulty, we invoke the well-mixed assumption, choose a volume that accurately represents the system, and average the results of multiple simulations given this volume.² Care must be taken to ensure that the results are generated from a sufficient number of simulations. For an example, consider the case in which ∆ = 0.01 and V = 1. For one simulation, Figure 8.2 shows that each particle size is sparsely populated, making discrete transitions between states clearly observable. Averaging over one hundred simulations, Figure 8.3 demonstrates that the particle sizes are more densely populated, thus credibly reproducing the average system behavior. Varying the characteristic particle size ∆ varies the initial number of monomer units. As ∆ decreases, the initial number of monomer units increases. Since the computational expense scales with the number of reactant molecules, this expense increases. Figure 8.4 illustrates this point by examining the average computational expense for ten simulations as a function of ∆. In addition, the dispersion among particle sizes associated with the stochastic simulation becomes less pronounced as ∆ decreases. The effects of manipulating ∆ are illustrated in Figures 8.3 and 8.5.

²Rate constants of order greater than one are volume dependent in the stochastic simulation because reactions are molecular events.

Derivation of the Macroscopic Population Balance as the Limit of the Master Equation

The results of the stochastic simulations lead to the belief that, under appropriate conditions, the deterministic population balance arises from the master equation system representation. We now prove this assertion. The discrete master equation is of the form given in equation (8.1). Define the characteristic size of the system to be Ω, and use this size to recast the master equation (8.1) in terms of intensive variables (let z ← x/Ω). Performing a Kramers-Moyal expansion on this master equation results in a system size expansion in Ω. In the limit as x and Ω become large, the discrete master equation can be approximated by its first two differential moments. This approximation is the continuous Fokker-Planck equation [41]:

∂P(z; t)/∂t = − Σ_{i=1}^{l} ∂/∂z_i [ A_i(z) P(z; t) ] + (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} ∂²/∂z_i∂z_j [ B_{ij}(z)² P(z; t) ]   (8.10a)
A(z) = Σ_{k=1}^{m} ν_k a_k(z)   (8.10b)
B(z)² = Σ_{k=1}^{m} ν_k ν_k^T a_k(z)   (8.10c)

Equation (8.10) has an Itô solution of the form:

dz_i = A_i(z) dt + Σ_{j=1}^{l} B_{ij}(z) dW_j   (8.11)

in which W is a vector of Wiener processes. The Fokker-Planck equation (8.10) specifies the distribution of the stochastic process, whereas the stochastic differential equation (8.11) specifies how the trajectories of the state evolve. By taking the thermodynamic limit (x → ∞, Ω → ∞, z = x/Ω = finite), equation (8.11) approaches the deterministic limit [76]:

dz_i/dt = A_i(z)   (8.12)

The deterministic limit implies that the probability P(z; t) collapses to a delta function. Now consider the two densities N(l_i, t) and f(l, t), representing the discrete and continuous population balances, respectively. These densities are functions of the characteristic particle size l and the time t. N(l_i, t) has units of number of crystals per volume, and f(l, t) has units of number of crystals per volume per characteristic particle size. Define the system volume, V, as the extensive characteristic size of the system, Ω. For the kinetic mechanism (8.8), equation (8.12) defines the discrete population balance accordingly:

dM_tot/dt = −k_n M² − k_g M Σ_{i=1}^{∞} N(l_i, t)   (8.13a)
dN(l_1, t)/dt = (1/2) k_n M² − k_g M N(l_1, t)   (8.13b)
dN(l_i, t)/dt = k_g M [ N(l_{i−1}, t) − N(l_i, t) ],  i = 2, . . . , ∞   (8.13c)

For small ∆ and a ≥ 1, it is apparent that the following equality should hold:

N(l_a, t) = ∫_{l_a−∆/2}^{l_a+∆/2} f(l, t) dl   (8.14a)

l_a = (a + 1)∆   (8.14b)

Differentiating equation (8.14a) with respect to time yields:

dN(l_a, t)/dt = (d/dt) ∫_{l_a−∆/2}^{l_a+∆/2} f(l, t) dl = ∫_{l_a−∆/2}^{l_a+∆/2} ∂f(l, t)/∂t dl   (8.15)

For a > 1, substitute the definition given by (8.13c) into equation (8.15):

k_g M [ N(l_a − ∆, t) − N(l_a, t) ] = ∫_{l_a−∆/2}^{l_a+∆/2} ∂f(l, t)/∂t dl   (8.16)

Rewriting the left hand side in terms of an integral over the particle size l and regrouping yields:

∫_{l_a−∆/2}^{l_a+∆/2} { ∂f(l, t)/∂t + k_g M [ f(l, t) − f(l − ∆, t) ] } dl = 0   (8.17)

Since the bounds on the integral of equation (8.17) are arbitrary, i.e., they hold for any a such that a > 1, one solution is to set the integrand to zero:

∂f(l, t)/∂t + k_g M [ f(l, t) − f(l − ∆, t) ] = 0   (8.18)

McCoy [92] suggests considering a Taylor series expansion to determine the difference f(l, t) − f(l − ∆, t):

f(l − ∆, t) = f(l, t) + [∂f(l, t)/∂l] [(l − ∆) − l] + (1/2!) [∂²f(l, t)/∂l²] [(l − ∆) − l]² + . . .   (8.19a)
            = f(l, t) − ∆ ∂f(l, t)/∂l + (∆²/2) ∂²f(l, t)/∂l² + . . .   (8.19b)

Hence the desired difference is:

f(l, t) − f(l − ∆, t) = ∆ ∂f(l, t)/∂l − (∆²/2) ∂²f(l, t)/∂l² + . . .   (8.20)

For sufficiently small ∆, the first partial derivative of equation (8.20) adequately approximates this difference:

∂f(l, t)/∂t = −k_g M [ f(l, t) − f(l − ∆, t) ]   (8.21a)
            ≈ −k_g′ M ∂f(l, t)/∂l   (8.21b)

where k_g′ = ∆k_g. Equation (8.21b) is the corresponding macroscopic population balance equation for well-mixed systems with only nucleation and growth, and is defined over the range 0 ≤ l < ∞. The boundary condition for equation (8.21b) at l = ∞ is:

f(∞, t) = 0 (8.22)

The other boundary condition, f(0, t), can be determined by examining the zeroth moment (µ_0) of equation (8.21b) and noting that only nucleation influences the number of particles:

∫_0^∞ ∂f(l, t)/∂t dl = −k_g′ M ∫_0^∞ ∂f(l, t)/∂l dl   (8.23a)
dµ_0/dt = −k_g′ M [ f(∞, t) − f(0, t) ]   (8.23b)
(1/2) k_n M² = k_g′ M f(0, t)   (8.23c)
f(0, t) = k_n M / (2k_g′)   (8.23d)

Finally, conservation of monomer dictates:

dM_tot/dt = −k_n M² − k_g M Σ_{i=1}^{∞} N(l_i, t)   (8.24a)
          ≈ −k_n M² − k_g M ∫_0^∞ f(l, t) dl   (8.24b)

In summary, in the thermodynamic limit and as ∆ becomes small, the stochastic formulation yields the following deterministic formulation:

∂f(l, t)/∂t = −k_g′ M ∂f(l, t)/∂l   (8.25a)
dM_tot/dt = −k_n M² − k_g M ∫_0^∞ f(l, t) dl   (8.25b)
f(0, t) = k_n M / (2k_g′)   (8.25c)
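As an illustration, the following method-of-lines sketch integrates equation (8.25) using first-order upwind differencing in l; this is not the orthogonal collocation solver applied below, and the grid extent, seed profile, and initial monomer count are assumptions of the sketch.

import numpy as np
from scipy.integrate import solve_ivp

kn, kg, dl = 3.125e-9, 2.5e-4, 0.01
kgp = dl * kg                          # k_g' = Delta * k_g
l = np.arange(0.0, 9.0, dl)            # characteristic particle size grid

def rhs(t, u):
    f, M = u[:-1], u[-1]               # M_sat = 0, so M = M_tot here
    f0 = kn * M / (2.0 * kgp)          # nucleation boundary condition (8.25c)
    df = np.empty_like(f)
    df[0] = -kgp * M * (f[0] - f0) / dl
    df[1:] = -kgp * M * (f[1:] - f[:-1]) / dl      # upwind convection
    dM = -kn * M ** 2 - kg * M * f.sum() * dl      # monomer balance (8.25b)
    return np.append(df, dM)

seeds = np.where((l >= 2.0) & (l <= 2.5), (l - 2.0) ** 2, 0.0)  # seed guess
u0 = np.append(seeds, 80000.0)         # assumed initial monomer per volume
sol = solve_ivp(rhs, (0.0, 50.0), u0, method="BDF")

Incidentally, first-order upwinding on a grid of spacing ∆ introduces a numerical dispersion term of the form (∆/2) k_g′ M ∂²f/∂l², which has exactly the form of the diffusivity correction derived next.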

Figure 8.6: Deterministic solution by orthogonal collocation for isothermal crystallization with nucleation and growth, results discretized to a characteristic particle size ∆ = 0.01, system volume V = 1

Using these results, we solve the deterministic population balance for ∆ = 0.01 using orthogonal collocation on finite elements [121, 127]. Figure 8.6 presents the resulting population balance discretized to ∆. Note that in comparison to the mean of the stochastic solution, i.e. Figure 8.3, the deterministic solution displays no dispersion in either the seed or nucleated particle distributions. This result indicates that the simulated characteristic particle size, ∆ = 0.01, is large enough to merit including higher order terms of the f(l, t) − f(l − ∆, t) expansion. The next correction is the "diffusivity" term commonly used to model growth rate dispersion. The corresponding formulation for this model is:

∂f(l, t)/∂t = −k_g′ M [ ∂f(l, t)/∂l − (∆/2) ∂²f(l, t)/∂l² ]   (8.26a)
dM_tot/dt = −k_n M² − k_g M ∫_0^∞ f(l, t) dl   (8.26b)
f(0, t) = k_n M / (2k_g′) + (∆/2) ∂f(l, t)/∂l |_{l=0}   (8.26c)
0 = ∂f(l, t)/∂l |_{l=∞}   (8.26d)

Figure 8.7 presents this population balance discretized to ∆. Comparison of this result to Figure 8.3, the mean of the stochastic solution, demonstrates excellent agreement between the two distributions. In contrast to prior modeling efforts (e.g. [111]), however, the "diffusivity" term is a function of the growth rate, not a constant. Hence when the growth rate is zero, growth rate dispersion ceases.

Figure 8.7: Deterministic solution by orthogonal collocation for isothermal crystallization with nucleation and growth, inclusion of the diffusivity term, results discretized to a characteristic particle size ∆ = 0.01, system volume V = 1

The key differences between the stochastic and deterministic population balances are somewhat subtle and deserve further attention. First, the stochastic population balance has discrete particle sizes containing an integer number of particles. The deterministic population balance, on the other hand, has continuous particle sizes, and integration over a range of particle sizes yields a real number of particles contained within this range. Second, the number of particles contained in each size class of the stochastic population balance is governed by an individual probability distribution; hence different simulations may yield different numbers of particles in a particular size class at the same time even if the initial condition is identical. Only in the large number (thermodynamic) limit do these probability distributions collapse to delta functions (single values) for the concentration of particles in a given size class. In the deterministic population balance, simulating a given initial condition multiple times yields the same number of particles over a given size range at the same simulation time. We note that Ramkrishna [106] provides a similar but distinct perspective from ours on the connection between the stochastic and deterministic population balances. In his work, Ramkrishna considers continuous particle size classes, and demonstrates that the deterministic population balance can be obtained by averaging the governing master equation. Our derivation considers discrete particle sizes and derives the deterministic population balance as the large number (thermodynamic) limit of the governing master equation. We shy away from averaging because of literature examples demonstrating that this equivalence does not always hold in the small molecule limit [143].

8.4.2 Nonisothermal Nucleation and Growth

In this example, we are interested in modeling a nonisothermal crystallizer whose temperature is regulated by a cooling jacket. We consider the reaction system:

2M −(k_n)→ N_1    ∆H_rxn^n   (8.27a)
N_n + M −(k_g)→ N_{n+1}    ∆H_rxn^g   (8.27b)

For the deterministic case, the energy balance should satisfy the following equation:

dT/dt = [UA/(ρC_p V)] (T_j − T) − [∆H_rxn^n/(ρC_p)] (1/2) k_n M² − [∆H_rxn^g/(ρC_p)] k_g M ∫_0^∞ f(l, t) dl   (8.28)

Stochastically, we differentiate between enthalpy changes due to interaction with the cooling jacket and enthalpy changes due to nucleation and growth reactions. We treat enthalpy changes due to reactions stochastically in that they instantaneously release a specified heat of reaction upon completion. On the other hand, we treat enthalpy changes due to interaction with the cooling jacket continuously, giving rise to a deterministic enthalpy loss expression. This treatment of the energy balance with stochastic and deterministic contributions is discussed further by Vlachos [156]. Hence our simulation plan is as follows:

1. Upon completion of a reaction event, update the temperature due to the enthalpy of reaction.

2. Between reaction events, update the temperature using the following equation (see the sketch after this list):

dT/dt = [UA/(ρC_p V)] (T_j − T)   (8.29)
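Because equation (8.29) is linear in T, it can be integrated exactly between reaction events when the jacket temperature is held constant over the step; the short sketch below makes this explicit (holding T_j fixed over one stochastic time step is an assumption of the sketch, since T_j follows its own trajectory).

import math

def update_temperature(T, Tj, tau, UA=5.0, rhoCpV=1.0):
    # exact solution of dT/dt = UA/(rho*Cp*V) * (Tj - T) over a step tau
    a = UA / rhoCpV
    return Tj + (T - Tj) * math.exp(-a * tau)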

Since the monomer saturation, M_sat, is a function of temperature, the monomer supersaturation, M, is also a function of temperature, and we must apply an algorithm that accounts for time-dependent reaction propensities. We quadratically distribute the seeds over the crystal volume interval [2, 2.5]. The cooling temperature profile for the jacket (T_j) follows an exponentially decreasing trajectory. The solubility relationship for the number of monomers is given by:

log_10 M_sat = 2.25 log_10 T + 0.04/T + 1.3   (8.30)

The model parameters are given in Table 8.2. The results for the mean of the exact stochastic simulation are presented in Figures 8.8 through 8.10. Figure 8.11 presents the result for the mean of the approximate stochastic simulation with propensity of no reaction a_0 = 10. The discretized solution of the deterministic population balance including the diffusivity term is presented in Figure 8.12. These figures

Parameter                                       Symbol                    Value
nucleation rate constant                        k_n                       1.5625 × 10^−6
growth rate constant                            k_g                       1.25 × 10^−2
characteristic particle volume                  ∆                         0.01
initial crystallizer temperature                T_o                       39.896
initial cooling jacket temperature              T_{j,o}                   39.896
crystallizer heat transfer coefficient × area   UA                        5
solution density × heat capacity                ρC_p                      1
simulation system volume                        V                         1
nucleation and growth heats of reaction         ∆H_rxn^n = ∆H_rxn^g       −0.01

Table 8.2: Nonisothermal nucleation and growth parameters for a batch crystallizer


Figure 8.8: Total and supersaturated monomer profiles for nonisothermal crystallization

demonstrate agreement between the mean of the exact stochastic solution, the mean of the approximate stochastic solution, and the deterministic solution. Figures 8.13 and 8.14 compare the zeroth and first moments of the approximate stochastic simulation to the exact stochastic simulation. Here, we define the jth moment of the stochastic simulation µ_j as

µ_j = Σ_x x^j N̄_x   (8.31)

in which N̄_x is the average number of particles in the xth size class. Varying the value of the propensity of no reaction, a_0, controls the stochastic time step in the approximate stochastic solution. For this simulation, the value of a_0 = 0.1 is clearly too small to account for the time-varying reaction propensities, as evidenced by the poor initial reconstruction of the moments.


Figure 8.9: Crystallizer and cooling jacket temperature profiles

Figure 8.10: Mean of the exact stochastic solution for nonisothermal crystallization with nucleation and growth, average of 500 simulations, characteristic particle size ∆ = 0.01, system volume V = 1

Figure 8.11: Mean of the approximate stochastic solution for nonisothermal crystallization with nucleation and growth, average of 500 simulations, characteristic particle size ∆ = 0.01, system volume V = 1, propensity of no reaction a0 = 10

However, as the value of a_0 increases, the resulting population balances tend towards the exact stochastic solution. Although accuracy increases as a_0 increases, computational expense increases as well. Hence the value of a_0 must be carefully selected to balance the two. Also, our implementation of the exact stochastic simulation employed an ODE solver with a stopping criterion to account for the time-varying reaction propensities, whereas the approximate solution did not require an ODE solver. As a result, the exact solution was two orders of magnitude slower than the approximate solution.

8.4.3 Isothermal Nucleation, Growth, and Agglomeration

We examine the same reactions as in mechanism (8.8), but now consider particle agglomeration as well:

2M −(k_n)→ N_1   (8.32a)

N_n + M −(k_g)→ N_{n+1}   (8.32b)

N_p + N_q −(k_a)→ N_{p+q}   (8.32c)

The model parameters are given in Table 8.3. For size-independent agglomeration, k_a is a constant.

Figure 8.12: Deterministic solution by orthogonal collocation for nonisothermal crystallization with nucleation and growth, inclusion of the diffusivity term, results discretized to a characteristic particle size ∆ = 0.01, system volume V = 1


Figure 8.13: Zeroth moment comparisons (percent error), mean of the stochastic solution for nonisothermal crystallization with nucleation and growth, average of 500 simulations, characteristic particle size ∆ = 0.01, system volume V = 1


Figure 8.14: First moment comparisons (percent error), mean of the stochastic solution for nonisothermal crystallization with nucleation and growth, average of 500 simulations, characteristic particle size ∆ = 0.01, system volume V = 1

Parameter                        Symbol   Value
nucleation rate constant         k_n      3.125 × 10^−9
growth rate constant             k_g      2.5 × 10^−4
agglomeration rate constant      k_a      2.5 × 10^−4
simulation system volume         V        1
characteristic particle volume   ∆        0.01
number of saturated monomers     M_sat    0

Table 8.3: Nucleation, growth, and agglomeration parameters for an isothermal, batch crystallizer

To make the simulation efficient, we note that this type of agglomeration exhibits the same kinetics as the second-order reaction:

2A −(k_a)→ C   (8.33)

Again, the number of species A molecules is equivalent to the zeroth moment of the particle distribution N, so we can use reaction (8.33) to calculate the propensity of all agglomeration events occurring. In steps 1 and 4 of algorithm 1, we use this value in calculation of the total reaction rate. Next, we first determine which type of reaction occurs (nucleation, growth, or agglomeration), then which specific event occurs, again calculating reaction propensities only as needed.

Figure 8.15: Mean of the stochastic solution for an isothermal crystallization with nucleation, growth, and agglomeration; average of 500 simulations; characteristic particle size ∆ = 0.01; system volume V = 1

The results of this simulation are presented in Figure 8.15. In contrast to Figure 8.3, the equivalent reaction system without agglomeration, we see that agglomeration increases the observed dispersion in particle size.

8.5 Parameter Estimation With Stochastic Models

The goal of parameter estimation is to determine the set of parameters that best reconciles the experimental measurements with model predictions. The classical approach is to assume that measurements are corrupted by normally distributed noise. Accordingly, we calculate the optimal parameters via the least squares optimization

min_θ Φ = (1/2) Σ_k e_k^T R e_k   (8.34a)
s.t.:  n_{k+1} = F(n_k, θ)   (8.34b)
       e_k = y_k − h(n_k)   (8.34c)

in which the e_k's denote the difference between the measurements y_k's and the model predictions h(n_k)'s.³ In general, most experiments do not include many replicates due to cost

and time constraints. Therefore, the best experimental data we are likely to obtain is in the form of moments of the master equation, i.e. equation (8.2). Clearly the master equation (8.1) demonstrates that these moments are twice continuously differentiable, so standard nonlinear optimization algorithms apply to fitting these moments to data. In fitting data to stochastic models governed by the master equation, we choose the mean x̄ as the state of interest. Monte Carlo simulation provides an estimate of this mean, albeit to some degree of error due to the finite simulation error. In the following subsections, we present a trust-region optimization method, discuss the calculation of finite difference sensitivities, and provide an example of estimating parameters for the nucleation, growth, and agglomeration mechanism of section 8.4.3.

³We assume that the measurement residuals e_k are normally distributed with zero mean and R^−1 covariance.

8.5.1 Trust-Region Optimization

We perform optimization (8.34) using a trust-region method employing a Gauss-Newton approximation of the Hessian. This method has provable convergence to stationary points (i.e. ∇_θΦ → 0) [97]. Algorithm 7 presents the basic steps of this method. Evaluation of the objective function is relatively expensive since it requires integrating the stochastic model. Therefore, we choose to accept all parameter changes that reduce the value of the objective function and solve the trust-region subproblem exactly using a quadratic programming solver. Also, we scale the optimized parameters using a log_10 transformation. The trust-region subproblem requires knowledge of both the gradient and the Hessian. We can numerically evaluate both of these quantities

∇_θΦ = ∂/∂θ^T [ (1/2) Σ_k e_k^T R e_k ]   (8.37)
     = − Σ_k [ (∂h(n_k)/∂n_k^T) (∂n_k/∂θ^T) ]^T R e_k   (8.38)
     = − Σ_k [ (∂h(n_k)/∂n_k^T) S_k ]^T R e_k   (8.39)
∇_θθΦ ≈ Σ_k [ (∂h(n_k)/∂n_k^T) S_k ]^T R [ (∂h(n_k)/∂n_k^T) S_k ]   (8.40)

which indicates dependence upon S_k, the sensitivity of the state with respect to the parameters.
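A minimal sketch of assembling equations (8.39) and (8.40) from the sensitivities follows; the list-based data structures and the measurement Jacobian H_k = ∂h/∂n^T are illustrative placeholders.

import numpy as np

def gauss_newton_terms(e_list, H_list, S_list, R):
    # e_list[k]: residual e_k; H_list[k]: dh/dn^T at n_k; S_list[k]: S_k
    n_par = S_list[0].shape[1]
    grad = np.zeros(n_par)
    hess = np.zeros((n_par, n_par))
    for e_k, H_k, S_k in zip(e_list, H_list, S_list):
        J_k = H_k @ S_k              # sensitivity of the predicted measurement
        grad -= J_k.T @ R @ e_k      # equation (8.39)
        hess += J_k.T @ R @ J_k      # Gauss-Newton Hessian, equation (8.40)
    return grad, hess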

8.5.2 Finite Difference Sensitivities

We assume that the unknown evolution equation for the mean x̄ depends on the system parameters θ

x̄_{k+1} = F(x̄_k, θ)   (8.41)

Algorithm 7 Trust Region Optimization.

Given k = 0, ∆̄ > 0, ∆_0 ∈ (0, ∆̄), and η ∈ [0, 0.25).

while (not converged)

1. Solve the subproblem

   p_k = arg min_{p∈R^n} m_k(p) = Φ|_{θ_k} + ∇_θΦ|_{θ_k}^T p + (1/2) p^T ∇_θθΦ|_{θ_k} p   (8.35a)
   s.t.:  ||p||_∞ ≤ ∆_k   (8.35b)

2. Evaluate

   ρ_k = [ Φ(θ_k) − Φ(θ_k + p_k) ] / [ m_k(0) − m_k(p_k) ]   (8.36)

3. if ρ_k < 0.25
      ∆_{k+1} = 0.25 ||p_k||_∞
   else
      if ρ_k > 0.75 and ||p_k||_∞ = ∆_k
         ∆_{k+1} = min(2∆_k, ∆̄)
      else
         ∆_{k+1} = ∆_k
      end if
   end if
   if ρ_k > η
      θ_{k+1} = θ_k + p_k
   else
      θ_{k+1} = θ_k
   end if

4. k ← k + 1

end while

Parameter                                Symbol   Value
simulations per measurement evaluation   n_sim    1
finite difference perturbation           δ        0.01 θ_j
transmittance constant                   k_t      1/3000
measurement inverse covariance           R        diag([10^−8, 1])

Table 8.4: Parameters for the parameter estimation example. Here, θ_j is the jth element of the vector θ.

Here, the notation x̄_k denotes the value of the mean x̄ at time t_k. The sensitivity s indicates how sensitive the mean is to perturbations of a given parameter, i.e.

s_k = ∂x̄_k/∂θ^T   (8.42)

We can then approximate the jth component of the desired sensitivity using, for example, a central difference scheme:

s_{k+1,j} = [ F(x̄_k, θ + δe_j) − F(x̄_k, θ − δe_j) ] / (2δ) + i · O(δ²)   (8.43)

in which δ is a small positive constant, e_j is the jth unit vector, and i is a vector of ones. Finite difference methods have several potential problems when used in conjunction with Monte Carlo reconstructed quantities, as discussed in Chapters 5 and 6. To reduce the finite simulation error, we re-seed the random number generator before each sample used to generate the mean x̄_k. In doing so, we must take special care in the selection of the perturbation δ to ensure that its effect on the mean is sufficiently large; otherwise, the positive and negative perturbations are approximately equal (i.e. F(x̄_k, θ + δe_j) ≈ F(x̄_k, θ − δe_j)), resulting in a poor reconstruction of the sensitivity. Finally, the computational expense of this method can be prohibitive if evaluating the mean is computationally intensive because calculating the sensitivity requires, in this case, two mean evaluations per parameter. Drews, Braatz, and Alkire [25] recently examined using finite differences to calculate sensitivities for kinetic Monte Carlo code simulating copper electrodeposition. These authors consider the specific case of the mean sensitivity, and derive finite differences for cases with significant finite simulation error. In these cases, the finite simulation error is greater than higher-order contributions of the finite difference expansion, so the authors derive first-order finite differences that minimize the variance of the finite simulation error. We circumvent the need for such expressions by appealing to the law of large numbers; that is, we reduce the variance of the finite simulation error by merely increasing the number of simulations used to evaluate the mean when necessary.
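The re-seeding strategy amounts to using common random numbers for the two perturbed simulations; a minimal sketch follows, with simulate_mean a hypothetical stand-in for whatever routine returns the Monte Carlo mean.

import numpy as np

def fd_sensitivity(simulate_mean, theta, j, delta, seed=0):
    # central-difference sensitivity (8.43) of the mean with respect to theta[j]
    e_j = np.zeros_like(theta)
    e_j[j] = 1.0
    # identical seeds give common random numbers for both perturbations
    x_plus = simulate_mean(theta + delta * e_j, rng=np.random.default_rng(seed))
    x_minus = simulate_mean(theta - delta * e_j, rng=np.random.default_rng(seed))
    return (x_plus - x_minus) / (2.0 * delta)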

Transformed Parameter         Symbol       Actual Value   Estimated Value
nucleation rate constant      log_10 k_n   −8.51          −8.63 ± 0.03
growth rate constant          log_10 k_g   −3.60          −3.56 ± 0.02
agglomeration rate constant   log_10 k_a   −3.60          −3.73 ± 0.05

Table 8.5: Estimated parameters


Figure 8.16: Comparison of final model prediction and measurements for the parameter estimation example.

8.5.3 Parameter Estimation for Isothermal Nucleation, Growth, and Agglomeration

We reconsider the isothermal nucleation, growth, and agglomeration example given in section 8.4.3. Traditional measurements for batch crystallizers yield moments of the PBE, so we assume that we can measure both the supersaturated monomer and transmittance, i.e.

y = [ M̄,  exp(−k_t µ_2) ]^T   (8.44)
µ_2 = Σ_x x² N̄_x   (8.45)

in which µ_2 is the second moment of the particle distribution and M̄ is the average amount of supersaturated monomer. Parameters for the optimization routine are given in Table 8.4. Using the kinetic mechanism (8.32), we generate results from one simulation for the "experimental" measurements, then attempt to fit the parameters using subsequent simulations. Table 8.5 compares the actual and estimated parameter values.


Figure 8.17: Convergence of parameter estimates as a function of the optimization iteration.

We also report 95% confidence intervals for the estimated parameter values, calculated by ignoring the effect of the finite simulation error. The results indicate excellent agreement between the actual and fitted parameters. The slight discrepancies in the fit most likely result from the finite simulation error (we simulated the "experimental" and predicted measurements using different seeds for the random number generator). Figure 8.16, which plots both the "experimental" and model predicted measurements, also demonstrates excellent agreement between the model and the experiment. Figure 8.17 plots the convergence of the parameter estimates as a function of the optimization iteration. This result indicates that convergence to the optimal parameter values occurs relatively quickly (roughly five iterations). Each iteration requires seven mean evaluations (six for the finite difference calculations and one for the predicted step).

Raimondeau, Aghalayam, Mhadeshwar, and Vlachos [105] argue that using kinetic Monte Carlo simulation to perform parameter estimation is too computationally expensive. They claim that a model with two to three parameters needs roughly 10^5 function (mean) evaluations for direct optimization. For this example, in contrast, the required number of mean evaluations is less than 10^2. In general, we expect that the actual number of function evaluations required for direct optimization is significantly lower than their estimate when using an appropriate optimization scheme.

8.6 Critical Analysis of Stochastic Simulation as a Modeling Tool

Thus far, we have demonstrated the efficacy of stochastic simulation as a macroscopic modeling tool. Now we address the benefits and shortcomings of this technique. The primary shortcoming of stochastic simulation is the computational expense. Since the computational expense of stochastic simulation scales with the number of reactant molecules, this expense increases as the modeled volume increases or the characteristic particle size decreases. Also, the computational expense is significantly greater than that required to solve the equivalent deterministic system. However, as computing power continues to increase, this discrepancy will become less of a hindrance in solving the stochastic PBE.

Perhaps the greatest advantages of stochastic simulation are its flexibility and ease of implementation. The simple algorithms presented in this chapter are applicable to any reaction mechanism. For example, adding agglomeration to the preexisting isothermal nucleation and growth code required addition of the n(n−1)/2 possible agglomeration reactions between n possible particle sizes. We expect that adding more complicated mechanisms or tracking more than one crystal characteristic are straightforward extensions of this algorithm. Implementing size-dependent growth, for example, requires only making the reaction propensities functions of length (i.e., r_k = a_k(x, y, l_k)). To track two characteristic lengths, we need only explicitly account for each particle and define mechanisms for growth of each characteristic length. The most difficult part of augmenting the reaction mechanism is deciding how to store and update the active particle sizes. To illustrate these points, we invite the interested reader to download and examine codes that simulate isothermal nucleation and growth, and isothermal nucleation, growth, and agglomeration from our web site at

http://www.che.wisc.edu/~haseltin/stochsims.tar.

Addition of agglomeration requires approximately sixty additional lines of code to the nucleation and growth code. The majority of this code updates the data structure employed to account for existing crystal sizes. In contrast, attempting to examine nucleation, growth, and agglomeration using orthogonal collocation most likely requires major revision of the solution technique, such as adaptive mesh algorithms. Stochastic simulation inherently accounts for each crystal in the simulation. Hence we see stochastic simulation as a general solution technique that allows the user to focus on key modeling issues as opposed to population balance solution methods.

We also demonstrated one method of performing parameter estimation with stochastic models. By applying appropriate nonlinear optimization routines, we can obtain optimal parameter values with surprisingly few evaluations of the stochastic model. The primary drawback to the presented method is the calculation of sensitivities via finite differences. Finite difference methods quickly become expensive to evaluate as both the number of parameters and the computational burden of evaluating the stochastic model increase. Finally, refined optimization of Monte Carlo simulations requires quantifying the effects of the finite simulation error on both the model constraint (an error-in-variables formulation is more appropriate) and the termination criteria.

8.7 Conclusions

Stochastic simulation provides one alternative to solving the deterministic crystallization population balance. For systems with small numbers of monomer and seed, the stochastic crystallization model is more realistic than the deterministic model because it inherently accounts for the system fluctuations. In the limit as the numbers of monomer and seed become large, the deterministic model becomes valid. Even for this case, stochastic simulation provides a general, flexible solution technique for examining many possible reaction mechanisms. Additionally, optimization of the stochastic model for purposes such as parameter estimation is feasible and requires relatively few evaluations of the model. Simulation results presented in this chapter illustrate these claims. Thus stochastic simulation should permit the user to focus more on modeling issues as opposed to solution techniques.

Notation

A          crystallizer area
a_0 dt     contrived probability, first order in dt, that no reaction occurs in the next time interval dt
a_k(x)dt   probability to order dt that reaction k occurs in the time interval [t, t + dt)
C_p        heat capacity
e          error vector
e_j        jth unit vector
f(l, t)dl  concentration of particles
ḡ(x)       average value of the quantity g(x)
h(x_k)     model prediction of the measurement vector at time t_k
i          vector of ones
k_a        agglomeration rate constant
k_g        growth rate constant
k_n        nucleation rate constant
l          characteristic particle size
M̄          average amount of supersaturated monomer
M          number of supersaturated monomers
M_tot      total number of monomers
M_sat      number of saturated monomers
N          number of Monte Carlo samples
N_j        jth particle size
N_n        number of particles with size l_n = ∆(n + 1)
n_mon^0    initial number of monomer particles
n_seed^0   initial number of seed particles
P(x, t)    probability that the system is in state x at time t
p_k        kth uniformly-distributed random number
R          inverse covariance matrix of the measurement noise
r_k        kth reaction propensity
r_tot      total reaction propensity
S          sensitivity matrix of the state
s          sensitivity of the mean x̄
T          temperature
T_{j,o}    initial cooling jacket temperature
T_o        initial crystallizer temperature
t          time
U          crystallizer heat transfer coefficient
V          system volume
V_mon^0    initial volume of monomer
W          the Wiener process
x          state vector in terms of number of molecules
x̄          average state vector
x^i        ith Monte Carlo reconstruction of x
y          vector of state-dependent variables
y_k        measurement vector at time t_k
z          state vector in terms of concentration (intensive variable)
∆          characteristic volume of one monomer unit
∆_k        trust-region optimization parameter at step k
∆̄          trust-region optimization parameter
∆H_rxn^g   growth heat of reaction
∆H_rxn^n   nucleation heat of reaction
δ          small positive constant
η          trust-region optimization parameter
µ_j        jth moment of the particle size distribution
ν          stoichiometric matrix
Φ          objective function value
ρ          solution density
ρ_k        trust-region optimization parameter at step k
τ          next reaction time
θ          vector of model parameters
Ω          characteristic system size

Chapter 9

Population Balance Models for Cellular Systems 1

To date, most models of viral infections have focused exclusively on modeling either the intracellular level or the extracellular level. To more realistically model these infections, we propose incorporating both levels of information into the description. One way of performing this task in a deterministic setting is to derive cell population balances from the equation of continuity. In this chapter, we first outline the basics of deriving and solving these population balance models for viral infections. Next, we construct a population balance model for a generic viral infection. We examine the behavior of this model given in vitro and in vivo conditions, and compare the results to other model candidates. Finally, we present conclusions and consider the future role of cell population balances in modeling virus dynamics.

9.1 Population Balance Modeling

The general population balance equation for cell populations arises from the seminal contribution of Fredrickson, Ramkrishna, and Tsuchiya [36]. In recent years, this modeling framework has returned to the literature as researchers strive to adequately reconcile model predictions with the dynamics demonstrated by experimental data [80, 10, 33]. Also, new measurements such as flow cytometry offer the promise of actually differentiating between cells of a given population [1, 67], again implying the need to model distinctions between cells in a given population. Here, we present a brief derivation for models encompassing a population of infected cells as well as intracellular and extracellular components of interest.

¹Portions of this chapter to appear in Haseltine, Rawlings, and Yin [60].

In a deterministic setting, we can model the infected cell population by deriving a cell population balance from the equation of continuity. Here we define the concentration of infected cells as a function of time (t) and the internal (y) and external (x) characteristics of the system:

η(t, z)dz = concentration of infected cells   (9.1)

z = [ x ]  =  [ external characteristics ]
    [ y ]     [ internal characteristics ]   (9.2)

We can then write a conservation equation for these cells by considering an arbitrary control volume V(t) spanning a space in x and y, assuming that V(t) contains a statistically significant number of cells. Following the same arguments presented in section 2.1 results in the microscopic equation of continuity, equation (2.8). This equation is the most general form of our proposed model. We reiterate that the only assumption made thus far is that we consider a statistically significant number of cells. We now must specify segregations for the infected cell population. First, we assume that the cells are well-mixed; this assumption allows us to eliminate the spatial dimensions from equation (2.8):

∂η(t, y)/∂t + ∇ · (η(t, y) v_y) = R_η   (9.3)

Next, we propose differentiating among the stages of infection for infected cells by using the infected cell age. The cell age acts as a "clock" that starts upon initial infection of an uninfected cell and ends upon the death of this cell. Although such a parameter cannot be explicitly measured, it can nonetheless be identified experimentally through its effect upon other observable quantities such as the expression of viral products. Because the age changes with time in the usual way, the age velocity term is unity,

y = \tau = \text{infected cell age} \tag{9.4}

v_y = 1 \tag{9.5}

Additionally, modeling the intracellular biochemical network necessitates augmenting the cell population balance with mass balances for viral components (labeled component i). Since the intracellular components are also segregated by the cell age, derivation of these mass balances follows that for the infected cell population (i.e. from equation (2.3) to (9.3)), yielding

\frac{\partial i_j}{\partial t} + \frac{\partial i_j}{\partial \tau} = R_j + E_j, \quad j = 1, \dots, n \tag{9.6}

in which

• Rj is the intracellular production rate of component j. Processes such as transcription and replication of the viral genome are examples of events contributing to Rj.

• Ej accounts for the effect of extracellular events on the intracellular production rate of component j. An example of such an event is superinfection of an infected cell, which inserts additional viral genome and proteins into the cell.

We model extracellular components (labeled component e) as well-mixed and unsegregated (i.e. having no τ-dependence). The production rates for extracellular components may also be a function of both extracellular (E) and intracellular (R) events. In this case, however, infected cells produce and secrete extracellular components at an age-dependent rate. The conservation equation for the extracellular component, then, includes an integration of the intracellular rate over the infected cell population:

\frac{\partial e_k}{\partial t} = E_k + \int_0^{\tau_d} \eta(t, \tau)\, R_k\, d\tau, \quad k = 1, \dots, m \tag{9.7}

Here, τd specifies the age of the oldest infected cell. Examples of processes contributing to Ek and Rk include regeneration of uninfected cells and secretion of virus from infected cells, respectively. The comprehensive model for this system is

\frac{\partial \eta(t, \tau)}{\partial t} + \frac{\partial \eta(t, \tau)}{\partial \tau} = R_\eta \tag{9.8a}
\frac{\partial i_j}{\partial t} + \frac{\partial i_j}{\partial \tau} = R_j + E_j, \quad j = 1, \dots, n \tag{9.8b}
\frac{\partial e_k}{\partial t} = E_k + \int_0^{\tau_d} \eta(t, \tau)\, R_k\, d\tau, \quad k = 1, \dots, m \tag{9.8c}

9.2 Application of the Model to Viral Infections

We now consider application of this model to a generic viral infection. We first outline the basic intracellular and extracellular events occurring in such an infection, discuss further model refinements, and present the numerical technique used to solve the final model.

9.2.1 Intracellular Model

At the intracellular level, we incorporate events from a simple structured model of virus growth [143]:

\text{nucleotides} + \text{gen} \xrightarrow{k_1,\ V_1} \text{tem}, \qquad \epsilon_1 = k_1 i_{V_1} i_{\text{gen}} \tag{9.9a}
\text{amino acids} \xrightarrow{k_2,\ V_2,\ \text{tem}} \text{str}, \qquad \epsilon_2 = k_2 i_{V_2} i_{\text{tem}} \tag{9.9b}
\text{nucleotides} \xrightarrow{k_3,\ \text{tem}} \text{gen}, \qquad \epsilon_3 = k_3 i_{\text{tem}} \tag{9.9c}
\text{str} \xrightarrow{k_4} \text{degraded}, \qquad \epsilon_4 = k_4 i_{\text{str}} \tag{9.9d}
\text{gen} + \text{str} \xrightarrow{k_5} \text{secreted virus}, \qquad \epsilon_5 = k_5 i_{\text{gen}} i_{\text{str}} \tag{9.9e}

Here, gen and tem are the genomic and template viral nucleic acids respectively, str is the viral structural protein, V1 and V2 are viral enzymes that catalyze their respective reactions, and the reaction rates are given by the ε expressions. These events account for the insertion of the viral genome into the host nucleus, production of a viral template used to replicate the viral genome and mass-produce viral structural protein, and the assembly and secretion of viral progeny. We assume that host nucleotides and amino acids are available at constant concentrations. Therefore, the only intracellular components that we must track are the tem, gen, str, V1, and V2 components.
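To make these kinetics concrete, the following sketch (our illustration, not code from this work) integrates the mass-action rate expressions of (9.9) for a single infected cell, using the rate constants of Table 9.1 below, the constant-enzyme assumption stated later in this section, and the initial-infection condition of 1 gen, 80 V1, and 40 V2 per cell; cumulative secreted virus is tracked as an extra state.

```python
# A sketch (ours, not the thesis code) of the intracellular kinetics (9.9)
# for one infected cell; rate constants are those of Table 9.1 below.
import numpy as np
from scipy.integrate import solve_ivp

k1, k2, k3, k4, k5 = 3.13e-4, 25.0, 0.7, 2.0, 7.5e-6  # per day, Table 9.1

def rhs(t, x):
    tem, gen, struct, secreted, V1, V2 = x
    e1 = k1 * V1 * gen       # nucleotides + gen -> tem (catalyzed by V1)
    e2 = k2 * V2 * tem       # amino acids -> str (catalyzed by V2 and tem)
    e3 = k3 * tem            # nucleotides -> gen (catalyzed by tem)
    e4 = k4 * struct         # str -> degraded
    e5 = k5 * gen * struct   # gen + str -> secreted virus
    # enzymes V1, V2 are held constant per the final model refinements
    return [e1, -e1 + e3 - e5, e2 - e4 - e5, e5, 0.0, 0.0]

x0 = [0.0, 1.0, 0.0, 0.0, 80.0, 40.0]   # initial infection: 1 gen, 80 V1, 40 V2
sol = solve_ivp(rhs, (0.0, 100.0), x0, method="LSODA", rtol=1e-8)
```

Consistent with Figure 9.2, this purely intracellular simulation produces monotonically increasing components over the cell lifetime because it contains no cell death.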

9.2.2 Extracellular Events

At the extracellular level, we adopt a standard model [98]:

\text{virus} + \text{uninfected cell} \xrightarrow{k_6} \text{infected cell}, \qquad \epsilon_6 = k_6 e_{\text{vir}} e_{\text{unc}} \tag{9.10a}
\text{virus} \xrightarrow{k_7} \text{degraded}, \qquad \epsilon_7 = k_7 e_{\text{vir}} \tag{9.10b}
\text{infected cell} \xrightarrow{k_8} \text{death}, \qquad \epsilon_8 = k_8 e_{\text{inf}} \tag{9.10c}
\text{uninfected cell} \xrightarrow{k_9} \text{death}, \qquad \epsilon_9 = k_9 e_{\text{unc}} \tag{9.10d}
\text{precursors} \xrightarrow{k_{10}} \text{uninfected cell}, \qquad \epsilon_{10} = k_{10} \tag{9.10e}

These events address the intuitive notions of cell growth, death, and infection by free virus. From this point forward, we use the abbreviations unc, inf, and vir for uninfected host cells, infected host cells, and virus.

9.2.3 Final Model Refinements

Further model assumptions include:

• Reaction rates of intracellular and extracellular events follow simple, mass-action kinetics. All reactions are elementary as written except for enzyme-catalyzed reactions, in which case the ε expressions result from performing model reduction on Michaelis-Menten kinetics.

• Infected cells are created at age zero due to interaction between uninfected cells and free virus, and infected cells die at an exponential rate until age τd:

R_\eta = k_6 e_{\text{unc}} e_{\text{vir}} \delta(\tau) - \eta(t, \tau)\left( k_8 + \delta(\tau - \tau_d) \right) \tag{9.11}

Here, δ is the Dirac delta function. Also, an initial infection corresponds to insertion of 1 gen/cell, 80 V1/cell, and 40 V2/cell into an uninfected cell.

• No superinfection of infected cells occurs.

• Concentrations of intracellular enzymes remain constant throughout the life cycle of an infected cell.

Therefore, our final model is

\frac{\partial \eta(t, \tau)}{\partial t} + \frac{\partial \eta(t, \tau)}{\partial \tau} = k_6 e_{\text{unc}} e_{\text{vir}} \delta(\tau) - \left( k_8 + \delta(\tau - \tau_d) \right) \eta(t, \tau) \tag{9.12a}
\frac{\partial i_{\text{tem}}(t, \tau)}{\partial t} + \frac{\partial i_{\text{tem}}(t, \tau)}{\partial \tau} = R_{\text{tem}} \tag{9.12b}
\frac{\partial i_{\text{gen}}(t, \tau)}{\partial t} + \frac{\partial i_{\text{gen}}(t, \tau)}{\partial \tau} = R_{\text{gen}} + \delta(\tau) \tag{9.12c}
\frac{\partial i_{\text{str}}(t, \tau)}{\partial t} + \frac{\partial i_{\text{str}}(t, \tau)}{\partial \tau} = R_{\text{str}} \tag{9.12d}
\frac{\partial i_{V_1}(t, \tau)}{\partial t} + \frac{\partial i_{V_1}(t, \tau)}{\partial \tau} = 80\,\delta(\tau) \tag{9.12e}
\frac{\partial i_{V_2}(t, \tau)}{\partial t} + \frac{\partial i_{V_2}(t, \tau)}{\partial \tau} = 40\,\delta(\tau) \tag{9.12f}
\frac{d e_{\text{unc}}}{dt} = k_{10} - k_9 e_{\text{unc}} - k_6 e_{\text{unc}} e_{\text{vir}} \tag{9.12g}
\frac{d e_{\text{vir}}}{dt} = -k_7 e_{\text{vir}} - k_6 e_{\text{unc}} e_{\text{vir}} + \int_0^{\tau_d} \eta(t, \tau)\, R_{\text{vir}}(\tau)\, d\tau \tag{9.12h}

9.2.4 Model Solution

To solve the model, we use orthogonal collocation on finite elements of Lagrange polynomials [155, 121, 127]. This method approximates functions of multiple coordinates, e.g. η(t, τ), by a linear combination of Lagrange interpolation polynomials:

\eta(t, \tau) \approx \sum_{j=1}^{n} L_j(\tau)\, \eta(t, \tau_j) \tag{9.13}

in which Lj is a Lagrange interpolation polynomial of degree n, and η(t, τj) is the function evaluated at the point τj. Accordingly, we can approximate the age derivative at each collocation point as

\left. \frac{\partial \eta(t, \tau)}{\partial \tau} \right|_{\tau = \tau_i} \approx \sum_{j=1}^{n} \left. \frac{\partial L_j(\tau)}{\partial \tau} \right|_{\tau_i} \eta(t, \tau_j) \tag{9.14}
\approx \sum_{j=1}^{n} A_{ij}\, \eta(t, \tau_j) \tag{9.15}

in which the matrix A is the derivative weight matrix. Also, we can approximately evaluate integrals by using quadrature

\int_0^{\tau_d} \eta(t, \tau)\, d\tau \approx \sum_{j=1}^{n} q_j\, \eta(t, \tau_j) \tag{9.16}

where qj is the jth quadrature weight. This method is known as the global orthogonal collocation method when only one collocation element is applied to the entire domain of interest. Alternatively, one could split the domain into multiple subdomains, then apply a collocation element to each subdomain; in this case, the method is called orthogonal collocation on finite elements. Collocation on finite elements permits concentration of elements in regions where sharp gradients exist, a case that normally causes difficulties in global orthogonal collocation. At the junction of finite elements, one imposes continuity of the population η(t, τ), i.e.

\eta(t, \tau)\big|_{\bar\tau^-} = \eta(t, \tau)\big|_{\bar\tau^+} \tag{9.17}

in which the boundary between elements occurs at τ = τ̄, and τ̄⁻ and τ̄⁺ represent the boundaries of the adjoining finite elements [127]. Note that the number of boundary conditions at the junction of elements is equal to the number of partial derivatives due to segregations (i.e. τ), and that the order of each boundary condition is one less than the order of its partial derivative. Unless otherwise specified, we use only one collocation element in our discretization. The collocation method is very sensitive to large changes in the order of magnitude of the approximating function. Since equation (9.12a) indicates that η(t, τ) changes exponentially, we use a logarithmic transformation to scale η(t, τ). Applying this method to equation (9.12) in effect discretizes the integro-partial differential equation into a system of differential algebraic equations (DAEs). We then use the software package DASPK [15] to integrate the DAE system. Orthogonal collocation on finite elements presents merely one manner of solving equation (9.12). We refer the interested reader to Mantzaris, Daoutidis, and Srienc [87, 88, 89] for an excellent overview of other numerical methods used to solve similar equations.
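As an illustration of equations (9.13)–(9.16), the following sketch (ours, not the thesis implementation) constructs the derivative weight matrix A and the quadrature weights q for Lagrange polynomials on a given set of nodes. For simplicity it uses equally spaced points; the thesis uses collocation points derived from orthogonal polynomials.

```python
# A sketch (ours) of the collocation operators in (9.14)-(9.16): the
# derivative weight matrix A and the quadrature weights q on nodes tau.
import numpy as np

def lagrange_operators(tau):
    n = len(tau)
    # barycentric weights: w_j = 1 / prod_{k != j} (tau_j - tau_k)
    w = np.array([1.0 / np.prod([tau[j] - tau[k] for k in range(n) if k != j])
                  for j in range(n)])
    # derivative matrix: A[i, j] = dL_j/dtau evaluated at tau_i
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                A[i, j] = (w[j] / w[i]) / (tau[i] - tau[j])
        A[i, i] = -A[i, :].sum()   # rows of a differentiation matrix sum to zero
    # quadrature weights: q_j = integral of L_j over [tau[0], tau[-1]]
    q = np.zeros(n)
    for j in range(n):
        c = np.poly(np.delete(tau, j))      # monic polynomial with other nodes as roots
        c = c / np.polyval(c, tau[j])       # normalize so that L_j(tau_j) = 1
        C = np.polyint(c)
        q[j] = np.polyval(C, tau[-1]) - np.polyval(C, tau[0])
    return A, q

# example: 5 equally spaced points on [0, 100] (tau_d = 100 days)
tau = np.linspace(0.0, 100.0, 5)
A, q = lagrange_operators(tau)
f, df = tau**2, 2 * tau                     # a quadratic is represented exactly
assert np.allclose(A @ f, df) and np.isclose(q @ f, 100.0**3 / 3)
```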

9.3 Application to In Vitro and In Vivo Conditions

To better understand the cell population balance, we apply model (9.12) to both in vitro and in vivo conditions. We also compare the results of the model to other commonly used models.

9.3.1 In Vitro Experiment

Here we construct an in silico example to simulate a laboratory experiment. Our apparatus is a well-mixed, batch reactor containing uninfected cells in which nutrients are provided to sustain cells without growth. We assume that assays are available that measure the concentration of uninfected cells, infected cells, virus, genome, template, structural protein, and V1 and V2 viral enzymes. With the goal of determining the intracellular kinetics, we consider performing the following experiment: infect a population of cells and measure components for a sample of cells. Although this technique has the disadvantage of introducing the population dynamics into the measurements, sampling a statistically significant number of cells has two primary advantages:

Parameter   Value          Units
τd          100            days
k1          3.13 × 10^−4   cell/(#-day)
k2          25.0           cell/(#-day)
k3          0.7            day^−1
k4          2.0            day^−1
k5          7.5 × 10^−6    cell/(#-day)
k6          5.0 × 10^−9    host/(#-day)
k7          8.0 × 10^−2    day^−1
k8          5.0 × 10^−2    day^−1
k9          1.0 × 10^−2    day^−1
k10         0              #/(host-day)

Table 9.1: Model parameters for in vitro simulation

• stochastic effects and cell to cell variations should average out, and

• we can adjust the sample size so that each component can be detected by its assay and consistency with the key assumption of the continuity equation (statistically significant number of cells) is maintained.

We simulate the population balance model (9.12) with parameters given in Table 9.1 for the following initial conditions:

1. extracellular virus >> uninfected cells (all uninfected cells are infected initially), and

2. extracellular virus > uninfected cells (only a fraction of uninfected cells are infected initially).

Experimental observations indicate that infected cells die [83]. Perhaps the simplest way to account for cell death is to combine the intracellular model (9.9) with a simple population balance, i.e.

\frac{d e_{\text{unc}}}{dt} = -\bar{k}_5 e_{\text{unc}} - \bar{k}_2 e_{\text{unc}} e_{\text{vir}} \tag{9.18a}
\frac{d e_{\text{inf}}}{dt} = -\bar{k}_4 e_{\text{inf}} + \bar{k}_2 e_{\text{unc}} e_{\text{vir}} \tag{9.18b}
\frac{d e_{\text{vir}}}{dt} = -\bar{k}_3 e_{\text{vir}} - \bar{k}_2 e_{\text{unc}} e_{\text{vir}} + R_{\text{vir}} e_{\text{inf}} \tag{9.18c}

Equation (9.18) is a structured, unsegregated model. Next, we perform parameter estimation and model reduction² to obtain an optimal fit of the structured, unsegregated model (9.18) to the data generated by the population balance (structured, segregated) model (9.12); a fitting sketch is given below. For the sake of brevity, we do not report any of the fitted rate constants (k̄'s). Examining this optimal fit provides insight into the limitations of structured, unsegregated models.

²Rawlings and Ekerdt [120] provide the details of this method.
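The fitting step can be sketched as follows (our construction, not the thesis code). Here the virus production rate Rvir of (9.18c) is treated as a constant to be estimated, the `data` array stands in for measurements generated by the population balance model, and the log-transformed residual with a small constant c echoes the residual the in vivo fit uses later in this chapter.

```python
# A sketch (ours) of fitting the structured, unsegregated model (9.18) to
# population-balance "data" by nonlinear least squares.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

t_meas = np.linspace(0.0, 100.0, 21)          # sampling times (days)

def simulate(params, e0):
    k2b, k3b, k4b, k5b, Rvir = params
    def rhs(t, e):
        unc, inf, vir = e
        return [-k5b * unc - k2b * unc * vir,
                -k4b * inf + k2b * unc * vir,
                -k3b * vir - k2b * unc * vir + Rvir * inf]
    return solve_ivp(rhs, (0.0, t_meas[-1]), e0, t_eval=t_meas, rtol=1e-8).y

def residual(log10_params, data, e0, c=1.0):
    pred = simulate(10.0 ** log10_params, e0)  # log10-transformed parameters
    return (np.log10(pred + c) - np.log10(data + c)).ravel()

# usage, with data from a population balance solve:
# fit = least_squares(residual, x0=np.zeros(5), args=(data, e0))
```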

Case 1: All Uninfected Cells Infected Initially

Figure 9.1 presents the results for this case. These results indicate that the structured, unsegregated model provides an excellent fit to the data. Since all uninfected cells are infected within a relatively short period of time (roughly ten days), the approximation that all cells behave the same is valid; hence the good fit to the data. We contrast these results to those obtained from only simulating the intracellular events (i.e. Figure 9.2). Over the same time, the purely intracellular model predicts that all intracellular components increase monotonically throughout the experiment. We therefore infer that the phenomenon of cell death causes the maxima observed in the measured intracellular components. This observation reiterates the fact that experiments of this type introduce the population dynamics into the measurements.

Case 2: A Fraction of Uninfected Cells Infected Initially

Figure 9.3 presents the results for this case. Examination of these results indicates that roughly two rounds of infection initiation occur (marked by peaks in the infected cell population): the first round within the first ten days of the experiment, corresponding to the initial infection; and the second round at roughly 75 to 100 days, corresponding to infection of uninfected cells by virus produced by the first round of infected cells. Since the structured, unsegregated model assumes that all cells behave on average the same, it cannot adequately describe the phenomenon of multiple rounds of infection. As a result, this model provides a sub-par fit to the data. Also, we note that multiple rounds of infection have been observed experimentally in continuous flow reactors [66, 151, 134, 74], as opposed to the batch conditions simulated here.

9.3.2 In Vivo Initial Infection

We now consider the in vivo behavior of the cell population balance for an initial infection of a virus-free host. Here, the initial condition is the steady state of the system with no virus. For the sake of illustration, we account for the host immune response very simply: comparison of Tables 9.1 and 9.2 shows that, in contrast to the in vitro system, the in vivo system:

• clears extracellular virus more rapidly (faster decay due to a larger value of k7), and

• produces uninfected host cells at a nonzero rate (k10 is now nonzero).

Figure 9.4 demonstrates the host response for all extracellular components. The system exhibits three stages of infection: first, a period of relative dormancy for roughly two infection cycles (200 days ≈ 2τd); next, a cycle of rapid infection leading to a peak first in the infected cell population and then in the virus population; and finally, an approach to an infected steady state. In the first stage, both the extracellular virus and infected cell populations are actually increasing steadily. However, in contrast to the rapid rate of infection observed during the second stage, the first stage appears dormant on the scale of Figure 9.4.

[Figure 9.1 panels: uninfected cells, infected cells, tem, virus, gen, and struct (#/host) versus time (days).]

Figure 9.1: Fit of a structured, unsegregated model to experimental results. Initial condition is such that all uninfected cells are quickly infected by virus. Points present the “experimental” data obtained by solving the population balance model (structured, segregated model). Lines present the optimal fit of the structured, unsegregated model to the “experimental” data. 160

[Figure 9.2: log-scale concentrations of str, secreted virus, gen, and tem versus time (days).]

Figure 9.2: Time evolution of intracellular components and secreted virus for the intracellular model

Parameter   Value          Units
τd          100            days
k1          3.13 × 10^−4   cell/(#-day)
k2          25.0           cell/(#-day)
k3          0.7            day^−1
k4          2.0            day^−1
k5          7.5 × 10^−6    cell/(#-day)
k6          5.0 × 10^−9    host/(#-day)
k7          1.0            day^−1
k8          5.0 × 10^−2    day^−1
k9          1.0 × 10^−2    day^−1
k10         1.0 × 10^6     #/(host-day)

Table 9.2: Model parameters for in vivo simulation

For the in vivo case, structured, unsegregated models do not offer an adequate representation of the system. First, the "average cell" approximation ignores the cyclic nature of an infection because we must assume that the average cell reaches a steady state when intuitively we know that cells are regenerating and dying. Second, our intracellular model (see Figure 9.2) does not reach a steady state over the lifetime of an infected cell (i.e. 100 days), so making the in vivo model reach a steady state requires unphysical changes to either the intracellular or extracellular description.

[Figure 9.3 panels: uninfected cells, infected cells, tem, virus, gen, and struct (#/host) versus time (days).]

Figure 9.3: Fit of a structured, unsegregated model to experimental results. Initial condition is such that not all uninfected cells are initially infected by virus. Points present the "experimental" data obtained by solving the population balance model (structured, segregated model). Lines present the optimal fit of the structured, unsegregated model to the "experimental" data.

[Figure 9.4: extracellular components (#/host), uninfected cells, virus, and infected cells, versus time (days).]

Figure 9.4: Dynamic in vivo response of the cell population balance to initial infection

[Figure 9.5: extracellular components (#/host), virus, uninfected cells, and infected cells, versus time (days).]

Figure 9.5: Extracellular model fit to dynamic in vivo response of an initial infection

Alternatively, we could incorporate only the extracellular events (9.10) in a mathematical description as follows:

\frac{d e_{\text{unc}}}{dt} = \hat{k}_1 - \hat{k}_5 e_{\text{unc}} - \hat{k}_2 e_{\text{unc}} e_{\text{vir}} \tag{9.19a}
\frac{d e_{\text{inf}}}{dt} = -\hat{k}_4 e_{\text{inf}} + \hat{k}_2 e_{\text{unc}} e_{\text{vir}} \tag{9.19b}
\frac{d e_{\text{vir}}}{dt} = -\hat{k}_3 e_{\text{vir}} - \hat{k}_2 e_{\text{unc}} e_{\text{vir}} + \hat{k}_6 e_{\text{inf}} \tag{9.19c}

Model (9.19)                                         Model (9.12)
Parameter   Fit Value      95% Confidence Interval   Parameter   Value         Units
k̂1          7.96 × 10^5    ±2.17 × 10^5              k10         1.0 × 10^6    #/(host-day)
k̂2          4.28 × 10^−9   ±1.04 × 10^−9             k6          5.0 × 10^−9   host/(#-day)
k̂3          1.56 × 10^−2   ±2.16 × 10^−3             k7          1.0           day^−1
k̂4          3.86 × 10^−2   ±1.07 × 10^−2             k8          5.0 × 10^−2   day^−1
k̂5          1.04 × 10^−2   ±2.10 × 10^−3             k9          1.0 × 10^−2   day^−1
k̂6          0.104          ±8.68 × 10^−3             NA                        day^−1
unc(t=0)    2.14 × 10^8    ±5.68 × 10^7              unc(t=0)    10^8          #/host
vir(t=0)    37.2           ±9.25                     vir(t=0)    1000          #/host

Table 9.3: Comparison of actual and fitted parameter values for in vivo simulation of an initial infection

This model differs from that of Wodarz and Nowak [164] only in that we assume infection of an uninfected cell by a virus consumes the virus. Again, we attempt to optimally fit this model (9.19) to the cell population balance results³. Figure 9.5 shows that this model cannot exhibit the same behavior as the cell population balance; most noticeably, the purely extracellular model can capture neither the dynamics of the initial dormant phase nor the burst of virus that follows the peak in the infected cell population. Table 9.3 illustrates that the fitted and actual parameters do not match to 95% confidence, but all fitted parameters are roughly the same order of magnitude with the exception of the virus decay parameter (k7 and k̂3) and the initial virus concentration. This discrepancy occurs because the purely extracellular model (9.19) lumps all intracellular virus production events together. This result indicates that unstructured, lumped parameter models can supply unreliable estimates for parameters that govern individual events.

9.3.3 In Vivo Drug Therapy

Now we consider in vivo response to drug therapy. In particular, we examine the extracellular effect that viral enzyme inhibitors I1 and I2 produce by affecting the intracellular enzymes V1 and V2, respectively. Thus, the extracellular events associated with the drug therapy are

I_1 \xrightarrow{k_{13}} \text{degraded/secreted}, \qquad \epsilon_{13} = k_{13} e_{I_1} \tag{9.20a}
I_1 + \text{unc} \xrightarrow{k_{14}} I_1\text{(adsorbed)} + \text{unc}, \qquad \epsilon_{14} = k_{14} e_{I_1} e_{\text{unc}} \tag{9.20b}
I_2 \xrightarrow{k_{15}} \text{degraded/secreted}, \qquad \epsilon_{15} = k_{15} e_{I_2} \tag{9.20c}
I_2 + \text{unc} \xrightarrow{k_{16}} I_2\text{(adsorbed)} + \text{unc}, \qquad \epsilon_{16} = k_{16} e_{I_2} e_{\text{unc}} \tag{9.20d}

³Optimal fit corresponds to a least squares fit for the residual log10(yk + c·i) − log10(sk + c·i), where log10 is the base ten logarithm, yk is the measurement vector, sk is the model predicted measurement vector, i is a vector of ones, and c is a small constant. Also, the initial uninfected cell and virus concentrations were used as model parameters.

In equations (9.20b) and (9.20d), we use the notation "(adsorbed)" to designate that the extracellular drugs have been adsorbed into a cell. Intracellularly, these drugs then interact as follows:

V_1 + I_1 \overset{K_1}{\rightleftharpoons} V_1{\cdot}I_1 \tag{9.21a}
V_2 + I_2 \overset{K_2}{\rightleftharpoons} V_2{\cdot}I_2 \tag{9.21b}
I_1 \xrightarrow{k_{11}} \text{secreted}, \qquad \epsilon_{11} = k_{11} i_{I_1} \tag{9.21c}
I_2 \xrightarrow{k_{12}} \text{secreted}, \qquad \epsilon_{12} = k_{12} i_{I_2} \tag{9.21d}

For this situation, we assume that:

1. equilibrium holds for the intracellular reactions (9.21a) and (9.21b);

2. all other reactions in (9.21) and (9.20) are elementary as written;

3. the inhibitors interact only with uninfected cells; and

4. the extracellular drug intake can be modeled as an overdamped second-order, linear function [99] of the form (a sketch follows this list)

u_{I_j}(t) = \bar{u}_{I_j} \left[ 1 - \exp\left( -\frac{\zeta t}{\tau_u} \right) \left( \cosh\left( \frac{\beta t}{\tau_u} \right) + \frac{\zeta}{\beta} \sinh\left( \frac{\beta t}{\tau_u} \right) \right) \right] \tag{9.22a}
\beta = \left( \zeta^2 - 1 \right)^{0.5} \tag{9.22b}

assuming that a change in the drug intake occurs at time t = 0.
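A sketch of the input function (9.22) follows (our transcription, not the thesis code), with the Table 9.4 values of ζ and τu as illustrative defaults:

```python
# A sketch (ours) of the drug-intake input (9.22): an overdamped
# second-order step response, valid for damping zeta > 1 and t >= 0.
import numpy as np

def drug_intake(t, u_bar, zeta=1.1, tau_u=10.0):     # defaults from Table 9.4
    beta = np.sqrt(zeta**2 - 1.0)                    # equation (9.22b)
    s = t / tau_u
    return u_bar * (1.0 - np.exp(-zeta * s) *
                    (np.cosh(beta * s) + (zeta / beta) * np.sinh(beta * s)))

# e.g. u_I1(t) with u_bar = 4.0e7 #/(host-day), intake change at t = 0
```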

Parameters for this model are given in Tables 9.2 and 9.4. The initial condition for this model corresponds to the steady state of the previous section (see Figure 9.4). Figure 9.6 presents the dynamic response for in vivo drug therapy. This response demonstrates the characteristic "pharmacokinetic lag" observed experimentally in viral treatments [13, 101]; however, this lag is directly attributable to modeled events, namely the drug intake dynamics, the assumption that the drugs interact only with uninfected cells, and the intracellular dynamics of drug interaction with virus enzymes. In contrast, purely extracellular models must lump each of these individual events into (generally) a single parameter to describe this lag, as examined by Perelson et al. [101]. Another attractive feature of the cell population balance over the purely extracellular model is the ability to examine the effects that perturbations to the intracellular model have upon the extracellular components. As an example, we consider the effect that changes in the efficacy of the viral inhibitors have upon the extracellular uninfected cell and virus concentrations. Such a change in efficacy may result, for example, from a mutation in the viral enzymes causing decreased efficiency in the viral enzyme-inhibitor interaction. Also, we assume that intracellular drug concentrations cannot exceed values of 45 and 60 #/cell for iI1 and iI2, respectively, due to adverse side-effects of the inhibitors. Plots (a) and (b) of Figure 9.7 present the results for the nominal case.

Parameter   Value          Units
K1          1.0            cell/#
K2          1.0            cell/#
k11         100.           day^−1
k12         100.           day^−1
k13         10.            day^−1
k14         1.0 × 10^−3    host/(#-day)
k15         9.0            day^−1
k16         8.0 × 10^−4    host/(#-day)
ūI1         4.0 × 10^7     #/(host-day)
ūI2         4.0 × 10^7     #/(host-day)
ζ           1.1            unitless
τu          10.            day

Table 9.4: Additional model parameters for in vivo drug therapy

If the goal of the drug therapy is to maximize the uninfected cell concentration while minimizing the virus concentration, then the optimal treatment strategy is to maximize intake of both drugs. Plots (c) and (d) of Figure 9.7 present the results for a mutated virus corresponding to an 80% and 90% decrease in the binding constants K1 and K2, respectively. After the mutation, the optimal treatment strategy is actually to maximize I1 intake and stop treatment with I2.

[Figure 9.6 panels: extracellular inhibitor concentrations (I1, I2), infected and uninfected cells, and extracellular virus versus time (days).]

Figure 9.6: Dynamic in vivo response to initial treatment with inhibitor drugs I1 and I2.

[Figure 9.7 panels (a)-(d): uninfected cells and extracellular virus (#/host) versus intracellular I2 (#/cell), for increasing I1.]

Figure 9.7: Effect of drug therapy on in vivo steady states. Amount of (a) uninfected cells and (b) extracellular virus given nominal drug efficacy. Amount of (c) uninfected cells and (d) extracellular virus given reduced drug efficacy due to virus mutation.

9.4 Future Outlook and Impact

The cell population balance offers an intuitive, flexible environment for modeling the combined intracellular and extracellular events associated with viral infections. Because this model has segregations, it can account for observed phenomena such as multiple rounds of infection and pharmacokinetic delays associated with drug treatments of infections. Because this model has structure, it can examine the effects that each intracellular component has upon the dynamics of the extracellular components. Neither structured, unsegregated models nor purely extracellular models can account for both of these phenomena.

Validation of cell population balance models requires experimental measurements of both extracellular populations and intracellular viral components. Traditional assays already offer a means for measuring extracellular populations; for example, clinicians routinely measure both host CD4+ T-cells and virus titers in HIV-infected patients. Methods such as polymerase chain reaction (PCR), western blotting, and plaque assays offer quantitative intracellular measurements of the viral genome, proteins, and infectious viral progeny, respectively. Cell population balance models provide one method of adequately assimilating the data contained in these measurements.

For in vitro experiments, we suspect that modifications to existing protocols may yield new information about the structure of the population balance model. For example, most studies of replication for animal viruses rely on one-step growth curves in which all cells in a culture are infected simultaneously [162]. While such experiments have supplied information on the intracellular dynamics of a single infection cycle, they offer no insight into how virus-mediated activities, such as activation of cellular antiviral responses and cell-cell communication, may influence the subsequent dynamics of viral propagation. New in vitro methods currently being developed [26, 28] allow viruses to infect cells sequentially rather than simultaneously, opening new opportunities to probe virus-host interactions at multiple levels.

A good quantitative model of how viral infections propagate will lead to better understanding of how to best control this propagation. For example, steady-state analysis for in vivo drug therapy revealed that the optimal treatment strategy for one particular virus mutation requires stopping treatment with one drug. This counterintuitive result highlights a potential pitfall of current strategies that aim to thwart the emergence of drug-resistant virus mutants by employing multiple anti-viral drugs. Another intriguing possibility would be to perform sensitivity analysis for both intracellular components and rate constants to determine which ones have the greatest impact upon extracellular components such as the virus concentration. This analysis could then focus drug development towards those candidates having maximum therapeutic benefit. One could also consider tailoring therapies by characterizing both the virus and immune system for a given individual, rather than relying on general drug regimens obtained from the best "average" response for a given study.

Notation

A         derivative weight matrix for orthogonal collocation
c         small constant
Ej        extracellular production rate
ej        extracellular viral component
i         a vector of ones
ij        intracellular viral component
Kj        equilibrium constant for the segregated, structured model
kj        reaction rate constant for the segregated, structured model
k̄j        reaction rate constant for the unsegregated, structured model
k̂j        reaction rate constant for the purely extracellular model
Lj(τ)     Lagrange interpolation polynomial of degree n for orthogonal collocation
log10     base ten logarithm
qj        jth quadrature weight for orthogonal collocation
Rj        jth intracellular production rate
Rη        production rate for the infected cell population η
sk        measurement vector predicted by the model
t         time
uj        second-order input for extracellular component j
ūj        input for extracellular component j
V(t)      arbitrary, time-varying control volume spanning a space in z
vy        vector specifying the y-component velocity of cells flowing through the volume V
x         external characteristics
y         internal characteristics
yk        experimental measurement vector
z         internal and external characteristics
β         parameter for the second-order input function
δ         Dirac delta function
εj        jth reaction rate
η(t, z)dz concentration of infected cells
η(t, τj)  infected cell concentration evaluated at the point τj
τ         infected cell age
τd        age of the oldest infected cell permitted by the model
τu        natural period of the second-order input function
ζ         damping coefficient of the second-order input function

Chapter 10

Modeling Virus Dynamics: Focal Infections

We consider using dynamic models to obtain a better quantitative and integrative understanding of both viral infections and cellular antiviral mechanisms. We expect this approach to provide key insights into mechanisms of viral pathogenesis and host immune responses, as well as facilitate development of effective anti-viral strategies. Our focus, however, is not to incorporate all the wealth of information already known about either of these topics; rather, we seek to identify the critical biological and experimental phenomena that give rise to the experimental observations. We consider the focal infection system described by Duca et al. [26], which permits quantification of multiple rounds of viral infection. This experimental system provides a unique platform for studying multiple rounds of the virus replication cycle as well as the innate ability of host cells to combat the invading virus.

We consider the example virus/host system of vesicular stomatitis virus (VSV) propagating on either baby hamster kidney (BHK-21) cells or murine astrocytoma (DBT) cells. VSV is a member of the Rhabdoviridae family consisting of enveloped RNA viruses [129]. Its compact genome is only approximately 12 kb in length, and encodes genetic information for five proteins. Because VSV is highly infective and grows to high titer in cell culture, it is viewed as a model system for studying viral replication [64, 7]. Also, VSV infection can elicit an interferon-mediated antiviral response from host cells [129]. Thus the studied experimental system provides a platform for further probing the quantitative dynamics of this antiviral response. A great wealth of information is known about the interferon antiviral response (see, for example, [133, 54]). We seek to elucidate what level of complexity is requisite to explain the experimental data.

Yin and McCaskill [165] first proposed a reaction-diffusion model to capture the dynamics of plaque formation due to viral infection. The authors derived model solutions for this formulation in several limiting cases. You and Yin [166] later refined this model and used a finite difference method to numerically solve the time progression of the resulting model. Fort [34] and Fort and Méndez [35] revised the model of You and Yin [166] to account for the delay associated with intracellular events required to replicate virus, and derived expressions for the velocity of the propagating front. These works, however, focused on explaining the velocity of the infection front, a quantity derived from experimentally obtained images of the infection spread. Our goal in this chapter is to explain the infection dynamics contained within the entire images.

In this chapter, we first briefly review the experimental system of interest. Next, we outline the steps taken to analyze the experimental measurements (images of the infection spread) and propose a measurement model. We then successively formulate, fit, and refine models using the analyzed images, first for VSV infection of BHK-21 cells, then for DBT cells. Finally, we analyze the results of the parameter fitting and present conclusions.

[Figure 10.1 schematic: Step 1: monolayers fixed at selected times; Step 2: removal of agar and washes; Step 3: antibody labeling for viral glycoprotein; Step 4: detection by antibody immunofluorescence. Key: antibody, virus, dead cell, infected cell, uninfected cell.]

Figure 10.1: Overview of the experimental system. Initially, host cells are grown in a confluent monolayer on a plate. The cells are then covered by a layer of agar. To initiate the infection, a pipette (one mm radius) is used to carefully remove a small portion of the agar in the center of the plate. An initial inoculum of virus is then placed in the resulting hole in the agar, initiating the infection. The agar overlay serves to restrict virus propagation to nearby cells. To monitor the infection spread, monolayers are fixed at various times post-infection. The agar overlay is removed and the cells are rinsed several times, the last time with a labeled antibody that binds specifically to the viral glycoprotein coating the exterior of the virus capsid. Images of the monolayers are then acquired using an inverted epifluorescent microscope.

10.1 Experimental System

Here we briefly review the experimental system of interest; for detailed information on the experimental procedure, we refer the interested reader to Duca et al. [26]. This system permits dynamic, spatial quantification of virus protein via antibody immunofluorescence. Figure 10.1 presents a general schematic of this experimental system along with a digital image acquired during such an infection. Initially, host cells are grown in a confluent monolayer on a plate. The cells are then covered by a layer of agar. To initiate the infection, a pipette (one mm radius) is used to carefully remove a small portion of the agar in the center of the plate. An initial inoculum of virus is then placed in the resulting hole in the agar, initiating the infection. The agar overlay serves to restrict virus propagation to nearby cells. To monitor the infection spread, monolayers are fixed at various times post-infection. The agar overlay is removed and the cells are rinsed several times, the last time with a labeled antibody that binds specifically to the viral glycoprotein coating the exterior of the virus capsid. Images of the monolayers are then acquired using an inverted epifluorescent microscope.

Parameter                                   Symbol    Value
Cell volume                                 Vc        3.4 × 10^−9 ml
Initial number of uninfected cells          nunc,0    10^6 cells
Number of viruses in the initial inoculum   nvir,0    8.0 × 10^4 viruses
Radius of the plate                         rplate    1.75 cm

Table 10.1: Parameters used to describe the experimental conditions.

10.1.1 Modeling the Experiment

Table 10.1 presents parameters used to model the experimental conditions. We assume that cells are spherical objects, with the height of the cell monolayer equal to the resulting cell diameter. Concentrations for all species are calculated assuming that the volume of the monolayer is cylindrical. The dimensions of this cylinder are given by the height of the cell monolayer and the radius of the plate. We model the concentration of the initial virus inoculum using the piecewise linear continuous function

c_{\text{vir}}(t = 0, r) =
\begin{cases}
c_{\text{vir},0}, & r < 0.075 \text{ cm} \\
\left[ 1 - \frac{20}{\text{cm}} (r - 0.075) \right] c_{\text{vir},0}, & 0.075 \text{ cm} \le r \le 0.125 \text{ cm} \\
0, & r > 0.125 \text{ cm}
\end{cases} \tag{10.1}
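For illustration, the inoculum profile (10.1) can be evaluated as follows (a sketch; the vectorized form and argument names are ours):

```python
# A sketch (ours) of the initial inoculum profile (10.1); r is the radius
# in cm and c_vir0 the inoculum concentration.
import numpy as np

def c_vir_initial(r, c_vir0):
    ramp = (1.0 - 20.0 * (r - 0.075)) * c_vir0   # linear decay, slope 20/cm
    return np.where(r < 0.075, c_vir0,
                    np.where(r <= 0.125, ramp, 0.0))
```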

10.1.2 Modeling the Measurement

We assume that the measurement process (steps one through four in Figure 10.1) is an equilibrium process in which virus associates indiscriminately with cells in the monolayer. Additionally, dead cells undergo a change in morphology which decreases their ability to remain bound to the plate during removal of the agar overlay. We account for this effect by estimating kwash, the fraction of dead cells that adhere to the plate after the removal of the agar overlay and the subsequent washes. Accordingly, the amount of virus bound to host cells is given by the expression

K_m = \frac{c_{\text{vir-host}}}{c_{\text{vir}} \left( c_{\text{unc}} + c_{\text{infc}} + k_{\text{wash}} c_{\text{dc}} \right)} \tag{10.2}

in which Km is the equilibrium constant, and cvir, cunc, cinfc, cdc, and cvir-host refer to the concentrations of virus, uninfected cells, infected cells, dead cells, and virus-host complexes, respectively.

[Figure 10.2 sketch: intensity (0 to 255) versus virus-host complex concentration between vmin and vmax, for the original and averaged images.]

Figure 10.2: Measurement model. The original images quantize the virus-host concentration, a continuous variable, onto the integer-valued intensity. Each pixel in the averaged images is the mean of 400 pixels from the original image, and we approximate the step-wise discontinuous intensity (incremented by 1/400) as a piecewise continuous function.

10.1.3 Analyzing and Modeling the Images

We have reduced the amount of information in each image by partitioning the images into blocks of 20 pixels by 20 pixels, then averaging the pixels contained in each block. This averaging technique has the primary benefit of drastically reducing the total number of pixels that must be analyzed (in the case of the largest image, from roughly two million to five thousand pixels) while retaining the prominent features of the infection spread. We assume that the intensity of each pixel in the image is due to the background fluorescence of cells and linear variation in the concentration of virus-host complexes, which fluoresce due to the labeled antibody. In the original images, the intensity information quantizes this essentially continuous variable into a step-wise, discontinuous signal (integer valued from 0 to the saturating value of 255). For the averaged images, the intensity information is step-wise, discontinuous with increments of 1/400. We approximate this signal using a piecewise continuous function. The comparison between the measurement model for the original and averaged images is illustrated in Figure 10.2. The measurement model is then:

y_m =
\begin{cases}
i_{\text{bgd}}, & c_{\text{vir-host}} \le v_{\min} \\
k_m c_{\text{vir-host}} + i_{\text{bgd}}, & v_{\min} < c_{\text{vir-host}} < v_{\max} \\
255, & c_{\text{vir-host}} \ge v_{\max}
\end{cases} \tag{10.3}

in which ym is the intensity measurement, km is the conversion constant from concentration to intensity, ibgd is the background fluorescence (in intensity), and vmin and vmax are the minimum and maximum detectable virus-host concentrations.
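The block averaging and the measurement map (10.2)-(10.3) can be sketched together as follows (our illustration; function and argument names are ours, and any parameter values are placeholders, not fitted values):

```python
# A sketch (ours) of the 20x20 block averaging and the measurement model
# (10.2)-(10.3); assumes grayscale images stored as 2-D numpy arrays.
import numpy as np

def block_average(img, b=20):
    h, w = img.shape
    img = img[:h - h % b, :w - w % b]            # crop to a multiple of b
    return img.reshape(h // b, b, w // b, b).mean(axis=(1, 3))

def c_vir_host(c_vir, c_unc, c_infc, c_dc, K_m, k_wash):
    # equilibrium (10.2): virus binds indiscriminately to cells on the plate
    return K_m * c_vir * (c_unc + c_infc + k_wash * c_dc)

def intensity(cvh, k_m, i_bgd, v_min, v_max):
    y = k_m * cvh + i_bgd                        # linear range of (10.3)
    y = np.where(cvh <= v_min, i_bgd, y)         # below detection: background only
    return np.where(cvh >= v_max, 255.0, y)      # detector saturation
```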

[Figure 10.3 rows: images at 18, 30, 48, 72, and 90 hours post-infection; columns: data, original model, and model with initial inoculation effect.]

Figure 10.3: Comparison of representative experimental images to model fits. The full set of experimental images is available in the appendix. "Original Model" refers to the derived reaction-diffusion model. "+ Initial Inoculation" incorporates the variation in the concentration of uninfected cells within the radius of the initial inoculation. The white scale bar in the upper left-hand corner of the experimental images is one millimeter.

10.2 Propagation of VSV on BHK-21 Cells

We first consider propagation of VSV on baby hamster kidney (BHK-21) cells. The first column of images in Figure 10.3 presents representative images for the time course of the experiment; the full set of experimental images is available in the appendix. For this virus/host system, the images demonstrate two prominent features: (1) the infection propagates unimpeded outward radially and (2) the band of intensity amplifies from the first to the third measurement. We now consider models to quantitatively capture both of these features.

10.2.1 Development of a Reaction-Diffusion Model

We extend the reaction-diffusion model first proposed by Yin and McCaskill [165] and later refined by You and Yin [166] to model this infection. We consider only extracellular species, namely virus, uninfected cells, infected cells, and dead cells. In this context, only the virus is allowed to diffuse, and we model the following reactions:

\text{virus} + \text{uninfected cell} \xrightarrow{k_1} \text{infected cell} \tag{10.4a}
\text{infected cell} \xrightarrow{k_2} Y\,\text{virus} \tag{10.4b}

in which Y is the yield of virus per infected cell. We assume that the infection propagation is radially symmetric. The concentrations of all species are then segregated by both time and radial distance, giving rise to the following governing equations for the model:

\frac{\partial c_{\text{vir}}}{\partial t} = \frac{1}{r} \frac{\partial}{\partial r} \left( D^{\text{eff}}_{\text{vir}}\, r\, \frac{\partial c_{\text{vir}}}{\partial r} \right) + R_{\text{vir}} \tag{10.5a}
\frac{\partial c_{\text{unc}}}{\partial t} = R_{\text{unc}} \tag{10.5b}
\frac{\partial c_{\text{infc}}}{\partial t} = R_{\text{infc}} \tag{10.5c}
D^{\text{eff}}_{\text{vir}} = 2 D_{\text{vir}} \frac{1 - \phi}{2 + \phi} \tag{10.5d}
\phi = V_e \left( c_{\text{unc}} + c_{\text{infc}} \right) \tag{10.5e}
c_j(t = 0, r) \text{ known}, \qquad \left. \frac{d c_{\text{vir}}}{dr} \right|_{r = 0,\, r_{\max}} = 0 \tag{10.5f}

in which the reaction terms (e.g. Rvir) are dictated by the stoichiometry of reaction (10.4) assuming that the reactions are elementary as written. Also, diffusivity of the virus is hindered due to the presence of uninfected and infected cells on the plate. An effective diffusivity accounts for this effect. We solve equation (10.5) by discretizing the spatial dimension using central differences with an increment of 0.025 cm, then solving the resulting set of differential-algebraic equations using the package DASKR, a variant of the predictor-corrector solver DASPK [15], with the banded solver option. We determine optimal parameter estimates by solving the following least squares optimization

\min_\theta \Phi = \min_\theta \sum_k e_k^T R e_k

\text{s.t.:} \quad e_k = y_k - h(x_k; \theta), \qquad x_k = \begin{bmatrix} c_{\text{vir}} & c_{\text{unc}} & c_{\text{infc}} & c_{\text{dc}} \end{bmatrix}^T, \qquad \text{equation (10.5)}

which minimizes the sum of squared residuals between the vectorized images yk and the model-predicted images h(xk; θ) in a pixel by pixel comparison by manipulating the model parameters θ. Here we use a log10 transformation of the parameters for the optimization.
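The spatial discretization can be sketched as follows (our construction, not the thesis code: a finite-volume variant of the central-difference scheme on the 0.025 cm grid, with the zero-flux conditions of (10.5f)):

```python
# A sketch (ours) of discretizing the radial diffusion term
# (1/r) d/dr(D_eff r dc/dr) of (10.5a) on a uniform grid, with zero-flux
# boundaries at r = 0 and r = rmax; nodes sit at r_i = i*dr.
import numpy as np

def radial_diffusion(c, D_eff, dr=0.025):
    n = len(c)
    r_face = (np.arange(1, n) - 0.5) * dr          # interior cell faces
    D_face = 0.5 * (D_eff[:-1] + D_eff[1:])        # D_eff averaged to the faces
    F = D_face * r_face * np.diff(c) / dr          # D_eff * r * dc/dr at faces
    out = np.empty(n)
    out[0] = F[0] / (dr**2 / 8.0)                  # axis cell: volume int r dr = dr^2/8
    out[1:-1] = (F[1:] - F[:-1]) / (np.arange(1, n - 1) * dr**2)
    out[-1] = -F[-1] / ((n - 1) * dr**2 / 2.0)     # half cell at the outer wall
    return out
```

This term supplies the diffusion contribution to dc_vir/dt at every node; the reaction terms of (10.4) are added pointwise.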

[Figure 10.4 plot: cunc (#/ml) versus radius (cm) for the original model and the model with the initial inoculation effect.]

Figure 10.4: Comparison of the initial uninfected cell concentration for the original and revised (accounting for the initial inoculation effect) models.

The second column of images in Figure 10.3 presents the results for the optimal fit. In comparison to the experimental data, the results demonstrate similar radial propagation of the infection front, but do not capture the amplification of intensity observed through the first three samples. To refine the model, we propose that the resulting amplification results from an initial condition effect. In particular, we allow the initial concentration of uninfected cells to vary within the radius of the initial inoculum, and introduce the parameter c⁰unc,0, in which

c_{\text{unc}}(t = 0, r) =
\begin{cases}
c^0_{\text{unc},0}, & r < 0.075 \text{ cm} \\
c^0_{\text{unc},0} + \left( c_{\text{unc},0} - c^0_{\text{unc},0} \right) \frac{20}{\text{cm}} (r - 0.075), & 0.075 \text{ cm} \le r \le 0.125 \text{ cm} \\
c_{\text{unc},0}, & r > 0.125 \text{ cm}
\end{cases} \tag{10.6}

Performing the parameter estimation with this additional degree of freedom yields the altered initial concentration profile for uninfected cells in Figure 10.4 as well as the optimal fit presented in the third column of images in Figure 10.3. Clearly this fit captures both the outward radial propagation of the infection as well as the amplification of the intensity in the first three images of the time series data.

10.2.2 Analysis of the Model Fit

Table 10.2 presents the parameter estimates for both the original and refined models. Both models predict roughly the same estimates for all parameters. Also, adding the parameter c⁰unc,0 reduces the objective function Φ by about five percent. Ware et al. [160] use laser light-scattering spectroscopy to estimate the diffusivity of the VSV virion to be 2.326 × 10^−8 cm²/sec. Converting this value to cm²/hr and taking the log10 yields a value of −4.08. This value is very close to the estimated values of −3.87 and −3.94 (see Table 10.2).
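As a quick check of this unit conversion:

2.326 \times 10^{-8}\ \tfrac{\text{cm}^2}{\text{s}} \times 3600\ \tfrac{\text{s}}{\text{hr}} = 8.37 \times 10^{-5}\ \tfrac{\text{cm}^2}{\text{hr}}, \qquad \log_{10}\!\left( 8.37 \times 10^{-5} \right) \approx -4.08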

                       Model 1       Model 2
Parameter   Units      log10 Value   log10 Value
k1          hr^−1      −11.0         −10.8
k2          cm^3/hr    0.145         0.555
Dvir        cm^2/hr    −3.87         −3.94
Y                      2.66          2.51
ibgd                   1.50          1.50
kmKm        cm^−3      −15.9         −15.7
kwash                  −1.34         −1.39
c⁰unc,0     cm^−3      NA            6.25
Φ                      2.11 × 10^6   2.00 × 10^6

Table 10.2: Parameter estimates for the VSV/BHK-21 focal infection models. Parameters are estimated for the log10 transformation of the parameters. NA denotes that the parameter is not applicable for the given model.

                              Eigenvector
log10 Parameter   v1        v2        v3        v4        v5        v6        v7
k1                0.769     0.046    −0.069     0.147    −0.195    −0.233    −0.537
k2               −0.162     0.976     0.042     0.030    −0.077    −0.037    −0.102
Dvir             −0.241    −0.080    −0.103    −0.812    −0.285    −0.088    −0.420
Y                −0.468    −0.144     0.142     0.384     0.331    −0.051    −0.693
ibgd              0.046    −0.003    −0.030     0.098    −0.303     0.926    −0.195
kmKm              0.285     0.131    −0.142    −0.358     0.821     0.272    −0.076
kwash             0.147    −0.007     0.970    −0.182     0.021     0.052     0.006
Eigenvalue       −1.05e7   −2.23e5    6.12e5    4.88e6    3.58e7    3.21e8    1.07e9

Table 10.3: Hessian analysis for the parameter estimates of the original VSV/BHK-21 focal infection model. Parameters are estimated for the log10 transformation of the parameters. Negative eigenvalues are likely due to error in the finite difference approximation used to calculate the Hessian.

Table 10.3 analyzes the Hessian of the objective function for the parameter estimates of the original model. This analysis indicates that two linear combinations of parameters cannot be estimated due to negative eigenvalues (which most likely result from errors in the finite difference approximation of the Hessian). The first of these two linear combinations of parameters, i.e. v1, is primarily constituted by the first reaction rate constant k1 and the virus yield Y. The second rate constant k2 accounts for virtually all of the second of these linear combinations. Table 10.4 analyzes the Hessian of the objective function for the parameter estimates of the revised model. This analysis indicates that two linear combinations of parameters cannot be estimated due to negative eigenvalues. These two linear combinations of parameters correspond roughly to those of the original model.

                              Eigenvector
log10 Parameter   v1        v2        v3        v4        v5        v6        v7        v8
k1                0.782    −0.019    −0.001     0.055     0.147    −0.167    −0.207    −0.541
k2               −0.068    −0.995     0.024    −0.036     0.011    −0.039    −0.015    −0.046
Dvir             −0.212     0.035    −0.008     0.071    −0.828    −0.312    −0.070    −0.401
Y                −0.504     0.065     0.005    −0.105     0.363     0.293    −0.050    −0.715
ibgd             −0.004    −0.025    −0.999     0.002     0.010    −0.000     0.000    −0.001
kmKm              0.048     0.002     0.001     0.029     0.101    −0.286     0.936    −0.170
kwash             0.263    −0.059    −0.003     0.109    −0.367     0.840     0.267    −0.068
c⁰unc,0           0.115     0.024    −0.004    −0.983    −0.128     0.023     0.047     0.006
Eigenvalue       −1.64e7   −2.92e4    7.66e3    5.31e5    7.59e6    3.60e7    3.24e8    1.24e9

Table 10.4: Hessian analysis for the parameter estimates of the revised VSV/BHK-21 focal infection model. Parameters are estimated for the log10 transformation of the parameters. Negative eigenvalues are likely due to error in the finite difference approximation used to calculate the Hessian.

The modeling process gives insight into the key biological and experimental phenomena giving rise to the observed experimental measurements. First, manipulation of the initial concentration of uninfected cells within the radius of the initial inoculum accounts for the amplification of the intensity in the first three images of the time-series data. This effect has two possible causes: either cells are damaged or removed when a hole is removed from the agar at the initiation of the experiment, or uninfected cells but not infected cells continue to grow during the first portion of the experiment. Second, the infection spread is well characterized by considering only extracellular species in the model development. We could have incorporated intracellular infection events (transcription, translation, replication, and assembly of virus) into the model description, but the additional parameters necessary for this model would not be justifiable for the given experimental data.
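The identifiability analysis of Tables 10.3 and 10.4 can be reproduced in outline as follows (a sketch, ours: `phi` is any callable returning the scalar objective Φ at a log10 parameter vector):

```python
# A sketch (ours) of the Hessian-based identifiability analysis: finite-
# difference the objective phi and eigendecompose the result.
import numpy as np

def hessian_eig(phi, theta, h=1e-4):
    n = len(theta)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            ei, ej = np.eye(n)[i] * h, np.eye(n)[j] * h
            # central difference for d^2 phi / d theta_i d theta_j
            H[i, j] = H[j, i] = (phi(theta + ei + ej) - phi(theta + ei - ej)
                                 - phi(theta - ei + ej) + phi(theta - ei - ej)) / (4 * h * h)
    w, V = np.linalg.eigh(H)   # ascending eigenvalues; columns of V are eigenvectors
    return w, V

# near-zero or negative eigenvalues flag directions (columns of V) in
# log10-parameter space that the data cannot pin down
```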

10.3 Propagation of VSV on DBT Cells

We now consider propagation of VSV on murine astrocytoma (DBT) cells. The first column of images in Figure 10.5 presents a representative time course for the experiment; the full set of experimental images is available in the appendix. For this virus/host system, the images demonstrate three prominent features: (1) the infection propagates unimpeded outward radially for the first three images, (2) the intensity of the measurement amplifies from the first to the third measurement, and (3) the infection spread is halted after the third image and the intensity of the measurement diminishes. This particular cell line is known to have an antiviral strategy, namely the interferon signaling pathway. We now consider models to quantitatively capture all of these features.

[Figure 10.5 rows: images at 7, 27, 48, 72, and 96 hours post-infection; columns: data, reaction-diffusion model, segregated model fit 1, and segregated model fit 2.]

Figure 10.5: Comparison of representative experimental images to model fits for VSV propagation on DBT cells. The white scale bar in the upper left-hand corner of the experimental images is one millimeter.

10.3.1 Refinement of the Reaction-Diffusion Model

We refine the reaction-diffusion model proposed in the previous section to model this infection. In addition to the extracellular species considered previously (virus, uninfected cells, infected cells, and dead cells), we also model interferon (without any distinction between the types α, β, and γ) and inoculated cells. Both virus and interferon are permitted to diffuse. We

k virus + uninfected cell −→1 infected cell (10.7a) k infected cell −→2 Y virus + dead cell (10.7b) k infected cell −→3 infected cell + interferon (10.7c) k uninfected cell + interferon −→4 inoculated cell (10.7d) k inoculated cell −→5 inoculated cell + interferon (10.7e) k inoculated cell + virus −→1 inoculated cell (10.7f) k infected cell + virus −→1 infected cell (10.7g)

This reaction mechanism makes the following assumptions:

1. interferon binds to uninfected cells to form inoculated cells that are resistant to viral infection,

2. super-infection of infected cells does not alter the yield of virus per infected cell, and

3. virus binds indiscriminately to uninfected, infected, and inoculated cells.

We again assume that the infection propagation is radially symmetric. The concentrations of all species are then segregated by both time and radial distance, giving rise to the following governing equations for the model:

\frac{\partial c_{\text{vir}}}{\partial t} = \frac{1}{r} \frac{\partial}{\partial r} \left( D^{\text{eff}}_{\text{vir}}\, r\, \frac{\partial c_{\text{vir}}}{\partial r} \right) + R_{\text{vir}} \tag{10.8a}
\frac{\partial c_{\text{ifn}}}{\partial t} = \frac{1}{r} \frac{\partial}{\partial r} \left( D^{\text{eff}}_{\text{ifn}}\, r\, \frac{\partial c_{\text{ifn}}}{\partial r} \right) + R_{\text{ifn}} \tag{10.8b}
\frac{\partial c_{\text{unc}}}{\partial t} = R_{\text{unc}}, \qquad \frac{\partial c_{\text{infc}}}{\partial t} = R_{\text{infc}} \tag{10.8c}
\frac{\partial c_{\text{inoc}}}{\partial t} = R_{\text{inoc}}, \qquad \frac{\partial c_{\text{dc}}}{\partial t} = R_{\text{dc}} \tag{10.8d}
D^{\text{eff}}_{\text{vir}} = 2 D_{\text{vir}} \frac{1 - \phi}{2 + \phi}, \qquad D^{\text{eff}}_{\text{ifn}} = 2 D_{\text{ifn}} \frac{1 - \phi}{2 + \phi} \tag{10.8e}
\phi = V_e \left( c_{\text{unc}} + c_{\text{infc}} + c_{\text{inoc}} \right) \tag{10.8f}
\left. \frac{d c_{\text{vir}}}{dr} \right|_{r = 0,\, r_{\max}} = 0, \qquad \left. \frac{d c_{\text{ifn}}}{dr} \right|_{r = 0,\, r_{\max}} = 0 \tag{10.8g}

c_i(t = 0, r) \text{ known} \tag{10.8h}

in which the reaction terms (e.g. Rvir) are dictated by the stoichiometry of reaction (10.7) assuming that the reactions are elementary as written. Additionally, the initial images of the infection indicate a ring-like pattern in the intensity. We account for this phenomenon by estimating two parameters, c⁰unc,1 and c⁰unc,2, that determine the shape of the initial radial profile for the uninfected cell concentration, i.e.

c_{\text{unc}}(t = 0, r) =
\begin{cases}
c^0_{\text{unc},2}, & r < 0.025 \text{ cm} \\
c^0_{\text{unc},2} - \frac{20 \left( c^0_{\text{unc},2} - c^0_{\text{unc},1} \right)}{\text{cm}} (r - 0.025), & 0.025 \text{ cm} \le r < 0.075 \text{ cm} \\
c^0_{\text{unc},1}, & 0.075 \text{ cm} \le r < 0.1 \text{ cm} \\
c_{\text{unc},0} - \frac{20 \left( c_{\text{unc},0} - c^0_{\text{unc},1} \right)}{\text{cm}} (0.15 \text{ cm} - r), & 0.1 \text{ cm} \le r < 0.15 \text{ cm} \\
c_{\text{unc},0}, & r \ge 0.15 \text{ cm}
\end{cases} \tag{10.9}

We estimate the optimal parameters using the same spatial discretization and nonlinear optimization as in the previous section. The second column of images in Figure 10.5 presents the optimal fits for this model. In comparison to the experimentally obtained images, this model is able to capture quantitatively the radial propagation of the infection front. However, the fit only qualitatively captures the increase and decrease in the intensity of the experimental data. To better quantitatively capture the temporal changes in this intensity, we propose incorporating the life cycle of infected cells. We therefore segregate the infected cell population by the age of infection τ, and model the intracellular production rates of virus and interferon using first-order plus time delay expressions, i.e.

r_{\text{vir}}(\tau) = K_{\text{vir}} \left[ 1 - \exp\left( -k_{\text{vir}} (\tau - d_{\text{vir}}) \right) \right] \tag{10.10}
r_{\text{ifn}}(\tau) = K_{\text{ifn}} \left[ 1 - \exp\left( -k_{\text{ifn}} (\tau - d_{\text{ifn}}) \right) \right] \tag{10.11}
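A sketch of (10.10)-(10.11) follows (ours; we assume, as the first-order-plus-time-delay form implies, that the rate is zero before the delay elapses):

```python
# A sketch (ours) of the first-order plus time delay production rates
# (10.10)-(10.11) as functions of infection age tau.
import numpy as np

def production_rate(tau, K, k, d):
    # zero before the delay d, then a first-order rise toward the amplitude K
    return np.where(tau > d, K * (1.0 - np.exp(-k * (tau - d))), 0.0)

# r_vir = production_rate(tau, K_vir, k_vir, d_vir)
# r_ifn = production_rate(tau, K_ifn, k_ifn, d_ifn)
```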

We also assume that infected cells cannot live longer than age τd, at which point these cells die. This model requires fitting of four more parameters than the reaction-diffusion model (seven additional parameters are required for the first-order plus time delay description, but this description obviates the need for the virus yield Y and the rate constants k2 and k3). The considered reactions now become:

\text{virus} + \text{uninfected cell} \xrightarrow{k_1} \text{infected cell} \tag{10.12a}
\text{infected cell} \longrightarrow \text{virus} \quad \text{(age dependent)} \tag{10.12b}
\text{infected cell} \longrightarrow \text{infected cell} + \text{interferon} \quad \text{(age dependent)} \tag{10.12c}
\text{uninfected cell} + \text{interferon} \xrightarrow{k_4} \text{inoculated cell} \tag{10.12d}
\text{inoculated cell} \xrightarrow{k_5} \text{inoculated cell} + \text{interferon} \tag{10.12e}
\text{inoculated cell} + \text{virus} \xrightarrow{k_1} \text{inoculated cell} \tag{10.12f}
\text{infected cell} + \text{virus} \xrightarrow{k_1} \text{infected cell} \quad \text{(all ages)} \tag{10.12g}

The model equations are then the following set of coupled integro-partial differential equations

\frac{\partial c_{\text{vir}}}{\partial t} = \frac{1}{r} \frac{\partial}{\partial r} \left( D^{\text{eff}}_{\text{vir}}\, r\, \frac{\partial c_{\text{vir}}}{\partial r} \right) + \int_0^{\tau_d} c_{\text{infc}}(\tau)\, r_{\text{vir}}(\tau)\, d\tau + R_{\text{vir}} \tag{10.13a}
\frac{\partial c_{\text{ifn}}}{\partial t} = \frac{1}{r} \frac{\partial}{\partial r} \left( D^{\text{eff}}_{\text{ifn}}\, r\, \frac{\partial c_{\text{ifn}}}{\partial r} \right) + \int_0^{\tau_d} c_{\text{infc}}(\tau)\, r_{\text{ifn}}(\tau)\, d\tau + R_{\text{ifn}} \tag{10.13b}
\frac{\partial c_{\text{unc}}}{\partial t} = R_{\text{unc}}, \qquad \frac{\partial c_{\text{infc}}}{\partial t} + \frac{\partial c_{\text{infc}}}{\partial \tau} = R_{\text{infc}} \tag{10.13c}
\frac{\partial c_{\text{inoc}}}{\partial t} = R_{\text{inoc}}, \qquad \frac{\partial c_{\text{dc}}}{\partial t} = R_{\text{dc}} \tag{10.13d}
D^{\text{eff}}_{\text{vir}} = 2 D_{\text{vir}} \frac{1 - \phi}{2 + \phi}, \qquad D^{\text{eff}}_{\text{ifn}} = 2 D_{\text{ifn}} \frac{1 - \phi}{2 + \phi} \tag{10.13e}
\phi = V_e \left( c_{\text{unc}} + \int_0^{\tau_d} c_{\text{infc}}\, d\tau + c_{\text{inoc}} \right) \tag{10.13f}
\left. \frac{d c_{\text{vir}}}{dr} \right|_{r = 0,\, r_{\max}} = 0, \qquad \left. \frac{d c_{\text{ifn}}}{dr} \right|_{r = 0,\, r_{\max}} = 0 \tag{10.13g}
\left. \frac{d c_{\text{infc}}}{d\tau} \right|_{\tau = 0} = k_1 c_{\text{vir}} c_{\text{unc}}, \qquad \left. \frac{d c_{\text{infc}}}{d\tau} \right|_{\tau = \tau_d} = 0 \tag{10.13h}
c_i(t = 0, r) \text{ known} \tag{10.13i}

We discretize the age dimension using orthogonal collocation on Lagrange polynomials [155] with seventeen points, and use the same spatial discretization scheme as in the reaction-diffusion model. The third and fourth columns of images in Figure 10.5 present the optimal fits for this model. In comparison to the experimentally obtained images, this model is able to capture quantitatively both the radial propagation of the infection front and the changes in the intensity of the experimental data. The optimization also yields two sets of parameters with similar fits and similar values of the objective function, but different values for the parameters. Most overtly different are the estimates for the intracellular production rates of virus and interferon, which suggest two different mechanisms for up-regulation of the interferon pathway. These production rates are presented in Figure 10.6. In the first fit, the estimated maximum age of infected cells is roughly 26 hours, and the production of interferon lags significantly behind the production of virus. For the second fit, the estimated maximum age of infected cells is only roughly 17 hours, and the production of interferon closely precedes the virus production. Additionally, the production rates in the second fit are approximately an order of magnitude lower than the production rates in the first fit.

10.3.2 Discussion

The models provide estimates for key parameters in the viral infection and host response. In this case, the three model fits only predict similar parameter values for the background fluorescence ibgd and the viral diffusivity Dvir.

[Figure 10.6 panels (a) and (b): production rate (#/hour) of virus and interferon versus infection age (hours).]

Figure 10.6: Comparison of intracellular production rates of virus and interferon for the segregated model of VSV propagation on DBT cells.

The remaining parameters are generally different by at least an order of magnitude. Ware et al. [160] estimate the diffusivity of the VSV virion to be 2.326 × 10^−8 cm²/sec. Converting this value to cm²/hr and taking the log10 yields a value of −4.08. The estimated values of this diffusivity, Dvir in Table 10.5, are all within an order of magnitude of this value. Porterfield et al. [102] and Nichol and Deutsch [96] estimate the diffusivity of γ-interferon to be 7.4 × 10^−7 and 4.1 × 10^−7 cm²/sec, respectively. Converting these values to cm²/hr and taking the log10 yields values of −2.57 and −2.83, respectively.

                      Reaction-Diffusion   Segregated Fit 1   Segregated Fit 2
Parameter   Units     log10 Value          log10 Value        log10 Value
k1          hr^−1     −12.103              −9.816             −10.159
k2          cm^3/hr   4.941                NA                 NA
k3          cm^3/hr   −8.258               NA                 NA
k4          cm^3/hr   −8.181               −8.334             −11.890
k5          cm^3/hr   0.637                0.752              3.717
Dvir        cm^2/hr   −3.737               −3.406             −3.445
Difn        cm^2/hr   −2.938               −2.981             −0.990
Y                     3.841                NA                 NA
ibgd                  1.577                1.573              1.571
km/Km                 −17.130              −16.571            −15.868
kwash                 −6.131               −0.785             −0.942
c⁰unc,1     cm^−3     7.466                6.529              6.443
c⁰unc,2     cm^−3     7.602                7.426              7.428
kvir        hr^−1     NA                   −0.197             0.727
kifn        hr^−1     NA                   −0.387             −0.838
Kvir        hr^−1     NA                   2.021              1.479
Kifn        hr^−1     NA                   2.304              1.434
dvir        hr^−1     NA                   0.834              0.619
difn        hr^−1     NA                   1.283              0.583
τd          hr        NA                   1.416              1.228
Φ                     7.63 × 10^5          6.35 × 10^5        6.30 × 10^5

Table 10.5: Parameter estimates for the VSV/DBT focal infection models. Parameters are es- timated for the log10 transformation of the parameters. NA denotes that the parameter is not applicable for the given model.

taking the $\log_{10}$ yields values of $-2.57$ and $-2.83$, respectively. These values have the same order of magnitude as the fits for the reaction-diffusion model and the first segregated fit. The second segregated fit predicts the diffusivity of interferon to be roughly two orders of magnitude greater than either of the previously reported values.

The infection spread is not well characterized by considering only extracellular species in the model development. Incorporation of simple first-order plus time delay expressions for the production rates of virus and interferon leads to significantly improved quantitative prediction of the given experimental data (roughly a 17% decrease in the objective function Φ via the addition of four parameters). Additionally, the model fits suggest two different possible mechanisms for production of both virus and interferon. For VSV infection of Krebs-2 carcinoma cells [158] and mouse L cells [161], experimental studies place the first detectable amount of interferon at four and eight hours, respectively. These results suggest that the second segregated fit is more realistic than the first segregated fit.
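As a check on the unit conversion used above, for the Ware et al. [160] value:

$$2.326 \times 10^{-8}\ \frac{\mathrm{cm}^2}{\mathrm{sec}} \times 3600\ \frac{\mathrm{sec}}{\mathrm{hr}} = 8.37 \times 10^{-5}\ \frac{\mathrm{cm}^2}{\mathrm{hr}}, \qquad \log_{10}\!\left(8.37 \times 10^{-5}\right) \approx -4.08$$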

log10 Parameter   v1        v2        v3        v4        v5        v6
k1               -0.499    -0.229    -0.561     0.045     0.153
k2                0.289    -0.591    -0.142    -0.165     0.162
k3               -0.026     0.033    -0.016    -0.946     0.067
k4                0.303    -0.223    -0.302     0.070     0.144
k5                0.053    -0.019    -0.190    -0.016    -0.058
Dvir              0.584    -0.160    -0.281     0.096    -0.199
Difn             -0.189    -0.677     0.561     0.003    -0.092
Y                 0.423     0.180     0.301    -0.072     0.005
ibgd             -0.054    -0.033     0.018     0.005
km/Km            -0.053    -0.154     0.026     0.044    -0.118
kwash                                                               1
c0unc,1          -0.086    -0.061     0.009     0.069    -0.656
c0unc,2           0.038     0.008     0.224     0.219     0.653
Eigenvalue       -2.83e7   -6.81e6   -4.80e6   -2.66e6   -3.37e5    1.02

log10 Parameter   v7        v8        v9        v10       v11       v12       v13
k1                0.086     0.067    -0.097     0.005    -0.142    -0.372     0.419
k2               -0.468    -0.449     0.154    -0.191     0.097    -0.017     0.013
k3                0.302    -0.033    -0.027     0.061     0.009     0.008    -0.010
k4                0.047     0.119    -0.243     0.767    -0.082     0.171    -0.199
k5                0.032     0.002     0.019    -0.297    -0.862     0.242    -0.254
Dvir              0.498     0.243    -0.032    -0.363     0.192    -0.083     0.141
Difn              0.145     0.316    -0.207    -0.015    -0.114    -0.054    -0.027
Y                -0.191     0.018    -0.036     0.196    -0.392    -0.393     0.553
ibgd             -0.013    -0.015    -0.126    -0.028     0.035     0.774     0.614
km/Km             0.215     0.052     0.899     0.274    -0.077     0.067     0.110
kwash
c0unc,1           0.244    -0.645    -0.187     0.182    -0.052    -0.066     0.040
c0unc,2           0.515    -0.445    -0.020    -0.037    -0.072    -0.016     0.009
Eigenvalue        1.89e6    6.81e6    2.09e7    3.61e7    1.22e8    4.15e8    1.27e9

Table 10.6: Hessian analysis (eigenvectors v1-v13 and eigenvalues) for the parameter estimates of the reaction-diffusion VSV/DBT focal infection model. Parameters are estimated for the log10 transformation of the parameters. Negative eigenvalues are likely due to error in the finite difference approximation used to calculate the Hessian. Blank entries denote that the contribution of the parameter to the eigenvector is less than 5 × 10^-4.

Table 10.6 presents the Hessian analysis for the reaction-diffusion model. This analysis indicates that roughly five linear combinations of parameters cannot be estimated from the experimental data. However, Figure 10.5 demonstrates that this model is not capable of capturing the infection dynamics, particularly the magnitude of the intensity.

Tables 10.7 and 10.8 present the Hessian analysis of the objective function Φ for the segregated model fits. Roughly five linear combinations of parameters yield negative eigenvalues for both fits, indicating that these parameter combinations cannot be estimated from the experimental data. This analysis indicates that the experimental measurements are not informative enough to distinguish between these different mechanisms.
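The eigenvalue analysis reported in these tables can be reproduced schematically as follows. This is a minimal sketch assuming a (finite-difference) Hessian H of Φ with respect to the log10-transformed parameters is available; the toy matrix and the function name are illustrative.

```python
import numpy as np

def estimable_directions(H, threshold=0.0):
    """Eigendecomposition of a symmetric Hessian; directions whose
    eigenvalues fall at or below the threshold cannot be estimated."""
    eigvals, eigvecs = np.linalg.eigh(H)          # ascending eigenvalues
    flat = eigvecs[:, eigvals <= threshold]       # poorly determined subspace
    return eigvals, eigvecs, flat

# toy 3-parameter example with one nearly flat direction
H = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 1e-9]])
vals, vecs, flat = estimable_directions(H, threshold=1e-6)
print(vals)      # approximately [1e-9, 1.0, 3.0]
print(flat.T)    # eigenvector(s) spanning the unidentifiable combination
```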

log10 Parameter   v1        v2        v3       v4       v5       v6       v7       v8       v9
k1                                    0.635   -0.038    0.317   -0.239   -0.187   -0.097   -0.158
k4                         -0.707
k5                                   -0.081    0.090   -0.322    0.047   -0.397    0.017   -0.643
Dvir                       -0.707
Difn                                 -0.042   -0.124    0.402   -0.076    0.576    0.062    0.019
ibgd                                  0.014    0.006    0.003   -0.002   -0.001   -0.001    0.003
km/Km                                 0.376    0.497   -0.055    0.005    0.006   -0.021    0.174
kwash                                -0.025   -0.141    0.076   -0.042   -0.010   -0.002   -0.152
c0unc,1                              -0.044   -0.072   -0.070    0.172    0.060   -0.970    0.046
c0unc,2                              -0.105   -0.048   -0.047    0.033    0.027    0.075   -0.026
kvir                                  0.421   -0.468   -0.532    0.442    0.227    0.161    0.124
kifn                                 -0.059   -0.079    0.419    0.571   -0.540    0.083    0.362
Kvir                                 -0.216   -0.588    0.197   -0.111   -0.099   -0.009   -0.209
Kifn                                 -0.059   -0.233   -0.297   -0.595   -0.318   -0.029    0.541
dvir                                  0.448   -0.270    0.174   -0.113   -0.120   -0.061   -0.143
difn              0.707
τd               -0.707
Eigenvalue       -2.27e14  -7.08e13  -1.67e5  -6.18e4  -2.61e3   4.32e3   1.20e4   2.07e4   7.33e4

log10 Parameter   v10      v11      v12      v13      v14      v15      v16       v17
k1                0.018   -0.116    0.066   -0.511   -0.281    0.086
k4                                                                      -0.707
k5               -0.436   -0.333   -0.030    0.039    0.044   -0.021
Dvir                                                                     0.707
Difn             -0.528   -0.436    0.002    0.047    0.074   -0.031
ibgd             -0.003   -0.004   -0.013   -0.025    0.369    0.928
km/Km            -0.076   -0.136    0.611   -0.389    0.162
kwash             0.056    0.058    0.948    0.205    0.012    0.015
c0unc,1          -0.026   -0.096    0.014   -0.005    0.002
c0unc,2           0.639   -0.753   -0.002    0.012   -0.010    0.005
kvir             -0.105   -0.044    0.049   -0.058   -0.099    0.038
kifn             -0.169   -0.156    0.048    0.008    0.034   -0.012
Kvir              0.052    0.113   -0.227    0.328   -0.527    0.221
Kifn             -0.227   -0.211    0.067    0.011    0.047   -0.016
dvir              0.130    0.042   -0.185    0.454    0.577   -0.226
difn                                                                               0.707
τd                                                                                 0.707
Eigenvalue        3.19e5   4.93e5   6.32e5   1.44e7   9.39e7   8.87e8   7.08e13   2.27e14

Table 10.7: Hessian analysis (eigenvectors v1-v17 and eigenvalues) for the parameter estimates of the first segregated VSV/DBT focal infection model. Parameters are estimated for the log10 transformation of the parameters. Negative eigenvalues are likely due to error in the finite difference approximation used to calculate the Hessian. Blank entries denote that the contribution of the parameter to the eigenvector is less than 5 × 10^-4.

log10 Parameter   v1       v2       v3       v4       v5       v6       v7       v8       v9
k1                0.074    0.412    0.363   -0.016    0.001    0.050   -0.015   -0.023    0.019
k4                0.030   -0.404    0.420   -0.495    0.048    0.063    0.022    0.002    0.005
k5               -0.039    0.147   -0.163    0.537   -0.192    0.285   -0.023   -0.022    0.145
Dvir              0.086   -0.097    0.204   -0.003   -0.010    0.046    0.016    0.005   -0.014
Difn              0.409   -0.022   -0.048    0.000   -0.001   -0.003   -0.002    0.033    0.001
ibgd             -0.003    0.016   -0.009   -0.006    0.000   -0.001    0.001
km/Km            -0.050    0.553   -0.224   -0.440    0.027    0.006    0.078    0.007   -0.056
kwash             0.011    0.081    0.140    0.323   -0.033    0.018   -0.076   -0.035   -0.059
c0unc,1           0.005   -0.036    0.018    0.121    0.013   -0.094    0.982    0.031   -0.060
c0unc,2           0.001   -0.028    0.010    0.100   -0.027    0.029   -0.069   -0.003   -0.980
kvir              0.411   -0.058   -0.216   -0.014    0.466    0.278    0.038   -0.670   -0.008
kifn              0.689   -0.014   -0.242   -0.024   -0.003   -0.121   -0.036    0.550   -0.005
Kvir              0.157   -0.320    0.230    0.265   -0.033    0.029   -0.028    0.017    0.070
Kifn             -0.213    0.066    0.090    0.199    0.859   -0.110   -0.053    0.348    0.009
dvir              0.294    0.432    0.581    0.063   -0.012    0.127    0.036   -0.025   -0.000
difn             -0.105   -0.070   -0.056   -0.125    0.025    0.881    0.086    0.347   -0.019
τd                0.024   -0.130    0.204    0.107   -0.002    0.014   -0.025   -0.010    0.017
Eigenvalue       -5.04e5  -2.39e5  -1.38e5  -5.37e4  -1.35e2   9.32e3   1.21e4   2.66e5   1.08e5

log10 Parameter   v10      v11      v12      v13      v14      v15      v16      v17
k1               -0.268   -0.379    0.306    0.098   -0.289    0.494    0.177    0.095
k4               -0.149   -0.067   -0.146   -0.037   -0.475   -0.186   -0.280   -0.134
k5                0.143   -0.293   -0.269   -0.060   -0.487   -0.157   -0.247   -0.116
Dvir              0.103   -0.145   -0.733   -0.246    0.262    0.474    0.060    0.101
Difn              0.014    0.013   -0.255    0.873    0.003   -0.000    0.000
ibgd              0.001    0.003    0.052    0.017    0.008   -0.033   -0.472    0.878
km/Km             0.093    0.220   -0.330   -0.075   -0.317   -0.190    0.313    0.168
kwash            -0.754    0.483   -0.211   -0.051   -0.022   -0.075   -0.002    0.011
c0unc,1          -0.056   -0.010    0.011    0.002   -0.028    0.001   -0.001    0.000
c0unc,2           0.084   -0.069    0.010    0.004   -0.090   -0.008   -0.016   -0.005
kvir             -0.059   -0.031    0.059   -0.160   -0.012    0.012    0.010    0.006
kifn             -0.125   -0.089    0.082   -0.330   -0.062   -0.020   -0.033   -0.015
Kvir              0.372    0.436    0.124   -0.045   -0.373    0.129    0.424    0.239
Kifn              0.065   -0.020   -0.088    0.068   -0.080   -0.026   -0.042   -0.019
dvir              0.312    0.179    0.056   -0.085    0.267   -0.287   -0.228   -0.139
difn             -0.064    0.080    0.095    0.062    0.153    0.051    0.082    0.037
τd               -0.149   -0.465   -0.076   -0.017    0.166   -0.569    0.511    0.263
Eigenvalue        1.85e5   4.74e5   2.09e6   2.74e6   8.68e6   5.20e7   1.37e8   9.60e8

Table 10.8: Hessian analysis (eigenvectors v1-v17 and eigenvalues) for the parameter estimates of the second segregated VSV/DBT focal infection model. Parameters are estimated for the log10 transformation of the parameters. Negative eigenvalues are likely due to error in the finite difference approximation used to calculate the Hessian. Blank entries denote that the contribution of the parameter to the eigenvector is less than 5 × 10^-4.

Figure 10.7: Comparison of representative experimental images to model predictions for VSV propagation on DBT cells in the presence of interferon inhibitors (rows: 24, 48, 96, and 144 hours; columns: data, segregated model fit 1, segregated model fit 2). The white scale bar in the lower left-hand corner of the experimental images is one millimeter.

10.3.3 Model Prediction: Infection Propagation in the Presence of Interferon Inhibitors

To validate the model, we compare model predictions of the infection propagation in the presence of interferon inhibitors to experimentally-obtained images. We assume that the dosing of interferon inhibitor is sufficiently large to completely inhibit production of interferon. Accordingly, we set the constants $k_5$ and $K_{ifn}$, corresponding to interferon production from inoculated cells and the production rate of interferon in infected cells, to zero.

Figure 10.7 compares the results for the experimental data with the segregated model predictions. In both cases, the models over-predict the radial propagation of the infection front for the latter two time points. Additionally, the first segregated model predicts even farther propagation of the infection front than the second segregated model. The most likely explanation for the deviations between the data and predictions is that the dosing of the interferon inhibitor is not large enough to completely eliminate the host antiviral response.

10.4 Conclusions

We have used quantitative models to investigate the dynamics of multiple rounds of viral infection and host antiviral response for the focal infection system. For the VSV/BHK virus-host system, extracellular models capture the salient features contained in the measurements, namely unimpeded radial propagation of the infection front as well as amplification of the signal in the initial data points. The model suggests that an initial condition effect for the uninfected cell concentration is necessary to capture the latter feature of the data. This effect may result from the experimental technique used to initiate the viral infection.

For the VSV/DBT virus-host system, the data initially behave similarly to the VSV/BHK system (outward radial propagation of the infection front and amplification of the signal), but then the infection front stagnates and the signal strength diminishes. This stagnation occurs due to the host antiviral mechanism of interferon signaling. The proposed extracellular model is not capable of quantitatively capturing the measurement dynamics. Refining the model by introducing an age segregation significantly improves the data fit. Here, we use simple first-order plus time delay dynamics to model the production rates of both interferon and virus. Consequently, the model fit suggests a rough estimate for the intracellular production rates of these species. However, the data are not informative enough to uniquely determine all of the parameters in the model, as evidenced by both the Hessian analysis and the fact that two sets of parameters fit the data equally well.

We also compared segregated model predictions with no interferon production to experiments of the VSV/DBT system dosed with interferon inhibitors. The model predictions overestimated the radial propagation of the infection front. This over-prediction likely results from incomplete inhibition of interferon production.

This work serves as a first step in providing a quantitative understanding of multiple rounds of both viral infection and host antiviral response. Also, comparing model predictions to experimental measurements requires modeling of both the underlying biology of the system and the experimental procedure. Additional experimental measurements, such as microarray data or using reporter genes to detect interferon up-regulation, should provide further constraints on the developed model and necessitate future model modification. We expect future iterations of additional experiments, measurements, and modeling to elucidate an even better comprehensive understanding of both viral infections and cell-cell signaling.

Notation

$c_j$  concentration of species $j$
$c_{unc}^0$  initial concentration of uninfected cells in the radius of the initial inoculum for the VSV/BHK-21 fit
$c_{unc,1}^0$  initial concentration of uninfected cells in the first radial region of the initial inoculum for the VSV/DBT fit
$c_{unc,2}^0$  initial concentration of uninfected cells in the second radial region of the initial inoculum for the VSV/DBT fit
$D_{ifn}$  interferon diffusivity
$D_{ifn}^{eff}$  effective interferon diffusivity
$D_{vir}$  virus diffusivity
$D_{vir}^{eff}$  effective virus diffusivity
$d_j$  time delay for reaction $j$
$e$  error vector
$h(x_k; \theta)$  model prediction vector of the measurement
$i_{bgd}$  background fluorescence
$K_j$  rate constant for reaction $j$
$K_m$  equilibrium constant for the measurement
$k_j$  rate constant for reaction $j$
$k_m$  conversion constant from virus-host concentration to intensity
$k_{wash}$  fraction of dead cells removed during the measurement process
$n_{unc,0}$  initial number of uninfected cells
$n_{vir,0}$  number of viruses in the initial inoculum
$R$  weighting matrix for parameter estimation
$R_j$  production rate of species $j$
$r_j$  intracellular production rate of species $j$
$r$  radial dimension
$r_{plate}$  radius of the plate
$t$  time
$V_c$  cell volume
$v_{min}$  minimum detectable virus-host concentration
$v_{max}$  maximum detectable virus-host concentration
$x$  state vector
$Y$  virus yield per infected cell
$y$  measurement vector
$y_m$  intensity measurement
$\Phi$  objective function value for parameter estimation
$\phi$  correction to the diffusivity for hindered diffusion
$\tau$  age of infection
$\tau_d$  maximum age of infection
$\theta$  vector of model parameters

Subscripts

dc  dead cell
ifn  interferon
infc  infected cell
inoc  inoculated cell
unc  uninfected cell
vir  virus
vir-host  virus-host complex

10.5 Appendix

Figure 10.8: Experimental (averaged) images obtained from the dynamic propagation of VSV on BHK-21 cells at 18, 30, 48, 72, and 90 hours. The white scale bar in the lower left-hand corner of the experimental images is one millimeter.

Figure 10.9: Experimental (averaged) images obtained from the dynamic propagation of VSV on DBT cells at 7, 27, 48, 72, and 96 hours. The white scale bar in the lower left-hand corner of the experimental images is one millimeter.

Chapter 11

Multi-level Dynamics of Viral Infections

One of the simplest, yet most intriguing biological organisms is the virus. The virus contains enough genetic information to replicate itself given the machinery of a living host. So powerful is this strategy that viral infections are at once a threat to and a hope for human survival. According to the Joint United Nations Programme on HIV/AIDS (UNAIDS) in 2002, 42 million people were living with human immunodeficiency virus (HIV), 5 million people were newly infected with HIV, and 3.1 million people died due to acquired immune deficiency syndrome (AIDS) related illnesses. At the same time, viruses show promise in anti-tumor therapies as oncolytic agents [9] and as delivery vehicles for gene therapy [95]. The common thread between these two examples is that controlling the propagation of virus spread is essential, and doing so first requires understanding of how viruses propagate. Mathematical models offer one means of quantitatively understanding how viruses propagate and how best to control this propagation. In particular, models can serve as a beneficial tool in proposing, identifying, and distinguishing between key biological and experimental phenomena contained in data.

Most mathematical models for viral infections have focused exclusively on events at either the intracellular or extracellular level. At the intracellular level, kinetic models have been applied to examine the dynamics of how viruses harness host cells to replicate more virus [73, 27, 29, 3], and how drugs targeting specific virus components affect this replication [122, 30]. These models, however, consider only one infection cycle, whereas infections commonly consist of numerous infection cycles. At the extracellular level, researchers have considered how drug therapies affect the dynamics of populations of viruses [164, 62, 98, 13, 100]. These models, though, neglect the fact that these drugs target specific intracellular viral components. To more realistically model these infections, we recently proposed incorporating both levels of information into the description in a deterministic setting via cell population balances [60].

In this chapter, we consider a limiting case of this general model in which information flows unidirectionally from the intracellular level to the extracellular level. In this case, it is possible to decouple the intracellular and extracellular levels such that one can first solve the equations governing the intracellular description of the model, then use these results to solve the extracellular description of the model. We first briefly review the general cell population balance modeling approach for viral infections. We then introduce the idea of decoupling the intracellular and extracellular descriptions. Two motivating examples illustrate the efficacy of this technique. Finally, we discuss the results and present conclusions.

11.1 Modeling Framework

We consider population balance models containing an arbitrary number of internal segregations. One can readily extend these models to include external (i.e., spatial) segregations, as is considered in the second example presented in this chapter. The resulting segregated model is then

$$\frac{\partial \eta(t, y)}{\partial t} + \nabla \cdot \left( \eta(t, y)\, v_y \right) = R_\eta \tag{11.1a}$$

$$\frac{\partial c_j^i(t, y)}{\partial t} + \nabla \cdot \left( c_j^i(t, y)\, v_y \right) = R_j + E_j \qquad j = 1, \ldots, n \tag{11.1b}$$

$$\frac{\partial c_k}{\partial t} = E_k + \int_y \eta(t, y)\, R_k(t, y)\, dy \qquad k = 1, \ldots, m \tag{11.1c}$$

in which $\eta(t, y)dy$, $c_j^i(t, y)dy$, and $c_k$ are the concentrations of infected cells, intracellular components, and extracellular components, respectively; $y$ is a vector of all the internal segregations; $v_y$ is the velocity vector for each of the $y$ components; and $R_j$ and $E_j$ are the intracellular and extracellular reaction rates for species $j$, respectively.

We focus our attention on the intracellular reaction set, i.e., equation (11.1b). If we can remove the time dependence for this set of equations, then the production rate term $R_k(t, y)$ also becomes time independent. In this case, the intracellular reactions

$$\nabla \cdot \left( c_j^i(t, y)\, v_y \right) = R_j + E_j \qquad j = 1, \ldots, n \tag{11.2}$$

effectively decouple from the extracellular reactions and population balance

$$\frac{\partial \eta(t, y)}{\partial t} + \nabla \cdot \left( \eta(t, y)\, v_y \right) = R_\eta \tag{11.3a}$$

$$\frac{\partial c_k}{\partial t} = E_k + \int_y \eta(t, y)\, R_k(y)\, dy \qquad k = 1, \ldots, m \tag{11.3b}$$

Consequently, we may first solve the intracellular equations (11.2) to determine the now time-independent production rate term $R_k(y)$, then use this term to solve the remaining equations (11.3). The primary benefit of this decomposition is the potential for significant reductions in both computational expense and the complexity of the resulting systems of equations. We illustrate these claims in the examples.

What, then, are the biological assumptions that we must make to validate this decomposition? The decomposition clearly requires that the time-dependent extracellular description have little or no interaction with the intracellular events. The most restrictive assumption, then, is that each host cell is infected by identically the same virus (i.e., identical initial conditions for each infected cell), and that the infected cell may affect the extracellular environment but not vice versa. A less restrictive assumption would permit variation in the initial condition, but at the expense of requiring more substantial simulation of the intracellular equations (11.2). Accounting for more extensive interaction from the extracellular to intracellular descriptions, such as super-infection of infected cells, requires solving the full model (11.1).
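A minimal sketch of this two-stage solution, with illustrative toy kinetics rather than any model from this thesis: solve the intracellular equations once over the age coordinate, tabulate the production rate, and reuse it in the extracellular age integral of equation (11.3).

```python
import numpy as np
from scipy.integrate import solve_ivp

tau_d = 100.0                                  # maximum infection age

def intracellular_rhs(tau, c):
    """Toy two-species intracellular kinetics (template, genome)."""
    tem, gen = c
    return [0.5 * gen - 0.1 * tem, 0.7 * tem - 0.2 * gen]

# Stage 1: solve the age-dependent intracellular system once (cf. eq. 11.2)
ages = np.linspace(0.0, tau_d, 100)
sol = solve_ivp(intracellular_rhs, (0.0, tau_d), [0.0, 1.0], t_eval=ages)
R_vir = 7.5e-6 * sol.y[0] * sol.y[1]           # tabulated production rate

# Stage 2: the tabulated, time-independent rate enters the extracellular
# balance (cf. eq. 11.3b) through an age integral, here by trapezoid rule
def virus_production(eta):
    """eta is the infected-cell density on the same age grid."""
    return np.trapz(eta * R_vir, ages)

print(virus_production(np.ones_like(ages)))    # uniform age distribution
```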

11.2 Examples

In this section, we consider two examples that illustrate the efficiency of the proposed decomposition. First, we re-examine the model previously presented by Haseltine, Rawlings, and Yin [60]. Then we develop a multi-level model describing the focal infection of murine astrocytoma (DBT) cells by vesicular stomatitis virus (VSV) [77].

11.2.1 Initial Infection for a Generic Viral Infection

We reconsider the initial infection example of Haseltine, Rawlings, and Yin [60]. This model considers intracellular species of genomic (gen) and template (tem) viral nucleic acids, respectively, viral structural protein (str), and viral enzymes V1 and V2. Intracellular reactions include:

$$\text{nucleotides} + \text{gen} \xrightarrow[\text{V1}]{k_1^i} \text{tem} \qquad a_1 = k_1^i\, c_{V1}^i\, c_{gen}^i \tag{11.4a}$$
$$\text{amino acids} \xrightarrow[\text{V2, tem}]{k_2^i} \text{str} \qquad a_2 = k_2^i\, c_{V2}^i\, c_{tem}^i \tag{11.4b}$$
$$\text{nucleotides} \xrightarrow[\text{tem}]{k_3^i} \text{gen} \qquad a_3 = k_3^i\, c_{tem}^i \tag{11.4c}$$
$$\text{str} \xrightarrow{k_4^i} \text{degraded} \qquad a_4 = k_4^i\, c_{str}^i \tag{11.4d}$$
$$\text{gen} + \text{str} \xrightarrow{k_5^i} \text{secreted virus} \qquad a_5 = k_5^i\, c_{gen}^i\, c_{str}^i \tag{11.4e}$$

Reaction rates are given by the $a_j$ expressions. These events account for the insertion of the viral genome into the host nucleus, production of a viral template used to replicate the viral genome and mass-produce viral structural protein, and the assembly and secretion of viral progeny. We assume that host nucleotides and amino acids are available at constant concentrations. Extracellularly, the model tracks uninfected host cells (unc), infected host cells (infc),

Parameter         Value         Units
τd                100           days
k1^i              3.13 × 10^-4  cell/(#-day)
k2^i              25.0          cell/(#-day)
k3^i              0.7           day^-1
k4^i              2.0           day^-1
k5^i              7.5 × 10^-6   cell/(#-day)
k6                5.0 × 10^-9   host/(#-day)
k7                1.0           day^-1
k8                5.0 × 10^-2   day^-1
k9                1.0 × 10^-2   day^-1
k10               1.0 × 10^6    #/(host-day)
cunc(t = 0)       10^8          #/host
cvir(t = 0)       1000          #/host
cgen^i(τ = 0)     1             #/cell
cV1^i(τ = 0)      80            #/cell
cV2^i(τ = 0)      40            #/cell

Table 11.1: Model parameters for the initial infection simulation.

and virus (vir) for the reactions

$$\text{virus} + \text{uninfected cell} \xrightarrow{k_6} \text{infected cell} \qquad a_6 = k_6\, c_{vir} c_{unc} \tag{11.5a}$$
$$\text{virus} \xrightarrow{k_7} \text{degraded} \qquad a_7 = k_7\, c_{vir} \tag{11.5b}$$
$$\text{infected cell} \xrightarrow{k_8} \text{death} \qquad a_8 = k_8\, c_{infc} \tag{11.5c}$$
$$\text{uninfected cell} \xrightarrow{k_9} \text{death} \qquad a_9 = k_9\, c_{unc} \tag{11.5d}$$
$$\text{precursors} \xrightarrow{k_{10}} \text{uninfected cell} \qquad a_{10} = k_{10} \tag{11.5e}$$

These events address the intuitive notions of cell growth, death, and infection by free virus. In this description, infected cells are segregated by the age of infection $\tau$, and it is assumed that all infected cells die by the maximum age $\tau_d$. The model equations for this system are then

$$\frac{\partial \eta(t, \tau)}{\partial t} + \frac{\partial \eta(t, \tau)}{\partial \tau} = k_6 e_{unc} e_{vir}\, \delta(\tau) - \left( k_8 + \delta(\tau - \tau_d) \right) \eta(t, \tau) \tag{11.6a}$$

$$\frac{\partial i_{tem}(t, \tau)}{\partial t} + \frac{\partial i_{tem}(t, \tau)}{\partial \tau} = R_{tem} \tag{11.6b}$$

$$\frac{\partial i_{gen}(t, \tau)}{\partial t} + \frac{\partial i_{gen}(t, \tau)}{\partial \tau} = R_{gen} + \delta(\tau) \tag{11.6c}$$

$$\frac{\partial i_{str}(t, \tau)}{\partial t} + \frac{\partial i_{str}(t, \tau)}{\partial \tau} = R_{str} \tag{11.6d}$$

$$\frac{\partial i_{V_1}(t, \tau)}{\partial t} + \frac{\partial i_{V_1}(t, \tau)}{\partial \tau} = 80\, \delta(\tau) \tag{11.6e}$$

$$\frac{\partial i_{V_2}(t, \tau)}{\partial t} + \frac{\partial i_{V_2}(t, \tau)}{\partial \tau} = 40\, \delta(\tau) \tag{11.6f}$$

$$\frac{de_{unc}}{dt} = k_{10} - k_9 e_{unc} - k_6 e_{unc} e_{vir} \tag{11.6g}$$

$$\frac{de_{vir}}{dt} = -k_7 e_{vir} - k_6 e_{unc} e_{vir} + \int_0^{\tau_d} \eta(t, \tau)\, R_{vir}(\tau)\, d\tau \tag{11.6h}$$

Table 11.1 presents the initial conditions and rate constants used for this simulation. These parameters are the same as those used in Chapter 10.

For this example, the intracellular events decouple from the population balance (11.6a) in equation (11.6) because the governing equations for the intracellular species do not depend on time. Consequently, we can first solve for the intracellular age distribution, i.e.,

$$\frac{\partial i_{tem}(\tau)}{\partial \tau} = R_{tem} \tag{11.7a}$$

$$\frac{\partial i_{gen}(\tau)}{\partial \tau} = R_{gen} + \delta(\tau) \tag{11.7b}$$

$$\frac{\partial i_{str}(\tau)}{\partial \tau} = R_{str} \tag{11.7c}$$

$$\frac{\partial i_{V_1}(\tau)}{\partial \tau} = 80\, \delta(\tau) \tag{11.7d}$$

$$\frac{\partial i_{V_2}(\tau)}{\partial \tau} = 40\, \delta(\tau) \tag{11.7e}$$

and subsequently for the population-level dynamics

$$\frac{\partial \eta(t, \tau)}{\partial t} + \frac{\partial \eta(t, \tau)}{\partial \tau} = k_6 e_{unc} e_{vir}\, \delta(\tau) - \left( k_8 + \delta(\tau - \tau_d) \right) \eta(t, \tau) \tag{11.8a}$$

$$\frac{de_{unc}}{dt} = k_{10} - k_9 e_{unc} - k_6 e_{unc} e_{vir} \tag{11.8b}$$

$$\frac{de_{vir}}{dt} = -k_7 e_{vir} - k_6 e_{unc} e_{vir} + \int_0^{\tau_d} \eta(t, \tau)\, R_{vir}(\tau)\, d\tau \tag{11.8c}$$

$$R_{vir}(\tau) = k_5^i\, c_{gen}^i(\tau)\, c_{str}^i(\tau) \tag{11.8d}$$

To solve the coupled sets of integro-partial differential equations, we use orthogonal collocation of Lagrange polynomials on finite elements [155]. When $t < \tau_d$, we use a single element with the transformation

$$\bar{\tau} = \frac{\tau}{t}$$

For this case, the population balance becomes, by application of the chain rule,

$$\frac{\partial \eta(t, \tau)}{\partial t} + \frac{\partial \eta(t, \tau)}{\partial \tau} = \frac{\partial \eta(t, \bar{\tau})}{\partial t} + \frac{\partial \eta(t, \bar{\tau})}{\partial \bar{\tau}} \frac{\partial \bar{\tau}}{\partial t} + \frac{\partial \eta(t, \bar{\tau})}{\partial \bar{\tau}} \frac{\partial \bar{\tau}}{\partial \tau} \tag{11.9}$$

$$= \frac{\partial \eta(t, \bar{\tau})}{\partial t} + \frac{1 - \bar{\tau}}{t} \frac{\partial \eta(t, \bar{\tau})}{\partial \bar{\tau}} \tag{11.10}$$

Similar expressions can be derived for each of the segregated species. When $t > \tau_d$, we use two elements with the transformations

$$\bar{\tau}_1 = \frac{\tau - t + \tau_d}{\tau_{min} - t + \tau_d}, \qquad \bar{\tau}_2 = \frac{\tau}{t - \tau_{min}}$$

Transformation of the population balance and segregated intracellular species follows similarly as before by application of the chain rule. Also, continuity between elements is enforced, i.e.,

$$\bar{\tau}_1 \big|_{\tau_{min}} = \bar{\tau}_2 \big|_{\tau_{min}}$$

We use twenty-five collocation points for each finite element. The discretized PDE yields a system of differential-algebraic equations, which we solve using the package DASKR, a variant of the predictor-corrector solver DASPK [15]. For the decoupled system, we first solve the intracellular reactions, i.e., equations (11.7), over the age range $[0, \tau_d]$ using one hundred evenly-incremented time points. Virus production rates required by the cell population balance, i.e., equation (11.8), are calculated from this intracellular information by linearly interpolating between the time points.

Figure 11.1 plots the full and decoupled model solutions for all extracellular species. The decoupled solution provides results indistinguishable from the full solution, but at roughly one third the computational expense (15.6 CPU seconds versus 45.2 CPU seconds on a 1.6 GHz Intel Centrino processor). The majority of the decrease in computational expense is directly attributable to the reduced size of the state vector; the decoupled solution requires only discretization of the cell population balance, whereas the full solution also requires discretization of all intracellular species. Because the predictor-corrector method employed by DASKR requires the solution of a Newton iteration, an operation that scales cubically with the size of the state, we expect more dramatic decreases in computational expense for the decoupled solution as the number of intracellular species increases. Additionally, formulating the problem in this manner significantly reduces the potential for discretization problems due to stiffness in the intracellular components.

Figure 11.1: (a) Comparison of the full (lines) and decoupled (points) model solutions for the initial infection example. (b) Percent error for the decoupled model solution, assuming the full solution is exact.

11.2.2 VSV/DBT Focal Infection

In this section, we incorporate intracellular events corresponding to viral infection and subsequent host-cell response for VSV infection of DBT cells. VSV is a member of the Rhabdoviridae family consisting of enveloped RNA viruses [129]. Its compact genome is only approximately 12 kb in length, and encodes genetic information for five proteins: nucleoprotein (N), phosphoprotein (P), matrix (M), glycoprotein (G), and large protein (L) [129]. Recently, researchers

Figure 11.2: Schematic of modeled events for the infection of DBT cells by VSV. Infection of a host cell begins with insertion of the viral ribonucleoprotein (RNP) and polymerase (L). The polymerase reversibly binds to the RNP to form a double-stranded RNA complex (dsRNA), which serves as the template for viral transcription and replication. The model assumes that the matrix (M) and L proteins limit virus growth, and explicitly accounts for transcription and translation of these proteins. The M and L proteins combine with the RNP to form progeny viruses, which are secreted from infected cells. Additionally, the viral dsRNA induces up-regulation of the host interferon (IFN) genes, leading to production of IFN, which is also secreted from infected cells. The model allows secreted VSV and IFN to compete for uninfected cells: if VSV binds first, then the infection cycle starts again; if IFN binds first, up-regulation of host IFN genes leads to an inoculated state in which viral RNP is immediately degraded upon entry into the cell. (Interferon signaling occurs more rapidly than virus propagation.)

have begun investigating the potential of using VSV as an oncolytic agent for anti-tumor therapies (see [44], for example). However, the host antiviral strategy of interferon signaling can substantially limit the propagation of the infection. To maximize the therapeutic benefit of this agent, then, a better understanding of multiple rounds of viral propagation and host antiviral response is needed. The focal infection system proposed by Duca et al. [26] provides one in vitro platform for investigating these dynamics. We have already considered several simple dynamic models for this system given only measurements on an extracellular level in Chapter 10. Here we consider the potential for incorporating intracellular measurements by developing a model that contains intracellular structure and captures the spatial spread of the infection.

Figure 11.3: Detailed schematic of modeled events for the up-regulation of interferon (IFN) genes. Presence of viral double-stranded RNA (dsRNA) leads to the phosphorylation of the interferon-response factor (IRF to IRFP). IRFP reversibly binds to the positive regulatory domain (PRD) of the interferon gene to up-regulate synthesis of interferon messenger RNA (mRNA$_{IFN}$). Translation of mRNA$_{IFN}$ produces interferon, which is secreted from the cell.

For model development, we make the following assumptions for this system:

1. Virus replication is limited by large (L) and matrix (M) protein production.

2. Interferon genes are up-regulated by detection of the viral double-stranded ribonucleic acid (dsRNA) species.

3. Interferon up-regulation in inoculated cells occurs over a significantly faster time scale than events associated with viral infection (transcription, translation, and replication).

4. Antiviral mechanisms in inoculated cells destroy viral ribonucleoproteins (RNPs) immediately upon entry into the cell.

These assumptions greatly simplify the biological complexity of both viral replication and host antiviral response, as discussed in greater detail next for the intracellular events leading to viral replication and up-regulation of the interferon signaling pathway. Such detail could readily be incorporated into future developments of this model.

Intracellular Viral Replication

For the intracellular model of viral replication, we start with the previous model development for this infection by Lim, Lang, and Yin (in preparation), using many of the same reaction expressions and rate constants. The primary differences between this previous work and the model derived here are that we track only the large and matrix proteins as opposed to all five VSV proteins, and that we use a more detailed model for assembly of the virus. Consequently, we incorporate the following steps leading to virus replication:

1. Virus binds to a host cell and inserts a single viral ribonucleoprotein (RNP) and fifty polymerases (L protein).

2. The polymerase (L) reversibly binds to the ribonucleoprotein (RNP) to form the RL species, which serves as the template for both transcription and replication.

3. For virus replication, we neglect formation of positive-strand RNA and assume that packaging of negative-strand RNA by nucleoprotein occurs instantaneously.

4. Transcription proceeds processively from the 5’ to the 3’ end of the ribonucleoprotein. Since the gene order for the studied VSV is N-P-M-G-L, transcription yields the messenger RNA for the matrix protein (mRNA$_M$) first. At this point, the polymerase may either dissociate from the template (this polymerase-template complex is denoted RL$_1$), or continue to transcribe the mRNA for the large protein (mRNA$_L$).

5. The rate of translation depends solely on the concentrations of the mRNA species (con- centrations of amino acids and host ribosomes are not rate limiting).

6. Assembly of the virus capsid begins with construction of the matrix core. We model this construction as a polymerization-like process initiated by the fusion of two M proteins and sequential addition of M proteins. This sequential addition is approximated as a ten-step process. Given the completed matrix core M$_{full}$, assembly continues with packaging of first the ribonucleoprotein (to form the MR complex) and then 50 L proteins to form a virus that is secreted from the cell.

Reaction (11.11) accounts for all of these events. All reactions are elementary as written unless specified otherwise by the reaction rate $a_j$. Values for all intracellular rate constants ($k^i$'s) and nonzero initial conditions are given in Table 11.2.

$$\text{RNP} \xrightarrow{k_1^i} \text{degraded} \tag{11.11a}$$
$$\text{RNP} + \text{L} \underset{k_{-2}^i}{\overset{k_2^i}{\rightleftharpoons}} \text{RL} \tag{11.11b}$$
$$\text{RL} \xrightarrow{k_3^i} \text{RL}_1 + \text{mRNA}_M \tag{11.11c}$$
$$\text{RL}_1 \xrightarrow{k_4^i} \text{RNP} + \text{L} \tag{11.11d}$$
$$\text{RL}_1 \xrightarrow{k_5^i} \text{RNP} + \text{L} + \text{mRNA}_L \tag{11.11e}$$
$$\text{RL} \xrightarrow{k_6^i} 2\,\text{RNP} + \text{L} \tag{11.11f}$$
$$\text{mRNA}_M \xrightarrow{k_7^i} \text{mRNA}_M + \text{M} \tag{11.11g}$$
$$\text{mRNA}_L \xrightarrow{k_8^i} \text{mRNA}_L + \text{L} \tag{11.11h}$$
$$\text{mRNA}_M \xrightarrow{k_9^i} \text{degraded} \tag{11.11i}$$
$$\text{mRNA}_L \xrightarrow{k_{10}^i} \text{degraded} \tag{11.11j}$$
$$\text{M} \xrightarrow{k_{11}^i} \text{degraded} \tag{11.11k}$$
$$\text{L} \xrightarrow{k_{12}^i} \text{degraded} \tag{11.11l}$$
$$2\,\text{M} \xrightarrow{k_{13}^i} \text{M}_2 \tag{11.11m}$$
$$\text{M}_j + 182.2\,\text{M} \xrightarrow{k_{14}^i} \text{M}_{j+1} \qquad a_{14} = k_{14}^i\, c_{M_j}\, c_M \tag{11.11n}$$
$$\text{M}_{full} + \text{RNP} \xrightarrow{k_{15}^i} \text{MR} \tag{11.11o}$$
$$50\,\text{L} + \text{MR} \xrightarrow{k_{16}^i} \text{secreted virus} \qquad a_{16} = k_{16}^i\, c_L\, c_{MR} \tag{11.11p}$$

The infected cell in Figure 11.2 presents a brief overview of these modeled intracellular events.
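To illustrate how the rate expressions above enter the intracellular balances, the sketch below assembles the mass-action contributions of the assembly reactions (11.11m), (11.11n), and (11.11p). The dictionary container and the restriction to this subset of species are illustrative assumptions.

```python
def assembly_terms(c, k13, k14, k16):
    """c maps species name -> intracellular concentration; returns the
    contributions of reactions (11.11m), (11.11n), (11.11p) to dc/dtau."""
    a13 = k13 * c["M"] ** 2          # 2M -> M2, elementary as written
    a14 = k14 * c["Mj"] * c["M"]     # Mj + 182.2M -> Mj+1, rate a14
    a16 = k16 * c["L"] * c["MR"]     # 50L + MR -> secreted virus, rate a16
    dM = -2.0 * a13 - 182.2 * a14    # M consumed by initiation and growth
    dL = -50.0 * a16                 # 50 L proteins packaged per virion
    dvir = a16                       # rate of virus secretion
    return dM, dL, dvir
```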

Intracellular Host Antiviral Response

Figure 11.3 presents the modeled events for the host antiviral response, namely up-regulation of the interferon pathway, in greater detail. These events reflect a substantially simplified picture of interferon signaling, recently reviewed in [133, 54]. Initiation of this response occurs when the cell recognizes viral double-stranded RNA species, leading to phosphorylation of an interferon-response factor (IRF). Here we assume that the cell initially has twenty copies of IRF. The phosphorylated species, IRFP, binds to the positive regulatory domain (PRD) of the interferon gene to form the induced complex IP, which expresses interferon.

Parameter   Value
L0          50 #/(infected cell)
RNP0        1 #/(infected cell)
k1^i        10^-0.265 hr^-1
k2^i        10^1.663 cell/hr
k-2^i       10^0.369 hr^-1
k3^i        10^0.529 hr^-1
k4^i        10^1.428 hr^-1
k5^i        10^0.319 hr^-1
k6^i        10^0.177 hr^-1
k7^i        10^1.796 hr^-1
k8^i        10^1.031 hr^-1
k9^i        10^-0.065 hr^-1
k10^i       10^-0.265 hr^-1
k11^i       10^-0.227 hr^-1
k12^i       10^-0.902 hr^-1
k13^i       10^-2.683 cell/hr
k14^i       10^-0.823 cell/hr
k15^i       10^-1.525 cell/hr
k16^i       10^0.813 cell/hr

Table 11.2: Initial conditions and rate constants for the intracellular reactions of the VSV infection of DBT cells.

Interferon secreted from the cell can diffuse radially and bind to uninfected cells to form inoculated cells that are resistant to viral infection (i.e., inserted viral ribonucleoprotein is degraded immediately). Figure 11.2 illustrates this inoculation.

This model significantly reduces the complexity of the interferon response. For example, IFN-β-induced protein kinase R and IFN-α-induced 2’-5’ oligoadenylate synthetase respectively inactivate protein synthesis by inactivation of the eukaryotic protein synthesis initiation factor eIF-2 and destroy mRNA through activation of the cellular endonuclease RNase L [162, 135]. Our simpler model does not distinguish between α, β, and γ interferons, nor does it account explicitly for eIF-2 or RNase L. Consequently, the model treats cellular antiviral response as all or none: either cells are susceptible to viral infection, or they are resistant due to interferon inoculation.

Parameter   Value
IRF0        20 #/cell
k17^i       10^1.597 cell/hr
k18^i       10^1.294 cell/hr
k-18^i      10^3.451 hr^-1
k19^i       10^2.329 hr^-1
k20^i       10^-0.294 hr^-1
k21^i       10^-0.636 hr^-1
k22^i       10^-1.303 hr^-1

Table 11.3: Initial conditions and rate constants for the reactions describing the intracellular host antiviral response of the VSV infection of DBT cells.

The intracellular reactions considered in this example are

$$\text{RL} + \text{IRF} \xrightarrow{k_{17}^i} \text{RL} + \text{IRFP} \tag{11.12a}$$
$$\text{RL}_1 + \text{IRF} \xrightarrow{k_{17}^i} \text{RL}_1 + \text{IRFP} \tag{11.12b}$$
$$\text{IRFP} + \text{PRD} \underset{k_{-18}^i}{\overset{k_{18}^i}{\rightleftharpoons}} \text{IP} \tag{11.12c}$$
$$\text{IP} \xrightarrow{k_{19}^i} \text{IRFP} + \text{PRD} + \text{mRNA}_{IFN} \tag{11.12d}$$
$$\text{mRNA}_{IFN} \xrightarrow{k_{20}^i} \text{mRNA}_{IFN} + \text{IFN} \tag{11.12e}$$
$$\text{mRNA}_{IFN} \xrightarrow{k_{21}^i} \text{degraded} \tag{11.12f}$$
$$\text{IFN} \xrightarrow{k_{22}^i} \text{secreted} \tag{11.12g}$$

All reactions are elementary as written. Values for the required rate constants ($k^i$'s) are given in Table 11.3.

11.2.3 Model Solution

For the given assumptions, the intracellular events decouple from the extracellular events. Accordingly, the model equations for the intracellular events are of the form

$$\frac{dc_j^i}{d\tau} = R_j \tag{11.13}$$

in which $c_j^i$ is the intracellular concentration of the $j$th species, $\tau$ is the infected cell age, and $R_j$ is the production rate of the $j$th species. We solve this ODE system using the package DASKR with a time step of 0.1 hours.

Parameter                                          Symbol    Value
Cell volume                                        Vc        3.4 × 10^-9 ml
Initial number of uninfected cells                 nunc,0    10^6 cells
Number of viruses in the initial inoculum          nvir,0    8.0 × 10^4 viruses
Radius of the plate                                rplate    1.75 cm
Infection rate constant                            k1        10^-10.159 cm^3/hr
Inoculation rate constant                          k2        10^-11.890 cm^3/hr
Interferon production rate constant                k3        10^3.717 cm^3/hr
Virus diffusivity                                  Dvir      10^-3.445 cm^2/hr
Interferon diffusivity                             Difn      10^-0.990 cm^2/hr
Background fluorescence                            ibgd      10^1.571
Measurement rate constant                          km/Km     10^-15.868
Fraction of dead cells removed during
  the measurement process                          kwash     10^-0.942
Initial concentration of uninfected cells
  (first radial region)                            c0unc,1   10^6.443 cm^-3
Initial concentration of uninfected cells
  (second radial region)                           c0unc,2   10^7.428 cm^-3
Maximum age of infection                           τd        10^1.228 hr

Table 11.4: Extracellular model parameters for the infection of DBT cells by VSV.

Extracellular reactions considered in this example are

$$\text{virus} + \text{uninfected cell} \xrightarrow{k_1} \text{infected cell} \tag{11.14a}$$
$$\text{infected cell} \longrightarrow \text{virus (age dependent)} \tag{11.14b}$$
$$\text{infected cell} \longrightarrow \text{infected cell} + \text{interferon (age dependent)} \tag{11.14c}$$
$$\text{uninfected cell} + \text{interferon} \xrightarrow{k_2} \text{inoculated cell} \tag{11.14d}$$
$$\text{inoculated cell} \xrightarrow{k_3} \text{inoculated cell} + \text{interferon} \tag{11.14e}$$
$$\text{inoculated cell} + \text{virus} \xrightarrow{k_1} \text{inoculated cell} \tag{11.14f}$$
$$\text{infected cell} + \text{virus} \xrightarrow{k_1} \text{infected cell (all ages)} \tag{11.14g}$$

The model equations are then the following set of coupled integro-partial differential equations:

$$\frac{\partial c_{vir}}{\partial t} = \frac{1}{r}\frac{\partial}{\partial r}\left(D_{vir}^{eff}\, r\, \frac{\partial c_{vir}}{\partial r}\right) + \int_0^{\tau_d} c_{infc}(\tau)\, R_{vir}^i(\tau)\, d\tau + R_{vir} \tag{11.15a}$$

$$\frac{\partial c_{ifn}}{\partial t} = \frac{1}{r}\frac{\partial}{\partial r}\left(D_{ifn}^{eff}\, r\, \frac{\partial c_{ifn}}{\partial r}\right) + \int_0^{\tau_d} c_{infc}(\tau)\, R_{ifn}^i(\tau)\, d\tau + R_{ifn} \tag{11.15b}$$

$$\frac{\partial c_{unc}}{\partial t} = R_{unc}, \qquad \frac{\partial c_{infc}}{\partial t} + \frac{\partial c_{infc}}{\partial \tau} = R_{infc} \tag{11.15c}$$

$$\frac{\partial c_{inoc}}{\partial t} = R_{inoc}, \qquad \frac{\partial c_{dc}}{\partial t} = R_{dc} \tag{11.15d}$$

$$D_{vir}^{eff} = 2 D_{vir}\, \frac{1 - \phi}{2 + \phi}, \qquad D_{ifn}^{eff} = 2 D_{ifn}\, \frac{1 - \phi}{2 + \phi} \tag{11.15e}$$

$$\phi = V_c \left( c_{unc} + \int_0^{\tau_d} c_{infc}\, d\tau + c_{inoc} \right) \tag{11.15f}$$

$$\left. \frac{dc_{vir}}{dr} \right|_{r = 0,\, r_{max}} = 0, \qquad \left. \frac{dc_{ifn}}{dr} \right|_{r = 0,\, r_{max}} = 0 \tag{11.15g}$$

$$\left. \frac{dc_{infc}}{d\tau} \right|_{\tau = 0} = k_1 c_{vir} c_{unc}, \qquad \left. \frac{dc_{infc}}{d\tau} \right|_{\tau = \tau_d} = 0 \tag{11.15h}$$

$$c_i(t = 0, r) \text{ known} \tag{11.15i}$$

The production rates of virus and interferon are approximated using linear interpolation between the time points of the intracellular model results. We discretize the spatial dimension using central differences with an increment of 0.025 cm. We assume that cells are spherical objects, with the height of the cell monolayer equal to the resulting cell diameter. Concentrations for all species are calculated assuming that the volume of the monolayer is cylindrical. The dimensions of this cylinder are given by the height of the cell monolayer and the radius of the plate. We model the concentration of the initial virus inoculum using the piecewise linear continuous function

$$c_{vir}(t = 0, r) = \begin{cases} c_{vir,0}, & r < 0.075\ \mathrm{cm} \\ \left(1 - \frac{20}{\mathrm{cm}}\,(r - 0.075\ \mathrm{cm})\right) c_{vir,0}, & 0.075\ \mathrm{cm} \le r \le 0.125\ \mathrm{cm} \\ 0, & r > 0.125\ \mathrm{cm} \end{cases} \tag{11.16}$$

The initial radial profile for the uninfected cell concentration is given by the expression

$$c_{unc}(t = 0, r) = \begin{cases} c_{unc,2}^0, & r < 0.025\ \mathrm{cm} \\ c_{unc,2}^0 - \frac{20\,(c_{unc,2}^0 - c_{unc,1}^0)}{\mathrm{cm}}\,(r - 0.025\ \mathrm{cm}), & 0.025\ \mathrm{cm} \le r < 0.075\ \mathrm{cm} \\ c_{unc,1}^0, & 0.075\ \mathrm{cm} \le r < 0.1\ \mathrm{cm} \\ c_{unc,0} - \frac{20\,(c_{unc,0} - c_{unc,1}^0)}{\mathrm{cm}}\,(0.15\ \mathrm{cm} - r), & 0.1\ \mathrm{cm} \le r < 0.15\ \mathrm{cm} \\ c_{unc,0}, & r \ge 0.15\ \mathrm{cm} \end{cases} \tag{11.17}$$

We first use the model to predict infection spread for the focal infection system [26].

Model predictions for the experimental measurements are calculated via the relations

$$K_m = \frac{c_{vir\text{-}host}}{c_{vir}\left( c_{unc} + c_{infc} + k_{wash} c_{dc} \right)} \tag{11.18}$$

$$y_m = \begin{cases} i_{bgd}, & c_{vir\text{-}host} \le v_{min} \\ k_m c_{vir\text{-}host} + i_{bgd}, & v_{min} < c_{vir\text{-}host} < v_{max} \\ 255, & c_{vir\text{-}host} \ge v_{max} \end{cases} \tag{11.19}$$

in which $K_m$ is the equilibrium constant; $c_{vir}$, $c_{unc}$, $c_{infc}$, $c_{dc}$, and $c_{vir\text{-}host}$ refer to the concentrations of virus, uninfected cells, infected cells, dead cells, and virus-host complexes, respectively; $y_m$ is the intensity measurement; $k_m$ is the conversion constant from concentration to intensity; $i_{bgd}$ is the background fluorescence (in intensity); and $v_{min}$ and $v_{max}$ are the minimum and maximum detectable virus-host concentrations. Parameters for the measurement and the extracellular model are given in Table 11.4.

Figure 11.4 compares the experimental data, simple segregated model predictions (results taken directly from Chapter 10), and the predictions for the model developed in this chapter. This figure demonstrates excellent agreement between the two model predictions and the experimental data. Clearly the experimental data are not informative enough to merit all of the intracellular structure developed in this section. This model requires eighteen more parameters than the simple segregated model, which approximates intracellular production rates of both virus and interferon using first-order plus time delay models. Figure 11.5 demonstrates that the total production of both virus and interferon on a per infected cell basis is similar for both the simple and intracellularly-structured models. However, the simpler model includes no intracellular structure and hence cannot predict concentrations of specific intracellular components. For example, we might consider harvesting the entire monolayer of cells and assaying for intracellular species, as was recently performed by Munir and Kapur [94], who used microarrays to analyze host-pathogen interactions for an avian pneumovirus infection. Figure 11.6 presents the results for an mRNA assay of this type, assuming that inoculated cells maintain an average of fifty copies of interferon mRNA. The results intuitively follow the nature of the infection. Initially, VSV hijacks the host DBT cells to produce the viral components, as evidenced by the sharp rises of matrix and large protein mRNA. As the host cell recognizes viral double-stranded RNA, it up-regulates the interferon response, leading to a burst in the interferon mRNA. Finally, all remaining living cells become inoculated to infection, leading to a steady state with a constant value of interferon mRNA and no viral mRNA species.

For this example, we expect that solving the decoupled system should yield dramatic reductions in computational expense compared to solving the full model. Since the intracellular level contains twenty-four distinct species, solving the full model would require discretization of each of these species at every node of the spatial discretization. Tracking this information is clearly unnecessary because the concentrations of intracellular species are not spatially dependent. Additionally, the intracellular description for this example is stiff due to the large rate constants for the reversible reactions (11.11b) and (11.12c), which poses potential problems for discretization of the age dimension.
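A minimal sketch of the intensity measurement relations (11.18)-(11.19) above, assuming the virus-host complex concentration has already been computed; the function and argument names are illustrative.

```python
import numpy as np

def intensity(c_virhost, km, ibgd, vmin, vmax):
    """Background below vmin, linear response in between, 8-bit saturation
    at 255 above vmax, per equation (11.19)."""
    y = km * c_virhost + ibgd
    y = np.where(c_virhost <= vmin, ibgd, y)
    return np.where(c_virhost >= vmax, 255.0, y)
```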

Figure 11.4: Comparison of experimental data, simple segregated model fit (simple), and the developed model (intracellular) at 7, 27, 48, 72, and 96 hours. The two models fit the data equally well. The white scale bar in the upper left-hand corner of the experimental images is one millimeter.

11.3 Conclusions

We have considered a decomposition for solving viral infection models with restricted flow of information from the extracellular to intracellular level. For these cases, the intracellular level decomposes from the extracellular level, leading to a more computationally tractable problem than the original formulation. Two examples illustrated the efficacy of this decomposition. The assumptions required for this decomposition restrict the ability of the model to compensate for instantaneous extracellular effects such as super-infection of infected cells and transfer of material from the extracellular to intracellular levels. However, we suspect that in many cases these effects may either be insignificant or indistinguishable from models making these assumptions given experimental data.

Figure 11.5: Comparison of total production of virus (VSV) and interferon (IFN) per cell for the simple segregated model (lines; first-order plus time delay expressions for the production rate) and intracellularly-structured, segregated model (points).

Figure 11.6: Dynamic measurement of mRNA species (interferon and viral matrix (M) and large (L) proteins) for the focal infection system. Here, the entire monolayer is harvested and analyzed for mRNA.

We also extended a model for vesicular stomatitis virus infection of murine astrocytoma cells to include intracellular structure for both virus replication and interferon signaling. This model predicts spatio-temporal profiles similar to those of a segregated, intracellularly-unstructured model for the focal infection system, indicating that the available extracellular data is not informative enough to justify even the simplified intracellular model presented in this paper. In addition, the proposed model predicts population-averaged concentrations of intracellular species, and thus serves as a basis for assimilating more extensive intracellular measurements in further studies of viral infection and cellular antiviral mechanisms.

Notation

$a_j$  $j$th reaction rate
$c_j$  concentration of extracellular species $j$
$c_j^i$  concentration of intracellular species $j$
$c_{unc}^0$  initial concentration of uninfected cells in the radius of the initial inoculum for the VSV/BHK-21 fit
$c_{unc,1}^0$  initial concentration of uninfected cells in the first radial region of the initial inoculum for the VSV/DBT fit
$c_{unc,2}^0$  initial concentration of uninfected cells in the second radial region of the initial inoculum for the VSV/DBT fit
$d_j$  time delay for reaction $j$
$D_{ifn}$  interferon diffusivity
$D_{vir}$  virus diffusivity
$E_j$  $j$th extracellular production rate
$i_{bgd}$  background fluorescence
$K_j$  rate constant for reaction $j$
$K_m$  equilibrium constant for the measurement
$k_j$  rate constant for reaction $j$
$k_j^i$  rate constant for intracellular reaction $j$
$k_m$  conversion constant from virus-host concentration to intensity
$R_j$  $j$th intracellular production rate
$R_\eta$  production rate for the infected cell population $\eta$
$r$  radial dimension
$t$  time
$v_y$  velocity vector for the internal characteristics $y$
$Y$  virus yield per infected cell
$y$  internal characteristics
$y_m$  intensity measurement
$\delta$  Dirac delta function
$\eta(t, y)dy$  concentration of infected cells
$\tau$  age of infection
$\tau_d$  maximum age of infection

Subscripts

dc  dead cell
ifn  interferon
infc  infected cell
inoc  inoculated cell
unc  uninfected cell
vir  virus
vir-host  virus-host complex

Chapter 12

Moving-Horizon State Estimation

It is well established that the Kalman filter is the optimal state estimator for unconstrained, linear systems subject to normally distributed state and measurement noise. Many physical systems, however, exhibit nonlinear dynamics and have states subject to hard constraints, such as nonnegative concentrations or pressures. Hence Kalman filtering is no longer directly applicable. As a result, many different types of nonlinear state estimators have been proposed; Soroush [141] provides a review of many of these methods. We focus our attention on techniques that formulate state estimation in a probabilistic setting, that is, both the model and the measurement are potentially subject to random disturbances. Such techniques include the extended Kalman filter, moving-horizon estimation, Bayesian estimation, and Gaussian sum approximations. In this probabilistic setting, state estimators attempt to reconstruct the a posteriori distribution $P(x_T | y_0, \ldots, y_T)$, which is the probability that the state of the system is $x_T$ given measurements $y_0, \ldots, y_T$.

The question arises, then, as to which point estimate should be used for the state estimate. Two obvious choices for the point estimate are the mean and the mode of the a posteriori distribution. For non-symmetric distributions, Figure 12.1 (a) demonstrates that these estimates are generally different. Additionally, if this distribution is multimodal, as in Figure 12.1 (b), then the mean may place the state estimate in a region of low probability. Clearly the mode is a more desirable estimate in such cases.

For nonlinear systems, the a posteriori distribution is generally non-symmetric and potentially multimodal. In this chapter, we outline conditions that lead to the formation of multiple modes in the a posteriori distribution for systems tending to a steady state, and construct examples that generate multiple modes. To the best of our knowledge, only Alspach and Sorenson (and references contained within) [2], Gordon et al. [53], and Chaves and Sontag [20] have proposed examples in which multiple modes arise in the a posteriori distribution, but these contributions do not examine conditions leading to their formation. Gaussian sum approximations [2] offer one method for addressing the formation of multiple modes in the a posteriori distribution for unconstrained systems. Current Bayesian estimation methods [53, 12, 22, 142] offer another means for addressing multiple modes, but these methods propose estimation of the mean rather than the mode. In this chapter, we examine the estimation properties of both the extended Kalman filter and moving-horizon estimation through simulation. The extended Kalman filter assumes that the a posteriori distribution is normally distributed (unimodal), hence the mean and the mode of the distribution are equivalent. Moving-horizon estimation seeks to reconstruct the mode of the a posteriori distribution via constrained optimization, but current implementations employ local optimizations that offer no means of distinguishing between multiple modes of this distribution. The simulation examples thus provide a means of benchmarking these current industrially implementable technologies.

In this chapter, we first formulate the estimation problem of interest. Next, we briefly review pertinent extended Kalman filtering, Monte Carlo filter, and moving-horizon estimation literature. Then we present several motivating chemical engineering examples in which the accurate incorporation of both state constraints and the nonlinear model is paramount for obtaining accurate estimates.

Portions of this chapter appear in Haseltine and Rawlings [58] and are to appear in Haseltine and Rawlings [59].

Figure 12.1: Comparison of potential point estimates (mean and mode) for (a) unimodal and (b) bimodal a posteriori distributions.

12.1 Formulation of the Estimation Problem

Most chemical engineering processes are continuous processes sampled with discrete measurements. Therefore, for this work we choose the discrete stochastic system model

$$x_{k+1} = F(x_k, u_k) + G(x_k, u_k)\, w_k \tag{12.1a}$$

$$y_k = h(x_k) + v_k \tag{12.1b}$$

in which

• xk is the state of the system at time tk,

• uk is the system input at time tk (assumes a zero order hold over the interval [tk, tk+1)),

• $w_k$ is a $N(0, Q_k)$ noise ($N(m, P)$ denotes a normal distribution with mean $m$ and covariance $P$),

• $F(x_k, u_k)$ is the solution to a first-principles differential equation model,

• G(xk, uk) is a full column rank matrix (this condition is required for uniqueness of the a posteriori distribution defined in section 12.5),

• yk is the system measurement at time tk,

• hk is a (possibly) nonlinear function of xk at time tk, and

• vk is a N (0, Rk) noise.

We believe that by appropriately choosing both a first principles model and a noise structure, we can identify both the model parameters (or a reduced set of these parameters) and the state and measurement noise covariance structures. Such identification could proceed as follows:

1. Assuming a noise structure, identify the model parameters.

2. Assuming a model, model parameters, and a noise structure, identify the covariance structures.

Here, we propose performing replicate experiments and measurements to estimate moments of the desired quantity (in general, the mean of the state or covariance structure), then fitting the model parameters by comparing the estimated moments to those reconstructed from Monte Carlo simulation of equation (12.1). This identification procedure is an area of current research beyond the scope of this chapter, but we maintain that such a procedure yields a rough, potentially biased, yet useful stochastic model from the system measurements. As discussed in the introduction, state estimators given multimodal a posteriori distributions should solve the problem

$$x_T^+ = \arg \max_{x_T} P(x_T | y_0, \ldots, y_T) \tag{12.2}$$

Here, we assume that the input sequence $u_0, \ldots, u_T$ is known exactly. Equation (12.2) is referred to as the maximum a posteriori estimate. In the special case that the system is not constrained and in equation (12.1)

1. F (xk, uk) is linear with respect to xk,

2. h(xk) is linear with respect to xk, and

3. $G(x_k, u_k)$ is a constant matrix,

the maximum a posteriori estimator is the Kalman filter, whose well-known recursive form is conducive to online implementation. For the more general formulation given by equation (12.1), online solution of the exact maximum a posteriori estimate is impractical, and approximations are used to obtain state estimates in real time.

12.2 Nonlinear Observability

The determination of observability for nonlinear systems such as equation (12.1) is substantially more difficult than for linear systems. For linear systems, either one state is the optimal estimate, or infinitely many states are optimal estimates, in which case the system is unobservable. Nonlinear systems have the additional complication that finitely many states may be locally optimal estimates. Definitions of nonlinear observability should account for such a condition. Concepts such as output-to-state stability [159] offer promise for a rigorous mathematical definition of nonlinear observability, but currently no easily implemented tests for such determination exist. In lay terms, such a definition for deterministic models should roughly correspond to "for the given model and measurements, if the measurement data are close, the initial conditions generating the measurements are close."

One approximate method of checking nonlinear observability is to examine the time-varying Gramian [21]. This test actually establishes the observability criterion for linear, time-varying systems. By approximating nonlinear systems as linear time-varying systems, we can obtain a rough estimate of the degree of observability for the system by checking the condition number of the time-varying Gramian. In general, ill-conditioned Gramians indicate poor observability because different initial conditions can reconstruct the data arbitrarily closely [93].
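A minimal sketch of this Gramian check follows. The linearizations $A_k$, $C_k$ would come from equations (12.4)-(12.5) evaluated along a trajectory; the toy sequences used here are illustrative assumptions.

```python
import numpy as np

def observability_gramian(A_seq, C_seq):
    """W = sum_k Phi_k^T C_k^T C_k Phi_k, where Phi_k is the state-transition
    matrix from time 0 to time k (Phi_0 = I)."""
    n = A_seq[0].shape[0]
    W = np.zeros((n, n))
    Phi = np.eye(n)
    for A, C in zip(A_seq, C_seq):
        W += Phi.T @ C.T @ C @ Phi
        Phi = A @ Phi
    return W

A_seq = [np.array([[1.0, 0.1], [0.0, 1.0]])] * 10   # toy linearizations
C_seq = [np.array([[1.0, 0.0]])] * 10               # measure first state only
print(np.linalg.cond(observability_gramian(A_seq, C_seq)))
# a large condition number indicates poor observability
```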

12.3 Extended Kalman Filtering

The extended Kalman filter is one approximation for calculating equation (12.2). The EKF linearizes nonlinear systems, then applies the Kalman filter (the optimal, unconstrained, linear state estimator) to obtain the state estimates. The tacit approximation here is that the process statistics are multivariate normal distributions. We summarize the algorithm for implementing the EKF presented by Stengel [144], employing the following notation:

• E[α] denotes the expectation of α,

• Ak denotes the value of the function A at time tk,

• xk|l refers to the value of x at time tk given measurements up to time tl,

• xˆ denotes the estimate of x, and

• x¯0 denotes the a priori estimate of x0, that is, the estimate of x0 with knowledge of no measurements.

The assumed prior knowledge is identical to that of the Kalman filter:

\begin{align}
&\bar{x}_0 \quad \text{given} \tag{12.3a}\\
&P_0 = E\left[ (x_0 - \bar{x}_0)(x_0 - \bar{x}_0)^T \right] \tag{12.3b}\\
&R_k = E\left[ v_k v_k^T \right] \tag{12.3c}\\
&Q_k = E\left[ w_k w_k^T \right] \tag{12.3d}
\end{align}

The inputs uk are also assumed to be known. The approximation uses the following linearized portions of equation (12.1)

\[
A_k = \left. \frac{\partial F(x, u)}{\partial x^T} \right|_{x = x_k,\, u = u_k} \tag{12.4}
\]

\[
C_k = \left. \frac{\partial h(x)}{\partial x^T} \right|_{x = x_k} \tag{12.5}
\]
to implement the following algorithm:

1. At each measurement time, compute the filter gain L and update the state estimate and covariance matrix:

\[
L_k = P_{k|k-1} C_k^T \left[ C_k P_{k|k-1} C_k^T + R_k \right]^{-1} \tag{12.6}
\]

\[
\hat{x}_{k|k} = \hat{x}_{k|k-1} + L_k \left( y_k - h(\hat{x}_{k|k-1}) \right) \tag{12.7}
\]

\[
P_{k|k} = P_{k|k-1} - L_k C_k P_{k|k-1} \tag{12.8}
\]

2. Propagate the state estimate and covariance matrix to the next measurement time via the equations:

\begin{align}
\hat{x}_{k+1|k} &= F(\hat{x}_{k|k}, u_k) \tag{12.9}\\
P_{k+1|k} &= A_k P_{k|k} A_k^T + G_k Q_k G_k^T \tag{12.10}
\end{align}

3. Let k ← k + 1. Return to step 1.
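A minimal sketch of one cycle of this algorithm follows (Python with NumPy; the callables F, h, A_jac, C_jac, and G are hypothetical user-supplied stand-ins for the model, the measurement function, and their linearizations, and are not part of the thesis):

import numpy as np

def ekf_step(xhat, P, y, u, F, h, A_jac, C_jac, G, Q, R):
    """One EKF cycle: measurement update (12.6)-(12.8), then
    propagation (12.9)-(12.10)."""
    # Measurement update
    C = C_jac(xhat)
    L = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)   # filter gain (12.6)
    xhat = xhat + L @ (y - h(xhat))                # state update (12.7)
    P = P - L @ C @ P                              # covariance update (12.8)
    # Propagation to the next measurement time
    A, Gk = A_jac(xhat, u), G(xhat, u)
    return F(xhat, u), A @ P @ A.T + Gk @ Q @ Gk.T   # (12.9), (12.10)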

Until recently, few properties regarding the stability and convergence of the EKF have been proven. Recent publications present bounded estimation error and exponential convergence arguments for the continuous and discrete EKF forms given detectability, small initial estimation error, small noise terms, and perfect correspondence between the plant and the model [123, 124, 125]. However, depending on the system, the bounds on initial estimation error and noise terms may be unreasonably small. Also, initial estimation error may result in bounded estimate error but not exponential convergence, as illustrated by Chaves and Sontag [20].

12.4 Monte Carlo Filters

The basic idea of Monte Carlo filters is to use simulations of the stochastic process to reconstruct the state estimates. In general, we can reconstruct functions of the underlying probability distribution by sampling from this distribution, then averaging the resulting properties. In the limit as the number of samples approaches infinity, we obtain the equivalence

\[
\int h(x) P(x)\, dx = \lim_{N_s \to \infty} \frac{1}{N_s} \sum_{j=1}^{N_s} h(x^j) \tag{12.11}
\]
in which $x^j$ is the $j$th realization of $x$. Monte Carlo filters approximate the left-hand side of equation (12.11) by evaluating the right-hand side of the same equation with a finite number of samples. For example, most Monte Carlo filters propose estimation of the mean

\[
E[x] = \int x\, P(x)\, dx \approx \frac{1}{N_s} \sum_{j=1}^{N_s} x^j
\]

The primary benefits of using this type of filter are

• their relative simplicity (the most demanding requirement is integration of the model (12.1)), and

• the ability to use any combination of model and random noise in a straightforward manner.

Examples of Monte Carlo techniques include

• Rejection sampling [12]. Here, one draws samples from an initial distribution, propagates this sample to the next measurement time via equation (12.1a), then either accepts or rejects the integrated state based upon the statistics of the measurement noise in equation (12.1b). This process is repeated until one generates the desired number of accepted samples.

• Particle methods [22]. For this method, one randomly distributes a set number of initial "particles", i.e. states. Each of these states is propagated to the next measurement time

via equation (12.1a), then a weight $q_j$ is assigned to each state according to the distribution of the measurement noise $v_k$ in equation (12.1b). For this case, functions of the underlying probability distribution are evaluated according to

\[
\int h(x) P(x)\, dx \approx \frac{\sum_{j=1}^{N_s} q_j\, h(x^j)}{\sum_{j=1}^{N_s} q_j} \tag{12.12}
\]
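A minimal sketch of this weighted estimate follows (Python; the propagate and h callables are hypothetical stand-ins for integrating equation (12.1a) and evaluating equation (12.1b), and Gaussian measurement noise is assumed):

import numpy as np

def particle_estimate(particles, y, propagate, h, R):
    """Propagate particles through the state equation, weight by the
    Gaussian measurement density, and return the weighted mean of
    equation (12.12) with h(x) = x."""
    X = np.array([propagate(x) for x in particles])
    resid = np.atleast_2d(y) - np.array([np.atleast_1d(h(x)) for x in X])
    Rinv = np.linalg.inv(np.atleast_2d(R))
    # unnormalized Gaussian weights q_j
    q = np.exp(-0.5 * np.einsum('ij,jk,ik->i', resid, Rinv, resid))
    return (q[:, None] * X).sum(axis=0) / q.sum()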

Spall [142] presents a nice overview of other Monte Carlo methods. Most Monte Carlo filters propose estimation of the mean as opposed to the mode. For cases such as the bimodal distribution given in Figure 12.1(b), however, one may prefer the mode point estimate. In this case, a Monte Carlo filter must first estimate the entire probability density, then maximize this estimated density to calculate the mode. We consider the task of density estimation next.

Density estimation is well known in the field of statistics. The information presented here merely summarizes some of the information presented by Silverman [140]. The basic idea is that one can use samples of an underlying distribution to approximately reconstruct this distribution. Kernel methods are perhaps the most popular manner of performing this reconstruction. This technique proceeds roughly as follows:

1. Draw samples from the underlying distribution.

2. Apply a “kernel” density at each sample.

3. Sum the kernel densities to approximate the underlying distribution.

Mathematically, the approximate distribution is then

\[
\bar{f}(x) = \frac{1}{N_s h} \sum_{j=1}^{N_s} K\!\left( \frac{x - X^j}{h} \right) \tag{12.13}
\]
in which

• $\bar{f}(x)$ is the reconstructed distribution, a function of x;

• Ns is the number of samples;

• h is the window width, also called the smoothing parameter or bandwidth;

• K is the kernel; and

• Xj is the jth sample of x.

Each kernel obeys the properties of a probability density function and is usually symmetric. Silverman [140] gives greater detail on selection of the kernel K and the window width h. As an illustrative example, we employ the kernel method to estimate the density of samples drawn from a normal distribution. Figure 12.2 demonstrates the three steps of this procedure and presents the resulting reconstructed density. This density approximates the underlying normal distribution well given a relatively small sample size of ten. In contrast, more naïve methods of estimating the density, such as histograms, generate substantially worse estimates, as seen in Figure 12.3.
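A minimal sketch of the estimator (12.13) with a Gaussian kernel (an assumed choice; the thesis does not fix a kernel here) is:

import numpy as np

def kde(x_grid, samples, h):
    """Kernel density estimate (12.13) with a standard normal kernel;
    h is the window width (bandwidth)."""
    u = (x_grid[:, None] - samples[None, :]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel
    return K.sum(axis=1) / (len(samples) * h)

Evaluating kde on a grid for ten samples drawn from a standard normal roughly reproduces the reconstruction shown in Figure 12.2.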


Figure 12.2: Example of using the kernel method to estimate the density of samples drawn from a normal distribution.


Figure 12.3: Example of using a histogram to estimate the density of samples drawn from a normal distribution.

The primary drawback of density estimation is its "curse of dimensionality". Table 12.1 presents the number of samples required to approximate, to a given relative error, an underlying standard multivariate normal distribution at the origin (a single point).

Dimensionality    Required Sample Size
1                 4
2                 19
3                 67
4                 223
5                 768
6                 2790
7                 10700
8                 43700
9                 187000
10                842000

Table 12.1: Sample size required to ensure that the relative mean square error at zero (a single point) is less than 0.1. The underlying distribution is a standard multivariate normal density.

The number of samples increases exponentially with the dimensionality. Since the computational expense of Monte Carlo methods scales with the number of samples, we expect density estimation to be applicable for systems with smaller dimensions. Another drawback results if one is interested in obtaining the maximum mode of the reconstructed density. If the estimated density has multiple modes, then one requires a global optimizer to find the desired mode. However, we expect that local optimization may prove effective in calculating a global optimum since the samples of the underlying distribution should provide excellent initial guesses.

12.5 Moving-Horizon Estimation

One alternative to solving the maximum a posteriori estimate is to maximize a joint probability for a trajectory of state values, i.e.,

\[
\{x_0^*, \ldots, x_T^*\} = \arg\max_{x_0, \ldots, x_T} P(x_0, \ldots, x_T \mid y_0, \ldots, y_T) \tag{12.14}
\]

Equation (12.14) is the full information estimate. The computational burden of calculating this estimate increases as more measurements come online. To bound this burden, one can fix the estimation horizon:

\[
\{x_{T-N+1}^*, \ldots, x_T^*\} = \arg\max_{x_{T-N+1}, \ldots, x_T} P(x_{T-N+1}, \ldots, x_T \mid y_0, \ldots, y_T) \tag{12.15}
\]

Moving-horizon estimation, or MHE, corresponds probabilistically to equation (12.15), and is equivalent numerically to a constrained, nonlinear optimization problem [128, 114]. We note that the restrictive assumptions of normally distributed noises and the model given by equation (12.1) are not required by MHE. If the matrix G in equation (12.1) is not a function of the state x_k, then these assumptions merely lead to a convenient least-squares optimization, as demonstrated by Jazwinski [71].

From a theoretical perspective, Tyler and Morari examine the feasibility of constrained MHE for linear, state-space models [152]. Rao et al. show that constrained MHE is an asymptotically stable observer in a nonlinear deterministic modeling framework [112, 116]. These works also provide a nice overview of current MHE research. Furthermore, recent advances in numerical computation have allowed real-time implementation of MHE strategies for the local optimization of the MHE problem [147, 148]. How to incorporate the effect of past data outside the current estimation horizon (also known as the arrival cost), though, remains an open issue of MHE.

Rao, Rawlings, and Lee [115] explore estimating this cost for constrained linear systems with the corresponding cost for an unconstrained linear system. More specifically, the following two schemes are examined:

1. a “filtering” scheme that penalizes deviations of the initial estimate in the horizon from an a priori estimate, and

2. a “smoothing” scheme that penalizes deviations of the trajectory of states in the estima- tion horizon from an a priori estimate.

For unconstrained, linear systems, the MHE optimization collapses to the Kalman filter for both of these schemes. Rao [112] further considers several optimal and suboptimal approaches for estimating the arrival cost via a series of optimizations. These approaches stem from the property that, in a deterministic setting (no state or measurement noise), MHE is an asymptotically stable observer as long as the approximate arrival cost underbounds the true arrival cost. One simple way of estimating the arrival cost, therefore, is to implement a uniform prior. Computationally, a uniform prior corresponds to not penalizing deviations of the initial state from the a priori estimate.

For nonlinear systems, Tenny and Rawlings [148] estimate the arrival cost by approximating the constrained, nonlinear system as an unconstrained, linear time-varying system and applying the corresponding filtering and smoothing schemes. They conclude that the smoothing scheme is superior to the filtering scheme because the filtering scheme induces oscillations in the state estimates due to unnecessary propagation of initial error. Here, the tacit assumption is that the probability distribution around the optimal estimate is a multivariate normal. The problem with this assumption is that nonlinear systems may exhibit multiple peaks (i.e. local optima) in this probability distribution. Haseltine and Rawlings [58] demonstrate that approximating the arrival cost with the smoothing scheme in the presence of multiple local optima may skew all future estimates. They conjecture that if global optimization is implementable in real time, approximating the arrival cost with a uniform prior and making the estimation horizon reasonably long is preferable to an approximate multivariate normal arrival cost because of the latter's biasing effect on the state estimates.

We now seek to demonstrate by simulation examples that MHE is a useful and practical tool for state estimation of chemical process systems.

We examine the performance of MHE with local optimization and an arrival cost approximated with a "smoothing" update. For further details regarding this MHE scheme, we refer the interested reader to Tenny and Rawlings [148] and note that this code is freely available as part of the NMPC toolbox (http://www.che.wisc.edu/~tenny/nmpc/). Currently, this particular MHE configuration represents a computationally feasible implementation for an industrial setting.

12.6 Example 1

Consider the gas-phase, reversible reaction

\[
2A \xrightarrow{\ \bar{k}\ } B, \qquad \bar{k} = 0.16 \tag{12.16}
\]
with stoichiometric matrix
\[
\nu = \begin{bmatrix} -2 & 1 \end{bmatrix} \tag{12.17}
\]
and reaction rate
\[
r = \bar{k} P_A^2 \tag{12.18}
\]
We define the state and measurement to be
\[
x = \begin{bmatrix} P_A \\ P_B \end{bmatrix}, \qquad y_k = \begin{bmatrix} 1 & 1 \end{bmatrix} x_k \tag{12.19}
\]
where $P_j$ denotes the partial pressure of species j. We assume that the ideal gas law holds (high temperature, low pressure), and that the reaction occurs in a well-mixed, isothermal batch reactor. From first principles, the model for this system is
\[
\dot{x} = f(x) = \nu^T r, \qquad x_0 = \begin{bmatrix} 3 & 1 \end{bmatrix}^T \tag{12.20}
\]

For state estimation, consider the following parameters:
\begin{align}
&\Delta t = t_{k+1} - t_k = 0.1, \quad \Pi_0 = \mathrm{diag}(6^2, 6^2), \quad G_k = \mathrm{diag}(1, 1), \notag\\
&Q_k = \mathrm{diag}(0.001^2, 0.001^2), \quad R_k = 0.1^2, \quad \bar{x}_0 = \begin{bmatrix} 0.1 & 4.5 \end{bmatrix}^T \tag{12.21}
\end{align}

Note that the initial guess, x¯0, is poor. The actual plant experiences N (0, Qk) noise in the state and N (0, Rk) noise in the measurements. We now examine the estimation performance of both the EKF and MHE for this system.
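As a concrete illustration, the following minimal Python sketch (not part of the thesis) simulates this plant using the exact discrete-time map given later as equation (12.22); the random seed is an arbitrary choice:

import numpy as np

kbar, dt = 0.16, 0.1
Q = np.diag([0.001**2, 0.001**2])   # state noise covariance Q_k
rng = np.random.default_rng(0)      # arbitrary seed

def plant_step(x):
    """Exact discrete map for 2A -> B plus N(0, Q) state noise."""
    pa, pb = x
    d = 2.0 * kbar * dt * pa + 1.0
    xnext = np.array([pa / d, pb + kbar * dt * pa**2 / d])
    return xnext + rng.multivariate_normal(np.zeros(2), Q)

x = np.array([3.0, 1.0])            # true initial condition x_0
ys = []
for _ in range(100):
    ys.append(float(x.sum()) + rng.normal(0.0, 0.1))   # y_k = P_A + P_B + v_k
    x = plant_step(x)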

12.6.1 Comparison of Results

Figure 12.4 demonstrates that the EKF converges to incorrect estimates of the state (the partial pressures). In addition, the EKF estimates that the partial pressures are negative, which is physically unrealizable. To explain why this phenomenon occurs, we examine the probability density $P(x_k \mid y_0, \ldots, y_k)$. Recall that the goal of the maximum likelihood estimator is to determine the state $x_k$ that maximizes this probability density. Since we know the statistics of the system, we can calculate this density by successively


Figure 12.4: Extended Kalman filter results: (a) evolution of the actual (solid line) and EKF updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and EKF updated (dashed line) pressure estimates.

1. using the discretized version of the nonlinear model

\[
x_{k+1} = F(x_k, w_k) = \begin{bmatrix} \dfrac{x_{k,1}}{2\bar{k}\Delta t\, x_{k,1} + 1} \\[2ex] x_{k,2} + \dfrac{\bar{k}\Delta t\, x_{k,1}^2}{2\bar{k}\Delta t\, x_{k,1} + 1} \end{bmatrix} + w_k \tag{12.22}
\]

to propagate the probability density from $P(x_k \mid y_0, \ldots, y_k)$ to $P(x_{k+1} \mid y_0, \ldots, y_k)$ via
\[
P(x_{k+1}, w_k \mid y_0, \ldots, y_k) = P(x_k \mid y_0, \ldots, y_k)\, P(w_k) \left| \det \begin{bmatrix} \dfrac{\partial F(x_k, w_k)}{\partial x_k^T} & \dfrac{\partial F(x_k, w_k)}{\partial w_k^T} \\[1ex] \dfrac{\partial w_k}{\partial x_k^T} & \dfrac{\partial w_k}{\partial w_k^T} \end{bmatrix} \right|^{-1} \tag{12.23}
\]
and then


Figure 12.5: Contours of P (x1|y0, y1)

2. using measurements to update P (xk|y0,..., yk−1) to P (xk|y0,..., yk)

\[
P(x_k \mid y_0, \ldots, y_k) = \frac{P(x_k \mid y_0, \ldots, y_{k-1})\, p_{v_k}(y_k - C x_k)}{\int_{-\infty}^{\infty} P(x_k \mid y_0, \ldots, y_{k-1})\, p_{v_k}(y_k - C x_k)\, dx_k} \tag{12.24}
\]

Therefore, the expression for the probability density we are interested in is
\[
P(x_k \mid y_0, \ldots, y_k) = \frac{\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \Omega_k\, dw_0 \cdots dw_{k-1}}{\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \Omega_k\, dw_0 \cdots dw_{k-1}\, dx_k} \tag{12.25}
\]
in which

\[
\Omega_k = \left[ \prod_{j=0}^{k-1} \left( 2\bar{k}\Delta t\, x_{j,1} + 1 \right)^2 \right] \exp\left( -\frac{1}{2} \left[ (x_0 - \bar{x}_0)^T \Pi_0^{-1} (x_0 - \bar{x}_0) + \sum_{j=0}^{k} v_j^T R_j^{-1} v_j + \sum_{j=0}^{k-1} w_j^T Q_j^{-1} w_j \right] \right) \tag{12.26}
\]
We can numerically evaluate equation (12.25) using the integration package Bayespack [42].
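For low-dimensional problems, a simpler numerical alternative is to evaluate the measurement update (12.24) directly on a grid. A minimal sketch for a scalar state with a linear measurement and Gaussian noise (assumptions made purely for illustration) is:

import numpy as np

def bayes_update(prior, x_grid, y, C, R):
    """Grid-based version of the update (12.24): multiply the prior by
    the Gaussian measurement likelihood and renormalize."""
    lik = np.exp(-0.5 * (y - C * x_grid)**2 / R)   # p_v(y - C x)
    post = prior * lik
    dx = x_grid[1] - x_grid[0]                     # uniform grid assumed
    return post / (post.sum() * dx)                # renormalize to a density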

Figure 12.5 presents a contour plot of the results for $P(x_1 \mid y_0, y_1)$ with transformed axes
\[
t = \sqrt{2} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}^{-1} x
\]

This plot clearly illustrates the formation of two peaks in the probability density. However, only one of these peaks corresponds to a region where both the partial pressures for species A and B are positive.


Figure 12.6: Clipped extended Kalman filter results: (a) evolution of the actual (solid line) and clipped EKF updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and clipped EKF updated (dashed line) pressure estimates.

The real problem is that the process prohibits negative partial pressures, whereas unconstrained estimators permit updating of the state to regions where partial pressures may be negative. Since the EKF falls into the unconstrained estimator category with a local optimization (at best), the estimation behavior in Figure 12.4 is best explained as a poor initial guess leading to an errant region of attraction. One method of preventing negative estimates for the partial pressure is to "clip" the EKF estimates. In this strategy, partial pressures rendered negative by the filter update are zeroed. As seen in Figure 12.6, this procedure results in an improved estimate in that the EKF eventually converges to the true state, but estimation during the initial dynamic response is poor. Also, only the estimates are "clipped", not the covariance matrix. Thus the accuracy of the approximate covariance matrix is now questionable.


Figure 12.7: Moving-horizon estimation results, states constrained to x ≥ 0, smoothing initial covariance update, and horizon length of one time unit (N = 11 measurements): (a) evolution of the actual (solid line) and MHE updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and MHE updated (dashed line) pressure estimates.

Alternatively, we can optimally constrain the partial pressures by applying MHE. Figure 12.7 presents the MHE results for a horizon length of one time unit (N = 11 measurements). These results indicate significant improvement over those of either the EKF or the clipped EKF. To explore further the differences between the full information and maximum likelihood estimates, we examine contour plots of the projection

\[
\max_{x_0, \ldots, x_{k-1}} P(x_0, \ldots, x_k \mid y_0, \ldots, y_k) \tag{12.27}
\]
noting again the equivalence between this probability and the full information cost function $\Phi_k$ given by equation (12.14). Figure 12.8 confirms our previous assertion that the full information and maximum likelihood estimates are not equivalent for nonlinear systems.


Figure 12.8: Contours of $\max_{x_0} P(x_1, x_0 \mid y_0, y_1)$.


Figure 12.9: A posteriori density $P(x_1 \mid y_0, y_1)$ calculated using a Monte Carlo filter with density estimation.

In fact, the global optima are even different. However, the full information formulation retains the dominant characteristic of the maximum likelihood estimate, namely the formation of two local optima. Finally, we consider using the rejection sampling technique outlined by Bølviken et al. [12] for the Monte Carlo filter, and reconstruct the a posteriori density $P(x_1 \mid y_0, y_1)$ using one hundred accepted samples. Figure 12.9 presents these results. The actual distribution, Figure 12.5, is bimodal with the maximum mode placed in the region where $P_A > 0$ and $P_B > 0$. The Monte Carlo reconstruction is unimodal, and the single mode does not overlap in the same transformed coordinate $t_1$ space as the actual maximum.


Figure 12.10: Contours of P (x4|y0,..., y4).

These results indicate that Monte Carlo methods do not provide very accurate estimation of the mode in the presence of multiple modes for models with small state noise. The primary sources of error are the finite number of samples associated with the Monte Carlo approximation, and the error induced by the density estimation approximation.

12.6.2 Evaluation of Arrival Cost Strategies

The next logical question is: does MHE retain the same properties as the maximum likelihood estimate? The short answer is: it depends on what approximation one chooses for the arrival cost. Figures 12.10 through 12.12 compare contours of the maximum likelihood estimate, unconstrained MHE with a smoothing update, and unconstrained MHE with a uniform prior, respectively, given five measurements. Figure 12.11 shows that the smoothing update biases the contours of the state estimate so much that the estimator no longer predicts multiple optima. This biasing occurs because the update has "smoothed" the estimate around only one of the optima in the estimator. Using MHE with a uniform prior, on the other hand, retains the property of multiple optima in the estimator as seen in Figure 12.12. Increasing the number of measurements in the estimation horizon can overcome the biasing of the smoothing update. Figure 12.13 shows the eventual reemergence of multiple optima in the estimator upon increasing the estimation horizon from four (i.e. Figure 12.11) to ten.


Figure 12.11: Contours of $\max_{x_1, \ldots, x_3} P(x_1, \ldots, x_4 \mid y_0, \ldots, y_4)$ with the arrival cost approximated using the smoothing update.

However, the optima are still heavily biased by the smoothing update. We speculate that any approximation of the arrival cost using the assumption that the process is a time-varying linear system may lead to substantial biasing of the estimator. A short estimation horizon further compounds such biasing because the information contained in the data can no longer overcome the prior information (i.e. the arrival cost). This situation is analogous to cases in Bayesian inference when the prior dominates and distorts the information contained in the data [14]. We expect the EKF to demonstrate similar biasing since it is essentially a suboptimal MHE with a short estimation horizon and an arrival cost approximated by a filtering update. For such approximations to work well, one must have a system that does not exhibit multiple local optima in the probability distribution.

The optimization strategy further obfuscates the issue of whether or not to approximate the arrival cost via linearization (e.g. the smoothing and filtering updates). Ideally, one would implement a global optimizer so that MHE could then distinguish between local optima. With global optimization, approximating the arrival cost with a uniform prior and making the estimation horizon reasonably long is preferable to approximating the arrival cost as a multivariate normal because of the observed biasing effect. Currently, though, only local optimization strategies can provide the computational performance required to perform the MHE calculation in real time. For this case, it may be preferable to use a linear approximation of the arrival cost and then judiciously apply constraints to prevent multiple optima in the estimator.


Figure 12.12: Contours of $\max_{x_1, \ldots, x_3} P(x_1, \ldots, x_4 \mid y_0, \ldots, y_4)$ with the arrival cost approximated as a uniform prior.

The examples considered next examine the estimator performance of this type of MHE.

12.7 EKF Failure

In this section, we outline the conditions that generate EKF failure in two classes of chemical reactors. We then present several examples that demonstrate failure of the EKF as an estimator. If there is no plant-model mismatch, measurement noise, or state noise, one definition of estimator failure is

\[
\lim_{k \to \infty} \left| \hat{x}_{k|k} - x_k \right| > \epsilon \tag{12.28}
\]
for some $\epsilon > 0$ (where $|x|$ denotes a norm of $x$). That is, the estimator is unable to reconstruct the true state no matter how many measurements it processes. For stable systems, i.e. those systems tending to a steady state, we expect that

\[
\hat{x}_{k|k} = \hat{x}_{k-1|k-1} \tag{12.29}
\]
in the same limit as equation (12.28). We now examine the discrete EKF given such conditions.


Figure 12.13: Contours of $\max_{x_1, \ldots, x_9} P(x_1, \ldots, x_{10} \mid y_0, \ldots, y_{10})$ with the arrival cost approximated using the smoothing update.

The following equations govern the propagation and update steps [144]:

\begin{align}
\hat{x}_{k|k-1} &= F(\hat{x}_{k-1|k-1}, u_{k-1}, w_{k-1}) \tag{12.30a}\\
P_{k|k-1} &= A_{k-1} P_{k-1|k-1} A_{k-1}^T + G_{k-1} Q_{k-1} G_{k-1}^T \tag{12.30b}\\
\hat{x}_{k|k} &= \hat{x}_{k|k-1} + L_k \left( y_k - h(\hat{x}_{k|k-1}) \right) \tag{12.30c}\\
P_{k|k} &= P_{k|k-1} - L_k C_k P_{k|k-1} \tag{12.30d}\\
L_k &= P_{k|k-1} C_k^T \left[ C_k P_{k|k-1} C_k^T + R_k \right]^{-1} \tag{12.30e}
\end{align}
in which
\begin{align}
A_k &= \frac{\partial F(x_k, u_k, w_k)}{\partial x_k^T} \tag{12.31a}\\
G_k &= \frac{\partial F(x_k, u_k, w_k)}{\partial w_k^T} \tag{12.31b}\\
C_k &= \frac{\partial h(x_k)}{\partial x_k^T} \tag{12.31c}
\end{align}

At steady state, the following equalities hold:

\begin{align}
\hat{x}_{k|k} &= \hat{x}_{k-1|k-1} \tag{12.32a}\\
P_{k|k} &= P_{k-1|k-1} \tag{12.32b}
\end{align}

Combining expressions (12.30) and (12.32) yields:

\begin{align}
0 &= F(\hat{x}_{k-1|k-1}, u_{k-1}) - \hat{x}_{k|k-1} \tag{12.33a}\\
0 &= A_{k-1} P_{k-1|k-1} A_{k-1}^T + G_{k-1} Q_{k-1} G_{k-1}^T - P_{k|k-1} \tag{12.33b}\\
0 &= \hat{x}_{k|k-1} + L_k \left( y_k - h(\hat{x}_{k|k-1}) \right) - \hat{x}_{k-1|k-1} \tag{12.33c}\\
0 &= P_{k|k-1} - L_k C_k P_{k|k-1} - P_{k-1|k-1} \tag{12.33d}\\
L_k &= P_{k|k-1} C_k^T \left[ C_k P_{k|k-1} C_k^T + R_k \right]^{-1} \tag{12.33e}
\end{align}

If both equations (12.28) and (12.33) hold, then the EKF has failed as an estimator. One solution to equation (12.33) results when multiple steady states satisfy the steady-state measurement. This phenomenon corresponds to the case that

\begin{align}
\hat{x}_{k|k} &= \hat{x}_{k|k-1} = \hat{x}_{k-1|k-1} \tag{12.34}\\
y_k &= h(\hat{x}_{k|k-1}) \tag{12.35}\\
\hat{x}_{k|k} &\neq x_k \tag{12.36}
\end{align}

We would expect the EKF to fail when

1. the system model and measurement are such that multiple states satisfy the steady-state measurement, and

2. the estimator is given a poor initial guess of the state.

Condition 1 does not imply that the system is unobservable; rather, this condition states that the state cannot be uniquely determined from solely the steady-state measurement. For such a case to be observable, the process dynamics must make the system observable. Condition 2 implies that the poor initial guess skews the estimates (the $\hat{x}_{k|k}$'s) toward a region of attraction not corresponding to the actual states (the $x_k$'s).
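One way to check these failure conditions numerically is to iterate the noise-free EKF recursion to convergence from the poor initial guess and compare the resulting fixed point of equation (12.33) with the plant steady state. The following minimal Python sketch does so for the batch system of Example 2 below; the explicit Euler discretization and finite-difference Jacobian are simplifying assumptions not used in the thesis, so the converged values only approximate Table 12.2:

import numpy as np

k1, k2, k3, k4 = 0.5, 0.05, 0.2, 0.01    # rate constants of Example 2
RT, dt = 32.84, 0.25
nu = np.array([[-1.0, 1.0, 1.0], [0.0, -2.0, 1.0]])
C = np.array([[RT, RT, RT]])
Q, Rk = np.diag([0.001**2] * 3), np.array([[0.25**2]])

def F(x):
    """Explicit Euler step of the batch model (an assumed stand-in
    for the thesis's integrator)."""
    r = np.array([k1*x[0] - k2*x[1]*x[2], k3*x[1]**2 - k4*x[2]])
    return x + dt * (nu.T @ r)

def A_jac(x, eps=1e-7):
    """Finite-difference Jacobian of F."""
    n = x.size
    return np.column_stack([(F(x + eps*np.eye(n)[:, i]) - F(x)) / eps
                            for i in range(n)])

x_true = np.array([0.5, 0.05, 0.0])
for _ in range(5000):                     # run the plant to steady state
    x_true = F(x_true)
y_ss = float(C @ x_true)

xhat, P = np.array([0.0, 0.0, 4.0]), np.diag([0.5**2] * 3)
for _ in range(5000):                     # noise-free EKF recursion (12.30)
    L = P @ C.T @ np.linalg.inv(C @ P @ C.T + Rk)
    xupd = xhat + L[:, 0] * (y_ss - float(C @ xhat))
    P = P - L @ C @ P
    A = A_jac(xupd)
    xhat, P = F(xupd), A @ P @ A.T + Q
print(xupd, x_true)   # may converge to an unphysical fixed point; cf. Table 12.2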

12.7.1 Chemical Reaction Systems

For well-mixed systems consisting of reaction networks, the nonlinearity of the system must be present at steady state so that multiple steady states can satisfy the steady-state measurement. Consequently, we must analyze the structure of the stoichiometric matrix in combination with the number (and type) of measurements to determine whether or not multiple steady states can satisfy the steady-state measurement. Define:

• ν, the stoichiometric matrix of size r × s, in which r is the number of reactions and s is the number of species;

• ρ, the rank of ν (ρ = r if there are no linearly dependent reactions);

• η, the nullity of ν;

• n, the number of measurements; and 236

• $n_m$, the number of measurements that can be written as a linear combination of states (e.g. $y = x_1 + x_2$, but not $(x_1 + x_2)\, y = x_1$).

For batch reactors, conservation laws yield a model of the form

\[
\frac{d}{dt}\left( x V_R \right) = \nu^T r(x)\, V_R \tag{12.37}
\]
in which

• x is an s-vector containing the concentration of each species in the reactor,

• VR is the volume of the reactor, and

• r(x) is an r-vector containing the reaction rates.

For this system ρ specifies the number of independent equations at equilibrium. In general, we require that

1. all reactions are reversible

2. the following inequalities hold:
\[
\underbrace{n_m + \eta}_{\substack{\text{number of}\\ \text{"linear" equations}}} \;<\; \underbrace{s}_{\substack{\text{number of}\\ \text{estimated species}}} \;\leq\; \underbrace{n + \rho}_{\substack{\text{number of}\\ \text{independent equations}}}
\]

Note that the batch reactor preserves the nonlinearity of the reaction rates in the steady-state calculation. Also, the combination of batch steady-state equations and measurements may or may not be an over-specified problem. For continuously stirred tank reactors (CSTRs), conservation laws yield a model of the form
\[
\frac{d}{dt}\left( x V_R \right) = Q_f c_f - Q_o x + \nu^T r(x)\, V_R \tag{12.38}
\]
where

• x is an s-vector containing the concentration of each species in the reactor,

• VR is the volume of the reactor,

• Qf is the volumetric flow rate into the reactor,

• cf is an s-vector containing the inlet concentrations of each species,

• Qo is the effluent volumetric flow rate, and

• r(x) is an r-vector containing the reaction rates.

Here η specifies the number of linear algebraic relationships among the s species at equilibrium because the null space represents linear combinations of the material balances that eliminate nonlinear reaction rates. We require

\[
\underbrace{n_m + \eta}_{\substack{\text{number of}\\ \text{"linear" equations}}} \;<\; \underbrace{s}_{\substack{\text{number of}\\ \text{estimated species}}} \tag{12.39}
\]

If equation (12.39) is an equality instead of an inequality, then determination of the steady state is generally a well-defined, linear problem with a unique solution. Note that the left-hand side of equation (12.39) is actually an upper bound, since we could potentially choose a measurement contained within the span of the null space (a linear combination of the null vectors). However, such measurements would be invariant and hence would give no dynamic information. Also, equation (12.39) does not imply that multiple steady states can satisfy the steady-state measurement; rather, having multiple steady states that can satisfy the steady-state measurement implies that equation (12.39) holds. EKF failure for CSTRs modeled by equation (12.38) must be confirmed by verifying that equation (12.33) holds. This requirement differs from the batch case because, in general, the CSTR design equation (12.38) yields a sufficient number of equations to calculate all possible steady states, whereas the batch design equation (12.37) does not. We now examine several examples that illustrate these points.
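Before doing so, we note that the counting quantities ρ and η are straightforward to compute numerically. The following minimal Python sketch (not part of the thesis; it uses the stoichiometric matrix of Example 2 below, equation (12.41), purely for illustration) evaluates the rank, the nullity, and a basis for the null space:

import numpy as np
from scipy.linalg import null_space

nu = np.array([[-1.0,  1.0, 1.0],
               [ 0.0, -2.0, 1.0]])       # stoichiometric matrix (12.41)

rho = np.linalg.matrix_rank(nu)          # number of independent reactions
eta = nu.shape[1] - rho                  # nullity: "linear" steady-state relations
basis = null_space(nu)                   # reaction invariants
print(rho, eta, basis.ravel())           # basis proportional to [3, 1, 2]; cf. (12.60)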

12.7.2 Example 2

Consider the gas-phase, reversible reactions

\begin{align}
A &\underset{k_2}{\overset{k_1}{\rightleftharpoons}} B + C \tag{12.40a}\\
2B &\underset{k_4}{\overset{k_3}{\rightleftharpoons}} C \tag{12.40b}
\end{align}

\[
k = \begin{bmatrix} 0.5 & 0.05 & 0.2 & 0.01 \end{bmatrix}^T \tag{12.40c}
\]
with stoichiometric matrix
\[
\nu = \begin{bmatrix} -1 & 1 & 1 \\ 0 & -2 & 1 \end{bmatrix} \tag{12.41}
\]
and reaction rates
\[
r = \begin{bmatrix} k_1 c_A - k_2 c_B c_C \\ k_3 c_B^2 - k_4 c_C \end{bmatrix} \tag{12.42}
\]
We define the state and measurements to be
\begin{align}
x &= \begin{bmatrix} c_A & c_B & c_C \end{bmatrix}^T \tag{12.43a}\\
y &= \begin{bmatrix} RT & RT & RT \end{bmatrix} x \tag{12.43b}
\end{align}

Component    Predicted EKF Steady State    Actual Steady State
A            −0.0274                       0.01241
B            −0.2393                       0.1837
C            1.1450                        0.6753

Table 12.2: EKF steady-state behavior, no measurement or state noise

where $c_j$ denotes the concentration of species j, R is the ideal gas constant, and T is the reactor temperature (for the simulations, RT = 32.84). We assume that the ideal gas law holds (high temperature, low pressure). We consider state estimation for both a batch reactor and a CSTR.

Batch Reactor

From first principles, the model for a well-mixed, constant volume, isothermal batch reactor is

\begin{align}
\dot{x} &= f(x) = \nu^T r(x) \tag{12.44}\\
x_0 &= \begin{bmatrix} 0.5 & 0.05 & 0 \end{bmatrix}^T \tag{12.45}
\end{align}

We consider state estimation with the following parameters:

\begin{align}
\Delta t &= t_{k+1} - t_k = 0.25 \tag{12.46a}\\
\Pi_0 &= \mathrm{diag}\left( 0.5^2, 0.5^2, 0.5^2 \right) \tag{12.46b}\\
G_k &= \mathrm{diag}(1, 1, 1) \tag{12.46c}\\
Q_k &= \mathrm{diag}\left( 0.001^2, 0.001^2, 0.001^2 \right) \tag{12.46d}\\
R_k &= 0.25^2 \tag{12.46e}\\
\bar{x}_0 &= \begin{bmatrix} 0 & 0 & 4 \end{bmatrix}^T \tag{12.46f}
\end{align}

Note that the initial guess, $\bar{x}_0$, is poor. The actual plant experiences N(0, Q_k) noise in the state and N(0, R_k) noise in the measurements. We now examine the estimation performance of both the EKF and MHE for this system. Figure 12.14 demonstrates that the EKF cannot reconstruct the evolution of the state for this system. In fact, the EKF appears to converge to incorrect steady-state estimates of the state. Table 12.2 presents the results of solving the equations in (12.33) for this system. Note that the concentrations of components A and B are negative, indicating that the EKF has converged to an unphysical state estimate. To prevent negative concentrations, we next implement an ad hoc clipping strategy in which negative filtered values of the state are set to zero (i.e. if $\hat{x}_{k|k} < 0$, set $\hat{x}_{k|k} = 0$). Figure 12.15 plots these clipped EKF results. Here, the clipped EKF drives the predicted pressure three orders of magnitude larger than the measured pressure before eventually converging to the actual states.



Figure 12.14: Extended Kalman filter results: (a) evolution of the actual (solid line) and EKF updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and EKF updated (dashed line) pressure estimates.

Figure 12.16 presents the results of applying MHE. For these results, we have constrained the state to prevent estimation of negative concentrations. The figures demonstrate that MHE swiftly converges to the correct state estimates. A little algebraic analysis reveals that multiple steady states satisfy the steady-state measurement for this system. At steady state, the model and measurement equations yield one linear equation (assuming no noise in the steady-state measurement $y_{ss}$)

\[
c_A + c_B + c_C = \frac{y_{ss}}{RT} \tag{12.47}
\]


Figure 12.15: Clipped extended Kalman filter results: (a) evolution of the actual (solid line) and clipped EKF updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and clipped EKF updated (dashed line) pressure estimates.

and two nonlinear equations

\begin{align}
k_1 c_A &= k_{-1} c_B c_C \tag{12.48}\\
k_2 c_B^2 &= k_{-2} c_C \tag{12.49}
\end{align}
Solving for the steady-state solution using equations (12.47)-(12.49):

\begin{align}
c_C &= \frac{k_2}{k_{-2}} c_B^2 = K_2 c_B^2 \tag{12.50}\\
c_A &= \frac{k_{-1} k_2}{k_1 k_{-2}} c_B^3 = \frac{K_2}{K_1} c_B^3 \tag{12.51}\\
0 &= \frac{K_2}{K_1} c_B^3 + K_2 c_B^2 + c_B - \frac{y_{ss}}{RT} \tag{12.52}
\end{align}


Figure 12.16: Moving-horizon estimation results, states constrained to x ≥ 0, smoothing initial covariance update, and horizon length of 2.5 time units (N = 11 measurements): (a) evolution of the actual (solid line) and MHE updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and MHE updated (dashed line) pressure estimates.

Descartes' rule of signs states that for polynomials with real coefficients, the number of positive, real roots is either the number of sign changes between consecutive coefficients or two less than this number. Since equilibrium constants and the steady-state measurement are positive, equation (12.52) has at most one positive root. Thus there is only one physically realizable steady state. MHE is a natural estimation tool for this system since its incorporation of constraints can thus prevent the estimator from converging to unphysical steady states.
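The root-counting argument can also be checked numerically; a minimal Python sketch (with an illustrative value assumed for the steady-state measurement $y_{ss}$, which is not from the thesis) is:

import numpy as np

k1, km1, k2, km2 = 0.5, 0.05, 0.2, 0.01
K1, K2 = k1 / km1, k2 / km2          # equilibrium constants
RT, y_ss = 32.84, 28.6               # y_ss is an illustrative value
coeffs = [K2 / K1, K2, 1.0, -y_ss / RT]   # cubic (12.52) in c_B
roots = np.roots(coeffs)
pos = roots[np.isreal(roots) & (roots.real > 0)].real
print(pos)   # exactly one positive root, consistent with Descartes' rule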

Component    Predicted EKF Steady State    Actual Steady State
A            −0.0122                       0.0224
B            −0.1364                       0.2006
C            1.1746                        0.6411

Table 12.3: EKF steady-state behavior, no measurement or state noise

CSTR

From first principles, the model for a well-mixed, isothermal CSTR reactor is

\begin{align}
\dot{x} &= \frac{Q_f}{V_R} c_f - \frac{Q_o}{V_R} x + \nu^T r(x) \tag{12.53}\\
c_f &= \begin{bmatrix} 0.5 & 0.05 & 0 \end{bmatrix}^T \tag{12.54}\\
x_0 &= \begin{bmatrix} 0.5 & 0.05 & 0 \end{bmatrix}^T \tag{12.55}
\end{align}

\begin{align}
Q_f &= Q_o = 1 \tag{12.56}\\
V_R &= 100 \tag{12.57}
\end{align}
We consider state estimation with the following measurement and parameters:
\begin{align}
y_k &= \begin{bmatrix} RT & RT & RT \end{bmatrix} x_k \tag{12.58a}\\
\Delta t &= t_{k+1} - t_k = 0.25 \tag{12.58b}\\
\Pi_0 &= \mathrm{diag}\left( 4^2, 4^2, 4^2 \right) \tag{12.58c}\\
G_k &= \mathrm{diag}(1, 1, 1) \tag{12.58d}\\
Q_k &= \mathrm{diag}\left( 0.001^2, 0.001^2, 0.001^2 \right) \tag{12.58e}\\
R_k &= 0.25^2 \tag{12.58f}\\
\bar{x}_0 &= \begin{bmatrix} 0 & 0 & 3.5 \end{bmatrix}^T \tag{12.58g}
\end{align}

Again, the initial guess, $\bar{x}_0$, is poor. The actual plant experiences N(0, Q_k) noise in the state and N(0, R_k) noise in the measurements. We now examine the estimation performance of both the EKF and MHE for this system. Figure 12.17 demonstrates that, similarly to the batch case, the EKF appears to converge to an incorrect steady-state estimate. This observation is confirmed by determining the EKF steady state assuming no state or measurement noise. Calculating the EKF steady state via equations (12.33) and assuming no state or measurement noise yields the results in Table 12.3. Some steady-state analysis of the system sheds light on the cause of this phenomenon. Assuming no noise in the steady-state measurement, the system has one linear steady-state measurement
\[
c_A + c_B + c_C = \frac{y_{ss}}{RT} \tag{12.59}
\]


Figure 12.17: Extended Kalman filter results: (a) evolution of the actual (solid line) and EKF updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and EKF updated (dashed line) pressure estimates.

and one linear combination resulting from ρ, the null space of the stoichiometric matrix
\[
\rho = \begin{bmatrix} 3 & 1 & 2 \end{bmatrix} \tag{12.60}
\]

\[
3 c_A + c_B + 2 c_C = 3 c_{A_f} + c_{B_f} + 2 c_{C_f} \tag{12.61}
\]
Therefore the steady-state calculation is a nonlinear problem, and this system satisfies both conditions required for EKF failure.

Figure 12.18 presents the EKF estimation results for implementation of a clipping strategy. Although clipping eliminates estimation error, this strategy causes a lengthy period of overestimation of the pressure, in some cases by two orders of magnitude.

Figure 12.19 presents the results of applying MHE. For these results, we have constrained the state to prevent estimation of negative concentrations. These figures demonstrate that MHE swiftly converges to the correct state estimates.


Figure 12.18: Clipped extended Kalman filter results: (a) evolution of the actual (solid line) and clipped EKF updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and clipped EKF updated (dashed line) pressure estimates.

12.7.3 Example 3

Reconsider the batch model given in section 12.7.2, but with the following updated parameters

\begin{align}
k &= \begin{bmatrix} 0.5 & 0.4 & 0.2 & 0.1 \end{bmatrix}^T \tag{12.62a}\\
R_k &= 0.1^2 \tag{12.62b}
\end{align}
and new measurement
\[
y_k = \begin{bmatrix} -1 & 1 & 1 \end{bmatrix} x_k \tag{12.63}
\]


Figure 12.19: Moving-horizon estimation results, states constrained to x ≥ 0, smoothing initial covariance update, and horizon length of 2.5 time units (N = 11 measurements): (a) evolution of the actual (solid line) and MHE updated (dashed line) concentrations. (b) evolution of the actual (solid line), measured (points), and MHE updated (dashed line) pressure estimates.

Note that the measurement has no physical meaning. Solving for the steady-state solution in terms of cB yields

\[
0 = -\frac{K_2}{K_1} c_B^3 + K_2 c_B^2 + c_B - y_{ss} \tag{12.64}
\]
Again using Descartes' rule of signs and taking into account the specified parameters, equation (12.64) has two positive roots and one negative root. In contrast to the previous example, there are multiple physically realizable steady states. We now examine the effect of poor initial conditions upon the estimation behavior of the EKF and MHE. Table 12.4 presents the a priori initial conditions for state estimation. Comparison of Figures 12.20 and 12.21 demonstrates that given a poor estimate of the initial state, the EKF cannot reconstruct the evolution of the state while MHE can.

Figures         x̄_0
12.20, 12.21    [3  0.1  3]^T
12.22-12.27     [4  0  4]^T

Table 12.4: A priori initial conditions for state estimation


Figure 12.20: Extended Kalman filter results: (a) evolution of the actual (solid line) and EKF updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and EKF updated (dashed line) pressure estimates.

Figures 12.22 and 12.23 show that given an even poorer estimate of the initial state, both the EKF and MHE fail to reconstruct the evolution of the state. To improve the quality of the estimates, we constrain the concentrations in the estimators so that
\[
0 \leq c_j \leq 4.5, \qquad j = A, B, C \tag{12.65}
\]


Figure 12.21: Moving-horizon estimation results, states constrained to x ≥ 0, smoothing initial covariance update, and horizon length of 2.5 time units (N = 11 measurements): (a) evolution of the actual (solid line) and MHE updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and MHE updated (dashed line) pressure estimates.

Figures 12.24 and 12.25 demonstrate that with this extra knowledge, MHE converges to the true state estimates while the clipped EKF estimates are trapped on the constraint. Finally, we relax the concentration constraints to

\[
0 \leq c_j \leq 5.5, \qquad j = A, B, C \tag{12.66}
\]

Not surprisingly, the clipped EKF estimates remain trapped on the constraint, as shown in Figure 12.26. The quality of the MHE estimates is a function of the estimation horizon, as seen in Figure 12.27. If the estimation horizon is too short, the MHE estimates are pinned against the state constraint; increasing the horizon remedies this problem. For short horizons, we suspect that the data in the estimation horizon cannot overcome the biasing of the arrival cost approximation (with the smoothing scheme), hence resulting in state estimates pinned against the constraint.


Figure 12.22: Extended Kalman filter results: (a) evolution of the actual (solid line) and EKF updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and EKF updated (dashed line) pressure estimates.

Changing arrival cost approximations (e.g. switching from the smoothing scheme to a uniform prior) when constraints are active may constitute one way of addressing this problem without having to increase the estimation horizon. Table 12.5 summarizes the estimation results examined in this section.

12.7.4 Computational Expense

Table 12.6 summarizes the average computational expense per time step for each of the examples presented in this chapter. All computations were performed in GNU Octave (http://www.octave.org/) on a 2.0-GHz processor. MHE computations were performed using the NMPC toolbox (http://www.che.wisc.edu/~tenny/nmpc/).


Figure 12.23: Moving-horizon estimation results, states constrained to x ≥ 0, smoothing initial covariance update, and horizon length of 2.5 time units (N = 11 measurements): (a) evolution of the actual (solid line) and MHE updated (dashed line) concentrations. (b) evolution of the actual (solid line), measured (points), and MHE updated (dashed line) estimates.

Not surprisingly, MHE requires substantially more computational time than the EKF. This increase results because

1. MHE employs optimization while the EKF uses a one-step linearization, and

2. MHE calculates sensitivities over a trajectory of states whereas the discrete EKF calculates only a single sensitivity.


Figure 12.24: Clipped extended Kalman filter results, states clipped to 0 ≤ x ≤ 4.5: (a) evolution of the actual (solid line) and clipped EKF updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and clipped EKF updated (dashed line) estimates.

12.8 Conclusions

Virtually all chemical engineering systems contain nonlinear dynamics and/or state constraints. The need to incorporate this information into state estimation is illustrated by the examples presented in this chapter. These examples demonstrate that even with perfect concordance between the model and the physical plant, it is possible for the nominal EKF to fail to converge to the true state when

1. the system model and measurement are such that multiple states satisfy the steady-state measurement, and

2. the estimator is given a poor initial guess of the state.


Figure 12.25: Moving-horizon estimation results, states constrained to 0 ≤ x ≤ 4.5, smoothing initial covariance update, and horizon length of 2.5 time units (N = 11 measurements): (a) evolution of the actual (solid line) and MHE updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and MHE updated (dashed line) estimates.

Given the same estimator tuning, model, and measurements as the EKF, MHE provides improved state estimation and greater robustness to poor guesses of the initial state. These benefits arise because MHE incorporates physical state constraints into an optimization, accurately uses the nonlinear model, and optimizes over a trajectory of states and measurements. With local optimization, our results indicate that multivariate normal approximations to the arrival cost combined with judicious use of constraints can prevent multiple optima in the estimator and generate acceptable estimator performance.

The issue of global versus local optimization and the selection of an arrival cost also have substantial impact on the behavior of MHE. If one could implement a global optimization strategy in real time, approximating the arrival cost with a uniform prior and making the estimation horizon reasonably long is preferable to an approximate multivariate normal arrival cost because of the latter's biasing effect on the state estimates.


Figure 12.26: Clipped extended Kalman filter results, states clipped to 0 ≤ x ≤ 5.5: (a) evolution of the actual (solid line) and clipped EKF updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and clipped EKF updated (dashed line) estimates.

One potential pitfall of employing local optimization is the inability to identify multiple modes in the a posteriori distribution. Example 2 of this chapter illustrates this pitfall perfectly: attraction of MHE estimates to a mode in the infeasible region leads to state estimates trapped on constraints even though another mode lies within the feasible region. To overcome this difficulty, we propose that Monte Carlo particle filters may prove useful in estimating the a posteriori distribution.


Figure 12.27: Moving-horizon estimation results, states constrained to 0 ≤ x ≤ 5.5, and smoothing initial covariance update: (a) effect of horizon length on the evolution of the actual (solid line) and MHE updated (dashed line) C concentration; (b) evolution of the actual (solid line), measured (points), and MHE updated (dashed line) estimates. Values of N on the plots correspond to the horizon length in time units.

These filters present one method of identifying the appearance and disappearance of small numbers of local optima in the a posteriori distribution, but do not provide a reasonable framework for accurately reconstructing the mode of this distribution. Using particle filters to estimate the arrival cost in MHE presents one manner of better approximating the mode of the a posteriori distribution. Additionally, identifying local optima in the arrival cost distribution should yield better initial guesses for the local MHE optimization.

It is reasonable to expect that more complicated models with less restrictive assumptions than the ones proposed here may yield multiple optima corresponding to both physically realizable and unrealizable states. Since MHE permits incorporation of constraints into its optimization, it is the natural choice for preventing estimation of physically unrealizable states.

Estimator    x̄_0             Constraints      Horizon Length             Estimates Converge?
EKF          [3  0.1  3]^T   x ≥ 0            NA                         No
MHE          [3  0.1  3]^T   x ≥ 0            2.5 time units (N = 11)    Yes
EKF          [4  0  4]^T     x ≥ 0            NA                         No
MHE          [4  0  4]^T     x ≥ 0            2.5 time units (N = 11)    No
EKF          [4  0  4]^T     0 ≤ x ≤ 4.5      NA                         No
MHE          [4  0  4]^T     0 ≤ x ≤ 4.5      2.5 time units (N = 11)    Yes
EKF          [4  0  4]^T     0 ≤ x ≤ 5.5      NA                         No
MHE          [4  0  4]^T     0 ≤ x ≤ 5.5      2.5 time units (N = 11)    No
MHE          [4  0  4]^T     0 ≤ x ≤ 5.5      5 time units (N = 21)      No
MHE          [4  0  4]^T     0 ≤ x ≤ 5.5      10 time units (N = 41)     Yes

Table 12.5: Effects of a priori initial conditions, constraints, and horizon length on state estima- tion. N denotes the number of measurements in the estimation horizon.

Example    Estimator    Horizon Length    Average CPU Time per Time Step (sec)
12.7.2     EKF          N = 1             0.003
12.7.2     MHE          N = 11            0.737
12.7.2     EKF          N = 1             0.005
12.7.2     MHE          N = 11            0.676
12.7.3     EKF          N = 1             0.006
12.7.3     MHE          N = 11            1.756
12.7.3     MHE          N = 21            4.712
12.7.3     MHE          N = 41            6.899

Table 12.6: Comparison of MHE and EKF computational expense. N denotes the number of measurements in the estimation horizon.

Since MHE employs a trajectory of measurements as opposed to measurements at only a single time, it is better suited than the EKF for distinguishing among the remaining physically realizable states.

12.9 Appendix

12.9.1 Derivation of the MHE Smoothing Formulation

Assume that the system is Markov, that is,

\[
P(x_{k+1} \mid x_0, \ldots, x_k) = P(x_{k+1} \mid x_k)
\]
Then
\begin{align}
&P(x_{T-N+1}, \ldots, x_T \mid y_0, \ldots, y_T) \tag{12.67}\\
&= P(x_{T-N+1} \mid y_0, \ldots, y_T)\, P(x_{T-N+2}, \ldots, x_T \mid y_0, \ldots, y_T, x_{T-N+1}) \tag{12.68}\\
&= P(x_{T-N+1} \mid y_0, \ldots, y_T)\, \frac{P(x_{T-N+2}, \ldots, x_T, y_0, \ldots, y_T \mid x_{T-N+1})}{P(y_0, \ldots, y_T \mid x_{T-N+1})} \tag{12.69}\\
&= \frac{P(x_{T-N+1} \mid y_0, \ldots, y_T)}{P(y_0, \ldots, y_T \mid x_{T-N+1})}\, P(x_{T-N+2}, \ldots, x_T \mid x_{T-N+1})\, P(y_0, \ldots, y_T \mid x_{T-N+1}, \ldots, x_T) \tag{12.70}\\
&= \frac{P(x_{T-N+1} \mid y_0, \ldots, y_T)}{P(y_0, \ldots, y_T \mid x_{T-N+1})}\, P(y_0, \ldots, y_{T-N} \mid x_{T-N+1}, \ldots, x_T, y_{T-N+1}, \ldots, y_T) \notag\\
&\qquad \times P(x_{T-N+2}, \ldots, x_T \mid x_{T-N+1})\, P(y_{T-N+1}, \ldots, y_T \mid x_{T-N+1}, \ldots, x_T) \tag{12.71}\\
&= \frac{P(x_{T-N+1} \mid y_0, \ldots, y_T)}{P(y_0, \ldots, y_T \mid x_{T-N+1})}\, P(y_0, \ldots, y_{T-N} \mid x_{T-N+1}, \ldots, x_T, y_{T-N+1}, \ldots, y_T) \notag\\
&\qquad \times \left( \prod_{k=T-N+1}^{T-1} P(x_{k+1} \mid x_k) \right) \left( \prod_{k=T-N+1}^{T} P(y_k \mid x_k) \right) \tag{12.72}\\
&= \frac{P(x_{T-N+1} \mid y_0, \ldots, y_T)}{P(y_{T-N+1}, \ldots, y_T \mid x_{T-N+1})} \left( \prod_{k=T-N+1}^{T-1} P(x_{k+1} \mid x_k) \right) \left( \prod_{k=T-N+1}^{T} P(y_k \mid x_k) \right) \notag\\
&\qquad \times \frac{P(y_0, \ldots, y_{T-N} \mid x_{T-N+1}, \ldots, x_T, y_{T-N+1}, \ldots, y_T)}{P(y_0, \ldots, y_{T-N} \mid x_{T-N+1}, y_{T-N+1}, \ldots, y_T)} \tag{12.73}\\
&= \frac{P(x_{T-N+1} \mid y_0, \ldots, y_T)}{P(y_{T-N+1}, \ldots, y_T \mid x_{T-N+1})} \left( \prod_{k=T-N+1}^{T-1} P(x_{k+1} \mid x_k) \right) \left( \prod_{k=T-N+1}^{T} P(y_k \mid x_k) \right) \notag\\
&\qquad \times \frac{P(x_{T-N+2}, \ldots, x_T \mid x_{T-N+1}, y_0, \ldots, y_T)}{P(x_{T-N+2}, \ldots, x_T \mid x_{T-N+1}, y_{T-N+1}, \ldots, y_T)} \tag{12.74}\\
&= \frac{P(x_{T-N+1} \mid y_0, \ldots, y_T)}{P(y_{T-N+1}, \ldots, y_T \mid x_{T-N+1})} \left( \prod_{k=T-N+1}^{T-1} P(x_{k+1} \mid x_k) \right) \left( \prod_{k=T-N+1}^{T} P(y_k \mid x_k) \right) \tag{12.75}
\end{align}

The corresponding probabilistic manipulations are:

From Equation    To Equation    Manipulation of Boxed Quantity
(12.67)          (12.68)        P(a, b|c) = P(a|b, c) P(b|c)
(12.68)          (12.69)        P(a, b|c) = P(a|b, c) P(b|c)
(12.69)          (12.70)        P(a|b, c) = P(a, b|c) / P(b|c)
(12.70)          (12.71)        P(a, b|c) = P(a|b, c) P(b|c)
(12.71)          (12.72)        P(a, b|c) = P(a|b, c) P(b|c) and the Markov property
(12.72)          (12.73)        P(a, b|c) = P(a|b, c) P(b|c)
(12.73)          (12.74)        P(a|b, c, d) / P(a|b, d) = P(c|a, b, d) / P(c|b, d)
(12.74)          (12.75)        1 by the Markov property

12.9.2 Derivation of the MHE Filtering Formulation

From the smoothing formulation, we can recover the filtering formulation by manipulating the first term of (12.75):

\begin{align}
&\frac{P(x_{T-N+1} \mid y_0, \ldots, y_T)}{P(y_{T-N+1}, \ldots, y_T \mid x_{T-N+1})} \tag{12.76}\\
&= \frac{P(x_{T-N+1}, y_0, \ldots, y_T)}{P(y_{T-N+1}, \ldots, y_T \mid x_{T-N+1})\, P(y_0, \ldots, y_T)} \tag{12.77}\\
&= \frac{P(x_{T-N+1}, y_{T-N+1}, \ldots, y_T \mid y_0, \ldots, y_{T-N})\, P(y_0, \ldots, y_{T-N})}{P(y_{T-N+1}, \ldots, y_T \mid x_{T-N+1})\, P(y_0, \ldots, y_T)} \tag{12.78}\\
&= \frac{P(y_{T-N+1}, \ldots, y_T \mid x_{T-N+1}, y_0, \ldots, y_{T-N})}{P(y_{T-N+1}, \ldots, y_T \mid x_{T-N+1})}\, \frac{P(x_{T-N+1} \mid y_0, \ldots, y_{T-N})\, P(y_0, \ldots, y_{T-N})}{P(y_0, \ldots, y_T)} \tag{12.79}\\
&= P(x_{T-N+1} \mid y_0, \ldots, y_{T-N})\, \frac{P(y_0, \ldots, y_{T-N})}{P(y_0, \ldots, y_T)} \tag{12.80}\\
&= \frac{P(x_{T-N+1} \mid y_0, \ldots, y_{T-N})}{P(y_{T-N+1}, \ldots, y_T \mid y_0, \ldots, y_{T-N})} \tag{12.81}
\end{align}

The corresponding probabilistic manipulations are:

From Equation    To Equation    Manipulation of Boxed Quantity
(12.76)          (12.77)        P(a|b) = P(a, b) / P(b)
(12.77)          (12.78)        P(a, b, c) = P(a, b|c) P(c)
(12.78)          (12.79)        P(a, b|c) = P(a|b, c) P(b|c)
(12.79)          (12.80)        1 by the Markov property
(12.80)          (12.81)        P(a, b) / P(b) = P(a|b)

The filtering formulation of MHE is thus

\[
P(x_{T-N+1}, \ldots, x_T \mid y_0, \ldots, y_T) = \frac{P(x_{T-N+1} \mid y_0, \ldots, y_{T-N})}{P(y_{T-N+1}, \ldots, y_T \mid y_0, \ldots, y_{T-N})} \left( \prod_{k=T-N+1}^{T-1} P(x_{k+1} \mid x_k) \right) \left( \prod_{k=T-N+1}^{T} P(y_k \mid x_k) \right) \tag{12.83}
\]

12.9.3 Equivalence of the Full Information and Least Squares Formulations

Consider the model given by equation (12.1). This model assumes that each wk and vk is normally distributed, and that the matrix G(xk, uk) has full column rank. We would like to calculate the maximum likelihood estimate:

\begin{align}
&\arg\max_{x_0, \ldots, x_T} P(x_0, \ldots, x_T \mid y_0, \ldots, y_T) \tag{12.84}\\
&= \arg\min_{x_0, \ldots, x_T} -\log P(x_0, \ldots, x_T \mid y_0, \ldots, y_T) \tag{12.85}\\
&= \arg\min_{x_0, \ldots, x_T} -\log \left\{ P(x_0) \left( \prod_{k=0}^{T-1} P(x_{k+1} \mid x_k) \right) \left( \prod_{k=0}^{T} P(y_k \mid x_k) \right) \right\} \tag{12.86}\\
&= \arg\min_{x_0, \ldots, x_T} \left\{ -\log P(x_0) - \sum_{k=0}^{T-1} \log P(x_{k+1} \mid x_k) - \sum_{k=0}^{T} \log P(y_k \mid x_k) \right\} \tag{12.87}
\end{align}

We can calculate the conditional probabilities in equation (12.87) by first rewriting the joint distributions as functions of independent random variables

\begin{align}
(x_{k+1}, x_k) &= f(w_k, x_k) \tag{12.88}\\
(y_k, x_k) &= f(v_k, x_k) \tag{12.89}
\end{align}

These density calculations are presented in the following subsections.

Calculation of P (xk+1|xk)

Given the model (12.1), derive the density P (xk+1|xk) under the assumption that G(xk, uk) has full column rank. We derive this density by writing the joint density P (xk+1, xk) as a function of the joint density P (wk, xk). For this conversion to hold, we must show that

1. (wk, xk) can be uniquely written in terms of (xk+1, xk). We trivially have

\[
x_k = x_k \tag{12.90}
\]

Also, we have

\begin{align}
x_{k+1} &= F(x_k, u_k) + G(x_k, u_k) w_k \tag{12.91}\\
G(x_k, u_k) w_k &= x_{k+1} - F(x_k, u_k) \tag{12.92}\\
G(x_k, u_k)^T G(x_k, u_k) w_k &= G(x_k, u_k)^T \left( x_{k+1} - F(x_k, u_k) \right) \tag{12.93}\\
w_k &= \left[ G(x_k, u_k)^T G(x_k, u_k) \right]^{-1} G(x_k, u_k)^T \left( x_{k+1} - F(x_k, u_k) \right) \tag{12.94}
\end{align}

Since G(xk, uk) has full column rank, equation (12.94) has a unique solution.

2. We must show that the following matrix has full rank:

\begin{align}
H_1 &= \begin{bmatrix} \dfrac{\partial x_{k+1}}{\partial w_k^T} & \dfrac{\partial x_{k+1}}{\partial x_k^T} \\[1ex] \dfrac{\partial x_k}{\partial w_k^T} & \dfrac{\partial x_k}{\partial x_k^T} \end{bmatrix} \tag{12.95}\\
&= \begin{bmatrix} G(x_k, u_k) & \dfrac{\partial F(x_k, u_k)}{\partial x_k^T} + \dfrac{\partial G(x_k, u_k)}{\partial x_k^T} w_k \\[1ex] 0 & I \end{bmatrix} \tag{12.96}
\end{align}

Clearly H1 has full rank since G(xk, uk) has full column rank.

Since these conditions hold, the inverse function theorem tells us that

\begin{align}
P(x_{k+1}, x_k) &= P(w_k, x_k)\, |H_1(x_k, w_k)|^{-1} \tag{12.97}\\
&= P(w_k)\, P(x_k)\, |H_1(x_k, w_k)|^{-1} \quad \text{(independence of } w_k \text{ and } x_k\text{)} \tag{12.98}\\
&= P\!\left( w_k = \left[ G(x_k, u_k)^T G(x_k, u_k) \right]^{-1} G(x_k, u_k)^T \left( x_{k+1} - F(x_k, u_k) \right) \right) P(x_k)\, |H_1(x_k, w_k)|^{-1} \tag{12.99}
\end{align}

Now solve for the desired conditional density:

\begin{align}
P(x_{k+1} \mid x_k) &= \frac{P(x_{k+1}, x_k)}{P(x_k)} \tag{12.100}\\
&= P\!\left( w_k = \left[ G(x_k, u_k)^T G(x_k, u_k) \right]^{-1} G(x_k, u_k)^T \left( x_{k+1} - F(x_k, u_k) \right) \right) |H_1(x_k, w_k)|^{-1} \tag{12.101}
\end{align}

Calculation of P (yk|xk)

Given the model (12.1), derive the density P (yk|xk). We derive this density by writing the joint density P (yk, xk) as a function of the joint density P (vk, xk). For this conversion to hold, we must show that

1. (vk, xk) can be uniquely written in terms of (yk, xk). We clearly have

\begin{align}
v_k &= y_k - h(x_k) \tag{12.102}\\
x_k &= x_k \tag{12.103}
\end{align}

2. We must show that the following matrix has full rank:

\begin{align}
H_2 &= \begin{bmatrix} \dfrac{\partial y_k}{\partial v_k^T} & \dfrac{\partial y_k}{\partial x_k^T} \\[1ex] \dfrac{\partial x_k}{\partial v_k^T} & \dfrac{\partial x_k}{\partial x_k^T} \end{bmatrix} \tag{12.104}\\
&= \begin{bmatrix} I & \dfrac{\partial h(x_k)}{\partial x_k^T} \\[1ex] 0 & I \end{bmatrix} \tag{12.105}
\end{align}

Clearly H2 has full rank.

Since these conditions hold, the inverse function theorem tells us that

\begin{align}
P(y_k, x_k) &= P(v_k, x_k)\, |H_2(x_k)|^{-1} \tag{12.106}\\
&= P(v_k)\, P(x_k) \quad \text{(independence of } v_k \text{ and } x_k\text{)} \tag{12.107}\\
&= P\!\left( v_k = y_k - h(x_k) \right) P(x_k) \tag{12.108}
\end{align}

Now solve for the desired conditional density:

\begin{align}
P(y_k \mid x_k) &= \frac{P(y_k, x_k)}{P(x_k)} \tag{12.109}\\
&= P\!\left( v_k = y_k - h(x_k) \right) \tag{12.110}
\end{align}

Derivation of the Least Squares Problem

We have assumed that both P (wk) and P (vk) are normally distributed. N (m, P )-distributed multivariate normals have probability functions of the form

\[
P(x) = \frac{1}{(2\pi)^{n/2} |P|^{1/2}} \exp\left( -\frac{1}{2} (x - m)^T P^{-1} (x - m) \right) \tag{12.111}
\]
where n is the number of elements of the variable x. Therefore

\begin{align}
\arg\max_{x_k, x_{k+1}} P(x_{k+1} \mid x_k) &= \arg\min_{x_k, x_{k+1}} -\log P(x_{k+1} \mid x_k) \tag{12.112}\\
&= \arg\min_{x_k, x_{k+1}} -\log \left( P(w_k)\, |H_1(x_k, w_k)|^{-1} \right) \tag{12.113}\\
&= \arg\min_{x_k, x_{k+1}} \frac{1}{2} w_k^T Q_k^{-1} w_k + \log |H_1(x_k, w_k)| \tag{12.114}\\
&\qquad \text{s.t.:}\ x_{k+1} = F(x_k, u_k) + G(x_k, u_k) w_k \tag{12.115}
\end{align}
and

\begin{align}
\arg\max_{x_k} P(y_k \mid x_k) &= \arg\min_{x_k} -\log P(y_k \mid x_k) \tag{12.116}\\
&= \arg\min_{x_k} -\log P(v_k) \tag{12.117}\\
&= \arg\min_{x_k} \frac{1}{2} v_k^T R_k^{-1} v_k \tag{12.118}\\
&\qquad \text{s.t.:}\ y_k = h(x_k) + v_k \tag{12.119}
\end{align}

Plugging these values into equation (12.87) yields the minimization

\begin{align}
\Phi_T = \min_{x_0, \ldots, x_T}\ &\Gamma(x_0) + \sum_{k=0}^{T-1} \left( w_k^T Q_k^{-1} w_k + 2 \log |H_1(x_k, w_k)| \right) + \sum_{k=0}^{T} v_k^T R_k^{-1} v_k \tag{12.120a}\\
\text{s.t.:}\quad &\Gamma(x_0) = (x_0 - \bar{x}_0)^T \Pi^{-1} (x_0 - \bar{x}_0) \tag{12.120b}\\
&x_{k+1} = F(x_k, u_k) + G(x_k, u_k) w_k \tag{12.120c}\\
&y_k = h(x_k) + v_k \tag{12.120d}
\end{align}

If G(xk, uk) is not a function of xk, then the determinant |H1(xk, wk)| is constant and opti- mization (12.120) becomes a pure least-squares problem.
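For that constant-G case (taken as the identity below, so the |H1| term drops out entirely), the following is a minimal full-information sketch of optimization (12.120); the model functions and data are illustrative placeholders, and a general-purpose SciPy minimizer stands in for a structured estimation solver:

import numpy as np
from scipy.optimize import minimize

# Minimal least-squares sketch of (12.120) with G = I, so w_k is simply the
# discrepancy x_{k+1} - F(x_k, u_k). Model, inputs, and data are illustrative.
F = lambda x, u: np.array([0.9 * x[0] + 0.1 * u, 0.8 * x[1]])
h = lambda x: np.array([x[0] + x[1]])

T, nx = 5, 2
u = np.zeros(T)                        # inputs u_0 .. u_{T-1}
y = [np.array([1.0])] * (T + 1)        # measurements y_0 .. y_T (placeholders)
xbar0 = np.zeros(nx)
Pinv, Qinv, Rinv = np.eye(nx), np.eye(nx), np.eye(1)

def phi(z):
    x = z.reshape(T + 1, nx)           # decision variables x_0 .. x_T
    cost = (x[0] - xbar0) @ Pinv @ (x[0] - xbar0)     # Gamma(x_0), (12.120b)
    for k in range(T):
        w = x[k + 1] - F(x[k], u[k])   # state noise implied by (12.120c)
        cost += w @ Qinv @ w
    for k in range(T + 1):
        v = y[k] - h(x[k])             # measurement noise from (12.120d)
        cost += v @ Rinv @ v
    return cost

xhat = minimize(phi, np.zeros((T + 1) * nx)).x.reshape(T + 1, nx)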

12.9.4 Evolution of a Nonlinear Probability Density

We are interested in determining formulas for the evolution of the a posteriori probability density for the system

x_{k+1} = \begin{bmatrix} \dfrac{x_{k,1}}{2\bar{k}\Delta t\, x_{k,1} + 1} \\[8pt] x_{k,2} + \dfrac{\bar{k}\Delta t\, x_{k,1}^2}{2\bar{k}\Delta t\, x_{k,1} + 1} \end{bmatrix} + w_k \quad (12.121a)

y_k = \begin{bmatrix} 1 & 1 \end{bmatrix} x_k + v_k \quad (12.121b)

We view future states (the xk's) as functions of the random variables with known statistics (x0, the wk's, and the vk's). First update the a priori estimate, x̄0, with the first measurement, y0, by

1. writing the joint probability density P(x0, y0) as a function of P(x0, v0)

\begin{aligned}
P(x_0, y_0) &= P(x_0, v_0) \left| \begin{bmatrix} I & 0 \\ C & I \end{bmatrix} \right|^{-1} &&(12.122a)\\
&= P(x_0)\,P(v_0) \quad (x_0 \text{ and } v_0 \text{ are independent}) &&(12.122b)
\end{aligned}

2. calculating the conditional probability density P (x0|y0)

\begin{aligned}
P(x_0 \mid y_0) &= \frac{P(x_0, y_0)}{P(y_0)} &&(12.123a)\\
&= \frac{P(x_0, y_0)}{\int_{-\infty}^{\infty} P(x_0, y_0)\, dx_0} &&(12.123b)\\
&= \frac{P(x_0)\,P_{v_0}(y_0 - C x_0)}{\int_{-\infty}^{\infty} P(x_0)\,P_{v_0}(y_0 - C x_0)\, dx_0} &&(12.123c)\\
&= \frac{\exp\left( -\frac{1}{2}(x_0 - \bar{x}_0)^T \Pi_0^{-1} (x_0 - \bar{x}_0) - \frac{1}{2}(y_0 - C x_0)^T R^{-1} (y_0 - C x_0) \right)}{\int_{-\infty}^{\infty} \exp\left( -\frac{1}{2}(x_0 - \bar{x}_0)^T \Pi_0^{-1} (x_0 - \bar{x}_0) - \frac{1}{2}(y_0 - C x_0)^T R^{-1} (y_0 - C x_0) \right) dx_0} &&(12.123d)
\end{aligned}

Now propagate P(x0|y0) to the next measurement time to obtain P(x1|y0):

\begin{aligned}
P(x_1, w_0 \mid y_0) &= P(x_0, w_0 \mid y_0) \left| \begin{bmatrix} \dfrac{\partial x_1}{\partial x_0^T} & \dfrac{\partial x_1}{\partial w_0^T} \\[6pt] \dfrac{\partial w_0}{\partial x_0^T} & \dfrac{\partial w_0}{\partial w_0^T} \end{bmatrix} \right|^{-1} &&(12.124a)\\
&= P(x_0, w_0 \mid y_0) \left( 2\bar{k}\Delta t\, x_{0,1} + 1 \right)^2 &&(12.124b)\\
&= P(x_0 \mid y_0)\,P(w_0) \left( 2\bar{k}\Delta t\, x_{0,1} + 1 \right)^2 &&(12.124c)
\end{aligned}

\begin{aligned}
P(x_1 \mid y_0) &= \int_{-\infty}^{\infty} P(x_1, w_0 \mid y_0)\, dw_0 &&(12.125)\\
&= \int_{-\infty}^{\infty} P(x_0 \mid y_0)\,P(w_0) \left( 2\bar{k}\Delta t\, x_{0,1} + 1 \right)^2 dw_0 &&(12.126)
\end{aligned}
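Although this chapter carries out the propagation analytically, the step (12.125)-(12.126) can also be approximated by Monte Carlo sampling. The following sketch (assumed Python/NumPy; the rate constant, time step, covariances, and the Gaussian stand-in for P(x0|y0) are all illustrative) pushes samples through the state map (12.121a):

import numpy as np

# Monte Carlo sketch of the propagation step: sample x_0 from an assumed
# updated density, sample w_0 ~ N(0, Q), and evaluate the state map (12.121a).
rng = np.random.default_rng(1)
kbar, dt = 0.16, 0.1                       # illustrative values
Q = 0.01 * np.eye(2)

def step(x, w):
    denom = 2.0 * kbar * dt * x[0] + 1.0
    return np.array([x[0] / denom,
                     x[1] + kbar * dt * x[0] ** 2 / denom]) + w

# Gaussian stand-in for P(x_0 | y_0); in general one would sample (12.123d).
x0_samples = rng.multivariate_normal([3.0, 1.0], 0.25 * np.eye(2), size=5000)
w0_samples = rng.multivariate_normal(np.zeros(2), Q, size=5000)
x1_samples = np.array([step(x, w) for x, w in zip(x0_samples, w0_samples)])
# A histogram or kernel density of x1_samples approximates P(x_1 | y_0).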

Finally, update with the second measurement, y1:

\begin{aligned}
P(x_1 \mid y_0, y_1) &= \frac{P(x_1 \mid y_0)\,P_{v_1}(y_1 - C x_1)}{\int_{-\infty}^{\infty} P(x_1 \mid y_0)\,P_{v_1}(y_1 - C x_1)\, dx_1} &&(12.127a)\\
&= \frac{\int_{-\infty}^{\infty} \Omega_1\, dw_0}{\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \Omega_1\, dw_0\, dx_1} &&(12.127b)
\end{aligned}

in which

\Omega_1 = \left( 2\bar{k}\Delta t\, x_{0,1} + 1 \right)^2 \exp\left\{ -\frac{1}{2} \left( (x_0 - \bar{x}_0)^T \Pi_0^{-1} (x_0 - \bar{x}_0) + w_0^T Q^{-1} w_0 + \sum_{j=0}^{1} v_j^T R^{-1} v_j \right) \right\} \quad (12.128)

For future times, it is straightforward to derive equations (12.25) and (12.26).

Notation

A_k: state sensitivity at time t_k
C_k: linearization of the measurement function h(x_k)
c_f: inlet concentration vector
c_j: concentration of species j
F(x_k, u_k): solution to a first-principles, differential equation model
f̄(x): reconstructed distribution using density estimation
G(x_k, u_k): full column rank matrix
h: window width for density estimation
h(x_k, t_k): model prediction of the measurement at time t_k
K: kernel function for density estimation
k: reaction rate vector
k_j: jth reaction rate
L: filter gain matrix
N: MHE horizon length
N_s: number of Monte Carlo samples
n: number of measurements
n_m: number of measurements that can be written as a linear combination of states
N(m, P): normal distribution with mean m and covariance P
P: probability
P_j: partial pressure of species j
Q: covariance of the state noise w_k
Q_f: volumetric flow rate into the reactor
Q_o: effluent volumetric flow rate
q_j: weight of the jth Monte Carlo sample
R: covariance of the measurement noise v_k
R: ideal gas constant
r(x): reaction rate vector
T: reactor temperature
t: time
u_k: input at time t_k
V_R: reactor volume
v_k: N(0, R_k) measurement noise at time t_k
w_k: N(0, Q_k) state noise at time t_k
x: state
x_k: state at time t_k
x_{j|k}: estimated state at time t_j given measurements to time t_k
x̄_0: a priori estimate of x_0
x̂_k: estimated state at time t_k
y_k: measurement at time t_k
y_ss: steady-state measurement
Δt: time increment
ε: arbitrary constant
η: nullity of the stoichiometric matrix ν
ν: stoichiometric matrix
Π: covariance matrix for the a priori state estimate x_0
Φ_k: objective function value at time t_k
ρ: rank of the stoichiometric matrix ν
ρ: null space of the stoichiometric matrix ν

Chapter 13

Closed Loop Performance Using Moving-Horizon Estimation

We now turn our attention to the effect of local optima in the estimator under closed-loop control. Figure 13.1 details the interaction between the process, sensor, estimator, target calcu- lation, and regulator. For this section, we use the nonlinear model-predictive control (NMPC) regulator and target calculation contained in the NMPC toolbox [146, 149, 150], and consider the effect of employing either the extended Kalman filter (EKF) or moving-horizon estimation (MHE) as the estimator.
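To make the information flow of Figure 13.1 concrete, the following skeleton (a minimal Python sketch; every component name is a hypothetical stand-in, not the interface of the NMPC toolbox [146, 149, 150]) steps the loop one sample at a time:

import numpy as np

# Skeleton of the closed loop in Figure 13.1. The estimator, target
# calculation, regulator, plant, and sensor are all passed in as callables.
def closed_loop(plant_step, measure, estimator, target_calc, regulator,
                x_plant, n_steps):
    u = 0.0
    for k in range(n_steps):
        y = measure(x_plant)                 # sensor: y_k = h(x_k) + v_k
        xhat, dhat = estimator(y, u)         # state and disturbance estimates
        x_t, u_t = target_calc(xhat, dhat)   # steady-state targets
        u = regulator(xhat, x_t, u_t)        # regulator computes next input
        x_plant = plant_step(x_plant, u)     # process advances one sample
    return x_plant

Swapping the estimator argument between an EKF and an MHE implementation, with all other components fixed, is exactly the comparison carried out in this chapter.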

13.1 Regulator

The goal of the regulator is to drive the state to its set point. Nonlinear model-predictive control solves the on-line optimization

\min_{\{u_k\},\{x_k\}} \; \underbrace{\sum_{k=0}^{N_c - 1} L(x_k, u_k)}_{\text{stage cost}} + \underbrace{x_{N_c}^T P x_{N_c}}_{\text{tail cost}} \quad (13.1)

xk+1 = F (xk, uk) (13.2)

H x_k \leq h, \quad D u_k \leq d, \quad M^u \Delta u_k \leq m^u \quad (13.3)

in which

• uk is the input vector at time tk;

• xk is the state vector at time tk;

• Nc is the control horizon length;

• P is the penalty for the terminal state in the control horizon;

• H and h are the state constraint matrix and vector, respectively;

• D and d are the input constraint matrix and vector, respectively; and

• M^u and m^u are the change-in-input constraint matrix and vector, respectively.

Figure 13.1: General diagram of closed-loop control for the model-predictive control framework. The goal of control is to drive the process measurements (yk's) to a desired measurement and input set point (yset and uset, respectively). Model-predictive control requires models for both the process and sensor. The estimator uses these models along with process measurement yk and input uk information to estimate the state xk and disturbances dk. The target calculation determines the state and input targets (xt and ut) given the state estimates. The regulator uses these targets and the state estimate to calculate the next input to the process.

The subscript k denotes that the measurement time is tk. Here, we use the expectation of the stochastic model (12.1a) as the desired control model. Also, one generally assumes that the control horizon is sufficiently long so that the linear tail cost adequately approximates the infinite horizon solution.
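As an illustration of optimization (13.1)-(13.3), here is a minimal single-shooting sketch (assumed Python/SciPy; the model, stage cost, horizon, and input bounds are illustrative, and the rate constraint M^u Δu_k ≤ m^u is omitted for brevity). It is not the feasibility-perturbed SQP implementation of the NMPC toolbox [150]:

import numpy as np
from scipy.optimize import minimize

# Single-shooting sketch of the regulator (13.1)-(13.3) with a quadratic
# stage cost L and terminal penalty P; all numerical values are illustrative.
F = lambda x, u: np.array([0.9 * x[0] + 0.1 * u, 0.5 * x[1] + u])
L = lambda x, u: x @ np.diag([1.0, 1.0]) @ x + 0.1 * u * u
P = np.eye(2)                           # tail-cost penalty
Nc, x0 = 10, np.array([1.0, -0.5])

def cost(useq):
    x, J = x0.copy(), 0.0
    for u in useq:                      # roll the model forward (13.2)
        J += L(x, u)
        x = F(x, u)
    return J + x @ P @ x                # tail cost x_{Nc}^T P x_{Nc}

# Simple input bounds stand in for the polytopic constraint D u_k <= d.
res = minimize(cost, np.zeros(Nc), bounds=[(-1.0, 1.0)] * Nc)
u_opt = res.x                           # apply u_opt[0], then re-solve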

13.2 Disturbance Models for Nonlinear Models

For offset-free control, we must account for discrepancies between the plant and the model. The general strategy for doing so requires augmentation of the state with a disturbance model. In general, one models the disturbance as a random walk that influences either the manipulated inputs or the observed outputs, as follows:

\begin{aligned}
x_{k+1} &= F(x_k, u_k + X_u d_k) + G w_k &&(13.4)\\
y_k &= h(x_k) + X_y d_k + v_k &&(13.5)\\
d_{k+1} &= d_k + \xi_k &&(13.6)\\
w_k &\sim N(0, Q) &&(13.7)\\
v_k &\sim N(0, R) &&(13.8)\\
\xi_k &\sim N(0, Q_d) &&(13.9)
\end{aligned}

in which

• dk is the integrated disturbance vector at time tk,

• wk is a N(0, Q) state noise vector at time tk (N(0, Q) denotes a normal distribution with mean 0 and covariance Q),

• G is the full-column rank state noise matrix,

• Xu is the input disturbance matrix,

• Xy is the output disturbance matrix,


• F (xk, uk) is the solution to a first principles, differential equation model,

• yk is the system measurement at time tk,

• h is a (possibly) nonlinear function of x,

• vk is a N (0, R) noise at time tk, and

• ξk is a N (0, Qd) noise at time tk.

Such a model implies that dk is stochastic. For output disturbance models, dk should remain roughly constant over the estimation horizon; otherwise, the tacit assumption is that the system is not modeled sufficiently well, in which case increasing the estimation horizon offers no tangible benefit because the model predictions are not reliable.
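A minimal sketch of this augmentation for the output-disturbance case (X_u = 0, X_y = I in equations (13.4)-(13.6)); the model functions and dimensions are illustrative placeholders:

import numpy as np

# Sketch of augmenting the state with an output disturbance: the augmented
# state is z = [x; d], where d follows a random walk (13.6).
def F_aug(z, u, w, xi, F, nd):
    x, d = z[:-nd], z[-nd:]
    return np.concatenate([F(x, u) + w, d + xi])   # (13.4) with X_u = 0, (13.6)

def h_aug(z, h, nd):
    x, d = z[:-nd], z[-nd:]
    return h(x) + d                                # (13.5) with X_y = I

F = lambda x, u: 0.9 * x + np.array([0.1 * u])     # illustrative scalar model
h = lambda x: x.copy()
z = np.array([1.0, 0.0])                           # [x; d]
z_next = F_aug(z, 0.5, np.zeros(1), np.zeros(1), F, nd=1)
y_pred = h_aug(z, h, nd=1)

An estimator (EKF or MHE) applied to (F_aug, h_aug) then produces the disturbance estimate dk used by the target calculation.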

Figure 13.2: Exothermic CSTR diagram (feed cAf, Tf; reaction A → B; reactor temperature T; coolant temperature Tc).

Steady State    cA (mol/l)    Output T (K)    Disturbance d (K)
1               0.851         326.2            23.8
2               0.583         344.4             5.6
3               0.177         371.8           −21.8

Table 13.1: Model steady states for a plant with Tc = 300 K, T = 350 K.

13.2.1 Plant-model Mismatch: Exothermic CSTR Example

We consider the exothermic CSTR shown in Figure 13.2. This example was motivated by a similar example in Tenny [146]. The state, input, and measurement are

\begin{aligned}
x &= \begin{bmatrix} c_A \\ T \end{bmatrix} &&(13.11)\\
u &= T_c &&(13.12)\\
230\ \mathrm{K} &\leq u_k \leq 427\ \mathrm{K} &&(13.13)\\
|\Delta u_k| &\leq 15\ \mathrm{K} &&(13.14)\\
y_k &= \begin{bmatrix} 0 & 1 \end{bmatrix} x_k + d_k &&(13.15)
\end{aligned}

in which cA is the concentration of species A, T is the reactor temperature, and Tc is the coolant temperature. We induce a small mismatch in activation energy between the plant and the model. Here, the output disturbance model generates multiple steady states in the estimator for a given range of the input Tc, as seen in Figure 13.3. Table 13.1 presents the exact values of these optima. The question of interest, then, is whether or not these optima affect the overall control performance of the system.

[Figure: output T (K) versus input Tc (K); curves shown for the model (offset by +dk and −dk), the plant, and the set point.]

Figure 13.3: Steady states for the exothermic CSTR example.

[Figure: feed temperature (K) and feed concentration (mol/l) versus time (hr).]

Figure 13.4: Exothermic CSTR feed disturbance.

We consider a disturbance in the feed given by Figure 13.4. We examine the closed-loop performance given the following estimators:

1. EKF (Qd = 0.25)

2. MHE with N = 2, smoothing update (Qd = 0.25)

3. MHE with N = 10, no initial penalty (uMHE; Qd = 10^{-8})

4. MHE with N = 10, constant initial penalty (cMHE; Qd = 0.25, Π_{T−N|T} = diag(1, 10^2, 10^2))

We use nonlinear MPC with a prediction horizon of Nc = 60 and sampling interval Δt = 0.05 hours. The controller penalty matrices are

Q = \mathrm{diag}(0, 4), \quad R = 2 \quad (13.16)

Figure 13.5 presents the results of this comparison. The EKF causes plant ignition, whereas each MHE successfully rejects the disturbance without igniting the plant. MHE with a longer estimation horizon provides better disturbance rejection than MHE with a shorter horizon. Also, the output and input behavior of MHE with N = 2 and the EKF appear roughly identical through the first simulated hour, but the estimated states cA and T are slightly different, thus explaining the disparate closed-loop performances. Finally, the two MHEs with N = 10 provide very similar input and output trajectories even though the estimated states are substantially different. In fact, the two apparently have different steady-state attractors (compare the estimates in Figure 13.5 with the steady states of Table 13.1). Note that we did not present results for MHE with N = 10 and the smoothing update. For this particular case, the smoothing update leads to very large penalties on the a priori estimate of the initial state in the horizon, and subsequently yields very poor estimation. Of course, the current benchmark in disturbance rejection is linear MPC. Therefore, we linearize the model around unstable steady state 2, employ an output disturbance model with the same tuning as above, and use a Kalman filter for state estimation. Figure 13.6 compares the best nonlinear results, MHE with N = 10, to the linear MPC result. Nonlinear MPC appears to provide little if any improvement over linear MPC. Additionally, the computational expense of nonlinear MPC is at least two orders of magnitude greater than that of linear MPC.

13.2.2 Maximum Yield Example

Consider the CSTR in Figure 13.7 in which we would like to maximize the yield of the intermediate B. This example was motivated by a similar example in Tenny [146]. The state, inputs, disturbance model, measurement, and state-evolution equations are given by

\begin{aligned}
x &= \begin{bmatrix} c_A & c_B \end{bmatrix}^T &&(13.17)\\
u &= \begin{bmatrix} T_c & c_{Af} \end{bmatrix}^T &&(13.18)\\
u_k &= \begin{bmatrix} T_c \\ c_{Af} \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \end{bmatrix} d_k &&(13.19)\\
y_k &= \begin{bmatrix} 0 & 1 \end{bmatrix} x_k &&(13.20)\\
\frac{dc_A}{dt} &= \frac{F}{V} (c_{A,f} - c_A) - k_{1,0} \exp\left( -\frac{E_1}{RT} \right) c_A &&(13.21)\\
\frac{dc_B}{dt} &= k_1 c_A - k_{2,0} \exp\left( -\frac{E_2}{RT} \right) c_B - \frac{F}{V} c_B &&(13.22)
\end{aligned}

in which k_1 = k_{1,0} \exp(-E_1/(RT)).
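For reference, a sketch integrating the balances (13.21)-(13.22) with the Table 13.2 parameter values, assuming the standard Arrhenius form k_i = k_{i,0} exp(−E_i/(RT)); the temperature, feed concentration, initial condition, and time horizon below are illustrative, not values from the text:

import numpy as np
from scipy.integrate import solve_ivp

# Integrate the series-reaction balances (13.21)-(13.22) at fixed T and cAf.
F_over_V = 100.0 / 100.0                  # F = 100, V = 100 (Table 13.2)
k10, k20 = 7.2e10, 5.2e10                 # pre-exponential factors (Table 13.2)
E1R, E2R = 8750.0, 9700.0                 # activation energies over R (Table 13.2)

def rhs(t, c, T, cAf):
    cA, cB = c
    k1 = k10 * np.exp(-E1R / T)           # assumed Arrhenius form
    k2 = k20 * np.exp(-E2R / T)
    dcA = F_over_V * (cAf - cA) - k1 * cA
    dcB = k1 * cA - k2 * cB - F_over_V * cB
    return [dcA, dcB]

sol = solve_ivp(rhs, (0.0, 5.0), [0.5, 0.0], args=(350.0, 1.0), max_step=0.01)
cA_final, cB_final = sol.y[:, -1]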

[Figure: four panels versus time (hr): output T (K), input Tc (K), concentration cA (mol/l), and temperature T (K), comparing the EKF, MHE with N = 2, uMHE with N = 10, and cMHE with N = 10.]

Figure 13.5: Exothermic CSTR results: rejection of a feed disturbance using an output disturbance model.

[Figure: output T (K) and input Tc (K) versus time (hr) for LMPC and cMHE with N = 10, with the set point shown.]

Figure 13.6: Exothermic CSTR: comparison of best nonlinear results to linear MPC results.

Due to plant-model mismatch, we opt to operate at a set point less than the true maximum for a given cAf value. Given an input disturbance on the coolant temperature Tc, multiple optima again arise in the estimator, as demonstrated by Figure 13.8. For this example, we consider the temporary disturbance in the measurement given by Figure 13.9. We examine the closed-loop performance given both the EKF and MHE. For the estimator, we choose the tuning parameters

\Pi_0 = \mathrm{diag}(0.1^2, 0.1^2, 1), \quad Q_d = 1, \quad R = 10^{-8} \quad (13.24)

with no state noise. The MHE implementation uses a short estimation horizon of N = 5 and the smoothing update. For the controller, we use a prediction horizon of Nc = 60 with a time increment of Δt = 0.05 hours. The controller penalty matrices are

Q = diag(0, 400), R = diag(0.5, 50) (13.25)

Figure 13.10 presents the results of this example. MHE successfully rejects the disturbance and returns the output to set point, whereas the EKF estimates cause a target-calculation failure that results in a shutdown of the process. The estimated state cA and disturbance trajectories demonstrate that the EKF experiences considerable difficulties resolving the sudden changes in the measurement caused by the output disturbance. Ultimately these difficulties lead to the failure of the target calculation.

[Figure: feed (Tf, cAf) enters the reactor; series reaction A → B → C at temperature T; set points cAf,set and Tset.]

Figure 13.7: Maximum yield CSTR.

Parameter    Value
F            100
V            100
k1,0         7.2 × 10^10
k2,0         5.2 × 10^10
E1/R         8750
E2/R         9700

Table 13.2: Maximum yield CSTR parameters.

[Figure: output cb (mol/l) versus input Tc (K); curves shown for the plant, the models, and the set point.]

Figure 13.8: Maximum yield CSTR steady states.

[Figure: output disturbance dk (mol/l) versus time (hr).]

Figure 13.9: Maximum yield CSTR: temporary output disturbance.

[Figure: three panels versus time (hr): (a) output cB (mol/l), (b) concentration cA (mol/l), and (c) estimated disturbance (l/hr), each comparing the EKF and MHE; the EKF trajectories end where the target calculation fails.]

Figure 13.10: Maximum yield CSTR results: (a) measurement cB, (b) estimated state cA, and (c) estimated disturbance. EKF and MHE denote extended Kalman filter and moving-horizon estimator, respectively.

13.3 Conclusions

In this chapter, we demonstrated how integrated disturbances used in conjunction with nonlinear models can induce multiple optima in the estimator. The two examples clearly illustrated that the quality of feedback depends on the quality of the state estimate: MHE exhibited superior performance to the EKF when used in conjunction with nonlinear MPC. Additionally, we observed that increasing the MHE horizon length led to improved closed-loop performance at the expense of increased computational burden. Finally, for the exothermic CSTR example, we obtained no significant improvement of nonlinear over linear control for disturbance rejection. This result provides a preliminary indication that realizing the expected improvement in disturbance rejection with a nonlinear model requires a better disturbance model.

Notation

D: input constraint matrix
d: input constraint vector
H: state constraint matrix
h: state constraint vector
M: change-in-input constraint matrix
m: change-in-input constraint vector
N: control horizon length
P: penalty matrix for the terminal state in the control horizon
u: input vector
x: state vector

Chapter 14

Conclusions

This thesis has addressed improving and applying stochastic and deterministic methods for modeling chemically reacting systems. The three primary focuses of this thesis were simulating and using stochastic simulations, deriving and applying deterministic population balance models, and applying and improving moving-horizon state estimation. In this chapter, we briefly recap the primary contributions made to each of these topics and outline future avenues of research.

Chapters 4 through 8 considered stochastic models with an emphasis on chemical kinetics: how to efficiently simulate such models, and how to maximize the utility of these models. For these models, exact simulation methods can be computationally expensive because the computational expense scales linearly with the number of reaction events. Additionally, little is currently known about how to efficiently extract information from these models (the so-called systems-level tasks). Chapter 4 considered approximations for more efficient simulation of stochastic chemical kinetics models governed by the discrete master equation. By first partitioning reactions into sets of fast and slow reactions and then making either an equilibrium, Langevin, or deterministic approximation for the fast reactions, we were able to derive coupled master equations that approximated the complete master equation. These derivations led to simulation strategies that can significantly decrease the computational expense of evaluating these models while still accurately reconstructing moments of the exact probability distribution. Chapter 5 considered biased approximations for mean sensitivities of the discrete master equation given only simulation data. Here, we proposed a suitable approximation (first-order error with respect to the mean) that required insignificant computational effort in comparison to finite difference methods. These approximate sensitivities enabled efficient execution of systems-level tasks because optimization algorithms generally converge without exact gradients. In Chapter 6, we investigated methods for computing unbiased mean sensitivities. We first explained why calculating unbiased mean sensitivities for simulations governed by discrete master equations is difficult: namely, the interchange of the differentiation and expectation operators required to compute the sensitivity is not valid for a finite number of Monte Carlo reconstructions. To overcome this problem, we applied smoothed perturbation analysis to evaluate sensitivities for discrete-time, state-dependent Markov chain models. To account for the effect of parameters on the timing of continuous events, we introduced the novel technique of smoothing by time integration. These two methods (smoothed perturbation analysis and smoothing by integration) can be combined to estimate sensitivities for the problem of interest, simulations of stochastic chemical kinetics governed by the discrete master equation. However, problems arise in implementing the smoothed perturbation analysis, making this method more expensive to evaluate than the biased sensitivity estimates proposed in the previous chapter. In Chapter 7, we proposed a novel method for calculating exact sensitivities for simulations of stochastic differential equations. For this case, the simulated sample paths are continuous, so we calculated sensitivities by simply differentiating the sample paths with respect to the parameters.
We also demonstrated how these sensitivities could be used to efficiently perform systems-level tasks, including steady-state analysis and parameter estimation. However, the results showed little improvement over finite difference methods for fixed-time-step integration schemes. Finally, Chapter 8 applied many of the techniques for simulating and using discrete simulations to model batch crystallization systems. Here the primary contribution consisted of demonstrating that stochastic simulation provides a flexible solution technique for examining many possible reaction mechanisms. A second contribution was showing that optimization of the stochastic model is feasible and requires relatively few evaluations of the model. We see many areas for future work in the area of stochastic simulation, including:

• robust software packages that can

  1. adaptively partition reactions into fast and slow subsets, applying appropriate approximations for each subset, and

  2. adaptively control the error at each step;

• rigorous error analysis for all of the approximations in comparison to the solution of the original master equation; and

• efficient methods for evaluating unbiased estimates of mean sensitivities for the discrete master equation governing stochastic chemical kinetics.

Chapters 9 through 11 examined population balance models for virus infections. Currently, most modelers focus solely on either the extracellular or the intracellular level, even though many in vitro and in vivo experiments involve interactions between the two levels. Population balance models can incorporate both levels of information, but solving these models numerically can prove computationally expensive. In Chapter 9, we first derived a population balance model that incorporated intracellular and extracellular levels of information. These models permit differentiating between cells in the population. We then compared this model to other simpler models, such as extracellular models and models that assume all cells in the population are identical. The results demonstrated that the cell population balance models can more intuitively account for experimentally observed phenomena than these simpler models, such as multiple rounds of infection and pharmacokinetic delays associated with drug treatments of infections. Chapter 10 considered modeling experimental data from the focal infection system for the vesicular stomatitis virus. Here, our emphasis was understanding the dynamics of multiple rounds of virus infection and antiviral host response. For host cells without an antiviral response, namely baby hamster kidney cells, extracellular models adequately described the dynamics contained in the experimental measurements. In this case, the model suggested that an initial condition effect possibly resulting from the experimental technique led to salient features of the experimental data. For host cells with an interferon antiviral response, in this case murine astrocytoma cells, an age-segregated population balance model best described the experimental measurements. Here, the model suggested intracellular production rates of both virus and interferon. However, combinations of parameters fit the data equally well. Consequently, additional measurements are required to uniquely determine all parameters in the model. Chapter 11 revisited the formulation of the population balance model and proposed a decomposition for solving these models when flow of information is restricted from the extracellular to the intracellular level. As demonstrated by the examples, this decomposition permits efficient and accurate solution of the population balance model. Additionally, the model results can be used to predict population-level measurements of intracellular species.
Accounting for such a graded response requires more efficient methods for solving coupled integro-partial differential equations than the ones presented in this thesis.

Chapters 12 and 13 considered the state estimation problem of determining the maximum a posteriori state given dynamic process measurements and nonlinear system models. The current industrial standard for this estimation problem, the extended Kalman filter, is computationally efficient but treats the a posteriori distribution as approximately normal. This approximation is not appropriate for multimodal a posteriori distributions, but it is not clear if such distributions arise in chemically reacting systems. In Chapter 12, we examined different probabilistic observers that approximately solve this problem, namely the extended Kalman filter, Monte Carlo observers, and moving-horizon estimation. We outlined conditions in which multiple modes can appear in the a posteriori distribution, and demonstrated that the judicious use of constraints and nonlinear optimization as employed by moving-horizon estimation can lead to significantly better state estimates than the extended Kalman filter. Additionally, we proposed that Monte Carlo methods could be used to provide improved estimates of the arrival cost in the moving-horizon formulation. In Chapter 13, we considered the performance of both moving-horizon estimation and the extended Kalman filter in closed-loop feedback control. Here, using moving-horizon estimation provided superior closed-loop performance to using the extended Kalman filter for cases in which the estimation problem exhibited multiple optima. However, we saw little difference in the performance of nonlinear control versus linear control for the case of disturbance rejection. We attribute this phenomenon to the lack of a properly-tuned disturbance model. The primary areas for future work in moving-horizon estimation include

• exploring the benefit of using Monte Carlo observers to approximate the arrival cost function;

• distinguishing between local optima arising in the estimation problem, ideally through application of global optimization;

• reducing the computational expense of the optimization required to solve the moving- horizon estimation problem; and

• improving the estimation performance by accurately identifying covariance matrices from experimental data.

In conclusion, this thesis has addressed both stochastic and deterministic models for chemically reacting systems, as well as how to best extract information from these models for purposes other than pure simulation. We believe that the simulation and systems-level tasks developed here should prove useful in modeling and understanding dynamic physical systems. Additionally, we hope that the modeling of multiple rounds of viral infections and host antiviral response will provide an integrated, quantitative understanding of how these infections propagate, and how to best control this propagation.

Bibliography

[1] N. R. Abu-Absi, A. Zamamiri, J. Kacmar, S. J. Balogh, and F. Srienc. Automated flow cytometry for acquisition of time-dependent population data. Cytometry, 51A(2):87–96, February 2003.

[2] D. L. Alspach and H. W. Sorenson. Nonlinear Bayesian estimation using Gaussian sum approximations. IEEE Transactions on Automatic Control, AC-17(4):439–448, 1972.

[3] A. Arkin, J. Ross, and H. McAdams. Stochastic kinetic analysis of developmental path- way bifurcation in phage lambda-infected Escherichia coli cells. Genetics, 149(4):1633– 1648, August 1998.

[4] A. Armaou and I. G. Kevrekidis. Optimal switching policies using coarse timesteppers. In Proceedings of the IEEE Conference on Decision and Control, Maui, Hawaii, December 2003.

[5] A. Armaou, C. I. Siettos, and I. G. Kevrekidis. Time-steppers and control of microscopic distributed processes. Accepted for publication in Int. J. Robust Nonlinear Control, 2003.

[6] J. E. Bailey and D. F. Ollis. Biochemical Engineering Fundamentals. McGraw-Hill, New York, 1986.

[7] L. A. Ball, C. R. Pringle, B. Flanagan, V. P. Perepelitsa, and G. W. Wertz. Phenotypic consequences of rearranging the P, M, and G genes of vesicular stomatitis virus. J. Virol., 73(6):4705–4712, June 1999.

[8] R. Bandyopadhyaya, R. Kumar, K. S. Gandhi, and D. Ramkrishna. Modeling of precipi- tation in reverse micellar systems. Langmuir, 13:3610–3620, 1997.

[9] J. Bell, B. Lichty, and D. Stojdl. Getting oncolytic virus therapies off the ground. Cancer Cell, 4(1):7–11, July 2003.

[10] W. E. Bentley, B. Kebede, T. Franey, and M.-Y. Wang. Segregated characterization of recombinant epoxide hydrolase synthesis via the baculovirus/insect cell expression sys- tem. Chemical Engineering Science, 49(24A):4133–4141, December 1994.

[11] R. B. Bird, W. E. Stewart, and E. N. Lightfoot. Transport Phenomena. John Wiley & Sons, New York, 1960. 282

[12] E. Bølviken, P. J. Acklam, N. Christopherson, and J.-M. Størdal. Monte Carlo filters for non-linear state estimation. Automatica, 37(2):177–183, February 2001.

[13] S. Bonhoeffer, R. M. May, G. M. Shaw, and M. A. Nowak. Virus dynamics and drug therapy. Proc. Natl. Acad. Sci. USA, 94(13):6971–6976, June 1997.

[14] G. E. P. Box and G. C. Tiao. Bayesian Inference in Statistical Analysis. Addison–Wesley Publishing Company, Reading, Massachusetts, 1st edition, 1973.

[15] P. N. Brown, A. C. Hindmarsh, and L. R. Petzold. Using Krylov methods in the solu- tion of large-scale differential-algebraic systems. SIAM Journal on Scientific Computing, 15(6):1467–1488, November 1994.

[16] Y. Cao, D. T. Gillespie, and L. R. Petzold. The slow-scale stochastic simulation algorithm. Journal of Chemical Physics, 122(1):014116, January 2005.

[17] M. Caracotsios and W. E. Stewart. Sensitivity analysis of initial value problems with mixed ODEs and algebraic equations. Computers & Chemical Engineering, 9(4):359–365, 1985.

[18] C. G. Cassandras and S. Lafortune. Introduction to Discrete Event Systems. The Kluwer In- ternational Series in Engineering and Computer Science. Kluwer Academic Publishers, Boston, MA, 1999.

[19] C. G. Cassandras, Y. Wardi, B. Melamed, G. Sun, and C. G. Panayiotou. Perturbation analysis for on-line control and optimization of stochastic fluid models. IEEE Transac- tions on Automatic Control, AC-47(8):1234–1248, 2002.

[20] M. Chaves and E. Sontag. State-estimators for chemical reaction networks of Feinberg- Horn-Jackson zero deficiency type. European Journal of Control, 8(4):343–359, 2002.

[21] C.-T. Chen. Linear System Theory and Design. Oxford University Press, 3rd edition, 1999.

[22] W. S. Chen, S. Ungarala, B. Bakshi, and P. Goel. Bayesian rectification of nonlinear dy- namic processes by the weighted bootstrap. In AIChE Annual Meeting, Reno, Nevada, 2001.

[23] D. K. Dacol and H. Rabitz. Sensitivity analysis of stochastic kinetic models. J. Math. Phys., 25(9):2716–2727, September 1984.

[24] W. M. Deen. Analysis of Transport Phenomena. Topics in chemical engineering. Oxford University Press, Inc., New York, 1998.

[25] T. O. Drews, R. D. Braatz, and R. C. Alkire. Parameter sensitivity analysis of Monte Carlo simulations of copper electrodeposition with multiple additives. Journal of The Electrochemical Society, 150(11):C807–C812, November 2003. 283

[26] K. A. Duca, V. Lam, I. Keren, E. E. Endler, G. J. Letchworth, I. S. Novella, and J. Yin. Quantifying viral propagation in vitro: Toward a method for characterization of complex phenotypes. Biotechnology Progress, 17(6):1156–1165, November–December 2001.

[27] M. Eigen, C. K. Biebricher, M. Gebinoga, and W. C. Gardiner. The hypercycle. Cou- pling of RNA and protein biosynthesis in the infection cycle of an RNA bacteriophage. Biochemistry, 30(46):11005–11018, November 1991.

[28] E. E. Endler, K. A. Duca, P. F. Nealey, G. M. Whitesides, and J. Yin. Propagation of viruses on micropatterned host cells. Biotechnology and Bioengineering, 17(6):1156–1165, November–December 2003.

[29] D. Endy, D. Kong, and J. Yin. Intracellular kinetics of a growing virus: A genetically structured simulation for bacteriophage T7. Biotechnology and Bioengineering, 55(2):375– 389, July 1997.

[30] D. Endy and J. Yin. Toward antiviral strategies that resist viral escape. Antimicrobial Agents and Chemotherapy, 44(4):1097–1099, April 2000.

[31] A. M. Fendrick, A. S. Monto, B. Nightengale, and M. Sarnes. The economic burden of non-influenza-related viral respiratory tract infection in the United States. Archives of Internal Medicine, 163(4):487–494, February 2003.

[32] R. J. Field and R. M. Noyes. Oscillations in chemical systems. IV. Limit cycle behavior in a model of a real chemical reaction. Journal of Chemical Physics, 60(5):1877–1884, March 1974.

[33] A. P. Fordyce and J. B. Rawlings. A segregated fermentation model for growth and differentiation of Bacillus licheniformis. AIChE Journal, 42(11):3241–3252, November 1996.

[34] J. Fort. A comment on amplification and spread of viruses in a growing plaque. Journal of Theoretical Biology, 214(3):515–518, February 2002.

[35] J. Fort and V. Méndez. Time-delayed spread of viruses in growing plaques. Phys. Rev. Lett., 89(17):178101–1–178101–4, October 2002.

[36] A. G. Fredrickson, D. Ramkrishna, and H. M. Tsuchiya. Statistics and dynamics of pro- caryotic cell populations. Mathematical Biosciences, 1:327–374, 1967.

[37] M. Fu and J.-Q. Hu. Conditional Monte Carlo: Gradient Estimation and Optimization Appli- cations. The Kluwer International Series in Engineering and Computer Science. Kluwer Academic Publishers, Boston, MA, 1997.

[38] M. A. Gallivan. Modeling and Control of Epitaxial Thin Film Growth. PhD thesis, California Institute of Technology, 2003. 284

[39] M. A. Gallivan and R. M. Murray. Model reduction and system identification for master equation control systems. In Proceedings of the American Control Conference, pages 3561– 3566, Denver, Colorado, June 2003.

[40] T. C. Gard. Introduction to Stochastic Differential Equations. Marcel Dekker, Inc., 1988.

[41] C. W. Gardiner. Handbook of Stochastic Methods for Physics, Chemistry, and the Natural Sciences. Springer-Verlag, Berlin, Germany, 2nd edition, 1990.

[42] A. Genz and R. E. Kass. A collection of numerical integration software for Bayesian anal- ysis. Available from http://www.sci.wsu.edu/math/faculty/genz/homepage, 1998.

[43] M. A. Gibson and J. Bruck. Efficient exact stochastic simulation of chemical systems with many species and many channels. Journal of Physical Chemistry A, 104:1876–1889, 2000.

[44] M. A. Giedlin, D. N. Cook, and T. W. Dubensky, Jr. Vesicular stomatitis virus: An exciting new therapeutic oncolytic virus candidate for cancer or just another chapter from Field’s Virology? Cancer Cell, 4(4):241–243, October 2003.

[45] D. T. Gillespie. A general method for numerically simulating the stochastic time evolu- tion of coupled chemical reactions. Journal of Computational Physics, 22:403–434, 1976.

[46] D. T. Gillespie. Exact stochastic simulation of coupled chemical reactions. Journal of Physical Chemistry, 81:2340–2361, 1977.

[47] D. T. Gillespie. Markov Processes: An Introduction for Physical Scientists. Academic Press, Inc., 1992.

[48] D. T. Gillespie. A rigorous derivation of the chemical master equation. Physica A, 188:404–425, 1992.

[49] D. T. Gillespie. The chemical Langevin equation. Journal of Chemical Physics, 113(1):297– 306, 2000.

[50] D. T. Gillespie. Approximate accelerated stochastic simulation of chemically reacting systems. Journal of Chemical Physics, 115(4):1716–1733, July 2001.

[51] D. T. Gillespie and L. R. Petzold. Improved leap-size selection for accelerated stochastic simulation. Journal of Chemical Physics, 119(16):8229–8234, October 2003.

[52] J. R. Gooch and M. J. Hounslow. Monte Carlo simulation of size-enlargement mecha- nisms in crystallization. AIChE Journal, 42(7):1864–1874, 1996.

[53] N. Gordon, D. Salmond, and A. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F-Radar and Signal Processing, 140(2):107–113, April 1993. 285

[54] N. Grandvaux, B. R. tenOever, M. J. Servant, and J. Hiscott. The interferon antiviral response: from viral invasion to evasion. Current Opinion in Infectious Diseases, 15(3):259– 267, June 2002.

[55] R. Gudi, S. Shah, and M. Gray. Multirate state and parameter estimation in an antibi- otic fermentation with delayed measurements. Biotechnology and Bioengineering, 44:1271– 1278, 1994.

[56] E. L. Haseltine, D. B. Patience, and J. B. Rawlings. On the stochastic simulation of par- ticulate systems. Accepted for publication in ChE. Sci., 2004.

[57] E. L. Haseltine and J. B. Rawlings. Approximate simulation of coupled fast and slow reactions for stochastic chemical kinetics. Journal of Chemical Physics, 117(15):6959–6969, October 2002.

[58] E. L. Haseltine and J. B. Rawlings. A critical evaluation of extended Kalman filtering and moving horizon estimation. Technical Report 2002–03, TWMCC, Department of Chemical Engineering, University of Wisconsin-Madison, August 2002.

[59] E. L. Haseltine and J. B. Rawlings. A critical evaluation of extended Kalman filtering and moving horizon estimation. Accepted for publication in Industrial & Engineering Chemistry Research, 2004.

[60] E. L. Haseltine, J. B. Rawlings, and J. Yin. Dynamics of viral infections: Incorporating both the intracellular and extracellular levels. Computers & Chemical Engineering, 2005. In press.

[61] J. He, H. Zhang, J. Chen, and Y. Yang. Monte Carlo simulation of kinetics and chain length distributions in living free-radical polymerization. Macromolecules, 30(25):8010– 8018, December 15 1997.

[62] A. V. M. Herz, S. Bonhoeffer, R. M. Anderson, R. M. May, and M. A. Nowak. Viral dynamics in vivo: Limitations on estimates of intracellular delay and virus decay. Proc. Natl. Acad. Sci. USA, 93:7247–7251, July 1996.

[63] Y.-C. Ho and X.-R. Cao. Perturbation Analysis of Discrete Event Dynamic Systems. Kluwer Academic Press, Boston, 1991.

[64] J. J. Holland, L. P. Villarreal, and M. Breindl. Factors involved in the generation and replication of rhabdovirus defective T particles. J. Virol., 17(3):805–815, March 1976.

[65] H. M. Hulburt and S. Katz. Some problems in particle technology: A statistical mechan- ical formulation. Chemical Engineering Science, 19:555–574, 1964.

[66] Y. Husimi, K. Nishigaki, Y. Kinoshita, and T. Tanaka. Cellstat - a continuous culture system of a bacteriophage for the study of the mutation rate and the selection process of the DNA level. Review of Scientific Instruments, 53(4):517–522, 1982. 286

[67] F. J. Isaacs, J. Hasty, C. R. Cantor, and J. J. Collins. Prediction and measurement of an autoregulatory genetic module. Proc. Natl. Acad. Sci. USA, 100(13):7714–7719, June 2003.

[68] A. P. J. Jansen. Monte Carlo simulations of chemical reactions on a surface with time- dependent reaction-rate constants. Comput. Phys. Commun., 86:1–12, 1995.

[69] J. A. M. Janssen. The elimination of fast variables in complex chemical reactions. II. Mesoscopic level (reducible case). J. Stat. Phys., 57(1/2):171–185, 1989.

[70] J. A. M. Janssen. The elimination of fast variables in complex chemical reactions. III. Mesoscopic level (irreducible case). J. Stat. Phys., 57(1/2):187–198, 1989.

[71] A. H. Jazwinski. Stochastic Processes and Filtering Theory. Academic Press, New York, 1970.

[72] D. G. Kendall. Stochastic processes and population growth. Journal of the Royal Statistical Society: Series B, 11:230–264, 1949.

[73] A. Knijnenburg and U. Kreischer. Discrete simulation of replication of a RNA- bacteriophage prototype system. In K. Bellman, editor, Molecular Genetics Information Systems: Modelling and Simulation, pages 267–290, Berlin, 1983. Akademie-Verlag.

[74] D. Kong and J. Yin. Whole-virus vaccine development by continuous-culture on a com- plementing host. Biotechnology, 13(6):583–586, June 1995.

[75] E. Kreyszig. Advanced Engineering Mathematics. John Wiley & Sons, New York, 8th edi- tion, 1999.

[76] T. G. Kurtz. The relationship between stochastic and deterministic models for chemical reactions. Journal of Chemical Physics, 57(7):2976–2978, 1972.

[77] V. Lam, K. A. Duca, and J. Yin. Arrested spread of vesicular stomatitis virus infections in vitro depends on interferon-mediated antiviral activity. Biotech. Bioeng., In press, 2005.

[78] I. J. Laurenzi. Stochastic Processes in Biological and Biochemical Kinetics. PhD thesis, Uni- versity of Pennsylvania, October 2002.

[79] I. J. Laurenzi and S. L. Diamond. Monte Carlo simulation of the heterotypic aggregation kinetics of platelets and neutrophils. Biophysical Journal, 77:1733–1746, 1999.

[80] P. Licari and J. E. Bailey. Modeling the population dynamics of baculovirus-infected insect cells: Optimizing infection strategies for enhanced recombinant protein yields. Biotechnology and Bioengineering, 39(4):432–441, February 1992.

[81] Y. Lou and P. D. Christofides. Estimation and control of surface roughness in thin film growth using kinetic Monte-Carlo models. Chemical Engineering Science, 58(14):3115– 3129, July 2003. 287

[82] Y. Lou and P. D. Christofides. Feedback control of growth rate and surface roughness in thin film growth. AIChE Journal, 49(8):2099–2113, August 2003.

[83] S. E. Luria. General Virology. John Wiley & Sons, New York, 1953.

[84] D. L. Ma, R. D. Braatz, and D. K. Tafti. Compartmental modeling of multidimensional crystallization. International Journal of Modern Physics B, 16(1–2):383–390, January 2002.

[85] A. G. Makeev, D. Maroudas, A. Z. Panagiotopoulos, and I. G. Kevrekidis. Coarse bifurcation analysis of kinetic Monte Carlo simulations: A lattice-gas model with lateral interactions. Journal of Chemical Physics, 117(18):8229–8240, November 2002.

[86] S. Manjunath, K. S. Gandhi, R. Kumar, and D. Ramkrishna. Precipitation in small systems–I. Stochastic analysis. Chemical Engineering Science, 49(9):1451–1463, 1994.

[87] N. V. Mantzaris, P. Daoutidis, and F. Srienc. Numerical solution of multi-variable cell population balance models: I. Finite difference methods. Computers & Chemical Engi- neering, 25(11–12):1411–1440, November 2001.

[88] N. V. Mantzaris, P. Daoutidis, and F. Srienc. Numerical solution of multi-variable cell population balance models. II. Spectral methods. Computers & Chemical Engineering, 25(11–12):1441–1462, November 2001.

[89] N. V. Mantzaris, P. Daoutidis, and F. Srienc. Numerical solution of multi-variable cell population balance models. III. Finite element methods. Computers & Chemical Engineer- ing, 25(11–12):1463–1481, November 2001.

[90] D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. M. Scokaert. Constrained model pre- dictive control: Stability and optimality. Automatica, 36(6):789–814, 2000.

[91] H. McAdams and A. Arkin. Simulation of prokaryotic genetic circuits. Annu. Rev. Biophys. Biomol. Struct., 27:199–224, 1998.

[92] B. J. McCoy. A new population balance model for crystal size distributions: re- versible, size-dependent growth and dissolution. Journal of Colloid and Interface Science, 240(1):139–149, 2001.

[93] S. A. Middlebrooks. Modelling and Control of Silicon and Germanium Thin Film Chemical Vapor Deposition. PhD thesis, University of Wisconsin–Madison, 2001.

[94] S. Munir and V. Kapur. Regulation of host cell transcriptional physiology by the avian pneumovirus provides key insights into host-pathogen interactions. Journal of Virology, 77(8):4899–4910, 2003.

[95] A. C. Nathwani, R. Benjamin, A. W. Nienhuis, and A. M. Davidoff. Current status and prospects for gene therapy. Vox Sanguinis, 87(2):73–81, August 2004. 288

[96] J. C. Nichol and H. F. Deutsch. Biophysical studies of blood plasma proteins. VII. Sep- aration of γ-globulin from the sera of various animals. Journal of the American Chemical Society, 70(1):80–83, January 1948.

[97] J. Nocedal and S. J. Wright. Numerical Optimization. Springer-Verlag, New York, 1999.

[98] M. A. Nowak and R. M. May. Virus Dynamics: Mathematical Principles of Immunology and Virology. Oxford University Press, 2000.

[99] B. A. Ogunnaike and W. H. Ray. Process Dynamics, Modeling, and Control. Oxford Uni- versity Press, New York, 1994.

[100] A. S. Perelson. Modelling viral and immune system dynamics. Nature Reviews Immunol- ogy, 2(1):28–36, January 2002.

[101] A. S. Perelson, A. U. Neumann, M. Markowitz, J. M. Leonard, and D. D. Ho. HIV-1 dynamic in vivo: Virion clearance rate, infected cell life-span, and viral generation time. Science, 271(5255):1582–1586, March 1996.

[102] J. S. Porterfield, D. C. Burke, and A. C. Allison. An estimate of the molecular weight of interferon as measured by its rate of diffusion through agar. Virology, 12(2):197–203, October 1960.

[103] V. Prasad, M. Schley, L. P. Russo, and B. W. Bequette. Product property and production rate control of styrene polymerization. Journal of Process Control, 12(3):353–372, 2002.

[104] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical Recipes in C. Cambridge University Press, Cambridge, 2nd edition, 1992.

[105] S. Raimondeau, P. Aghalayam, A. B. Mhadeshwar, and D. G. Vlachos. Parameter opti- mization of molecular models: Application to surface kinetics. Industrial and Engineering Chemistry Research, 42(6):1174–1183, March 2003.

[106] D. Ramkrishna. Analysis of population balance—IV: The precise connection between Monte Carlo simulation and population balances. Chemical Engineering Science, 36:1203– 1209, 1981.

[107] D. Ramkrishna. Population Balances. Academic Press, San Diego, 2000.

[108] D. Ramkrishna and J. D. Borwanker. A puristic analysis of population balance–I. Chem- ical Engineering Science, 28:1423–1435, 1973.

[109] D. Ramkrishna and J. D. Borwanker. A puristic analysis of population balance–II. Chem- ical Engineering Science, 29:1711–1721, 1974.

[110] A. D. Randolph and M. A. Larson. Transient and steady-state size distributions in con- tinuous mixed suspension crystallizers. AIChE Journal, 8(5):639–645, 1962. 289

[111] A. D. Randolph and E. T. White. Modeling size dispersion in the prediction of crystal- size distribution. Chemical Engineering Science, 32:1067–1076, 1977.

[112] C. V. Rao. Moving Horizon Strategies for the Constrained Monitoring and Control of Nonlinear Discrete-Time Systems. PhD thesis, University of Wisconsin–Madison, 2000.

[113] C. V. Rao and A. P. Arkin. Stochastic chemical kinetics and the quasi-steady-state assumption: Application to the Gillespie algorithm. Journal of Chemical Physics, 118(11):4999–5010, March 2003.

[114] C. V. Rao and J. B. Rawlings. Constrained process monitoring: moving-horizon ap- proach. AIChE Journal, 48(1):97–109, January 2002.

[115] C. V. Rao, J. B. Rawlings, and J. H. Lee. Constrained linear state estimation – a moving horizon approach. Automatica, 37(10):1619–1628, 2001.

[116] C. V. Rao, J. B. Rawlings, and D. Q. Mayne. Constrained state estimation for nonlinear discrete-time systems: stability and moving horizon approximations. IEEE Transactions on Automatic Control, 48(2):246–258, February 2003.

[117] M. Rathinam, L. R. Petzold, Y. Cao, and D. T. Gillespie. Stiffness in stochastic chem- ically reacting systems: The implicit tau-leaping method. Journal of Chemical Physics, 119(24):12784–12794, December 2003.

[118] J. Rawlings. Tutorial overview of model predictive control. IEEE Control Systems Maga- zine, 20:38–52, 2000.

[119] J. B. Rawlings. Tutorial: Model predictive control technology. In Proceedings of the Amer- ican Control Conference, San Diego, CA, pages 662–676, 1999.

[120] J. B. Rawlings and J. G. Ekerdt. Chemical Reactor Analysis and Design Fundamentals. Nob Hill Publishing, Madison, WI, 2002.

[121] J. B. Rawlings, W. R. Witkowski, and J. W. Eaton. Modelling and control of crystallizers. Powder Technology, 69:3–9, 1992.

[122] B. Reddy and J. Yin. Quantitative intracellular kinetics of HIV type 1. AIDS Research and Human Retroviruses, 15(3):273–283, February 1999.

[123] K. Reif, S. Günther, E. Yaz, and R. Unbehauen. Stochastic stability of the discrete-time extended Kalman filter. IEEE Transactions on Automatic Control, 44(4):714–728, April 1999.

[124] K. Reif, S. Günther, E. Yaz, and R. Unbehauen. Stochastic stability of the continuous-time extended Kalman filter. IEE Proceedings-Control Theory and Applications, 147(1):45–52, January 2000.

[125] K. Reif and R. Unbehauen. The extended Kalman filter as an exponential observer for nonlinear systems. IEEE Transactions on Signal Processing, 47(8):2324–2328, August 1999.

[126] H. Resat, H. S. Wiley, and D. A. Dixon. Probability-weighted dynamic Monte Carlo method for reaction kinetics simulations. Journal of Physical Chemistry B, 105(44):11026– 11034, 2001.

[127] R. G. Rice and D. D. Do. Applied mathematics and modeling for chemical engineers. Wiley Series in Chemical Engineering. John Wiley & Sons, Inc., New York, 1995.

[128] D. G. Robertson, J. H. Lee, and J. B. Rawlings. A moving horizon-based approach for least-squares state estimation. AIChE Journal, 42(8):2209–2224, August 1996.

[129] J. K. Rose and M. A. Whitt. Fundamental Virology, chapter Rhabdoviridae: The viruses and their replication, pages 665–688. Lippincott Williams & Wilkins, fourth edition, 2001.

[130] S. M. Ross. A first course in probability. Prentice Hall, Upper Saddle River, N. J., 5th edition, 1998.

[131] P. Royston. A remark on algorithm AS 181: The w-test for normality. Appl. Stat., 44(4):547–551, 1995.

[132] W. Rudin. Principles of Mathematical Analysis. McGraw-Hill, Inc., New York, third edi- tion, 1976.

[133] C. E. Samuel. Antiviral actions of interferons. Clinical Microbiology Reviews, 14(4):778– 809, October 2001.

[134] A. Schwienhorst, B. F. Lindemann, and M. Eigen. Growth kinetics of a bacteriophage in continuous culture. Biotechnology and Bioengineering, 50(2):217–221, April 1996.

[135] G. C. Sen. Viruses and interferons. Annu. Rev. Microbiol., 55:255–281, 2001.

[136] B. H. Shah, D. Ramkrishna, and J. D. Borwanker. Simulation of particulate systems using the concept of the interval of quiescence. AIChE Journal, 23(6):897–904, 1977.

[137] S. S. Shapiro and M. B. Wilk. An analysis of variance test for normality (complete sam- ples). Biometrika, 52(3–4):591–611, 1965.

[138] C. I. Siettos, A. Armaou, A. G. Makeev, and I. G. Kevrekidis. Microscopic/stochastic timesteppers and “coarse” control: A KMC example. AIChE Journal, 49(7):1922, 2003.

[139] C. I. Siettos, D. Maroudas, and I. G. Kevrekidis. Coarse bifurcation diagrams via micro- scopic simulators: A state-feedback control-based approach. Accepted for publication in Int. J. Bif. Chaos, 2003.

[140] B. W. Silverman. Density Estimation for Statistics and Data Analysis. Chapman and Hall, New York, 1986. 291

[141] M. Soroush. State and parameter estimations and their applications in process control. Computers & Chemical Engineering, 23(2):229–245, December 1998.

[142] J. C. Spall. Estimation via Markov chain Monte Carlo. IEEE Control Systems Magazine, 23(2):34–45, April 2003.

[143] R. Srivastava, L. You, J. Summers, and J. Yin. Stochastic vs. deterministic modeling of intracellular viral kinetics. Journal of Theoretical Biology, 218(3):309–321, October 2002.

[144] R. F. Stengel. Optimal Control and Estimation. Dover Publications, Inc., 1994.

[145] W. E. Stewart, M. Caracotsios, and J. P. Sørensen. Computer-Aided Modelling of Reactive Systems. 1999. In preparation.

[146] M. Tenny. Computational Strategies for Nonlinear Model Predictive Control. PhD thesis, University of Wisconsin–Madison, 2002.

[147] M. J. Tenny and J. B. Rawlings. State estimation strategies for nonlinear model predictive control. AIChE Annual Meeting, Reno, November 2001.

[148] M. J. Tenny and J. B. Rawlings. Efficient moving horizon estimation and nonlinear model predictive control. In Proceedings of the American Control Conference, pages 4475–4480, Anchorage, Alaska, May 2002.

[149] M. J. Tenny, J. B. Rawlings, and S. J. Wright. Closed-loop behavior of nonlinear model predictive control. AIChE Journal, 50(9):2142–2154, September 2004.

[150] M. J. Tenny, S. J. Wright, and J. B. Rawlings. Nonlinear model predictive control via feasibility-perturbed sequential quadratic programming. Computational Optimization and Applications, 28(1):87–121, April 2004.

[151] J. Tramper, E. J. Vandenend, C. D. Degooijer, R. Kompier, F. L. J. Vanlier, M. Usmany, and J. M. Vlak. Production of baculovirus in a continuous insect-cell culture - Bioreactor design, operation, and modeling. Annals of the New York Academy of Sciences, 589:423–430, May 1990.

[152] M. L. Tyler and M. Morari. Stability of constrained moving horizon estimation schemes. Preprint AUT96–18, Automatic Control Laboratory, Swiss Federal Institute of Technol- ogy, 1996.

[153] H. A. van der Vorst. Iterative Krylov Methods for Large Linear Systems. Number 13 in Cambridge Monographs on Applied and Computational Mathematics. Cambridge Uni- versity Press, New York, NY, 2003.

[154] N. G. van Kampen. Stochastic Processes in Physics and Chemistry. Elsevier Science Pub- lishers, Amsterdam, The Netherlands, 2nd edition, 1992. 292

[155] J. Villadsen and M. L. Michelsen. Solution of Differential Equation Models by Polynomial Approximation. Prentice-Hall, Englewood Cliffs New Jersey, 1978.

[156] D. G. Vlachos. Instabilities in homogeneous nonisothermal reactors: Comparison of deterministic and Monte Carlo simulations. Journal of Chemical Physics, 102(4):1781–1790, 1995.

[157] M. O. Vlad and A. Pop. A physical interpretation of age-dependent master equations. Physica A, 155(2):276–310, 1989.

[158] R. R. Wagner and A. S. Huang. Inhibition of RNA and interferon synthesis in Krebs-2 cells infected with vesicular stomatitis virus. Virology, 28(1):1–10, January 1966.

[159] Y. Wang and E. D. Sontag. Output-to-state stability and detectability of nonlinear sys- tems. Systems & Control Letters, 29:279–290, 1997.

[160] B. R. Ware, T. Raj, W. H. Flygare, J. A. Lesnaw, and M. E. Reichmann. Molecular weights of vesicular stomatitis virus and its defective particles by laser light-scattering spec- troscopy. J. Virol., 11(1):141–145, January 1973.

[161] G. W. Wertz and J. S. Youngner. Interferon production and inhibition of host synthesis in cells infected with vesicular stomatitis virus. J. Virol., 6(4):476–484, October 1970.

[162] D. O. White and F. J. Fenner. Medical Virology. Academic Press, fourth edition, 1994.

[163] D. I. Wilson, M. Agarwal, and D. Rippin. Experiences implementing the extended Kalman filter on an industrial batch reactor. Computers & Chemical Engineering, 22(11):1653–1672, 1998.

[164] D. Wodarz and M. A. Nowak. Mathematical models of HIV pathogenesis and treatment. BioEssays, 24(12):1178–1187, 2002.

[165] J. Yin and J. S. McCaskill. Replication of viruses in a growing plaque: a reaction-diffusion model. Biophysical Journal, 61(6):1540–1549, June 1992.

[166] L. You and J. Yin. Amplification and spread of viruses in a growing plaque. Journal of Theoretical Biology, 200(4):365–373, 1999.

[167] H. Yu and C. G. Cassandras. Perturbation analysis for production control and optimiza- tion of manufacturing systems. Automatica, 40(6):945–956, June 2004. 293

Vita

Eric Lynn Haseltine was born in Kingsport, Tennessee to Doug and Lydia Haseltine. In June 1995, he graduated as valedictorian of his class from Dobyns-Bennett High School in Kingsport. In May 1999, he graduated summa cum laude with departmental honors from Clemson University with a Bachelor of Science degree in Chemical Engineering. His undergraduate education included three cooperative education rotations and one summer internship with the Eastman Chemical Company in Kingsport, TN. In the fall of 1999, he began his graduate studies under the direction of James B. Rawlings in the Department of Chemical Engineering at the University of Wisconsin-Madison. After surviving six Wisconsin winters, he will be heading for a warmer climate as a post-doctoral fellow at the California Institute of Technology in Pasadena, California.

Permanent Address: 3909 Hemlock Park Dr., Kingsport, TN 37663

This dissertation was prepared with LaTeX 2ε by the author. This particular University of Wisconsin-compliant style was carved from The University of Texas at Austin styles as written by Dinesh Das (LaTeX 2ε), Khe-Sing The (LaTeX), and John Eaton (LaTeX). Knives and chisels wielded by John Campbell and Rock Matthews.