Bayesian inference for stable differential equation models with applications in computational neuroscience

Author: Philip Maybank

A thesis presented for the degree of Doctor of Philosophy
School of Mathematical, Physical and Computational Sciences
University of Reading
February 16, 2019

"Remember to look up at the stars and not down at your feet. Try to make sense of what you see and wonder about what makes the universe exist. Be curious. And however difficult life may seem, there is always something you can do and succeed at. It matters that you don't just give up."

THE LATE PROF STEPHEN HAWKING, 1942-2018

ABSTRACT

Inference for mechanistic models is challenging because of nonlinear interactions between model parameters and a lack of identifiability. Here we focus on a specific class of mechanistic models, which we term stable differential equations. The dynamics in these models are approximately linear around a stable fixed point of the system. We exploit this property to develop fast approximate methods for posterior inference. We first illustrate our approach using simulated EEG data on the Liley et al. model, a mechanistic neural population model. Then we apply our methods to experimental EEG data from rats to estimate how parameters in the Liley et al. model vary with the level of isoflurane anaesthesia. More generally, stable differential equation models and the corresponding inference methods are useful for the analysis of stationary time-series data. Compared to the existing state of the art, our methods are several orders of magnitude faster, and are particularly suited to the analysis of long time series (>10,000 time points) and models of moderate dimension (10-50 state variables and 10-50 parameters).

ACKNOWLEDGEMENTS

I have enjoyed studying Bayesian methodology with Richard Culliford and Changqiong Wang, and neuroscience with Asad Malik and Catriona Scrivener. I have also been fortunate to participate in the following graduate-level training and international conferences: the 2014-15 Academy for PhD Training in Statistics (APTS); the New Perspectives in Markov Chain Monte Carlo (NPMCMC) 2015 summer school; the International Society for Bayesian Analysis (ISBA) 2016 world meeting in Sardinia; the Computational Neuroscience (CNS) 2018 meeting in Seattle; and further training in computational skills and software development at the Numerical Algorithms Group (NAG) in Oxford in 2017. The support staff at the University of Reading (Ruth Harris, Brigitte Calderon, Peta-Ann King, and Kristine Aldridge) have given me a great deal of practical help and general support. My PhD supervisors, Richard Everitt and Ingo Bojak, have taught me a lot about their respective fields, and about myself. I would like to thank my family, Gemma and Rowan, for their love and encouragement.

DECLARATION

I confirm that this is my own work and the use of all material from other sources has been properly and fully acknowledged.
Philip Maybank

CONTENTS

List of Figures
List of Tables
1 Introduction
2 Survey of Bayesian methods
3 Neural population models
4 Stationary processes and the Whittle likelihood
5 Efficient MCMC samplers for parameter inference
6 C++ classes and functions for parameter inference
7 EEG data analysis
8 Summary
A Eigenvector derivatives
Bibliography

CONTENTS (WITH SUBSECTIONS)

List of Figures
List of Tables
1 Introduction
2 Survey of Bayesian methods
  2.1 Preliminaries
    2.1.1 Maximum likelihood estimation
    2.1.2 Optimization
  2.2 Approximate inference
    2.2.1 The Laplace approximation
    2.2.2 Variational approximation
    2.2.3 Integrated Nested Laplace Approximation
  2.3 Markov Chain Monte Carlo
    2.3.1 The Metropolis-Hastings algorithm
    2.3.2 The Metropolis-within-Gibbs algorithm
    2.3.3 Sampling efficiency and tuning MH and MwG algorithms
    2.3.4 Riemannian MCMC
  2.4 Sequential Monte Carlo
    2.4.1 Resampling distributions
  2.5 Pseudo-marginal MCMC
  2.6 State-space models
    2.6.1 Discretization of continuous-time state-space models
    2.6.2 Kalman filter marginal likelihood for linear systems
    2.6.3 Particle filters for nonlinear systems
    2.6.4 Particle MCMC for static parameter estimation
  2.7 Contextual summary
3 Neural population models
  3.1 Neurophysiology
    3.1.1 Electrophysiology
    3.1.2 Synaptic physiology
    3.1.3 System dynamics
    3.1.4 Excitatory-inhibitory networks
  3.2 Field potentials and the EEG
  3.3 Anaesthesia
    3.3.1 Neurophysiology
    3.3.2 Macroscopic recordings of brain activity
  3.4 Inference for Neural Population Models
4 Stationary processes and the Whittle likelihood
  4.1 Preliminaries
    4.1.1 Stationary Processes
    4.1.2 The Wiener Process and the damped harmonic oscillator
    4.1.3 Linearization and approximately stationary processes
    4.1.4 FitzHugh-Nagumo equations
  4.2 Spectral Analysis for univariate processes
    4.2.1 Sums of stationary processes
    4.2.2 Discrete-time observations of continuous-time processes
    4.2.3 Whittle likelihood
  4.3 Spectral analysis for linear systems
    4.3.1 Derivatives of the spectral density
  4.4 Evaluating accuracy of Whittle likelihood
    4.4.1 Evaluating posterior accuracy under linearized model
    4.4.2 Comparison between Kalman filter and Whittle likelihood on linear model
  4.5 Discussion
5 Efficient MCMC samplers for parameter inference
  5.1 Sensitivity analysis of Liley et al. model
    5.1.1 Effect of changing a single parameter
    5.1.2 Grid-based likelihood evaluation
  5.2 Model reparameterization: equilibrium values as parameters
  5.3 Evaluating efficiency of reparameterization
    5.3.1 MwG on deterministic FitzHugh-Nagumo equations
    5.3.2 Justifying the use of Whittle likelihood for EEG analysis
    5.3.3 MwG on stochastic Liley et al. model
    5.3.4 smMALA on stochastic Liley et al. model
  5.4 Discussion
6 C++ classes and functions for parameter inference
  6.1 Whittle likelihood function with parameter vector input
    6.1.1 A class for parameter vectors that supports element access by name
    6.1.2 A class for data in the time and frequency domains
    6.1.3 Evaluating the Whittle likelihood of data given parameters
    6.1.4 Defining a derived model class to implement a specific model
  6.2 Posterior density function for single and multiple time-series
    6.2.1 Evaluating the prior density using a derived class for parameter vectors
    6.2.2 Evaluating the posterior density for a single time-series
    6.2.3 Evaluating the posterior density for multiple time-series
    6.2.4 Evaluating derivatives of the posterior density
  6.3 Sampling from the posterior density with MCMC
    6.3.1 Plotting and summarizing MCMC results
  6.4 Maximization of the posterior density
  6.5 Discussion
7 EEG data analysis
  7.1 Analysis of adult resting state EEG data
  7.2 Development of prior distribution for NPM parameters
    7.2.1 Posterior-prior transformation
  7.3 Development of prior distribution for effect of isoflurane
    7.3.1 Hill equations for the effect of isoflurane
    7.3.2 Ratios for the effect of isoflurane
  7.4 Analysis of EEG data from rats undergoing isoflurane anaesthesia
    7.4.1 Parameter estimation with uncertainty quantification
  7.5 Discussion
8 Summary
A Eigenvector derivatives
Bibliography

LIST OF FIGURES

Figure 4.1  Vector fields for the FitzHugh-Nagumo model (black arrows) and its linearization around a stable equilibrium (blue arrows), as well as a sample path of the nonlinear model (coloured line). Arrows represent the drift term, i.e., the deterministic time-derivative. Left column: the nonlinear model is well approximated by its linearization. Right column: the linearized model is not a good approximation, in spite of a similar likelihood compared to the cases on the left, cf. Figs. 4.2 and 4.3. I(t) = 0 and P(t) was white noise with variance σ_in^2. Top left: parameter values for (a, b, c, d, I, σ_in): (−5, 6000, 40, 4000, 100, 100). Top right: (−30, 6000, 40, 4000, 100, 100). Bottom left: (−30, 6000, 40, 4000, 100, 10). Bottom right: (−150, 3000, 170, 17000, 100, 10).

Figure 4.2  Comparison between the Kalman filter (solid blue line), the EKF (dashed red line) and a particle filter (black dots) for the stochastic FitzHugh-Nagumo model. P(t) is white noise with variance σ_in^2; observation noise with variance σ_obs^2 is added. The only unknown parameter is a. The marginal likelihood is estimated on a uniformly spaced sequence of 200 values of a. Fixed parameters: b = 6000, c = 40, d = 4000, I0 = 100, I(t) ≡ 0. Time-step in solver and Kalman filter = 10^-3, in observations = 10^-2. Number of particles = 1000.

Figure 4.3  Comparison between marginal MCMC with the Kalman filter (left) and the EKF (right) on a problem with unknown parameters (a, b, c, I0). Plots show MCMC samples from the joint (a, b) marginal distribution only. Parameter values: d = 4000, T = 2, σ_in = 10, σ_obs = 0.01. Time-step in solver and Kalman filter = 10^-3, in observations = 10^-2. The MCMC chain started at the true parameters and ran for 10,000 iterations. Plots show every 500th sample (black dots). The coloured background is a smoothed histogram, with blue representing low probability density and yellow representing high probability density.
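The figure captions above, like the abstract, turn on a single idea developed in Chapter 4: near a stable fixed point the nonlinear dynamics are approximately linear, and for a linear stochastic system the spectral density, and hence the Whittle likelihood, is available in closed form. The following is a minimal generic sketch of that chain of reasoning in standard notation; the symbols f, x*, J, Sigma, W, S, I_n, omega_j and theta are placeholders chosen here for illustration and are not taken from the thesis.

% Nonlinear SDE with a stable fixed point x* (so f(x*) = 0):
\[
  \mathrm{d}x(t) = f\bigl(x(t)\bigr)\,\mathrm{d}t + \Sigma\,\mathrm{d}W(t),
  \qquad f(x^{*}) = 0 .
\]
% Linearizing around x* (with \delta x = x - x^*) gives a multivariate
% Ornstein-Uhlenbeck process, where J is the Jacobian of f at x^*:
\[
  \mathrm{d}\,\delta x(t) \approx J\,\delta x(t)\,\mathrm{d}t + \Sigma\,\mathrm{d}W(t),
  \qquad J = \left.\frac{\partial f}{\partial x}\right|_{x = x^{*}} .
\]
% If every eigenvalue of J has negative real part (the "stable" case), the
% linearized process is stationary and, with one common normalization, its
% spectral density is
\[
  S(\omega) = \frac{1}{2\pi}\,
  (\mathrm{i}\omega I - J)^{-1}\,\Sigma\Sigma^{\top}\,(-\mathrm{i}\omega I - J^{\top})^{-1} .
\]
% For a long stationary univariate series with periodogram ordinates
% I_n(\omega_j) at the nonzero Fourier frequencies, the Whittle likelihood
% approximates the log-likelihood, up to an additive constant, by
\[
  \ell_{W}(\theta) \;\approx\; -\sum_{j}
  \left[\log S_{\theta}(\omega_j) + \frac{I_n(\omega_j)}{S_{\theta}(\omega_j)}\right] .
\]

Under these assumptions each likelihood evaluation needs only the periodogram (one FFT of the data) and the model spectral density at each frequency, rather than a filtering recursion over every time point, which is consistent with the abstract's claim that the approach is suited to long (>10,000 point) stationary time series.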