Bayesian Inference and Model Selection for Partially-Observed, Continuous-Time, Stochastic Epidemic Models

James Nicholas Walker

Thesis submitted for the degree of Doctor of Philosophy in Applied Mathematics at The University of Adelaide (Faculty of Engineering, Computer and Mathematical Sciences)

School of Mathematical Sciences

December 2019

Contents

Abstract

Signed Statement

Acknowledgements

Publications

1 Introduction

2 Background Material
   2.1 Markov Chains
      2.1.1 Continuous-time theory
      2.1.2 Partially-observed Markov chains
      2.1.3 Doob-Gillespie algorithm
   2.2 Stochastic Epidemic Models
      2.2.1 Homogeneous SI(n)R model
      2.2.2 Homogeneous SE(n1)I(n2)R model
      2.2.3 Household models
   2.3 Bayesian Inference
      2.3.1 Markov chain Monte Carlo
      2.3.2 Gibbs sampling
      2.3.3 Data-augmented MCMC
      2.3.4 Particle filters
      2.3.5 Particle-marginal MCMC
      2.3.6 Sequential Monte Carlo squared

3 Efficient Bayesian Inference for Epidemics
   3.1 Data-Augmentation for an SIR Household Model
      3.1.1 Model and data assumptions
      3.1.2 Inference
      3.1.3 Results
   3.2 Inference for an SE(n1)I(n2)R Household Model
      3.2.1 Model
      3.2.2 Inference
      3.2.3 Results
   3.3 Sequential DA-MCMC
      3.3.1 The aim of SDA-MCMC
      3.3.2 Algorithm description
      3.3.3 SIR model example
      3.3.4 Modifications for independent outbreak data
      3.3.5 SI(n)R example
      3.3.6 Discussion

4 Accuracy and Precision of Estimates Under Various Surveillance Schemes
   4.1 Methods
   4.2 Results
      4.2.1 Subpopulation size and outbreak repetitions
      4.2.2 Latent and infectious period distribution
   4.3 Discussion

5 Model Selection via Importance Sampling
   5.1 Methods
      5.1.1 Importance sampling for evidence estimation
      5.1.2 Sequential importance sampling algorithm for likelihood estimation
   5.2 Case study I: Inferring the shape of the infectious period distribution
      5.2.1 SI(k)R model
      5.2.2 Importance sampling for SI(k)R outbreaks
      5.2.3 Implementation
      5.2.4 Results
   5.3 Case study II: Inferring symptom onset relative to infectiousness
      5.3.1 SE(2)I(2)R model
      5.3.2 Importance sampling for initial state generation
      5.3.3 Importance sampling for SE(2)I(2)R outbreaks
      5.3.4 Implementation
      5.3.5 Results
   5.4 Discussion

6 Joint Inference and Model Selection
   6.1 JIMS and SMC-squared
      6.1.1 Inference step
      6.1.2 Importance sampling densities
      6.1.3 Model selection step
   6.2 Comparison based on SIR model
      6.2.1 Results
   6.3 Comparison based on SE(2)I(2)R model
      6.3.1 Results
   6.4 Application to multiple outbreaks data
      6.4.1 Implementation
      6.4.2 Results
   6.5 Discussion

7 Concluding Remarks

A DA-MCMC for multiple outbreaks

Bibliography

List of Figures

2.1 An illustration of a partially-observed Markov chain.
2.2 Examples of trace plots.
2.3 An example of using MCMC samples to estimate a posterior distribution.
2.4 An example of Gibbs sampling.
2.5 An illustration of the labelled SIR process.
2.6 A comparison of posterior distributions obtained by applying DA-MCMC to the standard SIR model and the labelled SIR model.
2.7 Kernel density estimates of the posterior distribution from a single outbreak from an SI(3)R model.
2.8 Kernel density estimates of the posterior distribution from a single SEIR outbreak.
3.1 An illustration of household stratified data.
3.2 Boxplots of maximum a posteriori (MAP) estimates of model parameters from 50 simulations from the SIR household model.
3.3 Boxplots of maximum a posteriori (MAP) estimates of (R∗, r) from 50 simulations from the SIR household model.
3.4 Kernel density estimates of R∗ and r from an SIR household outbreak.
3.5 Boxplots of the efficiency of DA-MCMC and BPA against the number of infected households.
3.6 The household size distribution of Adelaide.
3.7 Kernel density estimates of the posterior distribution from an SE(2)I(2)R model with Adelaide's household distribution.
3.8 Kernel density estimates of the posterior distribution obtained from the SDA-MCMC and DA-MCMC from a single SIR outbreak.
3.9 The posterior distribution of the hidden state of the SIR model.
3.10 Kernel density estimates of posterior distributions sampled via the SDA-MCMC and DA-MCMC methods, based on multiple SI(n)R outbreaks.
4.1 Boxplots of estimates of R0 and r from multiple outbreaks of SEIR epidemics in various subpopulation distributions.
4.2 Boxplots of median estimates of R0 and r from multiple outbreaks of SE(2)I(2)R epidemics in various subpopulation distributions.
4.3 … of the total number of secondary infections from multiple outbreaks from various subpopulation distributions and models.
4.4 Boxplots of median estimates of R0 and r from multiple outbreaks of SE(2)I(2)R epidemics in various subpopulation distributions, where inference assumed an SEIR model.
4.5 Boxplots of median estimates of R0 and r from multiple outbreaks of SE(2)I(2)R epidemics in various subpopulation distributions, where inference assumed an SI(n)R model with unknown …
4.6 Kernel density estimates of posterior distributions of R0 and r from multiple outbreaks of SE(2)I(2)R epidemics in various subpopulation distributions, where inference is performed assuming various models.
4.7 Boxplots of median estimates of R0 and r from final size data from multiple outbreaks of SE(2)I(2)R epidemics in various subpopulation distributions, where inference assumed an SIR model.
5.1 Box plots of the difference in posterior model probability of the true model and the other candidate models for SI(k)R models with tight priors.
5.2 Box plots of the difference in posterior model probability of the true model and the other candidate models for SI(k)R models with loose priors.
5.3 The coefficient of variation of the model evidence against the number of state particles for model selection performed on multiple outbreaks from an SI(2)R model.
5.4 A diagram showing how different models of observation correspond to different transitions from an SE(2)I(2)R model.
5.5 Flow chart showing how to propose forced events for the SE(2)I(2)R model.
5.6 Box plots of the difference in posterior model probability of the true model and the other candidate models for the Pre, Co and Post observation models with tight priors.
5.7 Box plots of the difference in posterior model probability of the true model and the other candidate models for the Pre, Co and Post observation models with loose priors.
6.1 Samples from a defence mixture and from the posterior distribution.
6.2 Box plots of posterior model probability estimates from SMC-squared and the JIMS algorithm performed on a single SIR outbreak.
6.3 Box plots of posterior model probability estimates for each iteration of SMC-squared and the JIMS algorithm performed on a single SIR outbreak.
6.4 Scatter plots of parameter estimates from SMC-squared and the JIMS algorithm performed on a single SIR outbreak.
6.5 Box plots of the cumulative run time at each iteration of SMC-squared and the JIMS algorithm performed on a single SIR outbreak.
6.6 Box plots of posterior model probability estimates from SMC-squared and the JIMS algorithm performed on a single SE(2)I(2)R outbreak.
6.7 Box plots of mean parameter estimates at each iteration of SMC-squared and the JIMS algorithm, performed on a single SE(2)I(2)R outbreak.
6.8 Plots of the … of the mean parameter estimates at each iteration of SMC-squared and the JIMS algorithm, performed on a single SE(2)I(2)R outbreak.
6.9 Box plots of cumulative run time at each iteration of SMC-squared and the JIMS algorithm, performed on a single SE(2)I(2)R outbreak.
6.10 Plots of posterior model probability estimates at each iteration of the JIMS algorithm performed on multiple outbreak data from SE(2)I(2)R models.
6.11 Kernel density estimates of marginal posterior distributions from the JIMS algorithm performed on multiple outbreak data from SE(2)I(2)R models.
6.12 Box plots of median parameter estimates from the JIMS algorithm run on 30 simulated sets of multiple outbreak data for three different SE(2)I(2)R type models.

Abstract

Emerging infectious diseases are an ongoing threat to the health of populations around the world. In response, countries such as the USA, UK and Australia have outlined protocols to surveil these novel diseases. One of the aims of these data collection protocols is to characterise the disease in terms of transmissibility and clinical severity in order to inform an appropriate public health response. This kind of data collection protocol is yet to be enacted in Australia, but such a protocol is likely to be tested during a seasonal influenza (flu) outbreak in the next few years. However, it is important that methods for characterising these diseases are ready and well understood for when an epidemic disease emerges. The epidemic may only be characterised well if its dynamics are well described (by a model) and are accurately quantified (by precisely inferred model parameters).

This thesis models epidemics and the data collection process as partially-observed continuous-time Markov chains and aims to choose between models and infer parameters using early outbreak data. It develops Bayesian methods to infer epidemic parameters from data on multiple small outbreaks, and outbreaks in a population of households. An exploratory analysis is conducted to assess the accuracy and precision of parameter estimates under different epidemic surveillance schemes, different models and different kinds of model misspecification. It describes a novel Bayesian model selection method and employs it to infer two important characteristics for understanding emerging epidemics: the shape of the infectious period distribution; and, the time of infectiousness relative to symptom onset. Lastly, this thesis outlines a method for jointly inferring model parameters and selecting between epidemic models. This new method is compared with an existing method on two epidemic models and is applied to a difficult model selection problem.

Signed Statement

I certify that this work contains no material which has been accepted for the award of any other degree or diploma in my name, in any university or other tertiary institution and, to the best of my knowledge and belief, contains no material previously published or written by another person, except where due reference has been made in the text. In addition, I certify that no part of this work will, in the future, be used in a submission in my name, for any other degree or diploma in any university or other tertiary institution without the prior approval of the University of Adelaide and where applicable, any partner institution responsible for the joint-award of this degree.

I also give permission for the digital version of my thesis to be made available on the web, via the University's digital research repository, the Library Search and also through web search engines, unless permission has been granted by the University to restrict access for a period of time.

I acknowledge the support I have received for my research through the provision of an Australian Government Research Training Program Scholarship.

SIGNED: ...... DATE: ......

Acknowledgements

I would like to thank my supervisors, Joshua Ross and Andrew Black, for the time, effort and advice they lent to help this thesis come to fruition. I sincerely thank Dr. Michael and Heather Dunne for the generous funding provided through the Dr Michael and Heather Dunne Applied Mathematics Supplementary Scholarship. I also acknowledge the support I have received through the Australian Government Research Training Program Scholarship. This work was supported with supercomputing resources provided by the Phoenix HPC service at the University of Adelaide and the help provided by the Phoenix Support Team (particularly Fabien Voisin). I would also like to thank Philip O'Neill, Theodore Kypraios and Chris Drovandi for listening to my ideas, giving advice and for their stimulating research conversations. I thank my friends and family for their support throughout my PhD candidature. Finally, I thank Peter Hochs for providing me with the sole quote of this thesis: "A quote makes a thesis appear sophisticated".

Publications

The work described in Chapter 3 (Section 3.1) and Chapter 5 of this thesis was done in collaboration with Andrew Black and Joshua Ross. This has been published as:

J. N. Walker, J. V. Ross, and A. J. Black, Inference of epidemiological parameters from household stratified data, PLOS ONE, vol. 12, no. 10, pp. 1–21, 2017.

J. N. Walker, A. J. Black, and J. V. Ross, Bayesian model discrimination for partially- observed epidemic models, Mathematical Biosciences, vol. 317, p. 108266, 2019.

Chapter 1

Introduction

Since the 2009 'swine flu' outbreak, countries around the world, such as the USA, UK and Australia, have outlined data collection protocols, sometimes referred to as First Few Hundred studies (FF100), for emerging infectious diseases [1–6]. When a novel infectious disease emerges these intensive data collection protocols may be enacted; in Australia, the FF100 study protocol involves surveilling contacts of symptomatic individuals to obtain the time of symptom onset of the first few hundred cases at a daily resolution. These studies are not resourced to surveil every contact of every symptomatic individual, so only contacts from the same household, workplace or school may be surveilled. Contacts within these subpopulations are easily surveilled and are most likely to become infected. Although an FF100 study is yet to take place in Australia, it is pertinent that methods are ready for the analysis of these data in case of a pandemic outbreak. Further, tests of the FF100 protocol are likely to take place during a seasonal flu outbreak in the next few years, so this thesis provides methods that can be readily applied during these studies. It develops and analyses methods for choosing between epidemiological models and inferring their parameters using simulated FF100 type data, so that these methods may be well understood and applied during the emergence of an infectious disease.

Using FF100 study data to characterise the spread of novel diseases, both retrospectively and during an outbreak, allows authorities to understand, control and respond in an informed way. One aim of FF100 studies is to characterise the disease in terms of clinical severity and transmissibility [7–9]. Clinical severity can be measured in a variety of ways, possibly involving specific symptoms, though it has been suggested that it may be appropriate to consider visibility as a measure of severity [3, 8, 9]. Transmissibility can be considered in terms of the average number of exposures caused by an infectious individual (the reproduction number) or the exponential rate at which infections occur in the population (the exponential growth rate) [10–13]. Once a model for the epidemic is chosen, these measures of transmissibility can be used to estimate the final size and peak time of an outbreak.

Biological and epidemiological processes can have highly complex behaviour and in most cases we do not observe all of their underlying dynamics. For example, FF100 data only contain daily cases of symptom onset, so we do not observe the exact times at which individuals contract the disease, become infectious, or recover. As our observations may depend on dynamics we do not see, it can be difficult to infer properties of this partially-observed process. To add complexity to the problem, there may be several theories about the behaviour of the epidemic process and it may not be obvious which of these models is most appropriate given some set of observations. For example, it is widely recognised that the variance of the infectious period is critical to understanding historical disease incidence, and also for accurate evaluation of control measures for public health use [14, 15]. Similarly, it has been identified that the relative timing of symptoms and infectiousness is a key determinant of the ability to control an outbreak [16]. Investigating these features is most easily done by encoding the structure of the model in a certain way rather than parametrising it, hence to discriminate between different possibilities we need to select between competing models in an informed way.

For emerging diseases there is a reasonable chance that an outbreak will fade out immediately, or that there is a delay before the epidemic goes into a phase of exponential growth, so modelling these stochastic features is important. Stochastic modelling is particularly well suited to outbreaks in small populations (such as households), and is therefore appropriate for modelling epidemics in the subpopulations which give rise to FF100 study data. Continuous-time Markov chains (CTMCs) are a variety of stochastic models that have seen wide use in mathematical epidemiology [9,12,17,18], partially due to their convenient mathematical properties (discussed in Chapter 2) and partially due to their ability to model random, discrete events in continuous time. Hence, this thesis uses CTMCs to model epidemic outbreaks. These CTMCs are defined by a set of possible states and transition rates between the states, which are a function of the state of the process and some model parameters.

Given an epidemic model and data, a disease can be characterised by inferring the model's parameters. In the early stages of an epidemic there may be little data available, but there may be prior knowledge of the dynamics of the infectious disease; for example, for influenza it may be reasonable to assume the average infectious period is less than a week. Bayesian statistics is a framework in which parameters have a distribution which changes as data are obtained. Bayesian inference incorporates prior knowledge of a process, via a prior distribution, and data, via a likelihood function, to obtain a posterior distribution. The prior distribution is chosen based on assumptions about parameters, whereas the likelihood function is defined as the probability density of observing the data given a parameter set. Given limited data and reasonable prior knowledge in the emerging stages of a disease, a Bayesian framework is well suited. Hence, this thesis considers inference of parameters of epidemic models in a Bayesian framework.

As many events in the underlying epidemics are unobserved, the epidemic models and observed data are considered as partially-observed CTMCs, in which the underlying CTMC process is related to data via an observation process. Unfortunately, while expressions can be written down for the likelihood function of partially-observed CTMCs, these are infeasible to compute in all but the simplest cases; they require evaluation of a matrix exponential [19,20] that in some cases may even be too large to fit in computer memory. As such, we build on and utilise Bayesian inference methods that circumvent pointwise calculation of the likelihood. These include: data-augmented Markov chain Monte Carlo (DA-MCMC) [17,21], which augments observed data with the transitions which are unobserved and imputes these; particle filters [22,23], which are methods based around simulating the underlying epidemic; and sequential Monte Carlo squared (SMC-squared) [24–26], which is a method that simulates the underlying epidemic in important regions of the parameter space. Although these methods do not rely on direct calculation of the likelihood function, they target quantities of interest without introducing any error. These models and methods for inferring parameters of epidemic models are discussed in Chapter 2.

Inference methods need to be adapted to suit FF100 study data and in particular, as data are stratified by subpopulations, they need to incorporate models that consider the epidemic in a population of subpopulations (household epidemic models) or as multiple independent outbreaks in subpopulations. Inference for household epidemic models has been considered in the literature previously. For example, [27] considers outbreaks of measles in households; however, this assumed that recovery times were observed (which is typically not the case in FF100 studies), and only considered households in which only a single secondary infection is possible. Inference for models with two levels of mixing, such as household models, with data on final epidemic size has also been considered [28, 29]. However, final epidemic size data are typically unavailable in the earliest stages of emerging diseases; further, FF100 study data contain temporal information which is potentially useful for inference. Bayesian inference for emerging infectious diseases in a population of farms was considered in [30]; however, this relied on deterministic modelling of outbreaks within farms, which is a sensible modelling choice when there are many susceptible animals in close contact, but is less valid for modelling outbreaks in households containing few individuals. Parameters of a model of financial crises analogous to a household epidemic model were inferred via DA-MCMC in [31]; however, this assumed that the recovery rate was known, which is typically not the case for emerging infectious diseases. In Chapter 3 we present computationally-efficient implementations of DA-MCMC for epidemics in a population with household structure and apply them to simulated FF100 type data, some of which we have published [6]. As an FF100 study has not yet been enacted in Australia, we conduct simulation studies, which allows us to generate data sets for which we know the parameter values and validate results by checking whether inferred parameters correspond to the input values. One limitation of DA-MCMC is that incorporating new data requires inference to begin anew, whereas sequential methods, such as particle filters, allow current estimates to be updated sequentially as new data are obtained. We describe a sequential version of DA-MCMC which can allow for more efficient mixing than standard DA-MCMC. This gives solutions to inference problems where mixing using DA-MCMC alone is prohibitively slow, for example, inferring the shape of an infectious period distribution.

FF100 studies are resourced to collect data on only the first few hundred symptomatic individuals and their contacts [3]. Given limited resources to surveil a population in the emerging stages of a disease, a decision needs to be made as to how the population should be surveilled to infer measures of transmissibility most accurately. Chapter 4 conducts an exploratory analysis to compare the accuracy and precision of inferred parameters under various surveillance schemes. That is, we infer epidemiological parameters assuming that infections are observed in different kinds of subpopulations: many small households; few schools or workplaces; or a mix. This kind of analysis has not been considered before in the context of FF100 studies and is an important step in being prepared for an emerging infectious disease. We test the robustness of estimates from each of these surveillance schemes under model misspecification by performing inference based on a model that did not generate the data. Through a simulation study, we assess how different kinds of model misspecification on outbreaks in subpopulations bias estimates. In particular we consider misspecification of the exposed and infectious period distributions. Model misspecification has previously been considered in the context of incorrectly assuming data came from an SIR model, where it is actually simulated from an SI model [32], whereas here we consider SEIR and SIR models with different infectious and exposed period distributions. Biases with respect to the basic reproduction number under assumptions of misspecified mixing structure have been previously investigated [33]. In comparison, Chapter 4 investigates bias under misspecification of the exposed and infectious period distributions on both the reproduction number and the exponential growth rate. We find that under the true model, surveilling many small households is the optimal strategy; however, surveilling a mix of small and large subpopulations was the most robust under the kinds of model misspecification considered in this thesis. However, model misspecification did lead to bias, and competing sources of bias make it difficult to determine the best strategy in general. For example, the reproduction number is negatively biased for surveillance schemes where early fade out in subpopulations is likely, but positive bias can be introduced if the infectious period distribution is assumed to be exponential when it is actually Erlang. Further, the number of samples from final size distributions and the length of temporal data is likely to affect the bias in estimates. The presence of bias under misspecification, and the presence of competing sources of bias, makes it difficult to choose one optimal scenario in general. Choosing a surveillance scenario that protects against bias highlights the need to effectively choose between competing epidemic models. The ability to both choose an appropriate model and infer model parameters will allow for an optimal surveillance scheme to be chosen effectively and robustly.

Selecting a model is typically done by comparing measures based upon the maximum value of the likelihood function, such as the Akaike information criterion (AIC) [34] and the Bayesian information criterion (BIC) [35]; once a model is chosen, parameters are fit to only that model. This is at odds with the Bayesian paradigm, which dictates that the space of models should have an associated posterior distribution. Bayesian model selection is a Bayesian method for choosing between models in a way that incorporates prior knowledge of the epidemic process; this is naturally the best approach if data are not highly informative but we have prior knowledge of some model parameters. This approach compares models in terms of their posterior model probability, that is, the probability that the model is the true model given the data. The outputs from Bayesian model selection are useful for quantifying uncertainty in a way that incorporates uncertainty in the model choice; for example, the posterior distribution of the reproduction number can be calculated under each model, and these can be combined by weighting the distributions by their posterior model probabilities and adding them (this is known as Bayesian model averaging) [36].

Methods for performing Bayesian model selection include importance sampling approaches [23, 37], Reversible Jump MCMC (RJ-MCMC) [38] and SMC-squared [24]. Importance sampling approaches typically involve sampling from the parameter space and evaluating the likelihood function to estimate the probability that the data was generated under each model (the model evidence). The model evidences can then be weighted and normalised to obtain posterior model probabilities. This approach can be effective for selecting between competing models but requires a sensible way of evaluating or estimating the likelihood function. RJ-MCMC involves proposing moves within a parameter space and between parameter spaces of different models; each of these proposals requires an evaluation of the likelihood function. The proposals between parameter spaces can be difficult to implement effectively if the parameter spaces are dissimilar; for example, the dimensions of the parameter spaces may be completely different. The method becomes even less effective if the likelihood function is computationally intractable and hence is ill-suited for our needs. The SMC-squared approach is attractive as it jointly estimates the posterior distribution and gives estimates of model evidence; however, the stochasticity in model evidence estimates can make effectively choosing between models impossible based upon a single SMC-squared run [25]. Chapter 5 describes an importance sampling based method for choosing between partially-observed epidemic models via an efficient particle filter and importance sampling scheme [39]. The novel feature of this method is that it uses efficient simulations that are constrained to always match the observed data to estimate the likelihood function. Further, the method makes use of a stopping criterion to ensure accuracy of model selection. We apply this approach to two important problems for understanding emerging diseases: inferring the time of symptom onset relative to the time of infectiousness; and inferring the shape of the infectious period distribution. We find that our approach is effective for these model selection problems and that FF100 type data are informative enough to effectively distinguish between these models.

Inferring parameters and choosing an appropriate model are both important for effectively characterising an infectious disease; SMC-squared does both simultaneously [24–26]. Further, it is a sequential method, so rather than beginning inference anew when new data are available, the algorithm can update the current parameter and model evidence estimates. A main drawback of using SMC-squared for model selection is that there are no error bounds on model evidence and choosing sensible inputs can be difficult. Estimating the error in model evidence estimates may require multiple runs of SMC-squared, which can be computationally taxing. Once the error is estimated, it may be too large to effectively select between models, so the process will need to begin again with different input values. Chapter 6 describes a novel kind of SMC-squared which allows uncertainty in the model evidence to be quantified and allows error in model evidence to be reduced. These two features allow for effective model selection to occur after a single run of SMC-squared. We provide a comparison with a standard SMC-squared algorithm on two standard epidemic models and show that the new method outperforms in terms of runtime, the accuracy of parameter estimates, and the accuracy of model evidence estimates. The method is then applied to a difficult inference and model selection problem: inferring the time of infection relative to symptom onset from multiple outbreak data in large subpopulations.

Chapter 2

Background Material

2.1 Markov Chains

Markov chains are a class of stochastic processes that have a property known as the Markov property; in essence, this means that the future of the process depends only on the current state of the process, not on states from further in the past. This property allows probability expressions to be simplified by ignoring dependencies across multiple time points. A common alternative to Markov chain models is differential equation models. These models are widely used because they are efficient to solve; however, models on discrete state spaces are more realistic representations of the true process and allow data to be related to the model in a more intuitive way. Further, the stochastic effects of infectious processes are important in the emerging stages of an epidemic; stochasticity allows delays before the epidemic reaches an exponential growth phase and allows the epidemic to fade out. Once the disease is established these stochastic effects are less important. As this thesis is concerned with inference for emerging epidemics, it considers only stochastic models.

Section 2.1.1 discusses some important definitions and properties of Markov chains, Section 2.1.2 defines partially-observed Markov chains and outlines some theoretical considerations for inference, and Section 2.1.3 outlines how to simulate from Markov chains via the Doob-Gillespie algorithm.


2.1.1 Continuous-time Markov chain theory

This section defines continuous-time Markov chains (CTMCs), introduces the kinds of Markov chains that will be considered in this thesis and discusses some of their key properties. A CTMC is a stochastic process, indexed by continuous time $t \in \mathbb{R}_+$, that satisfies the Markov property. That is, a stochastic process $X_t$ defined on some state space $S$, for $t \in \mathbb{R}_+$, is a CTMC if it satisfies
$$\Pr(X_t \in A \mid X_s, X_u) = \Pr(X_t \in A \mid X_s)$$
for all $s, u, t \in \mathbb{R}_+$ with $u \leq s \leq t$ and $A \subseteq S$. Throughout this thesis, inference is based on stochastic epidemic models which are CTMCs defined on finite state spaces.

A CTMC, $X_t$, on a finite state space, $S$, can be described in terms of an infinitesimal transition rate matrix, also referred to as a "Q-matrix", given by
$$[Q(t)]_{i,j} = \begin{cases} \lim_{h \to 0^+} \dfrac{\Pr(X_{t+h} = j \mid X_t = i)}{h} & \text{for } i \neq j, \\ -\sum_{k \neq i} [Q(t)]_{i,k} & \text{otherwise.} \end{cases} \qquad (2.1)$$
The entries of $Q(t)$ are the "instantaneous transition rates" from one state to another; that is, $[Q(t)]_{i,j}$ is the rate at which the CTMC jumps to state $j$ given that it is in state $i$ at time $t$. A Markov chain is called time homogeneous if it satisfies

$$\Pr(X_t = j \mid X_s = i) = \Pr(X_{t-s} = j \mid X_0 = i), \quad \text{for all } s \leq t.$$
A fundamental property of time-homogeneous CTMCs is that the times between transitions are exponentially distributed, so "transition rate" actually refers to the exponential rate at which events occur, and $Q(t)$ is constant with respect to time, that is, $Q(t) = Q$ for all $t$ [40].

The transition function, $P(t)$, of a CTMC is defined as
$$[P(t)]_{i,j} = \Pr(X_t = j \mid X_0 = i).$$

The Kolmogorov forward equation relates the Q-matrix to the transition function of a CTMC. It satisfies
$$\frac{dP(t)}{dt} = P(t)Q(t). \qquad (2.2)$$
For a time-homogeneous CTMC, defined on a finite state space, with infinitesimal transition rate matrix $Q$, Equation (2.2) has solution

$$P(t) = e^{Qt};$$
this is a matrix exponential operation. The matrix exponential of some finite-dimensional matrix $A$, denoted $e^A$, is given by
$$e^A = \sum_{n=0}^{\infty} \frac{A^n}{n!},$$
where $A^n$ denotes the $n$-fold matrix product. Let $p_t$ be the probability mass function of the state of the CTMC, $\{X_t\}_{t \in \mathbb{R}_+}$, at time $t$, such that $p_t$ is a vector with entries given by
$$[p_t]_i = \Pr(X_t = i).$$

Given an initial distribution of the CTMC, $p_0$, one can express $p_t$ as
$$p_t = p_0 e^{Qt}. \qquad (2.3)$$

Matrix exponential operations may be computed using methods such as the software package Expokit for MATLAB [41]. Expokit makes these calculations efficient by calculating the vector $p_0 e^{Qt}$ via Krylov subspace projection methods, rather than calculating the matrix $e^{Qt}$ and premultiplying the answer by $p_0$. However, this can still be computationally infeasible to evaluate for large Q-matrices [19]. Hence the distribution $p_t$ can be computationally infeasible to evaluate for CTMCs with large state spaces, such as models of epidemics in large populations.
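For state spaces small enough that dense linear algebra is feasible, Equation (2.3) can be evaluated directly; the following is a minimal sketch using SciPy's dense matrix exponential, with a toy two-state Q-matrix that is purely illustrative (it is not one of the epidemic models discussed below).

```python
import numpy as np
from scipy.linalg import expm

# Toy two-state Q-matrix: rows sum to zero, off-diagonals are jump rates.
Q = np.array([[-1.0, 1.0],
              [0.5, -0.5]])
p0 = np.array([1.0, 0.0])  # start in state 0 with probability 1

# Transient distribution p_t = p_0 e^{Qt} at t = 2, as in Equation (2.3).
pt = p0 @ expm(Q * 2.0)    # rows of expm(Q t) are conditional distributions
```

For the large state spaces of epidemic models, a Krylov method that forms $p_0 e^{Qt}$ without ever building $e^{Qt}$, as Expokit does, is the practical choice.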

If the times at which transitions occur, and the corresponding transition types, or events, are observed, we can calculate the probability of the trajectory of $X_t$ as a product of event probabilities and inter-arrival time probabilities,
$$\Pr\big(X_{t_1} = j,\, X_{(t_0, t_1)} = i \mid X_{t_0} = i\big) = -[Q]_{i,i}\, e^{[Q]_{i,i}(t_1 - t_0)} \times \frac{[Q]_{i,j}}{-[Q]_{i,i}} = [Q]_{i,j}\, e^{[Q]_{i,i}(t_1 - t_0)}. \qquad (2.4)$$

These kinds of probabilities can allow inference to be performed without the need for matrix exponential calculations; this is explained in greater detail in Section 2.3.3.
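As a concrete illustration, the log-probability of a fully-observed trajectory is a sum of terms of the form in Equation (2.4). The sketch below assumes a time-homogeneous chain and that the trajectory ends at its last observed jump (the final censored holding time is omitted); the function and variable names are illustrative only.

```python
import numpy as np

def path_log_likelihood(Q, states, times):
    """Log-probability of a fully-observed CTMC path via Equation (2.4).

    states[k] is the state index occupied from times[k] to times[k+1];
    each jump contributes log([Q]_{i,j}) plus [Q]_{i,i} * (holding time).
    """
    ll = 0.0
    for k in range(len(states) - 1):
        i, j = states[k], states[k + 1]
        dt = times[k + 1] - times[k]
        ll += Q[i, i] * dt + np.log(Q[i, j])
    return ll
```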

2.1.2 Partially-observed Markov chains

Partially-observed Markov chains are models constructed from an underlying Markov chain, $X_t$, and an observation process, $Y_t$, where observations are conditionally independent given the underlying Markov chain; such a process is illustrated in Figure 2.1. For example, if an epidemic were to spread according to a continuous-time Markov chain, we may observe events related to symptom onset at a daily resolution, but we may not observe recovery events; as we do not have perfect observation of the Markov chain, this is considered a partially-observed Markov chain. In this thesis we consider processes where the observation depends on the hidden process through how the underlying state changed since the last observation, that is, $\Pr(Y_t \mid X_{[0,t]}, Y_{1:t-1}) = \Pr(Y_t \mid X_{t-1}, X_t)$.

Figure 2.1: Illustration of a partially-observed Markov chain with observation process $Y_t$ and hidden process $X_t$.

Given a set of observations $y = (y_0, \dots, y_T)$ from a partially-observed Markov chain, the distribution of an observation given states of the underlying process, $\Pr(y_t \mid X_t, X_{t-1})$, the distribution of the initial state, $\Pr(X_0)$, and a Q-matrix which depends on model parameters, $Q_\theta$, we may calculate the likelihood function of a partially-observed Markov chain as
$$\begin{aligned}
\Pr(y \mid \theta) &= \sum_{(i_0, \dots, i_T) \in S^{T+1}} \Pr(y_0 \mid X_0 = i_0) \Pr(X_0 = i_0) \prod_{j=1}^{T} \Pr(y_j \mid X_j = i_j, X_{j-1} = i_{j-1}) \Pr(X_j = i_j \mid X_{j-1} = i_{j-1}) \\
&= \sum_{(i_0, \dots, i_T) \in S^{T+1}} \Pr(y_0 \mid X_0 = i_0) \Pr(X_0 = i_0) \prod_{j=1}^{T} \Pr(y_j \mid X_j = i_j, X_{j-1} = i_{j-1}) \left[ e^{Q_\theta} \right]_{i_{j-1}, i_j},
\end{aligned}$$
where $S$ is the state space of $X_t$. Note that this formula has been described only for observations made at times $1, 2, \dots, T$, and throughout this thesis we only consider processes where observations are made at constant time increments. More generally, if observations are made at irregular times, where the observation times are independent of $X_t$, the only difference in the likelihood is that the matrix exponential is taken to the power of the time between observations. That is, $[e^{Q_\theta}]_{i_{j-1}, i_j}$ is replaced by $[e^{Q_\theta (t_j - t_{j-1})}]_{i_{j-1}, i_j}$, where $t_j$ is the time of the $j$th observation. Although one can write an analytical expression for the likelihood function, for all but small state space models it is computationally intractable to calculate directly. This is because large state spaces lead to computationally intractable matrix exponential calculations, as well as the need for a sum over $|S|^{T+1}$ terms. This motivates the need for methods that estimate the likelihood or avoid calculation of the full likelihood, such as particle-marginal MCMC and data-augmented MCMC, respectively.

2.1.3 Doob-Gillespie algorithm

The Doob-Gillespie algorithm is an algorithm for simulating CTMCs [42]. The algorithm uses the fact that the inter-arrival times of events are exponentially distributed according to the sum of the transition rates, and each transition occurs with a probability that is proportional to the rate associated with that transition. An efficient implementation of the Doob-Gillespie algorithm is given in Algorithm 1. The Doob-Gillespie algorithm requires some kind of stopping condition; in this thesis stopping conditions include: the process hits an absorbing state, the process hits some time threshold, or the process hits a given set of states.

Initialization:
    Generate an initial state, x, according to p(x0);
    Set time t = 0;
Iterations:
while some condition holds do
    Calculate a vector of rates, rx, whose entries are the non-zero transition rates from state x;
    Calculate the cumulative sum vector of rx, Rx, and let n denote the length of Rx;
    Time change:
        Sample an Exponential([Rx]n) distributed time increment, τ, via inverse transform sampling. That is, take u1 ∼ Uniform(0, 1) and set τ = −(1/[Rx]n) log(1 − u1);
        Set the new time t = t + τ;
    State change:
        Sample a random variable to decide the new state, u2 ∼ Uniform(0, 1);
        Find the index, i, of the first entry of Rx such that [Rx]i > u2[Rx]n;
        Set the new state x = f(x, i), where f(x, i) is a function that maps index i of rx to the state space;
end

Algorithm 1: Doob-Gillespie algorithm
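To make Algorithm 1 concrete, the following is a minimal Python sketch of it for the homogeneous SIR model of Section 2.2.1; the function name, arguments and return format are illustrative, and the stopping condition combines absorption (no infectious individuals remain) with a time threshold.

```python
import numpy as np

def doob_gillespie_sir(N, beta, gamma, t_max, rng=None):
    """Simulate a standard SIR epidemic (a sketch of Algorithm 1).

    Rates follow Section 2.2.1: infection at rate beta * S * I / (N - 1)
    and recovery at rate gamma * I.
    """
    rng = np.random.default_rng() if rng is None else rng
    t, S, I = 0.0, N - 1, 1          # seed one infectious individual at time 0
    path = [(t, S, I)]
    while I > 0 and t < t_max:       # stopping condition
        rates = np.array([beta * S * I / (N - 1), gamma * I])
        R = np.cumsum(rates)         # cumulative sum vector R_x
        total = R[-1]
        # time change: exponential inter-arrival time via inverse transform
        t += -np.log(1.0 - rng.uniform()) / total
        # state change: first index i with [R]_i > u2 * total
        i = int(np.searchsorted(R, rng.uniform() * total, side='right'))
        if i == 0:
            S, I = S - 1, I + 1      # infection event
        else:
            I -= 1                   # recovery event
        path.append((t, S, I))
    return path

# e.g. one outbreak in a household of size 5
path = doob_gillespie_sir(N=5, beta=2.0, gamma=1.0, t_max=50.0)
```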

2.2 Stochastic Epidemic Models

Throughout this thesis we consider SIR-type and SEIR-type models, described in Sections 2.2.1 and 2.2.2 respectively. These models are compartmental models that consider every individual in a population to be either susceptible (S), exposed (E), infectious (I) or recovered (R). Susceptible individuals may become exposed to the disease if they make "effective contact" (contact of the kind that allows transmission to occur) with an infectious individual. In an SIR-type model an exposed individual instantaneously becomes infectious, while in an SEIR-type model the exposed individual becomes infectious after some random period of time. Infectious individuals recover after a random period of time and remain immune to the disease for the rest of the epidemic. Throughout this thesis we assume that the exposed and infectious periods are Erlang distributed. The assumptions about how "effective contact" occurs are important for providing realistic transmission dynamics; in this thesis we concentrate on homogeneous mixing of individuals and household mixing processes, the latter of which is described in Section 2.2.3.

For any model, the initialisation is important for modelling and inference. In this thesis, models are initialised either by seeding an infectious individual in the population at a Uniform(0, 1) distributed time, by seeding an infectious individual at time 0, or by generating an initial state at time 0 via simulation methods.

2.2.1 Homogeneous SI(n)R model

The SIR model is one of the simplest and most widely used epidemic models. It models infectious diseases where individuals can spread the disease soon after being exposed and are immune to the disease once no longer infectious. More generally, we describe the SI(n)R model here, in which the infectious period is Erlang-n distributed; the standard SIR model is equivalent to the SI(1)R model. We describe two kinds of SI(n)R model: the standard SI(n)R model (sometimes referred to as the lumped SI(n)R model); and the labelled SI(n)R model. The standard model groups individuals into compartments, whereas the labelled model distinguishes between individuals, which allows for a more natural description of transitions in terms of infectious periods, rather than recovery times. However, the labelled representation generally increases the size of the state space. The utility of the labelled model is discussed and shown in Section 2.3.3.

The standard SI(n)R model in a population of $N$ individuals tracks the number of individuals who are classified as either susceptible ($S$), infectious in phase $k$ for $k = 1, \dots, n$ ($I^k$), or recovered ($R$). Note that the infectious phases are simply used to create a CTMC with an Erlang-$n$ infectious period; the phases need not have a physical interpretation. The state of the population at time $t$, $X_t = (S_t, I_t^1, \dots, I_t^n, R_t)$, is a vector giving the number of individuals in each of the classes at time $t$. As $S_t + \sum_{k=1}^{n} I_t^k + R_t = N$, the state can be simplified to $X_t = (S_t, I_t^1, \dots, I_t^n)$. The state space of the standard SI(n)R model is given by
$$S = \left\{ X_t \in \{0, \dots, N\}^{n+1} : S_t + \sum_{k=1}^{n} I_t^k \leq N \right\}.$$

The SI(n)R model has three kinds of transitions, being transmission, phase change and recovery, which are governed by two parameters, the effective contact rate, $\beta$, and the recovery rate, $\gamma$. Infectious individuals each make effective contact with the other $N - 1$ individuals in the population at rate $\beta$; however, infections only occur if an infectious individual makes effective contact with a susceptible individual. Hence the instantaneous infection rate in the population at time $t$ is $\beta S_t \sum_{k=1}^{n} I_t^k / (N - 1)$. We define $\gamma$ such that $1/\gamma$ is the mean infectious period, hence the rate at which phase changes occur for infectious individuals is $\gamma n$. Therefore the instantaneous phase change rate in the population at time $t$ is $\gamma n I_t^k$, for $k = 1, \dots, n - 1$, and the instantaneous recovery rate at time $t$ is $\gamma n I_t^n$.

The labelled SI(n)R model in a population of $N$ individuals tracks the state of each individual as susceptible ($s$), infectious ($i$), or recovered ($r$). The state of the population at time $t$, $X_t := (X_t^1, \dots, X_t^N)$, is a vector giving the state of each individual at time $t$, that is, $X_t^k \in \{s, i, r\}$ for $k = 1, \dots, N$. Let the total numbers of susceptible and infectious individuals in the population at time $t$ be given by $S_t = \sum_{k=1}^{N} \mathbb{1}_{\{X_t^k = s\}}$ and $I_t = \sum_{k=1}^{N} \mathbb{1}_{\{X_t^k = i\}}$, respectively. The state space of the labelled SI(n)R model is
$$S = \{s, i, r\}^N.$$

The labelled SI(n)R model has two kinds of transitions, being transmission and recovery, which are governed by the effective contact rate, $\beta$, and the recovery rate, $\gamma$. The infection rate in the population is as before; however, as individuals are labelled, at time $t$ there are $S_t$ susceptible individuals, each of which becomes infected at rate $\beta I_t / (N - 1)$. Each infectious individual then recovers after a Gamma($n$, $n\gamma$) infectious period. Note that the labelled representation of the SI(n)R model is a semi-Markov model, where infections occur as an inhomogeneous Poisson process and recovery times depend on the infection times.

2.2.2 Homogeneous SE(n1)I(n2)R model

The SEIR model is an extension of the SIR model where there is a delay between an individual contracting the disease and being able to spread the disease; individuals in this class are referred to as exposed (E). Here we describe a more general model which allows the exposed period to be Erlang-$n_1$ distributed, the SE(n1)I(n2)R model. In this section we describe only the labelled SE(n1)I(n2)R model, but note that the standard SE(n1)I(n2)R model can be expressed in a similar way to the standard SI(n)R model.

The labelled SE(n1)I(n2)R model, with a population of size $N$, tracks which individuals are susceptible ($s$), exposed ($e$), infectious ($i$) or recovered ($r$). This is a semi-Markov model, as the times of infections and recoveries depend on the times of exposure and infection respectively. The state at time $t$ is given by $X_t^k \in \{s, e, i, r\}$ for $k = 1, \dots, N$. The numbers of infectious and recovered individuals are as in the labelled SI(n)R model; in addition, we define the number of exposed individuals as $E_t := \sum_{k=1}^{N} \mathbb{1}_{\{X_t^k = e\}}$. The state space of the labelled SE(n1)I(n2)R model is
$$S = \{s, e, i, r\}^N.$$

The labelled SE(n1)I(n2)R model is defined in terms of three transitions, being transmission (or exposure), infection and recovery, which are governed by parameters for the transmission rate, $\beta$, the infection rate, $\sigma$, and the recovery rate, $\gamma$, respectively. As in the labelled SI(n2)R model, susceptible individuals become exposed at time $t$ according to the instantaneous exposure rate $\beta I_t / (N - 1)$, and infectious individuals have a Gamma($n_2$, $n_2\gamma$) infectious period. This model also has exposed individuals, which are exposed for a Gamma($n_1$, $n_1\sigma$) distributed time before becoming infectious. The latent and infectious periods are defined such that their expected values are $1/\sigma$ and $1/\gamma$, respectively.

Transmissibility, or how easily the disease spreads through a population, for the homogeneous SI(n)R and SE(n1)I(n2)R models is most commonly quantified in terms of the basic reproduction number, $R_0 = \beta / \gamma$. This is the expected number of secondary infections of a primary infected individual in a large population of susceptible individuals. Another measure of transmissibility is the early growth rate, $r$, which is the exponential rate at which the number of infectious individuals increases in the early stages of an epidemic; this is calculated as the solution to
$$r(r + \sigma n_1)^{n_1} - R_0 \gamma (\sigma n_1)^{n_1} \left( 1 - \left( \frac{r}{\gamma n_2} + 1 \right)^{-n_2} \right) = 0, \qquad (2.5)$$
as described in [15]. Note that the above equation holds for the SI(n)R model with $n_1 = 0$ and $n_2 = n$. The reproduction number indicates on average how many people an infectious individual will infect, whereas the growth rate indicates how quickly the epidemic spreads.
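Equation (2.5) has no closed-form solution in general, but its positive root is straightforward to find numerically. The sketch below assumes $R_0 > 1$, so that a positive root exists ($r = 0$ is always a root), and adopts the convention $(\sigma n_1)^{n_1} = 1$ when $n_1 = 0$ to recover the SI(n)R case; the function name is illustrative.

```python
from scipy.optimize import brentq

def early_growth_rate(R0, gamma, sigma, n1, n2):
    """Solve Equation (2.5) for the early growth rate r (assumes R0 > 1)."""
    def f(r):
        latent = (r + sigma * n1) ** n1 if n1 > 0 else 1.0
        scale = (sigma * n1) ** n1 if n1 > 0 else 1.0
        return r * latent - R0 * gamma * scale * (1.0 - (r / (gamma * n2) + 1.0) ** (-n2))
    # r = 0 is always a root; bracket the positive root by expanding upwards
    hi = gamma
    while f(hi) <= 0.0:
        hi *= 2.0
    return brentq(f, 1e-12, hi)

# sanity check: for the SIR model (n1 = 0, n2 = 1), r = gamma * (R0 - 1)
r = early_growth_rate(R0=2.0, gamma=0.5, sigma=1.0, n1=0, n2=1)  # -> 0.5
```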

2.2.3 Household models

Epidemic models with two levels of mixing are models where individuals make effective contact according to two different rates [43]. In this thesis we consider a specific kind of epidemic with two levels of mixing, called household epidemic models. These models consider individuals grouped into $M$ mutually exclusive households, where the $k$th household has some known size, $N_k$. Individuals make effective contact at a high rate within households and at a low rate between households. Household epidemic models can be SIR-type or SEIR-type; here we describe the general household model for either type of epidemic model. Household epidemic models have a much more complicated state space than their homogeneous counterparts, as each household acts as a homogeneous epidemic in which infection can also be imported from other households. The simplest way to describe household epidemic models is by considering a model with labelled households, that is, a model which tracks the state of each household, as opposed to a model which aggregates the total number of households of the same size which are in the same state. Let $h_k$ be a vector which tracks the state of the $k$th household, so a state is given as a list of the states of each household, $h := \{h_1, \dots, h_M\}$, and the state space is given by $\hat{S} = \{h : h_k \in S_k, \text{ for } k = 1, \dots, M\}$, where $S_k$ is the state space of a household of size $N_k$. Here $S_k$ is either the state space of a standard SI(n)R or SE(n1)I(n2)R model for a population of size $N_k$.

Within-household transmission, recovery and, for an SEIR-type model, infection occur within a household as in the homogeneous model. Households interact via between-household transmission events, which are governed by a between-household effective contact parameter, $\alpha$. Specifically, let $\alpha$ be the rate at which individuals make effective contact with other individuals outside of their household. Let $I_t$, $s_t^k$ and $i_t^k$ be the total number of infectious individuals in the population, the number of susceptibles in household $k$, and the number of infectious individuals in household $k$ at time $t$, respectively. Note that between-household effective contact only occurs between members of different households, so household $k$ has $s_t^k$ individuals that the disease can be transmitted to, between-household transmission can occur from $I_t - i_t^k$ individuals, and susceptible individuals could make between-household contact with $\sum_{j \neq k} N_j$ individuals. Hence, the rate at which between-household transmission occurs and is transmitted to an individual in household $k$ is
$$\frac{\alpha s_t^k (I_t - i_t^k)}{\sum_{j \neq k} N_j}.$$
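Combining the within-household rates of Section 2.2.1 with the between-household rate above, a sketch of the per-household event rates for an SIR-type household model might look as follows; the function and array names are illustrative, and at least two households are assumed.

```python
import numpy as np

def household_event_rates(s, i, sizes, beta, gamma, alpha):
    """Per-household event rates for an SIR-type household model.

    s, i, sizes: arrays of susceptibles, infectious and household sizes,
    one entry per household (M >= 2 households assumed).
    """
    I_tot = i.sum()
    # within-household infection: beta * s_k * i_k / (N_k - 1), guarding size-1 households
    within = beta * s * i / np.maximum(sizes - 1, 1)
    # between-household infection: alpha * s_k * (I_t - i_k) / sum_{j != k} N_j
    between = alpha * s * (I_tot - i) / (sizes.sum() - sizes)
    recovery = gamma * i
    return within, between, recovery
```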

To quantify the overall transmissibility in household epidemic models we calculate the household reproduction number, $R_*$ [43]. This is the expected number of households infected by a primary infectious household in a population of susceptible households, where a household is considered infectious while it contains at least one infectious or exposed individual, and a household is considered susceptible if it contains only susceptible individuals. It is one of at least five reproductive numbers that might be used when assessing the controllability of a disease in a community of households [44–46]. We consider $R_*$ in this thesis as it is relatively easy to calculate and interpret. Let $\{X_t\}_{t \in \mathbb{R}_+}$ be the Markov chain that describes the state of an individual household from the time of the first exposure. Let $I(k)$ be the function which returns the number of infectious individuals corresponding to state $k$. Then
$$R_* = E\left[ \int_0^\infty \alpha I(X_t) \, dt \right],$$
where $X_0$, the initial state of the process, corresponds to a household with a single infectious individual and all other individuals susceptible [43,47]. This can be calculated by solving a system of linear equations depending on the parameters of the epidemic model; the household growth rate can be calculated similarly [48,49].

2.3 Bayesian Inference

This section outlines the statistical quantities and methods that will be considered throughout this thesis for estimating model parameters. Once model parameters are inferred, epidemiological parameters of interest, such as reproduction numbers and the early growth rate, may be inferred as functions of the model parameters. All inference methods considered in this thesis are Bayesian; that is, we consider parameters to have distributions to be inferred, rather than a single fixed value.

Given a set of observations, $y$, and model parameters, $\theta$, it is natural to consider the likelihood function, $L(\theta) = p(y \mid \theta)$, the probability of observing data $y$ given parameters $\theta$. Finding the $\theta$ that maximises this function is a sensible target for inference. However, in a Bayesian paradigm the objective is reversed and the quantity of interest is the posterior distribution, $p(\theta \mid y)$, that is, the distribution of the parameters, $\theta$, given observations, $y$. Not only is the argument that maximises the posterior distribution of interest, but the variance and other quantities can be analysed once a posterior distribution is inferred. By Bayes' theorem the posterior distribution is given by
$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{\int_\Theta p(y \mid \theta)\, p(\theta) \, d\theta} = \frac{L(\theta)\, p(\theta)}{\int_\Theta L(\theta)\, p(\theta) \, d\theta}, \qquad (2.6)$$
where $p(y) = \int_\Theta L(\theta) p(\theta) \, d\theta$ is a normalising constant known as the evidence; the distribution $p(\theta)$ reflects our prior beliefs about $\theta$ in the absence of data and is known as a prior distribution. Note that the prior distribution is not inferred; it is specified before inference takes place. However, the choice of prior distribution may have a substantial impact on inference, so it is important that it is chosen carefully. Priors are usually chosen for one of three reasons: they are chosen based on existing knowledge of parameter values; they are chosen to be flat with wide support, or to be objective Jeffreys priors, when there is no existing knowledge of parameter values; or, they are chosen to improve computational efficiency for inference. The latter is possibly a trade-off against the former two. For example, some priors (known as conjugate priors) allow the posterior distribution to have a closed form, which circumvents the need for integration and simplifies computation; they may have a mean and variance informed by existing knowledge or may be chosen to be relatively uninformative.
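As a minimal illustration of conjugacy (this example is illustrative rather than a model used later in the thesis), suppose $m$ infectious periods $x_1, \dots, x_m$ were observed exactly and modelled as Exponential($\gamma$), with a Gamma($a$, $b$) prior on the rate $\gamma$. Then
$$p(\gamma \mid x_{1:m}) \propto \gamma^m e^{-\gamma \sum_{i=1}^m x_i} \times \gamma^{a-1} e^{-b\gamma} \propto \gamma^{a+m-1} e^{-\left(b + \sum_{i=1}^m x_i\right)\gamma},$$
so $\gamma \mid x_{1:m} \sim \text{Gamma}\big(a + m,\, b + \sum_{i=1}^m x_i\big)$, and the posterior is available in closed form with no numerical integration required.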

The model evidence can be thought of as the likelihood of a given model; it is the probability that the data were generated from the assumed model, so it can be used to perform Bayesian model selection. That is, with a set of candidate models, $\{M_1, M_2, \dots, M_k\}$, and observed data, $y$, the evidence of model $i$ is $p(y \mid M_i)$. So if one chooses a prior over the candidate models, $p(M_i)$, the posterior model probability can be calculated as
$$p(M_i \mid y) = \frac{p(y \mid M_i)\, p(M_i)}{\sum_{j=1}^{k} p(y \mid M_j)\, p(M_j)},$$
and is interpreted as the probability that the process is described by model $M_i$ given the observed data and the prior distributions. Alternatively, models can also be compared by Bayes factors, which are ratios of model evidences; these tell us how many times more likely the observations are under one model compared to another [50]. Choosing a model based on Bayes factors is equivalent to choosing based on posterior model probabilities with a uniform prior over candidate models. One benefit of Bayes factors is that they do not need to be recomputed when another candidate model is added to the set of potential models; though recalculating posterior model probabilities is simple if the model evidences have already been computed.
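In practice the evidences $p(y \mid M_i)$ are tiny and are computed on the log scale, so the normalisation above is best done with a log-sum-exp style shift to avoid underflow. A minimal sketch, with illustrative names and a uniform prior by default:

```python
import numpy as np

def posterior_model_probs(log_evidence, prior=None):
    """Posterior model probabilities from log evidences log p(y | M_i)."""
    log_evidence = np.asarray(log_evidence, dtype=float)
    k = len(log_evidence)
    prior = np.full(k, 1.0 / k) if prior is None else np.asarray(prior)
    log_post = log_evidence + np.log(prior)
    log_post -= log_post.max()   # stabilise before exponentiating
    w = np.exp(log_post)
    return w / w.sum()

# e.g. three candidate models with a uniform prior
probs = posterior_model_probs([-1200.3, -1198.7, -1205.1])
```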

In many cases the posterior distribution is impossible to calculate analytically, due to the difficulty in calculating the evidence and/or the likelihood, so sampling-based methods are typically employed. The remainder of this section discusses methods that can be used to sample from posterior distributions. Sampling methods can also be useful in estimating the evidence, and hence, in performing Bayesian model selection.

2.3.1 Markov chain Monte Carlo

A classic method for Bayesian inference is Markov chain Monte Carlo (MCMC); this is a class of algorithms which sample from a Markov chain with a stationary distribution equal to a distribution of interest [51, 52]. In the context of model fitting, the Markov chain samples over the parameter space of the model being fitted, and sampling from the Markov chain's stationary distribution allows for estimation of the posterior distribution [51, 52]. A common MCMC method is the Metropolis-Hastings algorithm, which allows the posterior distribution to be approximated without needing to evaluate the evidence [51, 52]. Pseudocode for a Metropolis-Hastings algorithm is shown in Algorithm 2. The Metropolis-Hastings algorithm requires a choice of proposal density $q(\theta' \mid \theta)$, a distribution that samples the next candidate parameter set, $\theta'$, given the current parameter set, $\theta$. As long as $q(\theta' \mid \theta)$ ensures irreducibility of the Markov chain, convergence to the posterior distribution is assured; here irreducibility means that it is possible for the Markov chain to reach any state from any initial state. The choice of prior, $p(\theta)$, and proposal density, $q(\theta' \mid \theta)$, influence the rate of convergence of the Markov chain to its stationary distribution [51, 52].

Initialization:
Set a prior distribution p(θ);
Set some initial set of parameter values θ^(0);
Set n = 0;
Set the number of iterations to some large value, K;
Set a proposal density of the form q(θ′|θ);
Iterations:
while n ≤ K do
    sample a candidate θ′ from q(θ′|θ^(n));
    sample a uniform [0,1] variable, u;
    calculate
        α(θ^(n), θ′) = [L(θ′) p(θ′) q(θ^(n)|θ′)] / [L(θ^(n)) p(θ^(n)) q(θ′|θ^(n))];
    if u < α(θ^(n), θ′) then
        θ^(n+1) = θ′;
    else
        θ^(n+1) = θ^(n);
    end
    n = n + 1;
end
Algorithm 2: Metropolis-Hastings algorithm
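As a concrete illustration of Algorithm 2, the following is a minimal Python sketch for a generic model, assuming user-supplied log_lik and log_prior functions (hypothetical); it uses a Gaussian random-walk proposal, for which the q terms cancel, and works on the log scale to avoid numerical underflow:

import numpy as np

def metropolis_hastings(log_lik, log_prior, theta0, K=10000, step=0.1, rng=None):
    """Random-walk Metropolis-Hastings; returns an array of K correlated samples."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    samples = np.empty((K, theta.size))
    log_target = log_lik(theta) + log_prior(theta)
    for n in range(K):
        cand = theta + step * rng.standard_normal(theta.size)  # symmetric proposal
        log_target_cand = log_lik(cand) + log_prior(cand)
        if np.log(rng.uniform()) < log_target_cand - log_target:  # accept w.p. alpha
            theta, log_target = cand, log_target_cand
        samples[n] = theta
    return samples

With a particular model's log_lik and log_prior defined, samples = metropolis_hastings(log_lik, log_prior, [0.5]) returns draws whose long-run distribution is the posterior; burn-in is then discarded as described next.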

Algorithm 2 generates a Markov chain with stationary distribution equal to the posterior distribution from Equation (2.6). Hence, we need to decide when the Markov chain has reached stationarity. The number of iterations until the Markov chain is thought to reach stationarity, b, is referred to as burn-in; this is often chosen graphically. Specifically, we plot θ^(n) for n = 0, ..., K and choose b such that there is no trend in θ^(b:K); this plot is known as a trace plot [51, 52]. Examples of trace plots are shown in Figure 2.2. The top left panel shows a clear upwards trend, so stationarity has not been reached. The top right panel shows a large jump at about 1000 iterations, so perhaps 1000 is an appropriate burn-in, but more iterations are needed to make an informed choice. The bottom left panel shows samples moving around the same region for many iterations; this indicates that the stationary distribution may have been reached, so little burn-in is required. The last panel shows very slow movement, or mixing, around the state space; many more samples may be needed to assess stationarity. An alternative to checking a single trace plot is to run the algorithm multiple times with various initial parameter values, θ^(0), and compare trace plots, choosing b such that each of the trace plots shows similar behaviour.


Figure 2.2: Examples of trace plots from MCMC algorithms.

Once the burn-in has been chosen, samples from the Markov chain, θ^(b:K), can be used to calculate quantities of interest, such as expectations of functions of the parameters, which can be estimated by

$$\mathbb{E}[f(\theta)] \approx \frac{1}{K-b+1}\sum_{i=b}^{K} f\left(\theta^{(i)}\right).$$
Alternatively, a kernel smoother can be applied to the samples to approximate the posterior distribution, p(θ|y); an example of this is given in Figure 2.3. In this thesis kernel densities are estimated by the MATLAB functions 'kde', 'kde2' or 'akde' by Botev et al. [53].
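For instance, a minimal sketch of this estimator, assuming samples is an array of MCMC draws (e.g. from the metropolis_hastings sketch earlier) and b is the chosen burn-in:

import numpy as np

def posterior_expectation(samples, f, b):
    """Monte Carlo estimate of E[f(theta)] using the post-burn-in samples."""
    return np.mean([f(theta) for theta in samples[b:]], axis=0)

# e.g. the posterior mean: posterior_expectation(samples, lambda th: th, b=1000)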


Figure 2.3: An example of using MCMC samples to estimate a posterior distribution via kernel smoothing. The left panel is a scatter plot of samples of parameters, β and γ, from an MCMC algorithm. The right panel is a contour plot of the posterior distribution, estimated via kernel smoothing.

The Metropolis-Hastings algorithm is simple and easy to implement; however, it assumes that the likelihood is computationally efficient to evaluate. In this thesis we consider data that arise from partially-observed Markov chains, so the likelihood can be written in terms of a matrix exponential calculation. However, these calculations become computationally infeasible for all but the simplest models with small state spaces. The remainder of this section addresses methods for Bayesian inference which do not require calculation of the likelihood function; the two main tactics used to perform this kind of inference are either to simulate in a way that gives unbiased estimates of the likelihood, or to infer unobserved transitions of the model via imputation methods which then allow calculation of a likelihood.

2.3.2 Gibbs sampling

Gibbs sampling is a tool used in Bayesian inference, and in this thesis it has particular utility in data-augmented MCMC (see Section 2.3.3) [17]. The Gibbs sampling scheme allows samples from a joint distribution to be obtained via sampling from conditional distributions. For example, consider a joint density p(θ) where θ = {θ_1, ..., θ_k}. If it is difficult to sample from p(θ), one can instead sample consecutively from the conditional distributions, p(θ_j | θ \ {θ_j}), a large number of times (as given in Algorithm 3). After a large number of iterations these samples are approximately distributed according to p(θ). These kinds of sampling schemes, however, scale poorly with the dimension of the parameter space, and samples exhibit poor mixing when the high-density region of the target distribution is non-convex, as can be the case for posterior distributions related to SEIR models (see Figure 2.8). The Gibbs sampler can be generalised so that, rather than sampling exactly from p(θ_j | θ \ {θ_j}), one can sample a candidate θ_j and accept the candidate with the same acceptance-rejection probability as in a Metropolis-Hastings algorithm; we refer to this as a Hastings step.

Initialization:
Choose an initial parameter set, θ^(0) = {θ_1^(0), ..., θ_k^(0)};
Choose a large number of samples, K;
Sampling:
for n = 1 : K do
    for j = 1 : k do
        sample θ_j^(n) from p(θ_j | θ_1^(n), ..., θ_{j-1}^(n), θ_{j+1}^(n-1), ..., θ_k^(n-1));
    end
end
Algorithm 3: Gibbs sampling algorithm
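To illustrate Algorithm 3, here is a minimal Python sketch for a bivariate Gaussian target like the one used in Figure 2.4; the correlation value is an assumption for illustration, and both full conditionals are univariate Gaussian:

import numpy as np

def gibbs_bivariate_gaussian(K=10000, rho=0.8, rng=None):
    """Gibbs sampler for a standard bivariate Gaussian with correlation rho."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.zeros(2)
    samples = np.empty((K, 2))
    sd = np.sqrt(1.0 - rho**2)  # conditional standard deviation
    for n in range(K):
        # Each full conditional is univariate: theta_i | theta_j ~ N(rho*theta_j, 1-rho^2).
        theta[0] = rng.normal(rho * theta[1], sd)
        theta[1] = rng.normal(rho * theta[0], sd)
        samples[n] = theta
    return samples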

Figure 2.4 shows an example of Gibbs sampling. The left panel shows that the sample path over the parameter space alternates between jumps in the θ_1 and θ_2 directions. Level sets of the target density are shown under the sample path in the left panel of Figure 2.4, and the right panel shows a kernel density estimate of the target density based on 10,000 Gibbs samples.


Figure 2.4: An example of Gibbs sampling with target density p(θ), where θ = {θ_1, θ_2}. The left panel shows the sample path for the first 50 Gibbs samples plotted over the target density. For this example the target density is Gaussian and hence the conditional densities used by the Gibbs sampler are univariate Gaussian. The right panel shows contours of a kernel density approximation of p(θ) based on 10,000 Gibbs samples.

2.3.3 Data-augmented MCMC

Data-augmented Markov chain Monte Carlo (DA-MCMC) is a powerful, exact, Bayesian inference method for partially-observed Markov chain inference problems in which the full likelihood is intractable. The general approach is to construct an augmented likelihood, the joint density of the data and the missing information given the model parameters, and use this to construct a single-component Metropolis-Hastings algorithm. Essentially it works by increasing the dimension of the inference problem in order to infer missing events as well as parameters via Gibbs sampling or Hastings steps [17]. That is, our observations, y_{1:T}, are augmented by the full hidden Markov chain, x_{[0,T]}. The idea is that if the likelihood, p(y_{1:T}|θ), is intractable then we can sample from the joint posterior,
$$p(\theta, x_{[0,T]} \mid y_{1:T}) \propto p(y_{1:T}, x_{[0,T]} \mid \theta)\, p(\theta) \propto p(y_{1:T} \mid x_{[0,T]}, \theta)\, p(x_{[0,T]} \mid \theta)\, p(\theta),$$
and integrate with respect to x_{[0,T]}. Note that p(x_{[0,T]}|θ) is given as a product of terms given by Equation (2.4), so if p(y_{1:T}|x_{[0,T]}, θ) is known this posterior has a tractable augmented likelihood function, p(y_{1:T}, x_{[0,T]}|θ). The remainder of this section gives some examples of DA-MCMC implemented for epidemic models. These examples are based on uniform priors; however, other sensible choices include: uninformative exponential priors with low rate parameter; informative inverse-uniform priors on parameters (equivalent to a uniform prior on the average time spent in a compartment); or gamma-distributed priors, which are conjugate to the augmented likelihood for many of the models in this thesis and hence allow for efficient computation.

Example 1: SIR model
The data-augmented MCMC implemented here is a modified version of that described in [17]. The main idea of the algorithm is that we augment the observations, y_{1:T}, with a complete SIR process, x_{[0,T]}, in which all transition times and events are given from time 0 up to time T. Suppose our data record the total number of infection events at a daily resolution; we must choose an x_{[0,T]} that agrees with y_{1:T}, in that the number of infections each day in x_{[0,T]} must match the entries of y_{1:T}, and that describes a feasible SIR process. Let t_1 = 0, t_2, ..., t_{n-1}, t_n = T denote the transition times and boundaries of [0,T]. Let A be the set of transition-time indices that correspond to infection events (excluding the initial infection) and B be the indices that correspond to recovery events. Note that, provided x_{[0,T]} agrees with y_{1:T}, we have p(y_{1:T}, x_{[0,T]}|θ) = p(x_{[0,T]}|θ). Hence, by Equation (2.4), we have an augmented likelihood of the form
$$f(y_{1:T}, x_{[0,T]} \mid \beta, \gamma) = \prod_{i\in A} \frac{\beta S_{t_{i-1}} I_{t_{i-1}}}{N-1} \prod_{j\in B} \gamma I_{t_{j-1}} \exp\left\{ -\int_0^T \left( \frac{\beta S_t I_t}{N-1} + \gamma I_t \right) \mathrm{d}t \right\},$$
where the integral can be expressed as a sum over the transition times of x_{[0,T]},

$$\int_0^T \left( \frac{\beta S_t I_t}{N-1} + \gamma I_t \right) \mathrm{d}t = \sum_{j=1}^{n-1} \left( \frac{\beta S_{t_j} I_{t_j}}{N-1} + \gamma I_{t_j} \right) (t_{j+1} - t_j).$$
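As a sketch, this piecewise-constant integral can be computed directly from the imputed path; here times holds t_1, ..., t_n (with t_n = T) and S, I hold the states held between consecutive transitions (names are illustrative):

import numpy as np

def rate_integral(times, S, I, beta, gamma, N):
    """Integral of beta*S_t*I_t/(N-1) + gamma*I_t over [0, T], where S and I are
    piecewise constant between consecutive entries of `times`."""
    dt = np.diff(times)  # holding times t_{j+1} - t_j
    rates = beta * S[:-1] * I[:-1] / (N - 1) + gamma * I[:-1]
    return np.sum(rates * dt)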

We proceed by taking Gibbs samples from f(β | x_{[0,T]}, y_{1:T}, γ) and f(γ | x_{[0,T]}, y_{1:T}, β), and Hastings samples from f(x_{[0,T]} | β, γ, y_{1:T}). Suppose we have priors β/γ ∼ Uniform(a, b) and γ ∼ Uniform(c, d); then

$$f(\beta \mid x_{[0,T]}, y_{1:T}, \gamma) \underset{\beta}{\propto} f(y_{1:T}, x_{[0,T]}, \beta, \gamma) \underset{\beta}{\propto} f(y_{1:T}, x_{[0,T]} \mid \beta, \gamma)\, p(\beta \mid \gamma)\, p(\gamma) \underset{\beta}{\propto} \beta^{|A|} \exp\left\{ -\beta \int_0^T \frac{S_t I_t}{N-1}\, \mathrm{d}t \right\} \quad \text{for } \beta \in [a\gamma, b\gamma],$$
where "f ∝_β g" is notation for "f is proportional to g with respect to β". Hence,
$$\beta \mid x_{[0,T]}, y_{1:T}, \gamma \sim \text{Gamma}\left( |A| + 1,\ \int_0^T \frac{S_t I_t}{N-1}\, \mathrm{d}t \right)_{[a\gamma,\, b\gamma]},$$
where Gamma(a, b)_x refers to a gamma-distributed variable with shape parameter a and rate parameter b, truncated to the interval x. Similarly,
$$f(\gamma \mid x_{[0,T]}, y_{1:T}, \beta) \underset{\gamma}{\propto} \gamma^{|B|-1} \exp\left\{ -\gamma \int_0^T I_t\, \mathrm{d}t \right\} \quad \text{for } \gamma \in (c, d),$$
so
$$\gamma \mid x_{[0,T]}, y_{1:T}, \beta \sim \text{Gamma}\left( |B|,\ \int_0^T I_t\, \mathrm{d}t \right)_{[c,d]} \quad \text{for } |B| \neq 0.$$

Note that the shape parameter of f(γ | x_{[0,T]}, y_{1:T}, β) is |B|, rather than |B| + 1, because the conditional prior for β is given by p(β|γ) = 1/(γ(b − a)). As it is computationally efficient to sample gamma-distributed random variables, sampling from these conditional densities is highly efficient. Note that if |B| = 0, that is, there are no recoveries in x_{[0,T]}, Hastings steps are necessary to sample γ.
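The truncated gamma draws can be made by inverse-CDF sampling; a minimal sketch using SciPy (note that SciPy parameterises the gamma distribution by scale = 1/rate; the usage comment uses illustrative names):

import numpy as np
from scipy.stats import gamma as gamma_dist

def sample_truncated_gamma(shape, rate, lower, upper, rng=None):
    """Sample Gamma(shape, rate) truncated to [lower, upper] via the inverse CDF."""
    rng = np.random.default_rng() if rng is None else rng
    dist = gamma_dist(a=shape, scale=1.0 / rate)
    lo, hi = dist.cdf(lower), dist.cdf(upper)
    u = rng.uniform(lo, hi)  # uniform over the truncated CDF range
    return dist.ppf(u)

# e.g. a Gibbs draw for beta: sample_truncated_gamma(len(A) + 1, int_SI, a * g, b * g)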

Lastly, we sample from f(x_{[0,T]} | y_{1:T}, β, γ) by performing Hastings steps. These steps need to ensure irreducibility of the proposed x_{[0,T]} over the space of SIR epidemics which agree with the observed data. Hence, given a feasible x_{[0,T]}, we need to be able to move infection times within their day, to move recovery events over [0, T], and, as we do not know the number of recoveries, to insert and remove recovery events. The Hastings steps consist of choosing one of these four kinds of proposals according to an arbitrary pmf, {p_1, p_2, p_3, p_4}, and then proposing one of the following moves accordingly.

(i) Uniformly randomly select an infection event and move it uniformly over the day to give candidate path x′_{[0,T]}. Accept this candidate path with probability
$$\min\left\{ \frac{f(y_{1:T}, x'_{[0,T]} \mid \beta, \gamma)}{f(y_{1:T}, x_{[0,T]} \mid \beta, \gamma)},\ 1 \right\}.$$
(ii) Uniformly randomly select a recovery event and move it uniformly over [0, T] to give candidate path x′_{[0,T]}. Accept this candidate path with probability
$$\min\left\{ \frac{f(y_{1:T}, x'_{[0,T]} \mid \beta, \gamma)}{f(y_{1:T}, x_{[0,T]} \mid \beta, \gamma)},\ 1 \right\}.$$
(iii) Uniformly randomly select and remove a recovery event to give candidate path x′_{[0,T]}. Accept this candidate path with probability
$$\min\left\{ \frac{f(y_{1:T}, x'_{[0,T]} \mid \beta, \gamma)\, |B|\, p_4}{f(y_{1:T}, x_{[0,T]} \mid \beta, \gamma)\, T\, p_3},\ 1 \right\}.$$
(iv) Insert a recovery event at a Uniform(0, T) distributed time to give candidate path x′_{[0,T]}. Accept this candidate path with probability
$$\min\left\{ \frac{f(y_{1:T}, x'_{[0,T]} \mid \beta, \gamma)\, T\, p_3}{f(y_{1:T}, x_{[0,T]} \mid \beta, \gamma)\, (|B|+1)\, p_4},\ 1 \right\}.$$

So the DA-MCMC algorithm, in this case, begins by choosing a feasible initial hidden process, x_{[0,T]}, and model parameters. This can be done by sampling parameters from the prior distribution, uniformly generating infection times over each day, and generating Exponential(γ) infectious periods for every observed infectious case; this can be repeated until a feasible realisation of the SIR process is obtained. The algorithm proceeds by sampling from truncated gamma-distributed random variables and proposing simple Hastings steps for many iterations. After many iterations, the values of β and γ form a sample from the posterior distribution of interest, f(β, γ | y_{1:T}). Note that this is a simple implementation of the DA-MCMC algorithm for demonstrative purposes; great efficiencies can be gained by optimising the proposal density in the Hastings step or by using a non-centered reparameterisation of the model [54, 55].
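To illustrate the dimension-changing part of this scheme, the following is a minimal sketch of moves (iii) and (iv), assuming a model-specific (hypothetical) function log_f that returns the log augmented likelihood of a candidate set of recovery times, and minus infinity for infeasible paths; the choice among all four move types via {p_1, ..., p_4} is assumed to happen outside this function:

import numpy as np

def recovery_dimension_move(recoveries, log_f, T, p3, p4, insert, rng=None):
    """One Hastings update of the recovery set: move (iv) if `insert`, else move (iii).
    The proposal-probability corrections follow the acceptance ratios in the text."""
    rng = np.random.default_rng() if rng is None else rng
    B = len(recoveries)
    if insert:  # move (iv): add a Uniform(0, T) recovery time
        cand = sorted(recoveries + [rng.uniform(0, T)])
        log_alpha = log_f(cand) - log_f(recoveries) + np.log(T * p3 / ((B + 1) * p4))
    else:       # move (iii): remove a uniformly chosen recovery
        if B == 0:
            return recoveries  # nothing to remove
        k = int(rng.integers(B))
        cand = recoveries[:k] + recoveries[k + 1:]
        log_alpha = log_f(cand) - log_f(recoveries) + np.log(B * p4 / (T * p3))
    return cand if np.log(rng.uniform()) < log_alpha else recoveries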

Example 2: Labelled SIR model
An alternative approach to this inference problem is to assume a more complicated Markov chain. In this new chain individuals are labelled; that is, a state of the chain gives information on which individuals are susceptible, infected and recovered, as described in the second half of Section 2.2.1. The reason we consider this slightly more complicated Markov chain is that, rather than uniformly randomly proposing recovery times over [0, T], we can propose infectious periods for each individual from an appropriate distribution. Proposed changes to the underlying epidemic process are then accepted more often, which improves mixing. Further, the method becomes simpler as we no longer need to change the dimension of the sample space; we can simply allow recoveries to occur after time T, as illustrated in Figure 2.5. While it is counter-intuitive to fix the dimension of the hidden process at its largest possible value, this allows the Hastings steps to be defined in terms of only two kinds of transitions, and it simplifies the augmented likelihood function. Further, this approach extends easily to more complicated models.

Figure 2.5: An illustration of the labelled SIR process. The infectious periods of individuals are shown by red horizontal lines. Note that we have data up to the present time, T, but for the purposes of inference we impute the recovery times of all infectious individuals, allowing recoveries to occur in the future.

The new likelihood for transitions up to time T is given by
$$f(y_{1:T}, x_{[0,T]} \mid \beta, \gamma) = \gamma^{|B|} \prod_{i\in A} \frac{\beta I_{t_{i-1}}}{N-1} \exp\left\{ -\int_0^T \left( \frac{\beta S_t I_t}{N-1} + \gamma I_t \right) \mathrm{d}t \right\}. \qquad (2.7)$$
However, as we allow recoveries to occur after time T, there is an extra term incorporated into the likelihood. After time T we observe no more infections; we merely need to account for recoveries. Hence, the process after T can simply be thought of as a death process, giving
$$f(x_{[T,\infty)} \mid \beta, \gamma) = \gamma^{|A|+1-|B|} \exp\left\{ -\int_T^\infty \gamma I_t\, \mathrm{d}t \right\}, \qquad (2.8)$$
noting that absorption into I_t = 0 is guaranteed at some finite time. Combining Equations (2.7) and (2.8) gives
$$f(y_{1:T}, x_{[0,\infty)} \mid \beta, \gamma) = \gamma^{|A|+1} \beta^{|A|} \prod_{i\in A} \frac{I_{t_{i-1}}}{N-1} \exp\left\{ -\int_0^T \frac{\beta S_t I_t}{N-1}\, \mathrm{d}t \right\} \exp\left\{ -\int_0^\infty \gamma I_t\, \mathrm{d}t \right\}.$$
This can be simplified further by noting that $\int_0^\infty I_t\, \mathrm{d}t$ is simply the sum of the infectious periods of all individuals. Let the infectious period of individual p be denoted by ∆_p. We can then express the augmented likelihood as
$$f(y_{1:T}, x_{[0,\infty)} \mid \beta, \gamma) = \gamma^{|A|+1} \beta^{|A|} \prod_{i\in A} \frac{I_{t_{i-1}}}{N-1} \exp\left\{ -\int_0^T \frac{\beta S_t I_t}{N-1}\, \mathrm{d}t \right\} \exp\left\{ -\gamma \sum_{p=1}^{|A|+1} \Delta_p \right\}. \qquad (2.9)$$
Hence the marginal posterior for γ is

$$\gamma \mid \beta, x_{[0,\infty)}, y_{1:T} \sim \text{Gamma}\left( |A| + 1,\ \sum_{p=1}^{|A|+1} \Delta_p \right)_{[c,d]},$$
and the marginal posterior for β is as in the previous example.

The Hastings step now consists of choosing between shifting an infection event or resampling an infectious period, with probabilities p and 1 − p for some p ∈ (0, 1). Specifically, the changes to x_{[0,∞)} are made as follows.

(i) Uniformly randomly select an infection time and move it uniformly over the day to get candidate path x′_{[0,∞)}. Accept the new path with probability
$$\min\left\{ \frac{f(y_{1:T}, x'_{[0,\infty)} \mid \beta, \gamma)}{f(y_{1:T}, x_{[0,\infty)} \mid \beta, \gamma)},\ 1 \right\}.$$


Figure 2.6: Kernel density estimates of the posterior distribution based on DA-MCMC samples from the labelled SIR model (left) and the standard SIR model (right). The true parameters are (β, γ) = (0.5, 0.75) with prior distribution given by β/γ ∼ Uniform(0.9, 10) and γ ∼ Uniform(0.02, 4). The population was of size N = 100,000 and inference is performed on a data set with 417 individuals infected over 33 days. The kernel density estimates are based upon 1.5 × 10^7 samples with 5 × 10^6 iterations of burn-in. The red dot is the MAP estimate and the black dot marks the true parameter values.

(ii) Uniformly randomly select an individual, q, that became infectious over [0, T], and sample a candidate Exponential(γ) distributed infectious period, ∆′_q, to get candidate path x′_{[0,∞)}. The new point is accepted with probability
$$\min\left\{ \frac{f(y_{1:T}, x'_{[0,\infty)} \mid \beta, \gamma)}{f(y_{1:T}, x_{[0,\infty)} \mid \beta, \gamma)}\, e^{-\gamma(\Delta_q - \Delta'_q)},\ 1 \right\}.$$
Figure 2.6 demonstrates that this method agrees with the unlabelled version of the algorithm. The benefit of this method is that, although each iteration may take slightly longer, mixing is improved.
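A minimal sketch of move (ii), assuming a (hypothetical) function log_f giving the log augmented likelihood as a function of individual q's infectious period, with all other quantities held fixed:

import numpy as np

def resample_infectious_period(delta_q, gamma, log_f, rng=None):
    """Propose a new Exponential(gamma) infectious period for one individual;
    the Exponential proposal terms leave the extra factor exp(-gamma*(dq - dq_cand))."""
    rng = np.random.default_rng() if rng is None else rng
    cand = rng.exponential(1.0 / gamma)
    log_alpha = log_f(cand) - log_f(delta_q) - gamma * (delta_q - cand)
    return cand if np.log(rng.uniform()) < log_alpha else delta_q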

Example 3: Labelled SI(n)R model
Exponentially distributed infectious and latent periods are often assumed for computational reasons; in many cases it is more realistic to consider a less skewed distribution, and the simplest such distribution to implement is the Erlang distribution. Here we present methodology for a homogeneous SIR model with an Erlang(n, λ) infectious period; this is often referred to as an SI(n)R model. Throughout this thesis we let λ = nγ, such that the mean infectious period is still 1/γ. Inference for this model via DA-MCMC does not technically require n to be an integer; however, we only consider integers here so that the model can easily be represented as a CTMC after inference.

We derive the augmented likelihood by considering the labelled approach from the previous example. Under the labelled approach we know that if there are |A| + 1 infections, those individuals must each have n − 1 phase changes and a recovery; so these individuals must have n(|A| + 1) transitions that occur at rate λ. Further, as the rates in this likelihood do not change at phase-change events, the augmented likelihood may be expressed in terms of infectious periods without considering the phase-change transitions. Hence, the augmented likelihood is given by
$$f(y_{1:T}, x_{[0,\infty)} \mid \beta, \lambda) = \prod_{i\in A} \frac{\beta I_{t_{i-1}}}{N-1} \exp\left\{ -\int_0^T \frac{\beta S_t I_t}{N-1}\, \mathrm{d}t \right\} \lambda^{n(|A|+1)} \exp\left\{ -\lambda \sum_{p=1}^{|A|+1} \Delta_p \right\}.$$
Note that as this likelihood does not depend on the times of the phase changes, we can consider a likelihood which only accounts for infection and recovery events; that is, we need not augment our data with all of the hidden transitions, which allows us to perform inference in a lower-dimensional space. To get a new augmented likelihood in this lower-dimensional space we integrate the augmented likelihood with respect to the inter-arrival times of all of the phase changes of each individual (where individual p spends time $\Delta_p^i$ in phase i); that is, we integrate over the region $A_p = \left\{ \Delta_p^{(1:n-1)} : \sum_{k=1}^{n-1} \Delta_p^k \leq \Delta_p \right\}$ for all p. This yields a new augmented likelihood, f̂, of the form

$$\hat{f}(y_{1:T}, x_{[0,\infty)} \mid \beta, \lambda) = f(y_{1:T}, x_{[0,\infty)} \mid \beta, \lambda) \prod_{p=1}^{|A|+1} \frac{\Delta_p^{\,n-1}}{(n-1)!} = \prod_{i\in A} \frac{\beta I_{t_{i-1}}}{N-1} \exp\left\{ -\int_0^T \frac{\beta S_t I_t}{N-1}\, \mathrm{d}t \right\} \prod_{p=1}^{|A|+1} \frac{\lambda^n \Delta_p^{\,n-1}}{(n-1)!} \exp\left\{ -\lambda \Delta_p \right\}. \qquad (2.10)$$
Hence, during the Hastings step we need not sample phase-change events. Note that the likelihood above is expressed in a way that highlights its relationship with the Erlang-distributed recoveries. Further, it can be shown that for labelled models the augmented likelihood can be split into a term related to the inhomogeneous Poisson transmission process and a term comprising the product of the infectious-period pdfs; this is a result of all individuals acting independently after transmission. That is, for an SIR model with a general infectious period that depends on parameters ψ, the augmented likelihood becomes

$$f(y_{1:T}, x_{[0,\infty)} \mid \beta, \psi) = \prod_{i\in A} \frac{\beta I_{t_{i-1}}}{N-1} \exp\left\{ -\int_0^T \frac{\beta S_t I_t}{N-1}\, \mathrm{d}t \right\} \prod_{p=1}^{|A|+1} f(\Delta_p \mid \psi);$$
a similar statement can be made with respect to labelled SEIR-type models.
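For example, the infectious-period factor in Equation (2.10) is just a product of Erlang(n, λ) densities, which can be evaluated as follows (using SciPy's gamma distribution, of which the Erlang is the integer-shape case):

import numpy as np
from scipy.stats import gamma as gamma_dist

def log_erlang_period_term(deltas, n, lam):
    """Sum of log Erlang(n, lam) densities of the imputed infectious periods."""
    return np.sum(gamma_dist.logpdf(np.asarray(deltas), a=n, scale=1.0 / lam))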

Let λ have a Uniform(0.1n, 4n) prior distribution. From Equation (2.10) it follows that the marginal posterior distribution for λ is

$$\lambda \mid \beta, x_{[0,\infty)}, y_{1:T} \sim \text{Gamma}\left( n(|A| + 1),\ \sum_{p=1}^{|A|+1} \Delta_p \right)_{[0.1n,\,4n]};$$
the marginal posterior distribution for β is as described in the previous sections. Lastly, the Hastings step again consists of shifting either the infection times or the recovery times of the hidden process, with probabilities p and 1 − p for p ∈ (0, 1). Specifically, the changes to x_{[0,∞)} are made as follows.

(i) Uniformly randomly select an infection time, and uniformly choose a candidate infection time over the day. The new point is accepted with probability
$$\min\left\{ \frac{\hat{f}(y_{1:T}, x'_{[0,\infty)} \mid \beta, \lambda)}{\hat{f}(y_{1:T}, x_{[0,\infty)} \mid \beta, \lambda)},\ 1 \right\}.$$
(ii) Uniformly randomly select an individual that became infectious, say individual p, and sample a candidate Erlang(n, λ) distributed infectious period, ∆′_p. The new point is accepted with probability
$$\min\left\{ \frac{\hat{f}(y_{1:T}, x'_{[0,\infty)} \mid \beta, \lambda)}{\hat{f}(y_{1:T}, x_{[0,\infty)} \mid \beta, \lambda)} \left( \frac{\Delta_p}{\Delta'_p} \right)^{n-1} e^{-\lambda(\Delta_p - \Delta'_p)},\ 1 \right\}.$$
The interesting feature of this scheme is that there is no need to keep track of the times of phase changes and no need to consider the number of individuals in each infectious phase, as no part of the augmented likelihood depends on these. Hence, the labelled approach for the SI(n)R model should be computationally similar for all n ∈ N.

Results from inference using a single simulation of an SI(3)R epidemic are shown in Figure 2.7. We observe that the region of high kernel density is quite linear, with very little variance about the line. This shows that, due to the low variance in the infectious period, we are able to accurately estimate R_0; however, there is still uncertainty around the individual parameter estimates.


Figure 2.7: A kernel density estimate of the parameters of an SI(3)R model (left) and a kernel density estimate of R_0 (right) from DA-MCMC. The true parameters are (β, γ) = (0.5, 0.6) with prior distribution given by β/γ ∼ Uniform(0.9, 10) and γ ∼ Uniform(0.02, 4). The population was of size N = 10^5 and inference is performed on a data set with 438 individuals infected over 21 days. The kernel density estimates are based upon 2 × 10^7 samples with 5 × 10^6 iterations of burn-in. The black points are the MAP estimates and the red points are the true parameter values.

An extension of this approach is to also infer the shape parameter of the model; the extension simply requires sampling from
$$f(n \mid \gamma, \beta, x, \theta) \underset{n}{\propto} \frac{\prod_{p=1}^{|A|+1} (\lambda \Delta_p)^n\, p(n)}{n!^{\,|A|+1}}; \qquad (2.11)$$
note that for p(n) on finite support, sampling is a matter of calculating f(n | γ, β, x, θ) for each n and normalising. In practice these quantities can be difficult to work with, as they require division by a factorial raised to a possibly large power. For all but small data sets the mixing of the MCMC is slow in n; this is one motivation for the sequential DA-MCMC algorithm discussed further in Section 3.3.
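Working in log space sidesteps the factorial issue; the following is a minimal sketch of a draw of n from Equation (2.11) over a finite support, following the form of the equation as printed above (with λ = nγ substituted in) and assuming a user-supplied log-prior:

import numpy as np
from scipy.special import gammaln

def sample_shape_n(deltas, gamma_rate, log_prior, n_values, rng=None):
    """Draw n from its discrete conditional (Eq. 2.11): the log-weight of n is
    n * sum_p log(lam*Delta_p) - (|A|+1)*log(n!) + log p(n), with lam = n*gamma_rate."""
    rng = np.random.default_rng() if rng is None else rng
    deltas = np.asarray(deltas)  # the |A|+1 imputed infectious periods
    log_w = np.array([
        n * np.sum(np.log(n * gamma_rate * deltas))
        - deltas.size * gammaln(n + 1) + log_prior(n)
        for n in n_values
    ])
    w = np.exp(log_w - log_w.max())  # normalise stably
    return int(rng.choice(n_values, p=w / w.sum()))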

Example 4: SEIR model with stochastic seeding time
In this section we present the methodology for data-augmented inference on a homogeneous SEIR model. Suppose that the process is still initialised by a single infectious individual seeded in the population at a Uniform(0, 1) distributed time. The only difference between the SEIR and SIR models is that there is an Exponential(σ) distributed time between the exposure of an individual to the disease and the time at which they become infectious.

Similar to before, with C as the set of indices corresponding to exposures from time t_1, the joint density conditional on β, σ, γ and t_1 can then be expressed as

$$f(y_{1:T}, x_{(t_1,T]} \mid \beta, \sigma, \gamma, t_1) = \prod_{i\in A} \frac{\beta S_{t_{i-1}} I_{t_{i-1}}}{N-1} \prod_{j\in B} \gamma I_{t_{j-1}} \prod_{k\in C} \sigma E_{t_{k-1}} \times \exp\left\{ -\int_{t_1}^T \left( \frac{\beta S_t I_t}{N-1} + \sigma E_t + \gamma I_t \right) \mathrm{d}t \right\}.$$
The marginal distributions of β and γ are as for the homogeneous SIR model, except now the rates are integrated from t_1. We are left to sample from the marginal posterior distributions of σ and t_1. Suppose the prior of σ is Uniform(e, f). The marginal distribution of σ is similar to those of the other parameters,
$$\sigma \mid \beta, \gamma, x_{(t_1,T]}, t_1 \sim \text{Gamma}\left( |C| + 1,\ \int_{t_1}^T E_t\, \mathrm{d}t \right)_{[e,f]}.$$

The marginal posterior distribution of t_1 is given by

$$f(t_1 \mid \beta, \sigma, \gamma, x_{[t_2,T]}) \underset{t_1}{\propto} f(y_{1:T}, x_{(t_1,T]} \mid \beta, \sigma, \gamma, t_1)\, p(t_1) = e^{(\beta+\gamma)t_1} \left[ \int_0^{\min\{1,t_2\}} e^{(\beta+\gamma)t}\, \mathrm{d}t \right]^{-1} = \frac{(\beta+\gamma)\, e^{(\beta+\gamma)t_1}}{e^{(\beta+\gamma)\min\{1,t_2\}} - 1} \quad \text{for } t_1 \in (0, \min\{1, t_2\}),$$
as the model assumption is that t_1 is seeded uniformly over the day, though conditionally t_1 must occur before t_2. As the cdf of the marginal posterior of t_1 is analytically invertible, t_1 may be sampled via inverse transform sampling. Note that |B| ≤ |A| + 1 ≤ |C| + 1, as the number of exposures and recoveries are bounded below and above, respectively, by the number of observations. As the latent process needs to be altered by shifting, removing or inserting exposure events, there are three new kinds of Hastings steps compared to those for the standard SIR model. We make one of the seven moves according to distribution {p_1, ..., p_7}. The moves related to exposures are as follows.

(v) Uniformly randomly select an exposure event and choose a candidate Uniform(t_1, T) distributed exposure time to get new sample path x′_{(t_1,T]}. The new path is accepted with probability
$$\min\left\{ \frac{f(y_{1:T}, x'_{(t_1,T]} \mid \beta, \sigma, \gamma, t_1)}{f(y_{1:T}, x_{(t_1,T]} \mid \beta, \sigma, \gamma, t_1)},\ 1 \right\}.$$
(vi) If |C| > |A|, uniformly randomly select an exposure time and remove it to get new sample path x′_{(t_1,T]}. The new path is accepted with probability
$$\min\left\{ \frac{f(y_{1:T}, x'_{(t_1,T]} \mid \beta, \sigma, \gamma, t_1)\, |C|\, p_7}{f(y_{1:T}, x_{(t_1,T]} \mid \beta, \sigma, \gamma, t_1)\, (T - t_1)\, p_6},\ 1 \right\}.$$
(vii) Insert a Uniform(t_1, T) distributed exposure time to get new sample path x′_{(t_1,T]}. The new path is accepted with probability
$$\min\left\{ \frac{f(y_{1:T}, x'_{(t_1,T]} \mid \beta, \sigma, \gamma, t_1)\, (T - t_1)\, p_6}{f(y_{1:T}, x_{(t_1,T]} \mid \beta, \sigma, \gamma, t_1)\, (|C| + 1)\, p_7},\ 1 \right\}.$$
Results for a simulated data set for the SEIR model are shown in Figure 2.8. Note that there appears to be poor identifiability of individual parameters, as the posterior shows relatively high variance; however, the posterior appears to peak along the line where the reproduction number and growth rate take their input values. That is, while the data may not be rich enough to distinguish all model parameters, they may still be used to accurately infer important epidemiological quantities.

2.3.4 Particle filters

Particle filters, a kind of sequential Monte Carlo (SMC) method, are Bayesian inference methods which target a sequence of distributions [56]. This allows new data to be incorporated into the inference procedure without needing to perform inference anew on the whole series of observations. Further, these methods are embarrassingly parallelisable, which allows for efficient inference on a computing cluster. In this section we describe particle filters in their simplest form, in which the set of model parameters is fixed and the hidden process, x_{[0,T]}, is to be inferred.

Given some initial state density p(x_0), the Q-matrix of the hidden process and a set of observations y_{1:T}, we wish to estimate the sequence of distributions of the hidden process, p(x_{[0,T]} | y_{1:T}), and the likelihood function, p(y_{1:T}).

Figure 2.8: Kernel density estimate of the parameters of an SEIR model. These are based upon 2 × 10^7 samples with 5 × 10^6 iterations of burn-in. The true parameter values are given by the intersection of the black lines, the red dot gives the MAP estimate, and the red dotted line is the line along which the reproduction number, R, and the early growth rate take their true values.

SMC methods typically make use of the recursive formula

$$p(x_{[0,t+1]} \mid y_{1:t+1}) = \frac{p(x_{[0,t+1]}, y_{t+1} \mid y_{1:t})}{p(y_{t+1} \mid y_{1:t})} = p(x_{[0,t]} \mid y_{1:t})\, \frac{p(x_{(t,t+1]}, y_{t+1} \mid y_{1:t}, x_{[0,t]})}{p(y_{t+1} \mid y_{1:t})} = p(x_{[0,t]} \mid y_{1:t})\, \frac{p(x_{(t,t+1]}, y_{t+1} \mid x_t)}{p(y_{t+1} \mid y_{1:t})} = p(x_{[0,t]} \mid y_{1:t})\, \frac{p(y_{t+1} \mid x_{[t,t+1]})\, p(x_{(t,t+1]} \mid x_t)}{p(y_{t+1} \mid y_{1:t})}, \qquad (2.12)$$
where the likelihood function is given by a product of likelihood increments,
$$p(y_{t+1} \mid y_{1:t}) = \int p(y_{t+1} \mid x_{[t,t+1]})\, p(x_{(t,t+1]} \mid x_t)\, \mathrm{d}x_{(t,t+1]}. \qquad (2.13)$$

Typically, when the state space is large it is difficult to evaluate Equation (2.13), and hence Equation (2.12). Thus, many methods aim to sample from the sequence of target distributions p(x_{[0,t]} | y_{1:t}) and use the samples to estimate the likelihood function.

Here we outline a common particle filter method for estimating solutions to Equations (2.12) and (2.13): the sequential importance sampling filter [22]. This method obtains unbiased samples of x_{[0,t]} by sequentially sampling x_{(t,t+1]} from arbitrary densities and accounting for the sampling procedure. These samples could be obtained via the Doob-Gillespie algorithm, or, if available, by sampling exactly from p(x_{(t,t+1]} | x_t, y_{t+1}), or by some other method. The algorithm is as follows. Take N samples, or particles, x_0^{(1:N)}, from some density q_0(x_0). For each i calculate particle weights,
$$w_0^i = \frac{p(x_0^i)}{q_0(x_0^i)},$$
and normalised weights
$$W_t^i = \frac{w_t^i}{\sum_{j=1}^N w_t^j}. \qquad (2.14)$$
From t = 1 we proceed by recursively sampling x_{(t-1,t]}^i from an arbitrary density q(·|x_{t-1}^i) and estimate p(x_{[0,t]} | y_{1:t}) up to proportionality by calculating weights

$$w_t^i = W_{t-1}^i\, \frac{p(y_t \mid x_{[t-1,t]}^i)\, p(x_{(t-1,t]}^i \mid x_{t-1}^i)}{q(x_{(t-1,t]}^i \mid x_{t-1}^i)},$$
which is a particle approximation of Equation (2.12) up to proportionality. The particle filter approximation of p(x_{[0,t]} | y_{1:t}) is given by
$$\hat{p}(x_{[0,t]} \mid y_{1:t}) = \sum_{i=1}^N W_t^i\, \delta_{x_{[0,t]}^i}(x_{[0,t]}),$$
where δ_x denotes the Dirac delta function with mass at x. The particle filter approximation of the likelihood increments, from Equation (2.13), is given by
$$\hat{p}(y_t \mid y_{1:t-1}) = \sum_{i=1}^N w_t^i.$$
The product of the likelihood increments provides an unbiased estimate of the likelihood [57]. One issue with the particle filter is that some sampling densities can lead to weights with infinite variance and, more generally, the variance becomes large when applying the sequential importance sampling filter to a long series of observations. Variance can be decreased by resampling particles, where particle i is resampled with probability W_t^i.

Once resampling is performed the particle weights are set to W_t^i = 1/N for i = 1, ..., N. The resampling procedure removes particles associated with low weights, which correspond to unlikely realisations of x_{[0,t]}, and increases the number of particles that correspond to likely realisations of x_{[0,t]} given y_{1:t}. The sequential importance sampling algorithm with resampling is aptly referred to as the Sequential Importance Resampling algorithm; see Algorithm 4. Algorithm 4 only gives output for inferring the current state of the system, x_T; in practice one can infer x_{[0,T]} by saving the full trajectories x_{[0,t]} at each iteration. Resampling need not be performed at every iteration; a common strategy is to resample only if the effective sample size (ESS) drops below a given threshold, say ζN for ζ ∈ (0, 1] [22]. The effective sample size is given by
$$\text{ESS} = \frac{1}{\sum_{i=1}^N (W^i)^2}.$$
The ESS represents the number of approximately independent samples in a set of correlated samples.

Resampling particles many times often leads to the particles x_T coming from few initial states x_0; this issue is referred to as particle degeneracy. That is, if N is not sufficiently large and we start with a sample x_0^{(1:N)} and sequentially resample the particles many times, the number of distinct x_0 values can only decrease. More generally, earlier states of the system will be degenerate if T is large and N is not sufficiently large. Particle degeneracy implies that we have few distinct samples from the posterior distribution, so estimates have high error. The method of resampling has an effect on how quickly particle degeneracy occurs. Algorithm 4 uses multinomial resampling for simplicity, but systematic resampling can generally improve performance [22, 58]. A solution to particle degeneracy is to implement a smoothing method, for example, forwards-filtering backwards-sampling [59]. As the particle filter gives unbiased estimates of the likelihood function, it admits joint parameter and state inference; this is explained further in Section 2.3.5.

Inputs: The number of particles, N, and observed data, y_{1:T}.
Outputs: The likelihood estimate, p̂(y_{1:T}), and weighted particles, {x_T^i, W^i}_{i=1}^N.
Initialization:
for i = 1, ..., N do
    Sample an initial state x_0^i ∼ q_0(·);
    Calculate the initial weight w^i = p(x_0^i)/q_0(x_0^i);
end
For i = 1, ..., N calculate normalised weights W^i = w^i / Σ_{j=1}^N w^j;
Iterations:
for t = 1 : T do
    Importance sampling step:
    for i = 1, ..., N do
        Sample x_{(t-1,t]}^i from an arbitrary density q(·|x_{t-1}^i);
        Update the weight:
            w^i = W^i p(y_t | x_{[t-1,t]}^i) p(x_{(t-1,t]}^i | x_{t-1}^i) / q(x_{(t-1,t]}^i | x_{t-1}^i);
    end
    For i = 1, ..., N calculate normalised weights W^i = w^i / Σ_{j=1}^N w^j;
    Set p̂(y_{1:t}) = (Σ_{i=1}^N w^i) p̂(y_{1:t-1});
    Resampling step:
    if some condition holds then
        Sample N indices, r_1, ..., r_N, from 1, ..., N with probabilities W^1, ..., W^N;
        For i = 1, ..., N set the new particle according to index r_i, that is, set {x_t^i, W^i} = {x_t^{r_i}, 1/N};
    end
end
Algorithm 4: Pseudocode for a sequential importance resampling algorithm
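A minimal Python sketch of Algorithm 4 as a bootstrap filter, i.e. with the proposal q equal to the model's transition simulator so that the weight update reduces to the observation density; simulate, log_obs and init are user-supplied (assumed) functions:

import numpy as np

def bootstrap_filter(simulate, log_obs, init, y, N, zeta=0.5, rng=None):
    """Bootstrap SIR filter: returns (log-likelihood estimate, particles, weights).
    simulate(x, rng) advances one particle over one observation interval;
    log_obs(y_t, x) is the log observation density; init(rng) draws x_0."""
    rng = np.random.default_rng() if rng is None else rng
    particles = [init(rng) for _ in range(N)]
    W = np.full(N, 1.0 / N)
    log_lik = 0.0
    for yt in y:
        particles = [simulate(x, rng) for x in particles]
        logw = np.log(W) + np.array([log_obs(yt, x) for x in particles])
        m = logw.max()
        w = np.exp(logw - m)
        log_lik += m + np.log(w.sum())        # log likelihood increment, log sum_i w_t^i
        W = w / w.sum()
        if 1.0 / np.sum(W**2) < zeta * N:     # resample when the ESS drops below zeta*N
            idx = rng.choice(N, size=N, p=W)  # multinomial resampling
            particles = [particles[i] for i in idx]
            W = np.full(N, 1.0 / N)
    return log_lik, particles, W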

2.3.5 Particle-marginal MCMC

Particle-marginal MCMC (PM-MCMC) is a method which allows parameter inference to be performed when the likelihood function is intractable [60]. It uses positive, unbiased estimates of the likelihood function, which can be obtained from a particle filter, in an otherwise standard MCMC scheme. This new MCMC scheme still converges to the target distribution; however, it exhibits slower mixing and requires a longer burn-in period than the ideal MCMC algorithm [60].

Here we outline and justify the particle-marginal Metropolis-Hastings algorithm by expanding upon the description given in [60]. The algorithm works by creating a Markov chain that samples over the parameter space and a space of positive, unbiased likelihood estimates. Once the Markov chain reaches stationarity, the samples from the parameter space are distributed according to the posterior distribution. Specifically, given some initial point θ and likelihood estimate p̂(y|θ), we propose the next parameter value and likelihood estimate according to θ* ∼ q_1(·|θ) and p̂(y|θ*) ∼ q_2(·|θ*), where q_1 is an arbitrary sampling density and q_2 is such that E[p̂(y|θ)] = p(y|θ). The proposal is accepted with probability

$$\min\left\{ 1,\ \frac{\hat{p}(y \mid \theta^*)\, p(\theta^*)\, q_1(\theta \mid \theta^*)}{\hat{p}(y \mid \theta)\, p(\theta)\, q_1(\theta^* \mid \theta)} \right\},$$
which is the acceptance probability used in a standard Metropolis-Hastings algorithm with the likelihood replaced by the unbiased estimate, p̂(y|θ). Note that the acceptance probability above is equal to

$$\min\left\{ 1,\ \frac{\hat{p}(y \mid \theta^*)\, p(\theta^*)\, q_1(\theta \mid \theta^*)\, q_2(\hat{p}(y \mid \theta^*) \mid \theta^*)\, q_2(\hat{p}(y \mid \theta) \mid \theta)}{\hat{p}(y \mid \theta)\, p(\theta)\, q_1(\theta^* \mid \theta)\, q_2(\hat{p}(y \mid \theta^*) \mid \theta^*)\, q_2(\hat{p}(y \mid \theta) \mid \theta)} \right\}.$$
We can see that this is equivalent to accepting a proposal of p̂(y|θ*) and θ* in an MCMC scheme that targets p̂(y|θ) q_2(p̂(y|θ)|θ) p(θ) up to proportionality. Marginalising over p̂(y|θ) gives
$$p(\theta) \int \hat{p}(y \mid \theta)\, q_2(\hat{p}(y \mid \theta) \mid \theta)\, \mathrm{d}\hat{p}(y \mid \theta) = p(y \mid \theta)\, p(\theta);$$
therefore the stationary distribution of the Markov chain, marginalised with respect to p̂(y|θ), is equal to the posterior distribution. As particle filters give unbiased estimates of the likelihood, one can choose the sampler q_2 to be a particle filter that returns unbiased likelihood estimates. Hence, PM-MCMC can be used to obtain samples from the parameter space, and for each of these parameter samples the particle filter gives samples from the state space.

2.3.6 Sequential Monte Carlo squared

Sequential Monte Carlo squared (SMC-squared) is an algorithm for both inferring parameters and states of a partially-observed stochastic process [24-26]. The algorithm uses a sequential importance resampling algorithm in the parameter space, where the weight of each parameter value is obtained via a particle filter in the state space. This allows parameter and state estimates to be updated sequentially as new data are obtained. It provides an alternative to PM-MCMC, where the sequential nature of the algorithm allows samples to be made in important parts of the parameter space (samples with reasonable posterior support). During the inference, SMC-squared also offers an estimate of the model evidence, which allows one to perform Bayesian model selection. However, the error in the model evidence estimate can be too large to effectively select between models [25].

Algorithm 5 is an example of an SMC-squared algorithm. The algorithm considers N_θ weighted particles in the parameter space, {θ^i, W^i}_{i=1}^{N_θ}, where each parameter set, θ^i, is associated with N_x state particles, [{x_t^j}_{j=1}^{N_x}]^i. For simplicity Algorithm 5 assumes that each state particle has the same weight, though in practice one can run an SMC-squared algorithm while keeping track of both state-particle weights and parameter-particle weights. The state particles will have equal weights if resampling is performed after the particle filter step, or if the states are sampled such that the weights are equal, such as via the alive particle filter [25]. Algorithm 5 avoids particle degeneracy (discussed in Section 2.3.4) by performing a resampling step (to remove particles with low posterior support) and a rejuvenation step (to smooth particles in the parameter space). Just as in the sequential importance resampling algorithm, this step is implemented when the ESS drops below a threshold, ζN_θ, for ζ ∈ (0, 1]. Here, as the aim is to avoid degeneracy in the parameter space, as opposed to the state space, one can use a PM-MCMC kernel to rejuvenate particles. The PM-MCMC rejuvenation aims to shift particles, which already correspond to values with posterior support, around the parameter space such that there are many unique samples from the parameter space which have posterior support. The proposals of the PM-MCMC steps can also be informed by the particles; for example, by fitting a Gaussian mixture to the current set of particles, which allows parameters to be proposed in parts of the parameter space known to have reasonable posterior support. Although the rejuvenation step avoids degeneracy of parameter and state particles, it does not decrease the error in estimates of the evidence. Further, it is difficult to determine how many PM-MCMC steps are necessary to shift parameter particles around the parameter space sufficiently. A main contribution of this thesis is a modified SMC-squared algorithm with a rejuvenation step that allows evidence estimates to be updated to a given accuracy and that incorporates a stopping criterion allowing sufficient iterations to be made to avoid particle degeneracy.

Inputs: The number of parameter particles, N_θ, the number of state particles, N_x, the number of PM-MCMC steps for each particle in the rejuvenation step, R, and observed data, y_{1:T}.
Outputs: The evidence estimate, p̂(y_{1:T}), and weighted particles, {θ^i, W^i}_{i=1}^{N_θ}.
Initialization:
for i = 1, ..., N_θ do
    Generate an initial parameter set θ^i ∼ p(θ);
    Set initial weight W^i = 1/N_θ;
    For j = 1, ..., N_x generate initial states x_0^j ∼ q_0(·);
    Resample initial states, where index j is sampled in proportion to p(x_0^j)/q_0(x_0^j);
end
for t = 1 : T do
    for i = 1 : N_θ do
        Particle filter step:
        Run an iteration of the particle filter with inputs: θ^i, [{x_{t-1}^j}_{j=1}^{N_x}]^i and y_t;
        Return outputs: [{x_t^j}_{j=1}^{N_x}]^i and p̂(y_t | y_{1:t-1}, θ^i);
        Update the likelihood estimate p̂(y_{1:t} | θ^i) = p̂(y_{1:t-1} | θ^i) p̂(y_t | y_{1:t-1}, θ^i);
        Update the unnormalised weight w^i = p̂(y_t | y_{1:t-1}, θ^i) W^i;
    end
    Calculate normalised weights {W^i}_{i=1}^{N_θ};
    Set p̂(y_{1:t}) = (Σ_{i=1}^{N_θ} w^i) p̂(y_{1:t-1});
    Calculate ESS = 1 / Σ_{i=1}^{N_θ} (W^i)^2;
    if ESS < ζN_θ then
        Resampling:
        for i = 1 : N_θ do
            Sample index r with probability W^r;
            Set {θ^i, [{x_t^j}_{j=1}^{N_x}]^i, W^i} = {θ^r, [{x_t^j}_{j=1}^{N_x}]^r, 1/N_θ};
        end
        Rejuvenation:
        for i = 1 : N_θ do
            Run R iterations of PM-MCMC with inputs: [{x_t^j}_{j=1}^{N_x}]^i, θ^i and p̂(y_{1:t} | θ^i);
            Return outputs: updated samples of [{x_t^j}_{j=1}^{N_x}]^i, θ^i and p̂(y_{1:t} | θ^i);
        end
    end
end
Algorithm 5: An SMC-squared algorithm

Chapter 3

Efficient Bayesian Inference for Epidemics

This chapter discusses applications of efficient data-augmented Markov chain Monte Carlo (DA-MCMC) for inferring parameters of epidemic models, given First Few Hundred (FF100) study data. Section 3.1 introduces an efficient implementation of DA-MCMC for inferring parameters of a household SIR model and compares this with an MCMC scheme that uses an approximation of the likelihood based on branching processes. Section 3.2 gives an implementation of DA-MCMC applied to a household SE(2)I(2)R model with a realistic household size distribution. Lastly, Section 3.3 describes a new kind of method, dubbed sequential data-augmented Markov chain Monte Carlo (SDA-MCMC), which performs DA-MCMC sequentially in order to improve mixing. This is applied to a homogeneous SIR model and to multiple independent outbreaks from an SI(n)R model with unknown shape parameter.

3.1 Data-Augmentation for an SIR Household Model

This section discusses some results from [6], which considers the problem of inferring epidemic parameters of an SIR household model supposing we only observe the total number of infections in each household at a daily resolution. That paper compares a method developed in [61] with a novel, efficient data-augmented MCMC scheme. As such, this section highlights the data-augmented MCMC scheme and presents a comparison of the two methods.


3.1.1 Model and data assumptions

For this study we consider an SIR household model, as described in Section 2.2.3, in a population of M households, each of size N. Here we let households be the same size for simplicity; however, these methods could easily be applied to a population of households of various sizes, and the next section addresses a model with inhomogeneous household sizes.

We suppose all individuals in the population are initially susceptible, that is, the state of household j is (s_t, i_t, r_t) = (N, 0, 0) for j = 1, ..., M. At some Uniform(0, 1) distributed time, t_1, an infectious individual is seeded in the population. Without loss of generality the infection can be assumed to be seeded in the first household, so that the state of the system becomes (s_t, i_t, r_t) = (N − 1, 1, 0) for household 1 and (s_t, i_t, r_t) = (N, 0, 0) for households j = 2, ..., M. Once infection is seeded, the SIR household dynamics govern the spread of the epidemic.

We consider the SIR process as a partially-observed Markov chain where we only observe the number of new infections in each household on each day up to some time, T. That is, we observe y = {y_1, ..., y_T}, where y_t is a vector of length M which counts the infections that have occurred over time (t − 1, t] in each household. Figure 3.1 gives a visual representation of the data; red dots correspond to observed cases in each household on each day.

3.1.2 Inference

We augment our data with the transition times and states to give x_{(t_1,T]}. Let m denote the unknown number of transitions over time (t_1, T]; since every observed infection is a transition and there is at most one recovery per infection, m lies between the total number of observed infections, $\sum_{t=1}^{T}\sum_{h=1}^{M} (y_t)_h$, and twice that number. Additionally we augment the data with a classification of the missing events; that is, we augment the data by transition labels φ ∈ {within, between, recovery}^m. This allows us to construct sets of transition indices, A, B and C, which correspond to within-household infection, between-household infection and recovery events respectively (excluding the first infection event). Let t_j denote the time of the jth transition and h(j) denote the household whose state changes at time t_j.

Figure 3.1: A single realisation showing the times of symptom onset, binned into days, in the first 50 infected households at the beginning of an epidemic outbreak. The size of points corresponds to the number of infections on that day. The lines provide a visual reference to link infections within the same household.

In writing down the expression for the augmented likelihood function we adopt the convention that s_t^h and i_t^h are the numbers of susceptible and infectious individuals in household h at time t, and I_t is the number of infectious individuals in the population at time t. The augmented likelihood is the joint density of y and x_{(t_1,T]} conditional on the between-household infection rate, α, within-household infection rate, β, recovery rate, γ, and seeding time, t_1. This is expressed as
$$f\left(y, x_{(t_1,T]}, \phi \mid \alpha, \beta, \gamma, t_1\right) = \mathbb{1}_{\{y, x_{(t_1,T]}, \phi\}} \prod_{j\in A} \frac{\beta s_{t_{j-1}}^{h(j)} i_{t_{j-1}}^{h(j)}}{N-1} \prod_{k\in B} \frac{\alpha s_{t_{k-1}}^{h(k)} \left( I_{t_{k-1}} - i_{t_{k-1}}^{h(k)} \right)}{N(M-1)} \prod_{l\in C} \gamma\, i_{t_{l-1}}^{h(l)} \times \exp\left\{ -\int_{t_1}^T \sum_{h=1}^M \left( \gamma i_t^h + \frac{\beta s_t^h i_t^h}{N-1} + \frac{\alpha s_t^h (I_t - i_t^h)}{N(M-1)} \right) \mathrm{d}t \right\}, \qquad (3.1)$$
where $\mathbb{1}_{\{y, x_{(t_1,T]}, \phi\}}$ denotes the indicator function which takes value 1 if x_{(t_1,T]}, with transitions labelled according to φ, could give rise to data set y. Note that the household model makes no distinction between the state change associated with a within- or between-household infection; however, augmenting the data with labels φ allows us to distinguish the sets A, B and C. This gives the augmented likelihood a product form which permits the use of conjugate priors. The use of conjugate priors allows the algorithm to run efficiently via Gibbs sampling in the α, β and γ dimensions.

We suppose that α, β/γ and 1/γ have independent Uniform(0.05, 1), Uniform(0.25, 4) and Uniform(0.25, 7) priors respectively. This implies that γ has prior p(γ) = 1/(6.75γ²) and β has conditional prior p(β|γ) = 1/(3.75γ). For simplicity let t_{m+1} := T. From Equation (3.1), the conditional distributions of α, β and γ are

$$\alpha \mid \beta, \gamma, x_{(t_1,T]}, \phi \sim \text{Gamma}\left( |B| + 1,\ \sum_{j=1}^m \sum_{h=1}^M \frac{s_{t_j}^h \left( I_{t_j} - i_{t_j}^h \right)(t_{j+1} - t_j)}{N(M-1)} \right)_{[0.05,\,1]},$$
$$\beta \mid \alpha, \gamma, x_{(t_1,T]}, \phi \sim \text{Gamma}\left( |A| + 1,\ \sum_{j=1}^m \sum_{h=1}^M \frac{s_{t_j}^h i_{t_j}^h (t_{j+1} - t_j)}{N-1} \right)_{[0.25\gamma,\,4\gamma]}$$
and
$$\gamma \mid \alpha, \beta, x_{(t_1,T]}, \phi \sim \text{Gamma}\left( |C| - 2,\ \sum_{j=1}^m I_{t_j}(t_{j+1} - t_j) \right)_{[1/7,\,4]},$$
where the subscripts denote the support of the truncated distributions. The conditional distribution of γ has shape parameter |C| − 2 due to the γ terms in the denominators of the prior distributions; this means that if |C| ≤ 2, Hastings steps may be needed to sample γ.

As the first event, an infection at time t1, is generated at some Uniform(0, 1) dis- tributed time, the prior distribution of t1 is Uniform(0, 1) and hence the conditional distribution is given by

$$f\left(t_1 \mid \alpha, \beta, \gamma, x_{[t_2,T]}, \phi\right) = \frac{(\alpha + \beta + \gamma)\, e^{(\alpha+\beta+\gamma)t_1}}{e^{(\alpha+\beta+\gamma)\min\{1,t_2\}} - 1}, \quad \text{for } t_1 \in (0, \min\{1, t_2\}),$$
which can be sampled efficiently by inverse transform sampling [62].
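Since the corresponding cdf is F(t) = (e^{rt} − 1)/(e^{rm} − 1), with r = α + β + γ and m = min{1, t_2}, inverting F gives a direct sampler; a minimal sketch:

import numpy as np

def sample_t1(alpha, beta, gamma_rate, t2, rng=None):
    """Inverse-transform sample of the seeding time t1 on (0, min(1, t2)),
    with density proportional to exp((alpha + beta + gamma) * t1)."""
    rng = np.random.default_rng() if rng is None else rng
    r = alpha + beta + gamma_rate
    m = min(1.0, t2)
    u = rng.uniform()
    # F(t) = (exp(r*t) - 1) / (exp(r*m) - 1); solve F(t) = u for t.
    return np.log1p(u * np.expm1(r * m)) / r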

Lastly, we are left to find a way of sampling from f(x_{(t_1,T]}, φ | β, α, γ, t_1, y). We do this using a Hastings algorithm with five possible moves, described below. We abbreviate f(y, x_{(t_1,T]}, φ | β, α, γ, t_1) by f(x_{(t_1,T]}, φ) for simplicity, and denote our candidate values for x and φ by x* and φ* respectively. To sample from f(x_{(t_1,T]}, φ | β, α, γ, t_1, y) we randomly choose from the following five kinds of moves according to an arbitrary probability mass function with non-zero components, {q_1, ..., q_5}:

(i) Uniformly randomly select an infection time, t_j, and choose a candidate Uniform(⌊t_j⌋, ⌈t_j⌉) distributed infection time. The candidate is accepted with probability
$$\min\left\{ \frac{f\left(x^*_{(t_1,T]}, \phi^*\right)}{f\left(x_{(t_1,T]}, \phi\right)},\ 1 \right\}.$$
(ii) Uniformly randomly select an infection event and change its type, φ_j, from a between- to a within-household infection or vice versa. The candidate is accepted with probability
$$\min\left\{ \frac{f\left(x_{(t_1,T]}, \phi^*\right)}{f\left(x_{(t_1,T]}, \phi\right)},\ 1 \right\}.$$
(iii) Uniformly randomly select a recovery time, t_j, and choose a candidate Uniform(t_k, T) distributed recovery time, where t_k is the time of the first infection within the household. The candidate is accepted with probability
$$\min\left\{ \frac{f\left(x^*_{(t_1,T]}, \phi^*\right)}{f\left(x_{(t_1,T]}, \phi\right)},\ 1 \right\}.$$
(iv) Insert a Uniform(t_k, T) distributed recovery time in a randomly chosen infected household, where t_k is again the time of the first infection within that household. Let M̂ be the number of households infected by time T. The candidate is accepted with probability
$$\min\left\{ \frac{f\left(x^*_{(t_1,T]}, \phi^*\right) \hat{M} (T - t_k)\, q_5}{f\left(x_{(t_1,T]}, \phi\right) (|C| + 1)\, q_4},\ 1 \right\}.$$
(v) Uniformly randomly select and remove a recovery event; the candidate is accepted with probability
$$\min\left\{ \frac{f\left(x^*_{(t_1,T]}, \phi^*\right) |C|\, q_4}{f\left(x_{(t_1,T]}, \phi\right) \hat{M} (T - t_k)\, q_5},\ 1 \right\}.$$

For the study presented here we implemented DA-MCMC by proposing moves (i)-(v) with probabilities q_1 = q_2 = 0.05 and q_3 = q_4 = q_5 = 0.3. Each iteration of the DA-MCMC algorithm comprises Gibbs samples of α, β, γ and t_1, followed by a Hastings step for x_{(t_1,T]} and φ as per (i)-(v). The stationary distribution of the samples is the joint posterior distribution of x_{(t_1,T]}, φ, α, β and γ, where consecutive samples are highly correlated. The marginal over the parameters is obtained by simply ignoring the samples of x_{(t_1,T]} and φ. That is, one can run the DA-MCMC algorithm for many iterations and consider only the samples of α, β and γ to estimate the posterior distribution, f(α, β, γ | y).

We also implement an approximate algorithm called the branching process algorithm (BPA), so that we can compare its accuracy and run time with those of the DA-MCMC method. The BPA is a Metropolis-Hastings algorithm in which the likelihood function is approximated under branching process assumptions: that is, the assumption that infected households act independently after their initial infection, and hence infection at the household level occurs as a branching process. This assumption is reasonable if the number of households in the population is large and the proportion of susceptible individuals in the population is close to one, because the rate at which new households are infected becomes αS_tI_t/(NM − 1) ≈ αI_t and the rate of between-household infection into an already-infected household is αs_t^h(I_t − i_t^h)/(NM − 1) ≈ 0, as I_t, s_t^h ≪ NM. The likelihood function for the BPA is made up of components related to the number of newly infected households over each day and components related to within-household infections over each day. The distribution of the number of newly infected households over day t is approximated by Poisson(E[∫_{t−1}^t αI_s ds]), where this expectation is calculated via a modified forwards-backwards algorithm and a convolution method. The within-household infection component is made up of simple matrix exponential calculations, which are computed in the process of computing E[∫_{t−1}^t αI_s ds]. For more details on the BPA see [6].

3.1.3 Results

We compare the DA-MCMC algorithm's performance with the branching process algorithm (BPA) on 50 simulated data sets with true parameter values (α, β, γ) = (0.32, 0.4, 1/3) and 50,000 households of size N = 3 (the average household size in Australia is estimated to be 2.6 [63]). We only include data sets in which at least 400 households became infected. The parameters are chosen such that the average infectious period, 1/γ, is three days (a typical infectious period for influenza); the average number of secondary infections from a primary infectious individual in a large household of susceptibles, R_0 = β/γ, is 1.2; and the average number of secondary households infected by a primary infectious household in an otherwise susceptible population, R_∗, is approximately 1.8. The simulated data sets are from the full stochastic household model, whereas the BPA performs inference under a branching process assumption, that is, it assumes that households are conditionally independent after their initial infection.

Each algorithm is run at various stages of the epidemic in order to show how the posterior distributions converge as more households become infected; the inference for each simulation is run after 50, 100, 200, 300 and 400 households become infected. For the BPA, for each simulation, at each stage of the epidemic, 10^5 MCMC samples are obtained with a burn-in of 1000 iterations. For the DA-MCMC algorithm, for each simulation, at each stage of the epidemic, 2.5 × 10^6 iterations are run with an additional burn-in of 10^6 iterations, and the results are thinned to a sample of size 2.5 × 10^5. These numbers of iterations were chosen so that each sample had approximately the same multivariate effective sample size (mESS), where the mESS is the number of approximately independent samples in a set of correlated samples from a multivariate distribution [64]. More iterations are needed for the DA-MCMC as its mixing is slower; the samples were thinned for data storage reasons.

Our results are presented in terms of: maximum a posteriori (MAP) estimates of model parameters in Figure 3.2; MAP estimates of the household reproduction number, R_∗, and early growth rate, r, in Figure 3.3; and posterior distributions of R_∗ and r for a single simulated data set over time in Figure 3.4. Means and standard deviations of the MAP estimates are given explicitly in Table 3.1.

                     α              β              γ              R∗             r
True values          0.32           0.4            0.333          1.803          0.190
BPA                  0.345 (0.028)  0.387 (0.027)  0.310 (0.034)  2.091 (0.210)  0.237 (0.028)
DA-MCMC              0.312 (0.022)  0.392 (0.027)  0.314 (0.032)  1.839 (0.168)  0.191 (0.024)
(BPA) − (DA-MCMC)    0.033 (0.012)  −0.005 (0.012) −0.004 (0.020) 0.2519 (0.066) 0.046 (0.008)

Table 3.1: Means and standard deviations of maximum a posteriori (MAP) estimates of (α, β, γ, R∗, r). The means and standard deviations of the 50 MAP estimates based upon data with 400 infected households are shown in the form mean (standard deviation) for the BPA and DA-MCMC methods. The last row shows the differences in the means and standard deviations between the two methods.

From Figure 3.2, for both methods, we observe that the MAP estimates begin negatively biased for all parameters and converge towards fixed points as more data are obtained.


Figure 3.2: Boxplots of maximum a posteriori (MAP) estimates of (α, β, γ) from 50 simulations. Red and blue boxes correspond to results from 2.5 × 10^6 iterations, thinned to 2.5 × 10^5 samples, of the DA-MCMC algorithm and 10^5 iterations of the BPA, respectively. MAPs are calculated from 3-dimensional kernel density estimates. The pairs of boxes from left to right are MAPs from inference based upon data with 50, 100, 200, 300 and 400 infected households. Black dashed lines indicate the true parameter values at (α, β, γ) = (0.32, 0.4, 1/3).

The medians of the MAPs from the BPA method for β and γ are lower than those from the DA-MCMC method, whereas the medians of the MAPs of α are higher. The boxes associated with β and γ for the two methods overlap, whereas the boxes associated with α are biased higher for the BPA method when data are based upon 300 and 400 infected households. In Figure 3.3, we observe that the boxes of the MAP estimates converge to the true values of R∗ and r for the DA-MCMC method, whereas they are biased above the true values for the BPA method. The positive bias in these quantities is due to the overestimation of α by the BPA method. The box plots indicate a general trend that the variability of the MAP estimates decreases as more data are obtained. It should be noted that these box plots do not show the correlation structure of the parameters.


Figure 3.3: Boxplots of maximum a posteriori (MAP) estimates of (R∗, r) from 50 simulations. Red and blue boxes correspond to results from 2.5 × 10^6 iterations (thinned to 2.5 × 10^5) of the DA-MCMC algorithm and 10^5 iterations of the BPA, respectively. MAPs are calculated from 2-dimensional kernel density estimates. The pairs of boxes from left to right are MAPs from inference based upon data with 50, 100, 200, 300 and 400 infected households. Black dotted lines indicate the true parameter values at (R∗, r) ≈ (1.803, 0.190).

In Figure 3.4 the posterior densities of R∗ and r appear similar between the two methods, although the bias of the BPA is clear.

For both methods the variability in the posterior distribution is observed to decrease in a similar way as more households are infected. Table 3.1 shows that when inference is run after 400 households are infected, the means of the MAPs of both methods lie within a standard deviation of the true values of (α, β, γ). We also find that the average MAP estimate of β/γ is 1.2484 for both methods; this excellent agreement at the household level indicates that the branching process is an appropriate approximation for the full household epidemic process. Of the two methods, only the means of the MAPs from the DA-MCMC method lie within a standard deviation

Figure 3.4: Kernel density estimates of R∗ and r from the BPA (top row) and from DA-MCMC (bottom row). The BPA results are from 10^5 MCMC samples with 500 iterations of burn-in. The DA-MCMC results are based upon 5 × 10^6 samples with 10^6 iterations of burn-in. True parameter values, (R∗, r) ≈ (1.803, 0.190), are shown at the intersection of the black dashed lines. Inference was run after M = 50, 100, 200, 300 and 400 households were observed to be infectious (columns, left to right).

of the true values of R∗ and r. As both methods were run on the same simulations, we can compare the differences between the MAPs from the two methods; these are given in the final row of Table 3.1. The differences of the MAPs for β and γ lie within a standard deviation of 0, whereas the differences for α, R∗ and r are in excess of 2.5 standard deviations from 0. This indicates that the BPA method leads to a significantly different answer, in terms of α, R∗ and r, compared to the exact method. On average we saw a 7.8%, 16.0% and 24.7% positive error in α, R∗ and r, respectively.
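These percentage errors follow directly from the means in Table 3.1; as a quick check:

    # Reproducing the 7.8%, 16.0% and 24.7% positive errors quoted above from
    # the BPA row of Table 3.1 and the true values.
    true = {"alpha": 0.32, "Rstar": 1.803, "r": 0.190}
    bpa = {"alpha": 0.345, "Rstar": 2.091, "r": 0.237}
    for k in true:
        print(k, round(100 * (bpa[k] - true[k]) / true[k], 1))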

The efficiency of the two algorithms cannot be compared directly in terms of iterations per unit time, as samples from the DA-MCMC are more highly correlated than samples from the BPA [65]. Hence, the algorithms are compared in terms of their mESS per hour. Figure 3.5 shows that the DA-MCMC is initially much more efficient than the BPA algorithm; however, it scales poorly as more data are obtained and is less efficient than the BPA after 200 households are infected. The efficiency of the BPA algorithm appears to be highly left skewed, as there were some outlying simulations that were much less efficient than the others. Even these outlying simulations were more efficient under the BPA method when inference is based on 400 infected households. Note that the mESS of the BPA and DA-MCMC had averages of 3366 and 4138, respectively, when inference is based on 400 infected households, so even though the results are based upon different sample sizes, the mESS are comparable and sufficiently large. On average the DA-MCMC algorithm with 50, 100, 200, 300 and 400 infected households will take 0.06, 0.19, 1.45, 5.14 and 13.72 hours respectively to obtain an effective sample size of 3000, whereas the BPA algorithm can do the same in 0.49, 0.53, 0.70, 0.94 and 1.24 hours respectively. The BPA method is twice as efficient as the DA-MCMC algorithm by the time 200 households are infected and it is more than 11 times as efficient when 400 households are infected.


Figure 3.5: Boxplots of the efficiency of each method against the number of infected households. Here efficiency is presented in terms of log mESS per hour. These estimates are based upon running each algorithm for 50 simulations with 50, 100, 200, 300 and 400 infected households.

In summary, we find that DA-MCMC, as an exact algorithm, leads to estimates with lower bias than the approximate BPA. Further, these estimates have lower variance than those from the BPA algorithm. For small data sets the DA-MCMC algorithm is also more efficient, but it scales poorly as the data size increases. This indicates that for larger data sets, given a limited computational budget, the asymptotically-exact DA-MCMC method may need to be swapped for the approximate BPA.

3.2 Inference for an SE(n1)I(n2)R Household Model

This section shows an efficient way to implement DA-MCMC to infer parameters of an SE(n1)I(n2)R model with a distribution of households from daily resolution case data. The inference method is based on ideas from the previous section, but also uses a labelled representation of the model to make efficient proposals and allows events to be proposed after the time of the last observation (as discussed in Example 2, Section 2.3.3).

3.2.1 Model

We consider an SE(n1)I(n2)R household model, as described in Section 2.2.3, with Erlang(n1, λ1) latent periods and Erlang(n2, λ2) infectious periods, such that σ = λ1/n1 and γ = λ2/n2. For this study we consider a population of M households of various sizes N_1, ..., N_M, with a total population size \(P := \sum_{h=1}^{M} N_h\). In our example we apply DA-MCMC to a simulated outbreak in a population of households of up to six individuals, with a household size distribution similar to that of Adelaide, informed by the Australian Bureau of Statistics 2016 report [63]. As in Example 2 from Section 2.3.3, we suppose each individual is labelled, such that we can propose changes to the hidden process based upon latent and infectious periods, as opposed to exposure and infection times.

We suppose all individuals in the population are initially susceptible, that is, the state of household j is (s_t, e_t, i_t, r_t) = (N_j, 0, 0, 0) for j = 1, ..., M. At some Uniform(0, 1) distributed time, t_1, an infectious individual is seeded uniformly at random in the population. That is, household k moves to state (s_t, e_t, i_t, r_t) = (N_k − 1, 0, 1, 0) with probability N_k/P and all other households remain in state (s_t, e_t, i_t, r_t) = (N_j, 0, 0, 0). Unlike in the previous section, the infection cannot simply be seeded in the first household: households have different sizes, so infection is more likely to be seeded in larger households. Once infection is seeded, the SE(n1)I(n2)R household dynamics progress the spread of the epidemic.

We consider the SE(n1)I(n2)R process as a partially-observed Markov chain where we observe only the cumulative number of infections in each household on each day

up to some time, T . That is, as in the previous section, we observe y = {y1, . . . , yT }, where yt is a vector of length M which counts the total number of infections that have occurred over time (t − 1, t] in each household.
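This data-generating process is straightforward to simulate with the Doob-Gillespie algorithm; the following is an illustrative sketch (not the thesis code), which assumes the seeded case is recorded on the day it becomes infectious:

    import numpy as np

    def simulate_daily_counts(sizes, alpha, beta, lam1, lam2, n1, n2, T, rng):
        # Doob-Gillespie simulation of the SE(n1)I(n2)R household model.
        # Returns y, an (M, T) array of daily new-infection counts per household.
        sizes = np.asarray(sizes, dtype=float)
        M, P = len(sizes), sizes.sum()
        S = sizes.copy()                          # susceptibles per household
        E = np.zeros((M, n1))                     # exposed-stage counts
        I = np.zeros((M, n2))                     # infectious-stage counts
        y = np.zeros((M, T), dtype=int)
        t = rng.uniform()                         # Uniform(0,1) seeding time
        k = rng.choice(M, p=sizes / P)            # household k w.p. N_k / P
        S[k] -= 1.0
        I[k, 0] += 1.0
        y[k, int(t)] += 1                         # seed observed on day 1 (assumed)
        while True:
            i_h = I.sum(axis=1)                   # infectious per household
            within = beta * S * i_h / np.maximum(sizes - 1.0, 1.0)
            between = alpha * S * (i_h.sum() - i_h) / (P - sizes)
            rates = np.concatenate(
                [within, between, lam1 * E.ravel(), lam2 * I.ravel()])
            total = rates.sum()
            if total == 0.0:                      # all outbreaks have completed
                break
            t += rng.exponential(1.0 / total)
            if t >= T:                            # past the observation window
                break
            ev = rng.choice(rates.size, p=rates / total)
            if ev < 2 * M:                        # exposure (within or between)
                h = ev % M
                S[h] -= 1.0
                E[h, 0] += 1.0
            elif ev < 2 * M + M * n1:             # latent-stage progression
                h, j = divmod(ev - 2 * M, n1)
                E[h, j] -= 1.0
                if j + 1 < n1:
                    E[h, j + 1] += 1.0
                else:                             # becomes infectious: observed
                    I[h, 0] += 1.0
                    y[h, int(t)] += 1
            else:                                 # infectious-stage progression
                h, j = divmod(ev - 2 * M - M * n1, n2)
                I[h, j] -= 1.0
                if j + 1 < n2:
                    I[h, j + 1] += 1.0            # else the individual recovers
        return y

For instance, with n1 = n2 = 2, lam1 = 2σ and lam2 = 2γ, this mimics the SE(2)I(2)R setting used in the results below; the household sizes could be drawn from the Adelaide distribution of Figure 3.6.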

3.2.2 Inference

As in the previous section, we augment our data with the transition times and states to give x_(t1,T]. Let \(m \in \{\sum_{t=1}^{T} y_t, \ldots, 2\sum_{t=1}^{T} y_t\}\) be the unknown number of transitions over time (t1, T]. Additionally we augment the data with a classification of missing events, that is, we augment the data by transition labels φ ∈ {within, between, infection, recovery}^m. This is such that we can construct sets of transition indices, A, B, C and D, which correspond to within-household exposure, between-household exposure, infection and recovery events respectively (excluding the first infection event). Let t_j denote the time of the jth transition and h(j) denote the household whose state changes at time t_j. We denote the number of infectious individuals in household h and the number of infectious individuals in the population at time t by i_t^h and I_t respectively. The augmented likelihood is the joint density of y, φ and x_[t1,T] conditional on α, β, γ, σ and t1. This is expressed as
\[
\begin{aligned}
f(y, x_{[t_1,T]}, \phi \mid \alpha, \beta, \sigma, \gamma, t_1) ={}& \mathbb{1}_{\{y,\, x_{[t_1,T]},\, \phi\}} \prod_{j \in A} \frac{\beta\, i_{t_{j-1}}^{h(j)}}{N_{h(j)} - 1} \prod_{k \in B} \frac{\alpha \left( I_{t_{k-1}} - i_{t_{k-1}}^{h(k)} \right)}{P - N_{h(k)}} \\
&\times \exp\left\{ -\int_{t_1}^{T} \sum_{h=1}^{M} \left( \frac{\beta s_t^h i_t^h}{N_h - 1} + \frac{\alpha s_t^h (I_t - i_t^h)}{P - N_h} \right) dt \right\} \\
&\times \left( \frac{\lambda_1^{n_1}}{(n_1 - 1)!} \right)^{|A|+|B|+1} \left( \prod_{l=1}^{|A|+|B|+1} \delta_l^{n_1 - 1} \right) \exp\left\{ -\lambda_1 \sum_{l=1}^{|A|+|B|+1} \delta_l \right\} \\
&\times \left( \frac{\lambda_2^{n_2}}{(n_2 - 1)!} \right)^{|D|} \left( \prod_{r=1}^{|D|} \Delta_r^{n_2 - 1} \right) \exp\left\{ -\lambda_2 \sum_{r=1}^{|D|} \Delta_r \right\},
\end{aligned}
\tag{3.2}
\]
where I_t is the number of infectious individuals in the population, and s_t^h and i_t^h are the number of susceptible and infectious individuals in household h, at time t. We suppose that β and α have Uniform(0, 10) priors and γ and σ have InverseUniform(0.25, 7) priors. Hence, from Equation (3.2), their conditional distributions are
\[
\alpha \mid \beta, \sigma, \gamma, x_{[t_1,T]}, \phi, y \sim \text{Gamma}\!\left( |B| + 1,\ \sum_{j=1}^{m} \sum_{h=1}^{M} \frac{s_{t_j}^h (I_{t_j} - i_{t_j}^h)(t_{j+1} - t_j)}{P - N_h} \right)_{[0,10]},
\]
\[
\beta \mid \alpha, \sigma, \gamma, x_{[t_1,T]}, \phi, y \sim \text{Gamma}\!\left( |A| + 1,\ \sum_{j=1}^{m} \sum_{h=1}^{M} \frac{s_{t_j}^h i_{t_j}^h (t_{j+1} - t_j)}{N_h - 1} \right)_{[0,10]},
\]
\[
\lambda_1 \mid \alpha, \gamma, \beta, x_{[t_1,T]}, \phi, y \sim \text{Gamma}\!\left( n_1(|A| + |B| + 1) + 1 - 2,\ \sum_{l=1}^{|A|+|B|+1} \delta_l \right)_{[n_1/7,\, 4n_1]},
\]
\[
\lambda_2 \mid \alpha, \sigma, \beta, x_{[t_1,T]}, \phi, y \sim \text{Gamma}\!\left( n_2|D| + 1 - 2,\ \sum_{r=1}^{|D|} \Delta_r \right)_{[n_2/7,\, 4n_2]},
\]
and as infection is seeded in the population at a Uniform(0, 1) distributed time, the conditional distribution of the seeding time is
\[
f(t_1 \mid \gamma, \beta, \alpha, x_{[t_2,T]}, \phi) = \frac{(\alpha + \beta + \lambda_2)\, e^{(\alpha + \beta + \lambda_2) t_1}}{e^{(\alpha + \beta + \lambda_2)\min\{1, t_2\}} - 1}, \quad \text{for } t_1 \in (0, \min\{1, t_2\}),
\]
which can be sampled efficiently by inverse transform sampling.
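Both the truncated-Gamma Gibbs draws and the seeding-time density above can be sampled by CDF inversion; a minimal sketch (the function names are ours, not from the thesis code):

    import numpy as np
    from scipy import stats

    def sample_t1(alpha, beta, lam2, t2, rng):
        # Inverse-transform sample of the seeding time t1, whose density is
        # proportional to exp(c * t1) on (0, min(1, t2)), c = alpha + beta + lam2.
        c = alpha + beta + lam2
        m = min(1.0, t2)
        u = rng.uniform()
        # CDF: F(t) = (exp(c t) - 1) / (exp(c m) - 1); solve F(t) = u for t.
        return np.log1p(u * np.expm1(c * m)) / c

    def sample_truncated_gamma(shape, rate, lo, hi, rng):
        # Gibbs draw from Gamma(shape, rate) restricted to [lo, hi], as needed
        # for the truncated conditionals of alpha, beta, lam1 and lam2.
        g = stats.gamma(a=shape, scale=1.0 / rate)
        u = rng.uniform(g.cdf(lo), g.cdf(hi))
        return g.ppf(u)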

We use a Hastings step to sample from f(y, x_[t1,T], φ | β, σ, α, γ, t1). For this model we choose one of the following seven moves according to the distribution {p_1, ..., p_7} (a sketch of the accept/reject scaffold common to these moves is given after the list):

(i) Uniformly randomly select an individual that became infectious at time t_j and choose a candidate Uniform(⌊t_j⌋, ⌈t_j⌉) distributed infectiousness time, t_j^*. Randomly choose a Gamma(n_1, λ_1)_{[0, t_j^* − t_1]} distributed candidate exposed period δ_l^*; denote the pdf of the candidate distribution by g_1(δ_l^*, t_j^*). Randomly choose a Gamma(n_2, λ_2) infectious period Δ_r^*; denote the probability density function of the candidate distribution by g_2(Δ_r^*). The new point is accepted with probability
\[
\min\left\{ \frac{f(x^*_{[t_1,T]}, \phi^*)\, g_1(\delta_l, t_j)\, g_2(\Delta_r)}{f(x_{[t_1,T]}, \phi)\, g_1(\delta_l^*, t_j^*)\, g_2(\Delta_r^*)},\ 1 \right\}.
\]

(ii) Uniformly randomly select an exposure event and change its type, φ_j, from between-household to within-household exposure or vice versa. The new point is accepted with probability
\[
\min\left\{ \frac{f(x^*_{[t_1,T]}, \phi^*)}{f(x_{[t_1,T]}, \phi)},\ 1 \right\}.
\]
(iii) Uniformly randomly select an individual that is exposed at time T. Randomly choose a candidate exposure time with distribution (T − Exponential(λ_1/n_1))_{[0, T−t_1]}, and randomly choose a candidate infection time with distribution T + Exponential(λ_1/n_1), to give candidate latent period δ_j^*. The new point is accepted with probability
\[
\min\left\{ \frac{f(x^*_{[t_1,T]}, \phi^*)}{f(x_{[t_1,T]}, \phi)}\, e^{-\frac{\lambda_1}{n_1}(\delta_j^* - \delta_j)},\ 1 \right\}.
\]

(iv) Uniformly randomly select an individual who is susceptible at time T in an infectious household and choose a candidate (T − Exponential(λ_1/n_1))_{[0, T−t_1]} distributed exposure time and corresponding T + Exponential(λ_1/n_1) distributed infection time, to give candidate exposed period δ_j^*. Let the exposure be of either type with probability 1/2. Denote the number of susceptible individuals in infectious households by Ŝ and note that the number of exposed individuals in the population at time T is |A| + |B| − |D| + 1. The new point is accepted with probability
\[
\min\left\{ \frac{f(x^*_{[t_1,T]}, \phi^*)}{f(x_{[t_1,T]}, \phi)}
\times \frac{2\hat{S}\left(1 - e^{-\frac{\lambda_1}{n_1}(T - t_1)}\right)}{\left(\frac{\lambda_1}{n_1}\right)^2 e^{-\frac{\lambda_1}{n_1}\delta_j^*}}
\times \frac{p_6}{p_4\,(|A| + |B| - |D| + 2)},\ 1 \right\};
\]
this probability is expressed in the form: likelihood ratio × the reciprocal of the proposal density × the proposal density associated with moving from the candidate state back to the current state.
(v) Uniformly randomly select an individual who is susceptible at time T in a non-infectious household and choose a candidate (T − Exponential(λ_1/n_1))_{[0, T−t_1]} distributed exposure time and corresponding T + Exponential(λ_1/n_1) distributed infection time, to give candidate exposed period δ_j^*. Note that the number of susceptible individuals in the population at time T is P − |A| − |B| − 1. The exposure type must be a between-household exposure. The new point is accepted with probability
\[
\min\left\{ \frac{f(x^*_{[t_1,T]}, \phi^*)}{f(x_{[t_1,T]}, \phi)}
\times \frac{(P - |A| - |B| - 1 - \hat{S})\left(1 - e^{-\frac{\lambda_1}{n_1}(T - t_1)}\right)}{\left(\frac{\lambda_1}{n_1}\right)^2 e^{-\frac{\lambda_1}{n_1}\delta_j^*}}
\times \frac{p_6}{p_5\,(|A| + |B| - |D| + 2)},\ 1 \right\}.
\]

(vi) Uniformly randomly select an individual who is exposed at time T and remove the corresponding exposure and infection time. If the individual is in an infectious household, accept the proposal with probability
\[
\min\left\{ \frac{f(x^*_{[t_1,T]}, \phi^*)}{f(x_{[t_1,T]}, \phi)}
\times \frac{\left(\frac{\lambda_1}{n_1}\right)^2 e^{-\frac{\lambda_1}{n_1}\delta_j}\,(|A| + |B| - |D| + 1)}{2(\hat{S} + 1)\left(1 - e^{-\frac{\lambda_1}{n_1}(T - t_1)}\right)}
\times \frac{p_4}{p_6},\ 1 \right\},
\]
else accept with probability
\[
\min\left\{ \frac{f(x^*_{[t_1,T]}, \phi^*)}{f(x_{[t_1,T]}, \phi)}
\times \frac{\left(\frac{\lambda_1}{n_1}\right)^2 e^{-\frac{\lambda_1}{n_1}\delta_j}\,(|A| + |B| - |D| + 1)}{(P - |A| - |B| - \hat{S})\left(1 - e^{-\frac{\lambda_1}{n_1}(T - t_1)}\right)}
\times \frac{p_5}{p_6},\ 1 \right\}.
\]

(vii) For the first infected individual, choose a Gamma(n_2, λ_2) infectious period Δ_r^* and accept the move with probability
\[
\min\left\{ \frac{f(x^*_{[t_1,T]}, \phi^*)\, g_2(\Delta_r)}{f(x_{[t_1,T]}, \phi)\, g_2(\Delta_r^*)},\ 1 \right\}.
\]
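As referenced above, the seven moves share a common accept/reject scaffold; a minimal sketch (the move bodies implementing (i)-(vii) and the augmented log-density are left abstract, and all names are illustrative):

    import numpy as np

    def hastings_sweep(state, moves, probs, log_f, rng):
        # moves: list of seven proposal functions; each returns
        # (candidate, log_q_ratio), where log_q_ratio is
        # log q(current | candidate) - log q(candidate | current),
        # or (None, 0.0) for an infeasible proposal.
        # log_f: the augmented log-density log f(x, phi | parameters).
        i = rng.choice(len(moves), p=probs)
        candidate, log_q_ratio = moves[i](state, rng)
        if candidate is None:
            return state
        log_alpha = log_f(candidate) - log_f(state) + log_q_ratio
        if np.log(rng.uniform()) < min(log_alpha, 0.0):
            return candidate
        return state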

3.2.3 Results

This method was applied to a population of households with a household distribution informed by the household distribution of Adelaide, Australia, as per the Australian Bureau of Statistics 2016 report [63]. This report estimates the number of households in Adelaide, Australia, with one to five individuals and the number of households with more than five individuals; this is shown in Figure 3.6. Hence, we apply the method to a population with the same household distribution; however, as we have no further information on households larger than five individuals, we suppose these all contain six individuals. The shape parameters of the exposed and infectious periods were fixed


Figure 3.6: The household size distribution of Adelaide, Australia, according to the Australian Bureau of Statistics 2016 report.

at n1 = n2 = 2; this allows these periods to have a mode that is not at 0, without being too restrictive on their variance. Samples are obtained by running 20 independent chains of DA-MCMC for three days with 10^6 iterations of burn-in; these are thinned by a factor of 10 for data storage reasons. The priors considered here were less informative than in Section 3.1; however, the data seem to be informative enough to infer each parameter to reasonable accuracy (see Figure 3.7). Further, we are able to infer epidemiological parameters of interest to reasonable accuracy, in particular, the household reproduction number and early growth rate.

A major limitation of this approach is that all cases are assumed to be observed, whereas in reality there may be multiple unobserved households and possibly unobserved cases in observed households. Further, a long burn-in period meant that DA-MCMC had to run for days, with multiple independent chains, to obtain a reasonable sample from the posterior distribution. Run times on the scale of days can make the approach difficult in terms of being practically useful for FF100 studies. Further, the mixing scales poorly with the length of the observation period and the number of observations, so while it may be possible to apply this with 200 infected households, it may be prohibitively slow for 300 infected households.


Figure 3.7: Marginal kernel density estimates from DA-MCMC inference based on a simulated data set of 200 infected households from an SE(2)I(2)R model with the household distribution of Adelaide, Australia. The data set contained 322 infectious cases over 30 days. Panels on the left give marginal kernel density estimates of model parameters, whereas panels on the right give marginal kernel density estimates of epidemiological parameters of interest. The DA-MCMC is based on 20 independent chains with 10^6 iterations of burn-in run over three days. The samples were thinned by a factor of 10 to give a total of 1.73 × 10^6 samples. True parameter values are (1/γ, 1/σ, β, α) = (1.5, 2, 0.933, 0.6).

3.3 Sequential DA-MCMC

Sequential Monte Carlo methods are able to update a posterior distribution iteratively as more data are obtained, rather than beginning the inference anew when new observations are made. Methods such as data-augmented MCMC do not naturally allow for sequential updates, as they rely on imputing a whole sequence of transition times and events. This section presents an approximate Bayesian inference algorithm, dubbed sequential data-augmented Markov chain Monte Carlo (SDA-MCMC), that allows for iterative updates for Markov chains with partial observations. This is an approximate inference method as some distributions are approximated by kernel density estimates. The sequential updates of SDA-MCMC allow for faster mixing of the chain than DA-MCMC in some cases, as fewer hidden variables are integrated over in each step. This improved mixing allows SDA-MCMC to perform inference in cases where DA-MCMC cannot, such as inferring the shape parameter of an SI(n)R model from multiple outbreaks data. Due to limitations of the method, which are discussed in Section 3.3.6, this section gives only a brief outline of SDA-MCMC and applications where it has been applied effectively.

3.3.1 The aim of SDA-MCMC

Suppose we are to infer the parameters, θ, of some Markov chain which is partially observed over time. Assume we have a data set y_{1:t}, observed at times 1, ..., t, where each data point, y_s, depends on the state of the underlying Markov chain over the time interval (s − 1, s], such that Pr(y_{t+1:T} | x_{[0,t]}, y_{1:t}) = Pr(y_{t+1:T} | x_t). The algorithm described here does not require data to be observed at constant time steps, though it is presented as such here for simplicity.

We present an algorithm that allows inference to be performed sequentially as more data become available. More generally, the algorithm presented allows one to update the posterior given a new set of data points; that is, the algorithm need not run every time new data are made available, but rather can be run in batches. This allows for greater flexibility and possibly allows for more efficient inference.

If we have performed DA-MCMC for the parameter set, θ, based on data y_{1:t}, we have stored samples of θ and can construct a kernel density estimate of the posterior distribution. Further, we would have sampled full sequences of x_{[0,t]}. The scheme proposed here does not require storage of the entire set of sequences; rather, we store the state of the process at time t, x_t. If we obtain more data, such that we have a larger data set, y_{1:T}, we want to calculate the target distribution
\[
f(\theta, x_T \mid y_{1:T}) = \int f(\theta, x_{[t,T]} \mid y_{1:T}) \, dx_{[t,T)}.
\]

This target distribution can be approximated by using samples from f(θ, x_{[t,T]} | y_{1:T}) and integrating over the sample paths x_{[t,T)}. To sample from f(θ, x_{[t,T]} | y_{1:T}) we note that
\[
\begin{aligned}
f(\theta, x_{[t,T]} \mid y_{1:T}) &\propto f(y_{t+1:T}, x_{(t,T]} \mid x_t, \theta)\, f(\theta, x_t \mid y_{1:t}) \\
&\propto f(y_{t+1:T}, x_{(t,T]} \mid x_t, \theta)\, f(\theta \mid x_t, y_{1:t})\, f(x_t \mid y_{1:t}).
\end{aligned}
\]

Here we have used the conditional independence of (y_{t+1:T}, x_{(t,T]}) and y_{1:t} given x_t. Note we already have samples from f(θ, x_t | y_{1:t}) from the DA-MCMC run at time t. So we can construct an algorithm that takes samples from f(θ, x_t | y_{1:t}) in order to sample from f(y_{t+1:T}, x_{(t,T]} | x_t, θ).

3.3.2 Algorithm description

The SDA-MCMC algorithm is comprised of the following four steps.

(i) Initialise by sampling (x_t, θ) from the initial DA-MCMC sample, then choose augmented data x_{(t,T]} such that it agrees with the data y_{t+1:T}. Denote the discrete and continuous elements of θ by θ^d and θ^c respectively. For each (x_t, θ^d), calculate kernel density estimates from the θ^c samples to approximate f(θ^c | x_t, θ^d, y_{1:t}). Calculate the probability mass function f(x_t, θ^d | y_{1:t}) as the proportion of samples from the initial DA-MCMC run in each state.

(ii) Sample θ^c by a Metropolis-Hastings step with target density
\[
f(\theta^c \mid x_{[t,T]}, \theta^d, y_{1:T}) \propto f(x_{(t,T]}, y_{t+1:T} \mid x_t, \theta)\, f(\theta^c \mid x_t, \theta^d, y_{1:t}).
\]
Often efficient proposals can be made by sampling θ^c with probability proportional to the augmented likelihood, f(x_{(t,T]}, y_{t+1:T} | x_t, θ); this is a reasonable proposal as it corresponds to a Gibbs step if x_t were known and y_{t+1:T} were our only observations. The proposed parameters, \(\bar{\theta}^c\), are accepted with probability
\[
\frac{f(\bar{\theta}^c \mid x_t, \theta^d, y_{1:t})}{f(\theta^c \mid x_t, \theta^d, y_{1:t})}. \tag{3.3}
\]

(iii) Propose a new θ^d and changes to x_{[t,T]}, and accept via a Metropolis-Hastings step with target density
\[
f(y_{t+1:T}, x_{(t,T]} \mid x_t, \theta)\, f(\theta^c \mid x_t, \theta^d, y_{1:t})\, f(x_t, \theta^d \mid y_{1:t}).
\]

(iv) Repeat steps (ii)-(iii) for many iterations.
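Steps (i) and (ii) lean on empirical approximations built from the initial DA-MCMC output; a minimal sketch of the step (i) bookkeeping (illustrative names, and assuming the discrete quantities are hashable):

    import numpy as np
    from scipy.stats import gaussian_kde

    def build_step_i_approximations(theta_c, theta_d, x_t):
        # theta_c: (n, p) array of continuous-parameter draws; theta_d and x_t:
        # length-n sequences of discrete-parameter and end-of-day-state draws.
        # Returns the empirical pmf of f(x_t, theta^d | y_{1:t}) and, per
        # (x_t, theta^d) value, a KDE of f(theta^c | x_t, theta^d, y_{1:t}).
        keys = list(zip(x_t, theta_d))
        n, p = theta_c.shape
        pmf, kdes = {}, {}
        for key in set(keys):
            idx = [i for i, k in enumerate(keys) if k == key]
            pmf[key] = len(idx) / n
            if len(idx) > p:          # need enough draws for a full-rank KDE
                kdes[key] = gaussian_kde(theta_c[idx].T)
        return pmf, kdes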

3.3.3 SIR model example

We first apply this method to the SIR model, as one of the simplest epidemic models to which these methods could be applied. We assume that the data are the total number of infectious cases at a daily resolution up to some time T. The method requires the posterior distribution of the two model parameters and the number of infectious individuals at the end of each day; note we do not need to infer any other states, as the number of susceptible individuals is known at the end of each day. That is, SDA-MCMC requires samples from f(β, γ, I_t | y_{1:t}), for some time t < T, in order to infer f(β, γ | y_{1:T}) sequentially. Suppose we have already performed DA-MCMC up to day t, and hence have samples from f(β, γ, I_t | y_{1:t}). To calculate the posterior distribution based upon data up to time t + 1, f(β, γ, I_{t+1} | y_{1:t+1}), we consider the augmented likelihood at time t + 1. As usual, assume that we augment the likelihood with hidden states x_{(t,t+1]}, where x_t = (S_t, I_t).

Denote the transition times by {t_1, t_2, ..., t_m}, let t_0 = t, and let A and B denote the sets of indices that correspond to infection and recovery events over (t, t+1], respectively. This gives the augmented likelihood

\[
\begin{aligned}
f(x_{(t,t+1]}, y_{t+1} \mid \beta, \gamma, x_t) &= \mathbb{1}_{\{x_{[t,t+1]},\, y_{t+1}\}}\, f\!\left(x_{(t,t+1]} \mid \beta, \gamma, x_t\right) \\
&\propto \mathbb{1}_{\{x_{[t,t+1]},\, y_{t+1}\}} \prod_{j \in A} \frac{\beta S_{t_{j-1}} I_{t_{j-1}}}{N - 1} \prod_{k \in B} \gamma I_{t_{k-1}} \exp\left\{ -\int_{t}^{t+1} \left( \frac{\beta S_\tau I_\tau}{N - 1} + \gamma I_\tau \right) d\tau \right\}.
\end{aligned}
\]

Note that, with respect to β and γ, the augmented likelihood is proportional to \(\text{Gamma}(|A| + 1, \int_t^{t+1} \frac{S_\tau I_\tau}{N-1} d\tau)\) and \(\text{Gamma}(|B| + 1, \int_t^{t+1} I_\tau \, d\tau)\) densities, respectively. Hence, if β and γ are proposed according to these distributions they cancel out the likelihood term in the acceptance probability, so the acceptance probability is simplified to Equation (3.3). The augmented likelihood and these proposal densities can be used to construct an efficient SDA-MCMC algorithm, as given by Algorithm 6. We sample new initial conditions, x_t, by noting that the only unknown state at the end of day t is I_t, so if a new state I_t^* were chosen, S_{[t,T]} is unchanged and the number of infectious individuals over [t, T] becomes I^*_{[t,T]} = I_{[t,T]} − I_t + I_t^*, which is only feasible if I^*_{[t,T]} > 0. The acceptance probability in Equation (3.4) targets f(x_{[t,t+1]} | y_{1:t+1}, β^{(l)}, γ^{(l)}), as

\[
\begin{aligned}
f(x_{[t,t+1]} \mid y_{1:t+1}, \beta, \gamma) &= \frac{f(x_{[t,t+1]}, y_{t+1} \mid y_{1:t}, \beta, \gamma)}{f(y_{t+1} \mid y_{1:t}, \beta, \gamma)} \\
&= \frac{f(x_{(t,t+1]} \mid x_t, \beta, \gamma)\, f(x_t \mid y_{1:t}, \beta, \gamma)}{f(y_{t+1} \mid y_{1:t}, \beta, \gamma)} \\
&= \frac{f(x_{(t,t+1]} \mid x_t, \beta, \gamma)\, f(\beta, \gamma \mid x_t, y_{1:t})\, f(x_t \mid y_{1:t})}{f(y_{t+1} \mid y_{1:t}, \beta, \gamma)\, f(\beta, \gamma \mid y_{1:t})},
\end{aligned}
\]

so the Metropolis-Hastings acceptance probability with proposal density f(x_t | y_{1:t}) reduces to Equation (3.4).

Figure 3.8 shows the posterior densities calculated by the SDA-MCMC and by a standard DA-MCMC algorithm over time. Inference for the first 13 days was performed by DA-MCMC, and inference for days 14, 15 and 16 was performed by both algorithms for comparison. We observe that the posterior distributions appear rather similar; though the kernel density estimates from the SDA-MCMC algorithm appear coarser, this may be due to parameters being inferred by Hastings steps, rather than by Gibbs sampling. The SDA-MCMC algorithm also, in this case, appears to show a slight reduction in support around the boundary of the posterior distribution. For t = 14, 15 and 16, DA-MCMC and SDA-MCMC achieve multivariate effective sample sizes per minute of 1230, 856 and 796, and 3752, 2021 and 1955, respectively. This shows that the SDA-MCMC in this case is much more efficient than the DA-MCMC algorithm; that is, there is a benefit to sequentially updating the posterior in this way, rather than running inference anew each time. Figure 3.9 shows the distributions f(I_t | y_{1:t}) for t = 14, 15 and 16; they appear smooth, indicating that there were sufficient samples to estimate f(x_t | y_{1:t}) in each iteration.

Initialization:
Run DA-MCMC, or SDA-MCMC, on data set y_{1:t} to obtain samples from f(β, γ, x_t | y_{1:t}).
For all x_t calculate kernel density estimates of f(β, γ | x_t, y_{1:t}) from the samples of β and γ corresponding to each value of x_t.
Empirically estimate f(x_t | y_{1:t}) as the proportion of DA-MCMC samples with state x_t.
Choose initial parameters (β, γ)^{(0)} and augmented data x_{[t,t+1]}, and set x_{t+1}^{(0)} = x_{t+1}. Set the number of iterations to some large K.

Iterations: for l = 1 : K do
  Hastings step for β and γ:
    Propose β^* according to \(\text{Gamma}\left(|A| + 1, \int_t^{t+1} \frac{S_\tau I_\tau}{N-1} d\tau\right)\).
    Propose γ^* according to \(\text{Gamma}\left(|B| + 1, \int_t^{t+1} I_\tau \, d\tau\right)\).
    Let (β, γ)^{(l)} = (β, γ)^* with probability
    \[
    \min\left\{ 1,\ \frac{f(\beta^*, \gamma^* \mid x_t, y_{1:t})}{f(\beta^{(l-1)}, \gamma^{(l-1)} \mid x_t, y_{1:t})} \right\},
    \]
    otherwise (β, γ)^{(l)} = (β, γ)^{(l−1)}.
  Hastings step for x_t:
    Propose x_t^* from f(x_t | y_{1:t}) and set x^*_{[t,t+1]} = x_{[t,t+1]} − x_t + x_t^*.
    If x^*_{[t,t+1]} is feasible given y_{1:t+1}, let x_{t+1}^{(l)} = x^*_{t+1} and x_{[t,t+1]} = x^*_{[t,t+1]} with probability
    \[
    \min\left\{ 1,\ \frac{f(x^*_{(t,t+1]} \mid \beta^{(l)}, \gamma^{(l)}, x_t^*)\, f(\beta^{(l)}, \gamma^{(l)} \mid x_t^*, y_{1:t})}{f(x_{(t,t+1]} \mid \beta^{(l)}, \gamma^{(l)}, x_t)\, f(\beta^{(l)}, \gamma^{(l)} \mid x_t, y_{1:t})} \right\}, \tag{3.4}
    \]
    otherwise x_{t+1}^{(l)} = x_{t+1}^{(l−1)}.
  Hastings step for x_{(t,t+1]}:
    Move infection times and/or insert, remove or move recovery times as in the DA-MCMC algorithm.
end

Save (β, γ, x_{t+1})^{(1:K)} as samples from f(β, γ, x_{t+1} | y_{1:t+1}).

Algorithm 6: An example of the SDA-MCMC algorithm for an SIR model.

Figure 3.8: Posterior distributions for an SIR model calculated from the SDA-MCMC algorithm (left column) and the DA-MCMC algorithm (right column), for t = 14, 15 and 16 (top to bottom). The population size was N = 100000 and the true parameter values were (R0, γ) = (1.5, 1/3). The model is initialised at a Uniform(0, 1) time in state (N − 1, 1, 0). The algorithm progresses one day at a time from top to bottom, beginning from two weeks' worth of data. Black dots correspond to the true parameter values and red dots correspond to the MAP estimates.

Figure 3.9: The posterior distribution of the number of infectious individuals, f(I_t | y_{1:t}), for t = 14, 15 and 16 (top to bottom).

3.3.4 Modifications for independent outbreak data

Suppose we observe p independent outbreaks of the same disease, we can express the augmented posterior distribution (up to proportionality) as

p Y f θ, x1:p|y1:p ∝ f(θ) f xi, yi|θ , i=1

i where x is the entire path corresponding to outbreak i and yi is the datum obtained over outbreak i. Note that here we use superscripts to denote the outbreak index, whereas previously we used subscripts to index time. Suppose we were to observe completed outbreaks over time, if we had performed inference on m completed out- breaks and then data on p − m more outbreaks became available, we can express the augmented posterior distribution as

p m+1:n  c d  d Y i f θ, x |y1:n ∝ f θ |θ , y1:m f(θ |y1:m) f(x , yi|θ). i=m+1

Estimating the posterior distribution based on all p outbreaks now only requires im- putation of hidden variables related to the p − m new outbreaks, that is, we need not impute hidden variables related to the first m outbreaks. Inference is performed simi-

c d  d lar to in Algorithm 6, where f θ |θ , y1:m is analogous to f (β, γ|xt, y1:t), f(θ |y1:m) is Qn i analogous to f (xt|y1:t) and i=m+1 f(x , yi|θ) is analogous to the augmented likelihood f(x(t,t+1], yt+1|β, γ, xt).

3.3.5 SI(n)R example

Suppose there are data on multiple completed outbreaks from an SI(n)R epidemic, where n is an unknown parameter to be inferred. Once many infections occur, mixing becomes prohibitively slow for the DA-MCMC algorithm and some states with positive posterior probability become virtually inaccessible (as discussed in Section 2.3.3, Example 3). A DA-MCMC scheme that can be applied to multiple completed outbreaks is outlined in Appendix A; we apply both an SDA-MCMC and a DA-MCMC scheme to a simulated data set from multiple completed SI(n)R outbreaks. Figure 3.10 shows that the DA-MCMC algorithm results in a posterior distribution with no support for n = 1, whereas the SDA-MCMC algorithm returns a posterior distribution with support for n = 1. Clearly there is non-zero posterior probability for n = 1; this highlights that the DA-MCMC algorithm mixes very slowly in the n dimension, whereas the SDA-MCMC algorithm has successfully reduced the correlation between successive iterations by integrating out the latent parameters in batches.

Figure 3.10: Posterior distributions calculated using the SDA-MCMC algorithm (left) and a DA-MCMC algorithm (right) based upon case data from 200 completed SI(2)R outbreaks in subpopulations of size 3. The true parameter values are (R0, γ) = (1.5, 1/3) and infection is seeded in each household at a Uniform(0, 1) distributed time. The red dot corresponds to the MAP estimate and the black dot corresponds to the true parameter value. SDA-MCMC was based upon inference performed on batches of 67, 67 and 68 outbreaks with 10^7 iterations and 2 × 10^6 iterations of burn-in each. The DA-MCMC algorithm was performed based upon 10^7 iterations and 2 × 10^6 iterations of burn-in.

3.3.6 Discussion

SDA-MCMC targets the same sequence of distributions as SMC; however, SDA-MCMC may be preferable if it is difficult to simulate from x_{(t,T]} in a way that agrees with the data set y_{t+1:T}. The new algorithm requires us to have a reasonable empirical approximation to f(x_t, θ^d | y_{1:t}) from the initial DA-MCMC run, so if the sample size from the initial DA-MCMC is too low this distribution may be inaccurate. These inaccuracies are most likely to occur in Markov chains with high-dimensional state-spaces or if f(x_t, θ^d | y_{1:t}) has large variance. The algorithm also relies on us having a reasonable approximation of f(θ^c | x_t, θ^d, y_{1:t}), which can also be inaccurate depending on the mixing of the DA-MCMC chain. Further, kernel density estimation of f(θ^c | x_t, θ^d, y_{1:t}) is inefficient to compute for high-dimensional θ^c. For example, we attempted to implement this for the SEIR model, but calculation and use of the kernel density estimate was too computationally expensive.

One of the key differences from DA-MCMC is that the SDA-MCMC target, which involves f(x_{(t,T]}, y_{t+1:T} | x_t, θ) f(θ, x_t | y_{1:t}), does not have an analytical form, as f(θ, x_t | y_{1:t}) is estimated via kernel density estimation; DA-MCMC, in contrast, considers an augmented likelihood of the form f(x_{[0,T]}, y_{1:T} | θ), which can typically be expressed analytically. As a result we generally cannot Gibbs sample the parameters, so Hastings steps are required.

A difficulty in constructing the Hastings step is that x_{(t,T]} trivially depends on x_t, so it can be difficult to sample paths with new starting points; that is, we may need to sample an x_t and, based on that x_t, alter x_{(t,T]} to ensure feasibility of the new sample path. For compartmental models, when x_t is sampled, the states x_{(t,T]} can be moved up or down accordingly, such that all events in (t, T] are preserved (though feasibility does need to be checked). For example, in the SIR case we propose a new state I_t^*, so the number of infectious individuals over [t, T] becomes I^*_{[t,T]} = I_{[t,T]} − I_t + I_t^*, which is only feasible if I^*_{[t,T]} > 0.
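For the SIR case this shift-and-check is a one-liner (a sketch, with illustrative names):

    import numpy as np

    def shift_infectious_path(I_path, I_t_new):
        # Shift an imputed infectious-count path when a new end-of-day state is
        # proposed: I*_{[t,T]} = I_{[t,T]} - I_t + I*_t. Returns None when the
        # shifted path is infeasible (some count would drop to zero or below).
        shifted = I_path - I_path[0] + I_t_new
        return shifted if np.all(shifted > 0) else None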

The reduction of support in the tails of the posterior distribution in the SIR example indicates that some kind of rejuvenation step may be required (similar to SMC algorithms) [22, 24]. That is, if there is too much error in the posterior distribution estimate, DA-MCMC may need to be run on the full chain. How one assesses error in the posterior distribution estimate is unclear, which makes it difficult to decide when to rejuvenate.

Chapter 4

Accuracy and Precision of Estimates Under Various Surveillance Schemes

In the early stages of an outbreak, FF100 studies are resourced to collect data on only the first few hundred symptomatic individuals and their contacts [3]. The most important contacts of infectious individuals are typically members within their household, colleagues from their place of work, or peers from their school, as they are easily surveilled and are more likely to become infected than other contacts. It is important to optimise data collection protocols in order to most accurately characterise the disease given limited resources for data collection. This chapter aims to explore the accuracy and precision of estimates of measures of transmissibility (the reproduction number and exponential growth rate, as defined in Section 2.2.2) under various schemes for collecting temporal infectious case data. In particular, we aim to determine the best contacts to surveil: those in small subpopulations, such as households; those in large subpopulations, such as workplaces; or a combination of small and large subpopulations. Supposing that there are resources available to surveil up to, say, 600 individuals, we wish to decide how to split these individuals into subpopulations to obtain precise estimates of the reproduction number and exponential growth rate. We first conduct an analysis on the sizes of subpopulations to surveil to obtain estimates with the lowest variance and bias under the model that generated the data. The properties of estimates under a known model are useful, but in practice we do not know the most appropriate model; so it is also important to choose to surveil subpopulations in a way that is robust with respect to model misspecification. Hence, we conduct an analysis of the robustness of estimates under model misspecification. The bias introduced via model misspecification shows that model selection is a necessary aspect of choosing an optimal surveillance scheme.

This chapter employs DA-MCMC and SDA-MCMC to solve inference problems; DA-MCMC is an exact Bayesian algorithm and SDA-MCMC is an approximate Bayesian algorithm (see Chapter 3). Given a data set, estimates from these algorithms are as precise as possible; however, some data sets are more informative than others. There is control over how the population is surveilled given fixed resources, for example, either households, schools or workplaces could be surveilled; differences in surveillance protocol may lead to differences in how informative a typical data set is. Note that these kinds of subpopulations are surveilled as they contain the contacts of infectious individuals which are most easily surveilled, and surveilling all contacts of every infectious individual is practically infeasible. We base inference on daily infectious case counts in each of the surveilled subpopulations until the disease dies out in each subpopulation. Hence, our data have both final size and temporal information. If we surveil many small households, we get the benefit of observing many short sets of time series data and many samples from final epidemic size distributions. Conversely, outbreaks in a few large workplaces give few samples from the final epidemic size distribution, but may lead to long time series data. It is not yet known in the literature which of these will lead to more informative data sets. Observing outbreaks in a mix of both small and large subpopulations may average out the benefits and drawbacks of each of the data set types.

Results from this chapter can be used to inform how to conduct epidemic surveillance in a way that gives as precise an assessment of the disease as possible under the models considered. Throughout this chapter there are three important assumptions: the first is that outbreaks in subpopulations act independently, the second is that transmission is frequency-dependent, and the third is that all infectious cases are observed within a subpopulation. The first assumption is reasonable if members of surveilled subpopulations are non-overlapping and if the overall population is large. The second assumption is that the transmission rate in a subpopulation is of the form βSI/(N − 1), which depends on the proportion of the subpopulation infected, as opposed to βSI, which depends on the number of individuals infected in the subpopulation. We mention this assumption here to highlight that the conclusions may not carry over to density-dependent models. In the literature, household models with frequency-dependent transmission [49, 66] and density-dependent transmission [67–69] have been considered. Frequency-dependence is generally accepted as suitable for sexually-transmitted infectious diseases and vector-borne diseases [70]. A study which considered both density- and frequency-dependent infection for influenza A in small households found that the frequency-dependent model was a better fit for influenza A [69]. There is some evidence for density-dependent pneumococcal infection in households [67], but it is not known how the transmission term changes for larger households, and it is accepted that actual transmission is likely to be somewhere between density and frequency dependent [67, 68]. As there is limited knowledge of the appropriate transmission rate function for large households, and frequency-dependent transmission suits small households for influenza A, it may be an appropriate model for transmission for pandemic preparedness. The third assumption is reasonable if the disease results in few asymptomatic cases and if we either are likely to observe the first case in a subpopulation, or we are able to retrospectively determine the days on which individuals became symptomatic.

In general, inference with model misspecification, that is, inference on a model that did not give rise to the data, can lead to biased results. Through a simulation study, we assess how different kinds of model misspecification on outbreaks in subpopulations bias estimates. In particular we look at misspecification of the exposed and infectious period distributions. Misspecification has been considered in the context of incorrectly assuming data came from an SIR model when it was actually simulated from an SI model [32], whereas here we consider SEIR and SIR models with different infectious and exposed period distributions. Biases with respect to the basic reproduction number, R0, under assumptions of misspecified mixing structure have been previously investigated [33]. In comparison, this chapter investigates bias under misspecification of the exposed and infectious period distributions on both the reproduction number and the exponential growth rate, r. This growth rate is a function of parameters related to within-subpopulation infection, so in the context of small households the term does not have a direct applicability. However, for a frequency-dependent transmission model, the estimated growth rate translates to the exponential growth rate of outbreaks in large subpopulations, which is of practical importance.

In cases where bias is large if the model is misspecified, it is advantageous to perform inference on both the model and parameters. This chapter aims to understand how different kinds of model misspecification bias estimates of transmissibility if model selection is not performed. To this end, we compare results from inference where data are generated and inferences are made on the model that generated the data, with results where inferences are made on a different model. We use DA-MCMC to infer the model parameters of SE(n1)I(n2)R models, and SDA-MCMC to infer both shape and model parameters for an SI(n)R model. It should be noted that we do not make inferences on the shape parameters of the SE(n1)I(n2)R model, as the mixing for the DA-MCMC method is prohibitively slow; alternative methods such as SMC-squared or PM-MCMC could possibly be employed, but could be inefficient due to the relatively large data sets. We investigate inference on multiple SE(n1)I(n2)R outbreaks using these kinds of methods in Chapter 6.

4.1 Methods

Our data sets are simulated from SEIR and SE(2)I(2)R models and inference is conducted assuming the underlying model is an SEIR, SE(2)I(2)R or SI(n)R model (with unknown shape parameter); for specific details on the inference procedure see Appendix A. The data sets used give the total number of infectious cases at a daily resolution, though we also investigate whether final epidemic size data are more robust under model misspecification. We assume that in each subpopulation a single infection is seeded at a uniformly distributed time on the first day, and from there on the epidemic proceeds as per the continuous-time Markov chain model dynamics until there are only susceptible or recovered individuals in the subpopulations. If there are no secondary infections in any of the subpopulations then the likelihood function is given by
\[
\left( \frac{n_2}{R_0 + n_2} \right)^{n_2 M},
\]
where n2 is the shape parameter of the infectious period and M is the number of subpopulations. Hence, if there are no secondary infections, the data set only informs the reproduction number. Data sets without any secondary infections would infer R0 as a low value; in this case, as R0 has a uniform prior and a monotonically decreasing likelihood function, the MAP of R0 is at its lower boundary value, 0.25. Further, the exposed period will be entirely informed by the prior distribution and the infectious period will be negatively biased. As such, we only include simulated data sets in which at least one secondary infection occurs in one of the subpopulations.
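As a quick numerical illustration (with the illustrative values n2 = 2 and M = 200, roughly the 200:3 scenario below), this likelihood decreases monotonically in R0 and is already vanishingly small well below the values of interest:

    # Probability that none of the M seeded cases causes a secondary infection.
    n2, M = 2, 200
    for R0 in [0.25, 0.5, 1.0, 1.4]:
        print(R0, (n2 / (R0 + n2)) ** (n2 * M))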

We infer parameters given 600 individuals grouped into different subpopulations and assess the impact of the grouping of the 600 individuals on inference. The goal is to group the 600 individuals in a way such that estimates have as low variance and bias as possible. Note, the number 600 was chosen as it is roughly the number of people we may expect to be able to surveil in an FF100 study, and it can be divided evenly in many ways. We simulated 250 epidemics from an SE(2)I(2)R model and an SEIR model for both a moderately and a highly transmissible disease, (β, σ, γ) = (0.933, 0.5, 2/3) and (β, σ, γ) = (0.933, 0.5, 0.45), respectively. The scenarios considered had the 600 individuals split into 200 subpopulations of 3 individuals, 150 subpopulations of 4 individuals, a combination of 100 subpopulations of 3 individuals and 3 subpopulations of 100 individuals, 6 subpopulations of 100 individuals, and 2 subpopulations of 300 individuals; these scenarios are labelled 200:3, 150:4, (100:3, 3:100), 6:100 and 2:300 respectively. Model parameters were estimated for each simulated data set using the DA-MCMC algorithm described in Appendix A, with a burn-in of 10^6 iterations, 2 × 10^6 samples (thinned to 2 × 10^5 for data storage reasons) and 10 Hastings steps proposed for the hidden process in each iteration. Once model parameters are inferred we calculate the key epidemiological parameters R0 and r as β/γ and the solution to Equation (2.5) (Section 2.2.2), respectively [15]. All results from the DA-MCMC samples are given as box plots of the median values and contour plots of kernel density estimates of the posterior distribution of R0 and r.

In Section 4.2.1 we perform inference on the SEIR and SE(2)I(2)R data sets assuming the correct model is known. In Section 4.2.2 we use the same simulated data sets from the SE(2)I(2)R model used in Section 4.2.1 and perform inference as though the data were generated from an SEIR model and an SI(n)R model with an unknown shape parameter to be inferred. The results based on the SEIR model are used to show how simplifying the exposed and infectious periods to be exponentially distributed biases our assessment of transmissibility. The results based on the SI(n)R model show how the lack of an exposed period affects results. This is important as the exposed period has an impact on temporal data but not on the final size distribution. The shape parameter is inferred here, as this is a tractable inference problem using SDA-MCMC (implemented as in Section 3.3.5). However, the high dimensionality of inference with the SE(n1)I(n2)R model makes joint inference of shape parameters and rate parameters a difficult problem. Lastly, we perform inference based on an SIR, or equivalently an SEIR, model using only the final epidemic size data. This allows us to assess the biases due to temporal information under model misspecification.

4.2 Results

4.2.1 Subpopulation size and outbreak repetitions

Figures 4.1 and 4.2 show boxplots of the median estimates of the reproduction number, R0, and growth rate, r, for an SEIR and an SE(2)I(2)R model, respectively (with parameters inferred under the models that generated the data). We observe that estimates of key epidemiological parameters tend to be unbiased, possibly apart from estimates related to scenario 2:300 with moderate transmissibility parameters. In this scenario for the SEIR model, on average, estimates of r were 35% lower than the true value and estimates of R0 were 9% lower than the true value; for the SE(2)I(2)R model, on average, estimates of r were 46% lower than the true value and estimates of R0 were 17% lower than the true value. For the moderately transmissible parameters, bias in the 2:300 scenario corresponds to simulations where each of the subpopulations experiences early fade out. Figure 4.3 shows the total number of secondary infections under each model and scenario; we find that the moderately transmissible SEIR and SE(2)I(2)R models experience early fade out in every subpopulation roughly 40% and 30% of the time for scenario 2:300, respectively. Similarly, outliers from the moderately transmissible 6:100 scenario and the highly transmissible 2:300 scenario correspond to simulations which experienced early fade out (which occurs in roughly 5%-15% of these simulations). The multiple modes in Figure 4.3 for scenarios 6:100 and 2:300 correspond to various numbers of subpopulations experiencing epidemic fade out. High transmissibility parameters led to unbiased estimates in all scenarios, as there is a lower chance of epidemic fade out in subpopulations. The variance in estimates in all cases was lowest for the 200:3 scenario. For the SEIR model with moderate and high transmissibility the variances of estimates of (R0, r) were (0.0221, 0.0011) and (0.0500, 0.0012), respectively. For the SE(2)I(2)R model with moderate and high transmissibility the variances of estimates of (R0, r) were (0.0143, 0.0008) and (0.0353, 0.0008), respectively. The variance in estimates in all cases was highest for the 2:300 scenario. For the SEIR model with moderate and high transmissibility the variances of estimates of (R0, r) were (0.0550, 0.0033) and (0.1845, 0.0068), respectively. For the SE(2)I(2)R model with moderate and high transmissibility the variances of estimates of (R0, r) were (0.0922, 0.0035) and (0.1175, 0.0043), respectively. Many outbreaks in small populations tended to lead to lower-variance estimates than few outbreaks in large subpopulations. This informs us that transmissibility can be more accurately inferred if data are collected on many small subpopulations rather than few large subpopulations. We see that in almost all cases the variance in the estimates of the SE(2)I(2)R model is slightly lower than that of the SEIR model; that is, we are more easily able to determine key model parameters for models with Erlang-2 distributed exposed and infectious periods. This is likely due to the lower variance in the infectious and exposed period distributions.

We conclude that in all cases data on many outbreaks in subpopulations lead to more accurately inferred epidemiological parameters, assuming we know the model that gives rise to the data. Out of the scenarios considered it was best to surveil 200 subpopulations of size 3 or 150 of size 4. These resulted in accurate estimates with low variance; this is partly because the distribution of the number of secondary infections has low variance and no support near 0, and because the temporal data are informative enough to obtain low-variance estimates of r. Hence, if we know which model is most appropriate, we would recommend surveilling many households as opposed to few schools or workplaces. Further, if the disease is highly transmissible and has Erlang distributed infectious and exposed periods we are able to more accurately infer parameters.

4.2.2 Latent and infectious period distribution

Figure 4.4 shows box plots of median parameter estimates of R0 and r from data generated from an SE(2)I(2)R model and inference based on an SEIR model. For moderate transmissibility scenarios 200:3, 150:4, (100:3, 3:100), 6:100 and 2:300, the average percentage difference between estimates of (R0, r) and their true values is (18, 20), (12, 11), (12, 9), (−2, −17) and (−9, −40), respectively. For high transmissibility scenarios 200:3, 150:4, (100:3, 3:100), 6:100 and 2:300, the average percentage difference between estimates of (R0, r) and their true values is (30, 8), (21, 6), (13, 0.3), (1, −2) and (−2, −5), respectively. We conclude that, for both highly and moderately transmissible diseases, misspecifying an SE(2)I(2)R model as an SEIR model gives positively biased estimates of R0 from scenarios with many outbreaks in small households. Estimates of R0 tend

Figure 4.1: Boxplots of median estimates of R0 and r from 250 completed simulations of SEIR epidemics. Infection in each subpopulation is seeded at a Uniform(0, 1) distributed time. Subpopulation distributions 1-5 correspond to 200 households of size 3, 150 of size 4, 100 of size 3 and 3 of size 100, 6 of size 100, and 2 of size 300, respectively. Parameter sets were (β, σ, γ) = (0.933, 0.5, 2/3) (top) and (β, σ, γ) = (0.933, 0.5, 0.45) (bottom).

Figure 4.2: Boxplots of median estimates from 250 completed simulations of SE(2)I(2)R epidemics. Infection in each subpopulation is seeded at a Uniform(0, 1) distributed time. Subpopulation distributions 1-5 correspond to 200 households of size 3, 150 of size 4, 100 of size 3 and 3 of size 100, 6 of size 100, and 2 of size 300, respectively. Parameter sets were (β, σ, γ) = (0.933, 0.5, 2/3) (top) and (β, σ, γ) = (0.933, 0.5, 0.45) (bottom).


Figure 4.3: Histograms of the total number of secondary infections from moderate and high transmissibility SEIR and SE(2)I(2)R epidemics. Subpopulation distributions correspond to 200 households of size 3, 150 of size 4, 100 of size 3 and 3 of size 100, 6 of size 100, and 2 of size 300, respectively. Each histogram is generated from 1000 simulated data sets. Parameter sets for moderate and high transmissibility were (β, σ, γ) = (0.933, 0.5, 2/3) and (β, σ, γ) = (0.933, 0.5, 0.45), respectively.

to have low bias for scenarios 6:100 and 2:300, and for the high transmissibility scenario estimates of r have low bias for scenarios 6:100 and 2:300. Understanding these biases is confounded by opposing sources of positive and negative bias. Some positive bias in R0 occurs as the SE(2)I(2)R model generally leads to more secondary infections in total than the SEIR model (see Figure 4.3). However, the scenarios with larger subpopulations still have a reasonable probability of early epidemic fade out in all of the subpopulations, which leads to negative bias in R0. Further, it is unclear as to how the temporal information affects results. In the moderate transmissibility scenario all of our estimates of r are biased, but the high transmissibility scenario has low bias, particularly in scenarios with large subpopulations. Note that as r is the positive solution to Equation (2.5) from Section 2.2.2, for the SEIR model r is given by

\[
r = \frac{1}{2}\left( -\sigma - \gamma + \sqrt{(\sigma + \gamma)^2 - 4(1 - R_0)\gamma\sigma} \right). \tag{4.1}
\]

Hence, positive bias in R0 may lead to positive bias in r. In comparison to Figure 4.2, the parameter estimates have comparable variance but are highly biased, particularly in the scenarios with small subpopulations. The lower bias for larger subpopulations may be due to both a lower bias in R0 and longer sets of temporal information, which are useful for estimating r. It is important to note that estimates of key epidemiological parameters were generally positively biased, so this kind of model misspecification generally leads to conclusions that more people will get infected and that the disease will spread faster than is actually the case.
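Equation (4.1) is cheap to evaluate; as a sketch, evaluating it at the two simulation parameter sets used in this chapter:

    import numpy as np

    def seir_growth_rate(R0, sigma, gamma):
        # Early exponential growth rate r of the SEIR model, Equation (4.1).
        return 0.5 * (-sigma - gamma
                      + np.sqrt((sigma + gamma) ** 2
                                - 4.0 * (1.0 - R0) * gamma * sigma))

    for beta, sigma, gamma in [(0.933, 0.5, 2.0 / 3.0), (0.933, 0.5, 0.45)]:
        print(seir_growth_rate(beta / gamma, sigma, gamma))  # approx. 0.105, 0.209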

The results of SE(2)I(2)R data analysed under an SI(n)R model (where n is an inferred parameter) are given in Figure 4.5. These show that all epidemiological parameters tend to be underestimated for both high and moderate transmission epidemics across all subpopulation sizes. However, the bias in general tended to be lower in scenarios with outbreaks in larger subpopulations. In general we would expect to see this negative bias in r when we perform inference with an SI(n)R model, as infections are not delayed by an exposed period.

Figure 4.6 shows individual posterior distributions from a single SE(2)I(2)R simulation under each scenario, with inference based on the true model, an SEIR model and an SI(n)R model. We observe that under the true model the small subpopulation scenario


Figure 4.4: Boxplots of median estimates based on SEIR inference from 250 completed simulations of SE(2)I(2)R epidemics. Infection in each subpopulation is seeded at a Uniform(0, 1) distributed time. Subpopulation distributions 1-5 correspond to 200 households of size 3; 150 of size 4; 100 of size 3 and 3 of size 100; 6 of size 100; and 2 of size 300, respectively. Parameters for the simulations were (β, σ, γ) = (0.933, 0.5, 2/3) (top) and (β, σ, γ) = (0.933, 0.5, 0.45) (bottom).


Figure 4.5: Boxplots of median estimates based on SI(n)R inference from 250 completed simulations of SE(2)I(2)R epidemics. Infection in each subpopulation is seeded at a Uniform(0, 1) distributed time. Parameters for the simulations were (β, σ, γ) = (0.933, 0.5, 2/3) (top) and (β, σ, γ) = (0.933, 0.5, 0.45) (bottom).

We observe that under the true model the small subpopulation scenario leads to posterior distributions with low variance in R0, whereas the large subpopulation scenarios led to posterior distributions with larger variance in R0 but comparable variance in r, and the mixed scenario had the lowest variance in r. Under the SEIR model the small subpopulation scenarios have a large positive bias in r and a slight positive bias in R0, with a greater variance in both variables compared to results based on the true model. The mixed scenario saw a change in the location of the posterior distribution, with an increased variance in r. The large subpopulation scenarios had very little change in the location of the posteriors, but the variance in R0 increased. Under the SI(n)R model all posteriors had a reduction in variance and a slight change in location.

For the small and mixed scenarios the location moved lower with respect to R0 and r; for the large scenarios the locations of the posteriors shifted up with respect to R0.

Lastly, we assess whether final epidemic size data are preferable to temporal data under model misspecification when inferring R0. Note that these results are only expressed in terms of R0, as r cannot be inferred from final epidemic size data alone. Results based on final epidemic size data are given in Figure 4.7. We see that estimates have very low variance and high bias for scenarios 200:3 and 150:4; the low variance is due to the large number of repetitions of final epidemic size data, and the high bias is due to the misspecification of the infectious period distribution. For scenarios 6:100 and 2:300 the results look very similar to those in Figures 4.4 and 4.5; this indicates that R0 is largely estimated by the final epidemic size data for outbreaks in large subpopulations under model misspecification.

It appears that scenarios 6:100 and 2:300 had the lowest bias under all model misspecifications considered. Scenario 6:100 has the least bias introduced from epidemic fade-out and has a lower variance than scenario 2:300, so this is preferred. However, scenario (100:3,3:100) tended to have lower variance estimates and was not as biased as scenarios 200:3 and 150:4 in all cases. For the models considered, scenario (100:3,3:100) leads to unbiased estimates with lower variance under the correct model, and is less susceptible to bias from initial fade-out than both scenarios 6:100 and 2:300. The mix of many small subpopulations and a few large subpopulations allows for many sets of final epidemic size data and some longer sets of temporal data in large populations. So, although the mix is not the best choice under a correctly specified model, it appears to be the most robust surveillance scheme under the kinds of model misspecification considered here.


Figure 4.6: Posterior distributions of R0 and r based on completed SE(2)I(2)R simulated data from each scenario, inferred under each model. Infection in each subpopulation is seeded at a Uniform(0, 1) distributed time. Parameters for the simulations were (β, σ, γ) = (0.933, 0.5, 0.45).


Figure 4.7: Boxplots of median estimates based on SIR inference on final epidemic size data from 250 simulations of SE(2)I(2)R epidemics. Parameter sets were (β, σ, γ) = (0.933, 0.5, 2/3) (top) and (β, σ, γ) = (0.933, 0.5, 0.45) (bottom).

4.3 Discussion

We investigated how to surveil a mix of subpopulation sizes, for example, corresponding to households, schools and workplaces, in a way that gives the most precise estimates of epidemiological parameters. These units are a natural setting for surveillance as they are less resource intensive to surveil than many randomly selected individuals in the population.

We used data-augmented MCMC methods to infer parameters of models from simulated data sets. We found that, if we knew the correct model, having many repetitions of small outbreaks led to more precise estimates than few outbreaks in large populations. However, under model misspecification the least biased results came from the scenario with six outbreaks in subpopulations of size 100; that is, the least biased estimates came from a scenario in which the subpopulations were large but the probability of initial epidemic fade-out in all subpopulations was relatively low. The scenario with a mix of many small subpopulations and few large subpopulations led to estimates with lower variance compared to those based on large subpopulations, and a lower bias under model misspecification compared to those with only small subpopulations. The 'mix' scenario is a trade-off between the many small and the few large subpopulation outbreaks. These results indicate that if a model is not selected in an informed way, we should not surveil only small subpopulations; a mix of small and large, such as households and schools, would be more appropriate. Definitively claiming this as the most robust scheme in general is difficult, as there are confounding effects from biases arising from different probabilities of initial fade-out, different lengths of data sets, different distributions for the number of secondary infections, and different numbers of samples from final size distributions in subpopulations. If there is good evidence for a particular model, we should surveil many small subpopulations, such as households.

In general, R0 tended to be more robust under model misspecification than r, though r was estimated well when the disease was highly transmissible and the infectious and exposed periods were modelled, even with misspecified distributions.

One limitation of our analysis is that we only considered models with frequency-dependent transmission, so results may not hold if density-dependent transmission is more appropriate. Although there is some evidence that models with frequency-dependent transmission fit influenza A better than models with density-dependent transmission in small households [69], it is unknown how this extends to larger households. In other cases, density-dependent models were found to be more suitable [67, 70]. In reality, the transmission of infectious diseases may be somewhere between the two [67, 68]. The kind of transmission has an impact on the effect of vaccination measures [71], so the characterisation of the transmission may be of practical concern. This indicates that further study into the transmission type may be necessary; this could be done via model selection, or by inferring an extra model parameter, ω ∈ [0, 1], giving transmission rate βSI/(N − 1)^ω, which represents a trade-off between density and frequency dependence.
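As a minimal illustration of that parameterisation (a sketch; the function name and signature are ours, not from this thesis):

```python
def infection_rate(beta, S, I, N, omega):
    """Rate of new infections with a density/frequency trade-off parameter.

    omega = 1 recovers frequency-dependent transmission, beta * S * I / (N - 1);
    omega = 0 recovers density-dependent transmission, beta * S * I.
    """
    return beta * S * I / (N - 1) ** omega
```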

We also assumed that all cases in subpopulations are observed, which is only sensible if asymptomatic cases are rare and symptoms are easily detected within a surveilled subpopulation. We also only included data sets that contained at least one secondary infection in one of the subpopulations; this meant that some simulated data sets for scenarios with few outbreaks and moderate transmissibility were rejected. These data sets were entirely uninformative for the exposed period and led to negatively biased estimates of R0 and the infectious period.

All inference was based on transmission within subpopulations; however, we wish to use this to characterise transmission in the overall population. Hence the estimates are implicitly related to the spread of disease in a homogeneously mixing population and do not necessarily capture the spread of disease between subpopulations in a population with inhomogeneous transmission. We also assumed that data sets are from completed outbreaks in each subpopulation; however, in general it is unknown when an outbreak is complete. Our analysis could be extended to allow some outbreaks to be continuing, but this reduces the efficiency of data-augmented MCMC algorithms as the dimension of the hidden process is large, whereas data from full outbreaks give the exact number of exposed, infectious and recovered individuals. Further, we note that subpopulations are chosen as they are an efficient unit to surveil; however, our analysis does not account for any difference in efficiency of data collection. For example, it may be as easy to surveil 600 individuals in subpopulations of size 3 as it is to surveil 800 individuals in subpopulations of size 100. It is possible to redo the analysis by modelling the efficiency of surveillance, but it is unclear how the efficiency varies with the subpopulation size, so a fixed number of individuals was chosen.

The analysis of the robustness of the inferred epidemiological parameters under model misspecification revealed that biases were unpredictable due to differences in the distributions of the number of secondary infections, the number of samples from final size distributions, and the lengths of data sets. This indicates that model selection is a necessary aspect of characterising emerging infectious diseases from FF100 study data. Our work in Chapters 5 and 6 aims to improve the efficiency of current methods for Bayesian model selection, such as SMC-squared [24, 25], so that they can be applied to these kinds of data sets and models. To assess which surveillance scheme is optimal, we propose choosing a sensible subset of models, with frequency- and density-dependent transmission and various infectious and exposed period distributions, then simulating from different models and performing Bayesian Model Averaging under each surveillance scenario. The resulting estimates should be robust and lead to a more definitive optimal surveillance scheme.

Chapter 5

Model Selection via Importance Sampling

Bayesian model selection is a way of choosing between competing models that incorporates prior knowledge of models and their parameters; this is naturally a desirable approach when data are scarce but there is reasonable prior knowledge of a process. It compares models using the probability that each model generated the data, given the observations and prior distributions over models and parameters [72]. We consider the model to be inferred as a parameter (typically with a uniform prior); multiplying the model evidences, that is, the likelihood of each model, by the prior distribution and normalising gives the probability of each model given the data [38]. While the interpretability and the consistency with the Bayesian paradigm are desirable, Bayesian model selection has some issues: a large amount of data may be required before models can be distinguished, and the evidence is typically difficult to calculate as the likelihood is intractable. This is true for epidemic models, which for small populations are most naturally represented as partially-observed continuous-time Markov chains (CTMCs) [6, 73, 74].

This chapter describes an efficient way of performing Bayesian model selection for partially-observed CTMCs and applies our method to two important problems in mathematical epidemiology: inferring the shape of the infectious period distribution and identifying the onset of symptoms relative to infectiousness. Our method works by calculating unbiased estimates of the likelihood via sequential importance resampling,

which in turn are used in another importance sampling algorithm to estimate posterior model probabilities. A novel feature of this method is that the likelihood is estimated via a scheme which is ideally suited to partially-observed CTMCs where one component of the state is observed exactly [74]. This works by sampling realisations of the partially-observed process in a way that realisations always match the observations. Our combined method is both computationally efficient and embarrassingly parallelisable, and hence well suited to implementation on modern computing hardware. Further, the ability to estimate error bounds and use stopping criteria ensures the accuracy of evidence estimates. Similar CTMC models see wide use in other areas of biology [75], such as ecology [76] and cell biology [77], and hence our method may be applied to a broad range of biological model selection problems. We first discuss the methods for importance sampling over the parameter space, and the sequential importance resampling algorithm used to estimate the likelihood, in Section 5.1. We then describe the two case studies, their implementation and results in Sections 5.2 and 5.3. Lastly, we conclude in Section 5.4 and discuss how our method relates to those already in the literature.

5.1 Methods

Importance sampling is a method for estimating properties of one distribution by sampling from another distribution; the sampling weights correct for this mismatch to give unbiased estimates with respect to the distribution of interest [78]. This technique is useful when the target distribution is difficult to sample from, such as the distribution of latent variables from a partially-observed Markov chain. This method has been effectively applied for parameter inference, for example in particle marginal MCMC schemes [60, 79], and has also been applied to model selection [23, 25, 72, 80]. In this chapter we use importance sampling in the space of latent variables to estimate likelihoods via sequential importance resampling, and within the parameter space to calculate the model evidence. The novelty of this approach is that the likelihood is estimated via the most suitable kind of importance sampling scheme for these kinds of models, as described in [74]. Here we introduce importance sampling for model selection, detail how likelihood estimates are used in model selection, and describe how sequential importance resampling is used for likelihood estimation.

5.1.1 Importance sampling for evidence estimation

Importance sampling for model selection has been discussed in [23,25,72], and in this chapter we adopt a similar approach. However, our implementation uses an efficient particle filter to obtain unbiased estimates of the likelihood, which allows us to obtain unbiased estimates of the evidence. Suppose we have data set y of observations from a model with parameter set θ in parameter space Θ. Let p(θ) denote the prior distribution and p(y|θ) denote the likelihood function, then the model evidence is given by

\[
  p(y) = \int_{\Theta} p(y \mid \theta)\, p(\theta)\, d\theta.
\]

To estimate p(y) we may sample m parameter sets θ_1, …, θ_m from some arbitrary importance sampling density, q(·|y), with support Ψ where Θ ⊆ Ψ. We then compute the random variables

\[
  Z_i = \frac{p(y \mid \theta_i)\, p(\theta_i)}{q(\theta_i \mid y)}, \tag{5.1}
\]
for i = 1, …, m, where p(θ) is the prior distribution and p(y|θ) is the likelihood function. These random variables are chosen so that their expectation is given by

\[
  E[Z_i] = \int_{\Psi} \frac{p(y \mid \theta)\, p(\theta)}{q(\theta \mid y)}\, q(\theta \mid y)\, d\theta
         = \int_{\Theta} p(y \mid \theta)\, p(\theta)\, d\theta
         = p(y),
\]
where the integral reduces to Θ because the prior has support Θ.

Hence, the mean of the Z_i's provides an unbiased estimate of the evidence, p̂(y). If the likelihood is intractable we can use unbiased estimates, p̂(y|θ), instead of p(y|θ) in Equation (5.1), by the law of total expectation. Note that using importance sampling to estimate p(y) involves obtaining independent, identically distributed samples of Z_i and taking their mean. Therefore the central limit theorem can be applied once a large number of samples are obtained, so we can estimate error bounds on p̂(y) which shrink like 1/√m. Hence, one can sample until estimates of the evidence are within a specified tolerance. Further, the independence of the sampling procedure allows these computations to be run in parallel, so modern computing hardware can be easily utilised. Of course, having to estimate the likelihood inflates the variance of p̂(y), so a larger number of samples is required for p̂(y) to converge to a given accuracy.
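As a minimal sketch of this estimator (assuming user-supplied callables for the prior, the sampling density and an unbiased likelihood estimator; none of these names come from the thesis):

```python
import numpy as np

def estimate_evidence(sample_q, log_prior, log_q, lik_hat, m=500, seed=None):
    """Importance sampling estimate of the evidence p(y) with a 95% CLT interval.

    sample_q(rng) draws theta from q(.|y); log_prior and log_q evaluate
    log p(theta) and log q(theta|y); lik_hat(theta) returns an unbiased
    estimate of p(y|theta), e.g. from sequential importance resampling.
    """
    rng = np.random.default_rng(seed)
    z = np.empty(m)
    for i in range(m):
        theta = sample_q(rng)
        z[i] = lik_hat(theta) * np.exp(log_prior(theta) - log_q(theta))
    p_hat = z.mean()
    half_width = 1.96 * z.std(ddof=1) / np.sqrt(m)  # shrinks like 1/sqrt(m)
    return p_hat, (p_hat - half_width, p_hat + half_width)
```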

The variance of this estimate is lowest if the sampling density, q(·|y), is similar to the posterior distribution. Our method requires no parameter inference, but may be made more efficient, especially in higher dimensions, by choosing a sampling density similar to the posterior distribution if parameter inference has been performed; for example, the sampling distribution could be a Gaussian with the mean and covariance matrix estimated from MCMC samples [23]. In this chapter we do not use MCMC samples to inform the choice of q(·|y), but this could be done in practice to improve the convergence of estimates.

5.1.2 Sequential importance resampling algorithm for likelihood estimation

Consider a time series, y = (y_1, …, y_T), from a partially-observed CTMC. We can express the likelihood at θ as

\[
  p(y \mid \theta) = \prod_{t=1}^{T} p(y_t \mid y_{1:t-1}, \theta),
\]
where we adopt the convention that p(y_1|y_{1:0}, θ) is taken to mean p(y_1|θ). Hence, the likelihood can be calculated via a product of the likelihood increments, p(y_t|y_{1:t-1}, θ), for t = 1, …, T. We apply a sequential importance resampling algorithm, which uses importance sampling to estimate the likelihood increments [81, 82]. This algorithm uses resampling at the end of each time increment to lower the variance of likelihood estimates and remove realisations that were unlikely to have occurred. Throughout this section all probabilities are calculated with respect to some parameter set, θ, but this notation is suppressed to keep statements concise.

Let x_t denote the state of the partially-observed process at time t. The sequential importance resampling algorithm begins with a set of n particles; each particle is associated with an initial state and a weight, {x_0, w}^{1:n}. These initial states are distributed according to p(x_0) and the weights begin as 1. Suppose at each iteration of the algorithm we have states x_{t-1}^{1:n} distributed according to the particle filter density p̂(x_{t-1}|y_{1:t-1}) ≈ p(x_{t-1}|y_{1:t-1}); we evolve these over a day according to the importance sampling density, q(·|y_t, x_{t-1}), to obtain realisations x̃_{(t-1,t]}^{1:n} with weights given by

 i i  p x˜(t−1,t], yt|xt−1 wi = , for i = 1, . . . , n.  i i  q x˜(t−1,t]|yt, xt−1 These are computed such that the expected value of the weights are

\[
  E[w] = \iint \frac{p\left(\tilde{x}_{(t-1,t]},\, y_t \mid x_{t-1}\right)}{q\left(\tilde{x}_{(t-1,t]} \mid y_t,\, x_{t-1}\right)}\,
         q\left(\tilde{x}_{(t-1,t]} \mid y_t,\, x_{t-1}\right)\, \hat{p}(x_{t-1} \mid y_{1:t-1})\; d\tilde{x}_{(t-1,t]}\, dx_{t-1}
\]
\[
  \approx \iint p\left(\tilde{x}_{(t-1,t]},\, y_t \mid x_{t-1}\right)\, p(x_{t-1} \mid y_{1:t-1})\; d\tilde{x}_{(t-1,t]}\, dx_{t-1}
  = p(y_t \mid y_{1:t-1}).
\]

Hence, the mean of these weights provides an approximation of the likelihood increment, p(y_t|y_{1:t-1}); the product of these increments gives an unbiased estimate of the likelihood [57]. Further, resampling states from x̃_t^{1:n} according to the normalised weights, w_i/∑_j w_j, gives a sample, x_t^{1:n}, from p̂(x_t|y_{1:t}). Hence, from the initial conditions we can iterate forwards and recursively estimate the likelihood. Performance of this algorithm can be improved by updating weights over multiple steps and only resampling once the effective sample size drops below a specified threshold [83].

Thus far we have described the general approach of using importance sampling and sequential importance resampling to estimate posterior model probabilities; however, an important aspect of the sequential importance sampling process is the evolution of particles over each day. The difficulty in evolving particles in the cases considered here is that precise observations are made from models with many latent variables, which can make data-augmentation or rejection-sampling approaches slow. In Sections 5.2.2 and 5.3.3 we describe the method from [74], which builds upon [84], for generating realisations x̃_{(t-1,t]} from initial state x_{t-1} for each of the case studies. The main benefit of this approach is that we use an efficient sampling distribution q(·|y_t, x_{t-1}) which is tractable and generates realisations x̃_{(t-1,t]} that match observations almost surely and are similar to the partially-observed process. This method makes particles match observations by first generating observation times, then generating events between observations, and occasionally forcing events to occur, or blocking events, to ensure feasibility of the process. For example, if we are yet to observe an infection but a recovery would lead to the epidemic ending, the recovery rate is set to 0. Importance sampling in this way allows for estimation of likelihoods associated with rare events; for example, it can easily be used to estimate the tails of the likelihood, or to sample when observations are unlikely, whereas simulation-based methods tend to perform poorly as simulations are unlikely to match the data.

The overall approach to model selection here is to: (i) sample from the parameter space using some importance sampling distribution; (ii) for each of those samples, estimate the likelihood by sequential importance resampling; (iii) plug this likelihood estimate into Equation (5.1) to obtain importance weights; and (iv) use the mean of these weights as an unbiased estimate of the model evidence. We can keep sampling and obtaining estimates of the model evidence until they satisfy some stopping criterion, for example, once credible intervals are of a specified width. Once the model evidence is computed for all candidate models we can multiply these by the prior distribution over the models and normalise to obtain the posterior model probabilities. A sketch of this pipeline is given below.
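The following sketch assembles that four-step loop; a relative-width stopping rule stands in for the non-overlapping-intervals rule used later in this chapter, and all names are ours:

```python
import numpy as np

def evidence_with_stopping(draw_z, batch=500, rel_tol=0.05,
                           max_samples=100_000, seed=None):
    """Accumulate importance weights Z_i in batches until the 95% interval
    for the evidence is within rel_tol of the estimate.

    draw_z(rng) returns one Z_i = p_hat(y|theta) * p(theta) / q(theta|y).
    """
    rng = np.random.default_rng(seed)
    z = []
    while len(z) < max_samples:
        z.extend(draw_z(rng) for _ in range(batch))
        arr = np.asarray(z)
        p_hat = arr.mean()
        half = 1.96 * arr.std(ddof=1) / np.sqrt(len(arr))
        if half < rel_tol * p_hat:
            break
    return p_hat, half

def posterior_model_probs(evidences, priors):
    """Multiply each model's evidence by its prior probability and normalise."""
    w = np.asarray(evidences) * np.asarray(priors)
    return w / w.sum()
```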

5.2 Case study I: Inferring the shape of the infectious period distribution

Our first case study uses the importance sampling Bayesian model selection method to infer the appropriate infectious period distribution for an SI(k)R model [85–87], where symptoms are assumed to coincide with a transition into the infectious class. This study is motivated by influenza outbreaks in which only symptom onset (which is correlated with infection) is observed, and recoveries are not. Here we consider the infectious period to be either exponentially, Erlang-2 or Erlang-5 distributed; these represent high, medium and low variance infectious periods, respectively. This study aims to answer whether case data at a daily resolution are sufficient for discriminating between these models, how well parameters need to be known in order to discriminate between models, and how much data are required to discriminate between models effectively.

It is known that with final size data the SI(k)R model has a tractable likelihood function [87, 88], which allows for efficient Bayesian model selection. However, it is currently an open question whether full temporal data are more effective for model selection: although there is more information in the data, the parameter space is larger (effectively going from 2 to 3 dimensions to also infer γ). Hence, we compare model selection results from the full temporal data with results from final outbreak size data.

The temporal data considered in this chapter consist of daily symptom onset counts from completed outbreaks in small populations, which we refer to as households. Final size data are derived from the temporal data by summing the total number of cases in each household. We let all households be of size 4 for simplicity, though this can easily be extended to allow for a distribution of household sizes. Each outbreak is modelled as a compartmental CTMC; however, we only observe a small portion of the epidemic process, so this is a partially-observed CTMC. We assume that all symptom onset events are observed until the epidemic fades out in the household, and set the time of the first observation within each household as 0.

We describe the epidemic model for households in Section 5.2.1, the importance sampling method for estimating likelihood increments in Section 5.2.2, the parameters used for the simulation study in Section 5.2.3, and show results in Section 5.2.4.

5.2.1 SI(k)R model

For a population of size N, the SI(k)R model is a compartmental model that allows each individual to be in one of k + 2 compartments: they are either susceptible; infectious in phase j, for j = 1, …, k; or recovered. Here the different phases of infectiousness have no physical interpretation; they are introduced in order to allow the overall infectious period to be Erlang-k distributed [85–87]. Let S, I_j and R denote the number of susceptible, infectious phase j and recovered individuals respectively. As these numbers must always be non-negative and sum to N, we have the state space

\[
  \mathcal{S} = \left\{ (S, I_1, \dots, I_k, R) \in \mathbb{N}^{k+2} : S + \sum_{j=1}^{k} I_j + R = N \right\},
\]
where we take ℕ to contain 0. There are three kinds of transitions for this model: infection, phase change, and recovery. Infectious individuals of all phases make effective contact with other individuals in the population at rate β, and if this contact is with a susceptible individual then that individual becomes infectious, corresponding to a transition into the infectious phase 1 compartment. An infectious phase j individual, for j = 1, …, k − 1, moves into the next phase at rate kγ; this rate is chosen such that the infectious period is Erlang-k distributed with mean 1/γ. Similarly, an infectious phase k individual recovers at rate kγ and is no longer able to spread the disease. The transitions and rates associated with changes in the number of individuals in each compartment are given in Table 5.1. For this study we assume that symptom onset corresponds to the infection transition, and we assume that we observe the number of these transitions at a daily resolution. The model is initialised at the time of the first observation, hence the initial state is (S, I_1, I_2, …, R) = (N − 1, 1, 0, …, 0). A simulation sketch of this model is given below.
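To make the dynamics concrete, here is a minimal Doob-Gillespie simulation of this model (a sketch under the Table 5.1 rates; the function name is ours):

```python
import numpy as np

def simulate_sikr(N, k, beta, gamma, rng=None):
    """Doob-Gillespie simulation of an SI(k)R outbreak.

    State is (S, I_1, ..., I_k, R); starts from one phase-1 infectious
    individual and returns the list of (time, transition type) events.
    """
    rng = np.random.default_rng() if rng is None else rng
    state = np.zeros(k + 2, dtype=int)
    state[0], state[1] = N - 1, 1
    t, events = 0.0, []
    while state[1:k + 1].sum() > 0:
        I_total = state[1:k + 1].sum()
        rates = np.empty(k + 1)
        rates[0] = beta * state[0] * I_total / (N - 1)  # infection
        rates[1:] = k * gamma * state[1:k + 1]          # phase change / recovery
        total = rates.sum()
        t += rng.exponential(1 / total)
        j = rng.choice(k + 1, p=rates / total)
        if j == 0:                      # infection: S -> I_1
            state[0] -= 1
            state[1] += 1
        else:                           # phase j -> j+1 (recovery when j = k)
            state[j] -= 1
            state[j + 1] += 1
        events.append((t, j))
    return events
```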

5.2.2 Importance sampling for SI(k)R outbreaks

Suppose we have observations from M outbreaks, y^{1:M}. As these outbreaks occur independently we have a likelihood function of the form

\[
  p\left(y^{1:M} \mid \theta\right) = \prod_{i=1}^{M} p\left(y^{i} \mid \theta\right).
\]

Table 5.1: Transitions and rates for the SI(k)R model. Only compartments that change are shown; all other compartments remain the same.

Transition Type        State Change                                   Rate
Infection              (S, I_1) → (S − 1, I_1 + 1)                    βS ∑_{j=1}^{k} I_j / (N − 1)
Phase change type j    (I_j, I_{j+1}) → (I_j − 1, I_{j+1} + 1)        kγ I_j
Recovery               (I_k, R) → (I_k − 1, R + 1)                    kγ I_k

Hence, the likelihood can be estimated by multiplying estimates of the likelihood for each outbreak, p(y^i|θ). The sequential importance resampling scheme allows us to calculate each of these iteratively, so we are left to describe how to generate realisations over the day from an initial state, x_{t-1}, in a way that ensures consistency with the observations, and how to evaluate the importance sampling weights. We do this as per the method of [74].

Suppose for each outbreak we have a data set y = (y_1, …, y_T), where y_t gives the cumulative number of infection events over (t − 1, t] for t = 1, …, T; note that this does not include the initial infectious individual in the population. To estimate p(y_t|y_{1:t-1}), we begin by uniformly generating y_t observation times over (t − 1, t]. These times are ordered, so the joint density is that of the y_t order statistics of Uniform(t − 1, t) random variables, so we initialise the importance weights as w = 1/y_t!. Then, beginning from time τ = t − 1 in state

\[
  \tilde{x}_\tau = \left(S(\tau),\, I_1(\tau),\, \dots,\, R(\tau)\right),
\]
determined by the final state of the previous iteration, we generate events between observation times. However, we only allow phase change or recovery transitions to occur, as the observations (and hence infections) have already been generated. Let a be a vector of rates,

\[
  a = \left( \frac{\beta S(\tau) \sum_{j=1}^{k} I_j(\tau)}{N - 1},\; k\gamma I_1(\tau),\; \dots,\; k\gamma I_k(\tau) \right),
\]
as per Table 5.1, associated with state x̃_τ. We consider a process with modified rates b, given by

\[
  b_1 = 0, \qquad b_{2:k} = a_{2:k}, \qquad b_{k+1} = 1_{\{\sum_j I_j > 1\}}\, a_{k+1};
\]
where 1_{A} is an indicator function which takes the value 1 if logical statement A holds and 0 otherwise. The modified rates are constructed so that no further observations

(infections) can occur, as b_1 = 0, and no recoveries can occur if they would lead to epidemic fade-out. Let τ′ denote the time of the next observation (or, if there are no further observations over the day, let τ′ = t). We sample an Exponential(∑_j b_j) candidate time increment, Δτ. Then one of three kinds of updates occurs:

(i) if τ + Δτ < τ′ we generate an event at time τ + Δτ and let the event be of type i with probability b_i/∑_j b_j; the importance weight is updated to

\[
  w \leftarrow w \times \frac{a_i\, e^{-\sum_j a_j \Delta\tau}}{b_i\, e^{-\sum_j b_j \Delta\tau}},
\]
we update the time τ ← τ + Δτ, and we update the state x̃_τ according to transition type i;

(ii) if τ + Δτ > τ′ and τ′ ≠ t, no event occurs in (τ, τ′) and the next event is an observation, so we update the weights to

\[
  w \leftarrow w \times \frac{a_1\, e^{-\sum_j a_j (\tau' - \tau)}}{e^{-\sum_j b_j (\tau' - \tau)}},
\]
we update the time τ ← τ′, and update the state x̃_τ according to an infection transition; or,

(iii) if τ + Δτ > τ′ and τ′ = t, then no event occurs before the end of the day and the weights are updated to

\[
  w \leftarrow w \times \frac{e^{-\sum_j a_j (\tau' - \tau)}}{e^{-\sum_j b_j (\tau' - \tau)}}.
\]
We repeatedly recalculate the rates and make updates until an update of type (iii) occurs, at which time the process ends for this time increment. Once all particles have gone through this process, the weights are averaged to form an approximation of p(y_t|y_{1:t-1}) and the states are resampled, using systematic resampling [89], according to the normalised weights. These samples provide initial states for calculating the next likelihood increment. Once the last observation in the time series has been generated, we generate from a modified process where b = (0, a_{2:k+1}) and update weights according to (i) until the epidemic dies out; this allows us to calculate the portion of the likelihood associated with the assumption that the epidemic died out after our last observation. A condensed sketch of the within-day proposal is given below.
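Pulling steps (i)-(iii) together, the following sketch evolves a single particle over one day; resampling across particles, multi-day bookkeeping and the end-of-data phase with b = (0, a_{2:k+1}) are omitted, and all names are ours:

```python
import numpy as np

def sikr_rates(state, N, k, beta, gamma):
    """Rate vector (infection, k phase-change/recovery rates) per Table 5.1."""
    S, I = state[0], state[1:k + 1]
    return np.concatenate(([beta * S * I.sum() / (N - 1)], k * gamma * I))

def apply_transition(state, i):
    """Apply transition i: 0 = infection (S -> I_1); i >= 1 moves phase i on."""
    if i == 0:
        state[0] -= 1
        state[1] += 1
    else:
        state[i] -= 1
        state[i + 1] += 1

def propose_day(state, y_t, t, N, k, beta, gamma, rng):
    """Propose a particle path over (t-1, t] matching y_t observed infections;
    mutates state in place and returns the log importance weight."""
    obs = list(np.sort(rng.uniform(t - 1, t, size=y_t))) + [t]
    log_w = -np.sum(np.log(np.arange(1, y_t + 1)))   # ordered times: 1/y_t!
    tau = t - 1
    while obs:
        a = sikr_rates(state, N, k, beta, gamma)
        b = a.copy()
        b[0] = 0.0                                   # infections only at observation times
        if len(obs) > 1 and state[1:k + 1].sum() <= 1:
            b[k] = 0.0                               # block recovery causing premature fade-out
        total_b = b.sum()
        dtau = rng.exponential(1 / total_b) if total_b > 0 else np.inf
        if tau + dtau < obs[0]:                      # (i) an unobserved event occurs
            i = rng.choice(k + 1, p=b / total_b)
            log_w += (np.log(a[i]) - a.sum() * dtau) - (np.log(b[i]) - total_b * dtau)
            tau += dtau
            apply_transition(state, i)
        elif len(obs) > 1:                           # (ii) next event is an observation
            dt = obs[0] - tau
            log_w += np.log(a[0]) - a.sum() * dt + total_b * dt
            tau = obs.pop(0)
            apply_transition(state, 0)
        else:                                        # (iii) reach the end of the day
            log_w += (total_b - a.sum()) * (obs.pop(0) - tau)
    return log_w
```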

5.2.3 Implementation

We simulate 50 temporal and final size data sets of multiple completed outbreaks in households of size 4, with parameters (β, γ) = (0.933, 2/3), under each of the three models. We use 500 particles per likelihood calculation, as this is found to be sufficient for low variance likelihood estimates. Our implementation uses the prior distribution as the importance sampling distribution over the parameter space, q(θ|y); however, a more efficient sampler could be chosen if PM-MCMC is performed before model selection [23]. We begin by sampling 500 points from q(θ|y) and continue to sample in batches of 500 from the parameter space until the 95% credible intervals of the model evidence are non-overlapping. The initial 500 samples are such that the central limit theorem can provide estimates of the precision of the model evidence with small bias. Sampling in batches allows for sample weights to be calculated efficiently in parallel before the precision of the evidence is calculated again. The stopping criterion is chosen so that we can accurately choose the most appropriate model; in practice, other stopping criteria could be used to calculate the model evidence to a given precision. Implementing a stopping criterion in this way ensures that the number of particles used for point estimates of the likelihood only affects the run time of the algorithm, not the precision of the model evidence estimates. We calculate posterior model probabilities using data from 50, 100 and 150 complete household outbreaks. For each of these data sets we consider two cases: where the mean infectious period and the reproduction number are known to a high level of accuracy a priori; and where the prior distribution on model parameters is relatively uninformative. We refer to these as the assumption of tight priors and loose priors, respectively. We set the mean of the tight priors to their true value; we suppose that 1/γ has a gamma distributed prior with mean 3/2 and variance 0.01, and that β/γ has a Uniform(0.933 × 3/2 − 0.03, 0.933 × 3/2 + 0.03) prior. For the loose priors we suppose that 1/γ has a gamma distributed prior with mean 2 and variance 0.75, and we assume that β/γ has a Uniform(1, 2) prior, which covers a typical range of β/γ for influenza. Note that likelihood estimates can have high variance in regions of the parameter space where the rate associated with observed events is large [74]. This issue is avoided as the prior distributions only have support away from these values. In the simulation study the relatively low variance of likelihood estimates allowed the sample variance of the weights to remain low enough for convergence to occur.

5.2.4 Results

Our results are given in terms of box plots of the difference in posterior model probabilities of the true model and the other candidate models in Figures 5.1 and 5.2, and in terms of the proportion of times the correct model was identified (having the highest posterior model probability) in Table 5.2. If the box plots are near one, the posterior model probability for the correct model is nearly one; negative values represent cases in which other candidate models had higher posterior model probabilities. For tight priors, the correct model was most often associated with the highest posterior model probabilities for each of the models considered (Table 5.2). Note that the lowest box in the panels of Figures 5.1 and 5.2 always corresponds to an adjacent model. This fits with the intuition that adjacent models are most often misidentified as each other; that is, models with more similar infectious period distributions are more difficult to discriminate. The SI(2)R model is most often misidentified; this agrees with intuition as this model's infectious period has a shape parameter between the other two (Table 5.2). Results were similar under loose priors (Figure 5.2); however, the box plots tended to have a larger range, showing that there was less certainty in the correct model. There was also very little difference in the proportion of times the correct model was identified under loose and tight priors (Table 5.2). Even if the parameters are not well known, with multiple completed outbreaks (from small populations), infectious case data can be used to distinguish between these models, but the SI(2)R model is the most difficult to identify. Of the 900 runs, 744 estimates had converged in the first 500 samples, another 115 had converged in fewer than 10,000 samples, and all of the samples converged in under 91,000 samples. Slow convergence occurs when two models are nearly equally likely, in which case it may not be relevant whether credible intervals are non-overlapping. This could be avoided by implementing a stopping rule which ensures posterior model probabilities are either non-overlapping or within a specified tolerance. Estimates based on tight priors converged at least as fast as those with loose priors in all but 13 cases. We compare the results with those obtained by only considering the final size data of the same simulated data sets. Here sampling over the parameter space is the same; however, the likelihood function is now calculated exactly as in [87]. For all cases except the SIR model with loose priors and 50 outbreaks, one-sided Wilcoxon signed-rank tests at the 95% level show that the posterior model probabilities of the true model are statistically significantly higher when the full temporal data are used. The proportion of times the correct model was identified from final size data is given in Table 5.2. Interestingly, for the SI(2)R model with loose priors and 50 outbreaks the correct model was identified less often than if we had uniformly randomly guessed between the models. This model also saw the biggest improvement from temporal data; the proportion of times the correct model was identified more than doubled. We find that in all cases, except for the SIR model with loose priors and 100 outbreaks, using the full temporal data increased the proportion of times that the correct model was identified.
These results show that the full temporal data sets are useful for performing model selection, even though they are more computationally intensive to work with. As the runtimes of the algorithm were reasonable, and we chose a number of particles for likelihood estimates to ensure that estimates of the precision of the model evidence had low bias, the number of state particles was not chosen to optimise efficiency. If one had a fixed computational budget and required estimates of the model evidence to be as accurate as possible, it would be important to make the algorithm as efficient as possible. As such, we have included a comparison of the coefficient of variation of model evidence estimates against the number of particles used in likelihood calculations, n, in Figure 5.3.


Figure 5.1: Box plots of the difference in posterior model probability of the true model and the other candidate models for the SI(k)R models with tight priors, based on 50 simulated data sets. For example, in the upper left panel, boxes on the left and right are made using 50 estimates of p(SIR|y) − p(SI(2)R|y) and p(SIR|y) − p(SI(5)R|y), respectively. Rows from top to bottom show results from data sets generated from the SIR, SI(2)R and SI(5)R models. Columns from left to right represent data sets containing 50, 100 and 150 independent outbreaks in households.


Figure 5.2: Box plots of the difference in posterior model probability of the true model and the other candidate models for the SI(k)R models with loose priors, based on 50 simulated data sets. For example, in the upper left panel, boxes on the left and right are made using 50 estimates of p(SIR|y) − p(SI(2)R|y) and p(SIR|y) − p(SI(5)R|y), respectively. Rows from top to bottom show results from data sets generated from the SIR, SI(2)R and SI(5)R models. Columns from left to right represent data sets containing 50, 100 and 150 independent outbreaks in households.

Table 5.2: The proportion of times the model that generated the data corresponded to the highest posterior model probability estimate from 50 data sets. Data sets were generated from each of the SIR, SI(2)R and SI(5)R models with 50, 100, and 150 outbreaks. These data sets were analysed using the full temporal data and the final size data under the assumption of loose or tight priors.

                        Loose priors           Tight priors
Number of outbreaks     50     100    150      50     100    150
Temporal data
  SIR                   0.76   0.72   0.82     0.78   0.86   0.86
  SI(2)R                0.44   0.6    0.62     0.58   0.72   0.8
  SI(5)R                0.68   0.74   0.86     0.8    0.84   0.96
Final size data
  SIR                   0.7    0.76   0.74     0.72   0.84   0.82
  SI(2)R                0.2    0.4    0.52     0.36   0.56   0.74
  SI(5)R                0.6    0.58   0.64     0.62   0.74   0.78

For three data sets from the SI(2)R model we ran the model selection algorithm for a fixed computation time with differing numbers of particles. The sample weights were used to estimate the coefficient of variation as s/(√m p̂(y)), where m is the number of samples from the parameter space, s is the sample standard deviation of the weights, and p̂(y) is the mean of the weights. We find that the smallest coefficient of variation is achieved at n = 125 for one data set and n = 75 for the other two. This indicates that the algorithm would run more efficiently with n around 100. As the run time per sample is linear in n, decreasing n from 500 to 100 would give roughly 5 times the number of samples per unit time.
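This coefficient-of-variation estimate is simple to compute from the stored weights; a one-function sketch:

```python
import numpy as np

def coefficient_of_variation(z):
    """Estimate the coefficient of variation of the evidence estimate from
    importance weights Z_1, ..., Z_m, as s / (sqrt(m) * p_hat(y))."""
    z = np.asarray(z)
    return z.std(ddof=1) / (np.sqrt(len(z)) * z.mean())
```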


Figure 5.3: The coefficient of variation of the model evidence estimate against n, the number of particles used for likelihood calculations. For three SI(2)R data sets we estimate the coefficient of variation after running the model selection algorithm for 24 hours. The coefficient of variation is estimated as s/(√m p̂(y)), where m is the number of samples from the parameter space, s is the sample standard deviation of the weights, Z_1, …, Z_m, and p̂(y) is the mean of the weights.

Figure 5.4: A diagram showing how we relate post-symptomatic (Post), coincidental-symptomatic (Co) and pre-symptomatic (Pre) infections to transitions in the SE(2)I(2)R model.

5.3 Case study II: Inferring symptom onset relative to infectiousness

This study considers inferring the time of infectiousness relative to symptom onset, where infectiousness could either be pre-symptomatic (Pre), coincidental-symptomatic (Co) or post-symptomatic (Post). For example, influenza may have symptoms that largely coincide with infectiousness [90], SARS is largely infectious post-symptom onset [91] and HIV is infectious pre-symptom onset [16]. We consider diseases that have a lag between exposure of individuals and infectiousness, specifically an SE(2)I(2)R model, which is detailed in Section 5.3.1. This kind of model has been used in previous work on inference using early outbreak data; it has realistic features, such as non-exponential exposed and infectious periods, while being simple enough for inference [66,73,92–94]. We model the observations of symptom onset as either a transition into an exposed, infectious or recovered state for the Post, Co and Pre models respectively (Figure 5.4).

5.3.1 SE(2)I(2)R model

For a population of size N, the SE(2)I(2)R model is a compartmental model that allows each individual to be in one of six compartments: they are either susceptible; exposed phase 1 or 2; infectious phase 1 or 2; or recovered. The key differences between this model and the SI(k)R model are that the infectious period is assumed to be Erlang-2 distributed and there is an Erlang-2 distributed lag between being exposed to the disease and being able to spread it. Let S, E_i, I_j and R denote the number of susceptible, exposed phase i, infectious phase j and recovered individuals respectively. Again, as these numbers must always be non-negative and sum to N, we have state space

\[
  \mathcal{S} = \left\{ (S, E_1, E_2, I_1, I_2, R) \in \mathbb{N}^{6} : S + E_1 + E_2 + I_1 + I_2 + R = N \right\}.
\]

There are five kinds of transitions: exposure; exposed phase change; infectious; infectious phase change; and recovery. These transitions are similar to those described for the SI(k)R model and are shown in Table 5.3.

Table 5.3: Transitions and rates for the SE(2)I(2)R model. Only compartments that change are shown; all others remain the same.

Transition Type               State Change                           Rate
(1) Exposure                  (S, E_1) → (S − 1, E_1 + 1)            βS(I_1 + I_2)/(N − 1)
(2) Exposed phase change      (E_1, E_2) → (E_1 − 1, E_2 + 1)        2σE_1
(3) Infectious                (E_2, I_1) → (E_2 − 1, I_1 + 1)        2σE_2
(4) Infectious phase change   (I_1, I_2) → (I_1 − 1, I_2 + 1)        2γI_1
(5) Recovery                  (I_2, R) → (I_2 − 1, R + 1)            2γI_2

We assume that symptom onset corresponds to a transition into either the exposed phase 1, infectious phase 1 or recovered class for the Post, Co and Pre models respectively, as shown in Figure 5.4. Hence, the process has an initial state that is either

\[
  (S, E_1, E_2, I_1, I_2, R) = (N - 1, 1, 0, 0, 0, 0),
\]
for exposure observations,
\[
  (S, E_1, E_2, I_1, I_2, R) = (N - 1, 0, 0, 1, 0, 0),
\]
for infectious observations, or a stochastic initial state for the recovery observations; this last case is discussed in the following section.

5.3.2 Importance sampling for initial state generation

For the SE(2)I(2)R model, if we observe the daily number of recovery transitions, then a population with a single exposed individual initially may be in one of several states at the time of the first observation, as multiple exposures may have occurred before the first recovery. If we see no secondary transmission in the household we can calculate the likelihood exactly as

\[
  p(y \mid \beta, \gamma, \sigma) = \left( \frac{2\gamma}{2\gamma + \beta} \right)^{2}.
\]
If the epidemic does not die out after the initial infection we can efficiently generate weighted initial states of the process via importance sampling.
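The no-secondary-transmission probability above follows from a competing-exponentials argument (our gloss, not spelled out in the text): with one infectious individual and everyone else susceptible, the exposure rate is βS(I_1 + I_2)/(N − 1) = β, while each infectious phase completes at rate 2γ, so no secondary case arises only if both phase transitions occur before any exposure:
\[
  p(y \mid \beta, \gamma, \sigma)
    = \underbrace{\frac{2\gamma}{2\gamma + \beta}}_{\text{phase change before exposure}}
      \times
      \underbrace{\frac{2\gamma}{2\gamma + \beta}}_{\text{recovery before exposure}}
    = \left( \frac{2\gamma}{2\gamma + \beta} \right)^{2}.
\]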

If our data set for the household has daily observation vector y_{1:T} = (y_1, …, y_T), we know that each type of transition can occur at most ψ = 1 + ∑_{t=1}^{T} y_t times and that recovery cannot occur if it would lead to epidemic fade-out; note that the plus 1 is because the initial condition is not included in y_{1:T}. So if the SE(2)I(2)R process has rate vector

\[
  a = \left( \frac{\beta S(\tau)\left(I_1(\tau) + I_2(\tau)\right)}{N - 1},\; 2\sigma E_1(\tau),\; 2\sigma E_2(\tau),\; 2\gamma I_1(\tau),\; 2\gamma I_2(\tau) \right),
\]
as per Table 5.3, we consider a modified process with rates that never lead to inconsistencies with our data. The modified rates are

\[
  b = a \circ \left( 1_{\{z_1 < \psi\}},\; 1_{\{z_2 < \psi\}},\; 1_{\{z_3 < \psi\}},\; 1_{\{z_4 < \psi\}},\; 1_{\{z_4 - z_1 > 1\}} \right);
\]
where '◦' denotes an elementwise product, z_i denotes the cumulative number of individuals that entered the ith compartment, and 1_{·} denotes the indicator function. The modified process is used to generate transitions until an observation occurs; the bias from simulating from this modified process is corrected for by using importance sampling weights. The process begins with particles in state

\[
  (S, E_1, E_2, I_1, I_2, R) = (N - 1, 0, 0, 1, 0, 0),
\]
with importance weight w = 1. Each particle makes transitions according to probabilities b_i/∑_j b_j, and with each transition the particle's weight is updated to

\[
  w \leftarrow w \times \frac{a_i \sum_j b_j}{b_i \sum_j a_j},
\]
where i is the type of the transition that occurred (types are numbered as per Table 5.3). When the first recovery (transition type 5) occurs, the process ends. Once we have a set of particles distributed according to the initial state, they are resampled according to the normalised importance weights to give initial particles for the sequential importance resampling algorithm. A sketch of this jump-chain proposal is given below.
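In the following sketch, the recovery guard encodes our reading of the fade-out condition (block a recovery that would leave no infected individuals); all function names are ours:

```python
import numpy as np

def initial_state_particle(N, psi, beta, sigma, gamma, rng):
    """Jump-chain proposal for the state at the first observed recovery
    (Pre model), returning the state and its importance weight.

    psi caps how often each transition type may occur; the recovery
    guard below is our reading of the fade-out condition in the text.
    """
    # state = (S, E1, E2, I1, I2, R); z counts uses of each transition type
    state = np.array([N - 1, 0, 0, 1, 0, 0])
    z = np.zeros(5, dtype=int)
    w = 1.0
    changes = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]  # (from, to) per Table 5.3
    while True:
        a = np.array([
            beta * state[0] * (state[3] + state[4]) / (N - 1),
            2 * sigma * state[1], 2 * sigma * state[2],
            2 * gamma * state[3], 2 * gamma * state[4],
        ])
        b = a * (z < psi)
        if state[1:5].sum() <= 1:
            b[4] = 0.0             # block a recovery that would end the epidemic
        if b.sum() == 0.0:
            return state, 0.0      # infeasible path: contributes zero weight
        i = rng.choice(5, p=b / b.sum())
        w *= (a[i] * b.sum()) / (b[i] * a.sum())
        frm, to = changes[i]
        state[frm] -= 1
        state[to] += 1
        z[i] += 1
        if i == 4:                 # first recovery observed: stop
            return state, w
```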

5.3.3 Importance sampling for SE(2)I(2)R outbreaks

The importance sampling scheme for the SE(2)I(2)R model is similar to that for the SI(k)R model, except that the modified rates are slightly different and, at times, we need to force certain events to occur to ensure feasibility of samples [74]. Again, we uniformly generate observation times over (t − 1, t] and, from time t − 1, generate events between observations according to a modified process with rate vector b. We set modified rates b_j = a_j for all j that correspond to rates which are not set to 0 (those set to 0 are specified in this section). For the Post, Co and Pre models we set the modified rates b_1 = 0, b_3 = 0 and b_5 = 0, respectively. We also set b_5 to 0 if a recovery would lead to epidemic fade-out without the correct number of observations occurring. For these models we may also need to force events to occur to ensure that the particle generated, x̃_{(t-1,t]}, is a feasible realisation from an SE(2)I(2)R process. For example, for the Co model, if there are no exposed phase 2 individuals in the population and an observation is yet to occur, an exposed phase change would need to occur. If there were also no exposed phase 1 individuals, then an exposure time would need to be generated before the exposed phase change; we refer to these kinds of events, and observation events, as forced events. More generally, if the particle is a realisation generated up to time τ, x̂_{(t-1,τ]}, where the next forced event occurs at time τ′ and would lead to an infeasible realisation, we generate a new forced event which would allow the next forced event to be feasible. The next forced event is chosen by proposing transitions further back in the chain, or forcing an infectious event (transition type 3), until an event with a positive rate is proposed. The order in which to propose events is shown

Figure 5.5: A flow chart showing the order in which transitions are proposed as forced events in order to ensure feasibility of the particle. Starting from the type of the next forced event, propose new forced events according to the arrows in the flow chart.

via a flowchart in Figure 5.5. For other CTMCs this can be done by proposing forced events according to a model-specific logic tree. If we need to generate a forced event of type i, we generate an inter-arrival time, s, according to a TruncExp(a_i, τ′ − τ) distribution, set b_i = 0, and set the next forced event to be of type i at time τ + s, where TruncExp(a, t) refers to the exponential distribution with rate a, truncated to [0, t]. The truncated exponential distribution is chosen as it has appropriate support and generates events with a similar distribution to the true process. The weights are updated according to
\[
  w \leftarrow w \times \frac{a_i\, e^{-a_i s}}{1 - e^{-a_i (\tau' - \tau)}}.
\]
If no more forced events are necessary we continue to propose candidate events from time τ according to the new modified rates; as in Section 5.2.2, we let these occur, move to the next forced event, or move to the end of the time increment as per (i), (ii) and (iii), respectively. Once the cumulative number of events of type i equals the total number of observations we set b_i = 0; as the epidemics are completed within households, the final state must be (N − ψ, 0, 0, 0, 0, ψ), so no event can occur more than ψ times. For more details and a more general description of the importance sampling scheme see [74]. A small sampling sketch is given below.
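A small sketch of the truncated-exponential draw (by CDF inversion) and the corresponding forced-event weight update (names ours):

```python
import numpy as np

def sample_trunc_exp(a, t, rng):
    """Inverse-CDF draw from an Exponential(a) truncated to [0, t]."""
    u = rng.uniform()
    return -np.log(1.0 - u * (1.0 - np.exp(-a * t))) / a

def forced_event_weight(w, a_i, s, window):
    """Weight update for forcing an event of rate a_i at lag s within a
    window of length window = tau' - tau, as in Section 5.3.3."""
    return w * a_i * np.exp(-a_i * s) / (1.0 - np.exp(-a_i * window))
```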

5.3.4 Implementation

We simulate 20 data sets of multiple completed outbreaks in households of size 4, with parameters (β, σ, γ) = (0.933, 0.5, 2/3), under each of the three models, and calculate posterior model probabilities with data from 50, 100 and 150 completed household outbreaks. In this case we used 2000 particles per likelihood calculation; a larger number of particles was chosen because the sampling distribution is less like the true process, due to the possibility of needing to force events. Our implementation uses the prior distribution as the importance sampling distribution over the parameter space, q(θ|y). We first sample 1000 points from the parameter space and continue to sample from the parameter space until the 95% credible intervals of the model evidence are non-overlapping. We consider inference based upon tight and loose priors on all model parameters. We set the mean of the tight priors to their true value; we suppose that 1/γ has a gamma distributed prior with mean 3/2 and variance 0.01, similarly we assume that 1/σ has a gamma distributed prior with mean 2 and variance 0.01, and we assume that β/γ has a Uniform(0.933 × 3/2 − 0.03, 0.933 × 3/2 + 0.03) prior. For the loose priors we suppose that 1/γ and 1/σ have gamma distributed priors with mean 2 and variance 0.75, and we assume that β/γ − 1 has a gamma distributed prior with mean 1 and variance 0.75.

5.3.5 Results

Our results are given in terms of box plots of the difference in posterior model probabilities of the true model and other candidate models in Figures 5.6 and 5.7, and in terms of the proportion of times the correct model was identified in Table 5.4. The Pre model is the most difficult model to identify with data on 50 outbreaks. With data on 150 outbreaks the correct model was selected every time except for data generated from the Co model. The boxes in Figure 5.6 are all situated near 1, indicating that with tight priors, when only 50 households are infected, we are usually certain of which model is the true model. By the time 150 households are infected, effectively all posterior model probabilities are close to 1, so we are almost always certain of which model is the true model. The boxes in Figure 5.7 are situated lower than those in Figure 5.6, indicating that loose priors reduce the certainty of the correct model. However, each model is easily identifiable whether or not the priors are informative, and by the time 150 households have had completed outbreaks the correct model is identified in almost all cases. Figure 5.7 also shows that even with loose priors the posterior model probabilities for the correct model are almost always near 1 for the Post and Pre models once 150 households are infected. It also shows that the Co model tended to be chosen with less certainty when priors were loose, with one outlier choosing an incorrect model with posterior model probability near 1. Of the 360 runs, 336 converged in the first 1000 iterations and all 360 runs converged within 5000 iterations; the fast convergence is likely due to the easy identifiability of the three models.

Table 5.4: The proportion of times the model that generated the data corresponded to the highest posterior model probability estimate from 20 data sets. Data sets were generated from each of the Post, Co and Pre models with 50, 100, and 150 outbreaks. These data sets were analysed under the assumption of loose or tight priors.

                        Loose priors           Tight priors
Number of outbreaks     50     100    150      50     100    150
Post                    0.95   1      1        0.9    1      1
Co                      0.85   0.8    0.9      0.8    0.85   0.95
Pre                     0.7    0.8    1        0.75   0.8    1


Figure 5.6: Box plots of the difference in posterior model probability of the true model and the other candidate models for the SE(2)I(2)R observation models with tight priors, based on 20 simulated data sets. For example, in the upper left panel, boxes on the left and right are made using 20 estimates of p(Post|y) − p(Co|y) and p(Post|y) − p(Pre|y), respectively. Rows from top to bottom show results from data sets generated from the Post, Co and Pre models. Columns from left to right represent data sets containing 50, 100 and 150 independent outbreaks in households.


Figure 5.7: Box plots of the difference in posterior model probability of the true model and the other candidate models for the SE(2)I(2)R observation models with loose priors, based on 20 simulated data sets. For example, in the upper left panel, the boxes on the left and right are made using 20 estimates of p(Post|y)−p(Co|y) and p(Post|y)−p(Pre|y) respectively. Rows from top to bottom show results from data sets generated from the Post, Co and Pre models. Columns from left to right represent data sets containing 50, 100 and 150 independent outbreaks in households.

5.4 Discussion

This chapter has introduced an exact method for Bayesian model selection which uses importance sampling both to estimate the likelihood function and to estimate the evidence. The novelty of this method is the use of an efficient importance sampling scheme, ideal for partially-observed state space models, to estimate the likelihood function [74]. Implementation of this scheme has some overhead compared to Doob-Gillespie simulations; however, by construction all simulations agree with the observed data, hence all strictly contribute to the likelihood estimate. This is in sharp contrast to rejection-sampling approaches which do not force samples to fit with observations, such as approximate Bayesian computation [95] or the alive particle filter [25, 96]. Hence, it is able to greatly improve computational efficiency, particularly in estimating the tails of the likelihood. This is also an exact method, in that it computes unbiased estimates which converge to the target as the number of iterations increases. In contrast, other methods may not be guaranteed to converge; for example, approximate Bayesian computation accepts samples that do not perfectly agree with the data. Further, some methods have an exact implementation but in practice require approximations for computational feasibility; for example, the alive particle filter [25, 96] can be implemented as an exact method but may require too many simulations to be practical, so the likelihood function is set to 0 in the tails instead.

If it is possible to evaluate conditional densities, a different approach is to directly calculate the evidence after use of a Gibbs sampler [97, 98]. This is problematic in that the Gibbs sampler for partially-observed CTMCs will typically require data augmentation [6, 17], making mixing slow and convergence problematic when the amount of missing data to be imputed becomes large [6, 65, 84]. Further, the conditional densities are usually only analytically tractable if priors are conjugate. In [23] data-augmentation is used to estimate the evidence, but this required a time discretisation of the model so that the forward-filtering backward-sampling algorithm could be applied to sample from the latent process. If the model is not discretised and it is not possible to sample exactly from the latent process, data-augmentation may still be applied. However, each point estimate of the likelihood function then requires imputing latent variables until the process reaches stationarity. Imputation is undesirable as convergence can be slow for models with high dimensional latent processes and likelihood estimates may still have a high variance. Our method circumvents this difficult imputation by instead marginalising over the latent variables in estimating the likelihood. A weakness of our approach is that we are currently restricted to Markovian models where a single event type is observed, whereas data-augmented approaches are much more flexible. Current research is looking at how our method can be extended to situations where multiple types, and combinations, of events are observed.

Other methods for Bayesian model selection include reversible-jump MCMC (RJ-MCMC) [38], nested sampling [99, 100] and sequential Monte Carlo squared (SMC-squared) [24–26]. RJ-MCMC is an algorithm in which the model is considered as one of the parameters to be sampled [38]. One issue with such an approach is that if models are non-nested it can be difficult to make sensible proposals between the different parameter spaces, particularly if data-augmentation techniques are used to address unobserved transitions [23]. Hence, at times, proposals are accepted with low probability and the mixing of the algorithm can be prohibitively slow. Nested sampling is an approach to model selection which does not require MCMC samples; however, it requires a tractable likelihood [99, 100], which most dynamical mechanistic models lack. SMC-squared is an alternative that jointly infers model parameters and the posterior model probabilities by using particle filters in both the parameter space and the state space [24–26]. As noted by [25], the estimates that allow for model selection may have large Monte Carlo error, but use of importance sampling after SMC-squared can be an effective way of reducing this error.

During the revision process we became aware, through a personal communication, of a similar method for Bayesian model selection for partially-observed continuous-time Markov chains using importance sampling [101]. That paper uses an alive particle filter to estimate the likelihood function; in practice, the alive particle filter introduces bias into likelihood estimates in the tails [25, 101]. The particle filter used in this chapter does not introduce any bias into the likelihood function. Further, we consider different examples from epidemiology, which are important for understanding emerging infectious diseases.
Other than in [101] and this chapter, the method of importance sampling for estimating the evidence has typically only been considered for cases where the likelihood function is known [72, 80], for inference on continuous-state models [80], where data augmentation is used to estimate the likelihood function [23, 80], or in conjunction with another method which is not suitable for processes with highly variable observations [25]. This sort of implementation is inefficient for models with high dimensional parameter spaces. One way to overcome this is to use particle-marginal MCMC steps to inform a sampling distribution over the parameter space prior to model selection [23].

In this chapter we described the general method but made no attempt to optimise the algorithm. We found that, in cases where posterior model probabilities were close, it could take many samples before evidence estimates had non-overlapping credible intervals. Performance could be improved by implementing a stopping criterion where sampling occurs until evidence estimates either are non-overlapping or are within a specified tolerance. For the simulation study we chose a large number of state particles, n, per likelihood calculation to avoid needing a large number of samples from the parameter space. The number of particles was chosen for reasonable performance rather than optimal performance. There is potential to improve efficiency by informing the number of particles based on optimality criteria, such as minimising the coefficient of variation of evidence estimates after a fixed amount of computation (a sketch of this heuristic is given at the end of this discussion). We estimated the coefficient of variation after running the model selection algorithm for a fixed computation time on three SI(2)R data sets with differing numbers of particles. We found that the coefficient of variation was minimised at n = 125 in one case and at n = 75 in the other two. If we had chosen n = 100, say, the runtime of the algorithm would be reduced. As the run time per weight calculation is linear in n, this would represent a five-fold increase in the number of samples per unit time, though the variance of weight estimates would be larger. We also note that the models considered in this chapter are likely to give rise to household outbreak data sets that are equivalent, so computational gains could be made by re-using likelihood estimates for these data sets, as in [73]. However, these steps were not considered in this chapter as the computational benefit is model specific and decreases for slightly more complicated examples, such as a model with heterogeneous household sizes; in contrast, the current implementation is versatile enough that it will readily run on models with heterogeneous household sizes.

A number of other approaches to model selection exist. Competing models are commonly discriminated using information criteria based on maximum likelihood estimates, such as AIC [34], AICc [102], BIC [35] or DIC [103]; note that none of these account for prior information about model parameters, and they depend on asymptotic results or on distributions being approximately Gaussian. Further, interpretation of these quantities is non-intuitive, other than that they represent a function of maximum likelihood values penalised for each parameter in the model, and a comparison of Bayesian model selection and DIC found DIC to be unreliable [104].
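The particle-number heuristic mentioned above could be implemented along the following lines. This is a sketch under the assumption of a hypothetical wrapper run_once(n, budget_sec) that runs the model selection algorithm with n state particles for a fixed wall-clock budget and returns one evidence estimate.

import numpy as np

def pick_num_particles(run_once, candidates=(50, 75, 100, 125),
                       budget_sec=60.0, reps=20):
    # Choose the number of state particles that minimises the coefficient
    # of variation of evidence estimates for a fixed computation budget.
    cv = {}
    for n in candidates:
        est = np.array([run_once(n, budget_sec) for _ in range(reps)])
        cv[n] = est.std(ddof=1) / est.mean()
    return min(cv, key=cv.get), cv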
Our method was applied to two case studies: inferring the shape of the infectious period distribution of an SI(k)R model, and inferring the time of symptom onset relative to infectiousness for an SE(2)I(2)R model. In each of the studies we considered daily symptom onset data from completed outbreaks in multiple households. The first study showed that the data were sufficiently informative to select the correct model much more often than by randomly guessing. The study also showed that temporal data were better able to choose an appropriate model than final size data alone, although at a higher computational cost. For example, for data generated from 150 outbreaks from the SI(5)R model with loose priors, the proportion of times the correct model was identified was 0.64 from final size data compared to 0.86 from the full temporal data. Although using the full temporal data is more computationally expensive, when run on multiple CPUs the runtime is divided by the number of CPUs used, indicating that the full data sets should be used if the computational facilities are available. The SI(2)R model was the most difficult model to select correctly; intuitively this seems to be because its infectious period distribution has a variance between those of the other two models. It is worth noting that the posterior model probabilities were often near 1 when the correct model was chosen, and most often multiple models had reasonable support when the incorrect model was chosen. This shows that when there was insufficient information to identify the correct model, the correct model was often still given some posterior support. We find that this method does well even with data sets from 50 outbreaks in households.

In the second case study we found that the data were highly informative for determining the time of symptom onset relative to infectiousness. In the case of the post-symptomatic infection model we selected the correct model every time once 100 households were infected, and for the pre-symptomatic infection model we selected the correct model every time once 150 households were infected. In all cases we generally saw an increased ability to select the correct model as more data were obtained. With data on 50 outbreaks and loose priors the Pre model had the lowest proportion of correct selections at 0.7. With data on 150 outbreaks and loose priors the Co model had the lowest proportion of correct selections at 0.9. This shows that symptom onset data from multiple outbreaks are highly informative for choosing the time of symptom onset relative to infectiousness. The results of both studies show that these kinds of data sets are sufficient for discriminating between these models in the early stages of an outbreak of a novel disease, which has important implications for informing public health response [14–16].

Chapter 6

Joint Inference and Model Selection

We present a sequential Bayesian joint inference and model selection (JIMS) algorithm for partially-observed Markov chains. This is an adaptation of a standard SMC-squared algorithm (as described in Section 2.3.6) in which the standard particle-marginal MCMC rejuvenation is replaced by a model selection algorithm based on importance sampling. This form of rejuvenation allows both the parameter particles and the model evidence estimate to be rejuvenated. Further, it provides a stopping criterion for the rejuvenation step, which ensures the accuracy of estimates, and it easily allows the sampling procedure to be adapted, which reduces the number of iterations needed to meet the stopping criterion. We apply the JIMS algorithm and SMC-squared to simulated data sets from an SIR model and an SE(2)I(2)R model and find that the JIMS algorithm outperforms a standard SMC-squared algorithm in terms of the accuracy of model evidence, inferred parameters and run time. We then apply the JIMS algorithm to infer model parameters and the time of symptom onset relative to infectiousness from multiple-outbreak data generated from an SE(2)I(2)R model.

6.1 JIMS and SMC-squared

Jointly inferring the evidence of a model, p(y1:t), and the posterior distribution, p(θ|y1:t), is a difficult inference problem. For partially-observed models with intractable likelihoods the only existing algorithm that does both is SMC-squared [24, 25]. Unfortunately, the evidence estimates from the SMC-squared algorithm have no guaranteed error bounds, and error accumulates as the algorithm progresses. We present a new algorithm that switches between a particle filter for parameter inference and an importance sampling scheme for estimating model evidence. Switching between these two steps allows both model selection and parameter inference to be more accurate. It allows parameter inference to inform an appropriate importance sampling distribution for model selection; this is because the model evidence estimates converge fastest when the importance sampling density is a good approximation to p(θ|y1:t). The model selection component of the algorithm recalculates the model evidence to a given accuracy and provides weighted samples from the posterior distribution of interest; these provide a set of rejuvenated particles for the next iteration of the algorithm.

Both JIMS and SMC-squared use a particle filter to sequentially update particles in an inference step. Then, if too many samples from the parameter space have low posterior support (according to the effective sample size criterion described in Section 2.3.4), they swap to a resampling and rejuvenation step. In standard SMC-squared the rejuvenation step occurs according to PM-MCMC [60, 79]; each particle is shifted according to a Metropolis-Hastings step in which an unbiased estimate is used in place of the likelihood function. For the JIMS algorithm the rejuvenation step is made according to a model selection algorithm, as presented in Chapter 5, which uses importance sampling to estimate the model evidence and resamples to rejuvenate parameters [39]. Here, the importance sampling distribution is an approximation to the posterior distribution at the current time point, which allows for efficient sampling of the parameter space. This efficient, independent sampling allows for implementation of a stopping criterion for the rejuvenation, whereas there is no clear, sensible stopping criterion for PM-MCMC rejuvenation (this is discussed further in Section 6.1.3).

6.1.1 Inference step

The inference step for the JIMS algorithm is the same as for SMC-squared, except that likelihood estimates are not needed; they are only used in SMC-squared for PM-MCMC rejuvenation. Here, for simplicity, we assume that resampling of state particles is performed during the particle filter step, so state particles associated with a parameter particle have equal weight. In practice, this could be generalised to allow state particles to have unequal weights, which can reduce the variance of parameter particle weights.

Algorithm 7 shows the inference step at iteration t for the JIMS algorithm. It assumes inputs of the observation, $y_t$, the number of parameter particles, $N_\theta$, the number of state particles, $N_x$, the ESS cutoff for rejuvenation, $\zeta N_\theta$, the evidence estimate from the previous step, $\hat{p}(y_{1:t-1})$, and particles from the previous step, $\left(\theta^i, \left[\{x_{t-1}\}_{j=1}^{N_x}\right]^i, W^i\right)$ for $i = 1,\dots,N_\theta$. In addition, for SMC-squared, likelihood estimates from the previous step, $\hat{p}(y_{1:t-1}|\theta^i)$, are required. The function `particlefilter' in Algorithm 7 is assumed to be a particle filter that takes a parameter value, $\theta^i$, equally weighted state particles, $\{x_s\}_{j=1}^{N_x}$, and observations, $y_{s:t}$, and returns an unbiased estimate of the likelihood increment, $\hat{p}(y_{s:t}|y_{1:s-1}, \theta^i)$, and updated, equally weighted, state particles, $\{x_t\}_{j=1}^{N_x}$. Our implementation uses the sequential importance resampling particle filter, where likelihood increments are estimated by the same method as described in Chapter 5.

Update parameter weights:
for $i = 1 : N_\theta$ do
    $\left[\{x_t\}_{j=1}^{N_x}\right]^i, \hat{p}(y_t|y_{1:t-1}, \theta^i) = \mathrm{particlefilter}\left(\theta^i, \left[\{x_{t-1}\}_{j=1}^{N_x}\right]^i, y_t\right)$;
    Likelihood estimate (SMC-squared only): $\hat{p}(y_{1:t}|\theta^i) = \hat{p}(y_{1:t-1}|\theta^i)\,\hat{p}(y_t|y_{1:t-1}, \theta^i)$;
    Weights: $w^i = \hat{p}(y_t|y_{1:t-1}, \theta^i)\,W^i$;
end

Update model evidence:
Normalised weights: $W^{1:N_\theta} = w^{1:N_\theta} / \sum_{i=1}^{N_\theta} w^i$;
Effective sample size: $\mathrm{ESS} = 1 / \sum_{i=1}^{N_\theta} (W^i)^2$;
Model evidence: $\hat{p}(y_{1:t}) = \left(\sum_{i=1}^{N_\theta} w^i\right) \hat{p}(y_{1:t-1})$;

Algorithm 7: Inference step of the JIMS and SMC-squared algorithms
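A minimal sketch of one iteration of this inference step follows, assuming a particlefilter function with the interface described above; Python is used purely for illustration and is not the thesis's implementation.

import numpy as np

def inference_step(theta, states, W, log_Z_prev, yt, particlefilter):
    # One iteration of Algorithm 7: propagate each parameter particle's
    # state particles through the observation yt and reweight.
    n = len(theta)
    incr = np.empty(n)
    for i in range(n):
        states[i], incr[i] = particlefilter(theta[i], states[i], yt)
    w = incr * W                       # unnormalised parameter weights
    Z_incr = w.sum()                   # evidence increment, sum_i w^i
    W_new = w / Z_incr                 # normalised weights
    ess = 1.0 / np.sum(W_new ** 2)     # effective sample size
    log_Z = log_Z_prev + np.log(Z_incr)
    return states, W_new, log_Z, ess

Here the evidence is accumulated on the log scale, which is equivalent to the update in Algorithm 7 but avoids numerical underflow for long time series.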

6.1.2 Importance sampling densities

If ESS < ζNθ at the `Update model evidence' section of the inference step (Algorithm 7), rejuvenation steps are required to avoid particle degeneracy. Before the rejuvenation step is performed, an appropriate importance sampling density must be chosen. The JIMS algorithm uses independent samples over the parameter space to rejuvenate particles and estimate the model evidence, so the sampling density is not of the form $q_t(\cdot|\theta)$, as is the case for standard SMC-squared. Hence, it may be the case that JIMS scales worse for high dimensional inference problems. However, for many epidemiological models of interest the dimension of the parameter space is relatively low, so this may not be an issue.

Note that the variance of estimates of the model evidence is minimised when the sampling density, $q_t(\cdot)$, is the posterior distribution [23]. Hence, we choose a sampling density based on an approximation of the posterior distribution fitted to the current set of particles. As the posterior distribution of interest is typically non-Gaussian, we approximate it by a Gaussian mixture. This approximation to the posterior is evaluated by resampling particles according to their weights and using these in an inbuilt MATLAB routine, fitgmdist, for fitting Gaussian mixtures via the EM algorithm [105]. The number of Gaussian distributions in the mixture, k, is decided by fitting mixtures for k = 1, ..., 4 and choosing the mixture that corresponds to the minimum AIC value. Defence mixtures have been found to improve efficiency by guarding against sampling distributions with too little variance [23, 106]; these are distributions that are a mixture of an estimate of the target distribution and a distribution with heavier tails. We choose a sampling distribution which is a defence mixture built from the Gaussian mixture approximations to the posterior distribution. Suppose at iteration t the Gaussian mixture fitted to the posterior distribution is denoted $\hat{p}(\theta|y_{1:t})$; we choose a sampling density of the form
\[
q_t(\theta) = p_0\,p(\theta) + \sum_{s=1}^{t} p_s\,\hat{p}(\theta|y_{1:s}), \tag{6.1}
\]
where $\{p_0, \dots, p_t\}$ is a probability vector which is increasing, that is, $p_s < p_u$ for $s < u$. Rather than only sampling from the current approximate posterior distribution, this mixture allows one to sample from a mixture of all of the approximate posterior distributions and the prior. These are weighted such that the most recent approximation to the posterior distribution is sampled from most often and the prior is sampled from least often. One choice of defence mixture is a mixture of an approximation to the posterior distribution and the prior [23]. The rationale behind the distribution in Equation (6.1) is that its tails should be heavier than the posterior distribution but lighter than the prior, so the sampling density will be suitable even if the posterior has much smaller support than the prior. Samples from this kind of defence mixture are displayed in Figure 6.1. The current set of equally weighted particles is represented by blue dots, and samples from a sampling distribution of the form given in Equation (6.1) are represented by red dots. Most of the samples from the importance sampling distribution cover the same region as the particles, but some samples lie further in the tails, as desired.


Figure 6.1: Blue dots are a sample from the posterior distribution inferred via a particle filter for a simulated data set from an SEIR model. Red dots represent sampling from a Gaussian mixture approximation based on the posterior distribution samples, mixed with the prior distribution. Here {p0, . . . , pt} is based on a truncated geometric distribution with parameter 0.2 which is reversed so that the distribution is increasing.
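This construction can be sketched as follows. The thesis implementation uses MATLAB's fitgmdist; here scikit-learn's GaussianMixture is used as a stand-in, and sample_prior (draws from p(θ)) and gmm_history (the list of mixture fits approximating p(θ|y_{1:s}) for s = 1, ..., t) are assumed to be supplied.

import numpy as np
from sklearn.mixture import GaussianMixture  # stand-in for MATLAB's fitgmdist

def fit_posterior_gmm(particles, max_k=4):
    # Fit mixtures with k = 1..4 components and keep the lowest-AIC fit,
    # as in Section 6.1.2.
    fits = [GaussianMixture(n_components=k).fit(particles)
            for k in range(1, max_k + 1)]
    return min(fits, key=lambda g: g.aic(particles))

def mixture_weights(t, geom_p=0.4):
    # Reversed truncated geometric weights p_s = c (1 - geom_p)^(t - s).
    w = (1.0 - geom_p) ** (t - np.arange(t + 1))
    return w / w.sum()

def sample_defence_mixture(gmm_history, sample_prior, t, n):
    # Draw n samples from q_t in Equation (6.1): component 0 is the prior,
    # component s >= 1 is the approximation p_hat(theta | y_{1:s}).
    counts = np.random.multinomial(n, mixture_weights(t))
    draws = []
    if counts[0] > 0:
        draws.append(sample_prior(counts[0]))
    for s in range(1, t + 1):
        if counts[s] > 0:
            draws.append(gmm_history[s - 1].sample(counts[s])[0])
    return np.vstack(draws)

Evaluating the density q_t(θ) for the importance weights would combine the prior density with the fitted mixture densities using the same weights.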

6.1.3 Model selection step

Once a sampling density has been evaluated, the rejuvenation step may be performed. This is essentially the model selection algorithm from Chapter 5; however, the importance sampling distribution is chosen as described in the previous section and may be adapted. That is, if the stopping criterion is not met after many iterations, the sampling distribution can be changed (or adapted) to reduce the number of samples required. The rejuvenation step is given in Algorithm 8, and the rejuvenation for SMC-squared is given in Algorithm 9 for comparison. Algorithms 7 and 8 are combined to give the JIMS algorithm in full in Algorithm 10. The JIMS algorithm allows the ESS to be evaluated and error bounds on the model evidence to be obtained, each of which allows a sensible stopping criterion to be chosen. By comparison, the ESS for standard SMC-squared is unknown and the model evidence has unknown error, so a sensible choice for the number of steps of rejuvenation, R, is unclear. It has been proposed that R could be chosen via a method which has been applied in an MCMC scheme [25], where
\[
R = \frac{\log(c)}{\log(1 - p_{\mathrm{acc}})},
\]
where $p_{\mathrm{acc}}$ is an estimate of the acceptance probability from the previous rejuvenation step and c is a fixed, chosen probability that a resampled particle does not get moved [107]. However, this rule is only appropriate if the initial value of R is sufficiently large that the acceptance probability estimate is accurate and the acceptance probability does not change substantially between consecutive rejuvenation steps. Unfortunately, as a particle filter is used to estimate the likelihood in SMC-squared, for a fixed number of state particles the acceptance probability will decrease between rejuvenation steps (as the variance in likelihood estimates increases) [25]. If the number of state particles is increased between rejuvenation steps the acceptance probability will still be affected, though it is unclear whether this will increase or decrease the runtime.
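As a worked example of this rule (with values chosen purely for illustration, not taken from the experiments in this chapter):
\[
p_{\mathrm{acc}} = 0.2,\quad c = 0.01 \quad\Rightarrow\quad R = \frac{\log(0.01)}{\log(0.8)} \approx \frac{-4.61}{-0.223} \approx 20.6,
\]
so roughly R = 21 Metropolis-Hastings steps per particle would be needed for each resampled particle to move at least once with probability 0.99.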

$R = 0$;
while some condition holds do
    $R = R + 1$;
    sample $\theta^R$ from density $q_t(\cdot)$;
    $\left[\{x_t\}_{j=1}^{N_x}\right]^R, \hat{p}\left(y_{1:t}|\theta^R\right) = \mathrm{particlefilter}\left(\theta^R, \left[\{x_0\}_{j=1}^{N_x}\right]^R, y_{1:t}\right)$;
    $W^R = \dfrac{\hat{p}(y_{1:t}|\theta^R)\,p(\theta^R)}{q_t(\theta^R)}$;
    $\hat{p}(y_{1:t}) = \frac{1}{R}\sum_{i=1}^{R} W^i$;
    $\mathrm{ESS} = 1 / \sum_{i=1}^{R} (W^i)^2$;
end

Algorithm 8: Rejuvenation step of the JIMS algorithm

Resample:
Use systematic resampling to sample $N_\theta$ indices, $a_1, \dots, a_{N_\theta}$, from $1, \dots, N_\theta$ with weights $W^1, \dots, W^{N_\theta}$;
Set new particles according to indices $a_1 : a_{N_\theta}$, that is, set
$\left(\theta^{1:N_\theta}, \left[\{x_t\}_{j=1}^{N_x}\right]^{1:N_\theta}, W^{1:N_\theta}, \{\hat{p}(y_{1:t}|\theta^i)\}_{i=1}^{N_\theta}\right) = \left(\theta^{a_1:a_{N_\theta}}, \left[\{x_t\}_{j=1}^{N_x}\right]^{a_1:a_{N_\theta}}, 1/N_\theta, \{\hat{p}(y_{1:t}|\theta^{a_i})\}_{i=1}^{N_\theta}\right)$;

Run PM-MCMC rejuvenation:
for $i = 1, \dots, N_\theta$ do
    for $r = 1, \dots, R$ do
        sample $\tilde{\theta}$ from density $q_t(\cdot|\theta^i)$;
        $\left[\{\tilde{x}_t\}_{j=1}^{N_x}\right]^i, \hat{p}\left(y_{1:t}|\tilde{\theta}\right) = \mathrm{particlefilter}\left(\tilde{\theta}, \left[\{x_0\}_{j=1}^{N_x}\right]^i, y_{1:t}\right)$;
        Sample $u \sim \mathrm{Uniform}(0, 1)$;
        if $u < \dfrac{\hat{p}(y_{1:t}|\tilde{\theta})\,p(\tilde{\theta})\,q_t(\theta^i|\tilde{\theta})}{\hat{p}(y_{1:t}|\theta^i)\,p(\theta^i)\,q_t(\tilde{\theta}|\theta^i)}$ then
            $\theta^i = \tilde{\theta}$;
            $\hat{p}(y_{1:t}|\theta^i) = \hat{p}\left(y_{1:t}|\tilde{\theta}\right)$;
            $\left[\{x_t\}_{j=1}^{N_x}\right]^i = \left[\{\tilde{x}_t\}_{j=1}^{N_x}\right]^i$;
        end
    end
end

Algorithm 9: Rejuvenation step of the SMC-squared algorithm

The stopping criterion for rejuvenation in the JIMS algorithm has an impact on the runtime of the algorithm and the accuracy of estimates. In Chapter 5 we sampled until credible intervals of evidence estimates were non-overlapping. In practice, this stopping criterion is difficult to implement for the JIMS algorithm; it requires that the algorithm run on all models at once and that rejuvenation be performed at the same iterations for every model, which could lead to unnecessary rejuvenation steps for some models. One sensible stopping criterion is to sample until the ESS reaches a given threshold value, say, $N_\theta$. Another sensible choice is to sample until the credible interval, $[L, U]$, for the model evidence estimate, $\hat{p}(y_{1:t})$, is of a given width. As specifying the width of the interval directly can be difficult, we instead specify that the width of the credible interval be less than a proportion, $\rho$, of the model evidence estimate. That is,

\[
U - L \leq \rho\,\hat{p}(y_{1:t}),
\]
for $\rho > 0$. As $\hat{p}(y_{1:t})$ is a probability, this rule is only sensible for $\rho < 1/\hat{p}(y_{1:t})$.

Note that some of the stopping criteria for Algorithm 8 lead to a stochastic number of parameter particles, R, whereas standard SMC-squared rejuvenation always returns a set of $N_\theta$ particles. If a constant number of particles is desired for the JIMS algorithm (to constrain memory requirements, for example), a post-rejuvenation resampling step may be performed. That is, after rejuvenation one can use systematic resampling to sample $N_\theta$ indices, $a_1, \dots, a_{N_\theta}$, from $1, \dots, R$ with weights $W^1, \dots, W^R$. For $i = 1, \dots, N_\theta$ the new particles are given by $\left(\theta^i, \left[\{x_t\}_{j=1}^{N_x}\right]^i, W^i\right) = \left(\theta^{a_i}, \left[\{x_t\}_{j=1}^{N_x}\right]^{a_i}, 1/N_\theta\right)$.
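Systematic resampling itself is standard; a minimal Python sketch (illustrative only, not the thesis code) is:

import numpy as np

def systematic_resample(weights, n_out):
    # One uniform draw followed by a stratified sweep through the
    # cumulative weights; returns n_out particle indices.
    w = np.asarray(weights, dtype=float)
    cdf = np.cumsum(w / w.sum())
    positions = (np.random.uniform() + np.arange(n_out)) / n_out
    return np.searchsorted(cdf, positions)

The new, equally weighted particle set is then obtained by indexing the parameter and state arrays with the returned indices, matching the assignment above.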

Rejuvenation can also be performed in a way that allows the algorithm to adapt. That is, the sampling procedure may be changed to reduce the number of iterations in the rejuvenation step, or possibly to reduce the number of times rejuvenation is needed. One example that has been applied to SMC-squared is to double the number of state particles, $N_x$, if the acceptance rate for PM-MCMC steps drops below a threshold [26]. For the JIMS algorithm there is no acceptance rate, so this kind of criterion cannot be implemented. An alternative measure of efficiency is the ESS divided by the number of iterations of rejuvenation; hence, the algorithm can be adapted by calculating ESS/n and doubling $N_x$ if this falls below a threshold. Another option is to make a rule based on the width of the credible intervals of the model evidence. We also implement a criterion in the JIMS algorithm which chooses a new sampling distribution, $q_t(\cdot)$, if the algorithm does not converge after a given number of samples. The new $q_t(\cdot)$ is a defence mixture as given in Equation (6.1), where $\hat{p}(\theta|y_{1:t})$ is a Gaussian mixture fit to the samples from the rejuvenation step. That is, if rejuvenation has run for S iterations without converging, resample particles and fit a Gaussian mixture to obtain a new $\hat{p}(\theta|y_{1:t})$, then begin rejuvenation again.

Inputs: the number of parameter particles, $N_\theta$, the number of state particles, $N_x$, and observed data, $y_{1:T}$.
Outputs: the evidence, $\hat{p}(y_{1:T})$, and weighted particles, $\{\theta^i, W^i\}_{i=1}^{N_\theta}$.

for $i = 1, \dots, N_\theta$ do
    Generate an initial parameter set $\theta^i \sim p(\theta)$;
    Set initial weight $W^i = 1/N_\theta$;
    For $j = 1, \dots, N_x$ generate initial state $x_0^j \sim q_0(\cdot)$;
    Resample initial states, where index $j$ is sampled in proportion to $\dfrac{p(x_0^j)}{q_0(x_0^j)}$;
end
for $t = 1 : T$ do
    for $i = 1 : N_\theta$ do
        $\left[\{x_t\}_{j=1}^{N_x}\right]^i, \hat{p}(y_t|y_{1:t-1}, \theta^i) = \mathrm{particlefilter}\left(\theta^i, \left[\{x_{t-1}\}_{j=1}^{N_x}\right]^i, y_t\right)$;
        Weights: $w^i = \hat{p}(y_t|y_{1:t-1}, \theta^i)\,W^i$;
    end
    Normalised weights: $W^{1:N_\theta} = w^{1:N_\theta} / \sum_{i=1}^{N_\theta} w^i$;
    Effective sample size: $\mathrm{ESS} = 1 / \sum_{i=1}^{N_\theta} (W^i)^2$;
    Model evidence: $\hat{p}(y_{1:t}) = \left(\sum_{i=1}^{N_\theta} w^i\right) \hat{p}(y_{1:t-1})$;
    if $\mathrm{ESS} < \zeta N_\theta$ then
        Evaluate $q_t$;
        $R = 0$;
        while some condition holds do
            $R = R + 1$;
            sample $\theta^R$ from density $q_t(\cdot)$;
            $\left[\{x_t\}_{j=1}^{N_x}\right]^R, \hat{p}\left(y_{1:t}|\theta^R\right) = \mathrm{particlefilter}\left(\theta^R, \left[\{x_0\}_{j=1}^{N_x}\right]^R, y_{1:t}\right)$;
            $W^R = \dfrac{\hat{p}(y_{1:t}|\theta^R)\,p(\theta^R)}{q_t(\theta^R)}$;
            $\hat{p}(y_{1:t}) = \frac{1}{R}\sum_{i=1}^{R} W^i$;
            $\mathrm{ESS} = 1 / \sum_{i=1}^{R} (W^i)^2$;
        end
    end
end

Algorithm 10: The JIMS algorithm

6.2 Comparison based on SIR model

We compare twenty runs of the standard SMC-squared algorithm with the JIMS algorithm on a single, randomly selected, simulated data set from the SIR model. Note that in this section there are no competing models; we obtain model evidence estimates, which could be used for model selection, and compare the algorithms in terms of the precision of these estimates. The simulated data were from a completed outbreak in a population of N = 160 individuals, with transmission rate β = 1.5 and recovery rate γ = 0.6, and consist of 143 infectious cases over 16 days. The model is initialised with a single infectious individual at time 0. We choose a prior distribution defined in terms of the reproduction number and mean infectious period, where β/γ ∼ Uniform(1, 4) and 1/γ ∼ Uniform(1, 7).

Each of the algorithms is implemented using the same particle filter, as described in Chapter 5. The algorithms also use the same independent sampler in the rejuvenation step: a Gaussian mixture fitted to the weighted particles as described in Section 6.1.2. The Gaussian mixtures were weighted according to a truncated geometric distribution with parameter 0.4, reversed so that the weights are increasing. That is, at time t, the mixture weights, {p0, . . . , pt}, are given by

\[
p_s = c\,(0.6)^{t-s},
\]
where c is a normalisation constant. We set the number of parameter particles to Nθ = 1000 and the number of state particles to Nx = 50. For SMC-squared we set the number of PM-MCMC steps per parameter particle to R = 5. The rejuvenation step for the JIMS algorithm ends when the width of the credible interval for the evidence estimate, $\hat{p}(y_{1:t})$, is less than $0.1\,\hat{p}(y_{1:t})$ (after an initial Nθ samples), and concludes with resampling to keep a consistent number of parameter particles.
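As a small worked check of these weights (for illustration only), at t = 2 the unnormalised weights for s = 0, 1, 2 are $(0.6)^2$, $(0.6)^1$ and $(0.6)^0$, so
\[
(p_0, p_1, p_2) = \frac{(0.36,\ 0.6,\ 1)}{1.96} \approx (0.18,\ 0.31,\ 0.51),
\]
and the most recent posterior approximation receives about half of the sampling effort.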

6.2.1 Results

Our results are given in terms of box plots of the model evidence estimates from the twenty runs in Figures 6.2 and 6.3; scatter plots of particle filter estimates of the mean of the reproduction number, R0, and the mean infectious period, 1/γ, from the twenty runs in Figure 6.4; and box plots of the total runtime in Figure 6.5.

The medians of the boxes in Figures 6.2 and 6.3 agree well between the two methods; however, the estimates of model evidence from the JIMS algorithm have lower variance, as desired. Further, the estimates of the mean parameter values in Figure 6.4 have lower variance, which is likely due to the JIMS algorithm rejuvenating particles more effectively. Figure 6.5 shows that the JIMS algorithm takes less time to run at every iteration, and by the final iteration the total runtime of SMC-squared was twice that of the JIMS algorithm. The jumps in the cumulative runtime in Figure 6.5 correspond to rejuvenation steps of the algorithms; these jumps are larger for SMC-squared than for the JIMS algorithm, indicating that the stopping criterion in the rejuvenation step allows the algorithm to run faster. By all measures the JIMS algorithm outperformed standard SMC-squared in this case.


Figure 6.2: Box plots of the model evidence estimates from 20 runs of SMC-squared and the JIMS algorithm on a single simulated SIR data set.


Figure 6.3: Box plots of the model evidence estimates at each iteration from 20 runs of SMC-squared and the JIMS algorithm on a single simulated SIR data set. The black boxes are estimates from the JIMS algorithm and the blue boxes are estimates from SMC-squared.


Figure 6.4: Scatter plots of the mean estimates of R0 and 1/γ from 20 runs of SMC-squared and the JIMS algorithm on a single simulated SIR data set. The true parameter values were R0 = 2.5 and 1/γ ≈ 1.667.


Figure 6.5: Box plots of the total runtime, in seconds, from 20 runs of SMC-squared and the JIMS algorithm on a single simulated SIR data set. The total runtime is given for each iteration of the algorithm.

6.3 Comparison based on SE(2)I(2)R model

We compare twenty runs of the standard SMC-squared algorithm with the JIMS algorithm on a single, randomly selected, simulated data set from the SE(2)I(2)R model. Again, in this section there are no competing models; we obtain model evidence estimates under one model and compare the algorithms in terms of the precision of these estimates. The simulated data were from a completed outbreak in a population of N = 300 individuals, with transmission rate β = 0.933, infectiousness rate σ = 0.5, and recovery rate γ = 0.6, and consist of 184 infectious cases over 36 days. The model is initialised with a single infectious individual at time 0. We choose a prior distribution defined in terms of the reproduction number, mean latent period and mean infectious period, where R0 − 1, 1/σ and 1/γ are each gamma distributed with means 2 and variances 0.75.

Again, the algorithms are implemented using the same particle filter, as described in Chapter 5, and an independent sampler is chosen for the rejuvenation step in the same way as in the previous section. We set the number of parameter particles to Nθ = 2000 and the number of state particles to Nx = 50. For the SMC-squared algorithm we set the number of PM-MCMC steps per parameter particle to R = 5. The rejuvenation step for the JIMS algorithm ends when the width of the credible interval for the evidence estimate, $\hat{p}(y_{1:t})$, is less than $0.1\,\hat{p}(y_{1:t})$ (after an initial Nθ samples), and concludes with resampling to keep a consistent number of parameter particles.

6.3.1 Results

Our results are given in terms of box plots of the model evidence estimates from the twenty runs in Figure 6.6; box plots of the mean estimates of the reproduction number, R0, the mean exposed period, 1/σ, and the mean infectious period, 1/γ, in Figure 6.7; variances of the mean estimates in Figure 6.8; and box plots of the total runtime in Figure 6.9.

Again, the medians of the boxes in Figure 6.6 agree well between the two methods; however, estimates of model evidence from the JIMS algorithm have lower variance, as desired. Figure 6.7 shows that the estimates of the mean parameter values agreed well at all time points between the two methods; however, Figure 6.8 shows that the estimates from the JIMS algorithm have lower variance. Lastly, Figure 6.9 shows that the JIMS algorithm takes less time to run at every iteration, and by the final iteration the total runtime of SMC-squared was generally around three times that of the JIMS algorithm. By all measures the JIMS algorithm outperformed standard SMC-squared in this case.


Figure 6.6: Box plots of the model evidence estimates from 20 runs of SMC-squared and the JIMS algorithm on a single simulated SE(2)I(2)R data set.

Figure 6.7: Box plots of the mean estimates of 1/σ, 1/γ and β/γ from 20 runs of SMC-squared and the JIMS algorithm on a single simulated SE(2)I(2)R data set. The true parameter values were R0 = 1.555, 1/γ ≈ 1.667 and 1/σ = 2.


Figure 6.8: Variance estimates of the mean estimates of 1/σ, 1/γ and β/γ from 20 runs of SMC-squared and the JIMS algorithm on a single simulated SE(2)I(2)R data set.


Figure 6.9: Box plots of run times, in minutes, from 20 runs of SMC-squared and the JIMS algorithm on a single simulated SE(2)I(2)R data set. The total runtime is given for each iteration of the algorithm. Runtimes of the JIMS algorithm are given by black boxes and runtimes of SMC-squared are given by blue boxes.

6.4 Application to multiple outbreaks data

We employ the JIMS algorithm to jointly infer model parameters and select between models for multiple outbreaks data. Here we apply the JIMS algorithm to perform model selection on a few large outbreaks from an SE(2)I(2)R model and infer the time of symptom onset from daily case data. We apply model selection to identify whether observed symptoms correspond to a transition from the E1 to E2 class, the E2 to I1 class, or the I1 to I2 class. This extends the work of Chapter 5 in that it considers larger outbreaks (in populations of size 100) with some longer time series, which is a more computationally challenging problem. Further, it attempts to distinguish between models which are more similar than those in Chapter 5, so model evidences are likely to be closer. Hence, estimates of model evidence will need to be inferred more precisely to effectively choose between models.

6.4.1 Implementation

The model considered here is a modified version of the SE(2)I(2)R multiple outbreaks models considered in Chapter 4, with transmission rate β = 0.933, infectiousness rate σ = 0.5, and recovery rate γ = 2/3. Rather than considering a fixed number of completed outbreaks, we let the number of outbreaks be stochastic and simulate new outbreaks in populations of 100 individuals until at least 300 individuals are infected. Each subpopulation is initialised with a single infection at time 0. As outbreaks in subpopulations are independent, their seeding time only affects inference through the time between the seeding time and the first observation (which is assumed to be 0 here). This ensures that data sets contain at least three hundred cases, so more realistic complete FF100 study data sets are generated. This is also more realistic with respect to how resources may be used; if there is a single infection in a population, data collection will be less resource intensive than for an outbreak with many individuals. That is, the size of the outbreak has an impact on the resources left for surveillance.

For each of the three models we simulate 30 data sets and run the JIMS algorithm to estimate the model evidence and infer parameters under each of the three models. We choose a prior distribution defined in terms of the reproduction number, mean exposed period and mean infectious period, where R0 − 1, 1/σ and 1/γ are each gamma distributed with means 2 and variances 0.75. The number of outbreaks in simulated data sets varied from 7 up to 29 outbreaks, with a maximum duration of 63 days. The chosen parameters were such that epidemic fadeout was likely, so many of these ‘outbreaks’ only contain one or two cases.

Again, we use the same particle filter, as described in Chapter 5, and choose an independent sampler in the rejuvenation step in the same way as in the previous section. We set the number of parameter particles to Nθ = 2000 and the number of state particles to Nx = 200. The rejuvenation step for the JIMS algorithm ends when the width of the credible interval for the evidence estimate, $\hat{p}(y_{1:t})$, is less than $0.2\,\hat{p}(y_{1:t})$ (after an initial Nθ samples). After 5Nθ samples of rejuvenation the sampling distribution is adapted if ESS < 0.2Nθ; that is, the samples from the rejuvenation step are resampled according to their weights and used to fit a multivariate Gaussian mixture to approximate the posterior distribution. The rejuvenation step for the JIMS algorithm ends with resampling to keep a consistent number of parameter particles.
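A sketch of this rejuvenation loop, with the relative-width stopping rule and the adaptation step, is given below. It is illustrative Python, assuming hypothetical callables sample_qt (draw from q_t), qt_pdf (evaluate q_t), prior_pdf, likelihood (the particle filter estimate of the likelihood) and refit (refit the defence mixture to weighted samples).

import numpy as np

def rejuvenation_step(sample_qt, qt_pdf, prior_pdf, likelihood, refit,
                      n_theta, rho=0.2):
    # Sample until the 95% CI of the evidence estimate is narrower than
    # rho * estimate; refit the sampling density if the ESS stays low.
    adapt_after = 5 * n_theta
    thetas, w = [], []
    while True:
        theta = sample_qt()
        thetas.append(theta)
        w.append(likelihood(theta) * prior_pdf(theta) / qt_pdf(theta))
        R = len(w)
        if R >= n_theta:
            wa = np.array(w)
            Z = wa.mean()
            half = 1.96 * wa.std(ddof=1) / np.sqrt(R)
            if 2.0 * half <= rho * Z:          # stopping rule met
                return np.array(thetas), wa / wa.sum(), Z
        if R == adapt_after:
            wa = np.array(w)
            ess = wa.sum() ** 2 / np.sum(wa ** 2)
            if ess < 0.2 * n_theta:            # adapt: refit q_t and restart
                sample_qt, qt_pdf = refit(np.array(thetas), wa)
                thetas, w = [], []

In practice the weights would be accumulated on the log scale to avoid underflow; that detail is omitted here for clarity.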

6.4.2 Results

Bar graphs of posterior model probability estimates and kernel density estimates of marginal posterior distributions of R0, 1/σ and 1/γ are shown for three simulations over time in Figures 6.10 and 6.11. In each case the highest posterior model probability estimate corresponded to the model that generated the data by the final time point. For the data set generated from the E1 → E2 model, the correct model is always preferred after t = 1. For the data set generated from the E2 → I1 model, the E1 → E2 model was at times preferred, but by day 40 the preferred model was the correct model. The data set generated from the I1 → I2 model prefers the E1 → E2 model at some early time points; however, after time t = 10 the correct model is clearly preferred. In all cases the kernel density estimates peak near the true model parameters; in particular, the posterior distribution of R0 peaks near the true value and with relatively low variance.

Box plots of expected values of r, R0, 1/σ and 1/γ, evaluated as the means from each candidate model weighted by the posterior model probabilities, are given in Figure 6.12. The boxes of estimates of transmissibility, r and R0, are generally centred around the true parameter values. Box plots of estimates of 1/σ and 1/γ appear negatively and positively biased respectively. For the E1 → E2, E2 → I1 and I1 → I2 models the true model corresponded with the highest posterior model probability in 19, 20 and 19 of the 30 simulated data sets respectively.


Figure 6.10: Plots of posterior model probabilities over time from simulated SE(2)I(2)R multiple outbreak data for three models.


Figure 6.11: Kernel density estimates of the marginal posterior distributions of R0, 1/σ and 1/γ. The input parameter values for the simulated data sets, (R0, 1/σ, 1/γ) = (1.3995, 2, 1.5), are marked by a red line in each panel.


Figure 6.12: Box plots of the means of r, R0, 1/σ and 1/γ from 30 simulated data sets. Each estimate is the sum of means from each model weighted by posterior model probabilities. The input parameter values for the simulated data sets, (r, R0, 1/σ, 1/γ) = (0.1133, 1.3995, 2, 1.5), are marked by a red dashed line in each panel.

6.5 Discussion

Sections 6.2.1 and 6.3.1 compared the JIMS algorithm with SMC-squared using the same samplers, the same particle filters and the same numbers of particles, to allow for a fair and clear comparison. Our results show that the JIMS algorithm provides an effective way of rejuvenating particles, via resampling and a stopping criterion, as opposed to performing a specified number of PM-MCMC steps. This rejuvenation allowed model evidence to be estimated accurately (by design), but also had flow-on effects in terms of low variance parameter estimates and faster run times. The ability to rejuvenate particles and control the error of model evidence estimates allows the evidence to be inferred precisely enough to distinguish between models. The lack of precision in model evidence can be an issue with SMC-squared, as highlighted in [25], where the variance in model evidence estimates was too high to precisely identify the preferred model. Further, identifying this issue with model evidence estimates required multiple runs of SMC-squared, whereas the error in model evidence can be estimated in a single run of the JIMS algorithm.

Section 6.4 employed the JIMS algorithm to perform model selection and infer parameters based on multiple outbreaks from an SE(2)I(2)R model in populations of size 100, where the time of symptom onset corresponded to an E1 → E2, E2 → I1, or I1 → I2 transition. This is a difficult model selection problem due to the size of the data set in terms of the number of outbreaks, the number of cases and the length of the time series (the longest outbreak went for 69 days). The length of the time series, in particular, poses a challenge as it leads to rejuvenation steps requiring many samples. However, the JIMS algorithm was able to estimate the model evidence precisely enough to choose between models effectively, and, for these simulations, the data were informative enough that the correct model was chosen in at least 19 out of 30 simulations from each model. Further, the JIMS algorithm returned unbiased estimates of measures of transmissibility, the early growth rate and the reproduction number. Estimates of 1/σ and 1/γ appear negatively and positively biased respectively; this is due to a lack of identifiability in model parameters and agrees with the results from [73]. We note that the kernel density estimates of the posterior distributions from the JIMS algorithm were not particularly smooth; increasing the number of state and parameter particles could smooth these and would lead to more precise estimates. A sensible method is to double the number of state particles, Nx, in the rejuvenation step, as suggested in [24]. For the JIMS algorithm, Nx could be doubled when, at the jth iteration of rejuvenation, ESS/j drops below a threshold. This was not implemented here so that overly large arrays did not need to be stored. For these data sets the computational requirements of the JIMS algorithm were intensive and highly stochastic; with 25 CPUs, 2000 parameter particles and 200 state particles, the longest job ran in just under a day and a half and the shortest job ran in just under an hour.

This chapter presented only one way of implementing SMC-squared and the JIMS algorithm, and highlighted a number of alternative implementations. The stopping criterion in the rejuvenation step of the JIMS algorithm was based on estimates of the error in model evidence, but could alternatively be based on estimates of the effective sample size during rejuvenation. The rejuvenation step in the JIMS algorithm was also followed by a resampling step, which in practice may not always be necessary. The algorithms were run with a fixed number of state particles; however, it has been proposed that this number be increased as the algorithm progresses to reduce the variance in likelihood estimates. The comparison of SMC-squared and the JIMS algorithm in Sections 6.2.1 and 6.3.1 was based on an SMC-squared algorithm with a fixed number of rejuvenation steps per particle, R. In practice the number of rejuvenation steps could be adapted based on estimates of the acceptance probability from a previous step, as proposed in [25]. It is unclear how effective this rule for adapting R will be in practice for SMC-squared, as the likelihood values are estimated via a particle filter, which makes the acceptance probability decrease between rejuvenation steps. The proposal distribution implemented in this chapter was chosen to be sensible and to allow for reasonable performance. However, the weights of the Gaussian mixture approximations of the posterior distributions at each time point were chosen in an ad-hoc way (based on a truncated geometric distribution). These weights were chosen to ensure that the approximation to the posterior distribution from the previous iteration was sampled more often than those from earlier iterations. An alternative is to weight these distributions based upon the ESS from each iteration, which could account for iterations where the posterior distribution approximation has high error. Although the sampling distribution was implemented in a way that stabilises the particle filter, large changes in the posterior distribution could make the algorithm inefficient; this would likely occur when applied to inference problems with high dimensional parameter spaces.

Chapter 7

Concluding Remarks

This thesis has developed methods that use First Few Hundred (FF100) study-type data to accurately characterise an emerging infectious disease. The data consisted of daily cases of symptom onset stratified by the household of infectious individuals. When an individual in the population is identified with symptoms, they, and members of their household, may be surveilled. A natural description of an epidemic model that uses this kind of household stratified data is stochastic and either incorporates household structure or considers independent outbreaks in households. As there is a lack of real FF100 study data, this thesis used simulation studies to assess methods for characterising diseases; as the epidemic process which generated the data is known, this allowed results to be validated.

From simulated FF100 study data this thesis characterised transmissibility by performing Bayesian model selection (to appropriately describe disease dynamics) and Bayesian inference (to quantify transmissibility). These two tasks are made difficult by the fact that the underlying disease dynamics in a population are largely unobserved, which makes direct point-wise evaluation of the likelihood function (the probability of observations given a parameter set and model) difficult. This thesis circumvented this issue by considering data-augmentation and particle filter methods which allow asymptotically exact Bayesian inference and model selection to be performed via sampling.

This thesis contributes to the mathematical epidemiology literature and provides tools and insight into pandemic preparedness. It describes and implements efficient Bayesian methods for inferring parameters of household epidemic models, and models of independent outbreaks in households, from FF100 study data. We provide insight into how a population should be surveilled to most accurately and robustly characterise a disease; this highlights the importance of model selection for pandemic preparedness. This thesis develops new efficient Bayesian model selection algorithms that use importance sampling, particle filters and constrained simulation to infer model evidence (and thus the probability of observations under a model) to a given accuracy. These methods were applied to two model selection problems: inferring the shape of the infectious period distribution; and inferring the time of symptom onset relative to infectiousness. We have shown that FF100 study data are informative enough to effectively identify these important model characteristics, which has implications for the controllability of emerging infectious diseases.

This thesis developed and implemented efficient methods for inferring parameters of partially-observed stochastic epidemic models on households, or on independent outbreaks, in Chapter 3 using FF100-type data. First, we implemented efficient data-augmented MCMC (DA-MCMC) for an SIR household model with households of size three, and an SE(2)I(2)R household model with the same household distribution as Adelaide, Australia. The DA-MCMC method for the SIR household model was compared with a branching process approximation (BPA) that was developed earlier. We found that DA-MCMC, as an exact method, had lower bias than the BPA method, but was less efficient. For the SE(2)I(2)R model we found that it was feasible to obtain accurate estimates of the household reproduction number and early growth rate from FF100 study data. We also implemented a sequential version of DA-MCMC which used ideas from sequential Monte Carlo to sequentially update estimates of the posterior distribution as new data are acquired. This was found to be more efficient than DA-MCMC when applied to a single outbreak from a homogeneous SIR model. It was also able to perform inference on the infectious period shape from multiple outbreaks from an SI(n)R model, where the mixing of DA-MCMC was prohibitively slow.

Applying DA-MCMC to household models was computationally taxing; for the SE(2)I(2)R household model this involved multiple independent runs on a cluster to obtain a sufficient number of samples from the posterior distribution after burn-in. These computational issues are inflated when considering models with unobserved cases, as unobserved cases increase the dimension of the parameter space, increase the dimension of the hidden process, and increase the correlation between consecutive samples. More complex models may be fit by improving computational efficiency via non-centering [55] or optimisation of the number of Hastings steps per iteration. The sequential DA-MCMC was only briefly considered due to limitations that prevent it from being applied to more complex models. It depends on calculating a kernel density estimate from the samples of the posterior distribution at each time point, and point estimates from these kernel densities need to be made at each iteration. This is a computational challenge which prevents the method from being more widely applicable as the dimension increases. For example, we were unable to successfully apply this method to the homogeneous SEIR model.

Chapter 4 conducted an analysis of the optimal way to surveil a population given a fixed total number of individuals to surveil. This was done by simulating multiple outbreak data on six hundred individuals divided into subpopulations of different sizes (representing, for example, households, schools and workplaces). This leads to a trade-off between observing many small outbreaks, and hence many samples from final size distributions, and few large outbreaks, and hence richer temporal information. These simulated data sets were analysed using DA-MCMC under the models that they were generated from. We found that many repetitions of small outbreaks led to more precise estimates than fewer outbreaks from larger populations; that is, the most precise estimates were made by surveilling many small subpopulations. We also tested how robust estimates were to a misspecified model. For the models considered, the least biased results came from outbreaks in larger populations (where at least one subpopulation experienced a large outbreak). To balance the need for parameters to have both low variance and low bias under model misspecification, we found that surveilling a mix of many small subpopulations and a few large populations was preferable; this allowed for a balance of many final size samples and some longer sets of temporal data. However, we found that there is no clear way of surveilling subpopulations such that the reproduction number and growth rate are guaranteed to have low bias under a misspecified model.

The analysis in Chapter 4 has some limitations due to simplifying assumptions. We considered that subpopulations with a combined total of six hundred individuals would be surveilled without accounting for the number of infections in each subpopulation. This implies that the same resources were spent on surveilling a subpopulation with one infectious case as on a subpopulation with many infectious cases. In reality, surveilling a subpopulation with no secondary infections is unlikely to be as resource intensive as surveilling a subpopulation with a large outbreak. Resource allocation likely acts as a trade-off between the number of individuals surveilled and the number of infections observed; however, this relationship is unknown. Although we did not assume that all cases in the population were observed (as assumed in Chapter 3), we assumed that if any case in a subpopulation is observed then all cases within the subpopulation are observed. This limits the results of the analysis to diseases with easily identifiable symptoms. We also only considered frequency-dependent transmission, which could be reasonable depending on the disease considered [69], but in reality infectious diseases may have a transmission rate that is somewhere between frequency-dependent and density-dependent [67, 68]. The analysis considered transmissibility within subpopulations without considering the transmissibility between subpopulations, so it should be interpreted in the context of characterising outbreaks within schools, households and workplaces rather than at a population level.

As choosing an inappropriate model led to biased estimates in Chapter 4, we considered Bayesian model selection methods in Chapter 5. We developed a Bayesian model selection method that used an efficient particle filter for calculating point estimates of the likelihood function within an importance sampling scheme. The particle filter simulates epidemic trajectories in such a way that sample paths are always feasible and always match observations. The method takes advantage of a stopping criterion to ensure the accuracy of model evidence estimates, and hence ensures that the most appropriate model is chosen for a given data set. We employed this method to identify two important features for evaluating how to control an outbreak: the shape of the infectious period distribution, and the timing of infectiousness relative to symptom onset. In both cases we found that FF100 study data were informative enough to identify these features (even with loose priors) and that our method was effective at computing the model evidence estimates. In particular, in almost all cases, by the time one hundred and fifty households were infected we could identify the time of infectiousness relative to symptom onset.

The method described in Chapter 5 was implemented without being optimised. We note that estimates of the coefficient of variation of model evidence estimates could be used as a heuristic to decide the number of state particles. As the method was implemented by independently sampling from the prior distribution, it may perform poorly in high dimensional parameter spaces, where very few samples have posterior support and many samples may be needed to infer the model evidence accurately. Performance could be improved by choosing a more sensible sampling distribution; for example, samples from PM-MCMC can be used to inform an efficient sampling distribution [23]. Chapter 5 utilised a stopping criterion which required model evidence estimates to have non-overlapping credible intervals. When model evidences were close, this made convergence slow. A solution is to sample until model evidence estimates are within a specified tolerance. This implementation is practically useful, as considering all models with posterior support and quantifying transmissibility in each case allows for a more complete characterisation of the epidemic process, as opposed to only considering one "best" model.
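To make these two stopping rules concrete, the following minimal Python sketch is our own illustration rather than the thesis implementation; it assumes that, for each model, we can produce a set of (for example, bootstrapped) log-evidence estimates, and it interprets the tolerance rule as every credible interval being narrower than a given width.

```python
import numpy as np

def evidence_stopping_rule(log_evidence_samples, tol=0.5, level=0.95):
    # `log_evidence_samples` maps each model name to an array of
    # log-evidence estimates (a hypothetical interface).
    lo, hi = (1 - level) / 2, 1 - (1 - level) / 2
    ci = {m: np.quantile(s, [lo, hi])
          for m, s in log_evidence_samples.items()}
    models = list(ci)
    # Rule 1 (as in Chapter 5): every pair of credible intervals is disjoint.
    disjoint = all(ci[a][1] < ci[b][0] or ci[b][1] < ci[a][0]
                   for i, a in enumerate(models) for b in models[i + 1:])
    # Rule 2 (the tolerance-based alternative): all intervals are narrow,
    # so models with genuinely similar evidence no longer stall convergence.
    narrow = all(b - a < tol for a, b in ci.values())
    return disjoint or narrow
```

Either condition triggering would end sampling; the tolerance-based rule retains all models with appreciable posterior support rather than forcing a single winner.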

Samples from the posterior distribution can be used to improve the model selection algorithm from Chapter 5, and samples from the model selection algorithm can be used to improve parameter estimates. Chapter 6 combines these ideas to develop a novel version of SMC-squared, dubbed the JIMS algorithm, which uses the model selection algorithm from Chapter 5 to rejuvenate particles when particle degeneracy occurs. Samples in the model selection step are made by fitting Gaussian mixture distributions to the set of particles. The JIMS algorithm and a standard SMC-squared were applied to data on daily infectious cases from homogeneous SIR and SE(2)I(2)R models. For each of these models, the JIMS algorithm outperformed standard SMC-squared; it resulted in lower variance estimates of model evidence and parameter means, and had a shorter run time. The JIMS algorithm was also applied to the difficult problem of choosing the time of symptom onset relative to infectiousness, given multiple outbreaks in large subpopulations (of size one hundred). In this case we supposed symptoms corresponded to a transition into the exposed phase 2, infectious phase 1 or infectious phase 2 class; these transitions are more similar to one another than those considered in Chapter 5, which increases the difficulty of the model selection problem further. For each model we were able to effectively perform model selection and inference simultaneously. The highest posterior model probability corresponded to the true model in at least 19 out of 30 simulated data sets for each model. Further, mean estimates of transmissibility were unbiased for data sets generated under each model.
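A minimal sketch of the rejuvenation idea is given below; it is our own illustration, not the thesis code. It assumes scikit-learn's GaussianMixture, which does not accept sample weights, hence the initial resampling step; the function name and interface are ours.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def rejuvenate(particles, weights, n_new, n_components=3, seed=None):
    # particles: (n, d) array of parameter particles; weights sum to one.
    # Resample to equal weights, fit a Gaussian mixture, then draw a
    # fresh, equally weighted particle set.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    gmm = GaussianMixture(n_components=n_components).fit(particles[idx])
    new_particles, _ = gmm.sample(n_new)
    return new_particles, gmm  # the fitted mixture can serve as the IS density
```

In the JIMS setting the fitted mixture would act as the importance sampling density, with new particles reweighted against the posterior rather than treated as exact draws.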

Chapter 6 gave an implementation of the JIMS algorithm but highlighted many alternative implementations. There are multiple ways in which stopping criteria could be implemented, importance sampling distributions could be chosen, and algorithmic parameters (such as the number of state particles) could be adapted throughout the algorithm. This highlights that there are many aspects of the JIMS algorithm which are yet to be optimised. The comparison of the JIMS algorithm and SMC-squared considered an implementation of SMC-squared with a fixed number of samples of PM-MCMC per particle in the rejuvenation step. It has been suggested that the number of samples in the rejuvenation step of SMC-squared could be adapted based on an estimate of the acceptance probability of PM-MCMC [25]. An implementation of SMC-squared with an adapted number of samples in the rejuvenation step is still unlikely to outperform the JIMS algorithm, as the JIMS algorithm outperformed SMC-squared in terms of both accuracy and run time. The analysis in Chapter 4 considered the problem of surveilling a fixed number of individuals in subpopulations. In Chapter 6 we considered multiple outbreaks data again; however, we supposed that there were resources to surveil subpopulations until three hundred infectious individuals were observed. Realistically, the utilisation of resources will be somewhere between these two models, as individuals without symptoms will be surveilled, but surveilling a population with few secondary infections will be less resource intensive than surveilling a large outbreak. Resource allocation will depend on both the total number of individuals in surveilled populations and the number of cases in each of these populations. While the JIMS algorithm outperforms SMC-squared in the cases considered, we note that independently sampling from the parameter space may perform worse than proposing small moves to a set of particles in high dimensional parameter spaces. This indicates that, for models with many parameters, SMC-squared with a PM-MCMC rejuvenation step that does not use an independent sampler may outperform the JIMS algorithm.

Future Considerations

The models in this thesis did not consider asymptomatic infection, or otherwise unobserved cases of infection. Methods based on household models in Chapter 3 assumed that all infectious cases in the population were observed. The methods applied to multiple outbreaks data assumed that all cases in each of the subpopulations were observed. Dealing with unobserved infections poses a difficult inference problem: it increases the dimension of the parameter space (as there may be an observation probability parameter), the initial time of infection may be unknown, the initial state of the process is unknown, and there are more latent variables. We briefly looked into implementing DA-MCMC for a homogeneous SIR model where there was a fixed probability of observing infectious cases; the dimension of the latent variables and the lack of parameter identifiability made the mixing slow. However, parameter identifiability for these models can be improved by considering multiple outbreak data, as shown in [73]. Currently, a collaborator is aiming to apply the model selection method from Chapter 5 to models of influenza and SARS with unobserved cases.

In Chapter 3 we gave only a brief outline of the sequential DA-MCMC method due to its limitations. The ability to sequentially update posterior distributions in a data-augmented framework is appealing, and this method provides a solution. The largest computational limitation of the method is its reliance on kernel density estimates. Methods that avoid the need for kernel density estimation could be a useful alternative; for example, if the posterior distributions can be well approximated by a Gaussian mixture, this could be used instead (though this may introduce some error).

The analysis in Chapter 4 motivated the need to perform model selection. With the methods developed in Chapters 5 and 6 we intend to return to the optimal surveillance problem, where the model is chosen and parameters are inferred. Further, we aim to perform a sensitivity analysis of estimates with respect to the shape of the exposed and infectious period distributions. As mentioned earlier, we assumed that there were resources to surveil a fixed number of individuals; however, in reality surveilling many people in one subpopulation may be easier than surveilling many people across many subpopulations. An analysis of the resources available for surveillance, and of how efficiently subpopulations of different sizes can be surveilled, would allow for a more realistic model of FF100 style data under different surveillance protocols.

In Chapter 5 we successfully identified the infectious period shape and the time of symptom onset relative to infectiousness separately. Ideally we would seek to apply the model selection algorithm to choose between these two features simultaneously, and perhaps identify the shape of the exposed period too. Of course, if there are many models to select between, it becomes difficult to implement a stopping rule which requires non-overlapping credible intervals of model evidence.

In Chapter 6 we discussed the ability to increase the number of state particles and to use effective sample size in a stopping rule for the rejuvenation step of the JIMS algorithm. Neither of these approaches was applied, so implementing these and comparing run times, the number of rejuvenation iterations and the precision of estimates is of interest. The JIMS algorithm was implemented with a sampling distribution that was a defence mixture weighted according to a chosen distribution (we chose a truncated geometric distribution); comparing different choices of distribution to find the most efficient and reliable implementation would be useful. This method could also be used to prepare for pandemic influenza by performing model selection on historical influenza data for a variety of models to obtain an informed prior over the model space, rather than using a uniform prior on the candidate models.
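A sketch of such a sampling distribution is given below; this is our own illustration with hypothetical interfaces, not the thesis implementation. With a small probability the prior is used as the defensive component; otherwise one of the time-indexed posterior approximations is selected with truncated-geometric weights favouring the most recent.

```python
import numpy as np
from scipy.stats import geom

def defensive_mixture_draw(components, prior_draw, eps=0.1, p=0.5, rng=None):
    # `components` is a list of sampler callables (most recent last) and
    # `prior_draw` samples the prior; both are hypothetical interfaces.
    rng = np.random.default_rng(rng)
    if rng.random() < eps:
        return prior_draw(rng)       # defensive component: the prior
    k = len(components)
    w = geom.pmf(np.arange(1, k + 1), p)  # geometric weights over lags
    w /= w.sum()                          # truncate to k terms, renormalise
    lag = rng.choice(k, p=w)              # lag 0 selects the newest component
    return components[-(lag + 1)](rng)
```

The defensive (prior) component guards against the mixture placing too little mass where the posterior has support, which keeps importance weights stable.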

To conclude, in this thesis I have developed Bayesian methods for inferring parameters and selecting between competing models for epidemics in populations of households based on FF100 study data, and identified a number of areas of further research. This thesis shows that FF100 study data are informative enough to infer measures of transmissibility for complex models. It analyses how to surveil emerging infectious diseases and highlights the importance of model selection. Further, it provides methods for solving difficult model selection problems and shows that FF100 study data are informative enough to identify important features of epidemics, which has real implications for the controllability of emerging infectious diseases. All methods in this thesis were assessed by simulating FF100 study data, due to a lack of real FF100 study data. In Australia, tests of the FF100 protocol are likely to take place during a seasonal influenza outbreak in the next few years; this thesis provides methods that can be readily applied during these studies.

Appendix A

DA-MCMC for multiple outbreaks

A subpopulation of size $N$ is modelled as an SI(n)R or an SE(n1)I(n2)R CTMC. These models are initialised by seeding an infection in the subpopulation at a Uniform(0, 1) distributed time, $t_1$, such that $S_{t_1} = N - 1$ and $I_{t_1} = 1$. Our data only reveal times of symptom onset at a daily resolution, and we make the modelling assumption that the onset of symptoms corresponds to an infection transition. We assume that our data reveal all infections that will occur in each subpopulation over the entire course of the epidemic; that is, we assume no more infections will occur in surveilled subpopulations. To use data from these subpopulations to infer the transmission rate in a subpopulation, β, the mean latent period 1/σ and the mean infectious period 1/γ, we assume that each subpopulation acts independently after its initial infection.

Due to the independence of outbreaks in each subpopulation, the likelihood function is the product of the likelihood functions of each individual outbreak. Let $x_{[t_1,T]}$ be the complete epidemic process, from the time of initial infection, $t_1$, until the last recovery at time $T$. Let $y_{1:T}$ be the data vector, with each entry corresponding to the number of infections observed each day for a single subpopulation. Here we present the augmented likelihood for an SE(n1)I(n2)R epidemic, but note that it is simple to modify this for an SI(n)R model by removing terms related to the exposed period. Assume that $x_{[t_1,T]}$ agrees with $y_{1:T}$, meaning that $x_{[t_1,T]}$ describes a feasible SE(n1)I(n2)R process in which the number of infections each day equals the entries of $y_{1:T}$. The augmented likelihood function for an outbreak in a subpopulation is given by
\[
f(x_{[t_1,T]}, y_{1:T} \mid \beta, \sigma, \gamma, t_1) = \left( \prod_{i \in A} \frac{\beta I_{t_i^-}}{N-1} \right) \exp\left( -\int_{t_1}^{T} \frac{\beta S_t I_t}{N-1} \, dt \right) \prod_{k=2}^{\bar{Y}} g(\Delta_k, n_1 + 1, n_1 \sigma) \prod_{l=1}^{\bar{Y}} g(\delta_l, n_2 + 1, n_2 \gamma),
\]
where $g(a, b, c)$ represents a gamma distribution pdf evaluated at $a$ with shape parameter $b$ and rate parameter $c$; $S_t$ and $I_t$ are the total numbers of susceptible and infectious individuals in the population at time $t$; $A$ represents the indices of transitions corresponding to infections (excluding the first infection); $\bar{Y}$ is the total number of cases; $\Delta_k$ is the latent period of the $k$th infected individual; and $\delta_l$ is the infectious period of the $l$th infected individual.
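For concreteness, a minimal Python sketch of evaluating the log of this augmented likelihood for one outbreak follows; it is our own illustration, and the argument names and pre-computed path summaries are assumptions rather than the thesis code.

```python
import numpy as np
from scipy.stats import gamma

def augmented_loglik(beta, sigma, gamma_, N, pressure_integral,
                     I_pre_infection, latent_periods, infectious_periods,
                     n1, n2):
    # Arguments are our own summaries of the augmented path x_[t1,T]:
    #   pressure_integral  -- integral of S_t * I_t over [t1, T],
    #   I_pre_infection[i] -- I at the instant before each infection
    #                         after the first (the index set A),
    #   latent_periods     -- Delta_2, ..., Delta_Ybar,
    #   infectious_periods -- delta_1, ..., delta_Ybar.
    ll = np.sum(np.log(beta * np.asarray(I_pre_infection) / (N - 1)))
    ll -= beta * pressure_integral / (N - 1)
    # Gamma pdfs with shape n1 + 1 (resp. n2 + 1) and rate n1*sigma
    # (resp. n2*gamma_); scipy uses scale = 1/rate.
    ll += gamma.logpdf(latent_periods, a=n1 + 1,
                       scale=1.0 / (n1 * sigma)).sum()
    ll += gamma.logpdf(infectious_periods, a=n2 + 1,
                       scale=1.0 / (n2 * gamma_)).sum()
    return ll
```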

Let θ be the set of model parameters; in the case of the SE(n1)I(n2)R model this is θ = {β, σ, γ}. Due to the independence of each subpopulation we can write the full augmented likelihood function as
\[
f\left(x^{1:M}, y^{1:M} \mid \theta, t_1^{1:M}\right) = \prod_{j=1}^{M} f\left(x^j, y^j \mid \theta, t_1^j\right),
\]
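On the log scale this product is a sum over subpopulations; a minimal illustration, reusing the hypothetical augmented_loglik sketch above:

```python
def full_augmented_loglik(outbreaks, beta, sigma, gamma_, n1, n2):
    # Independence across the M subpopulations: the full augmented
    # log-likelihood is the sum of the per-outbreak terms. Each entry of
    # `outbreaks` is a dict of keyword arguments for augmented_loglik.
    return sum(augmented_loglik(beta, sigma, gamma_, n1=n1, n2=n2, **ob)
               for ob in outbreaks)
```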

where $y^j$, $x^j$ and $t_1^j$ are shorthand for $y_{1:T}$, $x_{[t_1,T]}$ and $t_1$ in subpopulation $j$, respectively. The objective of the inference is to calculate the posterior distribution, $f(\theta \mid y^{1:M})$, which we calculate by sampling from the augmented posterior,
\[
f\left(\theta, t_1^{1:M}, x^{1:M} \mid y^{1:M}\right) \propto f\left(x^{1:M}, y^{1:M} \mid \theta, t_1^{1:M}\right) p\left(\theta, t_1^{1:M}\right),
\]
and integrating over $\left(t_1^{1:M}, x^{1:M}\right)$. Here $p\left(\theta, t_1^{1:M}\right)$ is the joint prior distribution of θ and $t_1^{1:M}$. We choose priors such that β/γ ∼ Uniform(0.25, 4), 1/γ ∼ Uniform(0.25, 7) and 1/σ ∼ Uniform(0.25, 7), and by our modelling assumptions $t_1^{1:M}$ have an i.i.d. Uniform(0, 1) prior distribution. Let $\bar{Y}^j$ denote the total number of cases in subpopulation $j$. By taking the augmented posterior distribution up to proportionality with respect to each variable we obtain marginal distributions for the Gibbs sampler,

\[
\beta \mid \sigma, \gamma, t_1^{1:M}, x^{1:M}, y^{1:M} \sim \text{Gamma}\left(1 + \sum_{m=1}^{M}\left(\bar{Y}^m - 1\right),\; \sum_{m=1}^{M} \int_{t_1^m}^{T} \frac{S_t I_t}{N-1}\, dt\right),
\]
\[
\sigma \mid \beta, \gamma, t_1^{1:M}, x^{1:M}, y^{1:M} \sim \text{Gamma}\left(-1 + n_1 \sum_{m=1}^{M}\left(\bar{Y}^m - 1\right),\; n_1 \sum_{m=1}^{M} \sum_{k=2}^{\bar{Y}^m} \Delta_k^m\right),
\]
and
\[
\gamma \mid \beta, \sigma, t_1^{1:M}, x^{1:M}, y^{1:M} \sim \text{Gamma}\left(-2 + n_2 \sum_{m=1}^{M} \bar{Y}^m,\; n_2 \sum_{m=1}^{M} \sum_{l=1}^{\bar{Y}^m} \delta_l^m\right).
\]
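A minimal sketch of one Gibbs sweep over (β, σ, γ) using these full conditionals is given below; it is our own illustration. numpy parameterises the gamma distribution by shape and scale, so the rates above appear as reciprocals, and the shifted shapes are assumed positive, which holds once enough cases have been observed.

```python
import numpy as np

def gibbs_update_rates(Ybar, pressure_integrals, latent_sum, infectious_sum,
                       n1, n2, rng):
    # Inputs (our own summaries of the augmented paths):
    #   Ybar[m]               -- total cases in subpopulation m,
    #   pressure_integrals[m] -- integral of S_t I_t / (N - 1) over outbreak m,
    #   latent_sum            -- sum over m and k of Delta_k^m,
    #   infectious_sum        -- sum over m and l of delta_l^m.
    Ybar = np.asarray(Ybar)
    beta = rng.gamma(1 + np.sum(Ybar - 1), 1.0 / np.sum(pressure_integrals))
    sigma = rng.gamma(-1 + n1 * np.sum(Ybar - 1), 1.0 / (n1 * latent_sum))
    gamma_ = rng.gamma(-2 + n2 * np.sum(Ybar), 1.0 / (n2 * infectious_sum))
    return beta, sigma, gamma_
```

A usage example: `beta, sigma, gamma_ = gibbs_update_rates(Ybar, integrals, d_sum, i_sum, 2, 2, np.random.default_rng(1))` would produce one joint draw from the three conditionals given the current augmented paths.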

Lastly, the distribution of $x^{1:M}, t_1^{1:M} \mid \theta, y^{1:M}$ can be sampled through Hastings steps. Note that these Hastings steps can be made independently for each outbreak. In this case Hastings steps can be made by uniformly randomly selecting a subpopulation and choosing one of the following moves with probability $p$ and $1 - p$ respectively:

(i) uniformly randomly select an individual (other than the first infected individual) who became infectious at time $t$. Sample a Uniform(⌊t⌋, ⌈t⌉) candidate infection time, $\hat{t}$, a $\hat{t} - \text{Gamma}(n_1, n_1\sigma)_{[0,\,\hat{t} - t_1]}$ distributed exposed period and a $\hat{t} + \text{Gamma}(n_2, n_2\gamma)_{[0,\,T - \hat{t}]}$ distributed infectious period, to obtain a candidate sample path $\hat{x}^j$. Accept this path with probability
\[
\min\left(1, \frac{f(\hat{x}^j, y^j \mid \theta)}{f(x^j, y^j \mid \theta)}\right);
\]

(ii) sample a candidate infection time for the first infected individual, $\hat{t}_1 \sim \text{Uniform}(0, t_2)$, and a $\hat{t}_1 + \text{Gamma}(n_2, n_2\gamma)_{[0,\,T - \hat{t}_1]}$ distributed recovery time. Accept this path with probability
\[
\min\left(1, \frac{f(\hat{x}^j, y^j \mid \theta)}{f(x^j, y^j \mid \theta)}\right).
\]
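A sketch of move (i) follows; it is our own illustration, in which the path object, its propose method and the loglik function are hypothetical interfaces, the acceptance ratio follows the likelihood-ratio form stated above, and truncated gamma draws use CDF inversion.

```python
import numpy as np
from scipy.stats import gamma

def truncated_gamma(shape, rate, upper, rng):
    # Draw Gamma(shape, rate) conditioned on [0, upper] by inverting
    # the CDF over the truncated range (scipy uses scale = 1/rate).
    u = rng.uniform(0.0, gamma.cdf(upper, a=shape, scale=1.0 / rate))
    return gamma.ppf(u, a=shape, scale=1.0 / rate)

def hastings_move_i(path, k, sigma, gamma_, n1, n2, loglik, rng):
    # Re-propose the k-th individual's time of becoming infectious,
    # together with new exposed and infectious periods, then accept via
    # the likelihood ratio. `path` (attributes onset, t1, T and a
    # propose method) and `loglik` are hypothetical, not the thesis code.
    t = path.onset[k]
    t_hat = rng.uniform(np.floor(t), np.ceil(t))   # candidate onset time
    exposed = truncated_gamma(n1, n1 * sigma, t_hat - path.t1, rng)
    infectious = truncated_gamma(n2, n2 * gamma_, path.T - t_hat, rng)
    candidate = path.propose(k, t_hat, exposed, infectious)
    if np.log(rng.random()) < loglik(candidate) - loglik(path):
        return candidate                            # accept
    return path                                     # reject
```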

To also infer the shapes of the exposed and infectious period distributions via DA-MCMC, consider the marginal distributions for the shape parameters. These are

\[
f\left(n_1 \mid \beta, \gamma, \sigma, n_2, x^{1:M}, y^{1:M}, t_1^{1:M}\right) \propto \frac{\left(\prod_{m=1}^{M} \prod_{p=2}^{\bar{Y}^m} n_1 \sigma \Delta_p^m\right)^{n_1} p(n_1)}{(n_1!)^{\sum_{m=1}^{M} \bar{Y}^m - M}},
\]
and
\[
f\left(n_2 \mid \beta, \gamma, \sigma, n_1, x^{1:M}, y^{1:M}, t_1^{1:M}\right) \propto \frac{\left(\prod_{m=1}^{M} \prod_{p=1}^{\bar{Y}^m} n_2 \gamma \delta_p^m\right)^{n_2} p(n_2)}{(n_2!)^{\sum_{m=1}^{M} \bar{Y}^m}};
\]
where $p(n_1)$ and $p(n_2)$ are the prior distributions of the shape parameters. These can be used to infer the shape parameters by normalising and taking Gibbs samples in the $n_1$ and $n_2$ dimensions. For data on only the final epidemic size we can perform DA-MCMC inference on an SI(n)R model by holding γ = 1 fixed and estimating $\beta = R_0$.
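Since $n_1$ and $n_2$ take values on a small grid, the normalisation can be done directly; a minimal sketch (our own, with log_weight standing in for the log of the unnormalised expressions above):

```python
import numpy as np

def sample_shape(n_values, log_weight, rng):
    # Gibbs step for a discrete shape parameter (n1 or n2): evaluate the
    # log of the unnormalised conditional on a grid of candidate shapes,
    # normalise stably and draw one value. `log_weight` is a hypothetical
    # interface returning the log of the expressions above.
    logw = np.array([log_weight(n) for n in n_values])
    probs = np.exp(logw - logw.max())   # log-sum-exp style stabilisation
    probs /= probs.sum()
    return rng.choice(n_values, p=probs)
```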

Bibliography

[1] World Health Organization. WHO guidance for surveillance during an influenza pandemic, 2017.

[2] Health Protection Agency, Health Protection Scotland, Communicable Disease Surveillance Centre Northern Ireland, and National Public Health Service for Wales. The First Few Hundred (FF100) project: epidemiological protocols for comprehensive assessment of early swine influenza cases in the United Kingdom, 2009.

[3] Australian Department of Health and Ageing. Australian health management plan for pandemic influenza, 2014.

[4] E. McLean et al. Pandemic (H1N1) 2009 influenza in the UK: clinical and epidemiological findings from the first few hundred (FF100) cases. Epidemiol. Infect., 138:1531–41, 2010.

[5] A. B. van Gageldonk-Lafeber, M. A. van der Sande, A. Meijer, I. H. Friesema, G. A. Donker, J. Reimerink, et al. Utility of the first few100 approach during the 2009 influenza A(H1N1) pandemic in the Netherlands. Antimicrob. Resist. Infect. Control, 1:30, 2012.

[6] James N. Walker, Joshua V. Ross, and Andrew J. Black. Inference of epidemio- logical parameters from household stratified data. PLOS ONE, 12:1–21, 2017.

[7] T. House, N. Inglis, J. V. Ross, F. Wilson, S. Suleman, O. Edeghere, et al. Estimation of outbreak severity and transmissibility: influenza A(H1N1)pdm09 in households. BMC Medicine, 10:117, 2012.


[8] J. M. McCaw, K. Glass, G. Mercer, and J. McVernon. Pandemic controllability: a concept to guide a proportionate and flexible operational response to future influenza pandemics. Journal of Public Health, 36:5–12, 2013.

[9] Andrew J. Black, Nicholas Geard, James M. McCaw, Jodie McVernon, and Joshua V. Ross. Characterising pandemic severity and transmissibility from data collected during First Few Hundred studies. Epidemics, 19:61–73, 2017.

[10] N. T. J. Bailey. The mathematical theory of infectious diseases and its applications. 2nd edition. Charles Griffin and Company Limited, 1975.

[11] RM Anderson and RM May. Infectious diseases of humans. Oxford University Press, 1991.

[12] Lorenzo Pellis, Neil M. Ferguson, and Christophe Fraser. Epidemic growth rate and household reproduction number in communities of households, schools and workplaces. Journal of Mathematical Biology, 63:691–734, 2011.

[13] J Wallinga and M Lipsitch. How generation intervals shape the relationship between growth rates and reproductive numbers. Proceedings of the Royal Society of London B: Biological Sciences, 274:599–604, 2007.

[14] Alun L. Lloyd. Realistic distributions of infectious periods in epidemic models: Changing patterns of persistence and dynamics. Theoretical Population Biology, 60:59 – 71, 2001.

[15] Helen J Wearing, Pejman Rohani, and Matt J Keeling. Appropriate models for the management of infectious diseases. PLOS Medicine, 2, 2005.

[16] Caroline Fraser, Steven Riley, Roy M. Anderson, and Neil M. Ferguson. Factors that make an infectious disease outbreak controllable. Proceedings of the National Academy of Sciences of the United States of America, 101 16:6146–51, 2004.

[17] P. D. O’Neill and G. O. Roberts. Bayesian inference for partially observed stochastic epidemics. J. R. Stat. Soc. A, 162:121–130, 1999.

[18] S. Cauchemez and N. M. Ferguson. Likelihood-based estimation of continuous-time epidemic models from time-series data: application to measles transmission in London. J. R. Soc. Interface, 5:885–897, 2008.

[19] C. Moler and C. Van Loan. Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM Review, 45:3–49, 2003.

[20] M J Keeling and J V Ross. On methods for studying stochastic disease dynamics. J R Soc Interface, 5:171–181, 2008.

[21] Trevelyan McKinley, Alex R Cook, and Robert Deardon. Inference in epidemic models without likelihoods. The International Journal of Biostatistics, 5, 2009.

[22] Arnaud Doucet and Adam M Johansen. A tutorial on particle filtering and smoothing: Fifteen years later. Handbook of nonlinear filtering, 12:656–704, 2009.

[23] Naif Alzahrani, Peter Neal, Simon E.F. Spencer, Trevelyan J. McKinley, and Panayiota Touloupou. Model selection for time series of count data. Computational Statistics & Data Analysis, 122:33–44, 2018.

[24] N. Chopin, P. E. Jacob, and O. Papaspiliopoulos. SMC2: an efficient algorithm for sequential analysis of state space models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75:397–426, 2012.

[25] Christopher C. Drovandi and Roy A. McCutchan. Alive SMC2: Bayesian model selection for low-count time series models with intractable likelihoods. Biometrics, 72:344–353, 2016.

[26] Andrew Golightly and Theodore Kypraios. Efficient SMC2 schemes for stochastic kinetic models. Statistics and Computing, 28:1215–1230, 2018.

[27] Philip D. O'Neill, David J. Balding, Niels G. Becker, Mervi Eerola, and Denis Mollison. Analyses of infectious disease data from household outbreaks by Markov chain Monte Carlo methods. Journal of the Royal Statistical Society. Series C (Applied Statistics), 49:517–542, 2000.

[28] N. Demiris and P. D. O’Neill. Bayesian inference for epidemics with two levels of mixing. Scand. J. Statist., 32:265–280, 2005.

[29] Peter Neal. Efficient likelihood-free Bayesian computation for household epidemics. Statistics and Computing, 22:1239–1256, 2012.

[30] Chris P Jewell, Theodore Kypraios, Peter Neal, Gareth O Roberts, et al. Bayesian analysis for emerging infectious diseases. Bayesian Analysis, 4:465–496, 2009.

[31] Nikolaos Demiris, Theodore Kypraios, and L. Vanessa Smith. On the epidemic of financial crises. Journal of the Royal Statistical Society: Series A (Statistics in Society), 177:697–723, 2013.

[32] Daniel Merl, Leah R. Johnson, Robert B. Gramacy, and Marc Mangel. A statis- tical framework for the adaptive management of epidemiological interventions. PLOS ONE, 4:1–9, 2009.

[33] Pieter Trapman, Frank Ball, Jean-Stéphane Dhersin, Viet Chi Tran, Jacco Wallinga, and Tom Britton. Inferring R0 in emerging epidemics: the effect of common population structure is small. Journal of the Royal Society Interface, 13:20160288, 2016.

[34] H. Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19:716–723, 1974.

[35] Gideon Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6:461–464, 1978.

[36] Jennifer A Hoeting, David Madigan, Adrian E Raftery, and Chris T Volinsky. Bayesian model averaging: a tutorial. Statistical science, pages 382–401, 1999.

[37] Panayiota Touloupou, Naif Alzahrani, Peter Neal, Simon E. F. Spencer, and Trevelyan J. McKinley. Efficient model comparison techniques for models requiring large scale data augmentation. Bayesian Anal., 13:437–459, 2018.

[38] Peter J. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82:711–732, 1995.

[39] James N. Walker, Andrew J. Black, and Joshua V. Ross. Bayesian model dis- crimination for partially-observed epidemic models. Mathematical Biosciences, 317:108266, 2019.

[40] Erhan Cinlar. Introduction to stochastic processes. Courier Corporation, 2013.

[41] Roger B. Sidje. Expokit: A software package for computing matrix exponentials. ACM Trans. Math. Softw., 24:130–156, 1998.

[42] D. T. Gillespie. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J. Comput. Phys., 22:403–434, 1976.

[43] Frank Ball, Denis Mollison, and Gianpaolo Scalia-Tomba. Epidemics with two levels of mixing. Ann. App. Prob., 7:46–89, 1997.

[44] L. Pellis, F. Ball, and P. Trapman. Reproduction numbers for epidemic models with households and other social structures I: Definition and calculation of R0. Math. Biosci., 235:85–97, 2009.

[45] F. Ball, L. Pellis, and P. Trapman. Reproduction numbers for epidemic models with households and other social structures II: comparisons and implications for vaccination. Math. Biosci., 2016.

[46] E. Goldstein, K. Paur, C. Fraser, E. Kenah, J. Wallinga, and M. Lipsitch. Re- productive numbers, epidemic spread and control in a community of households. Math. Biosci., 221:11–25, 2009.

[47] F. Ball. Stochastic and deterministic models for SIS epidemics among a popula- tion partitioned into households. Math. Biosci., 156:41–67, 1999.

[48] P. K. Pollett and V. T. Stefanov. Path integrals for continuous-time Markov chains. Journal of Applied Probability, 39:901–904, 2002.

[49] Joshua V. Ross, Thomas House, and Matt J. Keeling. Calculation of disease dynamics in a population of households. PLOS ONE, 5:1–9, 2010.

[50] Robert E. Kass and Adrian E. Raftery. Bayes factors. Journal of the American Statistical Association, 90:773–795, 1995.

[51] W. R. Gilks, S. Richardson, and D. J. Spiegelhalter. Markov Chain Monte Carlo in Practice. Chapman and Hall/CRC, 1995.

[52] S. Brooks, A. Gelman, G. L. Jones, and X. Meng. Handbook of Markov Chain Monte Carlo. Biometrics, 69:801–801, 2013.

[53] Z. I. Botev, J. F. Grotowski, and D. P. Kroese. Kernel density estimation via diffusion. Ann. Stat., 38:2916–2957, 2010.

[54] JM Bernardo, MJ Bayarri, JO Berger, AP Dawid, D Heckerman, AFM Smith, and M West. Non-centered parameterisations for hierarchical models and data augmentation. In Bayesian Statistics 7: Proceedings of the Seventh Valencia International Meeting, volume 307. Oxford University Press, USA, 2003.

[55] Peter Neal and Gareth Roberts. A case study in non-centering for data augmen- tation: stochastic epidemics. Statistics and Computing, 15:315–327, 2005.

[56] Arnaud Doucet, Nando De Freitas, and Neil Gordon. An introduction to sequential Monte Carlo methods. In Sequential Monte Carlo methods in practice, pages 3–14. Springer, 2001.

[57] Michael K. Pitt, Ralph dos Santos Silva, Paolo Giordani, and Robert Kohn. On some properties of Markov chain Monte Carlo simulation methods based on the particle filter. Journal of Econometrics, 171:134–151, 2012.

[58] Genshiro Kitagawa. Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. Journal of Computational and Graphical Statistics, 5:1–25, 1996.

[59] Mark Briers, Arnaud Doucet, and Simon Maskell. Smoothing algorithms for state–space models. Annals of the Institute of Statistical Mathematics, 62:61, 2010.

[60] Christophe Andrieu and Gareth O. Roberts. The pseudo-marginal approach for efficient Monte Carlo computations. Ann. Statist., 37:697–725, 2009.

[61] J. N. Walker. Inference methods for first few hundred studies. Master’s thesis, School of Mathematical Sciences, University of Adelaide, 2015.

[62] L. Devroye. Non-Uniform Random Variate Generation. Springer-Verlag, 1986.

[63] Australian Bureau of Statistics. Household and family projections, Australia, 2016 to 2041, 2016.

[64] D. Vats, J. M. Flegal, and G. L. Jones. Multivariate output analysis for Markov chain Monte Carlo. Biometrika, 106:321–337, 2019.

[65] C. M. Pooley, S. C. Bishop, and G. Marion. Using model-based proposals for fast parameter inference on discrete state space, continuous-time Markov processes. J. R. Soc. Interface, 12:20150225, 2015.

[66] Andrew J. Black, Thomas House, M. J. Keeling, and J. V. Ross. Epidemiological consequences of household-based antiviral prophylaxis for pandemic influenza. Journal of The Royal Society Interface, 10:20121019, 2013.

[67] Alessia Melegaro, NJ Gay, and GF Medley. Estimating the transmission parameters of pneumococcal carriage in households. Epidemiology & Infection, 132:433–441, 2004.

[68] T House and M J Keeling. Household structure and infectious disease transmis- sion. Epidemiol Infect, 137:654–661, 2009.

[69] Michiel van Boven, Tjibbe Donker, Mariken van der Lubben, Rianne B. van Gageldonk-Lafeber, Dennis E. te Beest, Marion Koopmans, Adam Meijer, Aura Timen, Corien Swaan, Anton Dalhuijsen, Susan Hahné, Anneke van den Hoek, Peter Teunis, Marianne A. B. van der Sande, and Jacco Wallinga. Transmission of novel influenza A(H1N1) in households with post-exposure antiviral prophylaxis. PLOS ONE, 5:1–10, 2010.

[70] Michiel van Boven, Marion Koopmans, Mirna Du Ry van Beest Holle, Adam Meijer, Don Klinkenberg, Christl A Donnelly, and Hans (J. A. P.) Heesterbeek. Detecting emerging transmissibility of avian influenza virus in human households. PLOS Computational Biology, 3:1–9, 2007.

[71] Matt J. Keeling and J.V. Ross. Optimal prophylactic vaccination in segregated populations: When can we improve on the equalising strategy? Epidemics, 11:7 – 13, 2015.

[72] A. E. Gelfand and D. K. Dey. Bayesian model choice: Asymptotics and exact calculations. Journal of the Royal Statistical Society. Series B (Methodological), 56:501–514, 1994.

[73] Andrew J. Black, Nicholas Geard, James M. McCaw, Jodie McVernon, and Joshua V. Ross. Characterising pandemic severity and transmissibility from data collected during First Few Hundred studies. Epidemics, 19:61–73, 2017.

[74] Andrew J. Black. Importance sampling for partially observed temporal epidemic models. Statistics and Computing, 29:617–630, 2019.

[75] Janet S. Sinsheimer, Marc A. Suchard, and Robert E. Weiss. Bayesian selection of continuous-time Markov chain evolutionary models. Molecular Biology and Evolution, 18:1001–1013, 2001.

[76] Ephraim M Hanks, Mevin B Hooten, Mat W Alldredge, et al. Continuous-time discrete-space models for animal movement. The Annals of Applied Statistics, 9:145–165, 2015.

[77] Youyi Fong, Peter Guttorp, and Janis Abkowitz. Bayesian inference and model choice in a hidden stochastic two-compartment model of hematopoietic stem cell fate decisions. Ann. Appl. Stat., 3:1695–1709, 2009.

[78] DP Kroese, T Taimre, and ZI Botev. Handbook of Monte Carlo Methods. Wiley, 2011.

[79] Mark A. Beaumont. Estimation of population growth or decline in genetically monitored populations. Genetics, 164:1139–1160, 2003.

[80] Naif Alzahrani, Peter Neal, Simon E.F. Spencer, Trevelyan J. McKinley, and Panayiota Touloupou. Model selection for time series of count data. Computational Statistics & Data Analysis, 122:33–44, 2018.

[81] N.J. Gordon, D.J. Salmond, and A.F.M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. Radar and Signal Processing, IEE Proceedings F, 140:107–113, 1993.

[82] Jun S Liu, Rong Chen, and Tanya Logvinenko. A theoretical framework for sequential importance sampling with resampling. In Sequential Monte Carlo methods in practice, pages 225–246. Springer, 2001.

[83] Jun S Liu. Monte Carlo strategies in scientific computing. Springer Science & Business Media, 2008.

[84] T. J. McKinley, J. V. Ross, R. Deardon, and A. R. Cook. Simulation-based Bayesian inference for epidemic models. Comput. Stat. Data Anal., 71:434–447, 2014.

[85] M J Keeling and P Rohani. Modeling Infectious Diseases in Humans and Animals. Princeton University Press, New Jersey, 2007.

[86] Andrew J. Black, Alan J. McKane, Ana Nunes, and Andrea Parisi. Stochastic fluctuations in the susceptible-infective-recovered model with distributed infectious periods. Phys. Rev. E, 80:021922, 2009.

[87] Andrew J Black and Joshua V Ross. Computation of epidemic final size distri- butions. Journal of Theoretical Biology, 367:159–165, 2015.

[88] Frank Ball. A unified approach to the distribution of total size and total area under the trajectory of infectives in epidemic models. Adv. App. Prob., 18:289–310, 1986.

[89] Randal Douc and Olivier Cappé. Comparison of resampling schemes for particle filtering. In ISPA 2005. Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis, 2005, pages 64–69. IEEE, 2005.

[90] Lincoln LH Lau, Benjamin J Cowling, Vicky J Fang, Kwok-Hung Chan, Eric HY Lau, Marc Lipsitch, Calvin KY Cheng, Peter M Houck, Timothy M Uyeki, JS Malik Peiris, et al. Viral shedding and clinical illness in naturally acquired influenza virus infections. The Journal of Infectious Diseases, 201:1509–1516, 2010.

[91] Roy M Anderson, Christophe Fraser, Azra C Ghani, Christl A Donnelly, Steven Riley, Neil M Ferguson, Gabriel M Leung, Tai H Lam, and Anthony J Hedley. Epidemiology, transmission dynamics and control of SARS: the 2002–2003 epidemic. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 359:1091–1105, 2004.

[92] Thomas House, Marc Baguelin, Albert Jan Van Hoek, Peter J White, Zia Sadique, Ken Eames, Jonathan M Read, Niel Hens, Alessia Melegaro, W John Edmunds, et al. Modelling the impact of local reactive school closures on critical care provision during an influenza pandemic. Proceedings of the Royal Society B: Biological Sciences, 278:2753–2760, 2011.

[93] Andrew J. Black and Joshua V. Ross. Estimating a Markovian epidemic model using household serial interval data from the early phase of an epidemic. PLOS ONE, 8:1–8, 2013.

[94] Andrew J. Black and Joshua V. Ross. Contact tracing and antiviral prophylaxis in the early stages of a pandemic: the probability of a major outbreak. Mathematical Medicine and Biology: A Journal of the IMA, 32:331–343, 2014.

[95] Donald B. Rubin. Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann. Statist., 12:1151–1172, 1984.

[96] Pierre Del Moral, Ajay Jasra, Anthony Lee, Christopher Yau, and Xiaole Zhang. The alive particle filter and its use in particle Markov chain Monte Carlo. Stochastic Analysis and Applications, 33:943–974, 2015.

[97] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721–741, 1984.

[98] Siddhartha Chib. Marginal likelihood from the Gibbs output. Journal of the American Statistical Association, 90:1313–1321, 1995.

[99] John Skilling. Nested sampling. AIP Conference Proceedings, 735:395–405, 2004.

[100] R. Salomone, L.F. South, C.C. Drovandi, and D.P. Kroese. Unbiased and consistent nested sampling via sequential Monte Carlo. arXiv preprint arXiv:1805.03924, 2018.

[101] T. J. McKinley, P. Neal, S. E. Spencer, A. Conlan, and L. Tiley. Bayesian model choice for partially observed processes: with application to an experimental transmission study of an infectious disease. Bayesian Analysis, (to appear) 2019.

[102] Clifford M. Hurvich and Chih-Ling Tsai. Regression and time series model selection in small samples. Biometrika, 76:297–307, 1989.

[103] David J. Spiegelhalter, Nicola G. Best, Bradley P. Carlin, and Angelika Van Der Linde. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64:583–639, 2002. BIBLIOGRAPHY 183

[104] C. M. Pooley and G. Marion. Bayesian model evidence as a practical alternative to deviance information criterion. Royal Society Open Science, 5:171519, 2018.

[105] Geoffrey McLachlan and David Peel. Finite mixture models. John Wiley & Sons, 2004.

[106] Tim Hesterberg. Weighted average importance sampling and defensive mixture distributions. Technometrics, 37:185–194, 1995.

[107] C. C. Drovandi and A. N. Pettitt. Estimation of parameters for macroparasite population evolution using approximate Bayesian computation. Biometrics, 67:225–233, 2011.