Bayesian Experimental Design for Implicit Models by Mutual Information Neural Estimation
Steven Kleinegesse¹  Michael U. Gutmann¹

¹School of Informatics, University of Edinburgh, Edinburgh, UK. Correspondence to: Steven Kleinegesse <[email protected]>.

arXiv:2002.08129v3 [stat.ML] 14 Aug 2020

Abstract

Implicit stochastic models, where the data-generation distribution is intractable but sampling is possible, are ubiquitous in the natural sciences. The models typically have free parameters that need to be inferred from data collected in scientific experiments. A fundamental question is how to design the experiments so that the collected data are most useful. The field of Bayesian experimental design advocates that, ideally, we should choose designs that maximise the mutual information (MI) between the data and the parameters. For implicit models, however, this approach is severely hampered by the high computational cost of computing posteriors and maximising MI, in particular when we have more than a handful of design variables to optimise. In this paper, we propose a new approach to Bayesian experimental design for implicit models that leverages recent advances in neural MI estimation to deal with these issues. We show that training a neural network to maximise a lower bound on MI allows us to jointly determine the optimal design and the posterior. Simulation studies illustrate that this gracefully extends Bayesian experimental design for implicit models to higher design dimensions.

1. Introduction

Many processes in nature can be described by a parametric statistical model from which we can simulate data. When the corresponding data-generating distribution is intractable we refer to it as an implicit model. These models are abundant in science and engineering, having been used, for instance, in high-energy physics (Agostinelli et al., 2003), cosmology (M. Schafer & Freeman, 2012), epidemiology (Corander et al., 2017), cell biology (Vo et al., 2015) and pharmacokinetics (Donnet & Samson, 2013). Usually, we wish to infer the free parameters θ of the implicit model in order to understand the underlying natural process better or to predict some future events. Since the likelihood function is intractable for implicit models, we have to resort to likelihood-free inference methods such as approximate Bayesian computation (for recent reviews see e.g. Lintusaari et al., 2017; Sisson et al., 2018).

While considerable research effort has focused on developing efficient likelihood-free inference methods (e.g. Papamakarios et al., 2019; Chen & Gutmann, 2019; Gutmann & Corander, 2016; Papamakarios & Murray, 2016; Ong et al., 2018), the quality of the estimated parameters θ ultimately depends on the quality of the data y that are available for inference in the first place. We here consider the scenario where we have control over experimental designs d that affect the data collection process. For example, these might be the spatial location or time at which we take measurements, or they might be the stimulus that is used to perturb the natural system. Our overall goal is then to find experimental designs d that yield the most information about the model parameters.

In Bayesian experimental design (BED), we construct and optimise a utility function U(d) that indicates the value of a design d. A choice for the utility function that is firmly rooted in information theory is the mutual information (MI) I(θ; y | d) between model parameters and data,

    I(\theta; y \mid d) = \mathbb{E}_{p(\theta, y \mid d)} \left[ \log \frac{p(\theta \mid y, d)}{p(\theta)} \right],    (1)

where p(θ | y, d) is the posterior distribution of θ given data y that were obtained with design d, and p(θ) is the prior belief of θ.¹ MI describes the expected uncertainty reduction in the model parameters θ when collecting data with experimental design d, which makes it an effective utility function for BED. Since MI depends on the full posterior p(θ | y, d), it is sensitive to deviations from the prior, e.g. nonlinear correlations or multi-modality, that other utility functions do not capture (Ryan et al., 2016).

¹It is typically assumed that p(θ) is not affected by d.

Although a quantity with highly desirable properties, MI is notoriously difficult to compute and maximise. The key problems in the context of implicit models are 1) that the posteriors are hard to estimate and that obtaining samples from them is expensive as well; 2) that the functional relationship between the designs and the MI is unknown and that (approximate) gradients are generally not available either. The first problem made the community turn to approximate Bayesian computation and less powerful utility functions, e.g. based on the posterior variance (Drovandi & Pettitt, 2013), or approximations with kernel density estimation (Price et al., 2016). More recently, Kleinegesse & Gutmann (2019) have shown that likelihood-free inference by density ratio estimation (Thomas et al., 2016) can be used to estimate the posterior and the mutual information at the same time, which alleviates the first problem to some extent. The second problem has been addressed by using gradient-free optimisation techniques such as grid search, sampling (Müller, 1999), evolutionary algorithms (Price et al., 2018), Gaussian-process surrogate modelling (Overstall & McGree, 2018) and Bayesian optimisation (Kleinegesse & Gutmann, 2019). However, these approaches generally do not scale well with the dimensionality of the design variables d (e.g. Spall, 2005).

In BED, our primary goal is to find the designs d that maximise the MI, not to estimate the MI to high accuracy. Rather than spending resources on estimating MI accurately for non-optimal designs, a potentially more cost-effective approach is thus to relax the problem and to determine the designs that maximise a lower bound of the MI instead, while tightening the bound at the same time. In our paper, we take this approach and show that for a large class of implicit models, the lower bound can be tightened and maximised by gradient ascent, which addresses the aforementioned scalability issues. Our approach leverages the Mutual Information Neural Estimation (MINE) method of Belghazi et al. (2018) to perform BED, and we thus call it MINEBED.² We show later on that, in addition to gradient-based experimental design, MINEBED also provides us with an approximation of the posterior p(θ | y, d), so that no separate and oftentimes expensive likelihood-free inference step is needed.

²The research code for this paper is available at: https://github.com/stevenkleinegesse/minebed

Related Work. Foster et al. (2019a) have recently considered the use of MI lower bounds for experimental design. Their approach is based on variational approximations to the posterior and likelihood. While the authors note that this introduces a bias, in practice this may actually be acceptable. However, the optimal designs were determined by Bayesian optimisation and, like the aforementioned previous work, this approach may thus also suffer from scalability problems. The authors rectify this in a follow-up paper (Foster et al., 2019b), but their experiments, unlike our paper, focus on the explicit setting where likelihood functions are tractable or approximations are available. In theory, their method is also applicable in the likelihood-free setting.

2. The MINEBED Method

We here show how to perform experimental design for implicit models by maximising a lower bound of the mutual information (MI) between model parameters θ and data y. The main properties of our approach are that, for a large and interesting class of implicit models, the designs can be found by gradient ascent, and that the method simultaneously yields an estimate of the posterior.

We start with the MINE-f MI lower bound of Belghazi et al. (2018), which we shall call Î(d; ψ),

    \hat{I}(d; \psi) = \mathbb{E}_{p(\theta, y \mid d)} \left[ T_\psi(\theta, y) \right] - e^{-1} \, \mathbb{E}_{p(\theta) p(y \mid d)} \left[ e^{T_\psi(\theta, y)} \right],    (2)

where T_ψ(θ, y) is a neural network parametrised by ψ, with θ and y as input. This is also the lower bound of Nguyen et al. (2010) and the f-GAN KL of Nowozin et al. (2016). We note that while we focus on the above lower bound, our approach could also be applied to other MI lower bounds of e.g. Poole et al. (2019), Foster et al. (2019a) and Foster et al. (2019b). Moreover, comparing the performance of different MI lower bounds is not the aim of this paper, and we do not claim that the particular bound in (2) is superior to others. Importantly, Belghazi et al. (2018) showed that the bound in (2) can be tightened by maximising it with respect to the neural network parameters ψ by gradient ascent, and that for flexible enough neural networks we obtain I(θ; y | d) = max_ψ Î(d; ψ).

Our experimental design problem can thus be formulated as

    d^* = \arg\max_d \left\{ \max_\psi \hat{I}(d; \psi) \right\}.    (3)

The main difficulty is that for implicit models, the expectations in the definition of Î(d; ψ) generally depend in some complex manner on the design variables d, complicating gradient-based optimisation.

2.1. Experimental Design with Gradients

We are interested in computing the derivative ∇_d Î(d; ψ),

    \nabla_d \hat{I}(d; \psi) = \nabla_d \mathbb{E}_{p(\theta, y \mid d)} \left[ T_\psi(\theta, y) \right] - e^{-1} \, \nabla_d \mathbb{E}_{p(\theta) p(y \mid d)} \left[ e^{T_\psi(\theta, y)} \right].    (4)

For implicit models we do not have access to the gradients of the joint distribution p(θ, y | d) and the marginal distribution p(y | d). Thus, we cannot simply pull ∇_d into the integrals defining the expectations and compute the derivatives. For the same reason we cannot use score-function estimators (Mohamed et al., 2019), as they rely on the analytic

…follows the same distribution as θ, which then allows us to follow the procedure described above.
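The MINE-f bound in (2) can be checked numerically on a toy problem. The sketch below (an illustrative assumption, not the paper's experiments) uses a linear-Gaussian model θ ~ N(0,1), y = dθ + ε, for which the density ratio r = p(y|θ,d)/p(y|d) is available in closed form, so we can plug in the optimal critic T* = 1 + log r instead of training a neural network; at this critic the Monte Carlo estimate of (2) should match the analytic MI, 0.5·log(1 + d²).

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(d, n, sigma=1.0):
    """Toy implicit model (illustrative assumption): theta ~ N(0,1), y = d*theta + sigma*eps."""
    theta = rng.standard_normal(n)
    y = d * theta + sigma * rng.standard_normal(n)
    return theta, y

def log_ratio(theta, y, d, sigma=1.0):
    """log p(y|theta,d) - log p(y|d), available in closed form for this toy model."""
    var_m = d ** 2 + sigma ** 2  # marginal variance of y under the prior
    ll_cond = -0.5 * np.log(2 * np.pi * sigma ** 2) - (y - d * theta) ** 2 / (2 * sigma ** 2)
    ll_marg = -0.5 * np.log(2 * np.pi * var_m) - y ** 2 / (2 * var_m)
    return ll_cond - ll_marg

def minef_bound(theta, y, T):
    """Monte Carlo estimate of Eq. (2): E_p[T] - e^{-1} E_{p(theta)p(y|d)}[exp(T)]."""
    joint = T(theta, y).mean()
    theta_shuffled = rng.permutation(theta)  # breaking the pairing gives samples from p(theta)p(y|d)
    marginal = np.exp(T(theta_shuffled, y)).mean()
    return joint - np.exp(-1.0) * marginal

d, n = 1.0, 200_000
theta, y = simulate(d, n)
T_opt = lambda th, yy: 1.0 + log_ratio(th, yy, d)  # optimal critic: T* = 1 + log density ratio
Ib = minef_bound(theta, y, T_opt)
true_mi = 0.5 * np.log(1.0 + d ** 2)  # analytic MI for the linear-Gaussian model
print(f"MINE-f bound: {Ib:.4f}, true MI: {true_mi:.4f}")
```

With any suboptimal critic the same estimator returns a strictly smaller value, which is what makes maximising (2) over ψ a sensible tightening procedure.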
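Section 2.1 notes that ∇_d cannot simply be pulled into the expectations. For simulators that can be reparameterised with design-independent base noise, however, the samples themselves become differentiable in d. The sketch below (hypothetical setup: the same linear-Gaussian toy model, a frozen hand-picked bilinear critic T(θ, y) = wθy + b rather than a trained network) writes y(d) = dθ + ε with fixed (θ, ε) and differentiates the Monte Carlo estimate of (2) through the samples, checking the pathwise gradient against a finite difference and then running gradient ascent on the design alone.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
theta = rng.standard_normal(n)  # prior samples, held fixed across design updates
eps = rng.standard_normal(n)    # base noise: y(d) = d*theta + eps is differentiable in d
perm = rng.permutation(n)       # fixed shuffle for the product-of-marginals expectation

w, b = 0.3, 0.0                 # frozen, hand-picked bilinear critic T(theta, y) = w*theta*y + b

def bound(d):
    """Monte Carlo estimate of the lower bound in Eq. (2) for the frozen critic."""
    y = d * theta + eps
    joint = (w * theta * y + b).mean()
    marginal = np.exp(w * theta[perm] * y + b).mean()
    return joint - np.exp(-1.0) * marginal

def grad_d(d):
    """Pathwise estimate of Eq. (4): dy/dd = theta, so the gradient flows through the samples."""
    y = d * theta + eps
    E = np.exp(w * theta[perm] * y + b)
    return w * (theta * theta).mean() - np.exp(-1.0) * (w * theta[perm] * theta * E).mean()

d0, h = 0.5, 1e-5
gd = grad_d(d0)
gd_fd = (bound(d0 + h) - bound(d0 - h)) / (2 * h)  # finite-difference sanity check

d = d0
for _ in range(50):             # gradient ascent on the design alone
    d += 0.05 * grad_d(d)
print(f"grad at d0: {gd:.5f} (finite diff: {gd_fd:.5f}), d: {d0} -> {d:.3f}")
```

The full method ascends in ψ and d jointly; freezing the critic here only isolates the design gradient for inspection.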
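The claim that MINEBED yields a posterior approximation as a by-product can also be illustrated: at the optimum of (2) the critic satisfies T* = 1 + log[p(θ, y|d)/(p(θ)p(y|d))], so p(θ) exp(T*(θ, y) − 1) recovers p(θ | y, d). The sketch below (illustrative assumption: the linear-Gaussian toy model again, with the closed-form optimal critic standing in for a trained network) evaluates this on a grid and compares against the analytic Gaussian posterior.

```python
import numpy as np

d, sigma, y_obs = 1.0, 1.0, 1.0   # design, noise scale and one observed data point (all illustrative)

def log_ratio(theta, y):
    """log p(y|theta,d) - log p(y|d) in closed form for the linear-Gaussian toy model."""
    var_m = d ** 2 + sigma ** 2
    ll_cond = -0.5 * np.log(2 * np.pi * sigma ** 2) - (y - d * theta) ** 2 / (2 * sigma ** 2)
    ll_marg = -0.5 * np.log(2 * np.pi * var_m) - y ** 2 / (2 * var_m)
    return ll_cond - ll_marg

grid = np.linspace(-5.0, 5.0, 4001)
T = 1.0 + log_ratio(grid, y_obs)          # optimal critic T* = 1 + log r
log_post = -0.5 * grid ** 2 + (T - 1.0)   # log prior + (T - 1), up to an additive constant
post = np.exp(log_post - log_post.max())
post /= post.sum()                        # normalise on the uniform grid

post_mean = (grid * post).sum()
post_var = ((grid - post_mean) ** 2 * post).sum()
# analytic posterior for this model: N(d*y/(d^2+sigma^2), sigma^2/(d^2+sigma^2)) = N(0.5, 0.5)
print(f"posterior mean: {post_mean:.3f}, variance: {post_var:.3f}")
```

With a trained neural critic the recovered posterior is approximate rather than exact, but the same p(θ) exp(T − 1) construction applies, which is why no separate likelihood-free inference step is needed.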