Slice Sampling on Hamiltonian Trajectories
Benjamin Bloem-Reddy [email protected]
John P. Cunningham [email protected]
Department of Statistics, Columbia University

Abstract

Hamiltonian Monte Carlo and slice sampling are amongst the most widely used and studied classes of Markov Chain Monte Carlo samplers. We connect these two methods and present Hamiltonian slice sampling, which allows slice sampling to be carried out along Hamiltonian trajectories, or transformations thereof. Hamiltonian slice sampling clarifies a class of model priors that induce closed-form slice samplers. More pragmatically, inheriting properties of slice samplers, it offers advantages over Hamiltonian Monte Carlo, in that it has fewer tunable hyperparameters and does not require gradient information. We demonstrate the utility of Hamiltonian slice sampling out of the box on problems ranging from Gaussian process regression to Pitman-Yor based mixture models.

1. Introduction

After decades of work in approximate inference and numerical integration, Markov Chain Monte Carlo (MCMC) techniques remain the gold standard for working with intractable probabilistic models, throughout statistics and machine learning. Of course, this gold standard also comes with sometimes severe computational requirements, which has spurred many developments for increasing the efficacy of MCMC. Accordingly, numerous MCMC algorithms have been proposed in different fields and with different motivations, and perhaps as a result the similarities between some popular methods have not been highlighted or exploited. Here we consider two important classes of MCMC methods: Hamiltonian Monte Carlo (HMC) (Neal, 2011) and slice sampling (Neal, 2003).

HMC considers the (negative log) probability of the intractable distribution as the potential energy of a Hamiltonian system, and samples a new point from the distribution by simulating a dynamical trajectory from the current sample point. With careful tuning, HMC exhibits favorable mixing properties in many situations, particularly in high dimensions, due to its ability to take large steps in sample space if the dynamics are simulated for long enough. Proper tuning of HMC parameters can be difficult, and there has been much interest in automating it; examples include Wang et al. (2013); Hoffman & Gelman (2014). Furthermore, better mixing rates associated with longer trajectories can be computationally expensive because each simulation step requires an evaluation or numerical approximation of the gradient of the distribution.

Similar in objective but different in approach is slice sampling, which attempts to sample uniformly from the volume under the target density. Slice sampling has been employed successfully in a wide range of inference problems, in large part because of its flexibility and relative ease of tuning (very few if any tunable parameters). Although efficiently slice sampling from univariate distributions is straightforward, in higher dimensions it is more difficult. Previous approaches include slice sampling each dimension individually, amongst others. One particularly successful approach is to generate a curve, parameterized by a single scalar value, through the high-dimensional sample space. Elliptical slice sampling (ESS) (Murray et al., 2010) is one such approach, generating ellipses parameterized by $\theta \in [0, 2\pi]$. As we show in section 2.2, ESS is a special case of our proposed sampling algorithm.

In this paper, we explore the connections between HMC and slice sampling by observing that the elliptical trajectory used in ESS is the Hamiltonian flow of a Gaussian potential.
This observation is perhaps not surprising – indeed it appears at least implicitly in Neal (2011); Pakman & Paninski (2013); Strathmann et al. (2015) – but here we leverage that observation in a way not previously considered. To wit, this connection between HMC and ESS suggests that we might perform slice sampling along more general Hamiltonian trajectories, a method we introduce under the name Hamiltonian slice sampling (HSS). HSS is of theoretical interest, allowing us to consider model priors that will induce simple closed-form slice samplers. Perhaps more importantly, HSS is of practical utility. As the conceptual offspring of HMC and slice sampling, it inherits the relatively small amount of required tuning from ESS and the ability to take large steps in sample space from HMC techniques.

In particular, we offer the following contributions:

• We clarify the link between two popular classes of MCMC techniques.

• We introduce Hamiltonian slice sampling, a general sampler for target distributions that factor into two components (e.g., a prior and likelihood), where the prior factorizes or can be transformed so as to factorize, and all dependence structure in the target distribution is induced by the likelihood or the intermediate transformation.

• We show that the prior can be either of a form which induces analytical Hamiltonian trajectories, or more generally, of a form such that we can derive such a trajectory via a measure preserving transformation. Notable members of this class include Gaussian process and stick-breaking priors.

• We demonstrate the usefulness of HSS on a range of models, both parametric and nonparametric.

We first review HMC and ESS to establish notation and a conceptual framework. We then introduce HSS generally, followed by a specific version based on a transformation to the unit interval. Finally, we demonstrate the effectiveness and flexibility of HSS on two different popular probabilistic models.

2. Sampling via Hamiltonian dynamics

We are interested in the problem of generating samples of a random variable $f$ from an intractable distribution $\pi^*(f)$, either directly or via a measure preserving transformation $r^{-1}(q) = f$. For clarity, we use $f$ to denote the quantity of interest in its natural space as defined by the distribution $\pi^*$, and $q$ denotes the transformed quantity of interest in Hamiltonian phase space, as described in detail below. We defer further discussion of the map $r(\cdot)$ until section 2.4, but note that in typical implementations of HMC and ESS, $r(\cdot)$ is the identity map, i.e. $f = q$.

2.1. Hamiltonian Monte Carlo

HMC generates MCMC samples from a target distribution, often an intractable posterior distribution, $\pi^*(q) := \frac{1}{Z} \tilde{\pi}(q)$, with normalizing constant $Z$, by simulating the Hamiltonian dynamics of a particle in the potential $U(q) = -\log \tilde{\pi}(q)$. We are free to specify a distribution for the particle's starting momentum $p$, given which the system evolves deterministically in an augmented state space $(q, p)$ according to Hamilton's equations. In particular, at the $i$-th sampling iteration of HMC, the initial conditions of the system are $q_0 = q^{(i-1)}$, the previous sample, and $p_0$, which in most implementations is sampled from $\mathcal{N}(0, M)$. $M$ is a "mass" matrix that may be used to express a priori beliefs about the scaling and dependence of the different dimensions, and for models with high dependence between sampling dimensions, $M$ can greatly affect sampling efficiency. An active area of research is investigating how to adapt $M$ for increased sampling efficiency; for example Girolami & Calderhead (2011); Betancourt et al. (2016). For simplicity, in this work we assume throughout that $M$ is set ahead of time and remains fixed. The resulting Hamiltonian is

$$H(q, p) = -\log \tilde{\pi}(q) + \frac{1}{2} p^T M^{-1} p. \tag{1}$$

When solved exactly, Hamilton's equations generate Metropolis-Hastings (MH) proposals that are accepted with probability 1. In most situations of interest, Hamilton's equations do not have an analytic solution, so HMC is often performed using a numerical integrator, e.g. Störmer-Verlet, which is sensitive to tuning and can be computationally expensive due to evaluation or numerical approximation of the gradient at each step. See Neal (2011) for more details.
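To make these mechanics concrete, the following is a minimal sketch of one HMC transition using the leapfrog (Störmer-Verlet) integrator, assuming $M = I$; the handles log_p and grad_log_p and the settings eps and n_steps are illustrative placeholders, not a canonical implementation.

```python
import numpy as np

def hmc_step(q0, log_p, grad_log_p, eps=0.1, n_steps=20, rng=np.random):
    """One HMC transition targeting exp(log_p), with identity mass matrix M = I,
    so the Hamiltonian of Eq. (1) reduces to H(q, p) = -log_p(q) + 0.5 * p @ p."""
    p0 = rng.standard_normal(q0.shape)        # p0 ~ N(0, I)
    q, p = q0.astype(float), p0.copy()

    # Leapfrog integration of Hamilton's equations: dq/dt = p, dp/dt = grad log_p(q).
    p = p + 0.5 * eps * grad_log_p(q)         # initial half step in momentum
    for _ in range(n_steps - 1):
        q = q + eps * p                       # full step in position
        p = p + eps * grad_log_p(q)           # full step in momentum
    q = q + eps * p
    p = p + 0.5 * eps * grad_log_p(q)         # final half step in momentum

    # MH correction for discretization error; exact dynamics would always accept.
    h0 = -log_p(q0) + 0.5 * p0 @ p0
    h1 = -log_p(q) + 0.5 * p @ p
    return q if np.log(rng.uniform()) < h0 - h1 else q0

# Example usage on a standard bivariate Gaussian.
log_p = lambda q: -0.5 * q @ q
grad_log_p = lambda q: -q
q = np.zeros(2)
samples = []
for _ in range(1000):
    q = hmc_step(q, log_p, grad_log_p)
    samples.append(q)
```

Each of the n_steps leapfrog steps calls grad_log_p once, which is the per-step gradient cost referred to above.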
2.2. Univariate and Elliptical Slice Sampling

Univariate slice sampling (Neal, 2003) generates samples uniformly from the area beneath a density $p(x)$, such that the resulting samples are distributed according to $p(x)$. It proceeds as follows: given a sample $x_0$, a threshold $h \sim U(0, p(x_0))$ is sampled; the next sample $x_1$ is then sampled uniformly from the slice $S := \{x : p(x) > h\}$. When $S$ is not known in closed form, a proxy slice $S'$ can be randomly constructed from operations that leave the uniform distribution on $S$ invariant. In this paper, we use the stepping out and shrinkage procedure, with parameters $w$, the width, and $m$, the step out limit, from Neal (2003); a sketch of this procedure appears after Algorithm 1 below.

ESS (Murray et al., 2010) is a popular sampling algorithm for latent Gaussian models, e.g., Markov random field or Gaussian process (GP) models. For a latent multivariate Gaussian $f \in \mathbb{R}^d$ with prior $\pi(f) = \mathcal{N}(f \mid 0, \Sigma)$, ESS generates MCMC transitions $f \to f'$ by sampling an auxiliary variable $\nu \sim \mathcal{N}(0, \Sigma)$ and slice sampling the likelihood supported on the ellipse defined by

$$f' = f \cos\theta + \nu \sin\theta, \qquad \theta \in [0, 2\pi]. \tag{2}$$

Noting the fact that the solutions to Hamilton's equations in an elliptical potential are ellipses, we may reinterpret ESS. The Hamiltonian induced by $\tilde{\pi}(q) \propto \mathcal{N}(q \mid 0, \Sigma)$ and momentum distribution $p \sim \mathcal{N}(0, M)$ is

$$H(q, p) = \frac{1}{2} q^T \Sigma^{-1} q + \frac{1}{2} p^T M^{-1} p. \tag{3}$$

When $M = \Sigma^{-1}$, Hamilton's equations yield the trajectory $q(t) = q_0 \cos t + \Sigma p_0 \sin t$: since $\nu := \Sigma p_0 \sim \mathcal{N}(0, \Sigma)$, this is exactly the ellipse (2), with the integration time $t$ playing the role of $\theta$.

Algorithm 1 Hamiltonian slice sampling
Input: Current state $f$; differentiable, one-to-one transformation $q := r(f)$ with Jacobian $J(r^{-1}(q))$
Output: A new state $f'$. If $f$ is drawn from $\pi^*$, the marginal distribution of $f'$ is also $\pi^*$.
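The stepping out and shrinkage procedure referenced in section 2.2 can be summarized in a short sketch; this is an illustration of Neal (2003)'s procedure, with a generic density handle p and placeholder defaults for $w$ and $m$, rather than a reference implementation.

```python
import numpy as np

def slice_step(x0, p, w=1.0, m=10, rng=np.random):
    """One univariate slice sampling transition for density p (Neal, 2003)."""
    h = rng.uniform(0.0, p(x0))          # threshold defining S = {x : p(x) > h}

    # Stepping out: randomly place an initial bracket of width w around x0,
    # then expand each end until it leaves the slice (at most m expansions total).
    left = x0 - w * rng.uniform()
    right = left + w
    j = int(m * rng.uniform())           # random split of the step-out budget
    k = (m - 1) - j
    while j > 0 and p(left) > h:
        left -= w
        j -= 1
    while k > 0 and p(right) > h:
        right += w
        k -= 1

    # Shrinkage: sample uniformly from the bracket, shrinking toward x0
    # whenever the proposal falls outside the slice.
    while True:
        x1 = rng.uniform(left, right)
        if p(x1) > h:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1
```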
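Likewise, an ESS transition (Murray et al., 2010) can be written so that the reinterpretation above is explicit: the curve being slice sampled is the Hamiltonian flow of Eq. (3) with $M = \Sigma^{-1}$, traversed in the angle $\theta$ with a shrinking bracket. The arguments Sigma_chol (a Cholesky factor of $\Sigma$) and log_lik are assumed placeholders in this sketch.

```python
import numpy as np

def ess_step(f, Sigma_chol, log_lik, rng=np.random):
    """One elliptical slice sampling transition (Murray et al., 2010).

    The proposal curve f' = f cos(theta) + nu sin(theta) of Eq. (2) is the
    Hamiltonian flow of the Gaussian potential in Eq. (3) with M = Sigma^{-1}.
    """
    nu = Sigma_chol @ rng.standard_normal(f.shape)   # nu ~ N(0, Sigma)
    log_h = log_lik(f) + np.log(rng.uniform())       # log-likelihood threshold

    # Initial angle and shrinking bracket; theta = 0 recovers the current f,
    # so the loop is guaranteed to terminate.
    theta = rng.uniform(0.0, 2.0 * np.pi)
    lo, hi = theta - 2.0 * np.pi, theta
    while True:
        f_new = f * np.cos(theta) + nu * np.sin(theta)
        if log_lik(f_new) > log_h:
            return f_new
        if theta < 0.0:                              # shrink toward theta = 0
            lo = theta
        else:
            hi = theta
        theta = rng.uniform(lo, hi)
```

Note that only the likelihood is evaluated inside the loop: no gradients are needed, which is the property HSS inherits from this family of slice samplers.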