Probabilistic Path Hamiltonian Monte Carlo


Vu Dinh* (1), Arman Bilge* (1,2), Cheng Zhang* (1), Frederick A. Matsen IV (1)

*Equal contribution. (1) Program in Computational Biology, Fred Hutchinson Cancer Research Center, Seattle, WA, USA. (2) Department of Statistics, University of Washington, Seattle, WA, USA. Correspondence to: Frederick A. Matsen IV <[email protected]>. Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017. Copyright 2017 by the author(s).

Abstract

Hamiltonian Monte Carlo (HMC) is an efficient and effective means of sampling posterior distributions on Euclidean space, which has been extended to manifolds with boundary. However, some applications require an extension to more general spaces. For example, phylogenetic (evolutionary) trees are defined in terms of both a discrete graph and associated continuous parameters; although one can represent these aspects using a single connected space, this rather complex space is not suitable for existing HMC algorithms. In this paper, we develop Probabilistic Path HMC (PPHMC) as a first step to sampling distributions on spaces with intricate combinatorial structure. We define PPHMC on orthant complexes, show that the resulting Markov chain is ergodic, and provide a promising implementation for the case of phylogenetic trees in open-source software. We also show that a surrogate function to ease the transition across a boundary on which the log-posterior has discontinuous derivatives can greatly improve efficiency.

1. Introduction

Hamiltonian Monte Carlo is a powerful sampling algorithm which has been shown to outperform many existing MCMC algorithms, especially in problems with high-dimensional and correlated distributions (Duane et al., 1987; Neal, 2011). The algorithm mimics the movement of a body balancing potential and kinetic energy by extending the state space to include auxiliary momentum variables and using Hamiltonian dynamics. By traversing long iso-probability contours in this extended state space, HMC is able to move long distances in state space in a single update step, and thus has proved to be more effective than standard MCMC methods in a variety of applications. The method has gained a lot of interest from the scientific community and since then has been extended to tackle the problem of sampling on various geometric structures such as constrained spaces (Lan et al., 2014; Brubaker et al., 2012; Hartmann and Schütte, 2005), on general Hilbert space (Beskos et al., 2011) and on Riemannian manifolds (Girolami and Calderhead, 2011; Wang et al., 2013).
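For readers who want the mechanics spelled out, the following is a minimal sketch of a standard HMC update with a leapfrog integrator on Euclidean space. This is generic HMC, not the paper's method; `log_post`, `grad_log_post`, the step size and the number of steps are placeholders.

```python
import numpy as np

def leapfrog(q, p, grad_log_post, eps, n_steps):
    """Leapfrog integration of Hamiltonian dynamics on R^n (unit mass matrix)."""
    q, p = q.copy(), p.copy()
    p += 0.5 * eps * grad_log_post(q)            # initial half-step for momentum
    for _ in range(n_steps - 1):
        q += eps * p                             # full position step
        p += eps * grad_log_post(q)              # full momentum step
    q += eps * p
    p += 0.5 * eps * grad_log_post(q)            # final half-step for momentum
    return q, p

def hmc_step(q, log_post, grad_log_post, eps=0.1, n_steps=20, rng=np.random):
    """One HMC update: sample momentum, integrate, Metropolis-correct."""
    p0 = rng.standard_normal(q.shape)
    q_new, p_new = leapfrog(q, p0, grad_log_post, eps, n_steps)
    # H(q, p) = -log_post(q) + |p|^2 / 2; accept with probability exp(H_old - H_new)
    h_old = -log_post(q) + 0.5 * p0 @ p0
    h_new = -log_post(q_new) + 0.5 * p_new @ p_new
    return q_new if np.log(rng.uniform()) < h_old - h_new else q
```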
However, these extensions are not yet sufficient to apply to all sampling problems, such as in phylogenetics, the inference of evolutionary trees. Phylogenetics is the study of the evolutionary history and relationships among individuals or groups of organisms. In its statistical formulation it is an inference problem on hypotheses of shared history based on observed heritable traits under a model of evolution. Phylogenetics is an essential tool for understanding biological systems and is an important discipline of computational biology. The Bayesian paradigm is now commonly used to assess support for inferred tree structures or to test hypotheses that can be expressed in phylogenetic terms (Huelsenbeck et al., 2001).

Although the last several decades have seen an explosion of advanced methods for sampling from Bayesian posterior distributions, including HMC, phylogenetics still uses relatively classical Markov chain Monte Carlo (MCMC) based methods. This is in part because the number of possible tree topologies (the labeled graphs describing the branching structure of the evolutionary history) explodes combinatorially as the number of species increases. Also, to represent the phylogenetic relation among a fixed number of species, one needs to specify both the tree topology (a discrete object) and the branch lengths (continuous distances). This composite structure has thus far limited sampling methods to relatively classical MCMC-based methods. One path forward is to use a construction of the set of phylogenetic trees as a single connected space composed of Euclidean spaces glued together in a combinatorial fashion (Kim, 2000; Moulton and Steel, 2004; Billera et al., 2001; Gavryushkin and Drummond, 2016) and try to define an HMC-like algorithm thereupon.

Experts in HMC are acutely aware of the need to extend HMC to such spaces with intricate combinatorial structure: Betancourt (2017) describes the extension of HMC to discrete and tree spaces as a major outstanding challenge for the area. However, there are several challenges to defining continuous dynamics-based sampling methods on such spaces. These tree spaces are composed of Euclidean components, one for each discrete tree topology, which are glued together in a way that respects natural similarities between trees. These similarities dictate that more than two such Euclidean spaces should get glued together along a common lower-dimensional boundary. The resulting lack of manifold structure poses a problem for the construction of an HMC sampling method on tree space, since up to now HMC has only been defined on spaces with differential geometry. Similarly, while the posterior function is smooth within each topology, the function's behavior may be very different between topologies. In fact, there is no general notion of differentiability of the posterior function on the whole tree space, and any scheme to approximate Hamiltonian dynamics needs to take this issue into account.

In this paper, we develop Probabilistic Path Hamiltonian Monte Carlo (PPHMC) as a first step to sampling distributions on spaces with intricate combinatorial structure (Figure 1). After reviewing how the ensemble of phylogenetic trees is naturally described as a geometric structure we identify as an orthant complex (Billera et al., 2001), we define PPHMC for sampling posteriors on orthant complexes, along with a probabilistic version of the leapfrog algorithm. This algorithm generalizes previous HMC algorithms by doing classical HMC on the Euclidean components of the orthant complex, but making random choices between alternative paths available at a boundary. We establish that the integrator retains the good theoretical properties of Hamiltonian dynamics in classical settings, namely probabilistic equivalents of time-reversibility, volume preservation, and accessibility, which combined together result in a proof of ergodicity for the resulting Markov chain. Although a direct application of the integrator to the phylogenetic posterior does work, we obtain significantly better performance by using a surrogate function near the boundary between topologies to control approximation error. This approach also addresses a general problem in using Reflective HMC (RHMC; Afshar and Domke, 2015) for energy functions with discontinuous derivatives (for which the accuracy of RHMC is of order O(ε), instead of the standard local error O(ε³) of HMC on R^n). We provide, validate, and benchmark two independent implementations in open-source software.

Figure 1. PPHMC on the orthant complex of tree space, in which each orthant (i.e. R^n_{≥0}) represents the continuous branch length parameters of one tree topology. PPHMC uses the leapfrog algorithm to approximate Hamiltonian dynamics on each orthant, but can move between tree topologies by crossing boundaries between orthants. (a) A single PPHMC step moving through three topologies; each topology change along the path is one NNI move. (b) Because three orthants meet at every top-dimensional boundary, the algorithm must make a choice as to which topology to select. PPHMC uniformly selects a neighboring tree topology when the algorithm hits such a boundary. Here we show three potential outcomes q1, q2 and q3 of running a single step of PPHMC started at position q with momentum p.
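As a toy sketch of the boundary-crossing idea just described (not the paper's reference implementation), the position update below integrates until a branch length reaches zero and then picks uniformly among the topologies that meet at that boundary. The full probabilistic leapfrog in the paper also interleaves momentum half-steps using the gradient of the log-posterior (and, in the improved variant, a surrogate near the boundary); `nni_neighbors` and the flat branch-length indexing are assumptions made here purely for illustration.

```python
import numpy as np

def probabilistic_boundary_step(topology, q, p, eps, nni_neighbors, rng=np.random):
    """Illustrative position update for a PPHMC-style probabilistic leapfrog.

    q: branch-length vector for `topology`; p: momentum. `nni_neighbors(topology, i)`
    is a hypothetical helper returning the topologies reachable by an NNI move across
    the boundary where branch i has length zero.
    """
    q, p = q.astype(float), p.astype(float)
    t_left = float(eps)
    while t_left > 0:
        # time at which each shrinking branch would reach length zero
        with np.errstate(divide="ignore", invalid="ignore"):
            hit = np.where(p < 0, q / -p, np.inf)
        i = int(np.argmin(hit))
        if hit[i] >= t_left:          # no boundary reached within the remaining time
            q += t_left * p
            break
        q += hit[i] * p               # advance exactly onto the boundary q[i] == 0
        q[i] = 0.0
        t_left -= hit[i]
        # three orthants meet here: the current topology and its two NNI neighbors;
        # pick one uniformly at random and continue the trajectory
        candidates = list(nni_neighbors(topology, i)) + [topology]
        topology = candidates[rng.randint(len(candidates))]
        p[i] = -p[i]                  # the (re-labeled) branch now grows away from zero
    return topology, q, p
```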
2. Mathematical framework

2.1. Bayesian learning on phylogenetic tree space

A phylogenetic tree (τ, q) is a tree graph τ with N leaves, each of which has a label, and such that each edge e is associated with a non-negative number q_e. Trees will be assumed to be bifurcating (internal nodes of degree 3) unless otherwise specified. We denote the number of edges of such a tree by n = 2N − 3. Any edge incident to a leaf is called a pendant edge, and any other edge is called an internal edge. Let T_N be the set of all N-leaf phylogenetic trees for which the lengths of pendant edges are bounded from below by some constant e_0 > 0. (This lower bound on branch lengths is a technical condition for theoretical development and can be relaxed in practice.)

We will use nearest neighbor interchange (NNI) moves (Robinson, 1971) to formalize which tree topologies are "near" each other. An NNI move is a transformation that collapses an internal edge to zero and then expands the resulting degree-4 vertex into an edge and two degree-3 vertices in a new way (Figure 1a). Two tree topologies τ1 and τ2 are called adjacent topologies if there exists a single NNI move that transforms τ1 into τ2.

We will parameterize T_N as Billera-Holmes-Vogtmann (BHV) space (Billera et al., 2001), which we describe as follows. An orthant of dimension n is simply R^n_{≥0}; each n-dimensional orthant is bounded by a collection of lower-dimensional orthant faces.

Given a proper prior distribution with density π0 imposed on the branch lengths and on tree topologies, the posterior distribution can be computed as P(τ, q) ∝ L(τ, q) π0(τ, q).

2.2. Bayesian learning on orthant complexes

With the motivation of phylogenetics in mind, we now describe how the phylogenetic problem sits as a specific case of a more general problem of Bayesian learning on orthant complexes.
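To connect this posterior to the HMC machinery, the Hamiltonian on each orthant takes the usual form shown below. This is the generic HMC energy decomposition with the orthant-complex posterior substituted in, not a display copied from the paper.

```latex
% Potential and kinetic energy for HMC on a fixed topology \tau (standard HMC setup)
U(\tau, q) = -\log\big( L(\tau, q)\, \pi_0(\tau, q) \big), \qquad
K(p) = \tfrac{1}{2}\, p^\top p, \qquad
H(\tau, q, p) = U(\tau, q) + K(p).
```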
Recommended publications
  • The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo
Journal of Machine Learning Research 15 (2014) 1593-1623. Submitted 11/11; Revised 10/13; Published 4/14. The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo. Matthew D. Hoffman ([email protected]), Adobe Research, 601 Townsend St., San Francisco, CA 94110, USA; Andrew Gelman ([email protected]), Departments of Statistics and Political Science, Columbia University, New York, NY 10027, USA. Editor: Athanasios Kottas.

Abstract. Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo (MCMC) algorithm that avoids the random walk behavior and sensitivity to correlated parameters that plague many MCMC methods by taking a series of steps informed by first-order gradient information. These features allow it to converge to high-dimensional target distributions much more quickly than simpler methods such as random walk Metropolis or Gibbs sampling. However, HMC's performance is highly sensitive to two user-specified parameters: a step size ε and a desired number of steps L. In particular, if L is too small then the algorithm exhibits undesirable random walk behavior, while if L is too large the algorithm wastes computation. We introduce the No-U-Turn Sampler (NUTS), an extension to HMC that eliminates the need to set a number of steps L. NUTS uses a recursive algorithm to build a set of likely candidate points that spans a wide swath of the target distribution, stopping automatically when it starts to double back and retrace its steps. Empirically, NUTS performs at least as efficiently as (and sometimes more efficiently than) a well tuned standard HMC method, without requiring user intervention or costly tuning runs.
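The "double back" rule mentioned in the abstract has a compact form. The sketch below (with illustrative variable names, assuming an identity mass matrix) shows the U-turn check applied to the two ends of the trajectory.

```python
import numpy as np

def made_u_turn(theta_minus, theta_plus, p_minus, p_plus):
    """NUTS stopping criterion: the trajectory has started to double back when
    either endpoint's momentum points back along the line joining the endpoints."""
    dtheta = theta_plus - theta_minus
    return (dtheta @ p_minus) < 0 or (dtheta @ p_plus) < 0
```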
  • Log-concave Sampling: Metropolis-Hastings Algorithms Are Fast (arXiv:1801.02309)
Journal of Machine Learning Research 20 (2019) 1-42. Submitted 4/19; Revised 11/19; Published 12/19. Log-concave sampling: Metropolis-Hastings algorithms are fast. Raaz Dwivedi ([email protected]), Yuansi Chen ([email protected]), Martin J. Wainwright ([email protected]), Bin Yu ([email protected]); Department of Statistics and Department of Electrical Engineering and Computer Sciences, University of California, Berkeley; Voleon Group, Berkeley. Editor: Suvrit Sra.

Abstract. We study the problem of sampling from a strongly log-concave density supported on R^d, and prove a non-asymptotic upper bound on the mixing time of the Metropolis-adjusted Langevin algorithm (MALA). The method draws samples by simulating a Markov chain obtained from the discretization of an appropriate Langevin diffusion, combined with an accept-reject step. Relative to known guarantees for the unadjusted Langevin algorithm (ULA), our bounds show that the use of an accept-reject step in MALA leads to an exponentially improved dependence on the error-tolerance. Concretely, in order to obtain samples with TV error at most δ for a density with condition number κ, we show that MALA requires O(κ d log(1/δ)) steps from a warm start, as compared to the O(κ² d/δ²) steps established in past work on ULA. We also demonstrate the gains of a modified version of MALA over ULA for weakly log-concave densities. Furthermore, we derive mixing time bounds for the Metropolized random walk (MRW) and obtain a mixing time that is slower than MALA by a factor of O(κ). We provide numerical examples that support our theoretical findings, and demonstrate the benefits of Metropolis-Hastings adjustment for Langevin-type sampling algorithms.
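As a concrete reference for the algorithm being analyzed, here is a minimal sketch of one MALA update: an Euler discretization of the Langevin diffusion followed by an accept-reject correction. Function and variable names are placeholders, not code from the paper.

```python
import numpy as np

def mala_step(x, log_pi, grad_log_pi, step, rng=np.random):
    """One Metropolis-adjusted Langevin (MALA) update targeting pi."""
    # Langevin proposal: drift toward higher density plus Gaussian noise
    mean_fwd = x + step * grad_log_pi(x)
    prop = mean_fwd + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    # accept-reject correction for the asymmetric Gaussian proposal
    mean_rev = prop + step * grad_log_pi(prop)
    log_q_fwd = -np.sum((prop - mean_fwd) ** 2) / (4.0 * step)
    log_q_rev = -np.sum((x - mean_rev) ** 2) / (4.0 * step)
    log_alpha = log_pi(prop) + log_q_rev - log_pi(x) - log_q_fwd
    return prop if np.log(rng.uniform()) < log_alpha else x
```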
  • An Introduction to Data Analysis Using the PyMC3 Probabilistic Programming Framework: A Case Study with Gaussian Mixture Modeling
An introduction to data analysis using the PyMC3 probabilistic programming framework: A case study with Gaussian Mixture Modeling. Shi Xian Liew, Mohsen Afrasiabi, and Joseph L. Austerweil, Department of Psychology, University of Wisconsin-Madison, Madison, WI, USA. August 28, 2019.

Author Note: Correspondence concerning this article should be addressed to Shi Xian Liew, 1202 West Johnson Street, Madison, WI 53706. E-mail: [email protected]. This work was funded by a Vilas Life Cycle Professorship and the VCRGE at University of Wisconsin-Madison with funding from the WARF.

Abstract. Recent developments in modern probabilistic programming have offered users many practical tools of Bayesian data analysis. However, the adoption of such techniques by the general psychology community is still fairly limited. This tutorial aims to provide non-technicians with an accessible guide to PyMC3, a robust probabilistic programming language that allows for straightforward Bayesian data analysis. We focus on a series of increasingly complex Gaussian mixture models – building up from fitting basic univariate models to more complex multivariate models fit to real-world data. We also explore how PyMC3 can be configured to obtain significant increases in computational speed by taking advantage of a machine's GPU, in addition to the conditions under which such acceleration can be expected. All example analyses are detailed with step-by-step instructions and corresponding Python code.

Keywords: probabilistic programming; Bayesian data analysis; Markov chain Monte Carlo; computational modeling; Gaussian mixture modeling

1 Introduction. Over the last decade, there has been a shift in the norms of what counts as rigorous research in psychological science.
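To give a flavor of the kind of model the tutorial builds up, here is a minimal univariate two-component Gaussian mixture in PyMC3. It is a sketch rather than code from the article: the data are synthetic, and it assumes a recent PyMC3 3.x release where the `sigma` keyword and `pm.NormalMixture` are available.

```python
import numpy as np
import pymc3 as pm

# Synthetic data from a two-component mixture (illustrative only)
rng = np.random.RandomState(0)
data = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])

with pm.Model() as gmm:
    w = pm.Dirichlet("w", a=np.ones(2))                 # mixture weights
    mu = pm.Normal("mu", mu=0.0, sigma=5.0, shape=2)    # component means
    sigma = pm.HalfNormal("sigma", sigma=2.0, shape=2)  # component scales
    obs = pm.NormalMixture("obs", w=w, mu=mu, sigma=sigma, observed=data)
    trace = pm.sample(1000, tune=1000, chains=2, random_seed=0)  # NUTS by default
```

In practice one usually also constrains the component means (for example with an ordering transform) to avoid label switching between chains.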
  • Hamiltonian Monte Carlo Swindles
Hamiltonian Monte Carlo Swindles. Dan Piponi, Matthew D. Hoffman, Pavel Sountsov; Google Research.

Abstract. Hamiltonian Monte Carlo (HMC) is a powerful Markov chain Monte Carlo (MCMC) algorithm for estimating expectations with respect to continuous un-normalized probability distributions. MCMC estimators typically have higher variance than classical Monte Carlo with i.i.d. samples due to autocorrelations; most MCMC research tries to reduce these autocorrelations. In this work, we explore a complementary approach to variance reduction based on two classical Monte Carlo "swindles": first, running an auxiliary coupled chain targeting a tractable approximation to the target distribution, and using the auxiliary samples as control variates; and second, generating anti-correlated ("antithetic") samples by running two chains with flipped randomness. Both ideas have been explored

... become indistinguishable (Mangoubi and Smith, 2017; Bou-Rabee et al., 2018). These results were proven in service of proofs of convergence and mixing speed for HMC, but have been used by Heng and Jacob (2019) to derive unbiased estimators from finite-length HMC chains. Empirically, we observe that two HMC chains targeting slightly different stationary distributions still couple approximately when driven by the same randomness. In this paper, we propose methods that exploit this approximate-coupling phenomenon to reduce the variance of HMC-based estimators. These methods are based on two classical Monte Carlo variance-reduction techniques (sometimes called "swindles"): antithetic sampling and control variates. In the antithetic scheme, we induce anti-correlation between two chains by driving the second chain with momentum variables that are negated versions of the momenta from the first chain.
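At the estimator level, the two "swindles" combine samples roughly as in the sketch below. This is illustrative only: it assumes the coupled chains have already been run and that the expectation under the tractable auxiliary distribution is known; function names are not from the paper.

```python
import numpy as np

def antithetic_estimate(f_chain1, f_chain2):
    """Average f over two chains driven with flipped (negated) momenta; if the
    chains are anti-correlated, the average has lower variance than one chain."""
    return 0.5 * (np.mean(f_chain1) + np.mean(f_chain2))

def control_variate_estimate(f_target, f_aux, aux_expectation):
    """Control-variate estimate using a coupled auxiliary chain targeting a
    tractable approximation whose expectation of f is known exactly."""
    return np.mean(np.asarray(f_target) - np.asarray(f_aux)) + aux_expectation
```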
  • Hamiltonian Monte Carlo Within Stan
Hamiltonian Monte Carlo within Stan. Daniel Lee, Columbia University, Statistics Department, [email protected]. BayesComp, mc-stan.org.

Why MCMC?
• Have data. • Have a rich statistical model. • No analytic solution. • (Point estimate not adequate.)

Review: MCMC
• Markov Chain Monte Carlo: the samples form a Markov chain.
• Markov property: Pr(θ_{n+1} | θ_1, ..., θ_n) = Pr(θ_{n+1} | θ_n)
• Invariant distribution: π × Pr = π
• Detailed balance (a sufficient condition), with Pr(θ_{n+1}, A) = ∫_A q(θ_{n+1}, θ_n) dy: π(θ_{n+1}) q(θ_{n+1}, θ_n) = π(θ_n) q(θ_n, θ_{n+1})

Review: RWMH
• Want: samples from the posterior distribution Pr(θ | x).
• Need: some function proportional to the joint model, f(x, θ) ∝ Pr(x, θ).
• Algorithm: given f(x, θ), x, N, Pr(θ_{n+1} | θ_n): for n = 1 to N, sample θ̂ ~ q(θ̂ | θ_{n−1}); with probability α = min(1, f(x, θ̂) / f(x, θ_{n−1})) set θ_n ← θ̂, else θ_n ← θ_{n−1}.

Review: Hamiltonian Dynamics
• (Implicit: d = dimension)
• q = position (d-vector); p = momentum (d-vector)
• U(q) = potential energy; K(p) = kinetic energy
• Hamiltonian system: H(q, p) = U(q) + K(p)
• Equations of motion, for i = 1, ..., d: dq_i/dt = ∂H/∂p_i and dp_i/dt = −∂H/∂q_i
• Kinetic energy usually defined as K(p) = pᵀ M⁻¹ p / 2, giving dq_i/dt = [M⁻¹ p]_i and dp_i/dt = −∂U/∂q_i

Connection to MCMC
• q, the position, is the vector of parameters.
• U(q), the potential energy, is (proportional to) minus the log probability density of the parameters.
• p, the momentum, are augmented variables.
• K(p), the kinetic energy, is calculated.
• Hamiltonian dynamics are used to update q.
  • Mixed Hamiltonian Monte Carlo for Mixed Discrete and Continuous Variables
Mixed Hamiltonian Monte Carlo for Mixed Discrete and Continuous Variables. Guangyao Zhou, Vicarious AI, Union City, CA 94587, USA. [email protected].

Abstract. Hamiltonian Monte Carlo (HMC) has emerged as a powerful Markov Chain Monte Carlo (MCMC) method to sample from complex continuous distributions. However, a fundamental limitation of HMC is that it can not be applied to distributions with mixed discrete and continuous variables. In this paper, we propose mixed HMC (M-HMC) as a general framework to address this limitation. M-HMC is a novel family of MCMC algorithms that evolves the discrete and continuous variables in tandem, allowing more frequent updates of discrete variables while maintaining HMC's ability to suppress random-walk behavior. We establish M-HMC's theoretical properties, and present an efficient implementation with Laplace momentum that introduces minimal overhead compared to existing HMC methods. The superior performances of M-HMC over existing methods are demonstrated with numerical experiments on Gaussian mixture models (GMMs), variable selection in Bayesian logistic regression (BLR), and correlated topic models (CTMs).

1 Introduction. Markov chain Monte Carlo (MCMC) is one of the most powerful methods for sampling from probability distributions. The Metropolis-Hastings (MH) algorithm is a commonly used general-purpose MCMC method, yet is inefficient for complex, high-dimensional distributions because of the random walk nature of its movements. Recently, Hamiltonian Monte Carlo (HMC) [13, 22, 2] has emerged as a powerful alternative to MH for complex continuous distributions due to its ability to follow the curvature of target distributions using gradient information and make distant proposals with high acceptance probabilities.
  • Markov Chain Monte Carlo Methods for Estimating Systemic Risk Allocations
Markov Chain Monte Carlo Methods for Estimating Systemic Risk Allocations. Takaaki Koike ([email protected]) and Marius Hofert, Department of Statistics and Actuarial Science, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada. Received: 24 September 2019; Accepted: 7 January 2020; Published: 15 January 2020.

Abstract: In this paper, we propose a novel framework for estimating systemic risk measures and risk allocations based on Markov Chain Monte Carlo (MCMC) methods. We consider a class of allocations whose jth component can be written as some risk measure of the jth conditional marginal loss distribution given the so-called crisis event. By considering a crisis event as an intersection of linear constraints, this class of allocations covers, for example, conditional Value-at-Risk (CoVaR), conditional expected shortfall (CoES), VaR contributions, and range VaR (RVaR) contributions as special cases. For this class of allocations, analytical calculations are rarely available, and numerical computations based on Monte Carlo (MC) methods often provide inefficient estimates due to the rare-event character of the crisis events. We propose an MCMC estimator constructed from a sample path of a Markov chain whose stationary distribution is the conditional distribution given the crisis event. Efficient constructions of Markov chains, such as the Hamiltonian Monte Carlo and Gibbs sampler, are suggested and studied depending on the crisis event and the underlying loss distribution. The efficiency of the MCMC estimators is demonstrated in a series of numerical experiments.

Keywords: systemic risk measures; conditional Value-at-Risk (CoVaR); capital allocation; copula models; quantitative risk management
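In symbols, the class of allocations described above can be written as below; the notation is chosen here for illustration rather than copied from the paper.

```latex
% jth allocation = a risk measure of the jth marginal loss given the crisis event C,
% where C is an intersection of linear constraints on the loss vector X = (X_1, ..., X_d)
A_j \;=\; \rho_j\!\left( X_j \,\middle|\, X \in C \right), \qquad
C \;=\; \bigcap_{k=1}^{m} \{\, x \in \mathbb{R}^d : a_k^\top x \le b_k \,\},
\qquad \text{e.g. } \mathrm{CoVaR}_j \;=\; \mathrm{VaR}_\alpha\!\left( X_j \,\middle|\, X \in C \right).
```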
  • The Geometric Foundations of Hamiltonian Monte Carlo
Submitted to Statistical Science. The Geometric Foundations of Hamiltonian Monte Carlo. Michael Betancourt, Simon Byrne, Sam Livingstone, and Mark Girolami.

Abstract. Although Hamiltonian Monte Carlo has proven an empirical success, the lack of a rigorous theoretical understanding of the algorithm has in many ways impeded both principled developments of the method and use of the algorithm in practice. In this paper we develop the formal foundations of the algorithm through the construction of measures on smooth manifolds, and demonstrate how the theory naturally identifies efficient implementations and motivates promising generalizations.

Key words and phrases: Markov Chain Monte Carlo, Hamiltonian Monte Carlo, Disintegration, Differential Geometry, Smooth Manifold, Fiber Bundle, Riemannian Geometry, Symplectic Geometry.

The frontier of Bayesian inference requires algorithms capable of fitting complex models with hundreds, if not thousands of parameters, intricately bound together with non-linear and often hierarchical correlations. Hamiltonian Monte Carlo (Duane et al., 1987; Neal, 2011) has proven tremendously successful at extracting inferences from these models, with applications spanning computer science (Sutherland, Póczos and Schneider, 2013; Tang, Srivastava and Salakhutdinov, 2013), ecology (Schofield et al., 2014; Terada, Inoue and Nishihara, 2013), epidemiology (Cowling et al., 2012), linguistics (Husain, Vasishth and Srinivasan, 2014), pharmacokinetics (Weber et al., 2014), physics (Jasche et al., 2010; Porter and Carré,
  • Probabilistic Programming in Python Using PyMC3
Probabilistic programming in Python using PyMC3. John Salvatier (AI Impacts, Berkeley, CA, United States), Thomas V. Wiecki (Quantopian Inc, Boston, MA, United States) and Christopher Fonnesbeck (Department of Biostatistics, Vanderbilt University, Nashville, TN, United States). Submitted 9 September 2015; Accepted 8 March 2016; Published 6 April 2016. Corresponding author: Thomas V. Wiecki, [email protected].

ABSTRACT. Probabilistic programming allows for automatic Bayesian inference on user-defined probabilistic models. Recent advances in Markov chain Monte Carlo (MCMC) sampling allow inference on increasingly complex models. This class of MCMC, known as Hamiltonian Monte Carlo, requires gradient information which is often not readily available. PyMC3 is a new open source probabilistic programming framework written in Python that uses Theano to compute gradients via automatic differentiation as well as compile probabilistic programs on-the-fly to C for increased speed. Contrary to other probabilistic programming languages, PyMC3 allows model specification directly in Python code. The lack of a domain specific language allows for great flexibility and direct interaction with the model. This paper is a tutorial-style introduction to this software package.

Subjects: Data Mining and Machine Learning, Data Science, Scientific Computing and Simulation. Keywords: Bayesian statistic, Probabilistic Programming, Python, Markov chain Monte Carlo, Statistical modeling.

INTRODUCTION. Probabilistic programming (PP) allows for flexible specification and fitting of Bayesian statistical models. PyMC3 is a new, open-source PP framework with an intuitive and readable, yet powerful, syntax that is close to the natural syntax statisticians use to describe models. It features next-generation Markov chain Monte Carlo (MCMC) sampling algorithms such as the No-U-Turn Sampler (NUTS) (Hoffman & Gelman, 2014), a self-tuning variant of Hamiltonian Monte Carlo (HMC) (Duane et al., 1987).
  • Modified Hamiltonian Monte Carlo for Bayesian Inference
Modified Hamiltonian Monte Carlo for Bayesian Inference. Tijana Radivojević (1,2,3), Elena Akhmatskaya (1,4). (1) BCAM - Basque Center for Applied Mathematics; (2) Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory; (3) DOE Agile Biofoundry; (4) IKERBASQUE, Basque Foundation for Science. July 26, 2019.

Abstract. The Hamiltonian Monte Carlo (HMC) method has been recognized as a powerful sampling tool in computational statistics. We show that performance of HMC can be significantly improved by incorporating importance sampling and an irreversible part of the dynamics into a chain. This is achieved by replacing Hamiltonians in the Metropolis test with modified Hamiltonians, and a complete momentum update with a partial momentum refreshment. We call the resulting generalized HMC importance sampler Mix & Match Hamiltonian Monte Carlo (MMHMC). The method is irreversible by construction and further benefits from (i) the efficient algorithms for computation of modified Hamiltonians; (ii) the implicit momentum update procedure and (iii) the multi-stage splitting integrators specially derived for the methods sampling with modified Hamiltonians. MMHMC has been implemented, tested on the popular statistical models and compared in sampling efficiency with HMC, Riemann Manifold Hamiltonian Monte Carlo, Generalized Hybrid Monte Carlo, Generalized Shadow Hybrid Monte Carlo, Metropolis Adjusted Langevin Algorithm and Random Walk Metropolis-Hastings. To make a fair comparison, we propose a metric that accounts for correlations among samples and weights,
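The importance-sampling ingredient mentioned above has a standard generic form, shown below for reference: the usual reweighting identity for a chain that samples with a modified (shadow) Hamiltonian while expectations are wanted under the true Hamiltonian. This is not a formula quoted from the paper.

```latex
% Samples (q_i, p_i) come from a chain with invariant density \propto e^{-\tilde H(q,p)};
% expectations under \propto e^{-H(q,p)} are recovered by importance weighting:
\widehat{\mathbb{E}}[f] \;=\; \frac{\sum_i w_i\, f(q_i)}{\sum_i w_i},
\qquad w_i \;=\; \exp\!\big( \tilde H(q_i, p_i) - H(q_i, p_i) \big).
```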
  • The Metropolis–Hastings Algorithm
The Metropolis–Hastings algorithm. C.P. Robert; Université Paris-Dauphine, University of Warwick, and CREST. arXiv:1504.01896v3 [stat.CO], 27 Jan 2016.

Abstract. This article is a self-contained introduction to the Metropolis-Hastings algorithm, this ubiquitous tool for producing dependent simulations from an arbitrary distribution. The document illustrates the principles of the methodology on simple examples with R codes and provides entries to the recent extensions of the method.

Key words and phrases: Bayesian inference, Markov chains, MCMC methods, Metropolis-Hastings algorithm, intractable density, Gibbs sampler, Langevin diffusion, Hamiltonian Monte Carlo.

1. INTRODUCTION. There are many reasons why computing an integral like I(h) = ∫_X h(x) dπ(x), where dπ is a probability measure, may prove intractable, from the shape of the domain X to the dimension of X (and x), to the complexity of one of the functions h or π. Standard numerical methods may be hindered by the same reasons. Similar difficulties (may) occur when attempting to find the extrema of π over the domain X. This is why the recourse to Monte Carlo methods may prove unavoidable: exploiting the probabilistic nature of π and its weighting of the domain X is often the most natural and most efficient way to produce approximations to integrals connected with π and to determine the regions of the domain X that are more heavily weighted by π. The Monte Carlo approach (Hammersley and Handscomb, 1964; Rubinstein, 1981) emerged with computers, at the end of WWII, as it relies on the ability of producing a large number of realisations of a random variable distributed according to a given distribution, taking advantage of the stabilisation of the empirical average predicted by the Law of Large Numbers.
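For a concrete counterpart to the algorithm the article introduces (the article itself uses R), here is a minimal random-walk Metropolis-Hastings sketch in Python with a symmetric Gaussian proposal; names are illustrative.

```python
import numpy as np

def random_walk_mh(log_pi, x0, n_samples, prop_scale=1.0, rng=np.random):
    """Random-walk Metropolis-Hastings targeting pi (known up to a constant)."""
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_samples,) + x.shape)
    for n in range(n_samples):
        prop = x + prop_scale * rng.standard_normal(x.shape)
        # symmetric proposal: acceptance ratio reduces to pi(prop) / pi(x)
        if np.log(rng.uniform()) < log_pi(prop) - log_pi(x):
            x = prop
        samples[n] = x
    return samples
```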
  • Continuous Relaxations for Discrete Hamiltonian Monte Carlo
Continuous Relaxations for Discrete Hamiltonian Monte Carlo. Yichuan Zhang, Charles Sutton, Amos Storkey (School of Informatics, University of Edinburgh, United Kingdom) and Zoubin Ghahramani (Department of Engineering, University of Cambridge, United Kingdom); [email protected], [email protected], [email protected], [email protected].

Abstract. Continuous relaxations play an important role in discrete optimization, but have not seen much use in approximate probabilistic inference. Here we show that a general form of the Gaussian Integral Trick makes it possible to transform a wide class of discrete variable undirected models into fully continuous systems. The continuous representation allows the use of gradient-based Hamiltonian Monte Carlo for inference, results in new ways of estimating normalization constants (partition functions), and in general opens up a number of new avenues for inference in difficult discrete systems. We demonstrate some of these continuous relaxation inference algorithms on a number of illustrative problems.

1 Introduction. Discrete undirected graphical models have seen wide use in natural language processing [11, 24] and computer vision [19]. Although sophisticated inference algorithms exist for these models, including both exact algorithms and variational approximations, it has proven more difficult to develop discrete Markov chain Monte Carlo (MCMC) methods. Despite much work and many recent advances [3], the most commonly used MCMC methods in practice for discrete models are based on Metropolis-Hastings, the effectiveness of which is strongly dependent on the choice of proposal distribution. An appealing idea is to relax the constraint that the random variables of interest take integral values.
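For reference, the identity behind the Gaussian Integral Trick mentioned above is the standard Gaussian-integral (Hubbard-Stratonovich) relation below; the exact parameterization used in the paper may differ (for example, a diagonal shift is typically added to make the coupling matrix positive definite).

```latex
% For a symmetric positive definite W and a binary/spin vector s, the pairwise term
% can be decoupled by introducing a continuous auxiliary variable x:
\exp\!\Big( \tfrac{1}{2}\, s^\top W s \Big)
\;\propto\;
\int_{\mathbb{R}^d} \exp\!\Big( -\tfrac{1}{2}\, x^\top W^{-1} x + s^\top x \Big)\, dx .
```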