Probabilistic Path Hamiltonian Monte Carlo

Vu Dinh *1, Arman Bilge *1,2, Cheng Zhang *1, Frederick A. Matsen IV 1

*Equal contribution. 1 Program in Computational Biology, Fred Hutchinson Cancer Research Center, Seattle, WA, USA. 2 Department of Statistics, University of Washington, Seattle, WA, USA. Correspondence to: Frederick A. Matsen IV <mat-[email protected]>.

Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017. Copyright 2017 by the author(s).

Abstract

Hamiltonian Monte Carlo (HMC) is an efficient and effective means of sampling posterior distributions on Euclidean space, which has been extended to manifolds with boundary. However, some applications require an extension to more general spaces. For example, phylogenetic (evolutionary) trees are defined in terms of both a discrete graph and associated continuous parameters; although one can represent these aspects using a single connected space, this rather complex space is not suitable for existing HMC algorithms. In this paper, we develop Probabilistic Path HMC (PPHMC) as a first step to sampling distributions on spaces with intricate combinatorial structure. We define PPHMC on orthant complexes, show that the resulting Markov chain is ergodic, and provide a promising implementation for the case of phylogenetic trees in open-source software. We also show that a surrogate function to ease the transition across a boundary on which the log-posterior has discontinuous derivatives can greatly improve efficiency.

1. Introduction

Hamiltonian Monte Carlo is a powerful sampling algorithm which has been shown to outperform many existing MCMC algorithms, especially in problems with high-dimensional and correlated distributions (Duane et al., 1987; Neal, 2011). The algorithm mimics the movement of a body balancing potential and kinetic energy by extending the state space to include auxiliary momentum variables and using Hamiltonian dynamics. By traversing long iso-probability contours in this extended state space, HMC is able to move long distances in state space in a single update step, and thus has proved to be more effective than standard MCMC methods in a variety of applications. The method has gained a lot of interest from the scientific community and since then has been extended to tackle the problem of sampling on various geometric structures such as constrained spaces (Lan et al., 2014; Brubaker et al., 2012; Hartmann and Schütte, 2005), on general Hilbert space (Beskos et al., 2011) and on Riemannian manifolds (Girolami and Calderhead, 2011; Wang et al., 2013).

However, these extensions are not yet sufficient to apply to all sampling problems, such as in phylogenetics, the inference of evolutionary trees. Phylogenetics is the study of the evolutionary history and relationships among individuals or groups of organisms. In its statistical formulation it is an inference problem on hypotheses of shared history based on observed heritable traits under a model of evolution. Phylogenetics is an essential tool for understanding biological systems and is an important discipline of computational biology. The Bayesian paradigm is now commonly used to assess support for inferred tree structures or to test hypotheses that can be expressed in phylogenetic terms (Huelsenbeck et al., 2001).

Although the last several decades have seen an explosion of advanced methods for sampling from Bayesian posterior distributions, including HMC, phylogenetics still uses relatively classical Markov chain Monte Carlo (MCMC) based methods. This is in part because the number of possible tree topologies (the labeled graphs describing the branching structure of the evolutionary history) explodes combinatorially as the number of species increases. Also, to represent the phylogenetic relation among a fixed number of species, one needs to specify both the tree topology (a discrete object) and the branch lengths (continuous distances). This composite structure has thus far limited sampling methods to relatively classical Markov chain Monte Carlo (MCMC) based methods.

One path forward is to use a construction of the set of phylogenetic trees as a single connected space composed of Euclidean spaces glued together in a combinatorial fashion (Kim, 2000; Moulton and Steel, 2004; Billera et al., 2001; Gavryushkin and Drummond, 2016) and try to define an HMC-like algorithm thereupon.

Experts in HMC are acutely aware of the need to extend HMC to such spaces with intricate combinatorial structure: Betancourt (2017) describes the extension of HMC to discrete and tree spaces as a major outstanding challenge for the area. However, there are several challenges to defining continuous dynamics-based sampling methods on such spaces. These tree spaces are composed of Euclidean components, one for each discrete tree topology, which are glued together in a way that respects natural similarities between trees. These similarities dictate that more than two such Euclidean spaces should get glued together along a common lower-dimensional boundary. The resulting lack of manifold structure poses a problem for the construction of an HMC sampling method on tree space, since up to now, HMC has only been defined on spaces with differential geometry. Similarly, while the posterior function is smooth within each topology, the function's behavior may be very different between topologies. In fact, there is no general notion of differentiability of the posterior function on the whole tree space, and any scheme to approximate Hamiltonian dynamics needs to take this issue into account.

In this paper, we develop Probabilistic Path Hamiltonian Monte Carlo (PPHMC) as a first step to sampling distributions on spaces with intricate combinatorial structure (Figure 1). After reviewing how the ensemble of phylogenetic trees is naturally described as a geometric structure we identify as an orthant complex (Billera et al., 2001), we define PPHMC for sampling posteriors on orthant complexes along with a probabilistic version of the leapfrog algorithm. This algorithm generalizes previous HMC algorithms by doing classical HMC on the Euclidean components of the orthant complex, but making random choices between alternative paths available at a boundary. We establish that the integrator retains the good theoretical properties of Hamiltonian dynamics in classical settings, namely probabilistic equivalents of time-reversibility, volume preservation, and accessibility, which combined together result in a proof of ergodicity for the resulting Markov chain. Although a direct application of the integrator to the phylogenetic posterior does work, we obtain significantly better performance by using a surrogate function near the boundary between topologies to control approximation error. This approach also addresses a general problem in using Reflective HMC (RHMC; Afshar and Domke, 2015) for energy functions with discontinuous derivatives (for which the accuracy of RHMC is of order $O(\epsilon)$, instead of the standard local error $O(\epsilon^3)$ of HMC on $\mathbb{R}^n$). We provide, validate, and benchmark two independent implementations in open-source software.

Figure 1. PPHMC on the orthant complex of tree space, in which each orthant (i.e. $\mathbb{R}^n_{\geq 0}$) represents the continuous branch length parameters of one tree topology. PPHMC uses the leapfrog algorithm to approximate Hamiltonian dynamics on each orthant, but can move between tree topologies by crossing boundaries between orthants. (a) A single PPHMC step moving through three topologies; each topology change along the path is one NNI move. (b) Because three orthants meet at every top-dimensional boundary, the algorithm must make a choice as to which topology to select. PPHMC uniformly selects a neighboring tree topology when the algorithm hits such a boundary. Here we show three potential outcomes $q_1$, $q_2$ and $q_3$ of running a single step of PPHMC started at position $q$ with momentum $p$.

2. Mathematical framework

2.1. Bayesian learning on phylogenetic tree space

A phylogenetic tree $(\tau, q)$ is a tree graph $\tau$ with $N$ leaves, each of which has a label, and such that each edge $e$ is associated with a non-negative number $q_e$. Trees will be assumed to be bifurcating (internal nodes of degree 3) unless otherwise specified. We denote the number of edges of such a tree by $n = 2N - 3$. Any edge incident to a leaf is called a pendant edge, and any other edge is called an internal edge. Let $\mathcal{T}_N$ be the set of all $N$-leaf phylogenetic trees for which the lengths of pendant edges are bounded from below by some constant $e_0 > 0$. (This lower bound on branch lengths is a technical condition for theoretical development and can be relaxed in practice.)

We will use nearest neighbor interchange (NNI) moves (Robinson, 1971) to formalize which tree topologies are "near" each other. An NNI move is a transformation that collapses an internal edge to zero and then expands the resulting degree 4 vertex into an edge and two degree 3 vertices in a new way (Figure 1a). Two tree topologies $\tau_1$ and $\tau_2$ are called adjacent topologies if there exists a single NNI move that transforms $\tau_1$ into $\tau_2$.

We will parameterize $\mathcal{T}_N$ as Billera-Holmes-Vogtmann (BHV) space (Billera et al., 2001), which we describe as follows. An orthant of dimension $n$ is simply $\mathbb{R}^n_{\geq 0}$; each $n$-dimensional orthant is bounded by a collection of lower dimensional orthant faces.

Given a proper prior distribution with density $\pi_0$ imposed on the branch lengths and on tree topologies, the posterior distribution can be computed as $P(\tau, q) \propto L(\tau, q)\,\pi_0(\tau, q)$.

2.2. Bayesian learning on orthant complexes

With the motivation of phylogenetics in mind, we now describe how the phylogenetic problem sits as a specific case of a more general problem of Bayesian learning on orthant complexes.
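To make the sizes involved concrete, the following minimal sketch tabulates, for an unrooted bifurcating tree with $N$ labeled leaves, the orthant dimension $n = 2N - 3$ defined above, the number of internal edges, the number of NNI-adjacent topologies, and the total number of topologies. The function names are our own and purely illustrative. The topology count uses the standard double-factorial formula $(2N-5)!!$, which is not stated in the text and is assumed here, as is the count of two NNI neighbors per internal edge (consistent with three orthants meeting at each top-dimensional boundary).

```python
from math import prod


def _check(n_leaves: int) -> None:
    # Bifurcating unrooted trees need at least 3 leaves.
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")


def num_edges(n_leaves: int) -> int:
    """Orthant dimension: n = 2N - 3 edges (pendant plus internal)."""
    _check(n_leaves)
    return 2 * n_leaves - 3


def num_internal_edges(n_leaves: int) -> int:
    """Total edges minus the N pendant edges: N - 3."""
    _check(n_leaves)
    return n_leaves - 3


def num_nni_neighbors(n_leaves: int) -> int:
    """Assumption: each internal edge yields two alternative topologies."""
    _check(n_leaves)
    return 2 * num_internal_edges(n_leaves)


def num_topologies(n_leaves: int) -> int:
    """Assumed standard count (2N - 5)!! = 1 * 3 * 5 * ... * (2N - 5)."""
    _check(n_leaves)
    return prod(range(1, 2 * n_leaves - 4, 2))


if __name__ == "__main__":
    for n_leaves in (4, 6, 10, 20):
        print(
            n_leaves,
            num_edges(n_leaves),
            num_nni_neighbors(n_leaves),
            num_topologies(n_leaves),
        )
```

For the six-leaf trees drawn in Figure 1, this gives orthants of dimension 9, each adjacent to 6 others, out of 105 orthants in total. The number of neighbors grows only linearly in $N$ while the number of orthants grows super-exponentially, which is the composite structure that motivates a sampler able to cross orthant boundaries.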
