Reparameterizing Distributions on Lie Groups

UvA-DARE (Digital Academic Repository) Reparameterizing Distributions on Lie Groups Falorsi, L.; de Haan, P.; Davidson, T.R.; Forré, P. Publication date 2019 Document Version Final published version Published in Proceedings of Machine Learning Research License Other Link to publication Citation for published version (APA): Falorsi, L., de Haan, P., Davidson, T. R., & Forré, P. (2019). Reparameterizing Distributions on Lie Groups. Proceedings of Machine Learning Research, 89, 3244-3253. https://arxiv.org/abs/1903.02958 General rights It is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), other than for strictly personal, individual use, unless the work is under an open content license (like Creative Commons). Disclaimer/Complaints regulations If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library: https://uba.uva.nl/en/contact, or a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible. UvA-DARE is a service provided by the library of the University of Amsterdam (https://dare.uva.nl) Download date:27 Sep 2021 Reparameterizing Distributions on Lie Groups Luca Falorsi1,2 Pim de Haan1,3 Tim R. Davidson1,2 Patrick Forré1 1University of Amsterdam 2Aiconic 3UC Berkeley Abstract non-trivial topology has had a longstanding tradition and giant impact on such fields as physics, mathemat- ics, and various engineering disciplines. One significant Reparameterizable densities are an important class describing numerous spaces of fundamental inter- way to learn probability distributions in a est is that of Lie groups, which are groups of symmetry deep learning setting. For many distributions transformations that are simultaneously differentiable it is possible to create low-variance gradient manifolds. Lie groups include rotations, translations, estimators by utilizing a ‘reparameterization scaling, and other geometric transformations, which trick’. Due to the absence of a general repa- play an important role in several application domains. rameterization trick, much research has re- Lie group elements are for example utilized to describe cently been devoted to extend the number the rigid body rotations and movements central in of reparameterizable distributional families. robotics, and form a key ingredient in the formula- Unfortunately, this research has primarily fo- tion of the Standard Model of particle physics. They cused on distributions defined in Euclidean also provide the building blocks underlying ideas in a space, ruling out the usage of one of the most plethora of mathematical branches such as holonomy in influential class of spaces with non-trivial Riemannian geometry, root systems in Combinatorics, topologies: Lie groups. In this work we define and the Langlands Program connecting geometry and a general framework to create reparameteri- number theory. zable densities on arbitrary Lie groups, and provide a detailed practitioners guide to fur- Many of the most notable recent results in machine ther the ease of usage. We demonstrate how learning can be attributed to researchers’ ability to com- to create complex and multimodal distribu- bine probability theoretical concepts with the power of tions on the well known oriented group of deep learning architectures, e.g. by devising optimiza- 3D rotations, SO(3), using normalizing flows. tion strategies able to directly optimize the parameters Our experiments on applying such distribu- of probability distributions from samples through back- tions in a Bayesian setting for pose estimation propagation. Perhaps the most successful instantiation on objects with discrete and continuous sym- of this combination, has come in the framework of Vari- metries, showcase their necessity in achieving ational Inference (VI) (Jordan et al., 1999), a Bayesian realistic uncertainty estimates. method used to approximate intractable probability densities through optimization. Crucial for VI is the ability to posit a flexible family of densities, and a way 1 INTRODUCTION to find the member closest to the true posterior by optimizing the parameters. Formulating observed data points as the outcomes of These variational parameters are typically optimized probabilistic processes, has proven to provide a useful using the evidence lower bound (ELBO), a lower bound framework to design successful machine learning mod- on the data likelihood. The two main approaches to ob- els. Thus far, the research community has drawn almost taining estimates of the gradients of the ELBO are the exclusively from results in probability theory limited to score function (Paisley et al., 2012; Mnih and Gregor, Euclidean and discrete space. Yet, expanding the set 2014) also known as REINFORCE (Williams, 1992), of possible spaces under consideration to those with a and the reparameterization trick (Price, 1958; Bonnet, 1964; Salimans et al., 2013; Kingma and Welling, 2013; Proceedings of the 22nd International Conference on Ar- Rezende et al., 2014). While various works have shown tificial Intelligence and Statistics (AISTATS) 2019, Naha, the latter to provide lower variance estimates, its use Okinawa, Japan. PMLR: Volume 89. Copyright 2019 by is limited by the absence of a general formulation for the author(s). Reparameterizing Distributions on Lie Groups all variational families. Although in recent years much as product. work has been done in extending this class of reparameterizable families (Ruiz et al., 2016; Naesseth et al., Lie Algebra, g: The Lie algebra g of a N dimen- 2017; Figurnov et al., 2018), none of these methods sional Lie group is its tangent space at the identity, explicitly investigate the case of distributions defined which is a vector space of N dimensions. We can see on non-trivial manifolds such as Lie groups. the algebra elements as infinitesimal generators, from which all other elements in the group can be created. The principal contribution of this paper is therefore For matrix Lie groups we can represent vectors v in to extend the reparameterization trick to Lie groups. the tangent space as matrices v . We achieve this by providing a general framework to × define reparameterizable densities on Lie groups, under Exponential Map, exp(·): The structure of the al- which the well-known Gaussian case of Kingma and gebra creates a map from an element of the algebra to Welling (2013) is recovered as a special instantiation. a vector field on the group manifold. This gives rise to This is done by pushing samples from the Lie algebra the exponential map, exp : g ! G which maps an alge- into the Lie group using the exponential map, and by bra element to the group element at unit length from observing that the corresponding density change can the identity along the flow of the vector field. The zero be analytically computed. We formally describe our vector is thus mapped to the identity. For compact con- approach using results from differential geometry and nected Lie groups, such as SO(3), the exponential map measure theory. is surjective. Often, the map is not injective, so the In the remainder of this work we first cover some pre- inverse, the log map, is multi-valued. The exponential liminary concepts on Lie groups and the reparameteri- map of matrix Lie groups is the matrix exponential. zation trick. We then proceed to present the general idea underlying our reparameterization trick for Lie Adjoint Representation, adx: The Lie algebra is groups (ReLie1), followed by a formal proof. Addition- equipped with with a bracket [·; ·]: g × g ! g, ally, we provide an implementation section2 where we which is bilinear. The bracket relates the structure study three important examples of Lie groups, deriving of the group to structure on the algebra. For exam- the reparameterization details for the n-Torus, TN , the ple, log(exp(x) exp(y)) can be expressed in terms of oriented group of 3D rotations, SO(3), and the group the bracket. The bracket of matrix Lie groups is the of 3D rotations and translations, SE(3). We conclude commutator of the algebra matrices. The adjoint rep- by creating complex and multimodal reparameterizable resentation of x 2 g is the matrix representation of the densities on SO(3) using a novel non-invertible normal- linear map adx : g ! g : y 7! [x; y]. izing flow, demonstrating applications of our work in both a supervised and unsupervised setting. 2.2 Reparameterization Trick The reparameterization trick (Price, 1958; Bonnet, 2 PRELIMINARIES 1964; Salimans et al., 2013; Kingma and Welling, 2013; Rezende et al., 2014) is a technique to simulate samples In this section we first cover a number of preliminary z ∼ q(z; θ) as z = T (; θ), where ∼ s() is indepen- concepts that will be used in the rest of this paper. dent from θ 4, and the transformation T (; θ) should be differentiable w.r.t. θ. It has been shown that this 2.1 Lie Groups and Lie Algebras generally results in lower variance estimates than score function variants, thus leading to more efficient and bet- Lie Group, G: A Lie group, G is a group that is also ter convergence results (Titsias and Lázaro-Gredilla, a smooth manifold. This means that we can, at least 2014; Fan et al., 2015). This reparameterization of in local regions, describe group elements continuously samples z, allows expectations w.r.t. q(z; θ) to be with parameters. The number of parameters equals rewritten as Eq(z;θ)[f(z)] = Es()[f(T (; θ))], thus mak- the dimension of the group. We can see (connected) ing it possible to directly optimize the parameters of a Lie groups as continuous symmetries where we can probability distribution through backpropagation. continuously traverse between group elements3.

Reparameterizing Distributions on Lie Groups

Quasi-Invariant Gaussian Measures for the Cubic Fourth Order Nonlinear Schrödinger Equation

A Brief on Characteristic Functions

ON the ISOMORPHISM PROBLEM for NON-MINIMAL TRANSFORMATIONS with DISCRETE SPECTRUM Nikolai Edeko (Communicated by Bryna Kra) 1. I

Sarnak's Möbius Disjointness for Dynamical Systems with Singular Spectrum and Dissection of Möbius Flow

Dynamical Systems of Probabilistic Origin: Gaussian and Poisson Systems Elise Janvresse, Emmanuel Roy, Thierry De La Rue

An Application of a Functional Inequality to Quasi-Invariance in Inﬁnite Dimensions

Incorporating Parameter Risk Into Derivatives Prices – Bid-Ask Pricing and Calibration

Semidefinite Approximations of Invariant Measures for Polynomial

Arxiv:1811.08174V1 [Math.PR] 20 Nov 2018 Rcse Hc R Qiain Ne Oednmc.Tediﬀeren the Dynamics

Non-Autonomous Random Dynamical Systems: Stochastic Approximation and Rate-Induced Tipping

Topics in Ergodic Theory: Notes 1

Hitting Time Statistics for Observations of Dynamical Systems