UvA-DARE (Digital Academic Repository)

Reparameterizing Distributions on Lie Groups

Falorsi, L.; de Haan, P.; Davidson, T.R.; Forré, P.

Publication date: 2019. Document version: final published version. Published in: Proceedings of Machine Learning Research. License: Other.

Citation for published version (APA): Falorsi, L., de Haan, P., Davidson, T. R., & Forré, P. (2019). Reparameterizing Distributions on Lie Groups. Proceedings of Machine Learning Research, 89, 3244-3253. https://arxiv.org/abs/1903.02958




Reparameterizing Distributions on Lie Groups

Luca Falorsi¹ ²   Pim de Haan¹ ³   Tim R. Davidson¹ ²   Patrick Forré¹
¹University of Amsterdam   ²Aiconic   ³UC Berkeley

Abstract

Reparameterizable densities are an important way to learn probability distributions in a deep learning setting. For many distributions it is possible to create low-variance gradient estimators by utilizing a 'reparameterization trick'. Due to the absence of a general reparameterization trick, much research has recently been devoted to extending the number of reparameterizable distributional families. Unfortunately, this research has primarily focused on distributions defined in Euclidean space, ruling out the usage of one of the most influential classes of spaces with non-trivial topologies: Lie groups. In this work we define a general framework to create reparameterizable densities on arbitrary Lie groups, and provide a detailed practitioner's guide to further ease of usage. We demonstrate how to create complex and multimodal distributions on the well-known oriented group of 3D rotations, SO(3), using normalizing flows. Our experiments on applying such distributions in a Bayesian setting for pose estimation on objects with discrete and continuous symmetries showcase their necessity in achieving realistic uncertainty estimates.

Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS) 2019, Naha, Okinawa, Japan. PMLR: Volume 89. Copyright 2019 by the author(s).

1 INTRODUCTION

Formulating observed data points as the outcomes of probabilistic processes has proven to provide a useful framework for designing successful machine learning models. Thus far, the research community has drawn almost exclusively from results in probability theory limited to Euclidean and discrete spaces. Yet, expanding the set of possible spaces under consideration to those with a non-trivial topology has had a longstanding tradition and giant impact on fields such as physics, mathematics, and various engineering disciplines. One significant class describing numerous spaces of fundamental interest is that of Lie groups: groups of symmetry transformations that are simultaneously differentiable manifolds. Lie groups include rotations, translations, scaling, and other geometric transformations, which play an important role in several application domains. Lie group elements are for example utilized to describe the rigid body rotations and movements central in robotics, and form a key ingredient in the formulation of the Standard Model of particle physics. They also provide the building blocks underlying ideas in a plethora of mathematical branches, such as holonomy in Riemannian geometry, root systems in combinatorics, and the Langlands program connecting geometry and number theory.

Many of the most notable recent results in machine learning can be attributed to researchers' ability to combine probability-theoretical concepts with the power of deep learning architectures, e.g. by devising optimization strategies able to directly optimize the parameters of probability distributions from samples through backpropagation. Perhaps the most successful instantiation of this combination has come in the framework of Variational Inference (VI) (Jordan et al., 1999), a Bayesian method used to approximate intractable probability densities through optimization. Crucial for VI is the ability to posit a flexible family of densities, and a way to find the member closest to the true posterior by optimizing the parameters.

These variational parameters are typically optimized using the evidence lower bound (ELBO), a lower bound on the data likelihood. The two main approaches to obtaining estimates of the gradients of the ELBO are the score function (Paisley et al., 2012; Mnih and Gregor, 2014), also known as REINFORCE (Williams, 1992), and the reparameterization trick (Price, 1958; Bonnet, 1964; Salimans et al., 2013; Kingma and Welling, 2013; Rezende et al., 2014). While various works have shown the latter to provide lower-variance estimates, its use is limited by the absence of a general formulation for all variational families. Although in recent years much work has been done in extending this class of reparameterizable families (Ruiz et al., 2016; Naesseth et al., 2017; Figurnov et al., 2018), none of these methods explicitly investigate the case of distributions defined on non-trivial manifolds such as Lie groups.

The principal contribution of this paper is therefore to extend the reparameterization trick to Lie groups. We achieve this by providing a general framework to define reparameterizable densities on Lie groups, under which the well-known Gaussian case of Kingma and Welling (2013) is recovered as a special instantiation. This is done by pushing samples from the Lie algebra into the Lie group using the exponential map, and by observing that the corresponding density change can be analytically computed. We formally describe our approach using results from differential geometry and measure theory.

In the remainder of this work we first cover some preliminary concepts on Lie groups and the reparameterization trick. We then present the general idea underlying our reparameterization trick for Lie groups (ReLie¹), followed by a formal proof. Additionally, we provide an implementation section² in which we study three important examples of Lie groups, deriving the reparameterization details for the n-Torus, T^N, the oriented group of 3D rotations, SO(3), and the group of 3D rotations and translations, SE(3). We conclude by creating complex and multimodal reparameterizable densities on SO(3) using a novel non-invertible normalizing flow, demonstrating applications of our work in both a supervised and an unsupervised setting.

¹Pronounced 'really'.
²Our implementation is available at https://github.com/pimdh/relie

2 PRELIMINARIES

In this section we cover a number of preliminary concepts that will be used in the rest of this paper.

2.1 Lie Groups and Lie Algebras

Lie Group, G: A Lie group G is a group that is also a smooth manifold. This means that we can, at least in local regions, describe group elements continuously with parameters; the number of parameters equals the dimension of the group. We can see (connected) Lie groups as continuous symmetries through which we can continuously traverse between group elements³. Many relevant Lie groups are matrix Lie groups, which can be expressed as subgroups of the Lie group GL(n, R) of invertible square matrices with matrix multiplication as product.

Lie Algebra, g: The Lie algebra g of an N-dimensional Lie group is its tangent space at the identity, which is a vector space of N dimensions. We can see the algebra elements as infinitesimal generators, from which all other elements in the group can be created. For matrix Lie groups we can represent vectors v in the tangent space as matrices v×.

Exponential Map, exp(·): The structure of the algebra creates a map from an element of the algebra to a vector field on the group manifold. This gives rise to the exponential map, exp : g → G, which maps an algebra element to the group element at unit length from the identity along the flow of that vector field. The zero vector is thus mapped to the identity. For compact connected Lie groups, such as SO(3), the exponential map is surjective. Often the map is not injective, so its inverse, the log map, is multi-valued. The exponential map of matrix Lie groups is the matrix exponential.

Adjoint Representation, ad_x: The Lie algebra is equipped with a bilinear bracket [·, ·] : g × g → g, which relates the structure of the group to the structure of the algebra. For example, log(exp(x) exp(y)) can be expressed in terms of the bracket. The bracket of matrix Lie groups is the commutator of the algebra matrices. The adjoint representation of x ∈ g is the matrix representation of the linear map ad_x : g → g : y ↦ [x, y].

³We refer the interested reader to (Hall, 2003).

2.2 Reparameterization Trick

The reparameterization trick (Price, 1958; Bonnet, 1964; Salimans et al., 2013; Kingma and Welling, 2013; Rezende et al., 2014) is a technique to simulate samples z ∼ q(z; θ) as z = T(ε; θ), where ε ∼ s(ε) is independent of θ⁴ and the transformation T(ε; θ) is differentiable w.r.t. θ. It has been shown that this generally results in lower-variance estimates than score function variants, thus leading to more efficient optimization and better convergence (Titsias and Lázaro-Gredilla, 2014; Fan et al., 2015). This reparameterization of samples z allows expectations w.r.t. q(z; θ) to be rewritten as E_{q(z;θ)}[f(z)] = E_{s(ε)}[f(T(ε; θ))], thus making it possible to directly optimize the parameters of q through backpropagation.

Unfortunately, there exists no general approach to defining a reparameterization scheme for arbitrary distributions. Although there has been a significant amount of research into finding ways to extend or generalize the reparameterization trick (Ruiz et al., 2016; Naesseth et al., 2017; Figurnov et al., 2018), to the best of our knowledge no such trick exists for spaces with non-trivial topologies such as Lie groups.

⁴At most weakly dependent (Ruiz et al., 2016).

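For the Gaussian case, the trick can be sketched in a few lines of NumPy (an illustrative example, not code from the paper; the choice f(z) = z² and the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_estimate(mu, sigma, n=200_000):
    """Reparameterized Monte Carlo estimate of d/dmu E[f(z)] for f(z) = z^2
    and z ~ N(mu, sigma^2), using z = T(eps; mu, sigma) = mu + sigma * eps."""
    eps = rng.standard_normal(n)   # eps ~ s(eps), independent of (mu, sigma)
    z = mu + sigma * eps           # z = T(eps; theta)
    # d(z^2)/dmu = 2 z * dz/dmu = 2 z, averaged over samples
    return np.mean(2 * z)

# E[z^2] = mu^2 + sigma^2, so the true gradient w.r.t. mu is 2 * mu.
print(grad_estimate(1.5, 0.7))  # close to 3.0
```

Because the noise is sampled independently of the parameters, the gradient passes through the transformation T rather than through the density itself.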
Figure 2.1: Illustration of the reparameterization trick on a Lie group v. the classic reparameterization trick.

3 REPARAMETERIZING DISTRIBUTIONS ON LIE GROUPS

In this section we first explain our reparameterization trick for distributions on Lie groups (ReLie) by analogy to the classic Gaussian example described in (Kingma and Welling, 2013), as we can consider R^N under addition as a Lie group whose Lie algebra is R^N itself. In the remainder we build an intuition for our general theory, drawing from both geometrical and measure-theoretical concepts, and conclude by stating our formal theorem.

3.1 Reparameterization Steps

The following reparameterization steps (a), (b), (c) are illustrated in Figure 2.1.

(a) We first sample from a reparameterizable distribution r(v|σ) on g. Since the Lie algebra is a real vector space, once we fix a basis this is equivalent to sampling a reparameterizable distribution on R^N. In fact, the basis induces an isomorphism between the Lie algebra and R^N (see Appendix G).

(b) Next we apply the exponential map to v to obtain an element g ∼ qˆ(g|σ) of the group. If the distribution r(v|σ) is concentrated around the origin, then the distribution qˆ(g|σ) will be concentrated around the group identity. In the Gaussian example on R^N this step corresponds to the identity operation, and r = qˆ. As this transformation is in general not the identity operation, we have to account for the possible change in volume using the change of variable formula⁵. Additionally, the exponential map is not necessarily injective, such that multiple points in the algebra can map to the same element in the group. We discuss both complications in more depth in the following subsection.

(c) Finally, to change the location of the distribution qˆ, we left multiply g by another group element g_µ, applying the group-specific operation. In the classic case this corresponds to a translation by µ. If the exponential map is surjective (as in all compact and connected Lie groups), then g_µ can also be parameterized by the exponential map⁶.

⁵In a sense, this is similar to the idea underlying normalizing flows (Rezende and Mohamed, 2015).
⁶Care must be taken however when g_µ is predicted by a neural network, to avoid homeomorphism conflicts as explored in (Falorsi et al., 2018; de Haan and Falorsi, 2018).

3.2 Theory

Geometrical Concepts: When trying to visualize the change in volume when moving from the Lie algebra to the group manifold, we quickly reach the limits of our geometrical intuition. As concepts like volume and distance are no longer intuitively defined, our treatment of integrals and probability densities should naturally be reinspected as well. In mathematics these concepts are formally treated in the fields of differential and Riemannian geometry. To build quantitative models of the above-mentioned concepts, these fields start from the local behavior of the space. This is done through the notion of the Riemannian metric, which formally corresponds to "attaching" to the tangent space T_pG at every point p a scalar product ⟨·, ·⟩_p. This allows one to measure quantities like lengths and angles, and to define a local volume element at infinitesimal scales. Extrapolating from this approach, we are then equipped to measure sets and integrate functions, which corresponds to having a measure on the space⁷. Notice that this measure arises directly from the geometric properties defined by the Riemannian metric. By carefully choosing this metric, we can endow our space with desirable properties. A standard choice for Lie groups is a left-invariant metric, which automatically induces a left-invariant measure ν, called the Haar measure (unique up to a constant):

ν(gE) = ν(E),  ∀g ∈ G, E ∈ B[G],

where gE is the set obtained by applying the group element g to each element of the set E. More intuitively, this implies that left multiplication does not change volume.

(a) f∗(m₀) — no density   (b) g∗(m₀) — with density

Figure 3.1: Example of the two non-injective mappings, f(x) = 1 and g(x) = x² + c, where the blue line denotes the initial Gaussian density of m₀, and the orange line the transformed density. In (a) the pushforward measure of m₀ by f collapses to δ₁, while in (b) the pushforward measure by g has a density.

Measure Theoretical Concepts: Perhaps a more natural way to view this problem comes from measure theory, as we are trying to push a measure on g to a space G with a possibly different topology. Whenever we discuss densities such as r on R^N, it is implicitly stated that we consider a density w.r.t. the Lebesgue measure λ. What this really means is that we are considering a measure m, absolutely continuous (a.c.) w.r.t. λ, written as m ≪ λ⁸. Critically, this is equivalent to stating that there exists a density r such that

m(E) = ∫_E r dλ,  ∀E ∈ B(R^N),

where B(R^N) is the Borel σ-algebra, i.e. the collection of all measurable sets. When applying the exponential map, we obtain a new measure on G⁹, technically called the pushforward measure, exp∗(m)¹⁰. However, G already comes equipped with another measure ν, not necessarily equal to exp∗(m). Hence, if we consider a prior distribution that has a density w.r.t. ν, then in order to compute quantities such as the Kullback-Leibler divergence we also need exp∗(m) ≪ ν, meaning it has a density qˆ w.r.t. ν.

In the case that the exponential map is injective, it can easily be shown that the pushforward measure has a density w.r.t. ν¹¹. That these requirements are not necessarily fulfilled otherwise is best explained through a simple example. Consider f : R → R, s.t. f(x) = 1, ∀x ∈ R; this function is clearly differentiable (see Fig. 3.1(a)). If we take a measure m₀ with a Gaussian density, the pushforward of m₀ by f is a Dirac delta, δ₁, for which it no longer holds that f∗(m₀) ≪ λ. Intuitively, this happens because f is not injective: all points x ∈ R are mapped to 1, such that all the mass of the pushforward measure is concentrated on a single point.

⁷We refer the interested reader to (Lee, 2012).
⁸See definition 1, Appendix C.
⁹We can do this since the exponential map is differentiable, thus continuous, thus measurable.
¹⁰See definition 3, Appendix C.
¹¹In fact, as discussed before, we can always reduce to this case by defining a measure with limited support.
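The sampling steps (a)-(c) of Section 3.1 can be sketched for a concrete matrix Lie group; the snippet below (illustrative only, not the released relie code) uses SO(3) with a Gaussian r(v|σ) on the algebra and SciPy's generic matrix exponential:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)

def hat(v):
    """Basis isomorphism R^3 -> so(3): v |-> v_x (skew-symmetric matrix)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# (a) sample from a reparameterizable distribution r(v|sigma) on the algebra
sigma = 0.3
v = sigma * rng.standard_normal(3)       # v = T(eps; sigma) = sigma * eps

# (b) push the sample into the group with the exponential map
g = expm(hat(v))                         # g ~ q_hat(g|sigma), near the identity

# (c) recenter by left-multiplying with a location element g_mu
g_mu = expm(hat(np.array([0.0, 0.0, np.pi / 2])))
g_z = g_mu @ g                           # final sample on SO(3)

# g_z is a valid rotation: orthogonal with determinant +1
print(np.allclose(g_z.T @ g_z, np.eye(3)), np.isclose(np.linalg.det(g_z), 1.0))
```

The sample g_z stays on the group by construction; only step (b) changes volume, which is what the density correction of the next subsection accounts for.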
Yet, this does not mean that all non-injective mappings cannot be used. Instead, consider g : R → R, s.t. g(x) = x² + c, ∀x ∈ R, with c ∈ R a constant, and m₀ as before (see Fig. 3.1(b)). Although g is clearly not injective, for the pushforward measure by g we still have g∗(m₀) ≪ λ. The key property here is that it is possible to partition the domain of g into the sets (−∞, 0) ∪ (0, ∞) ∪ {0}. On the first two we can apply the change of variable method, as g is injective when restricted to either. The zero set can be ignored, since it has Lebesgue measure 0. This partition idea can be extended to general Lie groups, by proving that the Lie algebra domain can be partitioned into a set of measure zero and a countable union of open sets on which the exponential map is a diffeomorphism. This insight, proven in Lemma E.5, allows us to prove the general theorem:

Theorem 3.1. Let G, g, m, λ, ν be defined as above. Then exp∗(m) ≪ ν with density

p(a) = Σ_{x ∈ g : exp(x) = a} r(x) |J(x)|⁻¹,  a ∈ G,  (1)

where J(x) := det( Σ_{k=0}^∞ ((−1)^k / (k+1)!) (ad_x)^k ).

Proof. See Appendix E.6.

Location Transformation: Having verified that the pushforward measure has a density qˆ, the final step is to recenter the location of the resulting distribution. In practice, this is done by left multiplying the samples by another group element. Technically, this corresponds to applying the left multiplication map

L_{g_µ} : G → G,  g ↦ g_µ g.

Since this map is a diffeomorphism, we can again apply the change of variable method. Moreover, if the measure ν on the group is chosen to be the Haar measure, then, as noted before, applying the left multiplication map leaves it unchanged. In this case the final sample thus has density

g_z ∼ q(g_z|g_µ, σ) = qˆ(g_µ⁻¹ g_z|σ).

Additionally, the entropy of the distribution is invariant w.r.t. left multiplication.

4 IMPLEMENTATION

In this section we present general implementation details, as well as worked-out examples for three interesting and often used groups: the n-Torus, T^N, the oriented group of 3D rotations, SO(3), and the 3D rotation-translation group, SE(3). The worst-case reparameterization computational complexity for a matrix Lie group of dimension n can be shown¹² to be O(n³). However, for many Lie groups closed-form expressions can be derived, drastically reducing the complexity.

¹²See Appendix D.4.

4.1 Computing J(x)

The term J(x), as appearing in the general reparameterization Theorem 3.1, is crucial to compute the change of volume when pushing a density from the algebra to the group. Here we give an intuitive explanation for matrix Lie groups of dimension d with matrices of size n × n. For a formal general explanation, we refer to Appendix D.

The image of the exp map is the d-dimensional manifold, the Lie group G, embedded in R^{n×n}. An infinitesimal variation of the input around a point x ∈ g creates an infinitesimal variation of the output, which is restricted to the d-dimensional manifold G. Infinitesimally this gives rise to a linear map between the tangent spaces at input and output: the Jacobian. The change of volume is the determinant of this Jacobian. To compute it, we express the tangent space at the output in terms of the chosen basis of the Lie algebra. This is possible since a basis for the Lie algebra provides a unique basis for the tangent space throughout G. The Jacobian can be computed analytically for any x, since the exp map of matrix Lie groups is the matrix exponential, for which derivatives are computable. Nevertheless, a general expression for J(x) exists for any Lie group, given in terms of the complex eigenvalue spectrum Sp(·) of the adjoint representation of x, which is a linear map:

Theorem 4.1. Let G be a Lie group and g its Lie algebra. Then J(x) can be computed using the following expression:

J(x) = ∏_{λ ∈ Sp(ad_x), λ ≠ 0} (1 − e^{−λ}) / λ.  (2)

Proof. See Appendix D.4.

4.2 Three Lie Group Examples

The n-Torus, T^N: The n-Torus is the cross-product of n copies of S¹. It is an abelian (commutative) group, which is interesting to consider as it forms an important building block in the theory of Lie groups. The n-Torus has the following matrix representation:

T(α) := blockdiag(B_{α₁}, ..., B_{αₙ}),  B_α := [[cos α, −sin α], [sin α, cos α]],

where α = (α₁, ..., αₙ) ∈ R^n. The basis of the Lie algebra is composed of 2n × 2n block-diagonal matrices with 2 × 2 blocks, s.t. all blocks are 0 except one that is equal to L:

L(α) = blockdiag(α₁L, ..., αₙL),  L := [[0, −1], [1, 0]].

The exponential map is L(α) ↦ T(α), s.t. the preimage can be defined from the relationship

exp(L(α + 2πk)) = exp(L(α)),  k ∈ Z^n.

With J(L(α)) = 1, the pushforward density is defined as

qˆ(T(α)|σ) = Σ_{k ∈ Z^n} r(α + 2kπ | σ).  (3)

It can be observed that there is no change in volume. The resulting distribution on the circle or 1-Torus, which is also the Lie group SO(2), is illustrated in Appendix B.
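A minimal numerical check of the wrapped density on the 1-Torus (an illustrative sketch assuming a Gaussian r on the algebra; σ and the truncation k_max are arbitrary choices):

```python
import numpy as np
from scipy.stats import norm

def q_circle(alpha, mu=0.0, sigma=0.8, k_max=10):
    """Pushforward density on the 1-Torus (SO(2)), eq. (3):
    a Gaussian r(.|mu, sigma) on the algebra R, wrapped by exp.
    The sum over Z is truncated at |k| <= k_max."""
    ks = np.arange(-k_max, k_max + 1)
    return norm.pdf(alpha + 2 * np.pi * ks[:, None], loc=mu, scale=sigma).sum(0)

# J = 1, so the wrapped density should integrate to 1 over one period [0, 2*pi)
grid = np.linspace(0.0, 2 * np.pi, 2000, endpoint=False)
mass = q_circle(grid).mean() * 2 * np.pi   # Riemann sum over the period
print(round(mass, 6))  # ~ 1.0
```

The truncation of the sum over k is harmless here because the Gaussian tails beyond a few periods carry negligible mass.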

The Special Orthogonal Group, SO(3): The Lie group of orientation-preserving three-dimensional rotations has its matrix representation defined as

SO(3) := {R ∈ GL(3, R) : RᵀR = I ∧ det(R) = 1}.

The elements of its Lie algebra, so(3), are represented by the 3D vector space of skew-symmetric 3 × 3 matrices. We choose a basis for the Lie algebra:

L₁ := [[0, 0, 0], [0, 0, −1], [0, 1, 0]],  L₂ := [[0, 0, 1], [0, 0, 0], [−1, 0, 0]],  L₃ := [[0, −1, 0], [1, 0, 0], [0, 0, 0]].

This provides a vector space isomorphism between R³ and so(3), written as [·]× : R³ → so(3). Assuming the decomposition v× = θu×, s.t. θ ∈ R≥0, ‖u‖ = 1, the exponential map is given by the Rodrigues rotation formula (Rodrigues, 1840):

exp(v×) = I + sin(θ) u× + (1 − cos(θ)) u×².  (4)

Since SO(3) is a compact and connected Lie group, this map is surjective; however, it is not injective. The complete preimage of an arbitrary group element can be defined by first using the principal branch log(·) operator to find the unique Lie algebra element closest to the origin, and then observing the relation

exp(θu×) = exp((θ + 2kπ)u×),  k ∈ Z.

In practice, we already have access to such an element of the Lie algebra due to the sampling approach. With

J(v) = (2 − 2 cos ‖v‖) / ‖v‖²,  (5)

the pushforward density is defined almost everywhere as

qˆ(R|σ) = Σ_{k ∈ Z} r( (θ(R) + 2kπ) log(R)/θ(R) | σ ) · (θ(R) + 2kπ)² / (3 − tr(R)),

where R ∈ SO(3) and

θ(R) = ‖log(R)‖ = cos⁻¹( (tr(R) − 1) / 2 ).

Figure 4.1: (a) SO(3); (b) SE(3). Plotted is the log relative error between analytical and numerical estimations of J(x) for the groups SO(3) and SE(3), equations (5), (6). The numerical estimation of the Jacobian is performed by taking small discrete steps of decreasing size (x-axis) in each Lie algebra direction. The error is evaluated at 1000 randomly sampled points.

The Special Euclidean Group, SE(3): This Lie group extends SO(3) by also adding translations. Its matrix representation is given by

{ [[R, u], [0, 1]] : u ∈ R³, R ∈ SO(3) }.

The Lie algebra, se(3), is similarly built by concatenating a skew-symmetric matrix and an R³ vector:

S(ω, u) = [[ω×, u], [0, 0]],  u, ω ∈ R³.

A basis can easily be found by combining the basis elements of so(3) with the canonical basis of R³. The exponential map from algebra to group is defined as

[[ω×, u], [0, 0]] ↦ [[exp(ω×), Vu], [0, 1]],

where exp(·) is defined as in equation (4), and

V = I + ((1 − cos ‖ω‖)/‖ω‖²) ω× + ((‖ω‖ − sin ‖ω‖)/‖ω‖³) ω×².

From the expression of the exponential map it is clear that the preimage can be described similarly to SO(3). With

J(S(ω, u)) = ( (2 − 2 cos ‖ω‖) / ‖ω‖² )²,  (6)

the pushforward density is defined almost everywhere as

qˆ(M|σ) = Σ_{k ∈ Z} r( S( (‖ω‖ + 2kπ) ω/‖ω‖, u ) | σ ) · ( (‖ω‖ + 2kπ)² / (2 − 2 cos ‖ω‖) )²,

where ω and u are such that S(ω, u) = log(M). The log can be easily defined from the log in SO(3).
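The SO(3) quantities above can be cross-checked numerically; the sketch below (not the paper's released code) verifies the Rodrigues formula (4) against the generic matrix exponential, and the Jacobian of equation (5) against the eigenvalue product of Theorem 4.1, using the fact that in the basis L₁, L₂, L₃ the adjoint representation of v is the matrix v× itself:

```python
import numpy as np
from scipy.linalg import expm

def hat(v):
    """[ . ]_x : R^3 -> so(3)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rodrigues(v):
    """Eq. (4): exp(v_x) via the Rodrigues rotation formula."""
    theta = np.linalg.norm(v)
    u = hat(v / theta)
    return np.eye(3) + np.sin(theta) * u + (1.0 - np.cos(theta)) * (u @ u)

def J_closed_form(v):
    """Eq. (5): J(v) = (2 - 2 cos||v||) / ||v||^2."""
    theta = np.linalg.norm(v)
    return (2.0 - 2.0 * np.cos(theta)) / theta**2

def J_spectral(v):
    """Theorem 4.1: product of (1 - e^{-lambda}) / lambda over the
    nonzero eigenvalues of ad_v; for so(3), ad_v has matrix v_x."""
    lams = np.linalg.eigvals(hat(v))
    lams = lams[np.abs(lams) > 1e-9]   # drop the zero eigenvalue
    return np.prod((1.0 - np.exp(-lams)) / lams).real

v = np.array([0.3, -1.1, 0.7])         # arbitrary test point in the algebra
print(np.allclose(rodrigues(v), expm(hat(v))))      # Rodrigues == matrix exp
print(np.isclose(J_spectral(v), J_closed_form(v)))  # Theorem 4.1 == eq. (5)
```

The nonzero eigenvalues of v× are ±i‖v‖, and the conjugate pair multiplies out to the real closed form of equation (5).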

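The closed-form SE(3) exponential can likewise be checked against the generic matrix exponential of the 4 × 4 algebra element (a sketch; the test vectors ω, u are arbitrary):

```python
import numpy as np
from scipy.linalg import expm

def hat(w):
    """[ . ]_x : R^3 -> so(3)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def se3_exp(w, u):
    """Closed-form exp on SE(3): S(w, u) |-> [[exp(w_x), V u], [0, 1]], with
    V = I + (1 - cos t)/t^2 w_x + (t - sin t)/t^3 w_x^2 and t = ||w||."""
    t = np.linalg.norm(w)
    wx = hat(w)
    R = expm(wx)   # could equally use the Rodrigues formula, eq. (4)
    V = (np.eye(3)
         + (1.0 - np.cos(t)) / t**2 * wx
         + (t - np.sin(t)) / t**3 * (wx @ wx))
    M = np.eye(4)
    M[:3, :3], M[:3, 3] = R, V @ u
    return M

w, u = np.array([0.4, -0.2, 0.9]), np.array([1.0, 2.0, -0.5])
S = np.zeros((4, 4))
S[:3, :3], S[:3, 3] = hat(w), u        # algebra element S(w, u)
print(np.allclose(se3_exp(w, u), expm(S)))  # closed form == matrix exponential
```

The same structure carries over to the preimage computation, since only the rotational part ω is responsible for the non-injectivity of the map.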
5 RELATED WORK

Various work has been done in extending the reparameterization trick to an ever growing number of variational families. Figurnov et al. (2018) provide a detailed overview, classifying existing approaches into (1) finding surrogate distributions, which, in the absence of a reparameterization trick for the desired distribution, attempts to use an acceptable alternative distribution that can be reparameterized instead (Nalisnick and Smyth, 2017); (2) implicit reparameterization gradients, or pathwise gradients, introduced into machine learning by Salimans et al. (2013), extended by Graves (2016), and later generalized by Figurnov et al. (2018) using implicit differentiation; and (3) generalized reparameterizations, which try to generalize the standard approach as described in the preliminaries section. Notable are (Ruiz et al., 2016), which relies on defining a suitable invertible standardization function to allow a weak dependence between the noise distribution and the parameters, and the closely related (Naesseth et al., 2017), focusing on rejection sampling.

Figure 5.1: Samples of the Variational Inference model and Markov Chain Monte Carlo of Experiment 6.1: (a) line symmetry; (b) triangular symmetry. Outputs are shifted in the z-dimension for clarity.

All of the techniques above can be used orthogonally to our approach, by defining different distributions over the Lie algebra. While some even allow for reparameterizable densities on spaces with non-trivial topologies¹³, none of them provide the tools to correctly take into account the volume change resulting from pushing densities defined on R^N to arbitrary Lie groups. In that regard, the ideas underlying normalizing flows (NF) (Rezende and Mohamed, 2015) are the closest to our approach, in which probability densities become increasingly complex through the use of invertible maps. Two crucial differences with our problem domain, however, are that the change of variable computation now needs to take into account a transformation of the underlying space, and that the exponential map is generally not injective. NF can be combined with our work to create complex distributions on Lie groups, as is demonstrated in the next section.

Defining and working with distributions on homogeneous spaces, including Lie groups, was previously investigated in (Chirikjian and Kyatkin, 2000; Chirikjian, 2010; Wolfe and Mashner, 2011; Chirikjian, 2011; Chirikjian and Kyatkin, 2016; Ming, 2018). Barfoot and Furgale (2014) also discuss implicitly defining distributions on Lie groups through a distribution on the algebra, focusing on the case of SE(3). However, these works only consider the neighbourhood of the identity, making the exponential map injective but the distribution less expressive. In addition, past work generally only uses Gaussian distributions on the Lie algebra. Cohen and Welling (2015) devised harmonic exponential families, a powerful family of distributions defined on homogeneous spaces. None of these works concentrated on making the distributions reparameterizable. Mallasto and Feragen (2018) defined a wrapped Gaussian process on Riemannian manifolds through the pushforward of the exp map, without providing an expression for the density.

¹³For example, Davidson et al. (2018) reparameterize the von Mises-Fisher distribution, which is defined on S^M, with S¹ isomorphic to the Lie group SO(2).

Figure 5.2: Samples from the conditional SO(3) distribution p(g|x) for different x, where x has the symmetry of rotations of 2π/3 along one axis (Experiment 6.2). Shown are PCA embeddings of the rotation matrices, learned using a Locally Invertible Flow (LI-Flow) with Maximum Likelihood Estimation.

6 EXPERIMENTS

We conduct two experiments on SO(3) to highlight the potential of using complex and multimodal reparameterizable densities on Lie groups¹⁴.

¹⁴See Appendix A for additional details.

Normalizing Flow: To construct multimodal distributions, normalizing flows are used (Dinh et al., 2014; Rezende and Mohamed, 2015):

R^d --f--> R^d --r·tanh--> R^d ≅ g --exp--> G,

where f is an invertible neural network consisting of several coupling layers (Dinh et al., 2014), the tanh(·) function is applied to the norm, and a unit Gaussian is used as the initial distribution. The hyperparameter r determines the non-injectivity of the exp map and thus of the flow. r must be chosen such that the image of r · tanh is contained in the regular region of the exp map. For sufficiently small r the entire flow is invertible, but may not be surjective, while for bigger r the flow is non-injective, with a finite inverse set at each g ∈ G, as exp is a local diffeomorphism and the image of r · tanh has compact support. For details see Appendix E. For such a Locally Invertible Flow (LI-Flow), the likelihood evaluation requires us to branch at the non-injective function and traverse the flow backwards for each element in the preimage.

6.1 Variational Inference

In this experiment we estimate the SO(3) group actions that leave a symmetrical object invariant. This highlights how our method can be used in probabilistic generative models and unsupervised learning tasks. We have a generative model p(x|g) and a uniform prior over the latent variable g. Using Variational Inference, we optimize the Evidence Lower Bound to infer an approximate posterior q(g|x), modeled with an LI-Flow. Results are shown in Fig. 5.1 and compared to Markov Chain Monte Carlo samples. We observe the symmetries are correctly inferred.

6.2 Maximum Likelihood Estimation

To demonstrate the versatility of the reparameterizable Lie group distribution, we learn supervised pose estimation by fitting a multimodal conditional distribution using MLE, as in (Dinh et al., 2017). We created a data set (x, g′ = exp(ε)g) of objects x rotated to pose g, with algebra noise samples ε. The object is symmetric under the subgroup corresponding to rotations of 2π/3 along one axis. We train an LI-Flow model by maximizing E_{x,g′} log p(g′|x). The results in Fig. 5.2 reveal that the LI-Flow successfully learns a multimodal conditional distribution.

7 CONCLUSION

In this paper we have presented a general framework to reparameterize distributions on Lie groups (ReLie) that enables the extension of previous results in reparameterizable densities to arbitrary Lie groups. Furthermore, our method allows for the creation of complex and multimodal distributions through normalizing flows, for which we defined a novel Locally Invertible Flow (LI-Flow) example on the group SO(3). We empirically showed the necessity of LI-Flows in estimating uncertainty in problems containing discrete or continuous symmetries.

This work provides a bridge to leverage the advantages of using deep learning to estimate uncertainty in numerous application domains in which Lie groups play an important role. In future work we plan to further explore the directions outlined in our experimental section towards more challenging instantiations. Specifically, learning rigid body motions from raw point clouds or modeling environment dynamics for applications in optimal control present exciting possible extensions.

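The branching step of LI-Flow likelihood evaluation can also be made concrete for SO(3): the preimage $\exp^{-1}(g)$ intersected with the ball of radius $r$ is the finite set $\{(\theta + 2\pi k)\,n\}$, where $\theta n$ is the principal axis-angle vector, and the likelihood sums the pulled-back density over this set (traversing the flow backwards from each element). The sketch below only enumerates the preimage; inverting $f$ and the change-of-variables Jacobians are omitted, and the helper names are illustrative.

```python
import numpy as np

def skew(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def exp_so3(v):
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.eye(3)
    K = skew(v / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def log_so3(R):
    """Principal logarithm: axis-angle vector with angle in (0, pi)."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return theta * w / (2.0 * np.sin(theta))

def preimages(R, r):
    """Finite set exp^{-1}(R) inside the open ball of radius r.

    All preimages are (theta + 2*pi*k) * n for integer k; a negative
    length is the same rotation about the flipped axis. The LI-Flow
    log-likelihood branches here: each element is mapped backwards
    through the flow and the densities are summed.
    """
    v0 = log_so3(R)
    theta = np.linalg.norm(v0)
    n = v0 / theta
    kmax = int(r / (2.0 * np.pi)) + 1
    return [(theta + 2.0 * np.pi * k) * n
            for k in range(-kmax, kmax + 1)
            if abs(theta + 2.0 * np.pi * k) < r]
```

For $r$ slightly above $2\pi$ each rotation has three preimages (angles $\theta$, $\theta - 2\pi$, and $\theta + 2\pi$ about the same axis), all of which map back to the same group element.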
Acknowledgements

The authors would like to thank Rianne van den Berg and Taco Cohen for suggestions and insightful discussions to improve this manuscript.

References

Alexandrino, M. M. and Bettiol, R. G. (2015). Lie Groups and Geometric Aspects of Isometric Actions. Cham: Springer.
Barfoot, T. D. and Furgale, P. T. (2014). Associating uncertainty with three-dimensional poses for use in estimation problems. IEEE Transactions on Robotics, 30(3):679–693.
Bonnet, G. (1964). Transformations des signaux aléatoires a travers les systemes non linéaires sans mémoire. In Annales des Télécommunications, volume 19, pages 203–220. Springer.
Caron, R. and Traynor, T. (2005). The zero set of polynomials. http://www1.uwindsor.ca/math/sites/uwindsor.ca.math/files/05-03.pdf.
Chirikjian, G. S. (2010). Information-theoretic inequalities on unimodular Lie groups. Journal of Geometric Mechanics, 2(2):119.
Chirikjian, G. S. (2011). Stochastic Models, Information Theory, and Lie Groups, Volume 2: Analytic Methods and Modern Applications. Springer Science & Business Media.
Chirikjian, G. S. and Kyatkin, A. B. (2000). Engineering Applications of Noncommutative Harmonic Analysis: With Emphasis on Rotation and Motion Groups. CRC Press.
Chirikjian, G. S. and Kyatkin, A. B. (2016). Harmonic Analysis for Engineers and Applied Scientists: Updated and Expanded Edition. Courier Dover Publications.
Cohen, T. S. and Welling, M. (2015). Harmonic exponential families on manifolds. ICML.
Davidson, T. R., Falorsi, L., De Cao, N., Kipf, T., and Tomczak, J. M. (2018). Hyperspherical variational auto-encoders. UAI.
de Haan, P. and Falorsi, L. (2018). Topological constraints on homeomorphic auto-encoding. NIPS Workshop.
Dinh, L., Krueger, D., and Bengio, Y. (2014). NICE: Non-linear independent components estimation. ICLR.
Dinh, L., Sohl-Dickstein, J., and Bengio, S. (2017). Density estimation using Real NVP. ICLR.
Duistermaat, J. J. and Kolk, J. A. C. (2000). Lie Groups. Berlin: Springer.
Falorsi, L., de Haan, P., Davidson, T. R., De Cao, N., Weiler, M., Forré, P., and Cohen, T. S. (2018). Explorations in homeomorphic variational auto-encoding. ICML Workshop.
Fan, K., Wang, Z., Beck, J., Kwok, J., and Heller, K. A. (2015). Fast second order stochastic backpropagation for variational inference. In NIPS, pages 1387–1395.
Figurnov, M., Mohamed, S., and Mnih, A. (2018). Implicit reparameterization gradients. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R., editors, Advances in Neural Information Processing Systems 31, pages 439–450. Curran Associates, Inc.
Graves, A. (2016). Stochastic backpropagation through mixture density distributions. arXiv preprint arXiv:1607.05690.
Hall, B. (2003). Lie Groups, Lie Algebras, and Representations: An Elementary Introduction. Graduate Texts in Mathematics. Springer.
Hermann, R. (1980). Differential geometry, Lie groups, and symmetric spaces (Sigurdur Helgason). SIAM Review, 22(4):524–526.
Howe, R. (1989). Review: Sigurdur Helgason, Groups and geometric analysis: integral geometry, invariant differential operators and spherical functions. Bull. Amer. Math. Soc. (N.S.), 20(2):252–256.
Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., and Saul, L. K. (1999). An introduction to variational methods for graphical models. Machine Learning, 37(2):183–233.
Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. CoRR.
Klenke, A. (2014). Probability Theory: A Comprehensive Course. Universitext. Springer, London, second edition.
Lee, J. (2010). Introduction to Topological Manifolds, volume 202. Springer Science & Business Media.
Lee, J. M. (2012). Introduction to Smooth Manifolds. Graduate Texts in Mathematics. Springer, New York, second edition.
Mallasto, A. and Feragen, A. (2018). Wrapped Gaussian process regression on Riemannian manifolds. CVPR.
Milnor, J. (1976). Curvatures of left invariant metrics on Lie groups. Advances in Mathematics, 21(3):293–329.
Ming, Y. (2018). Variational Bayesian data analysis on manifold. Control Theory and Technology, 16(3):212–220.
Mnih, A. and Gregor, K. (2014). Neural variational inference and learning in belief networks. In ICML.
Moler, C. and Van Loan, C. (2003). Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM Review, 45(1):3–49.
Naesseth, C., Ruiz, F., Linderman, S., and Blei, D. (2017). Reparameterization gradients through acceptance-rejection sampling algorithms. AISTATS, pages 489–498.
Nalisnick, E. and Smyth, P. (2017). Stick-breaking variational autoencoders. ICLR.
Paisley, J., Blei, D., and Jordan, M. (2012). Variational Bayesian inference with stochastic search. ICML.
Price, R. (1958). A useful theorem for nonlinear devices having Gaussian inputs. IRE Transactions on Information Theory, 4(2):69–72.
Rezende, D. and Mohamed, S. (2015). Variational inference with normalizing flows. ICML, 37:1530–1538.
Rezende, D. J., Mohamed, S., and Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. ICML, pages 1278–1286.
Rodrigues, O. (1840). Des lois géométriques qui régissent les déplacements d'un système solide dans l'espace: et de la variation des cordonnées provenant de ces déplacements considérés indépendamment des causes qui peuvent les produire. J. Mathématiques Pures Appliquées, 5:380–440.
Ruiz, F. R., Titsias RC AUEB, M., and Blei, D. (2016). The generalized reparameterization gradient. In Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I., and Garnett, R., editors, NIPS, pages 460–468. Curran Associates, Inc.
Salimans, T., Knowles, D. A., et al. (2013). Fixed-form variational posterior approximation through stochastic linear regression. Bayesian Analysis, 8(4):837–882.
Schreiber, U. and Bartels, T. (2018). Volume form. http://ncatlab.org/nlab/show/volume%20form. Revision 10.
Titsias, M. and Lázaro-Gredilla, M. (2014). Doubly stochastic variational Bayes for non-conjugate inference. In ICML, pages 1971–1979.
Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229–256.
Wolfe, K. C. and Mashner, M. (2011). Bayesian fusion on Lie groups. Journal of Algebraic Statistics, 2(1).