Geometrically Enriched Latent Spaces

Georgios Arvanitidis (MPI for Intelligent Systems, Tübingen), Søren Hauberg (DTU Compute, Lyngby), Bernhard Schölkopf (MPI for Intelligent Systems, Tübingen)

Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021, San Diego, California, USA. PMLR: Volume 130. Copyright 2021 by the author(s).

Abstract

A common assumption in generative models is that the generator immerses the latent space into a Euclidean ambient space. Instead, we consider the ambient space to be a Riemannian manifold, which allows for encoding domain knowledge through the associated Riemannian metric. Shortest paths can then be defined accordingly in the latent space to both follow the learned manifold and respect the ambient geometry. Through careful design of the ambient metric we can ensure that shortest paths are well-behaved even for deterministic generators that otherwise would exhibit a misleading bias. Experimentally we show that our approach improves the interpretability and the functionality of learned representations using both stochastic and deterministic generators.

Figure 1: The proposed shortest path favors the smiling class, while the standard shortest path merely minimizes the distance on the data manifold.

1 Introduction

Unsupervised representation learning has made tremendous progress with generative models such as variational autoencoders (VAEs) (Kingma and Welling, 2014; Rezende et al., 2014) and generative adversarial networks (GANs) (Goodfellow et al., 2014). These, and similar, models provide a flexible and efficient parametrization of the density of observations in an ambient space $\mathcal{X}$ through a typically lower dimensional latent space $\mathcal{Z}$.

While the latent space $\mathcal{Z}$ constitutes a compressed representation of the data, it is by no means unique. Like most other latent variable models, these generative models are subject to identifiability problems, such that different representations can give rise to identical densities (Bishop, 2006). This implies that straight lines in $\mathcal{Z}$ are not shortest paths in any meaningful sense, and therefore do not constitute natural interpolants. To overcome this issue, it has been proposed to endow the latent space with a Riemannian metric such that curve lengths are measured in the ambient observation space $\mathcal{X}$ (Tosi et al., 2014; Arvanitidis et al., 2018). In other words, this ensures that any smooth invertible transformation of $\mathcal{Z}$ does not change the distance between a pair of points, as long as the ambient path in $\mathcal{X}$ remains the same. This approach immediately solves the identifiability problem.

While distances in $\mathcal{X}$ are well-defined and give rise to an identifiable latent representation, they need not be particularly useful. We take inspiration from metric learning (Weinberger et al., 2006; Arvanitidis et al., 2016) and propose to equip the ambient observation space $\mathcal{X}$ with a Riemannian metric and measure curve lengths in latent space accordingly. With this approach it is straightforward to steer shortest paths in latent space to avoid low-density regions, but also to incorporate higher level semantic information. For instance, Fig. 1 shows a shortest path under an ambient metric that favors images of smiling people. In such a way, we can control, and potentially unbias, distance-based methods by utilizing domain knowledge, for example in an individual fairness scenario. Hence, we get both identifiable and useful latent representations.

In summary, we consider the ambient space of a generative model as a Riemannian manifold, where the metric can be defined by the user in order to encode high level information about the problem of interest. In such a way, the resulting shortest paths in the latent space move optimally on the data manifold, while respecting the geometry of the ambient space. This can be useful in scenarios where a domain expert wants to control the shortest paths in an interpretable way. In addition, we propose a simple method to construct diagonal metrics in the ambient space, as well as an architecture for the generator in order to extrapolate meaningfully. We show how this enables us to properly capture the geometry of the data manifold in deterministic generators, which is otherwise infeasible.

2 Applied Riemannian geometry intro

We are interested in Riemannian manifolds (do Carmo, 1992), which constitute well-defined metric spaces, where the inner product is defined only locally and changes smoothly throughout space. In a nutshell, these are smooth spaces where we can compute shortest paths, which prefer regions where the magnitude of the inner product is small. In this work, we show how to use these geometric structures in machine learning, where it is commonly assumed that data lie near a low dimensional manifold in an ambient observation space.

Definition 1. A Riemannian manifold is a smooth manifold $\mathcal{M}$, equipped with a positive definite Riemannian metric $\mathbf{M}(x)\ \forall\, x \in \mathcal{M}$, which changes smoothly and defines a local inner product on the tangent space $T_x\mathcal{M}$ at each point $x \in \mathcal{M}$ as $\langle v, u \rangle_x = \langle v, \mathbf{M}(x) u \rangle$ with $v, u \in T_x\mathcal{M}$.

A smooth manifold is a topological space, which locally is homeomorphic to a Euclidean space. An intuitive way to think of a $d$-dimensional smooth manifold is as an embedded non-intersecting surface $\mathcal{M}$ in an ambient space $\mathcal{X}$, for example $\mathbb{R}^D$ with $D > d$ (see Fig. 2 left). In this case, the tangent space $T_x\mathcal{M}$ is a $d$-dimensional vector space tangential to $\mathcal{M}$ at the point $x \in \mathcal{M}$. Hence, $v \in T_x\mathcal{M}$ is a vector $v \in \mathbb{R}^D$ and actually the Riemannian metric is $\mathbf{M} : \mathcal{M} \to \mathbb{R}^{D \times D}_{\succ 0}$. Thus, the simplest approach is to assume that $\mathcal{X}$ is equipped with the Euclidean metric $\mathbf{M}_{\mathcal{X}}(x) = \mathbb{I}_D$ and its restriction is utilized as the Riemannian metric on $T_x\mathcal{M}$. Since the choice of $\mathbf{M}_{\mathcal{X}}(\cdot)$ has a direct impact on $\mathcal{M}$, we can utilize other metrics designed to encode high-level semantic information (see Sec. 3).

Figure 2: Examples of a tangent vector and a shortest path on an embedded $\mathcal{M} \subset \mathcal{X}$ (left) and on an ambient $\mathcal{X}$ (right).

Another view is to consider the whole ambient space $\mathcal{X} = \mathbb{R}^D$ as the smooth manifold. Hence, the tangent space $T_x\mathcal{X} = \mathbb{R}^D$ is centered at $x \in \mathbb{R}^D$ and again the simplest Riemannian metric is the Euclidean $\mathbf{M}_{\mathcal{X}}(x) = \mathbb{I}_D$. However, we are able to use other suitable metrics that simply change the way we measure distances in $\mathcal{X}$ (see Sec. 3). For instance, given a set of points in $\mathcal{X}$ we can construct a metric with small magnitude near the data to pull the shortest paths towards them (see Fig. 2 right).

For a $d$-dimensional embedded manifold $\mathcal{M} \subset \mathcal{X}$, a collection of chart maps $\phi_i : \mathcal{U}_i \subset \mathcal{M} \to \mathbb{R}^d$ is used to assign local intrinsic coordinates to neighborhoods $\mathcal{U}_i \subset \mathcal{M}$, and for simplicity, we assume that a global chart map $\phi(\cdot)$ exists. By definition, when $\mathcal{M}$ is smooth, $\phi(\cdot)$ and its inverse $\phi^{-1} : \phi(\mathcal{M}) \subset \mathbb{R}^d \to \mathcal{M} \subset \mathcal{X}$ exist and are smooth maps. Thus, a $v_x \in T_x\mathcal{M}$ can be expressed as $v_x = \mathbf{J}_{\phi^{-1}}(z) v_z$, where $z = \phi(x) \in \mathbb{R}^d$ and $v_z \in \mathbb{R}^d$ are the representations in the intrinsic coordinates. Also, the Jacobian $\mathbf{J}_{\phi^{-1}}(z) \in \mathbb{R}^{D \times d}$ defines a basis that spans $T_x\mathcal{M}$, and thus, we represent the ambient metric $\mathbf{M}_{\mathcal{X}}(\cdot)$ in the intrinsic coordinates as

$$\langle v_x, v_x \rangle_x = \langle v_z, \mathbf{J}_{\phi^{-1}}(z)^{\top} \mathbf{M}_{\mathcal{X}}(\phi^{-1}(z)) \mathbf{J}_{\phi^{-1}}(z) v_z \rangle = \langle v_z, \mathbf{M}(z) v_z \rangle = \langle v_z, v_z \rangle_z, \tag{1}$$

with $\mathbf{M}(z) = \mathbf{J}_{\phi^{-1}}(z)^{\top} \mathbf{M}_{\mathcal{X}}(\phi^{-1}(z)) \mathbf{J}_{\phi^{-1}}(z) \in \mathbb{R}^{d \times d}_{\succ 0}$ being smooth. As we discuss below, we should be able to evaluate the intrinsic $\mathbf{M}(z)$ in order to find length minimizing curves on $\mathcal{M}$. However, when $\mathcal{M}$ is embedded the chart maps are usually unknown, and a global chart rarely exists. In contrast, for ambient-like manifolds the global chart is $\phi(x) = x$, which is convenient to use in practice.

In general, one of the main utilities of a Riemannian manifold $\mathcal{M} \subseteq \mathcal{X}$ is to enable us to compute shortest paths therein. Intuitively, the norm $\sqrt{\langle dx, dx \rangle_x}$ represents how the infinitesimal displacement vector $dx \approx x' - x$ on $\mathcal{M}$ is locally scaled. Thus, for a curve $\gamma : [0, 1] \to \mathcal{M}$ that connects two points $x = \gamma(0)$ and $y = \gamma(1)$, the length on $\mathcal{M}$, or equivalently in $\phi(\mathcal{M})$ using that $\gamma(t) = \phi^{-1}(c(t))$ and Eq. 1, is measured as

$$\text{length}[\gamma(t)] = \int_0^1 \sqrt{\langle \dot{\gamma}(t), \dot{\gamma}(t) \rangle_{\gamma(t)}}\, dt = \int_0^1 \sqrt{\langle \dot{c}(t), \mathbf{M}(c(t))\, \dot{c}(t) \rangle}\, dt = \text{length}[c(t)], \tag{2}$$

where $\dot{\gamma}(t) = \partial_t \gamma(t) \in T_{\gamma(t)}\mathcal{M}$ is the velocity of the curve and accordingly $\dot{c}(t) \in T_{c(t)}\phi(\mathcal{M})$. The minimizers of this functional are the shortest paths, also known as geodesics. We find them by solving a system of 2nd order nonlinear ordinary differential equations (ODEs) defined in the intrinsic coordinates. Notably, for ambient-like manifolds the trivial chart map enables us to

$\mathbf{M}(z) = \mathbf{J}_g(z)^{\top} \mathbf{M}_{\mathcal{X}}(g(z)) \mathbf{J}_g(z)$ is known as the pull-back metric. Essentially, it captures the intrinsic geometry of the immersed $\mathcal{M}_{\mathcal{Z}}$, while taking into account the geometry of $\mathcal{X}$. Therefore, the space $\mathcal{Z}$ together with $\mathbf{M}(z)$ constitutes a Riemannian manifold, but since $\mathcal{Z} = \mathbb{R}^{d_{\mathcal{Z}}}$ the chart map and the tangent space $T_z\mathcal{Z}$ are trivial.
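To make the pull-back metric and the curve length of Eq. 2 concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a generator `g` given as an ordinary function, approximates its Jacobian by central finite differences, and approximates the length integral with a midpoint rule. The helper names (`jacobian`, `pullback_metric`, `curve_length`) and the toy generator that maps a one-dimensional latent space onto the unit circle are hypothetical choices made for illustration only.

```python
import math

def jacobian(g, z, eps=1e-6):
    """Central-difference Jacobian of g at z, as D rows with d entries each."""
    d = len(z)
    cols = []
    for j in range(d):
        z_plus, z_minus = list(z), list(z)
        z_plus[j] += eps
        z_minus[j] -= eps
        g_plus, g_minus = g(z_plus), g(z_minus)
        cols.append([(a - b) / (2 * eps) for a, b in zip(g_plus, g_minus)])
    D = len(cols[0])
    # cols[j][i] holds d g_i / d z_j; transpose so that J[i][j] does.
    return [[cols[j][i] for j in range(d)] for i in range(D)]

def pullback_metric(g, ambient_metric, z):
    """Evaluate M(z) = J_g(z)^T M_X(g(z)) J_g(z) as a d x d nested list."""
    J = jacobian(g, z)
    MX = ambient_metric(g(z))
    D, d = len(J), len(z)
    MXJ = [[sum(MX[i][k] * J[k][j] for k in range(D)) for j in range(d)]
           for i in range(D)]
    return [[sum(J[k][a] * MXJ[k][b] for k in range(D)) for b in range(d)]
            for a in range(d)]

def curve_length(c, g, ambient_metric, n=200):
    """Midpoint-rule approximation of Eq. 2 for a latent curve c: [0, 1] -> R^d."""
    dt = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        # Velocity of the latent curve by central differences.
        hi, lo = c(t + 0.5 * dt), c(t - 0.5 * dt)
        v = [(a - b) / dt for a, b in zip(hi, lo)]
        M = pullback_metric(g, ambient_metric, c(t))
        quad = sum(v[a] * M[a][b] * v[b]
                   for a in range(len(v)) for b in range(len(v)))
        total += math.sqrt(quad) * dt
    return total

# Toy setting (assumption): a 1-D latent space mapped onto the unit circle.
def circle(z):
    return [math.cos(z[0]), math.sin(z[0])]

def euclidean(x):
    # Ambient metric M_X(x) = I_D.
    return [[1.0, 0.0], [0.0, 1.0]]

def scaled(x):
    # A constant ambient metric that makes every direction four times costlier.
    return [[4.0, 0.0], [0.0, 4.0]]

def line(t):
    # Straight latent line from z = 0 to z = pi.
    return [math.pi * t]

length_euclid = curve_length(line, circle, euclidean)
length_scaled = curve_length(line, circle, scaled)
```

Under the Euclidean ambient metric the latent line traces half of the unit circle, so `length_euclid` is approximately $\pi$. Multiplying the ambient metric by 4 scales every length by $\sqrt{4} = 2$, so `length_scaled` is approximately $2\pi$. A position-dependent `ambient_metric` could be passed in the same way to make some regions of the ambient space cheaper to traverse than others.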
