
Geometrically Enriched Latent Spaces

Georgios Arvanitidis (MPI for Intelligent Systems, Tübingen), Søren Hauberg (DTU Compute, Lyngby), Bernhard Schölkopf (MPI for Intelligent Systems, Tübingen)

Abstract

A common assumption in generative models is that the generator immerses the latent space into a Euclidean ambient space. Instead, we consider the ambient space to be a Riemannian manifold, which allows for encoding domain knowledge through the associated Riemannian metric. Shortest paths can then be defined accordingly in the latent space to both follow the learned manifold and respect the ambient geometry. Through careful design of the ambient metric we can ensure that shortest paths are well-behaved even for deterministic generators that otherwise would exhibit a misleading bias. Experimentally we show that our approach improves the interpretability and the functionality of learned representations, both for stochastic and deterministic generators.

Figure 1: The proposed shortest path ( ) favors the smiling class, while the standard shortest path ( ) merely minimizes the distance on the data manifold.

1 Introduction

Unsupervised representation learning has made tremendous progress with generative models such as variational autoencoders (VAEs) (Kingma and Welling, 2014; Rezende et al., 2014) and generative adversarial networks (GANs) (Goodfellow et al., 2014). These, and similar, models provide a flexible and efficient parametrization of the density of observations in an ambient space X through a typically lower dimensional latent space Z.

While the latent space Z constitutes a compressed representation of the data, it is by no means unique. Like most other latent variable models, these generative models are subject to identifiability problems, such that different representations can give rise to identical densities (Bishop, 2006). This implies that straight lines in Z are not shortest paths in any meaningful sense, and therefore do not constitute natural interpolants. To overcome this issue, it has been proposed to endow the latent space with a Riemannian metric such that curve lengths are measured in the ambient observation space X (Tosi et al., 2014; Arvanitidis et al., 2018). In other words, this ensures that any smooth invertible transformation of Z does not change the distance between a pair of points, as long as the ambient path in X remains the same. This approach immediately solves the identifiability problem.

While distances in X are well-defined and give rise to an identifiable latent representation, they need not be particularly useful. We take inspiration from metric learning (Weinberger et al., 2006; Arvanitidis et al., 2016) and propose to equip the ambient observation space X with a Riemannian metric and measure curve lengths in latent space accordingly. With this approach it is straightforward to steer shortest paths in latent space to avoid low-density regions, but also to incorporate higher level semantic information. For instance, Fig. 1 shows a shortest path under an ambient metric that favors images of smiling people. In such a way, we can control, and potentially unbias, distance based methods by utilizing domain knowledge, for example in an individual fairness scenario. Hence, we get both identifiable and useful latent representations.

In summary, we consider the ambient space of a generative model as a Riemannian manifold, where the metric can be defined by the user in order to encode high-level information about the problem of interest. In such a way, the resulting shortest paths in the latent space move optimally on the data manifold, while respecting the geometry of the ambient space. This can be useful in scenarios where a domain expert wants to control the shortest paths in an interpretable way. In addition, we propose a simple method to construct diagonal metrics in the ambient space, as well as an architecture for the generator in order to extrapolate meaningfully. We show how this enables us to properly capture the geometry of the data manifold in deterministic generators, which is otherwise infeasible.

Figure 2: Examples of a tangent vector ( ) and a shortest path ( ) on an embedded M ⊂ X (left) and on an ambient X (right).

2 Applied Riemannian geometry intro

We are interested in Riemannian manifolds (do Carmo, 1992), which constitute well-defined metric spaces, where the inner product is defined only locally and changes smoothly throughout space. In a nutshell, these are smooth spaces where we can compute shortest paths, which prefer regions where the magnitude of the inner product is small. In this work, we show how to use these geometric structures in machine learning, where it is commonly assumed that data lie near a low dimensional manifold in an ambient observation space.

Definition 1. A Riemannian manifold is a smooth manifold M, equipped with a positive definite Riemannian metric M(x) ∀ x ∈ M, which changes smoothly and defines a local inner product on the tangent space T_xM at each point x ∈ M as ⟨v, u⟩_x = ⟨v, M(x)u⟩ with v, u ∈ T_xM.

A smooth manifold is a topological space which locally is homeomorphic to a Euclidean space. An intuitive way to think of a d-dimensional smooth manifold is as an embedded non-intersecting surface M in an ambient space X, for example R^D with D > d (see Fig. 2 left). In this case, the tangent space T_xM is a d-dimensional vector space tangential to M at the point x ∈ M. Hence, v_x ∈ T_xM is a vector v ∈ R^D, and the Riemannian metric is actually a map M_X : M → R^{D×D}_{≻0}. Thus, the simplest approach is to assume that X is equipped with the Euclidean metric M_X(x) = I_D and its restriction is utilized as the Riemannian metric on T_xM. Since the choice of M_X(·) has a direct impact on M, we can utilize other metrics designed to encode high-level semantic information (see Sec. 3).

Another view is to consider as smooth manifold the whole ambient space X = R^D. Hence, the tangent space T_xX = R^D is centered at x ∈ R^D and again the simplest Riemannian metric is the Euclidean M_X(x) = I_D. However, we are able to use other suitable metrics that simply change the way we measure distances in X (see Sec. 3). For instance, given a set of points in X we can construct a metric with small magnitude near the data to pull the shortest paths towards them (see Fig. 2 right).

For a d-dimensional embedded manifold M ⊂ X, a collection of chart maps φ_i : U_i ⊂ M → R^d is used to assign local intrinsic coordinates to neighborhoods U_i ⊂ M, and for simplicity, we assume that a global chart map φ(·) exists. By definition, when M is smooth the map φ(·) and its inverse φ^{-1} : φ(M) ⊂ R^d → M ⊂ X exist and are smooth. Thus, a v_x ∈ T_xM can be expressed as v_x = J_{φ^{-1}}(z) v_z, where z = φ(x) ∈ R^d and v_z ∈ R^d are the representations in the intrinsic coordinates. Also, the Jacobian J_{φ^{-1}}(z) ∈ R^{D×d} defines a basis that spans T_xM, and thus, we represent the ambient metric M_X(·) in the intrinsic coordinates as

    \langle v_x, v_x \rangle_x = \langle v_z, J_{\phi^{-1}}(z)^\top M_X(\phi^{-1}(z)) J_{\phi^{-1}}(z) v_z \rangle = \langle v_z, M(z) v_z \rangle = \langle v_z, v_z \rangle_z,    (1)

with M(z) = J_{\phi^{-1}}(z)^\top M_X(\phi^{-1}(z)) J_{\phi^{-1}}(z) ∈ R^{d×d}_{≻0} being smooth. As we discuss below, we should be able to evaluate the intrinsic M(z) in order to find length minimizing curves on M. However, when M is embedded the chart maps are usually unknown, and a global chart rarely exists. In contrast, for ambient-like manifolds the global chart is φ(x) = x, which is convenient to use in practice.

In general, one of the main utilities of a Riemannian manifold M ⊆ X is to enable us to compute shortest paths therein. Intuitively, the norm √⟨dx, dx⟩_x represents how the infinitesimal displacement vector dx ≈ x' − x on M is locally scaled. Thus, for a curve γ : [0, 1] → M that connects two points x = γ(0) and y = γ(1), the length on M, or equivalently in φ(M) using that γ(t) = φ^{-1}(c(t)) and Eq. 1, is measured as

    \mathrm{length}[\gamma(t)] = \int_0^1 \sqrt{\langle \dot{\gamma}(t), \dot{\gamma}(t) \rangle_{\gamma(t)}} \, dt = \int_0^1 \sqrt{\langle \dot{c}(t), M(c(t)) \dot{c}(t) \rangle} \, dt = \mathrm{length}[c(t)],    (2)

where γ̇(t) = ∂_t γ(t) ∈ T_{γ(t)}M is the velocity of the curve and accordingly ċ(t) ∈ T_{c(t)}φ(M). The minimizers of this functional are the shortest paths, also known as geodesics.
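To make the length functional in Eq. 2 concrete, the sketch below numerically approximates the length of a discretized latent curve under a user-supplied metric M(z). The metric function and the test curve are illustrative assumptions made only for this example, not part of the paper.

```python
import numpy as np

def curve_length(curve, metric):
    """Approximate Eq. 2 for a discretized curve c(t) in intrinsic coordinates.

    curve  : (T, d) array, points c(t_0), ..., c(t_{T-1}) along the curve.
    metric : callable z -> (d, d) positive definite matrix M(z).
    """
    length = 0.0
    for a, b in zip(curve[:-1], curve[1:]):
        delta = b - a                 # finite-difference velocity times dt
        mid = 0.5 * (a + b)           # evaluate the metric at the segment midpoint
        length += np.sqrt(delta @ metric(mid) @ delta)
    return length

# Illustrative conformal metric M(z) = (1 + ||z||^2) * I_2, chosen only to show
# the interface; any smooth SPD-valued function of z works here.
metric = lambda z: (1.0 + z @ z) * np.eye(2)

t = np.linspace(0.0, 1.0, 100)[:, None]
straight = (1 - t) * np.array([-1.0, 0.0]) + t * np.array([1.0, 0.0])
print("length of the straight line under M(z):", curve_length(straight, metric))
```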

We find these geodesics by solving a system of 2nd order nonlinear ordinary differential equations (ODEs) defined in the intrinsic coordinates. Notably, for ambient-like manifolds the trivial chart map enables us to compute the shortest paths in practice by solving this ODE system. In a certain sense, the behavior of the shortest paths is to avoid regions of high measure √|M(c(t))|. For general manifolds, the analytical solution is intractable, so we rely on approximate solutions (Hennig and Hauberg, 2014; Yang et al., 2018; Arvanitidis et al., 2019). For further information on Riemannian geometry and the ODE system see Appendix 1.

2.1 Unifying the two manifold perspectives

In all related works, the ambient space X is considered as a Euclidean space. Instead, we propose to consider X as a Riemannian manifold. This allows us to encode high-level information through the associated metric, which constitutes an interpretable way to control the shortest paths. In order to find such a path on an embedded M ⊂ X, we have to solve a system of ODEs defined in the intrinsic coordinates. However, when M is embedded the chart maps are commonly unknown. Hence, the usual trick is to utilize another manifold Z that has a trivial chart map and to represent the geometry of M therein. Thus, we find the curve in Z which corresponds to the actual shortest path on M.

In particular, assume an embedded d-dimensional manifold M ⊂ X within a Riemannian manifold X = R^D with metric M_X(·), a Euclidean space Z = R^{d_Z} called the latent space, and a smooth function g : Z → X called the generator. Since g(·) is smooth, M_Z = g(Z) ⊂ X is an immersed d_Z-dimensional smooth (sub)manifold.¹ In general, we assume that d_Z = d and also that g(·) closely approximates the true embedded manifold, M_Z ≈ M, while if d_Z < d then g(·) can only approximate a submanifold of M. Consequently, the Jacobian matrix J_g(z) ∈ R^{D×d_Z} is a basis that spans T_{x=g(z)}M_Z, and maps a tangent vector v_z ∈ T_zZ to a tangent vector v_x ∈ T_{x=g(z)}M_Z. Thus, as before, the restriction of M_X(·) on T_xM_Z induces a metric in the latent space Z as

    \langle v_x, v_x \rangle_x = \langle v_z, J_g(z)^\top M_X(g(z)) J_g(z) v_z \rangle.    (3)

¹A function g : Z → M ⊂ X is an immersion if J_g : T_zZ → T_{g(z)}M is injective (full rank) ∀ z ∈ Z. Intuitively, M is a dim(Z)-dimensional surface and intersections are allowed, while g(·) is an embedding if it is an injective function, which means that intersections are not allowed. An immersion is locally an embedding.

The induced Riemannian metric in the latent space, M(z) = J_g(z)^\top M_X(g(z)) J_g(z), is known as the pull-back metric. Essentially, it captures the intrinsic geometry of the immersed M_Z, while taking into account the geometry of X. Therefore, the space Z together with M(z) constitutes a Riemannian manifold, but since Z = R^{d_Z} the chart map and the tangent space T_zZ are trivial. Consequently, we can evaluate the metric M(z) in intrinsic coordinates, which enables us to compute shortest paths on Z by solving the ODE system. Intuitively, these paths in Z move optimally on M_Z, while simultaneously respecting the geometry of the ambient space X. Also, note that g(·) is not a chart map, and hence, it is easier to learn. Moreover, if the data manifold has a special topology, we have to choose the latent space accordingly (Davidson et al., 2018; Mathieu et al., 2019). For further discussion and the theoretical properties of g(·) see Appendix 2.
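As a minimal illustration of Eq. 3, the sketch below pulls an ambient metric M_X back through a generator g using a finite-difference Jacobian. The toy generator and the step size are assumptions made only for this example.

```python
import numpy as np

def jacobian(g, z, eps=1e-5):
    """Finite-difference Jacobian J_g(z) of shape (D, d_Z)."""
    z = np.asarray(z, dtype=float)
    g0 = g(z)
    J = np.zeros((g0.shape[0], z.shape[0]))
    for i in range(z.shape[0]):
        dz = np.zeros_like(z)
        dz[i] = eps
        J[:, i] = (g(z + dz) - g0) / eps
    return J

def pullback_metric(g, ambient_metric, z):
    """Eq. 3: M(z) = J_g(z)^T  M_X(g(z))  J_g(z)."""
    J = jacobian(g, z)
    return J.T @ ambient_metric(g(z)) @ J

# Toy generator immersing Z = R^1 into X = R^2 (an assumption for illustration).
g = lambda z: np.array([z[0], np.sin(z[0])])
euclidean = lambda x: np.eye(x.shape[0])     # M_X(x) = I_D recovers the usual pull-back metric

print(pullback_metric(g, euclidean, np.array([0.3])))  # approx. [[1 + cos(0.3)^2]]
```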

3 Data learned Riemannian manifolds

We discuss previous approaches that apply differential geometry, which largely inspire our work. Briefly, we present two related Riemannian metric learning methods, and then, we propose a simple technique to construct similar metrics in the ambient space X. This constitutes a principled and interpretable way to encode high-level information as domain knowledge in our models. Also, we present the related work where the structure of an embedded data manifold is properly captured in the latent space of stochastic generators.

3.1 Riemannian metrics in ambient space

Assume that a set of points {x_n}_{n=1}^N in X = R^D is given. The Riemannian metric learning task is to learn a positive definite metric tensor M_X : X → R^{D×D}_{≻0} that changes smoothly across the space. The actual behavior of the metric depends on the problem that we want to model. For example, if we want the shortest paths to stay on the data manifold (see Fig. 2 right), the meaningful behavior for the metric is that the measure √|M_X(·)| should be small near the data and large as we move away. Similarly, in Fig. 1 the ambient metric is designed such that its measure is small near the data points with a smiling face, and thus, the shortest paths tend to follow this semantic constraint. In such a way, we can regulate the inductive bias of a model, by including high-level information through the ambient metric and controlling the shortest paths accordingly.

One of the first approaches to learn such a Riemannian metric was presented by Hauberg et al. (2012), where M_X(x) is the convex combination of a predefined set of metrics, using a smooth weighting function. In particular, at first K metrics {M_k}_{k=1}^K ∈ R^{D×D}_{≻0} are estimated, centered at the locations {c_k}_{k=1}^K ∈ R^D. Then, we can evaluate the metric at a new point as

    M_X(x) = \sum_{k=1}^K w_k(x) M_k, \quad w_k(x) = \frac{\tilde{w}_k(x)}{\sum_{j=1}^K \tilde{w}_j(x)},    (4)

where the kernel \tilde{w}_k(x) = \exp(-\frac{\|x - c_k\|_2^2}{2\sigma^2}) with bandwidth σ ∈ R_{>0} is a smooth function, and thus, the metric is smooth as a linear combination of smooth functions. A practical example for the base metrics M_k is the local Linear Discriminant Analysis (LDA), where local metrics are learned using labeled data such that the classes are separated well locally (Hastie and Tibshirani, 1994). In a similar spirit, a related approach is the Large Margin Nearest Neighbor (LMNN) classifier (Weinberger et al., 2006). Note that the domain of metric learning provides a huge list of options that can be considered (Suárez et al., 2018).

Similarly, in an unsupervised setting Arvanitidis et al. (2016) proposed to construct the Riemannian metric in a non-parametric fashion as the inverse of the local diagonal covariance. In particular, for a given point set {x_n}_{n=1}^N, at a point x the diagonal elements of the metric M_X(·) are equal to

    M_{X_{dd}}(x) = \Big( \sum_{n=1}^N w_n(x) (x_{nd} - x_d)^2 + \varepsilon \Big)^{-1},    (5)

with w_n(x) = \exp(-\frac{\|x_n - x\|_2^2}{2\sigma^2}), where the parameter σ ∈ R_{>0} controls the curvature of the Riemannian manifold, i.e., how fast the metric changes, and ε > 0 is a small scalar to upper bound the metric. Although these are quite flexible and intuitive metrics, selecting the parameter σ is a challenging task (Arvanitidis et al., 2017), especially due to the curse of dimensionality (Bishop, 2006) and the sample size N due to the non-parametric regime.

The proposed Riemannian metrics. Inspired by the approaches described above and Peyré et al. (2010), we propose a general and simple technique to easily construct metrics in X, which allows us to encode information depending on the problem of interest. An unsupervised diagonal metric can be defined as

    M_X(x) = (\alpha \cdot h(x) + \varepsilon)^{-1} \cdot I_D,    (6)

where I_D is the identity matrix of dimension D, the function h(x) : R^D → R_{>0} behaves as h(x) → 1 when x is near the data manifold and h(x) → 0 otherwise, and α, ε ∈ R_{>0} are scaling factors to lower and upper bound the metric, respectively. One simple but very effective approach is to use a positive Radial Basis Function (RBF) network (Que and Belkin, 2016) as h(x) = w^\top φ(x) with weights w ∈ R^K_{>0} and φ_k(x) = \exp(-0.5 \cdot λ_k \cdot \|x - x_k\|_2^2) with bandwidth λ_k ∈ R_{>0}. Also, h(x) can be the probability density function of the given data. Usually, the true density function is unknown and difficult to learn; however, we can approximate it roughly by utilizing a simple model such as the Gaussian Mixture Model (GMM) (Bishop, 2006). Such a Riemannian metric pulls the shortest paths towards areas of X with high h(x) (see Fig. 2 right).

In a similar context, a supervised version can be defined where the function h(x) represents cost, while in Eq. 6 we do not use the inversion. In this way, shortest paths will tend to avoid regions of the ambient space X where the cost function is high. For instance, in Fig. 1 we can think of a cost function that is high over all the non-smiling data points. Of course, such a cost function can be learned as an independent regression or classification problem, while in other cases it can even be given by the problem or a domain expert. Further details about all the metrics above are in Appendix 3.
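A minimal sketch of the proposed diagonal metrics: the density-like variant of Eq. 6 and the supervised cost variant without the inversion. The RBF centers, bandwidths, and scaling factors below are illustrative assumptions; in the paper they would be fitted to data or provided by a domain expert.

```python
import numpy as np

def rbf_h(x, centers, lam, w):
    """Positive RBF network h(x) = w^T phi(x) with phi_k(x) = exp(-0.5 * lam_k * ||x - x_k||^2)."""
    sq_dists = np.sum((centers - x) ** 2, axis=1)
    return float(w @ np.exp(-0.5 * lam * sq_dists))

def ambient_metric_unsupervised(x, h, alpha=1.0, eps=1e-2):
    """Eq. 6: M_X(x) = (alpha * h(x) + eps)^{-1} * I_D -- small near the data, large far away."""
    return np.eye(x.shape[0]) / (alpha * h(x) + eps)

def ambient_metric_cost(x, cost, alpha=1.0, eps=1e-2):
    """Supervised variant: no inversion, so the metric is large where the cost is high."""
    return (alpha * cost(x) + eps) * np.eye(x.shape[0])

# Illustrative setup: RBF centers placed on a few data points in R^2.
centers = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]])
lam = np.full(len(centers), 4.0)     # bandwidths lambda_k > 0
w = np.full(len(centers), 1.0)       # positive weights

h = lambda x: rbf_h(x, centers, lam, w)
print(ambient_metric_unsupervised(np.array([1.0, 0.5]), h))   # small entries near the data
print(ambient_metric_unsupervised(np.array([5.0, 5.0]), h))   # large entries far away
```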

3.2 Riemannian metrics in latent space

As discussed in Sec. 2.1, we can capture the geometry of the given embedded data manifold M ⊂ X by learning a smooth generator g : Z → M_Z ⊂ X such that M_Z ≈ M. Previous works have shown how to learn such a function g(·) in practice, and also the mild conditions it has to satisfy so that the induced Riemannian metric properly captures the structure of M. In the latent space Z = R^d we call the points {z_n ∈ Z | x_n = g(z_n), n = 1, ..., N} latent codes or representations.

First, Tosi et al. (2014) considered the Gaussian Process Latent Variable Model (GP-LVM) (Lawrence, 2005), where g(·) is a stochastic function defined as a multi-output Gaussian process g ∼ GP(m(z), k(z, z')). This stochastic generator induces a random Riemannian metric in Z and the expected metric is used for computing shortest paths. The advantage of such a stochastic generator is that the metric magnitude increases analogous to the uncertainty of g(·), which happens in regions of Z where there are no latent codes. Apart from this desired behavior, this metric is not very practical due to the computational cost of GPs.

Another set of approaches, known as deep generative models, parameterizes g(·) as a deep neural network (DNN). On the one hand we have the explicit density models, where the marginal likelihood can be computed, with main representatives the VAE (Kingma and Welling, 2014; Rezende et al., 2014) and the normalizing flow models (Dinh et al., 2016; Rezende and Mohamed, 2015). On the other hand we have the implicit density models for which the marginal likelihood is intractable, as is the case for the GAN (Goodfellow et al., 2014).

Recently, Arvanitidis et al. (2018) showed that we are able to properly capture the structure of the embedded data manifold M ⊂ X in the latent space Z of a VAE under the condition of having meaningful uncertainty quantification for the generative process. In particular, the standard VAE assumes a Gaussian likelihood p(x | z) = N(x | µ(z), I_D · σ²(z)) with a prior p(z) = N(0, I_d). Hence, the generator can be written as g(z) = µ(z) + diag(ε) · σ(z), where ε ∼ N(0, I_D) and µ : Z → X, σ : Z → R^D_{>0} are usually parametrized with DNNs. However, simply parametrizing σ(·) with a DNN does not directly imply meaningful uncertainty quantification, because in general the DNN extrapolates arbitrarily to regions of Z with no latent codes. Thus, the solution proposed in Arvanitidis et al. (2018) is to model the inverse variance β(z) = (σ²(z))^{-1} with a positive Radial Basis Function (RBF) network (Que and Belkin, 2016), which implies that moving further from the latent codes increases the uncertainty. Under this stochastic generator, the expected Riemannian metric in Z can be written as

    M(z) = E_{p(\varepsilon)}[J_{g_\varepsilon}(z)^\top J_{g_\varepsilon}(z)] = J_\mu(z)^\top J_\mu(z) + J_\sigma(z)^\top J_\sigma(z),    (7)

where g_ε(·) implies that ε is kept fixed ∀ z ∈ Z, which ensures a smooth mapping, and hence, differentiability (Eklund and Hauberg, 2019). Here, we observe that the metric increases when the generator becomes uncertain, due to the second term. This constitutes a desired behavior, as the metric informs us to avoid regions of Z where there are no latent codes, which directly implies that these regions do not correspond to parts of the data manifold in X. In some sense, we model the topology of M as well (Hauberg, 2018). In contrast, this is not the case when the uncertainty of g(·) is not taken into account, since then the behavior of the metric is arbitrary as we move away from the latent codes (Chen et al., 2018; Shao et al., 2017).

Consequently, deterministic generators such as the Auto-Encoder (AE) and the GAN capture poorly the structure of M in Z, because the second term in Eq. 7 does not exist. The reason is that these models are trained with likelihood p(x | z) = δ(x − g(z)) and the uncertainty is not quantified. Of course, for the AE one potential heuristic solution is to use the latent codes of the training data to fit post-hoc a meaningful variance estimator under the Gaussian likelihood and the maximum likelihood principle as θ* = argmax_θ \prod_{n=1}^N N(x_n | g(e(x_n)), I_D · σ_θ^2(e(x_n))), with encoder e : X → Z. In principle, we could follow the same procedure for the GAN by learning an encoder (Donahue et al., 2016; Dumoulin et al., 2016). However, it is still unclear if the encoder for a GAN learns meaningful representations or if the powerful generator ignores the inferred latent codes (Arora et al., 2018).

Therefore, to properly capture the structure of M in Z we mainly rely on stochastic generators with increasing uncertainty as we move further from the latent codes. Even if the RBF based approach is a meaningful way to get the desired behavior, in general, uncertainty quantification with parametric models is still considered an open problem (MacKay, 1992; Gal and Ghahramani, 2015; Lakshminarayanan et al., 2017; Detlefsen et al., 2019; Arvanitidis et al., 2018). Nevertheless, Eklund and Hauberg (2019) showed that the expected Riemannian metric in Eq. 7 is a reasonable approximation to use in practice. Obviously, when g(·) is deterministic, like the GAN, the second term in Eq. 7 disappears, since these models do not quantify the uncertainty of the generative process. This directly means that deterministic generators are not able, by construction, to properly capture the geometric structure of M in the latent space, and hence, exhibit a misleading bias (Hauberg, 2018).

4 Enriching the latent space with geometric information

Here, we unify the approaches presented in Sec. 3.1 and Sec. 3.2, in order to provide additional geometric structure in the latent space of a generative model. This is the first time that these two fundamentally different Riemannian views are combined. Their main difference is that the metric induced by g(·) merely tries to capture the intrinsic geometry of the given data manifold M, while M_X(·) allows us to directly encode high-level information in X based, for instance, on domain knowledge. Moreover, in the stochastic case we provide a relaxation for efficient computation of the expected metric, while in the deterministic case we combine a carefully designed ambient metric with a new architecture for g(·) that extrapolates meaningfully, which is one way to ensure well-behaved shortest paths that respect the structure of the data manifold M.

Stochastic generators. Assuming that an ambient metric M_X(·) is given (see Sec. 3.1), we learn a VAE with Gaussian likelihood, so the stochastic mapping is g(z) = µ(z) + diag(ε) · σ(z), while using a positive RBF for meaningful quantification of the uncertainty σ(·). Therefore, we can apply Eq. 3 to derive the new, more informative stochastic pull-back metric in the latent space Z, which is equal to

    M_\varepsilon(z) = J_{g_\varepsilon}(z)^\top M_X(\mu(z) + \mathrm{diag}(\varepsilon) \cdot \sigma(z)) \, J_{g_\varepsilon}(z), \quad \text{with } J_{g_\varepsilon}(z) = J_\mu(z) + \mathrm{diag}(\varepsilon) \cdot J_\sigma(z).    (8)

This is a random quantity and we can compute its expectation directly by sampling ε ∼ N(0, I_D), so that M(z) = E_{ε∼p(ε)}[M_ε(z)]. However, in practice this expectation dramatically increases the computational cost, especially since we need to evaluate the expected metric many times when computing a shortest path.
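To see why this expectation is costly, the sketch below forms the direct Monte Carlo estimate of E[M_ε(z)] from Eq. 8 using finite-difference Jacobians of µ and σ. The toy decoder, the number of samples, and the ambient metric are assumptions for illustration only; with M_X = I the estimate reduces to Eq. 7.

```python
import numpy as np

def jacobian(f, z, eps=1e-5):
    """Finite-difference Jacobian of f: R^d -> R^D, shape (D, d)."""
    f0 = f(z)
    J = np.zeros((f0.shape[0], z.shape[0]))
    for i in range(z.shape[0]):
        dz = np.zeros_like(z)
        dz[i] = eps
        J[:, i] = (f(z + dz) - f0) / eps
    return J

def expected_metric_mc(mu, sigma, ambient_metric, z, n_samples=100, rng=None):
    """Monte Carlo estimate of E_eps[ M_eps(z) ] from Eq. 8."""
    rng = rng or np.random.default_rng(0)
    Jmu, Jsig = jacobian(mu, z), jacobian(sigma, z)
    D = Jmu.shape[0]
    M = np.zeros((z.shape[0], z.shape[0]))
    for _ in range(n_samples):                      # every sample needs a fresh Jacobian product
        eps = rng.standard_normal(D)
        Jg = Jmu + np.diag(eps) @ Jsig              # J_{g_eps}(z) = J_mu(z) + diag(eps) J_sigma(z)
        x = mu(z) + eps * sigma(z)                  # g_eps(z), where M_X is evaluated
        M += Jg.T @ ambient_metric(x) @ Jg
    return M / n_samples

# Toy decoder from Z = R^2 to X = R^3 (illustrative assumption, not a trained VAE).
mu = lambda z: np.array([z[0], z[1], np.tanh(z[0] * z[1])])
sigma = lambda z: 0.1 + 0.5 * np.array([z @ z, z @ z, z @ z])   # grows away from the origin
euclidean = lambda x: np.eye(x.shape[0])

print(expected_metric_mc(mu, sigma, euclidean, np.array([0.5, -0.2])))
```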

For this reason, we relax the problem and consider only E_{ε∼p(ε)}[g_ε(z)] = µ(z) for the evaluation of the ambient metric M_X(·) in Eq. 8, which simplifies the corresponding expected Riemannian metric to

    M(z) \triangleq J_\mu(z)^\top M_X(\mu(z)) \, J_\mu(z) + J_\sigma(z)^\top M_X(\mu(z)) \, J_\sigma(z).    (9)

Essentially, the realistic underlying assumption is that near the latent codes σ(z) → 0, so g(z) → µ(z) and the first term dominates. But as we move to regions of Z with no codes, σ(z) ≫ 0, and for this reason we need the second term in the equation. In particular, J_σ(·) dominates when moving further from the latent codes, and hence, the behavior of M_X(·) will be less important there. Thus, we are allowed to consider this relaxation, for which the meaningful uncertainty estimation is still necessary. We further analyze and check empirically this relaxation in Appendix 4.

Figure 3: Using the samples in Z generated from the proposed q(z), we fit a mixture of LANDs and a GMM. Thus, we can capture the true underlying structure of the data manifold in the latent space of a GAN.
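The relaxation in Eq. 9 only needs the Jacobians of µ and σ and a single evaluation of the ambient metric at µ(z). A self-contained sketch, using the same illustrative decoder and finite-difference Jacobians as above (again an assumption, not a trained model):

```python
import numpy as np

def jacobian(f, z, eps=1e-5):
    f0 = f(z)
    J = np.zeros((f0.shape[0], z.shape[0]))
    for i in range(z.shape[0]):
        dz = np.zeros_like(z)
        dz[i] = eps
        J[:, i] = (f(z + dz) - f0) / eps
    return J

def relaxed_metric(mu, sigma, ambient_metric, z):
    """Eq. 9: M(z) = J_mu^T M_X(mu(z)) J_mu + J_sigma^T M_X(mu(z)) J_sigma.

    Only one ambient-metric evaluation (at mu(z)) and two Jacobians are needed,
    instead of a fresh Jacobian product per Monte Carlo sample of eps.
    """
    Jmu, Jsig = jacobian(mu, z), jacobian(sigma, z)
    MX = ambient_metric(mu(z))
    return Jmu.T @ MX @ Jmu + Jsig.T @ MX @ Jsig

# Same illustrative decoder as before.
mu = lambda z: np.array([z[0], z[1], np.tanh(z[0] * z[1])])
sigma = lambda z: 0.1 + 0.5 * np.array([z @ z, z @ z, z @ z])
euclidean = lambda x: np.eye(x.shape[0])     # with M_X = I this is exactly Eq. 7

print(relaxed_metric(mu, sigma, euclidean, np.array([0.5, -0.2])))
```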

Deterministic generators. For deterministic g(·), we propose one simple yet efficient way to ensure well-behaved shortest paths which respect the structure of the given data manifold M. First, we learn an ambient Riemannian metric M_X(·) in X that only roughly represents the structure of M, for instance, by using an RBF or a GMM (Eq. 6). Thus, this metric informs us how close the generated M_Z is to the data. Hence, we additionally need g(·) to extrapolate meaningfully. Therefore, g(·) should learn to generate the given data well from a prior p(z), but as we move further from the support of p(z), the generated M_Z should also move further from the given data in X. Consequently, since M_X(·) is designed to increase far from the given data, the induced Riemannian metric in Z properly captures the structure of the data manifold.

One of the simplest deterministic generators with this desirable behavior is probabilistic Principal Component Analysis (pPCA) (Tipping and Bishop, 1999). This basic model has a Gaussian prior p(z) = N(0, I_d) and a generator that is simply a linear map. The generator is constructed by the top d eigenvectors of the empirical covariance matrix, scaled by their eigenvalues. Inspired by this simple model, we propose for the deterministic generator the following architecture

    g(z) = f(z) + U \cdot \mathrm{diag}([\sqrt{\lambda_1}, \dots, \sqrt{\lambda_d}]) \cdot z + b,    (10)

where f : Z → X is a deep neural network, U ∈ R^{D×d} holds the top d eigenvectors with their corresponding eigenvalues λ_1, ..., λ_d computed from the empirical data covariance, and b ∈ R^D is the data mean. This interpretable model can be seen as a residual network (ResNet) (He et al., 2015). In particular, the desired behavior is that as we move further from the support of p(z), the linear part of Eq. 10 becomes the dominant one, especially when bounded activation functions are utilized for f(·). Hence, M_Z will extrapolate meaningfully as we move further from the support of the prior. However, we also need the generated M_Z to be a valid immersion. For further discussion see Appendix 2.

5 Experiments

5.1 Deterministic generators

Synthetic data. Usually, in GANs the g(·) is a continuous function, so we expect some generated points to fall off the given data support, mainly when the data lie near a disconnected M and irrespective of training optimality. This is known as the distribution mismatch problem. We generate a synthetic dataset (Fig. 4) and we train a Wasserstein GAN (WGAN) (Arjovsky et al., 2017) with a latent space Z = R² and p(z) = N(0, I_2). For the ambient metric we used a positive RBF (Eq. 6). Further details are in Appendix 5.

Then, we define a density function in Z using the induced Riemannian metric as q(z) ∝ \tilde{p}_r(z) \cdot (1 + \sqrt{|M(z)|})^{-1} (Lebanon, 2002; Le and Cuturi, 2015), where \tilde{p}_r(z) is a uniform density within a ball of radius r. The behavior of q(z) is interpretable, since the density is high wherever M(·) is small, and that happens only in the regions of Z that g(·) learns to map near the given data manifold in X. Also, \tilde{p}_r(z) simply ensures that the support of q(z) is within the region where g(·) is trained. Intuitively, we can think of q(z) as an approximate aggregated posterior, but without training an encoder. Thus, we compare samples generated from q(z) using Markov Chain Monte Carlo (MCMC) and the prior p(z). We clearly see in Fig. 4 that our samples align better with the given data manifold. Furthermore, we used the precision and recall metrics (Kynkäänniemi et al., 2019) with K=10, which shows that we generate more accurate samples (high precision), while our coverage is sufficiently good (high recall).
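A sketch of the generator architecture in Eq. 10 above, showing how the PCA-based linear part can be assembled from data. The residual network f below is an untrained, bounded-activation stand-in and the data are placeholders; both are assumptions made only for illustration.

```python
import numpy as np

def build_pca_generator(X, f, d):
    """Construct g(z) = f(z) + U diag(sqrt(lambda_1..d)) z + b  (Eq. 10).

    X : (N, D) data matrix, used for the empirical mean b and covariance eigenpairs.
    f : callable R^d -> R^D, a (bounded-activation) neural network residual term.
    """
    b = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)          # ascending order
    lam = eigval[-d:][::-1]                       # top-d eigenvalues
    U = eigvec[:, -d:][:, ::-1]                   # corresponding eigenvectors, shape (D, d)
    W = U * np.sqrt(lam)                          # equals U @ diag(sqrt(lambda)), still (D, d)

    def g(z):
        return f(z) + W @ z + b                   # linear pPCA part dominates far from the prior
    return g

# Illustrative stand-in for f: one tanh layer with random (untrained) weights.
rng = np.random.default_rng(0)
D, d, N = 5, 2, 200
A, c = rng.normal(size=(D, d)), rng.normal(size=D)
f = lambda z: np.tanh(A @ z + c)                  # bounded, so it saturates for large ||z||

X = rng.normal(size=(N, D))                       # placeholder data
g = build_pca_generator(X, f, d)
print(g(np.zeros(d)))
print(g(10.0 * np.ones(d)))                       # far from the prior the linear part dominates
```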

[Figure 4 panels, left to right: Standard measure; Our measure; The proposed q(z); Our samples (Precision: 0.92, Recall: 0.27); Prior samples (Precision: 0.66, Recall: 0.28).]

Figure 4: Our measure shows that the ambient metric helps to properly capture the data manifold structure in the latent space. Also, we generate more accurate samples using the proposed q(z) compared to the prior p(z).
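For completeness, a sketch of drawing samples such as those in Fig. 4 from q(z) ∝ p̃_r(z)·(1 + √|M(z)|)^{-1} with a random-walk Metropolis step. The latent metric, the ball radius, and the proposal scale below are illustrative assumptions; the paper does not prescribe a specific MCMC scheme.

```python
import numpy as np

def log_q(z, metric, radius):
    """Unnormalized log of q(z) ∝ p̃_r(z) * (1 + sqrt(|M(z)|))^{-1}."""
    if np.linalg.norm(z) > radius:                     # uniform ball p̃_r(z): zero outside
        return -np.inf
    _, logdet = np.linalg.slogdet(metric(z))
    return -np.log1p(np.exp(0.5 * logdet))             # -log(1 + sqrt(det M(z)))

def metropolis(metric, radius, n_steps=5000, step=0.2, seed=0):
    """Random-walk Metropolis targeting q(z) in a 2D latent space."""
    rng = np.random.default_rng(seed)
    z = np.zeros(2)
    samples, lp = [], log_q(z, metric, radius)
    for _ in range(n_steps):
        prop = z + step * rng.standard_normal(2)
        lp_prop = log_q(prop, metric, radius)
        if np.log(rng.uniform()) < lp_prop - lp:       # accept/reject
            z, lp = prop, lp_prop
        samples.append(z.copy())
    return np.array(samples)

# Illustrative latent metric: small on an annulus, large elsewhere, mimicking a
# pull-back metric that is small where g(z) lands near the data manifold.
def metric(z):
    on_manifold = np.exp(-((np.linalg.norm(z) - 1.5) ** 2) / 0.1)
    return np.eye(2) / (on_manifold + 1e-2)

samples = metropolis(metric, radius=3.0)
print(samples.mean(axis=0), samples.std(axis=0))
```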

[Figure 5 top panels, left to right: Precision: 0.93, Recall: 0.91; Precision: 0.90, Recall: 0.92; Precision: 0.88, Recall: 0.98.]

Figure 5: Top panels: Samples from q(z) using M_{X'}(·) are more accurate (left), even if the ambient metric is not used (middle), compared to the samples from p(z) (right). Bottom panels: our shortest path (top) respects the data manifold structure, avoiding off-the-manifold "shortcuts" (middle), while the linear one (bottom) is arbitrary.

Moreover, using the sampled latent representations we fit a mixture of LANDs (Fig. 3), which are adaptive normal distributions on Riemannian manifolds (Arvanitidis et al., 2016). Hence, we can sample from each component individually, without training a conditional GAN (Mirza and Osindero, 2014).

MNIST data. Similarly, we perform an experiment with the MNIST digits 0, 1, 2 and Z = R^5. This is a high dimensional dataset, so the RBF used for M_X(·) fits poorly. Also, the Euclidean distance for images is not meaningful. Thus, after training the WGAN, we use PCA to project the data linearly onto a d'-dimensional subspace with D > d' > d and d' = 10, where we construct the ambient metric M_{X'}(·). This removes the non-informative dimensions from the data, while the global structure of the data manifold is approximately preserved. The results in Fig. 5 show that, indeed, M_{X'}(·) improves the sampling since we achieve higher precision, but lower recall, while our path traverses the data manifold better. We discuss the linear projection step in Appendix 2, and provide further implementation details and results in Appendix 5.

Pre-trained generator. We use for g(·) a pre-trained Progressive GAN on the CelebA dataset (Karras et al., 2018). We also train a classifier that distinguishes the blond people. As before, we use a linear projection to d' = 1000, where we define a cost based ambient metric M_{X'}(·) using a positive RBF (Eq. 6). This metric is designed to penalize regions in X that correspond to blonds. Our goal is to avoid these regions when interpolating in Z. As discussed in Sec. 3, it is not guaranteed that this deterministic g(·) properly captures the structure of the data manifold in Z, but even so, we test our ability to control the paths. As we observe in Fig. 7, only our path that utilizes the informative M_{X'}(·) successfully avoids crossing regions with blond hair. In particular, it corresponds to the optimal path on the generated manifold, while taking into account the high-level semantic information. In contrast, the shortest path without M_{X'}(·) passes through the high cost region in X, as it merely minimizes the distance on the manifold and does not utilize the additional information. Also, we show in Fig. 6 the classifier prediction along several interpolants. For further details and interpolation results see Appendix 5.

Figure 6 (Classifier result): Classification of the paths in X over time. Clearly, we see that only our shortest paths ( ) respect the ambient geometry, while both the straight line ( ) and the naive shortest path ( ) traverse regions which are classified as blond.
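A sketch of how a cost based ambient metric could be assembled in a linearly projected space X': project with PCA, then let a classifier-style cost raise the metric in regions to avoid. The positive-RBF cost below stands in for the trained "blond" classifier, and all components are illustrative assumptions; see Appendix 5 of the paper for the actual setup.

```python
import numpy as np

def pca_projection(X, d_prime):
    """Return a function mapping R^D to the top d' principal subspace of the data."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    V = Vt[:d_prime].T                                  # (D, d')
    return lambda x: V.T @ (x - mean)

def cost_metric(x_proj, cost, alpha=1.0, eps=1e-2):
    """Supervised variant of Eq. 6 in X': the metric is large where the cost is high."""
    return (alpha * cost(x_proj) + eps) * np.eye(x_proj.shape[0])

# Illustrative data and cost in the projected space (stand-ins, not CelebA or a trained classifier).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
project = pca_projection(X, d_prime=3)

avoid_center = project(X[0])                            # pretend this region should be avoided
cost = lambda xp: np.exp(-0.5 * np.sum((xp - avoid_center) ** 2))

print(cost_metric(project(X[1]), cost))                 # larger near the region we want to avoid
```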

Figure 7: Our interpolant (top) successfully avoids the high cost regions on the data manifold (blond hair), since it utilizes the high-level semantic information that is encoded into the ambient metric. The standard shortest path (middle) provides a smoother interpolation than the linear case (bottom).

Figure 8: Comparing interpolants in the latent space under different ambient metrics and their generated images. We can effectively control the shortest paths to follow high-level information incorporated in M_{X'}(·).
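Section 5.2 below combines several ambient metrics (an LDA-style class metric, a local-covariance metric, and a cost metric) by simple linear combination. The toy component metrics and weights in this sketch are assumptions, meant only to show that a positively weighted combination of smooth SPD metric fields is again a valid ambient metric.

```python
import numpy as np

def combine_metrics(metrics, weights):
    """Linear combination of ambient metric fields: M_X(x) = sum_i w_i * M_i(x).

    Positive weights and positive definite components keep the result positive definite.
    """
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights > 0), "weights must be positive to preserve positive definiteness"

    def M(x):
        return sum(w * m(x) for w, m in zip(weights, metrics))
    return M

# Illustrative components in R^2 (stand-ins for the LDA, local-covariance, and cost metrics).
class_metric = lambda x: np.diag([1.0, 10.0])                        # stretches one direction
cost_metric = lambda x: (1.0 + np.exp(-np.sum(x ** 2))) * np.eye(2)  # expensive near the origin

M_X = combine_metrics([class_metric, cost_metric], weights=[0.5, 0.5])
print(M_X(np.array([0.0, 0.0])))
print(M_X(np.array([3.0, 3.0])))
```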

Figure 9: Compare means using the standard shortest path (+) and our approach (◦) that respects high-level information, i.e., it avoids the area around (×) in X'.

Figure 10: The KDE using the shortest path and the straight line as distance measure.

5.2 Stochastic Generators

Controlling Shortest Paths. In Fig. 8 we compare the effect of several interpretable ambient metrics on the shortest paths in the latent space of a VAE trained with Z = R² on the MNIST digits 0, 1, 2, 3. As before, we project the data linearly to d' = 10 to construct there the metric M_{X'}(·). Under the Euclidean metric in X the path ( ) merely follows the structure of the generated M_Z, since it avoids regions with no data. At first, we construct an LDA metric in X' (Eq. 4) by considering the digits 0, 1, 3 to be in the same class. Hence, the resulting path ( ) avoids crossing the regions in Z that correspond to digit 2, while simultaneously respecting the geometry of M_Z. Also, we select 3 data points (×) and, using their 100 nearest neighbors in X', we construct the local covariance based metric (Eq. 5), so that the path ( ) moves closer to the selected points. Moreover, we linearly combine these two metrics to enforce the path ( ) to pass through 0, 1, 3 while moving closer to the selected points (×). Finally, we include in this linear combination a cost related metric (Eq. 6) based on a positive RBF that increases near the points (×) in X'. Therefore, the resulting path ( ) avoids these neighborhoods, while respecting the other ambient metrics and the geometry of M_Z.

Compare means. In Fig. 9 we compute the mean values for three sets of points ( ), comparing the standard shortest path (+) with our approach (◦) using a cost based M_{X'}(·) relying on a positive RBF with centers (×) in X'. Our method allows us to use domain knowledge to get prototypes that respect some desired properties. In such a way, we can use generative models for applications as in the natural sciences, where domain experts provide information through the metric.

Density Estimation. We show in Fig. 10 the kernel density estimation (KDE) in Z comparing the straight line to the shortest path, under the same bandwidth. For M_{X'}(·) we linearly combine an LDA metric, where each digit is a separate class, and a cost based metric with centers (×) in X'. The resulting M(·) distinguishes the classes better due to the LDA metric, while the density is reduced near the high cost regions. Further discussion for all the experiments can be found in Appendix 5.

6 Conclusion

We considered the ambient space of generative models as a Riemannian manifold. This allows us to encode high-level information through the metric, and we proposed an easy way to construct suitable metrics. In order to capture the geometry in the latent space, proper uncertainty estimation is essential for stochastic generators, while in the deterministic case one way is through the proposed meaningful extrapolation. Thus, we get interpretable shortest paths that respect the ambient space geometry, while moving optimally on the learned manifold. In the future, it will be interesting to investigate if and how we can use the ambient space geometry during the learning phase of a generative model.

Acknowledgments

SH was supported by a research grant (15334) from VILLUM FONDEN. This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement no 757360).

References

Arjovsky, M., Chintala, S., and Bottou, L. (2017). Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning (ICML).

Arora, S., Risteski, A., and Zhang, Y. (2018). Do GANs learn the distribution? Some theory and empirics. In International Conference on Learning Representations (ICLR).

Arvanitidis, G., Hansen, L. K., and Hauberg, S. (2016). A locally adaptive normal distribution. In Advances in Neural Information Processing Systems (NeurIPS).

Arvanitidis, G., Hansen, L. K., and Hauberg, S. (2017). Maximum likelihood estimation of Riemannian metrics from Euclidean data. In Geometric Science of Information (GSI).

Arvanitidis, G., Hansen, L. K., and Hauberg, S. (2018). Latent space oddity: on the curvature of deep generative models. In International Conference on Learning Representations (ICLR).

Arvanitidis, G., Hauberg, S., Hennig, P., and Schober, M. (2019). Fast and robust shortest paths on manifolds learned from data. In International Conference on Artificial Intelligence and Statistics (AISTATS).

Bishop, C. M. (2006). Pattern Recognition and Machine Learning (Information Science and Statistics).

Chen, N., Klushyn, A., Kurle, R., Jiang, X., Bayer, J., and Smagt, P. (2018). Metrics for Deep Generative Models. In International Conference on Artificial Intelligence and Statistics (AISTATS).

Davidson, T. R., Falorsi, L., De Cao, N., Kipf, T., and Tomczak, J. M. (2018). Hyperspherical variational auto-encoders.

Detlefsen, N. S., Jørgensen, M., and Hauberg, S. (2019). Reliable training and estimation of variance networks. In Advances in Neural Information Processing Systems (NeurIPS).

Dinh, L., Sohl-Dickstein, J., and Bengio, S. (2016). Density estimation using real NVP. CoRR, abs/1605.08803.

do Carmo, M. (1992). Riemannian Geometry. Mathematics (Boston, Mass.). Birkhäuser.

Donahue, J., Krähenbühl, P., and Darrell, T. (2016). Adversarial feature learning.

Dumoulin, V., Belghazi, I., Poole, B., Mastropietro, O., Lamb, A., Arjovsky, M., and Courville, A. (2016). Adversarially learned inference.

Eklund, D. and Hauberg, S. (2019). Expected path length on random manifolds. arXiv preprint.

Gal, Y. and Ghahramani, Z. (2015). Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. arXiv:1506.02142.

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems (NeurIPS).

Hastie, T. and Tibshirani, R. (1994). Discriminant adaptive nearest neighbor classification.

Hauberg, S. (2018). Only Bayes should learn a manifold.

Hauberg, S., Freifeld, O., and Black, M. (2012). A Geometric Take on Metric Learning. In Advances in Neural Information Processing Systems (NeurIPS).

He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep residual learning for image recognition.

Hennig, P. and Hauberg, S. (2014). Probabilistic Solutions to Differential Equations and their Application to Riemannian Statistics. In International Conference on Artificial Intelligence and Statistics (AISTATS).

Karras, T., Aila, T., Laine, S., and Lehtinen, J. (2018). Progressive growing of GANs for improved quality, stability, and variation. In International Conference on Learning Representations (ICLR).

Kingma, D. P. and Welling, M. (2014). Auto-Encoding Variational Bayes. In Proceedings of the 2nd International Conference on Learning Representations (ICLR).

Kynkäänniemi, T., Karras, T., Laine, S., Lehtinen, J., and Aila, T. (2019). Improved precision and recall metric for assessing generative models. In Advances in Neural Information Processing Systems (NeurIPS).

Lakshminarayanan, B., Pritzel, A., and Blundell, C. (2017). Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems (NeurIPS).

Lawrence, N. (2005). Probabilistic non-linear principal component analysis with Gaussian process latent variable models. Journal of Machine Learning Research.

Le, T. and Cuturi, M. (2015). Unsupervised Riemannian metric learning for histograms using Aitchison transformations. In International Conference on Machine Learning (ICML).

Lebanon, G. (2002). Learning Riemannian metrics. In Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence (UAI).

MacKay, D. J. C. (1992). A practical Bayesian framework for backpropagation networks. Neural Computation.

Mathieu, E., Le Lan, C., Maddison, C. J., Tomioka, R., and Teh, Y. W. (2019). Continuous hierarchical representations with Poincaré variational auto-encoders. In Advances in Neural Information Processing Systems (NeurIPS).

Mirza, M. and Osindero, S. (2014). Conditional generative adversarial nets.

Peyré, G., Péchaud, M., Keriven, R., and Cohen, L. D. (2010). Geodesic methods in computer vision and graphics. Foundations and Trends in Computer Graphics and Vision.

Que, Q. and Belkin, M. (2016). Back to the future: Radial basis function networks revisited. In International Conference on Artificial Intelligence and Statistics (AISTATS), Cadiz, Spain.

Rezende, D. and Mohamed, S. (2015). Variational inference with normalizing flows. In International Conference on Machine Learning (ICML).

Rezende, D. J., Mohamed, S., and Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. In International Conference on Machine Learning (ICML).

Shao, H., Kumar, A., and Fletcher, P. T. (2017). The Riemannian Geometry of Deep Generative Models. arXiv preprint.

Suárez, J. L., García, S., and Herrera, F. (2018). A tutorial on distance metric learning: Mathematical foundations, algorithms and experiments.

Tipping, M. E. and Bishop, C. (1999). Probabilistic principal component analysis. Journal of the Royal Statistical Society, Series B.

Tosi, A., Hauberg, S., Vellido, A., and Lawrence, N. D. (2014). Metrics for Probabilistic Geometries. In The Conference on Uncertainty in Artificial Intelligence (UAI).

Weinberger, K. Q., Blitzer, J., and Saul, L. K. (2006). Distance metric learning for large margin nearest neighbor classification. In Advances in Neural Information Processing Systems (NeurIPS).

Yang, T., Arvanitidis, G., Fu, D., Li, X., and Hauberg, S. (2018). Geodesic clustering in deep generative models. arXiv preprint.