Hyperbolic Entailment Cones for Learning Hierarchical Embeddings
Octavian-Eugen Ganea¹  Gary Bécigneul¹  Thomas Hofmann¹

arXiv:1804.01882v3 [cs.LG] 6 Jun 2018

¹Department of Computer Science, ETH Zurich, Switzerland. Correspondence to: Octavian-Eugen Ganea <[email protected]>, Gary Bécigneul <[email protected]>, Thomas Hofmann <[email protected]>.

Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018. Copyright 2018 by the author(s).

Abstract

Learning graph representations via low-dimensional embeddings that preserve relevant network properties is an important class of problems in machine learning. We here present a novel method to embed directed acyclic graphs. Following prior work, we first advocate for using hyperbolic spaces, which provably model tree-like structures better than Euclidean geometry. Second, we view hierarchical relations as partial orders defined using a family of nested geodesically convex cones. We prove that these entailment cones admit an optimal shape with a closed-form expression in both the Euclidean and hyperbolic spaces, and that they canonically define the embedding learning process. Experiments show significant improvements of our method over strong recent baselines, both in terms of representational capacity and generalization.

1. Introduction

Producing high quality feature representations of data such as text or images is a central point of interest in artificial intelligence. A large line of research focuses on embedding discrete data such as graphs (Grover & Leskovec, 2016; Goyal & Ferrara, 2017) or linguistic instances (Mikolov et al., 2013; Pennington et al., 2014; Kiros et al., 2015) into continuous spaces that exhibit certain desirable geometric properties. This class of models has reached state-of-the-art results for various tasks and applications, such as link prediction in knowledge bases (Nickel et al., 2011; Bordes et al., 2013) or in social networks (Hoff et al., 2002), text disambiguation (Ganea & Hofmann, 2017), word hypernymy (Shwartz et al., 2016), textual entailment (Rocktäschel et al., 2015) or taxonomy induction (Fu et al., 2014).

Popular methods typically embed symbolic objects in low-dimensional Euclidean vector spaces using a strategy that aims to capture semantic information such as functional similarity. Symmetric distance functions are usually minimized between representations of correlated items during the learning process. Popular examples are word embedding algorithms trained on corpus co-occurrence statistics, which have been shown to strongly relate semantically close words and their topics (Mikolov et al., 2013; Pennington et al., 2014). However, in many fields (e.g. recommender systems, genomics (Billera et al., 2001), social networks), one has to deal with data whose latent anatomy is best defined by non-Euclidean spaces such as Riemannian manifolds (Bronstein et al., 2017). Here, the Euclidean symmetric models suffer from not properly reflecting complex data patterns such as the latent hierarchical structure inherent in taxonomic data. To address this issue, the emerging trend of geometric deep learning¹ is concerned with non-Euclidean manifold representation learning.

In this work, we are interested in geometrical modeling of hierarchical structures, directed acyclic graphs (DAGs) and entailment relations via low-dimensional embeddings. Starting from the same motivation, the order embeddings method (Vendrov et al., 2015) explicitly models the partial order induced by entailment relations between embedded objects. Formally, a vector $x \in \mathbb{R}^n$ represents a more general concept than any other embedding from the Euclidean entailment region $O_x := \{y \mid y_i \geq x_i, \ \forall 1 \leq i \leq n\}$. A first concern is that the capacity of order embeddings grows only linearly with the embedding space dimension. Moreover, the regions $O_x$ suffer from heavy intersections, implying that their disjoint volumes rapidly become bounded². As a consequence, representing wide (with high branching factor) and deep hierarchical structures in a bounded region of the Euclidean space would cause many points to end up undesirably close to each other. This also implies that Euclidean distances would no longer be capable of reflecting the original tree metric.

¹ http://geometricdeeplearning.com/
² For example, in $n$ dimensions, no $n + 1$ distinct regions $O_x$ can simultaneously have unbounded disjoint sub-volumes.
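To make the order-embedding entailment region concrete, here is a minimal NumPy sketch (the function name and toy vectors are ours, for illustration only) of the coordinate-wise membership test that defines $O_x$:

```python
import numpy as np

def in_entailment_region(x: np.ndarray, y: np.ndarray) -> bool:
    """Check whether y lies in O_x = {y | y_i >= x_i for all i},
    i.e. whether x represents a more general concept than y
    under the order-embedding model (Vendrov et al., 2015)."""
    return bool(np.all(y >= x))

# A toy 2-D hierarchy: "entity" is more general than "animal",
# which is more general than "dog".
entity = np.array([0.1, 0.2])
animal = np.array([0.5, 0.6])
dog    = np.array([0.9, 0.7])

assert in_entailment_region(entity, animal)   # entity entails animal
assert in_entailment_region(animal, dog)      # animal entails dog
assert in_entailment_region(entity, dog)      # transitivity holds
assert not in_entailment_region(dog, entity)  # but not the converse
```

Note that each region $O_x$ is an axis-aligned translated orthant, so in $n$ dimensions only $n$ "directions of generality" are available: this is the linear-capacity limitation discussed above.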
Fortunately, the hyperbolic space does not suffer from the aforementioned capacity problem, because the volume of any ball grows exponentially with its radius, instead of polynomially as in the Euclidean space. This exponential growth property enables hyperbolic spaces to embed any weighted tree while almost preserving its metric³ (Gromov, 1987; Bowditch, 2006; Sarkar, 2011). The tree-likeness of hyperbolic spaces has been extensively studied (Hamann, 2017). Moreover, hyperbolic spaces are used to visualize large hierarchies (Lamping et al., 1995), to efficiently forward information in complex networks (Krioukov et al., 2009; Cvetkovski & Crovella, 2009) or to embed heterogeneous, scale-free graphs (Shavitt & Tankel, 2008; Krioukov et al., 2010; Bläsius et al., 2016).

³ See end of Section 2.2 for a rigorous formulation.

From a machine learning perspective, hyperbolic spaces have recently been observed to provide powerful representations of entailment relations (Nickel & Kiela, 2017). The latent hierarchical structure surprisingly emerges as a simple reflection of the space's negative curvature. However, the approach of (Nickel & Kiela, 2017) suffers from a few drawbacks: first, their loss function causes most points to collapse on the border of the Poincaré ball, as exemplified in Figure 3. Second, the hyperbolic distance alone (being symmetric) is not capable of encoding the asymmetric relations needed for entailment detection, so a heuristic score is chosen to account for the concept generality or specificity encoded in the embedding norm.

We here take inspiration from hyperbolic embeddings (Nickel & Kiela, 2017) and order embeddings (Vendrov et al., 2015). Our contributions are as follows:

• We address the aforementioned issues of (Nickel & Kiela, 2017) and (Vendrov et al., 2015). We propose to replace the entailment regions $O_x$ of order embeddings by a more efficient and generic class of objects, namely geodesically convex entailment cones. These cones are defined on a large class of Riemannian manifolds and induce a partial ordering relation in the embedding space.

• The optimal entailment cones satisfying four natural properties surprisingly exhibit canonical closed-form expressions in both Euclidean and hyperbolic geometry, which we rigorously derive.

• An efficient algorithm for learning hierarchical embeddings of directed acyclic graphs is presented. This learning process is driven by our entailment cones.

• Experimentally, we learn high quality embeddings and improve over the experimental results of (Nickel & Kiela, 2017) and (Vendrov et al., 2015) on hypernymy link prediction for word embeddings, both in terms of model capacity and generalization performance.

We also compute an analytic closed-form expression for the exponential map in the $n$-dimensional Poincaré ball, allowing us to perform full Riemannian optimization (Bonnabel, 2013) in the Poincaré ball, as opposed to the approximate optimization method used by (Nickel & Kiela, 2017).

2. Mathematical preliminaries

We now briefly visit some key concepts needed in our work.

Notations. We always use $\|\cdot\|$ to denote the Euclidean norm of a point (in both hyperbolic and Euclidean spaces). We also use $\langle \cdot, \cdot \rangle$ to denote the Euclidean scalar product.

2.1. Differential geometry

For rigorous reasoning about hyperbolic spaces, one needs to use concepts from differential geometry, some of which we highlight here. For an in-depth introduction, we refer the reader to (Spivak, 1979) and (Hopper & Andrews, 2010).

Manifold. A manifold $\mathcal{M}$ of dimension $n$ is a set that can be locally approximated by the Euclidean space $\mathbb{R}^n$. For instance, the sphere $\mathbb{S}^2$ and the torus $\mathbb{T}^2$ embedded in $\mathbb{R}^3$ are 2-dimensional manifolds, also called surfaces, as they can locally be approximated by $\mathbb{R}^2$. The notion of manifold is a generalization of the notion of surface.

Tangent space. For $x \in \mathcal{M}$, the tangent space $T_x\mathcal{M}$ of $\mathcal{M}$ at $x$ is defined as the $n$-dimensional vector space approximating $\mathcal{M}$ around $x$ at first order. It can be defined as the set of vectors $v$ that can be obtained as $v := c'(0)$, where $c : (-\varepsilon, \varepsilon) \to \mathcal{M}$ is a smooth path in $\mathcal{M}$ such that $c(0) = x$.

Riemannian metric. A Riemannian metric $g$ on $\mathcal{M}$ is a collection $(g_x)_x$ of inner products $g_x : T_x\mathcal{M} \times T_x\mathcal{M} \to \mathbb{R}$ on each tangent space $T_x\mathcal{M}$, depending smoothly on $x$. Although it defines the geometry of $\mathcal{M}$ only locally, it induces a global distance function $d : \mathcal{M} \times \mathcal{M} \to \mathbb{R}_+$ by setting $d(x, y)$ to be the infimum of the lengths of all smooth curves joining $x$ to $y$ in $\mathcal{M}$, where the length $\ell$ of a curve $\gamma : [0, 1] \to \mathcal{M}$ is defined as

$\ell(\gamma) = \int_0^1 \sqrt{g_{\gamma(t)}(\gamma'(t), \gamma'(t))}\, dt.$  (1)
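To make equation (1) concrete, the sketch below numerically approximates the length of a straight segment under two metrics: the Euclidean metric, which recovers the usual distance, and a conformal metric of the form $g_x = \lambda_x^2 g^E$ with $\lambda_x = 2/(1 - \|x\|^2)$. This is the standard Poincaré-ball conformal factor, assumed here before the paper formally introduces it; the helper name `curve_length` is ours.

```python
import numpy as np

def curve_length(gamma, metric, n_steps: int = 10_000) -> float:
    """Numerically approximate Eq. (1): the length of a curve
    gamma: [0, 1] -> M, where metric(x) returns the matrix of g_x
    in ambient coordinates."""
    ts = np.linspace(0.0, 1.0, n_steps + 1)
    length = 0.0
    for t0, t1 in zip(ts[:-1], ts[1:]):
        velocity = (gamma(t1) - gamma(t0)) / (t1 - t0)  # finite-difference gamma'(t)
        g = metric(gamma(0.5 * (t0 + t1)))              # metric at the midpoint
        length += np.sqrt(velocity @ g @ velocity) * (t1 - t0)
    return length

# Straight segment from the origin to (0.5, 0) inside the unit ball.
gamma = lambda t: np.array([0.5 * t, 0.0])

euclidean = lambda x: np.eye(2)
# Conformal Poincare-ball metric (standard form, assumed here):
poincare = lambda x: (2.0 / (1.0 - x @ x)) ** 2 * np.eye(2)

print(curve_length(gamma, euclidean))  # ~0.5, the Euclidean length
print(curve_length(gamma, poincare))   # ~2 * artanh(0.5) = 1.0986...
print(2 * np.arctanh(0.5))             # closed form, for comparison
```

The same segment is thus roughly twice as long under the conformal metric: lengths blow up near the boundary of the ball, which is exactly how the hyperbolic capacity discussed in the introduction arises.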
Riemannian manifold. A smooth manifold equipped with a Riemannian metric is called a Riemannian manifold. Subsequently, due to their metric properties, we will only consider such manifolds.

Geodesics. A geodesic (straight line) between two points $x, y \in \mathcal{M}$ is a smooth curve of minimal length joining $x$ to $y$ in $\mathcal{M}$. Geodesics define shortest paths on the manifold. They are a generalization of lines in the Euclidean space.

Exponential map. The exponential map $\exp_x : T_x\mathcal{M} \to \mathcal{M}$ around $x$, when well-defined, maps a small perturbation of $x$ by a vector $v \in T_x\mathcal{M}$ to a point $\exp_x(v) \in \mathcal{M}$, such that $t \in [0, 1] \mapsto \exp_x(tv)$ is a geodesic joining $x$ to $\exp_x(v)$.

In the Poincaré ball model $\mathbb{D}^n = \{x \in \mathbb{R}^n : \|x\| < 1\}$ of hyperbolic geometry, the Riemannian metric determines all geometric notions (tangent spaces, straight lines (geodesics), curve lengths, volume elements): the Euclidean metric is changed by a simple scalar field, hence the model is conformal (i.e. angle preserving), yet it distorts distances.

Induced distance and norm. It is known (Nickel & Kiela, 2017) that the induced distance between two points $x, y \in \mathbb{D}^n$ is

$d(x, y) = \cosh^{-1}\left(1 + 2\,\frac{\|x - y\|^2}{(1 - \|x\|^2)(1 - \|y\|^2)}\right).$
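The following sketch is a direct NumPy transcription of the induced distance formula above (the function name is ours). Evaluating it on points approaching the boundary also illustrates the capacity argument from the introduction:

```python
import numpy as np

def poincare_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Hyperbolic distance between points x, y in the open unit ball
    (Poincare ball model), following Nickel & Kiela (2017)."""
    sq = np.sum((x - y) ** 2)
    denom = (1.0 - np.sum(x ** 2)) * (1.0 - np.sum(y ** 2))
    return float(np.arccosh(1.0 + 2.0 * sq / denom))

origin = np.zeros(2)
for r in (0.5, 0.9, 0.99, 0.999):
    x = np.array([r, 0.0])
    # The distance diverges as r -> 1: the ball's finite Euclidean
    # radius hides an unbounded amount of hyperbolic volume.
    print(r, poincare_distance(origin, x))
```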
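The paper announces an analytic closed form for the exponential map in the Poincaré ball, derived later in the paper and not reproduced here. As an illustrative stand-in, the sketch below uses the well-known Möbius-gyrovector form $\exp_x(v) = x \oplus \tanh(\lambda_x \|v\| / 2)\, v/\|v\|$, an identity we assume from the hyperbolic geometry literature rather than from this section, and numerically checks the defining property that $t \mapsto \exp_x(tv)$ traces a constant-speed geodesic.

```python
import numpy as np

def mobius_add(x, y):
    """Mobius addition on the unit ball (standard gyrovector form)."""
    xy, xx, yy = x @ y, x @ x, y @ y
    num = (1 + 2 * xy + yy) * x + (1 - xx) * y
    return num / (1 + 2 * xy + xx * yy)

def exp_map(x, v):
    """Exponential map at x on the Poincare ball via Mobius addition:
    a standard identity assumed here, not this paper's derivation.
    Assumes v != 0."""
    lam = 2.0 / (1.0 - x @ x)  # conformal factor at x
    nv = np.linalg.norm(v)
    return mobius_add(x, np.tanh(lam * nv / 2.0) * v / nv)

def dist(x, y):
    """Poincare distance (same formula as above), used for checking."""
    num = 2.0 * np.sum((x - y) ** 2)
    den = (1.0 - x @ x) * (1.0 - y @ y)
    return np.arccosh(1.0 + num / den)

x = np.array([0.3, 0.1])
v = np.array([0.2, -0.4])
lam = 2.0 / (1.0 - x @ x)
# d(x, exp_x(t v)) should equal t times the Riemannian norm of v at x,
# i.e. t * lam * ||v||, confirming unit-speed reparametrization:
for t in (0.25, 0.5, 1.0):
    print(dist(x, exp_map(x, t * v)), t * lam * np.linalg.norm(v))
```

Having such a map in closed form is what enables the full Riemannian optimization mentioned in the contributions: gradient steps are taken in the tangent space and mapped back onto the manifold with $\exp_x$.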