
Lorentzian Distance Learning for Hyperbolic Representations

Marc T. Law 1 2 3   Renjie Liao 1 2   Jake Snell 1 2   Richard S. Zemel 1 2

1 University of Toronto, Canada  2 Vector Institute, Canada  3 NVIDIA, work done while affiliated with the University of Toronto. Correspondence to: Marc Law <[email protected]>.

Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).

Abstract

We introduce an approach to learn representations based on the Lorentzian distance in hyperbolic geometry. Hyperbolic geometry is especially suited to hierarchically-structured datasets, which are prevalent in the real world. Current hyperbolic representation learning methods compare examples with the Poincaré distance: they try to minimize the distance of each node in a hierarchy with its descendants while maximizing its distance with other nodes. This formulation produces node representations close to the centroid of their descendants. To obtain efficient and interpretable algorithms, we exploit the fact that the centroid w.r.t. the squared Lorentzian distance can be written in closed form. We show that the Euclidean norm of such a centroid decreases as the curvature of the hyperbolic space decreases. This property makes it appropriate to represent hierarchies where parent nodes minimize the distances to their descendants and have smaller Euclidean norm than their children. Our approach obtains state-of-the-art results in retrieval and classification tasks on different datasets.

1. Introduction

Generalizations of Euclidean space are important forms of data representation in machine learning. For instance, kernel methods (Shawe-Taylor et al., 2004) rely on Hilbert spaces that possess the structure of the inner product and are therefore used to compare examples. The properties of such spaces are well known, and closed-form relations are often exploited to obtain efficient, scalable, and interpretable training algorithms. While representing examples in a Euclidean space is appropriate to compare lengths and angles, non-Euclidean representations are useful when the task requires specific structure.

A common and natural non-Euclidean representation space is the spherical model (e.g. (Wang et al., 2017)), where the data lies on a unit hypersphere S^d = {x ∈ R^d : ‖x‖_2 = 1} and angles are compared with the cosine similarity function. Recently, some machine learning approaches (Nickel & Kiela, 2017; 2018; Ganea et al., 2018) have considered representing hierarchical datasets with the hyperbolic model. The motivation is that any finite tree can be mapped into a finite hyperbolic space while approximately preserving distances (Gromov, 1987), which is not the case for Euclidean space (Linial et al., 1995). Since hierarchies can be formulated as trees, hyperbolic spaces can be used to represent hierarchically structured data where the high-level nodes of the hierarchy are represented close to the origin whereas leaves are further away from the origin.

An important question is the form of hyperbolic geometry. Since their first formulation in the early nineteenth century by Lobachevsky and Bolyai, hyperbolic spaces have been used in many domains. In particular, they became popular in mathematics (e.g. space theory and differential geometry) and physics when Varicak (1908) discovered that special relativity theory (Einstein, 1905) had a natural interpretation in hyperbolic geometry. Various hyperbolic geometries and related distances have been studied since then; among them are the Poincaré metric, the Lorentzian distance (Ratcliffe, 2006), and the gyrodistance (Ungar, 2010; 2014).

In the case of hierarchical datasets, machine learning approaches that learn hyperbolic representations designed to preserve the hierarchical similarity order have typically employed the Poincaré metric. Usually, the optimization problem is formulated so that the representation of a node in a hierarchy should be closer to the representations of its children and other descendants than to any other node in the hierarchy. Based on (Gromov, 1987), the Poincaré metric is a sensible dissimilarity function as it satisfies all the properties of a distance metric and is thus natural to interpret.

In this paper, we explain why the squared Lorentzian distance is a better choice than the Poincaré metric. One analytic argument relies on Jacobi field (Lee, 2006) properties of Riemannian centers of mass (also called "Karcher means", although Karcher (2014) strongly discourages the use of that term). Another interesting property is that its centroid can be written in closed form.

Contributions: The main contribution of this paper is the study of the Lorentzian distance. We show that interpreting the squared Lorentzian distances to a set of points is equivalent to interpreting the distance to their centroid. We also study the dependence of the centroid on some hyperparameters, particularly the curvature of the manifold, which has an impact on the centroid's Euclidean norm; that norm is used as a proxy for depth in the hierarchy. This is the key motivation for our theoretical work characterizing its behavior w.r.t. the curvature. We relate the Lorentzian distance to other hyperbolic distances/geometries and explore its performance on retrieval and classification problems.

2. Background

In this section, we provide some technical background about hyperbolic geometry and introduce relevant notation. The interested reader may refer to (Ratcliffe, 2006).

2.1. Notation and definitions

To simplify the notation, we consider that vectors are row vectors and ‖·‖ is the ℓ2-norm. In the following, we consider three important spaces.

Poincaré ball: The Poincaré ball P^d is defined as the set of d-dimensional vectors with Euclidean norm smaller than 1 (i.e. P^d = {x ∈ R^d : ‖x‖ < 1}). Its associated distance is the Poincaré distance metric defined in Eq. (3).

Hyperboloid model: We consider some specific hyperboloid models H^{d,β} ⊆ R^{d+1} defined as follows:

    \mathcal{H}^{d,\beta} := \{ a = (a_0, \ldots, a_d) \in \mathbb{R}^{d+1} : \|a\|_{\mathcal{L}}^2 = -\beta,\ a_0 > 0 \}    (1)

where β > 0 and ‖a‖²_L = ⟨a, a⟩_L is the squared Lorentzian norm of a. The squared Lorentzian norm is derived from the Lorentzian inner product, defined for all a = (a_0, …, a_d) ∈ H^{d,β}, b = (b_0, …, b_d) ∈ H^{d,β} as:

    \langle a, b \rangle_{\mathcal{L}} := -a_0 b_0 + \sum_{i=1}^{d} a_i b_i \le -\beta    (2)

It is worth noting that ⟨a, b⟩_L = −β iff a = b; otherwise, ⟨a, b⟩_L < −β for all pairs (a, b) ∈ (H^{d,β})². Vectors in H^{d,β} are a subset of positive time-like vectors¹. The hyperboloid H^{d,β} has constant negative curvature −1/β. Moreover, every vector a ∈ H^{d,β} satisfies a_0 = √(β + Σ_{i=1}^d a_i²). We note H^d := H^{d,1} the space obtained when β = 1; it is called the unit hyperboloid model and is the main hyperboloid model considered in the literature.

Model space: Finally, we note F^d ⊆ R^d the output vector space of our model (e.g. the output representation of some neural network). We consider that F^d = R^d.

2.2. Optimizing the Poincaré distance metric

Most methods that compare hyperbolic representations (Nickel & Kiela, 2017; 2018; Ganea et al., 2018; Gulcehre et al., 2019) consider the Poincaré distance metric, defined for all c ∈ P^d, d ∈ P^d as:

    d_{\mathcal{P}}(c, d) = \cosh^{-1}\left( 1 + 2 \frac{\|c - d\|^2}{(1 - \|c\|^2)(1 - \|d\|^2)} \right)    (3)

which satisfies all the properties of a distance metric and is therefore natural to interpret. Direct optimization in P^d of problems using the distance formulation in Eq. (3) is numerically unstable for two main reasons (see for instance (Nickel & Kiela, 2018) or (Ganea et al., 2018, Section 4)). First, the denominator depends on the norms of examples, so optimizing over c and d when either of their norms is close to 1 leads to numerical instability. Second, elements have to be re-projected onto the Poincaré ball at each iteration with a fixed maximum norm. Moreover, Eq. (3) is not differentiable when c = d (see proof in appendix).

For better numerical stability of their solver, Nickel & Kiela (2018) propose to use an equivalent formulation of d_P in the unit hyperboloid model. They use the fact that there exists an invertible mapping h : H^{d,β} → P^d defined for all a = (a_0, …, a_d) ∈ H^{d,β} as:

    h(a) := \frac{1}{1 + \sqrt{1 + \sum_{i=1}^{d} a_i^2}} (a_1, \ldots, a_d) \in \mathcal{P}^d    (4)

When β = 1, a ∈ H^d, b ∈ H^d, we have the following equivalence:

    d_{\mathcal{H}}(a, b) = d_{\mathcal{P}}(h(a), h(b)) = \cosh^{-1}(-\langle a, b \rangle_{\mathcal{L}})    (5)

Nickel & Kiela (2018) show that optimizing the formulation in Eq. (5) in H^d is more stable numerically.

Duality between spherical and hyperbolic geometries: One can observe from Eq. (5) that preserving the order of Poincaré distances is equivalent to preserving the reverse order of Lorentzian inner products (defined in Eq. (2)), since the cosh⁻¹ function is monotonically increasing on its domain [1, +∞). The relationship between the Poincaré metric and the Lorentzian inner product is actually similar to the relationship between the geodesic distance cos⁻¹(⟨p, q⟩) and the cosine ⟨p, q⟩ (or the squared Euclidean distance ‖p − q‖² = 2 − 2⟨p, q⟩) when p and q are on a unit hypersphere S^d, because of the duality between these geometries (Ratcliffe, 2006). The hyperboloid H^{d,β} can be seen as a half hypersphere of imaginary radius i√β. In the same way as kernel methods that consider inner products in Hilbert spaces as similarity measures, we consider in this paper the

¹ A vector a that satisfies ⟨a, a⟩_L < 0 is called time-like, and it is called positive iff a_0 > 0.
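As a concrete check of the relations above, the following NumPy sketch lifts points of R^d onto the unit hyperboloid of Eq. (1), verifies the Lorentzian norm and inner-product properties of Eq. (2), and confirms the equivalence of Eq. (5) between the hyperboloid and Poincaré formulations. This is an illustration only, not code from the paper; the helper names (`lift`, `lorentz_inner`, `poincare_dist`, `h`) are our own.

```python
import numpy as np

def lorentz_inner(a, b):
    """Lorentzian inner product of Eq. (2): -a_0*b_0 + sum_i a_i*b_i."""
    return -a[0] * b[0] + np.dot(a[1:], b[1:])

def lift(x, beta=1.0):
    """Lift x in R^d onto H^{d,beta} using a_0 = sqrt(beta + ||x||^2)."""
    return np.concatenate(([np.sqrt(beta + np.sum(x ** 2))], x))

def h(a):
    """Mapping of Eq. (4) from the unit hyperboloid to the Poincare ball."""
    return a[1:] / (1.0 + np.sqrt(1.0 + np.sum(a[1:] ** 2)))

def poincare_dist(c, d):
    """Poincare distance metric of Eq. (3) on the open unit ball."""
    num = np.sum((c - d) ** 2)
    den = (1.0 - np.sum(c ** 2)) * (1.0 - np.sum(d ** 2))
    return np.arccosh(1.0 + 2.0 * num / den)

rng = np.random.default_rng(0)
a, b = lift(rng.normal(size=3)), lift(rng.normal(size=3))

# Eq. (1): points on the unit hyperboloid have squared Lorentzian norm -1.
assert abs(lorentz_inner(a, a) + 1.0) < 1e-9
# Eq. (2): the inner product of distinct points is strictly below -beta.
assert lorentz_inner(a, b) < -1.0
# Eq. (5): the hyperboloid and Poincare distance formulations agree.
d_H = np.arccosh(-lorentz_inner(a, b))
assert abs(d_H - poincare_dist(h(a), h(b))) < 1e-9
```

Note how the hyperboloid side of Eq. (5) involves no denominator approaching zero, which is the stability argument made above for optimizing in H^d rather than in P^d.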
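The order-reversal stated in the duality paragraph, that ranking neighbours by increasing Poincaré distance is the same as ranking them by decreasing Lorentzian inner product, can likewise be sketched. Again this is our own illustration with hypothetical helper names, not the authors' code.

```python
import numpy as np

def lorentz_inner(a, b):
    """Lorentzian inner product of Eq. (2)."""
    return -a[0] * b[0] + np.dot(a[1:], b[1:])

def poincare_dist(c, d):
    """Poincare distance metric of Eq. (3)."""
    num = np.sum((c - d) ** 2)
    den = (1.0 - np.sum(c ** 2)) * (1.0 - np.sum(d ** 2))
    return np.arccosh(1.0 + 2.0 * num / den)

def lift(x):
    """Lift x in R^d onto the unit hyperboloid H^d (beta = 1)."""
    return np.concatenate(([np.sqrt(1.0 + np.sum(x ** 2))], x))

def to_ball(a):
    """Eq. (4) with beta = 1, i.e. stereographic-style projection to P^d."""
    return a[1:] / (1.0 + a[0])

rng = np.random.default_rng(1)
anchor = lift(rng.normal(size=4))
others = [lift(rng.normal(size=4)) for _ in range(5)]

# Ranking by increasing Poincare distance ...
by_dp = np.argsort([poincare_dist(to_ball(anchor), to_ball(b)) for b in others])
# ... matches ranking by decreasing Lorentzian inner product, because
# cosh^{-1} is monotonically increasing on its domain [1, +inf).
by_inner = np.argsort([-lorentz_inner(anchor, b) for b in others])
assert (by_dp == by_inner).all()
```

This monotonicity is what lets nearest-neighbour retrieval work directly with Lorentzian inner products, without evaluating cosh⁻¹ at all.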