The World Is Not Always Flat Or Learning Curved Manifolds
Evgeni Begelfor and Michael Werman
School of Engineering and Computer Science, Hebrew University of Jerusalem, Jerusalem 91904, Israel

Abstract

Manifold learning, finding low-dimensional structure in data, is an important task. Most algorithms for this purpose embed the data in a Euclidean space, an approach that is destined to fail on non-flat data. This paper presents a non-iterative algebraic method for embedding the data into hyperbolic and spherical spaces. We argue that these spaces are often better than Euclidean space at capturing the geometry of the data. We also demonstrate the utility of these embeddings by showing how some of the standard data-handling algorithms translate to these curved manifolds.

1 Introduction and motivation

Embedding data in a low-dimensional space (also known as manifold learning) is important for many applications such as visualization, data manipulation (smoothing/noise reduction, etc.) and data exploration (clustering, etc.). Most algorithms tackle the problem by searching for an embedding into a Euclidean space $\mathbb{R}^d$ such that $d$ is as small as possible, while the data maintains some of its geometry (either locally or globally). The main advantage of Euclidean space is that it is simple and well understood. However, in many cases the inherent geometry of the data is not flat, so any embedding in $\mathbb{R}^d$ will necessarily have a large distortion. For example, embedding a regular tree with the shortest-path metric in Euclidean space requires a high dimension (Fig. 1). This is because the volume of a ball in $\mathbb{R}^d$ grows only polynomially, so there is not "enough space" for the tree. On the other hand, the volume in hyperbolic space grows exponentially, allowing a good embedding. As another example, points on a sphere, such as cities on Earth with their natural distances, will also be distorted by an embedding into Euclidean space (Fig. 2).

This paper suggests an efficient non-iterative algorithm for choosing the "right" space and embedding the data points in it. We also provide a method for computing geodesic distances, means and covariances in these spaces, which are useful for data manipulation and understanding. Our algorithm can be used either for embedding data given its pairwise dissimilarities, or for dimensionality reduction through classical metric MDS.

A number of papers have discussed the advantages of visualizing data using $\mathbb{H}^2$ or $\mathbb{H}^3$ [24, 12, 14]. Our algorithm enhances the embedding of the data for these visualizations.

[Figure 1: The distortion of embedding the nodes of a tree, with the pairwise path length as the metric, versus the dimension of the embedding space, for classical (Euclidean) MDS and curved MDS. Euclidean embedding requires a much higher dimension than the curved one.]

[Figure 2: Embedding cities with their airline distances. The distortion of the embedding is shown versus the curvature of the host space. The best curvature corresponds to the radius of the Earth.]

The paper will review the relevant properties of $\mathbb{S}^m$ and $\mathbb{H}^m$, then show how to choose the "best" space and embed the data into it, based on a theorem of Blumenthal, Menger and Schoenberg [18, 2]. We will then show how k-means, EM and mean-shift can be effectively computed in these spaces. The clustering is based on the possibility of computing geodesic distances, means and covariances in these spaces.
2 Previous work

Embedding data in a low-dimensional space has two variants. In dimensionality reduction or manifold learning we want to embed high-dimensional data in a lower-dimensional space while preserving some properties of the points' geometry, either globally or locally. In the second variant, called multidimensional scaling (MDS) (see [5] for a general introduction), we only have the points' pairwise dissimilarity measures, and our goal is to embed the data according to them.

There are numerous approaches to MDS and we will briefly mention a few. Classical metric MDS (sometimes referred to as PCA, principal component analysis) [22] minimizes the norm of the differences between the inter-point distances given by the dissimilarity matrix and those calculated between the embedded points. The main advantage of classical metric MDS is that it is non-iterative and guaranteed to minimize this norm functional. Non-classical metric MDS, for example [17], minimizes different functionals at the price of using iterative methods that are both computationally expensive and not guaranteed to converge to a global minimum. Non-metric MDS [20, 11] tries to preserve the order of the distances instead of their values. Often the metric space data (pairwise distances) has some missing values. There are a few methods to "fill in" the missing values, such as using geodesic distances and SDE [25].

One of the methods for solving the dimensionality reduction problem is to reduce it to MDS by constructing the inter-point dissimilarity matrix. PCA can be viewed as MDS using the global Euclidean distance. Isomap [21] tries to capture the geometry of the data by taking the "geodesic" distances between data points. Some non-metric methods for dimensionality reduction are LLE [16], HLLE [7] and Laplacian eigenmaps [1], which try to preserve relationships in the local neighborhoods of the data. All these methods work well if the data lies on or close to a manifold which is isometric to a subset of $\mathbb{R}^d$.

Learning of curved manifolds has been discussed in [23], although that paper addresses non-flatness arising from the sampling of the manifold, not from its own geometry. There have been papers on embedding finite metric spaces into hyperbolic space [19], especially for visualization [12, 24], into spherical space [6] and into general non-linear manifolds. This is a highly non-convex problem, and all solutions to date have been based on iterative methods that are prone to local minima. To emphasize this, we ran the following experiment: we took 100 random points in hyperbolic space, computed the distance matrix and then tried to find the embedding using iterative methods. Even in this case, when the distortion of the best embedding is zero, brute-force optimization results in an embedding with high distortion.

3 Geometrical Background

There are three types of Riemannian spaces with constant (sectional) curvature: spherical (curvature $> 0$), Euclidean (curvature $0$) and hyperbolic (curvature $< 0$).

3.1 Spherical Geometry

We model the $m$-dimensional sphere with curvature $\kappa > 0$ in the following way:

$$\mathbb{S}^m_\kappa = \{ x \in \mathbb{R}^{m+1} : \langle x, x \rangle = 1/\kappa \},$$

with the distance

$$d(x, y) = \frac{1}{\sqrt{\kappa}} \arccos\!\left( \kappa \langle x, y \rangle \right).$$

(The curvature $\kappa$ may seem artificial in the way it is presented here; see [4, 13] for more information.)
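As a concrete illustration of this representation, the following minimal Python sketch (our own illustration, not code from the paper; the function name and the clipping guard are our choices) computes the spherical geodesic distance for points stored as vectors in $\mathbb{R}^{m+1}$:

```python
import numpy as np

def spherical_distance(x, y, kappa):
    """Geodesic distance between x and y on the sphere of curvature kappa > 0,
    where points are vectors in R^(m+1) satisfying <x, x> = 1/kappa."""
    # Clip against round-off pushing the argument slightly outside [-1, 1].
    c = np.clip(kappa * np.dot(x, y), -1.0, 1.0)
    return np.arccos(c) / np.sqrt(kappa)

# Example: two orthogonal points on the unit-curvature sphere (radius 1).
x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])
print(spherical_distance(x, y, kappa=1.0))  # pi/2
```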
3.2 Hyperbolic Geometry

There are different models of hyperbolic space; in this paper we use the hyperboloid model. For visualization purposes [14, 24] the Klein or Poincare disk models may be more appropriate. (For further background the reader is referred to [8].) On $\mathbb{R}^{m+1}$ we define a bilinear form

$$\langle x, y \rangle_H = -x_0 y_0 + \sum_{i=1}^{m} x_i y_i .$$

The hyperboloid consisting of all vectors $x$ such that $\langle x, x \rangle_H = -1/\kappa$ has two sheets. We take $\mathbb{H}^m_\kappa$ to be the sheet containing points with positive $x_0$, with the distance

$$d(x, y) = \frac{1}{\sqrt{\kappa}} \operatorname{arccosh}\!\left( -\kappa \langle x, y \rangle_H \right),$$

where $\cosh t = (e^t + e^{-t})/2$ and $\operatorname{arccosh}$ is its inverse function.

3.3 Euclidean Geometry

The well-known Euclidean space $\mathbb{R}^m$ can also be seen as the limit of the hyperbolic spaces $\mathbb{H}^m_\kappa$ or the spherical spaces $\mathbb{S}^m_\kappa$ as $\kappa$ tends to zero.

4 C-MDS: Curved Multidimensional Scaling

The goal of the well-known classical MDS is to embed a given metric space in $\mathbb{R}^m$ equipped with the Euclidean metric, minimizing the average distortion of the distances. We shall describe a procedure generalizing MDS to non-flat spaces with constant curvature (spheres and hyperbolic spaces), which is very similar in spirit to the original MDS and also depends on a generalization of the PCA/SVD method.

We need a theorem connecting the distances in a metric space $X$ with its isometric embeddability in a space of constant curvature. The theorem goes back to Blumenthal, Menger and Schoenberg [18, 2], and the proof reproduced here was taken from [10].

Let $X = \{x_1, \ldots, x_n\}$ be a finite metric space of size $n$ with metric $d$, and let $\kappa > 0$. Define the $n \times n$ matrices

$$H_{ij} = \cosh\!\left( \sqrt{\kappa}\, d(x_i, x_j) \right), \qquad C_{ij} = \cos\!\left( \sqrt{\kappa}\, d(x_i, x_j) \right).$$

Theorem 1. Let $X$, $d$, $\kappa$, $H$ and $C$ be as above.

1. The metric space $X$ can be isometrically embedded in $\mathbb{H}^m_\kappa$ iff the signature of $H$ is $(1, k, n-k-1)$ with $k \le m$.

2. The metric space $X$ can be isometrically embedded in $\mathbb{S}^m_\kappa$ iff the signature of $C$ is $(k, 0, n-k)$ with $k \le m+1$.

Here signature $(p, q, z)$ means that the matrix has $p$ positive eigenvalues, $q$ negative eigenvalues and $z$ zero eigenvalues.

Proof. The hyperbolic case: let $x_1, \ldots, x_n$ be points in $\mathbb{H}^m_\kappa$. In the hyperboloid model each $x_i$ is a vector in $\mathbb{R}^{m+1}$. As defined above, the distances between them are

$$d(x_i, x_j) = \frac{1}{\sqrt{\kappa}} \operatorname{arccosh}\!\left( -\kappa \langle x_i, x_j \rangle_H \right),$$

thus $H_{ij} = -\kappa \langle x_i, x_j \rangle_H$. In other words, if we write the $x_i$ as the rows of a matrix $A$ and define the $(m+1) \times (m+1)$ matrix $J = \operatorname{diag}(-1, 1, \ldots, 1)$, then $H = -\kappa\, A J A^T$, and the condition on the signature follows.

In the other direction, if $H$ has signature $(1, k, n-k-1)$ with $k \le m$, then there exists an $n \times (k+1)$ matrix $A$ such that $H = -\kappa\, A J A^T$.
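To make the use of Theorem 1 concrete, here is a minimal sketch of the hyperbolic branch of the embedding step: build the cosh matrix from the pairwise distances, eigendecompose it, and read an (approximate) embedding off the eigenvectors via the factorization $H = -\kappa A J A^T$. This is our own illustration under the assumptions above, not the authors' code; the name `hyperbolic_mds` is hypothetical, and truncating to the $m$ most negative eigenvalues is the natural MDS-style relaxation when the data is not exactly embeddable.

```python
import numpy as np

def hyperbolic_mds(D, kappa, m):
    """Sketch of the hyperbolic branch of curved MDS (illustrative only).

    D     : (n, n) symmetric matrix of pairwise distances.
    kappa : positive scalar; the target space is the hyperboloid model of
            the m-dimensional hyperbolic space of curvature -kappa.
    m     : embedding dimension.
    Returns an (n, m+1) array whose rows approximately satisfy
    <x, x>_H = -1/kappa, with hyperbolic distances close to D.
    """
    H = np.cosh(np.sqrt(kappa) * D)          # H_ij = cosh(sqrt(kappa) d_ij)
    evals, evecs = np.linalg.eigh(H)         # eigenvalues in ascending order

    # One coordinate comes from the single positive eigenvalue.
    lam_pos, v_pos = evals[-1], evecs[:, -1]
    if v_pos.sum() < 0:                      # choose the sheet with x_0 > 0
        v_pos = -v_pos
    x0 = np.sqrt(lam_pos / kappa) * v_pos

    # The m "space" coordinates come from the m most negative eigenvalues,
    # matching H = -kappa * A J A^T with J = diag(-1, 1, ..., 1).
    lam_neg, v_neg = evals[:m], evecs[:, :m]
    xs = v_neg * np.sqrt(np.maximum(-lam_neg, 0.0) / kappa)

    return np.column_stack([x0, xs])
```

The recovered pairwise distances are then $d(x_i, x_j) = \operatorname{arccosh}(-\kappa \langle x_i, x_j \rangle_H)/\sqrt{\kappa}$; when the data is only approximately embeddable, one would additionally renormalize each row back onto the hyperboloid. The spherical branch is analogous: replace $\cosh$ by $\cos$, require $C$ to be (approximately) positive semidefinite, and scale the top $m+1$ eigenvectors by $\sqrt{\lambda_i/\kappa}$. The curvature itself can be selected by scanning a range of values and keeping the one with the lowest distortion, as in Figure 2.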