Metric Embedding, Hyperbolic Space, and Social Networks∗
Total Page:16
File Type:pdf, Size:1020Kb
Metric Embedding, Hyperbolic Space, and Social Networks∗ Kevin Verbeek† Subhash Suri† ABSTRACT matical content, these embeddings have also become a ma- jor tool in the analysis of high-dimensional data: the general We consider the problem of embedding an undirected graph idea is to embed a high-dimensional metric space into a much into hyperbolic space with minimum distortion. A funda- smaller-dimensional space without creating large distortion mental problem in its own right, it has also drawn a great in pair-wise distances. A celebrated result in this area is the deal of interest from applied communities interested in em- Johnson-Lindenstrauss Lemma [7], showing that metrics in- pirical analysis of large-scale graphs. In this paper, we es- duced by n points in an Euclidean space can be embedded tablish a connection between distortion and quasi-cyclicity 2 with (1 + ε) distortion into O(ε− log n)-dimensional space. of graphs, and use it to derive lower and upper bounds The reader may consult the book by Matouˇsek [13] for an on metric distortion. Two particularly simple and natu- excellent survey and exposition to many elegant results on ral graphs with large quasi-cyclicity are n-node cycles and metric embeddings. n n square lattices, and our lower bound shows that any In a different context, researchers in the networking com- hyperbolic-space× embedding of these graphs incurs a multi- munities have also explored geometric embeddings for net- plicative distortion of at least Ω(n/ log n). This is in sharp work measurements and geographic routing [8, 17, 18, 20]. contrast to Euclidean space, where both of these graphs can A number of network services use shortest path distances to be embedded with only constant multiplicative distortion. implement application level multicast trees or routes [8, 14, We also establish a relation between quasi-cyclicity and δ- 17]. While maintaining the full matrix of pairwise distances hyperbolicity of a graph as a way to prove upper bounds is infeasible in Internet scale networks, a geometric embed- on the distortion. Using this relation, we show that graphs ding gives a highly compact representation of those dis- with small quasi-cyclicity can be embedded into hyperbolic tances. Similarly, virtual coordinates and geographic rout- space with only constant additive distortion. Finally, we also ing are an elegant way to avoid the need for full network present an efficient (linear-time) randomized algorithm for topology at each network router. There is also a great deal of embedding a graph with small quasi-cyclicity into hyperbolic interest in estimating pair-wise distances in social networks, space, so that with high probability at least a (1 ε) frac- as a means to measure influence, to recommend friends, and tion of the node-pairs has only constant additive distortion.− to show selective profiles, but running all-pair shortest path Our results also give a plausible theoretical explanation for algorithms is infeasible at their scale. Fortunately, a number why social networks have been observed to embed well into of empirical studies have shown that simple geometric em- hyperbolic space: they tend to have small quasi-cyclicity. beddings are good enough for estimating (most) pair-wise network distances in large graphs [19, 20]. The embedding 1. INTRODUCTION dimension is typically small (e.g. 10), and therefore the dis- Metric embeddings have been a topic of intense research tance between any two nodes can be calculated in constant in many areas of computer science, including geometry, the- time. Interestingly, these experimental results also show ory, databases and machine learning, during the past few that hyperbolic-space embeddings lead to higher accuracy decades. Besides their fundamental nature and rich mathe- (lower distortion) than Euclidean space [18, 20]. Kleinberg’s well-known result [8] that every connected finite graph has ∗This research was partially supported by the National a greedy embedding into the hyperbolic plane, while many Science Foundation Grant CNS-1035917 and by DARPA GRAPHS (BAA-12-01). graphs do not admit such an embedding in the Euclidean plane, also underscores the relative advantage of hyperbolic †Department of Computer Science, University of California Santa Barbara [kverbeek|suri]@cs.ucsb.edu space. Some researchers have even argued that the “hidden space” of the Internet and social networks is naturally the negatively curved hyperbolic space [12]. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not Our work is a theoretical counterpart to this line of em- made or distributed for profit or commercial advantage and that copies bear pirical research. We are interested both in establishing the this notice and the full citation on the first page. Copyrights for components worst-case distortion bounds for hyperbolic space embed- of this work owned by others than ACM must be honored. Abstracting with ding in general, and in discovering useful characterizations credit is permitted. To copy otherwise, or republish, to post on servers or to of graphs that may explain the observed low distortion in redistribute to lists, requires prior specific permission and/or a fee. Request hyperbolic embedding of large-scale social networks. We permissions from [email protected]. SoCG’14, June 8–11, 2014, Kyoto, Japan. begin with some formal definitions necessary to formulate Copyright 2014 ACM 978-1-4503-2594-3/14/06 ...$15.00. the problem and state our results. 1 1.1 Metric Spaces and Embeddings Using this relation, we show that graphs with small quasi- A metric space is a tuple ( ,d) where is the underlying cyclicity can be embedded into hyperbolic space with only X X space and d: R is the metric (or distance func- constant additive distortion. Finally, we present an efficient tion) measuringX× theX distance→ between two elements of . (linear-time) randomized algorithm for embedding a graph X A metric embedding (or simply embedding)of( 1,d1)into with small quasi-cyclicity into hyperbolic space, so that at X ( 2,d2)isamapf : 1 2. In this paper, we are primar- least a (1 ε) fraction of the node-pairs have constant ad- X X →X − ily concerned with embedding the graph metric (V,dG)of ditive distortion, with high probability. an undirected graph G =(V,E)intok-dimensional hyper- k 1.3 Related Work bolic space (H ,dH), where dG measures the length of the The body of research on hyperbolic space and its connec- shortest path between two nodes in G, and dH is the stan- k dard metric of H . In all our embeddings, the dimension k tions to various network properties and dynamics is large. of the target space is a constant, independent of the size of Due to lack of space, we just mention a few papers that the input graph. When the metric is clear from the context, are algorithmic in nature and explore questions similar to we simply refer to it as or G. the ones discussed in our paper. In [11], Krauthgamer and The quality of an embeddingX is measured by the worst- Lee present algorithms for approximate nearest-neighbors case distortion over all the distances in the metric, either and low-stretch routing, as well as a PTAS for the Travel- in relative or absolute terms. In particular, let f be an ing Salesman Problem in hyperbolic space. Their results embedding of the metric space ( 1,d1) into the metric space also apply to δ-hyperbolic spaces. In [3], Chepoi et al. X ( 2,d2). We say that f has multiplicative distortion c, for present schemes for computing an additive approximation cX 1, if of the diameter, center, and radius of δ-hyperbolic spaces ≥ and graphs. They also show that several graph classes are d2(f(x),f(y)) d1(x, y) cd2(f(x),f(y)) x, y 1 (1) ≤ ≤ ∀ ∈X δ-hyperbolic and present a linear-time algorithm for approx- The embedding f has additive distortion c if imating trees of n-node δ-hyperbolic graphs with O(δ log n) additive distortion. In [16], Sarkar shows that every finite d1(x, y) d2(f(x),f(y)) c x, y 1 (2) 2 | − |≤ ∀ ∈X tree admits an embedding into H with only 1 + ε multi- All our lower bounds in Section 3 are shown using multi- plicative distortion, for any ε>0; for this result, however, plicative distortion, which implies similar (in fact, stronger) the input metric needs to be scaled by a factor that depends lower bounds on the additive distortion as well. Our up- on both ε and the maximum degree of the tree, which may per bounds and algorithms, however, produce results with be non-constant. Nonetheless, this suggests a natural ap- additive distortion. proach to embed graph metrics into hyperbolic space: first embed the graph metric into a tree metric, and then embed Remark. Some papers on embeddings allow the input the tree metric into 2 using the result of [16]. Some graph distances to be scaled before constructing the embedding, H metrics (e.g. the n-cycle), however, require Ω(n) distortion which may produce smaller distortion especially for the ad- when embedded into a tree metric, as shown in [15]. On the ditive measure. We also utilize such a scaling by a constant other hand, under a relaxed model where the target metric factor for our upper bounds and algorithms. We do not con- is not a single tree but rather a distribution over several tree sider scaling of distances by a non-constant factor because metrics, one can achieve an expected distortion O(log n), as our intention is to investigate the intrinsic difference between shown by Fakcharoenphol et al.