Algorithms and Hardness for Linear Algebra on Geometric Graphs
2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS)

Josh Alman (Harvard University), Timothy Chu (Carnegie Mellon University), Aaron Schild (University of Washington), Zhao Song (Columbia University, Princeton University, and IAS)

Abstract—For a function K : R^d × R^d → R_{≥0} and a set P = {x_1, ..., x_n} ⊂ R^d of n points, the K-graph G_P is the complete graph on n nodes where the weight between nodes i and j is given by K(x_i, x_j). In this paper, we initiate the study of when efficient spectral graph theory is possible on these graphs. We investigate whether or not it is possible to solve the following problems in n^{1+o(1)} time for a K-graph G_P when d < n^{o(1)}:

• Multiply a given vector by the adjacency matrix or Laplacian matrix of G_P
• Find a spectral sparsifier of G_P
• Solve a Laplacian system in G_P's Laplacian matrix

For each of these problems, we consider all functions of the form K(u, v) = f(||u − v||_2^2) for a function f : R → R. We provide algorithms and comparable hardness results for many such K, including the Gaussian kernel, Neural Tangent Kernels, and more. For example, in dimension d = Ω(log n), we show that there is a parameter associated with the function f for which low parameter values imply n^{1+o(1)}-time algorithms for all three of these problems, and high parameter values imply the nonexistence of subquadratic-time algorithms assuming the Strong Exponential Time Hypothesis (SETH), given natural assumptions on f.

As part of our results, we also show that the exponential dependence on the dimension d in the celebrated fast multipole method of Greengard and Rokhlin cannot be improved, assuming SETH, for a broad class of functions f. To the best of our knowledge, this is the first formal limitation proven about fast multipole methods.

I. INTRODUCTION

Linear algebra has a myriad of applications throughout computer science and physics. Consider the following seemingly unrelated tasks:

1) n-body simulation (one step): Given n bodies X located at points in R^d, compute the gravitational force on each body induced by the other bodies.

2) Spectral clustering: Given n points X in R^d, partition X by building a graph G on the points in X, computing the top k eigenvectors of the Laplacian matrix L_G of G for some k ≥ 1 to embed X into R^k, and running k-means on the resulting points.

3) Semi-supervised learning: Given n points X in R^d and a function g : X → R whose values on some of X are known, extend g to the rest of X.

Each of these tasks has seen much work throughout numerical analysis, theoretical computer science, and machine learning. The first task is a celebrated application of the fast multipole method of Greengard and Rokhlin [GR87], [GR88], [GR89], voted one of the top ten algorithms of the twentieth century by the editors of Computing in Science and Engineering [DS00]. The second task is spectral clustering [NJW02], [LWDH13], a popular algorithm for clustering data. The third task is to label a full set of data given only a small set of partial labels [Zhu05b], [CSZ09], [ZL05], which has seen increasing use in machine learning. One notable method for performing semi-supervised learning is the graph-based Laplacian regularizer method [LSZ+19b], [ZL05], [BNS06], [Zhu05a].
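To make task 2 concrete, here is a minimal dense-graph sketch of the pipeline in Python with NumPy/SciPy. This is our own illustration, not code from the paper; it assumes the common Gaussian weights f(z) = e^{−z} and uses the k eigenvectors at the low end of the Laplacian's spectrum, the standard embedding convention.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.cluster.vq import kmeans2

def spectral_cluster(X, k):
    # Build the dense K-graph with Gaussian weights K(u, v) = exp(-||u - v||_2^2).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists)
    np.fill_diagonal(W, 0.0)
    # Graph Laplacian L_G = D - W.
    L = np.diag(W.sum(axis=1)) - W
    # Embed X into R^k using the k extremal eigenvectors of L_G
    # (those with the smallest eigenvalues).
    _, U = eigh(L, subset_by_index=[0, k - 1])
    # Cluster the embedded points with k-means.
    _, labels = kmeans2(U, k, minit='++')
    return labels
```

Note that already the first step costs Ω(n^2) time and space, since W is a dense n × n matrix; this is exactly the bottleneck motivating the questions studied in this paper.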
Popular techniques for each of these problems benefit from primitives in spectral graph theory on a special class of dense graphs called geometric graphs. For a function K : R^d × R^d → R and a set of points X ⊆ R^d, the K-graph on X is a graph with vertex set X and edges with weight K(u, v) for each pair u, v ∈ X. Adjacency matrix-vector multiplication, spectral sparsification, and Laplacian system solving in geometric graphs are directly relevant to each of the above problems, respectively:

1) n-body simulation (one step): For each i ∈ {1, 2, ..., d}, make a weighted graph G_i on the points in X, in which the weight of the edge between the points u, v ∈ X in G_i is K_i(u, v) := (G_grav · m_u · m_v / ||u − v||_2^2) · ((v_i − u_i) / ||u − v||_2), where G_grav is the gravitational constant and m_x is the mass of the point x ∈ X. Let A_i denote the weighted adjacency matrix of G_i. Then A_i · 1 is the vector of i-th coordinates of the force vectors. In particular, the gravitational force can be computed by doing O(d) adjacency matrix-vector multiplications, where each adjacency matrix is that of the K_i-graph on X for some i (see the sketch after this list).

2) Spectral clustering: Make a K-graph G on X. In applications, K(u, v) = f(||u − v||_2^2), where f is often chosen to be f(z) = e^{−z} [vL07], [NJW02]. Instead of directly running a spectral clustering algorithm on L_G, one popular method is to construct a sparse matrix M approximating L_G and run spectral clustering on M instead [CFH16], [CSB+11], [KMT12]. Standard sparsification methods in the literature are heuristic, and include the widely used Nystrom method, which uniformly samples rows and columns from the original matrix [CJK+13]. If H is a spectral sparsifier of G, it has been suggested that spectral clustering with the top k eigenvectors of L_H performs just as well in practice as spectral clustering with the top k eigenvectors of L_G [CFH16]. One justification is that since H is a spectral sparsifier of G, the eigenvalues of L_H are at most a constant factor larger than those of L_G, so cuts with similar conductance guarantees are produced. Moreover, spectral clustering using sparse matrices like L_H is known to be faster than spectral clustering on dense matrices like L_G [CFH16], [CJK+13], [KMT12].

3) Semi-supervised learning: An important subroutine in semi-supervised learning is completion based on ℓ_2-minimization [Zhu05a], [Zhu05b], [LSZ+19b]. Specifically, given values g_v for v ∈ Y, where Y is a subset of X, find the vector g ∈ R^n (variable over X \ Y) that minimizes ∑_{u,v ∈ X, u ≠ v} K(u, v)(g_u − g_v)^2. The vector g can be found by solving a Laplacian system on the K-graph for X.
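As an illustration of item 1, the following brute-force Python/NumPy sketch forms each adjacency matrix A_i explicitly and performs the O(d) matrix-vector products directly. It is our own illustration under the formula above, not the paper's code, and it takes Θ(n^2 d) time; fast multipole methods exist precisely to avoid this cost.

```python
import numpy as np

def gravitational_forces(X, m, G_grav=6.674e-11):
    """Force on each of n bodies, via d adjacency matrix-vector products.

    For coordinate i, the K_i-graph has edge weights
    K_i(u, v) = (G_grav * m_u * m_v / ||u - v||_2^2) * (v_i - u_i) / ||u - v||_2,
    and A_i @ 1 is the vector of i-th force coordinates.
    """
    n, d = X.shape
    diff = X[None, :, :] - X[:, None, :]       # diff[u, v] = v - u
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)             # no self-interaction (weight -> 0)
    scale = G_grav * np.outer(m, m) / dist**3  # m_u * m_v / ||u - v||^3
    F = np.empty((n, d))
    for i in range(d):
        A_i = scale * diff[:, :, i]            # adjacency matrix of the K_i-graph
        F[:, i] = A_i @ np.ones(n)             # one matrix-vector multiplication
    return F
```

Each loop iteration is exactly one adjacency matrix-vector multiplication with a K_i-graph, so an n^{1+o(1)}-time multiplication primitive would immediately speed up the whole simulation step.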
In the first, second, and third tasks above, a small number of calls to matrix-vector multiplication, spectral sparsification, and Laplacian system solving, respectively, were made on geometric graphs. One could solve these problems by first explicitly writing down the graph G and then using near-linear time algorithms [SS11], [CKM+14] to multiply, sparsify, and solve systems. However, this requires a minimum of Ω(n^2) time, as G is a dense graph.

In this paper, we initiate a theoretical study of the geometric graphs for which efficient spectral graph theory is possible. In particular, we attempt to determine for which (a) functions K and (b) dimensions d there is a much faster, n^{1+o(1)}-time algorithm for each of (c) multiplication, sparsification, and Laplacian solving. Before describing our results, we elaborate on the choices of (a), (b), and (c) that we consider in this work.

We start by discussing the functions K that we consider (part (a)). Our results primarily focus on the class of functions of the form K(u, v) = f(||u − v||_2^2) for a function f : R_{≥0} → R and points u, v ∈ R^d. This class includes, for example, the kernels K(u, v) = ||u − v||_2^q [Llo82], [AS14], [ACX19], [CMS20]. We would also like to emphasize that many kernel functions which do not at first appear to be of the form f(||u − v||_2^2) can be rearranged appropriately to be of this form. For instance, in the full version we show that the recently popular Neural Tangent Kernel is of this form, so our results apply to it as well. That said, to emphasize that our results are very general, we will mention later how they also apply to some functions of the form K(u, v) = f(⟨u, v⟩), including K(u, v) = |⟨u, v⟩|.

Next, we briefly elaborate on the problems that we consider (part (c)). For more details, see the full version. The points in X are assumed to be real numbers stated with polylog(n) bits of precision. Our algorithms and hardness results pertain to algorithms that are allowed some degree of approximation. For an error parameter ε > 0, our multiplication and Laplacian system solving algorithms produce solutions with ε-additive error, and our sparsification algorithms produce a graph H whose Laplacian quadratic form (1 ± ε)-approximates that of G.

Matrix-vector multiplication, spectral sparsification, and Laplacian system solving are very natural linear algebraic problems in this setting, and have many applications beyond the three we have focused on (n-body simulation, spectral clustering, and semi-supervised learning). See Section I-E below, where we expand on more applications.

Finally, we discuss the dependencies on the dimension d and the accuracy ε for which n^{1+o(1)}-time algorithms are possible (part (b)). Define α, a measure of the "diameter" of the point set and f, as

    α := (max_{u,v ∈ X} f(||u − v||_2^2)) / (min_{u,v ∈ X} f(||u − v||_2^2)) + (max_{u,v ∈ X} ||u − v||_2^2) / (min_{u,v ∈ X} ||u − v||_2^2).

It is helpful to have the following two questions in mind when reading our results:

• (High-dimensional algorithms, e.g. d = Θ(log n)) Is there an algorithm which runs in time poly(d, log(nα/ε)) · n^{1+o(1)} for multiplication and Laplacian solving? Is there an algorithm with a comparable running time for sparsification?
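To build intuition for the parameter α appearing in this question, the following hypothetical Python/NumPy helper computes it directly from a point set and f by brute force; it is our own illustration (O(n^2 d) time), and it assumes f is NumPy-vectorized.

```python
import numpy as np

def alpha(X, f):
    # All pairwise squared Euclidean distances over distinct pairs.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    sq = sq[~np.eye(X.shape[0], dtype=bool)]  # drop the zero diagonal
    fv = f(sq)
    # Ratio of extreme f-values plus ratio of extreme squared distances.
    return fv.max() / fv.min() + sq.max() / sq.min()
```

For the Gaussian kernel f(z) = e^{−z}, for instance, the first term equals e^{max ||u−v||_2^2 − min ||u−v||_2^2}, so log α scales with the squared diameter of the point set; this is why a poly(log α) dependence, as in the running time above, is the natural target.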