Algorithms and Hardness for Linear Algebra on Geometric Graphs∗

Josh Alman†   Timothy Chu‡   Aaron Schild§   Zhao Song¶

Abstract

For a function K : R^d × R^d → R_{≥0} and a set P = {x_1, ..., x_n} ⊂ R^d of n points, the K-graph G_P of P is the complete graph on n nodes where the weight between nodes i and j is given by K(x_i, x_j). In this paper, we initiate the study of when efficient spectral graph theory is possible on these graphs. We investigate whether or not it is possible to solve the following problems in n^{1+o(1)} time for a K-graph G_P when d < n^{o(1)}:

• Multiply a given vector by the adjacency matrix or Laplacian matrix of G_P
• Find a spectral sparsifier of G_P
• Solve a Laplacian system in G_P's Laplacian matrix

For each of these problems, we consider all functions of the form K(u, v) = f(‖u − v‖_2^2) for a function f : R → R. We provide algorithms and comparable hardness results for many such K, including the Gaussian kernel, Neural tangent kernels, and more. For example, in dimension d = Ω(log n), we show that there is a parameter associated with the function f for which low parameter values imply n^{1+o(1)}-time algorithms for all three of these problems and high parameter values imply the nonexistence of subquadratic-time algorithms assuming the Strong Exponential Time Hypothesis (SETH), given natural assumptions on f.

As part of our results, we also show that the exponential dependence on the dimension d in the celebrated fast multipole method of Greengard and Rokhlin cannot be improved, assuming SETH, for a broad class of functions f. To the best of our knowledge, this is the first formal limitation proven about fast multipole methods.

arXiv:2011.02466v1 [cs.DS] 4 Nov 2020

∗A preliminary version of this paper appeared in the Proceedings of the 61st Annual IEEE Symposium on Foundations of Computer Science (FOCS 2020).
†[email protected], Harvard University. Part of the work was done while the author was a graduate student at MIT. Supported in part by a Michael O. Rabin Postdoctoral Fellowship.
‡[email protected], Carnegie Mellon University.
§[email protected], University of Washington. Part of the work was done while the author was a graduate student at UC Berkeley. Supported in part by ONR #N00014-17-1-2429, a Packard fellowship, and the University of Washington.
¶[email protected], Columbia University, Princeton University and Institute for Advanced Study. Part of the work was done while the author was a graduate student at UT-Austin. Supported in part by Ma Huateng Foundation, Schmidt Foundation, Simons Foundation, NSF, DARPA/SRC, Google and Amazon.

Contents

1 Introduction
  1.1 High-dimensional results
    1.1.1 Multiplication
    1.1.2 Sparsification
    1.1.3 Laplacian solving
  1.2 Our Techniques
    1.2.1 Multiplication
    1.2.2 Sparsification
  1.3 Brief summary of our results in terms of p_f
  1.4 Summary of our Results on Examples
  1.5 Other Related Work
2 Summary of Low Dimensional Results
  2.1 Multiplication
  2.2 Sparsification
  2.3 Laplacian solving
3 Preliminaries
  3.1 Notation
  3.2 Graph and Laplacian Notation
  3.3 Spectral Sparsification via Random Sampling
  3.4 Woodbury Identity
  3.5 Tail Bounds
  3.6 Fine-Grained Hypotheses
  3.7 Dimensionality Reduction
  3.8 Nearest Neighbor Search
  3.9 Geometric Laplacian System
4 Equivalence of Matrix-Vector Multiplication and Solving Linear Systems
  4.1 Solving Linear Systems Implies Matrix-Vector Multiplication
  4.2 Matrix-Vector Multiplication Implies Solving Linear Systems
  4.3 Lower bound for high-dimensional linear system solving
5 Matrix-Vector Multiplication
  5.1 Equivalence between Adjacency and Laplacian Evaluation
  5.2 Approximate Degree
  5.3 'Kernel Method' Algorithms
  5.4 Lower Bound in High Dimensions
  5.5 Lower Bounds in Low Dimensions
  5.6 Hardness of the n-Body Problem
  5.7 Hardness of Kernel PCA
6 Sparsifying Multiplicatively Lipschitz Functions in Almost Linear Time
  6.1 High Dimensional Sparsification
  6.2 Low Dimensional Sparsification
7 Sparsifiers for |⟨x, y⟩|
  7.1 Existence of large expanders in inner product graphs
  7.2 Efficient algorithm for finding sets with low effective resistance diameter
  7.3 Using low-effective-resistance clusters to sparsify the unweighted IP graph
  7.4 Sampling data structure
  7.5 Weighted IP graph sparsification
    7.5.1 (ζ, κ)-cover for unweighted IP graphs
    7.5.2 (ζ, κ)-cover for weighted IP graphs on bounded-norm vectors
    7.5.3 (ζ, κ)-cover for weighted IP graphs on vectors with norms in the set [1, 2] ∪ [z, 2z] for any z > 1
    7.5.4 (ζ, κ)-cover for weighted IP graphs with polylogarithmic dependence on norm
    7.5.5 Desired (ζ, κ, δ)-cover
    7.5.6 Proof of Lemma 7.1
8 Hardness of Sparsifying and Solving Non-Multiplicatively-Lipschitz Laplacians
9 Fast Multipole Method
  9.1 Overview
  9.2 K(x, y) = exp(−‖x − y‖_2^2), Fast Gaussian transform
    9.2.1 Estimation
    9.2.2 Algorithm
    9.2.3 Result
  9.3 Generalization
  9.4 K(x, y) = 1/‖x − y‖_2^2
10 Neural Tangent Kernel
References

1 Introduction

Linear algebra has a myriad of applications throughout computer science and physics. Consider the following seemingly unrelated tasks:

1. n-body simulation (one step): Given n bodies X located at points in R^d, compute the gravitational force on each body induced by the other bodies.

2. Spectral clustering: Given n points X in R^d, partition X by building a graph G on the points in X, computing the top k eigenvectors of the Laplacian matrix L_G of G for some k ≥ 1 to embed X into R^k, and running k-means on the resulting points.

3. Semi-supervised learning: Given n points X in R^d and a function g : X → R whose values on some of X are known, extend g to the rest of X.

Each of these tasks has seen much work throughout numerical analysis, theoretical computer science, and machine learning.
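To make the underlying graph primitives concrete, the following sketch (an illustration added here, not code from the paper) builds the dense adjacency and Laplacian matrices of the K-graph from the abstract for the Gaussian kernel, K(u, v) = f(‖u − v‖_2^2) with f(z) = e^{−z}, and multiplies each by a vector. Forming the matrix takes Θ(n^2 d) time and each product takes Θ(n^2) time; the question studied in this paper is when these products (and sparsification and Laplacian system solving) can instead be carried out in n^{1+o(1)} time without ever writing the matrix down. The function and variable names below are ours.

```python
import numpy as np

def k_graph_adjacency(X, f=lambda z: np.exp(-z)):
    """Dense adjacency matrix of the K-graph on the rows of X, where
    K(u, v) = f(||u - v||_2^2); f defaults to the Gaussian kernel f(z) = e^{-z}."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # n x n squared distances
    A = f(sq_dists)
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A

def graph_laplacian(A):
    """Graph Laplacian L = D - A, where D is the diagonal degree matrix."""
    return np.diag(A.sum(axis=1)) - A

# Naive dense baseline: Theta(n^2 d) to form A, Theta(n^2) per matrix-vector product.
rng = np.random.default_rng(0)
n, d = 300, 10
X = rng.standard_normal((n, d))
A = k_graph_adjacency(X)
L = graph_laplacian(A)
y = rng.standard_normal(n)
adjacency_product = A @ y   # adjacency matrix-vector multiplication
laplacian_product = L @ y   # Laplacian matrix-vector multiplication
```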
The first task is a celebrated application of the fast multipole method of Greengard and Rokhlin [GR87, GR88, GR89], voted one of the top ten algorithms of the twentieth century by the editors of Computing in Science and Engineering [DS00]. The second task is spectral clustering [NJW02, LWDH13], a popular algorithm for clustering data. The third task is to label a full set of data given only a small set of partial labels [Zhu05b, CSZ09, ZL05], which has seen increasing use in machine learning. One notable method for performing semi-supervised learning is the graph-based Laplacian regularizer method [LSZ+19b, ZL05, BNS06, Zhu05a].

Popular techniques for each of these problems benefit from primitives in spectral graph theory on a special class of dense graphs called geometric graphs. For a function K : R^d × R^d → R and a set of points X ⊆ R^d, the K-graph on X is a graph with vertex set X and edges with weight K(u, v) for each pair u, v ∈ X. Adjacency matrix-vector multiplication, spectral sparsification, and Laplacian system solving in geometric graphs are directly relevant to each of the above problems, respectively:

1. n-body simulation (one step): For each i ∈ {1, 2, ..., d}, make a weighted graph G_i on the points in X, in which the weight of the edge between the points u, v ∈ X in G_i is K_i(u, v) := (G_grav · m_u · m_v / ‖u − v‖_2^2) · ((v_i − u_i) / ‖u − v‖_2), where G_grav is the gravitational constant and m_x is the mass of the point x ∈ X. Let A_i denote the weighted adjacency matrix of G_i. Then A_i 1, where 1 is the all-ones vector, is the vector of i-th coordinates of the force vectors. In particular, the gravitational force can be computed by doing O(d) adjacency matrix-vector multiplications, where each adjacency matrix is that of the K_i-graph on X for some i.

2. Spectral clustering: Make a K-graph G on X. In applications, K(u, v) = f(‖u − v‖_2^2), where f is often chosen to be f(z) = e^{−z} [vL07, NJW02]. Instead of directly running a spectral clustering algorithm on L_G, one popular method is to construct a sparse matrix M approximating L_G and run spectral clustering on M instead [CFH16, CSB+11, KMT12]. Standard sparsification methods in the literature are heuristic, and include the widely used Nyström method, which uniformly samples rows and columns from the original matrix [CJK+13]. If H is a spectral sparsifier of G, it has been suggested that spectral clustering with the top k eigenvectors of L_H performs just as well in practice as spectral clustering with the top k eigenvectors of L_G [CFH16]. One justification is that since H is a spectral sparsifier of G, the eigenvalues of L_H are at most a constant factor larger than those of L_G, so cuts with similar conductance guarantees are produced. Moreover, spectral clustering using sparse matrices like L_H is known to be faster than spectral clustering on dense matrices like L_G [CFH16, CJK+13, KMT12].

3. Semi-supervised learning: An important subroutine in semi-supervised learning is completion based on ℓ2-minimization [Zhu05a, Zhu05b, LSZ+19b]. Specifically, given values g_v for v ∈ Y, where Y is a subset of X, find the vector g ∈ R^n (variable over X \ Y) that minimizes Σ_{u,v ∈ X} K(u, v)(g_u − g_v)^2 (a minimal dense-solve sketch of this completion step is given below).
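As an illustration of the completion step in item 3 (our sketch, not the paper's algorithm), the minimizer of Σ_{u,v} K(u, v)(g_u − g_v)^2 with g fixed on the labeled set Y is the standard harmonic solution: writing L for the Laplacian of the K-graph on X and splitting its rows and columns into unlabeled (U) and labeled (Y) blocks, the unlabeled values satisfy the Laplacian system L_UU g_U = −L_UY g_Y. The sketch below solves this system with a dense direct solve for the Gaussian kernel; the point of the paper is to understand when such Laplacian systems in K-graphs can be solved in almost-linear time instead. All identifiers are illustrative.

```python
import numpy as np

def harmonic_completion(X, labeled_idx, g_labeled, f=lambda z: np.exp(-z)):
    """Extend labels to all points by minimizing sum_{u,v} K(u,v) (g_u - g_v)^2
    over the unlabeled entries of g, where K(u, v) = f(||u - v||_2^2).
    The minimizer on the unlabeled block solves L_UU g_U = -L_UY g_Y."""
    n = X.shape[0]
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = f(sq_dists)
    np.fill_diagonal(A, 0.0)
    L = np.diag(A.sum(axis=1)) - A          # Laplacian of the K-graph on X

    labeled = np.zeros(n, dtype=bool)
    labeled[labeled_idx] = True
    unlabeled = ~labeled

    L_UU = L[np.ix_(unlabeled, unlabeled)]  # unlabeled-unlabeled block
    L_UY = L[np.ix_(unlabeled, labeled)]    # unlabeled-labeled block

    g = np.empty(n)
    g[labeled_idx] = g_labeled
    g[unlabeled] = np.linalg.solve(L_UU, -L_UY @ g[labeled])  # dense O(n^3) solve
    return g

# Toy usage: two well-separated clusters, one labeled point in each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(3.0, 0.3, (50, 2))])
g = harmonic_completion(X, labeled_idx=[0, 50], g_labeled=np.array([0.0, 1.0]))
# Unlabeled points in the first cluster get values near 0, in the second near 1.
```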