Dimensionality Reduction in Euclidean Space
Jelani Nelson

I begin with a description of what this article is not about. It is not about Principal Component Analysis (PCA), Kernel PCA, Multidimensional Scaling, ISOMAP, Hessian Eigenmaps, or other methods of dimensionality reduction created primarily to help understand high-dimensional datasets. Rather, this article focuses on high dimensionality as a barrier to algorithmic efficiency (i.e., low running time and/or memory consumption), and explores how dimension reduction can be used as an algorithmic tool to overcome this barrier. In fact, as we discuss at more length in Section 5.2, this view is not only different from but complementary to the above-mentioned approaches, as the form of dimension reduction we focus on here can, for example, be used to obtain faster algorithms for approximate PCA.

Moving back a few steps from dimension reduction, an effective technique in the design of algorithms processing geometric data is, more generally, to employ a metric embedding to transform the input from one given metric space into another that is computationally friendlier, and then to work over the latter space (see the survey [Ind01]). To measure the quality of such an embedding, we use the following terminology: given a host metric space $\mathcal{X} = (X, d_X)$ and a target space $\mathcal{Y} = (Y, d_Y)$, a map $f \colon X \to Y$ is said to be a bi-Lipschitz embedding with distortion $D$ if there exists a (scaling) constant $c$ such that for all $x, y \in X$,

$$c \cdot d_X(x, y) \le d_Y(f(x), f(y)) \le cD \cdot d_X(x, y). \qquad (1)$$

To illustrate the embedding paradigm in action, consider the $k$-median problem. The input is a finite metric space $\mathcal{X} = (X, d_X)$, $|X| = n$, together with an integer $1 \le k \le n$. The goal is to compute

$$S^* = \operatorname*{argmin}_{S \subset X,\ |S| = k} \ \sum_{x \in X} \min_{c \in S} d_X(x, c). \qquad (2)$$

That is, we would like to partition $X$ into $k$ clusters, together with identifying a cluster center $c$ in each cluster, so as to minimize the sum of distances from every $x \in X$ to its closest cluster center. If $\mathcal{X}$ can be an arbitrary $n$-point metric space, then this problem is known to be NP-hard. Meanwhile, when $\mathcal{X}$ is the shortest path metric on a tree, the problem can be solved exactly in time $O(kn^2)$ via the Kariv–Hakimi dynamic programming algorithm.¹ Tree shortest path metrics are thus an example of what we would call a computationally friendly metric space for the $k$-median problem. Thus if $\mathcal{X}$ admits an algorithmically efficient embedding into a tree metric with some small distortion $D$, we can obtain a fast $D$-approximation algorithm for $k$-median on $\mathcal{X}$ (i.e., one achieving a clustering cost at most a factor $D$ larger than optimal) by first embedding our original metric into some tree $T$ and then solving $k$-median exactly in $T$. In fact, it has been shown by Fakcharoenphol et al., following previous work of Bartal, that any $n$-point metric space embeds into a distribution over tree metrics with distortion $O(\log n)$. We will not discuss here what distortion means for probabilistic embeddings into a distribution over target spaces, but to make our case for the embedding paradigm it suffices to point out that these results implied the first ever polynomial-time algorithms for $k$-median in arbitrary metric spaces with approximation factor at most polylogarithmic in $n$.

In this article we focus on embeddings in which both the host and target spaces are normed spaces, in which case we can drop the scaling factor $c$ in equation (1). We even more specifically focus on the case when $\mathcal{X}, \mathcal{Y}$ are finite-dimensional subspaces of the same normed space $\mathcal{Z}$, with $\dim(\mathcal{Y}) \le \dim(\mathcal{X})$, so that $f$ provides us with the algorithmic advantage of dimension reduction. As one might imagine, several algorithms for high-dimensional computational geometry problems have running times or memory requirements which grow (sometimes poorly) with the dimension of the input. An example is the nearest neighbor search data structural problem, in which one wants to preprocess a set of input points $x_1, \ldots, x_n \in \mathbb{R}^d$ into a low-memory data structure $\mathcal{D}$ such that later one can quickly identify the closest $x_i$ to a query point $q \in \mathbb{R}^d$ by querying $\mathcal{D}$.² The best known algorithms for this problem with fast query time (in terms of $n$) have either running time or memory usage exponential in $d$ (see the discussion in [HPIM12]).

Jelani Nelson is a professor of electrical engineering and computer science at the University of California, Berkeley. His email address is minilek@berkeley.edu. The author's research was supported by NSF award CCF-1951384, ONR grant N00014-18-1-2562, ONR DORECG award N00014-17-1-2127, an Alfred P. Sloan Research Fellowship, and a Google Faculty Research Award. Due to publisher constraints, only a limited number of references could be included; for a version of this article with a full list of references, please see the full version on the arXiv. Communicated by Notices Associate Editor Reza Malek-Madani. DOI: https://doi.org/10.1090/noti2166

¹We use standard asymptotic notation. For functions $f, g$: $f = O(g)$ if $\limsup_{x \to \infty} |f(x)/g(x)| < \infty$; $f = \Omega(g)$ if $g = O(f)$; $f = \Theta(g)$ if both $f = O(g)$ and $f = \Omega(g)$; $f = o(g)$ if $\lim_{x \to \infty} f(x)/g(x) = 0$; and $f = \omega(g)$ if $g = o(f)$.

²Though specifically for the nearest neighbor problem, an embedding satisfying a weaker guarantee suffices for applications.
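To make the definition in equation (1) concrete, here is a minimal sketch (our illustration, not part of the original article) that computes the smallest distortion $D$ achieved by a given map on a finite point set, with both host and target metrics taken to be Euclidean. Taking the scaling constant $c$ to be the smallest ratio $d_Y(f(x), f(y))/d_X(x, y)$ makes the left inequality in (1) tight, and $D$ is then the spread between the largest and smallest ratios.

```python
import numpy as np
from itertools import combinations

def distortion(X, fX):
    """Smallest D for which equation (1) holds, with Euclidean host and target.

    X  : (n, d) array, one point of the host space per row.
    fX : (n, m) array, the image f(x) of each point under the embedding.
    """
    ratios = [np.linalg.norm(fX[i] - fX[j]) / np.linalg.norm(X[i] - X[j])
              for i, j in combinations(range(len(X)), 2)]
    # With c = min(ratios), the lower bound in (1) is tight; the distortion
    # is then the ratio between the extreme stretch factors.
    return max(ratios) / min(ratios)
```

For instance, `distortion(X, X @ A.T)` measures how much a (hypothetical) linear map $A \in \mathbb{R}^{m \times d}$ distorts the point set stored in the rows of `X`.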
A natural question is then: for which normed spaces do there exist such dimensionality-reducing maps with low distortion? An early and seminal result in this direction was given by Johnson and Lindenstrauss [JL84], who showed that near-isometric embeddings exist when $\mathcal{X}, \mathcal{Y}$ are Euclidean.

Lemma 1 (JL lemma [JL84]). Let $\varepsilon \in (0, 1)$ and let $X \subset \mathbb{R}^d$ be arbitrary with $|X| = n > 1$. Then there exists $f \colon X \to \mathbb{R}^m$ with $m = O(\varepsilon^{-2} \log n)$ such that for all $x, y \in X$,

$$\|x - y\|_2 \le \|f(x) - f(y)\|_2 \le (1 + \varepsilon)\|x - y\|_2. \qquad (3)$$

In fact, all known proofs of the JL lemma show that $f$ can be taken to be a linear map: each identifies a distribution $\Gamma$ over $\mathbb{R}^{m \times d}$ such that if one draws a random $\Pi \sim \Gamma$, then $f(x) = \Pi x$ satisfies equation (3) with high probability. In the original proof [JL84], $\Gamma$ was taken to be a scaled orthogonal projection onto a random $m$-dimensional subspace of $\mathbb{R}^d$ (hence the technique is often called the random projection method), though since then several other distributions have been shown to provide a similar guarantee.
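As a concrete illustration (this sketch is ours, not from [JL84]), one distribution known to satisfy the lemma is a matrix of i.i.d. Gaussian entries scaled by $1/\sqrt{m}$. The snippet below draws such a $\Pi$ and empirically checks the pairwise distance ratios. Note that this construction concentrates ratios in $[1 - \varepsilon, 1 + \varepsilon]$; rescaling by $1/(1 - \varepsilon)$ recovers the one-sided normalization of equation (3). The constant 8 in the choice of $m$ is a common rule of thumb, not an optimized constant.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, d, eps = 500, 2000, 0.25
m = int(8 * np.log(n) / eps**2)   # m = O(eps^-2 log n)

X = rng.standard_normal((n, d))                  # an arbitrary n-point set in R^d
Pi = rng.standard_normal((m, d)) / np.sqrt(m)    # scaled i.i.d. Gaussian matrix
Y = X @ Pi.T                                     # row i holds f(x_i) = Pi @ x_i

# Empirically check all pairwise distance ratios.
ratios = [np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
          for i, j in combinations(range(n), 2)]
print(min(ratios), max(ratios))   # with high probability both lie in [1 - eps, 1 + eps]
```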
Hearing of such a result naturally inspires certain follow-up questions. Is low-distortion dimension reduction possible in other normed spaces, e.g., $\ell_p$ for $p \neq 2$? Is the $m = O(\varepsilon^{-2} \log n)$ bound in the JL lemma the best possible? Is it possible to obtain a distribution $\Gamma$ providing the JL lemma as mentioned above such that $\Pi \sim \Gamma$ can be sampled using few random bits? Given that the stated primary motivation of dimension reduction is algorithmic efficiency, just how fast can the mapping $x \mapsto \Pi x$ be performed?

1. Dimension Reduction in Other Spaces

Given the dimension reduction possible in Euclidean space, one might wonder in which other spaces such a result is possible. A negative result was proven by Johnson and Naor, who showed that, at least for linear embeddings, spaces enjoying dimension reduction as good as in the Euclidean case must themselves be nearly Euclidean.

Theorem 1 ([JN10]). Suppose $Z$ is a normed space satisfying the following property: for every $X \subset Z$ with $|X| = n$, there exists a linear mapping $f \colon Z \to E$ for some $O(\log n)$-dimensional subspace $E \subset Z$ such that $f$ has $O(1)$ distortion when restricted to $X$. Then every $k$-dimensional linear subspace of $Z$ embeds into Euclidean space with distortion $2^{2^{O(\log^* k)}}$.

In the above, $\log^* m$ is the number of times one must iteratively take the logarithm of $m$, base two, to obtain a number which is at most 1. For example, $\log^*(2^{2^{2^2}}) = 4$. The key takeaway is that $\log^* m$ is a very slowly growing function, so that the distance to being Euclidean is small.

The theorem, though, does not preclude the existence of some form of dimension reduction in spaces that are not nearly Euclidean. In particular, one can still shoot for dimension reduction bounds that are $\omega(\log n)$, or potentially achieve $O(\log n)$ target dimension with $O(1)$ distortion via nonlinear embeddings. Several results exist showing that some nontrivial dimension reduction in $\ell_p$-spaces, for example, is possible. On the negative side, Brinkman and Charikar have shown that for an $n$-point set endowed with the $\ell_1$ metric, any embedding into $\ell_1^m$ with distortion $D$ can require $m = n^{\Omega(1/D^2)}$.

Unfortunately, the above approach does not extend to show that $m$ must grow by more than a constant factor beyond $\log_8 n$ as $\varepsilon \to 0$. Subsequently, Alon showed the lower bound $\Omega(\varepsilon^{-2} \log n / \log(1/\varepsilon))$ for $\varepsilon > 1/\sqrt{n}$. Roughly, the approach was to let $X$ be as above ($0$ together with the simplex), and to again let $f$ be a low-distortion embedding as above with $f(0) = 0$.
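To get a rough quantitative sense of the gap between the JL lemma's $m = O(\varepsilon^{-2} \log n)$ upper bound and Alon's $\Omega(\varepsilon^{-2} \log n / \log(1/\varepsilon))$ lower bound, the short sketch below (our illustration; the leading constants are placeholders, since both bounds are only asymptotic) evaluates the two expressions for concrete $n$ and $\varepsilon$.

```python
import math

def jl_upper(n, eps, C=8.0):
    # Target dimension from the JL lemma, m = C * eps^-2 * log n (C is a placeholder constant).
    return C * math.log(n) / eps**2

def alon_lower(n, eps, c=1.0):
    # Alon's lower bound, c * eps^-2 * log n / log(1/eps), valid when eps > 1/sqrt(n).
    assert eps > 1 / math.sqrt(n), "bound only applies for eps > 1/sqrt(n)"
    return c * math.log(n) / (eps**2 * math.log(1 / eps))

n = 10**6
for eps in (0.25, 0.1, 0.01):
    print(f"eps={eps}: upper ~ {jl_upper(n, eps):.0f}, lower ~ {alon_lower(n, eps):.0f}")
# The two expressions differ by a Theta(log(1/eps)) factor.
```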