
Dimensionality Reduction for k-Means and Low Rank Approximation
Michael Cohen, Sam Elder, Cameron Musco, Christopher Musco, Mădălina Persu
Massachusetts Institute of Technology

overview

Simple techniques to accelerate algorithms for:
∙ k-means clustering
∙ principal component analysis (PCA)
∙ constrained low rank approximation

dimensionality reduction

Replace a large, high dimensional dataset with a low dimensional sketch: the n x d matrix A (n data points, d features) is replaced by an n x d' matrix Ã with d' << d.
∙ Simultaneously improves runtime, memory requirements, communication cost, etc.
∙ The solution computed on the sketch Ã should approximate the solution on the original data A.

k-means clustering

$$\min_{C} \; \mathrm{Cost}(A, C) = \sum_{i=1}^{n} \lVert a_i - \mu(C[a_i]) \rVert_2^2$$

∙ Extremely common objective function for clustering.
∙ Choose k clusters to minimize total intra-cluster variance: each point a_i pays its squared distance to the centroid µ(C[a_i]) of the cluster it is assigned to.
∙ NP-hard, even for fixed dimension d or fixed k.
∙ Several (1 + ε) and constant factor approximation algorithms exist.
∙ In practice: Lloyd's heuristic (i.e. "the k-means algorithm") with k-means++ initialization is used. It guarantees an O(log k) approximation and typically performs much better.
∙ Dimensionality reduction can speed up any of these algorithms; a sketch of this pipeline follows below.
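To make the pipeline concrete, here is a minimal sketch (not the specific constructions analyzed in the talk) that compresses the feature dimension with a Gaussian random projection and then runs Lloyd's algorithm with k-means++ initialization on both the original data and the sketch. The synthetic data, the sketch dimension d_prime, and the use of scikit-learn's KMeans are all illustrative assumptions, not part of the original slides.

```python
# A minimal, illustrative pipeline (not the talk's specific constructions):
# compress the feature dimension with a Gaussian random projection, then run
# Lloyd's algorithm with k-means++ seeding on the original data and the sketch.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n, d, k = 1000, 500, 10               # placeholder sizes: n points, d features, k clusters
d_prime = 50                          # sketch dimension d' << d (illustrative choice)

A = rng.standard_normal((n, d))       # stand-in dataset; rows are data points

# Sketch: A_tilde = A @ Pi for a random d x d' Gaussian matrix Pi.
Pi = rng.standard_normal((d, d_prime)) / np.sqrt(d_prime)
A_tilde = A @ Pi

km_full = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit(A)
km_sketch = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit(A_tilde)

def kmeans_cost(data, labels, k):
    """Total intra-cluster variance: squared distance of each row to its centroid."""
    return sum(((data[labels == j] - data[labels == j].mean(axis=0)) ** 2).sum()
               for j in range(k) if np.any(labels == j))

# Evaluate both clusterings on the ORIGINAL data to see how much is lost.
print("cost of clustering found on A:      ", kmeans_cost(A, km_full.labels_, k))
print("cost of clustering found on A_tilde:", kmeans_cost(A, km_sketch.labels_, k))
```

The theory in the following slides explains when the clustering recovered from the sketch is provably almost as good as the one computed on the full data.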
key observation

k-means clustering == low rank approximation:

$$\min_{C} \sum_{i=1}^{n} \lVert a_i - \mu(C[a_i]) \rVert_2^2 = \lVert A - C(A) \rVert_F^2,$$

where C(A) is the n x d matrix whose i-th row is the centroid of the cluster containing a_i.

C(A) is actually a projection of A's columns onto a rank k subspace [Boutsidis, Drineas, Mahoney, Zouzias '11]: C(A) = X_C X_C^T A, where X_C is the n x k normalized cluster indicator matrix with X_C(i, j) = 1/\sqrt{|C_j|} if a_i belongs to cluster C_j and 0 otherwise. The columns of X_C are orthonormal, so X_C X_C^T is a rank k projection (a numerical check of this identity appears below). Therefore

$$\min_{C} \sum_{i=1}^{n} \lVert a_i - \mu(C[a_i]) \rVert_2^2 \;=\; \min_{\mathrm{rank}(X) = k,\; X \in S} \lVert A - X X^\top A \rVert_F^2,$$

where X is a rank k orthonormal matrix and, for k-means, S is the set of all cluster indicator matrices.
∙ Setting S = {all rank k orthonormal matrices} gives principal component analysis / singular value decomposition.
∙ This is the general form for constrained low rank approximation.
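The identity between the k-means cost and the Frobenius-norm projection cost is easy to verify numerically. The following sketch (the sizes and the random clustering are illustrative placeholders) builds the normalized cluster indicator matrix X_C described above and checks that the two costs agree.

```python
# Numerical check of the identity on the slide (illustrative sizes):
#   sum_i ||a_i - mu(C[a_i])||_2^2  ==  ||A - X_C X_C^T A||_F^2
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 200, 30, 5

A = rng.standard_normal((n, d))                 # rows are data points
labels = rng.permutation(np.arange(n) % k)      # an arbitrary clustering with no empty cluster

# Normalized cluster indicator matrix: X_C[i, j] = 1/sqrt(|C_j|) if a_i is in C_j.
X_C = np.zeros((n, k))
for j in range(k):
    members = np.flatnonzero(labels == j)
    X_C[members, j] = 1.0 / np.sqrt(len(members))

# X_C has orthonormal columns, so X_C X_C^T is a rank-k projection;
# applying it to A replaces every row by its cluster centroid.
C_A = X_C @ (X_C.T @ A)

cost_centroids = sum(((A[labels == j] - A[labels == j].mean(axis=0)) ** 2).sum()
                     for j in range(k))
cost_projection = np.linalg.norm(A - C_A, "fro") ** 2

print(cost_centroids, cost_projection)          # agree up to floating point rounding
```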
so what?

Any problem of the form $\min_{\mathrm{rank}(X)=k,\; X \in S} \lVert A - X X^\top A \rVert_F^2$ can be solved on the sketch instead of the original data if

$$\lVert \tilde{A} - X X^\top \tilde{A} \rVert_F^2 \approx \lVert A - X X^\top A \rVert_F^2 \quad \text{for all rank } k \text{ matrices } X.$$

A sketch Ã with only O(k) columns (instead of d) satisfying this guarantee is called a projection-cost preserving sketch.

In particular, if we find an X̃ with

$$\lVert \tilde{A} - \tilde{X} \tilde{X}^\top \tilde{A} \rVert_F^2 \le \gamma \cdot \min_{X \in S} \lVert \tilde{A} - X X^\top \tilde{A} \rVert_F^2,$$

then

$$\lVert A - \tilde{X} \tilde{X}^\top A \rVert_F^2 \le \gamma (1 + \epsilon) \cdot \min_{X \in S} \lVert A - X X^\top A \rVert_F^2.$$

That is, any γ-approximate solution found on the sketch is a γ(1 + ε)-approximate solution on the original data. See also the coresets of Feldman, Schmidt, Sohler '13. A small numerical illustration of a projection-cost preserving sketch follows below.
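One concrete way to get a feel for the projection-cost preserving property is an SVD-based construction: project A onto its top m right singular vectors for some m somewhat larger than k, so that the cost of every rank k projection is preserved up to the fixed amount of energy the sketch discards. The parameter m and all sizes below are illustrative assumptions, not the talk's exact settings; this is only a numerical sanity check, not the paper's analysis.

```python
# Illustrative numerical check of projection-cost preservation (assumed
# construction: project A onto its top m right singular vectors; m and all
# sizes are placeholders, not the talk's exact parameter settings).
import numpy as np

rng = np.random.default_rng(2)
n, d, k, m = 300, 400, 5, 50          # m plays the role of the O(k) sketch size

A = rng.standard_normal((n, d))       # stand-in dataset; rows are data points

# SVD-based sketch: A_tilde = A V_m has only m columns instead of d.
_, _, Vt = np.linalg.svd(A, full_matrices=False)
A_tilde = A @ Vt[:m].T

# Energy discarded by the sketch; this fixed constant does not depend on X,
# so it does not change which projection minimizes the cost.
tail = np.linalg.norm(A, "fro") ** 2 - np.linalg.norm(A_tilde, "fro") ** 2

def proj_cost(M, X):
    """||M - X X^T M||_F^2 for X with orthonormal columns (a rank-k projection)."""
    return np.linalg.norm(M - X @ (X.T @ M), "fro") ** 2

# Costs on A and on the sketch (plus the fixed tail) should be close for any
# rank-k projection.
for _ in range(3):
    X, _ = np.linalg.qr(rng.standard_normal((n, k)))   # random orthonormal n x k
    print(proj_cost(A, X), proj_cost(A_tilde, X) + tail)
```

Because cluster indicator matrices are a special case of rank k orthonormal X, preserving this cost for all rank k projections in particular preserves the k-means cost of every clustering.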