Dimensionality Reduction for K-Means and Low Rank Approximation

Michael Cohen, Sam Elder, Cameron Musco, Christopher Musco, Mădălina Persu
Massachusetts Institute of Technology

Overview

Simple techniques to accelerate algorithms for:
∙ k-means clustering
∙ principal component analysis (PCA)
∙ constrained low rank approximation

Dimensionality Reduction

Replace a large, high dimensional dataset with a low dimensional sketch: compress the n × d data matrix A (n data points, d features) to an n × d′ sketch Ã with d′ ≪ d. The solution computed on the sketch Ã should approximate the solution on the original A.

[Figure: 3D scatter plots of clustered data, before and after sketching.]

∙ Simultaneously improves runtime, memory requirements, communication cost, etc.

k-means Clustering

$$\min_C \mathrm{Cost}(A, C) = \sum_{i=1}^{n} \| a_i - \mu(C[a_i]) \|_2^2,$$

where $\mu(C[a_i])$ is the centroid of the cluster that $a_i$ is assigned to.

∙ Extremely common objective function for clustering.
∙ Choose k clusters to minimize total intra-cluster variance.
∙ NP-hard, even for fixed dimension d or fixed k.
∙ Several (1 + ϵ) and constant factor approximation algorithms exist.
∙ In practice: Lloyd's heuristic (i.e., "the k-means algorithm") with k-means++ initialization is used. This guarantees an O(log k) approximation and typically performs much better.

Dimensionality reduction can speed up any of these algorithms; a minimal sketch of the objective follows.
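To make the objective concrete, here is a minimal NumPy sketch of the k-means cost. The function name, arguments, and test data are illustrative, not taken from the slides.

```python
import numpy as np

def kmeans_cost(A, labels, k):
    """Sum of squared distances from each row of A to its cluster centroid."""
    cost = 0.0
    for j in range(k):
        cluster = A[labels == j]              # points assigned to cluster j
        if len(cluster) > 0:
            centroid = cluster.mean(axis=0)   # mu(C_j)
            cost += ((cluster - centroid) ** 2).sum()
    return cost

# Illustrative usage on random data.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 2))
labels = rng.integers(0, 3, size=100)
print(kmeans_cost(A, labels, k=3))
```

For the clustering itself, scikit-learn's KMeans runs Lloyd's heuristic with k-means++ initialization by default, matching the practical recipe above.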
Key Observation: k-means Clustering == Low Rank Approximation

The k-means objective can be written in matrix form:

$$\min_C \sum_{i=1}^{n} \| a_i - \mu(C[a_i]) \|_2^2 = \min_C \| A - C(A) \|_F^2,$$

where C(A) is the n × d matrix whose i-th row is the centroid of the cluster containing $a_i$.

Key Observation: C(A) is a Rank k Projection

C(A) is actually a projection of A's columns onto a rank k subspace [Boutsidis, Drineas, Mahoney, Zouzias '11]:

$$C(A) = X_C X_C^\top A,$$

where $X_C$ is the n × k scaled cluster indicator matrix with $X_C(i,j) = 1/\sqrt{|C_j|}$ if $a_i$ is assigned to cluster $C_j$, and 0 otherwise. The columns of $X_C$ are orthonormal, so $X_C X_C^\top$ is an orthogonal projection. Therefore:

$$\min_C \sum_{i=1}^{n} \| a_i - \mu(C[a_i]) \|_2^2 = \min_{\mathrm{rank}(X)=k,\; X \in S} \| A - X X^\top A \|_F^2,$$

where X is a rank k orthonormal matrix and, for k-means, S is the set of all cluster indicator matrices.

∙ Set S = {all rank k orthonormal matrices} for principal component analysis/singular value decomposition.
∙ This is the general form for constrained low rank approximation.

(The projection identity is checked numerically below.)
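The following sketch (illustrative variable names; labels are arranged so every cluster is non-empty) builds the scaled indicator matrix $X_C$ and checks that $C(A) = X_C X_C^\top A$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 20, 5
A = rng.standard_normal((n, d))
labels = rng.permutation(np.arange(n) % k)   # guarantees non-empty clusters

# Scaled cluster indicator matrix X_C (n x k) with orthonormal columns:
# X_C[i, j] = 1/sqrt(|C_j|) when point i belongs to cluster j.
X_C = np.zeros((n, k))
for j in range(k):
    members = labels == j
    X_C[members, j] = 1.0 / np.sqrt(members.sum())

# Row i of C(A) is the centroid of the cluster containing a_i.
centroids = np.array([A[labels == j].mean(axis=0) for j in range(k)])
C_A = centroids[labels]

assert np.allclose(X_C @ (X_C.T @ A), C_A)           # C(A) = X_C X_C^T A
print(np.linalg.norm(A - C_A, "fro") ** 2)           # k-means cost as ||A - C(A)||_F^2
```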
So What? Projection-Cost Preserving Sketches

We can solve any problem of the form $\min_{\mathrm{rank}(X)=k,\; X \in S} \| A - X X^\top A \|_F^2$ on a small sketch Ã with only O(k) columns, provided that

$$\| \tilde{A} - X X^\top \tilde{A} \|_F^2 \approx \| A - X X^\top A \|_F^2 \quad \text{for all rank } k \text{ matrices } X.$$

A sketch satisfying this guarantee is called a projection-cost preserving sketch.

In particular, if we find an X that gives

$$\| \tilde{A} - X X^\top \tilde{A} \|_F^2 \le \gamma \, \| \tilde{A} - X_{\mathrm{opt}} X_{\mathrm{opt}}^\top \tilde{A} \|_F^2,$$

then

$$\| A - X X^\top A \|_F^2 \le \gamma (1 + \epsilon) \, \| A - X_{\mathrm{opt}} X_{\mathrm{opt}}^\top A \|_F^2.$$

That is, a γ-approximate solution on the sketch is a γ(1 + ϵ)-approximate solution on the original data. See the coresets of Feldman, Schmidt, Sohler '13.
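One construction studied in this line of work keeps A's coordinates in its top O(k/ϵ) right singular directions; such sketches preserve projection cost up to a fixed additive constant c (the energy outside the retained directions), which does not depend on X. A rough numerical illustration, with illustrative names and constants:

```python
import numpy as np

def svd_sketch(A, m):
    """n x m sketch: coordinates of A in its top-m right singular directions."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    sketch = A @ Vt[:m].T
    c = np.linalg.norm(A - sketch @ Vt[:m], "fro") ** 2  # fixed additive constant
    return sketch, c

rng = np.random.default_rng(1)
n, d, k = 200, 50, 4
A = rng.standard_normal((n, d))
A_sk, c = svd_sketch(A, m=4 * k)   # m on the order of k/eps; eps = 1/4 here (illustrative)

# Projection cost of a random rank-k orthonormal X is (approximately) preserved.
X, _ = np.linalg.qr(rng.standard_normal((n, k)))
cost    = np.linalg.norm(A    - X @ (X.T @ A),    "fro") ** 2
cost_sk = np.linalg.norm(A_sk - X @ (X.T @ A_sk), "fro") ** 2
print(cost, cost_sk + c)   # close: they agree up to a (1 +/- eps)-type factor
```

Because the guarantee holds for all rank k X simultaneously, running (for example) Lloyd's iterations on the O(k)-column Ã instead of the d-column A cuts the per-iteration work from O(ndk) to roughly O(nk²), while the transfer statement above bounds the loss in solution quality.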
