
Int J Comput Vis (2009) 81: 317–330
DOI 10.1007/s11263-008-0178-9

Spectral Curvature Clustering (SCC)

Guangliang Chen · Gilad Lerman

Received: 15 November 2007 / Accepted: 12 September 2008 / Published online: 10 December 2008
© The Author(s) 2008. This article is published with open access at Springerlink.com

Abstract  This paper presents novel techniques for improving the performance of a multi-way spectral clustering framework (Govindu in Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), vol. 1, pp. 1150–1157, 2005; Chen and Lerman, 2007, preprint in the supplementary webpage) for segmenting affine subspaces. Specifically, it suggests an iterative sampling procedure to improve the uniform sampling strategy, an automatic scheme of inferring the tuning parameter from the data, a precise initialization procedure for K-means, as well as a simple strategy for isolating outliers. The resulting algorithm, Spectral Curvature Clustering (SCC), requires only linear storage and takes linear running time in the size of the data. It is supported by theory which both justifies its successful performance and guides our practical choices. We compare it with other existing methods on a few artificial instances of affine subspaces. Application of the algorithm to several real-world problems is also discussed.

Keywords  Hybrid linear modeling · Multi-way spectral clustering · Polar curvature · Iterative sampling · Motion segmentation · Face clustering

This work was supported by NSF grant #0612608.
Supplementary webpage: http://www.math.umn.edu/~lerman/scc/.

G. Chen · G. Lerman (✉)
School of Mathematics, University of Minnesota, 127 Vincent Hall, 206 Church Street SE, Minneapolis, MN 55455, USA
e-mail: [email protected]

G. Chen
e-mail: [email protected]

1 Introduction

We address the problem of hybrid linear modeling. Roughly speaking, we assume a data set that can be well approximated by a mixture of affine subspaces, or equivalently, flats, and wish to estimate the parameters of each of the flats as well as the membership of the given data points associated with them. More precise formulations of this problem appear in Ma et al. (2008) and Chen and Lerman (2007).

There are many algorithms that can be applied to this problem. Some of them emphasize modeling of the underlying flats and then use the models to infer the clusters (see e.g., Independent Component Analysis (Hyvärinen and Oja 2000), Subspace Separation (Kanatani 2001, 2002), Generalized Principal Component Analysis (GPCA) (Vidal et al. 2005; Ma et al. 2008)). A few others address the clustering part and then use its output to estimate the parameters of the underlying flats (see e.g., Multi-way Clustering algorithms (Agarwal et al. 2005, 2006; Govindu 2005; Shashua et al. 2006), Tensor Voting (Medioni et al. 2000), k-Manifolds (Souvenir and Pless 2005), Grassmann Clustering (Gruber and Theis 2006), Poisson Mixture Model (Haro et al. 2006)). There are also algorithms that iterate between the two components of data clustering and subspace modeling (see e.g., Mixtures of PPCA (MoPPCA) (Tipping and Bishop 1999), K-Subspaces (Ho et al. 2003)/k-Planes (Bradley and Mangasarian 2000; Tseng 1999)).

In this paper we mainly focus on the special case of hybrid linear modeling where all the flats have the same dimension d ≥ 0. We emphasize the clustering component, and thus refer to this special case as d-flats clustering. We follow Govindu's framework of multi-way spectral clustering (Govindu 2005) and Ng et al.'s framework of spectral clustering (Ng et al. 2002). In our setting, the former framework starts by assigning to any d + 2 points in the data an affinity measure quantifying d-dimensional flatness, thus forming a (d + 2)-way affinity tensor. Next, it obtains a similarity matrix by decomposing the affinity tensor so that spectral clustering methods can be applied.

However, there are critical issues associated with this framework that need to be thoroughly addressed. First of all, as the size of the data and the intrinsic dimension d increase, it is computationally prohibitive to calculate or store, not to mention process, the affinity tensor. Approximating this tensor by uniformly sampling a small subset of its "fibers" (Govindu 2005) is insufficient for large d and data of moderate size. Better numerical techniques have to be developed while maintaining both reasonable performance and fast speed. Second, the multi-way affinities contain a tuning parameter which crucially affects clustering. It is not clear how to select its optimal value while avoiding an exhaustive search. There are also smaller issues, e.g., how to deal with outliers.

Our algorithm, Spectral Curvature Clustering (SCC), provides specific solutions to the above issues. More specifically, it contributes to the advancement of multi-way spectral clustering in the following aspects.

• It introduces an iterative sampling procedure to significantly improve accuracy over the standard random sampling scheme used in Govindu (2005) (see Sect. 3.1.1).
• It suggests an automatic way of estimating the tuning parameter commonly used in multi-way spectral clustering methods (see Sect. 3.1.2).
• It employs an efficient way of applying K-means in its setting (see Sect. 3.1.3).
• It proposes a simple strategy to isolate outliers while clustering flats (see Sect. 3.4).

Careful analysis of the theoretical performance of the SCC algorithm appears in Chen and Lerman (2007).

The rest of the paper is organized as follows. In Sect. 2 we first introduce our multi-way affinities, and then review a theoretical version of the SCC algorithm (Chen and Lerman 2007). Section 3 discusses various techniques that are used to make the theoretical version practical, and the SCC algorithm is formulated incorporating these techniques. We compare our algorithm with other competing methods using various kinds of artificial data sets as well as several real-world applications in Sect. 4. Section 5 concludes with a brief discussion and possible avenues for future work.

2 Background

2.1 Polar Curvature

Let d and D be integers such that 0 ≤ d < D. For any d + 2 distinct points z_1, ..., z_{d+2} in ℝ^D, we denote by V_{d+1}(z_1, ..., z_{d+2}) the volume of the (d + 1)-simplex formed by these points. The polar sine at each vertex z_i is

\operatorname{psin}_{z_i}(z_1, \dots, z_{d+2}) = \frac{(d+1)! \cdot V_{d+1}(z_1, \dots, z_{d+2})}{\prod_{1 \le j \le d+2,\, j \ne i} \|z_j - z_i\|}, \qquad 1 \le i \le d+2. \qquad (1)

The polar curvature of the d + 2 points is defined as follows (Chen and Lerman 2007; Lerman and Whitehouse 2008d):

c_p(z_1, \dots, z_{d+2}) = \operatorname{diam}(\{z_1, \dots, z_{d+2}\}) \cdot \sqrt{\frac{1}{d+2} \sum_{i=1}^{d+2} \operatorname{psin}^2_{z_i}(z_1, \dots, z_{d+2})}, \qquad (2)

where diam(S) denotes the diameter of the set S. We remark that when d = 0, the polar curvature coincides with the Euclidean distance.

It is shown in Lerman and Whitehouse (2008d) (following the methods of Lerman and Whitehouse 2008a, 2008b, 2008c) that under certain conditions the least squares error of approximating certain probability measures μ by d-flats is comparable to the average of c_p² (with respect to μ^{d+2}). This observation is used in the theoretical analysis of the SCC algorithm (Chen and Lerman 2007).
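To make the definitions above concrete, the following minimal numpy sketch (not from the paper; the function name and interface are our own) computes the polar sines (1) and the polar curvature (2) for d + 2 distinct points stacked as the rows of a matrix, obtaining the simplex volume V_{d+1} from the Gram determinant of the edge vectors at one vertex.

```python
import numpy as np
from math import factorial

def polar_curvature(Z):
    """Polar curvature c_p of d+2 distinct points, given as rows of Z ((d+2) x D).

    Implements (1)-(2): V_{d+1} comes from the Gram determinant of the edge
    vectors at vertex z_1; the polar sine at each vertex divides
    (d+1)! * V_{d+1} by the product of edge lengths at that vertex; c_p scales
    the root of the averaged squared polar sines by the diameter.
    """
    Z = np.asarray(Z, dtype=float)
    m = Z.shape[0]                               # m = d + 2 points
    B = Z[1:] - Z[0]                             # (d+1) x D edge vectors
    vol2 = max(np.linalg.det(B @ B.T), 0.0)      # clip tiny negative determinants
    vol = np.sqrt(vol2) / factorial(m - 1)       # V_{d+1}
    dists = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    diam = dists.max()
    psin2 = 0.0
    for i in range(m):                           # polar sine at vertex z_i, eq. (1)
        edge_prod = np.prod(np.delete(dists[i], i))
        psin2 += (factorial(m - 1) * vol / edge_prod) ** 2
    return diam * np.sqrt(psin2 / m)             # average over the d+2 vertices

# Sanity check: for d = 0 (two points), c_p equals the Euclidean distance.
z = np.array([[0.0, 0.0], [3.0, 4.0]])
assert np.isclose(polar_curvature(z), 5.0)
```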
2.2 The Affinity Tensor and its Matrix Representation

We assume a data set X = {x_1, x_2, ..., x_N} in ℝ^D sampled from a collection of K d-flats (possibly with noise and outliers), where K > 1 and N is large. Using the above polar curvature c_p and a fixed constant σ > 0, we construct the following multi-way affinity for any d + 2 points x_{i_1}, ..., x_{i_{d+2}} in X:

\mathcal{A}(i_1, \dots, i_{d+2}) =
\begin{cases}
e^{-c_p^2(x_{i_1}, \dots, x_{i_{d+2}})/(2\sigma^2)}, & \text{if } i_1, \dots, i_{d+2} \text{ are distinct;}\\
0, & \text{otherwise.}
\end{cases} \qquad (3)

We will explain in Sect. 3.1.2 how to select the optimal value of the tuning parameter σ.

Equation (3) defines a (d + 2)-way tensor \mathcal{A} of size N × N × ··· × N, but we will only use a matrix representation of \mathcal{A}, which we denote by A and call the affinity matrix. The size of A is N × N^{d+1}. For each 1 ≤ i ≤ N, the ith row of the matrix A (i.e., A(i, :)) is expanded from the ith slice of the tensor \mathcal{A} (i.e., \mathcal{A}(i, :, ..., :)) following some arbitrary but fixed order, e.g., the lexicographic order, of the last d + 1 indices (see e.g., Bader and Kolda 2004, Fig. 2). This ordering is not important to us, since what we really need is the product AA^T (see Algorithm 1 below), which is independent of such ordering.
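As an illustration of (3), the sketch below (our own naming, reusing the polar_curvature function from the preceding sketch) evaluates a single entry of the affinity tensor. The full N × N^{d+1} matrix A is never formed explicitly; in practice one evaluates such entries only for a sampled subset of (d + 2)-tuples.

```python
import numpy as np

def affinity_entry(X, idx, sigma):
    """Entry A(i_1, ..., i_{d+2}) of the affinity tensor in (3).

    X is an N x D data array, idx a tuple of d+2 row indices, and sigma
    the tuning parameter; non-distinct index tuples get affinity 0.
    """
    if len(set(idx)) < len(idx):
        return 0.0
    c = polar_curvature(X[list(idx)])   # polar curvature sketch from Sect. 2.1
    return float(np.exp(-c**2 / (2.0 * sigma**2)))
```

A row A(i, :) of the affinity matrix then collects such entries over all orderings of the remaining d + 1 indices; as noted above, only the product AA^T ultimately matters.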
2.3 The SCC Algorithm in Theory

The Theoretical Spectral Curvature Clustering (TSCC) algorithm (Chen and Lerman 2007) is presented below (Algorithm 1) for solving the d-flats clustering problem.

Algorithm 1 Theoretical Spectral Curvature Clustering (TSCC)
Input: X = {x_1, x_2, ..., x_N} ⊂ ℝ^D: data, d: dimension, K: number of d-flats, σ: tuning parameter.

To assess the goodness of the clusters C_1, ..., C_K returned by TSCC, we compute the averaged orthogonal least squares (OLS) error

e_{\mathrm{OLS}} = \sqrt{\frac{1}{N} \sum_{k=1}^{K} \sum_{x \in C_k} \operatorname{dist}^2(x, F_k)}, \qquad (4)

where F_k is the OLS d-flat approximating C_k (obtained by Principal Component Analysis (PCA)), and dist(x, F_k) denotes the orthogonal distance from x to F_k. In situations where we know the true membership of the data points, we also compute the percentage of misclassified points. That is,

e_{\%} = \frac{\#\ \text{of misclassified points}}{N} \cdot 100\%. \qquad (5)
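The two error measures translate directly into code. The following sketch (the helper names are our own) fits each OLS d-flat F_k by PCA and evaluates (4); the misclassification rate (5) is shown under the simplifying assumption that the computed labels are already aligned with the ground-truth labels (in general one would first match the two labelings up to a permutation).

```python
import numpy as np

def ols_error(X, labels, d):
    """Averaged OLS error (4): fit each cluster's d-flat by PCA and
    accumulate the squared orthogonal distances of the cluster's points."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        C = X[labels == k]
        Y = C - C.mean(axis=0)              # center the cluster
        _, _, Vt = np.linalg.svd(Y, full_matrices=False)
        P = Vt[:d]                          # d x D orthonormal basis of F_k
        residual = Y - (Y @ P.T) @ P        # components orthogonal to F_k
        total += (residual ** 2).sum()
    return np.sqrt(total / len(X))

def misclassification_rate(labels, truth):
    """Percentage of misclassified points (5), assuming labels are
    already permuted to best match the ground truth."""
    labels, truth = np.asarray(labels), np.asarray(truth)
    return 100.0 * np.mean(labels != truth)
```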
3 The SCC Algorithm