CAST: A Correlation-based Adaptive Spectral Clustering Algorithm on Multi-scale Data

Xiang Li†, Ben Kao†, Caihua Shan†, Dawei Yin‡, Martin Ester§
†The University of Hong Kong, Pokfulam Road, Hong Kong
‡JD.com, Beijing, China
§Simon Fraser University, Burnaby, BC, Canada
†{xli2, kao, chshan}@cs.hku.hk  ‡[email protected]  §[email protected]

arXiv:2006.04435v1 [cs.LG] 8 Jun 2020

ABSTRACT

We study the problem of applying spectral clustering to cluster multi-scale data, which is data whose clusters are of various sizes and densities. Traditional spectral clustering techniques discover clusters by processing a similarity matrix that reflects the proximity of objects. For multi-scale data, distance-based similarity is not effective because objects of a sparse cluster could be far apart while those of a dense cluster have to be sufficiently close. Following [16], we solve the problem of spectral clustering on multi-scale data by integrating the concept of objects' "reachability similarity" with a given distance-based similarity to derive an objects' coefficient matrix. We propose the algorithm CAST, which applies trace Lasso to regularize the coefficient matrix. We prove that the resulting coefficient matrix has the "grouping effect" and that it exhibits "sparsity". We show that these two characteristics imply very effective spectral clustering. We evaluate CAST and 10 other clustering methods on a wide range of datasets w.r.t. various measures. Experimental results show that CAST provides excellent performance and is highly robust across test cases of multi-scale data.

KEYWORDS

Spectral clustering; robustness; multi-scale data

ACM Reference Format:
Xiang Li, Ben Kao, Caihua Shan, Dawei Yin, Martin Ester. 2020. CAST: A Correlation-based Adaptive Spectral Clustering Algorithm on Multi-scale Data. In Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '20), August 23–27, 2020, Virtual Event, CA, USA. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3394486.3403086

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
KDD '20, August 23–27, 2020, Virtual Event, CA, USA
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-7998-4/20/08...$15.00
https://doi.org/10.1145/3394486.3403086

1 INTRODUCTION

Cluster analysis is a fundamental task in machine learning and data mining, which seeks to group similar objects into the same clusters and separate dissimilar objects into different clusters. Spectral clustering, which transforms clustering into a graph partitioning problem, has been shown to be effective in image segmentation [37], text mining [8], and information network analysis [17]. These are fundamental tasks that are at the core of many applications and services, such as text/media information retrieval systems, recommender systems, and viral marketing.

Given a set of objects X = {x1, x2, ..., xn} and a similarity matrix S such that each entry Sij represents the affinity between objects xi and xj, standard spectral clustering methods first construct a graph G = (X, S), where X denotes the set of vertices and Sij gives the weight of the edge that connects xi and xj. Then, the graph Laplacian L of G is computed and eigen-decomposition is performed on matrix L to derive the k smallest eigenvectors {e1, e2, ..., ek}¹, where k is the desired number of clusters and ei is the i-th smallest eigenvector. These eigenvectors form a k × n matrix, whose j-th column is taken as the feature vector of object xj. (Essentially, objects are mapped into low-dimensional embeddings using the eigenvectors.) Finally, a post-processing step, e.g., k-means, is applied on the objects with their feature vectors to return clusters. Figure 1 illustrates the general pipeline of spectral clustering.

[Figure 1: Spectral clustering pipeline — similarity matrix S → graph Laplacian L → eigenvectors → k-means]

¹We say that an eigenvector ei is smaller than another eigenvector ej if ei's eigenvalue is smaller than that of ej's.

Spectral clustering aims to optimize certain criteria that measure the quality of graph partitions. For example, the NCuts [29] method minimizes the normalized cut between clusters, which measures the weights of inter-cluster edges. Conventionally, objects' affinity is given by some distance-based similarity. For multi-scale data, which consists of object clusters of different sizes and densities, distance-based similarity is often ineffective in capturing the correlations between objects [26, 38]. This leads to poor performance of spectral methods. For example, Fig. 2(a) shows a dense rectangular cluster located on top of a very sparse strip-shaped cluster. Objects at different ends of the strip-shaped cluster are far apart and hence their distance-based similarity is small. Fig. 2(b) shows the clustering given by NCuts, from which we see that the strip-shaped cluster is incorrectly segmented.

[Figure 2: (a) A multi-scale dataset, (b) clustering by NCuts]

In [16], the ROSC algorithm was proposed to address the multi-scale data issue in spectral clustering. The idea is to rectify a given distance-based similarity matrix S by deriving a coefficient matrix Z that can better express the correlation among objects. Intuitively, each entry Zij in Z represents how well an object xi characterizes another object xj, and two objects are considered highly correlated (and thus should be put into the same cluster) if they give similar characterizations to other objects. The coefficient matrix Z is constructed based on the similarity matrix S as well as a transitive K-nearest-neighbor (TKNN) graph. Specifically, two objects xi and xj are connected in the TKNN graph if there exists an object sequence <xi, ..., xj> such that adjacent objects in the sequence are K-nearest-neighbors of each other. For example, objects that are located at far ends of the strip-shaped cluster (Fig. 2) are connected by a chain of K-NN relations. An important property that was proven in [16] is that the matrix Z has the grouping effect [16, 24], which states that if two objects are similar in terms of both S and TKNN graph connectivity, their corresponding coefficient vectors in Z are also similar. Based on Z, ROSC constructs a new correlation matrix Z̃. The grouping effect of Z ensures that highly correlated objects are grouped together by applying spectral clustering on Z̃.

Besides expressing the correlation between objects of the same cluster, another important factor for correct clustering is to suppress the correlation between objects of different clusters. ROSC, however, focuses on enhancing the former by deriving a coefficient matrix Z that amplifies intra-cluster correlation; it does not promote the latter. Our objective is to study methods that deal with both factors. Specifically, our proposed algorithm CAST regularizes matrix Z so that it has the grouping effect and exhibits inter-cluster sparsity. By sparsity, we refer to the desired property that entries in the matrix that correspond to inter-cluster object pairs should be 0 or very small, hence the matrix is sparse.

One common approach to enforce sparsity is to apply ℓ1 regularization on a solution matrix (i.e., by including the ℓ1-norm as a penalty term in an optimization problem). The trace Lasso offers a more adaptive regularizer whose behavior depends on the correlation of the data: if the data are highly correlated, it behaves like the ℓ2-norm; if the data are uncorrelated and independent, i.e., XᵀX = I (I is the identity matrix), the trace Lasso will behave like the ℓ1-norm.

In this paper we study spectral clustering over multi-scale data. We propose the Correlation-based Adaptive Spectral clustering method using Trace lasso, or CAST. We discuss how CAST takes advantage of the trace Lasso to achieve robust spectral clustering. We summarize our main contributions as follows.

• We study the problem of applying spectral clustering on multi-scale data. We propose the CAST algorithm, which uses trace Lasso to construct and regularize a coefficient matrix Z. A correlation matrix that exhibits the grouping effect and inter-cluster sparsity is subsequently derived for effective and robust spectral clustering.
• We mathematically prove that the matrix derived by CAST has the grouping effect. This ensures high intra-cluster object correlation.
• We conduct extensive experiments to show the effectiveness of CAST. We compare CAST with 10 other methods w.r.t. various clustering quality measures over a wide range of datasets. Our results show that CAST consistently provides very good performance over the range of datasets. It is thus a very robust algorithm, especially in handling multi-scale data.

The rest of the paper is organized as follows. Section 2 introduces related works. In Section 3 we describe the ROSC algorithm, give formal definitions of some important concepts based on which our algorithm is designed, and then present CAST. Section 4 presents experimental results. Finally, Section 5 concludes the paper.

2 RELATED WORK

Spectral clustering is a widely studied topic [2, 5, 6, 18, 34]. There are many works that study various aspects of spectral clustering, such as computational efficiency [4, 7, 35], clustering performance on data with different characteristics [12, 33, 40], and the theoretical foundations of the method [15, 25, 27]. An introduction to spectral clustering is given in [13, 28, 32].

Despite the success of spectral clustering, previous works [26, 36] have pointed out that spectral methods can be adversely affected when data is multi-scale. To address the problem, the self-tuning spectral clustering method ZP [38] uses local scaling to extend a Gaussian-kernel-based similarity Sij = exp(−∥xi − xj∥² / (2σ²)) to Sij = exp(−∥xi − xj∥² / (σiσj)), where σi is a local scale estimated from the neighborhood of xi.
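The locally scaled similarity of ZP [38] described above can be sketched as follows. This is our own minimal illustration rather than code from either paper; following [38], σi is taken as the distance from xi to its K-th nearest neighbor, and the function name and default K are our choices.

```python
import numpy as np

def local_scaling_affinity(X, K=7):
    """Locally scaled Gaussian affinity in the style of ZP [38]:
    S_ij = exp(-||xi - xj||^2 / (sigma_i * sigma_j)),
    where sigma_i is the distance from xi to its K-th nearest neighbor."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    # Column 0 after sorting is the self-distance (0), so index K is the
    # K-th nearest neighbor of each object.
    sigma = np.sort(D, axis=1)[:, K]
    S = np.exp(-(D ** 2) / (sigma[:, None] * sigma[None, :]))
    np.fill_diagonal(S, 0.0)  # no self-loops in the similarity graph
    return S
```

Because each σi adapts to the local density around xi, points in a sparse cluster can still obtain high mutual affinity, which is exactly the multi-scale behavior the plain Gaussian kernel lacks.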
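The general spectral clustering pipeline described in the introduction (similarity matrix S → graph Laplacian L → k smallest eigenvectors → k-means) can be sketched as below. This is a generic, self-contained illustration using the normalized Laplacian and a tiny deterministic k-means; it is not the CAST algorithm.

```python
import numpy as np

def _kmeans(F, k, iters=100):
    """Tiny k-means with deterministic farthest-point initialization."""
    idx = [0]
    for _ in range(k - 1):  # greedily pick the point farthest from chosen centroids
        d = np.min(((F[:, None] - F[idx]) ** 2).sum(-1), axis=1)
        idx.append(int(np.argmax(d)))
    C = F[idx]
    for _ in range(iters):  # Lloyd's iterations
        labels = np.argmin(((F[:, None] - C[None]) ** 2).sum(-1), axis=1)
        C = np.array([F[labels == j].mean(0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return labels

def spectral_clustering(S, k):
    """Generic pipeline: similarity matrix S -> normalized graph Laplacian L
    -> k smallest eigenvectors as features -> k-means on the features."""
    d = S.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(len(S)) - D_inv_sqrt @ S @ D_inv_sqrt   # normalized Laplacian
    _, vecs = np.linalg.eigh(L)                        # eigenvalues ascending
    E = vecs[:, :k]                                    # k smallest eigenvectors
    E = E / np.maximum(np.linalg.norm(E, axis=1, keepdims=True), 1e-12)
    return _kmeans(E, k)
```

Methods such as NCuts, ROSC, and CAST differ mainly in how the matrix fed into this pipeline is constructed; the embedding-plus-k-means stages are common to all of them.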
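The transitive K-nearest-neighbor (TKNN) graph of [16], used in the introduction to connect far-apart objects of a sparse cluster, can be illustrated as follows: two objects are TKNN-connected if a chain of object pairs links them in which adjacent objects are K-nearest-neighbors of each other. The sketch below is our own reading of that definition (taking "of each other" as mutual K-NN); the function name is hypothetical.

```python
import numpy as np

def tknn_reachable(X, K):
    """Transitive K-NN reachability: reach[i, j] is True iff a chain of
    mutual K-nearest-neighbor pairs links xi and xj."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None], axis=-1)
    np.fill_diagonal(D, np.inf)                  # exclude self from neighbor lists
    knn = np.argsort(D, axis=1)[:, :K]           # K nearest neighbors of each object
    mutual = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in knn[i]:
            if i in knn[j]:                      # mutual K-NN edge
                mutual[i, j] = mutual[j, i] = True
    reach = np.eye(n, dtype=bool)
    for s in range(n):                           # BFS/DFS transitive closure
        stack, seen = [s], {s}
        while stack:
            u = stack.pop()
            for v in np.nonzero(mutual[u])[0]:
                if v not in seen:
                    seen.add(int(v))
                    stack.append(int(v))
        reach[s, list(seen)] = True
    return reach
```

On the strip-shaped cluster of Fig. 2, objects at opposite ends are TKNN-reachable through the chain of neighbors along the strip, even though their direct distance-based similarity is tiny.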
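For concreteness, the trace Lasso of a coefficient vector w with respect to a data matrix X is the nuclear norm ∥X Diag(w)∥_* (the sum of singular values). The two limiting behaviors noted in the introduction can be checked numerically; this snippet is our own illustration, not part of the paper.

```python
import numpy as np

def trace_lasso(X, w):
    """Trace Lasso of coefficient vector w w.r.t. data matrix X:
    the nuclear norm of X @ diag(w)."""
    return np.linalg.norm(X @ np.diag(w), ord='nuc')

# Uncorrelated, independent columns (X^T X = I): equals the l1-norm of w.
X_orth = np.eye(4)
w = np.array([1.0, -2.0, 0.0, 0.5])
# Here X_orth @ diag(w) = diag(w), whose singular values are |w_i|,
# so the nuclear norm is sum(|w_i|) = ||w||_1.

# Perfectly correlated columns (all identical, unit norm): equals the l2-norm.
x = np.array([[1.0], [0.0], [0.0]])
X_corr = np.repeat(x, 4, axis=1)
# Here X_corr @ diag(w) = x @ w^T is rank one with singular value ||w||_2.
```

Between these extremes the trace Lasso interpolates between the ℓ1- and ℓ2-norms, which is what makes it an adaptive regularizer for a coefficient matrix over data of mixed correlation.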
