Robust and Efficient Computation of Eigenvectors in a Generalized Spectral Method for Constrained Clustering Chengming Jiang Huiqing Xie Zhaojun Bai University of California, Davis East China University of University of California, Davis
[email protected] Science and Technology
[email protected] [email protected] Abstract 1 INTRODUCTION FAST-GE is a generalized spectral method Clustering is one of the most important techniques for constrained clustering [Cucuringu et al., for statistical data analysis, with applications rang- AISTATS 2016]. It incorporates the must- ing from machine learning, pattern recognition, im- link and cannot-link constraints into two age analysis, bioinformatics to computer graphics. Laplacian matrices and then minimizes a It attempts to categorize or group date into clus- Rayleigh quotient via solving a generalized ters on the basis of their similarity. Normalized eigenproblem, and is considered to be sim- Cut [Shi and Malik, 2000] and Spectral Clustering ple and scalable. However, there are two [Ng et al., 2002] are two popular algorithms. unsolved issues. Theoretically, since both Laplacian matrices are positive semi-definite Constrained clustering refers to the clustering with and the corresponding pencil is singular, it a prior domain knowledge of grouping information. is not proven whether the minimum of the Here relatively few must-link (ML) or cannot-link Rayleigh quotient exists and is equivalent to (CL) constraints are available to specify regions an eigenproblem. Computationally, the lo- that must be grouped in the same partition or be cally optimal block preconditioned conjugate separated into different ones [Wagstaff et al., 2001]. gradient (LOBPCG) method is not designed With constraints, the quality of clustering could for solving the eigenproblem of a singular be improved dramatically.