Relational Clustering by Symmetric Convex Coding

Relational Clustering by Symmetric Convex Coding

Relational Clustering by Symmetric Convex Coding Bo Long [email protected] Zhongfei (Mark) Zhang [email protected] Computer Science Dept., SUNY Binghamton, Binghamton, NY 13902 Xiaoyun Wu [email protected] Google Inc, 1600 Amphitheatre, Mountain View, CA 94043 Philip S. Yu [email protected] IBM Watson Research Center, 19 skyline Drive, Hawthorne, NY 10532 Abstract applications, such as web mining, social network analysis, bioinformatics, VLSI design, and task scheduling. Further- Relational data appear frequently in many ma- more, the relational data are more general in the sense all chine learning applications. Relational data con- the feature data can be transformed into relational data un- sist of the pairwise relations (similarities or dis- der a certain distance function. similarities) between each pair of implicit ob- jects, and are usually stored in relation matri- The most popular way to cluster similarity-based relational ces and typically no other knowledge is avail- data is to formulate it as the graph partitioning problem, able. Although relational clustering can be for- which has been studied for decades. Graph partitioning mulated as graph partitioning in some applica- seeks to cut a given graph into disjoint subgraphs which correspond to disjoint clusters based on a certain edge cut tions, this formulation is not adequate for gen- objective. Recently, graph partitioning with an edge cut ob- eral relational data. In this paper, we propose a jective has been shown to be mathematically equivalent to general model for relational clustering based on an appropriate weighted kernel k-means objective function symmetric convex coding. The model is applica- (Dhillon et al., 2004; Dhillon et al., 2005). The assump- ble to all types of relational data and unifies the tion behind the graph partitioning formulation is that since existing graph partitioning formulation. Under the nodes within a cluster are similar to each other, they this model, we derive two alternative bound opti- form a dense subgraph. However, in general this is not true mization algorithms to solve the symmetric con- for relational data, i.e., the clusters in relational data are not vex coding under two popular distance functions, necessarily dense clusters consisting of strongly-related ob- Euclidean distance and generalized I-divergence. jects. Experimental evaluation and theoretical analysis Figure 1 shows the relational data of four clusters, show the effectiveness and great potential of the which are of two different types. In Figure 1, C1 = proposed model and algorithms. fv1; v2; v3; v4g and C2 = fv5; v6; v7; v8g are two tradi- tional dense clusters within which objects are strongly re- lated to each other. However, C3 = fv9; v10; v11; v12g and 1. Introduction C4 = fv13; v14; v15; v16g also form two sparse clusters, within which the objects are not related to each other, but Two types of data are used in unsupervised learning, fea- they are still ”similar” to each other in the sense that they ture and relational data. Feature data are in the form of are related to the same set of other nodes. In Web min- feature vectors and relational data consist of the pairwise ing, this type of cluster could be a group of music ”fans” relations (similarities or dissimilarities) between each pair Web pages which share the same taste on the music and of objects, and are usually stored in relation matrices and are linked to the same set of music Web pages but are not typically no other knowledge is available. Although feature linked to each other (Kumar et al., 1999). Due to the impor- data are the most common type of data, relational data have tance of identifying this type of clusters (communities), it become more and more popular in many machine learning has been listed as one of the five algorithmic challenges in Appearing in Proceedings of the 24 th International Conference Web search engines (Henzinger et al., 2003). Note that the on Machine Learning, Corvallis, OR, 2007. Copyright 2007 by cluster structure of the relation data in Figure 1 cannot be the author(s)/owner(s). correctly identified by graph partitioning approaches, since Relational Clustering by Symmetric Convex Coding they look for only dense clusters of strongly related objects 2 3 5 6 by cutting the given graph into subgraphs; similarly, the 1 4 7 8 pure bi-partite graph models cannot correctly identify this 9 10 11 12 type of cluster structures. Note that re-defining the rela- tions between the objects does not solve the problem in this 13 14 15 16 situation, since there exist both dense and sparse clusters. (a) (b) If the relational data are dissimilarity-based, to apply graph Figure 1. The graph (a) and relation matrix (b) of the relational partitioning approaches to them, we need extra efforts on data with different types of clusters. In (b), the dark color denotes appropriately transforming them into similarity-based data 1 and the light color denotes 0. and ensuring that the transformation does not change the cluster structures in the data. Hence, it is desirable for Leland, 1995; Karypis & Kumar, 1998). an algorithm to be able to identify the cluster structures no matter which type of relational data is given. This is Recently, graph partitioning with an edge cut objective even more desirable in the situation where the background has been shown to be mathematically equivalent to an knowledge about the meaning of the relations is not avail- appropriate weighted kernel k-means objective function able, i.e., we are given only a relation matrix and do not (Dhillon et al., 2004; Dhillon et al., 2005). Based on this know if the relations are similarities or dissimilarities. equivalence, the weighted kernel k-means algorithm has been proposed for graph partitioning (Dhillon et al., 2004; In this paper, we propose a general model for relational Dhillon et al., 2005). Yu et al. (2005) propose the graph- clustering based on symmetric convex coding of the re- factorization clustering for the graph partitioning, which lation matrix. The proposed model is applicable to the seeks to construct a bipartite graph to approximate a given general relational data consisting of only pairwise relations graph. Nasraoui et al. (1999) propose the relational fuzzy typically without other knowledge; it is capable of learning maximal density estimator algorithm. both dense and sparse clusters at the same time; it unifies the existing graph partition models to provide a generalized In this paper, our focus is on the homogeneous relational theoretical foundation for relational clustering. Under this data, i.e., the objects in the data are of the same type. There model, we derive iterative bound optimization algorithms are some efforts in the literature that can be considered to solve the symmetric convex coding for two important as clustering heterogeneous relational data, i.e., different distance functions, Euclidean distance and generalized I- types of objects are related to each other. For example, co- divergence. The algorithms are applicable to general rela- clustering addresses clustering two types of related objects, tional data and at the same time they can be easily adapted such as documents and words, at the same time. Dhillon to learn a specific type of cluster structure. For example, et al. (2003) propose a co-clustering algorithm to maximize when applied to learning only dense clusters, they provide the mutual information. A more generalized co-clustering new efficient algorithms for graph partitioning. The con- framework is presented by Banerjee et al. (2004) wherein vergence of the algorithms is theoretically guaranteed. Ex- any Bregman divergence can be used in the objective func- perimental evaluation and theoretical analysis show the ef- tion. Long et al. (2005), Li (2005) and Ding et al. (2006) fectiveness and great potential of the proposed model and all model the co-clustering as an optimization problem in- algorithms. volving a triple matrix factorization. 2. Related Work 3. Symmetric Convex Coding Graph partitioning (or clustering) is a popular formulation In this section, we propose a general model for relational of relational clustering, which divides the nodes of a graph clustering. Let us first consider the relational data in Fig- into clusters by finding the best edge cuts of the graph. ure 1. An interesting observation is that although the dif- Several edge cut objectives, such as the average cut (Chan ferent types of clusters look so different in the graph from et al., 1993), average association (Shi & Malik, 2000), nor- Figure 1(a), they all demonstrate block patterns in the re- malized cut (Shi & Malik, 2000), and min-max cut (Ding lation matrix of Figure 1(b) (without loss of generality, we et al., 2001), have been proposed. Various spectral algo- arrange the objects from the same cluster together to make rithms have been developed for these objective functions the block patterns explicit). Motivated by this observation, (Chan et al., 1993; Shi & Malik, 2000; Ding et al., 2001). we propose the Symmetric Convex Coding (SCC) model These algorithms use the eigenvectors of a graph affinity to cluster relational data by learning the block pattern of matrix, or a matrix derived from the affinity matrix, to par- a relation matrix. Since in most applications, the relations tition the graph. are of non-negative values and undirected, relational data can be represented as non-negative, symmetric matrices. Multilevel methods have been used extensively for graph Therefore, the definition of the SCC is given as follows. partitioning with the Kernighan-Lin objective, which at- tempt to minimize the cut in the graph while maintaining Definition 3.1. Given a symmetric matrix A 2 R+, a dis- equal-sized clusters (Bui & Jones, 1993; Hendrickson & tance function D and a positive number k, the symmetric Relational Clustering by Symmetric Convex Coding convex coding is given by the minimization, Theorem 3.2 states that with the prototype matrix B re- stricted to be of the form rIk, SCC under Euclidean dis- min D(A; CBCT ): (1) tance is reduced to the trace maximization in (3).

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    8 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us