The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)

CoDiNMF: Co-Clustering of Directed Graphs via NMF

Woosang Lim, Rundong Du, Haesun Park
School of Computational Science and Engineering
Georgia Institute of Technology, Atlanta, GA 30332, USA
[email protected], [email protected], [email protected]

Abstract

Co-clustering computes clusters of data items and the related features concurrently, and it has been used in many applications such as community detection, product recommendation, computer vision, and pricing optimization. In this paper, we propose a new co-clustering method, called CoDiNMF, which improves the clustering quality and finds directional patterns among co-clusters by using multiple directed and undirected graphs. We design the objective function of co-clustering by using the min-cut criterion combined with an additional term which controls the sum of net directional flow between different co-clusters. In addition, we show that a variant of Nonnegative Matrix Factorization (NMF) can solve the proposed objective function effectively. We run experiments on the US patents and BlogCatalog data sets, whose ground truth is known, and show that CoDiNMF improves clustering results compared to other co-clustering methods in terms of average F1 score, Rand index, and adjusted Rand index (ARI). Finally, we compare CoDiNMF and other co-clustering methods on the Wikipedia data set of philosophers, where CoDiNMF finds meaningful directional flows of influence among co-clusters of philosophers.

Introduction

Clustering is an essential task in unsupervised learning which finds the inherent relationships and group structures in a data set. In particular, co-clustering simultaneously computes co-clusters which consist of both features and data items (or higher-order entities) by exploiting the dualities between them. Since co-clustering is effective in analyzing dyadic or higher-order relationships compared to traditional one-way clustering, it has many applications such as text mining (Dhillon 2001; Dhillon, Mallela, and Modha 2003), bioinformatics (Cho et al. 2004), product recommendation (Vlachos et al. 2014), pricing optimization (Zhu, Yang, and He 2015), sense discovery (Chen et al. 2015), and community detection (Rohe, Qin, and Yu 2016).

Dhillon (Dhillon 2001) suggested Spectral Co-clustering (SCo), which computes co-clusters by solving the min-cut problem of a bipartite graph with documents and words as its two parts. Since then, co-clustering has been studied in different ways over the last decades: information-theoretic co-clustering (Dhillon, Mallela, and Modha 2003), multi-view co-clustering (Sun et al. 2015), co-clustering based on spectral approaches (Wu, Benson, and Gleich 2016; Rohe, Qin, and Yu 2016), and co-clustering based on NMF (Long, Zhang, and Yu 2005; Wang et al. 2011). However, most co-clustering methods assume that the connections between entities are symmetric or undirected, whereas many interactions in real networks are asymmetric or directed. For example, a data set of patents and words contains a citation network among patents, which can be represented as a directed and asymmetric graph. In the development of philosophy, influence flows from one philosopher or school of philosophy to other philosophers or schools, and these relationships can be directed and asymmetric.

Rohe et al. recently proposed a spectral co-clustering method for directed networks, called DI-SIM, which uses the asymmetric regularized graph Laplacian of a directed graph (Rohe, Qin, and Yu 2016). DI-SIM computes the left and right singular vectors of the regularized graph Laplacian to generate two lower-dimensional representations, one for sending nodes and one for receiving nodes. Then, by using an asymmetry score, it discovers the asymmetries in the relationships and describes the directional communities. However, DI-SIM can only be applied to data sets which consist of one kind of entity distinguished by sending and receiving roles.

In this paper, we propose a new NMF-based co-clustering method, called CoDiNMF, which computes co-clusters by using both directed and undirected relationships to improve co-clustering quality.
Especially, CoDiNMF is able to find directional relationships among co-clusters by controlling the net directional flow between different co-clusters. In the later sections, we derive CoDiNMF and compare it with other NMF-based co-clustering methods in terms of accuracy on the US patents and BlogCatalog data sets. We also demonstrate its ability to find directional relationships on the Wikipedia data set of philosophers.

Copyright (c) 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Related Work

In this section, we briefly discuss some of the existing co-clustering methods in the literature, including spectral co-clustering and NMF-based co-clustering.

Spectral Co-clustering

Let X be the m × n data matrix for the set of features A = {a_1, ..., a_m} and the set of data items B = {b_1, ..., b_n}. For a document data set, A is the set of words, B is the set of documents, and X is the weight matrix between A and B. Dhillon suggested Spectral Co-clustering (SCo), which constructs a graph Laplacian from the bipartite graph represented by the edge weight matrix

    M = [ 0   X ]
        [ X^T 0 ],

computes a low-dimensional embedding, and runs k-means clustering on the computed embedding to find k co-clusters (Dhillon 2001). More specifically, the objective function of co-clustering for bipartition is designed to solve the min-cut problem defined with a partition vector q and the weight matrix M, where q_i = sqrt(η_2/η_1) if the i-th row/column of M belongs to the first cluster and q_i = −sqrt(η_1/η_2) if the i-th row/column of M belongs to the second cluster, η_i is the sum of the edge weights of the nodes in the i-th cluster, and the plus or minus sign of q_i plays an important role in assigning nodes to clusters. Since solving the proposed objective function exactly is NP-complete, SCo was suggested. It computes the second left and right singular vectors of the normalized weight matrix X_n = D_1^{−1/2} X D_2^{−1/2}, and constructs a vector y ∈ R^{m+n} as an approximation of q, where D_1 and D_2 are diagonal matrices such that D_1(i,i) = Σ_j X_{i,j} and D_2(i,i) = Σ_j X_{j,i}. Finally, it runs the k-means algorithm on the 1-dimensional y to compute co-clusters. For multipartitioning, it uses ℓ = ⌈log_2 k⌉ singular vectors of X_n to construct a matrix Y ∈ R^{(m+n)×ℓ} and runs k-means on Y.

Wu et al. extended the spectral co-clustering method to tensor data sets and proposed a method called General Tensor Spectral Co-clustering (GTSC) (Wu, Benson, and Gleich 2016). First, GTSC extends the original rectangular data tensor to a square and symmetric extended tensor. Next, it constructs a Markov chain by using the transition matrix and the super-spacey stationary vector, and computes the second leading real-valued eigenvector of the Markov chain. At the final step, it recursively applies sweep cuts in a recursive bisection procedure for clustering. Although GTSC extended spectral co-clustering, it still uses undirected and symmetric connections between entities.

Rohe et al. recently proposed a new spectral co-clustering method for directed graphs, DI-SIM (Rohe, Qin, and Yu 2016). Although DI-SIM discovers clustering asymmetries on some data sets, it can be applied only to the special case of a data set which represents sending and receiving relationships among nodes of the same kind of entities.

NMF-based Co-clustering

Non-negative Matrix Factorization (NMF) and its variants have been used for clustering (Xu, Liu, and Gong 2003; Kim and Park 2011; Kuang, Yun, and Park 2015; Kuang and Park 2013; Du et al. 2017a), and NMF has the advantage of providing an intuitive interpretation of the result. Non-negative Block Value Decomposition (NBVD) was proposed to analyze the latent block structure of a non-negative data matrix based on Nonnegative Matrix Tri-Factorization (NMTF) (Long, Zhang, and Yu 2005), and its objective function is

    minimize_{W,T,H} ‖X − WTH‖_F^2  subject to  W, T, H ≥ 0,    (1)

where X is the data matrix defined in the previous section, W is m × c, T is c × l, and H is l × n. T is called the block value matrix, which can be considered a compact representation of X. W and H are coefficient matrices for row and column clusters, respectively. To solve Eqn (1), they derived an algorithm that iteratively updates the decomposition by using a set of multiplicative updating rules.

Wang et al. proposed Locality Preserved Fast Nonnegative Matrix Tri-Factorization (LP-FNMTF) for co-clustering, which uses two undirected graphs, G_A of features and G_B of data points, in addition to the bipartite graph of X (Wang et al. 2011), where the undirected graphs G_A and G_B can be either generated from X or obtained from prior knowledge. Thus, the objective function of LP-FNMTF in Eqn (2) has two terms in addition to Eqn (1) to account for the manifold structures in both the data and feature spaces, and the constraints in Eqn (1) and Eqn (2) are different:

    minimize_{W,S,H} ‖X − WSH‖_F^2 + α tr(W^T (I − L_A) W) + β tr(H (I − L_B) H^T)
    subject to  W ∈ Ψ^{m×d}, H ∈ Ψ^{c×n},    (2)

where L_A and L_B are normalized graph Laplacians s.t. L_A = D_A^{−1/2} G_A D_A^{−1/2} and L_B = D_B^{−1/2} G_B D_B^{−1/2}, and Ψ is a cluster indicator matrix.