A Fast Kernel-based Multilevel Algorithm for Graph Clustering

Inderjit Dhillon, Yuqiang Guan, and Brian Kulis
Data Mining Lab, Department of Computer Science, University of Texas at Austin, Austin, TX 78712 USA
{inderjit, yguan, kulis}@cs.utexas.edu

ABSTRACT

Graph clustering (also called graph partitioning), the clustering of the nodes of a graph, is an important problem in diverse data mining applications. Traditional approaches involve optimization of graph clustering objectives such as normalized cut or ratio association; spectral methods are widely used for these objectives, but they require eigenvector computation, which can be slow. Recently, graph clustering with a general cut objective has been shown to be mathematically equivalent to an appropriate weighted kernel k-means objective function. In this paper, we exploit this equivalence to develop a very fast multilevel algorithm for graph clustering. Multilevel approaches involve coarsening, initial partitioning and refinement phases, all of which may be specialized to different graph clustering objectives. Unlike existing multilevel clustering approaches, such as METIS, our algorithm does not constrain the cluster sizes to be nearly equal. Our approach gives a theoretical guarantee that the refinement step decreases the graph cut objective under consideration. Experiments show that we achieve better final objective function values as compared to a state-of-the-art spectral clustering algorithm: on a series of benchmark test graphs with up to thirty thousand nodes and one million edges, our algorithm achieves lower normalized cut values in 67% of our experiments and higher ratio association values in 100% of our experiments. Furthermore, on large graphs, our algorithm is significantly faster than spectral methods. Finally, our algorithm requires far less memory than spectral methods; we cluster a 1.2 million node movie network into 5000 clusters, which due to memory requirements cannot be done directly with spectral methods.

1. Introduction

What is graph clustering and what algorithms were used?

• Graph clustering, clustering the nodes of a graph, is an important problem in many domains such as circuit partitioning, VLSI design, task scheduling, bioinformatics, social network analysis, etc.

• Various spectral algorithms have been developed for a number of different graph clustering objective functions, such as normalized cut [4] and ratio cut [1].

• Multilevel methods, such as Metis, try to minimize edge cut but explicitly restrict clusters to be of equal size.

The theoretical foundation for this paper

• Recently we showed that a wide class of graph clustering objectives, including ratio cut, ratio association, the Kernighan-Lin objective and normalized cut, can all be viewed as special cases of the weighted kernel k-means objective function [2, 3].

What is the main contribution in this paper?

• In this paper, we develop a multilevel graph clustering algorithm that uses weighted kernel k-means as the refinement. The algorithm removes the restriction that clusters be of equal size, a restriction all previous multilevel algorithms such as Metis have.

• We show that our algorithm is significantly faster than spectral clustering methods. On our benchmark test graphs of varying sizes, the final objective function values of the multilevel algorithm are better than those of the spectral algorithm in 100% of runs for ratio association, and 67% of runs for normalized cut.

• We cluster the 1.2 million node IMDB movie network into 5000 clusters. Running a spectral algorithm directly on this data set would require storage of 5000 eigenvectors, which is prohibitive.

2. Graph Clustering

Given graph $G = (\mathcal{V}, \mathcal{E}, A)$, consisting of a set of vertices $\mathcal{V}$, a set of edges $\mathcal{E}$ and an affinity matrix $A$ whose entry $A_{ij}$ represents similarity between each pair of nodes, the goal of graph clustering is to partition the nodes such that between-cluster connection is weak and within-cluster connection is strong.

Denote $\mathrm{links}(\mathcal{A}, \mathcal{B}) = \sum_{i \in \mathcal{A}, j \in \mathcal{B}} A_{ij}$ and $\mathrm{degree}(\mathcal{A}) = \mathrm{links}(\mathcal{A}, \mathcal{V})$. Here are some prominent graph clustering objectives.

• Ratio association:

\[ \mathrm{RAssoc}(G) = \underset{\mathcal{V}_1, \ldots, \mathcal{V}_k}{\mathrm{maximize}} \; \sum_{c=1}^{k} \frac{\mathrm{links}(\mathcal{V}_c, \mathcal{V}_c)}{|\mathcal{V}_c|} \]

• Ratio cut:

\[ \mathrm{RCut}(G) = \underset{\mathcal{V}_1, \ldots, \mathcal{V}_k}{\mathrm{minimize}} \; \sum_{c=1}^{k} \frac{\mathrm{links}(\mathcal{V}_c, \mathcal{V} \setminus \mathcal{V}_c)}{|\mathcal{V}_c|} \]

• Normalized cut:

\[ \mathrm{NCut}(G) = \underset{\mathcal{V}_1, \ldots, \mathcal{V}_k}{\mathrm{minimize}} \; \sum_{c=1}^{k} \frac{\mathrm{links}(\mathcal{V}_c, \mathcal{V} \setminus \mathcal{V}_c)}{\mathrm{degree}(\mathcal{V}_c)} \]

• Normalized association:

\[ \mathrm{NAssoc}(G) = k - \mathrm{NCut}(G) \]

• Kernighan-Lin objective:

\[ \mathrm{KLObj}(G) = \underset{\mathcal{V}_1, \ldots, \mathcal{V}_k}{\mathrm{minimize}} \; \sum_{c=1}^{k} \frac{\mathrm{links}(\mathcal{V}_c, \mathcal{V} \setminus \mathcal{V}_c)}{|\mathcal{V}_c|} \quad \text{subject to } |\mathcal{V}_c| = |\mathcal{V}|/k \;\; \forall c = 1, \ldots, k \]

The most common approach to optimizing the first 4 objectives is a spectral algorithm, where eigenvectors of an appropriate matrix are computed and then a postprocessing step is taken to derive the clustering from the eigenvectors [1, 4, 5]. The last objective has been used in multilevel algorithms.

3. Multilevel Approach

What is the difference between our algorithm and previous approaches?

It works for a wide class of graph clustering objectives, and all three phases of the algorithm can be specialized for each graph clustering objective.

3.1 Coarsening Phase

Given initial graph $G_0$, we repeatedly transform it into smaller and smaller graphs $G_1, G_2, \ldots$ by combining a subset of nodes in $G_i$ into "supernodes" in $G_{i+1}$. Specifically,

• For node $x$, we merge it with its neighbor $y$ if $y$ maximizes

\[ \frac{e(x, y)}{w(x)} + \frac{e(x, y)}{w(y)}, \]

where $e(x, y)$ corresponds to the edge weight between vertices $x$ and $y$, and $w(x)$ is the weight of vertex $x$.

• The edge weights between two supernodes in $G_{i+1}$ are taken to be the sum of the edge weights between the nodes in $G_i$ comprising the supernodes.

3.2 Initial Clustering Phase

• Eventually, the graph is coarsened to the point where very few nodes remain in the graph.

• We use the spectral algorithm of Yu and Shi [5], which we generalize to work with arbitrary weights [3]. Thus our initial clustering is "customized" for different graph clustering objectives.

• Since the coarsest graph is significantly smaller than the input graph, spectral methods are adequate in terms of speed.

3.3 Refinement Phase

• We extend the clustering from $G_i$ to $G_{i-1}$ as follows: if a supernode is in a cluster $c$, then all nodes in $G_{i-1}$ formed from that supernode are in cluster $c$. This yields an initial clustering for the graph $G_{i-1}$, which is then improved using a refinement algorithm. This process is repeated until the refinement algorithm is run on the original graph $G_0$.

• The refinement algorithm we use is weighted kernel k-means. In [3], we proved that each of the graph clustering objectives given earlier may be expressed as a special case of the weighted kernel k-means objective with the correct choice of weights and kernel matrix. Hence, the weighted kernel k-means algorithm can be directly used to locally optimize these graph clustering objectives.

Weighted Kernel kmeans($K$, $k$, $w$, $t_{max}$, $\{\pi_c^{(0)}\}_{c=1}^{k}$, $\{\pi_c\}_{c=1}^{k}$)

Input: $K$: kernel matrix, $k$: number of clusters, $w$: weights for each point, $t_{max}$: optional maximum number of iterations, $\{\pi_c^{(0)}\}_{c=1}^{k}$: optional initial clustering

Output: $\{\pi_c\}_{c=1}^{k}$: final clustering of the points

1. If no initial clustering is given, initialize the $k$ clusters $\pi_1^{(0)}, \ldots, \pi_k^{(0)}$ randomly. Set $t = 0$.

2. For each row $i$ of $K$ and every cluster $c$, compute

\[ d(i, m_c) = K_{ii} - \frac{2 \sum_{j \in \pi_c^{(t)}} w_j K_{ij}}{\sum_{j \in \pi_c^{(t)}} w_j} + \frac{\sum_{j, l \in \pi_c^{(t)}} w_j w_l K_{jl}}{\left( \sum_{j \in \pi_c^{(t)}} w_j \right)^2}. \]

3. Find $c^*(i) = \mathrm{argmin}_c \, d(i, m_c)$, resolving ties arbitrarily. Compute the updated clusters as

\[ \pi_c^{(t+1)} = \{ i : c^*(i) = c \}. \]

4. If not converged and $t < t_{max}$, set $t = t + 1$ and go to Step 2; otherwise, stop and output final clusters $\{\pi_c^{(t+1)}\}_{c=1}^{k}$.

4. Experiments

4.1 Clustering of Benchmark graphs

• The graphs

Graph name   #nodes   #edges    Description
DATA           2851    15093    finite element mesh
3ELT           4720    13722    finite element mesh
UK             4824     6837    finite element mesh
ADD32          4960     9462    32-bit adder
WHITAKER3      9800    28989    finite element mesh
CRACK         10240    30380    finite element mesh
FE_4ELT2      11143    32818    finite element mesh
MEMPLUS       17758    54196    memory circuit
BCSSTK30      28294  1007284    stiffness matrix

• Quality and computation time of our multilevel methods compared with the benchmark spectral algorithm.

[Figure: four bar charts for 64 clusters over the graphs data, 3elt, uk, add32, whitaker3, crack, fe_4elt2, memplus and bcsstk30, each comparing spectral, mlkkm(0) and mlkkm(20). Top row: ratio association value (scaled by values of mlkkm(20)) and computation time in seconds. Bottom row: normalized cut value (scaled by values of mlkkm(20)) and computation time in seconds.]

4.2 The IMDB Movie Data Set: A Case Study

The IMDB contains information about movies, actors and movie events. By connecting actors and movies or events in which they participate, we form a sparse undirected graph with approximately 1.2 million nodes and 7.6 million edges.

• Normalized cut values (NCV) and ratio association values (RAV) for obtaining 5000 clusters of the IMDB movie data set.

       mlkkm(0)    kkm    METIS
NCV        2308   4788     2643
RAV       18526   1349    12744

• A Selection of Movies and Actors in Cluster 633

Movies
Harry Potter and the Sorcerer's Stone (2001)
Harry Potter and the Chamber of Secrets (2002)
Harry Potter and the Prisoner of Azkaban (2004)
Harry Potter and the Goblet of Fire (2005)
Harry Potter: Behind the Magic (2001 TV)
Harry Potter und die Kammer des Schreckens: Das grobe RTL Special zum Film (2002 TV)

Actors
Daniel Radcliffe, Rupert Grint, Emma Watson, Tom Felton
Peter Best, Sean Biggerstaff, Scott Fern, ,
Harry Melling, , , Robert Pattinson
, , Edward Randell, Jamie Waylett
Shefali Chowdhury, , Bonnie Wright
Jamie Yeates, Chris Rankin, Joshua Herdman, Stanislav Ianevski

References

[1] P.K. Chan, Martine Schlag, and J.Y. Zien. Spectral k-way ratio cut partitioning. IEEE Trans. CAD-Integrated Circuits and Systems, 13:1088–1096, 1994.

[2] Inderjit Dhillon, Yuqiang Guan, and Brian Kulis. Kernel k-means, spectral clustering and normalized cuts. In Proc. 10th ACM KDD Conference, 2004.

[3] Inderjit Dhillon, Yuqiang Guan, and Brian Kulis. A unified view of kernel k-means, spectral clustering and graph cuts. Technical Report TR-04-25, University of Texas at Austin, 2004.

[4] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence, 22(8):888–905, August 2000.

[5] S. X. Yu and J. Shi. Multiclass spectral clustering. In International Conference on Computer Vision, 2003.
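To make the objectives of Section 2 concrete, the following is a minimal sketch that evaluates ratio association, ratio cut, normalized cut and normalized association for one fixed partition. It only scores a given clustering and does not optimize anything. The names `links`, `graph_objectives` and the toy graph are illustrative assumptions, and the sketch assumes a dense symmetric affinity matrix with no empty or zero-degree clusters.

```python
import numpy as np

def links(A, rows, cols):
    """links(A, B): sum of affinities A[i, j] over i in rows and j in cols."""
    return float(A[np.ix_(rows, cols)].sum())

def graph_objectives(A, labels, k):
    """Score one partition (labels[i] in 0..k-1) under four objectives.

    Assumption: every cluster is non-empty and has positive degree.
    """
    labels = np.asarray(labels)
    everyone = np.arange(A.shape[0])
    out = {"ratio_association": 0.0, "ratio_cut": 0.0,
           "normalized_cut": 0.0, "normalized_association": 0.0}
    for c in range(k):
        inside = np.flatnonzero(labels == c)
        outside = np.flatnonzero(labels != c)
        within = links(A, inside, inside)      # links(V_c, V_c)
        cut = links(A, inside, outside)        # links(V_c, V \ V_c)
        degree = links(A, inside, everyone)    # degree(V_c)
        out["ratio_association"] += within / inside.size
        out["ratio_cut"] += cut / inside.size
        out["normalized_cut"] += cut / degree
        out["normalized_association"] += within / degree
    return out

# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
scores = graph_objectives(A, [0, 0, 0, 1, 1, 1], k=2)
print(scores)
```

Because degree(V_c) = links(V_c, V_c) + links(V_c, V \ V_c), the normalized association and normalized cut of any partition into k clusters sum to k, which is why the two objectives are interchangeable.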
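The weighted kernel k-means procedure used in the refinement phase (Section 3.3) can be sketched as follows. This is a small dense-matrix illustration, not the authors' implementation. The function name, the `seed` parameter, the handling of empty clusters and the toy graph are assumptions. The kernel used in the example, K = A + sigma * I with unit weights, is my reading of the ratio association construction in [3]; the poster itself does not spell out the kernel choice.

```python
import numpy as np

def weighted_kernel_kmeans(K, k, w, init=None, t_max=100, seed=0):
    """Batch weighted kernel k-means on a dense kernel matrix K.

    Returns an array of cluster labels in 0..k-1.
    Assumption: a cluster that becomes empty is simply never chosen again.
    """
    n = K.shape[0]
    if init is None:
        labels = np.random.default_rng(seed).integers(0, k, size=n)
    else:
        labels = np.asarray(init).copy()
    diag = np.diag(K)
    for _ in range(t_max):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue                      # empty cluster keeps distance +inf
            wc = w[members]
            total = wc.sum()
            # Second term: 2 * sum_j w_j K_ij / sum_j w_j, for every row i.
            second = 2.0 * (K[:, members] @ wc) / total
            # Third term: sum_{j,l} w_j w_l K_jl / (sum_j w_j)^2, one scalar.
            third = wc @ K[np.ix_(members, members)] @ wc / total ** 2
            dist[:, c] = diag - second + third
        new_labels = dist.argmin(axis=1)      # ties go to the lowest index
        if np.array_equal(new_labels, labels):
            break                             # converged
        labels = new_labels
    return labels

# Toy graph (hypothetical): two triangles joined by the edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
K = A + 1.0 * np.eye(6)                       # assumed ratio association kernel
start = [0, 0, 1, 1, 1, 1]                    # node 2 starts in the wrong cluster
print(weighted_kernel_kmeans(K, 2, np.ones(6), init=start))
```

In the multilevel setting, `init` would be the clustering projected down from the coarser graph, so only a few refinement iterations are typically needed at each level.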
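One coarsening pass from Section 3.1 can be sketched as below. The merge criterion and the summing of edge weights follow the text. Several details are assumptions that the poster does not state: nodes are visited in index order, a node already merged in this pass is skipped, an unmatched node becomes its own supernode, and a supernode's weight is the sum of its members' weights. The function name and the toy graph are illustrative.

```python
import numpy as np

def coarsen_once(A, w):
    """One coarsening pass on a dense symmetric affinity matrix A.

    Returns (A_coarse, w_coarse, parent), where parent[x] is the index of
    the supernode that node x was merged into.
    """
    n = A.shape[0]
    parent = np.full(n, -1)
    next_id = 0
    for x in range(n):                        # assumed visiting order: by index
        if parent[x] != -1:
            continue                          # already merged in this pass
        best, best_score = -1, 0.0
        for y in np.flatnonzero(A[x]):
            if y == x or parent[y] != -1:
                continue
            # Merge criterion from the text: e(x,y)/w(x) + e(x,y)/w(y).
            score = A[x, y] / w[x] + A[x, y] / w[y]
            if score > best_score:
                best, best_score = y, score
        parent[x] = next_id
        if best != -1:
            parent[best] = next_id
        next_id += 1
    # Membership matrix P: P[x, s] = 1 if node x belongs to supernode s.
    P = np.zeros((n, next_id))
    P[np.arange(n), parent] = 1.0
    A_coarse = P.T @ A @ P                    # sums edge weights between members
    w_coarse = P.T @ w                        # assumed: supernode weight = sum
    return A_coarse, w_coarse, parent

# Toy graph (hypothetical): a 4-cycle with edge weights 1, 3, 2, 1.
A = np.zeros((4, 4))
for i, j, e in [(0, 1, 1.0), (0, 2, 3.0), (1, 3, 2.0), (2, 3, 1.0)]:
    A[i, j] = A[j, i] = e
A1, w1, parent = coarsen_once(A, np.ones(4))
print(parent, A1, w1, sep="\n")
```

The diagonal of `A_coarse` holds the weight of edges that ended up inside a supernode (each counted twice for a symmetric matrix). Whether to keep or discard that diagonal depends on the objective being optimized, so the sketch leaves it in place.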