Machine Learning Approaches to Link-Based Clustering

Zhongfei (Mark) Zhang†, Bo Long‡, Zhen Guo†, Tianbing Xu†, and Philip S. Yu♮

† Zhang, Guo, and Xu: Computer Science Department, SUNY Binghamton, e-mail: {zhongfei,zguo,txu}@cs.binghamton.edu
‡ Long: Yahoo! Labs, Yahoo! Inc.
♮ Yu: Dept. of Computer Science, Univ. of Illinois at Chicago

Abstract We have reviewed several state-of-the-art machine learning approaches to different types of link-based clustering in this chapter. Specifically, we have presented spectral clustering for heterogeneous relational data, symmetric convex coding for homogeneous relational data, the citation model for clustering a special but popular kind of homogeneous relational data, namely textual documents with citations, the probabilistic clustering framework on mixed membership for general relational data, and the statistical graphical model for dynamic relational clustering. We have demonstrated the effectiveness of these machine learning approaches through empirical evaluations.

1 Introduction

Link information plays an important role in discovering knowledge from data. For link-based clustering, machine learning approaches offer key strengths for developing effective solutions. In this chapter, we review several specific machine learning approaches to link-based clustering. We do not claim that these approaches are exhaustive. Instead, our intention is to use these exemplar approaches to showcase the power of machine learning techniques in solving the general link-based clustering problem.

When we say link-based clustering, we mean the clustering of relational data. In other words, links are the relations among the data items or objects. Consequently, in the rest of this chapter, we use the terms link-based clustering and relational clustering interchangeably.