Pattern Recognition Letters 25 (2004) 641–654
www.elsevier.com/locate/patrec

A clustering method based on boosting

D. Frossyniotis a,*, A. Likas b, A. Stafylopatis a

a School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou Str., Zographou 15773, Athens, Greece
b Department of Computer Science, University of Ioannina, 451 10 Ioannina, Greece

* Corresponding author. Tel.: +30-210-7722508; fax: +30-210-7722109. E-mail addresses: [email protected] (D. Frossyniotis), [email protected] (A. Likas), [email protected] (A. Stafylopatis).

Received 16 July 2003; received in revised form 1 December 2003

Abstract

It is widely recognized that the boosting methodology provides superior results for classification problems. In this paper, we propose the boost-clustering algorithm which constitutes a novel clustering methodology that exploits the general principles of boosting in order to provide a consistent partitioning of a dataset. The boost-clustering algorithm is a multi-clustering method. At each boosting iteration, a new training set is created using weighted random sampling from the original dataset and a simple clustering algorithm (e.g. k-means) is applied to provide a new data partitioning. The final clustering solution is produced by aggregating the multiple clustering results through weighted voting. Experiments on both artificial and real-world data sets indicate that boost-clustering provides solutions of improved quality.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Ensemble clustering; Partitioning schemes

1. Introduction

Unlike classification problems, there are no established approaches that combine multiple clusterings. This problem is more difficult than designing a multi-classifier system: in the classification case it is straightforward to decide whether a basic classifier (weak learner) performs well with respect to a training point, while in the clustering case this task is difficult since there is a lack of knowledge concerning the label of the cluster to which a training point actually belongs.

The majority of clustering algorithms are based on the following four most popular clustering approaches: iterative square-error partitional clustering, hierarchical clustering, grid-based clustering and density-based clustering (Halkidi et al., 2001; Jain et al., 2000).

Partitional methods can be further classified into hard clustering methods, whereby each sample is assigned to one and only one cluster, and soft clustering methods, whereby each sample can be associated (in some sense) with several clusters. The most commonly used partitional clustering algorithm is k-means, which is based on the square-error criterion. This algorithm is computationally efficient and yields good results if the clusters are compact, hyper-spherical in shape and well separated in the feature space. Numerous attempts have been made to improve the performance of the simple k-means algorithm, e.g. by using the Mahalanobis distance to detect hyper-ellipsoidal clusters (Bezdek and Pal, 1992) or by incorporating a fuzzy criterion function, resulting in the fuzzy c-means algorithm (Bezdek, 1981). A different partitional clustering approach is based on probability density function (pdf) estimation using Gaussian mixtures. The specification of the parameters of the mixture is based on the expectation–maximization (EM) algorithm (Dempster et al., 1977). A recently proposed greedy-EM algorithm (Vlassis and Likas, 2002) is an incremental scheme that has been found to provide better results than the conventional EM algorithm.

Hierarchical clustering methods organize data in a nested sequence of groups which can be displayed in the form of a dendrogram or a tree (Boundaillier and Hebrail, 1998). These methods can be either agglomerative or divisive. An agglomerative hierarchical method places each sample in its own cluster and gradually merges these clusters into larger clusters until all samples are ultimately in a single cluster (the root node). A divisive hierarchical method starts with a single cluster containing all the data and recursively splits parent clusters into daughters.

Grid-based clustering algorithms are mainly proposed for spatial data mining. Their main characteristic is that they quantise the space into a finite number of cells and then perform all operations on the quantised space. On the other hand, density-based clustering algorithms adopt the key idea of grouping neighbouring objects of a data set into clusters based on density conditions.

However, many of the above clustering methods require additional user-specified parameters, such as the optimal number and shapes of clusters, similarity thresholds and stopping criteria. Moreover, different clustering algorithms, and even multiple replications of the same algorithm, result in different solutions due to random initializations, so there is no clear indication of the best partition result.

In (Frossyniotis et al., 2002) a multi-clustering fusion method is presented, based on combining the results from several runs of a clustering algorithm in order to specify a common partition. Another multi-clustering approach is introduced by Fred (2001), where multiple clusterings (using k-means) are exploited to determine a co-association matrix of patterns, which is used to define an appropriate similarity measure that is subsequently used to extract arbitrarily shaped clusters. Model structure selection is sometimes left as a design parameter, while in other cases the selection of the optimal number of clusters is incorporated in the clustering procedure (Smyth, 1996; Fisher, 1987), using either local or global cluster validity criteria (Halkidi et al., 2001).

For classification or regression problems, it has been analytically shown that the gains from using ensemble methods involving weak learners are directly related to the amount of diversity among the individual component models. In fact, for difficult data sets, comparative studies across multiple clustering algorithms typically show much more variability in results than studies comparing the results of weak learners for classification. Thus, there could be a potential for greater gains when using an ensemble for the purpose of improving clustering quality.

The present work introduces a novel cluster ensemble approach based on boosting, whereby multiple clusterings are sequentially constructed to deal with data points which were found hard to cluster in previous stages. An initial version of this multiple clustering approach was introduced by Frossyniotis et al. (2003). Further developments are presented here, including several improvements on the method and experimentation with different data sets, exhibiting very promising performance. The key feature of this method relies on the general principle of the boosting classification algorithm (Freund and Schapire, 1996), which proceeds by building weak classifiers using patterns that are increasingly difficult to classify. The very good performance of the boosting method in classification tasks was a motivation to believe that boosting a simple clustering algorithm (weak learner) can lead to a multi-clustering solution with improved performance in terms of robustness and quality of the partitioning. Nevertheless, it must be noted that developing a boosting algorithm for clustering is not a straightforward task, as there exist several issues that must be addressed, such as the evaluation of a consensus clustering, since cluster labels are symbolic and, thus, one must also solve a correspondence problem. The proposed method is general and any type of basic clustering algorithm can be used as the weak learner. An important strength of the proposed method is that it can provide clustering solutions of arbitrary shape although it may employ clustering algorithms that provide spherical clusters.

2. The boost-clustering method

We propose a new iterative multiple clustering approach, which we will call boost-clustering, that iteratively recycles the training examples, providing multiple clusterings and resulting in a common partition. At each iteration, a distribution over the training points is computed and a new training set is constructed using random sampling from the original dataset. Then a basic clustering algorithm is applied to partition the new training set. The final clustering solution is produced by aggregating the obtained partitions using weighted voting, where the weight of each partition is a measure of its quality. The algorithm is summarized below.

Algorithm boost-clustering. Given: an input sequence of N instances (x_1, ..., x_N), x_i ∈ R^d, i = 1, ..., N, a basic clustering algorithm, the number C of clusters to partition the data set and the maximum number of iterations T.

1. Initialize w_i^1 = 1/N for i = 1, ..., N. Set t = 1, ε_max = 0 and ic = 0.
2. Iterate while t ≤ T:
• Produce a bootstrap replicate of the original data set according to the probability w_i^t for every instance i, by resampling with replacement from the original data set.
• Call the basic clustering algorithm to partition the bootstrap training data, obtaining the partition H^t.
• Get the cluster hypothesis H_i^t = (h_{i,1}^t, h_{i,2}^t, ..., h_{i,C}^t) for all i, i = 1, ..., N, where h_{i,j}^t is the membership degree of instance i to cluster j.
• If t > 1, renumber the cluster indexes of H^t according to the highest matching score, given by the fraction of shared instances with the clusters provided by boost-clustering until now, using H_ag^{t-1}.
• Calculate the pseudoloss

  \epsilon_t = \frac{1}{2} \sum_{i=1}^{N} w_i^t \, CQ_i^t   (1)

  where CQ_i^t is a measurement index that is used to evaluate the quality of clustering of an instance x_i for the partition H^t.
• Set β_t = (1 − ε_t)/ε_t.
• Stopping criteria:
  (i) If ε_t > 0.5, then set T = t − 1 and go to step 3.
  (ii) If ε_t < ε_max, then set ic = ic + 1 and, if ic = 3, set T = t and go to step 3; otherwise set ic = 0 and ε_max = ε_t.
• Update the distribution W^t:

  w_i^{t+1} = \frac{w_i^t \, \beta_t^{CQ_i^t}}{Z_t}   (2)

  where Z_t is a normalization constant such that W^{t+1} is a distribution, i.e. \sum_{i=1}^{N} w_i^{t+1} = 1.
• Compute the aggregate cluster hypothesis

  H_{ag}^{t} = \arg\max_{k=1,\dots,C} \left[ \sum_{s=1}^{t} \frac{\log \beta_s}{\sum_{j=1}^{t} \log \beta_j} \, h_{i,k}^{s} \right]   (3)

• t := t + 1
3. Output the number of iterations T and the final cluster hypothesis H^f = H_ag^T.
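As an illustration only, the listing above can be sketched in Python with k-means as the weak learner, soft memberships computed from distances as in Eq. (6) below, and the minmax-CQ index of Eq. (4). The function names (boost_cluster, memberships, relabel) are ours and not from the paper, the choice β_t = (1 − ε_t)/ε_t follows the reconstruction above, and the ic-based stopping counter is simplified to the ε_t > 0.5 test; treat it as a sketch, not the authors' implementation.

```python
# Illustrative sketch only -- not the authors' code.  Assumes numpy and
# scikit-learn; beta_t = (1 - eps_t)/eps_t and the simplified stopping rule
# are assumptions stated in the lead-in above.
import numpy as np
from sklearn.cluster import KMeans

def memberships(X, centers):
    # Soft membership degrees h_{i,j} from Euclidean distances, cf. Eq. (6).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
    return 1.0 / (d * (1.0 / d).sum(axis=1, keepdims=True))

def relabel(h, agg_labels, C):
    # Renumber the new clusters to agree with the current aggregate partition
    # (highest fraction of shared instances), i.e. the correspondence step.
    new_labels = h.argmax(axis=1)
    overlap = np.array([[np.sum((new_labels == a) & (agg_labels == b))
                         for b in range(C)] for a in range(C)], dtype=float)
    perm = np.zeros(C, dtype=int)
    for _ in range(C):
        a, b = np.unravel_index(np.argmax(overlap), overlap.shape)
        perm[a] = b
        overlap[a, :] = -1.0
        overlap[:, b] = -1.0
    out = np.empty_like(h)
    out[:, perm] = h          # column a of h becomes column perm[a]
    return out

def boost_cluster(X, C, T=20, seed=0):
    rng = np.random.default_rng(seed)
    N = len(X)
    w = np.full(N, 1.0 / N)                  # step 1: uniform weights
    votes = np.zeros((N, C))                 # accumulates log(beta_s) * h^s, Eq. (3)
    for t in range(T):
        idx = rng.choice(N, size=N, replace=True, p=w)          # bootstrap replicate
        km = KMeans(n_clusters=C, n_init=10, random_state=seed + t).fit(X[idx])
        h = memberships(X, km.cluster_centers_)                 # hypothesis on all points
        if t > 0:
            h = relabel(h, votes.argmax(axis=1), C)             # cluster correspondence
        cq = 1.0 - h.max(axis=1) + h.min(axis=1)                # minmax-CQ, Eq. (4)
        eps = 0.5 * float(np.dot(w, cq))                        # pseudoloss, Eq. (1)
        if eps >= 0.5 or eps == 0.0:                            # stopping criterion (i)
            break
        beta = (1.0 - eps) / eps
        votes += np.log(beta) * h      # the normalisation in Eq. (3) does not change the argmax
        w = w * beta ** cq                                      # weight update, Eq. (2)
        w = w / w.sum()
    return votes.argmax(axis=1)                                 # hard version of H^f

# Example usage on synthetic data:
# X = np.random.default_rng(1).normal(size=(300, 2))
# labels = boost_cluster(X, C=3)
```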

It is clear that the approach has been developed following the steps of the boosting algorithm for classification. We assume a given set X of N d-dimensional instances x_i, a basic clustering algorithm (weak learner) and the required number of clusters C. The maximum number of iterations T of boost-clustering is considered fixed, although this parameter is of minor importance given the early stopping of the algorithm through the definition of two stopping criteria that we will discuss later on. The clustering obtained at iteration t will be denoted as H^t, while H_ag^t will denote the aggregate partitioning obtained using the clusterings H^s for s = 1, ..., t. Consequently, for the final partitioning H^f produced by the clustering ensemble it holds that H^f = H_ag^T. The basic feature of the method is that, at each iteration t, a weight w_i^t is computed for each instance x_i such that the higher the weight, the more difficult it is for x_i to be clustered. In accordance with the boosting methodology, the weight w_i^t constitutes the probability of including x_i in the training set constructed at iteration t + 1. At the beginning, the weights of all instances are equally initialized, i.e. w_i^1 = 1/N.

At each iteration t = 1, ..., T, first a dataset X^t is constructed by sampling from X using the distribution W^t = {w_i^t} and then a partitioning result H^t is produced using the basic clustering algorithm on the dataset X^t. For each instance x_i, i = 1, ..., N, we get a cluster hypothesis H_i^t = (h_{i,1}^t, h_{i,2}^t, ..., h_{i,C}^t), where h_{i,j}^t denotes the membership degree of instance i to cluster j (we assume that \sum_{j=1}^{C} h_{i,j}^t = 1, for all i). It must be emphasized that, although the basic clustering method may be parametric, the boost-clustering method is non-parametric in the sense that the final partitioning is specified in terms of the membership degrees h_{i,j} and not through the specification of some model parameters (e.g. cluster centers). This fact gives the flexibility to define arbitrarily shaped data partitions and makes necessary the use of non-parametric cluster-validity measures, as described in the next section.

In the above methodology the most critical issue to be addressed is how to evaluate the clustering quality of an instance x_i for the partition H^t. We have defined an index CQ_i^t such that the higher the value of CQ_i^t, the worse the clustering quality of instance x_i. In our implementation, we considered two ways of computing the index CQ.

For the first type, we computed h_{i,good}^t as the maximum membership degree of x_i to a cluster and h_{i,bad}^t as the minimum membership degree to a cluster. The CQ index, that will be referred to as minmax-CQ, is defined as

  CQ_i^t = 1 - h_{i,good}^t + h_{i,bad}^t   (4)

As a second type of CQ index, we propose to use an entropy-based measure which takes high values when the membership degrees h_{i,j}^t of a data point x_i are comparable for all clusters j, i.e. the point x_i has not been well-clustered:

  CQ_i^t = -\sum_{j=1}^{C} h_{i,j}^t \log\left(h_{i,j}^t\right)   (5)

Based on the CQ index, at each iteration t the pseudoloss ε_t is computed using (1). Then, in analogy with the classification case, the weight distribution w_i^{t+1} for the next iteration is computed using (2). Using this formula, we reduce the weight of a well-clustered data point (i.e. one that belongs to a cluster with a high membership degree) and favour the sampling of badly clustered data points. Thus, in analogy with the general principle of the boosting classification algorithm (where specialized classifiers are serially constructed to deal with data points misclassified in previous stages), in each iteration the boost-clustering algorithm clusters data points that were hard to cluster in previous iterations.

A second important issue to be addressed is related to the cluster correspondence problem. This means that, in order to define the partition H^t at iteration t, we have to assign an index l ∈ {1, ..., C} to each of the C clusters, and this indexing must be consistent with that of previous iterations. In particular, we have to decide the one-to-one correspondence between a cluster in the partitioning H^t and a cluster in the partition H_ag^{t-1}. This correspondence is specified by computing the common patterns between a cluster in H^t and the clusters in H_ag^{t-1}. Then, according to the highest matching score, given by the fraction of common samples, the cluster indexes of H^t are renumbered.

In the proposed method, the aggregate clustering result at iteration t is obtained by applying for every instance x_i a weighted voting scheme over the cluster subhypotheses h_{i,k}^s, k = 1, ..., C, s = 1, ..., t, using (3), where, in analogy with the classification case, the weight of a subhypothesis h_{i,k}^s is defined so that a greater weight is assigned to subhypotheses with lower error ε_s. For the early stopping (identification of the optimal value of iterations T) of the algorithm, two different stopping criteria were used. In particular, the algorithm terminates if a learner (basic clustering algorithm) has a pseudoloss ε_t greater than 1/2 (in which case the partitioning result of the last iteration is not taken into account) or if the pseudoloss does not further increase in a number of consecutive iterations (counted by the variable ic). Experimentally we found that three iterations is a good choice.
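As a quick numerical illustration of the two indices (the membership values below are invented for the example, not taken from the paper), a sharply assigned point gets a low CQ under both definitions, while an ambiguous point gets a high CQ:

```python
# Toy illustration of Eqs. (4) and (5); the membership vectors are made up.
import numpy as np

h_sharp = np.array([0.90, 0.07, 0.03])   # confidently clustered point
h_flat  = np.array([0.40, 0.35, 0.25])   # ambiguous point

def minmax_cq(h):                         # Eq. (4): 1 - h_good + h_bad
    return 1.0 - h.max() + h.min()

def entropy_cq(h):                        # Eq. (5): -sum_j h_j log h_j
    return float(-np.sum(h * np.log(h)))

print(minmax_cq(h_sharp), minmax_cq(h_flat))    # ~0.13 vs ~0.85
print(entropy_cq(h_sharp), entropy_cq(h_flat))  # ~0.39 vs ~1.08
```

Either index can be plugged into Eqs. (1) and (2) unchanged; they differ only in how strongly they penalize partial ambiguity.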

3. Experimental results

In studying the boost-clustering method, we considered two types of basic clustering algorithms, namely the k-means and the fuzzy c-means (FCM) algorithm. In our implementation, the membership degree h_{i,j} of every instance x_i to cluster j, for both k-means and fuzzy c-means, is produced based on the Euclidean distance d:

  h_{i,j} = \frac{1}{\sum_{k=1}^{C} \frac{d(x_i, \mu_j)}{d(x_i, \mu_k)}}   (6)

where μ_j ∈ R^d corresponds to a cluster center. The resulting boost-clustering method using the minmax-CQ index (see Eq. (4)) and with k-means as the basic clustering algorithm will be referred to as boost-k-means-minmax and the one with fuzzy c-means as boost-FCM-minmax, respectively. Similarly, the resulting boost-clustering method using the entropy-CQ index (see Eq. (5)) and with k-means as the basic clustering algorithm will be referred to as boost-k-means-entropy and the one with fuzzy c-means as boost-FCM-entropy, respectively. In the experimental study, we compare boost-k-means-minmax (or boost-k-means-entropy) with the simple k-means algorithm and boost-FCM-minmax (or boost-FCM-entropy) with the simple FCM algorithm. To accomplish that, non-parametric cluster-validity measures must be specified, as noted in the previous section.

3.1. Non-parametric cluster-validity measures

Since clustering is an unsupervised process and there is no a priori indication for the actual number of clusters present in a dataset, there is a need for measures evaluating the quality of a partition. In this spirit, numerous cluster-validity measures have been proposed in the literature. Some of the most commonly used measures are the root-mean-square standard deviation (RMSSTD) and R-squared (RS) introduced by Sharma (1996), the intra-over-inter variation quotient, the BD-index introduced by Jain and Dubes (1988) and the SD validity index (Halkidi et al., 2000), which is based on the concepts of average scattering for clusters and total separation between clusters. However, all of these measurements are basically variations on the same theme in that they compare inter-cluster versus intra-cluster variability and tend to favour configurations with bell-shaped, well-separated clusters. In our experiments, we considered two non-parametric indices, isolation and connectivity (Pauwels and Frederix, 1999), which can be used for measuring the validity of arbitrarily shaped clusters.

Isolation is measured by the k-nearest neighbour norm (NN-norm). In particular, for fixed k (whose specific value is not very critical), the k-nearest neighbour norm v_k(x) of an instance x is defined as the fraction of the k nearest neighbours of x that have the same cluster label as x. A measure of the homogeneity of the total clustering is computed by averaging over all N points in the data set:

  IS = \frac{1}{N} \sum_{x} v_k(x)   (7)

In our experiments we used k = 0.01N.

Connectivity relates to the fact that, for any two points in the same cluster, a path should always exist connecting the two points, along which the density of the data remains relatively high. In our implementation, connectivity is quantified as follows: we randomly select K pairs of points (a_i, b_i), i = 1, ..., K, called anchor-points, such that the points of the same pair belong to the same cluster. Then, for each pair (a_i, b_i), we consider the middle point l_i = (a_i + b_i)/2 and compute the local density f(l_i) by convolving the dataset with a unimodal density kernel of width σ:

  f(x) = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{1}{2\pi\sigma^2} \right)^{d/2} e^{-\|x - x_i\|^2 / 2\sigma^2}   (8)

Then, the connectivity measure CN (also called C-norm) is computed as

  CN = \frac{1}{K} \sum_{i=1}^{K} f(l_i)   (9)

In our experiments we chose K = 0.05N.

A major drawback of the isolation index is that it does not notice whenever two clusters are merged, even if they are well-separated (Pauwels and Frederix, 1999). In fact, grouping all samples together in one big cluster will result in an optimal score for this criterion. For this reason connectivity must be considered as a second criterion to penalize solutions that erroneously lump together widely separated clusters. In order for satisfactory clustering results to be obtained, one has to try to maximize both indices simultaneously. However, since there is a trade-off between connectivity and isolation, the two validity indices should be combined to provide a single cluster-validity index, as described in Section 3.3.

3.2. Experimental methodology

In order to demonstrate the performance of the boost-clustering algorithm we considered both artificial and real-world datasets. In the following, we describe the experimental methodology used to compare boost-k-means-minmax (or boost-k-means-entropy) with the simple k-means algorithm. Note that the same methodology is followed to compare boost-FCM-minmax (or boost-FCM-entropy) with simple FCM. In particular, for each data set and for a specific number of clusters C (the number of clusters for each problem varied from three to six), we applied the following steps:

(1) Split the data set into a training and a testing set of fixed size.
(2) Run the simple k-means algorithm 20 times, each time with a different initialization, to partition the training set into C clusters.
(3) For the 20 runs of k-means, compute the values of isolation (IS_best) and connectivity (CN_best) corresponding to the best of the 20 runs of k-means, i.e. the one yielding the smallest clustering error on the training set.
(4) Apply the boost-k-means-minmax and the boost-k-means-entropy algorithms on the same training set (using T = 20) and compute the corresponding values of isolation (IS_int and IS_entr) and connectivity (CN_int and CN_entr) on the test set, respectively.

3.3. Combination of cluster-validity measures

In order to make direct use of the two cluster-validity measures, we compute their Z-scores (Pauwels and Frederix, 1999). The Z-score of an observation n_i in a sample n_1, ..., n_l is defined as

  Z(n_i) = \frac{n_i - \mathrm{median}(n)}{\mathrm{MAD}(n)}   (10)

where n = {n_1, ..., n_l} represents the whole sample and MAD stands for the median absolute deviation:

  \mathrm{MAD}(n) = \mathrm{median}\{ |n_i - \mathrm{median}(n)| : i = 1, \dots, l \}   (11)

Now, let us consider IS = {IS_best, IS_int, IS_entr} to be the sample of isolation values and CN = {CN_best, CN_int, CN_entr} the sample of connectivity values for the methods we want to compare: (1) the best simple clustering algorithm, (2) boost-clustering using the minmax-CQ index and (3) boost-clustering using the entropy-CQ index, respectively. Then, the (robust) Z-score for the ith method is defined as

  Z_i = Z(IS_{\{i\}}) + Z(CN_{\{i\}}), \quad i = 1, 2, 3   (12)

and we consider as the best clustering result the one which maximizes this robust Z-score on the test set.
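The following Python sketch shows one way to compute the isolation norm, the connectivity norm and their robust Z-score combination of Eqs. (7)-(12); the function names, the kernel width sigma and the brute-force neighbour search are our own choices for illustration, not the authors' implementation.

```python
# Illustrative implementation of the validity measures of Eqs. (7)-(12).
# sigma (the kernel width) is not specified in the paper; 0.1 is a placeholder.
import numpy as np

def isolation(X, labels, frac=0.01):
    # NN-norm averaged over the data set, Eq. (7), with k = frac * N.
    N = len(X)
    k = max(1, int(frac * N))
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                       # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]                 # k nearest neighbours of each point
    same = (labels[nn] == labels[:, None]).mean(axis=1)
    return float(same.mean())

def connectivity(X, labels, frac=0.05, sigma=0.1, seed=0):
    # C-norm, Eqs. (8)-(9): mean kernel density at midpoints of random
    # within-cluster anchor pairs.
    rng = np.random.default_rng(seed)
    N, dim = X.shape
    K = max(1, int(frac * N))
    mids = []
    while len(mids) < K:
        i, j = rng.integers(N, size=2)
        if i != j and labels[i] == labels[j]:
            mids.append(0.5 * (X[i] + X[j]))
    mids = np.asarray(mids)
    sq = ((mids[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    dens = (2.0 * np.pi * sigma ** 2) ** (-dim / 2) * np.exp(-sq / (2.0 * sigma ** 2))
    return float(dens.mean(axis=1).mean())            # (1/K) * sum_i f(l_i)

def robust_z(values):
    # Z-scores based on median and MAD, Eqs. (10)-(11);
    # the small constant only guards against MAD = 0 in degenerate samples.
    v = np.asarray(values, dtype=float)
    med = np.median(v)
    mad = np.median(np.abs(v - med))
    return (v - med) / (mad + 1e-12)

# Eq. (12): given IS = [IS_best, IS_int, IS_entr] and CN = [CN_best, CN_int, CN_entr],
# the winning method is  int(np.argmax(robust_z(IS) + robust_z(CN))).
```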

3.4. Results and discussion

Four data sets, as shown in Table 1, were used to demonstrate the performance of boost-clustering: the Clouds (see Fig. 1) and Phoneme data sets from the ELENA project (ELENA, 1995), the Page-blocks data set from the UCI repository (UCI, 1998) and the Banana data set (see Fig. 2), which is an artificial two-dimensional data set consisting of two banana-shaped clusters. Table 1 also gives the number of examples used for training and testing, respectively, for each data set in the experiments.

Fig. 1. The Clouds data set.

Fig. 2. The Banana data set.

Table 1
Summary of the data sets

Dataset       Cases   Features              Training set size   Testing set size
                      Continuous  Discrete
Clouds        5000    2           -         2500                2500
Phoneme       5404    5           -         3000                2404
Page-blocks   5473    4           6         3000                2473
Banana        2000    2           -         1000                1000

Table 2 contains the test set experimental results using the simple k-means and the boost-k-means algorithms. The Z-score index (Eq. (12)) is provided for each of the three compared cases. In each row of Table 2, the best Z-score (maximum value) is highlighted, indicating the best partition result. Similarly, Table 3 contains the test set experimental results comparing the simple fuzzy c-means with the boost-FCM algorithm. Running a basic clustering algorithm (such as k-means or FCM) many times, the best one can do is to keep the best clustering solution, e.g. the one yielding the smallest clustering error on the training set. So, it makes sense to compare the boost-clustering result with the best clustering result produced by many applications of the basic clustering algorithm.

Some experimental examples are displayed only for the boost-k-means-minmax algorithm. Fig. 5 shows how the data distribution changes for the Banana data set after several iterations of boost-k-means-minmax. The figures on the left-hand side show the data points used for training and the figures on the right-hand side display the data points that were not sampled in a specific iteration of boost-k-means-minmax. Examining all figures and considering the general principle of the proposed boost-clustering algorithm, we observe that, as the iterations of boost-k-means-minmax progress, the corresponding training sets contain data points that were hard to cluster in previous iterations.

Fig. 3 displays the resulting partition of the Clouds test set into three clusters using the boost-k-means-minmax algorithm and Fig. 4 shows the resulting partition produced by the best simple k-means run. Similarly, Fig. 6 provides the resulting partition of the Banana test set into four clusters using the boost-k-means-minmax algorithm, while Fig. 7 shows the resulting partition produced by the best simple k-means run.

Table 2
Experimental results for simple k-means and boost-k-means

Dataset       C   Best k-means   Boost-k-means-minmax   Boost-k-means-entropy
Clouds        3    0.501          0.548                 -1.550
              4   -2.875          1.605                  1.146
              5    0.626         -1.126                  0.500
              6   -0.364          0.500                  0.864
Phoneme       3    0.233         -0.162                 -0.071
              4    1.373          0.500                 -0.873
              5    2.077         -2.750                  0.923
              6    1.445         -3.000                  1.555
Page-blocks   3   -2.353          1.331                  0.980
              4   -0.604          0.002                  0.637
              5    0.490          0.000                  0.099
              6   -1.232         -0.277                  1.508
Banana        3   -2.676         -0.394                  2.900
              4   -1.946          2.900                 -0.955
              5   -2.500          2.500                 -0.500
              6   -2.100          0.361                  1.739

Table 3
Experimental results for simple fuzzy c-means and boost-FCM

Dataset       C   Best FCM   Boost-FCM-minmax   Boost-FCM-entropy
Clouds        3    2.124     -3.502              0.373
              4   -2.902      2.564              0.336
              5   -1.366      2.927             -1.562
              6    0.500      0.924             -1.758
Phoneme       3   -1.728     -0.028              1.757
              4    0.707     -1.207              0.500
              5   -1.602      2.973             -1.398
              6   -1.869      3.100             -1.030
Page-blocks   3   -2.375      2.438             -0.188
              4   -0.104      0.000              0.174
              5   -2.044      0.544              1.000
              6   -2.929      3.000             -1.320
Banana        3   -3.038      1.970              0.992
              4   -3.005      2.509              0.985
              5   -1.463      2.929             -1.608
              6   -0.790     -1.900              2.890

It is clear that the boost-k-means-minmax algorithm exhibits a better clustering performance for these cases and is able to provide clustering solutions that are not spherically shaped (this is more clear in the Banana dataset), although it employs weak learners providing solutions with spherical clusters.

The experimental results in Table 2 indicate that boost-k-means-minmax outperforms the best k-means in 10 out of 16 experiments. Also, boost-k-means-entropy outperforms the best k-means in 10 out of 16 cases. Overall, the boost-k-means algorithm yielded the best partitioning result in 11 out of 16 cases, while the best k-means prevailed in five cases. Similarly, the results in Table 3 indicate that boost-FCM-minmax outperforms the best FCM in 13 out of 16 experiments.

Also, boost-FCM-entropy outperforms the best FCM in 11 cases. Overall, the boost-FCM algorithm yielded the best partitioning result in 14 out of 16 cases, while the best FCM prevailed in only two cases.

The results above indicate that the best way to evaluate the clustering quality index CQ_i of an instance x_i can be very data specific. This suggests that there is no clear reason why the minmax-CQ index is better than the entropy-CQ index. We believe that the entropy-CQ index is more robust and may be beneficial for some large data sets, as in the case of Page-blocks.

Fig. 3. The resulting partition of the Clouds test set into three clusters by the boost-k-means-minmax algorithm.

Fig. 4. The best partitioning of the Clouds test set into three clusters with the simple k-means algorithm.

An important conclusion that can be drawn is that, in most cases, the boost-clustering algorithm provides better partition results (in 25 of 32 experiments in total) than a simple clustering algorithm. Also, we must note here that the boost-clustering algorithm never reached the maximum number of iterations (T = 20); in particular, the average number of boost-clustering iterations in our experiments was 10. This strongly indicates that boosting a basic clustering algorithm for a small number of iterations can give better results than running the basic clustering algorithm many times and selecting the partition of the best run. The performance degradation of boost-clustering that sometimes occurs when increasing the number of iterations is in accordance with analogous results reported in the literature concerning the boosting method when applied to classification problems (Freund and Schapire, 1996; Wickramaratna et al., 2001). In the case of the boost-clustering algorithm, overtraining may be observed after many iterations, especially for data sets with noisy patterns or outliers which are hard to cluster. Another reason for performance degradation is the distortion of the overall structure of the original data set due to resampling; hence the early stopping criterion of boost-clustering is critical.

A critical issue relevant to the effectiveness of the proposed method concerns the problem of inferring the optimal number of clusters. Indeed, there is no well-established method to describe the structure of arbitrarily shaped clusters, as defined by the proposed boost-clustering algorithm. It is well known that the k-means (or fuzzy c-means) algorithm, based on a minimum square-error criterion, identifies hyper-spherical clusters spread around prototype vectors representing cluster centers. Techniques for selecting the number of clusters according to this optimality criterion basically identify an optimal number of cluster centers on the data that splits it into the same number of hyper-spherical clusters. When the data exhibit clusters with arbitrary shape, this type of decomposition is not always satisfactory. Inadequate data partitions, such as the partitions plotted in Figs. 4 and 7, can be obtained even when the correct number of clusters is known a priori. These misclassifications of patterns are, however, overcome by using the proposed boost-clustering methodology, setting C to the known number of clusters.

The complexity and scaling behaviour of the proposed clustering approach depends on the complexity of the basic clustering algorithm, since we have set an upper bound on the number of boosting iterations. For example, the complexity of k-means is O(kn), where k is the number of clusters and n is the size of the dataset. It must be noted that in none of the experiments did the algorithm reach this upper bound.

Fig. 5. Banana data set: the resampled data points used for training (a, c and e) and the data points that were not sampled (b, d and f) by the boost-k-means-minmax algorithm (with C = 4), after 2, 4 and 7 iterations, respectively.

3.5. Comparative results with a Bagging clustering approach

Both the Bagging (Breiman, 1994) and boosting techniques are competitive in classification tasks, so it is interesting to test the comparative performance of the proposed boost-clustering technique with respect to a Bagging clustering approach. Therefore, we implemented another ensemble clustering approach (Bagg-clustering) inspired by Bagging. The method is similar to the boost-clustering algorithm, the major difference being that at each iteration the sampling distribution is uniform over all data points. The resulting Bagg-clustering method with k-means as the basic clustering algorithm will be referred to as Bagg-k-means and the one with fuzzy c-means as Bagg-FCM, respectively.

The data set used in our experiments is two-dimensional with 4000 patterns, containing several arbitrarily shaped clusters, as shown in Fig. 8. We followed the same experimental methodology, using 2000 patterns for training and 2000 for testing, and considering the two non-parametric indices, isolation and connectivity, for measuring the validity of arbitrarily shaped clusters. Table 4 contains the test set experimental results using the simple k-means, the boost-k-means and the Bagg-k-means algorithms. Similarly, Table 5 contains the test set experimental results comparing the simple fuzzy c-means with the boost-FCM and the Bagg-FCM algorithms. We must note here that, in each case, the Bagg-clustering algorithm was run for the same number of iterations as boost-clustering.

Fig. 6. The resulting partition of the Banana test set into four clusters by the boost-k-means-minmax algorithm.

Fig. 7. The best partitioning of the Banana test set into four clusters with the simple k-means algorithm.

Fig. 8. The data set that we used in our experiments.

Table 4
Experimental results for simple k-means, boost-k-means and Bagg-k-means

C   Best k-means   Boost-k-means-minmax   Boost-k-means-entropy   Bagg-k-means
5   -1.617          5.000                 -1.648                  -0.734
6   -1.768          1.200                  0.550                  -0.233
7   -0.870         -0.141                  1.120                  -0.108
8   -3.500          1.534                  1.910                   0.055

Table 5
Experimental results for simple fuzzy c-means, boost-FCM and Bagg-FCM

C   Best FCM   Boost-FCM-minmax   Boost-FCM-entropy   Bagg-FCM
5   -0.144     -0.989              0.655              -0.190
6   -2.333      1.901              1.566              -2.000
7   -1.134      0.250              1.045              -0.160
8   -2.348      0.656              2.508              -0.566

Fig. 9 displays the resulting partition of the data set into seven clusters using the boost-k-means-entropy algorithm and Fig. 10 shows the resulting partition produced by the best simple k-means run (the lines indicate the cluster borders). Although the experimental results indicate that Bagg-clustering outperforms the best simple clustering algorithm in 7 out of 8 experiments, overall the boost-clustering algorithm yielded the best partitioning result in all cases.

Fig. 9. The resulting partition of the data set into seven clusters by the boost-k-means-entropy algorithm.

Fig. 10. The best partitioning of the data set into seven clusters with the simple k-means algorithm.

The key factor behind the good performance of Bagg-clustering is the reduction of variability in the partitioning results via averaging but, as in the classification case, since boosting uses a more efficient way of sampling the data, it provides better clustering solutions.
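For comparison with the sketch given after the boost-clustering listing, the Bagg-clustering variant only changes how the bootstrap sample is drawn; the version below also gives every clustering an equal vote, which is our reading of the averaging described above rather than a detail stated in the paper, and it reuses the memberships() and relabel() helpers from the earlier sketch.

```python
# Hypothetical Bagg-clustering sketch: uniform bootstrap sampling and
# unweighted voting (the latter is our assumption, see the lead-in above).
import numpy as np
from sklearn.cluster import KMeans

def bagg_cluster(X, C, T=20, seed=0):
    rng = np.random.default_rng(seed)
    N = len(X)
    votes = np.zeros((N, C))
    for t in range(T):
        idx = rng.integers(N, size=N)                         # uniform bootstrap replicate
        km = KMeans(n_clusters=C, n_init=10, random_state=seed + t).fit(X[idx])
        h = memberships(X, km.cluster_centers_)               # Eq. (6), helper from the boost sketch
        if t > 0:
            h = relabel(h, votes.argmax(axis=1), C)           # same correspondence step
        votes += h                                            # equal weight for every clustering
    return votes.argmax(axis=1)
```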

4. Conclusions

In this work a new clustering methodology has been introduced, based on the principle of boosting. The proposed method is a multiple clustering method based on the iterative application of a basic clustering algorithm and the appropriate aggregation of the multiple clustering results through weighted voting. The proposed algorithm treats the problem of local minima of clustering algorithms by iteratively running a basic clustering algorithm; however, its performance is not influenced by the randomness of initialization or by the specific type of the basic clustering algorithm used. In addition, it has the great advantage of providing clustering solutions of arbitrary shape, although it uses weak learning algorithms that provide spherical clusters, such as k-means. For the quality measurement of data partitioning we considered a cluster-validity index resulting from the combination of two non-parametric indices, isolation and connectivity.

We conducted experiments using several data sets to illustrate that boost-clustering can lead to improved quality and robustness of performance. The method is very promising, as the experimental study has shown that boosting a basic clustering algorithm (even for a few iterations) can lead to better clustering results compared to the best solution obtained from several independent runs of the basic clustering algorithm. We also carried out an experimental study comparing the proposed boost-clustering algorithm with a Bagging clustering approach. Both ensemble methods lead to improvements in clustering performance but, in general, the adaptive resampling scheme used in boost-clustering provides better solutions compared to Bagging.

There exist several directions for future work with the boost-clustering algorithm. The most important direction is to determine the optimal number of clusters existing in the data set. The desired number of clusters is often not known in advance. In fact, the right number of clusters in a data set often depends on the scale at which the data is inspected, and sometimes equally valid (but substantially different) answers can be obtained for the same data. Moreover, the selection of the optimal number of clusters can be incorporated in the clustering procedure, using either local or global cluster validity criteria. On the other hand, in order to determine an adequate value or range of C, one should use some a priori information (for instance, by applying a mixture decomposition method for determining the number of components in the mixture). Otherwise, several values of C should be tested, the final number of clusters being the most stable solution found. Other interesting future research issues concern the specification of alternative ways of evaluating how well a data point has been clustered, as well as experimentation with other types of basic clustering algorithms.

References

Bezdek, J., 1981. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York.
Bezdek, J., Pal, S., 1992. Fuzzy Models for Pattern Recognition: Methods that Search for Structures in Data. IEEE CS Press.
Boundaillier, E., Hebrail, G., 1998. Interactive interpretation of hierarchical clustering. Intell. Data Anal. 2 (3).
Breiman, L., 1994. Bagging predictors. Tech. Rep. 421, Department of Statistics, University of California, Berkeley.
Dempster, A., Laird, N., Rubin, D., 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. B 39, 1–38.
ELENA, 1995. ESPRIT Basic Research Project ELENA (no. 6891). Available from .
Fisher, D., 1987. Knowledge acquisition via incremental conceptual clustering. Machine Learn. 2, 139–172.
Fred, A., 2001. Finding consistent clusters in data partitions. In: Proceedings of the Second International Workshop on Multiple Classifier Systems (MCS 2001), Lecture Notes in Computer Science, vol. 2096. Springer, Cambridge, UK, pp. 309–318.
Freund, Y., Schapire, R., 1996. Experiments with a new boosting algorithm. In: Proceedings of the Thirteenth International Conference on Machine Learning, Bari, Italy, pp. 148–156.
Frossyniotis, D., Likas, A., Stafylopatis, A., 2003. A boosting approach to clustering. In: Proceedings of the Joint International Conference on Artificial Neural Networks and Neural Information Processing (ICANN/ICONIP), June 26–29, 2003, Istanbul, Turkey, pp. 196–199.
Frossyniotis, D., Pertselakis, M., Stafylopatis, A., 2002. A multi-clustering fusion algorithm. In: Proceedings of the Second Hellenic Conference on Artificial Intelligence, April 11–12, LNAI 2308. Springer-Verlag, Thessaloniki, Greece, pp. 225–236.
Halkidi, M., Batistakis, Y., Vazirgiannis, M., 2001. Clustering algorithms and validity measures. In: Proceedings of the 13th International Conference on Scientific and Statistical Database Management, July 18–20. IEEE Computer Society, George Mason University, Fairfax, Virginia, USA.

Halkidi, M., Vazirgiannis, M., Batistakis, Y., 2000. Quality scheme assessment in the clustering process. In: Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery, Lecture Notes in Computer Science, vol. 1910. Springer, Lyon, France.
Jain, A., Dubes, R., 1988. Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs, NJ.
Jain, A., Duin, R., Mao, J., 2000. Statistical pattern recognition: A review. IEEE Trans. Pattern Anal. Machine Intell. 22 (1), 4–37.
Pauwels, E., Frederix, G., 1999. Finding salient regions in images: Non-parametric clustering for image segmentation and grouping. Comput. Vision Image Understand. 75, 73–85.
Sharma, S., 1996. Applied Multivariate Techniques. John Wiley & Sons.
Smyth, P., 1996. Clustering using Monte Carlo cross-validation. In: Proceedings of Knowledge Discovery and Data Mining, pp. 126–133.
UCI, 1998. UCI Machine Learning Databases Repository. University of California-Irvine, Department of Information and Computer Science. Available from .
Vlassis, N., Likas, A., 2002. A greedy-EM algorithm for Gaussian mixture learning. Neural Process. Lett. 15, 77–87.
Wickramaratna, J., Holden, S., Buxton, B., 2001. Performance degradation in boosting. In: Proceedings of the 2nd International Workshop on Multiple Classifier Systems (MCS 2001), Lecture Notes in Computer Science, vol. 2096. Springer, pp. 11–21.