A MEMETIC-GRASP ALGORITHM FOR CLUSTERING

Yannis Marinakis, Magdalene Marinaki, Nikolaos Matsatsinis and Constantin Zopounidis Department of Production Engineering and Management, Technical University of Crete University Campus, 73100 Chania, Greece

Keywords: Clustering analysis, Feature selection problem, Memetic algorithms, Particle Swarm Optimization, GRASP.

Abstract: This paper presents a new memetic algorithm, based on the concepts of Genetic Algorithms (GAs), Particle Swarm Optimization (PSO) and the Greedy Randomized Adaptive Search Procedure (GRASP), for optimally clustering N objects into K clusters. The proposed algorithm is a two-phase algorithm which combines a memetic algorithm for the solution of the feature selection problem and a GRASP algorithm for the solution of the clustering problem. Contrary to classic genetic algorithms, the evolution of each individual of the population is realized with the use of a PSO algorithm, where each individual improves its physical movement following the basic principles of PSO until it meets the requirements to be selected as a parent. Its performance is compared with that of other popular methods such as classic genetic algorithms, tabu search, GRASP, ant colony optimization and particle swarm optimization. In order to assess the efficacy of the proposed algorithm, the methodology is evaluated on datasets from the UCI Machine Learning Repository. The algorithm gives very good results; in some instances the percentage of correctly clustered samples is very high, larger than 96%.

1 INTRODUCTION

Clustering analysis is one of the most important problems that has been addressed in many contexts and by researchers in many disciplines. It identifies clusters (groups) embedded in the data, where each cluster consists of objects that are similar to one another and dissimilar to objects in other clusters (Jain et al., 1999; Mirkin, 1996; Rokach and Maimon, 2005; Xu and Wunsch II, 2005).

The typical clustering analysis consists of four steps (with a feedback pathway): feature selection or extraction, clustering algorithm design or selection, cluster validation and results interpretation (Xu and Wunsch II, 2005).

The basic feature selection problem (FSP) is an optimization one, in which a search through the space of feature subsets is conducted in order to identify the optimal or near-optimal subset with respect to a performance measure. In the literature many successful feature selection algorithms have been proposed (Aha and Bankert, 1996; Cantu-Paz et al., 2004; Jain and Zongker, 1997; Marinakis et al., 2007). Feature extraction utilizes some transformations to generate useful and novel features from the original ones.

The clustering algorithm design or selection step is usually combined with the selection of a corresponding proximity measure and the construction of a criterion function which makes the partition of clusters a well defined optimization problem (Jain et al., 1999; Rokach and Maimon, 2005). Many heuristic, metaheuristic and stochastic algorithms have been developed in order to find a near optimal solution in reasonable computational time. For example, clustering algorithms based on Tabu Search (Liu et al., 2005; Chu and Roddick, 2000), the Greedy Randomized Adaptive Search Procedure (Cano et al., 2002), Genetic Algorithms (Sheng and Liu, 2006; Yeh and Fu, 2007), Neural Networks (Liao and Wen, 2007), Ant Colony Optimization (Azzag et al., 2007; Kao and Cheng, 2006; Yang and Kamel, 2006), Particle Swarm Optimization (Kao et al., 2007; Paterlini and Krink, 2006; Sun et al., 2006) and Immune Systems (Li and Tan, 2006; Younsi and Wang, 2004) have been proposed in the

Marinakis Y., Marinaki M., Matsatsinis N. and Zopounidis C. (2008). A MEMETIC-GRASP ALGORITHM FOR CLUSTERING. In Proceedings of the Tenth International Conference on Enterprise Information Systems - AIDSS, pages 36-43. DOI: 10.5220/0001694700360043. Copyright © SciTePress.

literature. An analytical survey of the clustering algorithms can be found in (Jain et al., 1999; Rokach and Maimon, 2005; Xu and Wunsch II, 2005).

Cluster validity analysis is the assessment of a clustering procedure's output using effective evaluation standards and criteria (Jain et al., 1999; Xu and Wunsch II, 2005). In the results interpretation step, experts in the relevant fields interpret the data partition in order to guarantee the reliability of the extracted knowledge.

In this paper, a new hybrid metaheuristic algorithm is proposed that uses a memetic algorithm (Moscato, 2003) for the solution of the feature selection problem and a Greedy Randomized Adaptive Search Procedure (GRASP) (Feo and Resende, 1995) for the solution of the clustering problem. The reason that a memetic algorithm, i.e. a genetic algorithm with a local search phase (Moscato, 2003), is used instead of a classic genetic algorithm is that it is very difficult for a pure genetic algorithm to effectively explore the solution space. A combination of a global search optimization method with a local search optimization method usually improves the performance of the algorithm. In this paper, instead of using a local search method to improve each individual separately, we use a global search method, namely Particle Swarm Optimization, and, thus, each individual does not try to improve its solution by itself but uses knowledge from the solutions of the whole population. In order to assess the efficacy of the proposed algorithm, the methodology is evaluated on datasets from the UCI Machine Learning Repository. The rest of this paper is organized as follows: In the next section the proposed Memetic Algorithm is presented and analyzed in detail. In section 3, the analytical computational results for the datasets taken from the UCI Machine Learning Repository are presented, while in the last section conclusions and future research are given.

2 THE PROPOSED MEMETIC-GRASP ALGORITHM

2.1 Introduction

The proposed algorithm (MEMETIC-GRASP) for the solution of the clustering problem is a two phase algorithm which combines a memetic algorithm (MA) for the solution of the feature selection problem and a Greedy Randomized Adaptive Search Procedure (GRASP) for the solution of the clustering problem. In this algorithm, the activated features are calculated by the memetic algorithm (see section 2.4) and the fitness (quality) of each member of the population is calculated by the clustering algorithm (see section 2.5). In the following, initially the clustering problem is stated, then a general description of the proposed algorithm is given, while in the last two subsections each of the phases of the algorithm is presented analytically.

2.2 The Clustering Problem

The problem of clustering N objects (patterns) into K clusters is considered: Given N objects in R^n, allocate each object to one of K clusters such that the sum of squared Euclidean distances between each object and the center of its belonging cluster (which is also to be found) for every such allocated object is minimized. The clustering problem can be mathematically described as follows:

Minimize
$$J(w, z) = \sum_{i=1}^{N}\sum_{j=1}^{K} w_{ij}\,\|x_i - z_j\|^2 \qquad (1)$$

subject to
$$\sum_{j=1}^{K} w_{ij} = 1, \quad i = 1, \ldots, N \qquad (2)$$
$$w_{ij} = 0 \text{ or } 1, \quad i = 1, \ldots, N,\; j = 1, \ldots, K \qquad (3)$$

where:
- K is the number of clusters (given or unknown),
- N is the number of objects (given),
- $x_i \in R^n$, (i = 1, ..., N) is the location of the ith pattern (given),
- $z_j \in R^n$, (j = 1, ..., K) is the center of the jth cluster (to be found), where
$$z_j = \frac{1}{N_j}\sum_{i=1}^{N} w_{ij}\, x_i \qquad (4)$$
and $N_j$ is the number of objects in the jth cluster,
- $w_{ij}$ is the association weight of pattern $x_i$ with cluster j (to be found), where
$$w_{ij} = \begin{cases} 1 & \text{if pattern } i \text{ is allocated to cluster } j,\; \forall\, i = 1, \ldots, N,\; j = 1, \ldots, K,\\ 0 & \text{otherwise.} \end{cases} \qquad (5)$$
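The objective (1), with the centers computed as cluster means per Eq. (4), can be sketched in a few lines. This is an illustrative sketch, not the paper's Fortran implementation; the function name `sse` and the toy data are assumptions made here for demonstration.

```python
def sse(points, assign, k):
    """Clustering objective J(w, z) of Eq. (1): the sum of squared
    Euclidean distances of each point to the center of its cluster,
    where each center is the mean of its members as in Eq. (4)."""
    centers = []
    for j in range(k):
        members = [p for p, a in zip(points, assign) if a == j]
        centers.append([sum(c) / len(members) for c in zip(*members)])
    return sum(
        sum((x - z) ** 2 for x, z in zip(p, centers[a]))
        for p, a in zip(points, assign)
    )

points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
print(sse(points, [0, 0, 1, 1], 2))  # -> 4.0 (each point is 1.0 from its center)
```

With the number of clusters known, this is the quantity (denoted SSE later in the paper) that the GRASP phase minimizes.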

ICEIS 2008 - International Conference on Enterprise Information Systems

2.3 General Description of the Algorithm

Initially, as mentioned in section 2.1, in the first phase of the algorithm a number of features are activated using a Memetic Algorithm. Usually in a genetic algorithm each individual of the population is used in discrete phases. Some of the individuals are selected as parents and, by using a crossover and a mutation operator, they produce the offspring which can replace them in the population. But this is not what really happens in real life. Each individual has the possibility to evolve in order to optimize its behaviour as it goes from one phase to the other during its life. Thus, in the proposed memetic algorithm, the evolution of each individual of the population is realized with the use of a PSO algorithm. In order to find the clustering of the samples (the fitness or quality in the genetic algorithm), a GRASP algorithm is used. The clustering algorithm can solve the clustering problem with a known or unknown number of clusters. When the number of clusters is known, Eq. (1), denoted as SSE, is used in order to find the best clustering. In the case that the number of clusters is unknown, two additional measures are used. The first measure is the minimization of the distance between the centers of the clusters:

$$SSC = \sum_{i=1}^{K}\sum_{j=1}^{K} (z_i - z_j)^2. \qquad (6)$$

The second measure is the minimization of a validity index (Ray and Turi, 1999; Shen et al., 2005) given by:

$$validity = \frac{SSE}{SSC}. \qquad (7)$$

2.4 Memetic Algorithm for the Feature Selection Problem

In this paper, a Memetic Algorithm is used for feature selection. A Memetic Algorithm is a Genetic Algorithm with a local search procedure (Moscato, 2003). Genetic Algorithms (GAs) are search procedures based on the mechanics of natural selection and natural genetics (Holland, 1975; Goldberg, 1989). They offer a particularly attractive approach for problems like feature subset selection since they are generally quite effective for rapid global search of large, non-linear and poorly understood spaces. A pseudocode of the proposed algorithm is presented in the following and, then, a short description of each phase of the Memetic-GRASP algorithm is presented.

Initialization
  Generate the initial population
  Evaluate the fitness of each individual using the GRASP algorithm for clustering
Main Phase
  Do while stopping criteria are not satisfied
    Select individuals from the population to be parents
    Call crossover operator to produce offspring
    Call mutation operator
    Evaluate the fitness of the offspring using the GRASP algorithm for clustering
    Call PSO
    Evaluate the fitness of the offspring using the GRASP algorithm for clustering
    Replace the population with the fittest of the whole population
  Enddo

In the proposed algorithm, each individual in the population represents a candidate solution to the feature subset selection problem. Let m be the total number of features (the features used to represent each individual are chosen from these features). The individual (chromosome) is represented by a binary vector of dimension m. If a bit is equal to 1, the corresponding feature is selected (activated); otherwise the feature is not selected. This is the simplest and most straightforward representation scheme.

The initial population is generated randomly. Thus, in order to explore subsets of different numbers of features, the number of 1's for each individual is generated randomly. In order to have diversity in the initial population, only different individuals are allowed. The fitness function gives the quality of each produced member of the population and is calculated using the GRASP algorithm for clustering described in the following section.

The selection mechanism is responsible for selecting the parent chromosomes from the population and forming the mating pool. The selection mechanism emulates the survival-of-the-fittest mechanism in nature: a fitter chromosome is expected to have a higher chance of surviving the subsequent evolution. In this work, we use roulette wheel selection (Goldberg, 1989) as the selection mechanism, 1-point crossover in the crossover phase of the algorithm and, then, a mutation phase (Goldberg, 1989). Afterwards, the fitness function of each offspring is calculated.
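The three operators named above (roulette wheel selection, 1-point crossover, bit-flip mutation) can be sketched for binary feature chromosomes as follows. The function names, the toy fitness (number of selected features), and the parameter values are illustrative assumptions, not taken from the paper.

```python
import random

def roulette_select(population, fitnesses):
    """Roulette wheel selection: pick with probability proportional to fitness."""
    total = sum(fitnesses)
    r = random.uniform(0, total)
    acc = 0.0
    for chrom, fit in zip(population, fitnesses):
        acc += fit
        if r <= acc:
            return chrom
    return population[-1]

def one_point_crossover(p1, p2):
    """1-point crossover: swap the tails of the parents after a random cut."""
    cut = random.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(chrom, pm=0.25):
    """Bit-flip mutation: each gene flips with probability pm."""
    return [1 - g if random.random() < pm else g for g in chrom]

random.seed(1)
pop = [[random.randint(0, 1) for _ in range(9)] for _ in range(4)]
fits = [sum(c) for c in pop]  # stand-in fitness; the paper uses the GRASP clustering quality
a, b = roulette_select(pop, fits), roulette_select(pop, fits)
c1, c2 = one_point_crossover(a, b)
print(mutate(c1))
```

In the actual algorithm, the fitness of each chromosome would come from running the GRASP clustering routine on the activated features.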


In the evolution phase of the population a Particle Swarm Optimization algorithm is used. Particle swarm optimization (PSO) is a population-based swarm intelligence algorithm. It was originally proposed by Kennedy and Eberhart (1995) as a simulation of the social behaviour of social organisms such as bird flocking and fish schooling. PSO uses the physical movements of the individuals in the swarm and has a flexible and well-balanced mechanism to enhance and adapt the global and local exploration abilities. The swarm of particles in PSO is usually initialized at random, but here the individuals of the population take the place of the particles in the swarm. In each iteration, the swarm is updated by the following equations (Kennedy and Eberhart, 1997), applied for the discrete binary version of PSO:

$$\upsilon_{id}(t+1) = w\,\upsilon_{id}(t) + c_1\, rand_1\,(p_{id} - s_{id}(t)) + c_2\, rand_2\,(p_{gd} - s_{id}(t)) \qquad (8)$$

$$sig(\upsilon_{id}) = \frac{1}{1 + \exp(-\upsilon_{id})} \qquad (9)$$

$$s_{id}(t+1) = \begin{cases} 1, & \text{if } rand_3 < sig(\upsilon_{id}) \\ 0, & \text{if } rand_3 \geq sig(\upsilon_{id}) \end{cases} \qquad (10)$$

where $\upsilon_{id}$ is the corresponding velocity; $s_{id} \in \{0, 1\}$ is the current solution; $p_{id}$ is the best position encountered by the ith particle so far; $p_{gd}$ represents the best position found by any member in the whole swarm population; t is the iteration counter; $rand_1$, $rand_2$ and $rand_3$ are three uniform random numbers in [0, 1]; w is the inertia weight; and $c_1$ and $c_2$ are acceleration coefficients. The acceleration coefficients control how far a particle will move in a single iteration. As in the basic PSO algorithm, a parameter $V_{max}$ is introduced to limit $\upsilon_{id}$ so that $sig(\upsilon_{id})$ does not approach 0 or 1 too closely (Kennedy et al., 2001). Such an implementation ensures that a bit can change between 1 and 0 with a positive probability. In practice, $V_{max}$ is often set at $\pm 4$. The inertia weight w (developed by Shi and Eberhart (1998)) controls the impact of previous histories of velocities on the current velocity and the convergence behaviour of the PSO algorithm. The particle adjusts its trajectory based on information about its previous best performance and the best performance of its neighbors. The inertia weight w is updated according to the following equation:

$$w = w_{max} - \frac{w_{max} - w_{min}}{iter_{max}} \times t \qquad (11)$$

where $w_{max}$, $w_{min}$ are the maximum and minimum values that the inertia weight can take and $iter_{max}$ is the maximum number of iterations (generations).

As has already been mentioned, in the next generation the fittest from the whole population (i.e. the initial population and the offspring from the mutation, crossover and evolution phases) survive. Thus, the population is sorted based on the fitness function of the individuals, and in the next generation the fittest individuals survive. It must be mentioned that the size of the population of each generation is equal to the initial size of the population. There are two stopping criteria for the memetic algorithm. The one is the maximum number of generations, which is a variable of the problem, and the other is genetic convergence, which means that whenever the solutions of the genetic algorithm converge to one solution the genetic algorithm stops.

2.5 Greedy Randomized Adaptive Search Procedure for the Clustering Problem

As was mentioned earlier, in the clustering phase of the proposed algorithm a Greedy Randomized Adaptive Search Procedure (GRASP) (Feo and Resende, 1995; Marinakis et al., 2005; Resende and Ribeiro, 2003) is used. GRASP is an iterative two phase search algorithm which has gained considerable popularity in combinatorial optimization. Each iteration consists of two phases, a construction phase and a local search phase. In the construction phase, a randomized greedy function is used to build up an initial solution. The choice of the next element to be added is determined by ordering all elements in a candidate list (the Restricted Candidate List - RCL) with respect to a greedy function. The probabilistic component of a GRASP is characterized by randomly choosing one of the best candidates in the list, but not necessarily the top candidate. This randomized technique provides a feasible solution within each iteration. This solution is then exposed for improvement attempts in the local search phase. The final result is simply the best solution found over all iterations.

In the following, the way the GRASP algorithm is applied for the solution of the clustering problem is analyzed in detail. An initial solution (i.e. an initial clustering of the samples in the clusters) is constructed step by step and, then, this solution is exposed for improvement in the local search phase of the algorithm. The first problem that we had to face was the selection of the number of clusters. Thus, the algorithm works in two different ways.


If the number of clusters is known a priori, then a number of samples equal to the number of clusters is selected randomly as the initial clusters. In this case, as the iterations of GRASP increase, the number of clusters does not change. In each iteration, different samples (equal in number to the number of clusters) are selected as initial clusters. Afterwards, the RCL is created. In our implementation, the most promising candidate samples are selected to create the RCL. The samples in the list are ordered taking into account the distance of each sample from the centers of all the clusters, and the ordering is from the smallest to the largest distance. From this list, the first D samples (D is a parameter of the problem) are selected in order to form the final Restricted Candidate List. The candidate sample for inclusion in the solution is selected randomly from the RCL using a random number generator. Finally, the RCL is readjusted in every iteration by recalculating all the distances based on the new centers and replacing the sample which has been included in the solution by another sample that does not belong to the RCL, namely the (D + iter)th sample, where iter is the number of the current iteration. When all the samples have been assigned to clusters, three measures are calculated (the best solution is calculated based on the sum of squared Euclidean distances between each object and the center of its belonging cluster, see section 2.2) and a local search strategy is applied in order to improve the solution. The local search works as follows: for each sample, the probability of its reassignment to a different cluster is examined by calculating the distance of the sample from the centers. If a sample is reassigned to a different cluster, the new centers are calculated. The local search phase stops when no sample is reassigned in an iteration.

If the number of clusters is unknown, then initially a number of samples are selected randomly as the initial clusters. Now, as the iterations of GRASP increase, the number of clusters changes, but cannot become less than two. In each iteration, a different number of clusters can be found. The creation of the initial solutions and the local search phase work as in the previous case. The only difference compared to the previous case concerns the use of the validity measure in order to choose the best solution, because as we have a different number of clusters in each iteration the sum of squared Euclidean distances varies significantly between solutions.

3 COMPUTATIONAL RESULTS

3.1 Data and Parameter Description

The performance of the proposed methodology is tested on 9 benchmark instances taken from the UCI Machine Learning Repository. The datasets were chosen to include a wide range of domains, and their characteristics are given in Table 1 (in this table, the 2nd column gives the number of observations, the 3rd the number of features and the last the number of clusters). In one case (Breast Cancer Wisconsin) the data set appears with different sizes of observations, because there is a number of missing values in this data set. This problem was faced either by taking the mean values of all the observations in the corresponding feature, when all the observations are used, or by not taking into account the observations that had missing values, when fewer observations are used. Some data sets involve only numerical features and the remaining include both numerical and categorical features. For each data set, Table 1 reports the total number of features and the number of categorical features in parentheses.

The algorithm was implemented in Fortran 90 and was compiled using the Lahey f95 compiler on a Centrino Mobile Intel Pentium M 750 at 1.86 GHz, running Suse Linux 9.1. The parameters of the proposed algorithm were selected after thorough testing and they are: the number of generations of the memetic algorithm is set equal to 20; the population size is set equal to 500; the crossover probability is set equal to 0.8; the mutation probability is set equal to 0.25; the number of swarms is set equal to 1; the number of particles is set equal to 500; the number of generations of PSO is set equal to 50; the size of the RCL varies between 50; the number of GRASP iterations is equal to 100; the parameters of PSO are c1 = 2, c2 = 2, wmax = 0.9 and wmin = 0.01.

Table 1: Data Sets Characteristics.

Data Sets                          Obser.  Feat.   Clus.
Australian Credit (AC)             690     14(8)   2
Breast Cancer Wisconsin 1 (BCW1)   699     9       2
Breast Cancer Wisconsin 2 (BCW2)   683     9       2
Heart Disease (HD)                 270     13(7)   2
Hepatitis 1 (Hep1)                 155     19(13)  2
Ionosphere (Ion)                   351     34      2
Spambase (Spam)                    4601    57      2
Iris                               150     4       3
Wine                               178     13      3
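The construction and local search phases of the GRASP clustering routine described in Section 2.5 can be sketched as follows for the known-K case. This is a simplified illustrative sketch: the function names and the toy data are assumptions, `d_rcl` plays the role of the parameter D, and the paper's (D + iter)th replacement rule and three-measure evaluation are omitted.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two samples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def recompute_centers(samples, assign, k, old_centers):
    """Each center is the mean of its members (Eq. 4); an empty cluster keeps its old center."""
    centers = []
    for j in range(k):
        members = [s for s, a in zip(samples, assign) if a == j]
        if members:
            centers.append([sum(c) / len(members) for c in zip(*members)])
        else:
            centers.append(old_centers[j])
    return centers

def grasp_iteration(samples, k, d_rcl=3):
    """One GRASP iteration: randomized greedy construction with a
    Restricted Candidate List, then a reassignment local search."""
    centers = [list(s) for s in random.sample(samples, k)]
    assign = [None] * len(samples)
    todo = list(range(len(samples)))
    # construction phase: order unassigned samples by distance to the
    # nearest center, keep the first d_rcl as the RCL, pick one at random
    while todo:
        todo.sort(key=lambda i: min(dist2(samples[i], c) for c in centers))
        i = random.choice(todo[:d_rcl])
        assign[i] = min(range(k), key=lambda j: dist2(samples[i], centers[j]))
        todo.remove(i)
    # local search phase: reassign each sample to its nearest center and
    # recompute the centers until no sample moves
    changed = True
    while changed:
        centers = recompute_centers(samples, assign, k, centers)
        changed = False
        for i, s in enumerate(samples):
            best = min(range(k), key=lambda j: dist2(s, centers[j]))
            if best != assign[i]:
                assign[i], changed = best, True
    return assign, centers

random.seed(3)
data = [[0.0, 0.0], [0.0, 1.0], [9.0, 9.0], [9.0, 10.0]]
assign, centers = grasp_iteration(data, 2)
print(assign)  # the two nearby pairs end up in the same cluster
```

The full algorithm would repeat such iterations, keeping the solution with the smallest SSE (or smallest validity index when K is unknown).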


3.2 Results of the Proposed Algorithm

The objective of the computational experiments is to show the performance of the proposed algorithm in searching for a reduced set of features with high clustering quality. Because of the curse of dimensionality, it is often necessary and beneficial to limit the number of input features in order to have a good predictive and less computationally intensive model. In general there are $2^m - 1$ possible feature combinations, where m is the number of features, and, thus, in our cases the most difficult problem is the Spambase, where the number of feature combinations is $2^{57} - 1$.

A comparison with other metaheuristic approaches for the solution of the clustering problem is presented in Table 2. In this table, seven other algorithms are used for the solution of the feature subset selection problem and of the clustering problem. In the first group of algorithms in this table, a PSO algorithm is used for the solution of the feature selection problem while a GRASP is used in the clustering phase (columns 4 and 5 of Table 2), an Ant Colony Optimization algorithm (Dorigo and Stützle, 2004) is used for the feature selection problem with GRASP in the clustering phase (columns 6 and 7 of Table 2), and a genetic algorithm is used in the first phase of the algorithm while a GRASP is used in the second phase (columns 8 and 9 of Table 2). In the second group of algorithms, in columns 2 and 3 of Table 2 a Tabu Search algorithm (Glover, 1989) is used in the first phase and a GRASP in the second phase; in columns 4 and 5 a PSO is used in the first phase and an Ant Colony Optimization algorithm in the second phase; in columns 6 and 7 an Ant Colony Optimization algorithm is used in both phases (feature selection phase and clustering phase); while in columns 8 and 9 a PSO is used in both phases.

From this table, it can be observed that the Memetic-GRASP algorithm performs better (has the largest number of correctly clustered samples) than the other seven algorithms in all instances. It should be mentioned that in some instances the differences in the results between the Memetic-GRASP algorithm and the other seven algorithms are very significant, mainly for the two data sets that have the largest number of features compared to the other data sets: in the Ionosphere data set the percentage of correctly clustered samples for the Memetic-GRASP algorithm is 86.89%, while for all the other methods the percentage varies between 73.50% and 86.03%; and in the Spambase data set the percentage of correctly clustered samples for the Memetic-GRASP algorithm is 87.35%, while for all the other methods the percentage varies between 82.80% and 87.19%. It should also be noted that a hybridized algorithm always performs better than a non-hybridized algorithm. More precisely, the only three algorithms that are competitive in almost all instances with the proposed Memetic-GRASP algorithm are the hybrid PSO-ACO, the hybrid PSO-GRASP and the hybrid ACO-GRASP algorithms. These results prove the significance of the solution of the feature selection problem in the clustering algorithm, as when more sophisticated methods for the solution of this problem were used, the performance of the clustering algorithm was improved.

The significance of the solution of the feature selection problem using the Memetic Algorithm is also proved by the fact that with this algorithm the best solution was found using fewer features than with the other algorithms used in the comparisons. More precisely, in the most difficult instance, the Spambase instance, the proposed algorithm needed 32 features in order to find the optimal solution, while the other seven algorithms needed between 34 and 56 features to find their best solution. A very significant observation is that the results of the proposed Memetic-GRASP algorithm are better than those obtained when a classic genetic algorithm was used: the percentage of correctly clustered instances in the Memetic-GRASP algorithm is 0.15% to 11.11% greater than the percentage in the genetic algorithm.

It should also be mentioned that the algorithm was tested with two options: with known and with unknown number of clusters. When the number of clusters was unknown and, thus, in each iteration of the algorithm different initial numbers of clusters were selected, the algorithm always converged to the optimal number of clusters and with the same results as in the case that the number of clusters was known.

4 CONCLUSIONS

In this paper a new metaheuristic algorithm is proposed for solving the Clustering Problem. This algorithm is a two phase algorithm which combines a memetic algorithm for the solution of the feature selection problem and a Greedy Randomized Adaptive Search Procedure (GRASP) for the solution of the clustering problem. The performance


Table 2: Results of the Algorithms.

Method   Memetic-GRASP           PSO-GRASP               ACO-GRASP               Genetic-GRASP
         Sel. Feat.  Cor. Clust. Sel. Feat.  Cor. Clust. Sel. Feat.  Cor. Clust. Sel. Feat.  Cor. Clust.
BCW2     5   664(97.21%)         5   662(96.92%)         5   662(96.92%)         5   662(96.92%)
Hep1     9   139(89.67%)         7   135(87.09%)         9   134(86.45%)         9   134(86.45%)
AC       8   604(87.53%)         8   604(87.53%)         8   603(87.39%)         8   602(87.24%)
BCW1     8   677(96.85%)         5   676(96.70%)         5   676(96.70%)         5   676(96.70%)
Ion      5   305(86.89%)         11  300(85.47%)         2   291(82.90%)         17  266(75.78%)
spam     32  4019(87.35%)        51  4009(87.13%)        56  3993(86.78%)        56  3938(85.59%)
HD       9   236(87.41%)         9   232(85.92%)         9   232(85.92%)         7   231(85.55%)
Iris     3   146(97.33%)         3   145(96.67%)         3   145(96.67%)         4   145(96.67%)
Wine     7   176(98.87%)         7   176(98.87%)         8   176(98.87%)         7   175(98.31%)

Method   Tabu-GRASP              PSO-ACO                 ACO                     PSO
         Sel. Feat.  Cor. Clust. Sel. Feat.  Cor. Clust. Sel. Feat.  Cor. Clust. Sel. Feat.  Cor. Clust.
BCW2     6   661(96.77%)         5   664(97.21%)         5   662(96.92%)         5   662(96.92%)
Hep1     10  132(85.16%)         6   139(89.67%)         9   133(85.80%)         10  132(85.16%)
AC       9   599(86.81%)         8   604(87.53%)         8   601(87.10%)         8   602(87.24%)
BCW1     8   674(96.42%)         5   677(96.85%)         8   674(96.42%)         8   674(96.42%)
Ion      4   263(74.92%)         7   302(86.03%)         16  258(73.50%)         12  261(74.35%)
spam     34  3810(82.80%)        39  4012(87.19%)        41  3967(86.22%)        37  3960(86.06%)
HD       9   227(84.07%)         9   235(87.03%)         9   227(84.07%)         9   227(84.07%)
Iris     3   145(96.67%)         3   146(97.33%)         3   145(96.67%)         3   145(96.67%)
Wine     7   174(97.75%)         7   176(98.87%)         7   174(97.75%)         7   174(97.75%)

of the proposed algorithm was tested using various benchmark datasets from the UCI Machine Learning Repository. The proposed algorithm gave very efficient results in all instances, and the significance of the solution of the clustering problem by the proposed algorithm is proved by the fact that the percentage of correctly clustered samples is very high (in some instances larger than 97%) and by the fact that the instances with the largest number of features gave better results when the Memetic-GRASP algorithm was used.

ACKNOWLEDGEMENTS

This work is a deliverable of the task KP_26 and is realized in the framework of the Operational Programme of Crete, and it is co-financed from the European Regional Development Fund (ERDF) and the Region of Crete with final recipient the General Secretariat for Research and Technology.

REFERENCES

Aha, D.W., Bankert, R.L., 1996. A Comparative Evaluation of Sequential Feature Selection Algorithms. In Learning from Data: Artificial Intelligence and Statistics V, Fisher, D. and H.-J. Lenz (Eds.). Springer-Verlag, New York.
Azzag, H., Venturini, G., Oliver, A., Guinot, C., 2007. A Hierarchical Ant Based Clustering Algorithm and its Use in Three Real-World Applications, European Journal of Operational Research, 179, 906-922.
Cano, J.R., Cordón, O., Herrera, F., Sánchez, L., 2002. A GRASP Algorithm for Clustering, In IBERAMIA 2002, LNAI 2527, Garijo, F.J., Riquelme, J.C. and M. Toro (Eds.). Springer-Verlag, Berlin Heidelberg, 214-223.
Cantu-Paz, E., Newsam, S., Kamath, C., 2004. Feature Selection in Scientific Applications, In Proceedings of the 2004 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 788-793.
Chu, S., Roddick, J., 2000. A Clustering Algorithm Using the Tabu Search Approach with Simulated Annealing. In Data Mining II - Proceedings of the Second International Conference on Data Mining Methods and Databases, Ebecken, N. and C. Brebbia (Eds.). Cambridge, U.K., 515-523.


Dorigo, M., Stützle, T., 2004. Ant Colony Optimization. A Bradford Book, The MIT Press, Cambridge, Massachusetts, London, England.
Feo, T.A., Resende, M.G.C., 1995. Greedy Randomized Adaptive Search Procedures, Journal of Global Optimization, 6, 109-133.
Glover, F., 1989. Tabu Search I, ORSA Journal on Computing, 1 (3), 190-206.
Glover, F., 1990. Tabu Search II, ORSA Journal on Computing, 2 (1), 4-32.
Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley Publishing Company, Inc., Massachusetts.
Holland, J.H., 1975. Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, MI.
Jain, A., Zongker, D., 1997. Feature Selection: Evaluation, Application, and Small Sample Performance, IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, 153-158.
Jain, A.K., Murty, M.N., Flynn, P.J., 1999. Data Clustering: A Review, ACM Computing Surveys, 31 (3), 264-323.
Kao, Y., Cheng, K., 2006. An ACO-Based Clustering Algorithm, In ANTS 2006, LNCS 4150, M. Dorigo et al. (Eds.). Springer-Verlag, Berlin Heidelberg, 340-347.
Kennedy, J., Eberhart, R., 1995. Particle Swarm Optimization. In Proceedings of the 1995 IEEE International Conference on Neural Networks, 4, 1942-1948.
Kennedy, J., Eberhart, R., 1997. A Discrete Binary Version of the Particle Swarm Algorithm. In Proceedings of the 1997 IEEE International Conference on Systems, Man, and Cybernetics, 5, 4104-4108.
Kennedy, J., Eberhart, R., Shi, Y., 2001. Swarm Intelligence. Morgan Kaufmann Publishers, San Francisco.
Li, Z., Tan, H.-Z., 2006. A Combinational Clustering Method Based on Artificial Immune System and Support Vector Machine, In KES 2006, Part I, LNAI 4251, Gabrys, B., Howlett, R.J. and L.C. Jain (Eds.). Springer-Verlag, Berlin Heidelberg, 153-162.
Liao, S.-H., Wen, C.-H., 2007. Artificial Neural Networks Classification and Clustering of Methodologies and Applications - Literature Analysis from 1995 to 2005, Expert Systems with Applications, 32, 1-11.
Liu, Y., Liu, Y., Wang, L., Chen, K., 2005. A Hybrid Tabu Search Based Clustering Algorithm, In KES 2005, LNAI 3682, R. Khosla et al. (Eds.). Springer-Verlag, Berlin Heidelberg, 186-192.
Marinakis, Y., Migdalas, A., Pardalos, P.M., 2005. Expanding Neighborhood GRASP for the Traveling Salesman Problem, Computational Optimization and Applications, 32, 231-257.
Marinakis, Y., Marinaki, M., Doumpos, M., Matsatsinis, N., Zopounidis, C., 2007. Optimization of Nearest Neighbor Classifiers via Metaheuristic Algorithms for Credit Risk Assessment, Journal of Global Optimization, (accepted).
Mirkin, B., 1996. Mathematical Classification and Clustering, Kluwer Academic Publishers, Dordrecht, The Netherlands.
Moscato, P., Cotta, C., 2003. A Gentle Introduction to Memetic Algorithms. In Handbook of Metaheuristics, Glover, F. and G.A. Kochenberger (Eds.). Kluwer Academic Publishers, Dordrecht, 105-144.
Paterlini, S., Krink, T., 2006. Differential Evolution and Particle Swarm Optimisation in Partitional Clustering, Computational Statistics and Data Analysis, 50, 1220-1247.
Ray, S., Turi, R.H., 1999. Determination of Number of Clusters in K-means Clustering and Application in Colour Image Segmentation. In Proceedings of the 4th International Conference on Advances in Pattern Recognition and Digital Techniques (ICAPRDT99), Calcutta, India.
Resende, M.G.C., Ribeiro, C.C., 2003. Greedy Randomized Adaptive Search Procedures. In Handbook of Metaheuristics, Glover, F. and G.A. Kochenberger (Eds.). Kluwer Academic Publishers, Dordrecht, 219-249.
Rokach, L., Maimon, O., 2005. Clustering Methods, In Data Mining and Knowledge Discovery Handbook, Maimon, O. and L. Rokach (Eds.). Springer, New York, 321-352.
Shen, J., Chang, S.I., Lee, E.S., Deng, Y., Brown, S.J., 2005. Determination of Cluster Number in Clustering Microarray Data, Applied Mathematics and Computation, 169, 1172-1185.
Sheng, W., Liu, X., 2006. A Genetic k-Medoids Clustering Algorithm, Journal of Heuristics, 12, 447-466.
Shi, Y., Eberhart, R., 1998. A Modified Particle Swarm Optimizer. In Proceedings of the 1998 IEEE World Congress on Computational Intelligence, 69-73.
Sun, J., Xu, W., Ye, B., 2006. Quantum-Behaved Particle Swarm Optimization Clustering Algorithm, In ADMA 2006, LNAI 4093, Li, X., Zaiane, O.R. and Z. Li (Eds.). Springer-Verlag, Berlin Heidelberg, 340-347.
Xu, R., Wunsch II, D., 2005. Survey of Clustering Algorithms, IEEE Transactions on Neural Networks, 16 (3), 645-678.
Yang, Y., Kamel, M.S., 2006. An Aggregated Clustering Approach Using Multi-Ant Colonies Algorithms, Pattern Recognition, 39, 1278-1289.
Yeh, J.-Y., Fu, J.C., 2007. A Hierarchical Genetic Algorithm for Segmentation of Multi-Spectral Human-Brain MRI, Expert Systems with Applications, doi:10.1016/j.eswa.2006.12.012.
Younsi, R., Wang, W., 2004. A New Artificial Immune System Algorithm for Clustering, In IDEAL 2004, LNCS 3177, Z.R. Yang et al. (Eds.). Springer-Verlag, Berlin Heidelberg, 58-64.
