Exploring Differential Evolution and Particle Swarm Optimization to Develop Some Symmetry-Based Automatic Clustering Techniques: Application to Gene Clustering

Neural Comput & Applic (2018) 30:735–757
https://doi.org/10.1007/s00521-016-2710-0
ISCMI15

Sriparna Saha¹ • Ranjita Das²

¹ Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, India
² Department of Computer Science and Engineering, National Institute of Technology Mizoram, Aizawl, India

Received: 17 April 2016 / Accepted: 8 November 2016 / Published online: 1 February 2017
© The Natural Computing Applications Forum 2017

Abstract In the current paper, we have developed two bio-inspired fuzzy clustering algorithms by incorporating two optimization techniques, namely differential evolution and particle swarm optimization. Both clustering techniques can detect symmetrically shaped clusters using the established point symmetry-based distance measure. Both approaches are automatic in nature and can detect the number of clusters from a given dataset. A symmetry-based cluster validity measure, the F-Sym-index, is used as the objective function optimized by both approaches in order to automatically determine the correct partitioning. The effectiveness of the proposed approaches is shown for automatically clustering some artificial and real-life datasets, as well as some real-life gene expression datasets. The paper presents a comparative analysis of several meta-heuristic-based clustering approaches, namely the two newly proposed techniques and the existing automatic genetic clustering techniques VGAPS, GCUK, and HNGA. The obtained results are compared with respect to some external cluster validity indices. Moreover, statistical significance tests as well as biological significance tests are conducted. Finally, the results on gene expression datasets are visualized using two visualization tools, namely the Eisen plot and the cluster profile plot.

Keywords Unsupervised classification · Particle swarm optimization (PSO) · Differential evolution (DE) · Symmetry · Point symmetry-based distance · Gene expression data

1 Introduction

In the field of data mining, clustering [22] has innumerable applications for solving different real-life problems [15, 23]. In the literature, many invariant clustering techniques have been proposed [4]. To identify clusters in a dataset, some proximity or similarity measure must be defined among data points to establish rules for assigning points to the domain of a particular cluster centroid. Symmetry is useful for recognizing and identifying most objects, as it is an important characteristic of real-life objects. Since symmetry is a natural phenomenon, we can assume that some kind of symmetry also exists in the cluster structure. Symmetry measurements can be of two types: point symmetry (PS) and line symmetry (LS). Point symmetry-based measurements are more applicable to clusters that are symmetric about their central point. Fig. 1 shows some objects having point symmetry and line symmetry properties. Inspired by these observations, some point symmetry-based measurements were developed in [7, 39]. These distance functions were then used in [7] to develop clustering techniques that can detect any kind of point-symmetric cluster in different datasets.
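The point symmetry-based distance mentioned above can be illustrated with a short sketch. The Python below assumes the form commonly associated with [7]: reflect a point x through a candidate centre c to obtain x* = 2c − x, average the distances from x* to its knear nearest data points (knear = 2 here), and multiply that symmetry term by the Euclidean distance between x and c. The names `ps_distance` and `assign` and the default `knear = 2` are illustrative assumptions; the exact definition adopted later in this paper, and any further refinements of the assignment rule in [7], may differ.

```python
import math
from typing import Sequence

Point = Sequence[float]


def ps_distance(x: Point, c: Point, data: Sequence[Point], knear: int = 2) -> float:
    """Sketch of a point symmetry-based distance d_ps(x, c) (assumed form)."""
    # Reflect x through the candidate centre c: x* = 2c - x.
    reflected = [2.0 * ci - xi for xi, ci in zip(x, c)]
    # Symmetry term: mean distance from x* to its knear nearest data points.
    nearest = sorted(math.dist(reflected, p) for p in data)[:knear]
    d_sym = sum(nearest) / len(nearest)
    # Scale the symmetry term by the ordinary Euclidean distance from x to c.
    return d_sym * math.dist(x, c)


def assign(x: Point, centres: Sequence[Point], data: Sequence[Point]) -> int:
    """Hypothetical helper: index of the centre with the smallest d_ps(x, c)."""
    return min(range(len(centres)), key=lambda k: ps_distance(x, centres[k], data))


if __name__ == "__main__":
    # Four points placed symmetrically about the origin.
    data = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
    print(ps_distance((1.0, 0.0), (0.0, 0.0), data))  # ≈ 0.7071
    print(assign((1.0, 0.0), [(0.0, 0.0), (5.0, 5.0)], data))  # 0
```

Note that even a perfectly symmetric configuration gives a non-zero d_sym when knear = 2, because the second-nearest neighbour of x* is generally not at distance zero; averaging over more than one neighbour makes the measure less sensitive to a single coincidental match.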
Symmetry in clustering has been discussed in many existing works, for example, in the analysis of invariant clustering [4]. In [7], some genetic algorithm-based techniques are developed for solving the clustering problem using the properties of symmetry. The clustering problem is modeled as an optimization problem, and a genetic algorithm [19] is used to optimize the total symmetrical compactness of the obtained clustering in order to obtain the optimal partitioning. This algorithm overcomes some drawbacks associated with the SBKM and Mod-SBKM clustering techniques [43].

Fig. 1 Point symmetric and line symmetric objects

1.1 Some automatic clustering techniques

In the literature, many genetic algorithm-based clustering techniques are available that can automatically detect both the number of clusters and the appropriate partitioning from a given dataset. Examples include the variable string length genetic K-means algorithm (GCUK) [6] and the hybrid niching genetic algorithm (HNGA) [41], both of which use Euclidean distance to assign data points to clusters. A variable string length genetic clustering technique (VGAPS) [38] has also been proposed, in which point symmetry-based distance is used. In [6], a genetic algorithm-based K-means clustering technique is developed that can detect clusters having equi-sized hyper-spherical shapes. GCUK uses this genetic algorithm-based K-means clustering technique for the automatic identification of clusters. In HNGA [41], a niching method is developed to prevent premature convergence, along with a weighted sum validity function for optimization. Liu et al. [28] developed an automatic clustering technique based on a genetic algorithm and presented a noising selection and division absorption-based mutation technique to maintain population diversity and selection pressure. Horta et al. [21] developed an evolutionary technique based on fuzzy clustering for automatically identifying the clusters present in relational data. In [1], the authors introduced a grouping-based evolutionary approach that uses grouping encoding and an adaptive exploration and exploitation operator; an elitist scheme is also applied to ensure that the best solution is preserved. He et al. [20] adopted a variable-length coding representation for initializing individuals and used a two-stage selection and mutation operator; however, its search ability decreases as the dimension of the dataset increases. Kao et al. [24] presented a hybrid particle swarm optimization algorithm for automatically evolving the cluster centers and applied it to the problem of generalized machine cell formation.

In recent years, some new optimization techniques such as cuckoo search [47], differential evolution (DE) [34, 46], particle swarm optimization (PSO) [33] and ant colony optimization [13] have been proposed in the literature. Recent studies have also revealed that these optimization techniques converge much faster than genetic algorithms [34, 46]. Based on these observations, some DE-based and PSO-based clustering techniques have also been developed [29, 31, 35, 40]. In [31], a modified DE-based clustering technique is developed for satellite image segmentation. In [40], a modified fitness-based adaptive DE algorithm is developed for clustering image pixels; here the control parameters of the traditional DE-based approach are calculated adaptively using fitness-based statistics. In [36], two variants of DE-based clustering techniques are proposed and applied to cluster some real-life datasets. Zhang et al. [48] used DE to optimize the coordinates of samples distributed randomly on a plane; kernel-based approaches are used to map the data from the original space into a high-dimensional feature space in which a fuzzy dissimilarity matrix is constructed. Cai et al. [11] combined traditional DE and one-step K-means clustering for the problem of unconstrained global optimization. Tvrdík et al. [44] developed a hybrid method combining DE and the K-means algorithm and applied it to non-hierarchical clustering. In [26], the authors incorporated a local improvement phase into classical DE to obtain faster convergence and better performance, and applied it to wireless sensor networks to increase network lifetime. Liu et al. [27] combined two multi-parent crossover operators with DE to solve global optimization problems. A good survey of existing PSO-based clustering techniques can be found in [2].

1.2 Motivation

All the existing DE- and PSO-based clustering techniques are found to perform better than the corresponding genetic algorithm-based versions. However, in these earlier attempts, the algorithms used the popular Euclidean distance to assign points to different clusters. As mentioned earlier, symmetry-based measurements [7] are found to perform better than Euclidean distance-based versions in detecting clusters of different shapes and sizes. Thus, incorporating these symmetry-based measurements into DE- and PSO-based clustering frameworks can further improve the quality of the partitions.

In the current paper, we have made an attempt in this direction. Two algorithms based on the search

1.4 Major contributions

The following are the key contributions of the current paper:
• This is the first attempt where some differential evolution or particle swarm optimization-based fuzzy clustering techniques are developed using the properties of symmetry.
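The DE-based clustering techniques reviewed in Section 1.1 all rest on the same mutation, crossover and selection loop. As a hedged illustration, the sketch below implements the textbook DE/rand/1/bin scheme with generational replacement and applies it to a toy clustering objective: each candidate vector concatenates K = 2 one-dimensional cluster centres, and fitness is the within-cluster sum of squared Euclidean distances. This is a deliberate simplification and not the method proposed in this paper, which optimizes the symmetry-based F-Sym-index with an automatically determined number of clusters. The names `de_rand_1_bin` and `sse`, and the settings F = 0.8, CR = 0.9, are illustrative assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple


def de_rand_1_bin(
    objective: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 20,
    F: float = 0.8,
    CR: float = 0.9,
    generations: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimise `objective` with DE/rand/1/bin (generic sketch, pop_size >= 4)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [objective(ind) for ind in pop]
    for _ in range(generations):
        next_pop, next_fit = list(pop), list(fit)
        for i in range(pop_size):
            # Mutation: random base vector plus a scaled difference of two others.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][d] + F * (pop[b][d] - pop[c][d]) for d in range(dim)]
            # Binomial crossover; j_rand guarantees at least one mutant gene.
            j_rand = rng.randrange(dim)
            trial = [
                mutant[d] if d == j_rand or rng.random() < CR else pop[i][d]
                for d in range(dim)
            ]
            # Clip the trial vector back into the search box.
            trial = [min(max(v, lo), hi) for v, (lo, hi) in zip(trial, bounds)]
            # Greedy one-to-one selection against the target vector.
            f_trial = objective(trial)
            if f_trial <= fit[i]:
                next_pop[i], next_fit[i] = trial, f_trial
        pop, fit = next_pop, next_fit
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


def sse(centres: List[float], points: Sequence[float]) -> float:
    """Illustrative objective: within-cluster squared error for 1-D points."""
    return sum(min((p - c) ** 2 for c in centres) for p in points)


if __name__ == "__main__":
    points = [-0.5, 0.0, 0.5, 9.5, 10.0, 10.5]
    best, value = de_rand_1_bin(lambda v: sse(v, points), [(-5.0, 15.0)] * 2)
    print(sorted(best), value)
```

Swapping `sse` for a symmetry-based validity measure, and allowing the encoded number of centres to vary, would move this toy closer to the setting described in Section 1.2; a PSO variant would keep the same encoding but replace the mutation and crossover steps with velocity and position updates.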
