Improving Selection Methods for Evolutionary Algorithms by Clustering

Kaikuo Xu1, Changjie Tang1, Yintian Liu1, Chuan Li1, Jiang Wu1, Jun Zhu2, Li Dai2
1School of Computer Science, Sichuan University, Chengdu, 610065, China
2National Center for Birth Defects Monitoring, Chengdu, 610065, China
{xukaikuo, tangchangjie}@cs.scu.edu.cn

Abstract

This study applies clustering in population selection to improve the efficiency of evolutionary algorithms. The main contributions include: (a) Proposes a novel selection framework that uses the number of clusters for a population as the measurement of the population diversity. (b) Proposes clustering-ranking selection, an instance of this framework, and discusses its mathematical principle through the PD-SP equation. (c) Gives experiments over CLPSO (Comprehensive Learning Particle Swarm Optimization). Experiment results show that the proposed selection method outperforms canonical exponential ranking on all sixteen benchmark functions for both 10-D and 30-D problems, except one function for the 30-D problem.

* Supported partly by the 11th Five Years Key Programs for Science and Technology Development of China under grant No. 2006038002003; Youth Foundation, Sichuan University, 06036: JS20070405506408. Jun Zhu is the associate author.

1. Introduction

Evolutionary data mining is an emerging direction that amalgamates data mining and evolutionary computing, inspired by biological evolution. Unlike data mining using evolutionary algorithms [1], it first uses existing techniques in data mining, such as trend analysis, clustering and frequent pattern mining, to analyze the data generated in the evolution process, and then utilizes the discovered rules to direct the evolution [2-4].

The subject of evolutionary data mining, evolutionary algorithms, has been widely researched and many methods have been proposed. Despite their different representations and genetic operators [1], natural selection is a common and essential part of all of them. The selection process plays a major role in evolutionary algorithms since it determines the direction of search, whereas the other genetic operators propose new search points in an undirected way. In the literature [5], Whitley argues that population diversity and selective pressure are the only two primary factors in genetic search, and that the main problem consists in their inverse relation. A good selection should balance these two primary factors well. Although many selection schemes [6] have been proposed to improve the performance, they do not adapt to the characteristics of the population, hence the performance is still far from the requirements of practice.

In this work, we apply clustering to selection methods to better balance these two primary factors. (a) The selection framework is a combination of clustering [7, 8] and classic selection schemes [6]. It first clusters the population according to the fitness distribution, then carries out selection on the clusters using the classic selection methods, and at last selects individuals from the selected clusters according to a uniform distribution; (b) Ranking selection is chosen to form an instance of the framework. The mathematical analysis shows that since the results of the clustering vary with the evolutionary progress, the selective pressure and population diversity change dynamically; (c) An equation called the PD-SP equation is derived to describe this relation; (d) CLPSO was chosen to optimize 16 benchmark functions in the experiments. We demonstrate that the genetic search under the guidance of clustering-ranking selection yields better results than that of the canonical exponential ranking selection.

The rest of the paper is organized as follows. Section 2 introduces the framework for selection with clustering. Section 3 describes an instance of this framework, the clustering-ranking selection method, and discusses it from a mathematical view. Section 4 gives experiments. The conclusion is given in Section 5.

2. A framework for selection with clustering

2.1. Symbols and Terms

Clustering is an important human activity and it is the heuristic for our selection: in a population, there are many groups in which the fitness of the individuals within a group is similar, while dissimilar to that of individuals in other groups. Therefore, these groups can be treated differently.

Let P = {I1, I2, …, IN} be a population containing N individuals Ii, F(Ii) be the fitness of Ii, si be a non-null subset of P, and S = {s1, s2, …, s2^N−1} be the set of all such subsets. Denote the individual whose fitness is the largest in si as max(si) and the smallest one as min(si).

Definition 1 (Similarity) Let Ii and Ij be two individuals. The similarity between Ii and Ij is given as equation 1.

$$\operatorname{sim}(I_i, I_j) = \frac{\min(F(I_i), F(I_j))}{\max(F(I_i), F(I_j))} \qquad (1)$$
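The following is a minimal Python sketch (ours, not the paper's) of equation (1) and of the threshold test that the definitions below formalize; it assumes strictly positive fitness values, which equation (1) implicitly requires.

```python
def sim(f_i: float, f_j: float) -> float:
    # Equation (1): ratio of the smaller fitness to the larger one.
    return min(f_i, f_j) / max(f_i, f_j)

def is_similar(f_i: float, f_j: float, sigma: float) -> bool:
    # Two individuals are "similar" iff their similarity reaches the
    # user-specified threshold sigma (see Definition 2 below).
    return sim(f_i, f_j) >= sigma

# Example: fitnesses 8.0 and 10.0 give sim = 0.8, similar for any sigma <= 0.8.
```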

Definition 2 (Similar) Let Ii and Ij be two individuals and σ be a user-specified threshold. If sim(Ii, Ij) ≥ σ, then Ii is said to be similar to Ij and vice versa.

Definition 3 (Cluster) Let s be a non-null subset of P. If min(s) is similar to max(s), then s is called a Cluster, and max(s) is the Core of cluster s, denoted as Core(s).

Definition 4 (Maximal Cluster) Let s be a cluster in P. If ∀I∈P and I∉s, I is not similar to the core of s, then s is a Maximal Cluster in P.

The clustering algorithm described in Algorithm 1 consists of two main procedures: Sorting and Clustering. Sorting can be performed in O(nlog(n)) steps by standard techniques. Since the population has been sorted according to fitness in ascending order, we can scan the population in fitness-descending order in Cluster. We find max(P) from P, which is the first element we scan, and remove it from P. Then we find all the elements that are similar to max(P) and remove them from P (lines 10-13). This generates a maximal cluster, which takes max(P) as the core. The step is repeated until P is empty. Because the population is sorted beforehand, the time complexity of Cluster is O(n).

Algorithm 1 Clustering the population
Input: population P
Output: a cluster set C
1. Sort P on fitness (value ascending order)
2. Call Cluster(P)
3. return
Procedure Cluster(P) //P is sorted by fitness
4. C ← ∅
5. count ← 0
6. while P ≠ ∅ do
7.   s ← ∅
8.   scan P and find I = max(P)
9.   s ← s ∪ {I}
10.  remove I from P
11.  while ∃I'∈P and I' is similar to I do
12.    s ← s ∪ {I'}
13.    remove I' from P
14.  s.rank = count
15.  count++
16.  C ← C ∪ {s}
17. for each s∈C do
18.   s.rank = |C| − s.rank
19.   C[s.rank] = s
20. return C

Each cluster is a set of individuals whose attributes in the solution space are similar. Therefore the meaning of |C|, the cardinal number of the output, is clear: it could be a measurement of the population diversity. |C| will vary with the evolution of the population.

2.2. Selection with clustering

The framework for selection with clustering is described in Algorithm 2. The process is composed of three parts, of which ClusterSelection is the key. All the classic selection schemes, such as tournament selection, truncation selection and ranking selection, can be used to carry out this procedure. Different schemes require different information provided by Algorithm 1. For example, roulette wheel selection may require the average fitness of all the individuals inside a cluster. Theoretically this does not change the time complexity of Algorithm 1.

Algorithm 2 Selection with clustering
Input: the population P
Output: the population after selection, P'
1. Clustering the population
2. Call ClusterSelection(C)
3. Call IndividualSelection(C')
4. return
Procedure ClusterSelection(C)
// lines 5-13: implemented by a classic selection scheme (cf. Algorithm 3)
Procedure IndividualSelection(C')
14. P' ← ∅
15. for each s'∈C' do
16.   r ← random[0, |s'|]
17.   P' = P' ∪ s'[r]
18. return P'

According to procedure IndividualSelection, given a cluster s∈C and its selection probability ps, the probability of each individual in s is given by formula 2.

$$p'_j = \frac{p_s}{|s|} \quad (j \in \{1, \dots, |s|\}) \qquad (2)$$

To further illustrate this framework, exponential ranking selection is selected to carry out ClusterSelection. It requires the rank of each cluster, which is provided as described above. This newly formed selection method is called clustering-ranking selection.
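Before turning to that instance, here is an illustrative Python sketch of Algorithm 1 (an assumption-laden sketch, not the authors' code): it assumes a maximization problem with positive fitness values, and exploits the fact that, once the population is sorted, every maximal cluster occupies a contiguous run of the sorted order.

```python
from typing import List

def cluster_population(fitness: List[float], sigma: float) -> List[List[int]]:
    """Algorithm 1 sketch: greedily peel off maximal clusters.

    Returns clusters as index lists, ordered by ascending fitness so that
    the last cluster (rank |C|) holds the best individuals, matching the
    re-ranking step at the end of Algorithm 1.
    """
    order = sorted(range(len(fitness)), key=lambda i: fitness[i], reverse=True)
    clusters: List[List[int]] = []
    pos = 0
    while pos < len(order):
        core = order[pos]            # current max(P) becomes the cluster core
        members = [core]
        pos += 1
        # Absorb every remaining individual similar to the core (Definition 2);
        # the descending order makes these individuals contiguous.
        while pos < len(order) and fitness[order[pos]] / fitness[core] >= sigma:
            members.append(order[pos])
            pos += 1
        clusters.append(members)
    clusters.reverse()               # best cluster last, i.e. highest rank
    return clusters
```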
3. An instance of the selection framework

3.1. Clustering-ranking Selection (CRS)

This selection consists of two steps. It first selects clusters using exponential ranking selection and then selects individuals in the selected clusters according to the uniform distribution. Let q denote the base of the exponent. Given the cluster set C generated by Algorithm 1, the probability of the clusters is given by formula 3.

$$p_i = \frac{q^{|C|-i}}{\sum_{j=1}^{|C|} q^{|C|-j}} \quad (i \in \{1, \dots, |C|\}) \qquad (3)$$

Algorithm 3 Clustering-ranking selection
Input: the population P and the ranking base q ∈ (0, 1]
Output: the population after selection P'
1. Clustering the population
2. Call ClusterSelection(q, C)
3. Call IndividualSelection(C')
4. return
Procedure ClusterSelection(q, C) //C is sorted by fitness
5. C' ← ∅
6. sc_0 ← 0
7. for i ← 1 to |C| do
8.   sc_i ← sc_{i−1} + p_i //(formula 3)
9. for i ← 1 to |C| do
10.  r ← random[0, sc_{|C|}]
11.  s_i' ← s_l such that sc_{l−1} ≤ r < sc_l
12.  C' = C' ∪ {s_i'}
13. return C'

The sum Σ_{j=1}^{|C|} q^{|C|−j} normalizes the probabilities to ensure that Σ_{j=1}^{|C|} p_j = 1. Since Σ_{j=1}^{|C|} q^{|C|−j} = (q^{|C|} − 1)/(q − 1) [6], we can rewrite the above equation as formula 4.

$$p_i = \frac{q-1}{q^{|C|}-1}\, q^{|C|-i} \quad (i \in \{1, \dots, |C|\}) \qquad (4)$$
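A minimal sketch of Algorithm 3 in the same illustrative Python, reusing cluster_population from the sketch above; the q = 1 limit of formula 4 (all clusters equally likely, as formula 3 shows) is handled separately since the closed form becomes 0/0 there.

```python
import random
from typing import List

def clustering_ranking_selection(fitness: List[float], q: float,
                                 sigma: float) -> List[int]:
    """Algorithm 3 sketch: exponential ranking over clusters (formula 4),
    then a uniform pick inside each selected cluster (formula 2)."""
    clusters = cluster_population(fitness, sigma)   # ascending rank, best last
    n = len(clusters)
    if q == 1.0:
        probs = [1.0 / n] * n                       # uniform limit of formula 4
    else:
        probs = [(q - 1.0) / (q ** n - 1.0) * q ** (n - i)
                 for i in range(1, n + 1)]
    survivors = []
    for _ in range(n):                              # one draw per cluster slot
        r = random.random()
        acc = 0.0
        chosen = clusters[-1]                       # fallback against rounding
        for members, p in zip(clusters, probs):
            acc += p
            if r < acc:
                chosen = members
                break
        survivors.append(random.choice(chosen))     # uniform within the cluster
    return survivors
```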

According to formulas 2 and 4, given a population P and its cluster set C, for any I∈P with I∈s, s∈C and s.rank = i, the probability of I is given as formula 5.

$$P(I) = \frac{(q-1)\, q^{|C|-i}}{|s|\,(q^{|C|}-1)} \quad (i \in \{1, \dots, |C|\}) \qquad (5)$$

Algorithm 3 is similar to Algorithm 2. The differences lie in the input and in the calculation of the selection probabilities. The time complexity of Algorithm 3 depends on two parts: Algorithm 1 and the proportionate selection performed in Algorithm 3. Proportionate selection can be performed in something between O(n) and O(n^2). Here we assume that a method no worse than O(nlog(n)) is adopted, concluding that clustering-ranking selection has time complexity O(nlog(n)), which shows that the time complexity of clustering-ranking selection is the same as that of exponential ranking selection [6].
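As a quick sanity check (ours, not the paper's): summing formula 5 over the whole population collapses, cluster by cluster, to the normalized sum of formula 4, so the P(I) do form a probability distribution:

$$\sum_{s \in C}\sum_{I \in s} P(I) = \sum_{i=1}^{|C|} |s_i| \cdot \frac{(q-1)\,q^{|C|-i}}{|s_i|\,(q^{|C|}-1)} = \frac{q-1}{q^{|C|}-1}\sum_{i=1}^{|C|} q^{|C|-i} = 1.$$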

In Algorithm 3, the fitnesses of the individuals in the same cluster are different. Therefore, it is difficult to analyze Algorithm 3 from a mathematical view, and the same holds for the framework adopting other classic selection methods. To simplify the problem for mathematical analysis, we derived a variation of it (Algorithm 4).

3.2. Variation of CRS

Algorithm 4 first transforms the cluster set C into a population of the same length: each core stands for the corresponding cluster. Then exponential ranking selection is carried out on this core population.

Algorithm 4 Variation of clustering-ranking selection
Input: the population P and the ranking base q ∈ (0, 1]
Output: the population after selection P'
1. Clustering the population
2. Call Transform(C)
3. Call ExponentialRankingSelection(q, Pcore)
4. return
Procedure Transform(C) //C is sorted by fitness
5. Pcore ← ∅
6. for each s∈C
7.   Pcore = Pcore ∪ core(s)
8. return Pcore

According to the definition in [10], it is clear that many of the properties of Algorithm 3 are the same as those of Algorithm 2, except the expected average fitness after selection (M*). Therefore, the properties related to M* are different. We concentrate on analyzing this difference between Algorithm 2 and Algorithm 3.

Theorem 1 For the same population P, let M*_A2 be the expected average fitness after performing Algorithm 2 on P and M*_A3 be the value for Algorithm 3; then MIN(M*_A2 / M*_A3) = σ. (Proof skipped.)

Theorem 1 shows that the degree of the difference depends on σ. As σ approaches 1, Algorithm 3 and Algorithm 2 become more and more similar. According to our observation, the real ratio M*_A2 / M*_A3 gets much closer to 1 than σ does. Thus, we discuss the properties of Algorithm 4 in more detail. The whole analysis is based on the work in [6]: we derive the following theorems from the theorems that describe the properties of exponential ranking selection.

Theorem 2 The loss of diversity p_{d,E}(|C|) of clustering-ranking selection is

$$p_{d,E}(|C|) = 1 - \frac{a-1}{a \ln a} - \frac{1}{\ln a}\ln\frac{a-1}{\ln a} = 1 - \frac{q^{|C|}-1}{|C|\,q^{|C|}\ln q} - \frac{1}{|C|\ln q}\ln\frac{q^{|C|}-1}{|C|\ln q}, \quad a = q^{|C|}. \qquad (6)$$

The proof is skipped. Figure 1 depicts the relationship between loss of diversity and population diversity for different q.

Fig. 1. Loss of diversity p_{d,E}(|C|) versus population diversity |C|, for q = 0.90, 0.92 and 0.95.

PD-SP equation: The selection pressure I_E(|C|) of clustering-ranking selection is

$$I_E(|C|) = 0.558\,\frac{\pi \ln \ln a}{3.69\,a} = 0.558\,\frac{\pi \ln \ln q^{|C|}}{3.69\,q^{|C|}} \qquad (7)$$

Figure 2 depicts the relationship between selection pressure and population diversity for different q.

Fig. 2. Selection pressure I_E(|C|) versus population diversity |C|, for q = 0.90, 0.92 and 0.95.

Figures 1 and 2 show that both p_{d,E}(|C|) and I_E(|C|) increase with increasing |C| and vice versa. This is what we expect: when the population diversity is high, we need to decrease it to slow down the speed of convergence; when the population diversity is low, we need to increase it to speed up the evolution. The population diversity, |C|, can be taken as the "context", and the properties of clustering-ranking selection always adapt to the context. That is why we call it "a context-aware selection method".
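For completeness, here is a sketch of Algorithm 4's Transform step in the same illustrative Python as above; the subsequent ExponentialRankingSelection call is any standard implementation applied to the resulting core population.

```python
from typing import List

def transform(clusters: List[List[int]], fitness: List[float]) -> List[int]:
    """Algorithm 4's Transform: replace each cluster by its core, i.e. its
    fittest member, yielding a core population of length |C| on which
    exponential ranking selection can then be run."""
    return [max(members, key=lambda i: fitness[i]) for members in clusters]
```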

From Figures 1 and 2, we also see that the smaller q is, the bigger both p_{d,E}(|C|) and I_E(|C|) are. Therefore we can control the evolution process manually by tuning the value of q.

4. Experiment evaluation

CLPSO [9], a variant of the particle swarm optimizer, is used in this evaluation. Experiments were conducted to compare CLPSO with the classic exponential ranking selection (ERS) against CLPSO with the proposed clustering-ranking selection (CRS). The parameter settings are listed in Table 1. The test functions are the same as those in [9].

Table 1. Parameter settings for the experiments
ID | Dimension | Population Size | FEs    | σ
1  | 10        | 10              | 30000  | {0.96, 0.98, 0.99, 1}
2  | 30        | 40              | 200000 | {0.96, 0.98, 0.99, 1}

Table 2 presents the means and variances over 30 runs of CLPSO with ERS and with CRS on the 16 test functions for the 10-D problems. The best results among the two selection methods are shown in bold. CRS surpasses ERS on most functions and especially improves the results on functions 8 and 15. On functions 5 and 6, their performances are comparable: both successfully achieve the global minimum in each run. On the remaining functions, CRS (σ = 0.98 or 0.99) performs best on only 2, far fewer than CRS (σ = 1 or 0.96). On function 14, they all failed to solve the problem. On Schwefel's function, CRS (σ = 1) succeeds in avoiding the deep local optimum which is far from the global optimum. CRS (σ = 0.96) is the best among the four: it surpasses the others on 7 functions.

The experiment conducted on the 10-D problems is repeated on the 30-D problems, and the results are presented in Table 3. We can observe that the relation between CRS and ERS is similar. CRS surpasses ERS on all functions except function 3, and significantly improves the results on functions 15 and 16. But this time, CRS (σ = 0.98 or 0.99) performs best on 5 functions, which is worthy of attention. CRS (σ = 0.96) is again the best among the four, instead of CRS (σ = 1): it surpasses the others on 7 functions. Although CRS (σ = 0.96) did not perform as well as on the 10-D problems, it also performed well at solving the composition functions.
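The comparison protocol above can be summarized in a short harness sketch; clpso_optimize and the benchmark list are placeholders we assume for illustration, since the paper does not reproduce the CLPSO implementation or the 16 test functions of [9].

```python
import statistics

def clpso_optimize(func, dim, pop_size, max_fes, selection):
    """Placeholder for one CLPSO run returning the best error found; the
    optimizer and its selection hook are assumed, not given in the paper."""
    raise NotImplementedError

benchmarks = []  # the 16 test functions of [9], groups A-D as in Tables 2-3

def compare(dim, pop_size, max_fes, runs=30):
    settings = [("ERS", None)] + [("CRS", s) for s in (0.96, 0.98, 0.99, 1.0)]
    for func in benchmarks:
        for name, sigma in settings:
            errors = [clpso_optimize(func, dim, pop_size, max_fes, (name, sigma))
                      for _ in range(runs)]
            print(name, sigma, statistics.mean(errors), statistics.stdev(errors))

# Settings from Table 1 (wire in real implementations first):
# compare(dim=10, pop_size=10, max_fes=30000)    # ID 1
# compare(dim=30, pop_size=40, max_fes=200000)   # ID 2
```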

Table 2. Results on 10-D (mean ± standard deviation over 30 runs)

Functions 1-2 (Group A), 3-4 (Group B):
Selection    | 1                   | 2                   | 3                   | 4
ERS          | 2.41e-023±7.11e-023 | 2.56e+000±1.75e+000 | 4.71e-013±4.88e-013 | 1.10e-002±1.13e-002
CRS (σ=1)    | 3.87e-024±8.03e-024 | 2.19e+000±1.54e+000 | 5.44e-013±4.56e-013 | 1.04e-002±7.06e-003
CRS (σ=0.99) | 4.62e-024±1.16e-023 | 2.45e+000±1.89e+000 | 4.18e-013±3.55e-013 | 1.03e-002±9.38e-003
CRS (σ=0.98) | 1.62e-024±4.04e-024 | 2.21e+000±1.51e+000 | 3.96e-013±3.15e-013 | 9.87e-003±8.99e-003
CRS (σ=0.96) | 4.69e-024±1.19e-023 | 1.82e+000±1.39e+000 | 3.30e-013±2.30e-013 | 8.67e-003±8.83e-003

Functions 5-8 (Group B):
Selection    | 5   | 6   | 7                   | 8
ERS          | 0±0 | 0±0 | 1.00e-001±3.05e-001 | 1.97e+001±4.49e+001
CRS (σ=1)    | 0±0 | 0±0 | 6.67e-002±2.54e-001 | 1.27e-004±3.91e-013
CRS (σ=0.99) | 0±0 | 0±0 | 1.00e-001±3.05e-001 | 3.94e+000±2.16e+001
CRS (σ=0.98) | 0±0 | 0±0 | 6.67e-002±2.54e-001 | 7.90e+000±3.00e+001
CRS (σ=0.96) | 0±0 | 0±0 | 6.67e-002±2.54e-001 | 7.90e+000±3.00e+001

Functions 9-12 (Group C):
Selection    | 9                   | 10                  | 11                  | 12
ERS          | 3.89e-005±1.37e-004 | 4.11e-002±2.53e-002 | 6.64e-001±5.83e-001 | 5.96e+000±2.72e+000
CRS (σ=1)    | 8.71e-005±3.88e-004 | 4.17e-002±2.97e-002 | 4.71e-001±4.28e-001 | 5.69e+000±2.56e+000
CRS (σ=0.99) | 1.97e-005±6.31e-005 | 3.82e-002±2.48e-002 | 6.77e-001±6.79e-001 | 6.45e+000±2.67e+000
CRS (σ=0.98) | 8.39e-005±3.31e-004 | 4.07e-002±3.03e-002 | 6.18e-001±6.43e-001 | 6.58e+000±2.79e+000
CRS (σ=0.96) | 1.73e-005±5.15e-005 | 3.75e-002±2.44e-002 | 6.49e-001±7.30e-001 | 6.71e+000±1.97e+000

Functions 13-14 (Group C), 15-16 (Group D):
Selection    | 13                  | 14                  | 15                  | 16
ERS          | 5.18e+000±1.69e+000 | 4.37e+002±2.38e+002 | 1.82e+001±3.74e+001 | 3.47e+001±3.91e+001
CRS (σ=1)    | 5.07e+000±2.01e+000 | 4.61e+002±2.81e+002 | 9.12e-001±3.40e+000 | 3.89e+001±3.53e+001
CRS (σ=0.99) | 5.13e+000±2.02e+000 | 4.13e+002±2.69e+002 | 9.04e+000±2.68e+001 | 3.61e+001±4.05e+001
CRS (σ=0.98) | 5.00e+000±2.00e+000 | 4.77e+002±2.10e+002 | 3.33e+000±1.83e+001 | 3.27e+001±3.00e+001
CRS (σ=0.96) | 5.84e+000±1.81e+000 | 4.53e+002±2.08e+002 | 1.45e-002±6.08e-002 | 2.82e+001±3.19e+001

Table 3. Results on 30-D (mean ± standard deviation over 30 runs)

Functions 1-2 (Group A), 3-4 (Group B):
Selection    | 1                   | 2                   | 3                   | 4
ERS          | 4.68e-011±4.05e-011 | 1.74e+001±5.88e+000 | 4.86e-006±2.43e-006 | 3.36e-007±1.22e-006
CRS (σ=1)    | 4.15e-011±3.54e-011 | 1.63e+001±4.68e+000 | 5.00e-006±2.07e-006 | 9.28e-008±2.90e-007
CRS (σ=0.99) | 6.51e-011±4.92e-011 | 1.47e+001±4.35e+000 | 5.02e-006±2.07e-006 | 5.90e-008±1.90e-007
CRS (σ=0.98) | 4.44e-011±3.10e-011 | 1.68e+001±5.19e+000 | 5.88e-006±3.19e-006 | 7.97e-008±1.46e-007
CRS (σ=0.96) | 6.28e-011±5.30e-011 | 1.64e+001±4.49e+000 | 6.43e-006±3.39e-006 | 1.62e-008±4.52e-007

Functions 5-8 (Group B):
Selection    | 5                   | 6                   | 7                   | 8
ERS          | 1.08e-004±3.24e-005 | 2.12e-007±3.60e-007 | 1.23e-006±1.98e-006 | 3.84e-004±4.21e-012
CRS (σ=1)    | 1.02e-004±3.15e-005 | 1.53e-007±1.27e-007 | 5.34e-007±4.74e-007 | 3.81e-004±2.85e-012
CRS (σ=0.99) | 1.03e-004±3.29e-005 | 1.67e-007±2.02e-007 | 8.55e-007±8.61e-007 | 3.82e-004±3.81e-012
CRS (σ=0.98) | 1.17e-004±2.98e-005 | 2.49e-007±3.77e-007 | 1.64e-006±2.20e-006 | 3.82e-004±5.58e-012
CRS (σ=0.96) | 1.23e-004±3.90e-005 | 3.05e-007±3.46e-007 | 1.72e-006±1.98e-006 | 3.81e-004±1.29e-012

Functions 9-12 (Group C):
Selection    | 9                   | 10                  | 11                  | 12
ERS          | 2.68e-001±2.86e-001 | 4.41e-004±4.36e-004 | 6.25e+000±1.89e+000 | 4.65e+001±8.62e+000
CRS (σ=1)    | 1.89e-001±2.50e-001 | 3.75e-004±3.76e-004 | 6.11e+000±1.28e+000 | 4.05e+001±7.30e+000
CRS (σ=0.99) | 2.97e-001±3.16e-001 | 3.85e-004±6.32e-004 | 6.66e+000±1.52e+000 | 4.27e+001±8.40e+000
CRS (σ=0.98) | 2.99e-001±2.83e-001 | 3.19e-004±3.40e-004 | 6.66e+000±1.58e+000 | 4.29e+001±8.91e+000
CRS (σ=0.96) | 3.45e-001±2.95e-001 | 3.58e-004±3.71e-004 | 7.29e+000±1.82e+000 | 4.17e+001±8.48e+000

Functions 13-14 (Group C), 15-16 (Group D):
Selection    | 13                  | 14                  | 15                  | 16
ERS          | 3.75e+001±6.53e+000 | 2.35e+003±4.18e+002 | 1.17e-001±4.34e-001 | 2.02e+000±1.85e+000
CRS (σ=1)    | 3.60e+001±7.95e+000 | 2.34e+003±4.25e+002 | 3.48e-002±1.53e-001 | 9.20e-001±1.64e+000
CRS (σ=0.99) | 3.69e+001±6.31e+000 | 2.31e+003±5.00e+002 | 1.63e-003±8.77e-003 | 3.29e-001±2.98e-001
CRS (σ=0.98) | 3.54e+001±6.20e+000 | 2.35e+003±4.38e+002 | 1.67e-005±3.22e-005 | 4.07e-001±5.38e-001
CRS (σ=0.96) | 3.66e+001±6.34e+000 | 2.38e+003±5.17e+002 | 1.36e-005±3.36e-005 | 5.88e-001±5.31e-001

5. Conclusions

In this paper, a selection framework utilizing a clustering strategy is proposed. It can adapt itself to the evolution of the population. We applied a concrete selection scheme within this framework to optimize some difficult benchmark functions using CLPSO and obtained better performance, at least for the given conditions. It should be clear that, compared with other alternative selection methods that aim to better balance exploration and exploitation, the proposed framework is a general one, and it can be readily applied to other evolutionary algorithms, such as GEP and GAs.

6. References

[1] Bäck, T.: Evolutionary Algorithms in Theory and Practice, Oxford University Press, New York, 1996.
[2] Bing Liu, Chunru Wan, and L.P. Wang: An efficient semi-unsupervised gene selection method via spectral biclustering, IEEE Transactions on Nano-Bioscience, vol. 5, no. 2, pp. 110-114, June 2006.
[3] Tseng, V.S., Kao, C.-P.: Efficiently mining gene expression data via a novel parameterless clustering method, IEEE/ACM Transactions on Computational Biology and Bioinformatics 2 (2005) 355-365.
[4] Fu, X.J., Wang, L.P.: Data dimensionality reduction with application to simplifying RBF network structure and improving classification performance, IEEE Trans. Systems, Man, and Cybernetics, Part B: Cybernetics 33 (2003) 399-409.
[5] Darrell Whitley: The GENITOR algorithm and selection pressure: why rank-based allocation of reproductive trials is best, in J. David Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms, pages 116-121, San Mateo, CA, 1989. Morgan Kaufmann Publishers.
[6] Blickle, T., Thiele, L.: A Comparison of Selection Schemes Used in Genetic Algorithms (2nd edition), TIK Report No. 11, Computer Engineering and Communication Networks Lab (TIK), Swiss Federal Institute of Technology (ETH) Zürich, Switzerland, 1995.
[7] Lee, J., Lee, D.: An improved cluster labeling method for support vector clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (2005) 461-464.
[8] Jaewook Lee, Daewon Lee: Dynamic characterization of cluster structures for robust and inductive support vector clustering, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, pp. 1869-1874, 2006.
[9] J. J. Liang, A. K. Qin, P. N. Suganthan, and S. Baskar: Comprehensive learning particle swarm optimizer for global optimization of multimodal functions, IEEE Transactions on Evolutionary Computation, 10(3): 281-295, 2006.
