
Improved Step Size Adaptation for the MO-CMA-ES

Thomas Voß, Nikolaus Hansen, Christian Igel

To cite this version: Thomas Voß, Nikolaus Hansen, Christian Igel. Improved Step Size Adaptation for the MO-CMA-ES. Genetic and Evolutionary Computation Conference (GECCO 2010), Jul 2010, Portland, United States. pp. 487-494, DOI 10.1145/1830483.1830573. HAL Id: hal-00503251, https://hal.archives-ouvertes.fr/hal-00503251, submitted on 18 Jul 2010.


Thomas Voß, Institut für Neuroinformatik, Ruhr-Universität Bochum, 44780 Bochum, Germany ([email protected])
Nikolaus Hansen, Université de Paris-Sud, Centre de recherche INRIA Saclay – Île-de-France, F-91405 Orsay Cedex, France ([email protected])
Christian Igel, Institut für Neuroinformatik, Ruhr-Universität Bochum, 44780 Bochum, Germany ([email protected])

ABSTRACT

The multi-objective covariance matrix adaptation evolution strategy (MO-CMA-ES) is an evolutionary algorithm for continuous vector-valued optimization. It combines indicator-based selection based on the contributing hypervolume with the efficient strategy parameter adaptation of the elitist covariance matrix adaptation evolution strategy (CMA-ES). Step sizes (i.e., mutation strengths) are adapted on individual-level using an improved implementation of the 1/5-th success rule. In the original MO-CMA-ES, a mutation is regarded as successful if the offspring ranks better than its parent in the elitist, rank-based selection procedure. In contrast, we propose to regard a mutation as successful if the offspring is selected into the next parental population. This criterion is easier to implement and reduces the computational complexity of the MO-CMA-ES, in particular of its steady-state variant. The new step size adaptation improves the performance of the MO-CMA-ES as shown empirically using a large set of benchmark functions. The new update scheme in general leads to larger step sizes and thereby counteracts premature convergence. The experiments comprise the first evaluation of the MO-CMA-ES for problems with more than two objectives.

Categories and Subject Descriptors
G.1.6 [Optimization]: Global Optimization; I.2.8 [Problem Solving, Control Methods, and Search]: Heuristic methods

General Terms
Algorithms, Performance

Keywords
multi-objective optimization, step size adaptation, covariance matrix adaptation, evolution strategy, MO-CMA-ES

1. INTRODUCTION

The multi-objective covariance matrix adaptation evolution strategy (MO-CMA-ES, [14, 16, 19]) is an extension of the CMA-ES [12, 11] for real-valued multi-objective optimization. It combines the mutation and strategy adaptation of the (1+1)-CMA-ES [14, 15, 19] with a multi-objective selection procedure based on non-dominated sorting [6] and the contributing hypervolume [2] acting on a population of individuals.

In the MO-CMA-ES, step sizes (i.e., mutation strengths) are adapted on individual-level. The step size update procedure originates in the well-known 1/5-th success rule originally presented by [18] and extended by [17]. If the success rate, that is, the fraction of successful mutations, is high, the step size is increased, otherwise it is decreased. In the original MO-CMA-ES, a mutation is regarded as successful if the resulting offspring is better than its parent. In this study, we propose to replace this criterion and to consider a mutation as being successful if the offspring becomes a member of the next parent population. We argue that this notion of success is easier to implement, computationally less demanding, and improves the performance of the MO-CMA-ES.

In the next section, we briefly review the MO-CMA-ES. In Sec. 3, we discuss our new notion of success for the step size adaptation. Then, we empirically evaluate the resulting algorithm. In this evaluation, the MO-CMA-ES is for the first time benchmarked on functions with more than two objectives. As a baseline, we consider a new variant of the NSGA-II, in which the crowding distance is replaced by the contributing hypervolume for sorting individuals at the same level of non-dominance.

2. THE MO-CMA-ES

In the following, we briefly outline the MO-CMA-ES according to [14, 16, 19], see Algorithm 1. For a detailed description and a performance evaluation on bi-objective benchmark functions we refer to [14, 21]. We consider objective functions $f\colon \mathbb{R}^n \to \mathbb{R}^m,\; x \mapsto (f_1(x), \dots, f_m(x))^T$.
In the MO-CMA-ES, a candidate solution $a_i^{(g)}$ in generation $g$ is a tuple $\bigl[x_i^{(g)}, \bar p_{\mathrm{succ},i}^{(g)}, \sigma_i^{(g)}, p_{i,c}^{(g)}, C_i^{(g)}\bigr]$, where $x_i^{(g)} \in \mathbb{R}^n$ is the current search point, $\bar p_{\mathrm{succ},i}^{(g)} \in [0,1]$ is the smoothed success probability, $\sigma_i^{(g)} \in \mathbb{R}^+_0$ is the global step size, $p_{i,c}^{(g)} \in \mathbb{R}^n$ is the cumulative evolution path, and $C_i^{(g)} \in \mathbb{R}^{n \times n}$ is the covariance matrix of the search distribution. For an individual $a$ encoding search point $x$, we write $f(a)$ for $f(x)$ with a slight abuse of notation.

Algorithm 1: (µ+λ)-MO-CMA-ES

     1  g ← 0, initialize parent population Q^(0)
     2  repeat
     3      for k = 1, …, λ do
    4a          i_k ← U(1, |ndom(Q^(g))|)        (steady-state case, λ ≠ µ; see below)
    4b          i_k ← k                           (generational case, λ = µ)
     5          a'_k^(g+1) ← a_{i_k}^(g)
     6          x'_k^(g+1) ∼ x_{i_k}^(g) + σ_{i_k}^(g) N(0, C_{i_k}^(g))
     7          Q^(g) ← Q^(g) ∪ {a'_k^(g+1)}
     8      for k = 1, …, λ do
     9          p̄'_{succ,k}^(g+1) ← (1 − c_p) p̄'_{succ,k}^(g+1) + c_p succ_{Q^(g)}(a_{i_k}^(g), a'_k^(g+1))
    10          σ'_k^(g+1) ← σ'_k^(g+1) · exp((1/d) (p̄'_{succ,k}^(g+1) − p_succ^target)/(1 − p_succ^target))
    11          if p̄'_{succ,k}^(g+1) < p_thresh then
    12              p'_{c,k}^(g+1) ← (1 − c_c) p'_{c,k}^(g+1) + √(c_c (2 − c_c)) (x'_k^(g+1) − x_{i_k}^(g)) / σ_{i_k}^(g)
    13              C'_k^(g+1) ← (1 − c_cov) C'_k^(g+1) + c_cov p'_{c,k}^(g+1) (p'_{c,k}^(g+1))^T
    14          else
    15              p'_{c,k}^(g+1) ← (1 − c_c) p'_{c,k}^(g+1)
    16              C'_k^(g+1) ← (1 − c_cov) C'_k^(g+1) + c_cov (p'_{c,k}^(g+1) (p'_{c,k}^(g+1))^T + c_c (2 − c_c) C'_k^(g+1))
    17          p̄_{succ,i_k}^(g) ← (1 − c_p) p̄_{succ,i_k}^(g) + c_p succ_{Q^(g)}(a_{i_k}^(g), a'_k^(g+1))
    18          σ_{i_k}^(g) ← σ_{i_k}^(g) · exp((1/d) (p̄_{succ,i_k}^(g) − p_succ^target)/(1 − p_succ^target))
    19      g ← g + 1
    20      Q^(g) ← {Q_{≺:i}^(g−1) | 1 ≤ i ≤ µ}
        until stopping criterion is met

We first describe the general ranking procedure and summarize the other parts of the MO-CMA-ES. The MO-CMA-ES relies on the non-dominated sorting selection scheme [6]. As in the SMS-EMOA [2], the hypervolume-indicator serves as second-level sorting criterion to rank individuals at the same level of non-dominance. Let $A$ be a population, and let $a, a'$ be two individuals in $A$. Let the non-dominated solutions in $A$ be denoted by $\mathrm{ndom}(A) = \{a \in A \mid \nexists a' \in A\colon a' \prec a\}$, where $\prec$ denotes the Pareto-dominance relation. The elements in $\mathrm{ndom}(A)$ are assigned a level of non-dominance of 1. The other ranks of non-dominance are defined recursively by considering the set $A$ without the solutions with lower ranks [6]. Formally, let $\mathrm{dom}_0(A) = A$, $\mathrm{dom}_l(A) = \mathrm{dom}_{l-1}(A) \setminus \mathrm{ndom}_l(A)$, and $\mathrm{ndom}_l(A) = \mathrm{ndom}(\mathrm{dom}_{l-1}(A))$ for $l \ge 1$. For $a \in A$ we define the level of non-dominance $\mathrm{rank}(a, A)$ to be $i$ iff $a \in \mathrm{ndom}_i(A)$.

The hypervolume measure or $\mathcal{S}$-metric was introduced in the domain of evolutionary multi-objective optimization (MOO) in [26]. It is defined as

$$\mathcal{S}_{f^{\mathrm{ref}}}(A) = \Lambda\Bigl(\,\bigcup_{a \in A} \bigl[f_1(a), f_1^{\mathrm{ref}}\bigr] \times \dots \times \bigl[f_m(a), f_m^{\mathrm{ref}}\bigr]\Bigr) \qquad (1)$$

with $f^{\mathrm{ref}} \in \mathbb{R}^m$ referring to an appropriately chosen reference point and $\Lambda(\cdot)$ being the Lebesgue measure. The contributing hypervolume of a point $a \in A' = \mathrm{ndom}(A)$ is given by

$$\Delta_{\mathcal{S}}(a, A') = \mathcal{S}_{f^{\mathrm{ref}}}(A') - \mathcal{S}_{f^{\mathrm{ref}}}(A' \setminus \{a\}). \qquad (2)$$
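For concreteness, the following minimal Python sketch (our illustration, not the paper's implementation, which builds on the Shark library [13]) evaluates the S-metric of Eq. (1) and the contribution of Eq. (2) for two minimization objectives; for m = 2 the union in Eq. (1) can be accumulated by a sweep over the front, and all function names here are ours:

    def hypervolume_2d(points, ref):
        """S-metric (Eq. 1) for two minimization objectives; sketch for m = 2."""
        front, best_f2 = [], float("inf")
        for f1, f2 in sorted(set(map(tuple, points))):
            if f2 < best_f2:          # sorted by f1, so this keeps the non-dominated front
                front.append((f1, f2))
                best_f2 = f2
        hv, prev_f2 = 0.0, ref[1]
        for f1, f2 in front:          # sweep left to right, adding one slab per point
            hv += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
        return hv

    def contributing_hv(a, front, ref):
        """Contributing hypervolume Delta_S(a, A') from Eq. (2)."""
        rest = [p for p in front if p != a]
        return hypervolume_2d(front, ref) - hypervolume_2d(rest, ref)

    # example: two mutually non-dominated points
    front = [(1.0, 3.0), (2.0, 1.0)]
    print(hypervolume_2d(front, ref=(4.0, 5.0)))           # 10.0
    print(contributing_hv((2.0, 1.0), front, (4.0, 5.0)))  # 4.0

For more than two objectives the computation is considerably harder; this is taken up in Sec. 2.2.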
Now we define the contribution rank $\mathrm{cont}(a, A')$ of $a$. This is again done recursively. The element, say $a$, with the smallest contributing hypervolume is assigned contribution rank 1. The next rank is assigned by considering $A' \setminus \{a\}$, etc. More precisely, let $c_0(A') = \operatorname{argmin}_{a \in A'} \Delta_{\mathcal{S}}(a, A')$ and

$$c_i(A') = c_0\Bigl(A' \setminus \bigcup_{j=0}^{i-1} \{c_j(A')\}\Bigr) \qquad (3)$$

for $i > 0$. For $a \in A'$ we define the contribution rank $\mathrm{cont}(a, A')$ to be $i + 1$ iff $a = c_i(A')$. In the ranking procedure ties are broken at random. We refer to the points $\{a \mid a = \operatorname{argmin}_{a \in A} f_i(a),\ i = 1, \dots, m\}$ as boundary elements of $A$. The reference point $f^{\mathrm{ref}}$ is chosen in each iteration such that an individual with fitness $f^{\mathrm{ref}}$ would be dominated by all individuals in the current population and such that all boundary elements get the highest contribution ranks (i.e., the boundary elements are always selected). Such a reference point always exists, is easy to find, and its exact choice does not matter for the MO-CMA-ES as long as the boundary elements get the highest ranks.

Finally, the following relation between individuals $a, a' \in A$ is defined:

$$a \prec_A a' \;\Leftrightarrow\; \mathrm{rank}(a, A) < \mathrm{rank}(a', A) \,\vee\, \bigl[\mathrm{rank}(a, A) = \mathrm{rank}(a', A) \,\wedge\, \mathrm{cont}(a, \mathrm{ndom}(A)) > \mathrm{cont}(a', \mathrm{ndom}(A))\bigr] \qquad (4)$$
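Building on the sketch above, the contribution ranks of Eq. (3) and the relation of Eq. (4) can be written as follows. This is again our own illustrative code: hypervolume_2d and contributing_hv are the hypothetical helpers from the previous listing, and ties are broken at random as stated in the text.

    import random

    def contribution_ranks(front, ref):
        """Contribution ranks (Eq. 3): rank 1 goes to the smallest contributor,
        which is removed before the next rank is assigned; ties broken at random."""
        remaining, ranks = list(front), {}
        for rank in range(1, len(front) + 1):
            least = min(remaining,
                        key=lambda a: (contributing_hv(a, remaining, ref), random.random()))
            ranks[least] = rank
            remaining.remove(least)
        return ranks

    def precedes(a, b, rank_nd, cont):
        """The relation a <_A b of Eq. (4): lower level of non-dominance wins;
        on the same level, the larger contribution rank (larger contribution) wins."""
        return (rank_nd[a] < rank_nd[b]) or \
               (rank_nd[a] == rank_nd[b] and cont[a] > cont[b])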
The success indicator $\mathrm{succ}_{Q^{(g)}}\bigl(a_{i_k}^{(g)}, a_k'^{(g+1)}\bigr)$ in Algorithm 1 evaluates to one if the mutation that has created $a_k'^{(g+1)}$ is considered to be successful and to zero otherwise, see Sec. 3.

In each generation, λ offspring individuals are sampled (lines 3-7). If λ does not equal µ (e.g., in case of the steady-state MO-CMA-ES), the parent is chosen uniformly at random from the set of non-dominated individuals $\mathrm{ndom}(Q^{(g)})$ (line 4a). Otherwise, one offspring individual is created from every parent individual (line 4b). Thereafter, the strategy parameters of parent and offspring individuals are adapted (lines 9-18). The decision whether a new candidate solution is better than its parent is made in the context of the population $Q^{(g)}$ of parent and offspring individuals subject to the indicator-based selection strategy implemented in the algorithm. The step sizes and the covariance matrix of the offspring individuals are updated (lines 9-16). Subsequently, the step sizes $\sigma_{i_k}^{(g)}$ of the parent individuals $a_{i_k}^{(g)}$ are adapted (lines 17 and 18). Finally, the new parent population is selected from the set of parent and offspring individuals according to the indicator-based selection scheme (line 20). Here, $Q_{\prec:i}^{(g)}$ denotes the $i$th best individual in $Q^{(g)}$ ranked by non-dominated sorting and the contributing hypervolume according to Eq. (4).

When in this study the MO-CMA-ES is applied to a benchmark problem $f$ with box constraints, we consider a penalized fitness function

$$f^{\mathrm{penalty}}(x) = f(\mathrm{feasible}(x)) + \alpha\,\|x - \mathrm{feasible}(x)\|_2^2, \qquad (5)$$

where $\mathrm{feasible}(x)$ returns the closest feasible point to $x$ w.r.t. the $L^1$-norm.

The (external) strategy parameters are the population size, initial global step size, target success probability $p_{\mathrm{succ}}^{\mathrm{target}}$, step size damping $d$, success rate averaging parameter $c_p$, cumulation time horizon parameter $c_c$, and covariance matrix learning rate $c_{\mathrm{cov}}$. Default values as given in [14] and used in this paper are $d = 1 + n/2$, $p_{\mathrm{succ}}^{\mathrm{target}} = \bigl(5 + \sqrt{1/2}\bigr)^{-1}$, $c_p = p_{\mathrm{succ}}^{\mathrm{target}} / \bigl(2 + p_{\mathrm{succ}}^{\mathrm{target}}\bigr)$, $c_c = 2/(n+2)$, $c_{\mathrm{cov}} = 2/(n^2+6)$, and $p_{\mathrm{thresh}} = 0.44$. In the constraint handling we set $\alpha = 10^{-6}$. The initial global step sizes $\sigma_i^{(0)}$ are set dependent on the problem (e.g., in the case of box constraints, see below, with $x_i^u - x_i^l = x_j^u - x_j^l$ for $1 \le i, j \le n$, to $0.6 \cdot (x_1^u - x_1^l)$).

2.1 Step Size Update Procedure

The focus of this study is the step size adaptation, which is described in more detail in the following. After sampling the new candidate solutions, the step size of parent $a$ is updated based on the smoothed success rate (different notions of success are discussed in Sec. 3)

$$\bar p_{\mathrm{succ}} = (1 - c_p)\,\bar p_{\mathrm{succ}} + c_p\,\mathrm{succ}_Q(a, a'), \qquad (6)$$

with a learning rate $c_p$ ($0 < c_p \le 1$), according to

$$\sigma = \sigma \cdot \exp\left(\frac{1}{d}\,\frac{\bar p_{\mathrm{succ}} - p_{\mathrm{succ}}^{\mathrm{target}}}{1 - p_{\mathrm{succ}}^{\mathrm{target}}}\right). \qquad (7)$$

The update rule is rooted in the 1/5-th success rule proposed by [18] and is an extension of the rule proposed by [17]. It implements the well-known heuristic that the step size should be increased if the success rate (i.e., the fraction of offspring better than the parent) is high, and that the step size should be decreased if the success rate is low. The rule is reflected in the argument of the exponential function. For $\bar p_{\mathrm{succ}} > p_{\mathrm{succ}}^{\mathrm{target}}$ the argument is greater than zero and the step size increases; for $\bar p_{\mathrm{succ}} < p_{\mathrm{succ}}^{\mathrm{target}}$ the argument is smaller than zero and the step size decreases; for $\bar p_{\mathrm{succ}} = p_{\mathrm{succ}}^{\mathrm{target}}$ the argument becomes zero and no change of $\sigma$ takes place. The argument of the exponential function is always smaller than $1/d$ and larger than $-1/d$ if $p_{\mathrm{succ}}^{\mathrm{target}} < 0.5$ (a necessary assumption). Therefore, the damping parameter $d$ controls the rate of the step size adaptation. Using $\bar p_{\mathrm{succ}}$ instead of $\mathrm{succ}_{Q^{(g)}}\bigl(a_i^{(g)}, a_i'^{(g+1)}\bigr)$ primarily smooths the single step size changes.
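A single application of Eqs. (6) and (7), with the default parameter settings quoted above from [14], can be sketched as follows (our code, not the original implementation; success is the 0/1 value of the chosen success indicator):

    import math

    def update_step_size(sigma, p_succ_bar, success, n):
        """One application of Eqs. (6) and (7) with the defaults from [14]."""
        p_target = 1.0 / (5.0 + math.sqrt(0.5))   # target success probability
        c_p = p_target / (2.0 + p_target)         # success rate averaging parameter
        d = 1.0 + n / 2.0                         # step size damping
        p_succ_bar = (1.0 - c_p) * p_succ_bar + c_p * success                # Eq. (6)
        sigma *= math.exp((p_succ_bar - p_target) / (d * (1.0 - p_target)))  # Eq. (7)
        return sigma, p_succ_bar

A run of unsuccessful mutations drives $\bar p_{\mathrm{succ}}$ below $p_{\mathrm{succ}}^{\mathrm{target}}$ and shrinks σ, successes raise it and grow σ, and the damping $d$ bounds the per-update change of log σ by $1/d$.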
2.2 Hypervolume Computations

Computing the contributing hypervolume (see Eq. 2) is computationally demanding [24, 23, 3, 1]. In fact, calculating the contributing hypervolume is #P-hard (see [5]), where #P is the analog of NP for counting problems (see [20]). Thus, in an efficient implementation of the MO-CMA-ES this should be done as rarely as possible. For selection, it is not necessary to rank all µ + λ individuals. It is sufficient to determine the λ worst. In addition, only if there is a need to pick $m \le \lambda$ individuals from the same level of non-dominance, say from $\mathrm{ndom}_l(A)$ with $k = |\mathrm{ndom}_l(A)|$ and $m < k$, we need to determine $k - m$ times the individual with the least hypervolume contribution. Because in each of these rounds the cardinality of the set we have to consider is reduced by one, the contributing hypervolume (Eq. 2) needs to be computed $\sum_{i=k-m}^{k} i = \frac{2mk - m^2 + 2k - m}{2}$ times. For the special case of the steady-state MO-CMA-ES with λ = 1, we therefore need to compute a contributing hypervolume for at most µ + 1 points, because we just have to determine a single individual to discard. However, in the original MO-CMA-ES additional contributing hypervolume computations are required, as discussed in the following section.

3. NEW NOTION OF SUCCESS FOR STEP SIZE ADAPTATION

For the success rule based adaptation of the step sizes, we need a notion of what is considered to be a successful mutation. The choice of the success criterion is crucial for the step size update procedure (see Eq. 7). It directly affects the (smoothed) success rate associated with the individual, which in turn influences the update of the global step size $\sigma_i$ as well as the update of the covariance matrix $C_i$.

In general, a conservative notion of success results in a low success rate and thereby in a decrease of the step sizes $\sigma_i$. If the criterion is too conservative, the convergence rate of the MO-CMA-ES slows down. In contrast, an optimistic notion of success results in a higher success rate and larger steps.

In the (1+1)-CMA-ES, defining the notion of success is unambiguous. A mutation has been successful if the parent is replaced by the offspring. This can be expressed in two ways. A mutation has been successful if (i) the offspring is better than the parent, or (ii) the offspring is selected. While these criteria are equivalent in the (1+1)-CMA-ES, they lead to different success indicators $\mathrm{succ}_Q$ in the MO-CMA-ES.

3.1 Parent-Based Notion of Success

In the original MO-CMA-ES, a mutation was considered as being successful if the offspring is better than the parent. This requires a direct comparison of an offspring individual $a_i'^{(g+1)}$ with its parent $a_i^{(g)}$ w.r.t. the level of non-dominance and the contribution rank. Thus, we have

$$\mathrm{succ}^{I}_{Q^{(g)}}\bigl(a_i^{(g)}, a_i'^{(g+1)}\bigr) = \begin{cases} 1 & \text{if } a_i'^{(g+1)} \prec_{Q^{(g)}} a_i^{(g)} \\ 0 & \text{otherwise} \end{cases} \qquad (8)$$

This direct comparison of parent and offspring may require more contributing hypervolume computations than the environmental selection procedure (see Sec. 2.2) if we use exactly the same ranking method as described in Sec. 2. If parent and offspring are selected and have the same level of non-dominance (which frequently happens in real-valued multi-objective optimization after the first generations), additional hypervolume computations are required to determine whether the parent or the offspring ranks higher.

3.2 Population-Based Notion of Success

We propose a simpler, at least as intuitive notion of success. An offspring individual $a_i'^{(g+1)}$ is considered successful if it is selected for the next parent population $P^{(g+1)} = \bigl\{Q_{\prec:i}^{(g)} \mid 1 \le i \le \mu\bigr\}$:

$$\mathrm{succ}^{P}_{Q^{(g)}}\bigl(a_i^{(g)}, a_i'^{(g+1)}\bigr) = \begin{cases} 1 & \text{if } a_i'^{(g+1)} \in Q^{(g+1)} \\ 0 & \text{otherwise} \end{cases} \qquad (9)$$
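The practical difference between the two indicators is easy to state in code. The following sketch is ours: precedes_Q stands for a hypothetical implementation of the relation of Eq. (4) on $Q^{(g)}$, and next_parents for the already selected parent set. Eq. (8) needs the full ranking relation, while Eq. (9) is a mere membership test:

    def succ_parent_based(parent, offspring, precedes_Q):
        """Eq. (8): 1 iff the offspring precedes its parent w.r.t. Eq. (4) on Q^(g);
        this may trigger extra contributing-hypervolume computations."""
        return 1 if precedes_Q(offspring, parent) else 0

    def succ_population_based(offspring, next_parents):
        """Eq. (9): 1 iff the offspring was selected into the next parent population;
        this reuses the selection result, so no additional hypervolume computations."""
        return 1 if offspring in next_parents else 0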

This criterion is strictly more optimistic than the parent-based notion of success in the sense that

$$\mathrm{succ}^{I}_{Q^{(g)}}\bigl(a_i^{(g)}, a_i'^{(g+1)}\bigr) = 1 \;\Rightarrow\; \mathrm{succ}^{P}_{Q^{(g)}}\bigl(a_i^{(g)}, a_i'^{(g+1)}\bigr) = 1 \qquad (10)$$

for all selected individuals requiring a step size update. No additional (contributing) hypervolume computations are needed in addition to those needed for selection as described in Sec. 2.2. Especially for the steady-state MO-CMA-ES, this considerably decreases the computation time for the strategy parameter update.

For the remainder of this work, the terms MO-CMA-ES^I and MO-CMA-ES^P refer to the MO-CMA-ES relying on the individual-based and the population-based notion of success, respectively.

4. EMPIRICAL EVALUATION

This section presents a performance evaluation of the revised step size adaptation on a broad range of two- and three-objective benchmark functions. We compare the (µ+λ)-MO-CMA-ES^P and the (µ+1)-MO-CMA-ES^P to the results of the original (µ+λ)-MO-CMA-ES^I and the (µ+1)-MO-CMA-ES^I. The results of both algorithms are compared to results of a "new" variant of the well-known NSGA-II [6]. Because it is well known that the MO-CMA-ES in general outperforms the standard NSGA-II (e.g., see [16]) for real-valued optimization, we replaced the second-level sorting criterion in the NSGA-II and use the contributing hypervolume (as in the MO-CMA-ES) instead of the crowding distance. The resulting algorithm is a non-steady-state version of the SMS-EMOA [2]. All experiments have been conducted using the Shark library [13].

4.1 Experimental Setup

We compare the algorithms on several classes of benchmark functions. The bi-criteria constrained benchmark functions ZDT1-4 and ZDT6 (see [25]) and their rotated variants IHR1-4 and IHR6 (see [14]) have been chosen for the performance evaluation. Additionally, the set of bi-objective test problems has been augmented by the unconstrained and rotated functions ELLI1, ELLI2, CIGTAB1 and CIGTAB2 (see [14]), with the distance of the optima of the single objectives set to the default value of two. In the case of three objectives, the seven constrained functions DTLZ1-7 (see [7]) have been chosen.

For this study, we defined a new class of test functions based on ELLI1 (see [16]), which is scalable to an arbitrary number of objectives $m \le n$. Let $O \in \mathbb{R}^{n \times n}$ be an orthogonal matrix and $D \in \mathbb{R}^{n \times n}$ be a diagonal matrix. Each candidate solution $x \in \mathbb{R}^n$ is transformed by $v = DOx$. Moreover, a matrix $M \in \mathbb{R}^{m \times n}$ defining the centers of the $m$ objectives is needed. The objective functions are then given by

$$f_m(v) = \frac{1}{\alpha^2 n} \sum_{i=1}^{n} (v_i - M_{mi})^2.$$

Varying $M$ produces different benchmark functions. Here, $M \in \mathbb{R}^{m \times n}$ is set to $M_{ij} = 0$ if $j > m$, $M_{ij} = \sqrt{\frac{d-1}{d}}$ if $i = j$, and $M_{ij} = \frac{-1}{\sqrt{d (d-1)}}$ otherwise, with $d \in \mathbb{R}_{>0}$, and $D \in \mathbb{R}^{n \times n}$ is set to $D_{ij} = \alpha^{\frac{i-1}{n-1}}$ if $i = j$ and to 0 otherwise. For this study, the parameter $\alpha$ was set to 1000 and $d$ was chosen as 2. The $m$ optima of the single objective functions are placed on the unit sphere centered at the origin such that they have maximum distance (i.e., they form a hyper-tetrahedron). The function is called GELLI^m, where the superscript $m$ indicates the number of objectives.
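Under our reading of the definition above, the GELLI^m construction can be sketched in a few lines of NumPy (the choice of the orthogonal matrix $O$ is not fixed by the text; the sketch draws a random one):

    import numpy as np

    def make_gelli(n, m, alpha=1000.0, d=2.0, seed=None):
        """Illustrative construction of GELLI^m as defined above (m <= n)."""
        rng = np.random.default_rng(seed)
        O, _ = np.linalg.qr(rng.standard_normal((n, n)))    # a random orthogonal matrix
        D = np.diag(alpha ** (np.arange(n) / (n - 1)))      # D_ii = alpha^((i-1)/(n-1))
        M = np.full((m, n), -1.0 / np.sqrt(d * (d - 1.0)))  # off-diagonal entries
        M[:, m:] = 0.0                                      # M_ij = 0 for j > m
        np.fill_diagonal(M, np.sqrt((d - 1.0) / d))         # diagonal entries
        def f(x):
            v = D @ (O @ x)
            return np.array([np.sum((v - M[k]) ** 2) for k in range(m)]) / (alpha ** 2 * n)
        return f

    f = make_gelli(n=10, m=3)
    print(f(np.zeros(10)))   # objective values of the origin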
The default search space dimension for constrained and non-rotated benchmark functions has been chosen to be 30. In case of rotated benchmark functions, the search space dimension has been chosen to be 10. The number of parent and offspring individuals has been set to µ = λ = 100. We conducted 25 independent trials with 100,000 fitness function evaluations each. We sampled the performance of the algorithms after every 500th fitness function evaluation and carried out the statistical evaluation after 25,000 and 50,000 fitness function evaluations.

4.2 Statistical Evaluation

We consider the unary hypervolume-indicator as performance measure [26, 27]. We want to compare k = 5 algorithms on a particular optimization problem f after g fitness function evaluations, and we assume that we have conducted t trials. We consider the non-dominated individuals of the union of all k · t populations after g evaluations. These individuals make up the reference set R. The upper bounds of the reference set of the respective fitness function, translated by 1, serve as the reference point for the calculation of the unary hypervolume-indicator.

Several ways to compare multiple direct search algorithms on multiple objective functions have been proposed in the literature [4, 9]. Such a statistical comparison is not straightforward, because one has to account for multiple testing. For the overall evaluation of the algorithms, we follow the recommendation in [9] and use non-parametric statistical tests in a step-wise procedure. For each fitness function, we rank the algorithms and then compute the average ranks. Then, we apply a (Friedman) test to check whether the ranks are different from the mean rank. If so, we determine ad hoc whether two algorithms differ by pairwise comparison (using Bergmann-Hommel's dynamic procedure). We fixed a significance level of p = 0.001. For a detailed description of the test procedure we refer to the literature [8, 9, 10]. We rely on the open source software supplied by García and Herrera [9] in our evaluation. In addition, we compare the results on the individual benchmark functions using a standard two-sided Wilcoxon rank sum test (p < 0.001). All results highlighted in the following sections are statistically significant, if not stated otherwise.

4.3 Results

The results of the performance evaluation after 25,000 and 50,000 fitness function evaluations are presented in Tables 1 and 2. In the overall comparison, the (µ+λ)-MO-CMA-ES^P and the (µ+1)-MO-CMA-ES^P performed significantly better than the (µ+λ)-MO-CMA-ES^I, the (µ+1)-MO-CMA-ES^I, and the NSGA-II across the set of benchmark functions. The steady-state MO-CMA-ES with the new step-size update is the best of the five algorithms.

When looking at the single benchmark functions, the new (µ+λ)-MO-CMA-ES^P and the new (µ+1)-MO-CMA-ES^P always performed better than their counterparts with the original step size adaptation (in one single case the difference is not significant at the level p < 0.001). Further, the MO-CMA-ES outperformed the NSGA-II across the set of benchmark functions, except for ZDT4, IHR4, and DTLZ1. The latter three functions are multi-modal, and it is well known that the elitist variant of the (MO-)CMA-ES suffers from convergence into suboptimal local optima on multi-modal fitness landscapes (see [14]). The new step size adaptation attenuates this problem. The enhanced performance can be attributed to the larger global step sizes realized by the new variant (see Fig. 1 and Fig. 2), which in turn prevent both algorithms from getting stuck in local optima early. Nevertheless, the NSGA-II still outperforms all MO-CMA-ES variants on these fitness functions.

The good performance of both the (µ+λ)-MO-CMA-ES^P and its steady-state variant (µ+1)-MO-CMA-ES^P shows that the new notion of success clearly improves the convergence properties of the algorithm.


Figure 1: Evolution of the absolute hypervolume (left) and of the corresponding global step size (right) for the fitness function ELLI2 over the number of objective function evaluations. All plots refer to the medians over 25 trials.

5. CONCLUSIONS

We presented a new step size adaptation procedure for the MO-CMA-ES that improves the convergence speed and at the same time reduces the risk of convergence into suboptimal local optima. Additionally, the new update scheme considerably lowers the computational complexity of both the generational MO-CMA-ES and its steady-state variant by reducing the number of required computations of the contributing hypervolume. Our experiments showed the significantly improved performance of the new approach.

As baseline for the empirical evaluation, we considered a new variant of the NSGA-II relying on the hypervolume indicator as second-level sorting criterion. This comparison demonstrated that the superior performance of the MO-CMA-ES is not only due to the selection procedure but that the powerful strategy parameter adaptation in the CMA-ES plays a major role. For the first time, it was shown that the MO-CMA-ES outperforms the NSGA-II also in the case of more than two objectives (see Fig. 3). In summary, we strongly recommend the new update rule for the MO-CMA-ES.

In future work, additional notions of success will be considered. For instance, one could check whether the offspring individual is at least at the same level of non-dominance as the parent individual. Further, we will study the impact of the new step size update procedure on the performance of the MO-CMA-ES with recombination of strategy parameters as presented in [22].

Acknowledgments

CI acknowledges support from the German Federal Ministry of Education and Research within the National Network Computational Neuroscience under grant number 01GQ0951.


Figure 2: Evolution of the absolute hypervolume (left) and of the global step size (right) for the fitness function IHR4.


Figure 3: Evolution of the absolute hypervolume for the fitness functions DTLZ2 (a), DTLZ3 (b), DTLZ6 (c), and DTLZ7 (d) having three objectives.

Table 1: Performance comparison of the (µ+λ)-MO-CMA-ES^I, (µ+λ)-MO-CMA-ES^P, (µ+1)-MO-CMA-ES^I, (µ+1)-MO-CMA-ES^P, and NSGA-II using the hypervolume indicator as second-level sorting criterion. The table shows the median hypervolume-indicator value of 25 trials after 25,000 fitness function evaluations (the higher the better). The superscripts I, II, III, IV, and V indicate whether the respective algorithm performs significantly better than the (µ+λ)-MO-CMA-ES^I, (µ+λ)-MO-CMA-ES^P, (µ+1)-MO-CMA-ES^I, (µ+1)-MO-CMA-ES^P, and NSGA-II, respectively (two-sided Wilcoxon rank sum test, p < 0.001). The best value in each row is marked with an asterisk.

Function   (µ+λ)-MO-CMA-ES^I   (µ+λ)-MO-CMA-ES^P    (µ+1)-MO-CMA-ES^I   (µ+1)-MO-CMA-ES^P        NSGA-II

Two-objective functions
ZDT1       6.04312^V           6.08654^{I,III,V}    6.06543^{I,V}       6.08921^{I,II,III,V}*    6.04301
ZDT2       4.33030^V           4.33181^{I,III,V}    4.33092^{I,V}       4.33201^{I,II,III,V}*    4.31906
ZDT3       3.89724^V           3.89845^{I,III,V}    3.89755^{I,V}       4.89991^{I,II,III,V}*    3.89696
ZDT4       5.7355682^{III}     6.9222784^{I,III,IV} 5.0322781           5.33372^{III}            7.92227^{I,II,III,IV}*
ZDT6       9.14021^V           9.14039^{I,III,V}    9.14029^{I,V}       9.14052^{I,II,III,V}*    8.99023
IHR1       10.00087^V          10.00951^{I,III,V}   10.00100^{I,V}      10.00999^{I,II,III,V}*   9.35870
IHR2       21.35308^V          22.46712^{I,III,V}   22.00009^{I,V}      22.55552^{I,II,III,V}*   19.26734
IHR3       11.01029^V          11.94161^{I,III,V}   11.50321^{I,V}      11.99028^{I,II,III,V}*   10.90063
IHR4       9.63114^{III,IV}    12.00649^{I,III,IV}  7.40903             8.93219^{III}            16.29601^{I,II,III,IV}*
IHR6       7.49222^V           8.30291^{I,III,V}    7.51206^{I,V}       8.44444^{I,II,III,V}*    7.04526
ELLI1      2.28712^V           2.29654^{I,III,V}    2.28990^{I,V}       2.30657^{I,II,III,V}*    2.23001
ELLI2      2.90507^V           2.94182^{I,III,V}    2.91999^V           2.96430^{I,II,III,V}*    2.84432
CIGTAB1    3.92059^V           3.94653^{I,III,V}    3.93485^{I,V}       4.13942^{I,II,III,V}*    3.91111
CIGTAB2    7.09623^V           7.09914^{I,III,V}    7.09745^{I,V}       7.10005^{I,II,III,V}*    6.82424

Three-objective functions
DTLZ1      210.09418^{III,IV}  231.77529^{I,III,IV} 190.11114           198.77843^{III}          362.99741^{I,II,III,IV}*
DTLZ2      1.45493^V           1.51007^{I,III,V}    1.45590^{I,V}       1.52587^{I,II,III,V}*    1.44501
DTLZ3      3.46292^V           3.57601^{I,III,V}    3.48666^{I,V}       3.61829^{I,II,III,V}*    3.41438
DTLZ4      3.75548^V           3.87132^{I,III,V}    3.80002^{I,V}       3.91733^{I,II,III,V}*    3.71349
DTLZ5      1.74993^V           1.81004^{I,III,V}    1.78524^{I,V}       1.86991^{I,II,III,V}*    1.73421
DTLZ6      18.17184^V          18.35190^{I,III,V}   18.32435^{I,V}      18.45000^{I,II,III,V}*   18.04588
DTLZ7      6.25983^V           6.29887^{I,III,V}    6.26779^{I,V}       6.33241^{I,II,III,V}*    6.22765
GELLI3     5.34226^V           5.82398^{I,III,V}    5.42099^{I,V}       6.02909^{I,II,III,V}*    5.04991

Table 2: Performance comparison of the (µ+λ)-MO-CMA-ES^I, (µ+λ)-MO-CMA-ES^P, (µ+1)-MO-CMA-ES^I, (µ+1)-MO-CMA-ES^P, and NSGA-II using the hypervolume indicator as second-level sorting criterion. The table shows the median hypervolume-indicator value of 25 trials after 50,000 fitness function evaluations (the higher the better). The superscripts I, II, III, IV, and V indicate whether the respective algorithm performs significantly better than the (µ+λ)-MO-CMA-ES^I, (µ+λ)-MO-CMA-ES^P, (µ+1)-MO-CMA-ES^I, (µ+1)-MO-CMA-ES^P, and NSGA-II, respectively. The best value in each row is marked with an asterisk.

Function   (µ+λ)-MO-CMA-ES^I   (µ+λ)-MO-CMA-ES^P    (µ+1)-MO-CMA-ES^I   (µ+1)-MO-CMA-ES^P        NSGA-II

Two-objective functions
ZDT1       6.69909^V           6.71001^{I,III,V}    6.69899^{I,V}       6.73491^{I,II,III,V}*    6.69625
ZDT2       4.33332^V           4.33482^{I,III,V}    4.433402^{I,V}      4.433502^{I,II,III,V}*   4.32001
ZDT3       3.99924^V           3.99989^{I,III,V}    3.99929^{I,V}       4.02111^{I,II,III,V}*    3.99901
ZDT4       5.96642^{III}       7.45576^{I,III,IV}   5.10009             5.76452^{IV}             8.13997^{I,II,III,IV}*
ZDT6       11.10091^V          11.63232^{I,III,V}   11.29834^{I,V}      11.65779^{I,II,III,V}*   11.09981
IHR1       11.47301^V          11.59002^{I,III,V}   11.49309^{I,V}      11.71329^{I,II,III,V}*   11.46899
IHR2       21.49398^V          22.51000^{I,III,V}   22.01000^{I,V}      22.72727^{I,II,III,V}*   20.27754
IHR3       12.00024^V          12.00091^{I,III,V}   12.00029^{I,V}      12.00093^{I,II,III,V}*   12.00019
IHR4       9.66304^{III,IV}    13.97065^{I,III,IV}  7.50005             9.44219^{III}            16.39299^{I,II,III,IV}*
IHR6       8.99222^V           9.37221^{I,III,V}    9.11209^{I,V}       9.39911^{I,II,III,V}*    8.10921
ELLI1      2.37769^V           2.37782^{I,III,V}    2.37779^{I,V}       2.37791^{I,II,III,V}*    2.37664
ELLI2      4.00016^V           4.00261^{I,III,V}    4.00101^{III}       4.00411^{I,II,III,V}*    3.90004
CIGTAB1    4.32756^V           4.32865^{I,III,V}    4.32765^{I,V}       4.33065^{I,II,III,V}*    4.10921
CIGTAB2    7.57643^V           7.90036^{I,III,V}    7.87643^{I,V}       8.10911^{I,II,III,V}*    7.20192

Three-objective functions
DTLZ1      214.29212^{III,IV}  233.72501^{I,III,IV} 191.90762           198.77843^{III}          366.21213^{I,II,III,IV}*
DTLZ2      1.46973^V           1.53591^{I,III,V}    1.47532^{I,V}       1.55009^{I,II,III,V}*    1.45592
DTLZ3      3.46299^V           3.58811^{I,III,V}    3.49291^{I,V}       3.66377^{I,II,III,V}*    3.41494
DTLZ4      3.77945^V           3.89059^{I,III,V}    3.82637^{I,V}       3.96828^{I,II,III,V}*    3.73446
DTLZ5      1.75773^V           1.83225^{I,III,V}    1.78524^{I,V}       1.86923^{I,II,III,V}*    1.73587
DTLZ6      18.28844^V          18.44184^{I,III,V}   18.33333^{I,V}      18.59409^{I,II,III,V}*   18.06321
DTLZ7      6.27901^V           6.32499^{I,III,V}    6.28969^{I,V}       6.37965^{I,II,III,V}*    6.24907
GELLI3     5.38871^V           6.00012^{I,III,V}    5.46952^{I,V}       6.22111^{I,II,III,V}*    5.33869

6. REFERENCES

[1] N. Beume. S-metric calculation by considering dominated hypervolume as Klee's measure problem. Evolutionary Computation, 17(4):477-492, 2009.
[2] N. Beume, B. Naujoks, and M. Emmerich. SMS-EMOA: Multiobjective selection based on dominated hypervolume. European Journal of Operational Research, 181(3):1653-1669, 2007.
[3] N. Beume and G. Rudolph. Faster S-metric calculation by considering dominated hypervolume as Klee's measure problem. In IASTED International Conference on Computational Intelligence, pages 231-236. ACTA Press, 2006.
[4] S. Bleuler, M. Laumanns, L. Thiele, and E. Zitzler. PISA - A platform and programming language independent interface for search algorithms. In C. M. Fonseca, P. J. Fleming, E. Zitzler, K. Deb, and L. Thiele, editors, Evolutionary Multi-Criterion Optimization (EMO 2003), volume 2632 of LNCS, pages 494-508. Springer-Verlag, 2003.
[5] K. Bringmann and T. Friedrich. Approximating the least hypervolume contributor: NP-hard in general, but fast in practice. In M. Ehrgott, C. M. Fonseca, X. Gandibleux, J.-K. Hao, and M. Sevaux, editors, Evolutionary Multi-Criterion Optimization (EMO 2009), volume 5467 of LNCS, pages 6-20. Springer-Verlag, 2009.
[6] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6:182-197, 2002.
[7] K. Deb, L. Thiele, M. Laumanns, and E. Zitzler. Scalable multi-objective optimization test problems. In Congress on Evolutionary Computation (CEC), pages 825-830. IEEE Press, 2002.
[8] J. Demšar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7:1-30, 2006.
[9] S. García and F. Herrera. An extension on "statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons. Journal of Machine Learning Research, 9:2677-2694, 2008.
[10] S. García, D. Molina, M. Lozano, and F. Herrera. A study on the use of non-parametric tests for analyzing the evolutionary algorithms' behaviour: A case study on the CEC'2005 special session on real parameter optimization. Journal of Heuristics, 15:617-644, 2009.
[11] N. Hansen, S. D. Müller, and P. Koumoutsakos. Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evolutionary Computation, 11(1):1-18, 2003.
[12] N. Hansen and A. Ostermeier. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2):159-195, 2001.
[13] C. Igel, T. Glasmachers, and V. Heidrich-Meisner. Shark. Journal of Machine Learning Research, 9:993-996, 2008.
[14] C. Igel, N. Hansen, and S. Roth. Covariance matrix adaptation for multi-objective optimization. Evolutionary Computation, 15(1):1-28, 2007.
[15] C. Igel, T. Suttorp, and N. Hansen. A computational efficient covariance matrix update and a (1+1)-CMA for evolution strategies. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2006), pages 453-460. ACM Press, 2006.
[16] C. Igel, T. Suttorp, and N. Hansen. Steady-state selection and efficient covariance matrix update in the multi-objective CMA-ES. In Fourth International Conference on Evolutionary Multi-Criterion Optimization (EMO 2007), volume 4403 of LNCS. Springer-Verlag, 2007.
[17] S. Kern, S. D. Müller, N. Hansen, D. Büche, J. Ocenasek, and P. Koumoutsakos. Learning probability distributions in continuous evolutionary algorithms - a comparative review. Natural Computing, 3:77-112, 2004.
[18] I. Rechenberg. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog, 1973.
[19] T. Suttorp, N. Hansen, and C. Igel. Efficient covariance matrix update for variable metric evolution strategies. Machine Learning, 75(2):167-197, 2009.
[20] L. G. Valiant. The complexity of computing the permanent. Theoretical Computer Science, 8:189-201, 1979.
[21] T. Voß, N. Beume, G. Rudolph, and C. Igel. Scalarization versus indicator-based selection in multi-objective CMA evolution strategies. In IEEE Congress on Evolutionary Computation 2008 (CEC 2008), pages 3041-3048. IEEE Press, 2008.
[22] T. Voß, N. Hansen, and C. Igel. Recombination for learning strategy parameters in the MO-CMA-ES. In M. Ehrgott, C. Fonseca, X. Gandibleux, J.-K. Hao, and M. Sevaux, editors, Fifth International Conference on Evolutionary Multi-Criterion Optimization (EMO 2009), volume 5467 of LNCS. Springer-Verlag, 2009.
[23] L. While. A new analysis of the LebMeasure algorithm for calculating hypervolume. In C. A. Coello Coello, E. Zitzler, and A. Hernandez Aguirre, editors, Third International Conference on Evolutionary Multi-Criterion Optimization (EMO 2005), volume 3410 of LNCS, pages 326-340. Springer-Verlag, 2005.
[24] E. Zitzler. Hypervolume metric calculation. Technical report, Computer Engineering and Networks Laboratory (TIK), Swiss Federal Institute of Technology (ETH) Zurich, 2001.
[25] E. Zitzler, K. Deb, and L. Thiele. Comparison of multiobjective evolutionary algorithms: Empirical results. Evolutionary Computation, 8(2):173-195, 2000.
[26] E. Zitzler and L. Thiele. Multiobjective optimization using evolutionary algorithms — a comparative case study. In A. E. Eiben, T. Bäck, M. Schoenauer, and H.-P. Schwefel, editors, Fifth International Conference on Parallel Problem Solving from Nature (PPSN-V), volume 1498 of LNCS, pages 292-301. Springer-Verlag, 1998.
[27] E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, and V. Grunert da Fonseca. Performance assessment of multiobjective optimizers: An analysis and review. IEEE Transactions on Evolutionary Computation, 7(2):117-132, 2003.