12th International Conference on Parallel Problem Solving From Nature - PPSN XII

On spectral invariance of Randomized Hessian and Covariance Matrix Adaptation schemes

Sebastian U. Stich [email protected]
Christian L. Müller [email protected]

Motivation

Let f(x) = (1/2) x^T H x be a quadratic function to minimize. The simple random search scheme in [1] draws search directions u from a fixed distribution N(0, C) and generates improving iterates by a line search:

    x_+ = x + (argmin_{t ∈ R} f(x + tu)) · u.

The progress depends on the spectrum of H and can be estimated [2] as:

    f(x_+) ≤ (1 − λ_min(HC) / Tr[HC]) · f(x).

Goal: study the influence of the eigenvalue spectrum of the Hessian H on the performance of variable metric schemes.

Contribution

• We consider Covariance Matrix Adaptation schemes (CMA-ES [3], Gaussian Adaptation (GaA) [4]) and Randomized Hessian (RH) schemes from Leventhal and Lewis [5].
• We provide a new, numerically stable implementation of RH and, in addition, combine the update with an adaptive step size strategy.
• We design a class of quadratic functions with parametrizable spectra to study the influence of the spectrum on the performance of variable metric schemes.
• We empirically study 5 variable metric schemes on this function class and on Rosenbrock's function.

Conclusion

• We observe a monotonic dependence of the performance of the studied variable metric schemes on the shape of the eigenvalue spectrum.
• The sigmoid-shaped spectrum of fSigm presents the hardest learning problem for all tested variable metric schemes.
• Randomized Hessian schemes are less dependent on the spectrum.
• The use of an evolution path is crucial for CMA schemes to achieve superior performance.
• The Randomized Hessian update with adaptive step size has proven to be an effective variable metric scheme.
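As a concrete illustration of this baseline scheme, here is a minimal Python sketch, assuming an exact line search on the quadratic (for f(x) = (1/2) x^T H x the minimizing step length t* = −u^T Hx / u^T Hu is available in closed form; function and variable names are ours):

```python
import numpy as np

def random_search(H, C, x0, iters=10_000):
    """Random search with exact line search on f(x) = 0.5 * x^T H x, cf. [1]."""
    A = np.linalg.cholesky(C)            # C = A A^T, so A z ~ N(0, C) for z ~ N(0, I)
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        u = A @ np.random.randn(len(x))  # search direction from the fixed distribution
        t = -(u @ H @ x) / (u @ H @ u)   # closed-form line search: argmin_t f(x + t u)
        x += t * u
    return x
```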

Algorithms

Randomized Hessian Update [2, 5]

• curvature estimate in a random direction u ∼ S^{n−1} (uniform on the unit sphere)
• rank-1 update of the Hessian estimate H
• 2–4 additional function evaluations per update

    Δ_u ← f(x + u) − 2f(x) + f(x − u) − u^T H u
    if J := H + Δ_u · uu^T is pd then
        H_+ ← H + Δ_u · uu^T
    else
        v ← smallestEigenVector(J)
        Δ_v ← f(x + v) − 2f(x) + f(x − v) − v^T J v
        H_+ ← H + Δ_v · vv^T + Δ_u · uu^T

RH with line search (RH RP)

• alternate between Hessian estimation and search steps
• search direction u ∼ N(0, H^{−1})
• line search to guarantee sufficient decrease:

    x* := x + (argmin_{t ∈ R} f(x + tu)) · u,    x_+ ∈ [(1 − μ)x + μx*, x*]

RH with adaptive step size control (RH (1+1))

• alternate between Hessian estimation and search steps
• search direction u ∼ N(0, H^{−1})
• adaptive step size control:

    success: x_+ = x + σu,  σ_+ = σ · 1.40
    failure: x_+ = x,       σ_+ = σ · 0.88
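A minimal Python sketch of the RH (1+1) scheme described above. The pd check via a full eigendecomposition is our illustrative simplification, and the repair step does not guarantee positive definiteness in every case here; the numerically stable implementation is the subject of [2]. Helper names are ours:

```python
import numpy as np

def rh_update(f, x, fx, H):
    """Rank-1 Randomized Hessian update with a repair step if pd is lost [2, 5]."""
    u = np.random.randn(len(x))
    u /= np.linalg.norm(u)                                # u ~ uniform on S^{n-1}
    d_u = f(x + u) - 2 * fx + f(x - u) - u @ H @ u        # curvature residual along u
    J = H + d_u * np.outer(u, u)
    w, V = np.linalg.eigh(J)
    if w[0] > 0:                                          # J is pd: accept the update
        return J
    v = V[:, 0]                                           # smallest eigenvector of J
    d_v = f(x + v) - 2 * fx + f(x - v) - v @ J @ v        # repair curvature along v
    return H + d_v * np.outer(v, v) + d_u * np.outer(u, u)

def rh_one_plus_one(f, x0, sigma=1.0, iters=5_000):
    """RH (1+1): alternate Hessian updates with (1+1)-type steps, u ~ N(0, H^{-1})."""
    x = np.array(x0, dtype=float)
    H = np.eye(len(x))
    fx = f(x)
    for _ in range(iters):
        H = rh_update(f, x, fx, H)
        z = np.random.randn(len(x))
        u = np.linalg.solve(np.linalg.cholesky(H).T, z)   # u ~ N(0, H^{-1})
        y = x + sigma * u
        fy = f(y)
        if fy < fx:                                       # success: accept, enlarge step
            x, fx, sigma = y, fy, sigma * 1.40
        else:                                             # failure: keep x, shrink step
            sigma *= 0.88
    return x, fx
```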

Gaussian Adaptation [4]

• covariance estimation based on the maximum entropy principle; rank-1 update along the search direction
• search direction u ∼ N(0, C)
• adaptive step size control

CMA-ES [3]

• (1,4)-CMA-ES with mirrored sampling and sequential selection, default parameter setting
• rank-1 update based on covariance estimation and an evolution path
• search direction u ∼ N(0, C)
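The role of the evolution path can be made concrete with a sketch of the rank-1 covariance update; the learning rates c and c_cov are illustrative placeholders, not the tuned defaults of [3]. Setting use_path=False recovers the path-free update of the CMA-ESnp variant below:

```python
import numpy as np

def rank_one_update(C, p, z, c=0.1, c_cov=0.1, use_path=True):
    """Rank-1 covariance update, with or without an evolution path.

    The path p accumulates successive steps z with fading memory; with
    use_path=False the update learns from single steps only (CMA-ESnp)."""
    if use_path:
        p = (1 - c) * p + np.sqrt(c * (2 - c)) * z  # evolution path
    else:
        p = z                                       # no path: current step alone
    C = (1 - c_cov) * C + c_cov * np.outer(p, p)    # rank-1 adaptation of N(0, C)
    return C, p
```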

CMA-ES without Evolution Path

• (1,4)-CMA-ES with mirrored sampling and sequential selection, no evolution path (CMA-ESnp)
• rank-1 update based on covariance estimation

Test functions

• quadratic functions f(x) = (1/2) x^T H x
• standard Rosenbrock function as a non-convex test case

Simple design principles for the Hessian H:

• H ∈ R^{n×n} positive definite (pd)
• fixed condition number κ(H) = L
• fixed trace Tr[H] = n(L + 1)/2
• spectrum λ(H) easy to parametrize
• generic extension of known test functions (tablet, cigar)

fSigm(15)  many eigenvalues close to 1 or to L
fLin       linearly spaced eigenvalues
fFlat(6)   most eigenvalues concentrated at the mean L/2
fNes       the same spectrum as Nesterov's worst-case function for first-order methods
fRosen     smoothly changing Hessian

Figure: Shape of the spectra of the quadratic benchmark functions (Sigmoid, Linear, Flat, Nesterov); eigenvalue index 1, …, n on the x-axis, eigenvalues between 1 and L on the y-axis.
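A sketch of how such parametrized spectra can be constructed under the design principles above. The concrete shape functions and the sharpness parameter beta are our illustrative choices; the poster's fSigm(15) and fFlat(6) refer to the authors' own shape parameter:

```python
import numpy as np

def spectrum(n, L, shape="linear", beta=1.0):
    """Eigenvalues in [1, L] with kappa(H) = L and Tr[H] = n(L+1)/2."""
    t = np.linspace(-1.0, 1.0, n)              # symmetric index grid
    if shape == "linear":                      # fLin: linearly spaced eigenvalues
        s = t
    elif shape == "sigmoid":                   # fSigm: eigenvalues pushed towards 1 and L
        s = np.tanh(beta * t) / np.tanh(beta)
    elif shape == "flat":                      # fFlat: eigenvalues clustered at the mean
        s = np.sign(t) * np.abs(t) ** beta
    else:
        raise ValueError(shape)
    return (1 + L) / 2 + (L - 1) / 2 * s       # odd, symmetric map of [-1, 1] onto [1, L]

# Example: f(x) = 0.5 * x^T H x with H = diag(spectrum(50, 1e6, "sigmoid", beta=15))
```

Because each shape function s is odd on a symmetric grid, the mean eigenvalue is (1 + L)/2, so the trace constraint Tr[H] = n(L + 1)/2 holds for all three shapes.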

Convergence dependent on spectra

• CMA-ES and GaA show the strongest dependence on the spectrum
• RH schemes are less dependent on the spectrum

Figure: Relation between method performance and spectral distribution for n = 50 and L = 1e6 (RH RP, RH (1+1), GaA, CMA-ES, CMA-ESnp); #FES for accuracy 1e-9 vs. shape of the spectrum (sigmoid - linear - flat). We recorded the #FES needed to reach accuracy 1e-9 on all parametrized functions fSigm, fFlat and fLin; the median of 51 runs is indicated by a marker.

Scaling in dimension

• the learning phase scales quadratically in the dimension n
• RH schemes scale quadratically throughout the testbed
• GaA and CMA-ESnp are sub-quadratic on fSigm(15)
• CMA-ES is super-linear on fFlat(6)

Figure: #FES to reach the target accuracy vs. dimension n in log-log scale; panels (a) fNes, L = 10000; (b) fSigm(15), L = 10000; (c) fFlat(6), L = 10000; (d) fRosen. We recorded the #FES needed to reach accuracy 1e-9. The median of 11 runs is depicted by a marker for all runs that converged within the considered #FES budget. Thin lines indicate quadratic scaling (top) and linear scaling (bottom).

Trajectories

• convergence in three phases: (i) tune-in, (ii) learning, (iii) fast convergence
• the convergence rate at the level of the target accuracy is best for CMA-ES and RH (1+1)

Figure: Evolution of the function value vs. #FES / n^2 for different functions; panels (a) fNes, n = 50, L = 10000; (b) fSigm(15), n = 50, L = 10000; (c) fFlat(6), n = 50, L = 10000; (d) fRosen, n = 50. The median trajectory of 11 runs is depicted; mean and one standard deviation are indicated by markers.

Selected References

[1] Stich, S.U., Müller, C.L., Gärtner, B.: Optimization of convex functions with Random Pursuit. http://arxiv.org/abs/1111.0194 (2011)
[2] Stich, S.U., Gärtner, B., Müller, C.L.: Variable Metric Random Pursuit. In preparation for Math. Prog. (2012)
[3] Brockhoff, D., Auger, A., Hansen, N., Arnold, D., Hohm, T.: Mirrored Sampling and Sequential Selection for Evolution Strategies. In: PPSN XI. Volume 6238 of LNCS. Springer (2010) 11–21
[4] Müller, C.L., Sbalzarini, I.F.: Gaussian Adaptation as a unifying framework for continuous black-box optimization and adaptive Monte Carlo sampling. In: IEEE Congress on Evolutionary Computation (CEC). (2010) 1–8
[5] Leventhal, D., Lewis, A.S.: Randomized Hessian estimation and directional search. Optimization 60(3) (2011) 329–345

Acknowledgments

We thank Dr. Bernd Gärtner and Dr. Ivo F. Sbalzarini for useful discussions. Supported by the project CG Learning (FET-Open grant number: 255827).