Journal of Machine Learning Research 18 (2017) 1-65
Submitted 11/14; Revised 10/16; Published 4/17

Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles

Yann Ollivier
[email protected]
CNRS & LRI (UMR 8623), Université Paris-Saclay, 91405 Orsay, France

Ludovic Arnold
[email protected]
Univ. Paris-Sud, LRI, 91405 Orsay, France

Anne Auger
[email protected]
Nikolaus Hansen
[email protected]
Inria & CMAP, Ecole polytechnique, 91128 Palaiseau, France

Editor: Una-May O’Reilly

Abstract

We present a canonical way to turn any smooth parametric family of probability distributions on an arbitrary search space $X$ into a continuous-time black-box optimization method on $X$, the information-geometric optimization (IGO) method. Invariance as a major design principle keeps the number of arbitrary choices to a minimum. The resulting IGO flow is the flow of an ordinary differential equation conducting the natural gradient ascent of an adaptive, time-dependent transformation of the objective function. It makes no particular assumptions on the objective function to be optimized. The IGO method produces explicit IGO algorithms through time discretization. It naturally recovers versions of known algorithms and offers a systematic way to derive new ones. In continuous search spaces, IGO algorithms take a form related to natural evolution strategies (NES). The cross-entropy method is recovered in a particular case with a large time step, and can be extended into a smoothed, parametrization-independent maximum likelihood update (IGO-ML). When applied to the family of Gaussian distributions on $\mathbb{R}^d$, the IGO framework recovers a version of the well-known CMA-ES algorithm and of xNES.
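To make the time-discretized update concrete, the following is a minimal illustrative sketch of one possible IGO-style step for a Gaussian family in the mean/covariance parametrization, with rank-based (hence invariant under monotone transformations of the objective) weights. It is not the authors' reference implementation. The function names (`igo_weights`, `igo_gaussian_step`, `run_igo`), the truncation selection scheme with threshold `q0`, the sample size, the step size `dt`, and the sphere test objective are all assumptions chosen for illustration.

```python
import numpy as np

def igo_weights(f_values, q0=0.25):
    """Rank-based weights w((rk + 1/2) / N) / N with w(q) = 1 if q <= q0.

    Assumes minimization; ties in f_values are not treated specially.
    """
    n = len(f_values)
    ranks = np.argsort(np.argsort(f_values))  # rank 0 = smallest f value
    quantiles = (ranks + 0.5) / n
    return (quantiles <= q0).astype(float) / n

def igo_gaussian_step(f, m, C, rng, n_samples=50, dt=0.5, q0=0.25):
    """One discretized step on the mean m and covariance C (illustrative)."""
    xs = rng.multivariate_normal(m, C, size=n_samples)
    w = igo_weights(np.array([f(x) for x in xs]), q0)
    ys = xs - m  # deviations from the current mean
    m_new = m + dt * (w @ ys)
    # Weighted sum of outer products y_i y_i^T.
    outer = np.einsum("i,ij,ik->jk", w, ys, ys)
    C_new = C + dt * (outer - w.sum() * C)
    return m_new, C_new

def run_igo(f, m0, C0, n_iter=300, seed=0):
    """Iterate the step above from (m0, C0) with a fixed random seed."""
    rng = np.random.default_rng(seed)
    m, C = np.array(m0, dtype=float), np.array(C0, dtype=float)
    for _ in range(n_iter):
        m, C = igo_gaussian_step(f, m, C, rng)
    return m, C

def sphere(x):
    """Hypothetical test objective: squared Euclidean norm."""
    return float(np.dot(x, x))

m_final, C_final = run_igo(sphere, [3.0, 3.0], np.eye(2))
print(sphere(m_final))
```

Because the weights depend on the objective only through the ranks of the sampled values, replacing `sphere` by any strictly increasing transformation of it leaves the trajectory unchanged for a fixed seed.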