Stochastic Optimization

Total Page:16

File Type:pdf, Size:1020Kb

Stochastic Optimization Stochastic Optimization Marc Schoenauer DataScience @ LHC Workshop 2015 Content 1 Setting the scene 2 Continuous Optimization 3 Discrete/Combinatorial Optimization 4 Non-Parametric Representations 5 Conclusion Content 1 Setting the scene 2 Continuous Optimization 3 Discrete/Combinatorial Optimization 4 Non-Parametric Representations 5 Conclusion Stochastic Optimization Hypotheses Search Space Ω with some topological structure Objective function F assume some weak regularity Hill-Climbing Randomly draw x0 2 Ω and compute F(x0) Initialisation Until(happy) y = Best neighbor(xt) neighbor structure on Ω Compute F(y) If F(y) F (xt) then xt+1 = y accept if improvement else xt+1 = xt Comments Find closest local optimum defined by neighborhood structure Iterate from different x0's Until(very happy) Stochastic Optimization Hypotheses Search Space Ω with some topological structure Objective function F assume some weak regularity Stochastic (Local) Search Randomly draw x0 2 Ω and compute F(x0) Initialisation Until(happy) y = Random neighbor(xt) neighbor structure on Ω Compute F(y) If F(y) F (xt) then xt+1 = y accept if improvement else xt+1 = xt Comments Find one close local optimum defined by neighborhood structure Iterate, leaving current optimum IteratedLocalSearch Stochastic Optimization Hypotheses Search Space Ω with some topological structure Objective function F assume some weak regularity Stochastic (Local) Search { alternative viewpoint Randomly draw x0 2 Ω and compute F(x0) Initialisation Until(happy) y = Move(xt) stochastic variation on Ω Compute F(y) If F(y) F (xt) then xt+1 = y accept if improvement else xt+1 = xt Comments Find one close local optimum defined by the Move operator Iterate, leaving current optimum IteratedLocalSearch Stochastic Optimization Hypotheses Search Space Ω with some topological structure Objective function F assume some weak regularity Stochastic Search aka Metaheuristics Randomly draw x0 2 Ω and compute F(x0) Initialisation Until(happy) y = Move(xt) stochastic variation on Ω Compute F(y) xt+1 = Select(xt; y) stochastic selection, biased toward better points Comments Can escape local optima e.g., Select = Boltzman selection Iterate ? Stochastic Optimization Hypotheses Search Space Ω with some topological structure Objective function F assume some weak regularity Population-based Metaheuristics i i Randomly draw x0 2 Ω and compute F(x0), i = 1; : : : ; µ Initialisation Until(happy) i 1 µ y = Move(xt ;:::; xt ), i = 1; : : : ; λ stochastic variations on Ω Compute F(yi), i = 1; : : : ; λ 1 µ 1 µ 1 λ (xt+1;:::; xt+1) = Select(xt ;:::; xt ; y ;:::; y ) (µ + λ) selection = Select(y1;:::; yλ) (µ, λ) selection Evolutionary Metaphor: 'natural' selection and 'blind' variations i i y = Movem(xt) for some i: mutation i i j y = Movec(xt; xt ) for some (i; j): crossover Metaheuristics: an Alternative Viewpoint Population and variation operators define a distribution on Ω −! directly evolve a parameterized distribution P (θ) Distribution-based Metaheuristics Initialize distribution parameters θ0 Until(happy) Sample distribution P (θt) ! y1;:::; yλ 2 Ω Evaluate y1;:::; yλ on F Update parameters θt+1 Fθ(θt; y1;:::; yλ; F(y1);:::; F(yλ)) includes selection Covers Estimation ofDistributionAlgorithms Population-based Metaheuristics EvolutionaryAlgorithms ParticleSwarmOptimization,DifferentialEvolution, . Deterministic algorithms This is (mostly) Experimental Science Theory lagging behind practice though rapidly catching up in recent years not discussed here Results need to be experimentally validated Experiments Validation relies on grounded statistical validation CPU costly informed (meta)parameter setting/tuning CPU costly ++ reproducibility Open Science From Solution Space to Search Space Choices Objective function Search space representation and move operators / parameterized distribution Approximation bias vs optimization error This talk Non-convex, non-smooth objective parametric continuous optimization ! Rank-based ML seeded identification of influencers Search spaces and crossover operators the TSP Non-parametric representations exploration vs optimization Content 1 Setting the scene 2 Continuous Optimization The Evolution Strategy The CMA-ES (Covariance Matrix Adaptation Evolution Strategy) The Rank-based DataScience 3 Discrete/Combinatorial Optimization 4 Non-Parametric Representations 5 Conclusion Content 1 Setting the scene 2 Continuous Optimization The Evolution Strategy The CMA-ES (Covariance Matrix Adaptation Evolution Strategy) The Rank-based DataScience 3 Discrete/Combinatorial Optimization 4 Non-Parametric Representations 5 Conclusion Stochastic Optimization Templates Population-based Metaheuristics i i Randomly draw x0 2 Ω and compute F(x0), i = 1; : : : ; µ Initialisation Until(happy) i 1 µ y = Move(xt ;:::; xt ), i = 1; : : : ; λ stochastic variations on Ω Compute F(yi), i = 1; : : : ; λ 1 µ 1 µ 1 λ (xt+1;:::; xt+1) = Select(xt ;:::; xt ; y ;:::; y ) (µ + λ) selection = Select(y1;:::; yλ) (µ, λ) selection Distribution-based Metaheuristics Initialize distribution parameters θ0 Until(happy) Sample distribution P (θt) ! y1;:::; yλ 2 Ω Evaluate y1;:::; yλ on F Update parameters θt+1 Fθ(θt; y1;:::; yλ; F(y1);:::; F(yλ)) Stochastic Optimization Templates Distribution-based Metaheuristics Initialize distribution parameters θ0 Until(happy) Sample distribution P (θt) ! y1;:::; yλ 2 Ω Evaluate y1;:::; yλ on F Update parameters θt+1 Fθ(θt; y1;:::; yλ; F(y1);:::; F(yλ)) The (µ, λ)−Evolution Strategy Ω = Rn P (θ): normal distributions Initialize distribution parameters m;σ; C, set population size λ 2 N While not terminate 2 n 1 Sample distribution N m;σ C ! y1;:::; yλ 2 R yi ∼ m + σ Ni(0; C) for i = 1;:::;λ 2 Evaluate y1;:::; yλ on F Compute F(y1);:::; F(yλ) 3 Select µ best samples y1;:::; yµ 4 Update distribution parameters m;σ; C F (m;σ; C; y1;:::; yµ; F(y1);:::; F(yµ)) The (µ, λ)−Evolution Strategy (2) Gaussian Mutations mean vector m 2 Rn is the current solution the so-called step-size σ 2 R+ controls the step length the covariance matrix C 2 Rn×n determines the shape of the distribution ellipsoid How to update m, σ, and C? i=µ 1 X m = x \Crossover" of best µ individuals µ i:λ i=1 Adaptive σ and C Need for adaptation P 2 Sphere function f(x) = xi ; n = 10 Content 1 Setting the scene 2 Continuous Optimization The Evolution Strategy The CMA-ES (Covariance Matrix Adaptation Evolution Strategy) The Rank-based DataScience 3 Discrete/Combinatorial Optimization 4 Non-Parametric Representations 5 Conclusion Cumulative Step-Size Adaptation (CSA) Measure the length of the evolution path path of the mean vector m in the generation sequence # # decrease σ increase σ Compare to random walk without selection Covariance Matrix Adaptation Rank-One Update Pµ m m + σyw; yw = i=1 wi yi:λ; yi ∼ Ni (0; C) initial distribution, C = I T new distribution: C 0:8 × C + 0:2 × yw yw ruling principle: the adaptation increases the probability of successful steps, yw, to appear again Covariance Matrix Adaptation Rank-One Update Pµ m m + σyw; yw = i=1 wi yi:λ; yi ∼ Ni (0; C) yw, movement of the population mean m (disregarding σ) T new distribution: C 0:8 × C + 0:2 × yw yw ruling principle: the adaptation increases the probability of successful steps, yw, to appear again Covariance Matrix Adaptation Rank-One Update Pµ m m + σyw; yw = i=1 wi yi:λ; yi ∼ Ni (0; C) T mixture of distribution C and step yw, C 0:8 × C + 0:2 × yw yw T new distribution: C 0:8 × C + 0:2 × yw yw ruling principle: the adaptation increases the probability of successful steps, yw, to appear again Covariance Matrix Adaptation Rank-One Update Pµ m m + σyw; yw = i=1 wi yi:λ; yi ∼ Ni (0; C) new distribution (disregarding σ) T new distribution: C 0:8 × C + 0:2 × yw yw ruling principle: the adaptation increases the probability of successful steps, yw, to appear again Covariance Matrix Adaptation Rank-One Update Pµ m m + σyw; yw = i=1 wi yi:λ; yi ∼ Ni (0; C) movement of the population mean m T new distribution: C 0:8 × C + 0:2 × yw yw ruling principle: the adaptation increases the probability of successful steps, yw, to appear again Covariance Matrix Adaptation Rank-One Update Pµ m m + σyw; yw = i=1 wi yi:λ; yi ∼ Ni (0; C) mixture of distribution C and step yw, T C 0:8 × C + 0:2 × yw yw T new distribution: C 0:8 × C + 0:2 × yw yw ruling principle: the adaptation increases the probability of successful steps, yw, to appear again Covariance Matrix Adaptation Rank-One Update Pµ m m + σyw; yw = i=1 wi yi:λ; yi ∼ Ni (0; C) T new distribution: C 0:8 × C + 0:2 × yw yw ruling principle: the adaptation increases the probability of successful steps, yw, to appear again CMA-ES in one slide n Input: m 2 R , σ 2 R+, λ Initialize: C = I, and pc = 0, pσ = 0, 2 2 Set: cc ≈ 4=n, cσ ≈ 4=n, c1 ≈ 2=n , cµ ≈ µw=n , c1 + cµ ≤ 1, p µw 1 dσ ≈ 1 + n , and wi=1:::λ such that µw = Pµ 2 ≈ 0:3 λ i=1 wi While not terminate xi = m + σ yi; yi ∼ Ni (0; C) ; for i = 1;:::;λ sampling Pµ Pµ m i=1 wi xi:λ = m + σyw where yw = i=1 wi yi:λ update mean p p 2p pc (1 − cc) pc + 1Ifkpσ k<1:5 ng 1 − (1 − cc) µw yw cumulation for C 1 p 2p − pσ (1 − cσ) pσ + 1 − (1 − cσ) µw C 2 yw cumulation for σ T Pµ T C (1 − c1 − cµ) C + c1 pc pc + cµ i=1 wi yi:λyi:λ update C σ σ × exp cσ kpσ k − 1 update of σ dσ EkN(0;I)k Not covered on this slide: termination, restarts, active or mirrored sampling, outputs, and boundaries Invariances: Guarantee for Generalization Invariance properties of CMA-ES 1 1 0.9 0.9 0.8 0.8 0.7 0.7 0.6 0.6 Invariance to order preserving transformations 0.5 0.5 0.4 0.4 0.3 0.3 0.2 0.2 in function space
Recommended publications
  • Stochastic Optimization from Distributed, Streaming Data in Rate-Limited Networks Matthew Nokleby, Member, IEEE, and Waheed U
    1 Stochastic Optimization from Distributed, Streaming Data in Rate-limited Networks Matthew Nokleby, Member, IEEE, and Waheed U. Bajwa, Senior Member, IEEE T Abstract—Motivated by machine learning applications in T training samples fξ(t) 2 Υgt=1 drawn independently from networks of sensors, internet-of-things (IoT) devices, and au- the distribution P (ξ), which is supported on a subset of Υ. tonomous agents, we propose techniques for distributed stochastic There are, in particular, two main categorizations of training convex learning from high-rate data streams. The setup involves a network of nodes—each one of which has a stream of data data that, in turn, determine the types of methods that can be arriving at a constant rate—that solve a stochastic convex used to find approximate solutions to the SO problem. These optimization problem by collaborating with each other over rate- are (i) batch training data and (ii) streaming training data. limited communication links. To this end, we present and analyze In the case of batch training data, where all T samples two algorithms—termed distributed stochastic approximation fξ(t)g are pre-stored and simultaneously available, a common mirror descent (D-SAMD) and accelerated distributed stochastic approximation mirror descent (AD-SAMD)—that are based on strategy is sample average approximation (SAA) (also referred two stochastic variants of mirror descent and in which nodes to as empirical risk minimization (ERM)), in which one collaborate via approximate averaging of the local, noisy subgra- minimizes the empirical average of the “risk” function φ(·; ·) dients using distributed consensus. Our main contributions are in lieu of the true expectation.
    [Show full text]
  • Metaheuristics1
    METAHEURISTICS1 Kenneth Sörensen University of Antwerp, Belgium Fred Glover University of Colorado and OptTek Systems, Inc., USA 1 Definition A metaheuristic is a high-level problem-independent algorithmic framework that provides a set of guidelines or strategies to develop heuristic optimization algorithms (Sörensen and Glover, To appear). Notable examples of metaheuristics include genetic/evolutionary algorithms, tabu search, simulated annealing, and ant colony optimization, although many more exist. A problem-specific implementation of a heuristic optimization algorithm according to the guidelines expressed in a metaheuristic framework is also referred to as a metaheuristic. The term was coined by Glover (1986) and combines the Greek prefix meta- (metá, beyond in the sense of high-level) with heuristic (from the Greek heuriskein or euriskein, to search). Metaheuristic algorithms, i.e., optimization methods designed according to the strategies laid out in a metaheuristic framework, are — as the name suggests — always heuristic in nature. This fact distinguishes them from exact methods, that do come with a proof that the optimal solution will be found in a finite (although often prohibitively large) amount of time. Metaheuristics are therefore developed specifically to find a solution that is “good enough” in a computing time that is “small enough”. As a result, they are not subject to combinatorial explosion – the phenomenon where the computing time required to find the optimal solution of NP- hard problems increases as an exponential function of the problem size. Metaheuristics have been demonstrated by the scientific community to be a viable, and often superior, alternative to more traditional (exact) methods of mixed- integer optimization such as branch and bound and dynamic programming.
    [Show full text]
  • Adaptive Stochastic Optimization
    Industrial and Systems Engineering Adaptive Stochastic Optimization Frank E. Curtis1 and Katya Scheinberg2 1Department of Industrial and Systems Engineering, Lehigh University 2Department of Operations Research and Information Engineering, Cornell University ISE Technical Report 20T-001 Adaptive Stochastic Optimization The purpose of this article is to summarize re- Frank E. Curtis Katya Scheinberg cent work and motivate continued research on ◦ the design and analysis of adaptive stochastic op- January 18, 2020 timization methods. In particular, we present an analytical framework—new for the context Abstract. Optimization lies at the heart of of adaptive deterministic optimization—that sets machine learning and signal processing. Contem- the stage for establishing convergence rate guar- porary approaches based on the stochastic gradi- antees for adaptive stochastic optimization tech- ent method are non-adaptive in the sense that niques. With this framework in hand, we remark their implementation employs prescribed param- on important open questions related to how it eter values that need to be tuned for each appli- can be extended further for the design of new cation. This article summarizes recent research methods. We also discuss challenges and oppor- and motivates future work on adaptive stochastic tunities for their use in real-world systems. optimization methods, which have the potential to offer significant computational savings when 2 Background training large-scale systems. Many problems in ML and SP are formulated as optimization problems. For example, given a 1 Introduction data vector y Rm from an unknown distribu- tion, one often∈ desires to have a vector of model The successes of stochastic optimization algo- parameters x Rn such that a composite objec- ∈ rithms for solving problems arising machine tive function f : Rn R is minimized, as in learning (ML) and signal processing (SP) are → now widely recognized.
    [Show full text]
  • Conservative Stochastic Optimization with Expectation Constraints
    1 Conservative Stochastic Optimization with Expectation Constraints Zeeshan Akhtar,? Amrit Singh Bedi,y and Ketan Rajawat? Abstract This paper considers stochastic convex optimization problems where the objective and constraint functions involve expectations with respect to the data indices or environmental variables, in addition to deterministic convex constraints on the domain of the variables. Although the setting is generic and arises in different machine learning applications, online and efficient approaches for solving such problems have not been widely studied. Since the underlying data distribution is unknown a priori, a closed-form solution is generally not available, and classical deterministic optimization paradigms are not applicable. Existing approaches towards solving these problems make use of stochastic gradients of the objective and constraints that arrive sequentially over iterations. State-of-the-art approaches, such as those using the saddle point framework, are able to ensure that the optimality gap as well as the constraint violation − 1 decay as O T 2 where T is the number of stochastic gradients. The domain constraints are assumed simple and handled via projection at every iteration. In this work, we propose a novel conservative − 1 stochastic optimization algorithm (CSOA) that achieves zero average constraint violation and O T 2 optimality gap. Further, we also consider the scenario where carrying out a projection step onto the convex domain constraints at every iteration is not viable. Traditionally, the projection operation can be avoided by considering the conditional gradient or Frank-Wolfe (FW) variant of the algorithm. The state-of-the-art − 1 stochastic FW variants achieve an optimality gap of O T 3 after T iterations, though these algorithms have not been applied to problems with functional expectation constraints.
    [Show full text]
  • Monte Carlo Methods for Sampling-Based Stochastic Optimization
    Monte Carlo methods for sampling-based Stochastic Optimization Monte Carlo methods for sampling-based Stochastic Optimization Gersende FORT LTCI CNRS & Telecom ParisTech Paris, France Joint works with B. Jourdain, T. Leli`evre,G. Stoltz from ENPC and E. Kuhn from INRA. A. Schreck and E. Moulines from Telecom ParisTech P. Priouret from Paris VI Monte Carlo methods for sampling-based Stochastic Optimization Introduction Simulated Annealing Simulated Annealing (1/2) Let U denote the objective function one wants to minimize. U(x) min U(x) () max exp(−U(x)) () max exp − 8T > 0 x2X x2X x2X T In order to sample from πT? where U(x) π (x) = exp − T T sample successively from a sequence of tempered distributions πT1 , πT2 , ··· with T1 > T2 > ··· > T?. Monte Carlo methods for sampling-based Stochastic Optimization Introduction Simulated Annealing Simulated Annealing (1/2) Let U denote the objective function one wants to minimize. U(x) min U(x) () max exp(−U(x)) () max exp − 8T > 0 x2X x2X x2X T In order to sample from πT? where U(x) π (x) = exp − T T sample successively from a sequence of tempered distributions πT1 , πT2 , ··· with T1 > T2 > ··· > T?. or sample successively from a sequence of (nt-iterated) kernels (PTt (x; ·))t such that πT PT = πT . Monte Carlo methods for sampling-based Stochastic Optimization Introduction Simulated Annealing Simulated Annealing (2/2) Under conditions on X, on the cooling schedule (Tt)t, on the kernels (Pt)t, on the dominating measure and the set of minima, ··· Kirkpatrick, Gelatt and Vecchi. Optimization via Simulated Annealing. Science (1983) Geman and Geman.
    [Show full text]
  • Stochastic Optimization,” in Handbook of Computational Statistics: Concepts and Methods (2Nd Ed.) (J
    Citation: Spall, J. C. (2012), “Stochastic Optimization,” in Handbook of Computational Statistics: Concepts and Methods (2nd ed.) (J. Gentle, W. Härdle, and Y. Mori, eds.), Springer−Verlag, Heidelberg, Chapter 7, pp. 173–201. dx.doi.org/10.1007/978-3-642-21551-3_7 STOCHASTIC OPTIMIZATION James C. Spall The Johns Hopkins University Applied Physics Laboratory 11100 Johns Hopkins Road Laurel, Maryland 20723-6099 U.S.A. [email protected] Stochastic optimization algorithms have been growing rapidly in popularity over the last decade or two, with a number of methods now becoming “industry standard” approaches for solving challenging optimization problems. This chapter provides a synopsis of some of the critical issues associated with stochastic optimization and a gives a summary of several popular algorithms. Much more complete discussions are available in the indicated references. To help constrain the scope of this article, we restrict our attention to methods using only measurements of the criterion (loss function). Hence, we do not cover the many stochastic methods using information such as gradients of the loss function. Section 1 discusses some general issues in stochastic optimization. Section 2 discusses random search methods, which are simple and surprisingly powerful in many applications. Section 3 discusses stochastic approximation, which is a foundational approach in stochastic optimization. Section 4 discusses a popular method that is based on connections to natural evolution—genetic algorithms. Finally, Section 5 offers some concluding remarks. 1 Introduction 1.1 General Background Stochastic optimization plays a significant role in the analysis, design, and operation of modern systems. Methods for stochastic optimization provide a means of coping with inherent system noise and coping with models or systems that are highly nonlinear, high dimensional, or otherwise inappropriate for classical deterministic methods of optimization.
    [Show full text]
  • Contextual Covariance Matrix Adaptation Evolutionary Strategies
    Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17) Contextual Covariance Matrix Adaptation Evolutionary Strategies Abbas Abdolmaleki1;2;3, Bob Price5, Nuno Lau1, Luis Paulo Reis2;3, Gerhard Neumann4∗ 1: IEETA, DETI, University of Aveiro 2: DSI, University of Minho 3: LIACC, University of Porto, Portugal 4: CLAS, TU Darmstadt, Germany 5: PARC, A Xerox Company, USA Abstract the ball to its team mates which are positioned on differ- ent locations on the field. Here, the soccer robot should Many stochastic search algorithms are designed to learn to kick the ball to any given target location, which is optimize a fixed objective function to learn a task, specified by the context vector, on the field. In such cases, i.e., if the objective function changes slightly, for learning for every possible context is clearly inefficient or example, due to a change in the situation or con- even infeasible. Therefore our goal is to generalize learned text of the task, relearning is required to adapt to tasks from similar contexts to a new context. To do so, we the new context. For instance, if we want to learn learn a context-dependent policy for a continuous range of a kicking movement for a soccer robot, we have to contexts without restarting the learning process. In this pa- relearn the movement for different ball locations. per, we consider stochastic search algorithms for contextual Such relearning is undesired as it is highly inef- learning. Stochastic search algorithms [Hansen et al., 2003; ficient and many applications require a fast adap- Sun et al., 2009; Ruckstieߨ et al., 2008] optimize an objec- tation to a new context/situation.
    [Show full text]
  • Arxiv:2005.11011V2 [Quant-Ph] 11 Aug 2020 Error Is Fundamental to Measuring the Values on a Quan- Heuristics and Test Them Numerically on Example Prob- Tum Device
    Using models to improve optimizers for variational quantum algorithms Kevin J. Sung,1, 2, ∗ Jiahao Yao,3 Matthew P. Harrigan,1 Nicholas C. Rubin,1 Zhang Jiang,1 Lin Lin,3, 4 Ryan Babbush,1 and Jarrod R. McClean1, y 1Google Research, Venice, CA 2Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 3Department of Mathematics, University of California, Berkeley, CA 4Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA (Dated: August 13, 2020) Variational quantum algorithms are a leading candidate for early applications on noisy intermediate-scale quantum computers. These algorithms depend on a classical optimization outer- loop that minimizes some function of a parameterized quantum circuit. In practice, finite sampling error and gate errors make this a stochastic optimization with unique challenges that must be ad- dressed at the level of the optimizer. The sharp trade-off between precision and sampling time in conjunction with experimental constraints necessitates the development of new optimization strate- gies to minimize overall wall clock time in this setting. In this work, we introduce two optimization methods and numerically compare their performance with common methods in use today. The methods are surrogate model-based algorithms designed to improve reuse of collected data. They do so by utilizing a least-squares quadratic fit of sampled function values within a moving trusted region to estimate the gradient or a policy gradient. To make fair comparisons between optimization methods, we develop experimentally relevant cost models designed to balance efficiency in testing and accuracy with respect to cloud quantum computing systems. The results here underscore the need to both use relevant cost models and optimize hyperparameters of existing optimization meth- ods for competitive performance.
    [Show full text]
  • A Review of Simheuristics: Extending Metaheuristics to Deal with Stochastic Combinatorial Optimization Problems
    A Service of Leibniz-Informationszentrum econstor Wirtschaft Leibniz Information Centre Make Your Publications Visible. zbw for Economics Juan, Angel A.; Faulin, Javier; Grasman, Scott E.; Rabe, Markus; Figueira, Gonçalo Article A review of simheuristics: Extending metaheuristics to deal with stochastic combinatorial optimization problems Operations Research Perspectives Provided in Cooperation with: Elsevier Suggested Citation: Juan, Angel A.; Faulin, Javier; Grasman, Scott E.; Rabe, Markus; Figueira, Gonçalo (2015) : A review of simheuristics: Extending metaheuristics to deal with stochastic combinatorial optimization problems, Operations Research Perspectives, ISSN 2214-7160, Elsevier, Amsterdam, Vol. 2, pp. 62-72, http://dx.doi.org/10.1016/j.orp.2015.03.001 This Version is available at: http://hdl.handle.net/10419/178250 Standard-Nutzungsbedingungen: Terms of use: Die Dokumente auf EconStor dürfen zu eigenen wissenschaftlichen Documents in EconStor may be saved and copied for your Zwecken und zum Privatgebrauch gespeichert und kopiert werden. personal and scholarly purposes. Sie dürfen die Dokumente nicht für öffentliche oder kommerzielle You are not to copy documents for public or commercial Zwecke vervielfältigen, öffentlich ausstellen, öffentlich zugänglich purposes, to exhibit the documents publicly, to make them machen, vertreiben oder anderweitig nutzen. publicly available on the internet, or to distribute or otherwise use the documents in public. Sofern die Verfasser die Dokumente unter Open-Content-Lizenzen (insbesondere CC-Lizenzen) zur Verfügung gestellt haben sollten, If the documents have been made available under an Open gelten abweichend von diesen Nutzungsbedingungen die in der dort Content Licence (especially Creative Commons Licences), you genannten Lizenz gewährten Nutzungsrechte. may exercise further usage rights as specified in the indicated licence.
    [Show full text]
  • Kafisto: a Kalman Filtering Framework for Stochastic Optimization
    KaFiStO: A Kalman Filtering Framework for Stochastic Optimization Aram Davtyan, Sepehr Sameni, Llukman Cerkezi, Givi Meishvilli, Adam Bielski, Paolo Favaro Computer Vision Group University of Bern faram.davtyan, sepehr.sameni, llukman.cerkezi givi.meishvilli, adam.bielski, [email protected] Abstract tic gradient descent (SGD) [61]. SGD owes its performance gains to the adoption of an approximate version of the ob- Optimization is often cast as a deterministic problem, jective function at each iteration step, which, in turn, yields where the solution is found through some iterative proce- an approximate or noisy gradient. dure such as gradient descent. However, when training neu- While SGD seems to benefit greatly (e.g., in terms of rate ral networks the loss function changes over (iteration) time of convergence) from such approximation, it has also been due to the randomized selection of a subset of the samples. shown that too much noise hurts the performance [83,7]. This randomization turns the optimization problem into a This suggests that, to further improve over SGD, one could stochastic one. We propose to consider the loss as a noisy attempt to model the noise of the objective function. We observation with respect to some reference optimum. This consider the iteration-time varying loss function used in interpretation of the loss allows us to adopt Kalman filter- SGD as a stochastic process obtained by adding the ex- ing as an optimizer, as its recursive formulation is designed pected risk to zero mean Gaussian noise. A powerful ap- to estimate unknown parameters from noisy measurements. proach designed to handle estimation with such processes Moreover, we show that the Kalman Filter dynamical model is Kalman filtering [36].
    [Show full text]
  • CMA-ES and Advanced Adaptation Mechanisms Youhei Akimoto, Nikolaus Hansen
    CMA-ES and Advanced Adaptation Mechanisms Youhei Akimoto, Nikolaus Hansen To cite this version: Youhei Akimoto, Nikolaus Hansen. CMA-ES and Advanced Adaptation Mechanisms. GECCO ’18 Companion: Proceedings of the Genetic and Evolutionary Computation Conference Companion, Jul 2018, Kyoto, Japan. hal-01959479 HAL Id: hal-01959479 https://hal.inria.fr/hal-01959479 Submitted on 18 Dec 2018 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. CMA-ES and Advanced Adaptation Mechanisms Youhei Akimoto1 & Nikolaus Hansen2 1. University of Tsukuba, Japan 2. Inria, Research Centre Saclay, France [email protected] [email protected] Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author. GECCO '18 Companion, July 15–19, 2018, Kyoto, Japan © 2018 Copyright is held by the owner/author(s). ACM ISBN 978-1-4503-5764-7/18/07.
    [Show full text]
  • A Comparative Study of PSO and CMA-ES Algorithms on Black-Box Optimization Benchmarks Paweł Szynkiewicz Research and Academic Computer Network (NASK), Warsaw, Poland
    Paper A Comparative Study of PSO and CMA-ES Algorithms on Black-box Optimization Benchmarks Paweª Szynkiewicz Research and Academic Computer Network (NASK), Warsaw, Poland https://doi.org/10.26636/jtit.2018.127418 Abstract|Numerous practical engineering applications can suitable or entirely unavailable, are referred to as black-box be formulated as non-convex, non-smooth, multi-modal and problems. Thereby we assume that the black-box can be ill-conditioned optimization problems. Classical, determin- queried through a simulation, experimental measurements istic algorithms require an enormous computational effort, or the so-called surrogate model to provide crucial perfor- which tends to fail as the problem size and its complexity in- mance characteristics [1]{[5] for specified values of system crease, which is often the case. On the other hand, stochastic, inputs. Practical applications in network system design, biologically-inspired techniques, designed for global optimum cybersecurity, large scale systems modeling, optimization calculation, frequently prove successful when applied to real life computational problems. While the area of bio-inspired and control, etc. are discussed in [6]{[11]. algorithms (BIAs) is still relatively young, it is undergoing con- Let us consider the following black-box optimization prob- tinuous, rapid development. Selection and tuning of the ap- lem: propriate optimization solver for a particular task can be chal- lenging and requires expert knowledge of the methods to be min f (x) considered. Comparing the performance of viable candidates x2ℜdim against a defined test bed environment can help in solving such min max subject to: xi ≤ xi ≤ xi ; i = 1; :::; dim ; (1) dilemmas. This paper presents the benchmark results of two biologically inspired algorithms: covariance matrix adapta- where f is the real-valued, dim-dimensional function.
    [Show full text]