Improved Gradient Descent Learning Strategy with Simulated Annealing


SA-GD: Improved Gradient Descent Learning Strategy with Simulated Annealing

Zhicheng Cai
School of Electronic Science and Engineering, Nanjing University, Nanjing, China 210023
Email: [email protected]

arXiv:2107.07558v1 [cs.LG] 15 Jul 2021

Abstract—The gradient descent algorithm is the most widely used method for optimizing machine learning problems. However, there exist many local minima and saddle points in the loss function, especially for high-dimensional non-convex optimization problems such as deep learning. Gradient descent may leave the loss function trapped in these local intervals, impeding further optimization and resulting in poor generalization ability. This paper proposes the SA-GD algorithm, which introduces the idea of simulated annealing into gradient descent. The SA-GD method gives the model the ability to climb hills with a certain probability, tending to enable the model to jump out of these local areas and finally converge to an optimal state. Taking CNN models as an example, we tested basic CNN models on various benchmark datasets. Compared to baseline models trained with the traditional gradient descent algorithm, models trained with SA-GD possess better generalization ability without sacrificing the efficiency or stability of convergence. In addition, SA-GD can be utilized as an effective ensemble learning approach which improves the final performance significantly.

I. INTRODUCTION

It has been decades since the concept of machine learning was raised. From simple linear models to more complex models processing high-dimensional data, such as reinforcement learning [32] and deep learning [6], [16], the machine learning community has prospered continuously through unremitting research. Moreover, machine learning has been successfully applied to many significant realms and tasks such as natural language processing [33], computer vision [31], and speech recognition [2].

Most machine learning models can be regarded as a kind of unconstrained optimization problem, requiring optimal parameters to obtain better representation ability. The gradient descent algorithm [22], an iterative method originally used to solve nonlinear equations, is generally applied to optimize machine learning models such as linear regression [7], multi-layer perceptrons [28], convolutional neural networks [8], and so on [30]. Specifically, when facing a machine learning problem, we utilize gradient descent iteratively to minimize a well-designed target function and update the learned parameters by back-propagation [17] until the model converges. For tasks with a tremendous dataset, the stochastic gradient descent (SGD) algorithm [12], which splits the dataset into mini-batches, is utilized to balance the speed and stability of convergence.

However, in the case of high-dimensional non-convex optimization problems, there exist many intractable circumstances such as local minima, saddle points, and plateaus in the loss function [6]. It is easy for optimization with gradient descent to drive the model into these disappointing areas and inevitably fail to jump out. These situations hinder further optimization and learning. As a consequence, the trained model fails to obtain sufficient robustness and good generalization ability. Therefore, many improved gradient descent algorithms have been proposed, such as NAG [18], Adam [14], and so on [25]. However, these improved optimization algorithms still keep the loss function performing gradient descent without possessing the ability to climb hills [6], and fail to solve the issues mentioned above. As a matter of fact, researchers prefer simply utilizing stochastic gradient descent in many tasks [8], [15]. In addition, these improved optimization algorithms fail to achieve better performance than simple SGD under some circumstances.

We believe that equipping machine learning models with the ability to climb hills is one approach to make the loss function jump out of these disappointing local areas. In addition, this ability should be governed by a certain probability function. Moreover, we argue that this probability function should take all present model states into consideration, including the loss, the epoch, and even the learning rate. To take effect while ensuring model convergence, this probability should be low in the beginning and then ascend, but stay under a certain ceiling at all times. This paper sets the upper bound to 33 percent.

Inspired by the simulated annealing algorithm [29], whose probability function takes energy and temperature into consideration, we propose the SA-GD optimization algorithm, which stands for gradient descent improved by simulated annealing. The SA-GD algorithm possesses a gradient ascent probability function of a similar form to that of simulated annealing. This function enables the model to perform gradient ascent with a certain probability. With the ability to climb hills, the model is expected to jump out of these intractable local areas and converge to an optimal state. Moreover, the probability of performing gradient ascent can be regarded as a sort of noisy disturbance, which enhances the model generalization ability as well. In Section 3 we exhibit the derivation of the final gradient ascent probability function step by step.

In addition, due to the fluctuation of the loss curve caused by gradient ascent, the models at various epochs during training are expected to differ significantly. On the precondition that all selected models perform well individually, the group of selected models possesses diversity. As a result, the idea of ensemble learning with SA-GD arises naturally.

II. RELATED WORK

A. Gradient descent algorithm

For most deep learning and neural network algorithms, we utilize the gradient descent algorithm to minimize the cost function. Suppose the current cost function is J(θ_i) and J′(θ_i) is the derivative of the cost function at θ_i, which is also the direction of the gradient that reduces the cost function the fastest. We can gradually reduce the value of the cost function by moving the parameter θ_i one small step in the negative derivative direction, so that

    θ_{i+1} = θ_i − η J′(θ_i)

to obtain J(θ_{i+1}) < J(θ_i), where η is the hyper-parameter learning rate. Iterating the above process in the hope of eventually obtaining the global optimal solution is the gradient descent algorithm.

However, for a high-dimensional non-convex optimization problem like deep learning, there are often many saddle points, local minima, or plateaus [6]. Gradient descent may make the cost function fall into such local intervals and fail to jump out and converge to the global optimal solution. Fig. 1 shows the case where the cost function falls into a local minimum. If the learned weights of the network are not initialized properly, it is very easy for the network to eventually converge to a poor state [6].

Fig. 1. Schematic diagram of the case where the cost function falls into a local minimum.

Stochastic gradient descent [12] takes one random sample at a time to calculate the gradient. Since the optimal direction of a single sample point may not be the global optimal direction, the possibility that the gradient is non-zero when the loss falls into a local minimum is preserved, giving the loss function a chance to jump out of the local minimum. However, the stochastic approach can be seen as adding a random factor to the direction of the gradient and may provide a wrong direction of optimization, resulting in the network eventually failing to converge to a better state. For deep learning models, the probability of saddle points appearing is much higher than that of local minima [6]. Some improved gradient descent algorithms, such as gradient descent with a standard momentum term [21] or Nesterov momentum [18], and adaptive learning rate optimization algorithms starting from the perspective of the learning rate, such as AdaGrad [19] and AdaDelta [34], try to make the model escape from the saddle point and reach a better convergence result. However, our experimental results validate that these improved optimization algorithms fail to make a significant improvement to model generalization ability.

B. Simulated annealing algorithm

Research in statistical mechanics has demonstrated that if a system is slowly cooled down (i.e., an annealing process) starting from a high temperature, the energy of the system is steadily reduced and the state of the system particles is continuously changed. After sufficient transitions, thermal equilibrium can be reached at each temperature. When the system is completely cooled, it enters a state of minimal energy with high probability [4]. The Metropolis algorithm [1] constructs a mathematical model that simply describes the annealing process. Assuming that the energy of the system in state n is E(n), the probability that the system transitions from state n to state n+1 at temperature T can be formulated as Eq. 1:

    P(n → n+1) = { 1,                              E(n+1) ≤ E(n)
                 { e^{(E(n) − E(n+1)) / (K T)},    E(n+1) > E(n)        (1)

where K is the Boltzmann constant and T is the system temperature. Under a certain temperature, the system reaches thermal equilibrium after sufficient transitions have taken place. The probability of the system being in state i at this point satisfies the Boltzmann distribution [4], as illustrated in Eq. 2:

    P_T(X = i) = e^{−E(i)/(K T)} / Σ_{j∈S} e^{−E(j)/(K T)}        (2)

where S is the set of state spaces. When the temperature drops to 0, we have:

    lim_{T→0} e^{−(E(i)−E_min)/(K T)} / Σ_{j∈S} e^{−(E(j)−E_min)/(K T)}
    = lim_{T→0} e^{−(E(i)−E_min)/(K T)} / ( Σ_{j∈S_min} e^{−(E(j)−E_min)/(K T)} + Σ_{j∉S_min} e^{−(E(j)−E_min)/(K T)} )
    = lim_{T→0} e^{−(E(i)−E_min)/(K T)} / Σ_{j∈S_min} e^{−(E(j)−E_min)/(K T)}
    = { 1/|S_min|,   i ∈ S_min
      { 0,           otherwise        (3)

where E_min represents the lowest energy in the state space and S_min = {i | E(i) = E_min} stands for the set of states with the lowest energy.
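The two halves of Section II can be combined into a short sketch of an SA-GD-style update. This is an illustrative reading only, not the paper's exact method: the ascent-probability schedule below (rising from zero toward the 33 percent ceiling as epochs pass) is our assumption, whereas the paper derives its probability function from the loss, epoch, and learning rate in Section 3.

```python
import math
import random

def sa_gd_step(theta, grad, lr, epoch, p_max=0.33, warmup=50.0):
    """One SA-GD-style parameter update (illustrative sketch).

    With probability p_ascent the step goes uphill (gradient ascent);
    otherwise it is an ordinary gradient descent step. The schedule
    is low in the beginning, ascends, and stays under the p_max ceiling,
    matching the qualitative requirements stated in the introduction.
    """
    p_ascent = p_max * (1.0 - math.exp(-epoch / warmup))  # 0 at epoch 0, capped at p_max
    if random.random() < p_ascent:
        return theta + lr * grad   # climb the hill with some probability
    return theta - lr * grad       # usual gradient descent step

# Toy check on f(x) = x^2: descent steps dominate (at least 67% of the
# time), so the iterate still shrinks toward the minimum on average.
random.seed(0)
theta = 5.0
for epoch in range(300):
    theta = sa_gd_step(theta, grad=2.0 * theta, lr=0.05, epoch=epoch)
```

Because the per-step multiplier is 0.9 on descent and 1.1 on ascent, the expected log-step stays negative whenever the ascent probability is below roughly 50 percent, which is why a ceiling such as 33 percent preserves convergence.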
Recommended publications
  • Simulated Annealing Hyper-Heuristic for a Shelf Space Allocation on Symmetrical Planograms Problem
    Symmetry, Article: Simulated Annealing Hyper-Heuristic for a Shelf Space Allocation on Symmetrical Planograms Problem. Kateryna Czerniachowska * and Marcin Hernes, Department of Process Management, Wroclaw University of Economics and Business, Komandorska 118/120, 53-345 Wroclaw, Poland; [email protected] * Correspondence: [email protected] Abstract: The allocation of products on shelves is an important issue from the point of view of effective decision making by retailers. In this paper, we investigate a practical shelf space allocation model which takes into account the number of facings, capping, and nesting of a product. We divide the shelf into segments of variable size in which products of specific types can be placed. The interconnections between products are modelled by categorizing the products into specific types as well as grouping some of them into clusters. This results in four groups of constraints—shelf constraints, shelf type constraints, product constraints, position allocation constraints—that are used in the model for aesthetic symmetry of a planogram. We propose a simulated annealing algorithm with improvement and reallocation procedures to solve the planogram profit maximization problem. Experiments are based on artificial data sets that have been generated according to real-world conditions. The efficiency of the designed algorithm has been estimated using the CPLEX solver. The computational tests demonstrate that the proposed algorithm gives valuable results in an acceptable time. Citation: Czerniachowska, K.; Hernes, M. Simulated Annealing Hyper-Heuristic for a Shelf Space Allocation on Symmetrical Planograms Problem. Keywords: retailing decisions; shelf space allocation; simulated annealing; heuristics; merchandising
  • A Simulated Annealing Based Inexact Oracle for Wasserstein Loss Minimization
    A Simulated Annealing Based Inexact Oracle for Wasserstein Loss Minimization. Jianbo Ye, James Z. Wang, Jia Li. Abstract: Learning under a Wasserstein loss, a.k.a. Wasserstein loss minimization (WLM), is an emerging research topic for gaining insights from a large set of structured objects. Despite being conceptually simple, WLM problems are computationally challenging because they involve minimizing over functions of quantities (i.e. Wasserstein distances) that themselves require numerical algorithms to compute. In this paper, we introduce a stochastic approach based on simulated annealing for solving WLMs. Particularly, we have developed a Gibbs sampler to approximate effectively and efficiently the partial gradients of a sequence of Wasserstein losses. Our new approach has the advantages of numerical stability and readiness for warm starts. ∇_y l(x; ·) is computed in O(m) time, m being the complexity of outcome variables x or y. This part of the calculation is often negligible compared with the calculation of the full gradient with respect to the model parameters. But this is no longer the case in learning problems based on Wasserstein distance, due to the intrinsic complexity of the distance. We will call such problems Wasserstein loss minimization (WLM). Examples of WLMs include Wasserstein barycenters (Li & Wang, 2008; Agueh & Carlier, 2011; Cuturi & Doucet, 2014; Benamou et al., 2015; Ye & Li, 2014; Ye et al., 2017b), principal geodesics (Seguy & Cuturi, 2015), nonnegative matrix factorization (Rolet et al., 2016; Sandler & Lindenbaum, 2009), barycentric coordinates (Bonneel et al., 2016), and multi-label classification (Frogner et al., 2015).
  • University of Waterloo Faculty of Engineering
    University of Waterloo, Faculty of Engineering. Global Optimization of General Failure Surfaces in Slope Analysis by Hybrid Simulated Annealing. Rocscience, Inc., Toronto, Ontario. Prepared by Su, Xiao, ID 20251755, 2B Chemical Engineering, May 1st, 2009. 4105 Credit Pointe Dr, Mississauga, Ontario L5M 3K7. May 1st, 2009. Dr Thomas Duever, Chair of Chemical Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1. Dear Dr Duever, I am submitting this document, entitled “Global Optimization of General Failure Surfaces in Slope Analysis by Hybrid Simulated Annealing” as my 2B Work Report for Rocscience, Inc. My project investigated the efficiency of a hybrid simulated annealing (HSA) method for finding the minima (critical failure surfaces and factors of safety) of slope stability equations. The HSA combines a Very Fast Simulated Annealing (VFSA) and a Local Monte-Carlo (LMC) search, the latter being an optimization algorithm designed specifically for solving slope stability problems. This work is an extension as well as a generalization of the project from my summer work-term (2008), also undertaken at Rocscience. During both terms, I worked under the supervision of Dr. Brent Corkum, software development manager at Rocscience. I implemented the simulated annealing algorithm as a new optimization method in SLIDE 5.0, a slope stability analysis program developed by Rocscience. In addition to learning professional C++ and principles of object oriented programming, I had the opportunity to study the theoretical and practical aspects of non-convex optimization, especially the performance of stochastic methods when used on multi-modal functions. I also learned soil mechanics and slope stability principles. This report was written entirely by me and has not received any previous academic credit at Rocscience or the University of Waterloo.
  • Arxiv:2101.10154V1 [Cond-Mat.Dis-Nn] 25 Jan 2021
    Variational Neural Annealing. Mohamed Hibat-Allah, Estelle M. Inack, Roeland Wiersema, Roger G. Melko, and Juan Carrasquilla. Vector Institute, MaRS Centre, Toronto, Ontario, M5G 1M1, Canada; Department of Physics and Astronomy, University of Waterloo, Ontario, N2L 3G1, Canada; Perimeter Institute for Theoretical Physics, Waterloo, ON N2L 2Y5, Canada. (Dated: January 26, 2021) Many important challenges in science and technology can be cast as optimization problems. When viewed in a statistical physics framework, these can be tackled by simulated annealing, where a gradual cooling procedure helps search for groundstate solutions of a target Hamiltonian. While powerful, simulated annealing is known to have prohibitively slow sampling dynamics when the optimization landscape is rough or glassy. Here we show that by generalizing the target distribution with a parameterized model, an analogous annealing framework based on the variational principle can be used to search for groundstate solutions. Modern autoregressive models such as recurrent neural networks provide ideal parameterizations since they can be exactly sampled without slow dynamics even when the model encodes a rough landscape. We implement this procedure in the classical and quantum settings on several prototypical spin glass Hamiltonians, and find that it significantly outperforms traditional simulated annealing in the asymptotic limit, illustrating the potential power of this yet unexplored route to optimization.
  • A Simulated Annealing Algorithm for Solving Two-Echelon Vehicle Routing Problem with Locker Facilities
    algorithms Article A Simulated Annealing Algorithm for Solving Two-Echelon Vehicle Routing Problem with Locker Facilities A. A. N. Perwira Redi 1 , Parida Jewpanya 2,*, Adji Candra Kurniawan 3, Satria Fadil Persada 4, Reny Nadlifatin 5 and Oki Anita Candra Dewi 6 1 Industrial Engineering Department, BINUS Graduate Program—Master of Industrial Engineering, Bina Nusantara University, Jakarta 11480, Indonesia; [email protected] 2 Department of Industrial Engineering, Rajamangala University of Technology Lanna Tak, Tak 63000, Thailand 3 Department of Logistics Engineering, Universitas Pertamina, Jakarta 12220, Indonesia; [email protected] 4 Department of Business Management, Institut Teknologi Sepuluh Nopember, Surabaya 60111, Indonesia; [email protected] 5 Department of Technology Management, Institut Teknologi Sepuluh Nopember, Surabaya 60111, Indonesia; [email protected] 6 Logistics Engineering Department, Universitas Internasional Semen Indonesia, Gresik 61122, Indonesia; [email protected] * Correspondence: [email protected] Received: 6 August 2020; Accepted: 1 September 2020; Published: 3 September 2020 Abstract: We consider the problem of utilizing the parcel locker network for the logistics solution in the metropolitan area. Two-echelon distribution systems are attractive from an economic standpoint, whereas the product from the depot can be distributed from or to intermediate facilities. In this case, the intermediate facilities are considered as locker facilities present in an accessible location in the vicinity of the final customers. In addition, the utilization of locker facilities can reduce the cost caused by the unattended deliveries. The problem is addressed as an optimization model that formulated into an integer linear programming model denoted as the two-echelon vehicle routing problem with locker facilities (2EVRP-LF).
  • Microwave Imaging for Conducting Scatterers by Hybrid Particle Swarm Optimization with Simulated Annealing
    2011 8th International Multi-Conference on Systems, Signals & Devices. Microwave Imaging for Conducting Scatterers by Hybrid Particle Swarm Optimization with Simulated Annealing. Bouzid Mhamdi, Khaled Grayaa and Taoufik Aguili, Syscom Laboratory, Engineer School of Tunis - BP 37 Belvedere, Tunis - 1002, Tunisia. [email protected] ABSTRACT: In this paper, a microwave imaging technique for reconstructing the shape of two-dimensional perfectly conducting scatterers by means of a stochastic optimization approach is investigated. Based on the boundary condition and the measured scattered field derived by transverse magnetic illuminations, a set of nonlinear integral equations is obtained and the imaging problem is reformulated into an optimization problem. The first class of methods is based on gradient searching schemes such as the Newton-Kantorovitch method [3], the modified gradient method [4] and the Levenberg-Marquardt algorithm [5]. It is well known that these deterministic techniques used for fast reconstruction of microwave images tend to get trapped in a local extreme and suffer from a major drawback, where the final image is highly dependent on the initial trial solution. To overcome this obstacle, a second class of population-based stochastic methods such as the genetic algorithm (GA) [1-2], simulated annealing (SA) [6] and neural networks (NN) [7] has become an attractive alternative to reconstruct microwave images. These techniques consider the imaging problem as a global optimization problem, and reconstruct the correct image by searching for the optimal solution through the use of either rivalry or cooperation strategies in the problem space. A hybrid approximation algorithm, called PSO-SA, is developed in this work to solve the scattering inverse problem. In the hybrid algorithm, particle swarm optimization (PSO) combines global search and local search for finding the optimal results assignment within reasonable time, and simulated annealing (SA) uses a certain probability to avoid being trapped in a local optimum.
  • ASAP: Architecture Search, Anneal and Prune (arXiv:1904.04123v2 [stat.ML] 10 Oct 2019)
    ASAP: Architecture Search, Anneal and Prune. Asaf Noy 1, Niv Nayman 1, Tal Ridnik 1, Nadav Zamir 1, Sivan Doveh 2, Itamar Friedman 1, Raja Giryes 2, and Lihi Zelnik-Manor 1. 1 DAMO Academy, Alibaba Group (fi[email protected]); 2 School of Electrical Engineering, Tel-Aviv University, Tel-Aviv, Israel. Abstract: Automatic methods for Neural Architecture Search (NAS) have been shown to produce state-of-the-art network models. Yet, their main drawback is the computational complexity of the search process. As some primal methods optimized over a discrete search space, thousands of days of GPU were required for convergence. Recent methods, such as ENAS [34] and DARTS [12], reduced the search time to a few GPU days, while not compromising much on accuracy. While this is a major advancement, there is still a need to speed up the process to make the automatic search affordable and applicable to more problems. In this paper we propose an approach that further reduces the search time to a few hours, rather than days or weeks. The key idea that enables this speed-up is relaxing the discrete search space to be continuous, differentiable, and annealable. For continuity and differentiability, we follow the approach in DARTS [12], which shows that a continuous and differentiable search space allows for gradient-based optimization, resulting in orders of magnitude fewer computations in comparison with black-box search, e.g., of [48, 35]. A recent approach is based on constructing a differentiable search space that enables gradient-based optimization, which reduces the search time to a few days. While successful, it still includes some noncontinuous steps, e.g., the pruning of many weak connections at once. In this paper, we propose a differentiable search space that allows the annealing of architecture weights, while gradually pruning inferior operations.
  • Lecture 19 -- Simulated Annealing
    Simulated Annealing. Biostatistics 615/815, Lecture 19. Scheduling: • Need to pick a date for mid-term • Default date is December 20, 2006 • We could have it earlier… for example, on December 12, 2006? • What do you prefer? So far: “Greedy” optimization methods • Can get trapped at local minima • Outcome might depend on starting point • Examples: Golden Search, Nelder-Mead Simplex Optimization, E-M Algorithm. Today: • Simulated Annealing • Markov-Chain Monte-Carlo method • Designed to search for global minimum among many local minima. The Problem: • Most minimization strategies find the nearest local minimum • Standard strategy: generate trial point based on current estimates, evaluate function at proposed location, accept new value if it improves solution. The Solution: • We need a strategy to find other minima • This means we must sometimes select new points that do not improve solution • How? Annealing: • One manner in which crystals are formed • Gradual cooling of liquid… at high temperatures, molecules move freely; at low temperatures, molecules are “stuck” • If cooling is slow, a low-energy, organized crystal lattice is formed. Simulated Annealing: • Analogy with thermodynamics • Incorporate a temperature parameter into the minimization procedure • At high temperatures, explore parameter space; at lower temperatures, restrict exploration. Markov Chain: • Start with some sample (a set of mixture parameters) • Propose a change (edit mixture parameters in some way) • Decide whether to accept change (decision is based on relative probabilities)
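The propose/evaluate/accept recipe in these lecture slides condenses into a short routine. This sketch assumes a one-dimensional Gaussian proposal and geometric cooling; neither choice comes from the slides themselves:

```python
import math
import random

def simulated_annealing(f, x0, t0=1.0, cooling=0.995, n_iter=2000, step=0.5):
    """Search for a global minimum of f using the Metropolis rule:
    always accept an improving move; accept a worsening move with
    probability exp(-delta / T), lowering the temperature T each step."""
    x, fx = x0, f(x0)
    best, fbest = x, fx
    t = t0
    for _ in range(n_iter):
        x_new = x + random.gauss(0.0, step)   # propose a trial point
        fx_new = f(x_new)                     # evaluate at proposed location
        delta = fx_new - fx
        if delta <= 0 or random.random() < math.exp(-delta / t):
            x, fx = x_new, fx_new             # accept the move
            if fx < fbest:
                best, fbest = x, fx           # track the best state seen
        t *= cooling                          # gradual cooling
    return best, fbest

# A multi-modal test function with many local minima; the quadratic
# term keeps the global minimum near the origin.
random.seed(1)
x_best, f_best = simulated_annealing(lambda x: x * x + 3.0 * math.sin(5.0 * x), x0=4.0)
```

At high temperature nearly all moves are accepted (free exploration); as T shrinks, uphill moves become rare and the search settles, mirroring the annealing analogy in the slides.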
  • Simulated Annealing
    Appendix A: Simulated Annealing. A.1 Introduction. Simulated annealing can be considered as an extension of local optimization methods because: • This approach can be applied to criteria that are neither continuous nor continuously differentiable. It is just necessary to be able to compute the value of the criterion for any feasible solution. Thus, the criterion may be given by any type of function or even by an algorithm that returns a numerical value starting from the values of the parameters that define a solution. • The variables can be of a qualitative nature, needing only to be able to derive from their “values” a quantitative value of the criterion. Indeed, simulated annealing easily applies to combinatorial optimization problems. A problem P that belongs to the field of application of simulated annealing is expressed as any optimization problem: find s* ∈ S such that f(s*) = Opt_{s∈S} f(s), where: • S is the set of feasible solutions. A feasible solution is one that satisfies all the constraints. We will see some examples in the next section. • f is a “function” in the broadest sense, as explained above. This function is the criterion. • Opt refers either to minimization or to maximization, depending on the type of problem. A.2 Basic Requirements. In order to apply simulated annealing, we must be able to: • Compute the value of the criterion for any feasible solution. • Define an initial feasible solution. • Derive a neighboring feasible solution from any current solution. The criterion depends on the problem to be solved. An initial feasible solution may be difficult to find.
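The three basic requirements in A.2 map naturally onto three callables: a criterion, an initial solution, and a neighbor generator. The toy combinatorial problem below (ordering items so that adjacent value gaps are small, with a swap neighborhood) is our own choice for illustration, not an example from the appendix:

```python
import math
import random

# A.2's three requirements, expressed as callables for a toy
# combinatorial problem over permutations (qualitative variables).
values = [9, 2, 7, 1, 8, 3]

def criterion(order):                  # value of any feasible solution
    return sum(abs(values[a] - values[b]) for a, b in zip(order, order[1:]))

def initial_solution():                # an initial feasible solution
    return list(range(len(values)))

def neighbor(order):                   # a neighboring feasible solution
    i, j = random.sample(range(len(order)), 2)
    out = order[:]
    out[i], out[j] = out[j], out[i]    # swap two positions
    return out

# Generic annealing loop built only from the three callables.
random.seed(3)
s = initial_solution()
best = s[:]
t = 5.0
for _ in range(4000):
    cand = neighbor(s)
    delta = criterion(cand) - criterion(s)
    if delta <= 0 or random.random() < math.exp(-delta / t):
        s = cand
        if criterion(s) < criterion(best):
            best = s[:]
    t *= 0.999
```

Note that nothing in the loop inspects the structure of a solution: only the three problem-specific callables do, which is exactly why the appendix stresses them as the sole prerequisites.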
  • Heuristics: Simulated Annealing and Genetic Algorithms
    Simulated annealing and genetic algorithms. CE 377K, Stephen D. Boyles, Spring 2015. 1 Introduction. Different optimization problems must be solved in different ways. Even just for the simple examples developed at the start of this course, the way that one solves the shortest path problem, for instance, is very different from the way that one solves the transit frequency setting problem. Later in this course, we will learn examples of these types of specific solution methods. But first, we start with a broad class of solution methods which can be applied to nearly any optimization problem. In optimization, there is often a tradeoff between how widely applicable a solution method is, and how efficient or effective it is at solving specific problems. For any one specific problem, a tailor-made solution process is likely much faster than a generally-applicable method, but generating such a method requires more effort and specialized knowledge, and is less “rewarding” in the sense that the method can only be more narrowly applied. So, the first solution methods you will see in this course are very widely applicable. They are examples of heuristics, which are not guaranteed to find the global optimum solution, but tend to work reasonably well in practice. In practice, they only tend to be applied for very large or very complicated problems which cannot be solved exactly in a reasonable amount of time with our current knowledge. But until you learn the more specialized solution methods later in the class, you can also use these even for smaller or simpler optimization problems too.
  • Improving Simulated Annealing Through Derandomization
    Improving Simulated Annealing through Derandomization. Mathieu Gerber*, Luke Bornn, Department of Statistics, Harvard University. arXiv:1505.03173v4 [stat.CO] 5 Sep 2016. We propose and study a version of simulated annealing (SA) on continuous state spaces based on (t, s)_R-sequences. The parameter R ∈ N̄ regulates the degree of randomness of the input sequence, with the case R = 0 corresponding to IID uniform random numbers and the limiting case R = ∞ to (t, s)-sequences. Our main result, obtained for rectangular domains, shows that the resulting optimization method, which we refer to as QMC-SA, converges almost surely to the global optimum of the objective function φ for any R ∈ N̄. When φ is univariate, we are in addition able to show that the completely deterministic version of QMC-SA is convergent. A key property of these results is that they do not require objective-dependent conditions on the cooling schedule. As a corollary of our theoretical analysis, we provide a new almost sure convergence result for SA which shares this property under minimal assumptions on φ. We further explain how our results in fact apply to a broader class of optimization methods including for example threshold accepting, for which to our knowledge no convergence results currently exist. We finally illustrate the superiority of QMC-SA over SA algorithms in a numerical study. Keywords: Global optimization; Quasi-Monte Carlo; Randomized quasi-Monte Carlo; Simulated annealing; Threshold accepting. 1. Introduction. Simulated annealing (SA) belongs to the class of stochastic optimization techniques, a popular suite of tools to find the global optimum of multi-modal and/or non-differentiable functions defined on continuous sets (Geman and Hwang, 1986).
  • Improved Salp Swarm Algorithm with Simulated Annealing for Solving Engineering Optimization Problems
    Symmetry, Article: Improved Salp Swarm Algorithm with Simulated Annealing for Solving Engineering Optimization Problems. Qing Duan, Lu Wang, Hongwei Kang *, Yong Shen, Xingping Sun and Qingyi Chen, School of Software, Yunnan University, Kunming 650000, China; [email protected] (Q.D.); [email protected] (L.W.); [email protected] (Y.S.); [email protected] (X.S.); [email protected] (Q.C.) * Correspondence: [email protected] Abstract: Swarm-based algorithms can successfully avoid the local optimal constraints, thus achieving a smooth balance between exploration and exploitation. The salp swarm algorithm (SSA), a swarm-based algorithm modelled on the predation behavior of the salp, can solve complex optimization problems arising in daily life. SSA also has the problems of local stagnation and slow convergence rate. This paper introduces an improved salp swarm algorithm, which improves the SSA by using a chaotic sequence initialization strategy and symmetric adaptive population division. Moreover, a simulated annealing mechanism based on symmetric perturbation is introduced to enhance the local jumping ability of the algorithm. The improved algorithm is referred to as SASSA. The CEC standard benchmark functions are used to evaluate the efficiency of SASSA, and the results demonstrate that SASSA has better global search capability. SASSA is also applied to solve engineering optimization problems. The experimental results demonstrate that the exploratory and exploitative proclivities of the proposed algorithm and its convergence patterns are vividly improved. Citation: Duan, Q.; Wang, L.; Kang, H.; Shen, Y.; Sun, X.; Chen, Q. Improved Salp Swarm Algorithm with Simulated Annealing for Solving Engineering Optimization Problems. Keywords: swarm-based algorithm; salp swarm algorithm; single objective optimization; symmetric perturbation; simulated annealing; engineering optimization problems