Global Optimization by Adapted Diffusion

Citation: Poliannikov, Oleg V., Elena Zhizhina, and Hamid Krim. “Global Optimization by Adapted Diffusion.” IEEE Transactions on Signal Processing 58.12 (2010): 6119–6125. Web. © 2012 IEEE.
As Published: http://dx.doi.org/10.1109/tsp.2010.2071867
Publisher: Institute of Electrical and Electronics Engineers
Version: Final published version
Citable link: http://hdl.handle.net/1721.1/70849
Terms of Use: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 12, DECEMBER 2010

Oleg V. Poliannikov, Elena Zhizhina, and Hamid Krim, Fellow, IEEE

Abstract—In this paper, we study a diffusion stochastic dynamics with a general diffusion coefficient. The main result is that adapting the diffusion coefficient to the Hamiltonian makes it possible to escape wide local minima and to speed up the convergence of the dynamics to the global minima. We prove the convergence of the invariant measure of the modified dynamics to a measure concentrated on the set of global minima and show how to choose a diffusion coefficient for a certain class of Hamiltonians.

Index Terms—Nonlinear systems, optimization methods, simulated annealing, stochastic fields.

Manuscript received November 22, 2009; accepted August 15, 2010. Date of publication September 02, 2010; date of current version November 17, 2010. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Jean-Christophe Pesquet. This work was also supported by the U.S. Air Force Office of Scientific Research under Grant FA 9550-07-1-0104.
O. V. Poliannikov is with the Earth Resources Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: [email protected]).
E. Zhizhina is with the Dobrushin Laboratory, Institute for Information Transmission Problems, Moscow GSP-4, 127994, Russia (e-mail: [email protected]).
H. Krim is with the Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC 27695 USA (e-mail: ahk@ncsu.edu).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSP.2010.2071867
1053-587X/$26.00 © 2010 IEEE

I. INTRODUCTION

Gibbs field based stochastic methods have long been recognized as an effective approach to solving problems of global optimization; see, for instance, [1]–[6]. Their essence can be summarized as follows. Consider an equilibrium stochastic dynamics with a stationary Gibbs measure, where the latter is associated with some energy functional. The dynamics is then changed so that it is no longer in equilibrium, and its limit distribution is concentrated on the set of global minima of that functional. This approach to optimization is generally called simulated annealing.

More precisely, consider a stochastic diffusion dynamics whose invariant measure is given by

(1)

which is built from the energy functional and the normalized Lebesgue measure on the state space. We seek the set of global minima of the energy functional. The classical technique for solving this problem is to stochastically perturb the deterministic gradient descent

(2)

where the perturbation is a realization of the standard Brownian motion (see, for example, [4] and [5]). Then, following the simulated annealing regime, for the limiting measure to concentrate on the set of global minima of the energy functional, the diffusion coefficient should slowly decay to zero. We conventionally refer to the time dependence of this coefficient as the cooling schedule of our dynamics. We call this standard dynamics [4], [5] spatially homogeneous because the diffusion coefficient does not depend on the state variable.

In this paper, we propose a new diffusion process whose distinguishing feature is a spatially inhomogeneous diffusion coefficient. It is important that the stationary Gibbs distribution of the newly introduced dynamics be identical to that of the homogeneous diffusion. It is shown, however, that by appropriately constructing the inhomogeneous diffusion, one can improve the speed of convergence of the overall dynamics to the stationary distribution. We prove that the order of the speed of convergence cannot be improved, but the corresponding coefficient can in principle be chosen optimally.

The inhomogeneous diffusion coefficient that leads to the optimal speed of convergence depends on the functional at hand. Its exact form for a general functional remains an open problem. Of particular interest in many applications is a situation where the global minimum of the cost functional is so narrow that a standard diffusion tends to overlook it. We demonstrate that it is possible to adapt the diffusion to the cost functional and hence to alleviate this problem. The adapted diffusion is shown to offer superior performance in comparison to its classical counterpart.

These problems may arise when the cost functional consists of two terms. The first, a data fidelity term, smoothly penalizes deviations from the given data [7]. The second, a smoothness term, defines a relatively small subspace to which the solution is attracted (but does not have to belong). The convex combination of these terms often results in a functional with a narrow global minimum (see Section IV-A for an illustration). Alternatively, consider a system identification problem where the goal is to recover the coefficients of an unknown IIR filter based on the observed output. The coefficients are usually found by minimizing the mismatch between the synthetic response of test filters and the data. The resulting multidimensional cost functionals are multimodal, and the global minimum is relatively narrow compared to other local minima (see Section IV-C).

This paper is organized as follows. In Section II, we formulate results on stochastic dynamics and its approximations, which are non-homogeneous Markov chains. In Section III, we discuss how to choose a modified diffusion so as to adapt it to a particular form of the cost functional. In Section IV, we analyze and compare the convergence properties of the modified diffusions and of the Langevin dynamics using numerical simulations. The conclusions and a description of possible extensions are deferred to Section V. Finally, the Appendix contains the proof of the main result.

II. THEORETICAL RESULTS

We work on a state space with matching (periodic) boundary conditions and consider a space of sufficiently smooth functions on it. The energy functional is assumed to be a smooth function bounded from below. For specificity, assume that

(3)

Consider a continuous-time Markov process on this state space whose infinitesimal generator is given by

(4)

As follows from the formula for diffusions generated by second-order differential operators of the general form [8], this process is also the solution to the diffusion equation

(5)

The diffusion coefficient in (5) is assumed fixed, nonnegative, and smooth. Below we describe how, by choosing a specific form of this function, we can control the behavior of the dynamics to take advantage of known features of the energy functional. A positive scalar serves as the annealing parameter, called “temperature.” As in conventional annealing, it will slowly decay to zero. The meaning of the other parameter will become obvious shortly.

Proposition 1: If the process defined by (4) has a unique stationary distribution, then the latter has a Gibbs density, i.e.,

(6)

with a normalization constant, taken with respect to the normalized Lebesgue measure on the state space.

Proof: Follows from the equality

(7)

for any sufficiently smooth function.

The process corresponding to the infinitesimal generator (4) is simulated through an approximation in time of the diffusion process (5). This approximation is a Markov chain given by

(8)

where the time increment is the discretization step and the driving noise is an i.i.d. random sequence.

Proposition 2: For any such that , the Markov chain (8) has a stationary distribution, which will be denoted .

We introduce the following notation. We denote the neighborhood of a given radius around any point, and the union of such neighborhoods of all global minima of the energy functional, by

(9)

Theorem 1: For an arbitrary , assume the existence of , such that . Then as and .

Proof: See the Appendix.

III. SPEED OF CONVERGENCE AND MODIFIED DIFFUSION COEFFICIENT

It is seen from (37)–(39) that the rate of convergence of the modified diffusion remains similar, since the parameter in the cooling procedure decreases in the same way as for the classical dynamics. The speed of convergence, however, may be improved by minimizing the coefficients in (37) and (38). We note that in a particular case the coefficients in (37) and (38) have the same expression:

(10)

This, together with the form of the modified drift, suggests that the diffusion coefficient could be chosen inversely proportional to the magnitude of the gradient of the cost functional:

(11)

where a suitable choice of the regularizing parameter ensures the stability of the numerical algorithm in the neighborhood of local minima, where the gradient vanishes.

The diffusion coefficient so constructed suppresses random jumps when the gradient of the cost function is large, and it reinforces them when the gradient is small. As a result, the process naturally explores narrow steep cavities of the cost functional in more detail than the standard (homogeneous) diffusion would. In the next section, we show that this results in far superior performance of the optimization algorithm for the class of cost functionals under consideration.

IV. NUMERICAL SIMULATIONS

In this section we present simulation results that demonstrate the performance of the newly proposed modified diffusion versus the standard dynamics. In both simulation cases, we use the same cooling schedule:

(12)

where the last quantity is the total number of iterations. Note that the temperature sequence decays monotonically. One parameter of the schedule controls the rate of decay, and it should be sufficiently small to allow proper mixing in the sense of Proposition 2 at each temperature level, while being large enough to en-
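
As an illustration of how a time-discretized annealed diffusion with a state-dependent coefficient can be simulated, the following is a minimal Python sketch in one dimension on the unit circle. It is not a transcription of (5), (8), (11), or (12). It assumes the Itô form dX = (-a H' + T a') dt + sqrt(2 T a) dW, which is one standard way to keep the Gibbs density proportional to exp(-H/T) stationary for any smooth positive a. It further assumes a smoothed coefficient a(x) = 1 / sqrt(H'(x)^2 + delta^2) and a geometric cooling schedule. The names `H_example`, `make_coefficient`, `anneal`, and `delta` are hypothetical and chosen only for this example.

```python
import numpy as np


def derivative(f, x, h=1e-5):
    """Central finite-difference derivative of a scalar function f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def make_coefficient(H, delta=0.5, adapted=True):
    """Return a(x). Assumed adapted form: 1 / sqrt(H'(x)^2 + delta^2)."""
    if not adapted:
        return lambda x: 1.0  # homogeneous (classical) diffusion
    return lambda x: 1.0 / np.sqrt(derivative(H, x) ** 2 + delta ** 2)


def anneal(H, x0, temperatures, dt=1e-3, adapted=True, delta=0.5, seed=0):
    """Euler-Maruyama steps of dX = (-a H' + T a') dt + sqrt(2 T a) dW on [0, 1)."""
    rng = np.random.default_rng(seed)
    a = make_coefficient(H, delta, adapted)
    x = float(x0) % 1.0
    best_x, best_H = x, H(x)
    for T in temperatures:
        # The T * a' term is the correction that keeps exp(-H/T) stationary.
        drift = -a(x) * derivative(H, x) + T * derivative(a, x)
        noise = np.sqrt(2.0 * T * a(x) * dt) * rng.standard_normal()
        x = float(x + drift * dt + noise) % 1.0  # periodic boundary
        if H(x) < best_H:
            best_x, best_H = x, H(x)
    return x, best_x, best_H


def H_example(x):
    """Hypothetical periodic cost: local well at 0.25, deeper well at 0.75."""
    return np.cos(4.0 * np.pi * x) + 0.3 * np.sin(2.0 * np.pi * x)


# Assumed geometric cooling schedule, used here only for illustration.
temperatures = 1.0 * 0.999 ** np.arange(5000)
x_final, x_best, H_best = anneal(H_example, x0=0.25, temperatures=temperatures)
```

Passing `adapted=False` runs the same loop with a constant coefficient, so the two dynamics can be compared under an identical schedule and random seed. The sketch makes no claim about which variant performs better on a given cost.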
