
Simulated Annealing: A Basic Introduction

Prof. Olivier de Weck

Massachusetts Institute of Technology

Dr. Cyrus Jilla

Outline

• Background in Statistical Mechanics
  – Atom Configuration Problem
  – Metropolis
• Simulated Annealing Algorithm
• Sample Problems and Applications
• Summary

Learning Objectives

• Review background in Statistical Mechanics: configuration, ensemble, entropy, heat capacity
• Understand the basic assumptions and steps in Simulated Annealing
• Be able to transform design problems into a combinatorial optimization problem suitable to SA
• Understand strengths and weaknesses of SA

What is a Heuristic?

• A heuristic is simply a rule of thumb that hopefully will find a good answer.
• Why use a heuristic?
  – Heuristics are typically used to solve complex (large, nonlinear, nonconvex, i.e. containing many local minima) multivariate combinatorial optimization problems that are difficult to solve to optimality.
• Unlike gradient-based methods in a convex design space, heuristics are NOT guaranteed to find the true global optimum of a single-objective problem, but should find many good solutions (the mathematician's answer vs. the engineer's answer).
• Heuristics are good at dealing with local optima without getting stuck in them while searching for the global optimum.

Types of Heuristics

• Heuristics Often Incorporate Randomization

• 2 Special Cases of Heuristics
  – Construction Methods
    • Must first find a feasible solution and then improve it.
  – Improvement Methods
    • Start with a feasible solution and just try to improve it.

• 3 Most Common Heuristic Techniques
  – Genetic Algorithms
  – Simulated Annealing
  – Tabu Search
  – New Methods: Particle Swarm Optimization, etc.

Origin of Simulated Annealing (SA)

• Definition: A heuristic technique that mathematically mirrors the cooling of a set of atoms to a state of minimum energy.

• Origin: Applying the field of Statistical Mechanics to the field of Combinatorial Optimization (1983)

• Draws an analogy between the cooling of a material (search for minimum energy state) and the solving of an optimization problem.

• Original Paper Introducing the Concept
  – Kirkpatrick, S., Gelatt, C.D., and Vecchi, M.P., “Optimization by Simulated Annealing,” Science, Volume 220, Number 4598, 13 May 1983, pp. 671-680.

Statistical Mechanics: The Analogy

• Statistical Mechanics: The behavior of systems with many degrees of freedom in thermal equilibrium at a finite temperature.

• Combinatorial Optimization: Finding the minimum of a given function depending on many variables.

• Analogy: If a liquid material cools and anneals too quickly, the material solidifies into a sub-optimal configuration. If the liquid material cools slowly, the crystals within the material solidify optimally into a state of minimum energy (i.e. the ground state).
  – This ground state corresponds to the minimum of the cost function in an optimization problem.

Sample Atom Configuration

[Figure: Atom Configuration - Sample Problem. Left panel: original configuration, E = 133.67. Right panel: perturbed configuration, E = 109.04. Both panels show four numbered atoms on a grid with x and y axes running from -1 to 7.]

Perturbing = move a random atom to a new random (unoccupied) slot.

Configurations

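• Mathematically describe a configuration
  – Specify coordinates of each “atom”

$$\{r_i\} \text{ with } i = 1,\dots,4: \quad r_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix},\; r_2 = \begin{bmatrix} 3 \\ 2 \end{bmatrix},\; r_3 = \begin{bmatrix} 4 \\ 4 \end{bmatrix},\; r_4 = \begin{bmatrix} 5 \\ 3 \end{bmatrix}$$

  – Specify a slot for each atom

$$\{r_i\} \text{ with } i = 1,\dots,4: \quad R = [\,1 \;\; 12 \;\; 19 \;\; 23\,]$$

Energy of a State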

• Each state (configuration) has an energy level associated with it

Hamiltonian:

$$H(q, \dot{q}, t) = \sum_i \dot{q}_i p_i - L(q, \dot{q}, t)$$

$$H = T + V = E_{tot}$$

where T is the kinetic energy and V is the potential energy.

Energy of configuration = objective function value of design

Energy Sample Problem

• Define energy function for the “atom” sample problem

$$E_i = \underbrace{m g y_i}_{\text{potential energy}} + \underbrace{\sum_{j \neq i} \left( (x_i - x_j)^2 + (y_i - y_j)^2 \right)^{1/2}}_{\text{kinetic energy}}$$

$$E(R) = \sum_{i=1}^{N} E_i(r_i)$$

Absolute and relative position of each atom contributes to the energy.

Compute Energy of Config. A
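As a cross-check on the hand calculation for this sample problem, the energy function defined above can be sketched in a few lines of Python. The function name `configuration_energy` and the defaults m = 1, g = 10 are assumptions chosen to match the numbers on the slides, not part of the original course material.

```python
import math

def configuration_energy(coords, m=1.0, g=10.0):
    """Total energy E(R) = sum_i E_i for a list of (x, y) atom positions."""
    total = 0.0
    for i, (xi, yi) in enumerate(coords):
        e_i = m * g * yi  # term m*g*y_i, depends on absolute position
        for j, (xj, yj) in enumerate(coords):
            if j != i:
                # distance to every other atom, depends on relative position
                e_i += math.hypot(xi - xj, yi - yj)
        total += e_i
    return total

# Configuration A from the sample problem: r1..r4 as (x, y) pairs.
config_a = [(1, 1), (3, 2), (4, 4), (5, 3)]
print(round(configuration_energy(config_a), 2))  # 133.67
```

Each pairwise distance is counted twice in the total (once for atom i and once for atom j), exactly as in the per-atom sums E1 to E4.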

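• Energy of initial configuration (m = 1, g = 10)

$$E_1 = 1 \cdot 10 \cdot 1 + \sqrt{5} + \sqrt{18} + \sqrt{20} = 20.95$$

$$E_2 = 1 \cdot 10 \cdot 2 + \sqrt{5} + \sqrt{5} + \sqrt{5} = 26.71$$

$$E_3 = 1 \cdot 10 \cdot 4 + \sqrt{18} + \sqrt{5} + \sqrt{2} = 47.89$$

$$E_4 = 1 \cdot 10 \cdot 3 + \sqrt{20} + \sqrt{5} + \sqrt{2} = 38.12$$

Total Energy Configuration A: $E(\{r_A\}) = 133.67$

Boltzmann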

Number of configurations:

$$N_R = \frac{P!}{(P-N)!}$$

With P = 25 slots and N = 4 atoms, $N_R = 25 \cdot 24 \cdot 23 \cdot 22 = 303{,}600$.

What is the likelihood that a particular configuration will exist in a large ensemble of configurations?

Boltzmann probability (depends on energy and temperature):

$$P(\{r\}) = \exp\left[ -\frac{E(\{r\})}{k_B T} \right]$$

[Figure: Boltzmann Collapse at low T. Histograms of occurrences vs. energy (0 to 200) for the sample problem at T = 100 (E_avg = 100.1184), T = 10 (E_avg = 58.9245), and T = 1 (E_avg = 38.1017), running from T high to T low.]

The Boltzmann distribution collapses to the lowest energy state(s) in the limit of low temperature. This is the basis of search by Simulated Annealing.

Partition Function Z

• Ensemble of configurations can be described statistically
• Partition function, Z, generates the ensemble average

$$Z = \mathrm{Tr}\,\exp\left( -\frac{E_i}{k_B T} \right) = \sum_{i=1}^{N_R} \exp\left( -\frac{E_i}{k_B T} \right)$$

Initially (at T >> 0) Z is equal to the number of possible configurations.

Free Energy

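Average energy of all configurations in an ensemble:

$$E_{avg} = \langle E(T) \rangle = \frac{1}{N_R} \sum_{i=1}^{N_R} E_i(T)$$

Free energy:

$$F(T) = -k_B T \ln Z = \langle E(T) \rangle - T S$$

$$E_{avg} = \langle E(T) \rangle = \frac{\sum_{i=1}^{N_R} E_i \exp\left( -\frac{E_i}{k_B T} \right)}{Z} = -\frac{d \ln Z}{d(1/k_B T)}$$

The free energy relates the average energy at T with the entropy S.

Specific Heat and Entropy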

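• Specific Heat

$$C(T) = \frac{d\langle E(T) \rangle}{dT} = \frac{\frac{1}{N_R} \sum_{i=1}^{N_R} E_i(T)^2 - \langle E(T) \rangle^2}{k_B T^2} = \frac{\langle E(T)^2 \rangle - \langle E(T) \rangle^2}{k_B T^2}$$

• Entropy

$$S(T) = S(T_1) - \int_{T}^{T_1} \frac{C(T')}{T'}\, dT', \qquad \frac{dS(T)}{dT} = \frac{C(T)}{T}$$

Entropy is ~ equal to ln(# of unique configurations)

Low Temperature Statistics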
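To make the partition function, average energy, specific heat, free energy, and entropy formulas concrete, here is a minimal Python sketch that evaluates them for a small, fully enumerated set of energy levels. The energy levels, the function names, and the choice k_B = 1 are illustrative assumptions; the slides' own curves come from the atom placement problem, not from this toy set.

```python
import math

def ensemble_statistics(energies, temperature, k_b=1.0):
    """Return (ln Z, E_avg, C) for a fully enumerated list of energies."""
    beta = 1.0 / (k_b * temperature)
    e_min = min(energies)
    # Shift by e_min so the largest Boltzmann weight is exactly 1 (no underflow of Z).
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z_shifted = sum(weights)
    ln_z = -beta * e_min + math.log(z_shifted)
    e_avg = sum(e * w for e, w in zip(energies, weights)) / z_shifted
    e2_avg = sum(e * e * w for e, w in zip(energies, weights)) / z_shifted
    heat_capacity = (e2_avg - e_avg ** 2) / (k_b * temperature ** 2)
    return ln_z, e_avg, heat_capacity

def free_energy_and_entropy(energies, temperature, k_b=1.0):
    """F = -k_B T ln Z, and S recovered from F = <E> - T S."""
    ln_z, e_avg, _ = ensemble_statistics(energies, temperature, k_b)
    free_energy = -k_b * temperature * ln_z
    entropy = (e_avg - free_energy) / temperature
    return free_energy, entropy

levels = [0.0, 1.0, 1.0, 2.0, 5.0]  # hypothetical toy energy levels
for t in (100.0, 1.0, 0.01):
    ln_z, e_avg, c = ensemble_statistics(levels, t)
    f, s = free_energy_and_entropy(levels, t)
    print(f"T={t:g}  Z={math.exp(ln_z):.4g}  E_avg={e_avg:.4f}  "
          f"C={c:.4g}  F={f:.4f}  S={s:.4f}")
```

At high temperature Z tends to the number of levels and S to its logarithm; at low temperature E_avg and F both tend to the lowest level, with F approaching from below and E_avg from above.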

[Figure: Specific Heat C and Entropy S. Four panels plotted against temperature on a logarithmic axis (10^0 to 10^4): number of configurations, heat capacity, free energy F, and average energy E_avg.]

• Entropy S decreases with lower T
• Heat capacity decreases
• Free energy F approaches E* from below
• Average energy E_avg approaches E* from above

Minimum Energy Configurations

• Sample Atom Placement Problem
• Uniqueness?
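Because the sample problem is tiny, its minimum energy can be found by brute force, which also answers the uniqueness question. The sketch below assumes a 5 x 5 grid of slots with coordinates 1 to 5 in x and y (consistent with P = 25 and the slot numbering, but not stated explicitly on the slides) and reuses the sample energy function; all names are illustrative.

```python
import itertools
import math

# Assumed slot layout: 25 slots at integer coordinates 1..5 in x and y.
GRID = [(x, y) for x in range(1, 6) for y in range(1, 6)]

def configuration_energy(coords, g, m=1.0):
    """Sum over atoms of m*g*y_i plus the distances to all other atoms."""
    total = 0.0
    for i, (xi, yi) in enumerate(coords):
        total += m * g * yi
        for j, (xj, yj) in enumerate(coords):
            if j != i:
                total += math.hypot(xi - xj, yi - yj)
    return total

def minimum_energy_placements(g, n_atoms=4, tol=1e-9):
    """Enumerate every unordered placement; return (E*, list of minimizers)."""
    best_e, best = math.inf, []
    for placement in itertools.combinations(GRID, n_atoms):
        e = configuration_energy(placement, g)
        if e < best_e - tol:
            best_e, best = e, [placement]
        elif abs(e - best_e) <= tol:
            best.append(placement)
    return best_e, best

for g in (10.0, 0.1):
    e_star, minimizers = minimum_energy_placements(g)
    print(f"g={g}: E*={e_star:.2f}, {len(minimizers)} minimum-energy placement(s)")
```

The enumeration uses unordered placements (12,650 of them) because the energy does not depend on which atom carries which label. More than one placement can reach E*, so the minimum energy configuration is generally not unique.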

[Figure: Atom Configuration - Sample Problem, minimum energy configurations. Left panel: strong gravity, g = 10, E* = 60. Right panel: low gravity, g = 0.1, E* = 14.26. Both axes run from -1 to 7.]

Simulated Annealing Dilemma

• Cannot compute energy of all configurations!
  – Design space often too large
  – Computation time for a single function evaluation can be large
• Use Metropolis Algorithm, at successively lower temperatures, to find low energy states
  – Metropolis: Simulate behavior of a set of atoms in thermal equilibrium (1953)
  – Probability of a configuration existing at T is the Boltzmann probability P(r,T) = exp(-E(r)/T)

The SA Algorithm

• Terminology:
  – X (or R or G) = Design Vector (i.e. Design, Architecture, Configuration)
  – E = System Energy (i.e. Objective Function Value)
  – T = System Temperature
  – D = Difference in System Energy Between Two Design Vectors
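The ratio of the Boltzmann probabilities of two configurations is exp(-D/T), which is what the Metropolis criterion uses to decide whether to accept an uphill move. A minimal Python sketch follows; the function names are illustrative, and the numerical example uses the two energies from the sample atom problem (133.67 and 109.04).

```python
import math
import random

def acceptance_probability(delta, temperature):
    """Probability of accepting a move with energy change D at temperature T."""
    if delta < 0:
        return 1.0  # downhill moves are always accepted
    return math.exp(-delta / temperature)

def metropolis_accept(delta, temperature, rng=random):
    """Return True if the move is accepted under the Metropolis criterion."""
    return rng.random() < acceptance_probability(delta, temperature)

# Uphill move from the perturbed sample configuration (E = 109.04)
# back to the original one (E = 133.67), i.e. D = 24.63.
delta = 133.67 - 109.04
for t in (100.0, 10.0, 1.0):
    print(f"T={t:g}: accept with probability {acceptance_probability(delta, t):.3g}")
```

At T = 100 this uphill move is accepted most of the time, at T = 10 only occasionally, and at T = 1 essentially never, which is why the search behaves almost randomly early on and almost greedily late.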

• The Simulated Annealing Algorithm

1) Choose a random Xi, select the initial system temperature, and specify the cooling (i.e. annealing) schedule

2) Evaluate E(Xi) using a simulation model

3) Perturb Xi to obtain a neighboring Design Vector (Xi+1)

4) Evaluate E(Xi+1) using a simulation model

5) If E(Xi+1)< E(Xi), Xi+1 is the new current solution

6) If E(Xi+1) ≥ E(Xi), then accept Xi+1 as the new current solution with probability e^(-D/T), where D = E(Xi+1) - E(Xi).

7) Reduce the system temperature according to the cooling schedule.

8) Repeat steps 3) to 7) until the termination criterion is met (e.g. T < Tmin), then terminate the algorithm.

SA BLOCK DIAGRAM
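The numbered steps of the algorithm, arranged as an inner loop at each temperature inside an outer cooling loop, can be sketched in Python as below. This is an illustrative sketch, not the course's SA.m: the function signature, the fixed number of moves per temperature (standing in for an equilibrium test), the exponential cooling factor, and the toy problem are all assumptions.

```python
import math
import random

def simulated_annealing(x0, energy, perturb, t0=10.0, t_min=1e-3,
                        cooling=0.8, steps_per_temp=50, seed=0):
    """Minimise energy(x); returns (best_x, best_e, history of current energies)."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)                 # steps 1-2: initial design and its energy
    best_x, best_e = x, e
    history = [e]
    t = t0
    while t > t_min:                      # step 8: stop once T <= t_min
        for _ in range(steps_per_temp):   # fixed-length stand-in for "equilibrium"
            x_new = perturb(x, rng)       # step 3: neighbouring design vector
            e_new = energy(x_new)         # step 4: evaluate it
            d = e_new - e
            # steps 5-6: accept downhill always, uphill with probability exp(-D/T)
            if d < 0 or rng.random() < math.exp(-d / t):
                x, e = x_new, e_new
                if e < best_e:
                    best_x, best_e = x, e
            history.append(e)
        t *= cooling                      # step 7: reduce the temperature
    return best_x, best_e, history

# Hypothetical toy problem: integer x in [-20, 20] with several local minima.
def toy_energy(x):
    return 0.1 * x * x + 3.0 * math.cos(1.5 * x)

def toy_perturb(x, rng):
    return max(-20, min(20, x + rng.choice((-2, -1, 1, 2))))

best_x, best_e, history = simulated_annealing(15, toy_energy, toy_perturb)
print(best_x, round(best_e, 3), len(history))
```

Keeping track of the best design separately from the current one matters because the current design is allowed to get worse along the way.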

[Block diagram of the SA algorithm, summarized as steps]

Start (To)

1. Define initial configuration Ro and evaluate energy E(Ro).
2. Perturb configuration Ri -> Ri+1 and evaluate energy E(Ri+1).
3. Compute energy difference ΔE = E(Ri+1) - E(Ri).
4. Metropolis step: if ΔE < 0, accept Ri+1 as the new configuration. Otherwise create a random number in [0,1]; if exp(-ΔE/Tj) exceeds it, accept Ri+1 as the new configuration, else keep Ri as the current configuration.
5. Reached equilibrium at Tj? If no, go back to step 2. If yes, reduce temperature: Tj+1 = Tj - ΔT.
6. T < Tmin? If no, go back to step 2. If yes, End (Tmin).

Matlab Function: SA.m

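[Diagram: SA.m, the Simulated Annealing Algorithm function, with its inputs and outputs]

Inputs:
– xo: initial configuration
– file_eval.m: evaluation function
– file_perturb.m: perturbation function
– options: option flags

Outputs:
– xbest, Ebest: best configuration (and its energy)
– xhist: history

Exponential Cooling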
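The cooling schedules mentioned in this part of the lecture (exponential, linear, step-wise) can be compared with a short Python sketch. Here `ratio` plays the role of T1/To; the function names, and the particular reading of "step-wise" as holding each temperature for a fixed number of steps, are assumptions made for illustration.

```python
def exponential_schedule(t0, ratio, n_steps):
    """T_k = t0 * ratio**k for k = 0 .. n_steps-1, with ratio = T1/To."""
    return [t0 * ratio ** k for k in range(n_steps)]

def linear_schedule(t0, delta_t, n_steps):
    """T_k = t0 - k * delta_t, clipped at zero."""
    return [max(t0 - k * delta_t, 0.0) for k in range(n_steps)]

def stepwise_schedule(t0, ratio, n_steps, hold=5):
    """Hold each temperature for `hold` steps before multiplying by ratio."""
    return [t0 * ratio ** (k // hold) for k in range(n_steps)]

print([round(t, 2) for t in exponential_schedule(100.0, 0.8, 5)])
# [100.0, 80.0, 64.0, 51.2, 40.96]
```

With a ratio in the typical 0.7 to 0.9 range, the exponential schedule spends few steps at high temperature and many at low temperature, where the search needs them to settle.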

• Exponential cooling schedule:

$$T_{k+1} = \left( \frac{T_1}{T_o} \right) T_k, \quad \text{i.e.} \quad T_k = \left( \frac{T_1}{T_o} \right)^{k} T_o$$

• Typically (T1/To) ~ 0.7-0.9
• Can also do linear or step-wise cooling, but exponential cooling often works best.

[Figure: Simulated Annealing Evolution. Specific heat C, entropy S, and temperature T/10 plotted against temperature step (0 to 45).]

Key Ingredients for SA

• A concise description of a configuration (architecture, design, topology) of the system (Design Vector).

• A random generator of rearrangements of the elements in a configuration (Neighborhoods). This generator encapsulates rules so as to generate only valid configurations.

• A quantitative objective function containing the trade-offs that have to be made (Simulation Model and Output Metric(s)). Surrogate for system energy.

• An annealing schedule of the temperatures and/or the lengths of time for which the system is to be evolved.

Sample Problems

Matlab “peaks” function
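For readers without MATLAB, the peaks surface has a simple closed form, and the maximization can be posed as energy minimization with E = -z. The sketch below starts from [-2, -2] as in this example; the perturbation rule, the annealing parameters, and the function names are hypothetical choices, not those of SAdemo1.

```python
import math
import random

def peaks(x, y):
    """Closed form of MATLAB's peaks surface."""
    return (3 * (1 - x) ** 2 * math.exp(-x ** 2 - (y + 1) ** 2)
            - 10 * (x / 5 - x ** 3 - y ** 5) * math.exp(-x ** 2 - y ** 2)
            - math.exp(-(x + 1) ** 2 - y ** 2) / 3)

def energy(p):
    return -peaks(*p)  # maximising z is the same as minimising E = -z

def perturb(p, rng, step=0.5):
    # Hypothetical neighbourhood: uniform move, clipped to [-3, 3] x [-3, 3].
    return tuple(max(-3.0, min(3.0, c + rng.uniform(-step, step))) for c in p)

def anneal(x0, t0=5.0, ratio=0.8, n_temps=40, n_moves=50, seed=1):
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t0
    for _ in range(n_temps):
        for _ in range(n_moves):
            x_new = perturb(x, rng)
            e_new = energy(x_new)
            d = e_new - e
            # Metropolis step: downhill always, uphill with probability exp(-D/T)
            if d < 0 or rng.random() < math.exp(-d / t):
                x, e = x_new, e_new
                if e < best_e:
                    best_x, best_e = x, e
        t *= ratio  # exponential cooling
    return best_x, best_e

best_x, best_e = anneal((-2.0, -2.0))
print(best_x, -best_e)
```

Evaluating `peaks(0.012, 1.524)` reproduces the z* value quoted for this example to within rounding.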

• Difficult due to plateau at z=0, local maxima

• SAdemo1
• xo = [-2,-2]
• Optimum at Maximum
• x* = [0.012, 1.524]
• z* = 8.0484

[Figure: Peaks contour plot over x and y from -3 to 3, with the start point marked.]

“peaks” convergence

• Initially ~ nearly random search
• Later ~ gradient search

[Figure: SA convergence history. System energy (-10 to 2) vs. iteration number (0 to 300), showing the current configuration and each new best configuration.]

Traveling Salesman Problem

• N cities arranged randomly on [-1,1]
• Choose N=15
• SAdemo1
• Minimize “cost” of route (length, time,…)
• Visit each city once, return to start city

$$l(R) = \underbrace{\sum_{i=1}^{N-1} \sqrt{\sum_{j=1}^{2} \left( x_j(R_{i+1}) - x_j(R_i) \right)^2}}_{\text{visit N cities}} + \underbrace{\sqrt{\sum_{j=1}^{2} \left( x_j(R_1) - x_j(R_N) \right)^2}}_{\text{return home to first city}}$$

TSP Problem (II)
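The two ingredients SA needs for the TSP, an energy (route length) and a perturbation that always yields a valid route, can be sketched in Python as follows. The Euclidean closed-tour length follows the description of the problem above; swapping two cities is one common choice of perturbation and is an assumption here, as are the function names.

```python
import math
import random

def route_length(cities, route):
    """Closed-tour length: visit cities in `route` order, then return to the start."""
    n = len(route)
    return sum(math.dist(cities[route[i]], cities[route[(i + 1) % n]])
               for i in range(n))

def swap_two_cities(route, rng):
    """Hypothetical perturbation: exchange two randomly chosen positions."""
    i, j = rng.sample(range(len(route)), 2)
    new_route = list(route)
    new_route[i], new_route[j] = new_route[j], new_route[i]
    return new_route

rng = random.Random(0)
cities = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(15)]
route = list(range(15))
print(route_length(cities, route))
print(route_length(cities, swap_two_cities(route, rng)))
```

Because a swap only permutes the route, every perturbed route still visits each city exactly once, so no feasibility check is needed.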

[Figure: Initial (random) route and final (optimized) route through the 15 numbered cities. Both axes run from -1 to 1; the start city is marked.]

• Initial (Random) Route, Length: 17.43 (length of TSP route: 17.4309)
• Final (Optimized) Route, Length: 8.24 (length of TSP route: 8.2384)
• Shortest route: result with SA

Structural Optimization

• Define:
  – Design Domain
  – Boundary Conditions
  – Loads
  – Mass constraint
• Subdivide domain
  – N x M design “cells”
  – Cell density ρ=1 or ρ=0
• Where to put material to minimize compliance?

$$\text{find } \rho_i, \quad i = 1, \dots, N$$

$$\min\; C = f^T u(\rho_i)$$

$$\text{s.t. } u = K^{-1} f$$

$$\text{s.t. } \sum_{i=1}^{N} V_i \rho_i \le m_{max}$$

Structural Topology Optimization (II)

“Energy” = strain energy = compliance

Computed via Finite Element Analysis

[Figure: Finite element mesh of the design domain with numbered nodes and elements. Cantilever boundary condition; applied load F = -2000.]

Deformation not drawn to scale

Structural Topology Optimization

[Figure: Initial structural topology (initial structural configuration xo) and “final” topology, the latter containing an “island”. Axes run from 0 to 10 (x) and -1 to 9 (y). Compliance labels on the two panels: C = 236.68 and C = 593.0.]

Not “optimal”: often need post-processing

Structural Optimization – Convergence Analysis

[Figure, left: Simulated Annealing Evolution. Evolution of entropy S, temperature ln(T), and specific heat C against temperature step (0 to 40).]

[Figure, right: Boltzmann Distributions. Occurrences vs. energy (200 to 2200), one curve per temperature, labelled from T = 3e+002 down to T = 6.]

Don’t terminate when C is still large!

Premature termination
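Both warning signs discussed here, a specific heat that is still large and a best configuration found only shortly before the end of the run, can be checked from a recorded history. The Python sketch below is illustrative only: the function names, the 90% threshold, and the sample energies are hypothetical, and k_B is taken as 1.

```python
from statistics import pvariance

def specific_heat(energies_at_t, temperature, k_b=1.0):
    """C(T) = (<E^2> - <E>^2) / (k_B T^2) from energies sampled at one temperature."""
    return pvariance(energies_at_t) / (k_b * temperature ** 2)

def best_found_late(best_iteration, total_iterations, late_fraction=0.9):
    """True if the best configuration appeared in the last part of the run."""
    return best_iteration >= late_fraction * total_iterations

samples = [400.0, 420.0, 380.0, 410.0, 390.0]  # hypothetical energies at T = 10
print(specific_heat(samples, 10.0))             # variance 200 / 100 = 2.0
print(best_found_late(2950, 3000))              # True
```

If either check fires, a longer schedule or a lower final temperature is worth trying before trusting the result.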

[Figure: SA convergence history. System energy (200 to 2200) vs. iteration number (0 to 3000), showing the current configuration and each new best configuration.]

Indicator: Best configuration found only shortly before Simulated Annealing terminated.

Final Example: Telescope Array Optimization

• Place N=27 stations in xy within a 200 km radius
• Minimize UV density metric
• Ideally also minimize cable length

[Figure: Initial configuration, two panels (axes -200 to 200 and -100 to 100). Cable Length = 1064.924, UV Density = 0.67236.]

[Figure: Optimized solution, two panels (axes -200 to 200 and -100 to 100). Cable Length = 1349.609, UV Density = 0.33333.]

Simulated Annealing improved UV density from 0.67 to 0.33.

• Simulated Annealing transforms the array:
  – Hub-and-Spoke → Circle-with-arms

Reference available

Summary

Summary: Steps of SA

• The Simulated Annealing Algorithm

1) Choose a random Xi, select the initial system temperature, and outline the cooling (i.e. annealing) schedule

2) Evaluate E(Xi) using a simulation model

3) Perturb Xi to obtain a neighboring Design Vector (Xi+1)

4) Evaluate E(Xi+1) using a simulation model

5) If E(Xi+1)< E(Xi), Xi+1 is the new current solution

6) If E(Xi+1) ≥ E(Xi), then accept Xi+1 as the new current solution with probability e^(-D/T), where D = E(Xi+1) - E(Xi).

7) Reduce the system temperature according to the cooling schedule.

8) Repeat steps 3) to 7) until the termination criterion is met (e.g. T < Tmin), then terminate the algorithm.

Research in SA

• Alternative Cooling Schedules and Termination Criteria
• Adaptive Simulated Annealing (ASA): determines its own cooling schedule
• Hybridization with other Heuristic Search Methods (GA, Tabu Search …)
• Multiobjective Optimization with SA

References

• Cerny, V., “Thermodynamical Approach to the Traveling Salesman Problem: An Efficient Simulation Algorithm,” J. Opt. Theory Appl., 45, 1, 41-51, 1985.

• de Weck, O., “System Optimization with Simulated Annealing (SA),” Memorandum

• Cohanim B., Hewitt J, and de Weck O. L. “The Design of Radio Telescope Array Configurations using Multiobjective Optimization: Imaging Performance versus Cable Length”, The Astrophysical Journal, (in press), 2004

• Jilla, C.D., and Miller, D.W., “Assessing the Performance of a Heuristic Simulated Annealing Algorithm for the Design of Distributed Satellite Systems,” Acta Astronautica, Vol. 48, No. 5-12, 2001, pp. 529-543.

• Kirkpatrick, S., Gelatt, C.D., and Vecchi, M.P., “Optimization by Simulated Annealing,” Science, Volume 220, Number 4598, 13 May 1983, pp. 671-680.

• Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., and Teller, E., “Equation of State Calculations by Fast Computing Machines,” J. Chem. Phys., 21, 6, 1087-1092, 1953.