
DEPARTMENT OF INFORMATION AND COMMUNICATION TECHNOLOGY

38100 Povo — Trento (Italy), Via Sommarive 14 http://dit.unitn.it/

REACTIVE SEARCH AND INTELLIGENT OPTIMIZATION

Roberto Battiti, Mauro Brunato, and Franco Mascia

Technical Report # DIT-07-049


Dipartimento di Informatica e Telecomunicazioni, Università di Trento, Italy

Version 1.02, July 6, 2007

Technical Report DIT-07-049, Università di Trento, July 2007

Available at: http://reactive-search.org/

Email for correspondence: [email protected]

Contents

Preface

1 Introduction
  1.1 Parameter tuning and intelligent optimization
  1.2 Book outline
  Bibliography

2 Reacting on the neighborhood
  2.1 Local search based on perturbation
  2.2 Learning how to evaluate the neighborhood
  2.3 Learning the appropriate neighborhood in variable neighborhood search
  2.4 Iterated local search
  Bibliography

3 Reacting on the annealing schedule
  3.1 Stochasticity in local moves and controlled worsening of solution values
  3.2 Simulated Annealing and Asymptotics
    3.2.1 Asymptotic convergence results
  3.3 Online learning strategies in simulated annealing
    3.3.1 Combinatorial optimization problems
    3.3.2 Global optimization of continuous functions
  Bibliography

4 Reactive prohibitions
  4.1 Prohibitions for diversification (Tabu Search)
    4.1.1 Forms of Tabu Search
    4.1.2 Dynamical systems
    4.1.3 An example of Fixed Tabu Search
    4.1.4 Relation between prohibition and diversification
    4.1.5 How to escape from an attractor
  4.2 Reactive Tabu Search (RTS)
    4.2.1 Self-adjusted prohibition period
    4.2.2 The escape mechanism
  4.3 Implementation: storing and using the search history
    4.3.1 Fast algorithms for using the search history
    4.3.2 Persistent dynamic sets
  Bibliography

5 Model-based search
  5.1 Models of a problem
  5.2 An example
  5.3 Dependent probabilities
  5.4 The cross-entropy model
  Bibliography

6 Reacting on the objective function
  6.1 Eliminating plateaus by looking inside the problem structure
    6.1.1 Non-oblivious local search for SAT
  Bibliography

7 Algorithm portfolios and restart strategies
  7.1 Introduction: portfolios and restarts
  7.2 Predicting the performance of a portfolio from its component algorithms
    7.2.1 Parallel processing
  7.3 Reactive portfolios
  7.4 Defining an optimal restart time
  7.5 Reactive restarts
  7.6 Summary
  Bibliography

8 Racing
  8.1 Introduction
  8.2 Racing to maximize cumulative reward by interval estimation
  8.3 Aiming at the maximum with threshold ascent
  8.4 Racing for off-line configuration of meta-heuristics
  Bibliography

9 Metrics, landscapes and features
  9.1 Selecting features with mutual information
  9.2 Measuring local search components
  9.3 Selecting components based on diversification and bias
    9.3.1 The diversification-bias compromise (D-B plots)
    9.3.2 A conjecture: better algorithms are Pareto-optimal in D-B plots
  9.4 How to measure problem difficulty
  Bibliography

Preface

Considerate la vostra semenza: fatti non foste a viver come bruti, ma per seguir virtute e canoscenza.

Li miei compagni fec’io sì aguti, con questa orazion picciola, al cammino, che a pena poscia li avrei ritenuti;

e volta nostra poppa nel mattino, de’ remi facemmo ali al folle volo, sempre acquistando dal lato mancino.

Consider your origins: you’re not made to live as beasts, but to follow virtue and knowledge.

My companions I made so eager, with this little oration, of the voyage, that I could have hardly then contained them;

that morning we turned our vessel, our oars we made into wings for the foolish flight, always gaining ground toward the left.

(Dante, Inferno Canto XXVI, translated by Anthony LaPorta)

We would like to thank our colleagues, friends and students for reading preliminary versions, commenting, and discovering mistakes. Of course we keep responsibility for the remaining ones. In particular we wish to thank Elisa Cilia and Paolo Campigotto for their fresh initial look at the book topics. Comments on individual chapters have been provided by various colleagues and friends, including Matteo Gagliolo, Holger Hoos and Vittorio Maniezzo. This book is version 1.02, which means that we expect future releases in the coming months, as soon as we carve out reading and writing time from our daily chores. Writing a more detailed preface, which acknowledges all comments, including those by colleagues reading this version, is also on the stack. The Reactive Search website at http://reactive-search.org/ is a good place to look for updated information. Finally, if you are working in areas related to Reactive Search and Intelligent Optimization and you do not find your work referenced here, we will be very happy to hear from you and to cite it in future releases.

Roberto, Mauro and Franco

Technical Report DIT-07-049, Università di Trento, July 2007. Copyright (C) 2007 Roberto Battiti, Mauro Brunato and Franco Mascia, all rights reserved.

Chapter 1

Introduction: Machine Learning for Intelligent Optimization

Errando discitur. Chi fa falla; e fallando s’impara. (“One learns by erring. He who acts makes mistakes; and by making mistakes one learns.”)

Mistakes are the portals of discovery. (James Joyce)

You win only if you aren’t afraid to lose. (Rocky Aoki)

This book is about learning for problem solving. Let’s start with some motivation, if you are not already an expert in the area, just to make sure that we talk about issues which are not far from everybody’s human experience. Learning takes place when the problem at hand is not well known at the beginning, and its structure becomes more and more clear as more experience with the problem is available. For concreteness, let’s consider skiing. What distinguishes an expert skier from a novice is that the novice knows some instructions but needs a lot of experience to fine-tune the techniques (with some falling down into local minima, and restarts, so to speak), while the real expert jumps seamlessly from sensors to action, without effort and without “symbolic” thinking. The knowledge accumulated from previous experience has been compiled into parameters of a neural system working at very high speed. Think about driving a car and trying to explain in detail how you move your feet when driving: this kind of compilation of problem-solving strategies into fine-tuned parameters of a dynamical system (our nervous system) is quite natural for us, while more primitive creatures are more rigid in their behavior. Think about a fly fooled by an incandescent light bulb, because no light bulb was present during its genetic evolution, apart from a distant one called “sun”. You know the rest of the story: the fly will get burnt again and again and again.

In addition to learning, search of solutions by trial-and-error, generation and test, and repeated modification of solutions by small local changes are also part of the human problem-solving strategies. It is not surprising that many methods for solving problems in Artificial Intelligence, Operations Research and related areas follow this general scheme: for example, searching for an optimal configuration on a tree of possibilities by adding one solution component at a time, and backtracking if a dead-end is encountered, or searching by generating a trajectory of candidate solutions on a landscape defined by the corresponding solution values.

What is critical for men is critical also in many human-developed problem-solving strategies. For most of the relevant and difficult problems (see computational complexity under the entry “NP-hardness”), researchers now believe that the optimal solution cannot be found exactly in acceptable computing times, i.e., times which grow as a low-order polynomial of the input size. Hardness of approximation, in addition to NP-hardness, is a well-known negative result established in the last decades of theoretical computer science.
Heuristics used to suffer from a bad reputation. Citing from the Papadimitriou and Steiglitz book [4]:

6. Heuristics: Any of the approaches above without a formal guarantee of performance can be considered a “heuristic.” However unsatisfactory mathematically, such approaches are certainly valid in practical situations.

Unfortunately, because of the “hardness of approximation” results [6], the hope of finding approximation algorithms with formal performance guarantees must be abandoned for many relevant problems. We are condemned to live with heuristics for very long times, maybe forever, and some effort is required to make them more satisfactory, both from a theoretical and from a practical point of view. The term meta-heuristics is used in some cases to denote generic algorithmic schemes which can be specialized for different problems. We do not like the term too much, because the boundary between the heuristic and the meta-heuristic is not always clear-cut.

A particularly delicate issue in many heuristics is their detailed tuning. Some schemes are not rigid but allow for the specification of choices in the detailed implementation, ranging from fully documented schemes to values of internal parameters. Think about the detailed tuning that separates our novice skier from the world champion. In some cases the detailed tuning is executed by a researcher or by a final user. As the parameter tuning is user-dependent, the reproducibility of the heuristics’ results is difficult, as is comparing different parametric algorithms: Algorithm A can be better than algorithm B if tuned by Roberto, while it can be worse if tuned by Mauro.

In this book we consider some machine learning methods which can be profitably used to automate the tuning process and make it an integral part of the algorithm. In particular, the focus is on sub-symbolic learning schemes, where the accumulated knowledge is compiled into the parameters of the method, or the parameters regulating a dynamical system to build or search for a solution. In many cases sub-symbolic learning works without giving a high-level symbolic explanation: think about neural networks, or about the champion skier who cannot explain how he allocates forces to the different muscles during a slalom.

If learning acts on-line, i.e., while the algorithm is solving an instance of a problem, task-dependent local properties can be used by the algorithm to determine the appropriate balance between diversification and intensification. Deciding whether it is better to look for gold where the other miners are excavating (intensification/exploitation) or to go and explore other valleys and uncharted territories (diversification/exploration) is an issue which excruciated the forty-niners and which we will meet over and over again in the following chapters. Citing for example from [5]: “diversification drives the search to examine new regions, and intensification focuses the search more intently on regions previously found to be good. (Intensification typically operates by re-starting from high quality solutions, or by modifying choice rules to favor the inclusion of attributes of these solutions).”

1.1 Parameter tuning and intelligent optimization

As we mentioned, many state-of-the-art heuristics are characterized by a certain number of choices and free parameters, whose appropriate setting is a subject that raises issues of research methodology [1, 2, 3]. In some cases the parameters are tuned through a feedback loop that includes the user as a crucial learning component: different options are developed and tested until acceptable results are obtained. The quality of results is not automatically transferred to different instances, and the feedback loop can require a slow “trial and error” process when the algorithm has to be tuned for a specific application.
Parameter tuning is a crucial issue both in the scientific development and in the practical use of heuristics. The Machine Learning community, with significant influx from Statistics, developed in the last decades a rich variety of “design principles” that can be used to develop machine learning methods and that can be profitably used in the area of parameter tuning for heuristics.

Reactive Search advocates the integration of sub-symbolic machine learning techniques into local search heuristics for solving complex optimization problems. The word reactive hints at a ready response to events during the search through an internal online feedback loop for the self-tuning of critical parameters. In Reactive Search the past history of the search is used for:

• feedback-based parameter tuning: the algorithm maintains the internal flexibility needed to cover many problems, but the tuning is automated and executed while the algorithm runs and “monitors” its past behavior;

• automated balance of diversification and intensification: an automated heuristic balance for the “exploration versus exploitation” dilemma can be obtained through feedback mechanisms, for example by starting with intensification, and by progressively increasing the amount of diversification only when there is evidence that diversification is needed.

Reactive Search eliminates the human intervention in the tuning process while the algorithm is running: the time-consuming tuning phase is substituted by an automated process. This does not imply higher unemployment rates for researchers. On the contrary, one is now loaded with a heavier task: the algorithm developer must transfer his intelligent expertise into the algorithm itself, a task that requires the automation of the tuning process and its exhaustive description. The algorithm complication will increase, but the price is worth paying if the two following objectives are reached:

• Complete and unambiguous documentation. The algorithm becomes self-contained, and its quality can be judged independently from the designer or specific user. This requirement is particularly important from the scientific point of view, where objective evaluations are crucial. The recent introduction of software archives further simplifies the test and simple re-use of heuristic algorithms.

• Automation. The time-consuming tuning phase is now substituted by an automated process. Let us note that the tuning process pertains only to the development phase: the final user will typically benefit from an automated tuning process, while the algorithm designer faces the possibly exhaustive phase of exploratory tests, followed by the detailed documentation presented to the scientific community.

1.2 Book outline

The book does not aim at a complete coverage of the expanding research area of heuristics, meta-heuristics, stochastic local search, etc. The task would be daunting and bordering on the myth of Sisyphus, condemned by the gods to ceaselessly roll a rock to the top of a mountain, whence the stone would fall back of its own weight. The rolling stones are in this case caused by the rapid development of new heuristics for many problems, which would render a wide coverage obsolete after a short span. We aim at giving the main principles and at developing some fresh intuition for the approaches. We like mathematics, but we also think that hiding the underlying motivations and sources of inspiration takes some color out of the scientific work (“Grau, teurer Freund, ist alle Theorie. Und grün des Lebens goldner Baum” — “Gray, dear friend, is all theory, and green the golden tree of life” — Johann Wolfgang von Goethe). On the other hand, hand-waving can be a very dangerous pitfall from the scientific point of view: we try to avoid these pitfalls by also giving the basic equations when possible, or by at least directing the reader to the bibliographic references for deepening a topic. The point of view of the book is to look at the zoo of different optimization beasts to underline the common principles; no technique is considered in isolation.
The focus is definitely more on methods than on problems. We introduce some problems to make the discussion more concrete, and when a specific problem has been widely studied by reactive search or intelligent optimization heuristics. Intelligent optimization, the application of machine learning strategies in heuristics, is in itself a very wide area, and the space dedicated to reactive search (online learning techniques applied to local search) is wider because of the personal interest of the authors. We plan to produce a more balanced version in the future. The structure of the following chapters is as follows: i) the basic issues and algorithms are introduced, ii) the parameters critical for the success of the different methods are identified, and iii) opportunities and schemes for the automated tuning of these parameters are presented and discussed.

Bibliography

[1] R. S. Barr, B. L. Golden, J. P. Kelly, M. G. C. Resende, and W. Stewart, Designing and reporting on computational experiments with heuristic methods, Journal of Heuristics 1 (1995), no. 1, 9–32.

[2] J. N. Hooker, Testing heuristics: We have it all wrong, Journal of Heuristics 1 (1995), no. 1, 33–42.

[3] Catherine C. McGeoch, Toward an experimental method for algorithm simulation, INFORMS Journal on Computing 8 (1996), no. 1, 1–28.

[4] C. H. Papadimitriou and K. Steiglitz, Combinatorial optimization: algorithms and complexity, Prentice-Hall, NJ, 1982.

[5] Y. Rochat and E. Taillard, Probabilistic diversification and intensification in local search for vehicle routing, Journal of Heuristics 1 (1995), no. 1, 147–167.

[6] M. Sudan, Efficient checking of polynomials and proofs and the hardness of approximation problems, Ph.D. thesis, Computer Science Division, University of California at Berkeley, 1992.
Chapter 2

Reacting on the neighborhood

2.1 Local search based on perturbation

A basic problem-solving strategy consists of starting from an initial tentative solution and trying to improve it through repeated small changes. At each repetition the current configuration is slightly modified (perturbed), the function to be optimized is tested, and the change is kept if the new solution is better; otherwise another change is tried. (How many shoe-shops does one person visit before making a choice? There is not a single answer: please specify whether male or female!) The function f to be optimized is called with more poetic names in some papers: fitness function, goodness function.

Fig. 2.1 shows an example in the history of bike design (do not expect historical fidelity here: this book is about algorithms!). Model A is a starting solution with a single wheel; it works, but it is not optimal yet. Model B is a randomized attempt to add some pieces to the original design; the situation is worse. One could revert back to model A and start other changes. But if one insists and proceeds with a second addition, one may end up with Model C, clearly superior from a usability and safety point of view! This real-life story has a lesson: local search by small perturbations is a tasty ingredient, but additional spices are in certain cases needed to obtain superior results. Let’s note in passing that everybody’s life is an example of an optimization algorithm in action: dramatic changes do happen, but not so frequently. The careful reader may notice that the goodness function of our life is not so clearly defined. To this we answer that this book is not about philosophy; let’s stop here with far-fetched analogies and go down to the nitty-gritty of the algorithms.

Local search based on perturbing a candidate solution, which we assume already known to the reader, is a first paradigmatic case where simple learning strategies can be applied. Let’s first define the notation. X is the search space, X(0) is an initial tentative solution, and X(t) is the current solution at iteration (“time”) t. N(X(t)) is the neighborhood of the point X(t), obtained by applying a set of basic moves µ1, µ2, ..., µM to the current configuration:

    N(X(t)) = { X′ such that X′ = µi(X(t)), i = 1, ..., M }.

If the search space is given by binary strings with a given length L, i.e., X = {0,1}^L, the moves can be those changing (or complementing, or flipping) the individual bits, and therefore M is equal to the string length L.

Local search starts from an admissible configuration X(0) and builds a search trajectory X(0), X(1), ..., where the successor X(t+1) of the current point is a point in the neighborhood with a lower value of the function f to be minimized. If no neighbor has this property, i.e., if the configuration is a local minimizer, the search stops.
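The scheme above can be sketched as a short program for the binary-string case. The objective function below is an illustrative assumption (it is not from the text): it counts the disagreements with a hidden target string, so the unique local, and global, minimum is the target itself.

```python
def local_search(f, x):
    """First-improvement local search over binary strings (lists of 0/1).

    The neighborhood N(X) contains the strings obtained by applying the
    basic moves mu_1 ... mu_M, here single-bit flips, so M equals the
    string length L. The search stops at a local minimizer of f.
    """
    fx = f(x)
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):   # enumerate the M = L basic moves
            x[i] ^= 1             # apply move mu_i: complement bit i
            fy = f(x)
            if fy < fx:           # keep the change only if it improves f
                fx, improved = fy, True
            else:
                x[i] ^= 1         # otherwise undo the move
    return x, fx

# Toy objective: number of disagreements with a hidden target string.
target = [1, 0, 1, 1, 0, 0, 1, 0]
f = lambda x: sum(a != b for a, b in zip(x, target))
x_min, f_min = local_search(f, [0] * len(target))
```

Because this toy f is separable, greedy descent reaches the global minimum; on harder landscapes the same code stops at the first local minimizer it meets.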

f

is therefore more effective than sampling from Probability . Large-dimensional problems tend to have a very peaked dist ∗ X X is based on building a sequence of locally optimal solutions maps from the search space values. EARCH ∗ S X As it happens with many simple — but sometimes very effective One may also argue that iterated local search, at least its mo Our description follows mainly [15]. L CAL objective function values Local Search the current local minimum; (ii) applying local search after ciple has been rediscoveredlocation multiple problem times. is For example, perturbedis in reached, by the placing hazard artificialhazard is “hazard zone eliminated are and penalizing the routes usual inside local them: sear the takes as input an initial configuration In other words, theso perturbation that is the obtained oldof by local a temporarily minimum century will agolocal now one search be [26]. already suboptimal with finds the seminal m ideas related toshares ite many designactive principles in with theby variable iterated a Lin-Kernighan neighborhood 4-change algorithm s a (a of “big starting [12], kick” solutionsearch eliminating whe for four literature the edges next basedof and run on [19, reconn of Simulated 17, the Annealing,principles. 18, Lin-Kernighan 25] the heu contains work very ab interesting results couple unless one is sotions, lucky sampling to from start already at a local minimum. If o basic feature of local search, see Fig. 2.7. a move can beis much to faster avoid thanminimizer confinement into computing in the them a starting fromstrong point given to scratch). for attraction avoid the A reverting basin, next to run so memory-less has that random to th restarts be (i ap for Figure 2.7: Probability of low 2.4. ITERATED LOCAL SEARCH domly extracted from Technical Report DIT-07-049, Universit`adi Trento, July 2007. Copyright (C) 2007 Roberto Battiti, Mauro Brunato and Franco Mascia, all rights reserved. 
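The incremental move evaluation mentioned above can be made concrete. As an illustrative assumption (the text does not fix a particular f), take a quadratic pseudo-Boolean objective: flipping bit k only touches the terms containing x_k, so the change in f costs O(L) instead of the O(L^2) of a full re-evaluation.

```python
def f_full(x, c, Q):
    """Quadratic pseudo-Boolean objective; evaluated from scratch this
    costs O(L^2) operations (Q is upper-triangular, entries Q[i][j], i < j)."""
    L = len(x)
    linear = sum(c[i] * x[i] for i in range(L))
    pairs = sum(Q[i][j] * x[i] * x[j] for i in range(L) for j in range(i + 1, L))
    return linear + pairs

def delta_flip(x, k, c, Q):
    """Exact change of f when bit k is flipped, computed in O(L):
    only the linear term c[k] and the pairs touching bit k can change."""
    inter = sum(Q[min(k, j)][max(k, j)] * x[j] for j in range(len(x)) if j != k)
    return (1 - 2 * x[k]) * (c[k] + inter)
```

A local search loop can then rank all L neighbors by calling `delta_flip` instead of re-evaluating `f_full` for each of them.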
One may be tempted to sample the set of local minima by repeating runs of local search after starting from different random initial points. Unfortunately, general statistical arguments [21] related to the “law of large numbers” indicate that, when the size of the instance increases, the probability distribution of the cost values f(X*) of the local minima tends to become extremely peaked about the mean value E(f(X*)), a mean which can be offset from the best value by a fixed percent excess. If we repeat a random extraction, we are going to get very similar values with a large probability.

Relief is from the rich structure of many optimization problems, which tends to cluster good local minima together: instead of a random restart, it is better to search in the neighborhood of a current good local optimum. What one needs is a proper neighborhood structure on the space of local minima (proper, as usual, meaning that it makes probable the discovery of values significantly lower than those expected by a random extraction).

A neighborhood for the space of local minima which is of theoretical interest is obtained from the structure of the attraction basins around a local optimum. A local optimum is an attractor of the dynamical system obtained by applying the local search moves; its attraction basin contains all points mapped to the given optimum by local search. By definition, two local minima are neighbors if their attraction basins are neighbors, i.e., if there are points on the boundary where a point in one basin is a neighbor, in the original space, of a point in the other attraction basin. For example, in Fig. 2.8, local minima b, c, d, e, f are neighbors of a, while g, h are not neighbors of a.

Figure 2.8: Neighborhood among attraction basins induced by the attraction-basins geography: a randomized path leads from a local optimum to one of the neighboring local optima, as the path from a to b in the figure.

An exhaustive determination of the attraction basins would be a daunting task indeed. A weaker notion of closeness, which permits a fast stochastic search in the space of local minima, is based on creating a hierarchy of nested local searches: the default neighborhood, and the local search it defines, are used in turn to define a local search in the space of local minima. The sampling among local minima uses only the internal structure of the problem “visible” during a local walk among neighbors. One could continue and add further levels, but in practice one limits the hierarchy to two levels.

A heuristic way to move between neighboring local minima is the following one, see Fig. 2.9 and Fig. 2.10: generate a sufficiently strong perturbation leading to a new point, and then apply local search until convergence at a local optimum. One has to determine the appropriate strength of the perturbation; furthermore, a final design issue is how to build the randomized path connecting two neighboring local minima.
Figure 2.9: ILS: a perturbation leads from the local minimum X* to X′; local search then leads to a new local minimum. If the perturbation is too strong, one may end up in a distant attraction basin, therefore missing the closer local minima.

Figure 2.10: Iterated Local Search.

    function IteratedLocalSearch
    1.  X0 ← InitialSolution()
    2.  X* ← LocalSearch(X0)
    3.  repeat
    4.      X′ ← Perturb(X*, history)
    5.      X*′ ← LocalSearch(X′)
    6.      X* ← AcceptanceDecision(X*, X*′, history)
    7.  until (no improvement or termination condition)
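The ILS scheme of Fig. 2.10 can be sketched as runnable code. The toy objective, the fixed kick strength, and the strict-improvement acceptance rule are illustrative assumptions, not part of the original figure.

```python
import random

def greedy_descent(f, x):
    """LocalSearch component: first-improvement single-bit-flip descent."""
    x = x[:]
    fx = f(x)
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            x[i] ^= 1
            fy = f(x)
            if fy < fx:
                fx, improved = fy, True
            else:
                x[i] ^= 1
    return x

def kick(x, rng, strength=3):
    """Perturb component: flip `strength` randomly chosen bits.
    The kick strength is the critical parameter discussed in the text."""
    x = x[:]
    for i in rng.sample(range(len(x)), strength):
        x[i] ^= 1
    return x

def iterated_local_search(f, x0, steps=50, rng=None):
    """Skeleton of Fig. 2.10 with a strict-improvement AcceptanceDecision."""
    rng = rng or random.Random(0)
    x_star = greedy_descent(f, x0)
    best = f(x_star)
    for _ in range(steps):
        x_prime_star = greedy_descent(f, kick(x_star, rng))
        if f(x_prime_star) < best:          # AcceptanceDecision
            x_star, best = x_prime_star, f(x_prime_star)
    return x_star, best

# Toy objective (an illustrative assumption): distance from a hidden target.
target = [1, 0, 1, 1, 0, 0, 1, 0]
x_star, best = iterated_local_search(
    lambda x: sum(a != b for a, b in zip(x, target)), [0] * len(target))
```

Swapping `greedy_descent`, `kick`, or the acceptance test changes the behavior drastically, which is exactly the point of treating them as the four tunable components.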
prohibition 32 parameter ECISION ECISION Local optima avoidance in depot location D D simulated annealing hands-off The reactive tabu search some configuration variables and by optimizing sub-parts of , temperature , ACM Journal of Experimental Algorithmics which always accepts the randomly generated moves to favor d fixing CCEPTANCE CCEPTANCE Research Society SAT http://www.jea.acm.org/. in high energy physics jointly optimizing the four basic components Learning based on the previous search history is of paramoun We will encounter these more complex techniques in the follo While simple implementations of ILS are often adequate to im It is already clear, and it will hopefully become more so in th In particular, the Simulated Annealing method [14] is a popu [4] John Baxter, [1] R. Battiti and M. Protasi, [3] [2] R. Battiti and G. Tecchiolli, improvement, which accepts a move only if it improves the The A back to the startingperturbation and local local optimum. search are After deterministic. this happens,case an [23]. en The principle of ” to avoid cycling: if the perturbation is too small there is th 16 can be applied, togethercling. with As stochastic we elements have toattraction seen increa basin, for VNS, while minimal excessivefore perturbations ones loosing mainta bring the the boostperturbing method from clo by the a problem short structure random propertie walk of a length which is and Jaumard [9] popularizedin the the Tabu past for Search the (TS) Traveling approach. Salesman S [22] and graph partiti Bibliography perturbation to the localconsidered characteristics instance. in theobjective Creative function neighborh perturbations with penalties can soor that be by the obtain current local intermediate walk to identify quickly low-costlong runs solutions, provided while that thereactive sufficient effect learning exploration of can is be maintaine used in a way similar to that of [1] to the progress in the search. 
It is already clear, and it will hopefully become more so in the following chapters, that the design principles underlying many superficially different techniques are in reality strongly related. We already mentioned the issue of selecting the appropriate neighborhood, as well as the issue of using online reactive learning schemes to lead the solution trajectory away from a local optimum, for example by permitting worsening moves in a temporary way. In other schemes, diversification is obtained through stochastic elements, or through perturbations applied on top of a "black box" local search which does not need to be opened, as in ILS and in the related large-step Markov chain techniques [17, 18].

Bibliography

[1] R. Battiti and M. Protasi, Reactive search, a history-sensitive heuristic for MAX-SAT, ACM Journal of Experimental Algorithmics 2 (1997), no. 2, http://www.jea.acm.org/.
[2] R. Battiti and G. Tecchiolli, The reactive tabu search, ORSA Journal on Computing 6 (1994), no. 2, 126-140.
[3] R. Battiti and G. Tecchiolli, Learning with first, second, and no derivatives: a case study in high energy physics, Neurocomputing 6 (1994), 181-206.
[4] John Baxter, Local optima avoidance in depot location, The Journal of the Operational Research Society 32 (1981), no. 9, 815-819.
[5] O. Braysy, A reactive variable neighborhood search for the vehicle-routing problem with time windows, INFORMS Journal on Computing 15 (2003), no. 4, 347-368.
[6] B. Codenotti, G. Manzini, L. Margara, and G. Resta, Perturbation: An efficient technique for the solution of very large instances of the euclidean tsp, INFORMS Journal on Computing 8 (1996), no. 2, 125-133.
[7] F. Glover, Tabu search - part i, ORSA Journal on Computing 1 (1989), no. 3, 190-260.
[8] N. Mladenovic and P. Hansen, Variable neighborhood search, Computers and Operations Research 24 (1997), no. 11, 1097-1100.
[9] P. Hansen and B. Jaumard, Algorithms for the maximum satisfiability problem, Computing 44 (1990), 279-303.
[10] P. Hansen and N. Mladenovic, A tutorial on variable neighborhood search, Tech. Report ISSN: 0711-2440, Les Cahiers du GERAD, Montreal, Canada, July 2003.
[11] Bin Hu and Gnther R. Raidl, Variable neighborhood descent with self-adaptive neighborhood-ordering, Proceedings of the 7th EU/MEeting on Adaptive, Self-Adaptive and Multi-Level Metaheuristics (Carlos Cotta, Antonio J. Fernandez, and Jose E. Gallardo, eds.), Malaga, Spain, 2006.
[12] D.S. Johnson, Local optimization and the travelling salesman problem, Proc. 17th Colloquium on Automata, Languages and Programming, LNCS, vol. 447, Springer Verlag, Berlin, 1990.
[13] B. Kernighan and S. Lin, An efficient heuristic procedure for partitioning graphs, Bell Systems Technical J. 49 (1970), 291-307.
[14] S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi, Optimization by simulated annealing, Science 220 (1983), 671-680.
[15] H. R. Lourenco, O. C. Martin, and T. Stutzle, Iterated local search, ISORMS 57 (2002), 321.
[16] H.R. Lourenco, Job-shop scheduling: Computational study of local search and large-step optimization methods, European Journal of Operational Research 83 (1995), 347-364.
[17] Olivier Martin, Steve W. Otto, and Edward W. Felten, Large-step markov chains for the traveling salesman problem, Complex Systems 5:3 (1991), 299.
[18] Olivier Martin, Steve W. Otto, and Edward W. Felten, Large-step markov chains for the tsp incorporating local search heuristics, Operations Research Letters 11 (1992), 219-224.
[19] Olivier C. Martin and Steve W. Otto, Combining simulated annealing with local search heuristics, Annals of Operations Research 63 (1996), 57-76.
[20] Alexander Nareyek, Choosing search heuristics by non-stationary reinforcement learning, 2004, pp. 523-544.
[21] G. R. Schreiber and O. C. Martin, Cut size statistics of graph bisection heuristics, SIAM Journal on Optimization 10 (1999), no. 1, 231-251.
[22] K. Steiglitz and P. Weiner, Algorithms for computer solution of the traveling salesman problem, Proceedings of the Sixth Allerton Conf. on Circuit and System Theory, Urbana, Illinois, IEEE, 1968, pp. 814-821.
[23] T. Stuetzle, Local search algorithms for combinatorial problems - analysis, improvements, and new applications, Ph.D. thesis, Darmstadt University of Technology, Dept. of Computer Science, 1998.
[24] Richard S. Sutton and Andrew G. Barto, Reinforcement learning: An introduction, MIT Press, 1998.
[25] T.W.M. Vossen, M.G.A. Verhoeven, H.M.M. ten Eikelder, and E.H.L. Aarts, A quantitative analysis of iterated local search, Computing Science Reports 95/06, Department of Computing Science, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands, 1995.
[26] Christos Voudouris and Edward Tsang, Guided local search and its application to the traveling salesman problem, European Journal of Operational Research 113 (1999), 469-499.
Chapter 3

Reacting on the annealing schedule

3.1 Stochasticity in local moves and controlled worsening of solution values

As mentioned previously, local search stops at a locally optimal point. In the previous chapter one changed the neighborhood structure in order to push the trajectory away from the local minimum. In this chapter the neighborhood structure is fixed, but the move generation and acceptance are stochastic, and one permits also a "controlled worsening" of solution values, aiming at escaping from the local attractor. Let's note that accepting worsening moves has already been considered in the SKEWED-VNS routine of Section 2.3.

Now, if one extends local search by accepting also worsening moves (moves leading to worse f values), the trajectory can in principle leave a local minimum, but the danger is the endless cycle of "trying to escape and falling back immediately to the starting point". This situation surely happens if the local minimum is strict (all neighbors of the current solution point have worse f values) and if more than one step is needed before points with f values better than that of the starting local minimum become accessible: with deterministic moves, after raising the solution value at the second iteration, the trajectory would lead back immediately to the starting point. Worsening moves may lead to the discovery of better minima, provided that they lead the trajectory sufficiently far. The idea is therefore to bias the exploration so that low f values are visited more frequently than large ones, while still searching in the vicinity of good solutions.

In this chapter the Simulated Annealing (SA) method, the most popular realization of this controlled worsening, is investigated. We will summarize the technique, with a hint at the mathematical results, and then underline opportunities for self-tuning, which are of particular importance for the problems with a rich internal structure encountered in many applications.

3.2 Simulated Annealing and Asymptotics

The Simulated Annealing method [17] is based on the theory of Markov processes. The trajectory is built in a randomized manner: the successor of the current point is chosen stochastically, with a probability that depends only on the current point and not on the previous history.

Let (S, f) be an instance of a combinatorial optimization problem, S being the search space and f being the objective function, and let S^* ⊆ S be the set of optimal solutions. One starts from an initial configuration X(0) and repeatedly applies (3.1) to generate a trajectory X(t); Y is a neighbor extracted from N(X(t)):

    Y <- NEIGHBOR(N(X(t)))
    X(t+1) <- Y                                             if f(Y) <= f(X(t)),
              Y     with probability p = e^{-(f(Y)-f(X(t)))/T}   if f(Y) > f(X(t)),
              X(t)  with probability 1 - p                  if f(Y) > f(X(t)).      (3.1)

SA introduces a temperature parameter T which determines the probability that worsening moves are accepted: a larger T implies that more worsening moves tend to be accepted, and therefore a larger diversification. If T goes to zero, only improving moves are accepted, as in local search; if T goes to infinity, the probability that a move is accepted becomes 1 whether it improves the result or not, and one obtains a random walk.

Being a Markov process, SA can be analyzed with standard tools. Let O denote the set of possible outcomes (states) of a sampling process, and let X(k) be the stochastic variable denoting the outcome of the k-th trial. The elements of the transition probability matrix P, which determines the probability that the configuration is at a specific state j at step k, given that it was at state i before the last step, are defined as:

    P_{ij}(k) = Pr( X(k) = j | X(k-1) = i ).      (3.2)

If the transition probabilities do not depend on k, the Markov chain is called homogeneous, and the stationary distribution q of a finite homogeneous Markov chain is defined as:

    q_j = lim_{k -> infinity} Pr( X(k) = j | X(0) = i )  for all i.      (3.3)

Under appropriate conditions, the probability of finding one of the optimal solutions when the number of iterations goes to infinity tends to one, independently of the initial configuration X(0):

    lim_{k -> infinity} Pr( X(k) in S^* ) = 1.      (3.4)
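One transition of rule (3.1) can be written down directly. The following is a sketch under the notation above, not the authors' code; `neighbor` is a placeholder move generator.

```python
import math
import random

def sa_step(x, f, neighbor, T):
    """One SA transition with the exponential acceptance rule:
    improving moves are always accepted, worsening moves are accepted
    with probability exp(-(f(Y) - f(X)) / T)."""
    y = neighbor(x)
    delta = f(y) - f(x)
    if delta <= 0 or random.random() < math.exp(-delta / T):
        return y      # Y accepted
    return x          # Y rejected: stay at X(t)
```

For T close to zero only improving moves survive (a randomized local search); for very large T every move is accepted and the trajectory degenerates to a random walk, as noted above.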
The rule in (3.1) is called exponential acceptance rule. SA is characterized by the memory-less property of Markov processes: if one starts the process and waits long enough, memory about the initial configuration is lost, the probability of finding a given configuration becomes stationary and, when the temperature goes to zero, it will be different from zero only at the globally optimal configurations. This basic result raised high hopes of solving optimization problems through a simple and general-purpose method, starting from the seminal work in physics [19] and in optimization [24, 7, 17, 2].

Unfortunately, after some decades it became clear that SA is not a panacea. Vice versa, more recent developments show that repeated local search [11], and even random search [8], have better asymptotic results for some problems. Actually, most mathematical results about asymptotic convergence (convergence of the method when the number of iterations goes to infinity) are quite irrelevant for optimization. First, one does not care whether the final configuration at convergence is optimal or not, but that an optimal solution (or a good approximation thereof) is encountered — and memorized — during the search. Second, asymptotic convergence usually requires a patience which is excessive considering the length of our lives. Given the limited practical relevance of the theoretical results, a practitioner has to place the asymptotic results in the background and develop heuristics which learn from the task characteristics during the search. In the following, we will first concentrate on the asymptotic convergence results, before considering adaptive and learning implementations.

3.2.1 Asymptotic convergence results

Given the probability that the configuration is at a specific state i at iteration k, the state of the Markov chain is described by the stochastic vector q(k) whose components are given by q_i(k) = Pr(X(k) = i).

Homogeneous model. In the homogeneous model one considers a sequence of infinitely long homogeneous Markov chains, where each chain is for a fixed value of the temperature T. If a finite Markov chain is homogeneous, irreducible (for every pair of states i, j there is a positive probability of reaching i from j in a finite number of steps) and aperiodic (the greatest common divisor gcd(D_i) = 1, where D_i is the set of all integers n > 0 such that (P^n)_{ii} > 0), then it has a unique stationary distribution q, determined by the equation:

    q_j = sum_i q_i P_{ij},      (3.5)

or, in other words, the distribution is not modified by a single Markov step. Under appropriate conditions [1] (in particular, the generation probability must ensure that one can move from an arbitrary initial solution to a second arbitrary solution in a finite number of steps), the homogeneous Markov chain associated to SA has a stationary distribution q(T) whose components are given by:

    q_i(T) = e^{-f(i)/T} / sum_{j in S} e^{-f(j)/T}.      (3.6)

It follows that:

    lim_{T -> 0} q_i(T) = chi_{S^*}(i) / |S^*|,      (3.7)

where chi_{S^*} is the characteristic function of the set S^*, equal to one if the argument belongs to the set, zero otherwise; furthermore:

    lim_{T -> 0} lim_{k -> infinity} Pr( X(k) in S^* ) = 1.      (3.8)

The algorithm asymptotically "converges with probability one" to the set of optimal solutions. If a stationary distribution exists, the "speed of convergence" to it is determined by the second largest eigenvalue of the transition probability matrix (not easy to calculate!).

Inhomogeneous model. In practice one cannot wait for a stationary distribution to be reached: the temperature must be lowered before converging, and one has a non-increasing sequence of values T_k. If the temperature decreases in a sufficiently slow way:

    T_k >= A / log(k + k_0),      (3.9)

then the Markov chain converges in distribution to the limit distribution of (3.7), concentrated on the optimal solutions, and the algorithm asymptotically finds an optimal solution with probability one:

    lim_{k -> infinity} Pr( X(k) in S^* ) = 1.      (3.10)

The theoretical value of A depends on the depth of the deepest local, non-global optimum, a value which is not easy to calculate for a generic instance.

The above cited asymptotic convergence results of SA, in both the homogeneous and inhomogeneous model, are unfortunately irrelevant for the application of SA to optimization: in any finite-time run one must resort to approximations of the asymptotic convergence. The number of transitions required is, for most problems, eventually quadratic in the total number of possible configurations in the search space [1]; according to [1], "approximating the asymptotic behavior of SA arbitrarily closely requires a number of transitions that for most problems is typically larger than the size of the solution space". Thus, the SA algorithm is clearly unsuited for solving combinatorial optimization problems to optimality. For the inhomogeneous case, it can happen (e.g., for the Traveling Salesman Problem) that complete enumeration of all solutions would take less time than approximating an optimal solution arbitrarily closely by SA. Moreover, the optimal finite-length annealing schedules do not always correspond to those expected from the asymptotic theorems, as shown on specific simple problems [26]. More details about cooling schedules, i.e. ways to progressively reduce the temperature during the search, can be found in [20, 13]. Extensive experimental results of SA for graph partitioning, coloring and number partitioning are presented in [15, 16]; a comparison of SA and Reactive Search has been presented in [5, 6].

Figure 3.1: Simulated Annealing: if the temperature parameter T is very small w.r.t. the size of the upward jump in f(X) needed to escape from the attractor, SA risks a practical entrapment close to a local minimizer.

3.3 Online learning strategies in simulated annealing

If one wants to be poetic, the main feature of simulated annealing lies in its asymptotic convergence properties, but its main drawback also lies in the asymptotic convergence. In a practical application of SA, if the local configuration is close to a local minimizer and the temperature is already very small in comparison to the size of the upward jump which has to be executed to escape from the attractor, the system will eventually escape, but an enormous number of iterations can be spent around the attractor. Given a finite patience time, all future iterations can be spent while "circling like a fly around a light-bulb" (the light-bulb being the local minimum). Animals with superior cognitive abilities get burnt once, learn, and avoid doing it again! The memoryless property (the current move depends only on the current state, not on the previous history) makes SA look like a simple, dumb animal indeed. It is intuitive that a better performance can be obtained by using memory, by self-analyzing the evolution of the search, by developing simple models, and by activating more direct escape strategies, aiming at a better time-management than the "let's go to infinite time" principle. In the following sections we will summarize the main memory-based approaches developed over the years to make SA a competitive strategy.

3.3.1 Combinatorial optimization problems

Even if a vanilla version of SA is adopted — a starting temperature T_start, a final temperature T_end, and a geometric cooling schedule T_{t+1} = alpha * T_t with alpha < 1 — a sensible choice has to be made for the three involved parameters T_start, T_end and alpha, see [9, 23, 3]. If the proper scale of the temperature is wrong, extremely poor results are to be expected, and when a new instance is encountered it is not so simple to guess appropriate values. The work [27] suggests to estimate the distribution of the f values (f is usually called "energy" when exploiting physical analogies): the mean and the variance of this distribution define a maximum energy scale of the system, while the standard deviations of f inside the entrapment basins define its minimum temperature scale. These temperature scales tell us where to begin and end an annealing schedule.

The analogy with physics is pursued in [18], where concepts related to phase transitions and specific heat are used. While we avoid entering physics, the idea is that a phase transition is related to solving a sub-part of the problem. Before reaching a phase transition, big reconfigurations take place, and this is signalled by wide variations of the f values: a phase transition occurs when the specific heat is maximal, a quantity estimated by the ratio between the estimated variance of f and the squared temperature. After the phase transition corresponding to the big reconfiguration, finer details in the solution have to be fixed, and this heuristically requires a slower decrease of the temperature. Concretely, one defines two temperature-reduction parameters alpha and beta: when the scaled variance reaches its maximal value at a given temperature T_msp, corresponding to the maximum specific heat, one switches from the faster temperature decrease given by alpha to the slower one given by beta.

Let's note that the stopping criterion should in many cases be related to the users' patience, and one would often like an anytime algorithm: anytime algorithms — by definition — return better and better values until the user decides to stop, and may improve on the answer if they are allowed to run longer, so that longer allocated CPU times correspond to possibly better f_best values. Given a geometric cooling schedule, instead, the useful span of CPU time is practically limited: after the initial period the temperature will be so low that practically no tentative move will be accepted anymore with non-negligible probability. The system "freezes", and f_best will remain stuck in a helpless manner even if the search is continued for very long CPU times, see also Figure 3.1.

Because this problem is related to a monotonic temperature decrease, a motivation arises to use non-monotonic cooling schedules, see [9, 23, 3]. Up to now we discussed only a monotonic decrease of the temperature; however, in many cases one would like to increase the temperature when there is evidence of entrapment, or to reset it once and for all to a constant value high enough to escape from local minima but also low enough to visit them [9].
For example, a suitable constant temperature is related to the one at which the best heuristic solution is found in a preliminary SA simulation [9]. The basic design principle of a non-monotonic schedule is related to: i) exploiting an attraction basin rapidly, by decreasing the temperature so that the system can settle down close to the local minimizer; ii) increasing the temperature to diversify the solution and visit other attraction basins; iii) decreasing again after reaching a different basin. As usual, the temperature decrease in this kind of non-monotonic schedule has to be rapid enough to avoid falling back to the current local minimizer, but not too rapid, to avoid a random-walk situation (where all random local moves are accepted) which would not capitalize on the local structure of the problem ("good local minima are close to other good local minima"). The implementation details have to do with determining an entrapment situation — for example from the fact that no tentative move is accepted after a sequence of tentative changes — and with determining the detailed temperature decrease-increase evolution as a function of events occurring during the search.

Possibilities to increase the temperature include resetting it to T_reset <- T_bestfound, the temperature value at which the current best solution was found [23]. If the reset is successful, one may progressively reduce the reset temperature: T_reset <- T_reset / 2. Alternatively [3], geometric re-heating phases can be used, which multiply the temperature by a heating factor gamma larger than one at each iteration during reheat. Enhanced versions involve a learning process to choose a proper value of the heating factor gamma depending on the system state: gamma is close to one at the beginning, while it increases if, after a fixed number of escape trials, the system is still trapped in the local minimum. In a related scheme, a perturbation leading to a worsening solution is accepted if and only if a fixed number of trials could not find an improving perturbation; this can be seen as deriving reactively an escape mechanism, by applying intensification first and diversification only if there is evidence of entrapment. Let's note that similar "strategic oscillations" have been proposed in tabu search, in particular in the reactive tabu search [4] (see Chapter 4), and in variable neighborhood search (see Chapter 2).

"Cybernetic" optimization is proposed in [12] as a way to use probabilistic information for feedback during a run of SA. The idea is to consider more runs of SA executing in parallel, and to intensify the search (by lowering the temperature parameter) when there is evidence that the search is converging to the optimum value. If one looks for gold, one should "spend more time looking where all the other prospectors are looking" [12] (but let's note that one may argue differently, depending on the luck of the other prospectors!). The similarity between the current configurations of different parallel runs is taken as evidence of a (possible) closeness to the optimum point; even if this is not necessarily true, it is nonetheless used as positive feedback, causing a gradual reduction of the temperature in a reactive manner.

Modifications departing from the exponential acceptance rule in (3.1) are considered in [21, 22]. Experimental evidence in the area of design automation suggests that the a priori determination of SA parameters and of the acceptance function does not lead to efficient implementations: adaptations may be done "by the algorithm itself using some learning mechanism, or by the user using his own learning mechanism". The authors appropriately note that the optimal choices of the algorithm parameters depend not only on the problem but also on the particular instance, and that a proof of convergence to a globally optimal solution is not a selling point for a specific heuristic: in fact a simple random sampling, or even exhaustive enumeration (if the set of configurations is finite), will eventually find the optimal solution, although they are not the best algorithms to suggest. The authors also found that the success of SA is "due largely to its acceptance of bad moves", rather than to some mystical connection between combinatorial problems and the annealing of metals [22].
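A temperature control combining geometric cooling with the reset/re-heating ideas above might look as follows. This is a hedged sketch: the parameter names `alpha` and `gamma` and the entrapment test (no accepted move in the recent window) are illustrative assumptions, not the cited implementations.

```python
def next_temperature(T, accepted_recently, alpha=0.95, gamma=1.5):
    """Cool geometrically (T <- alpha*T) while moves are still being
    accepted; re-heat geometrically (T <- gamma*T) when entrapment is
    detected, i.e. no tentative move was accepted recently."""
    return alpha * T if accepted_recently else gamma * T
```

A reactive variant would additionally adapt `gamma` itself, increasing it while the system remains trapped, as in the enhanced versions mentioned above.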
3.3.2 Global optimization of continuous functions

The application of SA to continuous optimization (the optimization of functions defined on real variables in R^n) was pioneered by [10]. The basic method is to generate a new point with a random step along a direction e_h, to evaluate the function, and to accept the move with the exponential acceptance rule in (3.1). One cycles over the different directions e_h during successive steps of the algorithm. A first critical choice has to do with the range of the random step along the chosen direction: a fixed choice may obviously be very inefficient, and this opens a first possibility for learning. A simple adaptive technique is suggested in [10]: a new trial point x' is obtained from the current point x as

    x' = x + Rand(-1, 1) v_h e_h,

where Rand(-1, 1) returns a random number uniformly distributed between -1 and 1, e_h is the unit-length vector along direction h, and v_h is a step-range parameter, one for each dimension, collected into the vector v. The exponential acceptance rule is used to decide whether to update the current point with the new point x'. The v_h value is adapted during the search with the aim of maintaining the number of accepted moves at about one-half of the total number of tried moves. In detail, after a number N_S of cycles over the coordinate directions, the step-range vector v is updated: each component v_h is increased if the number of accepted moves along direction h is greater than 60% of the tried ones, and it is reduced if less than 40% (the increase and decrease factors are 2 and 1/2 in the above paper). After N_T updates of v, N_T being a second fixed parameter, the temperature is reduced in a multiplicative manner, T_{k+1} <- r_T T_k, and the current point is reset to the best-so-far point found during the previous search. Although the implementation is already reactive and based on memory, the authors encourage more work so that a "good monitoring of the minimization process" can deliver precious feedback about some crucial internal parameters of the algorithm.

In Adaptive Simulated Annealing (ASA), also known as very fast simulated re-annealing [14], the parameters that control the temperature cooling schedule and the random step selection are automatically adjusted according to algorithm progress. If the state is represented as a point in a box and the moves as an oval cloud around it, the temperature and the step size are adjusted so that all of the search space is sampled at a coarse resolution in the early stages, while the state is directed to promising areas in the late stages [14].
Bibliography

[1] E. H. L. Aarts, J.H.M. Korst, and P.J. Zwietering, Deterministic and randomized local search, Mathematical Perspectives on Neural Networks (M. Mozer, P. Smolensky and D. Rumelhart, eds.), Lawrence Erlbaum Publishers, Hillsdale, NJ, 1995, to appear.
[2] E.H.L. Aarts and J.H.M. Korst, Boltzmann machines for travelling salesman problems, European Journal of Operational Research 39 (1989), 79-95.
[3] D. Abramson, H. Dang, and M. Krisnamoorthy, Simulated annealing cooling schedules for the school timetabling problem, Asia-Pacific Journal of Operational Research 16 (1999), 1-22.
[4] R. Battiti and G. Tecchiolli, The reactive tabu search, ORSA Journal on Computing 6 (1994), no. 2, 126-140.
[5] R. Battiti and G. Tecchiolli, Local search with memory: Benchmarking rts, Operations Research Spektrum 17 (1995), no. 2/3, 67-86.
[6] R. Battiti and G. Tecchiolli, Simulated annealing and tabu search in the long run: a comparison on qap tasks, Computer and Mathematics with Applications 28 (1994), no. 6, 1-8.
[7] V. Cherny, A thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm, Journal of Optimization Theory and Applications 45 (1985), 41-45.
[8] T.-S. Chiang and Y. Chow, On the convergence rate of annealing processes, SIAM Journal on Control and Optimization 26 (1988), no. 6, 1455-1470.
[9] D.T. Connolly, An improved annealing scheme for the qap, European Journal of Operational Research 46 (1990), no. 1, 93-100.
[10] A. Corana, M. Marchesi, C. Martini, and S. Ridella, Minimizing multimodal functions of continuous variables with the simulated annealing algorithm, ACM Trans. Math. Softw. 13 (1987), no. 3, 262-280.
[11] A.G. Ferreira and J. Zerovnik, Bounding the probability of success of stochastic methods for global optimization, Computers Math. Applic. 25 (1993), no. 10/11, 1-8.
[12] Mark A. Fleischer, Cybernetic optimization by simulated annealing: Accelerating convergence by parallel processing and probabilistic feedback control, Journal of Heuristics 1 (1996), no. 2, 225-246.
[13] Bruce Hajek, Cooling schedules for optimal annealing, Math. Oper. Res. 13 (1988), no. 2, 311-329.
[14] L. Ingber, Very fast simulated re-annealing, Mathl. Comput. Modelling 12 (1989), no. 8, 967-973.
[15] David S. Johnson, Cecilia R. Aragon, Lyle A. McGeoch, and Catherine Schevon, Optimization by simulated annealing: an experimental evaluation; part i, graph partitioning, Operations Research 37 (1989), no. 6, 865-892.
[16] David S. Johnson, Cecilia R. Aragon, Lyle A. McGeoch, and Catherine Schevon, Optimization by simulated annealing: an experimental evaluation; part ii, graph coloring and number partitioning, Operations Research 39 (1991), no. 3, 378-406.
[17] S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi, Optimization by simulated annealing, Science 220 (1983), 671-680.
[18] P. J. M. Laarhoven and E. H. L. Aarts (eds.), Simulated annealing: theory and applications, Kluwer Academic Publishers, Norwell, MA, USA, 1987.
[19] N. Metropolis, A. N. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, Equation of state calculation by fast computing machines, Journal of Chemical Physics 21 (1953), no. 6, 1087-1092.
[20] Debasis Mitra, Fabio Romeo, and Alberto Sangiovanni-Vincentelli, Convergence and finite-time behavior of simulated annealing, Advances in Applied Probability 18 (1986), 747-771.
[21] Surendra Nahar, Sartaj Sahni, and Eugene Shragowitz, Experiments with simulated annealing, DAC '85: Proceedings of the 22nd ACM/IEEE conference on Design automation (New York, NY, USA), ACM Press, 1985, pp. 748-752.
[22] Surendra Nahar, Sartaj Sahni, and Eugene Shragowitz, Simulated annealing and combinatorial optimization, DAC '86: Proceedings of the 23rd ACM/IEEE conference on Design automation (Piscataway, NJ, USA), IEEE Press, 1986, pp. 293-299.
[23] Ibrahim Hassan Osman, Metastrategy simulated annealing and tabu search algorithms for the vehicle routing problem, Ann. Oper. Res. 41 (1993), no. 1-4, 421-451.
[24] Martin Pincus, A monte carlo method for the approximate solution of certain types of constrained optimization problems, Oper. Res. 18 (1970), no. 6, 1225-1228.
[25] Patrick Siarry, Gérard Berthiau, François Durdin, and Jacques Haussy, Enhanced simulated annealing for globally minimizing functions of many continuous variables, ACM Trans. Math. Softw. 23 (1997), no. 2, 209-228.
[26] P. N. Strenski and Scott Kirkpatrick, Analysis of finite length annealing schedules, Algorithmica 6 (1991), 346-366.
[27] S.R. White, Concepts of scale in simulated annealing, AIP Conference Proceedings, vol. 122, 1984, pp. 261-270.
Chapter 4

Reactive prohibitions

    It is a good morning exercise for a research scientist to discard a pet hypothesis every day before breakfast. It keeps him young.
    (Konrad Lorenz)

4.1 Prohibitions for diversification (Tabu Search)

The Tabu Search (TS) meta-heuristic [13] is based on the use of prohibition-based techniques and "intelligent" schemes as a complement to basic heuristic algorithms like local search, with the purpose of guiding the basic heuristic beyond local optimality. It is difficult to assign a precise date of birth to these principles. For example, ideas similar to those proposed in TS can be found in the denial strategy of [24] (once common features are detected in many suboptimal solutions, they are forbidden) or in the opposite reduction strategy of [19] (in an application to the Travelling Salesman Problem, the edges that are common to a set of local optima are fixed). In very different contexts, prohibition-like strategies can be found in cutting planes algorithms for solving integer problems through their Linear Programming relaxation (inequalities that cut off previously obtained fractional solutions are generated) and in branch and bound algorithms (subtrees are not considered if the leaves cannot correspond to better solutions), see the textbook [22].

The renaissance and full blossoming of "intelligent prohibition-based heuristics" starting from the late eighties is greatly due to the role of F. Glover in the proposal and diffusion of a rich variety of meta-heuristic tools [13, 14], but see also [17] for an independent seminal paper. It is evident that Glover's ideas have been a source of inspiration for many approaches based on the intelligent use of memory in heuristics; let's only cite the four "principles" about using long-term memory in tabu-search (recency, frequency, quality and influence of selected moves), also cited in [6]. A growing number of TS-based algorithms has been developed in the last years and applied with success to a wide selection of problems [15]. It is therefore difficult, if not impossible, to characterize a "canonical form" of TS, and classifications tend to be short-lived. Nonetheless, at least two aspects characterize many versions of TS: the fact that TS is used to complement local (neighborhood) search, and the fact that the main modifications to local search are obtained through the prohibition of selected moves available at the current point.

TS acts to continue the search beyond the first local minimizer without wasting the work already executed, as is the case if the search is started from a new random initial point, and to enforce appropriate amounts of diversification to avoid that the search trajectory remains confined near a given local minimizer. In our opinion, the main competitive advantage of TS with respect to alternative heuristics based on local search, like Simulated Annealing (SA) [18], lies in the intelligent use of the past history of the search to influence its future steps.

4.1.1 Forms of Tabu Search

There are several vantage points from which to view heuristic algorithms. By analogy with the concept of abstract data type in Computer Science [1], and with the related object-oriented software engineering techniques [8], it is useful to separate the abstract concepts and operations of TS from the detailed implementation, i.e., the realization with specific data structures. In other words, policies (that determine which trajectory is generated in the search space, what the balance of intensification and diversification is, etc.) should be separated from mechanisms that determine how a specific policy is realized. An essential abstract concept is that of the search trajectory, and the related one of discrete dynamical system.

Before starting the discussion on the use of memory to generate prohibitions in the next section, let's note in passing that other, softer possibilities exist. For example the HSAT [12] variation of GSAT introduces a tie-breaking rule into GSAT: if more moves produce the same (best) Delta f, the preferred move is the one that has not been applied for the longest span. HSAT can be seen as a "soft" version of Tabu Search: while TS prohibits recently-applied moves, HSAT discourages recent moves if the same f improvement can be obtained with moves that have been "inactive" for a longer time. It is remarkable to see how this innocent variation of GSAT can increase its performance on some SAT benchmark tasks [12].

4.1.2 Dynamical systems

A classification of some TS-related algorithms that is based on the underlying discrete dynamical system is illustrated in Table 4.1. A first subdivision is given by the deterministic versus stochastic nature of the system.

    Deterministic        Stochastic
    -------------        ----------
    Strict TS            Probabilistic TS
    Fixed TS             Robust TS; Fixed TS with stochastic tie breaking
    Reactive TS          Reactive TS with stochastic tie breaking;
                         Reactive TS with neighborhood sampling
                         (stochastic candidate list strategies)

    Table 4.1: A classification based on discrete dynamical systems.

Let us assume that the feasible search space is the set of binary strings with a given length L: X = {0,1}^L. X^(t) is the current configuration and N(X^(t)) the set of its neighbors, given by the previously introduced neighborhood. In prohibition-based search (Tabu Search) some of the neighbors are prohibited, and a subset N_A(X^(t)) of N(X^(t)) contains the allowed ones. The general way of generating the search trajectory that we consider is given by:

    X^(t+1) = Best-Neighbor( N_A(X^(t)) )                            (4.1)
    N_A(X^(t+1)) = Allow( N(X^(t+1)), X^(0), ..., X^(t+1) )          (4.2)

The set-valued function Allow selects a subset of N(X^(t+1)) in a manner that depends on the entire search trajectory X^(0), ..., X^(t+1).

Let us first consider the deterministic versions. Possibly the simplest form of TS is what is called strict-TS: a neighbor is prohibited if and only if it has already been visited during the previous part of the search [13] (the term "strict" is chosen to underline the rigid enforcement of its simple prohibition rule). Therefore (4.2) becomes:

    N_A(X^(t+1)) = { X in N(X^(t+1))  s.t.  X not in { X^(0), ..., X^(t+1) } }     (4.3)
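The dynamical system (4.1)-(4.3) can be sketched in a few lines of code (an illustration only: Best-Neighbor and Allow are realized here for strict-TS on binary strings, with ties broken by the order in which the moves are generated):

```python
def neighbors(x):
    """Basic moves: all tuples at Hamming distance 1 from x."""
    return [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(len(x))]

def strict_ts(f, x0, max_steps):
    """Trajectory of Eqs. (4.1)-(4.3): move to the best allowed neighbor,
    where strict-TS allows a neighbor iff it was never visited before."""
    x, visited, best = x0, {x0}, x0
    for _ in range(max_steps):
        allowed = [y for y in neighbors(x) if y not in visited]  # Eq. (4.3)
        if not allowed:            # all neighbors visited: trajectory stuck
            break
        x = min(allowed, key=f)    # Best-Neighbor of Eq. (4.1)
        visited.add(x)
        if f(x) < f(best):
            best = x
    return best, visited
```

Minimizing, for example, the Hamming distance to a target string, the trajectory climbs out of any starting point and never repeats a configuration, at the price of storing all visited points.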
The fixed-TS algorithm is obtained by fixing the prohibition parameter T throughout the search [13]: a neighbor is allowed if and only if it is obtained from the current point by applying a move such that its inverse has not been used during the last T iterations. T therefore determines how long a move will remain prohibited after the execution of its inverse. In detail, if LastUsed(mu) is the last usage time of move mu (LastUsed(mu) = -infinity at the beginning):

    N_A(X^(t)) = { X = mu o X^(t)  s.t.  LastUsed(mu^-1) < (t - T) }          (4.4)

If T changes with the iteration counter depending on the search status (in this case the notation will be T^(t)), the general dynamical system that generates the search trajectory comprises an additional evolution equation, so that the three defining equations are now:

    T^(t) = React( T^(t-1), X^(0), ..., X^(t) )                               (4.5)
    N_A(X^(t)) = { X = mu o X^(t)  s.t.  LastUsed(mu^-1) < (t - T^(t)) }      (4.6)
    X^(t+1) = Best-Neighbor( N_A(X^(t)) )                                     (4.7)

Let us note that strict-TS is parameter-free. Rules to determine the prohibition parameter T^(t) by reacting to the repetition of previously-visited configurations have been proposed in [4] (reactive-TS, RTS for short). In addition, there are situations where the single reaction on T is not sufficient to avoid long cycles in the search trajectory, so that a second reaction is needed [4]. The main principles of RTS are briefly reviewed in Sec. 4.2.

Stochastic algorithms related to the previously described deterministic versions can be obtained in many ways. For example, prohibition rules can be substituted with stochastic generation-acceptance rules with large probability for allowed moves, small for prohibited ones; see for example the probabilistic-TS [13]. Asymptotic results for TS can be obtained in probabilistic TS [11]. In a different proposal (robust-TS) the prohibition parameter is randomly changed between an upper and a lower bound during the search [25]. Stochasticity can increase the robustness of the different algorithms; incidentally, in [13] randomization is seen as a means for achieving diversity "without reliance on memory," although it could "entail a loss in efficiency by allowing duplications and potentially unproductive wandering that a more systematic approach would seek to eliminate."

Two additional algorithms can be obtained by introducing in fixed-TS and reactive-TS a stochastic breaking of ties, in the event that the same cost function decrease is obtained by more than one winner in Best-Neighbor. At least this simple form of stochasticity should always be used to avoid external search biases, possibly caused by the ordering of the loop indices. If the neighborhood evaluation is expensive, the exhaustive evaluation can be substituted with a stochastic sampling: only a partial list of candidates is examined before choosing the best allowed neighbor.

4.1.3 A worked out example of Fixed Tabu Search

Let us assume that the search space is the set of 3-bit strings, with cost function f([b1, b2, b3]). The feasible points (the vertices of the 3-dimensional binary cube) are illustrated in Fig. 4.1 with the associated cost function; the neighborhood of a point is the set of points that are connected with edges. The starting point X^(0) is a local minimizer, because all moves produce a higher cost value; the best of the three admissible moves is nonetheless applied, so that the system abandons the local minimizer (note that the move is applied even if the cost function increases). If T remains equal to zero the system is trapped forever in the local minimizer: the best move from the new point leads immediately back to it. With T = 1 the inverse of the move just applied is prohibited because it was used too recently, i.e., its most recent usage time violates the constraint of eq. (4.4), and the trajectory is pushed further away; after a few more steps, however, the system returns to its starting point: a limit cycle. With a larger prohibition a third bit must be changed before the first moves become admissible again, and the system reaches the global minimizer.

Figure 4.1: Search space, f values and search trajectory (in the original figure the trajectories on the binary cube are drawn for the different prohibition values).

4.1.4 Relation between prohibition and diversification

The prohibition parameter T used in eq. (4.4) is related to the amount of diversification: the larger T, the longer the distance that the search trajectory must go before it is allowed to come back to a previously visited point. For the basic moves acting on binary strings, the following propositions can be demonstrated:

- The Hamming distance H between a starting point and successive points along the trajectory is strictly increasing for T + 1 steps:
      H(X^(t+tau), X^(t)) = tau    if tau <= T + 1
- The minimum repetition interval R along the trajectory is 2(T + 1):
      X^(t+R) = X^(t)  implies  R >= 2(T + 1)

The above relations are immediately demonstrated if one considers that, as soon as the value of a variable is flipped, this value remains unchanged for the next T iterations; in order to come back to the starting configuration, all variables changed in the first phase must be changed back to their original values. On the other hand, large values of T imply that only a limited subset of the possible moves can be applied to the current configuration, and the new points are limited to the points that can be reached from it by applying the admissible moves. It is therefore appropriate to select for T the smallest value that guarantees diversification.
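The two propositions can be verified numerically with a minimal fixed-TS implementation (a sketch: the cost function, the Hamming weight whose only minimizer is the all-zero string, and the deterministic lowest-index tie breaking are illustrative choices of this sketch):

```python
import math

def fixed_ts(L, T, steps):
    """Fixed-TS on binary strings with cost = Hamming weight. Each flip is
    its own inverse, so move i is allowed iff LastUsed(i) < t - T (Eq. 4.4).
    Returns the trajectory as a list of tuples."""
    x = [0] * L
    last_used = [-math.inf] * L      # LastUsed(mu) = -infinity at the start
    trajectory = [tuple(x)]
    for t in range(steps):
        allowed = [i for i in range(L) if last_used[i] < t - T]
        # Best-Neighbor: flipping bit i changes the cost by +1 or -1;
        # ties are broken deterministically by the lowest move index.
        i = min(allowed, key=lambda j: (1 if x[j] == 0 else -1, j))
        x[i] = 1 - x[i]
        last_used[i] = t
        trajectory.append(tuple(x))
    return trajectory

traj = fixed_ts(L=32, T=3, steps=16)
hamming = [sum(s) for s in traj]     # distance from the starting minimizer
```

With T = 3 the Hamming distance climbs to T + 1 = 4 and returns to zero after 2(T + 1) = 8 steps, so the starting configuration is repeated exactly at the minimum repetition interval.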
4.1.5 How to escape from an attractor

Local minima points are attractors of the search trajectory generated by deterministic local search. If the cost function is integer-valued and lower bounded, it can be easily shown that a trajectory starting from an arbitrary point will terminate at a local minimizer. All points such that a deterministic local search trajectory starting from them terminates at a specific local minimizer make up its attraction basin. Now, as soon as a local minimizer is encountered, its entire attraction basin is not of interest for the optimization procedure: its points do not have smaller cost values. It is nonetheless true that better points could be close to the current basin, whose boundaries are not known. Two problems must therefore be solved in heuristic techniques based on local search: one is how to continue the search beyond the local minimizer, the other is how to avoid the confinement of the search trajectory. Confinements can happen because of cycles (endless repetition of a sequence of configurations during the search) or because of more complex trajectories with no clear periodicity, but nonetheless such that only a limited portion of the search space is visited (they are analogous to the chaotic attractors of dynamical systems).

An heuristic prescription is that the search point is kept close to a discovered local minimizer at the beginning, snooping about better attraction basins; if these are not discovered, the search should gradually progress to larger distances, therefore progressively enforcing longer-term diversification strategies. The prescription reflects the hypothesis that better points are preferentially located in the neighborhood of good suboptimal points rather than among randomly extracted points: because the trajectory tends to be biased toward points with low cost function values, the search also remains close to the just abandoned minimizer in the first iterations, which is clearly a desired effect in this hypothesis.

Some very different ways of realizing this general prescription are here illustrated for a "laboratory" test problem defined as follows. The search space is the set of all binary strings of length L, and the search has just reached a (strict) local minimizer: all moves produce higher cost values in the neighborhood. Without loss of generality, let us assume that the local minimizer is the zero string [00...0] and that the cost is precisely the Hamming distance, i.e., the number of different bits with respect to the given local minimizer, so that f in the neighborhood is strictly increasing as a function of the distance. Although artificial, the assumption is not unrealistic in many cases: an analogy in continuous space is the usual positive-definite quadratic approximation of a differentiable cost function in the neighborhood of a strict local minimizer. Let us also assume that at least two moves can be applied (clearly, if only one move can be applied, the choice is not influenced by the algorithm). The discussion is mostly limited to the deterministic versions.

Strict-TS. In the deterministic version of strict-TS, if more than one basic move produces the same cost decrease at a given iteration, the move that acts on the right-most bit of the string is selected. The search trajectory obtained for L = 4 is illustrated in Fig. 4.2. Let us consider how the Hamming distance evolves in time, in the optimistic assumption that the search always finds an allowed move, until all points of the search space are visited. In this case the Hamming distance H(t) at iteration t satisfies:

    H(t) <= floor(log2 t) + 1        (4.8)

In fact, a certain Hamming distance H can be reached only as soon as or after the iteration t = 2^(H-1), so that the string [11...1] can be reached only at t = 2^(L-1) or later. This can be demonstrated after observing that a complete trajectory for L-bit strings becomes a legal initial part of the trajectory for (L+1)-bit search spaces after appending zero as the most significant bit (see the trajectory for L = 3, reproduced in the initial part of the trajectory for L = 4 because the fourth bit is set only at iteration t = 8, e.g., in Fig. 4.2); equation (4.8) trivially follows.

Figure 4.2: Search trajectory for deterministic strict-TS (L = 4).

In practice the above optimistic assumption is not true: strict-TS can be stuck (trapped) at a configuration such that all neighbors have already been visited. In fact, the smallest L such that this event happens is L = 4: Fig. 4.3 shows the actual evolution of the Hamming distance of the trajectory for L = 4; the search is stuck at t = 14 and the string 1101 is not visited. The detailed dynamics is complex, as "iron curtains" of visited points (that cannot be visited again) are created in the configuration space. The problem worsens for higher-dimensional search spaces: for L = 10 the search is stuck after visiting 47% of the entire search space, for L = 20 after visiting only 11% of the search space. In addition, if the trajectory must reach a Hamming distance H before escaping (i.e., before encountering a better attraction basin), the necessary number of iterations is at least exponential in the distance: all configurations at Hamming distance less than or equal to H tend to be visited before configurations at larger distance, and their number C_H is:

    C_H = sum_{i=0}^{H} (L choose i)        (4.9)

As an example, for L = 32 one obtains from (4.9) the value C_5 = 242 825, i.e., this number of configurations has to be visited before finding a configuration at Hamming distance greater than 5, while for L = 64 one obtains C_4 = 679 121 and C_5 = 8 303 633 (the computations have been executed by Mathematica). The explosion in the number of iterations spent near a local optimum occurs unless the nearest attraction basin is very close, and the situation worsens in higher-dimensional search spaces. The effect can be seen as a manifestation of the "curse of dimensionality": a technique that works in a very low-dimensional search space can encounter dramatic performance reductions as the dimension increases. In particular, there is the danger that the entire search span will be spent while visiting points at small Hamming distances, unless additional diversifying tools are introduced.

Figure 4.3: Evolution of the Hamming distance for deterministic strict-TS (L = 4; the search is stuck at t = 14, string 1101 not visited).

Fixed-TS. The analysis of fixed-TS on the laboratory problem is simple: as soon as a bit is changed it will remain prohibited ("frozen") for additional T steps. During the first T + 1 iterations, T + 1 different bits are set; only at this point the "ice" around the first frozen bit melts down and allows changes that are immediately executed, because they decrease the cost, and the set bits are progressively cleared. Therefore (see Fig. 4.4) the Hamming distance with respect to the starting configuration will cycle in a regular manner between zero and a maximal value, and all configurations in a cycle are different (apart from the initial configuration).

Figure 4.4: Evolution of the Hamming distance for both deterministic and stochastic fixed-TS (L = 32, T = 10).
In fact, to simplify the discus the maximum Hamming distance that is reached during a cycle i R ) for the deterministic version is shown in Fig. 4.5, repetiti T t at the beginning and that the reaction acts to increase ( T T =1 and the cycle length is T ). tends to be small with respect to its upper bound, both becaus +1 T ) t ( = 10 T T , = The evolution of The behavior of the Hamming distance isNow, illustrated for in a Fig. given 4 In the present discussion we consider only the initial phase = 32 max L prohibition local minimum point causeis reached a the rapid system increaseattraction enters up basins a to cycle. exist its This in maximal limit the cycle v test is caus case considered, while i them (the largest integer less than or equal to The initial value (anddiately lower bound) to of a one justmoves implies are left that allowed the configuration. at sy each The iteration. upper Non-integral values bound of is used t minimum point is repeated, in the following manner: size of the attraction basins and because of the complementa repetitions disappear. distance reached increases in a much faster way compared to t us assume that H feedback process. tor, when increases of used in the design of the RTS algorithm in [4], where specific r Figure 4.4: Evolution of the Hamming distance for both deter ( 34 ously illustrated relation between the prohibition a local attractor, untilfinement a can new be attractor obtainedfact is from that encountered. a the new repetition Ina attraction of p suitably basin a long has period. previously been found In v this can last be case, postula Reactive-TS The behavior of reactive-TS depends on the specific reactive minimizer is repeated and the reaction occurs. The result is scription is that of gradually increasing and therefore the cycle lengthof does the also, graph. as illustrated in F Technical Report DIT-07-049, Universit`adi Trento, July 2007. 
The evolution of T^(t) for the deterministic version is shown in Fig. 4.5, and the behavior of the Hamming distance is illustrated in Fig. 4.6 and, for the first 100 iterations, in Fig. 4.7, which expands the initial part of the graph. At the beginning (4.10) will increase T by one unit at each reaction, and the maximal Hamming distance reached increases in a much faster way compared to the strict-TS case.

Let us now consider a generic iteration t at which a reaction occurs (like t = 4, 10, 18, ... in Fig. 4.7). If the prohibition is T just before the reaction, the total number t of iterations executed is:

    t(T) = sum_{i=1}^{T} 2(i + 1) = T^2 + 3T        (4.11)

where the relation between the prohibition and the cycle length 2(T + 1) has been used. One obtains:

    T(t) = ( -3 + sqrt(9 + 4t) ) / 2                (4.12)

and therefore the maximum reachable Hamming distance at iteration t is approximately:

    H_max(t) = T(t) + 1 = ( -1 + sqrt(9 + 4t) ) / 2 = O(sqrt(t))        (4.13)

where the relation H_max = T + 1 has been used. Let us note that the difference with respect to strict-TS is a crucial one: one obtains an (optimistic) logarithmic increase in the strict algorithm, and a (pessimistic) increase that behaves like the square root of the number of iterations executed in the reactive case. In this last case bold tours at increasing distances are executed until the prohibition T is sufficient to escape from the attractor. The above estimate holds during the initial steps; the increase of T is clearly faster in later steps, when the reaction is multiplicative instead of additive (when T^(t) > 10 in (4.10)), and therefore the estimate of H_max(t) becomes a lower bound in the following phase.

Figure 4.5: Evolution of the prohibition parameter T for deterministic reactive-TS with reaction at local minimizers (L = 32).

Figure 4.6: Evolution of the Hamming distance for simplified reactive-TS (L = 32).
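The closed forms can be checked numerically (a sketch):

```python
import math

def t_of_T(T):
    """Iterations executed up to the reaction with prohibition T: the sum
    of the cycle lengths 2(i+1) for i = 1..T, Eq. (4.11)."""
    return sum(2 * (i + 1) for i in range(1, T + 1))

def T_of_t(t):
    """Inverse closed form of t(T) = T^2 + 3T, Eq. (4.12)."""
    return (-3 + math.sqrt(9 + 4 * t)) / 2
```

t_of_T gives 4, 10, 18, ... for T = 1, 2, 3, ..., matching the reaction times of the previous simulation, and T_of_t inverts it exactly.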
Figure 4.7: Evolution of the Hamming distance for reactive-TS, first 100 iterations (L = 32).

4.2 Reactive Tabu Search (RTS)

Some problems arising in TS that are worth investigating are:

1. the determination of an appropriate prohibition T for the different tasks,
2. the robustness of the technique for a wide range of different problems,
3. the adoption of minimal computational complexity algorithms for using the search history.

The three issues are briefly discussed in the following sections, together with the RTS methods proposed to deal with them.

4.2.1 Self-adjusted prohibition period

In RTS the prohibition T is determined through feedback (i.e., reactive) mechanisms during the search. T is equal to one at the beginning (the inverse of a given move is prohibited only at the next step), it increases only when there is evidence that diversification is needed, and it decreases when this evidence disappears. In detail, the evidence that diversification is needed is signaled by the repetition of previously visited configurations: all configurations found during the search are stored in memory, and after a move is executed the algorithm checks whether the current configuration has already been found and reacts accordingly (T increases if a configuration is repeated, T decreases if no repetitions occurred during a sufficiently long period).

T is therefore not fixed during the search, but is determined in a dynamic way depending on the local structure of the search space. This is particularly relevant for "inhomogeneous" tasks, where the statistical properties of the search space vary widely in the different regions (in these cases a fixed T would be inappropriate). An example of the behavior of T during the search is illustrated in Fig. 4.8, for a Quadratic Assignment Problem task [4]: T increases in an exponential way when repetitions are encountered, until the prohibition is sufficient to escape from the attractor, and it decreases in a gradual manner when repetitions disappear. In addition, if RTS reaches a given local minimizer with a T value obtained during its previous history, and if the properties of the fitness surface change slowly in the different regions, the chances are large that it will escape even faster.
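A minimal sketch of the self-adjustment loop (the specific factors, the clamping and the `patience` parameter are illustrative assumptions of this sketch; the actual RTS rules in [4] also use the measured repetition interval):

```python
def check_repetition(history, x, t):
    """Store every visited configuration with its last visit time and
    report whether x was already seen (and when)."""
    previous = history.get(x)
    history[x] = t
    return previous is not None, previous

def update_prohibition(T, repeated, quiet_steps, L, patience=50):
    """Increase T quickly on a repetition, decrease it gradually after a
    long period without repetitions, keeping it in [1, L - 2]."""
    if repeated:
        T = T * 1.1 + 1          # fast (exponential) increase
    elif quiet_steps > patience:
        T = T * 0.9              # gradual decrease
    return min(max(T, 1.0), L - 2.0)
```

In the search loop the integer part of T would then be used as the prohibition, as in Eq. (4.4).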
a finite-cardinality searchpoint space, to escape be iswill reached activated be after infin visited escaping - clearlyasymptotic is including properties different the and globally from finite-time optimaldiversification effects zero poi is of for an different open research area. 4.3 Implementation: storingWhile and the above using classification dealson the with dynamical the search systems detailed history datathe structures needed used operations. in Differentcomplexities the so data algorithms that structures attention and can shouldthat on posse be is spent efficient on on this a subje particular problem. Fig. 4.9. Strict-TS cana be implemented term through that the refersperformed reverse to e throughout a the technique searchin for (called all “running the list”). cases storage through T and standard analysis for the case of a search space consisting of binary strings, t addition, even if “limitavoided, cycles” the first (endless reactive cyclicis mechanism repetitio not is confined not in sufficient aa to limited limited gua region portion of of the search the space. search A space “chao is still possible (the a moves are eventually prohibited, and therefore cycles long 4.3. IMPLEMENTATION: STORING AND USING THE SEARCH HISTORY 4.2.2 The escapeThe mechanism basic tabu mechanismfor based binary on strings prohibitions of is not length suffici Technical Report DIT-07-049, Universit`adi Trento, July 2007. Copyright (C) 2007 Roberto Battiti, Mauro Brunato and Franco Mascia, all rights reserved. - or hashing d can be less , a small and (1) O vent called “collision”). shown in Fig. 4.10. From ifferent data structures. ted configurations, or with the well-known ition is reported if and only el of detail, hashing can be y, if a “compressed” item is f the history is negligible for are stored. 
Figure 4.9: The same search trajectory can be obtained with different data structures:
- Strict TS (exact): Reverse Elimination Method; hashing with storage of the configuration; radix tree; ...
- Fixed TS: FIFO list of the prohibited moves; storage of the last usage time of each move; ...
- Reactive TS (exact): list of visited configurations; hashing with storage of the configuration; radix tree; (approximated): storage of a hashed value; storage of the cost function; ...

Trivially, fixed-TS (alternative terms [13, 14] are simple TS, static tabu search (STS), or tabu navigation method) can be realized with a first-in first-out list where the prohibited moves are located (the "tabu list"), or by storing in an array the last usage time of each move and by using (4.4). Reactive-TS can be implemented through a simple list of the visited configurations, or with more efficient hashing or radix tree techniques.

At a finer level of detail, hashing can be realized in different ways. If the entire configuration is stored, an exact answer is obtained from the memory lookup operation: a repetition is reported if and only if the current configuration has already been visited before. On the contrary, if a "compressed" item is stored, like a hashed value of a limited length, the answer has a limited probability of error: a repetition can be reported even if the configuration is new, because two different compressed items can be equal by chance, an event called a "collision". Experimentally, the small collision probabilities derived from the use of hashing in reactive-TS as a heuristic tool do not have statistically significant effects on the search; the effect of collision probabilities when hashing is used in other schemes is a subject worth investigating. REM is not applicable to all problems (the "sufficiency property" must be satisfied [14]); in addition, its computational complexity per iteration is proportional to the number of iterations executed, while the average complexity per iteration obtained through hashing is O(1).

4.3.1 Fast algorithms for using the search history

The storage and access of the past events is executed through hashing or radix-tree techniques in a CPU time that is approximately constant with respect to the number of iterations. Therefore the overhead caused by the use of the history is negligible for tasks requiring a non-trivial number of operations to evaluate the cost function in the neighborhood. Hashing is an old tool in Computer Science, and different hashing functions (possibly with incremental calculation) are available for different domains [7]. Through a hashing function φ of the current configuration one obtains an index into a "bucket array"; the items (configuration, or hashed value, or derived quantity; last time of visit; total number of repetitions) are then stored and retrieved with a constant number of operations. The worst-case complexity of the radix tree technique is proportional to the number of bits in the binary strings, and constant with respect to the iteration number. If the memory usage is considered, both REM and approximated hashing schemes require O(1) memory per iteration (hashing versions that need only a few bytes per iteration can be used), while the actual number of bytes stored per iteration can be less for REM, because only changes (moves), and not configurations, are stored. An example of a memory configuration for the hashing scheme is shown in Fig. 4.10.
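As a minimal sketch of this bookkeeping, assuming Python's built-in `hash` in place of the function φ (the function name and the stored fields are illustrative):

```python
# Sketch of history storage for reactive-TS: each visited configuration is
# mapped to (last visit time, number of repetitions). The built-in hash()
# stands in for the hashing function phi; because only the hashed key is
# compared, this is the "approximated" variant with a small collision risk.

def record_visit(history, config, iteration):
    """Return (is_repetition, cycle_length) and update the history."""
    key = hash(config)              # stand-in for phi(config)
    if key in history:
        last_time, repetitions = history[key]
        history[key] = (iteration, repetitions + 1)
        return True, iteration - last_time
    history[key] = (iteration, 0)
    return False, 0
```

The cycle length (time since the last visit) is exactly the quantity the reaction of Section 4.2.1 is based on.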
Figure 4.10: Open hashing scheme: items (configuration, or compressed hashed value, etc.) are stored in "buckets"; the index of the bucket array is calculated from the configuration through a hashing function.

The items are stored in linked lists starting from the indexed array entry. Both storage and retrieval require an approximately constant amount of time if: i) the number of stored items is not much larger than the size of the bucket array, and ii) the hashing function scatters the items with uniform probability over the different array indices. More precisely, given a hash table with m slots that stores n elements, a load factor α = n/m is defined; if collisions are resolved by chaining, searches take O(1 + α) time on the average.

4.3.2 Persistent dynamic sets

Persistent dynamic sets have been proposed to support the memory-usage operations of history-sensitive heuristics in [3, 2]. Ordinary data structures are ephemeral [10], meaning that when a change is executed the previous version is destroyed. Now, in many contexts, like computational geometry, editing, the implementation of very high level programming languages and, last but not least, history-based heuristics, multiple versions of a data structure must be maintained and accessed. In particular, in heuristics one is interested in partially persistent structures, where all versions can be accessed but only the newest version (the live nodes) can be modified. A systematic study of persistence, continuing the previous work of [21], and a review of ad hoc techniques for obtaining persistent data structures, is given in [10].

Hashing combined with persistent red-black trees. The basic observation is that, because Tabu Search is based on local search, configuration X(i+1) differs from configuration X(i) only because of the addition or subtraction of a single index (a single bit is changed in the string). The configuration X can in fact be considered as a set of indices in [1, L], and the search trajectory is then a sequence of INSERT(i) and DELETE(i) operations on this set, with a possible realization as a balanced red-black tree (see [5, 16] for two seminal papers about red-black trees). The binary string can be immediately obtained from the tree by visiting it in symmetric order, in time O(L), while INSERT(i) and DELETE(i) for inserting and deleting a given index i require O(log L) operations.
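Returning to the open hashing scheme of Fig. 4.10, a minimal sketch with an explicit bucket array and chaining might look as follows (the bucket-array size and the stored fields are illustrative; storing the full configuration gives the exact answer discussed above):

```python
# Sketch of the open hashing scheme of Fig. 4.10: the hashed value selects a
# bucket; items (configuration, last visit time, repetition count) are kept in
# a chained list. With n items and m buckets, and a hash function that
# scatters uniformly, the expected chain length is the load factor n / m.

class BucketStore:
    def __init__(self, m=1021):                 # m: bucket array size (illustrative)
        self.buckets = [[] for _ in range(m)]

    def lookup_or_insert(self, config, iteration):
        """Exact check: return True iff `config` was visited before."""
        chain = self.buckets[hash(config) % len(self.buckets)]
        for item in chain:                      # collisions resolved by chaining
            if item["config"] == config:        # exact comparison: no false repetitions
                item["repetitions"] += 1
                item["last_visit"] = iteration
                return True
        chain.append({"config": config, "last_visit": iteration, "repetitions": 0})
        return False
```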
Let us now consider the context of history-based heuristics. As soon as configuration X(t) is generated by the search dynamics, the corresponding persistent red-black tree is updated through INSERT(i) or DELETE(i). In addition, one is interested in checking whether a configuration has already been encountered in the previous history of the search, at any time t; the structure must therefore support the operations X-SEARCH(X, t), X-INSERT(X) and X-DELETE(X) on the set of visited configurations. A convenient way of realizing such a structure is to combine hashing and partially persistent dynamic sets, see Fig. 4.12: the hashing value is computed from X, an index into a "bucket array" is obtained, and the appropriate bucket is searched. Collisions are resolved through chaining; for each item in the linked list there is a pointer to the appropriate root of the persistent red-black tree, together with the satellite data needed by the search (time of the configuration, number of repetitions).

Now, the REM method [13, 14] is closely reminiscent of a method independently proposed and studied in [21] to obtain partial persistence, in which the entire update sequence is stored and the desired version is rebuilt from scratch each time an access at a specified time is performed. Methods with better space-time complexities are presented in [23, 10]. Let us now summarize from [23] how a partially persistent red-black tree can be realized.

The trivial way is that of keeping in memory all copies of the ephemeral tree (see the top part of Fig. 4.11), each copy requiring O(L) space. A smarter realization is based on path copying, independently proposed by many researchers (see [23] for references). Only the path from the root to the nodes where changes are made is copied: a set of search trees is created, one per update, having different roots but sharing common subtrees. The time and space complexities per update are now of O(log L).

The method that we will use is a space-efficient scheme, requiring only linear space, proposed in [23]. To this end, each node contains an additional "extra" pointer (beyond the usual left and right ones) with a time stamp. When attempting to add a pointer to a node, if the extra pointer is available, it is used and the time stamp is registered. If the extra pointer is already used, the node is copied, setting the initial left and right pointers of the copy to their latest values. In addition, the pointer to the copy is stored in the last parent of the copied node; if the parent has already used its extra pointer, the parent, too, is copied. Thus copying proliferates through successive ancestors until the root is copied or a node with a free extra pointer is encountered. Searching the data structure at a given time t in the past is easy: after starting from the appropriate root, the pointer to follow from a node is determined by examining the time stamps; the extra pointer is followed if and only if it is in use and its time stamp is not larger than t, otherwise the normal left-right pointers are considered. Note that the direction (left or right) of the extra pointer does not have to be stored: it can be derived by comparing the index of the pointed node with that of the current node. An example of the realizations that can be derived is presented in Fig. 4.11; in the figure, NULL pointers are not shown, colors are correct only for the live tree (the nodes reachable from the rightmost root), and extra pointers are dashed and time-stamped.

Let us recall that the total amortized space cost of a sequence of updates is an upper bound on the actual number of nodes created. An important result derived in [23] is that the amortized space cost per update operation is O(1). The worst-case time complexity of INSERT(i) and DELETE(i) remains O(log L), while at most a single node of the tree is allocated or deallocated at each iteration: rebalancing the tree after an insertion or a deletion can be done in O(1) rotations and O(log L) color changes [26], and the amortized number of color changes per update is O(1) [26].
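A minimal sketch of the path-copying idea (middle part of Fig. 4.11) for a plain, unbalanced binary search tree; the scheme of [23] applies it to red-black trees, so that each update copies only O(log L) nodes, and then reduces the space further with the extra-pointer technique:

```python
# Sketch of path copying for a partially persistent set of indices
# (cf. Fig. 4.11). Each insert copies only the nodes on the root-to-leaf
# path; all other nodes are shared between versions, so every past version
# remains queryable. For simplicity the tree is unbalanced; the scheme in
# the text uses a red-black tree to bound the copied path by O(log L).

class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(root, key):
    """Return the root of a NEW version containing `key`; old roots stay valid."""
    if root is None:
        return Node(key)
    if key < root.key:
        return Node(root.key, insert(root.left, key), root.right)   # copy path node
    if key > root.key:
        return Node(root.key, root.left, insert(root.right, key))   # copy path node
    return root                                                     # already present

def contains(root, key):
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False
```

Keeping the sequence of roots, one per iteration, gives access to the set of indices at any past time, exactly the access pattern needed by X-SEARCH.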
Let us now describe the realization of X-SEARCH(X, t): the hashing value is computed from X, an index into the bucket array is obtained, and the sets stored in the corresponding linked list are compared with the configuration X. If the sets are equal, a pointer to the item on the linked list is returned; otherwise, after the entire list has been scanned with no success, a NULL pointer is returned. In the latter case a new item is linked in the appropriate bucket, with a pointer to the root of the live version of the tree (X-INSERT for the tree); in the former case, the last visit time is updated and the repetition counter is incremented. The time is therefore dominated by that required to compare the configurations, i.e., O(L) in the worst case. Assuming that the bucket array size is equal to the maximum number of iterations executed in the entire search, the average-case time for searching and updating the hashing table is O(1 + α), with the load factor α upper bounded by 1; because each comparison of two configurations requires O(L), searches take O((1 + α) L) time on the average, and both the hash table and the persistent red-black tree require O(1) amortized space per iteration.

After collecting the above cited complexity results, it is straightforward to conclude that each iteration of reactive-TS requires O(L) average-case time and O(1) amortized space for storing and retrieving the past configurations and for establishing prohibitions. Because Ω(L) time is needed during the neighborhood evaluation to compute the f values, the above complexity is optimal for the considered application to history-based heuristics.

Bibliography

[1] A. V. Aho, J. E. Hopcroft, and J. D. Ullman, Data structures and algorithms, Addison-Wesley, 1983.
[2] R. Battiti, Partially persistent dynamic sets for history-sensitive heuristics, Tech. Report UTM-96-478, Dip. di Matematica, Univ. di Trento, 1996. Revised version presented at the Fifth DIMACS Challenge, Rutgers, NJ, 1996.
[3] R. Battiti, Time- and space-efficient data structures for history-based heuristics, Tech. Report UTM-96-478, Dip. di Matematica, Univ. di Trento, 1996.
[4] R. Battiti and G. Tecchiolli, The reactive tabu search, ORSA Journal on Computing 6 (1994), no. 2, 126–140.
[5] R. Bayer, Symmetric binary B-trees: Data structure and maintenance algorithms, Acta Informatica 1 (1972), 290–306.
[6] Christian Blum and Andrea Roli, Metaheuristics in combinatorial optimization: Overview and conceptual comparison, ACM Computing Surveys 35 (2003), no. 3, 268–308.
[7] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to algorithms, McGraw-Hill, New York, 1990.
[8] B. J. Cox, Object oriented programming, an evolutionary approach, Addison-Wesley, 1990.
[9] F. Dammeyer and S. Voss, Dynamic tabu list management using the reverse elimination method, Annals of Operations Research 41 (1993), 31–46.
[10] J. R. Driscoll, N. Sarnak, D. D. Sleator, and R. E. Tarjan, Making data structures persistent, Proceedings of the 18th Annual ACM Symposium on Theory of Computing (Berkeley, CA), ACM, May 28-30, 1986.
[11] U. Faigle and W. Kern, Some convergence results for probabilistic tabu search, ORSA Journal on Computing 4 (1992), no. 1, 32–37.
[12] I. P. Gent and T. Walsh, Towards an understanding of hill-climbing procedures for SAT, Proceedings of the Eleventh National Conference on Artificial Intelligence, AAAI Press / The MIT Press, 1993, pp. 28–33.
[13] F. Glover, Tabu search - part I, ORSA Journal on Computing 1 (1989), no. 3, 190–206.
[14] F. Glover, Tabu search - part II, ORSA Journal on Computing 2 (1990), no. 1, 4–32.
[15] F. Glover, Tabu search: Improved solution alternatives, Mathematical Programming, State of the Art (J. R. Birge and K. G. Murty, eds.), The Univ. of Michigan, 1994, pp. 64–92.
[16] L. J. Guibas and R. Sedgewick, A dichromatic framework for balanced trees, Proc. of the 19th Ann. Symp. on Foundations of Computer Science, IEEE Computer Society, 1978, pp. 8–21.
[17] P. Hansen and B. Jaumard, Algorithms for the maximum satisfiability problem, Computing 44 (1990), 279–303.
[18] S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi, Optimization by simulated annealing, Science 220 (1983), 671–680.
[19] S. Lin, Computer solutions of the travelling salesman problem, Bell Systems Technical Journal 44 (1965), no. 10, 2245–2269.
[20] D. Maier and S. C. Salveter, Hysterical B-trees, Information Processing Letters 12 (1981), no. 4, 199–202.
[21] M. H. Overmars, Searching in the past II: general transforms, Tech. report, Dept. of Computer Science, Univ. of Utrecht, Utrecht, The Netherlands, 1981.
[22] C. H. Papadimitriou and K. Steiglitz, Combinatorial optimization: algorithms and complexity, Prentice-Hall, NJ, 1982.
[23] N. Sarnak and R. E. Tarjan, Planar point location using persistent search trees, Communications of the ACM 29 (1986), no. 7, 669–679.
[24] K. Steiglitz and P. Weiner, Algorithms for computer solution of the traveling salesman problem, Proceedings of the Sixth Allerton Conf. on Circuit and System Theory (Urbana, Illinois), IEEE, 1968, pp. 814–821.
[25] E. Taillard, Robust taboo search for the quadratic assignment problem, Parallel Computing 17 (1991), 443–455.
[26] R. E. Tarjan, Updating a balanced search tree in O(1) rotations, Information Processing Letters 16 (1983), 253–257.
[27] D. L. Woodruff and E. Zemel, Hashing vectors for tabu search, Annals of Operations Research 41 (1993), 123–138.
Figure 4.11: How to obtain a partially persistent red-black tree containing indices 3, 4, 6, 8, 9 at t = 0, with the subsequent insertion of 7 and 5: ephemeral red-black tree (top); path copying (middle), with thick lines marking the copied part; limited node copying (bottom), with dashed lines denoting the "extra" pointers with time stamp.

Figure 4.12: Open hashing scheme with persistent sets: a pointer to the appropriate root for configuration X(t) in the persistent search tree is stored in a linked list at a "bucket". Items on the list contain satellite data. The index of the bucket array is calculated from the configuration through a hashing function.

Chapter 5

Model-based search

    The sciences do not try to explain, they hardly even try to interpret, they mainly make models. By a model is meant a mathematical construct which, with the addition of certain verbal interpretations, describes observed phenomena. The justification of such a mathematical construct is solely and precisely that it is expected to work.
    (Johann Von Neumann)

In the previous chapters we concentrated on how to solve optimization problems by applying different flavors of the local search paradigm. Nonetheless, the same problems can be thought of from a more global point of view, leading to a different family of useful optimization techniques.

5.1 Models of a problem

The main idea of model-based optimization is to create and maintain a model of the problem, whose aim is to provide some clues about the problem's solutions. If the problem is a function to be minimized, for instance, it is helpful to think of the model as a simplified version of the function itself; in more general settings, the model can be a probability distribution defining the estimated likelihood of finding a good quality solution at a certain point. To solve a problem, we resort to the model in order to generate a candidate solution, and then check it. The result of the check shall be used to refine the model, so that the future generation is biased towards better and better candidate solutions. Clearly, for a model to be useful it must provide as much information about the problem as possible, while being somehow "more tractable" (in a computational or analytical sense) than the problem itself.

The initial model can be created through a priori knowledge or by uniformity assumptions. Although memory-based techniques can be used in both discrete and continuous domains, the latter case better supports our intuition. In Fig. 5.1 a function (continuous line) must be minimized. An initial model (the dashed line), in case of no prior knowledge, provides a uniform probability distribution. Based on this estimate, some candidate minima are generated (points a, b, c, d, e) and the corresponding function values are computed. The model is then updated (dotted line) to take into account the latest findings. Further model-guided generations and tests shall improve the model: eventually the region around the global minimum shall be discovered, and a high probability density shall be assigned to its surroundings.

The same example also highlights a possible drawback of naïf applications of the technique: assigning a high probability only to the neighborhood of the best points found so far could lead to a negligible probability of selecting a point near the still undiscovered global minimum, so that the global minimum would never be discovered. It looks like the emphasis is entirely on intensification; this is why, in practice, the models are corrected to ensure a significant probability of generating points also in unexplored regions.
Figure 5.1: Model-based search: one generates sample points from the model and updates the generative model to increase the probability for points with low cost values.

Now that we are supported by intuition, let's proceed with the theoretical aspects. The design choices consist of defining a suitable generative model and an appropriate learning rule to favor the generation of superior candidates in the future steps. A scheme of a model-based search approach is presented in Fig. 5.2; this discussion is inspired by the recent survey [15]. The represented entities are:

- a model used to generate sample solutions,
- the last samples generated,
- a memory containing previously accumulated knowledge about the problem (previous solutions and evaluations).

The process develops in an iterative way through a feedback loop where new candidates are generated by the model, and their evaluation, together with the memory about past states, is used to improve the model itself in view of a new generation. A lot of computational complexity can lurk within the learning rule, which is in itself an optimization task: in particular, one should avoid learning rules converging to local optima, as well as overtraining, which would hamper generalization. In pathological cases, the optimal point runs the risk of becoming more and more difficult to generate.

An alternative possibility would be to avoid the model altogether and to keep in memory all or a fraction of the previously tested sample points, with new candidates generated based on a dynamic population of points (this hypothesis reminds of search algorithms also known as Genetic or Evolutionary Algorithms). The informed reader will also notice a similarity with the division between model-based and instance-based machine learning techniques, see for example [11].
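The feedback loop of Fig. 5.2 can be summarized in a short skeleton; the three entities above appear as arguments, and all names are placeholders to be instantiated by a concrete method such as the ones of the following sections:

```python
# Skeleton of the model-based search loop of Fig. 5.2. The callables
# (init_model, generate, evaluate, learn) are placeholders: PBIL, MIMIC and
# the cross-entropy method all instantiate them differently. f is minimized.

def model_based_search(init_model, generate, evaluate, learn, iterations):
    model = init_model()
    memory = []                                   # long-term memory of past samples
    best = None
    for _ in range(iterations):
        samples = generate(model)                 # sample candidate solutions
        scored = [(evaluate(s), s) for s in samples]
        memory.extend(scored)
        model = learn(model, scored, memory)      # feedback: refine the model
        candidate = min(scored)                   # best (value, solution) this round
        if best is None or candidate < best:
            best = candidate
    return best
```

Even a degenerate instantiation, where the "model" is just the best point of the last round and "generate" perturbs it, already performs a stochastic local search within this skeleton.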
5.2 An example

Let's start from a simple model. The search space S is the set of all binary strings of length n: S = {0, 1}^n. In the Population-Based Incremental Learning (PBIL) algorithm [1] the generation model is defined by an n-tuple of parameters p = (p_1, ..., p_n) ∈ [0, 1]^n, where p_i is the probability of producing 1 as the i-th bit of the string, and every bit is independently generated. The following steps are iterated:

1. initialize the vector p;
2. generate a sample set S of strings using the vector p;
3. extract a fixed number of the best solutions from S;
4. compute the average s̄ = (s̄_1, ..., s̄_n) of the best solutions;
5. update the model: p ← (1 − ρ) p + ρ s̄;
6. repeat from step 2.

Here ρ is a learning rate parameter, regulating exploration versus exploitation, and p can be seen as representing a moving average of the best samples.
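The steps above translate directly into code; the sample size, the number of retained best solutions and the learning rate ρ are illustrative choices:

```python
# Sketch of PBIL for minimizing f over binary strings of length n.
# Parameter values (samples, best_k, rho) are illustrative choices.

import random

def pbil(f, n, iterations=100, samples=50, best_k=5, rho=0.1):
    p = [0.5] * n                          # indifference to the bit values
    for _ in range(iterations):
        # generate a sample set from the product distribution defined by p
        sample = [[int(random.random() < p[i]) for i in range(n)]
                  for _ in range(samples)]
        sample.sort(key=f)                 # f is minimized: best solutions first
        best = sample[:best_k]
        # average of the best solutions, then the moving-average update of p
        s_bar = [sum(s[i] for s in best) / best_k for i in range(n)]
        p = [(1 - rho) * p[i] + rho * s_bar[i] for i in range(n)]
    return p
```

For instance, minimizing the number of zeros drives every p_i towards 1, so that the model concentrates on the all-ones string.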
Figure 5.2: Model-based architecture: a generative model is updated after learning from the last generated samples and the previous long-term memory.

The initial state of the model corresponds to indifference with respect to the bit values: p_i = 0.5 for i = 1, ..., n. The motivation is to "remove the genetics from the standard genetic algorithm" [1]: instead of maintaining a statistic implicitly in a GA population, the statistics are maintained explicitly in the vector p. As a parallel with the machine learning literature, the update rule, providing a prototype vector to be placed in the middle of the cluster of the best samples, is similar to that used in Learning Vector Quantization, see [7]. Variations include moving away from bad samples in addition to moving towards good ones. A schematic representation is shown in Fig. 5.3.

Although this simple technique shows results superior to GAs on some benchmark tasks [1], the method has intrinsic weaknesses if the optimization landscape has a rich structure, for example more than a single cluster of optimal positions corresponding to widely different configurations. These weaknesses motivate considering possible dependencies among the individual bits.

5.3 Dependent probabilities

Estimates of probability densities in the form of pairwise conditional probabilities are studied in [3], which also suggests a clear theoretical framework. The MIMIC technique (Mutual-Information-Maximizing Input Clustering) proposed there aims at estimating a probability density for the points with function value below a given threshold θ (remember that the function f is to be minimized).
; ( ˆ + p +1 t ) = min -th percentile of the data (so that a specific θ + X ˆ ,...,X θ N so that it permits a non-trivial estimation of ( 3 θ + X with a distribution + | gradually shifts towards good quality solutions 1 requires excessive computational resources, one 2 ) + ) p X X ( ( Bad quality solutions X p p ( ) ˆ measure p t 0 n ( +1) is defined as the one minimizing the Kullback-Leibler − p    − , zero elsewhere: ) θ X − ( ≤ )= ˆ p − ,...,X ) x t ( ) 2 ( p value on this population; θ X X ( -th percentile of the data; p | f f − − 1 − N X ( ← O p : 2 with p behavior: the threshold will be high at the beginning, then t mean +1 t )= X retain only the points less than Update the density estimatorGenerate model more samples fromθ the model ← so that it is equal to a fixed X 0 ( , a widely used measurement of the agreement between distrib θ Generate a random initial population uniformly in the input repeat θ p ) ˆ p k 6. 7. 5. 2. 1. 4. 3. p ( D adapt would be sufficient to identify the optimizer. is too low no sample point will make it, if it is too large all po ˆ θ θ p threshold-descent What is left is the choice of a (parametric) model so that it ca Now that the path is clear, let’s see the details. The joint pr Before proceeding with the method, a proper threshold (if θ nally, because identifying the optimal resorts to a greedy construction of the distribution divergence acceptable CPU times and withoutfirst requiring approximates an the excessive true num distribution Figure 5.3: PBIL: the “prototype” vector 48 set of possibilities. The closest (qualitative example in two dimensions). uniform over inputs If this distribution is known and the best value p of a priori knowledge, we can aim at tuning from leading to lower and(a lower similar costs technique willproceeds will be as be identified follows: leading encountered to for racing in Chapt in [3] is to at a fraction of the sample points is below the threshold). 
In detail, one considers a class of probability distributions p̂_π, where each member is defined by a permutation π = (i_1, i_2, ..., i_n) of the indices between 1 and n:

    p̂_π(X) = p(X_{i_1} | X_{i_2}) p(X_{i_2} | X_{i_3}) ··· p(X_{i_{n-1}} | X_{i_n}) p(X_{i_n}).    (5.1)

One aims at picking the π* such that p̂_{π*} has minimum divergence with the actual distribution p. Using the properties of the logarithm and the definition of entropy, the Kullback-Leibler divergence D(p, p̂_π) = E_p[log p(X)] − E_p[log p̂_π(X)] contains a first term which does not depend on π, so that the cost function to be minimized is the sum of (conditional) entropies

    J_π = h(X_{i_1} | X_{i_2}) + h(X_{i_2} | X_{i_3}) + ··· + h(X_{i_{n-1}} | X_{i_n}) + h(X_{i_n}),    (5.2)

where in practice the entropies are substituted with empirical estimates ĥ obtained by sampling. Because searching among all n! permutations is too demanding, one picks the permutation in a heuristic greedy manner:

1. i_n = arg min_j ĥ(X_j);
2. i_k = arg min_j ĥ(X_j | X_{i_{k+1}}), with j ∉ {i_{k+1}, ..., i_n}, for k = n−1, n−2, ..., 1.

After the distribution is chosen, one generates each sample sequentially:

1. randomly pick a value for X_{i_n} based on the estimated p̂(X_{i_n});
2. for k = n−1, ..., 1, pick a value for X_{i_k} based on the estimated conditional probability p̂(X_{i_k} | X_{i_{k+1}}).

While an approximated probability distribution is found in a heuristic greedy manner above, an alternative standard technique is that of steepest ascent, where the exact gradient is estimated by sampling [12]. In detail, let's assume that the probabilistic generation model is determined by a vector of parameters θ, and that the related probability distribution p(X | θ) over generated solutions is differentiable. For simplicity, assume also that f is positive (otherwise make it positive by applying a suitable transformation). The combinatorial optimization problem is now substituted with a continuous optimization problem: determine

    θ* = arg max_θ E(θ),    E(θ) = Σ_X f(X) p(X | θ),

the expected value of f under the distribution. Gradient ascent consists of starting from an initial θ_0 and updating the parameters along the gradient,

    θ_{t+1} = θ_t + ε ∇_θ E(θ_t),    (5.3)

where ε is a suitably small step-size. Next, because

    ∇_θ E = Σ_X f(X) ∇_θ p(X | θ) = Σ_X f(X) p(X | θ) ∇_θ ln p(X | θ) = E[ f(X) ∇_θ ln p(X | θ) ],

the gradient of the expectation can be substituted with an empirical mean over samples extracted from the distribution (stochastic gradient ascent). We leave the detailed gradient calculation to the reader, because it is dependent on the specific model.
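As an illustration of the stochastic gradient ascent just described, the following sketch (not from the book; the independent-Bernoulli model and all parameter values are illustrative assumptions) applies the identity ∇_θ E = E[ f(X) ∇_θ ln p(X | θ) ] to a bit-string model.

```python
import random

def stochastic_gradient_ascent(f, n, samples=200, eps=0.05, iterations=200, seed=0):
    """Stochastic gradient ascent on the parameters of an independent-Bernoulli
    model p(X|theta), using grad E = E[ f(X) * grad ln p(X|theta) ]."""
    rng = random.Random(seed)
    theta = [0.5] * n          # theta[i] = probability that bit i is 1
    for _ in range(iterations):
        grad = [0.0] * n
        for _ in range(samples):
            x = [1 if rng.random() < theta[i] else 0 for i in range(n)]
            fx = f(x)          # f is assumed positive
            for i in range(n):
                # d/dtheta_i ln p(x|theta) = x_i/theta_i - (1-x_i)/(1-theta_i)
                grad[i] += fx * (x[i] / theta[i] - (1 - x[i]) / (1 - theta[i]))
        # empirical mean of the gradient, small step eps; clip away from {0,1}
        # to bound the variance of the score term
        theta = [min(0.9, max(0.1, theta[i] + eps * grad[i] / samples))
                 for i in range(n)]
    return theta

# maximize the (positive) function 1 + number of ones: theta drifts upwards
theta = stochastic_gradient_ascent(lambda x: 1 + sum(x), n=8)
```

The clipping is a practical crutch: near the boundary of the parameter space the score term 1/(1−θ_i) blows up, a well-known variance problem of this estimator.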
If instead of a complete ascent one follows only one step along the (estimated) gradient, one actually recovers a stochastic approximation of the gradient ascent rule, where the expectation is estimated on the current set of samples S_t:

    θ_{t+1} = θ_t + ε Σ_{X ∈ S_t} f(X) ∇_θ ln p(X | θ_t).    (5.4)

5.4 The cross-entropy model

Let us start from an initial distribution p̂_0, which is progressively updated to increase the probability of generating good solutions. If we multiply at each iteration the probability of each point by a quantity proportional to its function value,

    p̂_{t+1}(X) ∝ f(X) p̂_t(X),    (5.5)

we obtain higher probability values for higher function values: when the number of iterations goes to infinity, the probability tends to be different from zero only at the global optima solutions! Unfortunately there is an obstacle: in our model there is no guarantee that the distribution p̂_{t+1} obtained in this manner is going to be a member of the chosen parametric family. Here our cross-entropy measure comes to the rescue: the updated distribution is projected onto the closest member of the parametric family, where closeness is measured by the Kullback-Leibler divergence:

    p_{t+1} = arg min_p D( f p̂_t , p ).    (5.6)

Taking into account that the entropy term of the divergence is constant when minimizing over p, the final distribution update is given by solving the maximization problem

    p_{t+1} = arg max_p Σ_X f(X) p̂_t(X) ln p(X),    (5.7)

or, because the above maximization problem usually cannot be solved exactly over the entire solution space, by a fast sample approximation in which the summation is restricted to the current sample set S_t:

    p_{t+1} = arg max_p Σ_{X ∈ S_t} f(X) p̂_t(X) ln p(X).    (5.8)

What holds for the original f also holds for monotonic transformations of f, i.e., for quality functions which may also depend on memory (so that different functions can be used at different steps). An example is the indicator function I( f(X) < θ ), as used in the MIMIC technique, or a Boltzmann function exp( −f(X)/T ) for minimization problems, where the temperature T can be adapted (large T values tend to smooth differences, leading to a less aggressive search). The cross-entropy (CE) method of [14] [2] builds upon the above ideas and upon [13], which considers an adaptive algorithm for estimating probabilities of rare events in stochastic systems. The Cross-Entropy method therefore appears as a generalization of the SGA technique, allowing for time-dependent quality functions.

The theoretical framework above is surely of interest; unfortunately, many issues have to be specified before obtaining an effective algorithm for a specific problem. In particular, the model has to be sufficiently simple to allow fast computation (for example a fast sample approximation of the embedded optimization tasks) but also sufficiently flexible to accommodate the structural properties of the specific landscapes. In addition, local minima are haunting along the way, implicit in the technique. An intrinsic danger is related to the "intensification" (exploitation) flavor of the approach: one tends to search where previous searches have been successful.
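A concrete instance of the cross-entropy update can be sketched as follows. This is an illustrative sketch under stated assumptions, not the exact method of the cited papers: the quality function is the indicator of being among the elite samples, and the projection onto the independent-Bernoulli family reduces to per-bit frequencies.

```python
import random

def cross_entropy_min(f, n, pop=100, elite_frac=0.2, alpha=0.7, iterations=50, seed=0):
    """Cross-entropy method for bit strings (minimization sketch)."""
    rng = random.Random(seed)
    theta = [0.5] * n
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iterations):
        X = [[1 if rng.random() < theta[i] else 0 for i in range(n)]
             for _ in range(pop)]
        X.sort(key=f)
        elite = X[:n_elite]      # samples below the elite cost threshold
        freq = [sum(x[i] for x in elite) / n_elite for i in range(n)]
        # projection on the Bernoulli family = per-bit elite frequencies;
        # the smoothing factor alpha makes the update less aggressive
        theta = [(1 - alpha) * theta[i] + alpha * freq[i] for i in range(n)]
    return min(X, key=f)

# minimize the number of zeros: the optimum is the all-ones string
best = cross_entropy_min(lambda x: x.count(0), n=20)
```

The smoothing factor plays the same "temperature-like" role discussed above: values close to 1 intensify quickly, smaller values keep the search more exploratory.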
The purpose is to adapt the c information about the a priori rs population-based probabilistic al components (or more complex ting a promising region and then sing solution components in a re- ns, although the most successful red about the quality of solutions xtracted. The bigger the candidate ng from the just constructed solu- ons in a probabilistic manner, and truction happens via a model which ions) can influence the choice of the ing different values for ed by either stochastically breaking idate list can be self-tuned, or statis- is defined as the fraction of elements struction. In spite of the name, there obtained when using α i a (2005), 19–67. 134 , Advances in Neural Information Processing Systems Removing the genetics from the standard genetic algorithm is chosen at each construction with probability α , Annals of Operations Research Tech. report, School of95-141. Computer Science, Carnegie Mellon U method estimating probability densities (Michael C. Mozer, Michael1997, I. p. Jordan, 424. and Thomas Petsche, ed In spite of the practical challenges, some approaches based The Greedy Randomized Adaptive Search (GRASP) framework [5 Estimation of Distributions (EDA) [8] algorithms have been [1] S. Baluja and R. Caruana, [2] P. de Boer, D. Kroese, S. Mannor, and R. Rubinstein, [3] Jeremy S. de Bonet, Charles L. Isbell Jr., and Paul Viola, of GRASP constructions the average solution value and the probabilities are(of updated course so scaled so thatfrom that they the additional they become diversification sum propor implicit up when to consider one). The power of reac considered. A valueacts of on the probabilities: at the beginning they are uniform that are considered in the restricted candidate list; a set o Unfortunately, a newexploration gold too much, mine seeis will also not Fig. never on 5.1. be revisitingmoving Contrary old found on to good to if gold solutions explore min the but uncharted territory. 
on first rapidly exploi behavior [4] are based onis similar influenced principles: by solution cons theACO quality applications of are previouslybased in generated features, solutio fact local combinationsdesirability search, of of and the the different problem basic solution specific techn components. heuristi of randomized greedy constructionstion. and local Different search starting starti ties points for during local the search selection are of obtain the next solution component or b BIBLIOGRAPHY putting at each greedy iterationstricted a candidate list, fraction from of which thelist, the most the winner promi larger is randomly the e is exploration actually characteristics of no the adaptationself-tuning con possibilities. to previous For example runstics the in size about of GRASP, relationships the but cand structural between the properties final o learned quality fromnext the and previous solution individu construct component toinvestigations add, about going Reactive-GRASP towardssize are the of presented model-bas the in candidategenerated [10, with list different by values. considering In information detail, gathe a parameter reactive adaptation of the probabilities. of evolutionary computation foruse modeling it promising to soluti produce thesearch next generation. algorithms A survey baseddistribution in [9] and on conside using modeling the model promising to solutions guide the by exploration of Bibliography Technical Report DIT-07-049, Universit`adi Trento, July 2007. Copyright (C) 2007 Roberto Battiti, Mauro Brunato and Franco Mascia, all rights reserved. , , 22 Re- ons ¨ack, (2000), , DIALM , Journal 12 , European s optimization , Theor. Com- (2002), no. 1, 5– , Proceedings of the 21 SA), ACM Press, 2001, (1999), no. 2, 127–190. , Ann. Math. Stat. 1 ete algorithms and meth- , 1991. st, Massachusetts), Morgan obile phone networks nd Mauricio G. C. Resende, Model-based search for combinato- CHAPTER 5. 
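The Reactive-GRASP probability adaptation described above can be sketched as follows (an illustrative sketch after the scheme of [10], not its actual code: `construct(alpha, rng)` is a hypothetical user-supplied randomized greedy construction returning a solution value to be maximized).

```python
import random

def reactive_grasp(construct, alphas, iterations=200, seed=0):
    """Adapt the selection probabilities of the alpha values so that they
    become proportional to the average solution value each alpha produces."""
    rng = random.Random(seed)
    m = len(alphas)
    prob = [1.0 / m] * m                 # start with uniform probabilities
    sums, counts = [0.0] * m, [0] * m
    best = float("-inf")
    for it in range(1, iterations + 1):
        i = rng.choices(range(m), weights=prob)[0]
        value = construct(alphas[i], rng)
        sums[i] += value
        counts[i] += 1
        best = max(best, value)
        if it % 50 == 0:                 # periodically re-adapt the probabilities
            avg = [sums[j] / counts[j] if counts[j] else 1.0 for j in range(m)]
            total = sum(avg)
            prob = [a / total for a in avg]   # proportional to average value
    return best, prob

# toy construction: larger alpha (more randomness) hurts this fake objective
best, prob = reactive_grasp(lambda a, rng: 10 - 5 * a + rng.random(),
                            alphas=[0.1, 0.5, 0.9])
```

With the toy objective above, the probability mass drifts towards the smallest α, while the residual probability on the other values preserves some diversification.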
MODEL-BASED SEARCH A survey of optimization by building and using Introduction to the theory of neural computation (1997), 89–112. , INFORMS JOURNAL ON COMPUTING Ant colony optimization theory: a survey 99 Greedy randomized adaptive search procedures Reactive grasp: An application to a matrix decomposition A stochastic approximation method From recombination of genes to the estimation of distributi (1995), 109–133. 6 , Parallel Problem Solving from NaturePPSN IV (A. Eiben, T. B , Computational Optimization and Applications , pp. 373–395. Optimization of computer simulation models with rare event Combining instance-based and model-based learning (2005), no. 2-3, 243–278. The cross-entropy method for combinatorial and continuous 344 , rial optimization METHODOLOGY AND COMPUTING IN APPLIED PROBABILITY put. Sci. of Global Optimization active grasp with path relinking’01: for channel Proceedings assignment in ofods m the for mobile 5th computing internationalpp. and 60–67. workshop communications on (New York, Discr NY, U M. Shoenauer, and H. Schwefel, eds.), 1996, p. 178187. Addison-Wesley Publishing Company, Inc., Redwood City, CA i. binary parameters Tenth International Conference on MachineKaufmann, Learning 1993, (Amher pp. 236–243. (1951), 400–407. probabilistic models 20. no. 3, 164–176. Journal of Operations Research problem in tdma traffic assignment [4] Marco Dorigo and Christian Blum, [5] T.A. Feo and M.G.C. Resende, [6] Fernando C. Gomes, Panos Pardalos, Carlos S. Oliveira, a [7] J.A. Hertz, A. Krogh, and R.G. Palmer, [8] H. and M¨uhlenbein G. Paa, [9] M. Pelikan, D.E. Goldberg, and F. Lobo, [15] M. Zlochin, M. Birattari, N. Meuleau, and M. Dorigo, 52 [12] H. Robbins and S. Monro, [13] R. Rubinstein, [10] M. Prais and C. C. Ribeiro, [11] J. R. Quinlan, [14] Technical Report DIT-07-049, Universit`adi Trento, July 2007. Copyright (C) 2007 Roberto Battiti, Mauro Brunato and Franco Mascia, all rights reserved. . . 
i ) x n ) t ) is ( X ( search f , i.e., − LIP ) ) t ( MAX-TRIES plateau . (ad. in the web) µ X ( , selects a variable f ). At each iteration p local search, each one + tisfied clauses, is often a ber of tries ( ed local minimum, so that ber of unsatisfied clauses. pped repeatedly in order to int a seminal paper in this examples see [11], [27], and by the function F qual, getting different values ntuition, see also Fig. 6.1, one s of LS MAX-FLIPS hat satisfies all clauses. ivated by previous applications he end of the run. In addition, ve function in order to support eration process to intensify the um is reached a y along the plateau it would be trainees; they are great upper body, e search trajectory will be gently ty over such variables, the other y to to the model-based search ot shown. In particular, the best function so that previous promis- e identification of other attraction roposals, method based on clause atisfiability problem is of particular basins. rtional to the problem dimension ness techniques have been proposed ins are added to GSAT in [25, 26]. In ligence, see for example [10] and [18]. given by the number of satisfied clauses. f for short) is defined as f ∆ 53 (or f µ ∆ . One criterion, active with “noise” probability ) i x , the quantity low-cost exercise. Here’s the proper way to do them anywhere − µ (1 The algorithm is briefly summarized in Fig. 6.2. A certain num This chapter considers reactive modification of the objecti As with many algorithmic principles, it is difficult to pinpo The influential algorithm GSAT [27] consists of multiple run In local search for SAT [27] methods (GSAT) single bits are fli How to Do a Proper Push-Up. Push-ups aren’t just for buff army a variable is chosen by two possible criteria and then flipped becomes equal to Different “noise” strategies toparticular, escape the from GSAT-with-walk attraction bas algorithm. 
executed, where each try consists of a number of iterations ( occurring in some unsatisfiedone clause is with the uniform standard probabili method based on the function For a generic move phase, when bitstax are to flipped pay withoutInstead before changing of reaching the executingbetter an number attraction to approximate of basins “give random-walk sa some withbasins trajector tilt” in a to the the lower plateau neighborhood.in num surface the If neighborhood to is all fasten facilitated, clauses th and are soon not after the born first to p be e Reacting on the objective function Chapter 6 appropriate diversification oftechniques, the here the search focussearch process. is in promising not regions, on Contrar buting modeling on areas a modifying in the solution-gen the objective pushed solution to space visit appear new lessmay portions favorable, think of and the about th searchthe pushing space. search up To trajectory the help will search the flow i into landscape neighboring at attraction area. a The discover literature aboutinterest. stochastic local Different search variations for of thefor local S search SAT with and random MAX-SATthe updated starting review from of theof [14]. “min-conflicts” late These heuristics eighties, techniques in were for the in area some part of mot Artificial Intel The straightforward book-keeping partassignment of found the during algorithm allthe is trials run n is is terminated saved immediately and if reported an at assignmentincrease t is the found number t of satisfied clauses. After a local minim consisting of a number of iterations that is typically propo Technical Report DIT-07-049, Universit`adi Trento, July 2007. Copyright (C) 2007 Roberto Battiti, Mauro Brunato and Franco Mascia, all rights reserved. atisfied after a omes not a simple ights increments which ncreased, after some time eights. New parameters are attraction basin. 
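The GSAT-with-walk scheme just described can be made concrete as follows. This is an illustrative sketch of the pseudocode, not the authors' implementation; the clause encoding is an assumption for the example.

```python
import random

def gsat_with_walk(clauses, n_vars, max_tries=50, max_flips=1000, noise=0.5, seed=0):
    """GSAT-with-walk sketch. Clauses are lists of non-zero literals:
    +i means variable i true, -i means variable i false."""
    rng = random.Random(seed)

    def n_sat(assign):
        return sum(any((lit > 0) == assign[abs(lit)] for lit in c) for c in clauses)

    for _ in range(max_tries):
        assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
        for _ in range(max_flips):
            if n_sat(assign) == len(clauses):
                return assign                       # all clauses satisfied
            unsat = [c for c in clauses
                     if not any((lit > 0) == assign[abs(lit)] for lit in c)]
            if rng.random() < noise:
                # walk move: random variable from a random unsatisfied clause
                var = abs(rng.choice(rng.choice(unsat)))
            else:
                # standard GSAT move: flip the variable maximizing f
                def gain(v):
                    assign[v] = not assign[v]
                    s = n_sat(assign)
                    assign[v] = not assign[v]
                    return s
                var = max(range(1, n_vars + 1), key=gain)
            assign[var] = not assign[var]
    return None                                     # no satisfying assignment found

# tiny instance: (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
model = gsat_with_walk([[1, 2], [-1, 3], [-2, -3]], n_vars=3)
```

The book-keeping (best assignment across tries) is omitted here for brevity, exactly as it is omitted in Fig. 6.2.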
More algorithm based on the ase the effectiveness of GSAT te as one does not wait until lgorithm a positive weight is dified during problem solving learning how difficult a clause a repulsion from a given local hould be counted when deter- f heir weight is increased so that e weights are updated after each niques are proposed in [8], either ∆ o use weights to encourage more n of the search space where the eflect overall statistics of the SAT s at arameters based on feedback from tly push the solution out of a given generates random numbers in the a while the search proceeds. Clause–   new local minimum c ANDOM then dynamic penalty

As another example, when surfing the web the response time to download a page can vary a lot. Again, it is intuitive that clicking again on the same link can save the user from an "endless" waiting time.

7.2 Predicting the performance of a portfolio from its component algorithms

To make the above intuitive arguments precise, consider a portfolio of two algorithms A and B. Let T_A and T_B be the random variables associated with their termination times when the whole CPU time is dedicated to each of them, with survival functions

    S_A(t) = Pr(T_A > t),    S_B(t) = Pr(T_B > t);

F_A(t) and F_B(t) are the corresponding cumulative distribution functions. If the fraction of CPU time dedicated to A is α, the fraction dedicated to B is 1 − α. Assuming that the two processes run in a time-sharing fashion with no process-swapping overhead, we can model the new system with two slowed-down processes A' and B' whose survival functions are, respectively,

    S_{A'}(t) = S_A(αt),    S_{B'}(t) = S_B((1 − α) t).

As soon as one of the two processes terminates, the other one is killed: the completion time of the two-process portfolio system is therefore described by the random variable T = min(T_{A'}, T_{B'}).

Figure 7.1: A sequential portfolio strategy.
The survival function of the portfolio is

    S(t) = Pr(T > t) = Pr( T_{A'} > t ∧ T_{B'} > t ) = Pr( T_A > αt ) Pr( T_B > (1 − α) t ) = S_A(αt) S_B((1 − α) t),

where the last equality holds because the two runs are independent. The probability distribution of T can be obtained by differentiation,

    p(t) = − ∂S(t)/∂t,

and from it the expected completion time E(T) and the variance Var(T) can be calculated. By turning the α knob, therefore, a series of possible combinations of expected completion time and standard deviation becomes available. Fig. 7.2 illustrates an interesting case: algorithm A has a fairly low average completion time, but suffers from a large standard deviation (because its distribution is bimodal or heavy-tailed), while algorithm B has a higher expected completion time, but with the advantage of a lower risk of a very long computation.
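The effect of the α knob can be computed directly from samples of the two run-time distributions. The following is an illustrative sketch (hypothetical helper, toy data): with fraction α of the CPU given to A, wall-clock run times stretch to T_A/α and T_B/(1 − α), and the portfolio finishes at the minimum of the two.

```python
import math

def portfolio_stats(samples_a, samples_b, alpha):
    """Expected completion time and risk (standard deviation) of the
    two-algorithm time-sharing portfolio, from empirical run-time samples."""
    times = [min(ta / alpha, tb / (1 - alpha))
             for ta in samples_a for tb in samples_b]   # independent pairs
    mean = sum(times) / len(times)
    var = sum((t - mean) ** 2 for t in times) / len(times)
    return mean, math.sqrt(var)

# A is bimodal (usually fast, sometimes very slow), B slower but predictable
A = [1.0] * 9 + [100.0]
B = [8.0] * 10
for alpha in (0.05, 0.5, 0.95):
    mean, risk = portfolio_stats(A, B, alpha)
```

With these toy samples, α = 0.5 gives an expected time of 3.4, better than either pure strategy (about 10.9 for A alone, 8 for B alone): the reliable B acts as insurance against A's rare very long runs.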


By combining the two algorithms as described above, we obtain a parametric family of completion-time distributions: for each value of α, the expected completion time E(T) and the risk (standard deviation) identify one point in the plane of Fig. 7.2. Some of the obtained combinations are dominated (there are other parameter values that yield both a lower mean time and a lower risk) and can be eliminated from consideration in favor of the better alternatives, while the choice among the non-dominated possibilities (on the efficient frontier, shown in black dots in the figure) has to be specified depending on the user preferences between a lower expected time and a lower risk. The choice along the Pareto frontier is similar to the one faced when investing in the stock market: while some choices are obviously discardable, there are no free meals, and a higher expected return comes with a higher risk.

Figure 7.2: Expected time E(T) versus standard deviation (risk) of the completion time for different values of α; the efficient frontier contains the set of non-dominated configurations (a.k.a. Pareto-optimal or extremal points).

7.2.1 Parallel processing

Let us consider a different context [6] and assume that n equal processors and two algorithms are available, so that one has to decide how many copies n_1 and n_2 of each algorithm to run, with n_1 + n_2 = n, as illustrated in Fig. 7.3. Of course, no processor should remain idle. As in the previous case, as soon as the first process terminates, all the other ones are killed. The different runs are independent, and therefore the survival functions are multiplied:

    S(t) = S_1(t)^{n_1} S_2(t)^{n_2}.    (7.2)

If n_1 = n, all processors are assigned to the same algorithm, and this leads to the pure strategy S(t) = S_1(t)^n.

Consider time as a discrete variable (clock ticks or fractions of a second), let T_i be the discrete random variable associated with the termination time of algorithm i, p_i(t) the probability that a process running algorithm i halts precisely at time t, and define the corresponding cumulative probability and survival functions:

    F_i(t) = Σ_{τ=0}^{t} p_i(τ),    S_i(t) = 1 − F_i(t).

To calculate the probability p(t) that the portfolio terminates exactly at time t, we must sum the probabilities of different events: the event that one processor terminates at time t while all the other ones take more than t, the event that two processors terminate at t while the other ones take more than t, and so on. The computation is easier on the basis of the survival function:

    p(t) = S(t − 1) − S(t).    (7.3)

Figure 7.3: A portfolio strategy on a parallel machine.
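The discrete-time computation above is easily turned into code. This is a minimal sketch (hypothetical helper names; geometric run-time distributions are assumed only for the usage example):

```python
def parallel_portfolio(p1, p2, n1, n2, horizon):
    """Discrete-time termination distribution of a parallel portfolio running
    n1 independent copies of algorithm 1 and n2 of algorithm 2.
    p1, p2 map t -> probability of a single run halting exactly at tick t."""
    def survival(p, t):
        return 1.0 - sum(p(tau) for tau in range(t + 1))   # S_i(t) = 1 - F_i(t)

    def S(t):
        # independent runs: the survival functions multiply (eq. 7.2)
        return survival(p1, t) ** n1 * survival(p2, t) ** n2

    # probability that the portfolio halts exactly at t: S(t-1) - S(t) (eq. 7.3)
    return [(S(t - 1) if t > 0 else 1.0) - S(t) for t in range(horizon)]

# geometric run times: algorithm 1 halts each tick w.p. 0.1, algorithm 2 w.p. 0.05
g1 = lambda t: 0.1 * 0.9 ** t
g2 = lambda t: 0.05 * 0.95 ** t
dist = parallel_portfolio(g1, g2, n1=2, n2=2, horizon=200)
```

Working through the survival function rather than enumerating the "which processors stopped at t" events keeps the code (and the math) simple.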
) ation has to be modified to take e more conservative best-bound be good on the cases which are t en training and usage, exploiting ( ent has a relatively high probabil- ils of form, Pareto-L´evy namely: hoice executed depending on the off-line: a preliminary exhaustive 2 cally changed in an online manner p cal heavy-tailed behavior of which often quickly reaches a solu-  particularly effective when negatively 2 2 i n α  − 1 i − Cx 1 n . In branch-and-bound applications [6] one ≈ when more information is received and use it ) t ) ) ( successes at time t 1 ( i i S p 1 i ) t X > x ( ( is solved. The portion of CPU time 1 vice versa P p k , and the goal is to minimize the solution time of the b  k 1 1 b i n . A preliminary suggestion of dynamic online strategies is 1  f 1 2 1 . n n ≥ i being non-negative integers). ≤ ≤ 2 τ are unknown, or if they are only partially known, one has to i 1 2 2 i i i X + can then be used to calculate the mean and variance of the term ) ≤ ≤ 1 t i 0 0 ) ( t i ( p and p is a constant. )= 1 t i ( C ( p in the portfolio with a heuristic function which is decreasi i i = and A 2 2 i + and then refine the estimate of 1 i 1 f <α< 0 A “life-long learning” approach for dynamic algorithm port Portfolios can also be applied to component routines inside If the distributions Experiments with the portfolio approach [8, 6] show that in s estimated termination times each algorithm where general approach of ”droppingthe the mapping artificial boundary during betwe termed training, and Adaptive including Onlineinter-problem training Time AOTA time Allocation framework, in see [1], Fig. is 7.4, fully a in set of the algorit r present in [8]. strategies. In ation suitable can portfolio, be a preferableobtain depth-first to a strategy a first breadth solution. 
first strategy with lowerto e determine an acceptable move in a local-search based stra 7.3 Reactive portfolios The assumption ingorithms the are above known analysiscalculated, beforehand, is the so that efficient that the frontierrisk-aversion the statistical nature determined expected of and time the thestudy and user. of final the c The components strategy precedes is the therefore portfolio definition. finds that ensembles of “risky” strategies can outperform th ing” of strategies can beity beneficial of provided finding that a one compon solutioncorrelated fairly strategies quickly. are Portfolios aremore combined: also difficult one for algorithm the tends other to one, and resort to reactive strategies, where the strategy is dynami when more information isFor obtained example, about one the may task(s) derive being a so maximum-likelihood estimate o Similar although more complicatedplete knowledge formulas about hold for more al is updated after each new instance a sequence of problem instances nation times. Portfolios can be effective to “cure” the typi to define subsequent values of such that many complete search methods,expect, where in some very cases longtions leading runs are to characterized occur infinite by mor mean a or power-law infinite decay, vari also called ta value of When two algorithms areinto considered, account the the different probability ways comput to distribute 7.3. REACTIVE PORTFOLIOS whole set of instances. The model used to predict the termina Technical Report DIT-07-049, Universit`adi Trento, July 2007. Copyright (C) 2007 Roberto Battiti, Mauro Brunato and Franco Mascia, all rights reserved. i A (7.5) (7.6) is used to tance τ = [13]: k τ if no solution is τ ) heavy-tailed. k n of algorithm ( ased on history . The distribution of completion time τ τ P not is the random variable . 1 ) − T τ i k ) A τ can be derived by observing istribution of solution times. 
Figure 7.4 reports the pseudo-code of the scheme. The main quantities are: the candidate algorithms A_1, ..., A_n (input); the sequence of problem instances b_1, ..., b_k (input); a function estimating the expected completion times and a function deciding the time slices according to those estimates (input); the history, a collection of data about the execution and the status of each process (local); the expected remaining times to completion t_i of the current run of each algorithm (local); and the fractions α_i of CPU time dedicated to each algorithm (local). The loop is:

1. for each problem instance b:
2.     initialize the history;
3.     repeat
4.         update the estimated termination times t_1, ..., t_n from the model and the history;
5.         update the fractions α_1, ..., α_n accordingly;
6.         run each algorithm A_i for a slot of CPU time proportional to α_i;
7.         update the history;
8.     until b is solved;
9.     update the model, considering also the complete history of the last solved instance.

Figure 7.4: The inter-problem AOTA framework.

7.4 Defining an optimal restart time

In general, a restart strategy consists of executing a sequence of runs of a randomized algorithm to solve a given instance, stopping each run after a time τ if no solution is found, and then starting an independent run of the same algorithm with a different random seed. Restarting at time τ is beneficial if the expected time to convergence of a fresh run is less than the expected additional time to converge given that the current run is still alive at time τ:

    E[T] < E[T − τ | T > τ].

Whether restart is beneficial or not depends, of course, on the distribution of the solution times. As a trivial example, if the distribution is exponential, restarting does not modify the expected solution time. If the distribution is heavy-tailed, restart easily cures the problem. For example, heavy tails can be encountered if a stochastic local search algorithm like simulated annealing is trapped in the attraction basin around a local minimizer: although eventually the optimal solution will be visited, an enormous number of iterations can be spent in the basin before escaping. Restart is a method of choice to escape local minima!

The optimal restart strategy is uniform, i.e., one in which a constant cutoff τ is used [13]. If the algorithm is always restarted at time τ, each run corresponds to a Bernoulli trial which succeeds with probability F(τ) = Pr(T ≤ τ), and the number of runs executed before a success follows a geometric distribution with parameter F(τ). Let T_τ be the random variable associated with the termination time of the restart strategy. The tail now decays in an exponential manner, i.e., the restart strategy is not heavy-tailed: its survival function is

    S_τ(t) = Pr( T_τ > t ) = (1 − F(τ))^{⌊t/τ⌋} Pr( T > t mod τ ).
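The exponential tail decay produced by restarts can be checked numerically. The sketch below (illustrative, with an assumed single-run survival function) evaluates the formula for S_τ(t) against a heavy-tailed run-time distribution with infinite mean:

```python
def restart_survival(S, tau, t):
    """Survival function of the uniform restart strategy with cutoff tau;
    S is the single-run survival function Pr(T > t).
    S_tau(t) = (1 - F(tau))**floor(t/tau) * Pr(T > t mod tau)."""
    full_runs, remainder = divmod(t, tau)
    return S(tau) ** full_runs * S(remainder)

# heavy-tailed single run: Pr(T > t) = (1 + t) ** -0.5 (infinite mean)
S = lambda t: (1.0 + t) ** -0.5
# with restarts every tau = 10 ticks the tail decays geometrically
tail_no_restart = S(1000)        # about 3e-2
tail_restart = restart_survival(S, 10, 1000)
```

With τ = 10, the probability of still running at t = 1000 drops from a few percent to S(10)^100, an astronomically small number: the power-law tail has become geometric.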
F ) − τ τ t t τ ( ( or later, so that the run is terminated pre- t<τ d τ ( ( )= ′ d 0 ) (1 F F τ θ t F P ) Z of the obtained sequence of termination times τ | ( i θ tF ′ τ | t − L − 0 + ( τ Z t tF ( τ τ L g d τ + )) = 0 ) k t =1 t t ∞ Y i ( Z c )= ( )= t d τ τ Z ) tp tF T t T ( ( )= ( ( τ t ′ θ 0 d E E )= d Z θ tF T | | ( c τ t 0 L ( Z c , this is equal to: , each run succeeds with probability ) L τ achieves a performance within a logarithmic factor of the ex t ( ′ ) F : ,... θ 8 )= , t ( 4 p , approach can be used. Censored sampling allows to bound the d 2 , given by maximizing the likelihood 1 ) , g k 1 , is the conditional cumulative distribution function corre 4 , is the cumulative distribution function of the run-time ) being the parameter to be identified from the experiments. Wi 2 θ ) | , ,...,t θ t τ 1 2 ( ( , , ) ,t G 1 F θ 1 , | t t 2 ( , If the distribution is known, an optimal cutoff time can be de In the discrete case: Calculating the run-time distribution can require large am g 1 = ( , The result follows from the fact that: With censoring, some experimental runs will exceed the cuto bound each run [10]. In this case, the expected value of the to 7.4. DEFINING AN OPTIMAL RESTART TIME Consider the cases when termination is within length of each run is: maturely. Because where where If the distribution(1 is not known, a universal non-uniform st and therefore: T is simple. For a given cutoff corresponding multiplicative term in (7.9) is substituted of run-times of the successful run, and all previous unsucce the mean number of trials before a successful run is encounte giving (7.7). the algorithm, i.e., the probability that the problem is sol can determine pected run-time of the optimal policy, see [10]heavy for tails details. becausecensored one sampling has toof wait for each the experimentalconverge termination run before of the and ver censoringas still threshold exploit [11]. 
the Let information us model o t Technical Report DIT-07-049, Universit`adi Trento, July 2007. Copyright (C) 2007 Roberto Battiti, Mauro Brunato and Franco Mascia, all rights reserved. L + (the steps O during A L steps, the and i any instance steps accuracy into +1 single instance experiments in L O k multiple instance and the expected ) n model (for example . An upper bound O )) p O y assuming that runs − p L − a solver during the initial p ( he model accuracy ich can be used is given by e, in the L become available. Real-time p A ( metric model for the run-time efined: in the + ge in a total of A er. To be more precise, the dis- has to solve either n evidence available during the d. Given the results mentioned or the complete knowledge case nt and logarithmic factor which O ncreasing as soon as more data + od of (7.10), iii) use the estimated wise run up to a total of ime starting from both structural e. Some examples of parametric p del by maximizing the likelihood, s of the dynamic policy w.r.t. the ed target is reached. onger or shorter than the median. . A way to determine it is to ask O ce and the state of the solver c p t interesting case is between the two t ( for a given amount of time allocated / . Let us now consider more advanced . The estimate can be now minimized by ) )=1 R , so that the prediction can be used by dy- ( n ( ub E E , and an upper bound on the expected time to ) n )) ( independent L are used, only a bound and not the exact expected E p is the probability of a run ending within O i − p steps, while runs terminating between )(1 O A . An upper bound on the length of a single run is therefore ) − L p − + (1 L steps (observation horizon) )(1 as many instances as possible O A Ap CHAPTER 7. ALGORITHM PORTFOLIOS AND RESTART STRATEGIES )( steps. The probability of continuation, taking the limited − predict the length of a run and that the runs are O L − steps take exactly + (1 problem), see [7] for details. 
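Formula (7.8) can be minimized numerically over a grid of candidate cutoffs. The following is a sketch under assumptions (a toy bimodal distribution and a crude Riemann-sum integration), not a general-purpose optimizer:

```python
def expected_restart_time(F, tau, dt=0.01):
    """E[T_tau] = (tau - integral_0^tau F(t) dt) / F(tau)  (eq. 7.8),
    with the integral approximated by a left Riemann sum."""
    integral = sum(F(k * dt) * dt for k in range(int(tau / dt)))
    return (tau - integral) / F(tau)

# single run: with prob 0.5 finish uniformly in [0, 1], else uniformly in [0, 100]
F = lambda t: 0.5 * min(t, 1.0) + 0.5 * min(t / 100.0, 1.0)
best_tau = min((expected_restart_time(F, tau), tau)
               for tau in [0.5, 1, 2, 5, 10, 50, 100])[1]
```

For this bimodal distribution the mean single-run time is about 25, while cutting off at τ = 1 (the end of the "fast" mode) brings the expected total time below 1.5: exactly the "cure" for long runs discussed above.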
L L O + ( on the fraction of terminated runs (uncensored samples), ru Ap and the observation horizon. In spite of the crudeness of the u O L permit better results. ) = before restarting. R length of a run ( In [7, 9] it is shown how to use features capturing the state of One has to decide about a proper cutoff threshold The final receipt is therefore: i) choose an appropriate para 1. observe a run for 3. if the prediction is negative restart immediately, other 2. if the run is not terminated predict whether it will conver ub max instances for target strategies where at leastin the one previous of section, theseand it assumptions the looks is zero as relaxe knowledgecan if be case large the (within for problem a practicalsituations, is applications). multiplicative when solved Actually, consta a f the partialduring mos a knowledge run is or availableobservations during about which a the is sequence characteristics i ofa of a run runs specific on instan related instances phase of the runnamic to restart policies. Bayesianevidence models available to predict at the therun run (in beginning t a of “reactive” thecrimination manner) is run, are between trained and in “long” executio The a and dynamic supervised “short” policy mann runs, considered i.e., in runs [7] l is as follows: account, is varying solve a problem with the above policy is steps take exactly probability of convergence during a single run is therefore E on the expectedending number within of steps in a single run can be derived b 70 number of runs until a solution is found is parallel (or with interleaving) and stop as soondistribution, as ii) the desir determinewhere the some “best” terms parameters are substituted ofrun-time with the distribution the mo censored to likeliho models determine are the considered optimal in [3]. 
restart tim 7.5 Reactive restarts Up to now thethe assumption has been that the only observation wh as soon as possible,( or Because the model is not perfect, an important parameter is t no observations during the steps after static one are demonstrated. Three different contexts are d number of steps is minimized) significantly superior result probability of a correct prediction). If context one draws cases from a distribution of instances and context one has to solve a specific instance as soon as possibl Technical Report DIT-07-049, Universit`adi Trento, July 2007. Copyright (C) 2007 Roberto Battiti, Mauro Brunato and Franco Mascia, all rights reserved. of 71 out 5 . 0 (2000), Dynamic 24 A bayesian , CP 2006 - Their Appli- (2001), no. 1-2, e (Menlo Park, CA, rategies 126 , ed.), vol. 2, Springer, . Chickering, . For example, indepen- Lauderdale, Florida), Jan context where one among nd Bart Selman, Heavy-tailed phenomena in at the beginning from one of e user - and a new sample is mpare this with the situation g the optimal restart policy is and the case when other pre- ndia, 2007, in press. ce of Constraint Programming mic programming, considering An economics approach to hard uns. Here restarting is clearly ider two different distributions 02, pp. 674–681. of instances which are solved in pp. 235–244. we know that 100 iterations are e obtained during the run about AT). The task is to find the opti- , Artif. Intell. r two distributions, one consisting , Seventeenth Conference on Uncer- , J. of Automated Reasoning of converging at iteration 10, probability 5 (1997), 51–54. . 0 , Proceedings AI and MATH ’06, Ninth International Algorithm portfolios , IJCAI 2007 - Twentieth International Joint Confer- 275 A neural network model for inter-problem adaptive online but now, after each unsuccessful run, the solver’s belief ab , Science ) ,... 
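The observe-predict-restart policy and its bound (L_O + L_A)/(Ap) can be checked with a small Monte Carlo sketch. The run-length distribution and the parameter values below are invented for illustration; they are not from [7].

```python
import random

def dynamic_restart(sample_len, A, L_O, L_A, rng):
    """Simulate the observe/predict/restart policy with predictor accuracy A.
    Returns the total number of steps until the problem is solved."""
    total = 0
    while True:
        length = sample_len(rng)          # hidden length of this run
        if length <= L_O:                 # solved during the observation horizon
            return total + length
        converges = length <= L_O + L_A   # ground truth for the predictor
        predicted = converges if rng.random() < A else not converges
        if not predicted:
            total += L_O                  # restart right after the horizon
        elif converges:
            return total + length
        else:
            total += L_O + L_A            # use up the allowance, then restart

def sample_len(rng):                      # toy run-length distribution
    u = rng.random()
    return 5 if u < 0.3 else (50 if u < 0.5 else 5000)

rng = random.Random(42)
L_O, L_A, A = 10, 100, 0.9
p = 0.5                                   # P(length <= L_O + L_A) for sample_len
bound = (L_O + L_A) / (A * p)             # about 244 steps
mean = sum(dynamic_restart(sample_len, A, L_O, L_A, rng)
           for _ in range(2000)) / 2000
```

The empirical mean stays well below the bound, because the bound charges every run the full L_O + L_A steps.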
The assumption of independence among runs is relaxed in [12]. For example, independence is not valid if more runs are executed on the same instance, picked at the beginning from one of several probability distributions. As an example, consider two different distributions of instances, one consisting of instances which are solved in 10 iterations, the other one of instances which are solved in 100 iterations. If an instance is not solved in 10 iterations, we know that 100 iterations are needed, and restarting would only waste computing cycles. Compare this with the situation of a single distribution with probability 0.5 of converging at iteration 10 and probability 0.5 of converging at iteration 1000: with independence among the runs, restarting is clearly useful, as shown in Section 7.4. The work in [12] considers a context where the RTD (run-time distribution) is picked at the beginning from one of several possibilities, corresponding for example to satisfiable or unsatisfiable instances (e.g., of SAT), without informing the user; a new sample is then extracted at each run, but, after each unsuccessful run, the solver's belief about the source distribution can be updated. The task is to find the optimal restart policy. The problem is formulated as a Markov decision process and solved with dynamic programming, considering both the case in which only the termination time is observed and the case when other predictors of the distribution can be used (for example, evidence obtained during the run about the fact that a SAT instance is or is not satisfiable).

7.6 Summary

Bibliography

[1] M. Gagliolo and J. Schmidhuber, A neural network model for inter-problem adaptive online time allocation, Proceedings Artificial Neural Networks: Formal Models and Their Applications - ICANN 2005, 15th Int. Conf. (Warsaw) (W. Duch et al., ed.), vol. 2, Springer, Berlin, 2005, pp. 7-12.

[2] M. Gagliolo and J. Schmidhuber, Dynamic algorithm portfolios, Proceedings AI and MATH '06, Ninth International Symposium on Artificial Intelligence and Mathematics (Fort Lauderdale, Florida), Jan 2006.

[3] M. Gagliolo and J. Schmidhuber, Impact of censored sampling on the performance of restart strategies, CP 2006 - Twelfth International Conference on Principles and Practice of Constraint Programming (Nantes, France), Springer, Berlin, Sep 2006, pp. 167-181.

[4] M. Gagliolo and J. Schmidhuber, Learning restart strategies, IJCAI 2007 - Twentieth International Joint Conference on Artificial Intelligence, January 6-12, Hyderabad, India, 2007, in press.

[5] Carla Gomes, Bart Selman, Nuno Crato, and Henry Kautz, Heavy-tailed phenomena in satisfiability and constraint satisfaction problems, J. of Automated Reasoning 24 (2000), no. 1/2, 67-100.

[6] Carla P. Gomes and Bart Selman, Algorithm portfolios, Artif. Intell. 126 (2001), no. 1-2, 43-62.

[7] E. Horvitz, Y. Ruan, C. Gomes, H. Kautz, B. Selman, and D. M. Chickering, A Bayesian approach to tackling hard computational problems, Seventeenth Conference on Uncertainty in Artificial Intelligence (Seattle, USA), Aug 2001, pp. 235-244.

[8] Bernardo A. Huberman, Rajan M. Lukose, and Tad Hogg, An economics approach to hard computational problems, Science 275 (1997), 51-54.

[9] Henry Kautz, Eric Horvitz, Yongshao Ruan, Carla Gomes, and Bart Selman, Dynamic restart policies, Eighteenth National Conference on Artificial Intelligence (Menlo Park, CA, USA), American Association for Artificial Intelligence, 2002, pp. 674-681.
[10] M. Luby, A. Sinclair, and D. Zuckerman, Optimal speedup of Las Vegas algorithms, Information Processing Letters 47 (1993), no. 4, 173-180.

[11] W. Nelson, Applied life data analysis, John Wiley, New York, 1982.

[12] Y. Ruan, E. Horvitz, and H. Kautz, Restart policies with dependence among runs: A dynamic programming approach, 2002.

[13] A. van Moorsel and K. Wolter, Analysis and algorithms for restart, 2004.

Chapter 8

Racing

The smart player goes with the winners.

8.1 Introduction

Portfolios and restarts are simple ways to combine more algorithms, or more runs of a given randomized strategy, for solving a specific instance: one aims either at a lower expected convergence time, or at a lower risk (variance), or at both. We have already seen that more advanced reactive strategies can be obtained by using a learning loop while the portfolio or restart scheme runs: in this way some of the portfolio parameters or the restart threshold can take fresh information into account. A related strategy, using a "life-long" learning loop to optimize the allocation of time among a set of alternative algorithms for solving a specific instance, is termed racing. Running algorithms are like horses: after the competition is started and we get more and more information about the relative performance, we can update periodically our bets on the winning horses, which are assigned a growing fraction of the available future computing cycles, see Fig. 8.1.

A racing strategy is characterized by two components: i) the estimate of the future potential of an algorithm, given the current state of the search (i.e., given the history of the previous iterations and the corresponding results), and ii) the allocation of the CPU cycles to the different algorithms, aiming at the overall objective of, for example, minimizing a function.

Racing is related to a paradigmatic problem in machine learning known as the k-armed bandit problem. One is faced with a slot machine with k arms which, when pulled, yield a payoff from a fixed but unknown distribution, and one wants to maximize the expected total payoff over a sequence of pulls. If the distributions were known, one would immediately pull only the better performing arm. What makes the problem intriguing is that one has to split the effort between exploration, to learn the different distributions through trials, and exploitation, to pull the better arm once the winner becomes clear. One is reminded of the critical exploration-versus-exploitation dilemma observed in optimization heuristics, but there is an important difference: in optimization one is not interested in maximizing the total payoff accumulated during the search, but in the maximum value obtained at some point in the sequence (the best pull). The paper [7] is dedicated to determining a sufficient number of pulls to select, with a high probability, an arm (an hypothesis) whose payoff is near-optimal. The max k-armed bandit version of the problem is considered in [4, 3], in the assumption of a generalized extreme value (GEV) payoff distribution for each arm. An asymptotically optimal algorithm is presented in [10]. Our explanation follows closely [9], which presents a simple distribution-free approach.
Figure 8.2: Racing with interval estimation. At each iteration an estimate of the expected payoff of each arm as well as its "error bar" are available.

In the interval-estimation scheme of Fig. 8.2, at each iteration the average payoff of each arm, together with an upper bound holding with a given confidence, is maintained, and the arm with the highest upper bound is pulled. The routine of Fig. 8.3, from [9], derives the upper bound from Chernoff's inequality:

    function Chernoff-Interval-Estimation(n, δ)
        forall i ∈ {1, ..., k}:  x̂_i ← 0;  n_i ← 0
        repeat n times:
            î ← argmax_i U(x̂_i / n_i, n_i)
            pull arm î, receive payoff R
            x̂_î ← x̂_î + R;  n_î ← n_î + 1

where U(μ̄, n) = ∞ if n = 0, and otherwise

    U(μ̄, n) = μ̄ + ( α + sqrt(2 n μ̄ α + α²) ) / n,  with α = ln(2nk/δ).     (8.1)

Figure 8.3: The Chernoff-Interval-Estimation routine.

From this basic inequality, which does not depend on the particular distribution, one derives that, if arms are pulled according to the algorithm in Fig. 8.3, with probability at least (1 − δ/2) the upper bound is not wrong for all arms and for all repetitions: μ_i < U(μ̄_i, n_i) at any pull.

A similar algorithm based on Chernoff-Hoeffding's inequality has been presented in a previous work [1]. In their simple UCB1 deterministic policy, after pulling each arm once, one then pulls the arm with the highest bound μ̄_i + sqrt(2 ln n / n_i), where n is the total number of pulls executed so far. Therefore each suboptimal arm i (with μ_i < μ*, μ* being the expected reward of the best arm) is not pulled many times, and the expected regret is limited to at most:

    [ 8 Σ_{i: μ_i < μ*} (ln n) / Δ_i ] + (1 + π²/3) ( Σ_{j=1}^k Δ_j ),  with Δ_i = μ* − μ_i,     (8.2)

see [1] for more details and experimental results.

8.3 Aiming at the maximum with threshold ascent

Our optimization context is characterized by aiming at discovering the maximum value: the different horses are, for example, different greedy procedures for an optimization problem, and the "reward" is the final result obtained by a single run of an algorithm; an application to the Resource Constrained Project Scheduling Problem is presented in [9]. Racing is a way to allocate more runs to the algorithms which tend to get better results on the given instance. We are therefore not interested in cumulative reward, but in the maximum reward obtained at any pull. A way to estimate the potential of the different algorithms is to put a threshold and to estimate the probability that each algorithm produces a value above the threshold by the corresponding empirical frequency. Unfortunately, the appropriate threshold is not known at the beginning, and one may end up with a trivial threshold, so that all algorithms become indistinguishable, or with an impossible threshold, so that no algorithm will reach it. The heuristic solution presented in [9] reactively learns the appropriate threshold while the racing scheme runs, see Fig. 8.4 and Fig. 8.5.
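As a concrete illustration of the UCB1 policy of [1] mentioned above, here is a minimal Python sketch; the Bernoulli arms are our own toy example, not from the paper.

```python
import math, random

def ucb1(arms, n, rng):
    """After pulling each arm once, always pull the arm maximizing
    mean_i + sqrt(2 ln t / n_i), as in the UCB1 policy of Auer et al."""
    k = len(arms)
    pulls = [0] * k
    sums = [0.0] * k
    for i in range(k):                      # initialization: one pull per arm
        sums[i] = arms[i](rng)
        pulls[i] = 1
    for t in range(k + 1, n + 1):           # t counts the pulls done so far
        i = max(range(k), key=lambda j: sums[j] / pulls[j]
                + math.sqrt(2.0 * math.log(t) / pulls[j]))
        sums[i] += arms[i](rng)
        pulls[i] += 1
    return pulls, sums

# toy Bernoulli arms: the first pays off with probability 0.8, the second 0.2
rng = random.Random(0)
arms = [lambda r: 1.0 if r.random() < 0.8 else 0.0,
        lambda r: 1.0 if r.random() < 0.2 else 0.0]
pulls, sums = ucb1(arms, 2000, rng)
```

After 2000 pulls the better arm has received the large majority of the trials, while the suboptimal arm is still sampled occasionally because of the logarithmic exploration bonus.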
    function Threshold-Ascent(s, n, δ)
        R ← 0
        forall i ∈ {1, ..., k}:  n_i ← 0;  n_{i,R'} ← 0 for all thresholds R'
        repeat n times:
            while Σ_{i=1}^k n_{i,R} ≥ s:
                R ← R + Δ        (raise threshold until fewer than s payoffs are left above it)
            î ← argmax_i U(n_{i,R} / n_i, n_i)
            pull arm î, receive payoff v
            n_î ← n_î + 1;  update n_{î,R'} for all thresholds R' < v

Figure 8.4: The Threshold-Ascent routine; the upper bound U is the same as before.

Figure 8.5: Threshold ascent: the threshold is progressively raised until a selected number s of experimented payoffs is left above it.

The threshold starts from zero (remember that all payoffs are assumed to be bounded in [0,1]) and is progressively raised so that an appropriate number of the payoffs received in the past remains above it. For simplicity, the payoffs are assumed to be integer multiples of a suitably small Δ, but it is easy to generalize. Here n_{i,R} is the number of payoffs greater than the threshold R received by horse i, and n_i is the number of pulls of arm i, both updated while the algorithm runs. The frequency n_{i,R}/n_i with which arm i received a value greater than R in the past is an estimate of the probability that it will do so in the future. This quantity permits a more robust evaluation of the different strategies (not based on pure luck, so to speak), because the estimate is determined by the s best payoffs seen so far rather than by a single exceptional value.

The parameter s controls the tradeoff between intensification and diversification. A very large value means that the threshold gets raised slowly and remains lower, so that even poor performers have a chance of being selected; if the threshold becomes so high that no algorithm reaches it, the upper bound is determined only by the number of pulls, as in Round Robin. For smaller values of s one starts differentiating between the individual performances, but the evaluation relies on fewer payoffs. The specific setting of s is therefore not so obvious, and it looks like more work is needed.
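A compact Python sketch of Threshold-Ascent as we reconstruct it here; the two "horses", the step Δ = 0.1 and the parameter values are invented for illustration, and the bound U follows the Chernoff-based form quoted above.

```python
import math, random

def threshold_ascent(arms, n, s, delta, rng, step=0.1):
    """Sketch of Threshold-Ascent: raise the threshold R while at least s
    of the payoffs seen so far lie above it, then pull the arm with the
    highest optimistic bound on its above-threshold frequency."""
    k = len(arms)
    pulls = [0] * k
    payoffs = [[] for _ in range(k)]
    alpha = math.log(2.0 * n * k / delta)
    R = 0.0
    best = 0.0
    for _ in range(n):
        while sum(1 for vs in payoffs for v in vs if v > R) >= s:
            R += step
        def U(i):
            if pulls[i] == 0:
                return float("inf")
            mu = sum(1 for v in payoffs[i] if v > R) / pulls[i]
            return mu + (alpha + math.sqrt(2.0 * pulls[i] * mu * alpha
                                           + alpha * alpha)) / pulls[i]
        i = max(range(k), key=U)
        v = arms[i](rng)
        payoffs[i].append(v)
        pulls[i] += 1
        best = max(best, v)
    return best, pulls

# two invented horses: a steady mediocre one and a risky one with a higher peak
rng = random.Random(1)
arms = [lambda r: 0.4,
        lambda r: 0.9 if r.random() < 0.3 else 0.1]
best, pulls = threshold_ascent(arms, 400, s=20, delta=0.1, rng=rng)
```

Once the threshold climbs past 0.4, the steady horse stops scoring and the risky one, the only arm still producing above-threshold payoffs, attracts the pulls, so the run discovers the 0.9 peak.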
In error bars e over different to be optimized ) θ ( is the range for the implies a non-trivial C ) C n available, given the ochastic occurrence of hands-off away from the estimate ber of evaluations have θ,i and the configuration ) ( i ǫ c i ( n of independence of the I e available, the manifestly ng context (in particular for rlo fashion by considering a is terminated when a single er estimates of performance ected performance and error dP stances and runs, tending to e way in which ) for calculating error bars. One bars that show a clear inferior new test instance is generated ance a criterion eginning, ideally one could use 2 e configurations. Unfortunately case an off-line algorithm tuning 2 he intrinsic stochasticity in the θ,i lculate rror bar) can beat the pessimistic | nǫ B e best configuration of parameters guration problem is: c − termine that a candidate configu- ally sound ( on. Before deciding for elimination, ers and, at regular intervals, has to because it is going to be used for a e ume that the set of possible C 2 dP < ) ) ) θ ( θ,i C being more than ( >ǫ c k θ is the set of instances, C true Z I est estimation effort is more concentrated onto I E has been obtained, depending on available CPU E Z ǫ − = arg min ∗ )] = θ true θ,i E ( k c ( [ . This process is actually a bread-and-butter issue for re- I,C Prob E )= θ ( C , for example the average cost of the route in the above exampl θ is the following Lebesgue integral ( ) is significantly worse than the the current best configuratio θ ( j θ C is: There are two sources of randomness in the evaluation: the st The variations of the off-line racing technique depend on th The situation is illustrated in Fig. 8.6, at each iteration a The advantage is clear: costly evaluation cycles to get bett First, the above integral in (8.5) is estimated in a Monte-Ca In [8], racing is used to select models in a supervised learni for an heuristic solving repetitive problems [2]. 
Let’s ass est instances and different runs, the ideal solution of the confi with respect to is finite. For example,determine a the pizza best delivery route service(or to receives serve “configuration”), ord the even last if customers.long expensive, time In is this in worth the future. the effort an instance (withrandomized a algorithm while certain solving probability a distribution) given instance. and t Given where 8.4 Racing forThe off-line context here configuration is that ofθ of selecting meta-heuristics in an off-line manner th 8.4. RACING FOR OFF-LINE CONFIGURATION OF META-HEURISTICS cost of the best solution found in a run, depending on the inst time and application requirements. current state of the experimentation. and the surviving candidatesbars are are run updated. onperformance, Afterwards, the they if instance. are some eliminated Thea candidates from candidate exp further have checks considerati error to seereal whether value its of optimistic the value best (top performer, e see Fig.are 8.6. dedicated onlycandidate to emerges the as mostbeen the promising executed, winner candidates. or or when when Racing a a target certain error maximum bar num particular, one needsration a statistically sound criterion to de CPU cost, and one has to resort toset smarter of methods. instances.poor configurations Second, are discarded asthe so soon that most the as promisingsearchers candidates the in first heuristics, with estimates racing becom one aims at a statistic derived from the experimental data. 
“lazy” or memory-based learning).is Two based methods are on proposed samples: Hoeffding’s the bound probability that which the makes true the error only assumptio E Because the probability distributionsa are brute not force knowninfinity, approach, at to considering the calculate a b this the very approach expected large is tremendously value number costly, for of usually each each in run of to the ca finit Technical Report DIT-07-049, Universit`adi Trento, July 2007. Copyright (C) 2007 Roberto Battiti, Mauro Brunato and Franco Mascia, all rights reserved. . In the δ CHAPTER 8. RACING    fidence parameter lable. Error bars are reduced n the best performer 4 and can at these can be evaluated with a eal value of their performance is responds to results of the various a-heuristics. At each iteration an a-heuristics. The most promising n upper bound average cost    highest upper bound pessimistic bound    X    best performer so far X: discarded because of inferior performance or similar performance to other configurations one or more winning candidate configurations algorithm configurations    "block" of results on instance i       X 1 2 3 4 5 6 7 ......    X X ...... payoff X X X XXXXXX theta_1 ..... theta_n instance_i instance_k instance_1 .... figure, configurations 2 andbe 6 immediately perform eliminated significantly worse fromat tha consideration the top (even of if the the error r bar they cannot beat number 4). candidate algorithm configurations aremore identified precise asap estimate so (more th configurations test instances). on the Each same block instance. cor Figure 8.7: Racing for off-line optimal configuration of met Figure 8.6: Racing for off-line optimal configuration of met estimate of thewhen expected more performance tests with are error executed, bars exact is value avai depends also on con 78 Technical Report DIT-07-049, Universit`adi Trento, July 2007. Copyright (C) 2007 Roberto Battiti, Mauro Brunato and Franco Mascia, all rights reserved. 
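The elimination loop just described, with Hoeffding error bars shrinking as 1/sqrt(n) and an interval-overlap test, can be written in a few lines. The two cost generators below are invented stand-ins for runs of two configurations; the function names are ours.

```python
import math, random

def race(labels, cost_fns, B, delta, max_rounds, rng):
    """Eliminate a configuration when even its optimistic estimate
    (mean - eps) cannot beat the best pessimistic one (mean + eps)."""
    alive = list(range(len(labels)))
    sums = [0.0] * len(labels)
    n = 0
    while n < max_rounds and len(alive) > 1:
        for i in alive:                       # one fresh run per survivor
            sums[i] += cost_fns[i](rng)
        n += 1
        eps = B * math.sqrt(math.log(2.0 / delta) / (2.0 * n))
        best_ub = min(sums[i] / n + eps for i in alive)
        alive = [i for i in alive if sums[i] / n - eps <= best_ub]
    return [labels[i] for i in alive], n

rng = random.Random(7)
cost_fns = [lambda r: 1.0 + (r.random() - 0.5),   # good configuration
            lambda r: 2.0 + (r.random() - 0.5)]   # clearly worse one
survivors, rounds = race(["good", "bad"], cost_fns, B=3.0, delta=0.05,
                         max_rounds=500, rng=rng)
```

With a true cost gap of about 1 and bound B = 3, the error bar drops below half the gap after a few dozen rounds, so the inferior configuration is eliminated long before the round budget is exhausted.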
Given the confidence δ, from (8.6) one easily solves for the error bar (we want the probability of a large error to be less than δ):

    ε(n, δ) = B sqrt( ln(2/δ) / (2n) )                                    (8.7)

If the accuracy ε and the confidence δ are fixed, one can instead solve for the required number of samples n. The value B has to bound the largest possible error; in practice it can be heuristically estimated, for example as some multiple of the estimated standard deviation.

Tighter error bounds can be derived by making more assumptions about the statistical distribution. If the evaluation errors are normally distributed one can use Bayesian statistics; this is the second method proposed in [8]. One candidate model θ_i is eliminated if the probability that a second model θ_j has a better expected performance is above the usual confidence threshold:

    Prob( E_true(θ_j) < E_true(θ_i) ) > 1 − δ                             (8.8)

where the probability is computed from the posterior distribution of the costs. The error bar test over all candidates is performed before considering the pairwise comparisons.

Figure 8.8: Bayesian elimination of inferior models, from the posterior distribution of costs: one candidate model is eliminated in favor of the best one, while the situation is still undecided for model 3.

Additional methods for shrinking the intervals, as well as suggestions for using a statistical method known as blocking, are explained in [8]. Model selection in continuous space is considered in [6].

In [2] the focus is explicitly on meta-heuristics configuration. Blocking through ranking is used in their F-Race algorithm (based on the Friedman test), in addition to an aggregate test performed over all candidates before considering the pairwise comparisons. Each block, see Fig. 8.7, consists of the results obtained by the k different candidate configurations on a single instance. Within block l one ranks the configurations from the smallest to the largest cost, obtaining the rank R_lj of configuration θ_j on instance l, and R_j = Σ_{l=1}^n R_lj, the sum of the ranks over all n instances. The Friedman test [5] considers the statistic:

    T = ( (k − 1) Σ_{j=1}^k ( R_j − n(k+1)/2 )² )
        / ( Σ_{l=1}^n Σ_{j=1}^k R_lj² − n k (k+1)² / 4 )                  (8.9)

Under the null hypothesis that the candidates are equivalent, so that within each block all possible rankings are equally likely, T is approximately χ² distributed with k − 1 degrees of freedom. If the observed T exceeds the (1 − δ) quantile of this distribution, the null hypothesis is rejected in favor of the hypothesis that at least one candidate tends to perform better than at least another one.
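The aggregate Friedman statistic of (8.9) is easy to compute from a block design of results; a sketch follows, assuming no ties within a block (the cost matrix is a made-up example).

```python
def friedman_T(costs):
    """costs[l][j]: cost of configuration j on instance (block) l.
    Returns the Friedman statistic T; assumes no ties within a block."""
    n, k = len(costs), len(costs[0])
    ranks = []
    for row in costs:
        order = sorted(range(k), key=lambda j: row[j])
        r = [0] * k
        for pos, j in enumerate(order):
            r[j] = pos + 1                  # rank 1 = lowest cost in the block
        ranks.append(r)
    R = [sum(ranks[l][j] for l in range(n)) for j in range(k)]
    num = (k - 1) * sum((Rj - n * (k + 1) / 2.0) ** 2 for Rj in R)
    den = sum(r * r for row in ranks for r in row) - n * k * (k + 1) ** 2 / 4.0
    return num / den

# four instances, three configurations, always the same ordering:
T = friedman_T([[1.0, 2.0, 3.0],
                [0.9, 2.1, 3.2],
                [1.1, 1.9, 2.8],
                [1.2, 2.2, 3.1]])
```

With perfectly consistent rankings the statistic reaches its maximum n(k − 1) = 8, far above the 0.95 quantile of the χ² distribution with 2 degrees of freedom (about 5.99), so the null hypothesis of equivalent candidates is rejected.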
In this case one proceeds with a pairwise comparison of the candidates: configurations θ_j and θ_h are considered different if

    |R_j − R_h|
    / sqrt( 2n ( 1 − T/(n(k−1)) ) ( Σ_{l=1}^n Σ_{i=1}^k R_li² − n k (k+1)²/4 )
            / ( (n−1)(k−1) ) )  >  t_{1−δ/2}                              (8.10)

where t_{1−δ/2} is the (1 − δ/2) quantile of the Student's t distribution.

Bibliography

[1] Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning 47 (2002), no. 2/3, 235-256.

[2] M. Birattari, T. Stützle, L. Paquete, and K. Varrentrapp, A racing algorithm for configuring metaheuristics, Proceedings of the Genetic and Evolutionary Computation Conference (San Francisco, CA, USA) (W.B. Langdon et al., ed.), Morgan Kaufmann Publishers, 2002, pp. 11-18. Also available as: Technical Report AIDA-2002-01, Intellektik, Technische Universität Darmstadt, Darmstadt, Germany.

[3] Vincent Cicirello and Stephen Smith, The max k-armed bandit: A new model for exploration applied to search heuristic selection, 20th National Conference on Artificial Intelligence (AAAI-05), July 2005, Best Paper Award.

[4] Vincent A. Cicirello and Stephen F. Smith, Heuristic selection for stochastic search optimization: Modeling solution quality by extreme value theory, Principles and Practice of Constraint Programming - CP 2004, Lecture Notes in Computer Science, vol. 3258, Springer Berlin / Heidelberg, 2004, pp. 197-211.

[5] W. J. Conover, Practical nonparametric statistics, John Wiley & Sons, December 1999.

[6] Artur Dubrawski and Jeff Schneider, Memory based stochastic optimization for validation and tuning of function approximators, Conference on AI and Statistics, 1997.

[7] Philip W. L. Fong, A quantitative study of hypothesis selection, International Conference on Machine Learning, 1995, pp. 226-234.

[8] Oden Maron and Andrew W. Moore, The racing algorithm: Model selection for lazy learners, Artificial Intelligence Review 11 (1997), no. 1-5, 193-225.

[9] M. J. Streeter and S. F. Smith, A simple distribution-free approach to the max k-armed bandit problem, Proceedings of the Twelfth International Conference on Principles and Practice of Constraint Programming (CP 2006), 2006.

[10] Matthew J. Streeter and Stephen F. Smith, An asymptotically optimal algorithm for the max k-armed bandit problem, AAAI, 2006.
Chapter 9

Metrics, landscapes and features

Measure what is measurable, and make measurable what is not so. (Galileo Galilei)

Let's consider some challenging questions:

• How can I predict the most effective heuristic for a given problem, or for a specific instance?
• How can I predict that a problem is intrinsically more difficult for a given search technique?
• How can I predict the future evolution of a heuristic? E.g., the running time to completion, the probability of finding a solution within given time bounds, etc.

In the scientific challenge of using learning strategies in the area of heuristics, finding appropriate metrics and appropriate features is an indispensable building block. While a complete review of the literature in this area is beyond the scope of this book, in the next sections we will briefly mention some interesting research issues related to: selecting input features, see Section 9.1, measuring individual algorithm components and evaluating them based on a diversification and bias metric, see Section 9.2 and Section 9.3, and measuring problem difficulty, see Section 9.4.

9.1 Selecting features with mutual information

As in all scientific challenges, the development of models with predicting power has to start from appropriate measurements, statistics which can be measured from the process under study. The literature for selecting features is very rich in the area of pattern recognition, neural networks, and machine learning. Before starting to learn a parametric or non-parametric model from the examples, one must be sure that the input data (input features) have sufficient information to predict the outputs. This qualitative criterion can be made precise in a statistical way with the notion of mutual information (MI for short).

An output distribution is characterized by an uncertainty: the probability distribution of the outputs is summarized by its entropy, a theoretically sound way to measure the uncertainty. Now, after one knows a specific input value x, the uncertainty in the output can decrease, depending on the form of the conditional distribution p(y|x), see below for the precise definitions. The amount by which the uncertainty in the output decreases after the input is known is termed mutual information. If the mutual information between a feature and the output is zero, knowledge of the input does not reduce the uncertainty in the output. In other words, the selected feature cannot be used (in isolation) to predict the output.
outp variable values, see Fig.algorithm on 9.1. an instance willthe converge For (class possible example, 1) features or not oneset, extracted (clas so may from that the think theconstruction data classification about of problem one predic the starts would classifier frominstead like is sufficien of left. features? Onebut For may the sure ask there atlearning this is task poi not becomes betterprobability unmanageable. way distributions to Think from use foraims samples al example at in abou very asufficient large-dime small information subset to of predict the features, output. possibly close to the smal tainty in the output class is measured by the entropy: Technical Report DIT-07-049, Universit`adi Trento, July 2007. Copyright (C) 2007 Roberto Battiti, Mauro Brunato and Franco Mascia, all rights reserved. , ) 83 X ( (9.5) (9.2) (9.3) (9.6) (9.4) f = mutual is equal y ) x ariable has correlation points looks c, ( ) P on the evident is: X, Y ry dependencies! ( C ables by the product is defined as: Y and σ . The conditional entropy , X ) x sed correlation measure. A ! , if the number of instances )) Y ) to 1 means increasing linear sented in [1], using Fraser’s ( rs. Actually, very little infor- r a high-dimensional feature and ) x Y oser the coefficient is to zero, 2 | nt. The correlation coefficient µ c have zero correlation, but zero n heuristic method which uses X ) le the plot of E X ( | ual if and only if the input fea- an evident direction. f between random variables X and − σ ) P ( C cond, as in the case that − ) nt to have a horse-race of different ith the output. he joint probability ( ) correlation only if you have reasons Y P Y Y 2 c,f ( H ) )( ) ln σ X,Y ( c Y E x ρ X ( X − ( P | ) given input c µ σ is the covariance. After simple transfor- ) P E ( X c C ( − P ( p E cov c ) ln is the Pearson product-moment ) X H =1 N c − X X (( : ( ) x, c c

The average uncertainty in the class after knowing the feature vector x is the conditional entropy:

    H(C|X) = − Σ_x P(x) [ Σ_c P(c|x) ln P(c|x) ],    (9.2)

where P(c|x) is the conditional probability of class c given input x. The conditional entropy is always less than or equal to the initial entropy; it is equal if and only if the input features and the output class are statistically independent, P(c, x) = P(c) P(x). The amount by which the uncertainty decreases is by definition the mutual information I(C; X) between the variables C and X:

    I(C; X) = I(X; C) = H(C) − H(C|X).    (9.3)

An equivalent expression, which makes the symmetry between C and X evident, is:

    I(C; X) = Σ_{c,x} P(c, x) ln [ P(c, x) / ( P(c) P(x) ) ].    (9.4)

The MI between a vector of input features and the output is therefore very relevant to identify promising (informative) features. Although very powerful theoretically, estimating the MI for a high-dimensional feature vector starting from labeled samples is not a trivial task. A heuristic method which uses only the MI between individual features and the output is presented in [1], using Fraser's algorithm [9].

Let's note that the MI measure is widely different from the widely-used correlation measure. The Pearson product-moment correlation coefficient between random variables X and Y, with expected values µ_X and µ_Y and standard deviations σ_X and σ_Y, is defined as:

    ρ_{X,Y} = cov(X, Y) / (σ_X σ_Y) = E[ (X − µ_X)(Y − µ_Y) ] / (σ_X σ_Y),    (9.5)

where E is the expected value of the variable and cov is the covariance. After simple transformations one obtains the equivalent formula:

    ρ_{X,Y} = ( E(XY) − E(X) E(Y) ) / ( √(E(X²) − E²(X)) √(E(Y²) − E²(Y)) ).    (9.6)

The correlation value is between 1 and −1. Correlation close to 1 means an increasing linear relationship (an increase of X relative to the mean µ_X is usually accompanied by an increase of Y relative to µ_Y); close to −1 means a decreasing linear relationship. The closer the coefficient is to zero, the weaker the correlation between the variables: the plot of the (X, Y) points looks like an isotropic cloud around the expected values, without an evident direction.

As mentioned before, statistically independent variables have zero correlation, but zero correlation does not imply that the variables are independent. The correlation coefficient detects only linear dependencies between two variables: it may well be that one variable has full information about the other and actually determines its value, as in the case Y = f(X), while still having zero correlation. To make it short: trust linear correlation only if you have reasons to suspect linear relationships; use mutual information to estimate arbitrary dependencies! A feature can be informative even if it is not linearly correlated with the output.
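The contrast between the two measures is easy to verify numerically. In the sketch below (our own illustration; the helper names are not from the cited literature), Y is fully determined by X through Y = X², yet the Pearson coefficient of Eq. 9.6 is zero while the plug-in mutual information of Eq. 9.4 is large:

```python
from collections import Counter
from math import log, sqrt

def pearson(xs, ys):
    """Sample version of Eq. 9.5: covariance over the product of std devs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) = sum_{x,y} p(x,y) ln [p(x,y)/(p(x)p(y))] (Eq. 9.4)."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

xs = [-2, -1, 0, 1, 2] * 100   # symmetric domain
ys = [x * x for x in xs]       # Y = X^2: fully determined by X

print(pearson(xs, ys))             # 0.0: no *linear* relationship
print(mutual_information(xs, ys))  # about 1.05 nats: strong dependence
```

The plug-in estimates above are adequate for small discrete domains; estimating MI reliably for high-dimensional or continuous variables requires more care, as noted above.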
In detail, the correlation coefficient is obtained by dividing the covariance of the two variables by the product of their standard deviations.

9.2 Measuring local search components

To ensure progress in algorithmic research it is not sufficient to run a horse-race of different algorithms on a set of instances and to declare winners and losers. In fact, very little information can be obtained by these kinds of comparisons alone.
This exercise takes different forms depending on what one is predicting, on the starting data, on the computational effort dedicated to the prediction, etc. To make some examples, the work in [2] aims at explaining why certain local search components are effective on the MAX-SAT problem (in particular for local search guided by non-oblivious functions). Given a finite set of benchmark instances, an intelligent researcher (...and very motivated to get a publication!) can determine a winner via a careful tuning of the algorithm parameters on the benchmark itself: if the benchmark is limited and sufficient time is given, some promising results will finally be obtained, but they do not explain why a method is better than another.

A better method is to design a generator of random instances for a given problem, so that a different set of instances from the ones used during the development and tuning phase, extracted from the same generator, can be used for the final test. This mitigates the effect of the "intelligent tuning" done by the researcher and permits a fairer horse-race, but it still does not explain why a method wins, and it is of limited help for prediction. If a model is capable of predicting the performance of a technique on an instance before the run is finished (predicting the past is not always easy!), then one takes some steps towards understanding. This is why relating the final performance to measures obtained after short runs of an algorithm is of interest, and it is the approach considered in the following.

9.3 Selecting components based on diversification and bias

Let us focus on local-search based heuristics. It is well known that the basic compromise to be reached is between diversification and bias. Given the obvious fact that only a negligible fraction of the admissible points can be visited for a non-trivial task, the search trajectory X(0), X(1), ..., X(t) should be generated so as to visit preferentially points with large f values (bias), while avoiding the confinement of the search in a limited and localized portion of the search space (diversification). The two requirements are conflicting: as an extreme example, random search is optimal for diversification but not for bias.

Diversification can be associated with different metrics; here we adopt the Hamming distance between points along the search trajectory, where the Hamming distance H(X, Y) between two binary strings is given by the number of bits that are different. The bias is related to the average f values visited along the trajectory.

The investigation in [2] follows this scheme:

• the diversification-bias metrics (D-B plots) of different basic components are investigated, and a conjecture is formulated that the best components are the maximal elements in the diversification-bias plane for a suitable partial order (Sec. 9.3.2);

• the conjecture is validated by a competitive analysis of the components on a benchmark suite [2];

• the diversification properties of Random Walk are analyzed, to provide a basic system against which more complex components are evaluated.

Let us now consider the diversification properties of Random Walk. Random Walk generates a Markov chain by selecting at each iteration a random move with uniform probability: on binary strings of length n, a bit index i is selected uniformly in {1, ..., n} and bit i is flipped. Without loss of generality, let us assume that the search starts from the zero string, X(0) = (0, ..., 0), so that the Hamming distance at iteration t, H(X(t), X(0)), is simply the number of ones in X(t). In particular, the average Hamming distance from the start and its speed of growth are monitored and related to the final algorithm performance.
Let p be the probability that a given bit is changed at one iteration (for Random Walk, p = 1/n), and let x_b(t) be the probability that bit b at iteration t differs from its starting value. By symmetry, the expected Hamming distance from the start is:

    h(t) ≡ E[ H(X(t), X(0)) ] = Σ_{b=1}^{n} x_b(t) = n x_b(t).    (9.7)

The evolution equation for x_b(t) is derived by considering the two possible events leading to a differing bit at iteration t+1: i) the bit differs at iteration t and it is not changed, and ii) the bit is equal to the starting value and it is changed. In detail:

    x_b(t+1) = x_b(t) (1 − p) + (1 − x_b(t)) p,    (9.8)

with initial value x_b(0) = 0. The difference equation (9.8) is solved by:

    x_b(t) = ( 1 − (1 − 2p)^t ) / 2.    (9.9)

It is straightforward to derive the following theorem.

Theorem 1  If p is the probability that a given bit is changed at each iteration, the expected Hamming distance from the starting configuration is, for t ≥ 0,

    h(t) = (n/2) ( 1 − (1 − 2p)^t ) ≈ (n/2) ( 1 − e^{−t/τ} ),    (9.10)

i.e., h(t) tends to its asymptotic value n/2 in an exponential way, with "time constant" τ = 1/(2p); for Random Walk, τ = n/2.

Let us now compare the evolution of the mean Hamming distance for different algorithms. The analysis is started as soon as the first local optimum is encountered by LS, when diversification becomes crucial. LS (which at this point always moves to the best neighbor, even if the neighbor has a worse solution value) and Fixed-TS with fractional prohibition T_f equal to 0.1, denoted as TS(0.1), are then run for 4n additional iterations; the evolution of Random Walk is also reported for reference. Fig. 9.2 shows the average Hamming distance as a function of the additional iterations after reaching the first LS optimum, with standard deviations, for MAX-3-SAT tasks (see [2] for the experimental details). Although the initial linear growth is similar to that of Random Walk, the Hamming distance for LS does not reach the asymptotic value n/2, and a remarkable difference is present between the two algorithms. The fact that the asymptotic value is not reached even for large iteration numbers is related to the bias of LS, which is analyzed below.

Figure 9.2: Average Hamming distance reached by Random Walk, LS and TS(0.1) from the first local optimum of LS, with standard deviation (MAX-3-SAT tasks).
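Theorem 1 is easy to check by simulation. The sketch below (our own code, not the experiment of [2]) runs Random Walk with one random bit-flip per iteration (p = 1/n) and compares the empirical mean Hamming distance with Eq. 9.10:

```python
import random

def random_walk_hamming(n, t_max, runs=200, seed=0):
    """Mean Hamming distance from the all-zero start, averaged over many walks."""
    rng = random.Random(seed)
    totals = [0] * (t_max + 1)
    for _ in range(runs):
        x = [0] * n
        dist = 0
        for t in range(1, t_max + 1):
            i = rng.randrange(n)            # p = 1/n: one random bit per iteration
            dist += 1 if x[i] == 0 else -1  # a flip changes H by +1 or -1
            x[i] ^= 1
            totals[t] += dist
    return [s / runs for s in totals]

n, t_max = 50, 200
empirical = random_walk_hamming(n, t_max)
predicted = [n / 2 * (1 - (1 - 2 / n) ** t) for t in range(t_max + 1)]
print(round(empirical[t_max], 1), round(predicted[t_max], 1))  # both near n/2 = 25
```

The empirical curve follows the exponential saturation of Eq. 9.10 closely, with fluctuations of order √n / 2 reduced by the averaging over runs.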
Let's note that, for large n, most binary strings are at a distance of approximately n/2 from a given string. In detail, the Hamming distances from a fixed string are distributed with a binomial distribution with the same probability of success and failure (p = q = 1/2): the coefficients increase up to the mean n/2 and then decrease. It is well known that the mean is n/2 and the standard deviation is σ = √n / 2; because the ratio σ/n tends to zero for n tending to infinity, for large n most strings are clustered in a narrow peak around n/2. The distribution of Hamming distances for n = 500 is shown in Fig. 9.3. As an example, one can use the Chernoff bound [11]:

    Pr[ H ≤ (1 − θ) n p ] ≤ e^{−θ² n p / 2},    0 ≤ θ ≤ 1,    (9.11)

so that the probability of finding a point at a distance smaller than the mean by a fraction θ decreases in the above exponential way.

Figure 9.3: Probability of different Hamming distances for n = 500.

The above implies that all the strings visited by Random Walk in a bounded number of iterations tend to lie in a confined region of the search space, with bounded Hamming distance from the starting point. Clearly, if better local optima are located in a cluster that is not reached by the trajectory, they will never be found. In other words, a robust algorithm demands that some stronger diversification action is executed; for example, an option is to activate a restart after a number of iterations equal to a small multiple of the time constant τ.
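The concentration of Hamming distances around n/2 claimed above can be verified directly from the binomial distribution; the following sketch (ours) sums the binomial probability mass within a few standard deviations √n / 2 of the mean:

```python
from math import comb, sqrt

def mass_within(n, k_std):
    """Fraction of all 2^n binary strings whose Hamming distance from a fixed
    string lies within k_std standard deviations (sqrt(n)/2) of the mean n/2."""
    mean, sigma = n / 2, sqrt(n) / 2
    lo, hi = mean - k_std * sigma, mean + k_std * sigma
    return sum(comb(n, d) for d in range(n + 1) if lo <= d <= hi) / 2 ** n

n = 500
for k in (1, 2, 3):
    print(k, round(mass_within(n, k), 4))
```

For n = 500, roughly 70% of all strings lie within one standard deviation of n/2 and well over 99% within three; the peak sharpens as n grows, since σ/n = 1/(2√n) tends to zero.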
9.3.1 The diversification-bias compromise (D-B plots)

When a local search component is started, new configurations are obtained at each iteration until the first local optimum is encountered, because the number of satisfied clauses increases by at least one at each step. During this phase, additional diversification schemes are not necessary and are potentially dangerous, because they could lead the trajectory astray, away from the local optimum. The compromise between bias and diversification becomes critical after the first local optimum is encountered: in fact, if the local optimum is strict, the application of any move will worsen the f value, and an additional move could then be selected that brings the trajectory back to the starting local optimum.

The mean bias and diversification depend on the values of the internal parameters of the different components. In [2] all runs proceed as follows: as soon as the first local optimum is encountered by LS, the configuration is stored, and the selected component is then run for 4n additional iterations. The final Hamming distance H from the stored local optimum and the final number of unsatisfied clauses are collected; the values are then averaged over different tasks and different random number seeds.

The resulting diversification-bias (D-B) plots are shown in Fig. 9.4. Each point gives the D-B coordinates of a component for a specific parameter setting: the mean number of unsatisfied clauses after the 4n iterations, and the mean Hamming distance normalized with respect to the problem dimension n (i.e., average Hamming distance divided by n). Three basic algorithms are considered: GSAT-with-walk [10], Fixed-TS, and HSAT. For each of these, two options for the guiding function are studied: one adopts the "standard" oblivious function, the other the non-oblivious function f_NOB introduced in Sec. 6.1.1. Finally, for GSAT-with-walk one can change the probability parameter p, while for Fixed-TS one can change the fractional prohibition T_f: parametric curves as a function of a single parameter are therefore obtained.

Figure 9.4: Diversification-bias plane. Mean number of unsatisfied clauses after 4n iterations versus mean normalized Hamming distance. Each run starts from a GSAT local optimum, see [2].
GSAT, Fixed-TS(0.0) and GSAT-with-walk(0.0) coincide: no prohibitions are present in TS, and no stochastic choice is present in GSAT-with-walk; the corresponding point is marked with "0.0" in Fig. 9.4. By considering the parametric curve for GSAT-with-walk(p) (label "gsat" in Fig. 9.4), the mean Hamming distance reached at first decreases and then increases as a function of p. The initial decrease is unexpected, because it contradicts the intuitive argument that more stochasticity implies more diversification. The reason for the above result is that there are two sources of "randomness" in the GSAT-with-walk algorithm (see Fig. 6.2): one deriving from the random choice among variables in unsatisfied clauses, which is active with probability p, the other deriving from the random breaking of ties when more variables achieve the largest ∆f. Because the first source of randomness increases with p, the decrease in diversification could be explained if the second source decreases. This conjecture has been tested and the results are collected in Fig. 9.5, where the average number of ties (the number of moves achieving the largest ∆f) is plotted as a function of p, with statistical error bars. The hypothesis is confirmed: the number of ties decreases with p, and only at higher values of p does the larger amount of stochasticity prevail, producing a gradual increase of the Hamming distance. In addition, almost no tie is present when the non-oblivious guiding function is used: f_NOB keeps the trajectory on a rough terrain, where flat portions tend to be rare.

Figure 9.5: Mean number of ties as a function of the probability p.

The advantage of the D-B plot analysis is clear: it suggests possible causes for the behavior of the different algorithms, leading to a more focused investigation. The behavior of the components on the optimal frontier of Fig. 9.4 is studied in [2].

9.3.2 A conjecture: better algorithms are Pareto-optimal in D-B plots

A conjecture about the relevance of the diversification-bias metric is proposed in [2]. In particular, a partial ordering of the different components is introduced. A relation of partial order, denoted by the symbol ≥ and called "domination", is defined on a set of algorithms in the following way: given two components A and B, A dominates B (A ≥ B) if and only if it has a larger or equal diversification and bias, H_c^n(A) ≥ H_c^n(B) and u_b(A) ≤ u_b(B), where H_c^n denotes the mean normalized Hamming distance and u_b the mean number of unsatisfied clauses. The ordering is partial because no conclusion can be reached if, for example, A has a better diversification but a worse bias when compared with B. A component A is a maximal element of the given relation if the other components in the set do not possess both a higher diversification and a better bias. Because Fig. 9.4 plots the number of unsatisfied clauses versus the Hamming distance, the maximal elements are in the lower-right corner of the set of (H_c^n, u_b) points: they are characterized by the fact that no other point has both a larger diversification and a smaller number of unsatisfied clauses.

Clearly, when one applies a technique for optimization, one wants to maximize the best value found during the search run. This value is affected by both bias and diversification: the search trajectory must visit preferentially points with large f values but, as soon as one region of this kind is visited, the search must proceed to visit new regions. The conjecture is that the components producing the best f values during a run are, on the average, the maximal elements of the given partial order. The conjecture produces some "falsifiable" predictions that can be tested experimentally; it is tested in [2], with fully satisfactory results.

A definition of three metrics, depth, mobility and coverage, is used in [18] [22] for studying local-search algorithms for SAT and CSP. The first two (depth is related to the average number of unsatisfied clauses during the run, mobility to the speed at which the Hamming distance grows) correspond closely to the bias and diversification used above.
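Given measured D-B coordinates, the maximal elements of the domination relation can be extracted mechanically. The sketch below is our own illustration, and the sample numbers are invented, not measurements from [2]; a larger normalized Hamming distance and fewer unsatisfied clauses are treated as better:

```python
def maximal_elements(points):
    """Return the (diversification, unsatisfied) points that no other point
    dominates, where B dominates A when B has at least as large a
    diversification and at most as many unsatisfied clauses (and B != A)."""
    return [a for a in points
            if not any(b != a and b[0] >= a[0] and b[1] <= a[1] for b in points)]

# Hypothetical measurements (normalized Hamming distance, mean unsat clauses):
components = [(0.10, 12.0), (0.25, 9.5), (0.25, 11.0), (0.40, 10.5), (0.45, 10.5)]
print(maximal_elements(components))  # [(0.25, 9.5), (0.45, 10.5)]
```

The surviving points form the lower-right frontier of the D-B plane described above.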
The third metric of [18] [22], coverage, takes a more global view at the search progress. In fact, one may have a large mobility but nonetheless remain confined in a small portion of the search space. A two-dimensional analogy is that of a bird flying at high speed along a circular trajectory: if fresh corn is not on the trajectory, it will never discover it, no matter how fast it flies. Coverage is intended to measure how the search explores the entire space: the search needs to ensure that eventually the optimal solution will be identified, no matter how it is camouflaged in the landscape. Coverage is therefore defined as the size of the largest unexplored gap in the search space.

Once the motivation for a coverage measure is intuitively clear, the detailed definition and implementation are somewhat challenging. In [18] a worst-case scenario is considered: for binary strings, the gap is given by the maximum Hamming distance between any unexplored assignment and the nearest explored assignment. Unfortunately, measuring it is not so fast; actually, computing it exactly can be NP-hard for problems with binary strings, and one has to resort to approximations [18]. For example, one can consider the sample points given by the negations of the points visited along the trajectory, and take the maximum over these sample points of the minimum distance to a visited point (one tries to be on the safe side when estimating the real coverage). The rationale for this heuristic choice is that the negation of a string is the farthest point from it. After this discrete approximation of the coverage is available, one divides by the number of search steps; alternatively, one considers how fast the gap decreases during the search (an approximation of the coverage speed). Dual measures, defined on the constraints instead of the assignments, are studied in [22].

9.4 How to measure problem difficulty

While before we concentrated on understanding why an algorithm is better performing, here we consider the issue of understanding why a problem is more difficult to solve for a given stochastic local search method. One aims at identifying relationships between problem characteristics and problem difficulty. Because the focus is on local search methods, one would like to characterize statistical properties of the landscape of the problem. The effectiveness of a stochastic local search method is determined by how the "microscopic" local decisions made at each search step interact to determine the "macroscopic" global behavior, in particular the f values along the search trajectory.

Statistical mechanics has been very successful in deriving typical "macroscopic" global behaviors of systems [12], for example starting from the molecule-molecule interactions to derive macroscopic quantities like pressure and temperature. Statistical mechanics builds upon ensembles (configurations with their probabilities of occurrence) and upon the identification of appropriate statistics. When the numbers are large, the variance in the global behaviors of the ensemble members is very small, so that most members behave in a similar way, even if the individual motion is very complex. As an example, if one has two communicating containers of one liter and a gas with five flying molecules, the probability of finding all molecules in one container is not negligible. On the other hand, if the containers are filled with air, the probability of observing 51% of the molecules in one container is very close to zero: even if the individual motion is hardly predictable, the macroscopic behavior will produce a normal 50% subdivision, with a very small and hardly measurable random deviation (the derivation is left as an exercise to the reader).

Unfortunately, the situation for combinatorial search problems is much more complicated than the situations typical of physics-related problems, so that the precision of the theoretical results is more limited. Nonetheless, a growing body of literature exists which progressively sheds light onto different aspects of combinatorial problems, and which permits a level of understanding and explanation going beyond the simple empirical models derived from massive experimentation. An extensive review of the applications of statistical mechanics to combinatorial search problems is presented in [12]; the approach has been applied to constraint satisfaction problems, in particular graph coloring, while the SAT problem, in particular 3-SAT, has been the playground for many investigations, see for example [6], [7], [17], [20].
Phase transitions and the number of solutions have been identified as mechanisms with which to study and explain problem difficulty. A phase transition in a physical system is characterized by the abrupt change of its macroscopic properties at certain values of the defining parameters; consider the transitions from ice to water and from water to steam at specific values of temperature and pressure. Phenomena analogous to phase transitions have been studied for random graphs [8, 4]: as a function of the average node degree, some macroscopic properties like connectivity change in a very rapid manner around critical values. The work in [14] predicts that large-scale artificial intelligence systems and cognitive models will undergo sudden phase transitions from disjoint parts into coherent structures as their topological connectivity increases beyond a critical value: "This phenomenon, analogous to phase transitions in nature, provides a new paradigm with which to analyze the behavior of large-scale computation and determine its generic features."

Constraint satisfaction and SAT phase transitions have been widely analyzed; for a few references see [5] [13] [15] [16] [17] [19] [21]. A clear introduction to phase transitions and to the growing body of experimental research can be found, for example, in [6] for results on CSP and SAT, and in [7] for experimental results on the crossover point in random 3-SAT. In addition to being of high scientific interest, identifying the zones where the most difficult problems are concentrated is very relevant for generating difficult instances to challenge algorithms, see for example [19] for generating hard Satisfiability problems. Strange as it may sound at the beginning, it is not so easy to identify difficult instances of NP-hard problems: let's remember that the computational complexity classes are defined through a worst-case analysis, while hard instances are a function of problem size and of the instance parameters.

The transition of interest for SAT is between solvable and unsolvable instances as the ratio of clauses to variables grows. A small number of clauses (an under-constrained problem) implies many solutions, and it is easy to find one of them, on average. At the other extreme, a large number of clauses (an over-constrained problem) implies that any tentative solution is quickly ruled out (pruned from the search tree), and it is fast to rule out all possibilities and conclude that no solution exists. The critically constrained instances in between are the hardest ones; this location also corresponds to the transition between mostly solvable and mostly unsolvable instances. For backtracking, the difficulty peak is due to a competition between two factors: i) the number of solutions and ii) the facility of pruning many subtrees. Near the transition, a complete backtracking algorithm may fail only after descending very deeply in the search tree. Notably, the hard instances are concentrated near the same parameter values for a wide variety of common search heuristics.

For local search one has to be more careful: the method is not complete, and one must limit the experimentation to solvable instances. One may naively expect that the search becomes harder when the number of solutions decreases, but the situation is not so simple. At the limit, if only one solution is available but its attraction basin is very large, local search will easily find it. Not only the number of solutions but also the number and depth of sub-optimal local minima play a role: a large number of deep local minima causes a waste of search time. The work in [6] demonstrates that the (logarithm of the) number of solutions is strongly correlated with the local search cost. The papers [17] and [20] study the distribution of SAT solutions and demonstrate that the size of the backbone (the set of Boolean variables that have the same value in all optimal solutions) accounts for a large portion of the variability in local search cost for the SAT problem. The contribution [23] considers the Job Shop Scheduling problem (JSP) and demonstrates experimentally that the mean distance between random local minima and the nearest optimal solution is highly correlated with the cost of solving the problem to optimality (a simple version of tabu search is used in the tests).

More empirical descriptive cost models of problem difficulty aim at identifying measurable instance characteristics (features) influencing the search cost; a good descriptive model should account for a significant portion of the variance in search cost. Big-Valley models [3] (a.k.a. massif central models) have been considered to explain the success of local search, and in particular the preference for continuing from a given local optimum instead of restarting from scratch.
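The solvable/unsolvable transition described above can be observed even at toy scale. The sketch below (our own illustration, brute force over all assignments, feasible only for very small n) estimates the fraction of satisfiable random 3-SAT instances as the clause-to-variable ratio grows:

```python
import itertools, random

def random_3sat(n, m, rng):
    """m random 3-clauses over n variables; a literal is (variable, required value)."""
    return [[(v, rng.choice((0, 1))) for v in rng.sample(range(n), 3)]
            for _ in range(m)]

def satisfiable(n, clauses):
    """Brute-force check over all 2^n assignments (tiny n only)."""
    return any(all(any(bits[v] == s for v, s in cl) for cl in clauses)
               for bits in itertools.product((0, 1), repeat=n))

rng = random.Random(42)
n, trials = 10, 20
fractions = {}
for ratio in (2, 4, 6, 8):
    sat = sum(satisfiable(n, random_3sat(n, ratio * n, rng)) for _ in range(trials))
    fractions[ratio] = sat / trials
    print(ratio, fractions[ratio])  # fraction satisfiable drops as m/n grows
```

With n = 10, almost all instances at ratio 2 are satisfiable and almost none at ratio 8; the crossover for random 3-SAT lies between these values, and the hardest instances for search cluster around it.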
Models of this kind measure the auto-correlation of the time series of f values produced by a random walk. Let X_t be the f value observed at time t along the walk. If the process X_t has mean µ and variance σ², its autocorrelation function (ACF), which describes the correlation between the process at two different points in time, is:

    R(t, s) = E[ (X_t − µ)(X_s − µ) ] / σ².    (9.13)

The correlation length is a measure, derived from the ACF, of the range over which fluctuations of f in one region of the space are correlated with those in another region. Unfortunately, for many problems the correlation length is mainly a function of problem size: it does not explain the variance in computational cost among instances of the same size [23].

Bibliography

[1] R. Battiti, Using the mutual information for selecting features in supervised neural net learning, IEEE Transactions on Neural Networks 5 (1994), no. 4, 537–550.

[2] R. Battiti and M. Protasi, Reactive search, a history-sensitive heuristic for MAX-SAT, ACM Journal of Experimental Algorithmics 2 (1997), Article 2. Available at http://www.jea.acm.org/.

[3] K.D. Boese, A.B. Kahng, and S. Muddu, On the big valley and adaptive multi-start for discrete global optimizations, Operations Research Letters 16 (1994), no. 2.

[4] B. Bollobás, Random graphs, Cambridge University Press, 2001.

[5] P. Cheeseman, B. Kanefsky, and W.M. Taylor, Where the really hard problems are, Proceedings of the 12th IJCAI (1991), 331–337.

[6] David A. Clark, Jeremy Frank, Ian P. Gent, Ewan MacIntyre, Neven Tomov, and Toby Walsh, Local search and the number of solutions, Principles and Practice of Constraint Programming, 1996, pp. 119–133.

[7] James M. Crawford and Larry D. Auton, Experimental results on the crossover point in random 3-SAT, Artificial Intelligence 81 (1996), no. 1-2, 31–57.

[8] P. Erdős and A. Rényi, On random graphs, Publicationes Mathematicae Debrecen 6 (1959), 290–297.

[9] Andrew M. Fraser and Harry L. Swinney, Independent coordinates for strange attractors from mutual information, Physical Review A 33 (1986), no. 2, 1134–1140.

[10] F. Glover, Tabu search - part I, ORSA Journal on Computing 1 (1989), no. 3, 190–206.

[11] T. Hagerup and C. Rüb, A guided tour of Chernoff bounds, Information Processing Letters 33 (1989/90), 305–308.

[12] Tad Hogg, Applications of statistical mechanics to combinatorial search problems, Annual Reviews of Computational Physics, vol. 2, pp. 357–406, World Scientific, Singapore, 1995.

[13] Tad Hogg, Bernardo A. Huberman, and Colin P. Williams, Phase transitions and the search problem, Artificial Intelligence 81 (1996), no. 1-2, 1–15.

[14] B.A. Huberman and T. Hogg, Phase transitions in artificial intelligence systems, Artificial Intelligence 33 (1987), no. 2, 155–171.

[15] Scott Kirkpatrick and Bart Selman, Critical behavior in the satisfiability of random Boolean expressions, Science 264 (1994), 1297–1301.

[16] D. Mitchell, B. Selman, and H. Levesque, Hard and easy distributions of SAT problems, Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI-92) (San Jose, CA), July 1992, pp. 459–465.

[17] Andrew J. Parkes, Clustering at the phase transition, AAAI/IAAI, 1997, pp. 340–345.

[18] Dale Schuurmans and Finnegan Southey, Local search characteristics of incomplete SAT procedures, Artificial Intelligence 132 (2001), no. 2, 121–150.

[19] Bart Selman, David G. Mitchell, and Hector J. Levesque, Generating hard satisfiability problems, Artificial Intelligence 81 (1996), no. 1-2, 17–29.

[20] Josh Singer, Ian Gent, and Alan Smaill, Backbone fragility and the local search cost peak, Journal of Artificial Intelligence Research 12 (2000), 235–270.

[21] B.M. Smith, Phase transition and the mushy region in constraint satisfaction problems, Proceedings of the 11th European Conference on Artificial Intelligence (1994), 100–104.

[22] Finnegan Southey, Constraint metrics for local search, Theory and Applications of Satisfiability Testing, pp. 269–281, Springer Verlag, 2005.

[23] Jean-Paul Watson, J. Christopher Beck, Adele E. Howe, and L. Darrell Whitley, Problem difficulty for tabu search in job-shop scheduling, Artificial Intelligence 143 (2003), no. 2, 189–217.