No Free Lunch Theorems for Optimization

David H. Wolpert
IBM Almaden Research Center, Harry Road, San Jose, CA

William G. Macready
Santa Fe Institute, Hyde Park Road, Santa Fe, NM

December

Abstract

A framework is developed to explore the connection between effective optimization algorithms and the problems they are solving. A number of "no free lunch" (NFL) theorems are presented that establish that, for any algorithm, any elevated performance over one class of problems is exactly paid for in performance over another class. These theorems result in a geometric interpretation of what it means for an algorithm to be well suited to an optimization problem. Applications of the NFL theorems to information-theoretic aspects of optimization and benchmark measures of performance are also presented. Other issues addressed are time-varying optimization problems and a priori "head-to-head" minimax distinctions between optimization algorithms, distinctions that can obtain despite the NFL theorems' enforcing of a type of uniformity over all algorithms.

1 Introduction

The past few decades have seen increased interest in general-purpose "black-box" optimization algorithms that exploit little if any knowledge concerning the optimization problem on which they are run. In large part these algorithms have drawn inspiration from optimization processes that occur in nature. In particular, the two most popular black-box optimization strategies, evolutionary algorithms [FOW, Hol] and simulated annealing [KGV], mimic processes in natural selection and statistical mechanics respectively.

In light of this interest in general-purpose optimization algorithms, it has become important to understand the relationship between how well an algorithm a performs and the optimization problem f on which it is run. In this paper we present a formal analysis that contributes towards such an understanding by addressing questions like the following. Given the plethora of black-box optimization algorithms and of optimization problems, how can we best match algorithms to problems (i.e., how best can we relax the black-box nature of the algorithms and have them exploit some knowledge concerning the optimization problem)? In particular, while serious optimization practitioners almost always perform such matching, it is usually done on an ad hoc basis; how can such matching be formally analyzed? More generally, what is the underlying mathematical "skeleton" of optimization theory before the "flesh" of the probability distributions of a particular context and set of optimization problems are imposed? What can information theory and Bayesian analysis contribute to an understanding of these issues? How a priori generalizable are the performance results of a certain algorithm on a certain class of problems to its performance on other classes of problems? How should we even measure such generalization? How should we assess the performance of algorithms on problems so that we may programmatically compare those algorithms?

Broadly speaking, we take two approaches to these questions. First, we investigate what a priori restrictions there are on the pattern of performance of one or more algorithms as one runs over the set of all optimization problems. Our second approach is to instead focus on a particular problem and consider the effects of running over all algorithms. In the current paper we present results from both types of analyses, but concentrate largely on the first approach. The reader is referred to the companion paper [MW] for more kinds of analysis involving the second approach.

We begin in the next section by introducing the necessary notation. Also discussed in that section are the model of computation we adopt, its limitations, and the reasons we chose it.

One might expect that there are pairs of search algorithms A and B such that A performs better than B on average, even if B sometimes outperforms A. As an example, one might expect that hill-climbing usually outperforms hill-descending if one's goal is to find a maximum of the cost function. One might also expect it would outperform a random search in such a context.

One of the main results of this paper is that such expectations are incorrect. In the section following the preliminaries we prove two NFL theorems that demonstrate this and, more generally, illuminate the connection between algorithms and problems. Roughly speaking, we show that for both static and time-dependent optimization problems, the average performance of any pair of algorithms across all possible problems is exactly identical. This means in particular that if some algorithm a1's performance is superior to that of another algorithm a2 over some set of optimization problems, then the reverse must be true over the set of all other optimization problems. (The reader is urged to read that section carefully for a precise statement of these theorems.) This is true even if one of the algorithms is random; any algorithm a performs worse than randomly just as readily over the set of all optimization problems as it performs better than randomly. Possible objections to these results are also addressed in later sections.
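The following Python sketch is a minimal numerical illustration of this claim, not code from the paper. It enumerates every cost function on a toy search space (|X| = 4, |Y| = 3), runs two different deterministic, non-revisiting algorithms for three evaluations each, and checks that the best cost value found, averaged over all |Y|^|X| = 81 functions, is the same for both. The space sizes, the two toy algorithms, and the "best value found" performance measure are illustrative choices of ours, not the paper's.

```python
# A minimal illustration (ours, not the paper's) of the NFL claim above.
# Search space X = {0,1,2,3}, cost values Y = {0,1,2}.  We enumerate all
# |Y|^|X| = 81 cost functions f: X -> Y, run two different deterministic,
# non-revisiting algorithms for m = 3 evaluations each, and compare the
# best (lowest) cost found, averaged over every f.
from itertools import product

X = range(4)   # search space
Y = range(3)   # possible cost values
m = 3          # number of distinct function evaluations per run

def fixed_sweep(visited, seen):
    """Algorithm 1: visit the points of X in a fixed order, ignoring f."""
    return next(x for x in X if x not in visited)

def greedy_neighbour(visited, seen):
    """Algorithm 2: start at 0, then try to step to an unvisited neighbour
    (mod |X|) of the best point found so far -- an adaptive strategy."""
    if not visited:
        return 0
    best = min(visited, key=lambda x: seen[x])
    for cand in ((best + 1) % len(X), (best - 1) % len(X)):
        if cand not in visited:
            return cand
    return next(x for x in X if x not in visited)   # any fresh point

def best_found(algorithm, f):
    visited, seen = [], {}
    for _ in range(m):
        x = algorithm(visited, seen)
        visited.append(x)
        seen[x] = f[x]
    return min(seen.values())

for algorithm in (fixed_sweep, greedy_neighbour):
    scores = [best_found(algorithm, dict(enumerate(f)))
              for f in product(Y, repeat=len(X))]
    print(algorithm.__name__, sum(scores) / len(scores))
# Both lines print the same average (1/3 here): averaged over all cost
# functions, the adaptive strategy buys nothing, exactly as the NFL
# theorems require.
```

The same equality holds for any performance measure that depends only on the sampled cost values (as the "best value seen after m distinct evaluations" measure used here does).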
We then present a geometric interpretation of the NFL theorems. In particular, we show that an algorithm's average performance is determined by how "aligned" it is with the underlying probability distribution over optimization problems on which it is run. That section is critical for anyone wishing to understand how the NFL results are consistent with the well-accepted fact that many search algorithms that do not take into account knowledge concerning the cost function work quite well in practice.
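As a companion to the sketch above, and again as a toy example of ours rather than anything from the paper, the following snippet illustrates one concrete reading of the alignment statement: expected performance is an inner product between the vector of per-function performances and the prior P(f) over cost functions. Under a uniform P(f) two fixed sweeps tie, while under a hypothetical prior that up-weights functions whose minimum lies at x = 0, the sweep that samples x = 0 first is better aligned with the prior and wins. The specific prior and sweep directions are invented for the illustration.

```python
# Toy illustration (ours) of expected performance as an inner product with P(f).
from itertools import product

X, Y, m = range(4), range(3), 2

def sweep_up(visited, seen):
    """Visits 0, 1, 2, ... in order."""
    return next(x for x in X if x not in visited)

def sweep_down(visited, seen):
    """Visits 3, 2, 1, ... in order."""
    return next(x for x in reversed(X) if x not in visited)

def best_found(algorithm, f):
    visited, seen = [], {}
    for _ in range(m):
        x = algorithm(visited, seen)
        visited.append(x)
        seen[x] = f[x]
    return min(seen.values())          # lower is better

functions = [dict(enumerate(v)) for v in product(Y, repeat=len(X))]

# Two priors over the space F of cost functions:
uniform = [1.0 / len(functions)] * len(functions)
raw = [3.0 if f[0] == min(f.values()) else 1.0 for f in functions]  # favour minima at x = 0
biased = [w / sum(raw) for w in raw]

for name, P in (("uniform P(f)", uniform), ("biased P(f)", biased)):
    for algorithm in (sweep_up, sweep_down):
        # expected performance = <per-function performance vector, P(f)>
        expected = sum(p * best_found(algorithm, f) for p, f in zip(P, functions))
        print(name, algorithm.__name__, round(expected, 4))
# Under the uniform prior the two sweeps tie (the NFL situation); under the
# biased prior, sweep_up is better aligned with where P(f) puts the low-cost
# points and achieves a lower expected cost.
```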
We then demonstrate that the NFL theorems allow one to answer a number of what would otherwise seem to be intractable questions. The implications of these answers for measures of algorithm performance, and for how best to compare optimization algorithms, are explored next.

After that we discuss some of the ways in which, despite the NFL theorems, algorithms can have a priori distinctions that hold even if nothing is specified concerning the optimization problems. In particular, we show that there can be "head-to-head" minimax distinctions between a pair of algorithms, i.e., we show that, considered one f at a time, a pair of algorithms may be distinguishable, even if they are not when one looks over all f's.

We then present an introduction to the alternative approach to the formal analysis of optimization, in which problems are held fixed and one looks at properties across the space of algorithms. Since these results hold in general, they hold for any and all optimization problems, and in this are independent of what kinds of problems one is more or less likely to encounter in the real world. In particular, these results state that one has no a priori justification for using a search algorithm's behavior so far on a particular cost function to predict its future behavior on that function. In fact, when choosing between algorithms based on their observed performance, it does not suffice to make an assumption about the cost function; some currently poorly understood assumptions are also being made about how the algorithms in question are related to each other and to the cost function. In addition to presenting results not found in [MW], this discussion serves as an introduction to the perspective adopted in [MW].

We conclude with a brief discussion, a summary of results, and a short list of open problems. We have confined as many of our proofs to appendices as possible to facilitate the flow of the paper. A more detailed and substantially longer version of this paper, a version that also analyzes some issues not addressed here, can be found in [WM].

Finally, we cannot emphasize enough that no claims whatsoever are being made in this paper concerning how well various search algorithms work in practice. The focus of this paper is on what can be said a priori, without any assumptions and from mathematical principles alone, concerning the utility of a search algorithm.

2 Preliminaries

We restrict attention to combinatorial optimization, in which the search space X, though perhaps quite large, is finite. We further assume that the space of possible cost values Y is also finite. These restrictions are automatically met for optimization algorithms run on digital computers; for example, typically Y is some 32 or 64 bit representation of the real numbers in such a case. The sizes of the spaces X and Y are indicated by |X| and |Y| respectively.

Optimization problems f (sometimes called "cost functions" or "objective functions" or "energy functions") are represented as mappings f : X → Y. F = Y^X is then the space of all possible problems; F is of size |Y|^|X|, a very large but finite number. In addition to static f, we shall also be interested in optimization problems that depend explicitly on time. The extra notation needed for such time-dependent problems will be introduced as needed.

It is common in the optimization community to adopt an oracle-based view of computation. In this view, when assessing the performance of algorithms, results are stated in terms of the number of function evaluations required to find a certain solution. Unfortunately though, many optimization algorithms are wasteful of function evaluations. In particular, many algorithms do not remember where they have already searched and therefore often revisit the same points. Although any algorithm that is wasteful in this fashion can be made more efficient simply by remembering where it has been (cf. tabu search [Glo, Glo]), many real-world algorithms elect not to employ this stratagem. Accordingly