Can We Measure the Difficulty of an Optimization Problem?
Total Page:16
File Type:pdf, Size:1020Kb
Can we measure the difficulty of an optimization problem? Tansu Alpcan Tom Everitt Marcus Hutter Dept. of Electrical and Electronic Engineering Department of Mathematics Research School of Computer Science The University of Melbourne, Australia Stockholm University Australian National University Email: [email protected] Email: [email protected] Email: [email protected] Abstract—Can we measure the difficulty of an optimization The goal of this paper is to develop a framework for problem? Although optimization plays a crucial role in modern measuring difficulty of optimization problems. The conceptual science and technology, a formal framework that puts problems framework introduced builds upon Shannon and Algorithmic and solution algorithms into a broader context has not been established. This paper presents a conceptual approach which Information Theories [9], [10] and establishes a link between gives a positive answer to the question for a broad class of opti- probabilistic and algorithmic approaches to optimization. A mization problems. Adopting an information and computational distinctive feature of the proposed framework is the explicit perspective, the proposed framework builds upon Shannon and focus on algorithmic information for measuring (open-box) algorithmic information theories. As a starting point, a concrete optimization difficulty. model and definition of optimization problems is provided. Then, a formal definition of optimization difficulty is introduced which The outline of the paper is as follows. The next section builds upon algorithmic information theory. Following an initial presents the preliminary definitions and the adopted model. analysis, lower and upper bounds on optimization difficulty Section III contains the main definitions and results, including are established. One of the upper-bounds is closely related to upper and lower bounds on the optimization difficulty measure Shannon information theory and black-box optimization. Finally, introduced. The paper concludes with a discussion on future various computational issues and future research directions are discussed. research directions in Section IV. II. DEFINITIONS AND MODEL I. INTRODUCTION This paper studies mathematical optimization problems [11] Considering the broad applicability of optimization as a that are commonly expressed as discipline, it is not surprising that the difficulty of optimization max f(x) subject to gi(x) ≤ 0; i = 1; : : : ; m; (1) problems has been investigated before, e.g. [1], [2]. Likewise, x the (No) Free Lunch Theorems which explore the connection where the n-dimensional real-valued vector x 2 Rn is the between effective optimization algorithms and the problems decision variable, the function f is the objective function and they solve have generated a lot of interest in the research gi are the constraint functions. The list of constraints define community [3]–[5]. the solution or search space A which is a assumed to be a While the work [1] has presented an excellent overview of compact subset of the n-dimensional real space, A ⊂ Rn. various aspects which make an optimization problem “diffi- As a starting point for developing an algorithmic and cult”, it has not presented a unifying conceptual framework for information-theoretic characterization of optimization diffi- measuring “difficulty”. The complexity of optimization prob- culty, it is assumed here that the function f is Lipschitz- lems has been discussed in [6]. The discussion on (No) Free continuous on A and the optimization problem (1) admits a ∗ Lunch Theorems [3]–[5], [7] is more focused on performance feasible global solution x = arg maxx2A f(x). of algorithms for various problems rather than analyzing For a given scalar " > 0 and compact set A, let A(") difficulty. be an "-discretization of A constructed using the following Moreover, Free Lunch Theorems focus on black-box opti- procedure. Let C be a finite covering of A with hypercubes of mization problems where the objective function is not known side length at most ". For each cube C 2 C, let xC 2 C \ A. to the optimizer unlike their counterpart “open-box” problems Finally, let A(") be the set of all xC , C 2 C. Thus, A(") is considered in this paper where the objective function is known a finite subset of A, with the same cardinality as C. If the and its properties are actively used to find the solution. The pa- cubes in C do not overlap, we call discretization based on C per [4] partly discusses complexity-based ranking of problems non-overlapping. following an algorithmic information approach similar to the To allow for a computational treatment of optimiza- one proposed here, however, does not aim to create a broader tion problems, an encoding of problems as (binary) strings framework. The paper [8] has proposed the use of instance must be chosen. The “standard calculus symbols” we base and Kolmogorov complexities as an estimator of difficulty for these descriptions on are: finite precision real numbers black-box optimization problems. 0, 1:354;::: ; variables x1; x2;::: ; elementary functions +; ·; exp;::: ; parenthesis; relations ≤; =;::: .A function Correctness of Solutions n ! is an expression formed by elementary functions, real R R To ensure correctness, any alleged δ - argmax solution is numbers and the variables x ; : : : ; x , and a constraint on n " 1 n R paired with a polynomially verifiable certificate s of the the is a formula on the form g(x ; : : : ; x ) ≤ 0 with g : n ! . 1 n R R correctness. In general, the trace (step-by-step reporting) of Binary representations of functions and constraints may then a correct optimization algorithm forms one example of a be reached in a straightforward manner by giving each symbol (linearly verifiable) certificate of the returned argmax. To a binary encoding (e.g. ASCII). Through concatenation of the verify such a certificate, it suffices to check that each step symbol encodings, functions and constraints receive natural of the trace corresponds to the definition of the algorithm, and binary encodings as well. If e is an expression, let `(e) denote that the final step of the trace outputs the proposed argmax. the length of its binary encoding. A general type of certificates (not specific to a particular Based on the model and assumptions introduced, a formal class or optimization algorithm) may for example be based on definition of the optimization problem (1) is provided next. formal proofs in first-order logic or type-theory [12]. Indeed, Definition II.1 (Optimization problem). A (discretizable) opti- many automated theorem proving systems have developed mization problem on Rn is a tuple hf; c; "i where f : Rn ! R formalizations of analysis [13], which could potentially form is the objective function, c is a list of constraints on Rn the basis of a suitable proof system. expressed using functional (in)equalities, and " > 0 is a Definition II.3 (Verifiable Solutions). Consider the optimiza- discretization parameter. tion problem in Definition II.1, and a suitable proof system The constraints delineate the search space A, over which f T offering polynomially verifiable certificates s of candidate is to be optimized. For simplicity, assume that A is non-empty solutions. A solution of the optimization problem hf; c; "i is and compact and that f is Lipschitz-continuous over A. Let defined as a pair hx∗; si where s is a certificate in T that x∗ A(") " A be an -discretization of . is a δ"- argmax for hf; c; "i. An argmax of f on A is a point x∗ 2 A satisfying 8x 2 ∗ It is beyond the scope of this paper to describe a suitable A : f(x) ≤ f(x ).A discrete argmax (or δ"- argmax) is a proof system T in detail. The paper will instead rely on semi- point x^ satisfying 8x 2 A : f(x) ≤ f(^x) + δ" where δ" = maxfjf(x) − f(^x)j : kx − x^k < "g and k·k is the maximum formal proof sketches in examples, and polynomial verifiabil- norm. ity in abstract arguments. For concreteness, we will assume that certificates in T can be verified in time dnq. That is, we assume the existence of a verifier for certificates in T with Discretized and Approximate Solutions runtime at most dnq for certificates of length n. The definition above accepts δ - argmax (rather than true " Illustrative Example argmax) as a solution to the constrained optimization problem. The discretization parameter " then effectively states how Consider the optimization problem close the discrete argmax needs to be to the true argmax. 2 The somewhat involved definition of δ"- argmax also allows hf(x) = 3x + x; x ≥ −1; x ≤ 1; " = 0:001i : (2) points further than " away from the correct answer, as long as those points differ less in their target value than some point A solution is δ"- argmax = 1. The following certificate within " of the argmax. Under the adopted Lipschitz-continuity sketch establishes the correctness of the solution. Note that assumption on objective functions, if the Lipschitz constant is the certificate below is not formal. However, a suitable formal k, then the desired solution differs at most δ in target value system should yield certificates of similar size. from the optimum, if one chooses " = δ=k. 1) df=dx = 6x + 1 (derivative) The following is a sufficient condition for a point x^ in 2) 6x+1 = 0 () x = −1=6 (properties of real numbers) the discrete search space A(") being a discrete argmax: If 3) roots(df=dx) = {−1=6g (from 1 and 2) it holds for all x 2 A(") that f(x) ≤ f(^x), then x^ must be a 4) boundary = {−1; 1g (from c) δ"- argmax. 5) x 62 roots(df=dx)^x 62 boundary(c) =): argmax(x) Next, a formal definition of non-trivial problems is provided. (calculus) Non-trivial problems will be the focus of this paper. 6) argmax = −1=6 _ argmax = −1 _ argmax = 1 (from 3–5) Definition II.2 (Non-trivial problems).