How to advance in Structural Convex Optimization

October, 2008

Abstract  In this paper we are trying to analyze the common features of the recent advances in Structural Convex Optimization: polynomial-time interior-point methods, the smoothing technique, minimization in relative scale, and minimization of composite functions.

Keywords  convex optimization · nonsmooth optimization · complexity theory · black-box model · optimal methods · structural optimization · smoothing technique

Subject Classification (2000)  90C06 · 90C22 · 90C25 · 90C60

Convex Optimization is one of the rare fields of Numerical Analysis which benefit from the existence of a well-developed complexity theory. In our domain, this theory was created in the middle of the seventies in a series of papers by A. Nemirovsky and D. Yudin (see [8] for a full exposition). It consists of three parts:
– Classification and description of problem instances.
– Lower complexity bounds.
– Optimal methods.

In [8], the complexity of a convex optimization problem was linked with its level of smoothness, introduced by Hölder conditions on the first derivatives of the functional components. It was assumed that the only information the optimization methods can learn about a particular problem instance is the values and derivatives of these components at some test points. This data can be reported by a special unit called the oracle, and it is local, which means that it does not change if the problem is modified far enough from the test point. This model of interaction between the optimization scheme and the problem data is called the local Black Box. At the time of its development, this concept fitted very well the existing computational practice, where the interface between general optimization packages and the problem data was established by Fortran subroutines created independently by the users.

The Black-Box framework allows us to speak about lower performance bounds for different problem classes in terms of informational complexity. This is a lower estimate for the number of calls of the oracle which is necessary for any optimization method in order to guarantee delivering an ε-solution to any problem from the problem class. In this performance measure we do not include at all the complexity of the auxiliary computations of the scheme.
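To make the black-box setting concrete, here is a minimal sketch (in Python, not taken from [8]) of such a local first-order oracle: the optimization scheme sees only the values and gradients returned at its test points, and the informational complexity simply counts these calls. All names below are illustrative.

    import numpy as np

    class FirstOrderOracle:
        """Local black-box oracle: reports f(x) and its gradient at a test point.

        The optimization scheme sees nothing else about the problem; the
        informational complexity is the number of calls made here.
        """

        def __init__(self, f, grad_f):
            self.f = f
            self.grad_f = grad_f
            self.calls = 0                       # number of oracle calls

        def __call__(self, x):
            self.calls += 1
            return self.f(x), self.grad_f(x)

    # Example: a smooth convex quadratic queried by a simple gradient scheme.
    A = np.array([[2.0, 0.0], [0.0, 10.0]])
    oracle = FirstOrderOracle(lambda x: 0.5 * x @ A @ x, lambda x: A @ x)

    x = np.array([1.0, 1.0])
    for _ in range(50):
        fx, gx = oracle(x)                       # the only interaction with the problem data
        x = x - 0.1 * gx                         # gradient step with a fixed step size

    print(oracle.calls, fx)                      # 50 calls; the value is close to the minimum 0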

This paper was written during the author's visit to IFOR (ETH, Zurich). The author expresses his gratitude to the Scientific Director of this center, Hans-Jacob Lüthi, for his support and excellent working conditions.

Yurii Nesterov
Center for Operations Research and Econometrics (CORE), Université catholique de Louvain (UCL), 34 voie du Roman Pays, 1348 Louvain-la-Neuve, Belgium.
Tel.: +32-10-474348. Fax: +32-10-474301. e-mail: [email protected]

Let us present these bounds for the most important classes of optimization problems posed in the form

    min_{x ∈ Q} f(x),                                                        (1)

where Q ⊆ R^n is a bounded closed convex set (‖x‖ ≤ R for all x ∈ Q), and the function f is convex on Q. In the table below, the first column indicates the problem class, the second one gives an upper bound for the allowed number of calls of the oracle in the optimization scheme¹, and the last column gives the lower bound for the analytical complexity of the problem class, which depends on the absolute accuracy ε and the class parameters.

    Problem class                            Calls of the oracle    Lower complexity bound
    C1:  |f(x) − f(y)| ≤ M ‖x − y‖                ≤ O(n)             O( M² R² / ε² )
    C2:  ‖∇f(x) − ∇f(y)‖ ≤ L ‖x − y‖              ≤ O(n)             O( (L R² / ε)^{1/2} )
    C3:  |f(x) − f(y)| ≤ M ‖x − y‖                no limit           O( n ln(MR/ε) )
                                                                                        (2)

¹ If this upper bound is smaller than O(n), then the dimension of the problem is really very big, and we cannot afford the method to perform this number of calls.

It is important that these bounds are exact. This means that there exist methods whose efficiency estimates on the corresponding problem classes are proportional to the lower bounds. The corresponding optimal methods were developed in [8,9,19,22,23]. For future reference, we present a simplified version of the optimal method [9]², as applied to problem (1) with f ∈ C2:

    Choose a starting point y_0 ∈ Q and set x_{−1} = y_0. For k ≥ 0 iterate:
        x_k = arg min_{x ∈ Q} { ⟨∇f(y_k), x − y_k⟩ + (L/2) ‖x − y_k‖² },
        y_{k+1} = x_k + (k/(k+3)) (x_k − x_{k−1}).                           (3)

² In method (11)-(13) from [9], we can set a_k = 1 + k/2, since in the proof we only need to ensure a_{k+1}² − a_k² ≤ a_{k+1}.

As we see, the complexity of each iteration of this scheme is comparable with that of the simplest gradient method. However, the rate of convergence of method (3) is much faster.
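For illustration, the sketch below implements a fast gradient scheme of the type (3) in the unconstrained case, where the auxiliary minimization reduces to an ordinary gradient step; the momentum coefficient k/(k+3) corresponds to the choice a_k = 1 + k/2 mentioned in the footnote. It is a sketch under these simplifying assumptions, not a verbatim transcription of the method of [9].

    import numpy as np

    def fast_gradient(grad_f, L, x0, iterations=300):
        """Sketch of a fast gradient scheme of the type (3), unconstrained case.

        grad_f : gradient of a convex function f with L-Lipschitz-continuous gradient
        L      : Lipschitz constant of the gradient
        """
        y = x0.copy()
        x_prev = x0.copy()                            # plays the role of x_{-1} = y_0
        for k in range(iterations):
            x = y - grad_f(y) / L                     # gradient step from y_k
            y = x + (k / (k + 3.0)) * (x - x_prev)    # momentum (a_k - 1)/a_{k+1} = k/(k+3)
            x_prev = x
        return x

    # Example: a badly conditioned convex quadratic f(x) = 0.5 <Ax, x> - <b, x>.
    A = np.diag(np.linspace(1.0, 100.0, 50))
    b = np.ones(50)
    x_star = np.linalg.solve(A, b)
    x = fast_gradient(lambda v: A @ v - b, L=100.0, x0=np.zeros(50))
    print(np.linalg.norm(x - x_star))                 # distance to the minimizer after 300 iterations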

After a certain period of time, it became clear that, despite its mathematical excellence, the Complexity Theory of Convex Optimization has a hidden drawback. Indeed, in order to apply convex optimization methods, we need to be sure that the functional components of our problem are convex. However, we can check convexity only by analyzing the structure of these functions³: if our function is obtained from basic convex functions by convex operations (summation, maximum, etc.), we conclude that it is convex. If not, then we have to apply general optimization methods, which usually do not have theoretical guarantees for their global performance. Thus, the functional components of the problem are not in the black box at the moment we check their convexity and choose a minimization scheme. However, we put them into the black box for the numerical methods. That is the main conceptual contradiction of standard Convex Optimization.

³ Numerical verification of convexity is an extremely difficult problem.

Intuitively, we always hope that the structure of the problem can be used for improving the performance of minimization schemes. Unfortunately, structure is a very fuzzy notion, which is quite difficult to formalize. One possible way to describe the structure is to fix the analytical type of the functional components. For example, we can consider problems with linear constraints only. It can help, but this approach is very fragile: if we add just a single constraint of another type, then we get a new problem class, and all the theory must be redone from scratch. On the other hand, it is clear that, having the structure at hand, we can play a lot with the analytical form of the problem. We can rewrite the problem in many equivalent settings using non-trivial transformations of variables or constraints, introducing additional variables, etc. However, this would serve almost no purpose without fixing a clear final goal. So, let us try to understand what it could be.

As usual, it is better to look at classical examples. In many situations the sequential reformulations of the initial problem can be seen as a part of the numerical scheme. We start from a complicated problem P and, step by step, change its structure up to the moment we get a trivial problem (or a problem which we know how to solve):

    P → … → (f*, x*).

A good example of such a strategy is the standard approach for solving a system of linear equations Ax = b. We can proceed as follows:
1. Check if A is symmetric and positive definite. Sometimes this is clear from the origin of the matrix.
2. Compute the Cholesky factorization of this matrix, A = L Lᵀ, where L is a lower-triangular matrix, and form two auxiliary systems Ly = b, Lᵀx = y.
3. Solve these systems by sequential exclusion of variables.
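In code, these three steps might look as follows (an illustration using standard NumPy/SciPy routines).

    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    def solve_spd(A, b):
        """Solve Ax = b for a symmetric positive definite A via A = L L^T."""
        L = cholesky(A, lower=True)                   # step 2: factorization
        y = solve_triangular(L, b, lower=True)        # step 3: solve L y = b
        return solve_triangular(L.T, y, lower=False)  #         solve L^T x = y

    # Example with a matrix that is symmetric positive definite by construction.
    rng = np.random.default_rng(0)
    B = rng.standard_normal((5, 5))
    A = B @ B.T + 5 * np.eye(5)                       # step 1: A is SPD
    b = rng.standard_normal(5)
    x = solve_spd(A, b)
    print(np.allclose(A @ x, b))                      # True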
Imagine for a moment that we do not know how to solve systems of linear equations. In order to discover the above scheme, we should apply the following

    Golden Rules
    1. Find a class of problems which can be solved very efficiently.
    2. Describe the transformation rules for converting the initial problem into the desired form.
    3. Describe the class of problems for which these transformation rules are applicable.        (4)

In our example, it is the class of linear systems with triangular matrices.

In Convex Optimization, these rules were already used several times for breaking down the limitations of Complexity Theory. Historically, the first example of that type is the theory of polynomial-time interior-point methods (IPM) based on self-concordant barriers. In this framework, the class of easy problems is formed by problems of unconstrained minimization of self-concordant functions, treated by the Newton method. This know-how is further used in the framework of path-following schemes for solving so-called standard minimization problems. Finally, it can be shown that by a simple barrier calculus this approach can be extended onto all convex optimization problems with known structure (see [11,18] for details). The efficiency estimates of the corresponding schemes are of the order of O(ν^{1/2} ln(ν/ε)) iterations of the Newton method, where ν is the parameter of the corresponding self-concordant barrier. Note that for many important feasible sets this parameter is smaller than the dimension of the space of variables. Hence, for the pure Black-Box schemes such an efficiency is simply unreachable in view of the lower complexity bound for the class C3 (see (2)). It is interesting that formally the modern IPMs look very similar to the usual Black-Box schemes (the Newton method plus a path-following approach), which were developed in the very early days of Nonlinear Optimization [4]. However, this is just an illusion. For the complexity analysis of polynomial-time IPM, it is crucial that they employ special barrier functions which do not satisfy the local Black-Box assumptions (see [10] for discussion).
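As a toy illustration of the path-following idea, the sketch below minimizes ⟨c, x⟩ over a polyhedron {x : Ax ≤ b} by tracing the central path of the logarithmic barrier F(x) = −Σ_i ln(b_i − ⟨a_i, x⟩), a self-concordant barrier with parameter ν = m, using damped Newton steps. It only illustrates the mechanism and is not an implementation of the schemes analyzed in [11,18].

    import numpy as np

    def barrier_path_following(c, A, b, x0, eps=1e-6, t0=1.0):
        """Toy path-following scheme for  min <c, x>  s.t.  Ax <= b.

        Follows the minimizers of  t*<c, x> + F(x)  with the log-barrier
        F(x) = -sum(log(b - Ax)); x0 must be strictly feasible (A x0 < b).
        """
        m = A.shape[0]
        x, t = x0.astype(float), t0
        while m / t > eps:                              # duality-gap style stopping rule
            for _ in range(50):                         # re-center for the current t
                s = b - A @ x                           # slacks, stay strictly positive
                grad = t * c + A.T @ (1.0 / s)
                hess = A.T @ (A / s[:, None] ** 2)
                step = np.linalg.solve(hess, -grad)
                lam = np.sqrt(step @ hess @ step)       # Newton decrement
                x = x + step / (1.0 + lam)              # damped Newton step
                if lam < 1e-8:
                    break
            t *= 1.0 + 1.0 / (4.0 * np.sqrt(m))         # short-step increase of the penalty
        return x

    # Example: minimize x1 + x2 over the box 0 <= x <= 1, written as Ax <= b.
    A = np.vstack([np.eye(2), -np.eye(2)])
    b = np.array([1.0, 1.0, 0.0, 0.0])
    x = barrier_path_following(np.array([1.0, 1.0]), A, b, x0=np.array([0.5, 0.5]), eps=1e-4)
    print(x)                                            # close to the optimal vertex (0, 0)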

The second example of using the rules (4) needs more explanation. Due to certain circumstances, these results were discovered with a delay of twenty years. Perhaps they were too simple. Or maybe they are in a seemingly very sharp contradiction with the rigorously proved lower bounds of Complexity Theory.

Anyway, now everything looks almost evident. Indeed, in accordance with Rule 1 in (4), we need to find a class of very easy problems. And this class can be discovered directly in Table (2)! To see that, let us compare the complexity of the classes C1 and C2 for the accuracy of 1% (ε = 10⁻²). Note that in this case the accuracy-dependent factors in the efficiency estimates vary from ten to ten thousand. So, the natural question is: Can the easy problems from C2 help us somehow in finding an approximate solution to the difficult problems from C1?

And the evident answer is: Yes, of course! It is a simple exercise in Calculus to show that we can always approximate a Lipschitz-continuous nonsmooth convex function on a bounded convex set with a uniform accuracy ε > 0 by a smooth convex function with Lipschitz-continuous gradient. We pay for the accuracy of the approximation by a large Lipschitz constant M for the gradient, which should be of the order of O(1/ε). Putting this bound for M into the efficiency estimate of C2 in (2), we can see that, in principle, it is possible to minimize nonsmooth convex functions by oracle-based gradient methods with analytical complexity O(1/ε). But what about the Complexity Theory? It seems that it was proved that such efficiency is just impossible.⁴

⁴ It is easy to see that the standard subgradient methods for nonsmooth convex minimization indeed need O(1/ε²) operations to converge. Consider a univariate function f(x) = |x|, x ∈ R, and look at the subgradient process x_{k+1} = x_k − h_k · sign(x_k) with step sizes h_k of the order O(1/k^{1/2}). It is easy to see that the accuracy of this process after k iterations is only of the order O(1/k^{1/2}). However, this step-size sequence is optimal [8].

It is interesting that in fact we do not get any contradiction. Indeed, in order to minimize a smooth approximation of a nonsmooth function by an oracle-based scheme, we need to change the initial oracle. Therefore, from the mathematical point of view, we violate the Black-Box assumption. On the other hand, in the majority of practical applications this change is not difficult. Usually we can work directly with the structure of our problem, at least in the cases when it is created by us.

Thus, the basis of the smoothing technique [12,13] is formed by two ingredients: the above observation, and a trivial but systematic way of approximating a nonsmooth function by a smooth one. This can be done for convex functions represented explicitly in a max-form:

    f(x) = max_{u ∈ Q_d} { ⟨Ax − b, u⟩ − φ(u) },

where Q_d is a bounded and convex dual feasible set and φ(u) is a convex function. Then, choosing a nonnegative strongly convex function d(u), we can define a smooth function

    f_μ(x) = max_{u ∈ Q_d} { ⟨Ax − b, u⟩ − φ(u) − μ·d(u) },                  (5)

which approximates the initial objective. Indeed, denoting D_d = max_{u ∈ Q_d} d(u), we get

    f(x) ≥ f_μ(x) ≥ f(x) − μ D_d.

At the same time, the gradient of the function f_μ is Lipschitz-continuous with a Lipschitz constant of the order of O(1/μ) (see [12] for details).

Thus, we can see that for an implementable definition (5) we get a possibility to solve problem (1) in O(1/ε) iterations of the fast gradient method (3). In order to see the magnitude of the improvement, let us look at the following example:

    min_{x ∈ Δ_n}  f(x) = max_{1 ≤ j ≤ m} ⟨a_j, x⟩,                          (6)

where Δ_n ⊂ R^n is the standard simplex. Then the properly implemented smoothing technique ensures the following rate of convergence:

    f(x_k) − f* ≤ O( (ln n · ln m)^{1/2} / k ).

If we apply to problem (6) the standard subgradient methods (e.g. [14]), we can guarantee only

    f(x_k) − f* ≤ O( (ln n / k)^{1/2} )

(in both estimates the constants depend on the magnitude of the vectors a_j). Thus, up to a logarithmic factor, for obtaining the same accuracy, the methods based on the smoothing technique need only a square root of the number of iterations of the usual subgradient scheme. Taking into account that usually the subgradient methods are allowed to run many thousands or even millions of iterations, the gain of the smoothing technique in computational time can be enormously big.

It is interesting that for problem (6) the computation of the smooth approximation is very cheap. Indeed, let us use for smoothing the entropy function

    d(u) = ln m + Σ_{j=1}^{m} u_j ln u_j,   u ∈ Δ_m.

Then the smooth approximation (5) of the objective function in (6) has the following compact representation:

    f_μ(x) = μ ln( (1/m) Σ_{j=1}^{m} exp( ⟨a_j, x⟩ / μ ) ).

Thus, the complexity of the oracle for f(x) and f_μ(x) is similar. Note that again, as in the polynomial-time IPM theory, we apply a standard oracle-based method ((3) in this case) to a function which does not satisfy the Black-Box assumptions.
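The following sketch evaluates this smooth approximation and its gradient for problem (6); the gradient is the softmax combination of the rows a_j, so the cost is indeed comparable to one evaluation of f itself. (The function below is an illustration written for this note, not code from [12].)

    import numpy as np

    def smoothed_max(A, x, mu):
        """Entropy smoothing of f(x) = max_j <a_j, x>, the a_j being the rows of A.

        Returns f_mu(x) = mu * log( (1/m) * sum_j exp(<a_j, x>/mu) ) and its
        gradient A^T u(x), where u(x) is the softmax of the values <a_j, x>/mu.
        """
        m = A.shape[0]
        z = A @ x / mu
        w = np.exp(z - z.max())                  # shift by the maximum for numerical stability
        f_mu = mu * (np.log(w.sum()) + z.max() - np.log(m))
        grad = A.T @ (w / w.sum())               # a convex combination of the rows a_j
        return f_mu, grad

    # Sanity check of the approximation property  f(x) - mu*ln(m) <= f_mu(x) <= f(x).
    rng = np.random.default_rng(1)
    A = rng.standard_normal((20, 10))
    x = rng.standard_normal(10)
    mu = 0.1
    f_val = (A @ x).max()
    f_mu, _ = smoothed_max(A, x, mu)
    print(f_val - mu * np.log(20) <= f_mu <= f_val)   # True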

An inexplicable blindness to the possibility of reducing the complexity of nonsmooth optimization problems with known structure is not restricted to the smoothing technique only. As was shown in [7], very similar results can be obtained by the extra-gradient method of G. Korpelevich [6], using the fact that this method is optimal for the class of variational inequalities with a Lipschitz-continuous operator (for these problems it converges as O(1/k)). Actually, in a verbal form, the optimality of the extra-gradient method was known already for a couple of decades. However, a rigorous proof of this important fact and a discussion of its consequences for Structural Nonsmooth Optimization were published only in [7], after the discovery of the smoothing technique.
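For illustration, here is a sketch of the extra-gradient iteration applied to the saddle-point form of problem (6), min_{x ∈ Δ_n} max_{u ∈ Δ_m} ⟨Ax, u⟩, whose operator F(x, u) = (Aᵀu, −Ax) is Lipschitz continuous and monotone. Euclidean projections onto the simplex are used purely for simplicity; this is a plain Korpelevich-type scheme, not the prox-method of [7].

    import numpy as np

    def project_simplex(v):
        """Euclidean projection of v onto the standard simplex."""
        u = np.sort(v)[::-1]
        css = np.cumsum(u) - 1.0
        rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
        return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

    def extragradient(A, steps=2000, gamma=None):
        """Extra-gradient method for  min_{x in simplex} max_{u in simplex} <Ax, u>."""
        m, n = A.shape
        if gamma is None:
            gamma = 0.5 / np.linalg.norm(A, 2)          # step size below 1/Lipschitz constant
        x, u = np.ones(n) / n, np.ones(m) / m
        x_avg, u_avg = np.zeros(n), np.zeros(m)
        for _ in range(steps):
            # predictor step at the current point
            x_half = project_simplex(x - gamma * (A.T @ u))
            u_half = project_simplex(u + gamma * (A @ x))
            # corrector step, with the operator evaluated at the predictor point
            x, u = (project_simplex(x - gamma * (A.T @ u_half)),
                    project_simplex(u + gamma * (A @ x_half)))
            x_avg += x_half
            u_avg += u_half
        return x_avg / steps, u_avg / steps             # averaged iterates

    rng = np.random.default_rng(2)
    A = rng.standard_normal((15, 10))
    x, u = extragradient(A)
    print((A @ x).max() - (A.T @ u).min())              # duality gap, close to zero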
To conclude this section, let us discuss the last example of acceleration strategies in Structural Optimization. Consider the problem of minimizing the composite objective function:

    min_x { f(x) + Ψ(x) },                                                   (7)

where the function f is a convex differentiable function on dom Ψ with a Lipschitz-continuous gradient, and the function Ψ is an arbitrary closed convex function. Since Ψ can be even discontinuous, in general this problem is very difficult. However, if we assume that the function Ψ is simple, then the situation is changing. Indeed, suppose that for any ȳ ∈ dom Ψ we are able to solve explicitly the following auxiliary optimization problem:

    min_x { ⟨∇f(ȳ), x − ȳ⟩ + (L/2) ‖x − ȳ‖² + Ψ(x) }                         (8)

(compare with (3)). Then it becomes possible to develop for problem (7) fast gradient methods (similar to (3)), which have a rate of convergence of the order O(1/k²) (see [15] for details; a similar technique was developed in [3]). Note that the formulation (7) can also be seen as a part of Structural Optimization, since we use the knowledge of the structure of its objective function directly in the optimization methods.
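For instance, if Ψ(x) = λ‖x‖₁, the auxiliary problem (8) is solved coordinate-wise by soft-thresholding, so one composite gradient step is as cheap as a usual gradient step. The sketch below shows this building block inside a plain (non-accelerated) loop; it illustrates (8) and is not the accelerated method of [15].

    import numpy as np

    def composite_step(x, grad, L, lam):
        """Solve (8) for Psi(y) = lam*||y||_1:
        argmin_y { <grad, y - x> + (L/2)||y - x||^2 + lam*||y||_1 },
        i.e. soft-thresholding of the gradient step x - grad/L."""
        z = x - grad / L
        return np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)

    # Example: min 0.5*||Ax - b||^2 + lam*||x||_1, solved by plain composite gradient steps.
    rng = np.random.default_rng(3)
    A = rng.standard_normal((40, 100))
    x_true = np.zeros(100)
    x_true[:5] = 1.0
    b = A @ x_true
    L = np.linalg.norm(A, 2) ** 2                 # Lipschitz constant of the smooth part
    lam = 0.1

    x = np.zeros(100)
    for _ in range(500):
        grad = A.T @ (A @ x - b)                  # gradient of the smooth part f
        x = composite_step(x, grad, L, lam)

    print(np.count_nonzero(np.abs(x) > 1e-3))     # a sparse approximate solution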

Conclusion

In this paper, we have considered several examples of significant acceleration of the usual oracle-based methods. Note that the achieved progress is visible only because of the supporting complexity analysis. It is interesting that all these methods have some prototypes proposed much earlier:
− Optimal method (3) is very similar to the heavy point method:

    x_{k+1} = x_k − α ∇f(x_k) + β (x_k − x_{k−1}),

  where α and β are some fixed positive coefficients (see [20] for historical details).
− Polynomial-time IPM are very similar to some variants of the classical barrier methods [4].
− The idea to apply smoothing for solving minimax problems is also not new (see [21] and the references therein).

At certain moments of time these ideas were quite new and attractive. However, they did not result in a significant change in computational practice, since they were not provided with a convincing complexity analysis. Indeed, many other schemes have similar theoretical justifications, and it was not clear at all why these particular suggestions deserve more attention. Moreover, even now, when we know that the modified variants of some old methods give excellent complexity results, we cannot say too much about the theoretical efficiency of the original schemes.

Thus, we have seen that in Convex Optimization the complexity analysis plays an important role in selecting the promising optimization methods among hundreds of others. Of course, it is based on an investigation of the worst-case situation. However, even this limited help is important for choosing promising directions for further research. This is true especially now, when the development of Structural Optimization makes the problem settings and the corresponding efficiency estimates more and more interesting and diverse.

The size of this paper does not allow us to discuss other interesting settings of Structural Convex Optimization (e.g. optimization in relative scale [16,17]). However, we hope that even the presented examples can help the reader to find new and interesting research directions in this promising field (see, for example, [1,2,5]).

References
1. A. d'Aspremont, O. Banerjee, and L. El Ghaoui. First-Order Methods for Sparse Covariance Selection. SIAM Journal on Matrix Analysis and its Applications, 30(1), 56-66 (2008).
2. O. Banerjee, L. El Ghaoui, and A. d'Aspremont. Model Selection Through Sparse Maximum Likelihood Estimation. Journal of Machine Learning Research, 9, 485-516 (2008).
3. A. Beck and M. Teboulle. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. Research Report, Technion (2008).
4. A.V. Fiacco and G.P. McCormick. Nonlinear Programming: Sequential Unconstrained Minimization Techniques. John Wiley, New York, 1968.
5. S. Hoda, A. Gilpin, and J. Pena. Smoothing techniques for computing Nash equilibria of sequential games. Research Report, Carnegie Mellon University (2008).
6. G.M. Korpelevich. The extragradient method for finding saddle points and other problems. Matecon 12, 747-756 (1976).
7. A. Nemirovski. Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM Journal on Optimization, 15, 229-251 (2004).
8. A. Nemirovsky and D. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley, New York, 1983.
9. Yu. Nesterov. A method for unconstrained convex minimization problem with the rate of convergence O(1/k²). Doklady AN SSSR (translated as Soviet Mathematics Doklady), 269(3), 543-547 (1983).
10. Yu. Nesterov. Interior-point methods: An old and new approach to nonlinear programming. Mathematical Programming, 79(1-3), 285-297 (1997).
11. Yu. Nesterov. Introductory Lectures on Convex Optimization. Kluwer, Boston, 2004.
12. Yu. Nesterov. Smooth minimization of non-smooth functions. CORE Discussion Paper 2003/12 (2003). Published in Mathematical Programming, 103(1), 127-152 (2005).
13. Yu. Nesterov. Excessive gap technique in nonsmooth convex minimization. SIAM Journal on Optimization, 16(1), 235-249 (2005).
14. Yu. Nesterov. Primal-dual subgradient methods for convex problems. Mathematical Programming (2007).
15. Yu. Nesterov. Gradient methods for minimizing composite objective function. CORE Discussion Paper 2007/76 (2007).
16. Yu. Nesterov. Rounding of convex sets and efficient gradient methods for linear programming problems. Optimization Methods and Software, 23(1), 109-135 (2008).
17. Yu. Nesterov. Unconstrained convex minimization in relative scale. Accepted by Mathematics of Operations Research.
18. Yu. Nesterov, A. Nemirovskii. Interior point polynomial methods in convex programming: Theory and Applications. SIAM, Philadelphia, 1994.
19. B.T. Polyak. A general method of solving extremum problems. Soviet Mat. Dokl. 8, 593-597 (1967).
20. B. Polyak. Introduction to Optimization. Optimization Software, New York, 1987.
21. R. Polyak. Smooth Optimization Methods for Minimax Problems. SIAM J. Control and Optimization, 26(6), 1274-1286 (1988).
22. N.Z. Shor. Minimization Methods for Nondifferentiable Functions. Springer-Verlag, Berlin, 1985.
23. S.P. Tarasov, L.G. Khachiyan, and I.I. Erlikh. The Method of Inscribed Ellipsoids. Soviet Mathematics Doklady, 37, 226-230 (1988).