Université Joseph Fourier
Master de Mathématiques Appliquées, 2ème année

EFFICIENT METHODS IN OPTIMIZATION
Théorie de la programmation non-linéaire

Anatoli Iouditski

Lecture Notes

Optimization problems arise naturally in many application fields. Whatever people do, at some point they get a craving for organizing things in the best possible way. This intention, converted into mathematical form, turns out to be an optimization problem of a certain type (think of, say, the Optimal Diet Problem). Unfortunately, the next step, finding a solution to the mathematical model, is less trivial. At first glance everything looks very simple: many commercial optimization packages are easily available, and any user can get a "solution" to his model just by clicking on an icon on the desktop of his PC. However, the question is how much such a "solution" can be trusted. One of the goals of this course is to show that, despite their attraction, general optimization problems very often defeat the expectations of a naive user. In order to apply these formulations successfully, it is necessary to be aware of some theory which tells us what we can and what we cannot do with optimization problems. The elements of this theory can be found in each lecture of the course. The course itself is based on the lectures given by Arkadi Nemirovski at the Technion in the late 1990s. On the other hand, all the errors and inanities you may find here should be put on the account of the name on the title page.

http://www-ljk.imag.fr/membres/Anatoli.Iouditski/cours/optimisation-convexe.htm

Contents

1 Introduction  5
  1.1 General formulation of the problem  5
    1.1.1 Problem formulation and terminology  5
    1.1.2 Performance of Numerical Methods  8
  1.2 Complexity bounds for Global Optimization  10
  1.3 Identity cards of the fields  15
  1.4 Rules of the game  17
  1.5 Suggested reading  17

2 When everything is simple: 1-dimensional Convex Optimization  19
  2.1 Example: one-dimensional convex problems  19
    2.1.1 Upper complexity bound  20
    2.1.2 Lower complexity bound  24
  2.2 Conclusion  27
  2.3 Exercises  27
    2.3.1 Can we use 1-dimensional optimization?  27
    2.3.2 Extremal Ellipsoids  30

3 Methods with linear convergence  37
  3.1 Class of general convex problems: description and complexity  37
  3.2 Cutting Plane scheme and Center of Gravity Method  39
    3.2.1 Case of problems without functional constraints  39
  3.3 The general case: problems with functional constraints  44
  3.4 Lower complexity bound  48
  3.5 The Ellipsoid method  54
    3.5.1 Ellipsoids  55
    3.5.2 The Ellipsoid method  57
  3.6 Exercises  60
    3.6.1 Some extensions of the Cutting Plane scheme  60
    3.6.2 The method of outer simplex  71

4 Large-scale optimization problems  73
  4.1 Goals and motivations  73
  4.2 The main result  74
  4.3 Upper complexity bound: Subgradient Descent  76
  4.4 The lower bound  79
  4.5 Subgradient Descent for Lipschitz-continuous convex problems  80
  4.6 Bundle methods  84
    4.6.1 The Level method  86
    4.6.2 Concluding remarks  90
  4.7 Exercises  92
    4.7.1 Around Subgradient Descent  92
    4.7.2 Mirror Descent  96
    4.7.3 Stochastic Approximation  101
    4.7.4 The Stochastic Approximation method  105

5 Nonlinear programming: Unconstrained Minimization  109
  5.1 Relaxation  109
  5.2 Gradient method  110
  5.3 Newton method  117
    5.3.1 Gradient Method and Newton Method: What is different?  121
  5.4 Newton Method and Self-Concordant Functions  125
    5.4.1 Preliminaries  125
    5.4.2 Self-concordance  126
    5.4.3 Self-concordant functions and the Newton method  127
    5.4.4 Self-concordant functions: applications  130
  5.5 Conjugate gradients method  133
  5.6 Exercises  137
    5.6.1 Implementing Gradient method  137
    5.6.2 Regular functions  139
    5.6.3 Properties of the gradient  139
    5.6.4 Classes of differentiable functions  142

6 Constrained Minimization  145
  6.1 Penalty and Barrier function methods  145
  6.2 Self-concordant barriers and path-following scheme  150
    6.2.1 Path-following scheme  151
    6.2.2 Applications  156
    6.2.3 Concluding remarks  158

Lecture 1
Introduction

(General formulation of the problem; Important examples; Black Box and Iterative Methods; Analytical and Arithmetical Complexity; Uniform Grid Method; Lower complexity bounds; Lower bounds for Global Optimization; Rules of the Game.)

1.1 General formulation of the problem

Let us start by fixing the mathematical form of our main problem and the standard terminology.

1.1.1 Problem formulation and terminology

Let x be an n-dimensional real vector, x = (x^(1), ..., x^(n)) ∈ R^n, let G be a subset of R^n, and let f(x), g_1(x), ..., g_m(x) be real-valued functions of x. Throughout the course we deal with variants of the following minimization problem:

$$
\begin{array}{rl}
\min & f(x),\\
\text{s.t.:} & g_j(x) \;\&\; 0, \quad j = 1,\dots,m,\\
 & x \in G,
\end{array}
\qquad (1.1.1)
$$

where \& could be ≤, ≥ or =. We call f(x) the objective function, the vector function g(x) = (g_1(x), ..., g_m(x)) is called the functional constraint, the set G is called the basic feasible set, and the set

$$
Q = \{\, x \in G \mid g_j(x) \le 0,\; j = 1,\dots,m \,\}
$$

is called the feasible set of the problem (1.1.1).¹

¹That is just a convention, to consider minimization problems. Instead, we could consider a maximization problem with the objective function −f(x).
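To make the formulation concrete, here is a minimal sketch, not part of the original notes, of how a small instance of (1.1.1) with "≤" constraints can be posed to a generic numerical solver. The objective f, the single constraint g_1 and the box taken for G are invented for illustration, and SciPy's SLSQP routine merely plays the role of the off-the-shelf "black box" package mentioned in the preface.

```python
# A minimal sketch of problem (1.1.1) with "<=" constraints, posed to a generic
# solver.  The objective f, the constraint g_1 and the box chosen for G are
# purely illustrative; SciPy's SLSQP method stands in for an off-the-shelf
# optimization package.
import numpy as np
from scipy.optimize import minimize

def f(x):                                   # objective f(x)
    return (x[0] - 1.0) ** 2 + (x[1] + 0.5) ** 2

def g1(x):                                  # functional constraint: g_1(x) <= 0
    return x[0] + x[1] - 1.0

x0 = np.zeros(2)                            # starting point
result = minimize(
    f, x0, method="SLSQP",
    bounds=[(-2.0, 2.0), (-2.0, 2.0)],      # the basic feasible set G (a box here)
    constraints=[{"type": "ineq", "fun": lambda x: -g1(x)}],  # SciPy expects fun(x) >= 0
)
print(result.x, result.fun)                 # approximate minimizer and its objective value
```

Whether such a black-box answer can be trusted is exactly the question raised above, and it is the subject of the complexity analysis developed in the course.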
There is a natural classification of the types of minimization problems:

• Constrained problems: Q ⊂ R^n.

• Unconstrained problems: Q ≡ R^n.

• Smooth problems: all g_j(x) are differentiable.
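As an illustration of the reformulation in Example 1.1.2 (not part of the original notes), the sketch below recasts a small invented system g_j(x) = a_j as the minimization of the sum of squared residuals and hands it to SciPy's generic least-squares routine; the two equations and the right-hand sides a_1, a_2 are assumptions made only for this example.

```python
# Illustrative sketch of Example 1.1.2: the system g_j(x) = a_j is recast as
#   min_x  sum_j (g_j(x) - a_j)^2,
# and solved with a generic least-squares routine.  The two equations and the
# right-hand sides a_1, a_2 below are invented for the example.
import numpy as np
from scipy.optimize import least_squares

a = np.array([2.0, 0.0])                    # right-hand sides a_1, a_2

def residuals(x):                           # the components g_j(x) - a_j
    g1 = x[0] ** 2 + x[1] ** 2              # g_1(x)
    g2 = x[0] - x[1]                        # g_2(x)
    return np.array([g1 - a[0], g2 - a[1]])

result = least_squares(residuals, x0=np.array([0.5, 0.5]))
print(result.x)                             # approximate solution of the system
print(np.sum(result.fun ** 2))              # residual sum of squares at the solution
```

A sum of squares that is (numerically) zero certifies that the reformulated minimization has indeed produced a solution of the original system; a strictly positive optimal value signals that the system is inconsistent or that the solver got stuck in a local minimum.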
• Nonsmooth problems: there is a nondifferentiable component g_k(x).

• Linearly constrained problems: all functional constraints are linear,

$$
g_j(x) = \sum_{i=1}^{n} a_{i,j}\, x^{(i)} + b_j \equiv \langle a_j, x \rangle + b_j, \qquad j = 1,\dots,m
$$

(here ⟨·, ·⟩ stands for the inner product in R^n), and G is a polyhedron. If f(x) is also linear, then (1.1.1) is a Linear Programming Problem. If f(x) is quadratic, then (1.1.1) is a Quadratic Programming Problem.

There is also a classification according to the properties of the feasible set:

• Problem (1.1.1) is called feasible if Q ≠ ∅.

• Problem (1.1.1) is called strictly feasible if there exists x ∈ int Q such that g_j(x) < 0 (or > 0) for all inequality constraints and g_j(x) = 0 for all equality constraints.

Finally, we distinguish different types of solutions to (1.1.1):

• x* is called the optimal global solution to (1.1.1) if f(x*) ≤ f(x) for all x ∈ Q (global minimum). Then f(x*) is called the optimal value of the problem.

• x* is called a local solution to (1.1.1) if f(x*) ≤ f(x) for all x ∈ int Q̄ ⊂ Q, where Q̄ is some subset of Q containing x* (local minimum).

Let us now consider several examples demonstrating the origin of optimization problems.

Example 1.1.1 Let x^(1), ..., x^(n) be our design or decision variables. Then we can fix some functional characteristics of our decision: f(x), g_1(x), ..., g_m(x). These could be the price of the project, the amount of required resources, the reliability of the system, and many others. We fix the most important characteristic, f(x), as our objective. For all the others we impose bounds: a_j ≤ g_j(x) ≤ b_j. Thus, we come up with the problem

$$
\begin{array}{rl}
\min & f(x),\\
\text{s.t.:} & a_j \le g_j(x) \le b_j, \quad j = 1,\dots,m,\\
 & x \in G,
\end{array}
$$

where G stands for the structural constraints, such as positivity or boundedness of some variables.

Example 1.1.2 Let our initial problem be as follows: find x ∈ R^n such that

$$
\begin{array}{c}
g_1(x) = a_1,\\
\vdots\\
g_m(x) = a_m.
\end{array}
\qquad (1.1.2)
$$

Then we can consider the problem

$$
\min_x \; \sum_{j=1}^{m} \big( g_j(x) - a_j \big)^2
$$

(maybe with some additional constraints on x). Note that the problem (1.1.2) is almost universal: it covers ordinary differential equations, partial differential equations, problems arising in Game Theory, and many others.

Example 1.1.3 Sometimes our decision variables x^(1), ..., x^(n) must be integer; say, we need x^(i) ∈ {0, 1}. That can be described by the constraints

$$
x^{(i)}\big(x^{(i)} - 1\big) = 0, \qquad i = 1,\dots,n.
$$

Thus, we could also treat {0,1} Programming Problems:

$$
\begin{array}{rl}
\min & f(x),\\
\text{s.t.:} & a_j \le g_j(x) \le b_j, \quad j = 1,\dots,m,\\
 & x \in G,\\
 & x^{(i)}\big(x^{(i)} - 1\big) = 0, \quad i = 1,\dots,n.
\end{array}
$$

Looking at these examples, a reader can understand the enthusiasm of the pioneers of nonlinear programming, which is easily recognized in the papers of the 1950s and 1960s. Thus, our first impression should be as follows:

Nonlinear Optimization is a very important and promising applied theory. It covers almost all fields of Numerical Analysis.

However,