Optimization and Approximation

Elisa Riccietti

Thanks to Stefania Bellavia, University of Florence.
Reference book: Numerical Optimization, Nocedal and Wright, Springer.

Contents

Part I: Nonlinear optimization
1 Prerequisites
  1.1 Necessary and sufficient conditions
  1.2 Convex functions
  1.3 Quadratic functions
2 Iterative methods
  2.1 Directions for line-search methods
    2.1.1 Direction of steepest descent
    2.1.2 Newton's direction
    2.1.3 Quasi-Newton directions
  2.2 Rates of convergence
  2.3 Steepest descent method for quadratic functions
  2.4 Convergence of Newton's method
3 Line-search methods
  3.1 Armijo and Wolfe conditions
  3.2 Convergence of line-search methods
  3.3 Backtracking
  3.4 Newton's method
4 Quasi-Newton method
  4.1 BFGS method
  4.2 Global convergence of the BFGS method
5 Nonlinear least-squares problems
  5.1 Background: modelling, regression
  5.2 General concepts
  5.3 Linear least-squares problems
  5.4 Algorithms for nonlinear least-squares problems
    5.4.1 Gauss-Newton method
  5.5 Levenberg-Marquardt method
6 Constrained optimization
  6.1 One equality constraint
  6.2 One inequality constraint
  6.3 First-order optimality conditions
  6.4 Second-order optimality conditions
7 Optimization methods for Machine Learning

Part II: Linear and integer programming
8 Linear programming
  8.1 How to rewrite an LP in standard form
  8.2 Primal and dual problems
  8.3 Convex and strictly convex problems
  8.4 Geometry of Ω
  8.5 Simplex method
    8.5.1 How to choose a starting vertex of Ω
    8.5.2 Generalization of the algorithm to the degenerate case
    8.5.3 Advantages and disadvantages of the simplex method
9 Flow network problems
  9.1 Minimum cost flow problem
  9.2 Maximum flow problem
    9.2.1 How to find a starting point of Ω

Part I: Nonlinear optimization

Chapter 1
Prerequisites

Let A be an open set of $\mathbb{R}^n$ and let
\[
f : A \subseteq \mathbb{R}^n \to \mathbb{R}, \qquad x = (x_1, \dots, x_n)^T \mapsto f(x).
\]
An unconstrained optimization problem is a problem of the form
\[
\min_x f(x),
\]
where f is called the objective function. In the following we will assume f to be a nonlinear function.

1. If f is differentiable at x (i.e. if all the partial derivatives of f exist at x), the gradient of f at x is the vector $\nabla f(x) \in \mathbb{R}^n$:
\[
\nabla f(x) =
\begin{pmatrix}
\dfrac{\partial f(x)}{\partial x_1} \\ \vdots \\ \dfrac{\partial f(x)}{\partial x_n}
\end{pmatrix}.
\]

2. If f is twice differentiable at x, the Hessian matrix of f at x is the matrix $H(x) \in \mathbb{R}^{n \times n}$:
\[
H(x) = H_f(x) =
\begin{pmatrix}
\dfrac{\partial^2 f(x)}{\partial x_1 \partial x_1} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\
\vdots & & \vdots \\
\dfrac{\partial^2 f(x)}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_n \partial x_n}
\end{pmatrix}
=
\begin{pmatrix}
\left( \nabla \dfrac{\partial f(x)}{\partial x_1} \right)^T \\ \vdots \\ \left( \nabla \dfrac{\partial f(x)}{\partial x_n} \right)^T
\end{pmatrix}.
\]
If $f \in C^2(x)$ then H(x) is a symmetric matrix.

3. Recall the first-order Taylor formula with Lagrange form of the remainder. Let $f \in C^1(A)$ and let $x, x+h \in A$ with $h \neq 0$ be such that the segment $\{x + th \mid t \in [0,1]\}$ with endpoints x and x + h is contained in A. Then there exists $t \in (0,1)$, depending on x and h, such that
\[
f(x+h) = f(x) + \nabla f(x + th)^T h. \tag{1.1}
\]

4. Recall the second-order Taylor formula with Lagrange form of the remainder. Let $f \in C^2(A)$ and let $x, x+h \in A$ with $h \neq 0$ be such that the segment $\{x + th \mid t \in [0,1]\}$ with endpoints x and x + h is contained in A. Then there exists $t \in (0,1)$, depending on x and h, such that
\[
f(x+h) = f(x) + \nabla f(x)^T h + \frac{1}{2} h^T H(x + th) h. \tag{1.2}
\]

5. f is convex in A if, for all $x, y \in A$,
\[
f(tx + (1-t)y) \le t f(x) + (1-t) f(y) \qquad \forall t \in [0,1].
\]

6. f is strictly convex in A if, for all $x, y \in A$ with $x \neq y$,
\[
f(tx + (1-t)y) < t f(x) + (1-t) f(y) \qquad \forall t \in (0,1).
\]
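As a quick numerical check on the definitions above, the following Python sketch (not part of the original notes; the quadratic test function and the evaluation points are arbitrary choices) compares the analytic gradient of a simple quadratic with a central finite-difference approximation, and verifies that for a quadratic the second-order expansion (1.2) holds with no remainder error, since its Hessian is constant.

import numpy as np

# Test function f(x1, x2) = x1^2 + 3*x1*x2 + 2*x2^2 (arbitrary choice for this illustration)
def f(x):
    return x[0]**2 + 3*x[0]*x[1] + 2*x[1]**2

def grad_f(x):
    # Analytic gradient: (2 x1 + 3 x2, 3 x1 + 4 x2)^T
    return np.array([2*x[0] + 3*x[1], 3*x[0] + 4*x[1]])

def hess_f(x):
    # Analytic Hessian; constant and symmetric, since f is a C^2 quadratic
    return np.array([[2.0, 3.0], [3.0, 4.0]])

def fd_gradient(f, x, h=1e-6):
    # Central finite-difference approximation of the gradient, component by component
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2*h)
    return g

x = np.array([1.0, -2.0])
print(np.allclose(grad_f(x), fd_gradient(f, x), atol=1e-5))   # True

# For this quadratic, the second-order expansion (1.2) is exact:
p = np.array([0.3, -0.7])
lhs = f(x + p)
rhs = f(x) + grad_f(x) @ p + 0.5 * p @ hess_f(x) @ p
print(np.isclose(lhs, rhs))                                    # True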
Definition 1.0.1. Let $x^* \in A$.

• $x^*$ is a local minimizer, or local minimum point, of f if there exists a neighbourhood Ω of $x^*$ such that
\[
f(x^*) \le f(x) \qquad \forall x \in \Omega.
\]
In this case $f(x^*)$ is a local minimum of f.

• $x^*$ is a global minimizer, or global minimum point, of f if
\[
f(x^*) \le f(x) \qquad \forall x \in A.
\]
In this case $f(x^*)$ is a global minimum of f.

• $x^*$ is a stationary point of f if $\nabla f(x^*) = 0$.

Definition 1.0.2 (Directional derivative). Let $p \in \mathbb{R}^n$ and let f be differentiable in a neighbourhood of x. The directional derivative of f at x in the direction p is defined as
\[
\frac{\partial f}{\partial p}(x) = \lim_{h \to 0} \frac{f(x + hp) - f(x)}{h}. \tag{1.3}
\]
It holds (see TD1) that
\[
\frac{\partial f}{\partial p}(x) = \nabla f(x)^T p.
\]

Definition 1.0.3 (Descent direction). A direction $p \in \mathbb{R}^n$ is a descent direction for f at x if
\[
\frac{\partial f}{\partial p}(x) = \nabla f(x)^T p < 0,
\]
i.e., if the angle $\vartheta$ between p and $\nabla f(x)$ satisfies $\vartheta \in \left(\frac{\pi}{2}, \pi\right]$.

If $\nabla f(x) \neq 0$, we can always find a descent direction: the antigradient $-\nabla f(x)$. From (1.3) it follows that if p is a descent direction, then there exists $\bar h > 0$ such that
\[
f(x + hp) - f(x) < 0 \qquad \forall h \in (0, \bar h).
\]
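To make Definition 1.0.3 concrete, here is a small Python sketch (an illustration, not part of the original notes; the test function, point, and directions are arbitrary choices) that checks the sign of $\nabla f(x)^T p$ and verifies that along the antigradient f decreases for a small step.

import numpy as np

def f(x):
    # Arbitrary smooth test function for the illustration
    return (x[0] - 1)**2 + 2*(x[1] + 0.5)**2

def grad_f(x):
    return np.array([2*(x[0] - 1), 4*(x[1] + 0.5)])

def is_descent_direction(grad, p):
    # p is a descent direction at x iff grad(x)^T p < 0 (Definition 1.0.3)
    return float(grad @ p) < 0.0

x = np.array([0.0, 0.0])
g = grad_f(x)                    # g = (-2, 2) at this point

p_anti = -g                      # the antigradient, always a descent direction when g != 0
p_up = np.array([0.0, 1.0])      # here g^T p_up = 2 > 0, so not a descent direction

print(is_descent_direction(g, p_anti))   # True
print(is_descent_direction(g, p_up))     # False

# For a descent direction, f(x + h p) < f(x) for all sufficiently small h > 0
h = 1e-3
print(f(x + h*p_anti) < f(x))            # True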
1.1 Necessary and sufficient conditions

In this section we give necessary and sufficient conditions for a point to be a minimum point.

Theorem 1.1.1 (First-order necessary condition). Let Ω be a neighbourhood of $x^*$ and let $f \in C^1(\Omega)$. If $x^*$ is a minimizer of f (in Ω), then $\nabla f(x^*) = 0$, i.e. $x^*$ is a stationary point of f.

Proof. We prove it by contradiction. Assume $\nabla f(x^*) \neq 0$ and let $p = -\nabla f(x^*)$ be the antigradient of f at $x^*$; clearly $p \neq 0$. The function $g(x) = \nabla f(x)^T p$ satisfies
\[
g(x^*) = \nabla f(x^*)^T p = -\nabla f(x^*)^T \nabla f(x^*) = -\|\nabla f(x^*)\|^2 < 0.
\]
Since $f \in C^1(\Omega)$, g is continuous in Ω, so it remains negative in a neighbourhood of $x^*$: there exists $T > 0$ such that, for all $t \in [0, T]$,
\[
0 > g(x^* + tp) = \nabla f(x^* + tp)^T p. \tag{1.4}
\]
From (1.1), for every $\tau \in (0, T)$ there exists $t \in (0,1)$ such that
\[
f(x^* + \tau p) = f(x^*) + \nabla f(x^* + t\tau p)^T \tau p = f(x^*) + \tau\, \nabla f(x^* + t' p)^T p < f(x^*),
\]
where $t' = t\tau \in (0, T)$ and the last inequality follows from (1.4). Hence $x^*$ cannot be a minimum point of f, a contradiction.

This is only a necessary condition: all minimizers are stationary points, but not all stationary points are minimizers; they may be maximizers or saddle points.

Theorem 1.1.2 (Second-order necessary condition). Let Ω be a neighbourhood of $x^*$ and let $f \in C^2(\Omega)$. If $x^*$ is a minimum point of f (in Ω), then $H(x^*)$ is positive semidefinite.

Proof. We prove it by contradiction. Assume that $H(x^*)$ is not positive semidefinite, i.e. there exists $p \in \mathbb{R}^n$, $p \neq 0$, such that $p^T H(x^*) p < 0$. Define
\[
g(x) := p^T H(x) p,
\]
so that $g(x^*) < 0$. Since $f \in C^2(\Omega)$, g is continuous in Ω, so there exists $T > 0$ such that, for all $t \in [0, T]$,
\[
0 > g(x^* + tp) = p^T H(x^* + tp) p. \tag{1.5}
\]
From (1.2), for every $\tau \in (0, T)$ there exists $t \in (0,1)$ such that
\[
f(x^* + \tau p) = f(x^*) + \nabla f(x^*)^T \tau p + \frac{1}{2} (\tau p)^T H(x^* + t\tau p)\, \tau p = f(x^*) + \frac{1}{2} \tau^2\, p^T H(x^* + t' p)\, p < f(x^*),
\]
where $\nabla f(x^*)^T \tau p = 0$ by Theorem 1.1.1, $t' = t\tau \in (0, T)$, and the last inequality follows from (1.5). Hence $x^*$ cannot be a minimum point of f, a contradiction.

In the following theorem we give a sufficient condition: if it is satisfied, we are sure to have a minimum point. This is a second-order condition: first-order derivatives are not enough to establish a sufficient condition.

Theorem 1.1.3 (Sufficient second-order condition). Let Ω be a neighbourhood of $x^*$ and let $f \in C^2(\Omega)$. If $\nabla f(x^*) = 0$ and $H(x^*)$ is positive definite, then $x^*$ is a minimum point of f.

Proof. Since $H(x^*)$ is positive definite and H is continuous, there exists a neighbourhood $B = B_T(x^*)$ of $x^*$ such that the matrix $H(x)$ remains positive definite for all $x \in B$. Then for every $p \in \mathbb{R}^n$ with $\|p\| = 1$ and every $\tau \in (0, T)$ there exists $t \in (0,1)$ such that
\[
f(x^* + \tau p) = f(x^*) + \nabla f(x^*)^T \tau p + \frac{1}{2} (\tau p)^T H(x^* + t\tau p)\, \tau p = f(x^*) + \frac{1}{2} \tau^2\, p^T H(x^* + t' p)\, p > f(x^*),
\]
where $\nabla f(x^*)^T \tau p = 0$ by assumption, $t' = t\tau \in (0, T)$ so that $x^* + t' p \in B$, and the inequality follows from the positive definiteness of $H(x^* + t' p)$. Hence $f(x) > f(x^*)$ for every $x \in B$ with $x \neq x^*$, i.e. $x^*$ is a minimum point of f.
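The three theorems above suggest a simple numerical recipe for examining a candidate point: check that the gradient vanishes, then inspect the eigenvalues of the (symmetric) Hessian. The Python sketch below is illustrative only and not part of the notes; the helper name and the saddle-point example $f(x_1, x_2) = x_1^2 - x_2^2$, whose origin is stationary but not a minimum, are chosen for this purpose.

import numpy as np

def classify_stationary_point(grad, hess, x, tol=1e-8):
    # grad, hess: callables returning the gradient vector and Hessian matrix of f at x.
    # Returns a short diagnostic string based on Theorems 1.1.1-1.1.3.
    g = grad(x)
    if np.linalg.norm(g) > tol:
        return "not stationary: x cannot be a minimizer (Theorem 1.1.1)"
    eigs = np.linalg.eigvalsh(hess(x))  # Hessian is symmetric, so eigvalsh applies
    if np.all(eigs > tol):
        return "stationary with positive definite Hessian: minimum point (Theorem 1.1.3)"
    if np.any(eigs < -tol):
        return "stationary but Hessian has a negative eigenvalue: not a minimum (Theorem 1.1.2)"
    return "stationary, Hessian only positive semidefinite: these tests are inconclusive"

# Example: f(x1, x2) = x1^2 - x2^2 has a saddle point at the origin
grad = lambda x: np.array([2*x[0], -2*x[1]])
hess = lambda x: np.array([[2.0, 0.0], [0.0, -2.0]])
print(classify_stationary_point(grad, hess, np.array([0.0, 0.0])))
# -> stationary but Hessian has a negative eigenvalue: not a minimum (Theorem 1.1.2)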
