Optimization and Approximation
Elisa Riccietti

Thanks to Stefania Bellavia, University of Florence.
Reference book: Numerical Optimization, Nocedal and Wright, Springer.

Contents

Part I: Nonlinear optimization
1 Prerequisites
1.1 Necessary and sufficient conditions
1.2 Convex functions
1.3 Quadratic functions
2 Iterative methods
2.1 Directions for line-search methods
2.1.1 Direction of steepest descent
2.1.2 Newton's direction
2.1.3 Quasi-Newton directions
2.2 Rates of convergence
2.3 Steepest descent method for quadratic functions
2.4 Convergence of Newton's method
3 Line-search methods
3.1 Armijo and Wolfe conditions
3.2 Convergence of line-search methods
3.3 Backtracking
3.4 Newton's method
4 Quasi-Newton method
4.1 BFGS method
4.2 Global convergence of the BFGS method
5 Nonlinear least-squares problems
5.1 Background: modelling, regression
5.2 General concepts
5.3 Linear least-squares problems
5.4 Algorithms for nonlinear least-squares problems
5.4.1 Gauss-Newton method
5.5 Levenberg-Marquardt method
6 Constrained optimization
6.1 One equality constraint
6.2 One inequality constraint
6.3 First order optimality conditions
6.4 Second order optimality conditions
7 Optimization methods for Machine Learning

Part II: Linear and integer programming
8 Linear programming
8.1 How to rewrite an LP in standard form
8.2 Primal and dual problems
8.3 Convex and strictly convex problems
8.4 Geometry of Ω
8.5 Simplex method
8.5.1 How to choose a starting vertex of Ω
8.5.2 Generalization of the algorithm to the degenerate case
8.5.3 Advantages and disadvantages of the simplex method
9 Flow networks problems
9.1 Minimum cost flow problem
9.2 Maximum flow problem
9.2.1 How to find a starting point of Ω

Part I: Nonlinear optimization

Chapter 1 Prerequisites

Let A be an open set of R^n and let

f : A ⊆ R^n → R,  x = (x_1, ..., x_n)^T ↦ f(x).

An unconstrained optimization problem is a problem of the form

min_x f(x),

where f is called the objective function. In the following we will assume f to be a nonlinear function.

1. If f is differentiable in x (i.e. if all the partial derivatives of f exist in x), the gradient of f in x is the vector ∇f(x) ∈ R^n

∇f(x) = (∂f(x)/∂x_1, ..., ∂f(x)/∂x_n)^T.

2. If f is twice differentiable in x, the Hessian matrix of f in x is the matrix H(x) = Hf(x) ∈ R^{n×n} whose (i, j) entry is ∂²f(x)/∂x_i∂x_j; equivalently, its i-th row is ∇(∂f(x)/∂x_i)^T. If f ∈ C²(x) then H(x) is a symmetric matrix.

3. Let us recall the first-order Taylor formula with Lagrange form of the remainder. Let f ∈ C¹(A) and let x, x + h ∈ A with h ≠ 0 be such that the segment {x + th | t ∈ [0, 1]}, whose endpoints are x and x + h, is contained in A. Then there exists t ∈ (0, 1), depending on x and h, such that

f(x + h) = f(x) + ∇f(x + th)^T h.   (1.1)

4. Let us recall the second-order Taylor formula with Lagrange form of the remainder. Let f ∈ C²(A) and let x, x + h ∈ A with h ≠ 0 be such that the segment {x + th | t ∈ [0, 1]}, whose endpoints are x and x + h, is contained in A. Then there exists t ∈ (0, 1), depending on x and h, such that

f(x + h) = f(x) + ∇f(x)^T h + (1/2) h^T H(x + th) h.   (1.2)

(A small numerical check of this expansion is given after this list.)

5. f is convex in A if for all x, y ∈ A

f(tx + (1 − t)y) ≤ t f(x) + (1 − t) f(y)  ∀t ∈ [0, 1].

6. f is strictly convex in A if for all x, y ∈ A with x ≠ y

f(tx + (1 − t)y) < t f(x) + (1 − t) f(y)  ∀t ∈ (0, 1).
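As a quick numerical illustration (a sketch, not part of the original notes; the test function and the evaluation point are chosen arbitrarily), the following Python/NumPy snippet evaluates the gradient, the Hessian and the second-order expansion (1.2) for a simple quadratic function, for which the quadratic Taylor model is exact.

import numpy as np

def f(x):
    # test function f(x) = x1^2 + 3*x2^2 + x1*x2
    return x[0]**2 + 3*x[1]**2 + x[0]*x[1]

def grad_f(x):
    # analytic gradient (2*x1 + x2, x1 + 6*x2)^T
    return np.array([2*x[0] + x[1], x[0] + 6*x[1]])

def hess_f(x):
    # analytic Hessian, constant and symmetric
    return np.array([[2.0, 1.0],
                     [1.0, 6.0]])

x = np.array([1.0, -2.0])
h = np.array([1e-3, 2e-3])

# second-order Taylor model around x; exact here because f is quadratic
model = f(x) + grad_f(x) @ h + 0.5 * h @ hess_f(x) @ h
print(f(x + h), model)  # the two values coincide up to rounding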
Definition 1.0.1. Let x∗ ∈ A.
• x∗ is a local minimizer, or a local minimum point, for f if there exists a neighbourhood Ω of x∗ such that f(x∗) ≤ f(x) ∀x ∈ Ω. The value f(x∗) is a local minimum of f.
• x∗ is a global minimizer, or a global minimum point, for f if f(x∗) ≤ f(x) ∀x ∈ A. The value f(x∗) is a global minimum of f.
• x∗ is a stationary point for f if ∇f(x∗) = 0.

Definition 1.0.2. Directional derivatives
Let p ∈ R^n and let f be differentiable in a neighbourhood of x. The directional derivative of f in x with respect to the direction p is defined as

∂f/∂p (x) = lim_{h→0} [f(x + hp) − f(x)] / h.   (1.3)

It holds (see TD1) that

∂f/∂p (x) = ∇f(x)^T p.

Definition 1.0.3. Descent directions
A direction p ∈ R^n is a descent direction for f in x if

∂f/∂p (x) = ∇f(x)^T p < 0,

i.e., if the angle ϑ between p and ∇f(x) is such that ϑ ∈ (π/2, π]. If ∇f(x) ≠ 0, we can always find a descent direction: that of the antigradient −∇f(x). From (1.3) it follows that if p is a descent direction, then there exists h̄ > 0 such that

f(x + hp) − f(x) < 0  ∀h ∈ (0, h̄).

1.1 Necessary and sufficient conditions

In this section we give necessary and sufficient conditions for a point to be a minimum point.

Theorem 1.1.1. First order necessary condition
Let f ∈ C¹(Ω) in a neighbourhood Ω of x∗. If x∗ is a minimizer for f (in Ω), then ∇f(x∗) = 0, i.e. x∗ is a stationary point for f.

Proof. We prove it by contradiction. Assume that ∇f(x∗) ≠ 0 and let p = −∇f(x∗) be the antigradient of f in x∗; clearly p ≠ 0. The function g(x) = ∇f(x)^T p satisfies

g(x∗) = ∇f(x∗)^T p = −∇f(x∗)^T ∇f(x∗) = −‖∇f(x∗)‖² < 0.

Since f ∈ C¹(Ω), g is continuous in Ω, so it remains negative in a neighbourhood of x∗, i.e., there exists T ∈ R, T > 0, such that for all t ∈ [0, T]

0 > g(x∗ + tp) = ∇f(x∗ + tp)^T p.   (1.4)

From (1.1), for every τ ∈ (0, T) there exists t ∈ (0, 1) such that, setting t′ = tτ ∈ (0, T),

f(x∗ + τp) = f(x∗) + ∇f(x∗ + tτp)^T τp = f(x∗) + τ ∇f(x∗ + t′p)^T p < f(x∗),

since ∇f(x∗ + t′p)^T p < 0 by (1.4). Then x∗ cannot be a minimum point for f, which is a contradiction.

This is just a necessary condition: all minimizers are stationary points, but not all stationary points are minimizers; they may be maximizers or saddle points.

Theorem 1.1.2. Second order necessary condition
Let f ∈ C²(Ω) for a neighbourhood Ω of x∗. If x∗ is a minimum point for f (in Ω), then H(x∗) is positive semidefinite.

Proof. We prove it by contradiction. Assume that H(x∗) is not positive semidefinite, i.e. that there exists p ∈ R^n, p ≠ 0, such that p^T H(x∗) p < 0. Define g(x) := p^T H(x) p; it holds g(x∗) < 0. Since f ∈ C²(Ω), g is continuous in Ω, so there exists T ∈ R, T > 0, such that for all t ∈ [0, T]

0 > g(x∗ + tp) = p^T H(x∗ + tp) p.   (1.5)

From (1.2), for every τ ∈ (0, T) there exists t ∈ (0, 1) such that, setting t′ = tτ ∈ (0, T),

f(x∗ + τp) = f(x∗) + ∇f(x∗)^T τp + (1/2) (τp)^T H(x∗ + tτp) τp = f(x∗) + (1/2) τ² p^T H(x∗ + t′p) p < f(x∗),

since ∇f(x∗) = 0 by Theorem 1.1.1 and p^T H(x∗ + t′p) p < 0 by (1.5). Then x∗ cannot be a minimum point for f, which is a contradiction.
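As an illustration (a sketch, not part of the original notes), the following Python/NumPy snippet applies Theorem 1.1.2 to the stationary point (0, 0) of f(x_1, x_2) = x_1² − x_2²: the first-order condition holds there, but the Hessian is indefinite, so the origin cannot be a minimizer.

import numpy as np

# f(x1, x2) = x1^2 - x2^2: the origin is a stationary point but not a minimizer.
grad_at_origin = np.array([0.0, 0.0])        # gradient of f at (0, 0)
H = np.array([[2.0, 0.0],
              [0.0, -2.0]])                  # Hessian of f (constant)

print(np.linalg.eigvalsh(H))  # one negative eigenvalue: H is not positive semidefinite
# By Theorem 1.1.2 the origin is not a minimum point: with p = (0, 1)^T we get
# p^T H p = -2 < 0, and indeed f(0, t) = -t^2 < f(0, 0) for every t != 0,
# so the origin is a saddle point.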
In the following theorem we show a sufficient condition: if this condition is satisfied, we are sure to have a minimum point. This is a second order condition: first order derivatives are not enough to establish a sufficient condition.

Theorem 1.1.3. Sufficient second-order condition
Let f ∈ C²(Ω) for a neighbourhood Ω of x∗. If

∇f(x∗) = 0 and H(x∗) is positive definite,

then x∗ is a minimum point for f.

Proof. As H(x∗) is positive definite, there exists a neighbourhood B = B_T(x∗) of x∗ such that for all x ∈ B the matrix H(x) remains positive definite. Then for every p ∈ R^n with ‖p‖ = 1 and every τ ∈ (0, T) there exists t ∈ (0, 1) such that, setting t′ = tτ ∈ (0, T) (so that x∗ + t′p ∈ B),

f(x∗ + τp) = f(x∗) + ∇f(x∗)^T τp + (1/2) (τp)^T H(x∗ + tτp) τp = f(x∗) + (1/2) τ² p^T H(x∗ + t′p) p > f(x∗),

since ∇f(x∗) = 0 and p^T H(x∗ + t′p) p > 0 because H(x∗ + t′p) is positive definite; i.e. x∗ is a (strict) local minimum point for f.
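As an illustration (a sketch, not part of the original notes), the following Python/NumPy snippet checks the hypotheses of Theorem 1.1.3 at x∗ = (1, 1) for the Rosenbrock function f(x) = 100(x_2 − x_1²)² + (1 − x_1)²: the gradient vanishes there and the Hessian is positive definite, so x∗ is a (strict) local minimizer.

import numpy as np

def grad_f(x):
    # gradient of the Rosenbrock function
    return np.array([-400*x[0]*(x[1] - x[0]**2) - 2*(1 - x[0]),
                     200*(x[1] - x[0]**2)])

def hess_f(x):
    # Hessian of the Rosenbrock function
    return np.array([[-400*(x[1] - 3*x[0]**2) + 2, -400*x[0]],
                     [-400*x[0],                    200.0]])

x_star = np.array([1.0, 1.0])
print(grad_f(x_star))                      # gradient is the zero vector: first-order condition holds
print(np.linalg.eigvalsh(hess_f(x_star)))  # both eigenvalues > 0: H(x*) is positive definite
# By Theorem 1.1.3, x* = (1, 1) is a (strict) local minimum point of f.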