Nonlinear Programming Models: Introduction, NLP Problems, Local and Global Optima


Nonlinear Programming Models
Fabio Schoen, 2008
http://gol.dsi.unifi.it/users/schoen

NLP problems

min f(x), x ∈ S ⊆ Rⁿ

Standard form:

min f(x)
s.t. hᵢ(x) = 0, i = 1, …, m
     gⱼ(x) ≤ 0, j = 1, …, k

Here S = {x ∈ Rⁿ : hᵢ(x) = 0 ∀i, gⱼ(x) ≤ 0 ∀j}.

Local and global optima

A global minimum (or global optimum) is any x⋆ ∈ S such that f(x) ≥ f(x⋆) for all x ∈ S. A point x̄ is a local optimum if there exists ε > 0 such that

x ∈ S ∩ B(x̄, ε) ⇒ f(x) ≥ f(x̄)

where B(x̄, ε) = {x ∈ Rⁿ : ‖x − x̄‖ ≤ ε} is a ball in Rⁿ. Any global optimum is also a local optimum, but the opposite is generally false.

Convex functions

A set S ⊆ Rⁿ is convex if x, y ∈ S ⇒ λx + (1 − λ)y ∈ S for all choices of λ ∈ [0, 1]. Let Ω ⊆ Rⁿ be a non-empty convex set. A function f : Ω → R is convex iff

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y)

for all x, y ∈ Ω and λ ∈ [0, 1].

Properties of convex functions

Every convex function is continuous in the interior of Ω. It might be discontinuous, but only on the frontier. If f is continuously differentiable, then it is convex iff

f(y) ≥ f(x) + (y − x)ᵀ∇f(x)

for all x, y ∈ Ω.

If f is twice continuously differentiable, then it is convex iff its Hessian matrix, [∇²f(x)]ᵢⱼ = ∂²f/∂xᵢ∂xⱼ, is positive semi-definite: ∇²f(x) ⪰ 0, i.e.

vᵀ∇²f(x)v ≥ 0 ∀v ∈ Rⁿ

or, equivalently, all eigenvalues of ∇²f(x) are non-negative.

Example: an affine function is convex (and concave). For a quadratic function (Q: symmetric matrix)

f(x) = ½ xᵀQx + bᵀx + c

we have ∇f(x) = Qx + b and ∇²f(x) = Q, so f is convex iff Q ⪰ 0.
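The eigenvalue test for the quadratic example is easy to check numerically. The following sketch is not part of the slides; it assumes NumPy is available, and the helper name `is_convex_quadratic` is ours:

```python
import numpy as np

def is_convex_quadratic(Q, tol=1e-9):
    """f(x) = 0.5 x'Qx + b'x + c is convex iff the symmetric
    matrix Q is positive semi-definite, i.e. all eigenvalues
    of Q are non-negative (up to a numerical tolerance)."""
    eigs = np.linalg.eigvalsh(Q)   # eigenvalues of a symmetric matrix
    return bool(np.all(eigs >= -tol))

Q_psd = np.array([[2.0, 1.0], [1.0, 2.0]])     # eigenvalues 1 and 3
Q_indef = np.array([[1.0, 0.0], [0.0, -1.0]])  # saddle: eigenvalues 1 and -1

print(is_convex_quadratic(Q_psd))    # True
print(is_convex_quadratic(Q_indef))  # False
```

`eigvalsh` is used rather than `eigvals` because it exploits symmetry and returns real eigenvalues directly.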
Convex optimization problems

Slight abuse of notation: a problem min f(x), x ∈ S, is a convex optimization problem iff S is a convex set and f is convex on S. For a problem in standard form

min f(x)
s.t. hᵢ(x) = 0, i = 1, …, m
     gⱼ(x) ≤ 0, j = 1, …, k

if f is convex, the hᵢ(x) are affine functions and the gⱼ(x) are convex functions, then the problem is convex.

Maximization

A problem max f(x), x ∈ S, is called convex iff S is a convex set and f is a concave function (not to be confused with minimization of a concave function, or maximization of a convex function, which are NOT convex optimization problems).

Convex and non-convex optimization

Convex optimization "is easy"; non-convex optimization is usually very hard. Fundamental property of convex optimization problems: every local optimum is also a global optimum (a proof will be given later). Minimizing a positive semi-definite quadratic function on a polyhedron is easy (polynomially solvable); if even a single eigenvalue of the Hessian is negative, the problem becomes NP-hard.

Convex functions: examples

Many (of course not all…) functions are convex:
- affine functions aᵀx + b
- quadratic functions ½ xᵀQx + bᵀx + c with Q = Qᵀ, Q ⪰ 0
- any norm is a convex function
- x log x (however, log x is concave)
- f is convex if and only if, for all x₀, d ∈ Rⁿ, its restriction to any line, φ(α) = f(x₀ + αd), is a convex function
- a non-negative linear combination of convex functions is convex
- g(x, y) convex in x for all y ⇒ ∫ g(x, y) dy is convex in x

More examples:
- maxᵢ {aᵢᵀx + bᵢ} is convex
- f, g convex ⇒ max{f(x), g(x)} is convex
- fₐ convex for every a ∈ A (A a possibly uncountable set) ⇒ supₐ∈A fₐ(x) is convex
- f convex ⇒ f(Ax + b) is convex
- for any set S ⊆ Rⁿ, f(x) = sup_{s∈S} ‖x − s‖ is convex
- Trace(AᵀX) = Σᵢⱼ AᵢⱼXᵢⱼ is convex (it is linear!)
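The defining inequality can be verified empirically for one of the examples above, the pointwise maximum of affine functions. This is an illustration of ours (not from the slides) and assumes NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pointwise maximum of affine functions: f(x) = max_i (a_i'x + b_i).
A = rng.normal(size=(5, 3))   # five random affine pieces on R^3
b = rng.normal(size=5)

def f(x):
    return np.max(A @ x + b)

# Numerical check of the defining inequality
#   f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y)
ok = True
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    lam = rng.uniform()
    if f(lam * x + (1 - lam) * y) > lam * f(x) + (1 - lam) * f(y) + 1e-9:
        ok = False
print(ok)  # True: no random point violates convexity
```

A single violating triple (x, y, λ) would prove non-convexity; random sampling can of course only support, never prove, convexity.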
- log det X⁻¹ is convex over the set of matrices {X ∈ Rⁿˣⁿ : X ≻ 0}
- λ_max(X) (the largest eigenvalue of a matrix X) is convex

Table of contents: norm approximation, maximum likelihood, robust estimation.

Norm approximation

Problem:

min_x ‖Ax − b‖

where A, b are parameters. Usually the system is over-determined, i.e. b ∉ Range(A); for example, this happens when A ∈ Rᵐˣⁿ with m > n and A has full rank. r := Ax − b is the "residual".

Examples:
- ‖r‖ = √(rᵀr): least squares (or "regression")
- ‖r‖ = √(rᵀPr) with P ≻ 0: weighted least squares
- ‖r‖ = maxᵢ |rᵢ|: minimax, ℓ∞, or Chebyshev approximation
- ‖r‖ = Σᵢ |rᵢ|: absolute or ℓ¹ approximation

Possible (convex) additional constraints:
- maximum deviation from an initial estimate: ‖x − x_est‖ ≤ ε
- simple bounds: ℓᵢ ≤ xᵢ ≤ uᵢ
- ordering: x₁ ≤ x₂ ≤ … ≤ xₙ

[Figures: histograms of the residuals obtained with the ℓ¹, ℓ∞ and ℓ² norms for a matrix A ∈ R¹⁰⁰ˣ³⁰.]

Variants

min_x Σᵢ h(yᵢ − aᵢᵀx), where h is a convex penalty function:
- linear-quadratic: h(z) = z² if |z| ≤ 1, 2|z| − 1 if |z| > 1
- "dead zone": h(z) = 0 if |z| ≤ 1, |z| − 1 if |z| > 1
- logarithmic barrier: h(z) = −log(1 − z²) if |z| < 1, ∞ if |z| ≥ 1

[Figure: comparison of the ℓ¹, ℓ², linear-quadratic, dead-zone and log-barrier penalties.]
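The ℓ² and ℓ¹ norm-approximation problems above can both be solved with standard tools: least squares directly, and ℓ¹ via the usual LP reformulation min 1ᵀt s.t. −t ≤ Ax − b ≤ t. This sketch is not from the slides and assumes NumPy and SciPy:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
A = rng.normal(size=(20, 3))   # over-determined: m > n
b = rng.normal(size=20)

# l2 (least squares): min ||Ax - b||_2
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

# l1: min sum_i |r_i| as the LP  min 1't  s.t.  Ax - t <= b, -Ax - t <= -b
m, n = A.shape
c = np.concatenate([np.zeros(n), np.ones(m)])      # variables (x, t)
A_ub = np.block([[A, -np.eye(m)], [-A, -np.eye(m)]])
b_ub = np.concatenate([b, -b])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + m))
x_l1 = res.x[:n]

print(np.linalg.norm(A @ x_ls - b))      # smallest possible 2-norm residual
print(np.abs(A @ x_l1 - b).sum())        # smallest possible 1-norm residual
```

Each estimate is optimal in its own norm, so the ℓ² residual of `x_ls` is no larger than that of `x_l1`, and vice versa for the 1-norm.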
Maximum likelihood

Given a sample X₁, X₂, …, X_k and a parametric family of probability density functions L(·; θ), the maximum likelihood estimate (MLE) of θ given the sample is

θ̂ = arg max_θ L(X₁, …, X_k; θ)

Example: linear measurements with additive i.i.d. (independent identically distributed) noise:

Xᵢ = aᵢᵀθ + εᵢ

where the εᵢ are i.i.d. random variables with density p(·):

L(X₁, …, X_k; θ) = Πᵢ₌₁ᵏ p(Xᵢ − aᵢᵀθ)

Taking the logarithm (which does not change the optimum points):

θ̂ = arg max_θ Σᵢ log p(Xᵢ − aᵢᵀθ)

If p is log-concave, this problem is convex. Examples:
- ε ~ N(0, σ), i.e. p(z) = (2πσ)⁻¹ᐟ² exp(−z²/2σ²) ⇒ the MLE is the ℓ² estimate: θ̂ = arg min_θ ‖Aθ − X‖₂
- p(z) = (1/(2a)) exp(−|z|/a) ⇒ the ℓ¹ estimate: θ̂ = arg min_θ ‖Aθ − X‖₁
- p(z) = (1/a) exp(−z/a) 1_{z≥0} (negative exponential) ⇒ the estimate can be found by solving the LP problem: min 1ᵀ(X − Aθ) s.t. Aθ ≤ X
- p uniform on [−a, a] ⇒ the MLE is any θ such that ‖Aθ − X‖∞ ≤ a

Ellipsoids

An ellipsoid is a subset of Rⁿ of the form

E = {x ∈ Rⁿ : (x − x₀)ᵀP⁻¹(x − x₀) ≤ 1}

where x₀ ∈ Rⁿ is the center of the ellipsoid and P is a symmetric positive-definite matrix. Alternative representations:

E = {x ∈ Rⁿ : ‖Ax − b‖₂ ≤ 1}, where A ≻ 0

or

E = {x ∈ Rⁿ : x = x₀ + Au, ‖u‖₂ ≤ 1}

where A is square and non-singular (an affine transformation of the unit ball).

Robust Least Squares

Least squares: x̂ = arg min_x √(Σᵢ (aᵢᵀx − bᵢ)²). Hypothesis: the aᵢ are not known exactly, but it is known that

aᵢ ∈ Eᵢ = {āᵢ + Pᵢu : ‖u‖ ≤ 1}

where Pᵢ = Pᵢᵀ ⪰ 0.

RLS

It holds that |α + βᵀy| ≤ |α| + ‖β‖‖y‖; choosing y⋆ = β/‖β‖ if α ≥ 0 and y⋆ = −β/‖β‖ if α < 0, we get ‖y⋆‖ = 1 and
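The Gaussian case above can be checked numerically: with ε ~ N(0, σ), the log-likelihood is a constant minus ‖Aθ − X‖²/(2σ²), so the least-squares estimate maximizes it. This demonstration is ours (not from the slides) and assumes NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(50, 2))
theta_true = np.array([1.0, -2.0])
sigma = 0.5
X = A @ theta_true + rng.normal(scale=sigma, size=50)  # Gaussian noise

def gauss_loglik(theta):
    """Log-likelihood of the sample under X_i = a_i'theta + N(0, sigma)."""
    r = X - A @ theta
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - r**2 / (2 * sigma**2))

# The MLE under Gaussian noise is the least-squares estimate:
theta_ls, *_ = np.linalg.lstsq(A, X, rcond=None)

# Any perturbation of the LS solution can only lower the likelihood.
for _ in range(100):
    theta = theta_ls + rng.normal(scale=0.1, size=2)
    assert gauss_loglik(theta) <= gauss_loglik(theta_ls) + 1e-9
print("LS estimate maximizes the Gaussian log-likelihood")
```

Replacing the Gaussian density with the Laplacian p(z) = (1/(2a)) exp(−|z|/a) would turn the same argument into the ℓ¹ estimate of the slides.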
|α + βᵀy⋆| = |α + βᵀβ sign(α)/‖β‖| = |α| + ‖β‖

Definition: worst-case residuals:

√( Σᵢ max_{aᵢ∈Eᵢ} (aᵢᵀx − bᵢ)² )

A robust estimate of x is the solution of

x̂ᵣ = arg min_x √( Σᵢ max_{aᵢ∈Eᵢ} (aᵢᵀx − bᵢ)² )

Then:

max_{aᵢ∈Eᵢ} |aᵢᵀx − bᵢ| = max_{‖u‖≤1} |āᵢᵀx − bᵢ + uᵀPᵢx| = |āᵢᵀx − bᵢ| + ‖Pᵢx‖

Thus the Robust Least Squares problem reduces to

min_x ( Σᵢ (|āᵢᵀx − bᵢ| + ‖Pᵢx‖)² )¹ᐟ²

or, equivalently,

min_{x,t} ‖t‖₂
s.t.  āᵢᵀx − bᵢ + ‖Pᵢx‖ ≤ tᵢ
      −āᵢᵀx + bᵢ + ‖Pᵢx‖ ≤ tᵢ

(a convex optimization problem).
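The closed form for the worst-case residual, max over ‖u‖ ≤ 1 of |āᵀx − b + uᵀPx| = |āᵀx − b| + ‖Px‖, can be sanity-checked by Monte-Carlo sampling of the uncertainty set. A sketch of ours, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
x = rng.normal(size=n)
a_bar = rng.normal(size=n)
b = rng.normal()
M = rng.normal(size=(n, n))
P = M @ M.T                     # symmetric PSD uncertainty shape, P = P'

# Closed form: max_{||u||<=1} |(a_bar + P u)'x - b| = |a_bar'x - b| + ||P x||
closed = abs(a_bar @ x - b) + np.linalg.norm(P @ x)

# Monte-Carlo: sample random unit vectors u and keep the worst residual seen
worst = 0.0
for _ in range(20000):
    u = rng.normal(size=n)
    u /= np.linalg.norm(u)
    worst = max(worst, abs((a_bar + P @ u) @ x - b))

print(worst <= closed + 1e-9)  # True: sampling never exceeds the closed form
```

The sampled maximum approaches the closed-form value as the number of samples grows, since the optimizer u⋆ = sign(āᵀx − b) · Px/‖Px‖ lies on the unit sphere being sampled.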