Improved Penalty Method Via Doubly Stochastic Gradients for Bilevel Hyperparameter Optimization


The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)

Improved Penalty Method via Doubly Stochastic Gradients for Bilevel Hyperparameter Optimization

Wanli Shi1, Bin Gu1,2,3*
1 Nanjing University of Information Science & Technology, P.R. China
2 MBZUAI, United Arab Emirates
3 JD Finance America Corporation, Mountain View, CA, USA
[email protected], [email protected]

*Corresponding Authors. Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Hyperparameter optimization (HO) is an important problem in machine learning which is normally formulated as a bilevel optimization problem. Gradient-based methods are dominant in bilevel optimization due to their high scalability to the number of hyperparameters, especially in deep learning problems. However, traditional gradient-based bilevel optimization methods need intermediate steps to obtain the exact or approximate gradient of the hyperparameters, namely the hypergradient, for the upper-level objective, whose complexity is high especially for high dimensional datasets. Recently, a penalty method has been proposed to avoid the computation of the hypergradient, which speeds up gradient-based BHO methods. However, the penalty method may result in a very large number of constraints, which greatly limits the efficiency of this method, especially for high dimensional data problems. To address this limitation, in this paper, we propose a doubly stochastic gradient descent algorithm (DSGPHO) to improve the efficiency of the penalty method. Importantly, we not only prove that the proposed method converges to the KKT condition of the original problem in a convex setting, but also provide the convergence rate of DSGPHO, which is the first such result in the literature on gradient-based bilevel optimization as far as we know. We compare our method with three state-of-the-art gradient-based methods in three tasks, i.e., data denoising, few-shot learning, and training data poisoning, using several large-scale benchmark datasets. All the results demonstrate that our method outperforms or is comparable to the existing methods in terms of accuracy and efficiency.

Introduction

Hyperparameter optimization (HO) problems are at the center of several important machine learning problems, which are usually formulated as bilevel optimization problems (Feurer and Hutter 2019). Any machine learning algorithm crucially depends on the choice of its hyperparameters. However, manually tuning hyperparameters is often time-consuming and depends heavily on the user's prior knowledge, which makes it difficult to find the optimal hyperparameters. Therefore, choosing hyperparameters automatically has attracted great attention, and a large number of HO methods have emerged, such as grid search, random search (Bergstra et al. 2011; Bergstra and Bengio 2012), solution path (Gu and Sheng 2017; Gu, Liu, and Huang 2017; Bao, Gu, and Huang 2019; Gu and Ling 2015), and several Bayesian methods (Thornton et al. 2013; Brochu, Cora, and De Freitas 2010; Swersky, Snoek, and Adams 2014; Wu et al. 2019).

In real-world applications, the number of hyperparameters increases sharply, especially in deep learning problems. In these problems, gradient-based methods are dominant for bilevel problems with a continuous hyperparameter set, due to their high scalability to the number of hyperparameters. The main idea of gradient-based methods is to first approximate the solution on the training set and then compute the gradient descent direction for the hyperparameters, namely the hypergradient. However, calculating the exact hypergradient requires computing the inverse Hessian of the lower-level problem, which makes it impractical for high dimensional data problems. To solve this problem, many methods have been proposed to approximate the hypergradient using intermediate steps, such as solving a linear system (the approximate methods (Domke 2012; Pedregosa 2016; Rajeswaran et al. 2019)) or using reverse/forward mode differentiation (e.g., (Maclaurin, Duvenaud, and Adams 2015; Domke 2012; Franceschi et al. 2017; Swersky, Snoek, and Adams 2014)). Although these methods avoid the computation of the inverse Hessian, the intermediate steps usually need extra loops, which affects the efficiency of these gradient-based methods.

Recently, (Mehra and Hamm 2019) pointed out that the intermediate steps are not necessary. They formulate the bilevel problem as a single-level constrained problem by replacing the lower-level objective with its first-order necessary optimality condition. The penalty framework can then be used to solve this constrained problem. By updating the model parameters and hyperparameters alternately, the penalty method finally obtains the exact hypergradient. It speeds up the gradient-based methods but meanwhile brings a new problem. Since the constraints are derived from the first-order necessary condition of the lower-level objective, the resulting optimization problem may have a very large number of constraints, especially for high dimensional datasets or deep learning methods. Calculating the gradient of the penalty term in such a heavily constrained problem greatly limits its efficiency.

| Method | Problem | Update $v$ | Intermediate steps | Time | Space | Convergence rate |
|---|---|---|---|---|---|---|
| FMD | Bilevel | $v \leftarrow v - \eta \nabla_v g$ | $P \leftarrow P(I - \eta \nabla^2_{vv} g) - \eta \nabla^2_{uv} g$ | $O(dmT)$ | $O(dm)$ | — |
| RMD | Bilevel | $v \leftarrow v - \eta \nabla_v g$ | $p \leftarrow p - \eta \nabla^2_{uv} g \cdot q$, $q \leftarrow (I - \eta \nabla^2_{vv} g)\, q$ | $O(dT)$ | $O(dT + m)$ | — |
| Approx | Bilevel | $v \leftarrow v - \eta \nabla_v g$ | Solve $\min_q \|\nabla^2_{vv} g \cdot q - \nabla_v f\|_2^2$ | $O(dT)$ | $O(d + m)$ | — |
| Penalty | Single level | $v \leftarrow v - \eta(\nabla_v f + \mu \nabla^2_{vv} g \cdot \nabla_v g)$ | Not required | $O(dT)$ | $O(d + m)$ | — |
| DSGPHO | Single level | $v \leftarrow v - \eta \tilde{\nabla}_v L$ | Not required | $O(T)$ | $O(d + m)$ | $O(1/\sqrt{K})$ |

Table 1: The complexity of various gradient methods for updating the hyperparameters once. Here $d$ is the size of the model parameters, $m$ denotes the size of the hyperparameters, and $T$ is the number of model parameter updates per hypergradient computation. $\tilde{\nabla}_v L = \tilde{\nabla}_v f + [\mu c_j + z_j] \nabla_v c_j$, where $z_j$ is the $j$-th element of the dual parameter, $\mu > 0$ is the penalty parameter, and $c_j$ is the $j$-th element of $\nabla_v g$. $K$ denotes the total number of hyperparameter updates.

To address this limitation, in this paper, we propose a doubly stochastic gradient descent method to improve the efficiency of the penalty method (Mehra and Hamm 2019). Similar to (Mehra and Hamm 2019), we first transform the bilevel problem into a single-level constrained problem and give its corresponding augmented Lagrangian function. We first randomly sample a validation data instance to calculate the stochastic gradient of the upper-level objective. Then, we randomly sample a constraint and calculate its corresponding gradient. By combining these terms, we obtain the stochastic gradient of the augmented Lagrangian function, and update the model parameters and hyperparameters alternately using this stochastic gradient. Since our method has two sources of randomness, i.e., the validation set and the constraint set, we denote our method as doubly stochastic gradient descent for penalty hyperparameter optimization (DSGPHO). We prove that our method achieves an unbiased estimate of the exact hypergradient asymptotically and finally converges to a KKT point of the original problem. Importantly, we also provide the convergence rate of DSGPHO, which is the first such result in the literature on gradient-based bilevel optimization as far as we know. We compare our method with three state-of-the-art gradient-based methods in three tasks, i.e., data denoising, few-shot learning, and training data poisoning, using several large-scale benchmark datasets. All the results demonstrate that our method outperforms or is comparable to the existing methods in terms of accuracy and efficiency.
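To make the doubly stochastic update concrete, below is a minimal JAX sketch of one $v$-step, assuming a squared validation loss, a ridge-style training objective with a single hyperparameter, and constraints $c_j = [\nabla_v g]_j$ as in the caption of Table 1. All function names, losses, and step sizes are illustrative assumptions, not the authors' implementation.

```python
import jax
import jax.numpy as jnp

def f_loss(u, v, x, y):
    # Upper-level (validation) loss on a single instance: assumed squared loss.
    return (jnp.dot(x, v) - y) ** 2

def g_loss(u, v, X, Y):
    # Lower-level (training) objective: assumed ridge regression whose
    # regularization weight exp(u) is the single hyperparameter.
    return jnp.mean((X @ v - Y) ** 2) + jnp.exp(u) * jnp.sum(v ** 2)

def dsg_step_v(u, v, z, mu, key, Xtr, Ytr, Xval, Yval, eta=1e-2):
    """One doubly stochastic update of v: sample one validation instance and
    one constraint c_j = [grad_v g]_j, mirroring the Table 1 caption."""
    k1, k2 = jax.random.split(key)
    i = jax.random.randint(k1, (), 0, Xval.shape[0])  # random validation point
    j = jax.random.randint(k2, (), 0, v.shape[0])     # random constraint index

    grad_f_v = jax.grad(f_loss, argnums=1)(u, v, Xval[i], Yval[i])

    # c_j(u, v) is the j-th component of grad_v g; grad_v c_j is one Hessian
    # row, obtained by grad-of-grad so the full Hessian is never formed.
    c_j_fun = lambda vv: jax.grad(g_loss, argnums=1)(u, vv, Xtr, Ytr)[j]
    c_j, grad_c_j = c_j_fun(v), jax.grad(c_j_fun)(v)

    # Stochastic gradient of the augmented Lagrangian with respect to v.
    return v - eta * (grad_f_v + (mu * c_j + z[j]) * grad_c_j)
```

The hyperparameters $u$ and the dual variables $z$ would be updated with analogously sampled gradients, with the penalty parameter $\mu$ increased over the run as in standard penalty schemes.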
Contributions

We summarize the contributions as follows.

1. In this paper, we propose a doubly stochastic gradient descent method to improve the efficiency of the penalty method. Within each iteration, we randomly sample a validation instance and a constraint to calculate the stochastic gradient of the augmented Lagrangian function.

Related Works

In this section, we give a brief review of gradient-based hyperparameter optimization methods.

As mentioned in the previous section, many gradient-based methods use intermediate steps to approximate the hypergradient. Specifically, (Franceschi et al. 2017; Maclaurin, Duvenaud, and Adams 2015; Shaban et al. 2019) use forward/reverse mode differentiation (FMD/RMD) to approximate the hypergradient. Both view the training procedure as a dynamical system with state $v_{t+1} = \Phi(v_t, u) := v_t - \eta \nabla_v g(u, v_t)$ for $t = 1, \dots, T$, where $v$ denotes the model parameters, $u$ denotes the hyperparameters, $\eta$ is the learning rate, and $g$ is the training objective on the training set. According to (Feurer and Hutter 2019), RMD needs another $T$ iterations computing $p \leftarrow p - \eta \nabla^2_{uv} g \cdot q$ and $q \leftarrow (I - \eta \nabla^2_{vv} g)\, q$, and then outputs the hypergradient $p$, where $I$ is the identity matrix. Each $p$ update step needs $O(d)$ time and space to calculate the Jacobian-vector product, where $d$ is the size of the model parameters. In total, it needs $O(dT)$ time and $O(m + Td)$ space to get the hypergradient, where $m$ is the size of the hyperparameters. FMD needs to calculate $P \leftarrow P(I - \eta \nabla^2_{vv} g) - \eta \nabla^2_{uv} g$ while updating $v$ at the same time; the hypergradient can then be computed using $P$. (Feurer and Hutter 2019) pointed out that it needs $O(dm)$ time to obtain each $P$. Hence the time complexity for calculating one hypergradient is $O(dmT)$, and the space complexity is $O(dm)$ since $P$ is a $d \times m$ matrix. (Pedregosa 2016) solves the linear problem $\min_q \|\nabla^2_{vv} g \cdot q - \nabla_v f\|_2^2$ and uses the solution to approximate the hypergradient. When a gradient method is used to solve this problem, each iteration requires $O(d)$ time to obtain the Hessian-vector product, according to (Pearlmutter 1994).
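For comparison with the linear-system route just described, here is a hedged sketch of the implicit hypergradient computed with conjugate gradient and Hessian-vector products, so that $\nabla^2_{vv} g$ is never materialized. It assumes $v$ approximately minimizes $g(u, \cdot)$; the helper names are made up.

```python
import jax
import jax.numpy as jnp
from jax.scipy.sparse.linalg import cg

def approx_hypergradient(f, g, u, v):
    """Implicit hypergradient at an (approximate) minimizer v of g(u, .):
    solve  hess_vv(g) q = grad_v f  by CG, then combine with the mixed term."""
    grad_f_v = jax.grad(f, argnums=1)(u, v)

    # Hessian-vector product q -> hess_vv(g) @ q via grad-of-grad; each call
    # costs O(d) memory, as in (Pearlmutter 1994).
    hvp = lambda q: jax.grad(
        lambda vv: jnp.vdot(jax.grad(g, argnums=1)(u, vv), q))(v)

    q, _ = cg(hvp, grad_f_v)  # approximate solve of the linear system

    # Mixed second-derivative term hess_uv(g) @ q, again matrix-free.
    mixed = jax.grad(lambda uu: jnp.vdot(jax.grad(g, argnums=1)(uu, v), q))(u)
    return jax.grad(f, argnums=0)(u, v) - mixed
```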
Recommended publications
  • Penalty Methods for the Solution of Generalized Nash Equilibrium Problems (with Complete Test Problems)

PENALTY METHODS FOR THE SOLUTION OF GENERALIZED NASH EQUILIBRIUM PROBLEMS (WITH COMPLETE TEST PROBLEMS). Francisco Facchinei1 and Christian Kanzow2. 1 Sapienza University of Rome, Department of Computer and System Sciences "A. Ruberti", Via Ariosto 25, 00185 Roma, Italy, e-mail: [email protected]. 2 University of Würzburg, Institute of Mathematics, Am Hubland, 97074 Würzburg, Germany, e-mail: [email protected]. February 11, 2009. Abstract: The generalized Nash equilibrium problem (GNEP) is an extension of the classical Nash equilibrium problem where both the objective functions and the constraints of each player may depend on the rivals' strategies. This class of problems has a multitude of important engineering applications and yet solution algorithms are extremely scarce. In this paper, we analyze in detail a globally convergent penalty method that has favorable theoretical properties. We also consider strengthened results for a particular subclass of problems very often considered in the literature. Basically our method reduces the GNEP to a single penalized (and nonsmooth) Nash equilibrium problem. We suggest a suitable method for the solution of the latter penalized problem and present extensive numerical results. Key Words: Nash equilibrium problem, Generalized Nash equilibrium problem, Jointly convex problem, Exact penalty function, Global convergence. 1 Introduction: In this paper, we consider the Generalized Nash Equilibrium Problem (GNEP) and describe new globally convergent methods for its solution. The GNEP is an extension of the classical Nash Equilibrium Problem (NEP) where the feasible sets of each player may depend on the rivals' strategies. We assume there are $N$ players and denote by $x^\nu \in \mathbb{R}^{n_\nu}$ the vector representing the $\nu$-th player's strategy.
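As a toy illustration of the reduction described in the abstract — not the paper's algorithm, test problems, or penalty updating scheme — the sketch below absorbs a shared constraint into each player's objective via a nonsmooth exact penalty and runs simultaneous subgradient steps on the resulting penalized NEP; all problem data are invented.

```python
import jax
import jax.numpy as jnp

rho = 10.0                                     # penalty parameter (assumed)
g_shared = lambda x1, x2: x1 + x2 - 1.0        # shared constraint g(x) <= 0

theta1 = lambda x1, x2: (x1 - 1.0) ** 2        # made-up player objectives
theta2 = lambda x1, x2: (x2 - 1.0) ** 2

# Exact (nonsmooth) penalty: each player absorbs the shared constraint.
pen1 = lambda x1, x2: theta1(x1, x2) + rho * jnp.maximum(0.0, g_shared(x1, x2))
pen2 = lambda x1, x2: theta2(x1, x2) + rho * jnp.maximum(0.0, g_shared(x1, x2))

x1, x2 = jnp.array(0.0), jnp.array(0.0)
for _ in range(2000):                          # simultaneous subgradient steps
    d1 = jax.grad(pen1, argnums=0)(x1, x2)
    d2 = jax.grad(pen2, argnums=1)(x1, x2)
    x1, x2 = x1 - 1e-2 * d1, x2 - 1e-2 * d2
# For this toy data the iterates settle near x1 = x2 = 0.5 on the boundary.
```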
  • First-Order Methods for Semidefinite Programming

FIRST-ORDER METHODS FOR SEMIDEFINITE PROGRAMMING. Zaiwen Wen. Advisor: Professor Donald Goldfarb. Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences, Columbia University, 2009. © 2009 Zaiwen Wen. All Rights Reserved. ABSTRACT: Semidefinite programming (SDP) problems are concerned with minimizing a linear function of a symmetric positive definite matrix subject to linear equality constraints. These convex problems are solvable in polynomial time by interior point methods. However, if the number of constraints $m$ in an SDP is of order $O(n^2)$ when the unknown positive semidefinite matrix is $n \times n$, interior point methods become impractical both in terms of the time ($O(n^6)$) and the amount of memory ($O(m^2)$) required at each iteration to form the $m \times m$ positive definite Schur complement matrix $M$ and compute the search direction by finding the Cholesky factorization of $M$. This significantly limits the application of interior-point methods. In comparison, the computational cost of each iteration of first-order optimization approaches is much cheaper, particularly if any sparsity in the SDP constraints or other special structure is exploited. This dissertation is devoted to the development, analysis and evaluation of two first-order approaches that are able to solve large SDP problems which have been challenging for interior point methods. In chapter 2, we present a row-by-row (RBR) method based on solving a sequence of problems obtained by restricting the $n$-dimensional positive semidefinite constraint on the matrix $X$. By fixing any $(n-1)$-dimensional principal submatrix of $X$ and using its (generalized) Schur complement, the positive semidefinite constraint is reduced to a simple second-order cone constraint.
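For reference, the (generalized) Schur complement fact behind the RBR reduction, stated in its standard textbook form rather than quoted from the dissertation:

```latex
% With the principal block B fixed, PSD-feasibility of X becomes a condition
% on the free row/column (y, gamma) via the generalized Schur complement:
X=\begin{pmatrix} B & y\\ y^{\top} & \gamma \end{pmatrix}\succeq 0
\quad\Longleftrightarrow\quad
B\succeq 0,\qquad y\in\operatorname{range}(B),\qquad
\gamma-y^{\top}B^{\dagger}y\ge 0 .
```

The last inequality, $\gamma \ge y^{\top} B^{\dagger} y$, is representable as a second-order cone constraint in $(y, \gamma)$, which is the "simple second-order cone constraint" the abstract refers to.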
  • Constraint-Handling Techniques for Highly Constrained Optimization

Andreas Petrow: Constraint-Handling Techniques for Highly Constrained Optimization. Master Thesis, Intelligent Cooperative Systems, Computational Intelligence, Otto-von-Guericke Universität, Magdeburg, March 22, 2019. Supervisor: Prof. Dr.-Ing. habil. Sanaz Mostaghim. Advisors: M.Sc. Heiner Zille, Dr.-Ing. Markus Schori. Abstract: This thesis relates to constraint-handling techniques for highly constrained optimization. Many real-world optimization problems are non-linear and restricted by constraints. Population-based metaheuristics are commonly used to solve non-linear optimization problems, but modifications are required to enable such methods to respect the constraints; a research field concerned with such constraint-handling techniques has therefore been established. However, the combination of large-scale optimization problems with constraint-handling techniques is rarely studied. This thesis therefore examines common constraint-handling techniques and evaluates them on two benchmark problems. Contents: List of Figures; List of Tables; List of Acronyms; 1. Introduction and Motivation (1.1 Aim of this thesis; 1.2 Structure of this thesis); 2. Basics (2.1 Constrained Optimization Problem; 2.2 Optimization Method; 2.2.1 Classical Optimization; 2.2.2 Population-based Optimization Methods); 3. Related Work (3.1 Penalty Methods; 3.1.1 Static Penalty Functions; 3.1.2 Co-Evolution Penalty Methods; 3.1.3 Dynamic Penalty Functions; 3.1.4 Adaptive Penalty Functions; 3.2
  • Numerical Solution of Saddle Point Problems

Acta Numerica (2005), pp. 1–137. © Cambridge University Press, 2005. DOI: 10.1017/S0962492904000212. Printed in the United Kingdom. Numerical solution of saddle point problems. Michele Benzi (Department of Mathematics and Computer Science, Emory University, Atlanta, Georgia 30322, USA; e-mail: [email protected]), Gene H. Golub (Scientific Computing and Computational Mathematics Program, Stanford University, Stanford, California 94305-9025, USA; e-mail: [email protected]), Jörg Liesen (Institut für Mathematik, Technische Universität Berlin, D-10623 Berlin, Germany; e-mail: [email protected]). We dedicate this paper to Gil Strang on the occasion of his 70th birthday. Large linear systems of saddle point type arise in a wide variety of applications throughout computational science and engineering. Due to their indefiniteness and often poor spectral properties, such linear systems represent a significant challenge for solver developers. In recent years there has been a surge of interest in saddle point problems, and numerous solution techniques have been proposed for this type of system. The aim of this paper is to present and discuss a large selection of solution methods for linear systems in saddle point form, with an emphasis on iterative methods for large and sparse problems. (Benzi was supported in part by the National Science Foundation grant DMS-0207599; Golub in part by the Department of Energy of the United States Government; Liesen in part by the Emmy Noether Programm of the Deutsche Forschungsgemeinschaft.) Contents: 1. Introduction; 2. Applications leading to saddle point problems; 3. Properties of saddle point matrices; 4. Overview of solution algorithms; 5. Schur complement reduction; 6. Null space methods; 7. Coupled direct solvers; 8. Stationary iterations; 9. Krylov subspace methods; 10. Preconditioners; 11. Multilevel methods; 12. Available software; 13. Concluding remarks; References.
  • Constrained Cohort Intelligence Using Static and Dynamic Penalty Function Approach for Mechanical Components Design

Constrained Cohort Intelligence using Static and Dynamic Penalty Function Approach for Mechanical Components Design. Omkar Kulkarni1, Ninad Kulkarni1, Anand J Kulkarni2,3*, Ganesh Kakandikar1. 1 Department of Mechanical Engineering, Faculty of Engineering, Zeal College of Engineering and Research, Pune. 2 Odette School of Business, University of Windsor, 401 Sunset Avenue, Windsor, Ontario N9B 3P4, Canada; Email: [email protected]; Ph: 1 519 253 3000 x4939. 3 Department of Mechanical Engineering, Symbiosis Institute of Technology, Symbiosis International University, Pune MH 412 115 India; Email: [email protected], [email protected]; Ph: 91 20 39116468. Abstract: Most metaheuristics can efficiently solve unconstrained problems; however, their performance may degenerate when constraints are involved. This paper proposes two constraint-handling approaches for an emerging metaheuristic, Cohort Intelligence (CI): more specifically, CI with a static penalty function approach (SCI) and CI with a dynamic penalty function approach (DCI). The approaches have been tested by solving several constrained test problems. The performance of SCI and DCI has been compared with algorithms such as GA, PSO, ABC, and d-Ds, as well as on three real-world problems from the mechanical engineering domain, where improved solutions were obtained. The results were satisfactory and validated the applicability of the CI methodology for solving real-world problems. Keywords: Cohort Intelligence, Static Penalty Function Approach, Dynamic Penalty Function Approach, Constrained Optimization. 1. Introduction: In the past few years, several metaheuristics have been proposed in the field of optimization. They include a number of nature/bio-inspired optimization techniques such as Evolutionary Algorithms (EAs), Swarm Intelligence (SI), etc. Some of the SI techniques include Particle Swarm Optimization (PSO) and the Cuckoo Search Algorithm (CS) (Yang et al.
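As a generic illustration of the two schemes named in the title — textbook exterior-penalty forms, not the paper's exact CI variants or parameter settings:

```python
import jax.numpy as jnp

def violation(x, constraints):
    # Sum of squared violations of inequality constraints c(x) <= 0.
    return sum(jnp.maximum(0.0, c(x)) ** 2 for c in constraints)

def static_penalty(x, f, constraints, r=1e3):
    # Static scheme: one fixed penalty factor r for the entire run.
    return f(x) + r * violation(x, constraints)

def dynamic_penalty(x, f, constraints, t, C=0.5, alpha=2.0):
    # Dynamic scheme: the factor (C*t)^alpha grows with the iteration counter
    # t, so early exploration is cheap and late violations are punished hard.
    return f(x) + (C * t) ** alpha * violation(x, constraints)
```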
  • A Penalty Method for PDE-Constrained Optimization in Inverse Problems

A penalty method for PDE-constrained optimization in inverse problems. T. van Leeuwen1 and F.J. Herrmann2. 1 Mathematical Institute, Utrecht University, Utrecht, the Netherlands. 2 Dept. of Earth, Ocean and Atmospheric Sciences, University of British Columbia, Vancouver (BC), Canada. E-mail: [email protected]. arXiv:1504.02249v2 [math.OC], 23 Sep 2015. Abstract: Many inverse and parameter estimation problems can be written as PDE-constrained optimization problems. The goal is to infer the parameters, typically coefficients of the PDE, from partial measurements of the solutions of the PDE for several right-hand sides. Such PDE-constrained problems can be solved by finding a stationary point of the Lagrangian, which entails simultaneously updating the parameters and the (adjoint) state variables. For large-scale problems, such an all-at-once approach is not feasible as it requires storing all the state variables. In this case one usually resorts to a reduced approach where the constraints are explicitly eliminated (at each iteration) by solving the PDEs. These two approaches, and variations thereof, are the main workhorses for solving PDE-constrained optimization problems arising from inverse problems. In this paper, we present an alternative method that aims to combine the advantages of both approaches. Our method is based on a quadratic penalty formulation of the constrained optimization problem. By eliminating the state variable, we develop an efficient algorithm that has roughly the same computational complexity as the conventional reduced approach while exploiting a larger search space. Numerical results show that this method indeed reduces some of the non-linearity of the problem and is less sensitive to the initial iterate.
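The contrast between the reduced and the quadratic-penalty formulations can be sketched on a toy problem in which the "PDE" is an abstract linear system $A(m)u = q$ with measurement operator $P$; every name here is an illustrative assumption, not the paper's seismic setup.

```python
import jax.numpy as jnp

def A(m):
    # Stand-in for a discretized PDE operator depending on the parameters m.
    return jnp.diag(m) + 0.1 * jnp.eye(m.shape[0])

def reduced_objective(m, q, d, P):
    # Reduced approach: eliminate the state u by solving the PDE exactly.
    u = jnp.linalg.solve(A(m), q)
    return 0.5 * jnp.sum((P @ u - d) ** 2)

def penalty_objective(m, u, q, d, P, lam=1.0):
    # Quadratic-penalty approach: u is a free variable and the PDE holds only
    # approximately; larger lam enforces the constraint more strictly.
    data_misfit = 0.5 * jnp.sum((P @ u - d) ** 2)
    pde_residual = 0.5 * lam * jnp.sum((A(m) @ u - q) ** 2)
    return data_misfit + pde_residual
```

In the paper's approach the state is additionally eliminated in closed form for each $m$ (a least-squares solve), which is what keeps the cost close to the conventional reduced method.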
  • A Double-Multiplicative Dynamic Penalty Approach for Constrained Evolutionary Optimization

Struct Multidisc Optim (2008) 35:431–445. DOI 10.1007/s00158-007-0143-1. RESEARCH PAPER. A double-multiplicative dynamic penalty approach for constrained evolutionary optimization. Simone Puzzi & Alberto Carpinteri. Received: 13 July 2006 / Revised: 5 February 2007 / Accepted: 25 March 2007 / Published online: 2 June 2007. © Springer-Verlag 2007. Abstract: The present paper proposes a double-multiplicative penalty strategy for constrained optimization by means of genetic algorithms (GAs). The aim of this research is to provide a simple and efficient way of handling constrained optimization problems in the GA framework without the need for tuning the values of penalty factors for any given optimization problem. After a short review of the most popular and effective exterior penalty formulations, the proposed penalty strategy is presented and tested on five different benchmark problems. The obtained results are compared with the best solutions provided in the literature, showing the effectiveness of the proposed approach. Keywords: Genetic algorithms · Exterior penalty function · Constraint handling. [From the introduction:] ...include genetic algorithms (GAs; Holland 1975; Goldberg 1989), evolutionary programming (Fogel 1966), evolution strategies (Rechenberg 1973; Schwefel 1981), and biological evolution methods (Arutyunyan and Drozdov 1988; Azegami 1990; Mattheck and Burkhardt 1990). The present trend is to decrease the differences among these paradigms and refer, in generic terms, to evolutionary algorithms (EAs) or evolution programs (EPs) when talking about any of them (Michalewicz 1996). The reason behind the success of these approaches is that traditional nonlinear programming methods are local search techniques and are, thus, prone to converge toward local optima. This is a serious drawback in many practical structural optimization problems where the design space
  • A Generalized Newton-Penalty Algorithm for Large Scale Ill-Conditioned Quadratic Problems

Available at http://pvamu.edu/aam. Applications and Applied Mathematics: An International Journal (AAM). Appl. Appl. Math. ISSN: 1932-9466. Vol. 4, Issue 2 (December 2009), pp. 273–278 (Previously, Vol. 4, No. 2). A Generalized Newton-Penalty Algorithm for Large Scale Ill-Conditioned Quadratic Problems. Maziar Salahi and Moslem Ganji, Department of Mathematics, Faculty of Sciences, University of Guilan, P.O. Box 1914, Rasht, Iran. Received: July 11, 2009; Accepted: October 27, 2009. Abstract: Large scale quadratic problems arise in many real-world applications. It is quite often the case that the coefficient matrices in these problems are ill-conditioned. Thus, if the problem data are available even with small error, then solving them using classical algorithms might produce meaningless solutions. In this short paper, we propose an efficient generalized Newton-penalty algorithm for solving these problems. Our computational results show that our new simple algorithm is much faster and better than the approach of Rojas et al. (2000), which requires parameter tuning for different problems. Keywords: Large Scale Ill-Conditioned Quadratic Problems, Penalty Method. MSC (2000) No.: 90C06, 90C20. 1. Introduction: Large scale quadratic problems arise in many disciplines like image restoration [Rojas and Steihaug (2002)]. This is a well-studied problem and efficient algorithms have been developed to solve its various forms, for example see Salahi (2009 a, b). It is often the case that the problem is ill-conditioned. Thus, even a small error in the problem data might significantly change the solution. Regularization is a technique to deal with such situations, and the well-known one is Tikhonov regularization [Tikhonov (1963)], which considers the regularized problem $\min_x \|Ax - b\|^2 + \lambda \|x\|^2$ in place of the original least-squares problem, where $\lambda$ is the so-called regularization parameter [Hansen ].
  • An Augmented Lagrangian Method for Optimization Problems in Banach Spaces

An Augmented Lagrangian Method for Optimization Problems in Banach Spaces. Christian Kanzow, Daniel Steck, Daniel Wachsmuth. October 23, 2017. arXiv:1807.04467v1 [math.OC], 12 Jul 2018. Abstract: We propose a variant of the classical augmented Lagrangian method for constrained optimization problems in Banach spaces. Our theoretical framework does not require any convexity or second-order assumptions and allows the treatment of inequality constraints with infinite-dimensional image space. Moreover, we discuss the convergence properties of our algorithm with regard to feasibility, global optimality, and KKT conditions. Some numerical results are given to illustrate the practical viability of the method. Keywords: Constrained optimization, augmented Lagrangian method, Banach space, inequality constraints, global convergence. 1 Introduction: Let $X$, $Y$ be (real) Banach spaces and let $f : X \to \mathbb{R}$, $g : X \to Y$ be given mappings. The aim of this paper is to describe an augmented Lagrangian method for the solution of the constrained optimization problem $\min f(x)$ subject to (s.t.) $g(x) \le 0$ (P). We assume that $Y \hookrightarrow L^2(\Omega)$ densely for some measure space $\Omega$, where the natural order on $L^2(\Omega)$ induces the order on $Y$. A detailed description together with some remarks about this setting is given in Section 2. Augmented Lagrangian methods for the solution of optimization problems belong to the most famous and successful algorithms for the solution of finite-dimensional problems and are described in almost all textbooks on continuous optimization, see, e.g., [8, 29]. Their generalization to infinite-dimensional problems has received considerable attention throughout the last decades [6, 7, 16, 18, 20, 21, 22, 24, 25]. However, most existing approaches either assume a very specific problem structure [6, 7], require strong convexity assumptions [18] or consider only the case where $Y$ is finite-dimensional [20, 24].
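A finite-dimensional caricature of the augmented Lagrangian iteration that the paper generalizes to Banach spaces, using the usual max-shifted quadratic term for $g(x) \le 0$; the solver choices and parameters below are assumptions, not the paper's algorithm.

```python
import jax
import jax.numpy as jnp

def alm(f, g, x0, rho=10.0, outer=20, inner=200, eta=1e-2):
    """Augmented Lagrangian loop for min f(x) s.t. g(x) <= 0 (componentwise),
    with the first-order multiplier update lam <- max(0, lam + rho * g(x))."""
    x, lam = x0, jnp.zeros_like(g(x0))

    def L(x, lam):
        # Standard max-shifted augmented Lagrangian for inequality constraints.
        shifted = jnp.maximum(0.0, lam + rho * g(x))
        return f(x) + (jnp.sum(shifted ** 2) - jnp.sum(lam ** 2)) / (2.0 * rho)

    grad_L = jax.grad(L)
    for _ in range(outer):
        for _ in range(inner):          # inexact inner minimization in x
            x = x - eta * grad_L(x, lam)
        lam = jnp.maximum(0.0, lam + rho * g(x))
    return x, lam
```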
  • DSDP5: Software for Semidefinite Programming

DSDP5: Software for Semidefinite Programming. Preprint ANL/MCS-P1289-0905. STEVEN J. BENSON, Mathematics and Computer Science Division, Argonne National Laboratory, and YINYU YE, Department of Management Science and Engineering, Stanford University. DSDP implements the dual-scaling algorithm for semidefinite programming. The source code for this interior-point algorithm, written entirely in ANSI C, is freely available. The solver can be used as a subroutine library, as a function within the Matlab environment, or as an executable that reads and writes to data files. Initiated in 1997, DSDP has developed into an efficient and robust general-purpose solver for semidefinite programming. Its features include a convergence proof with polynomially bounded worst-case complexity, primal and dual feasible solutions when they exist, certificates of infeasibility when solutions do not exist, initial points that can be feasible or infeasible, relatively low memory requirements for an interior-point method, sparse and low-rank data structures, extensibility that allows applications to customize the solver and improve its performance, a subroutine library that enables it to be linked to larger applications, scalable performance for large problems on parallel architectures, and a well-documented interface and examples of its use. The package has been used in many applications and tested for efficiency, robustness, and ease of use. Categories and Subject Descriptors: G.1.6 [Numerical Analysis]: Optimization; D.2.13 [Software Engineering]: Reusable Software - Reusable libraries. General Terms: semidefinite programming, linear matrix inequalities, conic programming, interior-point methods. Additional Key Words and Phrases: dual-scaling algorithm. 1. SEMIDEFINITE PROGRAMMING: Over the last fifteen years, considerable attention has been paid to optimization problems in which the variable is not a vector but a symmetric matrix that is required to be positive semidefinite.
  • Comparison of Constraint-Handling Techniques for Metaheuristic Optimization

Comparison of Constraint-Handling Techniques for Metaheuristic Optimization. Xing-Shi He1, Qin-Wei Fan1, Mehmet Karamanoglu2, and Xin-She Yang2*. 1 College of Science, Xi'an Polytechnic University, Xi'an 710048, P. R. China. 2 School of Science and Technology, Middlesex University, London, UK. [email protected]. Abstract: Many design problems in engineering have highly nonlinear constraints, and the proper handling of such constraints can be important to ensure solution quality. There are many different ways of handling constraints and different algorithms for optimization problems, which makes it difficult for users to choose. This paper compares six different constraint-handling techniques such as penalty methods, barrier functions, the ε-constrained method, feasibility criteria and stochastic ranking. The pressure vessel design problem is solved by the flower pollination algorithm, and results show that stochastic ranking and the ε-constrained method are most effective for this type of design optimization. Keywords: Constraint-handling techniques · Feasibility · Flower pollination algorithm · Nature-inspired computation · Optimization. 1 Introduction: A vast majority of problems in engineering and science can be formulated as optimization problems with a set of inequality and equality constraints. However, such problems can be challenging to solve, not only because of the high nonlinearity of problem functions, but also because of the complex search domain shapes enclosed by various constraints. Consequently, both the choice of optimization algorithms and the ways of handling complex constraints are crucially important. Efficient algorithms may not exist for a given type of problem. Even with an efficient algorithm for a given problem, different ways of handling constraints may lead to varied accuracy.
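As a concrete instance of one of the compared techniques — Deb-style feasibility criteria, in their generic form rather than the paper's implementation — here is a minimal pairwise comparator:

```python
import jax.numpy as jnp

def total_violation(x, constraints):
    # Aggregate violation of the inequality constraints c(x) <= 0.
    return float(sum(jnp.maximum(0.0, c(x)) for c in constraints))

def better(xa, xb, f, constraints):
    """Feasibility rules: feasible beats infeasible; two feasible points
    compare by objective value; two infeasible points by total violation."""
    va = total_violation(xa, constraints)
    vb = total_violation(xb, constraints)
    if va == 0.0 and vb == 0.0:
        return f(xa) <= f(xb)
    if (va == 0.0) != (vb == 0.0):
        return va == 0.0
    return va <= vb
```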
  • PENNON: A Code for Convex Nonlinear and Semidefinite Programming

PENNON: A Code for Convex Nonlinear and Semidefinite Programming. Michal Kočvara*, Michael Stingl**. Abstract: We introduce a computer program PENNON for the solution of problems of convex Nonlinear and Semidefinite Programming (NLP-SDP). The algorithm used in PENNON is a generalized version of the Augmented Lagrangian method, originally introduced by Ben-Tal and Zibulevsky for convex NLP problems. We present a generalization of this algorithm to convex NLP-SDP problems, as implemented in PENNON, and details of its implementation. The code can also solve second-order conic programming (SOCP) problems, as well as problems with a mixture of SDP, SOCP and NLP constraints. Results of extensive numerical tests and comparison with other optimization codes are presented. The test examples show that PENNON is particularly suitable for large sparse problems. 1 Introduction: A class of iterative methods for convex nonlinear programming problems, introduced by Ben-Tal and Zibulevsky [3] and named PBM, proved to be very efficient for solving large-scale nonlinear programming (NLP) problems, in particular those arising from optimization of mechanical structures. The framework of the algorithm is given by the augmented Lagrangian method; the difference from the classic algorithm is in the definition of the augmented Lagrangian function. This is defined using a special penalty/barrier function satisfying certain properties; this definition guarantees good behavior of the Newton method when minimizing the augmented Lagrangian function. The PBM algorithm has recently been generalized to convex semidefinite programming problems [10]. The idea is to use the PBM penalty function to construct another function that penalizes matrix inequality constraints.