OPTIMIZATION and VARIATIONAL INEQUALITIES Basic Statements and Constructions

Total Pages: 16

File Type: pdf, Size: 1020 KB

OPTIMIZATION and VARIATIONAL INEQUALITIES: Basic statements and constructions

Bernd Kummer; [email protected]; May 2011

Abstract. This paper summarizes basic facts in both finite- and infinite-dimensional optimization and for variational inequalities. In addition, partially new results (concerning methods and stability) are included. They were elaborated in joint work with D. Klatte, Univ. Zürich, and J. Heerda, HU Berlin.

Key words. Existence of solutions, (strong) duality, Karush-Kuhn-Tucker points, Kojima-function, generalized equation, (quasi-) variational inequality, multifunctions, Kakutani Theorem, MFCQ, subgradient, subdifferential, conjugate function, vector-optimization, Pareto-optimality, solution methods, penalization, barriers, non-smooth Newton method, perturbed solutions, stability, Ekeland's variational principle, Lyusternik theorem, modified successive approximation, Aubin property, metric regularity, calmness, upper and lower Lipschitz, inverse and implicit functions, generalized Jacobian ∂_c f, generalized derivatives TF, CF, D*F, Clarke's directional derivative, limiting normals and subdifferentials, strict differentiability.

Remark (translated from German). The material is updated continuously. It is intended to support lectures on optimization and on variational inequalities (smooth and nonsmooth). It grew out of lecture notes written partly in English and partly in German; hence some passages still appear in both languages, and the notation occasionally varies, e.g. c^T x and ⟨c, x⟩ for the scalar product and the canonical bilinear form. Hints on errors are welcome.

Contents

1 Introduction   7
1.1 The intentions of this script   7
1.2 Notations   7
1.3 Was ist (mathematische) Optimierung?   8
1.4 Particular classes of problems   10
1.5 Crucial questions   11

2 Lineare Optimierung   13
2.1 Dualität und Existenzsatz für LOP   15
2.2 Die Simplexmethode   17
2.3 Schattenpreise   22
2.4 Klassische Transportaufgabe   22
2.5 Maximalstromproblem   24
2.6 The core of a game   25
2.7 Äquivalenz: Matrixspiel und Lineare Optimierung   26
2.7.1 Matrixspiel   26
2.7.2 Matrixspiel als LOP; Bestimmen einer GGS   27
2.7.3 LOP als Matrixspiel; Bestimmen einer LOP-Lösung   28
2.8 The Julia Robinson Algorithm   28

3 Elements of convex analysis   31
3.1 Separation of convex sets   31
3.2 Brouwer's and Kakutani's Fixpunktsatz   34
3.2.1 The theorems in IR^n   34
3.2.2 The theorems in normed spaces   38
3.3 Nash Equilibria and Minimax Theorem   38
3.3.1 The basic Existence Theorem   39
3.3.2 Normalized Nash Equilibria   39
3.4 Classical subdifferentials, normals, conjugation   40
3.4.1 Definitions and existence   41
3.4.2 Directional derivative, subdifferential, optimality condition   43
3.4.3 Particular subdifferentials in analytical form   46
3.5 Normal cones and Variational Inequalities   47
3.5.1 Definitions and Basic motivations   47
3.5.2 VI's and Fixed-points   48
3.5.3 Monotonicity   49
3.5.4 Analytical formulas for CM and NM = CM* and hypo-monotonicity   50
3.6 An Application: Tschebyschev-approximation   52

4 Nonlinear Problems in IR^n   53
4.1 Notwendige Optimalitätsbedingungen   53
4.1.1 KKT-Bedingungen, Lagrange-Multiplikatoren   53
4.1.2 Concrete Constraint Qualifications   57
4.1.3 Calmness, MFCQ and Aubin-Property   59
4.1.4 Calmness and Complementarity   60
4.2 The standard second order condition   61
4.3 Eine Anwendung: Brechungsgesetz   63
5 Duality and perturbations in Banach spaces   65
5.1 Separation in a normed space   65
5.2 Strong duality and subgradients in B-spaces   66
5.3 Minimax and saddle points   68
5.4 Ensuring strong duality and existence of LM's   69
5.4.1 The convex case   69
5.4.2 The C1 case and linear approximations   70
5.5 Modifications for vector-optimization   71

6 Lösungsverfahren; NLO in endl. Dimension   73
6.1 Freie Minima, Newton Meth. und variable Metrik   73
6.1.1 "Fixed stepsize" or Armijo-Goldstein rule   73
6.1.2 Line search   74
6.1.3 Variable metric   75
6.2 The nonsmooth Newton method   75
6.3 Cyclic Projections; Fejér-Methode   78
6.4 Proximal Points, Moreau-Yosida approximation   78
6.5 Schnittmethoden; Kelley-Schnitte   79
6.5.1 The original version   79
6.5.2 Minimizing a convex function by linear approximation   80
6.6 Strafmethoden   81
6.7 Barrieremethoden   83
6.8 Ganzzahligkeit und "Branch and Bound"   83
6.9 Second-order methods   85

7 Stability: Motivations and first results   87
7.1 Two-stage optimization and the main problems   87
7.1.1 The simplest two-stage problem   87
7.1.2 The general case   88
7.2 KKT under strict complementarity   89
7.3 Basic generalized derivatives   91
7.3.1 CF and TF   91
7.3.2 Co-derivatives and generalized Jacobians   91
7.4 Some chain rules   92
7.4.1 Adding a function   93
7.4.2 Inverse mappings   93
7.4.3 Derivatives of the inverse S = (h + F)^{-1}   93
7.4.4 Composed mappings   93
7.4.5 Linear transformations or diffeomorphisms of the space   94
7.5 Loc. Lipschitzian inverse and implicit functions   94
7.5.1 Inverse Lipschitz functions   94
7.5.2 Implicit Lipschitz functions   96

8 KKT points as zeros of equations   99
8.1 (S. M. Robinson's) Generalized equations   99
8.2 NCP-functions   100
8.3 Kojima's function   101
8.3.1 The product form   101
8.3.2 Implicit KKT-points   103
8.4 Relations to penalty-barrier functions   104
8.4.1 Quadratic Penalties   104
8.4.2 Quadratic and logarithmic barriers   105
8.4.3 Modifications and estimates   105
8.5 Regularity and Newton Meth. for KKT points and VI's   106

9 More elements of nonsmooth analysis   109
9.1 Abstract normals, subgradients, coderivatives, tangents   109
9.1.1 Subdifferentials and optimality conditions derived from normals   110
9.1.2 (i) Usual localized normals   110
9.1.3 (ii) Fréchet normals   111
9.1.4 (iii) Limiting ε-normals   112
9.1.5 (iv) Limiting Fréchet normals   113
9.1.6 Equivalence of the limiting concepts   113
9.2 (v) Clarke's approach   114
9.2.1 Basic definitions and interrelations   114
9.2.2 Mean-value theorems and Generalized Jacobian   118
9.3 Subdifferentials derived from optimality conditions   120
9.3.1 The elementary principle   120
9.3.2 The empty and limiting F-subdifferential   121

10 Stability (regularity) of solutions   123
10.1 Definitions of Stability   123
10.2 Interrelations and composed mappings   124
10.3 Stability of multifunctions via Lipschitz functions   125
10.3.1 Basic transformations   125
10.3.2 Solutions of (exact) penalty problems   127
10.4 Stability and algorithms   128
10.5 The algorithmic framework   129
10.6 Stability via approximate projections   131
10.7 Stability of a locally Lipschitz operator   132
10.7.1 Calmness and the relative slack for inequality systems   133
10.7.2 The relative slack for solving C1 inequality systems   135
10.7.3 Calmness and crucial linear inequality systems   136
10.8 Modified successive approximation for −h(x) ∈ F(x)   137
11 Ekeland's Principle, Stability and gen. Derivatives   139
11.1 Ekeland's principle and Aubin property   139
11.1.1 Application to the Aubin property   140
11.1.2 Weakly stationary points   142
11.2 Stability in terms of generalized derivatives   143
11.2.1 Strongly Lipschitz   143
11.2.2 Upper Lipschitz   144
11.2.3 Lower Lipschitz   144
11.2.4 Aubin property   144
11.2.5 Summary
Recommended publications
  • Lecture 7: Convex Analysis and Fenchel-Moreau Theorem
Lecture 7: Convex Analysis and Fenchel-Moreau Theorem. The main tools in mathematical finance come from the theory of stochastic processes because things are random. However, many objects are convex as well, e.g. collections of probability measures or trading strategies, utility functions, risk measures, etc. Convex duality methods often lead to new insight, computational techniques and optimality conditions; for instance, pricing formulas for financial instruments and characterizations of different types of no-arbitrage conditions. Convex sets. Let X be a real topological vector space, and let X* denote the topological (or algebraic) dual of X. Throughout, we assume that subsets and functionals are proper, i.e., ∅ ≠ C ≠ X and −∞ < f ≢ +∞. Definition 1. Let C ⊂ X. We call C affine if λx + (1 − λ)y ∈ C for all x, y ∈ C and λ ∈ ]−∞, +∞[; convex if λx + (1 − λ)y ∈ C for all x, y ∈ C and λ ∈ [0, 1]; a cone if λx ∈ C for all x ∈ C and λ ∈ ]0, +∞[. Recall that a set A is called algebraically open if the sets {t : x + tv ∈ A} are open in ℝ for every x, v ∈ X. In particular, open sets are algebraically open. Theorem 1 (Separating Hyperplane Theorem). Let A, B be convex subsets of X, with A (algebraically) open. Then the following are equivalent: 1. A and B are disjoint. 2. There exists an affine set {x ∈ X : f(x) = c}, with f ∈ X*, c ∈ ℝ, such that A ⊂ {x ∈ X : f(x) < c} and B ⊂ {x ∈ X : f(x) ≥ c}.
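As a quick illustration of Theorem 1 in the excerpt above, here is a standard two-dimensional example (mine, not part of the original lecture note):

```latex
% Separation of two disjoint convex sets in X = R^2 (standard illustration,
% not taken from the lecture note above).
% A = open unit ball, B = closed half-plane {x : x_1 >= 2}.
% The functional f(x) = x_1 (an element of X^*) with level c = 3/2 gives
%   A \subset {x : f(x) < 3/2},  since x_1 <= ||x|| < 1 < 3/2 on A,
%   B \subset {x : f(x) >= 3/2}, since x_1 >= 2 > 3/2 on B,
% so the affine set {x : x_1 = 3/2} separates A and B as in Theorem 1.
\[
A=\{x\in\mathbb{R}^2:\|x\|<1\},\qquad
B=\{x\in\mathbb{R}^2:x_1\ge 2\},\qquad
f(x)=x_1,\;\; c=\tfrac{3}{2}.
\]
```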
  • About Stationarity and Regularity in Variational Analysis
ABOUT STATIONARITY AND REGULARITY IN VARIATIONAL ANALYSIS. Alexander Y. Kruger. To Boris Mordukhovich on his 60th birthday. Abstract. Stationarity and regularity concepts for the three classes of objects typical for variational analysis (real-valued functions, collections of sets, and multifunctions) are investigated. An attempt is made to present a classification scheme for such concepts and to show that properties introduced for objects from different classes can be treated in a similar way. Furthermore, in many cases the corresponding properties appear to be in a sense equivalent. The properties are defined in terms of certain constants which, in the case of regularity properties, also provide quantitative characterizations of these properties. The relations between different constants and properties are discussed. "An important feature of the new variational techniques is that they can handle nonsmooth functions, sets and multifunctions equally well" (Borwein and Zhu [8]). 1. Introduction. The paper investigates extremality, stationarity and regularity properties of real-valued functions, collections of sets, and multifunctions, attempting to develop a unifying scheme for defining and using such properties. Under different names, properties of this type have been explored for centuries. A classical example of a stationarity condition is given by the Fermat theorem on local minima and maxima of differentiable functions. In a sense, any necessary optimality (extremality) conditions define/characterize certain stationarity (singularity/irregularity) properties. The separation theorem also characterizes a kind of extremal (stationary) behavior of convex sets. Surjectivity of a linear continuous mapping in the Banach open mapping theorem (and its extension to nonlinear mappings known as the Lyusternik-Graves theorem) is an example of a regularity condition.
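A minimal illustration of the stationarity vs. extremality distinction mentioned in the excerpt above (my example, not from the paper):

```latex
% Fermat's theorem: if a differentiable f : R -> R has a local minimum or
% maximum at x*, then f'(x*) = 0, i.e. x* is a stationary point.
% The converse fails, which is exactly why stationarity is the weaker notion:
\[
f(x) = x^{3}, \qquad f'(0) = 0,
\]
% so x* = 0 is stationary, yet f attains neither a local minimum nor a local
% maximum there; necessary optimality conditions single out such points only.
```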
  • Monotonic Transformations: Cardinal Versus Ordinal Utility
Natalia Lazzati, Mathematics for Economics (Part I), Note 10: Quasiconcave and Pseudoconcave Functions. Note 10 is based on Madden (1986, Ch. 13, 14) and Simon and Blume (1994, Ch. 21). Monotonic transformations: cardinal versus ordinal utility. A utility function could be said to measure the level of satisfaction associated to each commodity bundle. Nowadays, no economist really believes that a real number can be assigned to each commodity bundle which expresses (in utils?) the consumer's level of satisfaction with this bundle. Economists believe that consumers have well-behaved preferences over bundles and that, given any two bundles, a consumer can indicate a preference of one over the other or the indifference between the two. Although economists work with utility functions, they are concerned with the level sets of such functions, not with the number that the utility function assigns to any given level set. In consumer theory these level sets are called indifference curves. A property of utility functions is called ordinal if it depends only on the shape and location of a consumer's indifference curves. It is alternatively called cardinal if it also depends on the actual amount of utility the utility function assigns to each indifference set. In the modern approach, we say that two utility functions are equivalent if they have the same indifference sets, although they may assign different numbers to each level set. For instance, let u : R_+^2 → R with u(x) = x1·x2 be a utility function, and let v : R_+^2 → R be the utility function v(x) = u(x) + 1. These two utility functions represent the same preferences and are therefore equivalent.
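A small numerical check of the equivalence claim above (a sketch with my own function names, not part of the original note): two utility functions represent the same preferences exactly when they rank every pair of bundles identically, which the script below verifies for u(x) = x1·x2, the shift u + 1, and the increasing transform exp(u).

```python
# Sketch (not from the original note): verify that u(x) = x1*x2 and two
# monotone transforms of it induce the same preference ranking on random bundles.
import itertools
import math
import random

def u(x):               # original utility
    return x[0] * x[1]

def v(x):               # shifted utility, v = u + 1
    return u(x) + 1.0

def w(x):               # another strictly increasing transform, w = exp(u)
    return math.exp(u(x))

def same_ranking(f, g, bundles):
    """True if f and g order every pair of bundles the same way (ordinal equivalence)."""
    for a, b in itertools.combinations(bundles, 2):
        # both the strict comparison and the tie pattern must agree
        if (f(a) > f(b)) != (g(a) > g(b)) or (f(a) == f(b)) != (g(a) == g(b)):
            return False
    return True

random.seed(0)
bundles = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(200)]
print(same_ranking(u, v, bundles))  # True: u and u+1 are ordinally equivalent
print(same_ranking(u, w, bundles))  # True: any strictly increasing transform works
```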
  • The Ekeland Variational Principle, the Bishop-Phelps Theorem, and the Brøndsted-Rockafellar Theorem
The Ekeland Variational Principle, the Bishop-Phelps Theorem, and the Brøndsted-Rockafellar Theorem. Our aim is to prove the Ekeland Variational Principle, an abstract result that has found numerous applications in various fields of mathematics. As its application to convex analysis, we provide a proof of the famous Bishop-Phelps Theorem and some related results. Let us recall that the epigraph and the hypograph of a function f : X → [−∞, +∞] (where X is a set) are the following subsets of X × ℝ: epi f = {(x, t) ∈ X × ℝ : t ≥ f(x)}, hyp f = {(x, t) ∈ X × ℝ : t ≤ f(x)}. Notice that, if f > −∞ on X, then epi f ≠ ∅ if and only if f is proper, that is, f is finite at at least one point. 0.1. The Ekeland Variational Principle. In what follows, (X, d) is a metric space. Given λ > 0 and (x, t) ∈ X × ℝ, we define the set Kλ(x, t) = {(y, s) : s ≤ t − λd(x, y)} = {(y, s) : λd(x, y) ≤ t − s} ⊂ X × ℝ. Notice that Kλ(x, t) = hyp[t − λd(x, ·)]. Let us state some useful properties of these sets. Lemma 0.1. (a) Kλ(x, t) is closed and contains (x, t). (b) If (x̄, t̄) ∈ Kλ(x, t), then Kλ(x̄, t̄) ⊂ Kλ(x, t). (c) If (xn, tn) → (x, t) and (x_{n+1}, t_{n+1}) ∈ Kλ(xn, tn) for n ∈ ℕ, then Kλ(x, t) = ⋂_{n∈ℕ} Kλ(xn, tn). Proof. (a) is quite easy.
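For context, the one-line argument behind part (b) of the lemma quoted above is the triangle inequality (my reconstruction; the excerpt itself stops before this proof):

```latex
% Sketch of Lemma 0.1(b) (reconstruction, not quoted from the source):
% let (xbar, tbar) \in K_lambda(x, t) and (y, s) \in K_lambda(xbar, tbar); then
\[
\lambda d(x,y) \;\le\; \lambda d(x,\bar{x}) + \lambda d(\bar{x},y)
\;\le\; (t-\bar{t}) + (\bar{t}-s) \;=\; t-s,
\]
% hence (y, s) \in K_lambda(x, t), i.e. K_lambda(xbar, tbar) \subset K_lambda(x, t).
```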
  • Conjugate of Some Rational Functions and Convex Envelope of Quadratic Functions Over a Polytope
Conjugate of some Rational Functions and Convex Envelope of Quadratic Functions over a Polytope, by Deepak Kumar, Bachelor of Technology, Indian Institute of Information Technology, Design and Manufacturing, Jabalpur, 2012. A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in The College of Graduate Studies (Computer Science), The University of British Columbia (Okanagan), January 2019. © Deepak Kumar, 2019. The following individuals certify that they have read, and recommend to the College of Graduate Studies for acceptance, a thesis/dissertation entitled "Conjugate of some Rational Functions and Convex Envelope of Quadratic Functions over a Polytope", submitted by Deepak Kumar in partial fulfilment of the requirements of the degree of Master of Science: Dr. Yves Lucet, I. K. Barber School of Arts & Sciences (Supervisor); Dr. Heinz Bauschke, I. K. Barber School of Arts & Sciences (Supervisory Committee Member); Dr. Warren Hare, I. K. Barber School of Arts & Sciences (Supervisory Committee Member); Dr. Homayoun Najjaran, School of Engineering (University Examiner). Abstract. Computing the convex envelope or biconjugate is the core operation that bridges the domain of nonconvex analysis with convex analysis. For a bivariate PLQ function defined over a polytope, we start with computing the convex envelope of each piece. This convex envelope is characterized by a polyhedral subdivision such that over each member of the subdivision it has an implicitly defined rational form (square of a linear function over a linear function). Computing this convex envelope involves solving an exponential number of subproblems which, in turn, leads to an exponential time algorithm.
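As a point of reference for the envelope computations described above, here is the simplest one-dimensional instance (a standard textbook example, not taken from the thesis): over a polytope that is an interval, the convex envelope of a concave quadratic is the chord through the endpoint values.

```latex
% Standard 1-D illustration (not from the thesis): convex envelope of a
% concave quadratic over the polytope [0, 1].
\[
f(x) = -x^{2}\ \text{on}\ [0,1]
\qquad\Longrightarrow\qquad
\operatorname{conv} f(x) = -x ,
\]
% since the chord through (0, f(0)) = (0,0) and (1, f(1)) = (1,-1) is the
% largest convex function lying below the concave f on the interval.
% For the bivariate PLQ pieces considered in the thesis, the analogous
% envelopes take the rational form mentioned in the abstract.
```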
  • On Nonconvex Subdifferential Calculus in Banach Spaces
Journal of Convex Analysis, Volume 2 (1995), No. 1/2, 211-227. On Nonconvex Subdifferential Calculus in Banach Spaces. Boris S. Mordukhovich, Yongheng Shao, Department of Mathematics, Wayne State University, Detroit, Michigan 48202, U.S.A. E-mail: [email protected]. Received July 1994; revised manuscript received March 1995. Dedicated to R. T. Rockafellar on his 60th Birthday. We study a concept of subdifferential for general extended-real-valued functions defined on arbitrary Banach spaces. At any point considered, this basic subdifferential is a nonconvex set of subgradients formed by sequential weak-star limits of the so-called Fréchet ε-subgradients of the function at points nearby. It is known that such and related constructions possess full calculus under special geometric assumptions on the Banach spaces where they are defined. In this paper we establish some useful calculus rules in the general Banach space setting. Keywords: Nonsmooth analysis, generalized differentiation, extended-real-valued functions, nonconvex calculus, Banach spaces, sequential limits. 1991 Mathematics Subject Classification: Primary 49J52; Secondary 58C20. 1. Introduction. It is well known that for studying the local behavior of nonsmooth functions one can successfully use an appropriate concept of subdifferential (set of subgradients) which replaces the classical gradient at points of nondifferentiability in the usual two-sided sense. Such a concept first appeared for convex functions in the context of convex analysis, which has had many significant applications to a range of problems in optimization, economics, mechanics, etc. One of the most important branches of convex analysis is subdifferential calculus for convex extended-real-valued functions, which was developed in the 1960s, mainly by Moreau and Rockafellar; see [23] and [25, 27] for expositions and references.
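To make the phrase "nonconvex set of subgradients" concrete, here is a standard one-dimensional example (mine, not from the paper):

```latex
% Standard example (not from the paper): f(x) = -|x| on R, at the point 0.
% Fréchet subgradients exist only where f is differentiable:
%   \hat\partial f(x) = {-1} for x > 0,   \hat\partial f(x) = {+1} for x < 0,
%   \hat\partial f(0) = \emptyset.
% Taking sequential limits of these subgradients at points nearby gives the
% basic (limiting) subdifferential at 0, which is nonconvex:
\[
\partial f(0) = \{-1,\,+1\},
\qquad\text{whereas Clarke's convexified subdifferential is } [-1,1].
\]
```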
  • EECS 127/227AT Fall 2020, Homework 5
EECS 127/227AT Fall 2020, Homework 5. Homework 5 is due on Gradescope by Friday 10/09 at 11.59 p.m. 1. A combinatorial problem formulated as a convex optimization problem. In this question we will see how combinatorial optimization problems can also sometimes be solved via related convex optimization problems. We also saw this in a different context in problem 5 on Homework 3, when we related λ2 to φ(G) for a graph. This problem does not have any connection with the earlier problem in Homework 3. The relevant sections of the textbooks are Sec. 8.3 of the textbook of Calafiore and El Ghaoui and Secs. 4.1 and 4.2 (except 4.2.5) of the textbook of Boyd and Vandenberghe. We have to decide which items to sell among a collection of n items. The items have given selling prices si > 0, i = 1, ..., n, so selling item i results in revenue si. Selling item i also implies a transaction cost ci > 0. The total transaction cost is the sum of transaction costs of the items sold, plus a fixed amount, which we normalize to 1. We are interested in the following decision problem: decide which items to sell, with the aim of maximizing the margin, i.e. the ratio of the total revenue to the total transaction cost. Note that this is a combinatorial optimization problem, because the set of choices is a discrete set (here it is the set of all subsets of {1, ..., n}). In this question we will see that, nevertheless, it is possible to pose this combinatorial optimization problem as a convex optimization problem.
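One way to write the problem described above in symbols, together with the usual linear-fractional relaxation, is sketched below (a standard route, not necessarily the solution the homework intends):

```latex
% Sketch (standard approach, not necessarily the intended homework solution).
% Encode the selection by x in {0,1}^n and maximize the margin:
\[
\max_{x\in\{0,1\}^{n}}\ \frac{s^{\top}x}{1 + c^{\top}x}.
\]
% Relaxing x to the box [0,1]^n gives a linear-fractional program; the
% Charnes-Cooper substitution y = x/(1 + c^T x), t = 1/(1 + c^T x) turns it
% into the linear (hence convex) program
\[
\max_{y,\,t}\ s^{\top}y
\quad\text{s.t.}\quad t + c^{\top}y = 1,\qquad 0 \le y \le t\,\mathbf{1},\qquad t \ge 0 .
\]
```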
  • Lecture Slides on Convex Analysis and Optimization
LECTURE SLIDES ON CONVEX ANALYSIS AND OPTIMIZATION, based on 6.253 class lectures at the Mass. Institute of Technology, Cambridge, Mass., Spring 2014, by Dimitri P. Bertsekas, http://web.mit.edu/dimitrib/www/home.html. Based on the books 1) "Convex Optimization Theory," Athena Scientific, 2009, and 2) "Convex Optimization Algorithms," Athena Scientific, 2014 (in press). Supplementary material (solved exercises, etc.) at http://www.athenasc.com/convexduality.html. LECTURE 1: AN INTRODUCTION TO THE COURSE. Lecture outline: the role of convexity in optimization; duality theory; algorithms and duality; course organization. HISTORY AND PREHISTORY. Prehistory (early 1900s - 1949): Caratheodory, Minkowski, Steinitz, Farkas; properties of convex sets and functions. Fenchel-Rockafellar era (1949 - mid 1980s): duality theory; minimax/game theory (von Neumann); (sub)differentiability, optimality conditions, sensitivity. Modern era - paradigm shift (mid 1980s - present): nonsmooth analysis (a theoretical/esoteric direction); algorithms (a practical/high-impact direction); a change in the assumptions underlying the field. OPTIMIZATION PROBLEMS. Generic form: minimize f(x) subject to x ∈ C, with cost function f : ℜ^n → ℜ and constraint set C, e.g. C = X ∩ {x | h1(x) = 0, ..., hm(x) = 0} ∩ {x | g1(x) ≤ 0, ..., gr(x) ≤ 0}. Continuous vs. discrete problem distinction. Convex programming problems are those for which f and C are convex; they are continuous problems; they are nice, and have beautiful and intuitive structure. However, convexity permeates
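A concrete instance of the generic form quoted above (my example, not from the slides): a convex quadratic minimized over an affine set intersected with the nonnegative orthant.

```latex
% Concrete instance of "minimize f(x) subject to x in C" (mine, not from the slides):
%   f(x) = (1/2)||x||^2,  C = {x | Ax = b} \cap {x | -x_j <= 0, j = 1,...,n},
% a convex programming problem in the sense of the outline: f and C are convex.
\[
\min_{x\in\mathbb{R}^{n}}\ \tfrac{1}{2}\|x\|^{2}
\quad\text{subject to}\quad Ax = b,\quad x \ge 0 .
\]
```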
  • 2011-Dissertation-Khajavirad.Pdf
Carnegie Mellon University, Carnegie Institute of Technology. Thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy. Title: Convexification Techniques for Global Optimization of Nonconvex Nonlinear Optimization Problems, presented by Aida Khajavirad, accepted by the Department of Mechanical Engineering. Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Mechanical Engineering. Aida Khajavirad, B.S., Mechanical Engineering, University of Tehran. Carnegie Mellon University, Pittsburgh, PA, August 2011. © 2011 Aida Khajavirad. All rights reserved. Abstract. Fueled by applications across science and engineering, general-purpose deterministic global optimization algorithms have been developed for nonconvex nonlinear optimization problems over the past two decades. Central to the efficiency of such methods is their ability to construct sharp convex relaxations. Current general-purpose global solvers rely on factorable programming techniques to iteratively decompose nonconvex factorable functions, through the introduction of variables and constraints for intermediate functional expressions, until each intermediate expression can be outer-approximated by a convex feasible set. While it is easy to automate, this factorable programming technique often leads to weak relaxations. In this thesis, we develop the theory of several new classes of cutting planes based upon ideas from generalized convexity and convex analysis. Namely, we (i) introduce a new method to outer-approximate convex-transformable functions, an important class of generalized convex functions that subsumes many functional forms that frequently appear in nonconvex problems, and (ii) derive closed-form expressions for the convex envelopes of various types of functions that are the building blocks of nonconvex problems.
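For readers unfamiliar with the factorable programming technique discussed above, a minimal illustration (standard material, not taken from the thesis): the nonconvex term x·e^y over a box is decomposed into intermediate expressions, each outer-approximated separately, e.g. the bilinear step by its McCormick envelope.

```latex
% Standard factorable decomposition of x * exp(y) over a box
% [xL, xU] x [yL, yU] (illustration only, not from the thesis):
%   w1 = exp(y)   -- outer-approximated by its secant (above) and tangents (below),
%   w2 = x * w1   -- outer-approximated by the McCormick inequalities
\[
\begin{aligned}
w_2 &\ge x_L w_1 + w_{1,L}\,x - x_L w_{1,L}, &
w_2 &\ge x_U w_1 + w_{1,U}\,x - x_U w_{1,U},\\
w_2 &\le x_U w_1 + w_{1,L}\,x - x_U w_{1,L}, &
w_2 &\le x_L w_1 + w_{1,U}\,x - x_L w_{1,U},
\end{aligned}
\]
% where w_{1,L} = exp(yL) and w_{1,U} = exp(yU) bound the intermediate w1.
% Relaxing each intermediate separately is what can make the overall relaxation weak.
```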
  • A Microlocal Characterization of Lipschitz Continuity
A microlocal characterization of Lipschitz continuity. Benoît Jubin. November 3, 2017. arXiv:1611.04644v2 [math.AG], 2 Nov 2017. Abstract. We study continuous maps between differential manifolds from a microlocal point of view. In particular, we characterize the Lipschitz continuity of these maps in terms of the microsupport of the constant sheaf on their graph. Furthermore, we give lower and upper bounds on the microsupport of the graph of a continuous map and use these bounds to characterize strict differentiability in microlocal terms. Key words: microlocal theory of sheaves, Lipschitz maps, Dini derivatives. MSC: 35A27, 26A16, 26A24. Contents: Introduction 1; 1 Background material 4; 2 Microsupports associated with subsets 11; 3 Whitney cones of maps 14; 4 Conormals of maps 19; 5 Real-valued functions 24; 6 Main results 29; 7 Topological submanifolds 37; A Appendix: Lipschitz continuity and strict differentiability 39; B Appendix: tangent cones 40. Introduction. Microlocal analysis is the study of phenomena occurring on differential manifolds via a study in their cotangent bundle; for instance, the study of the singularities of solutions of a partial differential equation on a manifold M via the study of their wavefront set in T*M. A general setting for microlocal analysis is the microlocal theory of sheaves, developed by M. Kashiwara and P. Schapira (see [KS90]). In [Vic], N. Vichery used this theory to study continuous real-valued functions on differential manifolds from a microlocal viewpoint, and to define a good notion of subdifferential for these functions. We extend this study to continuous maps between differential manifolds. We study the tangent aspects of the subject simultaneously, to emphasize the parallelism between the tangent and cotangent sides.
  • Duality and Auxiliary Functions for Bregman Distances
Duality and Auxiliary Functions for Bregman Distances. Stephen Della Pietra, Vincent Della Pietra, and John Lafferty. October 8, 2001. CMU-CS-01-109, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213. Abstract. We formulate and prove a convex duality theorem for Bregman distances and present a technique based on auxiliary functions for deriving and proving convergence of iterative algorithms to minimize Bregman distance subject to linear constraints. This research was partially supported by the Advanced Research and Development Activity in Information Technology (ARDA), contract number MDA904-00-C-2106, and by the National Science Foundation (NSF), grant CCR-9805366. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of ARDA, NSF, or the U.S. government. Keywords: Bregman distance, convex duality, Legendre functions, auxiliary functions. I. Introduction. Convexity plays a central role in a wide variety of machine learning and statistical inference problems. A standard paradigm is to distinguish a preferred member from a set of candidates based upon a convex impurity measure or loss function tailored to the specific problem to be solved. Examples include least squares regression, decision trees, boosting, online learning, maximum likelihood for exponential models, logistic regression, maximum entropy, and support vector machines. Such problems can often be naturally cast as convex optimization problems involving a Bregman distance, which can lead to new algorithms, analytical tools, and insights derived from the powerful methods of convex analysis. In this paper we formulate and prove a convex duality theorem for minimizing a general class of Bregman distances subject to linear constraints.
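Since the excerpt does not reproduce it, the standard definition of a Bregman distance, with the two most common instances, is recalled below (my notation, not the report's):

```latex
% Standard definition (my notation, not quoted from the report): for a
% differentiable, strictly convex F on a convex domain,
\[
D_F(p, q) \;=\; F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle \;\ge\; 0 .
\]
% Two familiar instances:
%   F(x) = (1/2)||x||^2       gives  D_F(p,q) = (1/2)||p - q||^2,
%   F(x) = sum_i x_i log x_i  gives the (unnormalized) Kullback-Leibler
%                             divergence  sum_i [ p_i log(p_i/q_i) - p_i + q_i ].
```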
  • Radial Duality Part I: Foundations
Radial Duality Part I: Foundations. Benjamin Grimmer. Abstract. Renegar [11] introduced a novel approach to transforming generic conic optimization problems into unconstrained, uniformly Lipschitz continuous minimization. We introduce radial transformations generalizing these ideas, equipped with an entirely new motivation and development that avoids any reliance on convex cones or functions. Perhaps of greatest practical importance, this facilitates the development of new families of projection-free first-order methods applicable even in the presence of nonconvex objectives and constraint sets. Our generalized construction of this radial transformation uncovers that it is dual (i.e., self-inverse) for a wide range of functions including all concave objectives. This gives a powerful new duality relating optimization problems to their radially dual problem. For a broad class of functions, we characterize continuity, differentiability, and convexity under the radial transformation as well as develop a calculus for it. This radial duality provides a strong foundation for designing projection-free radial optimization algorithms, which is carried out in the second part of this work. 1 Introduction. Renegar [11] introduced a framework for conic programming (and by reduction, convex optimization), which turns such problems into uniformly Lipschitz optimization. After being radially transformed, a simple subgradient method can be applied and analyzed. Notably, even for constrained problems, such an algorithm maintains a feasible solution at each iteration while avoiding the use of orthogonal projections, which can often be a bottleneck for first-order methods. Subsequently, Grimmer [8] showed that in the case of convex optimization, a simplified radial subgradient method can be applied with simpler and stronger convergence guarantees.