Convex Optimization

Total Pages: 16

File Type: pdf, Size: 1020 KB

Convex Optimization
Dani Yogatama
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
February 12, 2014

Key Concepts in Convex Analysis: Convex Sets (figure)

Key Concepts in Convex Analysis: Convex Functions (figure)

Key Concepts in Convex Analysis: Minimizers (figure)

Key Concepts in Convex Analysis: Strong Convexity

Recall the definition of a convex function: for all λ ∈ [0, 1],

    f(λx + (1 − λ)x′) ≤ λf(x) + (1 − λ)f(x′)    (convexity)

A β-strongly convex function satisfies a stronger condition: for all λ ∈ [0, 1],

    f(λx + (1 − λ)x′) ≤ λf(x) + (1 − λ)f(x′) − (β/2) λ(1 − λ) ‖x − x′‖₂²    (strong convexity)

Strong convexity ⇒ strict convexity, but the converse does not hold.

Key Concepts in Convex Analysis: Subgradients

Convexity ⇒ continuity, but convexity ⇏ differentiability (e.g., f(w) = ‖w‖₁). Subgradients generalize gradients to (possibly non-differentiable) convex functions:

    v is a subgradient of f at x if f(x′) ≥ f(x) + v⊤(x′ − x),

i.e., the affine function x′ ↦ f(x) + v⊤(x′ − x) is a linear lower bound on f; in the non-differentiable case there can be many such lower bounds at a single point.

Subdifferential: ∂f(x) = {v : v is a subgradient of f at x}. If f is differentiable, ∂f(x) = {∇f(x)}.

Notation: ∇̃f(x) denotes a subgradient of f at x.
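The following Python sketch is an added illustration (not part of the original slides): it picks one element of the subdifferential of f(w) = ‖w‖₁ and numerically checks the subgradient inequality f(x′) ≥ f(x) + v⊤(x′ − x) at a few random points. The sign-based choice is just one valid subgradient; at a coordinate where w_i = 0, any value in [−1, 1] works.

```python
import numpy as np

def l1(w):
    # f(w) = ||w||_1: convex, but not differentiable where any w_i = 0.
    return np.abs(w).sum()

def l1_subgradient(w):
    # One element of the subdifferential of the L1 norm at w:
    # sign(w_i) where w_i != 0, and 0 (a valid choice in [-1, 1]) where w_i == 0.
    return np.sign(w)

rng = np.random.default_rng(0)
x = rng.normal(size=5)
v = l1_subgradient(x)

# Check the subgradient inequality f(x') >= f(x) + v^T (x' - x) at random x'.
for _ in range(5):
    x_prime = rng.normal(size=5)
    assert l1(x_prime) >= l1(x) + v @ (x_prime - x) - 1e-12
print("subgradient inequality holds at the sampled points")
```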
Establishing Convexity

How to check whether f(x) is a convex function?

• Verify the definition of a convex function directly.
• Check whether ∂²f(x)/∂x² ≥ 0 (for a twice-differentiable function; in the multivariate case, whether the Hessian is positive semidefinite).
• Show that it is constructed from simple convex functions with operations that preserve convexity: nonnegative weighted sum, composition with an affine function, pointwise maximum and supremum, composition, minimization, perspective.

Reference: Boyd and Vandenberghe (2004).

Unconstrained Optimization

Algorithms:

• First-order methods (gradient descent, FISTA, etc.)
• Higher-order methods (Newton's method, ellipsoid, etc.)
• ...

Gradient Descent

Problem: min_x f(x)

Algorithm:
• g_t = ∂f(x_{t−1})/∂x
• x_t = x_{t−1} − η g_t
• Repeat until convergence.

Newton's Method

Problem: min_x f(x); assume f is twice differentiable.

Algorithm:
• g_t = ∂f(x_{t−1})/∂x
• s_t = H⁻¹ g_t, where H is the Hessian.
• x_t = x_{t−1} − η s_t
• Repeat until convergence.

Newton's method is a special case of steepest descent using the Hessian norm.
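To make the two update rules above concrete, here is a minimal Python sketch (an added illustration, not from the slides) that runs gradient descent and Newton's method on a small strongly convex quadratic f(x) = ½xᵀAx − bᵀx, whose gradient is Ax − b and whose Hessian is the constant matrix A. The particular A, b, step size η, and tolerances are arbitrary choices for the demo.

```python
import numpy as np

# A simple strongly convex quadratic: f(x) = 0.5 x^T A x - b^T x,
# with gradient A x - b and constant Hessian A.
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, -2.0])

def grad(x):
    return A @ x - b

def gradient_descent(x0, eta=0.1, tol=1e-8, max_iter=10_000):
    x = x0.copy()
    for _ in range(max_iter):
        g = grad(x)                 # g_t = gradient at the current iterate
        if np.linalg.norm(g) < tol: # "repeat until convergence"
            break
        x = x - eta * g             # x_t = x_{t-1} - eta * g_t
    return x

def newton(x0, eta=1.0, tol=1e-8, max_iter=100):
    x = x0.copy()
    H = A                           # Hessian of the quadratic is constant
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        s = np.linalg.solve(H, g)   # s_t = H^{-1} g_t (solve, don't invert)
        x = x - eta * s             # x_t = x_{t-1} - eta * s_t
    return x

x0 = np.zeros(2)
x_star = np.linalg.solve(A, b)      # exact minimizer, for comparison
print(np.allclose(gradient_descent(x0), x_star, atol=1e-6))  # True
print(np.allclose(newton(x0), x_star, atol=1e-6))            # True
```

On this quadratic, Newton's method with η = 1 reaches the minimizer in a single step, while gradient descent needs many iterations when A is ill-conditioned; this is the usual motivation for second-order methods.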
Duality

Primal problem:

    min_x f(x)
    subject to g_i(x) ≤ 0,  i = 1, …, m
               h_i(x) = 0,  i = 1, …, p

for x ∈ X.

Lagrangian:

    L(x, λ, ν) = f(x) + Σ_{i=1}^m λ_i g_i(x) + Σ_{i=1}^p ν_i h_i(x)

where λ_i and ν_i are Lagrange multipliers. Suppose x is feasible and λ ≥ 0; then we get the lower bound

    f(x) ≥ L(x, λ, ν)    for all feasible x ∈ X and λ ∈ R₊^m.

Primal optimal: p* = min_x max_{λ≥0, ν} L(x, λ, ν)

Lagrange dual function: min_x L(x, λ, ν). This is a concave function of (λ, ν), regardless of whether f(x) is convex or not, and it can be −∞ for some λ and ν.

Lagrange dual problem: max_{λ, ν} min_x L(x, λ, ν) subject to λ ≥ 0.

Dual feasible: λ ≥ 0 and (λ, ν) is in the domain of the dual function (i.e., the dual function is finite there).

Dual optimal: d* = max_{λ≥0, ν} min_x L(x, λ, ν)
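As a small worked example (added here, not from the slides), take the one-dimensional problem min_x x² subject to 1 − x ≤ 0. Its Lagrangian is L(x, λ) = x² + λ(1 − x), the dual function is min_x L(x, λ) = λ − λ²/4, and maximizing over λ ≥ 0 gives d* = 1 = p*, so the lower bound is tight in this case. The Python sketch below evaluates both quantities on coarse grids just to confirm d* ≤ p*; the grid ranges and resolutions are arbitrary.

```python
import numpy as np

def f(x):
    return x ** 2            # objective

def g(x):
    return 1.0 - x           # constraint g(x) <= 0, i.e. x >= 1

def lagrangian(x, lam):
    return f(x) + lam * g(x)

xs = np.linspace(-5.0, 5.0, 2001)       # grid over the primal variable

def dual_fn(lam):
    # Lagrange dual function: min_x L(x, lambda), approximated on the grid.
    return lagrangian(xs, lam).min()

lams = np.linspace(0.0, 5.0, 501)       # grid over the multiplier
d_star = max(dual_fn(lam) for lam in lams)        # approximate dual optimum
p_star = min(f(x) for x in xs if g(x) <= 0)       # approximate primal optimum

print(f"p* ~ {p_star:.4f}, d* ~ {d_star:.4f}")    # both close to 1
assert d_star <= p_star + 1e-6                    # weak duality: d* <= p*
```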
Recommended publications
  • Lecture 7: Convex Analysis and Fenchel-Moreau Theorem
    Lecture 7: Convex Analysis and Fenchel-Moreau Theorem The main tools in mathematical finance are from the theory of stochastic processes because things are random. However, many objects are convex as well, e.g. collections of probability measures or trading strategies, utility functions, risk measures, etc. Convex duality methods often lead to new insight, computational techniques and optimality conditions; for instance, pricing formulas for financial instruments and characterizations of different types of no-arbitrage conditions. Convex sets. Let X be a real topological vector space, and X∗ denote the topological (or algebraic) dual of X. Throughout, we assume that subsets and functionals are proper, i.e., ∅ ≠ C ≠ X and −∞ < f ≢ +∞. Definition 1. Let C ⊂ X. We call C affine if λx + (1 − λ)y ∈ C for all x, y ∈ C, λ ∈ ]−∞, ∞[; convex if λx + (1 − λ)y ∈ C for all x, y ∈ C, λ ∈ [0, 1]; cone if λx ∈ C for all x ∈ C, λ ∈ ]0, ∞[. Recall that a set A is called algebraically open if the sets {t : x + tv ∈ A} are open in R for every x, v ∈ X. In particular, open sets are algebraically open. Theorem 1 (Separating Hyperplane Theorem). Let A, B be convex subsets of X, with A (algebraically) open. Then the following are equivalent: 1. A and B are disjoint. 2. There exists an affine set {x ∈ X : f(x) = c}, f ∈ X∗, c ∈ R, such that A ⊂ {x ∈ X : f(x) < c} and B ⊂ {x ∈ X : f(x) ≥ c}.
  • Conjugate of Some Rational Functions and Convex Envelope of Quadratic Functions Over a Polytope
    Conjugate of some Rational functions and Convex Envelope of Quadratic functions over a Polytope by Deepak Kumar Bachelor of Technology, Indian Institute of Information Technology, Design and Manufacturing, Jabalpur, 2012 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE COLLEGE OF GRADUATE STUDIES (Computer Science) THE UNIVERSITY OF BRITISH COLUMBIA (Okanagan) January 2019 © Deepak Kumar, 2019 The following individuals certify that they have read, and recommend to the College of Graduate Studies for acceptance, a thesis/dissertation entitled: Conjugate of some Rational functions and Convex Envelope of Quadratic functions over a Polytope submitted by Deepak Kumar in partial fulfilment of the requirements of the degree of Master of Science. Dr. Yves Lucet, I. K. Barber School of Arts & Sciences, Supervisor; Dr. Heinz Bauschke, I. K. Barber School of Arts & Sciences, Supervisory Committee Member; Dr. Warren Hare, I. K. Barber School of Arts & Sciences, Supervisory Committee Member; Dr. Homayoun Najjaran, School of Engineering, University Examiner. Abstract Computing the convex envelope or biconjugate is the core operation that bridges the domain of nonconvex analysis with convex analysis. For a bivariate PLQ function defined over a polytope, we start with computing the convex envelope of each piece. This convex envelope is characterized by a polyhedral subdivision such that over each member of the subdivision, it has an implicitly defined rational form (square of a linear function over a linear function). Computing this convex envelope involves solving an exponential number of subproblems which, in turn, leads to an exponential time algorithm.
  • EECS 127/227AT Fall 2020 Homework 5 Homework 5 Is Due on Gradescope by Friday 10/09 at 11.59 P.M
    EECS 127/227AT Fall 2020 Homework 5 Homework 5 is due on Gradescope by Friday 10/09 at 11.59 p.m. 1 A combinatorial problem formulated as a convex optimization problem. In this question we will see how combinatorial optimization problems can also sometimes be solved via related convex optimization problems. We also saw this in a different context in problem 5 on Homework 3 when we related λ₂ to φ(G) for a graph. This problem does not have any connection with the earlier problem in Homework 3. The relevant sections of the textbooks are Sec. 8.3 of the textbook of Calafiore and El Ghaoui and Secs. 4.1 and 4.2 (except 4.2.5) of the textbook of Boyd and Vandenberghe. We have to decide which items to sell among a collection of n items. The items have given selling prices s_i > 0, i = 1, …, n, so selling item i results in revenue s_i. Selling item i also implies a transaction cost c_i > 0. The total transaction cost is the sum of transaction costs of the items sold, plus a fixed amount, which we normalize to 1. We are interested in the following decision problem: decide which items to sell, with the aim of maximizing the margin, i.e. the ratio of the total revenue to the total transaction cost. Note that this is a combinatorial optimization problem, because the set of choices is a discrete set (here it is the set of all subsets of {1, …, n}). In this question we will see that, nevertheless, it is possible to pose this combinatorial optimization problem as a convex optimization problem.
  • Lecture Slides on Convex Analysis And
    LECTURE SLIDES ON CONVEX ANALYSIS AND OPTIMIZATION BASED ON 6.253 CLASS LECTURES AT THE MASS. INSTITUTE OF TECHNOLOGY, CAMBRIDGE, MASS., SPRING 2014, BY DIMITRI P. BERTSEKAS http://web.mit.edu/dimitrib/www/home.html Based on the books 1) “Convex Optimization Theory,” Athena Scientific, 2009 and 2) “Convex Optimization Algorithms,” Athena Scientific, 2014 (in press). Supplementary material (solved exercises, etc.) at http://www.athenasc.com/convexduality.html LECTURE 1: AN INTRODUCTION TO THE COURSE. LECTURE OUTLINE: • The Role of Convexity in Optimization • Duality Theory • Algorithms and Duality • Course Organization. HISTORY AND PREHISTORY: • Prehistory: Early 1900s - 1949: Caratheodory, Minkowski, Steinitz, Farkas; properties of convex sets and functions. • Fenchel - Rockafellar era: 1949 - mid 1980s: duality theory; minimax/game theory (von Neumann); (sub)differentiability, optimality conditions, sensitivity. • Modern era - Paradigm shift: Mid 1980s - present: nonsmooth analysis (a theoretical/esoteric direction); algorithms (a practical/high impact direction); a change in the assumptions underlying the field. OPTIMIZATION PROBLEMS: • Generic form: minimize f(x) subject to x ∈ C, with cost function f : ℜⁿ ↦ ℜ and constraint set C, e.g., C = X ∩ {x | h1(x) = 0, …, hm(x) = 0} ∩ {x | g1(x) ≤ 0, …, gr(x) ≤ 0}. • Continuous vs discrete problem distinction. • Convex programming problems are those for which f and C are convex: they are continuous problems; they are nice, and have beautiful and intuitive structure. However, convexity permeates
  • OPTIMIZATION and VARIATIONAL INEQUALITIES Basic Statements and Constructions
    OPTIMIZATION and VARIATIONAL INEQUALITIES: Basic statements and constructions. Bernd Kummer; [email protected]; May 2011. Abstract. This paper summarizes basic facts in both finite and infinite dimensional optimization and for variational inequalities. In addition, partially new results (concerning methods and stability) are included. They were elaborated in joint work with D. Klatte, Univ. Zürich and J. Heerda, HU-Berlin. Key words. Existence of solutions, (strong) duality, Karush-Kuhn-Tucker points, Kojima-function, generalized equation, (quasi-) variational inequality, multifunctions, Kakutani Theorem, MFCQ, subgradient, subdifferential, conjugate function, vector-optimization, Pareto-optimality, solution methods, penalization, barriers, non-smooth Newton method, perturbed solutions, stability, Ekeland's variational principle, Lyusternik theorem, modified successive approximation, Aubin property, metric regularity, calmness, upper and lower Lipschitz, inverse and implicit functions, generalized Jacobian ∂cf, generalized derivatives TF, CF, D∗F, Clarke's directional derivative, limiting normals and subdifferentials, strict differentiability. Remark. The material is updated continually. It is meant to support lectures on optimization and on variational inequalities (smooth, nonsmooth). It grew out of lecture notes written partly in English and partly in German; therefore there are still passages in both languages, as well as, in places, different notations such as cᵀx and ⟨c, x⟩ for the scalar product and the canonical bilinear form. Hints about errors are welcome. Contents: 1 Introduction; 1.1 The intentions of this script; 1.2 Notations; 1.3 What is (mathematical) optimization?; 1.4 Particular classes of problems; 1.5 Crucial questions; 2 Linear optimization; 2.1 Duality and existence theorem for LOP; 2.2 The simplex method
  • Duality and Auxiliary Functions for Bregman Distances
    Duality and Auxiliary Functions for Bregman Distances Stephen Della Pietra, Vincent Della Pietra, and John Lafferty October 8, 2001 CMU-CS-01-109 School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 Abstract We formulate and prove a convex duality theorem for Bregman distances and present a technique based on auxiliary functions for deriving and proving convergence of iterative algorithms to minimize Bregman distance subject to linear constraints. This research was partially supported by the Advanced Research and Development Activity in Information Technology (ARDA), contract number MDA904-00-C-2106, and by the National Science Foundation (NSF), grant CCR-9805366. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of ARDA, NSF, or the U.S. government. Keywords: Bregman distance, convex duality, Legendre functions, auxiliary functions I. Introduction Convexity plays a central role in a wide variety of machine learning and statistical inference prob- lems. A standard paradigm is to distinguish a preferred member from a set of candidates based upon a convex impurity measure or loss function tailored to the specific problem to be solved. Examples include least squares regression, decision trees, boosting, online learning, maximum like- lihood for exponential models, logistic regression, maximum entropy, and support vector machines. Such problems can often be naturally cast as convex optimization problems involving a Bregman distance, which can lead to new algorithms, analytical tools, and insights derived from the powerful methods of convex analysis. In this paper we formulate and prove a convex duality theorem for minimizing a general class of Bregman distances subject to linear constraints.
  • Convex Integral Functionals 1
    TRANSACTIONS OF THE AMERICAN MATHEMATICAL SOCIETY Volume 349, Number 4, April 1997, Pages 1421–1436 S 0002-9947(97)01478-5 CONVEX INTEGRAL FUNCTIONALS NIKOLAOS S. PAPAGEORGIOU Abstract. We study nonlinear integral functionals determined by normal convex integrands. First we obtain expressions for their convex conjugate, their ε-subdifferential (ε ≥ 0) and their ε-directional derivative. Then we derive a necessary and sufficient condition for the existence of an approximate solution for the continuous infimal convolution. We also obtain general conditions which guarantee the interchangeability of the conditional expectation and subdifferential operators. Finally we examine the conditional expectation of random sets. 1. Introduction. In many areas of applied mathematics, such as control theory, mathematical economics and stochastic nonlinear programming, we have to deal with integral functionals of the form I_φ(x) = ∫_Ω φ(ω, x(ω)) dµ(ω), where (Ω, Σ, µ) is a measure space, φ : Ω × X → R ∪ {+∞} is a measurable integrand, X a Banach space and x(·) belongs to a space of measurable X-valued functions on Ω. Classically only finite Carathéodory-type integrands φ(ω, x) were considered (i.e. ω ↦ φ(ω, x) measurable and x ↦ φ(ω, x) continuous). However in many important applications φ(ω, ·) is discontinuous and it is important to allow φ(·, ·) to take infinite values, since this way we can represent more efficiently several types of constraints. In this situation convexity provides the necessary framework to develop a complete theory of such integral functionals. This line of research was initiated by Rockafellar [21, 22, 23], who introduced the notion of “normal integrand”, which proved to be a very fruitful concept for the study of I_φ(·). Subsequent important contributions were made by Ioffe-Tihomirov [12], Rockafellar [24], Ioffe-Levin [11], Bismut [4], Levin [18], Hiriart-Urruty [9], Castaing-Valadier [6] and Komuro [15].
  • An Application of Maximal Exponential Models to Duality Theory
    entropy Article An Application of Maximal Exponential Models to Duality Theory Marina Santacroce, Paola Siri and Barbara Trivellato * Dipartimento di Scienze Matematiche “G.L. Lagrange”, Dipartimento di Eccellenza 2018-2022, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy; [email protected] (M.S.); [email protected] (P.S.) * Correspondence: [email protected]; Tel.: +39-011-090-7550 Received: 23 May 2018; Accepted: 22 June 2018; Published: 27 June 2018 Abstract: We use maximal exponential models to characterize a suitable polar cone in a mathematical convex optimization framework. A financial application of this result is provided, leading to a duality minimax theorem related to portfolio exponential utility maximization. Keywords: maximal exponential model; Orlicz spaces; convex duality; exponential utility 1. Introduction In this manuscript, we use the notion of maximal exponential model in a convex duality framework and consider an application to portfolio optimization problems. The theory of non-parametric maximal exponential models centered at a given positive density p starts with the work by Pistone and Sempi [1]. In that paper, by using the Orlicz space associated with an exponentially growing Young function, the set of positive densities is endowed with a structure of exponential Banach manifold. More recently, different authors have generalized this structure replacing the exponential function with deformed exponentials (see, e.g., Vigelis and Cavalcante [2], De Andrade et al. [3]) and in other directions (see e.g., Imparato and Trivellato [4]). The geometry of nonparametric exponential models and its analytical properties in the topology of Orlicz spaces have been also studied in subsequent works, such as Cena and Pistone [5] and Santacroce, Siri and Trivellato [6,7], among others.
  • Conjugate Duality
    LECTURE 5: CONJUGATE DUALITY 1. Primal problem and its conjugate dual 2. Duality theory and optimality conditions 3. Relation to other types of dual problems 4. Linear conic programming problems Motivation of conjugate duality Min f(x) (over R) = Max g(y) (over R) and h(y) = −g(y) • f(x) + h(y) = f(x) − g(y) can be viewed as the “duality gap” • Would like to have (i) weak duality f(x) + h(y) ≥ 0 and (ii) strong duality 0 = f(x*) + h(y*) Where is the duality information • Recall the fundamental result of Fenchel's conjugate inequality • Need a structure such that ⟨x, y⟩ ≥ 0 in general and at optimal solutions 0 = ⟨x*, y*⟩ = f(x*) + h(y*) Concept of dual cone • Let X be a cone in • Define its dual cone • Properties: (i) Y is a cone. (ii) Y is a convex set. (iii) Y is a closed set. Observations Conjugate (Geometric) duality Dual side information • Conjugate dual function • Dual cone Properties: 1. Y is a cone in 2. Y is closed and convex 3. both are closed and convex. Conjugate (Geometric) dual problem Observations Conjugate duality theory Proof Conjugate duality theory Example – Standard form LP Conjugate dual problem Dual LP Example – Karmarkar form LP • Conjugate dual problem becomes which is an unconstrained convex programming problem. Illustration Example – Posynomial programming • Nonconvex programming problem • Transformation Posynomial programming • Primal problem: Conjugate dual problem: Dual Posynomial Programming h(y) ≜ sup_{x∈Eⁿ} [⟨x, y⟩ − f(x)] < +∞. Let g(x) = Σ_{j=1}^n x_j y_j − log Σ_{j=1}^n c_j e^{x_j}. Setting ∂g/∂x_j = 0 gives y_j = c_j e^{x_j*} / Σ_{j=1}^n c_j e^{x_j*}, hence y_j > 0 and Σ_{j=1}^n y_j = 1, so Ω is closed.
  • Reinforcement Learning Via Fenchel-Rockafellar Duality
    Reinforcement Learning via Fenchel-Rockafellar Duality Ofir Nachum Bo Dai [email protected] [email protected] Google Research Google Research arXiv:2001.01866v2 [cs.LG] 9 Jan 2020 Abstract We review basic concepts of convex duality, focusing on the very general and supremely useful Fenchel-Rockafellar duality. We summarize how this duality may be applied to a variety of reinforcement learning (RL) settings, including policy evaluation or optimization, online or offline learning, and discounted or undiscounted rewards. The derivations yield a number of intriguing results, including the ability to perform policy evaluation and on-policy policy gradient with behavior-agnostic offline data and methods to learn a policy via max-likelihood optimization. Although many of these results have appeared previously in various forms, we provide a unified treatment and perspective on these results, which we hope will enable researchers to better use and apply the tools of convex duality to make further progress in RL. 1 Introduction Reinforcement learning (RL) aims to learn behavior policies to optimize a long-term decision-making process in an environment. RL is thus relevant to a variety of real-world applications, such as robotics, patient care, and recommendation systems. When tackling problems associated with the RL setting, two main difficulties arise: first, the decision-making problem is inherently sequential, with early decisions made by the policy affecting outcomes both near and far in the future; second, the learner’s knowledge of the environment is only through sampled experience, i.e., previously sampled trajectories of interactions, while the underlying mechanism governing the dynamics of these trajectories is unknown.
  • The Role of Perspective Functions in Convexity, Polyconvexity, Rank-One Convexity and Separate Convexity
    The role of perspective functions in convexity, polyconvexity, rank-one convexity and separate convexity Bernard Dacorogna∗ and Pierre Maréchal† October 4, 2007 Abstract Any finite, separately convex, positively homogeneous function on R² is convex. This was first established in [1]. In this paper, we give a new and concise proof of this result, and we show that it fails in higher dimension. The key of the new proof is the notion of perspective of a convex function f, namely, the function (x, y) → yf(x/y), y > 0. In recent works [9, 10, 11], the perspective has been substantially generalized by considering functions of the form (x, y) → g(y)f(x/g(y)), with suitable assumptions on g. Here, this generalized perspective is shown to be a powerful tool for the analysis of convexity properties of parametrized families of matrix functions. 1 Introduction In [1], Dacorogna established the following theorem: Theorem 1 Let f : R² → R be separately convex and positively homogeneous of degree one. Then f is convex. A rather natural question then arises: does this theorem remain valid in higher dimension? As we will see, the answer is negative. In Section 2 of this paper, we provide a new and concise proof of the above theorem, which uses the notion of perspective in convex analysis. We then ∗Section de Mathématiques, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland. Email: bernard.dacorogna@epfl.ch. †Institut de Mathématiques, Université Paul Sabatier, F-31062 Toulouse cedex 4, France. Email: [email protected].
  • Symbolic Convex Analysis
    SYMBOLIC CONVEX ANALYSIS by Chris Hamilton B.Sc., Okanagan University College, 2001 a thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in the Department of Mathematics © Chris Hamilton 2005 SIMON FRASER UNIVERSITY March 2005 All rights reserved. This work may not be reproduced in whole or in part, by photocopy or other means, without the permission of the author. APPROVAL Name: Chris Hamilton Degree: Master of Science Title of thesis: Symbolic Convex Analysis Examining Committee: Dr. Michael Monagan, Chair; Dr. Jonathan Borwein, Senior Supervisor; Dr. Adrian Lewis, Committee Member; Dr. Rustum Choksi, Committee Member; Dr. Michael McAllister, External Examiner. Date Approved: Abstract Convex optimization is a branch of mathematics dealing with non-linear optimization problems with additional geometric structure. This area has been the focus of considerable research due to the fact that convex optimization problems are scalable and can be efficiently solved by interior-point methods. Additionally, convex optimization problems are much more prevalent than previously thought as existing problems are constantly being recast in a convex framework. Over the last ten years or so, convex optimization has found applications in many new areas including control theory, signal processing, communications and networks, circuit design, data analysis and finance. As with any new problem, of key concern is visualization of the problem space in order to help develop intuition. In this thesis we develop and explore tools for the visualization of convex functions and related objects. We provide symbolic functionality where possible and appropriate, and proceed numerically otherwise. Of critical importance in convex optimization are the operations of Fenchel conjugation and subdifferentiation of convex functions.