Machine Learning Theory Lecture 20: Mirror Descent


Nicholas Harvey
November 21, 2018

In this lecture we will present the Mirror Descent algorithm, which is a common generalization of Gradient Descent and Randomized Weighted Majority. This will require some preliminary results in convex analysis.

1 Conjugate Duality

A good reference for the material in this section is [5, Part E].

Definition 1.1. Let f : ℝⁿ → ℝ ∪ {∞} be a function. Define f* : ℝⁿ → ℝ ∪ {∞} by

    f*(y) = sup_{x ∈ ℝⁿ} ( yᵀx − f(x) ).

This is the convex conjugate (or Legendre–Fenchel transform, or Fenchel dual) of f.

For each linear functional y, the convex conjugate f*(y) gives the greatest amount by which y exceeds the function f. Alternatively, we can think of f*(y) as the downward shift needed for the linear function y to just touch or "support" epi f.

1.1 Examples

Let us consider some simple one-dimensional examples.

Example 1.2. Let f(x) = cx for some c ∈ ℝ. We claim that f* = δ_{c}, i.e.,

    f*(x) = 0 if x = c, and f*(x) = +∞ otherwise.

This is called the indicator function of {c}. Note that f is itself a linear functional that obviously supports epi f, so f*(c) = 0. Any other linear functional x ↦ yx − r cannot support epi f for any r (we have sup_x (yx − cx) = ∞ if y ≠ c), so f*(y) = ∞ if y ≠ c. Note here that a line (f) is mapped to a single point (f*).

Example 1.3. Let f(x) = |x|. We claim that f* = δ_{[−1,1]} (the indicator function of [−1, 1]). For any y ∈ [−1, 1], the linear functional x ↦ yx supports epi f at the point (0, 0), so f*(y) = 0. On the other hand, if y > 1 then the linear functional x ↦ yx − r cannot support epi f for any r (we have sup_x (yx − |x|) = ∞ for y > 1), so f*(y) = ∞. Similarly for y < −1.

Example 1.4. Let f(x) = ½ xᵀx. We claim that f* = f.
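The conjugate in Definition 1.1 can be probed numerically by taking the supremum over a fine grid. The following sketch (not part of the notes; the grid size and tolerances are illustrative choices) checks Examples 1.3 and 1.4 in one dimension.

```python
import numpy as np

def conjugate(f, y, xs):
    # f*(y) = sup_x (y*x - f(x)), approximated by a max over the finite grid xs.
    return np.max(y * xs - f(xs))

xs = np.linspace(-50.0, 50.0, 200001)  # grid spacing 0.0005

# Example 1.3: f(x) = |x| has f* = indicator of [-1, 1].
assert abs(conjugate(np.abs, 0.5, xs)) < 1e-6   # y inside [-1, 1]: f*(y) = 0
assert conjugate(np.abs, 1.5, xs) > 20          # y outside: sup grows with the grid

# Example 1.4: f(x) = x^2 / 2 is self-conjugate, so f*(y) = y^2 / 2.
f = lambda x: 0.5 * x**2
for y in [-2.0, 0.3, 1.7]:
    assert abs(conjugate(f, y, xs) - 0.5 * y**2) < 1e-6
```

On a bounded grid the "+∞" branch of the indicator function shows up as a value that keeps growing as the grid is widened, which is why the second assertion only checks a lower bound.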
We have

    f*(y) = sup_{x ∈ ℝⁿ} ( yᵀx − ½ xᵀx ) ≤ sup_{x ∈ ℝⁿ} ( ‖y‖₂ ‖x‖₂ − ½ ‖x‖₂² ).

This upper bound is maximized when ‖x‖₂ = ‖y‖₂, and the inequality is achieved when x = y. Thus f*(y) = ½ yᵀy = f(y), so f* = f.

Example 1.5 (Negative entropy). Define f : ℝⁿ_{>0} → ℝ by f(x) = Σᵢ xᵢ ln xᵢ. We saw in our earlier lectures on convexity that f is convex. We claim that f*(y) = Σᵢ e^{yᵢ − 1}. By Claim 1.9, proving the result for n = 1 also establishes the general result.

By definition f*(y) = sup_{z > 0} ( yz − z ln z ). The derivative of yz − z ln z is y − ln z − 1. The unique critical point satisfies z = e^{y−1}, and it is a maximizer. Thus f*(y) = y e^{y−1} − e^{y−1}(y − 1) = e^{y−1}.

Example 1.6. Let ‖·‖ be a norm on ℝⁿ and let f(x) = ½ ‖x‖². Then f*(y) = ½ ‖y‖∗², where ‖·‖∗ is the dual norm.

References. [3, Example 3.27].

1.2 Properties

Claim 1.7 (Young–Fenchel inequality). For any x, y ∈ ℝⁿ,

    yᵀx ≤ f(x) + f*(y).

Proof. f*(y) + f(x) = sup_{x′ ∈ ℝⁿ} ( yᵀx′ − f(x′) ) + f(x) ≥ ( yᵀx − f(x) ) + f(x) = yᵀx.

Claim 1.8. f* is closed and convex (regardless of whether f is).

Proof. For each x, define g_x(y) = yᵀx − f(x). Note that g_x is an affine function of y, so g_x is closed and convex. As f* = sup_{x ∈ ℝⁿ} g_x, Lemma 5.8 implies that f* is closed and convex.

Claim 1.9 (Conjugate of a separable function). Let f : ℝᵃ × ℝᵇ → ℝ be defined by f(x₁, x₂) = f₁(x₁) + f₂(x₂). Then f*(y₁, y₂) = f₁*(y₁) + f₂*(y₂).

Proof. Straight from the definitions, we have

    f*(y₁, y₂) = sup_{(z₁,z₂) ∈ ℝᵃ×ℝᵇ} ( (y₁, y₂)ᵀ(z₁, z₂) − f(z₁, z₂) )
               = sup_{(z₁,z₂) ∈ ℝᵃ×ℝᵇ} ( y₁ᵀz₁ + y₂ᵀz₂ − f₁(z₁) − f₂(z₂) )
               = sup_{z₁ ∈ ℝᵃ} ( y₁ᵀz₁ − f₁(z₁) ) + sup_{z₂ ∈ ℝᵇ} ( y₂ᵀz₂ − f₂(z₂) )
               = f₁*(y₁) + f₂*(y₂).

Claim 1.10. Suppose f is a closed, convex function. Then f** = f.

References. [2, Proposition 7.1.1], [3, Exercise 3.39].

The following claim shows that vectors x and y achieving equality in Claim 1.7 are rather special.

Claim 1.11. Suppose that f is closed and convex.
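The closed form in Example 1.5 can also be confirmed numerically by maximizing yz − z ln z over a fine grid of z values; this sketch is illustrative and not part of the notes (the grid bounds are arbitrary, chosen wide enough to contain each maximizer e^{y−1}).

```python
import numpy as np

# Example 1.5 with n = 1: f(z) = z ln z on z > 0, claimed conjugate f*(y) = e^(y-1).
zs = np.linspace(1e-9, 100.0, 2_000_001)  # grid over (0, 100]
f = zs * np.log(zs)

for y in [-1.0, 0.0, 1.0, 2.0]:
    numeric = np.max(y * zs - f)          # sup_{z>0} (y z - z ln z), approximated
    closed_form = np.exp(y - 1)           # maximizer is z = e^(y-1)
    assert abs(numeric - closed_form) < 1e-3
```

For each y tested, the maximizer e^{y−1} lies well inside the grid, so the finite-grid maximum is an accurate approximation of the supremum.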
The following are equivalent:

    y ∈ ∂f(x)                          (1.1a)
    x ∈ ∂f*(y)                         (1.1b)
    ⟨y, x⟩ = f(x) + f*(y)              (1.1c)

References. See [7, Slide 7-15], [5, Part E, Corollary 1.4.4]. In the differentiable setting, (1.1a) ⟺ (1.1c) appears in [3, p. 95].

Proof.

(1.1a) ⟹ (1.1c): Suppose y ∈ ∂f(x). Then f*(y) = sup_u ( ⟨y, u⟩ − f(u) ) = ⟨y, x⟩ − f(x), by the subgradient inequality.

(1.1c) ⟹ (1.1b): For any v ∈ ℝⁿ, we have

    f*(v) = sup_u ( ⟨v, u⟩ − f(u) )
          ≥ ⟨v, x⟩ − f(x)
          = ⟨v − y, x⟩ − f(x) + ⟨x, y⟩
          = ⟨v − y, x⟩ + f*(y),    by (1.1c).

This shows that x ∈ ∂f*(y).

(1.1b) ⟹ (1.1a): Let g = f*. Then g is closed and convex by Claim 1.8. If x ∈ ∂g(y) then y ∈ ∂g*(x), by the already-established implications (1.1a) ⟹ (1.1c) ⟹ (1.1b). But g* = f by Claim 1.10, so this establishes the desired result.

2 Bregman Divergence

A good reference for the material in this section is [8].

Let 𝒳 be a closed convex set, and let f : 𝒳 → ℝ be a continuously-differentiable convex function. The first-order approximation of f at x, expanded around y, is

    f(x) ≈ f(y) + ⟨∇f(y), x − y⟩.

Since f is convex, the subgradient inequality implies that the left-hand side is at least the right-hand side. The amount by which the left-hand side exceeds the right-hand side is the Bregman divergence.

Definition 2.1. The Bregman divergence is defined to be

    D_f(x, y) = f(x) − f(y) − ⟨∇f(y), x − y⟩.

2.1 Examples

Example 2.2. Define f : ℝⁿ → ℝ by f(x) = ‖x‖₂². Then

    D_f(x, y) = f(x) − f(y) − ⟨∇f(y), x − y⟩
              = ‖x‖₂² − ‖y‖₂² − 2⟨y, x − y⟩
              = ‖x‖₂² + ‖y‖₂² − 2⟨y, x⟩
              = ‖x − y‖₂².

Example 2.3 (Negative entropy). Recall that the negative entropy function is f : ℝⁿ_{>0} → ℝ defined by f(x) = Σᵢ xᵢ ln xᵢ. Then

    D_f(x, y) = f(x) − f(y) − ∇f(y)ᵀ(x − y)
              = Σᵢ xᵢ ln xᵢ − Σᵢ yᵢ ln yᵢ − Σᵢ (ln yᵢ + 1)(xᵢ − yᵢ)
              = Σᵢ xᵢ ln xᵢ − Σᵢ xᵢ ln yᵢ − Σᵢ xᵢ + Σᵢ yᵢ
              = Σᵢ xᵢ ln(xᵢ/yᵢ) − Σᵢ xᵢ + Σᵢ yᵢ
              = D_KL(x ‖ y),                              (2.1)

the generalized KL-divergence between x and y, which we introduced in Lecture 16.
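Definition 2.1 and the two worked examples translate directly into code. The following sketch (illustrative, not part of the notes; the random test vectors are arbitrary) checks that the general Bregman formula reproduces the closed forms of Examples 2.2 and 2.3.

```python
import numpy as np

def bregman(f, grad_f, x, y):
    # Definition 2.1: D_f(x, y) = f(x) - f(y) - <grad f(y), x - y>
    return f(x) - f(y) - np.dot(grad_f(y), x - y)

rng = np.random.default_rng(0)
x, y = rng.random(5) + 0.1, rng.random(5) + 0.1   # strictly positive test vectors

# Example 2.2: f(x) = ||x||_2^2 gives D_f(x, y) = ||x - y||_2^2.
sq = lambda v: np.dot(v, v)
assert np.isclose(bregman(sq, lambda v: 2 * v, x, y), sq(x - y))

# Example 2.3: negative entropy gives the generalized KL-divergence (2.1).
negent = lambda v: np.sum(v * np.log(v))
grad_negent = lambda v: np.log(v) + 1
kl = np.sum(x * np.log(x / y)) - x.sum() + y.sum()
assert np.isclose(bregman(negent, grad_negent, x, y), kl)
```

Note that neither x nor y needs to sum to 1 here; that is exactly the "generalized" KL-divergence of (2.1).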
In the case that Σᵢ xᵢ = Σᵢ yᵢ = 1, this is the ordinary KL divergence (or "relative entropy") between x and y.

Negative entropy will be particularly important to us, so we prove one property of it now.

Claim 2.4. Negative entropy is 1-strongly convex with respect to the ℓ₁ norm.

To prove this, we require the following theorem.

Theorem 2.5 (Pinsker's inequality). For any distributions p, q, we have D_KL(p ‖ q) ≥ ½ ‖p − q‖₁².

References. Wikipedia; lecture notes of Sanjeev Khudanpur.

Proof (of Claim 2.4). As in Example 2.3, let f(x) = Σᵢ xᵢ ln xᵢ. Then

    f(y) = f(x) + ⟨∇f(x), y − x⟩ + D_KL(y ‖ x)        (by (2.1))
         ≥ f(x) + ⟨∇f(x), y − x⟩ + ½ ‖x − y‖₁²        (by Theorem 2.5).

There are also some interesting examples involving matrices.

Example 2.6. Let f(X) = tr(X log X). Then D_f(X, Y) = tr(X log X − X log Y − X + Y). This is called the von Neumann divergence, or quantum relative entropy.

Example 2.7. Let f(X) = − log det X. Then D_f(X, Y) = tr(XY⁻¹ − I) − log det(XY⁻¹). This is called the log-det divergence.

2.2 Properties

Claim 2.8. D_f(x, y) is convex in x.

Proof. This is immediate from the definition, since f(x) is convex in x and −f(y) − ⟨∇f(y), x − y⟩ is affine in x.

Note. D_f(x, y) is not generally convex in y. Consider the case f(x) = exp(x) and x = 4, so that D_f(4, y) = e⁴ − eʸ − eʸ(4 − y). Then

    D_f(4, 0) = e⁴ − 5  < 50
    D_f(4, 1) = e⁴ − 4e > 43
    D_f(4, 2) = e⁴ − 3e² < 33

As D_f(4, 1) > ( D_f(4, 0) + D_f(4, 2) ) / 2, D_f(x, y) is not convex in y.

Lemma 2.9. Let f be closed, convex and differentiable. Fix any x, y ∈ 𝒳. Define x̂ = ∇f(x) and ŷ = ∇f(y). Then

    ∇f*(x̂) = x                        (2.2)
    D_f(x, y) = D_{f*}(ŷ, x̂)          (2.3)

References.
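The counterexample in the Note, and the consequence of Lemma 2.9 for a self-conjugate function, can both be checked numerically. This sketch is illustrative only (the test points are arbitrary choices, not from the notes).

```python
import numpy as np

# Note after Claim 2.8: for f(x) = exp(x), D_f(x, y) = e^x - e^y - e^y (x - y).
d_exp = lambda x, y: np.exp(x) - np.exp(y) - np.exp(y) * (x - y)

# Convexity in y would force D_f(4,1) <= (D_f(4,0) + D_f(4,2)) / 2; it fails.
assert d_exp(4, 1) > (d_exp(4, 0) + d_exp(4, 2)) / 2

# Lemma 2.9, equation (2.3), for f(x) = x^2 / 2: f* = f and grad f = identity,
# so x_hat = x, y_hat = y, and D_f(x, y) = D_{f*}(y_hat, x_hat) = D_f(y, x),
# i.e. the divergence is symmetric for this particular f.
d_sq = lambda x, y: 0.5 * x**2 - 0.5 * y**2 - y * (x - y)
assert np.isclose(d_sq(3.0, 1.0), d_sq(1.0, 3.0))
```

The symmetry in the second check is special to the squared norm; for negative entropy, (2.3) swaps the arguments precisely because the KL divergence is not symmetric.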