UNIVERSITY of CALIFORNIA, SAN DIEGO Operator
Total Page:16
File Type:pdf, Size:1020Kb
UNIVERSITY OF CALIFORNIA, SAN DIEGO Operator Theory for Analysis of Convex Optimization Methods in Machine Learning A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Cognitive Science by Patrick W. Gallagher Committee in charge: Professor Virginia de Sa, Chair Professor Philip Gill, Co-Chair Professor Jeffrey Elman Professor James Hollan Professor Lawrence Saul Professor Zhuowen Tu 2014 Copyright Patrick W. Gallagher, 2014 All rights reserved. The dissertation of Patrick W. Gallagher is approved, and it is acceptable in quality and form for publication on microfilm and electronically: Co-Chair Chair University of California, San Diego 2014 iii EPIGRAPH ...nothing at all takes place in the universe in which some rule of maximum or minimum does not appear. — Leonhard Euler ...between two truths of the real domain, the easiest and shortest path quite often passes through the complex domain. — Paul Painlevé You can see everything with one eye, but looking with two eyes is more convenient. — Jan Brinkhuis and Vladimir Tikhomirov iv TABLE OF CONTENTS Signature Page . iii Epigraph . iv Table of Contents . v List of Figures . xi List of Tables . xiv Acknowledgements . xv Vita . xvi Abstract of the Dissertation . xvii Chapter 1 Introduction . 1 1.1 Introduction . 1 1.2 Context . 2 1.2.1 Convexity, strong smoothness, and strong convexity . 3 1.2.2 Assumptions and convergence rates from a standard perspective and a more general perspective . 15 1.2.3 Overview of what follows . 17 Chapter 2 Related Literature . 19 Chapter 3 Sets and convexity . 24 3.1 Introduction . 24 3.2 Basic terminology and definitions . 24 3.2.1 Affine combinations (lines), segments (line segments), and rays . 25 3.2.2 Combinations . 26 3.2.3 Set terminology . 28 3.2.4 Hulls . 29 3.3 Operations involving sets . 32 3.3.1 Scaling of a set . 32 3.3.2 (Minkowski) sum of two sets . 32 3.4 Frequently relevant sets . 33 3.4.1 Affine hyperplane . 33 3.4.2 Norm ball . 34 3.4.3 Norm cones . 34 3.4.4 Polyhedron . 35 v 3.4.5 Simplex, unit simplex, probability simplex . 35 Chapter 4 Functions and convexity . 37 4.1 Introduction . 37 4.2 Extended-real-valued functions . 37 4.3 Sublevel sets, level sets, superlevel sets . 40 4.4 Graph, epigraph, hypograph . 42 4.5 Minorant, majorant, supporting minorant, supported majorant 45 4.6 Affine, linear, and convex functions . 47 Chapter 5 Strong smoothness and strong convexity . 50 5.1 Introduction . 50 5.2 Strong smoothness with parameter L . 51 5.3 Strong convexity with parameter m . 52 5.4 Both strong smoothness and strong convexity . 55 Chapter 6 Sets associated with other sets . 57 6.1 Introduction . 57 6.1.1 Hulls: linear, affine, convex, convex conic . 58 6.1.2 The polar set of a generic nonempty set . 58 6.2 Cones associated with any set . 59 6.2.1 The polar cone of a generic set . 59 6.2.2 The dual cone of a generic set . 60 6.3 Cones associated with a set and some point in that set . 61 6.3.1 The cone of directions that are “feasible” with re- spect to the set S at the point x# 2 S: . 62 6.3.2 The cone of directions that are “tangent” with re- spect to the set S at the point x# 2 S: . 63 6.3.3 The cone of vectors that are “normal” with respect to the set S at the point x# 2 S: . 66 Chapter 7 Functions associated with sets . 69 7.1 Introduction . 69 7.2 Indicator function of an arbitrary set . 69 7.3 Support function of an arbitrary set . 71 7.4 Distance function associated with a convex set (and a generic norm) . 72 7.5 Gauge function associated with a convex set containing the origin . 73 Chapter 8 Conjugacy in convex analysis . 76 8.1 Introduction . 76 8.2 Conjugate function definition via the support function of the epigraph of a function . 77 vi 8.3 Conjugate function definition via supporting affine minorants 79 8.4 Basic Properties of the Conjugate . 86 8.4.1 Fenchel-Young inequality . 86 8.4.2 The case of equality in the Fenchel-Young inequality 86 8.4.3 Basic transformation results for the conjugate . 91 Chapter 9 Convex analysis results: preliminaries . 95 9.1 Introduction . 95 9.2 Support function examples . 95 9.2.1 Support function of the empty set . 95 n 9.2.2 Support function of R . 96 9.2.3 Support function of a singleton set . 96 9.2.4 Support function of a linear subspace . 96 9.2.5 Support function of an arbitrary cone . 97 9.2.6 Support function of the polar set of a convex set that contains the origin . 97 9.2.7 Support function of a unit norm ball . 98 9.2.8 Support function of the epigraph of an arbitrary func- tion . 98 Chapter 10 Convex analysis results on Fenchel conjugates . 99 10.1 Introduction . 99 10.2 Fenchel conjugates . 100 10.2.1 Conjugate of a linear function . 100 10.2.2 Conjugate of an affine function . 100 10.2.3 Conjugate of absolute value . 101 1 2 10.2.4 Conjugate of c · 2 x for strictly positive c . 103 10.2.5 Conjugate of the indicator function of a generic set . 103 10.2.6 Conjugate of the indicator function of a closed con- vex set . 105 10.2.7 Conjugate of a positively homogeneous function . 105 10.2.8 Conjugate of the indicator function of a generic cone 107 10.2.9 Conjugate of the indicator function of a convex cone 107 10.2.10 Conjugate of the indicator function of a generic norm unit ball . 109 10.2.11 Conjugate of the support function of a generic set . 110 10.2.12 Conjugate of the support function of a closed convex set . 111 10.2.13 Conjugate of the support function of a norm unit ball 112 10.2.14 Conjugate of the support function of a convex cone . 113 10.2.15 Conjugate of a generic norm . 113 10.2.16 Conjugate of (one-half) generic norm squared . 116 vii 10.2.17 Conjugate of the generic k·k-norm measured dis- tance from a point x# to a closed convex set C . 119 10.2.18 Conjugate of (one-half) squared generic norm dis- tance function of a point from a set . 121 Chapter 11 Optimization theory . 123 11.1 Introduction . 123 11.2 Equality constraints only . 124 11.2.1 Problem: Primal optimization problem . 124 11.2.2 Lagrangian function L(·;·) . 125 11.2.3 The Lagrange dual function g(·) . 125 11.2.4 Problem: Evaluate the Lagrange dual function for some l$ . 126 11.2.5 Problem: Optimize the Lagrange dual function over p∗ all l 2 R . 127 11.2.6 Relations involving one or more of the three prob- lems above . 128 Chapter 12 Optimization theory with modified functions . 135 12.1 Introduction . 135 12.2 Equality constraints only . 136 12.2.1 Problem: Primal optimization problem . 136 12.2.2 Introducing modified functions . 136 12.2.3 Introducing modified problems . 139 12.2.4 Observations on the modified problems . 140 Chapter 13 Operator theory basics . 146 13.1 Introduction . 146 13.2 Definitions . 147 13.3 Relationships . 153 13.3.1 Polarization identity . 153 13.3.2 Another identity . 154 Chapter 14 Contractivity-type properties and monotonicity-type properties . 155 14.1 Contractivity-type properties . 155 14.2 Monotonicity-type properties . 163 14.3 Contractivity-type properties with respect to a fixed point . 168 14.4 Monotonicity-type properties with respect to a zero point of the operator . 174 14.5 Contractivity-type properties with respect to a fixed point set 178 14.6 Monotonicity-type properties with respect to a zero point set 184 viii Chapter 15 Relationships between “contractivity” properties of T and “mono- tonicity” properties of G = I − T . 188 15.1 Introduction . 188 15.2 Starting from contractivity-type properties . 189 15.3 Starting from monotonicity-type properties . 197 Chapter 16 Operator scaling, transformation, combination, and composition . 210 16.1 Introduction . 210 16.2 Monotonicity properties under scaling . 210 16.2.1 Scaling an operator G that is strongly monotone . 210 16.2.2 Scaling an operator G that is inverse strongly mono- tone . 212 16.2.3 Scaling an operator G that is both strongly monotone and inverse strongly monotone . 214 16.2.4 Ray from the identity through an operator T . 215 16.3 Operator addition with the identity . 222 16.4 Operator addition . 224 16.5 Operator affine combination . 225 16.6 Operator convex combination . 226 16.6.1 Operator convex combination: monotonicity perspec- tive . ..