Adil Bagirov • Napsu Karmitsa • Marko M. Mäkelä

Introduction to Nonsmooth Optimization: Theory, Practice and Software

Adil Bagirov
School of Information Technology and Mathematical Sciences,
Centre for Informatics and Applied Optimization
University of Ballarat
Ballarat, VIC, Australia

Napsu Karmitsa • Marko M. Mäkelä
Department of Mathematics and Statistics
University of Turku
Turku, Finland

ISBN 978-3-319-08113-7 ISBN 978-3-319-08114-4 (eBook) DOI 10.1007/978-3-319-08114-4

Library of Congress Control Number: 2014943114


© Springer International Publishing Switzerland 2014


Preface

Nonsmooth optimization refers to the general problem of minimizing (or maximizing) functions that are typically not differentiable at their minimizers (maximizers). Such functions arise in many applied fields, for example in image denoising, optimal control, neural network training, data mining, economics, and computational chemistry and physics. Since the classical theory of optimization presumes certain differentiability and strong regularity assumptions for the functions to be optimized, it cannot be directly utilized. The aim of this book is to provide an easy-to-read introduction to the theory of nonsmooth optimization and to present the current state of numerical nonsmooth optimization. In addition, the most common cases where nonsmoothness is involved in practical computations are introduced. In preparing this book, all efforts have been made to ensure that it is self-contained.

The book is organized into three parts. Part I deals with nonsmooth optimization theory. We first provide an easy-to-read introduction to convex and nonconvex analysis, with many numerical examples and illustrative figures. Then we discuss nonsmooth optimality conditions from both analytical and geometrical viewpoints. We also generalize the concept of convexity for nonsmooth functions. At the end of the part, we give brief surveys of different generalizations of subdifferentials and of approximations to subdifferentials.

In Part II, we consider nonsmooth optimization problems. First, we introduce some real-life nonsmooth optimization problems, for instance, the molecular distance geometry problem, protein structural alignment, data mining, hemivariational inequalities, the power unit-commitment problem, image restoration, and the nonlinear income tax problem. Then we discuss some formulations that lead to nonsmooth optimization problems even though the original problem is smooth (continuously differentiable); examples here include exact penalty formulations. We also present the maximum eigenvalue problem, which is an important component of many engineering design problems and graph-theoretical applications. We refer to these problems as semi-academic problems. Finally, a comprehensive list of the test problems, that is, academic problems, used in nonsmooth optimization is given.


Part III is a guide to nonsmooth optimization software. First, we give short descriptions and pseudo-codes of the most commonly used methods for nonsmooth optimization. These include different subgradient methods, cutting plane methods, bundle methods, and the gradient sampling method, as well as some hybrid methods and discrete gradient methods. In addition, we introduce some common ways of dealing with constrained nonsmooth optimization problems. We also compare implementations of different nonsmooth optimization methods for solving unconstrained problems. At the end of the part, we provide a table enabling the quick selection of suitable software for different types of nonsmooth optimization problems.

The book is ideal for anyone teaching or attending courses in nonsmooth optimization. As a comprehensible introduction to the field, it is also well suited for self-study by practitioners who know the basics of optimization. Furthermore, it can serve as a reference text for anyone, including experts, dealing with nonsmooth optimization.

Acknowledgments: First of all, we would like to thank Prof. Herskovits for giving us the reason to write a book on nonsmooth analysis and optimization: he once asked why the subject is treated so elusively in the books and articles dealing with it, and pointed out the lack of an extensive elementary book. In addition, we would like to acknowledge Profs. Kuntsevich and Kappel for providing Shor's r-algorithm on their web site, as well as Profs. Lukšan and Vlček for providing the bundle-Newton algorithm. We are also grateful to the following colleagues and students, all of whom have influenced the content of the book: Annabella Astorino, Ville-Pekka Eronen, Antonio Fuduli, Manlio Gaudioso, Kaisa Joki, Sami Kankaanpää, Refail Kasimbeyli, Yury Nikulin, Gurkan Ozturk, Rami Rakkolainen, Julien Ugon, Dean Webb, and Outi Wilppu. The work was financially supported by the University of Turku (Finland), the Magnus Ehrnrooth Foundation, the Turku University Foundation, Federation University Australia, and the Australian Research Council.

Ballarat and Turku, April 2014
Adil Bagirov
Napsu Karmitsa
Marko M. Mäkelä

Contents

Part I Nonsmooth Analysis and Optimization

1 Theoretical Background
  1.1 Notations and Definitions
  1.2 Matrix Calculus
  1.3 Hausdorff Metrics
  1.4 Functions and Derivatives

2 Convex Analysis
  2.1 Convex Sets
    2.1.1 Convex Hulls
    2.1.2 Separating and Supporting Hyperplanes
    2.1.3 Convex Cones
    2.1.4 Contingent and Normal Cones
  2.2 Convex Functions
    2.2.1 Level Sets and Epigraphs
    2.2.2 Subgradients and Directional Derivatives
    2.2.3 ε-Subdifferentials
  2.3 Links Between Geometry and Analysis
    2.3.1 Epigraphs
    2.3.2 Level Sets
    2.3.3 Distance Function
  2.4 Summary
  Exercises

3 Nonconvex Analysis
  3.1 Generalization of Derivatives
    3.1.1 Generalized Directional Derivative
    3.1.2 Generalized Subgradients
    3.1.3 ε-Subdifferentials
    3.1.4 Generalized Jacobians


  3.2 Subdifferential Calculus
    3.2.1 Subdifferential Regularity
    3.2.2 Subderivation Rules
  3.3 Nonconvex Geometry
    3.3.1 Tangent and Normal Cones
    3.3.2 Epigraphs and Level Sets
    3.3.3 Cones of Feasible Directions
  3.4 Other Generalized Subdifferentials
    3.4.1 Quasidifferentials
    3.4.2 Relationship Between Quasidifferential and Clarke Subdifferential
    3.4.3 Codifferentials
    3.4.4 Basic and Singular Subdifferentials
  3.5 Summary
  Exercises

4 Optimality Conditions
  4.1 Unconstrained Optimization
    4.1.1 Analytical Optimality Conditions
    4.1.2 Descent Directions
  4.2 Geometrical Constraints
    4.2.1 Geometrical Optimality Conditions
    4.2.2 Mixed Optimality Conditions
  4.3 Analytical Constraints
    4.3.1 Geometrical Optimality Conditions
    4.3.2 Fritz John Optimality Conditions
    4.3.3 Karush–Kuhn–Tucker Optimality Conditions
  4.4 Optimality Conditions Using Quasidifferentials
  4.5 Summary
  Exercises

5 Generalized Convexities
  5.1 Generalized Pseudoconvexity
  5.2 Generalized Quasiconvexity
  5.3 Relaxed Optimality Conditions
    5.3.1 Unconstrained Optimization
    5.3.2 Geometrical Constraints
    5.3.3 Analytical Constraints
  5.4 Summary
  Exercises

6 Approximations of Subdifferentials
  6.1 Continuous Approximations of Subdifferential
  6.2 Discrete Gradient and Approximation of Subgradients

  6.3 Piecewise Partially Separable Functions and Computation of Discrete Gradients
    6.3.1 Piecewise Partially Separable Functions
    6.3.2 Chained and Piecewise Chained Functions
    6.3.3 Properties of Piecewise Partially Separable Functions
    6.3.4 Calculation of the Discrete Gradients
  6.4 Summary
  Exercises

Notes and References

Part II Nonsmooth Problems

7 Practical Problems
  7.1 Computational Chemistry and Biology
    7.1.1 Polyatomic Clustering Problem
    7.1.2 Molecular Distance Geometry Problem
    7.1.3 Protein Structural Alignment
    7.1.4 Molecular Docking
  7.2 Data Analysis
    7.2.1 Cluster Analysis via NSO
    7.2.2 Piecewise Linear Separability in Supervised Data Classification
    7.2.3 Piecewise Linear Approximations in Regression Analysis
    7.2.4 Clusterwise Linear Regression Problems
  7.3 Optimal Control Problems
    7.3.1 Optimal Shape Design
    7.3.2 Distributed Parameter Control Problems
    7.3.3 Hemivariational Inequalities
  7.4 Engineering and Industrial Applications
    7.4.1 Power Unit-Commitment Problem
    7.4.2 Continuous Casting of Steel
  7.5 Other Applications
    7.5.1 Image Restoration
    7.5.2 Nonlinear Income Tax Problem

8 Semi-Academic Problems
  8.1 Exact Penalty Formulation
  8.2 Integer Programming with Lagrange Relaxation
    8.2.1 Traveling Salesman Problem
  8.3 Maximum Eigenvalue Problem

9 Academic Problems
  9.1 Small Unconstrained Problems
  9.2 Bound Constrained Problems
  9.3 Linearly Constrained Problems
  9.4 Large Problems
  9.5 Inequality Constrained Problems

Notes and References

Part III Nonsmooth Optimization Methods

10 Subgradient Methods
  10.1 Standard Subgradient Method
  10.2 Shor's r-Algorithm (Space Dilation Method)

11 Cutting Plane Methods
  11.1 Standard Cutting Plane Method
  11.2 Cutting Plane Method with Proximity Control

12 Bundle Methods
  12.1 Proximal Bundle and Bundle Trust Methods
  12.2 Bundle Newton Method

13 Gradient Sampling Methods
  13.1 Gradient Sampling Method

14 Hybrid Methods
  14.1 Variable Metric Bundle Method
  14.2 Limited Memory Bundle Method
  14.3 Quasi-Secant Method
  14.4 Non-Euclidean Restricted Memory Level Method

15 Discrete Gradient Methods
  15.1 Discrete Gradient Method
  15.2 Limited Memory Discrete Gradient Bundle Method

16 Constraint Handling
  16.1 Exact Penalty
  16.2 Linearization

17 Numerical Comparison of NSO Softwares
  17.1 Solvers
  17.2 Problems

  17.3 Termination, Parameters, and Acceptance of Results
  17.4 Results
    17.4.1 Extra-Small Problems
    17.4.2 Small-Scale Problems
    17.4.3 Medium-Scale Problems
    17.4.4 Large Problems
    17.4.5 Extra-Large Problems
    17.4.6 Convergence Speed and Iteration Path
  17.5 Conclusions

References

Index

Acronyms and Symbols

$\mathbb{R}^n$   $n$-dimensional Euclidean space
$\mathbb{N}$   Set of natural numbers
$x, y, z$   (Column) vectors
$x^T$   Transposed vector
$x^T y$   Inner product of $x$ and $y$
$\|x\|$   Norm of $x$ in $\mathbb{R}^n$, $\|x\| = (x^T x)^{1/2}$
$x_i$   $i$th component of vector $x$
$(x_k)$   Sequence of vectors
$\mathbf{0}$   Zero vector
$a, b, c, \alpha, \varepsilon, \lambda$   Scalars
$t \downarrow 0$   $t \to 0^+$
$A, B$   Matrices

$(A)_{ij}$   Element of matrix $A$ in row $i$ and column $j$
$A^T$   Transposed matrix
$A^{-1}$   Inverse of matrix $A$
$\operatorname{tr} A$   Trace of matrix $A$
$\|A\|_{mn}$   Matrix norm $\|A\|_{mn} = \left( \sum_{i=1}^{m} \|A_i\|^2 \right)^{1/2}$
$I$   Identity matrix
$e_i$   $i$th column of the identity matrix
$\operatorname{diag}[\theta_1, \ldots, \theta_n]$   Diagonal matrix with diagonal elements $\theta_1, \ldots, \theta_n$
$B(x; r)$   Open ball with radius $r$ and central point $x$
$\bar{B}(x; r)$   Closed ball with radius $r$ and central point $x$
$S_1$   Sphere of the unit ball
$(a, b)$   Open interval
$[a, b]$   Closed interval
$[a, b), (a, b]$   Half-open intervals
$H(p, \alpha)$   Hyperplane
$H^+(p, \alpha), H^-(p, \alpha)$   Halfspaces
$S, U$   Sets

cl $S$   Closure of set $S$
int $S$   Interior of set $S$
bd $S$   Boundary of set $S$
$\mathcal{P}(S)$   Power set of $S$
$\bigcap_{i=1}^{m} S_i$   Intersection of sets $S_i$, $i = 1, \ldots, m$
$S \mathbin{\dot{-}} U$   Demyanov difference of sets $S$ and $U$
conv $S$   Convex hull of set $S$
cone $S$   Conic hull of set $S$
ray $S$   Ray of the set $S$
$S^{\circ}$   Polar cone of the set $S$
$K_S(x)$   Contingent cone of set $S$ at $x$
$T_S(x)$   Tangent cone of set $S$ at $x$
$N_S(x)$   Normal cone of set $S$ at $x$
$G_S(x)$   Cone of globally feasible directions of set $S$ at $x$
$F_S(x)$   Cone of locally feasible directions of set $S$ at $x$
$D_S(x)$   Cone of descent directions at $x \in S$
$D_S^{\circ}(x)$   Cone of polar subgradient directions at $x \in S$
$F_S^{\circ}(x)$   Cone of polar constraint subgradient directions at $x \in S$
$\operatorname{lev}_{\alpha} f$   Level set of $f$ with parameter $\alpha$
epi $f$   Epigraph of $f$
$\mathcal{I}, \mathcal{J}, \mathcal{K}$   Sets of indices
$|\mathcal{I}|$   Number of elements in set $\mathcal{I}$
$f(x)$   Objective function value at $x$
$\arg\min f(x)$   Point where function $f$ attains its minimum value
$\nabla f(x)$   Gradient of function $f$ at $x$
$\partial f(x) / \partial x_i$   Partial derivative of function $f$ with respect to $x_i$
$\nabla^2 f(x)$   Hessian matrix of function $f$ at $x$
$\partial^2 f(x) / \partial x_i \partial x_j$   Second partial derivative of function $f$ with respect to $x_i$ and $x_j$
$C^m(\mathbb{R}^n)$   The space of functions $f: \mathbb{R}^n \to \mathbb{R}$ with continuous partial derivatives up to order $m$
$L(\mathbb{R}^n, \mathbb{R})$   The space of linear mappings from $\mathbb{R}^n$ to $\mathbb{R}$
$D_k$   (Generalized) variable metric approximation of the inverse of the Hessian matrix
$f'(x; d)$   Directional derivative of function $f$ at $x$ in the direction $d$
$f'_{\varepsilon}(x; d)$   $\varepsilon$-directional derivative of function $f$ at $x$ in the direction $d$
$f^{\circ}(x; d)$   Generalized directional derivative of function $f$ at $x$ in the direction $d$
$d_H(A, B)$   Hausdorff distance between sets $A$ and $B$
$d_S(x)$   Distance function (distance of $x$ to the set $S$)
$d(x, y)$   Distance function (distance between $x$ and $y$)
$\partial_c f(x)$   Subdifferential of convex function $f$ at $x$
$\partial f(x)$   Subdifferential of function $f$ at $x$

$\xi \in \partial f(x)$   Subgradient of function $f$ at $x$
$\partial_{\varepsilon} f(x)$   $\varepsilon$-subdifferential of convex function $f$ at $x$
$\partial_{\varepsilon}^{G} f(x)$   Goldstein $\varepsilon$-subdifferential of function $f$ at $x$
$\underline{\partial} f(x)$   Subdifferential of quasidifferentiable function $f$ at $x$
$\overline{\partial} f(x)$   Superdifferential of quasidifferentiable function $f$ at $x$
$Df(x) = [\underline{\partial} f(x), \overline{\partial} f(x)]$   Quasidifferential of function $f$ at $x$
$\underline{d} f(x)$   Hypodifferential of codifferentiable function $f$ at $x$
$\overline{d} f(x)$   Hyperdifferential of codifferentiable function $f$ at $x$
$Df(x) = [\underline{d} f(x), \overline{d} f(x)]$   Codifferential of function $f$ at $x$
$\partial_b f(x)$   Basic (limiting) subdifferential of $f$ at $x$
$\partial_{\infty} f(x)$   Singular subdifferential of $f$ at $x$
$v = \Gamma(x, g, e, z, \zeta, \alpha)$   Discrete gradient of function $f$ at $x$ in the direction $g$
$D_0(x, \lambda)$   Set of discrete gradients
$v(x, g, h)$   Quasi-secant of function $f$ at $x$
$\operatorname{QSec}(x; h)$   Set of quasi-secants
$\operatorname{QSL}(x)$   Set of limit points of quasi-secants as $h \downarrow 0$
$\mathcal{P}$   Set of univariate positive infinitesimal functions
$\mathcal{G}$   Set of all vertices of the unit hypercube in $\mathbb{R}^n$
$\Omega_f$   Set in $\mathbb{R}^n$ where function $f$ is not differentiable

$\hat{f}_k(x)$   Piecewise linear cutting plane model of function $f$ at $x$
$\tilde{f}_k(x)$   Piecewise quadratic model of function $f$ at $x$
$\nabla h(x)$   Jacobian matrix of function $h: \mathbb{R}^n \to \mathbb{R}^m$ at $x$
$\partial h(x)$   Generalized Jacobian matrix of function $h: \mathbb{R}^n \to \mathbb{R}^m$ at $x$
$A(x)$   Real symmetric matrix-valued affine function of $x$
$\lambda_i(A(x))$   $i$th eigenvalue of $A(x)$
$\lambda_{\max}(A(x))$   Eigenvalue of $A(x)$ with the largest absolute value
max   Maximum
min   Minimum
sup   Supremum
inf   Infimum
$\operatorname{div}(i, j)$   Integer division for positive integers $i$ and $j$
$\operatorname{mod}(i, j)$   Remainder after integer division, $\operatorname{mod}(i, j) = j(i/j - \operatorname{div}(i, j))$
ln   Natural logarithm
DC   Difference of convex functions
FJ   Fritz John optimality conditions
KKT   Karush–Kuhn–Tucker optimality conditions
LOVO   Low order value optimization
MDGP   Molecular distance geometry problem
MINLP   Mixed integer nonlinear programming
NC   Nonconstancy
NSO   Nonsmooth optimization
PLP   Piecewise linear potential
LC, LNC   Large-scale convex and nonconvex problems, $n = 1000$
MC, MNC   Medium-scale convex and nonconvex problems, $n = 200$

SC, SNC   Small-scale convex and nonconvex problems, $n = 50$
XLC, XLNC   Extra-large convex and nonconvex problems, $n = 4000$
XSC, XSNC   Extra-small convex and nonconvex problems, $n \le 20$
BNEW   Bundle–Newton method
BT   Bundle trust method
CP   (Standard) cutting plane method
CPPC   Cutting plane method with proximity control
DGM   Discrete gradient method
GS   Gradient sampling method
LMBM   Limited memory bundle method
LDGB   Limited memory discrete gradient bundle method
NERML   Non-Euclidean restricted memory level method
PBM   Proximal bundle method
QSM   Quasi-secant method
VMBM   Variable metric bundle method

Introduction

Nonsmooth optimization is among the most difficult tasks in optimization. It deals with optimization problems in which the objective and/or constraint functions have discontinuous gradients. Nonsmooth optimization dates back to the early 1960s, when the concept of the subdifferential was introduced by R.T. Rockafellar and W. Fenchel, and the first nonsmooth optimization method, the subgradient method, was developed by N. Shor, Y. Ermolyev, and their colleagues in Kiev, Ukraine (then part of the Soviet Union). In the 1960s and early 1970s, nonsmooth optimization was mainly applied to solve minimax problems and large linear problems using decomposition; such problems can also be solved by other optimization techniques. The most important developments in nonsmooth optimization started with the introduction of bundle methods in the mid-1970s by C. Lemaréchal (and also by P. Wolfe and R. Mifflin). In its original form, the bundle method was introduced to solve nonsmooth convex problems. The 1970s and early 1980s were also an important period for new developments in nonsmooth analysis: various generalizations of the subdifferential were introduced, including the Clarke subdifferential and the Demyanov–Rubinov quasidifferential. The Clarke subdifferential allowed the extension of bundle methods to nonconvex nonsmooth optimization problems.

Since the early 1990s, nonsmooth optimization has been widely applied to many practical problems, for example in mechanics, economics, computational chemistry, engineering, machine learning, and data mining. In most of these applications, nonsmooth optimization approaches allow a significant reduction in the number of decision variables compared with other approaches, and thus facilitate the design of efficient algorithms; in these applications, optimization problems cannot be solved by other optimization techniques as efficiently as by nonsmooth optimization techniques. Undoubtedly, nonsmooth optimization has now become an indispensable tool for solving problems in diverse fields.
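A minimal illustration of the kind of nondifferentiability involved (the standard textbook example, not a formulation taken from this book): the absolute value function

\[
f(x) = |x|, \qquad x \in \mathbb{R},
\]

attains its minimum at $x = 0$, precisely the point where its gradient fails to exist. Classical first-order optimality conditions are therefore inapplicable, whereas the convex subdifferential

\[
\partial_c f(0) = \{\, \xi \in \mathbb{R} : f(x) \ge f(0) + \xi x \ \text{for all } x \in \mathbb{R} \,\} = [-1, 1]
\]

remains well defined, and the condition $0 \in \partial_c f(0)$ certifies the minimum.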

Nonsmoothness appears in the modeling of many practical problems in a very natural way. The sources of nonsmoothness can be divided into four classes: inherent, technological, methodological, and numerical nonsmoothness. In inherent nonsmoothness, the original phenomenon under consideration itself contains various discontinuities and irregularities. Typical examples are the phase changes of materials in the continuous casting of steel, piecewise linear tax models in economics, and cluster analysis, supervised data classification, and clusterwise linear regression in data mining and machine learning. Technological nonsmoothness in a model is usually caused by extra technological constraints. These constraints may cause a nonsmooth dependence between variables and functions, even though the functions were originally continuously differentiable; examples include so-called obstacle problems in optimal shape design and discrete feasible sets in product planning. On the other hand, some solution algorithms for constrained optimization may themselves lead to a nonsmooth problem. Examples of such methodological nonsmoothness are the exact penalty function method and the Lagrange decomposition method (see the schematic example at the end of this introduction). Finally, problems may be analytically smooth but numerically nonsmooth. This is the case with, for instance, noisy input data or so-called "stiff problems," which are numerically unstable and behave like nonsmooth problems.

Despite huge developments in nonsmooth optimization in recent decades and the wide application of its techniques, only very few books have been written specifically about it. Some of these books are out of date and do not contain the most recent developments in the area; moreover, all of them require from the audience a high level of prior knowledge of the subject. Our aim in writing this book is to give an overview of the current state of numerical nonsmooth optimization to a much wider audience, including practitioners.

The book is divided into three major parts dealing, respectively, with the theory of nonsmooth optimization (convex and nonsmooth analysis, optimality conditions), practical nonsmooth optimization problems (including applications to real-world problems and descriptions of academic test problems), and methods of nonsmooth optimization (descriptions of the methods and their pseudo-codes, as well as a comparison of different implementations). In preparing this book, all efforts have been made to ensure that it is self-contained. Within each chapter of the first part, exercises, numerical examples, and graphical illustrations are provided to help the reader understand the concepts, practical problems, and methods discussed. At the end of each part, notes and references are presented to aid the reader in further study. In addition, the book contains an extensive bibliography.
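To make the methodological class concrete, here is the standard exact penalty construction in schematic form (a generic sketch, not a formulation quoted from the book): the smooth constrained problem

\[
\min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad g(x) \le 0,
\]

with continuously differentiable $f$ and $g$, can be replaced by the unconstrained problem

\[
\min_{x \in \mathbb{R}^n} F(x), \qquad F(x) = f(x) + r \max\{0,\, g(x)\},
\]

where $r > 0$ is a penalty parameter. Although $f$ and $g$ are smooth, the max-term makes $F$ nondifferentiable on the constraint boundary $g(x) = 0$, so the reformulated problem must be treated by nonsmooth optimization methods.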