Lagrangian Decomposition for Neural Network Verification

Total Page:16

File Type:pdf, Size:1020Kb

Lagrangian Decomposition for Neural Network Verification Lagrangian Decomposition for Neural Network Verification Rudy Bunel∗, Alessandro De Palma∗, Alban Desmaison University of Oxford frudy, [email protected] Krishnamurthy (Dj) Dvijotham, Pushmeet Kohli Philip H.S. Torr, M. Pawan Kumar DeepMind University of Oxford Abstract where safety and robustness would be a prerequisite, we need to invent techniques that can prove formal guarantees A fundamental component of neural network for neural network behaviour. A particularly desirable verification is the computation of bounds on property is resistance to adversarial examples (Goodfel- the values their outputs can take. Previous low et al., 2015, Szegedy et al., 2014): perturbations ma- methods have either used off-the-shelf solvers, liciously crafted with the intent of fooling even extremely discarding the problem structure, or relaxed well performing models. After several defenses were pro- the problem even further, making the bounds posed and subsequently broken (Athalye et al., 2018, Ue- unnecessarily loose. We propose a novel sato et al., 2018), some progress has been made in being approach based on Lagrangian Decomposition. able to formally verify whether there exist any adversarial Our formulation admits an efficient supergradi- examples in the neighbourhood of a data point (Tjeng ent ascent algorithm, as well as an improved et al., 2019, Wong and Kolter, 2018). proximal algorithm. Both the algorithms offer Verification algorithms fall into three categories: unsound three advantages: (i) they yield bounds that (some false properties are proven false), incomplete (some are provably at least as tight as previous dual true properties are proven true), and complete (all prop- algorithms relying on Lagrangian relaxations; erties are correctly verified as either true or false). A (ii) they are based on operations analogous critical component of the verification systems developed to forward/backward pass of neural networks so far is the computation of lower and upper bounds on layers and are therefore easily parallelizable, the output of neural networks when their inputs are con- amenable to GPU implementation and able to strained to lie in a bounded set. In incomplete verifica- take advantage of the convolutional structure tion, by deriving bounds on the changes of the predic- of problems; and (iii) they allow for anytime tion vector under restricted perturbations, it is possible stopping while still providing valid bounds. to identify safe regions of the input space. These results Empirically, we show that we obtain bounds allow the rigorous comparison of adversarial defenses comparable with off-the-shelf solvers in a and prevent making overconfident statements about their fraction of their running time, and obtain tighter efficacy (Wong and Kolter, 2018). In complete verifica- arXiv:2002.10410v3 [cs.LG] 17 Jun 2020 bounds in the same time as previous dual tion, bounds can also be used as essential subroutines algorithms. This results in an overall speed-up of Branch and Bound complete verifiers (Bunel et al., when employing the bounds for formal verifi- 2018). Finally, bounds might also be used as a training cation. Code for our algorithms is available at signal to guide the network towards greater robustness https://github.com/oval-group/ and more verifiability (Gowal et al., 2018, Mirman et al., decomposition-plnn-bounds. 2018, Wong and Kolter, 2018). Most previous algorithms for computing bounds are either 1 INTRODUCTION computationally expensive (Ehlers, 2017) or sacrifice a lot of tightness in order to scale (Gowal et al., 2018, Mirman As deep learning powered systems become more and more et al., 2018, Wong and Kolter, 2018). In this work, we common, the lack of robustness of neural networks and design novel customised relaxations and their correspond- their reputation for being “Black Boxes” is increasingly ing solvers for obtaining bounds on neural networks. Our worrisome. In order to deploy them in critical scenarios approach offers the following advantages: ∗ These authors contributed equally to this work. Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), PMLR volume 124, 2020. • While previous approaches to neural network problem instances. Others are incomplete, based on re- bounds (Dvijotham et al., 2018) are based on La- laxations of the verification problem. They trade speed grangian relaxations, we derive a new family of opti- for completeness: while they cannot verify properties for mization problems for neural network bounds through all problem instances, they scale significantly better. Two Lagrangian Decomposition, which in general yields main types of bounds have been proposed: on the one duals at least as strong as those obtained through La- hand, some approaches (Ehlers, 2017, Salman et al., 2019) grangian relaxation (Guignard and Kim, 1987). We rely on off-the-shelf solvers to solve accurate relaxations in fact prove that, in the context of ReLU networks, such as PLANET (Ehlers, 2017), which is the best known for any dual solution from the approach by Dvijotham linear-sized approximation of the problem. On the other et al. (2018), the bounds output by our dual are as least hand, as PLANET and other more complex relaxations as tight. We demonstrate empirically that this deriva- do not have closed form solutions, some researchers have tion computes tighter bounds in the same time when also proposed easier to solve, looser formulations (Gowal using supergradient methods. We further improve on et al., 2018, Mirman et al., 2018, Weng et al., 2018, Wong the performance by devising a proximal solver for the and Kolter, 2018). Explicitly or implicitly, these are all problem, which decomposes the task into a series of equivalent to propagating a convex domain through the strongly convex subproblems. For each, we use an iter- network to overapproximate the set of reachable values. ative method for which we derive optimal step sizes. Our approach consists in tackling a relaxation equivalent • Both the supergradient and the proximal method op- to the PLANET one (although generalised beyond ReLU), erate through linear operations similar to those used by designing a custom solver that achieves faster perfor- during network forward/backward passes. As a con- mance without sacrificing tightness. Some potentially sequence, we can leverage the convolutional struc- tighter convex relaxations exist but involve a quadratic ture when necessary, while standard solvers are of- number of variables, such as the semi-definite program- ten restricted to treating it as a general linear opera- ming method of Raghunathan et al. (2018) . Better re- tion. Moreover, both methods are easily paralleliz- laxations obtained from relaxing strong Mixed Integer able: when computing bounds on the activations at Programming formulations (Anderson et al., 2019) have a layer k, we need two solve two problems for each quadratic number of variables or a potentially exponential hidden unit of the network (for the upper and lower number of constraints. We do not address them here. bounds). These can all be solved in parallel. In com- A closely related approach to ours is the work of Dvi- plete verification, we need to compute bounds for sev- jotham et al. (2018). Both their method and ours are eral different problem domains at once: we solve these anytime and operate on similar duals. While their dual problems in parallel as well. Our GPU implementation is based on the Lagrangian relaxation of the non-convex thus allows us to solve several hundreds of linear pro- problem, ours is based on the Lagrangian Decomposition grams at once on a single GPU, a level of parallelism of the nonlinear activation’s convex relaxation. Thanks to that would be hard to match on CPU-based systems. the properties of Lagrangian Decomposition (Guignard • Most standard linear programming based relax- and Kim, 1987), we can show that our dual problem pro- ations (Ehlers, 2017) will only return a valid bound vides better bounds when evaluated on the same dual vari- if the problem is solved to optimality. Others, like ables. The relationship between the two duals is studied the dual simplex method employed by off-the-shelf in detail in section 4.2. Moreover, in terms of the followed solvers (Gurobi Optimization, 2020) have a very high optimization strategy, in addition to using a supergradient cost per iteration and will not yield tight bounds with- method like Dvijotham et al. (2018), we present a proxi- out incurring significant computational costs. Both mal method, for which we can derive optimal step sizes. methods described in this paper are anytime (terminat- We show that these modifications enable us to compute ing it before convergence still provides a valid bound), tighter bounds using the same amount of compute time. and can be interrupted at very small granularity. This is useful in the context of a subroutine for complete 3 PRELIMINARIES verifiers, as this enables the user to choose an appropri- Throughout this paper, we will use bold lower case letters ate speed versus accuracy trade-off. It also offers great (like z) to represent vectors and upper case letters (like W ) versatility as an incomplete verification method. to represent matrices. Brackets are used to indicate the 2 RELATED WORKS i-th coordinate of a vector (z[i]), and integer ranges (e.g., [1; n − 1]). We will study the computation of the lower Bound computations are mainly used for formal verifica- bound problem based on a feedforward neural network, tion methods. Some methods are complete (Cheng et al., with element-wise activation function σ (·). The network 2017, Ehlers, 2017, Katz et al., 2017, Tjeng et al., 2019, inputs are restricted to a convex domain C, over which we Xiang et al., 2017), always returning a verdict for each assume that we can easily optimise linear functions. This is the same assumption that was made by Dvijotham et al. ingful, easy to solve subproblems. We then impose con- (2018). The computation for an upper bound is analogous.
Recommended publications
  • A Learning-Based Iterative Method for Solving Vehicle
    Published as a conference paper at ICLR 2020 ALEARNING-BASED ITERATIVE METHOD FOR SOLVING VEHICLE ROUTING PROBLEMS Hao Lu ∗ Princeton University, Princeton, NJ 08540 [email protected] Xingwen Zhang ∗ & Shuang Yang Ant Financial Services Group, San Mateo, CA 94402 fxingwen.zhang,[email protected] ABSTRACT This paper is concerned with solving combinatorial optimization problems, in par- ticular, the capacitated vehicle routing problems (CVRP). Classical Operations Research (OR) algorithms such as LKH3 (Helsgaun, 2017) are inefficient and dif- ficult to scale to larger-size problems. Machine learning based approaches have recently shown to be promising, partly because of their efficiency (once trained, they can perform solving within minutes or even seconds). However, there is still a considerable gap between the quality of a machine learned solution and what OR methods can offer (e.g., on CVRP-100, the best result of learned solutions is between 16.10-16.80, significantly worse than LKH3’s 15.65). In this paper, we present “Learn to Improve” (L2I), the first learning based approach for CVRP that is efficient in solving speed and at the same time outperforms OR methods. Start- ing with a random initial solution, L2I learns to iteratively refine the solution with an improvement operator, selected by a reinforcement learning based controller. The improvement operator is selected from a pool of powerful operators that are customized for routing problems. By combining the strengths of the two worlds, our approach achieves the new state-of-the-art results on CVRP, e.g., an average cost of 15.57 on CVRP-100. 1 INTRODUCTION In this paper, we focus on an important class of combinatorial optimization, vehicle routing problems (VRP), which have a wide range of applications in logistics.
    [Show full text]
  • 12. Coordinate Descent Methods
    EE 546, Univ of Washington, Spring 2014 12. Coordinate descent methods theoretical justifications • randomized coordinate descent method • minimizing composite objectives • accelerated coordinate descent method • Coordinate descent methods 12–1 Notations consider smooth unconstrained minimization problem: minimize f(x) x RN ∈ n coordinate blocks: x =(x ,...,x ) with x RNi and N = N • 1 n i ∈ i=1 i more generally, partition with a permutation matrix: U =[PU1 Un] • ··· n T xi = Ui x, x = Uixi i=1 X blocks of gradient: • f(x) = U T f(x) ∇i i ∇ coordinate update: • x+ = x tU f(x) − i∇i Coordinate descent methods 12–2 (Block) coordinate descent choose x(0) Rn, and iterate for k =0, 1, 2,... ∈ 1. choose coordinate i(k) 2. update x(k+1) = x(k) t U f(x(k)) − k ik∇ik among the first schemes for solving smooth unconstrained problems • cyclic or round-Robin: difficult to analyze convergence • mostly local convergence results for particular classes of problems • does it really work (better than full gradient method)? • Coordinate descent methods 12–3 Steepest coordinate descent choose x(0) Rn, and iterate for k =0, 1, 2,... ∈ (k) 1. choose i(k) = argmax if(x ) 2 i 1,...,n k∇ k ∈{ } 2. update x(k+1) = x(k) t U f(x(k)) − k i(k)∇i(k) assumptions f(x) is block-wise Lipschitz continuous • ∇ f(x + U v) f(x) L v , i =1,...,n k∇i i −∇i k2 ≤ ik k2 f has bounded sub-level set, in particular, define • ⋆ R(x) = max max y x 2 : f(y) f(x) y x⋆ X⋆ k − k ≤ ∈ Coordinate descent methods 12–4 Analysis for constant step size quadratic upper bound due to block coordinate-wise
    [Show full text]
  • Process Optimization
    Process Optimization Mathematical Programming and Optimization of Multi-Plant Operations and Process Design Ralph W. Pike Director, Minerals Processing Research Institute Horton Professor of Chemical Engineering Louisiana State University Department of Chemical Engineering, Lamar University, April, 10, 2007 Process Optimization • Typical Industrial Problems • Mathematical Programming Software • Mathematical Basis for Optimization • Lagrange Multipliers and the Simplex Algorithm • Generalized Reduced Gradient Algorithm • On-Line Optimization • Mixed Integer Programming and the Branch and Bound Algorithm • Chemical Production Complex Optimization New Results • Using one computer language to write and run a program in another language • Cumulative probability distribution instead of an optimal point using Monte Carlo simulation for a multi-criteria, mixed integer nonlinear programming problem • Global optimization Design vs. Operations • Optimal Design −Uses flowsheet simulators and SQP – Heuristics for a design, a superstructure, an optimal design • Optimal Operations – On-line optimization – Plant optimal scheduling – Corporate supply chain optimization Plant Problem Size Contact Alkylation Ethylene 3,200 TPD 15,000 BPD 200 million lb/yr Units 14 76 ~200 Streams 35 110 ~4,000 Constraints Equality 761 1,579 ~400,000 Inequality 28 50 ~10,000 Variables Measured 43 125 ~300 Unmeasured 732 1,509 ~10,000 Parameters 11 64 ~100 Optimization Programming Languages • GAMS - General Algebraic Modeling System • LINDO - Widely used in business applications
    [Show full text]
  • The Generalized Simplex Method for Minimizing a Linear Form Under Linear Inequality Restraints
    Pacific Journal of Mathematics THE GENERALIZED SIMPLEX METHOD FOR MINIMIZING A LINEAR FORM UNDER LINEAR INEQUALITY RESTRAINTS GEORGE BERNARD DANTZIG,ALEXANDER ORDEN AND PHILIP WOLFE Vol. 5, No. 2 October 1955 THE GENERALIZED SIMPLEX METHOD FOR MINIMIZING A LINEAR FORM UNDER LINEAR INEQUALITY RESTRAINTS GEORGE B. DANTZIG, ALEX ORDEN, PHILIP WOLFE 1. Background and summary. The determination of "optimum" solutions of systems of linear inequalities is assuming increasing importance as a tool for mathematical analysis of certain problems in economics, logistics, and the theory of games [l;5] The solution of large systems is becoming more feasible with the advent of high-speed digital computers; however, as in the related problem of inversion of large matrices, there are difficulties which remain to be resolved connected with rank. This paper develops a theory for avoiding as- sumptions regarding rank of underlying matrices which has import in applica- tions where little or nothing is known about the rank of the linear inequality system under consideration. The simplex procedure is a finite iterative method which deals with problems involving linear inequalities in a manner closely analogous to the solution of linear equations or matrix inversion by Gaussian elimination. Like the latter it is useful in proving fundamental theorems on linear algebraic systems. For example, one form of the fundamental duality theorem associated with linear inequalities is easily shown as a direct consequence of solving the main prob- lem. Other forms can be obtained by trivial manipulations (for a fuller discus- sion of these interrelations, see [13]); in particular, the duality theorem [8; 10; 11; 12] leads directly to the Minmax theorem for zero-sum two-person games [id] and to a computational method (pointed out informally by Herman Rubin and demonstrated by Robert Dorfman [la]) which simultaneously yields optimal strategies for both players and also the value of the game.
    [Show full text]
  • The Gradient Sampling Methodology
    The Gradient Sampling Methodology James V. Burke∗ Frank E. Curtisy Adrian S. Lewisz Michael L. Overtonx January 14, 2019 1 Introduction in particular, this leads to calling the negative gra- dient, namely, −∇f(x), the direction of steepest de- The principal methodology for minimizing a smooth scent for f at x. However, when f is not differentiable function is the steepest descent (gradient) method. near x, one finds that following the negative gradient One way to extend this methodology to the minimiza- direction might offer only a small amount of decrease tion of a nonsmooth function involves approximating in f; indeed, obtaining decrease from x along −∇f(x) subdifferentials through the random sampling of gra- may be possible only with a very small stepsize. The dients. This approach, known as gradient sampling GS methodology is based on the idea of stabilizing (GS), gained a solid theoretical foundation about a this definition of steepest descent by instead finding decade ago [BLO05, Kiw07], and has developed into a a direction to approximately solve comprehensive methodology for handling nonsmooth, potentially nonconvex functions in the context of op- min max gT d; (2) kdk2≤1 g2@¯ f(x) timization algorithms. In this article, we summarize the foundations of the GS methodology, provide an ¯ where @f(x) is the -subdifferential of f at x [Gol77]. overview of the enhancements and extensions to it To understand the context of this idea, recall that that have developed over the past decade, and high- the (Clarke) subdifferential of a locally Lipschitz f light some interesting open questions related to GS.
    [Show full text]
  • An Efficient Three-Term Iterative Method for Estimating Linear
    mathematics Article An Efficient Three-Term Iterative Method for Estimating Linear Approximation Models in Regression Analysis Siti Farhana Husin 1,*, Mustafa Mamat 1, Mohd Asrul Hery Ibrahim 2 and Mohd Rivaie 3 1 Faculty of Informatics and Computing, Universiti Sultan Zainal Abidin, Terengganu 21300, Malaysia; [email protected] 2 Faculty of Entrepreneurship and Business, Universiti Malaysia Kelantan, Kelantan 16100, Malaysia; [email protected] 3 Department of Computer Sciences and Mathematics, Universiti Teknologi Mara, Terengganu 54000, Malaysia; [email protected] * Correspondence: [email protected] Received: 15 April 2020; Accepted: 20 May 2020; Published: 15 June 2020 Abstract: This study employs exact line search iterative algorithms for solving large scale unconstrained optimization problems in which the direction is a three-term modification of iterative method with two different scaled parameters. The objective of this research is to identify the effectiveness of the new directions both theoretically and numerically. Sufficient descent property and global convergence analysis of the suggested methods are established. For numerical experiment purposes, the methods are compared with the previous well-known three-term iterative method and each method is evaluated over the same set of test problems with different initial points. Numerical results show that the performances of the proposed three-term methods are more efficient and superior to the existing method. These methods could also produce an approximate linear regression equation to solve the regression model. The findings of this study can help better understanding of the applicability of numerical algorithms that can be used in estimating the regression model. Keywords: steepest descent method; large-scale unconstrained optimization; regression model 1.
    [Show full text]
  • Revised Simplex Algorithm - 1
    Lecture 9: Linear Programming: Revised Simplex and Interior Point Methods (a nice application of what we have learned so far) Prof. Krishna R. Pattipati Dept. of Electrical and Computer Engineering University of Connecticut Contact: [email protected] (860) 486-2890 ECE 6435 Fall 2008 Adv Numerical Methods in Sci Comp October 22, 2008 Copyright ©2004 by K. Pattipati Lecture Outline What is Linear Programming (LP)? Why do we need to solve Linear-Programming problems? • L1 and L∞ curve fitting (i.e., parameter estimation using 1-and ∞-norms) • Sample LP applications • Transportation Problems, Shortest Path Problems, Optimal Control, Diet Problem Methods for solving LP problems • Revised Simplex method • Ellipsoid method….not practical • Karmarkar’s projective scaling (interior point method) Implementation issues of the Least-Squares subproblem of Karmarkar’s method ….. More in Linear Programming and Network Flows course Comparison of Simplex and projective methods References 1. Dimitris Bertsimas and John N. Tsisiklis, Introduction to Linear Optimization, Athena Scientific, Belmont, MA, 1997. 2. I. Adler, M. G. C. Resende, G. Vega, and N. Karmarkar, “An Implementation of Karmarkar’s Algorithm for Linear Programming,” Mathematical Programming, Vol. 44, 1989, pp. 297-335. 3. I. Adler, N. Karmarkar, M. G. C. Resende, and G. Vega, “Data Structures and Programming Techniques for the Implementation of Karmarkar’s Algorithm,” ORSA Journal on Computing, Vol. 1, No. 2, 1989. 2 Copyright ©2004 by K. Pattipati What is Linear Programming? One of the most celebrated problems since 1951 • Major breakthroughs: • Dantzig: Simplex method (1947-1949) • Khachian: Ellipsoid method (1979) - Polynomial complexity, but not competitive with the Simplex → not practical.
    [Show full text]
  • A Non-Iterative Method for Optimizing Linear Functions in Convex Linearly Constrained Spaces
    A non-iterative method for optimizing linear functions in convex linearly constrained spaces GERARDO FEBRES*,** * Departamento de Procesos y Sistemas, Universidad Simón Bolívar, Venezuela. ** Laboratorio de Evolución, Universidad Simón Bolívar, Venezuela. Received… This document introduces a new strategy to solve linear optimization problems. The strategy is based on the bounding condition each constraint produces on each one of the problem’s dimension. The solution of a linear optimization problem is located at the intersection of the constraints defining the extreme vertex. By identifying the constraints that limit the growth of the objective function value, we formulate linear equations system leading to the optimization problem’s solution. The most complex operation of the algorithm is the inversion of a matrix sized by the number of dimensions of the problem. Therefore, the algorithm’s complexity is comparable to the corresponding to the classical Simplex method and the more recently developed Linear Programming algorithms. However, the algorithm offers the advantage of being non-iterative. Key Words: linear optimization; optimization algorithm; constraint redundancy; extreme points. 1. INTRODUCTION Since the apparition of Dantzig’s Simplex Algorithm in 1947, linear optimization has become a widely used method to model multidimensional decision problems. Appearing even before the establishment of digital computers, the first version of the Simplex algorithm has remained for long time as the most effective way to solve large scale linear problems; an interesting fact perhaps explained by the Dantzig [1] in a report written for Stanford University in 1987, where he indicated that even though most large scale problems are formulated with sparse matrices, the inverse of these matrixes are generally dense [1], thus producing an important burden to record and track these matrices as the algorithm advances toward the problem solution.
    [Show full text]
  • A New Algorithm of Nonlinear Conjugate Gradient Method with Strong Convergence*
    Volume 27, N. 1, pp. 93–106, 2008 Copyright © 2008 SBMAC ISSN 0101-8205 www.scielo.br/cam A new algorithm of nonlinear conjugate gradient method with strong convergence* ZHEN-JUN SHI1,2 and JINHUA GUO2 1College of Operations Research and Management, Qufu Normal University Rizhao, Sahndong 276826, P.R. China 2Department of Computer and Information Science, University of Michigan Dearborn, Michigan 48128-1491, USA E-mails: [email protected]; [email protected] / [email protected] Abstract. The nonlinear conjugate gradient method is a very useful technique for solving large scale minimization problems and has wide applications in many fields. In this paper, we present a new algorithm of nonlinear conjugate gradient method with strong convergence for unconstrained minimization problems. The new algorithm can generate an adequate trust region radius automatically at each iteration and has global convergence and linear convergence rate under some mild conditions. Numerical results show that the new algorithm is efficient in practical computation and superior to other similar methods in many situations. Mathematical subject classification: 90C30, 65K05, 49M37. Key words: unconstrained optimization, nonlinear conjugate gradient method, global conver- gence, linear convergence rate. 1 Introduction Consider an unconstrained minimization problem min f (x), x ∈ Rn, (1) where Rn is an n-dimensional Euclidean space and f : Rn −→ R is a continu- ously differentiable function. #724/07. Received: 10/V/07. Accepted: 24/IX/07. *The work was supported in part by NSF CNS-0521142, USA. 94 A NEW ALGORITHM OF NONLINEAR CONJUGATE GRADIENT METHOD When n is very large (for example, n > 106) the related problem is called large scale minimization problem.
    [Show full text]
  • Lecture Notes on Iterative Optimization Algorithms
    Charles L. Byrne Department of Mathematical Sciences University of Massachusetts Lowell December 8, 2014 Lecture Notes on Iterative Optimization Algorithms Contents Preface vii 1 Overview and Examples 1 1.1 Overview . 1 1.2 Auxiliary-Function Methods . 2 1.2.1 Barrier-Function Methods: An Example . 3 1.2.2 Barrier-Function Methods: Another Example . 3 1.2.3 Barrier-Function Methods for the Basic Problem 4 1.2.4 The SUMMA Class . 5 1.2.5 Cross-Entropy Methods . 5 1.2.6 Alternating Minimization . 6 1.2.7 Penalty-Function Methods . 6 1.3 Fixed-Point Methods . 7 1.3.1 Gradient Descent Algorithms . 7 1.3.2 Projected Gradient Descent . 8 1.3.3 Solving Ax = b ................... 9 1.3.4 Projected Landweber Algorithm . 9 1.3.5 The Split Feasibility Problem . 10 1.3.6 Firmly Nonexpansive Operators . 10 1.3.7 Averaged Operators . 11 1.3.8 Useful Properties of Operators on H . 11 1.3.9 Subdifferentials and Subgradients . 12 1.3.10 Monotone Operators . 13 1.3.11 The Baillon{Haddad Theorem . 14 2 Auxiliary-Function Methods and Examples 17 2.1 Auxiliary-Function Methods . 17 2.2 Majorization Minimization . 17 2.3 The Method of Auslander and Teboulle . 18 2.4 The EM Algorithm . 19 iii iv Contents 3 The SUMMA Class 21 3.1 The SUMMA Class of Algorithms . 21 3.2 Proximal Minimization . 21 3.2.1 The PMA . 22 3.2.2 Difficulties with the PMA . 22 3.2.3 All PMA are in the SUMMA Class . 23 3.2.4 Convergence of the PMA .
    [Show full text]
  • Iterative Methods for Solving Optimization Problems
    ITERATIVE METHODS FOR SOLVING OPTIMIZATION PROBLEMS Shoham Sabach ITERATIVE METHODS FOR SOLVING OPTIMIZATION PROBLEMS Research Thesis In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy Shoham Sabach Submitted to the Senate of the Technion - Israel Institute of Technology Iyar 5772 Haifa May 2012 The research thesis was written under the supervision of Prof. Simeon Reich in the Department of Mathematics Publications: piq Reich, S. and Sabach, S.: Three strong convergence theorems for iterative methods for solving equi- librium problems in reflexive Banach spaces, Optimization Theory and Related Topics, Contemporary Mathematics, vol. 568, Amer. Math. Soc., Providence, RI, 2012, 225{240. piiq Mart´ın-M´arquez,V., Reich, S. and Sabach, S.: Iterative methods for approximating fixed points of Bregman nonexpansive operators, Discrete and Continuous Dynamical Systems, accepted for publi- cation. Impact Factor: 0.986. piiiq Sabach, S.: Products of finitely many resolvents of maximal monotone mappings in reflexive Banach spaces, SIAM Journal on Optimization 21 (2011), 1289{1308. Impact Factor: 2.091. pivq Kassay, G., Reich, S. and Sabach, S.: Iterative methods for solving systems of variational inequalities in reflexive Banach spaces, SIAM J. Optim. 21 (2011), 1319{1344. Impact Factor: 2.091. pvq Censor, Y., Gibali, A., Reich S. and Sabach, S.: The common variational inequality point problem, Set-Valued and Variational Analysis 20 (2012), 229{247. Impact Factor: 0.333. pviq Borwein, J. M., Reich, S. and Sabach, S.: Characterization of Bregman firmly nonexpansive operators using a new type of monotonicity, J. Nonlinear Convex Anal. 12 (2011), 161{183. Impact Factor: 0.738. pviiq Reich, S.
    [Show full text]
  • Topic 3 Iterative Methods for Ax = B
    Topic 3 Iterative methods for Ax = b 3.1 Introduction Earlier in the course, we saw how to reduce the linear system Ax = b to echelon form using elementary row operations. Solution methods that rely on this strategy (e.g. LU factorization) are robust and efficient, and are fundamental tools for solving the systems of linear equations that arise in practice. These are known as direct methods, since the solution x is obtained following a single pass through the relevant algorithm. We are now going to look at some alternative approaches that fall into the category of iterative methods . These techniques can only be applied to square linear systems ( n equations in n unknowns), but this is of course a common and important case. 45 Iterative methods for Ax = b begin with an approximation to the solution, x0, then seek to provide a series of improved approximations x1, x2, … that converge to the exact solution. For the engineer, this approach is appealing because it can be stopped as soon as the approximations xi have converged to an acceptable precision, which might be something as crude as 10 −3. With a direct method, bailing out early is not an option; the process of elimination and back-substitution has to be carried right through to completion, or else abandoned altogether. By far the main attraction of iterative methods, however, is that for certain problems (particularly those where the matrix A is large and sparse ) they are much faster than direct methods. On the other hand, iterative methods can be unreliable; for some problems they may exhibit very slow convergence, or they may not converge at all.
    [Show full text]