Efficient Implementation of the Simplex Method on a CPU-GPU System


2011 IEEE International Parallel & Distributed Processing Symposium

Mohamed Esseghir Lalami, Vincent Boyer, Didier El-Baz
CNRS; LAAS; 7 avenue du colonel Roche, F-31077 Toulouse, France
Université de Toulouse; UPS, INSA, INP, ISAE; LAAS; F-31077 Toulouse, France
Email: [email protected] [email protected] [email protected]

Abstract—The Simplex algorithm is a well-known method for solving linear programming (LP) problems. In this paper, we propose a parallel implementation of the Simplex method on a CPU-GPU system via CUDA. A double precision implementation is used in order to improve the quality of solutions. Computational tests have been carried out on randomly generated instances of non-sparse LP problems. The tests show a maximum speedup of 12.5 on a GTX 260 board.

Keywords—hybrid computing; GPU computing; parallel computing; CUDA; Simplex method; linear programming.

I. INTRODUCTION

Initially developed for real-time and high-definition 3D graphics applications, Graphics Processing Units (GPUs) have recently gained attention for High Performance Computing applications. Indeed, the peak computational capabilities of modern GPUs exceed those of top-of-the-line central processing units (CPUs). GPUs are highly parallel, multithreaded, manycore units.

In November 2006, NVIDIA introduced the Compute Unified Device Architecture (CUDA), a technology that enables users to solve many complex problems on their GPU cards (see for example [1]-[4]).

Some related works have been presented on the parallel implementation of algorithms on GPUs for linear programming (LP) problems. O'Leary and Jung have proposed in [5] a combined CPU-GPU implementation of the Interior Point Method for LP; computational results carried out on NETLIB LP problems [6] with at most 516 variables and 758 constraints show that some speedup can be obtained by using the GPU for sufficiently large dense problems. Spampinato and Elster have proposed in [7] a parallel implementation of the revised Simplex method for LP on a GPU with the NVIDIA CUBLAS [8] and NVIDIA LAPACK [9] libraries. Tests were carried out on randomly generated LP problems with at most 2000 variables and 2000 constraints. The implementation showed a maximum speedup of 2.5 on an NVIDIA GTX 280 GPU as compared with a sequential implementation on an Intel Core2 Quad 2.83 GHz CPU. Bieling, Peschlow and Martini have proposed in [10] another implementation of the revised Simplex method on GPU. This implementation speeds up the solution by a maximum factor of 18 in single precision on an NVIDIA GeForce 9600 GT card as compared with the GLPK solver run on an Intel Core 2 Duo 3 GHz CPU. To the best of our knowledge, these are the available references on parallel implementations on GPUs of algorithms for LP.

The revised Simplex method is generally more efficient than the standard Simplex method for large linear programming problems (see [11] and [12]), but for dense LP problems the two approaches are equivalent (see [13] and [14]). In this paper, we concentrate on the parallel implementation of the standard Simplex algorithm on CPU-GPU systems for dense LP problems. Dense linear programming problems occur in many important domains; in particular, some decompositions like Benders and Dantzig-Wolfe give rise to fully dense LP problems. Reference is made to [15] and [16] for applications leading to dense LP problems.

The standard Simplex method is an iterative method that manipulates independently, at each iteration, the elements of a fixed-size matrix. The main challenge was to implement this algorithm in double precision in the CUDA C environment, without using existing NVIDIA libraries like CUBLAS and LAPACK, in order to obtain the best speedup we can. By identifying the tasks that can be parallelized and by managing the GPU memories well, one can obtain a good speedup with respect to a sequential implementation.

We have been solving linear programming problems in the context of the solution of NP-complete combinatorial optimization problems (see [17]).
For example, one frequently has to solve linear programming problems for bound computation purposes when using branch and bound algorithms, and it may happen that some instances give rise to dense LP problems. The present work is part of a study on the parallelization of optimization methods (see also [1]).

The paper is structured as follows. Section II deals with the Simplex method. The parallel implementation of the Simplex algorithm on CPU-GPU systems is presented in Section III. Section IV is devoted to the presentation and analysis of computational results for randomly generated instances. Finally, in Section V, we give some conclusions and perspectives.

II. MATHEMATICAL BACKGROUND ON THE SIMPLEX METHOD

Linear programming (LP) problems consist in maximizing (or minimizing) a linear objective function subject to a set of linear constraints. More formally, we consider the following problem:

\[
\max \; x_0 = c x', \qquad \text{s.t.} \; A' x' \le b', \quad x' \ge 0, \tag{1}
\]

with

\[
c = (c_1, c_2, \dots, c_n) \in \mathbb{R}^n, \qquad
A' = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots &        & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix} \in \mathbb{R}^{m \times n},
\qquad
x' = (x_1, x_2, \dots, x_n)^T,
\]

where n and m are the number of variables and constraints, respectively.

Inequality constraints can be written as equality constraints by introducing m new variables x_{n+l}, named slack variables, so that

\[
a_{l1} x_1 + a_{l2} x_2 + \dots + a_{ln} x_n + x_{n+l} = b_l, \qquad l \in \{1, 2, \dots, m\},
\]

with x_{n+l} \ge 0 and c_{n+l} = 0. Then the standard form of the linear programming problem can be written as follows:

\[
\max \; x_0 = c x, \qquad \text{s.t.} \; A x = b, \quad x \ge 0, \tag{2}
\]

with

\[
c = (c', 0, \dots, 0) \in \mathbb{R}^{n+m}, \qquad
A = (A', I_m) \in \mathbb{R}^{m \times (n+m)}, \qquad
x = (x'^T, x_{n+1}, x_{n+2}, \dots, x_{n+m})^T,
\]

where I_m is the m × m identity matrix.

In 1947, George Dantzig proposed the Simplex algorithm for solving linear programming problems (see [11]). The Simplex algorithm is a pivoting method that proceeds from a first feasible extreme point solution of an LP problem to another feasible solution, by using matrix manipulations, the so-called pivoting operations, in such a way as to continually increase the objective value. Different versions of this method have been proposed. In this paper, we consider the method proposed by Garfinkel and Nemhauser in [19], which improves the algorithm of Dantzig by reducing the number of operations and the memory occupancy.

We suppose that the columns of A are permuted so that A = (B, N), where B is an m × m nonsingular matrix. B is the so-called basic matrix for the LP problem. We denote by x_B the sub-vector of x of dimension m of basic variables associated with matrix B, and by x_N the sub-vector of x of dimension n of nonbasic variables associated with N. The problem can then be written as follows:

\[
\begin{pmatrix} x_0 \\ x_B \end{pmatrix}
=
\begin{pmatrix} c_B B^{-1} b \\ B^{-1} b \end{pmatrix}
-
\begin{pmatrix} c_B B^{-1} N - c_N \\ B^{-1} N \end{pmatrix} x_N. \tag{3}
\]

Remark: By setting x_N = 0, x_B = B^{-1} b and x_0 = c_B B^{-1} b, a feasible basic solution is obtained if x_B \ge 0.

Simplex tableau

We now introduce the following notations:

\[
\begin{pmatrix} s_{0,0} \\ s_{1,0} \\ \vdots \\ s_{m,0} \end{pmatrix}
\equiv
\begin{pmatrix} c_B B^{-1} b \\ B^{-1} b \end{pmatrix},
\qquad
\begin{pmatrix}
s_{0,1} & s_{0,2} & \cdots & s_{0,n} \\
s_{1,1} & s_{1,2} & \cdots & s_{1,n} \\
\vdots  &         & \ddots & \vdots  \\
s_{m,1} & s_{m,2} & \cdots & s_{m,n}
\end{pmatrix}
\equiv
\begin{pmatrix} c_B B^{-1} N - c_N \\ B^{-1} N \end{pmatrix}.
\]

Then (3) can be written as follows:

\[
\begin{pmatrix} x_0 \\ x_{B_1} \\ \vdots \\ x_{B_m} \end{pmatrix}
=
\begin{pmatrix} s_{0,0} \\ s_{1,0} \\ \vdots \\ s_{m,0} \end{pmatrix}
-
\begin{pmatrix}
s_{0,1} & s_{0,2} & \cdots & s_{0,n} \\
s_{1,1} & s_{1,2} & \cdots & s_{1,n} \\
\vdots  &         & \ddots & \vdots  \\
s_{m,1} & s_{m,2} & \cdots & s_{m,n}
\end{pmatrix} x_N. \tag{4}
\]

From (4), we construct the so-called Simplex tableau, as shown in Table I.

Table I
SIMPLEX TABLEAU

x_0      | s_{0,0} | s_{0,1}  s_{0,2}  ...  s_{0,n}
x_{B_1}  | s_{1,0} | s_{1,1}  s_{1,2}  ...  s_{1,n}
  ...    |   ...   |   ...
x_{B_m}  | s_{m,0} | s_{m,1}  s_{m,2}  ...  s_{m,n}

By adding the slack variables to the LP problem (see (2)) and setting N = A' and B = I_m (hence B^{-1} = I_m), a first basic feasible solution can be written as follows: x_N = x' = (0, 0, \dots, 0) \in \mathbb{R}^n and x_B = B^{-1} b = b.
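For concreteness, consider the following small instance and the initial tableau it produces (our illustration, not an example from the paper; all numbers are assumptions made for exposition):

```latex
% Illustrative LP:  max x_0 = 3x_1 + 2x_2
%                   s.t.  x_1 +  x_2 <= 4,
%                         x_1 + 3x_2 <= 6,   x_1, x_2 >= 0.
% Slack variables x_3, x_4 give N = A', B = I_2, b = (4, 6)^T.
% Since c_B = (0, 0), the first line is s_{0,j} = c_B B^{-1} N - c_N = -c_N.
\[
\begin{array}{c|c|cc}
       & s_{i,0} & x_1 & x_2 \\ \hline
x_0    & 0       & -3  & -2  \\
x_3    & 4       &  1  &  1  \\
x_4    & 6       &  1  &  3
\end{array}
\qquad
x_N = (x_1, x_2)^T = (0, 0)^T, \quad x_B = (x_3, x_4)^T = b = (4, 6)^T.
\]
```

The negative entries -3 and -2 in the first line signal that the objective can still be increased; this is exactly what the iteration described below exploits.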
At each iteration of the Simplex algorithm, we try to replace a basic variable, the so-called leaving variable, by a nonbasic variable, the so-called entering variable, so that the objective function is increased. A better feasible solution is then obtained by updating the Simplex tableau. More formally, the Simplex algorithm iteratively implements the following steps:

• Step 1: Compute the index k of the smallest negative value in the first line of the Simplex tableau, i.e.

\[
k = \arg\min_{j = 1, 2, \dots, n} \{ s_{0,j} \mid s_{0,j} < 0 \}.
\]

The variable x_k is the entering variable. If no such index is found, then the current solution is optimal; otherwise we go to the next step.

• Step 2: Compute the ratios \theta_i = s_{i,0} / s_{i,k}, i = 1, 2, \dots, m, then compute the index r of the leaving variable x_{B_r} as

\[
r = \arg\min_{i = 1, 2, \dots, m} \{ \theta_i \mid s_{i,k} > 0 \}.
\]

III. SIMPLEX ON A CPU-GPU SYSTEM

This section deals with the CPU-GPU implementation of the Simplex algorithm via CUDA. For that purpose, a brief description of the GPU architecture is given in the following paragraph.
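The pivoting operation that updates the whole tableau once the entering column k and the leaving row r are known carries the bulk of the parallel work, since every entry s_{i,j} can be modified independently. The following is a minimal CUDA C sketch of such a Gauss-Jordan style update, written under our own assumptions; it is not the authors' kernel code, and all names, sizes and launch parameters are illustrative.

```cuda
#include <cuda_runtime.h>

// The dense (m+1) x (n+1) tableau S is stored row-major on the GPU;
// row 0 holds s_{0,j} and column 0 holds s_{i,0}. Column k (entering)
// and row r (leaving) are assumed chosen as in Steps 1 and 2.

// 1) Save the entering column so the elimination step can read the
//    multipliers s_{i,k} without a read-after-write race.
__global__ void saveColumn(const double *S, double *colk,
                           int rows, int width, int k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < rows) colk[i] = S[i * width + k];
}

// 2) Divide the pivot row by the pivot element s_{r,k}.
__global__ void normalizePivotRow(double *S, const double *colk,
                                  int width, int r)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j < width) S[r * width + j] /= colk[r];
}

// 3) Eliminate column k from every other row:
//    s_{i,j} <- s_{i,j} - s_{i,k} * s_{r,j}, the pivot row being
//    already normalized (for j = k this yields 0, as expected).
__global__ void updateRows(double *S, const double *colk,
                           int rows, int width, int r)
{
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= rows || j >= width || i == r) return;
    S[i * width + j] -= colk[i] * S[r * width + j];
}

// Host-side driver for one pivoting step (error checking omitted).
// Kernels issued on the same stream execute in order, so no explicit
// synchronization is needed between the three phases.
void pivotStep(double *dS, double *dColk, int m, int n, int r, int k)
{
    int rows = m + 1, width = n + 1;
    saveColumn<<<(rows + 255) / 256, 256>>>(dS, dColk, rows, width, k);
    normalizePivotRow<<<(width + 255) / 256, 256>>>(dS, dColk, width, r);
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x,
              (rows + block.y - 1) / block.y);
    updateRows<<<grid, block>>>(dS, dColk, rows, width, r);
}
```

Saving column k into a separate buffer before the elimination is one simple way to keep the kernels free of data races; the memory layout and the mapping of threads to tableau entries are precisely the kind of design choices the paper tunes to obtain its speedup.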