2011 IEEE International Parallel & Distributed Processing Symposium

Efficient Implementation of the Simplex Method on a CPU-GPU System

Mohamed Esseghir Lalami, Vincent Boyer, Didier El-Baz
CNRS; LAAS; 7 avenue du colonel Roche, F-31077 Toulouse, France
Université de Toulouse; UPS, INSA, INP, ISAE; LAAS; F-31077 Toulouse, France
Email: [email protected] [email protected] [email protected]

Abstract—The Simplex method is a well known method to solve linear programming (LP) problems. In this paper, we propose a parallel implementation of the Simplex method on a CPU-GPU system via CUDA. Double precision is used in order to improve the quality of solutions. Computational tests have been carried out on randomly generated instances of non-sparse LP problems. The tests show a maximum speedup of 12.5 on a GTX 260 board.

Keywords—hybrid computing; GPU computing; parallel computing; CUDA; Simplex method; linear programming.

I. INTRODUCTION

Initially developed for real time and high-definition 3D graphic applications, Graphics Processing Units (GPUs) have recently gained attention for High Performance Computing applications. Indeed, the peak computational capabilities of modern GPUs exceed those of top-of-the-line central processing units (CPUs). GPUs are highly parallel, multithreaded, manycore units.

In November 2006, NVIDIA introduced Compute Unified Device Architecture (CUDA), a technology that enables users to solve many complex problems on their GPU cards (see for example [1] - [4]).

Some related works have been presented on the parallel implementation of algorithms on GPU for linear programming (LP) problems. O'Leary and Jung have proposed in [5] a combined CPU-GPU implementation of the Interior Point Method for LP; computational results carried out on NETLIB LP problems [6] with at most 516 variables and 758 constraints show that some speedup can be obtained by using the GPU for sufficiently large dense problems. Spampinato and Elster have proposed in [7] a parallel implementation of the revised Simplex method for LP on GPU with the NVIDIA CUBLAS [8] and NVIDIA LAPACK [9] libraries. Tests were carried out on randomly generated LP problems with at most 2000 variables and 2000 constraints. The implementation showed a maximum speedup of 2.5 on a NVIDIA GTX 280 GPU as compared with a sequential implementation on a CPU with an Intel Core2 Quad 2.83 GHz. Bieling, Peschlow and Martini have proposed in [10] another implementation of the revised Simplex method on GPU. This implementation permits one to speed up the solution by a maximum factor of 18 in single precision on a NVIDIA GeForce 9600 GT GPU card as compared with the GLPK solver run on an Intel Core 2 Duo 3 GHz CPU. To the best of our knowledge, these are the available references on parallel implementations on GPUs of solution methods for LP.

The revised Simplex method is generally more efficient than the standard Simplex method for large linear programming problems (see [11] and [12]), but for dense LP problems, the two approaches are equivalent (see [13] and [14]). In this paper, we concentrate on the parallel implementation of the standard Simplex method on CPU-GPU systems for dense LP problems. Dense linear programming problems occur in many important domains. In particular, some decompositions like Benders or Dantzig-Wolfe give rise to full dense LP problems. Reference is made to [15] and [16] for applications leading to dense LP problems.

The standard Simplex method is an algorithm that manipulates independently at each iteration the elements of a fixed size matrix, the Simplex tableau. The main challenge was to implement this algorithm in double precision with the CUDA C environment, without using existing NVIDIA libraries like CUBLAS and LAPACK, in order to obtain the best possible speedup. By identifying the tasks that can be parallelized and by a good management of the GPU memories, one can obtain good speedups with regard to a sequential implementation.

We have been solving linear programming problems in the context of the solution of NP-complete combinatorial optimization problems (see [17]). For example, one has to solve frequently linear programming problems for bound computation purposes when one uses branch and bound algorithms, and it may happen that some instances give rise to dense LP problems. The present work is part of a study on the parallelization of optimization methods (see also [1]).

The paper is structured as follows. Section II deals with the Simplex method. The parallel implementation of the Simplex algorithm on CPU-GPU systems is presented in Section III. Section IV is devoted to the presentation and analysis of computational results for randomly generated instances. Finally, in Section V, we give some conclusions and perspectives.

II. MATHEMATICAL BACKGROUND ON THE SIMPLEX METHOD

Linear programming (LP) problems consist in maximizing (or minimizing) a linear objective function subject to a set of linear constraints. More formally, we consider the following problem:

\[
\begin{array}{ll}
\max & x_0 = c'x', \\
\text{s.t.} & A'x' \le b, \qquad (1)\\
& x' \ge 0,
\end{array}
\]
with
\[
c' = (c_1, c_2, \ldots, c_n) \in \mathbb{R}^n,
\qquad
A' = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \in \mathbb{R}^{m \times n},
\]
and
\[
x' = (x_1, x_2, \ldots, x_n)^T,
\]
where n and m are the number of variables and constraints, respectively.

Inequality constraints can be written as equality constraints by introducing m new variables x_{n+l}, named slack variables, so that:
\[
a_{l1}x_1 + a_{l2}x_2 + \ldots + a_{ln}x_n + x_{n+l} = b_l, \quad l \in \{1, 2, \ldots, m\},
\]
with x_{n+l} \ge 0 and c_{n+l} = 0. Then, the standard form of the linear programming problem can be written as follows:
\[
\begin{array}{ll}
\max & x_0 = cx, \\
\text{s.t.} & Ax = b, \qquad (2)\\
& x \ge 0,
\end{array}
\]
with
\[
c = (c', 0, \ldots, 0) \in \mathbb{R}^{n+m},
\qquad
A = (A', I_m) \in \mathbb{R}^{m \times (n+m)},
\]
where I_m is the m × m identity matrix and x = (x', x_{n+1}, x_{n+2}, \ldots, x_{n+m})^T.

In 1947, George Dantzig proposed the Simplex algorithm for solving linear programming problems (see [11]). The Simplex algorithm is a pivoting method that proceeds from a first feasible extreme point solution of a LP problem to another feasible solution, by using matrix manipulations, the so-called pivoting operations, in such a way as to continually increase the objective value. Different versions of this method have been proposed. In this paper, we consider the method proposed by Garfinkel and Nemhauser in [19], which improves the algorithm of Dantzig by reducing the number of operations and the memory occupancy.

We suppose that the columns of A are permuted so that A = (B, N), where B is an m × m nonsingular matrix. B is the so-called basic matrix of the LP problem. We denote by x_B the sub-vector of x of dimension m of basic variables associated to matrix B, and by x_N the sub-vector of x of dimension n of nonbasic variables associated to N. The problem can then be written as follows:
\[
\begin{pmatrix} x_0 \\ x_B \end{pmatrix}
=
\begin{pmatrix} c_B B^{-1} b \\ B^{-1} b \end{pmatrix}
-
\begin{pmatrix} c_B B^{-1} N - c_N \\ B^{-1} N \end{pmatrix} x_N. \qquad (3)
\]
Remark: By setting x_N = 0, x_B = B^{-1} b and x_0 = c_B B^{-1} b, a feasible basic solution is obtained if x_B \ge 0.

Simplex tableau

We introduce now the following notations:
\[
\begin{pmatrix} s_{0,0} \\ s_{1,0} \\ \vdots \\ s_{m,0} \end{pmatrix}
\equiv
\begin{pmatrix} c_B B^{-1} b \\ B^{-1} b \end{pmatrix},
\qquad
\begin{pmatrix} s_{0,1} & s_{0,2} & \cdots & s_{0,n} \\ s_{1,1} & s_{1,2} & \cdots & s_{1,n} \\ \vdots & \vdots & \ddots & \vdots \\ s_{m,1} & s_{m,2} & \cdots & s_{m,n} \end{pmatrix}
\equiv
\begin{pmatrix} c_B B^{-1} N - c_N \\ B^{-1} N \end{pmatrix}.
\]
Then (3) can be written as follows:
\[
\begin{pmatrix} x_0 \\ x_{B_1} \\ \vdots \\ x_{B_m} \end{pmatrix}
=
\begin{pmatrix} s_{0,0} \\ s_{1,0} \\ \vdots \\ s_{m,0} \end{pmatrix}
-
\begin{pmatrix} s_{0,1} & s_{0,2} & \cdots & s_{0,n} \\ s_{1,1} & s_{1,2} & \cdots & s_{1,n} \\ \vdots & \vdots & \ddots & \vdots \\ s_{m,1} & s_{m,2} & \cdots & s_{m,n} \end{pmatrix} x_N. \qquad (4)
\]
From (4), we construct the so-called Simplex tableau as shown in Table I.

Table I
SIMPLEX TABLEAU

x_0      | s_{0,0} | s_{0,1}  s_{0,2}  ...  s_{0,n}
x_{B_1}  | s_{1,0} | s_{1,1}  s_{1,2}  ...  s_{1,n}
 ...     |  ...    |  ...      ...     ...   ...
x_{B_m}  | s_{m,0} | s_{m,1}  s_{m,2}  ...  s_{m,n}

By adding the slack variables to the LP problem (see (2)) and setting N = A', B = I_m (hence B^{-1} = I_m), a first basic feasible solution can be written as follows: x_N = x' = (0, 0, ..., 0) ∈ R^n and x_B = B^{-1} b = b.

At each iteration of the Simplex algorithm, we try to replace a basic variable, the so-called leaving variable, by a nonbasic variable, the so-called entering variable, so that the objective function is increased. Then, a better feasible solution is yielded by updating the Simplex tableau. More formally, the Simplex algorithm implements iteratively the following steps:

• Step 1: Compute the index k of the smallest negative value of the first line of the Simplex tableau, i.e.
\[
k = \arg\min_{j=1,2,\ldots,n} \{ s_{0,j} \mid s_{0,j} < 0 \}.
\]

The variable x_k is the entering variable. If no such index is found, then the current solution is optimal, else we go to the next step.

• Step 2: Compute the ratios θ_{i,k} = s_{i,0} / s_{i,k}, i = 1, 2, ..., m, then compute the index r as:
\[
r = \arg\min_{i=1,2,\ldots,m} \{ \theta_{i,k} \mid s_{i,k} > 0 \}.
\]
The variable x_{B_r} is the leaving variable. If no such index is found, then the algorithm stops and the problem is unbounded, else the algorithm continues to the last step.

• Step 3: Yield a new feasible solution by updating the previous basis. The variable x_{B_r} will leave the basis and the variable x_k will enter into the basis. More formally, we start by saving the kth column, which becomes the so-called 'old' kth column; then the Simplex tableau is updated as follows:

1 - Divide the rth row by the pivot element s_{r,k}:
\[
s_{r,j} := \frac{s_{r,j}}{s_{r,k}}, \quad j = 0, 1, \ldots, n.
\]
2 - Multiply the new rth row by s_{i,k} and subtract it from the ith row, i = 0, 1, ..., m, i ≠ r:
\[
s_{i,j} := s_{i,j} - s_{r,j} s_{i,k}, \quad j = 0, 1, \ldots, n.
\]
3 - Replace in the Simplex tableau the old kth column by its negative divided by s_{r,k}, except for the pivot element s_{r,k}, which is replaced by 1/s_{r,k}:
\[
s_{i,k} := -\frac{s_{i,k}}{s_{r,k}}, \quad i = 0, 1, \ldots, m, \ i \ne r,
\]
and
\[
s_{r,k} := \frac{1}{s_{r,k}}.
\]
This step of the Simplex algorithm is the most costly in terms of processing time. Then return to step 1.

The Simplex algorithm finishes in 2 cases:
• when the optimal solution is reached (then the LP problem is solved);
• when the LP problem is unbounded (then no solution can be found).

The Simplex algorithm described by Garfinkel and Nemhauser is interesting in the case of dense LP problems since the size of the manipulated matrix (Simplex tableau) is (m + 1) × (n + 1). This decreases the memory occupancy and processing time, and permits one to test larger instances. In the sequel, we present the parallelization of this algorithm on GPU.
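Before turning to the GPU implementation, the following plain C sketch summarizes one iteration of Steps 1-3 on the dense (m+1) × (n+1) tableau. It is our illustration only: the function name, the row-major flat storage and the return codes are assumptions, not the authors' code.

    /* Illustrative sequential sketch (ours) of one iteration of the
       tableau-based Simplex method described above (Steps 1-3).
       s points to the (m+1) x (n+1) tableau stored row-major. */
    enum { CONTINUE = 0, OPTIMAL = 1, UNBOUNDED = 2 };

    static int simplex_iteration(int m, int n, double *s)
    {
        int lda = n + 1;

        /* Step 1: entering column k = most negative reduced cost s[0][j]. */
        int k = -1;
        double best = 0.0;
        for (int j = 1; j <= n; j++)
            if (s[0 * lda + j] < best) { best = s[0 * lda + j]; k = j; }
        if (k < 0) return OPTIMAL;

        /* Step 2: leaving row r = argmin of s[i][0]/s[i][k] over s[i][k] > 0. */
        int r = -1;
        double best_ratio = 0.0;
        for (int i = 1; i <= m; i++) {
            if (s[i * lda + k] > 0.0) {
                double ratio = s[i * lda + 0] / s[i * lda + k];
                if (r < 0 || ratio < best_ratio) { best_ratio = ratio; r = i; }
            }
        }
        if (r < 0) return UNBOUNDED;

        /* Step 3: pivoting. */
        double pivot = s[r * lda + k];
        for (int j = 0; j <= n; j++)            /* 1 - divide row r by the pivot */
            s[r * lda + j] /= pivot;
        for (int i = 0; i <= m; i++) {          /* 2 - update the other rows ... */
            if (i == r) continue;
            double w = s[i * lda + k];          /* old s[i][k], saved before use */
            for (int j = 0; j <= n; j++)
                s[i * lda + j] -= w * s[r * lda + j];
            s[i * lda + k] = -w / pivot;        /* 3 - ... and rewrite column k  */
        }
        s[r * lda + k] = 1.0 / pivot;           /* 3 - new pivot element         */
        return CONTINUE;
    }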

III. SIMPLEX ON CPU-GPU SYSTEM

This section deals with the CPU-GPU implementation of the Simplex algorithm via CUDA. For that, a brief description of the GPU architecture is given in the following paragraph.

A. NVIDIA GPU architecture

Figure 1. Thread and memory hierarchy in GPUs.

NVIDIA's GPUs are SIMT (single-instruction, multiple-threads) architectures, i.e. the same instruction is executed simultaneously on many data elements by the different threads. They are especially well-suited to address problems that can be expressed as data-parallel computations.

As shown in Figure 1, a grid represents a set of blocks where each block contains up to 512 threads. A grid is launched via a single CUDA program, the so-called kernel. The execution starts with a host (CPU) execution. When a kernel function is invoked, the execution is moved to a device (GPU). When all threads of a kernel complete their execution, the corresponding grid terminates and the execution continues on the host until another kernel is invoked. When a kernel is launched, each multiprocessor processes one block by executing threads in groups of 32 parallel threads named warps. Threads composing a warp start together at the same program address; they are nevertheless free to branch and execute independently. As thread blocks terminate, new blocks are launched on the idle multiprocessors. Threads of different blocks cannot communicate with each other explicitly but can share their results by means of a global memory.

Remark: If threads of a warp diverge when executing a data-dependent conditional branch, then the warp serially executes each branch path. This leads to poor efficiency.
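As a small illustration (ours, not taken from the paper), the kernel below contains the kind of data-dependent branch in question: within the warp that holds index k, both branch paths are executed serially.

    /* Illustrative only: threads of the warp containing index k take different
       paths, so that warp serializes the two branches (divergence). */
    __global__ void scale_row_except_k(double *row, double alpha, int k, int len)
    {
        int idx = blockDim.x * blockIdx.x + threadIdx.x;
        if (idx >= len) return;          /* bounds guard */
        if (idx == k) return;            /* divergent branch within one warp */
        row[idx] *= alpha;
    }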

Threads have access to data from multiple memory spaces (see Figure 1). We can distinguish two principal types of memory spaces: • Read-only memories: the constant memory for constant data used by the process and texture memory optimized for 2D spatial locality. These two memories are acces- sible by all threads. • Read and write memories: the global memory space accessible by all threads, the shared memory spaces accessible only by threads in the same blocks with a high bandwidth, and finally each thread accesses to his own registers and private local memory space. In order to have a maximum bandwidth for the global memory, memory accesses have to be coalesced. Indeed, the global memory access by all threads within a half-warp (a group of 16 threads) is done in one or two transactions if: Figure 2. Allocation on GPU of the Simplex Tableau. • the size of the words accessed by the threads is 4, 8, or 16 bytes, • all 16 words lie: Each block is relative to a sub-matrix with 16 lines and 32 – in the same 64-byte segment, for words of 4 bytes, columns; this corresponds to a block of 512 threads (the – in the same 128-byte segment, for words of 8 maximum number of threads per block). The grid of blocks bytes, covers all the Simplex tableau and each thread is associated – in the same 128-byte segment for the first 8 words to a given entry of the tableau (see Figure 2). and in the following 128-byte segment for the last The main steps of the Simplex algorithm are described in 8 words, for words of 16 bytes; Figure 3. • threads access the words in sequence (the kth thread B. Computing the entering and leaving variables: in the half-warp accesses the kth word). Otherwise, a separate memory transaction is issued for each Finding the entering or leaving variables results in finding thread, which degrades significantly the overall processing a minimum within a set of values. This can be done on time. For further details on the NVIDIA cards architecture GPU via reduction techniques. However, our experiments and how to optimize the code, reference is made to [18]. showed that we obtain better performance by doing this step sequentially on the CPU. Indeed, the size of the tested When implementing the Simplex method, most of the problems and the double precision operations lead to worst time is spent in pivoting operations. This step involves efficiency for the parallel approach. More explicitly, finding (m + 1) × (n + 1) double precision multiplications and a minimum in a row of 10000 values, which corresponds subtractions that can be parallelized on GPU. to the maximal row size treated in our experiments, takes The SimplexT ableau that is available first on the CPU an average time of 0.27ms on CPU and 3.17ms on GPU must be allocated to the Global Memory of the GPU. This at each step of the Simplex algorithm. Furthermore, the use requires communications between the CPU and the GPU. of atomic functions of CUDA is not possible in this case The pivoting operations will be carried out by the GPU. since they do not support double precision operations. A Simplex tableau of size (m+1)×(n+1) is decomposed Thus, this step is implemented in CPU and the minimum into h × w blocks with : index is thereafter communicated to the GPU (see Figure 3). h = d(m + 1 + 16)/16e In step 1 of the Simplex algorithm, the first line of the w = d(n + 1 + 32)/32e Simplex tableau is simply communicated to the CPU.
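As an illustration of the allocation and decomposition described above, a minimal host-side sketch could look as follows; it assumes the tableau is stored row-major in a flat array, and the function name and the use of dim3 output parameters are ours, not the paper's.

    /* Illustrative host-side set-up (ours): copy the tableau built on the CPU
       into GPU global memory and derive the grid of 16 x 32 thread blocks
       from the h and w formulas given in the text. */
    #include <cuda_runtime.h>

    double *upload_tableau(const double *h_tableau, int m, int n,
                           dim3 *grid, dim3 *threads)
    {
        size_t bytes = (size_t)(m + 1) * (n + 1) * sizeof(double);
        double *d_tableau = NULL;

        cudaMalloc((void **)&d_tableau, bytes);
        cudaMemcpy(d_tableau, h_tableau, bytes, cudaMemcpyHostToDevice);

        int h = (m + 1 + 16) / 16;   /* block rows, following the text's formula  */
        int w = (n + 1 + 32) / 32;   /* block columns                             */
        *threads = dim3(32, 16);     /* 512 threads: x over columns, y over rows  */
        *grid    = dim3(w, h);

        return d_tableau;
    }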

Figure 3. Simplex algorithm on a CPU-GPU system.

Figure 4. Matrix manipulation and memory management in kernel 3.

C. Updating the basis:

However, step 2 requires the processing of a column of ratios. This is done in parallel by Kernel 1 and the ratio column θ is communicated to the CPU. Since step 3 requires the column k, the column of the entering variable of the Simplex tableau, Kernel 1 is also used to get the "old" Columnk and to store it in the device memory in order to avoid the case of memory conflict.

Remark: Communications with the CPU use page-locked host memory in order to have a higher bandwidth between host memory and device memory.

Kernel 1: The GPU Kernel for processing the ratio column θ and getting the "old" column Columnk of entering index k.

    __global__ void Kernel1(double SimplexTableau[m+1][n+1],
                            double *theta,
                            double *Columnk, int k)
    {
        int idx = blockDim.x * blockIdx.x + threadIdx.x;
        double w = SimplexTableau[idx][k];
        /* Copy the weights of entering index k */
        Columnk[idx] = w;
        /* Ratio of the right-hand side column (column 0) to the entering column */
        theta[idx] = SimplexTableau[idx][0] / w;
    }

In the sequel, we use the standard CUDA notation whereby x and y denote the column and the row, respectively. Step 3 is entirely carried out on the GPU. The line of the Simplex tableau relative to the index r of the leaving variable is updated by Kernel 2 as follows: thread x of block X processes the element SimplexTableau[r][x + 32 × X] with x = 0, ..., 31 and X = 0, ..., w − 1. The pivot element SimplexTableau[r][k], obtained from the old Columnk[r], is shared between all threads. It is more beneficial to use the shared memory, which is expected to be much faster than the global memory (see [18]).

According to Figure 2, the remaining part of the Simplex tableau is updated by Kernel 3. Indeed, for each block of dimension 16 × 32, a column of 16 elements of the entering index k is loaded in shared memory. Then the blocks process the corresponding parts of the Simplex tableau independently (in parallel), such that thread (x, y) of block (X, Y) processes the element SimplexTableau[y + 16 × Y][x + 32 × X] with x = 0, ..., 31, y = 0, ..., 15 and X = 0, ..., w − 1, Y = 1, ..., h − 1 (see Figure 4).

Updating the column k of the Simplex tableau requires the old Columnk and, in order to avoid the addition of a branching condition like if(idx == k) return; in Kernel 3, which results in a divergent branch, we use Kernel 4 to update the column k separately. Thus, step 3 requires the three following kernels:

Kernel 2: The GPU Kernel for processing the line relative to the index r of the leaving variable.

    __global__ void Kernel2(double SimplexTableau[m+1][n+1],
                            double *Columnk, int k, int r)
    {
        int idx = blockDim.x * blockIdx.x + threadIdx.x;
        __shared__ double w;
        /* Get the pivot element SimplexTableau[r][k] in the shared memory */
        if (threadIdx.x == 0) w = Columnk[r];
        __syncthreads();
        /* Update the line of leaving index r */
        SimplexTableau[r][idx] = SimplexTableau[r][idx] / w;
    }

Kernel 3: The GPU Kernel for updating the basis.

    __global__ void Kernel3(double SimplexTableau[m+1][n+1],
                            double *Columnk, int k, int r)
    {
        int idx = blockDim.x * blockIdx.x + threadIdx.x;
        int jdx = blockIdx.y * blockDim.y + threadIdx.y;
        __shared__ double w[16];
        /* Get the column of entering index k in shared memory */
        if (threadIdx.y == 0 && threadIdx.x < 16) {
            w[threadIdx.x] = Columnk[blockIdx.y * blockDim.y + threadIdx.x];
        }
        __syncthreads();
        /* Update the basis except the line r */
        if (jdx == r) return;
        SimplexTableau[jdx][idx] = SimplexTableau[jdx][idx] -
                                   w[threadIdx.y] * SimplexTableau[r][idx];
    }

Kernel 4: The GPU Kernel for processing the column of entering index k.

    __global__ void Kernel4(double SimplexTableau[m+1][n+1],
                            double *Columnk, int k, int r)
    {
        int jdx = blockDim.x * blockIdx.x + threadIdx.x;
        __shared__ double w;
        /* Get the pivot element SimplexTableau[r][k] in the shared memory */
        if (threadIdx.x == 0) w = Columnk[r];
        __syncthreads();
        /* Update the column of the entering index k */
        SimplexTableau[jdx][k] = -Columnk[jdx] / w;
        /* Update the pivot element SimplexTableau[r][k] */
        if (jdx == r) SimplexTableau[jdx][k] = 1 / w;
    }
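The remark in Section III-C mentions page-locked host memory for the CPU-GPU exchanges. The following sketch, which is ours and not code from the paper, shows how the two per-iteration transfers (the first tableau row for Step 1 and the ratio column θ for Step 2) could be set up with pinned buffers; the buffer and function names are assumptions.

    /* Illustrative sketch (ours) of the per-iteration CPU-GPU traffic around
       the kernels above, using page-locked (pinned) host memory. Only the
       first tableau row and the ratio column travel between host and device
       at each iteration; a row-major tableau layout is assumed. */
    #include <cuda_runtime.h>

    void alloc_pinned_buffers(int m, int n, double **h_row0, double **h_theta)
    {
        cudaHostAlloc((void **)h_row0,  (n + 1) * sizeof(double), cudaHostAllocDefault);
        cudaHostAlloc((void **)h_theta, (m + 1) * sizeof(double), cudaHostAllocDefault);
    }

    /* Step 1: bring the first line of the tableau back to the CPU, where the
       entering index k is selected (Section III-B). */
    void fetch_first_row(const double *d_tableau, double *h_row0, int n)
    {
        cudaMemcpy(h_row0, d_tableau, (n + 1) * sizeof(double),
                   cudaMemcpyDeviceToHost);
    }

    /* After Kernel 1: bring the ratio column back to the CPU, where the
       leaving index r is selected. */
    void fetch_ratio_column(const double *d_theta, double *h_theta, int m)
    {
        cudaMemcpy(h_theta, d_theta, (m + 1) * sizeof(double),
                   cudaMemcpyDeviceToHost);
    }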

IV. COMPUTATIONAL EXPERIMENTS

We present now computational results for the CPU-GPU and CPU implementations of the Simplex algorithm. Experiments have been carried out on a CPU with an Intel Xeon 3.0 GHz and an NVIDIA GTX 260 GPU. The GTX 260 has 192 cores and a 1.4 GHz clock frequency.

We have considered randomly generated LP problems where the a_{ij}, b_i, c_j, i ∈ {1, ..., m} and j ∈ {1, ..., n}, are integer variables that are uniformly distributed over the integer range [1, 1000]. We can note that the generated matrix A is a non-sparse matrix. We have been using double precision in order to ensure a good precision of the solution. Processing times are given for 10 instances and the resulting speedups have been computed as follows:
\[
\text{speedup} = \frac{\text{processing time on CPU (s)}}{\text{processing time on CPU-GPU (s)}}.
\]
We note that the processing time on CPU corresponds to the time obtained with the sequential version of the same Simplex algorithm implemented on the CPU.
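For illustration, an instance generator matching this description could look as follows. It is a sketch under our assumptions (row-major (m+1) × (n+1) tableau, initial slack basis so that row 0 holds the negated costs and column 0 holds b, as in Table I), not the authors' code.

    /* Illustrative generator (ours) for the random dense instances described
       above: integer coefficients a_ij, b_i, c_j drawn uniformly from [1, 1000],
       written directly into the initial Simplex tableau. */
    #include <stdlib.h>

    void generate_instance(double *tableau, int m, int n, unsigned int seed)
    {
        int lda = n + 1;
        srand(seed);

        tableau[0] = 0.0;                                   /* s_{0,0} = x_0 = 0 */
        for (int j = 1; j <= n; j++)
            tableau[j] = -(double)(rand() % 1000 + 1);      /* s_{0,j} = -c_j    */

        for (int i = 1; i <= m; i++) {
            tableau[i * lda] = (double)(rand() % 1000 + 1); /* s_{i,0} = b_i     */
            for (int j = 1; j <= n; j++)
                tableau[i * lda + j] = (double)(rand() % 1000 + 1); /* a_ij      */
        }
    }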

Figure 5 displays processing times for the different sizes of LP problems and for both the sequential and parallel algorithms.

Figure 5. Elapsed time (simplex on CPU and CPU-GPU).

We can see that for each size of LP problem, the processing time dispersion is low. Although the processing time increases with the size of the problems (see Table II), the parallel algorithm is always faster than the sequential algorithm.

Table II
PROCESSING TIME OF THE SIMPLEX ALGORITHM (s)

m × n          CPU        CPU-GPU system
500 × 500      1.16       0.44
1000 × 1000    29.28      3.37
1500 × 1500    184.21     16.21
2000 × 2000    524.61     44.30
2500 × 2500    1250.51    106.27
3000 × 3000    2432.42    196.92
3500 × 3500    4301.11    350.00
4000 × 4000    7517.75    596.34

Figure 6 shows that the speedup increases with the size of the problems. An average speedup of 12.61 has been obtained for large instances (≥ 3000 × 3000). Large instances, e.g. 6000 × 6000, 7000 × 7000 and 8000 × 8000, leading to speedups around 12.5 and processing times of more than 11 hours, have been tested without exceeding the memory capacity of the GPU card.

Figure 6. Average speedup.

For small size problems, e.g. 500 × 500, the speedup is relatively small (average speedup of 2.66) since the real power of the GPU is only slightly exploited in this case. We note that the parallel implementation of the Simplex algorithm on GPU permits one to solve larger problems efficiently within small processing times. Finally, we note that our experimental results can hardly be compared with the ones in [10], since this reference deals with the solution of sparse LP problems via the revised Simplex method and computations are performed in single precision in [10].

V. CONCLUSION AND FUTURE WORK

In this article we have proposed a parallel implementation of the Simplex method for solving linear programming problems on a CPU-GPU system with CUDA. The parallel implementation has been performed by optimizing the different steps of the Simplex algorithm. Computational results show that our implementation in double precision on the CPU-GPU system is efficient, since for large non-sparse linear programming problems we have obtained stable speedups around 12.5. Our approach also permits one to solve problems of size 8000 × 8000 without exceeding the memory capacity of the GPU.

In future work, we plan to test larger LP problems on a multi GPU architecture. We plan also to implement the revised Simplex algorithm on GPU without using the CUBLAS or LAPACK libraries in order to go further in the optimization of the parallel implementation.

ACKNOWLEDGMENT

Dr. Didier El Baz thanks NVIDIA for support through the Academic Partnership.

REFERENCES

[1] V. Boyer, D. El Baz, M. Elkihel, "Dense dynamic programming on multi GPU," in Proc. of the 19th International Conference on Parallel, Distributed and Network-based Processing (PDP 2011), Ayia Napa, Cyprus, pp. 545-551, February 2011.

[2] E. B. Ford, "Parallel algorithm for solving Kepler's equation on graphics processing units: application to analysis of Doppler exoplanet searches," New Astronomy, 14:406-412, 2009.

[3] Y. Zhang, J. Cohen, J. D. Owens, "Fast tridiagonal solvers on the GPU," in Proc. of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010), Bangalore, India, pp. 127-136, January 2010.

[4] V. Vineet, P. J. Narayanan, "CUDA cuts: fast graph cuts on the GPU," in Workshop on Visual Computer Vision on GPU's, 2008.

[5] D. P. O'Leary, J. H. Jung, "Implementing an interior point method for linear programs on a CPU-GPU system," Electronic Transactions on Numerical Analysis, 28:879-899, May 2008.

[6] NETLIB, http://www.netlib.org/

[7] D. G. Spampinato, A. C. Elster, "Linear optimization on modern GPUs," in Proc. of the 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS 2009), Rome, Italy, May 2009.

[8] CUDA - CUBLAS Library 2.0, NVIDIA Corporation.

[9] LAPACK Library, http://www.culatools.com/

[10] J. Bieling, P. Peschlow, P. Martini, "An efficient GPU implementation of the revised Simplex method," in Proc. of the 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2010), Atlanta, USA, April 2010.

[11] G. B. Dantzig, Linear Programming and Extensions, Princeton University Press and the RAND Corporation, 1963.

[12] G. B. Dantzig, M. N. Thapa, Linear Programming 2: Theory and Extensions, Springer-Verlag, 2003.

[13] S. S. Morgan, A Comparison of Simplex Method Algorithms, Master's thesis, Univ. of Florida, Jan. 1997.

[14] G. Yarmish, "The simplex method applied to wavelet decomposition," in Proc. of the International Conference on Applied Mathematics, Dallas, USA, pp. 226-228, November 2006.

[15] J. Eckstein, I. Bodurglu, L. Polymenakos, and D. Goldfarb, "Data-Parallel Implementations of Dense Simplex Methods on the Connection Machine CM-2," ORSA Journal on Computing, vol. 7, 4:434-449, 2010.

[16] S. P. Bradley, U. M. Fayyad, and O. L. Mangasarian, "Mathematical Programming for Data Mining: Formulations and Challenges," INFORMS Journal on Computing, vol. 11, 3:217-238, 1999.

[17] V. Boyer, D. El Baz, M. Elkihel, "Solution of multidimensional knapsack problems via cooperation of dynamic programming and branch and bound," European J. Industrial Engineering, 4, 4:434-449, 2010.

[18] NVIDIA, CUDA 2.0 programming guide, http://developer.download.nvidia.com/compute/cuda/2_0/docs/NVIDIA_CUDA_Programming_Guide_2.0.pdf (2009).

[19] R. S. Garfinkel, G. L. Nemhauser, Integer Programming, Wiley-Interscience, 1972.
