Accelerating Revised Simplex Method Using GPU-Based Basis Update. IEEE Access, 8: 52121-52138.
Received February 3, 2020, accepted March 2, 2020, date of publication March 12, 2020, date of current version March 25, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.2980309
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

Accelerating Revised Simplex Method Using GPU-Based Basis Update

USMAN ALI SHAH 1, SUHAIL YOUSAF 1,2, IFTIKHAR AHMAD 1, SAFI UR REHMAN 3, AND MUHAMMAD OVAIS AHMAD 4
1 Department of Computer Science and Information Technology, University of Engineering and Technology at Peshawar, Peshawar 25120, Pakistan
2 Intelligent Systems Design Lab, National Center of Artificial Intelligence, University of Engineering and Technology at Peshawar, Peshawar 25120, Pakistan
3 Department of Mining Engineering, Karakoram International University, Gilgit 15100, Pakistan
4 Department of Mathematics and Computer Science, Karlstad University, 65188 Karlstad, Sweden
Corresponding author: Muhammad Ovais Ahmad ([email protected])

ABSTRACT Optimization problems lie at the core of scientific and engineering endeavors, and their solutions are often compute-intensive. To fulfill their compute-resource requirements, graphics processing unit (GPU) technology is considered a great opportunity. To this end, we focus on linear programming (LP) problem solving on GPUs using the revised simplex method (RSM).
This method has potentially GPU-friendly tasks when applied to large dense problems. Basis update (BU) is one such task, performed in every iteration to update a matrix called the basis-inverse matrix. The contribution of this paper is two-fold. First, we experimentally analyzed the performance of existing GPU-based BU techniques. We discovered that the performance of a relatively old technique, in which each GPU thread computed one element of the basis-inverse matrix, could be significantly improved by introducing a vector-copy operation to its implementation with a sophisticated programming framework. Second, we extended the adapted element-wise technique to develop a new BU technique by using three inexpensive vector operations. This allowed us to reduce the number of floating-point operations and the conditional processing performed by GPU threads. A comparison of BU techniques implemented in double precision showed that our proposed technique achieved 17.4% and 13.3% average speed-up over its closest competitor for randomly generated and well-known sets of problems, respectively. Furthermore, the new technique successfully updated the basis-inverse matrix in relatively large problems, which the competitor was unable to update. These results strongly indicate that our proposed BU technique is not only efficient for dense RSM implementations but is also scalable.

INDEX TERMS Dense matrices, GPU, GPGPU, linear programming, revised simplex method.

The associate editor coordinating the review of this manuscript and approving it for publication was Juan Touriño.

I. INTRODUCTION

Contemporary graphics processing units (GPUs) can easily perform thousands of giga floating-point operations per second (GFLOPS) for efficient real-time processing of high-definition (HD) graphics [1]. This capability of GPUs has encouraged the high-performance-computing community to explore general-purpose computing on GPUs (GPGPU) for solving an ever-growing set of compute-intensive problems [2]. NVIDIA has become a leading manufacturer of GPUs used for general-purpose computing [3]. It provides a convenient programming environment based on the C++ language, which is specifically tailored for its GPUs [4].

Linear programming (LP), a technique used to find the optimal value of a linear objective function subject to a set of linear constraints, is extensively used in a wide range of fields like scientific research, engineering, economics and industry [5]. LP problem solvers also serve as drivers for optimizing more complex nonlinear and mixed-integer programming (MIP) problems [6]. The two basic algorithms available for solving LP problems, the simplex method and the interior-point method [7], have at least some phases with the potential to benefit from the extensive computational capability of modern GPUs [8], [9]. An LP problem can be categorized as either dense or sparse, depending on the density of the matrix containing the coefficients of its constraints. The interior-point method is the preferred algorithm for solving sparse problems, which are more prevalent in real-world applications [10]. GPU implementations of the interior-point method for sparse problems have been proposed, which use the GPU either for processing sets of similar columns that can be treated as dense submatrices, called supernodes, present in an otherwise sparse matrix, or for multiplying sparse matrices with dense vectors [9]-[11]. However, the peculiar memory organization of GPUs makes them more efficient at processing dense data structures [8]. This encouraged us to consider in this work an adaptation of the simplex method that is suitable for solving dense LP problems. Dense LP problems are less widely used than sparse problems, yet they remain useful in fields like digital filter circuit design, data analysis and classification, and financial planning and portfolio management [12]. Dense problems also arise while solving LP problems having special structural properties using Benders or Dantzig-Wolfe decomposition [13].

GPU implementations of the standard simplex method have already been proposed [13], [14]. The revised simplex method (RSM) is an extension of the standard algorithm that is suitable for developing efficient computer implementations owing to its use of matrix-manipulation operations [15], [16]. Even though relatively advanced GPU-based solvers have been proposed, like multi-GPU and batch-processing implementations of the standard simplex method [14], [17], [18], we decided to focus on the traditional approach of solving a single large LP problem using a single GPU. We selected this approach because our analysis revealed that there still was potential for improving the performance of existing single-problem-single-GPU solvers. Furthermore, any resultant enhancement could be incorporated into future multi-GPU and batch-processing implementations.

RSM is an iterative algorithm. During each iteration, it requires the solution of systems of linear equations with different right-hand-side (RHS) vectors but the same coefficient matrix, called the basis matrix (B). Three main methods are available for solving these systems of linear equations. The first method, called Gaussian elimination, is relatively expensive in terms of its time complexity [19]. The sec-

During the literature review, we came across five single-problem-single-GPU implementations of RSM. Greeff [20] first proposed a GPU implementation of RSM as early as 2005. In 2009, Spampinato and Elster [21] extended Greeff's work by using NVIDIA's proprietary programming environment, known as compute-unified device architecture (CUDA) [4]. They implemented the classical PFI-based BU technique in single precision, using linear algebra functions from NVIDIA's highly optimized CUBLAS library [22]. In 2010, Bieling et al. [16] proposed another single-precision implementation of RSM using NVIDIA's Cg programming language, which has since become obsolete [23]. Instead of directly using either PFI or MPFI, they proposed a BU technique in which each GPU thread computed exactly one element of matrix B^-1. In 2013, Ploskas and Samaras [19] published the results of a comparative study of different BU techniques that could be used in GPU implementations of RSM. They concluded that MPFI was more efficient than both LU factorization and PFI. It is important to mention that they did not consider the element-wise BU technique proposed earlier by Bieling et al. Later, in 2015, Ploskas and Samaras proposed a GPU implementation of RSM, which used MPFI to perform BU [8]. They used MATLAB to develop their implementation in double precision. The latest GPU implementation of RSM was proposed by He et al. in 2018 [24]. They followed a column-wise approach to updating the matrix B^-1 by exploiting symmetry among its columns. They developed their implementation using CUDA in single precision. However, they did not provide a performance comparison of their technique against any of the previously proposed techniques.

We began by implementing all the BU techniques discussed in the preceding paragraph using CUDA. It turned out that the column-wise technique, proposed by He et al., provided the best performance. However, it had a drawback in the form of its inability to update matrix B^-1 in large problems. This encouraged us to explore possibilities for developing a BU technique that was efficient
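To make the element-wise BU techniques surveyed above concrete: the product-form update replaces one column of the basis with the entering column a_q, and the new basis inverse can be computed element by element from the pivot column alpha = B^-1 * a_q and pivot row r as B'^-1[r][j] = B^-1[r][j] / alpha[r] and B'^-1[i][j] = B^-1[i][j] - (alpha[i] / alpha[r]) * B^-1[r][j] for i != r. The following is a minimal sequential CPU sketch of this rule (the rule that a GPU implementation would assign one thread per output element); the function and variable names are our own illustration, not code from the paper.

```cpp
#include <vector>
#include <cstddef>

// Element-wise product-form basis update (sequential sketch).
// Binv is the current basis inverse, stored row-major in a flat vector,
// alpha = Binv * a_q is the pivot column, r is the pivot row index.
// Each loop iteration corresponds to the work of one GPU thread in the
// element-wise BU technique.
std::vector<double> pfi_update(const std::vector<double>& Binv,
                               const std::vector<double>& alpha,
                               std::size_t n, std::size_t r) {
    std::vector<double> out(n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (i == r)
                // Pivot row: scale by the reciprocal of the pivot element.
                out[i * n + j] = Binv[r * n + j] / alpha[r];
            else
                // Other rows: subtract a multiple of the old pivot row.
                out[i * n + j] = Binv[i * n + j]
                               - (alpha[i] / alpha[r]) * Binv[r * n + j];
        }
    }
    return out;
}
```

Note the per-element branch on the pivot row: removing such conditional processing from the GPU threads (together with reducing floating-point operations) is precisely the kind of refinement the paper's proposed vector-operation-based technique targets.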