Czech Technical University in Prague
Faculty of Electrical Engineering
Department of Cybernetics

Master’s Thesis The Linear Direct Sparse Solver on GPU for Bundle Adjustment Method

Bc. Ondrej Ivančík

Supervisor: Ing. Ivan Šimeček, Ph.D.

Study Programme: Open Informatics

Field of Study: Computer Vision and Image Processing

May 11, 2012

Acknowledgements

I would like to thank my supervisor Ivan Šimeček, who enabled me to deal with a very interesting topic, and prof. Olaf Hellwich and Cornelius Wefelscheid, who allowed me to work on my thesis within an individual project at TU Berlin.

Declaration

I hereby declare that I have completed this thesis independently and that I have listed all the literature and publications used. I have no objection to usage of this work in compliance with the act §60 Zákon č. 121/2000Sb. (copyright law), and with the rights connected with the copyright act including the changes in the act.

Prague, May 11, 2012

Abstract

The thesis deals with solving sparse, linear, positive definite systems. It implements the Cholesky decomposition on the CPU, utilizing the CRS format for sparse matrices, a fast AMD ordering, and a symbolic factorization. It analyses the possibilities of parallelizing the Cholesky decomposition for sparse diagonal-based linear systems and for the bundle adjustment problem, where sparse matrices of a specific structure arise. A Cholesky decomposition exploiting the Schur complement is implemented on both the CPU and the GPU side.

Abstrakt

The thesis deals with solving sparse, linear, positive definite systems. It implements the Cholesky decomposition on the CPU using the CRS sparse matrix format, a fast AMD permutation, and a symbolic factorization. It analyses the possibilities of parallelizing the Cholesky decomposition for sparse diagonal-based linear systems and for the bundle adjustment problem, where sparse matrices of a specific structure arise. It proposes and implements the computation of the Cholesky decomposition on the GPU and the CPU by means of the Schur complement.

Contents

1 Introduction
  1.1 Motivation

2 Solving Linear Systems
  2.1 System of Linear Equations
  2.2 Direct Methods for Solving Linear Systems
    2.2.1 Cramer’s Rule
    2.2.2 Forward and Backward Substitution
    2.2.3 Gaussian Elimination
    2.2.4 Gauss-Jordan Elimination
    2.2.5 LU Decomposition
    2.2.6 Cholesky Decomposition
  2.3 Iterative Methods for Solving Linear Systems

3 Sparse Matrices
  3.1 Ordering Methods
    3.1.1 Arrowhead Matrix Example
    3.1.2 Graph Representation
    3.1.3 Bottom-up Ordering Methods
    3.1.4 Top-down Ordering Methods
  3.2 Symbolical Factorization

4 Bundle Adjustment
  4.1 Unconstrained Optimization
    4.1.1 Search Methods
    4.1.2 Levenberg–Marquardt

5 Overview of NVIDIA CUDA
  5.1 The CUDA Execution Model
  5.2 GPU Memory

6 Analysis of the Problem
  6.1 Structure of Linear Systems in BA
  6.2 Block Cholesky Decomposition for BA

7 Implementation
  7.1 Used Framework
  7.2 Compressed Row Storage Format
  7.3 Cholesky decomposition on CPU
  7.4 Ordering for CPU solver
  7.5 Block Matrix Format for GPU
  7.6 Block Cholesky decomposition on GPU
  7.7 Ordering for GPU solver

8 Testing
  8.1 Octave solvers
  8.2 CPU solver
  8.3 GPU solver
  8.4 CUSP solvers

9 Conclusion

A List of Abbreviations

B User Manual
  B.1 Requirements
  B.2 Usage

C Contents of the Attached CD

List of Figures

3.1 The dependence of the reordering of a sparse matrix on the fill-in count
3.2 Ordering example

4.1 Reprojection error

5.1 Block diagram of a GF100 GPU
5.2 Streaming multiprocessor of a GF100 (Fermi) GPU
5.3 Bandwidth of various GPU memory

6.1 An example of a modestly sized Hessian in BA

7.1 Sample of a symmetric positive definite sparse matrix 6 × 6 with 22 nonzero elements
7.2 Performing k-way ordering on diagonal-based matrix ’Wathen 10 × 10’
7.3 Performing k-way ordering on diagonal-based matrix ’Poisson 30’

8.1 Test of Octave solvers
8.2 Test of iterative CUSP solvers. Max. error is the maximal difference with Octave’s reference solution

Chapter 1

Introduction

Finding the solution of a system of linear algebraic equations (2.1) is the most basic task in linear algebra and the heart of many engineering problems. It has been studied for many years, not only for its applications in many branches of scientific computing, but also for its high computational complexity and for the wide variety of methods and approaches that help to solve linear systems of different types faster and more accurately.

Finding a solution of a system of nonlinear algebraic equations can be achieved using iterative solvers whose keystone is solving a linear system in each iteration step in order to approach a sufficiently accurate solution. A linear solver therefore forms a crucial part of a nonlinear solver and, at the same time, its bottleneck.

A widely used optimization method in 3D reconstruction is bundle adjustment. As a nonlinear iterative optimization method, it needs to solve a sparse, often very large linear system of a specific structure many times. Studying a suitable linear solver for bundle adjustment is the main part of my thesis.

1.1 Motivation

One particular and promising approach to speeding up the process of solving systems of linear equations is parallel computation. In the case of dense direct solvers, the parallelization is more straightforward and gives better performance results than for sparse direct solvers. Iterative methods, mostly used for solving large sparse linear systems, are efficiently parallelizable because they use only sparse matrix-vector multiplications and vector additions.


In the last decade, there has been growing interest in general-purpose computation on graphics processing units (GPGPU). Several libraries have been developed which implement basic linear algebra subroutines or even linear solvers for dense matrices (NVIDIA cuBLAS, MAGMA, CULA Dense) and sparse matrices (NVIDIA cuSparse, NVIDIA CUSP, CULA Sparse). At the present time, no implementation of a linear direct solver for general sparse matrices on a GPU exists. The main cause is the problematic fine-grain parallelization and the thread divergence on a GPU.

Sparse matrices consisting of many small independent full blocks on the diagonal, with some dependent parts on the borders, are formed during the computation of bundle adjustment. It seems possible to eliminate these blocks effectively in a parallel manner even on a GPU. The question is which type of solver is more suitable: direct or iterative? My thesis aims to answer it.

Chapter 2

Solving Linear Systems

2.1 System of Linear Equations

Definition 1. A system of m linear equations in n unknowns consists of a set of algebraic relations of the form

$$\sum_{j=1}^{n} a_{ij}x_j = b_i, \qquad i = 1,\dots,m, \qquad (2.1)$$

where $x_j$ are the unknowns, $a_{ij}$ are the coefficients of the system and $b_i$ are the components of the right-hand side. System (2.1) can be more conveniently written in matrix form as

$$\mathbf{A}\mathbf{x} = \mathbf{b}, \qquad (2.2)$$

where $\mathbf{A} = (a_{ij}) \in \mathbb{C}^{m\times n}$ denotes the coefficient matrix, $\mathbf{b} = (b_i) \in \mathbb{C}^{m}$ the right-hand side vector and $\mathbf{x} = (x_i) \in \mathbb{C}^{n}$ the unknown vector, respectively. A solution of (2.2) is any n-tuple of values $x_i$ which satisfies (2.1).

Remark 1. The existence and uniqueness of the solution of (2.2) are ensured if one of the following (equivalent) hypotheses holds:
1. A is invertible,
2. rank(A) = n,
3. the homogeneous system Ax = 0 admits only the null solution.

In the next chapters I will deal with numerical methods for finding the solution of real-valued square systems of order n, that is, systems of the form (2.2) with A ∈ R^{n×n} and x, b ∈ R^n. Such linear systems arise frequently in any

1 This chapter is based on [20] and [21].

branch of science, including bundle adjustment. These numerical methods can generally be divided into two classes. In the absence of roundoff errors, direct methods yield the exact solution in a finite number of steps. Iterative methods require (theoretically) an infinite number of steps to find the exact solution.

2.2 Direct Methods for Solving Systems of Linear Equations

2.2.1 Cramer’s Rule

The solution of system (2.2) is formally provided by Cramer’s rule

$$x_j = \frac{\det(\mathbf{A}_j)}{\det(\mathbf{A})}, \qquad j = 1,\dots,n, \qquad (2.3)$$

where A_j is the matrix obtained by substituting the j-th column of A with the right-hand side b. If the determinants are evaluated by the recursive Laplace rule, the method based on Cramer’s rule turns out to be unacceptable even for small dimensions of A because of its computational cost of (n + 1)! flops. However, Habgood and Arel [11] have recently shown that Cramer’s rule can be implemented in O(n^3) time, which is comparable to more common methods of solving systems of linear equations.

2.2.2 Forward and Backward Substitution

Definition 2. A square matrix with zero entries above the main diagonal (a_{ij} = 0 for i < j) is called lower triangular; a square matrix with zero entries below the main diagonal (a_{ij} = 0 for i > j) is called upper triangular. A lower (upper) triangular matrix is strictly lower (upper) triangular when its entries on the main diagonal are zeros, too.

Example 1. Lower (upper) triangular systems can be easily solved using forward (backward) substitution. For example, the nonsingular 3 × 3 upper triangular system

$$\begin{bmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$$

can be solved in sequence as follows:

$$x_3 = b_3/u_{33},$$
$$x_2 = (b_2 - u_{23}x_3)/u_{22},$$
$$x_1 = (b_1 - u_{12}x_2 - u_{13}x_3)/u_{11}.$$

For a nonsingular upper triangular system of order n (n ≥ 2), the solution can be expressed generally in the form

$$x_n = \frac{b_n}{u_{nn}}, \qquad x_i = \frac{1}{u_{ii}}\Big(b_i - \sum_{j=i+1}^{n} u_{ij}x_j\Big), \quad i = n-1,\dots,1. \qquad (2.4)$$

Analogously, the solution of a nonsingular lower triangular system of order n (n ≥ 2) has the form

$$x_1 = \frac{b_1}{l_{11}}, \qquad x_i = \frac{1}{l_{ii}}\Big(b_i - \sum_{j=1}^{i-1} l_{ij}x_j\Big), \quad i = 2,\dots,n. \qquad (2.5)$$

The number of multiplications and divisions for forward/backward substitution is equal to n(n + 1)/2, while the number of additions and subtractions is n(n − 1)/2. The total operation count for (2.4) and (2.5) is thus n^2.
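The two formulas translate directly into short routines. The following C sketch is illustrative only (dense row-major storage is assumed, with nonzero diagonal entries); it is not code from the thesis implementation.

    /* Forward substitution: solve L x = b for a dense lower triangular L (n x n, row-major).
       Backward substitution: solve U x = b for a dense upper triangular U. */
    void forward_subst(int n, const double *L, const double *b, double *x)
    {
        for (int i = 0; i < n; i++) {
            double s = b[i];
            for (int j = 0; j < i; j++)
                s -= L[i*n + j] * x[j];      /* subtract contributions of known unknowns */
            x[i] = s / L[i*n + i];
        }
    }

    void backward_subst(int n, const double *U, const double *b, double *x)
    {
        for (int i = n - 1; i >= 0; i--) {
            double s = b[i];
            for (int j = i + 1; j < n; j++)
                s -= U[i*n + j] * x[j];
            x[i] = s / U[i*n + i];
        }
    }

The two nested loops visit roughly n^2/2 entries each, matching the operation count above.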

2.2.3 Gaussian Elimination

Let A be a square nonsingular matrix. A linear system Ax = b can be transformed into an equivalent (lower or upper) triangular system Tx = b̂ that has the same solution, using three elementary row operations. The solution of the system is invariant to
1. the multiplication of a row by a nonzero scalar,
2. the addition of one row to another,
3. the swapping of two rows.

The basic idea is to multiply the i-th equation by a nonzero constant and subtract the first equation from it, so that the first unknown in the i-th equation is zeroed out. This is done for all equations from 2 to n. Then the second equation is taken as the reference and all the corresponding unknowns in equations from 3 to n are zeroed. The procedure ends when the system has the form Tx = b̂, with the right-hand side b̂ transformed accordingly. Finally, the solution is obtained by forward substitution (if T is a lower triangular matrix) or backward substitution (if T is an upper triangular matrix).

To complete Gaussian elimination, 2(n − 1)n(n + 1)/3 + n(n − 1) flops are required. To solve the linear system, about 2n^3/3 + 2n^2 flops are needed (with n^2 flops to backsolve the triangular system). Neglecting the lower order terms, the Gaussian elimination process has a cost of 2n^3/3 flops.

2.2.4 Gauss-Jordan Elimination

Gauss-Jordan elimination is slightly different from Gaussian elimination. The transformation of the system using the three elementary row operations is repeated until each equation contains only one of the unknowns, thus giving an immediate solution. The principal deficiencies of this method are that 1. it requires all the right-hand sides to be stored and manipulated at the same time, and 2. it is three times slower than the alternative solvers when the inverse of A is not desired.

2.2.5 LU Decomposition

Suppose that it is possible to write the matrix A as a product of two matrices, A = LU, where L is lower triangular and U is upper triangular. This decomposition can be used to solve the linear system

Ax =(LU)x = L(Ux)= b (2.6) by first solving (by forward substitution) for the vector y such that

Ly = b (2.7) and then solving (by backward substitution) for the vector x such that

Ux = y. (2.8)

Theorem 1. Let A ∈ R^{n×n}. The LU decomposition of A with l_{ii} = 1 for i = 1,...,n exists and is unique iff the principal submatrices A_i of A of order i = 1,...,n − 1 are nonsingular.

The LU decomposition is usually performed in place, to avoid copying and wasting memory when storing the triangular matrices L and U separately, as shown in Algorithm 1. At the end (here only for presentational purposes), the result is stored in the L and U matrices.

2.2.6 Cholesky Decomposition

Theorem 2. Let A ∈ R^{n×n} be a symmetric and positive definite matrix. Then there exists a unique lower triangular matrix L with positive diagonal entries such that

$$\mathbf{A} = \mathbf{L}\mathbf{L}^{\top}. \qquad (2.9)$$

Algorithm 1 LU Decomposition
Require: A square matrix A.
Ensure: A lower triangular matrix L with ones on the main diagonal and an upper triangular matrix U such that LU = A.

    function [L, U] = lu2(A)
      [n,n] = size(A);
      for k = 1:n
        A(k+1:n,k) = A(k+1:n,k) / A(k,k);
        A(k+1:n,k+1:n) = A(k+1:n,k+1:n) - A(k+1:n,k) * A(k,k+1:n);
      end
      L = tril(A,-1) + eye(n);   % ones on the diagonal
      U = triu(A);
    end

The computational cost of the Cholesky decomposition halves, with respect to the LU decomposition, to about n^3/3 flops, because the input matrix A is symmetric. An implementation example of the Cholesky decomposition is coded in Algorithm 2.

Algorithm 2 Cholesky Decomposition
Require: A square positive definite matrix A.
Ensure: A lower triangular matrix L such that LL⊤ = A.

    function [L] = chol2(A)
      [n,n] = size(A);
      for k = 1:n
        A(k,k) = sqrt(A(k,k));
        A(k,k+1:n) = A(k,k+1:n) / A(k,k);
        for i = k+1:n
          A(i,i:n) = A(i,i:n) - A(k,i:n) * A(k,i);
        end
      end
      L = triu(A)';
    end
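The same in-place computation can be sketched in C (the language used later for the implementation). This is an illustrative dense routine, not the thesis code; it reports failure when a non-positive pivot indicates that A is not positive definite.

    /* In-place dense Cholesky: on success the lower triangle of A holds L with A = L*L^T.
       A is n x n, row-major. Returns 0 on success, -1 if A is not positive definite. */
    #include <math.h>

    int chol_dense(int n, double *A)
    {
        for (int k = 0; k < n; k++) {
            double d = A[k*n + k];
            for (int j = 0; j < k; j++)
                d -= A[k*n + j] * A[k*n + j];
            if (d <= 0.0)
                return -1;                       /* not positive definite */
            A[k*n + k] = sqrt(d);
            for (int i = k + 1; i < n; i++) {
                double s = A[i*n + k];
                for (int j = 0; j < k; j++)
                    s -= A[i*n + j] * A[k*n + j];
                A[i*n + k] = s / A[k*n + k];
            }
        }
        return 0;
    }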

2.3 Iterative Methods for Solving Systems of Linear Equations

Iterative methods formally yield the solution x of a linear system after an infinite number of steps. At each step they require the computation of the residual of the system. For full matrices, their computational cost is of the order of n^2 operations per iteration, to be compared with an overall cost of the order of 2n^3/3 operations needed by direct methods. Iterative methods can therefore become competitive with direct methods, provided that the required number of iterations to converge is either independent of n or scales sublinearly with respect to n. The basic idea of iterative methods is to construct a sequence of vectors x^{(k)} that enjoy the property of convergence

$$\mathbf{x} = \lim_{k\to\infty} \mathbf{x}^{(k)},$$

where x is the solution to (2.2). In practice, the iterative process is stopped at the minimum value of k such that ||x^{(k)} − x|| < ε.
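A minimal illustration of this idea is the Jacobi iteration, sketched below in C for a dense matrix. It is only meant to show the structure of an iterative solver and its residual-based stopping criterion; it is not one of the solvers used later in the thesis, and convergence requires a suitable A (e.g. strictly diagonally dominant).

    /* Jacobi iteration for Ax = b (dense, row-major). Stops when the 2-norm of the
       residual r = b - Ax drops below tol or after maxit sweeps. xnew is caller-provided
       workspace of length n. Illustrative sketch only. */
    #include <math.h>

    void jacobi(int n, const double *A, const double *b, double *x,
                double *xnew, int maxit, double tol)
    {
        for (int it = 0; it < maxit; it++) {
            for (int i = 0; i < n; i++) {
                double s = b[i];
                for (int j = 0; j < n; j++)
                    if (j != i)
                        s -= A[i*n + j] * x[j];
                xnew[i] = s / A[i*n + i];
            }
            double rnorm = 0.0;
            for (int i = 0; i < n; i++) {        /* residual of the new iterate */
                double r = b[i];
                for (int j = 0; j < n; j++)
                    r -= A[i*n + j] * xnew[j];
                rnorm += r * r;
                x[i] = xnew[i];
            }
            if (sqrt(rnorm) < tol)
                break;
        }
    }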

Chapter 3

Sparse Matrices

Many engineering problems have to deal with large and sparse matrices. A sparse matrix is a matrix that allows special techniques to take advantage of its large number of zero elements. This definition helps to decide 'how many' zeros a matrix needs in order to be 'sparse'. The answer is that it depends on the structure of the matrix and on what it is being used for. For example, a randomly generated sparse n × n matrix with cn entries scattered randomly throughout the matrix is not sparse in the sense of Wilkinson (for direct methods), since it takes O(n^3) time to factorize (with high probability and for large enough c [9]) [3].

Example 2. Using one of the sparse formats to store 'real' sparse matrices can result in significant computational and storage savings. Consider, for instance, a tridiagonal square matrix with 1,000,000 rows. Storing its 3 million nonzero elements in double precision, together with the row and column index data, consumes approximately 40 MB. Storing the same matrix as a full matrix would consume more than 7 TB. Similarly large differences can be expected in execution times.
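The numbers can be checked directly; assuming a CRS-like layout (see Section 7.2) with 8-byte values and 4-byte indices, which is an assumption made here for the arithmetic:

$$\underbrace{3\cdot 10^{6}\cdot 8\,\mathrm{B}}_{\text{values}} + \underbrace{3\cdot 10^{6}\cdot 4\,\mathrm{B}}_{\text{column indices}} + \underbrace{10^{6}\cdot 4\,\mathrm{B}}_{\text{row pointers}} \approx 40\,\mathrm{MB}, \qquad
10^{6}\cdot 10^{6}\cdot 8\,\mathrm{B} = 8\cdot 10^{12}\,\mathrm{B} \approx 7.3\,\mathrm{TB}.$$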

3.1 Ordering Methods

An unfavourable property of sparse matrices lies in the process of elimination. Some zero values of the input matrix become nonzero during the elimination (fill-ins), and their positions must be precomputed in advance. Reordering techniques try to minimize the amount of fill-in by finding a suitable permutation of the rows and columns of the input matrix. Finding an optimal permutation is, however, an NP-complete problem [26] and could be more time consuming than solving the original linear system; therefore, heuristic approaches that often give near-optimal results are applied.


3.1.1 Arrowhead Matrix Example

Example 3. The operation counts required for the solution of two linear systems Ax = b will be examined. The input matrices are shown in figure 3.1. Even though both matrices have the same number of nonzero elements, a significant reduction in computation is obtained simply by permuting rows and columns.

[Figure 3.1 shows the sparsity patterns of four matrices:]

(a) Left-up arrowhead matrix  (b) Left-up arrowhead matrix after LU  (c) Right-down arrowhead matrix  (d) Right-down arrowhead matrix after LU

Figure 3.1: The dependence of the reordering of a sparse matrix on the fill-in count. • represents nonzero elements of the input matrix, ⋆ fill-ins and empty space zero elements

For the left-up arrowhead matrix 3.1a, the number of multiplications and divisions required by the forward elimination is α = 40 and by the back substitution β = 25. The total number of operations is α + β = 65, and the input sparse matrix becomes full. For the right-down matrix 3.1c, the number of multiplications and divisions required by the forward elimination is α = 8 and by the back substitution β = 13. The total number of operations is α + β = 21, and the input sparse matrix remains sparse.

There are many recent works about ordering schemes. This is because specific problems construct specific types of sparse matrices (band-diagonal, block triangular, block tridiagonal, ...) [20, p. 77]. Below, the most used methods are described. They can be divided into two categories, according to how the elimination tree is built. Most state-of-the-art ordering schemes for sparse matrices are a hybrid of a bottom-up method such as minimum degree and a top-down scheme such as George’s nested dissection.

3.1.2 Graph Representation of Sparse Matrices

To explain the ordering methods, it is convenient to introduce a graph representation of sparse matrices. They are then represented as undirected graphs (the sparse matrix has the structure of an adjacency matrix of this graph). All schemes are described for the undirected graph G = (V, E), E ⊂ V × V, associated with the symmetric matrix S. Let v be a vertex of G. The set of vertices that are adjacent to v is denoted by adj_G(v).

3.1.3 Bottom-up Ordering Methods

Bottom-up methods build the elimination tree from the leaves up to the root. In each iteration k a greedy heuristic is applied to G_{k−1} to select a vertex for elimination. This section briefly describes two of the most popular bottom-up algorithms, the minimum degree and the minimum deficiency ordering heuristics.

Minimum Degree Ordering. As mentioned above, at each iteration k the minimum degree algorithm eliminates a vertex v that minimizes the number of adjacent vertices deg_{G_{k−1}}(v) = |adj_{G_{k−1}}(v)|. The algorithm is a symmetric variant of the Markowitz scheme [15] and was first applied to sparse symmetric factorization by Tinney and Walker [22]. Over the years many enhancements have been proposed to the basic algorithm that have greatly improved its efficiency.

Minimum Deficiency Fill. A less popular bottom-up scheme is the minimum deficiency or minimum local fill heuristic. The exact amount of fill is used to select a vertex for elimination. The minimum deficiency algorithm has received much less attention because of its prohibitive runtime.

3.1.4 Top-down Ordering Methods

The most popular top-down scheme is George’s nested dissection algorithm [7, 8]. The basic idea of this approach is to find a subset of vertices S in G whose removal partitions G into two subgraphs G(B) and G(W) with V = S ∪ B ∪ W and |B|, |W| ≤ α|V| for some 0 < α < 1. Such a partition of G is denoted by (S, B, W). The set S is called a vertex separator of G. If we order the vertices in S after the (black) vertices in B and the (white) vertices in W, no fill-edge can occur between B and W. Typically, the columns corresponding to S constitute a full off-diagonal block in the Cholesky factor. Therefore, S is supposed to be small. Once S has been found, the algorithm is recursively applied to each connected component of G(B) and G(W) until a component consists of a single vertex or a clique. In this way the elimination tree is built from the root down to the leaves.

Graph partitioning heuristics are usually divided into construction and improvement heuristics. A construction heuristic takes the graph as input and computes an initial separator from scratch. An improvement heuristic tries to minimize the size of a separator through a sequence of elementary steps.

Some ordering methods are implemented in MATLAB as standard functions (colperm, symrcm, colamd, symamd, amd, dmperm), and I have tested some of them (see figure 3.2).

3.2 Symbolical Factorization

Symbolical factorization is a step executed before the numerical factorization. It precomputes the positions of the fill-ins (see also 3.1) that appear during the factorization process when one row is added to another. It can be seen on the Cholesky or LU factors that they are often much denser than the original matrices (see figure 3.2). The CRS format stores only nonzero elements, and therefore the space needed for fill-ins must be allocated before the numerical factorization. The naïve solution is to run a slightly changed numerical factorization and store the new nonzero entries. Since the symbolical factorization works only with indices to determine the structure of the Cholesky or LU factors, it can be computed much faster than the full numerical factorization. When implementing my symbolical factorization I have used a great information source [13].
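The naïve variant mentioned above can be pictured with the following toy C sketch, which simulates the elimination on a dense boolean nonzero pattern and records every position that becomes nonzero. It is an illustration only; the thesis routine works on the CRS index arrays instead of a dense pattern.

    /* Naive symbolic factorization on a dense boolean pattern (1 = structurally nonzero).
       Eliminating row/column k makes position (i,j) nonzero whenever (i,k) and (k,j)
       are nonzero (no pivoting assumed). Returns the number of new nonzero positions. */
    int symbolic_fill(int n, unsigned char *P)   /* P is n x n, row-major, modified in place */
    {
        int fill = 0;
        for (int k = 0; k < n; k++)
            for (int i = k + 1; i < n; i++)
                if (P[i*n + k])
                    for (int j = k + 1; j < n; j++)
                        if (P[k*n + j] && !P[i*n + j]) {
                            P[i*n + j] = 1;      /* a fill-in appears here */
                            fill++;
                        }
        return fill;
    }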

[Figure 3.2 consists of six 300 × 300 sparsity plots of the LU factors obtained with different orderings: no ordering: fill-ins = 13309; colperm: fill-ins = 30627; symrcm: fill-ins = 13040; colamd: fill-ins = 9569; symamd: fill-ins = 6681; amd: fill-ins = 6583.]

Figure 3.2: Applying different ordering methods and displaying the LU factors. Nonzeros are in black, fill-ins in gray color

Chapter 4

Bundle Adjustment

Three-dimensional (3D) reconstruction is a problem that appears often in many computer vision tasks. 3D reconstruction can be defined as the problem of using 2D measurements arising from a set of images depicting the same scene from different viewpoints, aiming to derive information related to the scene geometry as well as the relative motion and the optical characteristics of the camera(s) employed to acquire these images. Bundle adjustment (BA) is almost invariably used as the last step of every feature-based 3D reconstruction algorithm [14, p. 1–2].

Bundle adjustment is the problem of refining a visual reconstruction to produce jointly optimal 3D structure and viewing parameter (camera pose and/or calibration) estimates. Optimal means that the parameter estimates are found by minimizing some cost function that quantifies the model fitting error, and jointly that the solution is simultaneously optimal with respect to both structure and camera variations. The name refers to the ‘bundles’ of light rays leaving each 3D feature and converging on each camera centre, which are ‘adjusted’ optimally with respect to both feature and camera positions. Equivalently, unlike independent model methods, which merge partial reconstructions without updating their internal structure, all of the structure and camera parameters are adjusted together ‘in one bundle’ [23].

BA boils down to minimizing the reprojection error (4.1) between the observed and predicted image points, which is expressed as the sum of squares of a large number of nonlinear, real-valued functions. Thus, the minimization is achieved using nonlinear least-squares algorithms [4], of which Levenberg–Marquardt has proven to be the most successful due to its ease of implementation and its use of an effective damping strategy that lends it the ability to converge quickly from a wide range of initial guesses [12].


Figure 4.1: Reprojection error [17]

4.1 Unconstrained Optimization

The aim of unconstrained optimization is to find x* such that

$$\mathbf{x}^{*} = \arg\min_{\mathbf{x}\in\mathbb{R}^n} f(\mathbf{x}). \qquad (4.1)$$

The point x* is called a global minimizer of f if f(x*) ≤ f(x) for all x ∈ R^n, while x* is called a local minimizer of f if a neighborhood N of x* exists such that f(x*) ≤ f(x) for all x ∈ N. The vector of first partial derivatives of the function f (which must be continuously differentiable) with respect to the vector x is denoted by

$$\nabla f(\mathbf{x}) = \Big(\frac{\partial f}{\partial x_1}(\mathbf{x}),\dots,\frac{\partial f}{\partial x_n}(\mathbf{x})\Big)^{\!\top}$$

and called the gradient of f at a point x. If d is a non-null vector in R^n, then the directional derivative of f with respect to d is

$$\frac{\partial f}{\partial \mathbf{d}}(\mathbf{x}) = \lim_{\alpha\to 0}\frac{f(\mathbf{x}+\alpha\mathbf{d}) - f(\mathbf{x})}{\alpha}$$

and satisfies ∂f(x)/∂d = [∇f(x)]⊤d. Moreover, denoting by (x, x + αd) the segment in R^n joining the points x and x + αd, with α ∈ R, Taylor’s expansion ensures that there exists ξ ∈ (x, x + αd) such that

$$f(\mathbf{x}+\alpha\mathbf{d}) - f(\mathbf{x}) = \alpha\nabla f(\xi)^{\top}\mathbf{d}.$$

1 This chapter is based on [21].

If f is twice continuously differentiable, we denote by H(x) (or ∇²f(x)) the Hessian matrix of f evaluated at a point x, whose entries are

$$h_{ij}(\mathbf{x}) = \frac{\partial^2 f(\mathbf{x})}{\partial x_i \partial x_j}, \qquad i,j = 1,\dots,n.$$

In such a case it can be shown that, if d ≠ 0, the second-order directional derivative exists and

$$\frac{\partial^2 f}{\partial \mathbf{d}^2}(\mathbf{x}) = \mathbf{d}^{\top}\mathbf{H}(\mathbf{x})\mathbf{d}.$$

For a suitable ξ ∈ (x, x + d) we also have

$$f(\mathbf{x}+\mathbf{d}) - f(\mathbf{x}) = \nabla f(\mathbf{x})^{\top}\mathbf{d} + \frac{1}{2}\mathbf{d}^{\top}\mathbf{H}(\xi)\mathbf{d}.$$

Existence and uniqueness of a solution of (4.1) is not guaranteed in R^n. Nevertheless, it can be proved that the gradient of f at a local minimizer x* equals the null vector. This condition is necessary for optimality to hold. It also becomes sufficient if f is a convex function on R^n, i.e., such that for all x, y ∈ R^n and for any α ∈ [0, 1]

$$f[\alpha\mathbf{x} + (1-\alpha)\mathbf{y}] \le \alpha f(\mathbf{x}) + (1-\alpha)f(\mathbf{y}).$$

4.1.1 Search Methods

Analytical methods can be used only for simple problems (the brachistochrone problem, univariate minimization). Numerical methods must be used for most engineering optimization problems, which are too large and complex to solve analytically. Numerical methods can be divided into two classes:

Gradient-based methods are efficient for many variables and for a smooth objective function. Their drawback is only local convergence.

Derivative-free methods are suitable for problems where gradients are not available, the objective function is not differentiable, or a global minimizer is sought.

Gradient-based descent methods compute a direction d^{(k)} and a positive parameter (step length) α^{(k)} at each iteration k with the help of the gradient and the Hessian. Algorithm 3 shows the skeleton of this method. The way the direction d^{(k)} and the step length α^{(k)} are computed defines a specific descent method.

Algorithm 3 Descent method

Require: ∇f(x), H(x) and a starting point x0. Ensure: A local minimizer x∗.

1: k ← 0
2: while (not converged) do
3:   compute direction d^{(k)} and step length α^{(k)}
4:   x^{(k+1)} ← x^{(k)} + α^{(k)} d^{(k)}
5:   k ← k + 1
6: end while
7: return x^{(k)}

Newton’s method computes

d(k) = −H−1(x(k))∇f(x(k)),

where H is positive definite within a sufficiently large neighborhood of point x∗; inexact Newton’s method

d(k) = −B−1(x(k))∇f(x(k)),

where B(x(k)) is a suitable approximation of H(x(k)); gradient (steepest descent) method

d(k) = −∇f(x(k)); conjugate gradient method

d(k) = −∇f(x(k))+ β(k)d(k−1),

where β(k) is a scalar to be suitably selected in such a way that the directions d(k) turn out to be mutually orthogonal with respect to a suitable scalar product.

4.1.2 Levenberg–Marquardt Algorithm

The Levenberg–Marquardt (LM) algorithm, also known as the damped least-squares method, provides a numerical solution to the problem of minimizing a function, generally nonlinear, over a space of parameters of the function. It can be thought of as a combination of Gauss–Newton and the steepest descent method. When the current solution is far from a local minimum, the algorithm behaves like a steepest descent method: slow, but guaranteed to converge. When the current solution is close to a local minimum, it becomes a Gauss–Newton method and exhibits fast convergence. For these reasons, the LM algorithm is mostly used in bundle adjustment.

Let f be an assumed functional relation which maps a parameter vector p ∈ R^m to an estimated measurement vector x̂ = f(p), x̂ ∈ R^n. An initial parameter estimate p_0 and a measured vector x are provided, and it is desired to find the vector p* that best satisfies the functional relation f locally, that is, minimizes the squared distance ε⊤ε with ε = x − x̂ for all p within a sphere having a certain, small radius. The basis of the LM algorithm is an affine approximation to f in the neighborhood of p. For a small ||δ_p||, f is approximated by (see [5, p. 75])

$$f(\mathbf{p} + \delta_{\mathbf{p}}) \approx f(\mathbf{p}) + \mathbf{J}\delta_{\mathbf{p}},$$

where J is the Jacobian of f. At each iteration, it is required to find the step δ_p that minimizes the quantity ||x − f(p + δ_p)|| ≈ ||x − f(p) − Jδ_p|| = ||ε − Jδ_p||. The minimum is attained when Jδ_p − ε is orthogonal to the column space of J. This leads to J⊤(Jδ_p − ε) = 0, which yields δ_p as the solution of the so-called normal equations [10]:

$$\mathbf{J}^{\top}\mathbf{J}\,\delta_{\mathbf{p}} = \mathbf{J}^{\top}\boldsymbol{\epsilon}. \qquad (4.2)$$

The matrix J⊤J in the above equation is the first order approximation to the Hessian of ½ε⊤ε [16], and δ_p is the Gauss–Newton step. J⊤ε corresponds to the steepest descent direction, since the gradient of ½ε⊤ε is −J⊤ε. The LM algorithm actually solves a slight variation of Equation (4.2), known as the augmented normal equations

$$\mathbf{N}\,\delta_{\mathbf{p}} = \mathbf{J}^{\top}\boldsymbol{\epsilon}, \qquad \text{with } \mathbf{N} \equiv \mathbf{J}^{\top}\mathbf{J} + \mu\mathbf{I},\ \mu > 0. \qquad (4.3)$$

The strategy of altering the diagonal elements of J⊤J is called damping, and µ is referred to as the damping term. It is decreased when the updated parameter vector p + δ_p, with δ_p computed from Equation (4.3), leads to a reduction in the error ε⊤ε; otherwise it is increased, the augmented normal equations are solved again, and this process iterates until a value of δ_p that decreases the error is found.
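The damping strategy can be summarized by the following C sketch. All helper functions and the constant NP are hypothetical placeholders standing in for problem-specific code; they are not part of the thesis implementation, and the factors 0.1 and 10 for adapting µ are only a common choice.

    /* Sketch of the Levenberg-Marquardt damping loop (illustration only). */
    #include <string.h>

    #define NP 64   /* number of parameters; illustrative constant */

    extern double error_at(const double *p);                  /* ||x - f(p)||^2 (hypothetical)      */
    extern void   build_normal_eqs(const double *p,
                                   double *JtJ, double *Jte); /* forms J^T J and J^T eps at p       */
    extern void   solve_augmented(const double *JtJ, const double *Jte,
                                  double mu, double *dp);     /* solves (J^T J + mu I) dp = J^T eps */

    void lm(double p[NP], int maxit, double mu)
    {
        static double JtJ[NP * NP];
        double Jte[NP], dp[NP], trial[NP];
        double err = error_at(p);

        for (int it = 0; it < maxit; it++) {
            build_normal_eqs(p, JtJ, Jte);
            for (;;) {
                solve_augmented(JtJ, Jte, mu, dp);    /* augmented normal equations (4.3) */
                for (int i = 0; i < NP; i++)
                    trial[i] = p[i] + dp[i];
                double trial_err = error_at(trial);
                if (trial_err < err) {                /* error reduced: accept, relax damping */
                    memcpy(p, trial, sizeof trial);
                    err = trial_err;
                    mu *= 0.1;
                    break;
                }
                mu *= 10.0;                           /* error grew: increase damping, retry */
            }
        }
    }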

Chapter 5

Overview of NVIDIA CUDA

By introducing CUDA (Compute Unified Device Architecture), NVIDIA has given programmers the opportunity to capitalize on inexpensive, generally available, massively parallel computing hardware. Teraflop computing is now within the economic reach of most people around the world. The impact of GPGPU (General-Purpose Graphics Processing Unit) technology spans all aspects of computation, from cell phones to the largest supercomputers. Programmable GPUs are deployed in areas of scientific computing, cloud computing, computer visualization, simulations, games, and more.

Programming for GPGPU requires a basic knowledge of the GPU architecture, because even small changes in data structures or in the program can make significant differences in performance. Modern GPUs belong in principle to the SIMD class of Flynn’s taxonomy. That means that GPUs are capable of doing the same operation on multiple data simultaneously. The restriction to one operation at a time reduces the set of problems worth parallelizing on a GPU. On the other hand, well-vectorized problems are able to achieve an acceleration of two or more orders of magnitude over multi-core processors2.

To ensure the best performance of a GPGPU, the following three rules should be met.

1. Get the data on the GPGPU and keep it there. GPGPUs are separate devices plugged into the PCI Express bus of the host computer, which is very slow compared to the GPGPU memory system (20 to 28 times slower).

2. Give the GPGPU enough work to do. CUDA-enabled GPUs deliver teraflop performance and they are fast enough to complete small problems faster than the host processor can start kernels. Each thread should perform as many instructions as possible to hide this latency.

1 This chapter is based on [6].
2 Top 100 NVIDIA CUDA application showcase speedups (min 100, max 2600, median 1350), published May 9, 2011.


3. Focus on data reuse within the GPGPU to avoid memory bandwidth limitations. All high-performance CUDA applications exploit internal resources on the GPU (registers, shared memory) to bypass global memory bottlenecks.

5.1 The CUDA Execution Model

The heart of CUDA performance lies in the execution model and the simple partitioning of a computation into fixed-sized blocks of threads in the execution configuration. CUDA naturally maps the parallelism within an application to the massive parallelism of the GPGPU hardware. The result is compatibility across older and future generations of GPUs.

GPU hardware parallelism is achieved through replication of a common architectural building block called a streaming multiprocessor (SM). Figure 5.1 illustrates the 16 SMs on a GF100 (Fermi) series GPGPU. The software abstraction of a thread block translates into a natural mapping of the kernel onto an arbitrary number of SMs on a GPGPU. Each SM can be scheduled (by the GigaThread global scheduler) to run one or more thread blocks. Thread blocks are therefore independent and cannot be synchronized during kernel execution3. A thread block also acts as a container for thread cooperation, as only threads in the same thread block can share data. Threads in a thread block can utilize the high-speed memory inside the SM, called shared memory, for data sharing.

Figure 5.2b depicts the composition of one of the 16 streaming multiprocessors in a GF100 GPU. SIMD cores require less power and space than non-SIMD cores. As a result, GPGPUs have a high flop-per-watt ratio compared to conventional CPUs [25]. The threads running on a multiprocessor are partitioned into groups in which all threads execute the same instruction simultaneously. On the CUDA architecture, these groups are called warps, each warp has 32 threads, and this execution model is referred to as SIMT (Single Instruction Multiple Threads) [18].

GPGPUs are not true SIMD machines (but SIMT), since only the individual streaming multiprocessors are SIMD and different multiprocessors may be running different instructions. Conditionals (if statements) can decrease performance inside an SM because each branch of each conditional must be evaluated. This can cause a slowdown of 2^n for n nested conditionals.

3 Atomic operations are an exception; they allow threads of different blocks to communicate. This approach should be used only in justified situations, as using atomic operations may introduce scalability and performance issues.

Figure 5.1: Block diagram of a GF100 (Fermi) GPU [2]

5.2 GPU Memory

For the highest performance of applications developed for a GPU, data inside the SM must be reused. The reason is that the on-board global memory (DRAM in figure 5.2a) is not fast enough when all SMs want to perform read/write operations. CUDA provides configurable caches for each SM to give the opportunity for data reuse. Awareness of the difference between on-board (GPU) and on-chip (SM) memory is the key to achieving the highest performance that a GPGPU can provide.

The fastest and most scalable memory is the on-chip SM memory. However, it is limited to a few KB. The on-board global memory is accessible by all the SMs across the GPU and is measured in GB. The significant bandwidth gaps between on-board and on-chip memories can be seen in figure 5.3. Although the bandwidth of shared memory can greatly accelerate applications, global memory is too slow to achieve peak performance [24].

Example 4. Computing a simple element-wise vector product

    for( i = 0; i < N; i++ )
        c[i] = a[i] * b[i];

(a) Memory hierarchy [1] (b) Block diagram [1]

Figure 5.2: Streaming multiprocessor of a GF100 (Fermi) GPU

Register memory   ≈ 8000 GB/s
Shared memory     ≈ 1600 GB/s
Global memory       177 GB/s
Mapped memory       ≈ 8 GB/s

Figure 5.3: Bandwidth of various GPU memory [6, p. 111]

Running this loop on a GPU utilizing only global memory gives limited performance. When 4-byte floating-point values are used, a 1 Tflop GPU would require 12 TB/s of memory bandwidth. A GPU with 177 GB/s of memory bandwidth can only deliver about 14 Gflop (1.4% of the potential 1 Tflop performance).
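The arithmetic behind these figures (each loop iteration performs one flop and touches three 4-byte values):

$$3 \cdot 4\,\mathrm{B} = 12\,\mathrm{B/flop}, \qquad 10^{12}\,\mathrm{flop/s} \cdot 12\,\mathrm{B/flop} = 12\,\mathrm{TB/s}, \qquad \frac{177\,\mathrm{GB/s}}{12\,\mathrm{B/flop}} \approx 14.75\,\mathrm{Gflop/s}.$$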

When programming for a GPU, it is necessary to reuse data within the SM (to exploit data locality). GPGPUs support two types of data locality: temporal locality means that recently accessed data is likely to be used again in the near future (the idea behind LRU, Least Recently Used, caching), and spatial locality means that neighbouring data is cached because it is likely to be used in the future.

For compute capability 2.0 or higher, constant or texture memory, traditionally used for efficient broadcasting of data to all threads, is outperformed by the global memory. This is because compute capability 2.0 devices contain SMs with an L1 cache and a unified L2 cache that speed up access to the global memory.

Chapter 6

Analysis of the Problem

As I have mentioned in the Introduction, finding the solution of a linear system is the most compute-demanding part of the problem of solving a nonlinear system. At each iteration, a linear system Ax = b must be solved. Bundle adjustment (BA), as a least squares problem, works with sparse linear systems of a special structure (doubly bordered block diagonal). A similar structure can be obtained by applying nested dissection ordering to a diagonal-based matrix A (band-diagonal, block tridiagonal, ...). The implemented GPU solver can be used for BA, when the information about the structure of the matrix A is provided by the BA configuration, or for a diagonal-based matrix, when the information about the structure is provided by the ordering function.

6.1 Structure of Linear Systems in BA

The system of augmented normal equations (4.3) arises in BA and is solved at each iteration of the Levenberg–Marquardt algorithm. The matrix J is the Jacobian and N is the first order approximation of the Hessian. The structure of N can be exactly determined from the input parameters of the BA problem.

Example 5. [14, p. 9] Consider that we want to optimize the parameters of 3 cameras and 4 3D points visible in all cameras. The measurement vector X = (x11⊤, x12⊤, x13⊤, x21⊤, x22⊤, x23⊤, x31⊤, x32⊤, x33⊤, x41⊤, x42⊤, x43⊤)⊤ is made up of the measured image point coordinates across all cameras (x_ij being the projection of the i-th point in the j-th camera). The parameter vector P = (a1⊤, a2⊤, a3⊤, b1⊤, b2⊤, b3⊤, b4⊤)⊤ is defined by all the parameters describing the 3 projection matrices and the 4 3D points. Let A_ij and B_ij denote ∂x̂_ij/∂a_j and ∂x̂_ij/∂b_i, respectively; note that ∂x̂_ij/∂a_k = 0 for all j ≠ k and ∂x̂_ij/∂b_k = 0 for all i ≠ k. Employing

this notation, the Jacobian can be written as

$$\mathbf{J} = \frac{\partial\mathbf{X}}{\partial\mathbf{P}} = \begin{bmatrix}
\mathbf{A}_{11} & \mathbf{0} & \mathbf{0} & \mathbf{B}_{11} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{A}_{12} & \mathbf{0} & \mathbf{B}_{12} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \mathbf{A}_{13} & \mathbf{B}_{13} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\
\mathbf{A}_{21} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{B}_{21} & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{A}_{22} & \mathbf{0} & \mathbf{0} & \mathbf{B}_{22} & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \mathbf{A}_{23} & \mathbf{0} & \mathbf{B}_{23} & \mathbf{0} & \mathbf{0} \\
\mathbf{A}_{31} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{B}_{31} & \mathbf{0} \\
\mathbf{0} & \mathbf{A}_{32} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{B}_{32} & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \mathbf{A}_{33} & \mathbf{0} & \mathbf{0} & \mathbf{B}_{33} & \mathbf{0} \\
\mathbf{A}_{41} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{B}_{41} \\
\mathbf{0} & \mathbf{A}_{42} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{B}_{42} \\
\mathbf{0} & \mathbf{0} & \mathbf{A}_{43} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{B}_{43}
\end{bmatrix}. \qquad (6.1)$$

Then the approximation of the Hessian (the matrix N from Equation (4.3)) has the form

$$\begin{bmatrix}
\mathbf{U}_1 & \mathbf{0} & \mathbf{0} & \mathbf{W}_{11} & \mathbf{W}_{21} & \mathbf{W}_{31} & \mathbf{W}_{41} \\
\mathbf{0} & \mathbf{U}_2 & \mathbf{0} & \mathbf{W}_{12} & \mathbf{W}_{22} & \mathbf{W}_{32} & \mathbf{W}_{42} \\
\mathbf{0} & \mathbf{0} & \mathbf{U}_3 & \mathbf{W}_{13} & \mathbf{W}_{23} & \mathbf{W}_{33} & \mathbf{W}_{43} \\
\mathbf{W}_{11}^{\top} & \mathbf{W}_{12}^{\top} & \mathbf{W}_{13}^{\top} & \mathbf{V}_1 & \mathbf{0} & \mathbf{0} & \mathbf{0} \\
\mathbf{W}_{21}^{\top} & \mathbf{W}_{22}^{\top} & \mathbf{W}_{23}^{\top} & \mathbf{0} & \mathbf{V}_2 & \mathbf{0} & \mathbf{0} \\
\mathbf{W}_{31}^{\top} & \mathbf{W}_{32}^{\top} & \mathbf{W}_{33}^{\top} & \mathbf{0} & \mathbf{0} & \mathbf{V}_3 & \mathbf{0} \\
\mathbf{W}_{41}^{\top} & \mathbf{W}_{42}^{\top} & \mathbf{W}_{43}^{\top} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{V}_4
\end{bmatrix}
\begin{bmatrix} \delta_{\mathbf{a}_1} \\ \delta_{\mathbf{a}_2} \\ \delta_{\mathbf{a}_3} \\ \delta_{\mathbf{b}_1} \\ \delta_{\mathbf{b}_2} \\ \delta_{\mathbf{b}_3} \\ \delta_{\mathbf{b}_4} \end{bmatrix}
=
\begin{bmatrix} \boldsymbol{\epsilon}_{\mathbf{a}_1} \\ \boldsymbol{\epsilon}_{\mathbf{a}_2} \\ \boldsymbol{\epsilon}_{\mathbf{a}_3} \\ \boldsymbol{\epsilon}_{\mathbf{b}_1} \\ \boldsymbol{\epsilon}_{\mathbf{b}_2} \\ \boldsymbol{\epsilon}_{\mathbf{b}_3} \\ \boldsymbol{\epsilon}_{\mathbf{b}_4} \end{bmatrix}. \qquad (6.2)$$

Denoting the upper left, lower right, and upper right parts of the matrix in Equation (6.2) by U, V and W, respectively, allows us to rewrite the augmented normal equations (4.3) compactly as

$$\begin{bmatrix} \mathbf{U}^{*} & \mathbf{W} \\ \mathbf{W}^{\top} & \mathbf{V}^{*} \end{bmatrix} \begin{bmatrix} \delta_{\mathbf{a}} \\ \delta_{\mathbf{b}} \end{bmatrix} = \begin{bmatrix} \boldsymbol{\epsilon}_{\mathbf{a}} \\ \boldsymbol{\epsilon}_{\mathbf{b}} \end{bmatrix}, \qquad (6.3)$$

where * designates the augmentation of the diagonal elements of U and V. Now, let us compare the structure of the Hessian in Equation (6.2) with the Hessian of a bigger BA problem (figure 6.1). The upper left part (U) corresponds to the approximation of the second derivatives with respect to the camera parameters, the lower right part (V) to the approximation of the second derivatives with respect to the 3D points, and the upper right part (W) to the mixed derivatives involving both camera and point parameters.

6.2 Block Cholesky Decomposition for BA

Lourakis and Argyros [14] suggest solving the augmented normal equations (6.3) arising in BA in two steps (first for δ_a and then for δ_b) as follows.

(a) Original input matrix  (b) Rotated by 180 degrees, with marked parts (see also figure 7.1 for comparison)

Figure 6.1: An example of a modestly sized Hessian in BA. This is the sparsity pattern of a 992 × 992 set of normal equations (i.e. the approximate Hessian). Black regions correspond to nonzero elements [14, p. 27]

Left multiplication of Equation (6.3) by the block matrix

$$\begin{bmatrix} \mathbf{I} & -\mathbf{W}\mathbf{V}^{*-1} \\ \mathbf{0} & \mathbf{I} \end{bmatrix} \qquad (6.4)$$

results in

$$\begin{bmatrix} \mathbf{U}^{*} - \mathbf{W}\mathbf{V}^{*-1}\mathbf{W}^{\top} & \mathbf{0} \\ \mathbf{W}^{\top} & \mathbf{V}^{*} \end{bmatrix} \begin{bmatrix} \delta_{\mathbf{a}} \\ \delta_{\mathbf{b}} \end{bmatrix} = \begin{bmatrix} \boldsymbol{\epsilon}_{\mathbf{a}} - \mathbf{W}\mathbf{V}^{*-1}\boldsymbol{\epsilon}_{\mathbf{b}} \\ \boldsymbol{\epsilon}_{\mathbf{b}} \end{bmatrix}.$$

Since the top right block of the above left-hand matrix is zero, δ_a can be determined from its top half, which is

$$(\mathbf{U}^{*} - \mathbf{W}\mathbf{V}^{*-1}\mathbf{W}^{\top})\,\delta_{\mathbf{a}} = \boldsymbol{\epsilon}_{\mathbf{a}} - \mathbf{W}\mathbf{V}^{*-1}\boldsymbol{\epsilon}_{\mathbf{b}}. \qquad (6.5)$$

The matrix S ≡ U* − WV*^{-1}W⊤ is the Schur complement of V* in the left-hand side matrix of (6.3) and is also positive definite [19]. Linear system (6.5) is solved for δ_a using the Cholesky decomposition of S. Then δ_b is computed by solving

$$\mathbf{V}^{*}\delta_{\mathbf{b}} = \boldsymbol{\epsilon}_{\mathbf{b}} - \mathbf{W}^{\top}\delta_{\mathbf{a}}.$$

This approach has a big advantage: an absence of fill-ins during the computation. The approach explained in the next example is slightly different [21, p. 102].

Example 6. Let A ∈ R^{n×n} be a symmetric positive definite matrix that can be divided into 4 submatrices A11, A12, A21 and A22. Then, according to Theorem 2, the Cholesky decomposition A = LL⊤ exists, where L is a lower triangular matrix with strictly positive diagonal entries. If the matrix A consists of 4 submatrices, the equation A = LL⊤ can be rewritten as

$$\mathbf{A} = \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{21}^{\top} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{bmatrix} = \begin{bmatrix} \mathbf{L}_{11} & \mathbf{0} \\ \mathbf{L}_{21} & \mathbf{L}_{22} \end{bmatrix} \begin{bmatrix} \mathbf{L}_{11}^{\top} & \mathbf{L}_{21}^{\top} \\ \mathbf{0} & \mathbf{L}_{22}^{\top} \end{bmatrix}.$$

The aim of the block Cholesky decomposition is to compute the values in the submatrices L11, L21, L22 (or L11⊤, L21⊤, L22⊤, respectively). The whole process can be divided into the following steps:

1. A11 = L11 L11⊤ (Cholesky decomposition),
2. L21⊤ = L11^{-1} A21⊤ from A21⊤ = L11 L21⊤, or equivalently L21 = A21 L11^{-⊤} from A21 = L21 L11⊤,
3. A22 − L21 L21⊤ = L22 L22⊤ (Cholesky decomposition).

During the decomposition process, the first two steps can be done simultaneously. The last step updates the submatrix A22 with the matrix A22^S, called the Schur complement of A11 in the matrix A, which can be expressed as

$$\mathbf{A}_{22}^{S} = \mathbf{A}_{22} - \mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{A}_{21}^{\top}
= \mathbf{A}_{22} - \mathbf{L}_{21}\mathbf{L}_{11}^{\top}(\mathbf{L}_{11}\mathbf{L}_{11}^{\top})^{-1}\mathbf{L}_{11}\mathbf{L}_{21}^{\top}
= \mathbf{A}_{22} - \mathbf{L}_{21}(\mathbf{L}_{11}^{\top}\mathbf{L}_{11}^{-\top})(\mathbf{L}_{11}^{-1}\mathbf{L}_{11})\mathbf{L}_{21}^{\top}
= \mathbf{A}_{22} - \mathbf{L}_{21}\mathbf{L}_{21}^{\top}. \qquad (6.6)$$
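These steps can be sketched compactly in C for dense blocks. The sketch below is illustrative only: it reuses the dense chol_dense and forward_subst routines sketched in Chapter 2, stores every block as a separate dense row-major array, and is not the block format or code used later in the thesis.

    /* Block Cholesky of A = [A11 A21^T; A21 A22] via the Schur complement.
       A11 is n1 x n1, A21 is n2 x n1, A22 is n2 x n2 (all dense, row-major).
       On return A11 holds L11 (lower triangle), A21 holds L21, A22 holds L22. */
    int  chol_dense(int n, double *A);                                   /* from Chapter 2 */
    void forward_subst(int n, const double *L, const double *b, double *x);

    int block_chol(int n1, int n2, double *A11, double *A21, double *A22)
    {
        /* step 1: A11 = L11 L11^T */
        if (chol_dense(n1, A11) != 0)
            return -1;

        /* step 2: L21 = A21 L11^{-T}, i.e. solve L11 * (row of L21)^T = (row of A21)^T */
        for (int i = 0; i < n2; i++) {
            double tmp[64];                           /* assumes n1 <= 64 in this sketch */
            forward_subst(n1, A11, &A21[i*n1], tmp);  /* uses only the lower triangle of A11 */
            for (int j = 0; j < n1; j++)
                A21[i*n1 + j] = tmp[j];
        }

        /* step 3: Schur complement A22 <- A22 - L21 L21^T, then A22 = L22 L22^T */
        for (int i = 0; i < n2; i++)
            for (int j = 0; j <= i; j++) {
                double s = 0.0;
                for (int k = 0; k < n1; k++)
                    s += A21[i*n1 + k] * A21[j*n1 + k];
                A22[i*n2 + j] -= s;
                if (i != j)
                    A22[j*n2 + i] -= s;               /* keep the stored block symmetric */
            }
        return chol_dense(n2, A22);
    }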

Example 7. This method allows parallel computation when the diagonal blocks are independent, as for example in the linear system (6.7). Blocks A11 and A22 have no mutually dependent elements (A12 and A21 are zero matrices).

$$\begin{bmatrix} \mathbf{A}_{11} & \mathbf{0} & \mathbf{A}_{13} \\ \mathbf{0} & \mathbf{A}_{22} & \mathbf{A}_{23} \\ \mathbf{A}_{13}^{\top} & \mathbf{A}_{23}^{\top} & \mathbf{A}_{33} \end{bmatrix} \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \mathbf{x}_3 \end{bmatrix} = \begin{bmatrix} \mathbf{b}_1 \\ \mathbf{b}_2 \\ \mathbf{b}_3 \end{bmatrix} \qquad (6.7)$$

After the first step, the blocks A11, A13, A13⊤, A22, A23, A23⊤ and the parts b1 and b2 of the right-hand side are updated in parallel, and the system has the following form:

$$\begin{bmatrix} \mathbf{L}_{11}^{\top} & \mathbf{0} & \mathbf{L}_{11}^{-1}\mathbf{A}_{13} \\ \mathbf{0} & \mathbf{L}_{22}^{\top} & \mathbf{L}_{22}^{-1}\mathbf{A}_{23} \\ \mathbf{0} & \mathbf{0} & \mathbf{A}_{33} \end{bmatrix} \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \mathbf{x}_3 \end{bmatrix}
= \begin{bmatrix} \mathbf{L}_{11}^{\top} & \mathbf{0} & \mathbf{L}_{13} \\ \mathbf{0} & \mathbf{L}_{22}^{\top} & \mathbf{L}_{23} \\ \mathbf{0} & \mathbf{0} & \mathbf{A}_{33} \end{bmatrix} \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \mathbf{x}_3 \end{bmatrix}
= \begin{bmatrix} \mathbf{L}_{11}^{-1}\mathbf{b}_1 \\ \mathbf{L}_{22}^{-1}\mathbf{b}_2 \\ \mathbf{b}_3 \end{bmatrix}.$$

The next step is to update the block A33 with the Schur complement A33^S of the matrix $\begin{bmatrix} \mathbf{A}_{11} & \mathbf{0} \\ \mathbf{0} & \mathbf{A}_{22} \end{bmatrix}$ in the matrix A, which according to (6.6) is

$$\mathbf{A}_{33} - \begin{bmatrix} \mathbf{L}_{13} \\ \mathbf{L}_{23} \end{bmatrix}^{\top} \begin{bmatrix} \mathbf{L}_{13} \\ \mathbf{L}_{23} \end{bmatrix},$$

and to update the vector b3 to b3^S, which equals

$$\mathbf{b}_3 - \begin{bmatrix} \mathbf{L}_{13} \\ \mathbf{L}_{23} \end{bmatrix}^{\top} \begin{bmatrix} \mathbf{L}_{11}^{-1}\mathbf{b}_1 \\ \mathbf{L}_{22}^{-1}\mathbf{b}_2 \end{bmatrix}.$$

Next, the linear system

$$\mathbf{A}_{33}^{S}\mathbf{x}_3 = \mathbf{b}_3^{S}$$

is transformed using Gaussian elimination to

$$\mathbf{L}_{33}^{S\top}\mathbf{x}_3 = (\mathbf{L}_{33}^{S})^{-1}\mathbf{b}_3^{S}$$

and solved for x3 using back substitution. Finally, the remaining parts of the vector x (x1 and x2) in the transformed system

$$\begin{bmatrix} \mathbf{L}_{11}^{\top} & \mathbf{0} & \mathbf{L}_{13} \\ \mathbf{0} & \mathbf{L}_{22}^{\top} & \mathbf{L}_{23} \\ \mathbf{0} & \mathbf{0} & \mathbf{L}_{33}^{S\top} \end{bmatrix} \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \mathbf{x}_3 \end{bmatrix} = \begin{bmatrix} \mathbf{L}_{11}^{-1}\mathbf{b}_1 \\ \mathbf{L}_{22}^{-1}\mathbf{b}_2 \\ (\mathbf{L}_{33}^{S})^{-1}\mathbf{b}_3^{S} \end{bmatrix}$$

are computed using only back substitution.

Chapter 7

Implementation

This chapter describes the chosen framework and implementation details, such as the data structures, functions and data types used in the practical output of the thesis: the linear direct solver (LDS).

7.1 Used Framework

The whole application was developed in a Linux environment (Xubuntu 12.04 for 64-bit PC and Debian 6.0 for 32-bit PC). The host code (for the CPU side) was written in ANSI C, the device code (for the GPU side) in CUDA (CUDA Driver 4.0). All object files were linked together into an executable file (ldsexam) using the NVCC compiler; no static or dynamic libraries were created (see my makefile).

7.2 Compressed Row Storage Format

Many formats for sparse matrices exist. One of the most general is the compressed row storage (CRS) format. It makes no assumptions about the sparsity pattern and stores only indices and nonzero elements. On the other hand, it is not very efficient, because it needs an indirect addressing step for every scalar operation in a matrix-vector product. I have decided on this format for my CPU-side solver because it can be utilized effectively in the Cholesky decomposition.

The CRS format needs three vectors: nozval of floating-point numbers, and rowptr and colind of integers. The nozval vector stores the values of the nonzero elements of the matrix, as they are traversed in a row-wise fashion. The colind vector stores the column indices of the elements in the nozval vector.


That is, if nozval(k) = a_ij then colind(k) = j. The rowptr vector stores the locations in the nozval vector that start a row, that is, if nozval(k) = a_ij then rowptr(i) ≤ k < rowptr(i+1). By convention, rowptr(n) = nnz (0-based indexing is used), where nnz is the number of all nonzeros.

Example 8. Consider the sparse symmetric matrix in figure 7.1.

        0  1  2  3  4  5
    0 [ 7  .  .  .  .  1 ]
    1 [ .  8  1  .  2  . ]
    2 [ .  1  8  .  3  2 ]
    3 [ .  .  .  9  3  2 ]
    4 [ .  2  3  3  9  3 ]
    5 [ 1  .  2  2  3  9 ]

Figure 7.1: Sample of a symmetric positive definite sparse matrix 6 × 6 with 22 nonzero elements

The CRS representation of this matrix has the following attributes: n = 6, nnz = 22,

    rowptr = [ 0, 2, 5, 9, 12, 17, 22 ]
    colind = [ 0, 5,  1, 2, 4,  1, 2, 4, 5,  3, 4, 5,  1, 2, 3, 4, 5,  0, 2, 3, 4, 5 ]
    nozval = [ 7, 1,  8, 1, 2,  1, 8, 3, 2,  9, 3, 2,  2, 3, 3, 9, 3,  1, 2, 2, 3, 9 ]
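As an illustration of how the three arrays work together, a matrix-vector product y = Ax over a CRS matrix can be written as follows. This is a generic sketch with illustrative type and field names, not the data structures used in the thesis sources.

    /* Generic CRS matrix and y = A*x; 0-based indices as in the example above. */
    typedef struct {
        int     n;        /* number of rows                 */
        int     nnz;      /* number of stored nonzeros      */
        int    *rowptr;   /* size n+1, rowptr[n] == nnz     */
        int    *colind;   /* size nnz                       */
        double *nozval;   /* size nnz                       */
    } crs_t;

    void crs_spmv(const crs_t *A, const double *x, double *y)
    {
        for (int i = 0; i < A->n; i++) {
            double s = 0.0;
            for (int k = A->rowptr[i]; k < A->rowptr[i + 1]; k++)
                s += A->nozval[k] * x[A->colind[k]];   /* indirect access via colind */
            y[i] = s;
        }
    }

The inner loop shows the indirect addressing mentioned above: every value access goes through colind.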

7.3 Cholesky decomposition on CPU

The implementation of the sparse Cholesky decomposition (functions CRS_chol and CRS_chol_subs) was quite straightforward. Before these functions are called, a symbolical factorization must be performed, which determines the indices of the fill-ins and allocates space for them. For the purpose of the Cholesky decomposition, it is sufficient to store only the lower or upper triangular part of the matrix. This fact was exploited by skipping all elements from the beginning of each row up to the main diagonal, which is done by CRS_shifted_rows. Another difference in the decomposition of sparse matrices lies in the necessity of altering the beginning of each row during the factorization; for that purpose I have worked with the temporary arrays rowbeg and rowend.

7.4 Ordering for CPU solver

In my solver, I have utilized the approximate minimum degree (AMD) ordering by Tim Davis, which can also be found in MATLAB's amd function. It minimizes the number of fill-ins very effectively and quickly (see figure 3.2). For the BA problem, an even faster ordering (but with more fill-ins) can be used: a simple rotation by 180 degrees.

7.5 Block Matrix Format for GPU

There are 3 different parts in the matrix: the full diagonal blocks, the sparse border, and the almost dense tail (light, middle and dark gray in figure 7.1). After analyzing the properties of these parts and the CUDA architecture, I have suggested the following matrix data structure (MXBF).

Blocks: As there are many (from thousands to millions) full but small diagonal blocks (V_i), they can be stored in one array (data) in a row-wise manner. In BA the blocks have the same size, but when using the METIS k-way ordering, the blocks do not have the same size. Because of that, the size of each block must be stored (blksz). When iterating over the blocks, it is efficient to have an index saying where the data of the i-th block start (blkp). Only the upper part of each block is stored, but memory is allocated for the full block to avoid awkward indexing.

Border: This part holds the majority of the nonzero elements. Therefore, it must be stored as a sparse matrix; I have chosen the CRS format. Since the input matrix is symmetric, it is sufficient to store only one side of the border.

Tail: After computing the Schur complement, this part becomes almost dense. Consequently, it is stored as a full matrix. Only the upper triangle is stored, but memory is allocated for the full matrix, as in the case of the blocks. The data for this part are stored in the data array as well, and tail points to the location where the data for the tail start.

The MXBF structure of the matrix from Example 8 (figure 7.1) has these attributes: n = 6, tail = 5 (where the data for the tail start in the array data), tailsz = 3, ndata = 14 (number of elements in the blocks and the tail), brd_nnz = 4 (number of nonzeros in the border),

    blksz = [ 1, 2 ]
    blkp  = [ 0, 1 ]
    data  = [ 7,  8, 1, 0, 8,  9, 3, 2, 0, 9, 3, 0, 0, 9 ]

    brd_rowptr = [ 0, 1, 2, 4 ]
    brd_colind = [ 2, 1, 1, 2 ]
    brd_nozval = [ 1, 2, 3, 2 ]
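A possible C declaration of such a structure, matching the fields listed above, could look as follows. This is a sketch of the layout only; the exact definition in the thesis sources may differ.

    /* Sketch of the MXBF block matrix layout (field names follow the text above). */
    typedef struct {
        int     n;           /* order of the matrix                                   */
        int     nblocks;     /* number of full diagonal blocks                        */
        int    *blksz;       /* size of each diagonal block                           */
        int    *blkp;        /* offset of each block inside data[]                    */
        double *data;        /* block entries (row-wise) followed by the tail entries */
        int     ndata;       /* number of elements stored in data[]                   */
        int     tail;        /* offset in data[] where the tail starts                */
        int     tailsz;      /* order of the (almost dense) tail                      */
        int     brd_nnz;     /* nonzeros in the sparse border                         */
        int    *brd_rowptr;  /* border stored in CRS: row pointers                    */
        int    *brd_colind;  /* column indices relative to the tail                   */
        double *brd_nozval;  /* border values                                         */
    } mxbf_t;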

7.6 Block Cholesky decomposition on GPU

Consider the block matrix

$$\begin{bmatrix} \mathbf{A}_{11} & \mathbf{0} & \mathbf{A}_{13} \\ \mathbf{0} & \mathbf{A}_{22} & \mathbf{A}_{23} \\ \mathbf{A}_{13}^{\top} & \mathbf{A}_{23}^{\top} & \mathbf{A}_{33} \end{bmatrix} \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \mathbf{x}_3 \end{bmatrix} = \begin{bmatrix} \mathbf{b}_1 \\ \mathbf{b}_2 \\ \mathbf{b}_3 \end{bmatrix},$$

where A11 and A22 are called 'blocks', A13 and A23 'borders' and A33 is the 'tail'. The block Cholesky decomposition consists of four main parts:

1. Eliminating the blocks (A11 ← L11⊤ and A22 ← L22⊤), updating the corresponding borders (A13 ← L11^{-1} A13 and A23 ← L22^{-1} A23), and updating the corresponding parts of the right-hand side of the linear system (b1 ← L11^{-1} b1 and b2 ← L22^{-1} b2). All these operations are done simultaneously (within the elimination loops). Each thread eliminates one block (in the test matrix it has the size 3 × 3) and updates its own part of the border and of the b vector. As the border part is sparse and can have an arbitrary number of nonzero elements, I store and access this data in global memory. (A CPU-side sketch of this per-block step is given after this list.)

2. Computing the Schur complement

$$\mathbf{A}_{33} \leftarrow \mathbf{A}_{33}^{S} = \mathbf{A}_{33} - \begin{bmatrix} \mathbf{L}_{11}^{-1}\mathbf{A}_{13} \\ \mathbf{L}_{22}^{-1}\mathbf{A}_{23} \end{bmatrix}^{\top} \begin{bmatrix} \mathbf{L}_{11}^{-1}\mathbf{A}_{13} \\ \mathbf{L}_{22}^{-1}\mathbf{A}_{23} \end{bmatrix}.$$

The problem here is that the updated border part is stored in a row-wise manner and the transposed matrix is not available. Therefore, using a dot product for the matrix-matrix multiplication was not possible. I had to loop through the rows of the matrix and update the elements of the A33 matrix at every multiplication. This is only possible when using atomic operations (atomicAdd). Moreover, atomicAdd can be used only for single-precision floats and only on devices of compute capability 2.0 or higher. I am aware of this restriction of the proposed approach.

3. Eliminating the tail (A33^S ← L33^{S⊤}). This part surely has the biggest potential to exploit the full power of a GPU. Unfortunately, it was postponed due to lack of time; I had planned to call a function from the MAGMA library that is able to solve a dense linear system. In my solver, this part is performed on the CPU side.

4. Back substitution. Performed on the CPU side, first for the dense part L33^{S⊤} and then for the sparse borders and the full blocks.
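The per-block work of step 1 can be pictured with the following CPU-side C sketch; one call corresponds to what a single GPU thread does for its block. It is an illustration only, not the kernel code: it reuses the dense chol_dense and forward_subst sketches from Chapter 2, works with the lower factor instead of the stored upper one, and for clarity treats the block's border rows as a dense strip of width tailsz (the implementation keeps them in the CRS border arrays).

    int  chol_dense(int n, double *A);                                   /* from Chapter 2 */
    void forward_subst(int n, const double *L, const double *b, double *x);

    /* blk    : order of the diagonal block, stored dense row-major in B (blk x blk)
       border : the block's border rows, shown as a dense blk x tailsz strip
       rhs    : the block's part of the right-hand side (length blk)
       Assumes blk <= 8 (blocks in the BA test matrix are 3 x 3). */
    int eliminate_block(int blk, int tailsz, double *B, double *border, double *rhs)
    {
        double col[8], sol[8], tmp[8];

        if (chol_dense(blk, B) != 0)        /* B now holds L_ii in its lower triangle */
            return -1;

        /* border <- L_ii^{-1} * border : forward-substitute each column of the strip */
        for (int c = 0; c < tailsz; c++) {
            for (int r = 0; r < blk; r++)
                col[r] = border[r*tailsz + c];
            forward_subst(blk, B, col, sol);
            for (int r = 0; r < blk; r++)
                border[r*tailsz + c] = sol[r];
        }

        /* rhs <- L_ii^{-1} * rhs */
        forward_subst(blk, B, rhs, tmp);
        for (int r = 0; r < blk; r++)
            rhs[r] = tmp[r];
        return 0;
    }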

7.7 Ordering for GPU solver

A requirement of my GPU solver is that the input matrix can be partitioned into the structure that appears in the approximate Hessians of the BA problem (see the matrix in Equation (6.2)). This can be achieved by applying nested dissection ordering recursively. The METIS k-way ordering was used for partitioning the input matrix into an independent block structure for the GPU solver. Figures 7.2 and 7.3 illustrate the structure of matrices from the MATLAB gallery reordered by the k-way ordering. As BA has this structure implicitly, and the size and number of independent blocks are known from the BA configuration, it needs only a rotation by 180 degrees to get a structure like the one in figure 6.1b.

(a) Original matrix, nz = 4861  (b) Reordered into 5 independent blocks, nz = 4861

Figure 7.2: Performing k-way ordering on the diagonal-based matrix ’Wathen 10 × 10’

(a) Original matrix, nz = 4380  (b) Reordered using k-way ordering into 10 independent blocks, nz = 4380

Figure 7.3: Performing k-way ordering on the diagonal-based matrix ’Poisson 30’

Chapter 8

Testing

Testing was performed on the following configuration: Intel i7-2600 CPU @ 3.40 GHz, 4 GB RAM, GeForce GT570, Debian 6.0 for 32-bit PC, CUDA Driver 4.0. The applications were compiled using GCC (version 4.3.5) and NVCC with -use-fast-math and the -O3 optimization mode.

To check the accuracy of my solvers, I have used Octave to get the reference x vector. The solutions from Octave and from my solver were printed into files (x_octave.vec and x_result.vec) and the differences were compared with another Octave function (vec_ck).

The main testing input matrix was the approximation of the Hessian from a BA problem optimizing 3 parameters for each of 11049 3D points and 7 parameters for each of 22 cameras. The matrix is of size 33,301 × 33,301 and has 1,817,521 nonzero elements, saved in the 'Matrix Market coordinate' format (data/jTj_mue.mtx).

8.1 Octave solvers

In Octave, I have tested the direct solver (the left division operator ’\’), the Preconditioned Conjugate Gradient solver (pcg) and the Preconditioned Conjugate Residuals solver (pcr). The iterative solvers were set to terminate after reaching 200 iterations or a residual norm less than 10^-6. Figure 8.1 shows the results. The Preconditioned Conjugate Residuals solver terminated after 45 iterations, but the result was wrong.


Method                    Time       Res. norm    Iterations
Left division operator    695 ms     1.283e-13    –
Conjugate gradient        1440 ms    4.128e-5     75
Conjugate residuals       1386 ms    NaN          45

Figure 8.1: Test of Octave solvers

8.2 CPU solver

After executing the CPU solver from the lds directory with the command

    ./bin/ldscpuexam data/jtj_mueI.mtx data/g.mtx

the following information is printed:

    load matrix:    1070 ms
    load vector:      10 ms
    symamd ord.:      80 ms
    mat. reorder:    390 ms
    symbolic:        500 ms
    CRS_symbolic:    1834461 nnz
    CPU CRS chol:     50 ms
    all:            2120 ms

The number of nonzeros has not increased much (from 1,817,521 to 1,834,461), which means that there are very few fill-ins (less than 1%). It can be seen that my implemented functions for reordering the matrix and for symbolic factorization are not very efficient. The reason may be that the reordering is performed by transforming the CRS format into the triplet (COO) format, which is then reordered, sorted, and transformed back, requiring a lot of data moves. Although finding the ordering takes more time than solving the whole linear system, without it (try commenting it out in ldscpuexam.c) the computation takes more than several minutes. Execution of all the functions required to find the solution takes about 1 second.

The command

    octave -q --eval="vec_ck( ’x_octave.vec’, ’x_result.vec’ );"

outputs the residual norm of the difference from the reference Octave solution and finds where the biggest difference is:

    max err: 0.0000000228 at 138th element
    res nrm: 0.0000000000

8.3 GPU solver

To check the correctness of the GPU solver, I have first implemented the GPU algorithm on the CPU side (to use this, the constant BLOCK_CHOLESKY_CPU must be uncommented and the project recompiled with make). Then ldsgpuexam is performed on the CPU. Calling

    ./bin/ldsgpuexam data/jtj_mueI.mtx data/g.mtx

gives these results:

    load matrix:    1060 ms
    load vector:      20 ms
    kway ord.:        20 ms
    mat. reorder:    370 ms
    symbolic:        500 ms
    CRS_symbolic:    1834083 nnz
    MXBF_from_crs:   11049 blocks
                     858522 border nnz
                     123157 block and tail data
    block matrix:     10 ms
    elim. blks:       10 ms
    tail update:      30 ms
    elim tail:         0 ms
    back subs:         0 ms
    CPU block chol:   40 ms
    all:            2020 ms

This solver, which exploits the special structure of BA, runs faster than the general CPU solver (40 ms vs. 50 ms). Checking the residual norm gives:

    max err: 0.0000221960 at 59th element
    res nrm: 0.0000000010
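The steps labelled 'elim. blks' and 'tail update' above amount, for each 3×3 point block D with border B (the columns coupling that point to the m camera unknowns), to the Schur-complement update T := T - B D^{-1} B^T of the camera tail T. The following is a simplified dense sketch of that single step (hypothetical names, explicit 3×3 inverse, fully dense m×m tail); it is not the code from mxbf.cu or mxbf_chol.cu:

    static void inv3x3(const double D[9], double Di[9])
    {
        /* cofactor-based inverse of a 3x3 matrix, D assumed nonsingular */
        double det =  D[0]*(D[4]*D[8] - D[5]*D[7])
                    - D[1]*(D[3]*D[8] - D[5]*D[6])
                    + D[2]*(D[3]*D[7] - D[4]*D[6]);
        double id = 1.0 / det;
        Di[0] =  (D[4]*D[8] - D[5]*D[7]) * id;
        Di[1] = -(D[1]*D[8] - D[2]*D[7]) * id;
        Di[2] =  (D[1]*D[5] - D[2]*D[4]) * id;
        Di[3] = -(D[3]*D[8] - D[5]*D[6]) * id;
        Di[4] =  (D[0]*D[8] - D[2]*D[6]) * id;
        Di[5] = -(D[0]*D[5] - D[2]*D[3]) * id;
        Di[6] =  (D[3]*D[7] - D[4]*D[6]) * id;
        Di[7] = -(D[0]*D[7] - D[1]*D[6]) * id;
        Di[8] =  (D[0]*D[4] - D[1]*D[3]) * id;
    }

    /* B is m x 3 row-major, T is m x m row-major:
     * perform T -= B * D^{-1} * B^T for one point block */
    static void schur_update(int m, const double *B, const double D[9], double *T)
    {
        double Di[9];
        inv3x3(D, Di);

        for (int r = 0; r < m; ++r) {
            /* w = (row r of B) * D^{-1}, a 1 x 3 row vector */
            double w[3];
            for (int c = 0; c < 3; ++c)
                w[c] = B[r*3+0]*Di[0*3+c] + B[r*3+1]*Di[1*3+c] + B[r*3+2]*Di[2*3+c];

            /* T(r, :) -= w * B^T */
            for (int s = 0; s < m; ++s)
                T[r*m + s] -= w[0]*B[s*3+0] + w[1]*B[s*3+1] + w[2]*B[s*3+2];
        }
    }

After all point blocks have been eliminated this way, the dense tail T is factorized with an ordinary Cholesky decomposition and the point unknowns are recovered by back-substitution.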

Output of the real GPU solver:

    load matrix:    1070 ms
    load vector:      10 ms
    kway ord.:        20 ms
    mat. reorder:    380 ms
    symbolic:        500 ms
    CRS_symbolic:    1834083 nnz
    MXBF_from_crs:   11049 blocks, 858522 border nnz, 123157 block and tail data
    block matrix:     10 ms
    elim on GPU:
      elim without copy: 15.1688 ms
      elim with copy:    20.0004 ms
    elim blocks + tail update: 420 ms
    elim tail:         0 ms
    back subs:         0 ms
    GPU block chol:  430 ms
    all:            2430 ms

with residual norm:

    max err: 0.0000072417 at 103th element
    res nrm: 0.0000000003

The GPU solver must run in single-precision floating point because of the atomicAdd operations (see the sketch below). 'elim without copy' is the time needed for the elimination of the blocks and the tail update (computing the Schur complement).
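The single-precision restriction comes from the hardware: on Fermi-class GPUs of this generation, atomicAdd exists for 32-bit floats but not for doubles, and the tail update accumulates contributions from many point blocks into the same global-memory matrix concurrently. The CUDA sketch below illustrates only this accumulation pattern; the kernel name, the precomputed per-block contributions, and the dense tail layout are simplifications, not the kernel from mxbf_chol.cu.

    /* One CUDA block per point block: subtract its precomputed
     * m x m contribution B * D^{-1} * B^T from the shared tail
     * matrix in global memory.  Concurrent writes to the same tail
     * entries are resolved with atomicAdd, which is why the solver
     * works in single precision on this hardware. */
    __global__ void tail_update(const float *contrib,  /* per-block m x m updates */
                                float *tail,           /* dense m x m tail        */
                                int m)
    {
        int blk = blockIdx.x;                           /* which point block */
        const float *c = contrib + (size_t)blk * m * m;

        /* each thread accumulates a strided subset of the m*m entries */
        for (int idx = threadIdx.x; idx < m * m; idx += blockDim.x)
            atomicAdd(&tail[idx], -c[idx]);             /* tail -= contribution */
    }

A launch such as tail_update<<<num_point_blocks, 256>>>(d_contrib, d_tail, m) then lets all point blocks update the tail concurrently; a double-precision variant would have to fall back on the usual atomicCAS-based emulation.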

8.4 CUSP solvers

CUSP is a C++ template library that implements parallel algorithms for sparse matrix and graph computations. It provides a variety of iterative solvers such as Conjugate Gradient (CG), Biconjugate Gradient (BiCG), Biconjugate Gradient Stabilized (BiCGstab), Generalized Minimum Residual (GMRES), Multi-mass Conjugate Gradient (CG-M), and Multi-mass Biconjugate Gradient Stabilized (BiCGstab-M). I tested two of them with the maximum number of iterations set to 200 and a relative tolerance of 10^-6; a minimal usage sketch is shown below, and Table 8.2 summarizes the results.
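The CG test can be driven with only a few lines of CUSP code. The following sketch shows roughly how such a test looks; it is not the exact benchmark program used here, and the right-hand side is filled with ones instead of being read from data/g.mtx.

    #include <cusp/csr_matrix.h>
    #include <cusp/array1d.h>
    #include <cusp/io/matrix_market.h>
    #include <cusp/krylov/cg.h>
    #include <cusp/monitor.h>

    int main(void)
    {
        // load the test matrix from Matrix Market format into CSR on the device
        cusp::csr_matrix<int, float, cusp::device_memory> A;
        cusp::io::read_matrix_market_file(A, "data/jtj_mueI.mtx");

        // unknowns and a (synthetic) right-hand side
        cusp::array1d<float, cusp::device_memory> x(A.num_rows, 0.0f);
        cusp::array1d<float, cusp::device_memory> b(A.num_rows, 1.0f);

        // stop after 200 iterations or when the relative residual drops below 1e-6
        cusp::default_monitor<float> monitor(b, 200, 1e-6f);

        // run Conjugate Gradient: solves A x = b
        cusp::krylov::cg(A, x, b, monitor);

        return 0;
    }

CUSP is header-only, so this compiles with nvcc and needs no extra libraries besides CUDA itself.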

Method     Time    Max. error   Iterations
CG         50 ms   3.8e-8       77
BiCGstab   90 ms   2.3e-8       76

Figure 8.2: Test of iterative CUSP solvers. Max. error is the maximal difference from Octave's reference solution

Chapter 9

Conclusion

The aim of this thesis was to study linear direct solvers and then implement a linear direct GPU solver for the BA problem. The implementation of a GPU solver was, of course, preceded by studying the mathematical background of linear direct solvers. First, the CPU solver had to be implemented, and other important concepts of direct sparse solvers had to be mastered, such as symbolic factorization, working with the CRS matrix format, and applying ordering techniques. I can say that my CPU solver is fast and reliable when solving positive definite linear systems. This work was done in the first half of the academic year.

In the second half of the year, I started experimenting with the METIS k-way ordering and with how to utilize it for solving general sparse systems in parallel. Although this approach is fully usable, it has drawbacks such as the slow computation of the ordering, a relatively large tail part, and independent blocks of different sizes. Simultaneously, I analysed the BA problem and the structure of its linear sparse systems in the Levenberg-Marquardt algorithm. Since the structure of the BA systems and of the returned k-way ordering was the same, I tried to write a solver that could be general (the needed information about the block matrix is given by the k-way ordering) and specific at the same time (in that case the information about the block matrix is provided by the BA configuration). The general solver on the GPU is not finished (a special symbolic factorization is missing). The GPU solver specialized for BA was implemented, but it provides only very small speedups in comparison with the CPU solver; the reason is that only global memory on the GPU was used for all computations. In the testing phase I found out that iterative solvers have great potential to solve these linear systems very fast. An advantage of iterative solvers is their configurable accuracy, which can be sufficient for iterative nonlinear solvers.


Even when used with a preconditioner, the solution should be found very fast. On the other hand, when using direct solvers, the symbolic factorization is performed only once in the LM algorithm, and direct solvers generally give more accurate results. Based on my experiments, I suggest using a direct solver on the CPU combined with a dense GPU solver for factorizing the Schur complement. I am aware that a detailed study of the SBA (Sparse Bundle Adjustment) package is missing, as well as testing of the practical utilization of the GPU solvers in that package.

Bibliography

[1] Unknown author. NVIDIA GeForce GTX 680 s čipem GK104: Herní Kepler detailně [NVIDIA GeForce GTX 680 with the GK104 chip: the gaming Kepler in detail]. CD-R server s.r.o., URL http://diit.cz/clanek/unifikovane-jadro-a-rizeni-cipu, 2012. Cited in page 25.

[2] O. Coles. NVIDIA GF100 GPU Fermi graphics architecture. Benchmark Reviews, URL http://benchmarkreviews.com, 2010. Cited in page 24.

[3] T. Davis. Sparse matrix. From MathWorld — A Wolfram Web Resource, URL http://mathworld.wolfram.com/SparseMatrix.html, 2012. Retrieved April 2012. Cited in page 10.

[4] J. Dennis. Nonlinear least squares. State of the Art in Numerical Analysis, pages 269–312, 1977. Cited in page 16.

[5] J. Dennis and R. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. SIAM Publications, 1996. Cited in page 20.

[6] R. Farber. CUDA Application Design and Development. Morgan Kaufmann, Waltham, MA, 2011. 2 citations in pages 22 and 25.

[7] A. George and J. W. H. Liu. An automatic nested dissection algorithm for irregular finite element problems. SIAM Journal on Numerical Analysis, 15(5):1053–1069, 1978. Cited in page 12.

[8] A. George and J. W. H. Liu. A fast implementation of the minimum degree algorithm using quotient graphs. ACM Transactions on Mathematical Software, 6:337–358, 1980. Cited in page 12.

[9] J. R. Gilbert, C. Moler, and R. Schreiber. Sparse matrices in MATLAB: Design and implementation. SIAM Journal on Matrix Analysis and Applications, pages 333–356, 1992. Cited in page 10.

[10] G. Golub and C. van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, MD, 3rd edition, 1996. Cited in page 20.

[11] K. Habgood and I. Arel. A condensation-based application of Cramer's rule for solving large-scale linear systems. Journal of Discrete Algorithms, 10:98–109, 2012. Cited in page 5.

[12] K. Hiebert. An evaluation of mathematical software that solves nonlinear least squares problems. ACM Transactions on Mathematical Software, 7(1):1–16, 1981. Cited in page 16.

[13] J.-Y. L'Excellent and B. Ucar. Elimination tree. URL http://graal.ens-lyon.fr/~bucar/CR07, 2010. Cited in page 13.

[14] M. I. A. Lourakis and A. A. Argyros. SBA: A software package for generic sparse bundle adjustment. ACM Transactions on Mathematical Software, 36(1), 2007. 4 citations in pages 16, 28, 29, and 30.

[15] H. M. Markowitz. The elimination form of the inverse and its application to linear programming. Management Science, 3:255–269, 1957. Cited in page 12.

[16] J. Nocedal and S. Wright. Numerical Optimization. Springer, New York, NY, 1999. Cited in page 20.

[17] F. Ntawiniga. Bundle adjustment technique. URL http://archimede.bibl.ulaval.ca/archimede/fichiers/25229/ch06.html, 2008. Retrieved April 2012. Cited in page 17.

[18] NVIDIA. OpenCL Programming for the CUDA Architecture, 2009. Cited in page 23.

[19] V. Prasolov. Problems and Theorems in Linear Algebra. American Mathematical Society, Providence, RI, 1994. Cited in page 30.

[20] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes: The Art of Scientific Computing. Cambridge University Press, 3rd edition, 2007. 2 citations in pages 4 and 11.

[21] A. Quarteroni, R. Sacco, and F. Saleri. Numerical Mathematics. Springer, 2000. 3 citations in pages 4, 17, and 30.

[22] W. F. Tinney and J. W. Walker. Direct solutions of sparse network equations by optimally ordered triangular factorization. In Proceedings of the IEEE, volume 55, pages 1801–1809, 1967. Cited in page 12.

[23] B. Triggs, P. McLauchlan, R. Hartley, and A. Fitzgibbon. Bundle adjustment – a modern synthesis. Proceedings of the International Workshop on Vision Algorithms: Theory and Practice, pages 298–372, 1999. Cited in page 16.

[24] V. Volkov. Better performance at lower occupancy. GPU Technology Conference 2010 (GTC 2010), 2010. URL http://www.cs.berkeley.edu/~volkov. Cited in page 24.

[25] R. Vuduc. Analysis and tuning case study. TeraGrid Conference, URL http://hpcgarage.org/tg10--gpu-tutorial, 2010. Retrieved May 2012. Cited in page 23.

[26] M. Yannakakis. Computing the minimum fill-in is NP-complete. SIAM Journal on Algebraic and Discrete Methods, 2:77–79, 1981. Cited in page 10.

Appendix A

List of Abbreviations

3D     Three-Dimensional
BA     Bundle Adjustment
CPU    Central Processing Unit
CUDA   Compute Unified Device Architecture
CRS    Compressed Row Storage
GPGPU  General-Purpose Computing on Graphics Processing Unit
GPU    Graphics Processing Unit
LDS    Linear Direct Solver (output of this thesis)
LM     Levenberg-Marquardt (algorithm)
LRU    Least Recently Used
SBA    Sparse Bundle Adjustment
SIMD   Single Instruction Multiple Data
SIMT   Single Instruction Multiple Threads
SM     Streaming Multiprocessor

Appendix B

User Manual

B.1 Requirements

All code was written in ANSI C and CUDA and tested on 64-bit Linux (Xubuntu distribution) with GCC 4.4.6. For successful compilation, the package libscotchmetis-dev is required, and the install paths in the makefile for CUDA and for the METIS include files must be set properly. After compilation, the executables ldscpuexam and ldsgpuexam are created in the bin directory.

B.2 Usage

    ldscpuexam A.mtx b.vec
    ldsgpuexam A.mtx b.vec

For the tested matrix, run

    bin/ldscpuexam data/jtj_mueI.mtx data/g.mtx

or

    bin/ldsgpuexam data/jtj_mueI.mtx data/g.mtx

A.mtx is a symmetric positive definite matrix of size n × n stored in Matrix Market format, and b.vec is the right-hand side n × 1 vector of the equation system, also stored in Matrix Market format. Some timing information is printed to stdout and the solution is stored in a file named x_result.vec. To test the correctness of the solution, the Octave function vec_ck can be called from the command line:

    octave -q --eval 'vec_ck( "x_result.vec", "x_octave.vec" );'

Appendix C

Contents of the Attached CD

.
+-- lds
|   +-- bin
|   +-- data
|   |   +-- g.mtx
|   |   +-- jtj_mueI.mtx
|   |   +-- test_thesis.mtx
|   +-- makefile
|   +-- obj
|   +-- octave
|   |   +-- matrix_load.m
|   |   +-- octave_solver.m
|   |   +-- spy_print.m
|   +-- README.txt
|   +-- src
|   |   +-- colamd.c
|   |   +-- colamd_global.c
|   |   +-- colamd.h
|   |   +-- crs.c
|   |   +-- crs.h
|   |   +-- etree.c
|   |   +-- etree.h
|   |   +-- ldscpuexam.c
|   |   +-- ldsgpuexam.c
|   |   +-- mxbf.cu
|   |   +-- mxbf.h
|   |   +-- mxbf_chol.cu
|   |   +-- ord.c
|   |   +-- ord.h
|   |   +-- UFconfig.c
|   |   +-- UFconfig.h
|   |   +-- uni.c
|   |   +-- uni.h
|   |   +-- vec.c
|   |   +-- vec.h
|   +-- vec_ck.m
|   +-- x_octave.vec
+-- text
    +-- Ivancik_thesis_2012.pdf

7 directories, 31 files
