Departament de Sistemes Informàtics i Computació

Dense and sparse parallel linear algebra algorithms on graphics processing units

Author: Alejandro Lamas Daviña

Director: José E. Román Moltó

October 2018

To the extent possible under law, the author has waived all copyright and related or neighboring rights to this work.

To one, two, and three.

Acknowledgments

I would like to express my gratitude to my director, José Román, for his permanent support during all these years of work. His wise advice and selfless guidance have been decisive for the culmination of this thesis. His door has always been open for me, and he has resolved all my doubts with unlimited patience. For all that and for more, I thank him.

I would like to extend my gratitude to my colleagues of the SLEPc project. Here I thank José Román again, this time for his unique sense of humor. To Carmen, for showing me the way and for all her good advice. To Enrique, who helped me get rolling. And to the former members Andrés and Eloy, whom I had the opportunity to meet and who enlivened the group meals. I will keep good memories from these years.

I do not want to forget to mention Xavier Cartoixà, Jeff Steward and Altuğ Aksoy, great researchers with whom I have had the opportunity to collaborate. The afternoon snacks would not have been the same without the excellent discussions and comments of Fernando, David and, of course, Salva, who, without noticing it, also helped to improve this dissertation.

Last, I would like to thank José Luis, IT staff of the department, for his highly valuable work behind the scenes and his prompt response to any incident.

To all of you who have supported me, thank you.

Abstract

One line of development followed in the field of supercomputing is the use of special-purpose processors to speed up certain types of computations. In this thesis we study the use of graphics processing units as computational accelerators and apply them to the field of linear algebra. In particular, we work with the SLEPc library to solve large-scale eigenvalue problems and to apply matrix functions in scientific applications. SLEPc is a parallel library based on the MPI standard and is developed with the premise of being scalable, i.e. allowing larger problems to be solved by increasing the number of processing units.

We address the linear eigenvalue problem, Ax = λx in its standard form, using iterative techniques, in particular Krylov methods, with which we compute a small portion of the eigenvalue spectrum. This type of algorithm is based on generating a subspace of reduced size (m) onto which the problem of large dimension (n) is projected, with m ≪ n. Once the problem has been projected, it is solved by direct methods, which provide approximations to the eigenvalues of the initial problem we wanted to solve. The operations used in the expansion of the subspace vary depending on whether the desired eigenvalues lie in the exterior or in the interior of the spectrum. When searching for exterior eigenvalues, the expansion is done by matrix-vector multiplications. We perform this operation on the GPU, either by using libraries or by creating functions that take advantage of the structure of the matrix. In the case of eigenvalues from the interior of the spectrum, the expansion requires solving linear systems of equations. In this thesis we implement several algorithms to solve linear systems of equations for the specific case of matrices with a block-tridiagonal structure, which run on the GPU.

In the computation of matrix functions we have to distinguish between the direct evaluation of a matrix function, f(A), and the action of a matrix function on a vector, f(A)b. The first case involves a dense computation that limits the size of the problem. The second allows us to work with large sparse matrices, and to solve it we also make use of Krylov methods. The expansion of the subspace is done by matrix-vector multiplication, and we use GPUs in the same way as when solving eigenvalue problems. In this case the projected problem starts with size m, but it is increased by m on each restart of the method. The projected problem is solved by directly applying a matrix function. We have implemented several algorithms to compute the matrix square root and the matrix exponential, in which the use of GPUs allows us to speed up the computation.


Resum

Una línia de desenvolupament seguida en el camp de la supercomputació és l'ús de processadors de propòsit específic per a accelerar determinats tipus de càlcul. En aquesta tesi estudiem l'ús de targetes gràfiques com a acceleradors de la computació i ho apliquem a l'àmbit de l'àlgebra lineal. En particular treballem amb la biblioteca SLEPc per a resoldre problemes de càlcul d'autovalors en matrius de gran dimensió, i per a aplicar funcions de matrius en els càlculs d'aplicacions científiques. SLEPc és una biblioteca paral·lela que es basa en l'estàndard MPI i està desenvolupada amb la premissa de ser escalable, açò és, de permetre resoldre problemes més grans en augmentar les unitats de processament.

El problema lineal d'autovalors, Ax = λx en la seua forma estàndard, ho abordem amb l'ús de tècniques iteratives, en concret amb mètodes de Krylov, amb els quals calculem una xicoteta porció de l'espectre d'autovalors. Aquest tipus d'algorismes es basa a generar un subespai de grandària reduïda (m) en el qual projectar el problema de gran dimensió (n), sent m ≪ n. Una vegada s'ha projectat el problema, es resol aquest mitjançant mètodes directes, que ens proporcionen aproximacions als autovalors del problema inicial que volíem resoldre. Les operacions que s'utilitzen en l'expansió del subespai varien en funció de si els autovalors desitjats estan en l'exterior o a l'interior de l'espectre. En cas de cercar autovalors en l'exterior de l'espectre, l'expansió es fa mitjançant multiplicacions matriu-vector. Aquesta operació la realitzem en la GPU, bé mitjançant l'ús de biblioteques o mitjançant la creació de funcions que aprofiten l'estructura de la matriu. En cas d'autovalors a l'interior de l'espectre, l'expansió requereix resoldre sistemes d'equacions lineals. En aquesta tesi implementem diversos algorismes per a la resolució de sistemes d'equacions lineals per al cas específic de matrius amb estructura tridiagonal a blocs, que s'executen en GPU.

En el càlcul de les funcions de matrius hem de diferenciar entre l'aplicació directa d'una funció sobre una matriu, f(A), i l'aplicació de l'acció d'una funció de matriu sobre un vector, f(A)b. El primer cas implica un càlcul dens que limita la grandària del problema. El segon permet treballar amb matrius disperses grans, i per a resoldre-ho també fem ús de mètodes de Krylov. L'expansió del subespai es fa mitjançant multiplicacions matriu-vector, i fem ús de GPUs de la mateixa forma que en resoldre autovalors. En aquest cas el problema projectat comença sent de grandària m, però s'incrementa en m en cada reinici del mètode. La resolució del problema projectat es fa aplicant una funció de matriu de forma directa. Nosaltres hem implementat diversos algorismes per a calcular les funcions de matrius arrel quadrada i exponencial, en les quals l'ús de GPUs permet accelerar el càlcul.

Resumen

Una línea de desarrollo seguida en el campo de la supercomputación es el uso de procesadores de propósito específico para acelerar determinados tipos de cálculo. En esta tesis estudiamos el uso de tarjetas gráficas como aceleradores de la computación y lo aplicamos al ámbito del álgebra lineal. En particular trabajamos con la biblioteca SLEPc para resolver problemas de cálculo de autovalores en matrices de gran dimensión, y para aplicar funciones de matrices en los cálculos de aplicaciones científicas. SLEPc es una biblioteca paralela que se basa en el estándar MPI y está desarrollada con la premisa de ser escalable, esto es, de permitir resolver problemas más grandes al aumentar las unidades de procesado.

El problema lineal de autovalores, Ax = λx en su forma estándar, lo abordamos con el uso de técnicas iterativas, en concreto con métodos de Krylov, con los que calculamos una pequeña porción del espectro de autovalores. Este tipo de algoritmos se basa en generar un subespacio de tamaño reducido (m) en el que proyectar el problema de gran dimensión (n), siendo m ≪ n. Una vez se ha proyectado el problema, se resuelve este mediante métodos directos, que nos proporcionan aproximaciones a los autovalores del problema inicial que queríamos resolver. Las operaciones que se utilizan en la expansión del subespacio varían en función de si los autovalores deseados están en el exterior o en el interior del espectro. En caso de buscar autovalores en el exterior del espectro, la expansión se hace mediante multiplicaciones matriz-vector. Esta operación la realizamos en la GPU, bien mediante el uso de bibliotecas o mediante la creación de funciones que aprovechan la estructura de la matriz. En caso de autovalores en el interior del espectro, la expansión requiere resolver sistemas de ecuaciones lineales. En esta tesis implementamos varios algoritmos para la resolución de sistemas de ecuaciones lineales para el caso específico de matrices con estructura tridiagonal a bloques, que se ejecutan en GPU.

En el cálculo de las funciones de matrices hemos de diferenciar entre la aplicación directa de una función sobre una matriz, f(A), y la aplicación de la acción de una función de matriz sobre un vector, f(A)b. El primer caso implica un cálculo denso que limita el tamaño del problema. El segundo permite trabajar con matrices dispersas grandes, y para resolverlo también hacemos uso de métodos de Krylov. La expansión del subespacio se hace mediante multiplicaciones matriz-vector, y hacemos uso de GPUs de la misma forma que al resolver autovalores. En este caso el problema proyectado comienza siendo de tamaño m, pero se incrementa en m en cada reinicio del método. La resolución del problema proyectado se hace aplicando una función de matriz de forma directa. Nosotros hemos implementado varios algoritmos para calcular las funciones de matrices raíz cuadrada y exponencial, en las que el uso de GPUs permite acelerar el cálculo.

Contents

List of Figures

List of Tables

List of Algorithms

1 Introduction
  1.1 Background and motivation
  1.2 Objectives
  1.3 Structure of the document

2 High-performance computing
  2.1 Hardware evolution
  2.2 Parallel architectures
    2.2.1 Central control mechanism
    2.2.2 Communication mechanism
  2.3 Programming models and parallel software
    2.3.1 Threads and OpenMP
    2.3.2 Message Passing Interface
    2.3.3 Software
  2.4 Hardware accelerators
    2.4.1 Integrated circuits
    2.4.2 Manycore processors
    2.4.3 Graphics processing units
  2.5 Performance indicators
    2.5.1 Execution time
    2.5.2 Speedup
    2.5.3 Efficiency
    2.5.4 Scalability
    2.5.5 FLOPs/s

3 Eigenvalue problems
  3.1 Methods to compute eigenvalues
    3.1.1 Direct methods
    3.1.2 Iterative methods
  3.2 Krylov methods for eigenvalue problems
    3.2.1 Krylov-Schur
  3.3 Scientific computing software
    3.3.1 PETSc
    3.3.2 SLEPc
  3.4 Solving eigenproblems with GPUs
    3.4.1 The ParIso code
    3.4.2 Optimization of spin eigenanalysis
    3.4.3 Acceleration with graphics processors
  3.5 Conclusions

4 Block-tridiagonal eigenvalue problems
  4.1 Matrix-vector product
  4.2 Shift and invert: Linear systems
    4.2.1 Thomas algorithm
    4.2.2 Block cyclic reduction
    4.2.3 Spike
  4.3 Block cyclic tridiagonal structures
    4.3.1 Schur complement
  4.4 Parallel implementations
    4.4.1 Matrix-vector product
    4.4.2 Direct linear solvers
  4.5 Numerical experiments
    4.5.1 Single process executions
    4.5.2 Multi-process executions
  4.6 Conclusions

5 Matrix functions
  5.1 Dense methods for matrix functions
    5.1.1 Square root
    5.1.2 Sign
    5.1.3 Exponential
  5.2 Krylov methods for matrix functions
    5.2.1 Restarted Arnoldi
  5.3 Numerical experiments
    5.3.1 Computational evaluation of dense solvers
    5.3.2 Computational evaluation of sparse solvers
  5.4 Conclusions

6 Conclusions

Bibliography


List of Figures

2.1 CUDA device memory hierarchy.
2.2 Schema of 2D memory allocated in the GPU with cudaMallocPitch.

3.1 Components of PETSc.
3.2 Parallel distribution of a matrix and two vectors in PETSc.
3.3 Components of SLEPc.
3.4 Example of energetic cut for a test problem with 18 submatrices. Only eigenvalues located to the left of the vertical line need to be computed. The number of eigenvalues to compute depends on the dimension of the submatrix. A zoom of the region of interest is shown on the right.
3.5 Full representation of the 7Mn2+ system, showing the energetic cut and the computed eigenvalues.
3.6 Susceptibility for different values of the population in the 7Mn2+ system.
3.7 Susceptibility for different values of the maxev parameter in the 8Mn2+ system.
3.8 Example of an isotropic system expressed as a block diagonal matrix (left) and the partitioning of one of their symmetric sparse blocks between five MPI processes Pi (right).
3.9 Total problem solve time for the 8Mn2+ system with single (left) and double precision arithmetic (right).
3.10 Total problem solve time for the 9Mn2+ system with single (left) and double precision arithmetic (right).
3.11 Total problem solve time for the 9Mn2+ + 1Cu2+ system with single (left) and double precision arithmetic (right).

4.1 Schema of a 2D domain partitioned in subdomains Ωi.
4.2 Schema of the partitioning of a block cyclic tridiagonal matrix into four smaller matrices to solve a system of linear equations by using the Schur complement.
4.3 Schema of the memory allocated in the GPU for storing a matrix block of dimensions k × k (left), and a block tridiagonal matrix stored in 2D memory (right). The blocks are represented transposed to illustrate that we store the matrices using a column-major order.

4.4 Wrapper function for the matrix-vector kernel.
4.5 Matrix-vector kernel.
4.6 Distribution of the four sub-matrices of Figure 4.2 among several processes and its representation in memory.
4.7 Performance of the matrix-vector product operation to compute the largest magnitude eigenvalue, for a fixed block size k = 960 and varying the number of blocks ℓ for both CPU and GPU on the Fermi (left) and Kepler (right) machine, in single and double precision arithmetic.
4.8 Total eigensolve operation time to compute the largest magnitude eigenvalue, for a fixed block size k = 960 and varying the number of blocks ℓ for both CPU and GPU on the Fermi (left) and Kepler (right) machine, in double precision arithmetic.
4.9 Performance of the matrix-vector product operation to compute the largest magnitude eigenvalue, for a fixed number of blocks ℓ = 130 and varying the block size k for both CPU and GPU on the Fermi (left) and Kepler (right) machine, in single and double precision arithmetic.
4.10 Total eigensolve operation time to compute the largest magnitude eigenvalue, for a fixed number of blocks ℓ = 130 and varying the block size k for both CPU and GPU on the Fermi (left) and Kepler (right) machine, in double precision arithmetic.
4.11 Performance of the factorization operation to compute the eigenvalue closest to the origin, for a fixed block size k = 960 and varying the number of blocks ℓ for both CPU and GPU on the Fermi (left) and Kepler (right) machine, in double precision arithmetic.
4.12 Total eigensolve operation time to compute the eigenvalue closest to the origin, for a fixed block size k = 960 and varying the number of blocks ℓ for both CPU and GPU on the Fermi (left) and Kepler (right) machine, in double precision arithmetic.
4.13 Performance of factorization to compute the eigenvalue closest to the origin, for a fixed number of blocks ℓ = 130 and varying the block size k for both CPU and GPU on the Fermi (left) and Kepler (right) machine, in double precision arithmetic.
4.14 Total eigensolve operation time to compute the eigenvalue closest to the origin, for a fixed number of blocks ℓ = 130 and varying the block size k for both CPU and GPU on the Fermi (left) and Kepler (right) machine, in double precision arithmetic.
4.15 Weak scaling for the BCYCLIC algorithm running on CPU and on GPU with the non-batched version with k = 64 and ℓ = p·400, where p is the number of MPI processes.
4.16 Weak scaling for the BCYCLIC algorithm running on CPU and on GPU with the non-batched version with k = 1024 and ℓ = p·25, where p is the number of MPI processes.
4.17 Weak scaling for the BCYCLIC algorithm running on GPU with batched and non-batched versions with k = 64 and ℓ = p·400, where p is the number of MPI processes.

4.18 Weak scaling for the BCYCLIC algorithm running on GPU with batched and non-batched versions with k = 1024 and ℓ = p·25, where p is the number of MPI processes. Consult the legend in Figure 4.17.
4.19 Weak scaling for the BCYCLIC and the Spike algorithms running on CPU and on GPU with k = 64 and ℓ = p·400, where p is the number of MPI processes.
4.20 Weak scaling for the BCYCLIC and the Spike algorithms running on CPU and on GPU with k = 1024 and ℓ = p·25, where p is the number of MPI processes. The executions on GPU with the Spike algorithm with more than 16 processes could not be done due to memory constraints.
4.21 Total eigenproblem time obtained with the weak scaling tests for a block size k = 1024 (left) and for 128 processes (right).
4.22 Strong scaling for the BCYCLIC, the Spike and the reduced Spike algorithms running on CPU and on GPU with a total matrix dimension of 307200 and different block sizes k.
4.23 Total eigenproblem solve time to obtain 40 eigenvalues closest to 1.5 for the qcl and anderson matrices using 4 processes per node.

5.1 Time results for computing the matrix square root (left) and the matrix inverse square root (right).
5.2 Time results for computing the matrix exponential on Fermi.
5.3 Time results for computing the matrix exponential on Kepler.
5.4 Time results for the advection diffusion problem.
5.5 Time results for the EnSRF matrix function using Denman–Beavers to compute √Hkm.
5.6 Total computation time for the EPS and MFN methods as a function of the matrix size with 386 processes, in double precision arithmetic.
5.7 Memory usage for the EPS and MFN methods as a function of the matrix size with 386 processes, in double precision arithmetic.


List of Tables

2.1 CUDA device memory.
2.2 CUDA variable declaration and associated memory.

3.1 List of cuBLAS routines employed in the sveccuda implementation of the BV object.
3.2 Systems used.
3.3 Value of energetic cut for different values of the population parameter in the 7Mn2+ system, and the corresponding computation time (in seconds) with one processor.
3.4 Total number of eigenvalues (n) for each of the spins and the selected values to be computed (k), for the systems 7Mn2+ (left) and 8Mn2+ (right) using maxev=1000 and pobla=10⁻².
3.5 Value of energetic cut for different values of the maxev parameter in the 8Mn2+ system.
3.6 Parallel execution time (in seconds) for the 8Mn2+ system with different values of maxev and increasing number of MPI processes.
3.7 Different versions of ParIso with GPU support.

4.1 BCYCLIC implementations with each of the mathematical libraries.
4.2 Operations used to factorize the two matrices and solve the systems on the Spike implementations.
4.3 Block sizes and number of block-rows used for the strong and weak scaling experiments. The column of the weak scaling shows the number of rows used with 128 processes, which is halved with the number of processes.
4.4 Total eigenproblem time obtained with the weak scaling tests for 128 processes.
4.5 Total eigenproblem time obtained with the weak scaling tests for a block size k = 1024.

5.1 Results for computing the matrix square root of the ensrf7864 matrix. Time expressed in seconds. GF/s indicates gigaflops per second, Iter indicates iterations done, and Error is computed as ‖F² − A‖F / ‖A‖F.

5.2 Results for computing the matrix inverse square root of the rdb5000 matrix. Time expressed in seconds. GF/s indicates gigaflops per second, Iter indicates iterations done, and Error is computed as ‖F²A − I‖F / ‖A‖F.
5.3 Results for the matrix exponential running on the Fermi platform. Time expressed in seconds. GF/s indicates gigaflops per second.
5.4 Results for the matrix exponential running on the Kepler platform. Time expressed in seconds. GF/s indicates gigaflops per second.
5.5 Results for the advection diffusion problem. Time expressed in seconds. The MatVec and Orthog columns show the time needed in the matrix-vector product, and in the orthogonalization and normalization of the basis vectors, respectively.
5.6 Results for the EnSRF matrix function using Denman–Beavers to compute √Hkm. Time expressed in seconds. The MatVec and Orthog columns show the time needed in the matrix-vector product, and in the orthogonalization and normalization of the basis vectors, respectively.

List of Algorithms

3.1 Simple subspace iteration
3.2 Rayleigh–Ritz method
3.3 Arnoldi
3.4 Lanczos iteration

4.1 Arnoldi algorithm
4.2 BCYCLIC factorization
4.3 BCYCLIC solve
4.4 GetOwnershipRange
4.5 Spike factorization
4.6 Spike solve

5.1 Blocked Schur method for the square root


Chapter 1

Introduction

All work and no play makes Jack a dull boy All work and no play...

Scientific computing is an area of growing importance, not only because of its role in the progress of science, but also because it helps companies design new products or develop new processes to remain competitive in a global economy. This thesis dissertation is undertaken within the framework of scientific computing, particularly in the context of linear systems of equations, eigenvalue problems and related fields. Typically, these types of applications require an efficient and fast solution of numerical problems of very large and increasing sizes, which poses a challenge both in terms of computing and numerical analysis. Two main elements are required to solve this type of problem: a supercomputer, and specialized software that efficiently implements the required numerical algorithms.

1.1 Background and motivation

The architecture of supercomputers has experienced a huge transformation in recent years with the inclusion of new hardware accelerators. Many of the most powerful supercomputers in the world are equipped with some type of hardware accelerator, be it GPUs, FPGAs, or other devices such as the Intel Xeon Phi. A close example for us is Minotauro, a supercomputer belonging to the Barcelona Supercomputing Center (BSC) that is furnished with GPUs. These technologies are gradually spreading to more modest computing facilities, like the clusters of small research groups. Even a single server can be assembled with several accelerator cards and used as a small-scale computing platform. But the high performance offered by these computing elements cannot be exploited to its limits without the availability of efficient software.


Today it is mandatory that scientific computing software makes use of specialized libraries to cope with the growing complexity of the applications. The use of software libraries allows programmers to shorten the development cycle and to benefit from high-quality implementations. This last feature is very important in numerical software, as numerical algorithms involve a high degree of complexity that makes them very difficult to implement in a robust and efficient way. Modern numerical libraries are designed to provide enough flexibility to allow their use in many contexts, but other properties, such as being robust, efficient, interoperable with other software, or portable to different computing platforms, are also desired. A key characteristic of scientific software is its computational efficiency, as it is usually run on expensive supercomputers, and in that scenario it is imperative to fully exploit the hardware. The field of numerical software specialized to run on accelerators has evolved considerably in recent years [73], but it still remains open to new theoretical research and to the development of practical implementations.

Many scientific applications use spectral analysis to study the behaviour of the underlying physical phenomena, as in structural vibration analysis, stability analysis, signal processing, etc. This type of analysis is associated with eigenvalue problems, whose matrices are typically large and sparse. It is usually enough to solve for a small part of the spectrum, either exterior eigenvalues or interior ones. Another emerging field of knowledge that can be applied to many scientific applications is the use of matrix functions. They can be used to solve problems like differential equations, or applied in control theory. Directly computing f(A) is usually avoided in practice when working with moderately large matrices, due to the high computational and storage cost involved. When working with sparse matrices it is preferred to compute the action of the matrix function on a vector, f(A)b.

The computational requirements of scientific problems grow constantly together with the evolution of supercomputers. As a consequence, numerical problems arise that affect the convergence of the methods, making it necessary to use more sophisticated strategies [12]. Innovative research proposals try to improve the convergence of the methods and reduce the cost per iteration. They also present new iterative methods or design alternatives that take advantage of the capabilities of the new hardware. Implementing parallel software in a robust and efficient way to solve large-scale eigenvalue problems on supercomputers becomes a challenging task. All these factors make it desirable to provide users with a software library for solving numerical problems, with enough flexibility to fit a wide range of applications. This is the objective of SLEPc.

SLEPc [60] is a software library for solving large-scale eigenvalue problems and related problems in parallel. It can be used to solve the standard and the generalized linear eigenvalue problem, as well as nonlinear problems, with symmetric and nonsymmetric matrices, in real or complex arithmetic. Other related problems, like the singular value decomposition or, more recently, the computation of functions of matrices, are also covered. SLEPc has been developed using modern software engineering techniques to ensure the efficiency and scalability of the software. Other characteristics are its flexibility to easily adapt to new types of problems, its robustness and its portability. It is built on top of PETSc, a parallel library for the solution of scientific applications modeled by partial differential equations.

Nowadays SLEPc has become one of the reference libraries in its area of knowledge in the scientific community, and its maturity ensures its suitability for many scientific applications. The use of the library by the scientific community has progressed from being manually installed by the interested researchers to being integrated in the software portfolio of supercomputers and in general-purpose operating systems like Debian, allowing a wider usage. Nevertheless, due to the constant improvements in the software, many researchers still manually deploy the development branch of the library on their platforms to have access to new solvers and features added to SLEPc.

Many applications have obtained speedups of two or even three orders of magnitude when moving the computation to GPUs, with respect to previous versions that only used the CPU. However, such remarkable results are not so frequent and are only achievable when the computations involve dense matrix-matrix and similar operations. Algorithms involving computations with sparse matrices are expected to obtain less impressive results, with speedups between 10-20x [7]. However, the use of GPUs is still very appealing in the sparse field, not only for the reduction in computation time, but also for the smaller power consumption offered by the accelerators and their better performance-cost ratio. Moreover, we can expect a faster evolution in the development of accelerators than in the development of general-purpose CPUs.

At the beginning of this thesis, GPU support in SLEPc was rather limited, and covered only a small percentage of the functionality. Some preliminary results in the context of specific applications had already been obtained [111], which allowed us to be optimistic about future developments. The use of GPU accelerators in SLEPc follows the approach set by its companion library PETSc. The design of GPU support in PETSc started as described in [96], based on performing some operations on the GPU and minimizing the data transfers between GPU and CPU. We began the thesis work aware of the fact that SLEPc must adopt the emerging technologies to keep being a reference in the field of high-performance eigenvalue computations. The possible lines of action are related to the three main operations in the sparse solvers used by SLEPc:

Expansion of the subspace. The simplest case entails the product of a sparse matrix times a vector, which can be done in a relatively efficient way on the GPU [16] (a sketch of this operation is shown after this list). In other cases, this step implies solving a sparse linear system of equations.

Orthogonalization of the basis vectors. This part of the computation can represent a high percentage of the total computing time, and is well suited for execution on the GPU, as it involves working with contiguously stored data, taking advantage of spatial locality.

Solution of the projected problem. This step implies dense computations. These problems are usually small and can be solved quickly on the CPU, but sometimes they reach sizes of several thousands, which requires their parallelization, an ideal case for using GPUs.
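As an illustration of the first operation in this list, the following C sketch computes the sparse matrix-vector product y = Ax for a matrix stored in the common compressed sparse row (CSR) format. It is a generic example written for this text; it does not reproduce the actual PETSc/SLEPc kernels.

    /* y = A*x for an n-by-n sparse matrix A in CSR format:
       rowptr has n+1 entries, and colind/val hold the column index
       and value of each nonzero, row by row. */
    void spmv_csr(int n, const int *rowptr, const int *colind,
                  const double *val, const double *x, double *y)
    {
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int j = rowptr[i]; j < rowptr[i + 1]; j++)
                sum += val[j] * x[colind[j]];
            y[i] = sum;
        }
    }

Each row is processed independently, which is what makes this operation amenable to GPU execution, with one thread (or group of threads) assigned to each row.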


1.2 Objectives

In accordance with the context discussed, in this thesis we tackle several paths to try to improve the performance of SLEPc computations by means of graphics processing units. The main goal of the dissertation is thus the acceleration of SLEPc solvers via parallel implementations of linear algebra algorithms on GPUs.

One of the targets of the thesis, concerning the expansion of the subspace, is to make progress in the efficient solution of linear systems of equations on the GPU, with both direct methods [67,78,145] and preconditioned iterative methods [89,112,113]. The main obstacle to reaching good efficiency when using direct methods on the GPU is the triangular solve [67]. This issue also appears in the case of incomplete factorization preconditioners [113]. In order to achieve good performance in the orthogonalization, it is essential to redefine the methods to work in a block-oriented fashion, that is, to operate with collections of vectors instead of treating them individually. It is not a straightforward modification and requires a redesign of a significant part of SLEPc. Last, solving the projected problem on the GPU allows for interesting contributions, as solving dense eigenvalue problems on GPU is still an emergent topic [52,56,138,144] and there is plenty of room to propose innovative solutions.

In any algorithm or development proposed, it is important to minimize the data transfers between GPU and CPU, as they are a well-known bottleneck. It is also important to provide multi-GPU support, that is, being able to use all the available GPUs of a system, wherever they are, in multiple nodes or in a single node [144].

1.3 Structure of the document

The order in which the work is presented along the chapters of this thesis follows the natural timeline of its creation. Throughout these years, both the supercomputers used in the development and testing performed, and the software versions used, have experienced major changes. Some of the supercomputers used have increased their computing capacity, and others have been decommissioned and replaced by newer ones. In the same way, several characteristics of the software used have been superseded by better ones in the latest versions. Because of this, it would not be fair to establish a comparison between results from different experiments.

In Chapter 2 a brief historical review of the evolution of parallel computing is given. In it we discuss the progress in the manufacturing process and in hardware architectures. We show the main parallel architectures and their programming models, and present GPUs as hardware accelerators.

In Chapter 3 we introduce the eigenvalue problem with which we will work and give a short overview of the different methods for solving it. We present the SLEPc and PETSc libraries, showing design and implementation details that are important for the use of GPUs. We also comment on the most robust algorithms implemented in SLEPc for solving eigenproblems, and on how using GPUs with them can reduce the execution time. In this chapter we show the first results of the use of GPUs as accelerators with a scientific application.

In Chapter 4 we focus on solving eigenvalue problems for matrices with a block-tridiagonal structure. We optimize the computation of eigenvalues in the exterior of the spectrum with the development of a kernel that exploits the storage structure used. For computing interior eigenvalues we develop several implementations of algorithms that solve linear systems of equations with that structure, and show the gain provided by the GPUs.

In Chapter 5 we work with methods for computing functions of matrices which, as we have already said, is an emerging field with a great potential for improvement. We study and implement dense methods for computing matrix functions, and sparse methods where the action of a matrix function on a vector is computed. We also provide a comparison of the executions on CPU and on GPU.

Finally, in Chapter 6 we summarize the main contributions of this thesis, and report on the publications generated and the transfer activities performed. Lastly, we outline some possible research lines to follow.


Chapter 2

High-performance computing

Festina lente

Scientific research and engineering provide challenging problems from the point of view of computing. Climate, economic or molecular modeling, weather forecasting, and physical simulations such as vehicle aerodynamics or the development of earthquake-resistant structures imply complex computations on very large datasets, in some cases with limited response times. Such demands make those problems not addressable with general-purpose computers, and are at the root of the development of special computers able to cope with them.

Supercomputers are those special, highly performing machines, frequently designed with new architectures and developed with the latest technology available, which is often developed ad hoc to improve their capabilities. High-performance computing (HPC) involves not just a big and fast computer, but all the necessary elements that make it able to solve very demanding problems in an efficient way. It brings together several elements, such as computer architecture, electronics, algorithms and programming. The software used on supercomputers plays a key role in HPC in order to effectively exploit the computing power.

In this chapter we present the concept of high-performance computing, and show the evolution that it has gone through until reaching its current status. Several computer architectures coming from different classifications and the two main parallel programming paradigms are described. We place emphasis on existing accelerators, providing a detailed view of GPUs, the ones used in this work. Finally, we introduce and explain some performance factors and indicators, like floating-point operations per second (FLOPs/s), time, speedup, efficiency and scalability, which are used in the following chapters.


2.1 Hardware evolution

In the early days of computing, scientific computing was the driving force in the development of new computing systems and, reciprocally, it benefited from those new systems. Both of them have changed dramatically during their few years of life. The constant improvement of computers has allowed them to increase their performance by several orders of magnitude, moving from solving simplified scientific problems to being able to solve more realistic ones. The roadmap to improve computing performance has taken multiple parallel lines, among them the improvements in the manufacturing process, in the architecture of the computing systems, and in the development of software and specialized libraries.

The advances in the manufacturing process, such as the transition from vacuum tubes to silicon junction transistors, allowed the development of integrated circuits (microchips) based on complementary metal-oxide-semiconductor (CMOS) technology, which led to large-scale and very-large-scale integrated microprocessors. The miniaturization of the processors made it possible to reduce the voltage used and to reach working frequencies in the gigahertz range. The evolution of the technology has well-known physical limits, though. The ultimate limit of the downsizing of silicon transistors is about 0.3 nm, corresponding to the distance between atoms in silicon crystals. Current microprocessors have transistors manufactured with a process in the range between 14 and 10 nm¹, and the expected evolution is to soon reach the downscaling limit around 5 nm [72]. The propagation speed of electromagnetic waves is also a well-known limit for signal transmissions.

Parallel computing refers to the ability of the hardware to carry out several instructions simultaneously. The simultaneity can be the execution of several independent processes or of several operations within a process, which can come from the multiplexing of the process into several execution threads or just from the parallelization of different instructions. Different levels of parallelism, such as parallelism at the bit, instruction, thread, task or data level, were generated with the improvements in the architectures. These levels try to increase the rate at which the elements of each level (bits, instructions, etc.) are processed. At bit level, the increase of the word size allowed the processors to operate with more information per instruction, and increased the amount of addressable memory. Although larger word sizes have been used, standard modern processors use word sizes of 32 or 64 bits.

The execution (or functional) unit is the component of the processor that performs the operations instructed by the program. On a simple processor architecture design, such as a scalar processor, one instruction is executed on each clock cycle. The development of pipelined and superscalar processors allowed this rate to be increased. An instruction pipeline consists in the partition of the instructions into a series of sequential steps that are performed by multiple execution parts coming from the segmentation of the execution unit, in a similar way to the steps carried out on a production line.

¹Intel launched its first 14 nm processor in 2014, and IBM in 2017; the first 12 nm processors were launched in 2017, with AMD following in 2018; Samsung launched its first 10 nm processor in 2017.

Each one of these execution parts is responsible for a particular task, like fetching, decoding or executing the instruction. Once fully fed, the pipeline gives a throughput of as many instructions per clock cycle as partitions originated from the segmentation of the execution unit. Superscalar processors make it possible to complete several instructions per clock cycle by increasing the number of available execution units. Nowadays, most computers combine these two techniques of instruction-level parallelism in pipelined superscalar processors.

The programming of a single-processor computer is serial, but different processes can be computed concurrently by time-sharing the CPU. More performance on such a machine can be obtained by increasing the processor's working frequency. But the dynamic power consumed by a switching circuit,

P = CV²f,    (2.1)

increases linearly with the frequency, where C is the capacitance (approximately proportional to the chip area), V is the voltage, and f is the frequency. Also, as a consequence of reaching the nanometre scale, the leakage current of the transistors became an important issue, coming to represent 50 % of the total power consumption [100]. Dissipating all that heat from the chip, to keep it within a safe operating temperature range, is a major constraint in the development of integrated circuits. Once the gigahertz range was reached, the leakage currents and the heat made CPU manufacturers turn their developments towards a different approach.

The increase of the processors' working frequency soon revealed an important constraint on computer performance. While the processing capabilities were growing, the time to fetch data from memory did not improve equally, and became much greater than the time to process the same data. This issue, known as the memory wall, limits the performance, as the processor remains idle while waiting for the data. The problem can be addressed with the use of intermediate memories placed between the CPU and the main memory, and faster than the latter. Cache memories store data of recent operations and instructions to accelerate subsequent accesses to them. Obviously, the data has to be reused to benefit from the faster access. Normally, several levels of cache memories with different speeds are used. Level 1 is the closest one to the processor, and is the fastest and the smallest one in storage capacity. Each successive level grows in size and becomes slower as it moves away from the CPU.

Another old technique to reduce the impact of the memory wall is the use of hardware multithreading. Superscalar processors can implement simultaneous multithreading (SMT), in which a pipeline stage fetches instructions from several processes (or threads) in the same cycle, and alternates their processing. This thread-level parallelism hides the memory latency and improves the processor utilization by reducing the idle waits. The execution of a specific instruction can load into the cache some data also needed by other concurrent threads, speeding up their execution, but at the same time it may produce or increase cache thrashing.

Task-level parallelism can also be achieved by adding redundant computing elements, in a similar way as it is done in superscalar processors. The addition of several processors to the same machine was implemented on early computers, making it possible to compute different processes in parallel. But a larger step was the assembly of several computational cores into a single processor, entering the multicore era [133]. In 2001, IBM anticipated the frequency ceiling faced three years later by launching the first (non-embedded) general-purpose multicore processor, the POWER4. The cores of such a multicore processor are fully independent processing devices, allowing several instructions to be processed in parallel on the available cores.

The path to higher performance by means of the explicit parallelism of multicore processors implies that, on the same integration scale (and with the same voltage), the frequency and power consumption per core must be reduced to keep the processor under the heat dissipation limits. This frequency reduction implies manufacturing multicore processors with less performance per core than previous single-core processors. By reducing their frequencies, multicore processors also alleviate the memory wall problem a bit, reducing the latency gap. On the other hand, the memory bandwidth requirements grow with the cores competing for the shared resources, a situation that is aggravated by the use of hardware multithreading to boost core utilization.

The coexistence of several cores in the same die implies the use of inter-core communication mechanisms, which even with a small number of cores become a serious issue, due to the huge impact they can have on performance. The simple solution of using a common bus facilitates cache coherency, but increases the latency, reduces the bandwidth, and limits the scalability. The need for an efficient inter-core communication mechanism made the design paradigm of network on chip (NoC) [103] emerge. It is a communication solution based on modular packet-switched mechanisms, and implies the use of network topologies, switching techniques, and routing algorithms as in standard computer networks, but at the chip level. The mesh topology is often used in NoCs because it provides simple routing and low network overhead, making it very scalable.

The multiple cores of a processor do not necessarily share the same design and/or functionality. Heterogeneous architectures like the Cell Broadband Engine [54] or big.LITTLE appeared, which include different types of cores in the same processor. The former integrates a single general-purpose computational core with several coprocessors to accelerate multimedia computation; the latter combines cores with reduced performance and energy requirements with others having the opposite characteristics. Other possible features of these processors include the use of different working frequencies for groups of cores, or the implementation of different instruction sets. Most of the processors manufactured nowadays belong to the multicore family, and they are used in pairs, or more, to form the computational core of a single HPC machine.

A different level of parallelism is data-level parallelism, which consists in distributing the data among several processors, so that they can run the same operations in parallel over different subsets of the data in a synchronous way. This technique of parallelism is widely exploited by MPI and CUDA programming, as we will see later.


Supercomputers incorporate and combine most of these parallelization techniques and exploit them to the limit. Current supercomputers [131] are formed by thousands of nodes with several processors and accelerator devices with multiple computational cores. The interconnection network of a supercomputer is as important as the individual computing power of its nodes. The type of network used implies a specific bandwidth and latency which, together with the topology and the routing employed, define the scalability limits of the supercomputer. Current trends in computer development try to replace silicon transistors with transistors based on different materials, or to advance the production of computers based on quantum computing.

2.2 Parallel architectures

As we have seen, the increase in raw computing performance comes from the parallelization techniques used. The resulting parallel computers can be classified according to different criteria, such as the number and type of the processors used, the mechanisms used by the processors to communicate with each other, or the existence of a central control mechanism. Here we present different parallel architectures based on the last two classifications.

2.2.1 Central control mechanism

One of the most recognized and used classifications of computer architectures is Flynn's taxonomy [44], which establishes four categories of computers based on the existence of a central control mechanism that orchestrates the (possibly multiple) instruction and data streams. Parallel computers are characterized by the classes with multiple data streams, and current computers combine both classes.

Single instruction stream on multiple data stream (SIMD)

In this type of computer, a single control unit instructs multiple execution units to perform the same instruction in parallel over different data streams, providing data-level parallelism. It encompasses vector processors, such as the ones used in Cray machines, and the vector instruction sets of current processors such as SSE (Streaming SIMD Extensions), 3DNow! or AltiVec.
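As a minimal illustration of the SIMD model (an example written for this text, not taken from the thesis), the following C fragment uses SSE intrinsics to add four single-precision values with a single instruction:

    #include <xmmintrin.h>  /* SSE intrinsics */
    #include <stdio.h>

    int main(void)
    {
        float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
        float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
        float c[4];

        /* One add instruction operates on four packed floats: a single
           instruction stream applied to multiple data elements. */
        __m128 va = _mm_loadu_ps(a);
        __m128 vb = _mm_loadu_ps(b);
        __m128 vc = _mm_add_ps(va, vb);
        _mm_storeu_ps(c, vc);

        printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]);
        return 0;
    }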

Multiple instruction stream on multiple data stream (MIMD)

This class of computers has one control unit per execution unit, which allows them to run different instructions on different data streams in parallel. The instruction-data stream pairs are totally independent of each other, but can be coordinated. Multicore computers, which provide task-level parallelism, are an example of this class.


2.2.2 Communication mechanism

According to the communication mechanism used between the processors, computers can be classified into shared-memory and message-passing architectures.

Shared memory

This architecture is characterized by having a common memory address space that is accessed by all the processors through an interconnection network. Although it is the most common implementation, the shared-memory concept does not imply sharing a common physical memory; it can be a logically shared memory composed of multiple physical memories distributed among the processors. In this architecture, different processors communicate with each other by writing and reading information in the shared memory. This shared access makes it necessary to use control mechanisms to avoid race conditions, in which several processors operate on the same memory address at the same time and at least one of them writes to it. Also, due to the use of local cache memories by the processors to improve the performance, this architecture usually implements mechanisms to ensure the coherence between them. As several processes can have the same variable stored in their cache memory, if a processor modifies the value of the variable in its local cache, the change must be propagated to the other cache memories, or invalidated in them. Although all the processors have access to all the memory, the access time for a specific address is not necessarily the same for all the processors, as it can depend on the physical distribution of the processors and the memory banks. Shared-memory computers can be classified based on the existence (or absence) of homogeneity in the access time.

Uniform memory access (UMA)

In this class of computers, the latency and bandwidth when accessing any address of the memory are the same for all the processors.

Non-uniform memory access (NUMA)

In this other class, the time needed by a processor to access different addresses varies depending on physical factors. In the same way, different processors can obtain different latencies and bandwidths when accessing the same memory address.

Message passing

In this architecture, each processor has exclusive access to a local address space, and the communication with other processors is done by explicitly sending messages through an interconnection network. The messages can be used to transmit data and to synchronize different processors. In the same way as in shared memory, here the non-shared-memory concept is also logical, although it usually implies a physical distribution of the memory. The communication between the processors is a key factor in this architecture, and its management is as relevant as the control mechanism necessary in shared-memory architectures. Given its distributed nature, message-passing computers can be very scalable, depending on the type of interconnection network used, and allow implementations with a very large number of processors. Computers of this class usually also have a local shared-memory architecture.

2.3 Programming models and parallel software

Hardware improvements are tightly bound to parallel programming and to an equivalent software evolution that maximizes the performance capabilities offered. The final goal of parallel programming is the execution of the software in less time than a serial version. The reduction in time is achieved by fully employing all the available computational resources. If a serial program is executed on a parallel computer, the transparent parallelism at instruction level or thread level is the maximum level of parallelism that it can benefit from, as higher levels like task-level or data-level parallelism imply modifications in the program. An efficient parallelization of a program is a difficult task and usually requires a good knowledge of the architecture where it is going to be executed. Parallel processes replicate the executed program and are run separately on different computing elements.

2.3.1 Threads and OpenMP

Almost all operating systems allow programming with a shared-memory model by providing interprocess communication (IPC) mechanisms. They enable different processes to communicate with each other through the memory. Another mechanism consists in creating multiple execution threads within a single process, which can run concurrently, or in parallel if the hardware allows it. There exist multiple application programming interfaces (APIs) that allow the users to employ this multithreading model.

The portable operating system interface (POSIX) standard [70] defines the specification of a threading API, pthreads (POSIX threads), that has been implemented in many operating systems, both Unix and non-Unix. Pthreads defines a low-level set of C language types and functions that allow the user to manage a parallel execution. The programmer has to decompose the work to be done and must explicitly manage the creation of the threads and the access to the memory, by means of condition variables, mutexes, synchronization barriers and locks. This API is broadly used by operating systems, but is less frequent in generic applications, as it requires a considerable programming effort.

Another specification for multithreaded programming is OpenMP [28], which defines a set of compiler directives, an API, and environment variables to manage a high-level parallelization. OpenMP includes a runtime environment that provides an automatic management of the threads, releasing the programmer from that task. It is a multi-platform solution available for the C, C++ and Fortran languages, and its main use is through compiler directives (#pragma) that allow the programmer to incrementally specify parallel regions within the code. This enables the programmer to focus on parallelizing first the most time-consuming parts of the algorithm. Another interesting feature is that it can use accelerators like digital signal processors (DSPs) or graphics processing units (GPUs), by defining regions of code in which the data and/or the computation are moved to be processed on them. All these features have made OpenMP the de facto standard for shared memory.
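As a minimal sketch of this directive-based model (an illustrative example, not code from the thesis), the following C program parallelizes a loop with a single OpenMP directive, leaving the rest of the code identical to the serial version; it is compiled with the compiler's OpenMP flag (e.g. -fopenmp):

    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    static double x[N], y[N];

    int main(void)
    {
        double a = 2.0;

        /* The directive asks the runtime to distribute the loop
           iterations among the available threads. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            y[i] = a * x[i] + y[i];

        printf("Ran with up to %d threads\n", omp_get_max_threads());
        return 0;
    }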

2.3.2 Message Passing Interface

With the use of multi-processor machines, multiple message-passing systems proliferated among supercomputer manufacturers and academia. Those first implementations were incompatible with each other, and even hardware-specific. The existence of such a great variety of systems made the development of parallel software very costly. The developers had to use the solutions offered by the vendors, and the migration to a different machine involved a complete rework of the software. This situation made it necessary to develop a unified standard system for parallel computing on distributed-memory systems.

Message Passing Interface (MPI) [98] is that standard library interface specification, focused on the message-passing parallel programming model. It was born as the result of a standardization effort that selected the most prominent characteristics of the different message-passing systems existing at the time, as none of them was the ideal solution, but all had their own strengths.

The specification includes C and Fortran bindings, and its portability, efficiency and flexibility are some of its strong points. It allows MIMD programs, although it is common to use it with a single program multiple data (SPMD) approach. MPI provides derived datatypes, which allow forming new datatypes from previously defined ones; process groups, which establish an order among processes by using identifiers; communication contexts, which map a group of processes to a communication namespace; point-to-point communications, in which only one sender and one receiver take part; collective operations, in which all the processes of a communication context participate; process creation and management operations; one-sided communications; and parallel file input/output, among others. MPI is the de facto standard for distributed-memory environments, for its portability and scalability.

Datatypes MPI defines several basic datatypes with a correspondence to C and Fortran datatypes, like MPI_CHAR (char), MPI_INT (signed int) and MPI_FLOAT (float) for C, and MPI_CHARACTER (CHARACTER(1)), MPI_INTEGER (INTEGER) and MPI_REAL (REAL) for Fortran. Other basic datatypes like MPI_BYTE and MPI_PACKED, that have no correspondence with the language used, are also included. MPI_BYTE is included as an uninterpreted type that has the same meaning on all machines (eight bits), as opposed to a character, that may have different representations on different machines. MPI expects the data transmitted on a message to be formed by consecutive elements of a single type. This characteristic would result in a limitation when having to transmit different types of data, or non-consecutive elements of the same

type. The use of the type MPI_PACKED, to explicitly pack multiple sequences of consecutive elements of different types and lengths into a single buffer, to be sent in a single transmission, was a first attempt at avoiding the issue. This approach was followed to maintain compatibility with previous libraries. A drawback of the MPI_PACKED strategy is that it requires additional memory-to-memory copies on the sender (to the packed buffer) and on the receiver side (from the packed buffer). Derived datatypes are other datatypes defined in MPI that allow the programmer to avoid the explicit packing and unpacking of MPI_PACKED, and to elude the memory copies. Derived datatypes allow the programmer to transmit, in a single transfer, objects composed of variables of different types and sizes, and not necessarily stored contiguously in memory. They are formed from basic datatypes, and once defined, the new types can also be recursively used to form new derived types.
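A minimal sketch of this mechanism (illustrative, not taken from the thesis; the function and array names are hypothetical) builds a derived datatype that describes one column of a row-major n x n matrix, so that the non-contiguous column can be sent in a single transfer without explicit packing:

#include <mpi.h>

/* Send column 'col' of the row-major n x n matrix A to process 'dest' */
void send_column(double *A, int n, int col, int dest, MPI_Comm comm)
{
    MPI_Datatype column;

    /* n blocks of 1 element each, separated by a stride of n elements */
    MPI_Type_vector(n, 1, n, MPI_DOUBLE, &column);
    MPI_Type_commit(&column);
    MPI_Send(&A[col], 1, column, dest, 0, comm);
    MPI_Type_free(&column);
}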

Groups and communicators All the processes in MPI are uniquely identified by their rank, an integer index that starts from zero and increases without jumps. This rank is unique within a group, which represents an ordered set of processes. There can be as many groups as desired in a single program execution, as a group can be composed of any number of processes, and can include the same set used in another group. Groups are mapped to communicator contexts and identify the processes involved in their communications. Communicator contexts are the tool used by MPI to partition the communication realm into subdomains. As with groups, there is no limit to their number. Within a specific context, the processes can communicate with each other without collision with communications of other contexts, even if the sets of processes overlap. After the program initialization, a process can establish communication with two predefined communicators: MPI_COMM_WORLD, which includes all the existing processes, and MPI_COMM_SELF, which includes only itself. Each message sent must explicitly name the context to which it belongs, and set the source and destination processes by using their ranks within the corresponding group. The specification includes operations to consult the rank of a process in a particular context, and the size of the group associated with that context.
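The following hedged sketch (illustrative values, not code from the thesis) queries the rank and size in MPI_COMM_WORLD and then derives a new communicator that groups processes by the parity of their rank:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, subrank;
    MPI_Comm parity_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Processes with the same color (rank parity) share a communicator */
    MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &parity_comm);
    MPI_Comm_rank(parity_comm, &subrank);
    printf("Global rank %d of %d, rank %d within its parity group\n",
           rank, size, subrank);

    MPI_Comm_free(&parity_comm);
    MPI_Finalize();
    return 0;
}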

Point to point communications The simplest form of communication involves only two processes, one of them acting as the sender and the other as the receiver of the message. For the communication to work, both processes must belong to the same communicator, and they must explicitly indicate it on the call. The programmer has to consult and use the ranks of the processes in the communicator to establish the direction of the message. Although it is not necessary2, it is common to use a conditional clause so that the process with the desired rank acts as the source of the message and invokes the appropriate send call. In the same way, the process with the rank expected to be the destination,

2There exist single calls that fuse the send and receive operations, like MPI_Sendrecv.

has to invoke the receive call. The calls also have to indicate a tag that allows differentiating between multiple messages with the same origin and destination pair on the same communicator. The coordination of the calls is essential: if one of the processes misses the call, the message will not be transmitted. Those four elements (source, destination, tag and communicator) are the necessary data that allow the system to transmit the message. But besides them, the calls must also agree on the amount of data to be transmitted. The processes have to indicate the datatype, by using any valid MPI datatype, including derived types, and the number of elements of that datatype to be transmitted. As long as the amount of data specified on the destination process is not smaller than on the origin, they can use different counts. If the destination buffer is bigger than the data transmitted, the remaining memory positions remain unmodified, but if it is smaller, an overflow occurs. By using derived datatypes, it is possible to store the data with a different shape (or object abstraction) in the destination process than in the origin, as long as the underlying basic datatypes match on both calls; the layout in memory can vary. Point to point communications can be done through blocking or non-blocking calls. Blocking sends are those that do not return until the buffer used in the call is no longer needed and can be modified. Blocking receives do not return until the message has been fully stored in the destination buffer. These blocking calls limit the performance of the computation, as it is stopped while the communication is being done. On the contrary, non-blocking calls allow overlapping the communication and/or the computation, with the cost of needing to verify the completion of the non-blocking send and receive before using the buffers.
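A hedged sketch of a blocking exchange (the tag, counts and buffer are illustrative, not taken from the thesis) could look as follows, with rank 0 sending an array of doubles to rank 1:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    double buf[100] = {0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Send(buf, 100, MPI_DOUBLE, 1, 42, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Status status;
        /* The receive count may be larger than the message actually sent */
        MPI_Recv(buf, 100, MPI_DOUBLE, 0, 42, MPI_COMM_WORLD, &status);
        printf("Rank 1 received the message\n");
    }

    MPI_Finalize();
    return 0;
}

Replacing MPI_Send/MPI_Recv with MPI_Isend/MPI_Irecv plus MPI_Wait would give the non-blocking variant, at the cost of having to check the completion before reusing buf.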

Collective operations Collective operations involve, and must be invoked from, all the processes in a communicator. Most of the available collective operations are transmission communications, but there also exist barriers to synchronize the processes, and reduction operations. Any program can be made out of point-to-point communications exclusively, but there are cases in which using them makes the code excessively complex or tedious to program. Collective operations provide a higher level of abstraction, simplify the code, and allow the libraries to provide several implementations of the operations, optimized for different network topologies. Different data flows are possible when using collective operations. If there is a single origin or destination process of the data within the operation, it is called the root process. It is the case of One to All operations, like MPI_Bcast or MPI_Scatter, which transmit some information from a single root process to the rest of the processes in the group, and All to One operations, like MPI_Gather or MPI_Reduce, which transmit information from all the processes in the group to the root process. All to All operations, like MPI_Allgather or MPI_Allreduce, transmit information from all the processes in the group to all the processes. As in the case of point-to-point communications, there exist blocking and non-blocking versions of the collective operations.
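As a hedged illustration (values and variable names are hypothetical), the root can broadcast a parameter and then reduce the partial contributions of all the processes:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    double alpha = 0.0, partial, total;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) alpha = 1.5;   /* value initially known only on the root */
    MPI_Bcast(&alpha, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    partial = alpha * rank;       /* illustrative local contribution */
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) printf("Sum of all contributions: %g\n", total);
    MPI_Finalize();
    return 0;
}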


2.3.3 Software

The development of applications experienced an evolution of a magnitude similar to the hardware transformation. Several milestones of software development were the adoption of open standards, the use of software libraries, object-oriented programming, parallel computing, and free and collaborative software. Standards, understood as formal documents that describe and establish requirements and procedures to unify criteria, not as the dominant position in the field, have been a huge step in the evolution of computing systems. Standards allow the interoperability of systems, by setting formats and protocols to be followed for exchanging information. The descriptive term open, of open standards, is significant, as the specification has to be, at least, public to encourage its adoption, and must not limit its implementations or use, now or in the future. Another desired connotation of open is to also accept being modified, to fix or extend the specification, with a consensus among the affected parties, whoever they are. In computing, from the point of view of the software evolution, the desired scenario is to use free software on the local machine, and open standards to transmit information and communicate with other software on remote computers. A library is a compendium of software, written in a specific programming language, that specializes in solving a problem or family of problems, and provides an interface so that higher-level programs can use its routines. Years ago, if a development team worked on several applications with common functionalities, they could share that common portion of the code between the programs, just by copying the appropriate lines. If changes needed to be made on that part of the code, they had to be replicated on the multiple applications using it, on every single change, to maintain coherence. The most they could do to alleviate the issue was to isolate the common code in a file that could be shared by the applications. If a different team wanted to solve the same problem, and unless the original team had published the code of its implementation, it had to face the complex, time-consuming, and error-prone task of implementing that functionality by writing a similar code from scratch. Different implementations usually have different interfaces for the same functionality. That situation involved a considerable amount of redundant work in the software development area. Additionally, the programming errors in the algorithm and in the implementation depended completely on the expertise of each development team. If the functionality needed was not in the field of knowledge of the developers, the quality of the final product could be affected. The employment of libraries provides the opposite situation. The code is developed once and reused in countless projects. Libraries offer an abstraction layer to the applications, which simplifies the code when a specific functionality that they provide is needed. A multi-line ad-hoc implementation can be replaced with a single call to the library, improving the readability and maintainability of the application. The development of a library is usually done by experts in the field that can optimize the algorithms, and any application software using it automatically benefits from its improvements in performance or accuracy. Even novel applications can provide

good functionality by relying on the well-known behaviour of mature libraries. A single standard can have multiple implementations in the form of competing libraries that share the same interface. The programmers benefit from that circumstance, being able to migrate between different implementations without the need of modifying the application code. The common procedure is to develop the software in a modular fashion, by using libraries specialized in a target field. A key element in scientific computation is the numerical software, which relies widely on libraries providing computational kernels used to solve higher-level operations. As in this thesis we focus on solving linear algebra problems, we turn our attention to software libraries that implement related algorithms. According to the density of the data (vectors or matrices) involved, linear algebra problems can be classified as dense or sparse. For dense problems, we must highlight two libraries that have laid the foundation for the subsequent linear algebra libraries developed, as they are not just plain libraries, but also act as specifications, and are the de facto standard of a common interface to perform linear algebra operations. These libraries are BLAS and LAPACK. Basic linear algebra subprograms (BLAS) [87] are a set of routines developed originally in Fortran. Their development started by identifying low-level recurrent mathematical operations, called kernels, that appear in the most common algorithms of numerical software and consume most of the execution time. Those operations were provided with an interface and used as basic building blocks to perform more complex operations. For the same routine, BLAS defines the interface for real and complex arithmetic, both in single and double precision. Routines are organized in three levels, which have gradually extended the library, grouped by the type of data they work with.

Level 1: composed of commonly used vector operations on strided arrays. They involve O(n) floating-point operations and O(n) data elements accessed, n being the length of the vectors. The performance obtained with operations of this level is limited by the poor ratio of floating-point operations to data movement. A typical operation of this level is the addition of two vectors, y ← αx + y.

Level 2: composed of matrix-vector operations. They involve O(n²) floating-point operations and O(n²) data elements accessed, n being the dimension of the matrix. This set of operations tries to exploit the memory hierarchy of the computers by reusing data stored in cache memory, thus avoiding duplicated data movements from the main memory. But in the same manner as the level 1 operations, the performance of this level is memory bound, as the ratio of floating-point operations to data movement is the same. A typical operation of this level is the matrix-vector product, y ← αAx + βy.

Level 3: composed of block-oriented matrix-matrix operations. This level encourages block-oriented algorithms to fully reuse the data of the blocks stored in cache memory. This is the level with the best performance, as the routines involve O(n³) floating-point operations and O(n²) data elements accessed,


n being the dimension of the matrices. This makes the performance be determined by the CPU instead of by the memory. A typical operation of this level is the matrix-matrix product, C ← αAB + βC.

Linear algebra package (LAPACK) [6] is an efficient and portable library for solving the most common linear algebra problems encountered in numeric computations. It originated from two preexisting and widely used linear algebra libraries, LINPACK [37], oriented to the resolution of linear systems of equations and least squares problems, and EISPACK [123], focused on solving eigenvalue problems. The algorithms used in these two libraries were not developed taking into account that the cost of accessing the memory can limit the performance of the software, so they are not data-reuse oriented. LAPACK is a collaborative project that employs more modern design methodologies, simplifies the algorithms to be more modular, and obtains better performance with code structured to reuse data in faster memories and to reduce the movement of data from/to main memory. It includes and expands the functionality of LINPACK and EISPACK by solving systems of linear equations, least squares solutions of linear systems of equations, eigenproblems and singular value problems. It also performs the related matrix factorizations like LU, Cholesky, QR, singular value decompositions, and Schur decompositions. LAPACK routines are divided in three classes:

Computational routines: perform specific computational tasks like matrix factorizations.

Driver routines: based on the computational routines, solve a full problem like a linear system, an eigenvalue problem, etc.

Auxiliary routines: perform auxiliary tasks for the computational routines. Operations similar to BLAS but not present in it, or extensions of BLAS routines.

LAPACK routines are built using BLAS, so they benefit from its portability, and the performance of LAPACK is linked to the implementation of BLAS employed. Level 3 BLAS is used when possible. The usage of LAPACK is similar to the usage of BLAS, and in the same way as BLAS does, LAPACK also defines interfaces for real and complex arithmetic, both in single and double precision. The reference implementations of BLAS and LAPACK developed in Fortran are serial. Other numerical libraries developed to run in parallel are ScaLAPACK (Scalable Linear Algebra PACKage) [19], which provides high-performance linear algebra routines for parallel distributed-memory architectures; PLASMA (Parallel Linear Algebra for Scalable Multicore Architectures) [23], focused on multicore architectures; MKL (Math Kernel Library) [71], a proprietary library optimized for Intel processors; or OpenBLAS [142], an optimized free software library that implements BLAS with a performance comparable to the Intel MKL. BLAS considers only dense computations in which all the elements of the corresponding operands are involved. In such a case, both vectors and matrices are stored in memory with a dense format. However, as scientific problems often entail working with very large and sparse matrices, using a dense storage format incurs

an excessive storage and computational overhead. The structure of the matrix becomes relevant for sparse matrices, as the performance obtained is directly related to the performance of the matrix-vector product, which in turn is determined by the matrix structure and the storage scheme used. Structured matrices like diagonal, band, triangular or even block-tridiagonal matrices allow the use of specific representations that save both memory and computational effort. Commonly, the particular type of problem to be solved indirectly determines the storage scheme to be used. Eventually, the use of sparse storage formats concerned with unstructured sparse matrices was addressed by the BLAS Technical Forum, which published an interface to operate with dense-sparse operands3. Sparse storage schemes represent only the nonzero4 elements of the matrix and store them contiguously in memory, so the memory required is significantly reduced with respect to the dimension of the matrix, the access to the elements is fast, and zero elements are not involved in the computation. Each storage scheme intends to optimize the most frequent operations, like the matrix-vector product, the factorization of the matrix or the extraction of the diagonal elements. We name here some of the sparse matrix storage schemes.

Compressed Sparse Row format (CSR). It is the most commonly used storage scheme. It stores only the nonzero values, without caring about the sparsity pattern of the matrix. It uses three arrays: one stores the values in row-major order, a second one stores the column indexes, and a third one stores pointers to the beginning of each row. Row indexes are not stored explicitly. (A sketch of a matrix-vector product with this format is shown after this list.)

Compressed Sparse Column format (CSC). It is also a very common scheme, analogous to CSR, but storing the values in column-major order.

Coordinate format (COO). It stores each element of the matrix using its Cartesian coordinates. It provides a very efficient sequential access, and an inefficient access to a specific element (search).

Block Sparse Row format (BSR). It is a variant of the CSR format for matrices with square dense blocks in a regular pattern. Instead of storing elements like CSR, it stores the dense blocks.

Modified Compressed Sparse Row format (MSR). It is a variant of the CSR format that stores the elements of the diagonal, that can be referenced with a single coordinate, in a separate array.

Diagonal format (DIA). It is a format specific for storing banded matrices. It stores the elements of each diagonal, in a bidimensional array of size d × n, where d is the number of diagonals containing at least one nonzero element, and n is the number of rows of the matrix. Its arrangement leaves unused memory on all but one of the diagonals.

3Sparse-sparse operations are not considered in the sparse interface. 4Some zero elements may be stored, depending on the storage scheme used.


Ellpack-Itpack generalized diagonal format (ELL). It stores the matrix in two bidimensional arrays, one to store the values and the other to store the column indexes. The size of these arrays is m × k, where m is the number of rows of the matrix and k is the maximum number of nonzero elements in any row. This number k determines the storage overhead. This format is most efficient with a homogeneous number of nonzero elements per row.
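As referenced in the CSR description above, the following hedged sketch (sequential, with illustrative array names, not taken from the thesis code) computes the matrix-vector product y = Ax for a matrix stored in CSR format using its three arrays:

/* val, colind and rowptr are the CSR value, column-index and row-pointer
 * arrays of a sparse matrix with n rows */
void csr_spmv(int n, const double *val, const int *colind,
              const int *rowptr, const double *x, double *y)
{
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        /* nonzeros of row i occupy positions rowptr[i] .. rowptr[i+1]-1 */
        for (int j = rowptr[i]; j < rowptr[i + 1]; j++)
            sum += val[j] * x[colind[j]];
        y[i] = sum;
    }
}

The irregular, indirect access to x through colind is precisely what makes the performance of the sparse matrix-vector product depend on the matrix structure and storage scheme.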

For parallel software on distributed memory architectures, there are multiple communication libraries implementing MPI; two of them are OpenMPI [49] and MPICH. Both are open-source libraries, oriented to high-performance computing and widely used on supercomputers. OpenMPI is an initiative that blends resources from several previous MPI implementations like FT-MPI, LA-MPI, LAM/MPI and PACX-MPI, aiming to reduce the fragmentation of the implementations and, at the same time, provide a high-quality library by selecting the best of each contributor. The MPICH implementation was originally developed to provide feedback to the MPI Forum during the MPI standardization. Today it is the reference implementation for the latest MPI standards, and it is common for manufacturers to fork the project to develop their own MPI implementations, as it is the first library that implements new features.

2.4 Hardware accelerators

Hardware accelerators are specific pieces of hardware designed to perform certain tasks very efficiently, allowing that work to be offloaded from the CPU. The first elements used to accelerate the performance of computers were processors separated from the CPU, referred to as coprocessors. Time-consuming operations like input/output penalize the performance due to the time the CPU waits idle for the data to be read or written. If those operations are delegated to a specialized device managed by the CPU, which instructs this coprocessor to carry out the desired operation, the CPU can carry on performing other computations. Floating-point operations are another example that on older computers could be done much faster in a mathematical coprocessor than in the CPU. Other accelerators specialize in network transmissions, like the TCP offload engines (TOE), or in digital signal processing. A variety of technologies have been used in a specialized way to accelerate different parts of the computation. Along with the evolution of the hardware, it is not uncommon that the coprocessor functionalities, like those provided by the floating-point units or the graphics processors, end up integrated in the CPU, completing the wheel of reincarnation [99]. Coprocessors can be simple processors operated by means of specific instructions belonging to the CPU instruction set, or fully independent processors. Here we present some of the multiple technologies that have been used to accelerate the computation.


2.4.1 Integrated circuits Application-specific integrated circuits (ASIC) are circuits customized to perform particular tasks very fast, instead of being a general-purpose solution. An ASIC can implement the full logic of a microprocessor, including ROM and RAM memories or other components like network modules. Such integration is referred to as a system-on-chip (SoC). Programmable logic devices like field-programmable gate arrays (FPGA) are integrated circuits composed of a grid of logic gates, designed to be programmed by the user, instead of by the manufacturer. These devices provide the flexibility of being reprogrammable even at run time. One of their traditional applications, given their multipurpose character, is rapid ASIC development, acting as low-cost prototypes. FPGAs are less dense and more energy demanding than ASICs, and do not reach their performance, but their development has reached a level of maturity with capabilities competitive with them. They have been widely used as digital signal processors (DSP) but can also implement more complex logic, like SoCs. FPGA cards are included in servers by manufacturers, and are intensively used in HPC [108] as fine-grained accelerators to speed up some (parts of) workflows, for their flexibility and low cost, and for the good performance per watt ratio that they provide, giving them an advantage with respect to CPUs or other devices like graphics processing units. Another characteristic that emphasizes their versatility is that developers can program them with hardware description languages like VHDL or by means of traditional software languages like C, C++, Python or OpenCL.

2.4.2 Manycore processors The natural trend in multicore chip design is to increase the number of integrated cores in a single die, but an additional level of scalability in the integration process is obtained by assembling several cores in tiles and several tiles on the same chip. Two different strategies are followed for homogeneous multicore processors, which allow us to make a distinction based on the type of cores that they use. A first family of manycore processors integrates a large number of simple low-performance cores running at not very high frequencies, with simple architectures, and with their own memories. This kind of processor is used as an accelerator, in the form of an external card connected to standard expansion buses like PCIe, or directly connected to a high-performance interconnection fabric that alleviates the bottleneck of using the PCIe bus when copying data to and from the memory of the devices. A typical example of this family of manycore processors is GPUs, which originated this category. A second family are those using complex high-performance cores. This allows maintaining a good speed on sequential programs and at the same time provides much performance when parallelizing the code. Standalone processors from IBM, Intel, AMD or ARM-based processors employed as CPUs are expected to eventually fit in this category. A good example of scalable manycore is the tile-based architecture

family of processors from Tilera5 [18] (successor of the Raw microprocessor [130]), which includes one ARM-based processor per tile; the tiles, arranged in a grid, are interconnected with an on-chip mesh network. The Xeon Phi processor from Intel is a similar product that, due to the changes it has experienced, somewhat lies in between both families. It is referred to as a many integrated core (MIC) processor, and as such it has a considerably large number of cores; in its first generations it was exclusively a coprocessor attached to the PCIe bus. The distinguishing feature of the Xeon Phi with respect to the first family of manycore systems is that its cores are full-performance multithreaded cores that share the instruction set architecture (ISA) of common CPUs (x86). Sharing the ISA makes MICs more general-purpose than GPUs, allowing them to be programmed as a normal general-purpose processor, without the need to port the software. The current generation of Xeon Phi combines two cores on a single tile and replicates the tiles on the die. Each tile has two vector processing units (VPU) per core and 1 MB of level 2 cache shared between the two cores. The tiles, which on previous generations were interconnected through a cache-coherent one-dimensional ring bus, now use a cache-coherent two-dimensional mesh to communicate with other parts of the chip. As other manycore products, current generations can be attached to a standard bus or directly to a specific interconnection fabric, but the Xeon Phi can also be used both as a coprocessor and as a standalone processor, capable of booting the operating system, so its capabilities are closer to the second family than to the first one. A different family of manycore processors are the heterogeneous systems, which integrate cores with different ISAs and capabilities. An important example of this family is the Chinese SW26010 [147], a heterogeneous chip used in the supercomputer Sunway TaihuLight [47], number one of the Top500 [131] from June 2016 to November 2017.

2.4.3 Graphics processing units The use of graphics devices as accelerators has followed a different path than that of the manycore processors evolved from the CPUs. Even before IBM developed its first multicore processor, graphics devices were already in use as massively parallel devices due to the parallel nature of graphics algorithms. The process of generating an image to be presented on a display consists of a series of sequential stages called the graphics pipeline [69]. The classical four stages of the pipeline are the vertex processing, the triangle rasterization, the pixel texturing and lighting, and the final merge of the generated shading with color and depth. As a pipeline, the output from one stage is treated as the input for the next, and although the processing is serial, some of these steps, like the vertex and pixel stages, perform floating-point operations without any type of data dependency, so with the appropriate hardware, the stage can operate on all the data in parallel.

5Now part of Mellanox.


With the first graphics devices, the applications interested in displaying an image had to render it using the CPU to go through the pipeline and, once formed, send it to the graphics card. That behaviour changed with the introduction of programmable graphics devices, which enabled the use of callback functions, known as shaders, that are executed on the device for each vertex, or pixel, on their respective stages, unloading the CPU from those operations. These devices used to have different processors specialized in each stage, causing processors to sit idle while the computation was taking place on another stage.

Early efforts in the field of general-purpose computation on graphics processing units (GPGPU) had started in 1980 with the Ikonas Raster Display System, a fully programmable graphics processor [41, 42] that, through the use of an ad-hoc high-level command language, the Ikonas display language, allowed the users to send programs to the devices to instruct the cards on how to process, in parallel, the polygons and images to be displayed [134]. But it was not until 2001, when OpenGL [76] and DirectX included support for programmable shaders, that a major transformation was experienced in the computer graphics world. Such a feature allows programmers to port data-parallel algorithms to the GPU in the form of shaders. In order to execute generic programs on the GPU, the data is moved to the memory of the GPU as a 2D texture, and a rectangle fitted to that texture is rasterized to produce a result that is stored in the framebuffer. The same year, NVIDIA released the first GPU with programmable pixel and vertex shaders, the GeForce 3, with which the first matrix-matrix multiplication on a GPU took place [85].

The next and even more important milestone in GPU computing was the release of the G80 architecture as part of the NVIDIA GeForce 8 series in 2006. The G80 came with the introduction of the Compute Unified Device Architecture (CUDA), the first GPU architecture specially designed as a general-purpose computing platform.

The adoption of CUDA made the GeForce 8800 a revolutionary device, being the first one that replaced the separate vertex and pixel specific-purpose processors with a single unified processor, in which each one of the pipeline stages is executed. Providing a unified processor avoids the idle cycles of the old dedicated processors when processing graphics, and enables the possibility of executing generic computing programs. CUDA-enabled cards replaced the pipeline model by adopting a single instruction multiple thread (SIMT) execution model, and changed the programmer's job from managing vertices and pixels through the use of callbacks and accessing the texture memory, to handling threads and accessing a shared, generic, global, byte-addressable memory, also used for inter-thread communications.

The fact that a common device included in almost any computer, like a graphics card, could be used as a general-purpose processor was accompanied by the support for the C and C++ programming languages, a feature that removed barriers to its adoption and widely increased the number of potential users, turning CUDA into the most popular GPU-computing platform.


CUDA overview

We focus now on the GPGPU perspective of CUDA, rather than on its graphics origin or functionality, but without forgetting them, as otherwise one could have the feeling that something constrains its general-purpose orientation. CUDA implies heterogeneous programming, involving CPU and GPU processors. The convention is to use the host and device names to denote the CPU and GPU domains respectively, and kernel to denote a function compiled to run on the device. As the GPU is an accelerator, a process cannot run exclusively on the device, but needs to be launched as a normal process from the host. Once the execution has started, when the process reaches a specific point in the code, it instructs the device to perform the computation of a particular kernel. Before and after calling the kernel, the host program can optionally copy data to and from the device memory to place operands and/or retrieve a result. From the point of view of the host code, the kernel call is asynchronous, making it possible to perform different computations in parallel on the CPU and the GPU. When talking about CUDA it is necessary to establish a distinction between the hardware architecture and the associated software platform. Different generations of the hardware not only have different designs but also support different features, which are specified by their compute capability version. The software platform allows code to run on CUDA devices, independently of the hardware generation. It provides support for hardware-agnostic features and others that are compute-capability specific. The physical design of the architecture follows a hierarchical structure, having the Streaming Processors (SP) as its computational core, on which the threads are executed. The card's layout is formed by several Graphics Processing Clusters (GPC), each of them having multiple Streaming Multiprocessors (SM), which are the ones that contain the SPs. Unlike previous graphics architectures that employed vector processors, the SP is an in-order scalar processor that contains an integer arithmetic logic unit (ALU) and a floating-point unit (FPU) to perform general-purpose computations. SMs, on their side, are composed of a collection of SPs, Special Function Units (SFU), load/store units, a register memory, and a shared memory. GPCs contain several multiprocessors and a raster engine. Registers and shared memory are on-chip SRAM memories, and are analogous in latency and bandwidth to the cache memory of a CPU. A larger off-chip DRAM memory acts as the main memory of the device. It is reachable from all the SPs and encompasses both local and global memory. Figure 2.1 represents the CUDA memory hierarchy, on which the different types of memory appear, and Table 2.1 shows the main characteristics of each of those memories. From a logical point of view, a kernel is executed (potentially) in parallel on the device by multiple threads. The set of threads executing the same kernel forms a grid, and the threads of a grid that are mapped to run on the same SM are organized in blocks within the grid. The threads that belong to the same block have access to the shared memory of the SM and can be synchronized explicitly. Thread blocks


Figure 2.1. CUDA device memory hierarchy.

Table 2.1. CUDA device memory.

Memory     Location   Cached   Access   Scope                  Lifetime
Register   On chip    No       R/W      Single thread          Thread
Local      Off chip   Yes      R/W      Single thread          Thread
Shared     On chip    No       R/W      All threads in block   Block
Global     Off chip   Yes      R/W      All threads + host     Host allocation
Constant   Off chip   Yes      R        All threads + host     Host allocation
Texture    Off chip   Yes      R        All threads + host     Host allocation

can be organized in one, two or three dimensions into the grid and are required to execute independently, so they can be executed in order within a dimension, in parallel, or in any other arbitrary order. Threads within a block can also form 1D, 2D, or 3D structures. During the execution, an SM partitions its assigned thread blocks in groups of consecutive threads denominated warps6, which are the minimal scheduling unit managed by the multiprocessor. Each one of the threads belonging to a warp has its own program counter7 and registers to keep track of its state, allowing them to follow different execution branches. Within a warp, all the threads begin the execution at the same instruction, and a warp executes a single common instruction at a time, so if branch divergences occur, only the threads on that instruction's branch will be active on the current cycle. The scheduling of the warps has important implications on the performance. On the one hand, it increases the performance by hiding the memory latency, as it selects from a pool the warps that are ready for execution. On the other hand, only a number of threads per block that is a multiple of the warp size is able to optimally populate the SMs, and the more divergent branches appear during the execution, the more the performance is reduced. The architecture has experienced a lot of improvements through the years. One expected (and accomplished) enhancement is the increase of the assembled multiprocessors, and of SPs per multiprocessor, on each new generation, thanks to the shrinking of the manufacturing process and the increase of the die size, reaching thousands of cores in a single device. Some other improvements of the Fermi architecture (2010) were the support for double precision floating-point operations, the inclusion of a hierarchy of cache memories on the SMs, and the increase of warp schedulers per SM. The Kepler architecture (2012) added a Grid Management Unit (GMU) that performs the scheduling among the different existing grids, queueing those that are not ready to be executed and prioritizing those that are, trying to improve the performance. Kepler also incorporates GPUDirect, a proprietary technology that allows data transfers between different GPUs without the intervention of the CPU or the memory system, by allowing network adapters to directly access the memory of the GPUs. Pascal (2016) introduced the NVLink high-bandwidth proprietary communication channel, which replaces the use of a PCIe bus to perform GPU-GPU and GPU-CPU transfers, and the unified memory, a single virtual address space shared between the CPUs and the GPUs. Independent thread scheduling and special-purpose computing units that perform fused multiply-add operations working with 4 × 4 matrices were added in Volta (2017), with the name of tensor cores. The general-purpose orientation of the architecture, the growth of the complexity, the creation of new data channels to connect GPUs with CPUs, the unification of the GPU and CPU memory, and the addition of specialized computing elements like the tensor cores inevitably remind us of the old wheel of reincarnation, in which the design of the graphics processors results in a never-ending increase of complexity and the addition of auxiliary processors, that in turn become more complex.

6Warp size depends on the compute capability, being 32 threads on all existing versions. 7Originally a single program counter was shared between all the threads of a warp.


CUDA programming

The programming of CUDA devices is done mainly in the C and C++ languages8, as the architecture provides a small extension for these languages that allows defining code to be run on the device. As CUDA programs cannot be launched to run exclusively on the device, using the CUDA extension implies having two types of code in the program: code that runs on the CPU (host code) and code that runs on the GPU (device code). The language extension [30] provides a series of function and variable qualifiers that determine the platform target of the corresponding function (CPU or GPU), and set the type of GPU memory on which the variable has to be allocated, respectively. There exist three function qualifiers:

__host__: declares a normal C function, invoked from the host and executed on the host.

__global__: declares a function invoked from the host and executed on the device. These functions, also called kernels, are the ones whose invocation switches the execution to take place on the GPU.

__device__: declares a function invoked from the device and executed on the device.

Note that the host-device relation is not bidirectional, and it is not possible to launch a host function from device code, or a kernel from a device function. The rationale of having a __host__ qualifier, equivalent to not using any qualifier at all, is that a function can have multiple qualifiers: it is possible to combine the __host__ and __device__ qualifiers, and in that case the code is compiled for running on both the host and the device. Another five qualifiers are used for variables:

__device__: declares a variable that resides on the device. It is allocated in global memory if no other qualifier is used.

__constant__: declares a variable that resides in constant memory. Optionally used together with __device__.

__shared__: declares a variable that resides in shared memory. Optionally used together with __device__.

__managed__: declares a variable that can be directly referenced from both host and device code. Optionally used together with __device__.

__restrict__: used to specify non-aliased pointers.

Any automatic variable (a variable without qualifiers) declared in device code should be placed in register memory by default, but as that amount of memory is limited, the compiler may have to place some variables in local memory. That can happen when the function has many variables or with some types of arrays. Table 2.2 shows the type of memory where the variables are allocated with the different qualifiers.

8There are Fortran compilers from PGI and IBM, and several bindings for other programming languages such as Python.


Table 2.2. CUDA variable declaration and associated memory.

Declaration of variables        Memory where they reside
Automatic variables             Register
Automatic variables (large)     Local
__device__ __shared__           Shared
__device__                      Global
__device__ __constant__         Constant

There exist some restrictions on the variable qualifiers. The __device__, __shared__ and __constant__ specifiers are not allowed on classes, structures, unions, formal function parameters, and local variables of host code. CUDA defines some built-in data types and variables, present at compile time and run time, respectively. The data types defined are integer and floating-point structures with up to four components, which allow specifying four-dimensional elements. One relevant data type defined is dim3, a three-dimensional integer vector type. If a variable is defined with this type, the unspecified components are initialized to one, reducing the actual number of dimensions. Four built-in variables of types dim3 and uint3 are present in all __device__ and __global__ functions:

dim3 gridDim: provides the dimensions of the grid (in blocks).

dim3 blockDim: provides the dimensions of the blocks (in threads).

uint3 blockIdx: provides the block index within the grid.

uint3 threadIdx: provides the thread index within the block.

These variables are important for the correctness of the code, as they allow it to determine things like the total number of threads and blocks or unique thread IDs. Kernel functions have to be written from the point of view of a single thread, identifying it by its ID in a similar way as the rank identifies an MPI process. Another built-in variable, of type int, is warpSize, which specifies the warp size in threads. As already mentioned, kernel launches are asynchronous with respect to the host, allowing concurrent execution of host and device code. Other tasks that can operate concurrently are memory transfers, from the host to the device and vice versa, within the memory of a specific device or among devices. All these operations may operate concurrently provided that they use different streams. Streams are series of commands that can operate concurrently or out of order with respect to each other. The commands within a given stream are executed in order, though. Just as invoking a function in C may require passing some arguments to it, launching a kernel implies the use of additional arguments in the call. These additional kernel-specific arguments indicate how the functions have to be executed on the device.
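As a hedged illustration of this single-thread viewpoint (not code from the thesis; the kernel name and arguments are hypothetical), the following kernel combines the built-in variables to obtain a unique global index and updates one vector element per thread:

__global__ void axpy(int n, double alpha, const double *x, double *y)
{
    /* unique global index of this thread within the grid */
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                  /* guard threads beyond the vector length */
        y[i] = alpha * x[i] + y[i];
}

The launch configuration that determines gridDim and blockDim is supplied by the host at invocation time, as described next.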



Figure 2.2. Schema of 2D memory allocated in the GPU with cudaMallocPitch.

The full kernel invocation

kernel<<<Dimgrid, Dimblock, Shr, Str>>>([kernel arguments]);

has four parameters given via the <<< >>> operator. Dimgrid specifies the number of blocks that constitute the dimensions of the grid, Dimblock specifies the number of threads per block, Shr specifies the number of bytes of shared memory dynamically allocated per thread block, and Str specifies the stream linked to the call. Multiple kernels can be executed simultaneously by sending them to different streams9. The last two parameters are optional and default to zero if not employed, allowing a kernel to be launched as

kernel<<<Dimgrid, Dimblock>>>([kernel arguments]);

All the kernel launches which do not specify a stream parameter or explicitly set it to zero are associated with the default stream, and therefore executed in-order within that stream.
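Putting the pieces together, a hedged host-side sketch (sizes, block dimension and variable names are illustrative, not taken from the thesis) allocates device memory, copies the operands, launches the axpy kernel shown earlier on the default stream, and retrieves the result:

void run_axpy(int n, double alpha, const double *x_host, double *y_host)
{
    double *x_dev, *y_dev;
    size_t bytes = n * sizeof(double);

    cudaMalloc((void **)&x_dev, bytes);
    cudaMalloc((void **)&y_dev, bytes);
    cudaMemcpy(x_dev, x_host, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(y_dev, y_host, bytes, cudaMemcpyHostToDevice);

    int block = 256;                        /* threads per block */
    int grid = (n + block - 1) / block;     /* blocks needed to cover n */
    axpy<<<grid, block>>>(n, alpha, x_dev, y_dev);

    /* the copy back waits for the kernel, as both use the default stream */
    cudaMemcpy(y_host, y_dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(x_dev);
    cudaFree(y_dev);
}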

CUDA runtime Besides the user-provided kernels, there exists a broad functionality offered by the CUDA runtime API10. It offers functions to allocate/deallocate host and device memory, to set or copy memory, to define the ratio of shared memory and L1 cache, to manage the devices, or to manage the streams. It also provides an automatic context initialization and management. We want to highlight some details about the memory management offered by the runtime. The runtime allows allocating memory on both the host and the device. On the device it provides specific functions to allocate memory on a structure basis. Two-dimensional arrays can be allocated with the function cudaMallocPitch, which may pad the rows to ensure the hardware alignment needed for a coalesced access to the memory. See Figure 2.2. Three-dimensional arrays can be allocated aligned in a similar way with cudaMalloc3D. Once the memory is allocated aligned, specific

9It is not possible to have more than 32 concurrent kernels executing at the same time. 10A low-level driver API offering a more fine-grained control, similar to OpenCL, is also available.

functions can be used to copy it or set it with values in an efficient way. For the host, the runtime provides the function cudaHostAlloc to allocate page-locked (pinned) memory. The CPU implements a virtual memory system that allows programs to use more memory than is available in the system, by swapping out unused pages and swapping them in again when needed. Page-locked memory ensures that the operating system does not page the memory out to swap space (to disk or any other secondary memory), and that it can be accessed directly by the device by means of direct memory access (DMA) without using intermediate buffers11. This memory can be read or written with a higher bandwidth than pageable memory allocated with malloc. Memory allocated with malloc can also be page-locked with the function cudaHostRegister.
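A hedged sketch of these allocation routines (sizes and names are illustrative, not taken from the thesis) combines a pitched 2D device array with a page-locked host buffer and a 2D copy that honours the returned pitch:

void alloc_example(size_t width, size_t height)
{
    double *d_mat, *h_buf;
    size_t pitch;   /* allocated row length in bytes (requested width + padding) */

    /* 2D allocation: rows are padded so each one starts at an aligned address */
    cudaMallocPitch((void **)&d_mat, &pitch, width * sizeof(double), height);

    /* page-locked host memory, eligible for direct DMA transfers */
    cudaHostAlloc((void **)&h_buf, width * height * sizeof(double),
                  cudaHostAllocDefault);

    /* 2D copy that takes the source and destination pitches into account */
    cudaMemcpy2D(d_mat, pitch, h_buf, width * sizeof(double),
                 width * sizeof(double), height, cudaMemcpyHostToDevice);

    cudaFreeHost(h_buf);
    cudaFree(d_mat);
}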

CUDA software The software toolkit also includes a mathematical library, and multiple numerical libraries like cuBLAS [101] and cuSPARSE, which implement BLAS on top of the CUDA runtime, for dense and sparse operators respectively. cuBLAS implements all three levels of BLAS, and provides the complete set of BLAS standard routines. Moreover, it includes an additional set of routines that extend the BLAS functionality with mixed precision and batched operations, including some LAPACK routines. Batched operations allow computing concurrently the same operation on multiple small data sets in a single kernel. For instance, the batched gemm routine performs the matrix-matrix multiplication of a batch of matrices. Batched operations allow fully utilizing the CUDA streaming processors and obtaining good performance with small input data. cuBLAS allows the concurrent execution of multiple kernels through the use of streams and has partial support for multi-GPU operations, which in the case of large problems can perform hybrid CPU-GPU computations. The functionality provided by cuBLAS can be used from host code and from user-defined kernels. cuSPARSE provides an implementation of the sparse interface for basic linear algebra subroutines. It supports three levels of operation.

Level 1: routines for vector-vector operations involving a sparse vector and a dense vector.

Level 2: routines for matrix-vector operations involving a sparse matrix and a dense vector.

Level 3: routines for matrix-matrix operations involving a sparse matrix and a dense matrix12.

Besides dense matrix storage, it supports a variety of sparse formats like COO, CSR, CSC, and Blocked CSR, and provides routines that allow the conversion between these formats. These sparse formats, which are suitable for cache-aware CPU

11Pageable memory is copied to a page-locked intermediate buffer prior to its copy when using DMA. 12Additionally, it has support for sparse matrix by sparse matrix addition and multiplication.

platforms, usually provide poor performance on GPUs. For this reason, new formats are developed to allow an efficient implementation of the sparse matrix-vector product on GPUs [17, 68]. A sparse format included in cuSPARSE, specifically designed for use on the GPU, is HYB.

Hybrid ELL and COO format (HYB). It is a scheme that combines the ELL and COO formats. As the efficiency of ELL decreases when the number of nonzero elements per row varies significantly, ELL is combined with the COO format, which is not affected by the number of nonzeros per row.

On top of cuBLAS and cuSPARSE is built cuSOLVER, a library that provides support for LAPACK-like operations on the GPU. The two main components of cuSOLVER are cuSolverDN for dense LAPACK routines, and cuSolverSP for sparse LAPACK routines. A different library included in the CUDA toolkit is Thrust. It is a high-level (object-oriented) C++ template library that provides an interface based on the C++ Standard Template Library (STL). Thrust allows users to benefit from a parallel execution on the GPU without the need to write any kernel, explicitly manage device memory, or use other CUDA libraries. The underlying idea is to simplify the programming by allowing users to exclusively use C++ code, and transparently allocate certain objects on the GPU to perform the operations. Thrust code can also be executed on the CPU. On top of Thrust is developed CUSP [33], another parallel linear algebra library, released under an Apache license, which provides a high-level interface for manipulating sparse matrices on the GPU. CUSP has support for dense matrices and a variety of sparse matrix formats like COO, CSR, DIA, ELL, and HYB, and it also allows the conversion between these formats. It furnishes iterative methods for solving linear systems and for computing eigenpairs, and also several preconditioners for the iterative solvers. It provides sparse matrix-vector and matrix-matrix multiplication, and other sparse matrix operations like add, subtract, or transpose. CUSP also includes a subset of BLAS and LAPACK routines for performing operations with dense matrices. Lastly, we want to mention MAGMA (Matrix Algebra on GPU and Multicore Architectures) [132], a high-performance library focused on providing dense linear algebra functionality similar to LAPACK, on heterogeneous multi-GPU and multicore architectures. It tries to obtain the best performance by combining executions on the CPU and on the GPU. It provides two interfaces, one that takes the operators and stores the result using CPU memory, and an analogous one that uses GPU memory. It makes use of cuBLAS for BLAS operations, but also includes its own implementation of BLAS routines. It contains a sparse module that implements sparse BLAS routines as well as functions to handle the iterative solution of a sparse linear system of equations. MAGMA supports three sparse formats: CSR, ELL and SELL-P [8], a variant of SELL-C [77].

SELL-C. Format that tries to reduce the storage overhead of the ELL format by splitting the original matrix into blocks of rows, and storing each partition using ELL.


Padded sliced ELLPACK format (SELL-P). Modifies SELL-C by adding rows with zeros such that the row length of each block becomes a multiple of the number of (CUDA) threads assigned to each row. It allows instruction parallelism and coalesced memory access.

MAGMA is published with a BSD license.
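As a hedged illustration of the cuBLAS usage described above (not code from the thesis; the matrix names and sizes are hypothetical), the following sketch multiplies two dense n × n matrices already resident in device memory with the level 3 routine cublasDgemm, following the usual handle-based workflow:

#include <cublas_v2.h>

/* C = alpha*A*B + beta*C, with column-major matrices in device memory */
void gemm_on_gpu(int n, const double *dA, const double *dB, double *dC)
{
    cublasHandle_t handle;
    const double alpha = 1.0, beta = 0.0;

    cublasCreate(&handle);
    cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n,            /* dimensions m, n, k */
                &alpha, dA, n,      /* A and its leading dimension */
                dB, n,              /* B and its leading dimension */
                &beta, dC, n);      /* C and its leading dimension */
    cublasDestroy(handle);
}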

2.5 Performance indicators

When developing new algorithms to solve a well-known problem, it is important to have a method to compare them. Assuming that several of them solve the problem accurately, the desired behaviour is to obtain the solution fast. Usually, the faster the better. In the same way that runners can compare their performance with the times set by the winners of the world championships in athletics, the software needs a benchmark to compare with. The easiest way to obtain a benchmark is by using the time of the fastest existing algorithm. If no previous version exists, the new algorithm will be the benchmark for future algorithms. The reason behind the hardware innovation process is that the performance of a program is always going to be restricted by the hardware it runs on. The hardware sets the upper limit. If a specific piece of software is able to obtain the maximum output of the hardware, it has accomplished its objective. HPC implies obtaining, if not the maximum performance, at least a significant fraction of it. The issue with parallel hardware is that reaching its limits is not possible with a single serial program, and even parallel software can have difficulties succeeding. Although the natural indicator to measure the performance of the software is the time, other factors like the number of floating-point operations per second (FLOPs/s) performed by a program, the relative speedup obtained between two implementations, and the concepts of efficiency and scalability, are presented along this section. We will not consider other performance metrics like performance per watt, known as green computing, or performance per monetary unit.

2.5.1 Execution time The main goal of parallel computing is the reduction of the computational time. A parallel program can make a more intensive use of the hardware by fully using, at the same time, all the computational resources available. Time is then the principal performance indicator of a parallelization, and this performance is going to be expressed in relation to the execution time of the equivalent serial algorithm. The concept of execution time is easy to understand, but the performance obtained with a parallel version of an algorithm is not straightforward to estimate just from the time of a serial execution. For a serial program, the execution time is the time elapsed from the start of the program until its termination, and for a parallel program, the execution time begins when the first of the involved processors starts the computation, and ends when the last remaining working processor finalizes. The

intuition tells us that the more processors a parallel program uses, the less time it should need to complete. A naive expectation is that the time of a parallel execution that uses two processors is going to be reduced to half the time of the serial execution. But parallel programming has additional factors that overload the computation and increase the processing time, reducing the gain obtained. One characteristic of a parallel program is the need for extra steps not present in the serial implementation. The memory access management on shared-memory systems, or the communication not overlapped with computation on distributed systems, imply wasting time in management activities. If the processes have to perform different computations, it is possible that some of them have to stand idle waiting for others, slowing down the whole program. For this reason, it is very important to load-balance the computation between the processes during the whole computation, and this usually means distributing the data equally among them. Also, the data distribution used, referring not to the amount of data but to what data are mapped to what process, is linked with the communication that the different processes have to perform during the execution. On distributed memory systems, the communication time is directly affected by the latency and transfer rate of the interconnection network, so the use of low-latency and high-transfer-rate networks is mandatory on these systems. Even with low latencies, as the latency is a fixed cost that does not depend on the message size, it is useful to merge several small messages into a larger one to reduce the communication time. Our final comment concerns the fact that it is not always possible to parallelize all the steps of an algorithm. This is known as Amdahl's law [4], and it has a big impact on the final performance. Being

T_1 = T_(s) + T_(p), (2.2)

the total serial execution time of a program, where T_(s) represents the time of the intrinsically serial portion of the program, and T_(p) the time of the parallelizable portion, Amdahl's law says that the minimal parallel time that can be obtained with p processors is limited to

T_p = T_(s) + T_(p)/p. (2.3)
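As a small worked example of (2.2) and (2.3), the following Python sketch evaluates the Amdahl bound for a hypothetical program; the serial and parallelizable times are assumed values chosen only for illustration, not measurements from any code discussed in this thesis.

```python
# Amdahl's law sketch: predicted parallel time and speedup for p processors.
def amdahl_time(t_serial, t_parallel, p):
    """Minimal parallel time T_p = T_(s) + T_(p)/p predicted by Amdahl's law."""
    return t_serial + t_parallel / p

T_s, T_p = 10.0, 90.0                      # hypothetical times, so T_1 = 100 s
for p in (1, 2, 4, 8, 16, 1024):
    Tp = amdahl_time(T_s, T_p, p)
    print(f"p = {p:5d}  T_p = {Tp:7.2f}  speedup = {(T_s + T_p) / Tp:5.2f}")
# As p grows, the speedup approaches 1 + T_(p)/T_(s) = 10, never the ideal p.
```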

2.5.2 Speedup

Viewing two versions of the same program, one serial and the other parallel, as black boxes that given the same inputs return the same outputs, the different times that both versions take to process the information define their relative performance, called speedup. Formally, the speedup is

S_p = T_1/T_p, (2.4)

where T_1 represents the execution time of the best serial implementation, or the execution of the parallel algorithm with a single processor, and T_p represents the execution time of the parallel implementation, where p is the number of processors used. Theoretically, the speedup is bounded by S_p ≤ p, but for some parallel algorithms it is possible to obtain super-linear speedups (greater than p). Super-linear speedups occur due to a more efficient or more intense use of the memory of the system, normally the cache memory. The common case, though, is to reach speedup values smaller than p because of the factors mentioned in § 2.5.1, and as Amdahl's law states, the maximum speedup is limited to

S_p = 1 + T_(p)/T_(s). (2.5)

2.5.3 Efficiency

The speedup allows us to compute the efficiency of a parallel algorithm, which gives the degree of utilization of the parallel resources with that algorithm, in proportion to the use of the resources achieved with the serial algorithm. This metric relates the obtained speedup with the number of processors employed,

E_p = S_p/p. (2.6)

In the same way that the speedup is usually smaller than p, in practice the efficiency of parallel algorithms is usually smaller than 1.
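The following short Python snippet works through (2.4) and (2.6) for a set of hypothetical timings; the numbers are invented purely for illustration.

```python
# Speedup (2.4) and efficiency (2.6) from hypothetical serial/parallel timings.
T1 = 120.0                            # time of the best serial run (assumed)
runs = {2: 63.0, 4: 34.0, 8: 19.5}    # parallel time per processor count (assumed)

for p, Tp in runs.items():
    Sp = T1 / Tp                      # speedup, equation (2.4)
    Ep = Sp / p                       # efficiency, equation (2.6)
    print(f"p = {p}  S_p = {Sp:.2f}  E_p = {Ep:.2f}")
```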

2.5.4 Scalability

Two important things to consider when measuring the performance of parallel software are the influence of the hardware used and the size of the problem to be solved. The normal procedure to evaluate the performance is to vary the parameters of the execution, to learn how the parallel version behaves under different conditions. The scalability of an algorithm tells us whether it will take advantage of an increment of the hardware resources by maintaining its efficiency. Two types of scalability can be measured, depending on whether the increase of hardware resources is accompanied by an equivalent increment of the problem size or not.

Strong scaling studies how the number of processors used affects the execution time, while keeping a fixed total problem size. Weak scaling studies how the execution time varies when modifying the number of processors, while modifying the total problem size accordingly, so as to maintain a fixed problem size per processor.

2.5.5 FLOPs/s

The use of time as a performance indicator, as seen until now, is empirical: it is only possible by executing a working implementation on specific hardware. A different way of measuring the numerical performance of a computer, or the cost of an algorithm, is by counting the number of floating-point operations. Floating-point operations are employed as a performance metric in computing because they are the core operations in scientific software, contrary to integer operations, which can be considered negligible. In the case of hardware performance, the target is to provide as many FLOPs/s as possible. This results in an architecture-independent metric that allows comparing different pieces of hardware. Traditionally, for nodes with one single-core super-scalar processor working at a fixed frequency, the theoretical floating-point peak performance of a computer could be determined by multiplying the number of floating-point operations per cycle by the frequency. The addition of several sockets and cores per processor expands the formula

FLOPs/node = (FLOPs/cycle)(cycles/second)(cores/socket)(sockets/node), (2.7)

but it turns out to be still too simplistic, as it assumes a fully uniform performance at each stage. However, the current sophistication of the processors and of the hardware architecture makes this calculation not so straightforward [36]. The number of FLOPs per cycle, coming from the micro-architecture elements,

m_FLOPs = (FLOPs/operation)(operations/instruction)(instructions/cycle), (2.8)

and the number of cycles per second (per node), coming from the machine architecture elements,

M_cycles = (cycles/second)(cores/socket)(sockets/node), (2.9)

can be combined to give an estimation of FLOPs per node,

FLOPs/node = m_FLOPs · M_cycles. (2.10)

But hardware evolution turns this calculation into a complicated enterprise, in which vector operations, dynamic frequencies (including different frequencies per core), fused operations, multithreading behaviour, or the support for several instruction sets are characteristics that influence the performance. FLOPs can also be used as a theoretical metric for the cost of an algorithm, independent of the hardware, that can be obtained a priori just by examining the operations in the algorithm. It allows us to compare different algorithms that solve the same problem without the need of implementing them. Traditionally, algorithms with fewer FLOPs would finish in less time, but that is not a golden rule with parallel hardware, where high-FLOPs-demanding algorithms can exploit the parallel resources and finish sooner than less-demanding ones.
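The following Python sketch applies the products (2.8)–(2.10) to a hypothetical node; all the hardware figures are assumptions used only to show how the estimate is assembled, and, as discussed above, real hardware deviates from this simple model.

```python
# Rough theoretical peak estimate for a hypothetical node, following (2.8)-(2.10).
flops_per_cycle = 16        # assumed, e.g. an 8-wide FMA unit counted as 16 FLOPs
cycles_per_second = 2.2e9   # assumed nominal clock frequency
cores_per_socket = 12       # assumed
sockets_per_node = 2        # assumed

m_flops = flops_per_cycle                                             # (2.8), collapsed
M_cycles = cycles_per_second * cores_per_socket * sockets_per_node    # (2.9)
peak = m_flops * M_cycles                                             # (2.10)
print(f"theoretical peak: {peak / 1e12:.2f} TFLOPs/s per node")
```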

Chapter 3

Eigenvalue problems

Il meglio è nemico del bene (The best is the enemy of the good)

Computing eigenvalues and eigenvectors is required by a wide range of scientific and engineering applications. Among others, eigenvalues are used in engineering to determine the natural frequencies of vibration (vibration analysis) of large structures like buildings and bridges during their design. In control theory, eigenvalues are used to determine the stability and response of dynamical systems, like the analysis of the dynamic stability of electrical circuits and fluid flow. In physics, they allow the analysis of systems modeled by the Schrödinger equation, like the parametrization of quantum cascade lasers and superlattices. They also allow studying the behavior of systems in mechanical engineering and theoretical physics modeled with the Poisson equation, like the behaviour of plasma. In chemistry they allow the simulation of molecular clusters. The solution of eigenvalue problems is a very active field of research, and many mathematical forms of solving the eigenvalue problem have arisen from a variety of authors. Good descriptions of the eigenvalue problem are available [12, 115, 129], including an article with a historical perspective [53]. The standard eigenvalue problem is formulated as

Ax = λx, x ≠ 0, (3.1)

where A ∈ C^{n×n} is the matrix that defines the problem, λ ∈ C is an eigenvalue and x ∈ C^n is a right eigenvector of A. The pair (λ, x) ∈ C × C^n is called an eigenpair of A. Although the eigenvalues are complex in general, the eigenvalues of a Hermitian matrix H ∈ C^{n×n} are real, and if the matrix is real symmetric H ∈ R^{n×n}, then the eigenvectors are also real. Other types of eigenvalue problems that frequently arise in applications lead to

the generalized form, in which there are two intervening matrices,

Ax = λBx. (3.2)

The pair of matrices of equation (3.2) defines the matrix pencil

A − λB, (3.3) and the eigenvalues of the matrix pencil are those λ for which

det(A − λB) = 0. (3.4)

Under some assumptions, the generalized problem can be reduced to a standard eigenvalue problem and solved as such. For example, if matrix B is non-singular, the problem can be transformed into

B^{-1}Ax = λx. (3.5)

3.1 Methods to compute eigenvalues

Generally speaking, there are two broad classes of methods for solving the eigenvalue problem, which we could call direct and iterative methods. When the matrices are dense and all the eigenpairs are required, direct methods are usually the preferred choice. They reduce the matrix to a condensed form by applying orthogonal transformations, from which the eigenvalues can be easily obtained. On the contrary, if only a few eigenpairs are required and the matrices are large and sparse, projecting the eigenproblem onto a low-dimensional subspace is normally a better option.

3.1.1 Direct methods

Direct methods are based on applying orthogonal transformations to reduce the matrix involved in (3.1) to a condensed canonical form from which eigenpairs can be recovered almost trivially, such as the (generalized) Schur form. For example, given a non-singular matrix X, the transformation XAX^{-1} has the same eigenvalues as A, and its eigenvectors are of the form Xx, where x is an eigenvector of A. The Schur form is the most practical and widely used form, as it can always be obtained in a stable way by means of unitary transformations. For the generic non-symmetric case, the Schur decomposition of A can be expressed as

A = QTQ^*, (3.6)

where T is an upper triangular matrix whose diagonal elements are the eigenvalues of A, and Q is a unitary matrix¹ whose columns q_i are called Schur vectors. If s_i is an eigenvector of T, then Qs_i is an eigenvector of A.

¹ A unitary matrix has as inverse its conjugate transpose.


The symmetric case is a particularly simple case of the Schur form that, if A is real, reads

A = UDU^*, (3.7)

where D is a diagonal matrix with the eigenvalues of A on its diagonal, and the columns of U are eigenvectors of A. In order to reduce the cost of applying similarity transformations, direct methods begin by reducing matrix A to either tridiagonal or upper Hessenberg form, in the symmetric or non-symmetric case, respectively. Once the problem has been reduced to a condensed form, various algorithms can be applied to compute the eigenvalues, such as the QR iteration, or specific methods for symmetric tridiagonal matrices, such as divide-and-conquer [31]. The QR iteration is a numerically stable algorithm that converges to the Schur form of the initial matrix. When the matrices are dense and not too large, direct reduction methods are appropriate, but usually they are not for the case of sparse matrices, because they destroy sparsity and, furthermore, they compute all eigenvalues, while often it is enough to compute just a few, especially in large-scale problems with sparse matrices.
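As a small illustration of the Schur form (3.6), the following Python/SciPy snippet computes the complex Schur decomposition of a random dense matrix and reads the eigenvalues off the diagonal of T; it is only a toy example, unrelated to the solvers discussed later.

```python
# Schur form of a small dense matrix: A = Q T Q*, eigenvalues on diag(T).
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
T, Q = schur(A, output='complex')          # Q unitary, T upper triangular
print(np.allclose(Q @ T @ Q.conj().T, A))  # True: the decomposition reproduces A
print(np.sort_complex(np.diag(T)))         # the eigenvalues of A
```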

3.1.2 Iterative methods

Iterative methods are based on obtaining approximations of the eigenvalues by generating a sequence of operations that, given an initial guess, converge to the desired values. On each iteration, the approximation obtained is derived from the approximation of the previous iteration. These techniques preserve sparsity and hence they are often the only viable strategy for large, sparse matrices. They are usually better suited for computing only a few eigenpairs. They have a low memory footprint, are amenable to parallelization, and are scalable. The simplest iterative method to solve the eigenvalue problem is the power method. It consists in repeatedly applying an iteration that multiplies the matrix A by a vector,

x_{k+1} = Ax_k / ‖Ax_k‖. (3.8)

The iteration magnifies the component of that vector in the direction of the eigenvector with largest eigenvalue (in absolute value), relative to the other components. The convergence of this method depends on the ratio of the second largest eigenvalue with respect to the largest one. If both eigenvalues are very close (in magnitude) the convergence will be very slow. It is very easy to implement, as it only requires a matrix-vector multiplication, but it can only approximate the dominant eigenvector. Having the eigenvector x, the Rayleigh quotient

λ = (x^*Ax)/(x^*x), (3.9)

allows obtaining the eigenvalue λ associated with x.
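A minimal Python sketch of the power iteration (3.8) combined with the Rayleigh quotient (3.9) follows; the test matrix and the iteration count are arbitrary choices for illustration.

```python
# Power method: approximates only the dominant eigenpair of A.
import numpy as np

def power_method(A, x0, iters=200):
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        x = A @ x
        x = x / np.linalg.norm(x)                 # x_{k+1} = A x_k / ||A x_k||
    lam = (x.conj() @ A @ x) / (x.conj() @ x)     # Rayleigh quotient (3.9)
    return lam, x

A = np.diag([5.0, 2.0, 1.0]) + 0.01 * np.ones((3, 3))
lam, x = power_method(A, np.ones(3))
print(lam)                                        # close to the largest eigenvalue of A
```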

One class of iterative methods is formed by those that generate a sequence of subspaces V^(i) and extract approximate solutions from them. Iterative subspace methods are an extension of the power method that works with multiple vectors instead of with only one. For instance, a good example of them is the original simple subspace iteration method, which for a specific power k computes the matrix

X_k = A^k X_0, (3.10)

where X_0 is an initial block of m vectors [x_1, ..., x_m]. All of the vectors of the generated subspace converge to the eigenvector associated with the largest eigenvalue. A drawback of this method is that the greater the power used, the poorer the linear independence of the system X_k becomes. To improve the linear independence of the vectors, the simple subspace iteration can be extended with a factorization, as shown in Algorithm 3.1. The QR decomposition performed in step 2 is comparable to the normalization of the power method, and ensures linear independence between the columns of X_k.

Algorithm 3.1. Simple subspace iteration

1  W = AX_k
2  Compute QR decomposition W = QR
3  X_{k+1} = Q
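A dense Python sketch of Algorithm 3.1 could look as follows; the test matrix and the number of iterations are arbitrary.

```python
# Subspace iteration with re-orthonormalization via a reduced QR factorization.
import numpy as np

def subspace_iteration(A, X0, iters=100):
    X = X0
    for _ in range(iters):
        W = A @ X                   # step 1: W = A X_k
        Q, _ = np.linalg.qr(W)      # step 2: QR decomposition of W
        X = Q                       # step 3: X_{k+1} = Q
    return X

A = np.diag(np.arange(1.0, 11.0))
X0 = np.random.default_rng(1).standard_normal((10, 3))
X = subspace_iteration(A, X0)
# The columns of X approximately span the invariant subspace associated with
# the largest eigenvalues of A.
```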

A common technique used in eigenproblem algorithms is the use of a projection subspace. The method consists in approximating the eigenvectors of A by means of a subspace K of reduced dimension m ≤ n. The Rayleigh–Ritz process approximates the eigenvectors in that way. It seeks to ensure that the residual of the approximations satisfies the Ritz–Galerkin condition, which in the case of the standard problem reads

Ax̃ − λ̃x̃ ⊥ K. (3.11)

Being x̃ = Vy a linear combination of the m vectors of the subspace K, of which V = [v_1, v_2, ..., v_m] is a basis, this results in

V^*AVy − λ̃V^*Vy = 0, (3.12)

which simplifies to

V^*AVy − λ̃y = 0, (3.13)

and then λ̃ and y must satisfy

B_m y = λ̃y. (3.14)

The Rayleigh–Ritz process, detailed in Algorithm 3.2, is used to accelerate the subspace iteration [13]. Once the subspace is built, steps 3 to 5 perform the Rayleigh–Ritz method. The values and vectors obtained with this method in steps 3 and 5 are known as Ritz values and Ritz vectors. The approximate eigenvectors of the original eigenvalue problem obtained in step 5 are sensitive to rounding errors, but the Schur vectors can be obtained instead in a stable way.


Algorithm 3.2. Rayleigh–Ritz method

1  Compute an orthonormal basis {v_1, v_2, ..., v_m} of the subspace K, and let V = [v_1, v_2, ..., v_m].
2  Compute the projected matrix B_m = V^*AV.
3  Compute the eigenvalues λ̃ of B_m, and select the k < m desired eigenvalues λ̃_i, i = 1, ..., k.
4  Compute the k eigenvectors y_i of B_m associated with λ̃_i.
5  Compute the corresponding approximate eigenvectors of A, x̃_i = Vy_i.
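The following Python sketch mirrors Algorithm 3.2 in dense arithmetic, assuming an orthonormal basis V is already available and (as an arbitrary choice) selecting the Ritz values of largest magnitude.

```python
# Rayleigh-Ritz: project A onto V, solve the small problem, lift Ritz vectors back.
import numpy as np

def rayleigh_ritz(A, V, k):
    Bm = V.conj().T @ A @ V                  # step 2: projected matrix B_m = V* A V
    theta, Y = np.linalg.eig(Bm)             # step 3: eigenvalues of B_m (Ritz values)
    idx = np.argsort(-np.abs(theta))[:k]     # select the k desired (here: largest) ones
    return theta[idx], V @ Y[:, idx]         # step 5: Ritz vectors  x~_i = V y_i

A = np.diag(np.arange(1.0, 9.0))
V, _ = np.linalg.qr(np.random.default_rng(2).standard_normal((8, 4)))
ritz_values, ritz_vectors = rayleigh_ritz(A, V, k=2)
```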

3.2 Krylov methods for eigenvalue problems

One important class of projection methods is the one based on Krylov subspaces, which allows computing approximations of the eigenpairs of A from a Krylov subspace

K_m(A, v_1) ≡ span{v_1, Av_1, A^2v_1, ..., A^{m−1}v_1}. (3.15)

One example of these methods is the Arnoldi algorithm [9], which builds an orthogonal basis of the Krylov subspace K_m, and thus carries out steps 1 and 2 of Algorithm 3.2.

Algorithm 3.3. Arnoldi
Data: matrix A ∈ C^{n×n} defining (3.1), initial vector v_1 ∈ C^n
Result: matrices H_{m+1} ∈ C^{(m+1)×m}, V_{m+1} ∈ C^{n×(m+1)} and h_{m+1,m} ∈ R
1   v_1 = v_1/‖v_1‖
2   V_1 ← [v_1]
3   H_1 ← [ ]
4   for j = 1, 2, ..., m do
5       w = Av_j
6       h_j = V_j^* w
7       w̃ = w − V_j h_j
8       h_{j+1,j} = ‖w̃‖
9       if h_{j+1,j} = 0 then break
10      v_{j+1} = w̃/h_{j+1,j}
11      V_{j+1} ← [V_j  v_{j+1}]
12      H_{j+1} ← [ H_j  h_j ; 0  h_{j+1,j} ]
13  end
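A straightforward dense Python transcription of Algorithm 3.3, using classical Gram-Schmidt in steps 6–7, is sketched below; production implementations typically use iterated or blocked orthogonalization instead.

```python
# Unrestarted Arnoldi: returns V_{m+1} and the (m+1) x m Hessenberg matrix of (3.16).
import numpy as np

def arnoldi(A, v1, m):
    n = A.shape[0]
    V = np.zeros((n, m + 1), dtype=complex)
    H = np.zeros((m + 1, m), dtype=complex)
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]                               # step 5: subspace expansion
        H[:j + 1, j] = V[:, :j + 1].conj().T @ w      # step 6: projection coefficients
        w = w - V[:, :j + 1] @ H[:j + 1, j]           # step 7: orthogonalization
        H[j + 1, j] = np.linalg.norm(w)               # step 8
        if H[j + 1, j] == 0:                          # step 9: breakdown
            return V[:, :j + 1], H[:j + 1, :j]
        V[:, j + 1] = w / H[j + 1, j]                 # step 10: normalization
    return V, H
```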

Starting with an initial vector v_1 of norm 1, the method obtains an orthonormal basis {v_1, ..., v_{m+1}} of K_m(A, v_1) in an iterative way, and an upper Hessenberg matrix H_m, verifying the Arnoldi relation

AV_m = V_mH_m + h_{m+1,m}v_{m+1}e_m^*, (3.16)


in which V_m = [v_1 ··· v_m]. Algorithm 3.3 details the steps followed by Arnoldi's method to generate (3.16). The main steps of the algorithm are the expansion of the Krylov subspace (step 5), the orthogonalization and normalization of the new vector (steps 6, 7, and 10), and the update of the upper Hessenberg matrix H_m (step 12) with the coefficients obtained in the orthogonalization and normalization process. The eigenpairs (θ, s) of H_m generate Ritz vectors x̃ = V_m s and Ritz values λ̃ = θ that provide approximations of the original eigenvalue problem. In the case of A being a Hermitian matrix, the matrix H_m becomes a symmetric real tridiagonal matrix

T_m = \begin{bmatrix} \alpha_1 & \beta_1 & & & \\ \beta_1 & \alpha_2 & \beta_2 & & \\ & \ddots & \ddots & \ddots & \\ & & \beta_{m-2} & \alpha_{m-1} & \beta_{m-1} \\ & & & \beta_{m-1} & \alpha_m \end{bmatrix}. (3.17)

Starting from an initial vector v1 and β0 = 0, the recurrence

βjvj+1 = Avj − αjvj − βj−1vj−1, (3.18)

with α_j = v_j^*Av_j and β_j = v_{j+1}^*Av_j, generates a matrix V_m whose columns are the orthogonal Lanczos vectors v_1, v_2, ..., v_m, and that satisfies the Lanczos relation

AV_m − V_mT_m = β_m v_{m+1}e_m^*, (3.19)

an expression analogous to the Arnoldi relation (3.16). The Lanczos iteration represented in Algorithm 3.4 can be considered a particular case of the Arnoldi method. The computation involving v_{j+1} can be seen as the orthogonalization of Av_j with respect to v_{j−1} and v_j.

Algorithm 3.4. Lanczos iteration
1   for j = 1, 2, ..., m do
2       w = Av_j
3       w̃ = w − β_{j−1}v_{j−1}
4       α_j = v_j^* w̃
5       ŵ = w̃ − α_j v_j
6       β_j = ‖ŵ‖_2
7       if β_j = 0 then break
8       v_{j+1} = ŵ/β_j
9   end
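A dense Python sketch of Algorithm 3.4, without any re-orthogonalization, is given below; as discussed next, in finite precision this plain recurrence loses orthogonality.

```python
# Plain Lanczos recurrence for a symmetric A: returns V and the tridiagonal T_m.
import numpy as np

def lanczos(A, v1, m):
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    alpha, beta = np.zeros(m), np.zeros(m)
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        if j > 0:
            w = w - beta[j - 1] * V[:, j - 1]
        alpha[j] = V[:, j] @ w
        w = w - alpha[j] * V[:, j]
        beta[j] = np.linalg.norm(w)
        if beta[j] == 0:               # breakdown: an invariant subspace was found
            break
        V[:, j + 1] = w / beta[j]
    T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
    return V, T
```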

In exact arithmetic, v_{j+1} is orthogonal to v_1, v_2, ..., v_{j−2}, so it is not necessary to use those vectors in the orthogonalization process, reducing the number of operations with respect to the Arnoldi algorithm. However, in practical implementations where finite precision arithmetic is used, the vectors suffer a loss of orthogonality when a Ritz value is close to convergence [102], and spurious copies

of the eigenvalues appear. An obvious solution is the use of Arnoldi to keep the orthogonality, but the tridiagonal symmetric structure of the projected matrix can still be taken into account during the computation. Other techniques try to exploit the fact that the loss of orthogonality occurs when a Ritz value is converging, and re-orthogonalize only in such a case. A well-known particularity of the Krylov subspace methods is the constant growth of the subspace on each iteration of the algorithm. In many cases, the convergence of the algorithm is slow and the Ritz approximations may require many iterations to converge. This feature becomes a serious two-fold issue when solving large-scale problems. On the one hand, the amount of memory necessary to store the subspace may be too large, and on the other hand, the cost of the orthogonalization increases on each iteration, as the dimension of the basis of the Krylov subspace increases. In practice, to have some control over the memory requirements of the method and over the cost of the orthogonalization, the expansion of the subspace has to be limited in some way. Restart techniques can be used to avoid managing very large subspaces. They can be explicit or implicit, and consist in stopping the algorithm when a desired size has been reached, and building a new basis reusing part of the information from the previous basis. The explicit restart explicitly builds an initial vector as the starting point of the new basis. It can use a linear combination of some of the Ritz vectors of the previous basis, being the simplest form of restart. However, in practice, this approach does not work very well. The implicit restart [124] improves significantly the convergence with respect to the explicit restart by keeping several of the Schur vectors of the previous basis in the new one.

3.2.1 Krylov-Schur

The Krylov-Schur method [128] implements an implicit restart, and extends and contracts a Krylov decomposition in an iterative way. The restart of the method is based on using the Krylov relation

AV_m = V_mC + v_{m+1}b^*, (3.20)

which is a generalization of the Arnoldi relation (3.16), in the sense that C and b do not require a specific structure. Matrix C is not necessarily an upper Hessenberg matrix, and vector b is an arbitrary vector that can be different from e_m. The bases that appear in the Krylov relation are not the same as the ones that appear in the Arnoldi relation, but they also generate a Krylov subspace. The restart of the Krylov–Schur method is supported by the fact that, as C has the form

C = \begin{bmatrix} C_1 & \star \\ 0 & C_2 \end{bmatrix},

with C_1 ∈ C^{p×p} and C_2 ∈ C^{(m−p)×(m−p)}, the relation (3.20) can be truncated. Once truncated, and a basis of dimension p is obtained, it can be extended again up to size m. The idea of the method is to maintain the dimension of the basis under some limits that allow keeping the desired target Ritz values and discarding the less interesting ones. This restart technique follows the next steps:


1. Compute the Schur decomposition of C and sort it by keeping the most desired Ritz values in the first p positions of the diagonal of T ,

C = Q_m T Q_m^*. (3.21)

A particular case appears when using real arithmetic, as the obtained matrix T is the real Schur form associated to C, with the eigenvalues in the diagonal or in 2 × 2 diagonal blocks, so p must be chosen ensuring that the 2 × 2 blocks of the diagonal are not truncated.

2. Write Q_m = [Q_p, Q'], and discard the columns associated with Q', to truncate the Krylov relation of order m into order p,

AṼ_p = Ṽ_p T_1 + v_{m+1}b̃^*, (3.22)

where T_1 = Q_p^*CQ_p, Ṽ_p = V_mQ_p and b̃ = Q_p^*b.

3. Restart Arnoldi's iteration as in Algorithm 3.3, extending the Krylov subspace by means of w = Av_{m+1}.
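The truncation in steps 1 and 2 can be sketched in Python with SciPy's sorted Schur decomposition; the ordering criterion (smallest real part) is a hypothetical choice, and the function name is illustrative, not part of any library.

```python
# Truncation step of a Krylov-Schur restart: compress the order-m Krylov relation
# (3.20) with basis V, projected matrix C and vector b down to order p, as in (3.22).
import numpy as np
from scipy.linalg import schur

def truncate(V, C, b, p):
    # Keep the p Ritz values of smallest real part in the leading Schur block
    # (any other "wanted" criterion could be used instead).
    cut = np.sort(np.linalg.eigvals(C).real)[p - 1]
    T, Q, sdim = schur(C, output='complex', sort=lambda x: x.real <= cut)
    Qp = Q[:, :p]
    return V @ Qp, T[:p, :p], Qp.conj().T @ b    # V~_p, T_1, b~
```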

The Krylov–Schur method can be modified for the case of A being Hermitian, in a similar way as Lanczos modifies Arnoldi. This specialization is known as the thick-restart Lanczos method [141]. Convergence of Krylov methods for eigenvalue computations is a non-trivial issue, which depends on the separation of eigenvalues, among other things (see for instance [12]). If the eigenvalues of interest are exterior (that is, lying in the periphery of the spectrum), a restarted Krylov method will be able to retrieve them without problems. However, if the desired eigenvalues are in the interior of the spectrum, then it is necessary to apply a spectral transformation to the initial problem. A standard technique to compute interior eigenvalues is the shift-and-invert transformation. For the generalized eigenvalue problem (3.2) the Krylov method is then applied to the transformed problem

(A − σB)^{-1}Bx = θx, (3.23)

where the largest magnitude values θ = (λ − σ)^{-1} correspond to the eigenvalues λ closest to a given target value σ. Convergence in this case will require fewer iterations because the transformation also improves the separation, but the solver must implicitly handle the inverse of A − σB via direct linear solves.
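The effect of shift-and-invert can be tried with SciPy, whose sparse eigensolver applies a Krylov method to the transformed operator when a shift sigma is supplied; the tridiagonal test matrix below is an arbitrary example.

```python
# Interior eigenvalues of a sparse symmetric matrix via shift-and-invert.
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 1000
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format='csr')
vals, vecs = eigsh(A, k=4, sigma=1.0, which='LM')   # eigenvalues closest to sigma = 1.0
print(vals)
```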

3.3 Scientific computing software

The constant evolution of numerical software for solving eigenvalue problems has provided a great variety of software packages [59]. The different available libraries are specialized in performing serial or parallel computations, intended to solve dense and/or sparse eigenproblems in the standard (3.1) or the generalized (3.2) form, and they use a variety of direct and/or iterative methods.


We present now some of the most relevant libraries in the field, all of them publicly available in source form. As examples of libraries implementing direct methods for solving eigenvalue problems we can name ScaLAPACK [19], ELPA [92], EigenExa [48] and Elemental [107].

ScaLAPACK is a parallel library that implements a subset of the routines provided by LAPACK. It is based on MPI and relies on PBLAS (Parallel BLAS) for performing low-level operations. It provides parallel versions of the primary algorithms, such as the QR iteration. It is developed mostly in Fortran 77 and released under a BSD license.

ELPA (Eigenvalue soLvers for Petascale Applications) is a parallel library that provides efficient and scalable direct eigensolvers for dense Hermitian matrices, addressing the standard and the generalized eigenvalue problems. It is based on ScaLAPACK's matrix layout, and replaces all parallel steps with subroutines of its own. ELPA provides parallelism by means of MPI, and its license is the LGPL.

EigenExa is a Fortran library designed to be scalable and to provide high-performance eigenvalue solvers. It addresses the standard and generalized eigenvalue problems, using MPI for inter-node parallelism and OpenMP for intra-node threaded parallelism. The software is published with a BSD license.

Elemental is a library for direct linear algebra similar in functionality to ScaLAPACK, but unlike it, Elemental performs almost all computations using element-wise matrix distributions instead of block-wise ones. It is developed in C++ (directly usable from C) with interfaces to Fortran 90, Python, and R. It is oriented to distributed-memory parallelism. Most of its code is released under a BSD license, although it also uses a variety of other free software licenses.

Within this chapter and Chapter 4 we will focus on the parallel solution of large and sparse eigenvalue problems by means of iterative methods. Libraries implementing such kind of methods are ARPACK [88], PRIMME [125], FEAST [105], Anasazi [14], and SLEPc [60].

ARPACK (ARnoldi PACKage) is a library for the computation of a few eigenpairs of a general large sparse matrix. As its name points out, it implements a variant of the Arnoldi algorithm with an implicit restart, called Implicitly Restarted Arnoldi Method (IRAM). It is implemented in Fortran 77 and can solve both the standard and generalized eigenvalue problems, working with real or complex arithmetic, in single or double precision. It makes use of a reverse communication interface, which returns the control of the operations to the invoking program, indicating to it the next required operation. This feature allows it to work with any matrix storage type, but requires the user to implement matrix-vector products and linear solves. It has good efficiency, and given the maturity of the methods it is one of the most popular software libraries for computing eigenvalues. There also exists a parallel version of the library, called PARPACK [93], that supports MPI. It uses a BSD license.


PRIMME (PReconditioned Iterative MultiMethod Eigensolver) is a parallel library for finding a few eigenpairs of a Hermitian matrix. The library provides a generic multimethod eigensolver, based on Davidson and Jacobi-Davidson, and supports preconditioning, as well as the computation of interior eigenvalues. It is implemented in C and provides Fortran 77, Python, MATLAB, and R interfaces. Its parallelism is based on MPI and it uses a BSD license.

FEAST is a numerical parallel library for solving both the standard and the gen- eralized eigenvalue problem, and obtaining all the eigenpairs within a given search interval. It is based on the FEAST algorithm, an innovative and stable numerical algorithm which deviates fundamentally from the traditional Krylov subspace based iterations or Jacobi–Davidson techniques. This algorithm takes its inspiration from the density-matrix representation and contour integration technique in quantum mechanics. The library is released under a BSD license and has been integrated into the Intel MKL library.

Anasazi is a subset of the Trilinos [61] library for solving large-scale eigenvalue problems. It implements several algorithms, like a block-oriented Krylov-Schur method, a generalized Davidson method, a Riemannian Trust-Region method, and the LOBPCG method, among others. It is developed in C++, supports parallelism by means of MPI, and is released under an LGPL license.

Next we describe in more detail the two software libraries on top of which we have made all the developments presented along the following chapters. As this chapter reflects, our work is focused on the field of eigenvalues and related areas. Within this field, the SLEPc library is a reference in terms of numerical robustness and efficiency, with highly scalable parallel implementations of state-of-the-art algorithms. It offers a wide range of solvers, allows a high degree of parameter adjustment, and is easily extensible. At the same time, SLEPc is developed on top of another key library in the linear algebra field, PETSc [15].

3.3.1 PETSc

The Portable, Extensible Toolkit for Scientific Computation (PETSc) is an object-oriented parallel framework for the numerical solution of partial differential equations. PETSc uses the MPI paradigm for coarse-grain parallelization, and is intended to build large-scale scientific applications and to run on distributed-memory high-performance computers. Its code is structured hierarchically in several modules, each of them providing a single abstract interface and one or more implementations that use specific data structures. The data structures represent, for instance, vectors and matrices, and algorithmic objects (solvers) for different types of problems, including linear (and non-linear) systems of equations. Complex objects like solvers are composed of more basic objects in the hierarchy, like vectors and matrices. Some of those structures are shown in Figure 3.1, where their associated subtypes also appear, corresponding with different implementations of the methods of a particular interface or with the storage format used.


[Figure: hierarchy of PETSc components — application codes and higher-level libraries on top of the PETSc classes SNES, TS, KSP, PC, Mat, Vec and IS (with their subtypes, e.g. GMRES, CG, Bi-CGStab, ILU, LU, CSR, CUSPARSE, CUDA, ViennaCL), which rely on BLAS/LAPACK and MPI.]

Figure 3.1. Components of PETSc.


[Figure: row-wise parallel distribution of a sparse matrix A and two vectors x and y among processes P0–P4.]

Figure 3.2. Parallel distribution of a sparse matrix and two vectors in PETSc.

In general, the algorithm employed to perform a specific operation is determined by the type of the object instantiated. Vector objects are dense, but matrices can be stored with many different representations, including dense, several sparse formats, and block-oriented formats, and also allow the use of user-defined matrices called shell matrices², among others. In general, all those representations have a serial and a parallel subtype, that store the data on a single process or distribute it among several processes, respectively. Figure 3.2 shows the row-based distribution of the elements of a matrix among several processes.

PETSc is written in C, so it can be directly used from application codes written in C and C++, but also by means of Fortran and Python [32] bindings. The combination of C and MPI makes the library highly portable, allowing it to be used on almost any parallel machine. PETSc defines a neutral datatype called scalar that allows compiling the library to work with real or complex arithmetic, and to specify the precision to be used, be it single, double or quadruple. The default setup configures PETSc to work with real arithmetic in double precision. The application code interacts with the interface of the objects provided without caring about the details of the underlying data structures and parallelism, allowing the users to ignore them and focus on the high-level problem to solve. PETSc provides calls to help in the distribution and management of parallel data, but as it employs MPI, the user can also make use of any MPI call to operate with the data.

A characteristic of PETSc is the possibility of interfacing with third-party libraries in order to provide additional functionality. PETSc can interface to multiple numerical libraries like BLAS, LAPACK or MUMPS [5], and to numerical environments like MATLAB. In addition, the library enables the runtime customization and/or extension of the data structures and algorithms provided. Another interesting feature of PETSc is its built-in support for profiling the application code. It offers the users a summary of the different stages of the

² Shell matrices can handle the matrix implicitly or store the data allowing user-defined storage formats.

execution, in which performance factors like the time, the floating-point operation rate and the memory usage of the PETSc routines are reported, after being automatically logged at runtime. In the same way, the user-provided routines can also be profiled by means of this functionality. During all the life of the library, and due to its highly active development, a well-known characteristic of PETSc has been the continuous change of its own code, including its public interface, in order to enhance the solvers and support new features and emerging architectures.

Another feature of which we make use in this thesis is the support for GPU computing, officially available since version 3.9³. Early GPU support in PETSc [96] required the functionality of CUSP [33]⁴ for managing vectors on GPU. Vector operations and sparse matrix-vector products were performed through VECCUSP, a special vector class whose array is mirrored in the GPU, and MATAIJCUSP, a matrix class where data generated on the host is then copied to the device on demand. The computational effort is in this way transferred to the graphics card, transparently to the calling code. The AIJ part of the name of the matrix class indicates the format used to store the matrix in the CPU memory. This format used by PETSc is the same as the CSR format seen in Chapter 2. The layout in GPU memory does not necessarily correspond with the one used in CPU memory, as MATAIJCUSP supports CSR, DIA and ELL formats on the GPU. One feature of PETSc is that, besides automatically allocating memory for an object on its creation, it also allows the users to manually perform the memory allocation and pass a pointer to it when creating a new object. Due to the fact of being based on Thrust, vector objects created on GPU memory by means of CUSP cannot be instantiated in this way. This issue affects negatively the usage of PETSc, limiting its flexibility. Within the course of this thesis we proceeded to create VECCUDA, a new vector type for the GPU, analogous to VECCUSP, that uses the memory management provided by the CUDA runtime and cuBLAS for performing vector operations. This new type does not have the limitations of VECCUSP and allows specifying the memory that the vector object will use.

Currently, the GPU support in PETSc is based on using cuBLAS and cuSPARSE⁵ to perform vector operations and sparse matrix-vector products through VECCUDA and MATAIJCUSPARSE, vector and matrix classes with functionality analogous to the CUSP-based classes. Matrices with type MATAIJCUSPARSE are stored in CPU memory using the AIJ format, and on the GPU the CSR, ELL and HYB formats are supported. The GPU model considered in PETSc uses MPI for communication between different processes, each of them having access to a single GPU. On a process, GPU memory is allocated on demand on the first usage, and the implementation includes mechanisms to guarantee the coherence of the mirrored data structures on the host and the device. For instance, the underlying operations to the memory of the objects can have read, write, or read-write access. A flag is used to specify in which memory the newest data resides.

³ On previous versions GPU support was available on the development branch of PETSc.
⁴ CUSP-based GPU support was removed in version 3.9 of PETSc.
⁵ PETSc also features OpenCL support for GPU hardware other than NVIDIA, but we do not consider this here.


[Figure: hierarchy of SLEPc components — the solver classes EPS, SVD, PEP, NEP and MFN with their solver subtypes (e.g. Krylov-Schur, Subspace, GD, JD, LOBPCG, CISS, TOAR, thick-restart Lanczos, Expokit), on top of the auxiliary classes ST, BV, DS, RG and FN.]

Figure 3.3. Components of SLEPc.

Depending on the type of access required, the data may be copied from one memory to the other, and/or the flag may be modified⁶. Besides the transparent use of the GPU offered by the PETSc objects, they also provide direct access to the allocated memory (be it on CPU or on GPU). This feature, together with the possibility of configuring PETSc to link with third-party libraries, enables the use of other GPU-aware libraries such as MAGMA from user-provided routines.
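A hedged petsc4py sketch of selecting these GPU-enabled classes is shown below; it assumes a PETSc build configured with CUDA support and uses the type names corresponding to VECCUDA and MATAIJCUSPARSE. The same selection is more commonly made at run time with the options -vec_type cuda and -mat_type aijcusparse.

```python
# Selecting GPU-mirrored vector and matrix types from Python (requires a
# CUDA-enabled PETSc build; otherwise these type names are unavailable).
from petsc4py import PETSc

n = 1000
A = PETSc.Mat().create()
A.setSizes([n, n])
A.setType('aijcusparse')   # MATAIJCUSPARSE: AIJ on the host, CSR/ELL/HYB on the GPU
A.setUp()
A.assemble()               # left empty here, enough to illustrate the type selection

x = PETSc.Vec().create()
x.setSizes(n)
x.setType('cuda')          # VECCUDA: array mirrored in GPU memory
x.set(1.0)
y = x.duplicate()
A.mult(x, y)               # the sparse matrix-vector product runs on the device
```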

3.3.2 SLEPc

The Scalable Library for Eigenvalue Problem Computations (SLEPc) follows the same object-oriented design as PETSc, and extends its functionality with methods to solve large-scale sparse eigenvalue and related problems. Like PETSc, it is developed in C under a BSD license and can be used from C, C++, Fortran and Python applications. It provides several modules to solve linear (EPS), polynomial (PEP), and non-linear (NEP) eigenvalue problems, as well as other related problems like singular value decompositions (SVD), or the action of a matrix function on a vector (MFN). Figure 3.3 shows the hierarchy of objects that constitute the library. The main classes make use of other low-level classes that are described next.

ST provides the functionality for performing spectral transformations (changes in the variable associated with the eigenvalue). Some of the available transformations are shift, applying λ = θ + σ for a given σ, or shift-and-invert given by λ = 1/θ + σ. Applying these transformations generates a new eigenvalue problem

⁶ The flag can be set to "GPU not allocated", CPU, GPU or both memories.


with the same eigenvectors as the original one, and with the eigenvalues modified according to the transformation used. These transformations can be used together with eigenvalue solvers in order to improve their convergence. Some of the modules to solve eigenvalue problems (EPS and PEP) provide an ST object that realizes the transformation of the initial problem given a specific spectral transformation. This encapsulation enables the eigensolvers to abstract from the spectral transformation used. For instance, when solving the standard problem with a matrix A, applying the operator to a vector x translates into (A − σI)^{-1}x when using the shift-and-invert transformation. By default, the transformed matrices are not explicitly built, and the inverse is not computed. Instead, the ST solves a linear system of equations by means of a PETSc KSP object when necessary. Parallelism within this class is given by the operations of the low-level parallel objects that it uses.

BV represents a basis of vectors and its associated operations. It provides data structures and routines to operate with sets of vectors, including functionality to perform the orthogonalization of the vectors. Most of SLEPc's solver classes contain a BV object to work with the subspace that they generate. There exist several subclasses depending on the type of storage used: svec, that stores the vectors as a single PETSc Vec object; vecs, that stores the vectors as a collection of Vecs; mat, that uses a PETSc Mat object in which each column stores a vector; and contiguous, that stores the vectors contiguously in memory. This class offers two levels of parallelism: task-level parallelism, as each process performs the computation of its local portion of the vectors, and thread-level parallelism, with parallel implementations of BLAS and LAPACK. It is optimized to perform its operations in an efficient way, minimizing the global communication [58] and prioritizing, in the local computations, the use of block-oriented operations over vectorial ones.

DS provides the functionality to solve the projected problems of reduced dimension that most of the SLEPc solvers generate. There is a specific DS type for each one of the projected problems generated by the high-level classes. The parallelism within this class is performed exclusively on the local node by means of parallel implementations of BLAS and LAPACK, as this computation is done redundantly by all the processes, given that all of them have the projected problem stored. As long as the size of the projected problem is not too large, the global performance is not affected by this step.

RG defines a region on the complex plane. Most of the solvers contain an RG object, that is used to delimit a region in which to look for eigenvalues. The different implementations define intervals, polygons and ellipses.


There is no parallelism associated with this class. The object is replicated on all the processes and its operations are performed redundantly without affecting the performance.

FN provides functionality to perform dense functions of matrices. The class includes some predefined functions that can be composed to obtain more complex ex- pressions. They are used to specify a non-linear problem associated to a NEP object, or to indicate the matrix function of an MFN object. Some of the func- tionality of this class has been added in the context of this thesis, and more details about it are provided in Chapter 5. The parallelism within this class is performed exclusively on the local node by means of parallel implementations of BLAS and LAPACK. The computation is done redundantly by all the processes, given that all of them have the same data.

SLEPc inherits all of PETSc's functionality and characteristics, including the support for GPU computing. In the same way as PETSc, SLEPc's solvers are data-structure neutral, meaning that the computation can be done with different sparse matrix storage formats. It offers the users the possibility to specify many parameters, such as the number of eigenpairs to compute, the convergence tolerance or the dimension of the built subspace, both programmatically and at run time. And it is also extensible, in the sense that it is possible to plug in user-provided code to customize the solvers. In this thesis we make use of the high-level modules EPS and MFN, which solve linear eigenvalue problems and the action of a matrix function on a vector, respectively. But in this chapter we focus exclusively on the solution of eigenvalue problems by means of EPS. The default eigensolver in SLEPc is the Krylov-Schur method described in Section 3.2.1, which essentially consists in the Arnoldi recurrence enriched with an effective restart mechanism. The full algorithm involves the following operations:

1. Basis expansion. To obtain a new candidate vector for the Krylov subspace, a previous vector must be multiplied by A. In the generalized eigenproblem (3.2) the multiplication is by B−1A or, alternatively, by (A − σB)−1B if shift-and- invert is used, and hence linear system solves are required.

2. Orthogonalization and normalization of the vectors. The new jth vector of the Krylov basis must be orthogonalized against the previous j − 1 vectors. This can be done with the (iterated) modified or classical Gram-Schmidt procedure.

3. Solution of the projected eigenproblem. A small eigenvalue problem of size m must be solved at each restart, for the matrix H_m = V_m^*AV_m.

4. Restart. The associated computation is VmQp, p < m, where the columns of V span the Krylov subspace and Q contains the Schur vectors of Hm.


Table 3.1. List of cuBLAS routines employed in the sveccuda implementation of the BV object.

BLAS Routine   BLAS Level   Description      Number of BV functions using it
scal           1            x ← αx           2
axpy           1            y ← αx + y       1
gemv           2            y ← αAx + βy     1
gemm           3            C ← αAB + βC     6

The operations with highest computational cost are the expansion of the subspace and the orthogonalization process, which in a parallel implementation also require communication between the processes. The expansion of the subspace, shown as a matrix-vector multiplication in step 5 of Algorithm 3.3, often corresponds to the solution of a linear system of equations, as happens in the generalized case, where the operator is B^{-1}A, or (A − σB)^{-1}B in case of using the shift-and-invert spectral transformation. The sparse matrix-vector multiplication can be performed on GPU by using the appropriate Mat type. SLEPc's implementation of the Krylov–Schur method uses PETSc's KSP object to solve the linear system of equations. The orthogonalization process is performed by SLEPc's BV object. One of the recent improvements made in the code of SLEPc has been the enhancement of the computational intensity of this object. As already commented, one of the implementations of BV (svec) uses a single PETSc Vec object to store the vectors of the basis. SLEPc's code has been refactored to allow this svec implementation to operate with multiple vectors at the same time, using Level 2 and 3 routines of BLAS. Additionally, in the context of this thesis we have created the sveccuda⁷ subtype to provide an implementation of BV that performs the operations on GPU. Table 3.1 shows the cuBLAS operations used by sveccuda and the number of internal routines that employ them. We can see how most of the internal routines can take advantage of BLAS Level 3 operations. In SLEPc, the Krylov-Schur eigensolver includes the possibility of specifying the maximum dimension of the projected problem as a parameter (mpd), which enables the computation of a large number of eigenpairs in chunks, by locking eigenpairs converged at each restart.
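As an illustration of the EPS workflow just described, the following slepc4py sketch assembles a simple tridiagonal test matrix and solves it with the Krylov-Schur eigensolver; the matrix and the parameter values are arbitrary, and the snippet is not part of the codes developed in this thesis.

```python
# Krylov-Schur solve of a standard Hermitian eigenproblem with slepc4py.
from petsc4py import PETSc
from slepc4py import SLEPc

n = 1000
A = PETSc.Mat().createAIJ([n, n], nnz=3)         # 1-D Laplacian as a test matrix
Istart, Iend = A.getOwnershipRange()
for i in range(Istart, Iend):
    A.setValue(i, i, 2.0)
    if i > 0:
        A.setValue(i, i - 1, -1.0)
    if i < n - 1:
        A.setValue(i, i + 1, -1.0)
A.assemble()

eps = SLEPc.EPS().create()
eps.setOperators(A)                               # standard problem Ax = lambda x
eps.setProblemType(SLEPc.EPS.ProblemType.HEP)     # Hermitian eigenproblem
eps.setType(SLEPc.EPS.Type.KRYLOVSCHUR)           # the default solver, set explicitly
eps.setDimensions(nev=10, mpd=40)                 # eigenpairs wanted / max projected size
eps.setWhichEigenpairs(SLEPc.EPS.Which.SMALLEST_REAL)
eps.setFromOptions()
eps.solve()
for i in range(eps.getConverged()):
    print(eps.getEigenvalue(i))
```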

3.4 Solving eigenproblems with GPUs

To illustrate the applicability of the SLEPc library and its use on GPU we next present our work concerned with numerical simulation in the context of molecular magnetism, where the goal is to analyze molecule-based magnetic materials with

⁷ The implementation of sveccuda makes use of PETSc's VECCUDA type.

interesting properties that are important in applications such as high-density information storage. In particular, we focus on magnetic clusters, that is, molecular assemblies of a finite number of exchange-coupled paramagnetic centers. These assemblies are midway between small molecular systems and the bulk state, making it possible to model them as the former, with quantum mechanical principles, rather than with the simplifications required for the latter. This allows for a deeper understanding of the magnetic exchange interaction. However, when analyzing clusters with a growing number of exchange-coupled centers, the complexity soon becomes prohibitive due to the lack of translational symmetry within the cluster.

A flexible methodology for studying high-nuclearity spin clusters is based on the use of the technique of irreducible tensor operators (ITO) [21, 121]. This approach enables the evaluation of eigenvalues and eigenvectors of the system, then deriving from them the magnetic susceptibility, the magnetization, as well as the inelastic neutron scattering spectra. This functionality is provided by the Magpack package [22], covering both anisotropic exchange interactions as well as the simpler isotropic case. Magpack is a set of serial Fortran codes, whose scope of applicability is limited to a very small number of centers, due to the high computational cost associated with the creation of the Hamiltonian matrices followed by their diagonalization. The dimension of these matrices grows rapidly with the number of spin cluster basis functions.

In a previous work, E. Ramos et al. [109] reworked the Magpack codes in order to be able to cope with large-scale problems, with many spins. The developed codes are parallel, hence enabling the use of large supercomputers, and are based on carrying out a partial diagonalization of the system matrices by means of SLEPc. In this way, the computational load is shared across the various processors participating in the parallel computation, and in addition, only a modest percentage of the eigenvalues and eigenvectors is obtained, thus avoiding many superfluous calculations. Still, the computational requirements can be huge so we focus on improving the efficiency of the software as much as possible.

In particular, we work with ParIso, the parallel code for coping with isotropic spin clusters, although the use of GPUs could be easily extended to the anisotropic case. We have made two major improvements in this code. On the one hand, we have added a preprocessing step aiming at reducing the number of partial diagonalizations required during the computation. This step computes bounds for the spectrum of the submatrices in which the main Hamiltonian matrix decomposes, and estimates the number of eigenvalues to compute in each submatrix. With this strategy, the computation provides equally satisfactory results with a considerable decrease of the time of calculation, although we will discuss that the way in which the number of eigenvalues is estimated may not work in all cases. On the other hand, we have extended the parallelization paradigm in order to exploit graphics processing units. The use of GPUs represents a finer level of parallelism, to be added to the already mentioned coarse-grain parallelism, that provides an additional speedup factor that can significantly reduce the turnaround time of the computation.


3.4.1 The ParIso code

Consider a spin cluster composed of an arbitrary number of magnetic sites, N, with local spins. In order to obtain the set of spin cluster basis functions, the local spins are successively coupled,

|S_1 S_2 S̃_2 S_3 S̃_3 ... S_{N−1} S̃_{N−1} S_t⟩ = |(S̃) S_t⟩, (3.24)

where S̃_i refers to the intermediate spin values S_1 + S_2 = S̃_2, S̃_2 + S_3 = S̃_3, etc., (S̃) is the full set of S̃_i (N − 1 intermediate spin states) and S_t is the total spin [21]. The system matrix can be evaluated by applying the Hamiltonian to the created basis set, by means of the irreducible tensor operators technique. The advantage of this methodology is that it allows us to completely take into consideration all kinds of magnetic exchange interactions between the metal ions comprised in clusters of arbitrary size. This is done by expressing the contributions to the spin Hamiltonian (expressed in terms of the conventional spin operators) as a function of the generalized Hamiltonian (written in terms of ITOs). The ParIso code computes the quantities mentioned above for isotropic systems, that is, it generates the spin functions of the system, calculates the energy matrix and obtains its eigenvalues and eigenvectors, all this in parallel. Isotropic systems are a special case where only the isotropic and biquadratic exchange terms are present in the spin Hamiltonian. These terms have the property of not mixing functions with different quantum number S and not breaking the degeneracy of levels with the same S and different M. This decouples the energy matrix into several submatrices, one for each different S quantum number. Taking into account the ITO technique, it is possible to eliminate this M quantum number and reduce the size of each S submatrix by a factor of 2S + 1. No Zeeman terms are present during the diagonalization in this case, but they would be included after it. Thus, in the case of isotropic systems, the energy matrix can be written as a block diagonal matrix,

A = \begin{bmatrix} A_1 & & & \\ & A_2 & & \\ & & \ddots & \\ & & & A_b \end{bmatrix}, (3.25)

where each of the b blocks is a symmetric sparse matrix of different dimension. Finding the leftmost eigenvalues of A amounts to computing the leftmost eigenvalues of each of the blocks, A_i. Thus, the structure of the program ParIso is geared to this block structure, where one partial diagonalization is carried out per block. The main steps of the computation are the following:

1. Setup of data containing the information of the cluster.
2. For each diagonal block, i = 1, ..., b, do:
   (2.1) Generation of starting spin functions.


(2.2) Evaluation of energy submatrix, Ai. All nonzero elements are computed and assembled into the matrix.

(2.3) Partial diagonalization of A_i. Given the eigenvalue relation A_i x = λx, a subset of the spectrum is computed, corresponding to the leftmost eigenvalues.

3. Generation of final results.

3.4.2 Optimization of spin eigenanalysis

In the ParIso code, the individual blocks of matrix A in (3.25) are treated separately. The dimensions of these blocks vary widely, ranging from 1 to a hundred thousand or even more, and the percentage of nonzero elements of each (large) block is about 1–2%. The generation of the matrix is made submatrix by submatrix (each submatrix is a spin energy). Not all submatrices need to be in memory simultaneously, since after generating one submatrix it is possible to compute its partial diagonalization, and then the matrix is no longer needed and can be destroyed. For all these reasons, it is possible to calculate much larger systems in the isotropic case than in the anisotropic one. The optimization in ParIso that we introduce in this section tries to avoid the computation associated with those blocks that are not going to contribute significantly to the aggregated result. This allows a drastic reduction of the time necessary for the overall execution. The rationale is that Lanczos methods can provide robust bounds for the spectrum of each of the submatrices with a relatively small cost. Based on the location of the first and last eigenvalues of each block, we can discard some of the blocks if they lie outside the range of interest specified by the user. The resulting algorithm performs the following steps:

1. In a first pass, the program traverses all the submatrices and, for each of them, it calculates the first and last eigenvalue, filling a table with the information obtained for all blocks.

2. From the table of minimum and maximum eigenvalues, the program determines a point of energetic cut depending on the value of population to be considered (provided as a user input parameter).

3. With the cut point and the dimension of each submatrix, the code determines how many eigenvalues are required in each of them. A value of zero implies that the corresponding submatrix can be discarded, since its energetic levels are outside the requested range.

4. Finally, a second pass computes the wanted eigenvalues, traversing only the submatrices that will provide useful information (a sketch of this two-pass strategy is given below).
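The following Python/SciPy sketch reproduces the logic of this two-pass strategy (it is not the actual ParIso code): the first pass bounds the spectrum of every block with Lanczos runs, and the eigenvalue count per block is then estimated under the uniform-distribution assumption discussed below.

```python
# Plan how many eigenvalues to compute per block, given an energetic cut.
from scipy.sparse.linalg import eigsh

def plan_eigenvalue_counts(blocks, cut, maxev=None):
    counts = []
    for Ai in blocks:                       # first pass: spectral bounds of each block
        n = Ai.shape[0]
        if n == 1:
            lo = hi = Ai[0, 0]
        else:
            lo = eigsh(Ai, k=1, which='SA', return_eigenvectors=False)[0]
            hi = eigsh(Ai, k=1, which='LA', return_eigenvectors=False)[0]
        if cut <= lo:
            k = 0                           # block entirely above the cut: discard it
        elif cut >= hi:
            k = n
        else:
            k = int(round(n * (cut - lo) / (hi - lo)))   # roughly uniform spectrum assumed
        if maxev is not None:
            k = min(k, maxev)               # optional upper bound per submatrix
        counts.append(k)
    return counts    # the second pass regenerates each block and computes k eigenvalues
```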

This process is illustrated in Figure 3.4 with an example. The horizontal lines represent the span of the spectrum of each of the matrix blocks (18 in this case). The dimensions of the blocks range from 1 (the last one) to 3150. The vertical line represents the energetic cut to be considered.


Table 3.2. Systems used.

System          Size      Largest submatrix size
7Mn2+           24017     3150
8Mn2+           135954    16576
9Mn2+           767394    88900
9Mn2+ + 1Cu2+   1534788   177100

In the plot we can easily see that this energetic cut leaves out 8 of the submatrices, which are the largest ones in this particular example. The vertical marks on the horizontal lines to the left of the energetic cut represent those eigenvalues that must actually be computed. Note that the plot does not show real eigenvalues, but an estimation based on the size of the submatrix and assuming a roughly uniform distribution of eigenvalues. This approximation is only valid for those systems with one spin state overstabilized with respect to the rest, e.g. completely ferromagnetic systems with very high magnetic coupling, which stabilize the high-spin ground state, or completely antiferromagnetic clusters with a very stabilized intermediate spin state. In this work, for validation we use the simple systems with this property shown in Table 3.2, the first one with ferromagnetic and anti-ferromagnetic interactions, and the rest with only ferromagnetic interactions.

In our code it is also possible to specify an upper bound (maxev) for the number of eigenvalues to compute in any submatrix. This is useful for large problems where a given threshold would imply computing too many eigenvalues. As an example, in the problem of Figure 3.4 the maximum number of requested eigenvalues is 33, and setting maxev=25, for instance, would imply in practice shifting the vertical line a certain amount to the left.

The above procedure avoids lots of unnecessary computations, with the corresponding gain in the overall simulation time. The only drawback is that the submatrices that participate in the second pass must be regenerated, since it is not possible to keep all blocks in memory simultaneously. Still, the new methodology is much faster than the former one, as will be shown below. Apart from modifying the algorithm, we have also optimized the memory usage by (1) adjusting the dimension of the various arrays to fit the maximum submatrix size, and (2) converting real variables to integer ones whenever possible. These two actions have allowed a significant reduction of the memory footprint per MPI process, down to one third of the previous values. All in all, these improvements enable us to cope with larger, more challenging problems.

Code validation

We have carried out a number of numerical experiments in order to validate the correctness of the parallel code, and to evaluate its performance. The computer system used for the computational experiments in this section is Tirant8, an IBM cluster consisting of 512 JS20 blade computing nodes, each of them with two 64-bit PowerPC 970FX processors running at 2.2 GHz, interconnected with a low latency Myrinet network.

8Description of Tirant 2, decommissioned in 2018.

Figure 3.4. Example of energetic cut for a test problem with 18 submatrices. Only eigenvalues located to the left of the vertical line need to be computed. The number of eigenvalues to compute depends on the dimension of the submatrix. A zoom of the region of interest is shown on the right.

For the validation of the code, we have used a small system (7Mn2+) as a test case, whose submatrices have the following dimensions: 1050, 1974, 2666, 3060, 3150, 2975, 2604, 2121, 1610, 1140, 750, 455, 252, 126, 56, 21, 6, and 1. This is the system used in Figure 3.4 to illustrate the algorithm, and its real data can be seen in Figure 3.5. The largest block is of size 3150 and the susceptibility was computed for a range of temperatures up to 300K. In this case, the cut value varies as a function of the population (pobla, a user input parameter), as shown in Table 3.3, and we have not considered setting the maxev parameter because the number of required eigenvalues is small. The vertical line depicted in Figure 3.5 corresponds to the energetic cut of pobla=10−2. The total number of eigenvalues per spin of this system, and how many of them are selected for computation once the energetic cut has been determined, are shown on the left side of Table 3.4. The computation time with one processor is also shown in Table 3.3. As a reference, the total time required in the case of computing all eigenvalues of all submatrices is 1917 seconds (this will be referred to as the full computation).

Figure 3.5. Full representation of the 7Mn2+ system, showing the energetic cut and the computed eigenvalues.

Figure 3.6 shows the susceptibility results (product χT against the temperature) for the different values of the population, compared to the case of the full computation. For low temperatures, all lines match, while for higher temperatures the graphs diverge from the full computation, being more accurate for smaller values of pobla, as expected.

Performance analysis

In order to assess the performance of the code, both serially and in parallel, we have used a larger system (8Mn2+), whose submatrices have dimensions: 2666, 7700, 11900, 14875, 16429, 16576, 15520, 13600, 11200, 8680, 6328, 4333, 2779, 1660, 916, 462, 210, 84, 28, 7, and 1. The largest block is of order 16576 and the temperature range of the simulation extends up to 300K.

Table 3.3. Value of energetic cut for different values of the population parameter in the 7Mn2+ system, and the corresponding computation time (in seconds) with one processor.

pobla    Energetic cut (cm−1)    Computation time (s)
10−1     480.11                  120
10−2     960.23                  133
10−3     1440.35                 217
10−4     1920.47                 313


Table 3.4. Total number of eigenvalues (n) for each of the spins and the selected values to be computed (k), for the systems 7Mn2+ (left) and 8Mn2+ (right) using maxev=1000 and pobla=10−2.

        Eigenvalues                     Eigenvalues
Spin       n       k        Spin          n       k
 0      1050       0          0        2666     125
 1      1974       0          1        7700     372
 2      2666       0          2       11900     608
 3      3060       0          3       14875     807
 4      3150       0          4       16429    1000
 5      2975       0          5       16576    1000
 6      2604       0          6       15520     981
 7      2121       0          7       13600     896
 8      1610      14          8       11200     766
 9      1140      28          9        8680     614
10       750      33         10        6328     462
11       455      30         11        4333      86
12       252      24         12        2779       0
13       126      18         13        1660       0
14        56      12         14         916       0
15        21       8         15         462       0
16         6       5         16         210       0
17         1       1         17          84       0
                             18          28       0
                             19           7       0
                             20           1       0



Figure 3.6. Susceptibility for different values of the population in the 7Mn2+ system.

Table 3.5. Value of energetic cut for different values of the maxev parameter in the 8Mn2+ system.

maxev    Energetic cut (K)
1000     1312.39
3000     3936.93
5000     6561.56

In this case, the number of eigenvalues that are needed for a given energetic cut may be quite large, so we are interested in studying the influence of the maxev parameter. We have set this value to 1000, 3000 and 5000 eigenvalues, and it is the parameter that determines the energetic cut, see Table 3.5. The total and computed eigenvalues for a value of maxev=1000 can be seen on the right side of Table 3.4. If we set different values of pobla (10−1, 10−2, and 10−3, as in the previous example), we will hardly appreciate any noticeable differences in the results. The sequential execution time corresponding to the full computation is 403000 seconds. Table 3.6 shows execution times in parallel for the three values of maxev. Even computing as many as 5000 eigenvalues, the sequential time has been reduced to one fourth of the original one. And with parallel computing we reduce the times even further, with a speedup factor of around 10 with 16 processors. In order to assess the accuracy of the computed susceptibility, in Figure 3.7 we compare the plot obtained with the full computation against the results for the three values of maxev used. As expected, the larger the value of maxev, the closer to the full computation, but any of the values used provides a quite accurate susceptibility curve. Even for maxev=1000 the susceptibility curve only diverges at the end of the temperature range used, and it would be necessary to double the temperature to appreciate the divergence. The plots correspond to a value of pobla=10−1, but, as mentioned before, the graphs are essentially the same for 10−2 and 10−3.


Table 3.6. Parallel execution time (in seconds) for the 8Mn2+ system with different values of maxev and increasing number of MPI processes.

                        Processes
maxev        1        2        4        8       16
1000     27351    17020     9543     5167     2810
3000     80172    51217    26932    14476     7926
5000    106384    62590    32909    17655     9690


Figure 3.7. Susceptibility for different values of the maxev parameter in the 8Mn2+ system.


3.4.3 Acceleration with graphics processors

We now present a GPU-enabled implementation of ParIso that can also operate with multiple GPUs (one per MPI process). We remark that these new developments are independent of the optimization discussed in Section 3.4.2, and could also be integrated in the original ParIso code. ParIso can benefit from GPU computation mainly in two ways: when computing matrix coefficients and when solving the eigenproblems with SLEPc. The former is very appropriate for GPU computing, since matrix coefficients can be evaluated independently from each other. Regarding the latter, expected gains are modest since it corresponds to a sparse linear algebra computation. Implementing sparse linear algebra operations efficiently on GPUs is difficult, and still a topic of active research. Usually, the most relevant operation is the sparse matrix-vector multiplication, which can be approached in different ways [89, 110]. In our case, we tried all the different sparse storage formats available in PETSc with CUSP and cuSPARSE, and the difference among them is not noticeable in our application, since the percentage of time devoted to this operation is just about 3–4% of the total computation.


Table 3.7. Different versions of ParIso with GPU support.

Version        Matrix created on    Eigencomputation on
CPU-sbaij      CPU                  CPU
CPU-aijcusp    CPU                  GPU
GPU-sbaij      GPU                  CPU
GPU-aijcusp    GPU                  GPU
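The run-time selection mentioned above relies on standard PETSc options handling. The following sketch (our own illustration, with n assumed to be the global matrix size) lets the storage type of Table 3.7 be chosen on the command line with -mat_type sbaij or -mat_type aijcusp.

Mat            A;
PetscErrorCode ierr;

ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
/* default type; overridden at run time with "-mat_type sbaij" or "-mat_type aijcusp" */
ierr = MatSetType(A, MATAIJ);CHKERRQ(ierr);
ierr = MatSetFromOptions(A);CHKERRQ(ierr);

A run could then be launched as, e.g., mpirun -np 4 ./pariso -mat_type aijcusp, where the executable name is hypothetical.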

Details of GPU implementation

Even though it is possible to use CUDA from Fortran with a non-free compiler, or by means of calling CUDA-C wrapper functions, in order to make use of the GPU computing power we have ported the original Fortran code to C, as it enables a simpler development. The code uses CUDA in two ways: one by migrating parts of the application to run on the GPU as CUDA kernels, and another one by means of SLEPc, as it allows us to do the computation exclusively on the CPU or with the help of the GPU. We call 'GPU version' the one that uses CUDA kernels for the generation of the matrix (independently of the storage type used), even when the original CPU version can be instructed to use the GPU by means of the aijcusp matrix storage type.

With both versions and the different storage types it is possible to run the software in four main ways: CPU and GPU, each one with a matrix storage type that replicates the data into the GPU (aijcusp) or not (sbaij). Table 3.7 summarizes the four combinations. Note that the user can select one of them at run time by means of command-line arguments.

One of the benefits of working with symmetric matrices is that the storage can exploit that symmetry to halve the memory usage, as it is done in the code that runs exclusively on the CPU by means of the sbaij matrix storage type. In the case of the GPU, the type of matrix used by PETSc to store the data (aijcusp) does not take the symmetry into account and needs to allocate and fill both triangular parts (upper and lower) of the matrix.

The resulting code can be compiled to work with single and double precision arithmetic, but due to several values exceeding by far the single precision limits, some of the variables have been explicitly declared as double even when working with single precision. This mixed-precision approach allows us to reduce the memory requirements by sacrificing some accuracy in the results. There is also a reduction of computation time but, as will be shown later, the benefits of single precision arithmetic in terms of performance are quite limited.

The migration has been done by selecting the most computationally demanding functions and moving them to run on the GPU. The selected functions are those related with the generation of the submatrices, and one of the functions that compute the final thermodynamic results. The cost of the generation of the magnetic susceptibility is negligible, but the time needed to compute the magnetization depends on the number of samples of the magnetic field intensity used. Even though this step is not very computationally demanding, as it is done exclusively by a single process, its relative weight within the whole computation increases when the number of processes grows and the total computation time is reduced. That is why porting it to the GPU was highly recommended.

In order to avoid heavy performance issues during the assembly of the sparse matrices, it is necessary that every process preallocates the memory corresponding to the part of the matrix that has been assigned to it [15]. For the preallocation to be done, it is necessary to specify how many diagonal and off-diagonal nonzero elements per row the matrix has (here, diagonal elements refer to matrix entries located in column indices matching the range of rows assigned to the current process, see the shadowed blocks in Figure 3.8). Once the preallocation has been done, the process continues by setting the values of the elements and finally assembling the matrix. The matrix is ready to use once the assembly is finished.

Two functions are in charge of the generation and, in the GPU version, both compute all the elements of the matrix (not only the upper half), so we double the work to be done with respect to the previous CPU implementation. The first one counts the number of nonzero elements on the diagonal blocks and outside them, to do the preallocation, and the second one stores the values of these nonzero elements in memory. Both functions do almost the same work, but the amount of memory used by them differs significantly: the counting of the elements needs only a small two-dimensional array of integers to store the sums (with one row for each one of the matrix rows, and two columns, one for the nonzero diagonal elements and another for the nonzero off-diagonal ones), whereas the function that fills the matrix with its values needs two two-dimensional arrays of the full size of the matrix, one of floating point numbers and another one of integers where it records the column indices of the nonzero elements. The counting of the elements is only done on the first pass, and stored to avoid repeating the computation on the second pass.

Each one of these two generation functions has been split in two: the main part runs as a CUDA kernel, making use of a series of auxiliary functions that run as device functions, and the remaining work is done by a host wrapper function that allocates the memory on both CPU and GPU, calls the kernel with the appropriate arguments, and copies the results to host memory. The function that computes the final results follows the same scheme of wrapper and kernel split, but it makes use of two kernels that need to be called serially. The host memory used by the wrapper function is allocated with the cudaHostAlloc instruction to allocate page-locked memory. The copy of the values of the matrix and their accounting from the GPU is done with asynchronous instructions, but as the data is used immediately afterwards, the transfer can be considered synchronous, with no concurrency between these data transfers and arithmetic operations.
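As an illustration of the preallocation step described above, the following sketch shows the usual PETSc pattern: count the nonzeros per local row inside and outside the diagonal block, preallocate, and only then set the values. It is not the actual ParIso generation code (which performs the counting on the GPU); the local row range [rstart, rend) and global size N are assumed to come from the application's own block distribution, and count_row_nonzeros() and fill_row() are hypothetical placeholders for the two generation functions.

/* Sketch of exact preallocation + assembly for one distributed sparse block. */
static PetscErrorCode assemble_spin_block(MPI_Comm comm, PetscInt N, PetscInt rstart,
                                          PetscInt rend, Mat *Aout)
{
  Mat            A;
  PetscInt       i, nloc = rend - rstart, *d_nnz, *o_nnz;
  PetscErrorCode ierr;

  ierr = MatCreate(comm, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, nloc, nloc, N, N);CHKERRQ(ierr);
  ierr = MatSetType(A, MATMPIAIJ);CHKERRQ(ierr);

  /* First pass: for every local row, count the nonzeros that fall inside the
     diagonal block (columns rstart..rend-1) and outside it, then preallocate. */
  ierr = PetscMalloc2(nloc, &d_nnz, nloc, &o_nnz);CHKERRQ(ierr);
  for (i = 0; i < nloc; i++) count_row_nonzeros(rstart+i, rstart, rend, &d_nnz[i], &o_nnz[i]);
  ierr = MatMPIAIJSetPreallocation(A, 0, d_nnz, 0, o_nnz);CHKERRQ(ierr);
  ierr = PetscFree2(d_nnz, o_nnz);CHKERRQ(ierr);

  /* Second pass: compute and insert the values row by row, then assemble. */
  for (i = rstart; i < rend; i++) {
    fill_row(A, i);   /* calls MatSetValues(A,1,&i,ncols,cols,vals,INSERT_VALUES) */
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  *Aout = A;
  return 0;
}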
As we saw in Chapter 2 (Section 2.4.3), inside the GPU we can find several types of memory with different sizes, access latencies, scopes and lifetimes, that we want to take advantage of. Within the generation functions, data with application lifetime is copied once at the beginning of the program to constant memory, and block dependent data is copied to global memory in advance of the submatrix computation (of which a small array is accessed through the texture cache). The amount of register memory available in the device limits the registers a thread can make use of, and the functions that need a large amount of registers limit the number of concurrent threads running on the device. The matrix generation functions result in computation-bound kernels due to the high register use. Their launch cannot fully populate the streaming multiprocessors of the GPU, so they are underutilized and the performance obtained is far from optimal.

Having said that, the computation of each matrix element has no dependencies with the others, and this allows us to populate the GPU with any possible distribution of the grid and block dimensions and size. The launch of the kernels takes into account two different things: on the one hand, the size of the CUDA grid and blocks (number of blocks and number of threads per block, respectively) per dimension, and on the other, the tile size (amount of work, measured in number of matrix elements, that a single thread has to compute) per dimension. The work distribution scheme within the GPU device is the same for all the kernels.

For the generation functions, for each of the dimensions of the matrix, two constants are defined, BLOCK_SIZE and TILE_SIZE, that are used to obtain the kernel call arguments. The calculation of these arguments begins by setting the number of blocks to one, and the number of threads per block to BLOCK_SIZE. Next, it is checked whether the number of rows (or columns) is greater than BLOCK_SIZE multiplied by TILE_SIZE. If it is greater, the grid dimension (number of blocks) is increased to

dimGrid->x = (rows + ((BLOCK_SIZE_X * TILE_SIZE_X) - 1)) / (BLOCK_SIZE_X * TILE_SIZE_X),

if not, the block size is decreased to

dimBlock->x = (rows + (TILE_SIZE_X - 1)) / TILE_SIZE_X.

In the latter case, when the block size is reduced, it does not necessarily end up being a multiple of the warp size, but that means that we are working with a very small matrix, so the performance obtained from the use of the GPU is not going to be good anyway. Once the grid dimension is set, it is necessary to check that it does not exceed the limits of the device and, in that case, reduce its size to the maximum allowed value and establish a counter in order to do several calls to the kernel. This is necessary in the case of very large matrices. The current scheme creates blocks of one thread for the rows axis and 64 threads for the columns axis, with tile sizes of one. Other work distributions have been tested with no better performance obtained. An analogous procedure is used for the kernels that compute the magnetization.
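Putting the previous description together, the launch configuration for one dimension could be computed roughly as in the following sketch. This is our own condensation of the logic above, not the ParIso source; BLOCK_SIZE_X and TILE_SIZE_X stand for the compile-time constants mentioned in the text, and the device limit is queried with cudaGetDeviceProperties.

#define BLOCK_SIZE_X 64   /* illustrative values; the text uses 64 threads on one axis */
#define TILE_SIZE_X  1

static void config_launch_x(int rows, dim3 *dimGrid, dim3 *dimBlock, int *ncalls)
{
  cudaDeviceProp prop;
  int work = BLOCK_SIZE_X * TILE_SIZE_X;        /* matrix elements covered by one CUDA block */

  dimGrid->x  = 1;
  dimBlock->x = BLOCK_SIZE_X;
  if (rows > work) {
    dimGrid->x = (rows + work - 1) / work;                    /* enough blocks to cover all rows */
  } else {
    dimBlock->x = (rows + TILE_SIZE_X - 1) / TILE_SIZE_X;     /* small matrix: shrink the block  */
  }

  /* Do not exceed the device grid limit; if needed, split the work into several kernel calls. */
  cudaGetDeviceProperties(&prop, 0);            /* the device was selected earlier */
  *ncalls = 1;
  if (dimGrid->x > (unsigned)prop.maxGridSize[0]) {
    *ncalls    = (dimGrid->x + prop.maxGridSize[0] - 1) / prop.maxGridSize[0];
    dimGrid->x = prop.maxGridSize[0];
  }
}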



Figure 3.8. Example of an isotropic system expressed as a block diagonal matrix (left) and the partitioning of one of its symmetric sparse blocks between five MPI processes Pi (right).

Hybrid MPI-GPU approach for multi-GPU support

As mentioned in Section 3.3.1, the partitioning of the work between the different MPI processes done by PETSc uses a block-row distribution. Figure 3.8 shows how each one of the sparse blocks that form the block diagonal matrix is partitioned. The horizontal dashed lines delimit the partition of each MPI process Pi, and the shadowed squares show the diagonal block of each process, formed by the columns whose indices correspond to the rows assigned to it.

The execution of the kernels is independent of MPI. Since the MPI version of the code evenly distributes the problem matrices across the processes, several CUDA devices can be used to individually accelerate the execution at each process, provided that the code is run on a cluster with GPUs available in all the nodes. If one node has more than one GPU, the processes select the CUDA device to be used based on their MPI rank, in the same way as PETSc does. The card identifier to be used is the remainder of the rank divided by the number of cards on the node. This simple formula allows the number of processes per GPU to be balanced and, for it to work, we present two options that can be used:

1. the rank used in the computation comes from a communicator that includes only the processes on the same node, or,

2. the rank used in the computation comes from the MPI_COMM_WORLD communicator. In this case, the processes on each node must be created consecutively.

This way, each process will use a different CUDA device (when available). To maximize the performance and to avoid overloading the devices, the MPI launch should take into account the problem size, the number of available nodes, and the number of devices per node.
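The first option in the list above (a node-local communicator) can be obtained portably with an MPI-3 shared-memory communicator, as in the following sketch. This is an illustration of the remainder rule described in the text, not the actual ParIso code.

#include <mpi.h>
#include <cuda_runtime.h>

/* Derive a node-local rank and use it to pick the CUDA device for this process. */
static void select_cuda_device(void)
{
  MPI_Comm nodecomm;
  int      noderank, ndevices;

  /* Communicator containing only the processes that share this node's memory (MPI-3). */
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &nodecomm);
  MPI_Comm_rank(nodecomm, &noderank);

  cudaGetDeviceCount(&ndevices);
  if (ndevices > 0) cudaSetDevice(noderank % ndevices);   /* remainder rule from the text */

  MPI_Comm_free(&nodecomm);
}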


Performance evaluation

In this section we analyze the performance obtained with the GPU version of the software and compare it with the CPU-only version. For this purpose, several runs have been done on the cluster Minotauro9, where each node has two Intel Xeon E5649 processors at 2.53 GHz, 24 GB of main memory, and two M2090 GPUs with 512 cores at 1.3 GHz and 6 GB of GDDR5. The nodes are interconnected with a low latency Infiniband network, and their operating system is RHEL 6.0 with GCC 4.6.1, MKL 11.1 and CUDA 7.0. The values of the pobla and maxev parameters used in these runs have been set to 10−2 and 1000, respectively, and SLEPc's parameter mpd has been set to 50.

Three different cases have been used to evaluate the performance. We have started using the 8Mn2+ system of Section 3.4.2 to have a connection with the executions on the cluster Tirant. The matrix in this case has a size of 135954 and its largest block is of size 16576. The other two systems that we have used to complete the evaluation are 9Mn2+ and 9Mn2+ + 1Cu2+, with sizes of 767394 and 1534788, and with their largest blocks being of size 88900 and 177100, respectively. The 8Mn2+ system has been run with the CPU and with the GPU version in single and double precision arithmetic. At the same time, the GPU version has been run with two different matrix storage types: sbaij and aijcusp. The two largest systems have been run exclusively with the GPU version, with the aijcusp and sbaij matrix storage types, and with single and double precision arithmetic.

Figure 3.9 shows the execution times obtained with the 8Mn2+ system. In the figure we can see how the performance of the CPU-only version is improved by the two GPU runs. We can see that for a single process, both GPU runs reduce the time by more than one order of magnitude with respect to the CPU. For the multi-process runs, the GPU-sbaij run shows a speedup similar to the one obtained by the CPU version, keeping the curves parallel (up to 32 processes), while the aijcusp run does not reduce the time at the same rate, or even increases it when increasing the number of processes. This kind of behaviour, where the performance decreases when the number of processes is increased, is due to the small size of the submatrices. The GPU-sbaij run clearly shows the great benefit provided by the kernels that generate the matrix with respect to their CPU counterpart, as both versions use a symmetry-aware storage that reduces the memory and the arithmetic operations (on the CPU). In the same curve it is possible to appreciate that the speedup is reduced when increasing the number of processes, due to the higher cost of the inter-process communications and the underutilization of the GPU devices caused by the reduction of the workload. The GPU-aijcusp run is even more affected by the problem size, as the overall performance decreases drastically compared with the sbaij versions due to two main drawbacks: it uses twice the memory needed by sbaij, and it has to synchronize the data between the GPU and CPU, besides between the processes, during the computation. As the SLEPc GPU implementation uses vector operations, the performance is directly dependent on the size of the problem. We can appreciate such dependency if we compare the results of the different test cases.

9Description of Minotauro in 2015.



Figure 3.9. Total problem solve time for the 8Mn2+ system with single (left) and double precision arithmetic (right).

While in Figure 3.9 we see that GPU-aijcusp starts with the smallest run times and quickly loses performance, when we increase the size of the problem (Figures 3.10 and 3.11) we can see that it behaves much better and only reduces its performance with 128 processes. It is also noticeable that the sbaij run improves with the increase of the size, as its performance does not decrease so quickly when the number of processes is increased. The figures show that the more the GPU devices are used, the better the performance obtained. Both GPU runs clearly improve the CPU-sbaij run times, maintaining a similar performance and a good scalability, and being able to solve a large system of 1.5 million elements in less than 230 seconds, with 128 GPUs and using double precision arithmetic. For such an improvement to be possible it is necessary to maximize the use of the GPUs, as we can see that the performance decreases in all GPU runs when the processes do not have enough workload. The work done by the GPU needs to be enough to fully populate and use the device, as the performance depends directly on the usage of the device. The behaviour is the same for the kernel executions and for the eigenvalue computation.

3.5 Conclusions

In this chapter we have introduced the eigenvalue problem and the SLEPc library as the context within which our work is done. SLEPc is able to exploit GPU accelerators in some parts of its computations by relying on PETSc capabilities, and to enhance the performance of others by replacing BLAS routines of Level 1 with routines of Levels 2 and 3. Within this chapter we have presented two main optimizations to ParIso, a program for the simulation of isotropic molecular clusters with the ITO computational technique.



Figure 3.10. Total problem solve time for the 9Mn2+ system with single (left) and double precision arithmetic (right).

The first optimization consists in avoiding the computation that does not contribute significantly to the aggregated results. In our tests, this has allowed a drastic reduction of the execution time without losing validity in the results. However, our heuristics for determining the energetic cut (as well as for estimating the number of eigenvalues required in each block of the Hamiltonian matrix) assume a uniform distribution of eigenvalues. This assumption is valid only for systems with specific properties, as discussed in Section 3.4.2, so the method may not be appropriate for general systems.

The second major optimization is the implementation of a GPU-enabled version that can perform either the computation of the matrix coefficients or the computation of the partial diagonalizations, or both, on high-performance graphics processing units. The performance gain is very significant, especially the one associated with the computation of the matrices. Regarding the efficiency of the diagonalization on the GPU, this step relies on the efficiency achieved by the SLEPc library. In the executions presented in this chapter, the BV object uses exclusively Level 1 operations. With the multi-GPU version we are able to reduce the computation time by one order of magnitude with respect to the parallel MPI version running on CPUs. This will make it possible to solve much larger problems, those with real scientific interest, that would otherwise be impossible to address due to memory limitations or lack of computational power.



Figure 3.11. Total problem solve time for the 9Mn2+ + 1Cu2+ system with single (left) and double precision arithmetic (right).

Chapter 4

Block-tridiagonal eigenvalue problems

Let it be

A particular case of the eigenvalue problem that frequently appears in scientific and engineering problems involves a structured matrix with block-tridiagonal shape. Given a two-dimensional domain Ω partitioned in subdomains Ωi like the one represented in Figure 4.1, where the points interconnected between two domains are only coupled to their closest neighbours in the mesh, then a specific domain Ωi is only coupled to its predecessor Ωi−1 and to its successor Ωi+1, and the corresponding matrix takes the form of a block-tridiagonal matrix. A block-tridiagonal matrix is a square matrix with a tridiagonal pattern, in which the scalar elements are replaced with square submatrices (blocks).


Figure 4.1. Schema of a 2D domain partitioned in subdomains Ωi.

A block-tridiagonal matrix

T = \begin{bmatrix}
B_1 & C_1 &        &            &            &            \\
A_2 & B_2 & C_2    &            &            &            \\
    & A_3 & B_3    & C_3        &            &            \\
    &     & \ddots & \ddots     & \ddots     &            \\
    &     &        & A_{\ell-1} & B_{\ell-1} & C_{\ell-1} \\
    &     &        &            & A_\ell     & B_\ell
\end{bmatrix},    (4.1)

of order n, has ℓ blocks of size k that can be dense or sparse. In this chapter we present the work done in order to extend SLEPc with parallel eigensolvers that specialize in this particular case, running on one or several GPUs. The solvers can also be used for dense banded matrices generated by applications like [1].

Algorithm 4.1. Arnoldi algorithm
 1  for i = 1, 2, ... do
 2      for j = 1, 2, ..., m do
 3          Expansion of the basis of the subspace
 4          Orthogonalization and normalization of the vectors
 5      end
 6      Solution of the projected eigenproblem of size m
 7      Restart
 8  end

In our parallel developments we assume that the blocks are dense and stored distributed by block-rows among the existing processes, but without splitting any of them across multiple processes. As already seen in Chapter 3, the Arnoldi algorithm with the Krylov–Schur restart can be summarized as shown in Algorithm 4.1. We assume that the size of the projected problem, m, is very small compared to the dimension of the matrices, n. Of the four main computational tasks of the algorithm, the dominant computational cost is the one associated with step 3. Steps 4 and 7 are currently done on the GPU as explained in Chapter 3 (Section 3.3.2), and the cost of step 6 is negligible compared to the rest, so it is not worth performing it on the GPU. We next discuss in detail the operations related to the basis expansion, which vary depending on the requested eigenvalues. We present different strategies to parallelize the involved operations on the GPU and perform a study to evaluate their performance, comparing the results with a CPU implementation. Finally, we present a use case to compare the performance of our developments with a third party library.


4.1 Matrix-vector product

The simplest eigenvalue computation is the one aiming to obtain eigenvalues from the exterior of the spectrum. The basis expansion in this case is done by performing a matrix-vector multiplication. Given that our matrix is block-tridiagonal, the operations performed by the matrix-vector multiplication can be reduced to the ones involving the nonzero blocks. The matrix-vector product

y = T v,    (4.2)

can be computed by blocks as

y_i = \begin{bmatrix} A_i & B_i & C_i \end{bmatrix} \begin{bmatrix} v_{i-1} \\ v_i \\ v_{i+1} \end{bmatrix}, \qquad i = 2, \ldots, \ell-1,    (4.3)

with analogous expressions for the first and last block-row.

4.2 Shift and invert: Linear systems

If the desired eigenvalues are from the interior of the spectrum, a spectral transformation needs to be applied to the initial problem. One of the spectral transformations supported in SLEPc is the shift-and-invert transformation of (3.23), which in the case of the standard eigenvalue problem reads

(T - \sigma I)^{-1} z = \theta z.    (4.4)

In a practical implementation the inverse of (T - \sigma I) is never computed explicitly; instead, the basis expansion is performed by solving a linear system of equations

T_\sigma x = b,    (4.5)

where T_\sigma = (T - \sigma I). Solving this linear system of equations could be approached with (Sca)LAPACK's general band factorization subroutine (gbsv), but this is not available on the GPU. Hence, within this section, we center our attention on algorithms that operate specifically on the block-tridiagonal structure and are feasible to implement with CUDA. The (scalar) tridiagonal case was analyzed in [146], where the authors compare GPU implementations of several algorithms.
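For reference, requesting interior eigenvalues with shift-and-invert in SLEPc looks roughly as in the sketch below; the linear solves of (4.5) are then delegated to the ST object, which is where the block-tridiagonal solvers discussed in this section plug in. This is a generic sketch of the library interface, not the specialized solver itself, and T is assumed to be the assembled matrix of the standard problem.

/* Sketch: ask SLEPc for eigenvalues closest to sigma via shift-and-invert. */
PetscErrorCode solve_interior(Mat T, PetscScalar sigma)
{
  EPS            eps;
  ST             st;
  PetscErrorCode ierr;

  ierr = EPSCreate(PETSC_COMM_WORLD, &eps);CHKERRQ(ierr);
  ierr = EPSSetOperators(eps, T, NULL);CHKERRQ(ierr);
  ierr = EPSSetTarget(eps, sigma);CHKERRQ(ierr);
  ierr = EPSSetWhichEigenpairs(eps, EPS_TARGET_MAGNITUDE);CHKERRQ(ierr);
  ierr = EPSGetST(eps, &st);CHKERRQ(ierr);
  ierr = STSetType(st, STSINVERT);CHKERRQ(ierr);   /* expansion now needs solves with T - sigma*I */
  ierr = EPSSolve(eps);CHKERRQ(ierr);
  ierr = EPSDestroy(&eps);CHKERRQ(ierr);
  return 0;
}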

4.2.1 Thomas algorithm

Gaussian elimination on a tridiagonal system

\begin{bmatrix}
d_1 & u_1 &         &         &         \\
l_2 & d_2 & u_2     &         &         \\
    & \ddots & \ddots & \ddots &        \\
    &     & l_{n-1} & d_{n-1} & u_{n-1} \\
    &     &         & l_n     & d_n
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_{n-1} \\ x_n \end{bmatrix}
=
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_{n-1} \\ b_n \end{bmatrix}    (4.6)

can be specialized to use O(n) operations instead of O(n^3), and is sometimes referred to as the Thomas algorithm. In the forward elimination phase, the algorithm computes intermediate coefficients for the first row with e_1 = u_1/d_1 and f_1 = b_1/d_1. In the same way, intermediate coefficients for the next rows are computed with

e_i = \frac{u_i}{d_i - l_i e_{i-1}},    (4.7)

for i = 2, \ldots, n-1, and

f_i = \frac{b_i - l_i f_{i-1}}{d_i - l_i e_{i-1}},    (4.8)

for i = 2, \ldots, n. The backward substitution obtains x with x_n = f_n, and for i = n-1, \ldots, 1, it computes

x_i = f_i - e_i x_{i+1}.    (4.9)

The numerical stability of this algorithm cannot be guaranteed in general. Its applicability is limited to diagonally dominant or symmetric positive definite matrices, with which the method is stable. Otherwise, partial pivoting is necessary to make it stable. This same algorithm can be applied to a block-tridiagonal system, which in the forward elimination phase computes

C_1 = B_1^{-1} C_1,    (4.10a)
b_1 = B_1^{-1} b_1,    (4.10b)

and for i = 2, \ldots, \ell proceeds with

B_i = B_i - A_i C_{i-1},    (4.11a)
C_i = B_i^{-1} C_i,    (4.11b)
b_i = B_i^{-1} (b_i - A_i b_{i-1});    (4.11c)

the backward substitution starts with x_\ell = b_\ell, and runs

x_i = b_i - C_i x_{i+1},    (4.12)

for i = \ell-1, \ldots, 1. Steps (4.11a)–(4.11b) perform a block LU factorization that needs to be computed only once. Note that the factorization is destructive, but we assume that the original matrix T is no longer needed. The factorization can be accomplished with a few calls to BLAS' gemm and LAPACK's getrf/getrs. With getrf we compute the LU factorization with partial pivoting of the diagonal block B_i. We remark that, since the pivoting is limited to the diagonal block, this algorithm is numerically less robust than a full LU factorization. Subsequent right-hand sides of the Arnoldi iterations only require steps (4.11c)–(4.12). We mention the Thomas algorithm and its block-oriented version just for reference here, because it is an inherently serial algorithm, with little opportunity for parallelism except for the computations within the blocks.
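As an illustration of how the factorization steps map onto dense kernels, the following sketch (our own, not the library code) performs (4.10a) and (4.11a)–(4.11b) with LAPACKE and CBLAS, assuming column-major k×k blocks A[i], B[i], C[i] stored contiguously and pivot arrays ipiv[i] of length k.

#include <lapacke.h>
#include <cblas.h>

/* Block LU factorization of the block-tridiagonal matrix (factorization phase only). */
int block_thomas_factor(int l, int k, double **A, double **B, double **C, int **ipiv)
{
  int i;

  /* First block-row: factor B_1 and compute C_1 <- B_1^{-1} C_1 */
  if (LAPACKE_dgetrf(LAPACK_COL_MAJOR, k, k, B[0], k, ipiv[0])) return -1;
  LAPACKE_dgetrs(LAPACK_COL_MAJOR, 'N', k, k, B[0], k, ipiv[0], C[0], k);

  for (i = 1; i < l; i++) {
    /* B_i <- B_i - A_i * C_{i-1}   (4.11a) */
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, k, k, k,
                -1.0, A[i], k, C[i-1], k, 1.0, B[i], k);
    /* Factor the updated diagonal block */
    if (LAPACKE_dgetrf(LAPACK_COL_MAJOR, k, k, B[i], k, ipiv[i])) return -1;
    /* C_i <- B_i^{-1} C_i          (4.11b); there is no C block in the last block-row */
    if (i < l-1)
      LAPACKE_dgetrs(LAPACK_COL_MAJOR, 'N', k, k, B[i], k, ipiv[i], C[i], k);
  }
  return 0;
}

The solve phase, applying (4.11c) and (4.12) to each new right-hand side, would reuse the stored factors and pivots in the same way.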


4.2.2 Block cyclic reduction

As can be found in [50, ch. 5], besides the classical Gaussian elimination there are several well-known algorithms to solve tridiagonal linear systems, such as recursive doubling, cyclic reduction or parallel cyclic reduction. All these methods try to increase the number of concurrent tasks and therefore reduce the length of the critical path, although the cost in flops is increased. They can be extended to the block-tridiagonal case too.

One of the solvers that we have implemented is based on cyclic reduction [24] (also known as odd/even reduction or CORF). Cyclic reduction is a recursive algorithm to solve tridiagonal linear systems that has two main steps: a forward elimination reduces the number of rows it works with, and a backward substitution obtains the solution while undoing the recursion. It divides the rows into even-indexed and odd-indexed, and in the forward elimination it recursively eliminates the even-indexed rows in terms of the odd-indexed ones. Depending on the dimension of the matrix, the number of rows is approximately halved in each recursion (if the number of rows is a power of two, it is actually halved) until a single row is left. Assuming that no division by zero is encountered in any of these steps, with the last row an equation with a single unknown is trivially solved. The backward substitution step progresses increasing the number of rows used in the same proportion as the forward elimination reduces them. It uses the current recursion level solution(s) to compute the adjacent even-indexed rows on the previous level until all the unknowns are solved.

As noted in [51, 84], the cyclic reduction method has the property of being equivalent to Gaussian elimination without pivoting on the system (P T_\sigma P^T)(P x) = P b, where P is a permutation matrix that places first the indices that are odd multiples of 2^0, then odd multiples of 2^1, and so on. Such a link with Gaussian elimination supports the conclusion that cyclic reduction is numerically stable in the same cases where Gaussian elimination with diagonal pivots is. If T_\sigma is strictly diagonally dominant or symmetric positive definite, then no pivoting is necessary and cyclic reduction is stable.

A version of the algorithm that works with general block-tridiagonal matrices was proposed in [57] and more recently reworked in [66] as BCYCLIC, where block-rows are reduced cyclically instead of rows. In this case, the algorithm can progress as long as the diagonal blocks are non-singular, since it is necessary to compute their inverse, or handle them implicitly via factorization. Pivoting can be used when factorizing the diagonal blocks, but this does not guarantee the overall stability of the algorithm. Some aspects of the numerical stability of the block cyclic reduction were analyzed in [57] and [143], where a study of the bounds of the forward error is conducted for the cases of (strictly) diagonally dominant matrices, assuming that the matrix is block-column diagonally dominant.

During the forward elimination stage, the block cyclic reduction computes the inverse of the even-indexed diagonal blocks and a modified (hatted) version of the lower and upper blocks. In the same way, a modified version of the even-indexed blocks of the right-hand side (RHS) vector b is computed. In the first recursion, the

75 Chapter 4. Block-tridiagonal eigenvalue problems computed quantities are

\hat{B}_{2i} = B_{2i}^{-1},    (4.13a)
\hat{A}_{2i} = \hat{B}_{2i} A_{2i},    (4.13b)
\hat{C}_{2i} = \hat{B}_{2i} C_{2i},    (4.13c)
\hat{b}_{2i} = \hat{B}_{2i} b_{2i},    (4.13d)

for i = 1, \ldots, \ell/2. The respective modified version of the odd-indexed blocks is then computed by using the adjacent modified even-indexed blocks,

\hat{B}_{2i-1} = B_{2i-1} - A_{2i-1} \hat{C}_{2i-2} - C_{2i-1} \hat{A}_{2i},    (4.14a)
\hat{A}_{2i-1} = -A_{2i-1} \hat{A}_{2i-2},    (4.14b)
\hat{C}_{2i-1} = -C_{2i-1} \hat{C}_{2i},    (4.14c)
\hat{b}_{2i-1} = b_{2i-1} - A_{2i-1} \hat{b}_{2i-2} - C_{2i-1} \hat{b}_{2i}.    (4.14d)

In subsequent recursion levels, the computation is analogous to (4.13)–(4.14), but for a matrix that has about half of the blocks with respect to the previous level (even blocks have been removed). In an MPI implementation, if the matrix is distributed across several processes, prior to the computation of (4.14) it is necessary to ensure that \hat{A}_{2i}, \hat{C}_{2i} and \hat{b}_{2i} are accessible from the processes that own the adjacent odd-indexed block-rows, so communication may be necessary in this case. That occurs for every recursive step of the algorithm. The modified even-indexed blocks can overwrite the original ones, but for the odd-indexed blocks, in order to back-solve with successive right-hand sides, both the original lower and upper blocks and their modified versions must be retained, producing a 66% increase of memory usage during this stage. The backward substitution stage starts by solving the single block equation

x_1 = \hat{B}_1 \hat{b}_1,    (4.15)

where \hat{B}_1 and \hat{b}_1 correspond to quantities computed in the last recursion level. Once this final odd-indexed block, which is the first part of the solution, is obtained, the recursion tree is traversed in reverse order. The solution blocks from a certain recursion level are used to compute the adjacent even-indexed blocks on the previous level, with

x_{2i} = \hat{b}_{2i} - \hat{A}_{2i} x_{2i-1} - \hat{C}_{2i} x_{2i+1}.    (4.16)

Communication occurs in an analogous way as in the forward elimination, but in the opposite direction. The algorithm continues until all blocks are processed. The computational cost is concentrated in the first three operations of (4.13) and (4.14). These operations represent the factorization itself, and can be amortized in the case of successive right-hand sides. This is what happens in the Arnoldi method, which needs to invoke the linear solver in each iteration of the eigensolver.


4.2.3 Spike

Spike [106] is an algorithm intended for the parallel solution of banded linear systems, with a possibly sparse band, that we have particularized for the block-tridiagonal structure. At the outset of the algorithm, matrix T_\sigma is partitioned and distributed evenly among the p available processes, and, as we mentioned before, we assume that none of the ℓ block-rows is split across different processes. Each process r organizes its local data in the form

T_r = \begin{bmatrix} D_r &     & 0 \\ 0 & E_r & 0 \\ 0 &     & F_r \end{bmatrix}, \qquad r = 0, \ldots, p-1,    (4.17)

distinguishing between the diagonal portion E_r (that is, the column range corresponding to local rows), and the lower (D_r) and upper (F_r) blocks that represent the coupling with neighbouring processes. Note that the sizes of these three blocks differ, the diagonal block being larger than the lower and upper blocks. In our case, the size of D_r and F_r is k, the original block size of the block-tridiagonal matrix, whereas the diagonal block has an approximate size of n/p, with p the number of processes used.

The Spike algorithm has two main phases: factorization and post-processing. The factorization phase consists in computing the so-called Spike matrix, S, defined from the decomposition T_\sigma = ES, where E = diag(E_0, E_1, \ldots, E_{p-1}). This amounts to multiplying the local matrix T_r by the inverse of the diagonal block E_r to get the local matrix S_r. In order to do that, the first step is to factorize the diagonal blocks E_r, and compute matrices V_r and W_r by solving the system

E_r \begin{bmatrix} V_r & W_r \end{bmatrix} = \begin{bmatrix} 0 & D_r \\ F_r & 0 \end{bmatrix}    (4.18)

on each process. We must emphasize that (4.18) is a system of linear equations with 2k right-hand sides, where the coefficient matrix is block-tridiagonal, and hence a solver that exploits this structure can be used. Furthermore, this computation can be performed independently by each process, without communication. These V_r and W_r matrices are the spikes of a matrix S whose main block-diagonal is the identity matrix,

S = \begin{bmatrix}
I & \hat{V}_0 &   &        &               &   \\
\hat{W}_1 & I & \hat{V}_1 &        &               &   \\
  & \hat{W}_2 & I & \hat{V}_2 &               &   \\
  &   & \ddots & \ddots & \ddots        &   \\
  &   &   & \hat{W}_{p-2} & I & \hat{V}_{p-2} \\
  &   &   &   & \hat{W}_{p-1} & I
\end{bmatrix},    (4.19)

where we use the notation \hat{V}_r = \begin{bmatrix} V_r & 0 \end{bmatrix} and \hat{W}_r = \begin{bmatrix} 0 & W_r \end{bmatrix}.

For the post-processing phase, each process divides the local portion of the matrix S in top, middle and bottom parts, the top and bottom parts being of size k, and


the middle part of the same size as E_r minus 2k. For instance, the upper spike is written as

V_r = \begin{bmatrix} V_r^{(t)} \\ V_r^{(m)} \\ V_r^{(b)} \end{bmatrix}.

Selecting only the top and bottom blocks of the spikes, a reduced matrix

\hat{S} = \begin{bmatrix}
I & V_0^{(t)} &           &   &               &   \\
I & V_0^{(b)} &           &   &               &   \\
W_1^{(t)} & I & V_1^{(t)} &   &               &   \\
W_1^{(b)} & I & V_1^{(b)} &   &               &   \\
  &   & \ddots &   &               &   \\
  &   & W_{p-2}^{(t)} & I & V_{p-2}^{(t)} &   \\
  &   & W_{p-2}^{(b)} & I & V_{p-2}^{(b)} &   \\
  &   &   &   & W_{p-1}^{(t)} & I \\
  &   &   &   & W_{p-1}^{(b)} & I
\end{bmatrix}    (4.20)

is built, also with an identity matrix on its diagonal.

The first stage in the post-processing phase is the factorization of the reduced matrix \hat{S}. For the algorithm to work in parallel, the size of the original matrix has to be large enough to distribute at least two block-rows per process, so the number of processes has to be limited to allow this ratio when factorizing small matrices. Once \hat{S} has been factored, the next stage is to locally compute the vector g = E^{-1} b by solving the system

E_r g_r = b_r,    (4.21)

in a similar way as (4.18). Note that here b_r denotes the whole subvector of b stored locally in process r, not an individual block. Vector g_r is in turn partitioned in the same top, middle and bottom parts as the spikes. From the top and bottom parts of each process, a reduced vector \hat{g} is formed and used as RHS to solve the reduced system

\hat{S} \hat{x} = \hat{g},    (4.22)

where \hat{x} refers to the top and bottom blocks of x on each process. Finally, to fully solve the whole system, the middle parts of x can be obtained in parallel with

x_0^{(m)} = g_0^{(m)} - V_0^{(m)} \hat{x}_1^{(t)},
x_j^{(m)} = g_j^{(m)} - V_j^{(m)} \hat{x}_{j+1}^{(t)} - W_j^{(m)} \hat{x}_{j-1}^{(b)}, \qquad j = 1, \ldots, p-2,    (4.23)
x_{p-1}^{(m)} = g_{p-1}^{(m)} - W_{p-1}^{(m)} \hat{x}_{p-2}^{(b)}.

For the case of diagonally dominant matrices, as noted in [106], it is worth considering that the value of the elements of the right spikes (V_r) decays in magnitude from bottom to top. The more diagonally dominant the matrix is, the greater the decay. As a consequence, the value of the elements of the top block of V_r can be expected to be zero, and that block can be discarded from the computation. The same assumption can be made for the bottom block of the left spikes W_r. Discarding these zero blocks, the extra reduced matrix

\bar{S} = \begin{bmatrix}
I & V_0^{(b)} &           &   &        &   &   \\
W_1^{(t)} & I &           &   &        &   &   \\
  &   & I & V_1^{(b)} &        &   &   \\
  &   & W_2^{(t)} & I &        &   &   \\
  &   &   &   & \ddots &   &   \\
  &   &   &   &        & I & V_{p-2}^{(b)} \\
  &   &   &   &        & W_{p-1}^{(t)} & I
\end{bmatrix}    (4.24)

can be built and factored as a block-diagonal matrix, instead of as a block-tridiagonal one. Data movement between processes is necessary to form the matrix \bar{S}, and all but one of the processes can work in parallel during the factorization without communication.

Polizzi and Sameh presented in [106] an optimization for the diagonally dominant case referred to as Truncated Spike, which computes only the bottom block of V_r and the top block of W_r. It requires performing one LU factorization and one UL factorization of the diagonal block E_r. Their experiments proved that this approach reduces the computing time on machines with arithmetic operations much faster than memory accesses. As the spikes are not fully computed, the post-processing stage differs from the general Spike algorithm. A study of the Truncated Spike [95] found that if the degree of diagonal dominance

d = \left( \max_i \frac{\sum_{j \neq i} |a_{ij}|}{|a_{ii}|} \right)^{-1}    (4.25)

of the original matrix is not too close to 1, and if the partitions are sufficiently large, then the errors at every stage of the algorithm are small. A more recent work [94] discusses several strategies to approach the diagonally and non-diagonally dominant cases.

Inexact shift-and-invert

For the non-diagonally dominant case, the Truncated Spike can be used as a preconditioner of an iterative method to solve the linear system. In this case, the decay of the values in the spikes is considerably less prominent, and the Truncated Spike discards blocks with nonzero elements. The larger the block size and/or the number of processes are, the more valid data is discarded in this step, and the worse the approximated solution is. In such a case, a large number of outer iterations is needed, dramatically increasing the communication and the computation, making

this algorithm not worthwhile for this kind of systems. Because of this, we have not considered this variant in the numerical experiments.

4.3 Block cyclic tridiagonal structures

Returning to our initial example of Figure 4.1, if periodic boundary conditions are imposed, then the domains Ω1 and Ωℓ are coupled, and the resulting coefficient matrix is not purely block-tridiagonal, as in (4.1), but has additional nonzero blocks on the top-right and bottom-left corners,

M = \begin{bmatrix}
B_1    & C_1 &        &            &            & A_1        \\
A_2    & B_2 & C_2    &            &            &            \\
       & A_3 & B_3    & C_3        &            &            \\
       &     & \ddots & \ddots     & \ddots     &            \\
       &     &        & A_{\ell-1} & B_{\ell-1} & C_{\ell-1} \\
C_\ell &     &        &            & A_\ell     & B_\ell
\end{bmatrix},    (4.26)

and hence algorithms like the block cyclic reduction or Spike are not directly applicable.

4.3.1 Schur complement

The Schur complement [114, ch. 14] is a classical method that can be applied to the resolution of linear systems. Let M be a square matrix of dimension n partitioned as

M = \begin{bmatrix} A & B \\ C & D \end{bmatrix},    (4.27)

on which we perform a block Gaussian elimination. The Schur complement of M relative to D is

S = M/D = A - B D^{-1} C.    (4.28)

Assuming that D is non-singular, it is possible to solve the system of linear equations

\begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} e \\ f \end{bmatrix}    (4.29)

by solving

S x = e - B D^{-1} f,    (4.30)

and once this system is solved, the rest of the system can be solved with

Dy = f − Cx. (4.31)

In our case M is a block cyclic tridiagonal matrix as in (4.26), with ℓ block-rows, with blocks of size k (n = ℓ·k), and we take A, B, C and D to be of dimensions



k × k, k × q, q × k and q × q, respectively (with q = n − k), as illustrated in Figure 4.2. With this partitioning, (4.30) is a system of small size, and D is a block-tridiagonal matrix that can be factored in parallel with the block cyclic reduction or Spike algorithms.

Figure 4.2. Schema of the partitioning of a block cyclic tridiagonal matrix into four smaller matrices to solve a system of linear equations by using the Schur complement.

4.4 Parallel implementations

We now comment on the details of the implementations of the block oriented methods described to expand the Krylov subspace, and how they make use of GPUs. Given the block-tridiagonal matrix T of (4.1), we store it in memory as

rep(T) = \begin{bmatrix}
\square & B_1    & C_1     \\
A_2     & B_2    & C_2     \\
A_3     & B_3    & C_3     \\
\vdots  & \vdots & \vdots  \\
A_\ell  & B_\ell & \square
\end{bmatrix},    (4.32)

where the \square symbols indicate blocks with memory allocated but not being used, and all the blocks are stored contiguously as dense blocks. In the MPI implementation, complete block-rows are distributed across processes, and hence only the first and last processes have an unused block. Within our software, we have used different mathematical libraries that provide us with optimized parallel implementations of BLAS and LAPACK routines, which we use to operate with the blocks of the matrices. Intel's MKL has been used for the CPU version of the software, providing thread parallelism. For the GPU version we have implementations that use the cuBLAS and/or MAGMA libraries.

4.4.1 Matrix-vector product

The storage in memory mentioned in (4.32) allows us to use a single gemv BLAS call per block-row to perform the matrix-vector product on CPU and GPU.



Preceding the multiplication, in an MPI implementation, each process needs to receive from its neighbours the parts of the RHS vector, of the size of a matrix block, necessary to perform the computation. We remark that when allocating memory for rep(T) on the GPU, this compact storage shape allows us to use 2D memory (pitched memory), to guarantee the alignment of the columns, which is important for coalesced memory accesses. Matrices in (CUDA) C are stored in row-major order, and the cudaMallocPitch function arguments follow this memory layout, providing alignment for the rows (the requested width), but we work with matrices organized in column-major order. Figure 4.3 shows how we store the matrix by blocks, allocating dense 2D blocks in memory and using them with a column-major layout. Apart from the (cu)BLAS version, we have also implemented a customized CUDA kernel that performs the whole matrix-vector computation with a single kernel invocation.

Figure 4.3. Schema of the memory allocated in the GPU for storing a matrix block of dimensions k × k (left), and a block tridiagonal matrix stored in 2D memory (right). The blocks are represented transposed to illustrate that we store the matrices using a column-major order.
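To illustrate the idea of pitched allocation for a single k×k block in column-major order, a possible sketch is shown below; it reflects the layout trick of Figure 4.3 (each matrix column becomes one row of the 2D allocation), not the exact allocation code of the full rep(T) storage, and h_block and k are assumed to be provided by the caller.

/* Sketch: upload one k-by-k column-major block into pitched GPU memory.
   Each matrix column (k contiguous doubles in h_block) becomes one "row" of the
   2D allocation, so every column starts at an aligned address on the device. */
double *upload_block(const double *h_block, int k, size_t *pitch)
{
  double *d_block = NULL;
  cudaMallocPitch((void**)&d_block, pitch, k*sizeof(double), k);
  cudaMemcpy2D(d_block, *pitch,               /* destination and its pitch (bytes)   */
               h_block, k*sizeof(double),     /* densely packed host columns         */
               k*sizeof(double), k,           /* width in bytes, number of columns   */
               cudaMemcpyHostToDevice);
  return d_block;
}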

The kernel is invoked from the wrapper function shown in Figure 4.4. This host function selects the appropriate GPU card to be used based on the process rank in the MPI communicator, prepares the kernel execution by configuring the sizes of the CUDA block and grid in a one-dimensional shape, in which it distributes the rows of the matrix, and verifies the correct execution. When establishing the dimensions, it starts by setting a fixed CUDA block size of 256, and (if necessary) halves it until it is less than or equal to the matrix block size. The desired effect of this action is to increase the number of blocks that are distributed across the streaming multiprocessors, increasing the card's occupancy. Starting with a block size of 256 ensures that the resulting number of threads in the block does not exceed the hardware maximum, and that it is divisible by the warp size1. With this setting, the number of registers and the amount of shared memory used in the kernel is within the hardware limits.

1We do not expect to work with matrix block sizes smaller than the warp size.


#define CUDA_THREADS 256
#define COMPACT_BLOCK_COLUMNS 3

__host__ PetscErrorCode blas_dgbmvn(PetscInt m, PetscInt n, const PetscScalar *A,
                                    PetscInt lda, const PetscScalar *x, PetscScalar *y)
{
  PetscInt    size, rank, first, last, cuda_threads, nblk;
  dim3        grid, threads;
  cudaError_t cerr;

  set_cuda_device();
  MPI_Comm_size(PETSC_COMM_WORLD, &size);
  MPI_Comm_rank(PETSC_COMM_WORLD, &rank);

  first = last = 0;
  if (rank == 0) first = 1;
  if (rank == size-1) last = 1;

  nblk = n / COMPACT_BLOCK_COLUMNS;
  cuda_threads = CUDA_THREADS;
  /* reduce the number of threads with small block size matrices */
  /* helps to distribute the work between more processors */
  while (cuda_threads > nblk) cuda_threads /= 2;
  threads.x = cuda_threads;

  /* cuda blocks per matrix block */
  grid.x = (PetscInt) PetscCeilReal((PetscReal) nblk / cuda_threads);
  grid.x *= (m / nblk);

  if (threads.x) {
    dgbmvn_kernel<<<grid,threads>>>(m, n, A, lda, x, y, first, last);
    cerr = cudaGetLastError(); CHKERRCUDA(cerr);
  }
  return 0;
}

Figure 4.4. Wrapper function for the matrix-vector kernel.

Since a single process uses a single GPU, in order to use all the available cards per node, the number of processes should be at least one per card. The selection of the card by the processes is done taking into account the number of available cards and the rank of the process, in the same way as it is done in Chapter 3 (Section 3.4.3), so that each process uses a different CUDA card.

As the compact storage uses memory aligned by blocks, the matrix-vector product is computed by block-rows. The kernel, shown in Figure 4.5, receives the pointers to the data, the dimensions of the compacted storage, the leading dimension of the matrix, and two flags that indicate if the process has the first and/or last rank in the MPI communicator.

At the beginning of the function, each thread computes several indexes that point out which data to operate on. All the threads in a CUDA block work with data on the same matrix block-row. The first thing computed is the number of CUDA blocks necessary to operate within a matrix block-row. With this number and the CUDA block index, it is possible to know the index of the assigned local matrix block-row. The local prefix refers to the local portion of the matrix stored by the process. After that, the index of the row on the current matrix block-row, the local matrix row index, and the vector index are computed.

In order to benefit from coalesced memory operations, each thread computes all the elements in the matrix row that it has been assigned. This way, the load operation from global memory of the matrix elements is done jointly by all the threads in the same CUDA block. The advance of the computation through the columns is done in parallel by all the threads of the same CUDA block, in a loop, in batches of the size of the CUDA block, and without going beyond the last column. Within this loop, which has the CUDA block size as step, each thread of the block loads one element of the vector into shared memory in a single step. This way, a consecutive part of the vector, of size equal to the number of threads, is stored in shared memory. In another inner loop, each thread multiplies the consecutive elements of the matrix row times the corresponding elements of the vector stored in shared memory.

By default, in the main loop, all threads compute all columns of the storage between the first one and the largest multiple of the CUDA block dimension not greater than the last column. In the case of the threads computing the first matrix block, the column in which they begin is shifted to the right by one matrix block. And in the case of the threads computing the last matrix block, the last column computed is the largest multiple of the CUDA block dimension not greater than two matrix blocks. See again the storage of the matrix in Figure 4.3.

The main loop can leave columns of the matrix without being computed. As these remaining operations require additional checks that create divergent branches in the code flow, and slow down the computation, they are done separately. The process of storing the vector in shared memory and performing the multiplication is repeated with the remaining columns outside the loop. Finally, the result of each row is stored in the output vector.

4.4.2 Direct linear solvers

The algorithms employed to solve linear systems of equations when computing interior eigenvalues are implemented by means of BLAS and LAPACK routines. To adapt the algorithms in this section to solve multiple RHS on subsequent iterations of Arnoldi, practical implementations split them into two subroutines, factorization and solve. In all cases, the factorization of the matrix excludes the operations with the RHS vector(s), and those excluded steps are done in the solve subroutine. This allows us to invoke the factorization only once, and use the solve in each iteration of Arnoldi.


__global__ void dgbmvn_kernel(PetscInt m, PetscInt n, const PetscScalar *A, PetscInt lda,
                              const PetscScalar *x, PetscScalar *y, PetscInt first, PetscInt last)
{
  PetscInt    i, j, jbegin, jend, idx, yidx, nblk, block0, pitch;
  PetscInt    nmax, cblockspb, row;
  PetscScalar res;
  __shared__ PetscScalar shx[CUDA_THREADS];

  nblk  = n / COMPACT_BLOCK_COLUMNS;
  pitch = (lda / (m / nblk));
  /* cuda blocks per matrix block */
  cblockspb = (PetscInt) PetscCeilReal((PetscReal) nblk / blockDim.x);
  block0 = blockIdx.x / cblockspb;                           /* local matrix block index */
  /* row on current matrix block */
  row  = (((blockIdx.x % cblockspb) * blockDim.x) + threadIdx.x);
  idx  = block0 * pitch + row;                               /* local matrix row index */
  yidx = block0 * nblk + row;                                /* local vector index */
  res  = 0.0;
  nmax = n;
  jbegin = 0;
  jend = ((n / blockDim.x) * blockDim.x);
  if (first && (block0 == 0)) {
    jbegin = nblk;
    jend = (((nblk*2) / blockDim.x) * blockDim.x);
  }
  if (last && (block0 == ((lda / pitch) - 1))) {
    nmax = nblk*2;
    jend = (((nmax) / blockDim.x) * blockDim.x);
  }
  A += idx + lda * jbegin;
  x += jbegin + (block0 * nblk) + threadIdx.x;
  for (i = 0; i < jend; i += blockDim.x) {
    shx[threadIdx.x] = x[i];
    __syncthreads();
    for (j = 0; j < blockDim.x; j++) {
      res += A[0] * shx[j];
      A += lda;
    }
    __syncthreads();
  }
  if (nmax > jend + jbegin) {  /* remaining columns */
    if (threadIdx.x + jbegin + jend < nmax) shx[threadIdx.x] = x[jend];
    __syncthreads();
    if (row < nblk) {
      for (j = 0; j < (nmax - (jend + jbegin)); j++) {
        res += A[0] * shx[j];
        A += lda;
      }
    }
  }
  /* do not write beyond the matrix block size */
  if (row < nblk) y[yidx] = res;
}

Figure 4.5. Matrix-vector kernel.

85 Chapter 4. Block-tridiagonal eigenvalue problems

Bcyclic

As a classical algorithm, the cyclic reduction has already been widely implemented for different programming paradigms and computer architectures. Studies of its performance on GPU against other solvers for tridiagonal matrices were carried out by Zhang et al. [146]. MPI-based implementations were studied for the block- tridiagonal case in [66] and [118]. The authors of the former combined the use of MPI parallelism over the block-rows with a threaded parallelism with OpenMP or GotoBLAS to perform the local operations. The effect of the block size in the per- formance was studied in [118]. A heterogeneous approach was implemented by Park and Perumalla [104], who use MPI, and the block arithmetic is done simultaneously on GPU with cuBLAS or MAGMA [132], and on multicore processors with ACML. Another single-GPU implementation for the block-tridiagonal case was done in [10], whose authors tested a variety of block and matrix sizes, showing that a better per- formance is obtained with systems with relatively large block sizes by better utilizing the available GPU threads. Recently, a comparison of the classical solvers, includ- ing Thomas algorithm, was addressed in [86], implementing them on CPU, many integrated cores (MIC) architecture, and GPU accelerators for the case of using a single node. Algorithms 4.2 and 4.3 summarize the operations done on each of the two sub- routines of the BCYCLIC algorithm with an iterative implementation. We remark that some (parts) of the operations of the algorithms only take place if the involved blocks exist on the process at the current level. In Algorithm 4.2, those operations correspond to steps 12, 14, 16, 17 and 18, and in Algorithm 4.3 they correspond to steps 12, 14, 18, 26 and 28. Step 2 of Algorithms 4.2 and 4.3, which is detailed in Algorithm 4.4, obtains the lower and upper limit of the range of block-rows owned by the calling process. Al- gorithm 4.2 goes exclusively through the dlog2(`)e levels of the forward elimination, while Algorithm 4.3 performs the forward elimination and the backward substitu- tion. In Algorithm 4.2, there are two alternatives available to deal with the inverse of the diagonal blocks in step 10. One is to explicitly compute the inverse by means of LAPACK routines getrf and getri. It could seem inappropriate to do this due to the high cost of computing the inverse, but once it is computed, the rest of the steps of the algorithm can be done with optimized matrix-matrix and matrix-vector multiplications. The other alternative is to solve linear systems by using LAPACK routines getrf and getrs, that a priori seems the more reasonable way to go due to the cheaper cost of the operations, but as we will see in Section 4.5 the performance obtained with this alternative is seldom the best. In steps 18 and 28 of Algorithm 4.3, parts of the solution vector are exchanged between processes. One block of the solution vector can be adjacent to multiple blocks at different levels of the backward substitution, and those blocks can belong to the same or different processes. To avoid repeating the same communication twice between the same pair of processes, the implementations of Algorithm 4.3 can store and take note of the parts of the solution vector already sent/received.

86 4.4. Parallel implementations

Algorithm 4.2. BCYCLIC factorization 1 For all processes (with rank r) do in parallel 2 [low, high] = GetOwnershipRange(p,r,`) 3 for j = 1 : dlog2(`)e do /* For each iteration level */ j−1 4 begin = 2 + 1 5 if begin ≤ high then j 6 while begin < low do begin = begin + 2 7 begin = begin − low 8 end j 9 for i = begin : 2 : `r do /* For each even block-row */ ˆ −1 10 B2i = B2i 11 Aˆ2i = Bˆ2iA2i

12 Cˆ2i = Bˆ2iC2i 13 end 14 Send/receive Aˆ and Cˆ to/from adjacent block-rows j 15 for i = 1 : 2 : `r do /* For each odd block-row */ 16 Aˆ2i−1 = −A2i−1Aˆ2i−2

17 Bˆ2i−1 = B2i−1 − A2i−1Cˆ2i−2 − C2i−1Aˆ2i

18 Cˆ2i−1 = −C2i−1Cˆ2i 19 end 20 end ˆ −1 21 if r == 0 then B1 = B1 22 end

87 Chapter 4. Block-tridiagonal eigenvalue problems

Algorithm 4.3. BCYCLIC solve 1 For all processes (with rank r) do in parallel 2 [low, high] = GetOwnershipRange(r,`,p) 3 for j = 1 : dlog2(`)e do /* For each iteration level */ j−1 4 begin = 2 + 1 5 if begin ≤ high then j 6 while begin < low do begin = begin + 2 7 begin = begin − low 8 end j 9 for i = begin : 2 : `r do /* For each even block-row */ ˆ 10 b2i = Bˆ2ib2i 11 end 12 Send/receive ˆb to/from adjacent block-rows j 13 for i = 1 : 2 : `r do /* For each odd block-row */ ˆ ˆ ˆ 14 b2i−1 = b2i−1 − A2i−1b2i−2 − C2i−1b2i 15 end 16 end ˆ 17 if r == 0 then x1 = b1 = Bˆ1b1 18 Send/receive x1 to/from adjacent block-rows 19 for j = dlog2(`)e : −1 : 1 do /* For each iteration level */ j−1 20 begin = 2 + 1; 21 if begin ≤ high then j 22 while begin < low do begin = begin + 2 23 begin = begin − low 24 end j 25 for j = begin : 2 : `r do /* For each even block-row */ ˆ 26 x2i = b2i − Aˆ2ix2i−1 − Cˆ2ix2i+1 27 end 28 Send/receive x to/from adjacent block-rows 29 end 30 end

88 4.4. Parallel implementations

Algorithm 4.4. GetOwnershipRange Input: number of processes: p, process identifier: r, number of block-rows: ` Output: global index of the first block-row owned by the process: low, global index of the last block-row owned by the process: high 1 if r < (` mod p) then 2 low = r(b`/pc + 2) 3 high = low + b`/pc + 1 4 else 5 low = rb`/pc + (` mod p) + 1 6 high = low + b`/pc 7 end

Table 4.1. BCYCLIC implementations with each of the mathematical libraries.

Non-batched Batched Math library getri getrs getri 1 getri 2 getrs CPU MKL   --- cuBLAS - -    GPU MAGMA     

Table 4.1 summarizes the BCYCLIC implementations we have put into practice. In them we have used the batched operations provided by the GPU libraries used, as they allow us to invoke a single function call to factorize, solve, invert or multiply all the even-indexed diagonal blocks owned by a process, per recursive step. In the case of the gemm function, the batched variant is expected to perform well with small matrices only. So we have created two variants of the GPU version that computes the inverse of the diagonal blocks, one with normal gemm in a loop (denoted as getri 1) and another one with batched gemm (denoted as getri 2). We also made CPU-equivalent implementations, that run on GPU without batched operations, by making use of the MAGMA library.

Spike The Spike algorithm is available in the Intel Spike library2 with an MPI-based imple- mentation. Another implementation, specific for tridiagonal systems, was developed in [27] for the (multi-)GPU case and later included in the cuSPARSE library. A modification of Spike for tridiagonal systems, that makes use of QR factorization without pivoting via Givens rotations, presented in [136] as g-Spike, safeguards the algorithm in case that the partitioning of the matrix results in at least one of the Er diagonal blocks being singular. This work was later adapted to the Intel Xeon

2https://software.intel.com/en-us/articles/intel-adaptive-spike-based-solver/

89 Chapter 4. Block-tridiagonal eigenvalue problems

Phi platform in [137]. An implementation of the Truncated Spike was used as a preconditioner to solve tridiagonal linear systems on GPU in [119] through the use of the CUSP library. In our case, we focus on developing a fast implementation of the algorithm by making use of several GPUs to perform the block arithmetic managed by a multi- process MPI solution in a similar way as [104]. When using the Spike algorithm to solve a block-tridiagonal system, the band has to be large enough to cover all the entries of the original matrix and additional triangular zero borders are included in the computation. We avoid the extra work that this would entail by factorizing the main diagonal block exploiting its block-tridiagonal structure. To do that, our implementations of Spike make use of the BCYCLIC implementations to solve the linear systems involved. The factorization subroutine of Spike is represented in Algorithm 4.5, in which we can remark that only step 5 requires communication.

Algorithm 4.5. Spike factorization 1 For all processes (with rank r) do in parallel 2 E˜r = bcyclic factorize(Er) 3 [Vr,Wr] = bc4spike solve(E˜r,Fr,Dr)

4 Build Sˆ with top and bottom parts of Vr and Wr 5 S˜ = bcyclic factorize(Sˆ) 6 end

Step 2 factorizes the diagonal block (block-tridiagonal) Er and step 3 solves the corresponding system (4.18) to compute the spikes. For step 3, a different version of Algorithm 4.3 was implemented to deal with multiple RHS and to send/receive mul- tiple times the same block of the solution vector (in this case, the re-sending option was chosen to reduce the allocated memory by eliminating the buffer mechanism, as no communication exists on this step). Also, due to the high number of zeros in the RHS matrices of (4.18), another modification was introduced in the BCYCLIC solve used (bc4spike solve) that saves time by skipping operations involving zero blocks on every recursion of the algorithm. The second factorization needed in step 5 of Algorithm 4.5 is the factorization of the reduced matrix (Sˆ), that is also done by means of BCYCLIC and, in this case, communication occurs between the processes as the matrix is distributed across them. The solve subroutine of Spike is shown in Algorithm 4.6, in which the steps involving communication are 4, 5 and 6. Step 8 of Algorithm 4.6, that computes the middle part of the final solution on each process, corresponds to (4.23). The memory requirements of our Spike implementations are higher than those of BCYCLIC, since in addition to the original matrix size and the 66% extra amount of memory used by BCYCLIC to perform the factorization of the local diagonal block Er, more memory has to be allocated to build the reduced matrix, whose block size

90 4.4. Parallel implementations

Algorithm 4.6. Spike solve 1 For all processes (with rank r) do in parallel 2 gr = bcyclic solve(E˜r, br) 3 Buildg ˆ with top and bottom parts of gr 4 xˆ = bcyclic solve(S,˜ gˆ) (t) (b) 5 Sendx ˆr andx ˆr to r − 1 and r + 1, respectively (t) (b) 6 Receivex ˆr+1 andx ˆr−1 (t) (t) 7 x =x ˆ (m) (m) (m) (t) (m) (b) 8 xr = gr − Vr xˆr+1 − Wr xˆr−1 (b) (b) 9 x =x ˆ 10 end

Table 4.2. Operations used to factorize the two matrices and solve the systems on the Spike implementations.

Spike Reduced Spike

Matrix Er Matrix Sˆ Matrix Er Matrix S¯ Fact. bcyclic Fact. bcyclic Factorization Solve bc4spike Solve bc4spike subroutine Fact. bcyclic Fact. getrf Solve Solve bcyclic Solve bcyclic subroutine Solve bcyclic Solve getrs

is twice as large as the original block size, and it has its own 66% increase. Besides that, the auxiliary memory buffers used to send and receive blocks have to be of this new double block size. For diagonally dominant matrices, a variant of the Truncated Spike algorithm that works with the extra reduced matrix S¯ of (4.24) can be used. In the sequel, this method is referred to as reduced Spike. In our case, the implementation does not limit the solve to the first and last k × k blocks, but computes the full spikes (done via the BCYCLIC algorithm). The reduction in the computation that it provides is obtained through the use of the extra reduced system, as the blocks discarded should not contain any nonzero elements. Since the spikes are fully computed, the logic of the algorithm that processes the diagonal blocks Er does not change with respect to the general Spike. The extra reduced matrix can be factored in parallel with a getrf call on each process without any communication or additional storage. The solve subroutine of both variants of the Spike algorithm differs in the second solve used with the reduced matrix. A summary of the different operations employed by both variants can be seen in Table 4.2.

91 Chapter 4. Block-tridiagonal eigenvalue problems

A B P0

P1 C D

P2

P3

Figure 4.6. Distribution of the four sub-matrices of Figure 4.2 among several pro- cesses and its representation in memory.

Block cyclic tridiagonal In our implementation of the Schur complement of Section 4.3.1, the submatrices A and B of (4.27) are stored on the first process, and C and D are partitioned among all the existing processes. Figure 4.6 illustrates this data distribution, where we can see how only the nonzero blocks of B are stored while the full size of C is allocated. D is a block-tridiagonal matrix that can be stored compacted as in (4.32), reducing the unused blocks to two. When computing the Schur complement (4.28), the D−1C operation implies a linear solve with multiple right-hand sides, that can be computed in parallel, and since C has only two nonzero blocks on its edges, the computation involving the null blocks can be avoided during the solve stage of the block-cyclic reduction, reducing the processing time. The remaining operations to compute the Schur complement are only done by the first process, as it stores the first block-row of M that includes A and B. As B also has only two nonzero blocks, the matrix multiplication B times D−1C can be cheaply done by multiplying only these two blocks with the corresponding blocks of D−1C; one of them is already stored in this first process and the other must be sent by the process with the highest rank. In order to obtain x, the system (4.30) is solved in a similar way as (4.28) by computing D−1f in parallel with block-cyclic reduction. But this time no reduction in the operations is possible as f does not have any special zero pattern. Once the first block of the solution is obtained, it must be sent from the first to the last process for them both to compute f − Cx, and after that, the block-cyclic reduction method can be used again to finally solve (4.31) in parallel.

4.5 Numerical experiments

After the development of the algorithms and with all the codes prepared for real and complex arithmetic, in both single and double precision, we performed some computational experiments aiming at assessing the performance of our software. Here we present the results of those experiments corresponding to real scalars using single and double precision.

92 4.5. Numerical experiments

k=960 k=960 106 106

5 10 105 MFLOPS 104 104

10 50 90 130 10 50 90 130 ` ` GPU single CPU single GPU double CPU double

Figure 4.7. Performance of the matrix-vector product operation to compute the largest magnitude eigenvalue, for a fixed block size k = 960 and varying the number of blocks ` for both CPU and GPU on the Fermi (left) and Kepler (right) machine, in single and double precision arithmetic.

4.5.1 Single process executions The matrix-vector product and the block cyclic reduction algorithm have been tested on single process computational experiments, conducted on random block- tridiagonal matrices of multiple sizes, where we have varied the number of blocks `, and the block size k, up to reach the maximum storage space available on the GPU card. The four number of blocks used go from 10 to 130 with intervals of 40, and the block sizes vary between a minimum of 64 and a maximum of 960 with intervals of 128. The matrices used have been generated in all cases on the CPU to use the exact same matrix on both runs (CPU and GPU). These tests have been run on two computer platforms: Fermi 2 Intel Xeon E5649 processor (6 cores) at 2.53 GHz, 24 GB of main memory; 2 GPUs NVIDIA Tesla M2090, 512 cores and 6 GB GDDR per GPU. RHEL 6.0, with GCC 4.6.1 and MKL 11.1. Kepler 2 Intel Core i7 3820 processor (2 cores) at 3,60 GHz with 16 GB of main memory; 2 GPUs NVIDIA Tesla K20c, with 2496 cores and 5 GB GDDR per GPU. CentOS 6.6, with GCC 4.4.7 and MKL 11.0.2. In both platforms, the other software used is PETSc and SLEPc 3.6-dev, CUDA 7.0 and CUSP 0.5.0. The matrix-vector product on CPU uses calls to BLAS’ gemv and is linked with MKL, and on GPU it uses the ad-hoc CUDA kernel of Figure 4.5. To assess their performance, we have computed the largest magnitude eigenvalue, for which the computation requires about 100 matrix-vector products.

93 Chapter 4. Block-tridiagonal eigenvalue problems

k=960 k=960

30 30

20 20

Seconds 10 10

0 0 10 50 90 130 10 50 90 130 ` ` GPU single CPU single GPU double CPU double

Figure 4.8. Total eigensolve operation time to compute the largest magnitude eigen- value, for a fixed block size k = 960 and varying the number of blocks ` for both CPU and GPU on the Fermi (left) and Kepler (right) machine, in double precision arithmetic.

`=130 `=130

106 106

5 10 105

4 104 10 MFLOPS

103 103 64 192 320 448 576 704 832 960 64 192 320 448 576 704 832 960 k k GPU single CPU single GPU double CPU double

Figure 4.9. Performance of the matrix-vector product operation to compute the largest magnitude eigenvalue, for a fixed number of blocks ` = 130 and varying the block size k for both CPU and GPU on the Fermi (left) and Kepler (right) machine, in single and double precision arithmetic.

94 4.5. Numerical experiments

`=130 `=130

30 30

20 20 Seconds 10 10

0 0

64 192 320 448 576 704 832 960 64 192 320 448 576 704 832 960 k k GPU single CPU single GPU double CPU double

Figure 4.10. Total eigensolve operation time to compute the largest magnitude eigenvalue, for a fixed number of blocks ` = 130 and varying the block size k for both CPU and GPU on the Fermi (left) and Kepler (right) machine, in double precision arithmetic.

Figures 4.7 and 4.9 show the Mflop/s rate obtained with the matrix-vector prod- uct operation, and Figures 4.8 and 4.10 show the total eigensolve operation time, when computing the largest magnitude eigenvalue. With a large block size (Fig- ure 4.7), we can see that the performance does not depend too much on the number of blocks. In contrast, when we fix the number of blocks (Figure 4.9) the performance is significantly lower for small block sizes. In any case, the benefit of using the GPU is evident since we are able to reach about 1 Tflop/s with large block sizes. The GPU runs obtain speedups up to 9.0 (single) and 8.2 (double) on Fermi, with little variation when modifying the block size, and up to 20.8 (single) and 15.2 (double) on Kepler, with respect to the CPU runs. On Kepler, although the greatest per- formance is obtained with large block sizes, the speedup is reduced when increasing them. For assessing the performance of the shift-and-invert computation using the block oriented cyclic reduction algorithm, we have computed one eigenvalue closest to the origin (σ = 0). The implementation of the block cyclic reduction used on these runs avoids to explicitly compute the inverse of the diagonal blocks Bi and solves a system of linear equations with the batched version of the getrs routine provided by cuBLAS. Results are shown in Figures 4.11–4.14. In this case, the GPU version does not beat the CPU computation (using as many threads as computational cores in MKL operations). Nevertheless we can appreciate the sensitiveness of the GPU to the data size, as its performance improves when the number of blocks is increased (for a large block size), as well as when the block size is increased, while the CPU is only affected by the block size. The reported Mflop/s rates correspond to the LU factorization, whereas the triangular solves only achieve a performance around 2.5-4

95 Chapter 4. Block-tridiagonal eigenvalue problems

k=960 k=960

105.2

105 105

104.8

MFLOPS 104.6 104.5 104.4 10 50 90 130 10 50 90 130 ` ` GPU single CPU single GPU double CPU double

Figure 4.11. Performance of the factorization operation to compute the eigenvalue closest to the origin, for a fixed block size k = 960 and varying the number of blocks ` for both CPU and GPU on the Fermi (left) and Kepler (right) machine, in double precision arithmetic.

k=960 k=960 25 20 20 15 15 10 10 Seconds 5 5

0 0 10 50 90 130 10 50 90 130 ` ` GPU single CPU single GPU double CPU double

Figure 4.12. Total eigensolve operation time to compute the eigenvalue closest to the origin, for a fixed block size k = 960 and varying the number of blocks ` for both CPU and GPU on the Fermi (left) and Kepler (right) machine, in double precision arithmetic.

96 4.5. Numerical experiments

`=130 `=130

5 10 105 MFLOPS

104 104

64 192 320 448 576 704 832 960 64 192 320 448 576 704 832 960 k k GPU single CPU single GPU double CPU double

Figure 4.13. Performance of factorization to compute the eigenvalue closest to the origin, for a fixed number of blocks ` = 130 and varying the block size k for both CPU and GPU on the Fermi (left) and Kepler (right) machine, in double precision arithmetic.

`=130 l`=130 25

30 20

20 15

10

Seconds 10 5

0 0

64 192 320 448 576 704 832 960 64 192 320 448 576 704 832 960 k k GPU single CPU single GPU double CPU double

Figure 4.14. Total eigensolve operation time to compute the eigenvalue closest to the origin, for a fixed number of blocks ` = 130 and varying the block size k for both CPU and GPU on the Fermi (left) and Kepler (right) machine, in double precision arithmetic.

97 Chapter 4. Block-tridiagonal eigenvalue problems

Gflop/s, both on CPU and GPU. The performance of the single precision tests is presented, but they provided inaccurate results. The time shown in Figures 4.12 and 4.14 corresponds to the total eigensolve operation, using the same number of restarts on both GPU and CPU runs. All in all, the performance is much worse than in the matrix-vector product case, as expected.

4.5.2 Multi-process executions The problem dimension of scientific applications has increased over time, and made necessary to use multi-process solutions. In this section we test our software with multi-process executions, as we are especially interested in evaluating the scalability for an increasing number of MPI processes (and GPUs), and also in knowing how the matrix block size impacts the performance, as the communication is directly affected by it. Another question we want to answer with the tests is which of the implemented variants performs best. We focus now on the time needed to perform the computation, rather than on the flop/s rate, and to strengthen the results obtained, on these executions we have moved the random generated matrices off the scene and employ matrices from real applications.

Block-tridiagonal case For the scalability studies, we consider an application coming from astrophysics, where the matrices are banded (with a dense band) and can be generated for any matrix size with arbitrary bandwidth. The integral operator T : X → X, arising from a transfer problem in stellar atmospheres [1], is defined by

? 0 $ Z τ Z ∞ e−|τ−τ |µ (T ϕ)(τ) = dµϕ(τ 0)dτ, τ ∈ [0, τ ?] , (4.33) 2 0 1 µ which depends on the albedo, $ ∈ [0, 1], and the optical thickness of the stellar atmosphere, τ ?. We are interested in the eigenvalue problem T ϕ = λϕ with λ ∈ C and ϕ ∈ X. This problem can be solved via discretization, that is, by projection onto a finite dimensional subspace Xn, resulting in an algebraic eigenvalue problem Anxn = θnxn of dimension n, where An is the restriction of the projected operator to Xn. Further details can be found in [135]. Due to the exponential decay, the matrix An has a banded structure, with a bandwidth depending on the ratio between the matrix size, n, and the parameter τ ?,    n  bw = n 1 − exp − ? , (4.34) tcτ ? where tc = max(n/100, 5). We always choose τ in such a way that the band is contained within a block-tridiagonal structure, that we compute as dense. As before, the problem size n is defined by the number of block-rows, `, and by the block size, k. The strong and weak scaling analyses have been obtained with a batch of experiments on which the number of processes vary in powers of two

98 4.5. Numerical experiments

Table 4.3. Block sizes and number of block-rows used for the strong and weak scaling experiments. The column of the weak scaling shows the number of rows used with 128 processes, that is halved with the number of processes.

Number of block-rows Block size Strong scaling Weak scaling 64 4800 51200 96 3200 38400 128 2400 25600 256 1200 12800 384 800 9600 512 600 6400 640 480 5600 768 400 4800 896 343 4000 1024 300 3200 from 1 to 128, and ten different block sizes that vary from 64 up to 1024 have been tested. Table 4.3 details the block sizes used and the number of block-rows with each block size. When computing the weak scaling, as the problem size per process is maintained fixed, the number of block-rows per process depends exclusively on the block size (for a given matrix size). For the strong scaling, the number of block-rows depends on the number of processes used, on the block size and on the process index, as the total matrix size does not change. In these tests we are interested in computing eigenvalues closest to the albedo parameter. As we know that all the eigenvalues are smaller than the albedo, we −1 use it as shift with the shift-and-invert technique operating on matrix (An − $I) , where $ = 0.75 in our runs. In all cases, the relative residual norm of the computed eigenpairs, kAz−θzk/kθzk, is always below the requested tolerance, tol = 10−8. We have set the eigensolver to restart with a basis size of 16. In most runs, all eigenvalues converge without needing to restart, and hence 16 linear solves are performed per each factorization. The executions have been carried out on Minotauro3, on a cluster equipped with four GPUs per node. The cluster is formed by 39 servers with two Intel Xeon E5- 2630 v3 processors and 128 GB of RAM, interconnected with FDR Infiniband cards at 56 Gb/s in a switched fabric network topology, and two NVIDIA K80 cards. Each K80 has two Kepler GPUs with 2496 cores and 12 GB of GDDR memory per GPU. The servers run RedHat Linux 6.7 as operating system, and our software has been compiled with gcc 4.6.1 using PETSc and SLEPc 3.7-dev, and linked with the Intel MKL 11.3.2, NVIDIA CUDA 7.5 and MAGMA 1.7.0 libraries. The MPI version used for the inter-process communication is the cluster’s manufacturer bullxmpi 1.2.9.1.

3Minotauro experienced a partial upgrade in March 2016. We here use exclusively the new cluster.

99 Chapter 4. Block-tridiagonal eigenvalue problems

Since our codes use a single GPU per process, the number of processes per node has been limited to four in order to fully utilize the servers without oversubscribing the GPU cards with more than one process when running the GPU executions. In the case of the CPU runs, the same limit of four processes per node has been used to have the same communication overhead, and in this case the number of threads has been set also to four, to allow the software to use all the computational cores (one thread per core) with the four processes. We have prepared three sets of experiments to evaluate the weak scaling of the different software versions, which compute 5 eigenvalues. In the following figures we show the factorization time and the aggregated time of the multiple solves of the Arnoldi algorithm needed to obtain five eigenvalues working in double precision arithmetic for the smallest and the largest block sizes used. The first set shows the time needed with the non-batched versions of BCYCLIC for both approaches: explicitly computing the inverse of the diagonal block ( getri) or solving linear systems ( getrs). The initial highlight is the better performance of the CPU executions with small block sizes and the better performance on the GPU with large block sizes. With small block sizes, the kernels executed on the GPU do not allow the devices to obtain their maximum performance. In Figure 4.15 it is clear how explicitly computing the inverse takes more time during the factorization stage, and less in the solving stage, as could be expected. For larger block sizes the differences between the getri and getrs versions disappear in both stages, and the getri times turn to be slightly smaller in the factorization while maintain the better performance during the solve, as can be seen in Figure 4.16. The factorization subroutine that uses getri to compute the inverse of the diag- onal blocks on the CPU requires a block size larger than 512 to be faster than the one using getrs, while the GPU executions start to be faster with a block size larger than 128. The second set of experiments shows exclusively executions on the GPU. We carry out a comparison of the five implementations that compute the inverse: the one already used in the first set ( getri) and two more batched implementations ( getri 1 and getri 2) per library used. Both batched versions share the same solve stage, so no different results should be seen in it. Figure 4.17 allows us to see how for a small block size, any of the batched versions performs faster than the non-batched one during the factorization. As expected, the batched versions allow the software to gain performance by computing the blocks in parallel. The solve results show a significant difference between the cuBLAS and MAGMA libraries due to their inherent performance, being cuBLAS faster in this stage. The executions with a large block size seen in Figure 4.18 do not show the difference in time between the batched and non-batched implementations that occurs in the factorization with small sizes, as in this case, the large block size is enough to fulfill the massive parallel processing of the GPU. This occurs with block sizes larger than 512. It is noticeable in Figure 4.18 how the cuBLAS implementations need consider- ably more time to perform the factorization while they are the fastest during the

100 4.5. Numerical experiments

Factorization Solves

1.00 0.80

0.80 0.60

0.60

Seconds 0.40

0.40 0.20

0.20 0.00 1 2 4 8 16 32 64 128 1 2 4 8 16 32 64 128 p p CPU getri CPU getrs GPU getri GPU getrs

Figure 4.15. Weak scaling for the BCYCLIC algorithm running on CPU and on GPU with the non-batched version with k = 64 and ` = p · 400, where p is the number of MPI processes.

Factorization Solves 6.00

1.50 5.00

4.00 1.00

3.00 Seconds 0.50 2.00

1.00 0.00

1 2 4 8 16 32 64 128 1 2 4 8 16 32 64 128 p p CPU bcyclic getri CPU bcyclic getrs GPU bcyclic getri GPU bcyclic getrs

Figure 4.16. Weak scaling for the BCYCLIC algorithm running on CPU and on GPU with the non-batched version with k = 1024 and ` = p · 25, where p is the number of MPI processes.

101 Chapter 4. Block-tridiagonal eigenvalue problems

Factorization Solves

0.80

0.60 0.60

0.40

Seconds 0.40

0.20

0.20 0.00 1 2 4 8 16 32 64 128 1 2 4 8 16 32 64 128 p p getri 1 cublas getri 2 cublas getri 1 magma getri 2 magma getri nobatch

Figure 4.17. Weak scaling for the BCYCLIC algorithm running on GPU with batched and non-batched versions with k = 64 and ` = p · 400, where p is the number of MPI processes. solve stage. These findings made us build a combined version of the software to ben- efit from the best of each libraries. This hybrid implementation selects the routines to use in the factorization based on the block size of the matrix. More precisely, cuBLAS getri 2 is used with block sizes up to 128 and MAGMA getri 1 for larger block sizes. The solve stage is managed by cuBLAS regardless of the block size. Finally, the third set compares the performance of the BCYCLIC and the Spike algorithms running on the GPU and on the CPU. For the GPU versions, both algo- rithms use the combined implementation originated from the results of the previous set of experiments. For the sake of simplicity, an exception to the ‘select the fastest functions’ rule has been made, since for the case of using a block size smaller than 128 and no more than 4 processes the getrs variant was slightly faster. Note that some executions of the Spike implementation with large block sizes could not run on the GPU due to memory constraints. Figure 4.19 shows the behaviour of the algorithms with a small block size. Spike scales better than BCYCLIC in the factorization stage for this size and for sizes no larger than 128. That is evident in both the GPU and CPU executions. It starts being slower than BCYCLIC, but as soon as the number of processes is increased, it is able to obtain smaller times, needing a larger number of processes when executed on the GPU. On the other hand, BCYCLIC scales worse due to the relatively higher cost of communication with respect to the poor computing performance. We can see how the time increases noticeable when using several nodes (more than 4 processes). A block size larger than 128 is required for the BCYCLIC algorithm to perform better than Spike for any number of processes used during the factorization stage.

102 4.5. Numerical experiments

Factorization Solves

0.80 15.00

0.60 10.00 0.40 Seconds

5.00 0.20

0.00 0.00 1 2 4 8 16 32 64 128 1 2 4 8 16 32 64 128 p p

Figure 4.18. Weak scaling for the BCYCLIC algorithm running on GPU with batched and non-batched versions with k = 1024 and ` = p · 25, where p is the number of MPI processes. Consult the legend in Figure 4.17.

With all of the block sizes tested, the BCYCLIC algorithm has always performed faster than Spike in the solve stage. For small block sizes, the results highlight the better CPU performance with small BLAS-2 operations and the benefit of the absence of GPU-CPU data copies. From block sizes larger than 128, the payload of calling the kernels is worth the performance obtained with the GPU. On the CPU executions, the time gap between the two algorithms tends to increase when the block size grows, while in the GPU executions it tends to decrease. If the block size is increased up to 1024 as can be seen in Figure 4.20, the differences between the two algorithms and the two platforms are more prominent. The different times obtained when varying the block size and the number of pro- cesses can be seen in more detail in Tables 4.4 and 4.5. The first one shows the total eigenproblem times obtained for all the different block sizes when using 128 processes. This table does not intend to contrast the performance obtained with different block sizes, as their computational cost differs and are not comparable in that sense. It allows us to compare the behaviour of the implementations for a spe- cific block size. Spike is clearly faster with small block sizes as well as the executions on CPU. Once the block size exceeds 128, the executions on GPU with BCYCLIC obtain the smallest times. The speedup obtained with the GPU implementation with respect to the CPU one is increased when increasing the block size. For such amount of processes, with the largest block size, the GPU speedup of the BCYCLIC is a modest 2.8. Table 4.5 shows the total eigenproblem times for all the different number of processes when using the largest block size. When increasing the problem size by a factor of 128, the time increasing factor for the fastest implementation (BCYCLIC on GPU) is 3.3, whereas the same algorithm scales slightly better on CPU with a factor of 2.0. Even not being very close to a perfect scaling, these factors are reasonably good and contrast with the Spike factors, that double them. The GPU versions obtain speedups of 4.8 (Spike) and 4.7 (BCYCLIC) with respect to the CPU

103 Chapter 4. Block-tridiagonal eigenvalue problems

Factorization Solves

0.50 0.35 0.40

0.30 0.30

Seconds 0.20 0.25

0.10 0.20

0.00 1 2 4 8 16 32 64 128 1 2 4 8 16 32 64 128 p p CPU bcyclic getri CPU spike GPU bcyclic getri combined GPU spike

Figure 4.19. Weak scaling for the BCYCLIC and the Spike algorithms running on CPU and on GPU with k = 64 and ` = p · 400, where p is the number of MPI processes. when executing with one process, but those values are reduced when increasing the number of processes. The total eigenproblem time is also represented in Figure 4.21, in which we can highlight the case of using a block size of 640, where on CPU, the Spike algorithm performed faster than BCYCLIC against the normal behaviour, due to a drop in the performance of BCYCLIC with block sizes between 512 and 1024. From 640 and up to 1024 the BCYCLIC algorithm progressively recovers performance and obtains smaller times with larger block sizes. The experiments to measure the strong scaling compute 1 eigenvalue of diagonally dominant matrices, and compare the performance of BCYCLIC and Spike with the reduced Spike, used as a direct solver. As matrix An coming from (4.33) is not diagonally dominant in general, we have forced it to be artificially diagonally dominant, so the reduced Spike method can be employed. The figures represent the results obtained when measuring the strong scaling for a fixed matrix size of 3072004. For the case of one process, this size is larger than the one used in weak scaling tests, in order to have enough workload with a reasonable number of processes. Since for the strong scaling tests the time needed to complete the solve stage does not vary significantly between the algorithms, we present the total eigenvalue problem solve operation time in the figures, that have an almost direct correspon- dence with the time needed to perform the factorization stage and at the same time provide us with a more global view. Figure 4.22 shows the strong scaling results for three different block sizes. Again, for small block sizes where the GPU has a large overhead launching a lot of small

4The actual size of the matrix when using a block size of 896 is 307328.

104 4.5. Numerical experiments

Factorization Solves

1.50 10.00

1.00

Seconds 5.00 0.50

0.00 0.00 1 2 4 8 16 32 64 128 1 2 4 8 16 32 64 128 p p CPU bcyclic getri CPU spike GPU bcyclic getri combined GPU spike

Figure 4.20. Weak scaling for the BCYCLIC and the Spike algorithms running on CPU and on GPU with k = 1024 and ` = p · 25, where p is the number of MPI processes. The executions on GPU with the Spike algorithm with more than 16 processes could not be done due to memory constraints.

Table 4.4. Total eigenproblem time obtained with the weak scaling tests for 128 processes.

CPU GPU Spike BCYCLIC Spike BCYCLIC Block size Seconds Seconds Seconds Seconds 64 0.67 1.18 0.68 0.85 96 0.96 1.42 0.75 0.81 128 0.85 1.04 0.75 0.76 256 1.34 1.88 1.13 384 3.04 2.50 1.06 512 5.12 3.91 1.90 640 7.83 10.46 6.91 768 14.05 9.62 6.03 896 17.06 8.76 3.55 1024 22.08 8.42 3.01

105 Chapter 4. Block-tridiagonal eigenvalue problems

Table 4.5. Total eigenproblem time obtained with the weak scaling tests for a block size k = 1024.

CPU GPU Spike BCYCLIC Spike BCYCLIC Processes Seconds Seconds Seconds Seconds 1 4.37 4.29 0.91 0.91 2 8.07 3.70 1.68 1.12 4 10.22 4.78 2.32 1.38 8 13.17 5.57 3.76 1.85 16 14.91 6.36 4.55 2.21 32 17.22 6.50 2.39 64 19.12 7.37 2.89 128 22.08 8.42 3.01

k = 1024 128 processes

20 20

15 15

Seconds 10 10

5 5

0 0 1 2 4 8 16 32 64 128 64 96 128 256 384 512 640 768 896 1024 Processes Block size spike CPU bcyclic CPU spike GPU bcyclic GPU

Figure 4.21. Total eigenproblem time obtained with the weak scaling tests for a block size k = 1024 (left) and for 128 processes (right).

106 4.5. Numerical experiments

k = 64 k = 640 k = 1024

2 25 10

1.5 20 8

15 6 1 Seconds 10 4 0.5 5 2

2 4 8 16 32 2 4 8 16 32 2 4 8 16 32 Processes Processes Processes k = 64 k = 640 k = 1024

5 2 3 4 1.5

3 2

Seconds 1 2 1 0.5 1

2 4 8 16 32 2 4 8 16 32 2 4 8 16 32 Processes Processes Processes CPU bcyclic getri CPU spike CPU reduced spike GPU bcyclic getri combined GPU spike GPU reduced spike

Figure 4.22. Strong scaling for the BCYCLIC, the Spike and the reduced Spike algorithms running on CPU and on GPU with a total matrix dimension of 307200 and different block sizes k.

kernels, the CPU times tend to be smaller. And when the block size is increased, the same algorithm performs faster on GPU. For a block size of 64, the three algorithms scale well up to 8 processes, and from that point on the performance of BCYCLIC decays due to a higher ratio between communication and computation time. Spike achieves a better scalability than the other two algorithms.

The middle and right plots in Figure 4.22 show that the scalability of the al- gorithms with larger block sizes is not good. Spike turns into the slowest of the algorithms and the one with the worst scalability. On the other side, the reduced Spike benefits of a larger block size scaling up to a larger number of processes where the BCYCLIC algorithm performance starts to decay. As in the weak scaling, the speedup given by the GPU with respect to the CPU is reduced when incrementing the number of processes, and increased when increasing the block size of the ma- trix. With the largest block size used and four processes, the GPU versions give speedups of 4.7 (BCYCLIC), 5.2 (Spike), and 5.3 (Reduced Spike) with respect to the CPU executions. Those values are reduced to 1.8 (BCYCLIC), 3.3 (Spike), and 3.2 (reduced Spike) when running with 32 processes.

107 Chapter 4. Block-tridiagonal eigenvalue problems

Block cyclic tridiagonal case The block cyclic tridiagonal case has been studied with an application to provide an accurate picture of the electronic structure of large systems. This kind of applications allows the design of new devices that are essential for many emerging technologies. In order to understand the electronic and optical properties of semiconductor hetero and nanostructures, it is imperative to know the allowed energy levels for electrons and their quantum state, completely determined by their corresponding wavefunctions. These quantities are provided by the Schr¨odingerequation, a linear partial differential equation (PDE) which, in its single particle (an electron from now on) time-independent version, reads

 2  Hψˆ (r) = − ~ ∇2 + V (r) ψ(r) = Eψ(r), (4.35) 2m0 ˆ where H is the Hamiltonian operator, ~ is the reduced Planck constant, m0 is the free electron mass, V (r) is the microscopic potential energy affecting the electron, and ψ(r) and E are the sought electron wavefunction and energy, respectively. When the unknown ψ(r) is expanded as a linear combination of unknown coefficients ci times known basis functions φi(r), X ψ(r) = ciφi(r), (4.36) i the Schr¨odingerPDE is transformed into an algebraic (maybe generalized) eigenvalue problem. In a semiconductor heterostructure, the matrix resulting from the expansion in (4.36) using empirical tight-binding methods (ETB) [122], will have the following characteristics:

• Each atom or primitive cell will contribute with Nb basis functions to the wavefunction, and with an Nb × Nb block to the matrix A representing Hˆ . • The topology of the non-vanishing interactions between the atomic-like orbitals will determine which blocks in A will have non-zero entries. If, in addition, the semiconductor heterostructure has translational symmetry along the x, y coordinates and varies along the z axis (i.e. is effectively one-dimensional), the obtained matrix A will be block-tridiagonal, with the possibility that non-zero blocks appear in the upper-right and lower-left corners if periodic boundary condi- tions are imposed (block cyclic tridiagonal structure). A set of experiments to measure the performance of the software have been conducted in Tirant5. Two matrices arising from the ETB parametrization [74] of a quantum cascade laser structure [20], and a 4/2 GaAs/AlAs superlattice with fluctuations in the layer widths have been used in the tests. Their dimensions are 82,400 (qcl) and 143,840 (anderson), respectively, and both have a small block size

5See hardware description in Chapter 3 (Section 3.4.2).

108 4.5. Numerical experiments

qcl anderson

102 MUMPS MUMPS Schur-complement Schur-complement Linear scaling 103 Linear scaling

1

Seconds 10

102

1 2 4 8 16 32 64 1 2 4 8 16 32 64 Processes Processes

Figure 4.23. Total eigenproblem solve time to obtain 40 eigenvalues closest to 1.5 for the qcl and anderson matrices using 4 processes per node. of 20. These experiments are done exclusively on CPU because the small block size of the matrices does not make the problem suitable for running it on GPU. The servers run SuSE Linux Enterprise Server 10 as operating system, and our software has been compiled in complex arithmetic and double precision with gcc 4.6.1 using PETSc and SLEPc 3.7.1, MUMPS 5.0.1-p1, and MPICH2 1.0.8p1 as the inter-process communication library. As the nodes have four computational cores, a limit of four processes per node have been used during the executions. In the experiments we compare the performance of a state of the field library such as MUMPS with our implementation of the Schur complement that uses the block-cyclic reduction algorithm to solve the block-tridiagonal system that involves D of (4.27). We should remark that MUMPS stores (and works with) the full matrix (in blocked sparse format) while our software reduces the operations to the nonzero blocks (using dense storage), and except in the case of the sub-matrix C of (4.27), it initially reduces the memory footprint to them. In both cases, we have instructed SLEPc to obtain 40 eigenvalues closest to the target value 1.5 eV (lowest energy states in the conduction band) with the shift-and-invert technique and a default tolerance of 10−8. The relevant command-line options used in these experiments are:

• MUMPS: -matload block size 20 -eps nev 40 -eps target 1.5 -st type sinvert -st pc factor mat solver package mumps

• Schur complement: -eps nev 40 -eps target 1.5 -st type shell

Additionally, for the runs with matrix anderson, the value of the -eps ncv parameter has been increased and set to 128 in order to reduce the number of restarts and achieve a better performance.

109 Chapter 4. Block-tridiagonal eigenvalue problems

Results of the executions can be seen in Figure 4.23. The plots show that the eigensolver scales linearly (up to 64 MPI processes) when using our custom linear solver based on Schur complement with block-cyclic reduction. In contrast, the scalability of MUMPS is more limited, and performance degrades with 16 processes or more. This can be attributed to the fact that MUMPS is based on a classical scheme of factorization followed by triangular solves, where the latter operation is inherently sequential and results in bad scalability. The cyclic reduction scheme rearranges the operations in such a way that there is more opportunity for a larger degree of parallelism.

4.6 Conclusions

In this chapter we have addressed the particular case of solving eigenproblems of large-scale block-tridiagonal and block cyclic tridiagonal matrices. We have devel- oped a set of codes for computing a few eigenpairs of such matrices via Krylov methods. The codes are integrated in the SLEPc/PETSc framework, with an MPI- CUDA programming style, and allow to use many processors/GPUs to address very large-scale problems. We have focused on the optimization of the basis expansion of the Arnoldi al- gorithm for which we have developed a high-performance matrix-vector kernel, that is able to fully exploit the computing power of the available GPUs when computing exterior eigenvalues. This solution significantly reduces the computation time when comparing with a multi-threaded CPU version. For the case of interior eigenvalues, the developing effort has been concentrated on the scalable solution of block-tridiagonal linear systems on the GPU. In the case of the band oriented Spike algorithm, we have adapted it to explicitly work with block-tridiagonal matrices by means of using the block cyclic reduction underneath it. We have performed a study of the scalability of the algorithms used to see how they react to the changes in the block size of the matrix. The performance analysis allows us to draw several conclusions. In general, BCYCLIC performs better than Spike, but Spike scales better when using small block sizes. All GPU implementa- tions have shown to be faster than the CPU counterparts, except for small block sizes. In terms of scalability, we can state that for sufficiently large block sizes, the codes scale well for up to 128 MPI processes (GPUs). Our implementations can use either cuBLAS or MAGMA, or a combination of the two. The best performance has been obtained with the mixed implementation. Another conclusion is that, for a large block size, BCYCLIC is able to solve larger problem sizes with respect to Spike, because Spike has larger memory requirements. In the case of diagonally dominant block-tridiagonal matrices, the reduced Spike method achieves better scalability. We have also addressed the case of coefficient matrices with block cyclic tridi- agonal structure and have analyzed the scalability of the linear solver when used within an iterative eigensolver in the context of electronic structure calculations. In

110 4.6. Conclusions this type of applications, it is important to obtain the solution very fast, especially when the process involves a self-consistency loop that requires solving an eigenvalue problem in each iteration. Our results illustrate that exploiting the matrix structure in the solver may provide some advantage, such as a better scalability, although it implies a higher development effort compared to using general-purpose numerical libraries. Although the performance analysis were carried out in the context of Krylov eigensolvers, the conclusions could be applied to other applications where a sequence of linear systems with block (cyclic) tridiagonal matrices must be solved, since in our tests almost all the computation is associated with the factorization and linear solves. The solvers can also be used with banded matrices, in which case the off-diagonal blocks are triangular (although we have not exploited this fact).

111

Chapter 5

Matrix functions

Mel de romer

The area of matrix functions has developed significantly in the last years, both in formalizing the relevant theoretical concepts and in developing algorithms for their computation [64, 65]. Matrix functions can be found in many scientific computing applications, especially the exponential and the square root, but also other less frequent functions such as the logarithm, or trigonometric functions. Matrix functions can be understood in several ways. For this, it is necessary to state to which of them we are referring to. Throughout this chapter we will interpret matrix functions as those that map Cn×n to Cn×n and are defined in terms of a scalar function f. For instance, the matrix exponential exp(A) is defined in terms of the scalar exponential function exp(z) in the sense discussed below, not in other senses such as element-wise evaluation. Note that functions of a matrix like the determinant, that yield a scalar result instead of mapping Cn×n to Cn×n, or a function such as X 7→ AX2 + BX + C where A, B, C are also matrices are not included in the definition and not considered in this chapter. There are several equivalent definitions for such matrix functions. One definition is based on the Jordan canonical form of the matrix, A = ZJZ−1. Then

 −1 f(A) := Z diag f(J1(λi1 )), . . . , f(Jp(λip )) Z , (5.1)

where Jk(λik ) is the Jordan block corresponding to the eigenvalue λik , and f(Jk(λik )) can be defined in terms of the successive derivatives of f evaluated on λik

 (m −1)  0 f k (λk) f(λk) f (λk) ... (m −1)!  k   .. .  f(J (λ )) =  f(λk) . .  . (5.2) k ik    .. 0   . f (λk)  f(λk)

113 Chapter 5. Matrix functions

Alternatively, matrix functions can be defined in terms of a polynomial

f(A) := r(A), (5.3)

where r interpolates f in the Hermite sense at the eigenvalues λik . And also by means of the Cauchy integral formula 1 Z f(A) = f(z)(zI − A)−1dz, (5.4) 2πi Γ assuming f is analytic in the considered domain Γ. All these definitions give rise to different computational methods. As in other numerical linear algebra problems, such as linear systems or eigen- systems, there are two broad classes of methods that we can generically refer to as dense and sparse. For dense matrices of relatively small size we can afford to explicitly compute the matrix F = f(A) (5.5) with dense methods. The computational cost of these methods depends cubically on n, the matrix size. For sparse matrices, whose size is usually much larger, it is not possible to explicitly compute (or even store) the matrix F of (5.5), so the alternative is the implicit application of the matrix function to a given vector,

y = f(A)b. (5.6)

The cost in this case is less than cubic, but this cost is accounted for every different vector b to which the matrix function is applied. In this chapter we discuss CPU and GPU implementations for both the dense and the sparse case. Along the next sections we review the available types of methods to perform the dense computation of matrix functions, and present some of them to compute matrix square roots and exponentials that can be efficiently implemented on GPU. For the sparse case of (5.6) we will center our attention on Krylov methods for performing its computation in a iterative way, where A is typically large and sparse, or is available implicitly by means of a matrix-vector multiplication subroutine. We will also point out how Krylov methods are related with the dense computation, and the impact that the implementation of the latter may have on the performance of the sparse computation. Finally, we show some results comparing the performance of the selected methods with executions on CPU and on GPU.

5.1 Dense methods for matrix functions

There are many different methods for the dense computation of f(A). Here we present three classes of methods that are relevant for our purposes: Similarity transformation. Computing f(A) via (5.1) is not viable due to the dif- ficulty of obtaining the Jordan form in a numerically stable way. An alterna- tive is to compute the Schur form (3.6), A = QT Q∗, with Q orthogonal (or

114 5.1. Dense methods for matrix functions

unitary in the complex case), and then evaluate the matrix function of the (quasi-)triangular matrix T ,

f(A) = Qf(T )Q∗. (5.7)

In the case that A is symmetric (or Hermitian), it is even simpler because T is then diagonal. The latter case will be referred to as diagonalization.

Rational approximation. Approximating f(A) can be tackled by replicating the procedures used on scalar functions. One of the main rational approximations are based on Pad´eapproximants.

Matrix iterations. Some matrix functions like matrix roots or the sign function are amenable to be computed with iterations derived from Newton’s method.

Methods based on similarity transformation are appealing since they usually require less floating-point operations than methods of the other types. However, that same characteristic makes them less likely to take advantage of the GPU. Previous works have compared different algorithms for matrix functions, for in- stance [25] analyzes Matlab implementations for the matrix exponential. Our inten- tion is not simply to carry out a comparison, something inescapably in the developing process, but to provide robust and efficient implementations for both CPU and GPU, that can be used from sequential or parallel codes written in C, C++ or Fortran. We have implemented four methods for the matrix square root and three for the matrix exponential. All of them have been implemented for both CPU and GPU, except the Schur method for the square root that is only available for CPU.

5.1.1 Square root A square root of A is any matrix satisfying the matrix equation

F 2 = A. (5.8)

− If A has no eigenvalues on R , the closed√ negative real axis, there is a unique principal square root, denoted as A1/2 or A, whose eigenvalues have all positive real part. The methods discussed below compute the principal square root. As mentioned above, an interesting strategy is to first reduce A to the Schur form, A = QT Q∗. The Schur-Parlett method [65] uses a recurrence to evaluate f(T ) exploiting the (quasi-)triangular structure of T . Implementing the Parlett recurrence in a numerically stable way is tricky. Fortunately, in the case of the matrix square root the method can be simplified, as the Parlett recurrence is not necessary, and hence it is simply called√ Schur method [62]. In this method the diagonal elements are determined as tii and the off-diagonal ones can be obtained from the equation X2 = T . Once X, the square root of T , has been obtained, a final backtransform step is necessary to recover F . We have implemented a blocked variant of this method as described in [34] that runs on CPU, see Algorithm 5.1. Although when solving on CPU, the gemm calls

115 Chapter 5. Matrix functions are the most time-consuming operations, this method is not appropriate for imple- mentation on GPU, due to the need of reduction to the Schur form and the steps after it, that involve using level 2 BLAS to solve Sylvester equations.

Algorithm 5.1. Blocked Schur method for the square root ∗ 1 Compute (real) Schur decomposition A = QT Q ; 2 for j = 1, 2 , . . . , nblk do 1/2 3 Evaluate Xjj = Tjj ; 4 for i = j − 1 ,..., 1 do Pi−1 5 Solve Sylvester eq. XiiXij − XijXjj = Tij − k=j+1 XikXkj 6 end 7 end ∗ 8 Backtransform F = QXQ

The other methods that we have considered to compute the matrix square root are based on matrix iterations. Iterative methods are interesting to implement as they are easily built with common matrix operations, and are particularly rich in matrix-matrix products. Those characteristics make them a suitable choice for GPU computing. The Newton method is the basis of many existing iterative methods. When applied to the matrix equation of (5.8), it gives the recurrence

F_{k+1} = (1/2)(F_k + F_k^{-1} A),    F_0 = A.    (5.9)

For A having no eigenvalues on R^-, the recurrence converges quadratically to the principal square root A^{1/2} for F_0 sufficiently close to it. Nevertheless, this iteration is numerically unstable and is not useful for practical computation. It is necessary to rewrite the iteration in a different way to make it stable. Several variants of the Newton method that stabilize the iteration have been developed, and they often scale the F_k terms to reduce the steps until the quadratic convergence starts. One variant of the Newton method is the product form [29] of the Denman–Beavers iteration [35], with a scaling factor μ_k:

μ_k = |det(M_k)|^{-1/(2n)},                                               (5.10a)
M_{k+1} = (1/2) ( I + (μ_k^2 M_k + μ_k^{-2} M_k^{-1}) / 2 ),    M_0 = A,  (5.10b)
F_{k+1} = (1/2) μ_k F_k (I + μ_k^{-2} M_k^{-1}),    F_0 = A.              (5.10c)

Contrary to the recurrence (5.9), which solves a linear system with multiple right-hand sides per iteration, (5.10) requires computing a matrix inverse and, in addition, obtaining the scaling factor μ_k also requires computing the determinant of M_k.
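A Python/NumPy sketch of the product form (5.10) is shown below as an illustration. The explicit inverse, the fixed tolerance and the simple switch that disables scaling once the iterates are close (cf. the stopping rule discussed next) are simplifications, not the actual CPU/GPU implementation.

import numpy as np

def sqrtm_db(A, scale=True, tol=1e-12, maxit=50):
    # Principal square root via the product form of Denman-Beavers, cf. (5.10)
    n = A.shape[0]
    I = np.eye(n)
    M, F = A.copy(), A.copy()
    scaling = scale
    for _ in range(maxit):
        # determinantal scaling; may overflow/underflow for large n
        mu = abs(np.linalg.det(M)) ** (-1.0 / (2 * n)) if scaling else 1.0
        Mi = np.linalg.inv(M)                              # getrf + getri
        Fnew = 0.5 * mu * F @ (I + Mi / mu**2)             # (5.10c)
        M = 0.5 * (I + 0.5 * (mu**2 * M + Mi / mu**2))     # (5.10b)
        rel = np.linalg.norm(Fnew - F, 'fro') / np.linalg.norm(Fnew, 'fro')
        F = Fnew
        if rel < 1e-2:
            scaling = False        # stop scaling close to convergence
        if rel < tol:
            break
    return F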


Following Higham's reference implementation¹, the scaling can be stopped when ‖F_k − F_{k-1}‖/‖F_k‖ < 10^{-2}. In our implementation, the determinant is computed from the LU factorization of M_k (getrf). However, for the computational experiments in this section we have turned off the scaling in this method, as the determinant may suffer from overflow or underflow in finite precision arithmetic when the matrix is quite large. The scaling factor μ_k may accelerate the initial convergence, but the results obtained without scaling are equally accurate. Alternative scaling factors discussed in [64, Ch. 5] in the context of the matrix sign function are not suitable for the case of the square root.

Another method for computing the square root is Newton–Schulz [117], from the Padé family of iterations. It usually needs more iterations to converge, but as it is an inverse-free iterative method, it does not rely on having an efficient implementation of the matrix inverse, having the matrix-matrix product as its main operation. It consists of two coupled recurrences,

X_{k+1} = (1/2) X_k (3I − Z_k X_k),    X_0 = B,    (5.11a)
Z_{k+1} = (1/2) (3I − Z_k X_k) Z_k,    Z_0 = I.     (5.11b)

The initial guess B = A/‖I − A‖_ℓ is a scaled version of A, so the scaling must be undone after convergence, F = √(‖I − A‖_ℓ) X. The condition ‖I − A‖_ℓ < 1 is sufficient for the method to converge for ℓ = 1, 2 or ∞. The scaling of the initial matrix may help the convergence of the method, although it does not guarantee it.

Recently, a cubically convergent iterative method for computing the square root has been developed by Sadeghi [116]. It starts with B = A (or B = A/‖A‖ in case ρ(A) > 1), and builds two coupled recurrences

X_{k+1} = X_k [ (5/16) I_n + (1/16) M_k (15 I_n − 5 M_k + M_k^2) ],        X_0 = I,    (5.12a)
M_{k+1} = M_k [ (5/16) I_n + (1/16) M_k (15 I_n − 5 M_k + M_k^2) ]^{-2},   M_0 = B,    (5.12b)

with F = √(‖A‖) X when the initial guess has been scaled, or F = X otherwise. This method, like (5.10), requires computing an inverse in each iteration. But as it converges faster, the number of iterations needed to compute the square root is expected to be smaller. In our implementations, we use the Frobenius norm for scaling as well as for convergence tests.
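Closing the discussion of the iterative square-root methods, here is a Python/NumPy sketch of the inverse-free Newton–Schulz recurrence (5.11), including the initial scaling and its undoing. The Frobenius norm and the relative-change stopping test follow the choices stated above; everything else (tolerance, iteration cap) is arbitrary, and the sketch is not the actual implementation.

import numpy as np

def sqrtm_ns(A, tol=1e-12, maxit=100):
    # Square root and inverse square root via Newton-Schulz, cf. (5.11)
    n = A.shape[0]
    I = np.eye(n)
    s = np.linalg.norm(I - A, 'fro')       # B = A/||I-A||; convergence not guaranteed
    B = A / s
    X, Z = B.copy(), I.copy()
    for _ in range(maxit):
        W = 3.0 * I - Z @ X                # common factor of (5.11a) and (5.11b)
        Xnew = 0.5 * X @ W                 # X_k -> B^{1/2}
        Z = 0.5 * W @ Z                    # Z_k -> B^{-1/2}
        if np.linalg.norm(Xnew - X, 'fro') < tol * np.linalg.norm(Xnew, 'fro'):
            X = Xnew
            break
        X = Xnew
    return np.sqrt(s) * X, Z / np.sqrt(s)  # undo scaling: A^{1/2} and A^{-1/2}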

¹The Matrix Function Toolbox, http://www.ma.man.ac.uk/~higham/mftoolbox/


5.1.2 Sign

Assuming that A is non-singular and has no eigenvalues on the imaginary axis, the matrix sign function sign(A) can be defined. Let

A = Z [ J_n 0 ; 0 J_p ] Z^{-1}    (5.13)

be the Jordan canonical form of A, where J_n and J_p contain the Jordan blocks of eigenvalues with negative and positive real parts, respectively, n and p being their respective dimensions. Then the sign function of A is defined as

sign(A) = Z [ -I_n 0 ; 0 I_p ] Z^{-1}.    (5.14)

A different representation of the matrix sign function, introduced in [63], shows its link with the matrix square root:

sign(A) = A (A^2)^{-1/2}.    (5.15)

Although there are specific Newton-type iterations for sign(A) [64, Ch. 5], here we discuss its computation using square root solvers. Since A is assumed to be non-singular, (A^2)^{-1/2} of (5.15) can be obtained. Some of the methods for the square root mentioned above can be used to obtain the inverse square root. In particular, in Denman–Beavers (5.10) the sequence F_k converges to A^{-1/2} if the iteration starts with F_0 = I, and in Newton–Schulz (5.11) the sequence Z_k converges to B^{-1/2}, so both the square root and its inverse are obtained simultaneously. If using another method to compute the square root, like Schur or Sadeghi, the inverse square root can be obtained by additionally solving a system of linear equations with multiple right-hand sides,

AF = A^{1/2},    (5.16)

where F = A^{-1/2}.
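The sketch below combines these observations: the Denman–Beavers product form started with F_0 = I yields A^{-1/2}, from which sign(A) = A(A^2)^{-1/2} follows; the small eigenvalue-counting check at the end anticipates the stability application of Section 5.3.1. It is only an illustration under the stated assumptions (unscaled iteration, fixed tolerances), not the implementation used in the experiments.

import numpy as np

def invsqrtm_db(A, tol=1e-12, maxit=50):
    # A^{-1/2} via the Denman-Beavers product form (5.10) with F_0 = I (no scaling)
    n = A.shape[0]
    I = np.eye(n)
    M, F = A.copy(), I.copy()
    for _ in range(maxit):
        Mi = np.linalg.inv(M)
        Fnew = 0.5 * F @ (I + Mi)
        M = 0.5 * (I + 0.5 * (M + Mi))
        if np.linalg.norm(Fnew - F, 'fro') < tol * np.linalg.norm(Fnew, 'fro'):
            return Fnew
        F = Fnew
    return F

def sign_matrix(A):
    return A @ invsqrtm_db(A @ A)          # sign(A) = A (A^2)^{-1/2}, cf. (5.15)

# Eigenvalue count in the open left half plane: (n - trace(sign(A))) / 2
A = np.diag([-3.0, -1.0, 2.0, 5.0]) + 0.01 * np.random.default_rng(1).standard_normal((4, 4))
print(round((A.shape[0] - np.trace(sign_matrix(A))) / 2))   # expected: 2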

5.1.3 Exponential

The matrix exponential is a function of major importance due to its connection with the solution of linear differential equations [45]; for this reason it has been widely studied and many methods for computing it have been proposed [97]. For the matrix exponential, we use a rational function r(A) to approximate exp(A). This rational function is chosen in a similar way as in scalar approximation theory, although in the case of matrix functions there is no guarantee that the approximation will be good (depending on the spectral properties of A) [65].

The first method that we have implemented is a rational approximation based on Padé approximants of order [p/p] with p = 6, combined with scaling and squaring. The notation [·/·] indicates the polynomial degree of the numerator and denominator, which are equal in this case (diagonal Padé approximant). The evaluation of the numerator and denominator is done following the Horner scheme, to reduce the number of required matrix-matrix multiplications. The last step for computing the rational function is done with a linear solve with multiple right-hand sides. The scaling and squaring technique consists in determining the minimal integer s ≥ 0 such that ‖A/2^s‖ is smaller than a certain constant, and then scaling the matrix as A/2^s prior to the computation of the rational matrix function. In that case, a post-processing is required to form F^{2^s}, which is done with s additional matrix-matrix multiplications. The above technique is very close to one of the methods used in Expokit [120].

A more recent work [3] suggests using higher degree Padé approximants (up to degree 13), but rearranging the computation in such a way that the number of required matrix products is much smaller. This approach (included in the latest versions of Matlab) is more accurate in some cases, and also avoids choosing a too large value of s (overscaling). Our second implementation follows this approach, and will be referred to as Higham.

A third method, presented by Güttel and Nakatsukasa in [55], also based on Padé approximations, tries to reduce the cost by using a subdiagonal Padé approximant of low degree (such as [3/4]), and also uses a small scaling and squaring factor to avoid potential instability caused by overscaling. Before starting, the method shifts the matrix

A_σ = A − σI,    (5.17)

where σ is the real part of the rightmost eigenvalue of A, so that all the eigenvalues of A_σ are in the left half plane. It is supposed to perform well with an estimate of that eigenvalue, assuming that the rightmost eigenvalues do not have widely varying imaginary parts. If the matrix is stable, with no eigenvalues in the right half plane, the shift is not required. A drawback of this method is that it employs complex arithmetic even when A is real, so its performance with real matrices is not expected to be good when compared with methods that do the computation with real scalars. This last implementation is referred to as Güttel–Nakatsukasa.
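To make the first approach concrete, here is a schematic Python/NumPy version of the diagonal [6/6] Padé approximant with scaling and squaring. The plain Horner recurrence and the scaling threshold theta are simplifications chosen for the illustration and do not match the exact constants or the operation count of the actual implementations.

import numpy as np
from math import factorial

def expm_pade(A, p=6, theta=0.5):
    # exp(A) by a diagonal [p/p] Pade approximant with scaling and squaring (schematic)
    n = A.shape[0]
    I = np.eye(n)
    # coefficients of the [p/p] Pade numerator of exp(x); the denominator is N(-x)
    c = [factorial(p) * factorial(2*p - k) /
         (factorial(2*p) * factorial(k) * factorial(p - k)) for k in range(p + 1)]
    nrm = np.linalg.norm(A, 1)
    s = 0 if nrm <= theta else int(np.ceil(np.log2(nrm / theta)))   # scaling step
    As = A / 2**s
    N, D = c[p] * I, c[p] * I
    for k in range(p - 1, -1, -1):        # Horner evaluation of N(As) and D(As) = N(-As)
        N = N @ As + c[k] * I
        D = D @ (-As) + c[k] * I
    F = np.linalg.solve(D, N)             # final multiple right-hand-side solve (gesv)
    for _ in range(s):                    # squaring phase: F <- F^2, s times
        F = F @ F
    return F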

5.2 Krylov methods for matrix functions

Krylov methods, as in the case of computing eigenvalues, basically amount to building an Arnoldi decomposition and evaluating the matrix function on the computed Hessenberg matrix. In an MPI parallel implementation of this method, the former can be implemented efficiently, but the latter is a dense matrix computation that must be done redundantly by all MPI processes. This redundant computation is in the critical path, so it may hinder performance if the size of the Hessenberg matrix is not very small, as may happen when using restart techniques. In this scenario, having GPU implementations of the methods of Section 5.1 is very interesting, as it may help to improve the overall scalability of parallel Krylov solvers.

Other authors have considered the topic of GPU calculation of sparse matrix functions, particularly the case of computing exp(tA)b with iterative methods [43, 139]. Here, we cover both the exponential and the square root for the sparse case. In this section, we discuss the specific method that we have considered for this problem, in the context of the SLEPc library.

5.2.1 Restarted Arnoldi

Krylov methods [64, Ch. 13] are appropriate for the case of large and sparse A, although they are not the only alternative (see, e.g., [25]). They approximate the result vector y = f(A)b of (5.6) by an element of the Krylov subspace

K_m(A, b) ≡ span{ b, Ab, A^2 b, ..., A^{m-1} b },    (5.18)

without explicitly building the matrix f(A). The computation comprises two parts. The first part is to build an orthonormal basis V_m of the Krylov subspace by means of the Arnoldi method. The computed quantities satisfy the relation

A V_m = V_m H_m + h_{m+1,m} v_{m+1} e_m∗,    (5.19)

where H_m is an m × m upper Hessenberg matrix. The second part is to compute the approximation of y as

y^{(0)} = β V_m f(H_m) e_1,    (5.20)

where β = ‖b‖_2, and e_1 is the first coordinate vector. In this way, the problem of computing the function of a large matrix A of order n is reduced to computing the function of a small matrix H_m of order m, with m ≪ n.

In an MPI parallel implementation of this method, the main ingredients of the first part are the parallel sparse matrix-vector product and the orthogonalization of the v_j vectors (the columns of V_m). These two operations can be implemented efficiently and in a scalable way, provided that A has an appropriate sparse pattern. The second part, the evaluation of f(H_m) as a dense matrix computation, must be done redundantly by all MPI processes. Although this computation is in the critical path, it is usually negligible whenever m, the size of the Hessenberg matrix H_m, is much smaller than the size of A. However, this is not always the case, as we now discuss.

The value of the m parameter is difficult to choose. If m is too small, the Krylov subspace will not contain enough information to build an accurate approximation. And if m is too large, the memory requirements for storing V_m (as well as the associated computational cost) will be prohibitive. In a practical implementation, a restarted variant of the method must be used, where m is prescribed to a fixed value and, when the subspace reaches this size, a restart is carried out by keeping part of the data computed so far and discarding unnecessary information. We use the Eiermann–Ernst restart [39], in which only the last basis vector v_{m+1} is kept (to continue the Arnoldi recurrence), along with the matrix H_m, which is glued together with the previous ones. More precisely, the approximation of y is improved at each restart by an additive correction. After k restarts, the approximation is updated as y^{(k)} = y^{(k-1)} + c^{(k)}, where the correction is

c^{(k)} = β V_m [0, I_m] f(H_{km}) e_1.    (5.21)


The upper Hessenberg matrix H_{km} is obtained by extending the one from previous restarts,

H_{km} = [ H_{(k-1)m}                           0         ]
         [ h_{m+1,m}^{(k-1)} e_1 e_{(k-1)m}^T   H_m^{(k)}  ],    (5.22)

where H_m^{(k)} is the matrix computed by Arnoldi in the kth restart. The stopping criterion can be based on the norm of the correction: ‖c^{(k)}‖ < β · tol. We use a value tol = 10^{-8} in the tests of Section 5.3.2. Note that the matrix of (5.22) has increasing size k · m, where k is the number of restarts. In practical applications, this size can often become as high as a few thousands. Note also that this matrix is not symmetric even if A is symmetric.
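A compact serial Python/NumPy sketch of this restart scheme is given below, with scipy.linalg.expm standing in for the dense evaluation of f. The basis size, tolerances and the lack of reorthogonalization are simplifications; the real solver is the MPI/GPU implementation in SLEPc's MFN module.

import numpy as np
from scipy.linalg import expm

def arnoldi(A, v, m):
    # m Arnoldi steps: A V_m = V_m H_m + h_{m+1,m} v_{m+1} e_m^T, cf. (5.19)
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):             # Gram-Schmidt without refinement
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

def fAb_restarted(A, b, f=expm, m=30, maxrestart=50, tol=1e-8):
    # y ~ f(A) b with Eiermann-Ernst restarts, cf. (5.21)-(5.22); real arithmetic assumed
    b = np.asarray(b, dtype=float)
    beta = np.linalg.norm(b)
    y = np.zeros_like(b)
    v = b / beta
    Hbig = np.zeros((0, 0))
    hlast = 0.0
    for k in range(maxrestart):
        V, H = arnoldi(A, v, m)
        km = Hbig.shape[0]
        Hext = np.zeros((km + m, km + m))  # extended Hessenberg matrix (5.22)
        Hext[:km, :km] = Hbig
        if km > 0:
            Hext[km, km - 1] = hlast       # coupling entry h_{m+1,m} of the previous cycle
        Hext[km:, km:] = H[:m, :m]
        Hbig = Hext
        c = beta * V[:, :m] @ f(Hbig)[km:, 0]   # correction c^(k), cf. (5.21)
        y = y + c
        if np.linalg.norm(c) < beta * tol:
            break
        hlast = H[m, m - 1]
        v = V[:, m]                        # last basis vector continues the recurrence
    return y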

5.3 Numerical experiments

The methods that we have considered in this chapter are not new, but we are interested in furnishing SLEPc with efficient implementations to solve the function of a matrix (5.5) and the action of a matrix function on a vector (5.6). In the case of dense functions, the implementations are coded as a sequence of function calls to libraries such as BLAS, relying on their high efficiency either on the CPU or on the GPU. For solving sparse functions we rely on the efficient implementation of the parallel matrix-vector multiplication in PETSc, as well as on our implementations of methods for computing dense matrix functions. Both of them are amenable to run on CPU or GPU. Our implementations can operate with either real or complex scalars, in single or double precision arithmetic, but we restrict the experiments to double precision only.

Matrix functions are not as common in applications as other linear algebra problems such as linear systems or eigenvalue problems. For the evaluation of the solvers we have used matrices from well-known collections and from some sample applications, with the aim of showing that different matrix functions appear in different contexts. Before presenting each set of results, we provide a description of the corresponding applications from which the matrices arise. Note that for the selected applications it is sufficient to compute the action of the matrix function on a vector, so sparse methods should be employed on them. Still, we also use these matrices as use cases to test the dense algorithms, since they are more representative of real situations than, e.g., random matrices.

5.3.1 Computational evaluation of dense solvers

We have analyzed the performance of the dense solver implementations by conducting several tests on the Minotauro supercomputer presented in Chapters 3 and 4. We recall the characteristics of the two types of nodes of the clusters that compose Minotauro:

Fermi: 2 Intel Xeon E5649 processors (6 cores per processor) at 2.53 GHz with 24 GB of main memory; 2 NVIDIA Tesla M2090 GPUs, with 512 cores and 6 GB GDDR per GPU.

Kepler: 2 Intel Xeon E5-2630 v3 processors (8 cores per processor) at 2.4 GHz with 128 GB of main memory; 2 NVIDIA K80 cards (2 GPUs each), with 2496 cores and 12 GB GDDR per GPU.

On both platforms, the codes are compiled with GCC 5.1.0, and linked against PETSc 3.8, SLEPc 3.8, CUDA 8.0, MAGMA 2.2.0 and MKL 11.3.2. The implementations of the algorithms are built using BLAS and LAPACK operations as main computational blocks. In general, the GPU implementations make use of cuBLAS to perform the BLAS operations, and MAGMA for the LAPACK routines. Some auxiliary kernels are used for simple, non computationally intensive operations, like setting or modifying the diagonal elements of a matrix. Running on two platforms with consecutive generations of GPUs allows us not just to see the gain obtained with their use, but also the evolution of such cards compared with their contemporary generations of CPUs.

When sparse matrices are used in the experiments of this section, they are stored and treated as dense. As PETSc does not have a type for dense matrices on the GPU, we have to manage the memory and the transfer of the data between CPU and GPU. We make use of PETSc's logging functionality to measure the time and flops rate of the solvers. All the tables in this section show the total elapsed time needed to compute the respective (dense) matrix function and the achieved performance in gigaflops per second. The time includes, in all cases, the memory allocation (and deallocation) of the auxiliary variables needed, and in the case of the GPU implementations, also the copies to and from the device memory. The reported times are the minimum of three independent executions working in double precision arithmetic.

All the experiments with dense solvers consist of a CPU execution, using as many threads as available cores (12 threads in Fermi and 16 threads in Kepler), and a GPU execution using only one of the available GPUs. The size of the matrices used in the tests is limited by the memory available in the GPU of the Fermi platform (6 GB).

Sample applications (1)

Data assimilation  Numerical weather prediction relies on modern data assimilation techniques for merging satellite observations (of order 10^5 or more) with model forecasts in order to improve the initial conditions and hence obtain more accurate results. The EnSRF method [140] is a variant of the Ensemble Kalman Filter used with deterministic observations that includes a matrix square root to account for the uncertainty of the unperturbed ensemble observations. For the ensemble mean x_m and ensemble perturbations X_A, the square-root observation filter can be written as

x_m^{(a)} = x_m^{(f)} + K (y − HX),      (5.23a)
X_A^{(a)} = X_A^{(f)} + K̃ (0 − HA),      (5.23b)

where the superscripts (a) and (f) denote the analysis and the previous forecast, respectively. Vector y contains the observations. The traditional Kalman gain is

K = C_{x,y} D^{-1},    (5.24)

with D = C_{y,y} + R, where C_{x,y} and C_{y,y} are covariance matrices and R the observation error covariance. The correction from using unperturbed observations is

K̃ = C_{x,y} D^{-1/2} (√D + √R)^{-1},    (5.25)

which simplifies to

K̃ = C_{x,y} (D + √D)^{-1}.    (5.26)

Further details can be found in [127].
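As a small dense illustration of (5.24) and (5.26), the following Python/NumPy lines build synthetic covariances and form both gains, with scipy.linalg.sqrtm standing in for the square root solvers of Section 5.1.1; all sizes and matrices are invented for the example.

import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(0)
ny, nx = 50, 80                                   # invented numbers of observations / state entries
E = rng.standard_normal((ny, ny))
Cyy = E @ E.T / ny                                # synthetic observation covariance C_{y,y}
Cxy = rng.standard_normal((nx, ny))               # synthetic cross covariance C_{x,y}
R = np.eye(ny)                                    # observation error covariance
D = Cyy + R
K = Cxy @ np.linalg.inv(D)                        # traditional Kalman gain (5.24)
Ktilde = Cxy @ np.linalg.inv(D + sqrtm(D).real)   # square-root correction (5.26)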

Stability of dynamical systems  In control theory and many other contexts, it is important to determine if a dynamical system is stable or not, and this question can often be formulated as a problem involving eigenvalues of matrices. In this context, a matrix A is considered to be a stable matrix if all its eigenvalues lie in the open left half plane, Re[λ_i] < 0 for all i. As suggested in [64, §2.5], the matrix sign function can be used to count the number of eigenvalues of a matrix located in a particular region of the complex plane, and determine if a system is stable or unstable. The matrix sign function is related to the matrix square root, as discussed in Section 5.1.2.

Dense square root experiments

Two matrices are considered for the matrix square root experiments. The first one is ensrf7864, a symmetric positive-definite matrix from the data assimilation application, with dimension 7864, for which the square root is computed. The second one is rdb5000, a sparse non-symmetric matrix belonging to the NEP collection [11], with dimension 5000, which is used to compute the matrix inverse square root, as part of the computation of the matrix sign function for determining the stability of a dynamical system.

Apart from the performance data, the tables for the square root tests (Tables 5.1 and 5.2) also show the relative error of the computed solution, ‖F^2 − A‖_F/‖A‖_F for the square root and ‖F^2 A − I‖_F/‖A‖_F for the inverse square root. According to Higham [64, §6.1], the best relative error we can expect for the numerical computation of the matrix square root is of order α(F)u, where u is the unit round-off and α(F) = ‖F‖^2/‖A‖. We have checked that in the considered matrices this quantity is of order 10^1, and hence the maximum expected accuracy in double precision would be around 10^{-15}. The tables also show the number of required iterations in the case of iterative methods. Figure 5.1 summarizes the square root and inverse square root executions, for which the fastest method is Denman–Beavers running on GPU.

Table 5.1 shows the results for the square root. Given that the matrix is symmetric, f(A) can be computed as Q diag(f(λ_i)) Q∗.


Table 5.1. Results for computing the matrix square root of the ensrf7864 matrix. Time expressed in seconds. GF/s indicates gigaflops per second, Iter indicates iterations done, and Error is computed as ‖F^2 − A‖_F/‖A‖_F.

                                      CPU                                    GPU
Platform  Algorithm         Time   GF/s  Iter.  Error         Time   GF/s  Iter.  Error
Fermi     Diagonalization  196.9     27    -    1.5·10^-14       -      -    -     -
          Denman–Beavers   208.3     93   10    4.2·10^-14    56.6    344   10    9.3·10^-14
          Newton–Schulz    489.4    101   17    7.4·10^-15   127.0    391   17    3.0·10^-14
          Sadeghi          346.5    101    6    3.9·10^-15    93.7    374    6    2.4·10^-14
Kepler    Diagonalization   37.3    143    -    1.2·10^-14       -      -    -     -
          Denman–Beavers    74.4    262   10    4.4·10^-14    25.8    754   10    9.1·10^-14
          Newton–Schulz    135.0    367   17    6.5·10^-15    49.6   1001   17    3.0·10^-14
          Sadeghi           88.2    397    6    3.9·10^-15    38.6    907    6    2.4·10^-14

500 400

400 300

300 200

Seconds 200

100 100

0 0 Fermi Kepler Fermi Kepler Diagonalization CPU Denman–Beavers CPU Newton–Schulz CPU Sadeghi CPU Schur CPU Denman–Beavers GPU Newton–Schulz GPU Sadeghi GPU

Figure 5.1. Time results for computing the matrix square root (left) and the matrix inverse square root (right).


Table 5.2. Results for computing the matrix inverse square root of the rdb5000 matrix. Time expressed in seconds. GF/s indicates gigaflops per second, Iter indicates iterations done, and Error is computed as ‖F^2 A − I‖_F/‖A‖_F.

                                      CPU                                    GPU
Platform  Algorithm         Time    GF/s   Iter.  Error        Time   GF/s  Iter.  Error
Fermi     Schur            271.1      26     -    2.3·10^-12      -      -    -     -
          Denman–Beavers   141.9      85    12    6.6·10^-16   43.0    279   12    1.6·10^-15
          Sadeghi          393.8     101    13    6.9·10^-13  119.1    333   13    1.6·10^-12
Kepler    Schur            210.7    33.6     -    2.0·10^-12      -      -    -     -
          Denman–Beavers    49.8   241.0    12    5.9·10^-16   21.5    558   12    1.6·10^-15
          Sadeghi          108.6   365.3    13    7.1·10^-13   48.8    814   13    1.6·10^-12

This diagonalization method is the fastest one for the CPU runs, and the ranking between methods is the same on both platforms, but with notable differences with respect to their relative execution times. The relative behaviour of the methods does not differ substantially between CPU and GPU executions. On the Fermi platform, the GPU runs obtain speedups ranging from 3.6 to 3.8 with respect to the CPU ones, and a speedup of 3.5 is obtained if comparing the fastest method on GPU (Denman–Beavers) with the fastest one on CPU (diagonalization). On the Kepler platform, the speedups achieved with the GPU runs are smaller, going from 2.3 to 2.9, and the comparison between the fastest methods on CPU and GPU gives a speedup of only 1.4.

Although Newton–Schulz is the method that attains the highest Gflop/s rate in almost all the cases, due to the matrix-matrix operations, it is not competitive in terms of execution time because of the larger number of required iterations. Neither is Sadeghi, despite its cubic convergence that allows it to terminate in only six iterations.

Table 5.2 shows the results of the inverse square root with the non-symmetric matrix. The implementation of the blocked Schur method employed here uses a block size of 64. The Newton–Schulz method does not converge with this matrix, so no results are shown for it. The poor error obtained with Schur and Sadeghi comes from the final linear solve to obtain the inverse, not from computing the matrix square root. This additional step also increases the time needed to obtain the inverse square root. Denman–Beavers is the fastest method on CPU and GPU on both platforms, achieving speedups of 3.3 on Fermi and 2.3 on Kepler.

Sample applications (2)

Time evolution in quantum problems  The time-dependent Schrödinger equation

i ∂Ψ(t)/∂t = H(t) Ψ(t)    (5.27)


can be solved numerically with a time-stepping scheme, where at time t_n

Ψ(t_n) = e^{-iH(t_n)Δt} Ψ(t_{n-1}),    (5.28)

with Δt = t_n − t_{n-1}. See [91] for an example that uses the matrix exponential in SLEPc for the simulation of quantum systems. Here we will use a very simple problem where the Hamiltonian H is constant in all time steps, consisting of N spins in a uniform transverse field, with disordered potential. The dimension of the Hamiltonian matrix is 2^N. We emphasize that complex arithmetic is required to compute exp(−iHΔt), even if H is real.
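For illustration only, one step of (5.28) with a constant (made-up) sparse Hamiltonian can be written with SciPy's exponential-times-vector routine; the experiments in this chapter use the restarted Arnoldi solver of Section 5.2.1 instead.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import expm_multiply

rng = np.random.default_rng(0)
N = 10                                            # spins; Hamiltonian dimension 2^N = 1024
H = sp.random(2**N, 2**N, density=1e-3, random_state=rng, format='csr')
H = (H + H.T) / 2                                 # made-up symmetric Hamiltonian
dt = 0.1
psi = rng.standard_normal(2**N) + 0j              # state vector; complex arithmetic is mandatory
psi = expm_multiply(-1j * dt * H.astype(complex), psi)   # psi(t_n) = exp(-i H dt) psi(t_{n-1})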

Migration modeling  In recent years, population genomic datasets have been collected, making it possible to use these data to infer demographic histories of populations, e.g. under models of migration and divergence. The method presented in [75] relies on a Markov chain representation, and uses the matrix exponential to obtain probability distributions at different times. More precisely, if M is the transition matrix, the vector π(t) of probabilities of being in each state of the Markov chain at time t is

π(t) = π(0) e^{tM}.    (5.29)

Dense exponential experiments

Four matrices are considered for the matrix exponential tests. The first two belong to the Harwell-Boeing collection [38] and are commonly used as benchmarks: orani678, a nonsymmetric sparse matrix of order 2529, and bcspwr10, a symmetric sparse matrix of order 5300. The other two are te12, a complex matrix from the time evolution application, with dimension 4096, and imclam55, a sparse nonsymmetric matrix from the migration modelling application, of dimension 6770.

Tables 5.3 and 5.4, and Figures 5.2 and 5.3, show the results for each of the three methods implemented to compute the matrix exponential function. Tables 5.3 and 5.4 also show the absolute error of the computed solution, ‖F − F_m‖_F, taking as a reference the computation done in Matlab, F_m, which uses the algorithm described in [3]. When using Güttel–Nakatsukasa, computing the eigenvalues of A to shift the matrix adds too much overhead to the method. For the executions, we have disabled that computation and eliminated the shift. Also, since most of the operations in this method are performed in complex arithmetic in the case of real matrices, the flop count discerns between real and complex operations.

The results of the executions on the Fermi platform are displayed in Table 5.3 and Figure 5.2. They show that Higham's method (Padé up to degree 13) is faster than basic Padé and Güttel–Nakatsukasa, but it is slightly less efficient in terms of computational intensity. The high number of Gflop/s achieved by Güttel–Nakatsukasa with real matrices comes from unconditionally using complex arithmetic. The speedup obtained with the GPU on this platform is smaller than in the square root case, ranging from 1.3 to 3.0 with the fastest method.


Table 5.3. Results for the matrix exponential running on the Fermi platform. Time expressed in seconds. GF/s indicates gigaflops per second.

                                          CPU                            GPU
Matrix    Algorithm              Time   GF/s  ‖F−F_m‖_F        Time   GF/s  ‖F−F_m‖_F
orani678  Higham                  3.1     67  4.1·10^-12        2.5     83  4.1·10^-12
          Padé                    5.0     80  4.1·10^-12        2.3    219  6.6·10^-12
          Güttel–Nakatsukasa†    15.5     97  3.9·10^-12        5.7    260  4.7·10^-12
bcspwr10  Higham                 27.2     80  2.7·10^-12       11.7    187  3.7·10^-12
          Padé                   49.9     74  4.7·10^-12       16.5    295  1.8·10^-10
          Güttel–Nakatsukasa†   129.8    104  8.4·10^-9        34.6    390  8.4·10^-9
te12      Higham                 55.3     83  1.2·10^-13       18.5    248  3.9·10^-13
          Padé                   71.9     94  4.3·10^-13       22.5    399  1.1·10^-10
          Güttel–Nakatsukasa†    77.6    102  3.3·10^-9        22.4    359  3.3·10^-9
imclam55  Higham                 76.8     92  5.0·10^-14       26.9    262  5.1·10^-14
          Padé                  105.8     96  1.6·10^-13       36.0    351  1.7·10^-12
          Güttel–Nakatsukasa†   264.6    106  1.6·10^-12       72.3    401  1.4·10^-12

†Without computing the eigenvalues and shifting the matrix

Higham's method always attains the smallest error in all the executions, because Matlab's reference solution implements the same algorithm.

Table 5.4 and Figure 5.3 contain the results for the tests on the Kepler platform. The results corresponding to the CPU runs maintain the same ranking as on Fermi, but the increase in computational intensity of Padé is remarkable. It is noticeable how the time ratio between Higham and Padé is reduced on this newer platform. Güttel–Nakatsukasa returns the poorest results even when working with complex arithmetic, where it can be competitive.

The GPU executions obtain the fastest times, but the speedups with respect to the CPU are smaller on this platform. The best speedup of 2.1 is obtained with Padé when working with the largest matrix. Contrary to the CPU results, Padé is always faster than Higham on GPU, benefiting from a higher computational intensity. The higher number of Gflop/s obtained by Padé comes from the smaller relative weight of the gesv routine with respect to the higher number of matrix-matrix multiplications.

5.3.2 Computational evaluation of sparse solvers

The sparse tests were conducted on two clusters of the Minotauro supercomputer, composed of nodes with the same characteristics as the platforms of Section 5.3.1, interconnected through an Infiniband network.



Figure 5.2. Time results for computing the matrix exponential on Fermi.

Table 5.4. Results for the matrix exponential running on the Kepler platform. Time expressed in seconds. GF/s indicates gigaflops per second.

                                          CPU                            GPU
Matrix    Algorithm              Time   GF/s  ‖F−F_m‖_F        Time   GF/s  ‖F−F_m‖_F
orani678  Higham                  1.7    122  4.1·10^-12        2.3     90  4.1·10^-12
          Padé                    1.8    225  4.1·10^-12        1.5    324  6.6·10^-12
          Güttel–Nakatsukasa†     4.9    310  3.9·10^-12        2.9    505  4.7·10^-12
bcspwr10  Higham                 10.3    212  2.6·10^-12        7.8    280  4.1·10^-12
          Padé                   10.9    337  4.9·10^-12        6.3    773  1.8·10^-10
          Güttel–Nakatsukasa†    30.1    449  8.4·10^-9        17.9    753  8.4·10^-9
te12      Higham                 15.3    300  1.4·10^-13       11.9    386  3.8·10^-13
          Padé                   16.0    423  3.4·10^-13       11.0    818  1.1·10^-10
          Güttel–Nakatsukasa†    19.2    410  3.3·10^-9        13.7    588  3.3·10^-9
imclam55  Higham                 22.9    307  4.3·10^-14       13.6    518  5.1·10^-14
          Padé                   26.1    388  1.3·10^-13       12.5   1012  1.7·10^-12
          Güttel–Nakatsukasa†    59.2    475  1.6·10^-12       31.0    934  1.4·10^-12

†Without computing the eigenvalues and shifting the matrix



Figure 5.3. Time results for computing the matrix exponential on Kepler.

In the experiments of this section, all the executions launch a single process per node, using 12 (Fermi) or 16 (Kepler) threads per process on the CPU runs, and a single GPU card per process on the GPU runs. The reported times are the minimum of three independent executions working in double precision arithmetic. The performance of the different steps of the computation may vary a lot depending on the problem solved and the parameters used. Here we break down the total computation time into the main computational units to analyze their behaviour. These units are: the matrix-vector product used to expand the basis of the Krylov subspace, referred to as MatVec; the orthogonalization and normalization of the vectors, referred to as Orthog; and the computation of the projected dense problem, f(H_km). In this section, we present performance results for the Krylov solver using two test cases, with the exponential and square root functions respectively.

Sparse exponential experiments

Advection diffusion equation  The application employed in the tests of the exponential function, also used in [26], corresponds to the discretization of the advection diffusion equation

∂_t u = ε Δu + c ∇u,    (5.30)

on the domain Ω = [0, 1]^2 with homogeneous Dirichlet boundary conditions. A standard 5-point finite-difference discretization with grid size h = 1/(N+1) in both spatial directions results in a sparse matrix of order n = N^2 with at most five nonzero elements per row. The Péclet number is defined as the advection diffusion ratio, scaled by h, Pe = ch/(2ε), which allows controlling the normality of the matrix. In this case, we are interested in computing u_k = exp(Δt A) u_{k-1} for a few values of k using a constant time step, Δt. The computation can be done with the restarted Arnoldi method described in Section 5.2.1, together with the algorithms of Section 5.1.3 for the explicit evaluation of exp(H_km) at each restart.
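A possible SciPy construction of this test matrix is sketched below using Kronecker products of 1D difference operators; the sign conventions and the way the scalar c acts on both spatial directions are assumptions made for the illustration, and may differ in details from the generator used in the actual experiments.

import numpy as np
import scipy.sparse as sp

def advection_diffusion(N, eps=1.0, Pe=0.5):
    # 5-point discretization of eps*Laplacian(u) + c*grad(u) on [0,1]^2, Dirichlet BCs
    h = 1.0 / (N + 1)
    c = 2.0 * eps * Pe / h                         # from Pe = c*h/(2*eps)
    e = np.ones(N)
    L = sp.diags([e[:-1], -2 * e, e[:-1]], [-1, 0, 1]) / h**2   # 1D Laplacian
    G = sp.diags([-e[:-1], e[:-1]], [-1, 1]) / (2 * h)          # 1D centered first derivative
    I = sp.identity(N)
    A = eps * (sp.kron(I, L) + sp.kron(L, I)) + c * (sp.kron(I, G) + sp.kron(G, I))
    return sp.csr_matrix(A)

A = advection_diffusion(100)                       # order n = N^2 = 10000, at most 5 nonzeros per row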


Table 5.5. Results for the advection diffusion problem. Time expressed in seconds. The MatVec and Orthog columns show the time needed in the matrix-vector product, and in the orthogonalization and normalization of the basis vectors, respectively.

                                          CPU Time                       GPU Time
Platform  Processes  Algorithm   MatVec  Orthog  exp(H_km)      MatVec  Orthog  exp(H_km)
Fermi         1      Higham       136.9   554.9       61.8        10.1   160.4       11.0
              1      Padé         135.0   557.6       65.2        10.1   160.5        6.9
             16      Higham        23.9    78.9       63.5         9.1    24.6       12.2
             16      Padé          19.3    71.9       67.1         5.2    15.0        7.5
Kepler        1      Higham       103.7   473.3        6.1         8.1    78.0        4.4
              1      Padé         112.8   473.1        6.4         8.1    78.1        3.5
             16      Higham         8.7    19.4        7.5         4.4    10.5        4.9
             16      Padé           8.1    19.1        7.6         3.9     9.3        4.4

Figure 5.4 and Table 5.5 show the results when computing u_k = exp(Δt A) u_{k-1} for five repetitions with Δt = 10^{-4}. The parameters that we have used to generate the advection diffusion discretization matrix are Pe = 0.5, ε = 1, and N = 1735 (hence the size of A is roughly 3 million). The initial vector used is the one given by the discretization of the initial state u_0(x, y) = 256 · x^2 (1 − x)^2 y^2 (1 − y)^2. The Arnoldi restart parameter used in this case is m = 30, with which the solver needed 23 restarts on each time step. In Section 5.3.1 we saw that on the Kepler platform none of the algorithms to compute the dense exponential was faster on both CPU and GPU, so for this problem we present results using Higham (faster on CPU) and Padé (faster on GPU).

On Fermi, MatVec, Orthog and exp(H_km) achieve good speedups of 13.4, 3.5 and 9.5 respectively, with the single process execution on the GPU (compared to the CPU). The overall speedup obtained with a single process reaches only 4.3, as the dominant time-consuming operation is the orthogonalization, which attains the smallest speedup. As the dense solver is not executed in parallel, its runtime should not vary in the multi-process execution, although in actual runs there can indeed be some variation due to idle times produced by process synchronization. In the multi-process execution, the speedup of MatVec is reduced to 3.8 and the speedup of Orthog improves up to 4.8. These speedups, together with the reduced relative weight of the orthogonalization when executing with multiple processes, increase the maximum speedup obtained with the GPU for the whole computation up to 5.8 when using 16 processes.

On Kepler, the single process execution on GPU obtains speedups of 13.9 and 6.1 for MatVec and Orthog, with respect to the CPU runs. The dense exponential performance is reduced to a speedup of 1.8 when running on GPU. The speedup of the whole computation for GPU runs compared with the CPU ones reaches 6.6 with a single process, and a more limited 2.1 in the multi-process execution.



Figure 5.4. Time results for the advection diffusion problem.

Sparse square root experiments

Ensemble Square-Root Kalman Filter  The application used in the tests of the square root function is the data assimilation application of Section 5.3.1. Note that in Section 5.1 we used the same application, but a different matrix, to analyze the square root function; now we deal with the complete function with a matrix of dimension 17733 and a sparsity of 40%, much denser than in the advection diffusion case. In this application, we are interested in solving linear systems of the form (D + √D)x = b, which is equivalent to computing x = f(D)b, with

f(D) = (D + √D)^{-1}.    (5.31)

SLEPc allows certain flexibility in the definition of functions, by combining two simpler functions. In our case, we define f(·) as the reciprocal of another function, which in turn is defined as the sum of two functions (the identity and the square root). All these sub-functions can be evaluated easily, except for the matrix square root, which needs the algorithms of Section 5.1.1. These evaluations will operate on the Hessenberg matrix in every restart of the Arnoldi method.

Figure 5.5 and Table 5.6 show the results obtained for computing f(D)b for 30 different right-hand sides b with the EnSRF matrix function of (5.31), and using Denman–Beavers (5.10) as the method to compute √H_km. In this experiment we have used an Arnoldi restart parameter of m = 150, with which the solver required only three restarts in all cases.


Table 5.6. Results for the EnSRF matrix function using Denman–Beavers to compute √H_km. Time expressed in seconds. The MatVec and Orthog columns show the time needed in the matrix-vector product, and in the orthogonalization and normalization of the basis vectors, respectively.

                              CPU Time                        GPU Time
Platform  Processes   MatVec  Orthog  √H_km       MatVec  Orthog  √H_km
Fermi         1       3459.8    67.6   49.0        212.8    25.0   43.2
             16        269.1   124.4   55.0         29.2    18.1   43.6
Kepler        1       2453.2    21.5   15.4        135.2    13.0   15.3
             16        192.1    58.9   28.8         20.7    14.4   19.0

On Fermi, the single process execution running on GPU achieves speedups of 16.2 and 2.7 for MatVec and Orthog, with respect to the CPU, while the dense function obtains almost the same time on CPU and GPU. The overall speedup of 12.7 obtained when running on GPU stems from the MatVec operation. The multi-process execution reduces the speedup of MatVec to 9.2, and the Orthog speedup increases to 6.8, reducing the total speedup down to 4.8 for the GPU with respect to the CPU run.

The runs on GPU on the Kepler platform, with a single process, obtain a better speedup in MatVec, the dominant operation, reaching 18.1. This improvement of MatVec helps to accelerate the whole computation more than in the case of the Fermi platform, with a speedup of 15.2 when running a single process on GPU, with respect to CPU. Using several processes on Kepler also improves the gain obtained on Fermi, reaching a speedup of 5 on the whole computation with respect to the CPU runs.

Addendum  Here we would like to take the opportunity to comment on the improvement in the time and memory usage of the data assimilation application with the matrix function approach. In a previous work, Steward et al. [126] proposed a parallel method to compute the Ensemble Square-Root Kalman Filter (EnSRF) equations by computing a matrix function via eigenpairs. Unlike other methods that assimilate the observations in a given window sequentially, the proposed solution does not depend on the order of the observations, allowing the assimilation of all the observations in the window independently at the same time. With our work [127] we have enabled the optimization of this methodology by switching from computing the inverse and square root portion of the EnSRF equations to computing the action of a matrix function on a vector. Matrix-vector products are then used to build a Krylov subspace and the matrix function is applied to a projected problem as described in Section 5.2.

We briefly present the results of a set of executions carried out on the xjet supercomputer. Xjet is composed of 812 nodes interconnected with an FDR Infiniband network. Each node has two Intel Haswell processors (12 cores per processor) running at 2.3 GHz and 64 GB of main memory.



Figure 5.5. Time results for the EnSRF matrix function using Denman–Beavers to compute √H_km.


Figure 5.6. Total computation time for the EPS and MFN methods as a function of the matrix size with 386 processes, in double precision arithmetic.



Figure 5.7. Memory usage for the EPS and MFN methods as a function of the matrix size with 386 processes, in double precision arithmetic.

The modifications were incorporated into the HEDAS application [2] and tested with 384 processes, running on the same number of computational cores. The matrix function approach provides a numerical solution analogous to the eigenproblem based method, which has proven error bounds. We show in Figure 5.6 the execution time as a function of the matrix size (number of observations to assimilate) for both methods of solving the EnSRF equations: EPS, which computes the eigenvalues of the forward observation covariance matrix, and MFN, which performs the computation of the action of a matrix function on a vector with Krylov methods. Times above 2500 seconds are not shown. The EPS solution scales approximately as n^3, while MFN seems to scale linearly. We remark that the time of the computation is directly related to the sparsity of the matrix, and these matrices are relatively dense.

Figure 5.7 shows the total memory footprint for the same executions. The eigenproblem method yields the same cubic increase with respect to the matrix size, and the matrix function increase is more than linear. Again, the memory scaling is directly related to the sparsity of the matrix, where a sparse matrix should provide linear scaling and a dense one quadratic scaling.

5.4 Conclusions

In this chapter we have studied the suitability of several dense methods (f(A)) to compute the matrix square root, the inverse square root, and the matrix exponential functions on GPU platforms. The simplicity of iterative methods allows them to be implemented on GPU with common linear algebra operations and enables the acceleration of their computation with respect to a CPU implementation. In our tests, we have been able to obtain a gain factor of up to 3.8 when comparing GPU implementations against multi-threaded CPU implementations, for matrices with dimensions of a few thousands. We have also observed that the increasing number of available cores on newer CPUs makes them closer in efficiency to their contemporary GPUs.

The obtained acceleration can be useful to reduce the cost present in the critical path of parallel Krylov methods for the computation of the action of a matrix function on a vector (f(A)b), when executed on clusters of GPUs, provided that the size of the Hessenberg matrix is not too small, that is, that the computation associated with this matrix takes a non-negligible percentage of the overall computation time. The GPU implementations of the dense methods will also be helpful for explicitly computing f(A) in some applications.

We have also analyzed the overall performance on GPU platforms of the restarted Arnoldi method for matrix functions, obtaining speedups of 5.8 in a multi-process execution. We have seen how the computation is divided into three main steps that have more or less relevance depending on the characteristics of the initial matrix. A parallel high performance implementation of the matrix-vector product is essential to provide fast results. The most time-consuming operation is the orthogonalization of the vectors of the Krylov basis. Each restart of Arnoldi increases the size of the dense problem, and it may entail a considerable cost with many restarts. We have shown how all three operations can be done on GPU, and the improvement that this provides.

Finally, we have taken the opportunity to present the results of implementing our methods in an application, and to compare the matrix function solution with an eigenvalue oriented method. The matrix function solution scales better as a function of the matrix size, both in time and in memory usage, while achieving similar numerical results and maintaining the feature of assimilating the observations within the assimilation window in parallel.


Chapter 6

Conclusions

No lusco e fusco

With this chapter we conclude the exposition of these years of work. As the culmination of our activity, we summarize here the most relevant conclusions of the targets fulfilled in this thesis.

Our work has focused on creating multi-process parallel implementations of large sparse eigenvalue solvers and functions of matrices that make use of graphics processing units to accelerate the computations. Each of our implementations has subsequently been used in scientific applications to demonstrate the improvement achieved and its applicability.

In Chapter 3 we introduced the eigenvalue problem and the most common methods for solving it. We also presented the SLEPc and PETSc libraries and offered a description of how they can make use of GPUs in a multi-process parallel implementation that combines MPI and CUDA. We described the problems encountered with the initial approach to GPU computing in PETSc, when we found some specific missing capabilities in the GPU support based on CUSP that held back our developments. The constraints originated in some decisions in the design of Thrust, the C++ library on top of which CUSP is developed. This situation made us work to extend PETSc functionality with a more complete support for GPU operations, basing this support on the CUDA runtime, cuBLAS, and cuSPARSE.

Nevertheless, with the initial version of the GPU support based on CUSP, we were able to obtain very rewarding results with the ParIso application. The speedup of the eigenvalue solve when running on the GPU in a transparent way, in addition to the speedup of the generation of the matrix with our implementation of a specific kernel, resulted in a good scalability. The multi-process trials showed that the GPU version is far superior to the previous one.

The implementations and tests performed in Chapter 4 were the ones that required most of our time, out of all the work that we have done. In this chapter we worked on obtaining a few eigenpairs of block-tridiagonal and block cyclic tridiagonal matrices by means of Krylov methods. We have considered the blocks of the matrices as dense blocks in order to use BLAS routines in the computations. The matrix storage format used has helped us to simplify the implementations, and also allowed the improvement of the performance by using coalesced access to the GPU memory by the CUDA threads. In the case of computing eigenvalues on the exterior of the spectrum, we experimented with BLAS based solutions and eventually ended up implementing a better performing kernel to do the matrix product. In the case of computing interior eigenvalues by means of the shift-and-invert technique, we implemented and compared several algorithms for solving linear systems of equations with such structure. Our GPU implementations have proven to be very competitive when compared with CPU implementations for a large enough matrix block size. With small block sizes, it is difficult to recover from the overhead of launching a kernel to the GPU. In both cases, exterior and interior eigenvalues, our software can make use of several GPUs with an MPI-GPU approach, scaling up to very large problem sizes. Finally, we compared our multi-process solution for solving linear systems of equations with a state-of-the-art library, obtaining better scalability.

In Chapter 5 we have implemented different matrix function solvers in the SLEPc library, extending the usability of the FN and MFN modules. The methods have been validated with a collection of test cases coming from scientific applications, and have proven to provide robust numerical results and achieve a good performance. They are included in the public release of SLEPc, and are already being used in different scientific computing applications.

Throughout these years of work the SLEPc library has experienced changes and improvements. The main contributions of this thesis have been the extension of the computing capabilities of the library by means of the use of graphics processing units as accelerators, and the inclusion of new algorithms that allow performing operations that were not available before.

Publications and research projects

Publications

Together with the presented developments, the contents of this thesis have been published in several scientific journals and presented at international conferences. In particular, Chapters 3, 4 and 5 gave rise to the following articles and proceedings:

[80] A. Lamas Daviña, E. Ramos, and J. E. Roman. Optimized analysis of isotropic high-nuclearity spin clusters with GPU acceleration. Comput. Phys. Commun., 209:70–78, 2016.

[81] A. Lamas Daviña and J. E. Roman. GPU implementation of Krylov solvers for block-tridiagonal eigenvalue problems. In R. Wyrzykowski et al., editors, Parallel Processing and Applied Mathematics–PPAM 2015, Part I, volume 9573 of Lect. Notes Comp. Sci., pages 182–191. Springer, 2016.

[82] A. Lamas Daviña and J. E. Roman. Parallel MPI-GPU linear system solvers specific for block-tridiagonal matrices. In Libro de comunicaciones definitivas presentadas en CEDYA+CMA2017, pages 674–681. UP4Sciences, 2017.

[83] A. Lamas Daviña and J. E. Roman. MPI-CUDA parallel linear solvers for block-tridiagonal matrices in the context of SLEPc's eigensolvers. Parallel Comput., 74:118–135, 2018.

[79] A. Lamas Daviña, X. Cartoixà, and J. E. Roman. Scalable block-tridiagonal eigensolvers in the context of electronic structure calculations. In S. Bassini, M. Danelutto, P. Dazzi, G. Joubert, and F. Peters, editors, Parallel Computing is Everywhere, volume 32 of Advances in Parallel Computing, pages 117–126. IOS Press, 2018.

[127] J. Steward, J. E. Roman, A. Lamas Daviña, and A. Aksoy. Parallel direct solution of the covariance-localized ensemble square-root Kalman filter equations with matrix functions. Mon. Weather Rev., 146(9):2819–2836, 2018.

Research projects

This thesis has been developed under the FPU program (FPU2013-06655) granted by the Spanish Ministry of Education, Culture and Sport, and in the framework of the following research projects:

Extending the SLEPc library for matrix polynomials, matrix functions and matrix equations in emerging computing platforms (TIN2013-41049-P) (2014-2016), granted by the Spanish Ministry of Economy and Competitiveness.

Highly scalable eigensolvers in the context of the SLEPc library (TIN2016-75985-P) (2017-2019), granted by the Spanish National Research Agency.

Open research lines

Many things remain to be done for each of the developments made. Some of them are just small ideas, and others are complex tasks that would require time we did not have. Here we name some possible future courses of action that we consider feasible and that would improve the software capabilities.

The ParIso software of Chapter 3 may be extended in such a way that not only the first and last eigenvalue of each block is computed in the first phase, but also a rough approximation of how all eigenvalues are distributed within that range. This information, usually known as density of states (DOS), is difficult to obtain, but recent efforts are trying to do this cost-effectively [90].

When we solve the block (cyclic) tridiagonal matrices of Chapter 4, we treat the blocks of the matrix as dense; therefore, an evident future work is to take advantage of their sparsity. This can be done by using any of the sparse matrix storage formats available on cuBLAS and MAGMA. Another attractive possibility is allocating the blocks directly as PETSc matrix types, as it would allow a transparent management of the memory transfers between CPU and GPU, and the operations with them could be easily migrated. Another possible research direction would be to expand the software to work with tridiagonal blocks that appear in some fast Poisson solvers.

With respect to the implementation of matrix functions of Chapter 5, we will continue adding more functionality to the matrix function solvers in SLEPc, as it is a very interesting field with an active development of new algorithms. In particular, for the dense case, we could study the inclusion of trigonometric functions like the sine and cosine, which appear in the solution of second order differential equations. For the sparse case, alternative restart schemes [40, 46] have been proposed recently that may be more effective than the Eiermann–Ernst scheme that we currently use.

Regarding a lower-level aspect of the implementations, the lack of a dense matrix type that works on the GPU (a possible MatCUDADense type) in PETSc could be addressed. This feature would benefit both SLEPc and user codes by enabling the transparent execution of dense operations on the GPU. Also, the inclusion of MAGMA solvers in PETSc for performing operations on the GPU would certainly accelerate LAPACK operations. In more general terms, the use of other accelerators could be studied.

Bibliography

[1] M. Ahues, F. D. d’Almeida, A. Largillier, O. Titaud, and P. Vasconcelos. An L1 refined projection approximate solution of the radiation transfer equation in stellar atmospheres. J. Comput. Appl. Math., 140:13–26, 2002. (Cited on pages 72 and 98.)

[2] A. Aksoy, S. Lorsolo, T. Vukicevic, K. J. Sellwood, S. D. Aberson, and F. Zhang. The HWRF Hurricane Ensemble Data Assimilation System (HEDAS) for High-Resolution Data: The Impact of Airborne Doppler Radar Observations in an OSSE. Mon. Weather Rev., 140(6):1843–1862, 2012. (Cited on page 132.)

[3] A. H. Al-Mohy and N. J. Higham. A new scaling and squaring algorithm for the matrix exponential. SIAM J. Matrix Anal. Appl., 31(3):970–989, 2010. (Cited on pages 119 and 126.)

[4] G. M. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the April 18-20, 1967, Spring Joint Computer Conference, AFIPS ’67 (Spring), pages 483–485. ACM, 1967. (Cited on page 34.)

[5] P. R. Amestoy, I. S. Duff, and J.-Y. L'Excellent. Multifrontal parallel distributed symmetric and unsymmetric solvers. Comput. Methods Appl. Mech. Eng., 184(2–4):501–520, 2000. (Cited on page 48.)

[6] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users’ Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, third edition, 1999. (Cited on page 19.)

[7] C. Andrew, C. F. F., L. Rainald, and W. John. Running unstructured grid-based CFD solvers on modern graphics hardware. Int. J. Numer. Methods Flu., 66(2):221–229, 2011. (Cited on page 3.)

[8] H. Anzt, S. Tomov, and J. Dongarra. Implementing a sparse matrix vector product for the SELL-C/SELL-C-σ formats on NVIDIA GPUs. Technical Report ut-eecs-14-727, University of Tennessee, March 2014. (Cited on page 32.)


[9] W. E. Arnoldi. The principle of minimized iterations in the solution of the matrix eigenvalue problem. Quart. Appl. Math., 9:17–29, 1951. (Cited on page 41.)

[10] B. Baghapour, V. Esfahanian, M. Torabzadeh, and H. M. Darian. A discontinuous Galerkin method with block cyclic reduction solver for simulating compressible flows on GPUs. Int. J. Comput. Math., 92(1):110–131, 2015. (Cited on page 86.)

[11] Z. Bai, D. Day, J. Demmel, and J. Dongarra. A test matrix collection for non-Hermitian eigenvalue problems (release 1.0). Available at https://sparse.tamu.edu, 1996. (Cited on page 123.)

[12] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst, editors. Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2000. (Cited on pages 2, 37, and 44.)

[13] Z. Bai and G. W. Stewart. Algorithm 776: SRRIT: A FORTRAN subroutine to calculate the dominant invariant subspace of a nonsymmetric matrix. ACM Trans. Math. Software, 23(4):494–513, 1997. (Cited on page 40.)

[14] C. G. Baker, U. L. Hetmaniuk, R. B. Lehoucq, and H. K. Thornquist. Anasazi software for the numerical solution of large-scale eigenvalue problems. ACM Trans. Math. Software, 36(3):13:1–13:23, 2009. (Cited on page 45.)

[15] S. Balay, S. Abhyankar, M. Adams, J. Brown, P. Brune, K. Buschelman, L. Dalcin, V. Eijkhout, W. Gropp, D. Karpeyev, D. Kaushik, M. Knepley, D. May, L. C. McInnes, R. Mills, T. Munson, K. Rupp, P. Sanan, B. Smith, S. Zampini, H. Zhang, and H. Zhang. PETSc users manual. Technical Report ANL-95/11 - Revision 3.9, Argonne National Laboratory, 2018. (Cited on pages 46 and 64.)

[16] M. M. Baskaran and R. Bordawekar. Optimizing sparse matrix-vector multiplication on GPUs using compile-time and run-time strategies, 2008. (Cited on page 3.)

[17] N. Bell and M. Garland. Implementing sparse matrix-vector multiplication on throughput-oriented processors. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC ’09, pages 18:1–18:11. ACM, 2009. (Cited on page 32.)

[18] S. Bell, B. Edwards, J. Amann, R. Conlin, K. Joyce, V. Leung, J. MacKay, M. Reif, L. Bao, J. Brown, M. Mattina, C. C. Miao, C. Ramey, D. Wentzlaff, W. Anderson, E. Berger, N. Fairbanks, D. Khan, F. Montenegro, J. Stickney, and J. Zook. TILE64 - processor: A 64-core SoC with mesh interconnect. In 2008 IEEE International Solid-State Circuits Conference - Digest of Technical Papers, pages 88–598, 2008. (Cited on page 23.)

[19] L. S. Blackford, J. Choi, A. Cleary, E. D’Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley. ScaLAPACK Users’ Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, 1997. (Cited on pages 19 and 45.)

[20] S. Blaser, M. Rochat, L. Ajili, M. Beck, J. Faist, H. Beere, G. Davies, E. Linfield, and D. Ritchie. Terahertz interminiband emission and magneto-transport measurements from a quantum cascade chirped superlattice. Physica E, 13(2–4):854–857, 2002. (Cited on page 108.)

[21] J. J. Borras-Almenar, J. M. Clemente-Juan, E. Coronado, and B. S. Tsukerblat. High-nuclearity magnetic clusters: Generalized spin Hamiltonian and its use for the calculation of the energy levels, bulk magnetic properties, and inelastic neutron scattering spectra. Inorg. Chem., 38:6081–6088, 1999. (Cited on pages 54 and 55.)

[22] J. J. Borras-Almenar, J. M. Clemente-Juan, E. Coronado, and B. S. Tsukerblat. MAGPACK: A package to calculate the energy levels, bulk magnetic properties, and inelastic neutron scattering spectra of high nuclearity spin clusters. J. Comput. Chem., 22(9):985–991, 2001. (Cited on page 54.)

[23] A. Buttari, J. Langou, J. Kurzak, and J. Dongarra. A class of parallel tiled linear algebra algorithms for multicore architectures. Parallel Comput., 35(1):38–53, 2009. (Cited on page 19.)

[24] B. L. Buzbee, G. H. Golub, and C. W. Nielson. On direct methods for solving Poisson’s equations. SIAM J. Numer. Anal., 7(4):627–656, 1970. (Cited on page 75.)

[25] M. Caliari, P. Kandolf, A. Ostermann, and S. Rainer. Comparison of software for computing the action of the matrix exponential. BIT Numer. Math., 54(1):113–128, 2014. (Cited on pages 115 and 120.)

[26] M. Caliari, P. Kandolf, A. Ostermann, and S. Rainer. The Leja method revisited: Backward error analysis for the matrix exponential. SIAM J. Sci. Comput., 38(3):A1639–A1661, 2016. (Cited on page 129.)

[27] L.-W. Chang, J. A. Stratton, H.-S. Kim, and W.-M. W. Hwu. A scalable, numerically stable, high-performance tridiagonal solver using GPUs. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pages 27:1–27:11, Nov. 2012. (Cited on page 89.)

[28] B. Chapman, G. Jost, and R. van der Pas. Using OpenMP: Portable Shared Memory Parallel Programming. The MIT Press, 2008. (Cited on page 13.)

[29] S. H. Cheng, N. J. Higham, C. S. Kenney, and A. J. Laub. Approximating the logarithm of a matrix to specified accuracy. SIAM J. Matrix Anal. Appl., 22(4):1112–1125, 2001. (Cited on page 116.)

[30] CUDA C Programming Guide. https://docs.nvidia.com/cuda/cuda-c-programming-guide/. Accessed on 17 May 2018. (Cited on page 28.)

[31] J. J. M. Cuppen. A divide and conquer method for the symmetric tridiagonal eigenproblem. Numer. Math., 36(2):177–195, 1980. (Cited on page 39.)

[32] L. D. Dalcin, R. R. Paz, P. A. Kler, and A. Cosimo. Parallel distributed computing using Python. Adv. Water Resour., 34(9):1124–1139, 2011. (Cited on page 48.)

[33] S. Dalton, N. Bell, L. Olson, and M. Garland. Cusp: Generic parallel algorithms for sparse matrix and graph computations, 2015. Version 0.5.1, available at https://cusplibrary.github.io/. (Cited on pages 32 and 49.)

[34] E. Deadman, N. J. Higham, and R. Ralha. Blocked Schur algorithms for computing the matrix square root. In P. Manninen and P. Öster, editors, Applied Parallel and Scientific Computing, PARA 2012, pages 171–182, 2013. (Cited on page 115.)

[35] E. D. Denman and A. N. Beavers, Jr. The matrix sign function and computations in systems. Appl. Math. Comput., 2(1):63–94, 1976. (Cited on page 116.)

[36] R. Dolbeau. Theoretical peak flops per instruction set: a tutorial. J. Supercomput., 74(3):1341–1377, 2018. (Cited on page 36.)

[37] J. Dongarra, C. Moler, J. Bunch, and G. Stewart. LINPACK Users’ Guide. Society for Industrial and Applied Mathematics, 1979. (Cited on page 19.)

[38] I. S. Duff, R. G. Grimes, and J. G. Lewis. Users’ guide for the Harwell-Boeing sparse matrix collection (release I). Available at https://sparse.tamu.edu, 1992. (Cited on page 126.)

[39] M. Eiermann and O. G. Ernst. A restarted Krylov subspace method for the evaluation of matrix functions. SIAM J. Numer. Anal., 44(6):2481–2504, 2006. (Cited on page 120.)

[40] M. Eiermann, O. G. Ernst, and S. Güttel. Deflated restarting for matrix functions. SIAM J. Matrix Anal. Appl., 32(2):621–641, 2011. (Cited on page 140.)

[41] N. England. A system for interactive modeling of physical curved surface objects. SIGGRAPH Comput. Graph., 12(3):336–340, 1978. (Cited on page 24.)

[42] N. England. A graphics system architecture for interactive application-specific display functions. IEEE Computer Graphics and Applications, 6(1):60–70, 1986. (Cited on page 24.)

[43] M. E. Farquhar, T. J. Moroney, Q. Yang, and I. W. Turner. GPU accelerated algorithms for computing matrix function vector products with applications to exponential integrators and fractional diffusion. SIAM J. Sci. Comput., 38(3):C127–C149, 2016. (Cited on page 119.)

[44] M. J. Flynn. Some computer organizations and their effectiveness. IEEE Trans. Comput., 21(9):948–960, 1972. (Cited on page 11.)

[45] R. A. Frazer, W. J. Duncan, and A. R. Collar. Elementary Matrices: And Some Applications to Dynamics and Differential Equations. Cambridge University Press, 1938. (Cited on page 118.)

[46] A. Frommer, K. Lund, M. Schweitzer, and D. B. Szyld. The Radau–Lanczos method for matrix functions. SIAM J. Matrix Anal. Appl., 38(3):710–732, 2017. (Cited on page 140.)

[47] H. Fu, J. Liao, J. Yang, L. Wang, Z. Song, X. Huang, C. Yang, W. Xue, F. Liu, F. Qiao, W. Zhao, X. Yin, C. Hou, C. Zhang, W. Ge, J. Zhang, Y. Wang, C. Zhou, and G. Yang. The Sunway TaihuLight supercomputer: system and applications. Sci. China Infor. Sci., 59(7):072001, 2016. (Cited on page 23.)

[48] T. Fukaya and T. Imamura. Performance evaluation of the EigenExa eigensolver on Oakleaf-FX: Tridiagonalization versus pentadiagonalization. In 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, pages 960–969, 2015. (Cited on page 45.)

[49] E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra, J. M. Squyres, V. Sahay, P. Kambadur, B. Barrett, A. Lumsdaine, R. H. Castain, D. J. Daniel, R. L. Graham, and T. S. Woodall. Open MPI: Goals, concept, and design of a next generation MPI implementation. In D. Kranzlmüller, P. Kacsuk, and J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, pages 97–104. Springer Berlin Heidelberg, 2004. (Cited on page 21.)

[50] E. Gallopoulos, B. Philippe, and A. H. Sameh. Parallelism in Matrix Computations. Springer, Dordrecht, 2016. (Cited on page 75.)

[51] W. Gander and G. H. Golub. Cyclic Reduction: history and applications. In F. T. Luk and R. J. Plemmons, editors, Proceedings of the Workshop on Scientific Computing, pages 73–85. Springer, Mar. 1997. (Cited on page 75.)

[52] M. Gates, A. Haidar, and J. Dongarra. Accelerating computation of eigenvectors in the nonsymmetric eigenvalue problem. Technical Report Working Note 286, LAPACK, 2014. (Cited on page 4.)

[53] G. H. Golub and H. A. van der Vorst. Eigenvalue computation in the 20th century. J. Comput. Appl. Math., 123(1-2):35–65, 2000. (Cited on page 37.)

[54] M. Gschwind, H. P. Hofstee, B. Flachs, M. Hopkins, Y. Watanabe, and T. Yamazaki. Synergistic processing in Cell’s multicore architecture. IEEE Micro, 26(2):10–24, 2006. (Cited on page 10.)

[55] S. Güttel and Y. Nakatsukasa. Scaled and squared subdiagonal Padé approximation for the matrix exponential. SIAM J. Matrix Anal. Appl., 37(1):145–170, 2016. (Cited on page 119.)

[56] A. Haidar, H. Ltaief, and J. Dongarra. Toward a high performance tile divide and conquer algorithm for the dense symmetric eigenvalue problem. SIAM J. Sci. Comput., 34(6):C249–C274, 2012. (Cited on page 4.)

[57] D. Heller. Some aspects of the cyclic reduction algorithm for block tridiagonal linear systems. SIAM J. Numer. Anal., 13(4):484–496, 1976. (Cited on page 75.)

[58] V. Hernandez, J. E. Roman, and A. Tomas. Parallel Arnoldi eigensolvers with enhanced scalability via global communications rearrangement. Parallel Comput., 33(7–8):521–540, 2007. (Cited on page 51.)

[59] V. Hernandez, J. E. Roman, A. Tomas, and V. Vidal. A survey of software for sparse eigenvalue problems. Technical Report STR-6, Universitat Politècnica de València, 2006. (Cited on page 44.)

[60] V. Hernandez, J. E. Roman, and V. Vidal. SLEPc: A scalable and flexible toolkit for the solution of eigenvalue problems. ACM Trans. Math. Software, 31(3):351–362, 2005. (Cited on pages 2 and 45.)

[61] M. A. Heroux, R. A. Bartlett, V. E. Howle, R. J. Hoekstra, J. J. Hu, T. G. Kolda, R. B. Lehoucq, K. R. Long, R. P. Pawlowski, E. T. Phipps, A. G. Salinger, H. K. Thornquist, R. S. Tuminaro, J. M. Willenbring, A. Williams, and K. S. Stanley. An overview of the Trilinos project. ACM Trans. Math. Software, 31(3):397–423, 2005. (Cited on page 46.)

[62] N. J. Higham. Computing real square roots of a real matrix. Linear Algebra Appl., 88–89:405–430, 1987. (Cited on page 115.)

[63] N. J. Higham. The matrix sign decomposition and its relation to the polar decomposition. Linear Algebra Appl., 212-213:3–20, 1994. (Cited on page 118.)

[64] N. J. Higham. Functions of Matrices: Theory and Computation. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2008. (Cited on pages 113, 117, 118, 120, and 123.)

[65] N. J. Higham and A. H. Al-Mohy. Computing matrix functions. Acta Numerica, 19:159–208, 2010. (Cited on pages 113, 115, and 118.)

[66] S. P. Hirshman, K. S. Perumalla, V. E. Lynch, and R. Sanchez. BCYCLIC: A parallel block tridiagonal matrix cyclic solver. J. Comput. Phys., 229(18):6392–6404, 2010. (Cited on pages 75 and 86.)

[67] J. D. Hogg. A fast dense triangular solve in CUDA. SIAM J. Sci. Comput., 35(3):C303–C322, 2013. (Cited on page 4.)

[68] C. Hong, A. Sukumaran-Rajam, B. Bandyopadhyay, J. Kim, S. E. Kurt, I. Nisa, S. Sabhlok, Ü. V. Çatalyürek, S. Parthasarathy, and P. Sadayappan. Efficient sparse-matrix multi-vector product on GPUs. In Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, HPDC ’18, pages 66–79. ACM, 2018. (Cited on page 32.)

[69] J. F. Hughes, A. van Dam, M. McGuire, D. F. Sklar, J. D. Foley, S. K. Feiner, and K. Akeley. Computer Graphics: Principles and Practice. Addison-Wesley Professional, Boston, MA, USA, third edition, 2013. (Cited on page 23.)

[70] IEEE and The Open Group. POSIX.1-2017, IEEE std 1003.1-2017 (revision of IEEE std 1003.1-2008), The Open Group Base Specifications issue 7, 2018. (Cited on page 13.)

[71] Intel Math Kernel Library (MKL) developer reference (C). https://software.intel.com/en-us/mkl-developer-reference-c/. Accessed on 17 May 2018. (Cited on page 19.)

[72] H. Iwai. CMOS technology after reaching the scale limit. In Junction Technology, 2008. IWJT ’08. Extended Abstracts - 2008 8th International workshop on, pages 1–2, May 2008. (Cited on page 8.)

[73] J. Kurzak, D. A. Bader, and J. Dongarra, editors. Scientific Computing with Multicore and Accelerators. CRC Press, 2010. (Cited on page 2.)

[74] J. M. Jancu, R. Scholz, F. Beltram, and F. Bassani. Empirical spds* tight-binding calculation for cubic semiconductors: General method and material parameters. Phys. Rev. B, 57(11):6493–6507, 1998. (Cited on page 108.)

[75] A. D. Kern and J. Hey. Exact calculation of the joint allele frequency spectrum for generalized isolation with migration models. Genetics, 207(1):241–253, 2017. (Cited on page 126.)

[76] J. Kessenich, G. Sellers, and D. Shreiner. The OpenGL Programming Guide. Addison-Wesley Professional, Boston, MA, USA, 9th edition, 2017. (Cited on page 24.)

[77] M. Kreutzer, G. Hager, G. Wellein, H. Fehske, and A. Bishop. A unified sparse matrix data format for efficient general sparse matrix-vector multiplication on modern processors with wide SIMD units. SIAM J. Sci. Comput., 36(5):C401–C423, 2014. (Cited on page 32.)

[78] V. Kysenko, K. Rupp, O. Marchenko, S. Selberherr, and A. Anisimov. GPU-accelerated non-negative matrix factorization for text mining. In G. Bouma, A. Ittoo, E. Métais, and H. Wortmann, editors, Natural Language Processing and Information Systems, pages 158–163, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg. (Cited on page 4.)

[79] A. Lamas Daviña, X. Cartoixà, and J. E. Roman. Scalable block-tridiagonal eigensolvers in the context of electronic structure calculations. In S. Bassini, M. Danelutto, P. Dazzi, G. Joubert, and F. Peters, editors, Parallel Computing is Everywhere, volume 32 of Advances in Parallel Computing, pages 117–126. IOS Press, 2018. (Cited on page 139.)

[80] A. Lamas Daviña, E. Ramos, and J. E. Roman. Optimized analysis of isotropic high-nuclearity spin clusters with GPU acceleration. Comput. Phys. Commun., 209:70–78, 2016. (Cited on page 138.)

[81] A. Lamas Daviña and J. E. Roman. GPU implementation of Krylov solvers for block-tridiagonal eigenvalue problems. In R. Wyrzykowski et al., editors, Parallel Processing and Applied Mathematics–PPAM 2015, Part I, volume 9573 of Lect. Notes Comp. Sci., pages 182–191. Springer, 2016. (Cited on page 138.)

[82] A. Lamas Daviña and J. E. Roman. Parallel MPI-GPU linear system solvers specific for block-tridiagonal matrices. In Libro de comunicaciones definitivas presentadas en CEDYA+CMA2017, pages 674–681. UP4Sciences, 2017. (Cited on page 139.)

[83] A. Lamas Daviña and J. E. Roman. MPI-CUDA parallel linear solvers for block-tridiagonal matrices in the context of SLEPc’s eigensolvers. Parallel Comput., 74:118–135, 2018. (Cited on page 139.)

[84] J. J. Lambiotte, Jr. and R. G. Voigt. The solution of tridiagonal linear systems on the CDC STAR 100 computer. ACM Trans. Math. Software, 1(4):308–329, 1975. (Cited on page 75.)

[85] E. S. Larsen and D. McAllister. Fast matrix multiplies using graphics hardware. In Proceedings of the 2001 ACM/IEEE Conference on Supercomputing, SC ’01, pages 55–55. ACM, 2001. (Cited on page 24.)

[86] E. László, M. Giles, and J. Appleyard. Manycore algorithms for batch scalar and block tridiagonal solvers. ACM Trans. Math. Software, 42(4):31:1–31:36, 2016. (Cited on page 86.)

[87] C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh. Basic linear algebra subprograms for Fortran usage. ACM Trans. Math. Software, 5(3):308–323, 1979. (Cited on page 18.)

[88] R. B. Lehoucq, D. C. Sorensen, and C. Yang. ARPACK Users’ Guide, Solution of Large-Scale Eigenvalue Problems by Implicitly Restarted Arnoldi Methods. Society for Industrial and Applied Mathematics, Philadelphia, PA, 1998. (Cited on page 45.)

[89] R. Li and Y. Saad. GPU-accelerated preconditioned iterative linear solvers. J. Supercomput., 63(2):443–466, 2013. (Cited on pages 4 and 62.)

[90] L. Lin. Randomized estimation of spectral densities of large matrices made accurate. Numer. Math., 136(1):183–213, 2017. (Cited on page 139.)

[91] D. J. Luitz and Y. B. Lev. Information propagation in isolated quantum systems. Phys. Rev. B, 96:020406, 2017. (Cited on page 126.)

[92] A. Marek, V. Blum, R. Johanni, V. Havu, B. Lang, T. Auckenthaler, A. Heinecke, H.-J. Bungartz, and H. Lederer. The ELPA library: scalable parallel eigenvalue solutions for electronic structure theory and computational science. J. Phys.: Cond. Matter, 26(21):213201, 2014. (Cited on page 45.)

[93] K. J. Maschhoff and D. C. Sorensen. PARPACK: An efficient portable large scale eigenvalue package for distributed memory parallel architectures. Lect. Notes Comp. Sci., 1184:478–486, 1996. (Cited on page 45.)

[94] K. Mendiratta and E. Polizzi. A threaded SPIKE algorithm for solving general banded systems. Parallel Comput., 37(12):733–741, 2011. (Cited on page 79.)

[95] C. C. K. Mikkelsen and M. Manguoglu. Analysis of the truncated SPIKE algorithm. SIAM J. Matrix Anal. Appl., 30(4):1500–1519, 2009. (Cited on page 79.)

[96] V. Minden, B. Smith, and M. G. Knepley. Preliminary implementation of PETSc using GPUs. In D. A. Yuen et al., editors, GPU solutions to multi-scale problems in science and engineering, pages 131–140. Springer, 2013. (Cited on pages 3 and 49.)

[97] C. Moler and C. F. Van Loan. Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM Rev., 45(1):3–49, 2003. (Cited on page 118.)

[98] MPI Forum. MPI: a message-passing interface standard. Int. J. Supercomp. Applic. High Perf. Comp., 8(3/4):159–416, 1994. (Cited on page 14.)

[99] T. H. Myer and I. E. Sutherland. On the design of display processors. Commun. ACM, 11(6):410–414, 1968. (Cited on page 21.)

[100] S. G. Narendra and A. Chandrakasan. Leakage in nanometer CMOS technologies. Springer US, New York, NY, 2006. (Cited on page 9.)

[101] NVIDIA. CUBLAS Library V9.2. Technical Report DU-06702-001 v9.2, NVIDIA Corporation, 2018. (Cited on page 31.)

[102] C. C. Paige. Accuracy and effectiveness of the Lanczos algorithm for the symmetric eigenproblem. Linear Algebra Appl., 34:235–258, 1980. (Cited on page 42.)

[103] M. Palesi and M. Daneshtalab. Routing Algorithms in Networks-on-Chip. Springer, 2013. (Cited on page 10.)

[104] A. J. Park and K. S. Perumalla. Efficient heterogeneous execution on large multicore and accelerator platforms: Case study using a block tridiagonal solver. J. Parallel and Distrib. Comput., 73(12):1578–1591, 2013. (Cited on pages 86 and 90.)

[105] E. Polizzi. Density-matrix-based algorithm for solving eigenvalue problems. Phys. Rev. B, 79(11):115112, 2009. (Cited on page 45.)

[106] E. Polizzi and A. H. Sameh. A parallel hybrid banded system solver: the SPIKE algorithm. Parallel Comput., 32(2):177–194, 2006. (Cited on pages 77, 78, and 79.)

[107] J. Poulson, B. Marker, R. A. van de Geijn, J. R. Hammond, and N. A. Romero. Elemental: A new framework for distributed memory dense matrix computations. ACM Trans. Math. Software, 39(2):13:1–13:24, 2013. (Cited on page 45.)

[108] A. Putnam, A. M. Caulfield, E. S. Chung, D. Chiou, K. Constantinides, J. Demme, H. Esmaeilzadeh, J. Fowers, G. P. Gopal, J. Gray, M. Haselman, S. Hauck, S. Heil, A. Hormati, J.-Y. Kim, S. Lanka, J. Larus, E. Peterson, S. Pope, A. Smith, J. Thong, P. Y. Xiao, and D. Burger. A reconfigurable fabric for accelerating large-scale datacenter services. ACM SIGARCH Comput. Archit. News, 42(3):13–24, 2014. (Cited on page 22.)

[109] E. Ramos, J. E. Roman, S. Cardona-Serra, and J. M. Clemente-Juan. Parallel implementation of the MAGPACK package for the analysis of high-nuclearity spin clusters. Comput. Phys. Commun., 181(12):1929–1940, 2010. (Cited on page 54.)

[110] I. Reguly and M. Giles. Efficient sparse matrix-vector multiplication on cache-based GPUs. In Innovative Parallel Computing (InPar), pages 1–12, 2012. (Cited on page 62.)

[111] J. E. Roman and P. B. Vasconcelos. Harnessing GPU power from high-level libraries: eigenvalues of integral operators with SLEPc. In International Conference on Computational Science, volume 18 of Procedia Comp. Sci., pages 2591–2594. Elsevier, 2013. (Cited on page 3.)

[112] K. Rupp, A. Jüngel, and T. Grasser. A GPU-accelerated parallel preconditioner for the solution of the Boltzmann transport equation for semiconductors. In Facing the Multicore-Challenge II, pages 147–157. Springer-Verlag, Berlin, Heidelberg, 2012. (Cited on page 4.)

[113] K. Rupp and B. Smith. On level scheduling for incomplete LU factorization preconditioners on accelerators, 2013. (Cited on page 4.)

[114] Y. Saad. Iterative Methods for Sparse Linear Systems. SIAM Publications, 2nd edition, 2003. (Cited on page 80.)

[115] Y. Saad. Numerical Methods for Large Eigenvalue Problems, Revised Edition. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2011. (Cited on page 37.)

[116] A. Sadeghi. Approximating the principal matrix square root using some novel third-order iterative methods. Ain Shams Engineering Journal, 2016. In press, DOI: 10.1016/j.asej.2016.06.004. (Cited on page 117.)

[117] G. Schulz. Iterative Berechung der reziproken Matrix. Z. Angew. Math. Mech., 13(1):57–59, 1933. (Cited on page 117.)

[118] S. K. Seal, K. S. Perumalla, and S. P. Hirshman. Revisiting parallel cyclic reduction and parallel prefix-based algorithms for block tridiagonal systems of equations. J. Parallel and Distrib. Comput., 73(2):273–280, 2013. (Cited on page 86.)

[119] R. Serban, D. Melanz, A. Li, I. Stanciulescu, P. Jayakumar, and D. Negrut. A GPU-based preconditioned Newton–Krylov solver for flexible multibody dynamics. Int. J. Numer. Methods Eng., 102(9):1585–1604, 2015. (Cited on page 90.)

[120] R. B. Sidje. Expokit: a software package for computing matrix exponentials. ACM Trans. Math. Software, 24(1):130–156, 1998. (Cited on page 119.)

[121] B. L. Silver. Irreducible Tensor Methods. An Introduction for Chemists. Academic Press, London, 1988. (Cited on page 54.)

[122] J. C. Slater and G. F. Koster. Simplified LCAO method for the periodic potential problem. Phys. Rev., 94(6):1498–1524, 1954. (Cited on page 108.)

[123] B. T. Smith, J. M. Boyle, Y. Ikebe, V. C. Klema, and C. B. Moler. Matrix Eigensystem Routines: EISPACK Guide. Springer, New York, NY, USA, second edition, 1970. (Cited on page 19.)

[124] D. C. Sorensen. Implicit application of polynomial filters in a k-step Arnoldi method. SIAM J. Matrix Anal. Appl., 13:357–385, 1992. (Cited on page 43.)

[125] A. Stathopoulos and J. R. McCombs. PRIMME: PReconditioned Iterative MultiMethod Eigensolver: Methods and software description. ACM Trans. Math. Software, 37(2):21:1–21:30, 2010. (Cited on page 45.)

[126] J. Steward, A. Aksoy, and Z. Haddad. Parallel direct solution of the Ensemble Square-Root Kalman Filter equations with observation principal components. J. Atmos. Ocean. Tech., 34(9):1867–1884, 2017. (Cited on page 132.)

[127] J. Steward, J. E. Roman, A. Lamas Daviña, and A. Aksoy. Parallel direct solution of the covariance-localized Ensemble Square-Root Kalman Filter equations with matrix functions. Mon. Weather Rev., 146(9):2819–2836, 2018. (Cited on pages 123, 132, and 139.)

[128] G. W. Stewart. A Krylov–Schur algorithm for large eigenproblems. SIAM J. Matrix Anal. Appl., 23(3):601–614, 2001. (Cited on page 43.)

[129] G. W. Stewart. Matrix Algorithms. Volume II: Eigensystems. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2001. (Cited on page 37.)

[130] M. B. Taylor, J. Kim, J. Miller, D. Wentzlaff, F. Ghodrat, B. Greenwald, H. Hoffman, P. Johnson, J.-W. Lee, W. Lee, A. Ma, A. Saraf, M. Seneski, N. Shnidman, V. Strumpen, M. Frank, S. Amarasinghe, and A. Agarwal. The Raw microprocessor: a computational fabric for software circuits and general-purpose programs. IEEE Micro, 22(2):25–35, 2002. (Cited on page 23.)

[131] The TOP500 Supercomputer Sites. https://www.top500.org/. Accessed on 23 Mar 2018. (Cited on pages 11 and 23.)

[132] S. Tomov, J. Dongarra, and M. Baboulin. Towards dense linear algebra for hybrid GPU accelerated manycore systems. Parallel Comput., 36(5-6):232–240, 2010. (Cited on pages 32 and 86.)

[133] A. Vajda. Programming Many-Core Chips. Springer, Boston, MA, 2011. (Cited on page 10.)

[134] T. Van Hook. Real-time shaded NC milling display. SIGGRAPH Comput. Graph., 20(4):15–20, 1986. (Cited on page 24.)

[135] P. B. Vasconcelos, O. Marques, and J. E. Roman. Parallel eigensolvers for a discretized radiative transfer problem. In J. M. L. M. Palma et al., editors, High Performance Computing for Computational Science–VECPAR 2008, volume 5336 of Lect. Notes Comp. Sci., pages 336–348. Springer, 2008. (Cited on page 98.)

[136] I. Venetis, A. Kouris, A. Sobczyk, E. Gallopoulos, and A. Sameh. A direct tridiagonal solver based on Givens rotations for GPU architectures. Parallel Comput., 49:101–116, 2015. (Cited on page 89.)

[137] I. E. Venetis, A. Sobczyk, A. Kouris, A. Nakos, N. Nikoloutsakos, and E. Gallopoulos. A general tridiagonal solver for coprocessors: Adapting g-Spike for the Intel Xeon Phi. In G. R. Joubert et al., editors, Parallel Computing: On the Road to Exascale, pages 371–380. IOS Press, 2015. (Cited on page 90.)

[138] C. Vömel, S. Tomov, and J. Dongarra. Divide and conquer on hybrid GPU-accelerated multicore systems. SIAM J. Sci. Comput., 34(2):C70–C82, 2012. (Cited on page 4.)

[139] S.-H. Weng, Q. Chen, N. Wong, and C.-K. Cheng. Circuit simulation via matrix exponential method for stiffness handling and parallel processing. In Proceedings of IEEE/ACM Int. Conf. on Computer-Aided Design (ICCAD), pages 407–414, 2012. (Cited on page 119.)

[140] J. S. Whitaker and T. M. Hamill. Ensemble data assimilation without perturbed observations. Mon. Weather Rev., 130(7):1913–1924, 2002. (Cited on page 122.)

[141] K. Wu and H. Simon. Thick-restart Lanczos method for large symmetric eigenvalue problems. SIAM J. Matrix Anal. Appl., 22(2):602–616, 2000. (Cited on page 44.)

[142] Z. Xianyi, W. Qian, and Z. Yunquan. Model-driven Level 3 BLAS Performance Optimization on Loongson 3A Processor. In 2012 IEEE 18th International Conference on Parallel and Distributed Systems, pages 684–691, 2012. (Cited on page 19.)

[143] P. Yalamov and V. Pavlov. Stability of the block cyclic reduction. Linear Algebra Appl., 249(1):341–358, 1996. (Cited on page 75.)

[144] I. Yamazaki, T. Dong, R. Solcà, S. Tomov, J. Dongarra, and T. Schulthess. Tridiagonalization of a dense symmetric matrix on multiple GPUs and its application to symmetric eigenvalue problems. Concurrency Computat.: Pract. Exper., 26:2652–2666, 2014. (Cited on page 4.)

[145] C. D. Yu and W. Wang. Performance models and workload distribution algorithms for optimizing a hybrid CPU-GPU multifrontal solver. Comput. Math. Appl., 67(7):1421–1437, 2014. (Cited on page 4.)

[146] Y. Zhang, J. Cohen, and J. D. Owens. Fast tridiagonal solvers on the GPU. In Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’10, pages 127–136, 2010. (Cited on pages 73 and 86.)

[147] F. Zheng, H.-L. Li, H. Lv, F. Guo, X.-H. Xu, and X.-H. Xie. Cooperative computing techniques for a deeply fused and heterogeneous many-core processor architecture. J. Comput. Sci. Tech., 30(1):145–162, 2015. (Cited on page 23.)
