Solution of general linear systems of equations using block Krylov based iterative methods on distributed computing environments

Leroy Anthony Drummond Lewis

December


Solution of sparse linear systems by block iterative methods in heterogeneous distributed environments

Résumé

We study the implementation of block iterative methods in distributed-memory multiprocessor environments for the solution of general linear systems. We first examine the potential of the classical conjugate gradient method in a parallel environment. We then study another iterative method, from the same family as the conjugate gradient, designed to solve linear systems with multiple right-hand sides simultaneously, commonly known as the block conjugate gradient.

The algorithmic complexity of the block conjugate gradient method is higher than that of the classical conjugate gradient method, since it requires more computation per iteration and also more memory. Despite this, the block conjugate gradient method appears better suited to vector and parallel environments.

We have studied three variants of the block conjugate gradient, based on different parallel programming models, and their efficiency has been compared on several distributed environments.

For the solution of unsymmetric linear systems, we consider the use of row projection iterative methods accelerated by the block conjugate gradient method. The block conjugate gradient versions studied earlier are used for this acceleration. In particular, we study the implementation in distributed environments of the block Cimmino method accelerated by the block conjugate gradient method; the combination of these two techniques indeed offers good potential for parallelism.

To obtain good performance from the implementation of this last method in heterogeneous distributed environments, we have studied different strategies for distributing the tasks among the processors, and we compare two static schedulers that perform this distribution. The first aims to maintain load balance; the second aims first to reduce the communication between the different processors, while trying to balance the processor loads as well as possible.

Finally, we study preprocessing strategies for the linear systems in order to improve the performance of the iterative method based on the Cimmino method.

Mots clefs

Cimmino method, conjugate gradient, distributed computing, heterogeneous computing, iterative methods, parallel computing, preprocessing of linear systems, scheduling, symmetric sparse linear systems, sparse matrices, unsymmetric sparse linear systems.


Solution of general linear systems of equations using block Krylov based iterative methods on distributed computing environments

Abstract

We study the implementation of block Krylov based iterative methods on distributed computing environments for the solution of general linear systems of equations. First, we study the potential implementation of the classical conjugate gradient (CG) method on parallel environments. From the family of conjugate direction methods, we study the Block Conjugate Gradient (Block-CG) method, which is based on the classical CG method. The Block-CG works on a block of s right-hand side vectors instead of a single one, as is the case for the classical CG, and we study the implementation of the Block-CG on distributed environments.

The complexity of the Block-CG method is higher than the complexity of the classical CG in terms of computations and memory requirements. We show that the fact that an iteration of the Block-CG requires more computation than the classical CG makes the Block-CG more suitable for vector and parallel environments. Additionally, the increase in memory requirements is only a multiple of s, which is generally much smaller than the size of the linear system being solved. We present three models of distributed implementations of the Block-CG and discuss the advantages and disadvantages of each of these model implementations.

The classical CG and Block-CG are suitable for the solution of symmetric positive definite systems of equations. Furthermore, both methods are guaranteed, in exact arithmetic, to find a solution to positive definite systems in a finite number of iterations.

For unsymmetric linear systems, we study block-row projection iterative methods for solving general linear systems of equations, and we are particularly interested in showing that the rate of convergence of some row projection methods can be accelerated with the Block-CG method. We review two block projection methods: the block Cimmino and the block Kaczmarz methods. Afterwards, we study the implementation of an iterative procedure based on the block Cimmino method, using the Block-CG method to accelerate its rate of convergence, on distributed computing environments. The complexity of the new iterative procedure is higher than that of the Block-CG method in terms of computations and memory requirements; in this case, the main advantage is the extension of the CG based methods to general linear systems of equations. We present a parallel implementation of the block Cimmino method with Block-CG acceleration that performs well on distributed computing environments.

This last parallel implementation opens a study of potential scheduling strategies for distributing tasks to a set of computing elements. We present a scheduler for heterogeneous computing environments which is currently implemented inside the block Cimmino based solver and can be reused inside parallel implementations of several other iterative methods.

Lastly, we combine all of the above efforts for parallelizing iterative methods with preprocessing strategies to improve the numerical behavior of the block Cimmino based iterative solver.

Keywords

Cimmino method, conjugate gradient, distributed computing, heterogeneous computing, iterative methods, parallel computing, preprocessing of linear systems, scheduling, symmetric sparse linear systems, sparse matrices, unsymmetric sparse linear systems.


Acknowledgements

The author gratefully acknowledges the following people and organizations for their immeasurable support.

Iain S. Duff
For his professional support and research guidance.

Daniel Ruiz
For the time and effort he has dedicated to my thesis, and the genuine pleasure it has been working with him.

Joseph Noailles
Who was responsible for my PhD studies.

Mario Arioli
For his remarkable contributions to my work.

Craig Douglas
For all his insights and valuable comments, for proofreading my thesis during the process of preparation, and for great advice: you are supposed to have fun while writing your thesis.

J. C. Diaz
For his constant support in my career.

Dominique Bennett
For all the help settling in Toulouse and CERFACS, which is an indirect but substantial support for four years of work in France.

CERFACS
For providing the scientific resources and a pleasant working environment. To all my colleagues, and especially to Osni Marques for helping me plot and find eigenvalues.

ENSEEIHT-IRIT
The APO team, for hosting me in their group and sharing with me their research experiences and great resources. Brigitte Sor and Frédéric Noailles for their outstanding computing support.

MCS, Argonne National Laboratory
For letting me use their parallel computing facilities.

IAN-CNR, Pavia, Italy
For hosting me in the summer.

CNUSC
For the use of the SP Thin node.

My family
For always being with me. To Sandra Drummond for her help and company. My friends, for all that I have found, learned and shared in our friendship.

Contents

Introduction

Block conjugate gradient
Conjugate gradient algorithm
A stable Block-CG algorithm
Performance of Block-CG vs. Classical CG
Stopping criteria

Sequential Block-CG experiments
Solving the LANPRO NOS2 problem
Solving again the LANPRO NOS2 problem
Solving the LANPRO NOS3 problem
Solving the SHERMAN problem
Remarks

Parallel Block-CG implementation
Partitioning and scheduling strategies
Partitioning strategy
Scheduling strategy
Implementations of Block-CG
All-to-All implementation
Master-Slave distributed Block-CG implementation
Master-Slave centralized Block-CG implementation
Parallel Block-CG experiments
Parallel solution of the SHERMAN problem
Parallel solution of the LANPRO NOS problem
Parallel solution of the LANPRO NOS problem
Fixing the number of equivalent iterations
Comparing different computing platforms
Parallel solution of the POISSON problem
Remarks

Solving general systems using Block-CG
The block Cimmino method
The block Kaczmarz method
Block Cimmino accelerated by Block-CG
Computer implementation issues
Parallel implementation

A scheduler for heterogeneous environments
Taxonomy of scheduling techniques
A static scheduler for block iterative methods
Heterogeneous environment specification
A scheduler for a parallel block Cimmino solver
Scheduler experiments
Scheduling subsystems with a nearly uniform workload
Scheduling subsystems with a non-uniform workload
Remarks

Block Cimmino experiments
Solving the GRENOBLE problem
Solving the FRCOL problem
Remarks

Partitioning strategies
Ill-conditioning in and across blocks
Two-block partitioning
Preprocessing Strategies
Partitioning Experiments
Solving the SHERMAN problem
Solving the GRENOBLE problem
Remarks

Conclusions

Chapter 1

Introduction

The class of iterative methods comprises a wide range of procedures for solving various types of linear systems. Iterative procedures use successive approximations to the solution of a linear system of equations. A few references to general overviews of iterative methods are Hageman and Young, Barrett, Berry, Chan et al., van der Vorst, and Stoer and Bulirsch.

The class of iterative methods is subdivided into stationary and nonstationary iterative methods. The former methods are easier to implement and understand, but they are less efficient in terms of convergence than the latter ones. An iterative method is stationary if it can be expressed in the form

    x^{(j+1)} = B x^{(j)} + c,

where B and c are independent of the iteration count j. Examples of these methods are the Jacobi method, the Gauss-Seidel method, the Successive Over-Relaxation method (SOR), the Symmetric SOR method (SSOR), the Cimmino method, and the Kaczmarz method.
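As an illustration of the stationary form above, the sketch below implements one such method (Jacobi) in Python with NumPy; the small test matrix and the tolerance are hypothetical placeholders and do not come from the thesis.

import numpy as np

def jacobi(A, b, x0=None, max_iter=200, tol=1e-10):
    """Stationary iteration x^(j+1) = B x^(j) + c with B = I - D^{-1} A, c = D^{-1} b."""
    n = A.shape[0]
    x = np.zeros(n) if x0 is None else x0.copy()
    D = np.diag(A)                      # diagonal of A
    B = np.eye(n) - A / D[:, None]      # iteration matrix, independent of j
    c = b / D                           # constant vector, independent of j
    for _ in range(max_iter):
        x_new = B @ x + c
        if np.linalg.norm(x_new - x) <= tol * np.linalg.norm(x_new):
            return x_new
        x = x_new
    return x

# small diagonally dominant example (hypothetical data)
A = np.array([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x = jacobi(A, b)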

In contrast, the computations in a nonstationary iterative method involve information that changes at each iteration. Examples of these methods are the Krylov projection methods: the classical CG, the Chebyshev iteration, the Minimal Residual method (MINRES), and the Generalized Minimal Residual method (GMRES).

In general, the Krylov projection methods find a sequence of approximations to the solution of a linear system by constructing an orthogonal basis for the Krylov subspaces. Because these Krylov subspaces are finite dimensional, an adequate approximation to the solution is found in a finite number of steps in the absence of roundoff errors. The finite termination property makes the Krylov projection methods very attractive, even when used with stationary methods to accelerate their rate of convergence.

The Block Conjugate Gradient algorithm (Block-CG) also belongs to the Krylov projection methods and is based on the classical Conjugate Gradient method (CG). The Block-CG can be used in the solution of linear systems with multiple right-hand sides: in this case, the Block-CG algorithm is used to concurrently solve linear systems of equations with s right-hand sides, instead of solving the linear systems using s applications of the classical CG algorithm.

We propose developing a parallel iterative solver for distributed environments based on a stationary method, the block Cimmino method, that uses a nonstationary method, the block conjugate gradient, to accelerate its rate of convergence. During the development of the parallel iterative solver we have built a set of modules that can be reused in other parallel iterative procedures, and we have focused on answering the following statements:

E1. Are there advantages in implementing the classical CG method and the Block-CG method on distributed computing environments, without perturbing the robustness of these methods?

E2. Can the effort of parallelizing the classical CG and the Block-CG be reused inside other iterative procedures, for instance in the parallel implementation of an iterative solver based on the block Cimmino method with Block-CG acceleration?

E3. What are the parameters that influence the parallel performance of the block Cimmino based solver?

E4. Is there a need to design a scheduler to evenly distribute the workload from iterations of the block Cimmino based solver? If the need is affirmed, can the scheduler be reused in the parallel implementation of other iterative procedures?

E5. Preprocessing a linear system can reduce the number of iterations, but is there also an influence on the parallel performance of the block Cimmino based solver?

In the next chapter we study the formulations of the classical CG method. We review the efforts of several authors who have studied the implementation of the classical CG on distributed environments. From the classical CG formulations, a stabilized Block-CG algorithm can be derived. The algorithm is described as stable because it corrects some instability problems during the Block-CG computations that are due to solution vectors converging at different rates and to ill-conditioning that may appear inside the residual R^{(j)}.

The order of complexity of the Block-CG algorithm is higher than that of the classical CG in terms of computations and memory requirements. However, in the absence of roundoff errors, Block-CG promises a faster rate of convergence than the classical CG, and the matrix-matrix computations inside a Block-CG iteration can benefit from faster numerical kernels that improve data locality and increase the Mflop rate. These are desirable features for an algorithm to be implemented on vector and parallel environments. An analysis of the potential parallelization of the classical CG and Block-CG algorithms, which answers statement E1, is presented later in the thesis.

The parallelization of the Block-CG algorithm opens further questions about which sections of the algorithm should be parallelized and which parallel programming model should be used. Three parallel implementations of the stabilized Block-CG algorithm are presented.

When parallelizing many sequential algorithms there is a tendency to identify the sections of the algorithm with heavy computations and to parallelize only these sections. In the classical CG and Block-CG, the heavy computational sections have been identified by other authors as the matrix-vector products A x^{(j)} in the classical CG, or the matrix-matrix products A P^{(j)} in the Block-CG. The results of our analysis of the Block-CG prove that parallelizing the A P^{(j)} product alone is not always efficient, nor sufficient; furthermore, we demonstrate this effect with one of the parallel Block-CG implementations.

In contrast, if the whole classical CG or Block-CG algorithm is parallelized, then the parallel efficiency is proportional to the ratio of computation speed to communication speed. Parallelizing the whole Block-CG algorithm leads to a decision about which programming model should be used. We propose a Master-Slave programming model for parallelizing the whole Block-CG algorithm, and also a different programming model in which the master role is removed and all the parallel processes work at the same level. Results from comparisons of the three parallel Block-CG implementations are presented in a later chapter.

The Block-CG algorithm only guarantees convergence when the system is symmetric positive definite, although with the use of a preconditioner the algorithm can be extended to solve general unsymmetric systems. Block-CG can also be used as an acceleration procedure inside another basic iterative method.

We then study the block Cimmino and block Kaczmarz methods, which are regarded as iterative row projection methods for solving general systems of equations, and we review an example of Block-CG acceleration inside the block Cimmino method.

To answer statement E2, an iterative procedure based on the block Cimmino method accelerated with the stabilized Block-CG algorithm is presented, and the parallel implementation of this procedure is discussed.

We identify the need for a task scheduler that distributes the workload among the different computing elements (CEs) in the system; we therefore study different scheduling strategies.

One of the advantages of working in heterogeneous computing environments is the ability to provide a level of computing performance proportional to the resources available in the system and their different computing capabilities. A scheduler is regarded as a strategy, or policy, to efficiently manage the use of these resources. In parallel distributed environments with homogeneous resources, the level of performance is commensurate with the number of resources present in the system.

A scheduler for parallel iterative methods in heterogeneous computing environments is presented in a later chapter; the scheduler responds to statement E4. Design issues for the scheduler are also discussed there, where we see that the scheduler can easily be modified to suit other parallel iterative procedures.

In heterogeneous computing environments, the scheduler not only considers information about the tasks to be executed in parallel, but must also consider information about the capabilities of the CEs. A syntax for specifying heterogeneous environments is defined for this purpose.

The effects of using Block-CG acceleration inside the parallel block Cimmino solver and the effects of different scheduling strategies are studied separately: first the effects of the scheduler on the performance of the parallel block Cimmino solver, and then the effects of the Block-CG acceleration on the parallel block Cimmino solver, together with an analysis of the parallel performance of the block Cimmino solver. From the results of these studies, we also identify a need for studying natural preprocessing and partitioning strategies for the linear system of equations.

Preprocessing a linear system of equations improves the performance of most linear solvers, and in some cases a more accurate approximation to the real solution is found. For instance, numerical instabilities are encountered in the solution of some linear systems, and it is necessary to scale the systems before they are solved. These instabilities occur even when the most robust computer implementations of direct or iterative solvers are used. Also, performing some permutations of the elements of the original system can substantially reduce the computing time required for the solution of large sparse linear systems, by improving the rate of convergence of some iterative methods (e.g., the SOR and Kaczmarz methods).

Preprocessing and partitioning strategies are studied in a later chapter, together with their effects on the parallel block Cimmino solver; that chapter closes with remarks that answer statement E5. Lastly, general conclusions and related future work are proposed in the final chapter.

Chapter 2

Block conjugate gradient

The Block Conjugate Gradient algorithm (Block-CG) belongs to a family of conjugate direction methods, and the study of these methods is appealing because they guarantee convergence in a finite number of steps in the absence of roundoff errors. The Block-CG is based on the classical Conjugate Gradient method (CG) for solving linear systems of equations. The Block-CG can be used in the solution of linear systems with multiple right-hand sides: in this case, the Block-CG algorithm concurrently solves the linear system of equations with s right-hand sides, instead of solving the linear systems using s applications of the classical CG algorithm. In addition, the Block-CG algorithm is expected to converge faster than the classical CG for linear systems of equations with well-separated clusters of eigenvalues. In this chapter we present some examples of this effect.

First, we study some relevant properties of the classical CG method and review some of the main issues involved in parallelizing different versions of the CG algorithm. Then we describe a stabilized Block-CG algorithm and compare it with the classical CG.

Conjugate gradient algorithm

Given the linear system of equations

    A x = b,

where A is an n x n symmetric matrix, we write the following Richardson iteration after splitting the matrix A as A = I - (I - A):

    (I - (I - A)) x = b,
    x^{(j+1)} = b + (I - A) x^{(j)},
    x^{(j+1)} = r^{(j)} + x^{(j)}.

If we multiply the last relation by -A and add b to both sides, we get

    b - A x^{(j+1)} = -A r^{(j)} + b - A x^{(j)},
    r^{(j+1)} = r^{(j)} - A r^{(j)}
              = (I - A) r^{(j)}
              = (I - A)^{j+1} r^{(0)}
              = P_{j+1}(A) r^{(0)},

where P_{j+1} is a polynomial of degree j+1 with P_{j+1}(0) = 1.

Using this polynomial, we can rewrite the iteration in the following way:

    x^{(j+1)} = r^{(j)} + r^{(j-1)} + ... + r^{(0)} + x^{(0)}
              = sum_{k=0}^{j} (I - A)^k r^{(0)} + x^{(0)}.

Without loss of generality we will assume x^{(0)} = 0; if x^{(0)} is nonzero, a linear transformation of the form z = x - x^{(0)} can be considered, and the system of equations will look like

    A z = b - A x^{(0)} = b',

where z = x - x^{(0)}. Removing x^{(0)} we get

    x^{(j+1)} = sum_{k=0}^{j} (I - A)^k r^{(0)}.

From this expression it is inferred that the new x^{(j+1)} lies in the subspace span{ r^{(0)}, A r^{(0)}, ..., A^{j} r^{(0)} }, which is the Krylov subspace K_{j+1}(A, r^{(0)}).

If x* is the exact solution of A x = b, then we build an iterative procedure that finds at each iteration an approximation x^{(j)} to x* that satisfies

    min || x* - x^{(j)} ||

for x^{(j)} in K_{j}(A, r^{(0)}) and a given norm. Furthermore, if the matrix A is symmetric positive definite (SPD), then we can use the following proper inner product:

    <x, y>_A = <y, x>_A = x^T A y,
    || x ||_A = <x, x>_A^{1/2}.

For instance, if x^{(1)} is in span{ r^{(0)} }, then x^{(1)} = α r^{(0)} and

    || x* - x^{(1)} ||_A = || x* - α r^{(0)} ||_A,

which has a minimum with respect to α at

    α = <r^{(0)}, x*>_A / <r^{(0)}, r^{(0)}>_A = (r^{(0)T} r^{(0)}) / (r^{(0)T} A r^{(0)}),

since A x* = b = r^{(0)} when x^{(0)} = 0.

In general, we want to find min || x* - x^{(j+1)} ||_A for x^{(j+1)} in K_{j+1}(A, r^{(0)}), with

    x^{(j+1)} - x^{(j)} in K_{j+1}(A, r^{(0)}),
    r^{(j+1)} orthogonal to K_{j+1}(A, r^{(0)}).

If we repeat the search for an approximation j times, we build a set of orthogonal residual vectors { r^{(0)}, r^{(1)}, ..., r^{(j)} } that is an orthogonal basis for the Krylov subspace K_{j+1}(A, r^{(0)}). Furthermore,

    K_{j+1}(A, r^{(0)}) = span{ r^{(0)}, A r^{(0)}, ..., A^{j} r^{(0)} }.

We continue to generate orthogonal residual vectors until a full basis for the largest Krylov subspace generated by A and r^{(0)} is found. Beyond this point we cannot generate more orthogonal vectors, and we have a solution of A x = b.

Inside the iterative procedure, a new x^{(j+1)} is computed after projecting the residual r^{(j)} onto the Krylov subspace K_{j+1}(A, r^{(0)}) with respect to the A inner product.

Theorem. The orthogonal residual basis satisfies a three-term recurrence of the form

    r^{(j+1)} = γ_{j+1} A r^{(j)} + c_j r^{(j)} + c_{j-1} r^{(j-1)}.

Proof. The proof is by induction. We start with

    r^{(1)} in span{ r^{(0)}, A r^{(0)} },
    r^{(1)} = γ_1 A r^{(0)} + c_0 r^{(0)},

and

    r^{(2)} in span{ r^{(0)}, A r^{(0)}, A^2 r^{(0)} } = span{ r^{(0)}, r^{(1)}, A r^{(1)} },
    r^{(2)} = γ_2 A r^{(1)} + c_1 r^{(1)} + c_0 r^{(0)}.

By induction,

    r^{(j)} in K_{j+1}(A, r^{(0)}),    r^{(j+1)} orthogonal to K_{j+1}(A, r^{(0)}),

where

    K_{j+1}(A, r^{(0)}) = span{ r^{(0)}, r^{(1)}, ..., r^{(j)} }.

Thus

    r^{(j+1)} = γ_{j+1} A r^{(j)} + sum_{i=0}^{j} c_i r^{(i)}.

Since r^{(j+1)} must be orthogonal to { r^{(0)}, ..., r^{(j)} }, we must find the values of the c_i that satisfy this condition; in other words,

    <r^{(j+1)}, r^{(i)}> = 0,    i = 0, ..., j.

Taking the inner product of the expansion with r^{(i)} yields

    γ_{j+1} <A r^{(j)}, r^{(i)}> + c_i <r^{(i)}, r^{(i)}> = 0.

Using the properties of the inner products gives us

    <A r^{(j)}, r^{(i)}> = <r^{(j)}, r^{(i)}>_A = <r^{(j)}, A r^{(i)}>,

so that

    γ_{j+1} <r^{(j)}, A r^{(i)}> = - c_i <r^{(i)}, r^{(i)}>.

Using the induction argument, A r^{(i)} lies in span{ r^{(0)}, ..., r^{(i+1)} }, so that <r^{(j)}, A r^{(i)}> = 0 whenever i + 1 < j; hence c_i = 0 for i < j - 1, and the expansion reduces to

    r^{(j+1)} = γ_{j+1} A r^{(j)} + c_j r^{(j)} + c_{j-1} r^{(j-1)},

with c_{j-1} and c_j determined by the two remaining orthogonality conditions. This completes the proof.

The three-term recurrence relation can be rearranged as

    A r^{(j)} = (1/γ_{j+1}) ( r^{(j+1)} - c_j r^{(j)} - c_{j-1} r^{(j-1)} ).

Let { r^{(0)}, r^{(1)}, ..., r^{(j)} } be the columns of a matrix R^{(j)}. From the rearranged recurrence we get

    A R^{(j)} = R^{(j)} T^{(j)} + (1/γ_{j+1}) r^{(j+1)} e_j^T,

where T^{(j)} is a tridiagonal matrix and e_j is a canonical vector. We can write the new approximation as a linear combination of the columns of R^{(j)}, namely

    x^{(j+1)} = R^{(j)} y.

The x^{(j+1)} that minimizes

    min || x* - x^{(j+1)} ||_A

is derived from

    R^{(j)T} ( A x^{(j+1)} - b ) = 0,
    R^{(j)T} A x^{(j+1)} = R^{(j)T} b,
    R^{(j)T} A R^{(j)} y = R^{(j)T} b.

Substituting the relation for A R^{(j)},

    R^{(j)T} ( R^{(j)} T^{(j)} + (1/γ_{j+1}) r^{(j+1)} e_j^T ) y = R^{(j)T} b.

Since r^{(j+1)} is orthogonal to the columns of R^{(j)}, we have

    ( R^{(j)T} R^{(j)} ) T^{(j)} y = || r^{(0)} ||^2 e_1,

where

    R^{(j)T} R^{(j)} = diag( || r^{(0)} ||^2, ..., || r^{(j)} ||^2 ).

Thus, after absorbing the diagonal scaling and the factor || r^{(0)} ||^2 into the tridiagonal matrix,

    T^{(j)} y = e_1,    i.e.    y = ( T^{(j)} )^{-1} e_1.

If A is an SPD matrix, then in the relation

    R^{(j)T} A R^{(j)} = ( R^{(j)T} R^{(j)} ) T^{(j)}

the matrix T^{(j)} can be transformed into an SPD matrix (e.g., using some row scaling) and LU-decomposed without any pivoting:

    T^{(j)} = L^{(j)} U^{(j)}.

Then, combining the relations above, we have


    x^{(j+1)} = R^{(j)} y
              = R^{(j)} ( T^{(j)} )^{-1} e_1
              = ( R^{(j)} ( U^{(j)} )^{-1} ) ( ( L^{(j)} )^{-1} e_1 ).

Labeling the terms in parentheses

    P^{(j)} = R^{(j)} ( U^{(j)} )^{-1}    and    Q^{(j)} = ( L^{(j)} )^{-1} e_1,

and after some algebraic manipulation, we obtain an equation for the new approximation of the form

    x^{(j+1)} = x^{(j)} + q_j p^{(j)},

where p^{(j)} is the last column of P^{(j)} and q_j the last entry of Q^{(j)}. The columns of the matrix P^{(j)} are A-conjugate, hence the name of the Conjugate Gradient method.

The algorithm below is a version of the classical CG algorithm from Hestenes and Stiefel. On the other hand, if we do not assume the positive definiteness of A, we can only hope that T^{(j)} is not singular, and the procedure is known as the Lanczos method for symmetric systems (see Lanczos).

Algorithm (Conjugate Gradient)
1.  x^{(0)} is arbitrary and r^{(0)} = b - A x^{(0)}
2.  p^{(0)} = r^{(0)}
3.  For j = 0, 1, ... until convergence do
4.      q^{(j)} = A p^{(j)}
5.      α_j = ( r^{(j)T} r^{(j)} ) / ( p^{(j)T} q^{(j)} )
6.      x^{(j+1)} = x^{(j)} + α_j p^{(j)}
7.      r^{(j+1)} = r^{(j)} - α_j q^{(j)}
8.      If || r^{(j+1)} || is small enough, then stop
9.      β_{j+1} = ( r^{(j+1)T} r^{(j+1)} ) / ( r^{(j)T} r^{(j)} )
10.     p^{(j+1)} = r^{(j+1)} + β_{j+1} p^{(j)}
11. End do
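For concreteness, a compact sequential rendering of the algorithm above in Python/NumPy is sketched below; it is a minimal illustration for an SPD matrix, and the simple residual-norm stopping test stands in for the backward-error criteria discussed later in this chapter.

import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Classical CG (Hestenes-Stiefel) for an SPD matrix A."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.copy()
    r = b - A @ x
    p = r.copy()
    rho = np.dot(r, r)
    max_iter = 10 * n if max_iter is None else max_iter
    for _ in range(max_iter):
        q = A @ p                      # heaviest step: the matrix-vector product
        alpha = rho / np.dot(p, q)
        x += alpha * p
        r -= alpha * q
        rho_new = np.dot(r, r)
        if np.sqrt(rho_new) <= tol * np.linalg.norm(b):
            break
        p = r + (rho_new / rho) * p
        rho = rho_new
    return x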


From the algorithm above, we see that the CG algorithm requires special consideration for its parallelization. We notice that the computational weight of the algorithm is in the matrix-vector product q^{(j)} = A p^{(j)}. Apart from this computation, as shown in Demmel, Heath and van der Vorst, there are several arrays that need to be loaded into cache memory or memory registers at each iteration in order to perform only a few floating-point operations on them, which results in poor data locality. Independent of the computer architecture, poor data locality always implies a high number of data transfers, which happen between different memory hierarchies or, in the worst case, between processors.

Another bottleneck in the algorithm is the computation of the two inner products, p^{(j)T} q^{(j)} and r^{(j+1)T} r^{(j+1)}. In the parallel version of the algorithm, these two inner products require the different processors to exchange local data in order to complete them; clearly, these two steps are the synchronization points of the algorithm. Moreover, the updates of x^{(j+1)} and r^{(j+1)} depend on the result of the first inner product, and the update of p^{(j+1)} depends on the result of the second. These data dependencies prevent us from overlapping communication with useful computations.
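To make the synchronization points concrete, the following sketch shows one iteration of a distributed CG in which the vectors are partitioned row-wise across processes. It is a minimal illustration assuming mpi4py; the variable names and the user-supplied distributed matrix-vector product matvec are hypothetical. Each of the two inner products requires a global reduction, i.e. a synchronization point, before the iteration can proceed.

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

def parallel_cg_step(x_local, r_local, p_local, rr, matvec):
    """One CG iteration on row-distributed vectors; rr is the global (r, r)
    carried over from the previous iteration."""
    q_local = matvec(p_local)                      # distributed A @ p

    # synchronization point 1: the inner product (p, q)
    pq = comm.allreduce(np.dot(p_local, q_local), op=MPI.SUM)
    alpha = rr / pq

    x_local += alpha * p_local
    r_local -= alpha * q_local

    # synchronization point 2: the inner product (r_new, r_new)
    rr_new = comm.allreduce(np.dot(r_local, r_local), op=MPI.SUM)

    p_local[:] = r_local + (rr_new / rr) * p_local
    return x_local, r_local, p_local, rr_new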

In order to reduce the number of synchronization points, we review a few proposed algorithms. Saad suggested removing one synchronization point at the expense of numerical instabilities (see Saad). Meurant proposed a version of the CG algorithm that eliminates one synchronization point of the classical algorithm and increases its data locality; however, this improvement comes at a cost elsewhere in the algorithm and adds one inner product. Chronopoulos proposed a different version of the CG algorithm, named the s-step conjugate gradient algorithm (see Chronopoulos and Gear).

The s-step conjugate gradient is based on the idea of generating s orthogonal vectors in one iteration step, by first building r^{(j)}, A r^{(j)}, ..., A^{s-1} r^{(j)} and then orthogonalizing and updating the current approximation to the solution in the resulting subspace. The s-step method improves the data locality and the parallelism of the classical CG algorithm, since the arrays are loaded only once per iteration. Also, in the s-step method the number of synchronization points is reduced to one (see Chronopoulos). The s-step method performs a multiple of n more floating-point operations per iteration than the classical CG, which is not critical for large values of s, because a large value of s not only increases the granularity of the parallel subtasks but may also reduce the number of iteration steps before reaching convergence. However, the s-step method introduces numerical instabilities by explicitly computing powers of A. Also, depending on the spectrum of the matrix A, the set of vectors r^{(j)}, A r^{(j)}, ..., A^{s-1} r^{(j)} may converge towards the direction of the dominant eigenvectors and not lead to the solution of the linear system. The weaknesses of the s-step algorithm are further reviewed in Saad.

D'Azevedo, Eijkhout and Romine introduced two new stable variants of the CG algorithm. In these versions, the authors substitute the second inner product by equivalent algebraic expressions. They have shown that these versions of the algorithm are stable, and they claim an improvement in performance over the classical CG algorithm. A different approach was presented by Demmel, Heath and van der Vorst, in which one overlaps communication with useful computations. Their original approach was proposed for the preconditioned CG; for the non-preconditioned CG, one has to split the computation into two phases to overlap communication with useful computations. In general, the improvement from overlapping communication with useful computation is determined by the speed of the communication network, the computing processors, and the size of the system to be solved.


A stable Block-CG algorithm

In the previous section the matrix A is assumed to be SPD. The fact that A is positive definite guarantees the termination of the algorithm in the absence of roundoff errors; however, the classical CG algorithm may still work, without guarantee, for non positive definite matrices.

Some algorithms based on the CG method have been designed for solving symmetric indefinite linear systems. A few examples of these algorithms are CGI (see Modersitzki), MINRES and SYMMLQ (see Paige and Saunders). And in some cases one can use a preconditioner matrix H which is a good approximation to A and is SPD.

Here we consider a Block-CG algorithm for SPD matrices; later we will study the Block-CG method for accelerating the rate of convergence of another basic block-row projection method.

In the Block-CG algorithm, the term block refers to multiple solution vectors, and the number of these solutions defines the block size. Let s be the block size for the system of linear equations

    H X = K.

Here, the vector x and the right-hand side b are replaced by the matrices X and K respectively, and both matrices are of order n x s. Now we write the block form of the classical CG algorithm.

Algorithm (Block Conjugate Gradient)
1.  X^{(0)} is arbitrary; P^{(0)} = R^{(0)} = K - H X^{(0)}
2.  For j = 0, 1, ... until convergence do
3.      λ^{(j)} = ( P^{(j)T} H P^{(j)} )^{-1} ( R^{(j)T} R^{(j)} )
4.      X^{(j+1)} = X^{(j)} + P^{(j)} λ^{(j)}
5.      R^{(j+1)} = R^{(j)} - H P^{(j)} λ^{(j)}
6.      ψ^{(j)} = ( R^{(j)T} R^{(j)} )^{-1} ( R^{(j+1)T} R^{(j+1)} )
7.      P^{(j+1)} = R^{(j+1)} + P^{(j)} ψ^{(j)}
8.  End do
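A minimal dense NumPy sketch of this block iteration (without the stabilization introduced next) is given below; the s-by-s systems are solved directly, and the crude residual-norm stopping test is only a placeholder.

import numpy as np

def block_cg(H, K, X0=None, tol=1e-10, max_iter=500):
    """Plain Block-CG for an SPD matrix H and an n-by-s block of right-hand sides K."""
    n, s = K.shape
    X = np.zeros((n, s)) if X0 is None else X0.copy()
    R = K - H @ X
    P = R.copy()
    for _ in range(max_iter):
        HP = H @ P                                  # block matrix-matrix product
        lam = np.linalg.solve(P.T @ HP, R.T @ R)    # (P^T H P)^{-1} (R^T R), s x s
        X += P @ lam
        R_new = R - HP @ lam
        if np.linalg.norm(R_new) <= tol * np.linalg.norm(K):
            return X
        psi = np.linalg.solve(R.T @ R, R_new.T @ R_new)
        P = R_new + P @ psi
        R = R_new
    return X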

However, there is a change from the Krylov space considerations used while building the single-vector algorithm. The Krylov subspace is rewritten here as

    K_j(H, R^{(0)}) = span{ R^{(0)}, H R^{(0)}, ..., H^{j-1} R^{(0)} }.

In the single-vector case a space of dimension j is generated, whereas inside the Block-CG algorithm a space of dimension up to s times j is generated. Furthermore, the columns of the R^{(j)} matrix may become almost linearly dependent when convergence is about to be reached. A solution to this problem is the orthonormalization of the P and R matrices. As given by Ruiz, the stabilized Block-CG algorithm is depicted below.

Algorithm (Stabilized Block Conjugate Gradient)
1.  X^{(0)} is arbitrary; R^{(0)} = K - H X^{(0)}
2.  R^{(0)} := R^{(0)} γ^{(0)}, with γ^{(0)} such that R^{(0)T} R^{(0)} = I
3.  P^{(0)} := R^{(0)} β^{(0)}, with β^{(0)} such that P^{(0)T} H P^{(0)} = I
4.  For j = 0, 1, ... until convergence do
5.      X^{(j+1)} = X^{(j)} + P^{(j)} ( P^{(j)T} R^{(j)} ),  where R^{(j)} corresponds to K - H X^{(j)}
6.      R^{(j+1)} := ( R^{(j)} - H P^{(j)} ( P^{(j)T} R^{(j)} ) ) γ^{(j+1)},  with γ^{(j+1)} such that R^{(j+1)T} R^{(j+1)} = I
7.      P^{(j+1)} := ( R^{(j+1)} + P^{(j)} μ^{(j+1)} ) β^{(j+1)},  with μ^{(j+1)} chosen so that the new directions remain H-conjugate to the previous ones and β^{(j+1)} such that P^{(j+1)T} H P^{(j+1)} = I
8.  End do
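The orthonormalization steps above can be realized with a Cholesky factorization of a small s-by-s Gram matrix, in the spirit of the DPOTRF-based factorizations mentioned in the next section. The sketch below is only an illustration of that kernel, not the thesis implementation: with H omitted it produces the Euclidean orthonormalization used for the residual block, and with an SPD H it produces the H-orthonormalization used for the direction block.

import numpy as np

def h_orthonormalize(V, H=None):
    """Return W = V L^{-T} such that W^T H W = I, where L is the Cholesky
    factor of the s-by-s Gram matrix V^T H V (H is the identity if omitted)."""
    HV = V if H is None else H @ V
    G = V.T @ HV                      # s x s SPD Gram matrix (V assumed full rank)
    L = np.linalg.cholesky(G)         # G = L L^T (LAPACK potrf underneath)
    return np.linalg.solve(L, V.T).T  # V L^{-T}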

j j

In step of Algorithm we do not explicitly compute R K HX b ecause this

j

computation involves the HX matrixmatrix pro duct and this pro duct is very exp ensive when

computed in parallel eg pro cessors need to exchange data which implies more communication

and synchronization p oints From steps and in Algorithm it can b e shown by

recurrence that the residuals are given by

Y

j

j

A

R R

i

ij

j

Therefore the up date of the X solution in step can b e rewritten as

Y

T

j j j

j j

A

P P R X X

i

ij

In general, it can be proved that for the Block-CG algorithm the H-norm of the error at each iteration is bounded in terms of the reduced condition number λ_n / λ_s, where the λ_i are the eigenvalues of the H matrix sorted in increasing order and s is the block size (see O'Leary). This observation represents an advantage of the Block-CG over its non-block counterparts, where the error at each iteration is bounded in terms of the full condition number λ_n / λ_1.

Performance of Block-CG vs. Classical CG

We start by comparing the complexity of the classical CG and of the stabilized Block-CG. We focus on the number of floating-point operations (the FLOP count) per iteration, as summarized in the complexity table below. In the next chapter we present results from sequential runs of computer implementations of both algorithms to compare their convergence rates.

In the table, the matrix A is SPD of dimension n. The sparsity pattern of A is exploited throughout the computations; thus the FLOP count of a matrix-vector operation is a function of the sparsity pattern of the matrix A, denoted here by

    σ(A) = FLOP count( A b )

for any vector b. In the Block-CG algorithm, s is the block size, and usually s is much smaller than n.
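For example, under the usual convention of one multiplication and one addition per stored nonzero (an assumption here, since the table's exact counting rules are not reproduced), σ(A) can be estimated directly from the sparsity pattern:

import scipy.sparse as sp

def sigma(A):
    """Estimate the FLOP count of one sparse matrix-vector product A @ b,
    assuming one multiply and one add per stored nonzero."""
    return 2 * sp.csr_matrix(A).nnz

# a product with an n-by-s block of vectors then costs roughly s * sigma(A)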

The FLOP count of one iteration of the Block-CG is greater than s times the FLOP count of one iteration of the classical CG. Because of the stability issues discussed in the previous section, a few more FLOPs are performed in the orthonormalization of the R^{(j)} and P^{(j)} matrices of the stabilized algorithm. The FLOP count also increases with the use of Cholesky factorizations to compute the normalization factors: both γ^{(j)} and β^{(j)} have dimensions s x s, and we use the routine DPOTRF from the LAPACK library (Anderson, Bai, Bischof et al.) to perform the Cholesky factorizations. The increase in the FLOP count from the Cholesky factorizations becomes significant for large values of s.

From the complexity table, the FLOP count per iteration of the classical CG is σ(A) plus a small multiple of n, while the FLOP count per iteration of the Block-CG is s σ(A) plus terms proportional to n s and n s^2, plus lower-order terms in s from the factorizations of the s x s matrices.

As a consequence, the FLOP count of one iteration of the Block-CG is much greater than the FLOP count of one iteration of the classical CG for large values of s. Therefore, a large value of s deteriorates the performance of a sequential Block-CG implementation. On the other hand, a large value of s increases the granularity of the associated parallel subtasks and improves the performance of a parallel Block-CG implementation. Also, in the absence of roundoff errors, the number of synchronization points is implicitly reduced with a large value of s, because the number of iterations required to reach convergence is also reduced.

The value of s should therefore be a compromise between the FLOP count, the acceleration of the convergence rate of the Block-CG due to a greater value of s, and the resources available in the system. Nikishin and Yeremin have proposed a Variable Block Preconditioned Conjugate Gradient (VBPCG) algorithm that provides the possibility of adjusting the block size during the preconditioned Block-CG iterations. The goal of varying the block size is to find a compromise between the block size, the convergence rate, the FLOP count, and the degree of parallelism. While they have shown that the VBPCG algorithm reduces the total FLOP count of the block preconditioned CG, they have also concluded that the performance of the VBPCG algorithm depends on the spectral properties of the preconditioned matrix.


In a later chapter we study the compromise between the factors that influence the performance of different parallel implementations of the Block-CG algorithm in distributed memory environments.

Stopping criteria

The results of the experiments presented in the next chapter come from sequential runs of computer implementations of the classical CG and Block-CG algorithms. The stopping criterion used in these implementations is based on the theory of Oettli and Prager (see Oettli and Prager) for backward error analysis.

Let x^{(j)} be in R^n, A in R^{n x n} and b in R^n, and define r(x^{(j)}) = b - A x^{(j)}. For any matrix E >= 0 and vector f >= 0, define

    ω^{(j)} = max_i | r(x^{(j)})_i | / ( E |x^{(j)}| + f )_i,

with the conventions 0/0 = 0 and ξ/0 = infinity for ξ != 0. Then there exist a matrix δA and a vector δb with

    | δA | <= ω^{(j)} E    and    | δb | <= ω^{(j)} f,

such that

    ( A + δA ) x^{(j)} = b + δb,

and ω^{(j)} is the smallest number for which such δA and δb exist. Therefore, when we compute a small ω^{(j)} for a given E and f, the current iterate is the solution of a nearby problem.

At the end of each iteration we compute an ω^{(j)} that satisfies this definition, and we stop the iterations when either ω^{(j)} is reduced below a given threshold value or it cannot be reduced any more.

There are different ways of choosing E and f (see for instance Arioli, Demmel and Duff, and Arioli, Duff and Ruiz). Here we have used

    E = ||A|| e e^T    and    f = ||b|| e,

where e is a column vector of all ones. This choice of E and f corresponds to the normwise backward error defined by

    ω^{(j)} = || r(x^{(j)}) || / ( ||A|| ||x^{(j)}|| + ||b|| ),

and we stop the iterations when ω^{(j)} <= c, where c is the threshold value that determines the termination. A comparison between the normwise backward error and other stopping criteria is presented by Arioli, Duff and Ruiz; the authors have shown empirically that in some cases it is possible to drive ω to near the machine precision. Furthermore, the authors consider ω a useful stopping criterion for iterative methods because the spectral properties of the matrices are implicitly taken into account: ω uses the norms of the matrix and of the vectors to estimate the error.

Table: Complexity of the classical CG and of the Block-CG algorithms. For each algorithm, the FLOP count is given step by step for the startup phase and for one iteration cycle, together with the totals; the counts are expressed in terms of σ(A), n and the block size s. The FLOP count of the Block-CG solution-update step comes from the rewritten update formula with the accumulated normalization factors.

Consequently, ω, as with many iterative methods, is sensitive to scalings of the matrices.

In many applications, a different error estimate is used that requires less information than the normwise backward error, for instance an error estimate based on the residual norms. In this case one sets

    E = 0    and    f = || r(x^{(0)}) || e,

and thus the backward error at each iteration is estimated by

    ω~^{(j)} = || r(x^{(j)}) || / || r(x^{(0)}) ||,

and the iteration process stops when ω~^{(j)} <= c. Quite often one looks for only a few correct digits in the approximation to the solution of a linear system of equations; in these cases ω~ is commonly used.
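Both stopping tests are cheap to evaluate from quantities already available during the iteration. The sketch below computes them in Python/NumPy; the use of the infinity norm is an illustrative assumption, since the norm is not specified above.

import numpy as np

def omega_normwise(norm_A, norm_b, x, r):
    """Normwise backward error: ||r|| / (||A|| ||x|| + ||b||)."""
    return np.linalg.norm(r, np.inf) / (norm_A * np.linalg.norm(x, np.inf) + norm_b)

def omega_residual(r, norm_r0):
    """Residual-based estimate: ||r|| / ||r(x^(0))||."""
    return np.linalg.norm(r, np.inf) / norm_r0

# stop when omega_normwise(...) <= c, or when omega_residual(...) <= c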

Furthermore, the threshold value c can be related to the truncation error that comes from the discretization of the original problem. Thus, the maximum number of correct digits in an approximation depends on the truncation (or discretization) error incurred when discretizing the original continuous problem and writing out the system of linear equations.

We also present in the next chapter some results from runs of the classical CG and Block-CG using a stopping criterion based on ω~.


Chapter 3

Sequential Block-CG experiments

The purpose of the following experiments is to examine the behavior of the classical CG and Block-CG algorithms described in the previous chapter, and to conclude with some incentives for the parallelization of the Block-CG algorithm.

As illustrated in O'Leary, Block-CG is expected to converge faster than the classical CG for linear systems of equations with well-separated clusters of eigenvalues. In addition, Block-CG can simultaneously solve a linear system of equations with multiple right-hand sides. Some advantages of using Level 3 BLAS routines inside the Block-CG iteration are also shown in this chapter. Furthermore, the granularity of Block-CG is greater than the granularity of the classical CG, and we anticipate gaining more from implementing the Block-CG algorithm on vector and parallel computer systems.

In the first three sections we use two test problems that were previously used for testing a Lanczos algorithm with partial reorthogonalization (Simon). This version of the Lanczos algorithm is named LANPRO, from LANczos with Partial ReOrthogonalization. The matrices used in the tests of the LANPRO algorithm are available in the Harwell-Boeing collection (Duff, Grimes and Lewis).

In the Harwell-Boeing collection these seven matrices are labeled NOS1 to NOS7. Here we use the problems LANPRO NOS2 and LANPRO NOS3. These problems are particularly interesting because Simon compares the LANPRO algorithm with the classical CG and presents results from the solution of the LANPRO NOS2 problem, in which the classical CG converges poorly after more than n iterations.

The LANPRO test problems come from finite element approximations to problems in structural engineering. LANPRO NOS2 results from a discretization using a biharmonic operator on a beam; in this case, the beam has one end fixed and one end free (see part (a) of the figure below). The corresponding stiffness matrix was computed using a finite element approximation program named FEAP, for Finite Element Analysis Program (see Zienkiewicz and Taylor).

The biharmonic equation for the flexure of a plate is given by

    ∂^4 w/∂x^4 + 2 ∂^4 w/(∂x^2 ∂y^2) + ∂^4 w/∂y^4 = 12 (1 - ν^2) q / (E t^3),

where w is the displacement. In this problem, w is the result of applying a unit load at about the middle of the beam, and there is no displacement at the fixed end of the beam. Thus the boundary conditions at the fixed end are

    w = 0,    ∂w/∂x = ∂w/∂y = 0.


Figure: (a) A beam with one end fixed and the other end free; a unit load is applied at about the middle of the beam. The stiffness matrix corresponds to the LANPRO NOS2 problem. (b) A plate with one end fixed and the others free; a unit load is applied to one of the free corners of the plate. The stiffness matrix corresponds to the LANPRO NOS3 problem.

Figure: Sparsity pattern of the LANPRO NOS2 matrix from the Harwell-Boeing Sparse Matrix Collection (nz = 4137).

In the LANPRO NOS2 problem, a number of elements were used in the finite element approximation, each with several degrees of freedom. The figure above illustrates the sparsity pattern of this test problem; the stiffness matrix is SPD with 4137 nonzero elements. The next figures illustrate the eigenspectrum of this matrix.

Figure: Eigenspectrum of the LANPRO NOS2 matrix (condition number 5.100e+09).

Figure: Eigenvalues at the lower end of the eigenspectrum of LANPRO NOS2 (plotted interval from 3.084e+01 to 2.946e+06; 420 eigenvalues in the interval). Notice the cluster of eigenvalues near the lower end of the interval.

LANPRO NOS3 comes from applying a biharmonic operator to a rectangular plate. In this problem the plate has one side fixed and the others free (see part (b) of the figure above). The problem is discretized using the biharmonic operator, with the boundary conditions given above imposed at the fixed end.

Figure: Eigenvalues at the center of the eigenspectrum of LANPRO NOS2 (plotted interval from 1.888e+10 to 1.572e+11; 245 eigenvalues in the interval).

Figure: Eigenvalues at the upper end of the eigenspectrum of LANPRO NOS2 (plotted interval from 6.941e+09 to 1.573e+11; 276 eigenvalues in the interval).

Figure: Sparsity pattern of the LANPRO NOS3 matrix (nz = 15844).

The stiffness matrix of LANPRO NOS3 was also computed using FEAP. It is SPD of order 960, with 15844 nonzero elements. The sparsity pattern of this matrix is shown in the figure above, and its eigenspectrum is illustrated in the figures below.

As shown above, the test problem LANPRO NOS2 has two large clusters of eigenvalues at the extremes of the spectrum and a large condition number, about 5.1e+09. The LANPRO NOS3 problem, in contrast, has several clusters of eigenvalues, and these clusters are not as separated as in LANPRO NOS2; also, the condition number of LANPRO NOS3 is relatively small compared to that of LANPRO NOS2.

In the last experiment of this chapter we use a problem arising from oil reservoir modeling. This problem is included in the Harwell-Boeing sparse matrix collection under the SHERMAN name. The problem comes from a three-dimensional simulation model on a regular grid, and it has been discretized using a seven-point finite-difference approximation with one equation and one unknown at each grid point.

The results in this chapter are from sequential runs of implementations of the classical CG and of the Block-CG; results from parallel runs of both algorithms will be presented in the next chapter. The runs reported in this chapter were performed on an IBM RS/6000 system. The Mflops figure used below is the computational rate measured in millions of floating-point operations computed per second, and is different from the FLOP count, which represents the number of floating-point operations.

Figure: Eigenspectrum of the LANPRO NOS3 matrix (condition number 3.772e+04).

Figure: Distribution of the eigenvalues of LANPRO NOS3 (plotted interval from 1.829e-02 to 6.899e+02; 960 eigenvalues in the interval).


Solving the LANPRO NOS2 problem

Table: Sequential runs of the classical CG and Block-CG programs on the LANPRO NOS2 problem (block size, iteration count, equivalent iterations, and normwise backward error); the iterations are stopped once ω falls below the chosen threshold.

In this first experiment we use the LANPRO NOS2 problem, and we choose the block sizes 8, 16, 32 and 64 somewhat arbitrarily, to examine some effects of the block size on the performance of Block-CG. In this case we stop the iterations when ω falls below a fixed threshold, where ω is the normwise backward error defined in the previous chapter.

The first convergence figure below shows the convergence curves of the classical CG (labeled Block_CG(01)) and of four instances of Block-CG. In this figure, the number of iterations represents the iteration count of the classical CG and of the Block-CG algorithms.

In that figure, only a limited number of classical CG iterations are reported, although in this experiment the classical CG cannot reduce ω to the threshold even after a very large number of iterations. In the same figure, the iteration count needed to reduce ω to the threshold decreases as the block size is increased. On the other hand, Block-CG computes s orthogonal vectors at each iteration while the classical CG computes only one; therefore, the iteration count has to be multiplied by the block size to reflect the number of equivalent (or normalized) iterations. For example, 100 Block-CG iterations with block size 16 correspond to 1600 equivalent iterations.

The second convergence figure shows the convergence curves using equivalent iterations. The results from this experiment are summarized in the table above, and a second table below summarizes the results of similar runs after computing only n orthogonal vectors.

The solution of a linear system lies in the eigenspace of the matrix A, and any iterative method that successively multiplies the matrix A by a vector (or by another matrix, as in Block-CG) will include information from the eigenvalues at the upper end of the eigenspectrum in the solution during the first iterations. Information from the eigenvalues at the lower end of the eigenspectrum is likely to be included only during the last iterations of the method.

In finite arithmetic, information from the smallest eigenvalues is difficult to include in the solution when the matrix has a high condition number and there are clusters of eigenvalues at the lower end of the spectrum. For instance, two very small eigenvalues will converge to the same value after a few multiplications by the matrix A in finite arithmetic. Similar convergence behavior has been observed in our experiments solving LANPRO NOS2, and this explains the slow convergence rate of both the classical CG and the Block-CG.


Table: Sequential runs of the classical CG and Block-CG programs with a fixed number of equivalent iterations, LANPRO NOS2 problem (block size, iteration count, equivalent iterations, normwise backward error, execution time in seconds, and Mflops).

In the classical CG, the computational weight is in the A p^{(j)} product; thus σ(A) has a great influence on the total execution time. As shown in the two tables above, the execution time of Block-CG with the larger block sizes is significantly greater than the execution time of Block-CG with smaller block sizes. This effect means that the FLOP count in Block-CG can be greatly influenced by large values of s.

The size of the matrix A and its sparsity pattern are constant throughout the Block-CG iterations; thus the ratio of s σ(A) to the total FLOP count decreases as the block size is increased. Clearly, increasing the block size causes the FLOP count of operations that involve the s x s matrices to increase; for instance, in the stabilized Block-CG the factorization of the γ^{(j)} and β^{(j)} matrices becomes more expensive as s is increased, and this can easily be verified in the complexity estimates of the previous chapter. The next table shows the influence of s σ(A) on the total FLOP count for the runs shown above.

Table: Percentage of the total FLOP count generated by σ(A), as a function of the block size, LANPRO NOS2 problem.

The results shown in the two tables above are verified using a different stopping criterion, namely stopping the iterations when the residual-based estimate ω~ falls below a fixed threshold. The results are shown in the next table; as reported by Simon, the convergence rate of the classical CG when solving the LANPRO NOS2 problem is very slow. In that table, the behavior of Block-CG is similar to the one observed with the normwise stopping criterion.

Table: Sequential runs of the classical CG and Block-CG programs, using ω~ below a fixed threshold as the stopping criterion, LANPRO NOS2 problem (block size, iteration count, equivalent iterations, and execution time in seconds).

Empirically, it has been observed that Block-CG has a better chance of including information from the smallest eigenvalues in the solution when the block size is close to the size of the cluster of eigenvalues at the lower end of the spectrum. One reason is that more information from the spectrum is gathered at each iteration. In infinite precision, this implies that fewer matrix multiplications are performed to obtain the relevant information from the entire spectrum and include it in the solution. In finite arithmetic, the reduction in the number of matrix multiplications also reduces the perturbation of the solution due to roundoff errors from implicitly computing powers of A. Therefore, in some cases a large block size is recommended for Block-CG.

Solving again the LANPRO NOS2 problem

Table: Sequential runs of the CG and Block-CG programs, LANPRO NOS2 problem (block size, iteration count, equivalent iterations, normwise backward error, execution time in seconds, and Mflops).


Figure: (a) Convergence curves (normwise backward error versus number of iterations) of Block-CG with different block sizes: Block_CG(01), Block_CG(08), Block_CG(16), Block_CG(32) and Block_CG(64); (b) zoom of some of the convergence curves. Test problem: LANPRO NOS2.

Figure: Convergence curves of Block-CG with different block sizes, where in each curve the number of iterations is multiplied by its block size to show equivalent iterations; (b) zoom of some of the convergence curves. Test problem: LANPRO NOS2.


Again we use the LANPRO NOS2 problem. The three eigenvalue-distribution figures above show the distribution of eigenvalues at the lower end, the center, and the upper end of the eigenspectrum, respectively; in these figures the eigenvalues are sorted in ascending order. There is a cluster of eigenvalues at the lower end of the eigenspectrum of LANPRO NOS2, and the eigenvalues in this cluster lie close together.

After a few runs of Block-CG varying the block size around the size of this cluster, we have chosen to report the results for the smallest block size in that range, because Block-CG with all block sizes in the range exhibited the same behavior during the experiments.

The table above shows the results from sequential runs of the classical CG and of Block-CG with this block size. Again, the execution time of Block-CG has drastically increased with such a large block size when compared to the execution time of the classical CG. This increase is due to the Block-CG operations that involve the s x s matrices: when solving LANPRO NOS2 with this block size, the σ(A) term accounts for only a small percentage of the total FLOP count.

In the next table, the stopping criterion is based on ω~, and the iterations are stopped after reducing the relative residual norm below a fixed threshold. In this case the classical CG is stopped earlier than in the experiment reported above.

Table: Sequential runs of the classical CG and Block-CG with the chosen block size, using ω~ below a fixed threshold as the stopping criterion, LANPRO NOS2 problem (block size, iteration count, equivalent iterations, and execution time in seconds).

In the computational results of the two preceding sections, it is observed that the Mflop rate increases as the block size is increased. This effect is due to the use of Level 3 BLAS routines in Block-CG instead of the Level 2 BLAS routines that are used in the classical CG. Mainly, the use of Level 3 BLAS routines improves the ratio of FLOPs per memory reference.
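The effect can be reproduced with a toy comparison between applying A to an n-by-s block in a single call (a Level 3 BLAS style operation) and applying A to the s columns one at a time (Level 2 BLAS style); the matrix sizes below are arbitrary and the timings are only indicative.

import time
import numpy as np

n, s = 2000, 16
A = np.random.rand(n, n)
B = np.random.rand(n, s)

t0 = time.perf_counter()
C2 = np.column_stack([A @ B[:, i] for i in range(s)])   # s Level-2-style products
t1 = time.perf_counter()
C3 = A @ B                                              # one Level-3-style product
t2 = time.perf_counter()

assert np.allclose(C2, C3)
print(f"s mat-vecs: {t1 - t0:.3f}s, one mat-mat: {t2 - t1:.3f}s")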

Solving the LANPRO NOS3 problem

In this experiment we use the test problem LANPRO NOS3, and we choose the four block sizes 4, 8, 16 and 32 for Block-CG. The clusters of eigenvalues in this test problem are not very distant from one another, as shown in the eigenvalue-distribution figure above, and the condition number is smaller than that of LANPRO NOS2; therefore, all the eigenvalues are easily approximated even when using small block sizes.

The convergence curves for this experiment are shown in the two figures below. The first plots the convergence curves using the normwise backward error ω versus the iteration count, and the second plots ω versus the number of equivalent iterations.

Table: Sequential runs of the CG and Block-CG programs, LANPRO NOS3 problem (block size, iteration count, equivalent iterations, normwise backward error, execution time in seconds, and Mflops).

Table: Sequential runs of the classical CG and Block-CG programs with a fixed number of equivalent iterations, LANPRO NOS3 problem (block size, iteration count, equivalent iterations, normwise backward error, execution time in seconds, and Mflops).

Considering the iteration count, Block-CG converges faster as the block size is increased. However, in terms of equivalent iterations, the computational work necessary to reduce ω below the threshold increases as we increase the block size, and the classical CG converges faster than all four Block-CG instances.

The size of LANPRO NOS3 is 960, and Block-CG with the two largest block sizes computes more than 960 orthogonal vectors; thus Block-CG may have converged to an erroneous solution in both cases.

The first table above summarizes the convergence information from the two convergence figures. The classical CG takes a certain number of iterations to converge, and in the second table the number of equivalent iterations has been fixed at around that count in order to compare the computational work between the classical CG and the different instances of Block-CG. At that point in the computations, the classical CG has approximated the solution of LANPRO NOS3 better than the Block-CG instances.

Table: Sequential runs of the CG and Block-CG programs, using ω~ below a fixed threshold as the stopping criterion, solving the LANPRO NOS3 problem (block size, iteration count, equivalent iterations, execution time in seconds, and Mflops).

The table above summarizes the results from solving the LANPRO NOS3 problem with the same block sizes. Comparing these results with those obtained using the normwise criterion, the advantage of the stopping criterion based on ω~ is evident: even with the largest block size, relatively few orthogonal vectors are computed before reaching the threshold value, whereas with ω as the stopping criterion many more equivalent iterations are performed for the larger block sizes.

The next table shows the percentage of the total FLOP count generated by σ(A). Notice that the percentages have increased compared to those obtained for the LANPRO NOS2 problem; one of the reasons is that the LANPRO NOS3 stiffness matrix is denser than the one of the LANPRO NOS2 problem. As a result of the increased percentages, the differences between the execution times after varying the block sizes are smaller than those reported for the LANPRO NOS2 problem. Again, in this experiment the Level 3 BLAS effect improves the Mflop rate as the block size is increased.

[Figure : Convergence curves of Block-CG with different block sizes (normwise backward error versus number of iterations). Test problem LANPRO NOS3; curves (a)-(e) correspond to Block_CG(01), Block_CG(04), Block_CG(08), Block_CG(16) and Block_CG(32).]

[Figure : Convergence curves of Block-CG with different block sizes (normwise backward error versus number of equivalent iterations). In each curve the number of iterations is multiplied by the block size to show equivalent iterations. Test problem LANPRO NOS3; curves (a)-(e) correspond to Block_CG(01), Block_CG(04), Block_CG(08), Block_CG(16) and Block_CG(32).]

Table : Percentage of the total FLOP count generated by A, LANPRO NOS problem.
Rows: block size | percentage of the total FLOP count generated by A.

Solving the SHERMAN problem

As mentioned earlier, the SHERMAN matrix comes from an oil reservoir model. The SHERMAN matrix has only a symmetric pattern but is not SPD. In these experiments we solve instead A^T A, where A is the SHERMAN matrix, because A^T A is SPD. The clusters of eigenvalues in A^T A are more separated than in the original SHERMAN matrix. The spectrum of the original SHERMAN matrix is shown in Figure and the eigenspectrum of A^T A from the SHERMAN matrix is shown in Figure . Given the distribution of eigenvalues in A^T A, we anticipate that Block-CG will converge faster than Classical CG.

The original SHERMAN problem will be solved in Chapter using a block row projection method that is accelerated with Block-CG.

The sparsity pattern of A^T A from SHERMAN is shown in Figure . The matrix has nonzero entries and a condition number of .
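As a small illustration of this setup, the sketch below forms the "squared" SPD system and hands it to a CG-type solver; it is not the thesis code, and the matrix used is a random nonsymmetric sparse stand-in for SHERMAN4 (the thesis reads the real Harwell-Boeing matrix).

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

# Random nonsymmetric sparse stand-in for the SHERMAN4 matrix; only a symmetric
# sparsity pattern, not positive definiteness, is assumed of A itself.
rng = np.random.default_rng(0)
n = 1000
A = sp.random(n, n, density=0.005, random_state=rng, format="csr") + 2.0 * sp.eye(n)

B = (A.T @ A).tocsr()                    # the explicitly squared SPD system solved here
assert (abs(B - B.T) > 1e-12).nnz == 0   # B is symmetric by construction

b = np.ones(n)
x, info = cg(B, A.T @ b)                 # Classical CG on A^T A x = A^T b; Block-CG would
print(info, np.linalg.norm(B @ x - A.T @ b))  # handle s right-hand sides at once
```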

The block sizes were chosen for this experiment. In the results presented in Table , the stopping criterion was , where is the residual norm from . In Table it can be seen that for this case Block-CG performs better than Classical CG. The table shows that Block-CG with the two smallest block sizes has computed the solution with almost the same accuracy in fewer iterations than Classical CG, and with the next block sizes a more accurate solution is computed in fewer equivalent iterations.

Although in this case the execution of Block-CG with the largest block sizes took more time than Classical CG, it is expected that Block-CG will perform better in parallel environments because of the Mflop rates shown in Table .

We now repeat the same experiment using the normwise backward error as the stopping criterion. The threshold value has been chosen using information obtained during the runs from the first experiments reported in Table .
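For reference, one standard form of the normwise backward error for a system Hx = k is omega = ||k - Hx|| / (||H|| ||x|| + ||k||); the exact scaling used in the thesis is given by the equation referenced earlier, so the sketch below is only an assumed, illustrative version.

```python
import numpy as np

def normwise_backward_error(H, x, k):
    """One common normwise backward error (assumed form, not necessarily the
    exact scaling used in the thesis): ||k - Hx|| / (||H||*||x|| + ||k||),
    using the Frobenius norm for H and 2-norms for the vectors."""
    r = k - H @ x
    return np.linalg.norm(r) / (np.linalg.norm(H) * np.linalg.norm(x) + np.linalg.norm(k))
```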

Figures and show convergence curves using the iteration count and the equivalent iterations, respectively. Tables and summarize the results from the convergence curves. Table supports the results from Table , and again Block-CG runs faster than Classical CG for the smaller block sizes. In addition, Table shows that the larger block sizes have computed a solution with the requested accuracy after equivalent iterations, while Classical CG has not.

Table : Sequential runs of CG and Block-CG programs, SHERMAN problem.
Columns: block size | iteration count | equivalent iterations | normwise backward error | execution time (secs).

Table : Sequential runs of CG and Block-CG programs, SHERMAN problem.
Columns: block size | iteration count | equivalent iterations | normwise backward error | execution time (secs).

Table : Sequential runs of CG and Block-CG programs with a fixed number of iterations, SHERMAN problem.
Columns: block size | iteration count | equivalent iterations | normwise backward error | execution time (secs) | Mflops.

[Figure : Sparsity pattern of A^T A from the SHERMAN4 matrix (nz = 10346).]

Table : Percentage of the total FLOP count generated by A, SHERMAN problem.
Rows: block size | percentage of the total FLOP count generated by A.

As shown in Table , the percentage of the total FLOP count generated by A decreases as the block size is increased. Furthermore, these percentages decrease significantly for the largest block sizes, which implies that in these cases the total FLOP count and the execution time are dominated by the block size. As in the previous experiments, the Level 3 BLAS effect increases the Mflop rate as the block size is increased.

[Figure : Eigenspectrum of the SHERMAN4 matrix (condition number 2.179e+03).]

[Figure : Eigenspectrum of A^T A from the SHERMAN4 matrix (condition number 4.744e+06).]

[Figure : Convergence curves of Block-CG with different block sizes for A^T A from SHERMAN4 (normwise backward error versus number of iterations); curves (a)-(f) correspond to Block_CG(01), Block_CG(04), Block_CG(08), Block_CG(16), Block_CG(32) and Block_CG(64).]

[Figure : Convergence curves of Block-CG with different block sizes for A^T A from SHERMAN4 (normwise backward error versus number of equivalent iterations). In each curve the number of iterations is multiplied by the block size to show equivalent iterations; curves (a)-(f) correspond to Block_CG(01), Block_CG(04), Block_CG(08), Block_CG(16), Block_CG(32) and Block_CG(64).]

Remarks

Table : Summary of results from the previous experiments: comparison between Block-CG and solving the linear system s times with Classical CG, for the LANPRO NOS , LANPRO NOS and SHERMAN A^T A problems. For each problem the columns give, per block size, the execution time of Classical CG, the execution time of Block-CG, and the performance ratio (execution time of Classical CG divided by that of Block-CG). Execution times are in seconds. For each problem only a selected number of block sizes were reported in the previous sections of this chapter; therefore there are some empty entries in the table.

In some cases it is not trivial to choose a suitable block size for the Block-CG algorithm without additional information about the problem to be solved. For instance, a linear system of equations with a high condition number may require a prior analysis of the eigenspectrum, and computing the eigenspectrum is more expensive than solving the linear system of equations.

From the Block-CG results presented in this chapter, the Mflop rate increases as the block size is increased. The Mflop rates reported in Tables , , and show that as the granularity is increased, the Mflop rate is increased. Furthermore, the increasing Mflop rate in Block-CG can produce computational gains on vector and parallel computers even for linear systems in which Classical CG converges a few iterations faster than Block-CG.

In a parallel distributed environment, the number of synchronization points of Classical CG is implicitly reduced in Block-CG. In Block-CG, the information that needs to be communicated in one iteration equals the information that needs to be communicated in s iterations of Classical CG. Thus the number of synchronization points while computing the orthogonal vectors is reduced as the block size of Block-CG is increased. In this case Block-CG is preferred over Classical CG even if Block-CG communicates longer messages than Classical CG. Indeed, the possible bottlenecks in the communication are associated with the volume of messages being exchanged and the expensive startup or latency time.

When solving a linear system of equations with multiple right-hand sides, Block-CG will perform better than Classical CG, since the s solutions are computed simultaneously.

Table compares the performance of Block-CG and Classical CG for solving a linear system of equations with s right-hand sides. The information in Table has been extracted from results presented in previous sections. The execution times of Classical CG presented in previous tables have been multiplied by the block size to estimate the execution time of Classical CG for solving the linear system of equations s times.

In all of the results presented in this chapter, using Block-CG has been faster than s applications of Classical CG. In other words, in none of the cases has Block-CG cost more than s times Classical CG, although at each Block-CG iteration the FLOP count is increased by a factor of s when compared with a Classical CG iteration.

Increasing the block size in Block-CG also increases the granularity of the problem to be solved, and this effect makes Block-CG more suitable for parallelization. In Chapter we study the parallelization of the Block-CG Algorithm for distributed computing environments, and in Chapter we present results from parallel experiments that support our expectation that Block-CG is suitable for parallel environments.

Chapter

Parallel Block-CG implementation

In this chapter we present three different parallel versions of the Block-CG Algorithm for distributed memory architectures (see also Drummond, Duff and Ruiz ). First we present the partitioning and scheduling strategies used in the three implementations. Then we discuss and compare the three parallel distributed implementations.

Partitioning and scheduling strategies

At this point we look for a problem partitioning strategy to manage the creation of subproblems that can be solved in parallel, and a strategy to distribute these tasks among the processing elements. Here we recall that the problem at hand is to find a solution X to the linear system of equations

    H X = K,

where H is an n x n symmetric positive definite (SPD) matrix, X and K are n x s matrices, and s is the number of solution vectors in the system, or the block size for Block-CG.

The stabilized Block-CG Algorithm introduced in Chapter will be used in the solution of .
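For orientation, the sketch below gives the plain textbook block conjugate gradient iteration for H X = K (a minimal numpy illustration; it omits the stabilisation and orthonormalisation steps of the thesis' algorithm and is not the thesis code).

```python
import numpy as np

def block_cg(H, K, X0=None, tol=1e-10, maxiter=500):
    """Textbook (unstabilised) block CG for an SPD matrix H and s right-hand sides."""
    n, s = K.shape
    X = np.zeros((n, s)) if X0 is None else X0.copy()
    R = K - H @ X                      # block residual
    P = R.copy()                       # block search directions
    gamma = R.T @ R                    # s x s Gram matrix of the residuals
    for _ in range(maxiter):
        Q = H @ P
        lam = np.linalg.solve(P.T @ Q, gamma)   # s x s step matrix
        X += P @ lam
        R -= Q @ lam
        gamma_new = R.T @ R
        if np.sqrt(gamma_new.trace()) <= tol * np.linalg.norm(K):
            break
        P = R + P @ np.linalg.solve(gamma, gamma_new)
        gamma = gamma_new
    return X
```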

Partitioning strategy

First, a column partition of the matrix H into l submatrices is performed (l = 4 in Figure ). A column partition has been preferred over a row partition for the following reasons:

- There is no need to send the full X and K matrices to every computing element (CE) during the task distribution. Rather, each CE receives one or more parts of these matrices (e.g., in Figure , X_1, ..., X_4 and K_1, ..., K_4).

- The number of messages that need to be sent is independent of a column or row partition. However, when computing the HX or the HP products in parallel, shorter messages are sent with a column partitioning strategy than with a row one; in the latter, it may be required to send the X and P matrices in full to perform the parallel products.

- In the FORTRAN language it is easier and faster to access consecutive elements of a matrix columnwise than rowwise.

[Figure : Column-partition strategy of the linear system of equations H X^(0) = K; initial definition of tasks. The example partitions H into four column blocks H_1, ..., H_4 with matching pieces X_1, ..., X_4 and K_1, ..., K_4.]

In the partitioning strategy, the parameter k is defined as the number of columns to be included in every submatrix. If k does not divide n exactly, the last submatrix will have fewer than k columns. These submatrices are labeled H_1, H_2, ..., H_l.

A second step is to identify row boundaries for every submatrix H_i. The row boundaries for a submatrix H_i are defined by the first and last rows in H_i with nonzero entries. As a result, every submatrix H_i can be considered as a rectangular matrix of dimension n_i x k_i, where n_i is the number of rows between the row boundaries of H_i and k_i is equal to k, except that k_l may be smaller. The partition of the X and K matrices is performed after completing the partition of the H matrix.

Partitioning the X and K matrices results in submatrices X_1, ..., X_l and K_1, ..., K_l, respectively. The X_i submatrices need to be product-compatible with the H_i submatrices. The matrix K is partitioned into submatrices of k rows each, again except for the last submatrix, which may have fewer than k rows. Figure shows a column partition for a system of linear equations with a block size for Block-CG equal to three. From now on, we define a task as a triplet of matrices (H_i, X_i, K_i). The resulting tasks from the partition will be performed in parallel under different programming models, to be introduced in Section ; a sketch of the partitioning step itself is given below.
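The following sketch (an illustrative numpy/scipy reconstruction under the assumptions above, not the thesis' FORTRAN code) builds the task triplets (H_i, X_i, K_i): it cuts H into column blocks of k columns, finds each block's row boundaries as the first and last rows containing nonzeros, and partitions X and K conformally.

```python
import scipy.sparse as sp

def make_tasks(H, X, K, k):
    """Column-partition H into blocks of k columns and build task triplets.

    Each H_i is restricted to its row boundaries (first and last rows holding
    nonzeros); X is split by the same column ranges so that H_i @ X_i is well
    defined, and K is split into blocks of k rows.
    """
    H = sp.csc_matrix(H)
    n = H.shape[0]
    tasks = []
    for start in range(0, n, k):
        cols = slice(start, min(start + k, n))
        Hi = H[:, cols].tocsr()
        rows = Hi.nonzero()[0]                 # rows with nonzero entries
        lo, hi = rows.min(), rows.max() + 1    # row boundaries of H_i
        tasks.append((Hi[lo:hi, :],            # n_i x k_i rectangular block
                      X[cols, :],              # product-compatible piece of X
                      K[cols, :]))             # k_i rows of K
    return tasks
```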

In parallel processing, a reduce operation gathers partial results from parallel computations and merges these partial results to obtain a global one. In Algorithm , the operations

    R^(0) = K - H X^(0)                  (in step )

and

    R^(j+1) = R^(j) - (H P^(j)) lambda^(j)   (in step )

involve, respectively, the products H_i X_i^(0) and H_i P_i^(j). When these products are computed in parallel, one of the CEs performs a reduce operation to obtain the global result for the matrix-matrix product.

[Figure : Example of a reduce operation. The (H X^(0))_i matrices are the partial results from the H_i X_i^(0) products; the global H X^(0) is obtained after merging the partial results and adding the rows of the partial products that overlap.]

j

have dimensions n s and if the H and HP matrix pro duct The partial results HX

i i

i i

l

X

submatrices corresp ond to the blo cks on the diagonal of the matrix H then n n and the

i

i

global result from the pro duct is obtained by simply merging the partial results

l

X

On the other hand if there are rowoverlaps b etween the H submatrices then n nand

i i

i

the global result from the pro duct is obtained by merging the partial results and adding the rows

from the partial results that overlap Figure shows an example of rowoverlaps b etween the

submatrices HX HX HX and HX for the column partitioning strategy illustrated

in Figure Whether these partial pro ducts are computed in a single pro cessor or in a cluster

of pro cessors the information overlaps require data manipulation to obtain a correct answer to

the matrixmatrix pro duct For instance in Figure the second piece of the HX matrix

HX is obtained after the addition of the rows that overlap the three partial matrices HX

and HX
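As an illustration of this merge-and-add step, the sketch below (an assumption-laden toy version in numpy, not the thesis code) accumulates partial products of size n_i x s into the global n x s result, given each block's first global row index.

```python
import numpy as np

def reduce_partials(partials, first_rows, n, s):
    """Merge partial block products into the global n x s result.

    partials   : list of n_i x s arrays (one per task)
    first_rows : global index of the first row covered by each partial result
    Overlapping rows are summed, which is exactly what the reduce operation
    has to do when the H_i row ranges overlap.
    """
    global_result = np.zeros((n, s))
    for part, lo in zip(partials, first_rows):
        global_result[lo:lo + part.shape[0], :] += part
    return global_result
```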

In a parallel distributed environment, this reduce operation can be centralized in one CE or distributed among all the CEs. In the former, each CE sends back its partial result from the product to the CE performing the reduce operation. In the distributed reduce, every CE has information on the neighboring CEs with which it needs to exchange data, and the reduce operation is also performed in parallel. Figure illustrates a centralized reduce operation and Figure illustrates a distributed reduce operation.

[Figure : Centralized reduce operation. CEX is the computing element responsible for gathering the partial results M_1, ..., M_4 from CE1-CE4, merging them into the global matrix GMat, and broadcasting back parts G_1, ..., G_4 of the global result to the other CEs.]

After the completion of the reduce operation in the H X^(0) product, each CE gets back an updated (H X^(0))_i, which is a copy of n_i rows of the global H X^(0) matrix. If the centralized reduce is used, the CE performing the reduce operation sends back these updated matrices to the other CEs; in the distributed reduce, the update of these matrices is done implicitly.

In the parallel computations of the i-th task triplet, the operations in and become

    R_i^(0) = K_i - (H X^(0))_i

and

    R_i^(j+1) = R_i^(j) - (H P^(j))_i lambda^(j).

Since the dimensions of the K_i matrix are k_i x s, a strategy must be defined to select k_i rows from the n_i rows available in (H X^(0))_i. Otherwise, if all of the n_i rows are used in the computations,

[Figure : Distributed reduce operation. Data exchanges are performed in parallel between the CEs; at the end, each CE holds a part G_i of the global result.]

there will be duplicated data in one or more CEs. The duplicated data will perturb the Block-CG computations when not handled properly. Moreover, the k_i rows selected for the i-th task triplet must not overlap with the k_j rows selected for the j-th task triplet (i != j). This strategy is depicted in Figure .

In Figure , GMat can be any global matrix that results from a parallel distributed operation. The Mat_i matrices are in the local memory of a CE. The shaded areas in the Mat_i matrices represent the k_i rows that are used in the computations associated with the i-th task triplet.

The strategy is based on the following assumptions:

- The matrix H is SPD, thus it has a full diagonal.

- k_i is the number of columns in H_i, and sum_{i=1}^{l} k_i = n.

- The matrix GMat has n rows.

The expression in finds the first of the k_i rows in Mat_i, where Fc_i is the column number in H of the first column in H_i, and Fr_i is the row number in H of the first row in H_i. Then

    FirstRow(Mat_i) = |Fc_i - Fr_i|,

where | | denotes the absolute value.
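A toy version of this row-selection rule, under the assumptions listed above (illustrative only; indices are 0-based here, whereas the thesis works with FORTRAN-style 1-based indexing):

```python
def first_row(fc_i, fr_i):
    """First of the k_i rows of Mat_i used locally: FirstRow(Mat_i) = |Fc_i - Fr_i|.

    fc_i : column number in H of the first column of H_i
    fr_i : row number in H of the first row of H_i
    """
    return abs(fc_i - fr_i)

def select_local_rows(mat_i, fc_i, fr_i, k_i):
    """Return the k_i non-duplicated rows of the local matrix Mat_i that this
    task contributes to the global Block-CG computation."""
    start = first_row(fc_i, fr_i)
    return mat_i[start:start + k_i, :]
```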

[Figure : Distribution of a global result matrix GMat. The Mat_i matrices represent either the R_i^(j+1), the P_i^(j+1), or the (H P^(j))_i matrices used in the parallel computations of the Block-CG algorithm. The shaded k_i rows of each Mat_i are the global information used in the local Block-CG computations.]

Scheduling strategy

Among the factors that determine the quality of a parallel implementation are:

- the design of the parallel algorithm,
- the size of the problems to be solved,
- the computing environment,
- the number of CEs used,
- the efficient use of each CE.

A parallel scheduler can be seen as a tool for tuning the last factor given the other four.

In this section we introduce a static scheduler that is used in Section inside the parallel implementations of the Block-CG Algorithm . This static scheduler tries to evenly distribute the total workload among the CEs. A balanced distribution of the workload is important to use all of the CEs efficiently in a parallel run.

This static scheduler considers two parameters for distributing the workload among the CEs: the first is the size of each subproblem that results from the problem partition (i.e., number of rows times number of columns), and the second is the number of overlapping rows between different subproblems. Therefore this scheduler is designed for homogeneous computing environments. In Chapter we present a more general scheduler for heterogeneous computing environments.

First of all, the static scheduler used in the parallel Block-CG implementations calls a function that computes a weight factor per task triplet T_i. For each task T_i, one quantity is the number of nonzeros in H_i, and another is the total number of row overlaps between the submatrix H_i and all of the other submatrices. The function W(T_i) that assigns a weight factor to each task is defined in terms of these two quantities. The nonzero count determines the FLOP count associated with T_i and can thus be used as an estimate of the computational work associated with the task; the overlap count, on the other hand, is used to estimate the communication associated with T_i.

Once the weight factors are computed, the scheduler sorts the tasks in descending order of their weight factors into a work list. The scheduler then dispatches these tasks to the available CEs in the order they appear in the work list.

To maintain workload balance among the CEs, the scheduler keeps track of the workload already assigned to each CE and assigns the next task in the work list to one of the CEs with the lowest workload factor. This scheduling strategy, sketched below, can be seen as a solution to a bin-packing problem; see for instance Dror , Mizuike, Ito, Kennedy and Nguyen , and Mayr .
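A compact illustration of this static scheduling strategy (an assumed reconstruction: the weight factor is taken here as the plain sum of the nonzero count and the overlap count, and ties are broken arbitrarily; the thesis' exact weighting may differ):

```python
import heapq

def static_schedule(tasks, num_ces):
    """Greedy assignment of tasks to CEs, heaviest tasks first.

    tasks : list of (task_id, nnz, overlaps); the weight is assumed to be
            nnz + overlaps (see the lead-in above).
    Returns, for each CE, the list of task ids it was assigned.
    """
    # Sort tasks by decreasing weight, as in the scheduler's work list.
    work_list = sorted(tasks, key=lambda t: t[1] + t[2], reverse=True)

    # Min-heap of (accumulated workload, CE index): always pick the least loaded CE.
    loads = [(0, ce) for ce in range(num_ces)]
    heapq.heapify(loads)
    assignment = [[] for _ in range(num_ces)]

    for task_id, nnz, overlaps in work_list:
        load, ce = heapq.heappop(loads)
        assignment[ce].append(task_id)
        heapq.heappush(loads, (load + nnz + overlaps, ce))
    return assignment
```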

Implementations of Block-CG

We present three different parallel implementations of the Block-CG Algorithm . First, we develop an All-to-All parallel version in which we parallelize the entire Block-CG algorithm. The other two implementations are based on a master-slave computing model, with different approaches to determine which sections of the algorithm are performed in parallel and which ones are performed sequentially.

We have used the p4 library of parallel routines (Butler and Lusk ) to manage the parallel environments on the different machines; we have also developed versions using the PVM library of routines (Beguelin, Dongarra, Geist et al. and Geist, Beguelin, Dongarra et al. ).

The computing elements (CEs) can all be physically part of the same computer or of a network of heterogeneous distributed processors. For the remainder of this chapter we focus our discussion on the processes and the tasks they perform in each implementation, rather than on the CEs, because processes are the active entities in the execution of a program, and on some computer platforms the user does not have direct control over the CEs.

All-to-All implementation

In this implementation we perform a full parallel implementation of the Block-CG algorithm, at the expense of redundant computations that are carried out in parallel by different processes, referred to here as the worker processes. At first there is an initiator process, which creates the other worker processes and generates the task triplets (H_i, X_i, K_i). Later, the initiator process becomes one of the workers and works on some of the generated tasks.

One or more tasks can be assigned to a worker process. After a worker process receives one or more tasks, it builds a set of matrices, one set per task, that corresponds to the set of matrices used in Algorithm . Figure illustrates an example of a task triplet and the set of locally generated matrices associated with the task. During an iteration, a worker process communicates to its neighbors the results of its computations of H_i P_i; each worker needs to communicate only with neighbors whose tasks have information that overlaps its own. Then each worker broadcasts the R_i^T R_i and P_i^T (H P)_i matrices to all the other workers.

[Figure : Block-CG matrices kept locally by each process for task i (H_i, X_i, K_i, R_i, P_i, HP_i and the small Gamma, Beta, Alpha and Lambda matrices). Overlapped rows need to be exchanged with neighboring tasks; there is one set of matrices per task assigned to a process.]

Every worker performs an iteration of the Block-CG and tests whether the stopping criterion has been met. If the stopping criterion has not been met, the worker processes start a new iteration. At the end, when the stopping criterion is satisfied, each worker process reports its part of the solution to the initiator process, which assembles the solution of the whole system. Figure captures the interactions between processes in the All-to-All implementation.

Clearly, the communication between processes has an impact on the performance of the three parallel Block-CG implementations presented in this chapter. Thus, we study the number of messages sent at each iteration for all three parallel Block-CG implementations.

Let m_k be the number of neighboring workers with which worker_k will exchange partial results, and let p be the total number of workers.

For every product H_i X_i^(0) and H_i P_i^(j):

    sum_{k=1}^{p} m_k  =  M messages,

and the length of these messages varies according to the number of rows worker_k needs to exchange with each of its neighboring workers.

For every product R^T R (all-to-all):

    p(p - 1) messages of length s x s.

For every product P^T (H P) (all-to-all):

    p(p - 1) messages of length s x s.

This leads to 2p(p - 1) + M messages, of lengths varying from s x s to max_i(n_i x s), where n_i is the number of rows in H_i.

[Figure : Scheme of the All-to-All parallel Block-CG implementation. The flowchart alternates sequential and parallel phases: global and local initialisation; computation of R_i^(0) from H_i X_i^(0); completion of the local R_i and HP_i through exchanges with specific processes; building of the local gamma_i = R_i^T R_i and beta_i = P_i^T HP_i with broadcasts; local Block-CG computations; and the termination test.]

Master-Slave distributed Block-CG implementation

In this implementation we use a master-slave programming model in which the master process monitors the program execution while the Block-CG algorithm is run at the slave process level. Initially, a master process is created, which in turn creates the slave processes and generates the tasks (H_i, X_i, K_i). The master calls the static scheduler to distribute the tasks. After scheduling the tasks and dispatching them to the slave processes, the master process performs three reduce operations at each iteration.

In the first reduce, the master collects from the slave processes their results from computing the products R_i^T R_i. The master then combines these partial results to build the full R^T R. The resulting matrix is referenced as gamma in the Block-CG Algorithm . This matrix is later factorized by the master and broadcast back to the slave processes. During this reduce operation only the master is active while the slave processes wait for the answer; this problem was previously identified as a synchronization point in Chapter for the CG algorithm.

In the second reduce, the master collects the partial products P_i^T (H P)_i from the slave processes. This time the resulting matrix corresponds to the matrix beta in the Block-CG Algorithm . The master also factorizes it and broadcasts back the results to the slave processes. In the third reduce operation, the master runs the test for termination. For the convergence test, the master gathers information that is local to the slave processes (local residuals and ||X_i^(j)||), and when the stopping criterion is met, the master process signals termination to the slave processes. Otherwise, the master process signals continuation to the slave processes, and the slaves start a new parallel iteration.

The slave processes initially receive from the master process at least one task to execute. Then the slave processes build locally their partial matrices, and these matrices are kept in memory until the master process signals termination. In Figure , the partial matrices gamma_i and beta_i are computed locally by each slave process; the first and second reduce operations transform these partial matrices into the global gamma and beta, respectively.

[Figure : Scheme of the Master-Slave distributed parallel Block-CG implementation. As in the All-to-All scheme, the slaves complete their local R_i and HP_i and compute gamma_i = R_i^T R_i and beta_i = P_i^T HP_i, but the global gamma and beta are built by the master (all slaves send to one process), which broadcasts the results back before the local Block-CG computations and the termination test.]

As mentioned before, there may exist row overlaps between two or more tasks while computing the H_i X_i and H_i P_i products. A process working on a task with row overlaps has to communicate with its neighbors to complete its part of the global product. In addition, one or more tasks with rows that overlap may be scheduled to the same slave process, and in this case the overlap is handled locally.

To study the average number of messages sent at every iteration step, we use the same set of assumptions made for the All-to-All implementation (Section ).

For every product H_i P_i:

    sum_{k=1}^{p} m_k  =  M messages,

and the length of these messages varies according to the number of rows slave_k will exchange with each of its neighboring slaves.

For every product R_i^T R_i, one message in every direction between the master and slave_k:

    2p messages of length s x s.

For every product P_i^T (H P)_i, one message in every direction between the master and slave_k:

    2p messages of length s x s.

This results in 4p + M messages, of lengths varying from s x s to max_i(n_i x s), where n_i is the number of rows in H_i.

Figure captures the interaction between the master process and the slave processes under this parallel implementation.

Master-Slave centralized Block-CG implementation

Lastly, we present an implementation based on a master-slave programming model that differs from the previous one in the roles of the master and the slave processes. In this parallel implementation, only the H P^(j) product is parallelized.

The master process generates the tasks by dividing the matrix H from into submatrices H_i. Here the initial tasks are no longer the task triplets (H_i, X_i, K_i) but (H_i, X_i^(0)) for the first iteration and (H_i, P_i^(j)) for the main iteration (see Figure ). The master process spawns a group of slave processes and then distributes the H_i submatrices among the slave processes. The master process executes all the steps of the Block-CG algorithm except for the products H P^(j), which are performed in parallel by the slave processes. Therefore, the master process partitions and distributes the matrix P^(j) in a way that is product-compatible with the already distributed H_i's. In the reduce operation, the slave processes send their partial (H P^(j))_i to the master process, and the master assembles the global H P^(j) matrix.

Figure shows the interaction between the master and the slave processes in this parallel implementation. At the end of an iteration, the master process performs the test for termination and signals termination or continuation to the slave processes.

We conduct the same study as before to calculate the number of messages sent at each iteration. For every product H_i P_i:

    p messages of length n_i x s

in every direction between the master process and the slave processes, resulting in 2p messages being sent at each iteration step.

Table summarizes the number of messages sent at each iteration of the Block-CG Algorithm under each of the parallel implementations presented here. The third implementation (Master-Slave centralized Block-CG) involves fewer messages than the other two implementations. However, the messages sent in the Master-Slave centralized Block-CG implementation are longer than the messages sent in either of the other two implementations: in this implementation the master process sends at each iteration the P_i^(j) submatrices to each slave process, while in the other two implementations this information is kept locally and processes only exchange parts of the (H P^(j))_i and small s x s matrices.

[Figure : Scheme of the Master-Slave centralized parallel Block-CG implementation. Only the HP products are computed in parallel: the slaves compute (H P)_i = H_i P_i and send them to the master (all send to one process), which completes R and HP, performs the remaining Block-CG computations, including gamma = R^T R and beta = P^T HP, and runs the termination test.]

Table : Messages sent per iteration by each parallel Block-CG implementation.
    All-to-All:                          2p(p - 1) + sum_{k=1}^{p} m_k
    Master-Slave distributed Block-CG:   4p + sum_{k=1}^{p} m_k
    Master-Slave centralized Block-CG:   2p

Chapter

Parallel Block-CG experiments

The purpose of this chapter is to compare the performance of the three parallel Block-CG implementations described in Chapter and to show the advantages of parallelizing the Block-CG algorithm.

In this chapter we assume throughout all the experiments that a parallel run of Block-CG with a block size of one is numerically equivalent to a run of Classical CG, as can be verified from Algorithms and . Thus the results from runs of Classical CG are presented under the columns labeled Block-CG with block size of one in all of the tables in this chapter.

In the three parallel Block-CG implementations, a run with only one CE involves extra overhead that is not present in a sequential run of the Block-CG implementation. In the All-to-All implementation the overhead is very small and is due to the data manipulation provided for handling the parallelism. In the other two implementations the overhead comes from the creation of one slave process, and this overhead is more significant than the one in the All-to-All implementation.

The matrix that results from squaring the SHERMAN matrix (see Section ) is used in the first parallel experiment. Next, the LANPRO NOS and LANPRO NOS problems are solved in parallel. For these two problems the block sizes have been chosen based on the results and remarks from Chapter .

In Section a different problem is used. It comes from a finite difference model of Laplace's equation (see for example Strang ):

    \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0.

When the right-hand side of is different from zero, the equation is Poisson's equation. Applying a central difference approximation to the second derivatives,

    \frac{\partial^2 u}{\partial x^2} \approx \frac{u(x+h) - 2u(x) + u(x-h)}{h^2}

and

    \frac{\partial^2 u}{\partial y^2} \approx \frac{u(y+h) - 2u(y) + u(y-h)}{h^2},

the right-hand side approaches the left-hand side as h goes to zero. The differential equations \partial^2 u / \partial x^2 = f and \partial^2 u / \partial y^2 = f are replaced by a finite difference approximation of the values of u at the mesh points x = h, 2h, ..., Nh.

Figure (a) shows the 5-point molecule resulting from the combination of and . The boundary conditions at the edge of the grid are u = 0. The points in the grid are labeled from 1 to n, n = N x N, and there is one linear equation per grid point. The resulting matrix A is depicted in Figure (b). The matrix is SPD with 4's along the main diagonal and -1's in the other diagonals; because of the boundary conditions, some -1's are removed from the other diagonals.

A problem based on Poisson's equation is solved in Section . The grid size is , thus the matrix A is of order with nonzero elements.

[Figure : 5-point discretization of Laplace's equation using finite differences. (a) The 5-point approximation on an N x N grid with n = N x N unknowns; (b) structure of the matrix A that results from the discretization, with 4's on the main diagonal and -1's on the off-diagonals.]
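For concreteness, the following sketch (an illustrative reconstruction, not the thesis code; the grid size below is arbitrary and not the one used in Section ) assembles the SPD matrix A of the 5-point discretization with homogeneous Dirichlet boundary conditions, using a Kronecker-product formulation.

```python
import numpy as np
import scipy.sparse as sp

def laplacian_5pt(N):
    """Matrix of the 5-point finite-difference Laplacian on an N x N grid with
    u = 0 on the boundary: 4 on the diagonal, -1 on the off-diagonals (some
    -1's disappear at the grid edges)."""
    e = np.ones(N)
    T = sp.spdiags([-e, 2 * e, -e], [-1, 0, 1], N, N)   # 1D second-difference matrix
    I = sp.identity(N)
    return (sp.kron(I, T) + sp.kron(T, I)).tocsr()      # order n = N*N

A = laplacian_5pt(32)
print(A.shape, A.nnz)   # (1024, 1024) with 4992 nonzero elements for this grid size
```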

In this chapter the performance of the parallel implementations is measured in terms of two parameters: the speedup and the efficiency. The speedup is defined by

    speedup = (execution time of the best sequential run) / (execution time of the parallel run using p CEs),

and the efficiency by

    E = speedup / p.

The speedup is the factor of reduction in the execution time when moving from a sequential implementation to a parallel implementation, and ideally the speedup equals the number of CEs used during a parallel run. In some less favorable cases the parallel implementation requires more time than the sequential implementation, and thus speedup < 1. The efficiency gives an estimate of the contribution of the p processors to the speedup, and ideally it is equal to one.
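A small worked illustration of these two measures (the numbers are made up and are not taken from the tables in this chapter):

```python
def speedup(t_seq, t_par):
    """Speedup: best sequential execution time divided by the parallel execution time."""
    return t_seq / t_par

def efficiency(t_seq, t_par, p):
    """Efficiency: speedup divided by the number of CEs used."""
    return speedup(t_seq, t_par) / p

# Example: a 40 s sequential run reduced to 12.5 s on 4 CEs.
print(speedup(40.0, 12.5), efficiency(40.0, 12.5, 4))   # 3.2 and 0.8
```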

The following parallel experiments were run on three different parallel computer systems. The first platform is a Thin-node SP with RS processors. Each node has a theoretical peak performance of Mflops. The nodes are connected through a high performance switch with a latency time of microseconds and a bandwidth of to Mbytes per second.

The second platform is a BBN TC computer system installed at Argonne National Laboratory. The TC system is a virtual shared memory computer with a peak performance of Mflops per node. The nodes are interconnected through a multistage network processor called the Butterfly switch.

The last computing platform is a network of SUN Sparc workstations. Each workstation has a peak performance of Mflops, and they are connected through ETHERNET. The peak performance of the ETHERNET network is Mbytes per second.

Parallel solution of the SHERMAN problem

Some advantages of using Block-CG over Classical CG were presented in Section with the results from sequential runs of Block-CG on the A^T A from SHERMAN problem. In this section the results from parallel runs of Block-CG on the A^T A SHERMAN problem are presented. The block sizes used in these experiments are the same as in Section , and Tables to summarize the results from the three different parallel Block-CG implementations.

The results from parallel runs of the Block-CG All-to-All implementation are shown in Tables and . The runs were performed on an SP computer. The bottlenecks of parallelizing Block-CG show up in these tables as the low speedups obtained. Increasing the block size gives better speedups; however, the efficiency factors decrease as the number of CEs is increased.

The execution time of a sequential run of Block-CG solving the A^T A SHERMAN problem with a block size of is almost the same as the execution time of a sequential run of Classical CG on the same problem. As shown in Section , this block size converges in fewer equivalent iterations than Classical CG, and after parallelizing the Block-CG algorithm the execution time for the solution of the A^T A SHERMAN problem is reduced by almost with this block size when computing on CEs.

In Tables and , the results from parallel runs of the Block-CG Master-Slave distributed implementation are presented. The runs were performed on an SP computer system. The parallel Classical CG implementation using the Master-Slave distributed scheme also reports low speedups, but they are better than the speedups obtained with the All-to-All implementation, and this is due to the number of messages being exchanged in each implementation at both synchronization points.

In the All-to-All implementation, the execution of the synchronization points takes more time than in the Master-Slave distributed implementation because of the number of messages being exchanged. In both implementations only a scalar is exchanged in a message, which does not compensate for the expense of the communication relative to the computations.

Increasing the block size improves the performance of the Master-Slave distributed implementation. In Table , the execution time for the solution of the A^T A SHERMAN problem using CEs is reduced by almost from the sequential version. In Table the performance is improved even more when increasing the block size, because a reduction in the execution time of almost from the sequential implementation is obtained; in this case the execution time is reduced by almost from the execution of the sequential Classical CG.

Even though in some cases Block-CG takes a few more iterations to converge than Classical CG, with the Master-Slave distributed implementation it is possible to present an example in which the performance of Block-CG is improved by the parallel implementation. According to the results reported in Table , Block-CG with a block size of is computationally more expensive than Classical CG; however, the performance of Block-CG with that block size is also improved, as the results presented in Table show.

Table : Results from the All-to-All parallel implementation of Block-CG on the solution of the A^T A SHERMAN problem (times in seconds). For each block size: the execution time of the sequential Block-CG version, then the execution time, speedup and efficiency E per number of CEs.

In Tables and , the results from the Master-Slave centralized parallel Block-CG implementation are presented. This parallel implementation of Classical CG and Block-CG performs very poorly, and the reason is that the granularity of the work performed in parallel is very small compared to the granularity of the work performed sequentially.

Increasing the granularity of the H_i P_i products may improve the behavior of this implementation, since increasing the computational weight of the parallel products compensates for the expensive cost of distributing the P matrices at each iteration.

Table : Results from the All-to-All parallel implementation of Block-CG on the solution of the A^T A SHERMAN problem, for two further block sizes (times in seconds). For each block size: the execution time of the sequential Block-CG version, then the execution time, speedup and efficiency E per number of CEs.

Table : Results from the Master-Slave distributed parallel implementation of Block-CG on the solution of the A^T A SHERMAN problem (times in seconds). For each block size: the execution time of the sequential Block-CG version, then the execution time, speedup and efficiency E per number of CEs.

Table : Results from the Master-Slave distributed parallel implementation of Block-CG on the solution of the A^T A SHERMAN problem, for two further block sizes (times in seconds). For each block size: the execution time of the sequential Block-CG version, then the execution time, speedup and efficiency E per number of CEs.

Table : Results from the Master-Slave centralized parallel implementation of Block-CG on the solution of the A^T A SHERMAN problem (times in seconds). For each block size: the execution time of the sequential Block-CG version, then the execution time, speedup and efficiency E per number of CEs.

Table : Results from the Master-Slave centralized parallel implementation of Block-CG on the solution of the A^T A SHERMAN problem, for two further block sizes (times in seconds). For each block size: the execution time of the sequential Block-CG version, then the execution time, speedup and efficiency E per number of CEs.

Also, in a shared memory parallel environment this implementation may perform better, because the communications are avoided and the access to shared memory can be synchronized in a less expensive way than exchanging messages. In the next sections we perform a more detailed analysis of this implementation when solving the LANPRO NOS and NOS problems.

Parallel solution of the LANPRO NOS problem

The purpose of this experiment is to compare the performance of the sequential and parallel implementations of the Classical CG Algorithm and the Block-CG Algorithm , using a large block size for Block-CG.

In this experiment the LANPRO NOS problem is solved in parallel on the SP computer. The block sizes are chosen for the experiment based on the sequential Block-CG experiments presented in Chapter .

To be able to compare the sequential and parallel implementations on the SP computer, a selection of the sequential experiments from Chapter is repeated in this section. Table summarizes the results from the sequential runs on the SP .

Table : Sequential runs of the Classical CG and Block-CG implementations for solving the LANPRO NOS problem on an SP computer system (times in seconds). Columns: LANPRO problem | block size | normwise backward error | execution time.

The results from the solution of the LANPRO NOS problem using the All-to-All parallel Block-CG implementation are presented in Table . Although the computations of Block-CG with the chosen block size continue to be more expensive than with Classical CG, a parallel run of Block-CG on CEs has reduced the execution time of the sequential Block-CG run by almost .

The parallel All-to-All implementation performs very poorly for Classical CG. An analysis of the runs of the All-to-All implementation has shown that the communication time between CEs is a dominant factor in the total execution time. This is due to the small granularity of the parallel Classical CG operations: basically, all of the Level 3 BLAS operations from Block-CG are replaced by Level 1 and Level 2 BLAS operations in Classical CG.

In these experiments, the highest speedup reported for the All-to-All parallel Block-CG implementation is attained with CEs. The efficiency suggests that of the total computational power of the nodes has been used.

Table : Results from the All-to-All parallel implementation of Block-CG on the solution of the LANPRO NOS problem (times in seconds). For each block size: the execution time of the sequential Block-CG version, then the execution time, speedup and efficiency E per number of CEs.

Table shows the results from the solution of the LANPRO NOS problem using the Master-Slave distributed parallel Block-CG implementation. The execution time of the sequential Block-CG is reduced by almost with the parallel Block-CG using CEs. The highest speedup is attained with processors, but the efficiency is reduced as we increase the number of processors.

Table : Results from the Master-Slave distributed parallel implementation of Block-CG on the solution of the LANPRO NOS problem (times in seconds). For each block size: the execution time of the sequential Block-CG version, then the execution time, speedup and efficiency E per number of CEs.

Lastly, Table summarizes the results from runs of the Master-Slave centralized parallel Block-CG implementation when solving the LANPRO NOS problem. Amdahl's law (Amdahl ) helps to explain the poor performance of this implementation. A parallel implementation may have sequential and parallel sections; the sequential sections are sometimes called synchronization points. Moreover, if alpha is the ratio of computations performed in sequence, then 1 - alpha is the ratio of computations performed in parallel. Amdahl's law defines an upper bound S_infinity for the speedup:

    S_\infty = \lim_{p \to \infty} \frac{1}{\alpha + (1 - \alpha)/p} = \frac{1}{\alpha}.
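A small numeric illustration of this bound (the sequential fractions below are placeholders, not the figures reported in Table ):

```python
def amdahl_speedup(alpha, p):
    """Speedup predicted by Amdahl's law for a sequential fraction alpha on p CEs."""
    return 1.0 / (alpha + (1.0 - alpha) / p)

def amdahl_bound(alpha):
    """Upper bound S_inf = 1/alpha, approached as the number of CEs grows."""
    return 1.0 / alpha

# Placeholder sequential fractions (not the thesis' measured values):
for alpha in (0.5, 0.2, 0.05):
    print(alpha, amdahl_speedup(alpha, 8), amdahl_bound(alpha))
```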

In the Master-Slave centralized Block-CG implementation, only the part of the total FLOP count that is related to A is performed in parallel. Table shows the percentages of the total Block-CG FLOP count that come from A for the LANPRO NOS problem: of the total FLOP count comes from A with the smaller block size, and this factor is reduced to with the larger block size.

In these cases alpha is equal to for the smaller block size and equal to for the larger one. Thus S_infinity is equal to and for the two block sizes, respectively. The lack of speedups in the results shown in Table is explained by these small values of S_infinity. This is not the case for the other two implementations, which have higher ratios between the parallel and sequential sections.

Table : Results from the Master-Slave centralized parallel implementation of Block-CG on the solution of the LANPRO NOS problem (times in seconds). For each block size: the execution time of the sequential Block-CG version, then the execution time, speedup and efficiency E per number of CEs.

Parallel solution of the LANPRO NOS problem

The LANPRO NOS problem introduced in Chapter is solved in these experiments. Table summarizes the results from sequential runs of Block-CG on the SP computer. Based on the results presented in Section , three block sizes were chosen.

Table : Sequential runs of the Classical CG and Block-CG implementations for solving the LANPRO NOS problem on an SP computer system (times in seconds). Columns: LANPRO problem | block size | normwise backward error | execution time.

Table presents the results from runs of the All-to-All parallel implementation of Block-CG on the solution of the LANPRO NOS problem. For all of the block sizes used in this case, the granularity is smaller than the granularity obtained with the large block size on the LANPRO NOS problem; thus the speedups have decreased compared to the speedups shown in Table . In this case the maximum speedups are attained using CEs with the two larger block sizes. The parallel performance of Block-CG with a block size of one is very poor in this case, because of the required synchronizations discussed in Section and the low granularity of Classical CG.

Table : Results from the All-to-All parallel implementation of Block-CG on the solution of the LANPRO NOS problem (times in seconds). For each block size: the execution time of the sequential Block-CG version, then the execution time, speedup and efficiency E per number of CEs.

Table shows the results of runs of the Master-Slave distributed Block-CG implementation. As in the case of the All-to-All Block-CG implementation, the speedups have also decreased when compared to the speedups reported in Table . The maximum speedup with the smaller block size is attained with CEs, and with the larger block size it is attained with CEs. The execution times for Block-CG are shorter than the execution times of the sequential and parallel Classical CG implementations. The efficiency of CEs in the case of the larger block size is higher than the efficiency attained with the same CEs in the case of the smaller block size, and this is due to an increase in the granularity after increasing the block size.

The results of the Master-Slave centralized implementation are presented in Table . Again, this implementation reports poor speedups. From the information in Table and Amdahl's law, it can be deduced that the upper bounds on the speedups in these experiments are for the three block sizes. However, the maximum speedup reported from these experiments for the block size of one is , and and for the other two block sizes. The poor performance of the parallel Classical CG is due to a straight parallel implementation of Classical CG, without the considerations discussed in Section .

Lastly, Table shows comparative results from different runs of the sequential Classical CG and the parallel versions of Block-CG. The purpose of this table is to illustrate the advantages of the parallel Block-CG over a sequential Classical CG, even in those cases presented in Chapter where Block-CG was a few equivalent iterations more expensive than Classical CG.

Table : Results from the Master-Slave distributed parallel implementation of Block-CG on the solution of the LANPRO NOS problem (times in seconds). For each block size: the execution time of the sequential Block-CG version, then the execution time, speedup and efficiency E per number of CEs.

Table : Results from the Master-Slave centralized parallel implementation of Block-CG on the solution of the LANPRO NOS problem (times in seconds). For each block size: the execution time of the sequential Block-CG version, then the execution time, speedup and efficiency E per number of CEs.

Table : Time comparison between runs of the sequential Classical CG and the parallel Block-CG implementations. For each LANPRO problem the table gives the sequential Classical CG execution time (secs), the block size, and, for the All-to-All, Master-Slave distributed and Master-Slave centralized implementations, the number of CEs and the execution time (secs). A marker indicates the cases where a parallel Block-CG implementation has performed better than the sequential implementation of Classical CG.

For each of the LANPRO problems there are one or more entries in Table . Each entry in the table compares a run of the sequential Classical CG against the best time obtained in the solution of the same problem with each parallel Block-CG implementation. Along with the entry for a parallel implementation there is a column that identifies the number of CEs used in the solution of the problem, and another entry that identifies the block size.

The marked entries in Table are the favorable cases in which a parallel Block-CG implementation has performed better than the sequential Classical CG implementation; in those cases the execution time has been reduced by the effects of parallelism, despite the extra FLOP count generated by Block-CG. In the Master-Slave centralized Block-CG implementation none of the best times improves on the performance of the sequential Classical CG. In the other two implementations almost all of the cases are favorable, except the solution of the LANPRO NOS problem with the large block size.

In the sequential Block-CG implementation, the solution of LANPRO NOS with the large block size required far more FLOPs than Classical CG. In the Master-Slave distributed Block-CG implementation, the large block size enlarges the sequential sections where the master performs the centralized reduce operations and factorizes the gamma^(j) and beta^(j) matrices; in addition, the length of the messages exchanged between the master and the slave processes is drastically increased (see Table ).

In the All-to-All Block-CG implementation, the FLOP count from the redundant operations is drastically increased with the large block size, and, as in the Master-Slave distributed implementation, the length of the messages is also increased. Furthermore, in the All-to-All implementation the long messages are more expensive because they are broadcast to all the workers.

CHAPTER PARALLEL BLOCKCG EXPERIMENTS

Fixing the numberofequivalent iterations

In the parallel Classical CG and Blo ckCG implementations it has b een observed that there

are numerical and computational issues that inuence the p erformance of a parallel Blo ckCG

implementation These issues can b e summarized by

The structure of the iteration matrix eg size sparsity pattern etc

The blo ck size used in Blo ckCG

The numb er of equivalent iterations

The stopping criterion

The computing platform

The numb er of CEs used in a computer run

The computational scheme of the parallel implementation

The first four issues are related to the numerics of the problem being solved. The influence of the computing platform and of the number of CEs is studied in Sections and .

So far, when comparing the parallel Block-CG implementations, the results have been influenced mainly by some of the numerical issues. For instance, the Block-CG iterations are stopped after reducing the normwise backward error or the residual error below a threshold value. Therefore the iteration count has varied as the block size was varied, and the execution time of a parallel Block-CG run depended not only on the parallel implementation but also on the iteration count.

In this section the purpose of the experiments is to study the influence of each parallel implementation on the performance of Block-CG. The number of equivalent iterations is fixed, to isolate the issues related to the computational scheme from the ones related to the stopping criterion.

To preserve the numerical meaning in the following exp eriments Table shows the qualityof

the solution after k iterations of Classical CG and Blo ckCG when solving the LANPRO NOS

problems The blo ck sizes of and are used in the parallel Blo ckCG

Table: Work done after k equivalent iterations of Classical CG and Block-CG. For each LANPRO NOS problem and block size, the table reports the number of orthogonal vectors computed and the normwise backward error.


In Tables ... the Block-CG iterations are fixed. When the execution time of the parallel Block-CG is less than the execution time of the parallel Classical CG, a mark is placed next to the execution time to represent the reduction in the execution time; in these cases, the effect of increasing the block size speeds up the execution of a parallel Block-CG implementation. On the other hand, a different mark is placed next to the entries in the table when the execution time of the parallel Block-CG is more than the execution time of the parallel Classical CG. Entries without either mark are assumed to take almost the same time for both Block-CG and Classical CG.

Table ... shows the results of this experiment with the All-to-All Block-CG implementation. As observed earlier, the parallel Classical CG using an All-to-All scheme simply increases the execution time as the number of CEs is increased, because the cost of the operations performed in parallel is smaller than the cost of the communications. In this case, the communication time has a great influence on the overall execution time, and in distributed computing environments this communication becomes the bottleneck of parallelizing the Classical CG Algorithm.

In the parallel Block-CG with both block sizes, it is observed that the execution time is reduced as the number of CEs is increased. Increasing the block size also speeds up the execution, and in Table ... this effect is seen when moving from the smaller block size to the larger one for a given number of CEs.

Table: Execution times, in seconds, from runs of the All-to-All parallel Block-CG implementation; for each number of CEs, the table lists the time of the parallel Classical CG and of the parallel Block-CG for each block size.

Table ... shows the results from repeating the same experiment with the Master-Slave distributed Block-CG implementation. This implementation presents a better compromise because the communication is overlapped with useful computations, and this effect is also present in the parallel Classical CG, where the execution time is reduced as the number of CEs is increased.

In the same table, it is observed that the execution times of the parallel Block-CG do not change much when changing from one block size to the other. The iteration count with the larger block size is half the iteration count with the smaller one; thus, with the larger block size, the number of communications is reduced by half, and this compensates for the increase in the FLOP count as the block size is increased.

In almost all of the cases reported in Table ..., the execution time of the parallel Block-CG is less than the execution time of Classical CG.

Table: Execution times, in seconds, from runs of the Master-Slave distributed parallel Block-CG implementation; for each number of CEs, the table lists the time of the parallel Classical CG and of the parallel Block-CG for each block size.

Table ... shows the results from running this experiment with the Master-Slave centralized implementation. Contrary to the other two implementations, the granularity of the problem does not compensate for the cost of communication. Thus, increasing the block size also increases the size of the sequential section and the length of the messages being sent, while decreasing the percentage of the total FLOP count that is performed in parallel.

Table: Execution times, in seconds, from runs of the Master-Slave centralized parallel Block-CG implementation; for each number of CEs, the table lists the time of the parallel Classical CG and of the parallel Block-CG for each block size.


Comparing different computing platforms

The computing platform has an influence on the performance of a parallel implementation. The three parallel Block-CG implementations were designed for general parallel distributed computing systems and do not include specifically tuned algorithms to enhance the performance on any particular computer system. Nevertheless, the relation between the speed of the processors and the speed of the communication network can enhance the performance of a parallel implementation and, in the case of Block-CG, can even favor one implementation over another.

The purpose of this section is to perform parallel runs of the Block-CG implementations on three different computing platforms. In each computing platform a different library subroutine was used to compute the execution times, and for some platforms the execution times are more accurate than for others. Thus we do not attempt to compare the computational performance of one computer system against another.

In Table ..., the results of solving the LANPRO NOS problem on the SP computer are presented. In this case the block size has been fixed. We observe that the Master-Slave distributed Block-CG implementation reports the best speedups compared to the other two implementations. However, the maximum speedup in this case is obtained with an efficiency factor that remains low.
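For reference, the speedup and efficiency figures reported in these tables are read here with the usual definitions (this convention is assumed for clarity, not taken from the tables themselves): if $T_{\mathrm{seq}}$ is the sequential execution time used as the baseline and $T_p$ the time on $p$ CEs, then
$$ S_p = \frac{T_{\mathrm{seq}}}{T_p}, \qquad E_p = \frac{S_p}{p}, $$
so an efficiency factor close to one means that the $p$ CEs are almost fully used, while a speedup growing more slowly than $p$ translates into a decreasing $E_p$.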

In the same table it is observed that the All-to-All implementation provides the maximum efficiency factor with one or more CEs. When using one CE, this implementation generates less overhead than the other two, and the execution time may in some cases be only slightly longer than the execution time of the sequential Block-CG implementation.

The maximum efficiency attained with more than one CE is attained by the All-to-All Block-CG implementation using a small number of CEs. In the All-to-All implementation, distributed reduce operations are used. In a computing platform with a fast communication network, the use of distributed reduce operations is more favorable than the use of centralized ones. However, increasing the number of CEs in a parallel implementation that uses distributed reduce operations has a greater impact on the speedups and efficiency than increasing the number of CEs in a parallel implementation that uses centralized reduce operations. This effect is also observed in Table ... as the number of CEs is increased; with the larger numbers of CEs, the All-to-All implementation has the lowest efficiency of the three implementations.

The results in Table ... come from runs on the BBN TC computer. The problem LANPRO NOS is solved with the same block size. The processors in the BBN TC are slower than the processors in the SP. The BBN TC also has a fast network switch, and the combination of relatively slow processors and a relatively fast network reduces the overall expense of communication relative to useful computations.

This last effect is reflected in the speedups shown in Table ..., which have increased for the All-to-All and Master-Slave distributed Block-CG implementations with respect to the speedups reported for the SP computer. As remarked earlier, in the Master-Slave centralized implementation the speedup is less affected by the speed of the network and the speed of the processors; the upper bounds on the speedup of this implementation when solving the LANPRO NOS problem were shown in Section ..., and the upper bounds computed there are consistent with the results in Tables ... and ...

Lastly, the results from runs on the network of SUN SPARC workstations are shown in Table ... Here the network is very slow compared with the networks of the other two computing platforms. As a result, the efficiency rates reported for the three implementations are smaller than those reported in the two previous tables.


Execution time of the Block-CG sequential version (secs):
Table: Performance of the parallel Block-CG implementations while solving the LANPRO NOS problem. For each number of CEs, the table reports the time, speedup and efficiency E of the All-to-All, Master-Slave distributed and Master-Slave centralized implementations. Times in the table are in seconds. The parallel runs were performed on an SP computer.

Block-CG sequential time (millisecs):
Table: Performance of the parallel Block-CG implementations while solving the LANPRO NOS problem. For each number of CEs, the table reports the time, speedup and efficiency E of the All-to-All, Master-Slave distributed and Master-Slave centralized implementations. Times in the table are in seconds. The parallel runs were performed on a BBN TC computer.


In this case, the use of centralized reduce operations in the Master-Slave distributed implementation becomes very expensive as the number of CEs is increased. This effect comes from the serial delivery of messages through a slow network. As shown in Table ..., the problem is solved faster with the All-to-All implementation than with the other two implementations.

Execution time of the Block-CG sequential version (secs):
Table: Performance of the parallel Block-CG implementations while solving the LANPRO NOS problem. For each number of CEs, the table reports the time, speedup and efficiency E of the All-to-All, Master-Slave distributed and Master-Slave centralized implementations. Times in the table are in seconds. The parallel runs were performed on a network of SUN SPARC workstations.

Parallel solution of the POISSON problem

The purpose of this section is to compare runs of the three parallel Block-CG implementations, as in Section ..., after increasing the granularity of the problem to be solved. In these experiments, the problem to be solved comes from the discretization of Poisson's equation. The same block size is used in all the experiments of this section.

Table ... presents the results from parallel runs on the SP computer. In the All-to-All and Master-Slave distributed implementations, the increase in the granularity has improved their performance. This is not the case for the Master-Slave centralized implementation, in which the increase in the granularity has also increased the computational weight of the sequential sections.

In the All-to-All implementation, the efficiency obtained with a small number of CEs is higher than the efficiency obtained for the LANPRO NOS problem, and in this case the Master-Slave distributed implementation also exhibits the same efficiency with that number of CEs. In the All-to-All implementation, the efficiency decreases at a faster rate than in the Master-Slave distributed implementation. For these two implementations, the speedups and efficiency rates are again increased when repeating this experiment on the BBN TC computer.

Execution time of the Block-CG sequential version (secs):
Table: Performance of the parallel Block-CG implementations solving a Poisson's equation problem. For each number of CEs, the table reports the time, speedup and efficiency E of the All-to-All, Master-Slave distributed and Master-Slave centralized implementations. Times in the table are in seconds. The parallel runs were performed on an SP computer.

Table ... shows the results from runs of this experiment on a BBN TC computer. The Master-Slave distributed implementation shows a slow decrease in the efficiency rates, which results in a very high speedup on the largest number of CEs.

The results from the same runs on the network of SUN SPARC workstations are presented in Table ... The trends observed when solving the LANPRO NOS problem on the SUN SPARC workstations are confirmed by the results in Table ...: the All-to-All Block-CG implementation has performed better than the other two implementations on the network of workstations, and in the All-to-All and Master-Slave distributed implementations the speedups and efficiency have increased as the granularity has been increased.

Block-CG sequential time (millisecs):
Table: Performance of the parallel Block-CG implementations solving a Poisson's equation problem. For each number of CEs, the table reports the time, speedup and efficiency E of the All-to-All, Master-Slave distributed and Master-Slave centralized implementations. Times in the table are in seconds. The parallel runs were performed on a BBN TC computer.

Execution time of the Block-CG sequential version (secs):
Table: Performance of the parallel Block-CG implementations solving a Poisson's equation problem. For each number of CEs, the table reports the time, speedup and efficiency E of the All-to-All, Master-Slave distributed and Master-Slave centralized implementations. Times in the table are in seconds. The parallel runs were performed on a network of SUN SPARC workstations.

Remarks

In this chapter we have shown the advantages of parallelizing the Block-CG Algorithm. In some cases it has been shown that the parallel Block-CG will perform better than the sequential Classical CG, even when a few more FLOPs are computed with the Block-CG implementation.

In distributed computing environments, the poor performance of the Master-Slave centralized Block-CG implementation shows that parallelizing only the HP products does not compensate for the use of more than one CE, given the small ratio between the parallel and sequential computations. However, this implementation may perform better in the event that the matrix H comes from a preconditioner that needs to be recomputed at each iteration.

Additionally, the Master-Slave centralized Block-CG implementation is simpler to implement than the other two parallel Block-CG implementations presented in this chapter, and it has fewer critical sections (i.e., sections of a parallel code where only one CE is allowed to update the data at a time) than its counterparts in shared memory environments, because these critical sections appear every time a reduce operation is performed.

The granularity of the problem, the number of CEs used in a run and the characteristics of the network have a greater impact on the performance of the All-to-All implementation than on the other two implementations. Therefore, the All-to-All implementation will perform better than its counterparts in some cases, when a fair compromise is found between the number of CEs and the granularity of the subproblems being solved.

A fair compromise can only be found after a few runs of the All-to-All parallel Block-CG implementation, using a different number of CEs at each run. Clearly, tuning the parallel efficiency is only worthwhile when the linear system of equations is solved more than once, with different values for the coefficients of the equations at each solution; these types of systems usually appear in many application fields like structural engineering, climate modeling, operations research, etc. From the experiments reported in this chapter, it is known that when a fair compromise is found, the All-to-All implementation performs better than the other two implementations in terms of parallel efficiency (see for instance Table ...). On the other hand, this fair compromise also depends on the computing platform being used and, when running in time-sharing mode, it will also depend on the computational load of the computer system. Alternatively, the Master-Slave distributed implementation has exhibited a more consistent performance, even when moving from one computing platform to another and when solving linear systems of different sizes.

The Master-Slave distributed implementation has exhibited linear speedups because of its balance between communication and computations. Moreover, the speedups increase as the granularity of the problem is increased. In Chapter ..., we use the Master-Slave distributed Block-CG implementation inside a block Cimmino iteration to accelerate its convergence rate. The resulting implementation is extended to general unsymmetric sparse systems. Furthermore, inside a block Cimmino iteration, the synchronization points are overlapped with useful computations.

In general, the Master-Slave distributed implementation suits distributed computing environments better.

Chapter

Solving general systems using Block-CG

The Block-CG algorithm only guarantees convergence in the solution of symmetric positive definite systems. With the use of a preconditioner, the algorithm can be extended to the solution of general unsymmetric systems. This can be done by using Block-CG as an acceleration procedure inside another basic iterative method. In this chapter we study two iterative row projection methods for solving general systems of equations and use the Block-CG algorithm to accelerate the rate of convergence of a row projection method.

An overview of the block Cimmino method is presented in Section ..., and the block Kaczmarz method is briefly introduced in Section ... An implementation of the block Cimmino method accelerated with the stabilized Block-CG Algorithm is presented in Section ...

General issues arising from a computer implementation of the block Cimmino method accelerated with the Block-CG algorithm are discussed in Section ..., and the corresponding parallel implementation is presented in Section ...

The block Cimmino method

The block Cimmino method is a generalization of the Cimmino method (see Cimmino ...). Basically, the linear system of equations
$$ Ax = b, $$
where $A$ is an $m \times n$ matrix, is partitioned into $l$ subsystems, with $l \le m$, such that
$$
\begin{pmatrix} A_1 \\ A_2 \\ \vdots \\ A_l \end{pmatrix} x \;=\;
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_l \end{pmatrix}.
$$


Notice that the term block in block Cimmino refers to the matrix blocks that result from this partition, and not to the number of right-hand sides as in Block-CG.

Figure ... illustrates a geometric interpretation of the block Cimmino method in a planar case. The block method computes a set of $l$ row projections, and a combination of these projections is used to build the next approximation to the solution.

Algorithm ... describes the block Cimmino iteration; the matrix $A_i^+$ is the Moore-Penrose pseudoinverse of the submatrix $A_i$, defined by
$$ A_i^+ = A_i^T \left( A_i A_i^T \right)^{-1}, $$
and $\mathcal{P}_{\mathcal{R}(A_i^T)}$ is the projector onto the range of $A_i^T$, given by
$$ \mathcal{P}_{\mathcal{R}(A_i^T)} = A_i^+ A_i . $$

Algorithm ... Block Cimmino

$x^{(0)}$ is arbitrary
For $j = 0, 1, \ldots$ until convergence do
    For $i = 1, \ldots, l$ do
        $\delta_i^{(j)} = A_i^+ b_i - \mathcal{P}_{\mathcal{R}(A_i^T)}\, x^{(j)} = A_i^+ \left( b_i - A_i x^{(j)} \right)$
    $x^{(j+1)} = x^{(j)} + \nu \sum_{i=1}^{l} \delta_i^{(j)}$
Stop

One of the advantages of the block Cimmino method is the natural parallelism in Step ..., and for this reason its parallel implementation for distributed computing environments is studied in Section ...
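The iteration above can be sketched in a few lines. The following is a minimal dense sketch (using NumPy's Moore-Penrose pseudoinverse; all names are chosen here for illustration), not the implementation described later in this chapter, which works on sparse augmented systems instead of explicit pseudoinverses.

    import numpy as np

    def block_cimmino(A_blocks, b_blocks, nu=1.0, n_iter=100, x0=None):
        # Dense sketch of the block Cimmino iteration: the l row projections
        # are independent (hence naturally parallel) and their sum, scaled
        # by the relaxation parameter nu, updates the iterate.
        n = A_blocks[0].shape[1]
        x = np.zeros(n) if x0 is None else x0.copy()
        for _ in range(n_iter):
            delta = np.zeros(n)
            for A_i, b_i in zip(A_blocks, b_blocks):
                r_i = b_i - A_i @ x                  # local residual of block i
                delta += np.linalg.pinv(A_i) @ r_i   # delta_i = A_i^+ (b_i - A_i x)
            x = x + nu * delta
        return x

The inner loop over the blocks is the part that the distributed implementation performs in parallel.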

In Algorithm ..., the Moore-Penrose pseudoinverse $A_i^+$ is used. However, the block Cimmino method will converge for any other pseudoinverse of $A_i$ (Campbell and Meyer ...). Thus we use the generalized pseudoinverse
$$ A_{i,G}^+ = G^{-1} A_i^T \left( A_i G^{-1} A_i^T \right)^{-1}, $$
where $G$ is an ellipsoidal norm matrix.

When solving for the $\delta_i$'s in Algorithm ..., the augmented systems approach (Bartels, Golub and Saunders ...; Hachtel ...) is used because it is computationally more reliable and less expensive than the normal equations.

Figure ...: Block Cimmino geometric interpretation. In this case a two block-row partition is depicted.

Here the augmented system equations for the $A_i$'s are written as
$$
\begin{pmatrix} G & A_i^T \\ A_i & 0 \end{pmatrix}
\begin{pmatrix} u_i \\ v_i \end{pmatrix}
=
\begin{pmatrix} 0 \\ b_i - A_i x \end{pmatrix},
$$
with solution
$$ v_i = -\left( A_i G^{-1} A_i^T \right)^{-1} r_i, \qquad u_i = \delta_i = A_{i,G}^+ \left( b_i - A_i x \right), $$
where $r_i = b_i - A_i x$.
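A minimal sketch of this augmented-system solve is given below. It is an illustration of the linear algebra only: SciPy's general sparse direct solver stands in for the MA27 Analyse/Factorize/Solve phases used in the actual implementation, and G is assumed here to be the sparse ellipsoidal norm matrix associated with block i.

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import spsolve

    def cimmino_projection_augmented(A_i, G, b_i, x):
        # Compute delta_i = A_{i,G}^+ (b_i - A_i x) by solving the symmetric
        # indefinite augmented system [[G, A_i^T], [A_i, 0]] [u; v] = [0; r_i].
        m_i, n = A_i.shape
        r_i = b_i - A_i @ x
        K = sp.bmat([[G, A_i.T], [A_i, None]], format='csc')
        rhs = np.concatenate([np.zeros(n), r_i])
        z = spsolve(K, rhs)
        return z[:n]          # u_i = delta_i; the multipliers v_i are discarded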

In Step ... of Algorithm ..., the choice of the relaxation parameter $\nu$ has an impact on the convergence of the algorithm.

It can be verified that the block Jacobi iteration matrix applied to the system
$$ A A^T y = b, \qquad x = A^T y $$
(see Hageman and Young ...) is similar to the iteration matrix of the block Cimmino method (see for instance Ruiz ...). For this reason, in a study of row projection methods, Elfving refers to the block Cimmino method as the block-row Jacobi method (Elfving ...). In Elfving's paper he shows that the convergence of the block-row method depends on the relaxation parameter $\nu$ and on the spectral radius of the matrix defined by
$$ M = \sum_{i=1}^{l} \mathcal{P}_{\mathcal{R}(A_i^T)} = \sum_{i=1}^{l} A_i^T \left( A_i A_i^T \right)^{-1} A_i = \sum_{i=1}^{l} A_i^+ A_i . $$
If $b \in \mathcal{R}(A)$ and $x^{(0)} \in \mathcal{R}(A^T)$, then the block-row method converges towards the minimum norm solution if and only if
$$ 0 < \nu < \frac{2}{\lambda_{\max}(M)} . $$
The value of $\nu$ giving the optimal asymptotic rate of convergence is
$$ \nu_{\mathrm{opt}} = \frac{2}{\lambda_{\max} + \lambda_{\min}}, $$
where $\lambda_{\max}$ and $\lambda_{\min}$ are the largest and smallest nonzero eigenvalues of the matrix $M$.
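As a purely illustrative numerical example (these eigenvalues are made up and are not taken from the experiments), if $\lambda_{\max}(M) = 1.8$ and $\lambda_{\min}(M) = 0.2$, then
$$ \nu_{\mathrm{opt}} = \frac{2}{1.8 + 0.2} = 1, $$
while any $\nu$ in the interval $(0,\, 2/1.8) \approx (0,\, 1.11)$ would still yield convergence, only at a slower asymptotic rate.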

The block Kaczmarz method

The block Kaczmarz method is another row projection method and is a generalization of the Kaczmarz method (Kaczmarz ...). As in the block Cimmino method, the block structure of the original matrix is obtained by partitioning $A$ as above, and a block Kaczmarz algorithm can be defined as in Algorithm ... Figure ... illustrates a geometric interpretation of the block Kaczmarz algorithm.

Algorithm ... Block Kaczmarz

$x^{(0)}$ is arbitrary
For $j = 0, 1, \ldots$ until convergence do
    $z^{(0)} = x^{(j)}$
    For $i = 1, \ldots, l$ do
        $z^{(i)} = \left( I - \nu\, \mathcal{P}_{\mathcal{R}(A_i^T)} \right) z^{(i-1)} + \nu A_i^+ b_i = z^{(i-1)} + \nu A_i^+ \left( b_i - A_i z^{(i-1)} \right)$
    $x^{(j+1)} = z^{(l)}$
Stop

Figure ...: Block Kaczmarz geometric interpretation. In this case a two block-row partition is depicted.

Contrary to the block Cimmino algorithm, block Kaczmarz is sequential in nature. The Kaczmarz projections are computed one at a time in Step ... This is also illustrated in Figure ... for the planar case. The current iterate is projected onto the different subspaces, using the updated value from one subspace to compute the next one. If $\nu$ is greater than one, then an over-projection is performed; otherwise an under-projection is performed. After computing the projections over all the subspaces, the new iterate is obtained (Step ...).
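For comparison with the block Cimmino sketch above, one outer block Kaczmarz iteration can be sketched as follows (again a dense, purely illustrative version using the Moore-Penrose pseudoinverse). Note that the inner loop cannot be parallelized across blocks, since each projection uses the result of the previous one.

    import numpy as np

    def block_kaczmarz_sweep(A_blocks, b_blocks, x, nu=1.0):
        # One outer iteration: apply the l projections one after the other,
        # each one starting from the value updated by the previous block.
        z = x.copy()
        for A_i, b_i in zip(A_blocks, b_blocks):
            z = z + nu * (np.linalg.pinv(A_i) @ (b_i - A_i @ z))
        return z              # x^(j+1) = z^(l)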

The Kaczmarz method has also been studied by Elfving, who shows that the method is equivalent to the block SOR method applied to the system above. Thus Elfving refers to the Kaczmarz method as the block-row SOR method.

The relaxation parameter $\nu$ in the block Kaczmarz method does not depend on the spectral radius of the matrix $M$, as it does in the block Cimmino method. Furthermore, the block Kaczmarz method converges towards the minimum norm solution when $b \in \mathcal{R}(A)$ and $x^{(0)} \in \mathcal{R}(A^T)$. The differences between the convergence of the block Cimmino and the block Kaczmarz methods are similar to those between the Jacobi and SOR methods for solving symmetric positive definite linear systems (see Hageman and Young ...).

The iteration matrix of the block Kaczmarz method,
$$ \prod_{i=1}^{l} \left( I - \nu\, \mathcal{P}_{\mathcal{R}(A_i^T)} \right), $$
is not symmetric and, in general, its eigenvalues are not real but complex. As for the symmetric SOR, or SSOR, method, a symmetric version of the block Kaczmarz method can be derived; thus the Kaczmarz method is also referred to as the block SSOR method. The symmetric block Kaczmarz method has a symmetric iteration matrix with real eigenvalues, and with these properties it can be accelerated with the Block-CG Algorithm. Kamath and Sameh present a study of the robustness of the conjugate gradient acceleration for block-row projection methods such as the block SSOR.

One of the advantages of the block Kaczmarz method over the block Cimmino method is that the relaxation parameter is independent of the spectral radius of the matrix $M$. On the other hand, one can improve the solution time of a system of equations by exploiting the natural parallelism of the block Cimmino method, whereas the symmetric block Kaczmarz method can only be parallelized for some matrices with a special structure, in which a block partitioning allows the computations over two or more subspaces to proceed independently and in parallel (see Arioli, Duff, Noailles and Ruiz ...). Comparisons between the block Cimmino method and the block SSOR method can be found in the works of Bramley (Bramley and Sameh ...) and of Arioli, Duff, Noailles and Ruiz (...).

The Block-CG acceleration of basic stationary iterative methods does not require that the basic iterative method be convergent. In the block Cimmino method, the rate of convergence can still be slow even with the optimal $\nu$, and this motivates the study, in the next section, of a procedure to accelerate its convergence rate. A similar procedure can be derived for the symmetric block Kaczmarz method (see Ruiz ...).

Block Cimmino accelerated by Block-CG

The block Cimmino method is a linear stationary iterative method with a symmetrizable iteration matrix (for a definition of symmetrizable iterative methods, see Hageman and Young ..., page ...). The use of ellipsoidal norms ensures the conditions that make the block Cimmino iteration matrix SPD.

Any basic iterative method for the solution of
$$ Ax = b $$
may be expressed as
$$ x^{(j+1)} = Q\, x^{(j)} + k, $$
where $Q$ is the iteration matrix. The block Cimmino iteration matrix is $Q = I - \nu M$. From the definition of the projections and Step ... in Algorithm ..., the expression above becomes
$$ x^{(j+1)} = (I - \nu M)\, x^{(j)} + \nu \sum_{i=1}^{l} A_i^+ b_i . $$

Figure ...: The block Cimmino method accelerated with the stabilized Block-CG algorithm. A preprocessing phase builds the augmented systems from the blocks $A_i$ and analyses and factorizes them with MA27; the solve phase combines the sum of projections (Cimmino) with a preconditioned Block-CG iteration, repeated until convergence.

Let $D$ be the block diagonal matrix with diagonal blocks $A_i A_i^T$; the matrix $M$ can then be rewritten as
$$ M = A^T D^{-1} A . $$
From this expression, the matrix $M$ is symmetric positive semidefinite in general, and SPD if $A$ is square and has full rank. Using the generalized pseudoinverse in the iteration gives
$$ x^{(j+1)} = (I - \nu M^G)\, x^{(j)} + \nu \sum_{i=1}^{l} A_{i,G}^+ b_i, $$
with $M^G$ defined as
$$ M^G = \sum_{i=1}^{l} A_{i,G}^+ A_i . $$

An SPD block Cimmino iteration matrix can be used as a kind of preconditioning matrix for the Block-CG method. We recall that Block-CG simultaneously searches for the next approximation to the system's solution in $s$ Krylov subspaces and, in the absence of roundoff errors, converges to the system's solution in a finite number of steps. Figure ... depicts the Block-CG acceleration described in Algorithm ..., and from Algorithm ... it can be seen that the convergence of the algorithm is independent of the relaxation parameter $\nu$.

Algorithm ... Block-CG acceleration for block Cimmino

$X^{(0)}$ is arbitrary; $P^{(0)} = R^{(0)}$
For $j = 0, 1, \ldots$ until convergence do
    $R^{(j)} = \sum_{i=1}^{l} A_{i,G}^+ B_i - M^G X^{(j)}$  ($R^{(j)}$ is the pseudo-residual block vector)
    $X^{(j+1)} = X^{(j)} + P^{(j)} \gamma_j$
    $P^{(j+1)} = R^{(j+1)} + P^{(j)} \beta_j$
    with
    $\gamma_j = \left( P^{(j)T} G M^G P^{(j)} \right)^{-1} \left( R^{(j)T} G R^{(j)} \right)$
    $\beta_j = \left( R^{(j)T} G R^{(j)} \right)^{-1} \left( R^{(j+1)T} G R^{(j+1)} \right)$
Stop
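The structure of this acceleration can be illustrated with a stripped-down sketch. For readability, the sketch below uses a single right-hand side (the $s = 1$ case, i.e., classical CG instead of Block-CG), takes $G = I$, and applies the sum-of-projections operator densely through NumPy's pseudoinverse; it is meant only to show how the Cimmino operator plays the role of the matrix in the CG recurrences, not to reproduce the actual parallel implementation.

    import numpy as np

    def cimmino_operator(A_blocks, x):
        # Apply M x = sum_i A_i^+ A_i x  (Moore-Penrose case, G = I).
        return sum(np.linalg.pinv(A_i) @ (A_i @ x) for A_i in A_blocks)

    def cg_accelerated_cimmino(A_blocks, b_blocks, tol=1e-10, maxit=200):
        # Solve M x = sum_i A_i^+ b_i with conjugate gradients; M is the
        # symmetric positive semidefinite sum-of-projections operator.
        n = A_blocks[0].shape[1]
        g = sum(np.linalg.pinv(A_i) @ b_i for A_i, b_i in zip(A_blocks, b_blocks))
        x = np.zeros(n)
        r = g - cimmino_operator(A_blocks, x)   # pseudo-residual
        p = r.copy()
        rs = r @ r
        for _ in range(maxit):
            Mp = cimmino_operator(A_blocks, p)
            gamma = rs / (p @ Mp)
            x += gamma * p
            r -= gamma * Mp
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs) * p
            rs = rs_new
        return x

In the block version, the scalars gamma and the ratio of residual norms become the small matrices computed and factorized by the master process.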

Computer implementation issues

At first, the matrix $A$ is partitioned row-wise into $l$ blocks according to the partition above, and inside a block the columns with only zero elements are not explicitly stored.

In Figure ...(a), the linear system is partitioned into three blocks of rows. In Figure ...(b), the blocks are illustrated after discarding the columns with only zero elements. A section of columns is defined as a group of contiguous columns with nonzero elements such that all the columns in one section appear in exactly the same blocks. In Figure ...(c) there are seven sections of columns: one section contains the contiguous columns with nonzeros that appear in $A_1$ only, another the contiguous columns with nonzeros that appear in $A_1$ and $A_2$, another the contiguous columns with nonzeros that appear in $A_1$, $A_2$ and $A_3$, and so forth.

The number of sections of columns depends on the sparsity pattern of the columns of the matrix $A$, and this number can be very large. If the iterative scheme is designed to manipulate the blocks by sections of columns, as is the case in this implementation, then a large number of sections of columns may prevent us from solving the system of equations in a reasonable amount of time. Therefore, an amalgamation parameter is used to scan groups of columns instead of a single column at a time.

The purpose of removing the columns with only zero elements from the blocks and of clustering the columns into sections of columns is to avoid performing unnecessary operations on operands with zero value. In addition, storage space is saved because the zero elements are not explicitly stored.
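A small sketch of how sections of columns might be identified from the row-block partition is given below; it is an illustration only (the actual implementation works on sparse data structures and also applies the amalgamation parameter to groups of columns).

    from itertools import groupby

    def column_sections(block_col_sets, n_cols):
        # A section of columns is a run of contiguous columns whose nonzeros
        # appear in exactly the same set of row blocks.  block_col_sets[i] is
        # the set of column indices with at least one nonzero in block A_i.
        def signature(j):
            return frozenset(i for i, cols in enumerate(block_col_sets) if j in cols)

        sections = []
        for sig, run in groupby(range(n_cols), key=signature):
            cols = list(run)
            if sig:   # columns that are zero in every block are discarded
                sections.append((cols[0], cols[-1], sorted(sig)))
        return sections

    # Example: three row blocks over six columns.
    # A_1 touches columns 0-3, A_2 columns 2-4, A_3 columns 4-5.
    print(column_sections([{0, 1, 2, 3}, {2, 3, 4}, {4, 5}], 6))
    # -> [(0, 1, [0]), (2, 3, [0, 1]), (4, 4, [1, 2]), (5, 5, [2])]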

After analysing the structure of the blocks, the data structures for handling the augmented systems are generated. Figure ... illustrates the augmented systems obtained from the block-row partition in Figures ...(a)-(c).

After building the $l$ augmented systems, each one is solved using the sparse symmetric linear solver MA27 from the Harwell Subroutine Library (Duff and Reid ...; AEA Technology ...). The MA27 solver is a multifrontal method which computes the $LDL^T$ decomposition. The MA27 solver has three main phases: Analyse, Factorize and Solve. During the MA27 Analyse phase, the minimum degree criterion is used to choose the pivots from the diagonal; the order of the eliminations is represented in an elimination tree that is traversed using a depth-first search. In the MA27 Factorize phase, the matrix is decomposed using the assembly and elimination ordering from the MA27 Analyse phase. The MA27 Solve phase uses the factors generated in the previous phase to solve the system of equations.

With the generation of the augmented systems, the size of the subproblems is increased, and exploiting the sparsity pattern of the augmented systems minimizes the storage space required to store the augmented subsystems.

In general, the ellipsoidal norm matrices $G_i$ are diagonal. The nonzero elements in a $G_i$ are stored in a one-dimensional array of length less than or equal to $c_i^G m_i$, where $c_i^G$ is the maximum number of nonzeros in a row of $G_i$ and $c_i^G \le m_i$. For $A_i$ and $A_i^T$, only $A_i$ needs to be stored, since $A_i^T$ can be obtained implicitly from $A_i$; the $A_i$ matrices are stored in a general sparse matrix format.

The zero blocks in the lower right corner of each augmented system are not explicitly stored in memory. Thus, the only increase in the storage requirement comes from the nonzero elements of the ellipsoidal norm matrices.

The solve phase of the Block-CG acceleration is depicted in Figure ..., and it is an implementation of Algorithm ... At each iteration, the augmented systems are solved to obtain the $\delta_i$'s. The sum of the $\delta_i$'s is used as the residual vector in the Block-CG iteration, and this procedure is repeated until convergence is reached.

Parallel implementation

This section deals with the parallel distributed implementation of the block Cimmino method with the Block-CG acceleration studied in Section ... The parallel implementation of the Block-CG acceleration is designed for general distributed computing environments, including heterogeneous computing environments.

Figure ...: Row partitioning of the original linear system of equations and identification of column sections. In (a), the subsystems resulting from the partition; in (b), the same subsystems after identifying the blocks of nonzeros in each subsystem; in (c), the identification of the column sections S1 to S7.

Figure ...: Building the augmented systems from the subsystems resulting from the partition. Illustration of the augmented systems corresponding to the example of system partition presented in Figure ...

The Master-Slave distributed parallel Block-CG implementation presented in Section ... is used inside the parallel Block-CG acceleration. A few additions are made to the parallel Block-CG implementation to accommodate the use of the block Cimmino iteration matrix.

Figure ... illustrates the computational flow of the parallel implementation of the block Cimmino method accelerated with the Block-CG algorithm.

Figure ...: Scheme of the parallel Block-CG acceleration for the block Cimmino method. The master controls the flow through the Pre-Analyse, Pre-Factorize and Pre-Solve phases, while the slaves perform the Analyse and Factorize phases and the Block-CG solve in parallel; master-slave communication is overlapped with the computations.

In the Pre-Analyse phase, the master process is responsible for partitioning the system of equations and identifying the sections of columns. These partitions are specified by the user as an input parameter.

With the sections of columns, the master is able to identify data overlaps between the different blocks. This information is later transmitted to the slave processes, and with it each slave process builds its local information about its computational neighborhood (e.g., the identification of its neighboring slave processes and the data it needs to communicate to each of them).

Before the master process sends information to the slave processes, it calls a scheduler to evenly distribute the workload among the slave processes; this distribution is based on the partitioning of the system and on the data overlaps between the sections of columns.

After scheduling the solution of the different subsystems, the master process dispatches to each slave the information for one or more subsystems. During this first dispatch, the master process sends to the slave processes the sparsity pattern of each augmented system and the description of the corresponding sections of columns.

In the MA27 Analyse phase, only the structure of the matrix is used; the numerical values are used later, in the MA27 Factorize phase. For this reason, during the first dispatch the master process sends only the structure of the augmented systems to the slave processes, and each slave process calls the routine that implements the MA27 Analyse phase on one or more augmented systems. While the slave processes call the routine that performs the MA27 Analyse, the master prepares the data messages for the parallel numerical factorization of the subsystems.

Once the messages are prepared, the master process sends to each slave process the numerical data that corresponds to each augmented system. In this way, the second dispatch of information is expected to overlap with the computations of the MA27 Analyse phase that are performed in parallel by the slave processes.

With the numerical information, the slave processes call the routine that performs the MA27 Factorize phase, and at the same time the master process prepares the messages with the right-hand side values and sends them to the slave processes.

Lastly, the master and slave processes take their roles in the Block-CG computations, as described for the Master-Slave distributed parallel Block-CG implementation in Section ... However, some procedures need to be modified to accommodate the use of the block Cimmino iteration matrix in Block-CG.

Basically, there are two main differences between the data manipulation in the parallel Block-CG and that in the parallel Block-CG acceleration. First, as discussed in Section ..., the system of equations is partitioned column-wise in Block-CG, whereas it is partitioned row-wise in the block Cimmino method; to overcome this difference, the blocks of rows are manipulated in sections of columns (see Figure ...).

Secondly, in the Master-Slave distributed Block-CG implementation there is a distributed reduce operation in which the slave processes exchange the results of partial products and each slave builds locally a part of the global product. Furthermore, the use of the diagonal of the matrix helps to identify the contributions of each slave process to the global solution. In the Block-CG acceleration this cannot be used, because the matrix $A$ may not have a full diagonal.

Two solutions to this problem are considered. The first one distributes section ownership roles among the slave processes; a slave process then monitors a reduce operation on one or more sections of columns. In other words, the distributed reduce operation becomes a centralized reduce operation, and the slave process that owns a section of columns is responsible for collecting the local results from its neighbors and broadcasting back the global results.

The number of sections of columns varies with the sparsity pattern of the matrix $A$ and the amalgamation parameter discussed in Section ... A large number of sections of columns will congest the network and degrade the overall performance of the parallel implementation. Moreover, the assignment of the section ownership roles is not trivial, because the parameters that determine the computational weight involved in a centralized reduce operation are only known after a few iterations of the Solve phase, and calibrating the workload after a few iterations of the Solve phase is computationally expensive in the case of the Block-CG acceleration because of all the local data structures that have already been generated by a slave process.

Alternatively, a distributed reduce operation can be used at the expense of redundant computations that are performed locally by two or more slave processes. In this case, each slave process manipulates the data it receives from its neighbor processes and eliminates the redundant data while building a part of the global solution.

In comparison with the parallel Block-CG implementations (see Chapter ...), the granularity of the parallel Block-CG has been increased by the use of the block Cimmino iteration matrix. The increase in the granularity and the effects of the overlaps between sections require a better scheduling strategy than the one presented in Chapter ..., in order to overcome the communication bottlenecks and benefit from the parallel scheme that overlaps communications with useful computations.

Chapter

A scheduler for heterogeneous environments

One of the advantages of working in heterogeneous computing environments is the ability to provide a level of computing performance proportional to the number of resources available in the system and to their different computing capabilities. A scheduler is regarded as a strategy, or policy, for efficiently managing the use of these resources. In parallel distributed environments with homogeneous resources, the level of performance is commensurate with the number of resources present in the system.

A scheduler for parallel iterative methods in heterogeneous computing environments is presented in this chapter. An overview of some scheduling techniques is given in Section ... In Section ... we present a static scheduler for heterogeneous environments and justify its application to parallel Block-CG like iterative methods. The different modules of the scheduler can be reused for other parallel iterative, direct or semi-direct methods.

In heterogeneous computing environments, the scheduler not only considers information about the tasks to be executed in parallel, but must also consider information about the capabilities of the CEs. In Section ..., a syntax for specifying heterogeneous environments is defined. Finally, in Section ..., an example of an application of the scheduler is presented; in the example, the scheduler distributes a set of tasks arising from the parallel Block-CG acceleration of the block Cimmino method presented in Chapter ...

Taxonomy of scheduling techniques

There is a great variety of procedures for solving the problem of distributing a set of resources to a group of consumers in an optimal manner, and this problem is known to be NP-complete (see Garey and Johnson ...). In general, these procedures receive different names depending on the discipline in which they are used. In the context used here, the scheduler can be characterized by a mapping function that assigns work to a set of CEs; thus in some of the literature the problem has been referred to as the mapping problem (see for instance Heddaya and Park ...; Talbi and Muntean ...).

Several authors have classified the different scheduling strategies in a taxonomy. A taxonomy of scheduling strategies is presented in Figure ..., which is an extension of the taxonomies presented by Casavant and Kuhl ... and Talbi and Muntean ... In this chapter a static greedy scheduling strategy is used, and a short overview of the strategies in Figure ... will help to explain our choice of a static greedy scheduling strategy.

Figure ...: Taxonomy of some scheduling strategies: static versus dynamic; optimal versus sub-optimal; sub-optimal strategies divided into approximate ones (mathematical programming, queuing theory, graph theory, enumerative) and heuristic ones (greedy or iterative); dynamic strategies divided into physically distributed (cooperative or non-cooperative) and non-distributed ones. The iterative branch includes general-purpose strategies such as genetic algorithms, hill climbing and simulated annealing, and specific strategies such as process clustering, min-cut partitioning and routing limitation.

The first classification in the taxonomy illustrated in Figure ... is related to the time at which the scheduling strategy is applied. In a pure static strategy, the scheduling decision is made before the parallel processing starts. In a dynamic strategy, decisions about the distribution of tasks are made continually throughout the parallel processing. Therefore, a dynamic strategy is preferred when dynamic creation of tasks or allocation of processors is allowed.

In a dynamic scheduling strategy, very little information is known about the needs of the tasks and the characteristics of the computing resources. Dynamic strategies can be performed in a distributed fashion, when many processors can run the scheduling strategy (see Gao, Liu and Railey ...), or in a non-distributed fashion, in which the dynamic scheduling strategy is centralized in a single processor (see Stone ...).

The classification into cooperative or non-cooperative relates to whether or not the different processors consult one another while performing a dynamic scheduling strategy in a distributed fashion. In the cooperative case, a globally optimal scheduling solution can be sought. In some cases, the high cost of computing an optimal scheduling solution can be reduced by accepting a relatively good scheduling solution, and in this case a sub-optimal scheduling strategy is used. The same scheduling strategies found under the static sub-optimal classification can be used for the dynamic sub-optimal one.

Here, a static scheduling strategy is considered because, in iterative methods, it is generally possible to collect a priori information about the parallel tasks before the parallel processing phase begins. Also, data locality is an important issue for several iterative methods, and moving data around will only increase the communication cost, which already represents a bottleneck in the performance of some parallel iterative methods.

In a static optimal scheduling strategy, the assignment of tasks is based on an optimality criterion, which may involve the minimization of the total execution time or the maximization of the usage of the available resources in the system (see for instance Bokhari ...; Gabrielian and Tyler ...; Shen and Tsai ...). The size of the search space for this problem grows exponentially with the number of resources and tasks involved in a parallel run. Furthermore, the problem has been proven to be NP-complete when the number of processors goes to infinity (see Chretienne ...).

Our attention is therefore restricted to sub-optimal scheduling strategies, because of the exponential cost of optimal scheduling strategies. A sub-optimal strategy can be based on a search for a good approximation to the solution inside a given search space. Several search algorithms have been developed under this principle, and some of them are used in static optimal scheduling strategies. These strategies are not studied in more detail here, and the reader is referred to the following studies of static approximate strategies:

- mathematical programming: Bokhari ...; Gabrielian and Tyler ...; Ma, Lee and Tsuchiya ...;
- queuing theory: Kleinrock ...; Kleinrock and Nilsson ...;
- graph theory: Bokhari ...; Shen and Tsai ...;
- enumerative: Shen and Tsai ...

Static heuristic scheduling strategies make the most realistic assumptions about a priori knowledge concerning the tasks and the individual performance of each processor (see for instance Efe ...), and for this reason they are preferred in this work over static approximate strategies.

In a static greedy scheduling strategy, part of a scheduling solution is given and this solution is expanded until a complete schedule of tasks is obtained; only one task assignment is made at each expansion. On the other hand, iterative scheduling strategies are initialized with a complete schedule of tasks, and this schedule is improved through iterations. At the end of each iteration, the schedule is evaluated using a function, and the output of this function is tested against a stopping criterion (e.g., minimum execution time, maximum utilization of the allocated processors).

Depending on the function that evaluates the current schedule, an iterative strategy can be based on some well known general-purpose strategies, or on specific scheduling solutions that have been proposed for certain problems; in Figure ... the instances of specific scheduling strategies are process clustering (Lo ...) and routing limitation (Bokhari ...), which are not discussed further here. Some examples of general-purpose strategies are genetic algorithms, hill climbing and simulated annealing.

Genetic algorithms are stochastic search techniques introduced by Holland ..., and they are regarded as optimization algorithms based on a space search in which each point is represented by a string of symbols. Each string combination is assigned a value based on a fitness function. The algorithm starts with a basic set of initial points in the space, called the basic population, and a new set of points is generated at each iteration. The process is called reproduction, because the newly generated points are combinations of the current population. Some of these newly generated points are then discarded, using the fitness function as the elimination criterion. From the points that survive, the algorithm generates a new sequence, and this iteration is repeated for a given number of generations.

Hill climbing algorithms (Johnson, Papadimitriou and Yannakakis ...) find a global minimum only in convex search spaces. The algorithm starts with a complete schedule and tries to improve it by local transformations. At the end of each iteration, the new schedule is evaluated and, if the cost of the move from the old schedule to the new one is positive, the move is accepted. This process is repeated until there are no more possible changes to the scheduling solution that would further reduce the cost function; thus a local minimum is found rather than a global one.

Simulated annealing algorithms scan a search space using Markov chains to apply a sequence of random local transformations to a system that has been submitted to a high temperature; these transformations affect the temperature of the system until a state of equilibrium is reached. Simulated annealing algorithms (Kirkpatrick, Gelatt and Vecchi ...) are sequential in nature and difficult to parallelize (Greening ...), and scheduling strategies based on these algorithms are more costly than the two previous iterative strategies (Talbi and Muntean ...).

A static scheduler for block iterative methods

In this section we propose a scheduler for synchronous block iterative methods. The proposed scheduler has a number of parameters that are tuned to suit the particular scheduling needs of a block iterative method. In the taxonomy illustrated in Figure ..., the proposed scheduler is classified as a static greedy scheduling strategy.

A static greedy strategy is preferred for the following reasons:

- In many synchronous block iterative methods, a task is associated with a partition of the system of equations, and these partitions are preserved through the iterations. Thus no dynamic tasks are generated, and there is no advantage in using a dynamic scheduling strategy over a static one.
- Finding an optimal distribution of the workload is ideal, but it is not a necessary condition for parallel block iterative methods. Furthermore, optimal strategies are more expensive to implement and to compute than sub-optimal ones.
- As discussed earlier, heuristic strategies are preferred over approximate ones because heuristic strategies make the most realistic assumptions about a priori knowledge concerning the tasks and the CEs.
- A greedy strategy is preferred over the iterative ones because iterative scheduling strategies are generally more expensive to compute than greedy ones, and it is not always possible to find a fitness function or a good criterion for stopping the iterations.

In general, the scheduling problem for block iterative methods on heterogeneous computing environments is first approached through a symbolic description of the partition of the problem to be solved (the system of equations) and of the heterogeneous computing environment.

Without loss of generality, the linear system of equations
$$ Ax = b $$
can be divided into $l$ subsystems of equations, and the resulting subsystems are not necessarily mutually exclusive (e.g., some equations can be repeated in more than one subsystem). In general, this partition can be replaced by a different one when solving a different type of problem with a block iterative method.

The division of the system into the $l$ subsystems can be represented by an undirected graph of subsystems given by
$$ G_s = (V_s, E_s), $$
where $V_s$ is the set of vertices $\{ V_{s_i},\ i = 1, \ldots, l \}$, with one vertex per subsystem of equations, and $E_s$ is the set of edges that represent a row or column overlap between subsystems. If any two subsystems have rows or columns that overlap, then there is an edge between them and, depending on the number of overlapping rows or columns, an integer value is associated with the edge to represent the potential communication between the two subsystems. The value associated with an edge is greater than zero; a zero value is reserved for subproblems that are mutually exclusive, whose edges are not explicitly included in the graph.

On the other hand, the heterogeneous computing environment is described as a metacomputer with $p$ CEs. CEs are identified by computing characteristics that differentiate one from the other (e.g., computer name, computing speed, number of processors, architecture, etc.). Similarly to the graph of subsystems, a graph of CEs is defined by
$$ G_{ce} = (V_{ce}, E_{ce}), $$
where $V_{ce}$ is the set of vertices $\{ V_{ce_i},\ i = 1, \ldots, p \}$ and each vertex represents a CE in the metacomputer. Moreover, there are three different types of vertices. $V_{ce_R}$ is identified as the root vertex, used for monitoring the parallel processing in the metacomputer; usually, the parallel environment is generated from the CE represented by the root vertex. A vertex in $G_{ce}$ can also be a single processor or a cluster of processors; in the latter case, the cluster is represented by an undirected subgraph of CEs, $G_{ce_i} = (V_{ce_i}, E_{ce_i})$, that expresses the interconnection between the CEs inside the cluster. A cluster can be shared or distributed memory.

In this definition, $E_{ce}$ is the set of edges of the metacomputer's graph, and each edge represents the interconnection network between the different CEs. Figure ... depicts an example of a CE graph. In the figure, one of the clusters is a distributed memory cluster, whose vertices are the CEs inside the cluster; another is a shared memory cluster, represented by a subgraph with no edges, since its CEs are connected through a pool of memory rather than a physical communication network.

The scheduling strategy can be defined as a mapping function
$$ f : V_s \rightarrow V_{ce} $$
such that
$$ f(v_{s_i}) = v_{ce_j}, \qquad i = 1, \ldots, l, \quad 1 \le j \le p . $$

Figure ...: An example of a graph of CEs, showing the root vertex $V_{ce_R}$, cluster vertices $G_{ce_i}$ and single-processor vertices $V_{ce_i}$.

Some authors have studied $f$ as a continuous function (for instance Prasanna and Musicus ...), with an infinite number of CEs, where any fraction between 0 and 1 of a CE can be used to solve a subsystem. However, in practice $f$ is taken to be a discrete function, and a subsystem can only be assigned to one CE.

To balance the workload distribution, each $v_{s_i}$ is assigned a computational weight using a discrete function defined as
$$ W : V_s \rightarrow \mathbb{N}, $$
where $\mathbb{N}$ is the set of natural numbers. Then
$$ W(v_{s_i}) = \sum_{j=1}^{r} \alpha_j\, \sigma_{j,i}, $$
where the $\alpha_j \in \mathbb{R}$ are scalars and the $\sigma_{j,i}$ are parameters relevant to the subproblem (e.g., number of nonzeros, size of the subproblem, etc.). The scalars $\alpha_j$ control the significance of the parameters $\sigma_{j,i}$.

Similarly, a function is defined to measure different computational aspects of the CEs. This function outputs a number that is used as a priority for assigning work to the CEs, such that a CE with a higher priority will be assigned tasks before a CE with a lower priority:
$$ P : V_{ce} \rightarrow \mathbb{N}, \qquad P(v_{ce_i}) = \sum_{j=1}^{r} \beta_j\, p_{j,i}, $$
where the $\beta_j \in \mathbb{R}$ are scalars that control the significance of the parameters, or specifications, $p_{j,i}$ of each CE.

Additionally, a workload function is defined as
$$ \Omega : V_{ce} \rightarrow \mathbb{N} $$
such that
$$ \Omega(v_{ce_j}) = \sum_{i=1}^{l_j} W\!\left( v_{s_i}^{j} \right), $$
where $\{ v_{s_i}^{j} \}_{i=1}^{l_j}$ is the set of subsystems assigned to $v_{ce_j}$.

A static greedy scheduling strategy for distributing tasks to a group of heterogeneous CEs is described in Algorithm ...

Algorithm ... A Static Scheduling Algorithm

Sort the $v_{s_j}$ in descending order according to $W(v_{s_j})$
$l \leftarrow$ number of $v_{s_j}$'s
$p \leftarrow$ number of CEs in the metamachine
if $l < p$ then $p \leftarrow l$
Sort the $v_{ce_j}$ in descending order according to $P(v_{ce_j})$
for each $v_{ce_j} \in V_{ce}$ do
    assign $v_{s_j}$ to $v_{ce_j}$ and update $\Omega(v_{ce_j})$
for each $v_{s_i} \in V_s$ not yet assigned do
    find the best $v_{ce_j} \in V_{ce}$ to work on $v_{s_i}$, considering $E_s$ and $\Omega(v_{ce_j})$
Stop

The purpose of considering $E_s$ in Step ... of Algorithm ... is to minimize the communication between the CEs, and the purpose of considering $\Omega(v_{ce_j})$ in the same step is to keep the workload distribution balanced. However, the order in which these two parameters are considered leads to two different scheduling strategies.

If the set of edges is considered first, then the priority is to reduce the communication between the CEs. In this case, an attempt is made to assign to the same CE two or more subsystems that will potentially communicate a lot; the $\Omega(v_{ce_j})$ parameter is also used here to avoid overloading a CE.

If the $\Omega(v_{ce_j})$ are considered first, then the priority is to balance the workload among the CEs. For each task in the not-yet-assigned list of tasks, the processor with the lowest $\Omega(v_{ce_j})$ is sought, and the set of edges is used to resolve conflicts between two or more CEs with the same computational weights.
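The two variants can be sketched compactly as below. This is only an illustration of the greedy strategy in Algorithm ...; the data structures and names are chosen here for readability and are not those of the actual code.

    def greedy_schedule(task_weights, task_edges, ce_priorities, comm_first=True):
        # task_weights : dict task -> W(v_s)           (computational weight)
        # task_edges   : dict (task_a, task_b) -> int  (overlap, i.e. potential communication)
        # ce_priorities: dict ce -> P(v_ce)            (priority of each CE)
        tasks = sorted(task_weights, key=task_weights.get, reverse=True)
        ces = sorted(ce_priorities, key=ce_priorities.get, reverse=True)
        load = {ce: 0 for ce in ces}                   # Omega(v_ce)
        assign = {}

        # Seed the schedule: heaviest tasks go to the highest-priority CEs.
        for task, ce in zip(tasks, ces):
            assign[task] = ce
            load[ce] += task_weights[task]

        # Expand the schedule one task at a time.
        for task in tasks[len(ces):]:
            def comm_with(ce):
                return sum(w for (a, b), w in task_edges.items()
                           if (a == task and assign.get(b) == ce)
                           or (b == task and assign.get(a) == ce))
            if comm_first:
                # Prefer the CE already holding this task's neighbors,
                # breaking ties by the lightest current workload.
                ce = max(ces, key=lambda c: (comm_with(c), -load[c]))
            else:
                # Prefer the least-loaded CE, breaking ties by communication.
                ce = min(ces, key=lambda c: (load[c], -comm_with(c)))
            assign[task] = ce
            load[ce] += task_weights[task]
        return assign

Swapping the two criteria in the selection step switches between the communication-first and the workload-first strategies described above.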

Heterogeneous environment specification

The heterogeneous environment is described in an input file, or hesfile, for heterogeneous environment specification file. A hesfile contains one line per CE in the metamachine. Each line contains a list of keywords and values with the following syntax:

hostname [ct= | CT= {DISTRIB | SHARED | SINGLE}]
         [nm= | NM= hostname, hostname, ...]
         [np= | NP= integer number]
         [sp= | SP= integer number]

The keywords and corresponding values between square brackets ([ ]) are optional. The hostname is mandatory and, if none of the other keywords is specified, the default values are used.

CT specifies the cluster type, and the default value for this keyword is SINGLE, for a single processor; the SINGLE value is provided for the declaration of workstations. If the cluster type is set to DISTRIB (e.g., ct= DISTRIB), then the hostname denotes a distributed memory cluster. A distributed memory cluster is used for specifying a distributed memory machine, a virtual shared memory machine, or a subset of a network of workstations connected through a special communication network. In the last case, the CEs of the distributed cluster are addressed independently; thus the keyword NM is used for declaring the list of machine names in the distributed cluster.

The number of CEs in a cluster is specified with the keyword NP, and the default is one. The keyword SP is used for specifying the speed of a CE, or an output value from the priority function defined above.
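As an illustration only (the host names, processor counts and speed values below are invented, and blank-separated keyword=value pairs are assumed to be accepted as in a PVM hostfile), a hesfile for a small metamachine could look like this:

    sp-cluster    ct= DISTRIB  nm= node01, node02, node03, node04  np= 4  sp= 1000
    smp-server    ct= SHARED   np= 8   sp= 800
    ws1.lab.edu   sp= 400
    ws2.lab.edu

Here the first line declares a distributed memory cluster of four CEs, the second a shared memory cluster, and the last two lines single workstations relying on the default values.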

A scheduler for a parallel block Cimmino solver

The different phases of the parallel Block-CG acceleration are depicted in Figure ... The basic iteration matrix in this case is the block Cimmino iteration matrix; thus the subproblems are obtained from the $l$ row partitions of the system of equations, and the computational weight factors are computed as
$$ W(v_{s_i}) = \alpha_1 \sigma_{1,i} + \alpha_2 \sigma_{2,i} + \alpha_3 \sigma_{3,i}, $$
where $\sigma_{1,i}$ is the size of the subsystem of equations, $\sigma_{2,i}$ is the number of nonzeros, and $\sigma_{3,i}$ is the number of edges. The scalars $\alpha_1$, $\alpha_2$ and $\alpha_3$ determine the relevance of one parameter against the others. The values associated with the edges in $G_s$ are the numbers of columns that overlap between pairs of subsystems.
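In code, such a weight reduces to a one-line combination of the three factors; the coefficients below are purely illustrative defaults, not the values used in the experiments.

    def subsystem_weight(n_rows, n_nonzeros, n_edges, alpha=(1.0, 1.0, 1.0)):
        # W(v_s) = alpha1*size + alpha2*nnz + alpha3*edges, rounded to an
        # integer to match the discrete weight function W : V_s -> N.
        return round(alpha[0] * n_rows + alpha[1] * n_nonzeros + alpha[2] * n_edges)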

The parallel implementation of the Block-CG acceleration is based on the Master-Slave distributed Block-CG implementation; thus the CEs communicate through messages. The PVM system library has been used for handling the message passing, and the syntax of the hesfile therefore accepts all the PVM keywords defined for a hostfile specification (see Geist, Beguelin, Dongarra et al. ...).

PVM provides standard information about each CE in a data structure called pvmhostinfo, and this information can be modified through the PVM keywords. From the information in the hesfile and the information supplied by pvmhostinfo, the nodes of the $G_{ce}$ graph are generated. Figure ... depicts a node of the $G_{ce}$ graph.

Figure: Structure of a node in the G_ce graph. The fields of a node are:

    A: cluster type            G: time_ref1
    B: host_id                 H: time_ref2
    C: priority                I: number of descendents
    D: status                  J: pointer to first descendent
    E: C.P.U. speed            K: next sibling
    F: pvm_tid
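The node layout above can be read as a simple record. A minimal Python sketch of such a record is given below; the field names follow the figure, but the actual implementation in the thesis is in a different language and is not reproduced here (the list of descendents replaces the descendent count and first-descendent pointer of the figure).

    from dataclasses import dataclass, field
    from typing import Optional, List

    @dataclass
    class GceNode:
        # Fields A-K from the figure describing a node of the G_ce graph.
        cluster_type: str            # A: SINGLE, SHARED or DISTRIB
        host_id: int                 # B: identifier of the host
        priority: float              # C: set after computing the CE priority
        status: int                  # D: used for monitoring the execution of the CE
        cpu_speed: float             # E: taken from PVM or from the hesfile
        pvm_tid: int                 # F: PVM task identifier
        time_ref1: float = 0.0       # G: monitoring time reference
        time_ref2: float = 0.0       # H: monitoring time reference
        descendents: List["GceNode"] = field(default_factory=list)  # I/J
        next_sibling: Optional["GceNode"] = None                    # K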

The fields host_id and pvm_tid are included to interface the solver with the PVM system. The cluster type is provided by the user in the hesfile. The fields number of descendents, pointer to first descendent and next sibling are provided for traversing the G_ce graph.

The fields status, time_ref1 and time_ref2 are used for monitoring the execution of a CE. The priority field is set after computing the CE priority. The speed field is taken from the PVM system by default; however, the user can specify it directly with keywords from the hesfile or the PVM hostfile.

The scheduling algorithm described above is used to schedule the solution of the subsystems on one of the CEs. The priority is given to the communication because of the synchronization points of the Block-CG method.

The partitions of the system of equations can be uniform or non-uniform. If the system of equations is uniformly partitioned, then each subsystem will have the same number of rows and the computational weights will not vary much from one subsystem to the other. In this case,


an approximate average workload per processor is defined by

    w = \frac{\sum_{i=1}^{l} W(v_{s_i})}{\sum_{i=1}^{p} Nprocs(v_{ce_i})},

where Nprocs is a function that returns the number of processors specified in the declaration of each CE.

On the other hand, if the system is non-uniformly partitioned, then the sizes and computational weights from one subsystem to another may vary a lot. In this case, an upper bound for the computational workload of a CE is defined by

    w_l(v_{ce_i}) = \beta \, w \, Nprocs(v_{ce_i}),   with \beta \in R and \beta \geq 1.

Other authors have suggested tuning the workload distribution by minimizing the workload imbalance after each assignment of a subsystem (see for instance Talbi and Muntean). However, in the parallel block Cimmino accelerated with the Block-CG algorithm, the priority for assigning subsystems to the CEs is on the values associated with the edges in G_s; thus we suggest using the average workload or the upper bound above to keep the workload balanced.
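A minimal numeric sketch of these two quantities, with made-up weights and CE declarations, could be written as follows (all the names and values are illustrative only):

    # Average workload per processor and a per-CE upper bound (beta is a tuning scalar).
    weights = {1: 120.0, 2: 95.0, 3: 160.0, 4: 110.0}     # W(v_s_i), hypothetical
    nprocs  = {"ceA": 4, "ceB": 2, "ceC": 1}              # Nprocs(v_ce_i), hypothetical
    beta = 1.2

    w_avg = sum(weights.values()) / sum(nprocs.values())            # average workload per processor
    w_upper = {ce: beta * w_avg * n for ce, n in nprocs.items()}    # upper bound per CE
    print(w_avg, w_upper)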

Chapter

Scheduler experiments

The purpose of this chapter is to validate the use of the static greedy scheduler proposed in the previous chapter (see also Arioli, Drummond, Duff and Ruiz). The following experiments were run on a network of SUN SPARC and IBM RS/6000 workstations, as shown in the table below.

    ID      Computing Element                  Number of Processors
    A       Cluster of IBM RS/6000
            with FDDI network
    B       IBM RS/6000
    C       IBM RS/6000
    D-H     SUN SPARC

    Table: List of available computing elements.

An unsymmetric, non-diagonally dominant matrix that comes from a finite volume discretization of the Navier-Stokes equations coupled with chemistry is used in the numerical experiments. The discretization is performed using a curvilinear mesh with five variables per mesh point, and this leads to a large sparse matrix. The original system is partitioned into blocks. For the first experiment the system of equations is partitioned as described in Partition I below, and in the second experiment as in Partition II. The resulting G_s graphs for the first and second partitions are depicted in the figures below.

Each subsystem will become a task to be performed by one of the computing elements in the metacomputer. Three different scheduling strategies are compared in the following sections, to analyse the influence of the scheduling strategy on the overall execution time of the parallel block Cimmino solver accelerated with the Block-CG algorithm.

The first strategy is a sequential distribution of subsystems to CEs, in which subsystems are distributed one by one to the next CE in a circular queue of CEs. In this case neither communication nor workload balancing issues are considered.

The second strategy uses the scheduling algorithm with the emphasis on balancing the workload among the CEs. In this strategy, the CE with the lowest workload value is the candidate for solving the subsystem in turn. If there is more than one candidate, the subsystem is scheduled to the CE that has previously been assigned the set of subsystems with the highest potential of communication with the subsystem in turn.

Figure: Graph of subsystems according to the first partition.

Figure: Graph of subsystems according to the second partition.

The purpose of the third strategy is to reduce the communication cost per iteration by scheduling two or more subsystems with a high potential of communication to the same CE, using the upper bound defined earlier for the workload per CE. The third strategy also uses the scheduling algorithm, with the emphasis on minimizing the communication.

Table: Partition I of the system of equations into subsystems; for each subsystem, the table gives the number of rows, the size of the augmented system, and the number of nonzeros.

Scheduling subsystems with a nearly uniform workload

As shown in the table above, the subsystems have almost the same weight factor, and the purpose of this experiment is to test the performance of the parallel Cimmino implementation under different distribution strategies.


Table: Partition II of the system of equations into subsystems; for each subsystem, the table gives the number of rows, the size of the augmented system, and the number of nonzeros.

Table: Distributions of subsystems to the computing elements A-H under the three scheduling strategies (in sequence/random; first consider weight factor; first consider communication). Each subsystem is identified in the table by its number.

Table: Number of neighbors per computing element (A-H) and total execution time in seconds under the three strategies.


The results from the first experiments are shown in the two tables above. Each CE has to communicate with two neighbors. The first strategy has performed better than the second strategy, because the weight of the subsystems is already balanced by the first partition and the attempt to balance the workload only generates more communication.

In the third strategy the number of neighbors is reduced by one, and in this case the third strategy reduces the execution time by saving communication.

Scheduling subsystems with a non-uniform workload

Table: Distributions of subsystems to the computing elements A-H under the three scheduling strategies (in sequence/random; first consider weight factor; first consider communication). Each subsystem is identified in the table by its number.

Table: Number of neighbors per computing element (A-H) and total execution time in seconds under the three strategies.

In this experiment the subsystems have different sizes, and some require more communication than others. Thus the performance of the three scheduling strategies is studied in order to conclude with a strategy that best suits the parallel Cimmino implementation.

The results from the second experiments are summarized in the two tables above: the first depicts the distribution of subsystems, and the second presents the number of neighbors per CE and the total execution time. Clearly, the third strategy reduces the execution time because the number of communications is also reduced, and communication is a bottleneck in parallel implementations of conjugate gradient based methods.

The second strategy has performed better than the first one because it balanced the workload among the CEs, and as a result CEs had to wait less at the synchronization points. In the first experiment this effect did not appear because the subsystems were of almost the same size; however, in the second problem these subsystems vary in size and in number of neighbors.

Remarks

The parallel performance of the block Cimmino solver depends a great deal on the distribution of tasks among the CEs. The first scheduling strategy should be reserved for cases in which the linear system is evenly partitioned into blocks of rows, the number of neighbors per CE is the same, and the CEs have the same computing capabilities. In practice these trivial partitionings are not performed, and we dedicate the following chapters to studying the effects of the partitioning on the behavior of the block Cimmino solver and its parallel implementation.

The second scheduling strategy should be preferred when the sizes of the blocks vary more than the number of neighbors between blocks. The first and second scheduling strategies are simpler and less costly to implement than the third strategy.

The third scheduling strategy performs better than its counterparts, and in the previous two experiments it has improved the performance of the parallel Cimmino implementation. Thus this strategy is used in the experiments presented in the next chapter, and it is compared against the second scheduling strategy in the partitioning experiments chapter.


Chapter

Block Cimmino experiments

In this chapter we present runs of the parallel Cimmino with Block-CG acceleration. In these experiments, trivial block-row partitionings are performed on the linear system. Later, in the chapter on partitioning strategies, we study partitioning strategies that may improve the overall performance of the iterative solver in terms of its convergence rate and the accuracy with which it approximates the real solution of the system of equations.

The GRE 1107 problem is used in the first experiment. This problem comes from the Harwell-Boeing matrix collection (see Duff, Grimes and Lewis). The unsymmetric matrix arises in the simulation of computer systems. The sparsity pattern of GRE 1107 is shown in the figure below. The matrix is very ill-conditioned, and Arioli, Demmel and Duff have shown that its classical condition number in the infinity norm, ||A||_inf ||A^{-1}||_inf, is very large.

In the second experiment we use the FRCOL problem, which comes from a two-dimensional model developed by Perrel for studying the effects of a body entering the atmosphere at a high Mach number. The problem is discretized using a finite volume discretization of the Navier-Stokes equations coupled with chemistry. There are two velocities and one energy at each grid point. From the chemistry there are two species, which leads to two density variables per mesh point. Thus there is a total of five variables per mesh point.

The discretization is performed on a curvilinear mesh of points, leading to block tridiagonal matrices.

In the first section below we solve the GRENOBLE 1107 problem using the block Cimmino accelerated with the Block-CG, and study the impact of different block sizes on the performance of the parallel solver. In these experiments, the block size for the Block-CG acceleration and the number of CEs in the system are varied.

Afterwards, we solve the FRCOL problem and fix the block size, to isolate the effects of the block size on the rate of convergence, which may speed up the parallel execution of the iterative solver, from the speedups arising from the parallelism. Thus, in those experiments only the number of CEs in the system is varied.

Solving the GRENOBLE 1107 problem

The GRE 1107 matrix is partitioned into seven subsystems. The first six subsystems have the same number of rows each, made of three blocks of rows, and the last subsystem contains the remaining rows, also made of three blocks of rows. The experiments are run on the Thin node SP computer at CNUSC. The maximum number of CEs to be used in these experiments is seven because there

are seven subsystems.

Figure: Sparsity pattern of the GRE 1107 matrix.

Figure: Convergence curves with iteration count for the test problem GRE 1107. The curves correspond to Block_CG(01), Block_CG(04), Block_CG(08), Block_CG(16) and Block_CG(32); the vertical axis is the norm-wise backward error and the horizontal axis the number of iterations.

Figure: Convergence curves with equivalent iterations (the iteration count is multiplied by the block size) for the test problem GRE 1107. The curves correspond to the same block sizes as in the previous figure.

The two figures above show the convergence curves corresponding to these experiments; the larger block sizes converge in fewer iterations than the Classical CG acceleration.

In the tables below, the execution time of the Solve phase has been separated from the other two phases, because in some cases the output of the Analyse and Factorize phases can be reused to solve the same linear system of equations with a different set of right-hand sides. Thus only the Solve phase is rerun in these cases. For these experiments we first focus our attention on the total run time, and then on the run time of the Solve phase by itself.

A close look at the times shown in these tables reveals that most of the computational weight is in the Solve phase, and thus the Block-CG implementation has a great impact on the performance of this parallel block Cimmino implementation.

As shown in the convergence curves with equivalent iterations, the Block-CG acceleration with a block size of four is as fast, in terms of equivalent iterations, as the CG acceleration (block size of one) reported in the previous figure. However, the timing tables show that using the Block-CG with a block size of four is faster than using Classical CG.

In all cases, the speedups increase as we increase the number of CEs, and the efficiencies obtained in these experiments are higher than the ones obtained with the parallel Block-CG alone (see the earlier chapter). This is due to an increase in the granularity of the tasks performed in parallel; in this implementation of the Block-CG acceleration, the increase in granularity comes from the computation of the orthogonal projections in the block Cimmino method.


Table: Results from the parallel block Cimmino with Block-CG acceleration on the solution of the GRE 1107 problem, for two block sizes. For each number of CEs, the table reports the time, speedup and efficiency of the total run and of the Solve phase, together with the sequential times. Times in this table are in seconds.

Table: Results from the parallel block Cimmino with Block-CG acceleration on the solution of the GRE 1107 problem, for two further block sizes. For each number of CEs, the table reports the time, speedup and efficiency of the total run and of the Solve phase, together with the sequential times. Times in this table are in seconds.


Table: Results from the parallel block Cimmino with Block-CG acceleration on the solution of the GRE 1107 problem, for one block size. For each number of CEs, the table reports the time, speedup and efficiency of the total run and of the Solve phase, together with the sequential times. Times in this table are in seconds.

Solving the FRCOL problem

Here the FRCOL problem is solved using the parallel implementation of the block Cimmino method accelerated with the stabilized Block-CG algorithm. In this experiment, the size of the block for the Block-CG acceleration is fixed, in order to isolate the performance of the parallel implementation from that related to the block size. The linear system is partitioned into blocks of rows of equal size; the maximum number of CEs used in the parallel execution is therefore the number of blocks.

The FRCOL problem is larger than the GRENOBLE 1107 problem; thus the grain size of the parallel task has been increased. As shown in the table below, this increase has favored the performance of the parallel Cimmino implementation, and the efficiencies obtained are high both when considering the total run time and when only considering the execution time of the Solve phase. Thus most of the computational resources are being used during the parallel executions.

The efficiency in the Solve phase increases as the number of CEs is increased, while the efficiency of the total run starts to decrease after five CEs. This is due to the communication in the first two phases: as shown in the table of percentages below, the percentage of the total run time consumed in the Solve phase decreases as the number of CEs is increased, whereas the percentage of the total run time consumed by the master-to-slave communication in the Analyse and Factorize phases increases as the number of CEs is increased. Therefore, the efficiency of the total run decreases despite the reduction in the total run time obtained by increasing the number of CEs.


Table: Results from the parallel Cimmino with Block-CG acceleration on the solution of the FRCOL problem. For each number of CEs, the table reports the time, speedup and efficiency of the total run and of the Solve phase, together with the sequential times. Times in this table are in seconds.

Table: Percentages of the total run time consumed by the communication of the Analyse and Factorize phases and by the execution of the Solve phase, as a function of the number of CEs.


Remarks

The block Cimmino with Block-CG acceleration appears to be a reliable solver which suits well the solution of some practical problems (see for instance Benzi, Sgallari and Spaletta, and O'Leary). The solver can be easily tuned to reduce the number of iterations and improve the performance of a parallel run.

In some cases the efficiency of the parallel runs of block Cimmino increases as the number of CEs is increased, and the maximum number of CEs to be used in a run is limited by the number of blocks of rows in the partition of the linear system.

The partition of a linear system should be driven by the nature of the problem being solved. In the next chapter we study natural partitioning and preprocessing strategies to improve the performance of the block Cimmino method.


Chapter

Partitioning strategies

Preprocessing a linear system of equations improves the performance of most linear solvers, and in some cases a more accurate approximation to the real solution is found. For instance, numerical instabilities are encountered in the solution of some linear systems, and it is necessary to scale the systems before they are solved. These instabilities occur even when the most robust computer implementations of direct or iterative solvers are used.

Also, performing some permutations of the elements of the original system can substantially reduce the required computing time for the solution of large sparse linear systems, by improving the rate of convergence of some iterative methods (e.g. SOR and Kaczmarz methods) and by reducing fill-in in direct methods.

The aim of this chapter is to study preprocessing strategies to derive natural block-row partitionings from general linear systems of equations. We study two different preprocessing strategies to be applied to the original matrix A. These preprocessing strategies are based on permutations that transform the matrix AA^T into a matrix with a block tridiagonal structure.

Transforming the matrix AA^T into a block tridiagonal matrix provides a natural partitioning of the linear system for row projection methods, because these methods use the normal equations to compute their projections. Therefore, the resulting natural block partitioning should improve the rate of convergence of block-row projection methods such as block Cimmino and block Kaczmarz.

Ill-conditioning in and across blocks

An ill-conditioned matrix A has some linear combinations of rows that are almost equal to zero, and as mentioned by Bramley and Sameh, these linear combinations may occur inside blocks or across blocks after block-row partitionings. If the method used for solving the subproblems is sensitive to ill-conditioning within the blocks, then the method may converge to the wrong solution.

Assuming that the projections in the block Cimmino are computed on the subspaces exactly, the rate of convergence of the block Cimmino method then depends only on the conditioning across the blocks. Therefore, a combination of a robust method for computing the projections and a partitioning strategy that minimizes the ill-conditioning across the blocks is sought.

To study the ill-conditioning across the blocks, we consider the linear system of equations resulting from the block-row partitioning and the QR decomposition of the submatrices A_i^T. Then

    A_i^T = Q_i R_i,   i = 1, ..., l,

where A_i is an m_i x n matrix of full row rank,

    Q_i : n x m_i,   Q_i^T Q_i = I_{m_i x m_i},
    R_i : m_i x m_i, a nonsingular upper triangular matrix.

Writing again the sum of projections of the block Cimmino method in matrix form, we have

    M = \sum_{i=1}^{l} A_i^T (A_i A_i^T)^{-1} A_i,

and replacing the A_i's by their QR factors,

    M = \sum_{i=1}^{l} Q_i R_i (R_i^T Q_i^T Q_i R_i)^{-1} R_i^T Q_i^T
      = \sum_{i=1}^{l} Q_i R_i (R_i^T R_i)^{-1} R_i^T Q_i^T
      = \sum_{i=1}^{l} Q_i Q_i^T
      = Q_1 Q_1^T + Q_2 Q_2^T + \cdots + Q_l Q_l^T.

Using the theory of the singular value decomposition (see Golub and Kahan; Golub and Van Loan), it can be seen that the nonzero eigenvalues of (Q_1, ..., Q_l)(Q_1, ..., Q_l)^T are also the nonzero eigenvalues of (Q_1, ..., Q_l)^T (Q_1, ..., Q_l). Therefore, the spectrum of the matrix M above is the same as the spectrum of the matrix

    \begin{pmatrix}
      I_{m_1 \times m_1} & Q_1^T Q_2 & \cdots & Q_1^T Q_l \\
      Q_2^T Q_1 & I_{m_2 \times m_2} & \cdots & Q_2^T Q_l \\
      \vdots & & \ddots & \vdots \\
      Q_l^T Q_1 & Q_l^T Q_2 & \cdots & I_{m_l \times m_l}
    \end{pmatrix},

where the Q_i^T Q_j are the matrices whose singular values represent the cosines of the principal angles between the subspaces R(A_i^T) and R(A_j^T) (see Bjorck and Golub). These principal angles are recursively defined by

    \cos \theta_k^{ij} = \max_{u \in R(A_i^T)} \max_{v \in R(A_j^T)} \frac{u^T v}{\|u\| \, \|v\|}
                       = \frac{(u_k^{ij})^T v_k^{ij}}{\|u_k^{ij}\| \, \|v_k^{ij}\|},

subject to

    u^T u_p^{ij} = 0   for p = 1, ..., k-1,  and
    v^T v_p^{ij} = 0   for p = 1, ..., k-1.

The sets of vectors \{u_k^{ij}\} and \{v_k^{ij}\}, with k varying from 1 to m_{ij} = \min\{\dim R(A_i^T), \dim R(A_j^T)\}, are called the principal vectors between the subspaces R(A_i^T) and R(A_j^T).

Notice that the principal angles satisfy

    0 \le \theta_1^{ij} \le \cdots \le \theta_{m_{ij}}^{ij} \le \pi/2;

if all of the m_{ij} principal angles are equal to \pi/2, it means that R(A_i^T) is orthogonal to R(A_j^T), and the wider the principal angles are, the closer the block Cimmino iteration matrix is to the identity matrix. Furthermore, the closer the iteration matrix is to the identity, the faster the convergence of the Block-CG acceleration should be.

The principal angles by themselves do not provide information regarding the spectrum and ill-conditioning of the resulting iteration matrix; therefore, it is necessary to study partitioning strategies for which there exists a strong relation between these principal angles and the spectrum of the iteration matrix.

Without loss of generality, assume that the matrix A is partitioned in two blocks

    A = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix},

where A_1 and A_2 have m_1 and m_2 rows respectively, and further assume that m_1 \geq m_2. With this partitioning, the block Cimmino iteration matrix is defined as C = I - \omega (P_1 + P_2), where P_1 = P_{R(A_1^T)} and P_2 = P_{R(A_2^T)}. From the previous derivation, M = P_1 + P_2 reduces to Q_1 Q_1^T + Q_2 Q_2^T. Therefore, the spectrum of the block Cimmino iteration matrix for this partition is determined by the spectrum of

    \begin{pmatrix} I_{m_1 \times m_1} & Q_1^T Q_2 \\ Q_2^T Q_1 & I_{m_2 \times m_2} \end{pmatrix}.

As mentioned earlier, the block Cimmino with the Block-CG acceleration is independent of the relaxation parameter \omega. Thus, if \omega = 1, the spectrum of the iteration matrix is that of

    \begin{pmatrix} 0_{m_1 \times m_1} & Q_1^T Q_2 \\ Q_2^T Q_1 & 0_{m_2 \times m_2} \end{pmatrix}.

The following observations follow from the matrix above.

Since m_2 \leq m_1, the rank of the rectangular matrices Q_1^T Q_2 and Q_2^T Q_1 is not greater than m_2. Thus there are at most 2 m_2 nonzero eigenvalues.

The use of the Block-CG acceleration guarantees the finite termination of the block Cimmino method in exact arithmetic. In this particular case, the number of steps needed to reach convergence is at most of the order of m_2. Therefore, block Cimmino with the Block-CG acceleration will work better for small values of m_2.

The singular values of the matrix Q_1^T Q_2 and the eigenvalues of the matrix above are strongly related. Indeed, a singular value decomposition of the matrix Q_1^T Q_2 (following Golub and Van Loan) is given by

    Q_1^T Q_2 = U \Sigma V^T,

with U and V orthogonal matrices of dimensions m_1 \times m_1 and m_2 \times m_2 respectively, and \Sigma a rectangular matrix of dimensions m_1 \times m_2 with \Sigma = diag(\sigma_1, ..., \sigma_{m_2}). The matrix above is then equal to

    \begin{pmatrix} U & 0 \\ 0 & V \end{pmatrix}
    \begin{pmatrix} 0 & \Sigma \\ \Sigma^T & 0 \end{pmatrix}
    \begin{pmatrix} U^T & 0 \\ 0 & V^T \end{pmatrix},

and has the same eigenvalues as the matrix

    \begin{pmatrix} 0 & \Sigma \\ \Sigma^T & 0 \end{pmatrix}.

The eigenvalues of this matrix are \{ \pm \sigma_i : i = 1, ..., m_2 \}, and the \sigma_i correspond to the cosines of the principal angles between the two subspaces R(A_1^T) and R(A_2^T).
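To make this relation concrete, the cosines of the principal angles between two row spaces can be computed numerically from the SVD of Q_1^T Q_2. The small NumPy sketch below uses random full-rank blocks and is purely illustrative; it is not part of the solver described in this thesis.

    import numpy as np

    # Illustrative only: two random row blocks A1 (m1 x n) and A2 (m2 x n), m1 >= m2.
    rng = np.random.default_rng(0)
    m1, m2, n = 6, 4, 10
    A1, A2 = rng.standard_normal((m1, n)), rng.standard_normal((m2, n))

    # Orthonormal bases of R(A1^T) and R(A2^T) via QR of the transposes.
    Q1, _ = np.linalg.qr(A1.T)   # n x m1
    Q2, _ = np.linalg.qr(A2.T)   # n x m2

    # Singular values of Q1^T Q2 are the cosines of the principal angles.
    cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)

    # The nonzero eigenvalues of Q1 Q1^T + Q2 Q2^T are 1 +/- cos(theta_i), plus ones.
    eigs = np.linalg.eigvalsh(Q1 @ Q1.T + Q2 @ Q2.T)
    print(np.sort(cosines), np.sort(eigs[eigs > 1e-12]))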

Two-block partitioning

In the following sections we will study two preprocessing strategies to permute the rows of the matrix A, based on permutations that transform the AA^T matrix into a block tridiagonal matrix. Matrices with block tridiagonal structures are commonly found in discretizations of partial differential equations, and in practice these systems are solved using different iterative schemes. Coefficient matrices from other general systems of equations can also be permuted to block tridiagonal matrices using matrix reordering techniques (see for example Duff, Erisman and Reid).

Two-block partitioning refers to a partitioning strategy in which the columns inside a block of rows intersect the columns of at most two other blocks. A two-block partitioning can be applied to matrices with a block diagonal structure, and in some cases this will speed up the convergence of the linear solver.

For instance, let the matrix A, with blocks of size b x b, be partitioned into five blocks of rows as illustrated here:

    A = \begin{pmatrix} A_1 \\ A_2 \\ A_3 \\ A_4 \\ A_5 \end{pmatrix},

where each A_i has k_i b rows and each k_i \geq 1 for i = 1, ..., 5; the block diagonal structure of A implies that only consecutive blocks of rows share columns. Further, for i = 1, 2, 3 we observe that the pair of subspaces R(A_i^T) and R(A_{i+2}^T) are orthogonal, and the sum of the orthogonal projections onto these two subspaces corresponds to the orthogonal projection onto the direct sum of these orthogonal subspaces, viz.

    P_{R(A_i^T)} + P_{R(A_{i+2}^T)} = P_{R(A_i^T) \oplus R(A_{i+2}^T)}.
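A tiny NumPy check of this identity, with made-up orthogonal row blocks, is sketched below. It is illustrative only; the blocks are random and are not related to the matrices used in the thesis.

    import numpy as np

    def proj_rowspace(A):
        # Orthogonal projector onto R(A^T) built from an orthonormal basis of A^T.
        Q, _ = np.linalg.qr(A.T)
        return Q @ Q.T

    rng = np.random.default_rng(1)
    # Two row blocks acting on disjoint column sets, hence R(A1^T) and R(A3^T) are orthogonal.
    A1 = np.hstack([rng.standard_normal((3, 4)), np.zeros((3, 6))])
    A3 = np.hstack([np.zeros((3, 6)), rng.standard_normal((3, 4))])

    lhs = proj_rowspace(A1) + proj_rowspace(A3)          # sum of the two projectors
    rhs = proj_rowspace(np.vstack([A1, A3]))             # projector onto the direct sum
    print(np.allclose(lhs, rhs))                         # expected: True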

Therefore, the partitioning defined above is numerically equivalent to partitioning the matrix into the two blocks of rows

    B_1 = \begin{pmatrix} A_1 \\ A_3 \\ A_5 \end{pmatrix}   and   B_2 = \begin{pmatrix} A_2 \\ A_4 \end{pmatrix},

that is, B_1 groups the blocks A_i with i odd and B_2 groups the blocks A_i with i even.

From the observations above, the number of nonzero eigenvalues in the iteration matrix is at most 2 min(m_1, m_2) when using a two-block partition, where m_1 and m_2 are the numbers of rows in B_1 and B_2 respectively. Therefore, varying the number of rows in B_1 and B_2 will also vary the number of nonzero eigenvalues of the iteration matrix.

Without loss of generality, we go back to one of the assumptions made above, m_1 \geq m_2: the interface block B_2 must be at least of size b, which is the minimum required for making the A_i's in B_1 orthogonal.

In the parallel implementation of the block Cimmino method accelerated with Block-CG, the compromise is to find a matrix partitioning that reduces the size of the interface block B_2, which implicitly improves the rate of convergence of the Block-CG acceleration, and at the same time enables a fair distribution of the workload.

Preprocessing strategies

Given the general linear system of equations

    A x = b,

with the matrix A of dimensions m x n, the first preprocessing strategy finds a permutation of the normal equations

    B = P A A^T P^T

such that the matrix B has a block tridiagonal structure. An implementation of the Cuthill-McKee algorithm (see for instance Cuthill and McKee; George; Duff, Erisman and Reid) for ordering symmetric matrices is used. Afterwards, the system of equations

    \bar{A} x = \bar{b}

is solved, with \bar{A} = P A and \bar{b} = P b. From the block tridiagonal structure of the matrix B, the block-row partition is defined by partitioning the blocks of rows in \bar{A} with respect to the diagonal blocks in the block tridiagonal structure of B. Doing this, the row partitioning preserves the advantages of the two-block partitioning described in the previous section. The figure below illustrates the row partitioning of \bar{A} from the diagonal block structure of B.
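A minimal sketch of this first strategy in Python, using SciPy's sparse tools, is given below. It is only an illustration under stated assumptions: SciPy provides the Reverse Cuthill-McKee ordering (not the plain Cuthill-McKee variant also discussed in this chapter), and the function and matrix names are hypothetical, not the thesis implementation.

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.csgraph import reverse_cuthill_mckee

    def preprocess_strategy_one(A):
        """Sketch of the first strategy: permute the rows of A so that
        A A^T (the normal equations) becomes banded / block tridiagonal."""
        A = sp.csr_matrix(A)
        B = (A @ A.T).tocsr()                                  # normal equations A A^T
        perm = reverse_cuthill_mckee(B, symmetric_mode=True)   # bandwidth-reducing ordering
        P = sp.eye(A.shape[0], format="csr")[perm]             # row permutation matrix
        return P @ A, perm                                     # permuted system \bar{A} = P A

    # Usage sketch on a small random sparse matrix (illustrative only):
    A = sp.random(30, 40, density=0.1, random_state=0, format="csr")
    A_bar, perm = preprocess_strategy_one(A)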

Figure: Row partitioning of \bar{A} from the block tridiagonal structure of B.

In the second preprocessing strategy, the matrix A A^T is first normalized:

    B = A A^T,
    D = diag(B),
    \bar{B} = D^{-1/2} B D^{-1/2}.

Afterwards, all the nonzero elements in \bar{B} whose absolute value is under a tolerance value are removed, viz.

    \hat{B} = remove(\bar{B}),

and \hat{B} is permuted using the Cuthill-McKee algorithm,

    \tilde{B} = P \hat{B} P^T.

The permuted system is then solved as before; in this case, the row partitioning for \bar{A} is defined by the block tridiagonal structure of \tilde{B}.
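The normalization and thresholding step of this second strategy can be sketched as follows (illustrative only; the function name and the default tolerance are assumptions, and the reordering step of the previous sketch is reused afterwards):

    import numpy as np
    import scipy.sparse as sp

    def drop_small_normalized_entries(B, tol=0.2):
        """Scale A A^T to unit diagonal and drop entries below tol in absolute value."""
        B = sp.csr_matrix(B)
        d = 1.0 / np.sqrt(B.diagonal())             # D^{-1/2}
        Bn = sp.csr_matrix(sp.diags(d) @ B @ sp.diags(d))   # normalized normal equations
        Bn.data[np.abs(Bn.data) < tol] = 0.0        # remove small entries
        Bn.eliminate_zeros()
        return Bn                                   # then reorder with (reverse) Cuthill-McKee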

The first preprocessing strategy always delivers a matrix \bar{A} with a two-block partitioning. The partitioning obtained with the second strategy is close to a two-block partitioning, and this will become clearer with the partitioning experiments in the next chapter.

Additionally, since some nonzero elements under a tolerance value are removed from \bar{B} in the second preprocessing strategy, the thresholded matrix has a smaller bandwidth than the matrix B. Therefore it has more blocks on the diagonal than B, and consequently the matrix \bar{A} can be partitioned into more blocks of rows.

Clearly, the second preprocessing strategy is a more flexible partitioning strategy.

We perform a last step after using either preprocessing strategy. In this last step, the columns that belong to the same subset of blocks are grouped, to expedite the identification of sections of columns in the block Cimmino solver. The need for this last step will become more evident in the experiments in the next chapter, particularly when comparing the sparsity patterns before and after the column permutations.

In the last preprocessing step, the sparsity pattern of \bar{A} is stored in a sparse matrix C, such that the matrix C has a 1 in all places where the matrix \bar{A} has a nonzero element. Then, the block-row partitioning from the matrix \bar{A} is applied to the matrix C, such that C_i is the i-th block of the matrix C. If l is the number of blocks of rows in the matrices \bar{A} and C, then we use a matrix D of dimension l x n to store the sparsity pattern of each block in C. If there is at least one 1 in the j-th column of the block matrix C_i, then D(i, j) = 1.

After scanning the sparsity pattern of the blocks in \bar{A}, an integer label is computed for each column of the matrix \bar{A} in the following manner:

Figure: Scanning the sparsity pattern of a block of rows C_i to form the row D_i.

    label(j) = \sum_{p=1}^{l} 2^p D(p, j),   for j = 1, ..., n.

After computing the labels, they are sorted in ascending order, and the results from the sort define a column permutation to be applied to the matrix \bar{A}.

Notice that these column permutations group all the columns that belong to the same subset of blocks; the resulting matrix has sections of columns inside the blocks of rows, and these sections of columns are the ones used in the parallel Cimmino implementation described earlier. A small sketch of this labeling is given below.
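The sketch below computes the column labels and the resulting permutation from the block/column incidence matrix D. It is an illustration only; the function name and the tiny incidence matrix are hypothetical.

    import numpy as np

    def column_permutation_from_blocks(D):
        """D is the l x n 0/1 block/column incidence matrix; columns with the
        same label end up contiguous after the returned permutation."""
        l, n = D.shape
        weights = 2 ** np.arange(1, l + 1)          # 2^p for p = 1, ..., l
        labels = weights @ D                        # label(j) = sum_p 2^p D(p, j)
        return np.argsort(labels, kind="stable")    # sorting the labels defines the permutation

    # Usage sketch with a tiny hypothetical incidence matrix (3 blocks, 6 columns):
    D = np.array([[1, 1, 0, 0, 1, 0],
                  [0, 1, 1, 0, 1, 0],
                  [0, 0, 1, 1, 0, 1]])
    perm = column_permutation_from_blocks(D)        # apply as A[:, perm]
    print(perm)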

Chapter

Partitioning experiments

Now we perform some more runs of the parallel block Cimmino implementation using the Block-CG acceleration, and focus on the effects of the preprocessing strategy on the performance of the method. In the experiments reported in the previous chapters, we have used trivial partitionings based on the size of the linear systems and the number of available CEs in the computational platform. In this chapter, all of the previous efforts invested in the parallelism are combined with the preprocessing strategies for solving the linear system of equations, and the advantages of using the output from a preprocessing phase are emphasized.

The experiments in this chapter were run on an SP Thin node.

In the first section, the parallel block Cimmino implementation is used for solving the SHERMAN4 linear system from the Harwell-Boeing matrix collection (Duff, Grimes and Lewis). The symmetric matrix SHERMAN4 comes from the discretization of partial differential equations extracted from an oil reservoir modeling program. The matrix arises from a three-dimensional simulation model on a grid, using a seven-point finite difference approximation with one equation and one unknown per grid block (Simon).

The second section contains the results from runs of the parallel block Cimmino implementation solving the GRENOBLE 1107 problem introduced in the previous chapter.

Solving the SHERMAN4 problem

The matrix SHERMAN4 is a symmetric matrix of order 1104. The spectrum of the matrix was shown in an earlier chapter. The experiments reported in this section were selected from a large set of experiments varying the block size for the Block-CG acceleration. To focus on the effects of the preprocessing strategies, the block size in the Block-CG has been fixed. In these experiments we have used, as a stopping criterion for the block Cimmino iterations, the reduction of the error below a prescribed threshold.

The original pattern of the matrix SHERMAN4 is shown in the figure below, and the results from parallel runs of the block Cimmino implementation solving the original system (i.e. without any data preprocessing) are reported in the results table under the column labeled 'Original matrix A'.

The parallel block Cimmino program can be invoked from the command line with several parameters to specify the computing environment, the problem to be solved, and other values that are meaningful to the parallel solver, for instance the block size for the Block-CG acceleration, the threshold value for the stopping criterion, the block partitioning strategy, the size of the integer and double working arrays, etc.

Figure: Sparsity pattern of the matrix SHERMAN4 (nz = 3786).

The following is an example of a call to the block Cimmino solver from the UNIX command line (note that the only purpose of including this type of example in this chapter is to summarize the different parameters of the parallel solver that were used in a given set of runs; the examples should not be confused with a user guide to the parallel block Cimmino solver):

    command  blkcimmino  hesfile spenv  datafile shermanrua
             partitions ...
             blocksize ...  threshold ...  scheduling ...

As mentioned earlier, the computational environment is specified in a HES file, and in the above example the HES file is spenv. The linear system is stored in the Harwell-Boeing sparse matrix format in the sherman4 data file. In the first experiment, a trivial row block partition of the linear system is performed.

The first argument following the partitions keyword is the number of blocks of rows in the block-row partition of the linear system. In these experiments the blocks have almost the same number of rows; only the last block has a different number of rows. The blocksize keyword is used for the block size of the Block-CG acceleration.

The value of the threshold keyword is used for the stopping criterion. As mentioned in the scheduler chapter, we have implemented three scheduling strategies for the block Cimmino solver. In the results reported below, only the second and third scheduling strategies are used, because the first scheduling strategy is a random scheduling distribution, and from the experiments presented earlier there are no significant advantages in using the first scheduling strategy over the other two.

Recall that the goal of the second scheduling strategy is to balance the workload, while the goal of the third scheduling strategy is to minimize first the communication cost and preserve, as much as possible, a balanced workload.


For the second set of experiments, a trivial column permutation of the original matrix A is performed. The goal of the column permutation is to group the columns that belong to the same blocks. The column labeling described in the previous chapter is used to find these column permutations.

The figure below shows the matrix SHERMAN4 after permuting its columns. The results from these experiments are presented in the results table under the column labeled 'After column permutation'.

Figure: Sparsity pattern of the SHERMAN4 matrix before and after column permutation using the column labeling (nz = 11040). The matrix SHERMAN4 has been partitioned into blocks of rows.

Recall that block Cimmino is numerically insensitive to column permutations. Thus, the reduction in the number of overlaps between the blocks of rows may only improve the parallel execution of the block Cimmino method, because the communication is implicitly reduced.

In the third round of experiments, the first preprocessing strategy from the previous chapter is used. From the block tridiagonal structure of the normal equations of SHERMAN4, the matrix is partitioned into blocks of rows; later, the columns of the resulting matrix are permuted using the column labeling. The sparsity pattern of the matrix SHERMAN4 after the first preprocessing strategy and the column permutations is shown in panels (a) and (b) of the corresponding figure below: panel (a) is the result of applying the column permutations to the matrix SHERMAN4 after the Reverse Cuthill-McKee algorithm is used inside the first preprocessing strategy, and panel (b) is the result of using the Cuthill-McKee algorithm instead in this preprocessing phase.

The partitionings shown in both panels are two-block partitionings. Here we prefer panel (b), because there are five blocks that are independent from the rest, while in panel (a) we have obtained only four. In both cases a great improvement in the parallel execution is expected, because there are independent blocks of rows.

The following is an example of the command line and parameters for running the parallel block Cimmino experiments on the matrix SHERMAN4 after using the first preprocessing strategy with the column permutations:

    command  blkcimmino  hesfile spenv  datafile shermanstgrua
             partitions ...
             threshold ...  scheduling ...

Again, the blocks of rows have nearly the same size; however, the number of column overlaps between the blocks of rows varies, and this implies non-uniform communication between the CEs during a parallel block Cimmino run. Results from the parallel block Cimmino runs are presented in the results table under the column labeled 'Preprocessing strategy I'.

Figure: Matrix SHERMAN4 after applying preprocessing strategy I (nz = 3786).

In the last experiments of this section, the second preprocessing strategy is applied to the matrix SHERMAN4. A figure below shows the sparsity pattern of the normal equations of SHERMAN4 after symmetric permutation: panel (a) shows the result of using the Reverse Cuthill-McKee algorithm, and panel (b) the result of using the Cuthill-McKee algorithm. The normalized nonzero entries of AA^T are plotted in a further figure.

The tolerance value is chosen to be 0.2, and the nonzero entries less than the tolerance value are removed from the matrix AA^T. The rows and columns of the new matrix AA^T are permuted using the Reverse Cuthill-McKee algorithm or the Cuthill-McKee algorithm; a figure below shows the result of permuting the matrix using the Reverse Cuthill-McKee algorithm.

The sparsity pattern of the SHERMAN4 matrix after completion of the second preprocessing strategy is shown afterwards. This matrix is obtained after permuting the rows of the original SHERMAN4 matrix, using either the Reverse Cuthill-McKee algorithm (panel (a)) or the Cuthill-McKee algorithm (panel (b)) inside the second preprocessing strategy, together with the column permutations based on the column labeling.

The following is an example of the command line and parameters for running the parallel block Cimmino experiments on the matrix SHERMAN4 after using the second preprocessing strategy

Figure: Matrix SHERMAN4 after applying preprocessing strategy I and permuting the columns using the column labeling (nz = 3786). In (a) the Reverse Cuthill-McKee algorithm is used inside the preprocessing phase, and in (b) the Cuthill-McKee algorithm is used.

Figure: Block tridiagonal structures of the normal equations of SHERMAN4 (nz = 10346), using the Reverse Cuthill-McKee algorithm in (a) and the Cuthill-McKee algorithm in (b).


Figure: Normalized nonzero values in the normal equations matrix of SHERMAN4.

Figure: Sparsity pattern of the normal equations of SHERMAN4 (nz = 3356) after removing the nonzero entries less than the tolerance value 0.2 and using the Reverse Cuthill-McKee algorithm to permute the matrix to a block tridiagonal form.


Figure: Matrix SHERMAN4 after applying preprocessing strategy II and permuting the columns using the column labeling (nz = 3786). In (a) the Reverse Cuthill-McKee algorithm is used inside the preprocessing phase, and in (b) the Cuthill-McKee algorithm is used.

and performing the column permutations:

    command  blkcimmino  hesfile spenv  datafile shermanstgrua
             partitions ...
             threshold ...  scheduling ...

As shown in the figure above, this partitioning does not lead to a two-block partitioning, but numerically it is not that different, because we have kept the largest entries of AA^T.

After the second preprocessing strategy there is more freedom in the way the permuted matrix is partitioned into blocks of rows, since the Cuthill-McKee ordering or the Reverse Cuthill-McKee ordering is applied to a sparser matrix than the original AA^T. In this section we have chosen only as many blocks as in the other cases, to compare fairly with the results from the other three preprocessing strategies. However, in the next section we present another example in which more blocks are obtained using the second preprocessing strategy.

In the SHERMAN4 problem, the performance of the Cimmino solver is substantially improved by the preprocessing strategies I and II. First of all, the number of iterations has been reduced considerably with the first preprocessing strategy, and almost as much with the second. This shows that the second preprocessing strategy does not differ too far, numerically, from a two-block partitioning. Furthermore, when either preprocessing strategy has been used, the independent blocks of rows reduce the communication required by the original linear system.

Using a trivial partitioning of the SHERMAN4 matrix without preprocessing leads to nearly uniform tasks, and this is the reason for the comparable results of the two scheduling strategies with nearly uniform tasks. However, this is not the case when either of the preprocessing strategies is used.

Table: Results from the parallel block Cimmino implementation with Block-CG acceleration using four different preprocessing strategies ('Original matrix A', 'After column permutation', 'Preprocessing strategy I', 'Preprocessing strategy II') in the solution of the SHERMAN4 problem. For each computing environment (number of CEs) and for the two scheduling strategies of the scheduler chapter, the table reports the time in seconds, the speedup and the efficiency, together with the sequential run times.

When the preprocessing strategies have been used, the resulting blocks differ not only in size but also in the number of column intersections between blocks of rows. Therefore, the workload-balancing scheduling strategy distributes the tasks in a less favorable manner than the communication-minimizing scheduling strategy.

The minimum execution time reported in the table is obtained on several processors using the partitioning strategy I; it represents a substantial improvement over the execution time of a sequential block Cimmino run solving the original matrix SHERMAN4.

Solving the GRENOBLE 1107 problem

Now we perform experiments similar to those in the previous section with the GRENOBLE 1107 problem. We study the effects of varying the tolerance value in the second strategy, and also the number of block partitions.

In the results table below, the results from runs of the parallel block Cimmino implementation are presented. The results in the column labeled 'Original matrix A' come from experiments using the block-row partitioning shown in panel (a) of the first figure below, and the results in the second column, labeled 'After column permutation', are from experiments using the block partitioning shown in panel (b).

The following is an example of the call to the block Cimmino solver that was used in the set of experiments that generated the first two columns of the table:

    command  blkcimmino  hesfile spenv  datafile grerua
             partitions ...
             blocksize ...  threshold ...  scheduling ...

Applying the first preprocessing strategy to the GRENOBLE 1107 problem leads to a matrix with the sparsity pattern shown in panel (a) of the corresponding figure; later, the columns of this matrix are permuted using the column labeling. As expected, the output of the first preprocessing strategy leads to a two-block partitioning, illustrated in panel (b).

In this case, an example of a call to the block Cimmino solver is

    command  blkcimmino  hesfile spenv  datafile grestgrua
             partitions ...  blocksize ...
             threshold ...  scheduling ...

Results from these experiments are reported in the results table under the column labeled 'Preprocessing strategy I'.

In the first and second preprocessing strategies, the normal equations of GRENOBLE 1107 are computed, and the sparsity pattern of the matrix AA^T is shown in panels (a) and (b) of the corresponding figure below. The normalized nonzero entries are plotted in a further figure.

Two tolerance values are selected for testing the second preprocessing strategy. First, 0.2 is used, and about half of the nonzero entries are removed from AA^T; the resulting matrix is reordered with the Cuthill-McKee algorithm.

Figure: (a) GRENOBLE 1107 original sparsity pattern (nz = 5664), and (b) the same matrix after grouping the columns according to the block partitions.

Figure: (a) GRENOBLE 1107 sparsity pattern after the first preprocessing strategy, using Cuthill-McKee for reordering the normal equations, and (b) the same preprocessed matrix after column permutations (nz = 5664).

Figure: Sparsity pattern of the normal equations of GRENOBLE 1107 (nz = 22427). In (a) the Reverse Cuthill-McKee ordering is used, and in (b) the Cuthill-McKee ordering is used.

Figure: Normalized nonzero values in the normal equations matrix of GRENOBLE 1107.

Panel (a) of the next figure shows the sparsity pattern of AA^T after removing the nonzero entries less than 0.2 and permuting the matrix to a block tridiagonal form; panel (b) shows the result of using the second preprocessing strategy. The truncated matrix AA^T in panel (a) has a smaller bandwidth than the normal equations from the original linear system shown in the earlier figure, and the blocks of rows in panel (b) have fewer column overlaps than those in the earlier partitionings.

Figure: (a) GRENOBLE 1107 sparsity pattern after the second preprocessing strategy, using Cuthill-McKee for reordering the normal equations and a tolerance value of 0.2 (nz = 6663 in the truncated normal equations), and (b) the same preprocessed matrix after column permutations (nz = 5664).

Choosing a higher tolerance value, for instance 0.8, will lead to an almost diagonal matrix (panel (a) of the next figure), and the output matrix from the second preprocessing strategy (panel (b)) is almost the same matrix as that shown in the earlier figure. Therefore, removing a large number of nonzero entries from the AA^T matrix will reduce the effects of the second preprocessing strategy.

In the results tables, the experiments with the tolerance value of 0.2 are presented under the columns labeled 'Preprocessing strategy II'. The following is an example of the call to the block Cimmino solver for these experiments:

    command  blkcimmino  hesfile spenv  datafile grestgrua
             partitions ...  blocksize ...
             threshold ...  scheduling ...

One of the advantages of using the second preprocessing strategy over the first one is the freedom in partitioning the blocks of rows. Thus, we repeat the last experiments using the second preprocessing strategy with a tolerance value of 0.2, with the system partitioned into a larger number of blocks of rows. A figure below illustrates the sparsity pattern of the resulting matrix. An example of the calls to the block Cimmino solver used in these experiments is

    command  blkcimmino  hesfile spenv  datafile grestgbrua
             partitions ...
             blocksize ...  threshold ...  scheduling ...

Figure: (a) GRENOBLE 1107 sparsity pattern after the second preprocessing strategy, using Cuthill-McKee for reordering the normal equations and a tolerance value of 0.8 (nz = 1107 in the truncated normal equations), and (b) the same preprocessed matrix after column permutations (nz = 5664).

There is a reduction in the number of column overlaps between the blocks of rows in this last figure compared with those in the earlier figures. Furthermore, the number of rows in each block is no longer uniform, as it was in the trivial partitionings shown earlier.

The GRENOBLE 1107 problem is interesting because the preprocessing strategy does not change the behavior of the block Cimmino solver. That is to say, the distribution of eigenvalues inside the blocks of rows does not change after applying any of the preprocessing strategies. Thus the algorithm takes exactly the same number of steps to converge in all of the experiments.

The results in the two tables below show that using the second preprocessing strategy with a few more blocks will improve the parallel performance of the solver, because the number of column overlaps between the blocks is reduced, and with more blocks, more tasks can be generated and more CEs can be used.

The maximum speedups in these cases are obtained with preprocessing strategy I and with preprocessing strategy II with the larger number of blocks, using the third scheduling strategy. The runs with the first preprocessing strategy reported almost the same parallel behavior, in terms of efficiencies, as the second preprocessing strategy with the smaller number of blocks. Thus the use of the second preprocessing strategy is sensitive to the number of block partitions.

Figure: Sparsity pattern of GRENOBLE 1107 after preprocessing strategy II with a tolerance value of 0.2; in this case a larger number of blocks of rows is defined.


Table: Results from the parallel block Cimmino with Block-CG acceleration using three different preprocessing strategies ('Original matrix A', 'After column permutation', 'Preprocessing strategy I') on the solution of the GRENOBLE 1107 problem. For each computing environment (number of CEs) and for the two scheduling strategies of the scheduler chapter, the table reports the time in seconds, the speedup and the efficiency, together with the sequential run times.


Table: Results from the parallel block Cimmino with Block-CG acceleration, varying the number of blocks of rows in preprocessing strategy II, on the solution of the GRENOBLE 1107 problem. For each computing environment (number of CEs) and for the two scheduling strategies of the scheduler chapter, the table reports the time in seconds, the speedup and the efficiency, together with the sequential run times.


Remarks

From experiments in Ruiz, the use of the augmented system approach is not very efficient when the blocks have a small number of rows compared to their number of columns. Thus the first preprocessing strategy is preferred, because it delivers a two-block partitioning with less computational effort than the second preprocessing strategy.

Alternatively, if for some linear systems partitionings with small blocks must be preferred, then the normal equations approach of Bramley, and Bramley and Sameh, should be used inside the block Cimmino solver, because it performs better in terms of efficiency. In these cases the solver should be combined with the second preprocessing strategy, to have more flexibility in the number and sizes of the blocks of rows.

The above experiments suggest that preprocessing strategies can greatly improve the performance of the parallel Cimmino solver. Furthermore, the advantages of using either preprocessing strategy will depend on the problem being solved and the computing environment.

If a large number of CEs is available and they are connected through a fast interconnection network, then the second preprocessing strategy can be used to generate as many blocks of rows as the number of CEs in the system. Nevertheless, a compromise must be found between the size of the tasks that are generated from the block-row partitions, the efficiency of the CEs, and the method used for computing the normal equations inside the block Cimmino iterations.

On the other hand, if the computational system has only a few CEs, or if there are possible bottlenecks in the communication between the CEs (for instance the bus in an Ethernet network), then the first preprocessing strategy should be used, because it produces a two-block partitioning of the original system, which implicitly reduces the communication cost and delivers bigger blocks of rows.


Chapter

Conclusions

Currently, there is much active work in developing Krylov based iterative methods for the solution of linear systems and eigenvalue problems. At the same time, there is a great demand for accelerating the performance of these methods on the solution of large systems of equations while using the computational resources efficiently.

Various authors have studied different techniques to improve the performance of parallel implementations of conjugate gradient type methods. Parallel implementations of Classical CG have attempted to reduce the number of synchronization points embedded in the Classical CG algorithm and in this way minimize its execution time. However, these efforts have led either to a loss of robustness of Classical CG or to an increase in its complexity, and in less favorable implementations both drawbacks are found.

In general, increasing the complexity of any algorithm pays off when the more complex algorithm either extends the usefulness of the original or improves its computational rate. In both cases, the robustness and correctness of the original algorithm must be preserved. Furthermore, for computational purposes, if there is an increase in the storage requirements, then it should be linear in the size of the system of equations.

We have studied a stabilized version of the Block-CG algorithm, which is a generalization of the CG method. In exact arithmetic, Block-CG reduces the number of iterations of Classical CG by a factor of its block size. In practice, this block size is chosen to be much smaller than the size of the linear system. In the stabilized Block-CG algorithm, there is an increase in the storage requirements over the Classical CG algorithm. This increase is only a factor of the block size times the size of the linear system.

The computational rate of the stabilized Block-CG implementation increases as the block size is increased, and we show this effect in Chapter. Some results from experiments are shown in Figure. In the figure, results from runs of Classical CG are reported with a block size of one. The increase in the complexity of the algorithm is caused by the use of rectangular matrices of size n × s instead of vectors of size n, and by the extra orthonormalizations needed to preserve the stability of the algorithm.
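To make the structure of these block recurrences concrete, the following is a minimal sketch of a Block-CG iteration in Python/NumPy. It follows the classical Block-CG recurrences, with a QR re-orthonormalization of the direction block standing in for the stabilization discussed above; it is an illustrative sketch under these assumptions, not the exact stabilized variant studied in this work.

    import numpy as np

    def block_cg(A, B, tol=1e-8, maxit=500):
        # Minimal Block-CG sketch for a symmetric positive definite A and a
        # block of s right-hand sides B of size n x s.  Rank deficiencies in
        # the block residual are ignored here for simplicity.
        n, s = B.shape
        X = np.zeros((n, s))
        R = B - A @ X                      # block residual, n x s
        P, _ = np.linalg.qr(R)             # orthonormalized block of directions
        bnorm = np.linalg.norm(B)
        for _ in range(maxit):
            AP = A @ P                     # matrix times a block of s vectors
            alpha = np.linalg.solve(P.T @ AP, P.T @ R)   # small s x s system
            X += P @ alpha
            R -= AP @ alpha
            if np.linalg.norm(R) <= tol * bnorm:
                break
            beta = -np.linalg.solve(P.T @ AP, AP.T @ R)  # A-conjugation coefficients
            P, _ = np.linalg.qr(R + P @ beta)            # re-orthonormalize new directions
        return X

With s = 1, this sketch reproduces, in exact arithmetic, the iterates of Classical CG.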

Additionally, in the Block-CG algorithm the matrix-vector operations of Classical CG are replaced by matrix-matrix operations. Thus, Level 3 BLAS routines are used in Block-CG implementations instead of the Level 2 BLAS routines used by Classical CG. The use of the Level 3 BLAS routines increases the Mflop rates, as shown in Figure, due to better data locality.
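The data-locality argument can be illustrated with a small, rough comparison (not a benchmark, and using a dense matrix only for simplicity): grouping s vectors into one matrix-matrix product lets the matrix be streamed once per block instead of once per vector.

    import time
    import numpy as np

    n, s = 4000, 16
    A = np.random.rand(n, n)
    V = np.random.rand(n, s)

    t0 = time.perf_counter()
    W_gemm = A @ V                                             # one matrix-matrix product
    t1 = time.perf_counter()
    W_gemv = np.column_stack([A @ V[:, j] for j in range(s)])  # s matrix-vector products
    t2 = time.perf_counter()

    print("matrix-matrix: %.3fs, looped matrix-vector: %.3fs" % (t1 - t0, t2 - t1))
    assert np.allclose(W_gemm, W_gemv)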


[Figure: sequential Block-CG performance; Mflops plotted against block size for the SHERMAN4, LANPRO(NOS3), and LANPRO(NOS2) matrices. Caption: plot of Mflops against block size from runs of a stabilized Block-CG implementation on an IBM RS workstation.]

Several authors have studied the parallelization of the Classical CG, of the Block-CG, and of preconditioned versions of these algorithms. We have shown in Chapter that the Block-CG is more suitable for parallel computing environments than the Classical CG because of its granularity and the implicit reduction in the number of synchronizations. From the results of the three parallel implementations of the Block-CG, we have concluded that the trivial parallelization of the HP products is not as efficient as parallelizing the entire algorithm.

Basically, the computational weight of the HP products decreases as the block size is increased. This effect, summarized in Figure, motivates our development of the other two parallel implementations of the Block-CG, in which most of the Block-CG operations are performed in parallel.

The Block-CG works well as an acceleration technique inside basic iterative methods for the solution of general linear systems. We have studied a reliable implementation of the block Cimmino iteration with Block-CG acceleration. In this case, we have increased the complexity of the stabilized Block-CG algorithm to be able to solve a larger set of linear systems, and we have improved its overall solution time using a parallel implementation.
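As a reminder of the underlying idea, here is a minimal serial sketch of one block Cimmino sweep; blocks is an assumed list of (A_i, b_i) row-block pairs, and numpy.linalg.pinv stands in for the augmented-system or normal-equations solve of the actual implementation.

    import numpy as np

    def cimmino_residual(blocks, x):
        # Sum of the projections of the current block residuals: returns
        # sum_i A_i^+ (b_i - A_i x), i.e. the residual of the fixed-point
        # system H x = sum_i A_i^+ b_i with H = sum_i A_i^+ A_i.
        delta = np.zeros_like(x)
        for A_i, b_i in blocks:
            delta += np.linalg.pinv(A_i) @ (b_i - A_i @ x)
        return delta

Since H is a sum of orthogonal projectors, it is symmetric positive semidefinite (definite when A has full column rank), so this residual is exactly what a CG or Block-CG acceleration needs at each step; this is why the two methods combine so naturally.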

The performance of the parallel implementation of the block Cimmino method with Block-CG acceleration is more sensitive to the strategy used for partitioning the system into blocks of rows, and to the computational environment, than the Block-CG algorithm is to column partitioning. We have presented two examples of preprocessing strategies to find natural partitionings of the linear system of equations.

The first preprocessing strategy provides a two-block partitioning of the original system of equations. A two-block partitioning of the linear system improves the performance of the parallel implementation by reducing the communication between blocks of rows. Additionally, a two-block partitioning, as presented in Section, can accelerate the rate of convergence of block Cimmino.

[Figure: computational weight of the matrix product in Block-CG; percentage of sparse matrix-vector multiplications plotted against block size for (a) LANPRO(NOS2), (b) LANPRO(NOS3), and (c) SHERMAN4. Caption: ratio of floating point operations from the HP products over the total number of floating point operations in a Classical CG or Block-CG iteration.]

However, a two-block partitioning may deliver blocks of rows with extremely nonuniform workload, which may lead to unbalanced workload distributions and thus affects the efficiency of the parallel implementation.

The second preprocessing strategy can be used to derive a more flexible partitioning than the two-block one. Generally, this strategy delivers more blocks than the first preprocessing strategy; thus, more CEs can be used during a parallel run of the block Cimmino implementation. The second preprocessing strategy may yield very small blocks of rows and, as stated in Chapter, the augmented system approach on small blocks with fewer rows than columns is not very efficient. Therefore, it is recommended to regroup contiguous blocks of rows into bigger blocks.
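For illustration, a minimal sketch of such a regrouping step is given below; row_blocks and min_rows are assumed names, and contiguity of the input blocks is taken for granted.

    def regroup_blocks(row_blocks, min_rows):
        # Merge contiguous blocks of rows until each resulting block has at
        # least min_rows rows; leftover rows are attached to the last block.
        merged, current = [], []
        for block in row_blocks:
            current.extend(block)
            if len(current) >= min_rows:
                merged.append(current)
                current = []
        if current:
            if merged:
                merged[-1].extend(current)
            else:
                merged.append(current)
        return merged

    # Example with hypothetical row indices: blocks of sizes 2, 1, 3, 1 become
    # two blocks of at least 3 rows each.
    # regroup_blocks([[0, 1], [2], [3, 4, 5], [6]], 3)  ->  [[0, 1, 2], [3, 4, 5, 6]]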

Inside the block Cimmino implementation, we have used the Master-Slave distributed implementation of the Block-CG. In this case, the granularity of the tasks generated by the Block-CG iteration has increased, and the performance of the parallel block Cimmino algorithm is sensitive to the distribution of tasks among the available CEs.

We have studied different scheduling strategies to distribute tasks among heterogeneous CEs. Basically, in many iterative methods the number and sizes of the tasks remain almost constant from one iteration to the other. Furthermore, in several implementations these tasks create data structures that are local to one CE and expensive to move around. For these reasons, we have preferred a static scheduler over a dynamic one.

Although we have designed a scheduler for heterogeneous environments, the homogeneous cases can be regarded as instances of these. In heterogeneous environments, for each CE, its theoretical speed and the number of available processors are used as parameters to the scheduler. After partitioning the system of equations into blocks of rows, heuristics about the computational effort required on a block of rows can be derived. In our implementation, we use for each block of rows the number of column overlaps with other blocks, the size of the augmented system, and the number of nonzero elements as parameters to the scheduler.


Both sets of heuristics, from each CE and from each block of rows, have led us to implement a static scheduler of the greedy subclass. After isolating some numerical issues that accelerate the rate of convergence of the block Cimmino solver, we have shown an improvement in the performance of the parallel block Cimmino solver due to a proper selection of the parameters passed to the scheduler and of the scheduling strategy used.
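A minimal sketch of a greedy static assignment of this kind is shown below. It only captures the load-balancing flavour of such a scheduler: block_costs and ce_speeds are assumed inputs, where each block cost would be estimated from the heuristics above (column overlaps, augmented system size, number of nonzeros) and each speed is a relative theoretical rate per CE. It is not a reproduction of the STG strategies of Chapter.

    import heapq

    def greedy_static_schedule(block_costs, ce_speeds):
        # Assign blocks of rows, largest estimated cost first, to the compute
        # element (CE) whose predicted completion time is currently smallest.
        heap = [(0.0, ce) for ce in range(len(ce_speeds))]   # (predicted time, CE)
        heapq.heapify(heap)
        assignment = {ce: [] for ce in range(len(ce_speeds))}
        for blk in sorted(range(len(block_costs)), key=lambda i: -block_costs[i]):
            load, ce = heapq.heappop(heap)                   # least loaded CE so far
            assignment[ce].append(blk)
            heapq.heappush(heap, (load + block_costs[blk] / ce_speeds[ce], ce))
        return assignment

    # Example with hypothetical costs and speeds: six row blocks on three CEs,
    # the first CE being twice as fast as the others.
    # greedy_static_schedule([5.0, 3.0, 2.0, 2.0, 1.0, 1.0], [2.0, 1.0, 1.0])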

For instance, Figure illustrates an efficiency surface obtained with various runs of the block Cimmino with Block-CG acceleration, in which the block size and the number of processors are varied while the number of equivalent iterations is kept constant. In the experiments of Figure, the first preprocessing strategy and the third scheduling strategy were used. Using a different scheduling or preprocessing strategy leads to totally different efficiency results. In Figure, the experiments in which a block size of one was used in the Block-CG acceleration reported the lowest efficiencies. In these cases, the efficiency decreases as the number of CEs is increased. For larger block sizes, the efficiency surface is smoother, even when the number of CEs is increased.

[Figure: parallel efficiency of the block Cimmino; efficiency surface plotted against block size and number of CEs. Caption: efficiency surface from parallel runs of the block Cimmino on the solution of the SHERMAN problem.]

The parallel strategies developed in the implementation of the block Cimmino with Block-CG acceleration can easily be accommodated to implementations of other iterative schemes. The static greedy based scheduler can be tuned to accept other input parameters from the computational environment and the linear system being solved. The strategies used in the manipulation of distributed operations, like the reduce, are fully reusable in other parallel programs.

The modular structure of the parallel block Cimmino solver allows the reusability of some of the modules inside other iterative schemes. For instance, the module with Block-CG acceleration can be reused inside the block Kaczmarz iteration, with a new module for computing the Kaczmarz projections. Furthermore, an iterative scheme has been proposed recently by Arioli and Ruiz, which the authors call Block-CGSI. The Block-CGSI algorithm combines the stabilized Block-CG with the theory of subspace iteration to overcome some of the breakdowns of the Block-CG due to ill-conditioned systems and round-off errors. Block-CGSI increases the complexity of the original Block-CG algorithm in order to extend its applicability to general linear systems of equations.

Moreover, any of the parallel Block-CG implementations presented in Chapter can be reused in an implementation of the new iterative scheme. Based on the experiments reported in Chapter, the Master-Slave distributed or the All-to-All implementations should be just as efficient for the new iterative scheme.

The static scheduler from Chapter can also be reused for distributing the tasks among the CEs. Similarly, the software modules used in the block Cimmino and Block-CG implementations to perform the reduce operations, to gather information from the computational neighborhoods, and to handle scattered data structures in parallel distributed environments can also be reused in a parallel implementation of Block-CGSI.

Hence, we fully expect that Block-CGSI, as well as other block iterative schemes, can show superior performance when implemented using the parallel concepts introduced in this work.


Bibliography

AEA Technology, HARWELL SUBROUTINE LIBRARY, Didcot, England. A catalogue of subroutines, Release.

G. Amdahl, Validity of the single processor approach to achieving large scale computer capabilities, in AFIPS Conference Proceedings, vol.

E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen, LAPACK Users' Guide, SIAM, Philadelphia, USA, second ed.

M. Arioli and D. Ruiz, Block Conjugate Gradient with Subspace Iteration for Solving Linear Systems, Tech. Rep. RT/APO, ENSEEIHT-IRIT, Toulouse, France. To appear in the proceedings of the Second IMACS International Symposium on Iterative Methods in Linear Algebra, Bulgaria, June.

M. Arioli, J. W. Demmel, and I. S. Duff, Solving Sparse Linear Systems with Sparse Backward Error, SIAM J. Matrix Anal. and Applics.

M. Arioli, A. Drummond, I. S. Duff, and D. Ruiz, A Parallel Scheduler for Block Iterative Solvers in Heterogeneous Computing Environments, in Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, Philadelphia, USA, SIAM.

M. Arioli, I. S. Duff, and D. Ruiz, Stopping Criteria for Iterative Solvers, SIAM J. Matrix Anal. and Applics. Also Technical Report TR/PA, CERFACS, France.

M. Arioli, I. S. Duff, J. Noailles, and D. Ruiz, Block Cimmino and Block SSOR Algorithms for Solving Linear Systems in a Parallel Environment, Tech. Rep. TR, CERFACS, Toulouse, France.

M. Arioli, I. S. Duff, J. Noailles, and D. Ruiz, A Block Projection Method for Sparse Matrices, SIAM J. Sci. Stat. Comput.

R. Barrett, M. Berry, T. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM.

R. H. Bartels, G. H. Golub, and M. A. Saunders, Numerical techniques in mathematical programming, in Nonlinear Programming, J. B. Rosen, O. L. Mangasarian, and K. Ritter, eds., Academic Press, New York.


A. Beguelin, J. Dongarra, A. Geist, R. Manchek, and V. Sunderam, A Users' Guide to PVM Parallel Virtual Machine, Tech. Rep. ORNL/TM, Oak Ridge National Laboratory, Tennessee.

M. Benzi, F. Sgallari, and G. Spaletta, A parallel block projection method of the Cimmino type for finite Markov chains, in Computations with Markov Chains: Proceedings of the 2nd International Workshop on the Numerical Solution of Markov Chains, W. J. Stewart, ed., Boston, Kluwer Academic Publishers.

A. Björck and G. H. Golub, Numerical methods for computing angles between linear subspaces, Math. Comp.

S. H. Bokhari, Dual processor scheduling with dynamic reassignment, IEEE Trans. Software Eng., SE.

S. H. Bokhari (a), On the mapping problem, IEEE Trans. on Comp., C.

S. H. Bokhari (b), A shortest tree algorithm for optimal assignments across space and time in a distributed processor system, IEEE Trans. Software Eng., SE.

R. Bramley and A. Sameh, Row projection methods for large nonsymmetric linear systems, Tech. Rep., Center for Supercomputing Research and Development, University of Illinois, Urbana, IL. Also SISSC, Vol., January.

R. Bramley, Row projection methods for linear systems, PhD Thesis, Center for Supercomputing Research and Development, University of Illinois, Urbana, IL.

R. Butler and E. Lusk, User's Guide to the p4 Parallel Programming System, Mathematics and Computer Science Division, Argonne National Laboratory.

S. L. Campbell and C. D. Meyer, Generalized Inverses of Linear Transformations, Pitman, London.

T. L. Casavant and J. G. Kuhl, A Taxonomy of Scheduling in General-Purpose Distributed Computing Systems, IEEE Transactions on Software Engineering.

P. Chrétienne, Task scheduling over distributed memory machines, in Parallel and Distributed Algorithms, M. Cosnard, Y. Robert, P. Quinton, and M. Raynal, eds., Amsterdam, Elsevier Science.

A. T. Chronopoulos and C. W. Gear, s-Step iterative methods for symmetric linear systems, in Supercomputing, Los Alamos, CA, IEEE Computer Society Press.

A. T. Chronopoulos, Towards efficient parallel implementation of the CG method applied to a class of block tridiagonal linear systems, Computer and Applied Mathematics.

G. Cimmino, Calcolo approssimato per le soluzioni dei sistemi di equazioni lineari, in Ricerca Sci. II, vol. I.

E. Cuthill and J. McKee, Reducing the bandwidth of sparse symmetric matrices, in Proceedings of the th National Conference of the Association for Computing Machinery, New Jersey, Brandon Press.


E. D'Azevedo, V. Eijkhout, and C. Romine, Reducing Communication Costs in the Conjugate Gradient Algorithm on Distributed Memory Multiprocessors, Tech. Rep. CS, University of Tennessee, Knoxville, Tennessee. Also LAPACK Working Note.

J. W. Demmel, M. T. Heath, and H. A. van der Vorst, Parallel Numerical Linear Algebra, Acta Numerica.

Dror, Cost Allocation: The Traveling Salesman, Bin Packing, and the Knapsack, Applied Mathematics and Computation.

A. Drummond, I. S. Duff, and D. Ruiz, A Parallel Distributed Implementation of the Block Conjugate Gradient Algorithm, Tech. Rep. TR/PA, CERFACS, Toulouse, France.

I. S. Duff and J. K. Reid, The multifrontal solution of indefinite sparse linear systems, ACM Trans. Math. Softw.

I. S. Duff, A. M. Erisman, and J. K. Reid, Direct Methods for Sparse Matrices, Oxford University Press, New York, USA.

I. S. Duff, R. G. Grimes, and J. G. Lewis, Users' Guide for the Harwell-Boeing Sparse Matrix Collection, Rutherford Appleton Laboratory, Chilton, Didcot, Oxon OX11 0QX, RAL.

K. Efe, Heuristic models of task assignment scheduling in distributed systems, Computer.

T. Elfving, Block-iterative methods for consistent and inconsistent linear equations, Numer. Math.

A. Gabrielian and D. B. Tyler, Optimal object allocation in distributed computer systems, in Proceedings from the th International Conference on Distributed Computing Systems.

C. Gao, J. W. S. Liu, and M. Railey, Load balancing algorithms in homogeneous distributed systems, in Proceedings from the International Conference on Parallel Processing.

M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, San Francisco.

A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. Sunderam, PVM: Parallel Virtual Machine. A Users' Guide and Tutorial for Network Parallel Computing, MIT Press, Cambridge, Massachusetts.

A. George, Computer implementation of the finite element method, PhD thesis, Department of Computer Science, Stanford University, Stanford, California. Report STAN-CS.

G. H. Golub and W. Kahan, Calculating the singular values and pseudo-inverse of a matrix, SIAM J. Numer. Anal.


G. H. Golub and C. F. Van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, MD, second ed.

D. R. Greening, Parallel simulated annealing techniques, Physica D: Nonlinear Phenomena.

G. D. Hachtel, Extended applications of the sparse tableau approach: finite elements and least squares, in Basic Questions of Design Theory, W. R. Spillers, ed., North Holland, Amsterdam.

L. A. Hageman and D. M. Young, Applied Iterative Methods, Academic Press, London.

A. Heddaya and K. Park, Mapping parallel iterative algorithms onto workstation networks, in Proceedings from the 3rd International Symposium on High Performance Computing.

M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Res. Natl. Bur. Stand.

J. H. Holland, Adaptation in Natural and Artificial Systems, Ann Arbor, University of Michigan Press.

D. S. Johnson, C. H. Papadimitriou, and M. Yannakakis, How easy is local search?, in Proceedings from the Annual Symposium on Foundations of Computer Science.

S. Kaczmarz, Angenäherte Auflösung von Systemen linearer Gleichungen, Bull. Intern. Acad. Polonaise Sci. Lettres, Cracovie, Classe Sci. Math. Natur., Série A, Sci. Math.

C. Kamath and A. Sameh, A projection method for solving nonsymmetric linear systems on multiprocessors, Parallel Computing.

S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, Optimization by simulated annealing, Science.

L. Kleinrock and A. Nilsson, On optimal scheduling algorithms for time-shared systems, ACM.

L. Kleinrock, Queueing Systems, vol.: Computer Applications, Wiley, New York.

C. Lanczos, Solution of systems of linear equations by minimized iterations, J. Res. Natl. Bur. Stand.

V. M. Lo, Algorithms for static task assignment and symmetric contraction in distributed systems, in Proceedings of the th International Conference on Parallel Processing, The Pennsylvania State University Press.

P. R. Ma, E. Y. Lee, and M. Tsuchiya, A task allocation model for distributed computing systems, IEEE Trans. on Comp., C.

E. W. Mayr, Parallel Approximation Algorithms, Technical Report STAN-CS-TR, Stanford University, Department of Computer Science.

G. Meurant, Multitasking the Conjugate Gradient method on the CRAY X-MP, Parallel Computing.


T. Mizuike, Y. Ito, D. Kennedy, and L. Nguyen, Burst Scheduling Algorithms for SS/TDMA Systems, IEEE Trans. on Commun., COM.

J. Modersitzki, Conjugate Gradient Type Methods for Solving Symmetric Indefinite Linear Systems, Tech. Rep., University of Utrecht, Netherlands.

A. A. Nikishin and A. Y. Yeremin, Variable Block CG algorithms for solving large sparse symmetric positive definite linear systems on parallel computers, I: General iterative scheme, SIAM J. Matrix Anal. Appl.

W. Oettli and W. Prager, Compatibility of approximate solution of linear equations with given error bounds for coefficients and right-hand sides, Numer. Math.

D. P. O'Leary, The Block Conjugate Gradient algorithm and related methods, Linear Algebra Appl.

D. P. O'Leary, Iterative methods for finding the stationary vector for Markov chains, IMA Volume on Linear Algebra, Markov Chains, and Queueing Models.

C. C. Paige and M. A. Saunders, Solution of sparse indefinite systems of linear equations, SIAM J. Numer. Anal.

G. N. S. Prasanna and B. R. Musicus, Generalised Multiprocessor Scheduling Using Optimal Control, ACM.

D. Ruiz, Solution of large sparse unsymmetric linear systems with a block iterative method in a multiprocessor environment, PhD thesis, CERFACS, Toulouse, France.

Y. Saad, Practical use of polynomial preconditionings for the Conjugate Gradient method, SIAM J. Scientific and Statistical Computing.

Y. Saad, Krylov subspace methods on supercomputers, Tech. Rep., RIACS, Moffett Field, CA.

Y. Saad, Krylov subspace methods on supercomputers, SIAM J. Scientific and Statistical Computing.

C. Shen and W. Tsai, A graph matching approach to optimal task assignment in distributed computing systems using a minimax criterion, IEEE Trans. Comput., C.

H. D. Simon, The Lanczos Algorithm with Partial Reorthogonalization, Mathematics of Computation.

H. D. Simon, Incomplete LU preconditioned conjugate-gradient-like methods in reservoir simulation, in Proceedings of the Eighth SPE Symposium on Reservoir Simulation.

J. Stoer and R. Bulirsch, Introduction to Numerical Analysis, Springer-Verlag, Berlin.

H. S. Stone, Critical load factors in two-processor distributed systems, IEEE Trans. Software Eng., SE.


G. Strang, Introduction to Applied Mathematics, Wellesley-Cambridge Press, Cambridge, MA.

E. G. Talbi and T. Muntean, General heuristics for the mapping problem, in World Transputer Congress, Aachen, Germany, IOS Press.

H. A. van der Vorst, Lecture notes on iterative methods, Mathematical Institute, University of Utrecht, The Netherlands.

O. C. Zienkiewicz and R. L. Taylor, The Finite Element Method, vol.: Basic Formulation and Linear Problems; Solid and Fluid Mechanics, Dynamics and Non-Linearity, McGraw-Hill International Editions, fourth ed.