Solution of general linear systems of equations using blo ck Krylov
based iterative metho ds on distributed computing environments
LeroyAnthony Drummond Lewis
Decemb er
i
R esolution de syst emes lin eaires creux par des m etho des it eratives par
blo cs dans des environnements distribu es h et erog enes
R esum e
Nous etudions l implantation de m etho des it eratives par blo cs dans des environnements mul
tipro cesseur am emoire distribu ee p our la r esolution de syst emes lin eaires quelconques Dans
un premier temps nous nous int eressons al etude du p otentiel de la m etho de du gradient con
jugu e classique en environnement parall ele Dans un deuxi eme temps nous etudions une autre
m etho de it erativedelam eme famille que celle du gradient conjugu e qui a et e con cue p our
r esoudre simultan ementdessyst emes lin eaires avec de multiples seconds membres et commun e
mentr ef erenc ee sous le nom de gradient conjugu e par blo cs
La complexit e algorithmique de la m etho de du gradient conjugu e par blo cs est sup erieure a celle
de la m etho de du gradientconjugu e classique car elle demande plus de calculs par it eration
et n ecessite en outre plus de m emoire Malgr e cela la m etho de du gradient conjugu e par blo cs
appara t comme etant mieux adapt ee aux environnements vectoriels et parall eles
Nous avons etudi e trois variantes du gradient conjugu e par blo cs bas ees sur di erents mo d eles de
programmation parall ele et leur e cacit ea et e compar ee sur divers environnements distribu es
Pour la r esolution des syst emes lin eaires non sym etriques nous consid erons l utilisation de m e
thodes it eratives de pro jection par lignes acc el er ees par la m etho de du gradient conjugu e par
blo cs Les di erentes versions de gradient conjugu e par blo cs pr ec edemment etudi ees sont util
is ees en vue de cette acc el eration En particulier nous etudions l implantation dans des environ
nements distribu es de la m etho de de Cimmino par blo cs acc el er ee par la m etho de du gradient
conjugu e par blo cs La combinaison de ces deux techniques pr esente en e et un b on p otentiel
de parall elisme
Pour une b onne p erformance de l implantation de cette derni ere m etho de dans des environ
nements distribu es h et erog enes nous avons etudi e di erentes strat egies de r epartition des t aches
aux divers pro cesseurs et nous comparons deux s equencements statiques r ealisant cette r epar
tition Le premier a p our ob jectif de maintenir l equilibre des charges et le second a p our
but de r eduire en premier les communications entre les di erents pro cesseurs tout en essayant
d equilibrer aux mieux la charge des pro cesseurs
Finalement nous etudions des strat egies de pr etraitement des syst emes lin eaires p our am eliorer
la p erformance de la m etho de it erative bas ee sur la m etho de de Cimmino
Mots clefs
La m etho de de Cimmino gradient conjugu e calcul distribu e calcul h et erog ene m etho des it era
tives calcul parall ele pr etraitement des syst emes lin eaires s equencement syst emes lin eaires
creux et sym etriques matrices creuses syst emes lin eaires creux non sym etriques
ii
Solution of general linear systems of equations using blo ck Krylov based
iterative metho ds on distributed computing environments
Abstract
We study the implementation of blo ck Krylov based iterative metho ds on distributed computing
environments for the solution of general linear systems of equations First we study p otential
implementation of the classical conjugate gradient CG metho d on parallel environments From
the family of conjugate direction metho ds we study the Blo ck Conjugate Gradient Block CG
metho d which is based on the classical CG metho d The Blo ck CG works on a blo ckofs right
hand side vectors instead of a single one as is the case of the Classical CG and we study the
implementation of the Blo ck CG on distributed environments
The complexity of the Blo ck CG metho d is higher than the complexity of the classical CG in
terms of computations and memory requirements We show that the fact that an iteration of the
Blo ck CG requires more computations than the classical CG makes the Blo ck CG more suitable
for vector and parallel environments Additionally the increase in memory requirements is only
amultiple of s which generally is much smaller than the size of the linear system b eing solved
We present three mo dels of distributed implementations of the Blo ck CG and discuss the ad
vantages and disadvantages from each of these mo del implementations
The Classical CG and Blo ck CG are suitable for the solution of symmetric and p ositive de nite
systems of equations Furthermore b oth metho ds guarantee in exact arithmetic to nd a so
lution to p ositive de nite systems in a nite numb er of iterations
For non symmetric linear systems we study blo ckrow pro jection iterative metho ds for solving
general linear systems of equations and we are particularly interested in showing that the rate
of convergence of some row pro jection metho ds can b e accelerated with the Blo ck CG metho d
We review twoblock pro jection metho ds the blo ck Cimmino and the blo ck Kaczmarz metho d
Afterwards we study the implementation of an iterative pro cedure based on the blo ck Cimmino
metho d using the Blo ck CG metho d to accelerate its rate of convergence on distributed com
puting environments The complexity of the new iterative pro cedure is higher than the one of
the Blo ck CG metho d in terms of computations and memory requirements In this case the
main advantage is the extension of the application of the CG based metho ds to general linear
systems of equations We present a parallel implementation of the blo ck Cimmino metho d with
Blo ck CG acceleration that p erforms well on distributed computing environments
This last parallel implementation op ens a study of p otential scheduling strategies for distribut
ing tasks to a set of computing elements We presentascheduler for heterogeneous computing
environments whichiscurrently implemented inside the blo ck Cimmino based solver and can
b e reused inside parallel implementations of several other iterative metho ds
Lastly we combine all of the ab ove e orts for parallelizing iterative metho ds with prepro cessing
strategies to improve the numerical b ehavior of the blo ck Cimmino based iterativesolver
Keywords
Cimmino metho d conjugate gradient distributed computing heterogeneous computing iter
ative metho ds parallel computing prepro cessing linear systems scheduling symmetric sparse linear systems sparse matrices unsymmetric sparse linear systems
iii
Acknowledgements
The author gratefully acknowledges the following p eople and organizations for their immeasur
able supp ort
Iain S Du
For his professional support and research guidance
Daniel Ruiz
For the time and e orts he has dedicated to my thesis and the genuine pleasure that has been
working with him
Joseph Noailles
Who was responsible for my Ph D studies
Mario Arioli
For his remarkable contributions to my work
Craig Douglas
For al l his insights and valuable comments Proof reading my thesis during the processofprepa
ration and great advice you are supposed to have fun while writing your thesis
J C Diaz
For his constant support in my career
Dominique Bennett
For al l the help settling in Toulouse and CERFACS which is an indirect but substantial support
for four years of work in France
CERFACS
For providing the scienti c resources and pleasant working environment To al l my col leagues
and special ly to Osni Marques for helping me plotting and nding eigenvalues
ENSEEIHT IRIT
The APO team for hosting me in their group and sharing with me their research experiences
and great resources Brigitte Sor and Frederic Noail les for their outstanding computing support
MCS Argonne National Lab oratory
For letting me use their paral lel computing facilities
IAN CNR Pavia Italy
For hosting me in Summer
CNUSC
For the use of SP Thin node
My family
For always being with me To Sandra Drummond for her help and company My friends for al l
that I have found learned and shared in our friendship iv
Contents