CRANFIELD UNIVERSITY

Dulceneia Becker

Parallel Unstructured Solvers for Linear Partial Differential Equations

School of Engineering, Applied Mathematics and Computing Group

PhD Thesis

CRANFIELD UNIVERSITY

School of Engineering, Department of Process and Systems Engineering
Applied Mathematics and Computing Group

PhD Thesis

Academic Year 2005 - 2006

Dulceneia Becker

Parallel Unstructured Solvers for Linear Partial Differential Equations

Supervisor: Professor C. P. Thompson

May 2006

This thesis is submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy.

© Cranfield University 2006. All rights reserved. No part of this publication may be reproduced without the written permission of the copyright holder.

To my parents, Nelson and Marlise, to whom I owe so much.

Abstract

This thesis presents the development of a parallel algorithm to solve symmetric systems of linear equations and the computational implementation of a parallel partial differential equations solver for unstructured meshes. The proposed method, called distributive conjugate gradient - DCG, is based on a single-level domain decomposition method and the conjugate gradient method to obtain a highly scalable parallel algorithm.

An overview of methods for the discretization of domains and partial differential equations is given. The partition and refinement of meshes is discussed and the formulation of the weighted residual method for two and three dimensions presented. Some of the methods to solve systems of linear equations are introduced, highlighting the conjugate gradient method and domain decomposition methods. A parallel unstructured PDE solver is proposed and its actual implementation presented. Emphasis is given to the data partition adopted and the scheme used for communication among adjacent subdomains is explained. A series of experiments in processor scalability is also reported.

The derivation and parallelization of DCG are presented and the method validated through numerical experiments. The method's capabilities and limitations were investigated by the solution of the Poisson equation with various source terms. The experimental results obtained using the parallel solver developed as part of this work show that the algorithm presented is accurate and highly scalable, achieving roughly linear parallel speed-up in many of the cases tested.

Acknowledgements

I would like to thank my supervisor Professor Chris Thompson, Dr. Phil Rubini and Dr. Julian Turnbull for their guidance throughout this work. My sincere thanks go to Professor Rudnei Dias da Cunha, who was a source of guidance and support both before and during the course of this work. I would also like to thank CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - for funding this project and the Cambridge-Cranfield High Performance Computing Facility for allowing me the use of its computing facilities.

I wish to express my thankfulness to my sister Daiana for her invaluable help. I also would like to thank my friends and colleagues from UFRGS and Cranfield for their encouragement and support. I especially would like to thank Aaron, who kindly spared his time to proofread some of my documents. Finally, I wish to express my gratitude to every person that in one way or another has supported me during the course of this work.

Contents

Abstract VII

Acknowledgements IX

Notation XXIII

1 Introduction 1

2 Literature Review 7
   2.1 Domain Discretization 8
      2.1.1 Sequential and Parallel Mesh Generation 10
      2.1.2 Mesh Partitioning 11
      2.1.3 Mesh Refinement 12
   2.2 Discretization of Partial Differential Equations 15
   2.3 Systems of Linear Equations 18
      2.3.1 Preconditioning 21
      2.3.2 Conjugate Gradients 22
      2.3.3 Domain Decomposition 24
         2.3.3.1 Schwarz Methods 26
         2.3.3.2 Substructuring Methods 27
         2.3.3.3 Multigrid 28
   2.4 Summary 31

3 Discretization of PDEs and the Finite Element Method 33
   3.1 Domain Discretization - Unstructured Grids 35
   3.2 Weighted Residuals Method 37
   3.3 Boundary Conditions 38
   3.4 Matrix Formulation 40
      3.4.1 Poisson Equation 40
      3.4.2 Convection-Diffusion Equation 44
   3.5 Galerkin and Petrov-Galerkin Methods 45
   3.6 Shape Functions 48
      3.6.1 Linear Triangular Element 48
      3.6.2 Linear Tetrahedral Element 50
   3.7 Summary 51

4 Systems of Linear Equations 53
   4.1 Definitions 54
      4.1.1 Residual and Error 55
      4.1.2 Krylov Subspace Methods 55
      4.1.3 Quadratic Form 56
      4.1.4 Gradient of a Quadratic Form 57
   4.2 Basic Iterative Methods 58
      4.2.1 Jacobi, Gauss-Seidel and SOR 58
      4.2.2 Convergence Analysis 62
   4.3 Steepest Descent Method 64
   4.4 Conjugate Gradient Method 67
      4.4.1 Convergence 69
   4.5 Domain Decomposition Methods 72
      4.5.1 Schwarz Methods 75
      4.5.2 Substructuring or Schur Complement Methods 75
   4.6 Summary 77

5 Parallel PDE Solver 79
   5.1 Concepts and Terminology 80
   5.2 Classification of Parallel Computers 81
   5.3 Parallel Performance Metrics 86
   5.4 Parallel Computational Models 90
      5.4.1 SPMD programming model 92
      5.4.2 Message Passing Interface - MPI 92
   5.5 Experiments in Processor Scalability 93
      5.5.1 Sun Fire 15K Parallel Computer 94
      5.5.2 IBM 9076 SP/2 Cluster 95
      5.5.3 Parallel RRQR Factorization Scalability 96
      5.5.4 Latency and Transfer Rate 96
   5.6 Development of the Parallel Solver 100
      5.6.1 Solver Overview 102
      5.6.2 Mesh Partitioning 106
      5.6.3 Data Partitioning 111
      5.6.4 Communication Among Adjacent Subdomains 111
   5.7 Summary 116

6 Distributive Conjugate Gradient 117
   6.1 Problem Formulation 118
   6.2 Preliminary Study 121
      6.2.1 Study 1: KASO MatLab 124
      6.2.2 Study 2: KASO Fortran 135
      6.2.3 Study 3: CG truncated and restarted 150
   6.3 Distributive Conjugate Gradient 158
      6.3.1 Derivation and Algorithm 158
      6.3.2 Preconditioned DCG 160
      6.3.3 Parallelization 164
      6.3.4 Stopping Criteria 168
   6.4 Conclusion 169

7 Numerical Results 171
   7.1 Test Problems 171
   7.2 Matrices Sparsity 172
   7.3 Spatial Discretization - FEM 177
      7.3.1 Galerkin and Petrov-Galerkin FEM 178
      7.3.2 Boundary Conditions 182
   7.4 Algebraic Solver - DCG 188
      7.4.1 Convergence Rate 188
      7.4.2 Speedup and Efficiency 189
      7.4.3 Overlapping and Interface Update 193
      7.4.4 Preconditioning 194
      7.4.5 Boundary Conditions 196
   7.5 Poisson Solver 199
   7.6 Final Remarks 201

8 Conclusion and Future Developments 203

A Sun Fire 15K Bandwidth and Communication Time 207

References 213

List of Figures

2.1 Conforming and nonconforming grid 13
2.2 Local refinement of an originally conforming structured grid 13
2.3 Evolution of finite elements analysis 19
3.1 Sketch of an unstructured mesh and a structured grid 35
3.2 Example of mesh 43
3.3 Matrix entries 43
3.4 Element size h of a triangle 46
3.5 Linear triangle 48
3.6 Tetrahedron 50

4.1 Initial partitioning of matrix A 59
4.2 Typical zig-zag pattern of steepest descent approximations 66
4.3 Matrix NOS4 from Harwell-Boeing Collection 70
4.4 Problem NOS4 - orthogonality of residual vectors 71
4.5 Problem NOS4 - A-orthogonality of search direction vectors 72
4.6 Problem NOS4 - conjugate gradient residual norm 73

5.1 Shared Memory Architecture 84
5.2 Distributed Memory Architecture 85
5.3 Distributed Shared Memory Architecture 86
5.4 Amdahl's Law predicted speedup (2, 4, 8 and 16 processors) 88
5.5 Amdahl's Law predicted speedup (256, 512 and 1024 processors) 88
5.6 Amdahl's Law predicted speedup (linear, 1, 5, 10 and 20% serial) 89

5.7 Amdahl's Law predicted speedup (1, 5, 10 and 20% serial) 89
5.8 Sun Fire 15K system measured bandwidth of a broadcast collective communication 101
5.9 Sun Fire 15K system measured bandwidth of a round-trip point-to-point communication 102
5.10 Shape optimization example 105
5.11 Types of partitioning: vertex-, edge- and element-based 106
5.12 Nomenclature of nodes of a domain decomposed into two subdomains with one layer of overlapping 108
5.13 Nomenclature of nodes of a domain decomposed into two subdomains with two layers of overlapping 109
5.14 Metis partitioning into 2 to 5 parts of an unstructured squared mesh composed of 208 elements and 125 nodes 110
5.15 Matrix and vector row partitioning 112
5.16 Matrix and vector column partitioning 112
5.17 Matrix and vector block partitioning 112
5.18 Squared domain divided into four parts 114
5.19 Processors connectivity for two communication schemes 114
5.20 Number of adjacent subdomains dropped as neighbours by the indirect communication scheme 115

6.1 Domain decomposed into two subdomains - local and halo nodes 119
6.2 System sketch of a domain partitioned with more than one layer of overlapping 121
6.3 RGMRES execution time with different base sizes 126
6.4 GMRES and KASO 1x4 130
6.5 KASO-GMRES w = 0 130
6.6 KASO 1x4cf1 131
6.7 KASO 1x4cf2 131
6.8 KASO 1x4cfb2 132
6.9 KASO - Number of iterations for w = 0 to 9 132
6.10 KASO-GMRES 1 processor 133
6.11 KASO-GMRES 2 processors 133
6.12 KASO-GMRES 3 processors 134
6.13 KASO-GMRES 4 processors 134
6.14 GMRES and GMRES-KASO residual, two-dimensional problem 145
6.15 GMRES-KASO residual, two-dimensional problem 145
6.16 Residual of GMRES and KASO - 2 iterations, three-dimensional problem 146
6.17 Residual of GMRES and KASO - 4 blocks, three-dimensional problem 146
6.18 KASO-CG and KASO-RGMRES residual, mesh SQUARE1, iteration 1, step 2 147
6.19 KASO-CG residual, mesh CUBE BIG, iteration 1, step 1, 3 processors 148
6.20 KASO-GMRES residual, mesh CUBE BIG, iteration 1, step 1, 3 processors 149
6.21 CGR - Non-truncated 152
6.22 CGR - Truncated at each 10 iterations 152
6.23 CGR - Truncated at each 20 iterations 153
6.24 CGR - Truncated at each 40 iterations 153
6.25 CGR - Truncated at each 60 iterations 154
6.26 CGR - Truncated at each 10 iterations if |(r_{i+e}, r_{i-1})| > 1e-4 154
6.27 MCGR - Non-truncated 155
6.28 MCGR - Truncated at each 10 iterations 155
6.29 MCGR - Truncated at each 20 iterations 156
6.30 MCGR - Truncated at each 40 iterations 156
6.31 MCGR - Truncated at each 60 iterations 157
6.32 MCGR - Truncated at each 10 iterations if |(r_{i+e}, r_{i-1})| > 1e-4 157

7.1 Problem 2D - Exact solution: φ = sin(πx) sin(πy) 173
7.2 Number of non-zeros of matrix A, three-dimensional mesh 175
7.3 Number of non-zeros of matrix A, two-dimensional mesh 175
7.4 Number of non-zeros of matrices L and H, three-dimensional mesh 176
7.5 Number of non-zeros of matrices L and H, two-dimensional mesh 176
7.6 Ratio of number of non-zeros of matrix L to matrix H 177
7.7 Mesh square8 179
7.8 Refinement of a triangle 180
7.9 Streamline and potential flow around a cylinder 187
7.10 Problem 2D convergence rate (mesh square4) 188
7.11 Problem 2D RHS perturbation (mesh square4) 189
7.12 Speedup of problem 3Db for n=294,518 and n=581,027 191
7.13 Efficiency of problem 3Db for n=294,518 and n=581,027 191
7.14 Execution time per iteration of two two-dimensional problems in a square domain where the number of nodes of the mesh roughly doubles as the number of processors doubles, i.e. the problem size of each subdomain is roughly the same for any number of processors 192
7.15 Residual of DCG and PDCG with Neumann polynomial preconditioner of degree 4; problem 3Db, 2 and 16 processors 198
7.16 Problem 3Db speedup 200
7.17 Problem 3Db efficiency 200

List of Tables

3.1 Boundary conditions 39
3.2 Element connectivity data 44

4.1 Jacobi and Gauss-Seidel matrix splitting and iteration matrix B 61

5.1 Sun Fire 15K interconnect peak bandwidth 94
5.2 Sun Fire 15K pin-to-pin latency for data in memory 95
5.3 Timings (in seconds) and speedups for PRRQR 97
5.4 Sun Fire 15K system actual latency (seconds) and transfer rate (seconds/gigabit) of a broadcast collective communication 99
5.5 Sun Fire 15K system actual latency (seconds) and transfer rate (seconds/gigabit) of a round-trip point-to-point communication 100

6.1 GMRES-KASO iterations, two-dimensional problem 141
6.2 GMRES-KASO - time per iteration for 2 processors, two-dimensional problem 142
6.3 GMRES-KASO execution time, two-dimensional problem 142
6.4 GMRES-KASO iterations, two-dimensional problem, uniform grid, n=160,000 143
6.5 Distributive Conjugate Gradient pseudocode symbols description 167
6.6 Number of iterations predicted to satisfy the stopping criterion 170

7.1 Meshes description 174
7.2 Number of elements, nodes and boundary nodes of mesh square8 and the three meshes obtained by refining mesh square8 180
7.3 Mesh element size (h) and mesh Peclet number (Pe_h) for v=1 181
7.4 Maximum and average error and error ratio for the Galerkin approximation of the Poisson equation 182
7.5 Maximum error and error ratio for the Galerkin approximation of the convection-diffusion equation with v=1 183
7.6 Average error and error ratio for the Galerkin approximation of the convection-diffusion equation with v=1 184
7.7 Maximum error and error ratio for the Petrov-Galerkin approximation of the convection-diffusion equation with v=1 185
7.8 Average error and error ratio for the Petrov-Galerkin approximation of the convection-diffusion equation with v=1 186
7.9 Number of iterations, execution time, speedup and efficiency of problem 3Db for n=294,518 and n=581,027 190
7.10 Number of iterations and execution time for overlap equal to 1 and 2 194
7.11 Number of iterations for i being updated at each h iterations with overlapping equal to 1, 2 and 4 195
7.12 Number of iterations of the preconditioned CG and DCG methods; problem 3Db 197
7.13 Number of iterations for Dirichlet and Neumann boundary conditions 198

A.1 Sun Fire 15K system measured bandwidth (megabits/second) of a broadcast collective communication 208
A.2 Sun Fire 15K system measured average (over 1000 repetitions) communication time (seconds) of a broadcast collective communication 209
A.3 Sun Fire 15K system measured bandwidth (megabits/second) of a round-trip point-to-point communication 210

A.4 Sun Fire 15K system measured average (over 1000 repetitions) communication time (seconds) of a round-trip point-to-point communication 211

Notation

A, ..., Z        matrices
a, ..., z        vectors
α, β, ..., ω     scalars
A^T              matrix transpose
A^{-1}           matrix inverse
A_{ij}           matrix element
A_{:,j}          j-th matrix column
A_i              matrix block
x_i              vector block
(·, ·)           vector dot product (Euclidean inner product)
(·, ·)_M         M-inner product
diag(A)          diagonal of matrix A
K(·, ·)          Krylov subspace
||x||_2          2-norm
λ_max, λ_min     maximum and minimum eigenvalues of a matrix
RGMRES(c)        RGMRES restart parameter

Chapter 1

Introduction

Since the late 1940s, the introduction of high-speed computing machines has made the field of numerical solutions of partial differential equations (PDEs) grow very rapidly. The progress in computer technology and numerical algorithms has allowed the solution of many complex problems. Nevertheless, there are still many problems that cannot be adequately solved using today's most powerful computers. Examples include computational aerodynamics and hydrodynamics, weather forecasting, plasma simulation, structural analysis and turbulence modelling. How to solve them accurately and efficiently is one of the most challenging goals. In order to get high throughput performance, algorithms and computer architectures must exploit parallelism.

The numerical solution of PDEs has been of major importance to the development of many technologies and has been the target of much of the development of parallel computer hardware and software. Parallel computing offers the promise of greatly increased performance and the routine calculation of previously intractable problems. To sustain high performance on these machines many fundamental issues need to be addressed, including the development of scalable algorithms.

The accuracy of the numerical solution of a PDE depends mainly on the discretization scheme used to transform the PDE into a discrete problem and on the method used to solve the discrete problem. The discrete problem is usually a large, sparse system of linear algebraic equations for which the number of unknowns may vary from a few hundred to a few million. Greater accuracy is usually achieved by refining the mesh, which increases the number of unknowns and hence the need for computational resources. Solving systems of linear equations (SLEs) is in many cases the most time consuming part of the whole approach. Therefore, the efficiency of the numerical scheme relies strongly on the methods used to solve SLEs.

The most commonly used methods to discretize PDEs are mesh-based, which means that a mesh or grid¹ must be constructed before the PDE can be discretized. The mesh generation process, also called domain discretization, is an important and, in many cases, time consuming step of the solution. There are two fundamental classes of meshes in multidimensional regions, namely, structured and unstructured meshes. Structured grids are characterized by the organization of their cells, which is defined by a general rule. Unstructured meshes are characterized by a nearly absolute absence of any restrictions on grid cells, organization or structure, providing the most flexible tool for the discrete description of a geometry.

Another two areas related to meshing are mesh partitioning and mesh refinement. Mesh partitioning consists basically of splitting a mesh into parts so that the number of nodes of each part is nearly the same and the number of nodes on the interfaces is minimized. Mesh refinement can be performed globally or locally. The current research in the field is mainly concerned with local or adaptive refinement. These methods aim to refine the mesh only in regions where the approximation is not sufficiently accurate. Parallelization of adaptive refinement methods involves adaptation, repartitioning and rebalancing. In order to maintain the load balance among processors, the mesh must be dynamically repartitioned after it is refined.

¹ Often the literature refers to structured grids and unstructured meshes. In the context of this work, the definition of grid and mesh is interchangeable.

The process of discretizing a PDE can be divided into spatial and temporal discretization. In practice, time derivatives are discretized almost exclusively using finite difference methods. Spatial derivatives are discretized by a variety of methods including finite difference, finite element, finite volume, spectral methods and SPH (Smoothed Particle Hydrodynamics) gridless methods. Each of these methods has its advantages and disadvantages. Parallelization of these methods is usually straightforward. Most of them can be implemented in such a way that communication is not needed. In this work, the finite element method on unstructured meshes is the method of choice: it supports unstructured meshes, and hence allows easier study of complex geometries, and it can have certain optimality properties.

The methods to solve systems of linear equations can be divided into two main classes, namely, direct and iterative methods. For systems of moderate² size, the use of direct methods is often advantageous. However, direct methods are not well suited to parallel implementations due to the inherent sequential nature of the forward and backward substitutions needed. Iterative methods are used primarily for solving large and complex problems for which, because of storage and arithmetic requirements, it would be neither feasible nor more efficient to solve by a direct method.

Only in the decade just passed did iterative methods start to gain considerable acceptance in real-life industrial applications. The inclusion of iterative solvers into such applications has been a slow process which is ongoing. How to choose an iterative method from the many available is still an open question, since any method may solve a particular system in very few iterations while diverging on another.

² In 2D, the crossing point between direct and iterative methods is approximately 20,000 equations, but direct methods may perform well up to 100,000 equations (especially if they are parallel). In 3D, however, the situation is considerably different.

Although there is a vast amount of research activity concerned with the solution of systems of linear equations, it is still one of the most challenging areas of scientific computing. Beyond the challenge of developing fast, efficient, accurate methods for serial computers lies the challenge of achieving high parallelism. Direct methods are inherently serial and hence extremely difficult to parallelize. Many techniques to induce parallelism in direct methods have been developed. However, such techniques must make compromises in order to increase parallelism. Iterative methods have their parallelism mostly based on two basic linear algebraic operations, namely, matrix-vector products and inner products, and on the amount of information that needs to be exchanged among processors.

The parallel dot product operation is, in many cases, responsible for the loss of parallel performance as the number of processors increases. A parallel dot product operation requires a global reduce operation, which means that the data of all processors need to be clustered and all processors need to own the clustered result. This has to be done in a given order and hence some processors might need to wait for data before carrying on with the calculations.
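As an illustration of the global reduction such a dot product requires, the sketch below uses mpi4py and NumPy. This is an assumed illustrative equivalent only, not the thesis solver itself (which is written in Fortran with MPI), and the vector slices are hypothetical.

    # Parallel dot product via a global reduction (minimal mpi4py sketch).
    # Each process holds only its own slice of the vectors x and y.
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD

    # Hypothetical local slices; in a real solver these come from the data partition.
    x_local = np.random.rand(1000)
    y_local = np.random.rand(1000)

    local_dot = np.dot(x_local, y_local)                 # purely local work
    global_dot = comm.allreduce(local_dot, op=MPI.SUM)   # global reduce: every process
                                                         # waits for the clustered result

Run with, for example, mpiexec -n 4 python dot.py; the allreduce call is the synchronization point described above.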

Domain decomposition methods offer the possibility of avoiding parallel dot product operations. In the simplest case, a sub-system is solved at each processor separately and therefore parallel linear algebraic operations are not needed. The efficiency of such a method depends on how each sub-system is solved and the means used to communicate information among processors. The algorithm developed in this thesis aims to achieve high parallelism by approximating each sub-system using a fast serial method.

A brief overview of the contents of the thesis is as follows. Chapter 2 presents a literature review that includes domain discretization methods, partial differential equations discretization techniques and methods used to solve systems of linear equations. Chapter 3 presents the formulation of the finite element method used in the software developed as part of this work. In Chapter 4, basic definitions used by iterative methods are reported and some of the methods used in the development of the algorithm introduced in this work, namely DCG, are discussed. Chapter 5 presents parallel computing concepts, a set of experiments in processor scalability and an overview of the parallel solver used to obtain the results presented in the following chapters. Chapter 6 reports a preliminary study undertaken to develop DCG and also presents the derivation and parallelization of DCG. Chapter 7 reports the numerical results obtained. Finally, Chapter 8 presents the conclusions and discusses further possible developments.

Chapter 2

Literature Review

There is a vast amount of research activity concerned with the numerical solution of PDEs; it is still one of the most challenging areas of scientific computing due to the versatile and often complicated structure of PDEs and because of the large number of variables that need to be computed for two- or higher-dimensional problems.

The numerical solution of PDEs might be generally divided into discretization, both of the domain and the PDEs, and algebraic solution. Most of the methods to discretize PDEs are mesh-based, i.e. the solution is obtained through a mesh. The meshing process, which includes generation, partition and refinement of meshes, is an area of research on its own. More recently, gridless methods have been proposed, since the mesh is still a delicate component of the solution although meshing techniques are well developed.

The discretized problem usually results in a system of algebraic equations. Such systems might be solved by direct or iterative methods. Several methods have been proposed and to date none has been proven to outperform the others in general. Each method may be optimal for one problem and diverge for another problem. Besides, methods that perform well on serial computers are not necessarily highly parallelizable. For instance, direct methods are not highly parallelizable.


In this chapter, techniques to discretize and partition a domain are presented first. Additionally, methods of global and local mesh refinement are discussed. A brief overview of the methods to discretize PDEs, highlighting the finite element method, is given in Section 2.2. The next section presents methods to solve systems of linear equations, with emphasis on the conjugate gradient method and domain decomposition methods. This section is followed by a summary of the methods and techniques discussed.

2.1 Domain Discretization

Numerical methods based on the spatial subdivision of a domain into polyhedra or elements (mesh-based methods) imply the need for generating a grid or mesh¹. There are two fundamental classes of grids in multidimensional regions, namely, structured and unstructured meshes. These classes differ in the way the mesh points are locally organized. If the local organization of the grid points and the form of the grid cells are defined by a general rule, the mesh is called structured, otherwise, unstructured [71]. A third class, called multiblock or block-structured, is derived from the two fundamental classes.

Structured grids have an implicit connectivity: each grid point, cell and face may be specified uniquely by its computational coordinates. The data structures lend themselves very readily to vectorization. However, structured grids are hard to produce for general complex geometries and often require the user to carry dense grids all the way into the farfield because there is no way to reduce the number of cells in a given logical coordinate plane [19].

¹ Often the literature refers to structured grids and unstructured meshes. In the context of this work the definition of grid and mesh is interchangeable.

Multiblock or block-structured grids were introduced to avoid the geometric modelling difficulties associated with structured grids in complex geometries. Basically, the domain is divided, without holes or overlaps, into a few contiguous subdomains and a separate structured grid is generated in each block. Grids of this kind can be considered structured at the level of an individual block and unstructured when viewed as a collection of blocks. Multiblock grids are considerably more flexible than structured grids but their generation is more complicated than that of unstructured meshes. The topology of each block and the interaction between adjacent grid blocks are crucial to obtain an acceptable solution (see [27], [60] and [61] for details).

Unstructured grids, in contrast to structured grids, are characterized by a nearly absolute absence of any restrictions on grid cells, organization or structure, providing the most flexible tool for the discrete description of a geometry. They can, in principle, be composed of cells of arbitrary shapes built by connecting a given point to an arbitrary number of other points. They allow a natural approach to local adaptation by either insertion or removal of nodes.

An unstructured mesh inherently possesses more geometric flexibility than a structured grid. Therefore, it may require many fewer cells to model adequately a given geometry, decreasing memory requirements. Unstructured mesh generation is in theory qualitatively the same for complex as well as simple domains. Therefore, more of the grid generation can be automated, speeding the generation process considerably. For applications with complex geometry or requiring rapid turn-around time, the unstructured formulation appears to be the method of choice.

In structured grids, the neighbours of the points and cells are defined directly from the indices. For example, the left (west) and right (east) neighbours of the point (i, j) in a structured grid are defined by the points (i-1, j) and (i+1, j), respectively. This is lost in unstructured grids and makes coding and optimization harder. Extra memory is required to store information about the connection between cells and nodes of the mesh.
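To make the difference concrete, the following sketch (a hypothetical Python illustration, not part of the thesis software) contrasts the implicit neighbour rule of a structured grid with the explicit connectivity table an unstructured mesh must store.

    import numpy as np

    # Structured grid: neighbours follow from index arithmetic; nothing is stored.
    def west_east(i, j):
        """Left (west) and right (east) neighbours of point (i, j)."""
        return (i - 1, j), (i + 1, j)

    # Unstructured mesh: connectivity must be stored explicitly.
    # Hypothetical element-to-node table for two triangles sharing an edge.
    elements = np.array([[0, 1, 2],
                         [1, 3, 2]])           # node indices of each triangle

    # Build a node-to-node adjacency table (the extra memory mentioned above).
    adjacency = {}
    for tri in elements:
        for a in tri:
            for b in tri:
                if a != b:
                    adjacency.setdefault(int(a), set()).add(int(b))

    print(west_east(5, 7))      # ((4, 7), (6, 7))
    print(adjacency[1])         # {0, 2, 3}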

Unstructured grids are usually used with finite element methods and, increasingly, with finite volume methods. Computer codes for unstructured grids are more flexible. They do not need to be changed when the grid is locally refined or when elements or control volumes of different shapes are used [42]. The main advantages of unstructured grids lie in their ability to represent complex geometries and the natural grid adaptation by the insertion or removal of nodes. However, the numerical algorithms based on an unstructured grid topology are more complicated and are the most costly in terms of operations per time step and memory per grid point.

2.1.1 Sequential and Parallel Mesh Generation

Mesh generation is an active area of research by itself. With the availability of more versatile field solvers and powerful computers, analysts have attempted the simulation of ever-increasing geometrical and physical complexity. At some point (probably around 1985), the main bottleneck in the analysis process became the grid generation itself. The late 1980s and 1990s have seen a considerable amount of effort devoted to automatic grid generation, resulting in a number of powerful and, by now, mature techniques [72].

There is extensive literature on both structured and unstructured grid generation (e.g. [72, 71, 73, 116, 68, 60]) and several web pages that attempt to compile some of the current literature and software available, e.g. [89] and [104]. Owen [90] presents a survey of some of the fundamental algorithms in unstructured mesh generation. A discussion and categorization of triangle, tetrahedral, quadrilateral and hexahedral mesh generation methods in use in academia and industry is included. An informal survey of available mesh generation software is also provided, comparing some of their main features.

A relatively new research area is parallel mesh generation where, in general, the original problem is decomposed into smaller subproblems which are solved (i.e. meshed) concurrently using a number of processors. The subproblems can be formulated to be either tightly coupled, partially coupled or even decoupled. The coupling determines the intensity of the communication and the amount and type of synchronization required between the subproblems. Chrisochoides [29] presents a survey of parallel unstructured mesh generation methods. They show that it is possible to develop parallel meshing software using off-the-shelf sequential meshing codes without compromising the stability of parallel mesh generation methods.

There are several dozens of well-known mesh generation tools which may be stand-alone applications or part of larger pre-processing software suites. Some well-established industry standard tools are: Gridgen [52], ANSYS with ANSYS ICEM CFD [5], Fluent with GAMBIT [46], MSC PATRAN [82], AutoForm with AutoForm-AutoMesher [7] and Altair with HyperMesh [3].

2.1.2 Mesh Partitioning

Partitioning a mesh into a number of pieces is equivalent to partitioning a graph, since a mesh can be described by a graph. To balance the computation among processors and minimize communication, the partitioning of a mesh for parallel computation must be done so that the number of nodes assigned to each subdomain (or piece) is almost the same and the number of connected cells (or elements) assigned to different processors is minimized, i.e. the interface is minimal. A large number of efficient partitioning heuristics have been developed during recent years. Some of the most used packages are METIS [64, 65, 66], JOSTLE [123], CHACO [57, 56] and PARTY [94].

2.1.3 Mesh Refinement

The initial mesh used to solve a specific problem may not be suitable to produce sufficiently accurate or reliable approximations. An improved mesh is obtained through, in a wider sense, refining the structure of the current mesh either uniformly or in specific areas. These are called global and local refinement, respectively. The indiscriminate use of uniform refinement may lead to some inefficiencies [101, pp. 62]. Therefore, a mesh should ideally only be improved in regions where the approximation is not sufficiently accurate, i.e. it should be adaptively refined.

The use of adaptive refinement to obtain an improved grid for the discretization of a PDE has been the subject of a vast amount of research in the past few decades (see [77] and references therein). In order to minimize the number of nodes of the grid and obtain an accurate solution, the idea is to automatically construct a mesh which is coarse where the solution is well behaved and fine near singularities, boundary layers, et cetera, and which has a smooth transition between the coarse and fine parts. A general question in this context is which criterion should be used to determine the elements to refine.

Although both originally structured and unstructured meshes can be locally refined, adaptive refinement is more often applied to unstructured meshes. During local refinement of a structured mesh, the representation in the form of a simple data structure becomes more complex. Computational storage requirements increase in comparison to truly structured grids, since the position, size and shape of the refined regions must be stored [125, pp 16]. In addition, the grid becomes nonconforming, i.e. hanging nodes are introduced and hence adequate procedures for the hanging node constraints must be adopted. A conforming (or compatible) mesh is a mesh that does not have hanging nodes. Figure 2.1 shows a conforming and a nonconforming grid. A hanging node may be defined as a node that is the vertex of one cell but is not a vertex of the adjacent cell, as illustrated in Figure 2.2.

Figure 2.1: Conforming grid (left) and nonconforming grid (right)

Figure 2.2: Local refinement of an originally conforming structured grid. The nonconformity is due to the hanging nodes (black dots).
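To make the hanging-node definition concrete, the following sketch (a hypothetical NumPy illustration, not part of the thesis software) flags nodes that lie on an edge of a cell without being a vertex of that cell, as happens after the local refinement of Figure 2.2.

    import numpy as np

    def hanging_nodes(nodes, cells, tol=1e-12):
        """Return indices of nodes lying on an edge of some cell
        without being a vertex of that cell (hanging nodes)."""
        hanging = set()
        for cell in cells:
            pts = nodes[cell]
            k = len(cell)
            for e in range(k):
                a, b = pts[e], pts[(e + 1) % k]              # one edge of the cell
                ab = b - a
                for idx, p in enumerate(nodes):
                    if idx in cell:
                        continue
                    ap = p - a
                    cross = ab[0] * ap[1] - ab[1] * ap[0]    # collinearity test
                    t = np.dot(ap, ab) / np.dot(ab, ab)      # position along the edge
                    if abs(cross) < tol and tol < t < 1 - tol:
                        hanging.add(idx)
        return hanging

    # Hypothetical configuration: two refined quads next to one coarse quad.
    nodes = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0],
                      [1.0, 0.5], [2.0, 0.0], [2.0, 1.0], [0.0, 0.5]])
    cells = [[0, 1, 4, 7], [7, 4, 2, 3], [1, 5, 6, 2]]
    print(hanging_nodes(nodes, cells))    # {4}: node (1, 0.5) hangs on the coarse cell's edge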

Adaptive refinement of unstructured meshes can be performed quite straightforwardly [90, 62]. A number of algorithms has been shown to maintain the main mesh properties, i.e. after the refinement the mesh is still conforming, graded or smooth (adjacent elements do not differ dramatically in size) and the angles within the elements remain bounded according to the specification of each type of element. Three of the most widely used methods are the regular refinement algorithm of Bank et al. [11]; the longest-edge bisection algorithm proposed by Rivara [98], in which a triangle or tetrahedron is bisected by its longest edge; and the newest-node algorithm of Sewell [106], where the edge opposite to the newest node is bisected. Mitchell [78] briefly describes and presents some properties of these methods. Through a series of numerical experiments, it is shown that none of the bisection algorithms is consistently superior. However, adaptive refinement is superior to uniform refinement, except on smooth problems.
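As a concrete illustration of longest-edge bisection, the fragment below (an assumed Python sketch; Rivara's full algorithm additionally propagates the bisection to neighbouring elements to keep the mesh conforming) splits one triangle at the midpoint of its longest edge.

    import numpy as np

    def bisect_longest_edge(tri):
        """Split a triangle (3x2 array of vertex coordinates) at the midpoint
        of its longest edge; returns the two child triangles."""
        # Edge k is the edge opposite vertex k.
        edges = [(1, 2), (2, 0), (0, 1)]
        lengths = [np.linalg.norm(tri[a] - tri[b]) for a, b in edges]
        k = int(np.argmax(lengths))            # index of the longest edge
        a, b = edges[k]
        mid = 0.5 * (tri[a] + tri[b])          # new node (hanging until the neighbour is split too)
        # Each child keeps the opposite vertex and one half of the split edge.
        child1 = np.array([tri[k], tri[a], mid])
        child2 = np.array([tri[k], mid, tri[b]])
        return child1, child2

    t = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
    c1, c2 = bisect_longest_edge(t)            # the triangle is split across its longest edge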

In a parallel or distributed environment, the complexity of algorithms and software for mesh adaptation is significantly greater than it is for non-adaptive computations. Because a portion of the given mesh and its corresponding equations and unknowns is assigned to each processor, the refinement of an element may cause the refinement of adjacent elements that may be in neighbouring processors. To maintain the load balancing among processors the mesh must be dynamically repartitioned after it is refined. However, dynamic repartitioning of the mesh is usually much faster than the initial partitioning.

Parallelization of adaptive refinement methods involves adaptation, repartitioning and rebalancing. Several algorithms have been developed. The main difficulty is to keep load balance without migrating a large number of elements. Jones and Plassman [63] present a brief review of methods for adaptive refinement and an algorithm to perform parallel adaptive refinement on two-dimensional meshes. Castanos and Savage address these problems in [21, 22, 23] and present PARED [24], an integrated system for the parallel adaptive solution of PDEs, showing the advantages of remeshing the grid locally instead of globally. Oliker et al. [86] show that the performance of the parallel implementation of the 3D-TAG unstructured mesh adaptation algorithm deteriorates when the refinement occurs in specific regions due to severe load imbalance. To address this problem they add the PLUM [85] load balancer to 3D-TAG, improving the parallel performance dramatically. In other words, load balance is a very important factor in obtaining parallel performance.

The problem of refining and repartitioning meshes has been addressed in several ways. Recently, multilevel approaches that initially solve a small problem on a coarse mesh and use an a posteriori error estimator to predict future element densities and partition the mesh have been shown to be effective. For example, the method presented by Bank [10] performs the refinement mostly independently on each processor and obtains a nearly load-balanced mesh distribution. Another approach, used by Pebay [91], is to make adjacent elements on different processors generate the same refinement. Such an approach avoids communication during the refinement step but does not address load balancing issues, which, as noted above, can deteriorate the performance due to severe imbalance.

2.2 Discretization of Partial Differential Equations

The study of partial differential equations started in the eighteenth century in the work of Euler, d'Alembert, Lagrange and Laplace as a central tool in the description of mechanics of continua and, more generally, as the principal mode of analytical study of models in physical science. The analysis of physical models has remained to the present day one of the fundamental concerns of the development of PDEs. Beginning in the middle of the nineteenth century, particularly with the work of Riemann, PDEs also became an essential tool in other branches of mathematics².

Despite the developments in the analytical study of PDEs, on a practical level almost all PDEs are studied by computational means, i.e. continuous problems are transformed into discrete problems and numerical methods are used as solvers. One of the most important and striking phenomena in the application of PDEs in the physical sciences and engineering since the Second World War has been the impact of high-speed digital computation. It has drastically changed the structure of practice in applied mathematics and has given rise to new problems and new perspectives.

Various methods of discretization have from time to time been proposed. There are three classical families, namely finite difference³, finite volume⁴ and finite element⁵ methods. These methods are mesh-based, i.e. the solution is computed upon a mesh and depends strongly on the mesh itself. Although large progress has been made in the theory and practice of mesh generation, the construction of the mesh is still a very delicate component of the numerical solution of differential equations. For this reason there is an interest in the development of methods that eliminate or reduce the need for a mesh, namely gridless or meshless methods. Meshless methods date back almost thirty years and, in comparison to mesh-based methods, are in their initial stage of development⁶.

The discretization of continuous problems has been approached differently by engineers and mathematicians. Engineers often approach the problem more intuitively by creating an analogy between real discrete elements and finite portions of a continuum domain. Mathematicians have developed general techniques applicable directly to differential equations governing the problem, such as finite difference approximations. Since the early 1960s much progress has been made and today the purely mathematical and analog approaches are fully reconciled, leading to the finite element method. Figure 2.3 shows the process of evolution which led to the present-day concepts of finite element analysis.

² For an historic overview on the theory of PDEs refer to Brezis and Browder [18] and references therein.
³ According to Zienkiewicz et al. [131, Chapter 16, pp 446], an excellent summary of the state of the art of finite difference methods may be found in Orkisz [88].
⁴ The finite volume method is presented in Bruner [19] and Versteeg [121].
⁵ For a comparison between finite element and finite volume methods see [50, 51]. Comprehensive literature on finite element methods may be found in [1, 2, 92, 95, 105, 115, 130].
⁶ For an overview of the current state of meshless methods see Atluri et al. [6], Babuška et al. [9], Weissinger [125] and references therein.

The finite element method (FEM) can be analysed mathematically and has been shown to have optimal properties for certain types of equations. An important advantage of the finite element method is the ability to deal naturally with arbitrary geometries, due to the use of unstructured grids [42], and with any type of boundary condition. The principal drawback, which is shared by any method that uses unstructured grids, is that the matrices of the linearized equations are not as well structured as those for regular grids, making it more difficult to find efficient solution methods. A key issue is the need to use indirect addressing, which inhibits most hardware memory access optimization.

A wide range of problems has been solved by FEM. However, only over the last few years has FEM become an active area of research in computational fluid dynamics (CFD) [50, 51]. The method has been successfully used in several classes of problems and schemes. Examples are the works by Sherwin and Karniadakis [107, 108], Voyages and Nikitopoulos [122] and Waltz [124]. Despite this development, little work has been done on direct numerical simulation (DNS) for turbulence. Current computations of DNS typically use finite difference schemes or a combination of spectral and finite difference schemes [79]. One of the reasons is that high-order schemes should preferably be used because they are, in principle, more precise when the time step is small enough and the mesh fine enough. However, Mathieu and Scott [74] suggest that high-order schemes may not involve less computer time for a given accuracy, since they are more complicated to implement and require more calculations than low-order methods.

2.3 Systems of Linear Equations

An important aspect of the numerical approximation of partial differential equations is the numerical solution of the finite algebraic systems that are generated by the discrete equations. Systems of linear algebraic equations

    Ax = b,          (2.1)

where A is an n x n matrix and x and b are vectors of size n, can be solved by direct or iterative methods. For systems of moderate size, the use of direct methods is often advantageous. However, direct methods are not well suited to parallel implementations, mainly due to the inherent sequential nature of the forward and backward substitutions needed. Iterative methods are used primarily for solving large and complex problems for which, because of storage and arithmetic requirements, it would not be feasible or it would be less efficient to solve by a direct method. How to choose an iterative method from the many available is still an open question, since any method may solve a particular system in very few iterations while diverging on another.
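As a small illustration of the two routes, the assumed SciPy sketch below (not part of the thesis software) hands the same sparse system to a direct sparse factorization and to an iterative Krylov solver.

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    # Hypothetical SPD test matrix: a 1D Poisson finite-difference operator.
    n = 1000
    A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
    b = np.ones(n)

    x_direct = spla.spsolve(A, b)          # direct: sparse LU factorization
    x_iter, info = spla.cg(A, b)           # iterative: conjugate gradient (info == 0 on success)

    print(np.linalg.norm(x_direct - x_iter))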

Iterative methods for solving linear systems find their origin in the early nineteenth century (work by Gauss). Since the late 1940s, the field has seen an enormous increase in activity, mainly due to the introduction of high-speed computing machines. The thesis of David Young [129] (1950) is referred to by G. H. Golub and R. Kincaid as one of the monumental works of modern numerical analysis. Young's creation, development and analysis of the successive overrelaxation method (SOR) is seen as fundamental to the understanding of iterative methods for solving systems of linear equations, even though at that time many methods that are today described as iterative methods were then classified as direct methods. The inclusion of iterative solvers into applications has been a slow process that is ongoing.

Figure 2.3: Evolution of finite elements analysis (from [131, pp. 3])

An historic review of the main developments in iterative methods over the past century is presented by Saad and van der Vorst [103]. They attempt to relate the contributions to one another, giving the reader a clear picture of what has happened throughout the years. They expect that the usage of these methods will almost certainly increase substantially in application areas, partly due to parallel architectures. Another important observation is that direct and iterative methods used together may result in better, more robust methods.

An overview of the state of the art of iterative methods, emphasizing preconditioning techniques and the effectiveness of such methods, is presented in works by Beauwens [13] and van der Vorst [117]. The latter highlights the importance of Krylov subspace methods and that they have become accepted as powerful tools for solving very large linear systems. In [38], Duff and van der Vorst review some important developments and trends in algorithm design for the solution of linear systems of equations.

The literature about direct and iterative methods is quite vast. This work is mainly concerned with domain decomposition and conjugate gradient methods, which will be explored in the following sections. The reader is referred to the publications cited in this section and references therein for a better insight. Also worth mentioning are the books by Varga [120] and Young [128] as classical literature with excellent historical references and the more recent publications by Barrett et al. [12], Golub and van Loan [49] and Saad [101].

2.3.1 Preconditioning

Preconditioning is used both to improve the rate of convergence and to prevent divergence of the basic iterative method. It is well known that the convergence of iterative methods depends on spectral properties of the coefficient matrix. Hence one may attempt to transform the linear system into one that is equivalent in the sense that it has the same solution, but that has more favourable spectral properties. System (2.1) can be rewritten as

    M^{-1} A x = M^{-1} b,          (2.2)

where M is the preconditioner. Equation (2.2) is the case of left preconditioning. Another two options are right and symmetric preconditioning. The choice of M varies from purely black-box algebraic techniques, which can be applied to general matrices, to problem-dependent preconditioners, which exploit special features of a particular problem class.

Originally, preconditioners were based on direct solution methods such as the incomplete LU factorization [118]. Preconditioners based on incomplete Cholesky and LU (ILU) factorizations are among the most widely used (see [48]). However, these factorizations are sometimes numerically unstable and, due to the inherently sequential nature of the forward and backward substitutions needed, are not well suited to parallel implementations. Additionally, ILU(0) algorithms (ILU with zero fill-in) break down if there is a zero pivot, making them unsuitable for indefinite matrices.

A number of alternative techniques can be developed that are specifically targeted at parallel environments. The simplest approach is to use a Jacobi preconditioner, which consists of the diagonal of A. To enhance performance, this preconditioner can itself be accelerated by polynomial iteration, i.e. a second level of preconditioning called polynomial preconditioning. For more details see Saad [101, Chapters 12 and 13]. Another technique, similar to ILU but much more amenable to parallelization, is the sparse approximate inverse (SPAI) preconditioner. Further choices of parallelizable preconditioners include element-by-element preconditioners.
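To make the Jacobi and polynomial ideas concrete, the sketch below (an assumed NumPy illustration with a dense matrix, not the thesis code) applies the diagonal preconditioner and a Neumann-series polynomial preconditioner built on it, M_m^{-1} = sum_{k=0}^{m} (I - D^{-1}A)^k D^{-1}, to a residual vector r. A Neumann polynomial preconditioner of degree 4 is used later in the thesis with DCG (Figure 7.15); the degree here is only an example.

    import numpy as np

    def jacobi_apply(A, r):
        """Apply the Jacobi preconditioner: z = D^{-1} r with D = diag(A)."""
        return r / np.diag(A)

    def neumann_poly_apply(A, r, degree=4):
        """Apply a Neumann polynomial preconditioner of the given degree:
        z = sum_{k=0}^{degree} (I - D^{-1}A)^k D^{-1} r,
        using matrix-vector products only (no matrix powers are formed)."""
        d = np.diag(A)
        term = r / d                             # k = 0 term
        z = term.copy()
        for _ in range(degree):
            term = term - (A @ term) / d         # term <- (I - D^{-1}A) term
            z += term
        return z

    A = np.array([[4.0, -1.0, 0.0],
                  [-1.0, 4.0, -1.0],
                  [0.0, -1.0, 4.0]])
    r = np.array([1.0, 2.0, 3.0])
    print(jacobi_apply(A, r))
    print(neumann_poly_apply(A, r))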

Another class of preconditioners that has received attention in the past years is based on domain decomposition methods. In particular, Additive Schwarz preconditioners are of interest in this study (details in Section 2.3.3). For a detailed overview of preconditioning techniques see, for instance, [16] and [101].

2.3.2 Conjugate Gradients

The method of conjugate gradients (CG) is often the method of choice for solving Equation (2.1) when A is a sparse symmetric positive definite (SPD⁷) matrix. It is remarkably fast when compared to other methods like steepest descent, does not require the specification of any parameters and, in the absence of round-off errors, will provide the solution in at most n steps. Furthermore, the matrix A does not need to be explicitly formed; it only requires the result of a matrix-vector product Au for a given vector u.

The method was first proposed by Hestenes and Stiefel [58] in 1952 and was originally conceived as a direct method. Unfortunately, the method was forgotten for several years because, in practice, the numerical properties of the algorithm differed from the theoretical ones [30]. Interest in the method resurged in the early 1970s when the work by John Reid [96] drew renewed attention to the algorithm and several authors proposed CG as an iterative method rather than a direct one.

⁷ A matrix A is said to be positive definite if x^T A x > 0 for every nonzero vector x. If A is positive definite the shape of its quadratic form is a paraboloid.

CG was developed in terms of a minimization problem of the quadratic function

    f(x) = (1/2) x^T A x - x^T b.          (2.3)

If A is symmetric and positive definite, f(x) is minimized by the solution of Ax = b (see Section 4.1.3, page 56). CG can also be used to minimize any continuous function f(x) for which the gradient ∇f(x) can be computed. Applications include a variety of problems, such as engineering design, neural net training and nonlinear regression. The latter use is called nonlinear CG and the former linear CG.
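For reference, a minimal NumPy sketch of the standard (unpreconditioned) linear CG iteration for an SPD matrix is given below; it is an assumed illustration, not the DCG algorithm developed later in this thesis.

    import numpy as np

    def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
        """Solve Ax = b for a symmetric positive definite A by conjugate gradients."""
        n = len(b)
        max_iter = max_iter or n
        x = np.zeros(n)
        r = b - A @ x                      # initial residual
        p = r.copy()                       # initial search direction
        rs_old = r @ r
        for _ in range(max_iter):
            Ap = A @ p                     # only a matrix-vector product is needed
            alpha = rs_old / (p @ Ap)      # step length
            x += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs_old) * p  # next A-conjugate search direction
            rs_old = rs_new
        return x

    A = np.array([[4.0, 1.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0])
    print(conjugate_gradient(A, b))        # approximately [0.0909, 0.6364]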

Several changes have been proposed to enforce orthogonality conditions and improve the performance of CG, especially for the nonlinear case. Murty and Husain [83] proposed an algorithm that incorporates an orthogonality correction as well as an automatic restart. More recently, Day et al. [36] presented restart procedures. Chen and Sun [28] report on nonlinear conjugate gradient methods in which the line search procedure is replaced by a fixed formula for the stepsize. For the linear CG method such techniques have a limited improvement on the convergence rate (mainly numerical). However, for the nonlinear CG method the convergence can vary from linear to almost quadratic.

Concus et al. [30] presented a generalized conjugate gradient (GCG) method which was based on splitting off from A an approximating symmetric positive definite matrix M that corresponds to a system of equations easier to solve than the original one and then accelerating the associated iteration using CG. This method has become known as the preconditioned conjugate gradient algorithm.

The use of inexact preconditioning is analysed by Notay [84] for the method referred to as flexible conjugate gradient (FCG). The main difference to the standard CG lies in the explicit orthogonalization of the search direction vectors. It is reported that, depending on the problem, FCG may bring more stability than CG at the price of a fairly small overhead. For other problems, FCG is stable with proper parameters.

There are some variants of CG available to solve unsymmetric systems of equations. For instance:

• Bi-CG - Bi-Conjugate Gradient, Lanczos, 1952 [70];

• CGS - Conjugate Gradient Squared, Sonneveld, 1989 [112];

• Bi-CGSTAB - Variant of Bi-CG and CGS, van der Vorst, 1992 [119];

• GCGS - Generalized CGS, Fokkema et al., 1994 [47].

CGS is a variant of Bi-CG that often converges twice as fast as Bi-CG. However, large residual norms may appear, which may lead to inaccurate approximate solutions or may even deteriorate the convergence. CGS is often competitive with other established methods such as GMRES [102]. Bi-CGSTAB is seen as a smoother variant of Bi-CG with the speed of convergence of CGS. GCGS is yet another variant which uses polynomials that are similar to the Bi-CG polynomial, i.e. CGS and Bi-CGSTAB are particular instances of GCGS.

2.3.3 Domain Decomposition

The term domain decomposition, as defined by Smith et al. [111, pp ix], has slightly different meanings to specialists within the discipline of PDEs:

• In parallel computing, it often means the process of distributing data from a computational model among the processors in a distributed memory computer. In this context, domain decomposition refers to techniques for decomposing a data structure and can be independent of the numerical solution method. Data decomposition is perhaps a more appropriate term for this process.

• In asymptotic analysis, it means the separation of the physical domain into regions that can be modelled with different equations, with the interfaces between the domains handled by various conditions (e.g. continuity). In this context, domain decomposition refers to the determination of which PDEs to solve.

• In preconditioning methods, domain decomposition refers to the process of subdividing the solution of a large linear system into smaller problems whose solutions can be used to produce a preconditioner (or solver) for the system of equations that results from discretizing the PDE on the entire domain. In this context, domain decomposition refers only to the solution method for the algebraic system of equations arising from the discretization.

In this context, domain decomposition methods refer to divide-and-conquer techniques to solve partial differential equations by iteratively solving subproblems defined on smaller subdomains. The earliest known iterative domain decomposition technique was proposed by H. A. Schwarz in 1870 to prove the existence of harmonic functions on irregular regions which are the union of overlapping subregions. Variants of Schwarz's method were later studied by several authors (see Chan et al. [25] for a survey). Domain decomposition methods have the capability of providing numerical solvers which are portable, efficient and highly parallelizable. An overview of these methods, their implementation, analysis and relation to multigrid methods may be found in the book by Smith et al. [111].

2.3.3.1 Schwarz Methods

The simplest way to apply the Schwarz method is to divide the domain into overlapping subdomains and solve the equations of each subdomain independently. The literature shows that the convergence rate and therefore the overall parallel efficiency of single-level Additive Schwarz (AS) methods are often dependent on subdomain granularity. The number of iterations required for convergence tends to increase as the number of subdomains increases. The reason lies in the fact that the only means for communicating information between subdomains is through the overlap region.
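A single-level additive Schwarz sweep can be written down in a few lines. The sketch below (an assumed serial NumPy illustration; in the parallel setting each local solve runs independently on its own processor) applies z = sum_i R_i^T A_i^{-1} R_i r over overlapping index sets.

    import numpy as np

    def additive_schwarz_apply(A, r, subdomains):
        """One application of a single-level additive Schwarz preconditioner:
        z = sum_i R_i^T A_i^{-1} R_i r, where R_i restricts to the i-th
        overlapping index set and A_i = R_i A R_i^T is the local block."""
        z = np.zeros_like(r)
        for idx in subdomains:
            A_i = A[np.ix_(idx, idx)]                # local (overlapping) block
            z[idx] += np.linalg.solve(A_i, r[idx])   # independent local solve
        return z

    # Hypothetical example: 1D Poisson matrix with two overlapping index sets.
    n = 10
    A = (np.diag(2.0 * np.ones(n))
         - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1))
    r = np.ones(n)
    subdomains = [np.arange(0, 6), np.arange(4, 10)]  # overlap of two nodes
    print(additive_schwarz_apply(A, r, subdomains))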

Two-level methods which employ a global coarse mesh solver to provide global communication are often used. The convergence rate of two-level Schwarz methods also depends on the ratio of the coarse mesh diameter and the size of the overlap. For structured meshes, a grid hierarchy is naturally available. However, for an unstructured mesh the coarse grid may not be easy to generate [26]. Thus, a procedure is needed that generates this grid, as well as the associated interpolation and restriction operators. Moreover, the amount of memory required to store a second, even coarser grid is a disadvantage. Unlike recursive multilevel methods, a two-level Schwarz method may require too high a resolution on the coarse grid, which makes it less scalable overall. Parallelizing the coarse grid solver is ultimately necessary for high performance [67].

There are a number of variants of Schwarz preconditioner and they differ from each other mainly by the:

" Ordering of the coefficient matrix;

" Method used to treat the points on the interface of the subdomains; and

" Use of a coarsegrid or restriction operator. Chapter 2. Literature Review 27

2.3.3.2 Substructuring Methods

A common ordering adopted is such that, for example

A = \begin{bmatrix}
A_1 &        &        &     & E_1 \\
    & A_2    &        &     & E_2 \\
    &        & \ddots &     & \vdots \\
    &        &        & A_s & E_s \\
F_1 & F_2    & \cdots & F_s & C
\end{bmatrix}, \qquad
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_s \\ y \end{bmatrix}, \qquad
b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_s \\ g \end{bmatrix}    (2.4)

where s is the number of subdomains; A_i and b_i represent the coefficients that are interior to each subdomain; E_i, F_i and g represent the interface coefficients at each subdomain; C represents interface coefficients common to two or more subdomains; x_i and y are the unknowns that are, respectively, in the interior and on the interface of each subdomain.

The sub-system at each subdomain can be rewritten as

\begin{bmatrix} A_k & E_k \\ F_k & C \end{bmatrix}
\begin{bmatrix} x_k \\ y \end{bmatrix} =
\begin{bmatrix} b_k \\ g \end{bmatrix}    (2.5)

and in each subdomain the solution may be expressed as

x_k = A_k^{-1} \left( b_k - E_k y \right)    (2.6)

and

\left( C - F_k A_k^{-1} E_k \right) y = g - F_k A_k^{-1} b_k    (2.7)

The matrix

S = C - F_k A_k^{-1} E_k    (2.8)

is called the Schur complement. The main difficulty here is how to approximate S. The matrix S is dense even if all constitutive matrices in (2.8) are sparse. The ongoing research is mainly on how to solve (2.7) since S does not have a particular pattern. For instance, Smith [111] presents a new iterative substructuring algorithm for three dimensions which does not need to form S explicitly.

2.3.3.3 Multigrid

Simple iterative methods (such as the Jacobi method) tend to damp out high frequency components of the error fastest (see [12, pp 91]). This has led to the development of methods based on the following heuristic:

1. Perform some steps of a basic method in order to smooth out the error;

2. Restrict the current state of the problem to a subset of the grid points, the so-called "coarse grid", and solve the resulting projected problem;

3. Interpolate the coarse grid solution back to the original grid, and perform a number of steps of the basic method again.

Steps 1 and 3 are called pre-smoothing and post-smoothing respectively; by applying this method recursively to step 2 it becomes a true multigrid method. Usually the generation of successively coarser grids is halted at a point where the number of variables becomes small enough that the direct solution of the linear system is feasible.

The method outlined above is said to be a V-cycle method, since it descends through a sequence of successively coarser grids and then ascends this sequence in reverse order. A W-cycle method results from visiting the coarse grid twice, with possibly some smoothing steps in between.
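To make the recursion concrete, the sketch below (not part of the thesis; a minimal Python/NumPy illustration of a geometric V-cycle on a 1D Poisson problem, with a weighted-Jacobi smoother and linear interpolation — all function names and parameter choices are hypothetical) applies steps 1-3 above recursively:

import numpy as np

def poisson_1d(n):
    # Standard 1D Poisson matrix (Dirichlet boundary values eliminated).
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def prolong(nc):
    # Linear interpolation from nc coarse points to nf = 2*nc + 1 fine points.
    nf = 2 * nc + 1
    P = np.zeros((nf, nc))
    for j in range(nc):
        P[2 * j, j] = 0.5
        P[2 * j + 1, j] = 1.0
        P[2 * j + 2, j] = 0.5
    return P

def smooth(A, x, b, sweeps=3, omega=2.0 / 3.0):
    # Weighted-Jacobi sweeps: damp the high-frequency error components.
    d = np.diag(A)
    for _ in range(sweeps):
        x = x + omega * (b - A @ x) / d
    return x

def v_cycle(A, b, x):
    x = smooth(A, x, b)                              # 1. pre-smoothing
    n = A.shape[0]
    if n < 7:
        return np.linalg.solve(A, b)                 # coarsest grid: direct solve
    nc = (n - 1) // 2
    P = prolong(nc)
    R = 0.5 * P.T                                    # full-weighting restriction
    r = b - A @ x                                    # fine-grid residual
    e = v_cycle(R @ A @ P, R @ r, np.zeros(nc))      # 2. coarse-grid correction (recursive)
    x = x + P @ e                                    # 3. interpolate correction back
    return smooth(A, x, b)                           # post-smoothing

n = 63
A, b, x = poisson_1d(n), np.ones(n), np.zeros(n)
for _ in range(5):
    x = v_cycle(A, b, x)
print(np.linalg.norm(b - A @ x))

In practice the smoother, transfer operators and cycling strategy (V- or W-cycle) are chosen to match the discretization; the sketch only illustrates the pre-smooth / coarse-correct / post-smooth structure.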

Multigrid methods can be distinguished into algebraic multigrid (AMG) and geometric multigrid (GMG) methods [126]. In algebraic multigrid methods, no information is used concerning the grid on which the governing partial differential equations are discretized. Therefore, it might be better to refer to them as algebraic multilevel methods instead of multigrid methods. In geometric multigrid methods, coarse grids are constructed from the given fine grid and coarse grid corrections are computed using discrete systems constructed on the coarse grids. Constructing coarse grids by agglomeration of fine grid cells is easy when the fine grid is structured but not if the fine grid is unstructured. That is where AMG becomes useful. The AMG method accelerates convergence by reducing the low-frequency components of the residual using the coarse level correction [41].

Geometric multigrid [126]:

" Coarsegrids are constructed from the given fine grid and coarsegrid correc- tions are computed using discrete systems constructed on the coarsegrids.

" Standard geometric multigrid has been very successfulfor the Euler equa- tions but for the Navier-Stokesequations the situation is less satisfactory.

Algebraic multigrid [113]:

" The AMG is best-developed for symmetric positive-definite (SPD) prob- lems that arise from the discretization of scalar elliptic PDEs of second order. However, the potential range of applicability is much larger. In par- (e. ticular, AMG has successfully been applied to various nonsymmetric g. convection-diffusion) and certain indefinite problems. Moreover, important progress has been achievedin the numerical treatment of systems of PDEs (mainly Navier-Stokes and structural mechanics applications). However, major researchis ongoing and much remains to be done to obtain efficiency and robustnesscomparable to the caseof scalar applications. 30 Chapter2. Literature Review

• AMG is also a very good preconditioner, much better than standard (one-level) ILU-type preconditioners. Heuristically, the major reason is due to the fact that AMG, in contrast to any one-level preconditioner, aims at the efficient reduction of all error components, short range as well as long range.

" The extra overheadof the setup phaseis one reasonfor the fact that AMG is usually less efficient than geometric multigrid approaches (if applied to problems for which geometric multigrid can be applied efficiently). GMG also has a setup phase but is usually faster than the AMG setup phase.

" An efficient parallelization of classical AMG is rather complicated and re- quires certain algorithmical modifications in order to limit communication cost without sacrificing convergencesignificantly.

Linear vs nonlinear multigrid [76]:

" Linear methods can deliver superior asymptotic convergenceefficiency over nonlinear multigrid methods for fluid flow or radiation diffusion problems. When exact Jacobians are available, similar asymptotic convergencerates per multigrid cycles are observedfor equivalent linear and nonlinear multi- grid methods. The efficiency gains of the linear methods are largely at- tributed to the reduced number of costly nonlinear residual evaluations required and the ability to employ a linear Gauss-Seidelsmoother in the place of a Jacobi smoother. Therefore, in caseswhere costly or complicated nonlinear discretizations are employed, the use of linear methods can be advantageous.

" Additional convergenceacceleration can be achieved by using both linear and nonlinear methods as preconditionersto a Newton-Krylov method. This approach is particulary beneficial in caseswhere an inaccurate linearization is employed by the multigrid solvers. Chapter 2. Literature Review 31

• The required Jacobian storage for the linear multigrid approach can be prohibitive for many applications, particularly in three dimensions.

2.4 Summary

In this chapter, some of the methods used to generate and refine meshes, discretize partial differential equations and solve systems of linear equations have been reviewed. It has been shown that the techniques to generate meshes on serial computers are powerful and mature nowadays. Parallel mesh generation is a relatively new area of research. Techniques specific to parallel computers have been developed, but it is also possible to successfully adapt serial techniques to parallel meshing.

Other important areas related to meshing are mesh partitioning and mesh refinement. Many heuristics for the partitioning and repartitioning of meshes were developed during recent years. The refinement of meshes, mainly adaptive mesh refinement (AMR), has been a very active area of research. It has been shown that ideally a mesh should only be refined in certain regions to avoid inefficiencies caused by global refinement. In parallel computing, the complexity of the AMR algorithms increases significantly. It involves mesh adaptation, repartitioning and re-balancing. The main goal is to develop algorithms that keep load balance without migrating a large number of elements.

Despite the analytical studies, on a practical level almost all PDEs are solved computationally. Various mesh-based methods have been proposed and are well developed. More recently, gridless methods have been proposed since meshing is still a delicate and expensive component of the solution, though meshing techniques are powerful and well developed. This work is mainly concerned with mesh-based finite element methods (FEM). These methods have been used to solve CFD problems only relatively recently. They have proven successful for many problems. One area that has seen very little work is the direct numerical simulation of turbulence.

A very important aspect of numerical analysis is the solution of systems of linear equations. Problems of moderate size are commonly solved by direct methods whilst large complex problems are solved by iterative methods. Direct methods are not well suited for parallel computers due to their inherent sequential component. Iterative methods have as main advantages the lower arithmetic requirements and the low amount of storage needed compared to direct methods. There are several methods available, whose performance depends on the problem to be solved. Which one to choose for a given problem is still an open question.

Iterative methods are usually accelerated by preconditioners. Preconditioning might be used to improve the rate of convergence of a given method or to prevent divergence. Preconditioners can be purely black-box algebraic techniques or problem dependent. Most preconditioners are based on direct methods and hence are not well suited for parallel computers.

Domain decomposition methods are considered portable, efficient and highly parallelizable. There are several approaches, distinguished mainly by the ordering of the elements of the coefficient matrix and by the number of meshes adopted. In particular, multigrid methods have optimal properties but they are very difficult to parallelize, and the need for multiple meshes is not ideal when using unstructured meshes. Simple domain decomposition methods, which use only one mesh, are more suitable for unstructured meshes. However, the convergence rate of these methods usually deteriorates quickly as the number of subdomains increases. One remedy to accelerate the convergence of simple domain decomposition methods is the use of inexpensive local solvers such as Krylov methods. This approach has received little attention and is the main subject of this research.

Chapter 3

Discretization of PDEs and the Finite Element Method

The process of obtaining the computational solution of flow problems consists, basically, of two stages: discretization and solution of the algebraic equations [44]. The first stage converts the continuous partial differential equations and auxiliary (boundary and initial) conditions into discrete algebraic equations. The second stage depends on the discretization technique used. Usually, the second stage requires the solution of a system of algebraic equations and, in some cases, the equation-solving stage is reduced to a marching algorithm.

The discretization process can be divided into spatial and temporal discretization. Time derivatives are resolved using numerical techniques for the solution of initial value problems (IVPs). Spatial derivatives are typically discretized by finite difference, finite element, finite volume, spectral or SPH¹ gridless methods. Each of these methods has its advantages and disadvantages. In this work, the finite element method on unstructured meshes is the method of choice because it supports unstructured grids, and hence allows easier study of complex geometries, and also because it can have certain optimality properties.

¹Smoothed Particle Hydrodynamics


The finite element method is a numerical procedure for solving physical problems governed by a differential equation or an energy balance. It has two characteristics that distinguish it from other numerical procedures [105]:

1. The method utilizes an integral formulation to generate a system of algebraic equations. Integral (variational) formulation of the problem allows a wider set of data to be admissible than the differential formulation.

2. The method generally uses continuous, piecewise smooth functions for approximating the unknown quantity or quantities.

The second characteristic distinguishes the finite element method from other numerical procedures that utilize an integral formulation. The continuous functions must have only enough continuity in the derivatives to allow the integrals to be evaluated. Functions without continuous first derivative terms can also be solved with a Galerkin method by modifying the higher derivative terms using integration by parts.

As in simple finite difference schemes, the finite element method requires a problem defined in a geometrical space (or domain) to be subdivided into a finite number of smaller regions (a mesh). In finite differences, the mesh consists of rows and columns of orthogonal lines; in finite elements more general subdivisions can be used. For example, triangles or quadrilaterals can be used in two dimensions, and tetrahedra or hexahedra in three dimensions. Over each finite element, the unknown variables (e.g. temperature, velocity, etc.) are approximated using known functions; these functions can be linear or high-order polynomial expansions that depend on the geometrical locations (nodes) used to define the finite element shape. The governing equations are integrated over each finite element and the solution assembled over the entire problem domain. As a consequence of these operations, a finite set of algebraic equations is obtained in terms of a set of unknown parameters over each element [92].

3.1 Domain Discretization - Unstructured Grids

There are two fundamental classes of grids in the solution of problems in multidimensional regions: structured and unstructured. These classes differ in the way the mesh points are locally organized. If the local organization of the grid points and the form of the grid cells are defined by a general rule the mesh is called structured [71], otherwise unstructured. A third class, called multiblock or block-structured, is derived from the two fundamental classes. Figure 3.1 shows a sketch of an unstructured mesh and the notation used to describe it, and a structured grid.

Figure 3.1: Sketch of an unstructured mesh (left) and a structured grid (right)

Unstructured grids consist, in contrast to structured grids, in a nearly absolute absence of any restrictions on grid cells, organization or structure, providing the most flexible tool for the discrete description of a geometry. The overall time required to produce unstructured grids in complex geometries is much shorter than for structured or multiblock grids. The main advantage of unstructured grids lies in their ability to deal with complex geometries, while allowing one to provide natural grid adaptation by the insertion or removal of nodes. However, the numerical algorithms based on an unstructured grid topology are more complicated and are the most costly in terms of operations per time step and memory per grid point, because memory access is indirect.

Since an unstructured grid inherently possesses more geometric flexibility than a structured grid, it may require many fewer cells to adequately model a given geometry than a structured grid, thereby decreasing memory requirements. Grid generation is qualitatively the same for complex as well as simple domains. Therefore, more of the grid generation can be automated, speeding the generation process considerably. For applications with complex geometry or requiring rapid turn-around time, the unstructured formulation appears to be the method of choice.

Unstructured grids can, in principle, be composed of cells of arbitrary shapes built by connecting a given point to an arbitrary number of other points, thereby allowing the application of a natural approach to local adaptation by either insertion or removal of nodes. They also allow excessive resolution to be removed by deleting grid cells locally over regions in which the solution does not vary appreciably.

In structured grids, the neighbours of the points and cells are defined directly from the indices. This is lost in unstructured grids and makes the coding and optimization harder. Location and connectivity of nodes need to be specified explicitly, i.e. a special procedure for numbering and ordering the nodes, edges, faces, and cells of the grid is needed. Extra memory is required to store information about the connectivity among cells of the mesh and, because the linearized difference scheme operators are not usually band matrices, it is more difficult to use implicit schemes.

Unstructured grids are usually used with finite element methods and, increasingly, with finite volume methods. Computer codes for unstructured grids are more flexible since they do not need to be changed when the grid is locally refined or when elements or control volumes of different shapes are used [42]. To obtain accuracy in time, higher-order formulae on a fine grid might be used. However, from a practical perspective, the accuracy that can be achieved for a given execution time is more important than the accuracy alone. The use of low-order formulae and either grid refinement or grid adaptation can improve the accuracy and maintain computational efficiency.

3.2 Weighted Residuals Method

The general weighted residuals method (WRM) statement reads

\int_{\Omega} W^i L(\phi) \, d\Omega = 0,    (3.1)

where W^i is a weighting function, Ω is the domain and L(φ) denotes an operator, for example, the Laplace operator L(φ) = ∇²φ. The idea of the weighted residuals method is that a residual R(φ, x) can be multiplied by a weighting function to force the integral of the weighted expression to vanish, i.e.,

\int_{\Omega} W(x)\, R(\phi, x) \, d\Omega = 0    (3.2)

For example, for the equation

\frac{\partial \phi}{\partial x} = f(x)    (3.3)

R(φ, x) is given by

R(\phi, x) = \frac{\partial \phi}{\partial x} - f(x).    (3.4)

The unknown φ is approximated by

\phi(x) = a_1 N_1(x) + a_2 N_2(x) + \cdots = \sum_i a_i N_i(x),    (3.5)

where N_i(x) are called shape functions and a_i are unknown parameters.

A system of linear equations in the unknown parameters a_i can be generated by choosing different weighting functions and replacing each in (3.2). This will determine an approximation φ in the form of the finite series given in Equation (3.5). Since the number of unknown parameters a is equal to the number of shape functions N, a system of linear algebraic equations with the same number of equations as unknowns will be generated. The existence and uniqueness of the solution to such a system of equations is guaranteed if the boundary conditions associated with the differential equation are correctly posed.

For instance, to approximate a differential equation

L\phi + f = 0 \quad \text{in } \Omega    (3.6)

with boundary conditions

m\phi + s = 0 \quad \text{on } \Gamma,    (3.7)

the discretized equations are obtained in the weighted form as

\int_{\Omega} W_i \left( L\phi + f \right) d\Omega + \int_{\Gamma} W_i \left( m\phi + s \right) d\Gamma = 0    (3.8)

3.3 Boundary Conditions

The boundary conditions can be divided into three types, namely Dirichlet, Neumann and mixed² boundary conditions. The latter two are known as natural (implicit) boundary conditions and the former as an essential boundary condition.

There are two common methods used to apply Dirichlet boundary conditions

\phi = s    (3.9)

²The mixed boundary condition is also known as the Robin boundary condition.

One is to eliminate the restrained parameters from the assembled equations and solve the reduced system. Its major disadvantage is that it is computationally difficult to program efficiently. Another is to modify the assembled equations to reflect the boundary constraints and solve the complete system. This modification should consider symmetry, round-off error and the eigenvalue distribution. The former is especially important when Krylov methods are used to solve the linear system of equations.

The derivative boundary conditions can be described in a general form as

\frac{\partial \phi}{\partial n} = -m\phi + s,    (3.10)

where ∂φ/∂n is the derivative normal to the boundary, m and s are scalars and φ the unknown. If m = 0 it is a Neumann boundary condition and if both m ≠ 0 and s ≠ 0 it is a mixed boundary condition. The inclusion of the derivative boundary conditions into the finite element analysis is done using an integral (or weak form) over the boundary (see Table 3.1). Details of how to define the components are given in Section 3.5.

Name        Condition               Integral Form
Dirichlet   φ = s                   -
Neumann     ∂φ/∂n = s               ∫_Γ N_i s dΓ
Mixed       ∂φ/∂n = -mφ + s         ∫_Γ N_i (mφ - s) dΓ

Table 3.1: Boundary conditions

3.4 Matrix Formulation

The finite element method is based on the numerical approximation of the dependent variable at specific nodal locations (elements) and on the solution of the system of linear equations that is produced. The coefficient matrix is formulated from evaluations performed locally over each element and assembled into a global Galerkin matrix. In other words, the local coefficient matrices obtained from each element are assembled into a large matrix which contains all the local element contributions.

The weighted residuals formulation of the Poisson and convection-diffusion equations in two dimensions is described in Sections 3.4.1 and 3.4.2 respectively. The extension to three dimensions is straightforward.

3.4.1 Poisson Equation

For the Poisson equation,

\mu \nabla^2 \phi = f    (3.11)

the weighted residuals method statement reads

\int_{\Omega} W_i \left( \mu \nabla^2 N_j - f \right) d\Omega \, \phi_j = 0.    (3.12)

Integrating by parts results in

-\mu \int_{\Omega} \nabla W_i \cdot \nabla N_j \, d\Omega \, \phi_j - \int_{\Omega} W_i f \, d\Omega = 0,    (3.13)

where the first term is the diffusion term and the second the source term. Equation (3.11) in two dimensions is equivalent to

\mu \left( \frac{\partial^2 \phi}{\partial x^2} + \frac{\partial^2 \phi}{\partial y^2} \right) = f(x, y) \quad \text{in } \Omega    (3.14)

Adding boundary conditions, the discrete problem over each element is given by

T_D \phi_j - T_F - T_M \phi_j - T_S = 0    (3.15)

where

T_D = \mu \int_{\Omega_e} \left( \frac{\partial W_i}{\partial x} \frac{\partial N_j}{\partial x} + \frac{\partial W_i}{\partial y} \frac{\partial N_j}{\partial y} \right) d\Omega, \qquad T_F = \int_{\Omega_e} W_i f(x, y) \, d\Omega,

T_M = m \int_{\Gamma_e} W_i N_j \, d\Gamma, \qquad T_S = s \int_{\Gamma_e} W_i \, d\Gamma

The equations of each element (local system) may be written as

A^e x^e = b^e    (3.16)

where x^e is equivalent to the dependent variable φ,

A^e = \mu \int_{\Omega_e} \left( \frac{\partial W_i}{\partial x} \frac{\partial N_j}{\partial x} + \frac{\partial W_i}{\partial y} \frac{\partial N_j}{\partial y} \right) d\Omega - m \int_{\Gamma_e} W_i N_j \, d\Gamma

b^e = f \int_{\Omega_e} W_i \, d\Omega + s \int_{\Gamma_e} W_i \, d\Gamma

Given that ne is the number of elements of the mesh and

A = \sum_{e=1}^{ne} A^e, \qquad x = \sum_{e=1}^{ne} x^e, \qquad b = \sum_{e=1}^{ne} b^e,

the assembled system of linear equations (global system) is given by

Ax=b (3.17)

As an example of assembling the system, consider the regular triangular mesh shown in Figure 3.2. The element connectivity data is given in Table 3.2. The possible matrix entries that arise because of the mesh topology and the numbering of the nodes are shown in Figure 3.3. Given that A and b are initially zero, that the numbers represent the element number and not the value of the entries, and that a set of numbers represents summation (for instance 12 = 1 + 2), after evaluating all eight elements the assembled matrix A of Equation (3.17) is given by

A = \begin{bmatrix}
12 & 2   & 0 & 1   & 12     & 0   & 0 & 0   & 0  \\
2  & 234 & 4 & 0   & 23     & 34  & 0 & 0   & 0  \\
0  & 4   & 4 & 0   & 0      & 4   & 0 & 0   & 0  \\
1  & 0   & 0 & 156 & 16     & 0   & 5 & 56  & 0  \\
12 & 23  & 0 & 16  & 123678 & 38  & 0 & 67  & 78 \\
0  & 34  & 4 & 0   & 38     & 348 & 0 & 0   & 8  \\
0  & 0   & 0 & 5   & 0      & 0   & 5 & 5   & 0  \\
0  & 0   & 0 & 56  & 67     & 0   & 5 & 567 & 7  \\
0  & 0   & 0 & 0   & 78     & 8   & 0 & 7   & 78
\end{bmatrix}

and the right-hand side vector b by

b = \begin{bmatrix} 12 & 234 & 4 & 156 & 123678 & 348 & 5 & 567 & 78 \end{bmatrix}^T

Figure 3.2: Example of mesh

Figure 3.3: Matrix entries (one panel per element, Elements 1-8)

Element   Node 1   Node 2   Node 3
   1         1        5        4
   2         2        5        1
   3         2        6        5
   4         2        3        6
   5         4        8        7
   6         4        5        8
   7         5        9        8
   8         5        6        9

Table 3.2: Element connectivity data

3.4.2 Convection-Diffusion Equation

For the convection-diffusion equation,

-\mu \nabla^2 \phi + \mathbf{v} \cdot \nabla \phi = f    (3.18)

the weighted residual method statement reads

\int_{\Omega} W_i \left( -\mu \nabla^2 N_j + \mathbf{v} \cdot \nabla N_j - f \right) d\Omega \, \phi_j = 0.    (3.19)

Integrating by parts results in

-\mu \int_{\Omega} \nabla W_i \cdot \nabla N_j \, d\Omega \, \phi_j + \int_{\Omega} W_i \, \mathbf{v} \cdot \nabla N_j \, d\Omega \, \phi_j - \int_{\Omega} W_i f \, d\Omega = 0,    (3.20)

where the three terms are, respectively, the diffusion, convection and source terms. The diffusion and source terms of Equation (3.20) are the same as in Equation (3.13). Equation (3.18) in two dimensions is equivalent to

-\mu \left( \frac{\partial^2 \phi}{\partial x^2} + \frac{\partial^2 \phi}{\partial y^2} \right) + v_1 \frac{\partial \phi}{\partial x} + v_2 \frac{\partial \phi}{\partial y} = f(x, y) \quad \text{in } \Omega    (3.21)

Adding boundary conditions, the discrete problem over each element is given by

(T_D + T_C)\, \phi_j - T_F - T_M \phi_j - T_S = 0    (3.22)

where

T_D = \mu \int_{\Omega_e} \left( \frac{\partial W_i}{\partial x} \frac{\partial N_j}{\partial x} + \frac{\partial W_i}{\partial y} \frac{\partial N_j}{\partial y} \right) d\Omega, \qquad T_C = \int_{\Omega_e} W_i \left( v_1 \frac{\partial N_j}{\partial x} + v_2 \frac{\partial N_j}{\partial y} \right) d\Omega,

T_F = \int_{\Omega_e} W_i f(x, y) \, d\Omega, \qquad T_M = m \int_{\Gamma_e} W_i N_j \, d\Gamma, \qquad T_S = s \int_{\Gamma_e} W_i \, d\Gamma

The terms of Equation (3.16) are given by

- Ae aW"aNj aWaN dS2+ Ws aN'+ v2 aN dQ-m W{NjdI' =µ+{' aX ax ay TY- vl Ox ay ee 7( -) re

j be WWdSZ+s WWdr =ff ýe re

3.5 Galerkin and Petrov-Galerkin Methods

The type of weighting function chosen depends on the type of weighted residual technique selected. In the Galerkin procedure (GFEM) the shape function N_j is chosen as a polynomial and W_i = N_i. The Petrov-Galerkin (PGFEM) procedure represents a generalization of the Galerkin procedure; both N_j and W_i are taken as polynomials, but W_i ≠ N_i. For operators that exhibit first-order derivatives PGFEM may be superior to GFEM [72].

The Petrov-Galerkin weighting function in two dimensions is defined as

W_i = N_i + \beta_i    (3.23)

where

\beta_i = \frac{\alpha h}{2 \|\mathbf{v}\|} \left( v_1 \frac{\partial N_i}{\partial x} + v_2 \frac{\partial N_i}{\partial y} \right)    (3.24)

\alpha = \coth Pe_h - \frac{1}{Pe_h}    (3.25)

Pe_h = \frac{\|\mathbf{v}\|\, h}{2\mu}    (3.26)

h is the element size, v_i are the velocities, Pe_h is the mesh Peclet number and µ the diffusion coefficient. Note that if β is zero the Petrov-Galerkin method is equivalent to the Galerkin method. The mesh Peclet number measures the strength of convection versus diffusion relative to the mesh size. The greater the mesh Peclet number the greater the dominance of convection. For the convection-diffusion equation GFEM exhibits problems with spurious oscillations whenever Pe_h > 1 [43].
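Equations (3.24)-(3.26) translate directly into a few lines of code. The sketch below (not from the thesis; a hypothetical Python/NumPy helper, assuming the element size h and the shape-function derivatives are already available, and with all numerical values in the usage example invented for illustration) evaluates the mesh Peclet number, α and the perturbations β_i for one element:

import numpy as np

def petrov_galerkin_beta(v, h, mu, dNdx, dNdy):
    # Streamline perturbation beta_i of Eq. (3.24) for one element.
    # v          : velocity vector (v1, v2)
    # h          : element size in the direction of v
    # mu         : diffusion coefficient
    # dNdx, dNdy : arrays of shape-function derivatives dN_i/dx, dN_i/dy
    vnorm = np.linalg.norm(v)
    if vnorm == 0.0:
        return np.zeros_like(dNdx)            # no convection: Galerkin (beta = 0)
    peh = vnorm * h / (2.0 * mu)              # mesh Peclet number, Eq. (3.26)
    alpha = 1.0 / np.tanh(peh) - 1.0 / peh    # Eq. (3.25)
    return alpha * h / (2.0 * vnorm) * (v[0] * dNdx + v[1] * dNdy)   # Eq. (3.24)

# usage with hypothetical element data
beta = petrov_galerkin_beta(v=np.array([1.0, 0.5]), h=0.1, mu=1e-3,
                            dNdx=np.array([-5.0, 5.0, 0.0]),
                            dNdy=np.array([-5.0, 0.0, 5.0]))
print(beta)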

The size of the element is the distance from one of the nodes to its opposite face or edge in the direction of the velocity vector v. The node and face or edge are chosen so that the vector v crosses the node and the opposite face or edge. Figure 3.4 sketches the size of a triangle.

Figure 3.4: Element size h of a triangle

Defining

D = \frac{\partial N_i}{\partial x} \frac{\partial N_j}{\partial x} + \frac{\partial N_i}{\partial y} \frac{\partial N_j}{\partial y} + \frac{\partial N_i}{\partial z} \frac{\partial N_j}{\partial z}    (3.27)

and

C = v_1 \frac{\partial N_j}{\partial x} + v_2 \frac{\partial N_j}{\partial y} + v_3 \frac{\partial N_j}{\partial z}    (3.28)

as respectively the diffusion and convection terms in three dimensions, the Galerkin and Petrov-Galerkin formulations for linear elements for the Poisson and convection-diffusion equations follow.

Poisson Equation - Galerkin

A = D \int_{\Omega_e} d\Omega + m \int_{\Gamma_e} N_i N_j \, d\Gamma    (3.29)

b = f \int_{\Omega_e} N_i \, d\Omega + s \int_{\Gamma_e} N_i \, d\Gamma    (3.30)

Convection-Diffusion Equation - Galerkin

A = D \int_{\Omega_e} d\Omega + C \int_{\Omega_e} N_i \, d\Omega + m \int_{\Gamma_e} N_i N_j \, d\Gamma    (3.31)

b = f \int_{\Omega_e} N_i \, d\Omega + s \int_{\Gamma_e} N_i \, d\Gamma    (3.32)

Convection-Diffusion Equation - Petrov-Galerkin

A = D \int_{\Omega_e} d\Omega + C \int_{\Omega_e} \left( N_i + \beta_i \right) d\Omega + m \int_{\Gamma_e} N_i N_j \, d\Gamma    (3.33)

b = f \int_{\Omega_e} \left( N_i + \beta_i \right) d\Omega + s \int_{\Gamma_e} N_i \, d\Gamma    (3.34)

Figure 3.5: Linear triangle

3.6 Shape Functions

3.6.1 Linear Triangular Element

A general linear function in 2D is given by the form

\phi(x, y) = a + bx + cy.

The natural geometric object to represent the three unknowns (a, b, c) is the triangle. The shape functions may be derived by observing the mapping from cartesian (x, y) to local (ξ, η) coordinates (Figure 3.5)

x = x_1 + (x_2 - x_1)\xi + (x_3 - x_1)\eta    (3.35)

or

x = N^i x_i = (1 - \xi - \eta)\, x_1 + \xi\, x_2 + \eta\, x_3,    (3.36)

implying that

N^1 = 1 - \xi - \eta, \qquad N^2 = \xi, \qquad N^3 = \eta.    (3.37)

The shape-function derivatives, which are constant over the element, can be derived analytically by making use of the chain rule and the derivatives with respect to the local coordinates

N^i_{,x} = N^i_{,\xi}\, \xi_{,x} + N^i_{,\eta}\, \eta_{,x}.    (3.38)

From Equation (3.36), the Jacobian matrix of the derivatives is given by

J = \begin{bmatrix} x_{,\xi} & x_{,\eta} \\ y_{,\xi} & y_{,\eta} \end{bmatrix} = \begin{bmatrix} x_{21} & x_{31} \\ y_{21} & y_{31} \end{bmatrix},    (3.39)

its determinant by

\det(J) = |J| = 2A = x_{21} y_{31} - x_{31} y_{21},    (3.40)

and the inverse by

\begin{bmatrix} \xi_{,x} & \xi_{,y} \\ \eta_{,x} & \eta_{,y} \end{bmatrix} = \frac{1}{2A} \begin{bmatrix} y_{31} & -x_{31} \\ -y_{21} & x_{21} \end{bmatrix}    (3.41)

The analytical derivatives of the shape functions with respect to x and y are

\begin{bmatrix} N^1_{,x} \\ N^2_{,x} \\ N^3_{,x} \end{bmatrix} = \frac{1}{2A} \begin{bmatrix} -y_{31} + y_{21} \\ y_{31} \\ -y_{21} \end{bmatrix}, \qquad
\begin{bmatrix} N^1_{,y} \\ N^2_{,y} \\ N^3_{,y} \end{bmatrix} = \frac{1}{2A} \begin{bmatrix} x_{31} - x_{21} \\ -x_{31} \\ x_{21} \end{bmatrix}

The basic integrals are given by

\int_{\Omega_e} N_i \, d\Omega = \frac{A}{3} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}

and

M^e = \int_{\Omega_e} N_i N_j \, d\Omega = \frac{A}{12} \begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix}
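The triangle formulas above can be checked with a short routine. The following sketch (not from the thesis; plain Python/NumPy, names hypothetical) computes det(J) = 2A, the constant shape-function derivatives and the element diffusion matrix T_D of Section 3.4.1 with W_i = N_i (the gradients are constant, so the integral is the area times the gradient products), together with the load integral ∫ N_i dΩ = A/3:

import numpy as np

def linear_triangle(xy, mu=1.0):
    # xy : (3, 2) array with the node coordinates (x_i, y_i).
    x21, y21 = xy[1] - xy[0]
    x31, y31 = xy[2] - xy[0]
    det_j = x21 * y31 - x31 * y21                  # det(J) = 2A, Eq. (3.40)
    area = 0.5 * det_j
    # Analytical shape-function derivatives (constant over the element).
    dNdx = np.array([-y31 + y21, y31, -y21]) / det_j
    dNdy = np.array([x31 - x21, -x31, x21]) / det_j
    # Element diffusion (stiffness) matrix: mu * A * (N_{i,x} N_{j,x} + N_{i,y} N_{j,y}).
    Ke = mu * area * (np.outer(dNdx, dNdx) + np.outer(dNdy, dNdy))
    # Element load for a unit source: integral of N_i over the element = A/3.
    fe = area / 3.0 * np.ones(3)
    return Ke, fe, dNdx, dNdy

Ke, fe, dNdx, dNdy = linear_triangle(np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
print(Ke)
print(fe)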

Figure 3.6: Tetrahedron

3.6.2 Linear Tetrahedral Element

The linear trial function in 3D follows the derivation of the 2D function. The analytical derivatives of the shape functions with respect to x, y and z are

\begin{bmatrix} N^1_{,x} \\ N^2_{,x} \\ N^3_{,x} \\ N^4_{,x} \end{bmatrix} = \frac{1}{6V} \begin{bmatrix} T_{11} \\ T_{21} \\ T_{31} \\ -T_{11} - T_{21} - T_{31} \end{bmatrix}, \qquad
\begin{bmatrix} N^1_{,y} \\ N^2_{,y} \\ N^3_{,y} \\ N^4_{,y} \end{bmatrix} = \frac{1}{6V} \begin{bmatrix} T_{12} \\ T_{22} \\ T_{32} \\ -T_{12} - T_{22} - T_{32} \end{bmatrix}, \qquad
\begin{bmatrix} N^1_{,z} \\ N^2_{,z} \\ N^3_{,z} \\ N^4_{,z} \end{bmatrix} = \frac{1}{6V} \begin{bmatrix} T_{13} \\ T_{23} \\ T_{33} \\ -T_{13} - T_{23} - T_{33} \end{bmatrix}

where V is the volume of the tetrahedron and

T_{11} = y_{24} z_{34} - y_{34} z_{24}
T_{21} = y_{34} z_{14} - y_{14} z_{34}
T_{31} = y_{14} z_{24} - y_{24} z_{14}
T_{12} = x_{34} z_{24} - x_{24} z_{34}
T_{22} = x_{14} z_{34} - x_{34} z_{14}
T_{32} = x_{24} z_{14} - x_{14} z_{24}
T_{13} = x_{24} y_{34} - x_{34} y_{24}
T_{23} = x_{34} y_{14} - x_{14} y_{34}
T_{33} = x_{14} y_{24} - x_{24} y_{14}

The basic integration is given as

\int_{\Omega_e} N_i \, d\Omega = \frac{V}{4} \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}

3.7 Summary

There are two fundamental classes of grids, namely, structured and unstructured. The former is a natural choice for finite difference methods while the latter is most commonly associated with finite element methods. Nevertheless, both types and their variations can be used by different approximation techniques. Unstructured meshes have geometric flexibility and an inherent approach for local refinement that is not present on structured grids. However, approximation techniques using an underlying unstructured mesh are more difficult to code and optimize and are also computationally more expensive.

The process of discretizing a PDE using unstructured meshes and the weighted residual method has been outlined in this chapter. The formulation for the Galerkin and Petrov-Galerkin finite element methods for both the Poisson and convection-diffusion equations has been listed. Additionally, the linear shape functions for the triangular element (two dimensions) and the tetrahedral element (three dimensions) were derived. For linear shape functions, the integrals needed may all be computed through analytical functions, i.e. there is no need for numerical integration, making the coding relatively simple.

Chapter 4

Systems of Linear Equations

There are a number of iterative methods for solving symmetric and nonsymmetric systems of linear equations

Ax = b,    (4.1)

where A is square and nonsingular, including: Jacobi, Gauss-Seidel, Successive Overrelaxation - SOR, Symmetric Successive Overrelaxation - SSOR, Conjugate Gradient - CG, CG on Normal Equations - CGNE and CGNR, Conjugate Gradient Squared - CGS, Bi-Conjugate Gradient - BiCG, Stabilized Bi-Conjugate Gradient - Bi-CGSTAB, Symmetric LQ - SYMMLQ, ORTHODIR, Generalized Conjugate Residual - GCR, Minimal Residual - MINRES, Generalized Minimal Residual - GMRES, GMRES with Eigenvalues Estimation - GMRESEV, Quasi-Minimal Residual - QMR, Transpose-Free QMR - TFQMR, and Chebyshev Acceleration.

The first four methods are stationary methods and are often called basic iterative methods. See Section 4.2 for the definition of stationary methods. The other methods are nonstationary methods based on the idea of sequences of orthogonal vectors, except for the Chebyshev method, which is based on orthogonal polynomials. Most of the nonstationary methods are projection methods based on Krylov subspaces. How to choose a method from the many available is still an open question since a method may solve a particular system in very few iterations and diverge on another. For a comprehensive overview of the methods cited see for instance [12], [20], [35], [54] and [101].

In this chapter, a brief review of the basic iterative methods Jacobi, Gauss-Seidel and SOR is given. These methods are presented because they are the basis from which many iterative methods were developed. They are also quite easy to understand and implement. However, they are not very efficient by themselves, but they can be very efficient when combined with other more sophisticated methods.

An overview of the steepest descent (SD), conjugate gradient (CG) and domain decomposition (DD) methods is also introduced. The steepest descent method can be seen as a simplified version of CG and, hence, CG can be derived from SD. Before introducing the methods, the definition and relation of residual and error are presented. Following that, Krylov subspace methods, the quadratic form and the derivative of a quadratic form are defined. Most details about the derivation and convergence analysis of the methods are omitted since there is an extensive literature on the subject.

4.1 Definitions

In this section, the definition of residual, error, Krylov subspace, quadratic function and gradient of a quadratic function is given. These concepts are relevant for the understanding of the methods that follow.

4.1.1 Residual and Error

In solving (4.1) with any iterative method it is assumed that an initial approximation x^{(0)} is given (e.g. x^{(0)} = 0). A sequence of approximations x^{(1)}, x^{(2)}, ... is then computed. The solution vector is explicitly given by

x = A^{-1} b    (4.2)

if A is invertible. The error vector e^{(k)} associated with the k-th iteration is defined by

e^{(k)} = x^{(k)} - x,    (4.3)

which can only be assessed if x is known, i.e. the solution has to be known. In order to evaluate the accuracy of the approximation x^{(k)} the residual

r^{(k)} = b - A x^{(k)}    (4.4)

can be used. Note that r^{(k)} can be computed at any iteration. Obviously it can be deduced that

r^{(k)} = -A e^{(k)}    (4.5)

The error and the residual are two different measures of the accuracy of an approximation x^{(k)}. The error e^{(k)} = x^{(k)} - x is a vector that indicates how far the approximation x^{(k)} is from the solution x. The residual r^{(k)} = b - A x^{(k)} indicates how far A x^{(k)} is from the correct value of b.

4.1.2 Krylov Subspace Methods

At the present time, Krylov subspace methods are considered to be among the most important techniques for solving large linear systems of equations. These techniques are based on projection processes onto Krylov subspaces. For solving (4.1), general projection methods seek an approximate solution x^{(k)} from an affine subspace x^{(0)} + \mathcal{K}_k of dimension k by imposing the Petrov-Galerkin condition

b - A x^{(k)} \perp \mathcal{L}_k,

where \mathcal{L}_k is another subspace of dimension k and x^{(0)} is an arbitrary initial approximation. There are many projection methods that differ by the subspace \mathcal{L}_k chosen and by the preconditioner applied [101, pp. 114-279]. A Krylov subspace method is a method for which the subspace \mathcal{K}_k is the Krylov subspace

\mathcal{K}_k(A, r^{(0)}) = \mathrm{span}\left\{ r^{(0)}, A r^{(0)}, A^2 r^{(0)}, \ldots, A^{k-1} r^{(0)} \right\},

where r^{(0)} = b - A x^{(0)}. The approximations obtained are of the form

A^{-1} b \approx x^{(k)} = x^{(0)} + q_{k-1}(A)\, r^{(0)},

where q_{k-1} is a polynomial of degree k - 1. In the simple case where x^{(0)} = 0,

A^{-1} b \approx x^{(k)} = q_{k-1}(A)\, b,

i.e. A^{-1} b is approximated by q_{k-1}(A) b.

Although all the Krylov techniques provide the same type of polynomial approximation, the choice of \mathcal{L}_k, i.e. the constraints used to build these approximations, will have an important effect on the iterative technique. Two broad and best-known choices for \mathcal{L}_k are \mathcal{L}_k = \mathcal{K}_k and the minimum residual variation \mathcal{L}_k = A \mathcal{K}_k.

4.1.3 Quadratic Form

A quadratic form is a scalar, quadratic function of a vector with the form

f(x) = \frac{1}{2} x^T A x - b^T x + c    (4.6)

where A is a matrix, x and b are vectors, and c is a scalar constant [109, pp 2]. If A is symmetric and positive-definite (SPD¹), f(x) is minimized by the solution to Ax = b.

4.1.4 Gradient of a Quadratic Form

The gradient of a quadratic form is defined as:

\nabla f(x) = \begin{bmatrix} \frac{\partial}{\partial x_1} f(x) \\ \frac{\partial}{\partial x_2} f(x) \\ \vdots \\ \frac{\partial}{\partial x_n} f(x) \end{bmatrix}    (4.7)

The gradient is a vector field that, for a given point x, points in the direction of the greatest increase of f(x). The function f(x) can be minimized by setting \nabla f(x) equal to zero. Applying Equation (4.7) to Equation (4.6) gives

\nabla f(x) = \frac{1}{2} A^T x + \frac{1}{2} A x - b.    (4.8)

If A is symmetric, this equation reduces to

\nabla f(x) = A x - b.    (4.9)

Setting the gradient to zero, Equation (4.1) is obtained, i.e. the solution to Ax = b is a critical point of f(x). If A is positive-definite as well as symmetric, then this solution is a minimum of f(x), hence Ax = b can be solved by finding an x that minimizes f(x). If A is not symmetric, Equation (4.8) gives the solution to the system

\frac{1}{2}\left( A^T + A \right) x = b    (4.10)

Note that \frac{1}{2}(A^T + A) is symmetric.

¹A matrix A is said to be positive-definite if x^T A x > 0 for every nonzero vector x. If A is positive-definite the shape of its quadratic form is a paraboloid.

4.2 Basic Iterative Methods

Iterative methods used before the age of high-speed computers were usually rather unsophisticated methods based on relaxation of the coordinates. In Richardson's method, for instance, the next approximation is computed as

x^{(k+1)} = x^{(k)} + \omega_k \left( b - A x^{(k)} \right), \qquad k = 0, 1, 2, \ldots,    (4.11)

where \omega_k > 0 are parameters to be chosen. Beginning with a given approximate solution, these methods modify the components of the approximation, one or a few at a time and in a certain order, until convergence is reached. Each of these modifications, called relaxation steps, is aimed at annihilation of one or a few components of the residual vector.

Presently, these techniques are rarely used separately. However, when combined with more efficient methods such as Krylov subspace methods they can be quite successful. Moreover, there are a few application areas where variations of these methods are still quite popular.

4.2.1 Jacobi, Gauss-Seidel and SOR

Assuming that A is nonsingular and moreover that its diagonal elements a_{ii} are all nonzero, the matrix A can be expressed as the sum

A=D-L-U (4.12)

where D is the diagonal of A, -L its strict lower triangular part and -U its strict upper triangular part, as illustrated in Figure 4.1.

Equation (4.1) can then be written as

D x = (L + U) x + b    (4.13)

Figure 4.1: Initial partitioning of matrix A (diagonal D, strict lower part -L, strict upper part -U)

Since all the diagonal entries of A are nonzero, the following iterative method can be derived from Equation (4.13):

a_{ii}\, x_i^{(k+1)} = -\sum_{\substack{j=1 \\ j \neq i}}^{n} a_{ij} x_j^{(k)} + b_i, \qquad i = 1, 2, \ldots, n    (4.14)

Equation (4.14) can be rewritten as

x_i^{(k+1)} = \frac{1}{a_{ii}} \left( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k)} \right)    (4.15)

In matrix notation Equation (4.15) becomes

x^{(k+1)} = D^{-1}(L + U)\, x^{(k)} + D^{-1} b    (4.16)

The resulting iterative method is called the method of simultaneous displacements or Jacobi method. Note that all components of x can be updated simultaneously and the result does not depend on the sequencing of the equations.

The method of successive displacements or Gauss-Seidel method differs from the Jacobi method by using new values x^{(k+1)} as soon as they are available, as follows

x_i^{(k+1)} = \frac{1}{a_{ii}} \left( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k)} \right)    (4.17)

Here the components are successively updated and the sequencing of the equations does influence the result. In matrix notation Equation (4.17) becomes

x^{(k+1)} = (D - L)^{-1} U x^{(k)} + (D - L)^{-1} b    (4.18)

A backward Gauss-Seidel iteration can also be defined as

x^{(k+1)} = (D - U)^{-1} L x^{(k)} + (D - U)^{-1} b    (4.19)
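The component formulas (4.15) and (4.17) map directly onto code. The sketch below (not from the thesis; a plain Python/NumPy illustration with an arbitrarily chosen test system) performs one Jacobi sweep and one Gauss-Seidel sweep, and iterates the latter on a small diagonally dominant system:

import numpy as np

def jacobi_sweep(A, x, b):
    # Eq. (4.15): all components are updated from the old iterate x^(k).
    d = np.diag(A)
    return (b - (A @ x - d * x)) / d

def gauss_seidel_sweep(A, x, b):
    # Eq. (4.17): new values x_j^(k+1) are used as soon as they are available.
    x = x.copy()
    for i in range(len(b)):
        s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
        x[i] = (b[i] - s) / A[i, i]
    return x

# usage on a small diagonally dominant system
A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x = np.zeros(3)
for _ in range(25):
    x = gauss_seidel_sweep(A, x, b)
print(x, np.linalg.norm(b - A @ x))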

The Jacobi, Gauss-Seidel and Richardson methods are special cases of a class of iterative methods which has a general form

MP") = Nx(k) + b, (4.20)

where M-N=A (4.21)

is a splitting of the coefficientmatrix A with M nonsingular. For the method to be practical Equation (4.20) must be "easy" to solve. This is the case,for example, when M is triangular.

Iteration (4.20) is equivalent to

x^{(k+1)} = B x^{(k)} + c    (4.22)

where

B = M^{-1} N = I - M^{-1} A    (4.23)

c = M^{-1} b    (4.24)

An iteration of the form of (4.22) is called a stationary iterative method and B the iteration matrix. The error vector e^{(k)} associated with the k-th iteration is defined by

e^{(k)} = x^{(k)} - x = B\left( x^{(k-1)} - x \right) = B e^{(k-1)}

Table 4.1 lists the splitting matrices (M and N) and the iteration matrix (B) of the Jacobi and Gauss-Seidel methods.

Method          M         N         B
Jacobi          D         L + U     D^{-1}(L + U)
Gauss-Seidel    D - L     U         (D - L)^{-1} U

Table 4.1: Jacobi and Gauss-Seidel matrix splitting and iteration matrix B

The Gauss-Seidel iteration is very attractive because of its simplicity. Unfortunately, if the spectral radius of M^{-1}N is close to unity it may be prohibitively slow, since the error tends to zero like ρ(M^{-1}N). To rectify this a relaxation parameter ω can be introduced such that

x_i^{(k+1)} = \frac{\omega}{a_{ii}} \left( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k)} \right) + (1 - \omega)\, x_i^{(k)}    (4.25)

This defines the method of successive overrelaxation - SOR. In matrix notation Equation (4.25) becomes

x^{(k+1)} = (D - \omega L)^{-1} \left( \omega U + (1 - \omega) D \right) x^{(k)} + (D - \omega L)^{-1} \omega b    (4.26)

The relaxation method has the general form

M_\omega x^{(k+1)} = N_\omega x^{(k)} + \omega b    (4.27)

where

M_\omega = D - \omega L    (4.28)

N_\omega = (1 - \omega) D + \omega U    (4.29)

For a few structured but important problems the value of the relaxation parameter ω that minimizes ρ(M_ω^{-1} N_ω) is known. However, in more complicated problems it may be necessary to perform a fairly sophisticated eigenvalue analysis in order to determine an optimal ω [49, pp 514].
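A sweep of Equation (4.25) differs from the Gauss-Seidel sweep only by the relaxation parameter ω; a minimal sketch in the same Python/NumPy style as above (not from the thesis):

import numpy as np

def sor_sweep(A, x, b, omega):
    # Eq. (4.25): Gauss-Seidel update blended with the old value through omega.
    x = x.copy()
    for i in range(len(b)):
        s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
        x[i] = omega * (b[i] - s) / A[i, i] + (1.0 - omega) * x[i]
    return x

Setting ω = 1 recovers Gauss-Seidel; for SPD matrices any 0 < ω < 2 converges, and a well-chosen ω > 1 can accelerate convergence considerably.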

4.2.2 Convergence Analysis

To each of the iterative methods described, an error vector e^{(k)}, as defined by Equation (4.3), can be associated. From the very definition of each method the error vectors can be expressed in each case as

e^{(k)} = B e^{(k-1)} = \cdots = B^k e^{(0)},    (4.30)

where B is the corresponding iteration matrix for the specific method.

The necessary and sufficient condition for convergence of these methods is that the spectral radius of B must satisfy

\rho(B) < 1    (4.31)

The asymptotic rate of convergence of the iterative method, or simply rate of convergence, is defined as

R_\infty = -\log \rho(B)    (4.32)

Asymptotically, a reduction by a factor of ε of the error e will be obtained after k* iterations, where

k^* \approx \frac{\log \varepsilon}{\log \rho(B)}    (4.33)

If an iterative method is symmetrisable there always exists an extrapolated version of the basic iterative method such that convergence is guaranteed, since (4.31) is satisfied.

The stationary iterative method, Equation (4.22), is said to be symmetrisable if there is a nonsingular matrix W such that the matrix W(I - B)W^{-1} is symmetric and positive-definite. For a symmetrisable method the matrix I - B has real positive eigenvalues. A sufficient condition for a method to be symmetrisable is that both A and the splitting matrix M are symmetric positive-definite, since then there is a matrix W such that M = W^T W and

W(I - B)W^{-1} = W M^{-1} A W^{-1} = W W^{-1} W^{-T} A W^{-1} = W^{-T} A W^{-1},

which again is positive-definite.

The extrapolated method is obtained by

x^{(k+1)} = \gamma \left( G x^{(k)} + f \right) + (1 - \gamma) x^{(k)} = G_\gamma x^{(k)} + \gamma f    (4.34)

where G_\gamma = \gamma G + (1 - \gamma) I and γ is the extrapolation factor.

The optimum value of γ is the one that minimizes ρ(G_γ) and is given by

\bar{\gamma} = \frac{2}{2 - M(G) - m(G)}    (4.35)

where m(G) and M(G) are the smallest and largest eigenvalues of G, and the spectral radius of G_{\bar{\gamma}} is then

\rho(G_{\bar{\gamma}}) = \rho\left( \bar{\gamma} G + (1 - \bar{\gamma}) I \right) = \frac{2\rho(G) - M(G) - m(G)}{2 - M(G) - m(G)}    (4.36)

Since A is SPD, ρ(G) = M(G) and

\rho(G_{\bar{\gamma}}) = \frac{M(G) - m(G)}{2 - M(G) - m(G)} < 1    (4.37)

The optimum-extrapolated method is then defined by

x^{(k+1)} = G_{\bar{\gamma}} x^{(k)} + \bar{\gamma} f    (4.38)

and is always convergent under the above hypotheses since ρ(G_{\bar{\gamma}}) < 1.
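As a small numerical illustration of the convergence condition (4.31) and the estimate (4.33), the sketch below (not from the thesis; Python/NumPy, test matrix chosen arbitrarily) builds the iteration matrices of Table 4.1 and compares their spectral radii and predicted iteration counts:

import numpy as np

A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
D = np.diag(np.diag(A))
L = -np.tril(A, -1)                           # strict lower part (A = D - L - U)
U = -np.triu(A, 1)                            # strict upper part

B_jacobi = np.linalg.solve(D, L + U)          # B = D^{-1}(L + U)
B_gs = np.linalg.solve(D - L, U)              # B = (D - L)^{-1} U

eps = 1.0e-6                                  # target error reduction factor
for name, B in (("Jacobi", B_jacobi), ("Gauss-Seidel", B_gs)):
    rho = max(abs(np.linalg.eigvals(B)))      # spectral radius, must be < 1 (4.31)
    k_star = np.log(eps) / np.log(rho)        # iterations to reduce the error by eps (4.33)
    print(name, rho, int(np.ceil(k_star)))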

4.3 Steepest Descent Method

In 1938-39 Temple [31, pp 31] showed that solving system (4.1) with A SPD is equivalent to the minimization of the quadratic functional

f(x) = \frac{1}{2} x^T A x - b^T x    (4.39)

(see Section 4.1.4). One of the simplest strategies for minimizing f is the steepest descent (SD) method, where x is iteratively defined by

x^{(k+1)} = x^{(k)} + \alpha^{(k)} r^{(k)}    (4.40)

If the residual is nonzero, then there exists a positive α that implies global convergence. The parameter α is chosen by an exact line search such that x^{(k+1)} exactly minimizes f(x^{(k+1)}).

A line search is a procedure that chooses α to minimize f along a line. In the case of the steepest descent method the search lines are the residuals. From basic calculus, α minimizes f when the directional derivative of f at x^{(k+1)} is equal to zero. By the chain rule,

\frac{d}{d\alpha} f\left( x^{(k+1)} \right) = \nabla f\left( x^{(k+1)} \right)^T \frac{d}{d\alpha} x^{(k+1)} = \nabla f\left( x^{(k+1)} \right)^T r^{(k)}

Setting this expression to zero, α should be chosen so that r^{(k)} and \nabla f(x^{(k+1)}) are orthogonal.

Therefore, given that g(x) = f'(x) is the gradient of f(x),

g\left( x^{(k+1)} \right)^T r^{(k)} = 0    (4.41)

Note that g(x) = -r(x) and hence r(x^{(k+1)})^T r^{(k)} = 0, i.e. the residuals are orthogonal to each other.

Rewriting g(x^{(k+1)}) in terms of x^{(k)} gives

g\left( x^{(k+1)} \right) = A x^{(k+1)} - b = A\left( x^{(k)} + \alpha^{(k)} r^{(k)} \right) - b    (4.42)

Substituting the expression for g(x^{(k+1)}) into Equation (4.41), and after some algebraic manipulation, α is determined by

\alpha^{(k)} = \frac{r^{(k)T} r^{(k)}}{r^{(k)T} A r^{(k)}}    (4.43)

Equations (4.4), (4.40) and (4.43) define the steepest descent method:

r^{(k)} = b - A x^{(k)}    (4.44)

\alpha^{(k)} = \frac{r^{(k)T} r^{(k)}}{r^{(k)T} A r^{(k)}}    (4.45)

x^{(k+1)} = x^{(k)} + \alpha^{(k)} r^{(k)}    (4.46)

The algorithm, as written above, requires two matrix-vector multiplications per iteration. The computational cost of SD is dominated by matrix-vector products. Fortunately, one matrix-vector multiplication can be eliminated by pre-multiplying both sides of Equation (4.46) by -A and adding b, such that

r^{(k+1)} = r^{(k)} - \alpha^{(k)} A r^{(k)}    (4.47)

Although Equation (4.44) is still needed to compute r^{(0)}, Equation (4.47) can be used for every iteration thereafter. The product Ar, which occurs in both Equations (4.45) and (4.47), needs only to be computed once. The disadvantage of using this recurrence is that the sequence defined by Equation (4.47) is generated without any feedback from the value of x^{(k)}, and hence the accumulation of floating point round-off errors may cause x^{(k)} to converge to some point near x. This effect can be avoided by periodically using Equation (4.44) to compute the correct residual. The method is described in Algorithm 4.3.1.

SD is very attractive for its simplicity but convergence may be prohibitively slow. If x^{(0)} is a good estimate of the solution, e.g. g(x^{(0)})^T r^{(0)} ≈ 0, and κ(A) is small,

Algorithm 4.3.1 Steepest Descent
1. r_0 = b - A x_0
2. for i = 0, 1, 2, ...
3.     α_i = (r_i, r_i) / (A r_i, r_i)
4.     x_{i+1} = x_i + α_i r_i
5.     r_{i+1} = r_i - α_i A r_i
6. end for
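Algorithm 4.3.1 can be transcribed directly into Python/NumPy (a sketch for illustration only, not the solver developed in this thesis), using the recurrence (4.47) for the residual so that only one matrix-vector product is needed per iteration:

import numpy as np

def steepest_descent(A, b, x0, tol=1e-8, max_iter=1000):
    x = x0.copy()
    r = b - A @ x                      # Eq. (4.44), only needed for r_0
    for _ in range(max_iter):
        Ar = A @ r                     # single matrix-vector product per iteration
        alpha = (r @ r) / (r @ Ar)     # Eq. (4.45)
        x = x + alpha * r              # Eq. (4.46)
        r = r - alpha * Ar             # Eq. (4.47), recurrence for the residual
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(steepest_descent(A, b, np.zeros(2)))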

convergence is quadratic (see [31, pp 32]). By Equation (4.41), every two successive search directions are orthogonal to each other, making the points x^{(0)}, x^{(1)}, ... form a zig-zag path to the solution, as shown in Figure 4.2. This results in small corrections to the current estimate x^{(k)} when it is far from the solution and convergence becomes linear.

Figure 4.2: Typical zig-zag pattern of steepest descent approximations

4.4 Conjugate Gradient Method

In 1952, Hestenes and Stiefel [58] proposed the conjugate gradient method (CG) for solving systems of linear equations

Ax = b    (4.48)

where A is assumed to be an n x n symmetric positive-definite (SPD) matrix. CG has the same principle as SD but uses a different approach to define the search directions.

To increase the rate of convergence of steepest descent, i.e. to avoid taking steps in the same direction as earlier steps, the conjugate gradient method minimizes f along a set of orthogonal search directions {p^{(1)}, p^{(2)}, ...} that are constructed by conjugation of the residuals. The estimate of the solution is updated by

x^{(k+1)} = x^{(k)} + \alpha^{(k)} p^{(k)}    (4.49)

By the conjugacy of the search direction vectors p^{(k)}, the residual vectors must satisfy

r^{(k+1)} = r^{(k)} - \alpha^{(k)} A p^{(k)}    (4.50)

If the r^{(k)}'s are to be orthogonal, then it is necessary that \left( r^{(k)} - \alpha^{(k)} A p^{(k)}, r^{(k)} \right) = 0, and as a result

\alpha^{(k)} = \frac{\left( r^{(k)}, r^{(k)} \right)}{\left( A p^{(k)}, r^{(k)} \right)}    (4.51)

The search directions are recursively defined as a linear combination of the residual and the previous search direction, except for p^{(0)}, such that

p^{(k)} = \begin{cases} r^{(k)} & \text{for } k = 0 \\ r^{(k)} + \beta^{(k-1)} p^{(k-1)} & \text{for } k > 0 \end{cases}    (4.52)

A first consequence of the above relation is that

\left( A p^{(k)}, r^{(k)} \right) = \left( A p^{(k)}, p^{(k)} - \beta^{(k-1)} p^{(k-1)} \right) = \left( A p^{(k)}, p^{(k)} \right)

because A p^{(k)} is orthogonal to p^{(k-1)}. Then, the parameter α can be rewritten as

\alpha^{(k)} = \frac{\left( r^{(k)}, r^{(k)} \right)}{\left( A p^{(k)}, p^{(k)} \right)}

and \beta^{(k)} is chosen so that the vector p^{(k+1)} is A-orthogonal, or conjugate, to p^{(k)}. A well-known formula for β is the Fletcher-Reeves² formula [45], given by

\beta^{(k)} = -\frac{\left( r^{(k+1)}, A p^{(k)} \right)}{\left( p^{(k)}, A p^{(k)} \right)}    (4.53)

Note that from (4.50)

A p^{(k)} = -\frac{1}{\alpha^{(k)}} \left( r^{(k+1)} - r^{(k)} \right)

and, therefore

\beta^{(k)} = -\frac{1}{\alpha^{(k)}} \frac{\left( r^{(k+1)}, r^{(k+1)} - r^{(k)} \right)}{\left( A p^{(k)}, p^{(k)} \right)} = \frac{\left( r^{(k+1)}, r^{(k+1)} \right)}{\left( r^{(k)}, r^{(k)} \right)}

CG may be described as in Algorithm 4.4.1. For more details about the derivation and convergence analysis see for instance [8], [49], [101] and [109]³.

Algorithm 4.4.1 Conjugate Gradient
1. r_0 = b - A x_0
2. p_0 = r_0
3. for i = 0, 1, ...
4.     α_i = (r_i, r_i) / (A p_i, p_i)
5.     x_{i+1} = x_i + α_i p_i
6.     r_{i+1} = r_i - α_i A p_i
7.     β_i = (r_{i+1}, r_{i+1}) / (r_i, r_i)
8.     p_{i+1} = r_{i+1} + β_i p_i
9. end for

²For alternate β see, for instance, [83] and [28].
³J. R. Shewchuk derives CG in a fairly simple and easy to understand way.
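Similarly, Algorithm 4.4.1 can be transcribed into Python/NumPy (again only an illustrative sketch, not the parallel implementation described later in this thesis). Note that A p_i is computed once and stored, a point returned to later in this section:

import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    rr = r @ r
    max_iter = max_iter or len(b)
    for _ in range(max_iter):
        Ap = A @ p                       # stored and reused (steps 4 and 6)
        alpha = rr / (p @ Ap)            # step 4
        x = x + alpha * p                # step 5
        r = r - alpha * Ap               # step 6
        rr_new = r @ r
        if np.sqrt(rr_new) < tol * np.linalg.norm(b):
            break
        beta = rr_new / rr               # step 7 (Fletcher-Reeves)
        p = r + beta * p                 # step 8
        rr = rr_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b, np.zeros(2)))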

The conjugate gradient method can be remarkably fast when compared to other methods like steepest descent, does not require the specification of any parameters and, in the absence of round-off errors, will provide the solution in at most n steps. However, it is not always highly parallelisable because of the vector norms that have to be computed twice per iteration. Each vector norm needs a global reduction operation, i.e. the synchronization of all processors, compromising the overall speedup as the number of processors grows.

4.4.1 Convergence

In exact arithmetic, CG converges after at most n iterations. In practice, i.e. in finite arithmetic, round-off errors cause the residual to gradually lose accuracy and the search vectors to lose A-orthogonality (conjugacy). The former problem can be dealt with by computing the residual from its very definition r^{(k)} = b - A x^{(k)} instead of using the recurrence formula. However, the latter problem is not easily solvable. Because of this loss of conjugacy the mathematical community discarded CG during the 1960s. Interest only resurged when evidence for its effectiveness as an iterative procedure was published in the 1970s.

Figures 4.4 and 4.5 show the loss of orthogonality of the residual and search direction vectors for a system where A is the matrix NOS4 described in Figure 4.3 and b is such that the solution is a vector of ones. The same problem is solved with different numbers of digits of precision. The legend MatLab represents the precision of MatLab (double precision) while the numbers 4, 8 and 16 represent the number of digits used to truncate⁴ the floating point numbers. Note that as the precision is reduced the solution becomes completely dominated by round-off errors and the approximation diverges, as shown in Figure 4.6.

⁴For instance, 1.234567 is truncated and becomes 1.234500 with 4 digits of precision.

Matrix NOS4: finite element approximation to a beam structure; Lanczos with partial reorthogonalization, from set LANPRO of the Harwell-Boeing Collection (http://math.nist.gov/MatrixMarket).

Matrix statistics: size 100 x 100, 347 entries, real symmetric positive definite. Nonzeros: 594 total, 100 on the diagonal, 247 below and 247 above the diagonal, A - A' = 0. Conditioning: Frobenius norm 4.2, condition number (est.) 2.7e+03, 2-norm (est.) 0.85, not diagonally dominant.

Figure 4.3: Matrix NOS4 from the Harwell-Boeing Collection (structure plot omitted)

Figure 4.4: Problem NOS4; orthogonality of residual vectors - r_0^T r_k versus iteration k; floating point numbers truncated at 4, 8, 16 and double precision (MatLab default) digits.

Though the loss of accuracy of the residual can be reduced by using the definition of r, it is not practical. CG using the recursive formula to compute r requires only one matrix-vector multiplication and a couple of saxpy⁵ operations, i.e. the main cost of the algorithm is the matrix-vector multiplication. Computing r as b - Ax requires one more matrix-vector multiplication, which means that the algorithm requires almost twice the number of flops per iteration. The storage requirement of CG in terms of vectors of size n (x, r, p and Ap), disregarding the storage of A and b, is 4n. Note that Ap is stored to avoid multiplying A by p twice, since Ap is needed twice in Algorithm 4.4.1. As it is written, Algorithm 4.4.1 requires the storage of only three vectors of size n plus the matrix A and the vector b, but also requires A to be multiplied by p twice.

⁵Saxpy operation: αx + y, where α is a scalar and x and y are vectors.

Figure 4.5: Problem NOS4; A-orthogonality of search direction vectors - (A p_{k+1})^T p_k versus iteration k; floating point numbers truncated at 4, 8, 16 and double precision (MatLab default) digits.

4.5 Domain Decomposition Methods

In this section, a brief introduction to domain decomposition methods and the notation used in the following chapters are presented. A more general overview can be found in [101, Chapter 13] and a comprehensive overview of the methods, their implementation, analysis and relation to multigrid methods can be found in the book by Smith et al. [111].

Figure 4.6: Problem NOS4; conjugate gradient residual norm versus iteration k; floating point numbers truncated at 4, 8, 16 and double precision (MatLab default) digits.

Domain decomposition methods can be defined as divide-and-conquer techniques to solve partial differential equations by iteratively solving subproblems defined on subdomains. These methods combine ideas from partial differential equations, linear algebra, mathematical analysis and techniques from graph theory. According to Saad [101], among the new classes of numerical methods that can take advantage of parallelism, domain decomposition methods are undoubtedly the best known and perhaps the most promising for certain types of problems.

There are a number of variants of domain decomposition methods that differ mainly by five features:

" Type of partitioning: vertex-, edge- or element-based(see Figure 5.11). The type of partition will define the minimum amount of data replicated, i. e. the number of nodes or edgescommon to two or more subdomains without overlapping. Vertex-based partitions without overlapping do not have any common data among subdomains. Edge- and element-basedpartitions have 74 Chapter4. Systemsof Linear Equations

common nodes and common nodes and edges, respectively. The type of partition is especially important in terms of programming the algorithm in parallel since data needed by one process/subdomain may be held by another process/subdomain. See Sections 5.6.2 and 5.6.3 for more details.

" Grid levels. The grid levels refer to the number of grids used. Single-level methods use only one grid. Two-level methods make use of both a fine grid and a coarse grid. Multilevel methods use two or more levels of coarse grids. The coarse grids are used as a means of communicating within the subdomains.

" Overlap. Domain decomposition methods may be divided into two main classesnamed overlapping and nonoverlappingmethods.

• Treatment of interface nodes. There are two main techniques used to resolve the interface nodes. One is to set these nodes as boundary nodes (artificial boundary) and apply boundary conditions to them. Another is to solve the interface nodes first and then solve each subproblem independently.

" Subdomainsolution. The way that each subdomain problem is solved de- pends on how the interface nodes are treated. Direct or iterative methods may be used.

Generally, domain decomposition methods can be classified as Schwarz methods and substructuring, or Schur complement, methods. The former are also known as overlapping methods and the latter as nonoverlapping methods. Both classes of methods attempt to solve the problem on the entire domain Ω from problem solutions on the subdomains Ω_i.

4.5.1 Schwarz Methods

Perhaps the simplest domain decomposition algorithms are the one-level, explicitly overlapping Schwarz methods. The solution of the whole domain problem is obtained by iteratively solving subproblems on the subdomains and updating the interface unknowns, which are also called the artificial or virtual boundary. There are a number of techniques to define the artificial boundary.

The literature shows that for such methods the number of iterations usually increases as the number of subdomains increases. The standard solution is to turn the method into a two-level method, using a coarse grid to provide global communication among the subdomains, given by

M^{-1} = Q_H^T A_H^{-1} Q_H + \sum_{i=1}^{m} R_{h,i}^T A_{h,i}^{-1} R_{h,i}   (4.54)

where Q_H is a restriction operator from the fine grid onto the coarse grid and R_{h,i} is the restriction operator from the fine grid onto the i-th subdomain. The transpose operators Q_H^T and R_{h,i}^T map the coarse grid and the i-th subdomain back onto the fine grid. On unstructured grids, how to build the coarse grid is not obvious and can be a tedious task. Besides, the amount of memory required to store a second, even coarser, grid could be prohibitive.
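For comparison, the corresponding one-level additive Schwarz preconditioner is the standard form obtained by simply omitting the coarse-grid term in (4.54); it is not written out in the text above and is quoted here only for reference:

M_1^{-1} = \sum_{i=1}^{m} R_{h,i}^T A_{h,i}^{-1} R_{h,i}

Applying M_1^{-1} only requires independent local solves on the overlapping subdomains, which is why one-level methods parallelize so naturally but need more iterations as the number of subdomains grows.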

4.5.2 Substructuring or Schur Complement Methods

Substructuring or Schur complement methods are a class of domain decomposition methods with no overlapping. Such methods have an interface solver that is not used in overlapping methods. The domain can be partitioned and the nodes

numbered such that the system of equations is written as

A = \begin{bmatrix} A_1 & & & & E_1 \\ & A_2 & & & E_2 \\ & & \ddots & & \vdots \\ & & & A_s & E_s \\ F_1 & F_2 & \cdots & F_s & C \end{bmatrix}, \quad
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_s \\ y \end{bmatrix}, \quad
b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_s \\ g \end{bmatrix}   (4.55)

where s is the number of subdomains; A_i and b_i represent the coefficients that are interior to each subdomain; E_i and F_i represent the interface coefficients at each subdomain; C represents the interface coefficients common to two or more subdomains; x_i and y are the unknowns that are in the interior and on the interface of each subdomain, respectively.

The sub-system at each subdomain can be written as

\begin{bmatrix} A_k & E_k \\ F_k & C \end{bmatrix} \begin{bmatrix} x_k \\ y \end{bmatrix} = \begin{bmatrix} b_k \\ g \end{bmatrix}   (4.56)

and in each subdomain the solution may be expressed as

x_k = A_k^{-1} (b_k - E_k y)   (4.57)

and

(C - F_k A_k^{-1} E_k) y = g - F_k A_k^{-1} b_k   (4.58)

The matrix

S = C - F_k A_k^{-1} E_k   (4.59)

is called the Schur complement. The main difficulty here is how to approximate S. A number of Schur complement methods have been proposed over the last few years and research is still ongoing.

4.6 Summary

In this chapter, a brief review of the basic iterative methods Jacobi, Gauss-Seidel and SOR was presented. An overview of the steepest descent method, the conjugate gradient method and domain decomposition methods was also given. The definition and relation of residual and error were stated; Krylov subspace methods and quadratic functions were defined.

From the theory presented, it is clear that basic iterative methods lack the efficiency required to solve the kind of problems being solved currently. The conjugate gradient method can be very efficient and domain decomposition methods seem a very good option for parallel processing. It was also shown that the convergence rate, and sometimes the convergence itself, depend strongly on round-off errors. CG, for instance, may lose the orthogonality properties of the residual and search direction vectors due to round-off errors, compromising the efficiency of the method.

Chapter 5 Parallel PDE Solver

In this chapter, general parallel issues such as some of the terminology associated with parallel computing and the classification of parallel computers are addressed. There are many ways to classify parallel computers based on their structure or behaviour. The major classification schemes consider the number of instruction and operand sets that can be processed simultaneously, the internal organization of the processors, the interprocessor connection structure, or the methods used to control the flow of instructions and data through the system. Two of the most common categorizations, namely Flynn's taxonomy and structural classification, are presented. Flynn's taxonomy aims to separate computers into classes according to the notion of stream of information (instruction and data streams) while the structural classification is based on how the processors are connected to the memory.

In Section 5.3 two main metrics used to measure the performance of parallel algorithms, namely speedup and efficiency, are defined. Theoretical limits of parallel performance are discussed based on Amdahl's Law. Besides, the concepts of scalability, locality and superlinearity are introduced. In the following section, parallel computational models are presented and the models used in this work (MPI and SPMD) are further elaborated.


Section 5.5 presents empirical and theoretical data about the scalability of the Sun Fire 15K system installed at Cranfield University. A comparison of the parallel performance of the Sun system and an IBM 9076 system running a parallel RRQR factorization is given, as well as a description of both systems. The actual latency and transfer rate of a point-to-point (round-trip) and a collective (broadcast) communication on the Sun system are computed based on the data obtained experimentally.

In the last section key factors of the development of the parallel PDE solver used in this work are highlighted. This includes the conceptual design of the solver, mesh partitioning, data partitioning and the communication schemes implemented.

5.1 Concepts and Terminology

Some of the terms associated with parallel computing are listed below [53, 55].

Processes and Processors A process is a software executable module and represents an address space and one or more threads. A processor is a piece of hardware containing a central processing unit (CPU) capable of executing a program.

Communication Data exchange among processes. The actual event of data exchange is commonly referred to as communication regardless of the method employed.

Synchronization The coordination of parallel tasks in real time, very often associated with communication. This is often implemented by establishing a synchronization point within an application where a task may not proceed further until another task reaches the same or a logically equivalent point. Synchronization

usually involves waiting by at least one task and can therefore cause the wall-clock execution time of a parallel application to increase.

Granularity In parallel computing, granularity is a qualitative measure of the ratio of computation to communication. Coarse granularity is such that relatively large amounts of computational work are done between communication events. Fine granularity is such that relatively small amounts of computational work are done between communication events.

Speedup and efficiency Metrics used to measure the performance of parallel programs (see Section 5.3).

Parallel overhead The extra amount of time required to coordinate parallel tasks.

Massively parallel Refers to the hardware that comprises a given parallel system with many processors. The meaning of "many" has changed over the years and currently is often interpreted as more than 1000.

5.2 Classification of Parallel Computers

Parallel computers are systems consisting of multiple processing units connected via some interconnection network and the software needed to make the processing units work together. Parallel computers can be classified by various aspects of their architectures. A major factor that can be used to categorize such systems is how the processors are connected to the memory. The most popular taxonomy of computer architectures was defined by Flynn in 1966.

Flynn's classification scheme is based on the notion of stream of information (instruction and data streams). The instruction stream is defined as the sequence

of instructions performed by a processing unit. The data stream is defined as the data traffic exchanged between the memory and the processing unit. According to Flynn's classification, both the instruction and data streams can be single or multiple and the computer architecture can be classified into four distinct categories: SISD - single-instruction single-data, SIMD - single-instruction multiple-data, MISD - multiple-instruction single-data, and MIMD - multiple-instruction multiple-data.

Conventional single-processor von Neumann¹ computers are classified as SISD systems. Parallel computers are either SIMD or MIMD. When there is only one control unit and all processors execute the same instruction in a synchronized fashion, the parallel machine is classified as SIMD. In a MIMD machine each processor has its own control unit and can execute different instructions on different data, i.e. the processors are independent. In the MISD category, the same stream of data flows through a linear array of processors executing different instruction streams. In practice, there is no viable MISD machine and the class has been added for completeness only. However, some authors have considered pipelined machines as examples of MISD computers [40, Section 1.2] and the concept of systolic arrays from the early 1980s may be viewed as a prototype of a MISD machine.

SIMD (single instruction stream, multiple data stream) refers to a parallel execution model in which all processors execute the same operation at the same time, but each processor is allowed to operate upon its own data. This model naturally fits the concept of performing the same operation on every element of an array, and thus is often associated with vector or array manipulation. Because all

¹John von Neumann was a Hungarian mathematician (1903-1957) who conceived the computer architecture that most modern computer systems are based on today. The von Neumann architecture consists of inputs, outputs, a CPU and memory.

operations are inherently synchronized, interactions among SIMD processors tend to be easily and efficiently implemented.

MIMD (multiple instruction stream, multiple data stream) refers to a parallel execution model in which each processor is essentially acting independently. This model most naturally fits the concept of decomposing a program for parallel execution on a functional basis; for example, one processor might update a database file while another processor generates a graphic display of the new entry. This is a more flexible model than SIMD execution, but it is achieved at the risk of debugging difficulties called race conditions, in which a program may intermittently fail due to timing variations reordering the operations of one processor relative to those of another.

In terms of memory-processor organization three main groups of architectures can be distinguished: shared-memory architectures, distributed-memory architectures and distributed-shared-memory architectures. Each group has advantages and disadvantages. The former is characterized by its comparative ease of programming with limited scalability. Distributed-memory architectures have potential for high scalability but are more difficult to program. The third group attempts to combine the advantages of both shared- and distributed-memory architectures.

Shared-Memory Architectures

The main property of shared-memory architectures is that all processors in the system have access to the same memory, i.e. there is only one global address space. Typically, the main memory consists of several memory modules whose number is not necessarily equal to the number of processors. The processors are connected to the memory modules via some interconnection network as sketched in Figure 5.1.

Figure 5.1: Shared Memory Architecture (processors P0, ..., Pn connected to memory modules M0, ..., Mk through an interconnection network).

Shared-memory computers are relatively easy to program since the same data can be accessed by all processors, though a method has to be used to control the data access. It is common to have blocks of processors and a portion of the total memory associated with a block on a single board, although each processor can technically access the whole address space (e.g. the SGI Origin architecture). Many applications use message-passing methods as in a distributed system to control the data flow. A main disadvantage of shared-memory machines is the limited number of processors. Most systems do not have more than 64 processors due to both the centralized memory and the interconnection network.

Distributed-Memory Architectures

In distributed-memory computers each processor has its own, private memory. There is no common address space, which means that the processors can only access their own memories. Communication and synchronization among processors is done by exchanging messages over the interconnection network (Figure 5.2). In contrast to a shared-memory architecture, a distributed-memory machine scales very well since all processors have their own local memory, which means that

there are no memory access conflicts.

Figure 5.2: Distributed Memory Architecture (each processor P0, ..., Pn has its own memory module and the processors communicate through an interconnection network).

Typical representatives of a pure distributed-memory architecture are clusters of computers. In a cluster each node is a complete computer. These computers are connected through a low-cost commodity network (e.g. Ethernet or Myrinet). A major advantage of clusters is the cost to performance ratio. However, to date, clusters are outperformed by most parallel systems.

Distributed-Shared-Memory Architectures

Distributed-shared-memory machines attempt to combine the advantages of the two architectures described above (ease of programming of shared-memory systems and high scalability of distributed-memory systems). In such systems each processor has its own local memory but, contrary to the distributed-memory architecture, all memory modules form one common address space, i.e. each memory cell has a system-wide unique address. In order to avoid the disadvantage of shared-memory computers, namely the low scalability, each processor uses a cache memory which keeps the number of memory access conflicts and the network contention low (Figure 5.3). The use of cache memories requires sophisticated cache

coherence and consistency protocols in order to keep the data in the memory and the copies in the caches updated.

Figure 5.3: Distributed Shared Memory Architecture (each processor has a cache and a local memory module; all modules form one global address space).

5.3 Parallel Performance Metrics

There are two main metrics used to measure the performance of parallel algorithms, namely speedup and efficiency [80, pp. 36]. Given that T_p is the time required to perform some calculation using p > 1 processors, the speedup is defined as

S_p = \frac{T_1}{T_p}

and the efficiency as

E_p = \frac{S_p}{p}

where T_1 is the time of the serial algorithm.

Theoretically the upper bounds of S_p and E_p are p and 1, respectively. Also theoretically, these upper bounds can only be achieved if the algorithm does not

have any sequential component and the computation is equally distributed among all processors. According to Amdahl's Law [39, 17], if a fraction f of a calculation is inherently sequential then the speedup is bounded from above by

S_p \leq \frac{1}{f + (1-f)/p},

where 0 \leq f \leq 1.
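A small Fortran 90 program evaluating this bound makes the limit concrete; the serial fraction of 5% is an assumption chosen only for the illustration.

```fortran
program amdahl
  implicit none
  real(kind(1.d0)) :: f
  integer :: i, p
  f = 0.05d0                       ! assumed serial fraction (illustrative)
  do i = 1, 10
     p = 2**i                      ! 2, 4, ..., 1024 processors
     ! predicted speedup S_p = 1 / (f + (1-f)/p)
     print '(a,i5,a,f8.2)', 'p =', p, '   predicted S_p =', 1.d0/(f + (1.d0 - f)/p)
  end do
end program amdahl
```

With f = 0.05 the predicted speedup saturates near 1/f = 20 no matter how many processors are used, which is the saturation behaviour illustrated in the figures below.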

However, f is not constant; it depends on the number of processors among other parameters. In order to achieve maximum speedup, f must not increase as the number of processors increases. This behaviour is called scalability and is a major condition for parallel computing [97]. Another important condition is locality. The data needs to be stored in the main memory as close as possible to the corresponding CPU in order to minimize the work of cache-coherence and loading/storing data.

Despite the fact that theoretically S_p can only be less than p (sublinear speedup), the actual speedup can be either less than p (sublinear speedup), equal to p (linear speedup) or greater than p (superlinear speedup). Linear or superlinear speedups happen, for instance, when the data fits into cache memory after being distributed. Given the fact that potential speedup is often affected by problem size and that Amdahl's Law does not capture this effect, other laws have been proposed. These laws are called Gustafson's Law [110] and Ni's Law [37, pp 247]. Gustafson's Law addresses how the increasing size of a program affects its scalability. Ni's Law summarizes the interaction between increasing problem size and the ability to execute the program in parallel.

Figure 5.4: Amdahl's Law predicted speedup against serial percentage (2, 4, 8 and 16 processors).

Figure 5.5: Amdahl's Law predicted speedup against serial percentage (256, 512 and 1024 processors).

Figure 5.6: Amdahl's Law predicted speedup against number of processors (linear, 1, 5, 10 and 20% serial).

Figure 5.7: Amdahl's Law predicted speedup against number of processors (1, 5, 10 and 20% serial).

5.4 Parallel Computational Models

Gropp et al. [53, pp 3] define a computational model as a conceptual view of the types of operations available to the program. It does not include syntax of a particular programming language or library and it is almost independent of the underlying hardware that supports it. In other words, any of the models can be implemented on any modern parallel computer given the right support from the operating system. However, the effectiveness of such an implementation depends on the gap between the model and the machine. A brief description (summarized from [53]) of some of the currently used models follows. A more comprehensive overview of the models can be found in [53] and [55] for instance.

Message passing Message passing is a model for interactions between processors within a parallel system. In general, a message is constructed by software on one processor and is sent through an interconnection network to another processor, which then must accept and act upon the message contents. MPI (Message Passing Interface) is a message-passing model and has been used in this work - see Section 5.4.2.

Data parallelism The parallelism comes entirely from the data and the program itself looks very much like a sequential program. A set of tasks works collectively on the same data structure but each task works on a different partition of that data structure. The partition of the data that underlies this model may be done by a compiler, as in High Performance Fortran (HPF) [69] for instance, or by the use of an interface such as OpenMP [87].

Shared memory In the shared-memory model each processor has access to all of a single, shared address space at the usual level of load and store operations. Access to locations manipulated by multiple processes is coordinated by some form of locking, although high-level languages may hide the explicit use of locks.

Threads The most common version of the shared-memory model specifies that all memory be shared. This allows the model to be applied to multithreaded systems in which a single process has associated with it several program counters and execution stacks. The model allows fast switching from one thread to another and requires no explicit communication. The difficulty imposed by the thread model is that any state of the program defined by the program variables is shared by all threads simultaneously, although in most thread systems it is possible to allocate thread-local memory. The most widely used thread model is specified by the POSIX Standard [93]. A higher-level approach in programming with threads is also offered by OpenMP [87].

Combined models Combinations of the above models are also possible. These combined models are also called hybrid models. Hybrid models lead to software complexity in which a shared-memory approach (OpenMP, for instance) is combined with a message-passing approach (MPI, for instance).

Currently, some of the most commonly used parallel programming interfaces are OpenMP and MPI (Message Passing Interface). Among Fortran users HPF (High Performance Fortran) [37, pp 289] has also often been used. HPF is an extension to the Fortran 90 language to support data parallel programming and is based on data parallelism. OpenMP is targeted only at shared-memory systems. It is relatively easy and simple to use but its performance compared to MPI is usually inferior (see Reuter [97] for a comparison).

The parallel solver presented in this thesis was implemented under the message-passing model, using the MPI library, and the SPMD programming model (see Sections 5.4.1 and 5.4.2). Despite the fact that most of the development was done on a shared-memory system (see Section 5.5.1), the program was written to be portable to distributed-memory architectures. The message-passing paradigm was chosen for its universality (it is available on most parallel machines) and

performance.

5.4.1 SPMD programming model

One of the most popular styles of programming parallel computers is the SPMD (Single Program Multiple Data) programming model. The same program, though not necessarily the same instruction stream, is executed on all processors, each processor operating on a part of the data. The SPMD model can also be seen as a restricted version of MIMD in which all processors run the same program. Hence, as opposed to SIMD, each processor executing SPMD code may take a different control flow path through the program. For the definitions of MIMD and SIMD see Section 5.2.

5.4.2 Message Passing Interface - MPI

MPI is a message-passing library that supports both distributed- and shared-memory systems. Historically, a variety of message-passing libraries have been available since the 1980s. These implementations differed substantially from each other, making it difficult for programmers to develop portable applications.

In 1992, the MPI Forum was formed with the primary goal of establishing a standard interface for message-passing implementations. Part 1 of the Message Passing Interface (MPI) was released in 1994 and part 2 (MPI-2) was released in 1996. Both MPI specifications are available on the web [81].

MPI is currently considered the de facto standard for message passing, replacing virtually all other message-passing implementations. Most, if not all, parallel computing platforms offer at least one implementation of MPI, though very few have a

full implementation of MPI-2. For shared-memory architectures, MPI implementations usually do not use a network for inter-process communications. Instead, they use shared memory (memory copies) for performance reasons.

From a programming perspective, message-passing implementations commonly comprise a library of subroutines that are embedded in the source code. The programmer is responsible for determining all parallelism. A parallel program written with MPI has a single source code that, after being compiled and linked to the MPI library, can be launched with the command mpirun². The command provides many optional flags and the mandatory flag -np that defines the number of processes. After being launched, the MPI application is replicated among the processes and each one of these processes should operate on a different set of data, i.e. the same program is executed in several processes with different sets of data. Hence, a program written with MPI is classified as SPMD (see Section 5.2) [32].
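A minimal SPMD example in Fortran 90 with MPI illustrates the point (this is not part of PUPS; the `use mpi` module assumes an implementation providing Fortran 90 bindings, otherwise `include 'mpif.h'` would be used):

```fortran
program spmd_hello
  use mpi                                   ! Fortran 90 MPI bindings
  implicit none
  integer :: ierr, rank, nprocs
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
  ! the same code runs on every process; the rank selects the data it works on
  print '(a,i3,a,i3)', 'process ', rank, ' of ', nprocs
  call MPI_FINALIZE(ierr)
end program spmd_hello
```

Compiled and linked against the MPI library, it would be launched with, for example, `mpirun -np 4 ./spmd_hello`, producing one line of output per process.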

5.5 Experiments in Processor Scalability

Experiments in processor scalability on a Sun Fire 15K parallel computer were performed in order to measure the actual latency and transfer rate. The parallel performance of a rank-revealing QR factorization [33] was also measured and compared to the results obtained on an IBM 9076 SP/2 parallel computer. A brief description of the parallel computers used follows and the results obtained are presented in the next sections.

²The command to launch an MPI application may vary depending on the system. The most common command used is mpirun.

5.5.1 Sun Fire 15K Parallel Computer

The Sun Fire 15K parallel computer is equipped with UltraSPARC™ III+ CPUs and the Sun Fireplane Interconnect architecture running the binary-compatible Solaris 8 UNIX® operating environment. The Sun Fire 15K is a cache coherent Non-Uniform Memory Access (ccNUMA) architecture. Four processors and 16Gb of memory are placed on a so-called Uniboard CPU/Memory board and a maximum of 18 Uniboard CPU/Memory boards are connected through the Sun Fireplane Interconnect, which comprises an address and a data bus. Each processor has equal access to the memory located on the same board but slower access to the memory on remote boards. The interconnect peak latency and bandwidth reported in the documentation [114] are briefly quantified in Tables 5.1 and 5.2, respectively.

Memory Access                        Bandwidth (Gb/sec)
Same CPU as requester                9.6
Same board as requester              6.7
Separate board from requester        2.4

Table 5.1: Sun Fire 15K peak interconnect bandwidth [114, pp 4-7]

The system installed at Cranfield University, part of the Cambridge-Cranfield High Performance Computing Facility (CCHPCF) and called robinson, has 72 1.2 GHz CPUs with 288Gb of shared memory, which gives a 144 Gflops peak performance. The Forte Developer 7 Fortran 95 compiler and the LAM implementation of MPI are available.

Location in Memory                                           Latency (clock count)
Same board (requester local memory)                          180 ns, 27 clocks
Same board (other CPU on the same dual CPU data switch)      193 ns, 29 clocks
Same board (other side of data switch)                       207 ns, 31 clocks
Other board (coherency cache directory hit)                  333 ns, 50 clocks
Other board (coherency cache directory miss)                 440 ns, 66 clocks

Table 5.2: Sun Fire 15K pin-to-pin latency for data in memory [114, pp 4-8]. Pin-to-pin latency is calculated by counting clocks in the interconnect logic design between the address request from a CPU and the completion of the data transfer back into the CPU. It is independent of what the CPU does with the data.

5.5.2 IBM 9076 SP/2 Cluster

The IBM 9076 SP/2 RISC-based distributed-memory multi-processor cluster, housed at the Brazilian Laboratório Nacional de Computação Científica - LNCC (National Scientific Computing Laboratory), has 8 WIDE RS6000 (model 590) processors and 32 THIN RS6000 (model 390) processors connected through a SWITCH Fiber Channel with multiple-connection capability at 266 Mbits/s each. The theoretical peak performance of the IBM 9076 SP/2 is 640 Mflops per processor³. The machine is equipped with the AIX 4.1.5 operating system (IBM Unix variant) and all the programs used in this work were compiled with the IBM xlf Fortran compiler and the message-passing library MPICH (an IBM MPI implementation that allows message passing through the high speed SWITCH).

³Reported at http://www.top500.org/orsc/1998/sp2.html

5.5.3 Parallel RRQR Factorization Scalability

A double-precision implementation of the parallel RRQR (PRRQR) factorization by da Cunha et al. [33] was used to investigate the parallel performance of the Sun Fire 15K system. The PRRQR algorithm computes the QR factorization of a matrix A of size m × n, intended for use when m ≫ n. The algorithm uses a reduction strategy to perform the factorization, which in turn allows a good degree of parallelism. It is then integrated into a parallel implementation of the QR factorization with column pivoting algorithm due to Golub and Van Loan [48, pp. 233-236], which allows the determination of the rank of A. The algorithms were coded in FORTRAN 90 using the MPI library.

Table 5.3 presents results of tests carried out on the Sun Fire 15K and the IBM 9076 SP/2 parallel computers described in Sections 5.5.1 and 5.5.2. The results for the IBM computer were obtained from the paper by da Cunha et al. [33]. Table 5.3 shows the running times (in seconds) and the respective speedups (S_p = T_1/T_p) of PRRQR for two large matrices with m ≫ n. Here T_1 is the time taken by a sequential implementation of the RRQR algorithm where the QR factorization is obtained via Householder reflections [48, pp. 193-220]. The Sun system is faster than the IBM 9076 SP/2 parallel computer whilst the speedups are quite similar but slightly higher on the IBM 9076.

5.5.4 Latency and Transfer Rate

The actual latency and transfer rate of the Sun Fire 15K system were obtained by running a Fortran 90 message-passing (MPI) implementation of a round-trip point-to-point communication procedure and a broadcast collective communication procedure, both with double-precision (64-bit words) arithmetic. The original codes

            M x N          p=1     p=2     p=4     p=8     p=16    p=32
Sun         20000 x 10     0.14    0.12    0.06    0.05    0.06    0.12
  speedup                          1.16    2.33    2.80    2.33    1.16
IBM 9076    20000 x 10     0.73    0.43    0.25    0.16    0.10    -
  speedup                          1.70    2.92    4.56    7.30    -
Sun         800000 x 10    23.02   12.37   6.59    4.60    2.97    1.11
  speedup                          1.86    3.49    5.00    7.75    20.73
IBM 9076    800000 x 10    28.39   14.79   7.56    4.55    3.30    -
  speedup                          1.92    3.76    6.24    8.60    -

Table 5.3: Timings (in seconds) and speedups for PRRQR.

in Fortran 77 were developed by Professor R. D. da Cunha (UFRGS/Brazil). The Sun One Studio 8 Fortran 95 compiler (mpf90) and the LAM implementation of MPI were used.

The round-trip point-to-point communication time is measured by first sending a message from processor master to processor slave and then sending a message from processor slave to processor master. Here, master is the processor with the lowest rank and slave the processor with the highest rank, i.e. for 4 processors, master and slave are the processors of rank 0 and 3, respectively. This implies that the measurements are taken using processors on different boards when applicable. The messages are sent and received using, respectively, the MPI commands MPI_SEND and MPI_RECV, i.e. a blocking point-to-point communication. The resulting time is divided by 2, since actually two send/receive pairs are performed. The measured bandwidth and time are listed in Tables A.3 and A.4, Appendix A.
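The sketch below reconstructs the idea of the round-trip measurement in Fortran 90 with MPI; it is a simplified illustration, not Professor da Cunha's original benchmark, and the message length nw is an arbitrary choice. It needs at least two processes.

```fortran
program roundtrip
  use mpi
  implicit none
  integer, parameter :: nw = 1024            ! message length in 64-bit words
  double precision   :: buf(nw), t0, t1
  integer :: ierr, rank, nprocs, status(MPI_STATUS_SIZE)
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
  buf = 1.0d0
  t0 = MPI_WTIME()
  if (rank == 0) then                        ! master: lowest rank
     call MPI_SEND(buf, nw, MPI_DOUBLE_PRECISION, nprocs-1, 0, MPI_COMM_WORLD, ierr)
     call MPI_RECV(buf, nw, MPI_DOUBLE_PRECISION, nprocs-1, 1, MPI_COMM_WORLD, status, ierr)
  else if (rank == nprocs-1) then            ! slave: highest rank
     call MPI_RECV(buf, nw, MPI_DOUBLE_PRECISION, 0, 0, MPI_COMM_WORLD, status, ierr)
     call MPI_SEND(buf, nw, MPI_DOUBLE_PRECISION, 0, 1, MPI_COMM_WORLD, ierr)
  end if
  t1 = MPI_WTIME()
  ! the round trip contains two messages, so the one-way time is half of it
  if (rank == 0) print *, 'one-way time (s):', (t1 - t0) / 2.0d0
  call MPI_FINALIZE(ierr)
end program roundtrip
```

In practice the measurement would be repeated many times and averaged, as done for the results reported below.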

The broadcast collective communication time is measured by broadcasting a message of some length from processor 0 to all other processors. This is done by all

processors calling the MPI command MPI_BCAST with 0 as the root processor. In MPI, communication involving more than two processes is collective and all participating processes call the same routine. MPI_BCAST is an example of a collective communication routine. Notice that the broadcast could be performed by a series of send and receive calls. The use of an intrinsic routine that is implemented as a collective communication allows the message-passing library to take advantage of its knowledge of the structure of the machine to optimize and increase the parallelism in these operations. The measured bandwidth and time are listed in Tables A.1 and A.2, Appendix A.
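A correspondingly simplified broadcast timing sketch follows; again this is an illustration rather than the original benchmark, and the message length and the barrier before timing are choices made for the example.

```fortran
program bcast_time
  use mpi
  implicit none
  integer, parameter :: nw = 1024
  double precision   :: buf(nw), t0, t1
  integer :: ierr, rank
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  if (rank == 0) buf = 1.0d0
  call MPI_BARRIER(MPI_COMM_WORLD, ierr)     ! start all processes together
  t0 = MPI_WTIME()
  call MPI_BCAST(buf, nw, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
  t1 = MPI_WTIME()
  if (rank == 0) print *, 'broadcast time (s):', t1 - t0
  call MPI_FINALIZE(ierr)
end program bcast_time
```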

The data transfer model is assumed to be linear for the same number of processors or pairs of processors. Hence, the communication time is given by α + βw, where w is the number of words being transferred, α is the latency (in seconds) and β is the transfer rate (in seconds per word). The latency and transfer rate for a given number of processors can be obtained by linear regression on a set of communication times for messages of variable lengths. For the results that follow, the coefficients α and β were calculated using the MatLab function POLYFIT (fit polynomial to data). As described in the MatLab manual [75], p = polyfit(x, y, n) finds the coefficients of a polynomial p(x) of degree n that fits the data y best in a least-squares sense.
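For reference, with measured pairs (w_k, T_k) of message lengths and communication times, the degree-one fit used here reduces to the standard least-squares estimates

\beta = \frac{\sum_k (w_k - \bar{w})(T_k - \bar{T})}{\sum_k (w_k - \bar{w})^2},
\qquad
\alpha = \bar{T} - \beta\,\bar{w},

where \bar{w} and \bar{T} denote the sample means of the message lengths and of the measured times; this is what POLYFIT computes for a polynomial of degree 1.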

Tables 5.4 and 5.5 depict the reckoned latency and transfer rate of the round-trip and broadcast routines described above. An average execution time over a series of repetitions is considered to compute the coefficients. In order to evaluate the closeness of the empirical data to the linear model, two latency data sets are presented. The first set (namely latency) uses the linear data transfer model described above (linear regression). The second set (namely latency*) uses a second degree polynomial for fitting the data sample. Except for the case with two processors, both models (linear and quadratic) estimate similar latency coefficients, i.e. the

data sample exhibits a linear behaviour for most of the cases.

The latency of the Sun Fire 15K system is relatively low. For the two cases tested, it is equivalent to approximately 800 words (51200 bits). Therefore, it is possible to transfer a relatively small number of words without a major overhead. Notice that the round-trip latency and transfer rate are roughly the same for different numbers of processors. However, the broadcast latency and transfer rate increase as the number of processors increases. This behaviour was expected since the round-trip is a point-to-point communication and hence is executed between only two processors (independently of the actual number of processors) whilst the broadcast is a collective communication and therefore transfers data among all processors.

Processors    Rate         Latency      Latency*
2             3.02e-001    1.41e-005    9.67e-006
4             3.78e-001    1.94e-005    1.52e-005
8             4.97e-001    2.59e-005    2.23e-005
16            9.50e-001    3.58e-005    3.55e-005
32            1.50e+000    7.44e-005    6.85e-005

Table 5.4: Sun Fire 15K system actual latency (seconds) and transfer rate (seconds/gigabit) of a broadcast collective communication. Latency is computed by a polynomial of degree 1 and latency* by a polynomial of degree 2.

Figures 5.8 and 5.9 show the bandwidth of the broadcast and round-trip routines described above. The broadcast rate decreases as the number of processors increases. However, this decrease is not linear due to the capability of the message-passing library to take advantage of its knowledge of the structure of the machine to optimize and increase the parallelism in these operations. If the topology of the network of processors is a binary tree (or another into which a binary tree

Processors    Rate         Latency      Latency*
2             1.93e-001    9.81e-006    8.29e-006
4             2.17e-001    9.76e-006    9.46e-006
8             4.02e-001    1.70e-005    8.60e-006
16            2.29e-001    1.05e-005    9.76e-006
32            2.32e-001    1.02e-005    8.77e-006

Table 5.5: Sun Fire 15K system actual latency (seconds) and transfer rate (seconds/gigabit) of a round-trip point-to-point communication. Latency is computed by a polynomial of degree 1 and latency* by a polynomial of degree 2.

can be mapped, e.g. a fully connected network), the number of steps to complete a broadcast using p processors is ⌈log₂ p⌉.

5.6 Development of the Parallel Solver

A computer code, called PUPS (Parallel Unstructured PDE Solver), has been developed as part of the present work. The conceptual design of PUPS proposes a parallel solver capable of solving a variety of partial differential equations, in n spatial dimensions, using the finite element method with a choice of several methods to solve systems of linear equations. Despite the conceptual design, the actual implementation of the solver comprises:

" Equations: Poisson and convection-diffusionequations with Dirichlet, Neu- mann or mixed boundary conditions.

" Meshes: triangular (2D) and tetrahedral (3D) unstructured meshes. Chapter 5. Parallel PDE Solver 101

Figure 5.8: Sun Fire 15K system measured bandwidth (in megabits/second) of a broadcast collective communication against message length in double-precision words, for 2 to 32 processors (log₂ scale plot).

[15], " Iterative methods: distributive conjugate gradient (DCG) conjugate gradient (CG) [58], flexible conjugate gradient (FCG) [84], BICGSTAB [102], [119], CGS [112], restarted generalized minimum residual (GMRES) flexible GMRES (FGMRES) [99].

• Preconditioners: Jacobi, Neumann polynomial, weighted least-squares polynomial, unweighted least-squares polynomial.

The ultimate goal is to develop a tool to solve CFD problems on parallel computers attaining accuracy, scalability on inhomogeneous computing environments, geometric flexibility, expansibility to new areas and problems, and modularity allowing easy upgrade. The immediate goal was to develop a parallel unstructured Poisson solver to validate and analyse the method proposed in this dissertation.

Figure 5.9: Sun Fire 15K system measured bandwidth (in megabits/second) of a round-trip point-to-point communication against message length in double-precision words, for 2 to 32 processors (log₂ scale plot).

5.6.1 Solver Overview

PUPS has been developed following an object-oriented methodology and using the Single Program Multiple Data (SPMD) model (see Section 5.4.1). The object-oriented methodology allows the design of reusable and easily extended software. SPMD involves writing a single code that will run on all the processors cooperating on a task. The data are partitioned among the processors, which know what data portions they will work on.

The solver deals with three different data types: scalar values, vectors and matrices. Only the latter two are liable to data partitioning. If a scalar value is derived from some computation over data distributed among the processors, the use of an SPMD model may result in a high communication overhead while the result is sent to all processors. For instance, such communication is necessary

when computing inner products and vector norms. Sometimes it is easier and faster to compute BLAS-1 type operations on all processors simultaneously than to parallelize them.
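As an illustration, a global inner product under the SPMD model can be written with a single collective call. The sketch below assumes the duplicated (halo/ghost) entries have already been excluded from the local parts, which is an assumption of the example rather than a statement about the PUPS routines.

```fortran
! Distributed inner product: each process holds only its local part of the
! vectors; the partial sums are combined with one MPI_ALLREDUCE so that every
! process ends up with the global result.
subroutine pdot(n_local, x, y, result)
  use mpi
  implicit none
  integer, intent(in)           :: n_local
  double precision, intent(in)  :: x(n_local), y(n_local)
  double precision, intent(out) :: result
  double precision :: partial
  integer :: ierr
  partial = dot_product(x, y)
  call MPI_ALLREDUCE(partial, result, 1, MPI_DOUBLE_PRECISION, MPI_SUM, &
                     MPI_COMM_WORLD, ierr)
end subroutine pdot
```

The single collective call is exactly the communication overhead mentioned above; for very short vectors it can dominate the cost of the BLAS-1 operation itself.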

The actual implementation of PUPS can be fragmented into six main parts, namely:

1. Mesh loading: stores information about the mesh in a variable called mesh. This procedure is entirely serial and depends on the application used to generate the mesh. The user can supply their own routine.

2. Mesh partition: splits the mesh into a given number of parts using Metis.

3. Discretization: transforms the PDE into a discrete problem using FEM.

4. Natural boundary conditions: sets Neumann and mixed (Robin) boundary conditions.

5. Essential boundary conditions: sets Dirichlet boundary conditions.

6. Algebraic solver: solves the system of linear equations arising from the previous steps.

The data partition adopted renders the discretization step altogether local, i.e. it does not involve data transfer among subdomains. The boundary conditions are set in two stages. First, the code loops through all of the boundary nodes, storing the values of all nodes with essential (Dirichlet) boundary conditions and effectively setting the natural (Neumann and mixed) boundary conditions. Then, the essential boundary conditions are assigned by deleting the redundant coefficients and updating the right-hand side. This approach is used to retain the symmetry of A whenever applicable. Note that the size of the discrete problem is not reduced.
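A dense-storage sketch of the second stage (imposing a Dirichlet value while keeping the matrix symmetric and the system size unchanged) is given below; PUPS works on a sparse, distributed structure, so this shows only the idea, not the actual routine.

```fortran
! Impose the Dirichlet value g at node d while retaining the symmetry of A.
subroutine set_dirichlet(n, A, b, d, g)
  implicit none
  integer, intent(in)             :: n, d
  double precision, intent(inout) :: A(n,n), b(n)
  double precision, intent(in)    :: g
  ! move the known value to the right-hand side ...
  b(:) = b(:) - A(:,d)*g
  ! ... delete the redundant coefficients (row and column d) ...
  A(d,:) = 0.0d0
  A(:,d) = 0.0d0
  ! ... and keep the trivial equation x(d) = g in the system
  A(d,d) = 1.0d0
  b(d)   = g
end subroutine set_dirichlet
```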

The solver is written in FORTRAN 90 using the message-passing library MPI. A sequential version that does not need MPI to be compiled is also available. The arithmetic precision is defined by a parameter through the Fortran intrinsic kind and the precision range depends on the characteristics of each system. PUPS has been successfully compiled and executed using the compilers Sun ONE Studio 8 Fortran 95, GNU Fortran, The Portland Group Inc. (PGI) Fortran 90/95 and NAGWare Fortran 95.

Geometry Optimization

PUPS has been successfully used by Bo Xu (PhD student at the Applied Mathematics and Computing Group, Cranfield University) in his research on shape optimization. The main goal of Bo Xu's project is to compute a geometry that optimizes the objective function either subject to constraints or without constraints. For example, to define the shape of an airfoil that minimizes the drag and preserves the lift.

A shape optimization problem may be split into four main tasks:

" Representationof the geometry: in this work a cubic B-spline with a nominal uniform knot is usedto parameterizethe geometry. This allows the objective be described function the function to as a of control points rather than a function of shape. The control points become the design variables in this optimization scheme.

is to " Mesh generator: a mesh generator needed produce a mesh for any suitable geometry. Chapter 5. Parallel PDE Solver 105

" Poisson solver: a Poisson equation with any acceptable mesh has to be solved. PUPSis the current solver integrated with the geometry optimization solver.

• Optimization scheme: the optimization scheme is the core of the geometry optimization solver.

The optimization scheme used by Bo Xu and a brief review of other shape optimization schemes are presented in [127]. Figure 5.10 is one of the examples presented in [127], where the constraint was to preserve the area. As reported in [127], this final shape is not the best that could be obtained. It could be improved by reducing the error tolerance on the magnitude of the gradient vector. However, this implies an increase in the computational cost.

Figure 5.10: Shape optimization example. Initial shape nearly rectangular (left) and optimal shape without using mesh refinement (right).

5.6.2 Mesh Partitioning

It is common to use a graph representation when partitioning a domain. In spite of the fact that theoretically there are no restrictions on how to partition a domain, a few issues have to be addressed concerning the information that should be kept together, i.e. in the same processor (subdomain) in parallel environments. Figure 5.11 shows three types of partitioning commonly used. The vertex-based partition divides the original set of vertices into subsets of vertices and has no restriction on the edges, i.e. both edges and elements are allowed to straddle among subdomains. Edge-based is a somewhat more restrictive partitioning that does not allow edges to be split between subdomains. The element-based partitioning does not allow elements, and therefore edges, to be split among subdomains, i.e. all information related to a given element is mapped to the same subdomain. Both edge- and element-based partitions introduce overlap.

Figure 5.11: Types of partitioning: (a) vertex-based, (b) edge-based, (c) element-based.

For the approach used in this work, initially the mesh has to be decomposed (vertex-based) into p non-overlapping subdomains Ω_i such that the number of nodes

assigned to each subdomain is roughly the same and the number of adjacent elements assigned to different processors is minimized. The goals of the first and second conditions are, respectively, to balance the computation among processors and to minimize communication. Afterwards, an overlap has to be algebraically introduced by enlarging the subdomains to contain the vertices within at least one edge from the original subdomain. Even though the initial partition is vertex-based, the final partition is element-based and overlapped.

Figure 5.12 shows a domain decomposed into two subdomains with one layer of overlapping and introduces the definition of core and halo nodes as well as real and artificial boundaries. Note that the initial partition is vertex-based but with the overlapping it becomes element-based. Figure 5.13 shows a domain decomposed into two subdomains with two layers of overlapping and introduces a new type of node called ghost-node. Ghost nodes only arise in partitions with more than one layer of overlapping.

There are two types of boundary nodes. The real boundary refers to the boundary nodes of the original domain Ω and is represented by ∂Ω. The artificial boundary or interface nodes, named Γ, are those nodes on the boundary created by the partition, i.e. the part of the boundary of Ω_i that is interior to Ω.

If a node that belongs to Ω_i is on Γ it is called a halo-node (or interface-node), otherwise it is called a local-node. Local nodes can be either core, ghost, boundary or ghost boundary nodes. Local nodes placed on the original partition (without overlapping) are called core-nodes if they are in the interior of the subdomain and boundary-nodes if they are on the real boundary. Nodes added by the overlapping that are not halo nodes are called ghost-nodes. Remote-nodes are those nodes that belong to another subdomain and are not connected to any of the nodes that belong to a given subdomain.

In terms of matrix notation, a domain is represented by edges, and not by isolated nodes. An edge that belongs to a given subdomain may have two local-nodes, one local- and one halo-node, two halo-nodes, one halo- and one remote-node or two remote-nodes. Edges composed purely by local nodes are called local-edges. Edges that have a local node connected to a halo node are called halo-edges and all the other edges are called remote-edges. Note that an edge that has a halo node connected to a local node is considered a remote edge (not a local edge).

Figure 5.12: Nomenclature of nodes of a domain decomposed into two subdomains with one layer of overlapping (Ω = domain, Ω_i = subdomain, ∂Ω_i = real boundary, Γ_i = artificial boundary; core, halo, boundary and remote nodes are marked in the legend).

However, while it is relatively simple to partition a structured grid, the partition of unstructured meshes is not trivial and is an area of research by itself. A large number of efficient partitioning heuristics have been developed during recent years and there are several mesh partition programs available (see Section 2.1.2). Metis was chosen for the present work mainly due to its availability and approval in the research community. Figure 5.14 shows an example of a mesh partitioned by Metis into 2 to 5 parts using an element-based algorithm.

Figure 5.13: Nomenclature of nodes of a domain decomposed into two subdomains with two layers of overlapping (Ω = domain, Ω_i = subdomain, ∂Ω_i = real boundary, Γ_i = artificial boundary; core, ghost, halo, boundary and remote nodes are marked in the legend).

Figure 5.14: Metis partitioning into 2 to 5 parts of an unstructured squared mesh composed of 208 elements and 125 nodes.

5.6.3 Data Partitioning

Data partitioning among the processors plays a major role in the design and efficiency of parallel algorithms. There are two main reasons for partitioning the data. The first is the need to solve a problem in a smaller amount of time. If a problem of data size n takes t units of time to be completed on a single processor then p processors working on subproblems of size n/p would ideally solve the same problem in t/p units of time. The second reason is the need to solve problems with a large amount of data which do not fit into the memory of a single computer.

Figures 5.15, 5.16 and 5.17 depict three of the most common data partitioning schemes: row-, column- and block-partitioning. In this work, the matrix A of Equation (6.6) is row-wise partitioned, i.e. only edges whose first node is local are kept. Assuming that n is the number of nodes of the mesh, p is the number of subdomains and n_i is the number of local nodes per subdomain, each subdomain is represented by an n_i × n matrix

A_i = [H_{i,1} \cdots H_{i,i-1} \; L_i \; H_{i,i+1} \cdots H_{i,p}]

where L_i is the matrix of local-edges of Ω_i and the H_{i,j}, for j = 1, ..., p and j ≠ i, are matrices of halo-edges from neighbour j. If j is not a neighbour of i then H_{i,j} = 0. Note that each subdomain holds all columns of A and there are no replicated data only if a single layer of overlapping is used. Vectors contain only the elements corresponding to local-nodes, i.e. they are of size n_i.

5.6.4 Communication Among Adjacent Subdomains

There are three basic types of communication that can happen: between two subdomains (point-to-point), among all subdomains (collective) and among adjacent subdomains. The first two cases are handled efficiently by the message-passing

Figure 5.15: Matrix and vector row partitioning.

Figure 5.16: Matrix and vector column partitioning.

Figure 5.17: Matrix and vector block partitioning.

library but the latter case is not directly supported. In two-dimensional domains, the number of adjacent subdomains is relatively low. In three dimensions, this number can be quite high and many of the subdomains may be joined by only a few nodes.

TWO c0ºººnnnºicat, iOýººS(lºetººº'ti were iºººl)leººwººt.º'(I ; ºººtI ("uºººl);ºre(I. 'I'll(' Iirst liýýºiºýti-ansfcrs

data directly to all ýººIj;ºººýººt snl)ºluººº; ºiººs, i. c. ;Ill ; ºdj; ºrvºit -,IIh lJJººº;ºiººti are cc)iºtiiºlº're(l iºeit; lºl)oiºº. ti. The tiýýýýýºººl 5cheille u ti('5; III; ºl(ilhºººtuideººtiIY; ººI- jaceººt, 5tºl)(lO)ºººaitº5 Hint are eoiiiiuu»º toI vO sir more sail Imil iººs ; iººal allen.;ºtf's the (.otiiºucnº adjacent, sººl)(Imººaiuti as the ºwigIºI, uººr (d U,ººIV mºc siºl)(l(, ººmiºº. The ol;ºta It tiºtifer to ýºýlj; eelll ti111)(Iº)uºailis that are uul flag ed Is to iglºI, L,ººrs i's ºººmle tIiruuglº º ººº'iglºi, o»r. These 5ch('uºeti are ca IIº'º1 ºlir('(t nil (I iºº(lirec I cm ºººººººººiu;ºt iºº

schemes, respectively.

For instance, in the domain sketched in Figure 5.18 every subdomain is connected to the other three subdomains. The communication pattern for the direct scheme requires that each one of the four subdomains sends and receives data from three subdomains. In the indirect scheme, data from S1 is sent to S4 through S2, for instance. Both communication schemes are outlined in Figure 5.19. Despite the fact that the number of messages exchanged among subdomains is reduced by the indirect scheme, the execution time is not consistently lower than for the direct scheme. This happens because the former has to operate in a synchronous mode imposed by the algorithm whilst the latter does not. In the example, S2 cannot proceed before sending the data forwarded from S1 or receiving the data to be forwarded to S1. A series of tests showed that neither of the schemes is consistently faster. Figure 5.20 shows the number of adjacent subdomains that are dropped as neighbours for a cubic mesh composed of 152,118 nodes and 934,778 elements.

Figure 5.18: Squared domain divided into four parts (subdomains S1 to S4).

Figure 5.19: Processors connectivity for the two communication schemes.

Figure 5.20: Number of adjacent subdomains dropped as neighbours by the indirect communication scheme. Cubic mesh composed of 152,118 nodes and 934,778 elements and partitioned into 8, 16, 32, 64, 128 and 256 parts (subdomains).

5.7 Summary

General parallel issues such as some of the terminology associated with parallel computing and the classification of parallel computers were addressed. There are many ways to classify parallel computers based on their structure or behaviour. Two of the most common categorizations, namely Flynn's taxonomy and structural classification, were presented. It has been shown that these classifications are quite ambiguous and might lead to different interpretations. Two main metrics used to measure the performance of parallel algorithms, namely speedup and efficiency, were defined. Theoretical limits of parallel performance were discussed and arguments were given that Amdahl's Law does not capture the real behaviour of parallel computers. Although Amdahl's Law does not predict linear or superlinear speedups, they actually happen in many cases for various reasons. Afterwards, parallel computational models were presented and the models used in this work (MPI and SPMD) were further elaborated.

Experiments in processor scalability of the Sun Fire 15K system housed at Cranfield University were reported. Such experiments showed that the system has low latency and therefore a relatively small number of words can be transferred between processors without a major overhead. The experiments also showed that the bandwidth has a roughly linear behaviour for the same number of processors or pairs of processors and hence a linear model can be used to calculate the actual latency and transfer rate of the system.

A general overview of the parallel PDE solver developed as part of this research project was given. Issues like the conceptual design of the solver, mesh partitioning, data partitioning and the communication schemes used were discussed. In summary, the solver was written in Fortran 90 with MPI, following an object-oriented methodology and using the SPMD model. Results that validate and probe the solver performance are reported in Chapter 6.

Chapter 6 Distributive Conjugate Gradient

The distributive conjugate gradient (DCG) method is a novel method to solve symmetric systems of linear equations on parallel computers. DCG is a main result of the research performed for this thesis. The primary idea is to combine the efficiency of conjugate gradient methods and the high parallelism that can be obtained by domain decomposition methods.

The use of a Schwarz-type method which allocates one subdomain to each processor was set as an initial constraint. A single-level overlapping additive Schwarz method on matching grids was considered. The method consists basically of partitioning the domain Ω into p overlapping subdomains Ω_i and approximating the solution by solving a sequence of boundary value problems in each subdomain. The conjugate gradient method was initially considered to approximate the system of linear equations at each subdomain. A preliminary study has shown that the conjugate gradient method must be modified in order to achieve an acceptable rate of convergence.

Before describing DCG, the problem is formulated from an algebraic standpoint and the notation used is stated. A preliminary study that contains an analysis of the CG and GMRES methods applied to a decomposed domain is presented. This study has led to the DCG algorithm that has its final version presented in


Section 6.3.1. In Section 6.3.3 the parallelization of DCG is described, highlighting the procedure used to evaluate the stopping criteria. An extended version of the algorithm is also presented.

6.1 Problem Formulation

From an algebraic standpoint the procedure to solve a system of linear equations divided among subdomains (processors) could be stated as follows. Given a certain matrix A divided row-wise into p blocks of size n_i × n and vectors x and b also divided into p blocks of size n_i, where \sum_{i=1}^{p} n_i = n, Equation (4.1) can be written as

\begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_p \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix}
=
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_p \end{bmatrix}

where A is n × n and x and b are n × 1, and therefore

A_i \begin{bmatrix} x_1 \\ \vdots \\ x_p \end{bmatrix} = b_i

with A_i of size n_i × n and b_i of size n_i × 1, for i = 1, 2, ..., p. Further dividing each A_i such that

A_i = [H_{i,1} \cdots H_{i,i-1} \; L_i \; H_{i,i+1} \cdots H_{i,p}],

Equation (4.1) can be written as

\begin{bmatrix}
L_1 & H_{1,2} & \cdots & H_{1,p} \\
H_{2,1} & L_2 & \cdots & H_{2,p} \\
\vdots & & \ddots & \vdots \\
H_{p,1} & H_{p,2} & \cdots & L_p
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix}
=
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_p \end{bmatrix}

Figure 6.1: Domain decomposed into two subdomains - local and halo nodes

Such a system can be the algebraic representation of a domain partitioned into p subdomains with one layer of overlapping (halo nodes), as illustrated in Figure 5.12 (see Section 5.6.2 for the nomenclature description). According to the domain decomposition, L_i is a matrix of local-edges, i.e. edges formed by two local nodes, and H_{i,j} is a matrix of halo-edges, i.e. edges formed by a local node of Ω_i and a halo node that is local to Ω_j. If Ω_j is not adjacent to Ω_i then H_{i,j} = 0. Note that each subdomain holds all columns of A_i, no data are replicated and the vectors contain only local-nodes, i.e. they are all of size n_i.

Letting

\[
H_i = [\, H_{i,1} \;\cdots\; H_{i,i-1} \;\; H_{i,i+1} \;\cdots\; H_{i,p} \,] \qquad (6.1)
\]

and

\[
\hat{x}_i = [\, x_1 \;\cdots\; x_{i-1} \;\; x_{i+1} \;\cdots\; x_p \,]^T \qquad (6.2)
\]

the problem to be solved at each subdomain may be written as

\[
\begin{bmatrix} L_i & H_i \\ 0 & I \end{bmatrix}
\begin{bmatrix} x_i \\ \hat{x}_i \end{bmatrix}
=
\begin{bmatrix} b_i \\ \hat{x}_i \end{bmatrix}
\]

or

\[
L_i x_i + H_i \hat{x}_i = b_i \qquad (6.3)
\]

Assuming that \hat{x}_i is known, Equation (6.3) can be written as

\[
L_i x_i = b_i - H_i \hat{x}_i = b_i - \sum_{\substack{j=1 \\ j \neq i}}^{p} H_{i,j} x_j .
\]

The solution of Equation (4.1) can be obtained by iteratively solving at each subdomain

\[
L_i x_i^{(k)} = b_i - H_i \hat{x}_i^{(k-1)} \qquad (6.4)
\]

where L_i, H_i and b_i are constant, and \hat{x}_i is updated at each step by data exchange among the blocks or neighbour subdomains. This is the basic approach of Schwarz methods.
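As a minimal serial MatLab sketch of the iteration (6.4), the row-block splitting below mimics two subdomains of a small Poisson problem and solves each local system exactly; the choice of test matrix and the exact local solves are illustrative assumptions, not the parallel solver described in Chapter 5.

% Two-subdomain Schwarz-type iteration, Equation (6.4), on a 10x10 Poisson grid
n  = 100;
A  = gallery('poisson', 10);          % SPD model problem
b  = A * ones(n, 1);                  % exact solution is a vector of ones
I1 = 1:n/2;  I2 = n/2+1:n;            % row blocks (subdomains)
x  = zeros(n, 1);

for k = 1:500
    xold = x;
    % L_i x_i = b_i - H_i xhat_i, with xhat_i taken from the previous step
    x(I1) = A(I1, I1) \ (b(I1) - A(I1, I2) * xold(I2));
    x(I2) = A(I2, I2) \ (b(I2) - A(I2, I1) * xold(I1));
    if norm(b - A*x) < 1e-8 * norm(b), break; end
end
fprintf('%d steps, relative residual %.2e\n', k, norm(b - A*x)/norm(b));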

The same approach can be applied in the caseof further overlapping, i. e. two or more layers of overlapping from a geometric viewpoint (see Figure 5.13). The main difference is that, in this case, there are replicated data and hence extra storage is needed,as depicted in Figure 6.2. The number of rows of each array is given by the number of core nodes plus the number of ghost nodes (for one layer of overlapping the number of ghost nodes is zero). Therefore, each array can be split into two main parts namely a core part and a ghost part, such as V v= v where v represents the core nodes and v the ghost nodes. Chapter 6. Distributive ConjugateGradient 121

Figure 6.2: System sketch of a domain partitioned with more than one layer of overlapping.

6.2 Preliminary Study

In order to develop the algorithm presented in Section 6.3.1, a study on the behaviour of the conjugate gradient (CG) method as well as the generalized minimal residual (GMRES) method in a decomposed domain was performed. This study can be divided into three main cases. The first case only mimics parallelism and aims to examine the number of iterations for a given set of parameters. The second case has actual parallelism and aims to analyse both the number of iterations and the execution time for a given set of parameters. In both cases the algorithm used was called KASO - Krylov Additive Schwarz with Overlapping. The third case addresses directly the development of a procedure to accelerate the rate of convergence of CG as a distributed method.

The Parallel Iterative Methods (PIM) package by da Cunha and Hopkins [35,34] was chosen as the linear solver to perform this preliminary study. PIM was chosen mainly due to its openness of design and also because the author had used it before. There are many other packages available. PIM routines can be used with user-supplied preconditioners (left-, right- and symmetric-preconditioning are supported), matrix-vector multiplication, dot product, among other routines. In other words, a variety of routines can be tested using PIM as the main solver. A brief description of the CG and GMRES methods contained in PIM follows.

Conjugate Gradient

The CG method is used mainly to solve symmetric positive-definite (SPD) systems. It minimizes the error in the A-norm and in exact arithmetic it terminates in at most n iterations. The method does not require the coefficient matrix explicitly; only the result of matrix-vector products Au is needed. It also requires a relatively small number of vectors to be stored per iteration since its iterates can be expressed by short, three-term vector recurrences. For more details, see [101,48,109], among many others, and Section 4.4.

The preconditioned conjugate gradient (PCG) algorithm is obtained by replacing the inner product (r_j, r_j) of the (non-preconditioned) CG algorithm (Algorithm 4.4.1, page 68) by (r_j, z_j), denoting by r_j = b - A x_j the original residual and by z_j = M^{-1} r_j the residual of the preconditioned system. Algorithm 6.2.1, from the PIM User's Guide [34], presents the PCG method, which solves the system Q_1 A Q_2 x = Q_1 b, where Q_1 and Q_2 are preconditioning matrices. See Section 6.3.2 for the derivation of PCG.

Restarted GMRES (RGMRES)

The GMRES method is a very robust method to solve nonsymmetric systems. The method uses the Arnoldi process to compute an orthonormal basis {v_1, v_2, ..., v_k} of the Krylov subspace K_k(A, v_1). The solution of the system is taken as x_0 + V_k y_k, where

\[
\mathcal{K}_m(A, v) = \mathrm{span}\{\, v, Av, A^2 v, \ldots, A^{m-1} v \,\}
\]

Algorithm 6.2.1 Preconditioned Conjugate Gradient
1.  r_0 = Q_1(b - A Q_2 x_0)
2.  p_0 = r_0
3.  ρ_0 = r_0^T r_0
4.  w_0 = Q_1 A Q_2 p_0
5.  ξ_0 = p_0^T w_0
6.  for k = 1, 2, ...
7.      α_{k-1} = ρ_{k-1} / ξ_{k-1}
8.      x_k = x_{k-1} + α_{k-1} p_{k-1}
9.      r_k = r_{k-1} - α_{k-1} w_{k-1}
10.     Check stopping criterion
11.     s_k = Q_1 A Q_2 r_k
12.     ρ_k = r_k^T r_k
13.     δ_k = r_k^T s_k
14.     β_k = ρ_k / ρ_{k-1}
15.     p_k = r_k + β_k p_{k-1}
16.     w_k = s_k + β_k w_{k-1}
17.     ξ_k = δ_k - β_k^2 ξ_{k-1}
18. end for

where V_k is a matrix whose columns are the orthonormal vectors v_i, and y_k is the solution of the least-squares problem H_k y_k = ||r_0||_2 e_1, where the upper Hessenberg matrix H_k is generated using the Arnoldi process and e_1 = (1, 0, ..., 0)^T. This least-squares problem can be solved using a QR factorization of H_k.

A problem that arises in connection with GMRES is that the number of vectors of order n that need to be stored grows linearly with k and the number of multiplications grows quadratically. This may be avoided by using a restarted version of GMRES (RGMRES). Instead of generating an orthonormal basis of dimension k, one chooses a value c, c << n, and generates an approximation to the solution using an orthonormal basis of dimension c, thereby reducing considerably the amount of storage needed. Algorithm 6.2.2 is the RGMRES presented in the PIM User's Guide [34], where Q_1 and Q_2 are preconditioning matrices.

Although the restarted GMRES does not break down [102, pp. 865], it may, depending on the system and the value of c, produce a stationary sequence of residuals, thus not achieving convergence. Increasing the value of c usually cures this problem and may also increase the rate of convergence. Besides reducing the amount of storage and possibly the amount of computation needed, the restarted method can also speed up the solution, as instanced in Figure 6.3.
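The same effect can be tried directly with MATLAB's built-in gmres, whose third argument is the restart (basis) size; this is a stand-in sketch using a Poisson test matrix, not the PIM routine or the PLAT362 matrix of Figure 6.3.

% Effect of the restart value c on restarted GMRES (MATLAB built-in)
A = gallery('poisson', 50);                    % 2500 x 2500 sparse test matrix
b = A * ones(size(A, 1), 1);

for c = [5 10 20 40]
    tic;
    [x, flag, relres, iter] = gmres(A, b, c, 1e-10, 200);
    fprintf('restart=%2d  outer=%3d inner=%2d  relres=%.1e  time=%.2fs\n', ...
            c, iter(1), iter(2), relres, toc);
end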

6.2.1 Study 1: KASO MatLab

We consider the solution of a left-preconditioned system of n linear equations, derived from a finite element discretization (Galerkin scheme) of the Poisson equation

\[
\nabla^2 \phi = -2\pi^2 \sin(\pi x)\sin(\pi y) \qquad (6.5)
\]

in the square [-1, 1] x [-1, 1], subject to Dirichlet boundary conditions.

Algorithm 6.2.2 Restarted GMRES (RGMRES)
1.  r_0 = Q_1(b - A Q_2 x_0)
2.  ρ_0 = ||r_0||_2
3.  for k = 1, 2, ...
4.      g = (ρ_{k-1}, 0, ..., 0)^T
5.      V_{:,1} = r_{k-1} / ρ_{k-1}
6.      for j = 1, 2, ..., restart
7.          R_{i,j} = V_{:,i}^T Q_1 A Q_2 V_{:,j},  i = 1, ..., j
8.          v̂ = Q_1 A Q_2 V_{:,j} - Σ_{i=1}^{j} R_{i,j} V_{:,i}
9.          R_{j+1,j} = ||v̂||_2
10.         V_{:,j+1} = v̂ / R_{j+1,j}
11.         Apply previous Givens rotations to R_{:,j}
12.         Compute Givens rotation to zero R_{j+1,j}
13.         Apply Givens rotation to g
14.         if |g_{j+1}| < RHSSTOP then
15.             Perform steps 20 and 21 with restart = j
16.             Stop
17.         end if
18.     end for
19.     Solve R y = g (solution to least-squares problem)
20.     x_k = x_{k-1} + V y (form approximate solution)
21.     r_k = Q_1(b - A Q_2 x_k)
22.     ρ_k = ||r_k||_2
23. end for

21. rk = Qi(b - AQ2xk) 22. Pk = IIrkII2 23. end for 126 Chapter6. Distributive ConjugateGradient

Figure 6.3: RGMRES execution time with different base sizes [14, Figure 4.16]. Matrix PLAT362: Platzman's oceanographic models, North Atlantic submodel; set PLATZ from the Harwell-Boeing Collection (http://math.nist.gov/MatrixMarket)

The results that follow were obtained running a MatLab implementation of Algorithm 6.2.3. This implementation mimics parallelism but is not actually executed in parallel. The test problem was discretized on a 40 x 40 regular grid of triangles and solved using several sets of parameters for the linear solver.

Algorithm 6.2.3 KASO
1.  for each processor p
2.      Set x_0 = 0, ω, FB, C
3.      for k = 0, 1, ...
4.          if ||b - A x||_2 < 1e-6 then ω = 0.6
5.          if k > 1 then x_k = x_k + ω(b - A x_k)
6.          if C then update x_k of the artificial boundary
7.          for i = 1, 2, ..., nb
8.              Compute x_p^i of interior nodes
9.          end for
10.         if FB then
11.             for i = nb - 1, nb - 2, ..., 1
12.                 Compute x_p^i of interior nodes
13.             end for
14.         end if
15.         Average x_p^k on common nodes
16.         if ||b - A x_k||_2 < tol then Stop
17.     end for
18. end for

Algorithm 6.2.3 depends on a set of parameters:

• np: number of processors or subdomains;
• nb: number of blocks, i.e. the number of parts each subdomain is divided into;
• ω: relaxation parameter;
• FB: true or false. If false, only loops through the blocks from 1 to nb; otherwise, first loops from 1 to nb and then from nb - 1 to 1;
• C: true or false. If true, the values of x on the artificial boundary are exchanged among processors; otherwise the only means of communication among processors is the relaxation step.

The graphs are presented using the following aliases to simplify the reading:

method np x nb [C] [F|FB] [its]

where

• method:
  - GMRES: GMRES method without preconditioner
  - KASO: KASO method without preconditioner
  - KASO-GMRES: GMRES method preconditioned by KASO
• np: number of processors or subdomains
• nb: number of blocks per subdomain
• C: with communication among processors (optional)
• F: FB is false
• FB: FB is true
• its: number of iterations performed

The vector x_p^k, steps 8 and 12 of Algorithm 6.2.3, is calculated at each iteration using one iteration of the GMRES method. Note that this means that one sweep of the outer loop (step 3) of Algorithm 6.2.2 is performed and the number of inner loops is defined by the restarting parameter used.

Figure 6.4 shows the residual for GMRES and KASO with several sets of parameters. It shows that with the right choice of parameters KASO can even outperform GMRES. Figure 6.5 shows the residual for GMRES preconditioned by KASO. It is clear that the number of iterations is greatly reduced by using the preconditioner for any set of parameters. Figures 6.6, 6.7 and 6.8 show the residual for KASO with different sets of parameters and several relaxation parameters ω. Also in this case the efficiency of the algorithm depends strongly on the choice of parameters.

Figure 6.9 shows the number of iterations performed to achieve a given accuracy running KASO on 1 processor with 4 blocks per subdomain. For the 4 cases presented, only one converges for all the 9 relaxation parameters tested. This shows that the relaxation parameter might improve the rate of convergence but has to be carefully chosen. Figures 6.10, 6.11, 6.12 and 6.13 show the number of iterations for the GMRES method and KASO preconditioner for different sets of parameters. It is not clear from these graphs which set of parameters is the most suitable.

This first case study shows that it is possible to solve a system of linear equations distributed over several processors (subdomains). However, the algorithm used does not perform well for all sets of parameters and it was not possible to establish a best set of parameters for any case. In the case study presented in the next section a slightly different algorithm is analysed from an execution time standpoint.

Figure 6.4: GMRES and KASO 1x4

Figure 6.5: KASO-GMRES ω=0

Figure 6.6: KASO 1x4cf1

Figure 6.7: KASO 1x4cf2

Figure 6.8: KASO 1x4cfb2

Figure 6.9: KASO - Number of iterations for ω=0 to 9

Figure 6.10: KASO-GMRES 1 processor

Figure 6.11: KASO-GMRES 2 processors

Figure 6.12: KASO-GMRES 3 processors

Figure 6.13: KASO-GMRES 4 processors

6.2.2 Study 2: KASO Fortran

A Fortran implementation of Algorithm 6.2.3 was developed to analyse the behaviour of both the GMRES and CG methods preconditioned by KASO, and the performance of KASO as the solver. The Fortran implementation is an actual parallel implementation and therefore parallelism can be examined. The results were obtained by solving some of the test problems described in Section 7.1.

Mesh Partitioning

Basically, the same procedure to partition the mesh described in Section 5.6.2 is used in this study, i.e. the domain Ω is vertex-based partitioned into np (number of processors) subdomains Ω_p, such that

\[
\bigcup_{p=1}^{np} \Omega_p = \Omega \qquad \text{and} \qquad \bigcap_{p=1}^{np} \Omega_p = \emptyset .
\]

Each processor stores the nodes of its own Ω_p and the nodes directly connected to them, which are called local- and halo-nodes, respectively. The halo nodes introduce an overlap and hence, on the final partition,

\[
\bigcap_{p=1}^{np} \bar{\Omega}_p \neq \emptyset .
\]

Each subdomain may be further partitioned into blocks. The partitioning is done by repeating the procedure described above on each subdomain. The nodes are renumbered such that the nodes of each block have a continuous numbering and, therefore, the nodes of each subdomain have a continuous numbering.

The partitioning into subdomains and blocks is distinguished because each subdomain can be allocated to a different processor, whilst all blocks of a given subdomain should belong to the same processor, i.e. communication may happen among subdomains but not among blocks.

Data Partitioning

The data partitioning is an important aspect of the parallelization of any method. In this context, the matrix of coefficients A is row-wise partitioned, which means that only the edges whose first node is local are kept. Edges can have two local nodes or one local and one halo node, which are called local- and halo-edges, respectively. Each processor stores

\[
A_p = [\, H_{p,1} \;\cdots\; H_{p,p-1} \;\; L_p \;\; H_{p,p+1} \;\cdots\; H_{p,np} \,],
\]

where L_p is the matrix of local-edges and H_{p,i} are the matrices of halo-edges. If i is not a neighbour of p then H_{p,i} = 0.

Vector and matrix rows are stored locally, starting from position 1 to the number of local nodes. This numbering can be translated to a global numbering if required. Note that each processor has all columns of A. Matrix A and a vector v are given by

\[
A =
\begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_{np} \end{bmatrix}
=
\begin{bmatrix}
L_1 & H_{1,2} & \cdots & H_{1,np} \\
H_{2,1} & L_2 & \cdots & H_{2,np} \\
\vdots & & \ddots & \vdots \\
H_{np,1} & H_{np,2} & \cdots & L_{np}
\end{bmatrix},
\qquad
v = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_{np} \end{bmatrix}
\]

where np is the number of processors. This is the same data partition presented in Section 6.1. Only the matrix of local nodes L_p is partitioned into blocks, such that

\[
L_p =
\begin{bmatrix}
D_{p_1} & C_{p_{1,2}} & \cdots & C_{p_{1,nb}} \\
C_{p_{2,1}} & D_{p_2} & \cdots & C_{p_{2,nb}} \\
\vdots & & \ddots & \vdots \\
C_{p_{nb,1}} & C_{p_{nb,2}} & \cdots & D_{p_{nb}}
\end{bmatrix}
\]

where nb is the number of blocks, D_{p_i} are matrices of local-edges and C_{p_{i,j}} are matrices of halo-edges. This gives a set of local and halo nodes at a block level, which will be called block-local-edges and block-halo-edges, respectively. For instance, with 2 processors and 3 blocks per processor,

\[
A =
\left[
\begin{array}{ccc|ccc}
D_{1_1} & C_{1_{1,2}} & C_{1_{1,3}} & & & \\
C_{1_{2,1}} & D_{1_2} & C_{1_{2,3}} & & H_{1,2} & \\
C_{1_{3,1}} & C_{1_{3,2}} & D_{1_3} & & & \\ \hline
 & & & D_{2_1} & C_{2_{1,2}} & C_{2_{1,3}} \\
 & H_{2,1} & & C_{2_{2,1}} & D_{2_2} & C_{2_{2,3}} \\
 & & & C_{2_{3,1}} & C_{2_{3,2}} & D_{2_3}
\end{array}
\right]
\]

Derivation

The algorithm used in this case initially follows the same steps presented in Section 6.1. Therefore, given the system of n linear equations

\[
A x = b , \qquad (6.6)
\]

the partitioned system for each processor may be written as

\[
[\, H_{p,1} \;\cdots\; H_{p,p-1} \;\; L_p \;\; H_{p,p+1} \;\cdots\; H_{p,np} \,] \, x = b_p , \qquad (6.7)
\]

which is underdetermined.

Considering halo nodes as Dirichlet-type boundary nodes, Equation (6.7) can be written as

\[
L_p x_p = b_p - [\, H_{p,1} \;\cdots\; H_{p,p-1} \;\; 0 \;\; H_{p,p+1} \;\cdots\; H_{p,np} \,]
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_{np} \end{bmatrix}
\]

or

\[
L_p x_p = b_p - \sum_{\substack{i=1 \\ i \neq p}}^{np} H_{p,i} x_i = y_p . \qquad (6.8)
\]

In the same way, assuming that block-halo nodes are Dirichlet-type boundary nodes, the solution for each block is given by

\[
D_{p_k} x_{p_k} = y_{p_k} - \sum_{\substack{i=1 \\ i \neq k}}^{nb} C_{p_{k,i}} x_{p_i} \qquad (6.9)
\]

where

\[
y_p = [\, y_{p_1} \;\; y_{p_2} \;\cdots\; y_{p_{nb}} \,]^T . \qquad (6.10)
\]

The solution is obtained by solving a new problem each time, with boundary conditions updated from the most recent subdomain or block solution:

\[
y_p^{(j)} = b_p - \sum_{\substack{i=1 \\ i \neq p}}^{np} H_{p,i} x_i^{(j-1)} \qquad (6.11)
\]

\[
D_{p_k} x_{p_k}^{(j)} = y_{p_k}^{(j)} - \sum_{\substack{i=1 \\ i \neq k}}^{nb} C_{p_{k,i}} x_{p_i}^{(j-1)} \qquad (6.12)
\]

Results

A series of tests was performed in order to understand how both CG and GMRES work when combined with KASO as a preconditioner. Some tests were also performed to evaluate KASO as a solver. All tests were run using RGMRES with the restarting parameter set to 5 and a tolerance of 1.0E-10. In order to identify the method and preconditioner used, either the label method-preconditioner or method is applied, the latter meaning that no preconditioner is used. The parameters listed in the tables (e.g. maxit and restart) refer to the preconditioner. The letters B, S and M are used as aliases for blocks, steps and maxit, respectively. The maximum number of iterations executed is called maxit and steps is the number of iterations of KASO.

Tables 6.1 and 6.2 show the number of iterations and execution time of a problem defined on the unit square domain. Compared to the number of iterations without preconditioning, and for most of the sets of parameters, the number of iterations is quite small. However, the total time is even higher than that of the non-preconditioned solution. This shows that KASO is still computationally too expensive.

Table 6.3 shows the number of iterations and execution time of a problem solved by GMRES-KASO using GMRES, Jacobi and Jacobi with relaxation to solve the preconditioning problem. For most of the cases, the preconditioner reduces the number of iterations quite drastically and is faster than the non-preconditioned solution. However, it is only faster when the right set of parameters is used, and how to choose the set of parameters is still not understood.

For this specific problem, Jacobi with relaxation was the fastest solver. The same technique was used on other problems without success. In some situations, the results were similar to the ones without relaxation. Note that the introduction of a relaxation step means extra parallel operations, since the residual has to be calculated.

The same set of tests was run on a regular grid. Table 6.4 shows the results. Note that the behaviour is similar to the one on irregular grids and the fastest solution is considerably faster than the non-preconditioned solution. Once more, the main drawback is how to choose the right set of parameters.

The residual of a two-dimensional problem is shown in Figures 6.14 and 6.15. The convergence is monotonic for both the non-preconditioned solution and the preconditioned case. Note that the more steps, the fewer iterations needed to solve the overall problem. However, more steps also means more time per iteration and, therefore, not necessarily a faster solution in terms of total execution time.

Figures 6.16 and 6.17 show the convergence of GMRES and KASO (as a solver, not a preconditioner). GMRES needs fewer iterations than KASO (as expected) and the convergence of both methods is monotonic. Figure 6.16 also shows that the convergence is slightly worse for 10 blocks than for a single block. As can be seen in Figure 6.17, the convergence rate may be improved by increasing the number of iterations per step. The last case, named maxit=1+, starts with 1 iteration per step and increases it by 1 at each step, i.e. maxit is equal to the step number. The result is not satisfactory.

Figure 6.18 shows the residual of the approximation using CG and GMRES on the first iteration, second step, mesh SQUARE1. For CG, the magnitude of the residuals is closely related to the magnitude of the solution (see Figure 7.1 for a contour plot of the exact solution), while for GMRES the greater residuals are located on

Steps   Maxit  Procs  Restart=2  Restart=5  Restart=10
nonpre    -      2        -         147         -
                 4        -         147         -
                 6        -         147         -
1         1      2       73          65         -
                 4       90          45         -
                 6      168         125         -
          2      2       41           -         -
                 4       44           -         -
                 6       45           -         -
          4      2       23           -         -
                 4       29           -         -
                 6       27           -         -
2         1      2       26          21         -
                 4       25          20         -
                 6       25          21         -
          2      2       20          19        18
                 4       21          20        23
                 6       22          22        26
          4      2       19          18        18
                 4       19          20        26
                 6       20          22        29
4         1      2       21          20         -
                 4       21          21         -
                 6       21          19         -
          2      2       20           -         -
                 4       19           -         -
                 6       19           -         -
          4      2       21          13        13
                 4       21          15        16
                 6       21          15         -

Table 6.1: GMRES-KASO iterations, two-dimensional problem

Steps  Maxit  Restart=2  Restart=5  Restart=10
1        1       1.0        1.0         -
         2       1.0         -          -
         4       1.0         -          -
2        1       0.5        1.0         -
         2       0.5        1.0        2.0
         4       1.0        2.0        3.0
4        1       1.5        1.0         -
         2       1.5         -          -
         4       1.0        1.5        4.5

Table 6.2: GMRES-KASO - time per iteration for 2 processors, two-dimensional problem

Blocks  Steps    GMRES(5)       Jacobi        Jacobi ω=1.8
                 Its   Time     Its   Time    Its   Time
  1     nonpre   375   2.0       -     -       -     -
  1       2       13   1.5      20    1.5     13    1.0
          4        9   1.5      20    1.0     13    1.0
         16        5   4.0       9    2.0      -     -
 10       2       15   1.5      29    1.0     27    1.5
          4        9   1.5      21    1.5     14    1.5
         16        5   3.0       9    2.0      -     -

For all cases, GMRES maxit=2 and Jacobi maxit=4.

Table 6.3: GMRES-KASO execution time, two-dimensional problem

Steps   Maxit  Procs    Its    Time
nonpre    -      2     14094   3065
                 4     14094    877
                 8     14094    410
2         1      2       -       -
                 4       -       -
                 8     2456     696
          2      2      826    2700
                 4     1071    1237
                 8     1085     555
          4      2      443    2418
                 4      498     780
                 8      588     416
4         1      2      969    3619
                 4     1122    1390
                 8     1333     723
          2      2      569    3437
                 4      629    1238
                 8      788     709
          4      2      177    1769
                 4      217     653
                 8      256     331*
8         2      2      151    2869
                 4      214    1110
                 8      247     575
          4      2      136    2631
                 4      132     746
                 8      146     397

* Fastest

Table 6.4: GMRES-KASO iterations, two-dimensional problem, uniform grid, n=160,000

the interface of the subdomains. Figures 6.19 and 6.20 also show the residual, but for a three-dimensional problem. The same behaviour as in the two-dimensional problem is observed. This suggests that CG might be suitable to be used at each subdomain almost independently. This possibility is further discussed in the next sections.

In summary, it has been shown that KASO is efficient if the right set of parameters is chosen. However, it was not possible to define how to choose these best parameters. The number of iterations is always reduced quite drastically by using KASO as a preconditioner, but the total execution time is not necessarily reduced, i.e. the time per iteration of KASO is too high. The relaxation factor speeds up the convergence in some cases but is not reliable. Moreover, the distribution of the residuals for the CG method suggests that it might be possible to adapt the method to solve each subdomain almost independently.


Figure 6.14: GMRES and GMRES-KASO residual, two-dimensional problem

Figure 6.15: GMRES-KASO residual, two-dimensional problem

Figure 6.16: Residual of GMRES and KASO - 2 iterations, three-dimensional problem

Figure 6.17: Residual of GMRES and KASO - 4 blocks, three-dimensional problem

Figure 6.18: KASO-CG (top) and KASO-RGMRES (bottom) residual, mesh SQUARE1, iteration 1, step 2

Figure 6.19: KASO-CG residual, mesh CUBE BIG, iteration 1, step 1, 3 processors. Red dots: interface nodes; green dots: nodes directly connected to interface nodes (red dots); pink dots: nodes connected to "green dots"; blue dots: all other nodes.

Figure 6.20: KASO-GMRES residual, mesh CUBE BIG, iteration 1, step 1, 3 processors. Red dots: interface nodes; green dots: nodes directly connected to interface nodes (red dots); pink dots: nodes connected to "green dots"; blue dots: all other nodes.

6.2.3 Study 3: CG restarted and truncated

In this section, a series of tests to analyse the convergence rate of the conjugate gradient method when used to solve a system whose right-hand side is perturbed at each iteration is presented. The algorithms used were called CGR and MCGR. CGR is the CG method implemented in such a way that p and β might be passed as an input to CGR, i.e. p and β are kept (output) to enable the use of a previous search direction at the first iteration of CGR. Note that this implies that x_0 has to be passed appropriately. MCGR is a slightly modified CGR where α and β are computed using their very definition instead of the simplification that can be made due to the orthogonality properties of the residual vectors and the search direction vectors. Once the system is perturbed, these properties are no longer valid.

\[
\text{CGR:} \qquad \alpha_i = \frac{r_i^T r_i}{p_i^T A p_i}, \qquad \beta_i = \frac{r_i^T r_i}{r_{i-1}^T r_{i-1}}
\]

\[
\text{MCGR:} \qquad \alpha_i = \frac{r_i^T r_i}{p_i^T A p_i}, \qquad \beta_i = -\frac{r_i^T A p_{i-1}}{p_{i-1}^T A p_{i-1}}
\]

The system to be solved is given by

\[
A x = b + e_i
\]

where e_i is the perturbation applied to the system at each iteration. Notice that if e = 0 both algorithms have r_i^T r_j = 0 and p_i^T A p_j = 0 for all i ≠ j, i.e. the orthogonality properties of the residuals and the search directions are maintained. If e ≠ 0, MCGR only guarantees the orthogonality for j = i - 1.

Figures 6.21 to 6.32 show the A-norm of the residual (r^T A r) for a series of tests. The matrix of coefficients used is the Poisson matrix from the MatLab gallery². The right-hand-side vector was chosen such that the solution is a vector of ones. The graphs show the solution of the system without perturbation (b), the system perturbed by a random e with p always set as r (b+e, p=r) and with p set as the previous p (b+e, p=r+). The truncated term refers to setting r = b - Ax and p = r at every t iterations.

These results show that either simply using the very definition of α and β to match some orthogonality properties or restarting (truncating) the solution at every t steps does not result in an acceptable rate of convergence for most cases. However, it also shows that CG can be used to solve a system that is perturbed at each iteration, though the rate of convergence is relatively low. This analysis has led to the development of Algorithm 6.3.1, presented in Section 6.3.1, which updates p according to the perturbation applied to the system.

²MatLab help: GALLERY('POISSON', N) is the block tridiagonal (sparse) matrix of order N² resulting from discretizing Poisson's equation with the 5-point operator on an N-by-N mesh.
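A minimal MatLab sketch of this experiment is given below: plain CG with the search direction kept from one iteration to the next (the b+e, p=r+ variant) while a small random perturbation is added to the right-hand side at every iteration; the perturbation size 1e-6 is an illustrative assumption.

% CG on a right-hand side that is perturbed at every iteration
A = gallery('poisson', 50);  n = size(A, 1);
b = A * ones(n, 1);                       % unperturbed solution is a vector of ones
x = zeros(n, 1);  r = b - A*x;  p = r;

for i = 1:120
    e = 1e-6 * randn(n, 1);               % perturbation e_i
    b = b + e;  r = r + e;                % perturbed system and residual
    w     = A * p;
    alpha = (r'*r) / (p'*w);
    x     = x + alpha * p;
    rnew  = r - alpha * w;
    beta  = (rnew'*rnew) / (r'*r);        % usual CG beta (the p = r+ variant)
    p     = rnew + beta * p;
    r     = rnew;
    fprintf('%3d  %.3e\n', i, sqrt(r' * (A * r)));   % A-norm of the residual
end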

Figure 6.21: CGR - Non-truncated

Figure 6.22: CGR - Truncated at each 10 iterations

Figure 6.23: CGR - Truncated at each 20 iterations

Figure 6.24: CGR - Truncated at each 40 iterations

Figure 6.25: CGR - Truncated at each 60 iterations

Figure 6.26: CGR - Truncated at each 10 iterations if |(r_j + e, r_{j-1})| > 1e-4

Figure 6.27: MCGR - Non-truncated

Figure 6.28: MCGR - Truncated at each 10 iterations

Figure 6.29: MCGR - Truncated at each 20 iterations

Figure 6.30: MCGR - Truncated at each 40 iterations

Figure 6.31: MCGR - Truncated at each 60 iterations

Figure 6.32: MCGR - Truncated at each 10 iterations if |(r_j + e, r_{j-1})| > 1e-4

6.3 Distributive Conjugate Gradient

The distributive conjugate gradient (DCG) method is a novel method to solve symmetric systems of linear equations on parallel computers. DCG is a main result of the research performed for this thesis. In the next sections, the derivation of DCG is presented, a preconditioned version of DCG (PDCG) is introduced and the parallelization of DCG is described, highlighting the procedure used to evaluate the stopping criteria.

6.3.1 Derivation and Algorithm

Consider the solution of Equation (6.4) in one subdomain. For simplicity, the subdomain subscripts are dropped and all the arrays that refer to the global system are flagged with the symbol ^; for instance, Ax = b is written as Âx̂ = b̂. Equation (6.4) is written in a general form as

\[
L x^{(k)} = b - H \hat{x}^{(k-1)} \qquad (6.13)
\]

Every time x̂ is updated, a new problem with a perturbed right-hand side has to be solved. Note that solving each subdomain problem accurately does not necessarily mean that the overall solution is accurate.

The algorithm proposed in this thesis updates x̂ at each iteration, which means that the right-hand side changes at each step. Solving (6.13) using one iteration of a conjugate gradient method per step is exactly the same as performing one steepest descent iteration, which degrades the convergence rate since CG has not been fully used. Therefore, a technique that uses information from previous iterations of CG needs to be adopted.

In the so-called distributive conjugate gradient (DCG) procedure, at each step k, the approximate solution x, the residual r and the search direction p are kept from the previous step. After updating x̂, both r and p are updated according to the perturbation on the right-hand side, H(x̂^{(k)} - x̂^{(k+1)}). The residual is updated such that

\[
r^{(k+1)} = b - H \hat{x}^{(k+1)} - L x^{(k+1)} \qquad (6.14)
\]

Given that p^{(k)} is the search direction computed before the perturbation, the updated search direction p̄^{(k)} is chosen so that the conjugacy of p and r is reinstated. The conjugacy between the residual and the search direction vectors is perhaps the fundamental characteristic of CG and is exploited to avoid taking steps in the same direction as earlier steps. Therefore, the updated search direction p̄ is taken as the component of p orthogonal to r, obtained by removing from p its projection onto r. The updated search direction p̄^{(k)} is defined by

\[
\bar{p}^{(k)} = p^{(k)} - \frac{(r^{(k+1)}, p^{(k)})}{(r^{(k+1)}, r^{(k+1)})} \, r^{(k+1)} \qquad (6.15)
\]

which leads to Algorithm 6.3.1.

This approach makes p̄^{(k)} conjugate to p^{(k+1)} but does not guarantee any of the other orthogonality properties that CG has. However, it improves the convergence rate quite significantly compared to the steepest descent method. For a given accuracy, the additive Schwarz procedure with DCG as a subdomain solver achieves an overall convergence rate comparable to that of the CG method, as reported in Chapter 7.

It is worth highlighting that the only step of Algorithm 6.3.1 that requires communication among processors to compute the solution is step eight, where each processor has to exchange (send and receive) the new values of x̂ with the neighbour subdomains. This communication happens only among neighbour subdomains. Additionally, a global reduce operation has to be performed to evaluate the residual over the whole domain. However, the residual does not need to be computed at each iteration. A function that attempts to predict the number of iterations needed to achieve a certain accuracy can be used, reducing the number of global reduce operations to a fraction of the number of iterations. This function is explained in Section 6.3.4.

Algorithm 6.3.1 Distributive Conjugate Gradient
1.  r_0 = b - L x_0
2.  p_0 = r_0
3.  x̂_0 = 0
4.  for i = 0, 1, ...
5.      α_i = (r_i, r_i) / (L p_i, p_i)
6.      x_{i+1} = x_i + α_i p_i
7.      r_{i+1} = r_i - α_i L p_i
8.      Update x̂_{i+1}
9.      r_{i+1} = r_{i+1} - H(x̂_{i+1} - x̂_i)
10.     p̄_i = p_i - (r_{i+1}, p_i)/(r_{i+1}, r_{i+1}) r_{i+1}
11.     β_i = -(r_{i+1}, L p̄_i)/(p̄_i, L p̄_i)
12.     p_{i+1} = r_{i+1} + β_i p̄_i
13. end for
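A minimal serial MatLab sketch of Algorithm 6.3.1 for two row-blocks of a Poisson matrix follows; the halo exchange of step 8 is replaced by directly copying the other block's current values, so this only mimics the method on one process and makes no claim about the Fortran/MPI implementation.

% Two-subdomain DCG sketch (Algorithm 6.3.1) on a 10x10 Poisson grid
n = 100;  A = gallery('poisson', 10);  b = A * ones(n, 1);
I = {1:50, 51:100};
for s = 1:2
    o    = 3 - s;                                  % neighbouring block
    L{s} = A(I{s}, I{s});   H{s} = A(I{s}, I{o});
    x{s} = zeros(50, 1);    xh{s} = zeros(50, 1);  % local solution and halo copy
    r{s} = b(I{s}) - L{s} * x{s};   p{s} = r{s};
end

for k = 1:300
    for s = 1:2                                    % steps 5-7 (local CG update)
        w     = L{s} * p{s};
        alpha = (r{s}' * r{s}) / (p{s}' * w);
        x{s}  = x{s} + alpha * p{s};
        r{s}  = r{s} - alpha * w;
    end
    for s = 1:2                                    % steps 8-12
        o     = 3 - s;
        xhnew = x{o};                              % stands in for the halo exchange
        r{s}  = r{s} - H{s} * (xhnew - xh{s});     % step 9
        pb    = p{s} - ((r{s}'*p{s})/(r{s}'*r{s})) * r{s};   % step 10
        Lpb   = L{s} * pb;
        beta  = -(r{s}' * Lpb) / (pb' * Lpb);      % step 11
        p{s}  = r{s} + beta * pb;                  % step 12
        xh{s} = xhnew;
    end
    res = norm(b - A * [x{1}; x{2}]);
    if res < 1e-8, break; end
end
fprintf('global residual %.2e after %d iterations\n', res, k);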

6.3.2 Preconditioned DCG

The preconditioned DCG (PDCG) algorithm was obtained by initially considering the derivation of the preconditioned conjugate gradient method presented by Saad [101, pp. 246-247]. The system of linear equations to be solved is given by

\[
M^{-1} A x = M^{-1} b \qquad (6.16)
\]

where M is a preconditioner matrix. This system is no longer symmetric in general. In order to preserve symmetry, observe that M^{-1}A is self-adjoint for the M-inner product,

\[
(x, y)_M = (Mx, y) = (x, My),
\]

since

\[
(M^{-1}Ax, y)_M = (Ax, y) = (x, Ay) = (x, M(M^{-1}A)y) = (x, M^{-1}Ay)_M ,
\]

where (·, ·) is the Euclidean inner product and (·, ·)_M is the M-inner product. Therefore, an alternative to preserve symmetry is to replace the Euclidean inner product by the M-inner product in the conjugate gradient algorithm (Algorithm 4.4.1, page 68). If the CG algorithm is rewritten for the M-inner product, denoting by r_i = b - A x_i the original residual and by z_i = M^{-1} r_i the residual for the preconditioned system, the following sequence of operations is obtained, ignoring the initial step:

\[
\alpha_i = (z_i, z_i)_M / (M^{-1} A p_i, p_i)_M
\]
\[
x_{i+1} = x_i + \alpha_i p_i
\]
\[
r_{i+1} = r_i - \alpha_i A p_i
\]
\[
z_{i+1} = M^{-1} r_{i+1}
\]
\[
\beta_i = (z_{i+1}, z_{i+1})_M / (z_i, z_i)_M
\]
\[
p_{i+1} = z_{i+1} + \beta_i p_i
\]

Since (z_i, z_i)_M = (r_i, z_i) and (M^{-1} A p_i, p_i)_M = (A p_i, p_i), the M-inner products do not need to be computed explicitly. With these observations, Algorithm 6.3.2 is obtained.

Algorithm 6.3.2 Preconditioned Conjugate Gradient (PCG)
1.  r_0 = b - A x_0
2.  z_0 = M^{-1} r_0
3.  p_0 = z_0
4.  for i = 0, 1, ...
5.      α_i = (r_i, z_i) / (A p_i, p_i)
6.      x_{i+1} = x_i + α_i p_i
7.      r_{i+1} = r_i - α_i A p_i
8.      z_{i+1} = M^{-1} r_{i+1}
9.      β_i = (r_{i+1}, z_{i+1}) / (r_i, z_i)
10.     p_{i+1} = z_{i+1} + β_i p_i
11. end for

This same approach can be used to derive a preconditioned version of DCG. In addition to the modifications introduced in the conjugate gradient algorithm, the formula used to update the search direction, Equation (6.15), has to be rewritten. The vector p has to be projected onto z instead of r and the inner products must be appropriately replaced. Given that

\[
(z, p)_M = (Mz, p) = (r, p)
\]

and

\[
(z, z)_M = (Mz, z) = (r, z) ,
\]

Equation (6.15) for the preconditioned DCG becomes

\[
\bar{p}^{(k)} = p^{(k)} - \frac{(r^{(k+1)}, p^{(k)})}{(r^{(k+1)}, z^{(k+1)})} \, z^{(k+1)} .
\]

Using this equation and the derivation of PCG, the preconditioned distributive conjugate gradient method can be described as Algorithm 6.3.3. Notice that the preconditioning step is serial and therefore preconditioning methods that are not well-suited for parallelism can be used without loss of parallel efficiency. Numerical results are presented in Section 7.4.4.

Algorithm 6.3.3 Preconditioned Distributive Conjugate Gradient
1.  r_0 = b - L x_0
2.  z_0 = M^{-1} r_0
3.  p_0 = z_0
4.  x̂_0 = 0
5.  for i = 0, 1, ...
6.      α_i = (r_i, z_i) / (L p_i, p_i)
7.      x_{i+1} = x_i + α_i p_i
8.      r_{i+1} = r_i - α_i L p_i
9.      Update x̂_{i+1}
10.     r_{i+1} = r_{i+1} - H(x̂_{i+1} - x̂_i)
11.     z_{i+1} = M^{-1} r_{i+1}
12.     p̄_i = p_i - (r_{i+1}, p_i)/(r_{i+1}, z_{i+1}) z_{i+1}
13.     β_i = -(z_{i+1}, L p̄_i)/(p̄_i, L p̄_i)
14.     p_{i+1} = z_{i+1} + β_i p̄_i
15. end for

6.3.3 Parallelization

The parallelization of Algorithm 6.3.1 becomes quite straightforward due to the characteristics of the algorithm. DCG has been designed to be highly parallelizable and hence restrains the parallel operations to a minimum. This has been achieved by restricting the communication among processors to exchanging data of the interface nodes once per step.

In actuality, only step eight of Algorithm 6.3.1 has to be adequately parallelized, since all the other operations are performed locally. The parallelization of step eight consists of the communication schemes described in Section 5.6.4. The 2-norm of the residual vector also has to be computed in parallel in order to evaluate the accuracy of the approximate solution. This operation is, however, directly supported by MPI. Note that the norm is not calculated at every step, as explained in Section 6.3.4, and that this same approach can be used to parallelize Algorithm 6.3.3.

An extended version of Algorithm 6.3.1, in pseudo-code, follows. This extended version describes in some detail the implementation used to produce the results reported in Chapter 7. The main difference from Algorithm 6.3.1 is that x̂ may not be updated at every iteration. Table 6.5 contains the description of the symbols used in the pseudocode.

DCG Pseudocode

Initialization
1.  r = b - L x_0
2.  p = r
3.  w = L p
4.  u_A = 0
5.  μ = 0
6.  η = r^T r
    Calculate global residual and stopping tolerance
7.  γ_0 = ||r̂||_2
8.  ε = tol ||b̂||_2
    Set initial convergence rate and calculate iteration to check convergence
9.  γ̄ = 0.5
10. ι = min(⌊log(ε/γ_0)/log(γ̄)⌋, t)
    Loop
11. for i = 1, 2, ...
12.     α = η / (p^T w)
13.     x = x + α p
14.     r = r - α w
        Update solution on interface nodes and residual
15.     if mod(i, h) = 0 then
16.         Update x̂
17.         μ = |μ - 1|
18.         if μ = 0 then
19.             u_A = H x̂
20.             r = r - (u_A - u_B)
21.         else
22.             u_B = H x̂
23.             r = r - (u_B - u_A)
24.         end if
25.     end if
        Calculate residual norm
26.     η* = η
27.     η_C = r̄^T r̄
28.     η_G = r̃^T r̃
29.     η = η_C + η_G
        Check stopping criterion
30.     if i = ι then
31.         γ = ||r̂||_2
32.         if (γ < ε) or (i = t) then stop
33.         γ̄* = γ̄
34.         γ̄ = (γ/γ_0)^{1/i}
35.         ψ = γ̄ / γ̄*
36.         if |ψ - 1| < 0.02 then ψ = 1.0
37.         ι = min(max(⌈ψ log(ε/γ_0)/log(γ̄)⌉, i + 1), t)
38.     end if
        Set new search direction vector
39.     if mod(i, h) = 0 then
40.         ζ = (r^T p) / η
41.         p = p - ζ r
42.         v = L r
43.         w = w - ζ v
44.         β = -(r^T w)/(p^T w)
45.         p = r + β p
46.         w = v + β w
47.     else
48.         β = η / η*
49.         p = r + β p
50.         w = L p
51.     end if
52. end for

Symbol   Size         Description
n_L      1            Number of local nodes (n_L = n_C + n_G)
n_H      1            Number of halo nodes
n_C      1            Number of core nodes
n_G      1            Number of ghost nodes
nz_L     1            Number of non-zeros in L
nz_H     1            Number of non-zeros in H
np       1            Number of processors or subdomains
t        1            Maximum number of iterations
h        1            Update x̂ every h iterations
ε        1            Stopping tolerance
γ        1            2-norm of the residual of the whole system
γ̄        1            Estimation of the rate of convergence
ι        1            Estimated iteration number to check stop criterion
r        n_L          Residual vector of local nodes
r̄        n_C          Core nodes of r
r̃        n_G          Ghost nodes of r
p        n_L          Search direction vector
x        n_L          Approximated solution of local nodes
x̂        n_H          Approximated solution of halo nodes
r̂        n            Residual vector of the global system (r̂ = b - A x)
H        n_L x n_H    Matrix of halo edges (Equation 6.1)
L        n_L x n_L    Matrix of local edges

Table 6.5: Distributive Conjugate Gradient pseudocode symbols description

6.3.4 Stopping Criteria

The norm of the residual is usually used as the stopping criterion for iterative methods. This norm has to be calculated in a global manner, implying a global reduce operation each time it is calculated. These global reduce operations might compromise the parallel performance of the algorithm.

In order to reduce the number of global reduce operations in DCG, a procedure is used to anticipate the number of iterations needed to achieve a given accuracy, and the norm is only computed at these iterations. The probable iteration at which to stop the computation is updated each time the anticipated iteration is reached and the stopping criterion is still not satisfied. This prediction is made based on the worst-case scenario, i.e. it assumes that the convergence rate might degrade.

Given that γ_i = ||r_i||_2 and that γ̄_i = γ_i / γ_{i-1} (rate of convergence of iteration i), the average rate of convergence at iteration k is

\[
\bar{\gamma}_k = \frac{1}{k} \sum_{i=1}^{k} \bar{\gamma}_i .
\]

The residual norm at iteration k can be written as

\[
\gamma_k = \gamma_0 \prod_{j=1}^{k} \bar{\gamma}_j
\]

and therefore the average rate of convergence at iteration k can be written as

\[
\bar{\gamma}_k = \left( \frac{\gamma_k}{\gamma_0} \right)^{1/k} .
\]

The probable number of iterations needed to achieve a certain accuracy, namely t, can be anticipated by using the average rate of convergence:

\[
\left( \frac{\gamma_t}{\gamma_0} \right)^{1/t} = \bar{\gamma}_t \approx \bar{\gamma}_k
\quad \Longrightarrow \quad
t = \frac{\ln(\gamma_t/\gamma_0)}{\ln(\bar{\gamma}_k)} .
\]

The stopping criterion is satisfied when γ_t ≤ ε, and hence

\[
t \geq \frac{\ln(\varepsilon/\gamma_0)}{\ln(\bar{\gamma}_k)} .
\]

If the convergence rate is approximately stable throughout the entire solution, i.e. γ̄_1 = γ̄_2 = ... = γ̄_k, the prediction is fairly accurate. However, the rate of convergence can change significantly, especially in the first few iterations. Therefore, a correction factor was introduced to DCG such that the changes in the rate are taken into account when calculating t.
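A small MatLab sketch of the prediction follows; the residual values are illustrative numbers, not measurements, and the correction factor is omitted for brevity.

% Predict the next iteration at which the global residual norm should be checked
gamma0 = 1.0;          % initial global residual norm
gammak = 1.0e-3;       % residual norm measured at iteration k
k      = 40;
eps0   = 1.0e-10;      % required accuracy

rate = (gammak / gamma0)^(1/k);                 % average convergence rate so far
t    = ceil(log(eps0 / gamma0) / log(rate));    % predicted iteration to reach eps0
t    = max(t, k + 1);                           % never re-check before the next iteration
fprintf('average rate %.3f, next global check at iteration %d\n', rate, t);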

Table 6.6 lists the predicted number of iterations and the total number of iterations actually needed for three test problems. The number of global reduce operations is considerably reduced. Notice that if the initial rate (γ̄) is set too high, the actual number of iterations performed can be greater than the number of iterations actually needed for a given accuracy. Therefore, it is advisable to start with a low rate and let the procedure correct it as the calculation advances. It is also important to notice that the advantage of using this procedure is chiefly in terms of parallel performance, since the inner product of the residual vectors is calculated at every iteration independently of computing the global norm.

6.4 Conclusion

In this chapter a study that led to the distributive conjugate gradient method, as well as the method itself, was presented. It has been shown that the parallelization of DCG became straightforward due to the design of the algorithm and that the parallel performance depends mainly on the routine used to exchange data among neighbour subdomains and on the number of global reduce operations executed. The procedure to select the iterations at which the stopping criterion might have been

Initial γ̄                  0.2  0.5  0.8  0.2  0.5  0.8  0.5*  0.5*
Iterations with global      14   33   103  14   33   103  33    47
reduce                      15 53 26 79 165 255 426
                            43 59 126 154 215 784 1295
                            52 63 188 205 237 1051 1722
                            63 65 228 234 243 1149 1857
                            65 66 240 242 245 1175
                            66 244 245 246 1185
                            246 246 247 1190
                            247 247 248 1191
                            248 248 249
                            249 249
                            249
Total                       7 61 11 11 10 9 5 11191
Iterations                  66 66 103 249 249 249 1857

* Different stopping criteria

Table 6.6: Number of iterations predicted to satisfy the stopping criterion.

satisfied reduces considerably the number of global reduce operations needed by the method. The numerical performance of DCG is reported in Chapter 7.

Chapter 7

Numerical Results

In this chapter the most relevant results obtained with a Fortran implementation of DCG are reported.

7.1 Test Problems

The test problems are stated as:

Problem 2D:
\[
\nabla^2 \phi = -2\pi^2 \sin(\pi x)\sin(\pi y) \ \text{in } \Omega, \qquad
\phi = 0 \ \text{on } \partial\Omega, \qquad
\Omega = \{ (x, y) \in \mathbb{R}^2 : -1 < x, y < 1 \}
\]

Problem 3Da:
\[
\nabla^2 \phi = 12c\,(x^2+y^2+z^2)^{2c-1} + 4c(2c-1)\,(x^2+y^2+z^2)^{2c-1} \ \text{in } \Omega, \qquad
\phi = (x^2+y^2+z^2)^{2c} \ \text{on } \partial\Omega, \qquad
\Omega = \{ (x, y, z) \in \mathbb{R}^3 : 0 < \ldots \}
\]

Problem 3Db:
\[
\nabla^2 \phi = -3\pi^2 \cos(\pi x)\cos(\pi y)\cos(\pi z) \ \text{in } \Omega, \qquad
\phi = \cos(\pi x)\cos(\pi y)\cos(\pi z) \ \text{on } \partial\Omega, \qquad
\Omega = \{ (x, y, z) \in \mathbb{R}^3 : 0 < \ldots \}
\]

The tests were run on a parallel computer (host name robinson) with 72 x 1200 MHz CPUs and 288 GB of shared memory (4 GB/CPU) and on a Sun Fire 15K parallel computer (host name franklin) with 100 x 1200 MHz CPUs and 288 GB of shared memory. The meshes used are listed in Table 7.1. See Section 5.5.1 for an overview of the Sun Fire 15K computer.

7.2 Matrices Sparsity

This section presents a simple model that relates the matrix sparsity (number of non-zeros) to the number of equations (n) and the number of subdomains or processors (p). The matrices A, L and H cited are defined in Section 6.1. The equations were obtained using the data fitting function polyfit of MatLab and the data of one two-dimensional square mesh and one three-dimensional cubic mesh.
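For illustration only, a fit of this kind can be reproduced with polyfit as below; the (n, nz) pairs are hypothetical stand-ins for the mesh data used in this section.

% Linear fit of the number of non-zeros against the number of equations
n  = [2.6e3 1.0e4 3.9e4 1.5e5 3.0e5];          % hypothetical mesh sizes
nz = 14.4 * n + 1.0e3 * randn(size(n));        % synthetic data close to a linear law

c = polyfit(n, nz, 1);                         % c(1): non-zeros per equation
fprintf('nz = %.1f n + %.0f\n', c(1), c(2));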

Figures 7.2 and 7.3 show the ratio of nz to n of matrix A. The ratio is roughly linear and hence can be described by a linear equation. The number of non-zeros for the three- and two-dimensional meshes is given by nz = 14.4n and nz = 6.5n, respectively.

The number of non-zeros of matrices L and H for a given number of processors (p) can be represented by power functions, as shown in Figures 7.4 and 7.5. Another important relation is the ratio of nz_L to nz_H (number of non-zeros of L to number of non-zeros of H), shown in Figure 7.6. The number of non-zeros of L decreases faster than the number of non-zeros of H, i.e. the amount of computation to be done locally by DCG decreases faster than the amount of data to be transferred among processors. In other words, the ratio of computation to communication decreases as the number of processors increases for a given size of problem.

Figure 7.1: Problem 2D - Exact solution: φ = sin(πx) sin(πy)

ID  Name     Elements     Nodes      Boundaries  Size (mm)      Origin
1   square1      3,598      1,880       160      20 x 20        (-1,-1)
2   square4        208        125        40
30  bar3.0     897,845    149,798    45,684      120 x 20 x 20  (0,0,0)
31  bar3.1   1,795,497    294,518    72,094
32  bar3.2   3,606,561    581,027   112,902
33  bar3.3   7,157,449  1,144,361   179,436
41  cube.1      13,897      2,637     1,650      120 x 20 x 20  (0,0,0)
42  cube.2      31,980      6,027     4,568
43  cube.3      58,201     10,545     5,976
44  cube.4     119,532     20,445     8,342
45  cube.5     231,274     39,022    13,458
46  cube.6     470,612     77,821    22,490
47  cube.7     934,778    152,118    34,112
48  cube.8   1,866,651    300,106    54,122
51  cube2.1    648,090    106,666    26,902      120 x 20 x 20  (0,0,0)
52  cube2.2    911,195    148,481    31,002
53  cube2.3  1,287,653    208,373    39,438
54  cube2.4  1,835,868    295,489    52,660
55  cube2.5  2,592,411    415,270    65,326
56  cube2.6  3,671,199    586,949    86,822
57  cube2.7  5,187,042    825,563   108,020
58  cube2.8  7,333,751  1,160,972   131,762

Table 7.1: Meshes description

Figure 7.2: Number of non-zeros of matrix A, three-dimensional mesh.

Figure 7.3: Number of non-zeros of matrix A, two-dimensional mesh.

Figure 7.4: Number of non-zeros of matrices L and H, three-dimensional mesh.

Figure 7.5: Number of non-zeros of matrices L and H, two-dimensional mesh.

Figure 7.6: Ratio of the number of non-zeros of matrix L to matrix H.

7.3 Spatial Discretization - FEM

In this section, a comparison between the Galerkin (GFEM) and Petrov-Galerkin (PGFEM) finite element methods for solving the convection-diffusion equation is presented. Some results addressing mesh refinement and boundary conditions for the Poisson and convection-diffusion equations are also reported. The formulation of both the Galerkin and Petrov-Galerkin methods for the Poisson and convection-diffusion equations can be found in Chapter 3 (pages 33-52). The results were obtained using a code that was implemented with the main purpose of understanding the finite element method. The solvers were coded initially in MatLab and then in Fortran 90.

The systems of linear equations are solved using the function LA_GESV of LAPACK95 - the Fortran 95 Interface to LAPACK [4]. LAPACK is a library written in Fortran 77 which provides routines for solving systems of linear equations, least-squares solutions of linear systems of equations, eigenvalue problems and singular value problems. LAPACK95 improves upon the original user-interface to the LAPACK package, taking advantage of the considerable simplifications which Fortran 95 allows. More information on both the LAPACK and LAPACK95 libraries may be found at http://www.netlib.org. The LA_GESV routine computes the solution to a real or complex linear system of equations AX = B, where A is a square matrix and X and B are rectangular matrices or vectors. Gaussian elimination with row interchanges is used to factor A as A = PLU, where P is a permutation matrix, L is unit lower triangular and U is upper triangular. The factored form of A is then used to solve the above system.

The partial differential equations being solved are the Poisson equation

\[
\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = f(x, y) \qquad (7.1)
\]

and the convection-diffusion equation

\[
\mu \left( \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} \right)
+ v_1 \frac{\partial u}{\partial x} + v_2 \frac{\partial u}{\partial y} = g(x, y) \qquad (7.2)
\]

on a two-dimensional domain Ω with boundary Γ.

7.3.1 Galerkin and Petrov-Galerkin FEM

For the first set of tests performed, the functions f(x, y) and g(x, y) of Equations (7.1) and (7.2), respectively, were chosen to be

\[
f(x, y) = 2\pi^2 \sin(\pi x)\sin(\pi y)
\]

and

\[
g(x, y) = 2\mu\pi^2 \sin(\pi x)\sin(\pi y) + v_1 \pi \cos(\pi x)\sin(\pi y) + v_2 \pi \sin(\pi x)\cos(\pi y)
\]

Figure 7.7: Mesh square8

so that for u(x, y) = 0 on Γ the exact solution is given by

\[
u(x, y) = \sin(\pi x)\sin(\pi y) .
\]

The results were obtained on four meshes. The first mesh, or coarsest mesh, was generated by a mesh generation program (Figure 7.7). The other three meshes are the result of global refinement. At each refinement step each triangle is divided into four triangles (classical method) by bisecting each edge (see Figure 7.8). This scheme preserves the properties of the elements and also does not generate hanging nodes. Table 7.2 lists the number of elements, nodes and boundary nodes of the four meshes used. Note that the number of elements quadruples and the number of boundary nodes doubles at each refinement, whilst the number of nodes grows more slowly since the internal edges are shared by two triangles. The element size of the smallest and biggest elements and the mesh Peclet number (Equation (3.26)) of the meshes used are listed in Table 7.3.

Tables 7.4 to 7.8 report the maximum and average error obtained for the problems described above, where v is kept constant and equal to 1 and μ varies, in order to

                 Mesh 1   Mesh 2   Mesh 3   Mesh 4
Nodes                83      297     1121     4353
Elements            132      528     2112     8448
Boundary Nodes       32       64      128      256

Table 7.2: Number of elements, nodes and boundary nodes of mesh square8 and the three meshes obtained by refining mesh square8

analyse the accuracy of the Galerkin and Petrov-Galerkin schemes under diffusion and convection dominance. Average error stands for the summation of the error over all nodes divided by the number of nodes of the mesh.

Table 7.4 shows the error of the results obtained by solving the Poisson equation on the four meshes described above. The error decreases by a factor of almost four as the mesh is refined, which indicates that the error is proportional to h², since h decreases by a factor of 2. These results were expected and agree with the theory. The solution was obtained on all four meshes with different levels of accuracy.

Tables 7.5 and 7.6 list the error of the solution of the convection-diffusion equation using the Galerkin method. Tables 7.7 and 7.8 list the error of the same problem

Figure 7.8: Refinement of a triangle

Mesh        h                       Peh
       Minimum    Maximum     Minimum      Maximum
1     0.152190   0.353553    0.107615/μ   0.250000/μ
2     0.076095   0.176777    0.053807/μ   0.125000/μ
3     0.038049   0.088388    0.026904/μ   0.062500/μ
4     0.019024   0.044194    0.013452/μ   0.031250/μ

Table 7.3: Mesh element size (h) and mesh Peclet number (Peh) for v = 1

solved by the Petrov-Galerkin approximation. For μ > 10^-1, the errors of both approximations are similar and, as for the Poisson equation, decrease by a factor of almost four, which agrees with the theory. For μ < 10^-1, the error of the Petrov-Galerkin approximation almost does not increase when μ decreases, whilst it decreases by a factor between two and three when the mesh is refined, as expected. On the coarsest mesh, the error of the Galerkin approximation increases by almost the same factor that μ decreases. However, as the mesh is refined the error becomes similar to the Petrov-Galerkin approximation error.

For problems dominated by convection the Galerkin approximation solution is oscillatory. For coarser grids, these oscillations completely dominate the solution and the Galerkin approximation does not perform well. On finer grids, the Galerkin and Petrov-Galerkin methods achieve similar error profiles. This means that refining the mesh, besides improving the accuracy of the solution by itself, reduces the oscillations observed for the Galerkin approximation and makes it comparable to the Petrov-Galerkin approximation. In practice, in Galerkin FEM the mesh needs to be refined adaptively, which means that only the elements with Peh > 1 need to be refined.

Error     Mesh 1        Mesh 2        Mesh 3        Mesh 4
Maximum   0.9707×10⁻¹   0.2850×10⁻¹   0.7992×10⁻²   0.2224×10⁻²
Ratio     -             3.41          3.57          3.59
Average   0.2769×10⁻¹   0.8543×10⁻²   0.2322×10⁻²   0.6103×10⁻³
Ratio     -             3.24          3.68          3.80

Table 7.4: Maximum and average error and error ratio for the Galerkin approximation of the Poisson equation

7.3.2 Boundary Conditions

In order to validate the code used to generate the results presented in this section, tests were performed using different boundary conditions and compared to the results reported in the literature. For instance, the streamline and potential flow around a cylinder were calculated and the results are shown in Figure 7.9. The results obtained agree with the results reported by Segerlind [105, pp. 132]. The streamline and potential flow are described by the Laplace equations, or homogeneous Poisson equations,

\frac{\partial^2 \psi}{\partial x^2} + \frac{\partial^2 \psi}{\partial y^2} = 0
\qquad \text{and} \qquad
\frac{\partial^2 \phi}{\partial x^2} + \frac{\partial^2 \phi}{\partial y^2} = 0,

and specific boundary conditions, as described in Figure 7.9.

µ        Mesh 1        Mesh 2        Mesh 3        Mesh 4
1.0e+1   0.9684×10⁻¹   0.2844×10⁻¹   0.7974×10⁻²   0.2215×10⁻²
         -             3.41          3.57          3.60
1.0e+0   0.9609×10⁻¹   0.2828×10⁻¹   0.7996×10⁻²   0.2226×10⁻²
         -             3.40          3.54          3.59
1.0e-1   0.1011×10⁰    0.2290×10⁻¹   0.6080×10⁻²   0.1594×10⁻²
         -             4.41          3.77          3.81
1.0e-2   0.1294×10⁰    0.2938×10⁻¹   0.6104×10⁻²   0.1346×10⁻²
         -             4.40          4.81          4.53
1.0e-3   0.4098×10⁰    0.4899×10⁻¹   0.1012×10⁻¹   0.2038×10⁻²
         -             8.36          4.84          4.97
1.0e-4   0.3883×10¹    0.1835×10⁰    0.1818×10⁻¹   0.3156×10⁻²
         -             21.16         10.09         5.76
1.0e-5   0.3870×10²    0.8101×10⁰    0.4229×10⁻¹   0.9882×10⁻²
         -             47.77         19.16         4.28
1.0e-6   0.3869×10³    0.8338×10¹    0.2195×10⁰    0.2053×10⁻¹
         -             46.40         37.99         10.69
1.0e-7   0.3854×10⁴    0.8359×10²    0.2203×10¹    0.8564×10⁻¹
         -             46.11         37.94         25.72

Table 7.5: Maximum error and error ratio for the Galerkin approximation of the convection-diffusion equation with v = 1

µ        Mesh 1        Mesh 2        Mesh 3        Mesh 4
1.0e+1   0.2762×10⁻¹   0.8533×10⁻²   0.2320×10⁻²   0.6095×10⁻³
         -             3.24          3.68          3.81
1.0e+0   0.2721×10⁻¹   0.8404×10⁻²   0.2285×10⁻²   0.6004×10⁻³
         -             3.24          3.68          3.81
1.0e-1   0.2629×10⁻¹   0.7603×10⁻²   0.2025×10⁻²   0.5267×10⁻³
         -             3.46          3.75          3.84
1.0e-2   0.2469×10⁻¹   0.5889×10⁻²   0.1411×10⁻²   0.3558×10⁻³
         -             4.19          4.17          3.97
1.0e-3   0.6433×10⁻¹   0.1120×10⁻¹   0.1867×10⁻²   0.3796×10⁻³
         -             5.74          6.00          4.92
1.0e-4   0.5810×10⁰    0.3284×10⁻¹   0.3799×10⁻²   0.6569×10⁻³
         -             17.69         8.64          5.78
1.0e-5   0.5764×10¹    0.1647×10⁰    0.4992×10⁻²   0.1581×10⁻²
         -             35.00         32.99         3.16
1.0e-6   0.5761×10²    0.1621×10¹    0.1225×10⁻¹   0.3451×10⁻²
         -             35.54         132.33        3.55
1.0e-7   0.5737×10³    0.1620×10²    0.3284×10⁻¹   0.5001×10⁻²
         -             35.41         493.30        6.57

Table 7.6: Average error and error ratio for the Galerkin approximation of the convection-diffusion equation with v = 1

µ        Mesh 1        Mesh 2        Mesh 3        Mesh 4
1.0e+1   0.9665×10⁻¹   0.2839×10⁻¹   0.7963×10⁻²   0.2211×10⁻²
         -             3.40          3.57          3.60
1.0e+0   0.9488×10⁻¹   0.2760×10⁻¹   0.7816×10⁻²   0.2178×10⁻²
         -             3.44          3.53          3.59
1.0e-1   0.1281×10⁰    0.3508×10⁻¹   0.1005×10⁻¹   0.2751×10⁻²
         -             3.65          3.49          3.65
1.0e-2   0.2530×10⁰    0.8960×10⁻¹   0.3122×10⁻¹   0.1106×10⁻¹
         -             2.82          2.87          2.82
1.0e-3   0.2767×10⁰    0.1046×10⁰    0.4510×10⁻¹   0.2044×10⁻¹
         -             2.65          2.32          2.21
1.0e-4   0.2790×10⁰    0.1062×10⁰    0.4686×10⁻¹   0.2210×10⁻¹
         -             2.63          2.27          2.12
1.0e-5   0.2793×10⁰    0.1063×10⁰    0.4705×10⁻¹   0.2228×10⁻¹
         -             2.63          2.26          2.11
1.0e-6   0.2793×10⁰    0.1064×10⁰    0.4707×10⁻¹   0.2230×10⁻¹
         -             2.63          2.26          2.11
1.0e-7   0.2793×10⁰    0.1064×10⁰    0.4707×10⁻¹   0.2230×10⁻¹
         -             2.63          2.26          2.11

Table 7.7: Maximum error and error ratio for the Petrov-Galerkin approximation of the convection-diffusion equation with v = 1

µ        Mesh 1        Mesh 2        Mesh 3        Mesh 4
1.0e+1   0.2761×10⁻¹   0.8525×10⁻²   0.2317×10⁻²   0.6086×10⁻³
         -             3.24          3.68          3.81
1.0e+0   0.2733×10⁻¹   0.8764×10⁻²   0.2295×10⁻²   0.6030×10⁻³
         -             3.12          3.82          3.81
1.0e-1   0.3973×10⁻¹   0.1223×10⁻¹   0.3328×10⁻²   0.8680×10⁻³
         -             3.25          3.67          3.83
1.0e-2   0.6284×10⁻¹   0.2516×10⁻¹   0.1024×10⁻¹   0.3712×10⁻²
         -             2.50          2.46          2.76
1.0e-3   0.6635×10⁻¹   0.2735×10⁻¹   0.1279×10⁻¹   0.6240×10⁻²
         -             2.43          2.14          2.05
1.0e-4   0.6671×10⁻¹   0.2756×10⁻¹   0.1306×10⁻¹   0.6523×10⁻²
         -             2.42          2.11          2.00
1.0e-5   0.6675×10⁻¹   0.2758×10⁻¹   0.1308×10⁻¹   0.6552×10⁻²
         -             2.42          2.11          2.00
1.0e-6   0.6675×10⁻¹   0.2759×10⁻¹   0.1309×10⁻¹   0.6555×10⁻²
         -             2.42          2.11          2.00
1.0e-7   0.6675×10⁻¹   0.2759×10⁻¹   0.1309×10⁻¹   0.6555×10⁻²
         -             2.42          2.11          2.00

Table 7.8: Average error and error ratio for the Petrov-Galerkin approximation of the convection-diffusion equation with v = 1

Figure 7.9: Streamline (top) and potential (bottom) flow around a cylinder

7.4 Algebraic Solver - DCG

In this section the parallel performance of DCG is addressed. A series of tests were run to measure DCG performance and some of the most relevant results are reported.

7.4.1 Convergence Rate

Figure 7.10 shows the convergence rate of Problem 2D solved by CG (1 processor), DCG (2 and 4 processors) and DSD (2 processors), which is basically DCG except that p (search direction) is equal to r (residual) at each iteration, i.e. a steepest descent iteration is performed instead of a CG iteration. It shows that DCG converges much faster than DSD. Figure 7.11 shows the 2-norm of the perturbation on the right-hand side at each iteration for the same problem in 2 subdomains. Note that it is almost monotonic and similar on both subdomains. Similar behaviour has been observed with more than two subdomains.
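The difference between the two local iterations can be illustrated on a single SPD system. The sketch below (plain CG versus the p = r choice) is only meant to show why the steepest-descent variant converges far more slowly; it is not the DCG/DSD implementation itself, and the test matrix is an arbitrary 1D Poisson matrix.

import numpy as np

def solve(A, b, cg=True, tol=1e-8, maxit=500):
    """Plain CG (cg=True) versus steepest descent (cg=False, i.e. p = r at
    every iteration) on a single SPD system; returns the iterate and the
    number of iterations used."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rho = r @ r
    for k in range(maxit):
        if np.sqrt(rho) < tol:
            return x, k
        Ap = A @ p
        alpha = rho / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rho_new = r @ r
        p = r + (rho_new / rho) * p if cg else r.copy()   # CG update versus p = r
        rho = rho_new
    return x, maxit

# 1D Poisson matrix as a small SPD test case
n = 100
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
for flag, name in [(True, "CG"), (False, "steepest descent")]:
    _, iters = solve(A, b, cg=flag)
    print(name, iters)    # CG converges in far fewer iterations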

Figure 7.10: Problem 2D convergence rate (mesh square4)

Figure 7.11: Problem 2D RHS perturbation (mesh square4)

7.4.2 Speedup and Efficiency

Table 7.9 shows the number of iterations, the run time (in seconds), the speedup and the efficiency of Problem 3Db obtained by our sequential and parallel implementations. The speedup and efficiency are also plotted in Figures 7.12 and 7.13. The results presented show that the parallelization of the method led to a good overall performance. One can also notice that the speedup is even superlinear in some cases. This is mainly due to the minimized amount of communication, the data transfer scheme adopted and cache memory effects.
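For reference, the speedup and efficiency columns follow the usual definitions S_p = T_1 / T_p and E_p = S_p / p. The small sketch below reproduces the entries of Table 7.9 for n = 294,518 from the reported run times; it is purely illustrative.

# Speedup and efficiency as used in Table 7.9:  S_p = T_1 / T_p,  E_p = S_p / p.
t1 = 146.0                                   # sequential run time (1 processor)
runs = {2: 118.4, 4: 59.6, 8: 19.2, 16: 6.0, 32: 3.9}

for p, tp in runs.items():
    speedup = t1 / tp
    efficiency = speedup / p
    print(f"p={p:2d}  S={speedup:6.2f}  E={efficiency:5.2f}")
# e.g. p=16 gives S = 146.0/6.0 ~ 24.33 and E ~ 1.52, i.e. superlinear speedup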

The execution time per iteration of two given problems is shown in Figure 7.14. For this case, the mesh roughly doubles the number of nodes as the number of processors is doubled, i.e. the size of the mesh on each processor is approximately the same. There is a slight increase in the time per iteration as the number of processors is increased.

n         Procs   Iters   Run-time   Speed-up   Efficiency
294,518       1     380      146.0          -            -
              2     561      118.4       1.23         0.62
              4     575       59.6       2.45         0.61
              8     514       19.2       7.60         0.95
             16     454        6.0      24.33         1.52
             32     752        3.9      37.44         1.17
581,027       1     534      501.4          -            -
              2     734      357.1       1.40         0.70
              4     881      241.4       2.08         0.52
              8     709      116.5       4.30         0.54
             16     691       48.2      10.40         0.65
             32     659        9.4      53.34         1.67

Table 7.9: Number of iterations, execution time, speedup and efficiency of problem 3Db for n = 294,518 and n = 581,027

Figure 7.12: Speedup of problem 3Db for n = 294,518 and n = 581,027

Figure 7.13: Efficiency of problem 3Db for n = 294,518 and n = 581,027

Figure 7.14: Execution time per iteration of two two-dimensional problems in a square domain, where the number of nodes of the mesh roughly doubles as the number of processors doubles, i.e. the problem size of each subdomain is roughly the same for any number of processors.

7.4.3 Overlapping and Interface Update

An important factor in the performance of DCG is the overlap. However, increasing the overlap does not necessarily reduce the number of iterations, let alone the execution time. Table 7.10 shows the number of iterations and the execution time of problem 3Db with two different meshes. In most of the cases the execution time increases as the overlap increases, since the amount of computation per subdomain is increased. The number of iterations usually decreases for 8 or more processors but, as shown in the table, this is not always the case. For most of the tests run, an overlap of two layers of elements achieves a given accuracy with the fewest iterations and often with the least time as well. However, in comparison to one layer of overlap the reduction is not significant in most cases.

Table 7.11 lists the number of iterations to achieve a given accuracy when x is only updated every h iterations, i.e. there is exchange of data among the processors only if the iteration number i is a multiple of h (mod(i, h) = 0). In this case, the amount of overlap is actually relevant to obtaining the solution. If h is increased and the overlap remains minimal, the residual norm oscillates greatly, decreasing at the iterations with data exchange and increasing otherwise. Increasing the overlap when h increases might even reduce the number of iterations needed to achieve the solution, though it will in most cases increase the execution time. In architectures where the network speed is considerably slow, the use of more overlap combined with a greater h may result in a reduction in the execution time. In other words, the size of h and the amount of overlap that achieve a minimal execution time depend on the ratio of communication to computation time of the parallel computer.
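The control logic behind this variant can be sketched as follows; the routine names and the placeholder exchange callback are hypothetical and stand in for the actual MPI communication used by the solver.

import numpy as np

def maybe_exchange(i, h, local_x, interface_ids, exchange):
    """Update the interface values only every h iterations, i.e. when
    mod(i, h) == 0; 'exchange' is a placeholder for whatever communication
    routine the solver uses."""
    if i % h == 0:
        local_x[interface_ids] = exchange(local_x[interface_ids])
    return local_x

# Toy usage: an 'exchange' that would normally receive neighbour data via MPI
exchange = lambda values: values            # placeholder, no real communication
x = np.zeros(10)
for i in range(1, 21):
    x = maybe_exchange(i, h=5, local_x=x, interface_ids=np.array([0, 9]),
                       exchange=exchange)   # data exchanged at i = 5, 10, 15, 20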

             n = 294,518           n = 581,027
p            ov=1      ov=2        ov=1      ov=2
2   iters     550       599         734       807
    time    150.2     162.5       481.3     540.8
4   iters     597       630         847       863
    time     74.5      88.1       273.1     282.1
8   iters     514       501         885       710
    time     20.5      28.9       109.6     108.6
16  iters     517       462         613       657
    time      7.8       9.7        28.3      39.3

Table 7.10: Number of iterations and execution time for overlap equal to 1 and 2.

7.4.4 Preconditioning

In order to validate the preconditioned DCG algorithm (Section 6.3.2, page 160), some tests were performed using the Jacobi and polynomial preconditioners. Note that these tests were performed not to evaluate the preconditioners but to numerically validate Algorithm 6.3.3. The Jacobi preconditioner consists simply of the inverse of the diagonal of A, i.e. M = [diag(A)]^{-1}. The polynomial preconditioners used can be expressed by

M_m = \sum_{i=0}^{m} \gamma_{m,i} \left( I - [\mathrm{diag}(A)]^{-1} A \right)^{i} [\mathrm{diag}(A)]^{-1} \qquad (7.3)

which can be computed as a sequence of vector updates and matrix-vector products [34], and where m is the degree of the polynomial. The coefficients γ_{m,i} define the kind of polynomial preconditioner being used. The Neumann preconditioner is obtained by setting γ_{m,i} = 1 for all i. The weighted and unweighted least-squares polynomial preconditioners are described in [59]. The coefficients used in the tests reported in this work were obtained directly from PIM [34].
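Applying M_m to a vector only requires the diagonal of A and repeated products with A, as the following sketch shows (a Horner-like evaluation of (7.3)). It is an illustration of the formula above, not the PIM routine used in the tests, and the coefficient values shown are just the Neumann choice.

import numpy as np

def apply_poly_precond(A, r, gamma):
    """Apply M = sum_i gamma[i] * (I - D^{-1} A)^i * D^{-1} to the vector r,
    as a sequence of matrix-vector products and vector updates (Horner form).
    gamma = [1, 1, ..., 1] gives the Neumann preconditioner; other coefficient
    choices give the least-squares variants mentioned in the text."""
    dinv = 1.0 / np.diag(A)              # Jacobi scaling  D^{-1}
    w = dinv * r                          # D^{-1} r
    z = gamma[-1] * w
    for g in reversed(gamma[:-1]):
        z = g * w + (z - dinv * (A @ z))  # z <- g*w + (I - D^{-1} A) z
    return z

# Degree-2 Neumann preconditioner on a small SPD matrix (illustration only)
A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
r = np.array([1.0, 2.0, 3.0])
print(apply_poly_precond(A, r, gamma=[1.0, 1.0, 1.0]))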

Overlap   p     h=1    h=2    h=5    h=10
1         2     940   1454   6373   4887
          4    2287      -      -   6158
          8    1694      -      -   7046
         16    1960      -      -      -
         32    1395      -      -      -
         64    1064      -      -      -
2         2     953    940    999   1019
          4    1256   1268   1220   1286
          8    1097   1221   1314   1539
         16    1002   1124   1624   2176
         32    1088   1323   1824   2608
         64     962   1365   1879   2758
4         2    1066   1031   1022   1024
          4    1977   1248   1203   1379
          8    1568   1180   1212   1369
         16    1555   1219   1429   1748
         32       -   1169   1529   2039
         64    1140   1146   1514   2197

Table 7.11: Number of iterations for x being updated every h iterations, with overlap equal to 1, 2 and 4.

Table 7.12 lists the number of iterations needed to solve a given problem using the PCG and PDCG methods. The number of iterations of PCG is independent of the number of processors, while the performance of PDCG depends strongly on the number of processors used. The results obtained were satisfactory, although the number of iterations of both the preconditioned and non-preconditioned cases increases as the number of processors increases. The same tests were run on a different problem and the results were similar. For this problem, the Jacobi preconditioner was successfully applied, reducing the number of iterations from 281 (non-preconditioned) to 87 when used in combination with the PDCG method on 8 processors. Using CG, the number of iterations was reduced from 243 (non-preconditioned case) to 94.

Figure 7.15 shows the residual of DCG without preconditioning and preconditioned by the Neumann polynomial preconditioner of degree 4, on 2 and 16 processors. The rate of convergence is considerably improved in both cases. However, the performance deteriorates as the number of processors increases. The same behaviour is observed with unpreconditioned DCG and is characteristic of the method.

7.4.5 Boundary Conditions

DCG has been tested primarily on problems with Dirichlet boundary conditions. A set of tests using a combination of Dirichlet and Neumann boundary conditions has shown that the method does not perform well for problems dominated by Neumann boundary conditions (Table 7.13). This is due to the fact that the only means of communication among subdomains is the overlap on the interface. The use of multiple meshes might correct or reduce this deficiency.

Preconditioner   Degree   PCG    PDCG
                                 p=2    p=4    p=8    p=16
None                  -    38     39     38     40     39
Jacobi                -    38     37     35     35     36
Neumann               1   180      -      -      -      -
                      2    28     26     24     25     29
                      4    28     25     24     26     30
Unweighted            1    19     24     23     25     29
Least-squares         2    20     19     19     23     29
                      4    18     19     21     27     31
Weighted              1    19     25     23     25     29
Least-squares         2    18     18     18     23     28
                      4    17     20     22     28     31

Table 7.12: Number of iterations of the preconditioned CG and DCG methods; problem 3Db

Figure 7.15: Residual of DCG and PDCG with Neumann polynomial preconditioner of degree 4; problem 3Db, 2 and 16 processors

     ∇²φ = g                      ∇²φ = 0
p    Case 1   Case 2   Case 3     Case 1   Case 2   Case 3
1        77      292       91         74      285       90
2        86     2013      104         80     1713      103
3        82     2721       99         77     2258       99
4        84     4296       98         77     3453       98
5        80     6233       96         76     4984       96
6        80     6593       98         83     5268      108
7        83     8121      100         78     6504       99
8        82     9020      100         76     7144       99

Table 7.13: Number of iterations for Dirichlet and Neumann boundary conditions. Case 1: only Dirichlet; case 2: predominantly Neumann; case 3: predominantly Dirichlet. g = 3π² cos(πx/20) cos(πy/120) cos(πz/20).

7.5 Poisson Solver

To measure the parallel efficiency, the Poisson solver was split into four main procedures, namely

" Loading: one processorloads the mesh;

. Initialization: the mesh is partitioned, the nodal graph is created and ar- rays/variables are initialized;

" Discretization: the Poisson equation is discretized and the boundary condi- tions are applied;

" Algebraic solver: the system of equations is solved by CG for a single proces- sor and DCG otherwise.

The loading procedure is executed only by the root processor. Hence, it is independent of the number of processors and does not involve any communication. The initialization procedure has two main parts: partitioning the mesh and creating the nodal graph. The nodal graph routine is totally local and therefore perfectly parallel. The mesh partition is initially done by the root processor and then each part is scattered from the root processor to the appropriate processor. The root keeps only its own part. This approach has been chosen for simplicity, albeit compromising the overall speedup. The parallel efficiency could be improved by replacing Metis by ParMetis [64] and hence making this procedure parallel.
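The distribution step can be sketched as below. This is an illustrative Python/mpi4py fragment, not the actual code of PUPS; in particular the "partition" here is a dummy split of element indices, whereas the solver uses Metis.

from mpi4py import MPI   # assumes the mpi4py bindings are available

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    # The root reads the mesh and partitions it (Metis in the actual solver);
    # here the "partition" is just a dummy split of element ids into size chunks.
    elements = list(range(100))
    parts = [elements[i::size] for i in range(size)]
else:
    parts = None

# Each process receives its own part; the root keeps parts[0] for itself.
my_elements = comm.scatter(parts, root=0)
print(rank, len(my_elements))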

The discretization is perfectly parallel, i.e. it does not involve any communication. Its efficiency depends mainly on memory access. The algebraic solver exchanges data at each iteration and is the most expensive part of the solver. DCG achieves almost linear speedup for Problem 3Db, as shown in Figures 7.16 and 7.17.

Figure 7.16: Problem 3Db speedup

Figure 7.17: Problem 3Db efficiency

7.6 Final Remarks

Numerical results to measure the performance of DCG have been reported. The results show that DCG achieves high parallelism for most of the cases presented. However, the method is still not suitable for every type of boundary condition. This problem might be addressed in the future but is not part of the scope of this work. It has been shown through the numerical results that the method is efficient, both mathematically and computationally, for many cases.

Chapter 8

Conclusion and Future Developments

In this thesis, a study of the solution of systems of linear equations on parallel computers using domain decomposition techniques and Krylov subspace methods has been performed. The main achievement of this study is the so-called distributive conjugate gradient (DCG) algorithm. DCG has shown that simple domain decomposition methods can achieve high scalability and efficiency through the use of adequate solvers at subdomain level. A comparison between the Galerkin and Petrov-Galerkin finite element methods for solving the convection-diffusion equation was also carried out.

Most of the numerical results presented in this work were obtained using the parallel PDE solver implemented as part of this project, called PUPS - Parallel Unstructured PDE Solver. Although the conceptual design of PUPS proposes a parallel solver capable of solving a variety of partial differential equations, in n spatial dimensions, using finite element methods with a choice of several methods to solve systems of linear equations, the actual implementation to date only comprises the methods needed for the development of DCG. The solver has successfully been used in a geometry optimization project to solve the Poisson equation using


linear triangular elements to discretize the equation and the conjugate gradient method to solve the system of linear equations.

DCG presents a novel method for approximating the solution at subdomain level. This method, which consists roughly of a modification of the conjugate gradient (CG) algorithm, uses a technique to update the information from a previous step according to the perturbation applied to the sub-system. This modification produces a great improvement in the convergence rate. When using CG without the modification as the local solver, an unacceptably high amount of computation is required, since each perturbed problem is solved purely as a new problem.

The combination of a simple domain decomposition method and an efficient subdomain solver led to an algorithm that is relatively easy to parallelize. Communication among subdomains is comprised of the exchange of data of the interface nodes at every iteration and of the calculation of the norm of the residual over the entire domain, which is only performed at selected iterations. The iterations where the norm needs to be calculated are determined by an algorithm that roughly predicts the number of iterations needed to achieve a certain accuracy. This prediction is dynamically corrected as the computation proceeds. It has been shown that the number of times that the norm has to be calculated is considerably reduced by the adoption of the prediction algorithm.

DCG has achieved high parallelism for most of the cases tested. The parallel speed-up is roughly linear for up to 100 processors for many of the problems tested. It has been shown through the numerical results that the method is efficient both mathematically and computationally. However, the method is still not suitable for all types of boundary condition. Problems with Dirichlet, Neumann and mixed boundary conditions were solved, but DCG requires an unacceptably large number of iterations when the boundary conditions are predominantly Neumann or mixed.

Further work is needed to improve DCG for solving problems with any boundary conditions. The only means of communicating information among subdomains of the current algorithm is the overlap between subdomains. A technique that allows information to travel quicker over the entire domain is needed. Increasing the overlap between subdomains or the use of multi-level domain decomposition methods are two possible options.

The emphasis of this thesis was on the development of a solver based on single-level domain decomposition methods with iterative methods, such as the conjugate gradient method, as subdomain solvers. Based on DCG, further development can be done in order to derive a method for nonsymmetric systems. A brief study took place using BiCG and BiCGSTAB as the initial subdomain solvers. This study has shown that the technique used in DCG cannot be used straightforwardly for the nonsymmetric case.

DCG could also be improved by the use of a multi-level domain decomposition method instead of a single-level method. In this work, multi-level methods were not considered because of the use of unstructured meshes and the difficulties of generating multiple meshes in this case. Algebraic multi-level methods can be considered, as well as geometric multi-level methods on structured grids.

A preconditioned version of DCG (PDCG) was also presented. Some numerical tests were performed using the Jacobi and polynomial preconditioners in order to validate the algorithm. Although the results obtained were satisfactory, further research is needed on the use of preconditioners with DCG and on the use of DCG as a preconditioner. There is also a need for substantial advancement in the theoretical analysis of DCG to provide general conditions under which the method will converge.

The Galerkin and Petrov-Galerkin finite element methods were used for solving the

convection-diffusion equation. The numerical results have shown that for problems dominated by convection the Galerkin method is outperformed by the Petrov-Galerkin method and, depending on the mesh, it fails. However, for problems dominated by convection being solved on a relatively fine mesh, or for problems dominated by diffusion, both methods are similar and the error is proportional to the square of the mesh size.

The parallel PDE solver proposed in this project has only its initial state implemented. Poisson and convection-diffusion equations in two and three dimensions, with Dirichlet, Neumann and mixed boundary conditions, can be solved. The basic modules, which comprise geometric and algebraic basic operations both in serial and in parallel, are relatively well developed. Also, an interface to MPI has been developed to an advanced state. However, only linear triangular and tetrahedral elements were implemented and therefore many more may be added. A few methods to solve systems of linear equations and preconditioners are available. To increase the algebraic capabilities of the solver, integration with a package for solving systems of linear equations is the main option which has been considered.

Appendix A

SunFire 15K Bandwidth and Communication Time


Words       p=2       p=4       p=8      p=16      p=32
1            8.22      6.78      5.19      3.94      3.50
2           16.49     13.58     10.36      7.98      7.16
4           32.29     26.38     20.03     15.41     13.98
8           62.26     50.40     37.53     28.13     25.89
16         119.91     94.89     68.09     49.10     47.64
32         236.62    187.41    110.95     99.47     93.89
64         328.02    233.80    116.80     73.11     42.77
128        622.81    430.81    243.56    136.86     29.19
256       1062.26    705.00    440.28    250.27    146.31
512       1605.32   1104.84    760.69    416.01    244.83
1024      1899.69   1292.17   1030.83    553.25    334.59
2048      2338.52   1552.96   1273.01    735.68    462.54
4096      2637.84   2106.63   1549.31    931.28    575.38
8192      2706.49   2175.61   1904.04   1082.27    625.34
16384     3231.15   2658.22   1982.62   1090.41    671.33
32768     3572.99   2820.97   2134.00   1108.56    701.47

Table A.1: Sun Fire 15K system measured bandwidth (megabits/second) of a broadcast collective communication.

Words     p=2         p=4         p=8         p=16        p=32
1         7.78e-006   9.45e-006   1.23e-005   1.62e-005   1.83e-005
2         7.76e-006   9.42e-006   1.24e-005   1.60e-005   1.79e-005
4         7.93e-006   9.71e-006   1.28e-005   1.66e-005   1.83e-005
8         8.22e-006   1.02e-005   1.36e-005   1.82e-005   1.98e-005
16        8.54e-006   1.08e-005   1.50e-005   2.08e-005   2.15e-005
32        8.66e-006   1.09e-005   1.85e-005   2.06e-005   2.18e-005
64        1.25e-005   1.75e-005   3.51e-005   5.60e-005   9.58e-005
128       1.31e-005   1.90e-005   3.36e-005   5.99e-005   2.81e-004
256       1.54e-005   2.32e-005   3.72e-005   6.55e-005   1.12e-004
512       2.04e-005   2.97e-005   4.31e-005   7.88e-005   1.34e-004
1024      3.45e-005   5.07e-005   6.36e-005   1.18e-004   1.96e-004
2048      5.61e-005   8.44e-005   1.03e-004   1.78e-004   2.83e-004
4096      9.94e-005   1.24e-004   1.69e-004   2.81e-004   4.56e-004
8192      1.94e-004   2.41e-004   2.75e-004   4.84e-004   8.38e-004
16384     3.25e-004   3.94e-004   5.29e-004   9.62e-004   1.56e-003
32768     5.87e-004   7.43e-004   9.83e-004   1.89e-003   2.99e-003

Table A.2: Sun Fire 15K system measured average (over 1000 repetitions) communication time (seconds) of a broadcast collective communication.

Words       p=2       p=4       p=8      p=16      p=32
1            9.82      7.65      8.78      9.39      9.15
2           19.57     15.38     17.40     18.80     18.28
4           37.81     31.58     32.69     35.83     35.08
8           70.84     60.41     63.01     65.96     65.60
16         131.36    125.69    113.91    119.26    121.98
32         245.88    243.82    213.89    229.38    241.51
64         437.71    374.27    308.16    382.60    410.53
128        817.67    756.10    567.03    714.72    772.61
256       1358.71   1169.68    869.15   1156.58   1193.09
512       1879.76   1803.62   1141.83   1508.03   1622.19
1024      2735.63   2637.55   1473.28   2363.49   2356.90
2048      3614.69   3366.34   1859.93   3140.29   3214.99
4096      4226.61   4018.02   2020.18   3631.15   3736.08
8192      4763.18   4501.73   2161.78   4096.04   4147.46
16384     5074.42   4708.64   2238.52   4472.72   4252.05
32768     5506.86   4840.34   2743.37   4603.77   4593.39

Table A.3: Sun Fire 15K system measured bandwidth (megabits/second) of a round-trip point-to-point communication.

Words     p=2         p=4         p=8         p=16        p=32
1         6.52e-006   8.37e-006   7.29e-006   6.81e-006   6.99e-006
2         6.54e-006   8.32e-006   7.36e-006   6.81e-006   7.00e-006
4         6.77e-006   8.11e-006   7.83e-006   7.15e-006   7.30e-006
8         7.23e-006   8.48e-006   8.13e-006   7.76e-006   7.80e-006
16        7.80e-006   8.15e-006   8.99e-006   8.59e-006   8.39e-006
32        8.33e-006   8.40e-006   9.58e-006   8.93e-006   8.48e-006
64        9.36e-006   1.09e-005   1.33e-005   1.07e-005   9.98e-006
128       1.00e-005   1.08e-005   1.45e-005   1.15e-005   1.06e-005
256       1.21e-005   1.40e-005   1.89e-005   1.42e-005   1.37e-005
512       1.74e-005   1.82e-005   2.87e-005   2.17e-005   2.02e-005
1024      2.40e-005   2.49e-005   4.45e-005   2.77e-005   2.78e-005
2048      3.63e-005   3.89e-005   7.05e-005   4.17e-005   4.08e-005
4096      6.20e-005   6.52e-005   1.30e-004   7.22e-005   7.02e-005
8192      1.10e-004   1.16e-004   2.43e-004   1.28e-004   1.26e-004
16384     2.07e-004   2.23e-004   4.68e-004   2.34e-004   2.47e-004
32768     3.81e-004   4.33e-004   7.64e-004   4.56e-004   4.57e-004

Table A.4: Sun Fire 15K system measured average (over 1000 repetitions) communication time (seconds) of a round-trip point-to-point communication.

References

[1] AKIN, J. E. Application and Implementation of Finite Element Methods. Academic Press, 1982.

[2] AKIN, J. E. Finite Element for Analysis and Design. Academic Press, 1994.

[3] Altair Engineering Inc. www.altair.com, Last accessed June 2005.

[4] ANDERSON, E., BAI, Z., BISCHOF, C., BLACKFORD, S., DEMMEL, J., DONGARRA, J., DU CROZ, J., GREENBAUM, A., HAMMARLING, S., MCKENNEY, A., AND SORENSEN, D. LAPACK User's Guide, third ed. SIAM Publications, Philadelphia, 1999.

[5] ANSYS Inc. Corporate. www.ansys.com, Last accessed June 2005.

[6] ATLURI, S. N., AND SHEN, S. The basis of meshless domain discretization: the meshless local Petrov-Galerkin (MLPG) method. Advances in Computational Mathematics 23 (2005), 73-93.

[7] AutoForm Engineering GmbH. www.autoform.ch, Last accessed June 2005.

[8] AXELSSON, O. Iteration number for the conjugate gradient method. Mathematics and Computers in Simulation 61 (2003), 421-435.

[9] BABUSKA, I., BANERJEE, U., AND OSBORN, J. E. Survey of meshless and generalized finite element methods: A unified approach. Acta Numerica 12 (2003), 1-125.

[10] BANK, R. E., AND HOLST, M. A new paradigm for parallel adaptive meshing algorithms. SIAM Review 45, 2 (2003), 291-323.

[11] BANK, R. E., SHERMAN, A. H., AND WEISER, A. Refinement algorithms and data structures for regular local mesh refinement. In Scientific Computing (Amsterdam, 1983), R. S. et al., Ed., IMACS, pp. 3-17.

[12] BARRET, R., BERRY, M., CHAN, T., DEMMEL, J., DONATO, J., DONGARRA, J., EIJKHOUT, V., POZO, R., ROMINE, C., AND VAN DER VORST, H. Templates for the solution of linear systems: building blocks for iterative methods. SIAM Publications, Philadelphia, 1993.

[13] BEAUWENS, R. Iterative solution methods. Applied Numerical Mathematics 51 (2004), 437-450.

[14] BECKER, D. O Método Iterativo GMRES em Blocos Dinâmicos para a Solução de Sistemas Lineares com Múltiplos Termos Independentes em Computadores Paralelos. MSc Dissertation, Universidade Federal do Rio Grande do Sul, Programa de Pós-Graduação em Matemática Aplicada, June 2002.

[15] BECKER, D., AND THOMPSON, C. P. A novel, parallel PDE solver for unstructured grids. In 5th International Conference on Large-Scale Scientific Computations (2005). (to appear - accepted).

[16] BENZI, M. Preconditioning techniques for large linear systems: A survey. Journal of Computational Physics 182 (2002), 418-477.

[17] BRAUNL, T. Braunl's law. The Open Channel, 1991. University of Stuttgart.

[18] BREZIS, H., AND BROWDER, F. Partial differential equations in the 20th century. Advances in Mathematics 135, 1 (1998), 76-144.

[19] BRUNER, C. W. S. Parallelization of the Euler Equations on Unstructured Grids. PhD Dissertation, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, May 1996.

[20] BUCKER, M., AND BASERMANN, A. A comparison of QMR, CGS and TFQMR on a distributed memory machine. Forschungszentrum Jülich GmbH - KFA, Zentralinstitut für Angewandte Mathematik, Interner Bericht KFA-ZAM-IB-9412, May 1994.

[21] CASTANOS, J. G., AND SAVAGE, J. E. The dynamic adaptation of parallel mesh-based computation. In SIAM 7th Symposium on Parallel and Scientific Computation (1997).

[22] CASTANOS, J. G., AND SAVAGE, J. E. Parallel refinement of unstructured meshes. In Proceedings of the IASTED International Conference in Parallel and Distributed Computing and Systems (MIT, Boston, USA, Nov. 1999).

[23] CASTANOS, J. G., AND SAVAGE, J. E. PARED: a framework for the adaptive solution of PDEs. In 8th IEEE Symposium on High Performance Distributed Computing (1999).

[24] CASTANOS, J. G., AND SAVAGE, J. E. Repartitioning unstructured adaptive meshes. In Proc. Intl. Parallel and Distributed Processing Symposium (2000).

[25] CHAN, T. F., AND MATHEW, T. P. Domain decomposition algorithms. Acta Numerica (1994), 61-143.

[26] CHAN, T. F., AND SMITH, B. F. Domain decomposition and multigrid algorithms for elliptic problems on unstructured meshes. Electronic Transactions on Numerical Analysis 2 (December 1994), 171-182.

[27] CHAND, K. K. Component-based hybrid mesh generation. International Journal for Numerical Methods in Engineering 62 (2005), 747-773.

[28] CHEN, X., AND SUN, J. Global convergence of a two-parameter family of conjugate gradient methods without line search. Journal of Computational and Applied Mathematics 146 (2002), 37-45.

[29] CHRISOCHOIDES, N. A survey of parallel mesh generation methods. To appear in Numerical Solution of Partial Differential Equations on Parallel Computers (eds. Are Magnus Bruaset, Petter Bjorstad, Aslak Tveito), 2005.

[30] CONCUS, P., GOLUB, G. H., AND O'LEARY, D. P. A generalized conjugate gradient method for the numerical solution of elliptic partial differential equations. In Matrix Computations (1976), J. R. Bunch and D. J. Rose, Eds., Academic Press, pp. 309-332.

[31] DA CUNHA, R. D. A study on iterative methods for the solution of systems of linear equations on transputer networks. PhD Dissertation, University of Kent at Canterbury, Computing Laboratory, Kent, 1992.

[32] DA CUNHA, R. D. Introdução à Linguagem de Programação Fortran 90. Editora da UFRGS, Porto Alegre, 2005.

[33] DA CUNHA, R. D., BECKER, D., AND PATTERSON, J. C. New parallel (rank-revealing) QR factorization algorithms. In Euro-Par (2002), B. Monien and R. Feldmann, Eds., vol. 2400 of Lecture Notes in Computer Science, Springer-Verlag Inc., pp. 677-686.

[34] DA CUNHA, R. D., AND HOPKINS, T. PIM 2.2 The Parallel Iterative Methods Package for Systems of Linear Equations. User's Guide (Fortran 77 version).

[35] DA CUNHA, R. D., AND HOPKINS, T. The parallel iterative methods (PIM) package for the solution of systems of linear equations on parallel computers. Applied Numerical Mathematics 19 (1995), 33-50.

[36] DAI, Y., LIAO, L., AND LI, D. On restart procedures for the conjugate gradient method. Numerical Algorithms 35 (2004), 249-260.

[37] DOWD, K., AND SEVERANCE, C. High Performance Computing. O'Reilly & Associates Inc, Sebastopol, CA, USA, 1998.

[38] DUFF, I. S., AND VAN DER VORST, H. A. Developments and trends in the parallel solution of linear systems. Parallel Computing 25, 13-14 (1999), 1931-1970.

[39] EAGER, D. L., ZAHORJAN, J., AND LAZOWSKA, E. D. Speedup versus efficiency in parallel systems. IEEE Transactions on Computers 38, 3 (Mar. 1989), 408-423.

[40] EL-REWINI, H., AND ABD-EL-BARR, M. Advanced Computer Architecture and Parallel Processing. John Wiley & Sons, 2005. http://www3.interscience.wiley.com, Last accessed December 2005.

[41] FARRIS, C., AND MISRA, M. Distributed algebraic multigrid for finite element computations. Mathl. Comput. Modeling 27, 8 (1998), 41-67.

[42] FERZIGER, J. H., AND PERIC, M. Computational Methods for Fluid Dynamics, 3rd, rev. ed. Springer-Verlag Inc., Germany, 2002.

[43] FISCHER, B., RAMAGE, A., SILVESTER, D., AND WATHEN, A. On parameter choice and iterative convergence for stabilised discretisations of advection-diffusion problems. Computer Methods in Applied Mechanics and Engineering 179 (1999), 179-195.

[44] FLETCHER, C. A. J. Computational Techniques for Fluid Dynamics, Fundamental and General Techniques, vol. 1. Springer Series in Computational Physics, 1988.

[45] FLETCHER, R., AND REEVES, C. M. Function minimization by conjugate gradients. The Computer Journal 7 (1964), 149-154.

[46] Fluent Inc. www.fluent.com, Last accessed June 2005.

[47] FOKKEMA, D. R., SLEIJPEN, G. L. G., AND VAN DER VORST, H. A. Generalized Conjugate Gradient Squared. Tech. Rep. 851, Mathematical Institute, Utrecht University, 1994.

[48] GOLUB, G. H., AND VAN LOAN, C. F. Matrix computations, second ed. Academic Press, New York, 1989.

[49] GOLUB, G. H., AND VAN LOAN, C. F. Matrix Computations, third ed. The Johns Hopkins University Press, Baltimore, MD, 1996.

[50] GRESHO, P. M., AND SANI, R. L. Incompressible Flow and the Finite Element Method, Volume 1: Advection-Diffusion. John Wiley & Sons, 1998.

[51] GRESHO, P. M., AND SANI, R. L. Incompressible Flow and the Finite Element Method, Volume 2: Isothermal Laminar Flow. John Wiley & Sons, 1998.

[52] Gridgen, Pointwise, Inc. www.pointwise.com, Last accessed June 2005.

[53] GROPP, W., LUSK, E., AND SKJELLUM, A. Using MPI - Portable Parallel Programming with the Message-Passing Interface. Scientific and Engineering Computation Series, USA, 1994.

[54] GROPP, W., AND SMITH, B. Simplified Linear Equation Solvers Users Manual. Argonne National Laboratory, June 1993. ANL-93/8-REV 1.

[55] HAYES, J. P. Computer Architecture and Organization, second ed. McGraw-Hill, USA, 1988.

[56] HENDRICKSON, B. CHACO home page. http://www.cs.sandia.gov/~bahendr/chaco.html, Last accessed July 2005.

[57] HENDRICKSON, B., AND LELAND, R. The Chaco User's Guide: Version 2.0, 1994. Sandia Tech Report SAND94-2692.

[58] HESTENES, M. R., AND STIEFEL, E. Methods of conjugate gradient for solving linear systems. Journal of Research of the National Bureau of Standards 49 (1952), 409-436.

[59] HOLTER, W. H., NAVON, I. M., AND OPPE, T. C. Parallelizable preconditioned Conjugate Gradient methods for the Cray Y-MP and the TMC CM-2. Tech. rep., Supercomputer Computations Research Institute, Florida State University, Dec. 1991.

[60] HUANG, C. Y., AND ODEN, J. T. GAMMA2D: a multiregion/multiblock, structured/unstructured grid generator package for computational mechanics. Computers & Structures 53, 2 (1994), 375-410.

[61] HUANG, C.-Y. Recent progress in multiblock hybrid structured and unstructured mesh generation. Computer Methods in Applied Mechanics and Engineering 150 (1997), 1-24.

[62] JONES, M. T., AND PLASSMANN, P. E. Adaptive refinement of unstructured finite-element meshes. Finite Elem. Anal. Des. 25, 1-2 (1997), 41-60.

[63] JONES, M. T., AND PLASSMANN, P. E. Parallel algorithms for adaptive mesh refinement. SIAM Journal on Scientific Computing 18, 3 (1997), 686-708.

[64] KARYPIS, G. METIS home page. www-users.cs.umn.edu/~karypis/metis, Last accessed March 2004.

[65] KARYPIS, G., AND KUMAR, V. METIS - A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices. University of Minnesota, Sept. 1998. Version 4.0.

[66] KARYPIS, G., SCHLOEGEL, K., AND KUMAR, V. PARMETIS - Parallel Graph Partitioning and Sparse Matrix Ordering Library. University of Minnesota, Mar. 2002. Version 3.0.

[67] KEYES, D. E. How scalable is domain decomposition in practice? In Eleventh International Conference on Domain Decomposition Methods (Bergen, 1999), C.-H. Lai, P. E. Bjorstad, M. Cross, and O. Widlund, Eds., Domain Decomposition Press, pp. 286-297.

[68] KNUPP, P., AND STEINBERG, S. Fundamentals of grid generation. CRC Press, 1994.

[69] KOELBEL, C. H., LOVEMAN, D. B., SCHREIBER, R. S., STEELE JR, G. L., AND ZOSEL, M. E. The High Performance Fortran Handbook. The MIT Press, Cambridge, MA, 1994.

[70] LANCZOS, C. Solution of systems of linear equations by minimized iterations. Journal of Research of the National Bureau of Standards 49 (1952), 33-53.

[71] LISEIKIN, V. D. Grid Generation Methods. Springer-Verlag Inc., Germany, 1999.

[72] LÖHNER, R. Applied CFD Techniques - An Introduction based on Finite Element Methods. John Wiley & Sons, England, 2001.

[73] LÖHNER, R., SHAROV, D., LUO, H., AND RAMAMURTI, R. Overlapping unstructured grids. AIAA Paper 01-0439, 2001.

[74] MATHIEU, J., AND SCOTT, J. An Introduction to Turbulent Flow. Cambridge University Press, 2000.

[75] The MathWorks. http://www.mathworks.com, Last accessed March 2006.

[76] MAVRIPLIS, D. J. An assessment of linear versus nonlinear multigrid methods for unstructured mesh solvers. Journal of Computational Physics 175, 1 (Jan. 2002), 302-325.

[77] MITCHELL, W. Unified Multilevel Adaptive Finite Element Methods for Elliptic Problems. PhD Dissertation, University of Illinois at Urbana-Champaign, Urbana, Illinois, 1988.

[78] MITCHELL, W. F. A comparison of adaptive refinement techniques for elliptic problems. ACM Transactions on Mathematical Software 15, 4 (1989), 326-347.

[79] MODI, A. Direct numerical simulation of turbulent flows, 1999.

[80] MODI, J. J. Parallel Algorithms and Matrix Computation. Oxford University Press, New York, 1988.

[81] MPI. The Message Passing Interface (MPI) standard. http://www-unix.mcs.anl.gov/mpi, Last accessed June 2006.

[82] MSC.Software Corporation. www.mscsoftware.com, Last accessed June 2005.

[83] MURTY, B. S. N., AND HUSAIN, A. Orthogonality correction in the conjugate-gradient method. Journal of Computational and Applied Mathematics 9 (1983), 299-304.

[84] NOTAY, Y. Flexible conjugate gradient. SIAM Journal on Scientific Computing 22, 4 (2000), 1444-1460.

[85] OLIKER, L., AND BISWAS, R. PLUM: Parallel load balancing for adaptive unstructured meshes. Journal of Parallel and Distributed Computing 52 (1998), 150-177.

[86] OLIKER, L., BISWAS, R., AND GABOW, H. N. Parallel tetrahedral mesh adaptation with dynamic load balancing. Parallel Computing (2000), 1583-1608. Special Issue on Graph Partitioning.

[87] OpenMP. www.openmp.org, Last accessed January 2006.

[88] ORKISZ, J. Finite difference method. In Handbook of Computational Solid Mechanics (Berlin, 1998), Kleiber, Ed., Springer-Verlag Inc.

[89] OWEN, S. Meshing Research Corner. www.andrew.cmu.edu/user/sowen/mesh.html, Last accessed June 2005.

[90] OWEN, S. J. A survey of unstructured mesh generation technology. In 7th International Meshing Roundtable (Dearborn, MI, Oct. 1998), Sandia National Laboratories.

[91] PEBAY, P. P., AND THOMPSON, D. C. Parallel mesh refinement without communication. In 13th International Meshing Roundtable (Williamsburg, VA, Sept. 2004), Sandia National Laboratories, SAND2004-3765C, pp. 437-448.

[92] PEPPER, D., AND HEINRICH, J. C. The Finite Element Method: Basic Concept and Applications. Series in Computational and Physical Processes in Mechanics and Thermal Sciences, London, 1992.

[93] POSIX. http://posixcertified.ieee.org/, Last accessed January 2006.

[94] PREIS, R., AND DIEKMANN, R. The PARTY Partitioning-Library, User Guide Version 1.1. University of Paderborn, Sept. 1996. Technical Report tr-rsfb-96-024.

[95] REDDY, J. N. An Introduction to the Finite Element Method. McGraw-Hill, 1984.

[96] REID, J. K. On the Method of Conjugate Gradients for the Solution of Large Sparse Systems of Linear Equations. In Large Sparse Sets of Linear Equations (New York, 1971), J. K. Reid, Ed., Academic Press, pp. 231-254.

[97] REUTER, H. Investigation into the scalability of a Sun Fire 15K. MSc Dissertation, Cranfield University, Applied Mathematics and Computing Group, Cranfield, UK, 2003.

[98] RIVARA, M. C. Mesh refinement processes based on the generalized bisection of simplices. SIAM Journal on Numerical Analysis 21 (1984), 604-613.

[99] SAAD, Y. A flexible inner-outer preconditioned GMRES algorithm. Tech. Rep. umsi-91-279, Minnesota Supercomputer Institute, University of Minnesota, Minneapolis, Minnesota, 1991. Appeared in SISSC, vol. 4, 1993 [100].

[100] SAAD, Y. A flexible inner-outer preconditioned GMRES algorithm. SIAM J. Stat. Comput. 14 (1993), 461-469.

[101] SAAD, Y. Iterative methods for sparse linear systems. PWS Publishing Company, Boston, 1995.

[102] SAAD, Y., AND SCHULTZ, M. H. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM Journal on Scientific Computing 7, 3 (July 1986), 856-869.

[103] SAAD, Y., AND VAN DER VORST, H. A. Iterative solution of linear systems in the 20th century. Journal of Computational and Applied Mathematics 123 (2000), 1-33.

[104] SCHNEIDERS, R. Mesh generation & grid generation on the web. http://www-users.informatik.rwth-aachen.de/~roberts/meshgeneration.html, Last accessed June 2006.

[105] SEGERLIND, L. J. Applied Finite Element Analysis, 2nd ed. John Wiley & Sons, 1984.

[106] SEWELL, E. G. A finite element program with automatic user-controlled mesh grading. In Advances in Computer Methods for Partial Differential Equations III (New Brunswick, 1979), R. Stepleman, Ed., IMACS, pp. 8-10.

[107] SHERWIN, S. J., AND KARNIADAKIS, G. E. A new triangular and tetrahedral basis for high-order (hp) finite element methods. International Journal for Numerical Methods in Engineering 38 (1995), 3775-3802.

[108] SHERWIN, S. J., AND KARNIADAKIS, G. E. Tetrahedral (hp) finite elements: Algorithms and flow simulations. Journal of Computational Physics 124 (1996), 14-45.

[109] SHEWCHUK, J. R. An introduction to the conjugate gradient method without the agonizing pain. www-2.cs.cmu.edu/~jrs/jrspapers.html, Last accessed March 2004, 1994.

[110] SHI, Y. Reevaluating Amdahl's Law and Gustafson's Law, Oct. 1996. http://www.cis.temple.edu/~shi/docs/amdahl/amdahl.html, Last accessed January 2006.

[111] SMITH, B. F., BJORSTAD, P. E., AND GROPP, W. D. Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations. Cambridge University Press, Cambridge, United Kingdom, 1996.

[112] SONNEVELD, P. CGS, a fast Lanczos-type solver for nonsymmetric linear systems. SIAM J. Stat. Compt. 10, 1 (Jan. 1989), 36-52.

[113] STUBEN, K. A review of algebraic multigrid. Journal of Computational and Applied Mathematics 128 (2001), 281-309.

[114] SUN MICROSYSTEMS, INC. Sun Fire 15K System Overview, Nov. 2001. Sun Documentation, Revision A, Part No. 806-3509-10(V2).

[115] THOMASSET, F. Implementation of Finite Element Methods for Navier-Stokes Equations. Springer-Verlag Inc., 1981.

[116] THOMPSON, J. F., SONI, B. K., AND WEATHERILL, N. P. Handbook of Grid Generation. CRC Press, 1999. www.mathnetbase.com/cjournals/books/book-km.asp?id=615, Last accessed June 2005.

[117] VAN DER VORST, H. A. Efficient and reliable iterative methods for linear systems. Journal of Computational and Applied Mathematics 149 (2002), 251-265.

[118] VAN DER VORST, H. A. Iterative methods for large linear systems, lecture notes. www.math.uu.nl/people/vorst/lecture.html, Last accessed March 2004.

[119] VAN DER VORST, H. A. Bi-CGSTAB: a fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems. SIAM J. Stat. Compt. 13, 2 (Mar. 1992), 631-644.

[120] VARGA, R. S. Matrix Iterative Analysis. Prentice-Hall, Inc, 1962.

[121] VERSTEEG, H. K., AND MALALASEKERA, W. An introduction to computational fluid dynamics: the finite volume method. Harlow: Longman Scientific and Technical, 1995.

[122] VOYAGES, K. M., AND NIKITOPOULOS, D. E. An accurate, flexible, parallel Poisson solver with direct substructuring for complex geometries. AIAA 2000-0274, 2000.

[123] WALSHAW, C. JOSTLE home page. www.gre.ac.uk/~c.walshaw/jostle.

[124] WALTZ, J. Parallel adaptive unstructured finite element schemes for 3D compressible and incompressible flows. AIAA 2002-2978, 2002.

[125] WEISSINGER, J. Development of a Discrete Adaptive Gridless Method for the Solution of Elliptic Partial Differential Equations. PhD Dissertation, Cranfield University, School of Engineering, Applied Mathematics and Computing Group, Cranfield, UK, July 2003.

[126] WESSELING, P., AND OOSTERLEE, C. W. Geometric multigrid with applications to computational fluid dynamics. Journal of Computational and Applied Mathematics 128 (2001), 311-334.

[127] XU, B. Methods for PDE-constrained optimisation with equality geometric constraints. Personal communications - second review report.

[128] YOUNG, D. M. Iterative Solution of Large Linear Systems. Academic Press, 1971.

[129] YOUNG JR., D. M. Iterative Methods for Solving Partial Difference Equations of Elliptic Type. PhD Dissertation, Harvard University, Cambridge, Mass, May 1950.

[130] ZIENKIEWICZ, O. C., AND TAYLOR, R. L. The Finite Element Method, 5th ed. Elsevier Butterworth-Heinemann, Oxford, 2000.

[131] ZIENKIEWICZ, O. C., AND TAYLOR, R. L. The Finite Element Method: the Basis, 5th ed., vol. 1. Elsevier Butterworth-Heinemann, Oxford, 2000.