HPC Libraries
High Performance Computing: Concepts, Methods & Means
HPC Libraries
Hartmut Kaiser, PhD
Center for Computation & Technology, Louisiana State University
April 19th, 2007

Outline
• Introduction to High Performance Libraries
• Linear Algebra Libraries (BLAS, LAPACK)
• PDE Solvers (PETSc)
• Mesh manipulation and load balancing (METIS/ParMETIS, JOSTLE)
• Special purpose libraries (FFTW)
• General purpose libraries (C++: Boost)
• Summary – Materials for test

Puzzle of the Day

    #include <stdio.h>

    int main()
    {
        int a = 10;
        switch (a) {
        case '1':
            printf("ONE\n");
            break;
        case '2':
            printf("TWO\n");
            break;
        defa1ut:
            printf("NONE\n");
        }
        return 0;
    }

If you expect the output of the above program to be NONE, I would request you to check it out!

Application domains
• Linear algebra – BLAS, ATLAS, LAPACK, ScaLAPACK, Slatec, PIM
• Ordinary and partial differential equations – PETSc
• Mesh manipulation and load balancing – METIS, ParMETIS, CHACO, JOSTLE, PARTY
• Graph manipulation – Boost.Graph library
• Vector/signal/image processing – VSIPL, PSSL
• General parallelization – MPI, pthreads
• Other domain-specific libraries – NAMD, NWChem, Fluent, Gaussian, LS-DYNA

Application Domain Overview
• Linear Algebra Libraries
– Provide optimized methods for constructing sets of linear equations, performing operations on them (matrix-matrix products, matrix-vector products) and solving them (factoring, forward & backward substitution).
– Commonly used libraries include BLAS, ATLAS, LAPACK, ScaLAPACK, PLAPACK.
• PDE Solvers
– General-purpose, parallel numerical PDE libraries; the usual toolsets include manipulation of sparse data structures, iterative linear system solvers, preconditioners, nonlinear solvers and time-stepping methods.
– Commonly used libraries for solving PDEs include SAMRAI, PETSc, PARASOL, Overture, among others.

Application Domain Overview
• Mesh manipulation and Load Balancing
– These libraries help partition meshes into roughly equal parts across processors, thereby balancing the workload while minimizing the size of separators and the communication costs.
– Commonly used libraries for this purpose include METIS, ParMETIS, Chaco, JOSTLE, among others.
• Other packages
– FFTW: a highly optimized Fourier transform package including both real and complex multidimensional transforms in sequential, multithreaded, and parallel versions (a short usage sketch follows this list).
– NAMD: molecular dynamics library available for Unix/Linux, Windows, OS X.
– Fluent: computational fluid dynamics package, used for such applications as environment control systems, propulsion, reactor modeling, etc.
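As a taste of what using one of these special-purpose packages looks like, the following is a minimal sketch of a one-dimensional complex transform with FFTW. It is not taken from the lecture; it assumes FFTW 3.x is installed and linked (e.g. with -lfftw3).

    #include <fftw3.h>
    #include <cmath>
    #include <cstdio>

    int main()
    {
        const int N = 16;
        const double pi = 3.141592653589793;

        fftw_complex* in  = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
        fftw_complex* out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);

        // Create a plan once; FFTW analyzes the transform size and picks a fast
        // strategy. The same plan can be reused for many transforms of this shape.
        fftw_plan plan = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);

        for (int i = 0; i < N; ++i) {              // input: a single cosine wave
            in[i][0] = std::cos(2.0 * pi * i / N); // real part
            in[i][1] = 0.0;                        // imaginary part
        }

        fftw_execute(plan);                        // compute the forward DFT

        for (int i = 0; i < N; ++i)
            std::printf("%2d: %8.4f %+8.4fi\n", i, out[i][0], out[i][1]);

        fftw_destroy_plan(plan);
        fftw_free(in);
        fftw_free(out);
        return 0;
    }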
BLAS
• (Updated set of) Basic Linear Algebra Subprograms
• The BLAS functionality is divided into three levels:
– Level 1: contains vector operations of the form y ← αx + y, as well as scalar dot products and vector norms
– Level 2: contains matrix-vector operations of the form y ← αAx + βy, as well as solving Tx = y for x with T being triangular
– Level 3: contains matrix-matrix operations of the form C ← αAB + βC, as well as solving B ← αT⁻¹B for triangular matrices T. This level contains the widely used General Matrix Multiply (GEMM) operation.

BLAS
• Several implementations for different languages exist
– Reference implementation (F77 and C): http://www.netlib.org/blas/
– ATLAS, highly optimized for particular processor architectures
– A generic C++ template class library providing BLAS functionality: uBLAS (http://www.boost.org)
– Several vendors provide libraries optimized for their architecture (AMD, HP, IBM, Intel, NEC, NVIDIA, Sun)

BLAS: F77 naming conventions
(Table not reproduced here: F77 routine names combine a precision prefix – S, D, C or Z – with a matrix-type code such as GE, GB, SY or TR and an operation code such as MV, MM or DOT, e.g. DGEMV = double-precision general matrix-vector multiply.)

BLAS: C naming conventions
• The F77 routine name is changed to lowercase and prefixed with cblas_
• All routines which accept two-dimensional arrays have a new additional first parameter specifying the matrix memory layout (row-major or column-major)
• Character parameters are replaced by corresponding enum values
• Input arguments are declared const
• Non-complex scalar input parameters are passed by value
• Complex scalar input arguments are passed using a void*
• Arrays are passed by address
• Output scalar arguments are passed by address
• Complex functions become subroutines which return the result via an additional last parameter (void*), appending _sub to the name
(These conventions are illustrated in a short CBLAS sketch after the Level 2 routines below.)

BLAS Level 1 routines
• Vector operations (xROT, xSWAP, xCOPY etc.)
• Scalar dot products (xDOT etc.)
• Vector norms (xNRM2, xASUM, IxAMAX etc.)

BLAS Level 2 routines
• Matrix-vector operations (xGEMV, xGBMV, xHEMV, xHBMV etc.)
• Solving Tx = y for x, where T is triangular (xTRSV, xTBSV etc.)
• Rank-1 and rank-2 updates (xGER, xHER etc.)
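The following minimal sketch (not from the original slides) shows the CBLAS conventions above in practice, combining a Level 1 dot product with a Level 2 matrix-vector product. It assumes a CBLAS implementation such as the Netlib reference or ATLAS is installed and linked (e.g. -lcblas).

    #include <cblas.h>
    #include <cstdio>

    int main()
    {
        // y <- alpha*A*x + beta*y with a 2x3 row-major matrix A (Level 2: cblas_dgemv)
        double A[2 * 3] = { 1, 2, 3,
                            4, 5, 6 };
        double x[3]     = { 1, 1, 1 };
        double y[2]     = { 0, 0 };

        cblas_dgemv(CblasRowMajor, CblasNoTrans,
                    2, 3,            // rows and columns of A
                    1.0, A, 3,       // alpha, A, leading dimension of A
                    x, 1,            // x and its stride
                    0.0, y, 1);      // beta, y and its stride

        // Dot product of x with itself (Level 1: cblas_ddot)
        double d = cblas_ddot(3, x, 1, x, 1);

        std::printf("y = [%g %g], x.x = %g\n", y[0], y[1], d);  // expect y = [6 15], x.x = 3
        return 0;
    }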
BLAS Level 3 routines
• Matrix-matrix operations (xGEMM etc.)
• Multiplying by and solving with triangular matrices (xTRMM, xTRSM)
• Widely used matrix-matrix multiply (xSYMM, xGEMM)

Demo 1
• Shows solving a matrix multiplication problem using BLAS expressed in FORTRAN, C, and C++
• Shows the genericity of uBLAS by comparing generic and banded matrix versions
• Shows newmat, a C++ matrix library which uses operator overloading

LAPACK
• Linear Algebra PACKage
– http://www.netlib.org/lapack/
– Written in F77
– Provides routines for
  • solving systems of simultaneous linear equations,
  • least-squares solutions of linear systems of equations,
  • eigenvalue problems,
  • Householder transformations to implement QR decomposition on a matrix, and
  • singular value problems
– Was initially designed to run efficiently on shared-memory vector machines
– Depends on BLAS
– Has been extended for distributed-memory systems (ScaLAPACK and PLAPACK)
(A small calling sketch follows the naming conventions below.)

LAPACK (Architecture)
(Diagram not reproduced: LAPACK's driver and computational routines are layered on top of the BLAS, which supply the machine-optimized kernels.)

LAPACK naming conventions
(Table not reproduced: routine names follow the pattern XYYZZZ, where X encodes the precision (S, D, C, Z), YY the matrix type (e.g. GE, SY, TR) and ZZZ the computation performed (e.g. SV = solve, EV = eigenvalues), e.g. DGESV.)
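To illustrate the naming scheme and calling style, here is a minimal sketch (not from the original slides) of solving a small linear system Ax = b with the LAPACK driver routine DGESV, calling the Fortran symbol directly from C++. The trailing underscore and pass-by-reference argument convention are the common, but compiler-dependent, Fortran interoperability convention; link against LAPACK and BLAS (e.g. -llapack -lblas).

    #include <cstdio>

    // Fortran symbol for DGESV: solve A x = b via LU factorization with partial pivoting.
    extern "C" void dgesv_(const int* n, const int* nrhs, double* a, const int* lda,
                           int* ipiv, double* b, const int* ldb, int* info);

    int main()
    {
        // LAPACK expects column-major (Fortran) storage: A = [[3, 1], [1, 2]]
        double A[4] = { 3.0, 1.0,     // first column
                        1.0, 2.0 };   // second column
        double b[2] = { 9.0, 8.0 };   // right-hand side, overwritten with the solution
        int n = 2, nrhs = 1, lda = 2, ldb = 2, ipiv[2], info;

        dgesv_(&n, &nrhs, A, &lda, ipiv, b, &ldb, &info);

        if (info == 0)
            std::printf("x = [%g %g]\n", b[0], b[1]);   // expected solution: x = [2 3]
        else
            std::printf("dgesv failed, info = %d\n", info);
        return 0;
    }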
Demo 2
• Shows how using a library might speed up the computation considerably

PETSc (pronounced PET-see)
• Portable, Extensible Toolkit for Scientific Computation (http://www-unix.mcs.anl.gov/petsc/petsc-as/)
– Suite of data structures and routines for the scalable (parallel) solution of scientific applications modeled by partial differential equations (PDEs)
– Employs the MPI standard for all message-passing communication
– Intended for use in large-scale application projects
– Includes a large suite of parallel linear and nonlinear equation solvers
– Easily used in application codes written in C, C++, Fortran and Python
• Good introduction: http://www-unix.mcs.anl.gov/petsc/petsc-as/documentation/tutorials/nersc02/nersc02.ppt

PETSc (general features)
• Features include:
– Parallel vectors
  • scatters (handle communicating ghost point information)
  • gathers
– Parallel matrices
  • several sparse storage formats
  • easy, efficient assembly
– Scalable parallel preconditioners
– Krylov subspace methods
– Parallel Newton-based nonlinear solvers
– Parallel time-stepping (ODE) solvers

PETSc (Architecture)
(Diagram not reproduced: PETSc module architecture and layers of abstraction.)

PETSc: Component details
• Vector operations (Vec): Provides the vector operations required for setting up and solving large-scale linear and nonlinear problems. Includes easy-to-use parallel scatter and gather operations, as well as special-purpose code for handling ghost points for regular data structures.
• Matrix operations (Mat): A large suite of data structures and code for the manipulation of parallel sparse matrices. Includes four different parallel matrix data structures, each appropriate for a different class of problems.
• Preconditioners (PC): A collection of sequential and parallel preconditioners, including
– (sequential) ILU(k) (incomplete factorization),
– LU (lower/upper decomposition),
– both sequential and parallel block Jacobi, and overlapping additive Schwarz methods
• Time-stepping ODE solvers (TS): Code for the time evolution of solutions of PDEs. In addition, provides pseudo-transient continuation techniques for computing steady-state solutions.

PETSc: Component details
• Krylov subspace solvers (KSP): Parallel implementations of many popular Krylov subspace iterative methods, including
– GMRES (Generalized Minimal Residual method),
– CG (Conjugate Gradient),
– CGS (Conjugate Gradient Squared),
– Bi-CG-Stab (BiConjugate Gradient Stabilized),
– two variants of TFQMR (transpose-free QMR),
– CR (Conjugate Residuals),
– LSQR (least squares).
All are coded so that they are immediately usable with any preconditioners and any matrix data structures, including matrix-free methods. (A short solver sketch follows at the end of this section.)
• Non-linear solvers (SNES): Data-structure-neutral implementations of Newton-like methods for nonlinear systems. Includes both line search and trust region techniques with a single interface. Employs by default the above data structures and linear solvers.
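To show how the Vec, Mat, KSP and PC components fit together, here is a minimal sketch (not from the original slides) of a parallel linear solve with PETSc: a 1-D Laplacian is assembled row by row and solved with a Krylov method. The exact calls differ slightly between PETSc versions (this follows the style of recent releases), and error checking with CHKERRQ/PetscCall is omitted for brevity.

    #include <petscksp.h>

    int main(int argc, char** argv)
    {
        PetscInitialize(&argc, &argv, NULL, NULL);

        const PetscInt n = 10;          // global problem size
        Mat A;
        Vec x, b;
        KSP ksp;

        // Assemble a 1-D Laplacian (tridiagonal) matrix; PETSc distributes the rows
        // over the processes of PETSC_COMM_WORLD.
        MatCreate(PETSC_COMM_WORLD, &A);
        MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
        MatSetFromOptions(A);
        MatSetUp(A);

        PetscInt rstart, rend;
        MatGetOwnershipRange(A, &rstart, &rend);
        for (PetscInt i = rstart; i < rend; ++i) {
            if (i > 0)     MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);
            if (i < n - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);
            MatSetValue(A, i, i, 2.0, INSERT_VALUES);
        }
        MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
        MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

        // Right-hand side b = 1 and solution vector x, laid out like the matrix rows.
        MatCreateVecs(A, &x, &b);
        VecSet(b, 1.0);

        // Krylov solver; the defaults (GMRES with ILU in serial, block Jacobi in
        // parallel) can be overridden at run time via -ksp_type / -pc_type.
        KSPCreate(PETSC_COMM_WORLD, &ksp);
        KSPSetOperators(ksp, A, A);
        KSPSetFromOptions(ksp);
        KSPSolve(ksp, b, x);

        KSPDestroy(&ksp);
        MatDestroy(&A);
        VecDestroy(&x);
        VecDestroy(&b);
        PetscFinalize();
        return 0;
    }

Because everything is created on PETSC_COMM_WORLD, the same program runs unchanged in serial or under MPI, and the solver and preconditioner can be switched at run time, e.g. mpiexec -n 4 ./solve -ksp_type cg -pc_type bjacobi.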