HPC Libraries


High Performance Computing: Concepts, Methods & Means
HPC Libraries
Hartmut Kaiser, PhD
Center for Computation & Technology, Louisiana State University
April 19th, 2007

Outline
• Introduction to High Performance Libraries
• Linear Algebra Libraries (BLAS, LAPACK)
• PDE Solvers (PETSc)
• Mesh manipulation and load balancing (METIS/ParMETIS, JOSTLE)
• Special purpose libraries (FFTW)
• General purpose libraries (C++: Boost)
• Summary – Materials for test

Puzzle of the Day

    #include <stdio.h>
    int main()
    {
        int a = 10;
        switch (a) {
        case '1': printf("ONE\n"); break;
        case '2': printf("TWO\n"); break;
        defa1ut: printf("NONE\n");
        }
        return 0;
    }

If you expect the output of the above program to be NONE, I would request you to check it out!

Application domains
• Linear algebra
  – BLAS, ATLAS, LAPACK, ScaLAPACK, Slatec, pim
• Ordinary and partial differential equations
  – PETSc
• Mesh manipulation and load balancing
  – METIS, ParMETIS, CHACO, JOSTLE, PARTY
• Graph manipulation
  – Boost.Graph library
• Vector/signal/image processing
  – VSIPL, PSSL
• General parallelization
  – MPI, pthreads
• Other domain-specific libraries
  – NAMD, NWChem, Fluent, Gaussian, LS-DYNA

Application Domain Overview
• Linear Algebra Libraries
  – Provide optimized methods for constructing sets of linear equations, performing operations on them (matrix-matrix products, matrix-vector products) and solving them (factoring, forward and backward substitution).
  – Commonly used libraries include BLAS, ATLAS, LAPACK, ScaLAPACK, PLAPACK
• PDE Solvers
  – General-purpose, parallel numerical PDE libraries
  – Usual toolsets include manipulation of sparse data structures, iterative linear system solvers, preconditioners, nonlinear solvers and time-stepping methods.
  – Commonly used libraries for solving PDEs include SAMRAI, PETSc, PARASOL and Overture, among others.

Application Domain Overview
• Mesh manipulation and Load Balancing
  – These libraries partition meshes into roughly equal parts across processors, balancing the workload while minimizing the size of separators and the communication costs.
  – Commonly used libraries for this purpose include METIS, ParMETIS, Chaco and JOSTLE, among others.
• Other packages
  – FFTW: a highly optimized Fourier transform package, including both real and complex multidimensional transforms in sequential, multithreaded, and parallel versions.
  – NAMD: molecular dynamics library available for Unix/Linux, Windows, OS X
  – Fluent: computational fluid dynamics package, used for applications such as environment control systems, propulsion, reactor modeling etc.

BLAS
• (Updated set of) Basic Linear Algebra Subprograms
• The BLAS functionality is divided into three levels:
  – Level 1: contains vector operations of the form $y \leftarrow \alpha x + y$, as well as scalar dot products and vector norms
  – Level 2: contains matrix-vector operations of the form $y \leftarrow \alpha A x + \beta y$, as well as solving $T x = y$ for $x$ with $T$ being triangular
  – Level 3: contains matrix-matrix operations of the form $C \leftarrow \alpha A B + \beta C$, as well as solving $B \leftarrow \alpha T^{-1} B$ for triangular matrices $T$. This level contains the widely used General Matrix Multiply (GEMM) operation.
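
The three BLAS levels can be illustrated with plain, unoptimized C loops. This is only a sketch of what each level computes, not how any BLAS implementation is written: the function names `ref_axpy`, `ref_gemv` and `ref_gemm` are hypothetical, and row-major storage without separate leading dimensions is an assumption made to keep the loops short.

```c
#include <assert.h>
#include <stddef.h>

/* Level 1 (vector-vector): y <- alpha*x + y.
   Hypothetical helper; the corresponding BLAS routine family is xAXPY. */
void ref_axpy(size_t n, double alpha, const double *x, double *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}

/* Level 2 (matrix-vector): y <- alpha*A*x + beta*y.
   A is m-by-n, stored row-major (an assumption of this sketch). */
void ref_gemv(size_t m, size_t n, double alpha, const double *a,
              const double *x, double beta, double *y)
{
    assert(a != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < m; ++i) {
        double sum = 0.0;
        for (size_t j = 0; j < n; ++j)
            sum += a[i * n + j] * x[j];
        y[i] = alpha * sum + beta * y[i];
    }
}

/* Level 3 (matrix-matrix): C <- alpha*A*B + beta*C.
   A is m-by-k, B is k-by-n, C is m-by-n, all row-major. */
void ref_gemm(size_t m, size_t n, size_t k, double alpha,
              const double *a, const double *b, double beta, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t p = 0; p < k; ++p)
                sum += a[i * k + p] * b[p * n + j];
            c[i * n + j] = alpha * sum + beta * c[i * n + j];
        }
    }
}
```

The amount of arithmetic per data element grows with the level: Level 1 does O(n) work on O(n) data, Level 2 does O(n^2) work on O(n^2) data, and Level 3 does O(n^3) work on O(n^2) data, which is why tuned Level 3 routines tend to benefit most from cache blocking.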

BLAS
• Several implementations for different languages exist
  – Reference implementation (F77 and C): http://www.netlib.org/blas/
  – ATLAS, highly optimized for particular processor architectures
  – A generic C++ template class library providing BLAS functionality: uBLAS, http://www.boost.org
  – Several vendors provide libraries optimized for their architecture (AMD, HP, IBM, Intel, NEC, NVIDIA, Sun)

BLAS: F77 naming conventions

BLAS: C naming conventions
• The F77 routine name is changed to lowercase and prefixed with cblas_
• All routines which accept two-dimensional arrays have an additional first parameter specifying the matrix memory layout (row major or column major)
• Character parameters are replaced by corresponding enum values
• Input arguments are declared const
• Non-complex scalar input parameters are passed by value
• Complex scalar input arguments are passed using a void*
• Arrays are passed by address
• Output scalar arguments are passed by address
• Complex functions become subroutines which return the result via an additional last parameter (void*), appending _sub to the name

BLAS Level 1 routines
• Vector operations (xROT, xSWAP, xCOPY etc.)
• Scalar dot products (xDOT etc.)
• Vector norms (xNRM2, xASUM etc.) and index of the largest element (IxAMAX)

BLAS Level 2 routines
• Matrix-vector operations (xGEMV, xGBMV, xHEMV, xHBMV etc.)
• Solving Tx = y for x, where T is triangular (xTRSV, xTBSV etc.)
• Rank-one updates (xGER, xHER etc.)

BLAS Level 3 routines
• Matrix-matrix operations (xGEMM etc.)
• Solving for triangular matrices (xTRSM)
• Widely used matrix-matrix multiply (xSYMM, xGEMM)

Demo 1
• Shows solving a matrix multiplication problem using BLAS expressed in FORTRAN, C, and C++
• Shows genericity of uBLAS by comparing generic and banded matrix versions
• Shows newmat, a C++ matrix library which uses operator overloading

LAPACK
• Linear Algebra PACKage
  – http://www.netlib.org/lapack/
  – Written in F77
  – Provides routines for
    • Solving systems of simultaneous linear equations,
    • Least-squares solutions of linear systems of equations,
    • Eigenvalue problems,
    • QR decomposition of a matrix via Householder transformations, and
    • Singular value problems
  – Was initially designed to run efficiently on shared-memory vector machines
  – Depends on BLAS
  – Has been extended for distributed-memory (MIMD) systems (ScaLAPACK and PLAPACK)

LAPACK (Architecture)

LAPACK naming conventions

Demo 2
• Shows how using a library might speed up the computation considerably

PETSc (pronounced PET-see)
• Portable, Extensible Toolkit for Scientific Computation (http://www-unix.mcs.anl.gov/petsc/petsc-as/)
  – Suite of data structures and routines for the scalable (parallel) solution of scientific applications modeled by partial differential equations (PDEs)
  – Employs the MPI standard for all message-passing communication
  – Intended for use in large-scale application projects
  – Includes a large suite of parallel linear and nonlinear equation solvers
  – Easily used in application codes written in C, C++, Fortran and Python
• Good introduction: http://www-unix.mcs.anl.gov/petsc/petsc-as/documentation/tutorials/nersc02/nersc02.ppt

PETSc (general features)
• Features include:
  – Parallel vectors
    • Scatters (handle communication of ghost point information)
    • Gathers
  – Parallel matrices
    • Several sparse storage formats
    • Easy, efficient assembly
  – Scalable parallel preconditioners
  – Krylov subspace methods
  – Parallel Newton-based nonlinear solvers
  – Parallel time stepping (ODE) solvers

PETSc (Architecture)
[Figure: PETSc module architecture and layers of abstraction]

PETSc: Component details
• Vector operations (Vec): Provides the vector operations required for setting up and solving large-scale linear and nonlinear problems. Includes easy-to-use parallel scatter and gather operations, as well as special-purpose code for handling ghost points for regular data structures.
• Matrix operations (Mat): A large suite of data structures and code for the manipulation of parallel sparse matrices. Includes four different parallel matrix data structures, each appropriate for a different class of problems.
• Preconditioners (PC): A collection of sequential and parallel preconditioners, including
  – (sequential) ILU(k) (incomplete factorization),
  – LU (lower/upper decomposition),
  – both sequential and parallel block Jacobi, overlapping additive Schwarz methods
• Time stepping ODE solvers (TS): Code for the time evolution of solutions of PDEs. In addition, provides pseudo-transient continuation techniques for computing steady-state solutions.
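
To make the idea of time stepping a PDE concrete, here is a minimal sketch in plain C of explicit (forward) Euler steps for the 1D heat equation u_t = u_xx on a uniform grid. It does not use PETSc or its TS interface; the names `heat_step` and `heat_evolve`, the fixed boundary values, and the choice of forward Euler are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One forward Euler step for u_t = u_xx on a uniform grid.
   lambda = dt / dx^2; this explicit scheme is stable only for lambda <= 0.5.
   The boundary values u[0] and u[n-1] are held fixed (Dirichlet). */
void heat_step(size_t n, double lambda, const double *u, double *u_new)
{
    assert(n >= 2 && lambda > 0.0 && lambda <= 0.5);
    u_new[0] = u[0];
    u_new[n - 1] = u[n - 1];
    for (size_t i = 1; i + 1 < n; ++i)
        u_new[i] = u[i] + lambda * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
}

/* Advance the solution by `steps` time steps.
   `work` is a caller-provided scratch array of length n;
   the result is left in `u`. */
void heat_evolve(size_t n, double lambda, int steps, double *u, double *work)
{
    for (int s = 0; s < steps; ++s) {
        heat_step(n, lambda, u, work);
        for (size_t i = 0; i < n; ++i)
            u[i] = work[i];
    }
}
```

Marching such a scheme forward until the solution stops changing is the simplest way to reach a steady state by time evolution. A production time stepper would typically add implicit methods and adaptive step sizes, which this sketch leaves out.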

PETSc: Component details
• Krylov subspace solvers (KSP): Parallel implementations of many popular Krylov subspace iterative methods, including
  – GMRES (Generalized Minimal Residual method),
  – CG (Conjugate Gradient),
  – CGS (Conjugate Gradient Squared),
  – Bi-CG-Stab (BiConjugate Gradient Stabilized),
  – two variants of TFQMR (transpose-free Quasi-Minimal Residual),
  – CR (Conjugate Residuals),
  – LSQR (a least-squares solver).
  All are coded so that they are immediately usable with any preconditioner and any matrix data structure, including matrix-free methods.
• Non-linear solvers (SNES): Data-structure-neutral implementations of Newton-like methods for nonlinear systems. Includes both line search and trust region techniques with a single interface. Employs by default the above data structures and linear solvers.
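
Krylov methods such as CG only need the action of the matrix on a vector, which is what makes matrix-free use possible. The following is a minimal, sequential sketch of unpreconditioned CG in plain C with the operator supplied as a callback. It is not PETSc's KSP interface: `apply_fn`, `cg_solve` and `laplace_1d` are hypothetical names, and the operator is assumed to be symmetric positive definite.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical callback type: computes y = A*x for an operator described by ctx. */
typedef void (*apply_fn)(const void *ctx, size_t n, const double *x, double *y);

/* Dot product of two vectors of length n. */
static double dot(size_t n, const double *x, const double *y)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i] * y[i];
    return s;
}

/* Unpreconditioned conjugate gradient for a symmetric positive definite operator.
   x holds the initial guess on entry and the approximate solution on exit.
   Stops when ||r|| <= tol * ||b||. Returns the number of iterations used,
   or -1 on allocation failure or if max_iter is reached without convergence. */
int cg_solve(apply_fn apply, const void *ctx, size_t n,
             const double *b, double *x, double tol, int max_iter)
{
    assert(apply != NULL && b != NULL && x != NULL);
    double *r = malloc(n * sizeof *r);
    double *p = malloc(n * sizeof *p);
    double *ap = malloc(n * sizeof *ap);
    int iters = -1;

    if (r != NULL && p != NULL && ap != NULL) {
        apply(ctx, n, x, ap);                 /* ap = A*x0 */
        for (size_t i = 0; i < n; ++i) {
            r[i] = b[i] - ap[i];              /* initial residual */
            p[i] = r[i];                      /* first search direction */
        }
        double rr = dot(n, r, r);
        const double stop = tol * tol * dot(n, b, b);
        int k = 0;
        while (rr > stop && k < max_iter) {
            apply(ctx, n, p, ap);             /* the only use of the operator */
            double alpha = rr / dot(n, p, ap);
            for (size_t i = 0; i < n; ++i) {
                x[i] += alpha * p[i];
                r[i] -= alpha * ap[i];
            }
            double rr_new = dot(n, r, r);
            double beta = rr_new / rr;
            for (size_t i = 0; i < n; ++i)
                p[i] = r[i] + beta * p[i];
            rr = rr_new;
            ++k;
        }
        if (rr <= stop)
            iters = k;
    }
    free(r);
    free(p);
    free(ap);
    return iters;
}

/* Example matrix-free operator: the 1D Laplacian stencil (-1, 2, -1)
   with zero values outside the grid. No matrix is ever stored. */
void laplace_1d(const void *ctx, size_t n, const double *x, double *y)
{
    (void)ctx;
    for (size_t i = 0; i < n; ++i) {
        double left = (i > 0) ? x[i - 1] : 0.0;
        double right = (i + 1 < n) ? x[i + 1] : 0.0;
        y[i] = 2.0 * x[i] - left - right;
    }
}
```

Calling `cg_solve(laplace_1d, NULL, n, b, x, 1e-10, 50)` with `x` initialized to zero solves the stencil system without ever forming the matrix. A library solver would add preconditioning and parallel vector operations on top of the same structure.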