Introduction to Numerical Libraries for HPC
Bilel Hadri
Computational Scientist, KAUST Supercomputing Lab

Numerical Libraries – Application Areas
• Most used libraries/software in HPC!
• Linear Algebra
  – Systems of equations
    • Direct, iterative, multigrid solvers
    • Sparse, dense systems
  – Eigenvalue problems
  – Least squares
• Signal processing
  – FFT
• Numerical Integration
• Random Number Generators

Numerical Libraries – Motivations
• Don't reinvent the wheel!
• Improves productivity!
• Get better performance!
  – Faster and better algorithms

Faster (Better Code)
• Achieving best performance requires creating very processor- and system-specific code
• Example: Dense matrix-matrix multiply
  – Simple to express:

      do i = 1, n
        do j = 1, n
          do k = 1, n
            c(i,j) = c(i,j) + a(i,k) * b(k,j)
          enddo
        enddo
      enddo

Performance
• How fast should this run?
  – Our matrix-matrix multiply algorithm has 2n^3 floating point operations
    • 3 nested loops, each with n iterations
    • 1 multiply, 1 add in each inner iteration
  – For n=100, 2×10^6 operations, about 1 msec on a 2 GHz processor
  – For n=1000, 2×10^9 operations, or about 1 sec
• Reality:
  – N=100: 1.1 ms
  – N=1000: 6 s
→ The obvious expression of an algorithm does not translate into leading performance.

Numerical Libraries – Packages
• Linear Algebra
  – BLAS/LAPACK/ScaLAPACK/PLASMA/MAGMA/HiCMA
  – PETSc
  – HYPRE
  – TRILINOS
• Signal processing
  – FFTW
• Numerical Integration
  – GSL
• Random Number Generators
  – SPRNG

Others
MUMPS 4.9.2. MUMPS (MUltifrontal Massively Parallel sparse direct Solver) is a package of parallel, sparse, direct linear system solvers based on a multifrontal algorithm. http://graal.ens-lyon.fr/MUMPS/
SuperLU 4.3.
SuperLU is the sequential counterpart of SuperLU_DIST; it also provides a sequential incomplete LU preconditioner that can accelerate the convergence of Krylov subspace iterative solvers. http://crd.lbl.gov/~xiaoye/SuperLU/
ParMETIS 4.0.2. ParMETIS (Parallel Graph Partitioning and Fill-reducing Matrix Ordering) is a library of routines that partition unstructured graphs and meshes and compute fill-reducing orderings of sparse matrices. http://glaros.dtc.umn.edu/gkhome/views/metis/

• SUNDIALS 2.5.0 (SUite of Nonlinear and DIfferential/Algebraic equation Solvers) consists of 5 solvers: CVODE, CVODES, IDA, IDAS, and KINSOL. In addition, SUNDIALS provides a MATLAB interface to CVODES, IDAS, and KINSOL called sundialsTB. https://computation.llnl.gov/casc/sundials/main.html
• Scotch 5.1.12b. Scotch is a software package and set of libraries for sequential and parallel graph partitioning, static mapping, sparse matrix block ordering, and sequential mesh and hypergraph partitioning. http://www.labri.fr/perso/pelegrin/scotch
Note: On Shaheen, they are all grouped into cray-tpsl.
• Freely Available Software for Linear Algebra: http://www.netlib.org/utk/people/JackDongarra/la-sw.html

BLAS (Basic Linear Algebra Subprograms)
The BLAS functionality is divided into three levels:
• Level 1: contains vector operations of the form y ← αx + y
• Level 2: contains matrix-vector operations of the form y ← αAx + βy
• Level 3: contains matrix-matrix operations of the form C ← αAB + βC
• Several implementations for different languages exist
  – Reference implementation: http://www.netlib.org/blas/

BLAS: naming conventions
• Each routine has a name that specifies the operation, the type of matrices involved, and their precision. Names are of the form "PMMOO".
  – Each operation is defined for four precisions (P)
    • S single real
    • D double real
    • C single complex
    • Z double complex
  – The types of matrices are (MM)
    • GE general
    • GB general band
    • SY symmetric
    • SB symmetric band
    • SP symmetric packed
    • HE hermitian
    • HB hermitian band
    • HP hermitian packed
    • TR triangular
    • TB triangular band
    • TP triangular packed
  – Some of the most common operations (OO):
    • DOT scalar product, x^T y
    • AXPY vector sum, α x + y
    • MV matrix-vector product, A x
    • SV matrix-vector solve, inv(A) x
    • MM matrix-matrix product, A B
    • SM matrix-matrix solve, inv(A) B
• Examples
  SGEMM stands for "single-precision general matrix-matrix multiply".
  DGEMM stands for "double-precision general matrix-matrix multiply".

BLAS Level 1 routines
• Vector operations (xROT, xSWAP, xCOPY etc.)
• Scalar dot products (xDOT etc.)
• Vector norms (xNRM2, xASUM etc.) and index of the largest-magnitude element (IxAMAX)

BLAS Level 2 routines
• Matrix-vector operations (xGEMV, xGBMV, xHEMV, xHBMV etc.)
• Rank-1 updates (xGER, xHER etc.)
• Solving Tx = y for x, where T is triangular (xTRSV etc.)

BLAS Level 3 routines
• Matrix-matrix operations (xGEMM etc.)
• Solving for triangular matrices (xTRSM)
• Widely used matrix-matrix multiply (xSYMM, xGEMM)

LAPACK (Linear Algebra PACKage)
• http://www.netlib.org/lapack/
  – Provides routines for
    • Solving systems of simultaneous linear equations,
    • Least-squares solutions of linear systems of equations,
    • Eigenvalue problems,
    • Householder transformations to implement the QR decomposition of a matrix, and
    • Singular value problems
  – Was initially designed to run efficiently on shared-memory vector machines
  – Depends on BLAS
  – Has been extended to distributed-memory systems as ScaLAPACK (Scalable Linear Algebra PACKage)

LAPACK naming conventions
• Very similar to BLAS
  – XYYZZZ
• X: data type
  – S: REAL
  – D: DOUBLE PRECISION
  – C: COMPLEX
  – Z: COMPLEX*16 or DOUBLE COMPLEX
• YY: matrix type
  – BD: bidiagonal
  – DI: diagonal
  – GB: general band
  – GE: general (i.e., unsymmetric, in some cases rectangular)
  – GG: general matrices, generalized problem (i.e., a pair of general matrices)
  – GT: general tridiagonal
  – HB: (complex) Hermitian band
  – HE: (complex) Hermitian
  – HG: upper Hessenberg matrix, generalized problem (i.e., a Hessenberg and a triangular matrix)
  – HP: (complex) Hermitian, packed storage
  – HS: upper Hessenberg
  – OP: (real) orthogonal, packed storage
  – OR: (real) orthogonal
  – PB: symmetric or Hermitian positive definite band
  – PO: symmetric or Hermitian positive definite
  – PP: symmetric or Hermitian positive definite, packed storage
  – PT: symmetric or Hermitian positive definite tridiagonal
  – SB: (real) symmetric band
  – SP: symmetric, packed storage
  – ST: (real) symmetric tridiagonal
  – SY: symmetric
  – TB: triangular band
  – TG: triangular matrices, generalized problem (i.e., a pair of triangular matrices)
  – TP: triangular, packed storage
  – TR: triangular (or in some cases quasi-triangular)
  – TZ: trapezoidal
  – UN: (complex) unitary
  – UP: (complex) unitary, packed storage
• ZZZ: performed computation
  – Linear systems
  – Factorizations
  – Eigenvalue problems
  – Singular value decomposition
  – Etc.

LAPACK routines
http://www.icl.utk.edu/~mgates3/docs/lapack-summary.pdf

Numerical Libraries packages
Dongarra/ICL
• Vendor libraries provide optimized implementations of BLAS, LAPACK, and ScaLAPACK for their processors and platforms.

LAPACK & ScaLAPACK
• ScaLAPACK is a library with a subset of LAPACK routines running on distributed-memory machines.
• ScaLAPACK is designed for heterogeneous computing and is portable to any computer that supports MPI or PVM.
• http://www.netlib.org/scalapack/scalapack_home.html

Overview of ScaLAPACK

Why use LAPACK or ScaLAPACK?
• Solving systems of:
  – Linear equations: Ax = b
  – Least squares: min ||Ax − b||_2
  – Eigenvalue problem: Ax = λx
  – Singular value problem: A = U S V^T

Reference BLAS vs Tuned
• The reference BLAS and LAPACK libraries are reference implementations of the BLAS and LAPACK standards. They are neither optimised nor multi-threaded, so not much performance should be expected. These libraries are available for download from http://www.netlib.org/blas and http://www.netlib.org/lapack
• The Automatically Tuned Linear Algebra Software, ATLAS. At compile time, ATLAS automatically chooses the algorithms delivering the best performance. ATLAS does not contain all LAPACK functionality; it can be downloaded from http://www.netlib.org/atlas
• The Goto BLAS is an implementation of the Level 3 BLAS aimed at high efficiency.
The Goto BLAS is available for download from http://www.tacc.utexas.edu/resources/software

Optimized vendor libraries for BLAS/LAPACK
• Highly efficient versions
• Hand-tuned assembly by hardware vendors
• Provide near-peak performance
• Several vendors provide libraries optimized for their architecture (AMD, Fujitsu, IBM, Intel, NEC, …)
  – Intel → MKL
  – Cray → LibSci
  – AMD → ACML
  – IBM → ESSL
• USE them! (Speedups of 10x or more)

AMD / MKL
• ACML (AMD Core Math Library)
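
The operation count from the Performance slide can be checked with a small sketch. The Python below is an illustration only, not code from the slides: the names `naive_matmul` and `ideal_seconds` are hypothetical, and the estimate assumes one floating point operation per clock cycle on a 2 GHz processor, which appears to be the assumption behind the slide's "about 1 msec" figure.

```python
def naive_matmul(a, b):
    """Triple-loop product of two n x n matrices (lists of rows).

    Returns the result matrix and the number of floating point
    operations performed, so the 2*n**3 count can be verified.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    flops = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
                flops += 2  # one multiply and one add per inner iteration
    return c, flops


def ideal_seconds(n, flops_per_second=2.0e9):
    """Idealised run time for 2*n**3 operations.

    Assumption (not from the slides): one operation per cycle at 2 GHz.
    """
    return 2 * n**3 / flops_per_second


if __name__ == "__main__":
    c, flops = naive_matmul([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]])
    print(c, flops)
    print(ideal_seconds(100), ideal_seconds(1000))
```

Under these assumptions the estimate gives 1 ms for n=100 and 1 s for n=1000. The measured 6 s for n=1000 quoted on the slide is the gap that tuned BLAS implementations are meant to close.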

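The "PMMOO" naming convention for BLAS routines can be sketched as a small lookup. This is a hypothetical helper written for illustration (`decode_blas_name` is not part of any BLAS library), and it only covers the five-letter Level 2 and Level 3 names built from the precision, matrix type, and operation codes listed on the naming slide; Level 1 names such as DDOT or DAXPY carry no matrix-type field and are rejected.

```python
# Codes taken from the "BLAS: naming conventions" slide.
PRECISIONS = {
    "S": "single real",
    "D": "double real",
    "C": "single complex",
    "Z": "double complex",
}
MATRIX_TYPES = {
    "GE": "general", "GB": "general band",
    "SY": "symmetric", "SB": "symmetric band", "SP": "symmetric packed",
    "HE": "hermitian", "HB": "hermitian band", "HP": "hermitian packed",
    "TR": "triangular", "TB": "triangular band", "TP": "triangular packed",
}
OPERATIONS = {
    "MV": "matrix-vector product",
    "SV": "matrix-vector solve",
    "MM": "matrix-matrix product",
    "SM": "matrix-matrix solve",
}


def decode_blas_name(name):
    """Split a five-letter BLAS name into (precision, matrix type, operation)."""
    name = name.upper()
    if len(name) != 5:
        raise ValueError(f"expected a 5-letter PMMOO name, got {name!r}")
    precision, matrix, operation = name[0], name[1:3], name[3:5]
    if (precision not in PRECISIONS
            or matrix not in MATRIX_TYPES
            or operation not in OPERATIONS):
        raise ValueError(f"unrecognised code in {name!r}")
    return PRECISIONS[precision], MATRIX_TYPES[matrix], OPERATIONS[operation]


if __name__ == "__main__":
    print(decode_blas_name("DGEMM"))
    print(decode_blas_name("ztrsv"))
```

Not every combination the helper accepts is an actual BLAS routine; the lookup only demonstrates how the three fields of a name are read.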