
Numerical Linear Algebra for High Performance Computing
SBD Workshop: Transfer of HPC and Data Know-How to Scientific Communities
June 11th/12th 2018, Juelich Supercomputing Centre (JSC)
Hartwig Anzt, Terry Cojean, Goran Flegar, Thomas Grützmacher, Pratik Nayak, Tobias Ribizel
Steinbuch Centre for Computing (SCC)
KIT – The Research University in the Helmholtz Association
www.kit.edu

Algorithms reflecting hardware evolution
• Explosion in core count.
  "Parallelism needed – synchronization kills performance."
• Compute power (#FLOPs) grows much faster than bandwidth (compare the cost of one 64-bit read against one DP-FLOP; John D. McCalpin, TACC).
  "Operations are free, memory access is what counts."

Task-based algorithms
• Define work packages as "tasks".
• Identify dependencies between tasks.
• Break the fork-join model.
• Synchronize locally, avoid global synchronization.
• Runtimes: OpenMP tasks, OmpSs, PaRSEC, StarPU (see the task sketch below).

Reformulate algorithms in terms of fixed-point iterations
• A fixed-point iteration that converges in the asymptotic sense can tolerate a lack of synchronization.
• Element-wise independent iterations allow scaling on multi- and many-core architectures.

Fixed-point based algorithms: ParILU
We want to solve a linear problem of the form Ax = b. For this, we factorize A into the product of a lower triangular matrix L and an upper triangular matrix U.

Exact LU factorization
• Decompose the system matrix into the product A = L · U.
• Based on Gaussian elimination.
• Two triangular solves then solve the system Ax = b:
  Ly = b ⇒ y,  Ux = y ⇒ x  (sketched in code below).
• De-facto standard for solving dense problems.
• What about sparse? Often significant fill-in…
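To make the task-based pattern above concrete, here is a minimal sketch using OpenMP tasks, one of the runtimes listed above. The three work packages and the dependency variables x, y, z are hypothetical stand-ins; a real code would attach tasks to matrix blocks. Compile with -fopenmp; without it the pragmas are ignored and the code runs sequentially.

```c
#include <stdio.h>

int main(void) {
    double x = 0.0, y = 0.0, z = 0.0;
    #pragma omp parallel
    #pragma omp single
    {
        /* two independent work packages */
        #pragma omp task depend(out: x)
        x = 1.0;
        #pragma omp task depend(out: y)
        y = 2.0;
        /* this task starts only once both inputs are ready:
         * a local dependency instead of a global barrier */
        #pragma omp task depend(in: x, y) depend(out: z)
        z = x + y;
    }   /* the parallel region's implied barrier waits for all tasks */
    printf("z = %f\n", z);
    return 0;
}
```

The depend clauses are the point: the runtime orders only the tasks that actually share data, so independent work proceeds without a fork-join synchronization.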
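The two triangular solves from the LU slide are plain forward and backward substitution. A minimal dense sketch, assuming a unit lower triangular L as produced by Gaussian elimination; the 3×3 factors and right-hand side are made up for illustration.

```c
#include <stdio.h>

#define N 3

/* Forward substitution: solve L y = b, L unit lower triangular. */
void lower_solve(const double L[N][N], const double b[N], double y[N]) {
    for (int i = 0; i < N; ++i) {
        double s = b[i];
        for (int j = 0; j < i; ++j) s -= L[i][j] * y[j];
        y[i] = s;                        /* L[i][i] == 1 */
    }
}

/* Backward substitution: solve U x = y, U upper triangular. */
void upper_solve(const double U[N][N], const double y[N], double x[N]) {
    for (int i = N - 1; i >= 0; --i) {
        double s = y[i];
        for (int j = i + 1; j < N; ++j) s -= U[i][j] * x[j];
        x[i] = s / U[i][i];
    }
}

int main(void) {
    /* hypothetical unit-lower / upper triangular pair */
    double L[N][N] = {{1, 0, 0}, {0.5, 1, 0}, {0.25, 0.5, 1}};
    double U[N][N] = {{4, 2, 1}, {0, 3, 1}, {0, 0, 2}};
    double b[N] = {7, 8, 9}, y[N], x[N];
    lower_solve(L, b, y);   /* L y = b  =>  y */
    upper_solve(U, y, x);   /* U x = y  =>  x */
    for (int i = 0; i < N; ++i) printf("x[%d] = %f\n", i, x[i]);
    return 0;
}
```

Note the inherently sequential data flow: each y[i] needs all earlier entries, which is exactly why sparse triangular solves parallelize poorly.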
Incomplete LU factorization (ILU): A ≈ L · U
• Restricts fill-in to a specific sparsity pattern S:
  L ∈ R^(n×n) lower (unit-) triangular, sparse,
  U ∈ R^(n×n) upper triangular, sparse,
  L_ij = U_ij = 0  ∀ (i,j) ∉ S,
  R = A − L · U  with  R_ij = 0  ∀ (i,j) ∈ S.
• For ILU(0), S is the sparsity pattern of A (see the ILU(0) sketch below).
• Works well for many problems.
• Is this the best preconditioner we can get for this nonzero count?
• Fill-in in threshold ILU (ILUT) is based on the significance of elements (e.g., their magnitude); a sketch of the dropping rule follows below.
• ILUT often yields better preconditioners than level-based ILU.
• ILUT is difficult to parallelize.
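As referenced above, a minimal ILU(0) sketch: Gaussian elimination (IKJ variant) restricted to the pattern S, here a 0/1 mask assumed to contain the diagonal. A real implementation would work on a sparse (e.g. CSR) matrix; the test matrix is hypothetical.

```c
#include <stdio.h>
#define N 4

/* Pattern-restricted Gaussian elimination on a dense array. On return,
 * A holds U in its upper part and the strictly lower part of L
 * (the unit diagonal of L is implied). */
void ilu0(double A[N][N], const int S[N][N]) {
    for (int i = 1; i < N; ++i) {
        for (int k = 0; k < i; ++k) {
            if (!S[i][k]) continue;          /* l_ik not in the pattern */
            A[i][k] /= A[k][k];              /* l_ik = a_ik / u_kk */
            for (int j = k + 1; j < N; ++j)
                if (S[i][j])                 /* fill-in outside S is dropped */
                    A[i][j] -= A[i][k] * A[k][j];
        }
    }
}

int main(void) {
    /* hypothetical diagonally dominant test matrix */
    double A[N][N] = {{ 4, -1,  0, -1},
                      {-1,  4, -1,  0},
                      { 0, -1,  4, -1},
                      {-1,  0, -1,  4}};
    int S[N][N];
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            S[i][j] = (A[i][j] != 0.0);      /* S = pattern of A: ILU(0) */
    ilu0(A, S);
    printf("u_11 = %f, l_21 = %f\n", A[0][0], A[1][0]);
    return 0;
}
```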
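The ILUT selection criterion can likewise be sketched as a dropping rule: after a row of the factorization has been computed, entries that are insignificant relative to the row are discarded. The helper below is hypothetical and not the full ILUT algorithm; measuring significance against the row 2-norm is one common choice.

```c
#include <math.h>
#include <stddef.h>

/* Drop every entry whose magnitude is at most tau times the 2-norm
 * of the row; what remains defines the pattern S for this row. */
void drop_small_entries(double *row, size_t n, double tau) {
    double norm = 0.0;
    for (size_t j = 0; j < n; ++j)
        norm += row[j] * row[j];
    norm = sqrt(norm);
    for (size_t j = 0; j < n; ++j)
        if (fabs(row[j]) <= tau * norm)
            row[j] = 0.0;                    /* entry dropped as fill-in */
}
```

Because S now depends on the computed values of each row, the pattern is only discovered during the (sequential) elimination, which is one reason ILUT is difficult to parallelize.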
How to generate an ILU in parallel?
• Gaussian elimination is naturally sequential.
• Level-scheduling brings only limited parallelism.
• A ≈ L · U is usually only a rough approximation anyway.
• We would be better off if we could cheaply generate an approximation in parallel.

Fixed-point based algorithms: ParILU
• Generate the incomplete factorization preconditioner via an iterative process.
• Exploit the property (A − L · U)_S = 0: the residual R = A − L · U vanishes on the sparsity pattern S (a sketch of one fixed-point sweep follows below).
[Figure: entry-wise illustration of A − L · U restricted to the pattern S, with the unknown factor entries marked "?"]
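Written out entry-wise, (A − L · U)_S = 0 gives one equation per position (i,j) ∈ S, which can be solved for the corresponding unknown:
  l_ij = (a_ij − Σ_{k<j} l_ik · u_kj) / u_jj   for i > j,
  u_ij = a_ij − Σ_{k<i} l_ik · u_kj            for i ≤ j.
Sweeping over all these updates concurrently is the fixed-point iteration behind ParILU (the iterative ILU of Chow and Patel). A minimal dense sketch follows; the initial guess, the fixed sweep count, and the test matrix are simple choices for illustration, not the definitive implementation.

```c
#include <stdio.h>
#define N 4

/* One fixed-point sweep: every entry (i,j) in the pattern S is updated
 * independently from the current factors, so all updates may run in
 * parallel. Concurrent sweeps may read values written in the same
 * sweep; the fixed-point iteration tolerates this inconsistency,
 * which is exactly what removes the need for synchronization. */
void parilu_sweep(const double A[N][N], double L[N][N], double U[N][N],
                  const int S[N][N]) {
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            if (!S[i][j]) continue;
            int m = i < j ? i : j;           /* sum runs over k < min(i,j) */
            double s = A[i][j];
            for (int k = 0; k < m; ++k)
                s -= L[i][k] * U[k][j];
            if (i > j)
                L[i][j] = s / U[j][j];       /* strictly lower entry of L */
            else
                U[i][j] = s;                 /* entry of U (incl. diagonal) */
        }
}

int main(void) {
    /* hypothetical diagonally dominant test matrix; S = pattern of A */
    double A[N][N] = {{ 4, -1,  0, -1},
                      {-1,  4, -1,  0},
                      { 0, -1,  4, -1},
                      {-1,  0, -1,  4}};
    double L[N][N] = {{0}}, U[N][N] = {{0}};
    int S[N][N];
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            S[i][j] = (A[i][j] != 0.0);
            if (i > j)  L[i][j] = A[i][j];   /* simple initial guess:  */
            if (i <= j) U[i][j] = A[i][j];   /* split A into L and U   */
        }
    for (int i = 0; i < N; ++i) L[i][i] = 1.0;  /* unit diagonal of L */
    for (int sweep = 0; sweep < 5; ++sweep)     /* a few fixed sweeps */
        parilu_sweep(A, L, U, S);
    printf("l_21 = %f, u_11 = %f\n", L[1][0], U[0][0]);
    return 0;
}
```

Each entry update touches only one row of L and one column of U, so the iteration is element-wise independent, which is the property the fixed-point reformulation slide asks for.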