Opencl in Scientific High Performance Computing

Opencl in Scientific High Performance Computing

OpenCL in Scientific High Performance Computing —The Good, the Bad, and the Ugly Matthias Noack [email protected] Zuse Institute Berlin Distributed Algorithms and Supercomputing 2017-05-18, IWOCL 2017 https://doi.org/10.1145/3078155.3078170 1 / 28 Systems: • 2× Cray XC40 (#118 and #150 in top500, year 4/5 in lifetime) with Xeon CPUs • 80-node Xeon Phi (KNL) Cray XC40 test-and-development system • 2× 32-node Infiniband cluster with Nvidia K40 • 2 test-and-development systems with AMD FirePro W8100 What I do: • computer science research (methods) • development of HPC codes • evaluation of upcoming technologies • consulting/training with system users Context ZIB hosts part of the HLRN (Northern German Supercomputing Alliance) facilities. 2 / 28 What I do: • computer science research (methods) • development of HPC codes • evaluation of upcoming technologies • consulting/training with system users Context ZIB hosts part of the HLRN (Northern German Supercomputing Alliance) facilities. Systems: • 2× Cray XC40 (#118 and #150 in top500, year 4/5 in lifetime) with Xeon CPUs • 80-node Xeon Phi (KNL) Cray XC40 test-and-development system • 2× 32-node Infiniband cluster with Nvidia K40 • 2 test-and-development systems with AMD FirePro W8100 2 / 28 Context ZIB hosts part of the HLRN (Northern German Supercomputing Alliance) facilities. Systems: • 2× Cray XC40 (#118 and #150 in top500, year 4/5 in lifetime) with Xeon CPUs • 80-node Xeon Phi (KNL) Cray XC40 test-and-development system • 2× 32-node Infiniband cluster with Nvidia K40 • 2 test-and-development systems with AMD FirePro W8100 What I do: • computer science research (methods) • development of HPC codes • evaluation of upcoming technologies • consulting/training with system users 2 / 28 • intra-node parallelism dominated by OpenMP (e.g. Intel) and CUDA (Nvidia) ⇒ vendor and tool dependencies ⇒ limited portability ⇒ multiple diverging code branches ⇒ hard to maintain • inter-node communication = MPI • hardware life time: 5 years • software life time: multiple tens of years ⇒ outlives systems by far ⇒ aim for portability Why OpenCL? (aka: The Good ) Scientific HPC in a Nutshell • tons of legacy code (FORTRAN) authored by domain experts ⇒ rather closed community ⇒ decoupled from computer science (ask a student about FORTRAN) • highly conservative code owners ⇒ modern software engineering advances are picked up very slowly 3 / 28 • hardware life time: 5 years • software life time: multiple tens of years ⇒ outlives systems by far ⇒ aim for portability Why OpenCL? (aka: The Good ) Scientific HPC in a Nutshell • tons of legacy code (FORTRAN) authored by domain experts ⇒ rather closed community ⇒ decoupled from computer science (ask a student about FORTRAN) • highly conservative code owners ⇒ modern software engineering advances are picked up very slowly • intra-node parallelism dominated by OpenMP (e.g. Intel) and CUDA (Nvidia) ⇒ vendor and tool dependencies ⇒ limited portability ⇒ multiple diverging code branches ⇒ hard to maintain • inter-node communication = MPI 3 / 28 Why OpenCL? (aka: The Good ) Scientific HPC in a Nutshell • tons of legacy code (FORTRAN) authored by domain experts ⇒ rather closed community ⇒ decoupled from computer science (ask a student about FORTRAN) • highly conservative code owners ⇒ modern software engineering advances are picked up very slowly • intra-node parallelism dominated by OpenMP (e.g. Intel) and CUDA (Nvidia) ⇒ vendor and tool dependencies ⇒ limited portability ⇒ multiple diverging code branches ⇒ hard to maintain • inter-node communication = MPI • hardware life time: 5 years • software life time: multiple tens of years ⇒ outlives systems by far ⇒ aim for portability 3 / 28 • use modern techniques with a broad community (beyond HPC) ⇒ modern C++ for host code • develop code interdisciplinary ⇒ domain experts design the model . ⇒ . computer scientists the software Why OpenCL? (aka: The Good ) Goal for new code: Do not contribute to that situation! • portability first (6= performance portability) ⇒ OpenCL has the largest hardware coverage for intra-node programming ⇒ library only ⇒ no tool dependencies 4 / 28 • develop code interdisciplinary ⇒ domain experts design the model . ⇒ . computer scientists the software Why OpenCL? (aka: The Good ) Goal for new code: Do not contribute to that situation! • portability first (6= performance portability) ⇒ OpenCL has the largest hardware coverage for intra-node programming ⇒ library only ⇒ no tool dependencies • use modern techniques with a broad community (beyond HPC) ⇒ modern C++ for host code 4 / 28 Why OpenCL? (aka: The Good ) Goal for new code: Do not contribute to that situation! • portability first (6= performance portability) ⇒ OpenCL has the largest hardware coverage for intra-node programming ⇒ library only ⇒ no tool dependencies • use modern techniques with a broad community (beyond HPC) ⇒ modern C++ for host code • develop code interdisciplinary ⇒ domain experts design the model . ⇒ . computer scientists the software 4 / 28 Target Hardware vendor architecture device compute memory Intel Haswell 2× Xeon E5-2680v3 0.96 TFLOPS 136 GiB/s Intel Knights Landing Xeon Phi 7250 2.61 TFLOPS* 490/115 GiB/s AMD Hawaii Firepro W8100 2.1 TFLOPS 320 GiB/s Nvidia Kepler Tesla K40 1.31 TFLOPS 480 GiB/s * calculated with max. AVX frequency of 1.2 GHz: 2611.2 GFLOPS = 1.2 GHz × 68 cores × 8 SIMD × 2 VPUs × 2 FMA 5 / 28 COSIM - A Predictive Cometary Coma Simulation Solve dust dynamics: ~adust(~r) = ~agas-drag + ~agrav + ~aCoriolis + ~acentrifugal 1 = C αN (~r)m (~v − ~v )|~v − ~v | − ∇φ(~r) 2 d gas gas dust gas dust gas −2~ω × ~vdust − ~ω × (~ω ×~r) Compare with data of 67P/Churyumov–Gerasimenko from Rosetta spacecraft: corr 0.90 time 12:12 0 2 4 6 8 10 12 Panels 1-2: OSIRIS NAC Image, Panels 3-4: Simulation Results, Right Image: ESA – C. Carreau/ATG medialab, CC BY-SA 3.0-igo 6 / 28 • millions of coupled ODEs • hierarchical graph of matrices HEOM - The Hierarchical Equations of Motion Model for Open Quantum Systems • understand the energy transfer in photo-active molecular complexes ⇒ e.g. photosynthesis [Image by University of Copenhagen Biology Department] M. Noack, F. Wende, K.-D. Oertel, OpenCL: There and Back Again, in High Performance Parallelism Pearls: Multicore and Many-core Programming Approaches, Vol. 2, 2015 7 / 28 • hierarchical graph of matrices HEOM - The Hierarchical Equations of Motion Model for Open Quantum Systems dσu i = [H; σu] • understand the energy transfer in dt ¯h B K−1 photo-active molecular complexes X X − σ n γ(b; k) ⇒ e.g. photosynthesis u u;(b;k) b=1 k B K−1 • millions of coupled ODEs X h 2λb X c(b; k) i × × − − V V σu β¯h2ν ¯hγ(b; k) s(b) s(b) b=1 b k B K−1 X X × + + iVs(b)σu;b;k b=1 k B K−1 X X − + nu;(b;k)θMA(b;k)σ(u;b;k) b=1 k M. Noack, F. Wende, K.-D. Oertel, OpenCL: There and Back Again, in High Performance Parallelism Pearls: Multicore and Many-core Programming Approaches, Vol. 2, 2015 7 / 28 HEOM - The Hierarchical Equations of Motion Model for Open Quantum Systems th ρ 0 0 • understand the energy transfer in 0 level 0 photo-active molecular complexes ⇒ e.g. photosynthesis 1st level ρ 1 0 ρ 0 1 1 2 • millions of coupled ODEs nd 2 level ρ 2 0 ρ 1 1 ρ 0 2 • hierarchical graph of matrices 3 4 5 M. Noack, F. Wende, K.-D. Oertel, OpenCL: There and Back Again, in High Performance Parallelism Pearls: Multicore and Many-core Programming Approaches, Vol. 2, 2015 7 / 28 HEOM - The Hierarchical Equations of Motion Model for Open Quantum Systems • understand the energy transfer in photo-active molecular complexes ⇒ e.g. photosynthesis • millions of coupled ODEs • hierarchical graph of matrices M. Noack, F. Wende, K.-D. Oertel, OpenCL: There and Back Again, in High Performance Parallelism Pearls: Multicore and Many-core Programming Approaches, Vol. 2, 2015 7 / 28 HEOM - The Hierarchical Equations of Motion Model for Open Quantum Systems • understand the energy transfer in photo-active molecular complexes ⇒ e.g. photosynthesis • millions of coupled ODEs • hierarchical graph of matrices M. Noack, F. Wende, K.-D. Oertel, OpenCL: There and Back Again, in High Performance Parallelism Pearls: Multicore and Many-core Programming Approaches, Vol. 2, 2015 7 / 28 Interdisciplinary Workflow domain experts computer scientists Mathematical Model 8 / 28 Interdisciplinary Workflow domain experts computer scientists Mathematical Model - ODEs - PDEs - Graphs -... 8 / 28 Interdisciplinary Workflow domain experts computer scientists Mathematical High Level Prototype Model (Mathematica) - ODEs - PDEs - Graphs -... 8 / 28 Interdisciplinary Workflow domain experts computer scientists Mathematical High Level Prototype Model (Mathematica) - ODEs - domain scientist’s tool - PDEs - high level - Graphs - symbolic solvers -... - arbitrary precision - very limited performance 8 / 28 Interdisciplinary Workflow domain experts computer scientists Mathematical High Level Prototype OpenCL kernel Model (Mathematica) within Mathematica - ODEs - domain scientist’s tool - PDEs - high level - Graphs - symbolic solvers -... - arbitrary precision - very limited performance 8 / 28 Interdisciplinary Workflow domain experts computer scientists Mathematical High Level Prototype OpenCL kernel Model (Mathematica) within Mathematica - ODEs - domain scientist’s tool - replace some code - PDEs - high level with OpenCL - Graphs - symbolic solvers -... - arbitrary precision - very limited performance 8 / 28 Mathematica and OpenCL (* Load OpenCL support*) Needs["OpenCLLink"] (* Create OpenCLFunction from source, kernel name, signature*) doubleFun = OpenCLFunctionLoad["

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    76 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us