LLVM AMDGPU for High Performance Computing: are we competitive yet?
Vedran Miletić, HITS gGmbH Szilárd Páll, KTH Frauke Gräter, HITS gGmbH
Layers of GPU computing
GPU accelerated apps and libraries
CUDA OpenCL Our work
NVIDIA AMD proprietary Clang Clang proprietary compiler compiler
AMD proprietary Libclc, LLVM, Mesa, and NVIDIA proprietary driver driver amdgpu/nouveau
State of the art: CUDA and OpenCL
● CUDA ● OpenCL – 338 applications listed at – ~70 applications listed NVIDIA’s website on Wikipedia – Over 50% market share in ● ~30 in Scientific Top 500 (Nov 2016) computing category ● Couple of benchmarks and toys
OpenCL applications
● Image taken from: Ribeiro, João V., et al. "QwikMD—Integrative Molecular Dynamics Toolkit for Novices and Experts." Scientific reports 6 (2016).
● Focus on GROMACS, LAMMPS, OpenMM, ASL
Running open source OpenCL stack on Radeon/FirePro/FireStream
● AMD’s proprietary OpenCL driver and compiler – GPUs released 2012 or later – Will be open sourced soon™
● Mesa/LLVM – AMD GPUs released 2009 or later – Open source from the beginning™
Our work
● No changes or minor changes in apps/libs
● Improvements to LLVM, Clang, libclc, Mesa – Missing math functions, OpenCL 1.2 API calls – Bug fixes
GROMACS OpenCL kernel execution time 140 GROMACS OpenCL kernel execution time 120
140 100
80
e AMDGPU-PRO 120 m i
T 60 AMDGPU
40 100 20
80 0 1,5 3 6 12 24 48 96 192 384 768 1536 3072 AMDGPU-PRO e
m Systerm size AMDGPU i
T 60
40
20
0 1,5 3 6 12 24 48 96 192 384 768 1536 3072
Systerm size
LAMMPS example execution time
80
OpenMM test execution time 70 60 50 60 40 30 50 20 AMDGPU-PRO AMDGPU-PRO e
e 40
10 m i m AMDGPU AMDGPU i 0 T T t r r r 30 ta to to to s ra ra ra o g g g rm te te te e n n n h I I tI T n in e n ia v rl 20 e n e e rs w g V e o n d r a n B L A 10
Test 0 melt_imd-gpu
Example
ASL test execution time
50
45
40
35
30 AMDGPU-PRO e 25 m AMDGPU i T 20
15
10
5
0 testKernel testKernelMerger testPrivateVar
Test
Other OpenCL software
● Blender – Different users report performance issues and crashes
● BEAGLE, phylogenetics library – Made some progress
● clBLAS and clFFT – Implmented clEnqueueFillBuffer, requires more work – Required for Octopus (quantum chem), probably others
Other OpenCL software
● BOINC, CP2K, Theano – Had users tell me “I would try it if worked”
● clpeak, opencl-stream, SNU NPB – Benchmarks
● App or lib you care about?
Acknowledgments
● Matt Arsenault, AMD
● Jan Vesely, Aaron Watry and Serge Martin, Mesa contributors
● Francisco Jerez, Intel
● Peter Eastman, OpenMM
● Tom Stellard, Red Hat
● Freenode channel #radeon