LLVM AMDGPU for High Performance Computing: are we competitive yet?

Vedran Miletić, HITS gGmbH; Szilárd Páll, KTH; Frauke Gräter, HITS gGmbH

Layers of GPU computing

[Diagram: layers of the GPU computing stack, top to bottom]

● GPU accelerated apps and libraries

● CUDA, OpenCL

● NVIDIA proprietary compiler, AMD proprietary compiler, Clang

● NVIDIA proprietary driver, AMD proprietary driver; Libclc, LLVM, and amdgpu/nouveau

● Our work

State of the art: CUDA and OpenCL

● CUDA – 338 applications listed at NVIDIA’s website – Over 50% market share in Top 500 (Nov 2016)

● OpenCL – ~70 applications listed on Wikipedia: ~30 in the Scientific computing category, a couple of benchmarks and toys

OpenCL applications

● Image taken from: Ribeiro, João V., et al. "QwikMD—Integrative Molecular Dynamics Toolkit for Novices and Experts." Scientific Reports 6 (2016).

● Focus on GROMACS, LAMMPS, OpenMM, ASL

Running open source OpenCL stack on Radeon/FirePro/FireStream

● AMD’s proprietary OpenCL driver and compiler – GPUs released 2012 or later – Will be open sourced soon™

● Mesa/LLVM – AMD GPUs released 2009 or later – Open source from the beginning™

Our work

● No changes or minor changes in apps/libs

● Improvements to LLVM, Clang, libclc, Mesa – Missing math functions, OpenCL 1.2 API calls – Bug fixes

GROMACS OpenCL kernel execution time

[Chart: Time versus System size (1.5, 3, 6, 12, 24, 48, 96, 192, 384, 768, 1536, 3072) for AMDGPU-PRO and AMDGPU; Time axis from 0 to 140]

LAMMPS example execution time

[Chart: Time per Example for AMDGPU-PRO and AMDGPU; example shown: melt_imd-gpu; Time axis from 0 to 80]

OpenMM test execution time

[Chart: Time per Test for AMDGPU-PRO and AMDGPU; tests shown: AndersenThermostat, BrownianIntegrator, LangevinIntegrator, VerletIntegrator; Time axis from 0 to 70]

ASL test execution time

[Chart: Time per Test for AMDGPU-PRO and AMDGPU; tests shown: testKernel, testKernelMerger, testPrivateVar; Time axis from 0 to 50]

Other OpenCL software

● Blender – Different users report performance issues and crashes

● BEAGLE, phylogenetics library – Made some progress

● clBLAS and clFFT – Implemented clEnqueueFillBuffer, requires more work – Required for Octopus (quantum chemistry), probably others
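For readers unfamiliar with the clEnqueueFillBuffer call mentioned above, here is a minimal host-side sketch in Python of the pattern-fill behaviour as we read it in the OpenCL 1.2 specification: a region of a buffer is overwritten with a repeating byte pattern, and the offset and size must be multiples of the pattern size. The function name fill_buffer and the use of a bytearray as a stand-in for device memory are assumptions made for illustration. This is not the Mesa implementation and it does not call any OpenCL API.

```python
# Hypothetical model of OpenCL 1.2 pattern-fill semantics (not real OpenCL code).
# A bytearray stands in for the device buffer; errors stand in for CL_INVALID_VALUE.

ALLOWED_PATTERN_SIZES = (1, 2, 4, 8, 16, 32, 64, 128)


def fill_buffer(buffer: bytearray, pattern: bytes, offset: int, size: int) -> None:
    """Overwrite buffer[offset:offset + size] with repeated copies of pattern."""
    pattern_size = len(pattern)
    if pattern_size not in ALLOWED_PATTERN_SIZES:
        raise ValueError("pattern size must be one of 1, 2, 4, ..., 128 bytes")
    if offset < 0 or size < 0:
        raise ValueError("offset and size must be non-negative")
    if offset % pattern_size != 0 or size % pattern_size != 0:
        raise ValueError("offset and size must be multiples of the pattern size")
    if offset + size > len(buffer):
        raise ValueError("fill region lies outside the buffer")
    # Replace the region with size // pattern_size copies of the pattern;
    # the replacement has exactly `size` bytes, so the buffer length is unchanged.
    buffer[offset:offset + size] = pattern * (size // pattern_size)
```

For example, filling 4 bytes at offset 2 of an 8-byte zeroed buffer with the 2-byte pattern 01 02 leaves the first and last two bytes untouched. A real driver performs the equivalent operation on device memory, typically with a small kernel or a DMA fill.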

Other OpenCL software

● BOINC, CP2K, Theano – Had users tell me “I would try it if it worked”

● clpeak, -stream, SNU NPB – Benchmarks

● App or lib you care about?

Acknowledgments

● Matt Arsenault, AMD

● Jan Vesely, Aaron Watry and Serge Martin, Mesa contributors

● Francisco Jerez,

● Peter Eastman, OpenMM

● Tom Stellard, Red Hat

● Freenode channel #radeon