Automatic GPU Code Generation for Android HIPAcc – http://hipacc-lang.org
Oliver Reiche, Richard Membarth, Frank Hannig, and Jürgen Teich
A DSL for Image Processing
HIPAcc: The Heterogeneous Image Processing Acceleration Framework
DSL code embedded into C++ Optimizations tailored to GPU architectures C++ Domain knowledge captured by classes embedded DSL Compiler Features cc Domain HIPA Classes Knowledge Source-to-Source I memory padding / alignment Compiler Image input / output buffers I utilization of textures Architecture Clang/LLVM Accessor ROI of input image Knowledge I exploit full GPU memory hierarchy
I boundary handling I MPMD code generation CUDA OpenCL C/C++ Renderscript I interpolation / filtering I loop unrolling (GPU) (x86/GPU) (x86) (x86/ARM/GPU) IterationSpace ROI of output image I constant propagation CUDA/OpenCL/Renderscript Runtime Library Kernel compute kernel description I thread-coarsening Mask convolution mask (multiple pixels per thread)
Domain iteration domain I vectorization support (point operators)
Pyramid image pyramid description I implicit use of unified CPU/GPU memory Figure: Overview of the HIPAcc Framework
DSL Code Example: Gaussian Blur
DSL Host Code DSL Kernel Code 1 // ... Runtime 1 class GaussianBlur : public Kernel
Android Development Tools (ADT) GPU Computing on Android Android Software Development Kit (SDK) Renderscript Compute Filterscript
I based on Eclipse I code mapping to native I stricter limitations I relaxed precision I supports C++ via Java Native Interface (JNI) worker threads I no scatter writes I automatic compilation and packaging into app I targets DSPs, CPUs, I pointers are illegal I compilation done by the Android Native and GPUs (since Android 4.2) Development Kit (NDK) I ensures wider compatibility
HIPAcc Integration Modular Makefile host device Used to seamlessly integrate HIPAcc into ADT .cpp source g++ cc I called during preprocessing step HIPA binary executable
I set appropriate target compiler flags DSL source reflection libRScpp I append generated files to NDK sources
.rs source .bc file native lib system lib Eclipse llvm-rs-cc libbcc libRS
Demonstration Setup and Results Exynos 5250 MPSoC Live Demo Application
I ARM Cortex-A15 I five image filters in DSL code I dual core @1.7 GHz I single description of filters I 64/128 bit SIMD NEON I target-independent code I ARM Mali T-604 GPU
I 4 cores @533 MHz I automatic code generation for I 16 SIMD lanes per core Renderscript and Filterscript
I 2 GB of 800 MHz DDR3 DRAM I up to 4× faster
Figure: Samsung Exynos 5250 Arndale Board