
Automatic GPU Code Generation for Android
HIPAcc – http://hipacc-lang.org
Oliver Reiche, Richard Membarth, Frank Hannig, and Jürgen Teich

A DSL for Image Processing

HIPAcc: The Heterogeneous Image Processing Acceleration Framework
- DSL code embedded into C++
- Optimizations tailored to GPU architectures

Domain knowledge captured by classes (C++ embedded DSL)
- Image: input / output buffers
- Accessor: ROI of input image
  - boundary handling
  - interpolation / filtering
- IterationSpace: ROI of output image
- Kernel: compute kernel description
- Mask: convolution mask
- Domain: iteration domain
- Pyramid: image pyramid description

Compiler Features
- memory padding / alignment
- utilization of textures
- exploit full GPU memory hierarchy
- MPMD code generation
- loop unrolling
- constant propagation
- thread-coarsening (multiple pixels per thread)
- vectorization support (point operators)
- implicit use of unified CPU/GPU memory

Figure: Overview of the HIPAcc Framework. The C++ embedded DSL, domain knowledge, and architecture knowledge feed a source-to-source compiler based on Clang/LLVM. The compiler targets CUDA (GPU), OpenCL (x86/GPU), C/C++ (x86), and Renderscript (x86/ARM/GPU), backed by a CUDA/OpenCL/Renderscript runtime library.

DSL Code Example: Gaussian Blur

DSL Host Code

    // ...
    const float filter_mask[3][3] = {...};
    Mask<float> mask(filter_mask);

    BoundaryCondition<uchar4> bound(in, mask, BOUNDARY_CLAMP);
    Accessor<uchar4> acc(bound);
    IterationSpace<uchar4> iter(out);

    GaussianBlur filter(iter, acc, mask);
    filter.execute();

DSL Kernel Code

    class GaussianBlur : public Kernel<uchar4> {
      // ...
      void kernel() {
        float4 sum = convolve(mask, HipaccSUM, [&]() -> float4 {
          return mask() * convert_float4(input(mask));
        });
        output() = convert_uchar4(sum + 0.5f);
      }
    };

Figure: HIPAcc takes the DSL code (C++), rewrites the host code into runtime calls (C++), and generates the target device code.

Seamless Integration into the Android Development Tools

Android Development Tools (ADT)
Android Software Development Kit (SDK)
- based on Eclipse
- supports C++ via Java Native Interface (JNI)
- automatic compilation and packaging into app
- compilation done by the Android Native Development Kit (NDK)

GPU Computing on Android

Renderscript Compute
- code mapping to native worker threads
- targets DSPs, CPUs, and GPUs (since Android 4.2)

Filterscript
- stricter limitations
  - relaxed precision
  - no scatter writes
  - pointers are illegal
- ensures wider compatibility

HIPAcc Integration

Modular Makefile
Used to seamlessly integrate HIPAcc into ADT
- called during preprocessing step
- set appropriate target compiler flags
- append generated files to NDK sources

Figure: Build flow for host and device code (labels only): DSL source, HIPAcc, .cpp source, g++, binary executable, reflection, libRScpp, .rs source, llvm-rs-cc, .bc file, libbcc, libRS, native lib, system lib, Eclipse.

Demonstration Setup and Results

Exynos 5250 MPSoC
- ARM Cortex-A15
  - dual core @1.7 GHz
  - 64/128 bit SIMD NEON
- ARM Mali T-604 GPU
  - 4 cores @533 MHz
  - 16 SIMD lanes per core
- 2 GB of 800 MHz DDR3 DRAM

Live Demo Application
- five image filters in DSL code
- single description of filters
- target-independent code
- automatic code generation for Renderscript and Filterscript
- up to 4× faster

Figure: Samsung Exynos 5250 Arndale Board.
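
To make the semantics of the Gaussian blur example concrete, the following is a minimal plain C++ sketch of what the kernel computes per pixel: a 3x3 weighted sum with clamp boundary handling, rounded by adding 0.5 before conversion. It is not HIPAcc code and not the code HIPAcc generates. The names `clamp_index` and `blur3x3_clamp` are hypothetical, the sketch handles a single 8-bit channel rather than `uchar4`, and the saturation to [0, 255] is an added safety assumption. The mask is passed in by the caller because the poster elides its coefficients.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: clamp an index into [0, size - 1], so that reads
// outside the image reuse the nearest edge pixel (the behaviour assumed
// here for BOUNDARY_CLAMP).
int clamp_index(int i, int size) {
    return std::min(std::max(i, 0), size - 1);
}

// Hypothetical CPU reference: single-channel 3x3 convolution over a
// row-major image, with clamp boundary handling.
std::vector<std::uint8_t> blur3x3_clamp(const std::vector<std::uint8_t>& in,
                                        int width, int height,
                                        const float (&mask)[3][3]) {
    assert(width > 0 && height > 0);
    assert(in.size() == static_cast<std::size_t>(width) * height);

    std::vector<std::uint8_t> out(in.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            // Weighted sum over the 3x3 neighbourhood, mirroring
            // convolve(mask, HipaccSUM, ...) in the DSL kernel.
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int sx = clamp_index(x + dx, width);
                    const int sy = clamp_index(y + dy, height);
                    sum += mask[dy + 1][dx + 1] * in[sy * width + sx];
                }
            }
            // Adding 0.5 before truncation rounds to nearest, as in
            // convert_uchar4(sum + 0.5f). The saturation to [0, 255] is
            // an assumption added to keep the cast well defined.
            const float rounded = std::min(std::max(sum + 0.5f, 0.0f), 255.0f);
            out[y * width + x] = static_cast<std::uint8_t>(rounded);
        }
    }
    return out;
}
```

Calling `blur3x3_clamp(pixels, width, height, mask)` with a normalized mask of the caller's choice (coefficients summing to 1) leaves a constant image unchanged, which is a quick sanity check for both the weights and the boundary handling. The DSL version expresses the same computation without the explicit loops or index clamping, which is what lets the compiler choose the iteration order and memory layout per target.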