Automatic GPU Code Generation for Android

HIPAcc (http://hipacc-lang.org)
Oliver Reiche, Richard Membarth, Frank Hannig, and Jürgen Teich

A DSL for Image Processing

HIPAcc: The Heterogeneous Image Processing Acceleration Framework
- DSL code embedded into C++
- Optimizations tailored to GPU architectures

Domain knowledge captured by classes (C++ embedded DSL):
- Image: input / output buffers
- Accessor: ROI of input image
  - boundary handling
  - interpolation / filtering
- IterationSpace: ROI of output image
- Kernel: compute kernel description
- Mask: convolution mask
- Domain: iteration domain
- Pyramid: image pyramid description

Compiler Features:
- memory padding / alignment
- utilization of textures
- exploit full GPU memory hierarchy
- MPMD code generation
- loop unrolling
- constant propagation
- thread-coarsening (multiple pixels per thread)
- vectorization support (point operators)
- implicit use of unified CPU/GPU memory

Figure: Overview of the HIPAcc Framework. The C++ embedded DSL, together with domain knowledge and architecture knowledge, feeds the HIPAcc source-to-source compiler (Clang/LLVM), which targets CUDA (GPU), OpenCL (x86/GPU), C/C++ (x86), and Renderscript (x86/ARM/GPU) and links against the CUDA/OpenCL/Renderscript runtime library.

DSL Code Example: Gaussian Blur

DSL Host Code:

    // ...
    const float filter_mask[3][3] = {...};
    Mask<float> mask(filter_mask);

    BoundaryCondition<uchar4> bound(in, mask, BOUNDARY_CLAMP);
    Accessor<uchar4> acc(bound);
    IterationSpace<uchar4> iter(out);

    GaussianBlur filter(iter, acc, mask);
    filter.execute();

DSL Kernel Code:

    class GaussianBlur : public Kernel<uchar4> {
        // ...
        void kernel() {
            float4 sum = convolve(mask, HipaccSUM, [&]() -> float4{
                return mask() * convert_float4(input(mask));
            });
            output() = convert_uchar4(sum + 0.5f);
        }
    };

Figure: HIPAcc rewrites the DSL host code (C++) into runtime calls (C++) and generates target device code from the DSL kernel code (C++).

Seamless Integration into the Android Development Tools

Android Development Tools (ADT):
- Android Software Development Kit (SDK)
  - based on Eclipse
  - supports C++ via Java Native Interface (JNI)
  - automatic compilation and packaging into app
  - compilation done by the Android Native Development Kit (NDK)

GPU Computing on Android:
- Renderscript Compute
  - code mapping to native worker threads
  - targets DSPs, CPUs, and GPUs (since Android 4.2)
- Filterscript
  - stricter limitations
  - relaxed precision
  - no scatter writes
  - pointers are illegal
  - ensures wider compatibility

HIPAcc Integration:
- Modular Makefile, used to seamlessly integrate HIPAcc into ADT
  - called during preprocessing step
  - set appropriate target compiler flags
  - append generated files to NDK sources

Figure: Build flow within Eclipse. Labels: host, device, DSL source, HIPAcc, .cpp source, g++, binary executable, reflection, .rs source, llvm-rs-cc, .bc file, libRScpp (native lib), libbcc and libRS (system lib).

Demonstration Setup and Results

Exynos 5250 MPSoC:
- ARM Cortex-A15
  - dual core @1.7 GHz
  - 64/128 bit SIMD NEON
- ARM Mali T-604 GPU
  - 4 cores @533 MHz
  - 16 SIMD lanes per core
- 2 GB of 800 MHz DDR3 DRAM

Live Demo Application:
- five image filters in DSL code
- single description of filters
- target-independent code
- automatic code generation for Renderscript and Filterscript
- up to 4× faster

Figure: Samsung Exynos 5250 Arndale Board.
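
To make the semantics of the Gaussian blur DSL example more concrete, the following is a minimal scalar C++ sketch of what such a 3x3 convolution with clamped boundary handling computes. It is not code generated by HIPAcc and does not use the HIPAcc API. It assumes a single-channel 8-bit image rather than uchar4, and the mask coefficients are a hypothetical binomial approximation, since the poster elides the actual filter_mask values. The names kMask, clamp_coord, and gaussian_blur_ref are illustrative only.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical 3x3 Gaussian coefficients (assumption: the poster elides the
// real filter_mask values). All entries are exact in binary floating point.
static const float kMask[3][3] = {
    {1.0f / 16, 2.0f / 16, 1.0f / 16},
    {2.0f / 16, 4.0f / 16, 2.0f / 16},
    {1.0f / 16, 2.0f / 16, 1.0f / 16}};

// Clamp a coordinate into [0, size - 1]; out-of-range reads reuse the
// nearest border pixel, which is the usual meaning of a clamp boundary mode.
static int clamp_coord(int v, int size) {
    return std::min(std::max(v, 0), size - 1);
}

// Scalar reference: convolve a row-major, single-channel 8-bit image
// with kMask and round the result to the nearest integer.
std::vector<std::uint8_t> gaussian_blur_ref(const std::vector<std::uint8_t>& in,
                                            int width, int height) {
    std::vector<std::uint8_t> out(in.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int sx = clamp_coord(x + dx, width);
                    const int sy = clamp_coord(y + dy, height);
                    sum += kMask[dy + 1][dx + 1] * in[sy * width + sx];
                }
            }
            // Same rounding idea as "sum + 0.5f" in the DSL kernel.
            out[y * width + x] = static_cast<std::uint8_t>(sum + 0.5f);
        }
    }
    return out;
}
```

Calling gaussian_blur_ref(pixels, width, height) on a row-major buffer returns a blurred copy of the same size. In the DSL version, the two outer loops correspond to the IterationSpace, the clamped reads to the BoundaryCondition and Accessor, and the inner double loop to convolve over the Mask.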
