Progress Report

Image Processing using NVidia Jetson Tegra K1 Development Board

August 2015 – November 2015

Prepared by: Dr. Tuba Kurban, [email protected]
Visiting Post-doctoral Researcher
http://imaging.utk.edu/research/tkurban

Dr. Rifat Kurban, [email protected]
Visiting Post-doctoral Researcher
http://imaging.utk.edu/research/rifkur

Supervisor: Dr. Mongi A. Abidi
Imaging, Robotics, and Intelligent Systems Laboratory, University of Tennessee, Knoxville


Contents

1. Introduction

2. GPU Computing Platforms & Libraries

3. Dynamic Voltage and Frequency Scaling

4. Linux Desktop Environments

5. Previous Studies Using TK1 in the IRIS Lab

6. Image Processing Using TK1

7. Video File Processing

8. Streaming Video Processing with Ximea Camera

9. Streaming Video Processing with VREO Board & Sony Camera

10. Conclusions

References


1. Introduction

This progress report gives a brief description of the technologies used and presents the results of some basic image processing tasks on the NVidia Jetson Tegra K1 embedded computing platform.

When the Tegra K1 was first announced by NVidia in Q2 2014, it attracted much attention not only from industry but also from researchers working in high-performance computing. The Tegra K1 mobile processor includes a 4+1 quad-core ARM Cortex-A15 CPU clocked at 2.3 GHz and a Kepler-architecture GPU with 192 CUDA cores clocked at 852 MHz. The Tegra K1 (TK1) supports up to 8 GB of DDR3L memory and 4K display output over HDMI. The chip is manufactured with a 28 nm process [1].

The Jetson TK1 development kit was announced by NVidia in April 2014 and sells for $192 in the US. It includes 2 GB of memory, 16 GB of eMMC storage, and USB 3.0, HDMI, GigE LAN, SATA, audio, PCIe, RS232, CSI, and GPIO ports [2], as shown in Figure 1. The average power consumption of the board is reported as typically 5 W, and the maximum reaches 15 W when the CPU, GPU, and other peripherals are used together [3].

Figure 1. NVidia Jetson Tegra K1 development board.

As of November 2015, two journal papers indexed in Scopus utilize the TK1: Cocchioni et al. realized a landing application for an unmanned quadrotor [4], and Zhao used the TK1 for fast filter bank convolution in the three-dimensional wavelet transform [5]. With a GPU delivering a peak of 326 GFLOPS, the TK1 promises a lot for the area of embedded supercomputing [6].

2. GPU Computing Platforms & Libraries

For general-purpose GPU programming, the NVidia CUDA and Khronos OpenCL platforms are widely used. While CUDA only works with NVidia GPUs, OpenCL is supported by both AMD and NVidia GPUs. Moreover, OpenCL supports parallel programming on common Intel, AMD, and ARM CPUs, some mobile GPUs, and FPGAs. According to NVidia, the Tegra K1 supports OpenCL; however, compatible drivers and an SDK have not been released yet (as of November 2015) [7]. Both CUDA and OpenCL are cross-platform APIs that can run natively on Windows, Linux, and OS X. The Jetson TK1 runs the Linux4Tegra operating system (basically Ubuntu 14.04 with pre-configured drivers), and CUDA is the most common choice for GPU programming on the TK1.

ArrayFire is an open-source C/C++ library that aims to make GPU programming simple and fast. The ArrayFire API supports both CUDA- and OpenCL-capable devices, including NVidia and AMD GPUs, AMD and Intel CPUs, and some ARM mobile devices [8]. ArrayFire has hundreds of functions across various domains, such as:

- Vector Algorithms
- Image Processing
- Computer Vision
- Signal Processing
- Linear Algebra
- Statistics

3. Dynamic Voltage and Frequency Scaling

Dynamic voltage and frequency scaling (DVFS) is a commonly used technique for making computing systems power-efficient by decreasing the clock frequency of a processor to allow a reduction in the supply voltage [9]. The Linux kernel comes with five predefined CPU DVFS algorithms (governors) [10]:

- Userspace: the frequency can be set manually by the user
- Powersave: the frequency is set to the lowest possible value
- Performance: the frequency is set to the highest possible value
- Ondemand: the frequency is raised to the maximum in response to large increases in workload
- Conservative: the frequency is adjusted similarly to ondemand but reacts more slowly to changes

The TK1's GPU is also under DVFS control; however, it utilizes a single governor that can be enabled or disabled to allow userspace control [10].

If execution time is the priority in an application, the performance governor can be used. However, when power consumption matters more than execution time, one of the remaining governors can be selected.
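As an illustration, a governor can be selected at runtime through the standard Linux cpufreq sysfs interface. The following minimal C++ sketch assumes root privileges and the usual sysfs path; on a multi-core system such as the TK1, each core has its own entry:

#include <fstream>
#include <string>

// Minimal sketch: select a CPU DVFS governor by writing to the
// standard Linux cpufreq sysfs entry of core 0 (requires root).
bool setCpuGovernor(const std::string& governor) {
    std::ofstream f("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor");
    if (!f) return false;   // entry missing or insufficient permissions
    f << governor;          // e.g. "performance" or "ondemand"
    return static_cast<bool>(f);
}

For example, setCpuGovernor("performance") locks core 0 to its maximum frequency, which corresponds to the superclock configuration used in the experiments below.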

4. Linux Desktop Environments

There are three layers in a Linux desktop system:

- X Windows is the foundation, the primitive framework that allows graphic elements to be drawn on the screen.
- The window manager controls the placement and appearance of windows. It requires X Windows but not a desktop environment. Some well-known examples are Fluxbox, IceWM, Window Maker, and Openbox.
- The desktop environment is a fully integrated system that includes a window manager and builds upon it. GNOME, KDE, Xfce, and LXDE are popular desktop environments for Linux [11].

The TK1 comes with the Unity desktop environment, which is based on GNOME 3 and is the default desktop on Ubuntu distributions.

5. Previous Studies Using TK1 in the IRIS Lab

In this section, benchmark results obtained by Mr. Ben Olson using the TK1 are briefly introduced. Detailed information can be found in [12].

To determine the NVidia Jetson TK1's performance in imaging applications, a gamma correction application was written using C and CUDA. In the experiments, two different test images were used, with resolutions of 400x300 and 1920x1080. The test images, shown in Figure 2, were chosen to demonstrate the Jetson's capabilities in real-world applications. Gamma correction was performed in single-threaded CPU, multi-threaded CPU, CUDA GPU, and hybrid (multi-threaded CPU + CUDA GPU) modes.

Figure 2. Test images used in the experiments: (a) 400x300 test image; (b) 1920x1080 test image [12].

Results for the 400x300 test image are given in Table 1. Olson obtained a maximum of 140 FPS without displaying the results. The study does not mention which Linux DVFS governor was used; however, the results in the next section suggest that the default ondemand governor was probably active. Olson used a split parameter in the range [0, 1] to determine how much of the work is processed by the CPU versus the GPU. His results indicate that a split parameter of 0.9 gives the best performance, meaning that 90% of the work is processed on the GPU and the remainder on the CPU.

Table 1. Gamma correction frame rates for the 400x300 test image [12].

Method                                   With display (fps)   Without display (fps)
Single-threaded CPU                      10                   16
Multi-threaded CPU                       30                   60
CUDA GPU                                 60                   130
Hybrid (multi-threaded CPU + CUDA GPU)   80                   140

Figure 3 shows the results of gamma correction on the 1920x1080 Full HD image. For this study, only the results without display are given. According to the results, the maximum of 27 FPS is obtained for the Full HD test image with a split parameter of 1.0 (bimage_cuda). The single-threaded CPU (bimage_nothread) and multi-threaded CPU (bimage_thread) modes resulted in 1 and 4 FPS, respectively.

Figure 3. Gamma correction frame rates of 1920x1080 test image [12].


6. Image Processing Using TK1

In this application, a simple image processing task, gamma correction, is realized using the ArrayFire library and compared with the results of Olson's previous study. Gamma correction is a non-linear operation used to encode or decode luminance in still images or video:

$V_{out} = A\,V_{in}^{\gamma}$  (1)

where $A$ is a constant that commonly equals 1 and input values are in the range $[0, 1]$. A value $\gamma < 1$ is called encoding gamma or gamma compression, and $\gamma > 1$ is called decoding gamma or gamma expansion. Figure 4 demonstrates the application of different gamma values to an image [13].
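For example, with $A = 1$ and an encoding gamma of $\gamma = 1/2.2 \approx 0.45$, an input value of $V_{in} = 0.5$ maps to $V_{out} = 0.5^{1/2.2} \approx 0.73$; applying the decoding gamma $\gamma = 2.2$ to that result recovers the original value of 0.5.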

Figure 4. Gamma correction example.

ArrayFire handles all variables in a universal data-type class called array. Computational algorithms are more readable with its math-resembling, array-based notation. All array objects are stored in the GPU's memory, and all operations on array objects are executed on the GPU in parallel. In Figure 5, a single-line code snippet of a C++ function that realizes image gamma correction on the GPU is given.

array applyGammaEachPixel(array img, float gammadef) {
    return pow((img / 255.f), gammadef) * 255;
}

Figure 5. ArrayFire implementation of GPU gamma correction in C++.
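A minimal, self-contained usage sketch of this function is given below; the image file name is hypothetical, and any 8-bit image readable by ArrayFire will do. Because ArrayFire evaluates element-wise array expressions lazily, the body of the function in Figure 5 is fused into a single GPU kernel at runtime.

#include <arrayfire.h>
using namespace af;

array applyGammaEachPixel(array img, float gammadef) {
    return pow((img / 255.f), gammadef) * 255;
}

int main() {
    array img = loadImage("test.jpg", false);    // hypothetical file; false = grayscale
    array out = applyGammaEachPixel(img, 2.2f);  // decoding gamma
    Window wnd("Gamma correction");
    while (!wnd.close()) wnd.image(out / 255.f); // af::Window expects floats in [0, 1]
    return 0;
}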

In the experiments, the two test images from Olson's previous study, with resolutions of 400x300 and 1920x1080, are used again. The effect of four different Linux window managers and desktop environments (Ubuntu Unity, Xfce, LXDE, and Openbox) is also tested. Experiments are run both with the ondemand governor (referred to as default clock) and the performance governor (referred to as superclock). Each configuration is executed 1000 times, and the averaged frame rates are given in the results.


Figure 6. Image gamma correction with ArrayFire and TK1 using a 400x300 image.

Figure 6 shows the results of the experiments on the low-resolution test image. As expected, the frame rates obtained with display are lower than those obtained without display. Setting the clock speeds to the maximum allowed frequency gained a ~4x speed-up without display and a ~1.25x speed-up with display. The maximum frame rate, 1116 FPS, is obtained with Openbox, the most lightweight desktop option. We also obtained a ~3x speed-up over Olson's CUDA-based implementation with display, in both the default clock and superclock configurations.

Figure 7. Image gamma correction with ArrayFire and TK1 using a 1920x1080 image.


Figure 7 shows the results of the experiments on the high-resolution test image. Setting the clock speeds to the maximum allowed frequency gained a ~1.5x speed-up without display and no gain with display. Using different Linux desktop environments made no noticeable difference in the superclock case. For the Full HD image, a maximum of 92 FPS was obtained without display and 10 FPS with display.

The C++ source code of this study can be downloaded from [14].

7. Video File Processing

In this section, a video file processing application is realized. The well-known Big Buck Bunny short film from the Blender Institute is used in the experiments. OpenCV is used to read the frames of the Microsoft MP4-encoded 480p, 720p, and 1080p versions of the Big Buck Bunny video file from the internal memory of the TK1. The flowchart of the application is given in Figure 8.

Figure 8. Flowchart of video file processing on the GPU: capture/read frames with OpenCV; convert to the ArrayFire data type (mat2array); process the frame on the GPU; display the results.

ArrayFire's I/O functions are capable of reading only single images from disk or memory. Therefore, OpenCV is used to extract the frames of a video file. However, the OpenCV data type (cv::Mat) is not compatible with ArrayFire's data type (af::array), so Mcclanahoochie's conversion code is used for this purpose [15].
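The essence of such a conversion is sketched below for a single 8-bit frame; this is a simplified illustration rather than Mcclanahoochie's exact code. OpenCV stores images row-major on the host while af::array is column-major on the device, so the dimensions are swapped on upload and the result is transposed.

#include <arrayfire.h>
#include <opencv2/opencv.hpp>

// Simplified cv::Mat -> af::array conversion for one grayscale frame.
af::array mat2array(const cv::Mat& frame) {
    cv::Mat gray, gray32f;
    cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
    gray.convertTo(gray32f, CV_32F);   // ArrayFire arithmetic prefers floats
    // the host buffer is row-major, so pass swapped dimensions and transpose
    af::array a(gray32f.cols, gray32f.rows, gray32f.ptr<float>());
    return a.T();
}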

Table 2 shows the per-frame times in milliseconds and the display frame rates when the video files are displayed using only OpenCV. As can be seen from the table, displaying Full HD frames at 19 FPS may not be suitable for real-time applications. However, the TK1 can reach much higher rates with its internal hardware multimedia encoders and software APIs.


Table 2. Video file processing application using only OpenCV.

                  480p    720p    1080p
capture_cv (ms)   3.6     8.4     19.3
display_cv (ms)   7.3     15      33
total (ms)        10.9    23.4    52.3
total (fps)       91.74   42.7    19.1

Table 3 shows the results of using OpenCV and ArrayFire together. As can be seen from the table, the data-type conversion from OpenCV to ArrayFire consumes too much time.

Table 3. Video file processing application using OpenCV and ArrayFire together.

                  480p    720p    1080p
capture_cv (ms)   3.5     8.2     19.6
mat2array (ms)    15.6    34      83.5
display_af (ms)   17.4    40.8    82
total (ms)        36.5    83      185.1
total (fps)       27.4    12      5.4

The C++ source code of this study can be downloaded from [16].

The TK1's OpenMAX IL API includes H.264, VC-1, VP8, MPEG-4 basic, MPEG-2, and JPEG codecs supported by Tegra's high-definition video hardware, and the driver is accessible through GStreamer [17][18]. In future work, higher frame rates can be obtained by using GStreamer and appsrc.

8. Streaming Video Processing with Ximea Camera

In this section, a GPU-based video processing application is realized using the Ximea MQ013MG-E2 USB 3.0 monochrome camera, which is capable of capturing 1.3 MP (1280x1024) frames at 60 FPS. It is a very compact, low-power camera that can be used in many industrial and scientific projects. Ximea offers many APIs and SDKs for different operating systems and CPU hardware; however, its Linux drivers and API for ARM CPUs such as the TK1 are still experimental (beta) [19].

The flowchart of the streaming video processing application is given in Figure 9. First, a frame is captured from the Ximea camera and copied to the TK1's CPU memory. Then, the data is transferred to GPU memory and converted to the ArrayFire image format. A simple image processing task, gamma correction, is applied, and the modified image is transferred back to CPU memory and displayed using ArrayFire's graphics API, Forge.


Figure 9. Flowchart of streaming video processing with the Ximea camera: grab a frame to CPU memory; transfer to the GPU and convert to the ArrayFire data type; process the frame; transfer back to the CPU and display the result.
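A condensed sketch of this loop using xiAPI and ArrayFire is shown below. Error checks are omitted, the header path may differ per installation, and the gamma value is arbitrary.

#include <m3api/xiApi.h>   // header location may differ per installation
#include <arrayfire.h>

int main() {
    HANDLE cam = nullptr;
    xiOpenDevice(0, &cam);                 // open the first enumerated Ximea camera
    xiStartAcquisition(cam);

    XI_IMG frame = {};
    frame.size = sizeof(XI_IMG);

    af::Window wnd("Ximea stream");
    while (!wnd.close()) {
        xiGetImage(cam, 5000, &frame);     // blocking grab, 5 s timeout
        // 8-bit mono host buffer -> device array (swapped dims, then transpose)
        af::array img(frame.width, frame.height,
                      static_cast<unsigned char*>(frame.bp));
        af::array out = af::pow(img.as(f32).T() / 255.f, 0.5f);  // gamma correction
        wnd.image(out);                    // display via Forge
    }
    xiStopAcquisition(cam);
    xiCloseDevice(cam);
    return 0;
}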

Experimental results are given in Table 4. As shown in the table, xiAPI works successfully on the TK1 and captures raw data from the camera at the maximum speed. Since the monochrome camera delivers its data as an integer vector of grayscale pixel values, it is transferred to GPU memory and converted to an MxN image matrix very quickly. The whole operation, including display, runs at 59 FPS. However, with the color version of the same camera, RGB frames could not be captured: the beta driver permits capturing only the raw, non-de-bayered data.

Table 4. Streaming video processing results using xiAPI and ArrayFire.

             Only capturing   Capture, apply gamma, and display
Monochrome   60 FPS           59 FPS
Color        -                -

The C++ source code of this study can be downloaded from [20].

9. Streaming Video Processing with VREO Board & Sony Camera

In this section, a video processing application using the VREO Unity USB 3.0 interface board and the Sony FCB-MA130 camera is realized. This system is capable of capturing 1080p video at 30 FPS and still images up to 13 MP. The system works very well on Windows; however, there are some limitations on Linux. A dedicated Linux driver for the VREO board does not exist. The vendor offers a viewer application called OneView, which uses the generic Video4Linux (V4L) drivers. This generic UVC driver is capable of capturing video at 30 FPS; however, still-image capture is not supported [21]. We were not able to build and run the OneView application due to library conflicts. Therefore, we used Ubuntu's Qt-based V4L2 Test Utility application [22]. Figure 10 shows a screenshot of the application. As can be seen from the figure, 1080p video is captured and displayed at ~15 FPS when the CPU and GPU frequencies are set to the maximum allowed speeds (performance governor).


Figure 10. Capturing and displaying video from the VREO & Sony system using the V4L2 Test Utility application.

The frame rate is below the real-time limit because the data obtained from the VREO board over V4L2 is YUYV-encoded: to display the camera input on the screen, the video frames have to be converted to RGB. We therefore developed our own simple, low-level video capture and display application to increase the frame rate at 1080p resolution.

Figure 11. Example of the UV color plane at Y = 0.5, represented in RGB.


YUV is a color space that encodes a color image or video in a way suited to human perception. Historically, the term YUV is used for analog encoding and YCbCr for digital encoding; today, the term YUV is commonly used in the computer industry to describe file formats that are actually encoded using YCbCr. In the YUV color space, the Y channel encodes the luma (gray level) and the U and V channels encode the chroma (color). An example of the UV color plane is given in Figure 11. The conversion from YUV to RGB can be computed as follows:

$R = Y + 1.402\,(V - 128)$
$G = Y - 0.344\,(U - 128) - 0.714\,(V - 128)$  (2)
$B = Y + 1.772\,(U - 128)$

The R, G, and B values must be clamped to the range [0, 255]. On some architectures, floating-point arithmetic may consume much time; an alternative fixed-point conversion can be realized as:

$R = (298\,(Y - 16) + 409\,(V - 128) + 128) \gg 8$
$G = (298\,(Y - 16) - 100\,(U - 128) - 208\,(V - 128) + 128) \gg 8$  (3)
$B = (298\,(Y - 16) + 516\,(U - 128) + 128) \gg 8$

YUV images can be encoded with 12, 16, or 24 bits per pixel. Common formats are YUV444, YUV411, YUV422, and YUV420. The relation between data rate and sampling is defined by the ratio of the Y channel to the UV channels. Figure 12 shows the encoding of YUYV, or YUY2, a variant of YUV422. In this format, the U0 and V0 values are shared by both Y0 and Y1.

Figure 12. YUYV (YUY2) pixel placement [23].
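Combining Eq. (3) with the YUYV layout of Figure 12, a CPU conversion routine can be sketched as follows; this is a simplified illustration rather than the exact code of [24]:

#include <algorithm>
#include <cstdint>

// Fixed-point YUYV -> RGB conversion from Eq. (3); one 4-byte YUYV
// macropixel (Y0 U0 Y1 V0) yields two RGB pixels sharing U and V.
static inline uint8_t clamp8(int v) {
    return static_cast<uint8_t>(std::min(255, std::max(0, v)));
}

void yuyvToRgb(const uint8_t* yuyv, uint8_t* rgb, int width, int height) {
    for (int i = 0; i < width * height / 2; ++i) {
        int u = yuyv[4 * i + 1] - 128;
        int v = yuyv[4 * i + 3] - 128;
        for (int k = 0; k < 2; ++k) {
            int c = 298 * (yuyv[4 * i + 2 * k] - 16);  // Y0 when k=0, Y1 when k=1
            rgb[6 * i + 3 * k + 0] = clamp8((c + 409 * v + 128) >> 8);
            rgb[6 * i + 3 * k + 1] = clamp8((c - 100 * u - 208 * v + 128) >> 8);
            rgb[6 * i + 3 * k + 2] = clamp8((c + 516 * u + 128) >> 8);
        }
    }
}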

Since the VREO board does not have Linux drivers, we utilized the Video4Linux (V4L) drivers in a C++ program to capture frames from the Sony camera. V4L is a video capture API and driver framework that is closely integrated with the Linux kernel and supports most USB cameras as well.
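The typical V4L2 capture sequence used in such a program looks roughly as follows. This is a condensed sketch with a single memory-mapped buffer and all error checks omitted; the device node is assumed to be /dev/video0.

#include <fcntl.h>
#include <linux/videodev2.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

int main() {
    int fd = open("/dev/video0", O_RDWR);

    // negotiate 1920x1080 YUYV frames
    v4l2_format fmt{};
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width = 1920;
    fmt.fmt.pix.height = 1080;
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
    fmt.fmt.pix.field = V4L2_FIELD_NONE;
    ioctl(fd, VIDIOC_S_FMT, &fmt);

    // request and memory-map one kernel capture buffer
    v4l2_requestbuffers req{};
    req.count = 1;
    req.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    req.memory = V4L2_MEMORY_MMAP;
    ioctl(fd, VIDIOC_REQBUFS, &req);

    v4l2_buffer buf{};
    buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;
    buf.index = 0;
    ioctl(fd, VIDIOC_QUERYBUF, &buf);
    void* mem = mmap(nullptr, buf.length, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, buf.m.offset);

    ioctl(fd, VIDIOC_QBUF, &buf);
    v4l2_buf_type type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    ioctl(fd, VIDIOC_STREAMON, &type);

    for (int i = 0; i < 100; ++i) {
        ioctl(fd, VIDIOC_DQBUF, &buf);  // wait for a filled YUYV frame in mem
        // ... convert mem to RGB (e.g. with yuyvToRgb) and display it ...
        ioctl(fd, VIDIOC_QBUF, &buf);   // hand the buffer back to the driver
    }

    ioctl(fd, VIDIOC_STREAMOFF, &type);
    munmap(mem, buf.length);
    close(fd);
    return 0;
}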


There are many different display libraries for Linux; in this application, however, we used the Linux framebuffer (fbdev), a hardware-independent abstraction layer for showing graphics and images on a screen, especially on the Linux console. The framebuffer allows direct access to the video memory without going through a higher-level library. It is very popular in embedded systems as a way to avoid the heavy overhead of the X Window system.
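Mapping the framebuffer follows the usual fbdev pattern, sketched below. The sketch assumes a /dev/fb0 device and, for simplicity, ignores the per-row padding that FBIOGET_FSCREENINFO would report as the real line length.

#include <fcntl.h>
#include <linux/fb.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>

int main() {
    int fb = open("/dev/fb0", O_RDWR);
    if (fb < 0) return 1;

    fb_var_screeninfo vinfo;
    ioctl(fb, FBIOGET_VSCREENINFO, &vinfo);   // query resolution and color depth

    std::size_t len = vinfo.yres * vinfo.xres * vinfo.bits_per_pixel / 8;
    uint8_t* fbmem = static_cast<uint8_t*>(
        mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fb, 0));

    // ... write converted RGB rows directly into fbmem here ...

    munmap(fbmem, len);
    close(fb);
    return 0;
}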

In this section, a V4L- and framebuffer-based capture and display application is described. First, frames are captured from the VREO board in YUYV format. Then, the raw data is converted to RGB on the CPU, using either floating-point or fixed-point arithmetic, and displayed on the console using the framebuffer. The actual frame rates achieved with the VREO interface board and Sony camera on the TK1 when capturing 1080p video are shown in Table 5.

Table 5. Streaming video processing results using the VREO and Sony system.

Conversion and display method            Frame rate
Floating-point numbers and framebuffer   6 FPS
Fixed-point numbers and framebuffer      19 FPS
ArrayFire                                11 FPS

As can be seen from Table 5, integer arithmetic is much faster than floating-point arithmetic. However, even 19 FPS is not sufficient for real-time systems. In future work, higher frame rates can possibly be obtained by:

- using low-level ARM NEON instructions for the YUV-to-RGB conversion,
- using the NVidia Performance Primitives (NPP) library, a GPU-accelerated image, video, and signal processing library, or
- using the TK1's hardware H.264 encoder and the GStreamer interface.

The C++ source code of this study can be downloaded from [24].

10. Conclusions

In this progress report, several benchmark image processing applications were realized on the NVidia Jetson TK1 development board. Single-image processing, video file processing, and streaming video processing with different high-speed, high-resolution cameras were evaluated. Considerable optimization is still needed to raise the frame rates of these applications to real-time levels.


References

[1] NVidia, "Tegra K1," [Online]. Available: http://www.nvidia.com/object/tegra-k1-processor.html.

[2] NVidia, "Jetson TK1," 2015. [Online]. Available: http://www.nvidia.com/object/jetson-tk1-embedded-dev-kit.html.

[3] Elinux, "TK1 Power Consumption," 2015. [Online]. Available: http://elinux.org/Jetson/Jetson_TK1_Power#Typical_power_draw_of_Jetson_TK1.

[4] F. Cocchioni et al., "Visual Based Landing for an Unmanned Quadrotor," Journal of Intelligent and Robotic Systems: Theory and Applications, article in press, 2015.

[5] D. Zhao, "Fast filter bank convolution for three-dimensional wavelet transform by shared memory on mobile GPU computing," Journal of Supercomputing, vol. 71, no. 9, pp. 3440-3455, 2015.

[6] Elinux, "Jetson TK1," 2015. [Online]. Available: http://elinux.org/Jetson_TK1.

[7] NVidia, "Tegra K1 Whitepaper," 2015. [Online]. Available: http://www.nvidia.com/content/PDF/tegra_white_papers/Tegra-K1-whitepaper-v1.0.pdf.

[8] P. Yalamanchili, U. Arshad, Z. Mohammed, P. Garigipati and P. Entschev, "ArrayFire - A high performance software library for parallel computing with an easy-to-use API," AccelerEyes, Atlanta, 2015.

[9] E. L. Sueur and G. Heiser, "Dynamic voltage and frequency scaling: the laws of diminishing returns," in Proceedings of the 2010 International Conference on Power Aware Computing and Systems (HotPower'10), Berkeley, CA, USA, 2010.

[10] K. Stokke, H. Stensland, C. Griwodz and P. Halvorsen, "Energy efficient continuous multimedia processing using the tegra K1 mobile SoC," in Proceedings of the 7th ACM Workshop on Mobile Video, MoVid 2015, 2015.

[11] Ubuntu, "What is the difference between a desktop environment and a window manager?," 2015. [Online]. Available: http://askubuntu.com/questions/18078/what-is-the-difference-between-a-desktop-environment-and-a-window-manager.


[12] B. Olson, "Imaging Robotics and Intelligent Systems Lab. UTK," 2015. [Online]. Available: http://imaging.utk.edu/classes/spring2015/ece491/molson5/index.html.

[13] Wikipedia, "Gamma Correction," 2015. [Online]. Available: https://en.wikipedia.org/wiki/Gamma_correction#/media/File:GammaCorrection_demo.jpg.

[14] T. Kurban, "Image Gamma Correction ArrayFire Source Codes, IRIS Lab.," 2015. [Online]. Available: http://imaging.utk.edu/research/tkurban/software/AF_Image_Gamma.zip.

[15] Mcclanahoochie, "Image Processing with ArrayFire and OpenCV on the GPU," 2015. [Online]. Available: http://blog.accelereyes.com/blog/2012/09/19/image-processing-with-arrayfire-and-opencv/.

[16] R. Kurban, "Video File Processing using OpenCV and ArrayFire," 2015. [Online]. Available: http://imaging.utk.edu/research/rifkur/software/AF_OpenCV_VideoFromFile.zip.

[17] Elinux, "Jetson H264 Codec," 2015. [Online]. Available: http://elinux.org/Jetson/H264_Codec.

[18] NVidia, "Jetson TK1 Multimedia User Guide," 2015. [Online]. Available: http://developer.download.nvidia.com/embedded/L4T/r21_Release_v3.0/L4T_Jetson_TK1_Multimedia_User_Guide_V2.1.pdf.

[19] Ximea, "xiApi Linux ARM Support," 2015. [Online]. Available: http://www.ximea.com/support/wiki/apis/Linux_ARM_Support.

[20] T. Kurban, "Streaming Video Processing using Ximea Camera," 2015. [Online]. Available: http://imaging.utk.edu/research/tkurban/software/AF_Xi_VideoStream_Gamma.zip.

[21] VREO, "Downloads," 2015. [Online]. Available: http://vreo.biz/downloads.

[22] Ubuntu, "Qt V4L2 Test Utility," 2015. [Online]. Available: https://apps.ubuntu.com/cat/applications/oneiric/qv4l2/.

[23] MSDN, "Recommended 8-Bit YUV Formats for Video Rendering," 2015. [Online]. Available: https://msdn.microsoft.com/en-us/library/windows/desktop/dd206750%28v=vs.85%29.aspx.

[24] R. Kurban, "Streaming video processing using VREO board and Sony camera," 2015. [Online]. Available: http://imaging.utk.edu/research/rifkur/software/v4l_fb.cpp.
