The LPGPU2 Project: Low-Power Parallel Computing on Gpus
Total Page:16
File Type:pdf, Size:1020Kb
View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by DepositOnce Juurlink, B.; Lucas, J.; Mammeri, N.; Bliss, M.; Keramidas, G.; Kokkala, C.; Richards, A. The LPGPU2 Project: Low-Power Parallel Computing on GPUs Conference paper | Accepted manuscript (Postprint) This version is available at https://doi.org/10.14279/depositonce-7088 © ACM 2017. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in SCOPES '17 Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems http://dx.doi.org/10.1145/3078659.3078672 Juurlink, B., Lucas, J., Mammeri, N., Bliss, M., Keramidas, G., Kokkala, C., & Richards, A. (2017). The LPGPU2 Project. In Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems - SCOPES ’17. ACM Press. https://doi.org/10.1145/3078659.3078672 Terms of Use Copyright applies. A non-exclusive, non-transferable and limited right to use is granted. This document is intended solely for personal, non-commercial use. The LPGPU2 Project - Low-Power Parallel Computing on GPUs Extended Abstract Ben Juurlink Jan Lucas Nadjib Mammeri TU Berlin TU Berlin TU Berlin Einsteinufer 17 Einsteinufer 17 Einsteinufer 17 Berlin, Germany 10587 Berlin, Germany 10587 Berlin, Germany 10587 [email protected] [email protected] [email protected] Martyn Bliss Georgios Keramidas Chrysa Kokkala Samsung Electronics Think Silicon Think Silicon Communications House Patras Science Park Patras Science Park Staines, UK Rion Achaias, Greece 26504 Rion Achaias, Greece 26504 [email protected] [email protected] [email protected] Andrew Richards Codeplay Argyle House Edinburgh, Scotland [email protected] ABSTRACT 1 INTRODUCTION TheLPGPU2 project is a30-month-project (InnovationAction) Consumers today expect to be able to carry a supercomputer fundedby theEuropean Union.Itsoverall goalis to develop an aroundwiththem,thathasdetailedgraphicaldisplays,is easy to analysisandvisualizationframework thatenablesGPU application useandlastsformorethanadayon asinglebatterycharge.Not developers to improve theperformance andpower consumptionof only that, but now users expect their devices to see, listen and theirapplications.To achieve thisoverall goal,several key objec- understand the world around them. This applies to devices from tives need to be achieved. First,several applications(usecases)need smartphonesthatcanunderstandhumanspokenrequeststo allthe to be developed foror ported to low-power GPUs. Thereafter,these way to cars that can drive themselves. To deliver on this applicationsneed to be optimizedusingthetooling framework. capability requires very power efficient graphics processors In addition,power measurementdevices andpower models need ("GPUs"). to be developed thatare 10xmoreaccuratethanthestateof the Powerconsumptionis importantto processing,smartpower art.Theproject consortiumactivelypromotes open vendor-neutral managementalgorithmssuchas dynamicvoltagefrequencyscaling standards viathe Khronosgroup. Thispaper brieflyreports on the (DVFS)areusedas amitigationforhighpowerconsumptionbut achievements madein thefirsthalfof theproject, andfocuses on thisusuallyresultsin alesscompellinguserexperienceas theCPU theprogress madein applications;in power measurement,esti- andGPUareclockeddownto conservepowerresultingin lessraw mation,andmodelling; andin theanalysisandvisualizationtool processingpower.Thestrictpowerlimitationsmeansthatthese suite. demandscannotbe metthroughhardwareimprovementsalone,the softwaremustbetterexploittheavailableresources.Unfortunately, programmersarehinderedwhencreatinglow-powerGPUsoftware by thequalityof currentperformanceanalysistools.As software becomesmorecomplexit becomesincreasinglyunmanageablefor ,&:803%4 programmersto optimizethesoftwareforlow-powerdevices. (16T MPXQPXFS FNCFEEFE QPXFSNPEFMJOH QFSGPSNBODFDPVO TheLPGPU2consortiumhascometogetherto workas adiverse UFST NJDSPCFODINBSLT WJTVBMJ[BUJPO GSBNFXPSL teamon deliveringlow-powerGPUsfromalltherangeof angles 1 and functionality. These applications will be used as benchmarks for the power and performance analysis tool being developed as part of the project. Figure 1 illustrates the role of the applications in LPGPU 2 . The applications under consideration are important commer- cial applications for individual partners in the area of their core businesses: • Spin Digital develops video codecs for the next generation of ultra high quality media. In LPGPU 2 Spin Digital is de- veloping a media player based on the H.265 video codec. The main contribution has been the development of a high performance and high quality multi-API video rendering engine. When combined together with the H.265 video de- Figure 1: General diagram of the LPGPU 2 framework coder, the new video rendering engine allows the creation of state-of-the art media playback applications in areas such as UHD TV, Virtual Reality (VR), and large screen required. The consortium combines commercial tools, applications, display. platform and GPU designers with academic researchers to analyse • Samsung graphics team in the UK is responsible for android GPU power and performance, define standard interfaces to reliably mobile graphics in their platform. Their applications are measure the power and performance, and create a tool chain to related to Virtual Reality (VR), Augmented Reality (AR) provide clear information and insights to software developers. The and font rendering. Any output from this work will help companies in the consortium are world leaders in power-efficient in improving Samsungs’ mobile graphics platform where GPU design (Think Silicon), GPU and compute tools (Codeplay), their mobile phones could be used for VR using GearVR, graphics standards and applications (Samsung), video codecs and their mobile camera can used for AR and their mobile media players (Spin Digital), and the university in the consortium phone screen could be used for displaying text using font (Technical University of Berlin TUB) has leading experts on parallel rendering. applications and multi-core architectures. • To realize the possibilities of computational photography, This project proposes to aid the programmer in creating software applications must access the hardware at low level. This for low-power GPUs by building on the results of the first LPGPU would allow to control individual ISP blocks, as well as project [ 2] to provide a complete performance analysis process for send and receive detailed information from the hardware, programmers. It will address all aspects of performance analysis, including exposure settings, flash, focus point, timestamps, from hardware power and performance counters, to a tool chain and raw image sensor data. Using the GPU for post-camera that processes and visualizes information from these counters. The processing is a promising direction since the bulk of the main objectives of this project are: memory traffic can be eliminated. Think Silicon has built 1. To help programmers to improve the energy efficiency of three different implementations of a set of ISP algorithms their applications, the LPGPU2 tool chain will provide hints and (in C, in NEMA|gfx, and in Vulkan). An ISP demonstrator suggestions to GPU programmers showing ways to reduce power. based on NEMA|gfx [ 10 ], a proprietary low-level graphics 2. To enable programmers to be able to write their software API of Think Silicon, was developed and presented at in- once and run it on a variety of different low-power GPUs. The dustrial exhibitions with the goal of exploiting commercial LPGPU2 project will work on standardizing power analysis and opportunities of executing ISP and post-camera processing power-efficient programming models. algorithms on embedded GPUs. 3. To increase the productivity in GPU software development, • TensorFlow ™[1], is an artificial intelligence framework we propose an approach in which there are layers of technologies, that can be used for executing machine learning algorithms. that all work together via open standards, or open source software. While a computation expressed using TensorFlow can be 4. To bring technologies to market in a commercializable form, executed across heterogeneous systems, support has so including productizing and commercializing the technologies de- far been limited to NVIDIA ® processors using CUDA ® [6]. veloped in the previous LPGPU project [ 2]. This includes bringing In order to enable developers to access a wider range of the SYCL ™ standard into real-world AI applications and putting processors, we are working to bring support for OpenCL the LPGPU video decoders into commercial video systems. devices to the TensorFlow framework using SYCL. OpenCL is a framework for writing programs that execute across 2 APPLICATIONS heterogeneous platforms, and SYCL is a royalty-free, cross- As part of LPGPU 2 several applications are being developed and platform C++ abstraction layer that builds on the underly- ported to low-power GPU environments. These applications com- ing concepts, portability and efficiency of OpenCL, while bine graphics and compute parts, and require the use of existing adding the ease-of-use and flexibility of modern C++14. Par- and emerging APIs such as OpenGL ES ™ [4], OpenCL ™ [7, 11 ], allelization is also important from a processing and power and Vulkan ™ [13 ], in order to deliver the required performance management perspective.