Reducing Redundancy of Real Time Computer Graphics in Mobile Systems

Enrique de Lucas

Doctor of Philosophy, UNIVERSITAT POLITÈCNICA DE CATALUNYA

Department of Computer Architecture, Barcelona, 2018


Advisers: Dr. Joan-Manuel Parcerisa and Dr. Pedro Marcuello

ARCO Research Group, Department of Computer Architecture, UNIVERSITAT POLITÈCNICA DE CATALUNYA, Barcelona, Spain

Thesis submitted for the degree of Doctor of Philosophy at the Universitat Politècnica de Catalunya

‘Gutta cavat lapidem, [non vi, sed saepe cadendo]’

–Publius Ovidius Naso, in Epistulae ex Ponto IV, 10, 5. Expanded in the Middle Ages.

Keywords

Collision Detection, GPU, Android, Rasterization, Mobile GPU, Rendering, Image Based Collision Detection, Graphics Rendering Hardware, Object Interference Detection, Rasterizing Graphics Hardware, Overdraw, Overshading, Rendering Order, Front-to-back, GPU Microarchitecture, Visibility, Hidden Surface Removal, Deferred Rendering, Deferred Shading, TBR, TBDR, Temporal Coherence, Frame Coherence, Topological Order, Topological Sort, Energy-efficiency


Abstract

At each new generation, the growing computing power of mobile devices has promoted the adoption of more powerful support for graphics and real-time physics simulations, with increasing precision and realism. Given the battery-operated and handheld nature of these devices, they must use as little energy as possible: low consumption is crucial to extend battery life and to keep a comfortable surface touch temperature. Hence, software and hardware improvements are crucial to deliver a low-power yet rich user experience that satisfies the user demands on the functionality of mobile devices.

The goal of this thesis is to propose novel and effective techniques to eliminate the redundant, energy-wasting computations performed in real-time computer graphics applications, with special focus on the mobile GPU microarchitecture. Improving the energy-efficiency of CPU/GPU systems is not only key to extending their battery life, but also allows increasing their performance because, to avoid overheating above thermal limits, SoCs tend to be throttled when the load is high for a long period of time. Prior studies pointed out that the CPU and especially the GPU are the principal energy consumers in the graphics subsystem, with the off-chip main memory accesses and the processors inside the GPU being its primary energy sinks.

In the first place, we focus on reducing redundant fragment processing computations by improving the culling of hidden surfaces. During real-time graphics rendering, objects are processed by the GPU in the order they are submitted by the CPU, and occluded surfaces are often processed even though they will end up not being part of the final image. By the time the GPU realizes that an object or part of it is not going to be visible, all the activity required to compute its color and store it has already been performed. We propose a novel architectural technique for mobile GPUs, Visibility Rendering Order (VRO), which reorders objects front-to-back entirely in hardware to maximize the culling effectiveness of the GPU and minimize overshading, hence reducing execution time and energy consumption. VRO exploits the fact that the objects in graphics animated applications tend to keep their relative depth order across consecutive frames (temporal coherence) to provide the feeling of smooth transition. VRO keeps the visibility information of a frame and uses it to reorder the objects of the following frame. Since depth-order relationships among objects are already tested by the GPU, VRO incurs minimal energy overheads: it just requires adding a small hardware unit to capture the visibility information and use it later to guide the rendering of the following frame. Moreover, VRO works in parallel with the graphics pipeline, so it incurs negligible performance overheads. We illustrate the benefits of VRO using various unmodified commercial 3D applications, for which VRO achieves a 27% speed-up and a 14.8% energy reduction on average.

In the second place, we focus on avoiding redundant computations related to Collision Detection on the CPU. Graphics animation applications such as 3D games represent a large percentage of the applications downloaded for mobile devices, and the trend is towards more complex and realistic scenes with accurate 3D physics simulations. Collision Detection (CD) is one of the most important algorithms in any physics kernel, since it identifies the contact points between the objects of a scene and determines when they collide. However, real-time, highly accurate CD is very expensive in terms of energy consumption. We propose Render Based Collision Detection (RBCD), a novel energy-efficient high-fidelity CD scheme that leverages some intermediate results of the rendering pipeline to perform CD, so that redundant tasks are done just once. Comparing RBCD with a conventional CD completely executed on the CPU, we show that its execution time is reduced by almost three orders of magnitude (600x speedup), because most of the CD task in our model comes for free by reusing the intermediate results of image rendering. Such a dramatic time improvement may translate into a higher frame rate if the physics simulation lies on the critical path, although this is not necessarily the case. However, the most important advantage of our technique is the enormous energy savings that result from eliminating a long and costly CPU computation and converting it into a few simple operations executed by specialized hardware within the GPU. Our results show that the energy consumed by CD is reduced on average by a factor of 448x (i.e., by 99.8%). These dramatic benefits are accompanied by a higher-fidelity CD analysis (i.e., with finer granularity), which improves the quality and realism of the application.

Acknowledgements

First of all I want to thank my advisers, Joan-Manuel Parcerisa and Pedro Marcuello, who have taught me almost everything I know about Computer Architecture. They always offered me valuable guidance and support, and they were involved in my day-to-day work. I am truly convinced I could not have had better advisers. I am also very grateful to Prof. Antonio González, who offered me the opportunity to start in this research area, first at Intel Barcelona and then at UPC¹. Thanks for your sagacious feedback and your wise advice.

I wish to thank the members of ARCO I met during these years. Especially José María Arnau, whose outstanding and pioneering work encouraged me to work in the field of low-power GPUs. I am grateful for all his support and help, and for teaching me what an international conference is about. Thanks to Martí (the one) for continuing this research line in the ARCO group. Thank you Gem and Emilio for your feedback and friendship. I also wish to thank "The D6ers": Albert, Franyell, Hamid, Josue, Marc, Martí (the other one) and Reza. I wish you the best of luck. I am lucky to have met all my friends from Campus Nord: Enric, Gemma, Javi, Manu, Marc, Niko and Oscar, who welcomed me to Barcelona and shared with me seminars, meetings, coffees, dinners and parties. However, they did not warn me about what I was getting into when I started the doctorate.

I am really blessed to have so many great friends like Alex, Andrés, David, Guille, Javi, Manu, Oscar and Rubén, who have shared many moments with me for more than 15 years. Thank you for your moral support, motivation and friendship, which made this journey more comfortable and drove me to give my best.

Finally, I would like to express my deepest gratitude to my family for their commitment, their unconditional love, their support and their infinite patience. Especially Jenifer, who closely shared with me the joys and celebrations of this thesis, but also had to put up with me during the endless periods of stress that precede academic deadlines. Thank you for your patience, understanding and support during these years.

¹ This work has been supported by the Spanish State Research Agency under grants TIN2013-44375-R, TIN2016-75344-R (AEI/FEDER, EU), and BES-2014-068225.

Contents

1 Introduction
1.1 Current Trends
1.1.1 Real Time Mobile Graphics Software
1.1.2 Mobile Graphics Hardware
1.2 Problem Statement
1.2.1 Major Energy Consumers
1.2.2 Major GPU Energy Consumers
1.2.3 Occlusion Culling
1.2.4 Collision Detection
1.3 State of the Art
1.3.1 Reduction of Redundant Fragment Shading
1.3.2 Collision Detection
1.4 Thesis Overview and Contributions
1.4.1 Visibility Rendering Order
1.4.2 Render Based Collision Detection
1.4.3 Other Contributions

2 Background
2.1 Graphics Rendering Pipeline
2.1.1 Application Stage
2.1.2 Geometry Stage
2.1.3 Rasterization
2.2 GPU Microarchitecture
2.2.1 Immediate Mode Rendering
2.2.2 Tile Based Rendering

3 Methodology
3.1 Simulators
3.1.1 GPU Simulation
3.1.2 Collision Detection CPU Simulation with Marss86 and Bullet
3.2 Benchmarks
3.2.1 Benchmarks Set
3.2.2 Benchmarks Characterization

4 Visibility Rendering Order: Improving Energy Efficiency on Mobile GPUs through Frame Coherence
4.1 Visibility Determination and Overshading
4.2 Visibility Rendering Order
4.2.1 Overview
4.2.2 Graph Generation
4.2.3 Sort Algorithm
4.2.4 Heuristics to Sort the Objects in a Scene
4.2.5 Partial Order of Objects
4.2.6 Visibility Rendering Order Adjustments
4.3 Microarchitecture
4.3.1 Deferred Rendering TBR GPU
4.3.2 Visibility Rendering Order TBR GPU
4.4 Experimental Framework
4.4.1 GPU Simulation
4.5 Experimental Results
4.5.1 Effectiveness of VRO
4.5.2 Overshading with Different Heuristics to Break Graph Cycles
4.6 Conclusion

5 Render-Based Collision Detection for CPU/GPU Systems
5.1 Collision Detection
5.1.1 Image Based Collision Detection
5.1.2 Enabling RBCD in the GPU
5.2 Microarchitecture
5.2.1 RBCD Overview
5.2.2 Identification of Collisionable Objects
5.2.3 Deferred Face Culling
5.2.4 Insertion into the Z-depth Extended Buffer
5.2.5 Z-Overlap Test
5.2.6 Animation Loop
5.2.7 Power Model of the RBCD Unit
5.2.8 CPU Collision Detection Simulation
5.3 Experimental Results
5.3.1 Performance and Energy Consumption Benefits
5.3.2 GPU Overheads
5.3.3 Sensitivity to ZEB List Length
5.4 Conclusions

6 Conclusions
6.1 Conclusions
6.2 Future Work

Appendices

A Visibility Rendering Order on IMR GPUs
A.1 Immediate Visibility Rendering Order
A.1.1 Visibility Rendering Order Adjustments
A.1.2 Visibility Rendering Order IMR GPU
A.2 Experimental Framework
A.3 IMR-VRO Results
A.3.1 Software Z-Prepass
A.4 Conclusions

List of Tables

3.1 CPU Simulation Parameters.
3.2 Benchmarks Set.
3.3 Geometry Stage Stats.
3.4 Rasterization Stage Stats.
4.1 VRO alternatives.
4.2 GPU Simulation Parameters.
5.1 CPU/GPU Simulation Parameters.
5.2 Benchmarks.
5.3 Percentage of fragment overflow for a ZEB with 4, 8 or 16 entries (each entry holds data for one fragment).
A.1 Benchmarks.
A.2 GPU Simulation Parameters.


List of Figures

1.1 Millions of units of smartphones and PCs shipped. Source: Gartner [5, 7, 9, 14]. * Projected data.
1.2 Monthly OS Market Share from June 2012 to June 2017. Data provided by Statcounter Global Stats [19].
1.3 Mobile Graphics Hardware Market Share. Data provided by Unity, March 2017 [11].
1.4 Battery Capacity of Samsung Galaxy S smartphone series.
1.5 Total Power Consumption, GPU load and CPU load of a mobile device. Measurements made with Trepn Power Profiler [35, 33], with special features for Snapdragon SoCs. The phone employed in the two tests is a Samsung Galaxy J5, equipped with a 720x1280 (5") Super AMOLED display (294 ppi) and powered by a 28nm Qualcomm Snapdragon 410 MSM8916 SoC [34], which includes a 64-bit quad-core 1.2 GHz Cortex-A53 CPU and an Adreno 306 GPU. Both tests were done with screen brightness set to the minimum and WiFi disabled (cellular data enabled).
1.6 GPU energy breakdown and main memory BW breakdown of a mobile TBR GPU. Numbers obtained with the set of benchmarks introduced in Section 3.2.
1.7 Simplified version of the Graphics Pipeline.
2.1 Conceptual stages of the Graphics Rendering Pipeline.
2.2 Vertex-level transformations in the Graphics Rendering Pipeline. (1) The vertices of the 3D model are in Object Coordinates. (2) The vertices are scaled, rotated and translated to World Coordinates. (3) The vertices are positioned in the camera scope, transforming them to Eye Coordinates (see detail of camera in orange outline). (4) The vertices are transformed to Clip Coordinates by projecting them onto the near clip plane. (5) Perspective correction is applied to transform vertices to Normalized Device Coordinates. (6) Viewport transform is applied to translate vertices to Viewport Coordinates. Car 3D model courtesy of Alexander Bruckner [24].
2.3 Perspective (left) and orthographic (right) viewing volumes.
2.4 OpenGL normalized viewing volume.
2.5 Examples of common topologies used both by OpenGL and DirectX [167].
2.6 View-frustum including viewport.
2.7 Clipping cases for triangles (top), lines (middle) and points (bottom).
2.8 Clockwise and counter-clockwise winding triangles.
2.9 3D model of a sphere (top-left), detail showing the effect of face culling (top-right), wire-frame view of the sphere without face culling (bottom-left), wire-frame of the sphere with face culling enabled (bottom-right).
2.10 Sub-stages of the Rasterization stage of the Graphics Pipeline. In this example, a red triangle is rasterized over a blue background, and some fragments of the red triangle are occluded by a green triangle.
2.11 Edge functions (E01, E12 and E20) of a primitive defined by vertices v0, v1 and v2.
2.12 Detail of scene where three objects A, B and C are rendered in C, A, B order. The image depicts the fragments of every object that pass the Depth Test.
2.13 Detail of scene where two objects are blended. The image depicts how the fragments of a translucent object are blended with the colors already stored in the Color buffer, providing a transparency effect.
2.14 Microarchitecture of an IMR GPU.
2.15 Microarchitecture of a TBR GPU.
2.16 Microarchitecture of a TBR GPU implementing Deferred Rendering.
3.1 Overview of Teapot simulation infrastructure.
3.2 like architecture (left), Mali 400 MP like architecture (right). Images from Teapot paper [111].
3.3 Overview of Marss components. Source: www.marss86.org.
3.3 This figure shows a screenshot for each of the Android games included in our set of benchmarks.
4.1 Simplified version of the Graphics Pipeline.
4.2 Shaded fragments per pixel in a GPU without Early-depth test, with Early-depth test and with perfect front-to-back rendering order at object granularity.
4.3 Graphics pipeline: (a) Sequential DR. (b) Parallel DR.
4.4 Graphics pipeline including VRO.
4.5 Visibility Graph generation for the given scene.
4.6 Sorting a Visibility Graph with Kahn's algorithm.
4.7 Two example cases where object B sits in front of A and C. The shaded region highlights the overlap between A and C.
4.8 Raster Pipeline of a TBR GPU implementing Deferred Rendering.
4.9 Raster Pipeline of a TBR GPU implementing VRO.
4.10 Detail of Tile Engine structures involved in Geometry Fetching.
4.11 Detail of an entry of the Graph Buffer. Each entry is 512 bits (including 11 bits of padding).
4.12 Size of the Graph Buffer for different number of children nodes (W) and different number of maximum objects (from 8192 to 131072).
4.13 (Top) Nodes per frame, edges per frame and maximum number of nodes in an adjacency list. (Bottom) 75th, 85th and 95th percentiles of the size of the adjacency lists of the scene graphs analyzed.
4.14 Edge Insertion Hardware.
4.15 Visibility Sort Hardware. (a) Initial search. (b) Iterative procedure.
4.16 Speed-up of DR and VRO normalized to the baseline TBR GPU.
4.17 Energy consumption of DR and VRO normalized to the baseline TBR GPU.
4.18 Overshading of DR and VRO normalized to the overshading of the baseline TBR GPU.
4.19 Number of cycles to read a primitive with DR and VRO.
4.20 Normalized memory traffic of DR and VRO with respect to baseline GPU.
4.21 Increment of cycles reading geometry (first bar), increment of stall cycles caused by the Fragment Processing stage (second bar), and increment of cycles of execution of the Raster Pipeline (third bar), all using DR with respect to VRO.
4.22 Energy breakdown for the system Main-Memory/GPU with DR (left) and VRO (right), both normalized to the baseline GPU.
4.23 Normalized Overshading with different Heuristics to break Graph Cycles.
5.1 Discretized representation of the entry and the exit points of the surfaces in a 3D scene for pixels P1, P2 and P3. The Y-axis is a one-dimensional representation of the projection plane and the Z-axis represents depth.
5.2 (a) Front view of a 3D scene. (b) AABBs as collisionable shapes. (c) Convex hull for GJK algorithm. (d) RBCD.
5.3 GPU microarchitecture including an RBCD unit.
5.4 Sorted insertion hardware.
5.5 Interference cases between front-faces and back-faces of two objects, A and B.
5.6 Z-overlap Test hardware.
5.7 Example of game loop execution in the CPU/GPU system, (a) without RBCD, (b) with RBCD. CR and GCI stand for Collision Response and GPU Command Issue, respectively.
5.8 (a) RBCD speedup regarding Broad-CD. (b) Normalized energy consumption of RBCD regarding Broad-CD. (c) RBCD speedup regarding GJK-CD. (d) Normalized energy consumption of RBCD regarding GJK-CD.
5.9 (a) Normalized rendering time of the GPU with RBCD w.r.t. the baseline GPU. (b) Normalized energy of the GPU and main memory with RBCD w.r.t. the baseline GPU.
5.10 (a) Normalized energy consumption of the GPU with RBCD w.r.t. the baseline GPU. (b) Normalized main memory energy of RBCD w.r.t. the main memory energy of the baseline GPU.
5.11 Energy GPU/Main memory breakdown.
5.12 GPU time breakdown including time of Geometry and Raster pipelines.
5.13 Tile Cache loads, primitives, fragments, and Raster Cycles with the GPU including RBCD normalized to the GPU baseline.
6.1 Simplified version of the Graphics Pipeline.
A.1 Memory bandwidth usage on a mobile GPU for a set of commercial Android games. On average 98.5% of the bandwidth to main memory is caused by operations performed after rasterization.
A.2 Graphics pipeline including VRO.
A.3 Baseline IMR GPU architecture (left) and IMR GPU architecture including VRO (right).
A.4 Speed-up of IMR-VRO normalized to the baseline IMR GPU.
A.5 Energy consumption of IMR-VRO normalized to the baseline IMR GPU.
A.6 Energy breakdown of IMR-VRO normalized to the baseline IMR GPU.
A.7 Overshading and Instructions Executed in the Fragment Processors with IMR-VRO normalized to those of the baseline IMR GPU.
A.8 Ratio between the time savings and the time overhead of VRO (higher is better).
A.9 Energy Delay Product of IMR-VRO normalized to the baseline IMR GPU (lower than 1 is better).
A.10 Memory bandwidth usage of IMR-VRO normalized to baseline IMR GPU.
A.11 Activity factors with IMR-VRO normalized to those of the baseline IMR GPU for Color cache (top left), Depth cache (top right), Texture Caches (bottom left) and L2 cache (bottom right).
A.12 Simplified version of the Graphics Pipeline executing Z-Prepass.
A.13 Speed-up of Z pre-pass normalized to the baseline IMR GPU.
A.14 Energy consumption of Z pre-pass normalized to the baseline IMR GPU.
A.15 Energy Delay Product of Z pre-pass normalized to the baseline IMR GPU (lower than 1 is better).
A.16 Memory bandwidth usage of Z pre-pass normalized to baseline IMR GPU.
A.17 Speed-up of VRO and Z pre-pass normalized to the baseline IMR GPU (one FP in all cases).
A.18 Energy consumption of VRO and Z pre-pass normalized to the baseline IMR GPU (one FP in all cases).
A.19 Energy Delay Product of VRO and Z pre-pass normalized to the baseline IMR GPU (lower than 1 is better, one FP in all cases).


1 Introduction

This chapter introduces the main issues in the architectural design of mobile GPUs and how they have evolved over the years. Then, we present the specific problems we approach in this thesis, how these problems have been addressed by other authors and, finally, our proposals to solve them.

1.1 Current Trends

In recent years, mobile devices have become powerful and ubiquitous computational engines. They have quickly incorporated graphics and animation capabilities that in the 1980s were only seen in industrial flight simulators and that a decade later became popular in desktop computers and game consoles. At each new generation, the growing computing power of mobile devices has promoted the adoption of more powerful support for graphics and real-time physics simulations in all mobile devices, with increasing precision and realism, a trend that shows no sign of slowing down in the near future. Furthermore, with the rise of Mobile Augmented Reality and Mobile Virtual Reality applications, consumers are expected to keep demanding an increasing degree of realism in interactive rendering and at higher rendering resolutions.

Nowadays, among other uses, we can surf the Internet at high speed, play 3D games, play and record HD video, and take high-resolution photographs. The evolution of the mobile device, from a device just to make and receive voice calls to a multimedia and multitasking device, has led the phone to surpass the computer in the leisure time we spend on multimedia. As Figure 1.1 shows, sales of mobile devices have dramatically increased, from around 0.97 billion units in 2013 to over 1.2 billion units in 2014 and more than 1.9 billion units in 2015. In 2016 the number of units shipped decreased, but it is expected to grow in the following years. At the same time, the number of downloaded applications was estimated to be above 100 billion in 2013 and is expected to reach around 268 billion in 2017 [3]. Furthermore, the rise of the Internet-of-Things and the introduction of Virtual and Augmented Reality may further increase this tremendous growth rate in the near future.

Figure 1.1: Millions of units of smartphones and PCs shipped. Source: Gartner [5, 7, 9, 14]. * Projected data.

1.1.1 Real Time Mobile Graphics Software

In this section we briefly review the software stack of mobile devices, including Operating Systems, applications, game engines, physics simulation and graphics APIs. The Operating System (OS) is the software that provides services to the applications in current mobile devices. The OS gives access to the rich variety of hardware available in the system, like the screen, the microphone, the speakers, the accelerometer, the GPS, the camera, or the GPU. Note that in current OSs even the user interface uses GPU acceleration. Android [10] and iOS [12] are the prevalent mobile OSs, with around 72% and 20% of the mobile OS market share, respectively. Moreover, as Figure 1.2 shows, since April 2017 the OS market share of Android (39%) even exceeds that of Windows (37%).

Android is an open source software stack based on Linux, which is meant to work on a great variety of devices (tablets, smart watches, televisions, even phones) from multiple vendors (Samsung, Google, LG, Huawei, Sony, BQ, etc.). Android provides numerous layouts that allow the programmer to combine different pre-compiled objects in order to create an interface that provides a rich user experience. However, the developer is responsible for adequately employing Android's resources to deliver applications whose user interfaces are flexible enough to be compatible with different devices.

Regarding applications, there is a plethora of mobile applications available on on-line stores like the Google Play Store [44] and the Apple App Store [51], with 3D and 2D graphics animation applications being the most popular ones [15, 13]. To develop such applications it is common to employ a game engine.

Figure 1.2: Monthly OS Market Share from June 2012 to June 2017. Data provided by Statcounter Global Stats [19].

A game engine is an SDK that usually includes functionality for graphics rendering, physics simulation, animation, artificial intelligence and sound, among others. Some of the most popular game engines are Unity [21] and Unreal Engine [22], and both of them offer an optimized solution for mobile devices that allows creating 2D and 3D mobile games for iOS and Android.

Regarding physics simulation, the physics engine is the software responsible for providing realistic physical behavior to the objects in a scene, whose velocity and direction must be affected by collisions with other objects as well as by gravity and other forces. Bullet [129] stands out as an open source framework that performs collision detection and collision response between objects. Bullet can be used in the Unreal Engine, even though the latter includes its own physics simulation solution (Physics Assets Tool). Likewise, Unity supplies its own 2D and 3D physics engine as well as a plugin to use Bullet.
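To make the division of labor concrete, the following is a minimal sketch of a Bullet rigid-body world using its public C++ API; the world configuration, the falling sphere and the time step are our own illustrative choices, not taken from this thesis. Collision detection (broad and narrow phase) and collision response both run inside stepSimulation:

```cpp
// Minimal Bullet dynamics-world sketch; parameters are illustrative.
#include <btBulletDynamicsCommon.h>

int main() {
    btDefaultCollisionConfiguration config;       // narrow-phase settings
    btCollisionDispatcher dispatcher(&config);
    btDbvtBroadphase broadphase;                  // broad-phase (AABB tree)
    btSequentialImpulseConstraintSolver solver;   // collision response
    btDiscreteDynamicsWorld world(&dispatcher, &broadphase, &solver, &config);
    world.setGravity(btVector3(0.0f, -9.8f, 0.0f));

    // A 1 kg unit sphere dropped from y = 10.
    btSphereShape sphere(1.0f);
    btVector3 inertia(0, 0, 0);
    sphere.calculateLocalInertia(1.0f, inertia);
    btDefaultMotionState state(
        btTransform(btQuaternion::getIdentity(), btVector3(0, 10, 0)));
    btRigidBody::btRigidBodyConstructionInfo info(1.0f, &state, &sphere, inertia);
    btRigidBody body(info);
    world.addRigidBody(&body);

    // Animation loop: each step runs collision detection and then the
    // constraint solver (collision response) for the whole scene.
    for (int frame = 0; frame < 120; ++frame)
        world.stepSimulation(1.0f / 60.0f);

    world.removeRigidBody(&body);
    return 0;
}
```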

Regarding graphics rendering, the game engine offers a rendering method that handles the rendering of a given scene, acting as an intermediary between the application and the graphics API. The graphics API provides a low-level yet abstract way to access the specific graphics hardware accelerator included in the device. On desktop, the most common real-time 3D graphics APIs are OpenGL, from the Khronos Group [52], and DirectX, from Microsoft [42]. On the mobile side, both iOS and Android include native support for embedded graphics acceleration through OpenGL ES [57], a cross-language and multi-platform API that has gone through several versions since its first specification was released in July 2003. Although OpenGL ES 1.0 (based on OpenGL 1.3) is mostly meant to enable software renderers, it also enables basic hardware acceleration. The following versions enable advanced hardware acceleration in order to increase performance and reduce memory bandwidth to save energy. OpenGL ES 1.1 [16] (based on OpenGL 1.5) is meant for fixed-function hardware renderers (configurable but not programmable), while versions 2.X and 3.X enable fully programmable 3D graphics by replacing most fixed-function rendering pipeline stages with programmable versions [58, 17].

1.1.2 Mobile Graphics Hardware

Current mobile devices employ SoCs, which are heterogeneous computing systems where the key to enabling a low-power yet rich experience is to use the right accelerator for every task. The leading industry vendors of SoCs for mobile devices have already integrated new custom-designed accelerators such as Digital Signal Processors; Display, Connectivity and Security controllers; Image Signal Processors (camera support); Video Encoders/Decoders; and, of course, the GPU. In order to further improve energy efficiency, some vendors have even included special cores inside the GPU, like the 2D graphics composition cores [68, 23]. These special hardware accelerators either replace a software implementation that was previously executed on the general-purpose CPUs, or expand the capabilities of the SoC. In this section we briefly review the evolution of the main components of the graphics system of a mobile device: the display technology and the graphics acceleration hardware.

Regarding the display technology, we have seen an impressive increase in display resolution as well as in color depth [8]. Back in 2004, the first licensed OpenGL ES 1.0 device, the Imageon 2300 of ATI Technologies, offered a 320 x 240 display resolution and 16 bpp of color depth. Today, resolutions of 1920 x 1080 (Full HD) and 32 bpp are quite common and available in devices like the iPhone 7 Plus and the Samsung Galaxy S5. Other devices like the Samsung Galaxy S7 increase the display resolution to 2560 x 1440 (QHD) in 5.1 inches, while the Samsung Galaxy S8 goes up to a 2960 x 1440 (QHD+) native resolution in 5.8 inches. Among mid-range devices, the most common resolution is 1280 x 720 (HD Ready).

At the same time that the screen resolution has increased, mobile graphics rendering has significantly evolved: from creating simple text user interfaces, mostly implemented in software and executed on the CPU, to creating complex 2D/3D graphics animations that employ a dedicated hardware accelerator, the Graphics Processing Unit (GPU). In fact, in many cases the early mobile devices only had hardware support for integer arithmetic [59]. However, with the gradual adoption of higher screen resolutions and the necessity to render high-quality images, the graphics system was required to provide higher fill rates, and GPUs subsequently became an essential requirement.

The Khronos Group consortium maintains a list of products conformant with the different OpenGL ES specifications [18]. According to this list, the first mobile GPUs to support OpenGL ES 1.1 were the Mali-110/Mali-55 [1] in 2005. They were fixed-function hardware pipelines (configurable but non-programmable) for rasterization and Fragment Processing, offering fill rates in the order of 100 Mpix/sec [89]. Although Geometry Processing might still be performed by software running on the CPU, an optional MaliGP (Geometry Processor) could also be added to offload Geometry Processing from the CPU and enable 3D applications to run faster [190].

The first mobile GPUs to support OpenGL ES 2.0 arrived in 2008 and were the PowerVR SGX530 and the Mali-200, offering fill rates in the order of 200 Mpix/sec [53, 60]. OpenGL ES 2.0 replaced some fixed-function hardware stages of the pipeline with programmable stages able to execute vertex and fragment programs written in the OpenGL ES Shading Language, to improve energy efficiency. The first mobile GPU to officially support OpenGL ES 3.0 was the Adreno 320 [37], presented in 2013 in the Snapdragon 600 SoC of Qualcomm [67], with a reported fill rate of 1.6 Gpix/sec and able to process 225 Mtri/sec. The following OpenGL ES specifications made a big step towards programmability. Compute Shaders were added in OpenGL ES 3.1, while Geometry and Tessellation Shaders were introduced in OpenGL ES 3.2. The former allow performing computations not directly related to drawing triangles and fragments, while the latter allow efficiently processing complex scenes on the GPU by giving the programmer the opportunity to create new geometry not present in the input stream of vertices. The first GPU to support OpenGL ES 3.1 was the Qualcomm Adreno 510, in 2016. The first GPU to comply with OpenGL ES 3.2 was the ARM Mali-G71, also in 2016, able to process around 1850 Mtri/sec with a fill rate of 27.2 Gpix/sec.

Regarding the current state of the Mobile Graphics Hardware market (see Figure 1.3), it is worth mentioning that it is not dominated by a single vendor. However, ARM and Qualcomm are clearly ahead of the others, holding 44% and 35% of the market share respectively, followed by the duo formed by Apple and Imagination Technologies with more than 16% of the market (8.5% and 7.8% respectively). The list is closed by Vivante (1.8%), Broadcom (0.9%), NVIDIA (0.6%) and Intel (0.5%).


Figure 1.3: Mobile Graphics Hardware Market Share. Data provided by Unity, March 2017 [11].

ARM licenses the Mali family of GPUs, with three main lines: ultra-low power (Mali 400 series), high area efficiency (Mali 720) and high performance (Mali G71, T860/880) [55]. The Mali 400 MP [54] is the most deployed Mali GPU, with around 19% of the market. It includes OpenGL ES 1.1/2.0 support and is able to process 55 Mtri/sec and 2.0 Gpix/sec. The Mali 400 MP is a multi-core Tile Based Renderer, i.e., the rendering viewport (screen) is divided into tiles (bins), which are independently rendered tile by tile using on-chip buffers that minimize main memory bandwidth usage. This architecture is explained in more detail in Chapter 2. A Mali G71 GPU can be found in popular smartphones like the Samsung Galaxy S8 and S8+, powered by the Samsung Exynos SoC [65, 63].

Adreno is a series of mobile GPUs produced by Qualcomm and included in the Snapdragon SoCs [38, 69]. Rather than implementing either a tile-based or an immediate-mode renderer, Adreno GPUs implement both and are able to switch between the two modes at run time using the FlexRender technology [43]. Adreno GPUs can be found in commercial devices like the Samsung Galaxy S5 [64] or the HTC One M9 [47].

PowerVR [61] is the solution provided by Imagination Technologies for embedded devices. PowerVR implements a Tile Based Deferred architecture, which performs Hidden Surface Removal at pixel granularity before computing the pixel colors, i.e., it defers the computation of the pixel colors until the visibility for a pixel is determined. This approach reduces the Fragment Processor computations and further reduces main memory bandwidth. PowerVR GPUs can be found in several Mediatek SoCs, like the Helio x10/x30 [45, 46]. Furthermore, PowerVR designs are also included in Apple SoCs like the Apple A8, Apple A9 and Apple A10 [40, 41, 39], which are employed in popular mobile devices like the iPhone 6/6 Plus, iPad Pro and iPhone 7/7 Plus [49, 48, 50]. Sony's PlayStation Vita includes a PowerVR SGX543MP4+ GPU [62].

NVIDIA delivers the Tegra SoC [70], which includes an immediate-mode rendering GPU architecture specially designed and tuned for high performance and power efficiency. The Tegra X1 implements a Maxwell GPU architecture with 256 cores, making it the most advanced model that NVIDIA provides for mobile devices [71]. It can be found in devices like the Nintendo Switch game console [56] and the NVIDIA SHIELD TV streaming player [66].

Finally, it is worth mentioning a vendor whose GPUs are not so widespread but which, like the previous companies, offers high-quality products achieving similar performance and power consumption. Vivante develops the Vega GPU architecture included in the GC family of processors [23], which counts more than a hundred successful mass-market SoC designs. The Samsung Galaxy Ace Plus, the Huawei Ascend P6 and the Samsung Galaxy Tab 4 are examples of mobile devices powered by Vega [72].

1.2 Problem Statement

In this section we first illustrate the importance of energy-efficient designs to improve battery life per charge and performance in mobile devices. Then, in Subsections 1.2.1 and 1.2.2 we point out the main sources of energy drain in mobile SoCs for graphics workloads. Finally, in Subsections 1.2.3 and 1.2.4 we introduce the two main problems addressed in this thesis.

Software and hardware improvements like the ones described in the previous section are crucial to deliver a rich user experience that satisfies the user demands on the functionality of mobile devices. For example, in graphics animation applications the trend is towards supporting more realistic and complex 3D rendering. The direct consequence of supporting these higher computing demands is a significant increase in energy consumption. However, the battery capacity does not grow at the same pace as the computing demands, which produces an energy gap that widens with each generation [168] and puts pressure on the battery life of mobile devices. It is important to note that for the users of mobile devices the battery life per charge is the third most valued satisfaction factor [32], being critical in the overall satisfaction with the device [31]. Figure 1.4 shows the evolution of the battery capacity of the Samsung Galaxy S family of smartphones. As can be seen, the battery capacity has only doubled from 2010 to 2017, with an average increase of 11.7% per year. In particular, since 2014 (three generations ago), the battery capacity of the flagship device of Samsung has only grown slightly above 7%, remaining flat in the last two generations.


Figure 1.4: Battery Capacity of Samsung Galaxy S smartphone series.

Besides battery life per charge, there is another factor that limits energy consumption in current mobile devices. Current mobile devices like smartphones and tablets are extremely thin in order to reduce their weight, as well as to make them more attractive and easier to slip into pockets of all sizes. As a direct consequence, these devices cannot be equipped with active cooling mechanisms like fans, or with large conventional heat sinks, to dissipate heat and keep the temperature below the maximum limits of the internal components. Moreover, given the handheld nature of these devices, it is critical to keep a comfortable surface touch temperature, otherwise the device may become too hot to handle [195, 102]. In order to avoid overheating above thermal limits, mobile SoCs tend to be throttled when the load is high for a long period of time [178, 85].

1.2.1 Major Energy Consumers

Among all the components of the mobile device, the GPU and the CPU have been identified by previous works as the principal energy consumers for common applications [122]. In particular, for graphics applications the GPU has been identified as the principal energy consumer [174]. Further experimental data with the same SoC shows that the peak consumption of the GPU is 50% higher than the peak consumption of the CPU [30]. We conduct a brief experiment in order to further motivate our work.

Figure 1.5 shows the total power consumption of a smartphone (left axis) as well as the GPU and the CPU load (right axis) for two different applications: AnTuTu [36] and Netflix [81] for Android. AnTuTu, with more than 100 M users, is the most popular benchmarking application for mobile devices powered by Android. Netflix is a popular video streaming platform. Let us analyze in detail the results for the AnTuTu benchmark, shown in Figure 1.5a.

(a) Full run of AnTuTu [36] benchmark V6.2.7. (b) Netflix HD video playback using Cellular Data.

Figure 1.5: Total Power Consumption, GPU load and CPU load of a mobile device. Measurements made with Trepn Power Profiler [35, 33], with special features for Snapdragon SoCs. The phone employed in the two tests is a Samsung Galaxy J5, equipped with a 720x1280 (5") Super AMOLED display (294 ppi) and powered by a 28nm Qualcomm Snapdragon 410 MSM8916 SoC [34], which includes a 64-bit quad-core 1.2 GHz Cortex-A53 CPU and an Adreno 306 GPU. Both tests were done with screen brightness set to the minimum and WiFi disabled (cellular data enabled).

The test has been divided into four phases: A1: application startup; A2: GPU workloads; A3: CPU workloads; A4: application end. As can be seen, the peak power consumption (around 4.57 W) is reached when running the graphics workloads (A2 phase). Furthermore, the average power consumption in this phase is around 3.7 W. Note that the power consumption correlates well with the GPU load, which is around 96% in this phase, while the CPU load is around 18.9%. Regarding the CPU workloads (A3 phase), the average power consumption is around 0.78 W with an average CPU load of 40%. The average GPU load is smaller than 1% in this phase. Note that even when the CPU load is almost 100% the total power consumption is still below 2 W. These results suggest that the GPU is the principal energy consumer in this mobile device. However, the reader may argue that the figure not only shows the power of the GPU and the CPU, but also the power of all the other components of the smartphone, such as the screen and the modem, and that therefore they could be the principal power consumers in the device. In order to elucidate the power consumption of the other parts of the device, like the modem and the screen, we have run an additional test playing a streaming video with Netflix (see Figure 1.5b). The span of this figure is split into three different phases: N1: application startup; N2: video playback; N3: application end. When playing the video, the peak power consumption is right below 1 W, while the average is around 0.36 W. The GPU load is 0%, while the CPU load is around 26% on average, with a maximum of 56%. Note that, as in the previous test, when the GPU is idle (N2) the power consumption is much lower than when the GPU is busy (N3). The GPU load peaks that appear both in A4 and N3 are caused by the activity of the OS, which employs GPU acceleration for desktop transitions and animations. It is clear that the energy consumption of graphics workloads is much greater than the energy consumption of other workloads.

Let us analyze the battery life of the device when running the GPU and CPU phases shown above. The Samsung J5 is equipped with a 2600 mAh battery at 3.8 V of nominal voltage, thus its battery capacity is (2600 mAh x 3.8 V) / 1000 = 9.88 Wh, and the amount of energy available is 9.88 Wh x 3600 s/h = 35568 J. As the average power drain is 3.7 W, 0.78 W and 0.36 W for the A2 (GPU workloads), A3 (CPU workloads) and N2 (video playback) phases, the corresponding battery life is around 2 hours and 40 minutes, 12 hours and 40 minutes, and 27 hours and 26 minutes, respectively. The battery life of the device when running the graphics workloads is heavily reduced, to less than three hours, while for the non-graphics workloads the battery life exceeds ten hours.
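In general form, the battery life under a sustained average power draw is just the ratio of the two quantities above; a worked instance for the A2 phase:

```latex
\[
E_{\mathrm{bat}} = \frac{2600\,\mathrm{mAh}\times 3.8\,\mathrm{V}}{1000}
                 = 9.88\,\mathrm{Wh} = 35568\,\mathrm{J},\qquad
t_{\mathrm{life}} = \frac{E_{\mathrm{bat}}}{P_{\mathrm{avg}}}
                  = \frac{9.88\,\mathrm{Wh}}{3.7\,\mathrm{W}}
                  \approx 2.67\,\mathrm{h}\approx 2\,\mathrm{h}\,40\,\mathrm{min}.
\]
```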

Given the huge impact of graphics workloads and the GPU on the power consumption of mobile devices, the design of energy-efficient techniques for graphics workloads is crucial. Increasing the energy efficiency of the mobile GPU not only reduces energy consumption, but is also an effective way to increase performance. This is why mobile GPU designers focus on energy-efficient solutions that improve performance per watt rather than raw performance, which is key to making the increasing computing demands of the users possible.

1.2.2 Major GPU Energy Consumers

Previous studies pointed out that the principal energy consumers in the graphics subsystem are the off-chip main memory accesses and the processors inside the GPU, in particular the ones devoted to Fragment Processing [102, 105, 165, 177]. The results obtained with our simulation infrastructure are in line with those of the previous studies. Figure 1.6a shows the energy breakdown across the different structures of the GPU. The principal consumers are the accesses to main memory and the activity of the Fragment Processors (FP), with 53% and 42% of the total energy consumption, respectively. The FPs are the main energy consumers mainly because in common graphics workloads the number of fragments is usually orders of magnitude higher than the number of vertices. For example, for a rendering resolution of 1280 x 768, the number of instructions executed in the FPs is on average around 94% of the total instructions (numbers obtained with the simulation infrastructure and workloads described in the following chapter). Figure 1.6b shows the main memory bandwidth breakdown across the different stages of the graphics pipeline. As can be seen, around 47% of the bandwidth to main memory is generated by the FP stage, so a large percentage of the energy consumption of the main memory corresponds to the FP stage.

Given that in our experiments around 53% of the energy breakdown corresponds to main memory, and that around 47% of that traffic is caused by the FP stage, the FP stage consumes around 53% x 47% = 25% of the total energy performing accesses to main memory. This 25%, along with the 42% of the energy devoted to the FPs (see Figure 1.6a), means that the FP stage is responsible for around 67% of the total energy consumption of the main-memory/GPU system.
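Putting the two fractions together:

```latex
\[
E_{\mathrm{FP\;stage}} \approx \underbrace{0.42}_{\text{FP cores}}
 \;+\; \underbrace{0.53\times 0.47}_{\text{FP-induced DRAM traffic}}
 \;\approx\; 0.42 + 0.25 \;=\; 0.67 .
\]
```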

(a) Total GPU Energy Breakdown. (b) Main Memory BW Breakdown.

Figure 1.6: GPU energy breakdown and main memory BW breakdown of a mobile TBR GPU. Numbers obtained with the set of benchmarks introduced in Section 3.2.

1.2.3 Occlusion Culling

During real-time graphics rendering, objects are processed by the GPU in the order they are submitted by the CPU, and occluded surfaces are often processed even though they will end up not being part of the final image. Occlusion culling (a.k.a. visibility determination) is an essential process of the graphics rendering pipeline that decides which fragments of a scene are visible and which are not. The most widespread method to resolve visibility at pixel granularity is the Depth Test, which is typically placed at the end of the pipeline. The Depth Test compares each fragment's depth against the one already stored in the Depth Buffer to determine whether the fragment is in front of all previous fragments at the same pixel position. If not, the fragment is discarded. Otherwise, the Depth Buffer is updated with the new depth, and the color of the fragment is sent to the blending stage, which accordingly updates the Color Buffer (the buffer where the image is stored).
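A minimal software sketch of the per-fragment Depth Test just described (the types and buffer layout are our own illustrative assumptions, not any particular GPU's design; the depth buffer is assumed initialized to the far value):

```cpp
#include <vector>

struct Color { float r, g, b, a; };
struct Fragment { int x, y; float depth; Color color; };

// Depth Test for an opaque fragment: keep it only if it is closer than
// whatever is already stored at its pixel position.
void depthTest(const Fragment& f, std::vector<float>& depthBuf,
               std::vector<Color>& colorBuf, int width) {
    const int idx = f.y * width + f.x;
    if (f.depth < depthBuf[idx]) {   // fragment is in front: visible so far
        depthBuf[idx] = f.depth;     // update the Depth Buffer
        colorBuf[idx] = f.color;     // update the Color Buffer; repeated
                                     // writes to the same pixel are overdraw
    }                                // otherwise the fragment is discarded
}
```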

The main advantage of the Depth Test is that it ensures correct scene rendering regardless of the order in which the opaque geometry is submitted by the CPU. The main drawback is that the color of a given pixel may be written more times than necessary (a problem known as overdraw), which wastes a considerable amount of main memory bandwidth and energy [172]. Moreover, by the time the GPU realizes that an object or part of it is not going to be visible, all the activity required to compute its color has already been performed, with the consequent waste of time and energy. Hence, reducing the overshading produced by non-visible fragments can significantly increase the energy-efficiency of the GPU. As we will show, with some adaptations occlusion culling can be employed to reduce redundant fragment shading and therefore increase the energy-efficiency of the GPU.

1.2.4 Collision Detection

Collision Detection (CD) is one of the most important algorithms managing the dynamics of animations. It identifies the contact points between the objects of a scene and determines when they collide. During the past decade, mobile devices have quickly incorporated graphics and animation capabilities, and the trend going forward is towards more powerful support for real-time physics simulations with increasing precision and realism. However, although highly accurate physics algorithms increase the quality and the realism of animations, their implementations may be too expensive for mobile devices. Furthermore, highly accurate Collision Detection schemes are control-intensive algorithms [2], known to cause branch divergence in GPGPUs [139], which produces low utilization of the functional units and reduces the performance of the GPU [147].

1.3 State of the Art

In this section we review the techniques that other authors have proposed to deal with occlusion culling and collision detection, the principal problems addressed in this thesis.

1.3.1 Reduction of Redundant Fragment Shading

Olson [172] studies the effect of overshading in mobile platforms for a set of commercial mobile applications and identifies it as a significant source of wasted energy. Given that the FP stage is the most energy-consuming stage of the graphics rendering pipeline, multiple works aim at reducing the number of instructions executed in the Fragment Processors by reducing the number of fragments whose color must be computed to render a frame. Some works make use of visibility determination to achieve some sort of reordering, either at geometry or at fragment-level granularity. Other efforts introduce changes in the GPU microarchitecture in order to reduce the number of executions of the Fragment Processors. On the other hand, rather than reducing the number of fragments executed in the processors, other studies have focused on improving the energy-efficiency of the GPU in different aspects, like hiding the memory latency or reducing the main memory bandwidth of the GPU [109, 110].

Let us recall some concepts which will help to better understand this section. When we refer to the geometry of a scene, we mean vertices and the primitives (points, lines, and triangles) formed by those vertices. Such primitives are later discretized into fragments that correspond to pixel positions and whose color is computed in the processors of the GPU. The fragments are usually grouped into 2x2-pixel regions called quad fragments, which represent the unit of execution of the Fragment Processors. All the quad fragments of an object (draw command, drawcall) execute the same program/shader.

Geometry Level Sorting

Multiple works have analyzed how overshading can be reduced by culling the geometry at primitive-level granularity through the use of occlusion queries [118, 187]. When using occlusion queries, the application usually sends a query with a Bounding Volume of the object to the GPU, where it is rasterized and depth-tested for visibility. Eventually, the result of the query is sent to the driver, and if the Bounding Volume was occluded, the application will not send the object to the GPU. Unfortunately, using any kind of feedback from the GPU limits the achievable frame rate unless the scene complexity is above a large threshold [117], which is often not the case in mobile workloads. Moreover, as mobile devices evolve to higher resolutions, testing the occlusion of objects that are not fill-rate bound (i.e., with simple fragment programs and textures) may require many more pixels to fill, and the GPU will likely spend more time rendering the object's bounding volume than the object itself. Given that at some point the application must know the result of the queries, they may introduce CPU stalls and produce GPU starvation. Furthermore, GPU drivers let the GPU render several frames behind the CPU by actually queuing the rendering commands [130, 92, 90], which exacerbates these problems. Another important limitation is that the queries must be issued in a front-to-back order of the objects to perform well. Many software approaches require building costly spatial hierarchical data structures to render the scene from any single viewpoint. They are quite effective in walkthrough applications, where the entire scene is static and only the viewer moves through it, because the overheads can be amortized over a large number of frames [125].
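For reference, this is a hedged sketch of how an application typically issues such a query through OpenGL (ES 3.0 exposes GL_ANY_SAMPLES_PASSED); a valid GL context and headers are assumed, and drawBoundingVolume/drawObject are hypothetical helpers. The blocking result read at the end is precisely the round trip that causes the CPU stalls discussed above:

```cpp
GLuint query;
glGenQueries(1, &query);

// Rasterize the bounding volume without touching the framebuffer.
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glDepthMask(GL_FALSE);
glBeginQuery(GL_ANY_SAMPLES_PASSED, query);
drawBoundingVolume(object);                 // hypothetical helper
glEndQuery(GL_ANY_SAMPLES_PASSED);
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glDepthMask(GL_TRUE);

// Blocking read-back: forces CPU/GPU synchronization.
GLuint anySamplesPassed = 0;
glGetQueryObjectuiv(query, GL_QUERY_RESULT, &anySamplesPassed);
if (anySamplesPassed)
    drawObject(object);                     // hypothetical helper
glDeleteQueries(1, &query);
```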

Govindaraju et al. [141] sort the primitives of every object of a scene in a front-to-back order from a given viewpoint. However, they assume that the objects do not overlap, so the scheme only avoids overshading produced by geometry of the same object (intra-object overshading), and not between different objects (inter-object overshading). Unfortunately, this scheme is not able to handle cycles in the sorting process, which are highly common in dynamic 3D scenes.

Nehab et al. [171] and Sander et al. [185] also propose techniques to reduce intra-object overshading. They propose pre-processing schemes that sort the geometry of every object in a scene in a view-independent order that minimizes overshading. They focus on static meshes and produce a single order per mesh. On the other hand, Han et al. [146] propose a technique that is also focused on reducing intra-object overshading, but targets animations. They propose a pre-processing scheme that produces a number of view-dependent orders for every object of a scene from different points of view, all of them minimizing overshading. The application then selects at runtime the view-dependent order that minimizes overshading, depending on the orientation of the objects with respect to the camera.

Fragment Level Sorting

To reduce overshading, most graphics rendering pipelines perform an "early" visibility determination (Early-depth test) that tests the visibility of fragments before they are sent to the processors. Although the Early-depth test can only cull fragments which are hidden by those already tested, it may reduce fragment shading and bring important performance and power benefits.

With Z-prepass, Haines et al. [145] address overshading by performing two separate rendering passes on the GPU. The first pass renders the geometry without computing the color of the fragments, using just a null fragment shader in the processors, to determine the visibility of the scene. In the second pass, with the real shaders, the Early-depth test performs optimal culling, so overshading is minimized: just one opaque fragment per pixel is shaded and written into the Color Buffer (the buffer where the rendered image is stored). Unfortunately, this approach doubles several stages of the graphics pipeline, like vertex processing, rasterization and visibility determination, which may offset the benefits of the technique. It is only effective for workloads with enough complexity, where the overhead of the first rendering pass is compensated by large fragment computation savings, which is not usually the case in mobile applications.
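In OpenGL terms, Z-prepass can be expressed with standard render state alone; a minimal sketch, where drawOpaqueGeometry, nullShader and colorShader are hypothetical helpers and a GL context is assumed:

```cpp
glEnable(GL_DEPTH_TEST);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

// Pass 1: depth only. No color writes and a null fragment shader; this
// pass just builds the final Depth Buffer.
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glDepthMask(GL_TRUE);
glDepthFunc(GL_LESS);
drawOpaqueGeometry(nullShader);      // hypothetical helper

// Pass 2: full shading. The Depth Buffer is already final, so only the
// front-most fragment of each pixel passes the test and gets shaded.
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glDepthMask(GL_FALSE);               // depth is already correct
glDepthFunc(GL_LEQUAL);
drawOpaqueGeometry(colorShader);     // hypothetical helper
```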

Like Z-prepass, Deferred Rendering (DR) is a hardware technique that avoids overshading by computing the Depth Buffer before starting fragment shading. Imagination Technologies implements a Deferred Rendering approach in its PowerVR [91] family of GPUs. PowerVR implements a Tile Based Deferred Rendering (TBDR) architecture, a form of Tile Based Rendering (TBR) architecture. TBR pipelines divide the screen space into tiles of pixels. In a first phase, before rasterization, they perform all the geometry-related computations of the scene and sort the resulting geometry into screen tiles of fixed size (16x16, 32x32 and 64x64 are common tile sizes). When all the geometry has been sorted and stored in the Parameter Buffer, a second phase reads the geometry from it and renders the scene tile by tile in an independent manner. This allows the GPU to use small on-chip memories to contain the Depth and Color buffers for the entire tile, which dramatically reduces the accesses to main memory [106]. Note that the overshading still occurs, but the accesses to the Depth and Color buffers are performed on small local on-chip memories instead of main memory. Additionally, DR performs a hidden surface removal (HSR) phase. During the HSR phase, all the tile primitives are first rasterized only for position and depth, and the resultant fragments are Early-depth tested to set up the Depth Buffer. Once HSR is complete, the second pass processes the tile primitives as usual along the raster pipeline (they are read, rasterized and depth-tested again), except that this time the Early-depth test performs optimal occlusion culling. Although the details of this technique in commercial systems are not fully disclosed, we have modeled in our framework an efficient implementation of it at the microarchitecture level, which is described in Subsection 4.3.1. In contrast to Z-prepass, DR does not perform the geometry processing twice. However, DR still has a non-negligible cost: either it introduces a barrier in the graphics pipeline, because the Fragment Processing stage is not started until HSR has completely finished for a tile, or extra hardware is required to perform the HSR of a tile in parallel with the rendering of another tile. Further details are given in Section 4.3.1.
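A high-level sketch of the raster phase of such a TBDR pipeline, as we understand the scheme (the data structures and helpers here are purely illustrative stand-ins for fixed-function hardware, and the real hardware pipelines the two passes rather than running them sequentially in software):

```cpp
#include <cfloat>
#include <vector>

// Tile, Primitive, Fragment, TileBuffers, Framebuffer, rasterize*,
// and shade are hypothetical types/helpers used only for illustration.
void rasterPhase(const std::vector<Tile>& parameterBuffer, Framebuffer& fb) {
    for (const Tile& tile : parameterBuffer) {
        TileBuffers onChip;                 // per-tile Depth + Color, on-chip
        onChip.clearDepth(FLT_MAX);
        // HSR pass: position-only rasterization builds the final tile depth.
        for (const Primitive& p : tile.primitives)
            for (const Fragment& f : rasterizeDepthOnly(p, tile))
                if (f.depth < onChip.depth(f.x, f.y))
                    onChip.setDepth(f.x, f.y, f.depth);
        // Shading pass: the Early-depth test is now exact, so only the
        // visible fragment of each pixel reaches the Fragment Processors.
        for (const Primitive& p : tile.primitives)
            for (const Fragment& f : rasterize(p, tile))
                if (f.depth == onChip.depth(f.x, f.y))
                    onChip.setColor(f.x, f.y, shade(f));
        fb.writeTile(tile, onChip);         // one burst to main memory
    }
}
```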

Deering et al. [132] proposed a Deferred Shading architecture where the screen space is divided into bins indexed by scan line. First, all the geometry of a given frame is processed, sorted into bins, and stored in an intermediate buffer (Y-buffer). Then, when all the triangles of a given frame have been processed, the following stages of the graphics pipeline are performed in two pipelines in sequence: the Triangle Processor Pipe (which rasterizes triangles creating fragments) and the Normal Vector Shader pipe (which computes the color of the fragments). The authors included multiple Triangle Processor and Normal Vector Shader pipes, providing a highly parallel architecture that was able to overcome the main memory access bottleneck. Later, Saito et al. [184] proposed a Deferred Shading scheme that avoids computing the color of hidden surfaces. This technique proposes a two-pass scheme. In the first pass, the per-fragment data (depth, normals, texture coordinates, ...) are computed and stored in intermediate buffers referred to as G-buffers. Then, the second rendering pass performs the fragment shading to compute the color of the visible fragments.

Ragan-Kelley et al. [179] propose a technique that decouples shading from visibility determination and allows shading at a lower rate while still enabling super-sampling effects. Rather than performing two rendering passes and determining the visibility in the first one, the authors propose a single-pass technique that employs a memoization scheme, which caches shading results. If a fragment passes the visibility test, instead of computing a shading sample (color), the memoization buffer is checked for previously cached values at the same fragment position. If the memoization buffer contains the value, it is reused, avoiding the shading; otherwise, the shading sample is computed. Based on the work of Ragan-Kelley et al. [179], Liktor et al. [164] introduce Compact G-buffers (CG-buffers), which reduce the memory footprint of the G-buffer by reusing its data. Instead of storing data for every visible fragment that will later be processed in the Fragment Shading stage, the CG-buffer includes a visibility reference buffer, which holds references (one per visible fragment) that point to entries with shader inputs. Unlike the G-buffer, the CG-buffer avoids storing redundant entries that correspond to the same shader output. Hence, this approach eliminates redundancy before fragment shading by reusing G-buffer results, rather than reusing fragment shading results like the previous approach.
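The following toy C++ sketch conveys the flavor of such a memoization scheme; it is not the authors' actual design, and the ShaderInputs type and signature function are illustrative assumptions:

#include <cstdint>
#include <functional>
#include <unordered_map>

struct ShaderInputs { float u, v; uint32_t materialId; };  // illustrative shader inputs
using Color = uint32_t;                                    // packed RGBA8

Color shadeFragment(const ShaderInputs& in);  // the expensive shading computation (assumed)

// Simplistic signature of the shader inputs (illustrative only).
uint64_t signature(const ShaderInputs& in) {
    uint64_t h = std::hash<float>{}(in.u);
    h = h * 31 + std::hash<float>{}(in.v);
    h = h * 31 + in.materialId;
    return h;
}

class MemoizationBuffer {
    std::unordered_map<uint64_t, Color> cache_;
public:
    Color lookupOrShade(const ShaderInputs& in) {
        const uint64_t sig = signature(in);
        auto it = cache_.find(sig);
        if (it != cache_.end()) return it->second;  // hit: reuse the cached shading result
        const Color c = shadeFragment(in);          // miss: shade and cache
        cache_.emplace(sig, c);
        return c;
    }
};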

Clarberg et al. [124] propose a Deferred Rendering method that is executed in two phases. The first one performs vertex processing (position only), rasterizes the geometry of the scene and determines the visibility. The second phase sorts the geometry into tiles and then, for each tile, repeats the vertex processing but only for the visible primitives, which allows vertex processing of attributes, as well as attribute interpolation and fragment processing, to be performed just for visible fragments. The authors only report bandwidth and fragment-execution numbers, but no energy figures.

Other Related Work

Arnau et al. [110] propose Parallel Frame Rendering (PFR), which exploits frame-to-frame coherence and temporal locality to improve the energy-efficiency of the GPU. The authors take advantage of the similarity between the texture data sets of consecutive frames, and propose a scheme that splits the GPU resources to process multiple consecutive frames in parallel (synchronizing the memory accesses of such frames), which increases the hit rate of the texture caches. Hence, the main memory bandwidth related to textures is reduced, providing significant bandwidth and energy savings, as well as significant speed-ups. In a later work, Arnau et al. [112] extend PFR in order to reduce overshading. As with PFR, they render several frames in parallel, but they additionally introduce a memoization scheme that caches results of the Fragment Processors to avoid redundant executions. Once a fragment passes the visibility test, the memoization scheme produces a signature of the inputs of the Fragment Processor for that fragment and checks if it is present in the memoization buffer. If the signature is already present in the buffer, the scheme reuses a previous fragment shading result, which reduces overshading. If the signature is not found, the color of the fragment is computed in the Fragment Processors, and the result along with the corresponding signature is cached in the memoization buffer.

The unit of execution of the Fragment Processing stage is usually the quad fragment, a group of 2x2 fragments located at a 2x2 pixel region of the viewport (screen). For a given primitive, it may happen that not all the fragments in a 2x2 group are covered by the primitive. Hence, when the processor performs the fragment shading for partially covered quad fragments, its resources are underutilized. This problem is exacerbated in the presence of micro-triangles. Fatahalian et al. [137] propose a scheme to reduce the overshading caused by micro-triangles by gathering and merging partially covered quad fragments, generated from adjacent tessellated primitives, into a single quad fragment before doing fragment shading.
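The following toy C++ snippet illustrates both the underutilization problem and the spirit of such merging; the Quad type, the mask layout and the helpers are illustrative assumptions, not the authors' design (attribute handling and adjacency checks are omitted):

#include <bitset>

// A quad fragment: bits 0..3 of coverageMask tell which of the 2x2 fragments are covered.
struct Quad { unsigned coverageMask; };

// Fraction of the quad's shading work that is useful: a quad covering a single
// fragment still occupies all four execution lanes, so only 25% is useful.
float usefulWorkFraction(Quad q) {
    return std::bitset<4>(q.coverageMask).count() / 4.0f;
}

// Two partially covered quads at the same 2x2 pixel location (from adjacent
// tessellated primitives) can be merged into one quad when their coverage
// does not overlap, so a single quad is shaded instead of two.
bool canMerge(Quad a, Quad b) { return (a.coverageMask & b.coverageMask) == 0; }
Quad merge(Quad a, Quad b)    { return { a.coverageMask | b.coverageMask }; }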

Sathe et al. [186] propose a scheme that aims at reducing overshading for micro-, medium- and large-sized triangles when using Multi-Sample Anti-Aliasing (MSAA). MSAA is a technique that, for a given fragment, uses four sampling points for color and depth, while fragment shading is still performed just once per fragment. MSAA improves the appearance of the frame in high-frequency zones (i.e., object boundaries). In contrast, the quality of the frame in low-frequency zones hardly improves with MSAA. The authors exploit this fact and propose a hardware unit that, for a group of connected primitives, detects when a primitive does not cover the center of a given pixel (but covers at least one of its four sampling points) and skips the shading for that triangle. Instead, the scheme reuses the shading result of the connected primitive that does cover the center of the pixel.

1.3.2 Collision Detection

There are many proposals to detect object interference through geometric computations, for example by computing intersections between pairs of bounding volumes (spheres, capsules, boxes, k-DOPs, ...), or by computing the distance between points in space. CD algorithms are intrinsically quadratic with respect to the number of objects and their surfaces. To alleviate this cost, CD is often split into two phases:

• Broad Phase: Fast, simple and often coarse-grained tests applied to pairs of collisionable objects in a scene. In this phase it is common to employ some kind of spatial partitioning in order to reduce the number of pairs of objects to be tested. The pairs of objects whose bounding volumes collide are added to the Potentially Colliding Set (PCS), whose elements are tested in the narrow phase (see the sketch after this list).

• Narrow Phase: Slower and more complex tests that employ more accurate shapes than the broad phase to test for collision between the pairs of objects that were suspected to collide in the previous phase.
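The sketch below (C++) illustrates the broad phase in its simplest form, assuming AABBs as bounding volumes and no spatial partitioning, so all pairs are tested; the overlapping pairs form the PCS consumed by the narrow phase:

#include <utility>
#include <vector>

struct AABB { float min[3], max[3]; };

bool overlaps(const AABB& a, const AABB& b) {
    for (int axis = 0; axis < 3; ++axis)
        if (a.max[axis] < b.min[axis] || b.max[axis] < a.min[axis])
            return false;  // separated along this axis: no collision possible
    return true;
}

// Returns the Potentially Colliding Set as pairs of object indices.
std::vector<std::pair<int, int>> broadPhase(const std::vector<AABB>& boxes) {
    std::vector<std::pair<int, int>> pcs;
    for (size_t i = 0; i < boxes.size(); ++i)
        for (size_t j = i + 1; j < boxes.size(); ++j)
            if (overlaps(boxes[i], boxes[j]))
                pcs.emplace_back((int)i, (int)j);  // candidates for the narrow phase
    return pcs;
}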

There exists a large body of research on CD [154, 135, 158, 196]. Both broad and narrow phases can be executed on a CPU or a GPGPU, depending on the characteristics of the specific platform. Broad-phase algorithms are simple to parallelize, whereas narrow-phase algorithms are usually, for a given pair of objects, control-intensive. In most cases, the narrow phase of CD is executed on the CPU because of the non-regular nature of the computations, and in low-power systems the broad phase is executed on the CPU as well.

There are versions of some CD algorithms written for GPGPU. Even though the innermost loops of the narrow-phase algorithms are difficult to parallelize because of their control-intensive nature, these algorithms can exploit parallelism by evaluating the narrow phase for multiple colliding elements in parallel. However, although they can perform the computation fast, GPGPU schemes must still bring the geometry of the colliding elements from CPU memory to the GPU. The cost of transferring this geometry from memory is not negligible and in some cases may represent more time than the computation itself. Lee et al. [161] take these costs into account and report a 14x speedup of a very accurate CD algorithm (GJK [140]) on a GPGPU compared to the CPU version.

Many proposals apply a hierarchical spatial partitioning of the objects in a scene. Ar et al. [107] use BSP-trees, which subdivide the space into two sub-spaces that satisfy a given requirement. In early computer graphics systems, BSP-trees were used along with the Painter's algorithm to render scenes in the absence of the Depth buffer algorithm. Fan et al. [136] employ octrees, trees where every node has exactly eight child nodes, each representing one eighth of the space represented by the parent node. Wang and Liu [198] employ AABB-trees, trees composed of AABBs (Axis-Aligned Bounding Boxes) where the root is an AABB that contains all the geometry of the scene and every child of a node is an AABB bounded by the AABB of its parent. Lawlor et al. [160] propose to subdivide the scene into an axis-aligned grid of voxels, where every voxel contains a list of the objects that overlap it. Then, any other CD scheme is used to detect the collision points for every voxel's list.

He et al. [149] propose a scheme that creates Bounding Volume Hierarchies (BVHs) based on dynamic clustering of the objects in a scene. First, the objects are decomposed into non-self-colliding leaf-clusters. Then, an intermediate step merges such clusters and creates intermediate-clusters that maintain the non-self-colliding property. The merge step creates a binary tree where the last-level nodes are leaf-clusters and the other nodes are intermediate-clusters. For all the intermediate-clusters, the scheme computes appropriate bounding volumes. A BVH is created starting from the parents of the leaf-clusters up to the root clusters of the binary tree. Then, the scheme performs overlap tests between the nodes of the tree. If a given overlap test reaches the leaf level of the binary tree, a BVH is computed for every leaf-cluster involved in the test. The process of CD is then resolved by testing the collision of the corresponding BVHs of the leaf-clusters involved in the possible collision. When testing the BVHs of two leaf-clusters, if the last level of the hierarchy is reached, the scheme uses triangle tests to check the collisions.

Du et al. [134] present a parallel Continuous CD (CCD) algorithm that distributes the work across a high-performance shared-memory system that includes both GPUs and CPUs. For every object in a scene, the scheme creates a Sphere Bounding Volume (SBV) and computes the trajectories of the SBVs between two time steps. Such trajectories are projected onto a Cartesian coordinate system. Then, a classical Sweep and Prune [29] algorithm is performed in the GPU in order to discard non-overlapping trajectories and create the PCS. For a given pair of objects in the PCS, the scheme creates a BVH for each object. Finally, the nodes of the BVHs are tested for collision. To detect the collisions at primitive-level granularity, the scheme uses an interval-iteration method.

Image-Based CD

There is a group of CD algorithms known as Image-Based CD (IBCD). IBCD consists of rasterizing the surfaces of the scene objects and detecting their intersections based on the pixel depths of the corresponding fragments [115]. These kinds of techniques have been proposed to exploit the computing power of graphics processors and their ability to rasterize polygons efficiently. Shinya et al. [189] and Rossignac et al. [183] opened the path to IBCD with their pioneering work.
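The essence of these techniques can be conveyed with a simple per-pixel depth-interval test. The sketch below is a conceptual simplification rather than a specific published algorithm: each object is assumed to have already been rasterized into a per-pixel depth interval (e.g., from its front- and back-facing surfaces), and two objects are reported to interfere if their intervals overlap at some pixel:

#include <vector>

struct DepthInterval { float zMin, zMax; bool covered; };  // one entry per pixel

// objA and objB hold one interval per pixel of the same viewport.
bool detectInterference(const std::vector<DepthInterval>& objA,
                        const std::vector<DepthInterval>& objB) {
    for (size_t p = 0; p < objA.size(); ++p) {
        const DepthInterval& a = objA[p];
        const DepthInterval& b = objB[p];
        // Both objects cover this pixel and their depth ranges overlap:
        // their volumes interpenetrate along this line of sight.
        if (a.covered && b.covered && a.zMin <= b.zMax && b.zMin <= a.zMax)
            return true;
    }
    return false;
}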

Myszkowsky et al. [170] propose a technique based on the Depth buffer and on checking the changes in the Stencil buffer between one frame and the next, and suggest including hardware-assisted polygon sorting in order to dramatically increase the performance of the technique.

Baciu and Wong [113] propose a detection scheme based on reducing the region of the Stencil buffer to be tested, which reduces the main memory traffic and thus minimizes the need for the specific hardware pointed out in the previous work [170].

Baciu and Wong [114] and Heidelberger et al. [151] show hybrid techniques using both object space and image-based CD. They sometimes use more than one rendering pass, since they rely on a rasterization system with only one depth value per pixel in the Depth buffer at any given time. The latter work performs the collision detection on the CPU. They expand this work to handle auto-collisions with deformable geometry [152]; however, both works rely on reading back to the CPU the results of the rasterization (Depth and Stencil buffers).

Knott and Pai [157] make use of the Color, Depth and Stencil buffers and several rendering passes in order to detect collisions among all the objects to be tested. CULLIDE [142], proposed by Govindaraju et al., can reduce the aforementioned buffer read-back overhead by making use of occlusion queries. In a later work [143], the authors improve the accuracy of the method by computing conservative overlap tests between the primitives, avoiding missed collisions due to viewport resolution. Faure et al. [138] propose to detect collisions between two 3D objects using surface rasterization in three orthogonal directions to discard objects that do not overlap in one of the axes. Then, they perform the narrow phase of the CD in the CPU.

Chen et al. [121] also employ a classic broad- and narrow-phase scheme. This scheme executes the broad phase in the CPU, which uses AABBs to rapidly discard non-overlapping objects. Additionally, in the broad phase they compute the region where the two AABBs overlap, commonly known as the Region of Interest (ROI). Then, the volume of each object of the resulting pair that is inside the ROI is voxelized in real time using the GPU. The voxels of each object are stored in an independent 2D texture that describes the volume of the object. Finally, the collision test is performed in the GPU by pair-wise comparing every voxel of the 2D textures. If there is at least one common voxel, the pair of objects is reported to collide.

1.4 Thesis Overview and Contributions

The goal of this thesis is to propose novel and effective techniques to eliminate redundant computations performed in real-time computer graphics applications, with special focus on mobile GPU micro-architecture. Improving the energy-efficiency of CPU/GPU systems, which are extensively deployed in mobile devices, is key to enlarging their battery life. Our main contributions are Visibility Rendering Order and Render-Based Collision Detection. The former is a technique that reuses visibility information of previous frames to eliminate redundant computations of hidden surfaces in the Fragment Processors of the GPU for the current frame being rendered. The latter is a technique that avoids redundant computations related to CPU Collision Detection by reusing intermediate rendering results to perform accurate CD in the GPU. These works are implemented on top of a conventional mobile GPU and are suitable for both Tile Based and Immediate-Mode GPU architectures. Furthermore, we also propose a micro-architectural implementation of a Tile Based Deferred Rendering GPU, and we make a performance and energy evaluation of the Z-prepass software technique. Below we present the problems we deal with and the approaches we take to solve them, as well as a comparison with related work.

1.4.1 Visibility Rendering Order

Figure 1.7 introduces a simplified conventional graphics pipeline. The GPU receives vertices and processes them in the Geometry Pipeline, which generates triangles. These are then discretized by the Rasterizer, which generates fragments that correspond to pixel screen positions. Then, fragments are sent to the Fragment Processing stage, which performs the required texturing, lighting and other computations to determine their final color. Finally, the Depth test compares each fragment's depth against that already stored in the Depth Buffer to determine if the fragment is in front of all previous fragments at the same pixel position.


Figure 1.7: Simplified version of the Graphics Pipeline.

When the GPU realizes that an object or part of it is not going to be visible, all activity required to compute its color has already been performed, with the consequent waste of time and energy, especially in the Fragment Processing stage, which is the most power-consuming stage of the graphics pipeline [176]. To help discard occluded surfaces earlier in the pipeline, most current GPUs include an Early-depth test before the Fragment Processing stage.

First, we evaluate the amount of overshading in mobile graphics workloads and the effectiveness of the Early-depth test at reducing it. The Early-depth test removes an important part of the overshading, but there is significant room for improvement because, to be effective in avoiding the rendering of hidden surfaces, it requires opaque geometry to be processed in a front-to-back order. We also evaluate a Deferred Rendering scheme, which obtains perfect occlusion culling at fragment level. However, it includes an extra render pass to determine visibility, which has a non-negligible cost.

Second, we propose a novel architectural technique for mobile GPUs, Visibility Rendering Order (VRO), which reorders objects front-to-back entirely in hardware to maximize the culling effectiveness of the Early-depth test and minimize overshading, hence reducing execution time and energy consumption. VRO exploits the fact that objects in animated graphics applications tend to keep their relative depth order across consecutive frames (temporal coherence) to provide the feeling of smooth transition. VRO keeps the visibility information of a frame and uses it to reorder the objects of the following frame. Since depth-order relationships among objects are already tested by the Depth test, VRO incurs minimal energy overheads: it just requires a small hardware unit to capture the visibility information and use it later to guide the rendering of the following frame. Moreover, VRO works in parallel with the graphics pipeline, so negligible performance overheads are incurred. We illustrate the benefits of VRO using various unmodified commercial 3D applications, for which VRO achieves 27% speed-up and 14.8% energy reduction on average over a state-of-the-art mobile GPU. This work is included in a paper that is currently under review:

• “Visibility Rendering Order: Improving Energy Efficiency on Mobile GPUs through Frame Coherence”. Enrique de Lucas, Pedro Marcuello, Joan-Manuel Parcerisa, and Antonio González. To be published.

Multiple works have analyzed how overshading can be reduced by culling the geometry at primitive-level granularity through the use of occlusion queries [118, 187]. When using occlusion queries, the application usually sends to the GPU a query with a Bounding Volume of the object, to be rasterized and depth-tested. Eventually, the result of the query is sent to the driver and, if the Bounding Volume was occluded, the application will not send the object to the GPU. Given that at some point the application must know the result of the queries, they may introduce CPU stalls and produce GPU starvation. Furthermore, some drivers let the GPU render several frames behind the CPU by queuing the rendering commands [130, 92, 90], which exacerbates these problems. Like the Early-depth test, occlusion queries require sorting the queries (and the objects) front-to-back to perform well. VRO does not suffer from these limitations. On the one hand, VRO is fully integrated into the GPU, so the application does not need to receive any feedback from VRO. On the other hand, VRO reuses the results produced by the Depth test of the actual rendering commands of the application, instead of introducing extra work (occlusion queries) to reduce overshading.
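For reference, the typical occlusion-query pattern looks roughly as follows in OpenGL (drawBoundingVolume() and drawObject() are hypothetical helpers, and a GL 3.3+ context with a function loader is assumed); the blocking glGetQueryObjectuiv call marks the point where the CPU stall discussed above can occur:

#include <GL/gl.h>  // post-1.1 entry points assumed to be provided by a loader

extern void drawBoundingVolume();  // hypothetical: rasterizes the object's bounding volume
extern void drawObject();          // hypothetical: issues the object's real draw calls

void drawIfVisible(GLuint query) {  // query previously created with glGenQueries
    // Rasterize the bounding volume under the query, with color/depth writes off.
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_FALSE);
    glBeginQuery(GL_ANY_SAMPLES_PASSED, query);
    drawBoundingVolume();
    glEndQuery(GL_ANY_SAMPLES_PASSED);
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_TRUE);

    // Fetching the result may stall until the GPU reaches the query.
    GLuint anySamplesPassed = 0;
    glGetQueryObjectuiv(query, GL_QUERY_RESULT, &anySamplesPassed);
    if (anySamplesPassed)
        drawObject();  // the bounding volume was at least partially visible
}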

Govindaraju et al. [141] sort the primitives of every object of a scene in a front-to-back order from a given viewpoint. They assume that the objects do not overlap, so the scheme only avoids intra-object overshading. Furthermore, they are not able to handle cycles in the sorting process, whereas VRO is able to produce a Visibility Rendering Order in the presence of cycles, which are highly common in 3D scenes. There are other approaches focused on reducing intra-object overshading. Nehab et al. [171] and Sander et al. [185] propose a pre-processing scheme that sorts the triangles in a view-independent order to reduce overdraw. However, they focus on static meshes and produce a single order per mesh. On the other hand, Han et al. [146] target animations and produce different view-dependent orders per object, which are used by the application depending on the orientation of the objects with respect to the camera. These techniques, focused on reducing intra-object overshading, are complementary to VRO, which reduces inter-object overshading.

Rendering the objects in a front-to-back order effectively reduces overshading, but unfortunately this is not the general case in commercial applications. Our proposal creates a view-dependent front-to-back order that effectively reduces overshading in a manner transparent to the programmer. VRO requires neither extra Vertex Processing nor extra Rasterization or Early-depth executions. Furthermore, VRO can handle both static and animated scenes and is able to create a rendering order of a scene even in the presence of cycles between different objects.

1.4.2 Render Based Collision Detection

Graphics animation applications such as 3D games represent a large percentage of downloaded applications for mobile devices, and the trend is towards more complex and realistic scenes with accurate 3D physics simulations, like those in laptops and desktops. Collision detection (CD) is one of the main algorithms used in any physics kernel: it identifies the contact points between the objects of a scene and determines when they collide. However, real-time, highly accurate CD is very expensive in terms of energy consumption, and this parameter is of paramount importance for mobile devices since it has a direct effect on the autonomy of the system. This work proposes a novel energy-efficient, high-fidelity CD scheme that leverages some intermediate results of the rendering pipeline to perform CD.

First, we give a brief introduction to CD, in particular to a group of algorithms known as Image-Based Collision Detection (IBCD). These algorithms rely on the rasterization of the surfaces of the scene objects and the detection of their intersections based on the pixel depths of the rasterized fragments [115]. These kinds of techniques have been proposed to exploit the computing power of graphics processors and their ability to rasterize polygons efficiently.

Second, we introduce our proposal: Render Based Collision Detection (RBCD), which belongs to this family of techniques. RBCD is based on the observation that most of the tasks required for IBCD (e.g., vertex processing, projection, rasterization, etc.) are also performed during image rendering. Hence, we propose to integrate CD and image rendering within the GPU pipeline hardware, so that redundant tasks are done just once. With minor hardware extensions and minor software adaptations, our technique reuses some intermediate results of the rendering pipeline to perform the CD task. These adaptations include allowing the software to pass collisionable-object identifiers to the GPU, selectively deferring face culling, and adding small, specific hardware to detect face intersections based on per-fragment location and depth.

Third, we show the benefits of RBCD in a CPU/GPU system. Comparing RBCD with a conventional CD completely executed in the CPU, we show that its execution time is reduced by almost three orders of magnitude (600x speedup), because most of the CD task in our model comes for free by reusing the image rendering intermediate results. Such a dramatic time improvement may (although not necessarily) result in better frames per second, if physics simulation is in the critical path. However, the most important advantage of our technique is the enormous energy savings that result from eliminating a long and costly CPU computation and converting it into a few simple operations executed by specialized hardware within the GPU. Our results show that the energy consumed by CD is reduced on average by a factor of 448x (i.e., by 99.8%). These dramatic benefits are accompanied by a higher-fidelity CD analysis (i.e., with finer granularity), which improves the quality and realism of the application. This work has been published in the proceedings of the 48th International Symposium on Microarchitecture (MICRO):

• “Ultra-low Power Render-Based Collision Detection for CPU/GPU Systems”. Enrique de Lucas, Pedro Marcuello, Joan-Manuel Parcerisa, and Antonio González. International Symposium on Microarchitecture, 2015.

Myszkowsky et al. [170] propose a technique based on the Depth buffer and on checking the changes in the Stencil buffer between one frame and the next, and suggest including hardware-assisted polygon sorting to dramatically increase the performance of the technique. Baciu and Wong [113] propose a detection scheme based on reducing the region of the Stencil buffer to be tested, which reduces the main memory traffic and thus minimizes the need for the specific hardware pointed out in the previous work [170]. Baciu and Wong [114] and Heidelberger et al. [151] show hybrid techniques using both object-space and image-based CD. They sometimes use more than one rendering pass, since they rely on a rasterization system with only one depth value per pixel in the Depth buffer at any given time. The latter work performs the collision detection on the CPU. This work is expanded [152] to handle auto-collisions with deformable geometry; however, both works rely on reading back to the CPU the results of the rasterization (Depth/Stencil buffers). Knott and Pai [157] make use of the Depth/Stencil and Color buffers plus several rendering passes in order to detect collisions among all the objects to be tested. CULLIDE [142], proposed by Govindaraju et al., can reduce the aforementioned buffer read-back overhead by making use of occlusion queries. In a later work [143], the authors improve the accuracy of the method by computing conservative overlap tests between the primitives, avoiding missed collisions due to viewport resolution. Faure et al. [138] propose to detect collisions between two 3D objects using surface rasterization in three orthogonal directions, performing the CD in the CPU. Unlike these previous works, RBCD requires neither multiple rendering passes nor reading back the results of the rasterization. Furthermore, our scheme is projection independent. RBCD performs the CD of the scene in the RBCD unit added to the GPU and then reports the detected contact points between collisionable objects to the CPU.

Gilbert et al. [140] propose Gilbert-Johnson-Keerthi (GJK), an algorithm that computes the minimum distance between two convex sets of vertices. Lee et al. [161] report a 14x speedup of GJK on a GPGPU, whereas we report a speedup of 3400x with our technique. Although GPGPU schemes can perform the computation fast, they must still bring the geometry of the colliding elements from main memory to the GPU, whereas our proposal takes advantage of the information that has already been generated in the GPU to render the image. The cost of transferring this geometry from memory is not negligible and in some cases may represent more time than the computational cost itself. This transfer time, as well as the corresponding energy, is saved by our scheme. Additionally, our scheme reuses most of the computations from the rendering performed in the GPU, which provides further energy savings.

1.4.3 Other contributions

Micro-architectural evaluation of Deferred Rendering

Deferred Rendering (DR) is a hardware technique that avoids overshading by computing the Depth Buffer before starting fragment shading. We perform a micro-architectural evaluation of two different approaches to Deferred Rendering: sequential DR and parallel DR. Sequential DR is a naïve implementation that stalls the rest of the Raster Pipeline while performing HSR. The sequential implementation badly hurts both performance and energy compared with the baseline GPU: the execution time increases for every one of the benchmarks tested, with a slowdown of 23% on average, and the energy consumption increases around 6% on average. These huge overheads are due to the fact that the total time of the HSR stage (depth-only rasterization plus depth test) greatly exceeds the savings provided by the overshading reduction. Nevertheless, these overheads can be removed by performing the HSR stage in parallel with the other stages of the Raster Pipeline. Thus, in this optimized scheme (parallel DR), while the HSR is being executed for tile i+1, the rest of the Raster Pipeline is executed in parallel to render tile i. Even though this parallel implementation of DR introduces a non-negligible amount of extra hardware (6% area overhead w.r.t. the baseline GPU), it outperforms sequential DR in both performance and energy.

Evaluation of software Z pre-pass

Z pre-pass is a common software approach aimed at reducing overshading that has gained traction in recent years, and it can be considered the standard on IMR and TBR GPUs to reduce overshading in complex games with multiple dynamic objects and costly Fragment Shaders. Some vendors like ARM recommend using it when setting the rendering order of the objects is not possible because of the complexity of the scene [4]. Z pre-pass exploits the Early-depth test by means of two rendering passes. The first pass executes the pipeline stages up to the Early-depth test, which stores the depths of the visible fragments in the Depth Buffer. In the second pass, the full pipeline is executed and only the visible fragments pass the Early-depth test, so only visible fragments are executed in the Fragment Processors. This technique doubles the cost of some stages of the pipeline, which is unacceptable in many scenarios. Note that even when the benefits of Z pre-pass exceed its cost, the extra depth pass represents a large portion of the total rendering time. For example, Z pre-pass can be found in The Blacksmith [6], a cutting-edge real-time demo of the latest version of Unity [21] made to show the most advanced graphics features that the game engine offers. We have studied three different frames of The Blacksmith, where Z pre-pass represents 41.4%, 29.3% and 26.1% of the total rendering time respectively.1 In the common mobile graphics workloads included in our set of benchmarks, Z pre-pass produces a slowdown in all the cases (0.82x on average). Z pre-pass also increases the total energy consumption, 1.28x on average: the increase in main memory traffic and the extra execution time greatly penalize the energy consumption. According to these results, Z pre-pass is not a suitable technique to reduce either execution time or energy consumption in applications targeted at low-power GPUs, because the overhead incurred by the extra rendering pass more than offsets the time and energy savings of the smaller fragment processing. Nevertheless, it makes sense to apply Z pre-pass in a context with higher computational cost per fragment processed (more details in Section A.3.1).

1 Real hardware measurements reported by Renderdoc [20] using an NVIDIA GTX 950 GPU.

Immediate Mode Rendering VRO

We have evaluated VRO over an IMR GPU architecture. Like VRO, IMR-VRO includes a small hardware unit that stores in a buffer the order relations among the objects of the scene in the current frame. This information is used in the next frame to create a Visibility Rendering Order that this time guides the Command Processor of the GPU. IMR-VRO outperforms state-of-the-art techniques in performance and energy consumption by reducing overshading without the need for an expensive extra rendering pass to determine visibility. IMR-VRO achieves up to 1.32x speed-up (1.16x on average) and down to 0.79x energy consumption (0.85x on average) with respect to the baseline IMR GPU. IMR-VRO is much more efficient than Z-prepass because most of the computations required to create the Visibility Rendering Order are reused from the normal rendering, while Z-prepass requires an extra rendering pass that introduces significant overheads.


2 Background

2.1 Graphics Rendering Pipeline

This chapter presents an overview of different concepts related to graphics rendering. First, we present the classic scheme of the graphics rendering pipeline, including its different stages. Then, we describe the microarchitecture of mobile GPUs. The purpose of this chapter is not only to establish the technical context of this thesis, but also to introduce the terminology we will use throughout the document. Rather than making an exhaustive review of computer graphics, the intention of this chapter is to briefly introduce the graphics rendering pipeline. For more detailed information, the reader has available excellent books that cover multiple aspects of computer graphics [104, 166].

The graphics rendering pipeline, also commonly referred to as the graphics pipeline or just the rendering pipeline, is an abstract model whose duty is to render 2D images (frames) from descriptions of geometric models and other information like the point of view, light sources and more. It is usually divided into three conceptual stages (see Figure 2.1): Application, Geometry and Rasterization.

2.1.1 Application Stage

The application stage is composed of the software operations required to prepare the geometric models and all the associated information necessary to render them in the following stages of the pipeline. This stage receives user inputs, computes the corresponding reactions, updates the state of the application accordingly, and finally sends the geometry to the next stage of the pipeline. This group of tasks may be referred to as a render step. Since the application stage is a software stage, developers have the freedom to implement whichever render step they consider appropriate. However, software APIs like game engines or GUI frameworks usually offer their own implementation of the render step [181, 127], which acts as an intermediary between the application and the graphics API.

Figure 2.1: Conceptual stages of the Graphics Rendering Pipeline.

A process commonly included in the application stage is collision detection, which is part of the physics simulation. It provides realistic physical behavior to the visual objects of the application, whose velocity and direction must be affected by collisions with other objects.

2.1.2 Geometry Stage

The geometry stage performs all vertex-level (vertex transformations, vertex shading) and almost all polygon-level (primitive assembly, clipping, face culling) operations of the graphics pipeline.

Vertex Transformations

Figure 2.2 shows the different vertex transformations applied in the graphics pipeline, which consist of converting vertex coordinates from one coordinate system to another by multiplying the vertex positions by a transformation matrix. Besides the transformation between coordinate systems, operations like translation, rotation and scaling may also be applied to the vertices. There are several coordinate systems in the graphics pipeline:

• Object Coordinates: Initially each object is created with its vertices relative to its own coordinate system. This is the coordinate system defined by the creator of the object in order to model it, positioning every vertex in the right place relative to the other vertices of the model.


Figure 2.2: Vertex-level transformations in the Graphics Rendering Pipeline. (1) The vertices of the 3D model are in Object Coordinates. (2) The vertices are scaled, rotated and translated to World Coordinates. (3) The vertices are positioned in the camera scope by transforming them to Eye Coordinates (see detail of camera in orange outline). (4) The vertices are transformed to Clip Coordinates by projecting them onto the near clip plane. (5) Perspective correction is applied to transform vertices to Normalized Device Coordinates. (6) The viewport transform is applied to translate vertices to Viewport Coordinates. Car 3D model courtesy of Alexander Bruckner [24].

• World Coordinates: This is the coordinate system where all the objects are positioned (translated, rotated, scaled) after the model transform.

• Eye Coordinates: Also called camera coordinates or viewpoint coordinates, this is the coordinate system where the camera is statically located at the origin, looking down the z-axis.

• Clip Coordinates: Also called the projection coordinate system, this coordinate system defines how vertex data are projected onto the screen. It also defines the volume of the world where objects may be visible from the camera's point of view: the view-frustum. A vertex (X, Y, Z, W) is outside the view-frustum if any of its coordinates X, Y or Z is outside the range (-W, W). As Figure 2.3 shows, this volume may be a frustum (perspective projection) or rectilinear (orthogonal projection).

• Normalized Device Coordinates (NDC): This coordinate system is used to measure relative positions on the screen, but it has not been transformed to screen pixels yet. It is not obtained by applying a matrix transform to the vertex (X, Y, Z, W), but by dividing the first three vertex coordinates X, Y and Z by W, yielding (X/W, Y/W, Z/W). Its values are normalized and range from -1 to 1 in all three axes in OpenGL (see Figure 2.4), while for DirectX the z-axis range is (0, 1).

• Viewport Coordinates: Also commonly referred to as window or screen coordinates. The vertices in NDC are translated and scaled to be positioned in terms of a viewport with rendering resolution Width x Height. This coordinate system covers the 2D plane that is actually displayed on the screen, where (0, 0) is the bottom left and (Width - 1, Height - 1) is the top right of the plane. For example, for a rendering resolution of 1280 x 720, the geometry that was inside the normalized viewing volume is placed between coordinates (0, 0) and (1279, 719). The geometry in viewport coordinates is passed to the rasterization stage to become fragments.

Figure 2.3: Perspective (left) and orthographic (right) viewing volumes.

Figure 2.4: OpenGL normalized viewing volume.
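The whole chain of per-vertex transformations can be condensed into a few lines of code. The following C++ sketch follows the OpenGL conventions described above; the Vec4/Mat4 helper types are minimal stand-ins for a real math library, and the model, view and projection matrices are assumed to be built elsewhere:

struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };  // m[row][col]

Vec4 mul(const Mat4& M, const Vec4& v) {
    return { M.m[0][0]*v.x + M.m[0][1]*v.y + M.m[0][2]*v.z + M.m[0][3]*v.w,
             M.m[1][0]*v.x + M.m[1][1]*v.y + M.m[1][2]*v.z + M.m[1][3]*v.w,
             M.m[2][0]*v.x + M.m[2][1]*v.y + M.m[2][2]*v.z + M.m[2][3]*v.w,
             M.m[3][0]*v.x + M.m[3][1]*v.y + M.m[3][2]*v.z + M.m[3][3]*v.w };
}

// Object -> World -> Eye -> Clip Coordinates (steps 1-4 of Figure 2.2).
Vec4 toClip(const Mat4& model, const Mat4& view, const Mat4& proj, Vec4 objPos) {
    return mul(proj, mul(view, mul(model, objPos)));
}

// Clip -> NDC (perspective divide, step 5) -> Viewport Coordinates (step 6),
// for the OpenGL convention where NDC spans -1..1 on all three axes.
struct ScreenPos { float x, y, depth; };

ScreenPos toViewport(Vec4 clip, int width, int height) {
    const float ndcX = clip.x / clip.w;   // perspective divide
    const float ndcY = clip.y / clip.w;
    const float ndcZ = clip.z / clip.w;
    return { (ndcX * 0.5f + 0.5f) * (width  - 1),   // 0 .. Width-1
             (ndcY * 0.5f + 0.5f) * (height - 1),   // 0 .. Height-1
             ndcZ * 0.5f + 0.5f };                  // 0 .. 1, used by the Depth test
}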

Vertex Shading

In order to render realistic scenes, not only are the positions of the objects transformed across several coordinate systems, but it is also necessary to compute a set of attributes for every vertex of the object. From simple per-vertex colors to complex elaborations that include texture coordinates, normals and light sources, the process of determining these vertex attributes is commonly known as vertex shading. In the vast majority of current systems, these per-vertex computations, as well as the transformations applied to the vertex positions, are included in the vertex shading stage, which is executed on programmable hardware.

Primitive Assembly

Given the input stream of vertices and the connectivity among them (topology), the primitive assembly stage creates simple polygons named primitives, sometimes also referred to as base primitives (points, lines and triangles). Figure 2.5 shows some examples of topology. As can be seen, the selection of the topology is essential to describe the geometry optimally. For example, in order to describe two adjacent triangles using GL_TRIANGLES, we need six vertices, whereas with GL_TRIANGLE_STRIP four vertices are enough. Primitive Assembly is an essential task that not only groups vertices into primitives, but also allows optimizations to be applied at primitive level, like Clipping and Culling.


Figure 2.5: Examples of common topologies used both by OpenGL and DirectX [167].

• Clipping: Once a primitive is in Clip Coordinates, we know whether it lies inside or outside the view-frustum. Every vertex (X, Y, Z, W) of a primitive is clipped by comparing (X, Y, Z) with W. If all three components X, Y and Z are in the range (-W, +W), the vertex is inside the view-frustum. Otherwise, the vertex is outside the view-frustum and must be clipped. Given a primitive, there are three possible cases:

1. Trivial accept: All the area of the primitive lies inside the view-frustum.
2. Clip: The primitive is partially inside the view-frustum.
3. Trivial reject: All the area of the primitive lies outside the view-frustum.

Figure 2.6 shows the six planes that form the view-frustum (left, right, top, bottom, near and far). For simplicity, Figure 2.7 shows the basic cases of primitive clipping for triangles, lines and points against the top, left, right and bottom clipping planes that form the viewport. When a primitive is partially outside the view-frustum, new primitives may be created. However, this is not strictly necessary, and the system may leave the primitive intact; in that case, the screen mapping stage will discard the fragments outside the view-frustum. Common algorithms to perform clipping are the Cohen-Sutherland [25] and the Sutherland-Hodgman [27] algorithms, for lines and polygons respectively.

Figure 2.6: View-frustum including viewport.

Figure 2.7: Clipping cases for triangles (top), lines (middle) and points (bottom).

• Culling: Once the triangle primitives are in Normalized Device Coordinates, they have a particular orientation with respect to the view-point (camera). For a given triangle with vertices 1, 2 and 3, the orientation is defined by how the vertices wind: clock-wise or counter-clock-wise (see Figure 2.8). One of the winding modes is selected as front face, and the other as back face. Commonly, when the 3D model is created, the modeling program sets the winding of the triangles that face the camera (front-faces) as counter-clock-wise, and that of the triangles that do not face the camera (back-faces) as clock-wise. Furthermore, with face culling the user usually selects which triangles to cull: front-facing triangles, back-facing triangles (the default), or both. Whenever the face culling stage receives a triangle, it tests the winding of the triangle by computing the z-component of its normal vector (a cross product of two of its edges). If the sign of the normal z-component is positive, the triangle is a front-face; otherwise, the triangle is a back-face. Depending on this sign and the selected culling mode, the triangle is either culled or passed to the rasterization stage (a short code sketch after Figure 2.8 illustrates this test). Figure 2.9 shows the 3D model of a sphere (top-left) and a detail showing the triangles that pass the face culling test (top-right). Note the difference in the number of wires shown in the bottom-left image (face culling disabled) and in the bottom-right image (face culling enabled to cull back-faces).


Figure 2.8: Clock-wise and counter-clock-wise winding triangles.
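As an illustration of the winding test just described, the following C++ sketch decides whether to cull a triangle from the 2D cross product of two of its edges in Normalized Device Coordinates, assuming the default mode (counter-clock-wise front-faces, back-faces culled). The Vec2 type is a minimal stand-in:

struct Vec2 { float x, y; };

// z-component of the cross product of edges (v1-v0) and (v2-v0):
// > 0: counter-clock-wise (front-face); < 0: clock-wise (back-face); == 0: degenerate.
float normalZ(Vec2 v0, Vec2 v1, Vec2 v2) {
    return (v1.x - v0.x) * (v2.y - v0.y) - (v1.y - v0.y) * (v2.x - v0.x);
}

// True if the triangle should be discarded by the face culling stage.
bool cullBackFace(Vec2 v0, Vec2 v1, Vec2 v2) {
    return normalZ(v0, v1, v2) <= 0.0f;
}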

2.1.3 Rasterization

The rasterization stage is the third conceptual stage of the graphics pipeline. It receives the primitives of the objects in viewport coordinates (screen coordinates) and produces the color for the set of pixels covered by such primitives. As Figure 2.10 shows, the rasterization stage performs five main tasks:

• Triangle Traversal: For every pixel covered by a primitive (already projected onto the screen plane), the rasterization stage creates a fragment, thus building a discretized representation of the primitive. Every fragment corresponds to a pixel of the viewport. Scanline [83] and Edge Function [82] algorithms are common methods to perform triangle traversal, with the Edge Function algorithm being the one generally used today. This algorithm employs the following edge functions to determine the position of a point P with respect to the edges of a primitive defined by vertices v0, v1 and v2 (see Figure 2.11):

1. E01(P) = (P.x − v0.x) ∗ (v1.y − v0.y) − (P.y − v0.y) ∗ (v1.x − v0.x)
2. E12(P) = (P.x − v1.x) ∗ (v2.y − v1.y) − (P.y − v1.y) ∗ (v2.x − v1.x)
3. E20(P) = (P.x − v2.x) ∗ (v0.y − v2.y) − (P.y − v2.y) ∗ (v0.x − v2.x)


Figure 2.9: 3D model of a sphere (top-left), detail showing the effect of face culling (top-right), wire-frame view of the sphere without face culling (bottom-left), wire-frame of the sphere with face culling enabled (bottom-right).


Figure 2.10: Sub-stages of the Rasterization stage of the Graphics Pipeline. In this example, a red triangle is rasterized over a blue background, and some fragments of the red triangle are occluded by a green triangle.


The edge functions have the following properties:

– If E(P) is greater than 0, then P is on the right side of the vector.
– If E(P) is equal to 0, then P is on the line.
– If E(P) is smaller than 0, then P is on the left side of the vector.

Making use of these properties, if the sign of all three equations is positive for a point P, then P is inside the triangle (see Figure 2.11).


Figure 2.11: Edge functions (E01, E12 and E20) of a primitive defined by vertices v0, v1 and v2.

• Interpolation: For every fragment produced for a given primitive, the rasterization stage interpolates the attributes associated with the vertices of the primitive, producing a value that corresponds to the fragment. Linear interpolation and perspective-corrected interpolation are the most common interpolation modes, used for orthogonal projection and perspective projection respectively. For every fragment with position P, three barycentric coordinates (i, j and k) are computed using the edge functions of the primitive:

1. i_bary(P) = E01(P)/E01(v2)
2. j_bary(P) = E12(P)/E12(v0)
3. k_bary(P) = E20(P)/E20(v1)

Then, the barycentric coordinates are used to interpolate the values of the attributes for the given fragment. Note that each barycentric coordinate weighs the vertex opposite to the edge it is computed from: i_bary is the weight of v2, j_bary that of v0, and k_bary that of v1. For example, when using linear interpolation, the interpolated depth for a fragment F inside the primitive defined by vertices v0, v1 and v2 is: Depth(F) = i_bary(F) ∗ v2.z + j_bary(F) ∗ v0.z + k_bary(F) ∗ v1.z.

• Fragment Shading: The rasterization stage computes the color of every fragment using the interpolated attributes and other associated data. This process is commonly known as Fragment Shading, and in current systems it is fully programmable and executed on programmable hardware. It is common to access textures in this stage, as well as to perform lighting.

• Occlusion Culling: Commonly known as Visibility Determination, this task consists of determining which fragments are visible and which are not. Although there are other alternatives like the painter's algorithm [26], in most current graphics systems this task is performed by the Z-buffer algorithm [28]. This algorithm uses a buffer, called the Z-buffer, with the same size as the viewport. For every position, the Z-buffer stores the depth (Z-component) of the nearest fragment to the camera. Each new fragment's visibility is tested against the depth already stored in the Z-buffer. If the new fragment is visible, its Z is stored in the buffer and the fragment passes to the next stage; otherwise, the fragment is discarded (a code sketch after this list combines triangle traversal, depth interpolation and this test). In Figure 2.12, some fragments of object C are occluded by object A. Likewise, some fragments of object B occlude fragments of object C, while some fragments of B are occluded by A.


Figure 2.12: Detail of scene where three objects A, B and C are rendered in C, A, B order. The image depicts the fragments of every object that pass the Depth Test.

• Blending: The color of the fragments that pass the visibility test is stored in the Color buffer, which has the same size as the viewport. However, rather than simply writing the color of every fragment that arrives at this point, it is merged with the color already stored in the Color buffer, which has been composed with the colors of the fragments that previously passed the test. This color composition is also known as alpha blending, and it is meant to allow transparency effects.
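As a summary of the sub-stages above, the following C++ sketch rasterizes one triangle using the edge functions and barycentric weights defined earlier and applies the Z-buffer test to each fragment. The screen-space Vtx type and the buffer layout are simplifying assumptions; real rasterizers use incremental edge evaluation, tile traversal and proper fill rules:

#include <algorithm>
#include <cmath>
#include <vector>

struct Vtx { float x, y, z; };  // viewport-space position plus depth

// Edge function E_ab(P), positive when P lies on the inner side of edge a->b.
static float edgeFn(const Vtx& a, const Vtx& b, float px, float py) {
    return (px - a.x) * (b.y - a.y) - (py - a.y) * (b.x - a.x);
}

// zBuffer holds one depth per pixel, initialized to 1.0f (the far plane).
void rasterizeTriangle(const Vtx& v0, const Vtx& v1, const Vtx& v2,
                       std::vector<float>& zBuffer, int width, int height) {
    // Triangle traversal restricted to the bounding box, clamped to the viewport.
    const int x0 = std::max(0, (int)std::floor(std::min({v0.x, v1.x, v2.x})));
    const int y0 = std::max(0, (int)std::floor(std::min({v0.y, v1.y, v2.y})));
    const int x1 = std::min(width  - 1, (int)std::ceil(std::max({v0.x, v1.x, v2.x})));
    const int y1 = std::min(height - 1, (int)std::ceil(std::max({v0.y, v1.y, v2.y})));

    for (int y = y0; y <= y1; ++y) {
        for (int x = x0; x <= x1; ++x) {
            const float px = x + 0.5f, py = y + 0.5f;  // sample at the pixel center
            const float e01 = edgeFn(v0, v1, px, py);
            const float e12 = edgeFn(v1, v2, px, py);
            const float e20 = edgeFn(v2, v0, px, py);
            if (e01 < 0 || e12 < 0 || e20 < 0) continue;  // outside the triangle

            const float area = e01 + e12 + e20;  // twice the triangle area
            if (area <= 0) continue;             // degenerate triangle

            // Barycentric depth interpolation: e12 weighs v0, e20 weighs v1, e01 weighs v2.
            const float z = (e12 * v0.z + e20 * v1.z + e01 * v2.z) / area;

            // Z-buffer (Occlusion Culling): keep only the nearest fragment so far.
            float& stored = zBuffer[y * width + x];
            if (z < stored) {
                stored = z;
                // ...Fragment Shading and Blending would be performed here...
            }
        }
    }
}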

For a render step, the application stage sends to the geometry stage all the 3D models and other geometry data required to generate a frame. The geometry stage sends primitives to the rasterization stage, which produces an image using the color of the fragments that are visible from the point of view of the camera. Finally, when all the primitives and fragments have been processed, the Color buffer holds a frame that will be read by a display controller, which shows it on a screen.


Figure 2.13: Detail of scene where two objects are blended. The image depicts how the fragments of a translucent object are blended with the colors already stored in the Color buffer providing a transparency effect.

This frame can be displayed directly on screen so the input lag is reduced. However, this may cause flickering, tearing and other artifacts which are not acceptable in many scenarios. Generally a double or triple buffering technique is employed to avoid displaying partially updated frames. In the case of double buffering the frame is rendered off-screen in a back buffer, and once the whole frame is rendered, the back buffer is swapped with a front buffer, which is the one displayed on the screen.

2.2 GPU Microarchitecture

In this section we give a brief overview of the main mobile GPU architectures that we assume as baselines in the experiments shown in the following chapters.

2.2.1 Immediate Mode Rendering

Immediate Mode Rendering (IMR) is the preferred mode in desktop, laptop and game console GPUs. In the mobile world, IMR is used, for instance, in NVIDIA Tegra 4 [93]. In IMR, graphics commands are processed in the order they are submitted to the GPU and the corresponding primitives are processed through the entire graphics pipeline stages as soon as they are generated. Due to the fact that the geometry is not guaranteed to appear in front-to-back order, IMR may cause pixel overdraw. This means that the same color-buffer location may be computed and written more than once, which consumes precious off-chip memory bandwidth and wastes energy.

Figure 2.14 shows a block diagram of the IMR GPU pipeline. The Command Processor is the unit in charge of processing the command stream that the GPU receives from the driver. The Command Processor receives the commands from the CPU and sets the appropriate control signals so that the input vertex stream is processed through the graphics pipeline. The Geometry Unit converts the input vertices into a set of transformed 2D + 1D (depth) primitives in Clip Coordinates space. In the Primitive Assembly stage the stream of vertices is converted into a stream of input primitives (points, lines and triangles). Other operations like clipping, perspective divide, face culling and viewport mapping are also implemented in the Primitive Assembly stage. The assembled primitives that pass the Clipping and Face Culling stages are sent to the Rasterizer.

Figure 2.14: Microarchitecture of an IMR GPU.

The Rasterizer receives the primitives and discretizes them, creating fragments. Every fragment contains interpolated values of every attribute associated with the vertices of the primitive it belongs to. The fragments are tested by the Early-depth test (supported by a main-memory Depth Buffer accessed through the Depth Cache). If a fragment is visible (not occluded by a previously tested fragment), it is sent to the following stages of the Raster Unit; otherwise it is discarded. Finally, the fragment colors computed in the Fragment Processing stage are sent to the Blending stage, which combines them with the colors already stored in the Color Buffer.

2.2.2 Tile Based Rendering

Tile Based Rendering (TBR) [106] is the preferred rendering mode in mobile GPUs [89, 38]. In TBR the rendering process is divided into two decoupled pipelines, Geometry and Raster, which are connected through the Tiling Engine. Figure 2.15 shows a block diagram of the GPU pipeline with TBR mode.

The Geometry Pipeline receives vertices and performs geometry-related operations like the ones explained in the previous subsection (transformations, rotations, projections, clipping, culling, etc.), producing output primitives that the Polygon List Builder sorts into screen tiles [106]. The per-tile primitive lists are stored in the Parameter Buffer, a buffer in system memory that is accessed through the Tile Cache.

Once all the primitives of the frame have been stored, the Tile Scheduler begins to work. For every tile, the Tile Scheduler reads the primitives from the Parameter Buffer in program order and sends them to the Raster Pipeline.

The Rasterizer receives the primitives and processes them, creating fragments that are tested by the Early-depth test (supported by an on-chip Depth Buffer). If a fragment is not visible (occluded by a previously tested fragment), it is discarded. Otherwise, it is sent to the following stages of the Raster Pipeline. Then, the Fragment Processors compute the fragments' colors, which are composed (blended) with the ones already stored in the on-chip Color Buffer.

Once the tile rendering has been completed, the on-chip Color Buffer is flushed to the Color Buffer in main memory. Note that in TBR pipelines each pixel color is usually written only once to main memory, regardless of object ordering. Only a Parameter Buffer overflow causes the GPU to render the geometry sorted so far, thus writing main memory more than once, but this is highly uncommon. With TBR, pixel overdraw still occurs, but it happens in the local buffer, which saves pixel-related off-chip memory bandwidth with respect to IMR [106]. On the other hand, in TBR the geometry-related memory traffic increases because the geometry is stored to and retrieved from the Parameter Buffer.
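The binning step performed by the Polygon List Builder can be illustrated with the following C++ sketch; the Primitive type, the 16x16-pixel tile size and the container layout are assumptions for illustration, not the actual hardware data structures:

    #include <vector>

    // Hypothetical, simplified primitive: its screen-space bounding box.
    struct Primitive { float minX, minY, maxX, maxY; /* plus attributes */ };

    constexpr int TILE = 16;  // assumed 16x16-pixel tiles

    // The Polygon List Builder inserts the primitive into the bin of every
    // tile its bounding box overlaps; the Tile Scheduler later replays each
    // bin in program order through the Raster Pipeline.
    void binPrimitive(const Primitive& p, int tilesPerRow,
                      std::vector<std::vector<Primitive>>& parameterBuffer) {
        int tx0 = int(p.minX) / TILE, tx1 = int(p.maxX) / TILE;
        int ty0 = int(p.minY) / TILE, ty1 = int(p.maxY) / TILE;
        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                parameterBuffer[ty * tilesPerRow + tx].push_back(p);
    }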


Figure 2.15: Microarchitecture of a TBR GPU.

Deferred Rendering TBR GPU

Deferred Rendering (DR) reduces overshading by first performing the Hidden Surface Removal (HSR) stage, which computes the final state of the Depth Buffer for a given tile. Thereafter, it starts an ordinary rendering of the tile in which the Early-depth test discards all the occluded fragments and achieves minimum overdraw. DR not only eliminates overdraw but also guarantees that the Fragment Processor is used only for those fragments that are visible in the scene.
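The per-tile behavior of DR can be summarized with the following C++ sketch, where rasterize() and shade() are assumed helpers (declarations only) and the first loop corresponds to the HSR stage:

    #include <cstdint>
    #include <vector>

    struct Frag { int idx; float depth; };           // idx: position within the tile

    std::vector<Frag> rasterize(int primitiveId);    // assumed helper
    uint32_t shade(const Frag& f);                   // assumed fragment shader

    void renderTileDeferred(const std::vector<int>& bin,
                            std::vector<float>& tileDepth,     // initialized to far
                            std::vector<uint32_t>& tileColor) {
        // Pass 1 (HSR): rasterize position/depth only to settle the final
        // on-chip Depth Buffer for the tile. No fragment is shaded here.
        for (int p : bin)
            for (const Frag& f : rasterize(p))
                if (f.depth < tileDepth[f.idx]) tileDepth[f.idx] = f.depth;

        // Pass 2: ordinary rendering; the Early-depth test now culls
        // optimally, so only fragments matching the final depth are shaded.
        // (A real design also tracks which primitive owns the final depth.)
        for (int p : bin)
            for (const Frag& f : rasterize(p))
                if (f.depth == tileDepth[f.idx])
                    tileColor[f.idx] = shade(f);
    }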


Figure 2.16: Microarchitecture of a TBR GPU implementing Deferred Rendering.

A DR technique has been commercially implemented by Imagination Technologies in their tile-based PowerVR GPU family [91], which they refer to as TBDR. However, since only partial information about this technique has been disclosed, our Deferred Rendering implementation models what we believe is the most optimistic interpretation of this partial information, in order to be used in the comparisons with our proposal.

A naïve implementation of DR would include HSR as a sequential stage in the Raster Pipeline. We implemented this naïve scheme and observed that execution time increases for every one of the benchmarks tested, by 23% on average, while energy consumption increases by around 6% on average when compared with the baseline TBR GPU. In this naïve approach the rest of the Raster Pipeline is stalled while performing the HSR, which severely hurts both performance and energy compared with the baseline GPU. Therefore, it is clear that performing the HSR stage in sequence with the rendering is not an adequate implementation. However, those huge overheads can be removed by performing the HSR stage in parallel with the other stages of the Raster Pipeline (see Figure 4.8). Thus, in this optimized scheme, while the HSR is being executed for tile i+1, the rest of the Raster Pipeline is executed in parallel to render tile i. Obviously, this parallel implementation introduces a hardware cost, and some hardware blocks, such as the Rasterizer, the Early-depth test and the on-chip Depth Buffer, are replicated. Furthermore, the Tile Scheduler is equipped to handle the memory requests of two primitives in parallel: one primitive from the tile being rendered and the other from the tile in the HSR stage. This does not mean that the Tile Cache now has two read ports; rather, the Tile Scheduler arbitrates between both request queues and only one request is sent to the Tile Cache each cycle, in round-robin fashion.

3 Methodology

This chapter presents the simulation infrastructure we use in this dissertation for estimating performance and energy consumption of the GPU and the CPU. First, we introduce the simulators we utilize to perform the experiments. Then, we present the workloads we employ in such experiments.

3.1 Simulators

In this section we first give a brief overview of the main existing GPU simulators, and we introduce Teapot [111], the simulation framework that we employ to evaluate the proposals included in this work. Then, we briefly describe Marss86 [173], a multicore simulation environment for the x86-64 ISA that we employ to evaluate the performance and energy of several CD algorithms run on a multicore CPU.

3.1.1 GPU Simulation

There are several GPU simulators widely employed in the architecture community that model different aspects of desktop-like GPUs. Some of them, like GPGPUSim [116] and Barra [126], are targeted at simulating General-Purpose GPU (GPGPU) architectures, so they support OpenCL [192] and CUDA [128]. GPGPUSim is the most widely accepted GPGPU simulator in academia, with an overwhelming number of hardware designs evaluated using it. Despite its popularity, GPGPUSim is not tailored to run graphics APIs like OpenGL [52]. GPUWattch [162] is an energy model based on McPAT [163] developed for GPGPUSim. Attila [133], QSilver [188] and Multi2Sim [193] provide support for OpenGL, but they do not support OpenGL ES, so they cannot run mobile graphics workloads. Attila includes a generic GPU microarchitecture that is claimed to closely track the hardware features of GPUs around 2006. Adaptable and configurable, it can be used to evaluate multiple hardware configurations for desktop- and embedded-like GPUs. However, it focuses on IMR GPUs and does not include a Tile-Based Rendering architecture, which is much more popular in embedded devices. GRAAL [155] models a Tile-Based Rendering architecture, includes a power model and provides OpenGL support. QSilver also provides OpenGL support and includes a power model, but it models Immediate-Mode Rendering GPUs. Although OpenGL support is provided by some of the aforementioned simulators, none of them supports OpenGL ES [57], so they cannot run the plethora of graphics workloads currently available for embedded devices like smartphones and tablets.

Teapot


Figure 3.1: Overview of Teapot simulation infrastructure.

Our research group developed Teapot [111], a simulation framework that provides support for the OpenGL ES API. Teapot is a mobile GPU simulation infrastructure that supports full-system GPU simulation and is able to run unmodified commercial Android graphics workloads, evaluating performance, energy consumption and image quality. Furthermore, Teapot is able to profile multiple applications that access the GPU concurrently. In short, Teapot includes an OpenGL commands interceptor, a GPU trace generator and a cycle-accurate GPU simulator for both IMR and TBR GPU architectures, as well as a power model based on McPAT [163]. As shown in Figure 3.1, the simulation stack of Teapot is comprised of different tools that can be categorized into three different architecture levels¹:

• Application Level: Teapot employs the Android Emulator included in the Android SDK [87] for running unmodified commercial applications on a desktop computer. Based on QEMU [74], the Android Emulator supports GPU hardware acceleration, which allows executing state-of-the-art graphics workloads at real-time frame rates. When GPU hardware acceleration is enabled, the OpenGL ES commands issued by the software running in the emulator are not processed inside the emulator but are sent to the host GPU driver. The OpenGL ES trace generator is a library that, interposed between the Android Emulator and the host GPU driver, intercepts all the OpenGL ES commands sent by the emulator, stores them into an OpenGL ES trace file, and finally redirects them to the host GPU driver. The OpenGL ES trace file contains all the data necessary to reproduce the original OpenGL ES command stream (GLSL shaders, textures, geometry and OpenGL state information). To generate an OpenGL ES trace file, the user simply runs an application on the Android Emulator, and the trace of OpenGL ES commands is generated while the application executes.

¹ A more detailed description is available in the dissertation of Arnau [108].

• Driver Level: Teapot is a trace-driven simulator that employs a GPU functional simulator to generate a trace including all the information necessary to later perform a GPU timing simulation. The GPU functional simulator of Teapot consists of an instrumented version of softpipe, a software renderer included in Gallium3D [88]. Gallium3D is an infrastructure for building 3D graphics drivers that allows portability to all major operating systems. The modified software renderer in Teapot is fed with an OpenGL ES trace file generated at the previous level. Every OpenGL ES command is executed as usual, but in addition the instrumented version of softpipe gathers information about the rendering process and generates the GPU Instruction and Memory Trace. The generated trace file includes vertices, primitives, fragments, texels, samplers, and vertex and pixel shader programs (translated to TGSI assembly language [84]), as well as the memory addresses used to read/write geometry, textures and framebuffers (Color and Depth buffers).

• Hardware Level: The trace generated at the previous level is given to the cycle-accurate GPU simulator, which gathers activity factors for all the components included in the timing model of the GPU. Teapot provides two baseline GPU architectures: IMR and TBR (see Figure 3.2). The mobile GPU models assumed in Teapot closely track the ultra-low-power GPU in the NVIDIA Tegra [70] for IMR and the Mali-400MP [89] for TBR, but they are highly configurable and can model GPUs with different numbers of processors, cache sizes and associativity schemes, among others.

– System memory: Teapot employs DRAMSim2 [182] to simulate system memory. DRAMSim2 is a cycle-accurate simulator that includes SDRAM, DDR SDRAM, DDR2 SDRAM and DDR3 SDRAM memory models, including a DRAM memory controller, DRAM ranks and banks, as well as the buses they use for communication purposes.

– Power model: Teapot uses McPAT [163] to compute the static and the dynamic energy consumed by the main hardware structures of the GPU: processors, caches, queues, register files and prefetching tables, among others. However, the power model of the functional units is based on the one used in the QSilver simulator [188]. Furthermore, based on the energy model proposed by Pool et al. [176], we have extended the power model of Teapot to include the rasterizer. The GPU timing simulator calls McPAT, which employs all the micro-architectural parameters (number of processors, type of processor, caches and cache sizes, etc.) and builds an internal representation of the hardware. McPAT computes the leakage of every hardware structure and the energy required to access such structures. As the GPU timing simulator gathers activity factors for all the components included in the timing model of the GPU, the activity factors of components also included in the power model are sent to McPAT to compute the dynamic energy consumption. The total dynamic energy is obtained by multiplying each activity factor by the energy cost estimated by McPAT for the corresponding hardware structure. The static energy is computed by multiplying the total GPU leakage by the execution time. Teapot produces per-frame and global energy consumption and timing reports (a sketch of this computation is given after this list).

Figure 3.2: NVIDIA Tegra-like architecture (left) and Mali-400 MP-like architecture (right). Images from the Teapot paper [111].

– Image Quality Assessment: Teapot generates and stores the frames of the analyzed workloads at three different components of the infrastructure. First, when the OpenGL ES trace generator stores the OpenGL ES commands in a file, the frames generated by the real hardware are also stored. Second, when the functional GPU simulator (softpipe) is executed to generate a GPU trace, it also produces the corresponding frames of the trace. Finally, when the cycle-accurate timing simulator executes the GPU trace, the corresponding frames are also generated. Among other uses, the Image Quality Assessment module is employed to check the correctness of the whole process.
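For illustration, the following C++ sketch (assumed structure names, not Teapot's actual code) shows the described combination of McPAT per-access energies with the simulator's activity factors:

    #include <map>
    #include <string>

    // Hypothetical structures illustrating the described computation:
    // dynamic energy = sum over units of (activity factor x per-access energy),
    // static energy  = total leakage power x execution time.
    struct EnergyModel {
        std::map<std::string, double> energyPerAccess;  // J/access, from McPAT
        double leakagePower;                            // W, total GPU leakage

        double totalEnergy(const std::map<std::string, double>& activity,
                           double executionTime) const {
            double dynamic = 0.0;
            for (const auto& [unit, accesses] : activity)
                dynamic += accesses * energyPerAccess.at(unit);
            return dynamic + leakagePower * executionTime;
        }
    };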

Improvements made to Teapot

During the development of this thesis, changes and improvements were made to Teapot in order to carry out the studies presented in this document. The following are some of the changes that we consider most relevant in the context of this document:

• OpenGL calls interceptor: We modified the OpenGL calls interceptor included in Teapot to work with the driver of an NVIDIA GTX 970. This modification allowed us to increase the frames per second obtained with Teapot for our set of benchmarks.

• Vertex positions: The GPU trace produced by Teapot did not include the positions of the input vertices. We modified Teapot's version of Gallium's softpipe so that it also stores them in the GPU trace. With this addition we are able to obtain the geometry of the scenes included in the GPU trace, which we use in Chapter 5 to study Collision Detection on a CPU.

• Face Culling Stage: The previous model of the Primitive Assembly stage of Teapot did not explicitly include Face Culling. Let us name as input primitives the primitives created in the first step of Primitive Assembly, and as output primitives the primitives that are tested in the Face Culling stage. In the previous model of Teapot, only the “not culled” output primitives were stored in the GPU trace. Likewise, the cycle-accurate GPU simulator only accounted for the cost of Face Culling for “not culled” output primitives. Given that the number of primitives culled by Face Culling is significant, sometimes greater than the number of primitives that are not culled, we modified both Teapot's version of Gallium's softpipe, to store all output primitives, and the cycle-accurate GPU simulator, to include a new Face Culling model that accounts for the cost of culling primitives. Furthermore, with these modifications we were able to model the Deferred Face Culling used in Chapter 5.

• First Level Caches and Buffers: We included the Vertex Cache and the Tile Cache, as well as the Color Buffer and the Z-Buffer, in the power model of Teapot.

• New Simulation Models: We have included new simulation models into Teapot for Tile Based Deferred Rendering, Unified Shader Architecture (for TBR GPUs), as well as a mode to emulate software Z pre-pass.

• Functional simulator for TBR GPUs: We added a functional simulator for TBR GPUs that allows rapid design and evaluation of new techniques.

• TeaTools: We created a heterogeneous set of tools that work with the GPU traces created by Teapot and allow us to visually quantify overshading/overdraw, examine the order among objects, detect collisions among objects, and inspect frame creation drawcall by drawcall.

3.1.2 Collision Detection CPU Simulation with Marss86 and Bullet

In the work presented in Chapter 5 we study Collision Detection algorithms on a CPU. In order to estimate the time and energy consumption of such algorithms executed on a CPU, we employ Marss86 [173]. Marss86 is a full-system simulation framework that simulates/emulates x86 multicore systems with a detailed pipeline model, running unmodified operating systems, kernel interrupt handlers and standard libraries (see Figure 3.3). Marss86 is highly configurable and can simulate multiple out-of-order/in-order cores and memory models. Marss is based on PTLsim [199] and runs on top of the QEMU [74] emulation environment. We employ Marss to model a processor equivalent to a dual-core ARM Cortex-A9 [94] (see Table 3.1).

In order to simulate the CD algorithms we employ Bullet [129], a 3D real-time multiphysics library. Bullet provides state-of-the-art collision detection for soft and rigid body dynamics. Furthermore, Bullet is widely used in industry [95] (e.g., Grand Theft Auto V and Red Dead Redemption). For a given GPU trace we obtain the 3D meshes of vertices of every collisionable object in the same world-space coordinates as in the original benchmark. With this information, a CD algorithm is able to test the overlaps in a given scene.

Figure 3.3: Overview of Marss components. Source: www.marss86.org.

Then, an application loads the meshes of the collisionable objects in a scene and, using Bullet, performs the CD for every frame of the original benchmark included in the GPU trace. We implement two different CD versions: the first one performs just a broad phase, and the second one performs both broad and narrow phases. These two versions are simulated with Marss, and the generated activity factors are fed into McPAT to obtain the energy cost of performing the CD on the CPU. The time and energy of loading the 3D meshes are subtracted from the CPU results.
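As an illustration, the following sketch shows how such a Bullet collision world can be set up in C++; the convex-hull shape and identity transform are placeholders for the real meshes and world-space transforms extracted from the GPU trace:

    #include <btBulletCollisionCommon.h>

    int main() {
        // Broad phase (dynamic AABB tree) plus a dispatcher that runs the
        // narrow phase on the overlapping pairs.
        btDefaultCollisionConfiguration config;
        btCollisionDispatcher dispatcher(&config);
        btDbvtBroadphase broadphase;
        btCollisionWorld world(&dispatcher, &broadphase, &config);

        // One collision object per collisionable mesh from the GPU trace; a
        // tiny convex hull stands in here for the real triangle mesh.
        btConvexHullShape shape;
        shape.addPoint(btVector3(0, 0, 0));
        shape.addPoint(btVector3(1, 0, 0));
        shape.addPoint(btVector3(0, 1, 0));
        btCollisionObject obj;
        obj.setCollisionShape(&shape);
        obj.setWorldTransform(btTransform::getIdentity());
        world.addCollisionObject(&obj);

        // Runs the broad phase and the narrow phase for the current frame.
        world.performDiscreteCollisionDetection();

        // Each manifold with contacts reports a colliding pair of objects.
        for (int i = 0; i < dispatcher.getNumManifolds(); ++i) {
            btPersistentManifold* m = dispatcher.getManifoldByIndexInternal(i);
            if (m->getNumContacts() > 0) {
                // m->getBody0() / m->getBody1() collide in this frame.
            }
        }
        world.removeCollisionObject(&obj);
        return 0;
    }

A broad-phase-only variant, like the first CD version described above, would instead inspect the broadphase's overlapping-pair cache without dispatching the narrow phase.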

Table 3.1: CPU Simulation Parameters.

CPU Tech Specs          1500 MHz, 1 V, 32 nm, 2 Cores, 1 MB L2
Core Parameters:
  Execution mode        in-order
  CPU Architecture      Harvard
  L1 Instruction Cache  32 KB/Core
  L1 Data Cache         32 KB/Core

3.2 Benchmarks

Our set of benchmarks is composed of nine commercial Android applications with different graphics characteristics. Although there exist excellent commercial benchmarks like GFXBench [96], we prefer to rely on real games, because commercial benchmarks tend to exaggerate the number of advanced 3D effects in order to stress the hardware features of mobile devices regardless of the energy consumption of the device. The main target of such benchmarks is to rank the performance of the hardware devices where they run, so energy consumption is not an issue for them. On the contrary, mobile games tend to be simpler than those cutting-edge benchmarks in order to enlarge battery life. In fact, battery life is one of the most important satisfaction factors for users [32]. Furthermore, a significant number of users uninstall an application [76, 80, 78, 79], and 36% of users stop using it [77], if it turns out to be a battery killer.

The workloads included in our set of benchmarks are representative of the mobile graphics application ecosystem, as it includes popular Android games for smartphones and tablets. Furthermore, the applications included in our set of benchmarks employ most of the features available in the OpenGL ES 1.1/2.0 graphics APIs [101, 100].

Although Khronos released the OpenGL ES 3.0/3.1/3.2 specifications in December 2013, March 2014 and August 2015 respectively [97, 98, 99], during most of the development of this thesis there were no representative Android games available that used the OpenGL ES 3.X API. Furthermore, at the time of performing the research studies included in this dissertation, Gallium (softpipe, llvmpipe, swr) did not support a large number of features included in OpenGL ES 3.X. To the best of our knowledge, at the time of writing this dissertation softpipe still does not support OpenGL ES 3.X. However, we hope that forthcoming developments in Gallium support for OpenGL ES 3.0/3.1 will allow us to include support for them in Teapot in the future (see the future work in Section 6.2). In the following sections we present and briefly characterize our set of benchmarks.

3.2.1 Benchmarks Set

As graphics animation applications are one of the most popular categories in the main application markets, 3D games are the workloads that exploit the most advanced features of embedded GPUs. We focus on them because they represent a significant and growing market in the mobile segment, and the trend is towards more complex and realistic scenes and effects. However, it is true that 2D games also represent an important part of the market. Despite that, we did not include 2D games because the nature of the techniques proposed in this work makes 3D games the applications with the greater potential. Note that 2D games are usually based on back-to-front rendering and the Alpha Test in order to provide nice borders for the objects in the scene (sprites), so the rendering order must be respected in most cases. On the other hand, 2D games usually employ uniform grids to perform the CD, where the memory complexity is proportional to the dimensions of the scene and the computational cost is usually constant. Nevertheless, we include the evaluation of 2D workloads as future work in Section 6.2. Regarding the selection of frames, it is worth mentioning that for both techniques we include sequences of frames that represent the most typical use case in those games. For example, when we study CD the trace includes frames before collisions occur, frames while the collision is produced, and frames after the collision has occurred. Figure 3.4 shows screenshots of our workloads. Below we give a brief description of every game in our benchmark set:

• 300: Seize Your Glory. Hack & slash game with dynamic camera movements where the player must combat different enemies across several levels while moving around a 3D scene. The game displays high-definition 3D graphics and features advanced effects like water rendering, smoke, fog, rain, fire and blood splatters, among others.


• Air Attack. Winner of the Unity Awards 2010, Air Attack is a scrolling flight combat arcade game that features high-definition 3D graphics including LightMaps, SpecularMaps and particle systems. Air Attack employs real-time physics and destructible objects like bridges and buildings. The player moves the plane up/down/left/right around the scene, which scrolls from bottom to top through the entire level.

• Captain America: Sentinel of Liberty. Beat'em up action game with dynamic camera movements where the player controls Captain America while he runs, jumps, slides and fights different enemies with his shield. Captain America employs programmable shaders to implement advanced visual effects like explosions and smoke, among others.


Figure 3.4: Screenshots of the Android games included in our set of benchmarks: (a) 300: Seize Your Glory, (b) Captain America: Sentinel of Liberty, (c) Air Attack, (d) Crazy Snowboard, (e) Forest 2, (f) Gravity: Don't Let Go, (g) Sleepy Jack, (h) Temple Run, (i) Shoot War: Professional Striker.

• Crazy Snowboard. Snowboard action game where the player controls a rider who jumps and performs tricks like flips, spins, grabs or grinds over different objects as he descends the track. Crazy Snowboard includes bump maps, lighting and basic terrain models.

• Forest 2. Haunting horror high definition game where the player controls a character that moves around a forest and whose quest is to escape and banish a white ghost with long black hair. The game includes improved AI and uses programmable shaders that feature basic terrain models and a rich scenario with a great number of objects.

• Gravity: Don't Let Go. Action game with dynamic camera movements that features high-definition 3D graphics. The player controls an astronaut in a space suit who must accomplish different missions as he struggles to survive in the zero-gravity environment of the vast space. The game employs real-time physics and programmable shaders that implement visual effects like the propulsion jets of the astronaut's propulsion unit.

• Shoot War: Professional Striker. First-person shooting game with different scenarios where the player must fight, with different firearms, multiple enemies that exhibit simple AI. The game employs programmable shaders to implement basic terrain models and visual effects like fire and blood splatters.

• Sleepy Jack. Visually appealing action arcade game where the player controls Sleepy Jack while he is dreaming as he flies around a tunnel, collects dream bubbles and interacts with other multiple objects and enemies. This game includes destructible objects and employs high definition 3D graphics and programmable shaders to implement advanced visual effects like explosions, particle systems and lighting.

• Temple Run. One of the most popular mobile games, Temple Run is an adventure arcade game where the player controls an explorer who has just stolen a cursed idol from a temple and now runs for his life, escaping from the Evil Demon Monkeys that chase him. The player has to jump, turn, swipe and slide to avoid different obstacles, collect coins and buy items. Like all the games included in our set of benchmarks, Temple Run has been developed with Unity3D, and it includes water rendering and other visual effects like fire and fog.

3.2.2 Benchmarks Characterization

Our benchmark set is composed of nine commercial Android 3D applications (see Table 3.2), all of them using Unity3D [21]. Most of them are very popular and have a high number of downloads. In the case of Sleepy Jack, it is worth mentioning that it is not a free application, which greatly reduces its number of downloads. As can be seen, the games have been developed and updated in recent years, with Captain America being the oldest benchmark (updated in 2011).

We now briefly analyze the key characteristics of our set of benchmarks for the Geometry and the Rasterization stages. Table 3.3 shows information about the Geometry stage. The second column reports the average number of instructions executed per processed vertex in the Vertex Shading (VS) stage, expressed in terms of assembly instructions of the Tungsten Graphics Shader Infrastructure (TGSI) ISA [84]. TGSI is the only intermediate assembly language that is employed in all the drivers of Gallium. TGSI is based on the ARB assembly language (created by the OpenGL Architecture Review Board), which is one of the first low-level shading languages for GPUs, with the aim of standardizing the control of the hardware graphics pipeline. As can be seen, the two benchmarks that employ OpenGL ES 1.1 (Air Attack and Crazy Snowboard) are among the ones with the smallest ratio of instructions executed per vertex. This is not unexpected, as for these games the driver emulates with simple shaders the configurable fixed-function options of the OpenGL ES 1.1 VS stage. The third column shows the average number of drawcalls per frame. A drawcall is a draw function call of the API (glDrawArrays or glDrawElements in OpenGL ES 1.1/2.0) that renders a batch of vertices. The fourth column reports the average number of primitives per drawcall, a parameter that, when too low, causes the application to be CPU bound (the CPU is not able to feed the GPU fast enough and the idle time of the GPU increases). The fifth column shows the average number of primitives per frame, and the sixth column shows the average number of output primitives per frame. The input primitives are those assembled at the beginning of the Primitive Assembly stage and sent to the Clipping stage; the primitives resulting after clipping are referred to as output primitives. The last column shows the percentage of output primitives per frame that are culled by the Face Culling stage. As can be seen, this stage removes up to 47% of the geometry before the Rasterization stage. Games like 300, Crazy Snowboard, Gravity and Sleepy Jack, where the weight of sophisticated 3D objects is high, exhibit the highest face culling rates. On the other hand, Air Attack and Forest 2 exhibit the lowest culling rates: both games contain a large amount of single-faced geometry, like the terrain and the trees in the case of Air Attack and the leaves of the trees in the case of Forest 2.

Table 3.2: Benchmarks Set.

Benchmark             Description        Downloads (M)  OpenGL ES  Updated
300                   hack & slash       10-50          2.0        Feb 2014
Air Attack            flight arcade      10-50          1.1        Jan 2014
Captain America       beat'em up         1-5            2.0        Jul 2011
Crazy Snowboard       snowboard arcade   5-10           1.1        Dec 2015
Forest 2              horror             1-5            2.0        Oct 2016
Gravity               action             1-5            2.0        Dec 2013
Professional Striker  shooter            10-50          2.0        Feb 2017
Sleepy Jack           action             0.5-1          2.0        Jun 2013
Temple Run            adventure arcade   100-500        2.0        Oct 2016

Table 3.3: Geometry Stage Stats.

Bench     VS insns     Drawcalls   Primitives     Primitives   Out Prims   % Culled Out
          per Vertex   per Frame   per Drawcall   per Frame    per Frame   Prims per Frame
300       26.69        192.74      468.50         79445.37     48314.12    42
Air       7.59         21.16       141.16         5905.56      1924.63     23
Cap       15.35        22.35       321.73         11993.35     2007.51     33
Crazy     13.48        20.06       144.09         3486.29      1079.32     42
Forest2   45.23        257.19      170.45         52962.56     30773.78    17
Grav      50.67        40.01       1825.99        56019.44     22743.64    47
Sleepy    19.84        32.50       167.06         8325.70      4963.60     44
Striker   8.59         21.31       481.07         8576.55      6132.88     34
Temple    32.47        25.33       565.64         25177.24     4895.81     31


Table 3.4: Rasterization Stage Stats.

Bench     Fragments       Attributes     FS insns       ALU insns      Texels         Depth
          per Primitive   per Fragment   per Fragment   per TEX insn   per Fragment   Complexity
300       103.37          5.39           6.98           18.09          4.36           4.97
Air       324.99          3.50           5.23           3.20           8.98           1.64
Cap       166.69          2.81           5.60           5.24           5.89           1.97
Crazy     594.81          4.69           6.29           10.55          4.00           1.23
Forest2   231.88          8.11           17.63          15.21          13.57          4.85
Gravity   28.06           5.20           20.34          23.19          11.41          1.98
Sleepy    78.84           2.22           6.76           20.28          3.23           1.40
Striker   386.19          4.24           7.42           6.22           7.47           2.24
Temple    97.13           4.31           10.78          17.16          10.67          2.05

Once the Geometry stage has finished, the resulting primitives that have not been culled are processed in the Rasterization stage. The primitives are first discretized into fragments that contain interpolated values of the per-vertex attributes of the primitive. Then, the color of the fragments is computed by the fragment shader. Table 3.4 shows information about our benchmarks in the Rasterization stage. The second column presents the average number of fragments per primitive. The third column contains the average number of attributes per fragment. This is a measure of the amount of work to be done by the Rasterizer to interpolate the attributes of the vertices of a triangle. The fourth column contains the average number of fragment shader instructions per fragment produced by the Rasterizer. The fifth column shows the average number of ALU instructions per texture fetching instruction. Note that in OpenGL ES 2.0 texture instructions are the only ones that allow the programmer to access memory within the fragment shader, so the higher the number of ALU instructions per texture fetching instruction, the less memory-intensive the workload. The sixth column presents the average number of fetched texels (texture pixels) per fragment. The number of texels fetched per fragment varies depending on the number of textures employed in the shader and the texture filter selected. For one texture, the nearest-neighbor, bilinear and trilinear filters fetch 1, 4 and 8 texels respectively. Anisotropic filtering with 2x, 4x, 8x and 16x filter modes fetches 16, 32, 64 and 128 texels respectively. The last column shows the depth complexity of the scene, i.e., the average number of fragments per pixel executed in the fragment shaders. This parameter provides a measure of the activity performed in the Fragment Shading stage and is also commonly referred to as overshading.

As can be seen in the second column of Table 3.4, on average the number of fragments is two orders of magnitude greater than the number of primitives, which explains why fragment shading is typically the most time-consuming stage of the graphics pipeline. This parameter ranges widely, from around 28 up to 594 for Gravity and Crazy Snowboard respectively. It is interesting to note the special case of Gravity, which features detailed geometry but leaves an important part of the screen literally empty, i.e., no fragment is produced to cover those pixels. The third column indicates a significant variation in the number of attributes per fragment that the Rasterizer must interpolate, which ranges from 2.22 up to 8.11 for Sleepy Jack and Forest 2 respectively. With respect to the number of fetched texels per fragment, we also observe a wide range, from 3.23 (Sleepy Jack) to 13.57 (Forest 2). Likewise, the fragment shaders employed by our set of benchmarks are significantly different, with Air Attack and Gravity being the workloads with the least and the most intensive fragment shading, at 5.23 and 20.34 assembly instructions per fragment respectively. It is interesting to note the smaller ratio of shader instructions in the Rasterization stage with respect to the Geometry stage. The number of ALU instructions per texture fetching instruction also differs across the workloads of our benchmark set, ranging from 3.20 to 23.19 for Air Attack and Gravity respectively. Furthermore, our workloads employ different texture filters. Games like Air Attack, Forest 2, Gravity and Temple Run employ several textures in the shader and trilinear filtering, fetching around 9 texels per fragment in the case of Air Attack and more than 10 texels per fragment on average in the other cases. Finally, the last column of Table 3.4 shows the overshading of our set of benchmarks, which ranges from 1.23 (Crazy Snowboard) to 4.97 (300). As we previously said, computing the color of a fragment and writing it to a given pixel of the Color Buffer more times than necessary wastes a considerable amount of main memory bandwidth, time and energy [172]. Thus, reducing this unnecessary activity significantly increases the energy-efficiency of the GPU.


4 Visibility Rendering Order: Improving Energy Efficiency on Mobile GPUs through Frame Coherence

Identifying visible surfaces is a requirement in the graphics pipeline for correct image rendering. The most widespread method to resolve visibility at pixel granularity is the Depth Test, which is typically placed at the end of the pipeline. Figure 4.1 introduces a simplified conventional graphics pipeline. The GPU receives vertices and processes them in the Geometry Pipeline, which generates triangles. These are then discretized by the Rasterizer, which generates fragments that correspond to pixel screen positions. Then, fragments are sent to the Fragment Processing stage, which performs the required texturing, lighting and other computations to determine their final color. Finally, the Depth test compares each fragment’s depth against that already stored in the Depth Buffer to determine if the fragment is in front of all previous fragments at the same pixel position. If so, the Depth Buffer is conveniently updated with the new depth, and the color of the fragment is sent to the blending stage, which will accordingly update the Color Buffer (the buffer where the image is stored). Otherwise the fragment is discarded.

One big advantage of the Depth test is that it ensures correct scene rendering regardless of the order in which the opaque geometry is submitted by the CPU. The main drawback is that the color of a given pixel may be written to the frame buffer in system memory more times than necessary (a problem known as overdraw), which wastes a considerable amount of bandwidth and energy [172]. Moreover, when the GPU realizes that an object or part of it is not going to be visible, all activity required to compute its color has already been performed, with the consequent waste of time and energy (a problem known as overshading), especially in the Fragment Processor, which is the most power-consuming component of the graphics pipeline [176]. Reducing the overshading produced by non-visible fragments can significantly increase the energy-efficiency of the GPU.

Figure 4.2 shows the overshading for several applications (details on the evaluation framework are provided later). We define overshading as the average number of fragments processed per pixel. The first bar shows that overshading is extremely high in some applications with complex 3D geometry, such as 300 and Forest 2, for which each pixel is computed and written around 8.5 and 13 times on average.



Figure 4.1: Simplified version of the Graphics Pipeline.


Figure 4.2: Shaded fragments per pixel in a GPU without Early-depth test, with Early-depth test, and with a perfect front-to-back rendering order at object granularity.

Commercial GPU pipelines include an Early-depth test stage that checks fragment visibility before the Fragment Processing stage and achieves substantial overshading reductions (see the middle bar of Figure 4.2). However, the effectiveness of the Early-depth test relies on the software's ability to send opaque primitives in front-to-back order, which is not what the software does in most cases. The third bar of Figure 4.2 shows the overshading with a perfect front-to-back rendering order at object granularity, which reduces inter-object overshading to the minimum. As can be seen,

there is significant headroom for improvement, and this is the target of this chapter.

It is well known that improving the battery life of handheld and portable devices is a major concern for hardware and software developers. Among all the components in smartphone SoCs, the Graphics Processing Unit (GPU) has been identified as one of the top energy consumers [122]. In particular, for graphics applications the GPU has been identified as the principal energy consumer [174]. Further experimental data with the same SoC shows a peak GPU power consumption 50% higher than the peak consumption of the CPU [30]. The current trend towards more realistic graphics, and therefore more power-hungry applications [150], is just aggravating this issue, so improving the energy efficiency of mobile GPUs is key for future designs [103, 109, 123, 148, 168, 169, 180, 131, 120]. The development of energy-efficient solutions is a requirement to make a richer user experience possible on these platforms.

In this work, we propose a novel hardware technique for mobile GPUs, Visibility Rendering Order (VRO), which tries to render objects in front-to-back order to maximize the culling effectiveness of the Early-depth test and minimize overshading, hence reducing execution time and energy consumption. Our approach is based on the observation that consecutive frames do not differ much, in order to provide the feeling of a smooth transition in animated applications. This suggests that the relative order among the objects in frame N is usually the same as in frame N+1 (or very close to it). Since depth-order relationships between objects are already checked by the Depth Test, VRO incurs minimal energy overheads: it just requires adding a small amount of hardware to capture that information and use it later to guide the rendering of the following frame. This extra activity is performed in parallel with other stages of the pipeline, so no performance overheads are incurred.

For the analysis in this work, we have classified our set of benchmarks into two groups according to the following criterion: if the reduction in overshading between an ideal front-to-back rendering order at object granularity (third bar of Figure 4.2) and the execution using Early-depth (second bar of Figure 4.2) is smaller than 0.5%, the benchmark is categorized as “ordered”; otherwise it is categorized as “unordered”. Our technique achieves impressive results for the “unordered” group of applications, i.e., those that do not submit objects in front-to-back order to the GPU. For this group, VRO obtains 27% speed-up and 14.8% energy reduction on average when compared with a state-of-the-art mobile GPU. For the “ordered” group of benchmarks, VRO achieves minor reductions in overshading, but it produces neither a performance penalty nor an energy overhead. Note that even for ordered benchmarks VRO may provide benefits to the CPU, because the application no longer needs to order objects front-to-back, which is typically expensive in complex scenes.

4.1 Visibility Determination and Overshading

Current graphics processors implement the Depth Buffer technique to resolve the visibility of opaque surfaces at fragment granularity. During the rendering process, the objects are discretized into fragments that correspond to pixel positions. By testing each fragment's depth against that already stored in the Depth Buffer, the hardware determines whether the fragment is in front of all previous fragments at the same position. If so, the Color Buffer and the Depth Buffer are conveniently updated; otherwise the fragment is discarded. This is commonly known as the Depth Test or Z-Test.

Note that the depth and the color of a given pixel may be written multiple times, which is known as overdraw and wastes a considerable amount of main memory bandwidth and energy. This is the case in Immediate-Mode Rendering (IMR), the preferred rendering mode in desktop, laptop and game console GPUs. In the mobile world, IMR is used, for instance, in the NVIDIA Tegra 4 [93] and Tegra X1 [71]. With IMR, the graphics commands are fully processed in the order they are submitted to the GPU, and the corresponding primitives are processed through the graphics pipeline stages as soon as they are generated, which may cause overdraw if the geometry is not rendered in a perfect front-to-back order. If it is, occluded portions of the scene are discarded by the Depth Test, so the writes to the Depth Buffer and the accesses to the Color Buffer are reduced, saving precious memory bandwidth. This has motivated some research on sorting objects in the CPU before they are sent to the rendering pipeline. Nevertheless, this approach has important computational costs that make it less appealing for real-time rendering, especially on low-power devices.

To further reduce power and latency, most pipelines apply an Early-depth test to fragments before they are sent to the Fragment Processors. Discarding occluded fragments at this pipeline stage saves useless shading and blending work and brings important performance and power benefits. Note that a given pixel may still be shaded multiple times unless opaque surfaces are rendered in front-to-back order, a problem known as overshading. Overdraw and overshading refer to similar problems: overshading refers to unnecessary executions of the Fragment Processor, whereas overdraw represents the unnecessary writes to the Color Buffer; therefore, whenever there is overdraw there is also overshading. Note that not all overshading is avoidable: non-opaque (transparent) objects compute the color of their pixels using the color values of the scene behind them, so overdraw for those objects is unavoidable.

Overshading has also been addressed in other ways. Z-prepass [145] addresses overshading by performing two separate rendering passes on the GPU. First, it renders the geometry without outputting to the Color Buffer, using a null fragment shader, to set up the final Depth Buffer values. In the second pass, with the real shaders, the Early-depth test performs optimal culling, so overshading is minimal (just one opaque fragment per pixel is shaded and written to the Color Buffer). Unfortunately, this approach doubles the amount of vertex processing, rasterization and depth-test work required, which can more than offset its benefits. It is only effective for workloads with enough depth and/or fragment complexity, where these overheads are compensated by large fragment computation savings, which is not usually the case in mobile applications.
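For reference, a software Z-prepass can be expressed with OpenGL ES 2.0 state changes as in the following sketch; drawScene() and the two program handles are assumed helpers, not part of any benchmark:

    #include <GLES2/gl2.h>

    void drawScene();  // assumed: submits all opaque geometry

    void renderWithZPrepass(GLuint nullFragmentProgram, GLuint realProgram) {
        glEnable(GL_DEPTH_TEST);

        // Pass 1: depth only. Color writes are masked and a null fragment
        // shader is used, so only the final Depth Buffer values are produced.
        glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
        glDepthMask(GL_TRUE);
        glDepthFunc(GL_LESS);
        glUseProgram(nullFragmentProgram);
        drawScene();

        // Pass 2: real shading. Depth writes are disabled and the test is set
        // to GL_EQUAL (GL_LEQUAL is a common, more robust alternative), so
        // one opaque fragment per pixel reaches the fragment shader.
        glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
        glDepthMask(GL_FALSE);
        glDepthFunc(GL_EQUAL);
        glUseProgram(realProgram);
        drawScene();
    }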

Deferred Rendering (DR) is a hardware technique that avoids overshading through computing the Depth Buffer before starting fragment shading. Currently, DR has only been implemented on Tile Based Rendering (TBR) GPUs [91]. TBR pipelines divide the screen space into tiles and, before rasterization, they assign the geometry of the scene to the tiles, which are then independently rendered. This allows the GPU to use small on-chip memories to contain the Depth and the Color buffers for the entire tile, which dramatically reduces the accesses to main memory [106]. DR adds a hidden surface removal (HSR) phase to the pipeline just before the Early-Depth test. During the HSR phase, all the tile primitives are first rasterized only for position and depth, and the resultant fragments are Early-depth tested to setup the Depth Buffer. Once HSR is complete, the second pass processes the tile primitives as usual along the raster pipeline (they are read, rasterized and depth-tested again), except that this time the Early-depth test performs optimal occlusion culling.


Although the exact details of this technique in commercial systems are not fully disclosed, we have modeled in our framework an efficient implementation of it at the microarchitecture level, which is described in Subsection 4.3.1. In contrast to Z-prepass, DR does not perform the geometry processing twice. However, as can be seen in Figure 4.3, DR still has a non-negligible cost: either it introduces a barrier in the graphics pipeline, because the Fragment Processing stage cannot start until HSR has completely finished the tile (see (a), sequential DR), or significant extra hardware is required to perform the HSR of tile i+1 and the rendering of tile i in parallel (see (b), parallel DR). Further details are given in Section 4.3.1.


Figure 4.3: Graphics pipeline: (a) Sequential DR. (b) Parallel DR.

4.2 Visibility Rendering Order

To help maximize Early-depth test effectiveness, we propose to record the visibility order of the objects in a frame, assume the same order for the next frame, and then use it to influence the rendering order of the objects in the next frame. This is expected to work because images of consecutive frames normally show a significant degree of similarity, in order to produce a smooth transition between frames, so the ordering of objects in consecutive frames tends to be the same. To provide quantitative evidence of this, we have evaluated sequences of 50 frames of our benchmarks, and we have observed that the relative order of the objects in a frame closely matches the relative order in the previous frame. Note that if some object appears in a different order, some extra overshading may occur, but correctness is ensured in any case.

Unlike other approaches, our technique works for all kinds of scenes, either static or dynamic; it does not cause CPU-GPU synchronization issues, and it has no performance cost because it works in parallel with other stages of the pipeline. This section outlines our technique, and the next section provides hardware implementation details.

4.2.1 Overview

Figure 4.4 shows the changes to the graphics pipeline introduced by VRO, which are explained below: the Edge Inserter, the Visibility Sort unit and the Graph Buffer. Basically, VRO has two stages that operate on consecutive frames:


Figure 4.4: Graphics pipeline including VRO.

1. Creation of a Visibility Graph: During the rendering of frame N, the Early-depth test reveals depth precedence relationships between pairs of fragments covering the same pixel position, and hence among the corresponding objects. These relationships (edges) are used by the Edge Inserter unit to build a directed graph where objects are represented by nodes and the edges indicate which objects are in front of others. We will refer to this graph as the Visibility Graph, which is stored in the Graph Buffer (a sketch of the edge-capture logic is given after this list).

2. Creation of a rendering order: At the beginning of the rendering of frame N+1, in parallel with the execution of the Geometry Pipeline, the Visibility Sort unit sorts the Visibility Graph created during frame N to generate a depth-ordered list of nodes. We will refer to this list as the Visibility Rendering Order, and it is used by the Tile Scheduler to guide the rendering of the frame N+1.
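A minimal C++ sketch of this edge-capture logic is shown below (illustrative software structures; the hardware Edges Filter that removes redundant edges before insertion is omitted). It discards auto-cycles and keeps only the first direction seen for each pair of objects, as discussed later for parallel-cycles:

    #include <cstdint>
    #include <set>
    #include <utility>

    // The Visibility Graph as a set of directed edges (front, behind).
    struct VisibilityGraph {
        std::set<std::pair<uint16_t, uint16_t>> edges;

        void insertEdge(uint16_t front, uint16_t behind) {
            if (front == behind) return;               // discard auto-cycles
            if (edges.count({behind, front})) return;  // keep only the first
            edges.insert({front, behind});             // direction seen
        }
    };

    // Invoked with the outcome of each Early-depth comparison: 'incoming' is
    // the tested fragment's object id, 'stored' the object id annotated in
    // the Depth Buffer for that pixel.
    void onDepthTest(VisibilityGraph& g, uint16_t incoming, uint16_t stored,
                     bool incomingPasses) {
        if (incomingPasses) g.insertEdge(incoming, stored);  // incoming occludes
        else                g.insertEdge(stored, incoming);  // stored occludes
    }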

4.2.2 Graph Generation

Figure 4.5 shows an example of the Visibility Graph that is generated for a given frame of Gravity. Let us consider three different objects: the astronaut (object A), the jetpack (object B) and the big solar panel (object C), and let us assume that these objects were submitted and rendered in the order A, B, C. When object A is rendered (step 1), the Depth Test annotates in the corresponding areas of the Depth Buffer the depth of A's fragments as well as their identifier. Next, object B is rendered (step 2), and the results of the Depth Test show that, in some pixels, B is in front of A (edge (B, A)). As we explain later, the Depth Test sends this information to the Edges Filter, which filters out many redundant edges. Finally, object C is rendered (step 3) and the Depth Test reveals that A occludes C in some pixels and that B occludes C in some other pixels, so edges (A, C) and (B, C) are sent to the Edge Inserter through the Edges Filter.


Figure 4.5: Visibility Graph generation for the given scene.

4.2.3 Sort Algorithm

Once the graph is generated, it is sorted to create the Visibility Rendering Order (a front-to-back ordered list of object ids). Our approach is based on the well-known Topological Sort algorithm, first proposed by Kahn [156], which guarantees for directed acyclic graphs (DAGs) that an ordered list of nodes exists and generates it in linear time. Algorithm 1 outlines the basic algorithm, assuming for convenience that every node is tagged with its number of incoming edges (its in-degree). Nodes with no incoming edges are referred to as roots.


Figure 4.6: Sorting a Visibility Graph with Kahn's algorithm.

Let us illustrate how a Visibility Graph is sorted with Kahn's algorithm (see Figure 4.6). The algorithm maintains an array with the number of input edges of every node in the graph. As we said, the roots are pushed into RQ and the topological order is stored into L. Initially the graph contains one root, object B (step 1). Once object B has been sorted (step 2), the input edges of its children are updated (step 3), which turns object A into a root. Therefore, A can be sorted (step 4) and the input edges of its child C are conveniently updated. As soon as object C becomes a root (step 5), it can be sorted (step 6), producing a topological order for the given directed graph.

Algorithm 1 Kahn's algorithm.

    function Kahn(L, RQ)          ⊲ L: empty list that will contain the sorted nodes
                                  ⊲ RQ: queue with all initial root nodes of Graph
        while RQ is non-empty do
            remove head node n from RQ
            insert node n into list L
            for each child m of n do
                remove the edge from n to m
                decrease the in-degree of m
                if m is a root then
                    add m to tail of RQ
                end if
            end for
            remove node n from Graph
        end while
        if Graph has nodes then
            return error          ⊲ Graph has at least one cycle
        else
            return L              ⊲ a topologically sorted order
        end if
    end function

However, if the graph contains a cycle, this algorithm finishes with an error condition because at some point none of the graph nodes that remain to be sorted has zero in-degree. We found that such cycles are quite common; in fact, none of our benchmarks creates a DAG. To cope with this situation, we distinguish three kinds of cycles and apply a different solution in each case:

1. Auto-occlusions between parts of the same object (auto-cycles). They are removed by discarding their corresponding edges during the creation of the graph, in phase one of the sort algorithm. These edges can be ignored because VRO reorders at object granularity, and auto-occlusions just contain intra-object precedence relations.

2. Pairs of interlaced objects occluding each other (parallel-cycles). VRO eliminates the cycles created by pairs of interlaced objects by adding to the Visibility Graph only the first precedence relation found by the Depth Test between two objects (A, B), i.e., A occludes B. If a (B, A) relation is later found by the Depth Test, it is simply ignored and not added to the graph. This is also done in phase one of the algorithm.

3. Three or more objects alternately occluding one another (indirect-cycles). Since these cycles may be extremely costly to detect, we adopt a cost-effective approach that does not attempt to eliminate them from the graph. We rather extend Kahn's algorithm to side-step a cycle-induced wrong termination: whenever the RQ is empty and there are still

nodes to be sorted but none of them is a root, we select from the remaining graph nodes the first node in program rendering order among those with the minimum number of input edges. The node is then removed from the graph and added to the tail of RQ (a sketch of this extended sort is given after this list). This process is done in phase two of the sort algorithm. In our experiments, the heuristic that selects a node to avoid a cycle-induced wrong termination of Kahn's algorithm is executed around 13% of the time.
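The following C++ sketch (illustrative software data structures, not the hardware implementation) shows the extended Kahn's sort with the cycle side-stepping heuristic of point 3:

    #include <deque>
    #include <vector>

    // Nodes are object ids in program rendering order; adj[n] lists the nodes
    // occluded by n; inDeg[n] is n's current in-degree. When no root exists
    // but nodes remain (an indirect cycle), the first unsorted node in program
    // order with minimum in-degree is forced into the Roots Queue.
    std::vector<int> visibilitySort(std::vector<std::vector<int>> adj,
                                    std::vector<int> inDeg) {
        const int n = (int)adj.size();
        std::vector<bool> sorted(n, false);
        std::deque<int> rq;                  // Roots Queue (RQ)
        std::vector<int> order;              // Visibility Rendering Order (L)
        for (int v = 0; v < n; ++v)
            if (inDeg[v] == 0) rq.push_back(v);

        while ((int)order.size() < n) {
            if (rq.empty()) {                // cycle: force the first node in
                int best = -1;               // program order with min in-degree
                for (int v = 0; v < n; ++v)
                    if (!sorted[v] && (best < 0 || inDeg[v] < inDeg[best]))
                        best = v;
                rq.push_back(best);
            }
            int u = rq.front(); rq.pop_front();
            if (sorted[u]) continue;         // a forced node may reappear
            sorted[u] = true;
            order.push_back(u);
            for (int m : adj[u])             // update children's in-degrees
                if (!sorted[m] && --inDeg[m] == 0) rq.push_back(m);
        }
        return order;
    }

In VRO this sorting runs in parallel with the Geometry Pipeline of frame N+1, so the sketch only illustrates the ordering semantics, not the timing.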

4.2.4 Heuristics to Sort the Objects in a Scene

In this section we present different heuristics to handle the cycles that appear in the Visibility Graph of a given scene. One solution consists of simply creating a directed acyclic Visibility Graph and sorting it using the regular Kahn's algorithm. Another solution is to create a Visibility Graph with some cycles and break them appropriately when sorting the graph. We test different heuristics that generate different approximations to the optimal rendering order for a given scene.

Eliminating all the Cycles before Sort Time

Eliminating all the cycles of the Visibility Graph before sorting it allows us to employ the regular Kahn's algorithm to obtain a Visibility Rendering Order. The cycles are eliminated by removing a set of edges from the directed graph. Such a set of edges is often referred to as a feedback arc set. Furthermore, a feedback arc set with minimum weight is known as the minimum feedback arc set, whose computation is known to be NP-hard for different kinds of graphs [119, 175].

We implement a greedy algorithm that computes a feedback arc set that approximates the minimum feedback arc set, which allows us to eliminate all the cycles of the Visibility Graph before sorting it with Kahn's algorithm. In this algorithm every edge (A, A) is discarded in the Depth Test. Every edge (A, B) is annotated with the number of fragments by which object A occludes object B. The algorithm computes the set E of annotated edges that form the graph G(E), eliminates the cycles to create a DAG, and finally returns a topological order of the DAG. This is the algorithm employed to obtain the perfect-object-order shown in the introduction of this chapter. It works in two main phases (a sketch follows the enumeration):

1. The set E of annotated edges that form the graph G(E) is computed in the Depth Test.

2. The algorithm eliminates the edges of the set E that form cycles in the graph G(E):

(a) The greedy algorithm sorts the set of edges E in descending order of the number of occlusions and creates a list L. A new empty set of edges E' is created.

(b) For every edge e of L (in order):
    i. If the graph formed by the edges of E' plus the edge e remains acyclic, insert the edge e into the set E'.
    ii. Otherwise, the edge e is discarded.

(c) Once all the edges of the ordered list L have been processed, the resulting graph G'(E') is a DAG. This DAG is fed to Kahn's algorithm, which produces a topological order, that is, an ordered list of the objects of the scene.
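
A sketch of this greedy phase follows. The reachability-based acyclicity check and the container choices are illustrative; the key property is that adding an edge (src, dst) closes a cycle exactly when dst already reaches src in the partially built DAG.

#include <algorithm>
#include <vector>

struct Edge { int src, dst; long weight; };   // weight = #occluding fragments

// Returns true if `to` is reachable from `from` in the graph `adj`.
static bool reaches(const std::vector<std::vector<int>>& adj, int from, int to) {
    std::vector<int> stack{from};
    std::vector<bool> seen(adj.size(), false);
    while (!stack.empty()) {
        int n = stack.back(); stack.pop_back();
        if (n == to) return true;
        if (seen[n]) continue;
        seen[n] = true;
        for (int m : adj[n]) stack.push_back(m);
    }
    return false;
}

// Greedy approximation: keep the heaviest edges that do not close a cycle;
// the discarded edges form an (approximate minimum) feedback arc set.
std::vector<std::vector<int>> buildDag(std::vector<Edge> edges, int numNodes) {
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) { return a.weight > b.weight; });
    std::vector<std::vector<int>> dag(numNodes);   // E' as adjacency lists
    for (const Edge& e : edges)
        if (!reaches(dag, e.dst, e.src))           // adding e keeps G' acyclic
            dag[e.src].push_back(e.dst);
    return dag;                                    // feed to Kahn's algorithm
}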

Eliminating some Cycles at Sort Time

We have tested different schemes that create a Visibility Rendering Order. In all the schemes tested, auto-cycles are eliminated by discarding, in the Depth Test, the edges whose source and target are the same object. Parallel-cycles are handled in two different ways:

1. One can attach a counter to the edges and count the number of times A occludes B, and vice versa. Once the scene has been rendered, we are able to break each parallel cycle by eliminating its edge with the minimum value, that is, we keep the precedence relation that avoids the most overshading. This scheme is referred to as C, where C means "counter". We tested two counter sizes: 32 and 16 bits.

2. On the other hand, one can just add to the Visibility Graph the first edge that arrives at the Edge Inserter. The schemes implementing this option are referred to as PO, where PO means "program order". Obviously, this approach is simpler than the former.

Regarding the rest of the cycles in the graph, they are not eliminated but handled by a heuristic which selects the next node to visit when none of the remaining nodes is a root (i.e., has no incoming edges). Two different heuristics have been studied:

1. H1: This heuristic selects the next node in program rendering order among the nodes with minimum in-degree. We observe that H1 results in less overshading in general. Of course, the effectiveness of this approach depends not only on the in-degree but also on the occluded area, which is the real source of the overshading.

2. H2: This heuristic selects the node with minimum average depth. We tested different versions of H2 that select the node with minimum maximum-depth and with minimum minimum-depth. Both alternatives produce almost identical results.

We also test H1 and H2 alone, without the "counter" or the "program order" mechanisms, so the Visibility Graph is traversed as it is produced in the Depth Test. The different versions of VRO are summarized in Table 4.1 and evaluated in subsection 4.5.2. Note that the combination PO_H1 is the one described in subsection 4.2.3, and it is assumed in all our experiments unless stated otherwise.

Our main goal is to improve performance while still reducing energy consumption on a mobile GPU. VRO achieves both goals by improving the effectiveness of the Early-depth test to cull hidden surfaces before they reach the Fragment Processing stage, so that the number of shader instructions executed is reduced. We have tested several heuristics to handle the cycles of the Visibility Graph that approximate this goal with different performance/energy trade-offs.

Table 4.1: VRO alternatives.

Version   Par. Cycles     Heuristic
C_H1      counter         min in-degree
C_H2      counter         min average depth
PO_H1     program order   min in-degree
PO_H2     program order   min average depth
MinZ      -               min average depth

The suitability of one or another heuristic on a given hardware platform will greatly depend on design issues. Bear in mind that, regardless of the approximation, image correctness is guaranteed in any case by the Depth Test. Our choice here is a heuristic that implies a cost-effective implementation of VRO and clearly illustrates its feasibility and effectiveness.

4.2.5 Partial Order of Objects

The Depth Buffer only stores the depth of one fragment at every pixel position. Thus, when a new fragment is tested, the comparison is performed between the new fragment and the one visible so far, so there may be objects whose fragments are never compared against each other. Therefore, these comparisons provide just a partial order of the objects. Hence, one may wonder whether the missing node relationships may lead to building a wrong Visibility Graph. To answer this question, note that the relative rendering order of two objects is only relevant for visibility purposes if they overlap in some region, and that region is visible in at least one pixel.


Figure 4.7: Two example cases where object B sits in front of A and C. The shaded region highlights the overlap between A and C.

It is easy to prove that if two nodes are not connected, then either they do not overlap at all, or their overlapping region is not visible, i.e., the missing relationship is not relevant in terms of overshading. Figure 4.7 illustrates this property with two examples. In both cases the rendering order is A, B, C. In case (a), nodes A and C are not connected and therefore their overlapping area is not visible, so the different possible rendering orders between A and C do not produce a different amount of overshading. In case (b), the overlap between A and C is partially visible and therefore these nodes are connected in the graph. That is, the partial order represented in the Visibility Graph contains all the precedence relations necessary to create an order where visible objects are scheduled before the ones that they occlude.

4.2.6 Visibility Rendering Order Adjustments

The Visibility Rendering Order that the Tile Scheduler receives from the Visibility Sort unit contains the object-ids of the objects rendered in the previous frame, which may differ slightly from the objects to be rendered in the current frame. On the one hand, objects rendered in the previous frame but absent from the current one are present in the Visibility Graph but must not be scheduled, so they are simply discarded by the Tile Scheduler in a sequential pass. On the other hand, objects not present in the Visibility Graph but present in the new frame must be scheduled, so they are put in the list after the objects in the graph.

Note that objects with the Depth Test disabled or with Blending enabled cannot simply be put at the end of the order list, because doing so could produce erroneous images. These objects must be scheduled in the same relative order as they appear in the program rendering order. VRO respects the OpenGL standard in the sense that the result is the same as if objects were processed in program rendering order, so these constraints are taken into account when creating the final rendering order (a sketch of these adjustments follows). Fortunately, objects with Blending enabled or with the Depth Test disabled are commonly part of the GUI of the applications and tend to be the last objects in the program rendering order, so in practice they introduce minor constraints to the Visibility Rendering Order.
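
The following sketch illustrates one possible software rendition of these adjustments. Placing all constrained objects after the reorderable ones is an assumption that matches the observation above about GUI objects; the actual hardware policy may interleave them differently.

#include <unordered_set>
#include <vector>

struct DrawObject { int id; bool blending; bool depthTestDisabled; };

// Sketch of the Tile Scheduler adjustments: stale objects are dropped,
// objects new this frame are appended, and objects with Blending enabled
// or the Depth Test disabled stay in program rendering order.
std::vector<int> adjustOrder(const std::vector<int>& prevVisibilityOrder,
                             const std::vector<DrawObject>& currentFrame) {
    std::unordered_set<int> reorderable, constrained;
    for (const DrawObject& o : currentFrame)
        (o.blending || o.depthTestDisabled ? constrained : reorderable)
            .insert(o.id);

    std::vector<int> order;
    std::unordered_set<int> placed;
    for (int id : prevVisibilityOrder)            // discard stale objects
        if (reorderable.count(id)) { order.push_back(id); placed.insert(id); }
    for (const DrawObject& o : currentFrame)      // append objects new this frame
        if (reorderable.count(o.id) && !placed.count(o.id))
            order.push_back(o.id);
    for (const DrawObject& o : currentFrame)      // constrained: program order
        if (constrained.count(o.id)) order.push_back(o.id);
    return order;
}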

4.3 Microarchitecture

This Section describes the implementation details of our technique on a contemporary GPU. Next, we describe the extensions to the baseline architecture that are required to support Deferred Rendering (DR) and our technique (VRO). DR and TBR will be used for comparison purposes in our experiments.

4.3.1 Deferred Rendering TBR GPU

Deferred Rendering (DR) is the state-of-the-art technique regarding overshading reduction, so we decided to model it for comparison purposes. As outlined in Section 4.1, DR reduces overshading by first performing Hidden Surface Removal (HSR), which computes the final state of the Depth Buffer for a given tile. Thereafter, it starts an ordinary rendering of the tile. Hence, given that the Depth Buffer already contains the depths of the visible surfaces, the Early-depth test is able to discard all the occluded fragments and achieve minimum overshading.


Figure 4.8: Raster Pipeline of a TBR GPU implementing Deferred Rendering.

A DR technique has been commercially implemented by Imagination Technologies in their tile-based PowerVR GPU family [91], which they refer to as TBDR. However, since only partial information about this technique has been disclosed, our Deferred Rendering implementation models what we believe is the most optimistic interpretation of this partial information, in order to be used in the comparisons with our proposal.

As previously shown in Figure 4.3, we developed two implementations of DR: sequential DR (a) and parallel DR (b). Sequential DR is a naïve implementation that stalls the rest of the Raster Pipeline while performing HSR. The sequential implementation badly hurts both performance and energy compared with the baseline GPU: execution time increases for every one of the benchmarks tested, by 23% on average, and energy consumption increases around 6% on average. These huge overheads are due to the fact that the total time of the HSR stage (depth-only rasterization plus depth test) greatly exceeds the savings provided by the overshading reduction.

Nevertheless, these huge overheads can be removed by performing the HSR stage in parallel with the other stages of the Raster Pipeline (see Figure 4.8). Thus, in this optimized scheme (parallel DR), while HSR is being executed for tile i+1, the rest of the Raster Pipeline renders tile i in parallel. Obviously, this parallel implementation introduces a hardware cost, as some hardware blocks, such as the Rasterizer, the Early-depth test and the Depth Buffer, need to be replicated. Furthermore, the Tile Scheduler is equipped to handle memory requests of two primitives in parallel: one primitive from the tile being rendered and the other one from the tile in the HSR stage. This does not mean that the Tile Cache has two read ports; rather, the Tile Scheduler arbitrates between both request queues and only one request is sent to the Tile Cache each cycle, in a round-robin fashion. Even though this parallel implementation of DR introduces a non-negligible amount of extra hardware (6% area overhead w.r.t. the baseline GPU), it outperforms sequential DR in both performance and energy, so it is the one we use in the results section to be compared against our technique, VRO.

4.3.2 Visibility Rendering Order TBR GPU


Figure 4.9: Raster Pipeline of a TBR GPU implementing VRO.

As Figure 4.9 shows, our technique includes several new pieces of hardware: the Edge Inserter unit, the Edge Filter, the Graph Cache and the Visibility Sort unit. As usual, the control of the new hardware has been implemented using FSMs.

On the one hand, the Early-depth unit sends the edges to the Edge Inserter unit, which stores them in the Graph Buffer. The Graph Buffer contains the Visibility Graph of the currently rendered frame. It is held in main memory and accessed through the Graph Cache. Edge insertions take place at fragment granularity using the results of the Early-depth comparisons. However, since graph edges represent object pairs, a large number of tests actually produce the same edges. The Edge Filter is a small and fast associative on-chip structure that caches the most recently inserted edges and filters out 99.9% of the redundant insertions to the Graph Buffer, thus avoiding Graph Cache accesses. Thanks to this structure, the Edge Inserter unit accesses the Graph Cache, on average, much less than once every thousand fragments.
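
A functional sketch of the Edge Filter behavior is shown below. The fully-associative LRU organization and the 32-entry size follow Table 4.2; the packed-key encoding is an illustrative assumption.

#include <algorithm>
#include <cstdint>
#include <vector>

// Sketch of the Edge Filter: a tiny fully-associative LRU structure that
// drops edge insertions seen recently, so most redundant fragment-level
// results never reach the Graph Cache.
class EdgeFilter {
    std::vector<uint32_t> entries_;            // MRU element at the back
public:
    // Returns true if the edge (src, dst) must be forwarded to the
    // Edge Inserter, false if it was filtered out.
    bool shouldInsert(uint16_t src, uint16_t dst) {
        uint32_t key = (uint32_t(src) << 16) | dst;
        auto it = std::find(entries_.begin(), entries_.end(), key);
        if (it != entries_.end()) {            // hit: refresh LRU position
            entries_.erase(it);
            entries_.push_back(key);
            return false;
        }
        if (entries_.size() == 32)             // miss: evict the LRU entry
            entries_.erase(entries_.begin());
        entries_.push_back(key);
        return true;
    }
};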

On the other hand, the Visibility Sort unit sorts the Visibility Graph and creates a preliminary ordered list of nodes, which is sent to the Tile Scheduler. This process is performed in parallel with the Geometry Pipeline execution of the following frame. We have measured that, even in the worst case (Forest 2), the total time required by this process is almost two orders of magnitude smaller than the execution time of the Geometry Pipeline, so only tiny execution-time overheads are introduced.

The Tile Engine traverses the list that contains the Visibility Rendering Order and updates it. Objects present in the previous frame but not in the current frame are simply discarded by the Tile Scheduler. Objects not present in the previous frame but present in the current one must be scheduled, so they are added to the Visibility Rendering Order. After the adequate adjustments to satisfy the restrictions presented in subsection 4.2.6, the Tile Scheduler produces the final Visibility Rendering Order.

The mechanism of fetching the primitives from main memory is as follows. For a given tile, the Tiling Engine maintains a list (Primitive List) of pointers to the memory positions in the Parameter Buffer where the primitives of the tile are stored.

The Tile Scheduler reads the pointers of the Primitive List and then fetches the associated primitive data through the Tile Cache. In VRO (see Figure 4.10), the Tiling Engine includes another list (First Primitive List) that holds, for each object, the position in the Primitive List that points to the first primitive of the object in the given tile, as well as the number of primitives of that object in the tile. The mechanism of fetching the primitives in VRO is as follows. For every Object-ID in the Visibility Rendering Order (1) (e.g., object D), VRO reads from the First Primitive List (2) the position of the Primitive List (3) that corresponds to this object (e.g., position 6 for object D). This position holds the pointer to the first primitive of the object. Then, VRO fetches the data of the primitive from the Parameter Buffer (4). Given that the Primitive List is located in consecutive positions of main memory, VRO reads the subsequent pointers to primitives from the Primitive List in the same manner as the baseline does.
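
The fetch mechanism can be sketched as follows; the types and the fetchPrimitive callback are illustrative stand-ins for the Tile Cache requests.

#include <cstdint>
#include <vector>

struct FirstPrimEntry { uint32_t plIndex; uint32_t numPrims; };

// Sketch of VRO's geometry fetch for one tile (Figure 4.10). For every
// object-id in the Visibility Rendering Order, the First Primitive List
// gives the range of Primitive List slots for that object in this tile;
// each slot holds a pointer into the Parameter Buffer.
template <typename FetchFn>
void fetchTileGeometry(const std::vector<uint16_t>& visibilityOrder,
                       const std::vector<FirstPrimEntry>& firstPrimList,
                       const std::vector<const void*>& primitiveList,
                       FetchFn fetchPrimitive) {
    for (uint16_t objectId : visibilityOrder) {            // e.g. D, B, A, C
        const FirstPrimEntry& e = firstPrimList[objectId];
        for (uint32_t i = 0; i < e.numPrims; ++i)
            fetchPrimitive(primitiveList[e.plIndex + i]);  // Parameter Buffer read
    }
}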


Figure 4.10: Detail of Tile Engine structures involved in Geometry Fetching.

As in the baseline GPU, once the Geometry Pipeline has been executed, the Raster Pipeline renders the frame tile by tile. However, instead of reading the primitives in program rendering order, the Tile Scheduler reads them in Visibility Rendering Order. VRO increases the culling effectiveness of the Early-depth test and reduces overshading, which decreases the total number of instructions executed and texture accesses performed in the Fragment Processors. In order to make a fair comparison between VRO and DR, the Tile Scheduler of VRO is equipped with the same hardware that is included in DR to handle memory requests of two primitives in parallel.

Graph Buffer

The Graph Buffer is a small array in system memory where the Visibility Graph is stored. Since the Visibility Graph is very sparse, it is represented as a set of adjacency lists, one per node. Each adjacency list is implemented as a linked list of one or more entries. Each entry contains the object-ids of up to W children nodes as well as other metadata shown in Figure 4.11.

Figure 4.11: Detail of an entry of the Graph Buffer. Each entry is 512 bits (including 11 bits of padding) and contains the following fields: V (1 b, valid bit), InD (13 b, in-degree), Length (6 b, size of the sublist), Node0 to Node35 (13 b each, the elements of the sublist), and Next (13 b, address of the next entry).

VRO performs operations like membership testing and insertion in one step whenever the adjacency list contains fewer than W children nodes. Furthermore, our scheme is not constrained to a maximum of W outgoing edges per node: the adjacency lists are extended dynamically to any number of edges by allocating one or more extra overflow entries of the Graph Buffer when required. Primary lists are allocated sequentially from the lowest addresses of the buffer onwards, and overflow lists are allocated from the highest addresses backwards. A buffer with N entries can store up to N nodes if none of their lists overflow but, most importantly, it can contain lists with a theoretically unlimited number of edges per node.
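
The entry layout of Figure 4.11 roughly corresponds to the following structure; the exact bit packing in C++ is illustrative, since compilers do not guarantee the 512-bit layout the hardware uses.

#include <cstdint>

// Sketch of one 512-bit Graph Buffer entry (Figure 4.11), with W = 36
// children per entry. Per the thesis fields:
// 1 (V) + 13 (InD) + 6 (Length) + 36*13 (nodes) + 13 (Next) + 11 (pad) = 512 bits.
struct GraphBufferEntry {
    uint16_t valid  : 1;    // V: entry in use
    uint16_t inDeg  : 13;   // InD: in-degree of this node
    uint16_t length : 6;    // Length: children stored in this sublist
    uint16_t node[36];      // 13 significant bits each: children object-ids
    uint16_t next   : 13;   // Next: address of the overflow entry (primary
                            // lists grow from low addresses, overflow lists
                            // from high addresses backwards)
};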

Of course, there is a limitation imposed by the size of the memory region devoted to the Graph Buffer. However, we show that the Graph Buffer occupies only a small region of main memory. For example, with 8192 entries and 64 B per entry the Graph Buffer takes 512 KB. Figure 4.12 plots the amount of main memory to be allocated to the Graph Buffer for different maximum numbers of objects (nodes) and different numbers of children nodes per entry of the buffer (W). Note that even for a number of objects three orders of magnitude higher than the average observed in our set of benchmarks, the memory region devoted to the Graph Buffer would be smaller than 11 MB. Although all our benchmarks have less than 256 nodes (see the top part of Figure 4.13), we provision for a much larger number of objects, so the graph has been sized to 8192 entries, which is more than enough to support common mobile workloads. Note that an object corresponds to a 3D model composed of several primitives, not to a single primitive.

There is a clear trade-off in the implementation of the Graph Buffer structure: the smaller the number of children nodes in one entry of the buffer, the smaller the total size of the Graph Buffer, but the higher the number of cycles needed to read the whole adjacency list of a node if it contains more than W children nodes. Figure 4.13 (bottom) plots the 75th, 85th and 95th percentiles of the largest adjacency list sizes, and it shows that most of them contain a small number of nodes (objects). For example, in the case of 300, the values for P75, P85 and P95 mean that 75%, 85% and 95% of the lists contain less than 12, 17 and 27 children nodes, respectively.

In the worst case, around 95% of the adjacency lists of the Visibility Graphs of our benchmarks have 35 or fewer edges per node. Hence, we allocate 36 edges per entry (W = 36); in this way, the size of one entry is slightly smaller than 64 B and fits into a single cache block. Accounting for 8192 entries and 64 B per entry, the total size of the Graph Buffer in main memory is 512 KB. However, nothing impedes reserving more main memory to provision for a larger number of objects. In any case, in order to reduce main memory traffic and latency, the Graph Buffer is accessed through the Graph Cache, which is 4 KB, 4-way associative (92.6% hit ratio on average).


Figure 4.12: Size of the Graph Buffer for different numbers of children nodes per entry (W = 32, 36 and 64) and different maximum numbers of objects (from 8192 to 131072).

Edge Inserter

The Edge Inserter is the unit responsible for creating the Visibility Graph. It works in parallel with the other stages of the Raster Pipeline. It not only creates the Visibility Graph, but also computes the in-degree of each node (see the InD field in Figure 4.11), which will later be used by the Visibility Sort unit. The Edge Inserter (see Figure 4.14) receives through a queue the edges from the Early-depth test that were not filtered out by the Edge Filter. The insertion of an edge (Nsrc, Ndst) into the Visibility Graph is a three-step process (see Figure 4.14; a functional sketch follows the enumeration):

1. The primary entry of node Nsrc is read from the Graph Buffer to the AdjacencyList-Reg.

2. Then the AdjacencyList-Reg is searched to check if the edge was previously inserted in the Visibility Graph. If the primary list of node Nsrc has no linked entries, this can be done in a single clock cycle with an array of equality comparators. Otherwise, one or more overflow entries may be subsequently read and copied to the AdjacencyList-Reg until the edge is found or the last entry is read.

(a) If the edge already exists, it is discarded.

(b) Otherwise, the Length field is increased and the edge is added to the adjacency list. However, if the entry is full, a free overflow entry is first assigned to node Nsrc, and its address is stored into the Next field and written back to the Graph Buffer.

3. Finally, if a new edge has been added to the graph, the in-degree of the target node Ndst is increased.
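
Functionally, the three steps amount to the following sketch, where a resizable list per node stands in for the primary Graph Buffer entry plus its chained overflow entries.

#include <cstdint>
#include <unordered_map>
#include <vector>

// Functional sketch of the Edge Inserter's three-step insertion of an edge
// (Nsrc, Ndst). The hardware walks fixed-size Graph Buffer entries through
// the Graph Cache; here an std::vector per node models the whole list.
struct VisibilityGraph {
    std::unordered_map<uint16_t, std::vector<uint16_t>> adjacency;  // children
    std::unordered_map<uint16_t, uint32_t> inDegree;

    void insertEdge(uint16_t src, uint16_t dst) {
        std::vector<uint16_t>& list = adjacency[src];   // 1. read Nsrc's list
        for (uint16_t child : list)                     // 2. membership check
            if (child == dst) return;                   //    (a) already there
        list.push_back(dst);                            //    (b) append edge
        ++inDegree[dst];                                // 3. bump Ndst in-degree
    }
};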



Figure 4.13: (Top) Nodes per frame, edges per frame and maximum number of nodes in an adjacency list. (Bottom) 75th, 85th and 95th percentiles of the size of the adjacency-lists of the scene graphs analyzed.

Visibility Sort

The Visibility Sort unit is responsible for sorting the Visibility Graph. As outlined above, it implements an extended version of Kahn's algorithm that is able to handle cycles. The unit works in two phases (see Figure 4.15):

1. Initial search. The primary entries of all the nodes in the Visibility Graph are read from the Graph Buffer. If the node is a root (InD = 0), its object-id is pushed into the Roots Queue.

2. Iterative procedure. The object-ids stored in the Roots Queue are iteratively processed. For each node in the queue:

(a) The adjacency list of the node is read from the Graph Buffer into the AdjacencyList-Reg. At the same time, the object-id is sent to the Tile Scheduler through the Order Queue.


Figure 4.14: Edge Insertion Hardware.


Figure 4.15: Visibility Sort Hardware. (a) Initial search (b) Iterative procedure


(b) For each child in the AdjacencyList-Reg, the in-degree is read from the Graph Buffer and, if it is still a valid node, it is decremented and written back. If the in-degree of a child becomes zero (it becomes a root), it is pushed into the Roots Queue. In the case of an adjacency list with overflow entries, these are read in turn and processed in the same way.

(c) The node is invalidated.

If the Roots Queue becomes empty and all the graph nodes have been sorted, the algorithm finishes. Otherwise, a cycle has been found, so the unit traverses the nodes in program order and selects the one with minimum in-degree (min-comparator not shown in Figure 4.15 for the sake of clarity), pushes it into the Roots Queue, and then resumes the iterative procedure. Note that such a node still has incoming edges remaining in the adjacency lists of its ancestors. However, since the node is invalidated after being processed, its in-degree will never be decremented again.

Identification of Objects

Object identifiers are required by the Visibility hardware unit to identify the objects across different frames. We need, therefore, to maintain the object identifier of every object along the graphics pipeline up to the Early-depth test (which is extended with object-ids) and the VRO unit.

A simple way to do this is to include an object identifier in every draw command of an object. This could be done using the debug marker extension of OpenGL [191], implemented in OpenGL ES 1.1 and 2.0. This extension allows the programmer of an application to annotate the OpenGL command stream with a descriptive text marker. It relies on the driver and the hardware to maintain the object notion through the rest of the graphics pipeline. Note that current 3D applications already uniquely identify the objects of the scene [144], so the requirement here is simply to pass this information from the application layer to the GPU.
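
As an illustration, a draw call could be annotated as follows. The "OBJ:<id>" marker convention and the driver-side parsing of it are assumptions of ours; EXT_debug_marker only defines the annotation call itself.

// Hypothetical sketch of tagging draws with object ids through the
// EXT_debug_marker extension of OpenGL ES.
#define GL_GLEXT_PROTOTYPES 1
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>
#include <cstdio>

void drawObjectTagged(GLuint objectId, GLenum mode, GLint first, GLsizei count) {
    char marker[16];
    int len = std::snprintf(marker, sizeof(marker), "OBJ:%u", objectId);
    glInsertEventMarkerEXT(len, marker);  // annotate the command stream
    glDrawArrays(mode, first, count);     // this draw is now tagged
}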

4.4 Experimental Framework

In our experiments we use the Teapot simulation framework [111]. We model not only the baseline GPU architecture, which closely resembles the Utgard microarchitecture of the ARM Mali [89], but also the Deferred Rendering and Visibility Rendering Order techniques on a TBR GPU architecture. The ARM Mali Utgard microarchitecture is the most successful mobile GPU to date, with around 19.1% of the mobile GPU market share as of March 2017 [11], while TBR GPUs represent around 95% of the mobile GPU market. Although in this work we employ a TBR GPU architecture, note that VRO is orthogonal to the TBR mode, and its implementation on top of an IMR GPU would also increase the performance and reduce the energy consumption of the GPU, as we show in Appendix A.

Table 4.2: GPU Simulation Parameters.

Baseline GPU
  Tech Specs: 400 MHz, 1 V, 32 nm
  Screen Resolution: 1200x768
  Tile Size: 16x16
  Queues
    Vertex (2x): 16 entries, 136 bytes/entry
    Triangle, Tile: 16 entries, 388 bytes/entry
    Fragment: 64 entries, 233 bytes/entry
  Caches
    Vertex Cache: 64 bytes/line, 2-way associative, 4 KB, 1 bank, 1 cycle
    Texture Caches (4x): 64 bytes/line, 2-way associative, 8 KB, 1 bank, 1 cycle
    Tile Cache: 64 bytes/line, 8-way associative, 128 KB, 8 banks, 1 cycle
    L2 Cache: 64 bytes/line, 8-way associative, 256 KB, 8 banks, 2 cycles
    Color Buffer: 64 bytes/line, 1-way associative, 1 KB, 1 bank, 1 cycle
    Depth Buffer: 64 bytes/line, 1-way associative, 1 KB, 1 bank, 1 cycle
  Non-programmable stages
    Primitive Assembly: 1 triangle/cycle
    Rasterizer: 4 attributes/cycle
    Early Z test: 32 in-flight quad-fragments, 1 Depth Buffer
  Programmable stages
    Vertex Processor: 1 vertex processor
    Fragment Processor: 4 fragment processors
  Main Memory
    Latency: 50-100 cycles
    Bandwidth: 4 bytes/cycle (dual channel)

Extra Hardware, VRO GPU
  Edges Filter: 32 elements, LRU, 1 cycle
  Graph Cache: 64 bytes/line, 4-way associative, 4 KB, 1 bank, 1 cycle
  Edge Insertion: 1 Edge Inserter unit
  Graph Sort: 1 Visibility Sort unit
  Edges Queue: 64 entries, 4 bytes/entry
  Order Queue: 64 entries, 2 bytes/entry

Extra Hardware, DR HSR stage
  Tile Queue: 16 entries, 388 bytes/entry
  Fragment Queue: 64 entries, 233 bytes/entry
  Rasterizer: 4 attributes/cycle
  Early Z test: 32 in-flight quad-fragments, 1 Depth Buffer
  Depth Buffer: 64 bytes/line, 1-way associative, 1 KB, 1 bank, 1 cycle


4.4.1 GPU Simulation

Teapot [111] is a mobile GPU simulation infrastructure that can run unmodified commercial Android applications. It includes an OpenGL commands interceptor, a GPU trace generator and a cycle-accurate timing simulator. The parameters used in the simulations are shown in Table 4.2.

While a graphical application executes in the Android emulator [87], a trace of its OpenGL commands is stored. This trace is later fed to the GPU trace generator, which creates the GPU trace through the software renderer (Softpipe) included in Gallium3D [88]. The generated GPU trace file includes the Vertex Processor and Fragment Processor instructions, the memory addresses of the texture and vertex data, the primitives generated and the corresponding fragments, as well as other pipeline data required to simulate the execution. The GPU trace is fed to the cycle-accurate timing simulator, which accurately models the baseline GPU. This simulator has been extended to implement both the DR and VRO GPUs as described in Section 4.3.

The results reported include the static and dynamic energy consumption of the whole GPU, including RTL models of Edge Insertion and Visibility Sort, as well as the full memory hierarchy including main memory. Teapot models the power of the GPU with McPAT [163]. Likewise, the power of the VRO unit has been modeled using McPAT components (shown in parentheses in the following list): Graph Cache (Cache); EQ Comparators (XOR); muxes (MUX); min-comparator (ALU); adders (ALU); subtractors (ALU); and registers. The area overhead of VRO is less than 1%, whereas for DR it is around 6% (with respect to the baseline TBR in both cases).

4.5 Experimental Results

In this section we present the performance and energy savings of VRO with respect to the baseline TBR GPU. Furthermore, the benefits of VRO are compared with those of DR.

4.5.1 Effectiveness of VRO

Figure 4.16 shows the normalized speed-up achieved by our technique (VRO) and by DR relative to the ARM Mali-like baseline TBR GPU. As can be observed, VRO achieves up to 1.42x speed-up (Forest 2) and 1.27x on average, with the lowest speed-up being 1.14x (Captain America). DR achieves up to 1.25x speed-up (Striker) and 1.17x on average, with the lowest speed-up being 1.13x (Gravity). Regarding system energy (see Figure 4.17), VRO reduces consumption down to 0.76x (Forest 2) and to 0.84x on average, with the smallest reduction being 0.91x (Captain America). DR reduces it down to 0.82x (Forest 2) and to 0.88x on average, with the smallest reduction being 0.96x (300). Recall that for sequential DR (not included in the graph), the execution time increases for every one of the benchmarks tested, 23% on average, while the energy consumption increases around 6% on average compared with the baseline GPU.

The performance advantage of VRO with respect to DR is the result of several factors. On the one hand, DR may reduce overshading more than VRO because it works at pixel granularity, whereas VRO reorders the geometry at object granularity. The extra fragments processed by VRO may induce pipeline stalls and hurt performance, but only if they fill the queue that feeds the Fragment Processors.


Figure 4.16: Speed-up of DR and VRO normalized to the baseline TBR GPU.


Figure 4.17: Energy consumption of DR and VRO normalized to the baseline TBR GPU.

On the other hand, the Tile Scheduler of DR reads in parallel the primitives of tile i+1 for the HSR unit and the primitives of tile i for the conventional Raster Pipeline, which may increase latency and starve the Fragment Processors. For an equally sized Tile Cache bandwidth, DR produces substantially more accesses to the cache (which may degrade throughput) and has a larger working set (which may degrade miss rate and latency). To show the relative importance of these factors, we have measured both the overshading and the average fetch time from the Tile Cache.

Figure 4.18 plots the overshading of DR and VRO normalized to the overshading of the baseline GPU. As expected, the overshading with DR (close to 0.7x) is smaller than the overshading with VRO (close to 0.81x).


Figure 4.18: Overshading of DR and VRO normalized to the overshading of the baseline TBR GPU.

The smaller overshading reduction of VRO is mainly caused by the fact that DR performs the HSR stage at pixel-level granularity, while VRO performs the sorting at object-level granularity.

Figure 4.19 plots the average fetch time per primitive, in cycles, for DR and VRO. It shows substantial fetch-time increases for DR with respect to VRO: on average, the fetch time for DR is 68 cycles, while it is only 50 cycles for VRO (25% less). Furthermore, the number of primary misses in the Tile Cache is 11% higher for DR than for VRO.

Figure 4.20 shows the memory traffic of VRO and DR normalized to the baseline GPU. As can be seen, the normalized traffic of VRO and DR is 0.98x and 0.96x, respectively. On the one hand, because of its lower overshading, DR saves more texture traffic than VRO. But on the other hand, DR must read the primitives twice to execute the HSR phase, which increases main memory traffic. Regarding the extra accesses of VRO to the Visibility Graph, they add a tiny 0.005% or less to the total memory traffic.

Figure 4.21 compares the relative importance of the above two factors and explains why VRO outperforms DR. It plots the absolute time difference in cycles of DR with respect to VRO for different parameters: total number of cycles to fetch primitives (first bar), total number of pipeline stall cycles caused by the Fragment Processor input queue being full (second bar), and total execution time (third bar). The first bar shows that DR spends many more cycles than VRO fetching primitives, more than 27 million cycles on average, which is caused by the higher latencies reported in Figure 4.19. The second bar shows that DR experiences fewer stall cycles caused by busy Fragment Processors, about 2.85 million cycles less than VRO. This is related to the better overshading reduction of DR reported in Figure 4.18. Note, however, that not all the extra fragments of VRO cause a pipeline stall, but only those that fill the queue that feeds the Fragment Processors.


Figure 4.19: Number of cycles to read a primitive with DR and VRO.


Figure 4.20: Normalized memory traffic of DR and VRO with respect to the baseline GPU.

The third bar is not just the sum of the other two factors: not all the extra fetch cycles incurred by DR ultimately translate into net increases of the execution time, because the buffers between the Raster Pipeline stages partially smooth the effect of the initial fetching overheads. The increase in fetching time ultimately translates into an increase in execution time of around 21.6 million cycles on average, as expected from the large difference between the other two bars. This explains the speed-ups reported in Figure 4.16.

In benchmarks such as 300, Forest 2 and Gravity, the extra fetching cycles are largely translated into extra execution time of the Raster Pipeline. This is because these benchmarks have around one order of magnitude more primitives than the other ones, which means that the overhead of reading the geometry relative to the total time of the Raster Pipeline is greater than in the other benchmarks.


Figure 4.21: Increment of cycles reading geometry (first bar), increment of stall cycles caused by the Fragment Processing stage (second bar), and increment of cycles of execution of the Raster Pipeline (third bar) all using DR with respect to VRO.

Furthermore, these benchmarks have fewer fragments per primitive than the others. The smaller the number of fragments per primitive, the faster the queue that feeds the Rasterizer empties. In the case of Captain America, the initial increase in fetching cycles is hardly reflected as an overhead in the total processing time of the Raster Pipeline: unlike 300, Forest 2 and Gravity, Captain America has a much lower number of primitives and a greater number of fragments per primitive.

Figure 4.22 shows the normalized energy breakdown for both DR and VRO. The energy advantage of VRO with respect to DR is the result of two main factors. The most important one is the greater speed-up of VRO (0.1x greater than DR's on average), which translates into smaller static energy for the main-memory/GPU system. Although DR reduces overshading more than VRO, and therefore reduces the dynamic energy of the Fragment Processors and the main memory among others, VRO is able to compensate for it with a significant reduction in static energy. On the other hand, DR exhibits greater raster activity than VRO because it performs the HSR stage to cull hidden fragments, which requires reading and rasterizing the primitives of the scene twice (the first pass rasterizing only position).

In conclusion, we have shown that even though VRO reduces overshading less than DR, it achieves a higher speed-up, because the fetch-cycle overhead of DR is much higher than the overhead caused by the extra fragment processing of VRO. Moreover, take into account that the area overhead of VRO is less than 1%, whereas the area overhead of DR is around 6%; some of this area could therefore be used to implement more complex VRO schemes that further reduce overshading. Overshading can be divided into two types: intra-object and inter-object overshading. The former is produced by auto-occlusions of an object; the latter is caused by occlusions between different objects. Given that VRO sorts at object-level granularity, it only reduces inter-object overshading. However, as we show in Section 1.4.1, there are techniques complementary to VRO that effectively reduce intra-object overshading.


Figure 4.22: Energy breakdown for the Main-Memory/GPU system with DR (left) and VRO (right), both normalized to the baseline GPU.

Hence, we believe that VRO still has room to improve performance and energy savings by combining it with one of those techniques. Likewise, in Section 6.2 we introduce a possible extension to VRO that aims at reducing intra-object overshading.

4.5.2 Overshading with Different Heuristics to Break Graph Cycles

This subsection analyzes the overshading reduction effectiveness of the different heuristics to break graph cycles that were introduced in subsection 4.2.4. The main goal of VRO is to reduce overshading by culling hidden surfaces before they reach the Fragment Processing stage, so that the number of instructions executed in the Fragment Processors is reduced. As can be seen in Figure 4.23, C_H1 obtains almost the same overshading reduction as Perfect Object Order (obtained with our greedy algorithm). Comparing both alternatives of H1 with both alternatives of H2: when a graph cycle is found, H2 needs to find the object with minimum average depth, so it must compute the average depth of all objects, whereas H1 only needs to find the node with minimum in-degree. In short, H1 not only performs better than H2 but also has a smaller implementation cost. Regarding the comparison between C_H1 and PO_H1, although C_H1 gets impressive results, its implementation cost is higher: C_H1 needs extra memory space to store the counters (as well as hardware to compute them). The value of these counters can be very high, because we are counting the number of times an object occludes another object, which may happen millions of times at fragment granularity. Furthermore, when creating the Visibility Order, C_H1 needs to compare these counters for each pair of parallel edges to know which object goes first. We also test MinZ, a heuristic that selects the node with minimum average depth when a cycle is found while traversing the graph. As can be seen, this alternative achieves the worst results among all the tested heuristics.

We decided to implement PO_H1, despite it being the second-best scheme in overshading reduction, because we took into account not only the overshading reduction but also the complexity of the implementation and its associated cost. Note that, regardless of the approximation applied to achieve a cost-effective heuristic, correctness is guaranteed.


Figure 4.23: Normalized overshading with different heuristics to break graph cycles.

Besides, as observed in Figure 4.23, all the different versions of VRO reduce the overshading.

4.6 Conclusion

In this chapter we have presented VRO, a novel technique that effectively reduces overshading. VRO is based on the observation that the relative order among the objects of a scene tends to be very similar from one frame to the next. VRO includes a small hardware unit that stores in a buffer the order relations among the objects of the current frame. This information is used in the next frame, while the GPU is executing the Geometry Pipeline in parallel, to create a Visibility Rendering Order that guides the Tile Scheduler. The overhead of this technique is minimal: it requires less than 1% of the total area of the GPU, and its latency is hidden by other processes of the graphics pipeline.

For a set of unmodified commercial applications, VRO outperforms state-of-the-art techniques in performance and energy consumption by reducing the overshading without the need for an expensive HSR stage at fragment granularity. VRO is especially efficient for geometry-complex applications, which are expected to become the most common applications on mobile devices, as they already are on desktops. VRO achieves a speed-up of about 1.27x and an energy consumption of around 0.85x compared to an ARM Mali-like GPU. VRO outperforms DR because the Visibility Rendering Order is created off the critical path, while DR introduces significant overheads to perform the HSR stage.

5 Render-Based Collision Detection for CPU/GPU Systems

Graphics animation applications such as 3D games represent a large percentage of downloaded applications for mobile devices and the trend is towards more complex and realistic scenes with accurate 3D physics simulations, like those in laptops and desktops. Collision detection (CD) is one of the main algorithms used in any physics kernel. However, real-time highly accurate CD is very expensive in terms of energy consumption and this parameter is of paramount importance for mobile devices since it has a direct effect on the autonomy of the system.

In this chapter, we present an energy-efficient, high-fidelity CD scheme that leverages some intermediate results of the rendering pipeline. It adds a new and simple hardware block to the GPU pipeline that works in parallel with it and completes the remaining parts of the CD task with extremely low power consumption and higher performance than traditional schemes. Using commercial Android applications, we show that our scheme reduces the energy consumption of CD by 99.8% (i.e., 448x smaller) on average. Furthermore, the execution time required for CD in our scheme is almost three orders of magnitude smaller (600x speed-up) than the time required by a conventional technique executed on a CPU. These dramatic benefits are accompanied by a higher-fidelity CD analysis (i.e., with finer granularity), which improves the quality and realism of the application.

5.1 Collision Detection

During the past decade, mobile devices have quickly incorporated graphics and animation capabilities that in the 1980s were only seen in industrial flight simulators and that a decade later became popular in desktop computers and game consoles. The trend going forward is towards more powerful support for real-time physics simulations in all mobile devices, with increasing precision and realism.

Physics kernels contain several algorithms that manage the dynamics of the animations. CD is one of the most important algorithms since it identifies the contact points between the objects of a scene, and determines when they collide. It has been investigated for more than twenty years in different areas like computer graphics, robotics, or virtual reality.

There are many proposals to detect object interference through geometric computations, either by computing intersections between pairs of geometric primitives such as triangles or spheres, or by computing distances between points in 3D space. CD techniques are intrinsically quadratic with respect to the number of objects and their surfaces. To alleviate this cost, CD is often split into two steps, the broad phase and the narrow phase (a broad-phase sketch follows the list):

1. Broad Phase: A fast, simple, and often coarse-grained test applied to all collisionable objects in a 3D scene. It usually considers simple bounding boxes (often Axis-Aligned Bounding Boxes, AABBs) around the objects. The pairs of objects whose bounding boxes collide are included in the Potentially Colliding Set (PCS).

2. Narrow Phase: Applied to all pairs of objects in the PCS. This test uses a more accurate shape of the objects to test the collision, and therefore it is more complex than the broad phase.
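
As an illustration of the broad phase referenced above, the sketch below builds the PCS with the classic pairwise AABB overlap test; the structure names are ours.

#include <utility>
#include <vector>

struct AABB { float min[3], max[3]; };

// Two AABBs overlap iff they overlap on every axis.
bool overlaps(const AABB& a, const AABB& b) {
    for (int axis = 0; axis < 3; ++axis)
        if (a.max[axis] < b.min[axis] || b.max[axis] < a.min[axis])
            return false;                       // separated on this axis
    return true;
}

// Broad phase sketch: the intrinsically quadratic sweep that builds the
// Potentially Colliding Set (PCS) of object pairs.
std::vector<std::pair<int, int>> broadPhase(const std::vector<AABB>& boxes) {
    std::vector<std::pair<int, int>> pcs;
    for (int i = 0; i < (int)boxes.size(); ++i)
        for (int j = i + 1; j < (int)boxes.size(); ++j)
            if (overlaps(boxes[i], boxes[j])) pcs.emplace_back(i, j);
    return pcs;
}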

There exists a large body of research on CD [158, 196]. Both broad and narrow phases can be executed on a CPU or a GPGPU, depending on the characteristics of the specific platform. In most cases, the narrow phase of CD is executed on the CPU because of the non-regular nature of the computations, and in low-power systems the broad phase is executed on the CPU as well. On the other hand, there is a group of CD algorithms known as Image-Based CD (IBCD). They consist of rasterizing the surfaces of the scene objects and detecting their intersections based on the pixel depths of the corresponding fragments [115]. As shown by previous work (see Section 1.3.2), these kinds of techniques have been proposed to exploit the computing power of graphics processors and their ability to rasterize polygons efficiently.

Broad phase algorithms are simple to parallelize, whereas narrow phase algorithms are usually, for a given pair of objects, control-intensive. However, in scenarios where limited energy consumption constrains the amount of computation (as in mobile devices), real-time CD is often restricted to a simple bounding-volume analysis. Render-Based Collision Detection (RBCD) addresses this problem by providing a low-power yet detailed CD technique.

5.1.1 Image Based Collision Detection

The pioneering work of Shinya and Forgue [189] opened the path to performing CD in graphics hardware. This proposal consists of four steps. In the first step, the scene objects are projected onto a given plane in 3D space, for instance the screen plane, thus obtaining the coordinates (x, y) and the depth (z-value) of each vertex with respect to that plane. The second step consists of rasterizing all the surfaces of the collisionable objects, regardless of their visibility, both front and back faces.


Figure 5.1: Discretized representation of the entry and the exit points of the surfaces in a 3D scene for pixels P1, P2 and P3. The Y-axis is a one-dimensional representation of the projection plane and the Z-axis represents depth.

This process results in a collection of per-pixel lists of z-values, each indicating the depth of some point on one of the surfaces. Each list element also includes, along with the z-value, the object identifier. In the third step, for each pixel, the elements of the list are sorted by depth. In the fourth step, for each pixel, the algorithm detects possible overlaps between the z-ranges of different objects.

Figure 5.1 illustrates an example with three objects A, B and C. Objects A and B are colliding because their z-ranges overlap at pixels P1 (z11-z12) and P2 (z23-z24). However, there is no collision at pixel P3, because there the z-ranges of the three objects are disjoint.
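
Functionally, the third and fourth steps amount to the following per-pixel sketch, which assumes closed surfaces so that each object's z-range at a pixel is delimited by a front-facing and a back-facing fragment; the names are illustrative.

#include <algorithm>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

struct ZEntry { float z; uint16_t objectId; bool frontFace; };

// Per-pixel z-overlap test: sort the list by depth, then scan front to
// back. An object is "open" between its front-facing and back-facing
// entries, and any two objects open at the same time have overlapping
// z-ranges, i.e. they collide at this pixel.
std::vector<std::pair<uint16_t, uint16_t>> zOverlapTest(std::vector<ZEntry> list) {
    std::sort(list.begin(), list.end(),
              [](const ZEntry& a, const ZEntry& b) { return a.z < b.z; });
    std::set<uint16_t> open;
    std::vector<std::pair<uint16_t, uint16_t>> collisions;
    for (const ZEntry& e : list) {
        if (e.frontFace) {
            for (uint16_t other : open)          // overlaps every open range
                collisions.emplace_back(other, e.objectId);
            open.insert(e.objectId);
        } else {
            open.erase(e.objectId);              // the object's z-range ends
        }
    }
    return collisions;
}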

The main advantages of this algorithm are its simplicity, its time complexity proportional to the number of faces per object, the possibility of being accelerated with a GPU, and its applicability to any kind of renderable surface. On the other hand, its resolution is finite (pixel resolution), but it has more than enough accuracy for the practical purposes of visual realism in real-time computer animation.

The scene shown in Figure 5.2 illustrates the accuracy of CD using AABBs (broad phase), the Gilbert-Johnson-Keerthi algorithm [140] (GJK, narrow phase) as implemented in Bullet, and RBCD. Due to the large false collisionable area that AABBs add to object A, false collisions are detected for the pairs (A, C) and (A, B). With GJK, a false collision is still detected for the pair (A, C): GJK only works with convex shapes, so when applied to a complex concave shape like A, one option is to use the convex hull of the shape, which adds a false collisionable area. In contrast, RBCD takes advantage of the discretized collision shape, which makes the false collisionable area much smaller than in the other schemes, and thus does not produce any false collision in the given example. Note that the higher the rendering resolution, the smaller the false collisionable area that RBCD adds to the shapes of the collisionable objects.


Figure 5.2: (a) Front view of a 3D scene (b) AABBs as collisionable shapes (c) Convex hull for GJK algorithm (d) RBCD.

5.1.2 Enabling RBCD in the GPU

The tasks involved in the first two IBCD steps, i.e., projection and rasterization, are already performed when rendering the image of a 3D scene. Effectively, in a typical GPU, the projection of the objects onto the screen is performed by the Geometry Pipeline and the rasterization occurs in the Raster Pipeline. Therefore, it would be possible to send some rendering results (the fragment screen coordinates and the per-pixel z-depth values) to the CPU, so that the remaining two IBCD steps (z-depth sorting and z-overlap analysis) would be performed by the CPU. However, the communication required to read back the z-depth values would produce intensive memory traffic with the subsequent energy cost, which makes this alternative less appealing.

Hence, we propose to integrate CD and image rendering within the GPU pipeline hardware. With minor adaptations, our technique reutilizes some intermediate results of the rendering pipeline to perform the CD task. These adaptations include allowing the software to pass collisionable object identifiers to the GPU, adding small and specific hardware (the RBCD unit) to store depth data and detect face intersections based on per-fragment location, and selectively deferring face culling. The GPU works as usual and performs the first two IBCD steps: projection and rasterization. Additionally, the special hardware we include in the GPU performs the last two IBCD steps: depth-sorting the lists of fragments and z-range overlap detection. Finally, the GPU only communicates the detected colliding pairs of points to the CPU. This way, the entire CD task gets seamlessly integrated into the graphics rendering pipeline of the GPU, hence we call it Render-Based Collision Detection. These adaptations are discussed below (implementation details are in Section 5.2):

1. Animation engines usually differentiate between collisionable and non-collisionable objects and, in order to be efficient, do not apply CD to the entire scene but just to the objects that may collide. To identify collisionable geometry along the pipeline, each of the primitives that belong to collisionable objects must be associated to its object identifier. Besides, the application must send these identifiers to the GPU, embedded into the graphics commands. As will be shown later, no changes are needed to the OpenGL ES standard, but obviously existing software should be adapted to take advantage of the new GPU capability.

2. For each pixel, we need to store the z-depths of all the fragments covering it, to make the z-overlap analysis possible. The conventional Z-buffer does not serve this purpose because, since its goal is to eliminate occluded surfaces, it just stores the z-depth of the front-most opaque fragment seen so far. Thus, we propose adding a new Z-depth Extended Buffer (ZEB), an array containing one entry per pixel, each holding a list of z-depths instead of a single z-depth. For a Tile-Based Rendering (TBR) architecture such as our baseline (see Section 2.2.2), both the Z-buffer and the ZEB hold entries for just one tile of pixels, so they are implemented by means of fast on-chip memory. Each ZEB entry is a fixed-size memory block containing a list of elements, each corresponding to a fragment and including not only its z-depth but also the object-id and the front/back orientation tag. Moreover, to keep the lists always depth-sorted, we need a new hardware block that stores every element coming from the Rasterizer into the corresponding ZEB list by following a simple insertion-sort algorithm (see the sketch after this list).

3. Add new specific hardware that analyzes the ZEB pixel by pixel (i.e., one list at a time), detects z-range overlaps, and sends the pairs of colliding points to the CPU through system memory.

4. All the surfaces of collisionable objects must go through the Rasterizer stage, not only those that are visible, because all their fragments are needed to analyze possible z-range overlaps. The GPU includes a Face Culling (FC) stage where primitives that are identified as invisible are culled. In traditional GPUs, FC occurs early in the pipeline, before Rasterization. Hence, culled primitives, which belong either to front, back, or both faces depending on the application, would never reach the Rasterizer nor the RBCD unit. We propose to defer the FC of collisionable objects until after Rasterization, when all fragment depths are already stored in the ZEB. We show later in Section 5.3.2 that this introduces a very small overhead, which is more than offset by the huge benefits of the proposed technique.

5.2 Microarchitecture

This Section introduces the extensions to the graphics pipeline of the baseline architecture that are required to support RBCD.

5.2.1 RBCD Overview

Figure 5.3 shows the GPU including the RBCD unit. This unit mostly works in parallel with other stages of the graphics pipeline, which introduces little performance overhead in terms of GPU execution time. The Rasterizer sends the fragments of collisionable objects to the RBCD unit (1). The RBCD unit applies an insertion sort to store those fragments into the ZEB (2). Once all the collisionable fragments of a tile are in the ZEB, the RBCD unit performs the Z-Overlap Test (3), and forwards the collision points to the CPU through the system memory (4).

5.2.2 Identification of Collisionable Objects

As stated above, identifiers are required by the RBCD hardware to determine whether two z-ranges belong to different collisionable objects. We need, therefore, to maintain the object identifier of the collisionable objects along the graphics pipeline, up to the Rasterizer and the RBCD unit.


Figure 5.3: GPU microarchitecture including an RBCD unit.


A simple way to do this is to include an object identifier in every draw command of a collisionable object. This could be done using an OpenGL extension based on the debug marker extension [191], implemented in OpenGL ES 1.1 and 2.0. This extension allows the programmer of an application to annotate the OpenGL command stream with a descriptive text marker. In our case, the marker would be the object ID, and only the draw commands of collisionable objects would use it. This extension relies on the driver and the hardware to maintain the object notion through the rest of the graphics pipeline. Note that most current 3D games already tag objects as collisionable or non-collisionable in order not to apply the CD algorithm to the whole universe of the scene but only to the objects that can collide, so the requirement here is just to pass this information from the game and physics engine to the GPU.
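As an illustration, a draw call of a collisionable object could be annotated as in the following sketch. The "RBCD_OBJ <id>" marker format and the helper function are hypothetical conventions; only glInsertEventMarkerEXT comes from the EXT_debug_marker extension.

```cpp
// Illustrative sketch of tagging a collisionable draw command through the
// EXT_debug_marker mechanism described above. The "RBCD_OBJ <id>" string
// convention is hypothetical: the driver would parse it and attach the ID to
// the draw command that follows. If the headers do not expose prototypes,
// glInsertEventMarkerEXT must be resolved via eglGetProcAddress.
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>
#include <cstdio>

void drawCollisionable(GLuint objectId, GLsizei indexCount, const void* indices)
{
    char marker[32];
    std::snprintf(marker, sizeof(marker), "RBCD_OBJ %u", objectId);
    glInsertEventMarkerEXT(0, marker);  // length 0: zero-terminated marker
    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, indices);
}
```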

Since the source code of the benchmarks studied is not available, we could not automatically include object IDs for the collisionable objects of the analyzed scenes. Our strategy to assign object identifiers consisted of an off-line, manual visual identification of the draw commands in every frame. For this purpose, we created an application that produces an independent image with the result of every draw command. Once identified, the GPU draw commands of every collisionable object were tagged with a unique object ID.

5.2.3 Deferred Face Culling

The FC stage, usually implemented with fixed-function hardware, is included in the Primitive Assembly stage of the Geometry Pipeline. FC may cull either the front, the back, or both faces of the geometry [104]. Since a fundamental requirement for our RBCD unit is to consider the complete geometric model of the objects, we propose to defer the culling of collisionable objects. During the FC stage, collisionable primitives that are to be culled are just tagged-to-be-culled, so that they can later be identified and handled appropriately after rasterization. The Tile Scheduler sends all the primitives of a given tile to the Rasterizer, which scan-converts them, creating fragments. The Rasterizer sends all collisionable fragments to the RBCD unit, and it sends both collisionable and non-collisionable fragments to the Early Z-Test, except those tagged-to-be-culled, which are filtered out at this point, thus making the Deferred Face Culling take effect.
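The routing just described can be summarized with the following sketch, where the Fragment fields and the two sink functions are illustrative names rather than the actual hardware interface.

```cpp
// Minimal sketch of the fragment routing implied by Deferred Face Culling.
// The Fragment fields and the two sink functions are illustrative names, not
// the actual hardware interface.
struct Fragment {
    bool collisionable;      // fragment of a collisionable object
    bool taggedToBeCulled;   // marked by the (deferred) Face Culling stage
    // ... screen position, z-depth, object id, front/back orientation ...
};

void sendToRBCDUnit(const Fragment&)   { /* sorted insertion into the ZEB */ }
void sendToEarlyZTest(const Fragment&) { /* normal Raster Pipeline path   */ }

void routeFragment(const Fragment& f)
{
    if (f.collisionable)
        sendToRBCDUnit(f);       // every collisionable face reaches the ZEB
    if (!f.taggedToBeCulled)
        sendToEarlyZTest(f);     // culled faces never reach fragment shading
}
```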

Obviously, this approach causes an increment in the number of primitives sent to the Rasterizer, and therefore in the number of fragments it produces. In addition, it causes an increment in the traffic (both writes and reads) from/to the Scene Buffer through the Tile Cache. However, this overhead does not affect later parts of the pipeline, because the tagged-to-be-culled fragments never reach the Early Z-Test stage nor the Fragment Processors, which are identified as the most power-consuming part of the graphics hardware pipeline [176]. We analyze these effects in Section 5.3.2.

5.2.4 Insertion into the Z-depth Extended Buffer

RBCD requires that all fragments for a given pixel are already inserted into the ZEB buffer before the RBCD unit begins to perform the z-overlap analysis.

All the z-depths of collisionable fragments are sent to the RBCD unit, where they are first stored into the Z-depth Extended Buffer (ZEB). The ZEB is basically an on-chip buffer containing an array of lists, one list per pixel position. In our TBR architecture there are 16x16 pixels per tile, so the ZEB contains 256 lists. Each list contains M elements, each describing one point in the surface of a collisionable object. Each element in the list includes the z-depth, the object ID, and the front/back orientation tag. For instance, for M=8 the size of the ZEB would be 8 KB. The size M of the list is fixed for simplicity, which puts a limitation on the number of fragments per pixel that can be analyzed. Beyond that limit, an overflow occurs and some object overlaps could be lost. This is discussed in Section 5.3.3.
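For illustration, the ZEB organization just described could be laid out as in the following sketch. The split of the 32-bit element into depth, id and orientation fields is an assumption made for illustration, since the chapter only fixes the total element size.

```cpp
// Sketch of the ZEB organization with the sizes used in this chapter (16x16
// tile, M=8 elements per list, 32 bits per element). The exact bit-field
// widths inside the 32-bit element are an assumption.
#include <cstdint>

constexpr int kTilePixels = 16 * 16;  // 256 lists, one per pixel of the tile
constexpr int kZebM       = 8;        // maximum fragments per pixel

struct ZebElement {                   // 32 bits in total
    uint32_t zDepth   : 24;           // fragment z-depth
    uint32_t objectId : 7;            // collisionable object identifier
    uint32_t backFace : 1;            // 0 = front-face, 1 = back-face
};

struct ZebList {
    uint8_t    count;                 // valid elements, kept front-to-back
    ZebElement elem[kZebM];           // depth-sorted list
};

struct Zeb {
    ZebList lists[kTilePixels];       // 256 x 8 x 32 bits = 8 KB of elements
};
```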

Since RBCD relies on z-ordered lists to detect object overlaps, the ZEB uses a simple sorted insertion policy entirely implemented in hardware, as shown in the block-diagram in Figure 5.4. The insertion of a new element Enew is a three-step process:

1. The list that corresponds to the currently processed pixel is read from the ZEB and stored in the List-Register. It contains up to M front-to-back ordered elements E0 to EM-1 having depth values Z0 to ZM-1.

2. The depth Znew of the element Enew is compared against all z-values of the list in the List-Register, which is done in parallel with an array of less-than comparators. Each comparator output Ci tells whether Znew is less than Zi, and bits C0 to CM-1 are forwarded to an array of M multiplexers that shifts some elements and places Enew in its corresponding position.

3. The ordered list, E'0 to E'M-1, is written back to the ZEB.

Figure 5.4: Sorted insertion hardware.
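In software, the sorted insertion of Figure 5.4 behaves like the following sketch. The hardware performs the M comparisons in parallel and shifts through multiplexers; the loop below is its sequential equivalent (types are re-declared here for self-containment).

```cpp
// Software model of the sorted insertion of Figure 5.4.
#include <cstdint>

constexpr int kZebM = 8;                        // list length M
struct ZebElement { uint32_t zDepth : 24, objectId : 7, backFace : 1; };
struct ZebList    { uint8_t count; ZebElement elem[kZebM]; };

void sortedInsert(ZebList& list, ZebElement eNew)
{
    if (list.count == kZebM)
        return;                                 // overflow (see Section 5.3.3)
    int pos = list.count;
    // Shift right every element whose z-depth is greater than zNew
    // (LT-comparators plus muxes in hardware), then place Enew in order.
    while (pos > 0 && eNew.zDepth < list.elem[pos - 1].zDepth) {
        list.elem[pos] = list.elem[pos - 1];
        --pos;
    }
    list.elem[pos] = eNew;
    ++list.count;
}
```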

5.2.5 Z-Overlap Test

Once all the fragments of the tile are stored in the ZEB, the Z-Overlap Test sequentially reads the lists from the ZEB and, for each list, it analyzes possible overlaps of z-ranges between different objects. The z-depths along with the 2D coordinates of all the fragments form a 3D representation of the scene, which makes RBCD projection-independent. The interference cases between two objects are illustrated in Figure 5.5. Each list illustrates one case, with points from A and B z-ordered front-to-back. The algorithm traverses each list front-to-back (left-to-right) and takes the corresponding actions. Colliding pairs are detected in cases 2 and 3.

The detection hardware is depicted in Figure 5.6. The algorithm begins by reading one ZEB entry and storing it in the List-Register; then it traverses that z-ordered list front to back, analyzing each element in sequence by comparing it with the content of the FF-Stack. The FF-Stack is a small table containing up to T entries, each having the object-id of a front-face fragment and a matched bit Mi that indicates whether the element has already been paired with a back-face. At the beginning the FF-Stack is empty; then each element of the List-Register is read in turn and analyzed. If the current element Ecur belongs to a front-face, then Idcur is pushed onto the FF-Stack and its M bit is initialized to 0. Otherwise, if Ecur belongs to a back-face, the following two steps are done:

1. Idcur is compared against the object-ids of all elements in the FF-Stack with Mi=0, in search of its corresponding front-face. The bottommost matching id, Idm, of the FF-Stack delimits a depth interval, (Idm, Idcur). All the front-faces Idi that stay above Idm in the FF-Stack, regardless of the bit Mi, belong to the overlapping interval, so their output bits Hiti are set to 1 to notify that there are collisions between the objects Idi and the object Idcur.

2. All the colliding pairs and their coordinates are written to an output buffer which will be sent to the memory controller. The matched bit Mm of the front-face element Idm is set to 1 before the algorithm continues traversing the list. Tagging the elements as previously matched instead of deleting them from the FF-Stack not only simplifies the stack management, but also allows the detection of overlaps with following back-faces of the list.
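The following sketch models this traversal in software with illustrative types; the hardware bounds the FF-Stack to T entries and evaluates the EQ-comparators in parallel, while a vector and a loop are used here for simplicity.

```cpp
// Software model of the Z-Overlap Test traversal (illustrative types).
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct ZebElement { float z; uint32_t id; bool backFace; };
struct FFEntry    { uint32_t id; bool matched; };

// Returns the colliding object-id pairs found in one depth-sorted ZEB list.
std::vector<std::pair<uint32_t, uint32_t>>
zOverlapTest(const std::vector<ZebElement>& list /* front-to-back */)
{
    std::vector<std::pair<uint32_t, uint32_t>> collisions;
    std::vector<FFEntry> ffStack;                        // FF-Stack
    for (const ZebElement& e : list) {
        if (!e.backFace) {                               // front-face: push, M=0
            ffStack.push_back({e.id, false});
            continue;
        }
        // Back-face: find the bottommost unmatched front-face with the same id.
        for (std::size_t m = 0; m < ffStack.size(); ++m) {
            if (ffStack[m].id == e.id && !ffStack[m].matched) {
                // Every front-face above Idm overlaps the (Idm, Idcur) range.
                for (std::size_t i = m + 1; i < ffStack.size(); ++i)
                    collisions.push_back({ffStack[i].id, e.id});
                ffStack[m].matched = true;               // tag instead of delete
                break;
            }
        }
    }
    return collisions;
}
```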

Case   List            Ecur   Action   FF-Stack   Matched   Notify
1      [A ]A [B ]B     [A     push     [A         0         -
                       ]A     match    [A         1         -
                       [B     push     [A, [B     1, 0      -
                       ]B     match    [A, [B     1, 1      -
2      [A [B ]A ]B     [A     push     [A         0         -
                       [B     push     [A, [B     0, 0      -
                       ]A     match    [A, [B     1, 0      A-B
                       ]B     match    [A, [B     1, 1      -
3      [A [B ]B ]A     [A     push     [A         0         -
                       [B     push     [A, [B     0, 0      -
                       ]B     match    [A, [B     0, 1      -
                       ]A     match    [A, [B     1, 1      A-B
4      Same as case 3, with A and B interchanged

Figure 5.5: Interference cases between front-faces ([) and back-faces (]) of two objects, A and B.

Figure 5.6: Z-overlap Test hardware.

Note that one ZEB cannot contain fragments of different tiles at the same time, so whenever a new tile is going to be sent to the Rasterizer, if the ZEB is still receiving fragments of the previous tile, or is being analyzed by the Z-Overlap Test, then the Tile Scheduler must wait until the CD analysis completes, causing a pipeline stall that may hurt performance and therefore increase the static energy cost of the entire Raster Pipeline. Fortunately, the hardware needed to implement a ZEB is not expensive, so the RBCD unit is equipped with two ZEBs in order to store the information of the current tile being rasterized while the Z-overlap test is still being run for the previous one.
5.2.6 Animation Loop

The application stage of the conventional graphics pipeline, usually executed in the CPU, is responsible for receiving user inputs, detecting collisions, computing the corresponding reactions based on physical rules and updating the scene accordingly. This set of tasks constitutes a time step. After one or multiple time steps, the resulting scene is finally rendered, usually with the support of the GPU. After this, the whole process is repeated again, completing an animation or game loop [181]. Figure 5.7 shows, for two consecutive frames, an example of game loop without and with RBCD.

Figure 5.7: Example of game loop execution in the CPU/GPU system, (a) without RBCD, (b) with RBCD. CR and GCI stand for Collision Response and GPU Command Issue respectively.

Our proposal enables moving the CD task from the time step to the GPU rendering. Hence, per every rendered frame, RBCD detects collisions between all the collisionable objects sent to the GPU, visible or not. Should the application run additional time steps, it can be done by rasterizing (not fragment processing) extra commands just containing the collisionable objects to be tested, or by calling conventional software-based CD. Similarly, both approaches can be used to deal with collisionable objects out of the view frustum if needed.
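A minimal sketch of the resulting loop (cf. Figure 5.7b), with illustrative stub functions rather than an actual engine API, is the following:

```cpp
// Minimal sketch of the animation loop with RBCD. Collision detection
// disappears from the CPU time step: the CPU reads the colliding pairs
// produced by the GPU while rendering the previous frame and computes only
// the collision response. All names are illustrative stubs.
#include <vector>

struct CollidingPair { unsigned idA, idB; /* plus contact coordinates */ };

std::vector<CollidingPair> readRbcdOutputBuffer() { return {}; }  // system memory
void applyCollisionResponse(const std::vector<CollidingPair>&) {} // CR
void updateSceneAndIssueGpuCommands() {}                          // GCI

void gameLoop(const bool& running)
{
    while (running) {
        applyCollisionResponse(readRbcdOutputBuffer());
        updateSceneAndIssueGpuCommands();  // rendering also performs RBCD
    }
}
```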

Executing multiple time steps per frame can help improve the smoothness and realism of the animations, especially when rendering off-line or using high-performance graphics. However, because of the power limitations of real-time rendering in mobile devices, one time step per frame is what actually occurs in most current real-time applications.

5.2.7 Power model of the RBCD unit

The RBCD unit has been modeled using McPAT's components, shown in parentheses in the following list: the ZEBs (SRAM); LT-Comparators (ALU); EQ-Comparators (XOR); List-Register, FF-Stack, list and stack pointers (registers); hit logic (priority encoder); and MUXes (MUX). The total area of the RBCD unit is less than 1% of the area of the GPU.

5.2.8 CPU Collision Detection Simulation

Since we do not have the source code of the benchmarks, in order to simulate the CD on the CPU for our benchmark set we extracted the geometry of all the collisionable objects for every GPU trace fed into the cycle-accurate simulator of Teapot. That is, for a given trace we obtain the 3D meshes of vertices of every collisionable object in the same world space coordinates as they have in the original benchmark, which is all the information that a CD algorithm needs in order to test overlaps.

In order to simulate the CPU CD of our benchmarks set, we employed Bullet [129], a 3D Real-Time Multiphysics Library. Bullet provides state-of-the-art collision detection and soft and rigid body dynamics. Bullet is a good choice since it is widely used in industry [95] (e.g., Grand Theft Auto V and Red Dead Redemption).

Using Bullet, we have created an application that loads the meshes of the collisionable objects and then performs the CD for every frame of the original benchmark. We implement two different CD versions: the first one just performs a broad phase, and the second one performs both broad and narrow phases. These two versions are simulated with MARSSx86, which generates activity factors that are fed into McPAT to obtain the energy cost of performing the CD in the CPU. The time and the energy of loading the 3D meshes are subtracted from the CPU results.
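For reference, the skeleton of such a Bullet-based application looks roughly as follows; shape creation from the extracted meshes is elided, and the setup shown is a generic Bullet collision world rather than the exact code used in our experiments.

```cpp
// Rough skeleton of a Bullet-based CPU CD application, as described above.
#include <btBulletCollisionCommon.h>

void detectCollisions(btCollisionWorld& world)
{
    world.performDiscreteCollisionDetection();          // broad + narrow phase

    btDispatcher* dispatcher = world.getDispatcher();
    for (int i = 0; i < dispatcher->getNumManifolds(); ++i) {
        btPersistentManifold* m = dispatcher->getManifoldByIndexInternal(i);
        for (int j = 0; j < m->getNumContacts(); ++j) {
            const btManifoldPoint& p = m->getContactPoint(j);
            (void)p;  // p.getPositionWorldOnA()/OnB() give the contact points
        }
    }
}

int main()
{
    btDefaultCollisionConfiguration config;
    btCollisionDispatcher dispatcher(&config);
    btDbvtBroadphase broadphase;
    btCollisionWorld world(&dispatcher, &broadphase, &config);
    // ... add one btCollisionObject per collisionable mesh here ...
    detectCollisions(world);
    return 0;
}
```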

5.3 Experimental Results

In this section, we first show the benefits of RBCD in a CPU/GPU system. Then we show the small overheads introduced in the GPU. Finally, we perform a sensitivity analysis with respect to the maximum length of the ZEB lists. Table 5.1 lists the parameters used in our experiments.

Table 5.1: CPU/GPU Simulation Parameters.

GPU Parameters
  Tech Specs: 400 MHz, 1 V, 32 nm
  Screen Resolution: 800x480 (WVGA)
  Tile Size: 16x16
  Queues
    Vertex (2x): 16 entries, 136 bytes/entry
    Triangle, Tile: 16 entries, 388 bytes/entry
    Fragment: 64 entries, 233 bytes/entry
  Caches
    Vertex Cache: 64 bytes/line, 2-way associative, 4 KB, 1 bank, 1 cycle
    Texture Caches (4x): 64 bytes/line, 2-way associative, 8 KB, 1 bank, 1 cycle
    L2 Cache: 64 bytes/line, 8-way associative, 128 KB, 8 banks, 2 cycles
    Color Buffers (4x): 64 bytes/line, 1-way associative, 1 KB, 1 bank, 1 cycle
    Z Buffers (4x): 64 bytes/line, 1-way associative, 1 KB, 1 bank, 1 cycle
  Non-programmable stages
    Primitive Assembly: 1 triangle/cycle
    Rasterizer: 4 fragments/cycle
    Early Z Test: 8 in-flight quad-fragments, 1 Z-buffer
  Programmable stages
    Vertex Processor: 1 vertex processor
    Fragment Processor: 4 fragment processors
  Main Memory
    Latency: 50-100 cycles
    Bandwidth: 4 bytes/cycle (dual channel)
  RBCD Unit
    ZEB buffers (2x): 32 bits/element, 8 elements/entry, 256 entries, 8 KB

CPU Parameters
  Tech Specs: 1500 MHz, 1 V, 32 nm, 2 cores, 1 MB L2
  Core Parameters
    CPU Architecture: Harvard
    L1 Instruction Cache: 32 KB/core
    L1 Data Cache: 32 KB/core

5.3.1 Performance and Energy Consumption Benefits

This section demonstrates that RBCD provides huge benefits in terms of both performance and energy at a very low cost in a low-power CPU/GPU system, using a set of commercial graphics applications (see Table 5.2). The speedup (5.1) is obtained by comparing the time that the CPU spends performing the CD against the extra time that RBCD adds to the baseline GPU execution time. The energy consumption is computed using equation (5.2). On the CPU side, the energy numbers include the energy consumption of the CPU and the main memory. On the GPU side, the energy numbers also include the consumption of the GPU and the main memory.

Table 5.2: Benchmarks.

Benchmark         Alias    Description
Captain America   cap      beat'em up
Crazy Snowboard   crazy    snowboard arcade
Sleepy Jack       sleepy   action
Temple Run        temple   adventure arcade

\[
\mathrm{Speedup} = \frac{t_{CPU_{CD}}}{t_{GPU_{RBCD}} - t_{GPU_{baseline}}}
\tag{5.1}
\]

\[
\text{Normalized Energy Consumption} = \frac{E_{GPU_{RBCD}} - E_{GPU_{baseline}}}{E_{CPU_{CD}}}
\tag{5.2}
\]


Figure 5.8: (a) RBCD speedup w.r.t. Broad-CD, (b) normalized energy consumption of RBCD w.r.t. Broad-CD, (c) RBCD speedup w.r.t. GJK-CD, (d) normalized energy consumption of RBCD w.r.t. GJK-CD.

Figures 5.8a and 5.8b show the speedup and the normalized energy consumption of RBCD with respect to a system that only performs a broad CD using AABBs in the CPU (Broad-CD). For these experiments, the RBCD unit is equipped with different numbers of ZEBs (1, 2 and 4). As can be observed, RBCD obtains a speedup of around 250x on average with just one ZEB buffer in the RBCD unit. As explained in the following section, including more ZEBs in the RBCD unit avoids some GPU stalls, which reduces the GPU time overhead caused by RBCD. With two ZEBs the speedup of RBCD w.r.t. Broad-CD increases to around 620x, while with four ZEBs it is around 650x. Regarding energy, with one ZEB in the RBCD unit, RBCD consumes on average 0.6% of the energy consumed by Broad-CD in the CPU. Including a second ZEB further reduces the energy consumption because the reduced pipeline stalls save much more energy in the whole GPU than the tiny extra consumption of the additional ZEB. However, with 4 ZEBs the energy consumption is similar to that with 2 ZEBs, because most of the stalls are already removed with 2 ZEBs, and the cost of including two more ZEBs is similar to the potential remaining savings.

RBCD is also compared against a scheme executing both the broad and narrow phases. The broad phase uses AABBs and the narrow phase is the Gilbert-Johnson-Keerthi algorithm (GJK [140, 194]) implemented in Bullet [129]. As shown in Figures 5.8c and 5.8d, the benefits in speedup and energy consumption are even higher, with speedups on average of around 1400x, 3500x, and 3650x for 1, 2 and 4 ZEBs respectively. The energy consumption is around 0.1% on average with 1 ZEB, and drops to 0.05% with 2 and 4 ZEBs. Furthermore, the RBCD scheme provides pixel-level granularity for CD, whereas the baseline Broad-CD implements the simplest broad phase, an AABB overlap test. The benefits of RBCD are mainly due to:

1. Both the execution time and the energy of the CPU/GPU system are highly dominated by the CD part that executes in the CPU. RBCD removes all CD CPU activity while it requires minimal extra work in the GPU due to the reuse of data already computed during the image rendering process.

2. The power of the RBCD unit is very small compared with the total power of the GPU.

5.3.2 GPU Overheads

This section analyzes the overhead, in terms of execution time and energy consumption, of integrating RBCD into the rendering pipeline. Figures 5.9a and 5.9b show the execution time overhead (equation 5.3) and energy consumption (equation 5.4) of the proposed approach normalized to the baseline GPU.

\[
\text{Normalized Time} = \frac{t_{GPU_{RBCD}}}{t_{GPU_{baseline}}}
\tag{5.3}
\]

\[
\text{Normalized Energy} = \frac{E_{GPU_{RBCD}}}{E_{GPU_{baseline}}}
\tag{5.4}
\]

As shown in Figure 5.9a, the time overhead introduced by RBCD is on average less than 5.4% with 1 ZEB, decreases to 3% with 2 ZEBs, and remains around 3% with 4 ZEBs. As stated before, since the ZEB can only hold data from one tile at a time, having a single ZEB may force the Rasterizer to stall if it fills up the RBCD unit input queue with fragments from a new tile while the Z-Overlap Test of the previous tile has not yet finished or has not even started.


Figure 5.9: (a) Normalized rendering time of the GPU with RBCD w.r.t. the baseline GPU. (b) Normalized energy of the GPU and main memory with RBCD w.r.t. the baseline GPU.

Note, however, that the Rasterizer stalls may not have a direct impact on the total pipeline time if the queue that feeds the fragment processors contains enough work to keep them busy during the stall. On the contrary, if this queue becomes empty then the fragment processors are left idle, which produces a performance penalty. Therefore, having a second ZEB buffer (or more) allows the Rasterizer to send collisionable fragments to one ZEB of the RBCD unit while the other is being used by the previous tile. The benchmark where this issue is most relevant is Crazy Snowboard, with an overhead of around 7% (the biggest) with one ZEB, and less than 1% (the smallest) with two ZEBs.

The energy overhead of both the GPU unit and the main memory is shown in Figure 5.9b. It is on average 5.1% with 1 ZEB and decreases to 3.5% with 2 and 4 ZEBs. Figures 5.10a and 5.10b show in more detail the energy overhead induced by RBCD on the GPU and on main memory. Compared to the 2-ZEB unit, with 4 ZEBs the energy consumption of the main memory slightly decreases in cap and crazy (see Figure 5.10b), because the execution time decreases too, i.e., the static energy consumption decreases. However, as can be seen in Figure 5.10a, with 4 ZEBs the energy overhead increases for all the benchmarks (even cap and crazy), because the static energy cost of including 2 more ZEBs (4 in total) is greater than the reduction in static energy of the whole GPU. An energy breakdown of the baseline is shown in Figure 5.11a.

In summary, we can conclude that two ZEBs are enough to avoid practically all stalls, because including 4 ZEBs or more (not shown in this graph) does not significantly improve time and slightly increases the energy consumption.

The second source of overhead comes from the extra tagged-to-be-culled primitives that are processed by the GPU pipeline, and which are necessary to perform CD in the RBCD unit. This extra work, which we examine in more detail in the next paragraphs, affects both the Geometry Pipeline and the Raster Pipeline, but the main impact occurs in the latter, since its computing requirements are much higher, as shown in Figure 5.12. For these experiments we have considered two ZEBs in the RBCD unit.


Figure 5.10: (a) Normalized energy consumption of the GPU with RBCD w.r.t. the baseline GPU. (b) Normalized main memory energy of RBCD w.r.t. the main memory energy of the baseline GPU.


Figure 5.11: GPU/Main memory energy breakdown.


Figure 5.12: GPU time breakdown including time of Geometry and Raster pipelines.

Regarding the Geometry Pipeline, the Polygon List Builder has more work to do in order to process the extra collisionable tagged-to-be-culled primitives, which translates into 32% more stores to the Tile Cache and around 8.8% more write misses. However, the execution time of the Geometry Pipeline increases on average by less than 1%.

Regarding the Raster Pipeline, Figure 5.13 shows some activity factors to illustrate the main sources of overhead, all normalized to the baseline GPU. The execution time of the Raster Pipeline increases by 3.7%. This is because the Tile Fetcher reads 18.4% more primitives from the Tile Cache, which translates into 19.3% more loads and around 6.6% more read misses. The extra primitives cause the Rasterizer to produce 6.3% more fragments. However, the increase in the number of fragments is smaller than the increase in primitives because the average size of the tagged-to-be-culled primitives, which are part of high-detail 3D models, is smaller than the average size of a primitive.


Figure 5.13: Tile Cache loads, primitives, fragments, and Raster cycles of the GPU including RBCD, normalized to the baseline GPU.

On the other hand, these extra tagged-to-be-culled fragments are not sent to the input queue of the Early-Z Test, which translates into an increment of around 3.6% in the idle cycles of the fragment processors. The increment in idle cycles is smaller than the increment in the number of fragments because the rasterization of the tagged-to-be-culled primitives partially overlaps with the execution of the fragment processors, thus partially hiding its latency. In other words, the execution time of the Raster Pipeline increases if the fragment processors do not have enough work to do because the input queue that feeds them has been emptied while the Rasterizer is producing the tagged-to-be-culled fragments, which occurs in 3.6% more cycles. In summary, we can conclude that RBCD adds a low overhead to the GPU because:

1. RBCD reuses all data transmission and geometry processing of collisionable objects already done by the GPU to render the image of a given scene.

2. RBCD exploits the fact that 84.4% of the primitives are already rasterized in the baseline, producing 94% of the fragments needed by the RBCD unit.

3. The extra cycles that RBCD adds to the GPU pipeline are partially hidden by the fragment processing execution.

4. The static power of the RBCD unit is very small, being less than 1% of the total static power of the GPU.

Given that the time overhead in the GPU is greatly amortized by the time savings in the CPU, we do not consider including a dedicated Rasterizer for processing collisionable back-faces. However, additional resources could be devoted to the Rasterizer in order to reduce the time overhead in a context with much more rasterization load.

5.3.3 Sensitivity to ZEB List Length

Table 5.3: Percentage of fragment overflow for a ZEB with 4, 8 or 16 entries (each entry holds data for one fragment).

Benchmark   4       8      16
cap         1.57    0.01   0
crazy       1.20    0.03   0
sleepy      5.87    0.21   0
temple      16.61   0.96   0
average     3.68    0.08   0

The amount of collisionable geometry of a benchmark and its concentration in the scene may stress the RBCD unit and increase the overheads in the GPU. For our set of benchmarks, we found that one RBCD unit with one Insertion Sort unit, one Z-Overlap Test unit, and two ZEBs with lists of a maximum size of 8 is adequate. The size of a ZEB with 256 lists, 8 entries per list, and 32 bits per entry is 8 KB. As described above, we implemented the ZEB as an array of fixed-length lists for simplicity. However, the downside is that overflows are possible if more than M fragments from collisionable objects are found in the same pixel, where M is the length of the lists. Of course, having longer lists reduces the probability that overflows occur, but it comes at the cost of more area and energy.

Table 5.3 shows the percentage of times a list of the ZEB overflows for lists with 4, 8, and 16 entries. With four entries the overflow rate is below 2% for Captain America and Crazy Snowboard, but increases to 5.87% in Sleepy Jack and up to 16.61% in Temple Run. The reason is that the first two benchmarks have fewer collisionable objects, and these are more spread across the projection plane. In other words, they have fewer objects overlapping the same pixels than the other two benchmarks. On the other hand, Table 5.3 shows that just eight entries per list are enough to keep the overflow rate below 1% in the worst case and at 0.08% on average. Despite the overflows, we verified that all the collisions are still detected. This is not surprising because there are multiple pixels per object, so there are also multiple opportunities to detect the collisions between objects. Finally, with 16 entries, overflows do not happen at all for our set of benchmarks.

Nevertheless, there may be cases where a benchmark stresses the ZEB to the point that the overflow rate is very high, decreasing the quality of the CD. A fallback procedure can be adopted in these cases by notifying the event to the CPU, which would then perform the CD in the conventional way. Another possibility is to design a ZEB with several spare entries that could be dynamically allocated as extra space to create longer lists in these cases. In any case, the percentage of static power consumed by the RBCD unit with two ZEBs relative to the total GPU static power is less than 1% with lists of 8 entries, and less than 5% with lists of 64 entries. This means that the ZEB has a low impact on the total GPU static power and, if needed, the size of the lists could be increased in order to minimize the number of executions of the fallback procedure without causing a great impact on the total power of the GPU.
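A minimal sketch of the fallback trigger, with a hypothetical threshold and notification interface, is the following:

```cpp
// Minimal sketch of the overflow fallback outlined above. The 5% threshold
// and the notification interface are hypothetical.
constexpr float kOverflowThreshold = 0.05f;

void notifyCpuFallback() { /* e.g., raise a flag in system memory */ }

void endOfFrame(unsigned overflowedInsertions, unsigned totalInsertions)
{
    float rate = totalInsertions
               ? float(overflowedInsertions) / float(totalInsertions)
               : 0.0f;
    if (rate > kOverflowThreshold)
        notifyCpuFallback();  // the CPU re-runs conventional CD for this frame
}
```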

5.4 Conclusions

In recent years, mobile platforms and smartphones have become ubiquitous, as well as powerful computational engines. Among all the capabilities of these systems, battery life and graphics are probably the most appreciated ones by consumers. Current tablets and smartphones contain a GPU that is widely used by applications such as browsers, image processing, video viewers and games. Graphics animation applications, and 3D games in particular, are one of the most downloaded application types, and besides good graphics quality, they usually require a physics kernel. Collision Detection is often the most compute-intensive part of these physics kernels.

In this chapter we have presented a hardware scheme, RBCD (Render-Based Collision Detection), to perform low-energy, high-fidelity Collision Detection, which is based on the observation that most of the computation required by an Image-Based CD algorithm is also performed when the image is rendered by the GPU. This technique introduces a small overhead in the GPU, both in time (below 3% on average) and energy (3.5% on average), but it frees the CPU from the detection of collisions, resulting on average in a 448x reduction (i.e., by 99.8%) of the total energy consumed by the CD on the CPU. Additionally, since this scheme detects collisions at pixel level, it provides higher accuracy than conventional CD algorithms for mobile platforms, which usually apply simplifications to the objects in order to reduce the required computation. Furthermore, RBCD is almost three orders of magnitude faster (a 600x speedup) than traditional, less-accurate approaches that run on the CPU.


6 Conclusions

In this chapter we summarize the main conclusions of the works presented in this thesis, as well as point out open-research areas for future work.

6.1 Conclusions

Let us give a brief review of the graphics pipeline. Figure 6.1 shows a simplified graphics pipeline. The GPU receives vertices and processes them in the Geometry Pipeline, which generates triangles. These are then discretized by the Rasterizer, which generates fragments that correspond to pixel screen positions. Then, fragments are sent to the Fragment Processing stage, which performs the required texturing, lighting and other computations to determine their final color. Finally, the Depth test compares each fragment’s depth against that already stored in the Depth Buffer to determine if the fragment is in front of all previous fragments at the same pixel position.


Figure 6.1: Simplified version of the Graphics Pipeline (G.P. = Geometry Processing, Rast. = Rasterization, F.P. = Fragment Processing).


In the first place, we evaluate the amount of overshading in mobile graphics workloads and the effectiveness of the Early-depth test to reduce it. Early-depth removes an important part of the overshading, but there is significant room for improvement because, to be effective at avoiding the rendering of hidden surfaces, it requires opaque geometry to be processed in front-to-back order. Hence, when the GPU realizes that an object or part of it is not going to be visible, all the activity required to compute its color has already been performed, with the consequent waste of time and energy, especially in the Fragment Processing stage, which, as we showed in Section 1.2.2 (in line with previous works [172, 176]), is the most power-consuming stage of the graphics pipeline. Z-prepass addresses overshading by performing two separate rendering passes with the GPU. First, Z-prepass renders the geometry without computing the color of the fragments, using a null fragment shader in the processors, to determine the visibility of the scene. On a second pass with the real shaders, the Early-depth test performs optimal culling, so overshading is minimal. Unfortunately, this approach doubles several stages of the graphics pipeline, such as vertex processing, rasterization and visibility determination, which may offset the benefits of the technique. It is only effective for workloads complex enough that the overhead of the first rendering pass is compensated by large fragment computation savings, which is not usually the case in mobile applications. Like Z-prepass, Deferred Rendering (DR) is a hardware technique that avoids overshading by performing a hidden surface removal (HSR) phase. In contrast to Z-prepass, DR does not perform the geometry processing twice. However, DR still has a non-negligible cost: either it introduces a barrier in the graphics pipeline, because the Fragment Processing stage does not start until HSR has completely finished for a tile, or extra hardware is required to perform the HSR of one tile in parallel with the rendering of another tile.

We present VRO, a novel technique that effectively reduces overshading. VRO reorders objects front-to-back entirely in hardware to maximize the culling effectiveness of the Early-depth test and minimize overshading, hence reducing execution time and energy consumption. VRO exploits the fact that the objects in graphics animated applications tend to keep their relative depth order across consecutive frames (temporal coherence) to provide the feeling of smooth transition. VRO keeps visibility information of a frame, and uses it to reorder the objects of the following frame. Since depth-order relationships among objects are already tested by the Depth Test, VRO incurs minimal energy overheads. It just requires adding a small hardware to capture the visibility information and use it later to guide the rendering of the following frame. The overhead of this technique is minimal, requiring less than 1% of the total area of the GPU, while its latency is hidden by other processes of the graphics pipeline. For a set of unmodified commercial applications, VRO outperforms state-of-the-art techniques in performance and energy consumption by reducing the overshading without the need of an expensive HSR stage at fragment granularity. VRO is especially efficient for geometry-complex applications, which are expected to be the most common applications in mobile devices as they already are in desktops. VRO achieves a speed-up of about 27% and an energy reduction of around 14.8% compared to an ARM Mali-400 MP4-like GPU. VRO outperforms DR because the Visibility Rendering Order is created out of the critical path, while DR introduces significant overheads to perform the HSR stage.

Furthermore, we have evaluated VRO over an IMR GPU architecture. Like VRO, IMR-VRO includes a small hardware unit that stores the order relations among the objects of a scene of the current frame in a buffer. This information is used in the next frame to create a Visibility Rendering Order that this time guides the Command Processor. IMR-VRO outperforms state-of-the-art techniques in performance and energy consumption by reducing the overshading without the need of an expensive extra rendering pass to determine

visibility. IMR-VRO achieves up to 1.32x speed-up (1.16x on average) and down to 0.79x energy consumption (0.85x on average) with respect to the baseline IMR GPU. IMR-VRO is much more efficient than Z-Prepass because most of the computations required to create the Visibility Rendering Order are reused from the normal rendering, while Z-Prepass requires an extra rendering pass that introduces significant overheads.

In the second place, we introduce Collision Detection (CD), one of the most important algorithms used in physics kernels; it identifies the contact points between the objects of a scene to determine when they collide. Graphics animation applications such as 3D games represent a large percentage of downloaded applications for mobile devices, and the trend is towards more complex and realistic scenes with accurate 3D physics simulations. However, real-time, highly accurate CD is very expensive in terms of energy consumption, and this parameter is of paramount importance for mobile devices since it has a direct effect on the autonomy of the system. We give a brief introduction to a group of CD algorithms known as Image-Based CD (IBCD). These algorithms rely on the rasterization of the surfaces of the scene objects and the detection of their intersections based on the pixel depths of the rasterized fragments. We present our proposal, Render-Based Collision Detection (RBCD), which belongs to this family of techniques, proposed to exploit the computing power of graphics processors and their ability to rasterize polygons efficiently.

RBCD is a novel, energy-efficient, high-fidelity CD scheme that leverages some intermediate results of the rendering pipeline to perform CD. RBCD is based on the observation that most of the tasks required for IBCD (e.g., vertex processing, projection, rasterization and depth test) are also performed during image rendering. Hence, we propose to integrate CD and image rendering within the GPU pipeline hardware, so that redundant tasks are done just once. With minor hardware extensions and minor software adaptations, our technique reutilizes some intermediate results of the rendering pipeline to perform the CD task. Some of these adaptations include allowing the software to pass collisionable object identifiers to the GPU, selectively deferring face culling, and adding small, specific hardware to detect face intersections based on per-fragment location and depth. We show the benefits of RBCD in a CPU/GPU system. Although this technique introduces a small overhead in the rendering of a frame in the GPU, both in time (below 3% on average) and energy (3.5% on average), RBCD frees the CPU from the detection of collisions. Comparing RBCD with a conventional CD completely executed in the CPU, we show that the CD execution time is reduced by almost three orders of magnitude (600x speedup) with respect to traditional, less-accurate approaches that run on the CPU. Most of the CD task in our model comes for free by reusing the image rendering intermediate results. Such a dramatic time improvement may result in better frames per second if the physics simulation is on the critical path, although not necessarily. However, the most important advantage of our technique is the enormous energy savings that result from eliminating a long and costly CPU computation and converting it into a few simple operations executed by specialized hardware within the GPU. Our results show that the energy consumed by CD is reduced on average by a factor of 448x (i.e., by 99.8%). These dramatic benefits are accompanied by a higher-fidelity CD analysis (i.e., with finer granularity), which improves the quality and realism of the application.


6.2 Future Work

In this thesis we mainly focus on 3D graphics workloads, which are the most compute- and power-demanding graphics applications. However, given the popularity of 2D graphics applications, an interesting extension to the study performed with respect to the RBCD scheme presented in this thesis would be to evaluate it with 2D graphics workloads, which would require minimal modifications. For example, given that Face Culling is not present in 2D graphics workloads, we would not need to detect z-range overlaps anymore, thus making the object overlap detection substantially simpler. Note that the ZEB buffer would still be needed to support different depth layers in the scene, but given that the notion of front and back faces would no longer be necessary, the number of ZEB entries required per object would be halved, thus reducing the possibility of ZEB overflows.

Likewise, the VRO scheme presented in this thesis could also be evaluated with 2D graphics workloads without significant modifications. On the other hand, our current evaluation of VRO only reduces a specific kind of overshading. Overshading can be differentiated into two types: intra-object and inter-object overshading. The former is produced by auto-occlusions of an object. The latter is the overshading caused by occlusions between different objects. Our current evaluation of VRO only labels geometry at object-level granularity, so it only reduces inter-object overshading. Although there are techniques that are complementary to VRO and that effectively reduce intra-object overshading, we believe that VRO still has room for improving performance and energy savings, not only by combining it with one of those techniques but also by making use of a more sophisticated labeling of geometry. A clustering method would be applied over the static 3D models of the application. Then, every object would be labeled with a primary object-ID plus a secondary cluster-ID. Such a labeling process would only be performed once. Then, minor modifications to take these two-level labels into account would allow reordering the geometry not only at inter-object level but also at intra-object (cluster) level and, hence, potentially reduce intra-object overshading.

Despite Khronos releasing the OpenGL ES 3.0/3.1/3.2 specifications in December 2013, March 2014 and August 2015 respectively [97, 98, 99], during most of the time of this thesis there were no representative Android games available that used the OpenGL ES 3.XX API. Furthermore, at the time the research studies included in this dissertation were conducted, Gallium (softpipe, llvmpipe, swr) did not support a large number of the features included in OpenGL ES 3.XX. We hope that forthcoming developments in Gallium support for OpenGL ES 3.XX will allow us to include support for them in Teapot in the future. Moreover, with support for OpenGL ES 3.2 in Teapot we would be able to study tessellation, which increases the detail of the scene at the geometry level. Tessellation is a stage of the graphics pipeline that reads primitives and creates new ones by subdividing each input primitive, passing the results to the following stage of the pipeline, the Geometry Shading stage. This subdivision may greatly increase the number of primitives to be processed in the following stages of the pipeline: for example, uniformly subdividing each edge of a triangle into 8 segments produces 64 smaller triangles. Given that this subdivision is performed before the binning stage of TBR GPUs, the cost of handling an overwhelming increase in the number of primitives to be stored in the Parameter Buffer may offset the benefits of TBR in terms of main memory bandwidth savings if the binning stage is not significantly redesigned. It would be rather interesting to study how tessellation impacts TBR GPUs, as well as to propose new tessellation schemes specifically designed for such GPUs.

Frame-to-frame coherence denotes the fact that successive frames are likely to be very similar if the difference in time is small. In other fields, such as video encoding, frame coherence enables efficient calculation and storage of video sequences [197]. The main goal of exploiting frame coherence is to expose redundant computations across different frames and reuse them, thus avoiding unnecessary re-computations. Besides, by making use of spatial and temporal coherence, one can use knowledge about previous frames to influence scheduling in order to improve energy and performance, as we do with VRO. The trend towards higher frame rates will only increase frame coherence. In such a context, a growing number of screen regions are expected to keep the same colors from one frame to the next. We have studied whether, in a TBR GPU, there are tiles that are unmodified with respect to the corresponding tile of the previous frame. If so, we could reuse the previously rendered colors for those tiles. We have performed a short limit study for our set of commercial benchmarks, and the results show that a significant fraction of the tiles do not change from one frame to the next. Thus, all the fragments of these tiles that are sent to the Fragment Processors could be culled away and the frame would still be complete, so the potential of this technique is great both in terms of performance and energy. The idea is to avoid the rendering of identical tiles in a TBR GPU. The technique would generate a signature using information from the Geometry Pipeline and the inputs of the Fragment Processing stage. Later, in the first stage of the Raster Pipeline, the signature of the tile being rendered would be compared with the signature of the same tile in the previous frame. If both signatures are equal, the rendering is skipped, thus avoiding the execution of the costly Raster Pipeline for this tile.
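A minimal software sketch of this idea follows, assuming a simple FNV-1a hash as the signature function; the names TileSignatures, accumulate and unchanged are illustrative, not a committed design:

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// FNV-1a: a cheap hash, used here only to illustrate per-tile signatures.
uint64_t fnv1a(const uint8_t* data, size_t n, uint64_t h) {
    for (size_t i = 0; i < n; ++i) { h ^= data[i]; h *= 1099511628211ull; }
    return h;
}

struct TileSignatures {
    static constexpr uint64_t kBasis = 1469598103934665603ull;
    std::vector<uint64_t> prev, curr;  // one signature per screen tile
    explicit TileSignatures(size_t tiles) : prev(tiles, 0), curr(tiles, kBasis) {}

    // Fold the inputs of the Fragment Processing stage for this tile
    // (e.g., transformed primitives, shader and state identifiers)
    // into the tile's running signature.
    void accumulate(size_t tile, const uint8_t* input, size_t bytes) {
        curr[tile] = fnv1a(input, bytes, curr[tile]);
    }

    // Queried at the first stage of the Raster Pipeline: if the signature
    // matches the previous frame, the tile's rendering can be skipped and
    // the previously rendered colors reused.
    bool unchanged(size_t tile) const { return curr[tile] == prev[tile]; }

    void end_frame() {
        prev = curr;
        std::fill(curr.begin(), curr.end(), kBasis);
    }
};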

One of the main concerns when rendering a frame is the rendering resolution. If the resolution is low, image quality may be compromised, especially in the high-frequency areas of the image (object borders, sudden changes in color or lighting), where image artifacts are more likely to occur. In order to avoid some of these artifacts, techniques like super-sampling (SSAA) have been proposed, which consist of rendering the whole frame at a higher resolution and then down-sampling it. Although this technique may greatly improve image quality, it increases the amount of fragment shading and wastes resources when rendering the low-frequency areas of the frame. Multi-Sample Anti-Aliasing (MSAA) is a technique that improves image quality while reducing the high cost of SSAA: MSAA samples the color of a pixel at the rendering resolution, but increases the resolution of the depth and color buffers so that they can store several samples per pixel. MSAA reduces the amount of fragment shading with respect to SSAA, but it still increases the rendering cost for the low-frequency areas of the image. In order to avoid wasting resources, it would be interesting to detect the low- and high-frequency areas of a frame and use that information to adjust (increase or decrease) the rendering resolution of the following frame depending on the image frequency of the previous one, so that the final image would potentially have similar quality at a lower cost.
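As a rough illustration, image frequency could be estimated per tile from the luminance gradient of the previous frame. The following is a sketch under that assumption; tile_frequency is a hypothetical helper, and the threshold that maps this metric to a resolution choice would have to be tuned empirically:

#include <cmath>

// Mean absolute horizontal luminance difference within one tile of the
// previous frame: a cheap proxy for image frequency. 'luma' holds one
// luminance value per pixel of a frame W pixels wide; the tile starts at
// (x0, y0) and spans ts x ts pixels (ts >= 2 assumed).
float tile_frequency(const float* luma, int W, int x0, int y0, int ts) {
    float acc = 0.0f;
    int n = 0;
    for (int y = y0; y < y0 + ts; ++y)
        for (int x = x0; x + 1 < x0 + ts; ++x) {
            acc += std::fabs(luma[y * W + x + 1] - luma[y * W + x]);
            ++n;
        }
    return acc / n;  // high values suggest a high-frequency tile
}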


Appendices


A Visibility Rendering Order on IMR GPUs

In Chapter 4 we introduce VRO, a technique that reorders the geometry of a frame at draw-command/object granularity, effectively reducing the work executed in the processors of a GPU. We evaluate VRO on top of a TBR architecture. Nevertheless, our proposal is orthogonal to the type of rendering architecture and, therefore, it can also be implemented on top of an Immediate-Mode Rendering GPU (IMR GPU). In this chapter we evaluate VRO on top of an IMR GPU and we illustrate its usefulness using various unmodified commercial 3D applications, for which VRO achieves up to 1.32x speed-up (1.16x on average) and down to 0.79x energy consumption (0.85x on average) over the baseline IMR GPU. Furthermore, we also compare VRO against Z pre-pass, a popular software technique to reduce overshading.

A.1 Immediate Visibility Rendering Order

In TBR GPUs the rendering of a frame is decoupled into two main pipelines: geometry and rasterization. The geometry pipeline performs all the geometry-related operations and sorts the geometry of the whole frame into fixed-size tiles of screen pixels. Once the geometry pipeline has processed the whole frame, the raster pipeline performs the rasterization and the fragment-related operations tile by tile, which makes it possible to include local on-chip memories of the tile size. Hence, the reads and writes to the Color and Depth/Stencil buffers caused by overdraw are performed in these local on-chip memories, and when the rendering of a tile finishes the local buffers are flushed to main memory. On the contrary, in IMR GPUs the primitives of a draw command are processed through all the stages of the graphics pipeline as soon as they are generated. Given that with IMR the fragments of a draw command are not sorted into tiles, they can be located at any pixel of the screen. In such a context, the accesses to the Depth/Stencil and Color buffers are done through caches, which increases the off-chip memory bandwidth and wastes energy (prior works point out that accesses to main memory represent a large fraction of the GPU energy consumption [102, 153]).

Figure A.1: Memory bandwidth usage on a mobile GPU for a set of commercial Android games, broken down into Instructions, Textures, Depth Buffer, Color Buffer and Input Geometry. On average 98.5% of the bandwidth to main memory is caused by operations performed after rasterization.

Figure A.2: Graphics pipeline including VRO. The new hardware (Edge Inserter and Visibility Sort units) and the new Graph Buffer are added alongside the conventional pipeline: (1) edges produced by the Early-depth test of frame i, (2) edge insertions into the Visibility Graph, (3) graph traversal by the Visibility Sort unit, (4) object-ids of frame i+1, and (5) Visibility Order used by the Command Processor to render frame i+1.


Figure A.1 shows, for a set of commercial Android games, the memory bandwidth usage of a mobile GPU similar to the Ultra Low Power (ULP) GPU included in the NVIDIA Tegra SoC [70]. As can be seen, most of the traffic is caused by accesses to the Color Buffer, the Depth Buffer and Textures. All those accesses have in common that they are performed by stages of the Graphics Pipeline that work with fragments. This is not surprising, because the number of elements (primitives) processed in the Geometry Processing stage is around 91 times smaller than the number of elements (fragments) processed in the following stages of the Graphics Pipeline. For example, the instructions executed in the FPs represent around 94% of the total number of instructions executed in the GPU. Note that the Fragment Processing stage is the most power-consuming stage of the graphics pipeline [176]. As previously pointed out in the introduction of Chapter 4, although different stages of the Graphics Pipeline cull hidden surfaces, there is still room for improvement.

IMR-VRO can be divided into two conceptual stages that operate on consecutive frames: creation of the Visibility Graph and creation of the Visibility Rendering Order. As Figure A.2 shows, during the rendering of frame i the Early-depth Test is used for visibility determination purposes as usual. Each comparison between two fragments covering the same pixel position produces an ordered pair of objects (an edge), a precedence relationship indicating which object is in front of the other. These edges are sent to the Edge Inserter (1), which uses them (2) to build a directed graph, the Visibility Graph, that represents the depth hierarchy of the objects in frame i. Once the depth of the objects in frame i has been completely tested, the Visibility Sort unit traverses the graph (3) and creates a preliminary order which, along with the list of objects in frame i+1 (4), is employed by the Command Processor to guide the rendering of frame i+1 (5).
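The following minimal sketch illustrates, in software, the two structures involved: the Visibility Graph as a set of object-id edges, and a topological sort (Kahn's algorithm [156]) that derives a front-to-back order from it. Names such as VisibilityGraph and sort_order are illustrative, dense integer object-ids are assumed, and the hardware units described in this appendix differ from this simplification:

#include <queue>
#include <set>
#include <utility>
#include <vector>

struct VisibilityGraph {
    int num_objects;
    std::set<std::pair<int,int>> edges;  // (front, behind) object-id pairs

    explicit VisibilityGraph(int n) : num_objects(n) {}

    // Called for an Early-depth comparison between fragments of two
    // different objects: 'front' occludes 'behind' at this pixel.
    void insert_edge(int front, int behind) {
        if (front != behind) edges.insert({front, behind});
    }

    // Kahn's topological sort of the depth hierarchy. Cycles (mutual
    // occlusion) are broken here by appending the remaining nodes;
    // the real design may resolve them differently.
    std::vector<int> sort_order() const {
        std::vector<std::vector<int>> adj(num_objects);
        std::vector<int> indeg(num_objects, 0);
        for (const auto& e : edges) {
            adj[e.first].push_back(e.second);
            ++indeg[e.second];
        }
        std::queue<int> ready;
        for (int v = 0; v < num_objects; ++v)
            if (indeg[v] == 0) ready.push(v);
        std::vector<int> order;
        while (!ready.empty()) {
            int v = ready.front(); ready.pop();
            order.push_back(v);
            for (int w : adj[v])
                if (--indeg[w] == 0) ready.push(w);
        }
        for (int v = 0; v < num_objects; ++v)  // fallback for cycles
            if (indeg[v] > 0) order.push_back(v);
        return order;
    }
};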

A.1.1 Visibility Rendering Order Adjustments

The Command Processor receives the Visibility Rendering Order computed for the previous frame, which is sent by the Visibility Sort unit. At this point the Visibility Rendering Order only contains the object-ids of the objects in the previous frame, which may differ slightly from the ids of the objects in the current frame. Note that the GPU driver usually buffers at least one frame before flushing to the GPU [130, 92, 86, 90]. When all the draw commands of the new frame have been issued by the application and buffered by the GPU driver, the list of object-ids in the new frame is known and it is sent to the Command Processor. Therefore, as explained in Chapter 4, between the previous frame and the current one there may be two kinds of disparities: objects present in the Visibility Graph but not in the current frame, and objects present in the new frame but not in the Visibility Graph. The former are not scheduled by the Command Processor because they do not appear in the current frame. The latter must be scheduled by the Command Processor because they do appear in the current frame to be rendered; since we have no order information about those objects, we schedule them after the objects in the graph.

Note that objects with the Depth test disabled or with blending enabled cannot simply be placed at the end of the order list, because that could produce erroneous images, so these objects are scheduled in the same relative order in which they appear in the program rendering order. VRO respects the OpenGL standard in the sense that the result is the same as if the objects were processed in program rendering order. Fortunately, objects with depth disabled or blending enabled are commonly used by the GUI of applications and tend to be the last objects to appear in the program rendering order, so in practice they introduce minor constraints to the Visibility Rendering Order. A software sketch of these adjustments follows.
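The sketch below combines the disparity handling and the ordering constraints just described. The function adjust_order and its parameters are hypothetical names; for simplicity, constrained objects are appended last in program order, which matches the common case in which they belong to the GUI:

#include <unordered_set>
#include <vector>

// graph_order: preliminary order from the Visibility Sort unit (frame i).
// frame_objs: object-ids of the draw commands of frame i+1, in program order.
// constrained: objects with depth test disabled or blending enabled.
std::vector<int> adjust_order(const std::vector<int>& graph_order,
                              const std::vector<int>& frame_objs,
                              const std::unordered_set<int>& constrained) {
    std::unordered_set<int> present(frame_objs.begin(), frame_objs.end());
    std::vector<int> out;
    // 1. Keep graph-ordered objects that exist in the new frame and
    //    carry no ordering constraint (disparity type 1 is dropped here).
    for (int id : graph_order)
        if (present.count(id) && !constrained.count(id)) out.push_back(id);
    // 2. Append new objects with no visibility info yet, in program
    //    order (disparity type 2).
    std::unordered_set<int> emitted(out.begin(), out.end());
    for (int id : frame_objs)
        if (!emitted.count(id) && !constrained.count(id)) out.push_back(id);
    // 3. Constrained objects keep their program rendering order.
    for (int id : frame_objs)
        if (constrained.count(id)) out.push_back(id);
    return out;
}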

A.1.2 Visibility Rendering Order IMR GPU

Figure A.3: Baseline IMR GPU architecture (left) and IMR GPU architecture including VRO (right). The VRO version extends the Raster Unit with the Edge Filter, the Edge Inserter, the Graph Cache and the Visibility Sort unit.

As Figure A.3 shows, our technique is fully integrated in the graphics pipeline and includes several new pieces of hardware (see Section 4.3.2): the Edge Inserter unit, the Edge Filter, the Graph Cache and the Visibility Sort unit. Although the following paragraphs of this section are similar to the ones employed to introduce VRO on a TBR architecture, we believe that, for the sake of clarity, it is better to include them.

On the one hand, the insertion of an edge is done after the Early-depth Test has compared the two fragments involved, which determines the order relationship between them. This insertion is done in parallel with the Early-depth testing of other fragments.

The Early-depth unit sends the edges to the Edge Inserter unit, which is responsible for storing them in the Graph Buffer, a buffer in main memory accessed through the Graph Cache, which contains the Visibility Graph of the frame being rendered. The edge insertions take place at fragment granularity using the results of the Early-depth comparisons. However, since graph edges represent object pairs, a large number of tests actually produce the same edges. The Edge Filter is a small and fast associative on-chip structure that caches the most recently inserted edges and filters out redundant insertions to the Graph Buffer, thus avoiding unnecessary Graph Cache accesses. Thanks to this structure, the Edge Inserter unit accesses the Graph Cache less than once every thousand fragments on average.
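A behavioral sketch of the Edge Filter follows. It is illustrative only: the 32-entry, LRU configuration matches Table A.2, but the class and method names are hypothetical:

#include <algorithm>
#include <cstdint>
#include <vector>

// Small fully-associative LRU filter of recently inserted
// (front, behind) object-id pairs. insert() returns true only when
// the edge is not cached, i.e. when the Graph Buffer (through the
// Graph Cache) must actually be updated.
class EdgeFilter {
    std::vector<uint32_t> lru_;                 // most recent at the back
    static constexpr std::size_t kEntries = 32;
    static uint32_t key(uint16_t front, uint16_t behind) {
        return (uint32_t(front) << 16) | behind;
    }
public:
    bool insert(uint16_t front, uint16_t behind) {
        uint32_t k = key(front, behind);
        auto it = std::find(lru_.begin(), lru_.end(), k);
        if (it != lru_.end()) {                 // hit: refresh LRU position
            lru_.erase(it);
            lru_.push_back(k);
            return false;                       // redundant edge, filtered out
        }
        if (lru_.size() == kEntries) lru_.erase(lru_.begin());  // evict LRU
        lru_.push_back(k);
        return true;                            // miss: forward to Graph Cache
    }
};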

On the other hand, the Visibility Sort unit sorts the Visibility Graph and creates a preliminary ordered list of nodes, which is sent to the Command Processor. After the adjustments needed to satisfy the restrictions presented in Subsection A.1.1, the Command Processor produces the final Visibility Rendering Order. Although other schemes could be adopted, we perform these operations at the beginning of the rendering of the next frame. We have measured that, on average, the total time required for this process represents around 0.06% of the execution time of the graphics pipeline, so minimal overheads are introduced.

The Graphics Pipeline, instead of reading the draw commands in program rendering order, reads the primitives in Visibility Rendering Order, which increases the culling effectiveness of the Early-depth test and reduces the number of fragments that are processed in the following stages of the Graphics Pipeline.

The IMR-VRO unit has been modeled using McPAT components, shown in parentheses in the following list: Graph Cache (Cache); EQ Comparators (XOR); Muxes (MUX); Min-Comparator (ALU); Adders (ALU); Subtractors (ALU); and registers. The area overhead of VRO is less than 1% with respect to the baseline.

A.2 Experimental Framework

We model not only the baseline GPU architecture, which closely resembles that of the ULP GeForce in the Tegra 3, but also IMR-VRO on top of the baseline GPU. In our experiments we employ a set of benchmarks composed of eight different commercial Android 3D applications (see Table A.1).

Table A.1: Benchmarks.

Benchmark          Alias     Description
300                300       hack & slash
Air Attack         Air       flight arcade
Captain America    Cap       beat'em up
Crazy Snowboard    Crazy     snowboard arcade
Forest 2           Forest    horror
Gravity            Grav      action
Striker            Striker   first person shooter
Temple Run         Temple    adventure arcade

A.3 IMR-VRO Results

Figure A.4 shows the speed-up achieved by our technique with respect to the baseline IMR GPU. As shown, IMR-VRO achieves up to 1.32x speed-up (Striker) and 1.17x on average, the lowest speed-up being 1.015x (300). Figure A.5 shows the normalized energy consumption of IMR-VRO with respect to the baseline. As can be observed, in the best case (Striker) IMR-VRO consumes 0.79x energy, 0.85x on average, and 0.96x in the worst case (300).

Table A.2: GPU Simulation Parameters.

Baseline GPU Parameters
  Tech Specs            400 MHz, 1 V, 32 nm
  Screen Resolution     1200x768
Queues
  Vertex (2x)           16 entries, 136 bytes/entry
  Triangle              16 entries, 388 bytes/entry
  Fragment              64 entries, 233 bytes/entry
Caches
  Vertex Cache          64 bytes/line, 2-way associative, 4 KB, 1 bank, 1 cycle
  Texture Caches (8x)   64 bytes/line, 2-way associative, 8 KB, 1 bank, 1 cycle
  Color Cache (8x)      64 bytes/line, 2-way associative, 8 KB, 1 bank, 1 cycle
  Depth Cache           64 bytes/line, 2-way associative, 4 KB, 1 bank, 1 cycle
  L2 Cache              64 bytes/line, 8-way associative, 256 KB, 8 banks, 2 cycles
Non-programmable stages
  Primitive assembly    1 triangle/cycle
  Rasterizer            4 attributes/cycle
  Early Z test          32 in-flight quad-fragments, 1 Depth Buffer
Programmable stages
  Vertex Processor      4 vertex processors
  Fragment Processor    8 fragment processors
Main memory
  Latency               50-100 cycles
  Bandwidth             4 bytes/cycle (dual channel)
Extra Hardware IMR-VRO GPU
  Edges Filter          32 elements, LRU, 1 cycle
  Graph Cache           64 bytes/line, 4-way associative, 4 KB, 1 bank, 1 cycle
  Edge Insertion        1 Edge Inserter unit
  Graph Sort            1 Visibility Sort unit
  Edges Queue           64 entries, 4 bytes/entry
  Order Queue           64 entries, 2 bytes/entry

The speed-up and the reduction in energy consumption are directly caused by the fact that IMR-VRO processes the GPU commands in Visibility Rendering Order, which increases the culling rate of the Early-depth stage, avoiding further processing of non-visible fragments. Figure A.6 shows the energy breakdown of IMR-VRO with respect to the baseline GPU. As can be seen, most of the energy consumption is caused by the main memory and the Fragment Processors.

Figure A.4: Speed-up of IMR-VRO normalized to the baseline IMR GPU.

Figure A.5: Energy consumption of IMR-VRO normalized to the baseline IMR GPU.

Figure A.7 shows both the overshading and the number of instructions executed in the Fragment Processors of IMR-VRO, normalized to those of the baseline. As can be observed, the first bar closely correlates with the speed-up and the energy consumption shown in Figures A.4 and A.5. On average, the number of fragments that pass the Early-depth test decreases to 0.77x, being 0.943x in the worst case (300) and 0.64x in the best case (Striker). Likewise, the number of instructions executed in the Fragment Processors decreases to 0.71x on average, being 0.938x in the worst case (300) and 0.54x in the best case (Forest), which translates into significant energy savings. Note that the first and second bars of Figure A.7 show different reductions because the objects in a scene may execute different Shader Programs (with different numbers of instructions).

Figure A.6: Energy breakdown of IMR-VRO normalized to the baseline IMR GPU.

Figure A.7: Overshading and instructions executed in the Fragment Processors with IMR-VRO, normalized to those of the baseline IMR GPU.

Figure A.8 shows the ratio between the time savings (produced by the overshading reduction) and the time overhead (produced by the computation of the Visibility Rendering Order) with VRO. As can be seen, the time and energy savings of IMR-VRO far exceed the cost of producing the Visibility Graph and the Visibility Rendering Order.

Note that IMR-VRO not only reduces the execution time but also the energy consumption of the system. The Energy Delay Product (EDP) [159] is a metric that evaluates the energy efficiency of a system by combining both factors. Figure A.9 shows the EDP of IMR-VRO normalized to the EDP of the baseline GPU.
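For reference, the EDP of a workload is simply the product of the energy it consumes and its execution time (lower is better), so the normalized values reported in Figure A.9 correspond to

EDP = E x D,    normalized EDP = (E_VRO x D_VRO) / (E_baseline x D_baseline),

where E is the energy consumed and D the delay (execution time).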

As previously pointed out, a huge part of the traffic with main memory is produced by operations related to fragments (98.5% on average).

Figure A.8: Ratio between the time savings and the time overhead of VRO, on a logarithmic scale (higher is better).

Figure A.9: Energy Delay Product of IMR-VRO normalized to the baseline IMR GPU (lower than 1 is better).

Figure A.10: Memory bandwidth usage of IMR-VRO normalized to the baseline IMR GPU, broken down into INSTRUCTIONS, TEXELS, PIXELS_Z, EDGES, PIXEL_COLORS and INPUT_GEOMETRY.

Recall that in our experiments the Graphics Pipeline processes around 91x more fragments than primitives. Therefore, the decrease in the number of fragments that pass the Early-depth test also produces a significant reduction in the main memory traffic: up to 23.5%, and 15% on average (see Figure A.10). This reduction is due to the fact that IMR-VRO reduces overshading and, hence, the main memory accesses performed in the Fragment Processing stage of the graphics pipeline: reads/writes to the Color Buffer, writes to the Depth Buffer and texture reads, among others. Figure A.11 shows statistics of the caches related to those buffers, as well as of the L2 cache. As can be seen, the number of misses of the Color, Depth and Texture caches is reduced in all cases. The number of writebacks of the Color and Depth caches is reduced too. These reductions decrease the number of misses and writebacks of the L2 cache, which ultimately reduces the traffic with main memory.

Figure A.11: Activity factors (accesses, hits, misses, writebacks) with IMR-VRO normalized to those of the baseline IMR GPU, for the Color cache (top left), Depth cache (top right), Texture caches (bottom left) and L2 cache (bottom right).

A.3.1 Software Z-Prepass

Z pre-pass (a.k.a. Depth pre-pass) is a common software approach aimed at reducing overshading. It has gained traction in recent years and can be considered the standard way, on both IMR and TBR GPUs, to reduce overshading in complex games with multiple dynamic objects and costly Fragment Shaders. Some vendors, like ARM, recommend using it when setting the rendering order of the objects is not possible because of the complexity of the scene:

“For example, if you have a set of objects with computationally expensive shaders and the camera can rotate around them freely, some objects that were at the back can move to the front. In this case, if there is a static rendering order set for these objects, some might be drawn last, even if they are occluded. This can also happen if an object can cover parts of itself.” [4]

Z pre-pass exploits the Early-depth test by means of two rendering passes. The first pass executes the pipeline stages up to the Early-depth test, which stores the depths of the visible fragments in the Depth Buffer. In the second pass the full pipeline is executed and only the visible fragments pass the Early-depth test, so only visible fragments are shaded in the Fragment Processors. This technique doubles the cost of some stages of the pipeline (see Figure A.12), which is unacceptable in many scenarios: even when the benefits of Z pre-pass exceed its cost, the extra depth pass represents a large portion of the total rendering time. For example, Z pre-pass can be found in The Blacksmith [6], a cutting-edge real-time demo of the latest version of Unity [21] made to showcase the most advanced graphics features that the game engine offers. We have studied three different frames of The Blacksmith, where Z pre-pass represents 41.4%, 29.3% and 26.1% of the total rendering time respectively (real-hardware measurements reported by Renderdoc [20] using an NVIDIA GTX 950 GPU).
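For reference, a depth pre-pass can be expressed with a few standard OpenGL (ES) state changes. In this sketch, drawScene, depthOnlyProgram and fullShadingProgram are hypothetical application-side helpers:

// First pass: populate the Depth Buffer only. Color writes are
// disabled and a trivial fragment shader can be bound, so this pass
// records the nearest depth per pixel at minimal shading cost.
glEnable(GL_DEPTH_TEST);
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glDepthMask(GL_TRUE);
glDepthFunc(GL_LESS);
drawScene(depthOnlyProgram);

// Second pass: full shading. Depth writes are disabled and the test
// is set to GL_EQUAL, so only the fragments whose depth matches the
// pre-pass result (the visible ones) reach the Fragment Processors.
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glDepthMask(GL_FALSE);
glDepthFunc(GL_EQUAL);
drawScene(fullShadingProgram);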

Figure A.12: Simplified version of the Graphics Pipeline executing Z pre-pass. The first rendering pass runs Geometry Processing, Rasterization and the Early-depth test; the second pass runs Geometry Processing, Rasterization, the Early-depth test and Fragment Processing.

Figure A.13: Speed-up of Z pre-pass normalized to the baseline IMR GPU.

Given that Z pre-pass has the same target as VRO, we have conducted an experiment to measure its effectiveness on common mobile graphics workloads. Figures A.13 and A.14 show the speed-up and energy consumption of Z pre-pass. As can be seen, Z pre-pass introduces a slowdown in all cases (0.82x on average). Regarding energy, in Forest 2, Gravity and Temple Run, Z pre-pass reduces the energy consumption in the GPU by less than 5%, but it increases the traffic with main memory by almost 20% (see Figure A.16), which, along with the extra execution time, increases the total energy consumption in all cases (1.28x on average). Likewise, in the other benchmarks the increase in main memory traffic and execution time greatly penalizes the energy consumption.


Figure A.14: Energy consumption of Z pre-pass normalized to the baseline IMR GPU.

Figure A.15: Energy Delay Product of Z pre-pass normalized to the baseline IMR GPU (lower than 1 is better).

Figure A.16: Memory bandwidth usage of Z pre-pass normalized to the baseline IMR GPU, broken down into INSTRUCTIONS, TEXELS, PIXELS_Z, PIXEL_COLORS and INPUT_GEOMETRY.


The normalized EDP of Z pre-pass is much worse than that of the baseline in all cases, being 1.56x on average. According to these results, Z pre-pass is not a suitable technique to reduce either execution time or energy consumption in applications targeted at low-power GPUs, because the overhead incurred by the extra rendering pass more than offsets the time and energy savings of the reduced fragment processing. Nevertheless, it makes sense to apply Z pre-pass in contexts where Fragment Processing represents a higher computing demand, as in consoles and desktops. Note that the higher the rendering resolution or the computational cost per fragment (more complex shaders), the more expensive the rendering in terms of fragment processing.

We have conducted experiments to compare VRO and Z pre-pass in a scenario with stressed fragment processors. Given that we do not have the source code of the benchmarks, we exacerbate the cost of fragment processing by reducing the number of Fragment Processors in the GPU to just one. In this way, the number of fragments processed per Fragment Processor increases eightfold. Our goal is to approximate the effects of increasing the resolution from 0.92 Megapixels (1200x768) to 7.4 Megapixels (1200x768x8), a rendering resolution that represents 89% of the 8.3 Megapixels of 4K. Current consoles like the Xbox One [75] and the PS4 [73] commonly render at 1920x1080 resolution (or lower), and in desktops the rendering resolution can reach 3840x2160 (4K).

Figure A.17: Speed-up of VRO and Z pre-pass normalized to the baseline IMR GPU (one Fragment Processor in all cases).

Figure A.17 shows the speed-up of both VRO and Z pre-pass when executed on a GPU with just one Fragment Processor. In all the applications but one, the performance of VRO is better than that of Z pre-pass: VRO achieves 1.31x speed-up on average while Z pre-pass achieves 1.18x. VRO shows no slowdown, whereas Z pre-pass introduces a slowdown in 300. Although Z pre-pass achieves a higher overshading reduction than VRO, its extra rendering pass greatly penalizes its execution time. Regarding energy, VRO consumes 0.81x and Z pre-pass 1.17x. In Gravity, Z pre-pass performs slightly better than VRO in terms of speed-up, but in terms of energy it is significantly worse, because it significantly increases the traffic with main memory. In terms of EDP, VRO is much more efficient than Z pre-pass in all cases (see Figure A.19), being 0.62x on average, while the EDP of Z pre-pass is 0.99x, almost equal to that of the baseline GPU.

Figure A.18: Energy consumption of VRO and Z pre-pass normalized to the baseline IMR GPU (one Fragment Processor in all cases).

Figure A.19: Energy Delay Product of VRO and Z pre-pass normalized to the baseline IMR GPU (lower than 1 is better; one Fragment Processor in all cases).

A.4 Conclusions

This appendix presents IMR-VRO, a technique that effectively reduces overshading by culling hidden surfaces. IMR-VRO includes a small hardware unit that stores, in a buffer, the order relationships among the objects of the scene in the current frame. This information is used in the next frame to create a Visibility Rendering Order that guides the Command Processor. The hardware overhead of this technique is minimal, requiring less than 1% of the total area of the GPU, while its latency is vastly offset by the reduction in the execution time of the Graphics Pipeline.

For a set of unmodified commercial Android applications, IMR-VRO outperforms state-of-the-art techniques in performance and energy consumption by reducing overshading without the need for an expensive extra rendering pass to determine visibility. IMR-VRO achieves up to 1.32x speed-up (1.16x on average) and down to 0.79x energy consumption (0.85x on average) with respect to the baseline IMR GPU. IMR-VRO is much more efficient than Z pre-pass because most of the computations required to create the Visibility Rendering Order are reused from the rendering of the previous frame, while Z pre-pass requires an extra render pass that introduces significant overheads.


Bibliography

[1] Falanx mali 110/55 ip cores first to support latest opengl es 1.1 standard. http://www.prnewswire.com/news-releases/falanx-mali-11055-ip-cores-first-to-support-latest-opengl-es-11-standard-54772052.html, 2005.

[2] Designing physics algorithms for gpu architecture. Game Developers Conference. http://www.gdcvault.com/play/1014278/Physics-for-Game, 2011.

[3] Mobile app store downloads worldwide (2010-2016). http://www.gartner.com/newsroom/id/2592315, 2013.

[4] Arm guide to unity enhancing your mobile games: Use depth pre-pass. http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.100140_0201_00_en/nic1434707722257.html, 2014.

[5] Mobile device shipments 2013. http://www.gartner.com/newsroom/id/2875017, 2014.

[6] The blacksmith. https://unity3d.com/es/pages/the-blacksmith, 2015.

[7] Mobile device shipments 2014. http://www.gartner.com/newsroom/id/3088221, 2015.

[8] The most used smartphone screen resolutions in 2016. https://deviceatlas.com/blog/most-used-smartphone-screen-resolutions-in-2016, 2016.

[9] Mobile device shipments 2015. http://www.gartner.com/newsroom/id/3468817, 2016.

[10] Android platform architecture. https://developer.android.com/guide/platform/index.html, 2017.

[11] Hardware gpu market. http://hwstats.unity3d.com/mobile/gpu.html, 2017.

[12] ios developer webpage. https://developer.apple.com/develop/, 2017.

[13] Most popular apple app store categories in march 2017. https://www.statista.com/statistics/270291/popular-categories-in-the-app-store/, 2017.

[14] Mobile device shipments 2016. http://www.gartner.com/newsroom/id/3560517, 2017.

[15] Number of apps available in leading app stores as of march 2017. https://www.statista.com/statistics/276623/number-of-apps-available-in-leading-app-stores/, 2017.

[16] Opengl es 1.x. https://www.khronos.org/api/opengles/1_X, 2017.

[17] Opengl es 3.x. https://www.khronos.org/api/opengles/3_X, 2017.

[18] Opengl es conformant products. https://www.khronos.org/conformance/adopters/conformant-products#opengles, 2017.

[19] Operating system market share worldwide (june 2012 to june 2017). http://gs.statcounter.com/os-market-share#monthly-201206-201706, 2017.

[20] Renderdoc. https://renderdoc.org, 2017.

[21] Unity 3d game engine. https://en.wikipedia.org/wiki/Unity_(game_engine), 2017.

[22] Unreal engine. https://docs.unrealengine.com/latest/INT/Platforms/Mobile/index.html, 2017.

[23] Vivante vega 3d technology. http://www.vivantecorp.com/index.php/en/technology/3d.html, 2017.

[24] Cobra 3d model by alexander bruckner. https://free3d.com/3d-model/ac-cobra-269-83668.html, accessed August 15, 2017.

[25] Line clipping. https://en.wikipedia.org/wiki/Cohen%E2%80%93Sutherland_algorithm, accessed August 15, 2017.

[26] Painter’s algorithm. https://en.wikipedia.org/wiki/Painter%27s_algorithm, accessed August 15, 2017.

[27] Polygon clipping. https://en.wikipedia.org/wiki/Sutherland%E2%80%93Hodgman_algorithm, accessed August 15, 2017.

[28] Z-buffer algorithm. https://en.wikipedia.org/wiki/Z-buffering#Z-Culling, accessed August 15, 2017.

[29] Sweep and prune. https://en.wikipedia.org/wiki/Sweep_and_prune, accessed August 29, 2017.

[30] Qualcomm snapdragon s4 (krait) performance preview. http://www.anandtech.com/show/5559/qualcomm-snapdragon-s4-krait-performance-preview-msm8960-adreno-225-benchmarks/4, accessed July 10, 2017.

[31] J.d. power ratings. smartphone battery life has become a significant drain on customer satisfaction and loyalty. http://www.jdpower.com/press-releases/2012-us-wireless-smartphone-and-traditional-mobile-phone-satisfaction-studies-volume, accessed July 15, 2017.

[32] J.d. power ratings. wireless charging and fingerprint scanner technology amp up smartphone user satisfaction, says j.d. power study. http://www.jdpower.com/press-releases/2016-us-wireless-smartphone-satisfaction-study-volume-1, accessed July 15, 2017.

[33] Qualcomm. power performance white paper. when mobile apps use too much power. https://developer.qualcomm.com/software/trepn-power-profiler, accessed July 15, 2017.

[34] Snapdragon 410 processor. https://www.qualcomm.com/products/snapdragon/processors/410, accessed July 15, 2017.

[35] Trepn power profiler. https://developer.qualcomm.com/software/trepn-power-profiler, accessed July 15, 2017.

[36] Antutu benchmark. http://www.antutu.com/en/index.htm, accessed July, 2017.

[37] Adreno gpus. https://en.wikipedia.org/wiki/Adreno, accessed June, 2017.

[38] Adreno gpus. https://developer.qualcomm.com/software/adreno-gpu-sdk/gpu, accessed June, 2017.

[39] Apple a10. https://en.wikipedia.org/wiki/Apple_A10, accessed June, 2017.

[40] Apple a8. https://en.wikipedia.org/wiki/Apple_A8, accessed June, 2017.

[41] Apple a9. https://en.wikipedia.org/wiki/Apple_A9, accessed June, 2017.

[42] Directx. https://en.wikipedia.org/wiki/DirectX, accessed June, 2017.

[43] Flexrender. https://www.qualcomm.com/videos/flexrender, accessed June, 2017.

[44] Google play store. https://play.google.com/store, accessed June, 2017.

[45] Helio x10. http://mediatek-helio.com/x10/, accessed June, 2017.

[46] Helio x30. https://www.mediatek.com/products/smartphones/mediatek-helio-x30, accessed June, 2017.

[47] Htc one m9. https://en.wikipedia.org/wiki/HTC_One_M9, accessed June, 2017.

[48] Ipad. https://en.wikipedia.org/wiki/IPad, accessed June, 2017.

[49] Iphone 6. https://en.wikipedia.org/wiki/IPhone_6, accessed June, 2017.

[50] Iphone 7. https://en.wikipedia.org/wiki/IPhone_7, accessed June, 2017.

[51] itunes. https://itunes.apple.com/us/genre/ios/id36?mt=8, accessed June, 2017.

[52] Khronos group. https://www.khronos.org/about, accessed June, 2017.

[53] Mali 200 gpu. https://www.arm.com/products/mali-200.php, accessed June, 2017.

[54] Mali 400 mp series gpu. https://developer.arm.com/products/graphics-and-multimedia/mali-gpus/mali-400-mp-series-gpu, accessed June, 2017.

[55] Mali graphics processing from arm. http://www.arm.com/products/graphics-and-multimedia/mali-gpu, accessed June, 2017.

[56] Nintendo switch. https://en.wikipedia.org/wiki/Nintendo_Switch, accessed June, 2017.

[57] Opengl es. https://www.khronos.org/opengles/, accessed June, 2017.

[58] Opengl es 2.x. https://www.khronos.org/opengles/2_X/, accessed June, 2017.

[59] Opengl es overview. http://en.wikipedia.org/wiki/OpenGL_ES, accessed June, 2017.

[60] Powervr gpus. https://en.wikipedia.org/wiki/PowerVR, accessed June, 2017.

[61] Powervr gpus. https://en.wikipedia.org/wiki/PowerVR, accessed June, 2017.

[62] Psvita. https://en.wikipedia.org/wiki/PlayStation_Vita, accessed June, 2017.

[63] Samsung exynos soc. http://www.samsung.com/semiconductor/minisite/Exynos/w/, accessed June, 2017.

[64] Samsung galaxy s5. https://en.wikipedia.org/wiki/Samsung_Galaxy_S5, accessed June, 2017.

[65] Samsung galaxy s8. https://en.wikipedia.org/wiki/Samsung_Galaxy_S8, accessed June, 2017.

[66] Shield tv. https://www.nvidia.es/shield/shield-tv/, accessed June, 2017.

[67] Snapdragon 600. https://www.qualcomm.com/products/snapdragon/processors/600, accessed June, 2017.

[68] Snapdragon msm8x55. https://www.qualcomm.com/media/documents/files/snapdragon-msm8x55-apq8055-product-brief.pdf, accessed June, 2017.

[69] Snapdragon socs. https://www.qualcomm.com/products/snapdragon, accessed June, 2017.

[70] Tegra processors. http://www.nvidia.com/object/tegra.html, accessed June, 2017.

[71] Tegra x1 white paper. http://international.download.nvidia.com/pdf/tegra/Tegra-X1-whitepaper-v1.0.pdf, accessed June, 2017.

[72] Vega powered products. http://www.vivantecorp.com/index.php/en/products/powered-by-vivante.html, accessed June, 2017.

[73] Playstation 4 technical specifications. https://en.wikipedia.org/wiki/PlayStation_4_technical_specifications, accessed November 9, 2017.

[74] Qemu: the fast! processor emulator. https://www.qemu.org, accessed November 9, 2017.

[75] Xbox One. https://en.wikipedia.org/wiki/Xbox_One, accessed November 9, 2017.

[76] Common reasons users uninstall mobile apps. http://www.gummicube.com/blog/2017/08/common-reasons-users-uninstall-mobile-apps, accessed October 16, 2017.

[77] A comprehensive survey of 3,000+ mobile app users. https://techbeacon.com/resources/survey-mobile-app-users-report-failing-meet-user-expectations, accessed October 16, 2017.

[78] Top 12 reasons why users frequently uninstall mobile apps. https://www.linkedin.com/pulse/top-12-reasons-why-users-frequently-uninstall-mobile-apps-fakhruddin/, accessed October 16, 2017.

[79] Why do users install and delete apps? a survey study. http://www.gonzalez-huerta.net/wp-content/uploads/2017/06/ICSOB17.pdf, accessed October 16, 2017.

[80] Why users uninstall apps. https://software.intel.com/en-us/blogs/2013/11/14/why-users-uninstall-apps, accessed October 16, 2017.

[81] Netflix. https://www.netflix.com, accessed October, 2017.

[82] Rasterization: a practical implementation. https://www.scratchapixel.com/lessons/3d-basic-rendering/rasterization-practical-implementation/rasterization-stage, accessed October, 2017.

[83] Software rasterization algorithms for filling triangles. http://www.sunshine2k.de/coding/java/TriangleRasterization/TriangleRasterization.html, accessed October, 2017.

[84] Tg shader infrastructure. https://www.freedesktop.org/wiki/Software/gallium/tgsi-specification.pdf, accessed October, 2017.

[85] Thermal throttling. which soc's are the worst offenders? https://www.mobiledroid.co.uk/blog/thermal-throttling-which-socs-are-worst/, accessed October, 2017.

[86] Amd comments on gpu stuttering. http://www.anandtech.com/show/6857/amd-stuttering-issues-driver-roadmap-fraps/2, accessed October 29, 2016.

[87] Android studio. https://developer.android.com/studio/intro/index.html, accessed October 29, 2016.

[88] Gallium3d. https://www.freedesktop.org/wiki/Software/gallium/, accessed October 29, 2016.

[89] Mali-400 mp: A scalable gpu for mobile devices. http://www.highperformancegraphics.org/previous/www_2010/media/Hot3D/HPG2010_Hot3D_ARM.pdf, accessed October 29, 2016.

[90] The mali gpu: An abstract machine, part 1 - frame pipelining. https://community.arm.com/groups/arm-mali-graphics/blog/2014/02/03, accessed October 29, 2016.

[91] Powervr. http://www.imgtec.com/powervr/powervr-architecture.asp, accessed October 29, 2016.

[92] Stuttering in game graphics: Detection and solutions. https://developer.nvidia.com/sites/default/files/akamai/gameworks/CN/Stuttering_Analysis_EN.pdf, accessed October 29, 2016.

[93] Tegra 4 white paper. http://www.nvidia.com/docs/IO/116757/Tegra_4_GPU_Whitepaper_FINALv2.pdf, accessed October 29, 2016.

[94] Arm cortex a9. http://arm.com/products/processors/cortex-a/cortex-a9.php, accessed September 9, 2014.

[95] Bullet projects. http://en.wikipedia.org/wiki/Bullet_%28software%29, accessed September 9, 2014.

[96] Gfxbench. https://gfxbench.com/benchmark.jsp, accessed September 9, 2017.

[97] The khronos group inc. opengl es 3.0.5 specification. https://www.khronos.org/registry/OpenGL/specs/es/3.0/es_spec_3.0.pdf, accessed September 9, 2017.

[98] The khronos group inc. opengl es 3.1 specification. https://www.khronos.org/registry/OpenGL/specs/es/3.1/es_spec_3.1.pdf, accessed September 9, 2017.

[99] The khronos group inc. opengl es 3.2 specification. https://www.khronos.org/registry/OpenGL/specs/es/3.2/es_spec_3.2.pdf, accessed September 9, 2017.

[100] The khronos group inc. opengl es common profile specification version 2.0.25. http://www.khronos.org/registry/gles/specs/2.0/es_full_spec_2.0.25.pdf, accessed September 9, 2017.

[101] The khronos group inc. opengl es common/common-lite profile specification version 1.1.12. http://www.khronos.org/registry/gles/specs/1.1/es_full_spec_1.1.12.pdf, accessed September 9, 2017.

[102] T. Akenine-Moller and J. Strom. Graphics processing units for handhelds. Proceedings of the IEEE, 96(5):779–789, May 2008.

[103] T. Akenine-Moller and J. Strom. Graphics processing units for handhelds. Proc. of the IEEE, 96(5):779–789, May 2008.

[104] Tomas Akenine-Möller, Eric Haines, and Natty Hoffman. Real-Time Rendering 3rd Edition. A. K. Peters, Ltd., Natick, MA, USA, 2008.

[105] Tomas Akenine-Möller and Jacob Ström. Graphics for the masses: A hardware rasterization architecture for mobile phones. ACM Trans. Graph., 22(3):801–808, July 2003.

[106] I. Antochi. Suitability of Tile-based Rendering for Low-power 3d Graphics Accelerators. Universitatea Politehnica Bucureşti, 2007.

[107] Sigal Ar, Bernard Chazelle, and Ayellet Tal. Self-customized bsp trees for collision detection. Comput. Geom. Theory Appl., 15(1-3):91–102, February 2000.

[108] J. Arnau. Energy-Efficient Mobile GPU Systems. PhD thesis, Universitat Politècnica de Catalunya, Apr 2015.

[109] J.-M. Arnau, J.-M. Parcerisa, and P. Xekalakis. Boosting mobile gpu performance with a decoupled access/execute fragment processor. In Comp. Archit. (ISCA), 2012 39th Annual Int. Symp. on, pages 84–93, June 2012.

[110] Jose-Maria Arnau, Joan-Manuel Parcerisa, and Polychronis Xekalakis. Parallel frame rendering: Trading responsiveness for energy on a mobile gpu. In Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, PACT '13, pages 83–92, Piscataway, NJ, USA, 2013. IEEE Press.

[111] Jose-Maria Arnau, Joan-Manuel Parcerisa, and Polychronis Xekalakis. Teapot: A toolset for evaluating performance, power and image quality on mobile graphics systems. In Proc. of the 27th Int. ACM Conf. on Int. Conf. on Supercomputing, ICS '13, pages 37–46, New York, NY, USA, 2013. ACM.

[112] Jose-Maria Arnau, Joan-Manuel Parcerisa, and Polychronis Xekalakis. Eliminating redundant fragment shader executions on a mobile gpu via hardware memoization. In Proceedings of the 41st Annual International Symposium on Computer Architecture, ISCA '14, pages 529–540, Piscataway, NJ, USA, 2014. IEEE Press.

[113] G. Baciu and Wingo Sai-Keung Wong. Rendering in object interference detection on conventional graphics workstations. In Proc. of the 5th Pacific Conf. on Comp. Graphics and Applications, PG '97, pages 51–, Washington, DC, USA, 1997. IEEE Comp. Society.

[114] George Baciu and Wingo S. K. Wong. Image-based techniques in a hybrid collision detector. IEEE Transactions on Visualization and Comp. Graphics, 9(2):254–271, April 2003.

[115] George Baciu and Wingo Sai-Keung Wong. Image-based collision detection. In David D. Zhang, Mohamed Kamel, and George Baciu, editors, Integrated Image and Graphics Technologies, volume 762 of The Int. Series in Engineering and Comp. Science, pages 75–94. Springer US, 2004.

[116] A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt. Analyzing workloads using a detailed gpu simulator. In 2009 IEEE International Symposium on Performance Analysis of Systems and Software, pages 163–174, April 2009.

[117] Dirk Bartz, Michael Meißner, and Tobias Hüttner. Opengl-assisted occlusion culling for large polygonal models, 1999.

[118] Jiri Bittner, Michael Wimmer, Harald Piringer, and Werner Purgathofer. Coherent Hierarchical Culling: Hardware Occlusion Queries Made Useful. Computer Graphics Forum, 2004.

[119] Pierre Charbit, Stéphan Thomassé, and Anders Yeo. The minimum feedback arc set problem is np-hard for tournaments. Comb. Probab. Comput., 16(1):1–4, January 2007.

[120] Niladrish Chatterjee, Mike O'Connor, Donghyuk Lee, Daniel R. Johnson, Stephen W. Keckler, Minsoo Rhu, and William J. Dally. Architecting an energy-efficient dram system for gpus. In 23rd International Symposium on High Performance Computer Architecture, HPCA, 2017.

[121] Wei Chen, Huagen Wan, Hongxin Zhang, Hujun Bao, and Qunsheng Peng. Interactive collision detection for complex and deformable models using programmable graphics hardware. In Proceedings of the ACM Symposium on Virtual Reality Software and Technology, VRST '04, pages 10–15, New York, NY, USA, 2004. ACM.

[122] Xiang Chen, Yiran Chen, Zhan Ma, and Felix C. A. Fernandes. How is energy consumed in smartphone display applications? In Proceedings of the 14th Workshop on Mobile Computing Systems and Applications, HotMobile '13, pages 3:1–3:6, New York, NY, USA, 2013. ACM.

[123] Slo-Li Chu, Chih-Chieh Hsiao, and Chiu-Cheng Hsieh. An energy-efficient unified register file for mobile gpus. In Embedded and Ubiquitous Computing (EUC), 2011 IFIP 9th Int. Conf. on, pages 166–173, Oct 2011.

[124] Petrik Clarberg, Robert Toth, and Jacob Munkberg. A sort-based deferred shading architecture for decoupled sampling. ACM Trans. Graph., 32(4):141:1–141:10, July 2013.

[125] D. Cohen-Or, Y. L. Chrysanthou, C. T. Silva, and F. Durand. A survey of visibility for walkthrough applications. IEEE Transactions on Visualization and Computer Graphics, 9(3):412–431, July 2003.

[126] S. Collange, M. Daumas, D. Defour, and D. Parello. Barra: A parallel functional simulator for gpgpu. In 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, pages 351–360, Aug 2010.

[127] Qt Community. Threads Events QObjects. https://wiki.qt.io/Threads_Events_ QObjects, 2015. [Online; accessed 10-August-2017].

[128] Shane Cook. CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1st edition, 2013.

[129] E. Coumans. Bullet physics library. https://code.google.com/p/bullet/downloads/list, 2013.

[130] Patrick Cozzi and Christophe Riccio. OpenGL Insights pp. 396, 417, 494. CRC Press, July 2012. http://www.openglinsights.com/.

[131] Enrique de Lucas, Pedro Marcuello, Joan-Manuel Parcerisa, and Antonio González. Ultra-low power render-based collision detection for cpu/gpu systems. In Proceedings of the 48th International Symposium on Microarchitecture, MICRO-48, pages 445–456, New York, NY, USA, 2015. ACM.

[132] Michael Deering, Stephanie Winner, Bic Schediwy, Chris Duffy, and Neil Hunt. The triangle processor and normal vector shader: A vlsi system for high performance graphics. SIGGRAPH Comput. Graph., 22(4):21–30, June 1988.

[133] V. M. del Barrio, C. Gonzalez, J. Roca, A. Fernandez, and E. Espasa. Attila: a cycle-level execution-driven simulator for modern gpu architectures. In 2006 IEEE International Symposium on Performance Analysis of Systems and Software, pages 231–241, March 2006.

[134] Peng Du, Elvis S. Liu, and Toyotaro Suzumura. Parallel continuous collision detection for high-performance gpu cluster. In Proceedings of the 21st ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, I3D ’17, pages 4:1–4:7, New York, NY, USA, 2017. ACM.

[135] Christer Ericson. Real-Time Collision Detection (The Morgan Kaufmann Series in Interactive 3-D Technology) (The Morgan Kaufmann Series in Interactive 3D Technology). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2004.

[136] WenShan Fan, Bin Wang, Jean-Claude Paul, and JiaGuang Sun. An octree-based proxy for collision detection in large-scale particle systems. Science China Information Sciences, 56(1):1–10, Jan 2013.

[137] Kayvon Fatahalian, Solomon Boulos, James Hegarty, Kurt Akeley, William R. Mark, Henry Moreton, and Pat Hanrahan. Reducing shading on gpus using quad-fragment merging. ACM Trans. Graph., 29(4):67:1–67:8, July 2010.

[138] François Faure, Sébastien Barbier, Jérémie Allard, and Florent Falipou. Image-based collision detection and response between arbitrary volume objects. In Proc. of the 2008 ACM SIGGRAPH/Eurographics Symp. on Comp. Animation, SCA '08, pages 155–162, Aire-la-Ville, Switzerland, Switzerland, 2008. Eurographics Association.

[139] W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt. Dynamic warp formation and scheduling for efficient gpu control flow. In 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007), pages 407–420, Dec 2007.

[140] E.G. Gilbert, D.W. Johnson, and S.S. Keerthi. A fast procedure for computing the distance between complex objects in three-dimensional space. Robotics and Automation, IEEE Journal of, 4(2):193–203, Apr 1988.

[141] Naga K. Govindaraju, Michael Henson, Ming C. Lin, and Dinesh Manocha. Interactive visibility ordering and transparency computations among geometric primitives in complex environments. In Proceedings of the 2005 Symposium on Interactive 3D Graphics and Games, I3D ’05, pages 49–56, New York, NY, USA, 2005. ACM.

[142] Naga K. Govindaraju, Stephane Redon, Ming C. Lin, and Dinesh Manocha. Cullide: Interactive collision detection between complex models in large environments using graphics hardware. In ACM SIGGRAPH 2005 Courses, SIGGRAPH ’05, New York, NY, USA, 2005. ACM.

[143] N.K. Govindaraju, M.C. Lin, and D. Manocha. Fast and reliable collision culling using graphics hardware. Visualization and Comp. Graphics, IEEE Transactions on, 12(2):143–154, March 2006.

[144] Jason Gregory. Game Engine Architecture, Second Edition. A. K. Peters, Ltd., Natick, MA, USA, 2nd edition, 2014.

[145] Eric Haines and Steven Worley. Fast, low memory z-buffering when performing medium-quality rendering. J. Graph. Tools, 1(3):1–6, February 1996.

[146] Songfang Han and Pedro V. Sander. Triangle reordering for reduced overdraw in animated scenes. In Proceedings of the 20th ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, I3D ’16, pages 23–27, New York, NY, USA, 2016. ACM.

[147] Tianyi David Han and Tarek S. Abdelrahman. Reducing branch divergence in gpu programs. In Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-4, pages 3:1–3:8, New York, NY, USA, 2011. ACM.

[148] Jon Hasselgren and Tomas Akenine-Möller. Efficient depth buffer compression. In Proc. of the 21st ACM SIGGRAPH/EUROGRAPHICS Symp. on Graphics Hardware, GH '06, pages 103–110, New York, NY, USA, 2006. ACM.

[149] Liang He, Ricardo Ortiz, Andinet Enquobahrie, and Dinesh Manocha. Interactive continuous collision detection for topology changing models using dynamic clustering. In Proceedings of the 19th Symposium on Interactive 3D Graphics and Games, i3D '15, pages 47–54, New York, NY, USA, 2015. ACM.

[150] Songtao He, Yunxin Liu, and Hucheng Zhou. Optimizing smartphone power consumption through dynamic resolution scaling. In Proceedings of the 21st Annual International Conference on Mobile Computing and Networking, MobiCom '15, pages 27–39, New York, NY, USA, 2015. ACM.

[151] Bruno Heidelberger, Matthias Teschner, and Markus H. Gross. Real-time volumetric intersections of deforming objects. In Proc. of the Vision, Modeling, and Visualization Conf. 2003 (VMV 2003), München, Germany, November 19-21, 2003, pages 461–468, 2003.

[152] Bruno Heidelberger, Matthias Teschner, and Markus H. Gross. Detection of collisions and self-collisions using image-space techniques. In The 12-th Int. Conf. in Central Europe on Comp. Graphics, Visualization and Comp. Vision’2004, pages 145–152, 2004.

[153] Sunpyo Hong and Hyesoon Kim. An integrated gpu power and performance model. In Proceedings of the 37th Annual International Symposium on Computer Architecture, ISCA ’10, pages 280–289, New York, NY, USA, 2010. ACM.

[154] Pablo Jimenez, Federico Thomas, and Carme Torras. 3d collision detection: A survey. 25:269–285, 04 2001.

[155] B. Juurlink, I. Antochi, D. Crisu, S. Cotofana, and S. Vassiliadis. Graal: A framework for low-power 3d graphics accelerators. IEEE Computer Graphics and Applications, 28(4):63–73, July 2008.

[156] A. B. Kahn. Topological sorting of large networks. Commun. ACM, 5(11):558–562, November 1962.

[157] Dave Knott and Dinesh K. Pai. Cinder: Collision and interference detection in real-time using graphics hardware, 2003.

[158] S. Kockara, T. Halic, K. Iqbal, C. Bayrak, and R. Rowe. Collision detection: A survey. In Systems, Man and Cybernetics, 2007. ISIC. IEEE Int. Conf. on, pages 4046–4051, Oct 2007.

[159] James H. Laros III, Kevin Pedretti, Suzanne M. Kelly, Wei Shu, Kurt Ferreira, John Vandyke, and Courtenay Vaughan. Energy Delay Product, pages 51–55. Springer London, London, 2013.

[160] Orion Sky Lawlor and Laxmikant V. Kalé. A voxel-based parallel collision detection algorithm. In Proceedings of the 16th International Conference on Supercomputing, ICS '02, pages 285–293, New York, NY, USA, 2002. ACM.

[161] Victor W. Lee, Changkyu Kim, Jatin Chhugani, Michael Deisher, Daehyun Kim, Anthony D. Nguyen, Nadathur Satish, Mikhail Smelyanskiy, Srinivas Chennupaty, Per Hammarlund, Ronak Singhal, and Pradeep Dubey. Debunking the 100x gpu vs. cpu myth: An evaluation of throughput computing on cpu and gpu. In Proc. of the 37th Annual Int. Symp. on Comp. Archit., ISCA ’10, pages 451–460, New York, NY, USA, 2010. ACM.

[162] Jingwen Leng, Tayler Hetherington, Ahmed ElTantawy, Syed Gilani, Nam Sung Kim, Tor M. Aamodt, and Vijay Janapa Reddi. Gpuwattch: Enabling energy optimizations in gpgpus. SIGARCH Comput. Archit. News, 41(3):487–498, June 2013.

[163] Sheng Li, Jung Ho Ahn, Richard D. Strong, Jay B. Brockman, Dean M. Tullsen, and Norman P. Jouppi. The mcpat framework for multicore and manycore architectures: Simultaneously modeling power, area, and timing. ACM Trans. Archit. Code Optim., 10(1):5:1–5:29, April 2013.

[164] Gábor Liktor and Carsten Dachsbacher. Decoupled deferred shading for hardware rasterization. In Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, I3D ’12, pages 143–150, New York, NY, USA, 2012. ACM.

[165] Jieun Lim, Nagesh B. Lakshminarayana, Hyesoon Kim, William Song, Sudhakar Yalamanchili, and Wonyong Sung. Power modeling for gpu architectures using mcpat. ACM Trans. Des. Autom. Electron. Syst., 19(3):26:1–26:24, June 2014.

[166] Steve Marschner and Peter Shirley. Fundamentals of Computer Graphics. A. K. Peters, Ltd., Natick, MA, USA, 4th edition, 2016.

[167] Microsoft. Primitive Topologies. https://msdn.microsoft.com/en-us/library/windows/desktop/bb205124(v=vs.85).aspx#Primitive_Types, 2017. [Online; accessed 11-August-2017].

[168] Bren Mochocki, Kanishka Lahiri, and Srihari Cadambi. Power analysis of mobile 3d graphics. In Proc. of the Conf. on Design, Automation and Test in Europe, DATE ’06, pages 502–507, Leuven, Belgium, 2006. European Design and Automation Association.

[169] Victor Moya, Carlos Gonzalez, Jordi Roca, Agustin Fernandez, and Roger Espasa. Shader performance analysis on a modern gpu architecture. In Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 38, pages 355–364, Washington, DC, USA, 2005. IEEE Computer Society.

[170] Karol Myszkowski, Oleg G. Okunev, and Tosiyasu L. Kunii. Fast collision detection between complex solids using rasterizing graphics hardware. The Visual Comp., 11(9):497–511, 1995.

[171] Diego Nehab, Joshua Barczak, and Pedro V. Sander. Triangle order optimization for graphics hardware computation culling. In Proceedings of the 2006 Symposium on Interactive 3D Graphics and Games, I3D ’06, pages 207–211, New York, NY, USA, 2006. ACM.

[172] Thomas J. Olson. Hardware 3d graphics acceleration for mobile devices. In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 5344–5347, March 2008.

[173] A. Patel, F. Afram, Shunfei Chen, and K. Ghose. Marss: A full system simulator for multicore x86 cpus. In 48th ACM/EDAC/IEEE Design Automation Conf. (DAC), pages 1050–1055, June 2011.

[174] Shruti Patil, Yeseong Kim, Kunal Korgaonkar, Ibrahim Awwal, and Tajana S. Rosing. Characterization of User’s Behavior Variations for Design of Replayable Mobile Workloads, pages 51–70. Springer International Publishing, Cham, 2015.

[175] Kévin Perrot and Trung Van Pham. Feedback arc set problem and np-hardness of minimum recurrent configuration problem of chip-firing game on directed graphs. Annals of Combinatorics, 19(2):373–396, 2015.

[176] J. Pool. Energy-Precision Tradeoffs in the Graphics Pipeline. PhD dissertation, The University of North Carolina at Chapel Hill, 2012.

[177] J. Pool, A. Lastra, and M. Singh. An energy model for graphics processing units. In 2010 IEEE International Conference on Computer Design, pages 409–416, Oct 2010.

[178] Alok Prakash, Hussam Amrouch, Muhammad Shafique, Tulika Mitra, and Jörg Henkel. Improving mobile gaming performance through cooperative cpu-gpu thermal management. In Proc. of the 53rd Annual Design Automation Conf., DAC ’16, June 2016.

[179] Jonathan Ragan-Kelley, Jaakko Lehtinen, Jiawen Chen, Michael Doggett, and Frédo Durand. Decoupled sampling for graphics pipelines. ACM Trans. Graph., 30(3):17:1–17:17, May 2011.

[180] Jim Rasmusson, Jon Hasselgren, and Tomas Akenine-Möller. Exact and error-bounded approximate color buffer compression and decompression. In Proc. of the 22nd ACM SIGGRAPH/EUROGRAPHICS Symp. on Graphics Hardware, GH ’07, pages 41–48, Aire-la-Ville, Switzerland, 2007. Eurographics Association.

[181] Robert Nystrom. Game programming patterns, sequencing patterns: Game loop. http://gameprogrammingpatterns.com/game-loop.html, 2014.

[182] P. Rosenfeld, E. Cooper-Balis, and B. Jacob. Dramsim2: A cycle accurate memory system simulator. IEEE Computer Architecture Letters, 10(1):16–19, Jan 2011.

[183] Jarek Rossignac, Abe Megahed, and Bengt-Olaf Schneider. Interactive inspection of solids: cross-sections and interferences. In Proc. of ACM SIGGRAPH, pages 353–360, 1992.

[184] Takafumi Saito and Tokiichiro Takahashi. Comprehensible rendering of 3-d shapes. SIGGRAPH Comput. Graph., 24(4):197–206, September 1990.

[185] Pedro V. Sander, Diego Nehab, and Joshua Barczak. Fast triangle reordering for vertex locality and reduced overdraw. In ACM SIGGRAPH 2007 Papers, SIGGRAPH ’07, New York, NY, USA, 2007. ACM.

[186] Rahul Sathe and Tomas Akenine-Möller. Pixel Merge Unit. In B. Bickel and T. Ritschel, editors, EG 2015 - Short Papers. The Eurographics Association, 2015.

[187] Dean Sekulic. Efficient occlusion culling. In Randima Fernando, editor, GPU Gems, pages 487–503. Addison-Wesley, 2004.

[188] J. W. Sheaffer, D. Luebke, and K. Skadron. A flexible simulation framework for graphics architectures. In Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, HWWS ’04, pages 85–94, New York, NY, USA, 2004. ACM.

[189] Mikio Shinya and Marie-Claire Forgue. Interference detection through rasterization. The Journal of Visualization and Comp. Animation, 2(4):132–134, 1991.

[190] Edvard Sørgård. Graphics clusters. Graphics Hardware, 2004.

[191] S. Sowerby and B. Lipchak. EXT_debug_marker. https://www.khronos.org/registry/gles/extensions/EXT/EXT_debug_marker.txt, 2013.

[192] John E. Stone, David Gohara, and Guochun Shi. Opencl: A parallel programming standard for heterogeneous computing systems. Computing in Science & Engineering, 12(3):66–73, May 2010.

[193] Rafael Ubal, Julio Sahuquillo, Salvador Petit, and Pedro López. Multi2sim: A simulation framework to evaluate multicore-multithreaded processors. In 19th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2007), pages 62–68, 2007.

[194] Gino van den Bergen. A fast and robust gjk implementation for collision detection of convex objects. J. Graph. Tools, 4(2):7–25, 1999.

[195] Guy R. Wagner and William Maltz. Too hot to hold: Determining the cooling limits for handheld devices. In Advancements in Thermal Management 2013, 2013.

[196] René Weller. New Geometric Data Structures for Collision Detection and Haptics. Springer Publishing Company, Incorporated, 2013.

[197] Andrew Wilson, Ketan Mayer-Patel, and Dinesh Manocha. Spatially-encoded far-field representations for interactive walkthroughs. In Proc. of ACM Multimedia, pages 348–357, 2001.

[198] Huai Yu Wang and S. Liu. A collision detection algorithm using aabb and octree space division. Applied Mechanics and Materials, 989-994:2389–2392, July 2014.

[199] Hui Zeng, Matt Yourst, Kanad Ghose, and Dmitry Ponomarev. Mptlsim: A cycle-accurate, full-system simulator for x86-64 multicore architectures with coherent caches. SIGARCH Comput. Archit. News, 37(2):2–9, July 2009.
