Bachelor of Science in Digital Game Development September 2020

Light Performance Comparison between Forward, Deferred and Tile-based forward rendering

Vladislav Polyakov

Faculty of Computing, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial fulfilment of the requirements for the degree of Bachelor of Science in Digital Game Development. The thesis is equivalent to 10 weeks of full time studies.

The authors declare that they are the sole authors of this thesis and that they have not used any sources other than those listed in the bibliography and identified as references. They further declare that they have not submitted this thesis at any other institution to obtain a degree.

Contact Information: Author(s): Vladislav Polyakov E-mail: [email protected]

University advisor: Hans Tap Department of Computer Science

Faculty of Computing, Blekinge Institute of Technology, SE–371 79 Karlskrona, Sweden
Internet: www.bth.se
Phone: +46 455 38 50 00
Fax: +46 455 38 50 57

Abstract

Background. In this experiment, forward, deferred and tile-based forward rendering techniques are implemented in order to study their light-rendering performance. Nowadays most games and programs contain graphical content, and this content is produced using different kinds of rendering operations. These rendering operations are developed and optimized by graphics programmers in order to achieve better performance. Forward rendering is the standard technique, which pushes the geometry data through the whole rendering pipeline to build up the final image. Deferred rendering, on the other hand, is divided into two passes: the first pass rasterizes the geometry data into G-buffers, and the second pass, also called the lighting pass, uses the data from the G-buffers and rasterizes the lightsources to build up the final image. Tile-based forward rendering is also divided into two passes. The first pass creates a frustum grid and performs light culling. The second pass rasterizes all the geometry data to the screen, as in standard forward rendering.

Objectives. The objective is to implement three rendering techniques in order to find the optimal technique for light rendering in different environments. When the implementation is done, the results of the tests are analyzed to answer the research questions and reach a conclusion.

Methods. The problem was addressed using the method "Implementation and Experimentation". A render engine with three different rendering techniques was implemented using C++ and the OpenGL API. The tests were implemented in the render engine, and the duration of each test was five minutes. The data from the tests was used to create diagrams for evaluating the results.

Results. The results showed that standard forward rendering performed better than tile-based forward rendering and deferred rendering with few lights in the scene. When the number of lights became large, deferred rendering showed the best light performance. Tile-based forward rendering was not as strong as expected, and the reason is possibly the implementation method, since the culling procedures were performed on the CPU side. During the tests of tile-based forward rendering, 4 tiles were used in the frustum grid, since this number showed the highest performance compared to the other tile configurations.

Conclusions. The conclusion of this research is that in environments with a limited number of lightsources the optimal rendering technique was standard forward rendering. In environments with a large number of lightsources, deferred rendering should be used. If tile-based forward rendering is used, it should be used with 4 tiles in the frustum grid. The hypothesis of this study was not fully confirmed: only the part concerning a limited number of lights was confirmed, and the other parts were disproven. Tile-based forward rendering was not strong enough, and the reason for this is possibly that the implementation was on the CPU side.

Keywords: Forward rendering, Deferred rendering, Tile-based forward rendering, Forward-plus rendering, Pointlight implementation.

Acknowledgments

I would like to thank my supervisor Hans Tap for helping me write this thesis. I would also like to thank Stefan Petersson for his support in the implementation of the rendering techniques for this experiment.


Contents

Abstract

Acknowledgments

1 Introduction
1.1 Background
1.2 Aim
1.3 Objectives
1.4 Research Questions
1.5 Hypothesis
1.6 Structure of this thesis

2 Related Work

3 Method
3.1 Implementation
3.1.1 OpenGL Pipeline
3.1.2 Forward Rendering
3.1.3 Deferred Rendering
3.1.4 Tile-based Forward Rendering
3.2 Performance Testing
3.2.1 Hardware

4 Results

5 Analysis and Discussion

6 Conclusions and Future Work

References


List of Figures

3.1 Shows OpenGL Shader Pipeline. The yellow stages are fixed and the grey ones are programmable. Picture taken from Khronos official page.
3.2 Shows the forward rendering pipeline. Picture taken from [3].
3.3 Shows the camera initialization calculations.
3.4 Shows different matrix calculations and light data updates.
3.5 Shows an example of a vertex shader for the forward rendering technique.
3.6 Shows the pointlight calculation implementation in the fragment shader.
3.7 This illustration is showing the attenuation formula mentioned above. Kc is the constant attenuation coefficient, Kl is the linear and Kq is the quadratic coefficient. The d represents the distance between the specific pixel and the lightsource.
3.8 Illustrates a lightsource surrounded by different geometric objects.
3.9 Illustrates a lightsource with specular component.
3.10 Shows the deferred rendering shading pipeline. Picture taken from [3].
3.11 Shows the creation of the G-buffer which stores geometric data such as positions, normals, albedo (diffuse) and depth.
3.12 Shows the first pass fragment shader.
3.13 This illustration shows the second pass fragment shader.
3.14 This figure shows the position, normal and albedo (diffuse) buffers.
3.15 Shows the visual difference between deferred rendering without and with light volumes.
3.16 Shows different lists used for the tile-based forward rendering. Picture taken from [2].
3.17 Shows an example scene for tests of light calculation performance.

4.1 Shows the results of forward rendering frametime test.
4.2 Shows the results of forward rendering framerate test.
4.3 Shows the results of deferred rendering frametime test.
4.4 Shows the results of deferred rendering framerate test.
4.5 Shows the performance results with different amount of tiles for tile-based forward rendering.
4.6 Shows the results of tile-based forward rendering frametime test.
4.7 Shows the results of tile-based forward rendering framerate test.

5.1 Shows the frametime results with the forward rendering in blue, deferred rendering in orange and tile-based forward rendering in grey.
5.2 Shows the framerate results with the forward rendering in blue, deferred rendering in orange and tile-based forward rendering in grey.

Chapter 1 Introduction

This thesis investigates the light-rendering performance of different rendering techniques: standard forward rendering, deferred rendering and a more advanced technique called tile-based forward rendering. Since light rendering is an expensive procedure, the goal is to determine which rendering technique is the better choice in environments with few lights and which is better in environments with a large number of lights. The OpenGL API is used, and a render engine is implemented in order to render geometry and other visual elements to the screen. The thesis explains how the rendering techniques and the different light calculations are implemented. Furthermore, to improve the results, different optimizations are explained and implemented for the chosen rendering techniques.

1.1 Background

Nowadays most games and programs contain graphical content. It is there not only to make the program easier to use and navigate, but also to make it more attractive and presentable. The graphical content is created using different kinds of rendering operations. Rendering is used to create images on the computer screen: the program collects the needed data and performs calculations that determine how it should be drawn on the screen. Year after year, graphics programmers develop new rendering techniques in order to achieve high-quality, photo-realistic images. The first commonly used rendering technique is forward rendering. There are several other rendering techniques that can generate images, but forward rendering is the simplest to implement and for that reason a popular technique. Simple does not mean cheap and effective, however: forward rendering has disadvantages, and for that reason more advanced rendering techniques have been created and optimized in order to achieve better performance.

Forward rendering is the standard rendering method that rasterizes the geometry objects in the scene. Points, lines and triangles are used to represent different 3D shapes. Rasterization is a process which converts these shapes into an image; these images are later displayed on the screen as a series of pixels. The rasterizer is a part of the rendering pipeline: during the shading process the geometry is pushed forward through the whole rendering pipeline in order to build up the final image, which is rendered on the screen. A transparent pass can be added, in which all objects are sorted from back to front from the camera's point of view. This is done in order to get a correct final image that supports the transparent objects in the scene. In this thesis transparent objects are not used, and accordingly the transparent pass is not implemented. If there are lights in the scene, the forward rendering technique applies light calculations for each light on each object in the scene. A light is included even if it does not affect the final colour of the pixel. This rendering technique does not even try to eliminate those lights[12], and it therefore shows worse results the more lightsources there are in the scene. Forward rendering is not the optimal rendering technique for building big scenes with many geometric objects and lightsources[12]; for that reason a few other rendering techniques have been presented, such as deferred rendering and tile-based forward rendering.

Traditional deferred rendering is a technique that is divided into two render passes. In the first pass, called the geometry pass, the objects' attributes are rasterized into different geometry buffers (G-buffers)[5][3]. In the second pass, called the lighting pass, the lightsources are rasterized and used for the final lighting of the scene[12]. The main purpose of the deferred rendering technique is to support a larger number of dynamic lightsources at a reasonable cost and to perform the shading process only once per fragment, which solves overdraw problems. Deferred rendering requires modern hardware, since it uses multiple render targets and has a high memory consumption.

Tile-based forward rendering is based on standard forward rendering and is, like deferred rendering, divided into several rendering passes. It is a modern technique which tries to reduce the memory consumption and increase the performance of different computations. Tile-based rendering techniques in general can be built on both the standard forward rendering and the deferred rendering techniques. In this thesis only tile-based forward rendering is implemented. Tile-based forward rendering can be used to handle a large number of lightsources in the scene[10]. This is done by creating a number of tiles consisting of pixels; these tiles divide the screen and form a frustum grid. The frustums are then used to perform light culling, which removes the lights that do not affect a specific pixel on the screen. The remaining lights, which actually affect the pixel in question, are stored in a light index list that is connected to the tile[11].
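The tile idea described above can be illustrated with a minimal C++ sketch. It is a simplification under stated assumptions and not the implementation evaluated in this thesis: lights are assumed to be already projected to screen space as circles, the test is done in 2D instead of against tile frustums, and the names ScreenLight and buildTileLightLists are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical screen-space light: centre and radius of influence in pixels.
struct ScreenLight {
    float x, y, radius;
};

// Builds one light index list per tile (tiles stored row by row).
// A light is added to a tile if its circle overlaps the tile rectangle.
std::vector<std::vector<std::size_t>> buildTileLightLists(
    const std::vector<ScreenLight>& lights,
    int screenWidth, int screenHeight, int tilesX, int tilesY)
{
    std::vector<std::vector<std::size_t>> lists(
        static_cast<std::size_t>(tilesX) * static_cast<std::size_t>(tilesY));
    const float tileW = static_cast<float>(screenWidth) / tilesX;
    const float tileH = static_cast<float>(screenHeight) / tilesY;

    for (int ty = 0; ty < tilesY; ++ty) {
        for (int tx = 0; tx < tilesX; ++tx) {
            const float minX = tx * tileW, maxX = minX + tileW;
            const float minY = ty * tileH, maxY = minY + tileH;
            for (std::size_t i = 0; i < lights.size(); ++i) {
                // Closest point of the tile rectangle to the light centre.
                const float cx = std::max(minX, std::min(lights[i].x, maxX));
                const float cy = std::max(minY, std::min(lights[i].y, maxY));
                const float dx = lights[i].x - cx;
                const float dy = lights[i].y - cy;
                // Keep the light only if the circle reaches into the tile.
                if (dx * dx + dy * dy <= lights[i].radius * lights[i].radius) {
                    lists[static_cast<std::size_t>(ty) * tilesX + tx].push_back(i);
                }
            }
        }
    }
    return lists;
}
```

With a 2 x 2 grid on a 100 x 100 pixel screen, a small light near a corner would end up in a single tile list, while a light on the screen centre would be listed in all four tiles.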

In this thesis only pointlights are implemented, since other types of lightsources are not necessary for the comparison of rendering techniques: the implementation is different, but the effect on performance is the same. A pointlight is a lightsource with a specific position which emits light in all directions within a given radius. The light fades out with the distance from the origin of the pointlight. It can be visualized as the sun or a lightbulb[4]. Pointlights have four main factors (ambient, attenuation, diffuse and specular) and a few other minor constants that help to light the created environment correctly. There are also small differences in the light calculations among the rendering techniques, but the final visual result is close to identical.

1.2 Aim

The aim of this thesis is to measure the performance of the different rendering techniques and compare them based on their fps. Fps is an abbreviation for frames per second: the number of frames (images) that can be displayed each second. This is done in order to find out which rendering technique is optimal in different situations, such as environments with few lights and environments with a large number of lights. Moreover, the goal is also to find out whether tile-based forward rendering is good enough to be an alternative to the standard rendering techniques presented in this thesis.

1.3 Objectives

The objective of this thesis is to implement three different rendering techniques in order to find the most suitable rendering technique for light rendering in different environments. When the implementation is done, tests are performed with predetermined numbers of lights, and the collected data is analyzed to reach a conclusion.

1.4 Research Questions

RQ1: Since light rendering is an expensive and a common process, what rendering technique is the most suitable technique to be used in an environment with a large amount of lights?

RQ2: In environments with a limited amount of lights, which rendering technique is optimal to use?

RQ3: When using tile-based forward rendering, what amount of tiles is the best choice according to the performance?

1.5 Hypothesis

The hypothesis of this thesis is that the more advanced rendering technique called tile-based forward rendering will show better performance in environments with a large number of lightsources. It will show its best performance when the number of tiles is 9, since this is the optimal size with respect to storage and bandwidth usage. The hypothesis is also that standard forward rendering will be stronger than the other rendering techniques in environments with few lightsources.

1.6 Structure of this thesis

This thesis consists of several sections. The first section is Related Work, which describes other research related to this experiment. All of these works were found through BTH Summon Databases. After the Related Work section comes the Method section, which describes the method used to answer the research questions. The implementation process is also explained in the Method section, so that other researchers can repeat the tests if needed. The Results section presents the results of the performed tests. The next section is Analysis and Discussion, where the results of the experiment are analyzed and discussed. The last section is Conclusions and Future Work, where a conclusion is formed and future work is suggested.

Chapter 2 Related Work

Several related works have been carried out with the goal of measuring and comparing different aspects of a few rendering techniques. Different optimizations were also implemented in order to improve the performance of each rendering technique.

In 2017 Marcus Rahm[8] carried out a study whose goal was to investigate the performance of the tile-based forward rendering technique. The investigation focused on the light culling part and compared an implementation on the CPU side with one on the GPU side. He also used multithreading to test the performance of the CPU-side implementation. He used the DirectX 11 API, and the GPU-side implementation was written in HLSL, which is an abbreviation for High Level Shading Language. The results showed that the GPU has better performance in general, but that the CPU, on the other hand, showed good performance using tile-based forward rendering compared to the standard forward rendering technique. This showed that light culling implemented on the CPU is a good solution, but that it requires more computer resources than standard forward rendering.

Tile-based forward rendering showed good potential compared to deferred rendering. Deferred rendering was popular in games and other environments with a large number of lights and a large amount of geometry. Deferred rendering works as follows: different parameters such as position, normal and different material coefficients are stored in so-called G-buffers. The research by Fabio Policarpo and Francisco Fonseca from 2005[7] shows that this method makes it possible to submit the geometry in the scene to the GPU only once, which increases performance since only the affected fragments are shaded. Deferred rendering showed many advantages, such as lower complexity compared to the traditional rendering techniques and perfect depth complexity for lighting. On the other hand, deferred rendering has several disadvantages; for example, it cannot handle transparency. For that reason other rendering techniques were used instead. Among them was tile-based forward rendering, which Takahiro Harada, Jay McKee and Jason C. Yang introduced in 2012[14].

Tile-based forward rendering was also presented by Ola Olsson and Ulf Assarsson in 2011[11]. The article presented several rendering techniques but focused mostly on tile-based rendering techniques. A few optimization methods were described for each rendering technique, together with a few advantages and disadvantages of each. Finally, an interesting fact was presented: the forward rendering technique supports antialiasing and transparency, it showed good results in environments with few lightsources, and it would in those cases be a good choice instead of the more advanced tile-based rendering techniques.

Further work was done by Takahiro Harada in 2012[13]. When using tile-based forward rendering, a 2.5D culling method can be applied. With a 2D culling method, lights can overlap other lightsources in the scene on a specific tile. The 2D culling has only a minimum and maximum depth clip, and if there are many lights in the scene this can decrease the performance. In his research Harada presented a way to improve the accuracy of light culling using the 2.5D culling method. The frustum is divided into a fixed number of cells in the z-direction, and if overlapping lightsources are found, a depth mask is created. After that the geometry depth values are calculated. This data is then used to perform logical operations that check the overlap. This method can be applied to tile-based forward rendering and is faster than deferred lighting. The 2.5D culling method showed an improvement in performance, since the number of overlapping lightsources decreased significantly.

In 2017 Jeremiah van Oosten[12] presented a work on volume tiled forward shading. Volume tiled forward rendering is based on the tile-based forward rendering technique implemented by Ola Olsson and Ulf Assarsson in 2011[11]. The new technique divides the 2D screen tiles into 3D tiles. Furthermore, it uses the same methods that O. Olsson, M. Billeter and U. Assarsson used in 2012[9][10] when implementing the cluster-based rendering techniques. Assigning the lightsources to the tiles in volume tiled forward rendering can cause a performance bottleneck. To solve the bottleneck a bounding volume hierarchy (BVH) was implemented, which searches for lightsources that overlap a specific volume tile. Lights which overlap the volume tiles and are within the nodes of the bounding volume hierarchy are checked for intersection with the specific volume tile. Volume tiled forward rendering is divided into two passes, where the first pass initializes and builds up the volume grid and the axis-aligned bounding boxes for each tile. In the second pass the lights are inserted into each volume tile. In this pass intersection tests are made between the lightsources in the scene and the volume tiles. Lightsources are included only if they are positioned within the volume tile for the specific pixel. According to Jeremiah van Oosten, volume tiled forward rendering with a bounding volume hierarchy outperforms rendering techniques like forward rendering and tile-based forward rendering in scenes with more than 16 384 lightsources. On the other hand, tile-based forward rendering is a better choice in scenes with fewer lightsources, for example 256, since it is easier to implement and volume tiled forward rendering is not necessary in that situation. To summarize the above, if a large number of lightsources must be supported in the scene, it is a good choice to implement the volume tiled forward rendering technique with a bounding volume hierarchy.

Several studies have been done within this subject, but very few works focus mainly on comparing the chosen rendering techniques based on their light performance. Furthermore, very few directly explain in which situations and circumstances a given rendering technique is the better choice. Most studies on similar subjects have their main focus on other topics and can only indirectly help to answer the questions of this experiment.

Chapter 3 Method

This study is an experiment designed to perform different tests and to use the collected data to answer the research questions mentioned earlier.

3.1 Implementation

To be able to perform tests on different rendering techniques, a render engine is implemented. This render engine is based on the OpenGL API and is written in C++ on the CPU side and in GLSL on the GPU side. The render engine can import obj-files; the only criterion which must be fulfilled is that the objects are triangulated, otherwise the object will not be accepted by the render engine. A few steps are needed in order to be able to render to the screen. The first step is to prepare and configure different libraries such as GLFW, GLAD and GLM. With these libraries it is possible to create a window in which the rendering takes place. GLFW is an abbreviation for Graphics Library Framework, GLAD is a multi-language loader generator and GLM stands for OpenGL Mathematics. Other libraries are also added for logging and for checking different factors during runtime. These libraries are spdlog and ImGui; ImGui is a library which can also be used as a user interface for different operations.

Figure 3.1: Shows OpenGL Shader Pipeline. The yellow stages are fixed and the grey ones are programmable. Picture taken from Khronos official page.


3.1.1 OpenGL Shader Pipeline

The OpenGL shader pipeline consists of fixed and programmable stages, as shown in figure 3.1. This pipeline is invoked when a rendering operation is issued. It requires a vertex array object (VAO) and a linked shader program. The vertex array object represents a geometric object that is to be rendered on the screen. The shader program is a handle for the specific render method. The first programmable shader in the pipeline is the vertex shader, followed by the tessellation stage, the geometry stage and finally the fragment shader. Each stage in the pipeline is executed at a different rate: for example, the vertex shader is executed once per vertex and the geometry shader once per primitive. The fragment shader (pixel shader) is correspondingly executed once per fragment. Each programmable stage is implemented in GLSL.

3.1.2 Forward Rendering

Figure 3.2: Shows the forward rendering shading pipeline. Picture taken from [3].

As mentioned earlier, forward rendering is a standard rendering technique, and its implementation is simple compared to other techniques. This technique is used as a baseline to obtain performance data and compare it with the other rendering techniques. Forward rendering can have several render passes if transparent objects are used, but since they are not used in this experiment, the transparent pass is not implemented. The first stage in the forward rendering pipeline is the vertex shader, as shown in figure 3.2. In this experiment the vertex shader is very simple: it needs the model, view and projection matrices for the correct positioning of the geometry objects in the view space. Those matrices are generated on the CPU side. The view matrix is calculated using the camera position, front vector and up vector. The projection matrix is calculated using the camera field of view, aspect ratio, near plane and far plane. The model matrix is used for model rotation or translation. These calculations are shown in figures 3.3 and 3.4. The three matrices are combined into an MVP matrix which, as mentioned earlier, is used in the vertex shader.
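The matrix setup described above can be sketched without any library as follows. This is a minimal illustration and not the engine code shown in figures 3.3 and 3.4: it assumes a right-handed camera and the default OpenGL clip-space depth range, roughly the conventions of GLM's lookAt and perspective functions, and the names viewMatrix, projectionMatrix, multiply and transformPoint are hypothetical.

```cpp
#include <array>
#include <cmath>

struct Vec3 { float x, y, z; };
// Column-major 4x4 matrix (OpenGL convention): element (row r, column c) is m[c * 4 + r].
using Mat4 = std::array<float, 16>;

float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 v) {
    const float len = std::sqrt(dot(v, v));
    return {v.x / len, v.y / len, v.z / len};
}

Mat4 identity() {
    Mat4 m{};
    m[0] = m[5] = m[10] = m[15] = 1.0f;
    return m;
}

// View matrix from camera position, front vector and up vector.
Mat4 viewMatrix(Vec3 position, Vec3 front, Vec3 up) {
    const Vec3 f = normalize(front);
    const Vec3 s = normalize(cross(f, up));  // camera right
    const Vec3 u = cross(s, f);              // corrected camera up
    Mat4 m{};
    m[0] = s.x;  m[4] = s.y;  m[8]  = s.z;  m[12] = -dot(s, position);
    m[1] = u.x;  m[5] = u.y;  m[9]  = u.z;  m[13] = -dot(u, position);
    m[2] = -f.x; m[6] = -f.y; m[10] = -f.z; m[14] = dot(f, position);
    m[15] = 1.0f;
    return m;
}

// Perspective projection from vertical field of view, aspect ratio, near and far plane.
Mat4 projectionMatrix(float fovYRadians, float aspect, float nearPlane, float farPlane) {
    const float t = std::tan(fovYRadians / 2.0f);
    Mat4 m{};
    m[0] = 1.0f / (aspect * t);
    m[5] = 1.0f / t;
    m[10] = -(farPlane + nearPlane) / (farPlane - nearPlane);
    m[11] = -1.0f;
    m[14] = -(2.0f * farPlane * nearPlane) / (farPlane - nearPlane);
    return m;
}

// Matrix product a * b.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 c{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            for (int k = 0; k < 4; ++k)
                c[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
    return c;
}

// Applies m to the point p (w = 1) and returns the clip-space coordinates.
std::array<float, 4> transformPoint(const Mat4& m, Vec3 p) {
    return {m[0] * p.x + m[4] * p.y + m[8] * p.z + m[12],
            m[1] * p.x + m[5] * p.y + m[9] * p.z + m[13],
            m[2] * p.x + m[6] * p.y + m[10] * p.z + m[14],
            m[3] * p.x + m[7] * p.y + m[11] * p.z + m[15]};
}
```

The MVP matrix is then multiply(projection, multiply(view, model)). The multiplication order matters: the model matrix is applied to a vertex first and the projection last.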

Figure 3.3: Shows the camera initialization calculations.

Figure 3.4: Shows different matrix calculations and light data updates.

The vertex shader in this experiment is easy to implement, since it only supports static geometry and does not support any form of animation. Figure 3.5 shows that the vertex shader passes the received data on to the next shader in the pipeline, which in this experiment is the fragment shader. The fragment shader performs most of the calculations. First it receives all needed data about the geometric objects, then all data for the light calculations. It then performs different kinds of calculations to determine the final color of each pixel on the screen.

Figure 3.5: Shows an example of a vertex shader for the forward rendering technique.

As mentioned earlier, pointlights were implemented in this experiment in order to determine which rendering technique has the better light-rendering performance. To calculate and render a pointlight correctly, the position and colour of the lightsource are needed[4]. A few other factors are also needed, such as the light's radius of activity, its strength and the light attenuation constant. These constants are defined depending on the settings of the light. A pointlight has a limited range called the light radius, and its strength decreases with distance. This is done with the attenuation constant: a higher attenuation constant leads to a larger decrease in light strength. Furthermore, the attenuation constant is divided into three coefficients: the constant, linear and quadratic attenuation coefficients[1].

Figure 3.6: Shows the pointlight calculation implementation in the fragment shader.

Figure 3.7: This illustration is showing the attenuation formula mentioned above. Kc is the constant attenuation coefficient, Kl is the linear and Kq is the quadratic coefficient. The d represents the distance between the specific pixel and the lightsource.

The final image depends not only on the light factors but also on other factors such as ambient, diffuse and specular. The ambient factor can be explained as a global illumination of the scene and depends directly on the number and positions of the lightsources in the scene. The more lightsources there are and the larger their radius, the higher the ambient component will be. The ambient factor is easy to calculate by simply multiplying the radius of the light by the light's colour. The next component is diffuse, which clearly improves the final image of the scene. To calculate the diffuse factor correctly, a vector is created between the light position and the fragment of the object. Then the angle between the object's normal and this light vector is calculated, in order to know at what angle the light hits the surface of the object. The normal vector is a vector perpendicular to the object's surface. The smaller the angle between the normal vector and the light vector, the more the fragment is affected by this light.

Figure 3.8: Illustrates a lightsource surrounded by different geometric objects.

To add reflection to the surface, a specular component is used. Like diffuse, it is based on the light vector and the normal vector. It also needs the camera's point of view, in order to know from what direction the camera is looking at the surface of the object. The specular component depends on the reflection factor of the object: the higher it is and the closer the camera is, the clearer the reflection will be. It can be calculated by reflecting the light vector around the normal vector and then measuring how far the reflection vector is from the camera's view direction (the angle between them). A smaller angle leads to a bigger effect on the surface. The visual result is a highlight on the object's surface, which can be seen in figure 3.9.
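The diffuse and specular factors described above can be sketched on the CPU as follows. This is a minimal Phong-style illustration and not the shader from figure 3.6: it assumes that the closeness of two directions is measured with a dot product of unit vectors, the shininess exponent is an assumed stand-in for the reflection factor of the object, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 normalize(Vec3 v) {
    const float len = std::sqrt(dot(v, v));
    return {v.x / len, v.y / len, v.z / len};
}
// Reflects the incident vector i around the unit normal n (same definition as GLSL reflect).
Vec3 reflect(Vec3 i, Vec3 n) { return i - n * (2.0f * dot(n, i)); }

// Diffuse factor: 1 when the light hits the surface head-on, 0 at grazing angles or from behind.
float diffuseFactor(Vec3 normal, Vec3 fragPos, Vec3 lightPos) {
    const Vec3 n = normalize(normal);
    const Vec3 l = normalize(lightPos - fragPos);  // fragment -> light
    return std::max(dot(n, l), 0.0f);
}

// Specular factor: largest when the reflected light direction points at the camera.
float specularFactor(Vec3 normal, Vec3 fragPos, Vec3 lightPos, Vec3 cameraPos,
                     float shininess) {
    const Vec3 n = normalize(normal);
    const Vec3 l = normalize(lightPos - fragPos);   // fragment -> light
    const Vec3 v = normalize(cameraPos - fragPos);  // fragment -> camera
    const Vec3 r = reflect(l * -1.0f, n);           // reflected light direction
    return std::pow(std::max(dot(v, r), 0.0f), shininess);
}
```

Both factors would then be multiplied by the light colour and by the attenuation before being added to the ambient term.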

Figure 3.9: Illustrates a lightsource with specular component.

When all these coefficients have been calculated and the attenuation has been applied to each factor, the light result is sent on to be rendered on the screen. If the rendering performance is poor, different optimizations can be performed, for example light culling. Light culling enables only the lights that actually affect the fragment and in that way improves the performance.

3.1.3 Deferred Rendering

Figure 3.10: Shows the deferred rendering shading pipeline. Picture taken from [3].

Deferred rendering is used to reduce the number of shaded fragments to those that actually contribute to the final color on the screen. Deferred rendering is divided into two passes. More passes can be added, for example a transparent pass if needed, but in this experiment only two passes are used: the geometry pass and the lighting pass. This technique delays most of the heavy part of the rendering process, since the major shading calculations are performed in the second pass. In the first pass each geometric object in the scene is rasterized to specific buffers called g-buffers. These g-buffers are shown in figure 3.11 and are like 2D images which store geometric data such as positions, normals, diffuse factors and depth, to be used later for different calculations in the second pass.

Figure 3.11: Shows the creation of the G-buffer which stores geometric data such as positions, normals, albedo (diffuse) and depth.

The g-buffers are created on the CPU side, together with a full-screen quad onto which these g-buffers will be rendered. The vertex shader of the first pass is identical to the forward rendering vertex shader: it calculates the MVP matrix and passes the geometry data on to the next shader.

Figure 3.12: Shows the first pass fragment shader.

The fragment shader receives the data and writes it to the g-buffer. All of this is done with the g-buffer framebuffer. The second pass, on the other hand, is rendered using the default framebuffer until the screen quad is rendered. After that, the data is read from the g-buffer framebuffer and drawn to the default framebuffer. The second pass vertex shader receives the positions and uv-coordinates. The uv-coordinates are then sent to the fragment shader, shown in figure 3.13, where they are used to render the scene according to which texture is chosen.

Figure 3.13: This illustration shows the second pass fragment shader.

Figure 3.14: This figure shows the position, normal and albedo (diffuse) buffers.
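The two passes can be illustrated with a small CPU-side simulation. This is only a sketch under simplifying assumptions: the g-buffer holds just depth and a scalar albedo (no positions or normals), lights are reduced to scalar intensities, and all type and function names are hypothetical rather than taken from the thesis engine.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// One rasterized fragment: pixel coordinates, depth and a scalar albedo.
struct Fragment { int x, y; float depth; float albedo; };

struct GBuffer {
    int width, height;
    std::vector<float> depth;   // infinity marks a pixel nothing was drawn to
    std::vector<float> albedo;
    GBuffer(int w, int h)
        : width(w), height(h),
          depth(static_cast<std::size_t>(w) * h, std::numeric_limits<float>::infinity()),
          albedo(static_cast<std::size_t>(w) * h, 0.0f) {}
};

// First pass: keep only the nearest fragment per pixel. No lighting is done here.
inline void geometryPass(GBuffer& g, const std::vector<Fragment>& fragments) {
    for (const Fragment& f : fragments) {
        std::size_t i = static_cast<std::size_t>(f.y) * g.width + f.x;
        if (f.depth < g.depth[i]) {
            g.depth[i] = f.depth;
            g.albedo[i] = f.albedo;
        }
    }
}

// Second pass: shade every covered pixel once per light.
// Returns the number of light evaluations that were performed.
inline std::size_t lightingPass(const GBuffer& g,
                                const std::vector<float>& lightIntensities,
                                std::vector<float>& image) {
    image.assign(g.depth.size(), 0.0f);
    std::size_t evaluations = 0;
    for (std::size_t i = 0; i < g.depth.size(); ++i) {
        if (g.depth[i] == std::numeric_limits<float>::infinity()) continue;
        for (float intensity : lightIntensities) {
            image[i] += g.albedo[i] * intensity;
            ++evaluations;
        }
    }
    return evaluations;
}
```

With three overlapping fragments on one pixel and two lights, this sketch performs two light evaluations, whereas shading every fragment as it is rasterized would perform six.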

The light calculations are similar to the forward rendering light calculations. Since deferred rendering itself is a heavy rendering technique, it needs to be optimized. The first and fastest way to optimize deferred rendering is to apply light volumes to the lightsources in the scene. This means that the volume of each lightsource is calculated and the light calculations are made only if the fragment is inside the light volume, which reduces the amount of computation. The final result is the same, but now only relevant fragments are shaded for each lightsource in the scene, as shown in figure 3.15. The second way to optimize deferred rendering is tile-based deferred rendering. This method uses tiles to cull lights that do not affect a specific fragment. The goal of tile-based deferred rendering is to reduce the bandwidth and the number of light calculations.

Figure 3.15: Shows the visual difference between deferred rendering without and with light volumes.
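A light volume test could look roughly like the sketch below. It assumes a spherical volume whose radius is the distance at which the attenuated intensity falls to a chosen cut-off; the cut-off `threshold`, the constant/linear/quadratic attenuation form and all names are assumptions made for illustration.

```cpp
#include <cmath>

// Attenuation coefficients of a point light (assumed constant/linear/quadratic form).
struct Attenuation { float constant, linear, quadratic; };

// Distance d at which maxIntensity / (constant + linear*d + quadratic*d*d)
// drops to `threshold`, found by solving the quadratic equation for d.
// Requires quadratic > 0 and maxIntensity / threshold > constant.
inline float lightVolumeRadius(const Attenuation& a, float maxIntensity, float threshold) {
    float c = a.constant - maxIntensity / threshold;
    float discriminant = a.linear * a.linear - 4.0f * a.quadratic * c;
    return (-a.linear + std::sqrt(discriminant)) / (2.0f * a.quadratic);
}

// True when the fragment lies inside the sphere around the light,
// so the light calculation is worth performing for it.
inline bool insideLightVolume(float fx, float fy, float fz,
                              float lx, float ly, float lz, float radius) {
    float dx = fx - lx, dy = fy - ly, dz = fz - lz;
    return dx * dx + dy * dy + dz * dz <= radius * radius;
}
```

Comparing squared distances avoids a square root per fragment and light.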

3.1.4 Tile-based Forward Rendering

Tile-based forward rendering, also called forward plus rendering, is an improved version of the standard forward rendering technique. This technique has become popular because deferred rendering was not always the optimal choice. It is divided into three passes if transparent objects are used, otherwise only two: the light culling pass and the opaque pass. The goal of this rendering technique is to combine the advantages of both forward rendering and deferred rendering. For this experiment the most important factor is the light calculation performance, and the forward plus rendering technique has an improved light culling method which reduces the number of light calculations[6]. Only lights which affect a fragment are included. This is done by dividing the screen into a fixed number of tiles, each containing A * B pixels[2]. Each tile is then used to create a frustum, and the frustum is used for light culling. If a pointlight is positioned close enough to a tile's frustum it is included, otherwise it is culled. The included lightsources are later inserted into specific lists and used for the final light calculations.

The first step in the implementation of tile-based forward rendering is the creation of the tile frustums, also called the grid frustum. The screen is divided into a fixed number of 2D tiles of a specific size. The tiles are created in view space and are later used for the light culling procedure. The size of each tile needs to be chosen carefully. Smaller tiles result in higher storage and bandwidth usage, but they give better light computation accuracy[11]. Larger tiles show the opposite behaviour: they lead to lower bandwidth usage but decrease the light computation accuracy[12]. More precisely, with larger tiles it is harder to determine accurately which lightsource actually touches which tile on the screen. In this experiment the number of tiles varies from a minimum of 1 tile to a maximum of 36 tiles.

The second step in the implementation of tile-based forward rendering is the light culling procedure. The light culling process uses the grid frustum created earlier. First of all, minimum and maximum depth values are calculated for each tile in the frustum grid[6]. These values are used to determine the near and far planes, which are then used to cull light volumes. When transparent geometry is used, only light volumes that are beyond the maximum far plane are culled, but as mentioned earlier transparent geometry is not used in this experiment, and for that reason only solutions for opaque geometry are described. Lightsources that are in front of the opaque geometry and affect a specific pixel in the tile are included. During the light culling process each lightsource is tested to verify whether it is positioned close enough to a tile's frustum. If it is, the lightsource is saved in specific light buffers for further use. When implementing this rendering method, different buffers are used to store the information needed for creating the frustum grid and performing the light culling process. These buffers are the light list, the light index list and the light grid list[11]. The light list contains all existing lightsources that need to be processed. The light grid list is a 2D grid that stores, for each tile, a size and an offset into the light index list, where the size is the number of lightsources affecting the tile and the offset is the position in the light index list at which the tile's entries begin. The size of the light grid list depends on the number of tiles used on the screen for light culling. The number of tiles on the screen can be calculated from the size of each tile and the resolution of the screen.
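The relationship between the three lists can be sketched as follows. To stay short, this sketch replaces the view-space frustum test with a 2D circle-versus-rectangle test in pixel coordinates, and it assumes that the light index list stores indices into the light list. All type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// A light reduced to a circle in pixel coordinates (a simplification of the
// sphere-versus-frustum test used with real tile frustums).
struct ScreenLight { float x, y, radius; };

// Per-tile record: where the tile's entries start in the light index list
// and how many lights affect the tile.
struct GridEntry { std::uint32_t offset, size; };

struct CulledLights {
    int tilesX = 0, tilesY = 0;
    std::vector<GridEntry> lightGrid;           // one entry per tile, row by row
    std::vector<std::uint32_t> lightIndexList;  // indices into the light list
};

// Number of tiles needed to cover `pixels` (rounded up).
inline int tileCount(int pixels, int tileSize) {
    return (pixels + tileSize - 1) / tileSize;
}

inline bool circleTouchesRect(const ScreenLight& l, float x0, float y0, float x1, float y1) {
    float cx = std::max(x0, std::min(l.x, x1));  // closest point of the rectangle
    float cy = std::max(y0, std::min(l.y, y1));
    float dx = l.x - cx, dy = l.y - cy;
    return dx * dx + dy * dy <= l.radius * l.radius;
}

inline CulledLights cullLights(const std::vector<ScreenLight>& lightList,
                               int width, int height, int tileW, int tileH) {
    CulledLights out;
    out.tilesX = tileCount(width, tileW);
    out.tilesY = tileCount(height, tileH);
    for (int ty = 0; ty < out.tilesY; ++ty) {
        for (int tx = 0; tx < out.tilesX; ++tx) {
            float x0 = static_cast<float>(tx * tileW);
            float y0 = static_cast<float>(ty * tileH);
            float x1 = static_cast<float>(std::min((tx + 1) * tileW, width));
            float y1 = static_cast<float>(std::min((ty + 1) * tileH, height));
            GridEntry entry{static_cast<std::uint32_t>(out.lightIndexList.size()), 0};
            for (std::uint32_t i = 0; i < lightList.size(); ++i) {
                if (circleTouchesRect(lightList[i], x0, y0, x1, y1)) {
                    out.lightIndexList.push_back(i);
                    ++entry.size;
                }
            }
            out.lightGrid.push_back(entry);
        }
    }
    return out;
}
```

During shading, a fragment in tile `t` would then loop over `lightIndexList[offset]` to `lightIndexList[offset + size - 1]` for `lightGrid[t]` instead of over the whole light list. At the 800 x 640 resolution used in this experiment, tiles of 400 x 320 pixels give a 2 x 2 grid, that is 4 tiles.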

To determine whether a lightsource affects a specific pixel in the tile, frustum culling is used; for pointlights, frustum-sphere culling is used. Each lightsource has a radius of activity within which its intensity decreases with distance according to the attenuation coefficients described earlier. This radius, together with the origin of the lightsource, is used to perform a sphere collision test against the tile frustum.
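A sphere-versus-frustum test of this kind can be sketched as below. The sketch assumes that a tile frustum is given as six planes with unit normals pointing inwards, with the near and far planes derived from the tile's minimum and maximum depth. The names and the plane representation are illustrative and are not taken from the thesis engine.

```cpp
#include <algorithm>
#include <array>
#include <vector>

struct Vec3 { float x, y, z; };
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Plane with a unit normal pointing into the frustum:
// a point p is on the inner side when dot(normal, p) - distance >= 0.
struct Plane { Vec3 normal; float distance; };

// Point light volume: origin of the light and its radius of activity.
struct Sphere { Vec3 center; float radius; };

// True when the whole sphere lies on the outer side of the plane.
inline bool sphereBehindPlane(const Sphere& s, const Plane& p) {
    return dot(p.normal, s.center) - p.distance < -s.radius;
}

// The light is kept unless its sphere lies completely behind one of the planes.
// The test is conservative: near frustum corners it can keep a light that does
// not actually touch the tile, but it never culls a light that does.
inline bool sphereTouchesFrustum(const Sphere& s, const std::array<Plane, 6>& frustum) {
    for (const Plane& p : frustum) {
        if (sphereBehindPlane(s, p)) return false;
    }
    return true;
}

// Minimum and maximum depth of the pixels in one tile, which would be used
// to place the near and far planes of that tile's frustum.
struct DepthRange { float nearest, farthest; };

inline DepthRange tileDepthRange(const std::vector<float>& tileDepths) {
    if (tileDepths.empty()) return {0.0f, 0.0f};
    auto bounds = std::minmax_element(tileDepths.begin(), tileDepths.end());
    return {*bounds.first, *bounds.second};
}
```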

Figure 3.16: Shows different lists used for the tile-based forward rendering. Picture taken from [2].

Tile-based forward rendering is itself an optimization of forward rendering, but for further optimization other rendering methods can be used, such as clustered rendering. In tile-based rendering techniques the view samples are grouped by their 2D positions, whereas the clustered rendering method adds depth subdivision[10] and divides the screen into 3D clusters[12]. A 3D cluster is a fixed 3D volume that groups view samples, positions and normals. The published results showed a reduction of light calculations using this method. It shows better performance than the tile-based rendering method, even in situations with heavy geometry and a large number of lights in the scene[9].
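The step from 2D tiles to 3D clusters can be illustrated by computing a cluster key from a pixel position and a view-space depth. This sketch assumes uniform (linear) depth slices purely for simplicity and leaves out normals; the function and parameter names are hypothetical, and the cited papers may subdivide depth differently.

```cpp
#include <algorithm>

// Identifies one cluster: tile column, tile row and depth slice.
struct ClusterKey { int x, y, z; };

// px, py: pixel coordinates; viewDepth: distance from the camera along the
// view direction; zNear, zFar: depth range covered by the clusters.
inline ClusterKey clusterKey(int px, int py, float viewDepth, int tileSize,
                             float zNear, float zFar, int depthSlices) {
    float t = (viewDepth - zNear) / (zFar - zNear);  // 0 at zNear, 1 at zFar
    int slice = static_cast<int>(t * depthSlices);
    slice = std::max(0, std::min(slice, depthSlices - 1));  // clamp to a valid slice
    return {px / tileSize, py / tileSize, slice};
}
```

Two samples in the same tile but at very different depths get different keys, so a light near one of them no longer has to be evaluated for the other.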

3.2 Performance Testing

The performance measurement procedure was as follows: the render engine was started with a chosen rendering method and a specific scene was loaded to perform the tests on. Each test can be launched whenever the user wants, and the duration can be increased to get more reliable results. Each test in this experiment is performed at a screen resolution of 800 x 640 pixels. The tests collect data in the form of frames per second for a fixed amount of time. The frametime tests are the main tests which return the performance data, and the framerate tests are used to confirm the performance from the frametime tests. When a test is finished, the average values are calculated and exported to an external test file.

This test file is then used to create graphs and tables of results. In this experiment each test was 5 minutes long and the number of lightsources increased for each test, from a minimum of 2 lightsources to a maximum of 256. The tests were identical for all rendering methods, but at the end additional tests were performed that collected performance data based on the number of tiles in the frustum grid. Each tile test had 16 lightsources in the scene and was done in order to determine which number of tiles is the best choice with respect to performance.

Figure 3.17: Shows an example scene for tests of light calculation performance.

3.2.1 Hardware

The following hardware was used to implement and perform the tests: the CPU was an AMD(R) Ryzen 5 3600 (4.2 GHz) with the following RAM configuration: Corsair (2x8GB) DDR4 3200 MHz CL16 Vengeance LPX Black. The following GPU was used for all rendering procedures: Gigabyte Geforce GTX 1660 Super 6GB 1785 MHz.

Chapter 4 Results

The first tests were done using the forward rendering technique. The tests were in general divided into two subcategories: the first collected frametime data and the second collected framerate data. For each rendering technique 16 tests were done with different numbers of lightsources. Moreover, when collecting the tile-based forward rendering performance data, the tests were divided into a further subcategory based on the number of tiles in the frustum grid.

Figure 4.1: Shows the results of forward rendering frametime test.

Figure 4.1 above illustrates the results of the frametime test using the forward rendering technique. The vertical axis in the figure represents the frametime test value in frames per second (FPS) and the horizontal axis represents the number of lightsources. Figure 4.1 shows that the test results were distributed between approximately 200 frames per second and around 1860 frames per second.

Figure 4.2: Shows the results of forward rendering framerate test.

Figure 4.2 shows the framerate tests. These tests are identical to the frametime tests and are used to validate the data from the first tests in figure 4.1. The vertical axis in the figure represents the framerate test value in milliseconds and the horizontal axis represents the number of lightsources, as in the previous figure. The test showed values between 0.539 milliseconds and 5.047 milliseconds.

Figure 4.3: Shows the results of deferred rendering frametime test.

Figure 4.4: Shows the results of deferred rendering framerate test.

Figures 4.3 and 4.4 show the results for the deferred rendering method. The tests are identical to the tests performed using the forward rendering method and the figures have the same axis layout. The frametime tests showed results between 474 and 732 frames per second and the framerate tests showed values between 1.365 and 2.108 milliseconds.

Figure 4.5: Shows the performance results with different amount of tiles for tile-based forward rendering.

As mentioned earlier, performance tests with different numbers of tiles were done for the tile-based forward rendering method. These tests were done to answer RQ3: what number of tiles is the best choice with respect to performance? In this experiment a minimum of 4 and a maximum of 36 tiles were used for the tile-based forward rendering method. Figure 4.5 shows the results from these tests; the values ranged from 500 down to 40 frames per second. All these tests were done in the same scene with the same number of lightsources. For the tile-based forward rendering light performance measurements, 4 tiles were used in the frustum grid.

Figure 4.6: Shows the results of tile-based forward rendering frametime test.

Figure 4.7: Shows the results of tile-based forward rendering framerate test.

Figure 4.6 shows the frametime results; these tests showed results between 800 and 30 frames per second. The framerate tests in figure 4.7 showed results between 1.22 and 30.3 milliseconds. All these tests were performed with 4 tiles in the frustum grid.

Chapter 5 Analysis and Discussion

In this chapter the results are analyzed and discussed in order to answer the stated research questions. After that, the hypothesis is described and analyzed.

Figure 5.1: Shows the frametime results with the forward rendering in blue, deferred rendering in orange and tile-based forward rendering in grey.


Figure 5.2: Shows the framerate results with the forward rendering in blue, deferred rendering in orange and tile-based forward rendering in grey.

Figures 5.1 and 5.2 collect the performance results of all rendering techniques. The frametime tests are collected in figure 5.1 and the framerate tests are collected in figure 5.2. This makes it easier to compare and analyze the results in this chapter.

During the forward rendering light performance tests it was noticed that the more lightsources there are in the scene, the lower the fps value becomes. Figure 5.1 shows that the light performance of forward rendering is stable until the number of lightsources reaches around 50. When the scene contains more than 50 lightsources the fps value decreases more sharply: the frametime drop from 49 lightsources to 64 lightsources is 472 fps, which is a performance decrease of approximately 27 percent. The same thing was noticed when the number of lightsources increased from 64 to 81. It was also noticed that the performance decrease was smaller when the number of lights increased to 100. Figure 5.1 shows that the main performance decrease for the forward rendering technique lies between 50 and 100 lightsources. In general, the performance decrease grows in environments with fewer than 50 lightsources and, on the other hand, shrinks in scenes with more than 50 lightsources.

The results of deferred rendering differ from those of forward rendering, since the frametime results of deferred rendering were much lower when the number of lights was lower than 120. Deferred rendering was, on the other hand, much stronger in environments with more than 120 lightsources and showed approximately 30 to 150 percent better performance.

Tile-based forward rendering showed lower frametime results than the standard forward rendering and deferred rendering techniques. In environments with fewer than 9 lightsources the performance is approximately 6 times lower compared to forward rendering, and in environments with 49 lightsources the performance is around 8 times lower. In comparison with the deferred rendering method, tile-based forward rendering showed 2-3 times lower performance in environments with fewer than 50 lightsources. In scenes with a higher lightcount the performance dropped even more: it was 6-7 times lower with 100 lightsources and 9-10 times lower with 170 lightsources. With the maximum number of lights (256) the performance was 6 times lower than the light performance of the standard forward rendering technique and 17-18 times lower than the deferred rendering light performance.

The graph in figure 5.2 illustrates the framerate results of the implemented rendering techniques. The first rendering technique, in blue, is forward rendering. Its performance drop begins in environments with more than 50 lightsources, which confirms the frametime results. The framerate results for deferred rendering are nearly linear and are again worse than forward rendering in environments with fewer than 110-120 lightsources. The deferred rendering framerate tests show the same trend as the deferred rendering frametime tests. In both cases the results are worse than forward rendering when the number of lights is lower than 110-120 and much better when it is higher than 120. Tile-based forward rendering again showed worse performance, as in the frametime tests. The first large performance drop begins in environments with 16 lightsources and the second drop was in scenes with approximately 50 lightsources.

To test the light performance of tile-based forward rendering, 4 tiles were used in the frustum grid. Figure 4.5 illustrates the performance results in scenes with 16 lightsources and different numbers of tiles. The best performance was achieved with 4 tiles in the frustum grid, and for that reason the 4-tile configuration was chosen to test the tile-based forward rendering light performance.

The tile-based forward rendering light performance results showed low performance compared to the other rendering methods, and the reason for that may be the implementation. The frustum grid, sphere culling and light culling were implemented on the CPU side. This may be the reason for the low performance, since the CPU cannot always handle such tasks efficiently; for that reason the GPU should be used instead. For these tasks a compute shader, which runs many threads in parallel, can be used for the different logical and rendering calculations.

The hypothesis of this experiment was that tile-based forward rendering would show better performance in environments with a large number of lights and that the performance would be best if 9 tiles were used in the frustum grid. This part of the hypothesis was not confirmed, since the experiment showed worse results than the other rendering techniques in scenes with both low and high lightcount. The hypothesis also suggested that the standard forward rendering technique would show better light performance in environments with few lightsources; this was confirmed in the results and discussed above. To answer RQ1, the most suitable rendering technique for environments with a high lightcount is the deferred rendering technique, since it showed the best light performance of the rendering methods. To answer RQ2, the optimal rendering technique for environments with a limited number of lightsources is the standard forward rendering technique, since it showed the best light performance compared to the other rendering techniques used in this experiment. The answer to RQ3 is as follows: for the best performance using tile-based forward rendering, the best number of tiles in the frustum grid is 4.

Chapter 6 Conclusions and Future Work

The goal of this study was to find the most efficient rendering technique for light rendering in specific environments. Three different rendering techniques were implemented in a purpose-built rendering engine based on the OpenGL API. When the implementation of the techniques was completed, different tests were performed to collect performance data from each rendering technique. To evaluate the received results, different graphs were made which represent the average values of frametime and framerate. The analysis of the results showed that the stated hypothesis was not fully confirmed.

The results showed that standard forward rendering was the most suitable for use in environments with a low lightcount, which matches the suggestion of the hypothesis. For environments with a high lightcount, on the other hand, the hypothesis suggested tile-based forward rendering, but since it showed bad performance this part of the hypothesis was disproven. Furthermore, the number of tiles for tile-based forward rendering did not match the hypothesis: the best performance was achieved with 4 tiles instead of 9. The reason for this may be the way tile-based forward rendering was implemented, since the culling procedures are implemented and executed on the CPU side. Tile-based forward rendering with the current implementation is not good enough to be an alternative to the standard forward rendering technique. The hypothesis was not confirmed in general, but the collected results are sufficient to answer the stated research questions.

The answer to RQ1, "Since light rendering is an expensive and a common process, what rendering technique is the most suitable technique to be used in an environment with a large amount of lights?", is deferred rendering: it fulfilled its purpose and was much stronger than the standard forward rendering and tile-based forward rendering.

The answer to RQ2, "In environments with a limited amount of lights, which rendering technique is optimal to use?", is the standard forward rendering. This technique showed much better light performance than tile-based forward rendering and deferred rendering, but specifically in environments with a limited number of lightsources.

The answer to RQ3, "When using tile-based forward rendering, what amount of tiles is the best choice according to the performance?", is that the maximum performance was found when there were 4 tiles in the frustum grid.

Different techniques with standard methods were researched in this study; in the future, studies can be done with new and more advanced rendering techniques. The first suggestion for future work is to implement tile-based forward rendering using a compute shader and compare it with the method used here. This would allow a comparison between culling done on the CPU side and on the GPU side. It can be interesting since it can answer different questions about which operations can be done on the CPU side and which can be sent to the GPU. Furthermore, there are other suggestions for future work on light performance, such as cluster-based rendering techniques and volume tiled rendering techniques.

Cluster-based rendering is, as mentioned earlier, an optimization which can be based on either forward rendering or deferred rendering. The tile-based rendering method creates 2D tiles, and the cluster-based rendering method extends this by dividing the screen into 3D clusters. A 3D cluster is a group of data with positions, normals and view samples. For each sample a cluster key is computed and used to encode the quantized normals of the sample. This improves the light culling process and leads to improved light calculation performance. Cluster-based rendering is an interesting technique that shows better performance in scenes with a large number of lights, and it would be interesting to implement it in the future and compare the results with older rendering techniques. Furthermore, it would also be interesting to implement the volume-tiled rendering technique, since this technique supports large numbers of lights in the scene and shows better performance compared to tile-based rendering methods. It would be interesting to compare this technique with cluster-based rendering techniques in the future.

References

[1] 3D GEP. 2014. Accessed 23 July 2020. https://www.3dgep.com/texturing-and-lighting-with-opengl-and-glsl/.

[2] 3D GEP. 2015. Accessed 8 August 2020. https://www.3dgep.com/forward-plus/.

[3] Gamedevelopment. Accessed 6 May 2020. https://gamedevelopment.tutplus.com/articles/forward-rendering-vs-deferred-rendering-gamedev-12342.

[4] LearnOpenGL. Accessed 28 April 2020. https://learnopengl.com/lighting/colors.

[5] LearnOpenGL. Accessed 4 May 2020. https://learnopengl.com/advanced-lighting/deferred-shading.

[6] T.Akenine-Möller B.Johnsson. A performance and energy evaluation of many- light rendering algorithms. 2014.

[7] F.Fonseca F.Policarpo. Deferred shading tutorial. 2005.

[8] M.Rahm. Forward plus rendering performance using the gpu vs cpu multi- threading. 2017.

[9] M.Billeter O.Olsson and U.Assarsson. Clustered deferred and forward shading. 2012.

[10] M.Billeter O.Olsson and U.Assarsson. Tiled and clustered forward shading. 2012.

[11] U.Assarsson O.Olsson. Tiled shading. 2011.

[12] J.Van Oosten. Volume tiled forward shading. 2017.

[13] T.Harada. A 2.5d culling for forward+. 2012.

[14] J.C.Yang T.Harada, J.McKee. Forward+: Bringing deferred lighting to the next level. 2012.
