Master of Science in Engineering: Game and Software Engineering July 2020

Analysing Variable Rate Shading’s Image-Based Shading in Deferred Lighting Composition A comparison between image-based shading and uniform shading for games

Filip Lundbeck

Faculty of Computing, Blekinge Institute of Technology, 371 79 Karlskrona, This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial fulfilment of the requirements for the degree of Master of Science in Engineering: Game and Software Engineering. The thesis is equivalent to 20 weeks of full time studies.

The authors declare that they are the sole authors of this thesis and that they have not used any sources other than those listed in the bibliography and identified as references. They further declare that they have not submitted this thesis at any other institution to obtain a degree.

Contact Information: Author(s): Filip Lundbeck E-mail: fi[email protected]

University advisor: Dr. Prashant Goswami Department of Computer Science

Faculty of Computing Internet : www.bth.se Blekinge Institute of Technology Phone : +46 455 38 50 00 SE–371 79 Karlskrona, Sweden Fax : +46 455 38 50 57 Abstract

Background. The shading cost of a pixel is only getting more expensive with more realistic games. Resolution of games is equally pushed to display the all the details in a scene. This causes rendering a frame to be very expensive. Dynamic Resolution Rendering has been used to uniformly decreases resolution to gain performance but with the new release of image-based shading through Variable Rate Shading could be the new way to gain performance with less impact on image quality. Objectives. The goal is to see if the adaptive shading possibilities of Variable Rate Shading can show equal or better results, in regards to performance and image qual- ity, compared to the uniform shading of Dynamic Resolution Rendering. Methods. This investigation is performed by implementing them into the Deferred Lighting pass in a Deferred Renderer. The performance is measured by the render pass time of the Deferred Lighting and the image quality is measured by comparing the final frames of Variable Rate Shading and Dynamic Resolution Rendering against the original resolution through SSIM. Results. Overall Variable Rate Shading show comparable performance results to Dynamic Resolution Rendering but the image quality is closer to the original reso- lution. Conclusions. Using image-based shading on the deferred lighting pass allow the possibility of extracting similar performance gains as dynamic resolution rendering but allows maintaining higher image quality.

Keywords: Rendering, Adaptive Shading, Uniform Shading, Variable Rate Shading, Dynamic Resolution Rendering

i

Sammanfattning

Bakgrund. Kostnaden att ljussätta en pixel blir dyrare med realistiska spel. Up- plösningen av spel är ökas i samma tak för att visa upp detaljerna. Att rendera en blidruta för användare blir därför allt dyrare. Dynamic Resolution Rendering är ett sätt att enhetligt minska upplösningen för att öka prestandan fast med den nyligen släppta variable rate shading finns det nya sätt att öka prestandan men med mindre påverkan på bildkvalitén. Syfte. Målet är att se om adaptiv ljussätningsning via Variable Rate Shading kan tillåta liknande eller bättre resultat, med hänsyn på prestanda och bildkvalité, jäm- fört med den enhetliga ljussättningen av Dynamic Resolution Rendering. Metod. Denna undersökning kommer att utföras genom att implementera Variable Rate Shading och Dynamic Resolution Rendering i en Deferred Renderer. Prestan- dan kommer att mätas genom att ta tid för ljussättningspasset och bildkvalitén kommer att mätas genom att jämföra den slutgiltiga bildrutan av båda teknikerna mot den ursprunliga upplösningen via SSIM. Resultat. Övergripande visade Variable Rate Shading jämförelsebar prestanda när den applicerades på Deferred Lighting passet fast hade bildkvalité som liknande mer den originalupplösningen. Slutsatser. Variable Rate Shading visade sig vara jämförbar i prestandan som i jämförelse med dynamic resolution rendering, fast gav bättre möjlighet att bibehålla bildkvalitén.

Nyckelord: Rendering, Adaptiv ljussättning, enhetlig ljussättning, Variable Rate Shading, Dynamic Resolution Rendering

iii

Acknowledgments

I would like to thank for allowing me to conduct my thesis at the office which has been very educational and exciting. I would also like to thank Duncan Williams for being my supervisor at the company with several interesting discussions that has helped me with my thesis. Lastly I would like to thank the Graphics Team at Avalanche Studios Group for more insight into graphics programming.

v

Contents

Abstract i

Sammanfattning iii

Acknowledgments v

1 Introduction 1 1.1 Aim and Research Questions ...... 3 1.2 Outline ...... 4

2 Background 5 2.1 Compute Shader ...... 5 2.2 Deferred Rendering ...... 5 2.3 Dynamic Resolution Rendering ...... 6 2.4 Variable Rate Shading ...... 7 2.5 Structural Similarity Index ...... 8

3 Related Work 9

4 Method 11 4.1 Implementation ...... 11 4.1.1 Variable Rate Shading ...... 11 4.1.2 Dynamic Resolution Rendering ...... 13 4.2 Heuristics ...... 13 4.2.1 Haar Wavelet Transform ...... 14 4.2.2 Luma & Albedo ...... 15 4.2.3 Silhouette Rendering ...... 16 4.2.4 Nvidia’s Content Adaptive ...... 17 4.2.5 Combined ...... 17 4.3 Evaluation ...... 17 4.3.1 Scenes Under Test ...... 19 4.4 Validity ...... 20 4.5 Reliability ...... 20 4.6 Delimitations ...... 21

5 Results and Analysis 23

6 Discussion 37

vii 7 Conclusions and Future Work 39 7.1 Future Work ...... 39

References 41

A Supplemental Information 45

viii List of Figures

1.1 In-game capture from Red Dead Redemption 2 at 1920x1080 output resolution on PlayStation 4...... 1

2.1 Top: Four G-Buffers in Killzone 2 (from left to right), Depth, Albedo, Normal and Specular. Bottom Left: Deferred Composition of lights and G-Buffers. Bottom Right: Final frame after post-processing. (Source: DEFERRED RENDERING IN KILLZONE 2 [26]) . . . . . 6 2.2 Dynamic Resolution Rendering works by varying the size of the view- port below the size of the render target. (Source: Dynamic Resolution Rendering Article [11]) ...... 6 2.3 Visualization of the shading rate texture feature in Variable Rate Shading (Source: Nvidia VRWorks - Variable Rate Shading [20]) . . . 7

4.1 Pseudo code for the tiling compute shader (in hlsl) used to determine the adaptive shading for Variable Rate Shading...... 12 4.2 Pseudo code for edge detection using Haar Wavelet Transform (in c) used to evaluate the desired shading rate...... 14 4.3 Scenes under test - Area in rectangle is further investigated ...... 19

5.1 Left: Global SSIM value. Right: Performance including both the time for the overhead and the lighting pass. (Note: It is not a linear increase between resolutions) ...... 29 5.1 Left: Global SSIM value. Right: Performance including both the time for the overhead and the lighting pass. (Note: It is not a linear increase between resolutions) ...... 30 5.2 Local SSIM Values for the different heuristics in Desert Scene at 1200p. Comparison between resolutions can be seen in the appendix in figure A.1...... 31 5.3 Local SSIM Values for the different heuristics in Forest Scene at 1200p. Comparison between resolutions can be seen in the appendix in figure A.2...... 31 5.4 Local SSIM Values for the different heuristic in Office Scene at 1200p. Comparison between resolutions can be seen in the appendix in figure A.3...... 31 5.5 Local SSIM Values for the different heuristic in Sewer Scene at 1200p. Comparison between resolutions can be seen in the appendix in figure A.4...... 32

ix 5.6 A zoomed in section of the Forest Scene on the final image using different heuristics at 1200p (Crop at 240x170 resolution)...... 33 5.7 A zoomed in section of the Sewer Scene on the final image using different heuristics at 1200p (Crop at 240x170 resolution)...... 34 5.8 A zoomed in section of the Office Scene on the final image using different heuristics at 1200p (Crop at 240x170 resolution)...... 35 5.9 A zoomed in section of the Desert Scene on the final image using different heuristics at 1200p (Crop at 240x170 resolution)...... 36

A.1 Local SSIM values in Desert Scene at 1200p, 1600p and 2400p. . . . . 46 A.2 Local SSIM values in Forest Scene at 1200p, 1600p and 2400p. . . . . 47 A.3 Local SSIM values in Office Scene at 1200p, 1600p and 2400p. . . . . 48 A.4 Local SSIM values in Sewer Scene at 1200p, 1600p and 2400p. . . . . 49 A.5 Box plot of desert scene with 1000 samples ...... 50 A.6 Box plot of sewer scene with 1000 samples ...... 51 A.7 Box plot of office scene with 1000 samples ...... 52 A.8 Box plot of forest scene with 1000 samples ...... 53

x List of Tables

2.1 Memory usage required for Shading Rate Texture at different resolutions. 8

4.1 Factors of interests for test cases and evaluation...... 17

5.1 Median time (ms) for upsampling (DRR) and tiling (VRS) at different resolutions using different heuristics...... 24 5.2 Median render time (ms) for the deferred lightning pass and global SSIM value at different resolutions using different heuristics in the Desert scene. More detailed values can be seen in the appendix at A.5. 24 5.3 The percentage of time (%) taken for Variable Rate Shading and Dy- namic Resolution Rendering based on the time of the native resolution in the Desert scene when only comparing the deferred lighting pass. . 24 5.4 Median render time (ms) for the deferred lightning pass and global SSIM value at different resolutions using different heuristics in the Office scene. More detailed values can be seen in the appendix at A.7. 25 5.5 The percentage of time (%) taken for Variable Rate Shading and Dy- namic Resolution Rendering based on the time of the native resolution in the Office Scene when only comparing the deferred lighting pass. . 25 5.6 Median render time (ms) for the deferred lightning pass and global SSIM value at different resolutions using different heuristics in the Forest scene. More detailed values can be seen in the appendix at A.8. 26 5.7 The percentage of time (%) taken for Variable Rate Shading and Dy- namic Resolution Rendering based on the time of the native resolution in the Forest Scene when only comparing the deferred lighting pass. . 26 5.8 Median render time (ms) for the deferred lightning pass and global SSIM value at different resolutions using different heuristics in the Sewer scene. More detailed values can be seen in the appendix at A.6. 27 5.9 The percentage of time (%) taken for Variable Rate Shading and Dy- namic Resolution Rendering based on the time of the native resolution in the Sewer Scene when only comparing the deferred lighting pass. . 27

xi

Chapter 1 Introduction

Figure 1.1: In-game capture from Red Dead Redemption 2 at 1920x1080 output resolution on PlayStation 4.

Many games are always pushing for higher fidelity graphics, lighting being one of the more important aspects in real-time rendering to achieve believable results. The cost to calculate the lighting is getting more expensive and the demand is increasingly difficult to please in games [28]. The majority of users on PC currently have 1080p display resolution on their primary display [23]. The latest released consoles as of today, One X (2017) and PlayStation 4 Pro (2016), are pushing for 2160p, which is four times the amount of pixels in comparison to 1080p. As of June 2017 Xbox introduced new packaging icons [18] for their console where they clarified that the 4K Ultra HD icon indicate that a 2160p frame buffer is used but the game is not necessarily rendered in that resolution. This means techniques are often used to mimic 2160p resolution when frame time is not enough in games. The problem is therefore to reach higher resolution with minimal loss in frame time. One way to tackle this is to reduce the amount of shading required per pixel for a frame but still retaining the amount of pixels of 2160p resolution. There are sev- eral techniques that aims to solve this by producing pixel information without actual

1 2 Chapter 1. Introduction shading calculations; Checkerboard Rendering (CBR), Dynamic Resolution Render- ing (DRR) are the most commonly used in the games industry [10] and the newly released Variable Rate Shading (VRS) may possibly fit in there as well. Temporal Injection is another technique to upscale an image to 2160p, developed by Insom- niac, based on jittered temporal information to produce anti-aliased image in 2160p resolution [14]. DRR is used to decrease the resolution uniformly across the screen to decrease frame time at the cost of decreased image quality and CBR renders the image at half resolution and is upscaled through a checkerboard pattern, but both of them requires time to implement in a game engine. The X often requires CBR and DRR to achieve 2160p with desirable frame rate [18]. Many games use these kinds of techniques [8] since native 2160p is out of the scope for most games on the market running on today’s hardware. Even the showcase of Unreal Engine 5 on PlayStation 5, a console supposedly set to release during the end of 2020, used some form of DRR [15] to lower render resolution to 1440p to increase performance. The problem with DRR is that the decrease in render resolution is uniform across the image and therefore cannot retain important details of the scene. Important de- tails can be lost and the whole image gets blurry during upsampling to the desired output resolution. This is where VRS is a possible contender to DRR. VRS allows for adaptive shading in the current pipeline and does not require any modification to existing code, making it convenient to implement. This technique can decrease and retain resolution in different areas of the image where desired [25]. It can pos- sibly provide better visual quality while also potentially allowing the same decrease in render pass time as DRR. At this stage it is unknown how performance friendly adaptive shading is with VRS since it has only been used for visually lossless imple- mentation [28] and it is also known that the visual quality is not optimal with a fixed shading rate in a deferred shading pass [27]. Since VRS causes sharing of fragments among a group of pixels, which may cause threads being active with no work, it could possibly mean that there are wasted resources and could limit the performance gains compared to DRR. The scope of the thesis is investigate the difference between DRR and VRS image- based shading in Apex Engine by Avalanche Studios Group, using Deferred Render- ing. These techniques will be applied for the deferred lighting pass where the direct lighting calculations exist to measure the time to run the pass. This will avoid the overhead of vertex operations and limit the timing to primarily the pixel shader invocations. The image will be later enhanced through post-processing effects to simulate a real game environment. The resulting image of the frame, for both the VRS and DRR, will be compared towards the native resolution using SSIM to evalu- ate which one produced higher image quality and the timing of the deferred lighting pass and the overhad of tiling (VRS) and upsampling (DRR). Image-based shading requires the need to generate data for each tile a shading rate texture, therefore several heuristics will be used to populate the texture which will produce different results for VRS to measure against DRR. DRR requires upsampling and therefore it will use the engine’s implementation of Contrast Adaptive Sharpening (CAS), from AMD [9]. 
They will be compared at a similar time taken of 50% of the native de- ferred lighting pass to evaluate if the gain in performance is similar and how the image quality compare at that level. The contributions of this thesis are: 1.1. Aim and Research Questions 3

• An overview investigation at the potential use of different heuristics to use for the shading rate texture and a comparison between the use of uniform shading and adaptive image-based shading using Variable Rate Shading, in terms of render time and image quality in the deferred lighting pass.

1.1 Aim and Research Questions The aim of the thesis is analyse the performance and image quality of Variable Rate Shading’s image-based shading in a deferred lighting composition pass. To conduct the analysis of VRS’s Image-Based Shading will be compared against Dynamic Reso- lution Rendering in regards to performance and image quality. The performance will be measured by the time taken to run the composition pass and image quality will be evaluated through SSIM. Since VRS and DRR do not cooperate well it is important to evaluate if they are comparable in terms of performance and then evaluate based on the visual quality quality. The purpose is therefore to investigate if VRS enables enough performance gains to be a contender to DRR and then compare visual if VRS allows higher visual quality. The goal will therefore be broken down into these objectives: 1. By using Apex Engine Dynamic Resolution Rendering is already implemented and therefore their implementation will be used, which includes CAS for up- sampling. 2. Enable VRS Tier 2 Image-Based Shading in the deferred lighting pass. 3. Implement 6 heuristics to individually populate the shading rate texture for the image-based shading to evaluate different quality levels against DRR and possible performance deviations. 4. Evaluate VRS and DRR at similar times at around half the resolution to detect possible performance gains. 5. Measure the render pass time of the deferred lighting pass at native resolution to compare against VRS and DRR. 6. Run SSIM on the final image for VRS and DRR to compare against the native rendering resolution to evaluate image quality. 7. Do the measurements on 1200p, 1600p and 2400p to measure the relevance at different resolutions. 8. Evaluate the results of VRS in both terms of render pass time and visual quality compared to DRR. The research question is as follows: • How does Variable Rate Shading’s Image-Based shading compare to Dynamic Resolution Rendering in terms of performance and image quality when used on a deferred lighting pass? • How does different heuristics compare in image-based shading in regards to performance and image quality? 4 Chapter 1. Introduction

1.2 Outline This chapter has introduced the topic of adaptive shading and how Variable Rate Shading can compare to Dynamic Resolution Rendering. It has also been stated why this is viable to investigate and what the main objectives are and what research question to answer. The following chapter will go into detail about related background in regards to the thesis, which includes clarification of Variable Rate Shading and Dynamic Resolution Rendering. Among others it will also explain Deferred Lighting. Chapter 3 will talk about related work in the field of adaptive shading. It will also mention previous work on Variable Rate Shading and Dynamic Resolution Rendering. Lastly this section will summarize how the related work compare to the research of this thesis. Chapter 4 will discuss more about the method used in this thesis. The chapter the focus on in detail about the different heuristics chosen and why they were chosen to evaluate Variable Rate Shading. The implementation for the tiling will also be discussed. A professional game engine is used to conduct the experiment and the reason for why will be further explained. Chapter 5 will go into results regarding how Variable Rate Shading and Dynamic Resolution Rendering compare to one another in terms of performance and image quality. There will be images using grayscale SSIM local values to show how close to the native resolution they were, where white color is better. There will also be graphs showing the performance across several resolutions. Chapter 6 will discuss the results of the previous chapter in general terms and how the VRS’s image-based shading performed against Deferred Rendering Resolution and evaluate their differences in a more global context. Chapter 7 will conclude the thesis by summarizing the more important aspects of the thesis and possible take aways to be aware of if image-based shading is considered to be implemented. Chapter 2 Background

To further explain the thesis this section will go into detail about the required back- ground for the work that will be presented. This section mentions Deferred Rendering because that is where the results will be evaluated. Dynamic Resolution Rendering, as mentioned in the previous section, will be compared to Variable Rate Shading and lastly Variable Rate Shading to explain it capabilities and what can make it a potential competitor.

2.1 Compute Shader

Compute Shaders are used to run user defined heuristics on the GPU for general purpose, not necessary related to graphics [1]. They allow code to be execute in parallel produce different results for each thread, also known as SIMD. The user can decide the amount of group threads to be executed and how many threads there are for each group.

2.2 Deferred Rendering

There are two popular fundamental techniques (which many rendering techniques build upon); Forward Rendering and Deferred Rendering. Forward rendering does the shading of objects at the time of drawing and puts the results into the final frame buffer. Deferred Rendering, as the name suggest, postpones the calculations. It is often used in games for flexibility and as a performance optimization [1]. All opaque models seen from the camera are drawn to a G-Buffer (a list of 2D textures) to use in several passes. These textures has the same dimensions as the frame buffer. The information include, from top left to bottom right, the depth of the scene, models albedo (material color), normals, specular and other model data is required for later processing, as can be seen in Figure 2.1. The figure also includes at the bottom, at the left being the composition of the deferred lighting and to the right includes post-processing such as depth of field, bloom, motion blur, colorize, ILR). This is only for Killzone 2, made by Guerilla Games, but all games use similar g-buffer layout, depending on necessary data. The data is then used, as in this case, to light the scene using a deferred lighting pass, where lighting calculations and shadows will be evaluated for the scene. By using deferred rendering the engine can avoid lighting calculations which will later be overwritten.

5 6 Chapter 2. Background

Figure 2.1: Top: Four G-Buffers in Killzone 2 (from left to right), Depth, Albedo, Normal and Specular. Bottom Left: Deferred Composition of lights and G-Buffers. Bottom Right: Final frame after post-processing. (Source: DEFERRED REN- DERING IN KILLZONE 2 [26])

Figure 2.2: Dynamic Resolution Rendering works by varying the size of the viewport below the size of the render target. (Source: Dynamic Resolution Rendering Article [11])

2.3 Dynamic Resolution Rendering

Dynamic Resolution Rendering (DRR) is a technique to uniformly change the ren- dering resolution to either increase the performance or the image quality [11]. To accomplish this the viewport size, that determines the amount of pixels to draw on the frame buffer (texture), is either dynamically updated by the program to achieve a certain frame rate or manually changed if a user prefers a certain setting. When the viewport is lower than the frame buffer the result is then upsampled to the desired resolution. When upsampling it requires a magnifying technique which in most cases causes a blurry image because of the creation of new data to fill out the frame buffer. The upsampling can be done using hardware [2] or there are software implementa- tions. The implementation used in this thesis is called FidelityFX Contrast Adaptive Sharpening (CAS) [9], open source implementation and is developed by AMD. It is a software implementation primarily used for sharpening but also supports DRR up- sampling. The upsampling of the image causes an overhead for DRR because of the extra cost of this type of rendering. 2.4. Variable Rate Shading 7

Figure 2.3: Visualization of the shading rate texture feature in Variable Rate Shading (Source: Nvidia VRWorks - Variable Rate Shading [20])

2.4 Variable Rate Shading

Variable Rate Shading (VRS) [19] is a newly developed technique to allow decrease in pixel invocations. The technique is similar to MSAA as in it changes the point of intersection during rasterization to identify a hit to a fragment. While MSAA (Multisample Anti-Aliasing) allow to increase the density of the intersection points while VRS allows to decrease the amount of intersection points, which determines whether to execute a fragment shader. When decreasing the amount of intersection points through VRS it also decreases the amount of fragments to be executed. VRS therefore, as an effect, allows to decrease the image quality and save performance by decreasing the shading rate of fragments. The limiting factor of VRS is that the shading rate allows 1:1, 2:1 and 4:1 mapping, where the resulting fragment is used for several pixels, but 1:1 and 4:1 mapping cannot be used at the same time on different axis, see Figure 2.3, compared to Dynamic Resolution Rendering that can decide freely which mapping to use. VRS allows two behaviours of decreasing the the shading rate [19]:

• Fixed Shading Rate

• By Shading Rate Texture

This thesis will only focus on the shading rate texture which enables adaptive shading. The size of the shading rate texture is a 1 th of the native render resolution 16 in both width and height separately. Each fragment in the shading rate texture covers 16x16 pixels and determines their shading rate using a byte (8 bits). This results in the memory usage, seen in Table 2.1, based on different resolutions used to evaluate Variable Rate Shading. The two lowest bits determine the y-axis shading rate of the screen and the following two bits determine the x-axis shading rate. The shading rate texture is then applied to the rasterization pipeline to be handled by the hardware. This texture has to be updated each frame based on heuristics to determine where to execute fewer pixel shader invocations. 8 Chapter 2. Background

Memory Usage (KB) 1200p 1600p 2400p Shading Rate Texture 9 16 36

Table 2.1: Memory usage required for Shading Rate Texture at different resolutions.

2.5 Structural Similarity Index The purpose of Structural Similarity is to objectively compare quality between two images, where the human visual system (HVS) is in focus. Structural Similarity Index (SSIM) was developed by Wang. et. al. [29]. SSIM evaluates an image by calculating the difference in luminance, contrast and structure between two images. There are other objective image comparison methods like Mean-Square Error (MSE) and Peak Signal-to-Noise Ratio (PSNR). Wang et. al. states that the philosophy between them are different, where MSE and PSNR focus on perceived error while SSIM focus on perceived differences in structural information. The SSIM value has a continuous range from 0 to 1 where 1 means structurally identical to the reference image. Chapter 3 Related Work

The shading cost of a frame has always been costly. There has been several attempts to reduce the cost of shading a frame by using adaptive techniques to determine important part of a screen. This goes as far back as 1987 when Mitchel et. al. [17] ray trace a scene non-uniformly based on where aliased parts of the images predicted to be. The heuristics to guide where to sample more was based on color and contrast. With the recent advances in Virtual Reality Stengel et. al. [24] investigated several techniques to predict where high resolution would occur and try to save per- formance based on human visual system. Which also is implemented into a deferred renderer. Dubla et. al. [4] investigated interactive global illumination using ray tracing with adaptive shading. The focus was on predicting where higher resolution would be required by using three different heuristics. The first heuristic focused on material id, the other on the amount of light would be accumulated by a pixel and the last one on shadow volumes boundaries and objects inside the volume. There has been investigation from 2014 regarding coarse pixel shading by Vaidyanathan et. al. [25] where instead of having a mapping of 1:1 from the rasterizer to the frame buffer they test the mapping of 1:4 pixels instead which proved to give better image quality than 1:1 mapping at a quarter resolution. This is similar to how VRS work and this could possibly be taken advantage of by VRS in comparison to DRR to produce higher quality images. Checkerboard Rendering (CBR) [5] is a popular technique in today’s games. This technique is a uniform way to reduce the pixel shader invocations but still increase the image quality by upsampling the image in a checkerboard pattern. A technique called Deferred Adaptive Compute Shading (DACS) developed by Mallett et. al. [13] is compared against CR and proves to perform better in terms of performance or higher visual quality by using an adaptive pattern. By using an adaptive pattern they could determine parts of an image where there is an estimated high variance and also existing silhouette edges. This shows that a uniform pattern may not be the best option to provide higher quality. The adaptive shading is also stated to be faster. This comparison can be similar to DRR and VRS, where VRS could possibly be the better option because of its adaptive ability. Yang et. al. has investigated how VRS would perform using the adaptive shading [28] and used calculated the luma value for each pixel of the previous frame to determine minimal visual differences. Due to this calculation it would allow them to determine if it is possible to decrease the pixel calculations and reuse the value for several pixels. This paper investigated how VRS could be used get visually lossless

9 10 Chapter 3. Related Work image quality degradation but increase performance. To build upon it this thesis will investigate how it could be used to gain performance by maintain image quality only where it matters. Overbeck et. al. [21] explored adaptive wavelet ray tracing when using Monte Carlo to reduce the amount of samples required to achieve higher quality. The wavelet receives higher amplitude at high frequency details, such as edges. The heuristic samples then only where there is no strong edge, where high variance is found but not high magnitude. VRS has been explored in comparison to DRR by Bois [3]. The talk presents how VRS compares to DRR regarding VRS applied per-draw and how it can preserve edges when using 2x2 shading rate in comparison to a render scale of 50% which gets blurred across objects. At this stage it proved useful. Therefore it can be interesting to look at the adaptive shading capabilities of VRS due to being able to control the resolution at a variable rate. But this shows one advantage VRS has over DRR and this thesis will further investigate that by looking at the tiling possibility of VRS against DRR. Using adaptive heuristics to determine where to increase visual quality has been around for very long. The research that has been done on the subject show several different ways of achieving higher image quality without the need to use uniform heuristics. Dynamic Resolution Rendering is a way to decrease the image quality uniformly but by using adaptive techniques may allow for increased image quality with similar performance gains. Chapter 4 Method

To gather the necessary data to answer the research questions Variable Rate Shading (VRS) and Dynamic Resolution Rendering (DRR) will be implemented in Apex Engine, Avalanche Studios Group’s open-world game engine. Using a professional game engine to gather the data will allow the results to reflect similar behaviour as other game developers may experience. While this may cause the results to be somewhat specific to the analysed game it will be possible to see how VRS performs in a realistic scenario. Since today’s games can vary a lot in visuals there is no perfect test bench for all types of games. The game engine used in this thesis accomplishes realistic high quality real-time graphics in games which allows it to reflect other games attempting similar visuals.

4.1 Implementation

This section will talk about the implementation. The focus lies on the implementa- tion of the image-based shading because that is specific to this thesis.

4.1.1 Variable Rate Shading The implementation of the adaptive shading in Variable Rate Shading is done using a Compute Shader, as can be seen in figure 4.1, to populate the shading rate texture. The compute shader is executed using 32 threads per thread group. Using 32 threads allows full utilization of a warp that also enables inter-communications between the threads. There is a thread group for each texel in the shading rate texture. The first 16 threads, in a group, do the calculations for the heuristic horizontally, along the x- axis per row, and the other 16 threads do the calculations for the heuristic vertically, in the y-axis per column. The calculations are separated horizontally and vertically because the shading rate needs to be evaluated for x and y-axis individually. Each thread get a data segment of 16 pixels from a desired texture as heuristic. In the end of the shader program the first thread in each thread group gathers the results to determine the shading rate and writes it to the thread group’s corresponding texel in shading rate texture. In the pesudo code, Figure 4.1, the results are calculated by comparing texels according to the direction. This is to find a relation between neighboring pixels. In the end the results are collected by finding the max value calculated, higher value means higher shading rate, i.e. higher resolution. Then based on the result it is

11 12 Chapter 4. Method

// Tiling Compute Shader

[numthreads(32, 1, 1)] void main ( ) { // Calculate start texel position in the frame // for the current thread group uint2 global_pos = global_id.xy ∗ tile_size.xy;

// Calculate the direction and start texel for // a thread in this thread group uint dir =thread_index< 16; uint2 axis = uint2(dir, !dir); uint2 start_pos = (thread_index % 16) ∗ ! a x i s ;

// Get necessary texture data line_of_texel_data = GetTextureDataSegmentAt(global_pos + start_pos , 16);

// Calculate result of the data set r e s u l t = CalculateHeuristic(line_of_texel_data);

// Collect results from all threads // Individual results for x and y uint2 final; f i n a l . x = GetMaxValueInActiveThreadGroupInAxis(x, result ); f i n a l . y = GetMaxValueInActiveThreadGroupInAxis(y, result );

// Calculate shading rate and output to shading rate texture i f (thread_index == 0) { uint shading_rate = CalculateShadingRate(final ); ShadingRateTexture[global_pos] = shading_rate; } }

Figure 4.1: Pseudo code for the tiling compute shader (in hlsl) used to determine the adaptive shading for Variable Rate Shading. 4.2. Heuristics 13 measured against a user defined threshold to decide the shading rate. Higher user defined threshold means the results needs to be higher to get a higher shading rate.

4.1.2 Dynamic Resolution Rendering The implementation of DRR uses the game engine’s implementation. The game engine’s implementation uses AMD’s open source, Contrast-Adaptive Sharpening (CAS) [9], to upsample the image to the desired resolution and is therefore the reason for why it is used in this thesis. CAS is primarily used for sharpening an image rather than upsampling but the upsampling is there to allow DRR. This technique for upsampling proved to be a good choice at the game company and was therefore used in the thesis to reflect real world possible results with DRR. The determining factor of how good DRR is based on its upsampling technique. There are several different mentioned in the intel article [11] about dynamic resolution rendering but the article is from 2011 and does not reflect current standards.

4.2 Heuristics This section will explain more about the heuristics used in this thesis. Due to the nature of adaptive shading it requires heuristics to find where the decrease the image quality and where to keep it. This could be done in several possible way as described in chapter 3. To utilize VRS’s image-based shading there will be several heuristics implemented to evaluate the adaptive shading capabilities of VRS. This thesis imple- ments several heuristics to reduce the risk of deviating results and therefore obtain a general sense of VRS. The heuristics are:

• Haar Wavelet Transform (HWT)

• Luma

• Albedo

• Silhouette Rendering (SR)

• Nvidia’s Content Adaptive (see [28])

• All five heuristics combined

To obtain a single value to evaluate the appropriate shading rate the color data is transformed to grayscale. The grayscale value is obtain through the luminance value, which is calculated by using dot product of the RGB color vector v, obtained from scene data in the game, by the luminance vector:

lluma = vRed ∗ 0.2125 + vGreen ∗ 0.7154 + vBlue ∗ 0.0721 (4.1) According to Yang et. al. [28] using luma proved to be efficient enough to obtain reasonable results. The RGB difference were not necessary and also proved to be more expensive to calculate. The silhouette rendering does not look at color and therefore is the only one not using luma to decide the shading rate. 14 Chapter 4. Method

// Haar Wavelet Transform

float EdgeDetectionUsingHWT() { // Retrieve the signal (signal is made up of 16 values) i = 16; float y[16], x[16]; GetSignal(x);

// As long as the active signal is larger than 1 while ( i > 1) { i /= 2 ;

// Derive new values using the filters for ( k = 0 ; k < i ; k++) { y[k] = LowPassFilter(x, k); y[k + i] = HighPassFilter(x, k); }

Copy(x , y ) ; // Copy the data of y to x }

// Return value retrieved from the high pass filter return y [ 1 ] ; }

Figure 4.2: Pseudo code for edge detection using Haar Wavelet Transform (in c) used to evaluate the desired shading rate.

4.2.1 Haar Wavelet Transform Wavelet Transform is a way to transform signals or filter the data to extract char- acteristics, which is what JPEG2000 uses to detect edges during compression [6]. By extracting a row (or column) in a texture it is possible to define it as a signal. The RGB data from a texture can be transformed to grayscale to interpret it as a 1D array, in this case an array of 16 grayscale values. When filtering the array it is possible to detect edges that can be used to determine the shading rate. There are two filters of interest when decomposing the signal when using the Haar Wavelet Transform. A low pass filter is used to denoise a filter while a high pass filter is used to detect the noise. [6] This technique is one of three that will be used on the previous frame to detect areas to decrease the shading rate. The haar heuristic is simple and therefore possibly a more efficient heuristic to use on a GPU compared to other wavelet transforms. The reason for why a wavelet transform was used is because it is used in the field to detect edges during compression. The Haar Wavelet 4.2.Heuristics 15

Transform(HWT)definethelowpassfilterasthefollowing √ x +x y = 2 2k 2k+1 (4.2) k 2 andthehighpassfilterasthefollowing √ −x +x y = 2 2k 2k+1 (4.3) k+i 2 where xkisthe kthvalueinthe1Darrayoflumavalues, xisatemporarybufferto storethecurrentsignalwhile yisatemporarybuffertostorethemodifiedsignal, where ykdefinesthekfirstvaluestostorethelowpassfilteredvaluesand yk+istore thehighpasssignalswhere iisdefinedashalfthelengthoftheactivedatasetinthe x buffer.ThealgorithmisthendefinedasinFigure4.2wherethesignalisconstructed by16lumadatavaluesgatheredinthecomputeshader.Thepurposeofthelow passfilteristodownsamplethesignal,duetocalculatingtheaveragevalue.Onthe samesignalthehighpassfilterwilldetectdifferencesinvaluesduetocalculatingthe distance.Usingthesefiltersthelowpassfilterwillfirstdownsamplethesignalto twovaluesandthenthehighpassfilterwilldetectwhetherthereisanedgeornot.

4.2.2 Luma& Albedo LumaandAlbedo,similartoHaarWaveletTransform,usesagrayscalevalueto detectedges.Thisheuristicwasderivedduringtestingoftheimplementationof VRSintheengine,whichsimplycomparevalues.Luma,usesthepreviousframe andAlbedousesthealbedotexturefromtheG-Buffer.Theheuristicaimstolook atdifferencesinthegrayscaleddata.Therearetwovariablestoconsiderhere.One lookingattheaveragedifferenceingrayscaleofthetileas

16 |luma −luma | luma = n=2 n n 1 (4.4) avg 15 where luma n isthelumavalueofthe nthpixeland luma n 1isthelumavalueofthe previouspixel,n-1,whichisthendividedby15tocalculatetheaveragedifference inlumaonasetof16values.Theotherisdefinedasthemaximaldifferencebetween neighboringpixels

luma max =max (|luma n−luma n 1|,luma max ) (4.5) whichaimstoonlyfindthemaximaldifferenceusingthegrayscaledataamongaset ofdata,where luma nisthenlumavalueofapixelandthe luma n 1isthelumavalue ofthepreviouspixel,n-1.Inthiscaseonlythedifferenceisofinterestbecausethe purposeistofinddeviationsandifthereareanydeviationwhichwillindicatethatit willrequirehigherresolutiontofullyportraytheimage.Similarcolorscanbeshaded lowersincetheyshouldnotdeviatewhenpresentedbyfewerpixels.Theaverageis thenrootedtoallowforamarginoferrorbeforetheshadingrategetslower,because thevalueliesbetween[0;1].Bylookingattheaverageitwillbepossibletoknow thegeneralnoiseofthetileandifitisverylowthenhigherhighershadingratemay notbenefitthetilewhilelookingatthemaxdifferencewilldetermineifanytilewill 16 Chapter4.Method requirehigherquality.Thesearethenmultipliedtoevaluateifthedifferenceislarge enoughtoconsiderlowershadingrate.

result luma =luma max ∗ luma sum (4.6) where luma max istheresultsofEquation(4.5)and luma avg istheresultsofEquation (4.4),whichisthenusedtodeterminetheshadingrate.Albedoonlycontainsraw texturecolorappliedtoobjectswithnoshadingorothercompositeandcantherefore possiblybetterdeterminedeviationinjustmaterialstoallowbettervisualizationof individualobjectsbeforeshading,whichwillbecalculatedthesame.

4.2.3 SilhouetteRendering OneoftheheuristicsusedisSilhouetteRendering.Thisaimstoprovidehigher resolutionatedgesbylookingatthenormalsanddepthbufferinscreenspace.By usingthedotproducttocomparethenormals’anglesandeliminatingalllargerangles withminimaldepthdifferencesthesilhouetteofthescenewasrendered.Silhouette renderingwasusedinDACS[13]asawaytoenhancetheimagequalityfurtherthan justusingcolorheuristicsandisthereforealsoinvestigatedinthisscenario. Thedotproducttoevaluatethenormalsangletowardseachother.Sincelarger thedifferencethehigherchancetheshadingwilldeviatebetweenthepixels.Using dotwillevaluatehowsimilartheyareinwhichdirectiontheyarepointingatwhere1 isexactlythesameand-1istheexactopposite.Thisisthenaddedby1anddivided by2tonormalizethevalueto[1;0].Itistheninvertedbysubtractingaone.Since thelargestanglesneedstobepreservedthehigherthevaluethebetter.

dot (n,n )+1 normal =max (1 − n n 1 ,normal ) (4.7) max 2 max whichaimstofindthemaximumvalueamongasetofdata,where nisthenormalof thecurrentpixelandn-1isthepreviouspixel’snormal.Aftertheangleiscalculated itisthencheckedusingabinarydepthtestinthepixel.Ifthedifferenceindepth betweentwopixelsisnearlyidenticalthentheangleofthenormalsismultipliedby 0,toindicatesameentitywithinthescene,or1,toindicateanedge.Themaxis foundamongadatasetas:

depth max =max (|depth n−depth n 1|,depth max ) (4.8) where depth nisthecurrentpixel’sdepthvalue, depth n 1isthepreviouspixel’sdepth valueand depth max istheresultofthepreviouscomparisonbetweenthetwopixel’s depthvalue.Thisheuristicwillnotdetectnormalmapsandmaythereforeprove inefficientinsomecases.

results silhouette =normal max ∗(depth max > ) (4.9)

where normal max istheresultofEquation(4.7)and depth max istheresultof Equation(4.8),whileepsilonisanarbitrarynumbertoindicatewhetherthereisa changeindepthornot. 4.3.Evaluation 17

Resolutions(16:10AspectRatio) Measurements [email protected] RenderPassTime(ms) [email protected] StructuralSimilarityIndex [email protected]

Table4.1:Factorsofinterestsfortestcasesandevaluation.

4.2.4 Nvidia’sContentAdaptive AsmentionedintherelatedworksectionNvidia[28]developedaheuristicforVari- ableRateShading’simage-basedshading.Thistechniqueandasobelfilter,presented atGameDevelopersConferencebyMicrosoft[22],aretwoheuristicspresentedto thepublicfortipsonimplementation.Thesobelfilterusesa3x3kernelwindow overpixelsandthereforerequiressharedmemorytorunmoreefficientbutusing sharedmemoryistootimeconsumingonasingleheuristicandthereforeoutofthe scopeforthethesis.ThepresentationbyMicrosoftdidnotmentionanycostforthe imagegenerationprocessandsobelfilterwasthereforenotconsidered.Itcanalsobe simplifiedtobecomesimilartotheheuristicsoftheLumaandtheAlbedo.Nvidia’s heuristicwaslightweightenoughtoconsider. InthepaperaboutcontentadaptiveheuristicYanget.al.[28]alsopresents motionadaptiveheuristic,butitwasnotimplementedsincetheevaluationisforstill frames.Thecontentadaptiveheuristicreliesonthelumadifferencetermcompared totheaveragelumainthetiletoevaluatetheshadingrate.Readmoreaboutitin theirpaper.TheequationsproposedbyYanget.al.isdefinedas:

2 16 luma n luma n 1 n=2 2 result = (4.10) nvidia 15 where luma n isthelumavalueofpixelnand luma n 1 isthelumavalueofthe previouspixel.Thisresultwillthenbeusedtodeterminetheshadingrate.

4.2.5 Combined Thecombinedisconstructedbythesumofalltheshadingratesdeterminedbythe heuristicsanddivideitbythesameamountofheuristics.Thiswillcreateanaverage shadingratetouseforthetile.Thiswaschosenduetoitsgeneralcasewherethereis noexception.Thiswilltakealltheheuristicsandcombinethemforapossiblymore efficientresult.Duetotherebeingthreeheuristicslookingatthepreviousframe theremaybeashifttowardstheseheuristicsinthefinalresultofthecombinedbut usingalbedoandsilhouettetheymayimposeashiftinshadingrate.

4.3 Evaluation

Themeasurementswillbequantitativeforboththeperformanceandimagequality. Toevaluatetheresultstherenderingtimewillbeasampleof1000timingsofthe 18 Chapter 4. Method render pass time in consecutive frames for each heuristic used for VRS, the DRR and the native resolution. 1000 sample count should possibly ensure that deviation in time will be reflected as outliers. The timings of VRS and DRR will be captured when they perform in equal timings to understand their relative image quality against each other. The evaluation will be conducted on samples close to half the render pass time for the deferred rendering pass. Half the time to run the pass should be considered enough performance gains to consider valuable in a sense where there are enough pixels left no allow adaptive shading for VRS with x2 maximum shading rate. This is where the aim will be for VRS and DRR to comparable enough to consider whether VRS image-based is a viable solution. The image quality assessment is done through SSIM. The final image of the scene during a frame is captured and stored to disc where it later will be run through MATLAB [16] for evaluation. SSIM creates an image based on the compared images, using same dimension as the input images, where each pixel represent the local SSIM values. It’s a grayscale image on a continuous scale between white and black, where white indicates identical traits while black means not similar. There is also a global SSIM value created at the same time describing the global similarity of the images, instead of an image. The highest quality will be set in the game engine which will enable several graphical features:

1. Shadows

2. Screen-Space Ambient Occlusion

3. Reflections

4. Temporal Anti-Aliasing

5. Depth of Field

6. Global Illumination

These effects will mask potential decrease in resolution. Lavoué et. al. [12] state that texture compression should not be directly measured on the texture map itself but rather in a rendered image and that "[t]he best way to predict the perceptual impact of artifacts is to apply the perceptual metric on rendered images". Hence using these masking techniques will allow for a better result when used in games rather than comparing the raw output. Lavoué et. al. also concludes that SSIM performs better than PSNR and therefore the focus lies on SSIM while it is also mainly used in other reports as well. The following reports [28], [13] and [25] uses SSIM and also focus on the subject of rendering resolution, making it the reason for why it will be used in this report as well. According to wang et. al. [29] PSNR and MSE are not good regarding evaluation of perceived image quality, with the main reason being that they measure error instead of a difference in the structural information meaning that MSE could give similar result for completely different alterations of the same image, as presented in their work, while a structural similarity index gives different alterations different results. 4.3. Evaluation 19

(a) Forest Scene (b) Sewer Scene

(c) Office Scene (d) Desert Scene Figure 4.3: Scenes under test - Area in rectangle is further investigated

The scenes will be captured in three different resolutions, 1200p, 1600p and 2400p at 16:10 aspect ratio, as stated in Table 4.1. These resolution are chosen due to the shading rate texture of VRS is scaled by 16 in each axis. 1920x1200, 2560x1600 and 3840x2400 are evenly dividable by 16x16 and will therefore be easier to measure compared to 1080p at 16:9 because it is not evenly dividable by 16 and can therefore cause deviations when doing measurements. The system under test has the following system specifications:

• OS: 10 Pro 64-bit

– Version: 1903 – Build: 18362.592

• GPU: Nvidia GeForce Turing RTX 2060 Super

– Hardware Driver: 445.87 – Vulkan API Version: 1.1.126

• CPU: AMD Ryzen Threadripper 2950X 16-Core

• RAM: 32 GB @ 3200 MHz

4.3.1 Scenes Under Test This will not involve any geometry transformation more than a full screen pass, with the help of a triangle. The open-world game engine will be used to capture a variety 20 Chapter 4. Method of scenes. With the use of the Apex Engine, it will allow complex and different scenes that are; both in-doors and out-doors environment and dark and light environments. The scenes chosen to do the assessments on will be determined by the variety of scenes. Due to the Apex Engine being adapted to both out-door environment and in-door environment this should prove to be the best generic case to evaluate VRS against DRR because of the varied scenes that will be applicable to several games. This thesis will evaluate VRS and DRR in four different scenes. The scenes that will be tested on can be seen in Figure 4.3. The first one is a forest, which will test high variance in colors due to leaves and bushes. The second scene is a sewer which has a lot of dark areas, with high variance in contrast. The third one is in a office which has a general tone in the color pallets. Lastly the desert scene which looks at distant areas where fog and depth of field may mask the artifacts. The location of the crop outs of the scenes are seen in Figure 4.3, which will be look at in the results. Constructing an artificial environment will not reflect the state of the art that is currently in the game industry. Many techniques, mentioned in chapter 1, are used to alleviate the operations where it may not be necessary and the addition of Variable Rate Shading may not be as helpful as it would prove in an artificial environment.

4.4 Validity

To gather the results the experiment was conducted on a external computer which required streaming to control. This could cause performance implications where other applications desire time, especially when the graphics card is used to encode the frames from the used application. Therefore every heuristic got 1000 samples in the render pass time to ensure outliers are seen. This extends to other applications running in background, such as daemons, which can cause interruptions when doing the tests. Due to games naturally including optimizations to perform better this could cause validity threats to the data because some optimizations may intervene with the tests. This is naturally expected because games try to run as fast as possible but because of this the data could show deviating results. It could also prove if in a real-life scenario it will have an impact or not.

4.5 Reliability

When evaluating image quality the images has to be stored to disc and later loaded into MATLAB [16] which could cause image degradation due to compression. The images were stored using PNG format so there should be no loss in image quality but could affect the results. Due to no source code provided for NAS (Nvidia Adaptive Shading) [28] the implementation of Contrast Adaptive Shading (called Nvidia’s heuristic in this thesis) could deviate from the real implementation due to misinterpretation of the text. This could cause Nvidia’s heuristic to deviate in expected image quality. Yang et. al. proposes optional quality improvements which was not implemented due to time constraints. 4.6. Delimitations 21

The upsampling method used for Dynamic Resolution Rendering, Contrast Adap- tive Sharpening (CAS) [9], includes sharpening filter as well. To gather the time for the upsampling CAS is run with and without upsampling and the difference in ms would be the time required to do the upsampling. The timings will be gathered through Nsight. This could cause deviating results which the reader should be aware of.

4.6 Delimitations This thesis will be limited to static scenes because the purpose is to investigate pos- sible performance gains on detailed environment. A rotating camera cannot clearly display detail and has as already been investigated by Yang et. al. [28]. The thesis will also not evaluate DRR and VRS in a continuous span over several lower resolutions in a sense of dynamic resolution. It will specifically investigate possible performance gains at a set resolution of the native resolution using different heuristics and to evaluate how well they perform against upsampling in terms of image quality. Another limitation is that there will only be four scenes which will be evaluated. This could cause unclear results but due to time constraints this should provide enough data to reason about the results. The thesis will not investigate the memory usage of VRS’s Image-Based shading because of its low memory footprint. As can be seen in Table 2.1 in the background that at 3840x2400 px resolution (36,864,000 bytes) a shading rate texture on 36 KB (36,000 bytes) is insignificant compared to what a frame buffer requires to display 2160p resolution, being lower than 1% of the memory usage. The results in this thesis will not show a continuous line to suggest where VRS should and should not be applied but rather a discrete set of scenes that can provide guidelines for possible cases where VRS may or may not show benefits in comparison to DRR.

Chapter 5 Results and Analysis

This chapter will present the results on how the image-based shading through Vari- able Rate Shading (VRS) performed against Dynamic resolution Rendering (DRR) both in time, running the deferred lighting pass, and in image quality using Structural Similarity Index (SSIM). The deferred lighting pass performed faster using Dynamic Resolution Rendering in comparison to Variable Rate Shading but the upsampling of DRR was slower than the tiling process. The results show Variable Rate Shad- ing at 48% of the native pixels while Dynamic Resolution Rendering at 43% of the native pixels, because at those resolution was at around half the speed of the native resolution. This means that the amount of work to be done for the deferred lighting should reflect that as well. The Office and the Sewer scene showed the best result for VRS compared to the other scenes, as seen in Table 5.5 and 5.9 respectively. Where at 2400p VRS performed the best showing no decrease in performance compared to the amount of pixel shader invocations. In the same tables at 1600p VRS performed worse comparing to 1200p. The overhead of using VRS and DRR can be seen in table 5.1. The upsampling of DRR is slower compared to the tiling of VRS. Haar Wavelet Transform has the lowest overhead at below 0.1 ms for all three resolutions, taking 20.5% time of upsampling at 1200p and 12.4% at 2400p, which shows that it scales better. The Nvidia heuristic being the second fastest, also below 0.1 ms for all three resolutions. The combined heuristics for tiling performed the worst among the VRS heuristics but is still faster than the upsampling. The most probable cause of the difference in overhead for the different VRS heuristics is probably due to the texture fetches. Haar Wavelet Transform, Luma Comparison, Albedo and Nvidia all require only one texture fetch. Silhouette requires two texture fetches and the combined requires four. In the Desert Scene the median render time for the deferred lighting pass show that DRR is faster than VRS, as can be seen in Table 5.2. This is because the upsampling is more expensive than VRS’s tiling process. Between VRS heuristics the render pass time is low enough to consider no significant difference. There is also no significant difference between DRR and VRS if 0.3 ms, at 2400p, is low enough to be insignificant as well since the image quality show in favor of VRS. They image quality is higher when previous frame is used as heuristic. 1600p performed worse for VRS in terms of render pass time, as seen in Table 5.3. DRR performed increasingly faster with each step in resolution but VRS show no increase in performance at 1600p deviating from the results of DRR. The Office Scene is a more expensive scene compared to the other scenes, as seen in Table 5.4. Similar to the Desert Scene DRR is running the pass faster here as well


Upsampling / Tiling             1200p    1600p    2400p
Dynamic Resolution Rendering    0.170    0.290    0.660
Haar Wavelet Transform          0.035    0.054    0.082
Luma                            0.045    0.069    0.127
Albedo                          0.045    0.069    0.127
Silhouette                      0.067    0.106    0.166
Nvidia                          0.043    0.066    0.098
Combined                        0.119    0.194    0.314

Table 5.1: Median time (ms) for upsampling (DRR) and tiling (VRS) at different resolutions using different heuristics.
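The timings in these tables are medians of GPU render pass times. A minimal sketch of how such a pass could be timed with Direct3D 12 timestamp queries is given below; the helper names and the surrounding resource setup are assumptions for illustration, not the measurement code used in the thesis.

```cpp
#include <d3d12.h>

// Assumes a D3D12_QUERY_HEAP_TYPE_TIMESTAMP heap with two slots and a
// readback buffer large enough for two UINT64 values already exist.
void TimeDeferredLightingPass(ID3D12GraphicsCommandList* cmdList,
                              ID3D12QueryHeap* timestampHeap,
                              ID3D12Resource* readbackBuffer)
{
    cmdList->EndQuery(timestampHeap, D3D12_QUERY_TYPE_TIMESTAMP, 0); // before the pass

    // ... record the deferred lighting draw/dispatch here ...

    cmdList->EndQuery(timestampHeap, D3D12_QUERY_TYPE_TIMESTAMP, 1); // after the pass
    cmdList->ResolveQueryData(timestampHeap, D3D12_QUERY_TYPE_TIMESTAMP,
                              0, 2, readbackBuffer, 0);
}

// After the frame has completed on the GPU, convert ticks to milliseconds.
double TicksToMilliseconds(UINT64 begin, UINT64 end, ID3D12CommandQueue* queue)
{
    UINT64 frequency = 0;
    queue->GetTimestampFrequency(&frequency);          // ticks per second
    return 1000.0 * double(end - begin) / double(frequency);
}
```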

Desert Scene                       1200p           1600p           2400p
                                   time    ssim    time    ssim    time    ssim
Native                             0.956   1.000   1.621   1.000   3.568   1.000
Dynamic Resolution Rendering       0.466   0.923   0.741   0.932   1.563   0.941
Haar Wavelet Transform             0.546   0.952   0.946   0.962   1.874   0.970
Luma                               0.548   0.959   0.952   0.966   1.870   0.975
Albedo                             0.534   0.952   0.922   0.957   1.837   0.966
Silhouette                         0.536   0.945   0.933   0.955   1.824   0.956
Nvidia                             0.546   0.961   0.922   0.963   1.838   0.975
Combined                           0.549   0.956   0.946   0.963   1.837   0.975

Table 5.2: Median render time (ms) for the deferred lighting pass and global SSIM value at different resolutions using different heuristics in the Desert scene. More detailed values can be seen in the appendix at A.5.

Desert Scene                       1200p   1600p   2400p
Dynamic Resolution Rendering       48.7    45.6    43.9
Haar Wavelet Transform             57.1    57.9    52.5
Luma                               57.3    58.3    52.4
Albedo                             55.9    56.9    51.5
Silhouette                         56.1    57.6    51.1
Nvidia                             57.1    56.9    51.5
Combined                           56.3    56.9    51.5

Table 5.3: The percentage of time (%) taken for Variable Rate Shading and Dynamic Resolution Rendering based on the time of the native resolution in the Desert scene when only comparing the deferred lighting pass.

Office Scene                       1200p           1600p           2400p
                                   time    ssim    time    ssim    time    ssim
Native                             2.200   1.000   3.852   1.000   8.436   1.000
Dynamic Resolution Rendering       1.048   0.934   1.719   0.947   3.637   0.951
Haar Wavelet Transform             1.130   0.966   1.994   0.970   3.941   0.973
Luma                               1.115   0.969   2.017   0.978   3.967   0.980
Albedo                             1.089   0.940   2.009   0.953   4.059   0.959
Silhouette                         1.104   0.940   2.007   0.954   3.965   0.957
Nvidia                             1.125   0.970   1.992   0.976   4.102   0.975
Combined                           1.103   0.969   1.968   0.974   4.046   0.975

Table 5.4: Median render time (ms) for the deferred lighting pass and global SSIM value at different resolutions using different heuristics in the Office scene. More detailed values can be seen in the appendix at A.7.

Office Scene                       1200p   1600p   2400p
Dynamic Resolution Rendering       46.8    44.6    43.1
Haar Wavelet Transform             51.4    51.8    46.7
Luma                               50.7    52.4    47.0
Albedo                             49.5    52.1    47.1
Silhouette                         50.2    52.1    47.0
Nvidia                             51.1    51.7    48.6
Combined                           50.1    51.1    47.8

Table 5.5: The percentage of time (%) taken for Variable Rate Shading and Dynamic Resolution Rendering based on the time of the native resolution in the Office Scene when only comparing the deferred lighting pass.

The Office Scene is more expensive than the other scenes, as seen in Table 5.4. Similar to the Desert Scene, DRR runs the pass faster here as well, but at 1200p the difference is marginal, being lower than 0.1 ms, while at 2400p the difference is almost 0.4 ms. DRR and VRS performed similarly in image quality compared to the Desert Scene. VRS repeats the same pattern here, where 1600p deviates from the steady decrease in relative time that DRR shows, as can be seen in Table 5.5. While the difference in absolute time at 2400p between VRS and DRR is larger in this scene than in the Desert Scene, the time relative to the native resolution is still lower.

In the Forest Scene VRS performed the worst in terms of render pass time compared to the other scenes, which can be seen in Table 5.6. Similar to the other scenes, at 1200p the difference is almost insignificant, at most around 0.15 ms. Differences of around 0.01 ms can be considered negligible since the graphics card cannot measure time that precisely, and rounding leaves a difference of around 0.1 ms. In the same table, at 2400p, VRS performs up to 0.5 ms slower, which is considerably slower than in any other scene when comparing to DRR, but with considerably higher image quality. The relative timings show poorer scaling, since there is no or only a very limited decrease in relative time with each step up in resolution, as can be seen in Table 5.7.

Forest Scene                       1200p           1600p           2400p
                                   time    ssim    time    ssim    time    ssim
Native                             1.166   1.000   1.972   1.000   4.310   1.000
Dynamic Resolution Rendering       0.577   0.792   0.914   0.804   1.971   0.817
Haar Wavelet Transform             0.687   0.879   1.147   0.907   2.507   0.902
Luma                               0.688   0.884   1.161   0.907   2.477   0.917
Albedo                             0.678   0.860   1.132   0.883   2.441   0.896
Silhouette                         0.674   0.862   1.149   0.887   2.439   0.897
Nvidia                             0.690   0.879   1.167   0.905   2.463   0.915
Combined                           0.688   0.878   1.159   0.906   2.461   0.914

Table 5.6: Median render time (ms) for the deferred lighting pass and global SSIM value at different resolutions using different heuristics in the Forest scene. More detailed values can be seen in the appendix at A.8.

Forest Scene                       1200p   1600p   2400p
Dynamic Resolution Rendering       49.7    46.3    43.9
Haar Wavelet Transform             59.3    58.1    58.2
Luma                               59.3    58.9    57.5
Albedo                             58.5    57.4    56.7
Silhouette                         58.2    58.2    56.6
Nvidia                             59.6    59.1    57.1
Combined                           59.4    58.8    57.1

Table 5.7: The percentage of time (%) taken for Variable Rate Shading and Dynamic Resolution Rendering based on the time of the native resolution in the Forest Scene when only comparing the deferred lighting pass.

Sewer Scene                        1200p           1600p           2400p
                                   time    ssim    time    ssim    time    ssim
Native                             1.130   1.000   1.961   1.000   4.284   1.000
Dynamic Resolution Rendering       0.561   0.932   0.903   0.939   1.966   0.949
Haar Wavelet Transform             0.608   0.976   1.074   0.980   2.157   0.979
Luma                               0.594   0.981   1.061   0.984   2.134   0.985
Albedo                             0.615   0.947   1.062   0.955   2.217   0.960
Silhouette                         0.626   0.948   1.083   0.956   2.094   0.967
Nvidia                             0.617   0.982   1.080   0.985   2.199   0.985
Combined                           0.593   0.980   1.063   0.985   2.067   0.985

Table 5.8: Median render time (ms) for the deferred lighting pass and global SSIM value at different resolutions using different heuristics in the Sewer scene. More detailed values can be seen in the appendix at A.6.

Sewer Scene                        1200p   1600p   2400p
Dynamic Resolution Rendering       49.3    46.0    46.0
Haar Wavelet Transform             52.7    54.8    49.3
Luma                               53.1    54.1    49.9
Albedo                             53.4    54.2    49.8
Silhouette                         54.1    55.2    48.9
Nvidia                             53.1    55.1    49.2
Combined                           51.6    54.2    48.2

Table 5.9: The percentage of time (%) taken for Variable Rate Shading and Dynamic Resolution Rendering based on the time of the native resolution in the Sewer Scene when only comparing the deferred lighting pass.

VRS performed second best in the Sewer Scene, with the best result being in the Office Scene, which can be seen in Table 5.9. At 1200p the difference was almost none at all and at 2400p the difference is around 0.2 ms. What is interesting here is that Albedo and Silhouette performed considerably worse than the heuristics using the previous frame, ending up almost similar to DRR.

When adding the tiling/upsampling time to the render pass time, as shown in Table 5.1, VRS and DRR are at most marginally different in performance, assuming the overhead is only applied to the deferred lighting pass. For example, in the Desert Scene at 2400p, DRR totals 1.563 + 0.660 ≈ 2.22 ms while Luma totals 1.870 + 0.127 ≈ 2.00 ms. The combined heuristic is overall slower in the Forest Scene, but DRR is slower at 2400p in the Sewer Scene. One of the most interesting parts of the performance results is that at 1600p the gap between VRS and DRR is bigger than at 1200p and 2400p. According to these results VRS is less reliable at 1600p than at the other resolutions. This anomaly could be caused either by a bug in the implementation or by a bug in VRS itself, but further investigation has to be made. Looking at the box plots in Appendix Figures A.5, A.6, A.7 and A.8, there are many outliers in the results. This is most likely caused by the streaming application which was required to conduct the tests.

Because of the image-based shading provided by VRS, it shows relatively higher image quality than DRR, which is always below 0.95 in structural similarity. Looking at the global SSIM values, the results show that using the previous frame as heuristic provides a higher SSIM value and is therefore more similar to the original image. The Forest Scene had the overall lowest quality when using DRR. According to Figure 5.3, DRR has trouble displaying the leaves and the grass correctly in the scene, which is emphasized by comparing Figures 5.6a and 5.6b.

As can be seen in the Forest and Sewer Scenes, in Figures 5.6b and 5.7b, DRR is beneficial for shadows since DRR is naturally blurred due to the upsampling; with the VRS techniques the shadows become pixelated at the edges. Looking at the Forest Scene image comparison, the bushes and leaves of the forest are better represented with VRS, though Albedo also shows a lot of greys and blacks in the grayscale local SSIM image at the leaves and bushes, which lowers the quality of the scene. DRR also struggled to maintain as high a structural similarity as VRS in the Sewer Scene, with the exception that Silhouette and Albedo came closer to DRR. The ceiling, with a lot of noise in contrast, is harder for Albedo, Silhouette and DRR to maintain at a higher visual quality, which is shown more closely in Figures 5.7e, 5.7f and 5.7b. According to Figure 5.7f, however, the edges in the scene maintain higher visual quality. Figures 5.6 and 5.7 show the difference in visual quality in close-up. In the Sewer Scene image comparison the luma comparison provides the overall best quality, as its local SSIM image contains the most white. Silhouette is among the better heuristics at providing higher quality around the edges of objects in the scene, but fails to keep the quality high in the ceiling since it is a flat surface.

Looking at the cropped image comparison in the Office Scene, Figure 5.8, DRR causes a blurred result, making the bottles seem out of focus and not very sharp, while VRS keeps the image sharp in that region of the scene, although Albedo lowers the quality here more than the other VRS heuristics.
The rug in the same scene is harder to maintain for DRR, Albedo and Silhouette. The last cropped image comparison is the Desert Scene, where Albedo proved to be the better alternative, as shown in Figure 5.9. As can be seen in the SSIM tables, the SSIM values get closer to the native resolution with higher resolution. This can also be seen in the appendix in Figures A.1, A.2, A.3 and A.4, which get whiter with higher resolutions; this is even more true for the image-based shading of VRS because the shading rate tiles become relatively smaller with increased resolution. It is also possible to see how image-based shading benefits from higher resolution in the Albedo and Silhouette heuristics in the Sewer Scene, in Figures A.4d and A.4e.
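For reference, the global SSIM value is typically the mean of a local SSIM map computed from windowed statistics [29]. The sketch below is a simplified single-scale version using uniform, non-overlapping 8x8 windows on 8-bit grayscale images; the thesis uses MATLAB's implementation [16], which applies an 11x11 Gaussian window, so exact values will differ slightly.

```cpp
#include <cstdint>
#include <vector>

// Simplified global SSIM: mean of local SSIM over non-overlapping 8x8 windows
// with uniform weighting, following the structure of Wang et al. [29].
double GlobalSSIM(const std::vector<uint8_t>& a, const std::vector<uint8_t>& b,
                  int width, int height)
{
    const double C1 = (0.01 * 255.0) * (0.01 * 255.0);
    const double C2 = (0.03 * 255.0) * (0.03 * 255.0);
    const int win = 8;
    double sum = 0.0;
    int windows = 0;

    for (int wy = 0; wy + win <= height; wy += win)
    for (int wx = 0; wx + win <= width;  wx += win)
    {
        double muA = 0, muB = 0, varA = 0, varB = 0, cov = 0;
        const int n = win * win;
        for (int y = 0; y < win; ++y)
        for (int x = 0; x < win; ++x)
        {
            muA += a[(wy + y) * width + (wx + x)];
            muB += b[(wy + y) * width + (wx + x)];
        }
        muA /= n; muB /= n;
        for (int y = 0; y < win; ++y)
        for (int x = 0; x < win; ++x)
        {
            const double da = a[(wy + y) * width + (wx + x)] - muA;
            const double db = b[(wy + y) * width + (wx + x)] - muB;
            varA += da * da; varB += db * db; cov += da * db;
        }
        varA /= n - 1; varB /= n - 1; cov /= n - 1;

        // Local SSIM for this window, then accumulate for the global mean.
        sum += ((2 * muA * muB + C1) * (2 * cov + C2)) /
               ((muA * muA + muB * muB + C1) * (varA + varB + C2));
        ++windows;
    }
    return windows > 0 ? sum / windows : 1.0;
}
```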

[Figure: Desert Scene and Office Scene. Left panels: global SSIM value per resolution (1200p, 1600p, 2400p). Right panels: time (ms) per resolution for Native, DRR, HWT, Luma, Albedo, Silhouette, Nvidia and Combined.]

Figure 5.1: Left: Global SSIM value. Right: Performance including both the time for the overhead and the lighting pass. (Note: the increase between resolutions is not linear.)

[Figure 5.1, continued: Sewer Scene and Forest Scene. Left panels: global SSIM value per resolution (1200p, 1600p, 2400p). Right panels: time (ms) per resolution for Native, DRR, HWT, Luma, Albedo, Silhouette, Nvidia and Combined.]

(a) DRR (b) HWT (c) Luma (d) Albedo (e) Silhouette (f) Nvidia (g) Combined

Figure 5.2: Local SSIM values for the different heuristics in the Desert Scene at 1200p. Comparison between resolutions can be seen in the appendix in Figure A.1.

(a) DRR (b) HWT (c) Luma (d) Albedo (e) Silhouette (f) Nvidia (g) Combined

Figure 5.3: Local SSIM values for the different heuristics in the Forest Scene at 1200p. Comparison between resolutions can be seen in the appendix in Figure A.2.

(a) DRR (b) HWT (c) Luma (d) Albedo (e) Silhouette (f) Nvidia (g) Combined

Figure 5.4: Local SSIM values for the different heuristics in the Office Scene at 1200p. Comparison between resolutions can be seen in the appendix in Figure A.3.

(a) DRR (b) HWT (c) Luma (d) Albedo (e) Silhouette (f) Nvidia (g) Combined

Figure 5.5: Local SSIM values for the different heuristics in the Sewer Scene at 1200p. Comparison between resolutions can be seen in the appendix in Figure A.4.

(a) Native Resolution (b) Dynamic Resolution Rendering (c) Haar Wavelet Transform (d) Luma (e) Albedo (f) Silhouette (g) Nvidia (h) Combined

Figure 5.6: A zoomed-in section of the Forest Scene on the final image using different heuristics at 1200p (crop at 240x170 resolution).

(a) Native Resolution (b) Dynamic Resolution Rendering (c) Haar Wavelet Transform (d) Luma (e) Albedo (f) Silhouette (g) Nvidia (h) Combined

Figure 5.7: A zoomed-in section of the Sewer Scene on the final image using different heuristics at 1200p (crop at 240x170 resolution).

(a) Native Resolution (b) Dynamic Resolution Rendering (c) Haar Wavelet Transform (d) Luma (e) Albedo (f) Silhouette (g) Nvidia (h) Combined

Figure 5.8: A zoomed-in section of the Office Scene on the final image using different heuristics at 1200p (crop at 240x170 resolution).

(a) Native Resolution (b) Dynamic Resolution Rendering (c) Haar Wavelet Transform (d) Luma (e) Albedo (f) Silhouette (g) Nvidia (h) Combined

Figure 5.9: A zoomed-in section of the Desert Scene on the final image using different heuristics at 1200p (crop at 240x170 resolution).

Chapter 6 Discussion

The first research question:

"How does Variable Rate Shading’s Image-Based Shading compare to Dynamic Resolution Rendering in terms of performance and image quality when used on a deferred lighting pass?"

Through the gathered results it is possible to see that the difference in render pass time between the two techniques is minimal in the context of a game. Since Dynamic Resolution Rendering (DRR) is primarily used to gain performance, this comparison shows that Variable Rate Shading's image-based shading also offers the possibility of increasing performance when necessary. Flynn et al. [7] propose that, when using SSIM on grayscale images, a value of 0.95 is the threshold for whether a compression is noticeable or not. It was a small study with only 28 participants using still images, so in a game that threshold could be lower due to motion. This indicates that all scenes except the Forest Scene show no visible compression when using image-based shading with the previous frame as heuristic.

The performance results only provide a single data point at each resolution, which makes it difficult to conclude whether a dynamic resolution scheme would be possible, but what can be concluded is that there are performance gains to be had, hinting at a possible contender for dynamic resolution. The overhead of the image-based shading, depending on the heuristic, can also be very lightweight compared to the performance gained. This shows that the upsampling is slower than the tiling, and if additional shading rate textures can be justified, both in extra memory usage and tiling time, image-based shading can increase the visual quality of the image with less overhead since no upsampling is needed. Some games aim for cinematic visual quality rather than clarity by using anti-aliasing techniques to smooth edges; this could require extra post-processing of the image to reduce the pixelated look of Variable Rate Shading, if that is desired.

DRR has the advantage when applied to several passes because the upsampling only has to happen once at the end of the rendering pipeline; this is where VRS could perform worse because of additional tiling passes. If several shading rate textures are necessary for different passes, this would also increase memory usage, which makes DRR scale better when several passes with different shading rate textures are required. For example, in Gears Tactics [27] a fixed shading rate seemed to be enough for some post-processing effects; allowing a dynamic change in resolution may instead introduce another shading rate texture, for example a foveation-focused one which determines the radius of the circle that uses the 1x shading rate based on the frame time of the game.
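To ground the discussion about per-pass cost, a minimal Direct3D 12 sketch of the two steps image-based shading needs before a pass, checking for Tier 2 support and binding an already generated shading rate image, could look as follows. The function names are illustrative and this is not the thesis's implementation.

```cpp
#include <d3d12.h>

// Check for Tier 2 VRS (required for image-based shading) and query the tile size.
bool SupportsImageBasedShading(ID3D12Device* device, UINT& tileSizeOut)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS6 options6 = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS6,
                                           &options6, sizeof(options6))))
        return false;
    tileSizeOut = options6.ShadingRateImageTileSize;
    return options6.VariableShadingRateTier >= D3D12_VARIABLE_SHADING_RATE_TIER_2;
}

// Bind a previously generated shading rate image for the deferred lighting pass.
// The second combiner lets the screen-space image override the per-draw base rate.
void BindShadingRateImage(ID3D12GraphicsCommandList5* cmdList,
                          ID3D12Resource* shadingRateImage)
{
    const D3D12_SHADING_RATE_COMBINER combiners[2] = {
        D3D12_SHADING_RATE_COMBINER_PASSTHROUGH,   // per-draw vs per-primitive rate
        D3D12_SHADING_RATE_COMBINER_OVERRIDE       // screen-space image wins
    };
    cmdList->RSSetShadingRate(D3D12_SHADING_RATE_1X1, combiners);
    cmdList->RSSetShadingRateImage(shadingRateImage);
}
```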

This thesis focused on which technique achieved the most truthful image when compared to the native resolution. The information in this thesis shows that VRS can represent the original scene better than DRR. This is not the only way to investigate their differences, since the most truthful image may not be what is desired. The biggest advantage of VRS is that it can detect high-frequency changes in an image. This allows it to maintain higher visual quality where DRR would otherwise blur the result, while DRR on the other hand provides better visual quality where blur is preferred. A shadow's penumbra is more true to the original with DRR than with VRS, which pixelates the shadow. This suggests that VRS should be used where high-frequency changes, such as sharpness, should be preserved, and that DRR can provide better image quality where blur is needed. It could also be possible to blur only the parts of the image that used coarse shading, and to decide which coarse-shaded tiles should be blurred.

The gathered results only compare VRS and DRR in games aiming for realism. Games going for a cartoon look, where more plain colors are used, may benefit from VRS considerably more in terms of performance to image quality ratio. VRS excels at keeping detail where desired. The Forest Scene showed this the most: looking at the bushes and the grass, DRR failed to represent them as well as VRS in most cases. By using the previous frame as heuristic it is possible to more easily predict where lower resolution should be applied, and to make the shading rate texture generic enough for most passes to use. Variable Rate Shading is a better fit when some areas are expected not to require higher resolution, for example objects in near-black regions, while other highlighted areas can receive higher resolution. The silhouette heuristic may not prove useful on its own when compared against the native resolution, but compared to a lower resolution the increased cost of shading edges may prove beneficial as a way to handle anti-aliasing.

The second research question:

"How does different heuristics compare in image-based shading in regards to performance and image quality?" At first glance using the previous frame as heuristic is the most optimal in all cases to produce the best visual quality. There is no significant difference in render pass time between them which emphasizes that the heuristic will only affect visual quality and not cause any unexpected performance implications.Therefore when choosing a heuristic it is better to focus on gaining higher image quality rather than gaining frame time. When looking closer at the cropped images it hints that other heuristics may show the best results in specific areas of the screen. For example the Albedo in the Desert Scene show potential increased visual quality at distant objects where post-effects and other potential masking effects could cause the affect the results. Also, the Silhouette show potential visual quality increase in the Sewer Scene where geometry edges may be more apparent than RGB values. This means that depending on the type of environment that the scene has it may be worth to investigate what is the most important aspect of the scene. In most cases it would probably be most optimal to use the previous frame but in out-door scenes where the contrast is not as apparent as in in-door scenes there may be other heuristic which fit better. Chapter 7 Conclusions and Future Work

Chapter 7 Conclusions and Future Work

Using image-based shading on the deferred lighting pass allows the possibility of extracting similar performance gains as dynamic resolution rendering while maintaining higher image quality. The gain in time is less predictable compared to DRR, but the impact is not significant compared to the time at full resolution. Variable Rate Shading's image-based shading could become the new standard for dynamic resolution in games and may replace Dynamic Resolution Rendering. The overhead of the tiling process has minimal impact and the memory footprint is negligible. This puts the focus on what is to be achieved rather than on enabling the functionality. Image-based shading can better handle high-frequency patterns, where there is a high contrast ratio or a detailed texture, but the blur caused by the upsampling of DRR will increase image quality where sharp edges are not desired, e.g. a shadow's penumbra. There are several heuristics to choose between and this thesis has investigated a few possibilities. All heuristics show higher image quality than DRR, with the previous frame giving the highest image quality, but other heuristics can have their advantages. The only limiting factor of image-based shading is that its shading rate options are restricted to 1x, 2x and 4x. There are some anomalies at 1600p in the performance timings for image-based shading; these should be further investigated before they can be confirmed, but for the moment developers should simply be aware of them. This is not a conclusive look at Variable Rate Shading's image-based shading in the deferred lighting pass, because there are more aspects of this topic to investigate and confirm. This thesis only highlights what image-based shading can potentially help with in games: dynamic resolution rendering may not be the only way to gain performance, and if higher image quality is important, adaptive image-based shading allows decreasing less important detail before decreasing the more important detail.

7.1 Future Work
Adaptive shading has been, and still is, a hot research topic in real-time rendering because of hardware limitations and the desire to increase the rendering resolution. Current games still mainly work with uniform shading, and investigating more adaptive approaches would prove beneficial for games. With the release of Variable Rate Shading, adaptive shading has recently become possible in hardware through the rasterization pipeline. It is still very early in its life cycle, with only a small portion of games on the market supporting it as of writing.

To further investigate the use of Variable Rate Shading and its adaptive capabilities there are several interesting topics to look at. Possible future work includes:

• Due to how Variable Rate Shading works it can cause pixelated results when using the 2x and 4x shading rates. Further work could look at techniques for removing these artefacts to improve image quality. With the help of the shading rate texture it would be possible to determine where to apply such post-processing.

• Variable Rate Shading has mainly been proposed as a technique to increase image quality. Another topic is to investigate its use to enhance the gameplay experience. Regions of interest on the screen would receive higher resolution to display them clearly in detail. In online multiplayer first-person shooters, enemy players are important to notice in order to win the game; increasing the resolution on them may increase the visibility of an enemy, making them easier to defeat.

• Investigate image-based shading's dynamic performance, evaluate whether the correlation between image-based shading and dynamic resolution rendering is even enough, and evaluate the performance of the 4x shading rate. There could be conflicts when using heuristics based on the scene data. In this case an example could be to change resolution by using a type of foveated rendering as heuristic, where the inside of an ellipse receives the 1x1 shading rate and the outside the 2x2 shading rate. When performance is needed the radius could decrease, and then increase again when image quality is preferred. The outer bounds could be further developed to include the possibility of one axis using the 1x rate to increase image quality where it would be less apparent. This could also be applied to less visible post-effects like SSAO or Global Illumination, where the results may not be as noticeable. A minimal sketch of this foveated heuristic is given below.
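A minimal CPU-side sketch of the foveated idea above, with an ellipse that shrinks or grows based on frame time, could look as follows. All names, thresholds and bounds are illustrative assumptions.

```cpp
#include <algorithm>
#include <cstdint>

enum ShadingRate : uint8_t { Rate1x1 = 0x0, Rate2x2 = 0x5, Rate4x4 = 0xA };

struct Ellipse { float centerX, centerY, radiusX, radiusY; }; // in pixels

// Per-tile foveated rate: full rate inside the ellipse, coarse outside.
// tileX/tileY index the shading rate image; tileSize comes from the driver.
ShadingRate FoveatedRate(const Ellipse& e, uint32_t tileX, uint32_t tileY, uint32_t tileSize)
{
    const float px = (tileX + 0.5f) * tileSize;  // tile centre in screen space
    const float py = (tileY + 0.5f) * tileSize;
    const float nx = (px - e.centerX) / e.radiusX;
    const float ny = (py - e.centerY) / e.radiusY;
    return (nx * nx + ny * ny <= 1.0f) ? Rate1x1 : Rate2x2;
}

// When the frame time exceeds the budget, shrink the ellipse; grow it again otherwise.
void AdaptEllipse(Ellipse& e, float frameTimeMs, float budgetMs)
{
    const float scale = (frameTimeMs > budgetMs) ? 0.95f : 1.05f;
    e.radiusX = std::clamp(e.radiusX * scale, 64.0f, 4096.0f);
    e.radiusY = std::clamp(e.radiusY * scale, 64.0f, 4096.0f);
}
```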

References

[1] Tomas Akenine-Möller, Eric Haines, and Naty Hoffman. Real-Time Rendering. A K Peters/CRC Press, 4th edition, 2018.

[2] Windows Dev Center. D3D11_FILTER enumeration, December 5 2018. [Online]. Accessed: July 31 2020. Available: https://docs.microsoft.com/en-us/windows/win32/api/d3d11/ne-d3d11-d3d11_filter.

[3] Marissa du Bois. Variable rate shading tier 1 with microsoft directx* 12 from theory to practice, April 7 2020. [Online]. Accessed: July 31 2020. Available: https://devmesh.intel.com/projects/variable-rate-shading-tier-1-with-microsoft-directx-12-from-theory-to-practice.

[4] P. Dubla, K. Debattista, and A. Chalmers. Adaptive interleaved sampling for interactive high-fidelity rendering. Computer Graphics Forum, 28(8):2117–2130, 2009.

[5] Jalal Eddine El Mansouri. Rendering 'rainbow six | siege', 2016. [Online]. Accessed: July 31 2020. Available: https://www.gdcvault.com/play/1023287/Rendering-Rainbow-Six-Siege.

[6] Patrick V. Fleet. Discrete Wavelet Transformations: An Elementary Approach with Applications. Wiley-Interscience, Hoboken, 1st edition, 2008.

[7] Jeremy R. Flynn, Steve Ward, Julian Abich, and David Poole. Image quality assessment using the ssim and the just noticeable difference paradigm. In Don Harris, editor, Engineering Psychology and Cognitive Ergonomics. Understanding Human Cognition, pages 23–30, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.

[8] Digital Foundry. [4k] in theory: Should next-gen consoles focus on 'true 4k' rendering?, July 14 2018. [Online]. Accessed: July 31 2020. Available: https://www.youtube.com/watch?v=SWcRtzjyH-c.

[9] AMD GPUOpen. Fidelityfx, July 9 2019. [Online]. Accessed: July 31 2020. Available: https://gpuopen.com/fidelityfx-cas/.

[10] Shawn Hargreaves, Adam Lake, Kelly Gawne, and John Kloetzli. Boost rendering performance with variable rate shading | game developers conference 2019, March 20 - 22 2019. [Online]. Accessed: July 31 2020. Available: https://www.youtube.com/watch?v=f-SklVb2MDI.


[11] Intel. Dynamic resolution rendering article, July 13 2011. [Online]. Accessed: July 31 2020. Available: https://software.intel.com/content/www/us/en/develop/articles/dynamic-resolution-rendering-article.html.

[12] G. Lavoué, M. Langer, A. Peytavie, and P. Poulin. A psychophysical evaluation of texture compression masking effects. IEEE Transactions on Visualization and Computer Graphics, 25(2):1336–1346, 2019.

[13] Ian Mallett and Cem Yuksel. Deferred adaptive compute shading. In Proceedings of the Conference on High-Performance Graphics, HPG ’18, New York, NY, USA, 2018. Association for Computing Machinery.

[14] Justin Massongill. Insomniac interview: The tech behind marvel's spider-man. [Online]. Accessed: July 31 2020. Available: https://blog.playstation.com/2018/09/06/insomniac-interview-the-tech-behind-marvels-spider-man/.

[15] Justin Massongill. This is next-gen: see unreal engine 5 running on playstation 5. [Online]. Accessed: July 31 2020. Available: https://www.eurogamer.net/articles/digitalfoundry-2020-this-is-next-gen-unreal-engine-running-on-playstation-5.

[16] MathWorks. Matlab. [Online]. Accessed: July 31 2020. Available: https://se.mathworks.com/products/matlab.html.

[17] Don P. Mitchell. Generating antialiased images at low sampling densities. In Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '87, pages 65–72, New York, NY, USA, 1987. Association for Computing Machinery.

[18] Mike Nelson. Introducing new packaging icons for xbox. [Online]. Accessed: July 31 2020. Available: https://news.xbox.com/en-us/2017/06/11/new-packaging-icons-xbox/.

[19] Microsoft Development Network. Variable-rate shading (VRS). [Online]. Accessed: July 31 2020. Available: https://docs.microsoft.com/en-us/windows/win32/direct3d12/vrs.

[20] Nvidia. VRWorks - Variable Rate Shading (VRS). [Online]. Accessed: July 31 2020. Available: https://developer.nvidia.com/vrworks/graphics/variablerateshading.

[21] Ryan S. Overbeck, Craig Donner, and Ravi Ramamoorthi. Adaptive wavelet rendering. ACM Trans. Graph., 28(5):1–12, December 2009.

[22] Microsoft Game Stack. Variable Rate Shading, A Deep Dive | Game Developers Conference 2019, May 17 2019. [Online]. Accessed: July 31 2020. Available: https://www.youtube.com/watch?v=2vKnKba0wxk.

[23] Valve. Steam Hardware Survey, 2020. [Online]. Accessed: July 31 2020. Available: https://store.steampowered.com/hwsurvey/Steam-Hardware-Software-Survey-Welcome-to-Steam.

[24] Michael Stengel, Steve Grogorick, Martin Eisemann, and Marcus Magnor. Adaptive image-space sampling for gaze-contingent real-time rendering. Computer Graphics Forum, 35(4):129–139, 2016.

[25] K. Vaidyanathan, M. Salvi, R. Toth, T. Foley, T. Akenine-Möller, J. Nilsson, J. Munkberg, J. Hasselgren, M. Sugihara, P. Clarberg, T. Janczak, and A. Lefohn. Coarse pixel shading. In Proceedings of High Performance Graphics, HPG '14, pages 9–18, Goslar, Germany, 2014. Eurographics Association.

[26] Michal Valient. Deferred rendering in Killzone 2, July 26 2007. [Online]. Accessed: July 31 2020. Available: https://www.guerrilla-games.com/read/deferred-rendering-in-killzone-2.

[27] Jacques Van Rhyn. Iterating on Variable Rate Shading in Gears Tactics, May 26 2020. [Online]. Accessed: July 31 2020. Available: https://devblogs.microsoft.com/directx/gears-tactics-vrs/.

[28] Lei Yang, Dmitry Zhdan, Emmett Kilgariff, Eric B. Lum, Yubo Zhang, Matthew Johnson, and Henrik Rydgård. Visually lossless content and motion adaptive shading in games. Proc. ACM Comput. Graph. Interact. Tech., 2(1):6:1–6:19, June 2019.

[29] Zhou Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.

Appendix A Supplemental Information


(a) Dynamic Resolution Rendering (b) Haar Wavelet Transform (c) Luma (d) Albedo (e) Silhouette (f) Nvidia (g) Combined

Figure A.1: Local SSIM values in Desert Scene at 1200p, 1600p and 2400p.

(a) Dynamic Resolution Rendering (b) Haar Wavelet Transform (c) Luma (d) Albedo (e) Silhouette (f) Nvidia (g) Combined

Figure A.2: Local SSIM values in Forest Scene at 1200p, 1600p and 2400p.

(a) Dynamic Resolution Rendering (b) Haar Wavelet Transform (c) Luma (d) Albedo (e) Silhouette (f) Nvidia (g) Combined

Figure A.3: Local SSIM values in Office Scene at 1200p, 1600p and 2400p.

(a) Dynamic Resolution Rendering (b) Haar Wavelet Transform (c) Luma (d) Albedo (e) Silhouette (f) Nvidia (g) Combined

Figure A.4: Local SSIM values in Sewer Scene at 1200p, 1600p and 2400p.

Figure A.5: Box plot of desert scene with 1000 samples

Figure A.6: Box plot of sewer scene with 1000 samples

Figure A.7: Box plot of office scene with 1000 samples

Figure A.8: Box plot of forest scene with 1000 samples
