DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS STOCKHOLM, SWEDEN 2021

Real-time Ray Traced Ambient Occlusion and Animation

Image quality and performance of hardware-accelerated ray traced ambient occlusion

FABIAN WALDNER

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Real-time Ray Traced Ambient Occlusion and Animation

Image quality and performance of hardware-accelerated ray traced ambient occlusion

FABIAN WALDNER

Master’s Programme, Industrial Engineering and Management, 120 credits
Date: June 2, 2021

Supervisor: Christopher Peters
Examiner: Tino Weinkauf
School of Electrical Engineering and Computer Science
Swedish title: Strålspårad ambient ocklusion i realtid med animationer
Swedish subtitle: Bildkvalité och prestanda av hårdvaruaccelererad, strålspårad ambient ocklusion
© 2021 Fabian Waldner

Abstract

Recently, new hardware capabilities in GPUs have opened the possibility of ray tracing in real-time at interactive frame rates. These new capabilities can be used for a range of ray tracing techniques - the focus of this thesis is on ray traced ambient occlusion (RTAO). This thesis evaluates real-time RTAO by comparing it with ground-truth ambient occlusion (GTAO), a state-of-the-art screen space ambient occlusion (SSAO) method. A contribution of this thesis is that the evaluation is made in scenarios that include animated objects, with both rigid-body animations and skinning animations. This approach has some advantages: it can emphasise visual artefacts that arise due to objects moving and animating. Furthermore, it makes the performance tests better approximate real-world applications such as video games and interactive visualisations. This is particularly true for RTAO, which gets more expensive as the number of objects in a scene increases and which has additional costs from managing the ray tracing acceleration structures. The ambient occlusion methods are evaluated in terms of image quality and performance. Image quality is assessed using the structural similarity index (SSIM) and through visual inspection. Performance is assessed by measuring computation time, in milliseconds. This thesis shows that the image quality of RTAO is a substantial improvement over GTAO, coming close to offline rendering quality. The primary visual issue with RTAO is visible noise - especially noticeable around the contours of moving objects. Nevertheless, GTAO is very competitive due to its performance: the computation time for all GTAO tests was below one ms per frame. At 1080p full resolution, GTAO was computed in 0.3883 ms on an RTX 3070 GPU. In contrast, the computation time of RTAO at 1080p and two samples per pixel was 2.253 ms. The cost of updating and rebuilding ray tracing acceleration structures was also noteworthy. Overall, the results indicate that hardware-accelerated ray tracing can be used for significant improvements in image quality, but adoption of this technique is not trivial due to performance concerns.

Keywords: Ambient occlusion, ray tracing, real-time, animation

Sammanfattning

With the recently introduced hardware-accelerated ray tracing on graphics cards, several ray-tracing-based techniques for real-time rendering became possible. This thesis examines one such technique - ray traced ambient occlusion (RTAO). RTAO is examined and evaluated for use in real-time applications through a comparison with a screen space ambient occlusion (SSAO) method called ground-truth ambient occlusion (GTAO). This thesis contributes by evaluating the methods in test scenarios that include animated objects. This brings a number of advantages: the evaluation can emphasise visual artefacts that can arise when objects move and animate. Furthermore, it lets the performance tests include the costs incurred when scenes contain animated objects - this is especially significant for RTAO, which becomes more expensive to compute as the number of objects rises and which has additional costs for updating the data structures used to accelerate the ray tracing. In this way, the test scenarios approach a broad category of applications that use real-time rendering, for example games and interactive visualisations. The evaluation considers the achieved image quality as well as the performance of the methods. Image quality is evaluated using the structural similarity index (SSIM) and through visual inspection. Performance is evaluated by measuring the computation time in milliseconds. The results show that the image quality of RTAO is clearly superior to GTAO and approaches the results achieved through offline rendering. The primary problem with the image quality of RTAO is the presence of visual noise. This is especially noticeable around the contours of objects that are animated and moving. Nevertheless, GTAO is attractive since it can be computed considerably faster than RTAO. All of GTAO's performance tests showed computation times below one millisecond. At a resolution of 1080p with two samples per pixel, the computation time for RTAO was 2.253 ms. The cost of updating the data structures for the ray tracing also proved to be considerable in many tests. Overall, the results indicate that hardware-accelerated ray tracing can result in a significant improvement in image quality, but that it can come at a cost that requires consideration.

Keywords: Ambient occlusion, ray tracing, real-time, animation

Acknowledgments

I wish to thank my supervisor Christopher Peters for providing valuable advice and feedback.

Stockholm, June 2021
Fabian Waldner

Contents

1 Introduction
  1.1 Background
  1.2 Purpose
  1.3 Research question
    1.3.1 Research question 1 - evaluating image quality
    1.3.2 Research question 2 - evaluating performance
    1.3.3 Summary
  1.4 Hypothesis
  1.5 Goals
  1.6 Research methodology
  1.7 Delimitations
  1.8 Structure of the thesis

2 Background
  2.1 Ray tracing
    2.1.1 Overview of the ray tracing rendering model
    2.1.2 Path tracing and the rendering equation
    2.1.3 Real-time rendering and rasterisation
    2.1.4 Real-time ray tracing
  2.2 Ambient occlusion
    2.2.1 Ambient occlusion
    2.2.2 Ray traced ambient occlusion (RTAO)
    2.2.3 Screen space ambient occlusion (SSAO)
  2.3 Denoising
  2.4 Animation
    2.4.1 Rigid-body transformation
    2.4.2 Vertex-blending or "skinning"
  2.5 SSIM
    2.5.1 Image quality assessment
    2.5.2 Structural similarity index metric (SSIM)
  2.6 Related work
    2.6.1 Crytek SSAO
    2.6.2 Horizon-based ambient occlusion
    2.6.3 Ground truth ambient occlusion (GTAO)
    2.6.4 RTAO and SSAO

3 Method
  3.0.1 Overview of the experimental design
  3.0.2 Image quality tests
  3.0.3 Performance tests
  3.1 Execution
    3.1.1 Image quality tests
    3.1.2 Performance tests

4 Evaluation
  4.1 Results and analysis
  4.2 Image quality tests
    4.2.1 Scenario IQ.1 SSIM scores
    4.2.2 Scenario IQ.2 SSIM scores
    4.2.3 Scenario IQ.3 SSIM scores
    4.2.4 Visual inspection
  4.3 Performance tests
    4.3.1 Scenario P.1
    4.3.2 Scenario P.2 & P.3
  4.4 Discussion
    4.4.1 On research question 1 - evaluating image quality
    4.4.2 On research question 2 - evaluating performance
    4.4.3 On the main research question
  4.5 Limitations

5 Future work and conclusion
  5.1 Future work
  5.2 Conclusion

References

A Videos
  A.1 Extended figures
  A.2 Scenario IQ.1 videos
    A.2.1 RTAO
    A.2.2 GTAO
    A.2.3 GTAO Full-resolution
  A.3 Scenario IQ.2 videos
    A.3.1 RTAO
    A.3.2 GTAO
    A.3.3 GTAO Full-resolution
  A.4 Scenario IQ.3 videos
    A.4.1 RTAO
    A.4.2 GTAO
    A.4.3 GTAO Full-resolution

List of Figures

1.1 An illustration of the difference between constant ambient lighting (left image) and ambient occlusion (right image). Ambient occlusion can convey a lot of information about the form of an object and bring out details, while the constant ambient lighting only conveys the silhouette. Image taken from [1]. "Dragon" model, by Delatronic, Benedikt Bitterli Rendering Resources, licensed under CC BY 3.0.

2.1 An illustration of a simple pinhole camera model. This camera model, when altered slightly, is a conceptual building block for rendering images with ray tracing.
2.2 A simple ray tracing model, describing some fundamental concepts. The image plane is divided into a grid, each cell representing a picture element (pixel). The colour of the pixel indicated in this illustration would be calculated by including the contributions of the reflection ray, the shadow ray and the refraction ray. For simplicity, only one instance of a ray type is depicted.
2.3 A flow chart of what happens when TraceRay() is called. Figure taken from [2].
2.4 A point p is sampled by casting rays. Notice that rays closer to the normal are fatter, indicating that they contribute more to the ambient occlusion factor. This point would likely receive an ambient occlusion factor ≈ 0.5.
2.5 Illustration of raw RTAO (left) with one sample per pixel, and RTAO that has been denoised using a spatio-temporal approach and an edge-aware filter (right). Image from D3D12 Raytracing Real-Time Denoised Ambient Occlusion sample by Peter Kristof, Microsoft.
2.6 Illustration of Crytek SSAO. Two points are sampled with six samples each. Note that p1 will be over-occluded because the samples are taken from a sphere.
2.7 Demonstration of Crytek SSAO. Image taken from [3]. Note how flat surfaces are lighter towards the edges, a characteristic of this method and a consequence of the sampling method used.
2.8 Demonstration of HBAO. Image taken from [3].
2.9 A conceptual overview of the main components used to calculate HBAO. Refer to the text for an explanation.
2.10 A diagram of the reference frame used for calculating the horizon angles in GTAO, θ1 and θ2. ωi is the view vector. Figure taken from [4].

3.1 Overview of the evaluation scheme.
3.2 A rendering of the UE4 Sun Temple, done in Maya. The camera view presents a deep view of the hallway, which can allow testing of occlusion as objects move in and out of depth and obscure each other. Also note the varying degree of detail and mixture of smoother and sharper shapes.
3.3 A rendering of the scene used for Scenario IQ.3 - consisting of a high-resolution face that employs facial rigging for the animations. Rendered in Maya.
3.4 Overview of the image quality tests.
3.5 Baseline renderings. Some models in the Sun Temple scene have been rendered with flat colours. This is used to establish a baseline between Maya and Unity before comparing the results of applying AO. Sub-figure 3.5c appears to be entirely black. There are slight differences however, as can be seen in Figure 3.6.
3.6 Difference blending applied on baseline renderings (close-up of Figure 3.5c). Although it may be hard to make out, a silhouette is visible in this close-up.
3.7 Overview of performance tests data gathering.

4.1 Scenario IQ.1 comparisons of renderings in Maya and real-time renderings with GTAO and RTAO. The quality configurations with the most similar SSIM scores were chosen as examples. The local SSIM maps have been coloured red to distinguish them. A darker shade indicates a stronger deviation from the reference image.
4.2 Example of some visual artefacts occurring in the first (roughly) ten frames of GTAO. This is a detail from one of the first frames of Scenario IQ.2. The brightness of the image has been altered to make the artefacts more visible.
4.3 Scenario IQ.1, the SSIM over 600 frames.
4.4 Scenario IQ.2 comparisons of renderings in Maya and real-time renderings with GTAO and RTAO. The quality configurations with the most similar SSIM scores were chosen as examples. The local SSIM maps have been coloured red to distinguish them. A darker shade indicates a stronger deviation from the reference image.
4.5 Scenario IQ.2, the SSIM over 600 frames.
4.6 Scenario IQ.3 comparisons of renderings in Maya and real-time renderings with GTAO and RTAO. The quality configurations with the most similar SSIM scores were chosen as examples. The local SSIM maps have been coloured red to distinguish them. A darker shade indicates a stronger deviation from the reference image.
4.7 Scenario IQ.3, the SSIM over 200 frames.
4.8 The SSIM values are very similar for GTAO. Consequently, it is hard to see any distinctive differences between the local SSIM maps of the quality configurations. One thing that can be seen is the difference in sampling radius, which increases with the quality configuration. See for instance the dark band that surrounds the contour to the right of the head in IQ.3.
4.9 The local SSIM maps show a consistent progression from 'Low Quality' to 'High Quality.' Noise is reduced. The issue with noise surrounding the contours of moving objects is also less prevalent, see the examples for IQ.2. Some areas remain a problem for all configurations, an example being the mouth and eyelids in IQ.3.
4.10 Illustration of three different issues with GTAO. Edges are rendered too soft (green), no occlusion from objects that are off-camera (red) and over-occlusion around an object, appearing as a 'dark halo' effect (blue). Close-up from Scenario IQ.1, left image rendered in Maya. Right image, GTAO 'High Quality', full resolution.
4.11 Showing how occlusion information is lost, for GTAO, when an object is obscured. Also, note the noise along the contours of the foreground object. Close-up from Scenario IQ.1, GTAO, 'High Quality', full resolution.
4.12 Illustration of artefacts, related to GTAO, that can occur in occluded areas. Close-up from Scenario IQ.3, left image rendered in Maya. The GTAO example also shows the issue with a sharp distinction between occluded and non-occluded areas, whereas in the reference image there is a smooth, gradual transition between the areas. This can be somewhat adjusted with the sampling radius but at the possible expense of poorer overall results. Right image is GTAO 'High Quality', full resolution.
4.13 Details from various renderings, comparing the quality configurations for GTAO.
4.14 Example showing noise along the contours of an animated object, prevalent in RTAO renderings. Close-up from Scenario IQ.1, RTAO 'Low Quality'. The contrast of this image has been exaggerated to show the noise more clearly.
4.15 Showing a visual artefact, likely due to lag from the use of spatio-temporal reprojection in the RTAO renderings. Close-up from Scenario IQ.3, RTAO 'High Quality.'
4.16 Details from renderings showing the difference between quality configurations for RTAO. While it may be difficult to discern, 'Low Quality' contains more noise and details look less distinct than 'Medium Quality' and 'High Quality.'
4.17 Scenario P.1, GTAO computation time vs. step count. As the number of samples increases, the cost of the ambient occlusion step ('Horizon SSAO') increases linearly. The denoising and upscaling are not affected.
4.18 Scenario P.1, GTAO computation time vs. rendering resolution.
4.19 Scenario P.1, performance scaling for RTAO.
4.20 Scenario P.2-3, computation time as the number of animated objects increases. As expected, the computation time of GTAO is invariant to an increase in the number of animated objects. In contrast, RTAO costs more as the geometric complexity increases. The higher cost of skinning can likely be attributed to the fact that this model has a higher geometric complexity than the model used for rigid-body animations.
4.21 Scenario P.2-3, cost of updating and rebuilding ray tracing acceleration structures. The increase in the costs associated with skinning animations dwarfs the increase in costs associated with rigid-body animations.

A.1 Scenario IQ.1 GTAO SSIM; this graph shows the same results as 4.3a but with a different y-axis that makes the difference between the quality configurations more apparent.
A.2 Scenario IQ.2 GTAO SSIM; this graph shows the same results as 4.5a but with a different y-axis that makes the difference between the quality configurations more apparent.
A.3 Scenario IQ.3 GTAO SSIM; this graph shows the same results as 4.7a but with a different y-axis that makes the difference between the quality configurations more apparent.

List of Tables

3.1 GTAO parameter descriptions.
3.2 RTAO parameter descriptions.
3.3 GTAO quality configurations, used in Scenario IQ.1-3 tests.
3.4 RTAO quality configurations, used in Scenario IQ.1-3 tests. For 'Denoiser radius', different values were used for each scenario, i.e. Scenario IQ.1, IQ.2 and IQ.3.
3.5 Arnold ambient occlusion configurations, used for the reference renderings in Scenario IQ.1-3.

4.1 Scenario IQ.1, the mean SSIM of 600 frames. RTAO has the highest SSIM and shows significant improvement with the higher quality configurations. For GTAO, using full resolution seems to be the most significant factor for improving the SSIM. Values in parentheses use full-resolution GTAO. See Figure 4.3 for a companion graph.
4.2 Scenario IQ.2, the mean SSIM of 600 frames. Values in parentheses use full-resolution GTAO. Both GTAO and RTAO get a lower SSIM compared to the SSIM of Scenario IQ.1. IQ.2 has a model with more details than IQ.1 that might present a challenge for both AO methods. See Figure 4.5 for a companion graph.
4.3 Scenario IQ.3, the mean SSIM of 200 frames. Values in parentheses use full-resolution GTAO. Both GTAO and RTAO have very high SSIM. See Figure 4.7 for a companion graph.
4.4 Scenario P.1, mean GTAO computation time vs. step count. Values are given in ms. Note that 'Total' subsumes the other events. See Figure 4.17 for a companion graph.
4.5 Scenario P.1, mean RTAO computation time vs. sample count. Values are given in ms. Note that 'Total' subsumes the other events. See Figure 4.19a for a companion graph.
4.6 Scenario P.1, mean GTAO computation time vs. resolution. Values are given in ms. Note that 'Total' subsumes the other events. See Figure 4.18a and Figure 4.18b for companion graphs.
4.7 Scenario P.1, mean RTAO computation time vs. resolution. Values are given in ms. Note that 'Total' subsumes the other events. See Figure 4.19b for a companion graph.
4.8 Scenario P.1, mean GTAO computation time vs. radius. Values are given in ms. Note that 'Total' subsumes the other events.
4.9 Scenario P.1, mean GTAO computation time vs. max radius in pixels. Values are given in ms. Note that 'Total' subsumes the other events.
4.10 Scenario P.1, mean RTAO computation time vs. maximum ray length. Values are given in ms. Note that 'Total' subsumes the other events.
4.11 Scenario P.2-3, input parameter configurations for GTAO and RTAO used in P.2 and P.3.
4.12 Scenario P.2-3, mean computation time as the number of objects increases. Values are given in ms. Results continue in Table 4.13. See Figure 4.20 for a companion graph.
4.13 Scenario P.2-3, mean computation time as the number of objects increases. Values are given in ms. Continuation of Table 4.12. See Figure 4.20 for a companion graph.
4.14 Scenario P.2-3, mean ray tracing acceleration structure update and rebuild time. For rigid-body, it is likely that updating the top-level acceleration structures is enough. For skinning, refitting or rebuilding bottom-level acceleration structures might be necessary. These operations are much more expensive, which is reflected in the results. See Figure 4.21 for a companion graph.

Listings

2.1 Pseudo code of a simple ray tracing renderer

Chapter 1

Introduction

1.1 Background

This thesis’s research area is rendering for real-time applications, specifically ambient occlusion (AO) algorithms. AO is an approximation of global ambient illumination at a much cheaper cost than fully accurate global illumination. Put simply, AO estimates which parts of a 3d-model are likely to be obscured from light and therefore appear darker - examples could be crevices and folds. On its own, ambient occlusion can present enough information to convey the shape and forms of a 3d-model. Combined with other rendering techniques, AO can help create the impression of more realistic lighting and increase the sculptural quality of the models. Figure 1.1 shows a rendering that has AO applied and no additional shading.

Presently, the industry standard for real-time applications (e.g., video games) seems to be a category of algorithms called screen-space ambient occlusion (SSAO). They are popular due to the reasonable trade-off these algorithms offer between image quality and performance. Unfortunately, SSAO methods are based on assumptions that fundamentally limit their accuracy. These assumptions also lead to visual artefacts that are common to many of the SSAO methods [5].

The gold standard for AO is achieved with Monte Carlo ray tracing. Until recently, ray tracing has been mostly impractical for real-time applications due to performance concerns. Video games that used ray traced ambient occlusion (RTAO) had to do so by doing offline renderings and baking the results into texture maps. While this can yield good results, it also requires extra memory and is limited to static geometry. Additionally, baking requires an extra preprocessing step for the 3d-assets.

Figure 1.1 – An illustration of the difference between constant ambient lighting (left image) and ambient occlusion (right image). Ambient occlusion can convey a lot of information about the form of an object and bring out details, while the constant ambient lighting only conveys the silhouette. Image taken from [1]. "Dragon" model, by Delatronic, Benedikt Bitterli Rendering Resources, licensed under CC BY 3.0.

Ray tracing can be a preferred way of rendering as it can be applied to solve more general problems than rasterisation-based techniques. For instance, using ray tracing, it is possible to approximate the rendering equation, a cornerstone in physically-based rendering with correct global illumination; see Section 2.1.2 for more information on the rendering equation. Visual effects and 3d-rendering in the movie industry have embraced physically based shading techniques and adopted ray tracing as the standard rendering algorithm (mainly through path tracing, see Section 2.1.2) [6]. The movie industry can afford to run very complex and expensive rendering calculations on massive render farms. Naturally, an application that renders graphics in real-time does not have this computational time budget and horsepower.

Nevertheless, it appears that real-time graphics is looking to make a similar transition as the movie industry. Many real-time renderers (such as those used in video games) have transitioned to physically based shading instead of ad hoc models. However, the realism that they can convey is held back by the lack of a more general light model, which is not achievable using rasterisation but is possible with ray tracing [7]. Hence, it appears that ray tracing will become an integral part of real-time rendering in the future and may subsume rasterisation entirely at some point [8].

To make ray tracing feasible in real-time applications, graphics chip manufacturers have developed specialised acceleration hardware. Examples include the NVIDIA Turing and Ampere RTX cards and AMD's counterpart, the RDNA2 architecture graphics cards. Paired with this new hardware come specialised APIs that let programmers leverage the new ray tracing capabilities these cards provide, such as DirectX Raytracing (DXR) and Vulkan Ray Tracing. Using these capabilities may open the possibility of RTAO in real-time.

Indeed, a recent study evaluated and compared different ambient occlusion methods for real-time use, where real-time RTAO was included [9]. This study found that RTAO was superior in precision but substantially more expensive than the alternatives. However, the evaluation in this study did not include any animated objects, which is the focus of the present study. The present author is not aware of studies that have looked explicitly at image quality and performance of different AO methods directly related to scenes containing animated objects.

Including animated objects makes the test cases closer to a large category of applications where animated objects are a key component for the intended experience or use-case. This will make it possible to identify image quality issues that occur only as objects animate and move in a scene. Animated objects may also incur performance costs when using RTAO that any actual application should consider. The reason is that the computation time of RTAO increases as the number of objects in a scene goes up. Also, hardware-accelerated ray tracing relies on acceleration structures that must be updated or rebuilt when objects animate. Therefore, in terms of research, including animated objects in the tests can provide a more holistic approach to evaluating an ambient occlusion method - at least it will cover features of a broader category of applications.

1.2 Purpose

The purpose of this thesis is to evaluate to what extent RTAO is suitable for real-time applications with respect to animated objects. To achieve this, RTAO will be evaluated by comparing it with a state-of-the-art SSAO method called ground-truth ambient occlusion (GTAO). Both methods will be evaluated on the image quality they achieve and their performance, in terms of computation time. Basing the evaluation on scenarios with animated objects addresses an aspect often overlooked in the literature. The motivation to do so in this thesis is two-fold. First, it aims to provide a more extensive evaluation of the image quality and performance

of RTAO for anyone looking to use RTAO in a real-time application. Second, comparing RTAO with a state-of-the-art SSAO method can serve as an example of the extent of the qualitative leap that can be achieved by leveraging ray tracing in real-time, and the price paid in terms of performance. The latter may indicate how far along the integration of ray tracing techniques into real-time rendering has come.

This project's results should benefit anyone who wants insight into the performance characteristics of RTAO in comparison with an industry-proven SSAO method. An example could be a programmer wanting to make an informed decision about which AO method is most suited for their specifications. Examples of such projects could be video games, simulations or interactive visualisations - such as an architectural visualisation. Additionally, using ambient occlusion to convey form can be an attractive choice for applications that are more abstract and do not necessarily aim to simulate realistic lighting. This category is a bit more vague but could include various visualisation applications. Also, this project can be informative for researchers as it covers ambient occlusion in a way that is not common in the academic literature. Lastly, this project will provide more results regarding the performance of hardware-accelerated ray tracing. More research in this field can give indications as to how fast this technology is developing and maturing.

1.3 Research question

As mentioned in Section 1.1, hardware-accelerated ray tracing could make it feasible to use ray traced ambient occlusion (RTAO) in real-time applications. Indeed, Ghavamian [9] compared the image quality and performance of RTAO to various screen space ambient occlusion (SSAO) methods, concluding that the image quality of RTAO was superior. Nevertheless, it was also shown that the performance cost of RTAO was significantly higher than the SSAO alternatives. The study in [9] evaluated ambient occlusion on scenes with no animated geometry. The present thesis contributes by evaluating RTAO specifically for animated scenes. The animated scenes involve objects that are animated both by simple rigid-body animations and by hierarchical skinning animations. Evaluating RTAO in animated scenes will provide test cases that resemble real use cases for real-time applications, since these are likely to include animated objects. It may shed light on specific image quality issues that could arise due to objects animating. Moreover, animated objects pose specific performance concerns for hardware-accelerated ray tracing, and appraising the significance

of these will be informative and an essential factor to consider for any real-time application. Finally, since [9], a new generation of GPUs that support ray tracing has been introduced. It will be interesting to see if the technology has matured to the point that the cost of RTAO compared to SSAO alternatives is now negligible. The main research question that this thesis aims to answer is

To what extent is ray traced ambient occlusion suitable in real-time applications in scenes with animated objects?

This thesis will compare RTAO with ground-truth ambient occlusion (GTAO). GTAO was developed and presented by Jiménez et al. in [4]. GTAO was chosen for this study since it represents a state-of-the-art SSAO method. As shown in [4], it can produce impressive results and is performant enough to be used in video games with high-fidelity graphics, since it can be computed in 0.5 ms at 1080p on a PlayStation 4. Consequently, GTAO has been used in AAA video game titles, and it is the standard SSAO implementation used by Unity for their HDRP rendering pipeline. However, since GTAO is an SSAO method, it inherits the fundamental limitations associated with calculating ambient occlusion in screen space. For instance, off-camera geometry cannot occlude geometry visible to the camera, which can lead to artefacts and inconsistencies. Both [5] and [9] discuss the limitations of SSAO methods in detail. These limitations do not apply to RTAO.

Comparing RTAO with GTAO will provide a context for the evaluation of RTAO, making it possible to examine how much of an improvement in image quality can be gained by using RTAO and what the performance implications are. This can be of interest to anyone considering which AO method to implement in their application - such as video games, simulations or interactive visualisations.

The main research question will be divided into two sub-questions that each address a different evaluation dimension - image quality and performance, respectively. This division is appropriate since computer graphics is concerned with developing algorithms that constitute a compelling trade-off between the two. Generally, a higher level of image quality comes at the expense of more computation time. An assumption (rather obvious) of this thesis is that higher image quality is always desirable, all else being equal. The main research question is discussed, in relation to the results, in Section 4.4.3.

1.3.1 Research question 1 - evaluating image quality

Due to the fundamental differences between RTAO and GTAO, we can assume that there will be a difference in the image quality they achieve. However, it will be interesting to evaluate the significance of this difference, particularly regarding animated scenes. This thesis will compare RTAO and GTAO in scenarios with different features and characteristics, to investigate under what circumstances they produce the best image quality. It will also be interesting to see the possible range of image quality that each method can achieve when input parameters are changed. The image quality will be assessed in two ways. First, an objective image quality assessment will be performed, using SSIM as the metric; see Section 2.5.1 for information regarding this metric. The second assessment will consist of visual inspection. The visual inspection will entail the author making an overall comparison between RTAO, GTAO and offline reference renderings, looking at how closely the real-time ambient occlusion methods come to the reference renderings. Also, the visual inspection will investigate whether there are specific image quality issues that arise as objects animate and move, by studying videos composed from the renderings. Thus, we formulate the first sub-question, aimed at evaluating image quality,

RQ 1: To what extent is the image quality of RTAO an improvement over GTAO when evaluated using SSIM and visual inspection?

RQ 1 is assessed using three different scenarios, labelled Scenario IQ.1, IQ.2 and IQ.3; see Section 3.0.2 for a description of these. The results that address RQ 1 can be found in Section 4.2 and are discussed in Section 4.4.1.

1.3.2 Research question 2 - evaluating performance

Performance is an essential concern for real-time applications, as these work within a limited computational budget. A common metric of performance for a real-time application is the average frame rate it can maintain, i.e. how many frames it can render and present in a given time interval. Usually, this interval is a second - hence the metric frames-per-second (FPS). A slow frame rate will make the experience appear sluggish and unresponsive, while a high frame rate can give a smoother and more natural experience. Also, the stability of the frame rate is important. If the frame rate varies too much,

the application can appear to stutter. Naturally, having a high, stable frame rate is important for animations. For real-time applications such as video games, it is common to target 30 FPS or 60 FPS (or even higher). These targets give time budgets for rendering a frame of 33.333 ms and 16.667 ms, respectively; a short arithmetic sketch of these budgets follows at the end of this subsection. For an application to appear smooth and responsive, it must not overshoot this time budget. As mentioned above, animated objects present a performance cost for hardware-accelerated ray tracing. This is because ray tracing relies on organising the scene geometry in acceleration structures. When an object is animated, its acceleration structure needs to be updated or rebuilt, which can be costly, as described in [10]. See Section 2.1.4 for further elaboration on this. This thesis will evaluate performance by measuring the computation time, in milliseconds, to perform RTAO or GTAO per frame. The second sub-question is

RQ 2: How is the computation time, measured in milliseconds, of RTAO affected by changes in the image quality, geometric complexity of the scene as well as the number of animated objects?

RQ 2 will be evaluated using three different scenarios, labelled Scenario P.1, P.2 and P.3, see Section 3.0.3 for a description of these. The results addressing RQ 2 are presented in Section 4.3 and discussed in Section 4.4.2.
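As a quick arithmetic sketch of the frame budgets quoted above (illustrative only; the function name is hypothetical and not part of any test code used in this thesis):

def frame_budget_ms(target_fps: float) -> float:
    # Time available to render one frame, in milliseconds.
    return 1000.0 / target_fps

assert round(frame_budget_ms(30), 3) == 33.333  # 30 FPS target
assert round(frame_budget_ms(60), 3) == 16.667  # 60 FPS target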

1.3.3 Summary

For the convenience of the reader, all research questions are repeated here.

Main research question: To what extent is ray traced ambient occlusion suitable in real-time applications in scenes with animated objects?

RQ 1: To what extent is the image quality of RTAO an improvement over GTAO when evaluated using SSIM and visual inspection?

RQ 2: How is the computation time, measured in milliseconds, of RTAO affected by changes in the image quality, geometric complexity of the scene as well as the number of animated objects?

1.4 Hypothesis

The hypothesis of this thesis is that (hardware-accelerated) ray traced ambient occlusion (RTAO) will produce a superior image quality to ground-truth

ambient occlusion (GTAO). However, the performance cost of RTAO will be non-trivially higher than GTAO, especially in scenes containing many animated objects. Ultimately, RTAO will be a viable alternative for real-time use, but the context will significantly determine how suitable it is.

1.5 Goals

This thesis's overall goal is to conduct experiments to evaluate ray traced ambient occlusion (RTAO) for real-time use, by comparing it to a state-of-the-art screen space ambient occlusion method called ground truth ambient occlusion (GTAO). Both methods will be evaluated in real-time use cases, with scenes containing a varying number of animated objects. This goal is divided into the following sub-goals:

1. Research Goal 1 - evaluate and compare the image quality of RTAO and GTAO. This will be done using an objective image quality assessment method (using SSIM) and by visual inspection. The methods will also be evaluated on how stable the image quality is over time. Reference images, rendered in an offline renderer using ray traced ambient occlusion, will be used both for the SSIM calculation and the visual inspection. This goal addresses RQ 1, see Section 1.3.1.

2. Research Goal 2 - evaluate and compare the performance of RTAO and GTAO, in terms of computational time per frame. Evaluate how the performance changes when relevant input parameters are changed. Some examples of input parameters are rendering resolution, the number of samples per pixel and the number of animated objects. This goal addresses RQ 2, see Section 1.3.2.

1.6 Research methodology

To answer the research questions and realise the project's goals, we will carry out controlled experiments, and the data gathered from these experiments will be analysed. The experiments' design and execution will be based on knowledge of how the ambient occlusion methods work and on methods used in similar studies. The theoretical knowledge is gathered in a pre-study phase and presented in Chapter 2. The theoretical background knowledge will ensure that the experiments isolate the relevant parameters and will inform the analysis.

The background knowledge will also determine how to construct test cases representing worst-case (and common-case) scenarios. These are important to consider for anyone interested in applying any of the ambient occlusion methods. The experiments will be divided into two categories that each address one of the project's research questions and goals. The first category is experiments that test the image quality produced by the AO methods. The second category of experiments will test performance and how performance is impacted when input parameters are scaled. This methodology is suitable for this project since it aims to provide understanding of whether hardware-accelerated RTAO is feasible for use in real-time applications that contain animated objects. The experiments will provide data on RTAO performance compared to a state-of-the-art SSAO method, GTAO, that has seen industrial use. While one cannot assert a universal, strict computational budget for AO, one ms appears to be a good guideline for how much time should be spent on AO [5].

In this thesis, image quality will be evaluated by comparing the real-time renderings with offline renderings done by the Arnold renderer in Maya. This is similar to [4] and [9], which also use offline renderings as reference images to compare with the real-time AO results. This approach is common in much of the research that deals with real-time ray tracing. Doing a comparison with offline renderings can be criticised in that it introduces assumptions around what constitutes image quality. However, in this thesis it is appropriate to use the offline renderings as reference, or 'ground-truth', because the offline renderer approximates the same AO equation as the real-time renderer. Since the offline renderer has a more generous computational budget, it can use a higher number of samples and consequently be more accurate.

Using reference images allows the computation of an objective image quality assessment score - this thesis will use a metric called structural similarity index (SSIM). In [9], the author used SSIM for the image quality comparisons. SSIM is also used in other computer graphics research when comparing the output of algorithms with reference images. For instance, it is used to evaluate denoising (or reconstruction) algorithms for real-time ray tracing, such as in [7] and [11]. Moreover, since SSIM can be computed for a sequence of frames, the mean SSIM for the entire sequence can be calculated. Hence, SSIM is also used as a metric for video quality assessment [12]. Besides calculating mean SSIM for the entire sequence, the stability of SSIM can be determined by looking at

the standard deviation. The stability of SSIM may be helpful to indicate how consistent the image quality of either AO method is.

It should be noted that objective image quality assessments are limited. Appraisal by the human eye remains the gold standard for evaluating image quality. Therefore, the image quality evaluation will also consist of visual inspection. Visual inspection is a common method for evaluating image quality in computer graphics. The visual inspection will aim to investigate whether there are specific image quality issues that arise in scenes with animated objects. If particular visual artefacts occur, these will be exemplified with images that the reader can inspect. It will also be important to make an assessment of how closely the real-time renderings resemble the offline reference renderings.

The performance will be evaluated by using a profiler to measure the actual computation time of the AO methods, rather than average FPS. This is appropriate for this thesis since the test scenarios will not include shading. Where an average FPS metric would be misleading, measuring the computation time of the AO in isolation can provide more generalisable results. Studies such as [4], [9] and [5] present performance results in a similar manner. There are, of course, some issues with this approach, in that it relies on a simplified view of GPU architectures. Due to the parallel nature of GPUs, it may be difficult to predict how the computation time of the AO methods might be affected when included in applications with a completely different workload. Nevertheless, the measurements done in this thesis should give a reasonable idea of what could be expected in terms of performance. In order to provide more generalisable results, the performance tests will be designed to show how computation costs increase as input parameters (such as rendering resolution) scale. While individual measurements may be specific to the hardware that runs the experiment, the relationship between computation time and an increase in rendering resolution, as an example, can be assumed to be deterministic and behave similarly on other (reasonably similar) systems.
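To illustrate how the SSIM statistics described above might be computed in practice, the following minimal Python sketch uses the structural_similarity function from scikit-image. It assumes two sequences of pre-loaded 8-bit greyscale frames; it is a sketch of the idea, not the actual tooling used for this thesis.

import numpy as np
from skimage.metrics import structural_similarity as ssim

def sequence_ssim(reference_frames, test_frames):
    # Per-frame SSIM of a rendered sequence against its reference.
    scores = np.array([
        ssim(ref, test, data_range=255)  # 8-bit greyscale frames
        for ref, test in zip(reference_frames, test_frames)
    ])
    # The mean indicates overall quality; the standard deviation
    # indicates how stable the quality is over the sequence.
    return scores.mean(), scores.std(), scores

The mean corresponds to the per-scenario scores reported later, while the standard deviation serves as the stability indicator discussed above.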

1.7 Delimitations

This project will only use the ambient occlusion part of the rendering pipeline. Therefore, scenes will not contain additional shading. Hence, the answer to RQ 1 (see Section 1.3.1) will not consider how the final image quality might be impacted when the AO has been composed with the output of the remaining rendering pipeline.

For all scenarios, the camera will be fixed. Furthermore, the camera position and angle will be roughly equivalent to an eye-level shot. Hence, the impact of camera movement and viewpoint will not be considered explicitly. The reason for this is to keep the scope of this project limited to animations of objects in the scenes. It is likely that possible effects of a moving camera could be inferred from looking at how animated objects are rendered. For instance, if some artefacts occur while an object is moving, then it is reasonable to expect that similar artefacts could occur if the camera moved instead. As for performance, moving the camera could also incur costs. For instance, the camera could pan from one corner of a scene to another, where the latter is filled with animated objects. This should be roughly equivalent to having a static camera and moving objects into the camera's view. As such, the results from this thesis could indicate some of the performance costs involved with having a moving camera.

Also, the visual inspection part of the image quality evaluation will be carried out solely by the author of this thesis. It will primarily be based on the prevalence of visible artefacts that occur in the moving images and how closely the real-time renderings resemble the offline reference renderings.

No research project can hope to evaluate all possible scenarios where ambient occlusion might be applied. Rather than using a multitude of scenes, models and animations, the aim is to design scenarios that can provide generalisable results by looking at the relationship between computation cost and a given input parameter. These relationships should hold for other types of scenarios. Furthermore, for this thesis, the scenarios will be designed with specific features in mind - the limits and generalisability of the results from these scenarios will be discussed in-depth in Section 4.4 and Section 4.5. The scenarios themselves and the differences between them are described in more detail in Sections 3.0.2 and 3.0.3. In any case, it is worth outlining these features here to provide a clear idea of what the scenarios do and do not include.

First, an enclosed interior scene will be used because it clearly distinguishes between static geometry and animated objects. Also, the animated objects will be close enough to each other and to the static geometry that they can occlude each other. This will show how the ambient occlusion methods look when animated objects occlude (and are occluded by) static geometry and other animated objects, and how the occlusion changes as they animate. Hopefully, this will make any visual issues with the animated objects apparent. The other scene will be a close-up of a human face. This scene is intended to show the results of the ambient occlusion methods where any errors are likely

to be discernible and disturbing, because the viewer focuses on a realistic-looking face and will be sensitive to anything that looks wrong. This scene will consist predominantly of organic forms and not as many sharp angles as the interior scene. The animations in this scene, changing facial expressions, will be more subtle than what is used in the interior scene - the extent to which each ambient occlusion method can capture these subtleties will be part of the image quality evaluation. Overall, the scenes and models used in the scenarios contain detail and geometric complexity similar to what could be expected in applications with reasonable visual fidelity. Examples could be mid-tier video games running on mobile phones or interactive visualisations used to showcase architectural spaces. The features of the scenarios likely represent a fairly broad category of applications such as video games or architectural visualisations.

Since the scenes used are either an interior space or a close-up of a face, there is no outdoor scene with an expansive vista and great distances. Also, there is no procedural geometry, such as a landscape generated from heightfield maps, or more abstract geometry. Neither is there delicate organic geometry such as foliage, fur or hair. Also, none of the scenarios contain super-detailed models. Finally, as the models used in the scenarios are realistic-looking, the evaluation will not encompass various visual styles, such as a cartoon look.

The types of animations used are limited to a walking animation, simple rigid-body animations and some facial animations. The walking animation is based on motion capture data. It is smooth and evenly paced, with no exaggerated movements. The rigid-body animations will be simple, consisting of translations, rotations or a combination of both. These will be designed to be distinctive and make it clear if any issues arise from either of these types. The rigid-body animations will also be smooth and gradual. Video games, in particular, may contain much more varied and sophisticated rigid-body animations. Similarly, their skinning animations can be more varied, expressive or exaggerated. For instance, in a video game, animations will likely range from smooth and slow to jerky and swift. Nevertheless, comparing the walking animation with the simple rigid-body animations could show any qualitative differences between these two, which could indicate general problems for both categories of animations.

Both RTAO and GTAO rely on denoising. However, this thesis will not compare different denoising algorithms and will instead rely on the implementations in Unity. Similarly, there will be no attempt to apply optimisations to any of the ambient occlusion implementations provided in Unity, or to the scenarios otherwise. Consequently, the answer to RQ 2 will not show the extent to

which potential performance gains may be possible if work was put into optimisation.

1.8 Structure of the thesis

The thesis is structured in the following manner. Chapter 2 lays the theoretical groundwork for understanding ambient occlusion and ray tracing as well as other areas that are relevant for this thesis. It also contains a section on related work, Section 2.6, which deals with relevant research and how it affects the present thesis. Chapter 3 explains the methodology used in this thesis and the design of the conducted experiments. It also contains a description of how the experiments were executed. Then, Chapter 4 presents the results, starting in Section 4.1, and analyses the results in the light of theory and the literature. The results are then discussed in Section 4.4. The discussion is followed by Section 4.5, which addresses limitations of the research conducted in this thesis. Finally, Chapter 5 points out directions for future research and brings the thesis to a conclusion.

Chapter 2

Background

2.1 Ray tracing

2.1.1 Overview of the ray tracing rendering model This section presents the reader with a basic conceptual overview of ray tracing and is intended for readers that have little previous exposure to these concepts. The intention is that familiarity with these concepts will make the performance implications of ray tracing algorithms clear - particularly the need for acceleration structures. In the context of this thesis, this applies specifically to ray traced ambient occlusion.

Figure 2.1 – An illustration of a simple pinhole camera model. This camera model, when altered slightly, is a conceptual building block for rendering images with ray tracing.

A basic ray tracing model consists of a camera, an image plane and a scene. A simple camera model is the pinhole camera, see Figure 2.1. A pinhole

camera consists of a box with a single inlet for light, i.e. the pinhole. On the side opposite the pinhole, a film receives the incoming light, and with enough exposure, the film captures a photographic image. A scene contains geometry, consisting of different materials, and light sources. The geometry can be defined implicitly (e.g. by an equation) or explicitly, using meshes defined by vertices. How light interacts with an object depends on the properties of its material.

In computer graphics rendering, the pinhole camera model is modified such that the film is in front of the pinhole. The pinhole is renamed the eye (or simply the camera) and the film is renamed the image plane. The image plane is divided into a grid, where each cell represents a picture element, pixel, in the image that will be rendered. The division of an image into discrete pixels that each have a numerical value is essentially how images are represented digitally. The altered pinhole camera model makes it easy to determine the origin and direction of rays that are cast from the camera into the scene. A ray can be mathematically defined using a parametric form:

r(t) = O + d · t,   (2.1)

where O is the origin of the ray, d is the normalised direction of the ray and t ∈ [0, ∞). In practice, t is bounded by a finite max value, t ∈ [0, t_max]. A ray can be used to query a scene via an operation called ray casting. Ray casting involves shooting a ray in a direction and finding the closest intersection, if it indeed intersects any objects.

In order to render an image, the colour of each pixel in the image needs to be determined. In ray traced rendering, this is done by sampling the scene via ray casting. With the camera as their origin, rays are cast through each of the image's pixels into the scene. The rays shot from the camera are commonly referred to as camera rays or eye rays. The goal is to find the closest object that the ray intersects, or alternatively to conclude that the ray intersects "the sky". Figure 2.2 illustrates the components of a simple ray tracer.

If the ray intersects geometry at a point p, we can look up the surface's colour at p and decide that this is the resulting colour of the pixel. Determining the colour of a pixel is called shading. Before shading the pixel, the algorithm must determine if p is in shadow. With p as its origin, a new ray is shot in the direction of a light source. Such a ray is commonly called a shadow ray. The point p is in shadow if the shadow ray intersects any opaque geometry. Of course, a shadow ray can be allowed to pass through transparent objects, such as glass.
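To make the parametric ray and the camera and shadow rays described above concrete, here is a minimal Python sketch (illustrative only; the names are hypothetical, and scene.any_opaque_hit stands in for a ray cast against the scene geometry):

import numpy as np
from dataclasses import dataclass

@dataclass
class Ray:
    origin: np.ndarray     # O in Equation 2.1
    direction: np.ndarray  # d, assumed to be normalised
    t_max: float = np.inf  # in practice, t is bounded by a finite t_max

    def at(self, t: float) -> np.ndarray:
        # Evaluate the parametric form r(t) = O + d*t.
        return self.origin + self.direction * t

def make_camera_ray(eye: np.ndarray, pixel_centre: np.ndarray) -> Ray:
    # A camera (eye) ray from the camera through the centre of one
    # pixel on the image plane.
    d = pixel_centre - eye
    return Ray(origin=eye, direction=d / np.linalg.norm(d))

def in_shadow(p: np.ndarray, light_pos: np.ndarray, scene) -> bool:
    # Shadow ray: p is in shadow if any opaque geometry lies between
    # p and the light source; transparent objects could be skipped.
    to_light = light_pos - p
    distance = float(np.linalg.norm(to_light))
    shadow_ray = Ray(origin=p, direction=to_light / distance, t_max=distance)
    return scene.any_opaque_hit(shadow_ray)  # hypothetical scene query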

Figure 2.2 – A simple ray tracing model, describing some fundamental concepts. The image plane is divided into a grid, each cell representing a picture element (pixel). The colour of the pixel indicated in this illustration would be calculated by including the contributions of the reflection ray, the shadow ray and the refraction ray. For simplicity, only one instance of a ray type is depicted.

Similarly, we can launch other ray types from p in order to achieve optical phenomena such as reflection and refraction. Reflections are computed by shooting a reflection ray from p with the incident direction mirrored across the normal of the point. A refraction ray is launched from p with a direction bent relative to the incident direction, according to the material's refractive properties. When an intersection point shoots reflection and refraction rays, these rays can, in turn, give rise to new intersection points that, recursively, shoot new reflection and refraction rays. When a ray reaches a light source, the recursion terminates. The concatenation of rays, starting from the camera and ending at a light source, is called a ray path. (Ray) tracing is the process of using ray casting to recursively gather the contributions from light sources, reflections and refractions at a surface point p. Naturally, ray tracing can become computationally expensive, as a ray path can become arbitrarily long. Therefore, in a real implementation, the recursive depth is limited. The following pseudo code sums up a basic ray tracing algorithm:

Listing 2.1 – Pseudo code of a simple ray tracing renderer

for each pixel in image:
    ray = make_ray(eye, pixel)
    trace(ray, pixel)

trace(ray, pixel):
    closest_hit = None
    for each triangle in scene:
        hit = intersect(ray, triangle)
        if hit and is_closest_hit(hit):
            closest_hit = hit
    if closest_hit:
        shade(pixel, closest_hit)

Listing 2.1 shows a naive ray tracing algorithm that uses a brute-force approach. This has a time complexity of O(P × N), where P is the number of pixels in the image and N is the number of objects in the scene. This complexity can become infeasible for high-resolution renderings of scenes with much geometry. A more intelligent implementation than Listing 2.1 leverages acceleration structures to reduce the number of ray-geometry intersection tests. The acceleration structure contains information about the scene geometry in a hierarchical tree structure, typically either a BVH or a k-D tree [13]. When a ray tracer uses such a data structure, finding ray-geometry intersections becomes a tree traversal, which can bring the time complexity of the ray tracing algorithm down to O(P × log N), a substantial improvement. For real-time ray tracing, using acceleration structures becomes imperative. However, building the acceleration structure has a cost of at least O(N) [13].
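To illustrate why an acceleration structure turns intersection finding into a tree traversal, the following is a heavily simplified BVH traversal sketch. It is written under assumed interfaces: node.bounds.intersects, node.is_leaf, node.triangles and the per-hit ray parameter t are hypothetical, and it reuses intersect(ray, triangle) from Listing 2.1. A production traversal would additionally order child visits and clip the t range as hits are found.

def intersect_bvh(ray, node):
    # Find the closest hit between a ray and the geometry stored in a
    # bounding volume hierarchy (BVH). Subtrees whose bounding volumes
    # the ray misses are skipped entirely, which is what reduces the
    # expected cost per ray from O(N) towards O(log N).
    if not node.bounds.intersects(ray):
        return None  # prune this whole subtree
    if node.is_leaf:
        # Brute-force test only the few triangles stored in this leaf.
        hits = [intersect(ray, tri) for tri in node.triangles]
    else:
        hits = [intersect_bvh(ray, node.left),
                intersect_bvh(ray, node.right)]
    hits = [h for h in hits if h is not None]
    # Keep the hit with the smallest ray parameter t, i.e. the closest.
    return min(hits, key=lambda h: h.t, default=None)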

2.1.2 Path tracing and the rendering equation

This section introduces a central equation in computer graphics called "the rendering equation". The rendering equation is included in this text partly because of its overall significance in computer graphics, but also as a way of understanding ambient occlusion later. In addition, the rendering equation was introduced in the same context as path tracing, a method that can be used to evaluate ray traced ambient occlusion, as will be presented later. In 1986, Kajiya published a seminal paper called 'The Rendering Equation', which introduced fundamental concepts and techniques for modern ray tracers [14]. The rendering equation, also called the light transport equation for the graphics-specific problem, generalised and subsumed multiple previous rendering models into a single expression. One form of the rendering equation is

L_o(p, \omega_o) = L_e(p, \omega_o) + \int_{\Omega} f_r(p, \omega_o, \omega_i)\, L_i(p, \omega_i)\, |\cos\theta_i|\, d\omega_i, \qquad (2.2)

where L_o is the outgoing radiance of a point p in the direction \omega_o, L_e is the emitted radiance at the point p in direction \omega_o, L_i is the incident radiance from a direction \omega_i with incident angle \theta_i, \cos\theta_i is the cosine-weighting factor, \Omega is the unit hemisphere centred around the normal of the surface at p, which constitutes all valid directions of \omega_i, and, finally, f_r(p, \omega_o, \omega_i) is the bidirectional reflectance distribution function (BRDF) of the surface at p.

The rendering equation states that in order to correctly shade a point p on a surface, one must compute the outgoing radiance at p in the direction of the viewer, e.g. the camera. Unfortunately, the rendering equation involves a recursive term, L_i(p, \omega_i). This contribution can depend on the outgoing radiance of a point p' at a surface somewhere in the \omega_i direction. Similarly, the outgoing radiance of p' can, in turn, depend on the incoming radiance from other points. This recursion can carry on indefinitely. The rendering equation cannot be solved analytically nor evaluated directly for most cases.

It is possible to approximate Equation 2.2 by sampling the incoming radiance from N directions, uniformly chosen over \Omega. However, even with a recursive depth limit, such an approach would risk an exponential increase in the number of rays. Instead of shooting N new rays from an intersection point, we could decide to shoot only one new ray per intersection, thus building a ray path with no branches and avoiding the exponential increase in rays. In this scheme, a new ray's direction from an intersection point is chosen stochastically from a distribution. Shading a pixel accurately now involves sampling the scene with N paths per pixel, each originating with the camera ray through the pixel, and finally averaging these samples. A higher N yields a more accurate solution. Kajiya introduced this approach, called path tracing, and formalised his solution using Monte Carlo integration. Monte Carlo integration is a numerical integration technique that relies on randomness. Given enough samples, a Monte Carlo algorithm will converge on the accurate answer. Therefore, a Monte Carlo path tracer can produce physically accurate images by evaluating Equation 2.2. Unfortunately, the convergence rate of Monte Carlo integration is O(n^{-1/2}), meaning that to reduce the error by half, four times as many samples are needed [13]. Because Monte Carlo integration is an approximation, the resulting image will contain noise. This is true even for images rendered with a high sample rate [1]. Therefore, denoising the path tracing algorithm's output becomes an

essential part of producing the final image. This can be done using a blur filter, for example, but more involved methods exist. Researchers have invested much energy in finding sampling schemes that improve the performance and image quality of path tracing, one example being importance sampling. Importance sampling is based on the idea that the choice of a sample direction should be weighted towards the directions that are likely to contribute the most to the final result.
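As a minimal, self-contained illustration of Monte Carlo integration and its convergence rate (not taken from the thesis), the following Python sketch estimates the integral of x^2 over [0, 1], whose exact value is 1/3:

import numpy as np

rng = np.random.default_rng(0)

def mc_estimate(f, n):
    # The Monte Carlo estimator over [0, 1] is simply the mean of f
    # evaluated at n uniform random samples.
    x = rng.uniform(0.0, 1.0, n)
    return f(x).mean()

for n in [100, 400, 1600, 6400]:
    est = mc_estimate(lambda x: x * x, n)
    # Quadrupling n roughly halves the error: error ~ O(n^(-1/2)).
    print(n, est, abs(est - 1.0 / 3.0))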

2.1.3 Real-time rendering and rasterisation

This section describes the basic constraints that characterise rendering done in real time. Naturally, this relates directly to the research question and research goals of this thesis. This section also briefly compares ray tracing with the dominant rendering algorithm, rasterisation, to provide a more comprehensive background and context for the present research. There exists no formal definition of real-time in the context of rendering. Usually, real-time rendering performance is quantified as rendered frames-per-second (FPS). The average FPS is the most important figure to consider, but the variance of the frame rate also needs to be low for a smooth and consistent experience. For interactive purposes, a minimum FPS target is generally considered to be 30 FPS. For some categories of games or other interactive applications, a lower FPS might be acceptable. Many PC games and the next generation of console games target 60 FPS. Sometimes even higher frame rates are desired - a minimum FPS of 90 seems to be a standard target for Virtual Reality (VR) applications, as this frame rate has been shown to give a significantly better perceived experience than lower frame rates [15]. Whatever the case, real-time rendering puts constraints on the computation time for rendering a single frame. For a consistent 30 FPS, a new frame must be rendered within a 33.333 ms window, and for 60 FPS, this number is 16.667 ms.

Rasterisation is the dominant rendering algorithm in real-time graphics. Rasterisation, in simple terms, inverts the ray tracing algorithm (see Listing 2.1) by considering each piece of geometry in a scene and then determining which pixels this geometry covers. The time complexity of rasterisation is O(P × N), where N is the number of objects in a scene and P is the number of pixels. This is worse than ray tracing with acceleration structures, O(P log N). However, rasterisation benefits from many optimisations such as clipping, frustum culling, hierarchical z-buffer culling and more. Graphics

chip manufacturers have developed specialised hardware to accelerate rasterisation and related operations such as texture lookups. Hence, a modern graphics processing unit (GPU) can render scenes with astounding geometric complexity and many visual effects at real-time frame rates. It is worth mentioning some reasons why rasterisation has been the dominant algorithm for real-time rendering. Ray tracing's memory requirements have been a disadvantage, as a GPU must store a representation of the entire scene in memory for efficient execution. Today, memory is generally cheap and abundant, but this was not always the case. Rasterisation does not have the same memory requirements, because it can consider one mesh at a time. Moreover, ray tracing is heavily dependent on denoising. As stated above, even images rendered with a high sample rate will contain visible noise. Designing denoising algorithms that work efficiently within a real-time computation budget and produce good results for images rendered with minimal sample rates is still an open problem. Real-time ray tracing affords only a few samples per pixel; hence, in this context, it becomes especially pertinent to denoise the rendering before presenting the image.

2.1.4 Real-time ray tracing

In this section, the contemporary use of ray tracing, which utilises hardware acceleration, is described. Since 2018, NVIDIA has shipped GPUs with circuitry specifically for accelerating ray tracing∗. This hardware contains specialised cores that compute ray-BVH traversal and ray-triangle intersections. In 2020, AMD presented their own GPUs with ray tracing acceleration†. The graphics APIs DirectX 12 and Vulkan have added extensions that allow programmers to utilise the hardware-accelerated ray tracing capabilities. The following section outlines DirectX Raytracing (DXR), since this is the API used to implement the algorithms in this thesis. Nevertheless, the Vulkan specification uses similar concepts and abstractions as DXR.

∗ https://nvidianews.nvidia.com/news/nvidia-reinvents-computer-graphics-with-turing-architecture (Accessed March 2021)
† https://www.amd.com/en/press-releases/2020-10-28-amd-unveils-next-generation-pc-gaming-amd--rx-6000-series-bringing (Accessed March 2021)

DirectX raytracing

The ray tracing pipeline in DirectX is unfortunately quite complex, and only a simplified overview is presented here, to provide context for the empirical part of this thesis.

Figure 2.3 – A flow chart of what happens when TraceRay() is called. Figure taken from [2]

The ray tracing pipeline in DXR is invoked by a call to DispatchRays(). A ray, in DXR, is defined in much the same way as Equation 2.1 - it has an

origin, direction and is parameterised. The parameter has an explicit interval, [t_min, t_max]; ray casts are only performed within this interval. Moreover, a ray can carry a user-defined payload - typically, this stores the ray radiance in an RGBA format, but other forms of data can be added [10]. DispatchRays() launches a grid of ray generation shader invocations, which cast rays by calling TraceRay(). Overall, the pipeline initiated by TraceRay() works as follows: it starts with a traversal of the acceleration structures to determine the next object for ray-intersection testing. If it finds a candidate, it performs an intersection test. The pipeline chooses which intersection test to execute depending on the type of geometry of the object. Triangular meshes are tested by a fixed-function ray-triangle intersection test. Geometry defined differently (implicit geometry, for instance) is tested by the programmable intersection shader. If the ray does not intersect the geometry, the next geometric primitive is processed. If there is an intersection and the object is not opaque, the any hit shader can be called. This shader can determine whether the intersection should be ignored - for instance, if the object contains transparent portions that the ray can pass through. An object can use a texture map's alpha channel to store such information. The pipeline keeps track of the closest hit that has been recorded. When no more geometry is available, the closest hit shader is called, given that at least one intersection has been recorded. If there were no intersections, the miss shader is called before the pipeline terminates. Figure 2.3 shows the different stages that occur after a call to TraceRay(). The ray generation, closest hit and miss shaders can themselves call TraceRay(), spawning one or more new rays [2]. This ensures that the ray tracing pipeline outlined above does not dictate the choice of ray tracing algorithm; it can support a classical recursive ray tracer as well as a path tracer.

Top and bottom level acceleration structures

As mentioned in Section 2.1.1, acceleration structures are needed to avoid unnecessary ray-geometry intersection tests. The DXR implementation relies on such acceleration structures, implemented using bounding volume hierarchies (BVHs) [10][2]. The graphics API's driver is responsible for building the acceleration structures, and this process can be done on the GPU [10]. Two levels of the acceleration structure are exposed to the programmer: the top level acceleration structure (TLAS) and the bottom level acceleration

structure (BLAS) [2]. The BLAS can contain a set of geometries and additional information such as transform matrices. However, these transform matrices are only applied when the BLAS is built. The geometry stored in the BLAS can come in two forms. The first is an explicit mesh consisting of triangles. The second is a procedural form that stores a reference to an intersection shader; the programmer's implementation of the ray-geometry intersection test then defines the geometry. For instance, a sphere can be stored implicitly via a ray-sphere intersection test. The TLAS contains instances, where each instance stores a reference to a BLAS and additional data such as a transformation matrix. One or more instances can refer to the same BLAS. In this way, the TLAS can represent geometric instancing: two objects can share the same underlying geometric data but apply different top-level transforms and hence be rendered as separate objects. This avoids duplicating geometric data and thus reduces the memory footprint of a scene. Furthermore, an instance's top-level transform can be updated and applied each frame, which allows for simpler, rigid-body animations [8][10].

If all the geometry in a scene is static, the acceleration structures can be built once, and there is no need to update them between frames. However, if a scene contains animated geometry, the acceleration structures must be updated - which can be requested by the programmer. There are different options available for updating the acceleration structures. As mentioned, for geometry animated using simple rigid-body animations, updating the instance's TLAS transform each frame can suffice. For more complex animations involving deformation of the mesh, the BLAS needs to be modified, either by refitting the bounding boxes or by a complete rebuild. A rebuild is necessary if the mesh topology changes, which is an expensive operation [8]. If the topology remains the same, a refit of the geometry bounding boxes can be computed instead, which can be an order of magnitude faster than a rebuild [8]. Nevertheless, if the geometry changes too much over time, ray tracing performance may degrade, making a full rebuild necessary. Hence, there is a balance to be struck between refitting and rebuilding. One solution a programmer can implement is a scheme where rebuilds are done at regular intervals, amortising the cost of the rebuilds across all the frames in the interval [8]; see the sketch below. As an example of how the costs of acceleration structure rebuilds can vary, Deligiannis and Schmid described how they reduced the time to rebuild from 64 ms to 1.15 ms in Battlefield V, using various strategies [16].
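A minimal sketch of such an interval scheme, in Python-style pseudocode; blas.rebuild() and blas.refit() are hypothetical stand-ins for the actual acceleration structure build calls, and the interval length is an assumed, application-specific tuning parameter.

REBUILD_INTERVAL = 30  # assumed tuning parameter

def update_blas(blas, frame_index):
    """Amortise expensive rebuilds: refit cheaply on most frames,
    rebuild periodically to restore BVH quality."""
    if frame_index % REBUILD_INTERVAL == 0:
        blas.rebuild()  # expensive: reconstructs the BVH from scratch
    else:
        blas.refit()    # cheap: only updates the bounding boxes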

2.2 Ambient occlusion

2.2.1 Ambient occlusion

In this section, the theoretical framework for ambient occlusion is presented, as well as some variations on it. It presents the fundamental simplifications that ambient occlusion relies on, which are important to understand as they mean that ambient occlusion is not physically correct - but can still produce compelling results, see Figure 1.1.

Performing a full global illumination computation in a scene is an expensive proposition because of the rendering equation's recursive nature. Recall that to know the outgoing radiance at a point p, one must first calculate the outgoing radiance from all surrounding points that might influence the present point. The same applies recursively to each of these points, ad infinitum (or limited by some fixed recursive depth). Due to a limited computational budget, real-time rendering must avoid recursive global illumination calculations. Instead, it relies on combining local illumination, excluding interreflections between surfaces, with other techniques to reintroduce the impression of global illumination. In such schemes, light is often divided into different components (diffuse, ambient and specular) that are treated separately until finally combined to produce the final colour of a pixel. Simply put, ambient occlusion is a method for approximating ambient global illumination at a much lower cost than a full evaluation of the rendering equation. This is done under certain assumptions about the ambient light.

The mathematical formulation of ambient occlusion goes back to the early 80s. The explicit use of ambient occlusion as a rendering method seems to have been introduced by Landis in 2002, when ILM started to use ambient occlusion for movie productions [17]. In Cook and Torrance 1982 [18], the authors describe ambient light as the aggregation of all light not coming directly from a specific light source. They make some further assumptions regarding this ambient light. First, the ambient light is uniformly incident; that is, the incoming radiance is the same for all incoming directions. Second, the reflectance of the surface is independent of the viewing direction. Under these assumptions, Cook and Torrance formulated ambient illumination in the following manner:

L_{oa} = \frac{1}{\pi} R_a L_{ia} \int_{V(\Omega)} (n \cdot l)\, dl, \qquad (2.3)

where R_a is the hemispherical-directional reflectance, L_{ia} is the radiance of the ambient light, \Omega is the normal-oriented unit hemisphere around p and V(\Omega) is the unoccluded portion of the hemisphere. Due to the assumptions stated above, both R_a and L_{ia} are constants. If the hemisphere is perfectly unoccluded, the ambient illumination will be uniform over each object's entire surface, regardless of shape and surroundings, thus appearing flat.

This model can be simplified further by assuming that all surfaces are Lambertian. For a Lambertian surface, both the BRDF and the hemispherical-directional reflectance are constants [1]. This means that f_{Lambert}(l, v) = \rho / \pi and R_a = \pi f(l, v), hence R_a = \rho, where \rho is the surface albedo.

Another way of expressing Equation 2.3 is to integrate over the entire hemisphere \Omega and use a function V that determines which incoming ambient radiance is occluded and hence discarded. This function is called the visibility function and is defined via a ray cast from p in the direction of the incoming light l. If the ray intersects an object, light from this direction is discarded. The canonical form of the ambient occlusion factor A can thus be stated as

A(p) = \frac{1}{\pi} \int_{\Omega} V(p, l)\, (n \cdot l)\, dl, \qquad (2.4)

where V is the visibility function. V outputs one of two values: zero if the light direction is obscured and one otherwise. The factor 1/\pi is a normalisation factor, such that A \in [0, 1], and (n \cdot l) is the Lambertian cosine term, which weighs light directions closer to the surface normal heavier than light coming from directions closer to the tangent plane; see Figure 2.4 for an illustration. Moving forward, it will be assumed that A depends on p, and it will therefore be written without the argument.

Equation 2.4 renders objects such that crevices are darker and flat surfaces lighter. This rendering contains enough visual information and cues to convey the form and shape of an object. Unfortunately, assuming that the incoming radiance from an occluded direction is zero is not physically correct, as this leads to a loss of energy. In reality, there would be some amount of ambient light from occluded directions due to interreflections between surfaces. The loss of energy manifests as surfaces appearing darker than they should. Also, consider the case where a scene consists of some entirely enclosed objects, e.g. a room without windows. Since rays cast from a point on any object inside this scene would be guaranteed to intersect some other geometry, the ambient occlusion would be zero for the entire surface of that object. Ways of addressing these issues are described below.

Figure 2.4 – A point p is sampled by casting rays. Notice that rays closer to the normal are drawn thicker, indicating that they contribute more to the ambient occlusion factor. This point would likely receive an ambient occlusion factor of ≈ 0.5.

To summarise, ambient occlusion can be interpreted as a formulation of the rendering equation, where all light is uniform, all surfaces are Lambertian and only a single bounce of light is allowed, i.e., there are no interreflections between points [4].

Ambient obscurance

Zhukov et al. 1998 [19] proposed a slightly different ambient light model, which they called the obscurance illumination model and which is today widely referred to as ambient obscurance in the literature. The main difference between ambient occlusion and ambient obscurance is the use of a continuous function \rho_O(l) instead of the binary V from Equation 2.4. In this text, the subscript O is used to disambiguate the obscurance visibility function \rho_O from a surface's albedo \rho. The equation

A = \frac{1}{\pi} \int_{\Omega} \rho_O(l)\, (n \cdot l)\, dl, \qquad (2.5)

shows a formulation of ambient obscurance. \rho_O functions similarly to V, in that it uses ray casting, but it takes into account the distance, d, from the point p to an intersection. Thus, the obscurance of faraway occluders is attenuated. It is common to clamp the distance at a value d_max; for any distance beyond d_max, \rho_O is equal to one. Replacing V with \rho_O can result in renderings that are visually more pleasing, since this avoids the over-darkening that V can cause. It can also solve the problem of full occlusion in an enclosed space by tuning d_max. Nevertheless, this new visibility function is ad hoc, in the sense that it does not produce physically correct results.
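As an illustration, a linearly attenuated obscurance function could be written as follows; this particular falloff is a hypothetical example, not the exact function used by Zhukov et al.:

def rho_o(hit_distance, d_max):
    """Hypothetical linear obscurance falloff: full obscurance for an
    occluder at the surface, none at d_max or beyond."""
    if hit_distance is None:         # the ray found no occluder
        return 1.0
    return min(1.0, hit_distance / d_max)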

2.2.2 Ray traced ambient occlusion (RTAO)

The ambient occlusion equation, Equation 2.4, can be approximated using a Monte Carlo estimator as

" N # 1 X 1 A = E V (li)(li · n) (2.6) N π i=1 where N is the number of samples, li is the direction of the i:th ray, sampled randomly over the hemisphere and E is the expected value. Note that this estimator involves sampling N rays uniformly over the hemisphere Ω and summarise by cosine-weighting each contribution. Unfortunately, this means that we may do work that contribute little to the final result; consider a ray that is close to the horizon - the dot product with the surface normal will yield a very small value. To keep computation time down, using fewer but more intelligent samples is possible, this is called importance sampling. One example of importance sampling is Malley’s scheme[1]. In this scheme, the distribution from which ray directions are randomly chosen is cosine- weighted. Hence, it is more probable that a direction close to the surface normal is chosen. Since these directions has the most substantial impact on the final results we can get away with fewer samples but achieve the same quality. For instance, if we use a cosine distribution instead of a uniform distribution when choosing which rays to sample, roughly half the number of samples are needed to achieve a similar image quality [20]. Using a cosine-distribution mean that the probability of choosing a given direction, l, can be expressed as p(l) = (l · n)/π, where π is used to normalise the probabilities. We can divide the estimator in Equation 2.6 by Background | 29

this probability, to normalise the contribution of each sample

" N # " N # 1 X 1 Vd(ωi)(li · n) 1 X A = E = E V (li) (2.7) N π (l · n) 1 N i=1 i π i=1

Now, each sample contributes the same amount (either zero or one, depending on the visibility function), and it is more likely that directions closer to the surface normal are sampled, which is what we wanted to achieve.

The use of Monte Carlo integration introduces noise, even for high sample rates. Therefore, it is important to apply a denoising pass on the ambient occlusion results, preferably with a geometrically aware blur filter, such as a bilateral filter.

Historically, ambient occlusion computed by ray tracing has, for real-time applications, mostly been computed offline due to the computational costs, and then baked into a data structure such as a texture or directly into vertices [1]. This method's disadvantages are the need for preprocessing all assets in a scene and the extra memory the baked data requires. Moreover, it is not suitable for dynamic scenes that contain animated geometry. In this thesis, hardware-accelerated ray tracing is leveraged to compute RTAO in real time.
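The following Python sketch (illustrative, not the thesis implementation) combines Malley's cosine-weighted sampling with the estimator of Equation 2.7; cast_ray is an assumed scene query that returns True if an occluder is hit within t_max.

import numpy as np

def to_world(v, n):
    """Transform v from a local frame where n = (0, 0, 1) into the
    frame of the surface normal n, via an orthonormal basis."""
    up = np.array([0.0, 0.0, 1.0])
    if abs(n[2]) > 0.999:            # n nearly parallel to up
        up = np.array([1.0, 0.0, 0.0])
    t = np.cross(up, n)
    t /= np.linalg.norm(t)
    b = np.cross(n, t)
    return v[0] * t + v[1] * b + v[2] * n

def cosine_sample_hemisphere(rng):
    """Malley's method: sample the unit disk uniformly, then project
    up onto the hemisphere, giving p(l) = (l . n) / pi."""
    r = np.sqrt(rng.uniform())
    phi = 2.0 * np.pi * rng.uniform()
    x, y = r * np.cos(phi), r * np.sin(phi)
    z = np.sqrt(max(0.0, 1.0 - x * x - y * y))
    return np.array([x, y, z])

def estimate_ao(p, n, cast_ray, num_samples, rng):
    """Equation 2.7: with cosine-weighted directions, each sample
    contributes only its visibility (zero if the ray is occluded)."""
    occluded = sum(
        bool(cast_ray(p, to_world(cosine_sample_hemisphere(rng), n)))
        for _ in range(num_samples)
    )
    return 1.0 - occluded / num_samples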

2.2.3 Screen space ambient occlusion (SSAO)

Broadly speaking, screen space ambient occlusion (SSAO) is a family of ambient occlusion methods that use the z-buffer as their input, optionally together with other information such as surface normals, although such information can also be computed from the z-buffer. The terms z-buffer, depth buffer and depth image will be used interchangeably in this text. An attractive feature of SSAO is that the algorithm's time complexity depends on the depth buffer's resolution, i.e. the number of pixels it contains, and not on the geometric complexity of the scene. This makes screen space methods well suited for dynamic, real-time use. It also makes it easy to control the computation time by lowering the depth buffer's resolution and upscaling the results before compositing with the final image. Since ambient occlusion is mostly a low-frequency effect, the results of such a process can be passable [21]. Unfortunately, computing the AO from the depth buffer puts limitations on the accuracy of the results. For instance, information about any occluders must be present in the depth buffer (i.e. in the camera's view). This means that no occlusion can come from objects that are off-screen.

Figure 2.5 – Illustration of raw RTAO (left) with one sample per pixel, and RTAO that has been denoised using a spatio-temporal approach and an edge-aware filter (right). Image from the D3D12 Raytracing Real-Time Denoised Ambient Occlusion sample by Peter Kristof, Microsoft.

A related problem is that points close to the edges of the depth buffer need special treatment. Various SSAO methods have been proposed to mitigate such problems, but these solutions remain ad hoc in the sense that they do not solve the fundamental limitations stemming from using the depth buffer to find occluders. RTAO is not limited in this manner, since it has access to the entire scene geometry when searching for occluders. Nevertheless, the trade-off between real-time performance and visual quality offered by SSAO methods has made them popular in video games. SSAO methods usually constitute a discrete step in the rendering pipeline and generally finish with a denoising step [5]. There exists a plethora of SSAO methods, and it is not the intention of this text to outline or explain all of them. Crytek SSAO (see Section 2.6.1) is described, for reference, as it is attributed as the first SSAO method. The additional SSAO methods that this text describes have direct relevance for this thesis. The interested reader is encouraged to read [1] or [5] for a more exhaustive walkthrough of this family of methods; these also go into more detail on the limitations of various SSAO methods.

2.3 Denoising

Since denoising is an essential part of ray tracing techniques such as RTAO, and is extensively used by SSAO methods, this section provides a brief theoretical background on denoising and outlines some state-of-the-art research in this area.

A traditional approach to denoising an image involves processing it with a blur filter. Image processing methods take an image as input, compute a new value for each pixel via some operation, and output a new image. GPUs can perform various image processing operations efficiently - either via a pixel shader or a compute shader [1]. The shader takes the input image as a texture, processes each texture element (texel, analogous to a pixel), and renders the result to a backbuffer or another texture. The rest of this text assumes that the image processing is done on the GPU, and the terminology used reflects this assumption.

The new value for a texel is calculated by accounting for the neighbouring texels. A filtering kernel determines the exact way this is done. The number of neighbouring texels included is called the kernel diameter. As an aside, the kernel diameter can be adjusted according to the estimated variance in an image. Variance can be estimated from the samples of the Monte Carlo integration, for instance. The estimated variance indicates how much a texel or pixel deviates from the expected value. An area with high variance may require a larger kernel diameter, while an area with low variance can get away with a smaller one [20].

For a noisy image, such as one produced by either RTAO or an SSAO method, we might want to apply a blur filter to reduce the noise. An example of a filter kernel that is commonly used to blur an image is the Gaussian filter, defined as

G(r) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{r^2}{2\sigma^2}}, \qquad (2.8)

where \sigma is the standard deviation and r is the distance to the texel's centre. Increasing \sigma results in a stronger blur. When applying the Gaussian kernel to a texel, its new value is calculated by averaging adjacent texels, weighted by their distance to the centre of the texel. The Gaussian filter is an example of a separable filter, which means that it can be applied in two separate passes. Each pass is applied along a single dimension and yields the same result as a single pass along two dimensions. The main benefit of splitting the filtering into two passes is that this reduces the number of samples per texel from n^2 to 2n, where n is the kernel diameter.
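A small numpy sketch of a separable Gaussian blur (illustrative; a GPU implementation would run the two one-dimensional passes as shader passes):

import numpy as np

def gaussian_kernel(sigma, radius):
    # Discrete 1D Gaussian, normalised so the weights sum to one.
    r = np.arange(-radius, radius + 1)
    k = np.exp(-(r * r) / (2.0 * sigma * sigma))
    return k / k.sum()

def gaussian_blur(image, sigma, radius):
    """Apply the 1D kernel along rows, then along columns:
    2n samples per texel instead of n^2 for a full 2D kernel."""
    k = gaussian_kernel(sigma, radius)
    out = np.apply_along_axis(lambda row: np.convolve(row, k, mode="same"), 1, image)
    out = np.apply_along_axis(lambda col: np.convolve(col, k, mode="same"), 0, out)
    return out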

Unfortunately, Gaussian blur can cause edges to become too soft or be lost. To preserve edges while smoothing an image, a bilateral filter can be used instead. The idea behind bilateral filters is that whether two texels (or, more generally, pixels) are considered close to each other is determined by their spatial proximity together with additional properties, such as their colour, brightness or surface normal. When applying the weighted average to calculate the new value of a texel, the filter can ignore texels that are spatially close but appear unrelated because they differ in the other properties. In this way, the filter can preserve edges while smoothing the rest of the image. This feature makes bilateral filters an attractive choice for denoising. Bilateral filters can also be used to upscale images to higher resolutions, called upsampling [1]. For instance, a common technique is to render SSAO at a lower resolution than the final render target resolution and upscale the results. A thorough introduction to bilateral filters is given in [22].

Another effective approach to reducing noise is to accumulate and incorporate results from previous frames. Generally, using information from previous frames is called reverse reprojection and can be applied to a variety of rendering techniques [23]. Spatio-temporal reprojection is such a technique. It uses a velocity buffer, which encodes the difference in position of each rendered vertex from one frame to another. The velocity buffer is used to reproject data from earlier frames into the current frame, given that this part of the scene has not changed too much. A downside of spatio-temporal reprojection is that it can introduce lag if a scene changes between frames, because outdated information about the scene can linger in the accumulated frame data used for the reprojection. This can also manifest as visible "ghosting" artefacts.

There are filtering techniques specifically developed for Monte Carlo ray tracing in real time, such as the Edge-Avoiding À-Trous Wavelet Transform [24]. It is an edge-aware filter that can be computed faster than bilateral filters. In [11], Schied et al. built on this work and introduced the Spatio-Temporal Variance Guided Filter (SVGF), a reconstruction algorithm that uses spatio-temporal reprojection and spatio-temporal luminance variance estimates to guide an À-Trous filter. The term reconstruction algorithm is used because the input is presumed to be an image rendered with path tracing using a single sample per pixel - such an image contains so much noise that it becomes more sensible to frame the task as a reconstruction problem rather than denoising [7]. The authors show how SVGF can be used to reconstruct an image from an RTAO input that only uses a single sample per pixel. An example of SVGF and RTAO can be seen in Figure 2.5. An improvement to SVGF, called the Adaptive Spatio-Temporal Variance Guided Filter (A-SVGF), reduced the temporal lag and ghosting present in SVGF during fast motion [25]. A-SVGF appears to be the state of the art for this type of reconstruction algorithm at the moment.

Finally, it is worth mentioning that machine learning (ML) based denoising is a research area that has received much attention in the last few years.

ML can be used in various ways for denoising. For instance, ML models can be used as autoencoders or used to drive filtering kernels, and the results can be impressive. Nevertheless, at the moment, ML approaches appear to be slower than methods such as A-SVGF [26]. Some notable ML-based denoisers are OIDN by Intel and OptiX by NVIDIA [7]. NVIDIA also provides Deep Learning Super Sampling (DLSS) for select video games on PC, given that the user has an NVIDIA GPU that supports the feature. As the name implies, DLSS uses a trained neural network to upscale renderings to higher resolutions, allowing the rendering to occur at a lower, computationally cheaper resolution before being upscaled for presentation.

As mentioned above, denoising has become an area of intensive research. One strong motivation for this trend is that real-time ray tracing is dependent on good denoising algorithms, since it can afford so few samples. The interested reader is encouraged to look at [26] for an overview of the current state of the art in denoising, specifically as it relates to real-time ray tracing.

2.4 Animation

Computer animation is a vast subject, and this text only concerns the basic operations that animate 3D meshes. These are outlined here to provide a basic understanding of the differences between rigid-body animation and vertex blending (also called skinning), since these are the types of animations investigated in this thesis.

2.4.1 Rigid-body transformation

A rigid-body transformation is a concatenation of rotation and translation transformations. A rigid-body transform preserves angles and lengths between vertices, as well as the orientation's handedness (i.e. the cross product). A rigid-body transform can be encoded by a matrix,

X = TR = \begin{pmatrix} r_{00} & r_{01} & r_{02} & t_x \\ r_{10} & r_{11} & r_{12} & t_y \\ r_{20} & r_{21} & r_{22} & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad (2.9)

where T is a translation transform matrix and R is a rotation transform matrix [1]. Animating an object using a rigid-body transform consists of applying the transform v_i(t) = X(t)\, v_i(t-1) to each vertex v_i, i \in m, of a mesh m, for each of the N frames of the animation.
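A small numpy sketch (illustrative) that builds X = TR of Equation 2.9 from a rotation about the z-axis and a translation, and applies it to a vertex in homogeneous coordinates:

import numpy as np

def rigid_body_transform(angle, translation):
    """Build the 4x4 matrix X = TR for a rotation about the z-axis
    followed by a translation."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s, 0.0, 0.0],
                  [s,  c, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    T = np.eye(4)
    T[:3, 3] = translation
    return T @ R

X = rigid_body_transform(np.pi / 2.0, [1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0, 1.0])   # a vertex in homogeneous form
print(X @ v)                          # rotated 90 degrees, then translated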

Rigid-body transformations are useful for simpler animations, particularly of non-organic objects. However, rigid-body transforms do not lend themselves to objects that consist of different members, where members connect to other members via joints. In theory, each member can have a rigid-body transform applied independently. Unfortunately, the joint will look unconvincing, as the joint's appearance is defined by the intersection of the members connected through it.

2.4.2 Vertex blending or "skinning"

Vertex blending, also called linear blend skinning (or just "skinning"), solves the issue of how the geometry in a joint should behave when the members attached to the joint move. A vertex-blending animation system consists of a skeleton, made up of bones and joints, and a skin. A bone essentially defines a linear transform for a specific part of a mesh. The skin contains all vertices that can be affected by a bone transform. A single vertex can be subject to several transforms, depending on its proximity to the skeleton's bones. The transformation of a vertex results from all these transforms added together in a weighted fashion. More formally, this can be expressed as

v(t) = \sum_{i=0}^{N-1} w_i B_i(t) M_i^{-1} u, \qquad \text{where } \sum_{i=0}^{N-1} w_i = 1, \; w_i \ge 0, \qquad (2.10)

where v is the transformed vertex position with respect to the time t, u is the original vertex position, B_i(t) is the i-th bone matrix transform at time t and M_i is the transform from the frame of bone B_i to world coordinates [1]. In essence, Equation 2.10 interpolates (or blends) between the different bone-transformed positions of the vertex to produce the final position v at time t.
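A compact numpy sketch (illustrative) of Equation 2.10 for a single vertex; the matrices are assumed to be 4x4 and the vertex homogeneous:

import numpy as np

def skin_vertex(u, weights, bone_transforms, inverse_bind_matrices):
    """Linear blend skinning: v = sum_i w_i * B_i(t) * M_i^-1 * u,
    assuming the weights are non-negative and sum to one."""
    v = np.zeros(4)
    for w, B, M_inv in zip(weights, bone_transforms, inverse_bind_matrices):
        v += w * (B @ (M_inv @ u))
    return v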

2.5 SSIM

2.5.1 Image quality assessment

Image quality assessment metrics can be used to monitor streaming services and to optimise and benchmark image processing algorithms [27]. In the context of this thesis, benchmarking image processing algorithms is what is relevant. The gold standard in quality assessment (QA) is subjective QA [12]. In subjective QA, a group of human observers assess and rate the perceived

quality of an image. These observers can be trained or naive. The results of each observer are averaged into a mean opinion score (MOS) [12]. Unfortunately, subjective QA is expensive and time-consuming. A test will usually require a large group of observers, since the variance of the observers' ratings can be high [12]. Also, the process cannot be automated. In contrast, objective QA relies on an algorithm instead of human observers. Objective QA algorithms aim to compute a value that correlates with the MOS, called a MOS prediction. A good objective QA algorithm has a correlation score > 0.8 with the corresponding MOS [12]. Objective QA methods divide into three categories: full-reference, reduced-reference and no-reference. Full-reference methods assess the image quality of a test image by comparing it to an original reference image. Reduced-reference methods do not need the full original image but instead use features that are representative of it. No-reference methods do not require any information from the original image [12]. Since this thesis compares offline rendered reference images with real-time rendered images, only a full-reference method is considered further in this text.

2.5.2 Structural similarity index metric (SSIM)

In this thesis, a method called the Structural Similarity Index Metric (SSIM) is used to compute the objective QA metric. SSIM was introduced in [27], where the authors developed a new approach to image QA based on the assumption that the human visual system (HVS) is adapted to extract structural information from the field of view [27]. Structural information consists of attributes that represent the structure of the objects in the field of view. These structures can be thought of as dependencies between pixels, particularly pixels that are near each other. The structure of an object is independent of luminance and local contrast [27]. SSIM incorporates a comparison of structural information into the image quality assessment, with the assumption that if a distorted image preserves a reference image's structures, this will correlate with higher perceived image quality. Computing SSIM entails combining three comparisons between the test image signal and the reference image signal: a luminance comparison, a contrast comparison and a structure comparison. Let x and y be non-negative image signals, each taken from a portion (window) of the images, where one of the signals is the reference signal; then

SSIM(x, y) = [l(x, y)]^{\alpha} \cdot [c(x, y)]^{\beta} \cdot [s(x, y)]^{\gamma}, \qquad (2.11)

where l(x, y) is the luminance component, c(x, y) is the contrast component, s(x, y) is the structure component and \alpha, \beta, \gamma > 0 are parameters that weight the contribution of each component [27]. In [27], the authors compute Equation 2.11 for an 11 × 11 window around a given pixel, weighted by a circular-symmetric Gaussian weighting function [27]. For a full image assessment, they move the window over each pixel in the image. They calculate the overall score, the mean SSIM (MSSIM), as

MSSIM(X, Y) = \frac{1}{N} \sum_{i=1}^{N} SSIM(x_i, y_i), \qquad (2.12)

where X and Y are the reference and comparison images respectively, x_i and y_i are the image contents at the i-th local window and N is the number of local windows in the image [27]. A key reason to use SSIM in this project is that it provides a metric that can be applied to a large number of images - each being a rendering of a single frame of a video. The overall image quality can then be approximated using the average SSIM over all the frames, and the stability of the image quality can be approximated by looking at the standard deviation. There exist different variations of SSIM algorithms, and some are used for measuring video quality [12]. This thesis uses the original SSIM metric, as presented in [27].
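As a sketch of this evaluation procedure (the thesis uses Matlab's ssim; the hypothetical Python equivalent below uses scikit-image's structural_similarity):

import numpy as np
from skimage.metrics import structural_similarity

def sequence_ssim(reference_frames, test_frames):
    """Per-frame SSIM for two sequences of grayscale frames, summarised
    as the mean (overall quality) and the standard deviation
    (stability of the quality over the sequence)."""
    scores = [
        structural_similarity(ref, test, data_range=1.0)
        for ref, test in zip(reference_frames, test_frames)
    ]
    return float(np.mean(scores)), float(np.std(scores))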

2.6 Related work

2.6.1 Crytek SSAO

The first SSAO method was developed and introduced by Crytek in 2007 [3]. Its only input is the depth buffer. The method is conceptually simple; it estimates each pixel's ambient occlusion by choosing a finite set of random samples from a sphere around the pixel. It is easy to convert the pixel's position into camera space: the xy-component is the pixel's image coordinates, and the z-component uses the depth buffer's value at the given pixel. The depth value of each sample is compared with the depth buffer's corresponding value at the same xy-position. If the sample has a lower z-value (i.e. is in front of the geometry in the depth buffer), it is not occluded; otherwise, it is occluded. The ratio of occluded samples to the total number of samples determines the final ambient occlusion factor. The ambient occlusion is calculated according to Equation 2.13. If all samples pass, the surface will have an ambient occlusion factor of one, and conversely zero if all samples fail the test. See Figure 2.6 for an illustration of the sampling process.

Figure 2.6 – Illustration of Crytek SSAO. Two points are sampled with six samples each. Note that p1 will be over-occluded because the samples are taken from a sphere.

A = \frac{1}{N} \sum_{i=1}^{N} V(s_i) \qquad (2.13)

While this method is conceptually easy to grasp and can produce visually pleasing results, it has some drawbacks. Its main issue is that each sample’s contribution is not cosine-weighted, which leads to incorrect results. Since all samples are collected from a sphere, a point on a flat surface will appear darker, because the samples below the surface contribute to the final ambient occlusion factor. Also, points near the edges of a surface will appear lighter. See Figure 2.7 for an example.
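A schematic sketch of the Crytek-style test (illustrative; project, mapping a camera-space point to pixel coordinates, and sample_depth, reading the depth buffer, are assumed helpers):

import numpy as np

def crytek_ssao(p, project, sample_depth, num_samples, radius, rng):
    """Estimate the AO factor of Equation 2.13 for a camera-space
    point p by comparing random sphere samples with the depth buffer."""
    visible = 0
    for _ in range(num_samples):
        # Uniform random point inside a sphere of the given radius.
        offset = rng.normal(size=3)
        offset *= radius * rng.uniform() ** (1.0 / 3.0) / np.linalg.norm(offset)
        s = p + offset
        x, y = project(s)                    # assumed helper
        if s[2] < sample_depth(x, y):        # in front of stored depth
            visible += 1
    return visible / num_samples             # 1 = unoccluded, 0 = occluded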

2.6.2 Horizon-based ambient occlusion

Horizon-based ambient occlusion (HBAO), developed by Bavoil et al. [28], belongs to the SSAO family in that it uses the depth buffer as its input. The main idea behind HBAO is to treat the depth buffer as a continuous heightfield. It uses ideas similar to horizon mapping [29]. Bavoil et al. start with the following formulation of ambient occlusion:

Figure 2.7 – Demonstration of Crytek SSAO. Image taken from [3]. Note how flat surfaces are lighter towards the edges, a characteristic of this method and a consequence of the sampling method used.

A = 1 - \frac{1}{2\pi} \int_{\Omega} V^{*}(\bar{\omega})\, W(\bar{\omega})\, d\omega, \qquad (2.14)

where V^{*} is a visibility function that equals one for occluded directions and zero otherwise. Hence, V^{*} is an inversion of V from Equation 2.4. W is a linear attenuation function, similar to Zhukov et al., see Equation 2.5. Notice that Equation 2.14 inverts A; however, due to the definition of V^{*}, A = 1 still means no occlusion and zero means full occlusion, so A is interpreted the same way as before. Furthermore, Equation 2.14 is uniformly weighted instead of cosine-weighted: the ambient light from all incoming directions contributes equally, which is physically inaccurate. By treating the depth buffer as a continuous heightfield, all directions \bar{\omega} below the horizon angle are assumed to be occluded. Then, Equation 2.14 can be stated as

A = 1 - \frac{1}{2\pi} \int_{\theta=-\pi}^{\pi} \int_{\alpha=t(\theta)}^{h(\theta)} W(\bar{\omega}) \cos(\alpha)\, d\alpha\, d\theta, \qquad (2.15)

Figure 2.8 – Demonstration of HBAO. Image taken from [3].

Equation 2.15 uses a spherical coordinate system in which the zenith axis is aligned with the view direction v, \theta is the azimuth angle and \alpha is the elevation angle. Notice that Equation 2.15 interprets A as the ambient occlusion of an unoccluded hemisphere minus the part of the hemisphere below the horizon, which is treated as occluded. The inner integral can be solved analytically; Equation 2.15 becomes

A = 1 - \frac{1}{2\pi} \int_{\theta=-\pi}^{\pi} (\sin h(\theta) - \sin t(\theta))\, W(\theta)\, d\theta, \qquad (2.16)

which is the final form used by Bavoil et al., where h(\theta) is the horizon angle, i.e. the maximum elevation angle \alpha \ge t(\theta) such that all directions with angle \alpha < h(\theta) are occluded, and t(\theta) is the tangent angle, the signed elevation angle of the surface tangent vector [28]. Bavoil et al. define W(\theta) = \max(0, 1 - r(\theta)/R), where r(\theta) is the distance between the point p and the horizon point in direction \omega, and R is the radius of influence. See Figure 2.9 for an illustration of the relevant components. Figure 2.8 shows a rendering done with HBAO.

Equation 2.16 can be evaluated using Monte Carlo integration. For each pixel, N_d directions are sampled uniformly by picking an angle \theta and finding the horizon angle h(\theta) in this direction. This is done in the following manner: the point p is the reconstructed position of the pixel in camera space, where the z-component is obtained from the depth buffer. Bavoil et al. then calculate t(\theta) by intersecting the view vector with the tangent plane defined by p and the surface normal n. They then ray march the heightfield, in image space, starting at p and stepping in the direction given by \theta. N_s steps are taken for each direction, each step generating a sample s_i. Each sample's position is transformed into camera space, and the horizon vector H_i is computed as H_i = s_i - p. Finally, the elevation angle of the sample is computed as \alpha(s_i) = \arctan(-H_i.z / \|H_i.xy\|).

Figure 2.9 – A conceptual overview of the main components used to calculate HBAO. Refer to the text for an explanation.

The horizon angle is then

h(\theta) = \max\left(t(\theta), \max_{i=1,\ldots,N_s} \alpha(s_i)\right).

Samples are only taken within the radius of influence R, which is defined in camera space and projected into screen space. Both the direction angle and the step size are randomised per pixel to avoid banding. This per-pixel randomisation introduces noise instead, and a denoising step becomes necessary; for this purpose, Bavoil et al. use a cross bilateral filter [21]. The attenuation function is applied per sample. The resulting samples are then averaged, and the final ambient occlusion value is calculated for the pixel.
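A schematic sketch of the horizon search for a single direction (illustrative; to_camera_space, which reconstructs a sample's camera-space position from the depth buffer, is an assumed helper):

import numpy as np

def horizon_angle(p, tangent_angle, theta, pixel, to_camera_space,
                  num_steps, step_size):
    """March the heightfield in image-space direction theta and return
    h(theta) = max(t(theta), max_i alpha(s_i))."""
    h = tangent_angle
    direction = np.array([np.cos(theta), np.sin(theta)])
    for i in range(1, num_steps + 1):
        s = to_camera_space(pixel + i * step_size * direction)  # assumed
        H = s - p                                  # horizon vector
        alpha = np.arctan2(-H[2], np.linalg.norm(H[:2]))
        h = max(h, alpha)
    return h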

2.6.3 Ground truth ambient occlusion (GTAO)

Ground truth ambient occlusion (GTAO) represents the current state of the art among SSAO methods. It was presented in [4] in 2016 and builds on the horizon-based approach, but it differs from HBAO significantly. The aim of Jimenez et al. when developing GTAO was to produce results that match ground-truth ray traced ambient occlusion while maintaining real-time performance [4]. To achieve this, the authors needed to reintroduce cosine weighting in GTAO, instead of the uniform weighting used in HBAO. They also use a different coordinate system, defining the angles with respect to the view vector v instead of the

surface tangent. The formulation of A by Jimenez et al. is

A = \frac{1}{\pi} \int_{0}^{\pi} \underbrace{\int_{h_1(\phi)}^{h_2(\phi)} \cos(\theta - \gamma)^{+} |\sin\theta|\, d\theta}_{\Pi}\, d\phi, \qquad (2.17)

where \phi is the azimuth angle, \theta is a polar angle along the view vector v, \gamma is the angle between the normal n and the view vector v, and h_1(\phi) and h_2(\phi) are the horizon angles. Note that the entire slice of the hemisphere is considered, and therefore two horizon angles, taken from opposite directions, are needed. The + operator signifies that the cos function is clamped at zero, i.e. \cos(x)^{+} = \max(0, \cos(x)). There is no attenuation factor in Equation 2.17. As with Equation 2.15, the inner integral, denoted \Pi above and essentially representing an arc segment of the slice, can be solved analytically, although the solution is slightly more involved for Equation 2.17:

\Pi(h_1, h_2, \gamma) = \frac{1}{4}\left(-\cos(2h_1 - \gamma) + \cos\gamma + 2h_1 \sin\gamma\right) + \frac{1}{4}\left(-\cos(2h_2 - \gamma) + \cos\gamma + 2h_2 \sin\gamma\right). \qquad (2.18)

Jimenez et al. point out that Equation 2.18 relies on the normal n lying in the plane defined by the horizon vectors [4]. If it does not, they use a normalised projection of the normal onto the plane instead, as shown by [30]. Finally, the ambient occlusion can be expressed as

A = \frac{1}{\pi} \int_{0}^{\pi} \|n_p\|\, \Pi(h_1(\phi), h_2(\phi), \gamma')\, d\phi, \qquad (2.19)

where n_p is the projection of the normal onto the plane and \gamma' = \arccos\left(\frac{n_p}{\|n_p\|} \cdot v\right). Figure 2.10 shows an overview of the components used when calculating the horizon angles in GTAO. To solve Equation 2.19, the horizon angles h_1 and h_2 need to be found. For each pixel \hat{p} in the depth buffer, the position of the pixel in camera space, p, is computed. In the depth image plane, a ray is cast, \hat{r}(t) = \hat{p} + t\,\hat{d}(\phi), t \in [0, 1], where \hat{d}(\phi) is the direction in the image plane. The azimuth angle \phi is sampled uniformly around the view vector v. Each sample position \hat{p}_s(t) is converted to camera space, p_s(t), and the vector v_s = p_s - p is defined. The angle h_1 can then be defined as

h_1(\phi) = \arccos\left(\max_{i=1,\ldots,n/2} \left(v_s \cdot v / |v_s|\right)\right),

where n is the number of samples per pixel. Similarly, we can find h_2 by

casting a ray in the direction -\hat{d}(\phi).
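Since the arc integral of Equation 2.18 is closed-form, it maps directly to code; a small sketch:

import numpy as np

def arc_integral(h1, h2, gamma):
    """Closed-form inner integral (Equation 2.18) for one slice of the
    hemisphere, given the two horizon angles and the angle gamma."""
    a = -np.cos(2.0 * h1 - gamma) + np.cos(gamma) + 2.0 * h1 * np.sin(gamma)
    b = -np.cos(2.0 * h2 - gamma) + np.cos(gamma) + 2.0 * h2 * np.sin(gamma)
    return 0.25 * (a + b)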

Figure 2.10 – A diagram of the reference frame used for calculating the horizon angles in GTAO, θ1 and θ2. ωi is the view vector. Figure taken from [4]

Reintroducing interreflections

Equation 2.19 only considers a single bounce of light; there are no interreflections. This causes a loss of energy since, in reality, there would be multiple bounces of light between surfaces. To achieve a more physically correct result, Jimenez et al. reintroduce interreflections [4]. Their approach is based on the observation that there is a strong correlation between ambient reflection, surface type and global illumination. Jimenez et al. use a data-driven method to model this correlation. For a given albedo \rho, they calculate the ambient occlusion, AO, using offline Monte Carlo ray tracing with a single bounce of light. Then they increase the number of bounces and recalculate the ambient occlusion, AO'. They then fit a function f such that

AO' = f(AO, \rho).

By doing this for different albedo values, they found that a cubic polynomial could generalise the pattern in the data, and f was therefore defined as

f(A, \rho) = ((aA + b)A + c)A, \qquad (2.20)

where A is the ambient occlusion from Equation 2.19 and \rho is the surface albedo. The coefficients of this cubic polynomial are modelled as linear

functions

a = a0 + a1ρ

b = b0 + b1ρ

c = c0 + c1ρ

Equation 2.20 then represents the final ambient occlusion, where the light from interreflections has been reintroduced.
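As a sketch, the multi-bounce fit is a direct translation of Equation 2.20; the fitted coefficient values are given in [4] and are left as parameters here:

def gtao_multi_bounce(A, albedo, a0, a1, b0, b1, c0, c1):
    """Cubic fit mapping single-bounce AO to multi-bounce AO for a
    given surface albedo (Equation 2.20)."""
    a = a0 + a1 * albedo
    b = b0 + b1 * albedo
    c = c0 + c1 * albedo
    return ((a * A + b) * A + c) * A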

2.6.4 RTAO and SSAO

In 'Real-time raytracing and screen-space ambient occlusion' [9], the author compares the performance characteristics and image accuracy of several SSAO methods (including HBAO) with real-time RTAO implemented using DXR. The methods' performance is investigated by scaling parameters (such as resolution, radius of influence and sample count) and recording how an increase in these parameters impacts each method's computation time per frame. Image accuracy is measured using the SSIM metric (see Section 2.5.2). The author uses the Sponza scene for the tests and employs two setups: a close-up view of a statue and a hall view. The author concludes that RTAO delivers superior accuracy for both the close-up view (91% SSIM) and the hall view (92% SSIM)∗. Overall, the SSAO methods performed poorly on the close-up view - the method that reached the highest accuracy in this test was Unity's SAO (70% SSIM). The SSAO methods collectively fared better in the hall view, the best performing method being HBAO (84% SSIM). RTAO's superior accuracy was expected, since it has access to the entire scene geometry, whereas the SSAO methods primarily rely on the depth buffer for geometric information. RTAO does come with a higher performance cost, as it is significantly more expensive than the SSAO methods. The SSAO method that the author singled out as having a good balance between accuracy and performance was Multi-scale Volumetric AO (MSVO). As an example of the difference in computational cost, at a resolution of 1920x1080, MSVO could be computed in 0.7470 ms while RTAO needed 4.0350 ms. Overall, RTAO appeared to be 4-5 times more expensive than MSVO, and 3-4 times more expensive than HBAO. Note that these are rough estimates (made by the present author

∗ For brevity, only the results of the configuration that produced the highest accuracy are presented here.

and not included in [9]) to illustrate the relative difference in performance between the methods. In any case, for real-time applications, these constitute significant differences in performance. Also, RTAO scales somewhat poorly with increases in resolution, sampling radius and sample count, especially given its high base cost. However, the author notes that an increased sample count has a marginal impact on accuracy beyond four samples per pixel (spp) and that RTAO can produce good accuracy even at one spp.

Furthermore, [9] mentions some common issues with SSAO methods that lead to visual artefacts. The first is banding and noise, which are the result of under-sampling. Many authors of SSAO methods prefer noise over banding, as noise tends to be less visually distracting. Another common problem is that there is no occlusion around the edges of the screen; this absence is due to the lack of information about the scene outside the depth buffer. A problem with a similar root cause is under-occlusion, which causes a surface to appear brighter than it should. Under-occlusion can happen because an occluder disappeared from view in-between frames. Over-occlusion can occur due to discontinuities in the depth buffer. There can also be problems with flickering and blurring.

All tests in [9] were done in scenes without animated objects. Therefore, it will be interesting to see how the performance of RTAO is impacted by the presence of animated geometry, which may require acceleration structure rebuilds (see Section 2.1.4). Also, it will be interesting to see how the image quality of both RTAO and an SSAO method is impacted by geometry that is moving - particularly as objects enter and exit the field of view of the camera. [9] provides context for this thesis by showing how RTAO performs in scenes with no animated objects. The author of [9] used the first generation of NVIDIA ray tracing hardware (i.e. the Turing-class GPUs) for his measurements. In contrast, this thesis uses the next generation. Besides improvements in hardware, one can also expect optimisations in graphics card drivers since 2019, when [9] was carried out. Together, this should imply a significant performance increase. To what extent this makes real-time RTAO feasible remains to be seen.

Chapter 3

Method

3.0.1 Overview of the experimental design

Figure 3.1 – Overview of the evaluation scheme.

The experiments divide into two test categories. The first category will evaluate the ambient occlusion methods’ image quality, addressing Research

Question 1 and Research Goal 1; see Section 1.3 and Section 1.5. The second test category will measure each method's performance, addressing Research Question 2 and Research Goal 2. Figure 3.1 gives a schematic overview of how the evaluation is organised. Unity∗ is used to build the experimental framework and provides the implementations of the real-time ambient occlusion methods.

3.0.2 Image quality tests

Figure 3.2 – A rendering of the UE4 Sun Temple, done in Maya. The camera view presents a deep view of the hallway, which can allow testing of occlusion as objects move in and out of depth and obscure each other. Also note the varying degree of detail and mixture of smoother and sharper shapes.

The image quality tests will consist of three different scenarios, based on two scenes; see Figure 3.1 for an illustration of how the scenarios and scenes relate. The first two scenarios, Scenario IQ.1 and Scenario IQ.2, will use the UE4 Sun Temple [31]. The third scenario, Scenario IQ.3, will use a scene consisting of a close-up of a human face against a flat background. The UE4 Sun Temple consists of classical architecture and sculptures with varying degrees of detail. Hence, the scene has many perpendicular angles and sharp changes in surface geometry, as well as smoother shapes. The surface normals of fairly close points can change dramatically in the busiest areas. This may require a higher sample rate to compute the AO in these areas accurately. See Figure 3.2 for a rendering of the UE4 Sun Temple. Moreover, the camera is situated at eye level and provides a deep view of a hallway. Animated objects will move along the depth axis, obscuring the background geometry and sometimes each other. This view will pose a challenge to GTAO's assumption of a continuous heightfield.

∗ This thesis used Unity version 2020.3.0f1 LTS.

Scenario IQ.1 will consist only of objects that are animated using rigid-body animations, while Scenario IQ.2 will only use models animated by skinning animations. The reason for this is to investigate if there is a difference in image quality and performance between rigid-body animations and skinning animations. The rigid-body animations consist of translations, rotations, and combinations of these on models that the original scene contains. The model used for the hierarchical skinning animations and the animation data come from Renderpeople∗. The animation portrays a walk.

Figure 3.3 – A rendering of the scene used for Scenario IQ.3, consisting of a high-resolution face that employs facial rigging for the animations. Rendered in Maya.

Scenario IQ.3 will be a close-up of a high-resolution model of a human face in front of a flat background†. In contrast to the UE4 Sun Temple, this scene predominantly consists of continuous organic shapes. The surface normals of two nearby points will likely be fairly similar, with gradual transitions across a large surface. See Figure 3.3 for a rendering of this scene. The animations in Scenario IQ.3 use facial rigging to create changes to the facial expressions of the model. This scenario will test how the AO methods behave in a scene with different geometrical characteristics compared to the UE4 Sun Temple, test a more subtle type of animation than the animations used in Scenario IQ.1-2, and investigate the result of the AO methods in a context where visual artefacts would likely be extra apparent and disturbing.

Data gathering and analysis

Figure 3.4 shows an overview of how the image quality tests will be carried out. Scenarios IQ.1-3 will be rendered in Maya using the Arnold renderer.

∗ The ’Nathan’ model, which is rigged and includes a walking animation, https://renderpeople.com/free-3d-people/ † Animatable Digital Double of Louise by Eisko© (www.eisko.com).

Figure 3.4 – Overview of the image quality tests.

These renderings provide the reference images for the visual inspection and are used to compute the SSIM scores. Scenario IQ.1-2 will each be rendered as a sequence of 600 frames, where each frame is saved as a separate image. The rendering resolution is 2560 x 1440 (1440p), and the images will be stored in the .png format. Scenario IQ.3 will be rendered in the same way but consists of 200 frames. In Unity, each frame of Scenario IQ.1-3 will be rendered and stored using the Unity Recorder. The file format and resolution will be the same as for the reference images. For both ambient occlusion methods, RTAO and GTAO, three different quality configurations will be rendered. The quality configurations constitute different settings of the methods' input parameters. Testing various quality configurations will provide insight into the range of possible image quality for both methods.

The data analysis for the image quality will be performed as a combination of visual inspection and calculating the SSIM. This thesis will use Matlab's SSIM implementation∗. The SSIM will be calculated between each pair of corresponding images in the two sequences, and a mean SSIM will then be calculated for the entire sequence. Also, local SSIM maps will be produced. These are images that illustrate where the reference image and the compared image deviate. In this way, it could be possible to see patterns in how the real-time ambient occlusion methods deviate from the reference method.

∗ Matlab version R2021a
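To make this analysis concrete, below is a minimal sketch of an equivalent pipeline in Python, using scikit-image instead of the Matlab implementation used in the thesis (an assumption made purely for illustration). The directory names and the grayscale conversion are likewise hypothetical.

```python
import glob
import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity

# Hypothetical locations of the reference (Arnold) and real-time (Unity) frames.
ref_frames = sorted(glob.glob("reference/*.png"))
rt_frames = sorted(glob.glob("realtime/*.png"))

scores = []
for i, (ref_path, rt_path) in enumerate(zip(ref_frames, rt_frames)):
    ref = np.asarray(Image.open(ref_path).convert("L"))
    rt = np.asarray(Image.open(rt_path).convert("L"))
    # full=True also returns the local SSIM map used for visual inspection.
    score, ssim_map = structural_similarity(ref, rt, full=True)
    scores.append(score)
    # Save the local SSIM map; darker pixels mark stronger deviations.
    out = (np.clip(ssim_map, 0.0, 1.0) * 255).astype(np.uint8)
    Image.fromarray(out).save(f"ssim_map_{i:04d}.png")

print("mean SSIM:", np.mean(scores), "std:", np.std(scores))
```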

3.0.3 Performance tests

The performance tests aim to establish how each method's computation time is affected by scaling input parameters. Section 3.1 describes each input parameter that will be investigated. The performance is measured as the computation time for the ambient occlusion method. The computation time for rendering an entire frame will not be considered; the tests are not designed to cover the entire rendering pipeline. For instance, no textures will be used. In other words, the test cases are not representative of a real use case for an entire rendering pipeline. However, knowing the computation time spent on the actual ambient occlusion makes it possible to infer how it could perform in the context of a real application.

The performance tests will consist of three different scenarios: Scenario P.1, Scenario P.2 and Scenario P.3, all based on the UE4 Sun Temple scene. Scenario P.1 will not contain any animated objects. Scenario P.2 will consist of objects animated using rigid-body animations and, finally, Scenario P.3 will consist of objects animated using skinning animations. Scenario P.2 will use a highly detailed statue, included in the original scene, as the animated object. Scenario P.3 will use the same model and animation as Scenario IQ.2.

Scenario P.1 will be used to establish a baseline performance for both RTAO and GTAO. These tests will also be used to determine how varying each method's input parameters impacts performance. Each parameter will be tested in isolation by changing its value over a series of tests while the other parameters remain fixed. This makes it easy to determine which parameters have the most impact on performance. The tests on Scenario P.2-3 will scale the number of on-screen objects from one to 640 while all other input parameters remain fixed. The animated objects will be instances of the same model, using the same animation, but spawned at different locations in space. Thus, measuring how an increase in geometric complexity affects performance becomes possible. Also, distinguishing between rigid-body animations and skinning animations will make it possible to measure if there is a difference in performance between the two, especially considering that they may require different treatment of the ray tracing acceleration structures.

Performance test data will be gathered using NVIDIA's Nsight Graphics∗ GPU trace functionality. Unity applications' rendering code is instrumented to make it easy to see how much time the GPU spends on each step in the rendering pipeline, using a profiler such as Nsight. This makes it possible to determine the computation time of the ambient occlusion methods in isolation.

∗ Nsight version 2021.1.1

3.1 Execution

Parameter                   Description
Intensity                   Controls the darkness of the occlusion; a higher value leads to darker results.
Radius                      The sample distance around a point (in screen space). This value is clamped by 'Maximum radius in pixels'.
Maximum radius in pixels    The maximum distance (in pixels) of the occlusion testing.
Step count                  The number of steps used to test for occluders, i.e. how many samples are used per point.

Table 3.1 – GTAO parameter descriptions.
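To give a feel for how the parameters in Table 3.1 interact, the toy sketch below marches a one-dimensional heightfield in the spirit of horizon-based screen-space AO: the step count controls how many heightfield samples are taken within the maximum pixel radius, and the highest blocker found determines the horizon angle. This is a deliberately simplified illustration under invented assumptions, not Unity's GTAO implementation.

```python
import numpy as np

def horizon_ao_1d(height, i, radius_px, step_count):
    """Toy horizon-based AO for pixel i of a 1-D heightfield."""
    ao = 0.0
    for direction in (-1, 1):  # march left and right of the pixel
        max_slope = 0.0
        for s in range(1, step_count + 1):
            # 'Step count' samples, spread out to 'Maximum radius in pixels'.
            j = i + direction * max(1, (s * radius_px) // step_count)
            if 0 <= j < len(height):
                slope = max(0.0, height[j] - height[i]) / abs(j - i)
                max_slope = max(max_slope, slope)
        # The highest blocker slope defines the horizon angle on this side.
        ao += np.arctan(max_slope) / (np.pi / 2.0)
    return ao / 2.0  # 0 = fully open, 1 = fully occluded

height = np.zeros(64)
height[40:50] = 5.0  # a wall to the right of pixel 32
for steps in (2, 4, 6):  # the 'Step count' values of Table 3.3
    print(steps, round(horizon_ao_1d(height, 32, radius_px=24, step_count=steps), 3))
```

With more steps, the march is more likely to find the nearest, steepest blocker, so the occlusion estimate tightens; this mirrors how a higher 'Step count' reduces under-sampling artefacts.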

Parameter         Description
Intensity         Controls the darkness of the occlusion; a higher value leads to darker results.
Max ray length    The maximum length of the ray casts that test for occluders.
Sample count      The number of rays used to sample each pixel.
Denoiser radius   Controls how much noise is reduced; a higher value means more noise reduction but tends to produce more visible artefacts.

Table 3.2 – RTAO parameter descriptions.
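The RTAO parameters map directly onto the classic Monte Carlo ambient occlusion estimator: cast 'Sample count' cosine-weighted rays from each shaded point and count the fraction that escape within 'Max ray length'. The sketch below illustrates this estimator against a single analytic sphere; the scene and function names are invented for the example, and this is not the Unity HDRP implementation.

```python
import numpy as np

def cosine_sample_hemisphere(n, rng):
    # Sample a direction with probability proportional to cos(theta) around n.
    u1, u2 = rng.random(), rng.random()
    r, phi = np.sqrt(u1), 2.0 * np.pi * u2
    local = np.array([r * np.cos(phi), r * np.sin(phi), np.sqrt(1.0 - u1)])
    # Build an orthonormal basis (t, b, n) and rotate the sample into it.
    a = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    t = np.cross(a, n); t /= np.linalg.norm(t)
    b = np.cross(n, t)
    return local[0] * t + local[1] * b + local[2] * n

def ray_hits_sphere(o, d, center, radius, t_max):
    # True if the ray o + t*d (d unit length) hits the sphere within (eps, t_max).
    oc = o - center
    b = np.dot(oc, d)
    disc = b * b - (np.dot(oc, oc) - radius * radius)
    if disc < 0.0:
        return False
    t = -b - np.sqrt(disc)
    return 1e-4 < t < t_max

def ambient_occlusion(p, n, spheres, max_ray_length, sample_count, rng):
    # Ambient accessibility: fraction of rays NOT blocked within max_ray_length.
    unoccluded = 0
    for _ in range(sample_count):
        d = cosine_sample_hemisphere(n, rng)
        if not any(ray_hits_sphere(p, d, c, r, max_ray_length) for c, r in spheres):
            unoccluded += 1
    return unoccluded / sample_count

rng = np.random.default_rng(1)
p, n = np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])
spheres = [(np.array([0.0, 0.0, 1.5]), 1.0)]  # an occluder hovering above p
for spp in (2, 4, 8):  # the 'Sample count' values of Table 3.4
    print(spp, ambient_occlusion(p, n, spheres, 1.0, spp, rng))
```

Low sample counts give noisy per-pixel estimates, which is why the method is paired with a denoiser; the 'Denoiser radius' then trades residual noise against blurring.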

All measurements were carried out on a desktop PC with an NVIDIA RTX 3070 GPU, an AMD Ryzen 7 2700X 8-core CPU and 32 GB of DDR4 memory.

Parameter                   Low Quality   Medium Quality   High Quality
Intensity                   1             1                1
Radius                      4             4                4
Maximum radius in pixels    24            32               36
Step count                  2             4                6

Table 3.3 – GTAO quality configurations, used in Scenario IQ.1-3 tests.

Parameter            Low Quality           Medium Quality        High Quality
Intensity            1                     1                     1
Maximum ray length   1                     1                     1
Sample count         2                     4                     8
Denoiser radius      0.057, 0.049, 0.001   0.034, 0.049, 0.002   0.034, 0.057, 0.002

Table 3.4 – RTAO quality configurations, used in the Scenario IQ.1-3 tests. For 'Denoiser radius', the three comma-separated values correspond to Scenario IQ.1, IQ.2 and IQ.3, respectively.

3.1.1 Image quality tests

The scenes were arranged in Maya∗ and then imported into Unity - this made it easier to ensure consistency between the renderings. The rendering of the scenes was carried out as described in Section 3.0.2. In Maya, all geometry in the scenes had the Arnold ambient occlusion material (called "aiAmbientOcclusion") applied; see Table 3.5 for the configuration used.

Parameter   Value
Samples     10
Spread      1.0
Falloff     0.0
Near Clip   0.0
Far Clip    100.0

Table 3.5 – Arnold ambient occlusion configurations, used for the reference renderings in Scenario IQ.1-3.

Since Unity and Maya are two very different software packages with different renderers, it is important to ensure that meaningful comparisons can be made between their respective output. This is particularly important since the Maya renderings are used as the reference images for the SSIM calculations. To this end, some base renderings were made in Maya and Unity, consisting of the Sun Temple scene rendered with flat colours. By comparing these renderings, any "baseline" deviations can be identified. These renderings can be seen in Figure 3.5. When blending the images using a "difference" mode, it became apparent that the renderings are almost identical but differ slightly around the contours of the objects. The reason for this deviation is unknown to the author. Furthermore, the SSIM for these baseline renderings was calculated, resulting in a value of 0.9999, reflecting the small difference between the baseline renderings. This implies that, for the image evaluation using SSIM, we cannot expect an SSIM score of one even if the AO calculations were identical between the two software packages.

The image quality tests on Scenario IQ.1-3 used three different quality configurations for both AO methods. The specific configurations can be seen in Tables 3.3 and 3.4. For descriptions of each parameter, refer to Table 3.1 for GTAO and Table 3.2 for RTAO. More detailed descriptions can be found in Unity 3D's manual∗. Using three quality configurations per AO method made it possible to test the range of image quality improvements within a method while keeping the number of tests reasonable. The three quality levels also mirror what video games typically offer as configuration options - a likely context for the use of real-time AO.

The parameters 'Intensity' (both GTAO and RTAO) and 'Denoiser radius' (RTAO only) were set manually, according to what achieved the closest results to the reference images. The 'Maximum ray length' (RTAO only) was kept constant at one for similar reasons: increasing this parameter tended to produce over-occluded surfaces. Admittedly, setting these parameters by hand involved a degree of subjectivity. However, it is not uncommon that AO must be manually tweaked by an artist to achieve the most pleasing results. These parameters were set before rendering the sequence of frames and subsequently computing the SSIM.

Note that an additional parameter is used for the GTAO tests, not mentioned in Table 3.3. This parameter controls whether GTAO is rendered at full resolution, i.e. the same resolution as the final render target. If not, GTAO is rendered at half the resolution of the render target. The tests that use full-resolution GTAO will be clearly marked in the Results section. Unless otherwise stated, assume that GTAO is rendered at half the resolution of the render target.

∗ Maya 2020.

∗ Unity 3D Ambient Occlusion documentation (Accessed May 2021)
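The "difference" blend used for the baseline check above is easy to reproduce; below is a minimal sketch, assuming the two baseline renderings are available as PNG files (the file names are placeholders).

```python
import numpy as np
from PIL import Image

# Load both baseline renderings as grayscale; int16 avoids uint8 wrap-around.
maya = np.asarray(Image.open("baseline_maya.png").convert("L"), dtype=np.int16)
unity = np.asarray(Image.open("baseline_unity.png").convert("L"), dtype=np.int16)

diff = np.abs(maya - unity).astype(np.uint8)  # 0 wherever the renderings agree
print("mean abs difference:", diff.mean(), "max:", diff.max())
Image.fromarray(diff).save("baseline_difference.png")  # mostly black, cf. Figure 3.5c
```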

3.1.2 Performance tests

Data gathering

Figure 3.7 shows an overview of how the performance tests were carried out. A Unity application was built for Scenario P.1-3. On startup, this application reads a configuration file that controls the input parameters for the experiment. For instance, the rendering resolution can be set in the configuration file. Using a configuration file made it easier to perform successive tests without having to rebuild the application. In turn, this simplified the automation of testing. Data gathering was performed by running a script that invoked Nsight. At specific time intervals, Nsight started a GPU trace, sampling 15 consecutive frames (the maximum possible value). For each test conducted on Scenario P.1, the GPU trace was repeated three times, totalling 3 × 15 = 45 samples per test. A test here refers to a testing instance where a specific input parameter is evaluated in isolation. An example of a test could be to render RTAO at 360p resolution, gathering 45 samples in total. The next test would then increase the resolution to 720p and repeat the procedure while keeping all other parameters constant. For Scenario P.2-3, the GPU tracing was done on four occasions, with an interval of one second between each occasion, beginning one second after startup. Therefore, these tests had 4 × 15 = 60 samples each. Spreading out the tracing over different time points was done to account for the animated objects moving around in the scene.
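As a rough sketch of this automation, the script below writes a configuration file, starts the application, and triggers a GPU trace at fixed intervals. The executable name, configuration keys and profiler command line are all hypothetical placeholders; the actual Nsight invocation is not reproduced here.

```python
import json
import subprocess
import time

# Hypothetical experiment configuration, read by the Unity build on startup.
config = {"resolution": [1920, 1080], "ao_method": "RTAO", "sample_count": 4}
with open("experiment_config.json", "w") as f:
    json.dump(config, f)

app = subprocess.Popen(["./PerformanceTest"])      # placeholder application name
PROFILER_CMD = ["gpu-profiler", "--trace-frames"]  # placeholder, not real Nsight flags

time.sleep(1.0)  # begin one second after startup, as in Scenario P.2-3
for _ in range(4):  # four tracing occasions, 15 frames each = 60 samples
    subprocess.run(PROFILER_CMD, check=True)
    time.sleep(1.0)
app.terminate()
```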

(a) Baseline Maya rendering.

(b) Baseline Unity rendering.

(c) Difference blending applied. See Figure 3.6 for a close-up.

Figure 3.5 – Baseline renderings. Some models in the Sun Temple scene have been rendered with flat colours. This is used to establish a baseline between Maya and Unity before comparing the results of applying AO. Sub-figure 3.5c appears to be entirely black. There are slight differences, however, as can be seen in Figure 3.6.

Figure 3.6 – Difference blending applied on baseline renderings (close-up of Figure 3.5c). Although it may be hard to make out, a silhouette is visible in this close-up.

Figure 3.7 – Overview of the performance test data gathering.

Chapter 4

Evaluation

4.1 Results and analysis

This chapter presents the results in the following manner. First come the results of the image quality tests, see Section 4.2. The image quality tests consist of Scenario IQ.1-3. Scenario IQ.1 and IQ.2 are based on the UE4 Sun Temple. Scenario IQ.1 contains objects animated with rigid-body animations only, and Scenario IQ.2 contains objects with skinning animations. Scenario IQ.3 contains a close-up of a face, animated using facial rigging. The results of these tests are subject to two kinds of evaluation, by SSIM and by visual inspection. The performance test results, see Section 4.3, follow the image quality test results. These consist of Scenario P.1-3, all based on the UE4 Sun Temple scene. Scenario P.1 does not contain any animated geometry. This scenario is used to measure how changing the ambient occlusion methods' parameters impacts performance. The two remaining scenarios, Scenario P.2 and P.3, contain objects animated by rigid-body and skinning animations respectively. Finally, the results are discussed in Section 4.4.

4.2 Image quality tests

4.2.1 Scenario IQ.1 SSIM scores

Figure 4.1 – Scenario IQ.1 comparisons of renderings in Maya and real-time renderings with GTAO and RTAO. The quality configurations with the most similar SSIM scores were chosen as examples. The local SSIM maps have been coloured red to distinguish them. A darker shade indicates a stronger deviation from the reference image.

Scenario IQ.1 contained objects, animated with rigid-body animations only, in the UE4 Sun Temple scene. Figure 4.1 shows renderings of three frames along with local SSIM maps. Table 4.1 shows the mean SSIM (over 600 frames) for Scenario IQ.1. These values are also visualised in Figures 4.3a and 4.3b. For GTAO, the SSIM values appear more or less constant across the different quality configurations. However, there is a marked increase in SSIM when rendering GTAO at full resolution. The scene contains some finer details that may benefit from a higher-resolution depth buffer as input.

Figure 4.2 – Example of some visual artefacts occurring in (roughly) the first ten frames of GTAO. This is a detail from one of the first frames of Scenario IQ.2. The brightness of the image has been altered to make the artefacts more visible.

Furthermore, the GTAO SSIM appears to be relatively stable over the 600 frames for all quality configurations, as indicated by the standard deviation values (σ). Figure 4.3a shows some sharp drops in SSIM, particularly around frame 200 and frame 350. At around frame 200, an object approaches the camera and subsequently leaves the camera's field of view, see Figure 4.1. It later comes back into view at around frame 350. Having a moving object take up a large portion of the camera view appears to make some issues apparent enough to noticeably reduce the SSIM score. These issues will be discussed further in Section 4.2.4. For RTAO there is a distinct difference in SSIM between the different quality configurations. The most dramatic increase in SSIM appears between 'Low Quality' and 'Medium Quality'.

Measure          Low Quality       Medium Quality    High Quality
GTAO Mean SSIM   0.9087 (0.9110)   0.9088 (0.9115)   0.9089 (0.9112)
GTAO σ, SSIM     0.0011 (0.0010)   0.0009 (0.0010)   0.0010 (0.0011)
RTAO Mean SSIM   0.9351            0.9653            0.9769
RTAO σ, SSIM     0.0027            0.0010            0.0006

Table 4.1 – Scenario IQ.1, the mean SSIM over 600 frames. RTAO has the highest SSIM and shows significant improvement with the higher quality configurations. For GTAO, using full resolution seems to be the most significant factor for improving the SSIM. Values in parentheses use full-resolution GTAO. See Figure 4.3 for a companion graph.

A reasonable explanation is that the increase in quality comes from the reduction of noise and from smaller details being captured better. The RTAO SSIM appears stable over the 600 frames, and a higher quality setting seems to increase stability, indicated by the smaller standard deviation. Likely, this is due to the reduction in noise as the sample rate increases. Interestingly, there appears to be a sharp decrease in SSIM at roughly the same frames as for GTAO. As shall be discussed later (see Section 4.2.4), an issue with RTAO is that noise tends to be visible around the contours of moving objects. The object that leaves and returns into the camera's field of view takes up a fair amount of the screen; similarly, the noise around the object takes up a larger portion of the screen. The decrease in quality at these frames appears to be less dramatic for 'Medium Quality' and 'High Quality' compared to 'Low Quality', which is consistent with the lower amount of noise in the higher quality configurations. All quality configurations of RTAO achieve a higher SSIM score than GTAO. This is not surprising, given that RTAO uses the same underlying ambient occlusion method as the offline renderings.

[Graph: Scenario IQ.1, GTAO. SSIM (y-axis, 0.8900-1.0000) over frames 1-600 (x-axis); series: Low, Medium, High, Low (Full-res), Medium (Full-res), High (Full-res).]

(a) The plotted SSIM values follow a similar shape and largely overlap in two clusters, showing the distinction between GTAO rendered at full resolution and GTAO rendered at half resolution. Otherwise, the graph emphasises the similarity between the output of the different quality configurations. For a zoomed-in graph, see Figure A.1 in the Appendix.

[Graph: Scenario IQ.1, RTAO. SSIM (y-axis, 0.8900-1.0000) over frames 1-600 (x-axis); series: Low, Medium, High.]

(b) The RTAO quality configurations show clear differences in mean SSIM score but have similarly shaped curves. Note that the two sharp reductions in SSIM appear to be larger in 'Low Quality' and become less significant for the other configurations. In part, this is likely due to less noise as the number of samples increases.

Figure 4.3 – Scenario IQ.1, SSIM over 600 frames.

4.2.2 Scenario IQ.2 SSIM scores

Figure 4.4 – Scenario IQ.2 comparisons of renderings in Maya and real-time renderings with GTAO and RTAO. The quality configurations with the most similar SSIM scores were chosen as examples. The local SSIM maps have been coloured red to distinguish them. A darker shade indicates a stronger deviation from the reference image.

Three frames from Scenario IQ.2, rendered using the different AO methods, are shown in Figure 4.4. The SSIM values are shown in Table 4.2 and Figure 4.5. As in Scenario IQ.1, the GTAO SSIM values are more or less constant across the GTAO quality configurations, but they are lower than in IQ.1. The lower SSIM score likely comes from the fact that the model used for animations in Scenario IQ.2 contains small details - like the face of the model or the folds in the cloth. These details are lacking in the GTAO renderings. The SSIM in IQ.2 also appears less stable, as the standard deviations for the GTAO SSIM values of IQ.2 are higher than for IQ.1. For the half-resolution GTAO, the standard deviations are roughly twice as large for IQ.2 as for IQ.1. The first frames of the GTAO renderings contain visual artefacts, mainly noise, that lower the SSIM score. This could be due to the spatio-temporal reprojection, which can produce artefacts before it converges; see Figure 4.2 for an example. Subsequent frames are free from these particular artefacts, and the SSIM scores stabilise well above 0.8900 for all quality configurations.

The RTAO results for Scenario IQ.2 are slightly lower than the results for IQ.1. The biggest difference in SSIM score is for 'Low Quality'. A plausible explanation is that, as mentioned above, IQ.2 uses a model with finer details than IQ.1. A low sample rate (as is the case for 'Low Quality') could mean that these areas will not be rendered as sharply and will suffer from noise. Overall, the RTAO SSIM values appear slightly more stable in IQ.2 than in IQ.1, likely attributable to the two sharp decreases in SSIM in IQ.1 not being present in IQ.2. Similar to Scenario IQ.1, RTAO has the highest SSIM score across all the quality configurations.

Measure          Low Quality       Medium Quality    High Quality
GTAO Mean SSIM   0.8990 (0.9063)   0.8999 (0.9063)   0.9000 (0.9058)
GTAO σ, SSIM     0.0032 (0.0041)   0.0033 (0.0051)   0.0034 (0.0056)
RTAO Mean SSIM   0.9301            0.9630            0.9744
RTAO σ, SSIM     0.0026            0.0010            0.0009

Table 4.2 – Scenario IQ.2, the mean SSIM over 600 frames. Values in parentheses use full-resolution GTAO. Both GTAO and RTAO get a lower SSIM compared to Scenario IQ.1. IQ.2 has a model with more details than IQ.1, which might present a challenge for both AO methods. See Figure 4.5 for a companion graph.

(a) As with Scenario IQ.1, the most significant difference between the quality configurations is whether they are rendered in full resolution or not. For a zoomed-in graph, see Figure A.2 in the Appendix.

[Graph: Scenario IQ.2, RTAO. SSIM (y-axis, 0.8900-1.0000) over frames 1-600 (x-axis); series: Low, Medium, High.]

(b) The improvement in SSIM across the RTAO quality configurations appears to follow the same pattern as for Scenario IQ.1. Also, the curves are flatter than GTAO's, indicating that the RTAO image quality is more consistent than GTAO's over the entire sequence.

Figure 4.5 – Scenario IQ.2, SSIM over 600 frames.

4.2.3 Scenario IQ.3 SSIM scores

Figure 4.6 – Scenario IQ.3 comparisons of renderings in Maya and real-time renderings with GTAO and RTAO. The quality configurations with the most similar SSIM scores were chosen as examples. The local SSIM maps have been coloured red to distinguish them. A darker shade indicates a stronger deviation from the reference image.

Scenario IQ.3 is the close-up shot of a human face that changes facial expressions over the course of 200 frames. Figure 4.6 shows renderings of three frames from Scenario IQ.3. Table 4.3 and Figure 4.7 show the SSIM scores for Scenario IQ.3. Immediately apparent is the high SSIM score achieved by all quality configurations, especially by GTAO. All quality configurations and resolutions of GTAO yield a similar SSIM score - the clear distinction between full resolution and half resolution is gone. In addition, the SSIM scores are more stable than for IQ.1 and IQ.2. The characteristics of Scenario IQ.3 seem to suit GTAO, partly because there are not as many areas with small details as in Scenario IQ.1-2, and also because treating the depth buffer of this scene as a continuous heightfield maps better to the actual scene geometry, as opposed to the UE4 Sun Temple used in Scenario IQ.1-2. The forms in the model in Scenario IQ.3 are, for the most part, continuous, organic and smooth. On the other hand, Scenario IQ.1-2 presents a deep view of a hallway with sharp angles and many objects that overlap without being connected - information that is lost in the depth buffer.

Measure          Low Quality       Medium Quality    High Quality
GTAO Mean SSIM   0.9846 (0.9847)   0.9847 (0.9846)   0.9848 (0.9845)
GTAO σ, SSIM     0.0005 (0.0006)   0.0004 (0.0005)   0.0004 (0.0004)
RTAO Mean SSIM   0.9620            0.9809            0.9883
RTAO σ, SSIM     0.0010            0.0005            0.0004

Table 4.3 – Scenario IQ.3, the mean SSIM over 200 frames. Values in parentheses use full-resolution GTAO. Both GTAO and RTAO have very high SSIM. See Figure 4.7 for a companion graph.

As with Scenario IQ.1 and IQ.2, the SSIM for RTAO in Scenario IQ.3 differs between the quality configurations. Overall, the RTAO SSIM scores are significantly higher for Scenario IQ.3 than for IQ.1-2. Additionally, the stability of the RTAO SSIM scores is better for Scenario IQ.3 than for IQ.1-2. The local SSIM maps from Scenario IQ.1-3 indicate that the RTAO renderings deviate from the reference images mainly in areas with finer details and by containing noise. The noise appears in the areas with finer details, but also along the edges of moving objects (more on this later). Scenario IQ.1-2 have many areas with finer details, while Scenario IQ.3 largely consists of larger, smoother surfaces with only a handful of areas with more details and intricate forms - i.e. the face. Also, the head itself is not moving, so there is no additional noise produced along its contours. The different scene characteristics likely account for the difference in RTAO SSIM score between Scenario IQ.1-2 and IQ.3.

It is notable that only 'High Quality' RTAO achieved a higher SSIM score than the GTAO quality configurations. In this scenario, it appears that the noise produced by 'Low Quality' RTAO in the background and on the neck lowers its SSIM score; GTAO, overall, suffers less from noise than RTAO. In any case, when comparing the renderings it is quite obvious that the RTAO renderings are more similar to the reference renderings than the GTAO renderings, which indicates that the SSIM score cannot always be taken at face value.

[Graph: Scenario IQ.3, GTAO. SSIM (y-axis, 0.8900-1.0000) over frames 1-200 (x-axis); series: High Quality (Full res), Medium Quality (Full res), Low Quality (Full res), High Quality, Medium Quality, Low Quality.]

(a) In this scenario, there was apparently no benefit in using a higher-resolution depth buffer; the clear distinction between full-resolution and half-resolution GTAO is gone. Compared with IQ.1-2, IQ.3 appears to suit GTAO better, and the errors produced by GTAO in IQ.3 are of smaller scope. For a zoomed-in graph, see Figure A.3 in the Appendix.

[Graph: Scenario IQ.3, RTAO. SSIM (y-axis, 0.8900-1.0000) over frames 1-200 (x-axis); series: Low, Medium, High.]

(b) Again, RTAO shows distinct SSIM scores for each quality configuration. The shape of the curves appears even more stable for IQ.3 than for IQ.1 and arguably IQ.2. A possible explanation is that there is no object moving across the screen in IQ.3 - something that caused issues in IQ.1 and IQ.2.

Figure 4.7 – Scenario IQ.3, SSIM over 200 frames.

4.2.4 Visual inspection

Inspecting the local SSIM maps

The visual inspection starts with the local SSIM maps that were generated when the SSIM scores were calculated. The local SSIM maps show where the real-time rendered images differ from the reference images. The maps can be seen in Figures 4.1, 4.4 and 4.6, where they have been coloured red to distinguish them. By composing the local SSIM maps into a video file, it became easier to determine issues in the renderings that appear over time as objects animate and move. It also made it easier to determine issues that are consistent across quality configurations and to see whether there is an overall trend between the quality configurations. Figures 4.8 and 4.9 show how the local SSIM maps differ between the quality configurations, and some of the trends mentioned below are noticeable in these progressions.

For GTAO, some consistent issues could be determined by looking at the local SSIM maps for Scenario IQ.1-3. The first is mostly apparent in IQ.1-2, where GTAO has issues with sharp edges and corners. This issue persists across all quality configurations but is somewhat alleviated when GTAO is rendered at full resolution. Noise is also an issue - mainly in occluded areas that receive occlusion from moving objects. There are also issues with over-occlusion in areas that surround objects, particularly moving ones. This can make the occlusion appear as a "dark halo" or a cast shadow - this will be discussed further below. Finally, GTAO is not always able to capture areas with finer detail. Examples of this are the folds on the winged statue in Scenario IQ.1 and IQ.2 and the animated model in IQ.2. As can be seen in Figure 4.8, there are no dramatic differences in the local SSIM maps between the different quality configurations of GTAO, which reflects that the mean SSIM scores are very similar. The issues mentioned above are present in the local SSIM maps for all configurations, but the higher-resolution GTAO noticeably retains more details and sharper edges.

For RTAO, the main visual issues indicated by the local SSIM maps appear to be caused by noise. The noise levels are markedly reduced between the quality configurations, particularly from 'Low Quality' to 'Medium Quality', which is apparent in Figure 4.9. Noise is also prevalent around the contours of moving objects, and moving objects can be followed by "ghosting" artefacts. These issues will be discussed further below. Otherwise, the local SSIM maps indicate that RTAO deviates in areas with small details.

Figure 4.8 – The SSIM values are very similar for GTAO. Consequently, it is hard to see any distinctive differences between the local SSIM maps of the quality configurations. One thing that can be seen is the difference in sampling radius, which increases with the quality configuration. See for instance the dark band that surrounds the contour to the right of the head in IQ.3.

The reason is that the real-time renderings do not use as many samples and therefore cannot capture these areas as accurately as the offline renderings. Also, there are some aliasing issues with sharp edges and contours.

Figure 4.9 – The local SSIM maps show a consistent progression from 'Low Quality' to 'High Quality': noise is reduced, and the issue with noise surrounding the contours of moving objects is less prevalent, see the examples for IQ.2. Some areas remain a problem for all configurations, an example being the mouth and eyelids in IQ.3.

GTAO

GTAO seems to have problems with rendering sharp edges - they appear too soft, see Figure 4.10. Also, finer details are usually not captured as accurately as with RTAO, even when GTAO uses full resolution and a high sample count. This is apparent in all scenarios - even in IQ.3, where all GTAO configurations get a high SSIM score. In IQ.3, the loss of subtle details means that some shapes in the face do not appear to be animated, leading to a considerably less natural-looking and pleasing result than the reference renderings and RTAO.

Figure 4.10 – Illustration of three different issues with GTAO. Edges are rendered too soft (green), no occlusion from objects that are off-camera (red) and over-occlusion around an object, appearing as a ’dark halo’ effect (blue). Close-up from Scenario IQ.1, left image rendered in Maya. Right image, GTAO ’High Quality’, full resolution.

Because GTAO uses the horizon-based sampling technique on the depth buffer, some issues appear when objects obscure each other (from the camera's viewpoint). What can occur is the loss of occlusion information as an object closer to the camera obscures another object further back; Figure 4.11 shows an illustration of this. When calculating the occlusion for the background area, the information about the obscured object is lost, since the depth buffer now instead contains information about the object in the foreground. The sampling algorithm likely determines that the distance to the object in the foreground is too large and that it therefore should not contribute any occlusion. While this may be accurate, the occlusion that should be coming from the obscured object is lost. Hence the area appears too bright - often as a white 'halo' around the foreground object. This artefact can be vivid, especially around objects that move across a scene.

Figure 4.11 – Illustration of how occlusion information is lost for GTAO when an object is obscured. Also, note the noise along the contours of the foreground object. Close-up from Scenario IQ.1, GTAO 'High Quality', full resolution.

The issue with lost occlusion information can occur as animated objects move across the scene. However, it can also happen with static geometry - leaving parts of the scene under-occluded. The explanation is the same as above: the occluder's information is not part of the depth buffer. Examples of this are shown by the red circles in Figure 4.10. Another issue is that the transition between occluded and unoccluded areas can be too sharp, giving the occluded areas a shape that usually follows the occluder's silhouette. The occluded area can resemble a dark 'halo' around objects or a cast shadow, looking unnatural and distracting, see Figure 4.10 (indicated by a blue circle). Also, when the occlusion from two or more shapes overlaps, visual artefacts can appear - usually as areas that are either too dark or too light, see Figure 4.12 for illustrations of these issues. There is also some visible noise, even for full-resolution GTAO with a high sample count. Moreover, the denoising appears to produce some visual artefacts - these include noise at the contours of animated objects and a 'ghosting' effect. The ghosting effect is likely due to the use of spatio-temporal reprojection.

In Figure 4.13, some renderings show the difference between the quality configurations for GTAO. The higher quality settings preserve slightly more details, especially if GTAO renders in full resolution. Increasing the sampling radius makes the ambient occlusion effect wider and can thus incorporate objects that are further away. Unfortunately, a larger sampling radius can aggravate the dark haloing effect mentioned above and reduce the number of local details captured - the enlarged "halo" can be seen quite clearly in the renderings from Scenario IQ.3 in Figure 4.13. While it may not be reflected in the SSIM results, GTAO can benefit from being hand-tuned to achieve a desired aesthetic result. This is likely a necessity if GTAO is to be used to its full capacity.

Figure 4.12 – Illustration of artefacts, related to GTAO, that can occur in occluded areas. Close-up from Scenario IQ.3, left image rendered in Maya. The GTAO example also shows the issue with a sharp distinction between occluded and non-occluded areas, whereas in the reference image there is a smooth, gradual transition between the areas. This can be somewhat adjusted with the sampling radius, but at the possible expense of poorer overall results. Right image is GTAO 'High Quality', full resolution.

RTAO

RTAO, even at low sample levels, captures an impressive amount of detail and deals with overlapping objects correctly. RTAO has access to all the geometric information of the scene; hence, occluders that are not visible to the camera can still contribute to nearby objects' occlusion. In IQ.3, while not being as sharp as the reference renderings, RTAO manages to preserve subtle details and shifts in the occlusion across the entire face as it animates. This means that the face looks much more natural as it animates compared with the results from GTAO.

In the RTAO renderings, noise is prevalent and visible, mainly when the sampling rate is low. In particular, noise is prominent around the contours of animated objects; see Figure 4.14. One explanation for this is that areas where objects overlap and intersect can become "busy" and need more samples to be accurately resolved, compared to simpler areas. Another explanation is that when objects are moving, the moving objects and the areas around them are less consistent and coherent across frames. Therefore, they cannot rely on the accumulation of samples from the spatio-temporal reprojection to the same extent as the parts of the scene that remain coherent across all frames. In effect, the areas around moving objects receive fewer samples - while these areas, in particular, would benefit from more samples. Also, there is visible noise around areas containing detailed geometry, which is directly related to the explanation above concerning "busy" areas. Some examples in Scenario IQ.2 are the hands, eye sockets and folds on the clothes of the animated models.

Some apparent visual artefacts appear to be the result of the denoising process and the use of spatio-temporal reprojection. This can create "ghosting" or a visual lag, similar to what was described for GTAO above. In Scenario IQ.3, there are obvious visual artefacts such as lag around the eyelids and mouth, particularly across the model's teeth as the model smiles, see Figure 4.15. Figure 4.16 contains example renderings of the different quality configurations of RTAO. When comparing the quality configurations for RTAO, the most significant improvement in image quality comes from the reduction of noise and an increase in the level of detail. This is expected when the sample rate is increased.

Figure 4.13 – Details from various renderings, comparing the quality configurations for GTAO.

Figure 4.14 – Example showing noise along the contours of an animated object, prevalent in RTAO renderings. Close-up from Scenario IQ.1, RTAO ’Low Quality’. The contrast of this image has been exaggerated to show the noise more clearly.

Figure 4.15 – Illustration of a visual artefact, likely due to lag from the use of spatio-temporal reprojection in the RTAO renderings. Close-up from Scenario IQ.3, RTAO 'High Quality'.

Figure 4.16 – Details from renderings showing the difference between quality configurations for RTAO. While it may be difficult to discern, 'Low Quality' contains more noise, and details look less distinct than in 'Medium Quality' and 'High Quality'.

4.3 Performance tests

4.3.1 Scenario P.1

All tests done with Scenario P.1 consist of the UE4 Sun Temple scene with no animated geometry. Overall, the results from these tests show that the computation time for both ambient occlusion methods can change drastically with changes in some of the input parameters. However, all tests of GTAO's computation time stay below one millisecond, while RTAO has a much higher base cost and only achieves a computation time below one millisecond when rendered at 360p or below. This means that all GTAO parameter configurations tested here are likely viable for many real-time applications, while only a smaller subset of the RTAO parameter configurations are feasible.

Sample count

Table 4.4 and Figure 4.17 summarise the results from the step count tests for GTAO. Increasing the step count increases the number of samples GTAO uses when searching for occluders. The computation cost rises linearly with the step count, as expected. For RTAO, increasing the sample count also increases the performance cost linearly, see Table 4.5 and Figure 4.19a. When the RTAO sampling rate doubles, the computation time seems to roughly double too (a quick numerical check of this linearity follows Table 4.5).

Event/Step count   2        4        8        16       32
Total              0.1631   0.1796   0.2153   0.2864   0.4242
Horizon SSAO       0.0644   0.0813   0.1164   0.1857   0.3234
Denoise            0.0682   0.0669   0.0678   0.0680   0.0675
Upscale            0.0304   0.0313   0.0311   0.0327   0.0333

Table 4.4 – Scenario P.1, Mean GTAO computation time vs. step count. Values are given in ms. Note that ’Total’ subsumes the other events. See Figure 4.17 for companion graph.

Rendering resolution

Both ambient occlusion methods' computation times increase linearly as the rendering resolution increases. Table 4.6 and Figure 4.18b show the relationship for GTAO, while Table 4.7 and Figure 4.19b show the relationship for RTAO.

Event/Sample count   2        4        8        16       32
Total                2.253    3.203    5.052    8.847    16.41
Raytracing           1.054    2.001    3.878    7.651    15.21
Temporal filter      0.3081   0.3111   0.2971   0.3076   0.3088
Diffuse filter       0.8104   0.8117   0.8006   0.8100   0.8120
Compose              0.0798   0.0801   0.0764   0.0781   0.0778

Table 4.5 – Scenario P.1, Mean RTAO computation time vs. sample count. Values are given in ms. Note that ’Total’ subsumes the other events. See Figure 4.19a for companion graph.
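The linearity claim is easy to check against the 'Raytracing' row of Table 4.5: a least-squares line fits the measurements almost exactly, at roughly 0.47 ms per sample per pixel on top of a small fixed cost. Below is a minimal sketch of that check, assuming numpy is available.

```python
import numpy as np

# 'Raytracing' event times from Table 4.5, in ms, vs. samples per pixel.
spp = np.array([2, 4, 8, 16, 32])
raytracing_ms = np.array([1.054, 2.001, 3.878, 7.651, 15.21])

slope, intercept = np.polyfit(spp, raytracing_ms, 1)  # least-squares line
residual = raytracing_ms - (slope * spp + intercept)
print(f"~{slope:.3f} ms per sample, ~{intercept:.3f} ms fixed cost")
print("max abs deviation from the line:", np.abs(residual).max(), "ms")
```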

We can also see that full-resolution GTAO is roughly two times more expensive than half-resolution GTAO, which conforms to expectations. A difference between increasing the number of samples and increasing the rendering resolution is that the former only increases the computation time of the ambient occlusion calculation itself (the 'Raytracing' event for RTAO or the 'Horizon SSAO' event for GTAO), while the latter also impacts the denoising step for both methods.

Event/Resolution          360p     720p     1080p    1440p
Total (full res)          0.0985   0.2012   0.3883   0.6691
Horizon SSAO (full res)   0.0489   0.0985   0.1880   0.3330
Denoise (full res)        0.0496   0.1027   0.2002   0.3362
Total                     0.0749   0.1106   0.1725   0.2631
Horizon SSAO              0.0322   0.0475   0.0749   0.1149
Denoise                   0.0265   0.0417   0.0668   0.1035
Upscale                   0.0162   0.0214   0.0308   0.0448

Table 4.6 – Scenario P.1, Mean GTAO computation time vs. resolution. Values are given in ms. Note that ’Total’ subsumes the other events. See Figure 4.18a and Figure 4.18b for companion graphs.

Sampling radius

Tables 4.8 and 4.9 show that changing the sampling radius and the maximum sampling radius in pixels for GTAO also has an impact, although a minor one, on performance. A possible reason is that, as the sampling radius increases, the GPU needs to access memory that is not spatially coherent, which increases the cache-miss rate.

Table 4.10 summarises the results for increasing the maximum ray length of RTAO, in other words its sampling radius. Increasing the maximum ray length of RTAO makes the scope of the ambient occlusion effect larger. Hence, the intersection tests done by the ray tracing algorithm must potentially consider more objects as the maximum ray length increases, while shorter lengths can terminate faster. Therefore, the increased performance cost of a larger ray length is not surprising. The difference in computation time between the smallest value, 2, and the two largest values, 32 and 64, is 1.857 ms, which is substantial. The fact that 32 and 64 have the same computation time could be specific to the scene in which the test was performed; a scene with different geometry and layout could likely have yielded a different result.

Event/Resolution   360p     720p     1080p    1440p
Total              0.6278   1.514    3.203    5.339
Raytracing         0.3792   0.9310   2.001    3.431
Temporal filter    0.0813   0.1622   0.3111   0.4518
Diffuse filter     0.1398   0.3752   0.8117   1.4085
Compose            0.0276   0.0455   0.0801   0.0478

Table 4.7 – Scenario P.1, Mean RTAO computation time vs. resolution. Values are given in ms. Note that ’Total’ subsumes the other events. See Figure 4.19b for companion graph.

Event/Radius   0.25     0.50     0.75     1.00
Total          0.1792   0.1802   0.1799   0.1809
Horizon SSAO   0.0809   0.0816   0.0815   0.0819
Denoise        0.0671   0.0672   0.0672   0.0676
Upscale        0.0312   0.0314   0.0312   0.0314

Table 4.8 – Scenario P.1, Mean GTAO computation time vs. radius. Values are given in ms. Note that 'Total' subsumes the other events.

Event/Max radius in pixels   16       32       64       128      256
Total                        0.1798   0.1805   0.1842   0.1871   0.1876
Horizon SSAO                 0.0807   0.0825   0.0864   0.0888   0.0887
Denoise                      0.0680   0.0676   0.0669   0.0671   0.0675
Upscale                      0.0311   0.0304   0.0310   0.0312   0.0314

Table 4.9 – Scenario P.1, Mean GTAO computation time vs. max radius in pixels. Values are given in ms. Note that 'Total' subsumes the other events.

Event/Ray length   2        4        8        16       32       64
Total              3.192    3.461    4.240    4.988    5.049    5.049
Raytracing         1.996    2.264    3.049    3.804    3.868    3.870
Temporal filter    0.3065   0.3065   0.3059   0.3079   0.3040   0.3038
Diffuse filter     0.8104   0.8114   0.8075   0.8004   0.8002   0.7993
Compose            0.0793   0.0794   0.0769   0.0755   0.0762   0.0764

Table 4.10 – Scenario P.1, Mean RTAO computation time vs. maximum ray length. Values are given in ms. Note that ’Total’ subsumes the other events.

[Graph: Scenario P.1, computation time of GTAO vs. step count. Computation time in ms (y-axis, 0-1.0) vs. step count in samples per texel (x-axis, 2-32); series: Horizon SSAO, Denoise, Upscale.]

Figure 4.17 – Scenario P.1, GTAO computation time vs. step count. As the number of samples increases, the cost of the ambient occlusion step ('Horizon SSAO') increases linearly. The denoising and upscaling are not affected.

[Graph: Scenario P.1, computation time of full-resolution GTAO vs. resolution. Computation time in ms (y-axis, 0-1.0) vs. resolution (horizontal) in pixels (x-axis, 360-1440); series: Horizon SSAO (full res), Denoise (full res).]

(a) Increasing the rendering resolution makes both the ambient occlusion step (Horizon SSAO) and the denoising more expensive. Note that there is no 'Upscale' event, as this test uses full-resolution GTAO.

[Graph: Scenario P.1, computation time of GTAO vs. resolution. Computation time in ms (y-axis, 0-1.0) vs. resolution (horizontal) in pixels (x-axis, 360-1440); series: Horizon SSAO, Denoise, Upscale.]

(b) These results use GTAO at half the rendering resolution; therefore, upscaling is necessary. Still, the half-resolution GTAO is substantially cheaper than the full-resolution variant.

Figure 4.18 – Scenario P.1, GTAO computation time vs. rendering resolution.

[Graph: Scenario P.1, computation time of RTAO vs. sample count. Computation time in ms (y-axis, 0-18) vs. sample count in samples per pixel (x-axis, 2-32); series: Raytracing, Temporal filter, Diffuse filter, Compose.]

(a) As with GTAO, increasing the number of samples only makes the ambient occlusion step ('Raytracing') more expensive, while the denoising is invariant.

[Graph: Scenario P.1, computation time of RTAO vs. resolution. Computation time in ms (y-axis, 0-18) vs. resolution (horizontal) in pixels (x-axis, 360-1440); series: Raytracing, Temporal filter, Diffuse filter, Compose.]

(b) An increase in resolution causes both the ambient occlusion step and the denoising step to be more expensive.

Figure 4.19 – Scenario P.1, performance scaling for RTAO.

4.3.2 Scenario P.2 & P.3

Scenario P.2 has objects animated with rigid-body animations and Scenario P.3 has objects animated with skinning animations. The input parameter configurations for these tests are shown in Table 4.11. Table 4.12, Table 4.13 and Figure 4.20 summarise the results for the tests done on both scenarios and for both methods. Note that the computation time of RTAO in these measurements does not include the time to rebuild and update the acceleration structures. These scenarios show the additional performance costs that appear when using RTAO. The computation time of GTAO appears constant as the number of objects increases. This is not surprising, since the time complexity of SSAO methods does not depend on the geometric complexity of a scene.

Parameter                     Value
Resolution (both)             1920 x 1080
Intensity (both)              1
Radius (GTAO)                 0.5
Max radius in pixels (GTAO)   32
Step count (GTAO)             4
Max ray length (RTAO)         2
Sample count (RTAO)           4
Denoiser radius (RTAO)        0.5

Table 4.11 – Scenario P.2-3, input parameter configurations for GTAO and RTAO used in P.2 and P.3.

Hence, it is more interesting to look at the results for RTAO. It is puzzling that the computation times, as a function of the geometric complexity, appear to follow a linear rather than a logarithmic relationship, which would be the expected relationship given that acceleration structures are used. A possible explanation is that Scenario P.2 and Scenario P.3 present worst-case scenarios because many of the animated objects are clustered together, causing overlap and intersection between them. This would naturally lead to more intersection tests compared to a scenario where objects are sparsely distributed. Furthermore, both the rigid-body and the skinning tests show a linear increase in performance cost for RTAO, but the rate of increase differs. A plausible explanation for the difference is that the models used in the two tests differ in their geometric complexity. The model used for rigid-body animations consists of 18.1k triangles, while the model for the skinning animations consists of 43.2k triangles. That is, the skinning model, used in P.3, has more than twice the geometric complexity of the model in P.2. The computation cost of RTAO for skinning animations seems to increase at roughly twice the rate of that for rigid-body animations, which corresponds fairly well with the discrepancy in geometric complexity.

Method/Number of animated objects   1        5        10       20       30       40       50       60
GTAO (rigid-body)                   0.1765   0.1772   0.1755   0.1771   0.1763   0.1766   0.1770   0.1759
GTAO (skinning)                     0.1757   0.1767   0.1768   0.1771   0.1765   0.1737   0.1721   0.1710
RTAO (rigid-body)                   3.195    3.189    3.208    3.282    3.284    3.342    3.360    3.383
RTAO (skinning)                     3.163    3.192    3.198    3.235    3.274    3.303    3.325    3.367

Table 4.12 – Scenario P.2-3, mean computation time as the number of objects increases. Values are given in ms. Results continue in Table 4.13. See Figure 4.20 for a companion graph.

Acceleration structure rebuild

As explained in Section 2.1.4, when animating objects, the acceleration structures used for ray tracing must be updated or rebuilt. Rigid-body animations can typically get away with updating the top-level acceleration structures, while skinning animations will likely need a full rebuild (or refitting) of the bottom-level acceleration structures. The difference in cost is clearly shown in the measurements presented in Table 4.14 and Figure 4.21. Note that these measurements pertain specifically to the acceleration structure updates and do not include the computation time of RTAO. While the computation cost of updating or rebuilding appears to scale linearly for both rigid-body and skinning animations, the rate of increase differs substantially.

Method/Number of animated objects   70       80       90       100      160      320      640
GTAO (rigid-body)                   0.1769   0.1759   0.1757   0.1763   0.1737   0.1719   0.1728
GTAO (skinning)                     0.1722   0.1720   0.1721   0.1725   0.1726   0.1707   0.1731
RTAO (rigid-body)                   3.408    3.427    3.472    3.508    3.571    3.686    4.155
RTAO (skinning)                     3.407    3.453    3.562    3.546    3.702    4.163    4.692

Table 4.13 – Scenario P.2-3, mean computation time as the number of objects increases. Values are given in ms. Continuation of Table 4.12. See Figure 4.20 for a companion graph.

[Graph: Scenario P.2-3, computation time vs. number of animated objects. Computation time in ms (y-axis, 0-6.0) vs. number of animated objects (x-axis, 0-650); series: GTAO (rigid-body), GTAO (skinning), RTAO (rigid-body), RTAO (skinning).]

Figure 4.20 – Scenario P.2-3, computation time as the number of animated objects increases. As expected, the computation time of GTAO is invariant to an increase in the number of animated objects. In contrast, RTAO costs more as the geometric complexity increases. The higher cost of skinning can likely be attributed to the fact that this model has a higher geometric complexity than the model used for rigid-body animations.

Number of animated objects   Rigid-body mean time, ms   Skinning mean time, ms
1                            0.153                      0.300
5                            0.155                      0.348
10                           0.153                      0.403
20                           0.150                      0.593
30                           0.158                      0.780
40                           0.155                      0.958
50                           0.160                      1.14
60                           0.165                      1.35
70                           0.163                      1.53
80                           0.173                      1.73
90                           0.173                      1.95
100                          0.175                      2.13
160                          0.193                      3.28
320                          0.203                      6.38
640                          0.223                      12.6

Table 4.14 – Scenario P.2-3, mean ray tracing acceleration structure update and rebuild times. For rigid-body animations, it is likely that updating the top-level acceleration structures is enough. For skinning, refitting or rebuilding the bottom-level acceleration structures might be necessary. These operations are much more expensive, which is reflected in the results. See Figure 4.21 for a companion graph.
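From the endpoints of Table 4.14, one can estimate a per-object maintenance cost: roughly 0.1 µs per rigid-body object versus roughly 19 µs per skinned object. A minimal sketch of this back-of-the-envelope calculation follows.

```python
# Mean acceleration-structure update/rebuild times from Table 4.14, in ms,
# at 1 and 640 animated objects.
rigid = {1: 0.153, 640: 0.223}
skinning = {1: 0.300, 640: 12.6}

for name, t in (("rigid-body", rigid), ("skinning", skinning)):
    per_object_us = (t[640] - t[1]) / (640 - 1) * 1000.0  # ms -> microseconds
    print(f"{name}: ~{per_object_us:.2f} us per animated object")
```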

[Graph: Scenario P.2-3, ray tracing acceleration structures rebuild and update time. Computation time in ms (y-axis, 0-14) vs. number of animated objects (x-axis, 0-700); series: Rigid-body, Skinning.]

Figure 4.21 – Scenario P.2-3, cost of updating and rebuilding ray tracing acceleration structures. The increase in the costs associated with skinning animations dwarfs the increase in costs associated with rigid-body animations.

4.4 Discussion

4.4.1 On research question 1 - evaluating image quality

The difference in the level of detail captured by RTAO compared to GTAO is very apparent, even when compared across all quality configurations. For instance, in Scenario IQ.1-2, the renderings made with RTAO include a range of small architectural details as well as details such as folds in the clothes on the human model and details in the human model's face - clearly showing features such as the mouth and nose. These details are not always present in the renderings done with GTAO, and thus visual information about the scene is lost.

While there is a discernible difference in image quality between RTAO and GTAO for Scenario IQ.3, it is less drastic. This is clearly reflected in the SSIM scores. The characteristics of Scenario IQ.3 are more favourable for the assumption made by GTAO - i.e. treating the depth buffer as a continuous heightfield. These characteristics include that the geometry in the scene is predominantly continuous and that the surface normal, for the most part, changes gradually across the surface of the object. Perhaps most importantly, Scenario IQ.3 lacks sharp edges and perpendicular corners - which GTAO tended to render too soft in Scenarios IQ.1-2. The GTAO local SSIM maps of IQ.1-2 clearly indicated the edges as consistent problems, and the absence of these likely contributed to GTAO's higher SSIM score in IQ.3. Finally, Scenario IQ.3 does not contain distant objects where one object obscures parts of another in the camera view, which circumvented the problem GTAO has with the loss of occlusion in these situations.

In contrast, the characteristics of the scene in Scenario IQ.1 and IQ.2 are different - it contains sharp edges and corners, many sudden changes in surface normals and distant objects that overlap each other. Also, this scene contains many areas with finer details, particularly in Scenario IQ.2, which uses an animated model that is more detailed than the model used for animation in Scenario IQ.1. Thus, these scenarios expose the limitations of GTAO to a higher degree, and here RTAO is clearly superior. Overall, the extent to which RTAO's image quality is an improvement over GTAO's depends somewhat on the characteristics of a scene. In fairness, it should be said that the visual quality of GTAO could likely have been improved to some extent by meticulously tailoring the input parameters for each scenario, even though this may not be reflected in the SSIM scores. But it is unlikely that this would have impacted the overall results in a major way - the issues stemming from the depth buffer are not removed by changing any parameter, and the visual artefacts occurring as objects move and animate would still be present.

Besides the difference in the level of detail captured, a significant advantage of RTAO compared to GTAO is that RTAO does not suffer from the issues that ail screen space methods. One such issue is the disappearance of occlusion from objects that are not visible to the camera (i.e. objects that are not present in the depth buffer), which was observed in the tests carried out in this thesis. Of course, this result was expected because this limitation of SSAO methods is known, but the question is to what extent it impacts the image quality and the viewer's overall experience. The results from the tests indicate that, particularly in an animated scene with moving objects, such artefacts are noticeable and can negatively impact the viewer's experience. As objects move across the scene, occlusion information can be lost - giving rise to the haloing effects discussed in Section 4.2. In a scene with many moving objects, this effect could be very apparent. How apparent it is will also depend on what the viewer is paying attention to in the scene. For instance, in a scene with a single moving object, the viewer is likely to focus their attention on the moving object - which could also draw attention to the artefacts that surround the object as it moves across the scene. Naturally, we should expect that the extent to which a viewer may be disturbed by these artefacts will differ from one individual to another. For the present author, they create the impression that the scene is shifting in unsettling ways. None of the problems described above applies to RTAO, which arguably constitutes a significant improvement to image quality.

The renderings made with RTAO do have some issues worth discussing, specifically tied to animated objects. Arguably, the most significant of these issues relate to noise and the denoising step. Similar to how the haloing effects appear as objects move in the GTAO renderings, the moving objects in the RTAO renderings are surrounded by visible noise - most apparent at their contours. However, compared with the haloing effects, the noise is much less visible and disturbing. Also, the higher quality configurations (i.e. a higher sample rate) make the noise surrounding the contours less visible. On the other hand, the haloing issues with GTAO were similarly prevalent across all quality configurations. Although outside the scope of this thesis, it is worth pointing out that a different denoising algorithm could possibly reduce the noise levels and improve the image quality of RTAO (and GTAO).

Besides the noise, there are also issues with visual artefacts due to spatio-temporal reprojection. Such artefacts can occur at locations that make them visible and stand out.

Such artefacts can occur at locations that make them visible and stand out. One example is the lag that appeared as a dark band on the teeth of the model in Scenario IQ.3, see Section 4.2.4 and Figure 4.15. Artefacts this visible could deflate the impact of moments that aim to be visually striking. They could also break the viewer's immersion, undermining, for example, an animator's intent to convey subtle emotion. Depending on the use case, the possibility of this kind of artefact can warrant attention and special care so that it does not damage the intended impression on the viewer.

The RTAO renderings do not contain the same amount of detail as the offline renderings and look a bit blurred in comparison. Also, the RTAO renderings contain visible noise, which is not present in the offline renderings. These differences in image quality largely come down to what could be expected from using the same method with a lower sample rate. Overall, the image quality of RTAO approaches that of the offline renderings.

To summarise, and directly address RQ 1: to what extent is the image quality of RTAO an improvement over GTAO when evaluated using SSIM and visual inspection? The image quality of RTAO constitutes a significant improvement over GTAO, particularly in scenes with characteristics that are less suited to GTAO. The difference goes beyond RTAO producing renderings with more details and handling off-screen occluders: when objects move across a scene, the visual artefacts associated with RTAO are less prevalent and disturbing than those of GTAO. The results do show that, given favourable conditions for GTAO, the difference in image quality is less pronounced. The SSIM scores largely reflect this assessment, and the issues that weigh down the image quality of both methods appear consistently in the local SSIM maps.

4.4.2 On research question 2 - evaluating performance

There is a large difference in computation cost between RTAO and GTAO. Consistently, in all performance tests, GTAO's computation time stayed below one ms per frame, even when rendered at full resolution at 1440p. In comparison, the computation time of RTAO was never below one ms per frame for any configuration, except when rendering at 360p. At 1080p and four samples per pixel, the computation time was around 3.2 ms per frame. In real-time applications targeting 60 FPS or more, this is a substantial amount of time.
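To put these numbers in perspective, a simple frame-budget calculation (using only figures reported in this chapter) illustrates the difference:

```latex
% Frame budget at 60 FPS:
t_{\text{frame}} = \frac{1000\ \text{ms}}{60} \approx 16.7\ \text{ms}

% Approximate share of that budget: RTAO at 1080p and 4 spp,
% versus GTAO (all tests below 1 ms):
\frac{3.2}{16.7} \approx 19\%
\qquad \text{vs.} \qquad
\frac{1.0}{16.7} \approx 6\%
```

Nearly a fifth of a 60 FPS frame is a lot to spend on a single effect, which motivates the parameter trade-offs discussed next.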

Moreover, the RTAO computation times were sensitive to changes in the input parameters. Increasing the sample rate by one roughly added 0.5 ms to the computation time. The computation time was also sensitive to the sampling radius, with the highest radius being two milliseconds more expensive than the lowest. While increasing some GTAO parameters (most notably the rendering resolution) incurred a large relative increase in cost, GTAO's low base cost meant that all GTAO performance tests stayed below one ms of computation time.

However, we note that while a change in parameters can drastically impact the performance of RTAO, the image quality does not necessarily change as significantly. As shown in the image quality tests, the difference in going from two samples per pixel ('Low Quality') to four samples ('Medium Quality') is more apparent than from four samples to eight samples ('High Quality'). Increasing the sample rate further (to 16 or 32) is likely to yield diminishing returns and not be worth the price paid in performance. Two samples per pixel is enough to capture an impressive amount of detail. Indeed, some denoisers and reconstruction algorithms assume that one sample per pixel is used and can achieve impressive results based on this sample rate alone. An application should be able to find a trade-off between performance and image quality when using RTAO, particularly if some noise is acceptable.

Another RTAO parameter that increased the performance cost was the sampling radius. In the image quality tests, the sampling radius was set to one, since this yielded the results closest to the reference renderings; increasing it would cause over-occlusion. Hence, a larger sampling radius does not imply better image quality or a more desirable result.

Whereas the computation time of GTAO is invariant to the number of animated objects in a scene, increasing the number of animated objects does impact the computation time of RTAO. It does so in two ways: one is the increase in geometric complexity, the other is the need to build and maintain acceleration structures for the ray tracing algorithm. Regarding the geometric complexity, the expectation is that the computation time of RTAO should grow logarithmically with the number of objects, following the depth of the acceleration structure. In contrast, the results appeared to show computation times scaling linearly. At first glance, these results appeared contradictory. However, the likely explanation is that the tests constituted a worst-case scenario due to many objects being close to each other, overlapping and intersecting.

Moreover, when an object is animated, its corresponding acceleration structure needs to be updated (or, in the worst case, rebuilt). For scenes with many objects animated with skinning, the update and rebuild times of the ray tracing acceleration structures were non-trivial. There was no attempt to optimise the management of the ray tracing acceleration structures in this thesis.
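As an illustration of what such management involves, the sketch below shows a naive per-frame policy for one bottom-level acceleration structure. The class and function names and the drift heuristic are hypothetical; DXR and similar APIs expose update (refit) and rebuild operations, but the policy for choosing between them is left to the application.

```python
from dataclasses import dataclass

@dataclass
class Blas:
    """Bottom-level acceleration structure for one animated mesh."""
    frames_since_rebuild: int = 0

    def refit(self):
        # Cheap: keep the BVH topology and only update node bounds.
        # Ray traversal degrades as the mesh deforms away from the
        # pose the tree was built over.
        self.frames_since_rebuild += 1

    def rebuild(self):
        # Expensive: build a new BVH over the deformed vertices.
        self.frames_since_rebuild = 0

def update_for_frame(blas, deformation, max_refits=30, drift_limit=0.25):
    """Hypothetical policy: refit every frame, rebuild occasionally.

    deformation is some scalar measure of how far the skinned mesh
    has moved from its rebuild-time pose (0 = identical).
    """
    if deformation > drift_limit or blas.frames_since_rebuild >= max_refits:
        blas.rebuild()   # restores traversal quality at a high cost
    else:
        blas.refit()     # keeps the per-frame cost low
```

A real scheme would amortise rebuilds across frames and objects; the point of the sketch is only that the refit/rebuild trade-off is a policy the application must own.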

Significant reductions to the costs associated with the management of acceleration structures have been shown in practice, as mentioned in Section 2.1.4. Therefore, a smarter management scheme could presumably yield significant performance gains over the results presented here. Also, in practice, other algorithms that use the ray tracing pipeline might share the cost of maintaining acceleration structures, so this cost may not be entirely attributable to RTAO. Hence, the cost associated with the ray tracing acceleration structures may be less of an issue in practice. In any case, the number of concurrently animated objects in a scene becomes a genuine concern if an application is to use RTAO. The increased computation time when adding animated objects can be a significant disadvantage of RTAO compared to an SSAO method, particularly if an application demands many animated objects. Anyone looking to use RTAO must likely monitor and have strategies to deal with this cost - which is not a concern with GTAO.

To conclude this section, we address RQ 2 directly: how is the computation time, measured in milliseconds, of RTAO affected by changes in image quality, the geometric complexity of the scene and the number of animated objects? RTAO is significantly affected by changes in image quality, geometric complexity and the number of objects animated by skinning. By significantly, we mean that the associated costs are high enough to be a concern in real-time applications. On the other hand, RTAO can achieve very good image quality using small values for the input parameters, such as a low sample rate and a small sampling radius, and it should therefore be possible to find a balance between image quality and performance that suits a range of applications.

4.4.3 On the main research question

Finally, to address the main research question: to what extent is ray traced ambient occlusion suitable for real-time applications in scenes with animated objects? RTAO is well suited in terms of image quality, with some minor artefacts occurring around animated objects. These artefacts were (to the present author) not as disturbing as the artefacts produced by GTAO. However, the high image quality of RTAO must be weighed against its computational cost. Performance-wise, a real-time application can use RTAO; by tuning RTAO's input parameters, finding an acceptable balance between image quality and performance should be possible.

Nevertheless, the suitability of RTAO should likely be evaluated on a case-by-case basis, considering the prospective application's specifications.

For example, when using RTAO, one must pay close attention to the geometric complexity and the number of objects animated by skinning. If an application intends to have scenes with many animated objects, RTAO may be less suitable. Furthermore, applications may predominantly have scenes whose characteristics benefit less, image-quality-wise, from using RTAO over GTAO. One should also consider the overhead involved in using hardware-accelerated ray tracing, including the effort necessary to optimise the management of the acceleration structures. Overall, RTAO is more suitable in applications where the difference in image quality will matter and justify the price paid in performance. An example could be an application that aspires to a high degree of visual realism but constrains the number of animated objects displayed on the screen. Also, if a lower frame-rate target is acceptable (30 FPS, for instance), this gives leeway for the more expensive RTAO.

The results indicate that, while RTAO has superior image quality, SSAO methods such as GTAO are not obsolete. Real-time applications will likely continue to rely on SSAO methods for the foreseeable future because of the significant difference in performance. Also, GPUs with ray tracing acceleration are not yet ubiquitous, so there would still be a need for a fallback AO method. Comparing the results with [9] shows that performance is still a concern when using RTAO (and a more pressing concern if animations are used). The tests in the present thesis were done using a newer generation of GPU than in [9], and GPU drivers may also have improved since. In [9], the RTAO computation time for four samples per pixel at 1080p resolution was 4.0350 ms; in this thesis, the equivalent was 3.203 ms - a one ms improvement. While this improvement is notable, it is not dramatic and likely not enough to make a shift to RTAO a trivial concern.

In a broader view, comparing RTAO with GTAO demonstrates the potential of using ray tracing in real time. Conceptually, ray tracing algorithms can be simple yet powerful enough to solve general rendering problems, such as accurate reflections, global illumination and ambient occlusion. In contrast, the rasterisation pipeline often relies on mixing various specialised ad hoc techniques (sometimes using "hacks") to achieve the desired effect. This can yield impressive results, but such techniques are often fundamentally limited in some way and do not generalise to all scenarios. SSAO methods, such as GTAO, are examples of this. Seeing the difference in image quality that can be achieved by using ray tracing is compelling. Therefore, it seems like a natural progression to aim for real-time renderers that can fully leverage ray tracing, at least for applications that aspire to realistic and physically correct renderings.

Historically, GPUs have developed along a technological trajectory where rasterisation is the dominant rendering algorithm, and much research and development into the hardware and supporting software has made rendering with rasterisation extremely fast. With similar investments of time and effort, we could expect large strides towards making fully ray traced real-time renderers ubiquitous. For the time being, we should expect real-time renderers to move towards a hybrid model, predominantly based on rasterisation but including some ray tracing techniques, such as RTAO. As the results in this thesis show, this can lead to notable improvements in image quality and, hopefully, a more compelling visual experience.

4.5 Limitations

The test scenarios were designed to resemble scenes in a broad category of applications while also challenging the ambient occlusion methods in different ways. More concretely, the features that define the scenarios resemble what one could expect to find in video games or interactive visualisations - for instance, scenes that contain multiple objects animated by both rigid-body animations and skinning animations. The choice of scenes and models could well resemble those of the applications mentioned above - at least an extensive category of applications that contain scenes with interior spaces and realistic models. Also, the scenarios were designed to allow measurements to be taken while controlling and isolating relevant parameters.

Nevertheless, in some ways, the scenarios were contrived. For instance, the performance tests spawned animated objects that often intersected each other. Having many objects clumped together likely led to a worst-case scenario for the ray intersection tests, which could explain the results for Scenarios P.1 and P.2. A real-time application, such as a video game, would likely be more judicious about where to spawn animated objects and avoid such worst-case scenarios; the same applies to interactive architectural visualisations. Therefore, the performance characteristics of these tests may not necessarily reflect an actual application.

Another limitation of the scenario design was the use of different models for the rigid-body animations and the skinning animations. This choice introduced a small amount of ambiguity regarding the reason for the difference in RTAO computation time between Scenario P.2 and Scenario P.3. Ideally, the same model should have been used for both scenarios.

However, it is unlikely that the type of animation was the determining factor. Rather, it likely comes down to the difference in geometric complexity between the two models used in the respective scenarios, as was pointed out in Section 4.3.

The lack of camera movement in any scenario may also be a significant difference from an actual application. Based on the results of this thesis, moving the camera would probably produce visual artefacts (for both RTAO and GTAO) similar to those discussed for moving objects, but across the entire scene - including static geometry. For RTAO, this would likely involve an increase in the level of noise as the camera moves around. Nevertheless, the degree to which this would impact image quality or performance remains unanswered by this thesis.

There is also a lack of more varied scenarios and geometry - one example is the absence of procedural geometry (such as landscapes) or, perhaps more interestingly, fur, hair and foliage. The latter examples could be animated, of course - an example being a tree that sways in the wind. These types of geometry would likely present problems for both AO methods. From the results in this thesis, we could infer that the main problem for RTAO would be extensive noise around such geometry, but tests would have to be carried out to know for sure. Moreover, the types of animations used in the tests were somewhat limited - both in number and in their features. For example, all animations were evenly paced and relatively smooth. It is not entirely clear to what extent the image quality results would differ if snappier and faster animations had been used - the issues with noise and temporal lag would likely have been more severe, but this also remains unanswered.

The author of this thesis did not attempt to apply any optimisations to the Unity application used to create the tests, nor to the AO implementations. It is possible that notable performance gains could be made - as an example, the management of ray tracing acceleration structures could likely be improved. Other optimisations for RTAO were not considered either, so the degree to which the performance of RTAO could be improved is unanswered by this thesis.

The scenario designs and the possibility of optimisations raise the question of how generalisable the results of this thesis are. While any given measurement may be specific to its context, both in terms of test scenario and test hardware, the relationships between measurements are likely transferable to other contexts. The performance results also follow (with a few noted exceptions) the relationships that theory predicts. Hence, the results in this thesis give a general idea of what performance can be expected from the two AO methods.

There should be no surprise if there is room for performance improvements due to optimisation. However, it is unlikely that such efforts would improve the computation time of RTAO by several orders of magnitude. This reasoning does not extend to significant changes in hardware architecture or to substantially different approaches - such as relying on state-of-the-art machine-learning-based denoisers and reconstruction algorithms. The visual issues that have been mentioned should also generalise to different applications and scenarios, as they appear to stem from the fundamental design of the two AO methods. For RTAO, the biggest issue involved noise, which would still be present (to some degree) even at a higher sample count. GTAO is mainly limited by using the information in the depth buffer to compute the AO, and this limitation remains regardless of the application.

The SSIM score turned out to have some limitations when used in this thesis, which became apparent when comparing the SSIM scores with the visual inspections. This observation mainly stems from the somewhat low SSIM score of the 'Low Quality' configuration of RTAO in Scenario IQ.3, compared with the GTAO SSIM scores in the same scenario. In this author's view, the RTAO renderings are visibly superior to all GTAO quality configurations. It is likely that the noise in the RTAO renderings (especially in 'Low Quality') penalised the SSIM scores, but it is somewhat difficult to get an intuitive sense of exactly what SSIM is measuring. Possibly, an alternative SSIM implementation, better configured to suit the demands of this thesis' tests, could have been used. It is also worth mentioning that there was a slight difference between the baseline renderings shown in Figure 3.5 in Section 3.1; as a reminder, the SSIM for the baseline renderings was 0.9999. This difference, although small, means that neither AO method could achieve an SSIM score of one. Overall, this difference likely had a negligible impact on the SSIM scores for the image quality tests.

Whatever the case, the SSIM scores did not constitute the final verdict in evaluating image quality, so their significance for this thesis might be debated. On this point, it should be said that SSIM was a valuable tool in the analysis and evaluation. It was useful because it could be computed for a large number of consecutive frames, which revealed information about the sequence of frames as a whole. Moreover, the local SSIM maps were helpful as they indicated visual artefacts and consistent issues with the different AO methods. They could also indicate differences between quality configurations, by showing which visual issues were consistent and which were less prevalent.
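For reference, both the global SSIM score and the local SSIM map used throughout this evaluation can be reproduced with off-the-shelf tooling. The snippet below uses scikit-image's implementation of SSIM [27]; its windowing and constants may differ from the implementation used for the reported scores, so absolute values should not be expected to match exactly.

```python
import numpy as np
from skimage.metrics import structural_similarity

def compare_frames(reference: np.ndarray, rendering: np.ndarray):
    """Return a global SSIM score and the local SSIM map.

    Both inputs are expected to be greyscale AO frames with values
    in [0, 255]; the local map highlights where the rendering
    deviates from the reference (e.g. soft edges, noise, halos).
    """
    score, ssim_map = structural_similarity(
        reference, rendering,
        data_range=255,   # value range of the inputs
        full=True,        # also return the per-pixel SSIM map
    )
    return score, ssim_map

# Example: the mean over a sequence of frames gives the kind of
# per-configuration curves shown in Appendix A.
# scores = [compare_frames(ref, img)[0] for ref, img in frame_pairs]
```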

It is possible to object to the somewhat reductionist approach to evaluating image quality taken in this thesis. Specifically, one could criticise the assumption that the closer the real-time ambient occlusion renderings are to the offline renderings, the better the image quality. The offline renderings can be used as reference images because all ambient occlusion methods fundamentally attempt to solve (or rather approximate) the same equation. Monte Carlo ray traced ambient occlusion can be mathematically shown to converge on the solution to the ambient occlusion equation; an offline renderer can afford many samples to this end and can therefore produce highly accurate images. Thus, evaluating the image quality of a real-time AO method can be done dispassionately and, to a large extent, objectively by comparing it with the AO produced by an offline renderer.
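For completeness, the quantity in question can be stated explicitly. The following is a textbook formulation of the ambient occlusion integral and its Monte Carlo estimator (cf. [13]); the notation is introduced here for illustration:

```latex
% Ambient occlusion (here: the unoccluded fraction) at a surface
% point p with normal n, over the hemisphere \Omega; V(p, \omega)
% is 1 if a ray from p in direction \omega is unoccluded within
% the sampling radius, and 0 otherwise:
A(p, n) = \frac{1}{\pi} \int_{\Omega} V(p, \omega)\,(n \cdot \omega)\,\mathrm{d}\omega

% Monte Carlo estimator with N cosine-weighted directions \omega_i
% (pdf = (n \cdot \omega)/\pi, which cancels the cosine factor):
A(p, n) \approx \frac{1}{N} \sum_{i=1}^{N} V(p, \omega_i)
```

An offline renderer simply takes N large enough for the estimator's variance to become negligible, which is why its output can serve as a reference.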
Admittedly, there is a degree of subjectivity involved in the evaluation based on visual inspection, especially since the visual inspection was carried out solely by the author of this thesis. As explained in the section on image quality assessment, Section 2.5.1, the variance between users in a subjective evaluation can be large. A user study could have provided a more exhaustive assessment, reduced the risk of bias and arrived at an evaluation closer to an average subjective assessment. Nevertheless, a comprehensive user study was deemed outside the scope of this project. The reader is of course encouraged to look at the renderings and images in this report and make their own assessment of the image quality and of how disturbing the visual issues associated with each method are.

Rendering and evaluating the ambient occlusion without any additional shading makes it easier to compare and contrast the different methods with offline renderings, but this presents a different limitation. Presumably, the ambient occlusion is not intended to constitute the final rendering but would be combined with shading and perhaps post-processing effects. Therefore, the difference in image quality between RTAO and GTAO might appear more or less pronounced when composed into a final rendering with shading and other effects. If the difference appears less pronounced, that would make RTAO less attractive, given its higher cost. Furthermore, there are likely artistic and design choices that could change how much the choice of AO method impacts the final rendering; examples include the camera position and angle, the composition of scenes and the aesthetic style. From the results in this thesis, it is not clear how noticeable the image quality difference between RTAO and GTAO would be in a final rendering used in a real application.

Chapter 5

Future work and conclusion

5.1 Future work

An immediate extension to the research carried out in this thesis would be more comprehensive tests. A natural step would be scenarios with a moving camera. This could provide more insight into suitable use cases for both RTAO and GTAO and a better understanding of the potential issues that both ambient occlusion methods have in a dynamic scene. Scenarios could also be added to cover a wider span of features; an example could be to investigate how the results would differ if all movement and animations were very rapid.

Since denoising is crucial for RTAO, and the results indicated that the denoising stage produced some visual artefacts, future work could investigate and compare different denoising algorithms to see whether they could improve image quality. There is also potential for performance improvements if this stage could be made more efficient. Development in this area could make a significant difference to the viability of RTAO, as well as other ray traced techniques used in real time. There has been a lot of research in this area recently, and applying recent advances in denoising to RTAO could yield compelling results.

Another avenue for future research would be to look at ways of improving the performance of RTAO. Possible examples include rendering RTAO at a lower resolution and then upscaling (see the sketch below), or using simpler (low-poly) geometry when calculating the ambient occlusion - both would likely sacrifice image quality, but the extent to which this would be noticeable is worthy of investigation. Additionally, one could look into the impact of different sampling schemes, such as variable rate sampling. The latter could be evaluated on different scenarios to determine where it is suitable and where it is not.
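To illustrate the first suggestion: rendering AO at a reduced resolution is usually paired with a depth-aware (joint bilateral) upsampling step so that AO does not smear across silhouettes. The sketch below is a naive reference implementation written for clarity rather than speed; the 2x2 footprint and the sigma_depth value are placeholder choices.

```python
import numpy as np

def joint_bilateral_upsample(ao_low, depth_low, depth_full,
                             sigma_depth=0.1):
    """Upsample half-resolution AO guided by the full-resolution depth.

    ao_low, depth_low : (H/2, W/2) low-resolution AO and depth
    depth_full        : (H, W) full-resolution depth buffer
    Low-res samples whose depth disagrees with the full-res pixel
    are down-weighted, which preserves AO edges at silhouettes.
    """
    H, W = depth_full.shape
    out = np.zeros((H, W), dtype=np.float32)
    for y in range(H):
        for x in range(W):
            ly, lx = y // 2, x // 2
            weight_sum, value_sum = 0.0, 0.0
            # 2x2 neighbourhood of low-res samples around the pixel.
            for dy in (0, 1):
                for dx in (0, 1):
                    sy = min(ly + dy, ao_low.shape[0] - 1)
                    sx = min(lx + dx, ao_low.shape[1] - 1)
                    dz = depth_full[y, x] - depth_low[sy, sx]
                    w = np.exp(-(dz * dz) / (2.0 * sigma_depth ** 2))
                    weight_sum += w
                    value_sum += w * ao_low[sy, sx]
            out[y, x] = value_sum / max(weight_sum, 1e-6)
    return out
```

A production version would add a spatial weight and run on the GPU, but even this form shows where the image quality cost would appear: wherever all nearby low-resolution samples fail the depth test.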

As stated in Section 4.5, based on the tests done in this thesis, it is not entirely clear how much difference the choice of RTAO or GTAO would make to the final image quality after including lighting, shading, atmospheric effects, post-processing effects and more. It could be interesting to conduct a user study to investigate to what extent the choice of AO method impacts the final image quality. This could be considered with animated objects too.

5.2 Conclusion

In this thesis, we investigated to what extent RTAO is suitable for real-time applications that contain animations. Examples of such applications include video games and interactive visualisations. The research was carried out in the context of the recently introduced GPU capabilities that enable hardware-accelerated ray tracing. In order to evaluate RTAO, it was compared with a state-of-the-art SSAO method called GTAO.

Both goals stated at the beginning of this thesis were achieved: RTAO and GTAO were evaluated and compared regarding image quality and performance. While the results showed that RTAO was superior in image quality, they also showed that RTAO is considerably more expensive than GTAO, usually by an order of magnitude (or more). While RTAO can be used in a real-time application, it is still costly enough to warrant some concern, particularly if a large number of animated objects is a requirement. For some applications that prioritise visual fidelity, the price paid in performance can be well worth it.

It should also be mentioned that RTAO comes with an added burden of complexity, which mainly stems from interacting with the requisite ray tracing APIs, such as DXR. This added complexity should be weighed as a cost by anyone interested in utilising RTAO; if the renderer already implements hardware-accelerated ray tracing techniques (for reflections, as an example), this is less of a consideration.

Hardware-accelerated ray tracing is exciting since it opens new possibilities for which algorithms can be performed in a real-time renderer. Ray tracing-based techniques tend to be more general in their applications, which can improve visual quality but also simplify the workflow of engineers and artists alike. The results in this thesis give some clear examples of where hardware-accelerated ray tracing can provide superior results to other approaches, but they also show that performance can still be a concern when using such techniques.

With further iterations of hardware-accelerated ray tracing, ray traced ambient occlusion may become fast enough to be used trivially in real time, which would likely render SSAO methods obsolete.

References

[1] T. Möller, E. Haines, N. Hoffman, A. Pesce, M. Iwanicki, and S. Hillaire, Real-Time Rendering, 4th ed. Boca Raton: Taylor & Francis, CRC Press, 2018. ISBN 978-1-138-62700-0.

[2] “DirectX Raytracing (DXR) Functional Spec.” [Online]. Available: https://microsoft.github.io/DirectX-Specs/d3d/Raytracing.html

[3] M. Mittring, “Finding next Gen: CryEngine 2,” in ACM SIGGRAPH 2007 Courses, ser. SIGGRAPH ’07. New York, NY, USA: Association for Computing Machinery, 2007. doi: 10.1145/1281500.1281671. ISBN 978-1-4503-1823-5 pp. 97–121, event-place: San Diego, California. [Online]. Available: https://doi.org/10.1145/1281500.1281671

[4] J. Jimenez, “Practical Real-Time Strategies for Accurate Indirect Occlusion (Presentation),” 2016. [Online]. Available: https://blog.selfshadow.com/publications/s2016-shading-course/

[5] F. P. Aalund, “A comparative study of screen-space ambient occlusion methods,” Bachelor’s Thesis, DTU, Technical University of Denmark, Informatics and Mathematical Modelling, Kgs. Lyngby, Denmark, 2013. [Online]. Available: http://frederikaalund.com/wp-content/uploads/2013/05/A-Comparative-Study-of-Screen-Space-Ambient-Occlusion-Methods.pdf

[6] A. Keller, L. Fascione, M. Fajardo, I. Georgiev, P. Christensen, J. Hanika, C. Eisenacher, and G. Nichols, “The path tracing revolution in the movie industry,” in ACM SIGGRAPH 2015 Courses. Los Angeles, California: ACM, Jul. 2015. doi: 10.1145/2776880.2792699. ISBN 978-1-4503-3634-5 pp. 1–7. [Online]. Available: https://dl.acm.org/doi/10.1145/2776880.2792699

[7] C. R. A. Chaitanya, A. S. Kaplanyan, C. Schied, M. Salvi, A. Lefohn, D. Nowrouzezahrai, and T. Aila, “Interactive reconstruction of Monte Carlo image sequences using a recurrent denoising autoencoder,” ACM Transactions on Graphics, vol. 36, no. 4, pp. 1–12, Jul. 2017. doi: 10.1145/3072959.3073601. [Online]. Available: https://dl.acm.org/doi/10.1145/3072959.3073601

[8] A. Keller, T. Viitanen, C. Barré-Brisebois, C. Schied, and M. McGuire, “Are we done with ray tracing?” in ACM SIGGRAPH 2019 Courses. Los Angeles, California: ACM, Jul. 2019. doi: 10.1145/3305366.3329896. ISBN 978-1-4503-6307-5 pp. 1–381. [Online]. Available: https://dl.acm.org/doi/10.1145/3305366.3329896

[9] P. Ghavamian, “Real-time Raytracing and Screen-space Ambient Occlusion,” Master’s thesis, KTH, School of Electrical Engineering and Computer Science (EECS), 2019. Series: TRITA-EECS-EX, 2019:442.

[10] C. Wyman and A. Marrs, “Introduction to DirectX Raytracing,” in Ray Tracing Gems, E. Haines and T. Akenine-Möller, Eds. Berkeley, CA: Apress, 2019, pp. 21–47. ISBN 978-1-4842-4426-5, 978-1-4842-4427-2. [Online]. Available: http://link.springer.com/10.1007/978-1-4842-4427-2_3

[11] C. Schied, A. Kaplanyan, C. Wyman, A. Patney, C. R. A. Chaitanya, J. Burgess, S. Liu, C. Dachsbacher, A. Lefohn, and M. Salvi, “Spatiotemporal variance-guided filtering: real-time reconstruction for path-traced global illumination,” in Proceedings of High Performance Graphics. Los Angeles, California: ACM, Jul. 2017. doi: 10.1145/3105762.3105770. ISBN 978-1-4503-5101-0 pp. 1–12. [Online]. Available: https://dl.acm.org/doi/10.1145/3105762.3105770

[12] Z. Akhtar and T. H. Falk, “Audio-Visual Multimedia Quality Assessment: A Comprehensive Survey,” IEEE Access, vol. 5, pp. 21090–21117, 2017. doi: 10.1109/ACCESS.2017.2750918. [Online]. Available: http://ieeexplore.ieee.org/document/8030999/

[13] M. Pharr, W. Jakob, and G. Humphreys, Physically Based Rendering: From Theory to Implementation, 3rd ed. Cambridge, MA: Morgan Kaufmann Publishers/Elsevier, 2017. ISBN 978-0-12-800645-0.

[14] J. T. Kajiya, “The rendering equation,” ACM SIGGRAPH Computer Graphics, vol. 20, no. 4, pp. 143–150, Aug. 1986. doi: 10.1145/15886.15902. [Online]. Available: https://dl.acm.org/doi/10.1145/15886.15902

[15] F. Hofmeyer, S. Fremerey, T. Cohrs, and A. Raake, “Impacts of internal HMD playback processing on subjective quality perception,” Electronic Imaging, vol. 2019, no. 12, pp. 219-1–219-7, Jan. 2019. doi: 10.2352/ISSN.2470-1173.2019.12.HVEI-219. [Online]. Available: https://www.ingentaconnect.com/content/10.2352/ISSN.2470-1173.2019.12.HVEI-219

[16] J. Deligiannis and J. Schmid, “”It Just Works”: Ray-Traced Reflections in ’Battlefield V’,” Mar. 2019. [Online]. Available: https://www.gdcvault.com/play/1026282/It-Just-Works-Ray-Traced

[17] H. Landis, “Production-Ready Global Illumination,” in SIGGRAPH RenderMan in Production course, ser. SIGGRAPH ’02, 2002.

[18] R. L. Cook and K. E. Torrance, “A Reflectance Model for Computer Graphics,” ACM Transactions on Graphics, vol. 1, no. 1, pp. 7– 24, Jan. 1982. doi: 10.1145/357290.357293. [Online]. Available: https://dl.acm.org/doi/10.1145/357290.357293

[19] S. Zhukov, A. Iones, and G. Kronin, “An ambient light illumination model,” in Rendering Techniques ’98, G. Drettakis and N. Max, Eds. Vienna: Springer Vienna, 1998. ISBN 978-3-7091-6453-2 pp. 45–55.

[20] M. Pharr, “On the Importance of Sampling,” in Ray Tracing Gems, E. Haines and T. Akenine-Möller, Eds. Berkeley, CA: Apress, 2019, pp. 207–222. ISBN 978-1-4842-4426-5 978-1-4842-4427-2. [Online]. Available: http://link.springer.com/10.1007/978-1-4842-4427-2_15

[21] L. Bavoil, M. Sainz, and R. Dimitrov, “Image-space horizon-based ambient occlusion,” in ACM SIGGRAPH 2008 talks on - SIGGRAPH ’08. Los Angeles, California: ACM Press, 2008. doi: 10.1145/1401032.1401061. ISBN 978-1-60558-343-3 p. 1. [Online]. Available: http://portal.acm.org/citation.cfm?doid=1401032.1401061

[22] S. Paris, P. Kornprobst, J. Tumblin, and F. Durand, “A gentle introduction to bilateral filtering and its applications,” in ACM SIGGRAPH 2007 courses on - SIGGRAPH ’07. San Diego, California: ACM Press, 2007. doi: 10.1145/1281500.1281602. ISBN 978-1-4503-1823-5 p. 1. [Online]. Available: http://dl.acm.org/citation.cfm?doid=1281500.1281602

[23] D. Nehab, P. V. Sander, J. Lawrence, N. Tatarchuk, and J. R. Isidoro, “Accelerating Real-Time Shading with Reverse Reprojection Caching,” in Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware, ser. GH ’07. Goslar, DEU: Eurographics Association, 2007. ISBN 978-1-59593-625-7 pp. 25–35, event-place: San Diego, California.

[24] H. Dammertz, D. Sewtz, J. Hanika, and H. P. A. Lensch, “Edge-Avoiding À-Trous Wavelet Transform for Fast Global Illumination Filtering,” in Proceedings of the Conference on High Performance Graphics, ser. HPG ’10. Goslar, DEU: Eurographics Association, 2010, pp. 67–75, event-place: Saarbrucken, Germany.

[25] C. Schied, C. Peters, and C. Dachsbacher, “Gradient Estimation for Real-time Adaptive Temporal Filtering,” Proceedings of the ACM on Computer Graphics and Interactive Techniques, vol. 1, no. 2, pp. 1–16, Aug. 2018. doi: 10.1145/3233301. [Online]. Available: https://dl.acm.org/doi/10.1145/3233301

[26] A. Galvain, “Ray Tracing Denoising,” Oct. 2020. [Online]. Available: https://alain.xyz/blog/ray-tracing-denoising

[27] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image Quality Assessment: From Error Visibility to Structural Similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, Apr. 2004. doi: 10.1109/TIP.2003.819861. [Online]. Available: http://ieeexplore.ieee.org/document/1284395/

[28] L. Bavoil, M. Sainz, and R. Dimitrov, “Image-space horizon-based ambient occlusion (Presentation),” in ACM SIGGRAPH 2008 talks on - SIGGRAPH ’08. Los Angeles, California: ACM Press, 2008. doi: 10.1145/1401032.1401061. ISBN 978-1-60558-343-3 p. 1. [Online]. Available: http://portal.acm.org/citation.cfm?doid=1401032.1401061

[29] N. L. Max, “Horizon mapping: shadows for bump-mapped surfaces,” The Visual Computer, vol. 4, no. 2, pp. 109–117, Mar. 1988. doi: 10.1007/BF01905562. [Online]. Available: http://link.springer.com/10.1007/BF01905562

[30] V. Timonen, “Line-Sweep Ambient Obscurance,” Computer Graphics Forum (Proceedings of EGSR 2013), vol. 32, no. 4, pp. 97–105, 2013. [Online]. Available: http://wili.cc/research/lsao/

[31] Epic Games, Unreal Engine Sun Temple, Open Research Content Archive (ORCA), Oct. 2017. [Online]. Available: http://developer.nvidia.com/orca/epic-games-sun-temple

Appendix A

Videos

A.1 Extended figures

Figure A.1 – Scenario IQ.1 GTAO SSIM. This graph shows the same results as Figure 4.3a but with a different y-axis (0.8900–0.9200) that makes the difference between the quality configurations more apparent. [Line chart of SSIM over frames 1–600 for the Low, Medium and High configurations at half and full resolution.]

Figure A.2 – Scenario IQ.2 GTAO SSIM. This graph shows the same results as Figure 4.5a but with a different y-axis (0.8400–0.9200) that makes the difference between the quality configurations more apparent. [Line chart of SSIM over frames 1–600 for the Low, Medium and High Quality configurations at half and full resolution.]

Figure A.3 – Scenario IQ.3 GTAO SSIM. This graph shows the same results as Figure 4.7a but with a different y-axis (0.9800–0.9900) that makes the difference between the quality configurations more apparent. [Line chart of SSIM over frames 1–200 for the Low, Medium and High Quality configurations at half and full resolution.]

A.2 Scenario IQ.1 videos

Arnold reference rendering

A.2.1 RTAO

• High Quality

• High Quality, local SSIM map

• Medium Quality

• Medium Quality, local SSIM map

• Low Quality

• Low Quality, local SSIM map

A.2.2 GTAO

• High Quality

• High Quality, local SSIM map

• Medium Quality

• Medium Quality, local SSIM map

• Low Quality

• Low Quality, local SSIM map

A.2.3 GTAO Full-resolution

• High Quality

• High Quality, local SSIM map

• Medium Quality

• Medium Quality, local SSIM map

• Low Quality

• Low Quality, local SSIM map

A.3 Scenario IQ.2 videos

Arnold reference rendering

A.3.1 RTAO

• High Quality

• High Quality, local SSIM map

• Medium Quality

• Medium Quality, local SSIM map

• Low Quality

• Low Quality, local SSIM map

A.3.2 GTAO

• High Quality

• High Quality, local SSIM map

• Medium Quality

• Medium Quality, local SSIM map

• Low Quality

• Low Quality, local SSIM map

A.3.3 GTAO Full-resolution

• High Quality

• High Quality, local SSIM map

• Medium Quality

• Medium Quality, local SSIM map

• Low Quality

• Low Quality, local SSIM map

A.4 Scenario IQ.3 videos

Arnold reference rendering

A.4.1 RTAO

• High Quality

• High Quality, local SSIM map

• Medium Quality

• Medium Quality, local SSIM map

• Low Quality

• Low Quality, local SSIM map

A.4.2 GTAO

• High Quality

• High Quality, local SSIM map

• Medium Quality

• Medium Quality, local SSIM map

• Low Quality

• Low Quality, local SSIM map

A.4.3 GTAO Full-resolution

• High Quality

• High Quality, local SSIM map

• Medium Quality

• Medium Quality, local SSIM map

• Low Quality

• Low Quality, local SSIM map

{ "Author1": { "Last name": "Waldner", "First name": "Fabian", "E-mail": "[email protected]", "organisation": {"L1": "School of Industrial Engineering and Management ", } }, "Degree": {"Educational program": "Master’s Programme, Industrial Engineering and Management, 120 credits"}, "Title": { "Main title": "Real-time Ray Traced Ambient Occlusion and Animation", "Subtitle": "Image quality and performance of hardware-accelerated ray traced ambient occlusion", "Language": "eng" }, "Alternative title": { "Main title": "Strålspårad ambient ocklusion i realtid med animationer", "Subtitle": "Bildkvalité och prestanda av hårdvaruaccelererad, strålspårad ambient ocklusion", "Language": "swe" }, "Supervisor1": { "Last name": "Peters", "First name": "Christopher", "E-mail": "[email protected]", "organisation": {"L1": "School of Electrical Engineering and Computer Science ", "L2": "Division of Computational Science and Technology" } }, "Examiner1": { "Last name": "Weinkauf", "First name": "Tino", "E-mail": "[email protected]", "organisation": {"L1": "School of Electrical Engineering and Computer Science ", "L2": "Division of Computational Science and Technology" } }, "Other information": { "Year": "2021", "Number of pages": "xviii,115"} } TRITA -EECS-EX-2021:222
