Real-Time Generation of Kd-Trees for Ray Tracing Using DirectX 11
Master of Science in Engineering: Game and Software Engineering
2017

Real-time generation of kd-trees for ray tracing using DirectX 11

Martin Säll and Fredrik Cronqvist

Dept. Computer Science & Engineering
Blekinge Institute of Technology
SE–371 79 Karlskrona, Sweden

This thesis is submitted to the Department of Computer Science & Engineering at Blekinge Institute of Technology in partial fulfillment of the requirements for the degree of Master of Science in Computer Science. The thesis is equivalent to 20 weeks of full-time studies.

Contact Information:
Authors:
Martin Säll, E-mail: [email protected]
Fredrik Cronqvist, E-mail: [email protected]

University advisor:
Prof. Lars Lundberg, Dept. Computer Science & Engineering, Mail: [email protected]

Examiner:
Veronica Sundstedt, E-mail: [email protected]

Dept. Computer Science & Engineering
Blekinge Institute of Technology
SE–371 79 Karlskrona, Sweden
Internet: www.bth.se
Phone: +46 455 38 50 00
Fax: +46 455 38 50 57

Abstract

Context. Ray tracing has always been a simple but effective way to create a photorealistic scene, but its cost grows as the scene is expanded. Recent improvements in GPU and CPU hardware have made ray tracing faster, making more complex scenes possible in the same amount of processing time. Despite the improvements in hardware, ray tracing is still rarely run at an interactive speed.

Objectives. The aim of this experiment was to implement a new kd-tree generation algorithm using DirectX 11 compute shaders.

Methods. The implementation created during the experiment was tested using two platforms and five scenarios where the generation time for the kd-tree was measured in milliseconds. The results were compared to a sequential implementation running on the CPU.

Results. In the end the kd-tree generation algorithm implemented did not run within our definition of real-time.
Comparing the generation times from the implementations shows that there is a speedup for the GPU implementation compared to our CPU implementation. It also shows linear scaling of the generation time as the number of triangles in the scene increases.

Conclusions. Noticeable limitations encountered during the experiment were that the handling of dynamic structures and the sorting of arrays are limited, which forced us to use less memory-efficient solutions.

Keywords: Ray tracing, Kd-tree, acceleration, DirectX 11, Compute shader.

Contents

Abstract
1 Introduction
  1.1 Ray tracing
  1.2 Kd-tree
    1.2.1 Rebuild per frame
  1.3 Our contribution
2 Related Work
  2.1 Uniform grid
  2.2 Bounding volume hierarchies
  2.3 Kd-tree
    2.3.1 Hardware
    2.3.2 Traversing
    2.3.3 Split selection
3 Method
  3.1 The experiment
    3.1.1 CPU generation
    3.1.2 GPU generation
  3.2 Scenarios
  3.3 Platforms
  3.4 Evaluation method
4 Results
  4.1 CPU creation
  4.2 GPU creation
  4.3 Average summary
  4.4 Tables with exact values
    4.4.1 Speedup
5 Discussion
6 Conclusions
7 Future Work
Appendices
A Code and raw data

List of Algorithms

1 CPU Generation
2 GPU Generation
3 GPU Generation continuation
4 GPU Generation continuation

List of Figures

1.1 Kd-tree representations.
3.1 GPU work flow
3.2 The scenarios used in the testing in order from lowest triangle amount to highest
4.1 Laptop CPU Graph
4.2 Desktop CPU Graph
4.3 Laptop GPU Graph
4.4 Desktop GPU Graph
4.5 Comparison Graph

List of Tables

4.1 Laptop generation times
4.2 Desktop generation times
4.3 Speedup

Chapter 1
Introduction

Realistically rendered scenes are the goal in some applications, but they are often unobtainable. Ray tracing is a way to get this realism without investing in a multitude of systems and techniques.
While ray tracing gives this simplicity, it also comes with a very high performance cost. Some of the performance problems can be avoided by pre-generating the scene. Pre-generated scenes are built at startup, take a long time to generate, and are static: they cannot be altered while the application is running. Ray tracing's main performance issue comes from the ray versus triangle tests that need to be done for every pixel in the window that displays the ray traced environment, as well as from the number of lights every ray has to trace.

When making an application that uses ray tracing there are many things to consider regarding the performance of the application. Should the calculations be done on the GPU or on the CPU? Which language should be used? What kind of additional methods and data structures should be used? All of these choices have their benefits and drawbacks, both on their own and in combination.

What counts as real-time differs from application to application and depends on what the developers consider necessary to achieve their specific goal. In our case real-time is defined as having no pre-generated scenes and no structures built at the startup of the application. The system shall work even when the camera is moved around in the scene, and moving around in the scene shall cause no major changes in the quality of the system.

1.1 Ray tracing

The first ray tracing implementation, now called ray casting, was introduced by Arthur Appel in 1968 [1]. He presented a technique that sent rays from the eye of the viewer, one per pixel; each ray collided with the closest object in the scene, which together with some lighting effects gave the pixel its color. The next step was presented by Turner Whitted in 1979 [23], with a method which not only traced the rays to the first hit but from that point generated new rays that gave additional information to use for the lighting of the pixel. Ray
Ray 1 Chapter 1. Introduction 2 tracing was introduced to the wider public by Eric Graham in 1987 [9], when he created an animated scene that ran in real-time on the Amiga. The tech- nique has been around for awhile, but because of the heavy computations that the technique uses it has never reached a real-time runtime speed for applications with complex scenery. This limitation has stopped the technique from being used for interactive applications like games. In recent years, however, new techniques using the graphics cards ability for massively parallel computations have allowed for drastic improvements in the execution times of ray tracing applications. This has led to new research being done in the area. The main reason that makes ray tracing slow to compute is the intersection tests between rays and triangles. This is because every ray needs to be tested against every triangle to find which tri- angle is closest to the ray origin. The key to achieving real-time performance for a ray tracer in a complex scene is effective and optimized acceleration structures containing the scene geometry to reduce the amounts of ray versus triangle tests. 1.2 Kd-tree A k-dimensional tree (kd-tree) is a space-partitioning data structure used to speedup various kinds of tests. In this project the kd-tree is used to speedup the intersection tests between rays and triangles by reducing the number of pos- sible intersections. This technique uses the triangles in the scene as input and creates a tree structure. The triangles are split according to a predefined con- dition. This split is commonly made using the median triangle as the splitting point to make the tree as balanced as possible. This split is made on the list of triangles that the function takes as input parameters. The function then recur- sively splits the input list into two splits and calls itself with the new lists as the inputs. This continues until an end criterion is met. 
The end criterion is often a threshold on the minimum number of triangles that are to be split, or a threshold on the volume that contains the triangles. Once all the splits reach one of these end conditions the tree is complete and the function terminates. During this splitting process nodes have been created and linked in a tree structure, creating the kd-tree, with the triangles stored in the leaf nodes once the end condition has been met.

Kd-trees are fast to traverse but slow to build. A standard kd-tree cannot be generated for each frame and is usually built at startup. This means that kd-trees cannot handle dynamic objects in scenes, only the static parts. A binary kd-tree was used in this project. The layout of a binary tree is shown in Figure 1.1a, with the root node at the top and left and right child nodes building up the tree. Figure 1.1a also shows which axis each node splits along. Figure 1.1b, taken from Wikipedia [14], shows a graphical representation of a scene split by a kd-tree. The entire box is stored in the root node, which is then split; that split is illustrated by the red border. Each of the resulting halves is stored in one of the root node's child nodes. These are then split in turn, as shown by the green border, and the results are stored in their child nodes.
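
The recursive median split described in this section can be illustrated with a minimal sketch. It is not the implementation evaluated in this thesis. The names `KdNode`, `build_kd_tree`, `centroid` and `count_leaf_triangles` are hypothetical, and several choices are assumptions made for illustration: triangles are ordered by their centroid, the split axis cycles with the depth, and a depth cap stands in for the volume threshold. Each triangle is placed in exactly one child, whereas a kd-tree built for ray tracing would normally reference a triangle in both children when it straddles the split plane.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]


@dataclass
class KdNode:
    axis: Optional[int] = None        # 0 = x, 1 = y, 2 = z; None for leaf nodes
    split: Optional[float] = None     # position of the split plane on that axis
    left: Optional["KdNode"] = None
    right: Optional["KdNode"] = None
    triangles: Optional[List[Triangle]] = None  # set only on leaf nodes


def centroid(tri: Triangle, axis: int) -> float:
    """Centroid coordinate of a triangle along one axis."""
    return (tri[0][axis] + tri[1][axis] + tri[2][axis]) / 3.0


def build_kd_tree(triangles: List[Triangle], depth: int = 0,
                  max_leaf_size: int = 2, max_depth: int = 16) -> KdNode:
    # End criteria: few enough triangles, or the depth cap
    # (assumed here in place of a volume threshold).
    if len(triangles) <= max_leaf_size or depth >= max_depth:
        return KdNode(triangles=list(triangles))

    axis = depth % 3  # assumption: cycle through x, y, z
    ordered = sorted(triangles, key=lambda t: centroid(t, axis))
    mid = len(ordered) // 2  # the median triangle is the splitting point

    return KdNode(
        axis=axis,
        split=centroid(ordered[mid], axis),
        left=build_kd_tree(ordered[:mid], depth + 1, max_leaf_size, max_depth),
        right=build_kd_tree(ordered[mid:], depth + 1, max_leaf_size, max_depth),
    )


def count_leaf_triangles(node: KdNode) -> int:
    """Total number of triangles stored in the leaves below a node."""
    if node.triangles is not None:
        return len(node.triangles)
    return count_leaf_triangles(node.left) + count_leaf_triangles(node.right)
```

Sorting the list at every level keeps the sketch short but is not how a performance-oriented build would select the median. The sketch is only meant to show how the recursion, the median split and the end criteria fit together.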