Volume Rendering with Marching Cubes and Async Compute
Bachelor of Science in Computer Science
May 2019

Volume rendering with Marching cubes and async compute

Max Tlatlik

Faculty of Computing, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden

This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial fulfilment of the requirements for the degree of Bachelor of Science in Computer Science. The thesis is equivalent to 10 weeks of full time studies.

The authors declare that they are the sole authors of this thesis and that they have not used any sources other than those listed in the bibliography and identified as references. They further declare that they have not submitted this thesis at any other institution to obtain a degree.

Contact Information:
Author(s): Max Tlatlik
E-mail: [email protected]
University advisor(s): Lecturer Stefan Petersson, Associate Professor Hans Tap
Department of DIDA

Faculty of Computing
Blekinge Institute of Technology
SE-371 79 Karlskrona, Sweden
Internet: www.bth.se
Phone: +46 455 38 50 00
Fax: +46 455 38 50 57

Abstract

With the addition of the compute shader stage for GPGPU hardware it has become possible to run CPU-like programs on modern GPU hardware. The greatest benefit can be seen for algorithms that are highly parallel in nature, and in the case of volume rendering the Marching cubes algorithm makes for a great candidate due to its simplicity and parallel nature. For this thesis the Marching cubes algorithm was implemented in a compute shader and used in a DirectX 12 framework to determine whether GPU frametime performance can be improved by executing the compute command queue in parallel with the graphics command queue. Results from performance benchmarks show that a gain is present for each benchmarked configuration and that the largest gains, up to 52%, are seen for smaller workloads.
This information could therefore prove useful for game developers who want to improve framerates or decrease development time, but also in other fields such as volume rendering for medical images.

Keywords: Volume rendering, async compute, multi-engine.

Contents

Abstract
1 Introduction
  1.1 Background
  1.2 Problem description
  1.3 Scope
  1.4 Research methodology
2 Theory
  2.1 Volume rendering
  2.2 Marching cubes
  2.3 GPGPU and compute shaders
  2.4 Multi-engine with DirectX 12
3 Implementation
  3.1 Framework overview
  3.2 Density functions
  3.3 Vertex generation
  3.4 Rendering
  3.5 Performance benchmarking
4 Results
  4.1 Density volume creation time
  4.2 Mesh generation time
  4.3 Mesh draw time
  4.4 Performance benchmark results
5 Discussion
  5.1 The benefits of asynchronous compute execution
  5.2 Implementation complexity and limitations
  5.3 Terrain generation at an interactive framerate
  5.4 Summary
6 Conclusion
  6.1 Reflections
  6.2 Future work
References

Chapter 1
Introduction

This first chapter introduces the reader to the topic of volumetric rendering, its history and background, and its relevance for game development. A problem description is also presented, with suggestions on how to solve problems related to the topic, together with the question this thesis tries to answer. Finally, the research methodology used to answer that question is presented.

The introduction is followed by a chapter covering the relevant theory: volumetric rendering, the algorithm used for volume rendering in the framework, important notes about compute shaders and why they were used for this thesis implementation, and finally the concept of multi-engine in DirectX 12.
In chapter 3 details about the implementation are presented, showing how the theory is applied in practice and how the results for performance benchmarking are collected. In chapter 4 the results are presented and in chapter 5 they are discussed. Chapter 5 also discusses the problems and limitations of the implementation as well as terrain generation, and gives a summary of the findings. The report is concluded in chapter 6 with reflections and possible future work for this thesis.

1.1 Background

During game development one of the most time consuming processes is the creation of assets such as 3D models, world sculpting, texturing, etc. In some cases smaller game development studios do not even have the necessary funds for proper asset creation. To speed up content creation developers can make use of procedural mesh generation. Such content creation tools can quickly create meshes that may only need some polishing, thus greatly increasing the speed of the development process. While this is great for offline content creation, such tools can also be adapted to be used online. It is then important that the procedural mesh generation runs in real time at an acceptable framerate. Many algorithms for procedural terrain generation can create meshes on the CPU based on height maps with noise functions such as Perlin noise, Value noise or Worley noise [13]. However, a high complexity terrain is then usually not created during playable runtime but rather during world loading. Terrain generation at high complexity, which can include terrain features such as caves and overhangs, is best suited for the GPU since it is a highly parallel task and the GPU is better suited for parallel processing than the CPU [11].

Utilizing the GPU for procedural terrain generation can accelerate development speed and generate more complex worlds than CPU based heightmap solutions.
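The difference between a heightmap and a volumetric description can be illustrated with a small C++ sketch. It is not taken from the thesis implementation: the function names, the sine/cosine stand-in for a noise function, the spherical cave and the sign convention (positive density means solid) are all assumptions made for illustration. A heightmap stores a single height per (x, z) column, so a vertical column can only cross the surface once, while a density function over (x, y, z) can cross it several times, which is what caves and overhangs require.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>

// Hypothetical signature: density > 0 means solid, density <= 0 means air.
using DensityFn = std::function<float(float, float, float)>;

// A heightmap expressed as a density function: solid below the height, air above.
// The sine/cosine term is only a stand-in for a real noise function.
float heightmapDensity(float x, float y, float z) {
    float height = 4.0f + std::sin(x) * std::cos(z);
    return height - y;
}

// The same terrain with a spherical cave of radius 1 carved out around (0, 2, 0).
float caveDensity(float x, float y, float z) {
    float dx = x, dy = y - 2.0f, dz = z;
    float distance = std::sqrt(dx * dx + dy * dy + dz * dz);
    float outsideCave = distance - 1.0f;  // not positive inside the sphere
    return std::min(heightmapDensity(x, y, z), outsideCave);
}

// Counts how often a vertical column at (x, z) switches between solid and air
// when sampled from y = 0 up to yMax in increments of step.
int surfaceCrossings(const DensityFn& density, float x, float z, float yMax, float step) {
    assert(step > 0.0f);
    int crossings = 0;
    bool solid = density(x, 0.0f, z) > 0.0f;
    for (int i = 1; static_cast<float>(i) * step <= yMax; ++i) {
        float y = static_cast<float>(i) * step;
        bool solidNow = density(x, y, z) > 0.0f;
        if (solidNow != solid) ++crossings;
        solid = solidNow;
    }
    return crossings;
}
```

Sampling the column at x = 0, z = 0 up to y = 8 with a step of 0.25 gives one crossing for `heightmapDensity` and three for `caveDensity`: the cave floor, the cave ceiling and the terrain surface.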
Minecraft has procedural world generation based on voxels¹; while it has simplistic graphics, it creates a unique experience for each new world. To generate the game's world Minecraft makes use of a volume rendering technique. There are many different algorithms available for volume rendering, and one particular algorithm, Marching cubes (MC), presented by Lorensen and Cline (1987) [10], has become very popular for volumetric mesh generation because it is fast to implement on the CPU and generates high-resolution meshes given large enough volumes. The most common use of the MC algorithm is 3D visualization of volume data from magnetic resonance imaging and computed tomography scans [10, 16], but it has also been adapted for real-time interactive applications, such as video games, thanks to the advancements made in GPU hardware and the possibility of running CPU-like programs on a general purpose GPU (GPGPU) with compute shaders. Marching cubes is by nature well suited for parallel execution since it works on independent individual voxels, and it therefore benefits greatly from an implementation that runs on the GPU. This is explained in greater detail in chapter 2.

1.2 Problem description

One major problem with volume rendering is that the space complexity is O(n^3), which heavily impacts performance and memory consumption as the volume increases. To be able to render the volume at an acceptable interactive framerate, one must make use of algorithms, such as Marching cubes [10], that can discard polygons of no interest during generation. There are a few other algorithms that can be used instead, such as Cubical Marching Squares [7], surface nets [6] and HistoPyramids [9], and while these more advanced algorithms can perform better than MC they are also more difficult to implement.
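Both the independence of the individual cells and the cubic growth can be made concrete with a minimal CPU sketch in C++. It is not the compute shader from this thesis: the `DensityVolume` layout, the corner ordering and the convention that a density above the iso value counts as inside are assumptions, and the triangle lookup tables of the full algorithm are left out. Each cell is classified from its own eight corner samples only, which is why every cell could be handled by a separate GPU thread, and cells whose case index is 0 or 255 are not crossed by the surface and can be discarded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical dense scalar field with n samples per axis, stored x-fastest.
struct DensityVolume {
    std::size_t n;
    std::vector<float> samples;  // expected size: n * n * n

    float at(std::size_t x, std::size_t y, std::size_t z) const {
        return samples[(z * n + y) * n + x];
    }
};

// 8-bit case index of the cell whose minimum corner is (x, y, z).
// Bit i is set when corner i lies inside the surface (density > iso).
// The corner order used here is an assumption and must match whatever
// triangle lookup table is used in a full implementation.
std::uint8_t cellCaseIndex(const DensityVolume& v, std::size_t x, std::size_t y,
                           std::size_t z, float iso) {
    unsigned index = 0;
    for (unsigned corner = 0; corner < 8; ++corner) {
        std::size_t cx = x + (corner & 1u);
        std::size_t cy = y + ((corner >> 1) & 1u);
        std::size_t cz = z + ((corner >> 2) & 1u);
        if (v.at(cx, cy, cz) > iso) index |= (1u << corner);
    }
    return static_cast<std::uint8_t>(index);
}

// Visits all (n - 1)^3 cells and counts those crossed by the isosurface.
// Cells that are entirely outside (0) or entirely inside (255) emit no triangles.
std::size_t countSurfaceCells(const DensityVolume& v, float iso) {
    assert(v.n >= 2 && v.samples.size() == v.n * v.n * v.n);
    std::size_t count = 0;
    for (std::size_t z = 0; z + 1 < v.n; ++z)
        for (std::size_t y = 0; y + 1 < v.n; ++y)
            for (std::size_t x = 0; x + 1 < v.n; ++x) {
                std::uint8_t index = cellCaseIndex(v, x, y, z, iso);
                if (index != 0 && index != 255) ++count;
            }
    return count;
}
```

Doubling n multiplies both the number of stored samples and the number of visited cells by roughly eight, while the number of cells that actually contain surface typically grows much more slowly, which is why discarding empty cells early matters.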
During game development, the time available for developing algorithms or creating assets is often limited, and a simple, yet good enough, solution might therefore be more desirable if high performance is not a crucial criterion. This makes Marching cubes a good candidate for problems such as procedural terrain generation. One aim of this thesis is to present the reader with viable modern options for procedural terrain generation techniques that can be used in interactive applications, such as games, on general purpose GPUs. Hopefully this will help developers gain insight into which volume rendering technique to use for their problem. The second aim of this thesis is to answer the following question:

What are the differences in GPU frametime performance with marching cubes mesh generation and mesh rendering for sequential and parallel compute execution?

¹ Voxel is a blend of the words volume and pixel (picture element).

1.3 Scope

Volume rendering has a long history and a lot of work has been done on the topic, especially in the medical and computer graphics areas. The catalogue of relevant information is therefore extensive, which means that a major part of this thesis could be spent on information gathering and literature analysis. For this reason the focus is narrowed down to one algorithm for volume rendering, the Marching cubes algorithm, which is implemented for the project. The practical part of this thesis is focused on examining the difference in GPU frametime performance when the Marching cubes algorithm is executed sequentially and in parallel to mesh rendering. The goal is to push the algorithm to its extremes, and overhead is therefore kept minimal by implementing the algorithm in a very simple and light rendering framework. The framework is custom built with C++ and uses the DirectX 12 API for this purpose. The test environment is also limited to one particular machine with modern hardware.
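The comparison between sequential and parallel execution can be sketched with a very simple timing model in C++. This is an idealised illustration and not the measurement method of this thesis: the function names are hypothetical, and real hardware rarely reaches perfect overlap because the graphics and compute queues share GPU resources. How the gains reported in this thesis are computed is not stated in this chapter, so `relativeGain` is only one possible definition.

```cpp
#include <algorithm>
#include <cassert>

// Frame time when mesh generation and rendering run back to back.
double sequentialFrameMs(double graphicsMs, double computeMs) {
    assert(graphicsMs >= 0.0 && computeMs >= 0.0);
    return graphicsMs + computeMs;
}

// Best-case frame time when both workloads overlap perfectly.
// Measured values are expected to lie between this and the sequential time.
double idealOverlapFrameMs(double graphicsMs, double computeMs) {
    assert(graphicsMs >= 0.0 && computeMs >= 0.0);
    return std::max(graphicsMs, computeMs);
}

// Reduction in frame time as a fraction of the sequential frame time.
double relativeGain(double sequentialMs, double parallelMs) {
    assert(sequentialMs > 0.0);
    return (sequentialMs - parallelMs) / sequentialMs;
}
```

With hypothetical timings of 4 ms for rendering and 2 ms for mesh generation, the sequential frame takes 6 ms and the ideal overlapped frame 4 ms; a measured parallel frame of 4.5 ms would then correspond to a relative gain of 0.25.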
1.4 Research methodology

To be able to show the reader which volume rendering techniques are of particular interest, a literature analysis of relevant articles is carried out to gain knowledge about procedural mesh generation and to get a good understanding of its application in real-world scenarios.