GPU Volume Voxelization: Exploration of the Performance Characteristics of Different GPU-Based Implementations
Degree project in Technology, first cycle, 15 credits
Stockholm, Sweden 2019

Aleksandra Soltan
Grigory Glukhov

Examiner: Johan Montelius
Supervisor: Thomas Sjöland

A thesis presented for the degree of Bachelor of Information and Communication Technology

KTH Royal Institute of Technology
School of Electrical Engineering and Computer Science
SE-100 44 Stockholm, Sweden
June 2019

Abstract

In recent years, voxel-based modelling has been reintroduced to computer game development thanks to massive improvements in graphics hardware. Nevertheless, polygons continue to be the default building block of 3D objects, introducing a need for the transformation of polygon meshes into voxel-based models; this process is known as voxelization. Efficient voxelization algorithms take advantage of the flexibility and control offered by modern, programmable GPU pipelines. However, the variability in possible approaches poses the question of how different GPU-based implementations affect voxelization performance.

This thesis explores the impact of GPU-based improvements by comparing four different implementations of a solid voxelization algorithm. The implementations include a naive transition from the CPU to the GPU, a non-branching execution path approach, data pre-processing, and a combination of the two previous approaches. Benchmarking experiments run on four standard polygonal models and three graphics cards (NVIDIA and AMD) provide runtime and memory usage data for each implementation.
A comparative analysis is performed on the basis of this data to determine the performance impact of the GPU-based adjustments to the voxelization algorithm implementation.

Results indicate that the non-branching execution path approach yields clear improvements over the naive implementation, while data pre-processing has inconsistent performance and a large initial performance cost; the combination of the two improvements unsurprisingly leads to combined results. Therefore, the final recommendation is to use the non-branching execution path technique for GPU-based improvements.

Keywords: voxelization, GPU, GPGPU, SIMT, thread divergence, Vulkan API

Sammanfattning (Summary)

Voxel-based modelling has in recent years been reintroduced to computer game development thanks to massive improvements in graphics hardware. Despite this, polygons continue to be the standard for constructing 3D objects. This makes it necessary to be able to transform polygon meshes into voxel-based models; this process is called voxelization. Efficient voxelization algorithms make use of the flexibility and control provided by modern, programmable GPU pipelines. The variation in possible approaches, however, makes it interesting to know how different GPU-based implementations affect voxelization performance.

This thesis investigates the impact of GPU-based improvements by comparing four different implementations of a solid voxelization algorithm. The implementations include a naive transition from the CPU to the GPU, a method with a non-branching execution path, data pre-processing, and a combination of the two previous methods. Benchmarking experiments run on four standard polygon models and three graphics cards (NVIDIA and AMD) provide data on execution time and memory usage for each implementation.
A comparative analysis is made on the basis of this data to determine the impact that the GPU-based changes have on the performance of the voxelization algorithm's implementation.

The results indicate that the implementation with a non-branching execution path gives clear improvements over the naive implementation, while data pre-processing performs inconsistently and has a large initial performance cost; the combination of the two led, not surprisingly, to mixed results. The final recommendation is thus to use the non-branching execution path technique for GPU-based improvements.

Keywords (Nyckelord): voxelization, GPU, GPGPU, SIMT, thread divergence, Vulkan API

Acknowledgements

Special thanks go to Igor Glukhov for helping us make figures, Michael Schwarz for responding to our email regarding details of the implementation of his algorithm, Erik Bauer for translating our abstract to Swedish, as well as Johan Montelius and Thomas Sjöland for answering thesis-related questions. Additionally, we would like to thank the Stanford Computer Graphics Laboratory for publishing the 3D meshes used in our experiments. Finally, we thank Mutate for hosting this thesis for two months.

Stockholm, June 2019
Aleksandra Soltan and Grigory Glukhov

Contents

Abstract
Sammanfattning
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
  1.1 Background
  1.2 Problem definition
  1.3 Purpose
  1.4 Goals
  1.5 Research methodology
  1.6 Delimitations
    1.6.1 Existing algorithm
    1.6.2 Single algorithm
    1.6.3 GPU-specific evaluation
  1.7 Structure of the thesis
2 Background
  2.1 GPU programmability
    2.1.1 GPU computing
    2.1.2 Graphics APIs
    2.1.3 Compute shaders
  2.2 Voxelization
    2.2.1 Rasterization
    2.2.2 Triangle-box test
    2.2.3 Sparse Voxel Octrees
    2.2.4 Schwarz and Seidel algorithm
  2.3 Related work
    2.3.1 Surface voxelization
    2.3.2 Solid voxelization
3 Experimental methodology
  3.1 Tested implementations
  3.2 Experimental setup and data collection
    3.2.1 Models
    3.2.2 Experimental design and implementation
    3.2.3 Data collection
  3.3 Testing environment
    3.3.1 Hardware
    3.3.2 Software
4 Design and implementation
  4.1 Design
    4.1.1 Naive approach
    4.1.2 Non-branching execution path approach
    4.1.3 Data pre-processing approach
    4.1.4 Combined approach
  4.2 Implementation
    4.2.1 Naive approach
    4.2.2 Non-branching execution path approach
    4.2.3 Data pre-processing approach
    4.2.4 Combined approach
5 Results and analysis
  5.1 Major results
    5.1.1 Mean runtime performance
    5.1.2 Relative runtime performance
    5.1.3 Mean runtime performance with pre-processing
    5.1.4 GPU memory requirement
  5.2 Discussion
    5.2.1 Runtime performance
    5.2.2 Memory performance
6 Conclusions and future work
  6.1 Conclusions
  6.2 Limitations
  6.3 Future work
References
Appendix

List of Figures

1 Graphics pipeline
2 Voxelization
3 Edge function test
4 Triangle plane test
5 Example octree
6 Tile assignment
7 Tile processing
8 Tested meshes
9 Experiment design structure
10 Implementation flow chart
11 Tile data structure
12 Triangle data structure
13 Mean runtime performance results (GTX 1070)
14 Mean runtime performance results (RTX 2070)
15 Mean runtime performance results (RX Vega 64)
16 Relative runtime improvement results (GTX 1070)
17 Relative runtime improvement results (RTX 2070)
18 Relative runtime improvement results (RX Vega 64)
19 Runtime results with pre-processing (GTX 1070)
20 Runtime results with pre-processing (RTX 2070)
21 Runtime results with pre-processing (RX Vega 64)
22 Memory requirement results

List of Tables

1 Algorithm comparison on basis of criteria fulfillment
2 Testbed hardware setups
3 Average voxelization times for different meshes, excluding pre-processing time

1 Introduction

For decades, polygons have been the default building block of 3D models in computer graphics. However, recent claims regarding "unlimited detail" [1], improved scalability [2], and intuitive content manipulation [2] have reignited interest in voxel representation of 3D data. Voxels are discrete cubes used to construct volumetric objects; due to voxels' comparability to real-world atoms, they offer a higher level of detail and greater freedom for manipulation of 3D models. Voxel model representation has long been widely used in medical imaging such as CAT scans and MRIs [3, 4]; recently, its use in computer game development has accelerated, with applications such as global illumination [5], terrain representation [6], and pathfinding [7].

The process of transforming a polygon representation of a 3D model into a voxel-based one is called voxelization. Several notable algorithms detailing this process come from a 2010 report by Schwarz and Seidel [8]; their binary voxelization methods are the current defining work in the field, having inspired several papers proposing novel approaches to voxelization [9–13].

Schwarz and Seidel utilize the programmability offered by modern GPUs through NVIDIA's CUDA parallel computing platform, which allows far more flexibility and control over how data is computed and processed on the GPU. Breaking out of the limitations of fixed-function rasterization leads to new approaches like direct voxelization into Sparse Voxel Octrees [8, 14, 15].

Within Schwarz and Seidel's tile-based solid voxelization algorithm there are opportunities for varying GPU-based implementations, ranging from a simple, naive approach to advanced, multiple-pass techniques with data pre-processing.
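To illustrate what binary solid voxelization computes, the following is a minimal CPU sketch in Python. It is not the GPU implementation studied in this thesis, only a small parity-based formulation of the idea. It assumes a closed (watertight) triangle mesh, unit-sized voxels, and voxel-column centers that never fall exactly on a projected triangle edge; a production implementation would need a tie-breaking fill rule for that case. Each triangle flips the state of every voxel whose center lies above it along the z axis, so voxels enclosed by the surface are flipped an odd number of times and end up set. The names `edge`, `box_mesh`, and `voxelize_solid` are hypothetical helpers introduced only for this example.

```python
def edge(a, b, p):
    """2D edge function: twice the signed area of (a, b, p) in the xy-plane."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def box_mesh(lo, hi):
    """Closed triangle mesh (12 triangles) of an axis-aligned box (illustrative input)."""
    x0, y0, z0 = lo
    x1, y1, z1 = hi
    v = [(x0, y0, z0), (x1, y0, z0), (x1, y1, z0), (x0, y1, z0),
         (x0, y0, z1), (x1, y0, z1), (x1, y1, z1), (x0, y1, z1)]
    quads = [(0, 1, 2, 3), (4, 5, 6, 7), (0, 1, 5, 4),
             (1, 2, 6, 5), (2, 3, 7, 6), (3, 0, 4, 7)]
    tris = []
    for a, b, c, d in quads:
        tris.append((v[a], v[b], v[c]))
        tris.append((v[a], v[c], v[d]))
    return tris


def voxelize_solid(triangles, n):
    """Binary solid voxelization of a closed mesh into an n*n*n grid of unit voxels.

    grid[i][j][k] is True when the voxel center (i+0.5, j+0.5, k+0.5) is inside.
    """
    grid = [[[False] * n for _ in range(n)] for _ in range(n)]
    for a, b, c in triangles:
        area = edge(a, b, c)
        if area == 0:
            continue  # triangle is parallel to z: its xy projection is degenerate
        for i in range(n):
            for j in range(n):
                p = (i + 0.5, j + 0.5)  # center of the voxel column
                w0, w1, w2 = edge(b, c, p), edge(c, a, p), edge(a, b, p)
                if area > 0:
                    inside = w0 >= 0 and w1 >= 0 and w2 >= 0
                else:
                    inside = w0 <= 0 and w1 <= 0 and w2 <= 0
                if not inside:
                    continue
                # Barycentric interpolation of the triangle's z at the column center.
                z = (w0 * a[2] + w1 * b[2] + w2 * c[2]) / area
                for k in range(n):
                    if k + 0.5 > z:
                        grid[i][j][k] ^= True  # flip every voxel above the triangle
    return grid


if __name__ == "__main__":
    # Box extents chosen so that no column center lies on a projected edge.
    mesh = box_mesh((1.0, 0.8, 1.0), (3.0, 3.2, 3.0))
    grid = voxelize_solid(mesh, 4)
    print(sum(cell for plane in grid for row in plane for cell in row))  # prints 8
```

The innermost loop is sequential here for readability. Because every flip touches an independent voxel column, the per-triangle and per-column work is what a GPU implementation would distribute across threads.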
Therefore, this thesis proposes four different implementations in order to compare the effects