GPU Accelerated Ray-Tracing for Simulating Sound Propagation in Water
Total Page:16
File Type:pdf, Size:1020Kb
Master of Science Thesis in Electrical Engineering Department of Electrical Engineering, Linköping University, 2019 GPU Accelerated Ray-tracing for Simulating Sound Propagation in Water Joacim Stålberg & Mattias Ulmstedt Master of Science Thesis in Electrical Engineering GPU Accelerated Ray-tracing for Simulating Sound Propagation in Water: Joacim Stålberg & Mattias Ulmstedt LiTH-ISY-EX--19/5247--SE Supervisor: August Ernstsson ida, Linköping university Stefan Persson Saab Dynamics AB Examiner: Ingemar Ragnemalm isy, Linköping university Division of Automatic Control Department of Electrical Engineering Linköping University SE-581 83 Linköping, Sweden Copyright © 2019 Joacim Stålberg & Mattias Ulmstedt Abstract The propagation paths of sound in water can be somewhat complicated due to the fact that the sound speed in water varies with properties such as water tempera- ture and pressure, which has the effect of curving the propagation paths. This the- sis shows how sound propagation in water can be simulated using a ray-tracing based approach on a GPU using Nvidia’s OptiX ray-tracing engine. In particu- lar, it investigates how much speed-up can be achieved compared to CPU based implementations and whether the RT cores introduced in Nvidia’s Turing archi- tecture, which provide hardware accelerated ray-tracing, can be used to speed up the computations. The presented GPU implementation is shown to be up to 310 times faster then the CPU based Fortran implementation Bellhop. Although the speed-up is significant, it is hard to say how much speed-up is gained by utilizing the RT cores due to not having anything equivalent to compare the performance to. iii Acknowledgments We would like to thank Saab Dynamics for providing us with the opportunity and resources to do this project. A special thanks to Stefan Persson, Andreas Gäll- ström and Per-Ola Svensson for their help and direction throughout the project. We would also like to thank Ingemar Ragnemalm for being our examiner, and August Ernstsson for being our academic supervisor at Linköping University. A special thanks to Jens Hovem for his previous work on the topic and for providing us with the source code of PlaneRay which served as a way to validate our implementation. Linköping, June 2019 Joacim Stålberg & Mattias Ulmstedt v Contents List of Figures ix List of Tables xi 1 Introduction 1 1.1 Motivation . 1 1.2 Aim . 2 1.3 Research Questions . 2 1.4 Delimitations . 2 1.5 Outline . 2 2 Sound Propagation in Water 3 2.1 Bottom Reflection Loss . 8 2.2 Surface Reflection Loss . 8 2.3 Acoustic Absorption . 9 2.4 Turning Points and Caustics . 9 2.5 Eigenrays . 9 2.6 Transfer Function . 10 2.7 Practical Uses . 11 3 GPGPU Computing 13 3.1 Patterson’s Three Walls . 13 3.2 GPU Architecture . 14 3.2.1 The Turing Architecture . 14 3.2.2 Nvidia GeForce RTX 2060 . 18 3.3 CUDA . 19 3.3.1 Intrinsic Functions . 19 4 Ray-tracing 21 4.1 Bounding Volume Hierarchy . 23 4.2 OptiX . 25 4.2.1 OptiX Basics . 26 4.2.2 Scenes in OptiX . 27 vii viii Contents 4.2.3 RTX Mode . 28 4.3 DirectX Raytracing . 28 4.4 Previous Work . 28 5 Method 31 5.1 Pre-study . 31 5.2 Implementation . 32 5.3 Evaluation . 33 5.3.1 RT Core Evaluation . 34 5.3.2 Floating-point Precision Format and Intrinsics . 34 5.3.3 Error Metric . 35 5.3.4 Hardware . 35 6 Implementation 37 6.1 Initializations and Parameters . 37 6.2 Modes of Operation . 38 6.3 Ray-tracing . 40 6.3.1 The Scene . 40 6.3.2 Ray Classificaton . 43 6.4 Analyzing and Sorting Collected Data . 43 6.5 Interpolation of Eigenrays . 44 6.6 Signal Processing . 45 6.7 Graphical Presentation . 47 6.8 Memory Considerations . 49 7 Results 51 7.1 Validation . 59 7.2 Floating-point Precision Format and Intrinsics . 59 8 Discussion 65 8.1 Results . 65 8.1.1 Floating-point Precision Format and Intrinsics . 67 8.2 Method . 68 9 Conclusions 71 9.1 Future Work . 71 A Scenario Parameters 75 B Sound Speed Profiles 81 Bibliography 85 List of Figures 2.1 Example of a sound speed profile . 4 2.2 Relation between ray and wave-front propagation . 4 2.3 Example of the modeling of sound propagation in the ocean . 5 2.4 Small segment of a ray path . 6 2.5 Example of a ray’s path through the layers . 7 2.6 Eigenray example . 10 3.1 Comparison between the architecture of CPUs and GPUs . 14 3.2 Architectural overview of a Turing SM . 16 3.3 Shared memory structure . 17 4.1 Ray-tracing in graphics . 22 4.2 Bounding volume hierarchy . 23 4.3 OptiX scene representation . 27 6.1 Data flow of OXSP from input to output . 38 6.2 Ray trajectories for Scenario 8 with 50 rays . 40 6.3 Simplified view of the structure of the OptiX scene in OXSP . 42 6.4 Eigenray interpolation . 44 6.5 Ricker pulse with center frequency 200 Hz in the time domain . 46 6.6 Amplitude spectrum of a Ricker pulse with center frequency 200 Hz 46 6.7 The GUI of OXSP . 47 6.8 Heatmap for Scenario 8 . 48 6.9 Heatmap for a Scenario 9 . 48 6.10 Heatmap for a Scenario 10 . 48 6.11 Heatmap for a Scenario 11 . 48 7.1 Amplitude spectrum of transfer function for Scenario 1 . 52 7.2 Phase spectrum of transfer function for Scenario 1 . 52 7.3 Eigenray contributions . 53 7.4 Final time response for Scenario 1 . 53 7.5 Ray-tracing execution times for Scenario 1 with a varying number ofrays................................... 54 7.6 Time response comparison using triangles/rectangles . 54 ix x LIST OF FIGURES 7.7 Ray-tracing execution times for Scenario 1 with a varying number of layers . 55 7.8 Full execution time of OXSP for Scenario 6 . 55 7.9 Memory transfer time . 57 7.10 S&G comprehensive error factor for Scenario 3 with a varying num- ber of rays . 57 7.11 S&G comprehensive error factor for Scenario 4 with a varying num- ber of rays . 58 7.12 S&G comprehensive error factor for Scenario 5 with a varying num- ber of rays . 58 7.13 S&G comprehensive error factor for Scenario 1 with a varying num- ber of layers . 59 7.14 Comparison of time responses of OXSP and PlaneRay for Scenario 1...................................... 60 7.15 Comparison of time responses of OXSP and PlaneRay for Scenario 2...................................... 60 7.16 Comparison of time responses of OXSP and PlaneRay for Scenario 7...................................... 61 7.17 Time responses for different precision formats and use of intrinsics for Scenario 1 . 61 7.18 Time responses for different precision formats and use of intrinsics for Scenario 5 with 1500 rays . 62 7.19 Average ray-tracing execution times for different precisions for Sce- nario 1 with a varying number of rays . 62 7.20 S&G comprehensive error factor for Scenario 3 with a varying num- ber of rays . 63 7.21 S&G comprehensive error factor for Scenario 4 with a varying num- ber of rays . 63 7.22 S&G comprehensive error factor for Scenario 5 with a varying num- ber of rays . 64 7.23 S&G comprehensive error factor for Scenario 1 with a varying num- ber of layers . 64 B.1 Sound speed profile 1 . 82 B.2 Sound speed profile 2 . 82 B.3 Sound speed profile 3 . 83 List of Tables 3.1 Intrinsic functions in CUDA . 20 6.1 Parameter list . 39 6.2 Data saved when a ray intersects the receiver depth . 41 6.3 Ray interactions and their associated values . 43 7.1 RT-core speed-up . 56 A.1 Scenario 1 parameters . 76 A.2 Scenario 2 parameters . 76 A.3 Scenario 3 parameters . 77 A.4 Scenario 4 parameters . 77 A.5 Scenario 5 parameters . ..