Porting and Maintaining a Developing Optix Path Tracer for Non-NVIDIA Devices

Total Page:16

File Type:pdf, Size:1020Kb

Porting and Maintaining a Developing Optix Path Tracer for Non-NVIDIA Devices Master Thesis Porting and Maintaining a Developing OptiX Path Tracer for Non-NVIDIA Devices Supervisors: Dr. ing. Jacco Bikker, Dr. Zerrin Yumak Satwiko Wirawan Indrawanto ICA-6201539 Faculty of Science Department of Informatics and Computing Science 13 August 2019 Contents 1. Introduction ............................................................................................................................................ 4 1.1 Objectives ..................................................................................................................................... 4 1.2 Research Questions ...................................................................................................................... 4 1.3 Content Overview ........................................................................................................................ 4 2. Preliminaries .......................................................................................................................................... 6 2.1 Rendering ..................................................................................................................................... 6 2.1.1 Rendering Equation ................................................................................................................. 6 2.1.2 Path Tracing ............................................................................................................................. 7 2.1.3 Performance ............................................................................................................................. 8 2.2 Case Study: LightHouse 2 ............................................................................................................ 8 2.2.1 Application Layer .................................................................................................................... 9 2.2.2 Rendering System .................................................................................................................... 9 2.2.3 Rendering Core ........................................................................................................................ 9 2.3 OptiX .......................................................................................................................................... 10 3. Literature Review ................................................................................................................................ 12 3.1 Portability Concepts ................................................................................................................... 12 3.2 Porting Cost ................................................................................................................................ 14 3.3 GPGPU Development Technologies .......................................................................................... 15 3.4 NVIDIA CUDA ......................................................................................................................... 16 3.5 OpenCL ...................................................................................................................................... 17 3.6 HIP ............................................................................................................................................. 18 3.7 Automatic Porting ...................................................................................................................... 19 3.7.1 CU2CL ................................................................................................................................... 19 3.7.2 Hipify ..................................................................................................................................... 19 3.8 Ray Tracing API ......................................................................................................................... 19 3.8.1 Radeon Rays .......................................................................................................................... 19 4. Research Methodology ........................................................................................................................ 21 4.1 Porting Implementation .............................................................................................................. 21 4.2 Portability Experiment ............................................................................................................... 21 4.3 Features Experiment ................................................................................................................... 21 4.4 Performance Experiment ............................................................................................................ 22 5. Implementation .................................................................................................................................... 23 5.1 Device Selection ......................................................................................................................... 23 5.1.1 OpenCL-OpenGL Interoperability ......................................................................................... 24 5.1.2 Use of Persistent Threads ....................................................................................................... 24 2 5.2 Data Structure ............................................................................................................................. 24 5.2.1 Data Alignment ...................................................................................................................... 25 5.2.2 Half Data Type ....................................................................................................................... 26 5.2.3 Buffer with Device Pointers ................................................................................................... 27 5.2.4 Pass-by-Reference .................................................................................................................. 27 5.2.5 Constant Variables and Buffers ............................................................................................. 28 5.2.6 Fixed-point Arithmetic ........................................................................................................... 29 5.3 Ray Tracing Engine .................................................................................................................... 29 5.3.1 Ray Structure ......................................................................................................................... 30 5.3.2 Intersection Structure ............................................................................................................. 30 5.4 Scene Geometry Loading ........................................................................................................... 31 5.5 Rendering ................................................................................................................................... 32 5.5.1 On-render Buffer Initialization .............................................................................................. 32 5.5.2 Primary Rays .......................................................................................................................... 32 5.5.3 Shading .................................................................................................................................. 33 5.5.4 Shadow Rays .......................................................................................................................... 34 5.6 Persistent Threads ...................................................................................................................... 35 6. Results .................................................................................................................................................. 36 6.1 Measure of Software Similarity ................................................................................................. 36 6.2 Image Comparison ..................................................................................................................... 37 6.2.1 Mean Squared Error ............................................................................................................... 38 6.2.2 Structural Similarity ............................................................................................................... 41 6.2.3 Cross-platform Image Comparison ........................................................................................ 43 6.2.4 Cross-core Image Comparison ............................................................................................... 45 6.3 Performance ............................................................................................................................... 48 6.3.1 Cross-platform Performance .................................................................................................. 49 6.3.2 Persistent Threads .................................................................................................................. 50 7. Conclusion ........................................................................................................................................... 52 7.1 Future Work ............................................................................................................................... 53 8. References ...........................................................................................................................................
Recommended publications
  • CUDA-SCOTTY Fast Interactive CUDA Path Tracer Using Wide Trees and Dynamic Ray Scheduling
    15-618: Parallel Computer Architecture and Programming CUDA-SCOTTY Fast Interactive CUDA Path Tracer using Wide Trees and Dynamic Ray Scheduling “Golden Dragon” TEAM MEMBERS Sai Praveen Bangaru (Andrew ID: saipravb) Sam K Thomas (Andrew ID: skthomas) Introduction Path tracing has long been the select method used by the graphics community to render photo-realistic images. It has found wide uses across several industries, and plays a major role in animation and filmmaking, with most special effects rendered using some form of Monte Carlo light transport. It comes as no surprise, then, that optimizing path tracing algorithms is a widely studied field, so much so, that it has it’s own top-tier conference (​HPG; ​High Performance Graphics). There are generally a whole spectrum of methods to increase the efficiency of path tracing. Some methods aim to create ​better sampling methods (Metropolis Light Transport), while others try to ​reduce noise​ in the final image by filtering the output ​(4D Sheared transform)​. In the spirit of the parallel programming course ​15-618​, however, we focus on a third category: system-level optimizations and leveraging hardware and algorithms that better utilize the hardware (​Wide Trees, Packet tracing, Dynamic Ray Scheduling)​. Most of these methods, understandably, focus on the ​ray-scene intersection ​part of the path tracing pipeline, since that is the main bottleneck. In the following project, we describe the implementation of a hybrid ​non-packet​ method which uses ​Wide Trees​ and ​Dynamic Ray Scheduling ​to provide an ​80x ​improvement over a ​8-threaded CPU implementation. Summary Over the course of roughly 3 weeks, we studied two ​non-packet​ BVH traversal optimizations: ​Wide Trees​, which involve non-binary BVHs for shallower BVHs and better Warp/SIMD Utilization and ​Dynamic Ray Scheduling​ which involved changing our perspective to process rays on a per-node basis rather than processing nodes on a per-ray basis.
    [Show full text]
  • Practical Path Guiding for Efficient Light-Transport Simulation
    Eurographics Symposium on Rendering 2017 Volume 36 (2017), Number 4 P. Sander and M. Zwicker (Guest Editors) Practical Path Guiding for Efficient Light-Transport Simulation Thomas Müller1;2 Markus Gross1;2 Jan Novák2 1ETH Zürich 2Disney Research Vorba et al. MSE: 0.017 Reference Ours (equal time) MSE: 0.018 Training: 5.1 min Training: 0.73 min Rendering: 4.2 min, 8932 spp Rendering: 4.2 min, 11568 spp Figure 1: Our method allows efficient guiding of path-tracing algorithms as demonstrated in the TORUS scene. We compare equal-time (4.2 min) renderings of our method (right) to the current state-of-the-art [VKv∗14, VK16] (left). Our algorithm automatically estimates how much training time is optimal, displays a rendering preview during training, and requires no parameter tuning. Despite being fully unidirectional, our method achieves similar MSE values compared to Vorba et al.’s method, which trains bidirectionally. Abstract We present a robust, unbiased technique for intelligent light-path construction in path-tracing algorithms. Inspired by existing path-guiding algorithms, our method learns an approximate representation of the scene’s spatio-directional radiance field in an unbiased and iterative manner. To that end, we propose an adaptive spatio-directional hybrid data structure, referred to as SD-tree, for storing and sampling incident radiance. The SD-tree consists of an upper part—a binary tree that partitions the 3D spatial domain of the light field—and a lower part—a quadtree that partitions the 2D directional domain. We further present a principled way to automatically budget training and rendering computations to minimize the variance of the final image.
    [Show full text]
  • AMD Accelerated Parallel Processing Opencl Programming Guide
    AMD Accelerated Parallel Processing OpenCL Programming Guide November 2013 rev2.7 © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD Accelerated Parallel Processing, the AMD Accelerated Parallel Processing logo, ATI, the ATI logo, Radeon, FireStream, FirePro, Catalyst, and combinations thereof are trade- marks of Advanced Micro Devices, Inc. Microsoft, Visual Studio, Windows, and Windows Vista are registered trademarks of Microsoft Corporation in the U.S. and/or other jurisdic- tions. Other names are for informational purposes only and may be trademarks of their respective owners. OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos. The contents of this document are provided in connection with Advanced Micro Devices, Inc. (“AMD”) products. AMD makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication and reserves the right to make changes to specifications and product descriptions at any time without notice. The information contained herein may be of a preliminary or advance nature and is subject to change without notice. No license, whether express, implied, arising by estoppel or other- wise, to any intellectual property rights is granted by this publication. Except as set forth in AMD’s Standard Terms and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims any express or implied warranty, relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or infringement of any intellectual property right. AMD’s products are not designed, intended, authorized or warranted for use as compo- nents in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other application in which the failure of AMD’s product could create a situation where personal injury, death, or severe property or envi- ronmental damage may occur.
    [Show full text]
  • Comparison of Technologies for General-Purpose Computing on Graphics Processing Units
    Master of Science Thesis in Information Coding Department of Electrical Engineering, Linköping University, 2016 Comparison of Technologies for General-Purpose Computing on Graphics Processing Units Torbjörn Sörman Master of Science Thesis in Information Coding Comparison of Technologies for General-Purpose Computing on Graphics Processing Units Torbjörn Sörman LiTH-ISY-EX–16/4923–SE Supervisor: Robert Forchheimer isy, Linköpings universitet Åsa Detterfelt MindRoad AB Examiner: Ingemar Ragnemalm isy, Linköpings universitet Organisatorisk avdelning Department of Electrical Engineering Linköping University SE-581 83 Linköping, Sweden Copyright © 2016 Torbjörn Sörman Abstract The computational capacity of graphics cards for general-purpose computing have progressed fast over the last decade. A major reason is computational heavy computer games, where standard of performance and high quality graphics con- stantly rise. Another reason is better suitable technologies for programming the graphics cards. Combined, the product is high raw performance devices and means to access that performance. This thesis investigates some of the current technologies for general-purpose computing on graphics processing units. Tech- nologies are primarily compared by means of benchmarking performance and secondarily by factors concerning programming and implementation. The choice of technology can have a large impact on performance. The benchmark applica- tion found the difference in execution time of the fastest technology, CUDA, com- pared to the slowest, OpenCL, to be twice a factor of two. The benchmark applica- tion also found out that the older technologies, OpenGL and DirectX, are compet- itive with CUDA and OpenCL in terms of resulting raw performance. iii Acknowledgments I would like to thank Åsa Detterfelt for the opportunity to make this thesis work at MindRoad AB.
    [Show full text]
  • Megakernels Considered Harmful: Wavefront Path Tracing on Gpus
    Megakernels Considered Harmful: Wavefront Path Tracing on GPUs Samuli Laine Tero Karras Timo Aila NVIDIA∗ Abstract order to handle irregular control flow, some threads are masked out when executing a branch they should not participate in. This in- When programming for GPUs, simply porting a large CPU program curs a performance loss, as masked-out threads are not performing into an equally large GPU kernel is generally not a good approach. useful work. Due to SIMT execution model on GPUs, divergence in control flow carries substantial performance penalties, as does high register us- The second factor is the high-bandwidth, high-latency memory sys- age that lessens the latency-hiding capability that is essential for the tem. The impressive memory bandwidth in modern GPUs comes at high-latency, high-bandwidth memory system of a GPU. In this pa- the expense of a relatively long delay between making a memory per, we implement a path tracer on a GPU using a wavefront formu- request and getting the result. To hide this latency, GPUs are de- lation, avoiding these pitfalls that can be especially prominent when signed to accommodate many more threads than can be executed in using materials that are expensive to evaluate. We compare our per- any given clock cycle, so that whenever a group of threads is wait- formance against the traditional megakernel approach, and demon- ing for a memory request to be served, other threads may be exe- strate that the wavefront formulation is much better suited for real- cuted. The effectiveness of this mechanism, i.e., the latency-hiding world use cases where multiple complex materials are present in capability, is determined by the threads’ resource usage, the most the scene.
    [Show full text]
  • An Advanced Path Tracing Architecture for Movie Rendering
    RenderMan: An Advanced Path Tracing Architecture for Movie Rendering PER CHRISTENSEN, JULIAN FONG, JONATHAN SHADE, WAYNE WOOTEN, BRENDEN SCHUBERT, ANDREW KENSLER, STEPHEN FRIEDMAN, CHARLIE KILPATRICK, CLIFF RAMSHAW, MARC BAN- NISTER, BRENTON RAYNER, JONATHAN BROUILLAT, and MAX LIANI, Pixar Animation Studios Fig. 1. Path-traced images rendered with RenderMan: Dory and Hank from Finding Dory (© 2016 Disney•Pixar). McQueen’s crash in Cars 3 (© 2017 Disney•Pixar). Shere Khan from Disney’s The Jungle Book (© 2016 Disney). A destroyer and the Death Star from Lucasfilm’s Rogue One: A Star Wars Story (© & ™ 2016 Lucasfilm Ltd. All rights reserved. Used under authorization.) Pixar’s RenderMan renderer is used to render all of Pixar’s films, and by many 1 INTRODUCTION film studios to render visual effects for live-action movies. RenderMan started Pixar’s movies and short films are all rendered with RenderMan. as a scanline renderer based on the Reyes algorithm, and was extended over The first computer-generated (CG) animated feature film, Toy Story, the years with ray tracing and several global illumination algorithms. was rendered with an early version of RenderMan in 1995. The most This paper describes the modern version of RenderMan, a new architec- ture for an extensible and programmable path tracer with many features recent Pixar movies – Finding Dory, Cars 3, and Coco – were rendered that are essential to handle the fiercely complex scenes in movie production. using RenderMan’s modern path tracing architecture. The two left Users can write their own materials using a bxdf interface, and their own images in Figure 1 show high-quality rendering of two challenging light transport algorithms using an integrator interface – or they can use the CG movie scenes with many bounces of specular reflections and materials and light transport algorithms provided with RenderMan.
    [Show full text]
  • Getting Started (Pdf)
    GETTING STARTED PHOTO REALISTIC RENDERS OF YOUR 3D MODELS Available for Ver. 1.01 © Kerkythea 2008 Echo Date: April 24th, 2008 GETTING STARTED Page 1 of 41 Written by: The KT Team GETTING STARTED Preface: Kerkythea is a standalone render engine, using physically accurate materials and lights, aiming for the best quality rendering in the most efficient timeframe. The target of Kerkythea is to simplify the task of quality rendering by providing the necessary tools to automate scene setup, such as staging using the GL real-time viewer, material editor, general/render settings, editors, etc., under a common interface. Reaching now the 4th year of development and gaining popularity, I want to believe that KT can now be considered among the top freeware/open source render engines and can be used for both academic and commercial purposes. In the beginning of 2008, we have a strong and rapidly growing community and a website that is more "alive" than ever! KT2008 Echo is very powerful release with a lot of improvements. Kerkythea has grown constantly over the last year, growing into a standard rendering application among architectural studios and extensively used within educational institutes. Of course there are a lot of things that can be added and improved. But we are really proud of reaching a high quality and stable application that is more than usable for commercial purposes with an amazing zero cost! Ioannis Pantazopoulos January 2008 Like the heading is saying, this is a Getting Started “step-by-step guide” and it’s designed to get you started using Kerkythea 2008 Echo.
    [Show full text]
  • 3D Animation
    Contents Zoom In Zoom Out For navigation instructions please click here Search Issue Next Page ComputerINNOVATIONS IN VISUAL COMPUTING FOR THE GLOBAL DCC COMMUNITY June 2007 www.cgw.com WORLD Making Waves Digital artists create ‘pretend spontaneity’ in the documentary-style animation Surf’s Up $4.95 USA $6.50 Canada Contents Zoom In Zoom Out For navigation instructions please click here Search Issue Next Page A CW Previous Page Contents Zoom In Zoom Out Front Cover Search Issue Next Page BEF MaGS _____________________________________________________ A CW Previous Page Contents Zoom In Zoom Out Front Cover Search Issue Next Page BEF MaGS A CW Previous Page Contents Zoom In Zoom Out Front Cover Search Issue Next Page BEF MaGS June 2007 • Volume 30 • Number 6 INNOVATIONS IN VISUAL COMPUTING FOR THE GLOBAL DCC COMMUNITY Also see www.cgw.com for computer graphics news, special surveys and reports, and the online gallery. ____________ » Director Luc Besson discusses Computer WORLD his black-and-white fi lm, WORLD Post Angel-A. » Trends in broadcast design. » Getting the most out of canned music and sound. See it in www.postmagazine.com Features Cover story Radical, Dude 12 3D ANIMATION | In one of the most unusual animated features to hit the Departments screen, Surf’s Up incorporates a documentary fi lming style into the Editor’s Note 2 CG medium. Triple the Fun Summer blockbusters are making their By Barbara Robertson debut at theaters, and this year, it is Wrangling Waves 18 apparent that three’s a charm, as ani- 3D ANIMATION | The visual effects mators upped the graphics ante in 12 supervisor on Surf’s Up takes us on an Spider-Man 3, Shrek 3, and At World’s incredible behind-the-scenes journey End.
    [Show full text]
  • Kerkythea 2007 Rendering System
    GETTING STARTED PHOTO REALISTIC RENDERS OF YOUR 3D MODELS Available for Ver. 1.01 © Kerkythea 2008 Echo Date: April 24th, 2008 GETTING STARTED Page 1 of 41 Written by: The KT Team GETTING STARTED Preface: Kerkythea is a standalone render engine, using physically accurate materials and lights, aiming for the best quality rendering in the most efficient timeframe. The target of Kerkythea is to simplify the task of quality rendering by providing the necessary tools to automate scene setup, such as staging using the GL real-time viewer, material editor, general/render settings, editors, etc., under a common interface. Reaching now the 4th year of development and gaining popularity, I want to believe that KT can now be considered among the top freeware/open source render engines and can be used for both academic and commercial purposes. In the beginning of 2008, we have a strong and rapidly growing community and a website that is more "alive" than ever! KT2008 Echo is very powerful release with a lot of improvements. Kerkythea has grown constantly over the last year, growing into a standard rendering application among architectural studios and extensively used within educational institutes. Of course there are a lot of things that can be added and improved. But we are really proud of reaching a high quality and stable application that is more than usable for commercial purposes with an amazing zero cost! Ioannis Pantazopoulos January 2008 Like the heading is saying, this is a Getting Started “step-by-step guide” and it’s designed to get you started using Kerkythea 2008 Echo.
    [Show full text]
  • Path Tracing
    Advanced Computer Graphics Path Tracing Matthias Teschner Computer Science Department University of Freiburg Motivation global illumination tries to account for all light-transport mechanisms in a scene considers direct and indirect illumination (emitted and reflected radiance) allows for effects such as, e.g., interreflections (surfaces illuminate direct direct illumination illumination each other, potentially changing their colors) caustics (reflected radiance from indirect surfaces is focused at illumination a scene point) University of Freiburg – Computer Science Department – Computer Graphics - 2 [Suffern] Motivation caustics and color transfer / bleeding from red and green side walls, i.e. interreflections http://www.cse.iitb.ac.in/~rhushabh/ University of Freiburg – Computer Science Department – Computer Graphics - 3 [Rhushabh Goradia] Outline path tracing brute-force path tracing direct illumination indirect illumination University of Freiburg – Computer Science Department – Computer Graphics - 4 Path Tracing - Concept generate light transport paths (a chain of rays) from visible surface points to light sources rays are traced recursively until hitting a light source recursively evaluate the rendering equation along a path ray generation in a path is governed by light sources and BRDFs recursion depth is generally governed by the amount of radiance along a ray can distinguish direct and indirect illumination direct: emitted radiance from light sources indirect: reflected radiance from surfaces (and light sources)
    [Show full text]
  • Path Tracing on Massively Parallel Neuromorphic Hardware
    EG UK Theory and Practice of Computer Graphics (2012) Hamish Carr and Silvester Czanner (Editors) Path Tracing on Massively Parallel Neuromorphic Hardware P. Richmond1 and D.J. Allerton1 1Department of Automatic Control Systems Engineering, University of Sheffield Abstract Ray tracing on parallel hardware has recently benefit from significant advances in the graphics hardware and associated software tools. Despite this, the SIMD nature of graphics card architectures is only able to perform well on groups of coherent rays which exhibit little in the way of divergence. This paper presents SpiNNaker, a massively parallel system based on low power ARM cores, as an architecture suitable for ray tracing applications. The asynchronous design allows us to demonstrate a linear performance increase with respect to the number of cores. The performance per Watt ratio achieved within the fixed point path tracing example presented is far greater than that of a multi-core CPU and similar to that of a GPU under optimal conditions. Categories and Subject Descriptors (according to ACM CCS): I.3.1 [Computer Graphics]: Hardware Architecture— Parallel processing I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism—Ray Tracing 1. Introduction million cores) and highly interconnected architecture which Ray tracing offers a significant departure from traditional considers power efficiency as a primary design goal. Origi- rasterized graphics with the promise of more naturally oc- nally designed for the purpose of simulating large neuronal curring lighting effects such as soft shadows, global illu- models in real time, the architecture is based on low power mination and caustics. Understandably this improved visual (passively cooled) asynchronous ARM processors with pro- realism comes at a large computational cost.
    [Show full text]
  • Graviton: Trusted Execution Environments on Gpus
    In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI’18) Graviton: Trusted Execution Environments on GPUs Stavros Volos Kapil Vaswani Rodrigo Bruno Microsoft Research Microsoft Research INESC-ID / IST, University of Lisbon Abstract tors. This limitation gives rise to an undesirable trade-off between security and performance. We propose Graviton, an architecture for supporting There are several reasons why adding TEE support trusted execution environments on GPUs. Graviton en- to accelerators is challenging. With most accelerators, ables applications to offload security- and performance- a device driver is responsible for managing device re- sensitive kernels and data to a GPU, and execute kernels sources (e.g., device memory) and has complete control in isolation from other code running on the GPU and all over the device. Furthermore, high-throughput accelera- software on the host, including the device driver, the op- tors (e.g., GPUs) achieve high performance by integrat- erating system, and the hypervisor. Graviton can be in- ing a large number of cores, and using high bandwidth tegrated into existing GPUs with relatively low hardware memory to satisfy their massive bandwidth requirements complexity; all changes are restricted to peripheral com- [4, 11]. Any major change in the cores, memory man- ponents, such as the GPU’s command processor, with agement unit, or the memory controller can result in no changes to existing CPUs, GPU cores, or the GPU’s unacceptably large overheads. For instance, providing MMU and memory controller. We also propose exten- memory confidentiality and integrity via an encryption sions to the CUDA runtime for securely copying data engine and Merkle tree will significantly impact avail- and executing kernels on the GPU.
    [Show full text]