Optix: a General Purpose Ray Tracing Engine
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Ray-Tracing in Vulkan.Pdf
Ray-tracing in Vulkan A brief overview of the provisional VK_KHR_ray_tracing API Jason Ekstrand, XDC 2020 Who am I? ▪ Name: Jason Ekstrand ▪ Employer: Intel ▪ First freedesktop.org commit: wayland/31511d0e dated Jan 11, 2013 ▪ What I work on: Everything Intel but not OpenGL front-end – src/intel/* – src/compiler/nir – src/compiler/spirv – src/mesa/drivers/dri/i965 – src/gallium/drivers/iris 2 Vulkan ray-tracing history: ▪ On March 19, 2018, Microsoft announced DirectX Ray-tracing (DXR) ▪ On September 19, 2018, Vulkan 1.1.85 included VK_NVX_ray_tracing for ray- tracing on Nvidia RTX GPUs ▪ On March 17, 2020, Khronos released provisional cross-vendor extensions: – VK_KHR_ray_tracing – SPV_KHR_ray_tracing ▪ Final cross-vendor Vulkan ray-tracing extensions are still in-progress within the Khronos Vulkan working group 3 Overview: My objective with this presentation is mostly educational ▪ Quick overview of ray-tracing concepts ▪ Walk through how it all maps to the Vulkan ray-tracing API ▪ Focus on the provisional VK/SPV_KHR_ray_tracing extension – There are several details that will likely be different in the final extension – None of that is public yet, sorry. – The general shape should be roughly the same between provisional and final ▪ Not going to discuss details of ray-tracing on Intel GPUs 4 What is ray-tracing? All 3D rendering is a simulation of physical light 6 7 8 Why don't we do all 3D rendering this way? The primary problem here is wasted rays ▪ The chances of a random photon from the sun hitting your scene is tiny – About 1 in -
CUDA-SCOTTY Fast Interactive CUDA Path Tracer Using Wide Trees and Dynamic Ray Scheduling
15-618: Parallel Computer Architecture and Programming CUDA-SCOTTY Fast Interactive CUDA Path Tracer using Wide Trees and Dynamic Ray Scheduling “Golden Dragon” TEAM MEMBERS Sai Praveen Bangaru (Andrew ID: saipravb) Sam K Thomas (Andrew ID: skthomas) Introduction Path tracing has long been the select method used by the graphics community to render photo-realistic images. It has found wide uses across several industries, and plays a major role in animation and filmmaking, with most special effects rendered using some form of Monte Carlo light transport. It comes as no surprise, then, that optimizing path tracing algorithms is a widely studied field, so much so, that it has it’s own top-tier conference (HPG; High Performance Graphics). There are generally a whole spectrum of methods to increase the efficiency of path tracing. Some methods aim to create better sampling methods (Metropolis Light Transport), while others try to reduce noise in the final image by filtering the output (4D Sheared transform). In the spirit of the parallel programming course 15-618, however, we focus on a third category: system-level optimizations and leveraging hardware and algorithms that better utilize the hardware (Wide Trees, Packet tracing, Dynamic Ray Scheduling). Most of these methods, understandably, focus on the ray-scene intersection part of the path tracing pipeline, since that is the main bottleneck. In the following project, we describe the implementation of a hybrid non-packet method which uses Wide Trees and Dynamic Ray Scheduling to provide an 80x improvement over a 8-threaded CPU implementation. Summary Over the course of roughly 3 weeks, we studied two non-packet BVH traversal optimizations: Wide Trees, which involve non-binary BVHs for shallower BVHs and better Warp/SIMD Utilization and Dynamic Ray Scheduling which involved changing our perspective to process rays on a per-node basis rather than processing nodes on a per-ray basis. -
Practical Path Guiding for Efficient Light-Transport Simulation
Eurographics Symposium on Rendering 2017 Volume 36 (2017), Number 4 P. Sander and M. Zwicker (Guest Editors) Practical Path Guiding for Efficient Light-Transport Simulation Thomas Müller1;2 Markus Gross1;2 Jan Novák2 1ETH Zürich 2Disney Research Vorba et al. MSE: 0.017 Reference Ours (equal time) MSE: 0.018 Training: 5.1 min Training: 0.73 min Rendering: 4.2 min, 8932 spp Rendering: 4.2 min, 11568 spp Figure 1: Our method allows efficient guiding of path-tracing algorithms as demonstrated in the TORUS scene. We compare equal-time (4.2 min) renderings of our method (right) to the current state-of-the-art [VKv∗14, VK16] (left). Our algorithm automatically estimates how much training time is optimal, displays a rendering preview during training, and requires no parameter tuning. Despite being fully unidirectional, our method achieves similar MSE values compared to Vorba et al.’s method, which trains bidirectionally. Abstract We present a robust, unbiased technique for intelligent light-path construction in path-tracing algorithms. Inspired by existing path-guiding algorithms, our method learns an approximate representation of the scene’s spatio-directional radiance field in an unbiased and iterative manner. To that end, we propose an adaptive spatio-directional hybrid data structure, referred to as SD-tree, for storing and sampling incident radiance. The SD-tree consists of an upper part—a binary tree that partitions the 3D spatial domain of the light field—and a lower part—a quadtree that partitions the 2D directional domain. We further present a principled way to automatically budget training and rendering computations to minimize the variance of the final image. -
GPU Developments 2018
GPU Developments 2018 2018 GPU Developments 2018 © Copyright Jon Peddie Research 2019. All rights reserved. Reproduction in whole or in part is prohibited without written permission from Jon Peddie Research. This report is the property of Jon Peddie Research (JPR) and made available to a restricted number of clients only upon these terms and conditions. Agreement not to copy or disclose. This report and all future reports or other materials provided by JPR pursuant to this subscription (collectively, “Reports”) are protected by: (i) federal copyright, pursuant to the Copyright Act of 1976; and (ii) the nondisclosure provisions set forth immediately following. License, exclusive use, and agreement not to disclose. Reports are the trade secret property exclusively of JPR and are made available to a restricted number of clients, for their exclusive use and only upon the following terms and conditions. JPR grants site-wide license to read and utilize the information in the Reports, exclusively to the initial subscriber to the Reports, its subsidiaries, divisions, and employees (collectively, “Subscriber”). The Reports shall, at all times, be treated by Subscriber as proprietary and confidential documents, for internal use only. Subscriber agrees that it will not reproduce for or share any of the material in the Reports (“Material”) with any entity or individual other than Subscriber (“Shared Third Party”) (collectively, “Share” or “Sharing”), without the advance written permission of JPR. Subscriber shall be liable for any breach of this agreement and shall be subject to cancellation of its subscription to Reports. Without limiting this liability, Subscriber shall be liable for any damages suffered by JPR as a result of any Sharing of any Material, without advance written permission of JPR. -
Adding RTX Acceleration to Iray with Optix 7
Adding RTX acceleration to Iray with OptiX 7 Lutz Kettner Director Advanced Rendering and Materials July 30th, SIGGRAPH 2019 What is Iray? Production Rendering on CUDA In Production since > 10 Years Bring ray tracing based production / simulation quality rendering to GPUs New paradigm: Push Button rendering (open up new markets) Plugins for 3ds Max Maya Rhino SketchUp … 2 SIMULATION QUALITY 3 iray legacy ARTISTIC FREEDOM 4 How Does it Work? 99% physically based Path Tracing To guarantee simulation quality and Push Button • Limit shortcuts and good enough hacks to minimum • Brute force (spectral) simulation no intermediate filtering scale over multiple GPUs and hosts even in interactive use GTC 2014 19 VCA * 8 Q6000 GPUs 5 How Does it Work? 99% physically based Path Tracing To guarantee simulation quality and Push Button • Limit shortcuts and good enough hacks to minimum • Brute force (spectral) simulation no intermediate filtering scale over multiple GPUs and hosts even in interactive use • Two-way path tracing from camera and (opt.) lights 6 How Does it Work? 99% physically based Path Tracing To guarantee simulation quality and Push Button • Limit shortcuts and good enough hacks to minimum • Brute force (spectral) simulation no intermediate filtering scale over multiple GPUs and hosts even in interactive use • Two-way path tracing from camera and (opt.) lights • Use NVIDIA Material Definition Language (MDL) 7 How Does it Work? 99% physically based Path Tracing To guarantee simulation quality and Push Button • Limit shortcuts and good -
Directx ® Ray Tracing
DIRECTX® RAYTRACING 1.1 RYS SOMMEFELDT INTRO • DirectX® Raytracing 1.1 in DirectX® 12 Ultimate • AMD RDNA™ 2 PC performance recommendations AMD PUBLIC | DirectX® 12 Ultimate: DirectX Ray Tracing 1.1 | November 2020 2 WHAT IS DXR 1.1? • Adds inline raytracing • More direct control over raytracing workload scheduling • Allows raytracing from all shader stages • Particularly well suited to solving secondary visibility problems AMD PUBLIC | DirectX® 12 Ultimate: DirectX Ray Tracing 1.1 | November 2020 3 NEW RAY ACCELERATOR AMD PUBLIC | DirectX® 12 Ultimate: DirectX Ray Tracing 1.1 | November 2020 4 DXR 1.1 BEST PRACTICES • Trace as few rays as possible to achieve the right level of quality • Content and scene dependent techniques work best • Positive results when driving your raytracing system from a scene classifier • 1 ray per pixel can generate high quality results • Especially when combined with techniques and high quality denoising systems • Lets you judiciously spend your ray tracing budget right where it will pay off AMD PUBLIC | DirectX® 12 Ultimate: DirectX Ray Tracing 1.1 | November 2020 5 USING DXR 1.1 TO TRACE RAYS • DXR 1.1 lets you call TraceRay() from any shader stage • Best performance is found when you use it in compute shaders, dispatched on a compute queue • Matches existing asynchronous compute techniques you’re already familiar with • Always have just 1 active RayQuery object in scope at any time AMD PUBLIC | DirectX® 12 Ultimate: DirectX Ray Tracing 1.1 | November 2020 6 RESOURCE BALANCING • Part of our ray tracing system -
The Growing Importance of Ray Tracing Due to Gpus
NVIDIA Application Acceleration Engines advancing interactive realism & development speed July 2010 NVIDIA Application Acceleration Engines A family of highly optimized software modules, enabling software developers to supercharge applications with high performance capabilities that exploit NVIDIA GPUs. Easy to acquire, license and deploy (most being free) Valuable features and superior performance can be quickly added App’s stay pace with GPU advancements (via API abstraction) NVIDIA Application Acceleration Engines PhysX physics & dynamics engine breathing life into real-time 3D; Apex enabling 3D animators CgFX programmable shading engine enhancing realism across platforms and hardware SceniX scene management engine the basis of a real-time 3D system CompleX scene scaling engine giving a broader/faster view on massive data OptiX ray tracing engine making ray tracing ultra fast to execute and develop iray physically correct, photorealistic renderer, from mental images making photorealism easy to add and produce © 2010 Application Acceleration Engines PhysX • Streamlines the adoption of latest GPU capabilities, physics & dynamics getting cutting-edge features into applications ASAP, CgFX exploiting the full power of larger and multiple GPUs programmable shading • Gaining adoption by key ISVs in major markets: SceniX scene • Oil & Gas Statoil, Open Inventor management • Design Autodesk, Dassault Systems CompleX • Styling Autodesk, Bunkspeed, RTT, ICIDO scene scaling • Digital Content Creation Autodesk OptiX ray tracing • Medical Imaging N.I.H iray photoreal rendering © 2010 Accelerating Application Development App Example: Auto Styling App Example: Seismic Interpretation 1. Establish the Scene 1. Establish the Scene = SceniX = SceniX 2. Maximize interactive 2. Maximize data visualization quality + quad buffered stereo + CgFX + OptiX + volume rendering + ambient occlusion 3. -
Visualization Tools and Trends a Resource Tour the Obligatory Disclaimer
Visualization Tools and Trends A resource tour The obligatory disclaimer This presentation is provided as part of a discussion on transportation visualization resources. The Atlanta Regional Commission (ARC) does not endorse nor profit, in whole or in part, from any product or service offered or promoted by any of the commercial interests whose products appear herein. No funding or sponsorship, in whole or in part, has been provided in return for displaying these products. The products are listed herein at the sole discretion of the presenter and are principally oriented toward transportation practitioners as well as graphics and media professionals. The ARC disclaims and waives any responsibility, in whole or in part, for any products, services or merchandise offered by the aforementioned commercial interests or any of their associated parties or entities. You should evaluate your own individual requirements against available resources when establishing your own preferred methods of visualization. What is Visualization • As described on Wikipedia • Illustration • Information Graphics – visual representations of information data or knowledge • Mental Image – imagination • Spatial Visualization – ability to mentally manipulate 2dimensional and 3dimensional figures • Computer Graphics • Interactive Imaging • Music visual IEEE on Visualization “Traditionally the tool of the statistician and engineer, information visualization has increasingly become a powerful new medium for artists and designers as well. Owing in part to the mainstreaming -
RTX Beyond Ray Tracing
RTX Beyond Ray Tracing Exploring the Use of Hardware Ray Tracing Cores for Tet-Mesh Point Location -Now, let’s run a lot of experiments … I Wald (NVIDIA), W Usher, N Morrical, L Lediaev, V Pascucci (University of Utah) Motivation – What this is about - In this paper: We accelerate Unstructured-Data (Tet Mesh) Volume Ray Casting… NVIDIA Confidential Motivation – What this is about - In this paper: We accelerate Unstructured-Data (Tet Mesh) Volume Ray Casting… - But: This is not what this is (primarily) about - Volume rendering is just a “proof of concept”. - Original question: “What else” can you do with RTX? - Remember the early 2000’s (e.g., “register combiners”): Lots of innovation around “using graphics hardware for non- graphics problems”. - Since CUDA: Much of that has been subsumed through CUDA - Today: Now that we have new hardware units (RTX, Tensor Cores), what else could we (ab-)use those for? (“(ab-)use” as in “use for something that it wasn’t intended for”) NVIDIA Confidential Motivation – What this is about - In this paper: We accelerate Unstructured-Data (Tet Mesh) Volume Ray Casting… - But: This is not what this is (primarily) about - Volume rendering is just a “proof of concept”. - Original question: “What else” can you do with RTX? - Remember the early 2000’s (e.g., “register combiners”): Lots of innovation around “using graphics hardware for non- graphics →problems”.Two main goal(s) of this paper: -a)SinceGet CUDA: readers Much ofto that think has beenabout subsumed the “what through else”s CUDA… - Today: Nowb) Showthat -
Geforce ® RTX 2070 Overclocked Dual
GAMING GeForce RTX™ 2O7O 8GB XLR8 Gaming Overclocked Edition Up to 6X Faster Performance Real-Time Ray Tracing in Games Latest AI Enhanced Graphics Experience 6X the performance of previous-generation GeForce RTX™ 2070 is light years ahead of other cards, delivering Powered by NVIDIA Turing, GeForce™ RTX 2070 brings the graphics cards combined with maximum power efficiency. truly unique real-time ray-tracing technologies for cutting-edge, power of AI to games. hyper-realistic graphics. GRAPHICS REINVENTED PRODUCT SPECIFICATIONS ® The powerful new GeForce® RTX 2070 takes advantage of the cutting- NVIDIA CUDA Cores 2304 edge NVIDIA Turing™ architecture to immerse you in incredible realism Clock Speed 1410 MHz and performance in the latest games. The future of gaming starts here. Boost Speed 1710 MHz GeForce® RTX graphics cards are powered by the Turing GPU Memory Speed (Gbps) 14 architecture and the all-new RTX platform. This gives you up to 6x the Memory Size 8GB GDDR6 performance of previous-generation graphics cards and brings the Memory Interface 256-bit power of real-time ray tracing and AI to your favorite games. Memory Bandwidth (Gbps) 448 When it comes to next-gen gaming, it’s all about realism. GeForce RTX TDP 185 W>5 2070 is light years ahead of other cards, delivering truly unique real- NVLink Not Supported time ray-tracing technologies for cutting-edge, hyper-realistic graphics. Outputs DisplayPort 1.4 (x2), HDMI 2.0b, USB Type-C Multi-Screen Yes Resolution 7680 x 4320 @60Hz (Digital)>1 KEY FEATURES SYSTEM REQUIREMENTS Power -
RTX-Accelerated Hair Brought to Life with NVIDIA Iray (GTC 2020 S22494)
RTX-accelerated Hair brought to Life with NVIDIA Iray (GTC 2020 S22494) Carsten Waechter, March 2020 What is Iray? Production Rendering on CUDA In Production since > 10 Years Bring ray tracing based production / simulation quality rendering to GPUs New paradigm: Push Button rendering (open up new markets) Plugins for 3ds Max Maya Rhino SketchUp … … … 2 What is Iray? NVIDIA testbed and inspiration for new tech NVIDIA Material Definition Language (MDL) evolved from internal material representation into public SDK NVIDIA OptiX 7 co-development, verification and guinea pig NVIDIA RTX / RT Cores scene- and ray-dumps to drive hardware requirements NVIDIA Maxwell…NVIDIA Turing (& future) enhancements profiling/experiments resulting in new features/improvements Design and test/verify NVIDIA’s new Headquarter (in VR) close cooperation with Gensler 3 Simulation Quality 4 iray legacy Artistic Freedom 5 How Does it Work? 99% physically based Path Tracing To guarantee simulation quality and Push Button • Limit shortcuts and good enough hacks to minimum • Brute force (spectral) simulation no intermediate filtering scale over multiple GPUs and hosts even in interactive use • Two-way path tracing from camera and (opt.) lights • Use NVIDIA Material Definition Language (MDL) • NVIDIA AI Denoiser to clean up remaining noise 6 How Does it Work? 99% physically based Path Tracing To guarantee simulation quality and Push Button • Limit shortcuts and good enough hacks to minimum • Brute force (spectral) simulation no intermediate filtering scale over multiple -
Megakernels Considered Harmful: Wavefront Path Tracing on Gpus
Megakernels Considered Harmful: Wavefront Path Tracing on GPUs Samuli Laine Tero Karras Timo Aila NVIDIA∗ Abstract order to handle irregular control flow, some threads are masked out when executing a branch they should not participate in. This in- When programming for GPUs, simply porting a large CPU program curs a performance loss, as masked-out threads are not performing into an equally large GPU kernel is generally not a good approach. useful work. Due to SIMT execution model on GPUs, divergence in control flow carries substantial performance penalties, as does high register us- The second factor is the high-bandwidth, high-latency memory sys- age that lessens the latency-hiding capability that is essential for the tem. The impressive memory bandwidth in modern GPUs comes at high-latency, high-bandwidth memory system of a GPU. In this pa- the expense of a relatively long delay between making a memory per, we implement a path tracer on a GPU using a wavefront formu- request and getting the result. To hide this latency, GPUs are de- lation, avoiding these pitfalls that can be especially prominent when signed to accommodate many more threads than can be executed in using materials that are expensive to evaluate. We compare our per- any given clock cycle, so that whenever a group of threads is wait- formance against the traditional megakernel approach, and demon- ing for a memory request to be served, other threads may be exe- strate that the wavefront formulation is much better suited for real- cuted. The effectiveness of this mechanism, i.e., the latency-hiding world use cases where multiple complex materials are present in capability, is determined by the threads’ resource usage, the most the scene.