USING BINDLESS RESOURCES with DIRECTX RAYTRACING Matt Pettineo Ready at Dawn Studios

CHAPTER 17

ABSTRACT

Resource binding in Direct3D 12 can be complex and difficult to implement correctly, particularly when used in conjunction with DirectX Raytracing. This chapter explains how to use bindless techniques to provide shaders with global access to all resources, which can simplify application and shader code while also enabling new techniques.

© NVIDIA 2021 A. Marrs, P. Shirley, I. Wald (eds.), Ray Tracing Gems II, https://doi.org/10.1007/978-1-4842-7185-8_17

17.1 INTRODUCTION

Prior to the introduction of Direct3D 12, GPU texture and buffer resources were accessed using a simple CPU-driven binding model. The GPU’s resource access capabilities were typically exposed as a fixed set of “slots” that were tied to a particular stage of the logical GPU pipeline, and API functions were provided that allowed the CPU to “bind” a resource view to one of the exposed slots. This sort of binding model was a natural fit for earlier GPUs, which typically featured a fixed set of hardware descriptor registers that were used by the shader cores to access resources.

Figure 17-1. The Unreal Engine Sun Temple scene [5] rendered with an open source DXR path tracer [9] that uses bindless resources.

While this old style of binding was relatively simple and well understood, it naturally came with many limitations. The limited number of binding slots meant that programs could typically only bind the exact set of resources that would be accessed by a particular shader program, and this binding often had to be repeated before every draw or dispatch. The CPU-driven nature of binding demanded that a shader’s required resources be statically known after compilation, which naturally led to inherent restrictions on the complexity of a shader program. As ray tracing on the GPU started to gain traction, the classic binding model reached its breaking point.
Ray tracing tends to be an inherently global process: one shader program might launch rays that could potentially interact with every material in the scene. This is largely incompatible with the notion of having the CPU bind a fixed set of resources prior to dispatch. Techniques such as atlasing or Sparse Virtual Texturing [1] can be viable as a means of emulating global resource access, but may also require adding significant complexity to a renderer.

Fortunately, newer GPUs and APIs no longer suffer from the same limitations. Most recent GPU architectures have shifted to a model where resource descriptors can be loaded from memory instead of from registers, and in some cases they can also access resources directly from a memory address. This removes the prior restrictions on the number of resources that can be accessed by a particular shader program, and also opens the door for those shader programs to dynamically choose which resource is actually accessed.

This newfound flexibility is directly reflected in the binding model of Direct3D 12, which has been completely revamped compared to previous versions of the API. In particular, it supports features that collectively enable a technique commonly known as bindless resources [2]. When implemented, bindless techniques effectively provide shader programs with global access to the full set of textures and buffers that are present on the GPU. Instead of requiring the CPU to bind a view for each individual resource, shaders can access an individual resource using a simple 32-bit index that can be freely embedded in user-defined data structures. While this level of flexibility can be incredibly useful in more traditional rasterization scenarios [6], it is borderline essential when using DirectX Raytracing (DXR).
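The index-based access described above can be illustrated with a small CPU-side sketch. This is only an analogy under stated assumptions, not the chapter's implementation: `Material`, `GeometryInfo`, `Scene`, `ResolveAlbedo`, and all field names are hypothetical, and a plain array of names stands in for the descriptor heap that a shader would index on the GPU.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical material record: each field is a 32-bit index into one global
// table of texture descriptors instead of a CPU-bound slot.
struct Material
{
    uint32_t AlbedoTextureIdx;
    uint32_t NormalTextureIdx;
};

// Hypothetical per-geometry record that refers to its material by index.
struct GeometryInfo
{
    uint32_t MaterialIdx;
};

// Stand-in for globally accessible GPU data. On the GPU each DescriptorHeap
// entry would be an SRV descriptor; here it is a name that can be inspected.
struct Scene
{
    std::vector<std::string> DescriptorHeap;
    std::vector<Material> Materials;
    std::vector<GeometryInfo> Geometry;
};

// Resolves the albedo texture for whichever geometry a ray happened to hit,
// following only indices stored in ordinary data structures.
const std::string& ResolveAlbedo(const Scene& scene, uint32_t hitGeometryIdx)
{
    assert(hitGeometryIdx < scene.Geometry.size());
    const GeometryInfo& geo = scene.Geometry[hitGeometryIdx];

    assert(geo.MaterialIdx < scene.Materials.size());
    const Material& mat = scene.Materials[geo.MaterialIdx];

    assert(mat.AlbedoTextureIdx < scene.DescriptorHeap.size());
    return scene.DescriptorHeap[mat.AlbedoTextureIdx];
}
```

Nothing in the lookup depends on which resources were selected before the call. Any geometry index can be resolved at any time, which is the property that a ray tracing shader needs when a ray may hit any surface in the scene.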
The remainder of this chapter covers the details of how to enable bindless resource access using Direct3D 12 (D3D12), as well as the basics of how to use bindless techniques in a DXR ray tracer. Basic familiarity with both D3D12 and DXR is assumed; for background, we refer the reader to an introductory chapter from the first volume of Ray Tracing Gems [11].

17.2 TRADITIONAL BINDING WITH DXR

Like the rest of D3D12, DXR utilizes root signatures to specify how resources should be made available to shader programs. These root signatures specify collections of root descriptors, descriptor tables, and 32-bit constants and map those to ranges of HLSL binding registers. When using DXR, we actually deal with two different types of root signatures: a global root signature and a local root signature. The global root signature is applicable to the ray generation shader as well as all executed miss, any-hit, closest-hit, and intersection shaders. The local root signature only applies to a particular hit group. Used together, global and local root signatures can implement a fairly traditional binding model where the CPU code “pushes” descriptors for all resources that are needed by the shader program. In a typical rendering scenario, this would likely involve having the local root signature provide a descriptor table containing all textures and constant buffers required by the particular material assigned to the mesh in the hit group. An example of this traditional model of resource binding is shown in Figure 17-2.

Though this approach can be workable, there are several problems that make it less than ideal. First, the mechanics of the local root signature are somewhat inconsistent with how root signatures normally work within D3D12. Standard root signatures require using command list APIs to specify which constants, root descriptors, and descriptor tables should be bound to corresponding entries in the root signature.
Local root signatures do not work this way because there can be many different root signatures contained within a single state object. Instead, the parameters for the root signature entries must be placed inline within a shader record in the hit group shader table, immediately following the shader identifier. This setup is further complicated by the fact that both shader records and root signature parameters have specific alignment requirements that must be observed. Shader identifiers consume 32 bytes (D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES) and must also be located at an offset that is aligned to 32 bytes (D3D12_RAYTRACING_SHADER_RECORD_BYTE_ALIGNMENT). Root signature parameters that are 8 bytes in size (such as root descriptors) must also be placed at offsets that are aligned to 8 bytes. Thus, carefully written packing code or helper types have to be used in order to fulfill these specific rules when generating the shader table. The following code shows an example of what shader record helper structs might look like:

    #include <cstdint>   // uint8_t, uint32_t, uint64_t
    #include <cstring>   // memcpy
    #include <d3d12.h>   // D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES,
                         // D3D12_GPU_DESCRIPTOR_HANDLE

    struct ShaderIdentifier
    {
        uint8_t Data[D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES] = { };

        ShaderIdentifier() = default;
        explicit ShaderIdentifier(const void* idPointer)
        {
            memcpy(Data, idPointer, D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES);
        }
    };

    struct HitGroupRecord
    {
        ShaderIdentifier ID;
        D3D12_GPU_DESCRIPTOR_HANDLE SRVTableA = { };
        D3D12_GPU_DESCRIPTOR_HANDLE SRVTableB = { };
        uint32_t Padding1 = 0;      // Ensure that CBV has 8-byte alignment.
        uint64_t CBV = 0;
        uint8_t Padding[8] = { };   // Needed to keep shader ID at
                                    // 32-byte alignment
    };

Figure 17-2. An example of traditional resource binding in D3D12. A root signature contains two entries for Shader Resource View (SRV) descriptor tables, each of which points to a range of contiguous SRV descriptors within a global descriptor heap, as well as a root Constant Buffer View (CBV).

For more complex scenarios with many meshes and materials, the local root signature approach can quickly become unwieldy and error-prone. Generating the local root signature arguments on the CPU as part of filling the hit shader table is the most straightforward approach, but this can consume precious CPU cycles if it needs to be done frequently in order to support dynamic arguments. It also subjects us to some of the same limitations on our shader programs that we had in previous versions of D3D: the set of resources required for a hit shader must be known in advance, and the shader itself must be written so that it uses a static set of resource bindings. Generation of the shader table on the GPU using compute shaders is a viable option that allows for more GPU-driven rendering approaches; however, this can be considerably more difficult to write, validate, and debug compared with the equivalent CPU implementation. It also does not remove the limitations regarding shader programs and dynamically selecting which resources to access.

Both approaches are fundamentally incompatible with the new inline raytracing functionality that was added for DXR 1.1, since using a local root signature is no longer an option. This means that we must consider a less restrictive binding solution in order to make use of the new RayQuery APIs for general rendering scenarios.

17.3 BINDLESS RESOURCES IN D3D12

As mentioned earlier, bindless techniques allow us to effectively provide our shader programs with global access to all currently loaded resources instead of being restricted to a small subset. D3D12 supports bindless access to every resource type that utilizes shader-visible descriptor heaps: Shader Resource Views (SRVs), Constant Buffer Views (CBVs), Unordered Access Views (UAVs), and Samplers.
However, these types each have different limitations that can prevent their use in bindless scenarios, most of which are dictated by the value of D3D12_RESOURCE_BINDING_TIER that is exposed by the device. We will cover some of these limitations in more detail in Section 17.5, but for now we will primarily focus on using bindless techniques for SRVs because they typically form the bulk of resources accessed by shader programs.
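To make the idea of addressing SRVs by index more concrete, the following is a minimal sketch of how an application might hand out 32-bit slots in one large SRV descriptor heap and turn a slot into a descriptor handle. It rests on several assumptions: `CpuDescriptorHandle` is a stand-in for `D3D12_CPU_DESCRIPTOR_HANDLE` so that the code builds without `d3d12.h`, `BindlessSrvHeap` and its methods are hypothetical names, and the free-list policy is just one simple option. In a real application the start handle and increment would come from `GetCPUDescriptorHandleForHeapStart()` and `GetDescriptorHandleIncrementSize()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for D3D12_CPU_DESCRIPTOR_HANDLE (assumption: avoids d3d12.h).
struct CpuDescriptorHandle
{
    size_t ptr;
};

// Hypothetical helper that manages slots in a single SRV descriptor heap.
// The slot index it returns is the 32-bit value a shader would later use.
class BindlessSrvHeap
{
public:
    BindlessSrvHeap(CpuDescriptorHandle heapStart, uint32_t incrementSize,
                    uint32_t numDescriptors)
        : start(heapStart), increment(incrementSize), capacity(numDescriptors)
    {
    }

    // Returns a free slot, reusing released slots before growing.
    uint32_t Allocate()
    {
        if (!freeList.empty())
        {
            const uint32_t index = freeList.back();
            freeList.pop_back();
            return index;
        }
        assert(next < capacity && "descriptor heap is full");
        return next++;
    }

    // Releases a slot so that a later Allocate() can reuse it.
    void Free(uint32_t index)
    {
        assert(index < next);
        freeList.push_back(index);
    }

    // Descriptors are laid out contiguously, so the handle for a slot is the
    // heap start plus the slot index times the per-descriptor increment.
    CpuDescriptorHandle HandleFor(uint32_t index) const
    {
        assert(index < capacity);
        CpuDescriptorHandle handle = start;
        handle.ptr += static_cast<size_t>(index) * increment;
        return handle;
    }

private:
    CpuDescriptorHandle start;
    uint32_t increment;
    uint32_t capacity;
    uint32_t next = 0;
    std::vector<uint32_t> freeList;
};
```

The handle returned by `HandleFor` would be the destination passed to a call such as `CreateShaderResourceView`, while the slot index itself is what gets written into material or geometry data for the shader to read.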
