“It Just Works”: Ray-Traced Reflections in ‘Battlefield V’
Johannes Deligiannis, Jan Schmid (EA DICE)
Today we present raytracing
• Project background
• GPU Raytracing Pipeline
• Engine integration of DXR
• GPU Performance
Battlefield V
• FPS set in WWII
• Released Nov 2018
• Raytracing work began Dec 2017
• First DXR game released!
Project Background
• ~10 months dev time
• Use DXR in Battlefield V
  • AO
  • GI
  • Shadows
  • Reflections
• Engineering
  • Yasin Uludag (EA DICE)
  • Johannes Deligiannis (EA DICE)
  • Jiho Choi (NVIDIA)
  • Pawel Kozlowski (NVIDIA)
  • And a bunch of other people! ☺
Main Challenges
• Not a Tech Demo
  • Content is set
  • Game in full production
  • Scope of Engine changes
  • Performance
  • Denoising vs Ray Count
• Early adopter tax
  • API not final
  • Driver hang/bugs
  • BSoD
  • No capture tool (Nsight, Pix)
  • No RTX cards
  • But we shipped it ☺
(Simple) raytracing pipeline
Pipeline: Generate Rays → Intersect/Material Data → Light Rays → Light Combine
Generate Rays
[Figure labels: Lookup Texture, G Buffer]
*Tomasz Stachowiak and Yasin Uludag, SIGGRAPH 2015. “Stochastic Screen-Space Reflections”
Raytracing
MAGIC
Light Rays
float4 light(MaterialData surfaceInfo, float3 rayDir)
{
    float4 radiance = 0; // accumulated over every light type
    foreach (light : pointLights)
        radiance += calcPoint(surfaceInfo, rayDir, light);
    foreach (light : spotLights)
        radiance += calcSpot(surfaceInfo, rayDir, light);
    foreach (light : reflectionVolumes)
        radiance += calcReflVol(surfaceInfo, rayDir, light);
    …
}
Light Combine
[Figure labels: Lookup Texture, Lit Raster result]
Rays contribute less
Bad, bad, bad: very sad crying faces
Sloooow
Very Noisy
Improving raytracing pipeline
Pipeline: Generate Rays → Intersect/Material Data → Light Rays → Light Combine
Added stage: Variable Rate Tracing
Variable Rate Tracing
Max Ratio (per tile)
0    128  128  0
128  256  256  128
0    128  128  0
Classify
0    .5   .5   0
.5   1    1    .5
0    .5   .5   0
Normalize
0    .1   .1   0
.1   .2   .2   .1
0    .1   .1   0
Variable Rate Tracing
[Figure legend: 256 rays, 128 rays, 64 rays, 32 rays]
Variable Rate Tracing
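The classify and normalize steps amount to splitting a fixed ray budget across tiles in proportion to a per-tile weight. A minimal C++ sketch of that idea, assuming a flat array of tile weights; the `allocateRays` name, the rounding rule, and the budget parameter are illustrative assumptions, not taken from the slides:

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: split `rayBudget` rays across tiles in proportion to
// each tile's weight (for example the "Max Ratio" value of that tile).
std::vector<int> allocateRays(const std::vector<float>& tileWeight, int rayBudget)
{
    std::vector<int> rays(tileWeight.size(), 0);
    const float total = std::accumulate(tileWeight.begin(), tileWeight.end(), 0.0f);
    if (total <= 0.0f)
        return rays; // no tile wants rays: trace nothing

    for (std::size_t i = 0; i < tileWeight.size(); ++i) {
        const float share = tileWeight[i] / total; // normalized weight
        // Round to the nearest whole ray (an assumed policy).
        rays[i] = static_cast<int>(share * static_cast<float>(rayBudget) + 0.5f);
    }
    return rays;
}
```

Because each tile is rounded independently, the allocated total can differ slightly from the budget; a production version would need to redistribute the remainder.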
Success!
- More Rays on Water
- More Rays on grazing angles
Problem
Improving raytracing pipeline
Pipeline: Generate Rays → Intersect/Material Data → Light Rays → Light Combine
Added stages: Variable Rate Tracing, Ray Binning
Ray Binning
[Figure: Screen Offset and Angle combine into the Bin Index, e.g. 3012]
Ray Binning
Ray Binning
Rays → Atomic Increment → Local Offsets
Ray 1000: Bin 3011, local offset 0
Ray 1001: Bin 3013, local offset 0
Ray 1002: Bin 3011, local offset 1
Bin counts: Bin 3011 = 2, Bin 3012 = 0, Bin 3013 = 1
Ray Binning
Bin counts (2, 0, 1) → Exclusive Parallel Sum* → bin start offsets
Bin 3011 starts at 1000, Bin 3012 at 1002, Bin 3013 at 1002
*Mark Harris, Shubhabrata Sengupta, and John Owens. “Parallel Prefix Sum (Scan) with CUDA”
Ray Binning
Rays → Lookup bin start → Add Local Offsets
Ray 1000: 1000 + 0 = slot 1000
Ray 1002: 1000 + 1 = slot 1001
Ray 1001: 1002 + 0 = slot 1002
Problem
Improving raytracing pipeline
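The three ray binning passes (atomic increment, exclusive prefix sum, lookup plus local offset) can be emulated serially on the CPU. The sketch below is an illustration only: `binRays` is a hypothetical name, the loops stand in for GPU passes, and every entry of `rayBin` is assumed to be smaller than `binCount`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Serial stand-in for the GPU passes: count rays per bin, scan, then scatter.
// Returns ray indices reordered so that rays sharing a bin are contiguous.
// Assumes every rayBin[r] < binCount.
std::vector<std::uint32_t> binRays(const std::vector<std::uint32_t>& rayBin,
                                   std::uint32_t binCount)
{
    std::vector<std::uint32_t> count(binCount, 0);
    std::vector<std::uint32_t> localOffset(rayBin.size());

    // Pass 1: on the GPU this would be one atomic increment per ray.
    for (std::size_t r = 0; r < rayBin.size(); ++r)
        localOffset[r] = count[rayBin[r]]++;

    // Pass 2: exclusive prefix sum turns per-bin counts into bin start offsets.
    std::vector<std::uint32_t> binStart(binCount, 0);
    for (std::uint32_t b = 1; b < binCount; ++b)
        binStart[b] = binStart[b - 1] + count[b - 1];

    // Pass 3: each ray looks up its bin start and adds its local offset.
    std::vector<std::uint32_t> sorted(rayBin.size());
    for (std::size_t r = 0; r < rayBin.size(); ++r)
        sorted[binStart[rayBin[r]] + localOffset[r]] = static_cast<std::uint32_t>(r);
    return sorted;
}
```

Feeding it three rays in bins 0, 2, 0 mirrors the slide example: the two rays of the first bin end up adjacent, followed by the ray of the third bin.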
Pipeline: Generate Rays → Intersect/Material Data → Light Rays → Light Combine
Added stages: Variable Rate Tracing, Ray Binning, SSR Hybridization
SS-Hybridization
[Figure labels: Rejected Rays, Hierarchical Screen Space Trace, Miss, Give Up, Intersect/Material Data, Material Data, Light Material, Radiance]
[Stachowiak et al 15] "Stochastic Screen-Space Reflections"
SS-Hybridization
Problem
Raytrace              Light Shader Wavefront
Hit  Miss Hit  Miss   Busy Idle Busy Idle
Miss Hit  Hit  Miss   Idle Busy Busy Idle
Miss Hit  Hit  Miss   Idle Busy Busy Idle
Miss Miss Hit  Hit    Idle Idle Busy Busy
Improving raytracing pipeline
Pipeline: Generate Rays → Intersect/Material Data → Light Rays → Light Combine
Added stages: Variable Rate Tracing, Ray Binning, SSR Hybridization, Defrag
Defrag
1 0 1 0 0 1 1 0 1 0 1 0
Exclusive Parallel Sum *
0 1 1 2 2 2 3 4 4 5 5 6
Hit Hit Hit Hit Hit Hit
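The defrag step is stream compaction: an exclusive prefix sum over the hit flags gives every hit its slot in a dense list. A serial C++ sketch under that reading; the function names are hypothetical and the loops stand in for the parallel scan:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Exclusive prefix sum over 0/1 hit flags: out[i] = number of hits before i.
std::vector<std::uint32_t> exclusiveScan(const std::vector<std::uint8_t>& hit)
{
    std::vector<std::uint32_t> out(hit.size());
    std::uint32_t running = 0;
    for (std::size_t i = 0; i < hit.size(); ++i) {
        out[i] = running;
        running += hit[i];
    }
    return out;
}

// Pack the indices of all rays that hit into a dense list, so that a
// lighting wavefront only contains rays with work to do.
std::vector<std::uint32_t> compactHits(const std::vector<std::uint8_t>& hit)
{
    const std::vector<std::uint32_t> offset = exclusiveScan(hit);
    const std::uint32_t hitCount = hit.empty() ? 0u : offset.back() + hit.back();
    std::vector<std::uint32_t> dense(hitCount);
    for (std::size_t i = 0; i < hit.size(); ++i)
        if (hit[i])
            dense[offset[i]] = static_cast<std::uint32_t>(i);
    return dense;
}
```

With the flags from the slide (1 0 1 0 0 1 1 0 1 0 1 0) the scan reproduces the row 0 1 1 2 2 2 3 4 4 5 5 6 and the six hits land in slots 0 to 5.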
*Mark Harris, Shubhabrata Sengupta, and John Owens. “Parallel Prefix Sum (Scan) with CUDA”
Problem
Busy Busy Busy Busy
Busy Busy Busy Busy
Busy Busy Busy Busy
Busy Busy Busy Busy
Light Shader: 2.0ms
Improving raytracing pipeline
Pipeline: Generate Rays → Intersect/Material Data → Light Rays → Light Combine
Added stages: Variable Rate Tracing, Ray Binning, SSR Hybridization, Defrag, Per Cell Light List Lighting
Per Cell Light Lists
[Figure: per-cell linked lists, one cell holding Light 2 → Light 3, another holding Light 0 → Light 1 → Light 3, each entry followed by a Next pointer]
Problem
Improving raytracing pipeline
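The per-cell light lists shown as chains of lights and Next pointers can be stored as one head index per cell plus a shared node pool. A minimal C++ sketch, assuming that layout; the `CellLightLists` type and its members are illustrative names, and how lights are assigned to cells is left out:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::int32_t kEndOfList = -1;

struct LightNode {
    std::uint32_t light; // index of the light
    std::int32_t  next;  // next node in the same cell, or kEndOfList
};

// Hypothetical container: one singly linked light list per grid cell.
struct CellLightLists {
    std::vector<std::int32_t> head; // per cell: first node, or kEndOfList
    std::vector<LightNode>    nodes; // node pool shared by all cells

    explicit CellLightLists(std::size_t cellCount) : head(cellCount, kEndOfList) {}

    // Push a light onto the front of a cell's list.
    void add(std::size_t cell, std::uint32_t light)
    {
        nodes.push_back({light, head[cell]});
        head[cell] = static_cast<std::int32_t>(nodes.size() - 1);
    }

    // Walk a cell's list, as a lighting pass would for a hit in that cell.
    std::vector<std::uint32_t> lightsIn(std::size_t cell) const
    {
        std::vector<std::uint32_t> out;
        for (std::int32_t n = head[cell]; n != kEndOfList;
             n = nodes[static_cast<std::size_t>(n)].next)
            out.push_back(nodes[static_cast<std::size_t>(n)].light);
        return out;
    }
};
```

The point of the structure is that a hit only iterates the lights of its own cell instead of every light in the scene.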
Pipeline: Generate Rays → Intersect/Material Data → Per Cell Light List Lighting → Light Combine
Added stages: Variable Rate Tracing, Ray Binning, SSR Hybridization, Defrag, Denoise
Denoising
Reuse Spatial Information: BRDF Filter
Reuse Temporal Information: Temporal Filter
[Stachowiak et al 15] "Stochastic Screen-Space Reflections"
BRDF Denoise Filter
Kernel Size????
$$L_o \approx FG \, \frac{\sum_{k=1}^{N} \dfrac{L_i(l_k)\, f_s(l_k \to v)\, \cos\theta_{l_k}}{p_k}}{\sum_{k=1}^{N} \dfrac{f_s(l_k \to v)\, \cos\theta_{l_k}}{p_k}}$$
BRDF Denoise Filter
?????
BRDF Denoise Filter
Frame N-1
Frame N
BRDF Denoise Filter
Pad Pad Pad    Pad    Pad    Pad    Pad Pad
Pad Pad Pad    Pad    Pad    Pad    Pad Pad
Pad Pad Thread Thread Thread Thread Pad Pad
Pad Pad Thread Thread Thread Thread Pad Pad
Pad Pad Thread Thread Thread Thread Pad Pad
Pad Pad Thread Thread Thread Thread Pad Pad
Pad Pad Pad    Pad    Pad    Pad    Pad Pad
Pad Pad Pad    Pad    Pad    Pad    Pad Pad
Annotations, top to bottom: Actual: 6, Actual: up to 13, Actual: 16, Actual: 6
BRDF Denoise Filter
Temporal Denoise Filter
Is it a good sample? If only...
BRDF Denoiser!
Temporal Denoise Filter
Still Noisy
Image Denoise Filter
Generate LUT { angle, roughness } to { width, height } for unit length ray
Image Denoise Filter
[Figure: filter illustration]
Image Denoise Filter
New Pipeline
Variable Rate Tracing, Generate Rays, Ray Binning, Screen Space Hybrid, Intersect/Material Data
0.37ms 0.19ms 0.15ms 0.36ms 1.98ms
New Pipeline
6.29ms total
Intersect/Material Data, Defrag, ‘Improved’ Lighting, Spatial Filter, Temporal Filter, Image Filter
1.98ms 0.08ms 0.46ms 1.45ms 0.24ms 1.00ms
DXR – a.k.a. “BLACK BOX”
[Figure labels: No DXR, Intersection, Shading]
DXR basics
• BLAS - Bottom Level Acceleration Structure
• TLAS - Top Level Acceleration Structure
• CS
  • Skinning, Destruction
  • Compute shader
  • Update each frame
  • BLAS can update incrementally
[Figure: BLAS objects A, B, C, D, each paired with a 4×4 instance transform (a, b, c, d) in the TLAS; D is updated by a CS]
Acceleration structure
• Which objects?
• Frustum Culling
• Occlusion Culling
• Easy... no culling!
Acceleration structure – first pass
• Rotterdam
• 20200 TLAS instances...
• 5000 BLAS rebuilds...
• GPU rebuild 64 ms (!)
What to do?
• Idea: Reduce instance count
• Use a culling heuristic
• Accept (some) minor artifacts
Culling heuristic
• Assumption:
• Far away objects not important
• Except for large objects
• Bridge, building etc
• Need some kind of measurement...
Culling
• Project bounding sphere
• $\theta = \tan^{-1}(r / d)$
• If $\theta < \text{Threshold}$: Cull
[Figure: bounding sphere of radius r at distance d, subtending angle θ]
Culling
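The projected bounding sphere test can be written as a small predicate. This is a sketch under assumptions: `shouldCull` is a hypothetical name, the angle is taken as the arctangent of radius over distance, and objects whose sphere contains the camera are always kept (a choice made here, not stated on the slides).

```cpp
#include <cmath>

// Returns true when an object's bounding sphere subtends less than
// `thresholdDeg` degrees as seen from the camera and may be culled.
bool shouldCull(float radius, float distance, float thresholdDeg)
{
    if (distance <= radius)
        return false; // camera inside or touching the sphere: always keep

    constexpr float kRadToDeg = 57.29577951f;
    const float thetaDeg = std::atan(radius / distance) * kRadToDeg;
    return thetaDeg < thresholdDeg;
}
```

A large object such as a bridge keeps a large angle even when far away, so it survives the test, which matches the stated exception for large objects.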
[Figure: reference (no culling) | θ = 4° | θ = 15°]
Culling
Culled Objects
[Figure: reference (no culling) | θ = 4°]
Culling - results
• 4 deg culling
• 5000 -> 400 BLAS rebuilds each frame
• 20000 -> 2800 TLAS instances
• TLAS + BLAS build (GPU): 64 ms -> 14.5 ms
• Pros
  • Faster
• Cons
  • Occasional popping
  • Missing objects
BLAS update optimizations
• Still expensive! More ideas:
1. Stagger full and incremental BLAS rebuild
• N frames incremental before full rebuild
2. D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_PREFER_FAST_BUILD
3. Avoid redundant rebuilds
• Check CS input (bone matrix)
• 400 -> 50
• Overlap BLAS update with GFX
• Gbuffer, shadowmaps
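Items 1 and 3 above (staggering full and incremental rebuilds, skipping redundant ones) can be combined into one per-object decision. The sketch below is an assumption-laden illustration: `chooseBlasOp`, the `BlasOp` enum, and the per-object phase offset are hypothetical. In D3D12 terms a refit would correspond to building with `D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_PERFORM_UPDATE` on a BLAS created with `..._ALLOW_UPDATE`.

```cpp
#include <cstdint>

enum class BlasOp { Skip, Refit, FullRebuild };

// Decide what to do with one BLAS this frame.
//  - inputsChanged: did the compute-shader inputs (e.g. bone matrices) change?
//  - refitFrames:   N incremental updates between two full rebuilds.
// Offsetting by objectIndex (an assumed policy) spreads the full rebuilds of
// different objects over different frames.
BlasOp chooseBlasOp(std::uint32_t frameIndex, std::uint32_t objectIndex,
                    std::uint32_t refitFrames, bool inputsChanged)
{
    if (!inputsChanged)
        return BlasOp::Skip; // avoid a redundant rebuild

    const std::uint32_t period = refitFrames + 1;
    const bool fullRebuildDue = (frameIndex + objectIndex) % period == 0;
    return fullRebuildDue ? BlasOp::FullRebuild : BlasOp::Refit;
}
```

With `refitFrames = 3`, an animated object gets one full rebuild followed by three refits, and neighbouring object indices hit their full rebuild on different frames.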
Results
• TLAS + BLAS build (GPU): 14.5 ms -> 1.15 ms
• RayGen (GPU): 0.71 ms -> 0.81 ms (staggered refit + flags)
• Much better ☺
SHADING (OPAQUE)
[Figure: RT ON | SHADING OFF vs. RT ON | SHADING ON]
Raytracing Requirements
[Figure labels: Raster, Raytrace]
• Shader output must match!
• ClosestHit Shader
• AnyHit Shader
Same?
Shaders in FROSTBITE
• VS – Handwritten
• PS - Shader Graphs
• Graph -> .hlsl
• Manual conversion... no
• 1000s of shaders
• Auto VS + PS to HitGroup
Hit shader template
[shader("closesthit")]
void chMain()
{
    IA0 = iaMain(id + 0)            // Vertex buffer: UV, Normal
    IA1 = iaMain(id + 1)
    IA2 = iaMain(id + 2)
    V0 = vsMain(IA0)                // VS – VertexFragment: World Space Normal
    V1 = vsMain(IA1)
    V2 = vsMain(IA2)
    V = lerp(V0, V1, V2, U, V, W)   // interpolate with the hit barycentrics
    P = psMain(V)                   // PS - ShaderGraph
    ...
    writePayload(P)
}
Texture MIP?  #define Sample(s, uv) SampleLevel(s, uv, 0)
ddx/ddy?      #define ddx(x) x
              #define ddy(x) x
clip?         ... ¯\_(ツ)_/¯
Alpha testing
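The interpolation step of the hit shader template blends the three vertex outputs with the hit's barycentric weights. A minimal C++ sketch for a single float3 attribute; `Float3` and `interpolate` are illustrative names, and the convention assumed is the DXR one, where the two reported barycentrics weight vertices 1 and 2 and vertex 0 receives the remainder:

```cpp
struct Float3 { float x, y, z; };

// u and v are the two barycentrics reported for a triangle hit; the weight
// of vertex 0 is the remainder 1 - u - v.
Float3 interpolate(const Float3& v0, const Float3& v1, const Float3& v2,
                   float u, float v)
{
    const float w = 1.0f - u - v;
    return { w * v0.x + u * v1.x + v * v2.x,
             w * v0.y + u * v1.y + v * v2.y,
             w * v0.z + u * v1.z + v * v2.z };
}
```

Direction attributes such as normals would typically be renormalized after the blend; that step is omitted here.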
• AnyHit Shader:
• If (AlphaTest(alphaValue)) IgnoreHit();
ALPHA TEST
[Figure: ANY HIT OFF vs. ANY HIT ON]
ALPHA TEST
[Video: alpha testing]
Summary opaque
• Closest Hit Shader
• Always
• Any Hit Shader (Optional)
• Alpha tested
• Compute Shader (Optional)
• Skinning, destruction etc
RAY PAYLOAD
• Payload: returned on ray intersection
• Same format as Gbuffer RTV
• Contains Material Data
  • Normal
  • Base Color
  • Smoothness
  • …etc
struct GbufferPayloadPacked
{
    uint data0; // R10G10B10A2_UNORM
    uint data1; // R8G8B8A8_SRGB
    uint data2; // R8G8B8A8_UNORM
    uint data3; // R11G11B10_FLOAT
    float hitT; // Ray length
};
Verifying correctness
1. Rasterizer output
2. Shoot primary rays into scene
3. Compare Payload with Gbuffer
4. Non-zero output? Bug!
5. Fix bug
[Figure: Gbuffer (BaseColor), Reference − Raytraced (BaseColor), Primary Rays = Delta]
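The comparison in steps 3 and 4 can be sketched as a per-pixel delta between the rasterized G-buffer and the payload returned by primary rays. This CPU-side C++ illustration assumes both images have been read back as arrays of the four packed words; `PackedGbuffer` and `countMismatchedPixels` are hypothetical names, and the exact bitwise comparison is an assumption (a tolerance may be needed in practice).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// The four packed material words of one pixel (hitT is not compared).
struct PackedGbuffer {
    std::uint32_t data0, data1, data2, data3;
};

// Count pixels whose raster and raytraced material data differ.
// Any non-zero result points at a shader conversion bug.
std::size_t countMismatchedPixels(const std::vector<PackedGbuffer>& raster,
                                  const std::vector<PackedGbuffer>& traced)
{
    const std::size_t common = std::min(raster.size(), traced.size());
    // Pixels present in only one image count as mismatches.
    std::size_t mismatches = std::max(raster.size(), traced.size()) - common;

    for (std::size_t i = 0; i < common; ++i) {
        const std::uint32_t delta = (raster[i].data0 ^ traced[i].data0) |
                                    (raster[i].data1 ^ traced[i].data1) |
                                    (raster[i].data2 ^ traced[i].data2) |
                                    (raster[i].data3 ^ traced[i].data3);
        if (delta != 0)
            ++mismatches;
    }
    return mismatches;
}
```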
Shader compilation
• All shaders generated ☺
• ~3000 per level
• ~250 per frame
• Single RT PSO
• Runtime compile times?
[Figure: Color coded Closest Hit Shaders]
PSO generation
• Dx12 GFX PSO...
• ... DXR 3000 shaders
• Compile times?
  • Majority > 100 ms
• Cold cache
  • 7 min 30 sec thread time
  • 6 threads: 1 min 30 sec
• Warm cache
  • 1 min 30 sec thread time
  • 6 threads: 15 sec
[Figure axis: milliseconds]
Particles
Smoke, Fire and Explosions. Important elements in BFV!
Particles
• Particle = Transparent+Billboard
• Basic algorithm
1. Shoot ray in Opaque TLAS
2. Shoot again in Particle TLAS (Max ray length from Opaque)
3. Blend particles with opaque hit
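The three steps above can be illustrated with a front-to-back blend of particle hits over the opaque result. This C++ sketch is an assumption-based illustration: the types, the `blendParticles` name, and the use of standard alpha compositing are not taken from the slides, and the particle hits are assumed to be sorted by distance already.

```cpp
#include <vector>

struct Rgb { float r, g, b; };

struct ParticleHit {
    Rgb   color;
    float alpha; // opacity in [0, 1]
    float hitT;  // distance along the ray
};

// Composite particle hits (sorted front to back) over the opaque hit.
// Particles at or beyond opaqueHitT are hidden by the opaque surface, which
// is what limiting the particle ray length to the opaque hit achieves.
Rgb blendParticles(Rgb opaque, float opaqueHitT,
                   const std::vector<ParticleHit>& hitsFrontToBack)
{
    Rgb accum{0.0f, 0.0f, 0.0f};
    float transmittance = 1.0f; // fraction of light still passing through

    for (const ParticleHit& hit : hitsFrontToBack) {
        if (hit.hitT >= opaqueHitT)
            break;
        accum.r += hit.color.r * hit.alpha * transmittance;
        accum.g += hit.color.g * hit.alpha * transmittance;
        accum.b += hit.color.b * hit.alpha * transmittance;
        transmittance *= 1.0f - hit.alpha;
    }
    return { accum.r + opaque.r * transmittance,
             accum.g + opaque.g * transmittance,
             accum.b + opaque.b * transmittance };
}
```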
The problem with particles
• Camera aligned billboards
• Rotate odd particles 90 deg around Y?
Billboards visible when viewed from the side.
Rotated billboards
Before: Billboards visible in reflection
After: Rotating odd quads produces a more volumetric look
Performance
• Accumulate intersections along ray
• 1 rpp => N rpp
• RayGen loop
• Sounds... expensive?
RayGen Shader
    // ... init ray using opaqueHitT and currT
    for (hitCount = 0; hitCount < MaxIntersectionCount; ++hitCount)
    {
        ...
        ForwardPayloadPacked forwardPayloadPacked;
        initForwardPayloadPacked(forwardPayloadPacked);
        TraceRay(g_tlasPartices, 0, 0xFF, 0, 1, 0, ray, forwardPayloadPacked);
        ForwardPayload forwardPayload = unpackForwardPayload(forwardPayloadPacked);
        if (forwardPayload.hitT <= 0.0f) // Miss, tracing done
            break;
        // ... update ray using forwardPayload.hitT, accumulate color, alpha
    }
The (second) problem with particles
RayGen loop: 0.96ms
Optimizing particles
• Idea: AnyHit shader instead of the loop?
• Same... but different
• Inspired by WBOIT*
• Weight = max(luminance,r,g,b, alpha)
• Emissive, fire
RayGen Shader
    // ... init ray using maxT and currT
    TraceRay(g_tlasPartices, 0, 0xFF, 0, 1, 0, ray, forwardPayloadPacked);
    // ... process payload and calculate weighted average
Any Hit Shader
    struct Attributes { float2 barycentrics; };
    [shader("anyhit")]
    void main(inout ForwardPayloadPacked payloadPacked, in Attributes attributes)
    {
        // ... Calculate color, transparency
        payloadPacked.alpha += alpha * weight;
        payloadPacked.color += color * weight;
        payloadPacked.weight += weight;
        IgnoreHit();
    }
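The any-hit accumulation and the weighted average taken afterwards in the RayGen shader can be mirrored on the CPU. The C++ sketch below is illustrative only: the struct and function names are hypothetical, the weight is passed in by the caller rather than derived from the particle, and a plain weighted mean is assumed for the resolve.

```cpp
// Running sums, mirroring what the any-hit shader adds to the payload.
struct ForwardAccum {
    float color[3] = {0.0f, 0.0f, 0.0f};
    float alpha    = 0.0f;
    float weight   = 0.0f;
};

// One particle intersection; the order of calls does not matter.
void accumulateHit(ForwardAccum& acc, const float color[3], float alpha, float weight)
{
    for (int c = 0; c < 3; ++c)
        acc.color[c] += color[c] * weight;
    acc.alpha  += alpha * weight;
    acc.weight += weight;
}

struct Resolved {
    float color[3];
    float alpha;
};

// Weighted average of everything accumulated along the ray.
Resolved resolve(const ForwardAccum& acc)
{
    Resolved out{};
    if (acc.weight <= 0.0f)
        return out; // no particle was hit
    for (int c = 0; c < 3; ++c)
        out.color[c] = acc.color[c] / acc.weight;
    out.alpha = acc.alpha / acc.weight;
    return out;
}
```

Since only sums are kept, a single `TraceRay` call that visits every particle through any-hit invocations is enough, which is where the saving over the RayGen loop comes from.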
*Weighted Blended Order-Independent Transparency: http://jcgt.org/published/0002/02/09/
Particles - Results
‘Naive’ Closest Hit: 0.96ms. Slow but accurate
Order Independent AnyHit: 0.34ms. Really fast, but slightly different look
THANK YOU! ...Any Questions?