”It Just Works”: Ray-Traced Reflections in ’Battlefield V’

Johannes Deligiannis Jan Schmid EA DICE *PLACEHOLDER*

* PLAY TRAILER OR SIMILAR * TODAY we present Raytracing

• Project background

• GPU Raytracing Pipeline

• Engine integration of DXR

• GPU Performance Battlefield V

• FPS set in WWII

• Released Nov 2018

• Raytracing work began Dec 2017

• First DXR game released! Project Background

• ~10 months dev time • Engineering

• Use DXR in Battlefield V • Yasin Uludag (EA DICE) • AO • Johannes Deligiannis (EA DICE) • GI • Jiho Choi (NVIDIA) • Shadows • Pawel Kozlowski (NVIDIA)

• Reflections • And a bunch of other people! ☺ Main Challanges

• Not a Tech Demo • Early adopter tax

• Content is set • API not final

• Game in full production • Driver hang/bugs • BSoD • Scope of Engine changes • No capture tool (Nsight, Pix) • Performance • But we shipped it☺ • Denoising vs Ray Count

• No RTX cards 10 11 (simple) raytracing pipeline

Intersect/Material Generate Rays Light Rays Light Combine Data 12 Generate Rays

Lookup Texture

G Buffer

*Tomasz Stachowiak and Yasin Uludag, Siggraph 2015. “Stochastic Screen-Space Reflections” 13 Raytracing

MAGIC 14 Light Rays

float4 light(MaterialData surfaceInfo, float3 rayDir) {

foreach (light : pointLights) radiance += calcPoint(surfaceInfo, rayDir, light);

foreach (light : spotLights) radiance += calcSpot(surfaceInfo, rayDir, light);

foreach (light : reflectionVolumes) radiance += calcReflVol(surfaceInfo, rayDir, light);

… } 15 Light Combine

Lookup Texture

Lit Raster result 16

Rays Contributeunhappy Less

B a d b a d b a d , very sad crying faces

Sloooow

Very Noisy 17 Improving raytracing pipeline

Variable Rate Tracing

Intersect/Material Generate Rays Light Rays Light Combine Data 18 Variable Rate Tracing

0 128 128 0

128 256 256 128

0 128 128 0

Max Ratio Classify

0 .5 .5 0 0 .1 .1 0 .5 1 1 .5 .1 .2 .2 .1 0 .5 .5 0 Normalize 0 .1 .1 0 19 Variable Rate Tracing

256 rays 128 rays 64 rays 32 rays 20 Variable Rate Tracing

Success!

- More Rays on Water

- More Rays on grazing angles 21 Problem 22 Improving raytracing pipeline

Variable Rate Tracing

Intersect/Material Generate Rays Light Rays Light Combine Data

Ray Binning 23 Ray Binning

Screen Offset 3012 Bin Index

Angle 24 Ray Binning 25 Ray Binning

Rays Atomic Local Offsets Increment

Ray 1000 Bin 3011 0 Ray 1001 Bin 3013 0 Ray 1002 Bin 3011 1

2 0 1 Bin 3011 Bin 3012 Bin 3013 26 Ray Binning Bin 3011 Bin 3012 Bin 3013 1000 1002 1002

Local Offsets

0 0 1 Exclusive Parallel Sum *

2 0 1 Bin 3011 Bin 3012 Bin 3013 *Mark Harris, Shubhabrata Sengupta, and John Owens. “Parallel Prefix Sum (Scan) with CUDA” 27 Ray Binning Bin 3011 Bin 3012 Bin 3013 1000 1002 1002

Rays Lookup Local Offsets

Ray 1000 Add 0 Ray 1002 Add 0 Ray 1001 Add 1 28 Problem 29 Improving raytracing pipeline

Variable Rate SSR Tracing Hybridization

Intersect/Material Generate Rays Light Rays Light Combine Data

Ray Binning 30 SS-Hybridization

Rejected Miss [StachowiakHierarchical et al Screen15] "Stochastic Intersect/Material Give Up Rays Screen-SpaceSpace Reflections" Trace Data

Material Data

Material Data Light Material Radiance 31 SS-Hybridization 32 SS-Hybridization 33 Problem

Hit Miss Hit Miss Busy Idle Busy Idle

Miss Hit Hit Miss Idle Busy Busy Idle

Miss Hit Hit Miss Idle Busy Busy Idle

Miss Miss Hit Hit Idle Idle Busy Busy

Raytrace Light Shader Wavefront 34 Improving raytracing pipeline

Variable Rate SSR Tracing Hybridization

Intersect/Material Generate Rays Light Rays Light Combine Data

Ray Binning Defrag 35 Defrag

1 0 1 0 0 1 1 0 1 0 1 0

Exclusive Parallel Sum *

0 1 1 2 2 2 3 4 4 5 5 6

Hit Hit Hit Hit Hit Hit

*Mark Harris, Shubhabrata Sengupta, and John Owens. “Parallel Prefix Sum (Scan) with CUDA” 36 Problem

Busy Busy Busy Busy

Busy Busy Busy Busy

Busy Busy Busy Busy

Busy Busy Busy Busy

Light Shader 2.0ms 37 Improving raytracing pipeline

Variable Rate SSR Tracing Hybridization

Intersect/Material Per Cell Light List Generate Rays Light Rays Light Combine Data Lighting

Ray Binning Defrag 38 Per Cell Light Lists

Light 2 Light 3 Next Next

Light 0 Light 1 Light 3 Next Next Next 39 Problem 40 Improving raytracing pipeline

SSR Variable Rate Denoise Tracing Hybridization

Intersect/Material Per Cell Light List Generate Rays Light Combine Data Lighting

Ray Binning Defrag 41 Denoising

Reuse Reuse Spatial [Stachowiak et al 15] "StochasticTemporal InformationScreen-Space Reflections" Information

Temporal BRDF Filter Filter 42 BRDF Denoise Filter

푁 퐿푖 푙푘 푓푠 푙푘 → 푣 cos Θ푙푘 σ푘=1 푝푘 Kernel Size???? 퐿0 ≈ 퐹퐺 푁 푓푠 푙푘 → 푣 cos Θ푙푘 σ푘=1 푝푘 43 BRDF Denoise Filter

????? 44 BRDF Denoise Filter 45 BRDF Denoise Filter

Frame N -1

Frame N 46 BRDF Denoise Filter

Pad Pad Pad Pad Pad Pad Pad Pad Actual: 6 Pad Pad Pad Pad Pad Pad Pad Pad

Pad Pad Thread Thread Thread Thread Pad Pad Actual: up to 13

Pad Pad Thread Thread Thread Thread Pad Pad Actual: 16 Pad Pad Thread Thread Thread Thread Pad Pad

Pad Pad Thread Thread Thread Thread Pad Pad

Pad Pad Pad Pad Pad Pad Pad Pad Actual: 6 Pad Pad Pad Pad Pad Pad Pad Pad 47 BRDF Denoise Filter 48 Temporal Denoise Filter

Is it a good sample? If only... BRDF Denoiser! 49 temporal Denoise Filter

Still Noisy 50 Image Denoise Filter

Generate LUT { angle, roughness } to { width, height } for unit length ray 51 Image Denoise Filter

∗ = 52 Image Denoise Filter

1 1 ∗ = 2 2 53 Image Denoise Filter 54 New Pipeline

Screen Variable Generate Intersect/ Ray Binning Space Rate Tracing Rays Material Data Hybrid

0.37ms 0.19ms 0.15ms 0.36ms 1.98ms 55 New Pipeline

6.29ms total

Intersect/ ‘Improved’ Temporal Defrag Spatial Filter Image Filter Material Data Lighting Filter

1.98ms 0.08ms 0.46ms 1.45ms 0.24ms 1.00ms 56 D X R – a.k.a ”BLACK BOX”

No DXR Intersection Shading D X R b a s i c s

A

D D C B A • BLAS - Bottom Level BLAS

Acceleration Structure 푐1,1 ⋯ 푐4,1 ⋮ ⋱ ⋮ 푏1,1 ⋯ 푏4,1 푎 ⋯ 푐 • TLAS - T op Le v e l 푎1,1 ⋯⋮ 푎1⋱4,4,1 ⋮ 4,4 x ⋮ 푎1⋱,4 ⋯⋮ 푏4,4 Acceleration Structure 푎1,4 ⋯ 푎4,4

푑1,1 ⋯ 푑4,1 • CS x ⋮ ⋱ ⋮ 푑1,4 ⋯ 푑4,4 • Skinning, Destruction TLAS • Compute shader

• Update each frame A A • Blas can update incrementally D CS

D C B A ACCELERATION STRUCTURE

• Which objects?

• Frustum Culling

• Occlusion Culling

• Easy... no culling! Acceleration structure – F I R S T P A S S

• Rotterdam

• 20200 TLAS instances...

• 5000 BLAS rebuilds...

• GPU rebuild 64 ms (!) What to do?

• Idea: Reduce instance count

• Use a culling heuristic

• Accept (some) minor artifacts Culling HEURISTIC

• Assumtion:

• Far away objects not important

• Except for large objects

• Bridge, building etc

• Need some kind of measurement... C u l l i n g

• Project bounding sphere

푟 • 휃 = 푡푎푛 푑 푟 • If 휃°< 푇ℎ푟푒푠ℎ표푙푑 ° : Cull

푑 휃 c u l l i n g

푟푒푓푒푟푒푛푐푒 − 푛표 푐푢푙푙𝑖푛푔 휃 = 4° 휃 = 15° c u l l i n g

Culled Objects

푟푒푓푒푟푒푛푐푒 − 푛표 푐푢푙푙𝑖푛푔 휃 = 4° C U L L I N G - RESULTS

• 4 deg culling

• 5000 -> 400 BLAS rebuilds each frame

• 20000 -> 2800 TLAS instances

• TLAS + BLAS build (GPU): 64 ms -> 14.5 ms

• Pros

• F a s t e r

• Cons • Occasional popping • Missing objects Blas update optimizations

• Still expensive! More ideas:

1. Stagger full and incremental BLAS rebuild

• N frames incremental before full rebuild

2. D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_PREFER_FAST_BUILD

3. Avoid redundant rebuilds

• Check CS input (bone matrix)

• 400 - > 50

• Overlap BLAS update with GFX

• Gbuffer, shadowmaps

77 r e s u l t s

• TLAS + BLAS build (GPU): 14.5 ms -> 1.15 ms

• RayGen (GPU): 0.71 ms -> 0.81 ms (staggered refit + flags)

• Much better ☺

78 SHADING (OPAQUE)

RT ON | SHADING OFF RT ON | SHADING ON R a y t r a c i n g Requirements Raster Raytrace

• Shader output must match!

• ClosestHit Shader

• AnyHit Shader

Same?

80 Shaders in

• VS – Handwritten

• PS - Shader Graphs

• Graph -> .hlsl

• Manual conversion... no

• 1000s of shaders

• Auto VS + PS to HitGroup

81 H i t s h a d e r t e m p l a t e

[shader(”closesthit")] void chMain() { IA0 = iaMain(id + 0) IA1 = iaMain(id + 1) Vertex buffer UV, Normal IA2 = iaMain(id + 2) V0 = vsMain(IA0) VS – VertexFragment V1 = vsMain(IA1) World Space Normal V2 = vsMain(IA2) #define Sample(s, uv) ...clip?SampleLevel(s, uv, 0) Texture MIP? V = lerp(V0, V1, V2, U, V, W) #define ddx(x) x PS - ShaderGraph P = psMain(V)... ddx/ddy? #define ddy(x) x ... ¯\_(ツ)_/¯writePayload(P) } ALPHA TESTing

• AnyHit Shader:

• If (AlphaTest(alphaValue)) IgnoreHit();

84 A L P H A T E S T

ANY HIT OFF ANY HIT ON

86 A L P H A T E S T

87 Alpha testing video Summary opaque

• Closest Hit Shader

• Always

• Any Hit Shader (Optional)

• Alpha tested

• Compute Shader (Optional)

• Skinning, destruction etc

89 RAY PAYLOAD

• Payload: returned on ray intersection

• Same format as Gbuffer RTV struct GbufferPayloadPacked { • Contains Material Data uint data0; // R10G10B10A2_UNORM uint data1; // R8G8B8A8_SRGB • Normal uint data2; // R8G8B8A8_UNORM • Bas e Col or uint data3; // R11G11B10_FLOAT float hitT; // Ray length • Smoothnes s }; • …etc V e r i f y i n g correctness

1. Rasterizer output -

2. Shoot primary rays in to scene

Gbuffer (BaseColor) Raytraced (BaseColor) 3. Compare Payload with Gbuffer Reference Primary Rays

4. Non zero output? Bug!

5. Fix bug =

Delta

92 Shader compilation

• All shaders generated ☺

• ~3000 per level

• ~250 per frame

• Single RT PSO

• Runtime compile times?

Color coded Closest Hit Shaders

97 Pso generation

• Dx12 GFX PSO...

• ... DXR 3000 shaders 

• Compile times? • Majority > 1 0 0 m s

• Cold cache • 7 min 30 sec thread time • 6 threads: 1 min 30 sec

• Warm cache • 1 min 30 sec thread time • 6 threads: 1 5 s e c

milliseconds P a r t i c l e s

Smoke, Fire and Exposions. Important elements in BFV!

101 P a r t i c l e s

• Particle = Transparent+Billboard

• Basic algorithm

1. Shoot ray in Opaque TLAS

2. Shoot again in Particle TLAS (Max ray length from Opaque)

3. Blend particles with opaque hit

102 THE Problem with Particles

• Camera aligned billboards 

• Rotate odd particles 90 deg around Y?

Billboards visible when viewed from the side.

103 Rotated billboards

Before: Billboards visible in reflection After: Rotating odd quads produces a more volumetric look

104 Performance

• Accumulate intersections along ray RayGen Shader

*... init ray using opaqueHitT and currT* • 1 rpp => N rpp  for (hitCount = 0; hitCount < MaxIntersectionCount; ++hitCount) { ... • RayGen loop ForwardPayloadPacked forwardPayloadPacked; initForwardPayloadPacked(forwardPayloadPacked); TraceRay(g_tlasPartices, 0, 0xFF, 0, 1, 0, ray, forwardPayloadPacked); • Sounds... expensive? ForwardPayload forwardPayload = unpackForwardPayload(forwardPayloadPacked); if (forwardPayload.hitT <= 0.0f) // Miss, tracing done break;

* ... update ray using forwardPayload.hitT, accumulate color, alpha * }

105 THE (second) Problem with Particles

RayGen loop 0.96ms 106 Optimizing particles

• Loop Idea: AnyHit shader? RayGen Shader

• Same... but different ... init ray using maxT and currT TraceRay(g_tlasPartices, 0, 0xFF, 0 , 1, 0, ray, forwardPayloadPacked); * ... process payload and calculate weighted average * • Inspired by WBOIT* Any Hit Shader • struct Attributes { float2 barycentrics; }; Weight = max(luminance,r,g,b [shader("anyhit")] void main(inout ForwardPayloadPacked payloadPacked, in Attributes attributes) alpha) { *... Calculate color, transparency * payloadPacked.alpha += alpha * weight payloadPacked.color += color * weight; • Emissive, fire payloadPacked.weight += weight; IgnoreHit(); }

*Weighted Blended Order-Independent Tranparency: http://jcgt.org/published/0002/02/09/ 107 P a r t i c l e s - R E S U L T S

’Naive’ Closest Hit 0.96ms Order Independent AnyHit 0.34ms Slow but accurate Really fast, but slightly different look 109 1 1 0

THANK YOU! ...Any Questions?