Ray-Traced Reflections in 'Battlefield V'
"It Just Works": Ray-Traced Reflections in 'Battlefield V'
Johannes Deligiannis, Jan Schmid
EA DICE

[Placeholder: play Gamescom trailer or similar]

Today we present: Raytracing
• Project background
• GPU raytracing pipeline
• Engine integration of DXR
• GPU performance

Battlefield V
• FPS set in WWII
• Released Nov 2018
• Raytracing work began Dec 2017
• First DXR game released!

Project Background
• ~10 months dev time
• Use DXR in Battlefield V
  • AO
  • GI
  • Shadows
  • Reflections
• Engineering
  • Yasin Uludag (EA DICE)
  • Johannes Deligiannis (EA DICE)
  • Jiho Choi (NVIDIA)
  • Pawel Kozlowski (NVIDIA)
  • And a bunch of other people! ☺

Main Challenges
• Not a tech demo
  • Content is set
  • Game in full production
  • Scope of engine changes
  • Performance
    • Denoising vs ray count
• Early adopter tax
  • API not final
  • Driver hangs/bugs
    • BSoD
  • No capture tool (Nsight, PIX)
  • No RTX cards
• But we shipped it ☺

(Simple) raytracing pipeline
[Diagram: Generate Rays → Intersect/Material Data → Light Rays → Light Combine]

Generate Rays
[Diagram labels: Lookup Texture, G Buffer]
*Tomasz Stachowiak and Yasin Uludag, SIGGRAPH 2015. "Stochastic Screen-Space Reflections"

Raytracing
MAGIC

Light Rays
    // Pseudocode from the slide; the body is elided with "…" in the original.
    float4 light(MaterialData surfaceInfo, float3 rayDir)
    {
        foreach (light : pointLights)
            radiance += calcPoint(surfaceInfo, rayDir, light);
        foreach (light : spotLights)
            radiance += calcSpot(surfaceInfo, rayDir, light);
        foreach (light : reflectionVolumes)
            radiance += calcReflVol(surfaceInfo, rayDir, light);
        …
    }

Light Combine
[Diagram labels: Lookup Texture, Lit, Raster result]

[Slide with unhappy and crying faces]
• Rays contribute less
• Sloooow
• Very noisy

Improving raytracing pipeline
[Diagram: the simple pipeline with a new stage, Variable Rate Tracing]

Variable Rate Tracing
Classify (max ratio per tile):
    0    .5   .5   0
    .5   1    1    .5
    0    .5   .5   0
Normalize (ratios sum to 1):
    0    .1   .1   0
    .1   .2   .2   .1
    0    .1   .1   0
Rays per tile:
    0    128  128  0
    128  256  256  128
    0    128  128  0

Variable Rate Tracing
[Image legend: 256 rays, 128 rays, 64 rays, 32 rays]

Variable Rate Tracing
Success!
- More rays on water
- More rays at grazing angles

Problem

Improving raytracing pipeline
[Diagram: pipeline with Variable Rate Tracing and Ray Binning added]

Ray Binning
[Diagram: Screen Offset and Angle are combined into a Bin Index (example: 3012)]

Ray Binning

Ray Binning
An atomic increment on the bin counter gives each ray a local offset within its bin:
    Ray        Bin        Local offset
    Ray 1000   Bin 3011   0
    Ray 1001   Bin 3013   0
    Ray 1002   Bin 3011   1
Bin counters afterwards:
    Bin 3011   Bin 3012   Bin 3013
    2          0          1

Ray Binning
Exclusive Parallel Sum* over the bin counters gives each bin's start offset:
                   Bin 3011   Bin 3012   Bin 3013
    Counters       2          0          1
    Start offset   1000       1002       1002
Local offsets: 0, 0, 1
*Mark Harris, Shubhabrata Sengupta, and John Owens. "Parallel Prefix Sum (Scan) with CUDA"

Ray Binning
Each ray looks up its bin's start offset and adds its local offset:
    Ray 1000: 1000 + 0 = 1000
    Ray 1001: 1002 + 0 = 1002
    Ray 1002: 1000 + 1 = 1001

Problem

Improving raytracing pipeline
[Diagram: pipeline with Variable Rate Tracing, Ray Binning and SSR Hybridization added]

SS-Hybridization
[Diagram labels: Rays → Hierarchical Screen Space Trace; outcomes Rejected, Miss, Give Up → Intersect/Material Data; Material Data → Light Material → Radiance]
[Stachowiak et al 15] "Stochastic Screen-Space Reflections"

SS-Hybridization

SS-Hybridization

Problem
[Diagram: after Raytrace, hits and misses are interleaved, so Light Shader wavefront lanes alternate between Busy (Hit) and Idle (Miss)]

Improving raytracing pipeline
[Diagram: pipeline with Variable Rate Tracing, Ray Binning, SSR Hybridization and Defrag added]

Defrag
    Hit flags:                1 0 1 0 0 1 1 0 1 0 1 0
    Exclusive Parallel Sum*:  0 1 1 2 2 2 3 4 4 5 5 6
    Compacted output:         Hit Hit Hit Hit Hit Hit
*Mark Harris, Shubhabrata Sengupta, and John Owens.
"Parallel Prefix Sum (Scan) with CUDA"

Problem
[Diagram: all 16 lanes Busy. Light Shader: 2.0 ms]

Improving raytracing pipeline
[Diagram: pipeline with Variable Rate Tracing, Ray Binning, SSR Hybridization, Defrag and Per Cell Light List Lighting added]

Per Cell Light Lists
[Diagram: two linked lists chained by Next pointers: Light 2 → Light 3, and Light 0 → Light 1 → Light 3]

Problem

Improving raytracing pipeline
[Diagram: pipeline with Variable Rate Tracing, Ray Binning, SSR Hybridization, Defrag, Per Cell Light List Lighting and Denoise added]

Denoising
• Reuse spatial information: BRDF Filter
• Reuse temporal information: Temporal Filter
[Stachowiak et al 15] "Stochastic Screen-Space Reflections"

BRDF Denoise Filter
$$L_o \approx \frac{\sum_{k=1}^{N} \dfrac{L_i(l_k)\, f_s(l_k \to v)\, \cos\theta_{l_k}}{p_k}}{\sum_{k=1}^{N} \dfrac{f_s(l_k \to v)\, \cos\theta_{l_k}}{p_k}} \; FG$$
Kernel size????

BRDF Denoise Filter
?????

BRDF Denoise Filter

BRDF Denoise Filter
[Images: Frame N-1, Frame N]

BRDF Denoise Filter
[Diagram: 8×8 region made of a 4×4 block of Thread cells surrounded by a two-cell border of Pad cells. Labels: Actual: 6; Actual: up to 13; Actual: 16; Actual: 6]

BRDF Denoise Filter

Temporal Denoise Filter
• Is it a good sample?
• If only...
• BRDF Denoiser!
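The Ray Binning and Defrag slides above both rest on an exclusive prefix sum (scan). What follows is a minimal CPU-side sketch of that idea in Python, not the shipped GPU implementation. The names `exclusive_scan`, `bin_rays` and `defrag` are hypothetical, the atomic increment is emulated with a sequential loop, and bins are numbered from 0 instead of 3011.

```python
from itertools import accumulate


def exclusive_scan(values):
    """Exclusive prefix sum: out[i] is the sum of values[0..i-1]."""
    return list(accumulate(values, initial=0))[:-1]


def bin_rays(ray_bins, num_bins):
    """Return ray indices reordered so rays in the same bin are contiguous.

    ray_bins[i] is the bin index of ray i (assumed precomputed from
    screen offset and angle, as on the Ray Binning slide).
    """
    counters = [0] * num_bins
    local_offsets = []
    for b in ray_bins:
        # Emulates the atomic increment: the old counter value becomes
        # this ray's local offset inside its bin.
        local_offsets.append(counters[b])
        counters[b] += 1

    bin_start = exclusive_scan(counters)  # first output slot of every bin

    order = [None] * len(ray_bins)
    for ray, b in enumerate(ray_bins):
        order[bin_start[b] + local_offsets[ray]] = ray
    return order


def defrag(hit_flags):
    """Compact the indices of rays that hit something (flag == 1)."""
    offsets = exclusive_scan(hit_flags)
    compacted = [None] * sum(hit_flags)
    for i, flag in enumerate(hit_flags):
        if flag:
            compacted[offsets[i]] = i
    return compacted
```

Feeding in the slide's examples, `bin_rays([0, 2, 0], 3)` places the two rays of the first bin next to each other, and `defrag([1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0])` packs the six hits into six consecutive slots. On the GPU both loops would run one thread per ray, which is why the scan has to be a parallel one.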
Temporal Denoise Filter
Still noisy

Image Denoise Filter
Generate LUT { angle, roughness } to { width, height } for unit length ray

Image Denoise Filter
[Figure: ∗ = (convolution illustration)]

Image Denoise Filter
[Figure: ½ ∗ ½ = (convolution illustration)]

Image Denoise Filter

New Pipeline
[Diagram stages: Variable Rate Tracing, Generate Rays, Ray Binning, Screen Space Hybrid, Intersect/Material Data]
[Timings as listed: 0.37 ms, 0.19 ms, 0.15 ms, 0.36 ms, 1.98 ms. The per-stage pairing is not recoverable from the extraction, apart from Intersect/Material Data at 1.98 ms]

New Pipeline (6.29 ms total)
    Intersect/Material Data   1.98 ms
    Defrag                    0.08 ms
    'Improved' Lighting       0.46 ms
    Spatial Filter            1.45 ms
    Temporal Filter           0.24 ms
    Image Filter              1.00 ms

DXR, a.k.a. "Black Box"
[Diagram labels: No DXR, Intersection, Shading]

DXR basics
• BLAS - Bottom Level Acceleration Structure
• TLAS - Top Level Acceleration Structure
• CS - Compute shader
  • Skinning, destruction
  • Update each frame
  • BLAS can update incrementally
[Diagram: BLAS objects A, B, C, D. Each TLAS instance pairs a BLAS with its own transform matrix (a, b, c, d, entries indexed 1,1 … 4,4). A CS pass updates the animated BLAS before the TLAS is built]

Acceleration structure
• Which objects?
  • Frustum culling
  • Occlusion culling
• Easy... no culling!

Acceleration structure - first pass
• Rotterdam
  • 20200 TLAS instances...
  • 5000 BLAS rebuilds...
  • GPU rebuild 64 ms (!)

What to do?
• Idea: Reduce instance count
• Use a culling heuristic
• Accept (some) minor artifacts

Culling heuristic
• Assumption:
  • Far away objects not important
  • Except for large objects
    • Bridge, building etc
• Need some kind of measurement...

Culling
• Project bounding sphere
• $\theta = \tan^{-1}(r / d)$
• If $\theta <$ threshold: cull
[Diagram: bounding sphere of radius r at distance d, subtending angle θ]

Culling
[Images: reference (no culling), θ = 4°, θ = 15°]

Culling
[Images: culled objects; reference (no culling), θ = 4°]

Culling - results
• 4 deg culling
  • 5000 -> 400 BLAS rebuilds each frame
  • 20000 -> 2800 TLAS instances
  • TLAS + BLAS build (GPU): 64 ms -> 14.5 ms
• Pros
  • Faster
• Cons
  • Occasional popping
  • Missing objects

BLAS update optimizations
• Still expensive! More ideas:
1. Stagger full and incremental BLAS rebuild
   • N frames incremental before full rebuild
2. D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_PREFER_FAST_BUILD
3. Avoid redundant rebuilds
   • Check CS input (bone matrix)
   • 400 -> 50
• Overlap BLAS update with GFX
  • Gbuffer, shadowmaps

Results
• TLAS + BLAS build (GPU): 14.5 ms -> 1.15 ms
• RayGen (GPU): 0.71 ms -> 0.81 ms (staggered refit + flags)
• Much better ☺

Shading (opaque)
[Images: RT ON | SHADING OFF, and RT ON | SHADING ON]

Raytracing requirements
• Shader output must match!
• Raster vs. Raytrace
  • ClosestHit Shader
  • AnyHit Shader
• Same?

Shaders in Frostbite
• VS - Handwritten
• PS - Shader Graphs
  • Graph -> .hlsl
• Manual conversion... no
  • 1000s of shaders
• Auto VS + PS to HitGroup

Hit shader template
    // Pseudocode from the slide; annotations moved into comments.
    [shader("closesthit")]
    void chMain()
    {
        IA0 = iaMain(id + 0)            // Vertex buffer: UV, Normal
        IA1 = iaMain(id + 1)
        IA2 = iaMain(id + 2)
        V0 = vsMain(IA0)                // VS - VertexFragment: World Space Normal
        V1 = vsMain(IA1)
        V2 = vsMain(IA2)
        V = lerp(V0, V1, V2, U, V, W)
        P = psMain(V)...                // PS - ShaderGraph
        ...
        writePayload(P)
    }
Open questions raised on the slide:
• clip? ...
• Texture MIP?
    #define Sample(s, uv) SampleLevel(s, uv, 0)
• ddx/ddy?
    #define ddx(x) x
    #define ddy(x) x
• ¯\_(ツ)_/¯

Alpha testing
• AnyHit Shader:
  • If (AlphaTest(alphaValue)) IgnoreHit();

Alpha test
[Images: ANY HIT OFF, ANY HIT ON]

Alpha test

Alpha testing video

Summary opaque
• Closest Hit Shader
  • Always
• Any Hit Shader (Optional)
  • Alpha tested
• Compute Shader (Optional)
  • Skinning, destruction etc

Ray payload
• Payload: returned on ray intersection
• Same format as Gbuffer RTV
• Contains Material Data
  • Normal
  • Base Color
  • Smoothness
  • …etc
    struct GbufferPayloadPacked
    {
        uint data0;   // R10G10B10A2_UNORM
        uint data1;   // R8G8B8A8_SRGB
        uint data2;   // R8G8B8A8_UNORM
        uint data3;   // R11G11B10_FLOAT
        float hitT;   // Ray length
    };

Verifying correctness
1. Rasterizer output
2. Shoot primary rays into scene
3. Compare Payload with Gbuffer
4. Non-zero output? Bug!
5. Fix bug
[Figure: Gbuffer (BaseColor), labelled Reference, minus Raytraced (BaseColor), labelled Primary Rays, = Delta]

Shader compilation
• All shaders generated ☺
  • ~3000 per level
  • ~250 per frame
• Single RT PSO
• Runtime compile times?
[Image: color coded Closest Hit Shaders]

PSO generation
• DX12 GFX PSO..
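
Looking back at the culling heuristic from the acceleration-structure slides, the test can be sketched in a few lines of Python. This is an illustration under assumptions rather than Frostbite's code: the function names and the `(name, center, radius)` instance tuples are hypothetical, and the angle is taken as the arctangent of r/d, which is how the garbled formula on the Culling slide is read here.

```python
import math


def projected_angle_deg(radius, distance):
    """Angle in degrees subtended by a bounding sphere, theta = atan(r / d)."""
    # atan2 avoids a division by zero when the camera sits on the sphere center.
    return math.degrees(math.atan2(radius, distance))


def should_cull(radius, distance, threshold_deg=4.0):
    """Cull when the projected angle falls below the threshold (4 degrees on the results slide)."""
    return projected_angle_deg(radius, distance) < threshold_deg


def visible_instances(instances, camera_pos, threshold_deg=4.0):
    """Keep the names of instances whose bounding sphere is large or close enough.

    instances: iterable of (name, center, radius), center being an (x, y, z) tuple.
    """
    kept = []
    for name, center, radius in instances:
        distance = math.dist(center, camera_pos)
        if not should_cull(radius, distance, threshold_deg):
            kept.append(name)
    return kept
```

With a 4 degree threshold, a 1 m object 100 m away is dropped while a 50 m bridge at the same distance stays, which matches the slide's intent that far away objects are unimportant unless they are large.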