Vulkan Case Study 2016 Khronos Seoul DevU SAMSUNG Electronics

Soowan Park, Graphics Engineer
Joonyong Park, Senior Graphics Engineer

Before the start

All case-study information and content is based on our development experience with the Galaxy S7, which spans two chipset variants: one with an ARM Mali GPU and one with a Qualcomm GPU.

Who we are

• GPU, Graphics R&D, MCD, SAMSUNG Electronics.

What we did

• ProtoStar, HIT, NFS, Vainglory
• MWC, GDC, SDC, E3, Gamescom, CEDEC

History

Agenda

1. Swapchain
2. Uniform Buffer
3. GPU Driver
4. Rendering
5. GLES Fall-back
6. Development Tip

Who is this for?

• Android Vulkan developers.
• These are very simple cases, but important ones!

1. Swapchain

Swapchain - Android

• Triple buffering - Project Butter (applied since the Android 4.1 Jelly Bean release)
  ◦ Android OpenGL ES runs with triple buffering by default.
  ◦ You can check this with adb shell dumpsys SurfaceFlinger. [Screenshot: dumpsys output]

• Image count of the swapchain
  ◦ For this reason, the Android platform requires at least 3 buffers for better performance.
  ◦ [Diagram: buffers cycling #0, #1, #2, #0, #1.] In OpenGL ES, the user can't control the number of back buffers.

• With a Java SurfaceView
  ◦ Currently, Android Vulkan only supports NativeActivity directly. However, you can use a SurfaceView and a Java activity by passing the Surface to native code through JNI to get an ANativeWindow handle.
  ◦ q.v.: https://developer.android.com/ndk/reference/group___native_activity.html
  ◦ We recommend using a separate Java-side render thread, like the one GLSurfaceView uses, for the main render loop.
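To act on the three-buffer requirement above, a swapchain typically asks for three images, clamped to what the surface supports. The sketch below is a minimal illustration under stated assumptions, not a required procedure: `chooseImageCount` is a hypothetical helper whose inputs are assumed to come from `VkSurfaceCapabilitiesKHR::minImageCount` and `::maxImageCount` (a `maxImageCount` of 0 means there is no upper limit), and it is kept free of Vulkan headers so it builds on its own.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: pick the value for VkSwapchainCreateInfoKHR::minImageCount.
// minCount / maxCount mirror VkSurfaceCapabilitiesKHR::minImageCount / maxImageCount;
// per the Vulkan spec, maxCount == 0 means "no upper limit".
uint32_t chooseImageCount(uint32_t minCount, uint32_t maxCount, uint32_t desired = 3) {
    uint32_t count = std::max(minCount, desired);          // never below the surface minimum
    if (maxCount != 0) count = std::min(count, maxCount);  // respect the upper bound, if any
    return count;
}
```

For example, a surface reporting 2..8 images gets 3, while one that requires at least 4 gets 4.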

Swapchain - Presentation Mode

• VK_PRESENT_MODE_MAILBOX_KHR

[Diagram: three swapchain images (#0, #1, #2) feed an implementation-dependent internal queue with a single entry X. The app calls vkAcquireNextImageKHR and vkQueuePresentKHR for each image in turn, and each present replaces the queued entry (X=#0, then X=#1, then X=#2). At VBLANK, the display controller reads whichever image is queued (#1 in the diagram). Latency is low, but replaced images are never shown.]

Swapchain - Presentation Mode

• VK_PRESENT_MODE_FIFO_KHR

[Diagram: three swapchain images (#0, #1, #2) feed an internal queue with entries X, Y, Z. Each vkAcquireNextImageKHR / vkQueuePresentKHR pair appends to the queue (X=#0, Y=#1, Z=#2). At each VBLANK, the oldest entry (#0, stored in X) is swapped with the back buffer. Every image is shown, at the cost of higher latency.]

Swapchain - Presentation Mode

[Graphs: frame-time traces for VK_PRESENT_MODE_FIFO_KHR and VK_PRESENT_MODE_MAILBOX_KHR, each plotted against a 60 FPS line.]

※ Do NOT use MAILBOX mode in games unless latency is critical and you know what you're doing.
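The difference between the two presentation modes can be seen in a toy model. This is a hypothetical simulation written for illustration, not real driver behavior: `simulateDisplay`, `PresentMode` and the fixed "presents per VBLANK" rate are all assumptions, and the FIFO depth of 3 is a simplification of the internal queue in the diagram.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Toy model (not a Vulkan API) of the two present modes.
enum class PresentMode { Fifo, Mailbox };

// The app presents `presentsPerVblank` new frames between consecutive VBLANKs
// (i.e. it renders faster than the display when > 1). Returns the frame ids the
// display actually shows, in order.
std::vector<int> simulateDisplay(PresentMode mode, int totalFrames,
                                 int presentsPerVblank, std::size_t fifoDepth = 3) {
    std::vector<int> shown;
    if (presentsPerVblank <= 0 || fifoDepth == 0) return shown;  // avoid an endless loop
    std::deque<int> fifo;  // FIFO: queued presents, oldest first
    int mailbox = -1;      // MAILBOX: single pending image, -1 means empty
    int next = 0;          // next frame id the app will present
    while (next < totalFrames || !fifo.empty() || mailbox >= 0) {
        // App side: present frames until the next VBLANK.
        for (int i = 0; i < presentsPerVblank && next < totalFrames; ++i) {
            if (mode == PresentMode::Fifo) {
                if (fifo.size() >= fifoDepth) break;  // queue full: the app blocks
                fifo.push_back(next++);
            } else {
                mailbox = next++;  // replaces (drops) any image still waiting
            }
        }
        // Display side: at VBLANK, latch one pending image, if any.
        if (mode == PresentMode::Fifo) {
            if (!fifo.empty()) {
                shown.push_back(fifo.front());
                fifo.pop_front();
            }
        } else if (mailbox >= 0) {
            shown.push_back(mailbox);
            mailbox = -1;
        }
    }
    return shown;
}
```

With two presents per VBLANK, the mailbox run shows only frames 1, 3 and 5: frames 0, 2 and 4 were rendered and then replaced before any VBLANK, which is wasted GPU work and power on a phone. FIFO shows every frame but throttles the app to the display rate.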

Swapchain - Presentation Mode

• Code level (q.v.: https://www.khronos.org/registry/vulkan/specs/1.0-wsi_extensions/xhtml/vkspec.html, 29.5. Surface Queries)

uint32_t presentModeCount = 0;
vkGetPhysicalDeviceSurfacePresentModesKHR(physicalDevice, surface, &presentModeCount, nullptr);
std::vector<VkPresentModeKHR> presentModes(presentModeCount);
vkGetPhysicalDeviceSurfacePresentModesKHR(physicalDevice, surface, &presentModeCount, presentModes.data());

// FIFO is the only mode the spec requires, so it is also the fallback.
VkPresentModeKHR presentMode = VK_PRESENT_MODE_FIFO_KHR;

// Desired modes, in order of preference.
const uint32_t desiredArraySize = 2;
const VkPresentModeKHR desiredPresentModes[] = { VK_PRESENT_MODE_FIFO_KHR, VK_PRESENT_MODE_MAILBOX_KHR };

bool found = false;
for (uint32_t d_n = 0; d_n < desiredArraySize && !found; ++d_n) {
    for (uint32_t p_n = 0; p_n < presentModeCount; ++p_n) {
        if (presentModes[p_n] == desiredPresentModes[d_n]) {
            presentMode = desiredPresentModes[d_n];
            found = true;
            break;
        }
    }
}

Swapchain - SwapBuffer Comparison (Android)

Vulkan, WSI (Window System Integration), render frame N:
• Commands are recorded into command buffer #0.
• vkAcquireNextImageKHR returns image #0 (the frame can block here).
• vkQueueSubmit #0 flushes the commands for rendering. The application can explicitly get a GPU rendering-completion signal from the fence passed to the submit, and a "rendering complete" semaphore guards VkImage #0.
• vkQueuePresentKHR #0 on the associated graphics queue hands the image to the associated native window, whose internal window buffers (VkImage #0, #1, #2) are dequeued and queued with SurfaceFlinger for display.
※ The application does the "blocking wait" to sync with the GPU (VK_PRESENT_MODE_FIFO_KHR).

OpenGL ES, render frame N:
• glClear / glDrawXXX render into the back buffer (framebuffer 0).
• eglSwapBuffers #0 flushes the commands (glFlush()) for rendering (the frame can block here).
• The EGLSurface buffers (GfxBuffer #0, #1, #2) are dequeued and queued with SurfaceFlinger through the associated native window for display.
• The swap gives no explicit GPU rendering-completion signal.

Swapchain - Synchronization failure case

• Tearing

Swapchain - Synchronization

• Fence logic: each swapchain image index has its own VkFence and VkCommandBuffer, allocated from a single-threaded VkCommandPool. For every frame, with i = the current index (#0, #1, #2, #0, #1, #2, ...):

vkWaitForFences(fence #i)
vkResetFences(fence #i)
vkResetCommandBuffer(buf #i)
vkBeginCommandBuffer(buf #i)
/* Render ~ */
vkEndCommandBuffer(buf #i)
vkQueueSubmit(fence #i)
vkQueuePresentKHR
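The fence logic above can be checked on the CPU with a small model. This is a hypothetical sketch, not a Vulkan wrapper: `FrameRing` stands in for the per-index fences and command buffers, a fence is represented only by the frame number last submitted with it, and GPU progress is passed in by the caller.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU-side model of the per-index fence ring.
// A "fence" is represented by the frame number last submitted with that slot;
// the GPU's progress is passed in as the newest frame it has finished.
class FrameRing {
public:
    // slots must be > 0 (one per swapchain image index).
    explicit FrameRing(std::size_t slots) : lastSubmitted_(slots, -1) {}

    // Slot (fence #i / command buffer #i) used by a given frame.
    std::size_t slotFor(long frame) const {
        return static_cast<std::size_t>(frame) % lastSubmitted_.size();
    }
    // Frame whose fence vkWaitForFences must wait on before this frame may
    // reset and re-record its command buffer (-1: slot never used yet).
    long mustWaitFor(long frame) const { return lastSubmitted_[slotFor(frame)]; }

    // True once that fence would be signaled; only then is it safe to call
    // vkResetFences / vkResetCommandBuffer / vkBeginCommandBuffer for the slot.
    bool safeToReuse(long frame, long gpuCompletedFrame) const {
        return mustWaitFor(frame) <= gpuCompletedFrame;
    }
    // Corresponds to vkQueueSubmit(..., fence #slotFor(frame)).
    void submit(long frame) { lastSubmitted_[slotFor(frame)] = frame; }

private:
    std::vector<long> lastSubmitted_;
};
```

With three slots, frame f reuses slot f % 3 and must wait on the fence submitted three frames earlier; waiting on any other fence is the kind of mistake shown in Validation Error #1.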

Image Layout - Swapchain

• Transition each image to the correct layout for rendering and presenting.
• At the very beginning of drawing, after an image is acquired for the first time:
  ◦ vkGetSwapchainImagesKHR returns images in VK_IMAGE_LAYOUT_UNDEFINED
  ◦ Transition to VK_IMAGE_LAYOUT_GENERAL
  ◦ Clear the presentable image
• Draw routine:
  ◦ Acquire
  ◦ Transition to VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
  ◦ Render
  ◦ Transition to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR
  ◦ Present

// Create swapchain
vkGetSwapchainImagesKHR(device, swapchain, &swapchainImageCount, pSwapchainImages); // all images start in VK_IMAGE_LAYOUT_UNDEFINED

// Frame loop
swapchainIndex = acquire();
if (!imageInitialized[swapchainIndex]) {
    // An image may only be used after it has been acquired, so initialize each one on its first acquire.
    setImageLayout(pSwapchainImages[swapchainIndex], VK_IMAGE_LAYOUT_GENERAL);
    clearImage(pSwapchainImages[swapchainIndex]);
    imageInitialized[swapchainIndex] = true;
}
setImageLayout(pSwapchainImages[swapchainIndex], VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL);
/* Rendering */
setImageLayout(pSwapchainImages[swapchainIndex], VK_IMAGE_LAYOUT_PRESENT_SRC_KHR);
present(swapchainIndex);
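Because every barrier needs the image's current layout as its `oldLayout`, it helps to track layouts per swapchain image. A minimal sketch under stated assumptions: `Layout` is a stand-in enum (not `VkImageLayout`), and `LayoutTracker` / `recordFrame` are hypothetical helpers that only record which (oldLayout, newLayout) pairs the frame loop would put into its image memory barriers, treating an image still in UNDEFINED as acquired for the first time.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for the VkImageLayout values used in the frame loop (not the real enum).
enum class Layout { Undefined, General, ColorAttachment, PresentSrc };

// Hypothetical tracker: remembers each swapchain image's current layout so the
// oldLayout of every VkImageMemoryBarrier is correct.
class LayoutTracker {
public:
    explicit LayoutTracker(std::size_t imageCount)
        : layouts_(imageCount, Layout::Undefined) {}  // as returned by vkGetSwapchainImagesKHR

    // Records a transition and returns the (oldLayout, newLayout) pair for the barrier.
    std::pair<Layout, Layout> transition(std::size_t image, Layout newLayout) {
        Layout oldLayout = layouts_[image];
        layouts_[image] = newLayout;
        return {oldLayout, newLayout};
    }
    Layout current(std::size_t image) const { return layouts_[image]; }

private:
    std::vector<Layout> layouts_;
};

// Barriers one frame of the loop would issue for the acquired image `index`.
std::vector<std::pair<Layout, Layout>> recordFrame(LayoutTracker& t, std::size_t index) {
    std::vector<std::pair<Layout, Layout>> barriers;
    if (t.current(index) == Layout::Undefined)                         // first acquire of this image
        barriers.push_back(t.transition(index, Layout::General));      // ...then clear it
    barriers.push_back(t.transition(index, Layout::ColorAttachment));  // render
    barriers.push_back(t.transition(index, Layout::PresentSrc));       // present
    return barriers;
}
```

On the second use of an image, the first barrier goes from PRESENT_SRC to COLOR_ATTACHMENT rather than from UNDEFINED, which is exactly what the tracker reports.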

Image Layout - Texture

Texturing in Vulkan: VkDescriptorSet → VkDescriptorImageInfo → VkImageView → VkImage → VkDeviceMemory, plus a VkSampler.

• VK_IMAGE_TILING_LINEAR
  ◦ Create with VK_IMAGE_LAYOUT_PREINITIALIZED
  ◦ Set the image data using vkMapMemory / vkUnmapMemory
  ◦ Transition to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL
• VK_IMAGE_TILING_OPTIMAL
  ◦ Create with VK_IMAGE_LAYOUT_UNDEFINED
  ◦ Transition to VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL
  ◦ Set the image data using a staging buffer
  ◦ Transition to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL

• You can check the format properties like this:

// Get image format properties
VkFormatProperties formatProperties;
vkGetPhysicalDeviceFormatProperties(physicalDevice, imageFormat, &formatProperties);
if (formatProperties.optimalTilingFeatures & VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT) {
    /* sample with VK_IMAGE_TILING_OPTIMAL (upload through a staging buffer) */
} else if (formatProperties.linearTilingFeatures & VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT) {
    /* fall back to VK_IMAGE_TILING_LINEAR */
}

Image Layout - Texture

• Why should we use a staging buffer?
  ◦ VK_IMAGE_TILING_LINEAR: texels are laid out in memory in row-major order, possibly with some padding on each row.
  ◦ VK_IMAGE_TILING_OPTIMAL: texels are laid out in an implementation-dependent arrangement, for more optimal memory access.
  ◦ Only the linear arrangement is defined, so only a linear image can be filled directly through a mapped pointer; an optimal image has to be filled by a copy on the GPU.

• For a linear image, you can compute the address of a texel with these equations:

Common
// (x,y,z,layer) are in texel coordinates
address(x,y,z,layer) = layer*arrayPitch + z*depthPitch + y*rowPitch + x*texelSize + offset;

Compressed
// (x,y,z,layer) are in compressed texel block coordinates
address(x,y,z,layer) = layer*arrayPitch + z*depthPitch + y*rowPitch + x*compressedTexelBlockByteSize + offset;
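The equations above translate directly into code. In this sketch, `SubresourceLayout` is a hypothetical mirror of the `VkSubresourceLayout` fields the formula uses (in real code they would come from `vkGetImageSubresourceLayout`). Note that it uses `rowPitch`, not width × texel size, because a linear image may pad each row.

```cpp
#include <cstdint>

// Hypothetical mirror of the VkSubresourceLayout fields used by the formula.
struct SubresourceLayout {
    uint64_t offset, rowPitch, arrayPitch, depthPitch;
};

// Byte address of texel (x, y, z, layer) in a VK_IMAGE_TILING_LINEAR image.
// For compressed formats, pass block coordinates and the block's byte size as
// texelSize, as in the second equation.
uint64_t linearTexelAddress(const SubresourceLayout& l, uint64_t x, uint64_t y,
                            uint64_t z, uint64_t layer, uint64_t texelSize) {
    return layer * l.arrayPitch + z * l.depthPitch + y * l.rowPitch + x * texelSize + l.offset;
}
```

For example, with a 256-texel RGBA8 row (1024 bytes) padded to a rowPitch of 1040 and a subresource offset of 256, texel (3, 2) sits at 256 + 2 × 1040 + 3 × 4 = 2348 bytes.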

Image Layout - Texture

• How can we use a staging buffer?
  [Diagram: fill the image data into a VkBuffer, then record vkCmdCopyBufferToImage in a VkCommandBuffer to copy it into a VkImage created with VK_IMAGE_TILING_OPTIMAL.]

VkBuffer& stagingBuffer = getStagingBuffer(imageBufferSize);
VkBufferImageCopy region = getRegionFromImage(image);
fillBuffer(stagingBuffer, pImageData);
// image must already have been transitioned to VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL
vkCmdCopyBufferToImage(commandBuffer, stagingBuffer, image, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &region);

※ Do NOT use VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT with VK_IMAGE_TILING_OPTIMAL.

Image Layout - Framebuffer (Color Only)

• Transitioning an off-screen render target to an input texture (e.g. environment map, post-processing, etc.):
  ◦ Bind as attachment: VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
  ◦ Bind as texture: VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL

// Initialize framebuffer
createFrameBuffer(frameBuffer); // VK_IMAGE_LAYOUT_UNDEFINED
setImageLayout(frameBuffer, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL);
………

// Bind framebuffer (transition before the render pass begins)
setImageLayout(frameBuffer, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL);
bindFrameBuffer(frameBuffer);
/* Render into frameBuffer */
unbindFrameBuffer(frameBuffer); // and set the default framebuffer
setImageLayout(frameBuffer, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL);
setTexture(frameBuffer, 0);
/* Render into backbuffer */

Image Layout - Framebuffer (Color Only)

Framebuffer in Vulkan: VkFramebuffer → VkImageView → VkImage → VkDeviceMemory

[Diagram: off-screen #0 renders the original scene into VkImage #0, and off-screen #1 renders a normal map for post-processing into VkImage #1. Each image is in VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL while it is being rendered, and in VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL while a later pass samples it.]

Swapchain - SurfaceFormat

• When going from Java to native, all related window surfaces should have the same format (be careful with a Java SurfaceView).

[Diagram: with a native activity, the application simply creates the swapchain. With a Java activity, the application creates a Java SurfaceView with getHolder().setFormat(PixelFormat.RGB_888) and then creates a swapchain with VK_FORMAT_R8G8B8_UNORM. Both formats need to match, or you need to check the image format in the render pass.]

• We recommend querying the surface formats to check whether your target device supports the one you want. All Java-side and native-side surfaces should use a matching format.

uint32_t surfaceFormatCount = 0;
vkGetPhysicalDeviceSurfaceFormatsKHR(physicalDevice, surface, &surfaceFormatCount, nullptr);
std::vector<VkSurfaceFormatKHR> surfaceFormats(surfaceFormatCount);
vkGetPhysicalDeviceSurfaceFormatsKHR(physicalDevice, surface, &surfaceFormatCount, surfaceFormats.data());

const uint32_t desiredArraySize = 2;
const VkFormat desiredSurfaceFormats[] = { VK_FORMAT_R8G8B8_UNORM, VK_FORMAT_R5G6B5_UNORM_PACK16 };

bool found = false;
for (uint32_t d_n = 0; d_n < desiredArraySize && !found; ++d_n) {
    for (uint32_t s_n = 0; s_n < surfaceFormatCount; ++s_n) {
        if (surfaceFormats[s_n].format == desiredSurfaceFormats[d_n]) {
            // Take format and color space from the matching surface format entry.
            swapchainImageFormat = surfaceFormats[s_n].format;
            colorSpace = surfaceFormats[s_n].colorSpace;
            found = true;
            break;
        }
    }
}

Swapchain - Create / Recreation example relying on Android activity events

• Surface handling

Event → Vulkan work
• Activity starts → Create VkInstance
• surfaceCreated() → Create VkSurfaceKHR
• surfaceChanged() → Create VkSwapchainKHR
• Resize: surfaceChanged() → Create VkSwapchainKHR; need to pass the old swapchain as oldSwapchain
• onPause: surfaceDestroyed() → Destroy VkSwapchainKHR and VkSurfaceKHR; need to wait until the queue is empty, or the app can crash
• onResume: surfaceCreated() → Create VkSurfaceKHR, then surfaceChanged() → Create VkSwapchainKHR
• Shut down the app: surfaceDestroyed() → Destroy VkSwapchainKHR and VkSurfaceKHR
• Activity is shut down → Destroy VkInstance

Swapchain – Crash at onSurfaceChanged (Resize)

[Diagram: on surface changed, swapchain A is passed as oldSwapchain when creating swapchain B, and A's internal resources are destroyed at B's creation time. Command buffers N-2 and N-1 have completed presentation, but command buffer N is still presenting, so the app crashes.]

• Fix: on surface changed, call vkQueueWaitIdle() and wait until the queue is empty, then create B.

Swapchain – Crash at onSurfaceDestroyed (Pause)

[Diagram: onPause destroys the surface, and swapchain A is destroyed while command buffers N-2 and N-1 have completed presentation but command buffer N is still presenting, so the app crashes. onResume then recreates the surface and swapchain B.]

• Fix: on surface destroyed, call vkQueueWaitIdle(), wait until the queue is empty, and then destroy A.

Similar problem - Vulkan Object Release

[Diagram: a typical per-frame release sequence between Begin Frame and End Frame:
• destroyShader: destroy the graphics pipeline, descriptors, …
• destroyVertexBuffer / destroyIndexBuffer: destroy the VkBuffer, release or return the VkDeviceMemory, …
• destroyXXX
If vkDestroyPipeline is called right away, command buffer N-2 (#0) has completed presentation and N-1 (#1) is presenting, but command buffer N (#2) is still in progress and may still use the VkPipeline.]

Similar problem - Vulkan Object Release

• Destroy shader
[Diagram: render frames N to N+4 use VkCommandBuffer #0, #1, #2, #0, #1. VkPipeline #0~#10 are created and used in frame N. When VkPipeline #0~#5 are to be destroyed, a dependency check runs over the following frames while #6~#10 stay in use, and #0~#5 are destroyed only once no in-flight command buffer still uses them.]
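One common way to avoid destroying objects that in-flight command buffers still use is to defer the destroy call until the GPU has finished the last frame that referenced the object. This is a general technique sketched under assumptions, not necessarily the authors' implementation: `DeferredReleaseQueue` is hypothetical, frame numbers stand in for fences, and the caller supplies the newest frame whose fence has signaled.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical deferred-release queue: instead of calling vkDestroyPipeline,
// vkDestroyBuffer, vkFreeMemory, ... immediately, queue the call together with
// the last frame that may still reference the object, and run it once the GPU
// has finished that frame.
class DeferredReleaseQueue {
public:
    // Assumes calls arrive in non-decreasing lastUsedFrame order.
    void release(int64_t lastUsedFrame, std::function<void()> destroy) {
        pending_.emplace_back(lastUsedFrame, std::move(destroy));
    }
    // Call once per frame with the newest frame whose fence has signaled.
    // Returns how many destroy calls were executed.
    std::size_t collect(int64_t gpuCompletedFrame) {
        std::size_t destroyed = 0;
        while (!pending_.empty() && pending_.front().first <= gpuCompletedFrame) {
            pending_.front().second();  // the actual vkDestroy* / vkFree* call
            pending_.pop_front();
            ++destroyed;
        }
        return destroyed;
    }
    std::size_t pendingCount() const { return pending_.size(); }

private:
    std::deque<std::pair<int64_t, std::function<void()>>> pending_;
};
```

In the pipeline example, the destroy call for pipelines #0~#5 would be queued with the last frame that bound them and would only run after that frame's fence has signaled.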

2. UniformBuffer

UniformBuffer - Shader Memory Alignment

[Screenshots: the expected rendering vs. the erroneous rendering.]

layout(set=0, binding=0) uniform buf1 {
    float _unif1; // #0
    vec3  _unif2; // #1
    vec2  _unif3; // #2
};

[Diagram: "Expected": members #0, #1, #2 placed with std140 alignment; "Error": the same members in declaration order, tightly packed.]

• Converting to SPIR-V: a shader written with the Vulkan GLSL extension would not have this alignment problem, because the std140 layout is applied (q.v.: VulkanSpec_1.0.28, 14.5.4. Offset and Stride Assignment). But be careful if you use SPIR-V converted directly through glslang from a shader that has no std140 alignment.

UniformBuffer - Shader Memory Alignment

q.v.: VulkanSpec_1.0.28, 14.5.4. Offset and Stride Assignment
• The Offset decoration must be a multiple of its base alignment, computed recursively as follows:
  ◦ a scalar of size N has a base alignment of N
  ◦ a two-component vector, with components of size N, has a base alignment of 2N
  ◦ a three- or four-component vector, with components of size N, has a base alignment of 4N
  ◦ an array has a base alignment equal to the base alignment of its element type, rounded up to a multiple of 16
  ◦ a structure has a base alignment equal to the largest base alignment of any of its members, rounded up to a multiple of 16
  ◦ a row-major matrix of C columns has a base alignment equal to the base alignment of a vector of C matrix components
  ◦ a column-major matrix has a base alignment equal to the base alignment of the matrix column type
• Any ArrayStride or MatrixStride decoration must be an integer multiple of the base alignment of the array or matrix from above.
• The Offset decoration of a member immediately following a structure or an array must be greater than or equal to the next multiple of the base alignment of that structure or array.
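The std140 offsets in the uniform-block examples can be computed mechanically. This sketch covers only the scalar, vector and matrix types used on these slides (no arrays or structs), assumes 4-byte components, and follows the GLSL std140 rules (which pad each matrix column to 16 bytes) rather than the looser Vulkan validity rules quoted above; `GlslType` and `std140Offsets` are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// Types that appear in the uniform blocks on these slides.
enum class GlslType { Float, Vec2, Vec3, Vec4, Mat2, Mat3, Mat4 };

struct Std140Info { uint32_t align, size; };

// Base alignment and size under GLSL std140 with 4-byte components.
// Column-major matrices are laid out like arrays of column vectors, so each
// column occupies 16 bytes.
Std140Info std140Info(GlslType t) {
    switch (t) {
    case GlslType::Float: return {4, 4};
    case GlslType::Vec2:  return {8, 8};
    case GlslType::Vec3:  return {16, 12};
    case GlslType::Vec4:  return {16, 16};
    case GlslType::Mat2:  return {16, 2 * 16};
    case GlslType::Mat3:  return {16, 3 * 16};
    case GlslType::Mat4:  return {16, 4 * 16};
    }
    return {4, 4};  // unreachable for valid enum values
}

uint32_t alignUp(uint32_t value, uint32_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

// Offset of every member, in declaration order.
std::vector<uint32_t> std140Offsets(const std::vector<GlslType>& members) {
    std::vector<uint32_t> offsets;
    uint32_t cursor = 0;
    for (GlslType t : members) {
        Std140Info info = std140Info(t);
        cursor = alignUp(cursor, info.align);  // Offset must be a multiple of the base alignment
        offsets.push_back(cursor);
        cursor += info.size;
    }
    return offsets;
}
```

For the block { float, vec3, vec2 } this gives offsets 0, 16 and 32, which is why a tightly packed CPU-side struct (0, 4, 16) breaks.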

UniformBuffer - Shader Memory Alignment

[Diagrams: for each block below, where members #0, #1, #2 land in VkDeviceMemory (one cell = 4 bytes). The std140 offsets are noted in the comments.]

layout(set=0, binding=0, std140) uniform buf1 {
    mat4 _unif00; // #0, offset 0
    vec4 _unif01; // #1, offset 64
    vec4 _unif02; // #2, offset 80
};

layout(set=0, binding=0, std140) uniform buf1 {
    vec2 _unif00; // #0, offset 0
    vec2 _unif01; // #1, offset 8
    vec3 _unif02; // #2, offset 16
};

layout(set=0, binding=0, std140) uniform buf1 {
    vec4 _unif00; // #0, offset 0
    vec2 _unif01; // #1, offset 16
    vec2 _unif02; // #2, offset 24
};

layout(set=0, binding=0, std140) uniform buf1 {
    vec2  _unif00; // #0, offset 0
    float _unif01; // #1, offset 8
    float _unif02; // #2, offset 12
};

layout(set=0, binding=0, std140) uniform buf1 {
    vec3  _unif00; // #0, offset 0
    float _unif01; // #1, offset 12
    vec2  _unif02; // #2, offset 16
};

layout(set=0, binding=0, std140) uniform buf1 {
    float _unif00; // #0, offset 0
    vec3  _unif01; // #1, offset 16
    vec2  _unif02; // #2, offset 32
};

layout(set=0, binding=0, std140) uniform buf1 {
    float _unif00; // #0, offset 0
    vec2  _unif01; // #1, offset 8
    vec2  _unif02; // #2, offset 16
};

layout(set=0, binding=0, std140) uniform buf1 {
    float _unif00; // #0, offset 0
    vec4  _unif01; // #1, offset 16
    vec4  _unif02; // #2, offset 32
};

layout(set=0, binding=0, std140) uniform buf1 {
    mat2 _unif00; // #0, offset 0
    mat3 _unif01; // #1, offset 32
    mat4 _unif02; // #2, offset 80
};

layout(set=0, binding=0, std140) uniform buf1 {
    mat3  _unif00; // #0, offset 0
    float _unif01; // #1, offset 48
    vec2  _unif02; // #2, offset 56
};

※ Sorting members, using multiple UBOs, using only vec4… there are many other approaches, depending on your application or engine.

UniformBuffer - Memory Alignment

• Memory pools are useful for dynamic objects.

Assume that each object has the following structure:

layout(set=0, binding=0, std140) uniform buf1 {
    vec2  _unif00; // #0
    vec2  _unif01; // #1
    float _unif02; // #2
    float _unif03; // #3
};

[Diagram: VkDeviceMemory, one cell = 1 byte, bytes 0~23 holding one object (vec2, vec2, float, float). Packing such objects back to back in the pool causes a rendering issue.]

UniformBuffer - Memory Alignment

• UBO value corruption. [Screenshot: corrupted rendering]

UniformBuffer - Memory Alignment

• The offset in VkDescriptorBufferInfo must respect the alignment given by the physical device limits: VkPhysicalDeviceLimits::minUniformBufferOffsetAlignment.

[Diagram: VkDeviceMemory before and after applying the memory alignment, assuming minUniformBufferOffsetAlignment = 16 and a block size of 1 byte. With the alignment applied, each 24-byte object starts at the next multiple of 16.]

UniformBuffer - Memory Alignment

• Code level

VkPhysicalDeviceProperties properties;
vkGetPhysicalDeviceProperties(physicalDevice, &properties);
VkDeviceSize minUniformBufferOffsetAlignment = properties.limits.minUniformBufferOffsetAlignment;

// Pad each uniform block so the next one starts at an aligned offset.
VkDeviceSize padding = 0;
VkDeviceSize mod = _uniformBufferSize % minUniformBufferOffsetAlignment;
if (mod != 0) {
    padding = minUniformBufferOffsetAlignment - mod;
}
_nextBufferOffset = _uniformBufferSize + padding;
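The padding computation above can be packaged as a helper that yields every valid offset in a pooled uniform buffer. A minimal sketch; `alignUp` and `pooledOffsets` are hypothetical names, and `alignment` is assumed to be the non-zero `minUniformBufferOffsetAlignment` value.

```cpp
#include <cstdint>
#include <vector>

// Round size up to the next multiple of alignment (alignment must be non-zero).
// Same arithmetic as the padding computation above.
uint64_t alignUp(uint64_t size, uint64_t alignment) {
    uint64_t mod = size % alignment;
    return mod == 0 ? size : size + (alignment - mod);
}

// Offsets at which to place `objectCount` uniform blocks of `blockSize` bytes
// in one pooled buffer, so each is a valid VkDescriptorBufferInfo::offset
// (or dynamic offset).
std::vector<uint64_t> pooledOffsets(uint64_t blockSize, uint64_t alignment, uint32_t objectCount) {
    std::vector<uint64_t> offsets;
    const uint64_t stride = alignUp(blockSize, alignment);
    for (uint32_t i = 0; i < objectCount; ++i) offsets.push_back(i * stride);
    return offsets;
}
```

With the 24-byte object from the memory-pool example and an alignment of 16, objects go at 0, 32, 64, … instead of 0, 24, 48, ….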

UniformBuffer - Memory Alignment

The following limits in VkPhysicalDeviceLimits are important when you deal with memory management:

size_t       minMemoryMapAlignment;
VkDeviceSize minTexelBufferOffsetAlignment;
VkDeviceSize minUniformBufferOffsetAlignment;
VkDeviceSize minStorageBufferOffsetAlignment;
VkDeviceSize nonCoherentAtomSize;
…

UniformBuffer - Tile Artifact

[Screenshots: tile-shaped rendering artifacts.]

• Why? Because we didn't take care of using multiple uniform buffers.

[Diagram: swapchain images #0, #1, #2 each have their own VkCommandBuffer, but all command buffers read the same single UniformBuffer.]

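UniformBuffer - Tile Artifact

[Diagram: a mobile tile-based GPU renders frame N tile by tile, where N = frame N's MVP matrix and N+1 = frame N+1's (changed) MVP matrix. Most tiles are rendered with N, but once the CPU overwrites the shared uniform buffer for frame N+1, the last tiles of frame N are rendered with N+1.]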

UniformBuffer - Tile Artifact

• Have at least one UniformBuffer for each swapchain image index,
• or use a single UniformBuffer with a different dynamic offset per index in vkCmdBindDescriptorSets. Either solves this issue.

[Diagram: command buffers #0, #1, #2 bound to UB #0, UB #1, UB #2 respectively, or all bound to UB #0 at different dynamicOffset values.]
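The dynamic-offset variant can be sketched as one buffer split into a region per swapchain image. This is hypothetical: `UniformRing` and its fields are illustration names; the block size is assumed to be already rounded up to `minUniformBufferOffsetAlignment`, and the descriptor is assumed to be of type `VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC`, whose offsets are passed as `uint32_t` values in the `pDynamicOffsets` argument of `vkCmdBindDescriptorSets`.

```cpp
#include <cstdint>

// Hypothetical layout for "one UniformBuffer + dynamic offsets": the buffer is
// split into one region per swapchain image, and each region holds one aligned
// block per object. While command buffer #i is in flight, the CPU writes only
// the other regions, so frame N+1's MVP update can't overwrite data that
// frame N is still reading.
struct UniformRing {
    uint64_t alignedBlockSize;  // block size rounded up to minUniformBufferOffsetAlignment
    uint32_t objectCount;       // blocks per swapchain image
    uint32_t imageCount;        // swapchain image count

    uint64_t regionSize() const { return alignedBlockSize * objectCount; }
    uint64_t totalSize() const { return regionSize() * imageCount; }  // for VkBufferCreateInfo::size
    // Value to pass in pDynamicOffsets of vkCmdBindDescriptorSets.
    uint32_t dynamicOffset(uint32_t swapchainIndex, uint32_t objectIndex) const {
        return static_cast<uint32_t>(swapchainIndex * regionSize() + objectIndex * alignedBlockSize);
    }
};
```

With 256-byte blocks, 4 objects and 3 swapchain images, the buffer is 3072 bytes and object 3 of image 2 is bound at offset 2816.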

3. GPU Driver

GPU Driver - Type of shader input

[Screenshot: GPU skinning problem]

• Original data: uint32 x 4 in all three cases.
  ◦ OpenGL ES driver, shader input declared as vec4: somehow the driver corrects the type of the input (vec4 ← uvec4), even though an incorrect type is passed in.
  ◦ Vulkan driver, shader input declared as vec4: the input is empty.
  ◦ Vulkan driver, shader input declared as uvec4: correct.
• Please use the correct type for shader inputs.

GPU Driver - API Comparison

• In common cases, the GPU load should be the same with both APIs.
• But we've faced some cases where GLES was a bit better than Vulkan,
  ◦ even though we were using the same vertices and indices.
• A lot of effort has been spent on OpenGL ES driver optimization.
• Vulkan drivers are intended to be lightweight and predictable, with no more driver magic.
• SO YOU HAVE TO IMPLEMENT OPTIMIZATIONS YOURSELF!

[Graphs: GPU load, GLES vs. Vulkan]

GPU Driver - API Comparison

• Geometry sorting (vertex & index)
  ◦ There is a limitation that the whole range of vertices referenced by a draw must be shaded. This means a triangle built from indices {0, 1, 2} should be significantly cheaper to execute than one built from indices {0, 999, 1999} (3 vertices transformed vs. 2000 vertices transformed).

[Screenshots: without geometry sorting vs. with geometry sorting]
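One simple way to get this kind of geometry sorting is to renumber vertices in the order the index buffer first references them, so nearby triangles use nearby indices. This is a swapped-in, minimal technique (not necessarily what the authors used, and not a full vertex-cache optimizer); `reorderVerticesByFirstUse` is hypothetical and assumes every index is valid.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Renumber vertices in order of first use by the index buffer, rewriting the
// indices to match, so each draw touches a compact vertex range.
template <typename Vertex>
void reorderVerticesByFirstUse(std::vector<Vertex>& vertices, std::vector<uint32_t>& indices) {
    const uint32_t unassigned = std::numeric_limits<uint32_t>::max();
    std::vector<uint32_t> remap(vertices.size(), unassigned);  // old index -> new index
    std::vector<Vertex> reordered;
    reordered.reserve(vertices.size());
    for (uint32_t& index : indices) {
        if (remap[index] == unassigned) {
            remap[index] = static_cast<uint32_t>(reordered.size());
            reordered.push_back(vertices[index]);
        }
        index = remap[index];
    }
    // Keep vertices the index buffer never references, at the end.
    for (std::size_t old = 0; old < vertices.size(); ++old)
        if (remap[old] == unassigned) reordered.push_back(vertices[old]);
    vertices.swap(reordered);
}
```

After this pass, the {0, 999, 1999} triangle from the example becomes {0, 1, 2} if it is the first triangle in the index buffer.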

4. Rendering

Rendering - Quality

• Sometimes you may see color aliasing (banding) artifacts in Vulkan applications.
  ◦ You may need to consider changing the surface format (RGB565 to RGB888).
• SPIR-V has only two precisions for shader calculations. (※ Saturation was modified in the screenshots for presentation.)
  ◦ In glslang: lowp and mediump → RelaxedPrecision decoration; highp → no decoration.
  ◦ Consider using highp if your application needs accuracy.
  ◦ But please use mediump wherever possible, for performance.

[Screenshots: texture compression comparison in two scenes: RGB32 vs. ETC1 vs. ASTC 6x6 vs. ASTC 8x8.]

Rendering - Texture Format

• You can optimize your app using various ASTC block sizes, but you should select the proper option for texture quality (e.g. for fonts, normal maps, UI, high- and low-quality color textures, etc.).

ASTC block size : bits per pixel
4x4   : 8.00
5x4   : 6.40
5x5   : 5.12
6x5   : 4.27
6x6   : 3.56
8x5   : 3.20
8x6   : 2.67
10x5  : 2.56
10x6  : 2.13
8x8   : 2.00
10x8  : 1.60
10x10 : 1.28
12x10 : 1.07
12x12 : 0.89

• ETC1 vs. ASTC comparison (HIT case)
  ◦ APK size: ETC1 + OpenGL ES 2.0 = 599 MB; ASTC + Vulkan = 521 MB
  ◦ Memory usage: ETC1 + OpenGL ES 2.0 = 1115 MB; ASTC + Vulkan = 557 MB
  ◦ Read bandwidth: ETC1 = 14.56 MBs; ASTC = 13.80 MBs; bandwidth_delta = 0.76 MBs; bandwidth_reduction = 5.22%
  ※ It depends on the quality choice.
• Anyway, it's better to use ASTC! Check VkPhysicalDeviceFeatures::textureCompressionASTC_LDR.
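The bits-per-pixel column follows from the fact that every ASTC block occupies 128 bits, whatever its footprint. A small sketch; `astcBitsPerPixel` and `astcLevelBytes` are hypothetical helpers, and the size estimate covers a single 2D mip level only.

```cpp
#include <cstdint>

// Every ASTC block is 128 bits, so bits per pixel = 128 / (blockWidth * blockHeight).
double astcBitsPerPixel(uint32_t blockWidth, uint32_t blockHeight) {
    return 128.0 / (blockWidth * blockHeight);
}

// Bytes for one 2D mip level: round the image up to whole blocks, 16 bytes per block.
uint64_t astcLevelBytes(uint32_t width, uint32_t height, uint32_t blockWidth, uint32_t blockHeight) {
    uint64_t blocksX = (width + blockWidth - 1) / blockWidth;
    uint64_t blocksY = (height + blockHeight - 1) / blockHeight;
    return blocksX * blocksY * 16;
}
```

For a 1024x1024 texture, 8x8 blocks need 262144 bytes, while 6x6 blocks need 467856 bytes because 1024 is not a multiple of 6 and the edge blocks are padded.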

5. GLES Fall-back

GLES Fall-back

Application::onCreate()
  Is Vulkan supported?
  ◦ Yes → Application::loadVulkanRHI()
  ◦ No → Application::loadOpenGLESRHI()
Application::onSurfaceCreated()
  ◦ Vulkan → Create swapchain
  ◦ GLES → Create EGLSurface
Application::onSurfaceResize()
  ◦ Vulkan → Application::Vulkan::Resize(): resize swapchain
  ◦ GLES → Application::GLES::Resize(): resize surface (FB, viewport)

※ We highly recommend putting API detection at the very first initialization stage, before surface creation, not in Application::onSurfaceResize(), so you don't waste additional resource allocations, etc.

GLES Fall-back - Vulkan Detection (Mobile)

1. Attempt to load the library: dlopen("libvulkan.so") succeeds.
2. Attempt to load the basic functions: vkGetInstanceProcAddr succeeds.
3. Attempt to create an instance with API version 1.0.11 (the stable Vulkan PDK version): vkCreateInstance returns VK_SUCCESS.
Then three additional checks:
4. Attempt to load all functions: vkGetDeviceProcAddr succeeds.
5. Attempt to create a device: vkCreateDevice returns VK_SUCCESS.
6. Get and check the patch version reported by the driver (e.g. 1.0.0).

Be careful: some early patch versions may be unstable or lack features. We use the GLES renderer for patch versions less than 11.
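The patch-version rule above needs the version fields unpacked. This sketch re-implements the Vulkan 1.0 version packing (the same bit layout as `VK_MAKE_VERSION` / `VK_VERSION_MAJOR` / `VK_VERSION_MINOR` / `VK_VERSION_PATCH`) so it builds without `vulkan.h`. `useVulkanRenderer` is a hypothetical policy that assumes the value checked is `VkPhysicalDeviceProperties::apiVersion` and that anything at or above 1.0.11 should use Vulkan.

```cpp
#include <cstdint>

// Same packing as Vulkan 1.0's VK_MAKE_VERSION: major in bits 22..31,
// minor in bits 12..21, patch in bits 0..11.
constexpr uint32_t makeVersion(uint32_t major, uint32_t minor, uint32_t patch) {
    return (major << 22) | (minor << 12) | patch;
}
constexpr uint32_t versionMajor(uint32_t v) { return v >> 22; }
constexpr uint32_t versionMinor(uint32_t v) { return (v >> 12) & 0x3ff; }
constexpr uint32_t versionPatch(uint32_t v) { return v & 0xfff; }

// Hypothetical policy: Vulkan for 1.0.11 and later, GLES fall-back otherwise.
// The packing is monotonic, so a plain comparison orders versions correctly.
bool useVulkanRenderer(uint32_t apiVersion) {
    return apiVersion >= makeVersion(1, 0, 11);
}
```

A driver reporting 1.0.3 would get the GLES renderer, while 1.0.11 (or any later minor version) would get Vulkan.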

GLES Fall-back - Resources

• Texture resources
  ◦ We recommend using ASTC, but there are still drivers out there that don't support it.
  ◦ At the moment, the common texture format for GLES in the market is ETC1_RGB8_OES.
  ◦ Maintaining two texture packs just for API compatibility would be a burden.
  ◦ If you want to port a game that uses ETC1 to Vulkan, you can reuse the existing resources through the backward-compatible format VK_FORMAT_ETC2_R8G8B8_UNORM_BLOCK.
• Shader resources
  ◦ Vulkan requires SPIR-V, so you have the following two options for shader resources:
  1) ES310 (std140) GLSL + runtime SPIR-V conversion (https://github.com/google/shaderc)
     Vulkan: ES310 GLSL code → glslang (shaderc) runtime SPIR-V conversion → persistent PipelineCache
     GLES: GLSL code (ES310) → persistent program binary
  2) ES2/310 (std140) GLSL + offline-compiled SPIR-V
     Vulkan: ES310 GLSL code → glslang offline SPIR-V conversion → persistent PipelineCache
     GLES: GLSL code (ES2.0) / GLSL code (ES2/310) → persistent program binary

6. Development Tip

Development Tip - Vulkan Viewport Volume

[Screenshots: the unexpected result vs. the expected result.]

• Vulkan viewport volume
  ◦ Vulkan uses the OriginUpperLeft execution mode, and its default viewport volume is different from OpenGL ES. (The OriginLowerLeft execution mode must not be used; fragment entry points must declare OriginUpperLeft. q.v.: VulkanSpec1.0.28 / A.3. Validation Rules within a Module.)
  ◦ NDC space (example: DepthFunc LESS, depth clear 1.0f):
    OpenGL ES: X (-1 ~ +1), Y (-1 ~ +1, pointing up), Z (-1 ~ +1)
    Vulkan: X (-1 ~ +1), Y (-1 ~ +1, pointing down), Z (0 ~ +1)
  ◦ Three ways to correct it: add a vertex shader postfix, multiply a VMatrix in front of the MVP, or modify the math functions.

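Development Tip - Vulkan Viewport Volume

// Vertex shader: this point lands in the upper half of the screen in OpenGL ES,
// but in the lower half in Vulkan, because Vulkan's NDC Y axis points down.
void main() {
    gl_Position = vec4(0.5, 0.5, 0.0, 1.0);
}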

Development Tip - Vulkan Viewport Volume

• Add a vertex shader postfix.

#version 310 es
precision highp float;
…
void main() {
    …
    gl_Position = MVP * _vertex;
    gl_Position.y = -gl_Position.y;                         // added: flip Y
    gl_Position.z = (gl_Position.z + gl_Position.w) / 2.0;  // added: remap Z from [-w, w] to [0, w]
}

Without the above correction, you will face this kind of problem in Vulkan. [Screenshot]

Development Tip - Vulkan Viewport Volume

• Multiply a VMatrix in front of the MVP.

MAT4 VMatrix = {
    1.0f,  0.0f, 0.0f, 0.0f,
    0.0f, -1.0f, 0.0f, 0.0f,
    0.0f,  0.0f, 0.5f, 0.5f,
    0.0f,  0.0f, 0.0f, 1.0f };

Or (depending on usage; this is the same matrix stored column-major, as GLM stores matrices):

MAT4 VMatrix = {
    1.0f,  0.0f, 0.0f, 0.0f,
    0.0f, -1.0f, 0.0f, 0.0f,
    0.0f,  0.0f, 0.5f, 0.0f,
    0.0f,  0.0f, 0.5f, 1.0f };

MVP = VMatrix * P * V * M;
setUniform(MVP);
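To check that the shader postfix and the column-major VMatrix describe the same correction, both can be applied on the CPU. A minimal sketch; `Vec4`, `Mat4`, `glToVulkanClip`, `mul` and `kVMatrixColumnMajor` are hypothetical names, and matrices are assumed to be stored column-major.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<float, 16>;  // column-major: element (row r, col c) at [c * 4 + r]

// The vertex shader postfix, on the CPU: flip Y and remap Z from
// OpenGL ES's [-w, w] to Vulkan's [0, w].
Vec4 glToVulkanClip(const Vec4& p) {
    return {p[0], -p[1], (p[2] + p[3]) * 0.5f, p[3]};
}

// Column-major matrix * vector.
Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 r{0.0f, 0.0f, 0.0f, 0.0f};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            r[row] += m[col * 4 + row] * v[col];
    return r;
}

// The second VMatrix, read as column-major storage.
const Mat4 kVMatrixColumnMajor = {
    1.0f,  0.0f, 0.0f, 0.0f,
    0.0f, -1.0f, 0.0f, 0.0f,
    0.0f,  0.0f, 0.5f, 0.0f,
    0.0f,  0.0f, 0.5f, 1.0f,
};
```

Both paths send the OpenGL ES near plane (z = -w) to Vulkan's z = 0 and negate Y, so multiplying the VMatrix in front of the MVP is equivalent to the shader postfix.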

Development Tip - Vulkan Viewport Volume

• Modify the math functions (q.v.: http://glm.g-truc.net/0.9.8/index.html)

#define GLM_LEFT_HANDED  0x00000001 // For DirectX, Metal, Vulkan
#define GLM_RIGHT_HANDED 0x00000002 // For OpenGL, default in GLM

Ortho
#if GLM_COORDINATE_SYSTEM == GLM_LEFT_HANDED
    return orthoLH(left, right, bottom, top, zNear, zFar);
#else
    return orthoRH(left, right, bottom, top, zNear, zFar);
#endif

#if GLM_DEPTH_CLIP_SPACE == GLM_DEPTH_ZERO_TO_ONE
    Result[2][2] = - static_cast<T>(1) / (zFar - zNear);
    Result[3][2] = - zNear / (zFar - zNear);
#else
    Result[2][2] = - static_cast<T>(2) / (zFar - zNear);
    Result[3][2] = - (zFar + zNear) / (zFar - zNear);
#endif

Frustum
#if GLM_COORDINATE_SYSTEM == GLM_LEFT_HANDED
    return frustumLH(left, right, bottom, top, nearVal, farVal);
#else
    return frustumRH(left, right, bottom, top, nearVal, farVal);
#endif

#if GLM_DEPTH_CLIP_SPACE == GLM_DEPTH_ZERO_TO_ONE
    Result[2][2] = farVal / (farVal - nearVal);
    Result[3][2] = -(farVal * nearVal) / (farVal - nearVal);
#else
    Result[2][2] = (farVal + nearVal) / (farVal - nearVal);
    Result[3][2] = - (static_cast<T>(2) * farVal * nearVal) / (farVal - nearVal);
#endif

Perspective
#if GLM_COORDINATE_SYSTEM == GLM_LEFT_HANDED
    return perspectiveFovLH(fov, width, height, zNear, zFar);
#else
    return perspectiveFovRH(fov, width, height, zNear, zFar);
#endif

#if GLM_DEPTH_CLIP_SPACE == GLM_DEPTH_ZERO_TO_ONE
    Result[2][2] = zFar / (zFar - zNear);
    Result[3][2] = -(zFar * zNear) / (zFar - zNear);
#else
    Result[2][2] = (zFar + zNear) / (zFar - zNear);
    Result[3][2] = - (static_cast<T>(2) * zFar * zNear) / (zFar - zNear);
#endif

Development Tip - Validation Layer

• The loader supports layering APIs.

[Diagram: a call to a Vulkan function such as vkQueueSubmit goes from the application through the loader (libvulkan.so) and each enabled layer (libVkLayerXXX.so, libVkLayerXXXX.so) before reaching the driver (vulkan.XXX.so).]

Development Tip - Validation Layer

• Sometimes you may face VK_ERROR_DEVICE_LOST or an unexpected error (GPU hang).
• Turn on the validation layer and fix it!

• [MEM, 3]: Linear Image 0x515 is aliased with non-linear image 0x507 which is in violation of the Buffer-Image Granularity section of the Vulkan specification
• [DS, 49]: ]DS 0x890 encountered the following validation error at draw time: Dynamic descriptor in binding #1 at global descriptor index 2 uses buffer 4 with dynamic offset 524000 combined with offset 0 and range 720 that oversteps the buffer size of 524288
• [MEM, 12]: vkCmdBeginRenderPass(): cannot read invalid memory 0x26, please fill the memory before using
• Code 7 : Cannot map an image with layout VK_IMAGE_LAYOUT_UNDEFINED. Only GENERAL or PREINITIALIZED are supported.
• Code 7 : Cannot submit cmd buffer using image (0xffffffffc91fe3b0) [sub-resource: aspectMask 0x1 array layer 2, mip level 5], with layout VK_IMAGE_LAYOUT_UNDEFINED when first use is VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL.
• Code 10 : Attempt to set lineWidth to 0.000000 but physical device wideLines feature not supported/enabled so lineWidth must be 1.0f!
• Code 7 : Cannot submit cmd buffer using image (0xffffffffc7fe7ea0) [sub-resource: aspectMask 0x1 array layer 0, mip level 0], with layout VK_IMAGE_LAYOUT_UNDEFINED when first use is VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL.
• [MEM] Code 12 : vkCmdBeginRenderPass(): Cannot read invalid memory 0xffffffffc93bf470, please fill the memory before using.
• Code 7 : You cannot transition the layout from VK_IMAGE_LAYOUT_GENERAL when current layout is VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL.
• Code 25 : Unable to allocate 2 descriptors of type VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER from pool 0xffffffffcf476468. This pool only has 0 descriptors of this type remaining.
• Code 54 : Attempt to reset command buffer (0x28ceabb21c) which is in use.
• [MEM] Code 9 : Calling vkBeginCommandBuffer() on active CB 0x0xceabb21c before it has completed. You must check CB fence before this call.
• Code 27 : vkUpdateDescriptorsSets() failed write update validation for Descriptor Set 0xffffffffd0e00188 with error: Cannot call vkUpdateDescriptorSets() to perform write update on descriptor set 18446744072918925704 that is in use by a command buffer. (FIX Candidate CL 83010)
• Code 53 : Command Buffer 0xcf3be004 is already in use and is not marked for simultaneous use.
• Code 54 : Attempt to reset command buffer (0x28ceab721c) which is in use.
• Code 9 : Calling vkBeginCommandBuffer() on active CB 0x0xceab721c before it has completed. You must check CB fence before this call.
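Before enabling a validation layer, it is safer to request only layers the device actually reports. A minimal sketch: `selectLayers` is hypothetical, and the layer names in the usage below are examples only (`VK_LAYER_KHRONOS_validation` is the current unified layer, while 2016-era SDKs shipped several separate layers). In real code, `available` would be filled from the `layerName` fields returned by `vkEnumerateInstanceLayerProperties`, and the result would feed `VkInstanceCreateInfo::ppEnabledLayerNames` in debug builds.

```cpp
#include <string>
#include <vector>

// Keep only the requested layers that are actually available, preserving the
// order of `wanted`.
std::vector<std::string> selectLayers(const std::vector<std::string>& wanted,
                                      const std::vector<std::string>& available) {
    std::vector<std::string> enabled;
    for (const std::string& name : wanted) {
        for (const std::string& have : available) {
            if (name == have) {
                enabled.push_back(name);
                break;
            }
        }
    }
    return enabled;
}
```

Requesting a layer that is not installed makes vkCreateInstance fail with VK_ERROR_LAYER_NOT_PRESENT, so filtering first keeps debug builds starting on devices without the layers.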

Validation Error #1

[Diagram: command buffers N-2, N-1 and N, recorded with ONE_TIME_SUBMIT, are used round-robin (index = (index + 1) % 3). Frame N submits N-2 and frame N+1 submits N-1.]

• Normal case, frame N+2: submit N, then wait on fence N-2 before resetting and beginning command buffer N-2 (vkWaitForFences(N-2), vkResetCommandBuffer(N-2), vkBeginCommandBuffer(N-2)).
• Bug case, frame N+2: the wait uses the wrong fence (vkWaitForFences(N) instead of N-2), so command buffer N-2 is reset and re-begun without waiting on its own fence.

Code 9 : Calling vkBeginCommandBuffer() on active CB 0x0xceab721c before it has completed. You must check CB fence before this call.
Code 53 : Command Buffer 0xcf3be004 is already in use and is not marked for simultaneous use.
Code 54 : Attempt to reset command buffer (0x28ceab721c) which is in use.

Validation Error #2

Capturing a swapchain image and reading back its raw data:
• Normal case: swapchain image (TILING_OPTIMAL) → vkCmdCopyImageToBuffer → staging buffer (tiling N/A) → read captured raw data from memory
• Bug case: swapchain image (TILING_OPTIMAL) → vkCmdCopyImage → staging image (TILING_LINEAR) → read captured raw data from memory

The bug case triggers:
[MEM, 3]: Linear Image 0x515 is aliased with non-linear image 0x507 which is in violation of the Buffer-Image Granularity section of the Vulkan specification
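The fix above avoids the linear staging image altogether. If an allocator does place linear resources (buffers, LINEAR images) and OPTIMAL images in one VkDeviceMemory allocation, one common approach is to start the next resource on a new `bufferImageGranularity` page whenever the tiling class changes. A minimal sketch, assuming a simple linear sub-allocator; `placeResource` is a hypothetical helper and no Vulkan calls are made:

```cpp
#include <cassert>
#include <cstdint>

constexpr bool isPowerOfTwo(uint64_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Rounds offset up to the next multiple of alignment (alignment must be a power of two).
constexpr uint64_t alignUp(uint64_t offset, uint64_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Returns the start offset for the next resource in a shared allocation.
// prevEnd: first byte after the previous resource.
// requiredAlignment: VkMemoryRequirements::alignment of the next resource.
// bufferImageGranularity: VkPhysicalDeviceLimits::bufferImageGranularity.
uint64_t placeResource(uint64_t prevEnd, uint64_t requiredAlignment,
                       bool prevIsLinear, bool nextIsLinear, uint64_t bufferImageGranularity) {
    assert(isPowerOfTwo(requiredAlignment) && isPowerOfTwo(bufferImageGranularity));
    uint64_t offset = alignUp(prevEnd, requiredAlignment);
    if (prevIsLinear != nextIsLinear) {
        // Linear and non-linear resources must not share a granularity page,
        // so jump to the start of the next page.
        offset = alignUp(offset, bufferImageGranularity);
    }
    return offset;
}
```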

Validation Error #3

Access masks derived from the image layout (the function name is illustrative):

VkAccessFlags accessMaskForLayout(VkImageLayout imageLayout)
{
    switch (imageLayout) {
    case VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL:
        return VK_ACCESS_TRANSFER_READ_BIT;
    case VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL:
        return VK_ACCESS_TRANSFER_WRITE_BIT;
    case VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL:
        return VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
    case VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL:
        return VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;
    case VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL:
        return VK_ACCESS_SHADER_READ_BIT | VK_ACCESS_INPUT_ATTACHMENT_READ_BIT;
    case VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL:
        return VK_ACCESS_SHADER_READ_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT;
    case VK_IMAGE_LAYOUT_GENERAL:
        return VK_ACCESS_INPUT_ATTACHMENT_READ_BIT |
               VK_ACCESS_SHADER_READ_BIT | VK_ACCESS_SHADER_WRITE_BIT |
               VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT |
               VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT |
               VK_ACCESS_TRANSFER_READ_BIT | VK_ACCESS_TRANSFER_WRITE_BIT |
               VK_ACCESS_MEMORY_READ_BIT | VK_ACCESS_MEMORY_WRITE_BIT;
    case VK_IMAGE_LAYOUT_PRESENT_SRC_KHR:
        return VK_ACCESS_MEMORY_READ_BIT;
    case VK_IMAGE_LAYOUT_UNDEFINED:
    case VK_IMAGE_LAYOUT_PREINITIALIZED:
    default:
        return 0;
    }
}

Transition for image copy, resolve, clear, etc.: set an image memory barrier, perform the copy/resolve/clear, then restore the original image layout.

Related errors:
Code 7 : Cannot submit cmd buffer using image (0xffffffffc91fe3b0) [sub-resource: aspectMask 0x1 array layer 2, mip level 5], with layout VK_IMAGE_LAYOUT_UNDEFINED when first use is VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL.
Code 7 : You cannot transition the layout from VK_IMAGE_LAYOUT_GENERAL when current layout is VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL.
And so on…
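One way to keep every barrier's oldLayout consistent is to track the current layout of each subresource on the CPU. The sketch below is an API-free model under that assumption: `Layout` mirrors a few VkImageLayout values, and `LayoutTracker` and `wrapTransferDst` are hypothetical names. It follows the transition, operate, restore pattern described above.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Mirrors a few VkImageLayout values; no Vulkan headers are used.
enum class Layout { Undefined, General, ColorAttachment, TransferSrc, TransferDst, ShaderReadOnly, PresentSrc };

struct Barrier {
    Layout oldLayout;
    Layout newLayout;
};

class LayoutTracker {
public:
    using Key = std::pair<int, int>;  // (image id, subresource index)

    // Records a transition; oldLayout always comes from the tracked state.
    Barrier transition(Key sub, Layout newLayout) {
        Layout old = current(sub);
        layouts_[sub] = newLayout;
        return Barrier{old, newLayout};
    }

    // Subresources that were never transitioned are UNDEFINED.
    Layout current(Key sub) const {
        auto it = layouts_.find(sub);
        return it == layouts_.end() ? Layout::Undefined : it->second;
    }

private:
    std::map<Key, Layout> layouts_;
};

// Transition to TRANSFER_DST for a copy/clear, then restore the previous layout.
std::vector<Barrier> wrapTransferDst(LayoutTracker& t, LayoutTracker::Key sub) {
    std::vector<Barrier> barriers;
    Layout original = t.current(sub);
    barriers.push_back(t.transition(sub, Layout::TransferDst));
    // ... vkCmdCopyBufferToImage / vkCmdClearColorImage would be recorded here ...
    // A real barrier may not use UNDEFINED as newLayout, so only a defined layout is restored.
    if (original != Layout::Undefined) barriers.push_back(t.transition(sub, original));
    return barriers;
}
```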

Validation Error #4

Dynamic uniform buffer overflow: the dynamic offset 524000 plus the range 720 ends at byte 524720, past the end of the 524288-byte buffer.
[DS, 49]: DS 0x890 encountered the following validation error at draw time: Dynamic descriptor in binding #1 at global descriptor index 2 uses buffer 4 with dynamic offset 524000 combined with offset 0 and range 720 that oversteps the buffer size of 524288
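The overflow in this log is simple arithmetic: 524000 + 0 + 720 = 524720 > 524288. A minimal sketch of a ring allocator for per-draw uniform data that respects both the bounds rule and `minUniformBufferOffsetAlignment` (assumed to be 256 bytes in the example; the real value comes from VkPhysicalDeviceLimits). `DynamicUniformRing` is hypothetical, and wrapping to offset 0 assumes the GPU has finished with that region.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical ring allocator over one dynamic uniform buffer.
struct DynamicUniformRing {
    uint64_t bufferSize;
    uint64_t alignment;  // VkPhysicalDeviceLimits::minUniformBufferOffsetAlignment (power of two)
    uint64_t head = 0;

    // Returns an aligned dynamic offset whose [offset, offset + range) fits in the buffer.
    uint32_t allocate(uint64_t range) {
        assert(range <= bufferSize);
        uint64_t offset = (head + alignment - 1) & ~(alignment - 1);
        if (offset + range > bufferSize) {
            offset = 0;  // wrap around; caller must ensure the GPU no longer reads this region
        }
        head = offset + range;
        return static_cast<uint32_t>(offset);
    }
};

// The rule the validation layer checks at draw time.
bool dynamicOffsetInBounds(uint64_t dynamicOffset, uint64_t descriptorOffset,
                           uint64_t range, uint64_t bufferSize) {
    return dynamicOffset + descriptorOffset + range <= bufferSize;
}
```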

Render pass attachment description: the attachment used VK_ATTACHMENT_LOAD_OP_DONT_CARE; use VK_ATTACHMENT_LOAD_OP_CLEAR or VK_ATTACHMENT_LOAD_OP_LOAD instead.
[MEM, 12]: vkCmdBeginRenderPass(): cannot read invalid memory 0x26, please fill the memory before using

No debug lines? Check VkPipelineRasterizationStateCreateInfo.lineWidth: it was left at 0.0, but without the wideLines feature it must be 1.0f.
Code 10 : Attempt to set lineWidth to 0.000000 but physical device wideLines feature not supported/enabled so lineWidth must be 1.0f!

Development Tip - VkPipelineCache

• Without a VkPipelineCache you may face performance problems (lag), because vkCreateGraphicsPipelines is really slow.
• So we need to use VkPipelineCache.

Game loading time (300+ pipelines created with vkCreateGraphicsPipelines):
• Without VkPipelineCache: 13.260 seconds
• With VkPipelineCache (persistent): 4.187 seconds

// onResume: create the cache from the data saved by the previous run
std::vector<uint8_t>& pipelineCacheData = getPipelineCacheFromSDcard();
VkPipelineCacheCreateInfo pipelineCacheCreateInfo = {};
pipelineCacheCreateInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_CACHE_CREATE_INFO;
pipelineCacheCreateInfo.initialDataSize = pipelineCacheData.size();
pipelineCacheCreateInfo.pInitialData = pipelineCacheData.data();
VkPipelineCache pipelineCache = VK_NULL_HANDLE;
vkCreatePipelineCache(device, &pipelineCacheCreateInfo, nullptr, &pipelineCache);

// createGraphicPipeline: pass the cache to every pipeline creation
vkCreateGraphicsPipelines(device, pipelineCache, 1, &createInfo, nullptr, &pipeline);

// onPause: query the size, fetch the data, and save it if the call succeeded
size_t dataSize = 0;
vkGetPipelineCacheData(device, pipelineCache, &dataSize, nullptr);
pipelineCacheData.resize(dataSize);
if (vkGetPipelineCacheData(device, pipelineCache, &dataSize, pipelineCacheData.data()) == VK_SUCCESS) {
    savePipelineCacheToSDcard(pipelineCacheData);
}
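Saved cache data can go stale, for example after a driver update. The Vulkan specification defines a header at the start of the data returned by vkGetPipelineCacheData (header length, header version, vendorID, deviceID and a 16-byte pipelineCacheUUID, each stored least significant byte first). A defensive pre-check before passing the blob as pInitialData might look like this sketch; `pipelineCacheBlobMatches` is a hypothetical helper, and the expected IDs and UUID would come from VkPhysicalDeviceProperties.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a uint32 stored least significant byte first, independent of host byte order.
static uint32_t readLE32(const std::vector<uint8_t>& d, size_t at) {
    return uint32_t(d[at]) | (uint32_t(d[at + 1]) << 8) |
           (uint32_t(d[at + 2]) << 16) | (uint32_t(d[at + 3]) << 24);
}

// Returns true if the blob's header matches the current device.
bool pipelineCacheBlobMatches(const std::vector<uint8_t>& blob, uint32_t vendorID,
                              uint32_t deviceID, const std::array<uint8_t, 16>& uuid) {
    constexpr size_t kHeaderSize = 32;        // 4 x uint32 + 16-byte UUID
    constexpr uint32_t kHeaderVersionOne = 1;  // VK_PIPELINE_CACHE_HEADER_VERSION_ONE
    if (blob.size() < kHeaderSize) return false;
    if (readLE32(blob, 0) < kHeaderSize) return false;
    if (readLE32(blob, 4) != kHeaderVersionOne) return false;
    if (readLE32(blob, 8) != vendorID || readLE32(blob, 12) != deviceID) return false;
    for (size_t i = 0; i < uuid.size(); ++i) {
        if (blob[16 + i] != uuid[i]) return false;  // pipelineCacheUUID starts at byte 16
    }
    return true;
}
```

If the check fails, the app can simply create the cache with initialDataSize = 0 and let it be rebuilt.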

Development Tip - Managing VkPipeline

• Let's think about very simple renderer logic.

In OpenGL ES, each renderer call maps directly to GL calls:
• setShader → glUseProgram
• setRenderState → glEnable, glDisable, gl...
• setTexture → glBindTexture
• draw → glDraw…

In Vulkan, making a pipeline needs a lot of information at initialization, all gathered in VkGraphicsPipelineCreateInfo:
• VkPipelineShaderStageCreateInfo
• VkPipelineVertexInputStateCreateInfo
• VkPipelineInputAssemblyStateCreateInfo
• VkPipelineViewportStateCreateInfo
• VkPipelineRasterizationStateCreateInfo
• VkPipelineMultisampleStateCreateInfo
• VkPipelineDepthStencilStateCreateInfo
• VkPipelineColorBlendStateCreateInfo
• VkPipelineDynamicStateCreateInfo (VkDynamicState)
• …
The pipeline is then created with vkCreateGraphicsPipelines, and draw maps to vkCmdDraw….

In the worst case, render state and vertex attributes can change on every draw call. An efficiently designed pipeline management structure is therefore very important for performance optimization.
• setShader: vertex shader + fragment shader
• setRenderState: VkPipelineDepthStencilStateCreateInfo, … (RenderState #0: depth enable, …; RenderState #1: depth disable, …)
• setTexture: ignored in this case
• draw: VkPipelineVertexInputStateCreateInfo, … (VertexAttribute #0 and #1, each with stride, location, binding)

Build a structure that reuses each VkPipeline created by vkCreateGraphicsPipelines (VkPipeline #0, VkPipeline #1) instead of creating a new one for every draw call.
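A minimal sketch of such a structure, assuming the per-draw state can be reduced to small integer IDs: pipelines are looked up by a key built from the shader, render state and vertex layout, and creation (vkCreateGraphicsPipelines in real code) only happens on a cache miss. `PipelineKey`, `PipelineManager` and the injected create function are hypothetical; a real key would also cover the render pass, blend state, topology and so on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <tuple>
#include <unordered_map>
#include <utility>

// State a simple renderer changes per draw call, reduced to IDs.
struct PipelineKey {
    uint32_t shaderId;        // setShader: vertex + fragment shader pair
    uint32_t renderStateId;   // setRenderState: e.g. RenderState #0 = depth enable
    uint32_t vertexLayoutId;  // draw: stride / location / binding description

    bool operator==(const PipelineKey& o) const {
        return std::tie(shaderId, renderStateId, vertexLayoutId) ==
               std::tie(o.shaderId, o.renderStateId, o.vertexLayoutId);
    }
};

struct PipelineKeyHash {
    size_t operator()(const PipelineKey& k) const {
        size_t h = std::hash<uint32_t>()(k.shaderId);
        h = h * 31 + std::hash<uint32_t>()(k.renderStateId);
        h = h * 31 + std::hash<uint32_t>()(k.vertexLayoutId);
        return h;
    }
};

using PipelineHandle = uint64_t;  // stand-in for VkPipeline

class PipelineManager {
public:
    // createFn stands in for a call to vkCreateGraphicsPipelines.
    explicit PipelineManager(std::function<PipelineHandle(const PipelineKey&)> createFn)
        : create_(std::move(createFn)) {}

    // Returns the cached pipeline, creating it only the first time this combination is seen.
    PipelineHandle get(const PipelineKey& key) {
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;
        PipelineHandle p = create_(key);
        cache_.emplace(key, p);
        return p;
    }

    size_t size() const { return cache_.size(); }

private:
    std::function<PipelineHandle(const PipelineKey&)> create_;
    std::unordered_map<PipelineKey, PipelineHandle, PipelineKeyHash> cache_;
};
```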

Development Tip - Managing VkRenderPass, VkFramebuffer

VkRenderPass and VkFramebuffer should also be reused.

VkRenderPassCreateInfo {
    …
    uint32_t attachmentCount;
    const VkAttachmentDescription* pAttachments;
    …
};

VkAttachmentDescription {
    …
    VkAttachmentLoadOp loadOp;    // VK_ATTACHMENT_LOAD_OP_LOAD, VK_ATTACHMENT_LOAD_OP_CLEAR or VK_ATTACHMENT_LOAD_OP_DONT_CARE
    VkAttachmentStoreOp storeOp;
    …
};

vkCreateRenderPass → VkRenderPass #0, VkRenderPass #1

VkFramebufferCreateInfo {
    …
    VkRenderPass renderPass;
    …
};

vkCreateFramebuffer → VkFramebuffer #0, VkFramebuffer #1
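Render passes can be reused the same way, keyed by their attachment descriptions: different loadOp values (LOAD, CLEAR, DONT_CARE) produce different render passes. A minimal sketch with hypothetical names (`AttachmentKey`, `RenderPassCache`) and integer stand-ins for formats and handles; real code would call vkCreateRenderPass on a miss and also put storeOp, samples and layouts into the key.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Mirrors VkAttachmentLoadOp (LOAD = 0, CLEAR = 1, DONT_CARE = 2 in the Vulkan headers).
enum class LoadOp : uint32_t { Load = 0, Clear = 1, DontCare = 2 };

// Per-attachment part of the key.
struct AttachmentKey {
    uint32_t format;
    LoadOp loadOp;
    bool operator<(const AttachmentKey& o) const {
        return std::tie(format, loadOp) < std::tie(o.format, o.loadOp);
    }
};

using RenderPassHandle = uint64_t;  // stand-in for VkRenderPass

class RenderPassCache {
public:
    // Returns the render pass for this attachment list, "creating" it on first use.
    RenderPassHandle get(const std::vector<AttachmentKey>& attachments) {
        auto it = passes_.find(attachments);
        if (it != passes_.end()) return it->second;
        RenderPassHandle h = ++created_;  // real code: vkCreateRenderPass
        passes_.emplace(attachments, h);
        return h;
    }
    uint64_t created() const { return created_; }

private:
    std::map<std::vector<AttachmentKey>, RenderPassHandle> passes_;
    uint64_t created_ = 0;
};
```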

Development Tip - Clear framebuffer cost

There are 3 ways to clear a framebuffer (color, depth, stencil):
• Render pass load operation
• vkCmdClearAttachments
• vkCmdClearColorImage / vkCmdClearDepthStencilImage

It's important to use the proper clear approach (e.g. clear all, color only, depth only) so as not to waste additional clear cost.

Test case: 1 color clear & 30 depth clears
• Render pass begin/end using LoadOpClear: 24 FPS
• vkCmdClearAttachments: 57 FPS

• We recommend not clearing the framebuffer by beginning and ending an empty render pass without actual draw calls.

• Only render pass begin/end (a new render pass for every clear):
  Clear All: Color LoadOpClear, Depth LoadOpClear, Stencil LoadOpClear
  Clear Depth: Color LoadOpLoad, Depth LoadOpClear, Stencil LoadOpLoad
  Clear Depth: Color LoadOpLoad, Depth LoadOpClear, Stencil LoadOpLoad

• Render pass begin/end + clear APIs (faster!):
  Clear All: Color LoadOpClear, Depth LoadOpClear, Stencil LoadOpClear at render pass begin
  Clear Depth: vkCmdClearAttachments (depth) inside the same render pass
  Clear Depth: vkCmdClearAttachments (depth) inside the same render pass

• Very simple example, only for description:
  Request clear: if inside a render pass → vkCmdClearAttachments(); otherwise set a variable to VK_ATTACHMENT_LOAD_OP_CLEAR. Then go to the next event.
  Request drawPrimitive: if inside a render pass → DrawPrimitive(); otherwise get the variable (VK_ATTACHMENT_LOAD_OP_CLEAR), find the proper render pass (if none exists, CreateRenderpass() and store it!), call vkCmdBeginRenderPass(), then DrawPrimitive(). Then go to the next event.
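This flowchart can be modeled as a small state machine that logs the Vulkan call each step would record. It is an API-free sketch with hypothetical names (`ClearScheduler`, `requestClear`, `drawPrimitive`); in this simplified model a clear that is never followed by a draw is dropped, which a real renderer may need to handle differently.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A clear is recorded immediately inside a render pass, or deferred into the loadOp
// of the next render pass that a draw call begins.
class ClearScheduler {
public:
    void requestClear() {
        if (insideRenderPass_) {
            log.push_back("vkCmdClearAttachments");
        } else {
            pendingClear_ = true;  // remember VK_ATTACHMENT_LOAD_OP_CLEAR for the next begin
        }
    }

    void drawPrimitive() {
        if (!insideRenderPass_) {
            // Real code: find (or create and store) the render pass with this loadOp, then begin it.
            log.push_back(pendingClear_ ? "vkCmdBeginRenderPass(LOAD_OP_CLEAR)"
                                        : "vkCmdBeginRenderPass(LOAD_OP_LOAD)");
            pendingClear_ = false;
            insideRenderPass_ = true;
        }
        log.push_back("vkCmdDraw");
    }

    void endFrame() {
        if (insideRenderPass_) log.push_back("vkCmdEndRenderPass");
        insideRenderPass_ = false;
    }

    std::vector<std::string> log;  // recorded calls, in order

private:
    bool insideRenderPass_ = false;
    bool pendingClear_ = false;
};
```

No empty render pass is ever begun just to clear, in line with the recommendation above.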

Development Tip - PushConstant

• Push constants can help increase performance (the effect is GPU dependent).
• They are very easy to use.
• But you should check the device limit: VkPhysicalDeviceLimits::maxPushConstantsSize.

Push constant ranges are declared in the VkPipelineLayout (VkPipelineLayoutCreateInfo::pPushConstantRanges).

// Vertex shader
…
layout(push_constant) uniform buf1 {
    mat4 _unif00;
} pc;  // don't omit the block instance name (pc) when the uniform is push_constant
void main() {
    gl_Position = pc._unif00 * _in_vertex;
}

// size is in bytes, assuming MVPMatrix holds 16 floats
vkCmdPushConstants(commandBuffer, layout, stageFlags, offset, MVPMatrix.size() * sizeof(float), MVPMatrix.data());
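Before declaring a push constant range, its offset and size can be checked against the VkPushConstantRange rules: both must be multiples of 4, the size must be non-zero, and the range must fit within `maxPushConstantsSize` (the specification guarantees at least 128 bytes, enough for two mat4). A minimal sketch; `pushConstantRangeValid` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstdint>

// Checks a push constant range against VkPushConstantRange valid-usage rules,
// given VkPhysicalDeviceLimits::maxPushConstantsSize.
bool pushConstantRangeValid(uint32_t offset, uint32_t size, uint32_t maxPushConstantsSize) {
    return offset % 4 == 0 && size % 4 == 0 && size > 0 &&
           offset < maxPushConstantsSize && size <= maxPushConstantsSize - offset;
}
```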

Development Tip - Swizzle

(Screenshots: Error vs. Expected)

The map texture format was ETC1 (VK_FORMAT_ETC2_R8G8B8_UNORM_BLOCK), an RGB format with no alpha channel.

// Fragment shader
void main() {
    …
    vec4 mapColor = texture(mapSampler, texCoord);
    fragColor = mapColor.rgb * mapColor.a;
}


Fix: force alpha to one through the image view's component mapping.

VkImageViewCreateInfo {
    …
    VkComponentMapping components;
    …
};

VkComponentMapping {
    VkComponentSwizzle r;
    VkComponentSwizzle g;
    VkComponentSwizzle b;
    VkComponentSwizzle a;    // VK_COMPONENT_SWIZZLE_ONE
};
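For a format without an alpha channel, Vulkan's format conversion rules fill alpha with 1.0; setting the `a` swizzle to VK_COMPONENT_SWIZZLE_ONE makes that explicit in the image view. The effect of a component mapping can be modeled on the CPU as below; `Swizzle`, `ComponentMapping` and `applySwizzle` are hypothetical mirrors of the Vulkan types, not the API itself.

```cpp
#include <array>
#include <cassert>

// Mirrors VkComponentSwizzle: each output channel picks a source channel or a constant.
enum class Swizzle { Identity, Zero, One, R, G, B, A };

struct ComponentMapping { Swizzle r, g, b, a; };

using Texel = std::array<float, 4>;  // r, g, b, a

Texel applySwizzle(const Texel& in, const ComponentMapping& m) {
    // `self` is the channel's own index, used for Identity.
    auto pick = [&](Swizzle s, int self) -> float {
        switch (s) {
            case Swizzle::Identity: return in[self];
            case Swizzle::Zero: return 0.0f;
            case Swizzle::One: return 1.0f;
            case Swizzle::R: return in[0];
            case Swizzle::G: return in[1];
            case Swizzle::B: return in[2];
            case Swizzle::A: return in[3];
        }
        return in[self];
    };
    return {pick(m.r, 0), pick(m.g, 1), pick(m.b, 2), pick(m.a, 3)};
}
```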

Development Tip - SecondaryCommandBuffer + Multi-thread

• This is simple logic for using SecondaryCommandBuffer.

Recording phase: Create SecondaryCommandBuffer → Begin → Bind GraphicPipeline (assume that there are no dynamic PSOs) → Bind DescriptorSet → Bind UniformBuffer, VertexBuffer / IndexBuffer → Draw → End

Update phase: Update UniformBuffer

Execute phase: Begin Primary CommandBuffer → Execute Secondary CommandBuffer → End Primary CommandBuffer

• Draw phase: just update UniformBuffer & execute SecondaryCommandBuffer! Each SecondaryCommandBuffer is created and recorded once and binds its own uniform buffer: SCB #0 ↔ UniformBuffer #0, SCB #1 ↔ UniformBuffer #1, SCB #2 ↔ UniformBuffer #2, SCB #3 ↔ UniformBuffer #3, SCB #4 ↔ UniformBuffer #4, SCB #5 ↔ UniformBuffer #5.

(Screenshots: OpenGL ES 2.0 + single thread vs. Vulkan + SecondaryCommandBuffer + multi-thread)

※ Matrix transforms are calculated on the CPU side.

With secondary command buffers: four CPU threads each update one quarter of the buffer (1/4, 2/4, 3/4, 4/4), then the queue executes the SecondaryCommandBuffers.
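The update step can be sketched with std::thread. Assumptions: the per-object uniform data is a flat array (here one float per object as a stand-in for a matrix), and the slices do not overlap, so no locking is needed. `updateUniformsParallel` is hypothetical; the pre-recorded secondary command buffers would be executed only after all workers have joined.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Splits the uniform array into contiguous, non-overlapping slices, one per thread.
void updateUniformsParallel(std::vector<float>& uniforms, float time, unsigned threadCount) {
    std::vector<std::thread> workers;
    const size_t n = uniforms.size();
    for (unsigned t = 0; t < threadCount; ++t) {
        size_t begin = n * t / threadCount;
        size_t end = n * (t + 1) / threadCount;
        workers.emplace_back([&uniforms, begin, end, time] {
            for (size_t i = begin; i < end; ++i) {
                uniforms[i] = time * static_cast<float>(i);  // stand-in for a matrix transform
            }
        });
    }
    // All slices must be written before the secondary command buffers are executed.
    for (std::thread& w : workers) w.join();
}
```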

Development Tip - Multi-thread

(Diagram, three steps: four CPU threads each record commands (cmd cmd cmd) into their own command buffer; one thread submits; its commands move to the queue while the other threads keep recording.)

Access to VkCommandPool and VkDescriptorPool must be synchronized, or each thread should own and handle its own pools independently.
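A minimal sketch of the second option, one pool per thread, modeled without Vulkan: each worker records into its own `ThreadCommandPool` (a hypothetical stand-in for a VkCommandPool and its command buffer), so recording needs no mutex, and a single thread submits after all workers have joined.

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Stand-in for a VkCommandPool plus one command buffer allocated from it.
struct ThreadCommandPool {
    std::vector<std::string> commands;
};

std::vector<std::string> recordAndSubmit(unsigned threadCount, unsigned drawsPerThread) {
    std::vector<ThreadCommandPool> pools(threadCount);  // one pool per thread, never shared
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([&pools, t, drawsPerThread] {
            // Only thread t touches pools[t], so no locking is needed.
            for (unsigned d = 0; d < drawsPerThread; ++d) {
                pools[t].commands.push_back("t" + std::to_string(t) + ":draw" + std::to_string(d));
            }
        });
    }
    for (std::thread& w : workers) w.join();

    // Single-threaded stand-in for vkQueueSubmit, in a fixed order.
    std::vector<std::string> queue;
    for (const ThreadCommandPool& p : pools) {
        queue.insert(queue.end(), p.commands.begin(), p.commands.end());
    }
    return queue;
}
```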

Development Tip - Reducing duplicated API calls

It is important to call each bind/set function only once per VkCommandBuffer, to avoid duplicated vkCmdSetXXX and vkCmdBindXXX calls with the same values/parameters.

Worst case:
※ In our test case, 500 calls of vkCmdSetViewport and vkCmdSetScissor took 1.412 ms.
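A simple way to avoid these duplicates is to remember the last value per command buffer and skip the call when nothing changed. A minimal sketch for the viewport, with a plain `Viewport` struct standing in for VkViewport and a hypothetical `ViewportFilter` class; the filter is reset at each vkBeginCommandBuffer because dynamic state does not carry over between command buffers.

```cpp
#include <cassert>

// Same fields as VkViewport; real code would use VkViewport directly.
struct Viewport { float x, y, width, height, minDepth, maxDepth; };

bool sameViewport(const Viewport& a, const Viewport& b) {
    return a.x == b.x && a.y == b.y && a.width == b.width && a.height == b.height &&
           a.minDepth == b.minDepth && a.maxDepth == b.maxDepth;
}

class ViewportFilter {
public:
    // Returns true only when vkCmdSetViewport actually needs to be recorded.
    bool shouldSet(const Viewport& v) {
        if (valid_ && sameViewport(v, last_)) return false;
        last_ = v;
        valid_ = true;
        return true;
    }
    // Call when a new command buffer begins recording.
    void reset() { valid_ = false; }

private:
    Viewport last_{};
    bool valid_ = false;
};
```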

Thank you