Vulkan Case Study 2016 Khronos Seoul DevU SAMSUNG Electronics
Soowan Park, Graphics Engineer
Joonyong Park, Senior Graphics Engineer
Before the start
All case study information & contents are based on our development experiences with Galaxy S7 spanning two chipset variants, using the ARM Mali and Qualcomm Adreno GPU.
Samsung Electronics Who we are
• GPU, Graphics R&D, MCD, SAMSUNG Electronics.
Samsung Electronics What we did
ProtoStar, HIT, NFS, Vainglory
MWC, GDC, SDC, E3, Gamescom, CEDEC
Samsung Electronics History
Samsung Electronics Agenda
1. Swapchain
2. Uniform Buffer
3. GPU Driver
4. Rendering
5. GLES Fall-back
6. Development Tip
Samsung Electronics For who?
For Android Vulkan developers. These are very simple cases, but important ones!
Samsung Electronics 1. Swapchain Swapchain - Android
Triple Buffering - Google Project Butter (Applied since Android 4.1 Jelly Bean release) • Android OpenGL ES runs with triple buffering by default • adb shell dumpsys SurfaceFlinger →
Image Count of Swapchain • For this reason, the Android platform needs at least 3 swapchain images to perform well.
The user can’t control the number of back buffers in OpenGL ES.
With Java SurfaceView • Android Vulkan currently supports only native activities. However, you can still use a SurfaceView in a Java activity by passing the surface handle to native code through JNI and obtaining the NativeWindow handle from it.
q.v. : https://developer.android.com/ndk/reference/group___native_activity.html • We recommend running the main render loop on a separate Java-side render thread, as GLSurfaceView does.
Samsung Electronics Swapchain - Presentation Mode
• VK_PRESENT_MODE_MAILBOX_KHR
[Diagram: swapchain images #0, #1, #2 and an internal queue with a single entry X (implementation dependent). Each vkAcquireNextImage / vkQueuePresent pair replaces the queued entry: X=#0, then X=#1, then X=#2. The latency is marked on the timeline. At VBLANK the display controller will read from #1.]
Samsung Electronics Swapchain - Presentation Mode
• VK_PRESENT_MODE_FIFO_KHR
[Diagram: swapchain images #0, #1, #2 and an internal queue with entries X, Y, Z. Each vkAcquireNextImage / vkQueuePresent pair appends to the queue: X=#0, Y=#1, Z=#2. The latency is marked on the timeline. At VBLANK, #0 stored in X is swapped with the back buffer.]
Samsung Electronics Swapchain - Presentation Mode
[Frame-rate graphs: VK_PRESENT_MODE_FIFO_KHR and VK_PRESENT_MODE_MAILBOX_KHR, each plotted against the 60 FPS line]
※ DO NOT use MAILBOX mode in a game unless latency is critical and you know what you’re doing.
Samsung Electronics Swapchain - Presentation Mode
Code Level (q.v. : https://www.khronos.org/registry/vulkan/specs/1.0-wsi_extensions/xhtml/vkspec.html, 29.5. Surface Queries)
uint32_t presentModeCount = 0; vkGetPhysicalDeviceSurfacePresentModesKHR(physicalDevice, surface, &presentModeCount, VK_NULL_HANDLE); std::vector
const uint32_t desiredArraySize = 2;
VkPresentModeKHR desiredPresentMode[] = { VK_PRESENT_MODE_FIFO_KHR, VK_PRESENT_MODE_MAILBOX_KHR };
// Pick the first desired mode that the surface reports as supported.
for (uint32_t d_n = 0; d_n < desiredArraySize; ++d_n) {
    for (uint32_t p_n = 0; p_n < presentModeCount; ++p_n) {
        if (pPresentModes[p_n] == desiredPresentMode[d_n]) {
            presentMode = desiredPresentMode[d_n];
            d_n = desiredArraySize; // also terminates the outer loop
            break;
        }
    }
}
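The selection loop above can be sketched without any Vulkan headers. This is only an illustration: `PresentMode` is a local stand-in for `VkPresentModeKHR` (with the same numeric values), and `choosePresentMode` is a hypothetical helper name. The fallback to FIFO relies on the Vulkan specification requiring every implementation to support `VK_PRESENT_MODE_FIFO_KHR`.

```cpp
#include <cstdint>
#include <vector>

// Local stand-in for VkPresentModeKHR so the sketch compiles without vulkan.h.
enum class PresentMode : std::uint32_t {
    Immediate = 0, Mailbox = 1, Fifo = 2, FifoRelaxed = 3
};

// Returns the first entry of `preferred` that appears in `supported`.
// Falls back to Fifo when nothing matches.
PresentMode choosePresentMode(const std::vector<PresentMode>& supported,
                              const std::vector<PresentMode>& preferred) {
    for (PresentMode want : preferred) {
        for (PresentMode have : supported) {
            if (have == want) {
                return want;
            }
        }
    }
    return PresentMode::Fifo;
}
```

With the preference order from the slide (FIFO first, MAILBOX second), a surface that reports both modes ends up on FIFO, which matches the advice not to use MAILBOX in a game.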
Samsung Electronics Swapchain - SwapBuffer Comparison (Android)
WSI (Window System Integration)
RENDERFRAME N (Vulkan)
APPLICATION: rendering is recorded into command buffer #0, then vkAcquireNextImageKHR #0 → vkQueueSubmit #0 → vkQueuePresentKHR #0 on the associated graphics queue. The diagram marks a WILL BLOCK HERE point in this sequence.
COMMAND FLUSHING & RENDERING: VkImage (buffer) #0 is rendered and a rendering-complete semaphore is signaled. COMPLETE! The application can explicitly get the GPU rendering completion signal by using a fence from the submit.
SURFACE FLINGER: VkImages (buffers) #0, #1, #2 are dequeued and queued as WindowBuffers of the associated native window (with an internal wait), then go to the DISPLAY.
※ The application does the “blocking wait” to sync with the GPU. (VK_PRESENT_MODE_FIFO_KHR)
RENDERFRAME N (OpenGL ES)
APPLICATION: glClear / glDrawXXX #0 render into the back buffer (FrameBuffer 0) #0, then eglSwapBuffers #0. The diagram marks a WILL BLOCK HERE point in this sequence.
COMMAND FLUSHING & RENDERING: glFlush() #0 renders into EGLSurface : GfxBuffer #0. There is no way to get GPU rendering completion.
SURFACE FLINGER: EGLSurface GfxBuffers #0, #1, #2 are dequeued and queued as WindowBuffers of the associated native window, then go to the DISPLAY.
Samsung Electronics Swapchain - Synchronization failed case
• Tearing
Samsung Electronics Swapchain - Synchronization
• Fence Logic
[Diagram: the swapchain cycles through VkImage #0, #1, #2. Each image has its own VkFence (#0 to #2) and its own VkCommandBuffer (#0 to #2), all allocated from one command pool (single thread).]
Per frame, for swapchain image #i:
vkWaitForFences(fence #i)
vkResetFences(fence #i)
vkResetCommandBuffer(buf #i)
vkBeginCommandBuffer(buf #i)
Render ~
vkQueueSubmit(fence #i)
vkQueuePresentKHR
Samsung Electronics Image Layout - Swapchain
• Transition to the correct image layout for presenting and rendering.
• At the very beginning of drawing, after the first acquire
  • vkGetSwapchainImagesKHR : VK_IMAGE_LAYOUT_UNDEFINED
  • VK_IMAGE_LAYOUT_GENERAL
  • Clear presentable image
• Draw routine
  • Acquire
  • VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
  • Render
  • VK_IMAGE_LAYOUT_PRESENT_SRC_KHR
  • Present
// Create Swapchain
vkGetSwapchainImagesKHR(device, swapchain, &swapchainImageCount, pSwapchainImages); // VK_IMAGE_LAYOUT_UNDEFINED
// Frame loop
swapchainIndex = acquire();
if (firstAcquire) {
    setImagesLayout(pSwapchainImages, swapchainImageCount, VK_IMAGE_LAYOUT_GENERAL);
    clearImages(pSwapchainImages, swapchainImageCount);
}
setImageLayout(pSwapchainImages[swapchainIndex], VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL);
/* Rendering */
setImageLayout(pSwapchainImages[swapchainIndex], VK_IMAGE_LAYOUT_PRESENT_SRC_KHR);
present(swapchainIndex);
Samsung Electronics Image Layout - Texture
Texturing in Vulkan
[Diagram: VkDescriptorSet, VkDescriptorImageInfo, VkSampler, VkImageView, VkImage, VkDeviceMemory]
VK_TILING_LINEAR
• Create with VK_IMAGE_LAYOUT_PREINITIALIZED
• Set ImageData using vkMapMemory, vkUnmapMemory
• VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL
VK_TILING_OPTIMAL
• Create with VK_IMAGE_LAYOUT_UNDEFINED
• VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL
• Set ImageData using Staging Buffer
• VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL
You can check the format properties like this:
// Get Image Format Property
VkFormatProperties formatProperty;
vkGetPhysicalDeviceFormatProperties(physicalDevice, imageFormat, &formatProperty);
if (formatProperty.optimalTilingFeatures & VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT)
    /**/;
else if (formatProperty.linearTilingFeatures & VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT)
    /**/;
Samsung Electronics Image Layout - Texture
• Why should we use a staging buffer?
VK_TILING_LINEAR, VkImage(VkDeviceMemory): texels are laid out in memory in row-major order, possibly with some padding on each row.
VK_TILING_OPTIMAL, VkImage(VkDeviceMemory): texels are laid out in an implementation-dependent arrangement, for more optimal memory access.
With VK_TILING_LINEAR you can therefore access a texel with these equations.
Common
// (x,y,z,layer) are in texel coordinates
address(x,y,z,layer) = layer*arrayPitch + z*depthPitch + y*rowPitch + x*texelSize + offset;
Compressed
// (x,y,z,layer) are in compressed texel block coordinates
address(x,y,z,layer) = layer*arrayPitch + z*depthPitch + y*rowPitch + x*compressedTexelBlockByteSize + offset;
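As a worked instance of the uncompressed address equation, here is a minimal sketch. `LinearLayout` is a hypothetical struct that mirrors the pitch and offset fields of `VkSubresourceLayout`; in a real application those values would come from vkGetImageSubresourceLayout rather than being filled in by hand.

```cpp
#include <cstdint>

// Hypothetical mirror of the VkSubresourceLayout fields used by the equation.
struct LinearLayout {
    std::uint64_t offset;
    std::uint64_t rowPitch;
    std::uint64_t arrayPitch;
    std::uint64_t depthPitch;
};

// Byte address of texel (x, y, z, layer) in a VK_TILING_LINEAR image.
std::uint64_t texelAddress(const LinearLayout& l,
                           std::uint64_t x, std::uint64_t y,
                           std::uint64_t z, std::uint64_t layer,
                           std::uint64_t texelSize) {
    return layer * l.arrayPitch + z * l.depthPitch + y * l.rowPitch
         + x * texelSize + l.offset;
}
```

For example, with offset 64, rowPitch 1024 and 4-byte texels, texel (3, 2) of layer 0 sits at 2*1024 + 3*4 + 64 = 2124 bytes. Note that rowPitch may be larger than width*texelSize because of the row padding mentioned above.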
Samsung Electronics Image Layout - Texture
• How can we use a staging buffer?
[Diagram: image data → fill the image data into a VkBuffer → vkCmdCopyBufferToImage recorded in a VkCommandBuffer → VkImage with VK_TILING_OPTIMAL]
DO NOT use VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT with VK_TILING_OPTIMAL.
VkBuffer& stagingBuffer = getStagingBuffer(imageBufferSize);
VkBufferImageCopy region = getRegionFromImage(image);
fillBuffer(stagingBuffer, pImageData);
vkCmdCopyBufferToImage(commandBuffer, stagingBuffer, image, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &region);
Samsung Electronics Image Layout - Framebuffer (OnlyForColor)
• Bind for Attachment (transitioning an off-screen render target to an input texture, e.g. environment map, post-processing, etc.)
  • VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
• Bind for Texture
  • VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL
// Initialize FrameBuffer
createFrameBuffer(frameBuffer); // VK_IMAGE_LAYOUT_UNDEFINED
setImageLayout(frameBuffer, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL);
………
// Bind FrameBuffer
bindFrameBuffer(frameBuffer);
setImageLayout(frameBuffer, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL);
/* Render into frameBuffer */
unbindFrameBuffer(frameBuffer); // And set Default Framebuffer
setImageLayout(frameBuffer, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL);
setTexture(frameBuffer, 0);
/* Render into backbuffer */
Samsung Electronics Image Layout - Framebuffer (OnlyForColor)
Framebuffer in Vulkan: VkFramebuffer, VkImageView, VkImage, VkDeviceMemory
[Diagram: Off-screen #0 (original scene) renders into VkImage #0 in VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL. Off-screen #1 (NormalMap for post-processing) renders into VkImage #1 in VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL. For the final rendering, VkImage #0 and VkImage #1 are both in VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL.]
Samsung Electronics Swapchain - SurfaceFormat ?
Samsung Electronics Swapchain - SurfaceFormat
From Java to native, all related window surfaces should have the same format. (Be careful with Java SurfaceView.)
APPLICATION with NATIVE ACTIVITY: CREATE SWAPCHAIN
APPLICATION with JAVA ACTIVITY: CREATE JAVA SURFACEVIEW with getHolder().setFormat(PixelFormat.RGB_888), then CREATE SWAPCHAIN with VK_FORMAT_R8G8B8_UNORM. Both formats need to match, or you need to check the image format in the render pass.
We recommend querying the surface formats to check whether your target device supports the one you want. All Java-side and native-side surfaces should use a matching format.
uint32_t surfaceFormatCount = 0; vkGetPhysicalDeviceSurfaceFormatsKHR(physicalDevice, surface, &surfaceFormatCount, VK_NULL_HANDLE); std::vector
• Surface handling
Activity starts: Create VkInstance
surfaceCreated(): Create VkSurfaceKHR
surfaceChanged(): Create VkSwapchainKHR
Event resizeSurface, surfaceChanged(): Create VkSwapchainKHR, need to pass oldSwapchain (crash case)
Event onPause, surfaceDestroyed(): Destroy VkSwapchainKHR, VkSurfaceKHR. Need to wait until the queue is empty (crash case)
Event onResume, surfaceCreated(): Create VkSurfaceKHR
surfaceChanged(): Create VkSwapchainKHR
Event shutdown App, surfaceDestroyed(): Destroy VkSwapchainKHR, VkSurfaceKHR
Activity is shut down: Destroy VkInstance
Samsung Electronics Swapchain – Crash at onSurfaceChanged ( Resize )
Surface Changed (crash case): Swapchain A is passed as pOldSwapchain when creating Swapchain B, so its internal resources are destroyed at Swapchain B creation time.
[Diagram: CommandBuffer N-2 (Image #0) and CommandBuffer N-1 (Image #1) have completed presenting, but CommandBuffer N (Image #2) is still presenting when B is created. Crash!]
Surface Changed (fixed): call vkQueueWaitIdle() to wait until the queue is empty, then create B.
Samsung Electronics Swapchain – Crash at onSurfaceDestroyed ( Pause )
onPause, Surface Destroyed (crash case): Swapchain A is destroyed; Swapchain B is created after onResume, Surface Created.
[Diagram: CommandBuffer N-2 (Image #0) and CommandBuffer N-1 (Image #1) have completed presenting, but CommandBuffer N (Image #2) is still presenting when A is destroyed. Crash!]
Surface Destroyed (fixed): call vkQueueWaitIdle() to wait until the queue is empty, then destroy A.
Samsung Electronics Similar problem - Vulkan Object Release
Begin Frame
destroyShader: Destroy graphicPipeline, descriptor…
destroyVertexBuffer, destroyIndexBuffer: Destroy VkBuffer, release or return VkDeviceMemory, …
destroyXXX
End Frame
Destroy Shader: vkDestroyPipeline on a VkPipeline
[Diagram: the queue holds CommandBuffer #0 (frame N-2, present completed), CommandBuffer #1 (frame N-1, presenting) and CommandBuffer #2 (frame N, in progress). Is the VkPipeline still referenced by one of them?]
Samsung Electronics Similar problem - Vulkan Object Release
Destroy Shader
RENDERFRAME RENDERFRAME RENDERFRAME RENDERFRAME RENDERFRAME N N+1 N+2 N+3 N+4
VkCommandBuffer VkCommandBuffer VkCommandBuffer VkCommandBuffer VkCommandBuffer #0 #1 #2 #0 #1
Create Dependency check Dependency check Dependency check Use VkPipeline VkPipeline VkPipeline VkPipeline VkPipeline #0~#10 #0~#10 #0~#5 #0~#5 #0~#5
Destroy Use VkPipeline Use VkPipeline Use VkPipeline VkPipeline #0~#10 #6~#10 #6~#10 #0~#5
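One common way to implement the delayed release shown on this slide is a per-frame deferred destruction queue. The sketch below is an assumption about how such a queue could look, not the presenters' implementation: `DeferredReleaseQueue` and its methods are hypothetical names, and the destroy callback stands in for calls such as vkDestroyPipeline.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Delays destruction until no in-flight command buffer can reference the object.
class DeferredReleaseQueue {
public:
    explicit DeferredReleaseQueue(std::uint64_t framesInFlight)
        : framesInFlight_(framesInFlight) {}

    // Record a destroy request issued while recording frame `frame`.
    void release(std::uint64_t frame, std::function<void()> destroy) {
        pending_.push_back({frame, std::move(destroy)});
    }

    // Call at the start of `currentFrame`, after waiting on the fence of the
    // command buffer slot being reused. Runs every request that is at least
    // framesInFlight frames old and returns how many were run.
    std::size_t collect(std::uint64_t currentFrame) {
        std::size_t executed = 0;
        while (!pending_.empty() &&
               pending_.front().frame + framesInFlight_ <= currentFrame) {
            pending_.front().destroy();
            pending_.pop_front();
            ++executed;
        }
        return executed;
    }

private:
    struct Entry {
        std::uint64_t frame;
        std::function<void()> destroy;
    };
    std::uint64_t framesInFlight_;
    std::deque<Entry> pending_;
};
```

With three command buffers in flight, a pipeline released during frame N is actually destroyed at the start of frame N+3, once the fence for the command buffer recorded in frame N has been waited on.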
Samsung Electronics 2. UniformBuffer UniformBuffer - Shader Memory Alignment
Expected
Samsung Electronics UniformBuffer - Shader Memory Alignment
Error
Samsung Electronics UniformBuffer - Shader Memory Alignment
Expected Error
Samsung Electronics UniformBuffer - Shader Memory Alignment
Expected
layout(set=0, binding=0) uniform buf1{
    float _unif1; // #0
    vec3 _unif2; // #1
    vec2 _unif3; // #2
}
[Diagram: the expected placement of #0, #1, #2 compared with the result after converting to SPIR-V, where the std140 layout is applied and the members land in a different arrangement]
Shaders written with the Vulkan GLSL extension do not have this alignment problem. q.v. : VulkanSpec_1.0.28, 14.5.4. Offset and Stride Assignment.
But you need to be careful if you use SPIR-V converted directly through glslang from GLSL that was written without std140 alignment in mind.
Samsung Electronics UniformBuffer - Shader Memory Alignment
q.v. : VulkanSpec_1.0.28, 14.5.4. Offset and Stride Assignment
• The Offset Decoration must be a multiple of its base alignment, computed recursively as follows:
  ◦ a scalar of size N has a base alignment of N
  ◦ a two-component vector, with components of size N, has a base alignment of 2N
  ◦ a three- or four-component vector, with components of size N, has a base alignment of 4N
  ◦ an array has a base alignment equal to the base alignment of its element type, rounded up to a multiple of 16
  ◦ a structure has a base alignment equal to the largest base alignment of any of its members, rounded up to a multiple of 16
  ◦ a row-major matrix of C columns has a base alignment equal to the base alignment of vector of C matrix components
  ◦ a column-major matrix has a base alignment equal to the base alignment of the matrix column type
• Any ArrayStride or MatrixStride decoration must be an integer multiple of the base alignment of the array or matrix from above.
• The Offset Decoration of a member immediately following a structure or an array must be greater than or equal to the next multiple of the base alignment of that structure or array.
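The scalar and vector rules above can be turned into a small offset calculator. This is a minimal sketch that covers only 32-bit float, vec2, vec3 and vec4 members (no arrays, structures or matrices); `Member`, the `k...` constants and `std140Offsets` are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <vector>

// Base alignment and size in bytes of a uniform block member (N = 4 bytes).
struct Member {
    std::size_t align;
    std::size_t size;
};

constexpr Member kFloat{4, 4};   // scalar: alignment N
constexpr Member kVec2{8, 8};    // two components: alignment 2N
constexpr Member kVec3{16, 12};  // three components: alignment 4N, size 3N
constexpr Member kVec4{16, 16};  // four components: alignment 4N

// Places each member at the next multiple of its base alignment.
std::vector<std::size_t> std140Offsets(const std::vector<Member>& members) {
    std::vector<std::size_t> offsets;
    std::size_t cursor = 0;
    for (const Member& m : members) {
        cursor = (cursor + m.align - 1) / m.align * m.align;  // round up
        offsets.push_back(cursor);
        cursor += m.size;
    }
    return offsets;
}
```

For the block `float, vec3, vec2` this yields offsets 0, 16, 32, whereas a tightly packed CPU-side struct would put the members at 0, 4, 16. Reordering to `vec3, float, vec2` yields 0, 12, 16 and wastes no space, which is why sorting members is one of the suggested remedies.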
Samsung Electronics UniformBuffer - Shader Memory Alignment
VkDeviceMemory (one cell = 4 bytes)
layout(set=0, binding=0, std140) uniform buf1{
    mat4 _unif00; // #0
    vec4 _unif01; // #1
    vec4 _unif02; // #2
}
layout(set=0, binding=0, std140) uniform buf1{
    vec2 _unif00; // #0
    vec2 _unif01; // #1
    vec3 _unif02; // #2
}
[Diagrams: placement of #0, #1, #2 in VkDeviceMemory for each block]
Samsung Electronics UniformBuffer - Shader Memory Alignment
VkDeviceMemory
layout(set=0, binding=0, std140) uniform buf1{
    vec4 _unif00; // #0
    vec2 _unif01; // #1
    vec2 _unif02; // #2
}
layout(set=0, binding=0, std140) uniform buf1{
    vec2 _unif00; // #0
    float _unif01; // #1
    float _unif02; // #2
}
[Diagrams: placement of #0, #1, #2 in VkDeviceMemory for each block]
Samsung Electronics UniformBuffer - Shader Memory Alignment
VkDeviceMemory
layout(set=0, binding=0, std140) uniform buf1{
    vec3 _unif00; // #0
    float _unif01; // #1
    vec2 _unif02; // #2
}
layout(set=0, binding=0, std140) uniform buf1{
    float _unif00; // #0
    vec3 _unif01; // #1
    vec2 _unif02; // #2
}
[Diagrams: placement of #0, #1, #2 in VkDeviceMemory for each block]
Samsung Electronics UniformBuffer - Shader Memory Alignment
VkDeviceMemory
layout(set=0, binding=0, std140) uniform buf1{
    float _unif00; // #0
    vec2 _unif01; // #1
    vec2 _unif02; // #2
}
layout(set=0, binding=0, std140) uniform buf1{
    float _unif00; // #0
    vec4 _unif01; // #1
    vec4 _unif02; // #2
}
[Diagrams: placement of #0, #1, #2 in VkDeviceMemory for each block]
Samsung Electronics UniformBuffer - Shader Memory Alignment
VkDeviceMemory
layout(set=0, binding=0, std140) uniform buf1{
    mat2 _unif00; // #0
    mat3 _unif01; // #1
    mat4 _unif02; // #2
}
layout(set=0, binding=0, std140) uniform buf1{
    mat3 _unif00; // #0
    float _unif01; // #1
    vec2 _unif02; // #2
}
[Diagrams: placement of #0, #1, #2 in VkDeviceMemory for each block]
Sorting members, using multiple UBOs, using only vec4… there are many other approaches, depending on your application or engine.
Samsung Electronics UniformBuffer - Memory Alignment
Memory Pools are useful for dynamic objects.
Assume that each object has the following structure.
layout(set=0, binding=0, std140) uniform buf1{
    vec2 _unif00; // #0
    vec2 _unif01; // #1
    float _unif02; // #2
    float _unif03; // #3
}
[Diagram: VkDeviceMemory drawn in 1-byte cells, bytes 0 to 23, holding vec2, vec2, float, float back to back. Packing objects this tightly leads to a rendering issue.]
Samsung Electronics UniformBuffer - Memory Alignment
UBO value corruption.
?
Samsung Electronics UniformBuffer - Memory Alignment
• VkDescriptorBufferInfo: the offset must respect the alignment given by the physical device limit VkPhysicalDeviceLimits::minUniformBufferOffsetAlignment.
[Diagram: VkDeviceMemory before and after the memory alignment is applied, assuming minUniformBufferOffsetAlignment : 16 and block size : 1 byte]
Samsung Electronics UniformBuffer - Memory Alignment
Code level
VkPhysicalDeviceProperties properties;
vkGetPhysicalDeviceProperties(physicalDevice, &properties);
size_t minUniformBufferOffsetAlignment = properties.limits.minUniformBufferOffsetAlignment;
// Round the uniform block size up to the next multiple of the alignment.
size_t padding = 0;
size_t mod = _uniformBufferSize % minUniformBufferOffsetAlignment;
if (mod != 0) {
    padding = minUniformBufferOffsetAlignment - mod;
}
_nextBufferOffset = _uniformBufferSize + padding;
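The padding computation can be isolated into a small helper and checked with concrete numbers. `alignUp` is a hypothetical name; the alignment values used in the example are illustrative, since the real value must be read from VkPhysicalDeviceLimits on each device.

```cpp
#include <cstdint>

// Rounds `size` up to the next multiple of `alignment` (alignment > 0).
std::uint64_t alignUp(std::uint64_t size, std::uint64_t alignment) {
    const std::uint64_t mod = size % alignment;
    return mod == 0 ? size : size + (alignment - mod);
}
```

For the 24-byte block from the earlier slide and an assumed alignment of 16, the next object must start at byte 32 rather than byte 24. A block whose size is already a multiple of the alignment needs no padding.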
Samsung Electronics UniformBuffer - Memory Alignment
The following limits in VkPhysicalDeviceLimits are important when you are dealing with memory management.
size_t minMemoryMapAlignment;
VkDeviceSize minTexelBufferOffsetAlignment;
VkDeviceSize minUniformBufferOffsetAlignment;
VkDeviceSize minStorageBufferOffsetAlignment;
VkDeviceSize nonCoherentAtomSize;
…
Samsung Electronics UniformBuffer - Tile Artifact
Why? • Because we did not account for the UniformBuffer being used by multiple frames at once.
[Diagram: the swapchain cycles through VkImage #0, #1, #2, each with its own VkCommandBuffer #0, #1, #2, but all of them share a single UniformBuffer.]
Samsung Electronics UniformBuffer - Tile Artifact
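Mobile tile-based GPU
N = MVP matrix of frame N
N+1 = MVP matrix of frame N+1 (changed)
[Diagram: a grid of tiles belonging to one frame. Most tiles are rendered with N, but the last tiles are rendered with N+1.]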
Samsung Electronics UniformBuffer - Tile Artifact
• You should have at least one UniformBuffer for each swapchain image index.
• Or you can use a single UniformBuffer with a different dynamic offset per frame, passed to vkCmdBindDescriptorSets. Either approach solves this issue.
[Diagram: three variants of the swapchain loop, each with VkImage #0 to #2 and VkCommandBuffer #0 to #2. In the first, every command buffer uses UB #0. In the second, command buffers #0, #1, #2 use UB #0, UB #1, UB #2. In the third, every command buffer uses UB #0 with its own dynamicOffset.]
Samsung Electronics 3. GPU Driver GPU Driver - Type of shader input
GPU Skinning problem
Samsung Electronics GPU Driver - Type of shader input
OpenGL ES driver: original data uint32 x 4, shader input vec4 (vec4 ← uvec4). Somehow the driver corrects the type of the input, even though an incorrect type was passed in.
Vulkan driver: original data uint32 x 4, shader input vec4. The input is empty.
Vulkan driver: original data uint32 x 4, shader input uvec4. Please use the correct type of input.
Samsung Electronics GPU Driver - API Comparison
In common cases, the GPU load should be the same with both APIs.
But we have faced some cases where GLES was a bit better than Vulkan.
• Even though we were using the same vertices and indices
A lot of effort has been spent on OpenGL ES driver optimization.
Vulkan drivers are intended to be light-weight and predictable, with no more driver magic.
SO YOU HAVE TO IMPLEMENT OPTIMIZATIONS YOURSELF!
[Graphs: GPU load for each API]
Samsung Electronics GPU Driver - API Comparison
Geometry sorting (Vertex & Index) • There is a limitation that a range of vertices must be shaded. This means that a triangle built from indices {0,1,2} should be significantly cheaper to execute than a triangle built from indices {0, 999, 1999} (3 vertices transformed vs. 2000 vertices transformed).
Without Geometry Sorting
With Geometry Sorting
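The cost model described on this slide, where the whole index range between the smallest and largest index of a draw gets shaded, can be sketched as follows. `shadedVertexRange` is a hypothetical helper, and the model is the one stated on the slide; actual behavior depends on the GPU and driver.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Number of vertices in the contiguous range [minIndex, maxIndex] of a draw.
std::uint32_t shadedVertexRange(const std::vector<std::uint32_t>& indices) {
    if (indices.empty()) {
        return 0;
    }
    const auto range = std::minmax_element(indices.begin(), indices.end());
    return *range.second - *range.first + 1;
}
```

A triangle with indices {0, 1, 2} covers 3 vertices, while {0, 999, 1999} covers 2000. Sorting geometry so that each draw references a compact index range keeps this number close to the number of vertices actually used.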
Samsung Electronics 4. Rendering Rendering - Quality
Sometimes you may see color aliasing artifacts in Vulkan applications.
• You may need to consider changing the surface format (RGB565 to RGB888).
• SPIR-V has only two precisions for shader calculation.
• In glslang logic
  • lowp & mediump : RelaxedPrecision decoration
  • highp : empty
• You should consider using highp if your application needs accuracy.
• But please use mediump wherever possible because of performance.
※ Saturation was modified for presentation.
Samsung Electronics RGB32
Samsung Electronics ETC1
Samsung Electronics ASTC 6x6
Samsung Electronics ASTC 8x8
Samsung Electronics Rendering - Texture Format
You can optimize your app using various ASTC block sizes. But you should select a proper option for texture quality.
Block Size    Bits Per Pixel
4x4           8.00
5x4           6.40
5x5           5.12
6x5           4.27
6x6           3.56
8x5           3.20
8x6           2.67
10x5          2.56
10x6          2.13
8x8           2.00
10x8          1.60
10x10         1.28
12x10         1.07
12x12         0.89
Texture groups marked next to the table: Font, NormalMap, UI, Etc…; Color_High; Color_Low
ETC1, ASTC Comparison
APK Size
  ETC1 + OpenGL ES 2.0 : 599 MB
  ASTC + Vulkan : 521 MB
Bandwidth (HIT case)
  ETC1 read bandwidth = 14.56 MBs
  ASTC read bandwidth = 13.80 MBs
  bandwidth_delta = 0.76 MBs
  bandwidth_reduction = 5.22%
Memory usage
  ETC1 + OpenGL ES 2.0 : 1115 MB
  ASTC + Vulkan : 557 MB
※ It depends on the quality choice.
Anyway, it’s better to use! Check VkPhysicalDeviceFeatures textureCompressionASTC_LDR.
Samsung Electronics 5. GLES Fall-back GLES Fall-back
Application::onCreate()
Application::onSurfaceCreated()
  Is Vulkan supported?
    Yes: Application::loadVulkanRHI(), Create Swapchain
    No: Application::loadOpenGLESRHI(), Create EGLSurface
Application::onSurfaceResize()
  Application::Vulkan::Resize(): Resize Swapchain
  Application::GLES::Resize(): Resize Surface (FB, Viewport)
We highly recommend putting the API detection at the very first initialization stage, before the surface creation, so that you do not waste additional resource allocations, etc.
Samsung Electronics GLES Fall-back - Vulkan Detection(Mobile)
Basic checks:
• Attempt to load libVulkan.so: success == dlopen(“libVulkan.so”)
• Attempt to load basic functions: success == vkGetInstanceProcAddr
• Attempt to create instance 1.0.0: VK_SUCCESS == vkCreateInstance
THREE ADDITIONAL CHECKS:
• Attempt to create an instance with the 1.0.11 Vulkan PDK stable API version
• Attempt to load all functions: success == vkGetDeviceProcAddr
• Get/check the patch version from the driver: VK_SUCCESS == vkCreateDevice
Be careful that some early patch versions may be unstable or lacking features. We use the GLES renderer for patch versions less than 11.
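The patch-version rule above can be sketched as a small check. The helpers below re-implement the bit layout of the `VK_MAKE_VERSION` and `VK_VERSION_PATCH` macros locally so that the example compiles without vulkan.h; `useVulkanRenderer` is a hypothetical name. Comparing the whole packed version, rather than only the patch field, is a choice made here so that a later minor version with a small patch number is not rejected by accident.

```cpp
#include <cstdint>

// Same bit layout as VK_MAKE_VERSION: major in bits 22 and up, minor in bits 12 to 21, patch in bits 0 to 11.
constexpr std::uint32_t makeVersion(std::uint32_t major, std::uint32_t minor,
                                    std::uint32_t patch) {
    return (major << 22) | (minor << 12) | patch;
}

// Same result as VK_VERSION_PATCH.
constexpr std::uint32_t versionPatch(std::uint32_t version) {
    return version & 0xFFFu;
}

// True when the driver reports at least Vulkan 1.0.11; otherwise fall back to GLES.
constexpr bool useVulkanRenderer(std::uint32_t driverApiVersion) {
    return driverApiVersion >= makeVersion(1, 0, 11);
}
```

In an application the input would be the `apiVersion` field of VkPhysicalDeviceProperties. For any 1.0.x driver this check is equivalent to testing that the patch version is at least 11.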
Samsung Electronics GLES Fall-back - Resources
• Texture Resource
  • We recommend using ASTC, but there are still drivers out there that do not support it.
  • At the moment, the common texture format for GLES in the market is ETC1_RGB8_OES.
  • Maintaining 2 texture pack formats just for API compatibility would be a burden.
  • If you want to port a game that uses ETC1 to Vulkan, you can reuse the existing resources through the backward-compatible format VK_FORMAT_ETC2_R8G8B8_UNORM_BLOCK.
• Shader Resource
  • Vulkan requires SPIR-V. You have the following 2 options for the shader resources.
1) ES310 (std140) GLSL + Runtime SPIR-V conversion https://github.com/google/shaderc
[Diagram: Vulkan path: ES310 GLSL code → glslang (shaderc) runtime SPIR-V conversion → persistent PipelineCache. GLES path: GLSL code (ES310) → persistent program binary.]
2) ES2/310 (std140) GLSL + Offline-compiled SPIR-V
[Diagram: Vulkan path: ES310 GLSL code → glslang offline SPIR-V conversion → persistent PipelineCache. GLES path: GLSL code (ES2.0 or ES2/310) → persistent program binary.]
Samsung Electronics 6. Development Tip Development Tip - Vulkan Viewport Volume
?
Expected Result
Samsung Electronics Development Tip - Vulkan Viewport Volume
• Vulkan Viewport Volume
• Basically, Vulkan uses the OriginUpperLeft execution mode, and its default viewport volume is different from OpenGL ES. (The OriginLowerLeft execution mode must not be used; fragment entry points must declare OriginUpperLeft.) q.v. : VulkanSpec1.0.28 / A.3. Validation Rules within a module.
NDC Space (Ex : DepthFunc Less, DepthClear 1.0f)
[Diagram: OpenGL ES volume with X (-1 ~ +1), Y (-1 ~ +1), Z (-1 ~ +1) next to the Vulkan volume with X (-1 ~ +1), Y (-1 ~ +1) pointing the opposite way, Z (0 ~ +1)]
Three ways to handle the difference:
• Add VertexShader PostFix
• Multiply VMatrix in front of MVP
• Modify Math Function
Samsung Electronics Development Tip - Vulkan Viewport Volume
// VertexShader void main() { gl_Position=vec4(0.5, 0.5, 0.0, 1.0); }
Samsung Electronics Development Tip - Vulkan Viewport Volume
Add VertexShader PostFix.
#version 310 es
precision highp float;
…
void main() {
    …
    gl_Position = MVP*_vertex;
    gl_Position.y = -gl_Position.y; // added
    gl_Position.z = (gl_Position.z + gl_Position.w) / 2.0; // added
    return;
}
You will face this kind of problem without above correction in Vulkan
Samsung Electronics Development Tip - Vulkan Viewport Volume
• Multiply VMatrix in front of MVP.
MAT4 VMatrix = {
    1.0f,  0.0f, 0.0f, 0.0f,
    0.0f, -1.0f, 0.0f, 0.0f,
    0.0f,  0.0f, 0.5f, 0.5f,
    0.0f,  0.0f, 0.0f, 1.0f };
Or (depending on usage)
MAT4 VMatrix = {
    1.0f,  0.0f, 0.0f, 0.0f,
    0.0f, -1.0f, 0.0f, 0.0f,
    0.0f,  0.0f, 0.5f, 0.0f,
    0.0f,  0.0f, 0.5f, 1.0f };
MVP = VMatrix * P * V * M;
setUniform(MVP);
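The shader postfix (negate y, remap z to (z + w) / 2) and the VMatrix pre-multiplication describe the same clip-space transform, which can be checked on the CPU. The sketch below assumes the first VMatrix is read in row-major order and multiplied with a column vector; `Vec4`, `applyPostfix` and `applyVMatrix` are hypothetical names.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;  // clip-space position (x, y, z, w)

// The two lines added to the vertex shader: flip y, remap z from [-w, w] to [0, w].
Vec4 applyPostfix(Vec4 p) {
    p[1] = -p[1];
    p[2] = (p[2] + p[3]) / 2.0f;
    return p;
}

// VMatrix * p, with VMatrix stored row-major as in the first variant above.
Vec4 applyVMatrix(const Vec4& p) {
    static const float m[4][4] = {
        {1.0f,  0.0f, 0.0f, 0.0f},
        {0.0f, -1.0f, 0.0f, 0.0f},
        {0.0f,  0.0f, 0.5f, 0.5f},
        {0.0f,  0.0f, 0.0f, 1.0f}};
    Vec4 r{};
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 4; ++col) {
            r[row] += m[row][col] * p[col];
        }
    }
    return r;
}
```

A point on the OpenGL ES near plane (z = -w) ends up at z = 0 and a point on the far plane (z = w) stays at z = w, while y changes sign. The second VMatrix variant holds the same values transposed, which suits a math library that stores matrices in column-major order.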
Samsung Electronics Development Tip - Vulkan Viewport Volume
Modify Math Function (q.v. : http://glm.g-truc.net/0.9.8/index.html)
#define GLM_LEFT_HANDED 0x00000001 // For DirectX, Metal, Vulkan
#define GLM_RIGHT_HANDED 0x00000002 // For OpenGL, default in GLM
Ortho Ortho if GLM_DEPTH_CLIP_SPACE == GLM_DEPTH_ZERO_TO_ONE #if GLM_COORDINATE_SYSTEM == GLM_LEFT_HANDED Result[2][2] = - static_cast
Frustum Frustum #if GLM_COORDINATE_SYSTEM == GLM_LEFT_HANDED #if GLM_DEPTH_CLIP_SPACE == GLM_DEPTH_ZERO_TO_ONE return frustumLH(left, right, bottom, top, nearVal, farVal); Result[2][2] = farVal / (farVal - nearVal); #else Result[3][2] = -(farVal * nearVal) / (farVal - nearVal); return frustumRH(left, right, bottom, top, nearVal, farVal); #else #endif Result[2][2] = (farVal + nearVal) / (farVal - nearVal); Result[3][2] = - (static_cast
Samsung Electronics Development Tip - Validation Layer
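The loader supports layering APIs.
Call Vulkan Function
[Diagram: a vkQueueSubmit call travels through these components in order]
Application: vkQueueSubmit
Loader (libVulkan.so): vkQueueSubmit
libVkLayerXXX.so: vkQueueSubmit
libVkLayerXXXX.so: vkQueueSubmit
Driver (vulkan.XXX.so): vkQueueSubmit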
Samsung Electronics Development Tip - Validation Layer
Sometimes you may face VK_ERROR_DEVICE_LOST or an unexpected error (GPU hang). Turn on the validation layers and fix it!
• [MEM, 3]: Linear Image 0x515 is aliased with non-linear image 0x507 which is in violation of the Buffer-Image Granularity section of the Vulkan specification
• [DS, 49]: DS 0x890 encountered the following validation error at draw time: Dynamic descriptor in binding #1 at global descriptor index 2 uses buffer 4 with dynamic offset 524000 combined with offset 0 and range 720 that oversteps the buffer size of 524288
• [MEM, 12]: vkCmdBeginRenderPass(): cannot read invalid memory 0x26, please fill the memory before using
• Code 7 : Cannot map an image with layout VK_IMAGE_LAYOUT_UNDEFINED. Only GENERAL or PREINITIALIZED are supported.
• Code 7 : Cannot submit cmd buffer using image (0xffffffffc91fe3b0) [sub-resource: aspectMask 0x1 array layer 2, mip level 5], with layout VK_IMAGE_LAYOUT_UNDEFINED when first use is VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL.
• Code 10 : Attempt to set lineWidth to 0.000000 but physical device wideLines feature not supported/enabled so lineWidth must be 1.0f!
• Code 7 : Cannot submit cmd buffer using image (0xffffffffc7fe7ea0) [sub-resource: aspectMask 0x1 array layer 0, mip level 0], with layout VK_IMAGE_LAYOUT_UNDEFINED when first use is VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL.
• [MEM] Code 12 : vkCmdBeginRenderPass(): Cannot read invalid memory 0xffffffffc93bf470, please fill the memory before using.
• Code 7 : You cannot transition the layout from VK_IMAGE_LAYOUT_GENERAL when current layout is VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL.
• Code 25 : Unable to allocate 2 descriptors of type VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER from pool 0xffffffffcf476468. This pool only has 0 descriptors of this type remaining.
• Code 54 : Attempt to reset command buffer (0x28ceabb21c) which is in use.
• [MEM] Code 9 : Calling vkBeginCommandBuffer() on active CB 0x0xceabb21c before it has completed. You must check CB fence before this call.
• Code 27 : vkUpdateDescriptorsSets() failed write update validation for Descriptor Set 0xffffffffd0e00188 with error: Cannot call vkUpdateDescriptorSets() to perform write update on descriptor set 18446744072918925704 that is in use by a command buffer.
• Code 53 : Command Buffer 0xcf3be004 is already in use and is not marked for simultaneous use.
• Code 54 : Attempt to reset command buffer (0x28ceab721c) which is in use.
• Code 9 : Calling vkBeginCommandBuffer() on active CB 0x0xceab721c before it has completed. You must check CB fence before this call.
• Code 7 : You cannot transition the layout from VK_IMAGE_LAYOUT_GENERAL when current layout is VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL.
Samsung Electronics Validation Error #1
ONE_TIME_SUBMIT command buffers: CommandBuffer N-2, CommandBuffer N-1, CommandBuffer N
Frame N: vkQueueSubmit (N-2)
Frame N + 1: vkQueueSubmit (N-1)
Normal Case, Frame N + 2: vkQueueSubmit (N), vkWaitForFences (N-2), vkResetCommandBuffer (N-2), vkBeginCommandBuffer (N-2), Wait Index ++ % 3
Bug Case, Frame N + 2: vkQueueSubmit (N), vkWaitForFences (N), vkResetCommandBuffer (N-2), vkBeginCommandBuffer (N-2), Wait Index ++ % 3. The wait is on the wrong fence, so CommandBuffer N-2 may still be in use.
Code 9 : Calling vkBeginCommandBuffer() on active CB 0x0xceab721c before it has completed. You must check CB fence before this call.
Code 53 : Command Buffer 0xcf3be004 is already in use and is not marked for simultaneous use.
Code 54 : Attempt to reset command buffer (0x28ceab721c) which is in use.
Samsung Electronics, MCD, GPU Validation Error #2
Normal Case
Image (SwapChain, TILING_OPTIMAL) → vkCmdCopyImageToBuffer → Buffer (Staging, tiling N/A) → read captured memory RAW data
Bug Case
Image (SwapChain, TILING_OPTIMAL) → vkCmdCopyImage → Image (Staging, TILING_LINEAR) → read captured memory RAW data
[MEM, 3]: Linear Image 0x515 is aliased with non-linear image 0x507 which is in violation of the Buffer-Image Granularity section of the Vulkan specification
Samsung Electronics, MCD, GPU Validation Error #3
Transition for image copy, resolve, clear… etc:
1. Set image memory barrier
2. copy, resolve, clear…
3. Restore original image layout
switch (ImageLayout) {
case VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL:
    return VK_ACCESS_TRANSFER_READ_BIT;
case VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL:
    return VK_ACCESS_TRANSFER_WRITE_BIT;
case VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL:
    return VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
case VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL:
    return VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;
case VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL:
    return VK_ACCESS_SHADER_READ_BIT | VK_ACCESS_INPUT_ATTACHMENT_READ_BIT;
case VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL:
    return VK_ACCESS_SHADER_READ_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT;
case VK_IMAGE_LAYOUT_GENERAL:
    return VK_ACCESS_INPUT_ATTACHMENT_READ_BIT |
           VK_ACCESS_SHADER_READ_BIT | VK_ACCESS_SHADER_WRITE_BIT |
           VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT |
           VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT |
           VK_ACCESS_TRANSFER_READ_BIT | VK_ACCESS_TRANSFER_WRITE_BIT |
           VK_ACCESS_MEMORY_READ_BIT | VK_ACCESS_MEMORY_WRITE_BIT;
case VK_IMAGE_LAYOUT_PRESENT_SRC_KHR:
    return VK_ACCESS_MEMORY_READ_BIT;
case VK_IMAGE_LAYOUT_UNDEFINED:
case VK_IMAGE_LAYOUT_PREINITIALIZED:
    return 0;
}
Code 7 : Cannot submit cmd buffer using image (0xffffffffc91fe3b0) [sub-resource: aspectMask 0x1 array layer 2, mip level 5], with layout VK_IMAGE_LAYOUT_UNDEFINED when first use is VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL.
Code 7 : You cannot transition the layout from VK_IMAGE_LAYOUT_GENERAL when current layout is VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL.
And so on…
Validation Error #4
Buffer size: 524288, dynamic offset: 524000, range: 720
[DS, 49]: DS 0x890 encountered the following validation error at draw time: Dynamic descriptor in binding #1 at global descriptor index 2 uses buffer 4 with dynamic offset 524000 combined with offset 0 and range 720 that oversteps the buffer size of 524288
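The check behind this message can be reproduced on the CPU before recording the draw. The following is a minimal sketch, not the validation layer's code: the function names are hypothetical, the 256-byte alignment used in the usage note is only an example value, and wrapping back to offset 0 assumes the GPU has already finished reading the start of the buffer.

```cpp
#include <cstdint>

// Returns true when [bindingOffset + dynamicOffset, + range) stays inside the buffer.
bool dynamicRangeFits(uint64_t bufferSize, uint64_t bindingOffset,
                      uint64_t dynamicOffset, uint64_t range) {
    if (dynamicOffset > bufferSize) return false;
    if (bindingOffset > bufferSize - dynamicOffset) return false;
    const uint64_t start = bindingOffset + dynamicOffset;
    return range <= bufferSize - start;
}

// Rounds value up to the next multiple of alignment (alignment must be non-zero).
uint64_t alignUp(uint64_t value, uint64_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

// Hypothetical ring-buffer helper: picks the next dynamic offset and wraps to 0
// when the requested range would overstep the end of the buffer.
// Assumption: the region at offset 0 is no longer in use by the GPU.
uint64_t placeInRing(uint64_t cursor, uint64_t range,
                     uint64_t alignment, uint64_t bufferSize) {
    uint64_t offset = alignUp(cursor, alignment);
    if (!dynamicRangeFits(bufferSize, 0, offset, range)) {
        offset = 0;
    }
    return offset;
}
```

With the values from the message, dynamicRangeFits(524288, 0, 524000, 720) fails because 524000 + 720 = 524720 exceeds 524288, and placeInRing(524000, 720, 256, 524288) wraps to offset 0.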
RenderPass - Attachment Description: VK_ATTACHMENT_LOAD_OP_DONT_CARE, VK_ATTACHMENT_LOAD_OP_CLEAR / LOAD
[MEM, 12]: vkCmdBeginRenderPass(): cannot read invalid memory 0x26, please fill the memory before using
No debug lines? Check VkPipelineRasterizationStateCreateInfo.lineWidth.
Code 10 : Attempt to set lineWidth to 0.000000 but physical device wideLines feature not supported/enabled so lineWidth must be 1.0f!
Development Tip - VkPipelineCache
If you don’t use VkPipelineCache, you may face performance problems (lag). vkCreateGraphicsPipelines is really slow, so we need to use VkPipelineCache.
Game loading time (createGraphicPipeline called 300 times or more):
Without VkPipelineCache: 13.260 seconds
With VkPipelineCache (persistent): 4.187 seconds
std::vector<uint8_t> pipelineCacheData;  // byte buffer (element type assumed)

createGraphicPipeline:
    vkCreateGraphicsPipelines(device, pipelineCache, 1, &createInfo, nullptr, &pipeline);

onPause:
    size_t pDataSize = 0;
    // The first call only queries the size of the cache data.
    vkGetPipelineCacheData(device, pipelineCache, &pDataSize, nullptr);
    // if is valid
    pipelineCacheData.resize(pDataSize);
    vkGetPipelineCacheData(device, pipelineCache, &pDataSize, pipelineCacheData.data());
    savePipelineCacheToSDcard(pipelineCacheData);
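When a persistent cache is loaded again at start-up, it is usually worth checking that the saved blob belongs to the current device and driver before passing it as initial data. The following is a minimal sketch of such a check, assuming the version-one cache header layout from the Vulkan specification (header length, header version, vendorID, deviceID, 16-byte pipelineCacheUUID). DeviceIdentity and isCacheBlobUsable are hypothetical names; in a real application the identity fields would come from VkPhysicalDeviceProperties.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the identifying fields of VkPhysicalDeviceProperties.
struct DeviceIdentity {
    uint32_t vendorID;
    uint32_t deviceID;
    std::array<uint8_t, 16> pipelineCacheUUID;
};

// Reads a little-endian 32-bit value starting at byte index `at`.
static uint32_t readLE32(const std::vector<uint8_t>& bytes, std::size_t at) {
    return static_cast<uint32_t>(bytes[at]) |
           (static_cast<uint32_t>(bytes[at + 1]) << 8) |
           (static_cast<uint32_t>(bytes[at + 2]) << 16) |
           (static_cast<uint32_t>(bytes[at + 3]) << 24);
}

// Returns true when the blob starts with a version-one header that matches the
// given device; otherwise the caller should start from an empty cache.
bool isCacheBlobUsable(const std::vector<uint8_t>& blob, const DeviceIdentity& device) {
    constexpr std::size_t kHeaderSize = 32;  // 4 x uint32 + 16-byte UUID
    constexpr uint32_t kVersionOne = 1;      // VK_PIPELINE_CACHE_HEADER_VERSION_ONE
    if (blob.size() < kHeaderSize) return false;
    if (readLE32(blob, 0) != kHeaderSize) return false;
    if (readLE32(blob, 4) != kVersionOne) return false;
    if (readLE32(blob, 8) != device.vendorID) return false;
    if (readLE32(blob, 12) != device.deviceID) return false;
    for (std::size_t i = 0; i < device.pipelineCacheUUID.size(); ++i) {
        if (blob[16 + i] != device.pipelineCacheUUID[i]) return false;
    }
    return true;
}
```

If the check fails (for example after a driver update), creating the VkPipelineCache without initial data and saving a fresh blob at the next onPause is a reasonable fallback.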
Development Tip - Managing VkPipeline
Let’s think about very simple renderer logic.

Renderer step and the matching OpenGL ES calls:
• Initialize
• setShader: glUseProgram
• setRenderState: glEnable, glDisable, gl...
• setTexture: glBindTexture
• draw: glDraw… (Vulkan: vkCmdDraw…)

Making a pipeline in Vulkan needs a lot of information:
VkGraphicsPipelineCreateInfo
• VkPipelineVertexInputStateCreateInfo
• VkPipelineInputAssemblyStateCreateInfo
• VkPipelineRasterizationStateCreateInfo
• VkPipelineColorBlendStateCreateInfo
• VkPipelineDepthStencilStateCreateInfo
• VkPipelineViewportStateCreateInfo
• VkPipelineMultisampleStateCreateInfo
• VkDynamicState
• VkPipelineDynamicStateCreateInfo
• VkPipelineShaderStageCreateInfo
• …
vkCreateGraphicsPipelines
Development Tip - Managing VkPipeline
In the worst case, the render state and vertex attributes can change on every draw call, so an efficiently designed pipeline management structure is very important for performance.

• setShader: VertexShader, FragmentShader
• setRenderState (VkPipelineDepthStencilStateCreateInfo, …): RenderState #0 (depth enable, …), RenderState #1 (depth disable, …)
• setTexture: ignore this block in the current case
• draw (VkPipelineVertexInputStateCreateInfo, …): VertexAttribute #0 (stride, location, binding), VertexAttribute #1 (stride, location, binding)

Build a structure around vkCreateGraphicsPipelines so that each VkPipeline is reused:
VkPipeline #0 VkPipeline #1
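One possible shape for such a structure is a lookup table keyed by the state combination. The following is a minimal sketch, not the structure used in our engine: PipelineKey, PipelineKeyHash, and PipelineRegistry are hypothetical names, PipelineHandle stands in for VkPipeline, and the create callback stands in for the code that fills VkGraphicsPipelineCreateInfo and calls vkCreateGraphicsPipelines.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

using PipelineHandle = uint64_t;  // stand-in for VkPipeline

// Identifies one combination of shader, render state, and vertex attributes.
struct PipelineKey {
    uint32_t shaderId;
    uint32_t renderStateId;
    uint32_t vertexAttributeId;

    bool operator==(const PipelineKey& other) const {
        return shaderId == other.shaderId &&
               renderStateId == other.renderStateId &&
               vertexAttributeId == other.vertexAttributeId;
    }
};

struct PipelineKeyHash {
    std::size_t operator()(const PipelineKey& key) const {
        std::size_t h = std::hash<uint32_t>{}(key.shaderId);
        h = h * 31 + std::hash<uint32_t>{}(key.renderStateId);
        h = h * 31 + std::hash<uint32_t>{}(key.vertexAttributeId);
        return h;
    }
};

class PipelineRegistry {
public:
    using CreateFn = std::function<PipelineHandle(const PipelineKey&)>;

    explicit PipelineRegistry(CreateFn create) : create_(std::move(create)) {}

    // Returns the pipeline for this key, creating it only on first use.
    PipelineHandle get(const PipelineKey& key) {
        auto found = pipelines_.find(key);
        if (found != pipelines_.end()) {
            return found->second;
        }
        PipelineHandle created = create_(key);
        pipelines_.emplace(key, created);
        return created;
    }

    std::size_t size() const { return pipelines_.size(); }

private:
    CreateFn create_;
    std::unordered_map<PipelineKey, PipelineHandle, PipelineKeyHash> pipelines_;
};
```

At draw time the renderer builds a key from the current setShader / setRenderState / vertex attribute selection and calls get(); the slow creation path runs only for combinations that have not been seen before.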
Development Tip - Managing VkRenderpass, VkFramebuffer
VkRenderPass & VkFramebuffer are also worth reusing.

VkRenderPass
VkRenderPassCreateInfo {
    …
    uint32_t attachmentCount;
    const VkAttachmentDescription* pAttachments;
    …
}
VkAttachmentDescription {
    …
    VkAttachmentLoadOp loadOp;    // VK_ATTACHMENT_LOAD_OP_LOAD, VK_ATTACHMENT_LOAD_OP_CLEAR, or VK_ATTACHMENT_LOAD_OP_DONT_CARE
    VkAttachmentStoreOp storeOp;
    …
} VkAttachmentDescription;
vkCreateRenderPass → VkRenderPass #0, VkRenderPass #1

VkFramebuffer
VkFramebufferCreateInfo {
    …
    VkRenderPass renderPass;
    …
}
vkCreateFramebuffer → VkFramebuffer #0, VkFramebuffer #1
Development Tip - Clear framebuffer cost
There are 3 ways to clear a framebuffer (color, depth, stencil):
• Renderpass load operation
• vkCmdClearAttachments
• vkCmdClearColorImage / vkCmdClearDepthStencilImage
It’s important to use the proper clear approach so that no additional clear cost is wasted (e.g. clear all, color only, depth only).
Test case: 1 clear color & 30 clear depth
• Renderpass begin/end using LoadOpClear: 24 FPS
• vkCmdClearAttachments: 57 FPS
We recommend not clearing the framebuffer with an empty Renderpass begin()/end() pair that contains no actual draw calls.
Development Tip - Clear framebuffer cost
Only Renderpass begin/end

            Clear All      Clear Depth    Clear Depth
Color       LoadOpClear    LoadOpLoad     LoadOpLoad
Depth       LoadOpClear    LoadOpClear    LoadOpClear
Stencil     LoadOpClear    LoadOpLoad     LoadOpLoad

Renderpass begin/end + APIs (Faster!)

            Clear All      Clear Depth              Clear Depth
Color       LoadOpClear    -                        -
Depth       LoadOpClear    vkCmdClearAttachments    vkCmdClearAttachments
Stencil     LoadOpClear    -                        -
Development Tip - Clear framebuffer cost
Very simple example only for description.

Request clear:
1. Is inside Renderpass?
   • true: vkCmdClearAttachments()
   • false: Set variable VK_ATTACHMENT_LOAD_OP_CLEAR
2. Next event

Request drawPrimitive:
1. Is inside Renderpass?
   • true: DrawPrimitive()
   • false: Get variable VK_ATTACHMENT_LOAD_OP_CLEAR, then find the proper Renderpass
     • found (true): vkCmdBeginRenderPass(), then DrawPrimitive()
     • not found (false): CreateRenderpass() (need to store it!), then vkCmdBeginRenderPass() and DrawPrimitive()
2. Next event
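The same decision logic can be sketched on the CPU side as a small state machine. This is only an illustration under stated assumptions: ClearAwareRecorder is a hypothetical class, the recorded strings stand in for the real Vulkan commands, the render pass lookup and creation step is reduced to choosing a load op, and using LOAD_OP_LOAD when no clear is pending is our assumption for the sketch.

```cpp
#include <string>
#include <vector>

// Records which commands would be issued for a sequence of clear/draw requests.
class ClearAwareRecorder {
public:
    void requestClear() {
        if (insideRenderPass_) {
            // Already inside a render pass: clear in place.
            commands_.push_back("vkCmdClearAttachments");
        } else {
            // Not inside a render pass yet: remember the clear for the next begin.
            pendingClear_ = true;
        }
    }

    void requestDraw() {
        if (!insideRenderPass_) {
            // A real renderer would find (or create and store) a render pass
            // whose attachment loadOp matches the pending clear here.
            commands_.push_back(pendingClear_ ? "vkCmdBeginRenderPass(LOAD_OP_CLEAR)"
                                              : "vkCmdBeginRenderPass(LOAD_OP_LOAD)");
            pendingClear_ = false;
            insideRenderPass_ = true;
        }
        commands_.push_back("vkCmdDraw");
    }

    void endRenderPass() {
        if (insideRenderPass_) {
            commands_.push_back("vkCmdEndRenderPass");
            insideRenderPass_ = false;
        }
    }

    const std::vector<std::string>& commands() const { return commands_; }

private:
    bool insideRenderPass_ = false;
    bool pendingClear_ = false;
    std::vector<std::string> commands_;
};
```

A clear requested before the first draw is folded into the render pass load operation, while a clear requested in the middle of a render pass becomes a vkCmdClearAttachments call instead of an extra begin/end pair.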
Development Tip - PushConstant
Push constants help to increase performance (the effect is GPU dependent). They are very easy to use, but you should check the device limit: VkPhysicalDeviceLimits::maxPushConstantsSize
VkPipelineLayout
// VertexShader
…
layout(push_constant) uniform buf1 {
    mat4 _unif00;
} pc; // the instance name (pc) cannot be skipped if the uniform block is push_constant.
void main() {
    gl_Position = pc._unif00 * _in_vertex;
}
vkCmdPushConstants(commandBuffer, layout, stageFlags, offset, MVPMatrix.size(), MVPMatrix.data());
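The device limit check can be done once, when the push constant range is defined. The following is a minimal sketch with a hypothetical helper name; it mirrors the rules the specification places on vkCmdPushConstants (offset and size in bytes, both multiples of 4, size greater than 0, range inside maxPushConstantsSize). Note that the size is given in bytes, so a single mat4 takes 64 bytes, and the specification only guarantees a limit of at least 128 bytes.

```cpp
#include <cstdint>

// Returns true when a push constant update of `size` bytes at `offset` is valid
// for a device whose VkPhysicalDeviceLimits::maxPushConstantsSize is `maxSize`.
bool pushConstantRangeFits(uint32_t offset, uint32_t size, uint32_t maxSize) {
    if (size == 0) return false;                      // size must be greater than 0
    if (offset % 4 != 0 || size % 4 != 0) return false;  // both must be multiples of 4
    if (offset >= maxSize) return false;              // offset must be below the limit
    return size <= maxSize - offset;                  // range must end inside the limit
}
```

For example, two mat4 values (offsets 0 and 64, 64 bytes each) exactly fill the guaranteed 128 bytes; a third one would need a uniform buffer or a device with a larger limit.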
Development Tip - Swizzle
[Screenshots: "Error" (actual result) and "Expected"]
The map texture format was ETC1 (VK_FORMAT_ETC2_R8G8B8_UNORM_BLOCK)
// Fragment shader
void main() {
    …
    vec4 mapColor = texture(mapSampler, texCoord);
    fragColor = mapColor.rgb * mapColor.a;
}
Development Tip - Swizzle
VkImageViewCreateInfo {
    …
    VkComponentMapping components;
    …
};
VkComponentMapping {
    VkComponentSwizzle r;
    VkComponentSwizzle g;
    VkComponentSwizzle b;
    VkComponentSwizzle a;    // VK_COMPONENT_SWIZZLE_ONE
};
Development Tip - SecondaryCommandBuffer + Multi-thread
This is simple logic for using SecondaryCommandBuffer.

Recording Phase (assume that there are no dynamic PSOs):
1. Create Secondary CommandBuffer
2. Begin
3. Bind GraphicPipeline
4. Bind DescriptorSet (UniformBuffer)
5. Bind VertexBuffer / IndexBuffer
6. Draw
7. End

Update Phase:
1. Update UniformBuffer

Execute Phase:
1. Begin Primary CommandBuffer
2. Execute Secondary CommandBuffer
3. End Primary CommandBuffer
Development Tip - SecondaryCommandBuffer + Multi-thread
This is simple logic for using SecondaryCommandBuffer.
Create & Record SecondaryCommandBuffer: each SCB is bound to its own UniformBuffer.
• SCB #0: UniformBuffer #0
• SCB #1: UniformBuffer #1
• SCB #2: UniformBuffer #2
• SCB #3: UniformBuffer #3
• SCB #4: UniformBuffer #4
• SCB #5: UniformBuffer #5
Draw phase: just update UniformBuffer & execute SecondaryCommandBuffer!
Development Tip - SecondaryCommandBuffer + Multi-thread
OpenGL ES 2.0 + Single thread
Vulkan + SecondaryCommandBuffer + Multi thread
※ CPU side matrix transform calculation.
Development Tip - SecondaryCommandBuffer + Multi-thread
With Secondary Command Buffer
• CPU Thread: Update buffer 1/4
• CPU Thread: Update buffer 2/4
• CPU Thread: Update buffer 3/4
• CPU Thread: Update buffer 4/4
• Queue: Execute SecondaryCommandBuffer
Development Tip - Multi-thread
Command Buffer
• CPU Thread: cmd cmd cmd
• CPU Thread: cmd cmd cmd (submit!)
• CPU Thread: cmd cmd cmd
• CPU Thread: cmd cmd cmd
• Queue: cmd cmd cmd (the submitted command buffer)
VkCommandPool and VkDescriptorPool access must be synchronized, or each thread should own and handle its own independent pools.
Development Tip - Reducing duplicated API calls
It is important to call each bind/set function only once per VkCommandBuffer for a given value, to prevent duplicated vkCmdSetXXX and vkCmdBindXXX calls with the same value / parameter.
Worst case
※ In our test case, 500 calls to vkCmdSetViewport and vkCmdSetScissor took 1.412 ms.
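A small filter in front of the command recording is one way to drop such duplicates. The following is a minimal sketch, not our engine code: ViewportState stands in for VkViewport, ViewportFilter is a hypothetical name, and the same pattern would apply to scissor, pipeline, and descriptor set binds.

```cpp
// Stand-in for VkViewport.
struct ViewportState {
    float x, y, width, height, minDepth, maxDepth;
};

inline bool sameViewport(const ViewportState& a, const ViewportState& b) {
    return a.x == b.x && a.y == b.y && a.width == b.width &&
           a.height == b.height && a.minDepth == b.minDepth &&
           a.maxDepth == b.maxDepth;
}

// Tracks the last viewport recorded into the current command buffer.
class ViewportFilter {
public:
    // Call when a new command buffer begins: its state starts out unset.
    void reset() { hasLast_ = false; }

    // Returns true when the caller should actually record vkCmdSetViewport.
    bool needsRecord(const ViewportState& viewport) {
        if (hasLast_ && sameViewport(last_, viewport)) {
            return false;  // same value already set in this command buffer
        }
        last_ = viewport;
        hasLast_ = true;
        return true;
    }

private:
    ViewportState last_{};
    bool hasLast_ = false;
};
```

With this filter, 500 requests for the same viewport inside one command buffer result in a single recorded call; reset() must be called for every new command buffer so that the first request is always recorded.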
Thank you