Faculty of Science and Technology

Department of Computer Science

A closer look at problems related to the next-gen Vulkan API

Håvard Mathisen

INF-3981 Master’s Thesis in Computer Science June 2017

Abstract

Vulkan is a significantly lower-level graphics API than OpenGL and requires more effort from application developers for memory management, synchronization, and other low-level tasks that are specific to this API. The API is closer to the hardware and offers features that are not exposed in older APIs. For this thesis we will extend an existing game engine with a Vulkan back-end. This allows us to evaluate the API and compare it with OpenGL. We find ways to efficiently solve some challenges encountered when using Vulkan.

Contents

1 Introduction
  1.1 Goals

2 Background
  2.1 GPU Architecture
  2.2 GPU Drivers
  2.3 Graphics APIs
    2.3.1 What is Vulkan
    2.3.2 Why Vulkan

3 Vulkan Overview
  3.1 Vulkan Architecture
  3.2 Vulkan Execution Model
  3.3 Vulkan Tools

4 Vulkan Objects
  4.1 Instances, Physical Devices, Devices
    4.1.1 Lost Device
  4.2 Command buffers
  4.3 Queues
  4.4 Memory Management
    4.4.1 Memory Heaps
    4.4.2 Memory Types
    4.4.3 Host visible memory
    4.4.4 Memory Alignment, Aliasing and Allocation Limitations
  4.5 Synchronization
    4.5.1 Execution dependencies
    4.5.2 Memory dependencies
    4.5.3 Image Layout Transitions
    4.5.4 Queue Family Ownership Transfers
  4.6 Render Pass
  4.7 Shaders
  4.8 Pipeline State Objects
  4.9 Resource Descriptors

5 A Vulkan Game Engine
  5.1 Engine Overview
  5.2 Previous work on DirectX 12
  5.3 Designing a Vulkan Graphics Engine
  5.4 Command Buffers
    5.4.1 Multi-threading
  5.5 Memory Management
  5.6 Synchronization
  5.7 Vulkan API
  5.8 Debug Markers

6 Results
  6.1 Vulkan vs OpenGL
  6.2 Higher Graphics Settings
  6.3 Async Compute

7 Discussion
  7.1 Vulkan vs OpenGL
  7.2 Command buffers and multi-threading
  7.3 Queues
  7.4 Memory Management
  7.5 Synchronization
  7.6 The Vulkan API
  7.7 Validation layers

8 Conclusion

9 Figures

1 Introduction

GPUs have evolved significantly since their early history to meet the demand for better graphics and smoother frame-rates. New hardware features have been exposed to developers by extending graphics APIs like OpenGL. OpenGL had its initial release in 1992 and was designed for graphics hardware that was significantly different from the modern GPU. It has problems adapting to modern multi-core processors, new GPU architectures, and applications that require more efficient and predictable performance. One example of such an application is Gear VR, which combines mobile graphics with VR. VR requires low latency, high performance and predictable performance, while mobile graphics requires high efficiency, low power usage and support for tile-based GPU architectures. Even though OpenGL has aged quite well through added extensions and new versions, it was time for a ground-up redesign. Vulkan is an API that is designed to meet the demands of modern graphics applications. Not only does the API expose new graphics hardware features, it is also designed to be used efficiently from modern multi-core processors. Some advantages of Vulkan are:

• Designed to allow for more efficient use of CPU and GPU resources. The added efficiency comes from a closer mapping of the API to the hardware.

• It is a lower-level¹ API that gives more control to developers.

• Thinner drivers with less overhead and latency, which should remove some of the micro-stuttering that older drivers had. Drivers should not have to do any run-time shader recompilations.

• Intended to scale to multiple threads.

• Exposes new hardware queues for compute and DMA.

A good motivation for learning about Vulkan is gaining a better understanding of how GPU drivers work. In this project we will take a closer look at Vulkan.

¹ Note that a lower-level API is not the same as a low-level API.

1.1 Goals

The main goal of this thesis is to explore how we can design a graphics engine to better utilize the underlying hardware by using the Vulkan API. Central questions are:

• How can we multi-thread the tasks involved in generating commands for the GPU?

• How can we utilize new hardware features the API exposes, like the new queues for doing compute and DMA concurrently with the graphics engine?

• What is the best way to do memory management?

• How can we best manage and synchronize resources?

• How does Vulkan compare to OpenGL?

• How do we make useful abstractions that minimize the complexity of the API?

To answer these questions we will design and implement a Vulkan graphics engine for an existing game engine written by the author.

2 Background

In this chapter, we explain modern GPU architectures and drivers before introducing the Vulkan graphics API.

2.1 GPU Architecture

Modern GPU architectures come in multiple forms. We have both GPUs integrated on a SOC and standalone discrete graphics cards; we have GPUs for mobile and GPUs for desktop. Different GPUs have different technical capabilities. Integrated GPUs all have a unified memory architecture (UMA), meaning that the CPU and the GPU share memory, while discrete GPUs have dedicated memory in addition to sharing system memory with the CPU. Modern GPUs also share virtual memory space with the CPU. Desktop GPUs usually use a feed-forward rasterizing architecture while mobile GPUs use a tiled rendering architecture. Tiled rendering defers rasterization by storing the geometry data of the scene in a screen-space tiled cache that is later used to render the scene one tile at a time. By using this technique we can move the framebuffer out of main memory and into high-speed on-chip memory, which can reduce memory bandwidth usage [16]. Even discrete desktop GPUs are starting to use similar techniques to reduce memory bandwidth²,³. Modern GPUs have features not exposed in the OpenGL API. They can have multiple compute engines that can execute compute workloads asynchronously to the graphics engine. They can also execute memory copies using the DMA engine asynchronously to the other engines.

2.2 GPU Drivers

GPU drivers work by packing commands for the GPU into command buffers. There are two components in a graphics driver: a user space library and a kernel module. Commands like Draw*() or Dispatch() are not executed immediately on the GPU when the function is called, but rather staged for later execution in a command buffer in the user space driver. When the command buffer has filled up with enough commands, they are optimized and sent to the kernel. The kernel ensures that the commands are valid and don't access memory not belonging to the application before staging the

² http://www.realworldtech.com/tile-based-rasterization-nvidia-gpus/
³ http://www.anandtech.com/show/11002/the-amd-vega-gpu-architecture-teaser/

commands for execution on the GPU. When the GPU runs out of commands in the command buffer it is currently executing, an interrupt is sent to the OS requesting a new command buffer to execute. The GPU front-end can fetch its own commands from command buffers in system memory through DMA operations and executes commands at its own pace. Figure 1 shows an overview of GPU drivers.

Figure 1: OpenGL Driver

As an optimization, some drivers allow the optimization step of the command buffers to be done in a separate driver thread as shown in Figure 2. This makes draw and dispatch calls really fast in the application but comes at the cost of additional latency. To take full advantage of this technique it might also be necessary for the application to triple buffer per-frame resources, as opposed to the traditional double buffering: one buffer is used by the application, one by the driver thread and one by the GPU. This comes at the cost of extra memory usage. Marchesin [14] has an extensive but unfinished introduction to graphics drivers. There are multiple sources on Approaching Zero Driver Overhead (AZDO) techniques that shed light on the problems of the traditional graphics driver architecture and how to circumvent them [4] [5] [10].

Figure 2: Multi-threaded OpenGL Driver

AZDO is a collection of multiple different GPU techniques to remove driver overhead. The most predominant AZDO techniques are about moving the logic used to select which resources should be used by a shader from the CPU to the shader itself. It is often recommended to start with the AZDO techniques when learning about GPU drivers in-depth. McDonald [15] has a presentation about driver models and how to avoid sync points.

2.3 Graphics APIs

Modern graphics is based around a pipeline that specifies some fixed function stages and some programmable shader stages. Fixed function stages consist of steps like vertex fetching, rasterization, fragment operations and tessellation primitive generation. Programmable stages are the vertex shader, fragment shader, geometry shader, and tessellation control and evaluation shaders. The pipeline begins by fetching an index buffer used to look up a vertex buffer. Vertices are processed by vertex shaders and some optional stages before being assembled into polygons. Polygons are rasterized into fragments that are processed by the fragment shaders. The shaded fragments go through some fixed function stages like blending before being written to the framebuffer. Graphics APIs have been adapted to the modern graphics pipeline by adding extensions. OpenGL got extensions for replacing fixed function

glBegin(GL_TRIANGLES);
glColor3f(1.0, 0.0, 0.0); glVertex3f(-1.0, -1.0, 0.0);
glColor3f(0.0, 1.0, 0.0); glVertex3f( 0.0,  1.0, 0.0);
glColor3f(0.0, 0.0, 1.0); glVertex3f( 1.0, -1.0, 0.0);
glEnd();

Figure 3: Ancient OpenGL

shading stages with programmable shader stages. Vertex Array Objects and Vertex Buffer Objects are used to specify buffers for holding vertex data instead of specifying vertices with the immediate mode shown in Figure 3. Separate from the graphics pipeline is the inclusion of general purpose compute shaders, similar to CUDA and OpenCL kernels. What most of these changes are doing is making the graphics APIs more like general purpose compute APIs. The performance of GPUs has evolved roughly according to Moore's law. The same cannot be said about single-threaded CPU performance, which has stagnated in recent years. The result is a situation where single-threaded OpenGL applications are not fast enough to generate work for modern GPUs. This has prompted a handful of responses. AZDO techniques aim at reducing the driver overhead associated with generating work for the GPU by providing extensions to OpenGL that reduce the amount of work the driver has to do. Among these techniques is also the possibility to let the GPU generate its own commands through indirect draw commands and kernel launches. Another approach is a ground-up redesign of the API to be more efficient and allow applications to be more easily multi-threaded. This has resulted in next-gen APIs like Mantle, DirectX 12, Metal and Vulkan.

2.3.1 What is Vulkan

Vulkan is an API for programming graphics and compute hardware. It allows the programmer to specify commands for execution of graphics shaders and compute kernels on a GPU. Vulkan is based on a series of modern techniques and design decisions for making a more efficient API. In Vulkan, commands are recorded into command buffers for later submission to queues that a device consumes commands from. This forces the developer to be more aware of the asynchronous nature of modern GPUs; older APIs can give the illusion that a graphics command executes immediately on the GPU. The API is defined in the Vulkan Specification [8] released by the Khronos Group. LunarG provides an SDK⁴ for developers to get started using Vulkan

⁴ https://www.lunarg.com/vulkan-sdk/

and several IHVs implement the Vulkan API⁵.

2.3.2 Why Vulkan

Vulkan is the next-gen API that is supported on the most platforms. IHVs are free to support Vulkan on any OS, including Linux and any Windows version. This is contrary to DirectX 12, which is only supported on Windows 10, and Metal, which is only available on Apple platforms. The API is also the same across mobile and desktop, making it easier to port applications and increasing the number of developers familiar with the API. Mobile IHVs tend to promote Vulkan quite heavily due to the decreased battery usage gained from having a lower-overhead API. The API has also been proven to work with games like The Talos Principle, Dota 2 and DOOM.

⁵ https://www.khronos.org/vulkan/

3 Vulkan Overview

Vulkan is an API that consists of a set of structures and functions that are used to program a GPU. This chapter describes the architecture and the execution model used by those structures and functions. There is also a section describing the ecosystem and the tools that should be used when developing Vulkan applications.

3.1 Vulkan Architecture

The architecture of Vulkan is meant to correspond better to modern GPU architectures and give developers more control. Command buffers and queues give the application developer more control over the asynchronous nature of GPUs and also allow applications to take advantage of multi-core CPUs. The memory architecture of the API is designed with both discrete and integrated GPUs in mind and gives the application developer more control over what type of memory is used for what kind of resources. All the state associated with the traditional graphics pipeline is grouped together into a single pipeline state object to allow the driver to better optimize the operation of the GPU. Shaders access resources through descriptor sets, which give the developer more control over the indirection and layout of the structures used to look up resources. Synchronization is explicitly managed by the application developer, which keeps the driver from having to inspect the commands for what kind of resources they touch. Vulkan is also one of the few APIs that is explicitly designed with tile-based GPUs in mind, with the addition of render passes.

3.2 Vulkan Execution Model

Vulkan makes few implicit guarantees about the execution model, to allow the drivers to better optimize for performance. Command buffers submitted to a queue execute asynchronously to the CPU threads. Different queues can also execute commands asynchronously to each other. Even the different commands within a command buffer may overlap or execute out of order to better utilize the execution units in the GPU. It is the responsibility of the application developer to insert synchronization primitives to ensure correct execution of the commands.

3.3 Vulkan Tools

Vulkan drivers do not do any validation of the draw calls (except at the OS level). This is one of the primary reasons why the Vulkan API is so much more efficient than OpenGL. Every OpenGL call has to go through a series of state checking and validation code that checks for undefined behavior. This adds overhead to the draw calls. It is possible to limit this through extensions like GL_KHR_no_error, but support for this extension is limited and it does not make a significant difference on most implementations. We still need to validate our Vulkan applications even if the drivers themselves don't provide support for this. This is where the validation layers come in. The validation layers use callbacks to the application to notify developers when they make mistakes. They don't have 100% coverage yet, but do cover the most common mistakes. The validation layers come as part of the LunarG Vulkan SDK and are injected between the application and the driver library through the Vulkan loader. The Vulkan loader is a driver-independent loader library that allows multiple Vulkan drivers to coexist on the same system. There is a broad ecosystem of other Vulkan-associated tools. Most important when learning about the API are the numerous tutorials and examples that are online. RenderDoc is a graphics debugger that can be used to debug Vulkan applications. One of the most significant parts of GPU programming is writing shaders. Shaders in Vulkan are defined with the SPIR-V intermediate representation. There are numerous open tools for SPIR-V. Most important are the compilers glslang and shaderc. SPIR-V can be translated to other representations through projects like SPIRV-LLVM and SPIRV-Cross. SPIRV-Cross also supports shader introspection. The SPIRV-Tools project has tools like assemblers, disassemblers, optimizers and validators.
Any IHV needs to have its Vulkan implementation tested by the Vulkan conformance test suite (CTS) before it may publicly use the Vulkan trademark to promote its products. The Vulkan CTS is an open source test suite that anyone can contribute tests to. This improves the quality of Vulkan drivers, which is important since one of the most criticized aspects of OpenGL was buggy drivers.

4 Vulkan Objects

One of the fundamental differences between Vulkan and OpenGL is that Vulkan has an object-based design as opposed to OpenGL's global-state-based approach. An object-based design has some inherent advantages over a global-state-based approach. In Vulkan the connection between a GPU and the application is represented with a VkDevice object, which is used to issue commands for that specific GPU. Multiple VkDevice objects can be used to issue commands for multiple GPUs, and we can freely choose which thread we want to issue the commands in. It is even possible to create multiple VkDevice objects for a single GPU. This can be used to give each VkDevice object some advantages similar to multiple processes on the OS, like preemption and separate memory. While it is technically possible to issue commands for multiple GPUs in OpenGL, it has quite limited support since OpenGL ties the connection for a GPU to a specific thread. One advantage of a global-state-based approach like OpenGL is that the user doesn't have to keep track of which VkDevice object to use for every call to the driver, which can add up to quite a lot of extra code. Global-state-based APIs are often recommended for small projects since the user of the API is most likely to want the standard device with the default state, and having a simple API can speed up development. In Vulkan the application is responsible for synchronizing certain objects on the CPU, for example by using locks (and memory barriers on platforms where this is required, like ARM CPU architectures). Other objects are immutable, or the implementation handles the synchronization so the application doesn't have to, even if the objects are used on multiple threads.

4.1 Instances, Physical Devices, Devices

An object called VkInstance is used as the base for all Vulkan operations since the API does not have any global state. VkInstance represents a connection from the application to the API itself, and the Vulkan function pointers are fetched through the VkInstance object. Contrary to OpenGL, the Vulkan API is not exposed in some driver-specific dynamic library but through a common loader library that is shared by all drivers on the system. The Vulkan Loader allows multiple Vulkan implementations to be present on the same system and also allows layers to be injected between the application and the driver. The VkInstance object is used to enumerate resources on the system, like VkPhysicalDevice objects which represent GPUs, or VkDisplay objects

Figure 4: Outline of Vulkan objects

which represent monitors connected to the system. A VkSurface object is also created to allow the application to output rendered images for the user to view. The VkSurface is a connection to the desktop compositor or a physical display. When the application has found an appropriate VkPhysicalDevice, a VkDevice ("Logical Device" or simply "Device") object can be created from that physical device. The VkDevice object serves as the base for communication between the application and the GPU, and all rendering happens through that object. Most Vulkan objects depend either explicitly or implicitly on the VkDevice object, except for objects like VkInstance, VkPhysicalDevice and VkSurface. See Figure 4 for a rough outline of how these objects depend on each other.

4.1.1 Lost Device

Under certain circumstances a VkDevice may "become lost". This is usually caused by infinite loops in shaders that require a GPU reset, but can also be caused by hardware errors, driver errors, execution timeouts, power management events, platform-specific reasons or even errors in other processes on the system. It is not possible to use the VkDevice object any further when a device is lost. It is however possible to try to recover from a lost device by creating a new VkDevice object and restarting the GPU work on the new device. All data and memory on the device is however lost and needs to be uploaded again or recomputed on the GPU.

4.2 Command buffers

In Vulkan, commands for the GPU like vkCmdDispatch or vkCmdDraw* are recorded into a special object called VkCommandBuffer. This fits much more closely with how GPU drivers are structured and allows the application to choose when the commands are submitted to the GPU. Multiple command buffers can also be recorded concurrently on different threads to distribute the work of recording the commands to multiple cores. An application can also record a command buffer once and use it multiple times to avoid the overhead associated with recording. This is however hard to do, since when a rendered scene changes, the structure of the command buffer also tends to change. Reusing command buffers can be useful for post-processing or similar passes that usually do the same work each frame.

4.3 Queues

Command buffers can be submitted to VkQueue objects after they have been recorded. The commands in the command buffers are only executed on the GPU after they have been submitted to a queue. Queues execute commands asynchronously to the host CPU, and the driver is free to return from the vkQueueSubmit call before any work is done on the GPU. It is possible to create more than one VkQueue object for each device depending on hardware and driver support. Some hardware has support for separate queues for doing compute work or memory transfers in addition to the graphics queue. The different queue capabilities are exposed in queue families that are used to create VkQueue objects. Separate queue families usually correspond to hardware that can execute commands asynchronously to each other, like asynchronous compute engines (ACE) or DMA engines. It might also be possible to create multiple queues from a single queue family. Those types of VkQueue objects might be implemented with software multiplexing, preemption or, in other cases, separate hardware engines. Queues in the same family are compatible with one another and may share work and resources freely. It is also possible to assign a priority to queues when creating the VkQueue objects, to indicate that some queues should be assigned more processing time than others.

4.4 Memory Management

Memory is explicitly managed, except for a few pool objects used to manage memory for specific uses. This means that users of the API have to implement their own memory allocation strategy and explicitly map memory to objects. In addition to the problems inherent in implementing a general memory allocation strategy, the developer has to be aware of lots of additional restrictions that are specific to either Vulkan in general or a specific type of device that may be present.

4.4.1 Memory Heaps

Most discrete graphics cards use a non-unified memory architecture (NUMA)⁶ where the graphics card has access to on-board dedicated graphics memory in addition to the normal system memory shared by the rest of the system.

⁶ The terminology for non-unified memory architecture (NUMA) found in graphics cards should not be confused with the terminology for non-uniform memory access (NUMA) typically found in clusters.

This is in contrast to integrated and mobile GPUs, which have a unified memory architecture (UMA). UMA is where the GPU only uses system memory shared by the rest of the components in the system. Vulkan uses multiple memory heaps to allow the application to specify what memory to use. There is at least one heap that is "Device Local", and this might be the only heap, for example on UMA systems. NUMA systems also have access to a "Host Local" heap that refers to system memory, in addition to the on-board "Device Local" heap. On those systems the "Device Local" heap is usually not accessible to the host CPU. Some AMD graphics cards have two "Device Local" heaps, where one heap refers to graphics memory that the host CPU can access.

4.4.2 Memory Types

In addition to having multiple memory heaps to allocate from, an allocation of memory can have certain properties that change how the memory is cached on the CPU. Combinations of different memory properties and heaps are grouped into memory types that are used when allocating memory. The memory types determine whether memory is "Device Local" or "Host Local" and whether the memory can be accessed by the host CPU. If the memory is accessible on the host CPU, it may additionally be coherent with the GPU and cached on the CPU. If the memory is not coherent, the application has to explicitly flush the memory after it is written on the CPU and invalidate the memory before reading back results generated on the GPU. Whether memory is cached does not impose restrictions on the application, but can cause huge performance differences. Tile-based GPU architectures (particularly Mali GPUs) can also have a special memory type called "Lazily Allocated Memory". Lazily Allocated Memory can be used for transient data that may fit entirely in the tile cache and so may or may not need a physical allocation from the heap.

4.4.3 Host visible memory

There are two primary use-cases for host visible memory. The first is to upload data to the GPU and the second is to download data from the GPU. A coherent memory type is often recommended for uploads to the GPU, while downloading data from the GPU should be done with cached memory [1]. Coherent uncached memory is usually implemented with write combining (WC) techniques; this makes it important that memory writes are aligned and write enough memory to fill a WC burst [9]. Cached memory is required for prefetching to the CPU caches, and this speeds up readbacks

on the CPU. Cached memory is sometimes non-coherent. Non-coherent memory requires the developers to call vkFlushMappedMemoryRanges after any writes and vkInvalidateMappedMemoryRanges before any reads.

4.4.4 Memory Alignment, Aliasing and Allocation Limitations

There is a series of alignment requirements that place extra constraints on how sub-allocation can be implemented. Every buffer and image object created has a specific alignment requirement for the memory that can be bound to that object. This value can only be queried after the object has been created, and there are few guarantees except that it must be a power of two and that similar buffers have similar alignment requirements. The alignment is required because of hardware-specific reasons like texture fetching and filtering, caches and other optimizations. The values can be quite arbitrary (especially for image objects, which have multiple formats and layouts that require different alignments) and vary significantly across implementations. When making sub-allocations within a buffer, the application also has to make sure any uniform buffer, storage buffer or texel buffer adheres to the limits for minimum buffer offset alignment for that type of buffer. When buffer or linear image resources are bound to memory within bufferImageGranularity of optimal-tiling image resources, they are said to alias. Any resources are also aliasing if they are bound on top of overlapping memory. Resources that alias have several additional requirements that must be met; in particular, they cannot be used at the same time, and additional memory barriers are required. To avoid aliasing it is common to use separate sub-allocators for linear and optimal resources that allocate from different base allocations. Aliasing can also be useful to reduce the required memory usage. Not only is it slow to create base allocations to sub-allocate from, but there is also a limit, maxMemoryAllocationCount, on how many base allocations an application can make. This limit can be quite small, for example 4096 on all Windows platforms.

4.5 Synchronization

It is essential to insert the right synchronization primitives to ensure that a Vulkan application operates in a valid and well-defined manner across all implementations. There are few implicit synchronization guarantees, and developers are expected to insert explicit synchronization primitives like fences, semaphores, events, pipeline barriers and render passes.

4.5.1 Execution dependencies

Fences are used to make host CPU execution depend on the completion of command buffers on the GPU. This is mainly used to keep the CPU from generating new commands before the GPU has completed execution of previously submitted command buffers. Fences have to be used to keep the CPU from writing to memory the GPU is using. Double-buffering or n-buffering are common techniques to ensure that both the GPU and CPU can work concurrently, and each buffer would then be associated with a separate fence. Fences are a coarse-grained synchronization primitive and it is recommended to use as few as possible. A rule of thumb is to use fewer than five fences per frame; most applications will usually only need one fence per frame. Semaphores are used to ensure ordering of command buffers submitted to a single queue or command buffers submitted to different queues. The most common use of semaphores is to ensure that the GPU is finished rendering a scene before it is displayed on the screen. It is also necessary to insert semaphores when using multiple queues. Semaphores are also heavyweight synchronization primitives, so it is important to limit their usage to ensure the best performance. Pipeline barriers and events are used to ensure ordering of commands within a command buffer. Pipeline barriers provide synchronization at a single point, while events have separate signal and wait commands. Separate signal and wait commands are useful for ensuring that those commands can be overlapped with other independent commands, so the GPU is not starved for work when the synchronization takes place. Pipeline barriers can also have overlapping execution, since the dependent pipeline stages are not always immediately followed by the pipeline stages of the dependency. It is also possible to signal and wait on events on the CPU.

4.5.2 Memory dependencies

It is not enough for developers to ensure ordering of operations. Memory accesses have to be made available and visible to dependent commands. These dependencies roughly correspond to flushing and invalidating caches after they have been written or before they are read. When specifying pipeline barriers and events, the user adds lists of structs specifying the memory dependencies (both availability and visibility operations) that the application has. There are specific lists for buffers, images and global memory. Global memory barriers specify dependency operations for all memory. Buffer memory barriers are for specific buffers, or even parts of those buffers. Image memory barriers are for images, or sub-ranges of those images. Fences implicitly make all device memory accesses available. Semaphores implicitly make all device memory accesses available and visible for further commands on the queue. What this means is that when developing Vulkan applications it is necessary to have a scheme for keeping track of what resources are read and written by specific commands. This is used to solve data hazards like write-after-write (WAW) and read-after-write (RAW). Write-after-read (WAR) hazards can be solved with just execution dependencies.

4.5.3 Image Layout Transitions

Modern GPUs use compression techniques like delta color compression [2] for color buffers and Z-compression [17] for depth/stencil render targets to save memory bandwidth. Images also need special layouts when they are copied. Image layout transitions have to be explicitly managed by the application and are specified with image memory barriers. This means that the application has to keep track of which layout each image currently has and which layout it transitions to. Some implementations manage image layouts implicitly in the driver and don't respect the image layouts specified in Vulkan; vendors of such implementations recommend using the general layout. On AMD GPUs, image layouts are explicitly managed by the user. Differences like these are quite common, and developers should be aware that an application might not work on other implementations even if it works on one.

4.5.4 Queue Family Ownership Transfers

It is possible to create resources either for exclusive use in a single queue family or for concurrent use across queues. Resources created for a single queue can utilize more optimizations than other resources [11]. A queue family ownership transfer is needed when content stored in a resource created with the exclusive flag is going to be used on a different queue. The transfer is specified in the structures for buffer memory barriers and image memory barriers, and consists of a release operation on the queue giving up ownership and an acquire operation on the other queue. This means that the application needs to keep track of which queues are using which resources.

4.6 Render Pass

A major difference between Vulkan and other APIs (except for Metal) is that Vulkan groups rendering commands into render passes. A render pass is a set of framebuffer attachments of the same size, together with subpasses that read or write those attachments. When specifying a render pass we also specify what kind of synchronization should be inserted between the subpasses. The primary use case for render passes is to allow tile-based GPU architectures to better utilize high-speed on-chip memory, so that they don't have to write intermediate results back to device memory. Render passes can also be used by forward immediate-mode renderers for certain optimizations [18].

4.7 Shaders

Vulkan is similar to other graphics APIs when it comes to shader stages. There are two groups of shaders. Graphics shaders correspond to the usual pipeline stages: vertex, tessellation control, tessellation evaluation, geometry and fragment. The other group is compute shaders, which are mandatory in Vulkan. Shaders are specified in the SPIR-V intermediate representation (IR) format. SPIR-V can currently be generated using the Khronos-provided glslang or Google's shaderc. It is possible to generate SPIR-V from multiple shading languages, most notably GLSL and HLSL. GL_KHR_vulkan_glsl [7] is an extension that should be used when writing GLSL shaders for Vulkan. GLSL shaders written for Vulkan need to be extended to access resources from descriptor sets, as opposed to the traditional binding model used by OpenGL. There are also some new concepts like push constants, specialization constants and render pass specifics that can be utilized with Vulkan.

4.8 Pipeline State Objects

Shaders are grouped together with the state required for execution in an object called a VkPipeline. Having all of this merged into a single object allows the driver to optimize across this data when the object is created. In older APIs like OpenGL this had to be done at draw time, which could cause hitches if the driver decided that it needed to compile a shader at runtime. Pipeline state objects can be cached on disk to prevent the driver from having to recompile them each time the application is restarted. Multiple threads can also be utilized to speed up compilation.

4.9 Resource Descriptors

Shaders access resources through a data structure called a descriptor. Buffer and image views have corresponding descriptors. Multiple descriptors are grouped together into a descriptor set object before they can be bound to the pipeline. Descriptor sets can also contain high-speed variables called push constants. The layout of a descriptor set object is described using a descriptor set layout. The bindings described in descriptor set layouts have to correspond to the bindings specified in the shaders that use those descriptors. Descriptor set objects are usually placed into high-speed memory on the GPU before rendering can start. There is a limited amount of this high-speed memory, and it is possible that some descriptors spill into system memory when there are too many of them. This causes an extra indirection before shaders can access resources. Push constants can be accessed directly from the descriptor sets and can therefore be very fast on some implementations.

5 A Vulkan Game Engine

This section describes the design of a Vulkan graphics back-end for a game engine. It was designed around an existing game engine written by the author. The engine has an existing OpenGL implementation, and had a DirectX 12 back-end that was removed to reduce maintenance cost when adding the Vulkan back-end. The OpenGL implementation was based around AZDO (Approaching Zero Driver Overhead) techniques.

5.1 Engine Overview

The engine is based on tile-based light culling and has OpenGL back-ends for both deferred lighting and forward shading. Only forward shading is implemented in Vulkan, since the deferred shading path has low performance at modern screen resolutions; this thesis therefore focuses on the forward shading path. Rendering starts with a depth pre-pass where a depth buffer is generated. The depth buffer is used for screen space ambient occlusion (SSAO) and tile-based light culling. Shadow maps are generated for both directional lighting (the sun) and some omni-directional lights (point lights). All the aforementioned data is combined in a forward shading pass generating a high dynamic range (HDR) color buffer. A filter is used to resolve multi-sampling (MSAA) from the HDR buffer, with a tone-mapping function used to map the HDR values into the low dynamic range (LDR) that is presented to the user. The engine uses some external libraries; the most important for this project is the GLFW library, used for creating windows and surfaces to render to in an OS-independent manner. The Dear ImGui library is used for creating graphical user interfaces, and a back-end was created for rendering the GUI in Vulkan.

5.2 Previous work on DirectX 12

Much of the work done on the DirectX 12 implementation was around structuring the base engine to allow for multiple graphics back-ends. The DirectX 12 implementation also laid the foundation for how to structure an engine for the new lower-level graphics APIs. This allowed the Vulkan implementation to focus more on advanced techniques and optimizations like asynchronous queues and multi-threaded command buffer recording. A particularly important lesson learned from the DirectX 12 implementation was how not to do things. It is easy to over-design a new graphics engine, so the Vulkan engine was designed around a base engine that is as simple as possible but can be extended to do advanced things.

5.3 Designing a Vulkan Graphics Engine

The initial plan for the Vulkan implementation was to keep everything as simple as possible. A single command buffer was recorded on a single thread using a single fence per frame. Double buffering was used so the GPU and CPU could execute in parallel. Barriers were used for every resource that needed synchronization on the GPU and were added adjacent to the commands using those resources. One of the first challenges was coming up with a memory allocation scheme that was simple yet correct. The most time-consuming aspect of getting the early implementation up and running was augmenting all the preexisting GLSL shaders for Vulkan. Vulkan uses a different binding model than OpenGL: resources are bound to shaders using descriptor sets, and every shader has to specify from which descriptor set and at which binding each resource is accessed. Work on more advanced techniques could begin once the base implementation was stabilizing. One of the first design issues that came up was splitting the command buffer so that multiple queues could be used and work could be distributed to multiple threads. Some additional barriers were required when adding multiple queues. Multi-threading the command buffer recording also required rewriting how barriers were implemented, and the barriers were also optimized during this work. Refactoring and other optimizations were done after this. All the descriptor set layouts and pipeline layouts could be merged, which reduces the code size and is also recommended by some IHVs as an optimization [12].

5.4 Command Buffers

To allow command buffer recording to be multi-threaded, and to make use of the DMA and async compute engines, the frame was split into the following command buffers:

1. Copy
2. Depth Pass
3. Compute
4. Directional Shadows
5. Omni-Directional Shadows
6. GUI/Sky
7. Forward Pass
8. Post

The copy buffer can be executed on the graphics queue or the copy queue before the previous frame is done. The compute buffer can be executed on the async compute engine while the shadow buffers and the GUI/Sky buffer are executing on the graphics engine. Lastly, the post-process buffer can be executed on the async compute engine while the next frame is running on the graphics engine.

5.4.1 Multi-threading

Having multiple command buffers also allows them to be recorded on different threads. However, there are some issues to be aware of. Some command buffers, like the compute buffer, take a trivial amount of time to record, and creating tasks to record those on separate threads can cause more overhead than the recording itself. Recording of such command buffers is scheduled together with other buffers to amortize the cost. Other buffers, like the shadow buffers, contain lots of draw calls, take significantly longer to record, and can be split into smaller buffers with a preprocessor define. It is also important to note that the time it takes to record a command buffer does not correspond to the time it takes to execute it on the GPU. Creating too small command buffers adds overhead on the GPU; command buffers should at minimum take about 200-500 µs [19].

5.5 Memory Management

One of the advantages of the engine's design is that all allocations happen statically at initialization time and memory is only freed at shutdown. This means that there are never any hitches associated with doing base allocations and that only a linear allocation strategy has to be implemented. There are a couple of exceptions to this. Framebuffers and other resolution-dependent resources have to be reallocated when the window is resized or the user changes MSAA settings from the options. All those resources are given a dedicated base allocation to solve this problem. This is often recommended by IHVs for driver-related reasons; Nvidia even has a special extension called VK_NV_dedicated_allocation for this purpose.

There are four pools of memory that all sub-allocations happen from. The first is a pool for optimal image resources and uses device-local memory. The second is for static buffers and also uses device-local memory. The third is a host-local allocation used for per-frame resources and for uploading data to the GPU; it tries to find memory that is coherent but not cached (usually implemented as write-combining pinned memory). The fourth is for downloading data from the GPU to the CPU. It is also a host-local allocation, but tries to use cached memory, which can significantly speed up reads on the CPU. A single circular buffer is allocated to handle per-frame resources and uploads to the GPU. At the start of each frame, any allocations that were made from this buffer two frames ago are collectively freed. The only data that is currently downloaded from the GPU is timestamp data, so that system does its own allocations.

5.6 Synchronization

An important aspect of synchronization is that the application is responsible for tracking certain resource states and memory accesses. The problem with resource state tracking is that when using multiple threads to record command buffers, we lose the ability to know what state a resource is currently in, because another thread might change the state of a resource that we want to use on this thread. This is a consequence of the fact that commands might not be recorded in the same order across multiple threads as they are executed on the GPU. To overcome this we do a fast single-threaded pre-pass over the scene where all the state tracking happens and all the memory barriers are staged into temporary buffers, which are later inserted into the command buffers. An extra advantage of this approach is that we can batch memory barriers together at specific points in the frame, which can significantly reduce overhead and increase concurrent kernel execution. This solves one of the most problematic aspects of synchronization: redundant memory barriers that should be merged.

5.7 Vulkan API

Vulkan is a verbose API, and it is reasonable to find ways to remove some of this verbosity. While implementing the Vulkan back-end, some state was identified as mostly coherent between Vulkan function calls. This includes objects like VkInstance, VkPhysicalDevice, VkDevice, VkAllocationCallbacks, VkQueue, VkCommandBuffer and VkCommandPool. Functions that use those objects can be simplified by storing the currently used object in thread-local storage and providing simplified Vulkan commands. This removes a lot of unnecessary code, since almost every Vulkan command uses one of those objects.

// This Vulkan function
void vkCmdDispatchIndirect(VkCommandBuffer, VkBuffer, VkDeviceSize);
// Gets substituted with this engine function
void VkgCmdDispatchIndirect(VkBuffer, VkDeviceSize);

5.8 Debug Markers

Debug markers were added to better understand how the engine performs. Markers are manually pushed and popped when the engine does different tasks, and there is support for them on different threads. The markers were also extended with queries to measure how much time the tasks take on the GPU, as described by Lux [13] and Fuentes [6]. For Vulkan there is also support for markers on the compute queue in addition to the graphics queue. An ImGui widget was added to visualize how all the tasks relate to each other, as shown in Figure 6. This view was also extended to show what kind of barriers are used in between the GPU tasks. The markers from the GPU are approximately aligned to the markers on the CPU by measuring how the timestamps on the GPU drift relative to CPU time. While this works reasonably well for most GPUs, there is always the possibility that changes in the GPU power states invalidate this relation. The reason debug markers are favored for profiling the engine is that they are implemented to capture metrics about a single frame. Traditional sample-based profilers don't understand the concept of a frame, so it is hard to differentiate what happens at a specific time in a scene from other parts, including loading the engine.

6 Results

The experiments consist of running a benchmark scene. The benchmark scene is a series of cinematic clips with different performance characteristics. When running the benchmark we measure how much time it takes to generate a frame on both the CPU and the GPU. When the benchmark is done, average, minimum and maximum frame times are reported. The benchmark also reports how much time loading the engine takes. Note that all results report time per frame measured in milliseconds, as opposed to the frames per second (fps) often used elsewhere; this means that lower values are better. All tests are run in full-screen mode at 1440p resolution.

One problem with evaluating the performance of a graphics engine is that both the CPU and GPU are potential bottlenecks. Even the refresh rate of the monitor can be considered a bottleneck, but this is purely a hardware limitation. A modern monitor operates at a refresh rate of 60-240 Hz, corresponding to frame times of 16.6-4.1 ms. The CPU and GPU time per frame each count towards this limit separately, since we are using double buffering. Ideally we want to generate frames faster than this on both the CPU and the GPU.

6.1 Vulkan vs OpenGL

This test was done using an Nvidia GTX 1080 with driver version Linux-64 375.27.15. It was run on Xubuntu 16.10 with the 4.8.0-52-generic kernel release. The CPU was an i7-3930K. This test was run without async compute.

Vulkan
Timer  avg (ms)  min (ms)  max (ms)
CPU    2.07709   1.60082   5.71299
GPU    4.24093   3.69152   5.62586

OpenGL
Timer  avg (ms)  min (ms)  max (ms)
CPU    6.43733   2.89253   10.467
GPU    5.4776    4.15437   7.25606

Loading time was 1.63 s for Vulkan and 0.92 s for OpenGL. The debug markers were used to investigate this more closely after the benchmark. It was found that allocating memory (base allocations) with Vulkan takes 0.585 seconds. This is something that does not exist in OpenGL and accounts for pretty much the whole difference between the loading times.

It is also worth noting that the engine spends more than 1 ms per frame on work like physics and updating objects on the CPU. This is not part of the graphics-API-dependent code, which means the CPU frame time cannot be reduced much further by making the graphics API usage any faster in the Vulkan back-end.

6.2 Higher Graphics Settings

This test is similar to the previous, but we use 8x MSAA.

Vulkan
Timer  avg (ms)  min (ms)  max (ms)
CPU    2.1753    1.63672   8.95798
GPU    7.96721   6.97139   9.63891

OpenGL
Timer  avg (ms)  min (ms)  max (ms)
CPU    6.33906   2.87087   10.1907
GPU    8.69585   7.00006   10.6772

6.3 Async Compute

This test was done with an AMD R7 370 graphics card using driver Crimson ReLive 17.1.1. The OS is Windows 10 (64-bit) with an AMD FX-8320 CPU. The test consists of running the benchmark scene with and without async compute enabled.

Without Async Compute
Timer  avg (ms)  min (ms)  max (ms)
CPU    11.4426   4.14576   25.3467
GPU    26.4357   19.8317   34.6815

With Async Compute
Timer  avg (ms)  min (ms)  max (ms)
CPU    11.9048   4.85172   23.7548
GPU    24.2288   17.1314   30.5724

With Async Present
Timer  avg (ms)  min (ms)  max (ms)
CPU    11.6492   4.50107   19.8327
GPU    24.3833   18.1535   29.9009

With Async Compute and Present
Timer  avg (ms)  min (ms)  max (ms)
CPU    11.2626   4.73872   20.1495
GPU    23.1377   16.0412   28.4003

The debug markers were used to investigate further. The test with just async present adds about 6-8 ms of latency, while the test with both async compute and async present adds about 2-4 ms of latency. Using async present adds latency because running the tone-mapping shader on the async compute queue causes that shader to compete for GPU resources with the graphics queue working on the next frame. The test with just async present adds more latency because scheduling the tone-mapping shader against other shading work makes both compete for the same resources. If both compute and present are scheduled on the async compute queue, they are scheduled together with the shadow maps on the graphics queue. Shadow maps tend to be taxing on the rasterizer, hardware that cannot be used from the compute queue, making that workload a good match to schedule together with compute tasks.

7 Discussion

This section discusses some design- and performance-related aspects of Vulkan. Vulkan puts more demands on a good design process than older APIs, and many design-related issues, such as multi-threading, can be quite hard to fix late in the development process. The goal of using the Vulkan API is most often increased performance. Even though the API offers good performance, some developers have not gained any significant performance from using it. It is therefore important to evaluate how the API can be used effectively in a design process.

7.1 Vulkan vs OpenGL

Vulkan does a lot better when it comes to CPU performance, getting more than a 3.2x speedup on average in the first test case. It is also worth noting that this changes the bottleneck from the CPU to the GPU. The GPU performance also improves a bit with Vulkan (about a 1.3x speedup, or 1.2 ms faster). All this gives a total relative speedup of approximately 1.5x, which is a pretty good performance improvement for a game.

The relative performance improvement diminished when rendering the scene with higher graphics settings. In this case the scene is heavily GPU bound. There is still an absolute performance improvement of 0.7 ms, but this cannot be considered a big win compared to how much effort was put into developing the Vulkan back-end. Those two scenarios give a pretty good overview of how Vulkan performs when speeding up an existing application.

Another aspect that was not tested in this thesis is how the API performs when scaling up the workload. Vulkan is much more efficient at processing draw calls than OpenGL, which means that we can have scenes with many more distinct objects. We can also use the processing power we free up for more expensive physics or particle simulations. Considering that CPU usage improves in both test cases, we also expect lower power usage in those applications, which can be an important aspect for mobile applications.

7.2 Command buffers and multi-threading

There have been no performance reasons for multi-threading command buffer recording in this engine, since recording it single-threaded is so fast. It is still a good idea to face the design issues related to multi-threading early in the development of a product, since those issues have implications for almost all Vulkan-specific code. There are still some reasons users might not want to multi-thread command buffer recording at all:

• Vulkan is such a fast API that recording command buffers is not a bottleneck

• Kicking off multiple tasks causes overhead

• When multi-threading command buffer recording, we lose the ability to easily make decisions that require knowledge of parts of the scene that are local to other threads

• Switching between command buffers causes overhead on the GPU and in the OS kernel

• It might not be possible to do optimizations across different command buffers

It is also really hard to evenly distribute the work of recording commands across multiple command buffers; one of the command buffers tends to be an order of magnitude more expensive to record than the others. This does not mean that multi-threaded command buffer recording is useless. It is particularly useful when scaling up the workload. When rendering particularly rich scenes with many unique draw calls, it is important to distribute the workload to different threads. The problem is that the existing OpenGL back-end has no way to render similar scenes; we have to rely exclusively on the next-gen APIs to use those techniques. One example use of command buffers is depth peeling [20], where the scene is rendered multiple times to solve the order-independent transparency problem. It is also possible to reuse already-recorded command buffers when doing post-processing, which can save work on the CPU. We can also start submitting command buffers before all other command buffers have been recorded on the CPU, since there can be multiple command buffers for a single pass over the scene. This can reduce latency by not keeping the GPU waiting while the CPU is recording. Since the validation layers add significant overhead to command buffer recording, it could have been useful to use multi-threading in those cases. The problem is that the validation layers have a serializing effect on command buffer recording, so recording command buffers on multiple threads takes about the same amount of time as recording them on a single thread.

7.3 Queues

Finding workloads that are good to schedule on the async compute queues is hard. We want to schedule workloads with different bottlenecks together. What kinds of workloads should be scheduled together will vary depending on GPU bottlenecks, which means that different GPUs should schedule different workloads together. To do this properly we would have to dynamically profile the different tasks that could potentially be scheduled together. This is too hard to solve in practice, and we also don't have access to any open APIs for the relevant hardware performance counters. This is still not a catastrophic issue: tasks that have different bottlenecks on one GPU tend to have different bottlenecks on many GPUs. Depth passes and shadow maps tend to have rasterization as a bottleneck and can almost always be scheduled together with compute tasks for improved throughput.

We saw reasonably good performance improvements from async compute in this project. There were speedups both when scheduling the compute command buffer (tiled light culling and SSAO) and when scheduling post-processing and presentation on the asynchronous compute queue. It should also be noted that even though doing post-processing and presentation on the async compute queue improved throughput, it also causes extra latency for the frame. Usage of the copy queue did not provide any performance differences, even though the copy queue is the fastest way to transfer data over PCIe [3]. This was expected, since the application only uses about 2-3% of the PCIe bandwidth.

7.4 Memory Management

Memory sub-allocation is not that much of a problem for smaller, and even most reasonably large, applications. A characteristic of most graphics applications is that they allocate graphics memory at initialization time and only free that memory when the application shuts down. Most dynamic objects are fitted into already existing allocations, for example circular buffers, rather than given dedicated allocations. This means that we don't really need to implement anything much more advanced than a linear allocator, which is also the fastest possible sub-allocation strategy. A disadvantage of a linear allocator is that alignment requirements can be quite large, so much memory can be wasted between allocations. The most complicated part of Vulkan memory management is managing API-specific restrictions like general and object-specific alignment requirements, caching and coherence, different memory types, aliasing, and allocation limitations.

7.5 Synchronization

Most of the synchronization needed is actually quite simple. During a frame, a write to some memory is read later; this implies a RAW hazard that has to be managed by the application. There is also a WAR hazard between frames, but this is usually implicitly handled by the fences and semaphores between frames. It is really hard to find an abstraction where end-programmers don't have to think about synchronization at all. For this engine we had to separate command buffer recording and memory barrier generation into two passes.

7.6 The Vulkan API

There are a couple of open-source wrappers for Vulkan, like Vulkan-Hpp and AMD's Anvil library. While these libraries can make Vulkan easier to use for some users, they also tend to be complicated and can make Vulkan development overwhelming. Wrappers should ideally make Vulkan easier to learn and lead to less complicated code. A good alternative to the open-source wrappers is having application developers make their own abstractions as they encounter problems that can be simplified. This thesis identified some general ways to wrap Vulkan functions, but the most effective approaches are still expected to be engine-dependent. One example of this is how the memory barriers were implemented: to merge barriers and reduce overhead, they had to be generated in an engine-specific pass over the scene and staged in temporary buffers. Another example, not completed for this project, is to store the state associated with the PSOs in a graphics-API-independent abstraction. Such abstractions can be moved out of the API-specific code and into the common engine code.

7.7 Validation layers

While the validation layers warn about incorrect usage of the API, there are no warnings for API calls that are merely very likely wrong. A prototype for a validation layer (VkLayer_sane_parameters) that checks whether some Vulkan functions are called with unlikely arguments was made during development. One example that caused problems when developing the Vulkan back-end was calling draw functions with 0 as the draw count. There are also tests for values that are so high that they are most likely uninitialized variables.

8 Conclusion

Implementing a Vulkan back-end for a game engine is a difficult task, but it can also offer significant performance improvements for certain workloads. Even if most of the improvement comes from a much more efficient API on the CPU side, there was also better performance on the GPU, even before async compute was used. Features like async compute can be used to take advantage of hardware that OpenGL applications did not have access to, giving Vulkan applications an extra edge. There are also more subtle aspects of Vulkan that can be optimized better than in OpenGL applications. Pipeline state objects give the driver more opportunities for optimization. Descriptor sets are closer to the hardware and give developers more opportunities to optimize how shaders access resources. Memory and synchronization can also be optimized more in the application than in the driver, since the application has more information about how the scene is put together, but this also requires more effort from the developers. Command buffers allow work to be recorded on multiple threads and executed on multiple asynchronous queues, giving the developer more opportunities to design for parallelism.

The Vulkan back-end developed for this thesis shed light on the design challenges encountered when using the Vulkan API. Vulkan memory management and synchronization are challenging topics that require careful planning to ensure the quality expected from modern applications. Combining this with multi-threaded command buffer recording increases the complexity even further. It is important to find good abstractions to simplify development with the Vulkan API. Even if there is room for general-purpose abstraction libraries, the most useful abstractions are still expected to be engine-specific.

References

[1] ARM® Mali™ Application Developer Best Practices, Version 1.0, Developer Guide. https://static.docs.arm.com/100019/0100/arm_mali_application_developer_best_practices_developer_guide_100019_0100_00_en2.pdf.

[2] Chris Brennan. Delta Color Compression Overview. http://gpuopen.com/dcc-overview/.

[3] Matthäus G. Chajdas. D3D12 and Vulkan: Lessons Learned. Presentation, GDC 2016, San Francisco, March 2016. http://gpuopen.com/wp-content/uploads/2016/03/d3d12_vulkan_lessons_learned.pdf.

[4] Cass Everitt, Tim Foley, John McDonald, and Graham Sellers. Approaching Zero Driver Overhead in OpenGL (Presented by NVIDIA). Presentation, GDC 2014, San Francisco, March 2014. http://gdcvault.com/play/1020791/.

[5] Cass Everitt and John McDonald. Beyond Porting - How Modern OpenGL can Radically Reduce Driver Overhead. Presentation, Steam Dev Days 2014, Seattle, January 2014.

[6] Lionel Fuentes. A real-time profiling tool. In Patrick Cozzi and Christophe Riccio, editors, OpenGL Insights, pages 503-512. CRC Press, 2012.

[7] The Khronos Group Inc. GL KHR vulkan glsl, 2016. https: //www.khronos.org/registry/vulkan/specs/misc/GL_KHR_vulkan_ glsl.txt.

[8] The Khronos Group Inc. Vulkan R 1.0.39 - a specification (with many extensions). 2017. [9] Ph.D Liang-min Wang. How to Implement a 64B PCIe* Burst Transfer on Intel R Architecture. http://www.intel. com/content/dam/www/public/us/en/documents/white-papers/ pcie-burst-transfer-paper.pdf. [10] Tristan Lorach. OpenGL NVIDIA ”Command-List”:”Approaching Zero Driver Overhead”. Presentation, SIGGRAPH 2015, Los Angeles, August 2015. http://on-demand.gputechconf.com/siggraph/2015/video/ SIG512-Tristan-Lorach.html.

33 [11] Timothy Lottes. Vulkan and DOOM. http://gpuopen.com/ vulkan-and-doom/.

[12] Timothy Lottes, Graham Sellers, and Dr. Matth¨aus G. Chaj- das. Vulkan fast paths. Presentation, GDC 2016, San Francisco, March 2016. http://gpuopen.com/wp-content/uploads/2016/03/ VulkanFastPaths.pdf.

[13] Christopher Lux. The timer query. In Patrick Cozzi and Christophe Riccio, editors, OpenGL Insights, pages 493–502. CRC Press, 2012.

[14] St´ephaneMarchesin. Linux Graphics Drivers: an Introduction. Ver- sion 3. March 2012. https://people.freedesktop.org/~marcheu/ linuxgraphicsdrivers.pdf.

[15] John McDonald. Avoiding Catastrophic Performance Loss De- tecting CPU-GPU Sync Points. Presentation, GDC 2014, San Francisco, March 2014. https://developer.nvidia. com/sites/default/files/akamai/gameworks/events/gdc14/ AvoidingCatastrophicPerformanceLoss.pdf.

[16] Bruce Merry. Performance tuning for tile-based architectures. In Patrick Cozzi and Christophe Riccio, editors, OpenGL Insights, pages 323–336. CRC Press, 2012.

[17] Emil Persson. Depth In-depth. http://developer.amd.com/ wordpress/media/2012/10/Depth_in-depth.pdf.

[18] Graham Sellers. Vulkan Renderpasses. http://gpuopen.com/ vulkan-renderpasses/.

[19] Gareth Thomas and Alex Dunn. Practical 12 - programming model and hardware capabilities. Presentation, GDC 2016, San Francisco, March 2016. http://gpuopen.com/wp-content/uploads/ 2016/03/Practical_DX12_Programming_Model_and_Hardware_ Capabilities.pdf.

[20] Matthew Weelings. Depth peeling order independent trans- parency in vulkan, July 2016. https://matthewwellings.com/blog/ depth-peeling-order-independent-transparency-in-vulkan/.

9 Figures

Figure 5: Some of the passes in the engine: Upper left - Depth Pass, Upper right - SSAO, Lower left - Shadow Maps, Lower right - Final Render

Figure 6: Debug Markers
