An evaluation of hardware tessellation in a visibility buffer renderer in real-time graphics applications.

Cameron McPherson

BSc (Hons) Computer Games Technology, 2019

Abertay University

School of Design and Informatics

CONTENTS

Abstract
1. Introduction
1.1 Research Question
1.2 Aims and Objectives
2. Literature Review
2.1 Memory Bandwidth in Games
2.2 The Visibility Buffer
2.3 Hardware Tessellation
2.4 Visibility Buffer with Tessellation
3. Methodology
3.1 Application Overview
3.2 Development Environment
3.3 Application Framework
3.3.1 Renderers
3.4 Testing
3.4.1 Hardware
3.4.2 Evaluation Methods
3.4.3 Evaluation Parameters
3.4.4 Performance Metrics
4. Results
4.1 Memory Usage
4.1.1 Working Set
4.1.2 Forward Writes
4.1.3 Deferred Reads
4.1.4 Total Read/Writes
4.1.5 Coherency
4.2 Streaming Multi-Processor Usage
4.3 Net Performance Impact
4.3.1 Per Resolution
4.3.2 Per Triangle Count
4.3.3 Full Frame Performance
5. Discussion
5.1 Analysis of Results
5.1.1 Memory Usage
5.1.2 Processor Usage
5.1.3 Net Performance
5.2 Design Considerations and Future Work
5.2.1 Triangle Culling/Filtering
5.2.2 Support of Hardware Tessellation
5.2.3 Compute-based Tessellation
5.2.4 Turing Mesh
6. Conclusion
7. Acknowledgements
8. List of References
9. Appendices
Appendix 1

ABSTRACT

One of the most common performance bottlenecks in real-time graphics applications is memory bandwidth, especially on low-end and mobile hardware. Among multiple developments in both software design techniques and hardware architectures to alleviate this restriction, the visibility buffer and hardware tessellation are two rendering processes shown to reduce memory bandwidth usage — but there has been little research in employing the two techniques together. To determine the effects of supporting hardware tessellation in the visibility buffer on memory efficiency, a real-time graphics application with two distinct graphics pipelines was developed to render a procedurally generated terrain. Performance metrics pertaining to bandwidth usage, total working set size and pass times were gathered across multiple resolutions and geometric detail levels, both with and without hardware tessellation. The results show that while the support of hardware tessellation in the visibility buffer resulted in frame time improvements across all test cases, the associated bandwidth usage and compute cost would likely prove prohibitive to mobile and integrated platforms. Despite this, there remain opportunities for further improvement in both renderers, both to raise performance and to more closely reflect production rendering systems. Specifically, optimisations in geometry processing such as triangle culling, Mesh Shaders (NVIDIA, 2018a) or compute-based tessellation could drastically improve the visibility buffer's suitability for high-quality real-time applications on bandwidth-limited platforms.

1. INTRODUCTION

Since its emergence as a leading form of entertainment media, the video games industry has experienced a constant progression in visual fidelity. As processing hardware becomes more powerful, the demand for triple-A development studios to produce consistently improving graphics continues to grow. In order to achieve this, efficient use of available hardware resources is imperative, and this necessity has fuelled an explosion of research in the field of interactive rendering (Akenine-Moller, Haines and Hoffman, 2018). Therefore, the focus of research in real-time graphics is often concentrated on mitigating areas of the rendering pipeline and underlying hardware that are considered to be performance bottlenecks.

As the movement towards high-resolution and virtual-reality gaming intensifies, modern applications require huge amounts of data to be transferred to and from memory in the rendering of a 3D scene. For this reason, the amount of bandwidth available to the graphics hardware is key in determining how quickly data can be moved between memory and processing units. This is especially true for mobile platforms with integrated graphics architectures, where memory bandwidth capability is often the limiting factor as to the amount of detail that may be rendered per frame (Niessner et al., 2016). It is therefore necessary to develop rendering patterns which minimise the bandwidth required to process, shade, and render geometry. In doing so, systems of all capabilities may benefit from the freeing-up of GPU resources, allowing high-performance platforms to spend more of their general-purpose compute capability on other tasks and enabling improved visual quality on mobile and integrated systems.

For these reasons, there has been extensive research into the development of such rendering techniques and their subsequent impact on the performance of real-time applications. One such study (Burns and Hunt, 2013) introduces a method named the visibility buffer. This technique, while based on the commonly used deferred pattern, aims to drastically reduce the associated memory cost of separating geometry processing from shading — a cost which restricts traditional deferred renderers from achieving efficient performance on bandwidth-limited platforms.

Further to the development of novel design patterns (such as the visibility buffer) for the benefit of real-time rendering performance, graphics API vendors will often introduce new fixed-function stages to the graphics pipeline. These stages aim to provide hardware-accelerated functionality to optimise resource-intensive rendering and compute tasks. One such addition was hardware tessellation, introduced with the Direct3D 11 API (Microsoft Corporation, 2018). By automating the procedural subdivision of geometry meshes, tessellation allowed for less-detailed topologies to be stored in memory. In suitable use cases, this results in significant bandwidth savings when transferring the geometry to the graphics card for rendering.

While the retention of relevant data between passes is heavily optimised by the visibility buffer technique, the cost of processing large amounts of geometry in detailed scenes is unaccounted for. The proven benefits to memory efficiency of the visibility buffer method could thus be promoted by the addition of hardware tessellation. However, this feature of modern graphics pipelines is afforded little consideration in previous studies surrounding the

visibility buffer. Therefore, by evaluating the relative impact of combining these techniques on memory efficiency, a more robust and performant rendering system could potentially be developed, with the minimising of bandwidth usage being a primary concern.

1.1 Research Question

Following the implementation and subsequent evaluation of the discussed rendering techniques with respect to bandwidth usage, this paper proposes to answer the following research question:

“Does the addition of support for hardware tessellation improve or inhibit the memory efficiency of a typical visibility buffer renderer?”

1.2 Aims and Objectives

To answer this question, a series of objectives are presented:

• Analyse related work on the visibility buffer method, hardware tessellation, and the performance evaluation of graphics frameworks.
• Develop a real-time graphics application implementing visibility buffer rendering with tessellated geometry.
• Gather performance data from this application under various conditions; tracking both memory usage and frame times with and without tessellation.
• Evaluate the consequent impact of introducing hardware tessellation into the visibility buffer pipeline and discuss the value of its implementation with regards to memory efficiency and subsequent frame performance.

It is outwith the scope of this work to compare the visibility buffer to more traditional rendering methods such as deferred shading, as this area is well researched in the studies by Burns and Hunt (2013), Schied and Dachsbacher (2015), and Engel (2016).

2. LITERATURE REVIEW

The design and implementation of real-time graphics applications is a field in a constant state of research, with the goal of improving both power efficiency and final image quality. This chapter will examine GPU memory bandwidth as a key performance resource, and introduce previous studies which have developed methods for improving the efficiency of its use.


2.1 Memory Bandwidth in Games

In any high-quality rendering system, multiple textures and buffers will be loaded, sampled, and written to in a single pass, transferring data to and from memory in the process. The bandwidth between the processing cores and both off-chip memory and cache hierarchies is, therefore, a key indicator of how quickly a GPU can render memory-intensive frames such as those in high-resolution AAA games. As applications seek to render increasingly detailed geometry at higher resolutions, memory bandwidth quickly becomes a performance bottleneck (Niessner et al., 2016).

Hardware manufacturers have recently begun to address this bottleneck with the development of High Bandwidth Memory (HBM). HBM is a low-power, high-bandwidth memory technology that aims to address the power efficiency problems of GDDR memory (Advanced Micro Devices, 2015). Some high-end GPUs have adopted HBM2 memory to offer up to 1TB/s of memory bandwidth, more than double that of a competing card utilising GDDR6 memory, albeit at roughly one-seventh of the memory clock speed; in any case, memory bandwidth rarely causes a bottleneck in such high-end GPUs. Future iterations of these memory technologies will aim to balance bandwidth against power efficiency, and the trickle-down effect will subsequently reach mobile and integrated platforms. However, even with increasing bandwidth capabilities in graphics architectures, algorithms which favour compute operations over memory usage will still make the most efficient use of available power:

“While it may be the case that future architectures will have significantly more memory bandwidth than today’s, compute resources will likely scale faster, which means the relative cost of memory bandwidth will continue to increase.” (Burns and Hunt, 2013)

Hence, in addition to fundamental methods of mitigating bandwidth usage, such as texture compression and mipmapping, it is important that novel rendering techniques are developed to both limit memory traffic and promote high cache hit-rates (O’Conor, 2017).

2.2 The Visibility Buffer

The visibility buffer rendering technique is an alteration of the widely adopted multi-pass renderer known as deferred shading or deferred lighting. Bor-Sung Liang et al. (2000) present deferred lighting as an approach to rasterization that avoids unnecessary lighting and shading calculations on sections of geometry that are later overwritten by closer objects. It is possible to avoid this by writing each interpolated surface attribute of a fragment into an individual image buffer — collectively referred to as the g-buffer — and inherently retaining attributes

of visible fragments only. Lighting and texturing are then performed in a later pass by sampling attributes from the g-buffer. While Bor-Sung Liang et al. (2000) prove this method successfully reduces lighting calculations (and especially optimises complex scenes with many lights), the size of a typical g-buffer in a high-quality renderer can be as large as 32 bytes per sample. For this reason, the memory bandwidth required on the GPU to read and write all surface attributes of a mesh can be prohibitive on mobile and integrated platforms, especially at high resolutions (Burns and Hunt, 2013; Engel, 2016).

The visibility buffer was first proposed in the 2013 paper ‘The Visibility Buffer: A Cache- Friendly Approach to Deferred Shading’ (Burns and Hunt, 2013). The research cites memory bandwidth usage as a key drawback in traditional deferred renderers and proposes a trade- off which leverages the increasing compute:bandwidth ratio in modern graphics hardware. The proposed implementation aims to reduce the memory footprint of deferred-style renderers by storing only references to each primitive to the g-buffer — thus completely decoupling geometry processing from materials and lighting. While a deferred renderer may store position, colour, normal, and texture information into memory per fragment, the visibility buffer in its simplest form stores only the primitive ID and draw call ID packed into as little as 4 bytes (although Burns and Hunt (2013) suggest that 8 bytes would be necessary for scenes with a large amount of geometry or which implement hardware tessellation). The stored IDs describe where the referenced primitive is stored in memory, and allow the shading pass to index into the vertex buffer so as to retrieve and interpolate surface attributes manually. By doing so, the visibility buffer can be considered as a more memory-efficient variant of deferred shading.
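As a rough, illustrative calculation (ignoring multisampling and any auxiliary targets): at 1920 x 1080, a 4-byte-per-sample visibility buffer occupies 1920 × 1080 × 4 bytes ≈ 8.3 MB, whereas a 32-byte-per-sample g-buffer occupies roughly 66.4 MB, an eight-fold difference that must be written during geometry processing and read back again during shading.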

It is apparent from Burns and Hunt’s (2013) implementation that a visibility buffer renderer must carry out a certain amount of extra work in order to successfully decouple shading from geometry processing. The calculation of barycentric coordinates, re-computation of the vertex stage, and the read accesses into the vertex buffer are the computational overheads that pay for the compact memory usage of the visibility buffer. However, despite this cost, the research by Burns and Hunt (2013) reports up to 40% decreases in frame times over a traditional deferred renderer in a high-detail scene. The most pronounced gains in performance were observed on platforms with limited bandwidth, therefore confirming the value of trading increased computation for an efficient memory footprint on lower-end graphics architectures, such as those of mobile and integrated platforms.

The research carried out by Burns and Hunt (2013) is developed in the 2015 paper ‘Deferred Attribute Interpolation for Memory-Efficient Deferred Shading’ (Schied and Dachsbacher, 2015). The paper presents an alteration to the initial visibility buffer renderer proposed by Burns and Hunt (2013), and explicitly references the support of tessellation as a factor in their

reasoning. Schied and Dachsbacher's (2015) solution (DAIS) employs a Z pre-pass to first transform all visible triangles into screen-space before storing them into a triangle buffer. Thus, instead of referencing untransformed primitives in the visibility buffer (Burns and Hunt, 2013), DAIS stores transformed triangles separately, using the visibility buffer to store references to each triangle's address. Storing triangles in this way creates more memory traffic, and therefore higher bandwidth usage, but it also allows for the support of hardware tessellation and saves vertex transformations from being re-computed at per-sample frequency in the shading pass. Additionally, DAIS also stores the partial derivatives of attributes to a buffer in the first phase so as to simplify the calculation of barycentric coordinates in the shading pass. Schied and Dachsbacher (2015) argue that the extra memory requirement for storing the partial derivatives is not only minor, but is also compensated for by the gain in performance in the shading pass; however, it may pose a limitation on platforms with restricted bandwidth capabilities.

DAIS, therefore, represents a less memory-conscious solution to deferred shading when compared to the research in Burns and Hunt’s (2013) paper upon which it is based. Schied and Dachsbacher (2015) seem to place less value on the use of memory bandwidth in favour of more efficient computation, and while their findings report further performance gains over traditional deferred shading, they note that architectures with limited bandwidth may struggle with complex scenes. In this way, DAIS may be found to encounter many of the same problems as deferred shading on lower-end hardware.

Research into the visibility buffer technique is continued in the GDC presentation ‘The filtered and culled visibility buffer’ (Engel, 2016). It presents a practical implementation of the visibility buffer that combines the approaches of Burns and Hunt (2013) and Schied and Dachsbacher (2015). In a renderer that consists of multiple phases, Engel (2016) employs cluster culling and triangle filtering passes to eliminate invisible geometry before shading. Geometry that passes these tests is appended to a filtered triangle buffer, as per Schied and Dachsbacher (2015). The visibility buffer is then populated via a similar solution to that presented by Burns and Hunt (2013). In Table 1, Engel (2016) presents the estimated size of a visibility buffer at two resolutions, and compares these to a typical g-buffer in a deferred shading pipeline.


Pipeline             1080p        4K
Visibility Buffer    86.20 MB     276.61 MB
Deferred G-Buffer    160.49 MB    635.29 MB

Table 1 - Estimated memory requirements of the visibility buffer against a deferred g-buffer with 4x MSAA (Engel, 2016)

In his personal blog ‘Diary of a Graphics Programmer’ (Engel, 2018), Engel expands on his 2016 GDC presentation. In the post, he provides further information regarding the specifics of the implementation in code and offers further context for understanding the benefits of a visibility buffer renderer. Specifically, Engel (2018) describes the tendency towards increasing polygon counts in modern games, and how high amounts of geometry can cause bottlenecks in traditional graphics pipelines. To combat this in his implementation of the visibility buffer, Engel (2018) describes the implementation of a triangle removal compute pass as the first stage in rendering, allowing the graphics pipeline to be concerned only with visible geometry. Engel (2018) also details the execution of triangle cluster culling, a method which removes batches of triangles with similar normals that are back-facing to the camera. In removing geometry that will not appear on screen before the shading passes, further memory traffic is saved when accessing the triangle buffers in the final pass.

Despite the references to bandwidth usage in the aforementioned studies, there are no specific measurements given for reads and writes to memory as used by their respective visibility buffer implementations. Thus, the bandwidth efficiency of the technique is largely unassessed in previous research, and is instead implied through the compact size of the visibility buffer against a typical deferred g-buffer.

2.3 Hardware Tessellation

As the demand for photorealism in interactive graphics software continues to rise, so does the need for increasingly complex and detailed geometry to be viable in rendering pipelines. As with transferring multiple textures from GPU memory, rendering highly-detailed geometry can quickly become prohibitive on platforms with limited bandwidth (Engel, 2016, 2018; Niessner et al., 2016). Hardware tessellation is a stage of modern graphics pipelines which allows for the procedural generation of detailed geometry from a coarser representation stored in memory. It achieves this by taking a set of primitives as control patches and subdividing them to produce a mesh of higher primitive density.


In addition to the cheaper global memory cost of storing low-detail meshes, this approach also has the benefit of far fewer memory accesses being required to render a high-density mesh, and therefore a more efficient use of available bandwidth. The study by Niessner et al. (2016) outlines the benefits that the tessellation stage offers in computer games, as further detail can be added to tessellated geometry by displacing generated vertices. This technique is often utilised for creating complex terrains and more detailed character models, as shown in Figure 1 and Figure 2.

Figure 1 - An example image presented by Niessner et al. (2016) demonstrating an object with (right) and without (left) hardware tessellation. Use of a displacement map on the tessellated mesh allows for further detail to be added for a relatively low cost.

Figure 2 - Tessellation is utilised in Max Payne 3 (Rockstar Games, 2012) to add curvature to character and car models, as seen in Max's ear, collar and suit.

2.4 Visibility Buffer with Tessellation

Given the performance increases associated with bandwidth-efficient rendering techniques and the success of the visibility buffer technique in decreasing the resource cost of deferred shading, it follows that the addition of tessellation to the visibility buffer in suitable scenes should further improve performance. Within the papers previously discussed, however, tessellation is afforded only passing consideration, and its support within the visibility buffer has not been fully explored. While the potential benefits of combining the two techniques may initially seem pronounced, there are certain trade-offs that would have to be made in practice.

In a visibility buffer implementation akin to Burns and Hunt (2013), the complete decoupling of geometry processing from attribute interpolation and shading means that tessellated geometry created in the forward pass would be lost. Burns and Hunt (2013) offer a brief consideration as to how they would alter the solution to be able to support tessellation: following the tessellation of geometry in the forward pass, the instance ID, patch ID, and barycentric coordinates of the generated primitive within the input patch would be stored to the visibility buffer. This data would require double the memory at 8 bytes over the presented implementation, thereby increasing the relative memory traffic of the solution. While this figure is still less than that of a typical deferred renderer, it represents something of a backward step in terms of the original motivations driving the visibility buffer approach, and could have knock-on effects on the performance of lower-end graphics platforms. Further to this, Burns and Hunt (2013) note that the process of re-calculating the domain shader per fragment in highly tessellated scenes could incur a substantial computational cost. Indeed, for a pipeline which tightly compacts memory usage in exchange for increased computation, it may prove critical to its perceived benefits if both factors are negatively impacted by tessellation. Hence, it is the purpose of this work to determine if a typical use-case of tessellation within a visibility buffer pipeline would inhibit or improve the bandwidth efficiency observed in the original works.

Tessellation is offered greater consideration in Schied and Dachsbacher's (2015) implementation of DAIS. The paper corroborates Burns and Hunt's (2013) hypothesis that tessellation may incur a fair amount of overhead, especially in DAIS, since all visible triangles must be stored in memory. The two geometry passes DAIS requires also become costly for high triangle counts, such as with highly tessellated scenes. To demonstrate performance, a simple tessellation shader was implemented and tested (Figure 3). It is shown that the DAIS pipeline outperforms deferred shading at high resolution but fails to match it at low resolution.

Figure 3 - Graphs comparing the timings and memory usage of a geometry pass with a tessellated scene at two resolutions (Schied and Dachsbacher, 2015).

Despite this analysis of tessellation performance with DAIS against deferred shading, Schied and Dachsbacher's (2015) paper still falls short of measuring the direct impact of tessellation on the performance of the visibility buffer pipeline, in terms of both frame time and memory usage. Hence, this paper aims to evaluate the aforementioned performance factors of a visibility buffer renderer upon introducing hardware tessellation, and also to discuss the value of its implementation when compared against the rendering of high-detail static meshes.

3. METHODOLOGY

To evaluate the relative impact of adding hardware tessellation to a typical visibility buffer renderer, a real-time graphics application was developed which implements two distinct render pipelines.

3.1 Application Overview

The developed application renders a generated terrain mesh, lit by a single directional light. A terrain was chosen due to the ease of generating a triangle patch-grid at runtime to allow precise control over vertex density and detail. Two distinct meshes may therefore be generated to represent the same terrain but with varying detail stored in host memory.

The user may navigate the scene by moving and rotating the camera, and can access various functionality via the on-screen user interface. Specifically, the UI allows the user to switch between the visibility buffer pipeline (VB) and the visibility buffer with tessellation pipeline (VBT) in real-time, so as to directly compare the two. Intermediate render targets comprising the visibility buffer are viewable by toggling them on or off. The user may also alter the colour of the lighting and manually input camera transform values.

The overlaid statistics window provides real-time performance metrics as measured by the application. These include full frame times as well as GPU execution times for each subpass, and the user may obtain an average sample of the last 20 frames for each measurement. Also shown are the respective triangle counts of each terrain mesh.

Figure 4 - A screenshot of the final application. A procedurally generated terrain is rendered and lit via two distinct render pipelines which may be switched between in real-time.

3.2 Development Environment

The application is built in C++ using the Vulkan graphics and compute API (Khronos Group, 2019). The framework leverages a number of external libraries to accelerate development and provide reliability to low-level functionality. These libraries are included below.

• GLFW – Open-source library for creating and managing windows with OpenGL/Vulkan contexts (GLFW, 2019).
• VMA – Vulkan library for minimising boilerplate code when allocating memory for application resources (Vulkan Memory Allocator, 2019).


• Dear ImGui – Graphical user interface library for exposing application settings and outputs to the user. Also handles mouse and keyboard input (Dear ImGui, 2019).
• GLM – OpenGL-based mathematics library defining data structures and types consistent with the GLSL shader language (OpenGL Mathematics, 2019).
• STB – Open-source image loading library for loading textures and displacement maps into memory (STB Image, 2019).

3.3 Application Framework

A bespoke, lightweight Vulkan framework was built from the ground up to expose only necessary functionality and allow control over all aspects of the application's behaviour. The Vulkan instance and device interfaces are created and handled from a single core class, upon which all application functionality is based.

3.3.1 Renderers

Two distinct renderers were built within the same Vulkan and application instance to maximise the resources which may be shared between them. In doing so, any perceived differences in performance of the renderers are isolated as much as possible to only the characteristics which this paper aims to test and evaluate.

Shared Resources and Functionality

Both renderers are created within the same Vulkan instance, device context and swap chain. Renderer-specific objects are assigned from the same resource pools, and both pipelines share the same set of command buffers, which are re-recorded when switching between renderers. Where possible, attachments, resource descriptors and pipeline settings are shared between renderers to maximise overlap in areas not specific to the inclusion or exclusion of hardware tessellation.

When writing to the visibility buffer, the renderers make use of the same built-in GLSL variable, gl_PrimitiveID, to identify the triangle being stored. In the case of tessellation, this value refers to the input control patch of the mesh, not to the generated tessellation primitive. As such, both renderers can use these values to index into the vertex buffer in the deferred pass in order to access vertex attributes for each primitive, and thus use the same amount of bandwidth per sample to reference primitives in host memory. However, the renderers differ in the context of bandwidth usage when it comes to retaining the detail generated in the tessellation stages, which requires additional data to be stored.


To ensure the most efficient use of memory between passes, and to improve the readability of the solution, both renderers are defined within a single VkRenderPass. Vulkan allows render passes to consist of multiple sub-passes, each declaring which set of frame buffer attachments they will work with. In deferred-style renderers, this layout allows the graphics driver to keep attachments in on-chip memory for the duration of the render pass, thereby saving a substantial amount of bandwidth usage. On optimal hardware, this approach allows a fragment shader to directly load the same fragment of an Input Attachment from the previous sub-pass without any accesses to off-chip memory. The renderers in the developed application both make use of sub-passes and Input Attachments for this purpose.
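As a brief GLSL illustration (identifier names are not taken from the project source), the deferred sub-pass can declare the visibility buffer as an input attachment and read it with subpassLoad, which takes no coordinates because it always returns the value written for the current fragment in the previous sub-pass:

    layout (input_attachment_index = 0, set = 0, binding = 0)
        uniform subpassInput visibilityBuffer;

    // On hardware that supports it, this read can be serviced from
    // on-chip memory rather than off-chip VRAM.
    vec4 packedIDs = subpassLoad(visibilityBuffer);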

To displace the terrain geometry so that it more closely resembles a naturally occurring landscape, a heightmap image is loaded and sampled in both render pipelines. Since VB renders a static mesh which does not change at runtime, it would be most efficient to sample the heightmap data upon generation of the terrain. However, in the case of VBT, only patch control points are stored in memory, and therefore the heightmap must be loaded in the shaders to be sampled for generated vertices. To compound the memory cost of this operation, the height displacement must be carried out in both the visibility and shading phases to ensure consistency throughout the rendering process. The heightmap must be sampled, therefore, for each vertex of the loaded primitive in the fragment shader per sample. This incurs a substantial memory cost, despite usually resulting in high cache hit rates. It was deemed that such an imbalance in cost between the two renderers when supporting a largely cosmetic feature would unnecessarily obscure test results, and thus VB also samples the heightmap per frame in both subpasses.

In the deferred pass of each renderer, the vertex shader makes use of the built-in GLSL value gl_VertexIndex to generate a full screen quad, upon which the final image will be rendered. Such an approach negates the need for storing a quad in host memory and binding it to the draw call. This stage outputs only the screen position of each vertex, as the fragment shader is responsible for all the heavy lifting in the renderers’ deferred stages.
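A common form of this technique is sketched below in GLSL; the project's exact shader is not reproduced here, and this variant emits a single oversized triangle rather than a two-triangle quad, which works analogously.

    #version 450

    // Full-screen primitive generated purely from gl_VertexIndex, with no
    // vertex buffer bound; drawn with vkCmdDraw(cmd, 3, 1, 0, 0).
    void main()
    {
        // Vertices 0, 1, 2 map to (0,0), (2,0), (0,2), i.e. a triangle
        // covering the whole clip-space rectangle [-1, 1] x [-1, 1].
        vec2 uv = vec2((gl_VertexIndex << 1) & 2, gl_VertexIndex & 2);
        gl_Position = vec4(uv * 2.0 - 1.0, 0.0, 1.0);
    }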

Visibility Buffer (VB)

VB consists of two distinct phases. Firstly, in the forward pass, visibility is determined by projecting the generated terrain mesh into screen space before writing visible primitives to the visibility buffer. A packing function in the fragment shader takes the draw call ID and primitive ID and combines them into a single, unsigned integer. This value is then unpacked into a 4x8 vector value, which is written to the 8:8:8:8 render target being used as the visibility

buffer. These values act as references to each visible primitive in the final image, and allow the deferred pass to load surface attributes from the vertex buffer in order to shade them.
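A minimal sketch of this packing step is given below. The bit split between draw call ID and primitive ID, and the delivery of the draw call ID as a push constant, are assumptions made for illustration; the project's exact function is not reproduced here.

    #version 450

    layout (push_constant) uniform PushConstants { uint drawID; } pc;
    layout (location = 0) out vec4 outVisibility;   // 8:8:8:8 UNORM visibility buffer

    void main()
    {
        // Assumed split: top 8 bits for the draw call ID, lower 24 bits
        // for the primitive ID supplied by the rasteriser.
        uint packedID = (pc.drawID << 24) | (uint(gl_PrimitiveID) & 0x00FFFFFFu);

        // Unpack the 32-bit value into four 8-bit channels so that it
        // survives the trip through an RGBA8 colour attachment.
        outVisibility = unpackUnorm4x8(packedID);
    }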

This approach to implementing the visibility buffer closely resembles that of Burns and Hunt (2013) and Engel (2016, 2018) as it makes the most efficient use of bandwidth, requiring only 4 bytes per sample. At this same stage, DAIS (Schied and Dachsbacher, 2015) would store the transformed triangles to a triangle buffer, along with computed partial derivatives. However, this solution creates considerably more memory traffic and therefore did not seem to be a suitable solution for the implementation of VB.

In the deferred pass, the fragment shader firstly performs a subpassLoad operation to efficiently load the visibility buffer data for the current fragment directly from the previous sub-pass. The packing methods are reversed to recover the draw call ID and primitive ID of the referenced triangle, which are used to calculate the indices of its component vertices in the vertex buffer. In the case of both VB and VBT, the entire terrain mesh is rendered in a single draw call, and thus draw call ID will always be zero. In more complex scenes which utilise Multi* draw commands such as MultiDrawIndirect, the draw call ID would be used to calculate the start index of the current draw call within the bound vertex buffer.

To pass surface attributes to the fragment shader for loading, the vertex buffer is compressed into a collection of two vec4 values as shown in Figure 5. These attributes are accessed with the calculated indices, meaning that all information required about the visible triangle at the current fragment is now available. It is this process of only referencing visible geometry in the forward pass and deferring the loading of attributes that the aforementioned studies describe as decoupling geometry from shading (Burns and Hunt, 2013; Schied and Dachsbacher, 2015; Engel, 2016, 2018).

Figure 5 - The buffer alignment used to compress surface attributes into an efficient format for loading in the deferred shader stages.
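A combined sketch of the steps just described (unpacking the IDs and fetching the compressed attributes) might look as follows. The buffer bindings and the exact split of attributes across the two vec4s are assumptions for illustration, as is the placeholder shading at the end.

    #version 450

    layout (input_attachment_index = 0, set = 0, binding = 0)
        uniform subpassInput visibilityBuffer;

    struct PackedVertex
    {
        vec4 positionU;   // xyz = object-space position, w = texture u (assumed split)
        vec4 normalV;     // xyz = surface normal,        w = texture v
    };

    layout (std430, set = 0, binding = 1) readonly buffer VertexBuffer { PackedVertex vertices[]; };
    layout (std430, set = 0, binding = 2) readonly buffer IndexBuffer  { uint indexData[]; };

    layout (location = 0) out vec4 outColour;

    void main()
    {
        // Load this fragment's reference directly from the forward sub-pass.
        vec4 visSample = subpassLoad(visibilityBuffer);

        // Reverse the forward-pass packing (exact for 8-bit UNORM data).
        uint packedID    = packUnorm4x8(visSample);
        uint drawID      = packedID >> 24;
        uint primitiveID = packedID & 0x00FFFFFFu;

        // A single draw call is used here, so drawID is always zero; with
        // MultiDrawIndirect it would select a per-draw offset into indexData.
        uint firstIndex = primitiveID * 3u;

        // Fetch the three vertices of the referenced triangle.
        PackedVertex v0 = vertices[indexData[firstIndex + 0u]];
        PackedVertex v1 = vertices[indexData[firstIndex + 1u]];
        PackedVertex v2 = vertices[indexData[firstIndex + 2u]];

        // ... project the vertices, compute the barycentric derivatives
        // (Equations 1 and 2), interpolate attributes and shade.
        outColour = vec4(normalize(v0.normalV.xyz + v1.normalV.xyz + v2.normalV.xyz) * 0.5 + 0.5, 1.0); // placeholder
    }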

Once surface attributes are acquired, they must be manually interpolated across the primitive to shade the current fragment. This is achieved by projecting the vertices into screen space and, from them, calculating the partial derivatives of the fragment’s barycentric coordinates.


The implemented algorithm follows that which is given by Schied and Dachsbacher (2015), and is outlined in Equation 1 and Equation 2.

Having acquired the partial derivatives of the current fragment’s barycentric coordinates, the surface attributes of the primitive can now be interpolated to the point. In this way, the perspective-correct position, normal and texture coordinates are acquired for the point. Following the displacement of vertex height via heightmap sampling, surface shading is then executed as normal.

Equation 1 - The barycentric coordinates λi for a point (x, y) in relation to a triangle pi = (ui, vi), where D = det(p3 - p2, p1 - p2). (Schied and Dachsbacher, 2015)

Equation 2 - From Equation 1, the partial derivatives of the barycentric coordinates with respect to the point (x, y). (Schied and Dachsbacher, 2015)
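The equation images themselves do not reproduce here; a standard form consistent with the captions above, using the notation p_i = (u_i, v_i) and D = det(p_3 - p_2, p_1 - p_2), is:

\lambda_1(x, y) = \frac{(v_2 - v_3)(x - u_3) + (u_3 - u_2)(y - v_3)}{D}, \qquad
\lambda_2(x, y) = \frac{(v_3 - v_1)(x - u_3) + (u_1 - u_3)(y - v_3)}{D}, \qquad
\lambda_3 = 1 - \lambda_1 - \lambda_2

from which the partial derivatives follow as per-triangle constants:

\frac{\partial \lambda_1}{\partial x} = \frac{v_2 - v_3}{D}, \quad
\frac{\partial \lambda_1}{\partial y} = \frac{u_3 - u_2}{D}, \quad
\frac{\partial \lambda_2}{\partial x} = \frac{v_3 - v_1}{D}, \quad
\frac{\partial \lambda_2}{\partial y} = \frac{u_1 - u_3}{D}, \quad
\frac{\partial \lambda_3}{\partial x} = -\left(\frac{\partial \lambda_1}{\partial x} + \frac{\partial \lambda_2}{\partial x}\right)

and similarly for the derivative of \lambda_3 with respect to y.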

Tessellation + Visibility Buffer (VBT)

Although following a largely similar structure to that of VB, VBT must additionally consider how to preserve the detail generated by tessellation in the forward pass — a problem which diverges opinion in previous research concerning the visibility buffer. While Schied and Dachsbacher (2015) argue that storing transformed visible triangles to a new geometry buffer enables the support of tessellation, doing so increases the amount of memory traffic used, which may prove a limitation on lower-end hardware. Alternatively, Burns and Hunt (2013) suggest that tessellation may be supported by storing the barycentric coordinates of generated vertices to a buffer per sample. These coordinates describe each vertex’s location

within its input patch in terms of proximity to the patch's control points. By storing these per tessellated primitive, the domain shader may be recomputed for each in the deferred pass. It is noted by Burns and Hunt (2013) that this approach comes with a computational cost, as domain shading must be carried out per sample for each of the current primitive's vertices. However, doing so would save memory bandwidth usage versus the solution presented by Schied and Dachsbacher (2015) and thus was the favoured approach for this project.

In their consideration of tessellation within a visibility buffer renderer, Burns and Hunt (2013) suggest that the patch barycentric coordinates may be stored using an additional 4-byte buffer. However, when developing this solution for a detailed scene requiring a sufficient level of accuracy in the stored tessellation coordinates, it was not apparent how such a small format could be sufficient. The need to preserve the coordinates of all three vertices (each already 4 bytes) at a desirable accuracy meant that a larger format would be necessary for the tessellation coordinates buffer.

To preserve the tessellation coordinates, a geometry shader stage was introduced. The gl_TessCoords vector is passed out of the domain stage per vertex and subsequently collected per primitive in the geometry stage as an array of vec3s. Each vector representing the barycentric coordinates of a single vertex is then packed into a 32-bit unsigned integer and stored into a single component of a R32G32B32A32_UINT image buffer. This process is visualised in Figure 6. The net result is a 16-byte addition to the working set of the visibility buffer, equalling 20 bytes in total.
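A sketch of this geometry stage is given below. The 11:11:10 bit allocation used by packBary, and the explicit forwarding of the patch ID from the evaluation (domain) stage, are assumptions made for illustration; the thesis states only that each vec3 is packed into one 32-bit component.

    #version 450

    layout (triangles) in;
    layout (triangle_strip, max_vertices = 3) out;

    // Written per vertex by the tessellation evaluation shader:
    layout (location = 0) in vec3 teTessCoord[];   // gl_TessCoord, forwarded
    layout (location = 1) in int  tePatchID[];     // gl_PrimitiveID (input patch index), forwarded

    // One packed coordinate per corner of the generated triangle, later
    // written unchanged by the fragment shader to the R32G32B32A32_UINT target.
    layout (location = 0) flat out uvec4 gsPackedBary;

    // Hypothetical packing: 11 bits for x, 11 bits for y, 10 bits for z.
    uint packBary(vec3 b)
    {
        uint x = uint(clamp(b.x, 0.0, 1.0) * 2047.0 + 0.5);
        uint y = uint(clamp(b.y, 0.0, 1.0) * 2047.0 + 0.5);
        uint z = uint(clamp(b.z, 0.0, 1.0) * 1023.0 + 0.5);
        return (x << 21) | (y << 10) | z;
    }

    void main()
    {
        uvec4 packedCoords = uvec4(packBary(teTessCoord[0]),
                                   packBary(teTessCoord[1]),
                                   packBary(teTessCoord[2]),
                                   0u);                     // fourth component unused

        for (int i = 0; i < 3; ++i)
        {
            gl_Position    = gl_in[i].gl_Position;
            gl_PrimitiveID = tePatchID[i];   // keep the input patch ID for the visibility buffer
            gsPackedBary   = packedCoords;
            EmitVertex();
        }
        EndPrimitive();
    }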

This solution is somewhat wasteful in its memory usage, as the fourth component of each value is not required and is therefore left unwritten. This is in part due to valid usage features of available image formats in Vulkan, as the 3-component R32G32B32_UINT format may not be used as a colour attachment in a render pass. This could be partially alleviated by leveraging the fact that only three components are necessary to represent a barycentric coordinate, using multiple 8-bit UNORM image buffers to store coordinate vectors unpacked in a similar way to that presented in Figure 5. This approach could bring the total size of the working set down to only 13 bytes, but does introduce overheads for creating and binding multiple image buffers as opposed to one.


Figure 6 - A representation of the packing process to preserve the three barycentric coordinates of a generated tessellation primitive (white triangle) relative to its parent input patch (RGB triangle). The result is stored per-fragment to an image buffer to be sampled in the deferred pass.

In the deferred pass, the fragment shader receives the visibility buffer, tessellation coordinates buffer, and the vertex buffer of patch control points. In an identical operation to that of VB, the draw call ID and primitive ID are unpacked from the visibility buffer, allowing the attributes of the intersected input patch to be loaded subsequently. However, this pass must then additionally load barycentric coordinates from the tessellation coordinates buffer. These coordinates are used to interpolate the patch control point attributes to the vertices of the generated triangle. Following this, the algorithm may continue as normal. While this extra stage only seems to be a small conceptual addition to the algorithm seen in VB, it introduces large overheads.
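A sketch of the additional work inside VBT's deferred fragment shader is shown below. unpackBary mirrors the hypothetical packBary from the geometry-stage sketch above, and c0, c1 and c2 stand for the three patch control-point positions fetched from the vertex buffer in the same manner as the VB deferred sketch (e.g. v0.positionU.xyz).

    // Additional input attachment holding the packed tessellation coordinates.
    layout (input_attachment_index = 1, set = 0, binding = 3)
        uniform usubpassInput tessCoordBuffer;

    // Inverse of the hypothetical 11:11:10 packing used in the geometry stage.
    vec3 unpackBary(uint p)
    {
        return vec3(float(p >> 21)            / 2047.0,
                    float((p >> 10) & 0x7FFu) / 2047.0,
                    float(p & 0x3FFu)         / 1023.0);
    }

    // Inside main(), after the IDs and patch control points have been loaded:
    uvec4 packedCoords = subpassLoad(tessCoordBuffer);
    vec3  b0 = unpackBary(packedCoords.x);
    vec3  b1 = unpackBary(packedCoords.y);
    vec3  b2 = unpackBary(packedCoords.z);

    // Repeat the domain-shader interpolation per sample: each generated
    // vertex is a barycentric combination of the patch control points.
    vec3 p0 = b0.x * c0 + b0.y * c1 + b0.z * c2;
    vec3 p1 = b1.x * c0 + b1.y * c1 + b1.z * c2;
    vec3 p2 = b2.x * c0 + b2.y * c1 + b2.z * c2;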

Firstly, the interpolation of control point attributes to generated vertices is a repeat of the domain shader stage in the forward pass. Not only is this operation repeated, it is now carried out per sample as opposed to per vertex, and it must be performed for all vertices in the primitive for each sample. The associated computational expense may therefore prove substantial in high-resolution scenes.

In addition to the added compute cost, the chosen solution is unlikely to scale well to very high detail levels. Since the storing of tessellation coordinates is performed in the fragment shader to an image buffer, the accuracy with which primitive boundaries are preserved is bound to screen resolution. As the tessellation factor increases, generated triangles become smaller and saturate the rasterizer with detail that is too fine for the fragment size (Engel, 2019a). This can introduce artefacts in interpolated attributes as the generated triangles reach very small sizes. Observed texturing artefacts at high tessellation factors are shown in Figure 7.


Figure 7 - Texturing artefacts observed at high tessellation factors. The tessellation factor in the left image is 34 (c. 170,000 triangles), and 60 in the right (c. 529,000 triangles).

3.4 Testing

To test the efficacy of introducing hardware tessellation to a visibility buffer renderer, the primary measure is memory traffic and subsequent bandwidth usage. As a technique designed to lighten the load of deferred-style shading on platforms with limited GPU bandwidth, the visibility buffer relies on a compact working set per frame. Therefore, the increase in storage required to preserve tessellated geometry must be evaluated to determine whether the relative trade-off is worthwhile.

3.4.1 Hardware

As this work aims to assess the comparative memory efficiency of two renderers, it was considered out of scope to test on multiple configurations of underlying hardware. Save for minor inconsistencies between the drivers of independent hardware vendors, the performance measurements of the two techniques relative to each other should remain constant across platforms. Table 2 outlines the system configuration used to test the application.


Component      Specification
CPU            Intel i7-9700K @ 4.80GHz
GPU            NVIDIA GeForce RTX 2070
Memory         16GB DDR4 2667MHz
Motherboard    ASUS TUF Z370-Plus Gaming
OS             Windows 10 Education 64-bit

Table 2 - The system configuration used to build and test the application.

3.4.2 Evaluation Methods

Memory and processor performance metrics were acquired through the Range Profiler features of NVIDIA Nsight Graphics (NVIDIA, 2019). The application is launched through the Nsight software, at which point the profiling tools are injected into the process. GPU performance figures and usage statistics are captured directly from the hardware and presented to an output window. Frame and subpass timings are measured directly in the application using a high-resolution clock and Vulkan timestamp queries.

3.4.3 Evaluation Parameters

To fully assess the comparative performance of the two renderers, key parameters were identified to best reveal the strengths and weaknesses of both approaches. By varying these values during testing, a more complete picture of the characteristics of each renderer may be built up.

For testing the benefits of hardware tessellation at varying detail levels, the final triangle count of each terrain mesh is modified in set increments. The key benefit of hardware tessellation is the elimination of the need to store large amounts of detailed geometry in host memory, to then be passed to the GPU per frame. At each increasing triangle count, VB will be sending more data across the CPU-GPU bus, while VBT may rely on a single, far coarser triangle grid for all detail levels. The test cases for triangle count were:

• c. 250,000 triangles
• c. 580,000 triangles
• c. 1,000,000 triangles
• c. 2,000,000 triangles

As the renderers do not employ any form of triangle-culling, these counts will provide sufficient complexity for the renderers, especially VB.

As pointed out in previous studies (Burns and Hunt, 2013; Schied and Dachsbacher, 2015; Engel, 2016, 2018), the visibility buffer technique scales with resolution far more effectively than a traditional deferred renderer due to its memory efficiency per sample. To evaluate this scalability for this implementation, test results were taken at varying resolutions to assess how the performance of each renderer scales with higher pixel counts. These resolutions were:

• 1280 x 720 pixels (720p)
• 1920 x 1080 pixels (1080p)
• 2560 x 1440 pixels (1440p)
• 3840 x 2160 pixels (4K)

Since a visibility buffer renderer performs a large amount of computation per-fragment, higher resolutions will add noticeable compute overhead. In most cases, however, this is offset by the technique's memory efficiency, allowing it to scale well to higher sample rates, at which lower-end systems will often become bandwidth-limited (Burns and Hunt, 2013).

As visibility and shading are entirely de-coupled in a visibility buffer renderer, shading will only be carried out for geometry visible in the current frame. As such, it is important to measure the relative performance of each renderer when a varying amount of geometry is rasterized. Captures for frame execution times were therefore taken at these levels of geometry-frame occupancy:

• 0% Geometry, 100% Sky
• 50% Geometry, 50% Sky
• 100% Geometry, 0% Sky

3.4.4 Performance Metrics

The purpose of this implementation is to evaluate the memory efficiency of a visibility buffer renderer with and without the use of hardware tessellation. In assessing this, it is important to measure both precise memory usage metrics as well as net performance impacts. As such, measurements were taken for the following metrics:


Memory Usage/Efficiency

• Working set sizes (MiB)
• Throughput of the CROP unit (%)
• Throughput of the L1/Tex Cache (%)
• Video memory read throughput (%)
• Video memory write throughput (%)
• Level 2 cache hit rate (%)

Net Performance

• SM Throughput (%)
• GPU execution times of each subpass (ms)
• Net percentage speed-up of VBT (frame time ±%)

4. RESULTS

For samples requiring the capturing of single frames, metrics are averaged from a set of 20 captures to eliminate outliers in results. Measurements are then taken over varied application parameters as outlined in Section 3.4.3.

For measurements taken over multiple resolutions, a default detail level of c. 580,000 triangles was chosen. This value represents a stable workload for both renderers whilst limiting texturing artefacts in VBT. When varying triangle count, measurements were taken at a resolution of 1080p, which of the four resolutions chosen for this work is the most commonly used by desktop users at time of writing (Statcounter, 2019). For cases in which geometry- frame occupancy is not explicitly varied, all measurements are taken from an identical camera position, with roughly 50% of the frame containing terrain geometry.

Many of the low-level performance metrics are presented in the form of percentage occupancy or throughput. These measurements are accessed via the Nsight Graphics Range Profiler (NVIDIA, 2019) and are taken directly from hardware counters on the respective GPU unit. The values convey how close the unit is to its maximum theoretical throughput during the given measurement period, and thus in the case of memory systems provide a good metric for understanding bandwidth usage.


4.1 Memory Usage

4.1.1 Working Set

Table 3 outlines the combined working set size for each renderer under varying parameters. The values account for the size of the vertex and index attribute structures, primitive count in host memory, and the sample sizes and resolution of the required visibility buffer attachments. Appendix 1 outlines the calculation of these totals in further detail, presenting the count and size of each component of the working sets.

While both renderers share the same structure definitions for vertex and index attributes (and thus would consume the same amount of memory to store matching sets of geometry), VBT can store far less geometry in host memory to achieve the same detail level as VB. This saving is contrasted by the requirement of a 16-byte per sample colour buffer that must be stored in addition to the base visibility buffer attachment.

Resolution       720p                                 1080p
Triangle count   250k     580k     1,000k   2,000k    250k     580k     1,000k   2,000k
VB               10.08    19.18    30.86    57.30     14.47    23.57    35.26    61.70
VBT              17.59    17.59    17.59    17.59     39.56    39.56    39.56    39.56

Resolution       1440p                                4K
Triangle count   250k     580k     1,000k   2,000k    250k     580k     1,000k   2,000k
VB               20.63    29.73    41.41    67.85     38.20    47.30    58.99    85.43
VBT              70.32    70.32    70.32    70.32     158.21   158.21   158.21   158.21

Table 3 - The total working set size (MiB) of each renderer at varying triangle counts and resolutions.
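As a point of reference for these figures, VBT's per-sample cost of 20 bytes (the 4-byte visibility buffer plus the 16-byte tessellation coordinates attachment) accounts for almost the entire working set: at 1080p, 1920 × 1080 × 20 bytes ≈ 39.55 MiB of the 39.56 MiB reported, with the small remainder being the constant, coarse patch-grid geometry held in host memory.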

The relative increase or decrease of the tessellation working set against the standard visibility buffer is visualised in Figure 8. A key pattern to note is that VBT’s working set will remain constant for all triangle counts within each resolution, and as such can scale to higher detail levels far more effectively than VB.

However, the increased cost of reading and writing an extra 16-byte per sample colour texture to support tessellation becomes apparent as resolution increases. When compared to VB at 4K resolution, VBT can require over 4 times the memory to store the same level of geometric detail.


As proposed in previous studies (Burns and Hunt, 2013; Schied and Dachsbacher, 2015; Engel, 2016, 2018) VB handles the jump to higher resolutions effectively due to its per sample memory efficiency. Thus, while VBT requires almost 9 times the memory to render at 4K resolution versus 720p, VB can require less than double.

Figure 8 - The relative increase/decrease in working set size (MiB) of VBT vs VB.

4.1.2 Forward Writes

Figure 9 presents the percentage occupancy of the Colour Render Output Unit (CROP) used by the forward subpass of each renderer per resolution. The CROP performs colour writes and blending to render targets, and thus is responsible for the filling of the visibility buffer in the forward pass.

Figure 9 - Texture write usage of each renderer’s forward subpass per-resolution. A higher percentage denotes a larger consumption of the maximum bandwidth of the CROP unit.

These measurements in Figure 9 show that the memory bandwidth used in both renderers when writing to the visibility buffer is proportional to the resolution of the render targets. As resolution rises, VB experiences relatively inexpensive increases in write bandwidth usage, with 4K resolution only consuming around 5% more of maximum throughput when compared to 720p.

This is contrasted by VBT and its far larger visibility buffer implementation, which consumes nearly 15% more bandwidth to make the same jump in screen resolution. In general, both renderers require double the bandwidth to write to 4K colour buffers as compared to 720p.

Considering the two renderers alongside one another, these results show a dramatic increase in texture write bandwidth consumption when rendering using VBT. Since this renderer requires an extra 16 bytes of colour data to be written per sample, the occupancy of the CROP unit is generally triple that of VB at all resolutions when filling the visibility buffer in the forward pass.

4.1.3 Deferred Reads

Figure 10 reports each renderer’s percentage throughput of the Level 1 and texture cache during the deferred pass. For the most part, all global memory accesses are routed through either the L1 or texture cache, and therefore these values reflect the loading of uniform buffers as well as visibility buffer textures.

Figure 10 - L1 Cache and Texture unit occupancy of each renderer's deferred pass. A higher percentage denotes a larger consumption of available bandwidth between the L1/Tex cache and the SMs.

The percentages measured demonstrate comparable bandwidth usage between the two renderers’ deferred passes for a constant triangle count. When taking into account uniform buffer reads, and thus the size of the bound vertex buffers, the two renderers vary only slightly in L1/Tex throughput for all resolutions. VBT uniformly occupies roughly 3% more bandwidth than VB in the deferred subpass at each resolution.


4.1.4 Total Read/Writes

Figure 11 outlines the percentage throughputs of global video memory (VRAM) during a full frame per triangle count. The results show that VB consumes a considerable amount of bandwidth as triangle count increases due to the increased read throughput. This is somewhat offset by VBT's need to write additional data during the forward pass, but VBT's write throughput is, at worst, around 50% greater than that of VB. In contrast, VB occupies almost 500% more read bandwidth than VBT when processing 2 million triangles.

These results demonstrate that VB becomes read-limited as triangle count increases. The far larger vertex buffers must be passed to the forward pipeline for geometry processing, before also being bound to the deferred stage for attribute loading. VBT handles higher triangle counts with great efficiency, and in fact consumes less bandwidth in both reads and writes as geometry detail increases.

Figure 11 - The percentage throughput of VRAM of each renderer per triangle count. The left image shows the share of total throughput consumed by read operations, with the right image demonstrating write operations.

4.1.5 Coherency

Figure 12 presents the level 2 cache hit-rates of each renderer's deferred subpass per-resolution. This cache provides a small amount of high-bandwidth memory through which all global memory accesses are routed during shading. Rendering algorithms whose memory accesses are highly coherent are likely to experience high cache hit-rates, resulting in fewer fetches from VRAM and a more efficient GPU workload.


Figure 12 - Level 2 cache hit-rates of each renderer's deferred pass. The measurements indicate a higher level of memory access coherency when rendering with VB.

These results show a correlation between render target resolution and cache hits. In the case of VB, cache hits increase with resolution, and it consistently maintains a high hit rate of at least 87%. However, VBT suffers from a poorer hit rate across all resolutions and experiences fewer cache hits as resolution increases.

4.2 Streaming Multi-Processor Usage

Figure 13 displays the throughput of the GPU streaming-multiprocessors (SM) over a full frame per resolution. Besides memory usage, the two renderers differ in the calculations they must perform to load and interpolate attributes. The results reflect the workload the SM must execute per frame when running each renderer’s shaders.

Figure 13 - The percentage throughput of the GPU streaming-multiprocessors (SM) during the rendering of a full frame per resolution. A higher percentage denotes a workload which consumes a larger share of the SM's maximum theoretical throughput in the measurement period.


The results reflect the extra calculations that are required in VBT per fragment. The SM workload at 720p using VBT is roughly double that of VB. This appears to be an isolated case, however, as the two renderers scale uniformly to higher resolutions. As resolution increases, VBT typically uses an additional 5% of available throughput compared to VB. These values suggest that the pipelines are not compute-bound and may scale resolution without encountering substantial processing bottlenecks.

4.3 Net Performance Impact

Figure 14 presents the time in milliseconds the GPU takes to complete each subpass in the two renderers. Times are recorded both per triangle count and per resolution to capture how the renderers' net performance scales with each parameter.

It is important to note that since GPU processing is highly parallel, and the hardware seeks to fill stalled threads with work that is waiting to execute, there is likely to be overlap in the execution of the forward and deferred subpasses, despite the latter's dependency on the former's completion. This phenomenon was observed during testing, as the time reported for rendering a whole frame could frequently be less than the sum of its two subpass times.

Figure 14 - The GPU time (ms) to complete each subpass. Measured by binding GPU timestamp queries either side of each subpass in the command buffers and calculating the difference. The left image shows subpass times per resolution, while the right image shows timings per triangle count.

4.3.1 Per Resolution

These measurements demonstrate the correlation between resolution and pass time performance for the two renderers. It is observed how the forward pass of the visibility buffer handles the increase of render target resolution well, due to the light per-fragment workload. The computational complexity of the deferred pass is reflected, however, in its performance at higher resolutions. While the forward pass is observed to take roughly the same amount of time between 720p and 4K, the deferred pass takes over double the time to execute across the same range.


VBT is observed at lower resolutions to take a fraction of the time to execute its forward pass compared to VB. As resolution increases, however, the margin by which the forward pass outperforms that of the visibility buffer uniformly narrows. Similarly, while the deferred pass of VBT is consistently the fastest to execute across all resolutions (generally taking half the time of its preceding forward pass), its gains over the visibility buffer deferred pass become less pronounced at higher resolutions.

4.3.2 Per Triangle Count

The first trend that becomes apparent when varying triangle count is the considerable rise in the forward pass execution time of VB at higher triangle counts. This time increases exponentially with triangle count, as the forward pass must process more and more geometry to resolve visibility. In comparison, VBT’s forward times do not increase at anywhere near the same rate, and never rise above 1 millisecond.

It is important to note that there is no observable change in the execution time of VBT’s deferred pass as triangle count rises. Since the primitive count in the bound vertex buffers never changes and resolution is constant, the deferred pass carries out the same workload regardless of the final tessellated triangle count. In contrast, the deferred pass time of the visibility buffer is seen to nearly triple between the lowest and highest detail levels. These results therefore favour the performance of VBT in all measured cases.

4.3.3 Full Frame Performance

Figure 15 presents the net speed increase of VBT against VB in terms of total frame time per resolution. It is shown that VBT outperforms the visibility buffer in all cases, especially at lower resolutions. The performance gains at these resolutions are pronounced, with the best case being a speed increase of over 2000%.

Results vary with geometry frame occupancy as well as resolution. In this context, frame occupancy denotes the portion of the measured frame in which terrain geometry is visible. Since most calculations in the deferred pass of both pipelines will only run on fragments that have values stored in the visibility buffer, frames which contain more sky and less geometry on-screen will have lighter workloads.


The results show that VBT benefits from lower frame occupancy for all resolutions, as it must carry out more calculations per fragment than VB in order to acquire interpolated attributes. Gains are less pronounced at full-frame occupancy for this reason, and this trend scales with resolution; the tessellation frame time increase between 0% and 100% frame occupancy rises from around 0.15ms at 720p to over 1.4ms at 4K.

Figure 15 - The percentage speed increase in terms of total frame time of VBT vs VB. Measurements are taken at varying frame occupancy levels, dictating the portion of the rendered frame occupied by geometry.

In general, the gain in frame time performance of VBT becomes less noticeable at higher resolutions. In fact, the speed increase of rendering a fully occupied 4K frame is an order of magnitude less than that of a 720p frame at 0% occupancy. As resolution scales, therefore, VBT begins to be limited by the computational expense per fragment of its deferred pass.

5. DISCUSSION

To compare the practical application of any one rendering technique over another, each must be considered within the context of the job it aims to perform. This chapter will therefore consider the implementation of both VB and VBT within the context of video game development, as it remains one of the most widespread use cases for real-time graphics applications.

5.1 Analysis of Results

The results presented in Chapter 4 appear at face value to be far from conclusive in terms of their support for either of the two renderers. Whilst VBT vastly out-performed VB in terms of frame time in most test cases, this was achieved on a test system with very large bandwidth capabilities and a high ceiling in terms of hardware limitations. Therefore, in order to determine the impact of adding hardware tessellation to the visibility buffer on more bandwidth-limited platforms, further analysis is required.

5.1.1 Memory Usage

Section 4.1.1 contains the total working set sizes of the implemented pipelines at varying resolutions and triangle counts. While sets of these sizes do little to trouble the test system used, bandwidth-limited platforms such as mobile phones are likely to be more sensitive to total working set increases. Since the visibility buffer technique aims to minimise memory footprint for the benefit of these platforms, it is important to analyse the working set trade-offs that result from the support of hardware tessellation.

Consequently, it is noted that VBT is extremely resolution-dependent. Since the expense of this pipeline comes per-fragment in the form of a large visibility buffer, each incrementally higher resolution will add large amounts of data to the total working set, irrespective of final geometric quality. Therefore, with respect to total working set size, the situations in which VBT offers noticeable benefits are limited. Specifically, cases where high geometric detail is rendered at lower resolution appear to favour the use of hardware tessellation. However, in the context of a video game, this limits the flexibility of the software to perform consistently across multiple system configurations.

In contrast, VB not only scales well to high resolutions due to its compact size per sample, it may also sacrifice geometric detail to maintain low total working set sizes. This allows a game developer to make design trade-offs, potentially lowering geometric quality at high resolutions on memory-limited systems to maintain target performance. This allows the game to be flexible in an area which the developer has control over, as opposed to sacrificing performance due to screen resolution as would be required using VBT, which would likely limit the available player base. In general, it is determined that the per-fragment memory efficiency of VB results in more manageable working sets as rendering complexity increases.

This assessment is further supported by the results presented in Figure 9, where VBT is shown to consume a far greater amount of write bandwidth than VB in the forward pass. In writing a total of 20 bytes of colour data per sample, VBT is approaching a typical deferred g-buffer in terms of bandwidth consumption. While it is shown that this constitutes only 30% of available throughput on the test hardware, this is known to be a common bottleneck in mobile and integrated systems (Burns and Hunt, 2013). For this reason, this portion of VBT may be prohibitive to such platforms.
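To put the 20-byte figure in context, a rough calculation (an illustration using the per-sample sizes listed in Appendix 1, ignoring overdraw and the depth attachment) shows the forward-pass write volume that VBT implies at each test resolution:

#include <cstdio>

int main()
{
    // Per-sample forward-pass writes in VBT: 4-byte visibility ID plus 16-byte
    // tessellation coordinates, as listed in Appendix 1.
    const double bytesPerSample = 4.0 + 16.0;
    const struct { const char* name; double pixels; } resolutions[] = {
        {"720p", 921600.0}, {"1080p", 2073600.0}, {"1440p", 3686400.0}, {"4K", 8294400.0}};

    for (const auto& r : resolutions) {
        double mibPerFrame  = r.pixels * bytesPerSample / (1024.0 * 1024.0);
        double gibPerSecond = mibPerFrame * 60.0 / 1024.0;  // at a 60 fps target
        std::printf("%-5s %7.1f MiB/frame  %5.2f GiB/s at 60 fps\n",
                    r.name, mibPerFrame, gibPerSecond);
    }
    return 0;
}

Even as a lower bound, the resulting write rate at 4K is an order of magnitude beyond the 720p case, which illustrates why this cost scales so poorly towards bandwidth-limited hardware.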

The deferred subpass throughput measurements of the Level 1 and texture cache help to visualise the trade-off in working set between the two renderers. Despite the differences in their respective memory footprints, both renderers occupy comparable amounts of bandwidth through these units. In the case of VB, this highlights the bandwidth impact of loading a larger vertex buffer — which contains attribute information about the rendered geometry. Conversely, VBT calculates most vertex attributes through interpolation, and thus loads a smaller vertex buffer from memory. However, this comes at the cost of multiple samples of the large tessellation coordinates texture created in the forward pass.

The net result is a higher bandwidth cost in the deferred subpass when using VBT, but the margin is small. It is therefore likely that the renderers’ subpass structure levels the playing field in terms of texture fetches, as these are heavily optimised by the use of Vulkan Input Attachments. Furthermore, since modern graphics systems in general prefer compute-heavy workloads to bandwidth-intensive patterns (Burns and Hunt, 2013), it is unlikely that either renderer is particularly limited by its deferred subpass.
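For reference, the kind of subpass wiring this refers to can be sketched as follows (attachment indices and names are illustrative, not the application's actual render pass description):

#include <vulkan/vulkan.h>

// Illustrative sketch: the forward subpass writes the visibility buffer attachments,
// and the deferred subpass consumes them as input attachments, which allows the
// intermediate reads to stay on-chip on tiled hardware instead of full texture fetches.
void describeSubpasses(VkSubpassDescription (&subpasses)[2])
{
    static const VkAttachmentReference forwardColour[] = {
        {1, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL},   // visibility IDs
        {2, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL},   // tessellation coordinates
    };
    static const VkAttachmentReference deferredInputs[] = {
        {1, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL},
        {2, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL},
    };
    static const VkAttachmentReference swapchainWrite =
        {0, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL};

    subpasses[0] = {};
    subpasses[0].pipelineBindPoint    = VK_PIPELINE_BIND_POINT_GRAPHICS;
    subpasses[0].colorAttachmentCount = 2;
    subpasses[0].pColorAttachments    = forwardColour;

    subpasses[1] = {};
    subpasses[1].pipelineBindPoint    = VK_PIPELINE_BIND_POINT_GRAPHICS;
    subpasses[1].inputAttachmentCount = 2;
    subpasses[1].pInputAttachments    = deferredInputs;
    subpasses[1].colorAttachmentCount = 1;
    subpasses[1].pColorAttachments    = &swapchainWrite;
}

In the deferred fragment shader, such attachments are then read with subpassLoad rather than a sampled texture fetch.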

Indeed, in studying the GPU pass times presented in Section 4.3.1, it is observed that VBT’s deferred pass never takes longer than its preceding forward pass, and remains performant as both resolution and triangle count increase. Similarly, while the deferred pass in VB occasionally matches or exceeds the forward pass in execution time, it is the forward pass which quickly drops off in performance as geometric complexity increases. For VB, this suggests a bottleneck in the geometry processing phase. Therefore, its performance could be improved significantly by implementing triangle culling techniques in a preceding subpass to limit the amount of geometry which the renderer must process (Engel, 2016).

The decrease in performance of VB at higher triangle counts can be further explained by the total VRAM bandwidth usage presented in Figure 11. As triangle count rises, the bandwidth consumption of VBT for both read and write decreases, whereas VB experiences a sharp rise in read throughput at higher detail levels. This is due to the necessity of not only processing geometry once during the visibility phase, but a second time during shading when loading vertex attributes. Therefore, higher triangle counts, combined with the lack of geometry culling, compound the performance deficit between the two renderers. On bandwidth-limited platforms, it would be critical to control geometry detail to achieve acceptable performance with VB.

By contrast, VBT suffers smaller penalties for increasing final detail levels, as it relies on its use of hardware tessellation to generate new triangles on the fly. The situations in which VBT is outperformed by VB in terms of VRAM throughput are observed at lower detail levels. In these cases, the cost incurred by VB to transfer all geometry from host memory is not high enough to warrant the extra bandwidth consumed by VBT’s tessellation coordinates texture.


A further indicator of memory efficiency is cache hit rate. A larger hit rate demonstrates a high level of coherency in memory accesses, allowing the GPU to move the data that is likely to be imminently required to high-bandwidth memory. VB exhibits a consistently high cache hit rate of around 90%, with only 720p resolution resulting in a drop-off — it is likely that this smaller render target does not fully utilise the large cache hierarchy available on the testing hardware. Conversely, VBT suffers relatively poor hit rates, suggesting that memory accesses for the tessellation coordinate buffer are not very coherent. It is possible that the texture format is too large (16 bytes per sample) to be effectively loaded into cache. The fact that VBT suffers yet poorer hit rates as resolution increases seems to confirm this. In this case, it may be that splitting the tessellation coordinates by component into their own, smaller render targets – as suggested in Section 3.3.1 – would allow VBT to more effectively utilise cache hierarchies. In the case of bandwidth-limited systems, leveraging available cache volumes is critical for efficient performance.
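As a purely illustrative example of the sort of per-sample reduction this points towards (this is not the layout proposed in Section 3.3.1, and the precision chosen here is an assumption), two barycentric components could be quantised and packed, with the third reconstructed on unpack:

#include <cstdint>
#include <cmath>

// Hypothetical packing: because barycentric components sum to one, only two need
// to be stored, and 16-bit normalised values may be sufficient for interpolation.
uint32_t packDomainCoord(float u, float v)
{
    auto toUnorm16 = [](float x) {
        return static_cast<uint32_t>(
            std::round(std::fmin(std::fmax(x, 0.0f), 1.0f) * 65535.0f));
    };
    return (toUnorm16(u) << 16) | toUnorm16(v);   // 4 bytes per coordinate pair
}

void unpackDomainCoord(uint32_t packed, float& u, float& v, float& w)
{
    u = float(packed >> 16) / 65535.0f;
    v = float(packed & 0xFFFFu) / 65535.0f;
    w = 1.0f - u - v;                             // third component reconstructed
}

Whether the precision loss from such quantisation is acceptable would depend on the attribute fidelity required, which is the same trade-off noted in Section 5.2.2.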

In summary, it is clear that upon further examination the memory efficiency of VBT in its current form would likely prove prohibitive to bandwidth-limited platforms. Meanwhile, VB suffers heavily in terms of read bandwidth usage (especially at higher detail levels) due to transferring all geometry to video memory per-frame, although this is a factor which can be effectively optimised through culling techniques (Engel, 2016; 2018). By requiring a far larger visibility buffer to retain generated triangles between passes, VBT consumes large amounts of bandwidth at performance-critical stages of the pipeline, without much opportunity for optimisation. VBT would, therefore, likely face performance bottlenecks similar to those of deferred shading on systems with low bandwidth capabilities.

5.1.2 Processor Usage

The results presented in Section 4.2 visualise the consumption of available compute resources by each renderer. They demonstrate how VBT consistently requires a larger processing workload per resolution. This is because VBT must recompute the tessellation domain shader for 3 vertices per fragment to calculate the attributes of generated primitives — a substantial processing cost in addition to the VB deferred stage. Despite this, the streaming-multiprocessor (SM) throughput values show a steady correlation between increase in resolution and compute cost, as opposed to the exponential increase observed with the VB forward pass and triangle count. This further indicates that while the deferred passes of both renderers require substantial calculations, they do not represent the primary limitations of the techniques.
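Expressed on the CPU purely for illustration (the real work happens in VBT's deferred fragment shader, and the attribute set and data flow here are simplified assumptions), the extra per-fragment cost amounts to evaluating the domain function once for each of the generated triangle's three vertices and then interpolating the results with the fragment's own barycentrics:

#include <array>

struct Vertex { float position[3]; float uv[2]; };  // simplified attribute set

// Domain evaluation for one generated vertex: interpolate the patch's control
// points at domain coordinate (u, v, w). A real shader would also apply displacement.
Vertex evaluateDomain(const std::array<Vertex, 3>& patch, float u, float v, float w)
{
    Vertex out{};
    for (int i = 0; i < 3; ++i)
        out.position[i] = u * patch[0].position[i] + v * patch[1].position[i] + w * patch[2].position[i];
    for (int i = 0; i < 2; ++i)
        out.uv[i] = u * patch[0].uv[i] + v * patch[1].uv[i] + w * patch[2].uv[i];
    return out;
}

// Per-fragment work implied by VBT's deferred pass: three domain evaluations,
// then interpolation using the fragment's barycentrics within the generated triangle.
Vertex shadeFragmentAttributes(const std::array<Vertex, 3>& patch,
                               const float domainCoords[3][3],  // stored tessellation coordinates
                               float b0, float b1, float b2)    // fragment barycentrics
{
    Vertex v0 = evaluateDomain(patch, domainCoords[0][0], domainCoords[0][1], domainCoords[0][2]);
    Vertex v1 = evaluateDomain(patch, domainCoords[1][0], domainCoords[1][1], domainCoords[1][2]);
    Vertex v2 = evaluateDomain(patch, domainCoords[2][0], domainCoords[2][1], domainCoords[2][2]);

    Vertex out{};
    for (int i = 0; i < 3; ++i)
        out.position[i] = b0 * v0.position[i] + b1 * v1.position[i] + b2 * v2.position[i];
    for (int i = 0; i < 2; ++i)
        out.uv[i] = b0 * v0.uv[i] + b1 * v1.uv[i] + b2 * v2.uv[i];
    return out;
}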

It is important to note, however, that while this application is tasked only with rendering and shading geometry, modern video games will often leverage the GPU for other purposes besides just graphics. Due to the expense of simulating real-world physics and artificial intelligence (AI), games are beginning to apply the general-purpose compute power of GPUs to logical as well as graphical operations (Blewitt, Ushaw and Morgan, 2013). This means that while a compute-heavy graphics workload — such as that of VB and VBT — may not pose a problem within the scope of this work, the occupancy of the SM units may become substantial when considering other per-frame processes that must be carried out on the GPU. Therefore, real-world applications with demanding GPU workloads may find the respective compute cost of VB and VBT a more sizeable hurdle than these results necessarily suggest.

5.1.3 Net Performance

In real-time graphics applications such as video games, the key performance factor is how quickly a frame can be rendered to the screen. Alongside gameplay logic, AI computations, and physics simulations, a limited budget of target frame time is left over for graphics rendering. This makes any small saving in execution time bought by optimisation very valuable to a developer. As such, the dramatic improvement in overall frame time observed when using VBT is an indicator of its suitability to demanding real-time systems running on capable hardware.

However, the margin of this improvement is heavily dependent on test case, as shown in Figure 15. At low resolutions, it is observed that VBT takes a fraction of the time to execute its forward pass compared with VB. Despite this, cases are also shown in which VBT may start to underperform due to its extra write bandwidth usage. Specifically, in cases where moderate geometric detail is displayed at high resolution, the forward pass execution times of the two renderers are essentially equal. In these cases, appropriate geometry culling techniques could potentially cause VB to outperform VBT. This is likely to be compounded on lower-end platforms, as VBT’s far larger visibility buffer would begin to encounter bandwidth limitations even at lower resolutions.

As a result, the choice of one renderer over the other for implementation into a production graphics application would depend entirely on use case and target platform. This is further demonstrated by observing the difference in performance at varying frame occupancies. Since both pipelines will only shade a fragment which is written to in the visibility buffer, each system is sensitive to the amount of geometry visible on screen. Indeed, since VBT requires far more calculations per fragment in the deferred pass, its perceived advantages over VB are vastly more pronounced when less geometry is visible on screen. When considering practical use cases, this may suggest that VBT would be preferable in the case of rendering high-detail, outdoor scenes with large portions of the frame occupied by sky (as demonstrated by this implementation). Meanwhile, an enclosed, indoor scene requiring less geometric complexity would likely favour the use of VB with appropriate geometry culling, especially on bandwidth-limited platforms.

5.2 Design Considerations and Future Work

Due to both the timescale available for this work and the attempt to isolate specific performance factors of each renderer, design decisions were taken which are unlikely to reflect those of a production application utilising these techniques. This section will outline the performance implications of these choices, as well as the areas of potential improvement in the implementation of the application.

5.2.1 Triangle Culling/Filtering

It has been shown that while both the memory and compute efficiency of VBT is strongly linked to screen resolution (and as such the opportunities for optimising the workload are somewhat limited), VB is limited by the amount of geometry it must process. Methods to alleviate this have been addressed thoroughly in previous research surrounding the visibility buffer, with Schied and Dachsbacher (2015) and Engel (2016) both implementing a depth pre-pass to fill a filtered geometry buffer containing only triangles visible in the current frame. Engel (2016) then performs a further compute pass to cull back-facing and invalid triangles to further reduce the load on the eventual visibility buffer write pass. While such aggressive geometry filtering would probably be overkill for VBT (since such a small amount of geometry is stored in host memory), VB’s overall bandwidth usage and subsequent performance is likely to be significantly improved by such techniques.
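As an indication of the kind of per-triangle tests such a filtering pass applies (a hedged sketch in the spirit of Engel, 2016, not a reproduction of that implementation; frustum tests and near-plane clipping are omitted), each triangle can be rejected if it is back-facing, has zero area, or cannot cover a pixel centre:

#include <cmath>

struct Vec4 { float x, y, z, w; };

// Positions are assumed to be in clip space with positive w, and counter-clockwise
// winding is assumed to be front-facing.
bool triangleVisible(const Vec4& a, const Vec4& b, const Vec4& c,
                     float viewportWidth, float viewportHeight)
{
    // Back-face / zero-area test: signed area of the projected triangle.
    float det = (b.x / b.w - a.x / a.w) * (c.y / c.w - a.y / a.w)
              - (c.x / c.w - a.x / a.w) * (b.y / b.w - a.y / a.w);
    if (det <= 0.0f) return false;

    // Small-triangle test: if the screen-space bounding box never crosses a pixel
    // centre in either axis, the triangle cannot produce any fragments.
    auto toPixelsX = [&](const Vec4& p) { return (p.x / p.w * 0.5f + 0.5f) * viewportWidth; };
    auto toPixelsY = [&](const Vec4& p) { return (p.y / p.w * 0.5f + 0.5f) * viewportHeight; };
    float minX = std::fmin(toPixelsX(a), std::fmin(toPixelsX(b), toPixelsX(c)));
    float maxX = std::fmax(toPixelsX(a), std::fmax(toPixelsX(b), toPixelsX(c)));
    float minY = std::fmin(toPixelsY(a), std::fmin(toPixelsY(b), toPixelsY(c)));
    float maxY = std::fmax(toPixelsY(a), std::fmax(toPixelsY(b), toPixelsY(c)));
    if (std::round(minX) == std::round(maxX) || std::round(minY) == std::round(maxY))
        return false;

    return true;
}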

5.2.2 Support of Hardware Tessellation

The approach taken in developing VBT to support tessellated geometry in a visibility buffer renderer closely follows that suggested by Burns and Hunt (2013). This method relies solely on a g-buffer-like collection of render targets to retain data between passes, and thereby most closely resembles the implementation of the standard visibility buffer. This approach best represents the singular addition of support for hardware tessellation, whilst still maintaining as much of the design pattern of VB as possible.

However, whilst Burns and Hunt (2013) suggest that tessellation coordinates may be stored in an additional 4 bytes per sample, the solutions explored in the development of VBT found no practical way in which this was possible. VBT in its current form requires a total of 16 bytes to store this data, although this could be cut to 9 bytes by following the solution suggested in Section 3.3.1. Consequently, VBT would likely benefit from an alternative method of retaining generated primitives.


In the study conducted by Schied and Dachsbacher (2015), hardware tessellation is supported by additionally storing all visible triangles to a new vertex buffer from the forward pass — using the visibility buffer image to store each triangle’s address in the buffer. This means that generated tessellation primitives may also be stored, negating the need to retain the tessellation coordinates and recompute domain shading in the deferred pass. This method, however, would substantially increase bandwidth usage, and therefore was not considered in the implementation of VBT. Despite this, there are a number of reasons why taking this approach may be preferable in a more developed rendering system than has here been presented.

Firstly, the creation of a new, host-accessible vertex buffer containing only visible geometry allows the system to filter out invisible or invalid triangles in an intermediate compute pass, such as that presented by Engel (2018). Amongst other benefits, this would enable the system to exclude generated primitives which are too small in screen-space to be effectively rasterized, thus eliminating the texturing artefacts presented in Figure 7. Secondly, this method eliminates the need for storing barycentric coordinates of generated primitives in the first pass. In doing so, there is no precision lost in the packing and unpacking of tessellation coordinates, thereby making interpolated attributes as accurate as possible. Following the completion of VBT’s development, it is therefore clear that a solution more closely modelled on those of DAIS (Schied and Dachsbacher, 2015) and Engel (2016, 2018) would likely allow for greater optimisation and visual stability in a production-scale application.
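A greatly simplified, CPU-side analogue of that idea is sketched below (names and layout are hypothetical, and Schied and Dachsbacher additionally memoise triangles so that each is stored only once): visible triangles are appended to a compact buffer, and the returned address is what the visibility image stores.

#include <atomic>
#include <cstdint>
#include <vector>

struct PackedVertex   { float position[3]; float uv[2]; };
struct VisibleTriangle { PackedVertex v[3]; };

// In a shader this would be an atomicAdd on a counter buffer; the buffer itself
// would be sized for the worst-case number of visible triangles.
std::atomic<uint32_t> triangleCounter{0};

uint32_t storeVisibleTriangle(std::vector<VisibleTriangle>& visibleTriangles,
                              const VisibleTriangle& tri)
{
    uint32_t address = triangleCounter.fetch_add(1);
    visibleTriangles[address] = tri;
    return address;   // this value is written into the visibility buffer image
}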

5.2.3 Compute-based Tessellation

Whilst hardware tessellation offers a standardised, hardware-accelerated interface for refining meshes procedurally on the GPU, it allows only limited levels of subdivision, and incurs large memory costs at higher subdivision levels (Advanced Micro Devices, 2013). A recent study conducted by Dupuy, Khoury and Riccio (2019) reported success in leveraging the processing power of modern GPUs to run custom subdivision algorithms in compute shaders, refining geometry manually. This approach allows the developer to define a target screen-space size for generated triangles up to an arbitrary subdivision level, at constant memory cost. This both prevents saturation of the rasteriser and dynamically details geometry depending on its distance from the camera.
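The level-selection logic behind such an approach can be sketched as follows (an illustration of the general idea only, not the algorithm of Dupuy, Khoury and Riccio): an edge is subdivided until its projected length falls below the chosen target pixel size, clamped to whatever maximum depth the implementation supports.

#include <algorithm>
#include <cmath>
#include <cstdint>

// Each halving of an edge roughly halves its projected length, so the required
// number of subdivision steps grows with the log of the length-to-target ratio.
uint32_t subdivisionLevel(float edgeLengthPixels, float targetPixels, uint32_t maxLevel)
{
    if (edgeLengthPixels <= targetPixels) return 0;
    float level = std::ceil(std::log2(edgeLengthPixels / targetPixels));
    return std::min(static_cast<uint32_t>(level), maxLevel);
}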

In the context of this paper, a similar approach could be taken to both limit the memory usage of the subdivision stage and ensure efficient writes to the visibility buffer. If employing an approach such as that discussed in Section 5.2.2, this method might also be used to populate a triangle visibility buffer directly from the compute shader, without the need to go through the graphics pipeline.


5.2.4 Turing Mesh Shaders

In a traditional graphics pipeline, the hardware’s primitive distributor performs fixed-function vertex deduplication every frame, even if the topologies rendered do not change. This comes with a high bandwidth requirement and is wasteful for large, static topologies such as the terrain mesh rendered by VB. Additionally, vertex and attribute fetches are performed even for primitives which are not visible, due to either being outside the view frustum, back-facing, or too small to rasterize. Consequently — as observed in the results of this research — this vertex processing stage creates a considerable bottleneck when triangle-heavy meshes are rendered without geometry culling.

Mesh Shaders (NVIDIA, 2018a) were introduced as part of the Turing graphics architecture (NVIDIA, 2018b), offering an alternative geometry pipeline for the rendering of highly-detailed meshes. By decomposing large topologies into smaller meshlets and re-using these across many frames, the impact of fixed-function stages on geometry processing is reduced, thus providing sizeable bandwidth reduction and high scalability.
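A hedged sketch of what such an offline-built meshlet layout might look like is given below; the per-meshlet limits follow the commonly cited Turing recommendations, and the exact fields are assumptions rather than NVIDIA's specification.

#include <cstdint>
#include <vector>

struct Meshlet {
    uint32_t vertexOffset;     // start into a shared, deduplicated vertex-index list
    uint32_t vertexCount;      // commonly limited to 64 unique vertices per meshlet
    uint32_t primitiveOffset;  // start into a packed local triangle-index list
    uint32_t primitiveCount;   // commonly limited to 126 triangles per meshlet
    float    boundsCenter[3];  // bounding sphere used for per-meshlet culling
    float    boundsRadius;
};

struct MeshletMesh {
    std::vector<uint32_t> vertexIndices;    // indices into the full vertex buffer
    std::vector<uint8_t>  primitiveIndices; // 3 bytes per triangle, local to the meshlet
    std::vector<Meshlet>  meshlets;         // built once, reused every frame
};

Because the meshlets and their bounds are built once and reused, the per-frame deduplication and fetch cost described above is paid at build time rather than at draw time.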

Therefore, employing Mesh Shaders and appropriate geometry culling in the forward pass of a visibility buffer renderer such as VB is likely to vastly improve its performance (Engel, 2019b), and possibly even negate the appeal of hardware tessellation entirely. VB’s sensitivity to rendering large, static topologies indicates the potential of Mesh Shaders for further minimising the memory footprint of deferred renderers, even for extremely detailed scenes.

6. CONCLUSION

A real-time graphics application was here developed to evaluate the memory efficiency and subsequent performance impact of supporting hardware tessellation within a visibility buffer renderer. Two distinct renderers were implemented: the first, a standard visibility buffer pipeline (VB); the second, a novel solution for the additional support of hardware tessellation in the visibility buffer (VBT). The results show that for bandwidth-limited platforms, the associated memory costs of supporting tessellation would likely be impractical. While VBT showed dramatically improved frame-time performance in some test cases, this was found to be largely due to the lack of geometry culling techniques employed in VB, and came at the expense of substantial on-chip bandwidth consumption. This expense undermines the principal goal of the visibility buffer technique: minimising the memory footprint incurred by separating geometry processing from shading.


The test hardware presented a forgiving test environment for these renderers, possessing sizeable bandwidth capabilities, cache hierarchies, and processing power. Although VBT clearly benefitted from increasing the workload in these areas, the results suggest a trend that would see it encounter performance bottlenecks far sooner than VB on lower-end platforms. The associated memory costs of VBT therefore signify its dependency on high-performance hardware – and the solution must be considered amongst other techniques in this bracket when determining its worth.

Therefore, in answer to the proposed research question, this research concludes that while hardware tessellation may offer an attractive performance improvement to specific use cases such as terrain rendering, its suitability to memory-conscious deferred rendering systems is questionable. Certainly, the approach taken in the development of VBT was substandard for this purpose, and was found to inhibit the memory efficiency of the visibility buffer system upon which it was based. Optimisations have been discussed which may reduce the memory cost of VBT, but as the general-purpose compute power of GPUs continues to soar, it must be considered whether fixed-function pipeline stages such as tessellation remain relevant. Graphics pipelines are being continuously optimised to deal efficiently with huge triangle counts, through techniques such as Mesh Shaders (NVIDIA, 2018a) and compute-based geometry filtering, and it is possible that the processing of geometric workloads such as those presented in this paper will soon become trivial.

Nevertheless, the increasing demand for high-fidelity mobile games (Newzoo, 2017) will perpetuate the need for memory-efficient rendering algorithms such as the visibility buffer — and real-time graphics research should maintain a focus on developing methods to minimise the bandwidth requirements associated with rendering detailed 3D scenes. In doing so, games and other real-time applications may continue to push the limits of immersion and visual quality on platforms of all sizes and capabilities.


7. ACKNOWLEDGEMENTS

I would like to sincerely thank Dr Paul Robertson for providing the initial inspiration for this project, as well as constant motivation and support to help to see it through. It is hugely appreciated!

I would also like to extend my gratitude to Aurelio Reis, Seth Schneider, Jeff Kiel and An Yan at NVIDIA for helping to make the testing stage of this project possible, and going out of their way to help me get Nsight Graphics working with Vulkan.

Huge thanks also go to Rys Sommefeldt at AMD, who not only provided me a graphics card to use in the testing of this project, but also offered invaluable advice and guidance about all things graphics.

Finally, I would like to thank Wolfgang Engel for taking the time to speak with me about the visibility buffer, hardware tessellation, and graphics programming in general, and for providing me with an article from the unpublished GPU to help facilitate this research.


8. LIST OF REFERENCES

Akenine-Moller, T., Haines, E. and Hoffman, N. (2018) Real-time rendering. AK Peters/CRC Press.

Advanced Micro Devices (2013) GCN Performance Tweets. Available at: http://developer.amd.com/wordpress/media/2013/05/GCNPerformanceTweets.pdf (Accessed: 22nd April 2019).

Advanced Micro Devices (2015) High Bandwidth Memory. [Hardware Technology] Available at: https://www.amd.com/en/technologies/hbm (Accessed: 12th February 2019).

Blewitt, W., Ushaw, G., Morgan, G. (2013) Applicability of GPGPU Computing to Real-Time AI Solutions in Games, IEEE Transactions on Computational Intelligence and AI in Games, 5(3), pp. 265-275. DOI: 10.1109/TCIAIG.2013.2258156.

Liang, B.-S., Yeh, W.-C., Lee, Y.-C. and Jen, C.-W. (2000) Deferred lighting: a computation-efficient approach for real-time 3-D graphics.

Burns, C.A. and Hunt, W.A. (2013) The visibility buffer: a cache-friendly approach to deferred shading, Journal of Computer Graphics Techniques (JCGT), 2(2), pp. 55-69.

Dear ImGui (2019) [Software API] Available at: https://github.com/ocornut/imgui (Accessed: 20th January 2019).

Dupuy, J., Khoury, J. and Riccio, C. (2019). ‘Adaptive GPU Tessellation with Compute Shaders’, in Engel, W. (ed.) GPU Zen 2. Black Cat Publishing.

Engel, W. (2016). The filtered and culled Visibility Buffer. Available at: http://www.confettispecialfx.com/gdce-2016-the-filtered-and-culled-visibility-buffer-2/ (Accessed: 27th September 2018).

Engel, W. (2018). Diary of a Graphics Programmer – Triangle Visibility Buffer. Available at: https://diaryofagraphicsprogrammer.blogspot.com/2018/03/triangle-visibility-buffer.html (Accessed: 17th November 2018).

Engel, W. (2019a). Skype conversation with Wolfgang Engel, 22nd February.

Engel, W. (2019b). 6th March. Available at: https://twitter.com/wolfgangengel/status/1103368462489931776 (Accessed: 21st April 2019).

GLFW (2019) [Software API] Available at: https://www.glfw.org/ (Accessed: 8th October 2018).


Microsoft Corporation (2018). Direct3D 11 Features. Available at: https://docs.microsoft.com/en-us/windows/desktop/direct3d11/direct3d-11- features#tessellation (Accessed: 28th April 2019).

Newzoo (2017) High Fidelity Mobile Gaming Is on the Rise, Putting Pressure on GPUs. Available at: https://newzoo.com/insights/articles/high-fidelity-mobile-gaming-become- next-gpu-battle-ground/ (Accessed: 17th April 2019).

Niessner, M., Keinert, B., Fisher, M., Stamminger, M., Loop, C. and Schafer, H. (2016) 'Real- Time Rendering Techniques with Hardware Tessellation', Computer Graphics Forum, 35(1), pp. 113-137. DOI: 10.1111/cgf.12714.

NVIDIA (2018a) Introduction to Turing Mesh Shaders. Available at: https://devblogs.nvidia.com/introduction-turing-mesh-shaders/ (Accessed: 14th April 2019).

NVIDIA (2018b) NVIDIA Turing [Graphics Hardware Architecture]. Available at: https://www.nvidia.com/en-gb/geforce/turing/ (Accessed: 27th April 2019).

NVIDIA (2019) Nsight Graphics [Compute Software]. Available at: https://developer.nvidia.com/nsight-graphics (Accessed: 21st March 2019).

O’Conor, K. (2017) GPU Performance for Game Artists. Available at: http://fragmentbuffer.com/gpu-performance-for-game-artists/ (Accessed: 14th February 2019).

OpenGL Mathematics (2019) [Software API] Available at: https://glm.g- truc.net/0.9.9/index.html (Accessed: 8th October 2019).

Rockstar Games (2012) Max Payne 3 [Video game]. Rockstar Games.

Schied, C. and Dachsbacher, C. (2015) Deferred attribute interpolation for memory-efficient deferred shading. ACM, pp. 43.

Statcounter (2019) Desktop Screen Resolution Stats Worldwide. Available at: http://gs.statcounter.com/screen-resolution-stats/desktop/worldwide (Accessed: 17th April 2019).

STB Image (2019) [Software API] Available at: https://github.com/nothings/stb (Accessed: 16th October 2018).

The Khronos Group (2019) The Vulkan API. Available at: https://www.khronos.org/vulkan/ (Accessed: 5th October 2018).

Vulkan Memory Allocator (2019) [Software API] Available at: https://github.com/GPUOpen- LibrariesAndSDKs/VulkanMemoryAllocator (Accessed: 14th October 2018).


9. APPENDICES

Appendix 1

Presented here is the breakdown of the total working set of each renderer per resolution and per triangle count, as presented in Table 3. The working set was determined to include the size of geometry stored in host memory as well as the frame buffer attachments used to form the renderer’s visibility buffer. The tables below outline the individual sample/instance size of each resource, along with the number of instances required for each resolution and triangle count. Working set size was then determined by summing, for each resource, the product of its per-instance size and its instance count.

Geometry stored in host memory per triangle count:

Tri Count    No. Vertices (VB)    No. Indices (VB)    No. Vertices (VBT)    No. Indices (VBT)
250k         123,201              735,000             196                   1,014
580k         293,764              1,756,086           196                   1,014
1000k        512,656              3,067,350           196                   1,014
2000k        1,008,016            6,036,054           196                   1,014

Visibility buffer samples per resolution:

Resolution    No. Pixels
720p          921,600
1080p         2,073,600
1440p         3,686,400
4K            8,294,400

Size per sample/instance of each resource:

Resource       Size (bytes)
Vertex         32
Index          4
Visibility     4
Tess Coords    16
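Applying that calculation to one configuration from the tables (a small worked example; only the listed resources are counted, so any further attachments such as depth are excluded):

#include <cstdint>
#include <cstdio>

int main()
{
    // VB at 1000k triangles and 1080p, using the counts and per-instance sizes above.
    const uint64_t vertexCount = 512656,  vertexSize     = 32;  // bytes
    const uint64_t indexCount  = 3067350, indexSize      = 4;
    const uint64_t pixelCount  = 2073600, visibilitySize = 4;

    uint64_t workingSetBytes = vertexCount * vertexSize
                             + indexCount  * indexSize
                             + pixelCount  * visibilitySize;

    std::printf("VB working set (1000k tris, 1080p): %.1f MiB\n",
                workingSetBytes / (1024.0 * 1024.0));
    return 0;
}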
