Deferred Rendering Using Compute Shaders

A comparative study using shader model 4.0 and 5.0

Benjamin Golba

This thesis is submitted to the Department of Interaction and System Design at Blekinge Institute of Technology in partial fulfillment of the requirements for the Bachelor degree in Computer Science. The thesis is equivalent to 10 weeks of full-time studies.

Contact Information:
Author: Benjamin Golba
Address: Folkparksvägen 10:17, 372 40 Ronneby
E-mail: [email protected]

University advisor: Stefan Petersson
Department of Software Engineering and Computer Science
Address: Soft Center, RONNEBY
Phone: +46 457 38 58 15

Department of Interaction and System Design
Blekinge Institute of Technology
SE - 372 25 RONNEBY, Sweden
Internet: http://www.bth.se/tek/ais
Phone: +46 457 38 58 00
Fax: +46 457 271 25

Abstract

Game developers today put a lot of effort into their games. Consumers are hard to please and demand games that provide both fun and visual quality. This is why developers aim to make the most of the hardware resources available to them to achieve the best possible quality. It is easy to use too many performance-demanding techniques in a game, making the game unplayable. The hard part is to make the game look good without decreasing the performance. This can be done by using techniques in a smart way to make the graphics as smooth and efficient as they can be without compromising the visual quality. One of these techniques is deferred rendering.

The latest version of Microsoft's graphics platform, DirectX 11, comes with several new features. One of these is the Compute shader, a feature that makes it easier to execute general computation on the graphics card. Developers do not need DirectX 11 cards to use this feature, though. Microsoft has made it available on graphics cards made for DirectX 10 as well. There are, however, a few differences between the two versions. The focus of this report is to investigate the possible performance differences between these versions when using deferred rendering. To investigate this, an application was made that supports both shader model 4 and shader model 5 of the compute shader.

Keywords

Deferred rendering, DirectX, Direct3D, Vertex shader, Pixel shader, Compute shader

Table of contents

Abstract
  Keywords
1. Introduction
  1.1 Background
  1.2 Research objectives
  1.3 Hypothesis
  1.4 Methodology
  1.5 Delimitations
  1.6 Acknowledgements
2. Programming in three dimensions using Direct3D
  2.1 Primitives
  2.2 Resources
    2.2.1 Textures
    2.2.2 Buffers
    2.2.3 Unordered Access Views
  2.3 Programmable shaders
    2.3.1 Vertex shader
    2.3.2 Pixel shader
    2.3.3 Compute shader
3. Deferred rendering
  3.1 Forward rendering
  3.2 Deferred rendering
    3.2.1 Multiple Render Targets
    3.2.2 Filling the G-buffer
    3.2.3 Rendering lights
4. Prototype performance
  4.1 The prototype
  4.2 Performance testing
    4.2.1 Test 1
    4.2.2 Test 2
    4.2.3 Test 3
    4.2.4 Test 4
5. Discussion and conclusion
  5.1 Discussion of test 1 and 2
  5.2 Discussion of test 3 and 4
  5.3 Discussion
  5.2 Conclusion
6. References
  6.1 Bibliography
  6.2 Papers
  6.3 Presentations
  6.4 Websites
Appendix A – Compute shader code
Appendix B – G-buffer filling code

1. Introduction

This chapter introduces what this thesis is about and the work I have put into it. I will present the background of the topic as well as the methodology chosen to investigate the hypothesis. Research objectives and delimitations are also presented in this chapter.

1.1 Background

Most games that use some kind of graphical interface have a rendering procedure. There are two main rasterization-based rendering methods today: forward rendering and deferred rendering. This report will focus on deferred rendering using Microsoft DirectX and Compute shaders [15]. Forward rendering, which
Recommended publications
  • Evolution of Programmable Models for Graphics Engines (High
    Hello and welcome! Today I want to talk about the evolution of programmable models for graphics engine programming for algorithm development. My name is Natalya Tatarchuk (some folks know me as Natasha), and I am director of global graphics at Unity. I recently joined Unity, right after having helped ship the original Destiny at Bungie, where I was the graphics lead and engineering architect, and led the graphics team for Destiny 2, shipping this year. Before that, I led the graphics research and demo team at AMD, helping drive and define graphics APIs such as DirectX 11 and define GPU hardware features together with the architecture team. Oh, and I developed a bunch of graphics algorithms and demos when I was there too. At Unity, I am helping to define a vision for the future of graphics and help drive the graphics technology forward. I am lucky because I get to do it with an amazing team of really talented folks working on graphics at Unity! In today's talk I want to touch on the programming models we use for real-time graphics, and how we could possibly improve things. As all in the room will easily agree, what we currently have as programming models for graphics engineering are rather complex beasts. We have numerous dimensions in that domain: modern graphics programming lives on top of a very fragmented and complex platform and API ecosystem. For example, this is a snapshot of the more than 25 platforms that Unity supports today, including PC, consoles, VR, and mobile platforms, all with varied hardware, divergent graphics APIs and feature sets.
  • Rendering 13, Deferred Shading, a Unity Tutorial
    Catlike Coding, Unity C# Tutorials
    Rendering 13: Deferred Shading
    Explore deferred shading. Fill Geometry Buffers. Support both HDR and LDR. Work with Deferred Reflections.
    This is part 13 of a tutorial series about rendering. The previous installment covered semitransparent shadows. Now we'll look at deferred shading. This tutorial was made with Unity 5.5.0f3.
    [Figure: The anatomy of geometry.]
    1 Another Rendering Path
    Up to this point we've always used Unity's forward rendering path. But that's not the only rendering method that Unity supports. There's also the deferred path. And there are also the legacy vertex lit and the legacy deferred paths, but we won't cover those. So there is a deferred rendering path, but why would we bother with it? After all, we can render everything we want using the forward path. To answer that question, let's investigate their differences.
    1.1 Switching Paths
    Which rendering path is used is defined by the project-wide graphics settings. You can get there via Edit / Project Settings / Graphics. The rendering path and a few other settings are configured in three tiers. These tiers correspond to different categories of GPUs. The better the GPU, the higher a tier Unity uses. You can select which tier the editor uses via the Editor / Graphics Emulation submenu.
    [Figure: Graphics settings, per tier.]
    To change the rendering path, disable Use Defaults for the desired tier, then select either Forward or Deferred as the Rendering Path.
    1.2 Comparing Draw Calls
    I'll use the Shadows Scene from the Rendering 7, Shadows tutorial to compare both approaches.
  • Real-Time Lighting Effects Using Deferred Shading
    Real-time Lighting Effects using Deferred Shading
    Michal Ferko. Supervised by: Michal Valient. Faculty of Mathematics, Physics and Informatics, Comenius University, Bratislava / Slovakia
    Abstract
    Rendering realistic objects at interactive frame rates is a necessary goal for many of today's applications, especially computer games. However, most rendering engines used in these games induce certain limitations regarding moving of objects or the amount of lights used. We present a rendering system that helps overcome these limitations while the system is still able to render complex scenes at 60 FPS. Our system uses Deferred Shading with Shadow Mapping for a more efficient way to synthesize lighting coupled with Screen-Space Ambient Occlusion to fine-tune the final shading. We also provide a way to render transparent objects efficiently without encumbering the CPU.
    Keywords: Real-time Rendering, Deferred Shading, High-dynamic range rendering, Tone-mapping, Order-Independent Transparency, Ambient Occlusion, Screen-Space Ambient Occlusion, Stencil Routed A-Buffer
    We are targeting OpenGL 3 capable hardware, because we require the framebuffer object features as well as multiple render targets.
    2 Related Work
    There are many implementations of Deferred Shading and this concept has been widely used in modern games [15] [12] [5], coupled with techniques used in our paper as well as certain other. Deferred Shading does not directly allow rendering of transparent objects and therefore, we need to use a different method to render transparent objects. There are several approaches to hardware-accelerated rendering of transparent objects without the need to sort geometry. This group of algorithms is referred to as Order-Independent Transparency. An older approach is Depth Peeling [7] [4], which requires N scene rendering passes to capture N layers of transparent geometry.
  • More Efficient Virtual Shadow Maps for Many Lights
    More Efficient Virtual Shadow Maps for Many Lights
    Ola Olsson, Markus Billeter, Erik Sintorn, Viktor Kämpe, and Ulf Assarsson (Invited Paper)
    Abstract—Recently, several algorithms have been introduced that enable real-time performance for many lights in applications such as games. In this paper, we explore the use of hardware-supported virtual cube-map shadows to efficiently implement high-quality shadows from hundreds of light sources in real time and within a bounded memory footprint. In addition, we explore the utility of ray tracing for shadows from many lights and present a hybrid algorithm combining ray tracing with cube maps to exploit their respective strengths. Our solution supports real-time performance with hundreds of lights in fully dynamic high-detail scenes.
    Index Terms—Computer graphics, GPU, real-time shading, shadows, virtual texturing.
    I. INTRODUCTION
    In recent years, several techniques have been presented that enable real-time performance for applications such as games using hundreds to many thousands of lights.
    ... for shading is much more important. To this end we use Clustered Deferred Shading [3], as our starting point. This algorithm offers the highest light-culling efficiency among current real-time many-light algorithms and the most robust shading performance. Moreover, clustered shading provides tight 3D bounds around groups of samples in the frame buffer and therefore can be viewed as a fast voxelization of the visible geometry. Thus, as we will show, these clusters provide opportunities for efficient culling of shadow casters and allocation of shadow map memory.
    A. Contributions
    We contribute an efficient culling scheme, based on clusters, which is used to render shadow-casting geometry to many cube shadow maps. We demonstrate that this can enable real-time rendering performance using shadow maps for hundreds of
  • A Novel Multithreaded Rendering System Based on a Deferred Approach
    VIII Brazilian Symposium on Games and Digital Entertainment, Rio de Janeiro, RJ – Brazil, October 8th-10th, 2009
    A Novel Multithreaded Rendering System based on a Deferred Approach
    Jorge Alejandro Lorenzon (Universidad Austral, Argentina, [email protected]) and Esteban Walter Gonzalez Clua (Media Lab – UFF, Brazil, [email protected])
    Figure 1: Mix of the final illuminated picture, the diffuse color buffer and the normal buffer
    Abstract
    This paper presents the architecture of a rendering system designed for multithreaded rendering. The implementation of the architecture following a deferred rendering approach shows gains of 65% on a dual core machine.
    Keywords: multithreaded rendering, deferred rendering, DirectX 11, command buffer, thread pool
    1. Introduction
    Game engines and 3D software are constantly changing as the underlying hardware and low level APIs evolve. The main driving force of change is the pursuit of greater performance for 3D software, which means, pushing more polygons with more realistic models of illumination and shading techniques to the
    Therefore, the architecture of newer game engines must include fine-grained multithreaded algorithms and systems. Fortunately for some systems like physics and AI this can be done. However, when it comes to rendering there is one big issue: All draw and state calls must go to the graphics processing unit (GPU) in a serialized manner. This limits game engines as only one thread can actually execute draw calls to the graphics card. Adding to the problem, draw calls and state management of the graphics pipeline are expensive for the CPU as there is a considerable overhead created by the API and driver. For this reason, most games and 3D applications are CPU bound and rely on batching 3D models to feed the GPU.
    Microsoft, aware of this problem, is pushing forward a new multithreaded graphics API for the PC, Direct3D11.
  • NVIDIA GPU Programming Guide
    Version 2.4.0
    Notice
    ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, “MATERIALS”) ARE BEING PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication or otherwise under any patent or patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all information previously supplied. NVIDIA Corporation products are not authorized for use as critical components in life support devices or systems without express written approval of NVIDIA Corporation.
    Trademarks
    NVIDIA, the NVIDIA logo, GeForce, and NVIDIA Quadro are registered trademarks of NVIDIA Corporation. Other company and product names may be trademarks of the respective companies with which they are associated.
    Copyright © 2005 by NVIDIA Corporation. All rights reserved.
    HISTORY OF MAJOR REVISIONS
    Version  Date        Changes
    2.4.0    07/08/2005  Updated cover; added GeForce 7 Series content
    2.3.0    02/08/2005  Added 2D & Video Programming chapter; added more SLI information
    2.2.1    11/23/2004  Minor formatting improvements
    2.2.0    11/16/2004  Added normal map format advice; added ps_3_0 performance advice; added General Advice chapter
    2.1.0    07/20/2004  Added Stereoscopic Development chapter
    2.0.4    07/15/2004  Updated MRT section
    2.0.3    06/25/2004  Added Multi-GPU Support chapter
    Table of Contents
    Chapter 1.
  • Efficient Virtual Shadow Maps for Many Lights
    This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TVCG.2015.2418772, IEEE Transactions on Visualization and Computer Graphics
    More Efficient Virtual Shadow Maps for Many Lights
    Ola Olsson, Markus Billeter, Erik Sintorn, Viktor Kämpe, and Ulf Assarsson (Invited Paper)
    Abstract—Recently, several algorithms have been introduced that enable real-time performance for many lights in applications such as games. In this paper, we explore the use of hardware-supported virtual cube-map shadows to efficiently implement high-quality shadows from hundreds of light sources in real time and within a bounded memory footprint. In addition, we explore the utility of ray tracing for shadows from many lights and present a hybrid algorithm combining ray tracing with cube maps to exploit their respective strengths. Our solution supports real-time performance with hundreds of lights in fully dynamic high-detail scenes.
    I. INTRODUCTION
    In recent years, several techniques have been presented that enable real-time performance for applications such as
    ... algorithm offers the highest light-culling efficiency among current real-time many-light algorithms and the most robust shading performance. Moreover, clustered shading provides tight 3D bounds around groups of samples in the frame buffer and therefore can be viewed as a fast voxelization of the visible geometry. Thus, as we will show, these clusters provide opportunities for efficient culling of shadow casters and allocation of shadow map memory.
    A. Contributions
    We contribute an efficient culling scheme, based on clusters, which is used to render shadow-casting geometry to many cube shadow maps.
  • Wipein - F- - a 3D Action Game
    WipeIn - F- - A 3D Action Game
    Bachelor's Thesis, Computer Science and Engineering Programme
    CHRISTOPHER ANDERSSON, JESPER LINDH, MIKAEL MÖLLER, MIKAEL OLAISSON, KARL SCHMIDT, CARL-JOHAN SÖDERSTEN, ALLAN WANG
    Department of Computer Science and Engineering, CHALMERS UNIVERSITY OF TECHNOLOGY, Gothenburg, Sweden 2011
    Bachelor's Thesis DATX11-09 - Rally Sport Racing Game
    Abstract
    The following thesis describes a case study of 3D game programming. It involves the evaluation of several techniques commonly used in real-time rendering, as well as some associated fields such as modelling, collision handling and sound. We will investigate which of the many options available are the most efficient, as well as which areas are preferably put aside, in the aim of achieving an entertaining and visually appealing 3D computer game within a short time span.
    Contents
    1 Introduction
      1.1 Background
      1.2 Purpose
      1.3 Problem
      1.4 Limitations
        1.4.1 Contents
        1.4.2 Areas of focus
        1.4.3 Open-source code
        1.4.4 Computer power
      1.5 Method
        1.5.1 Choice of programming language and framework
        1.5.2 API
        1.5.3 Development process
      1.6 Game design
    2 Graphics
      2.1 Pipeline
      2.2 The application stage
      2.3 The geometry stage
      2.4 The rasteriser stage
        2.4.1 Hidden surface determination
      2.5 Shading
        2.5.1 The Phong Shading Model
        2.5.2 Bidirectional Reflectance Distribution Functions
  • Real Shading in Unreal Engine 4 by Brian Karis, Epic Games
    Real Shading in Unreal Engine 4
    by Brian Karis, Epic Games
    Figure 1: UE4: Infiltrator demo
    Introduction
    About a year ago, we decided to invest some time in improving our shading model and embrace a more physically based material workflow. This was driven partly by a desire to render more realistic images, but we were also interested in what we could achieve through a more physically based approach to material creation and the use of material layering. The artists felt that this would be an enormous improvement to workflow and quality, and I had already seen these benefits first hand at another studio, where we had transitioned to material layers that were composited offline. One of our technical artists here at Epic experimented with doing the layering in the shader with promising enough results that this became an additional requirement.
    In order to support this direction, we knew that material layering needed to be simple and efficient. With perfect timing came Disney's presentation [2] concerning their physically based shading and material model used for Wreck-It Ralph. Brent Burley demonstrated that a very small set of material parameters could be sophisticated enough for offline feature film rendering. He also showed that a fairly practical shading model could closely fit most sampled materials. Their work became an inspiration and basis for ours, and like their “principles,” we decided to define goals for our own system:
    Real-Time Performance
    • First and foremost, it needs to be efficient to use with many lights visible at a time.
    Reduced Complexity
    • There should be as few parameters as possible.
  • Real-Time 2D Manipulation of Plausible 3D Appearance Using Shading and Geometry Buffers Carlos Jorge Zubiaga Pena
    Real-time 2D manipulation of plausible 3D appearance using shading and geometry buffers
    Carlos Jorge Zubiaga Peña
    To cite this version: Carlos Jorge Zubiaga Peña. Real-time 2D manipulation of plausible 3D appearance using shading and geometry buffers. Other [cs.OH]. Université de Bordeaux, 2016. English. NNT: 2016BORD0178. tel-01486698
    HAL Id: tel-01486698
    https://tel.archives-ouvertes.fr/tel-01486698
    Submitted on 10 Mar 2017
    HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
    Thesis presented at Université de Bordeaux, École Doctorale de Mathématiques et d'Informatique, by Carlos Jorge Zubiaga Peña, to obtain the degree of Doctor. Specialty: Computer Science
    Real-time 2D manipulation of plausible 3D appearance using shading and geometry buffers
    Date of defense: 7 November 2016
    Before the examination committee composed of:
    Diego Gutierrez, Professor, Universidad de Zaragoza (Reviewer)
    Daniel Sýkora, Associate Professor, Czech Technical University in Prague (Reviewer)
    Pascal Guitton, Professor, Université Bordeaux (President)
    David Vanderhaeghe, Maître de Conférences, Université de Toulouse (Examiner)
    Xavier Granier, Professor, Institut d'Optique (Examiner)
    Pascal Barla, Research Scientist, Inria (Advisor)
    2016
    Abstract
    Traditional artists paint directly on a canvas and create plausible appearances of real-world scenes.
  • The Opengl Rendering Pipeline
    The OpenGL Rendering Pipeline
    CSE 781, Winter 2010, Han-Wei Shen
    Brief History of OpenGL
      Originated from a proprietary API called Iris GL from Silicon Graphics, Inc.
      Provide access to graphics hardware capabilities at the lowest possible level that still provides hardware independence
      The evolution is controlled by OpenGL Architecture Review Board, or ARB.
      OpenGL 1.0 API finalized in 1992, first implementation in 1993
      In 2006, OpenGL ARB became a workgroup of the Khronos Group
      10 revisions since 1992
    OpenGL Evolution
      1.1 (1997): vertex arrays and texture objects
      1.2 (1998): 3D textures
      1.3 (2001): cubemap textures, compressed textures, multitextures
      1.4 (2002): mipmap generation, shadow map textures, etc
      1.5 (2003): vertex buffer object, shadow comparison functions, occlusion queries, non-power-of-2 textures
    OpenGL Evolution
      2.0 (2004): vertex and fragment shading (GLSL 1.1), multiple render targets, etc
      2.1 (2006): GLSL 1.2, pixel buffer objects, etc
      3.0 (2008): GLSL 1.3, deprecation model, etc
      3.1 (2009): GLSL 1.4, texture buffer objects, move much of deprecated functions to ARB compatible extension
      3.2 (2009)
    OpenGL Extensions
      New features/functions are marked with prefix
      Supported only by one vendor: NV_float_buffer (by nvidia)
      Supported by multiple vendors: EXT_framebuffer_object
      Reviewed by ARB: ARB_depth_texture
      Promoted to standard OpenGL API
    Deprecation Model, Contexts, and Profiles
      Redundant and inefficient functions are deprecated, to be removed in the future: glBegin(), glEnd()
      OpenGL Contexts – data
  • Deferred Shading Tutorial
    Deferred Shading Tutorial
    Fabio Policarpo (CheckMate Games, [email protected]) and Francisco Fonseca (CheckMate Games; Pontifical Catholic University of Rio de Janeiro, ICAD/Igames/VisionLab, [email protected])
    1. Introduction
    Techniques usually considered non-interactive a few years ago are now possible in real time using the flexibility and speed of new programmable graphics hardware. An example of that is the deferred shading technique, which is an approach that postpones shading calculations for a fragment until the visibility of that fragment is completely determined. In other words, it implies that only fragments that really contribute to the resultant image are shaded. Although deferred shading has become practical for real-time applications in recent years, this technique was first published in 1988 by Michael Deering et al. [Deering88]. In that work, the authors proposed a VLSI system where a pipeline of triangle processors rasterizes the geometry, and then a pipeline of shading processors applies Phong shading [Phong75] with multiple light sources to such geometry. After the initial research performed by Deering et al., the next relevant work involving deferred shading was developed by Saito and Takahashi [Saito90] in 1990. The authors of this article proposed a rendering technique that produces 3D images that favor the recognition of shapes and patterns, since shapes can be readily understood if certain geometric properties are enhanced. In order to optimize the enhancement process, geometric properties of the surfaces are preserved as Geometric-Buffers (G-buffers). So, by using G-buffers as intermediate results, artificial enhancement processes are separated from geometric processes (projection and hidden surface removal) and physical processes (shading and texture mapping), and performed as a post-processing pass.