Modular and High Performance Shader Development

Total Page:16

File Type:pdf, Size:1020Kb

Modular and High Performance Shader Development Shader Components: Modular and High Performance Shader Development YONG HE, Carnegie Mellon University TIM FOLEY, NVIDIA TEGUH HOFSTEE, Carnegie Mellon University HAOMIN LONG, Tsinghua University KAYVON FATAHALIAN, Carnegie Mellon University at a sucient rate. Many of these commands pertain to binding Modern game engines seek to balance the conicting goals of high ren- shader parameters (textures, buers, etc.) to the graphics pipeline. dering performance and productive software development. To improve CPU This problem is suciently acute that a new generation of real-time performance, the most recent generation of real-time graphics APIs provide graphics APIs [Khronos Group, Inc. 2016; Microsoft 2017] has been new primitives for performing ecient batch updates to shader parameters. designed to provide new primitives for performing ecient batch However, modern game engines featuring large shader codebases have strug- updates to shader parameters. However, modern game engines have gled to take advantage of these benets. The problem is that even though shader parameters can be organized into ecient modules bound to the been unable to fully take advantage of this functionality. pipeline at various frequencies, modern shading languages lack correspond- The problem is that even though shader parameters can now be ing primitives to organize shader logic (requiring these parameters) into organized into ecient modules and bound to the pipeline only at modules as well. The result is that complex shaders are typically compiled to the necessary frequencies, modern shading languages lack corre- use a monolithic block of parameters, defeating the design, and performance sponding primitives to organize shader logic (that requires these benets, of the new parameter binding API. In this paper we propose to parameters) into modules as well. As a result, complex shaders are resolve this mismatch by introducing shader components, a rst-class unit compiled into monolithic programs that access a monolithic block of modularity in a shader program that encapsulates a unit of shader logic of parameters. These parameters are typically updated en masse by and the parameters that must be bound when that logic is in use. We show engines, defeating the design, and performance benets, of the new that by building sophisticated shaders out of components, we can retain parameter binding API. essential aspects of performance (static specialization of the shader logic in use and ecient update of parameters at component granularity) while In this paper we propose a solution to this mismatch by introduc- maintaining the modular shader code structure that is desirable in today’s ing shader components, a rst-class unit of modularity in a shader high-end game engines. program that encapsulates both the shader logic of a feature and the parameters that must be bound when that feature is in use. From CCS Concepts: •Computing methodologies ! Graphics systems and in- the perspective of a shader writer, shader components provide a terfaces; mechanism that can be used to organize large shader codebases. Additional Key Words and Phrases: shading languages, real-time rendering From the perspective of a game engine, shader components provide ACM Reference format: both the logical units (e.g., a specic material, an animation eect, Yong He, Tim Foley, Teguh Hofstee, Haomin Long, and Kayvon Fatahalian. and a lighting model) used to assemble shading features into com- 2017. Shader Components: Modular and High Performance Shader Develop- plete shaders, and also the granularity at which blocks of shader ment. ACM Trans. Graph. 36, 4, Article 100 (July 2017), 11 pages. parameters (needed as inputs to these features) are bound to the DOI: http://dx.doi.org/10.1145/3072959.3073648 graphics pipeline. By supporting shader components directly in a shading language, our system is able to provide services such as 1 INTRODUCTION static checking of shader components against interfaces, specializa- A modern game engine must achieve high performance to render tion of complete shaders to a specic set of components (features) detailed scenes with complex materials and lighting. At the same in use, and generation of shader parameter block interfaces that time, high productivity is required when authoring and maintaining facilitate ecient binding. These services allow a well-written en- the large libraries of shader code that dene materials and lighting gine to retain the benets of modular shader software development, in these scenes. Today, a major challenge limiting the performance while also realizing the low CPU overhead and high rendering per- of real-time rendering systems is the inability for the CPU (even formance promised by modern graphics APIs. a multi-core CPU) to supply the GPU with rendering commands Our specic contributions include: • The design of the shader component concept and its imple- Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed mentation in a shader compiler library. This implementa- for prot or commercial advantage and that copies bear this notice and the full citation tion includes a shading language front-end and host side on the rst page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or APIs that engines use to assemble components into com- republish, to post on servers or to redistribute to lists, requires prior specic permission plete shaders and to allocate and bind blocks of shader and/or a fee. Request permissions from [email protected]. parameters. © 2017 Copyright held by the owner/author(s). Publication rights licensed to ACM. 0730-0301/2017/7-ART100 $15.00 • A reference implementation of a large library of shader com- DOI: http://dx.doi.org/10.1145/3072959.3073648 ponents and a Vulkan-based rendering engine that achieves ACM Transactions on Graphics, Vol. 36, No. 4, Article 100. Publication date: July 2017. 100:2 • Yong He, Tim Foley, Teguh Hofstee, Haomin Long, and Kayvon Fatahalian high rendering performance when using shaders assembled paper will ignore xed-function state and focus only on generating from these components. Specically, we demonstrate that and selecting shader variants; statements made about the handling our design simultaneously achieves modular shader author- of variants can be extended, without loss of generality, to PSOs. ing and ecient shader parameter binding, yielding over 2× lower CPU cost than a system implemented using current 2.2 Composing Shaders from Modules AAA engine best practices. Consider the nested loops from the previous section. The right shader variant to use in the inner loop might depend on contribu- 2 BACKGROUND tions from many sources. The object might contribute animation, 2.1 Organizing a Renderer for Performance tessellation, or other geometric eects. The material might con- In order to achieve high performance, a renderer must minimize tribute logic to fetch and composite texture layers. The render pass changes to GPU state. This state comprises the conguration of itself might contribute logic to evaluate light sources, write to a G- xed-function pipeline stages, compiled shader kernels to be used buer, etc. Figure 1 depicts one way of composing a shader variant by programmable stages, and a set of shader parameter bindings that from these dierent concerns. provide argument values to kernel parameters. For an engine featuring a large library of shaders and eects it Changing parts of the GPU state has costs, in both CPU and is desirable that disparate concerns or features can be dened as GPU time. CPU costs include time spent by the application and composable modules. For example, any given surface shader should driver to prepare and send hardware commands. GPU costs include work with any given light, and vice versa. Dierent modules are possible hardware pipeline stalls waiting for work using an old illustrated as dierent boxes at the left in Figure 1, grouped by state to complete. Both kinds of costs can negatively impact frame concern. rates, depending on whether an application is CPU- or GPU-bound. In other domains, it is common to compose separately-developed Reducing CPU costs during rendering frees CPU resources for other modules dynamically at runtime, by using a level of indirection key engine tasks such as AI, animation, audio, or input processing. (e.g., using function pointers). However, in the context of real-time Every millisecond counts when, e.g., only 11ms are available per shaders, the cost of runtime indirection is usually prohibitive, and frame in VR. almost all shader composition is achieved by static specialization. In order to reduce state changes, engines often organize rendering As illustrated in Figure 1, a typical way for an engine to perform work around dierent rates of change. For example, a rendering specialization is to use the C preprocessor to perform ad hoc code pass can be thought of in terms of a set of nested loops, e.g.: generation. The code for a given feature is guarded with #ifdefs and enabled by a #define. In this case, a shader compiler must be for(v in view) for(m in materials) invoked for each combination of features, to generate a statically for(o in objects) specialized variant that “links” dierent features together. When draw(v, m, o); shipping a game, all required variants are typically compiled ahead The key observation is that some state (e.g., per-object transforma- of time, and stored in a variant database. tions) changes at a higher rate than others (e.g., view parameters). Because specialized variants are generated statically, the desired In order to maximize CPU eciency when changing GPU state, behavior of composing modules dynamically at runtime must be graphics APIs provide mechanisms to group state into objects that achieved by looking up an appropriate variant in the database.
Recommended publications
  • GLSL 4.50 Spec
    The OpenGL® Shading Language Language Version: 4.50 Document Revision: 7 09-May-2017 Editor: John Kessenich, Google Version 1.1 Authors: John Kessenich, Dave Baldwin, Randi Rost Copyright (c) 2008-2017 The Khronos Group Inc. All Rights Reserved. This specification is protected by copyright laws and contains material proprietary to the Khronos Group, Inc. It or any components may not be reproduced, republished, distributed, transmitted, displayed, broadcast, or otherwise exploited in any manner without the express prior written permission of Khronos Group. You may use this specification for implementing the functionality therein, without altering or removing any trademark, copyright or other notice from the specification, but the receipt or possession of this specification does not convey any rights to reproduce, disclose, or distribute its contents, or to manufacture, use, or sell anything that it may describe, in whole or in part. Khronos Group grants express permission to any current Promoter, Contributor or Adopter member of Khronos to copy and redistribute UNMODIFIED versions of this specification in any fashion, provided that NO CHARGE is made for the specification and the latest available update of the specification for any version of the API is used whenever possible. Such distributed specification may be reformatted AS LONG AS the contents of the specification are not changed in any way. The specification may be incorporated into a product that is sold as long as such product includes significant independent work developed by the seller. A link to the current version of this specification on the Khronos Group website should be included whenever possible with specification distributions.
    [Show full text]
  • First Person Shooting (FPS) Game
    International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 04 | Apr-2018 www.irjet.net p-ISSN: 2395-0072 Thunder Force - First Person Shooting (FPS) Game Swati Nadkarni1, Panjab Mane2, Prathamesh Raikar3, Saurabh Sawant4, Prasad Sawant5, Nitesh Kuwalekar6 1 Head of Department, Department of Information Technology, Shah & Anchor Kutchhi Engineering College 2 Assistant Professor, Department of Information Technology, Shah & Anchor Kutchhi Engineering College 3,4,5,6 B.E. student, Department of Information Technology, Shah & Anchor Kutchhi Engineering College ----------------------------------------------------------------***----------------------------------------------------------------- Abstract— It has been found in researches that there is an have challenged hardware development, and multiplayer association between playing first-person shooter video games gaming has been integral. First-person shooters are a type of and having superior mental flexibility. It was found that three-dimensional shooter game featuring a first-person people playing such games require a significantly shorter point of view with which the player sees the action through reaction time for switching between complex tasks, mainly the eyes of the player character. They are unlike third- because when playing fps games they require to rapidly react person shooters in which the player can see (usually from to fast moving visuals by developing a more responsive mind behind) the character they are controlling. The primary set and to shift back and forth between different sub-duties. design element is combat, mainly involving firearms. First person-shooter games are also of ten categorized as being The successful design of the FPS game with correct distinct from light gun shooters, a similar genre with a first- direction, attractive graphics and models will give the best person perspective which uses light gun peripherals, in experience to play the game.
    [Show full text]
  • 009NAG – September 2012
    SOUTH AFRICA’S LEADING GAMING, COMPUTER & TECHNOLOGY MAGAZINE VOL 15 ISSUE 6 BORDERLANDS 2 COMPETITION Stuff you can’t buy anywhere! PC / PLAYSTATION / XBOX / NINTENDO PREVIEWS Sleeping Dogs Beyond: Two Souls Pikmin 3 Injustice: Gods among Us ENEMY UNKNOWN Is that a plasma rifl e in your pocket, or are you just happy to see me? ULTIMATE GAMING LOUNGE What your lounge should look like Contents Editor Michael “RedTide“ James Regulars [email protected] 10 Ed’s Note Assistant editor 12 Inbox Geoff “GeometriX“ Burrows 16 Bytes Staff writer Dane “Barkskin “ Remendes Opinion 16 I, Gamer Contributing editor Lauren “Guardi3n “ Das Neves 18 The Game Stalkerer 20 The Indie Investigatorgator Technical writer 22 Miktar’s Meanderingsrings Neo “ShockG“ Sibeko 83 Hardwired 98 Game Over Features International correspondent Miktar “Miktar” Dracon 30 TOPTOP 8 HOLYHOLY SH*TSH*T MOMENTS IN GAMING Contributors Previews Throughout gaming’s relatively short history, we’ve Rodain “Nandrew” Joubert 44 Sleeping Dogs been treated to a number of moments that very nearly Walt “Ramjet” Pretorius 46 Injustice: Gods Among Us made our minds explode out the back of our heads. Miklós “Mikit0707 “ Szecsei Find out what those are. Pippa “UnexpectedGirl” Tshabalala 48 Beyond: Two Souls Tarryn “Azimuth “ Van Der Byl 50 Pikmin 3 Adam “Madman” Liebman 52 The Cave 32 THE ULTIMATE GAMING LOUNGE Tired of your boring, traditional lounge fi lled with Art director boring, traditional lounge stuff ? Then read this! Chris “SAVAGE“ Savides Reviews Photography 60 Reviews: Introduction 36 READER U Chris “SAVAGE“ Savides The results of our recent reader survey have been 61 Short Reviews: Dreamstime.com tallied and weighed by humans better at mathematics Fotolia.com Death Rally / Deadlight and number-y stuff than we pretend to be! We’d like 62 The Secret World to share some of the less top-secret results with you.
    [Show full text]
  • Advanced Computer Graphics to Do Motivation Real-Time Rendering
    To Do Advanced Computer Graphics § Assignment 2 due Feb 19 § Should already be well on way. CSE 190 [Winter 2016], Lecture 12 § Contact us for difficulties etc. Ravi Ramamoorthi http://www.cs.ucsd.edu/~ravir Motivation Real-Time Rendering § Today, create photorealistic computer graphics § Goal: interactive rendering. Critical in many apps § Complex geometry, lighting, materials, shadows § Games, visualization, computer-aided design, … § Computer-generated movies/special effects (difficult or impossible to tell real from rendered…) § Until 10-15 years ago, focus on complex geometry § CSE 168 images from rendering competition (2011) § § But algorithms are very slow (hours to days) Chasm between interactivity, realism Evolution of 3D graphics rendering Offline 3D Graphics Rendering Interactive 3D graphics pipeline as in OpenGL Ray tracing, radiosity, photon mapping § Earliest SGI machines (Clark 82) to today § High realism (global illum, shadows, refraction, lighting,..) § Most of focus on more geometry, texture mapping § But historically very slow techniques § Some tweaks for realism (shadow mapping, accum. buffer) “So, while you and your children’s children are waiting for ray tracing to take over the world, what do you do in the meantime?” Real-Time Rendering SGI Reality Engine 93 (Kurt Akeley) Pictures courtesy Henrik Wann Jensen 1 New Trend: Acquired Data 15 years ago § Image-Based Rendering: Real/precomputed images as input § High quality rendering: ray tracing, global illumination § Little change in CSE 168 syllabus, from 2003 to
    [Show full text]
  • Developer Tools Showcase
    Developer Tools Showcase Randy Fernando Developer Tools Product Manager NVISION 2008 Software Content Creation Performance Education Development FX Composer Shader PerfKit Conference Presentations Debugger mental mill PerfHUD Whitepapers Artist Edition Direct3D SDK PerfSDK GPU Programming Guide NVIDIA OpenGL SDK Shader Library GLExpert Videos CUDA SDK NV PIX Plug‐in Photoshop Plug‐ins Books Cg Toolkit gDEBugger GPU Gems 3 Texture Tools NVSG GPU Gems 2 Melody PhysX SDK ShaderPerf GPU Gems PhysX Plug‐Ins PhysX VRD PhysX Tools The Cg Tutorial NVIDIA FX Composer 2.5 The World’s Most Advanced Shader Authoring Environment DirectX 10 Support NVIDIA Shader Debugger Support ShaderPerf 2.0 Integration Visual Models & Styles Particle Systems Improved User Interface Particle Systems All-New Start Page 350Z Sample Project Visual Models & Styles Other Major Features Shader Creation Wizard Code Editor Quickly create common shaders Full editor with assisted Shader Library code generation Hundreds of samples Properties Panel Texture Viewer HDR Color Picker Materials Panel View, organize, and apply textures Even More Features Automatic Light Binding Complete Scripting Support Support for DirectX 10 (Geometry Shaders, Stream Out, Texture Arrays) Support for COLLADA, .FBX, .OBJ, .3DS, .X Extensible Plug‐in Architecture with SDK Customizable Layouts Semantic and Annotation Remapping Vertex Attribute Packing Remote Control Capability New Sample Projects 350Z Visual Styles Atmospheric Scattering DirectX 10 PCSS Soft Shadows Materials Post‐Processing Simple Shadows
    [Show full text]
  • Real-Time Rendering Techniques with Hardware Tessellation
    Volume 34 (2015), Number x pp. 0–24 COMPUTER GRAPHICS forum Real-time Rendering Techniques with Hardware Tessellation M. Nießner1 and B. Keinert2 and M. Fisher1 and M. Stamminger2 and C. Loop3 and H. Schäfer2 1Stanford University 2University of Erlangen-Nuremberg 3Microsoft Research Abstract Graphics hardware has been progressively optimized to render more triangles with increasingly flexible shading. For highly detailed geometry, interactive applications restricted themselves to performing transforms on fixed geometry, since they could not incur the cost required to generate and transfer smooth or displaced geometry to the GPU at render time. As a result of recent advances in graphics hardware, in particular the GPU tessellation unit, complex geometry can now be generated on-the-fly within the GPU’s rendering pipeline. This has enabled the generation and displacement of smooth parametric surfaces in real-time applications. However, many well- established approaches in offline rendering are not directly transferable due to the limited tessellation patterns or the parallel execution model of the tessellation stage. In this survey, we provide an overview of recent work and challenges in this topic by summarizing, discussing, and comparing methods for the rendering of smooth and highly-detailed surfaces in real-time. 1. Introduction Hardware tessellation has attained widespread use in computer games for displaying highly-detailed, possibly an- Graphics hardware originated with the goal of efficiently imated, objects. In the animation industry, where displaced rendering geometric surfaces. GPUs achieve high perfor- subdivision surfaces are the typical modeling and rendering mance by using a pipeline where large components are per- primitive, hardware tessellation has also been identified as a formed independently and in parallel.
    [Show full text]
  • Slang: Language Mechanisms for Extensible Real-Time Shading Systems
    Slang: language mechanisms for extensible real-time shading systems YONG HE, Carnegie Mellon University KAYVON FATAHALIAN, Stanford University TIM FOLEY, NVIDIA Designers of real-time rendering engines must balance the conicting goals and GPU eciently, and minimizing CPU overhead using the new of maintaining clear, extensible shading systems and achieving high render- parameter binding model oered by the modern Direct3D 12 and ing performance. In response, engine architects have established eective de- Vulkan graphics APIs. sign patterns for authoring shading systems, and developed engine-specic To help navigate the tension between performance and maintain- code synthesis tools, ranging from preprocessor hacking to domain-specic able/extensible code, engine architects have established eective shading languages, to productively implement these patterns. The problem is design patterns for authoring shading systems, and developed code that proprietary tools add signicant complexity to modern engines, lack ad- vanced language features, and create additional challenges for learning and synthesis tools, ranging from preprocessor hacking, to metapro- adoption. We argue that the advantages of engine-specic code generation gramming, to engine-proprietary domain-specic languages (DSLs) tools can be achieved using the underlying GPU shading language directly, [Tatarchuk and Tchou 2017], for implementing these patterns. For provided the shading language is extended with a small number of best- example, the idea of shader components [He et al. 2017] was recently practice principles from modern, well-established programming languages. presented as a pattern for achieving both high rendering perfor- We identify that adding generics with interface constraints, associated types, mance and maintainable code structure when specializing shader and interface/structure extensions to existing C-like GPU shading languages code to coarse-grained features such as a surface material pattern or enables real-time renderer developers to build shading systems that are a tessellation eect.
    [Show full text]
  • NVIDIA Quadro P620
    UNMATCHED POWER. UNMATCHED CREATIVE FREEDOM. NVIDIA® QUADRO® P620 Powerful Professional Graphics with FEATURES > Four Mini DisplayPort 1.4 Expansive 4K Visual Workspace Connectors1 > DisplayPort with Audio The NVIDIA Quadro P620 combines a 512 CUDA core > NVIDIA nView® Desktop Pascal GPU, large on-board memory and advanced Management Software display technologies to deliver amazing performance > HDCP 2.2 Support for a range of professional workflows. 2 GB of ultra- > NVIDIA Mosaic2 fast GPU memory enables the creation of complex 2D > Dedicated hardware video encode and decode engines3 and 3D models and a flexible single-slot, low-profile SPECIFICATIONS form factor makes it compatible with even the most GPU Memory 2 GB GDDR5 space and power-constrained chassis. Support for Memory Interface 128-bit up to four 4K displays (4096x2160 @ 60 Hz) with HDR Memory Bandwidth Up to 80 GB/s color gives you an expansive visual workspace to view NVIDIA CUDA® Cores 512 your creations in stunning detail. System Interface PCI Express 3.0 x16 Quadro cards are certified with a broad range of Max Power Consumption 40 W sophisticated professional applications, tested by Thermal Solution Active leading workstation manufacturers, and backed by Form Factor 2.713” H x 5.7” L, a global team of support specialists, giving you the Single Slot, Low Profile peace of mind to focus on doing your best work. Display Connectors 4x Mini DisplayPort 1.4 Whether you’re developing revolutionary products or Max Simultaneous 4 direct, 4x DisplayPort telling spectacularly vivid visual stories, Quadro gives Displays 1.4 Multi-Stream you the performance to do it brilliantly.
    [Show full text]
  • Nvidia Quadro T1000
    NVIDIA professional laptop GPUs power the world’s most advanced thin and light mobile workstations and unique compact devices to meet the visual computing needs of professionals across a wide range of industries. The latest generation of NVIDIA RTX professional laptop GPUs, built on the NVIDIA Ampere architecture combine the latest advancements in real-time ray tracing, advanced shading, and AI-based capabilities to tackle the most demanding design and visualization workflows on the go. With the NVIDIA PROFESSIONAL latest graphics technology, enhanced performance, and added compute power, NVIDIA professional laptop GPUs give designers, scientists, and artists the tools they need to NVIDIA MOBILE GRAPHICS SOLUTIONS work efficiently from anywhere. LINE CARD GPU SPECIFICATIONS PERFORMANCE OPTIONS 2 1 ® / TXAA™ Anti- ® ™ 3 4 * 5 NVIDIA FXAA Aliasing Manager NVIDIA RTX Desktop Support Vulkan NVIDIA Optimus NVIDIA CUDA NVIDIA RT Cores Cores Tensor GPU Memory Memory Bandwidth* Peak Memory Type Memory Interface Consumption Max Power TGP DisplayPort Open GL Shader Model DirectX PCIe Generation Floating-Point Precision Single Peak)* (TFLOPS, Performance (TFLOPS, Performance Tensor Peak) Gen MAX-Q Technology 3rd NVENC / NVDEC Processing Cores Processing Laptop GPUs 48 (2nd 192 (3rd NVIDIA RTX A5000 6,144 16 GB 448 GB/s GDDR6 256-bit 80 - 165 W* 1.4 4.6 7.0 12 Ultimate 4 21.7 174.0 Gen) Gen) 40 (2nd 160 (3rd NVIDIA RTX A4000 5,120 8 GB 384 GB/s GDDR6 256-bit 80 - 140 W* 1.4 4.6 7.0 12 Ultimate 4 17.8 142.5 Gen) Gen) 32 (2nd 128 (3rd NVIDIA RTX A3000
    [Show full text]
  • A Qualitative Comparison Study Between Common GPGPU Frameworks
    A Qualitative Comparison Study Between Common GPGPU Frameworks. Adam Söderström Department of Computer and Information Science Linköping University This dissertation is submitted for the degree of M. Sc. in Media Technology and Engineering June 2018 Acknowledgements I would like to acknowledge MindRoad AB and Åsa Detterfelt for making this work possible. I would also like to thank Ingemar Ragnemalm and August Ernstsson at Linköping University. Abstract The development of graphic processing units have during the last decade improved signif- icantly in performance while at the same time becoming cheaper. This has developed a new type of usage of the device where the massive parallelism available in modern GPU’s are used for more general purpose computing, also known as GPGPU. Frameworks have been developed just for this purpose and some of the most popular are CUDA, OpenCL and DirectX Compute Shaders, also known as DirectCompute. The choice of what framework to use may depend on factors such as features, portability and framework complexity. This paper aims to evaluate these concepts, while also comparing the speedup of a parallel imple- mentation of the N-Body problem with Barnes-hut optimization, compared to a sequential implementation. Table of contents List of figures xi List of tables xiii Nomenclature xv 1 Introduction1 1.1 Motivation . .1 1.2 Aim . .3 1.3 Research questions . .3 1.4 Delimitations . .4 1.5 Related work . .4 1.5.1 Framework comparison . .4 1.5.2 N-Body with Barnes-Hut . .5 2 Theory9 2.1 Background . .9 2.1.1 GPGPU History . 10 2.2 GPU Architecture .
    [Show full text]
  • Graphics Shaders Mike Hergaarden January 2011, VU Amsterdam
    Graphics shaders Mike Hergaarden January 2011, VU Amsterdam [Source: http://unity3d.com/support/documentation/Manual/Materials.html] Abstract Shaders are the key to finalizing any image rendering, be it computer games, images or movies. Shaders have greatly improved the output of computer generated media; from photo-realistic computer generated humans to more vivid cartoon movies. Furthermore shaders paved the way to general-purpose computation on GPU (GPGPU). This paper introduces shaders by discussing the theory and practice. Introduction A shader is a piece of code that is executed on the Graphics Processing Unit (GPU), usually found on a graphics card, to manipulate an image before it is drawn to the screen. Shaders allow for various kinds of rendering effect, ranging from adding an X-Ray view to adding cartoony outlines to rendering output. The history of shaders starts at LucasFilm in the early 1980’s. LucasFilm hired graphics programmers to computerize the special effects industry [1]. This proved a success for the film/ rendering industry, especially at Pixars Toy Story movie launch in 1995. RenderMan introduced the notion of Shaders; “The Renderman Shading Language allows material definitions of surfaces to be described in not only a simple manner, but also highly complex and custom manner using a C like language. Using this method as opposed to a pre-defined set of materials allows for complex procedural textures, new shading models and programmable lighting. Another thing that sets the renderers based on the RISpec apart from many other renderers, is the ability to output arbitrary variables as an image—surface normals, separate lighting passes and pretty much anything else can be output from the renderer in one pass.” [1] The term shader was first only used to refer to “pixel shaders”, but soon enough new uses of shaders such as vertex and geometry shaders were introduced, making the term shaders more general.
    [Show full text]
  • Embedded Solutions Nvidia Quadro Mxm Modules
    EMBEDDED SOLUTIONS NVIDIA QUADRO MXM MODULES NVIDIA QUADRO PERFORMANCE AND PRODUCT NVIDIA QUADRO NVIDIA QUADRO NVIDIA QUADRO NVIDIA QUADRO NVIDIA QUADRO NVIDIA QUADRO FEATURES IN AN MXM FORM FACTOR FEATURE RTX 5000 RTX 3000 T1000 P5000 P3000 P1000 PNY PART NUMBER QRTX5000-KIT QRTX3000-KIT QT1000-KIT QP5000-KIT QP3000-KIT QP1000-KIT NVIDIA® Quadro® RTX (Turing™) and Pascal MXM modules GPU ARCHITECTURE NVIDIA Turing NVIDIA Pascal offer professional NVIDIA Quadro performance, features, INTERFACE MXM 3.1 SDK and API support, exacting build standards, rigorous FORM FACTOR Type-B Type-BType-A Type-BType-B Type-A quality assurance, and broad ISV application compatibility. DIMENSIONS 82 x 105 mm 82 x 105 mm 82 x 70 mm 82 x 105 mm 82 x 105 mm 82 x 70 mm PEAK FP32 PERF. 9.49 TFLOPS 5.3 TFLOPS 2.6 TFLOPS 6.4 TFLOPS 3.9 TFLOPS 1.5 TFLOPS Designed for the needs of embedded, ruggedized, or PEAK FP16 PERF. 18.98 TFLOPS 12.72 TFLOPS 4.47 TFLOPS 101.2 GFLOPS 48.6 GFLOPS 23.89 GFLOPS mobile system builders, these products make NVIDIA NVIDIA® CUDA® CORES 3072 1920 896 2048 1280 512 Quadro RTX™ real-time rendering and AI/DL/Ml capabilities RT CORES N/A 36 Not Applicable (RTX 5000 and RTX 3000) available to form factors TENSOR CORES 384 240 Not Applicable unsuited to traditional PCI Express expansion cards. GPU MEMORY 16 GB 6 GB 4 GB 16 GB 6 GB 4 GB Pascal MXM products offer superb graphics capabilities MEMORY TYPE GDDR6 GDDR5 and outstanding FP32 compute capabilities.
    [Show full text]