GPU Architecture II: Scheduling the Graphics Pipeline Mike Houston, AMD / Stanford Aaron Lefohn, Intel / University of Washington

Total Page:16

File Type:pdf, Size:1020Kb

GPU Architecture II: Scheduling the Graphics Pipeline Mike Houston, AMD / Stanford Aaron Lefohn, Intel / University of Washington GPU architecture II: Scheduling the graphics pipeline Mike Houston, AMD / Stanford Aaron Lefohn, Intel / University of Washington Winter 2011 – Beyond Programmable Shading 1 Notes • The front half of this talk is almost verbatim from: “Keeping Many Cores Busy: Scheduling the Graphics Pipeline” by Jonathan Ragan-Kelley from SIGGRAPH Beyond Programmable Shading, 2010 • I’ll be adding my own twists from my background on working on some of the hardware in the talk Winter 2011 – Beyond Programmable Shading 2 This talk • How to think about scheduling GPU-style pipelines Four constraints which drive scheduling decisions • Examples of these concepts in real GPU designs • Goals – Know why GPUs, APIs impose the constraints they do. – Develop intuition for what they can do well. – Understand key patterns for building your own pipelines. • Dig into ATI Radeon™ HD 5800 Series (“Cypress”) architecture deep dive – What questions do you have? Winter 2011 – Beyond Programmable Shading 3 First, a definition Scheduling [n.]: Assigning computations and data to resources in space and time. Winter 2011 – Beyond Programmable Shading 4 The workload: Direct3D IA VS PA Logical pipeline Fixed-function stage HS Programmable stage Tess DS data flow data PA GS Rast PS Blend Winter 2011 – Beyond Programmable Shading 5 The machine: a modern GPU Shader Shader Input Primitive Logical pipeline Tex Core Core Assembler Assembler Fixed-function stage Programmable stage Shader Shader Output Tex Rasterizer Core Core Blend Physical processor Shader Shader Fixed-function logic Tex Core Core Programmable core Task Distributor Shader Shader Fixed-function control Tex Core Core Winter 2011 – Beyond Programmable Shading 6 Scheduling a draw call as a series of tasks Shader Shader Shader Shader Input Primitive Output Rasterizer Core Core Core Core Assembler Assembler Blend IA VS PA Rast time PS PS PS Blend Blend Blend Winter 2011 – Beyond Programmable Shading 7 An efficient schedule keeps hardware busy Shader Shader Shader Shader Input Primitive Output Rasterizer Core Core Core Core Assembler Assembler Blend PS VS IA PA Rast Blend VS PS PS PS IA PA Blend Rast VS VS VS IA PA PS time PS PS PA Rast Blend IA PS PS PS PS Rast PA VS VS VS IA Rast Blend PS PS PS PS IA PA Rast Blend Winter 2011 – Beyond Programmable Shading 8 Choosing which tasks to run when (and where) • Resource constraints – Tasks can only execute when there are sufficient resources for their computation and their data. • Coherence – Control coherence is essential to shader core efficiency. – Data coherence is essential to memory and communication efficiency. • Load balance – Irregularity in execution time create bubbles in the pipeline schedule. • Ordering – Graphics APIs define strict ordering semantics, which restrict possible schedules. Winter 2011 – Beyond Programmable Shading 9 Resource constraints limit scheduling options Shader Shader Shader Shader Input Primitive Output Rasterizer Core Core Core Core Assembler Assembler Blend VS IA PS PS PS time ??? PS Winter 2011 – Beyond Programmable Shading 10 Resource constraints limit scheduling options Shader Shader Shader Shader Input Primitive Output Rasterizer Core Core Core Core Assembler Assembler Blend VS IA PS PS PS Deadlock??? time ??? PS Key Pre-allocation of concept: resources helps guarantee forward progress. Winter 2011 – Beyond Programmable Shading 11 Coherence is a balancing act Intrinsic tension between: Horizontal (control, fetch) coherence and Vertical (producer-consumer) locality. Locality and Load Balance. Winter 2011 – Beyond Programmable Shading 12 Graphics workloads are irregular SuperExpensiveShader( ) TrivialShader( ) Rasterizer …! But: Shaders are optimized for regular, self-similar work. Imbalanced work creates bubbles in the task schedule. Solution: Dynamically generating and aggregating tasks isolates irregularity and recaptures coherence. Redistributing tasks restores load balance. Winter 2011 – Beyond Programmable Shading 13 Redistribution after irregular amplification Shader Shader Shader Shader Input Primitive Output Rasterizer Core Core Core Core Assembler Assembler Blend IA VS PA Rast time PS PS PS Blend Blend Blend Winter 2011 – Beyond Programmable Shading 14 Redistribution after irregular amplification Shader Shader Shader Shader Input Primitive Output Rasterizer Core Core Core Core Assembler Assembler Blend IA VS PA Rast time PS PS PS Blend Blend Key Managing irregularity by dynamically concept: generating, aggregating, and redistributing Blend tasks Winter 2011 – Beyond Programmable Shading 15 Ordering Rule: All framebuffer updates must appear as though all triangles were drawn in strict sequential order Key Carefully structuring task concept: redistribution to maintain API ordering. Winter 2011 – Beyond Programmable Shading 16 Building a real pipeline Winter 2011 – Beyond Programmable Shading 17 Static tile scheduling The simplest thing that could possibly work. Vertex Pixel Pixel Pixel Pixel Multiple cores: 1 front-end n back-end Exemplar: ARM Mali 400 Winter 2011 – Beyond Programmable Shading 18 Static tile scheduling Pixel Pixel Vertex Pixel Pixel Exemplar: ARM Mali 400 Winter 2011 – Beyond Programmable Shading 19 Static tile scheduling Pixel Pixel Vertex Pixel Pixel Exemplar: ARM Mali 400 Winter 2011 – Beyond Programmable Shading 20 Static tile scheduling Pixel Pixel Vertex Pixel Pixel Exemplar: ARM Mali 400 Winter 2011 – Beyond Programmable Shading 21 Static tile scheduling Locality captured within Pixel tiles Resource Pixel constraints static = simple Ordering Pixel single front-end, sequential processing within Pixel each tile Exemplar: ARM Mali 400 Winter 2011 – Beyond Programmable Shading 22 Static tile scheduling Pixel !!! The problem: load imbalance Pixel idle… only one task creation point. no dynamic task Pixel idle… redistribution. Pixel idle… Exemplar: ARM Mali 400 Winter 2011 – Beyond Programmable Shading 23 Sort-last fragment shading Pixel Pixel Redistribution restores fragment Vertex Rasterizer load balance. Pixel But how can we maintain order? Pixel Exemplars: NVIDIA G80, ATI RV770 Winter 2011 – Beyond Programmable Shading 24 Sort-last fragment shading Pixel Pixel Vertex Rasterizer Pixel Preallocate outputs in Pixel FIFO order Exemplars: NVIDIA G80, ATI RV770 Winter 2011 – Beyond Programmable Shading 25 Sort-last fragment shading Pixel Pixel Vertex Rasterizer Pixel Complete shading asynchronously Pixel Exemplars: NVIDIA G80, ATI RV770 Winter 2011 – Beyond Programmable Shading 26 Sort-last fragment shading Blend Pixel fragments in FIFO order Pixel Output Vertex Rasterizer Blend Pixel Pixel Exemplars: NVIDIA G80, ATI RV770 Winter 2011 – Beyond Programmable Shading 27 Unified shaders Solve load balance by time-multiplexing different stages onto shared processors according to load Shader Shader Input Primitive Tex Core Core Assembler Assembler Shader Shader Output Tex Rasterizer Core Core Blend Shader Shader Tex Core Core Task Distributor Shader Shader Tex Core Core Exemplars: NVIDIA G80, ATI RV770 Winter 2011 – Beyond Programmable Shading 28 Unified Shaders: time-multiplexing cores Shader Shader Shader Shader Input Primitive Output Rasterizer Core Core Core Core Assembler Assembler Blend IA VS PA Rast time PS PS PS Blend Blend Blend Exemplars: NVIDIA G80, ATI RV770 Winter 2011 – Beyond Programmable Shading 29 Prioritizing the logical pipeline IA 5 VS 4 priority PA 3 Rast 2 PS 1 Blend 0 Winter 2011 – Beyond Programmable Shading 30 Prioritizing the logical pipeline IA 5 VS 4 priority fixed-size PA 3 queue storage Rast 2 PS 1 Blend 0 Winter 2011 – Beyond Programmable Shading 31 Scheduling the pipeline Shader Shader IA Core Core VS PA time Rast PS Blend Winter 2011 – Beyond Programmable Shading 32 Scheduling the pipeline Shader Shader IA Core Core VS VS VS PA time Rast PS Blend Winter 2011 – Beyond Programmable Shading 33 Scheduling the pipeline Shader Shader IA Core Core VS VS VS PA time Rast PS Blend Winter 2011 – Beyond Programmable Shading 34 Scheduling the pipeline Shader Shader IA Core Core High priority, VS VS VS but stalled on output PS PS PA time Rast Lower PS priority, but ready to run Blend Winter 2011 – Beyond Programmable Shading 35 Scheduling the pipeline Shader Shader IA Core Core VS VS VS PS PS PA time Rast … PS Blend Winter 2011 – Beyond Programmable Shading 36 Scheduling the pipeline Queue sizes and backpressure provide a natural knob for balancing horizontal batch coherence and producer-consumer locality. Winter 2011 – Beyond Programmable Shading 37 OptiX Winter 2011 – Beyond Programmable Shading 38 Summary Winter 2011 – Beyond Programmable Shading 39 Key concepts • Think of scheduling the pipeline as mapping tasks onto cores. • Preallocate resources before launching a task. – Preallocation helps ensure forward progress and prevent deadlock. • Graphics is irregular. – Dynamically generating, aggregating and redistributing tasks at irregular amplification points regains coherence and load balance. • Order matters. – Carefully structure task redistribution to maintain ordering. Winter 2011 – Beyond Programmable Shading 40 Why don’t we have dynamic resource allocation? e.g. recursion, malloc() in shaders Static preallocation of resources guarantees forward progress. Tasks which outgrow available resources can stall, causing deadlock. Winter 2011 – Beyond Programmable Shading 41 Geometry Shaders are slow because they allow dynamic amplification in shaders. •Pick your poison: – Always stream through DRAM. –Exemplar: ATI R600 –Smooth falloff for large amplification, but very slow for small amplification
Recommended publications
  • GLSL 4.50 Spec
    The OpenGL® Shading Language Language Version: 4.50 Document Revision: 7 09-May-2017 Editor: John Kessenich, Google Version 1.1 Authors: John Kessenich, Dave Baldwin, Randi Rost Copyright (c) 2008-2017 The Khronos Group Inc. All Rights Reserved. This specification is protected by copyright laws and contains material proprietary to the Khronos Group, Inc. It or any components may not be reproduced, republished, distributed, transmitted, displayed, broadcast, or otherwise exploited in any manner without the express prior written permission of Khronos Group. You may use this specification for implementing the functionality therein, without altering or removing any trademark, copyright or other notice from the specification, but the receipt or possession of this specification does not convey any rights to reproduce, disclose, or distribute its contents, or to manufacture, use, or sell anything that it may describe, in whole or in part. Khronos Group grants express permission to any current Promoter, Contributor or Adopter member of Khronos to copy and redistribute UNMODIFIED versions of this specification in any fashion, provided that NO CHARGE is made for the specification and the latest available update of the specification for any version of the API is used whenever possible. Such distributed specification may be reformatted AS LONG AS the contents of the specification are not changed in any way. The specification may be incorporated into a product that is sold as long as such product includes significant independent work developed by the seller. A link to the current version of this specification on the Khronos Group website should be included whenever possible with specification distributions.
    [Show full text]
  • First Person Shooting (FPS) Game
    International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 04 | Apr-2018 www.irjet.net p-ISSN: 2395-0072 Thunder Force - First Person Shooting (FPS) Game Swati Nadkarni1, Panjab Mane2, Prathamesh Raikar3, Saurabh Sawant4, Prasad Sawant5, Nitesh Kuwalekar6 1 Head of Department, Department of Information Technology, Shah & Anchor Kutchhi Engineering College 2 Assistant Professor, Department of Information Technology, Shah & Anchor Kutchhi Engineering College 3,4,5,6 B.E. student, Department of Information Technology, Shah & Anchor Kutchhi Engineering College ----------------------------------------------------------------***----------------------------------------------------------------- Abstract— It has been found in researches that there is an have challenged hardware development, and multiplayer association between playing first-person shooter video games gaming has been integral. First-person shooters are a type of and having superior mental flexibility. It was found that three-dimensional shooter game featuring a first-person people playing such games require a significantly shorter point of view with which the player sees the action through reaction time for switching between complex tasks, mainly the eyes of the player character. They are unlike third- because when playing fps games they require to rapidly react person shooters in which the player can see (usually from to fast moving visuals by developing a more responsive mind behind) the character they are controlling. The primary set and to shift back and forth between different sub-duties. design element is combat, mainly involving firearms. First person-shooter games are also of ten categorized as being The successful design of the FPS game with correct distinct from light gun shooters, a similar genre with a first- direction, attractive graphics and models will give the best person perspective which uses light gun peripherals, in experience to play the game.
    [Show full text]
  • Advanced Computer Graphics to Do Motivation Real-Time Rendering
    To Do Advanced Computer Graphics § Assignment 2 due Feb 19 § Should already be well on way. CSE 190 [Winter 2016], Lecture 12 § Contact us for difficulties etc. Ravi Ramamoorthi http://www.cs.ucsd.edu/~ravir Motivation Real-Time Rendering § Today, create photorealistic computer graphics § Goal: interactive rendering. Critical in many apps § Complex geometry, lighting, materials, shadows § Games, visualization, computer-aided design, … § Computer-generated movies/special effects (difficult or impossible to tell real from rendered…) § Until 10-15 years ago, focus on complex geometry § CSE 168 images from rendering competition (2011) § § But algorithms are very slow (hours to days) Chasm between interactivity, realism Evolution of 3D graphics rendering Offline 3D Graphics Rendering Interactive 3D graphics pipeline as in OpenGL Ray tracing, radiosity, photon mapping § Earliest SGI machines (Clark 82) to today § High realism (global illum, shadows, refraction, lighting,..) § Most of focus on more geometry, texture mapping § But historically very slow techniques § Some tweaks for realism (shadow mapping, accum. buffer) “So, while you and your children’s children are waiting for ray tracing to take over the world, what do you do in the meantime?” Real-Time Rendering SGI Reality Engine 93 (Kurt Akeley) Pictures courtesy Henrik Wann Jensen 1 New Trend: Acquired Data 15 years ago § Image-Based Rendering: Real/precomputed images as input § High quality rendering: ray tracing, global illumination § Little change in CSE 168 syllabus, from 2003 to
    [Show full text]
  • Developer Tools Showcase
    Developer Tools Showcase Randy Fernando Developer Tools Product Manager NVISION 2008 Software Content Creation Performance Education Development FX Composer Shader PerfKit Conference Presentations Debugger mental mill PerfHUD Whitepapers Artist Edition Direct3D SDK PerfSDK GPU Programming Guide NVIDIA OpenGL SDK Shader Library GLExpert Videos CUDA SDK NV PIX Plug‐in Photoshop Plug‐ins Books Cg Toolkit gDEBugger GPU Gems 3 Texture Tools NVSG GPU Gems 2 Melody PhysX SDK ShaderPerf GPU Gems PhysX Plug‐Ins PhysX VRD PhysX Tools The Cg Tutorial NVIDIA FX Composer 2.5 The World’s Most Advanced Shader Authoring Environment DirectX 10 Support NVIDIA Shader Debugger Support ShaderPerf 2.0 Integration Visual Models & Styles Particle Systems Improved User Interface Particle Systems All-New Start Page 350Z Sample Project Visual Models & Styles Other Major Features Shader Creation Wizard Code Editor Quickly create common shaders Full editor with assisted Shader Library code generation Hundreds of samples Properties Panel Texture Viewer HDR Color Picker Materials Panel View, organize, and apply textures Even More Features Automatic Light Binding Complete Scripting Support Support for DirectX 10 (Geometry Shaders, Stream Out, Texture Arrays) Support for COLLADA, .FBX, .OBJ, .3DS, .X Extensible Plug‐in Architecture with SDK Customizable Layouts Semantic and Annotation Remapping Vertex Attribute Packing Remote Control Capability New Sample Projects 350Z Visual Styles Atmospheric Scattering DirectX 10 PCSS Soft Shadows Materials Post‐Processing Simple Shadows
    [Show full text]
  • Real-Time Rendering Techniques with Hardware Tessellation
    Volume 34 (2015), Number x pp. 0–24 COMPUTER GRAPHICS forum Real-time Rendering Techniques with Hardware Tessellation M. Nießner1 and B. Keinert2 and M. Fisher1 and M. Stamminger2 and C. Loop3 and H. Schäfer2 1Stanford University 2University of Erlangen-Nuremberg 3Microsoft Research Abstract Graphics hardware has been progressively optimized to render more triangles with increasingly flexible shading. For highly detailed geometry, interactive applications restricted themselves to performing transforms on fixed geometry, since they could not incur the cost required to generate and transfer smooth or displaced geometry to the GPU at render time. As a result of recent advances in graphics hardware, in particular the GPU tessellation unit, complex geometry can now be generated on-the-fly within the GPU’s rendering pipeline. This has enabled the generation and displacement of smooth parametric surfaces in real-time applications. However, many well- established approaches in offline rendering are not directly transferable due to the limited tessellation patterns or the parallel execution model of the tessellation stage. In this survey, we provide an overview of recent work and challenges in this topic by summarizing, discussing, and comparing methods for the rendering of smooth and highly-detailed surfaces in real-time. 1. Introduction Hardware tessellation has attained widespread use in computer games for displaying highly-detailed, possibly an- Graphics hardware originated with the goal of efficiently imated, objects. In the animation industry, where displaced rendering geometric surfaces. GPUs achieve high perfor- subdivision surfaces are the typical modeling and rendering mance by using a pipeline where large components are per- primitive, hardware tessellation has also been identified as a formed independently and in parallel.
    [Show full text]
  • NVIDIA Quadro Technical Specifications
    NVIDIA Quadro Technical Specifications NVIDIA Quadro Workstation GPU High-resolution Antialiasing ° Dassault CATIA • Full 128-bit floating point precision • Up to 16x full-scene antialiasing (FSAA), ° ESRI ArcGIS pipeline at resolutions up to 1920 x 1200 ° ICEM Surf • 12-bit subpixel precision • 12-bit subpixel sampling precision ° MSC.Nastran, MSC.Patran • Hardware-accelerated antialiased enhances AA quality ° PTC Pro/ENGINEER Wildfire, points and lines • Rotated-grid FSAA significantly 3Dpaint, CDRS The NVIDIA Quadro® family of In addition to a full line up of 2D and • Hardware OpenGL overlay planes increases color accuracy and visual ° SolidWorks • Hardware-accelerated two-sided quality for edges, while maintaining ° UDS NX Series, I-deas, SolidEdge, professional solutions for workstations 3D workstation graphics solutions, the lighting performance3 Unigraphics, SDRC delivers the fastest application NVIDIA Quadro professional products • Hardware-accelerated clipping planes and many more… Memory performance and the highest quality include a set of specialty solutions that • Third-generation occlusion culling • Digital Content Creation (DCC) graphics. have been architected to meet the • 16 textures per pixel • High-speed memory (up to 512MB Alias Maya, MOTIONBUILDER needs of a wide range of industry • OpenGL quad-buffered stereo (3-pin GDDR3) ° NewTek Lightwave 3D Raw performance and quality are only sync connector) • Advanced lossless compression ° professionals. These specialty Autodesk Media and Entertainment the beginning. The NVIDIA
    [Show full text]
  • Extending the Graphics Pipeline with Adaptive, Multi-Rate Shading
    Extending the Graphics Pipeline with Adaptive, Multi-Rate Shading Yong He Yan Gu Kayvon Fatahalian Carnegie Mellon University Abstract compute capability as a primary mechanism for improving the qual- ity of real-time graphics. Simply put, to scale to more advanced Due to complex shaders and high-resolution displays (particularly rendering effects and to high-resolution outputs, future GPUs must on mobile graphics platforms), fragment shading often dominates adopt techniques that perform shading calculations more efficiently the cost of rendering in games. To improve the efficiency of shad- than the brute-force approaches used today. ing on GPUs, we extend the graphics pipeline to natively support techniques that adaptively sample components of the shading func- In this paper, we enable high-quality shading at reduced cost on tion more sparsely than per-pixel rates. We perform an extensive GPUs by extending the graphics pipeline’s fragment shading stage study of the challenges of integrating adaptive, multi-rate shading to natively support techniques that adaptively sample aspects of the into the graphics pipeline, and evaluate two- and three-rate imple- shading function more sparsely than per-pixel rates. Specifically, mentations that we believe are practical evolutions of modern GPU our extensions allow different components of the pipeline’s shad- designs. We design new shading language abstractions that sim- ing function to be evaluated at different screen-space rates and pro- plify development of shaders for this system, and design adaptive vide mechanisms for shader programs to dynamically determine (at techniques that use these mechanisms to reduce the number of in- fine screen granularity) which computations to perform at which structions performed during shading by more than a factor of three rates.
    [Show full text]
  • NVIDIA Quadro P620
    UNMATCHED POWER. UNMATCHED CREATIVE FREEDOM. NVIDIA® QUADRO® P620 Powerful Professional Graphics with FEATURES > Four Mini DisplayPort 1.4 Expansive 4K Visual Workspace Connectors1 > DisplayPort with Audio The NVIDIA Quadro P620 combines a 512 CUDA core > NVIDIA nView® Desktop Pascal GPU, large on-board memory and advanced Management Software display technologies to deliver amazing performance > HDCP 2.2 Support for a range of professional workflows. 2 GB of ultra- > NVIDIA Mosaic2 fast GPU memory enables the creation of complex 2D > Dedicated hardware video encode and decode engines3 and 3D models and a flexible single-slot, low-profile SPECIFICATIONS form factor makes it compatible with even the most GPU Memory 2 GB GDDR5 space and power-constrained chassis. Support for Memory Interface 128-bit up to four 4K displays (4096x2160 @ 60 Hz) with HDR Memory Bandwidth Up to 80 GB/s color gives you an expansive visual workspace to view NVIDIA CUDA® Cores 512 your creations in stunning detail. System Interface PCI Express 3.0 x16 Quadro cards are certified with a broad range of Max Power Consumption 40 W sophisticated professional applications, tested by Thermal Solution Active leading workstation manufacturers, and backed by Form Factor 2.713” H x 5.7” L, a global team of support specialists, giving you the Single Slot, Low Profile peace of mind to focus on doing your best work. Display Connectors 4x Mini DisplayPort 1.4 Whether you’re developing revolutionary products or Max Simultaneous 4 direct, 4x DisplayPort telling spectacularly vivid visual stories, Quadro gives Displays 1.4 Multi-Stream you the performance to do it brilliantly.
    [Show full text]
  • Graphics Pipeline and Rasterization
    Graphics Pipeline & Rasterization Image removed due to copyright restrictions. MIT EECS 6.837 – Matusik 1 How Do We Render Interactively? • Use graphics hardware, via OpenGL or DirectX – OpenGL is multi-platform, DirectX is MS only OpenGL rendering Our ray tracer © Khronos Group. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. 2 How Do We Render Interactively? • Use graphics hardware, via OpenGL or DirectX – OpenGL is multi-platform, DirectX is MS only OpenGL rendering Our ray tracer © Khronos Group. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. • Most global effects available in ray tracing will be sacrificed for speed, but some can be approximated 3 Ray Casting vs. GPUs for Triangles Ray Casting For each pixel (ray) For each triangle Does ray hit triangle? Keep closest hit Scene primitives Pixel raster 4 Ray Casting vs. GPUs for Triangles Ray Casting GPU For each pixel (ray) For each triangle For each triangle For each pixel Does ray hit triangle? Does triangle cover pixel? Keep closest hit Keep closest hit Scene primitives Pixel raster Scene primitives Pixel raster 5 Ray Casting vs. GPUs for Triangles Ray Casting GPU For each pixel (ray) For each triangle For each triangle For each pixel Does ray hit triangle? Does triangle cover pixel? Keep closest hit Keep closest hit Scene primitives It’s just a different orderPixel raster of the loops!
    [Show full text]
  • Nvidia Quadro T1000
    NVIDIA professional laptop GPUs power the world’s most advanced thin and light mobile workstations and unique compact devices to meet the visual computing needs of professionals across a wide range of industries. The latest generation of NVIDIA RTX professional laptop GPUs, built on the NVIDIA Ampere architecture combine the latest advancements in real-time ray tracing, advanced shading, and AI-based capabilities to tackle the most demanding design and visualization workflows on the go. With the NVIDIA PROFESSIONAL latest graphics technology, enhanced performance, and added compute power, NVIDIA professional laptop GPUs give designers, scientists, and artists the tools they need to NVIDIA MOBILE GRAPHICS SOLUTIONS work efficiently from anywhere. LINE CARD GPU SPECIFICATIONS PERFORMANCE OPTIONS 2 1 ® / TXAA™ Anti- ® ™ 3 4 * 5 NVIDIA FXAA Aliasing Manager NVIDIA RTX Desktop Support Vulkan NVIDIA Optimus NVIDIA CUDA NVIDIA RT Cores Cores Tensor GPU Memory Memory Bandwidth* Peak Memory Type Memory Interface Consumption Max Power TGP DisplayPort Open GL Shader Model DirectX PCIe Generation Floating-Point Precision Single Peak)* (TFLOPS, Performance (TFLOPS, Performance Tensor Peak) Gen MAX-Q Technology 3rd NVENC / NVDEC Processing Cores Processing Laptop GPUs 48 (2nd 192 (3rd NVIDIA RTX A5000 6,144 16 GB 448 GB/s GDDR6 256-bit 80 - 165 W* 1.4 4.6 7.0 12 Ultimate 4 21.7 174.0 Gen) Gen) 40 (2nd 160 (3rd NVIDIA RTX A4000 5,120 8 GB 384 GB/s GDDR6 256-bit 80 - 140 W* 1.4 4.6 7.0 12 Ultimate 4 17.8 142.5 Gen) Gen) 32 (2nd 128 (3rd NVIDIA RTX A3000
    [Show full text]
  • A Qualitative Comparison Study Between Common GPGPU Frameworks
    A Qualitative Comparison Study Between Common GPGPU Frameworks. Adam Söderström Department of Computer and Information Science Linköping University This dissertation is submitted for the degree of M. Sc. in Media Technology and Engineering June 2018 Acknowledgements I would like to acknowledge MindRoad AB and Åsa Detterfelt for making this work possible. I would also like to thank Ingemar Ragnemalm and August Ernstsson at Linköping University. Abstract The development of graphic processing units have during the last decade improved signif- icantly in performance while at the same time becoming cheaper. This has developed a new type of usage of the device where the massive parallelism available in modern GPU’s are used for more general purpose computing, also known as GPGPU. Frameworks have been developed just for this purpose and some of the most popular are CUDA, OpenCL and DirectX Compute Shaders, also known as DirectCompute. The choice of what framework to use may depend on factors such as features, portability and framework complexity. This paper aims to evaluate these concepts, while also comparing the speedup of a parallel imple- mentation of the N-Body problem with Barnes-hut optimization, compared to a sequential implementation. Table of contents List of figures xi List of tables xiii Nomenclature xv 1 Introduction1 1.1 Motivation . .1 1.2 Aim . .3 1.3 Research questions . .3 1.4 Delimitations . .4 1.5 Related work . .4 1.5.1 Framework comparison . .4 1.5.2 N-Body with Barnes-Hut . .5 2 Theory9 2.1 Background . .9 2.1.1 GPGPU History . 10 2.2 GPU Architecture .
    [Show full text]
  • Graphics Shaders Mike Hergaarden January 2011, VU Amsterdam
    Graphics shaders Mike Hergaarden January 2011, VU Amsterdam [Source: http://unity3d.com/support/documentation/Manual/Materials.html] Abstract Shaders are the key to finalizing any image rendering, be it computer games, images or movies. Shaders have greatly improved the output of computer generated media; from photo-realistic computer generated humans to more vivid cartoon movies. Furthermore shaders paved the way to general-purpose computation on GPU (GPGPU). This paper introduces shaders by discussing the theory and practice. Introduction A shader is a piece of code that is executed on the Graphics Processing Unit (GPU), usually found on a graphics card, to manipulate an image before it is drawn to the screen. Shaders allow for various kinds of rendering effect, ranging from adding an X-Ray view to adding cartoony outlines to rendering output. The history of shaders starts at LucasFilm in the early 1980’s. LucasFilm hired graphics programmers to computerize the special effects industry [1]. This proved a success for the film/ rendering industry, especially at Pixars Toy Story movie launch in 1995. RenderMan introduced the notion of Shaders; “The Renderman Shading Language allows material definitions of surfaces to be described in not only a simple manner, but also highly complex and custom manner using a C like language. Using this method as opposed to a pre-defined set of materials allows for complex procedural textures, new shading models and programmable lighting. Another thing that sets the renderers based on the RISpec apart from many other renderers, is the ability to output arbitrary variables as an image—surface normals, separate lighting passes and pretty much anything else can be output from the renderer in one pass.” [1] The term shader was first only used to refer to “pixel shaders”, but soon enough new uses of shaders such as vertex and geometry shaders were introduced, making the term shaders more general.
    [Show full text]