NVIDIA Quadro by PNY Spring 07 Sales Presentation


VR/AR Enterprise Value Propositions
Data immersion and 3D conceptualization

NVIDIA RTX Server
High-Performance Visual Computing in the Data Center

NVIDIA RTX Server
Do your life’s work from anywhere

Creators, designers, data scientists, engineers, government workers, and students around the world are working or learning remotely wherever possible. They still need the powerful performance they relied on in the office, lab, and classroom to keep up with complex workloads like interactive graphics, data analytics, machine learning, and AI. NVIDIA RTX Server gives you the power to tackle critical day-to-day tasks and compute-heavy workloads, from home or wherever you need to work.

Visual Computing Today
Increasing daily workflow challenges

▪ ~6.5 Billion Render Hours Per Year
▪ 30,000 New Products Launch Each Year
▪ 2.5 Quintillion Bytes of Data Created Each Day
▪ 80% of Applications Utilize AI by 2020
▪ $12.9 Trillion in Global Construction by 2022
▪ $13.45 Billion Simulation Software Market by 2022

American Gods image courtesy of Tendril

NVIDIA RTX Server
High-performance, flexible visual computing in the data center

Highly Flexible Reference Design Delivered by Select OEM Partners | Scalable Configurations | Cost Effective and Power Efficient

▪ Powerful Virtual Workstations
▪ Accelerated Rendering
▪ Data Science
▪ CAE and Simulation

NVIDIA Turing
The power of Quadro RTX from desktop to data center

▪ Physical Workstations
▪ Virtual Workstations

NVIDIA RTX Technology: Ray Tracing, AI, Visualization, Compute

NVIDIA RTX Server Spans Industries
Near universal applicability

▪ AEC
▪ Education
▪ Energy
▪ Financial Services
▪ Government
▪ Healthcare
▪ Manufacturing
▪ Media and Entertainment

NVIDIA RTX Server Workloads
Covers all major solutions segments

[Diagram: workload stacks for four segments: Virtual Workstations, Rendering, Data Science, and Simulation. Workloads shown include workstations for design and visualization, on-demand viewport rendering, workstation and render node use, offline rendering, workstations for data science, and workstations for simulation and scientific visualization. Stack layers shown: applications (ISV software, renderers, data science software, sim and sci viz apps), hypervisor, and NVIDIA software (Quadro vDWS, CUDA-X AI, OptiX, NGC Containers).]

NVIDIA RTX Server
Built on Quadro RTX 8000 and RTX 6000 active or passive

                              RTX 8000 and 6000 Active                  RTX 8000 and 6000 Passive
Form Factor                   Dual Slot PCIe                            Dual Slot PCIe
GPU Memory                    48 GB GDDR6 ECC | 24 GB GDDR6 ECC         48 GB GDDR6 ECC | 24 GB GDDR6 ECC
NVLink                        100 GB/Sec Bi-Directional (With Bridge)   100 GB/Sec Bi-Directional (With Bridge)
CUDA Cores                    4608                                      4608
RT Cores                      72                                        72
Tensor Cores                  576                                       576
Rays Cast                     10 Giga Rays/Sec                          10 Giga Rays/Sec
Peak FP32 Performance         16.3 TFLOPS                               14.9 TFLOPS
Peak FP16 Performance         32.6 TFLOPS                               29.9 TFLOPS
Deep Learning TFLOPS          130.5 TFLOPS                              119.4 TFLOPS
Total Graphics Power          260 W                                     250 W
Thermal Management            Active Fansink                            Passive
Auxiliary Power               8-pin PCIe + 6-pin PCIe                   8-pin CPU Power Connector
Display Outputs               4x DisplayPort 1.4 + VirtualLink (USB-C)  Not Applicable
Quadro Sync II Compatibility  Yes                                       Not Applicable

NVIDIA Quadro RTX 8000 and RTX 6000
Active vs. passive performance

                        RTX 8000 and RTX 6000 Active       RTX 8000 and RTX 6000 Passive  Performance Delta
RTX-OPS                 84 T                               84 T                           None
Rays Cast               10 Giga Rays/Sec                   10 Giga Rays/Sec               None
Peak FP32 Performance   16.3 TFLOPS                        14.9 TFLOPS                    Passive -8.60%
Peak FP16 Performance   32.6 TFLOPS                        29.9 TFLOPS                    Passive -8.20%
Peak INT8 Performance   261.0 TOPS                         238.9 TOPS                     Passive -8.40%
Deep Learning TFLOPS    130.5 Tensor TFLOPS                119.4 Tensor TFLOPS            Passive -8.50%
Display Connectors      4x DisplayPort 1.4 + VirtualLink   Not Applicable                 Passive requires Quadro vDWS
Max Power Consumption   TGP 260 W | TBP 295 W              TGP 250 W                      Passive TGP 10 W less
Power Connectors        8-pin + 6-pin PCIe (included)      8-pin CPU (included)           Passive utilizes 8-pin CPU cable

NVIDIA RTX Server
Exponential power at a fraction of the cost

▪ Rendering: 6x Dual 18-Core Skylake Servers vs. 1x Eight GPU RTX Server. RTX Server Solution 1/4th the Cost
▪ Artificial Intelligence: 8x Dual 18-Core Skylake Servers vs. 1x Single GPU RTX Server. RTX Server Solution 1/5th the Cost
▪ Simulation: 60x Dual 12-Core Skylake Servers vs. 1x Single GPU RTX Server. RTX Server Solution 1/7th the Cost

NVIDIA RTX Server
Virtual Workstations

NVIDIA Quadro vDWS
Delivering complete value to every workflow

▪ Advanced Features
▪ Highest Performance
▪ Application Certification
▪ Enterprise Reliability
▪ Datacenter Security
▪ Resource Optimization
▪ Data Proximity
▪ IT Management

NVIDIA RTX Server for Virtual Workstations
Workload configuration

Virtual Workstations for Design and Visualization
Hypervisor | ISV Applications
Quadro vDWS | CUDA-X AI | OptiX
RTX 8000 Active or Passive
RTX 6000 Active or Passive
NVIDIA RTX Server Reference Design OEM Partners

NVIDIA Quadro RTX Virtual Workstations
Positioning and recommendations

Light Users
▪ Type of User: Small to medium models, scenes or assemblies with simple parts
▪ Recommended Solution (Perf/$): NVIDIA T4 or P6, Quadro Virtual Data Center Workstation (Quadro vDWS) software
▪ GPU Memory: 16 GB
▪ Equivalent Performance: Multiple P1000
▪ Replaces: K2, M60, P40, P4, M6

Medium Users
▪ Type of User: Large assemblies with simple parts or small assemblies with complex parts
▪ Recommended Solution (Perf/$): NVIDIA T4 or P6, Quadro vDWS software
▪ GPU Memory: 16 GB
▪ Equivalent Performance: Up to Quadro P4000
▪ Replaces: K2, M60, P40, P4, M6

Heavy Users
▪ Type of User: Massive datasets, very large 3D models, complex designs, large assemblies
▪ Recommended Solution (Perf/$): NVIDIA Quadro RTX 8000 or 6000 active or passive, Tesla V100S, Quadro vDWS software
▪ GPU Memory: 48 GB | 32 GB | 24 GB
▪ Equivalent Performance: Up to Quadro RTX 8000
▪ Replaces: M60, P40

Quadro Virtual Workstation Performance
Work faster with larger models

▪ 1.4x improved performance with Quadro RTX 8000 or RTX 6000 for virtual workstations
▪ 2x GPU memory with Quadro RTX 8000 for larger model sizes
▪ NVLink high-speed GPU interconnect pools GPU memory and scales performance
▪ Added ray tracing and AI support with RT and Tensor Cores

[Chart: Quadro Virtual Workstations, 3D graphics performance of P40 vs. RTX 8000 or 6000. RTX 3D Graphics 1.4x Faster¹]

¹SPECviewperf13 Geomean

NVIDIA RTX Server for Virtual Workstations
Broad industry support and NVIDIA NVENC adoption

▪ Thin Clients
▪ Soft Clients
▪ Hypervisor Platforms
▪ Infrastructure Providers

NVIDIA RTX Server
Rendering

NVIDIA RTX Server for Rendering
Workload configuration options

▪ Option One: Workstation by Day, Render Node by Night
▪ Option Two: Offline Rendering (Final Frame)
▪ Option Three: On Demand Viewport Rendering

Software stacks, in the order listed on the slide:
▪ GPU Renderer; CUDA-X AI | OptiX
▪ ISV Applications | GPU Renderer | Hypervisor; Quadro vDWS | CUDA-X AI | OptiX
▪ ISV Applications | GPU Renderer | Hypervisor; Quadro vDWS | CUDA-X AI | OptiX

RTX 8000 Active or Passive
RTX 6000 Active or Passive
NVIDIA RTX Server Reference Design OEM Partners

25x Accelerated Rendering for Netflix
Renders in a fraction of the time using NVIDIA RTX Server

                                 CPU Node (Dual Skylake)   NVIDIA RTX Server (4x RTX 8000)   Performance Improvement
Single Frame Render Time         38 Minutes                6 Minutes                         6x
Total Render Time (120 frames)   76 Hours                  3 Hours                           25x
Number of Render Nodes           25                        1                                 25x
Power Requirement (kW)           13.2                      1.9                               7x
Acquisition Cost                 $188,000                  $25,000                           7x
5 Year Cost of Power             $68,000                   $10,000                           7x
Total Cost                       $256,000                  $38,000                           7x

6x Faster for a Single Frame
25x Faster for the Entire Shot

Render courtesy of Image Engine | Copyright Netflix | NVIDIA RTX Server was not used in actual Lost in Space production

NVIDIA RTX Server
Dramatically boosts rendering workload performance

Application            Speedup                         Configuration
Blender Cycles         RTX up to 8x faster than CPU    CPU vs. 2x Quadro RTX 6000 boards running Blender 2.81¹
V-Ray Next GPU         RTX up to 18x faster than CPU   CPU vs. 2x Quadro RTX 6000 boards running V-Ray Next GPU¹
Autodesk Arnold        RTX up to 17x faster than CPU   CPU vs. 8x Quadro RTX 8000 boards running Autodesk Arnold 6.0.1¹
SOLIDWORKS Visualize   RTX up to 17x faster than CPU   CPU vs. 2x Quadro RTX 6000 boards running SOLIDWORKS Visualize 2020¹

¹Performance results may vary depending on the scene

NVIDIA RTX Server Rendering ISV Support
30+ RTX accelerated applications are available now

[Application logos; the only name recoverable as text is DaVinci Resolve.]

NVIDIA RTX Server
Data Science

Data Science is Everywhere
Use cases in every industry

Retail
▪ Supply chain and inventory management
▪ Price management and markdown optimization
▪ Promotion prioritization and ad targeting

Financial Services
▪ Claim fraud
▪ Customer service chatbots and routing
▪ Risk evaluation

Telecom
▪ Detect network and security anomalies
▪ Forecast network performance
▪ Network resource optimization (SON)

Manufacturing
▪ Remaining useful life estimation
▪ Failure prediction
▪ Demand forecasting

Healthcare
▪ Improve clinical care
▪ Drive operational efficiency
▪ Speed up drug discovery

Energy
▪ Sensor data tag mapping
▪ Anomaly detection
▪ Robust fault prediction

Consumer Internet
▪ Ad personalization
▪ Click through rate optimization
▪ Churn reduction

Automotive
▪ Personalization and intelligent driver interaction
▪ Connected vehicle predictive maintenance
▪ Forecasting, demand and capacity planning

NVIDIA RTX Server for Data Science and AI
CUDA-X AI end-to-end GPU accelerated workflow

[Diagram: CUDA-X AI workflow. Labels shown: Data Science Workstation Software, Data Science Server Software Package, Frameworks, Cloud ML Services, Deployment, Amazon SageMaker, Google Cloud ML, Amazon SageMaker Neo, Machine Learning, Serving; NVIDIA CUDA-X AI (DA, GRAPH, ML, DL TRAIN, DL INFERENCE) on NVIDIA CUDA; NVIDIA Data Science Workstation OEM Partners, NVIDIA RTX Server Reference Design OEM Partners, Cloud.]
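
The percentage deltas in the active vs. passive comparison and the improvement factors in the Netflix rendering comparison can be re-derived from the figures quoted in those tables. The following is a minimal Python sketch of that arithmetic. The function and variable names are illustrative assumptions and do not come from the presentation; only the numeric inputs are copied from it.

```python
def percent_delta(active: float, passive: float) -> float:
    """Change from the active figure to the passive figure, in percent."""
    return (passive - active) / active * 100.0


def improvement(baseline: float, accelerated: float) -> float:
    """How many times smaller the accelerated figure is than the baseline."""
    return baseline / accelerated


# (active, passive) pairs copied from the active vs. passive table.
ACTIVE_VS_PASSIVE = {
    "Peak FP32 (TFLOPS)": (16.3, 14.9),
    "Peak FP16 (TFLOPS)": (32.6, 29.9),
    "Peak INT8 (TOPS)": (261.0, 238.9),
    "Deep learning (Tensor TFLOPS)": (130.5, 119.4),
}

# (CPU node, RTX Server) pairs copied from the Netflix rendering table.
CPU_VS_RTX_SERVER = {
    "Single frame render time (minutes)": (38, 6),
    "Total render time (hours)": (76, 3),
    "Power requirement (kW)": (13.2, 1.9),
    "Acquisition cost ($)": (188_000, 25_000),
    "5 year cost of power ($)": (68_000, 10_000),
    "Total cost ($)": (256_000, 38_000),
}

for name, (active, passive) in ACTIVE_VS_PASSIVE.items():
    print(f"{name}: passive {percent_delta(active, passive):.2f}%")

for name, (cpu, rtx) in CPU_VS_RTX_SERVER.items():
    print(f"{name}: {improvement(cpu, rtx):.1f}x")

# Cross-check: 120 frames at 38 minutes each on one CPU node, in hours.
cpu_total_hours = 120 * 38 / 60
print(f"CPU total render time: {cpu_total_hours:.0f} hours")
```

The computed values can differ in the last digit from the figures printed on the slides, which appear to be rounded or truncated.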