IN-SITU VIS in the AGE of HPC+AI Peter Messmer, WOIV – ISC 2018, Frankfurt, 6/28/18 EXCITING TIMES for VISUALIZATION

Total Page:16

File Type:pdf, Size:1020Kb

IN-SITU VIS in the AGE of HPC+AI Peter Messmer, WOIV – ISC 2018, Frankfurt, 6/28/18 EXCITING TIMES for VISUALIZATION IN-SITU VIS IN THE AGE OF HPC+AI Peter Messmer, WOIV – ISC 2018, Frankfurt, 6/28/18 EXCITING TIMES FOR VISUALIZATION Huge dataset everywhere Novel workflows for HPC * AI Rendering technology leap 2 DEEP LEARNING COMES TO HPC Accelerates Scientific Discovery UIUC & NCSA: ASTROPHYSICS U. FLORIDA & UNC: DRUG DISCOVERY LBNL: High Energy Particle PHYSICS 5,000X LIGO Signal Processing 300,000X Molecular Energetics Prediction 100,000X faster simulated output by GAN PRINCETON & ITER: CLEAN ENERGY U.S. DoE: PARTICLE PHYSICS Univ. of Illinois: DRUG DISCOVERY 50% Higher Accuracy for Fusion Sustainment 33% More Accurate Neutrino Detection 2X Accuracy for Protein Structure Prediction 3 The most exciting phrase to hear in science, the one that heralds new discoveries, is not “Eureka!” but “That’s funny …” — Isaac Asimov 4 ANOMALY DETECTION Look at all the data and learn from it! Vast amount of data produced by simulations and experiments I/O limitations force throw-away Human in the loop infeasible “In-situ vis 2.0” Dataset: Visualization of historic cyclones from JWTC hurricane report from 1979 to 2016 Kim et al, 2017) Run inference on the simulation data to detect anomalies 5 Deep Learning for Climate Modeling ANOMALY DETECTION IN CLIMATE DATA Identifying “extreme” weather events in multi-decadal datasets with 5-layered Convolutional Neural Network. Reaching 99.98% of detection accuracy. (Kim et al, 2017) Dataset: Visualization of historic cyclones from JWTC hurricane report from 1979 to 2016 MNIST structure Systemic framework for detection and localization of extreme climate event DOI 10.1109/ICDMW.2017.476 OPTIMIZING PARAMETERIZED MODELS Learning from simulations to accelerate simulations Heuristics and semi-empirical models Examples: throughout simulation codes - Parameterized models: Often expensive, complex control flow - Atmospheric radiation transport, Use trained approximate models instead collisional cross-sections, chemical reaction chains, … Trained with past simulations - Simulation parameters - Grid refinement, load balancing, scheduling, … Use simulations to train approximate models 7 HPC SIMULATION OF THE FUTURE Concurrent simulation, training, inference Leads to - Improved approximate models - Better utilization of data - Faster results Coupling of HPC and DL software stack is needed 8 DATA SCIENCE DRIVES SOFTWARE-STACK Data science mission-critical to non-traditional HPC organizations Deep learning, graph analytics, in-core databases,.. Sustainability and performance by scale Frameworks supported by big corps, large communities Big market, big support by all vendors Economics drive performance portability and sustainability Happened in the past, but different situation today Trilinos: 274 stars, PyTorch: 16’615 stars Cast HPC algorithms in DL terms 9 EXAMPLE: WAVE EQUATION VIA CONV NEURAL NETWORK Applying stencils = Inference in Conv Network layer { name : “laplace_u” type : “convolution” bottom : “u_0” top : “laplace_u” u convolution_param { 0 num_output : 1 푑푣 kernel_size : 3 = 푐 ∆ 푢 stride : 1 푑푡 pad : 1 .. 푑푢 } = 푣 Du0 =ux-1–2u+ux+1 푑푡 layer { name : “velocity_update” type : “Eltwise” V3/2 = c dt Du0 + v1/2 bottom : “laplace_u” top : “velocity_update” } u1 = dt v1/2 + u0 layer { name : “position_update” type : “Eltwise” Simplified 1D example cartoon bottom : “velocity_update” Approach works for higher dimensions top : “position_update” } Other examples: Stencils, Spectral transforms, spectral elements, .. 10 EXAMPLE: MARCHING CUBES (well.. squares) Input Data Threshold Interpol Interpol X Y Label Cells Select Interpols 11 HPC CAN CONTRIBUTE TO EMERGING DATA SCIENCE NEEDS HPC solved lots of “new” problems in the past DL will need distributed memory parallelism New challenges for DL algorithms HPC has probably hit those challenges in the past https://www.hpcwire.com/2017/02 Better implementation, better algorithms /21/hpc-technique-benefits-deep- learning/ Collaborate with DL framework developers, contribute to DL frameworks First step: speak a common language 12 RENDERING TECHNOLOGIES 13 VISUALIZATION IN THE DATACENTER Benefits of Rendering on Supercomputer Scale with Simulation Cheaper Infrastructure Interactive High-Fidelity Rendering No Need to Scale Separate Vis Cluster All Heavy Lifting Performed on the Server Improves Perception and Scientific Insight 14 VISUALIZATION REQUIREMENTS Accuracy Responsiveness Visualization needs Interpretability 15 REAL-TIME RAY-TRACING Enabled by Volta V100 OptiX Ray-Tracing SDK State-of-the-art performance: 500M+ rays/sec Algorithm and hardware agnostic Shaders with single-ray programming model, recursion Free for private and commercial use https://developer.nvidia.com/optix 16 OptiX 5 with the CNN-based AI Denoiser 157x 54x 7x 1x 4x Pascal GP100 Volta V100 Volta V100 DGX Station CPU OptiX 4.1 OptiX 5 OptiX 5 W/ AI OptiX 5 W/ AI C.A. Chaitanya, A. Kaplanyan, C. Schied, M. Salvi, A. Lefohn, D. Nowrouzezahrai, T. Aila. “Interactive Reconstruction of Monte Carlo Image Sequences Using a Recurrent Denoising Autoencoder.” SIGGRAPH 2017 17 OPTIX FOR SCIENCE Ray-Tracing for extra visual cues Enhanced visual perception Single workflow for science and outreach Fast rendering with AI denoiser Integrated into leading vis tools With Denoiser 18 WILL HPC TRANSFORM AI OR WILL AI TRANSFORM HPC? Combination of HPC and AI will lead to better exploitation of data, faster models Both communities need each other - AI will need distributed memory technologies at one point - HPC needs AI for analysis and parameterized models Economic interest in AI drives software and hardware development Expressing HPC algorithms in AI terms helps leveraging AI features Applying HPC techniques to scale AI => Continue with joint events! Simulation and streamline data courtesy of Christoph Garth19 OptiX AI Denoiser in ParaView Without Denoiser With Denoiser 20 NVIDIA IndeX SDK Large scale and distributed data rendering Scene management with volume data Transparent support for NVLink Higher-order filtering, advanced lighting & transfer functions https://developer.nvidia.com/index 21 ParaView Plugin NVDIA IndeX ParaView Plugin Distributed with ParaView (pvNvidiaIndeX) Mixing with ParaView primitives Support for multiple slice planes Structured and unstructured meshes In-situ capable via Catalyst https://developer.nvidia.com/index 22 NVIDIA INDEX PARAVIEW PLUGIN Growing user base • Single GPU version bundled with ParaView • Over 20,000 downloads in 2 months. • Cluster version • 10+ labs and institutions actively working with us. https://www.nvidia.com/en-us/data-center/index-paraview-plugin/ 23 NVIDIA INDEX PARAVIEW PLUGIN Licensing and Availability SINGLE GPU ACADEMIC COMMERCIAL Free license Free license License fee Supports single GPU Multi-node Multi-GPU Multi-node Multi-GPU Plugin binaries part of Plugin source part of Plugin source with ParaView v5.5 download ParaView v5.5 download customer specific features New releases synced with Latest features in Premium support ParaView release cycle ParaView-Master 24 FLEXIBLE GPU ACCELERATION ARCHITECTURE Independent CUDA Cores & Video Engines Decode HW* Encode HW* Formats: Formats: • MPEG-2 CPU • VC1 • H.264 • VP8 • H.265 • VP9 • Lossless • H.264 • H.265 Bit depth: • Lossless • 8 bit NVDEC Buffer NVENC • 10 bit Bit depth: • 8 bit Color** • 10/12 bit • YUV 4:4:4 • YUV 4:2:0 Color** • YUV 4:2:0 CUDA Cores Resolution • Up to 8K*** Resolution • Up to 8K*** * See support diagram for previous NVIDIA HW generations ** 4:2:2 is not natively supported on HW 25 *** Support is codec dependent VIDEO CODEC SDK Hardware Accelerated Video Encoding/Decoding Encode and Decode API Industry-standard codecs: H.265(HEVC), H.264(AVCHD), VP9, VP8, VC-1 & MPEG-2 Supports up to 8K x 8K encoding Lossy & lossless compression https://developer.nvidia.com/nvidia-video-codec-sdk 26 Streaming of Large Tile Counts Frame rates / Latency / Bandwidth CASE STUDY Synchronization Comparison against CPU-based Compressors Strong Scaling (Direct-Send Sort-First Compositing) Hardware-Accelerated Multi-Tile Streaming for Realtime Remote Visualization, Biedert et al, EGPGV 2018, https://doi.org/10.2312/pgv.20181093 27 CONCEPTUAL OVERVIEW Asynchronous Pipelines 28 BENCHMARK SCENES NASA Synthesis 4K Space Orbit Ice Streamlines Low Complexity Medium Complexity High Complexity Extreme Complexity 29 HARDWARE LATENCY Decreases with Resolution 16 14 12 ] s 10 m [ y c 8 n e t a L 6 4 2 0 3840x2160 2720x1530 1920x1080 1360x766 960x540 672x378 480x270 Resolution Encode (HEVC) Encode (H.264) Decode (HEVC) Decode (H.264) 30 N:N STREAMING Pipeline Latencies Server: Piz Daint 45 Client: Piz Daint 40 Ice 4K 35 H.264: 32 Mbps 30 ] s m MPI-based [ 25 synchronization y c n e 20 t a L 15 10 5 0 1 2 4 8 16 32 64 128 256 Tiles Synchronize (Servers) Encode Network Decode Synchronize (Clients) 31 N:N STREAMING Client-Side Frame Rates and Bandwidths Server: Piz Daint 90 900 Client: Piz Daint 80 800 Ice 4K 70 700 ] s H.264: 32 Mbps ] / z 60 600 B HEVC: 16 Mbps H [ M [ e t 50 500 a h t R d i e 40 400 w m d a n r a F 30 300 B 20 200 10 100 0 0 1 2 4 8 16 32 64 128 256 Tiles HEVC (Bandwidth) H.264 (Bandwidth) HEVC (Frame Rate) H.264 (Frame Rate) 32 N:1 STREAMING Client-Side Frame Rate Server: Piz Daint 90 Clients: Site A (5 ms) Site B (25 ms) 80 Ice 4K 70 ] z 60 H.264: 32 Mbps H [ e t 50 a R e 40 m a r F 30 20 10 0 1 2 4 8 16 32 64 128 256 Tiles Site A (1x GP100) Site B (1x GP100) Site B (2x GP100) 33 N:1 STRONG SCALING Client-Side Frame Rate Server: Piz Daint 400 Clients: Site A (5 ms) Site B (25 ms) 350 Ice 4K 300 ] z H.264: 32 Mbps H 250
Recommended publications
  • Nvidia Video Codec Sdk - Encoder
    NVIDIA VIDEO CODEC SDK - ENCODER Application Note vNVENC_DA-6209-001_v14 | July 2021 Table of Contents Chapter 1. NVIDIA Hardware Video Encoder.......................................................................1 1.1. Introduction............................................................................................................................... 1 1.2. NVENC Capabilities...................................................................................................................1 1.3. NVENC Licensing Policy...........................................................................................................3 1.4. NVENC Performance................................................................................................................ 3 1.5. Programming NVENC...............................................................................................................5 1.6. FFmpeg Support....................................................................................................................... 5 NVIDIA VIDEO CODEC SDK - ENCODER vNVENC_DA-6209-001_v14 | ii Chapter 1. NVIDIA Hardware Video Encoder 1.1. Introduction NVIDIA GPUs - beginning with the Kepler generation - contain a hardware-based encoder (referred to as NVENC in this document) which provides fully accelerated hardware-based video encoding and is independent of graphics/CUDA cores. With end-to-end encoding offloaded to NVENC, the graphics/CUDA cores and the CPU cores are free for other operations. For example, in a game recording scenario,
    [Show full text]
  • Tegra Linux Driver Package
    TEGRA LINUX DRIVER PACKAGE RN_05071-R32 | March 18, 2019 Subject to Change 32.1 Release Notes RN_05071-R32 Table of Contents 1.0 About this Release ................................................................................... 3 1.1 Login Credentials ............................................................................................... 4 2.0 Known Issues .......................................................................................... 5 2.1 General System Usability ...................................................................................... 5 2.2 Boot .............................................................................................................. 6 2.3 Camera ........................................................................................................... 6 2.4 CUDA Samples .................................................................................................. 7 2.5 Multimedia ....................................................................................................... 7 3.0 Top Fixed Issues ...................................................................................... 9 3.1 General System Usability ...................................................................................... 9 3.2 Camera ........................................................................................................... 9 4.0 Documentation Corrections ..................................................................... 10 4.1 Adaptation and Bring-Up Guide ............................................................................
    [Show full text]
  • Mechdyne-TGX-2.1-Installation-Guide
    TGX Install Guide Version 2.1.3 Mechdyne Corporation March 2021 TGX INSTALL GUIDE VERSION 2.1.3 Copyright© 2021 Mechdyne Corporation All Rights Reserved. Purchasers of TGX licenses are given limited permission to reproduce this manual, provided the copies are for their use only and are not sold or distributed to third parties. All such copies must contain the title page and this notice page in their entirety. The TGX software program and accompanying documentation described herein are sold under license agreement. Their use, duplication, and disclosure are subject to the restrictions stated in the license agreement. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license. This publication is provided “as is” without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non- infringement. Any Mechdyne Corporation publication may include inaccuracies or typographical errors. Changes are periodically made to these publications, and changes may be incorporated in new editions. Mechdyne may improve or change its products described in any publication at any time without notice. Mechdyne assumes no responsibility for and disclaims all liability for any errors or omissions in this publication. Some jurisdictions do not allow the exclusion of implied warranties, so the above exclusion may not apply. TGX is a trademark of Mechdyne Corporation. Windows® is registered trademarks of Microsoft Corporation. Linux® is registered trademark of Linus Torvalds.
    [Show full text]
  • NVIDIA RTX Virtual Workstation
    NVIDIA RTX Virtual Workstation Sizing Guide NVIDIA RTX Virtual Workstation | 1 Executive Summary Document History nv-quadro-vgpu-deployment-guide-citrixonvmware-v2-082020 Version Date Authors Description of Change 01 Aug 17, 2020 AFS, JJC, EA Initial Release 02 Jan 08, 2021 CW Version 2 03 Jan 14, 2021 AFS Branding Update NVIDIA RTX Virtual Workstation | 2 Executive Summary Table of Contents Chapter 1. Executive Summary ......................................................................................................... 5 1.1 What is NVIDIA RTX vWS? ....................................................................................................... 5 1.2 Why NVIDIA vGPU? ................................................................................................................. 6 1.3 NVIDIA vGPU Architecture....................................................................................................... 6 1.4 Recommended NVIDIA GPU’s for NVIDIA RTX vWS ................................................................ 7 Chapter 2. Sizing Methodology......................................................................................................... 9 2.1 vGPU Profiles ........................................................................................................................... 9 2.2 vCPU Oversubscription .......................................................................................................... 10 Chapter 3. Tools .............................................................................................................................
    [Show full text]
  • NVIDIA A100 Tensor Core GPU Architecture UNPRECEDENTED ACCELERATION at EVERY SCALE
    NVIDIA A100 Tensor Core GPU Architecture UNPRECEDENTED ACCELERATION AT EVERY SCALE V1.0 Table of Contents Introduction 7 Introducing NVIDIA A100 Tensor Core GPU - our 8th Generation Data Center GPU for the Age of Elastic Computing 9 NVIDIA A100 Tensor Core GPU Overview 11 Next-generation Data Center and Cloud GPU 11 Industry-leading Performance for AI, HPC, and Data Analytics 12 A100 GPU Key Features Summary 14 A100 GPU Streaming Multiprocessor (SM) 15 40 GB HBM2 and 40 MB L2 Cache 16 Multi-Instance GPU (MIG) 16 Third-Generation NVLink 16 Support for NVIDIA Magnum IO™ and Mellanox Interconnect Solutions 17 PCIe Gen 4 with SR-IOV 17 Improved Error and Fault Detection, Isolation, and Containment 17 Asynchronous Copy 17 Asynchronous Barrier 17 Task Graph Acceleration 18 NVIDIA A100 Tensor Core GPU Architecture In-Depth 19 A100 SM Architecture 20 Third-Generation NVIDIA Tensor Core 23 A100 Tensor Cores Boost Throughput 24 A100 Tensor Cores Support All DL Data Types 26 A100 Tensor Cores Accelerate HPC 28 Mixed Precision Tensor Cores for HPC 28 A100 Introduces Fine-Grained Structured Sparsity 31 Sparse Matrix Definition 31 Sparse Matrix Multiply-Accumulate (MMA) Operations 32 Combined L1 Data Cache and Shared Memory 33 Simultaneous Execution of FP32 and INT32 Operations 34 A100 HBM2 and L2 Cache Memory Architectures 34 ii NVIDIA A100 Tensor Core GPU Architecture A100 HBM2 DRAM Subsystem 34 ECC Memory Resiliency 35 A100 L2 Cache 35 Maximizing Tensor Core Performance and Efficiency for Deep Learning Applications 37 Strong Scaling Deep Learning
    [Show full text]
  • Tegra Linux Driver Package
    TEGRA LINUX DRIVER PACKAGE RN_05071-R32 | July 17, 2019 Subject to Change 32.2 Release Notes RN_05071-R32 Table of Contents 1.0 About this Release ................................................................................... 4 1.1 Login Credentials ............................................................................................... 5 2.0 Known Issues .......................................................................................... 6 2.1 General System Usability ...................................................................................... 6 2.2 Boot .............................................................................................................. 7 2.3 Camera ........................................................................................................... 8 2.4 Connectivity ..................................................................................................... 9 2.5 CUDA ........................................................................................................... 10 2.6 Jetson Developer Kit ......................................................................................... 10 2.7 Multimedia ..................................................................................................... 10 2.8 SDK Manager .................................................................................................. 11 3.0 Top Fixed Issues .................................................................................... 12 3.1 General System Usability ...................................................................................
    [Show full text]
  • NVIDIA DESIGNWORKS Ankit Patel - [email protected] Prerna Dogra - [email protected]
    NVIDIA DESIGNWORKS Ankit Patel - [email protected] Prerna Dogra - [email protected] 1 Autonomous Driving Deep Learning Visual Effects Virtual Desktops Gaming Product Design Visual Computing is our singular mission 2 RENDERING PHYSICS Iray SDK OptiX SDK MDL SDK NV Pro Pipeline vMaterials PhysX VOXELS VIDEO MANAGEMENT GVDB Voxels VXGI GPUDirect for Video Video Codec SDK GRID SW MGMT SDK NVAPI/NVWMI DISPLAY https://developer.nvidia.com/designworks Multi-Display Capture SDK Warp and Blend 3 End-Users (Designers, Artists, Scientists) Application Partners Tools and technologies for Professional Visualization Application Developers NVWMI VXGI GRID Video Codec Mosaic Capture PhysX OptiX Management vMaterials NvPro Pipeline GPU Direct for Video Warp and Blend GVDB Iray NVAPI MDL GRAPHICS DRIVER CUDA DRIVER NVIDIA GPU 4 RENDERING 5 IRAY SDK Rendering by simulating the physical behavior of lights and materials • Use Case: physically based rendering for product design • Unprecedented visual quality and fidelity; enabling fluid and interactive product design flow Iray 2017 developer.nvidia.com/iray-sdk 6 MDL SDK Material Definition Language for seamless and quick integration of physically based materials into renderers • Use Case: physically based rendering for product design Physically based Materials • Enables designers and artists to understand how materials impact product design MDL support in Iray for Maya MDL SDK 2017 developer.nvidia.com/mdl-sdk 7 vMaterials Library with hundreds of ready to use real world materials • Use Case: physically based
    [Show full text]
  • Nvidia Video Codec Sdk - Decoder
    NVIDIA VIDEO CODEC SDK - DECODER Application Note vNVDEC_DA-08097-001_v08 | July 2021 Table of Contents Chapter 1. NVIDIA Hardware Video Decoder.......................................................................1 1.1. Introduction............................................................................................................................... 1 1.2. NVDEC Capabilities...................................................................................................................1 1.3. What’s new................................................................................................................................ 2 1.4. NVDEC Performance................................................................................................................ 2 1.5. Programming NVDEC............................................................................................................... 4 1.6. FFmpeg Support....................................................................................................................... 4 NVIDIA VIDEO CODEC SDK - DECODER vNVDEC_DA-08097-001_v08 | ii Chapter 1. NVIDIA Hardware Video Decoder 1.1. Introduction NVIDIA GPUs contain a hardware-based decoder (referred to as NVDEC in this document) which provides fully accelerated hardware-based video decoding for several popular codecs. With complete decoding offloaded to NVDEC, the graphics engine and CPU are free for other operations. NVDEC supports much faster than real-time decoding which makes it suitable for transcoding scenarios
    [Show full text]
  • Studies on CUDA Offloading for Real-Time Simulation And
    Studies on CUDA Offloading for Real-Time Simulation and Visualization Edgar Josafat Mart´ınez-Noriega 電気通信大学 情報・通信工学専攻 A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Engineering March 2020 Studies on CUDA Offloading for Real-Time Simulation and Visualization Chairperson: Prof. Narumi Tetsu (成見 哲 先生) Member: Prof. Terada Minoru (寺田 実 先生) Member: Prof. Nakatani Yoshinobu (仲谷 栄伸 先生) Member: Prof. Yoshinaga Tsutomu (吉永 努 先生) Member: Prof. Miwa Shinobu (三輪 忍 先生) © Copyright Edgar Josafat Mart´ınez-Noriega, 2020 All rights reserved. 概要 リアルタイムのシミュレーションと可視化のためのCUDA のオフロードに関する 研究 マルチネス ノリエガ エドガー ホサファット 電気通信大学 本論文では, GPU を使ったリアルタイムのシミュレーションを可視化する際に, 計算部 分をネットワークの先にオフロードすることで計算効率を向上できることを示している. GPU はもともと3D グラフィックス用に開発されたものではあるが, 近年はGPGPU を呼 ばれる汎用的な計算が行えるようになってきている. 様々なコンピュータシミュレーショ ンもGPU 上で実行出来るが, その中でCUDA と呼ばれるアーキテクチャはGPU 業界の事 実上の標準技術となっている. 一方タブレットやスマートホンのようなモバイルデバイス は, タッチ機能や加速度センサーのようなPC には無かった機能が追加されており, データ を可視化し操作する際のやり方が以前とは変わってきている. 例えば分子シミュレーショ ンの世界では, インタラクティブに操作可能な程シミュレーションが高速化されてきてお り, 特定の分子を人工的に動かすことで周りの分子の反応を見るなどシミュレーション技 術の新しい方向性が生まれている. ただしモバイルデバイスには消費電力的な制約があ り, PC 用のGPU 程の性能は期待出来ない. このようなモバイルデバイスの性能を補完す るために, クラウド技術を使う方法がある. つまり計算の重い部分に関してはネットワー クの先のGPU サーバーに処理を任せる. このようなやり方をCUDA のオフロードと呼び, GVirtus, ShadowFax, DS-CUDA, GPUvm, MGP, vCUDA, rCUDA 等のフレームワークが 提唱されている. 本論文では, リアルタイムの分子動力学シミュレーションを対象のアプリ ケーションと定め,タブレット端末上で高速に実行するために有効なオフロードの方法を 検討した. 最初にDS-CUDA を用いてCUDA の計算だけをGPU サーバーにオフロードす るシステムを評価した. 特にこれまでサポートされていなかったAndroid タブレットから のオフロードシステムも開発した. この結果タブレット単体に比べて高い演算性能は達成 できたものの, 画面表示のフレームレートが十分に滑らかには出来なかった.
    [Show full text]
  • NVIDIA® RTX™ A4000 GPU Performance Features NVIDIA Ampere Architecture NVIDIA RTX A4000 Is the Most Powerful Single Slot
    NVIDIA RTX NVIDIA A4000 RFEATURESTX A4000 &FEATURES BENEFITS & BENEFITS FEATURES AND BENEFITS NVIDIA® RTX™ A4000 GPU Performance Features NVIDIA Ampere Architecture NVIDIA RTX A4000 is the most powerful single slot GPU solution offering high performance real-time ray tracing, AI-accelerated compute, and professional graphics rendering. Building upon the major SM enhancements from the Turing GPU, the NVIDIA Ampere architecture enhances ray tracing operations, tensor matrix opera tions, and concurrent executions of FP32 and INT32 operations. CUDA Cores The NVIDIA Ampere architecture-based CUDA cores bring up to 2.7X the single-precision floating point (FP32) throughput compared to the previous generation, providing significant performance improvements for graphics workflows such as 3D model development and compute for workloads such as desktop simulation for computer-aided engineering (CAE). The RTX A4000 enables two FP32 primary data paths, doubli ng the peak FP32 operations. 2nd Generation RT Cores Incorporating 2nd generation ray tracing engines, NVIDIA Ampere architecture-based GPUs provide incredible ray traced rendering performance. A single RTX A4000 board can render complex professional models with physically accurate shadows, reflections, and refractions to empower users with instant insight. Working in concert with applications leveraging APIs such as NVIDIA OptiX, Microsoft DXR and Vulkan ray tracing, systems based on the RTX A40 00 will power truly interactive design workflows to provide immediate feedback for unprecedented levels of productivity. The RTX A4000 is up to 2X faster in ray tracing compared to the previous generation. This technology also speeds up the rendering of ray- traced motion blur for faster results with greater visual accuracy.
    [Show full text]
  • Nvidia Video Codec Sdk - Decoder
    NVIDIA VIDEO CODEC SDK - DECODER Programming Guide vNVDECODEAPI_PG-08085-001_v07 | July 2021 Table of Contents Chapter 1. Overview..............................................................................................................1 1.1. Supported Codecs.....................................................................................................................1 Chapter 2. Video Decoder Capabilities................................................................................ 3 Chapter 3. Video Decoder Pipeline...................................................................................... 5 Chapter 4. Using NVIDIA Video Decoder (NVDECODE API)................................................. 7 4.1. Video Parser.............................................................................................................................. 7 4.1.1. Creating a parser............................................................................................................... 7 4.1.2. Parsing the packets........................................................................................................... 8 4.1.3. Destroying parser...............................................................................................................8 4.2. Video Decoder........................................................................................................................... 9 4.2.1. Querying decode capabilities.............................................................................................9 4.2.2.
    [Show full text]
  • Gpu-Accelerated Applications
    GPU-ACCELERATED APPLICATIONS Test Drive the World’s Fastest Accelerator – Free! Take the GPU Test Drive, a free and easy way to experience accelerated computing on GPUs. You can run your own application or try one of the preloaded ones, all running on a remote cluster. Try it today. www.nvidia.com/gputestdrive GPU-ACCELERATED APPLICATIONS Accelerated computing has revolutionized a broad range of industries with over five hundred applications optimized for GPUs to help you accelerate your work. CONTENTS 1 Computational Finance 2 Climate, Weather and Ocean Modeling 2 Data Science and Analytics 5 Artificial Intelligence DEEP LEARNING AND MACHINE LEARNING 9 Federal, Defense and Intelligence 10 Design for Manufacturing/Construction: CAD/CAE/CAM COMPUTATIONAL FLUID DYNAMICS COMPUTATIONAL STRUCTURAL MECHANICS DESIGN AND VISUALIZATION ELECTRONIC DESIGN AUTOMATION INDUSTRIAL INSPECTION 19 Media & Entertainment ANIMATION, MODELING AND RENDERING COLOR CORRECTION AND GRAIN MANAGEMENT COMPOSITING, FINISHING AND EFFECTS EDITING ENCODING AND DIGITAL DISTRIBUTION ON-AIR GRAPHICS ON-SET, REVIEW AND STEREO TOOLS WEATHER GRAPHICS 26 Medical Imaging 29 Oil and Gas 30 Research: Higher Education and Supercomputing COMPUTATIONAL CHEMISTRY AND BIOLOGY NUMERICAL ANALYTICS PHYSICS SCIENTIFIC VISUALIZATION 41 Safety and Security 44 Tools and Management Computational Finance APPLICATION NAME COMPANY/DEVELOPER PRODUCT DESCRIPTION SUPPORTED FEATURES GPU SCALING Accelerated Elsen Secure, accessible, and accelerated back- • Web-like API with Native bindings for Multi-GPU Computing Engine testing, scenario analysis, risk analytics Python, R, Scala, C Single Node and real-time trading designed for easy • Custom models and data streams are integration and rapid development. easy to add Adaptiv Analytics SunGard A flexible and extensible engine for fast • Existing models code in C# supported Multi-GPU calculations of a wide variety of pricing transparently, with minimal code Single Node and risk measures on a broad range of changes asset classes and derivatives.
    [Show full text]